Technical Guide

KServe and Model Serving on Kubernetes

KServe is a standardized, Kubernetes-native platform for serving machine learning models at scale.

Overview

KServe gives teams a single, declarative way to deploy models with autoscaling, canary rollouts, and scale-to-zero, removing most of the Kubernetes plumbing.

Model serving with KServe on Kubernetes is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Formerly known as KFServing and born out of the Kubeflow project, KServe defines a custom resource called InferenceService. You write a short YAML file that points at a model stored in object storage (S3, GCS, Azure Blob), and KServe handles the rest. It supports both predictive inference and, increasingly, generative LLM serving. KServe ships pre-built "serving runtimes" for common frameworks (TensorFlow Serving, TorchServe, Triton, scikit-learn, XGBoost, Hugging Face) and also supports custom containers. Built on top of Knative Serving and a networking layer (Istio or similar), it provides request-driven autoscaling, including true scale-to-zero, so idle models consume no compute. It also standardizes the prediction API around the Open Inference Protocol, so clients talk to every model the same way regardless of framework.
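
As a rough illustration of how short such a manifest can be, the sketch below builds an InferenceService as a Python dictionary and prints it as JSON, which Kubernetes accepts in place of YAML. The `serving.kserve.io/v1beta1` API version and the `predictor.model` fields follow KServe's published schema. The service name `sklearn-iris`, the namespace, and the bucket path are hypothetical placeholders.

```python
import json


def inference_service(name: str, model_format: str, storage_uri: str,
                      namespace: str = "default") -> dict:
    """Build a minimal KServe InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    # KServe selects a matching serving runtime from the format name.
                    "modelFormat": {"name": model_format},
                    # Location the model artifacts are pulled from at startup.
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = inference_service(
    name="sklearn-iris",                            # hypothetical service name
    model_format="sklearn",
    storage_uri="s3://example-bucket/models/iris",  # hypothetical bucket path
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, assuming KServe is installed in the cluster and the storage credentials are configured separately.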

Technical Insight

KServe's autoscaling relies on Knative, which adjusts the replica count based on concurrency or requests per second. It can scale down to zero replicas when traffic stops and then cold-start on demand. An InferenceService decomposes the full inference pipeline into predictor, transformer (pre/post-processing), and explainer components. Models are loaded from object storage by "storage initializers" that pull artifacts into the pod at startup, which decouples model storage from the container image.
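
The concurrency-based scaling idea can be sketched in a few lines. This is a deliberately simplified model, not Knative's actual algorithm: it ignores averaging windows, panic mode, and target utilization, and the function name and default limits are assumptions chosen for illustration.

```python
import math


def desired_replicas(observed_concurrency: float, target_per_replica: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Simplified concurrency-based scaling rule.

    Divides the in-flight request count by the per-replica target,
    rounds up, and clamps the result to the configured bounds.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(observed_concurrency / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# No in-flight requests: the service may drop to zero replicas.
print(desired_replicas(0, target_per_replica=5))    # 0
# 12 concurrent requests at a target of 5 per replica need 3 replicas.
print(desired_replicas(12, target_per_replica=5))   # 3
# Demand beyond the ceiling is capped at max_replicas.
print(desired_replicas(500, target_per_replica=5))  # 10
```

With `min_replicas=0` the first call returns zero, which corresponds to scale-to-zero. The next request after that pays the cold-start cost of loading the model again.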

A Guide to KServe and Model Serving on Kubernetes

To build a deep understanding, treat model serving with KServe as an operating model rather than a single feature: define the desired outcomes, make assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using KServe optimize their architecture, data, and infrastructure choices for reliability and cost. They write down explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide larger system weaknesses. The most effective approach pairs rapid experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architecture decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability problems in production.

In effective teams, each of these translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that confidence grows rather than uncertainty.

The Future of KServe and Model Serving on Kubernetes

KServe is evolving rapidly toward generative AI, adding an LLM-focused track with features such as KV-cache-aware routing, model caching, and disaggregated prefill/decode serving for large language models. Expect deeper integration with inference engines such as vLLM, better multi-node support for models too large for a single GPU, and gateway-level routing for token-based load balancing. As a CNCF incubating project, it is becoming the de facto open standard for serving models on Kubernetes, narrowing the gap between research tooling and production endpoints.

Real-World Applications

A bank deploys a scoring model by writing a 10-line InferenceService YAML that points at the model in S3, with KServe handling autoscaling and ingress.

An e-commerce team uses KServe canary rollouts to send 10 percent of traffic to a new recommendation model, then ramps to 100 percent once the metrics look healthy.
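
A canary step like this one can be expressed as a small patch against the InferenceService. The sketch below only builds the patch bodies. `canaryTrafficPercent` is the field KServe exposes on the predictor spec, while the bucket path and the promote-or-roll-back policy in `next_percent` are hypothetical. How the patch is applied (kubectl, a client library, GitOps) is left open.

```python
import json


def canary_patch(new_storage_uri: str, percent: int) -> dict:
    """Merge-patch body that points the predictor at a new model version
    and routes `percent` of traffic to it."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return {
        "spec": {
            "predictor": {
                "canaryTrafficPercent": percent,
                "model": {"storageUri": new_storage_uri},
            }
        }
    }


def next_percent(metrics_healthy: bool) -> int:
    """Hypothetical policy: promote fully when healthy, otherwise roll back."""
    return 100 if metrics_healthy else 0


# Hypothetical location of the new recommendation model.
new_uri = "s3://example-bucket/models/recommender/v2"

start = canary_patch(new_uri, 10)
print(json.dumps(start))

promoted = canary_patch(new_uri, next_percent(metrics_healthy=True))
print(promoted["spec"]["predictor"]["canaryTrafficPercent"])  # 100
```

A real rollout would usually ramp through intermediate percentages and base the health decision on observed error rates and latency, not on a single boolean.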

A research lab serves a long tail of rarely used models with scale-to-zero, so each model spins up only when a request arrives and consumes no GPU while idle.

An MLOps team uses KServe's transformer component to run image resizing and normalization before the predictor runs a Triton vision model.
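
The pre-processing a transformer performs can be sketched in plain Python. This example does not use the KServe SDK. It only shows the shape of the work: resize, normalize, then wrap the tensor in an Open Inference Protocol (v2) request body for the predictor. The tensor name `input__0`, the grayscale input, and the mean and standard deviation values are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values in 0..255


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize, chosen for brevity rather than quality."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: Image, mean: float = 0.5, std: float = 0.5) -> Image:
    """Scale pixels to 0..1, then shift and scale by mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def to_infer_request(img: Image, input_name: str = "input__0") -> dict:
    """Wrap one image in an Open Inference Protocol (v2) request body.

    The tensor name is a hypothetical placeholder; it must match the model.
    """
    flat = [p for row in img for p in row]
    return {
        "inputs": [{
            "name": input_name,
            "shape": [1, len(img), len(img[0])],
            "datatype": "FP32",
            "data": flat,
        }]
    }


raw = [[0, 0, 255, 255],
       [0, 0, 255, 255],
       [255, 255, 0, 0],
       [255, 255, 0, 0]]
request = to_infer_request(normalize(resize_nearest(raw, 2, 2)))
print(request["inputs"][0]["shape"])  # [1, 2, 2]
print(request["inputs"][0]["data"])   # [-1.0, 1.0, 1.0, -1.0]
```

Keeping this logic in a separate transformer lets the pre-processing scale and be updated independently of the predictor that runs the model.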

Implementation Patterns

KServe and Model Serving on Kubernetes in practice

The same pattern holds across the four deployments described above: the bank's scoring model, the e-commerce canary rollout, the research lab's scale-to-zero fleet, and the MLOps team's transformer pipeline. Teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track production gains and error costs over time.

Risks & Safeguards

Optimizing a single metric can hide larger system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems scale.

Roadmap

1. Define latency, quality, and cost targets before implementation.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Plan rollback and feedback mechanisms before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
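
The evidence-gate idea can be made concrete with a small check that compares measured values against the targets set in the first step. The metric names, thresholds, and measurements below are illustrative assumptions, not benchmarks of any real system.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    max_p95_latency_ms: float
    min_accuracy: float
    max_cost_per_1k_requests: float


def gate(measured: dict, targets: Targets) -> list:
    """Return the list of unmet criteria; an empty list means the gate passes."""
    failures = []
    if measured["p95_latency_ms"] > targets.max_p95_latency_ms:
        failures.append("latency")
    if measured["accuracy"] < targets.min_accuracy:
        failures.append("quality")
    if measured["cost_per_1k_requests"] > targets.max_cost_per_1k_requests:
        failures.append("cost")
    return failures


# Illustrative numbers only.
targets = Targets(max_p95_latency_ms=200, min_accuracy=0.92,
                  max_cost_per_1k_requests=0.50)
measured = {"p95_latency_ms": 240, "accuracy": 0.94,
            "cost_per_1k_requests": 0.41}

unmet = gate(measured, targets)
print("expand rollout" if not unmet else f"pause rollout, close gaps: {unmet}")
```

Here the latency target is missed, so the check reports that the rollout should pause until that gap is closed.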

Keep Exploring