Technical guide

KServe and Model Serving on Kubernetes

KServe is a standardized, Kubernetes-native platform for serving machine learning models at scale.

Summary

KServe gives teams a clear way to deploy models with autoscaling, canary rollouts, and scale-to-zero, while hiding most of the Kubernetes plumbing.

As a building block, model serving with KServe on Kubernetes affects model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Formerly known as KFServing and originating in the Kubeflow project, KServe defines a custom resource called InferenceService. You write a short YAML file that points to a model stored in object storage (S3, GCS, Azure Blob), and KServe handles the rest. It supports predictive inference and, increasingly, generative LLMs.

KServe ships prebuilt serving runtimes for popular frameworks (TensorFlow, TorchServe, Triton, scikit-learn, XGBoost, Hugging Face) and also supports custom containers. In its Knative-based serverless mode it is built on Knative Serving and a networking layer (Istio or similar), which gives it request-based autoscaling with true scale-to-zero, so idle models consume no compute.

It also standardizes the prediction API on the Open Inference Protocol, so clients talk to every model in the same way, regardless of the framework behind it.
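To make the "same way regardless of framework" point concrete, here is a minimal sketch of a request in the Open Inference Protocol (V2) shape. The model name `credit-scoring`, the tensor name `input-0`, and the feature values are hypothetical, and the sketch only builds the URL path and JSON body; it does not send anything to a cluster.

```python
import json


def build_infer_request(model_name, rows, tensor_name="input-0"):
    """Build the URL path and JSON body for a V2 (Open Inference Protocol) call.

    `rows` is a rectangular list of float feature rows. The tensor name is a
    placeholder and must match whatever the deployed model actually expects.
    """
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty rectangular list")
    body = {
        "inputs": [
            {
                "name": tensor_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                # Tensor data is sent flattened in row-major order here.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    return f"/v2/models/{model_name}/infer", body


path, body = build_infer_request(
    "credit-scoring", [[0.2, 1.5, 3.0], [0.9, 0.1, 2.2]]
)
print(path)
print(json.dumps(body, indent=2))
```

The same payload shape would apply whether the model behind the endpoint is served by Triton, scikit-learn, or another runtime that implements the protocol; only the tensor names, shapes, and datatypes change.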

Technical insight

KServe's autoscaling relies on Knative, which computes the number of replicas from concurrency or requests per second. It can scale down to zero replicas when traffic stops and then cold-start on the next request. An InferenceService splits a complete inference pipeline into a predictor, a transformer (pre- and post-processing), and an explainer component. Models are loaded from object storage through "storage initializers" that pull the artifacts into the pod at startup, which decouples model storage from the serving container image.
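The replica calculation can be illustrated with a deliberately simplified sketch. It assumes the autoscaler targets a fixed number of in-flight requests per replica and ignores the averaging windows and burst handling that a real Knative autoscaler applies; the function name and default bounds are hypothetical.

```python
import math


def desired_replicas(in_flight, target_per_replica, min_replicas=0, max_replicas=10):
    """Simplified concurrency-based scaling rule.

    Divide the observed in-flight requests by the per-replica target, round
    up, then clamp to [min_replicas, max_replicas]. With min_replicas=0 and
    no traffic, the result is zero replicas (scale-to-zero).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(in_flight / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


for load in (0, 4, 25, 1000):
    print(load, "in-flight requests ->", desired_replicas(load, target_per_replica=10))
```

The first request after a scale-down to zero pays the cold-start cost, because a pod has to be scheduled and the storage initializer has to pull the model again before the request can be served.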

Understanding KServe and Model Serving on Kubernetes

To build a deep understanding, treat KServe and Model Serving on Kubernetes as a practice rather than a single feature: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that run KServe and Model Serving on Kubernetes optimize architecture, data, and infrastructure choices together, with reliability and cost in mind. They write clear success criteria, measure them on real data and production workloads, and then iterate on observable failure modes rather than on one-off benchmark wins. This is where theoretical knowledge turns into durable capability across products, policies, and operations.

Architecture decisions shape benefits and operating costs for years afterwards. Over that time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust pattern combines fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.

Strategic implications

Architecture decisions shape benefits and operating costs for years afterwards.

Technical literacy helps teams choose what is best rather than settling for whatever is newest.

Better engineering choices reduce reliability problems in production.

In high-quality deployments, each of these points translates into measurable operating rules, clear ownership, and frequent review cycles, so that teams can build confidence as complexity grows.

The future of KServe and Model Serving on Kubernetes

KServe is moving quickly into generative AI, adding a dedicated LLM track with capabilities such as KV-cache-aware routing, model caching, and disaggregated prefill/decode for large language models. Expect deeper integration with inference engines such as vLLM, better multi-node support for models that are too large for a single GPU, and gateway-level routing for token-aware load balancing. As a CNCF incubating project, it is becoming the de facto open standard for serving models on Kubernetes, narrowing the gap between research artifacts and production-grade endpoints.

Real-world applications

A bank deploys a credit-scoring model by writing a 10-line InferenceService YAML that points to S3, with KServe handling autoscaling and ingress.
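A manifest of that size might look like the following sketch, built here as a Python dictionary and printed as JSON (which `kubectl apply -f` also accepts). The service name, bucket path, model format, and replica bounds are hypothetical, and the sketch assumes S3 credentials are already configured for the namespace.

```python
import json


def inference_service(name, model_format, storage_uri, min_replicas=1, max_replicas=3):
    """Return a minimal KServe v1beta1 InferenceService manifest as a dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
                "model": {
                    # Selects a matching serving runtime for the framework.
                    "modelFormat": {"name": model_format},
                    # Pulled into the pod by the storage initializer at startup.
                    "storageUri": storage_uri,
                },
            }
        },
    }


manifest = inference_service(
    name="credit-scoring",  # hypothetical service name
    model_format="xgboost",
    storage_uri="s3://example-bucket/models/credit-scoring",  # hypothetical path
)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code rather than hand-editing YAML is one possible design choice; it makes it easier to template many services, but a plain YAML file checked into version control works just as well.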

An e-commerce team uses KServe canary rollouts to send 10 percent of traffic to a new recommendation model, then ramps up to 100 percent once the metrics look healthy.
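One way to picture this rollout is as a pair of small helpers: one that prepares the updated manifest, and one that decides the next traffic step. The sketch below assumes the `canaryTrafficPercent` field on the predictor spec is used for the split; the service name, model paths, ramp steps, and the "drop to 0 when unhealthy" rule are illustrative assumptions, not a prescribed policy.

```python
import copy


def with_canary(isvc, new_storage_uri, percent):
    """Return a copy of the manifest pointing at a new model with a traffic split."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    updated = copy.deepcopy(isvc)  # leave the caller's manifest untouched
    predictor = updated["spec"]["predictor"]
    predictor["model"]["storageUri"] = new_storage_uri
    predictor["canaryTrafficPercent"] = percent
    return updated


def next_percent(current, metrics_healthy, steps=(10, 50, 100)):
    """Pick the next traffic share: advance one step if healthy, else drop to 0."""
    if not metrics_healthy:
        return 0
    for step in steps:
        if step > current:
            return step
    return 100


# Hypothetical starting manifest for a recommendation model.
base = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "recommender"},
    "spec": {"predictor": {"model": {
        "modelFormat": {"name": "sklearn"},
        "storageUri": "s3://example-bucket/models/recommender/v1",
    }}},
}

canary = with_canary(base, "s3://example-bucket/models/recommender/v2", 10)
print(canary["spec"]["predictor"]["canaryTrafficPercent"])
print(next_percent(10, metrics_healthy=True))
```

What counts as "healthy" is left to the caller on purpose: in practice it would come from the latency, error-rate, and business metrics the team agreed on before the rollout.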

A research lab serves dozens of rarely used models with scale-to-zero, so each model wakes up when a request arrives and consumes no GPU while idle.
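For a fleet like this, the manifests could be generated in a loop, each with `minReplicas` set to 0 so that the predictor may scale to zero when idle. The model names, bucket layout, model format, and the single-GPU limit below are hypothetical, and the sketch assumes the cluster runs KServe in a mode that supports scale-to-zero.

```python
def scale_to_zero_service(name, storage_uri, max_replicas=1):
    """Return an InferenceService manifest whose predictor may scale to zero."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": 0,  # allow zero replicas when there is no traffic
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": "pytorch"},
                    "storageUri": storage_uri,
                    # The GPU is only claimed while a replica is running.
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                },
            }
        },
    }


# Hypothetical experiment names; a real lab would read these from a registry.
model_names = [f"experiment-{i:02d}" for i in range(1, 25)]
manifests = [
    scale_to_zero_service(name, f"s3://example-bucket/models/{name}")
    for name in model_names
]
print(len(manifests), "manifests, first:", manifests[0]["metadata"]["name"])
```

The trade-off is latency on the first request after an idle period, which is usually acceptable for rarely used research models but may not be for user-facing endpoints.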

An MLOps team uses a KServe transformer component to resize and normalize images before the predictor runs a vision model served by Triton.
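The preprocessing step itself can be sketched in plain Python. This is an illustration of the logic only: it assumes a single-channel image given as nested lists of 0 to 255 pixel values, uses nearest-neighbor resizing, and picks a mean and standard deviation of 0.5 arbitrarily. In a real transformer this logic would typically live in the preprocessing hook of the serving code and use an image library rather than lists.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, mean=0.5, std=0.5):
    """Scale pixels from [0, 255] to [0, 1], then standardize with mean and std."""
    return [[(px / 255.0 - mean) / std for px in row] for row in image]


def preprocess(image, size=(4, 4)):
    """Resize then normalize, mirroring what a transformer would do per request."""
    return normalize(resize_nearest(image, *size))


tiny = [[0, 255], [255, 0]]  # hypothetical 2x2 input image
for row in preprocess(tiny):
    print(row)
```

Keeping this step in a transformer rather than in the client means every caller sends raw images and the model always receives tensors prepared the same way.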

Usage patterns

KServe and Model Serving on Kubernetes in practice

In each of the scenarios above (credit scoring at a bank, canary rollouts at an e-commerce team, scale-to-zero at a research lab, and image preprocessing at an MLOps team), teams tend to get better results when they set quality thresholds up front, keep a human escalation path for edge cases, and track operating cost and the cost of errors over time.

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become more opaque, security and observability problems can multiply.

Implementation roadmap

1. Define latency, quality, and cost targets before you deploy.

2. Benchmark on representative workloads with real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before you scale.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand the deployment.
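The evidence-gate idea can be sketched as a small check that compares measured values against the targets agreed in the first step. The metric names and threshold values below are hypothetical; a real gate would read them from the team's own monitoring and benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Targets:
    """Targets agreed before rollout (all values here are illustrative)."""
    max_p95_latency_ms: float
    min_accuracy: float
    max_cost_per_1k_requests: float


def gate(measured, targets):
    """Return the list of failed criteria; an empty list means the gate passes."""
    failures = []
    if measured["p95_latency_ms"] > targets.max_p95_latency_ms:
        failures.append("p95_latency_ms")
    if measured["accuracy"] < targets.min_accuracy:
        failures.append("accuracy")
    if measured["cost_per_1k_requests"] > targets.max_cost_per_1k_requests:
        failures.append("cost_per_1k_requests")
    return failures


targets = Targets(max_p95_latency_ms=200.0, min_accuracy=0.92, max_cost_per_1k_requests=0.50)
measured = {"p95_latency_ms": 240.0, "accuracy": 0.94, "cost_per_1k_requests": 0.35}

failed = gate(measured, targets)
print("expand rollout" if not failed else f"pause rollout, fix: {failed}")
```

Returning the failed criteria rather than a bare boolean makes it clear which gap has to be closed before the rollout continues.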
