Technical Guide

Disaggregated Prefill and Decode Serving

A serving architecture that splits language-model inference into two distinct phases, prefill and decode, and runs them on separate GPU pools.

Overview

Disaggregated serving splits inference into a prefill phase and a decode phase and gives each its own GPU pool. It matters because the two phases place different demands on hardware, and forcing them onto the same machine wastes capacity and hurts latency.

Disaggregated Prefill and Decode Serving is a technical building block that affects service quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

When an LLM responds, it works in two phases. Prefill reads the whole prompt at once and builds the key-value (KV) cache; this is a large, parallel, compute-bound burst that saturates the GPU's arithmetic units. Decode then generates tokens one at a time, and each step reads the entire KV cache, a pattern that is bound by memory bandwidth and light on compute. When the two run on the same GPUs, a long prefill stalls decoding for everyone else (head-of-line blocking), and batching the two phases together causes interference. Disaggregation places prefill on one GPU pool and decode on another, and transfers the KV cache between them over a fast interconnect such as NVLink or InfiniBand. Each pool is tuned and scaled independently, which improves goodput, smooths tail latency, and lets operators hit time-to-first-token and time-per-output-token targets at the same time.
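To get a feel for what the KV-cache handoff costs, the sketch below estimates the cache size for one request and the ideal time to move it over a link. The model dimensions and the two bandwidth figures are illustrative assumptions, not measurements of any particular model or interconnect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions, chosen only for illustration."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int  # 2 for fp16/bf16


def kv_cache_bytes(shape: ModelShape, num_tokens: int) -> int:
    """Bytes of KV cache for one sequence: one key and one value vector
    per token, per layer, per KV head."""
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * shape.bytes_per_value)
    return per_token * num_tokens


def transfer_seconds(num_bytes: int, link_bytes_per_s: float) -> float:
    """Ideal transfer time, ignoring link latency and protocol overhead."""
    return num_bytes / link_bytes_per_s


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2)
    size = kv_cache_bytes(shape, num_tokens=8192)
    # Bandwidths are assumed round numbers, not vendor specifications.
    links = {"slower link (assumed 25 GB/s)": 25e9,
             "faster link (assumed 400 GB/s)": 400e9}
    for name, bandwidth in links.items():
        ms = transfer_seconds(size, bandwidth) * 1e3
        print(f"{name}: {size / 2**30:.2f} GiB in {ms:.1f} ms")
```

The useful comparison is transfer time against the prefill time that precedes it: if the handoff is a small fraction of time to first token, the split costs little.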

Technical Insight

The two phases differ in their bottleneck. Prefill processes all prompt tokens in parallel, so its FLOPs scale with prompt length and it saturates the tensor cores. Decode is autoregressive: each new token needs one forward pass that re-reads the full KV cache from HBM, so throughput is gated by memory bandwidth, not compute. Disaggregation exploits this by choosing pool size, batching strategy, and even parallelism scheme separately for each pool, then shipping the KV cache from prefill workers to decode workers.
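A back-of-envelope roofline comparison makes the difference concrete. The sketch below counts only the weight matrix multiplications (roughly 2 FLOPs per parameter per token) and the bytes read for weights and KV cache. The parameter count, KV footprint, and hardware figures are assumed values for illustration, and attention FLOPs are deliberately ignored.

```python
def arithmetic_intensity(tokens_per_pass: int, n_params: float,
                         bytes_per_param: int, kv_bytes_read: float) -> float:
    """FLOPs per byte moved for one forward pass.

    Approximation: about 2 FLOPs per parameter per token processed, with the
    weights read from memory once per pass plus whatever KV cache is read.
    """
    flops = 2.0 * n_params * tokens_per_pass
    bytes_moved = n_params * bytes_per_param + kv_bytes_read
    return flops / bytes_moved


def bound_by(intensity: float, peak_flops: float, mem_bw: float) -> str:
    """Classify a pass against the hardware ridge point (peak FLOP/s / bandwidth)."""
    ridge = peak_flops / mem_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Every figure below is an assumption for illustration only.
    n_params = 7e9           # hypothetical 7B-parameter model
    peak_flops = 300e12      # assumed accelerator peak, FLOP/s
    mem_bw = 2e12            # assumed memory bandwidth, bytes/s
    kv_per_token = 131_072   # assumed KV bytes per cached token

    prefill = arithmetic_intensity(4096, n_params, 2, kv_bytes_read=0)
    decode = arithmetic_intensity(1, n_params, 2, kv_bytes_read=4096 * kv_per_token)
    print("prefill:", round(prefill), bound_by(prefill, peak_flops, mem_bw))
    print("decode: ", round(decode, 2), bound_by(decode, peak_flops, mem_bw))
```

Batching decode requests amortizes the weight reads across sequences but not the per-sequence KV reads, which is one reason a decode pool tends to be sized around memory bandwidth and capacity.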

Mastering Disaggregated Prefill and Decode Serving

To build a deep understanding, treat Disaggregated Prefill and Decode Serving as an operating model, not a single feature: define the desired outcomes, make the assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that run disaggregated prefill and decode serving optimize architecture, data, and infrastructure choices against reliability and cost. They write explicit success criteria, evaluate against realistic data and workflows, and iterate on observed failure patterns instead of one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The strongest approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating guardrails as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Architecture decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability incidents in production.

In high-quality deployments, each of these translates into measurable operating rules, ownership boundaries, and repeatable review practices, so that teams scale confidence instead of ambiguity.

The Future of Disaggregated Prefill and Decode Serving

Expect disaggregation to become the default in production stacks. Systems such as DistServe, Splitwise, and Mooncake popularized it, and vLLM and NVIDIA Dynamo now ship disaggregated modes. Research is pushing on faster KV-cache transfer, cache pooling and reuse across requests, dynamic rebalancing of the prefill/decode ratio under shifting traffic, and tighter integration with prefix caching and chunked prefill. As context windows grow toward millions of tokens, separating these phases becomes increasingly central to cost-efficient, low-latency serving.
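Dynamic rebalancing can be pictured as a small control loop that moves workers toward the pool under more pressure. The sketch below is a deliberately simple illustration with a hypothetical `rebalance` function and thresholds; it does not describe how vLLM, NVIDIA Dynamo, or any of the systems named above actually schedule.

```python
def rebalance(prefill_workers: int, decode_workers: int,
              prefill_backlog: float, decode_backlog: float,
              imbalance_ratio: float = 1.5,
              min_per_pool: int = 1) -> tuple[int, int]:
    """Shift one worker toward the pool with the heavier per-worker backlog.

    Backlogs can be any comparable pressure measure, for example queued
    seconds of work. A move happens only when one pool's per-worker pressure
    exceeds the other's by `imbalance_ratio`, which damps oscillation.
    """
    if prefill_workers < 1 or decode_workers < 1:
        raise ValueError("each pool needs at least one worker")
    prefill_pressure = prefill_backlog / prefill_workers
    decode_pressure = decode_backlog / decode_workers

    if (prefill_pressure > decode_pressure * imbalance_ratio
            and decode_workers > min_per_pool):
        return prefill_workers + 1, decode_workers - 1
    if (decode_pressure > prefill_pressure * imbalance_ratio
            and prefill_workers > min_per_pool):
        return prefill_workers - 1, decode_workers + 1
    return prefill_workers, decode_workers


if __name__ == "__main__":
    pools = (4, 4)
    # Hypothetical backlog samples: a burst of long prompts, then steady state.
    for prefill_backlog, decode_backlog in [(40.0, 4.0), (30.0, 6.0), (10.0, 10.0)]:
        pools = rebalance(*pools, prefill_backlog, decode_backlog)
        print(pools)
```

Moving a single worker per tick and requiring a clear imbalance are both conservative choices, since switching a worker's role is not free in a real deployment.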

Real-World Applications

A chat assistant routes a long document to a compute-heavy prefill cluster, then streams the response from a memory-optimized decode cluster to keep per-token latency smooth.

NVIDIA Dynamo and vLLM let operators deploy separate prefill and decode worker groups so that bursts of long prompts do not stall generations already in progress.

Mooncake (used by Moonshot AI's Kimi) separates prefill from decode and adds a distributed KV-cache pool to avoid unnecessary recomputation of prompts at scale.

A code-completion service provisions a small prefill pool for short prompts and a large decode pool, since most of its cost comes from streaming many output tokens.
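The pool split in these examples follows from simple arithmetic on the traffic mix. The sketch below sizes each pool from request rate, average prompt and output lengths, and per-GPU throughput. The `size_pools` function and every number in it are hypothetical; real per-GPU throughput has to be measured on the actual model and hardware.

```python
import math
from typing import NamedTuple


class PoolPlan(NamedTuple):
    prefill_gpus: int
    decode_gpus: int


def size_pools(requests_per_s: float, avg_prompt_tokens: float,
               avg_output_tokens: float, prefill_tokens_per_gpu_s: float,
               decode_tokens_per_gpu_s: float, headroom: float = 0.7) -> PoolPlan:
    """Estimate GPUs per pool so each runs at or below `headroom` utilization.

    Throughput figures are per GPU and are treated as inputs; they should
    come from benchmarks, not from this sketch.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    prefill_load = requests_per_s * avg_prompt_tokens   # prompt tokens/s to process
    decode_load = requests_per_s * avg_output_tokens    # output tokens/s to generate
    prefill = math.ceil(prefill_load / (prefill_tokens_per_gpu_s * headroom))
    decode = math.ceil(decode_load / (decode_tokens_per_gpu_s * headroom))
    return PoolPlan(max(1, prefill), max(1, decode))


if __name__ == "__main__":
    # Hypothetical workloads and throughputs, for illustration only.
    long_doc_chat = size_pools(20, 8000, 300,
                               prefill_tokens_per_gpu_s=20_000,
                               decode_tokens_per_gpu_s=2_000)
    short_prompt_service = size_pools(200, 200, 400,
                                      prefill_tokens_per_gpu_s=20_000,
                                      decode_tokens_per_gpu_s=2_000)
    print("long-document chat:  ", long_doc_chat)
    print("short-prompt service:", short_prompt_service)
```

With these made-up inputs the long-document workload leans toward prefill and the short-prompt workload leans heavily toward decode, which is the asymmetry a shared pool cannot express.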

Usage Patterns

Disaggregated Prefill and Decode Serving in practice

Across deployments like these, teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems become more complex.

Implementation Guide

1. Define latency, quality, and cost targets before rollout.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident procedures before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
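An evidence gate can be as simple as comparing measured percentiles with the targets set in step 1. The sketch below shows one way to express that check; the metric names, thresholds, sample values, and the `gate` function itself are hypothetical.

```python
from statistics import quantiles


def p95(samples: list[float]) -> float:
    """95th percentile, using the inclusive method over 100 quantile cuts."""
    return quantiles(samples, n=100, method="inclusive")[94]


def gate(measured: dict[str, list[float]], targets: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric.

    A metric passes when its p95 is at or below the target. A metric with a
    target but fewer than two samples fails, so missing evidence blocks rollout.
    """
    results = {}
    for name, limit in targets.items():
        samples = measured.get(name, [])
        results[name] = len(samples) >= 2 and p95(samples) <= limit
    return results


if __name__ == "__main__":
    # Hypothetical targets: time to first token and time per output token.
    targets = {"ttft_ms": 500.0, "tpot_ms": 50.0}
    measured = {
        "ttft_ms": [310.0, 280.0, 450.0, 390.0, 620.0, 300.0, 340.0, 410.0],
        "tpot_ms": [32.0, 35.0, 41.0, 38.0, 36.0, 33.0, 44.0, 39.0],
    }
    results = gate(measured, targets)
    if all(results.values()):
        print("criteria met: expand usage")
    else:
        print("pause rollout and close gaps:", results)
```

Failing closed on missing data is a design choice that matches the gate idea: a metric nobody measured should not count as a pass.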
