Overview
DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make it practical to train models with billions of parameters across thousands of GPUs. Without them, today's largest models would neither fit in memory nor finish training in a reasonable time.
Together they form a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.
Deep Dive
Training a large model on a single GPU is not possible because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which splits the individual matrix multiplications inside each layer across GPUs, and it also provides pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature contribution is ZeRO (Zero Redundancy Optimizer), which shards optimizer states, gradients, and parameters across all GPUs instead of replicating them, cutting per-GPU memory dramatically. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. Both also add mixed precision, activation checkpointing, and offloading to CPU or NVMe so that large models can train on limited hardware.
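To make the idea of tensor parallelism concrete, here is a minimal sketch in plain Python with no GPUs involved. It assumes a column-wise split of the weight matrix, which is one common way to shard a linear layer; the helper names (`split_columns`, `column_parallel_matmul`) are hypothetical and are not part of Megatron-LM's API. Each simulated device multiplies the full input by its own slice of the weights, and the partial outputs are concatenated, which in a real system would be an all-gather over the interconnect.

```python
# Toy illustration of column-wise tensor parallelism using lists of lists.
# Assumption: the weight matrix's columns divide evenly across devices.

def matmul(a, b):
    """Plain-Python matrix multiply for small matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split matrix w into `parts` equal column blocks, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "columns must divide evenly across devices"
    width = cols // parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(parts)]

def column_parallel_matmul(x, w, parts):
    """Compute x @ w by letting each device handle one column block of w."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]  # one result per device
    # Concatenate the partial outputs row by row (an all-gather in practice).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

X = [[1, 2], [3, 4]]
W = [[1, 0, 2, 1], [0, 1, 1, 3]]

full = matmul(X, W)
sharded = column_parallel_matmul(X, W, parts=2)
print(full == sharded)
```

The sharded result matches the unsharded one because each output column depends on only one column of the weight matrix, so no device needs weights held by another.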
Technical Insight
ZeRO has three stages of increasing memory savings: Stage 1 shards the optimizer states, Stage 2 also shards the gradients, and Stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combining ZeRO-style data parallelism with tensor parallelism (intra-layer) and pipeline parallelism (inter-layer) gives "3D parallelism." The key tension is communication overhead: every additional level of sharding adds GPU-to-GPU traffic, so engineers tune the partitioning so that the NVLink and InfiniBand links stay well utilized without becoming the bottleneck.
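The effect of the three stages can be estimated with simple arithmetic. The sketch below assumes mixed-precision training with Adam, where each parameter costs 2 bytes for fp16 weights, 2 bytes for fp16 gradients, and about 12 bytes of optimizer state (an fp32 weight copy plus momentum and variance). The function name `zero_memory_gb` is hypothetical, and the estimate covers model states only, not activations, temporary buffers, or fragmentation.

```python
def zero_memory_gb(params, gpus, stage, optimizer_bytes=12):
    """Approximate per-GPU model-state memory in GB (10**9 bytes).

    Assumes mixed-precision Adam: 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, `optimizer_bytes` per param for
    optimizer state. Stage 0 means plain data parallelism (no sharding).
    """
    weights = 2 * params
    grads = 2 * params
    optim = optimizer_bytes * params
    if stage >= 1:   # Stage 1: shard optimizer states
        optim /= gpus
    if stage >= 2:   # Stage 2: also shard gradients
        grads /= gpus
    if stage >= 3:   # Stage 3: also shard parameters
        weights /= gpus
    return (weights + grads + optim) / 1e9

# Illustrative numbers: a 7.5B-parameter model on 64 GPUs.
for stage in range(4):
    print(stage, round(zero_memory_gb(7.5e9, 64, stage), 1))
```

Under these assumptions, a 7.5B-parameter model on 64 GPUs drops from roughly 120 GB of model state per GPU with no sharding to about 31.4 GB, 16.6 GB, and 1.9 GB at Stages 1, 2, and 3 respectively. The savings come at the price of the extra communication described above.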
Mastering DeepSpeed and Megatron Training Stacks
To build a deep understanding, treat the DeepSpeed and Megatron training stacks as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.
In practice, strong teams that use these stacks optimize their architecture, data, and infrastructure choices against reliability and cost. They write explicit success criteria, evaluate against real data and real workflows, and iterate on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Architecture decisions drive performance and operating cost for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The most robust approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating guardrails as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Architecture decisions drive performance and operating cost for years.
Technical literacy helps teams choose the right stack, not just the newest one.
Better engineering choices reduce reliability incidents in production.
In high-quality deployments, each of these points translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.
Real-World Applications
Training the open multilingual BLOOM-176B model with the combined Megatron-DeepSpeed stack across hundreds of GPUs.
Microsoft and NVIDIA training the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism.
ZeRO-Offload letting researchers fine-tune multi-billion-parameter models on a single workstation GPU by offloading optimizer states to CPU RAM.
Using activation checkpointing in these stacks to fit long context windows by recomputing activations instead of storing all of them.
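As an illustration of the ZeRO-Offload case, the sketch below builds a DeepSpeed-style configuration as a Python dictionary and prints it as JSON. The keys follow DeepSpeed's documented configuration format as far as I know it, but the specific values (ZeRO stage 2, micro-batch size 1, 16 accumulation steps) and the helper name `zero_offload_config` are assumptions chosen for the example, not settings taken from any of the projects listed above.

```python
import json

def zero_offload_config(micro_batch_size=1, grad_accum_steps=16):
    """Build an illustrative DeepSpeed config dict that offloads optimizer state to CPU.

    The numeric defaults are placeholders; tune them for your hardware.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},            # mixed-precision training
        "zero_optimization": {
            "stage": 2,                       # shard optimizer states and gradients
            "offload_optimizer": {            # keep optimizer states in CPU RAM
                "device": "cpu",
                "pin_memory": True,
            },
        },
    }

config = zero_offload_config()
print(json.dumps(config, indent=2))
```

A dictionary like this is typically saved as a JSON file or passed through the `config` argument of `deepspeed.initialize`. Check the key names and supported values against the documentation for the DeepSpeed version you have installed before relying on it.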
Implementation Patterns
DeepSpeed and Megatron Training Stacks in action
In each of the applications listed above, teams typically get better results when they define quality thresholds upfront, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
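Activation checkpointing, mentioned among the applications above, trades extra computation for lower memory use. The sketch below uses a deliberately simplified cost model that is an assumption of this example: the network is cut into equal segments, one checkpoint is kept per segment, and only the segment currently being recomputed holds its full activations. Memory is counted in units of one layer's activations, and both function names are hypothetical.

```python
import math

def peak_activation_units(num_layers, segment_size):
    """Peak stored activations under the simplified model described above.

    One checkpoint is kept per segment, plus the activations of the single
    segment that is being recomputed during the backward pass.
    """
    num_checkpoints = math.ceil(num_layers / segment_size)
    return num_checkpoints + segment_size

def best_segment_size(num_layers):
    """Segment size that minimizes peak activation memory in this model."""
    return min(range(1, num_layers + 1),
               key=lambda k: peak_activation_units(num_layers, k))

layers = 64
k = best_segment_size(layers)
print(k, peak_activation_units(layers, k), layers)
```

For 64 layers this model picks segments of 8 layers and a peak of 16 units, compared with 64 units when every activation is stored. The best segment size lands near the square root of the layer count, and the cost is roughly one additional forward pass per training step.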
Risks & Guardrails
Optimizing for a single benchmark can hide broader system weaknesses.
Infrastructure and maintenance costs are often underestimated.
Security and observability gaps can grow as systems become more complex.
Implementation Guide
Define latency, quality, and cost targets before rollout.
Benchmark under realistic load and data conditions.
Instrument monitoring for errors, drift, and user impact.
Prepare rollback and incident procedures before scaling.
Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.