Technical Guide

DeepSpeed and Megatron Training Stacks

DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make training billion-parameter models across thousands of GPUs feasible in practice.

Overview

Without these stacks, today's frontier models would neither fit in memory nor finish training in a reasonable time.

The training stack is a technical foundation that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Training a large model on a single GPU is impossible because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which slices the individual matrix multiplications inside each layer across GPUs, and combines it with pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature contribution is ZeRO (Zero Redundancy Optimizer), which shards optimizer states, gradients, and parameters across GPUs instead of replicating them, sharply cutting per-GPU memory. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. They also add mixed precision, activation checkpointing, and offloading to CPU or NVMe so that large models can train on limited hardware.
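To make the idea of slicing a matrix multiplication concrete, here is a minimal pure-Python sketch of a column-parallel multiply. It does not use Megatron-LM or any GPU; the "devices" are simulated as list entries, and the function names (`split_columns`, `column_parallel_matmul`) are hypothetical names chosen for this illustration.

```python
# Toy column-parallel matrix multiply (no GPUs, no Megatron-LM involved).
# Each "device" is simulated by one entry in a Python list.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equal shards."""
    n = len(w[0])
    assert n % parts == 0, "column count must divide evenly across shards"
    step = n // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]

def column_parallel_matmul(x, w, parts):
    """Each simulated device multiplies x by its own column shard of w.
    The partial outputs are then concatenated along the column axis,
    which is the role an all-gather would play on real hardware."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]                      # activations (2 x 2)
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]  # weights (2 x 4)

full = matmul(x, w)                               # single-device reference
sharded = column_parallel_matmul(x, w, parts=2)   # two simulated devices
```

Each shard holds only half of the weight columns, yet the concatenated result matches the single-device product. On real hardware the concatenation step is where inter-GPU communication is paid for.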

Technical Insight

ZeRO has three stages of increasing memory savings: Stage 1 shards the optimizer states, Stage 2 also shards the gradients, and Stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combining ZeRO-powered data parallelism with tensor parallelism (intra-layer) and pipeline parallelism (inter-layer) yields "3D parallelism". The key tension is communication overhead: every additional form of sharding adds GPU-to-GPU traffic, so engineers tune the partitioning to make the most of NVLink and InfiniBand interconnects.
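The memory effect of the three stages can be estimated with simple arithmetic. The sketch below assumes mixed-precision Adam with the commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for fp32 optimizer states (master weights, momentum, variance). Those byte counts are an assumption of this example, not something stated in the text above, and activations and temporary buffers are ignored. The function name `zero_bytes_per_gpu` is hypothetical.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for ZeRO stage 0 (plain data
    parallelism) through stage 3. Activations and buffers are ignored."""
    PARAM_BYTES, GRAD_BYTES, OPT_BYTES = 2, 2, 12  # assumed mixed-precision Adam
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    # Each stage shards one more category of state across the GPUs.
    opt = OPT_BYTES / num_gpus if stage >= 1 else OPT_BYTES
    grad = GRAD_BYTES / num_gpus if stage >= 2 else GRAD_BYTES
    param = PARAM_BYTES / num_gpus if stage >= 3 else PARAM_BYTES
    return num_params * (param + grad + opt)

# Illustrative setup: a 7.5-billion-parameter model on 64 GPUs.
estimates_gb = {
    stage: zero_bytes_per_gpu(7.5e9, 64, stage) / 1e9 for stage in range(4)
}
```

Under these assumptions the per-GPU footprint drops from 120 GB without sharding to under 2 GB at Stage 3, which is why Stage 3 is paid for with extra gather traffic on every forward and backward pass.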

Mastering DeepSpeed and Megatron Training Stacks

To build a deep understanding, treat the training stack as an operating model rather than a single feature: define the desired outcomes, clarify the concepts, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using DeepSpeed and Megatron training stacks optimize their architecture, data, and infrastructure choices against reliability and cost. They record explicit success criteria, test with real data and workflows, and iterate on observed failure modes rather than chasing one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing for a single target can hide weaknesses in the wider system. The most resilient approach combines speed of experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating guardrails as model behavior, user expectations, and regulatory requirements evolve.

Strategic Implications

Architecture decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Good engineering choices reduce reliability incidents in production.

In high-quality deployments, each of these points translates into measurable operating rules, clear ownership boundaries, and recurring review routines, so that teams measure reliability rather than leave it ambiguous.

The Future of DeepSpeed and Megatron Training Stacks

Expect tighter integration with PyTorch's FSDP (Fully Sharded Data Parallel), which has absorbed many ZeRO ideas and is blurring the line between research stacks and core frameworks. Compiler-based approaches and automatic parallelism planners aim to eliminate manual tuning. As training clusters grow toward hundreds of thousands of accelerators, fault tolerance, elastic scaling, and overlapping communication with computation become the leading engineering frontiers, along with support for new hardware such as NVIDIA Blackwell and custom training chips.

Real-World Implementations

Training the open multilingual BLOOM-176B model with the Megatron-DeepSpeed combination across hundreds of GPUs.

Microsoft and NVIDIA training the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism.

ZeRO-Offload letting researchers fine-tune multi-billion-parameter models on a single-GPU workstation by offloading optimizer states to CPU RAM.

Using activation checkpointing in these stacks to fit long context windows by recomputing activations instead of storing them all.
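The last item, recomputing activations instead of storing them, can be illustrated with a toy scalar network. This is a pure-Python sketch under simplifying assumptions: every layer is `tanh(w * x)` on a single number, and the helper names (`forward_all`, `backward_from_inputs`, `checkpointed_grad`) are hypothetical. It does not call DeepSpeed or Megatron-LM, where the same idea is applied to full tensors.

```python
import math

def forward_all(x, weights):
    """Plain forward pass that keeps the input of every layer."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = math.tanh(w * x)
    return x, inputs

def backward_from_inputs(inputs, weights, grad_out):
    """Backpropagate through tanh(w * x) layers given each layer's input."""
    grad = grad_out
    for x, w in zip(reversed(inputs), reversed(weights)):
        y = math.tanh(w * x)
        grad = grad * (1.0 - y * y) * w  # derivative of tanh(w * x) w.r.t. x
    return grad

def checkpointed_grad(x, weights, segment):
    """Keep one saved input per segment, then recompute each segment's
    inner activations only when the backward pass reaches it."""
    checkpoints = []  # list of (start_index, saved_input)
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append((i, x))
        x = math.tanh(w * x)
    grad = 1.0
    for start, saved in reversed(checkpoints):
        seg_w = weights[start:start + segment]
        _, seg_inputs = forward_all(saved, seg_w)  # recomputation step
        grad = backward_from_inputs(seg_inputs, seg_w, grad)
    return grad, len(checkpoints)

weights = [0.9, -1.1] * 8  # 16 toy layers
_, all_inputs = forward_all(0.3, weights)
full_grad = backward_from_inputs(all_inputs, weights, 1.0)
ckpt_grad, stored = checkpointed_grad(0.3, weights, segment=4)
```

Both paths produce the same gradient. The plain version holds 16 layer inputs at once, while the checkpointed version holds 4 checkpoints plus at most 4 recomputed inputs for the segment being processed, at the cost of one extra forward pass per segment.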

Implementation Patterns

DeepSpeed and Megatron Training Stacks in practice

Across all four scenarios (BLOOM-176B training, Megatron-Turing NLG training, ZeRO-Offload fine-tuning, and activation checkpointing for long contexts), teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and the cost of errors over time.
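For the ZeRO-Offload scenario, a DeepSpeed run is typically driven by a JSON configuration. The sketch below builds one as a Python dictionary. The key names follow DeepSpeed's configuration schema as commonly documented, but they should be checked against the documentation for the installed version; the batch-size values are placeholders chosen for illustration, not recommendations.

```python
import json

# Illustrative DeepSpeed configuration for ZeRO-Offload-style fine-tuning.
# Numeric values are placeholders; tune them for the actual model and GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},            # mixed-precision training
    "zero_optimization": {
        "stage": 2,                        # shard optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",               # keep optimizer states in CPU RAM
            "pin_memory": True,
        },
    },
}

# Serialized form, ready to be saved as a JSON config file.
config_text = json.dumps(ds_config, indent=2)
```

Stage 2 with optimizer offload mirrors the description above, where optimizer states move to CPU RAM. Offloading trades GPU memory for host-to-device traffic, so throughput should be measured rather than assumed.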

Risks & Guardrails

Optimizing for a single target can hide weaknesses in the wider system.

Infrastructure cost and maintenance are often underestimated.

Security and observability gaps can grow as systems become more complex.

Implementation Roadmap

1. Define latency, quality, and cost targets before rollout.

2. Benchmark against realistic workloads and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before you scale.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
