Technical Guide

DeepSpeed and Megatron

DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make it feasible to train models with billions of parameters across thousands of GPUs.

Summary

Without DeepSpeed and Megatron-LM, today's frontier models could not fit in memory or finish training in a reasonable time.

The DeepSpeed and Megatron training stacks shape technical decisions that affect model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Training a large model on a single GPU is impractical because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which slices the matrices inside each layer across GPUs, and combines it with pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature feature is ZeRO (Zero Redundancy Optimizer), which partitions optimizer states, gradients, and parameters across GPUs instead of replicating them, cutting the memory each GPU needs. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. They also add mixed precision, activation checkpointing, and offloading to CPU or NVMe so that large models can train on limited hardware.
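The idea behind tensor parallelism can be sketched with plain Python lists. This is a toy illustration, not Megatron-LM code: the helper names (`split_columns`, `column_parallel_matmul`) are hypothetical, each column block stands in for the slice one GPU would hold, and the final concatenation stands in for an all-gather.

```python
# Toy column-wise tensor parallelism on lists of rows (no GPU, no framework).

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split weight matrix w into `parts` equal column blocks, one per 'GPU'."""
    n_cols = len(w[0])
    if n_cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = n_cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def column_parallel_matmul(x, w, parts):
    """Compute x @ w by multiplying x against each column shard separately."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Concatenating the column blocks row by row plays the role of an all-gather.
    return [[v for partial in partials for v in partial[r]] for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
sharded = column_parallel_matmul(x, w, parts=2)
full = matmul(x, w)
```

Each shard only needs its own slice of the weight matrix, which is why the per-GPU weight memory falls as the number of shards grows, at the price of a communication step to reassemble the output.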

Technical Insight

ZeRO has three stages of increasing memory savings: stage 1 shards the optimizer states, stage 2 also shards the gradients, and stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combining this sharded data parallelism with tensor parallelism (within layers) and pipeline parallelism (across layers) yields "3D parallelism." The central tension is communication overhead: every additional partition adds GPU-to-GPU traffic, so engineers tune the partitioning to keep the fast NVLink and InfiniBand interconnects saturated.
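A back-of-the-envelope estimate makes the three stages concrete. The sketch below assumes mixed-precision Adam, with 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state (fp32 master weights, momentum, and variance). That accounting is an assumption of this example, the function name is hypothetical, and activations, buffers, and fragmentation are ignored.

```python
def zero_bytes_per_gpu(num_params, dp_degree, stage, optimizer_bytes=12):
    """Estimate model-state bytes per GPU for a given ZeRO stage (0 = no sharding)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params                # fp16 weights
    grads = 2 * num_params                 # fp16 gradients
    optim = optimizer_bytes * num_params   # fp32 weights + Adam momentum + variance
    if stage >= 1:
        optim /= dp_degree                 # stage 1: shard optimizer states
    if stage >= 2:
        grads /= dp_degree                 # stage 2: also shard gradients
    if stage >= 3:
        params /= dp_degree                # stage 3: also shard parameters
    return params + grads + optim

# Illustrative inputs: a 7.5-billion-parameter model on 64 data-parallel GPUs.
estimates_gb = {
    stage: zero_bytes_per_gpu(7_500_000_000, 64, stage) / 1e9
    for stage in (0, 1, 2, 3)
}
```

Under these assumptions the model states drop from 120 GB per GPU without sharding to under 2 GB per GPU at stage 3, which is the memory saving that the extra communication buys.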

Understanding DeepSpeed and Megatron

To build a deep understanding, treat the DeepSpeed and Megatron training stacks as a class of work rather than a single capability: define the outcomes you want, state your assumptions, and then separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using the DeepSpeed and Megatron training stacks tune their architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them on real data and real workloads, and iterate based on observable failures rather than one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.

Architecture decisions carry benefits and operating costs for years afterwards. In that setting, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust approach combines speed of learning with governance discipline: run pilots, capture evidence, publish decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.

Strategic Implications

Architecture decisions carry benefits and operating costs for years afterwards.

Technical literacy helps teams choose what is best rather than settling for whatever is newest.

Better engineering choices reduce reliability problems in operation.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership, and frequent reviews, so that teams can grow their confidence rather than their confusion.

The Future of DeepSpeed and Megatron

Expect tighter integration with PyTorch's FSDP (Fully Sharded Data Parallel), which adopted many of ZeRO's ideas, blurring the line between research stacks and core frameworks. Compiler-driven approaches and automatic parallelism planning aim to eliminate manual tuning. As training clusters grow to hundreds of thousands of accelerators, fault tolerance, elastic scaling, and overlapping communication with computation become the most important engineering frontiers, alongside support for new hardware such as NVIDIA Blackwell and training chips.

Real-World Applications

The open, multilingual BLOOM-176B model was trained on a combined Megatron-DeepSpeed stack spanning hundreds of GPUs.

Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism.

ZeRO-Offload lets researchers train multi-billion-parameter models on a single workstation GPU by offloading optimizer states to CPU RAM.

Activation checkpointing in these stacks makes longer context windows fit by recomputing activations instead of storing them all.

Use Cases

DeepSpeed and Megatron Training in Practice

Training the open, multilingual BLOOM-176B model using the Megatron-DeepSpeed stack across hundreds of GPUs.

Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism. Teams tend to get better results when they set sound limits up front, keep a human escalation path for major failures, and track total cost.
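One way to picture 3D parallelism is as a grid in which every GPU has a tensor-parallel, a pipeline-parallel, and a data-parallel coordinate. The sketch below uses a hypothetical layout in which the tensor-parallel index varies fastest. That ordering, the function name, and the tiny group sizes are assumptions for illustration, not the rank assignment any particular framework uses.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (tp_rank, pp_rank, dp_rank) on a tp x pp x dp grid."""
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError("rank outside the grid")
    tp_rank = rank % tp                # position inside a layer's tensor shards
    pp_rank = (rank // tp) % pp        # pipeline stage (which block of layers)
    dp_rank = rank // (tp * pp)        # which model replica
    return tp_rank, pp_rank, dp_rank

# Illustrative 2 x 2 x 2 grid: 8 GPUs in total.
TP, PP, DP = 2, 2, 2
grid = {rank: rank_to_coords(rank, TP, PP, DP) for rank in range(TP * PP * DP)}

# Ranks that share a pipeline stage and replica form one tensor-parallel group.
tp_group_of_rank0 = [r for r, (_, p, d) in grid.items() if (p, d) == grid[0][1:]]
```

Putting the tensor-parallel index on adjacent ranks reflects a common design preference: tensor parallelism communicates most often, so its group is usually kept on the fastest links, for example inside one node.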

ZeRO-Offload lets researchers train multi-billion-parameter models on a single workstation GPU by offloading optimizer states to CPU RAM. Teams tend to get better results when they set sound thresholds up front, keep a human escalation path for incidents, and track product outcomes and errors.
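A minimal sketch of what such a setup could look like as a DeepSpeed configuration, written here as a Python dictionary. The batch size and accumulation values are placeholders chosen for illustration, and a real run would need tuning for the model and hardware at hand.

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 2 with optimizer states kept in CPU RAM.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder value
    "gradient_accumulation_steps": 8,      # placeholder value
    "fp16": {"enabled": True},             # mixed-precision training
    "zero_optimization": {
        "stage": 2,                        # shard optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",               # keep optimizer states in host memory
            "pin_memory": True,            # pinned buffers for faster transfers
        },
    },
}

# The same settings serialized as JSON, the format DeepSpeed config files use.
ds_config_json = json.dumps(ds_config, indent=2)
```

The dictionary would typically be passed through the `config` argument of `deepspeed.initialize`, or saved as a JSON file and referenced from the launch command.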

Use activation checkpointing in these stacks to fit longer context windows by recomputing activations instead of storing them all. Teams tend to get better results when they set sound thresholds up front, keep a human escalation path for incidents, and track product value and the cost of errors over time.
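The trade-off behind activation checkpointing can be shown with a toy chain of scalar "layers". This is not how DeepSpeed or Megatron implement it: the layer function, the segment size, and the way stored values are counted are all assumptions made for illustration. The point is only that recomputing each segment during the backward pass gives the same gradient while holding fewer values at once.

```python
import math

def layer_forward(x):
    """An arbitrary smooth scalar layer."""
    return math.tanh(x) + 0.5 * x

def layer_grad(x):
    """Derivative of layer_forward with respect to its input."""
    return 1.0 - math.tanh(x) ** 2 + 0.5

def grad_store_all(x0, n_layers):
    """Backpropagate while keeping every layer input from the forward pass."""
    acts = [x0]
    for _ in range(n_layers):
        acts.append(layer_forward(acts[-1]))
    grad = 1.0
    for i in reversed(range(n_layers)):
        grad *= layer_grad(acts[i])          # chain rule, last layer first
    return grad, len(acts)                   # gradient, number of stored values

def grad_checkpointed(x0, n_layers, segment):
    """Keep only each segment's input, then recompute inside the segment."""
    checkpoints = {}
    x = x0
    for i in range(n_layers):
        if i % segment == 0:
            checkpoints[i] = x               # save the input of this segment
        x = layer_forward(x)
    grad, peak = 1.0, 0
    for start in reversed(range(0, n_layers, segment)):
        end = min(start + segment, n_layers)
        acts = [checkpoints[start]]
        for _ in range(start, end - 1):
            acts.append(layer_forward(acts[-1]))   # recompute layer inputs
        peak = max(peak, len(checkpoints) + len(acts))
        for a in reversed(acts):
            grad *= layer_grad(a)
    return grad, peak

full_grad, full_stored = grad_store_all(0.3, n_layers=16)
ckpt_grad, ckpt_peak = grad_checkpointed(0.3, n_layers=16, segment=4)
```

The saving costs an extra forward pass through each segment, which is the usual compute-for-memory trade that makes longer sequences fit.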

Risks and Guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, security and monitoring problems can multiply.

Implementation Roadmap

1. Baseline latency, quality, and cost before adoption.

2. Benchmark on representative workloads and real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
