Summary
DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make it possible to train models with billions of parameters across thousands of GPUs. Without them, today's frontier models could not fit in memory or finish training in a reasonable amount of time.
The DeepSpeed and Megatron training stacks affect model quality, infrastructure cost, latency, and reliability at scale.
Deep dive
Training a large model on a single GPU is hard because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which splits each matrix inside a layer across GPUs, and combines it with pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature feature is ZeRO (Zero Redundancy Optimizer), which partitions the optimizer states, gradients, and parameters across GPUs instead of replicating them, cutting the memory needed on each GPU. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. They also add mixed precision, activation checkpointing, and offloading to CPU or NVMe so that large models can be trained on limited hardware.
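To make the tensor-parallel idea concrete, here is a minimal sketch in plain Python (no GPU, no Megatron-LM code). It assumes a column-wise split of one weight matrix across two simulated devices; the helper names `split_columns` and `tensor_parallel_matmul` are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards (one per simulated GPU)."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across shards"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def tensor_parallel_matmul(x, w, parts):
    """Compute x @ w by giving each simulated GPU one column shard of w."""
    shards = split_columns(w, parts)
    # Each shard's product would run on its own device in a real system.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: concatenate the partial outputs column-wise.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]                  # activations (batch of 2)
w = [[1, 0, 2, 1], [0, 1, 1, 3]]      # one layer's weight matrix
sharded = tensor_parallel_matmul(x, w, parts=2)
reference = matmul(x, w)
```

Each simulated device holds only half of `w`, yet the gathered result matches the unsharded product. The concatenation step is where real systems pay communication cost.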
Technical insight
ZeRO has three stages of increasing memory savings: stage 1 shards the optimizer states, stage 2 also shards the gradients, and stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combined with tensor parallelism (within a layer) and pipeline parallelism (across layers), this forms "3D parallelism." The key tension is communication overhead: every additional partition increases GPU-to-GPU traffic, so engineers tune the partitioning to keep fast NVLink and InfiniBand interconnects well utilized rather than letting them become the bottleneck.
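The effect of the three stages on per-GPU memory can be estimated with a little arithmetic. The sketch below assumes mixed-precision Adam accounting (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state) and ignores activations and buffers. The function name `zero_memory_gb` is hypothetical, and this is a rough model, not a measurement.

```python
def zero_memory_gb(num_params, num_gpus, stage):
    """Estimate per-GPU model-state memory in GB for ZeRO stage 0 to 3.

    Assumption: mixed-precision Adam, i.e. 2 + 2 + 12 bytes per parameter.
    Activations, temporary buffers, and fragmentation are not counted.
    """
    params = 2 * num_params    # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 weights copy, momentum, variance
    if stage >= 1:
        optim /= num_gpus      # stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus      # stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus     # stage 3: also shard parameters
    return (params + grads + optim) / 1e9


# Hypothetical example: a 7.5B-parameter model on 64 GPUs.
estimates = {stage: zero_memory_gb(7.5e9, 64, stage) for stage in range(4)}
for stage, gb in estimates.items():
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions the estimate falls from 120 GB with no sharding to roughly 31.4 GB, 16.6 GB, and 1.9 GB for stages 1, 2, and 3.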
Understanding DeepSpeed and Megatron
To build a deep understanding, treat the DeepSpeed and Megatron training stacks as a discipline rather than a single capability: state the outcomes you want, make your assumptions explicit, and then separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams using the DeepSpeed and Megatron training stacks optimize their architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them against real data and real workloads, and iterate with observability in place rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.
Architecture decisions determine benefits and operating costs for years afterward. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The strongest practice combines fast learning with governance discipline: run pilots, capture evidence, publish decisions, and keep updating safeguards as operations, user expectations, and regulations evolve.
Strategic implications
Architecture decisions determine benefits and operating costs for years afterward.
Technical literacy helps teams choose what is best rather than settling for whatever is newest.
Better engineering choices reduce reliability problems in operation.
In high-quality deployments, these points translate into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams build confidence as they scale instead of adding complexity.
Real-world applications
The open, multilingual BLOOM-176B model was trained with the combined Megatron-DeepSpeed stack across hundreds of GPUs.
Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism.
ZeRO-Offload lets researchers train multi-billion-parameter models on a single workstation GPU by offloading optimizer states to CPU RAM.
Use activation checkpointing in these stacks to fit longer context windows by recomputing activations instead of storing all of them.
Usage patterns
DeepSpeed and Megatron training in practice
Training the open, multilingual BLOOM-176B model with the Megatron-DeepSpeed stack on hundreds of GPUs.
Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism. Teams usually get better results when they set sensible thresholds up front, keep a human escalation path for major errors, and track total cost.
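One way to picture 3D parallelism is as a grid that assigns every GPU rank a data-parallel, pipeline-parallel, and tensor-parallel coordinate. The sketch below assumes a layout with tensor parallelism innermost, so that tensor-parallel peers are neighboring ranks. That ordering and the function name `rank_to_coords` are illustrative assumptions, not the exact layout used by Megatron-LM or DeepSpeed.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (dp_idx, pp_idx, tp_idx) in a tp x pp x dp grid.

    Assumed layout: tensor-parallel index varies fastest, then pipeline
    stage, then data-parallel replica. Real frameworks may order differently.
    """
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of size {world_size}")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


# Hypothetical 8-GPU job: 2-way tensor, 2-way pipeline, 2-way data parallel.
grid = [rank_to_coords(r, tp=2, pp=2, dp=2) for r in range(8)]
```

Keeping tensor-parallel peers on adjacent ranks is a common motivation for this kind of layout, because tensor parallelism communicates most often and benefits from the fastest links.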
ZeRO-Offload lets researchers train multi-billion-parameter models on a single workstation GPU by offloading optimizer states to CPU RAM. Teams usually get better results when they set sensible thresholds up front, keep a human escalation path for incidents, and track product outcomes and errors.
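As an illustration of what offloading looks like in configuration, the sketch below builds a DeepSpeed-style config dictionary with ZeRO stage 2 and the optimizer state placed on the CPU. The batch-size values and the output filename are hypothetical, and the options available should be checked against the DeepSpeed version in use.

```python
import json

# Hypothetical values; tune batch sizes for the actual model and GPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Keep optimizer states in CPU RAM instead of GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize so the same settings can be saved as a JSON config file.
config_text = json.dumps(ds_config, indent=2)
with open("ds_config_offload.json", "w") as handle:  # hypothetical filename
    handle.write(config_text)
```

A dictionary or file like this is typically passed to `deepspeed.initialize` through its `config` argument. The trade-off is extra CPU-GPU transfer time in exchange for GPU memory headroom.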
Use activation checkpointing in these stacks to fit longer context windows by recomputing activations instead of storing all of them. Teams usually get better results when they set sensible thresholds up front, keep a human escalation path for incidents, and track product benefit and the cost of errors over time.
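The recompute-instead-of-store trade can be shown with a toy example. The sketch below uses a chain of scalar `tanh` layers in plain Python rather than a real framework, and all function names are hypothetical. It keeps only one activation per segment during the forward pass and recomputes the rest during the backward pass.

```python
import math


def layer(w, x):
    return math.tanh(w * x)


def layer_grad(w, x):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_full(weights, x0):
    """Store every activation, then backpropagate. Returns (grad, stored count)."""
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    g = 1.0
    for w, x in zip(reversed(weights), reversed(acts[:-1])):
        g *= layer_grad(w, x)
    return g, len(acts)


def grad_checkpointed(weights, x0, segment):
    """Store only each segment's input; recompute the rest in the backward pass."""
    checkpoints, x = [], x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)          # keep the segment input only
        x = layer(w, x)
    g, peak = 1.0, 0
    for s in reversed(range(len(checkpoints))):
        seg_w = weights[s * segment:(s + 1) * segment]
        acts = [checkpoints[s]]
        for w in seg_w[:-1]:               # recompute inputs inside the segment
            acts.append(layer(w, acts[-1]))
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for w, a in zip(reversed(seg_w), reversed(acts)):
            g *= layer_grad(w, a)
    return g, peak


weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_peak = grad_checkpointed(weights, 0.3, segment=4)
```

Both paths yield the same gradient, but the checkpointed version holds far fewer activations at its peak, at the price of one extra forward computation per segment.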
Risks and guardrails
Optimizing for a single benchmark can hide broader weaknesses in the system.
Infrastructure and maintenance costs are often underestimated.
As systems become harder to understand, security and monitoring problems can multiply.
Implementation roadmap
Define latency, quality, and cost targets before you deploy.
Benchmark with realistic workloads and real data.
Instrument monitoring for errors, drift, and user impact.
Prepare rollback and incident-response paths before you scale.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then widen the deployment.
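An evidence gate of this kind can be expressed as a small check that compares measured metrics with thresholds agreed in advance. The sketch below is one possible shape. The metric names, the limits, and the function `gate_passes` are all hypothetical.

```python
def gate_passes(measured, limits):
    """Check measured metrics against limits.

    `limits` maps a metric name to ("max", bound) or ("min", bound).
    Returns (ok, failures) so a failed gate explains what to fix.
    """
    failures = []
    for name, (kind, bound) in limits.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
        elif kind == "max" and value > bound:
            failures.append(f"{name}: {value} exceeds {bound}")
        elif kind == "min" and value < bound:
            failures.append(f"{name}: {value} below {bound}")
    return not failures, failures


# Hypothetical thresholds set before rollout.
limits = {
    "step_time_s": ("max", 2.0),
    "eval_accuracy": ("min", 0.80),
    "cost_per_1k_steps_usd": ("max", 50.0),
}
measured = {"step_time_s": 1.7, "eval_accuracy": 0.76, "cost_per_1k_steps_usd": 41.0}
ok, failures = gate_passes(measured, limits)
```

In this example the gate fails on accuracy alone, which is the signal to pause and close that gap before widening the deployment.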