Technical guide

Tensor parallelism for large models

A way to split the math inside a single neural network layer across multiple GPUs, so that a model too large for one device can still run.

Summary

Tensor parallelism matters because frontier models have hundreds of billions of parameters, more than a single GPU can hold or compute quickly on its own.

It is a building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Tensor parallelism (also called intra-layer model parallelism) splits weight matrices across GPUs instead of placing whole layers on different devices. Inside a transformer, the large matrix multiplications (the attention projections and the feed-forward MLP) are partitioned. For example, the first MLP weight matrix is split by column and the second by row, so each GPU computes one slice and a single all-reduce combines the results. Attention is split across heads, with each GPU handling a subset. Because every GPU takes part in every layer at the same time, tensor parallelism reduces per-GPU memory and speeds up computation, but it requires frequent, high-bandwidth communication between GPUs at every layer. That is why it is usually confined to a single NVLink-connected node and then combined with pipeline and data parallelism for very large-scale training and serving.
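The column-then-row split described above can be checked on a toy example. The following is a minimal sketch in plain Python, not a real multi-GPU implementation: the "ranks" are simulated in a loop, `all_reduce_sum` is a stand-in for a real collective, and every function name here is illustrative rather than taken from any library.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gelu(m):
    """Elementwise GeLU (tanh approximation)."""
    c = math.sqrt(2.0 / math.pi)
    return [[0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3))) for v in row]
            for row in m]


def split_columns(m, parts):
    """Split a matrix into `parts` equal column blocks."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]


def split_rows(m, parts):
    """Split a matrix into `parts` equal row blocks."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]


def all_reduce_sum(partials):
    """Stand-in for an all-reduce: elementwise sum of every rank's partial output."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


def mlp_reference(x, w1, w2):
    """Unsharded MLP: GeLU(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)


def mlp_tensor_parallel(x, w1, w2, world_size):
    """Same MLP with w1 split by column and w2 split by row across simulated ranks."""
    w1_shards = split_columns(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    partials = []
    for w1_local, w2_local in zip(w1_shards, w2_shards):
        hidden_local = gelu(matmul(x, w1_local))  # nonlinearity applied locally, no sync
        partials.append(matmul(hidden_local, w2_local))
    return all_reduce_sum(partials)  # the single combine step per MLP block


def toy_matrix(rows, cols, seed):
    """Deterministic small values, used only for this demo."""
    return [[((i * 7 + j * 3 + seed) % 11 - 5) / 10.0 for j in range(cols)]
            for i in range(rows)]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

Calling `mlp_tensor_parallel(x, w1, w2, world_size)` on small inputs should agree with `mlp_reference(x, w1, w2)` up to floating-point rounding, since the only difference is the order in which partial products are summed.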

Technical insight

The trick, popularized by Megatron-LM, is to choose the partition dimensions so that communication is minimized. Splitting the first MLP matrix by column lets each GPU apply the nonlinearity locally with no synchronization; splitting the second by row means the outputs need one all-reduce to sum the partial results. Each transformer layer therefore needs roughly two all-reduces in the forward pass and two in the backward pass. Because these collectives happen at every step, latency dominates, so tensor parallelism stays behind fast intra-node interconnects such as NVLink rather than slower networks between nodes.
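To get a feel for how much traffic those collectives generate, here is a rough back-of-the-envelope sketch. It assumes a ring all-reduce (each GPU sends about 2(p-1)/p times the message size), an activation tensor of shape batch x sequence x hidden in 2-byte precision, and the two-forward, two-backward count stated above. The function names and the example sizes are illustrative assumptions, not measurements.

```python
def allreduce_bytes_per_gpu(message_bytes, tp_size):
    """Bytes each GPU sends in one ring all-reduce (assumed algorithm)."""
    if tp_size <= 1:
        return 0.0  # a single GPU has nothing to exchange
    return 2 * (tp_size - 1) / tp_size * message_bytes


def tp_comm_per_step(num_layers, batch, seq_len, hidden, tp_size, bytes_per_elem=2):
    """Estimate all-reduce count and per-GPU traffic for one training step."""
    activation_bytes = batch * seq_len * hidden * bytes_per_elem
    allreduces = num_layers * 4  # 2 in the forward pass + 2 in the backward pass
    traffic = allreduces * allreduce_bytes_per_gpu(activation_bytes, tp_size)
    return {"allreduces": allreduces, "bytes_per_gpu": traffic}


if __name__ == "__main__":
    # Hypothetical sizes chosen only to illustrate the arithmetic.
    estimate = tp_comm_per_step(num_layers=96, batch=1, seq_len=2048,
                                hidden=12288, tp_size=8)
    print(estimate["allreduces"], "all-reduces,",
          round(estimate["bytes_per_gpu"] / 1e9, 1), "GB sent per GPU")
```

The estimate ignores latency per collective and any overlap with compute, which in practice matter at least as much as raw volume.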

Understanding tensor parallelism for large models

To build a deeper understanding, treat tensor parallelism for large models as a system capability rather than a single feature: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use tensor parallelism for large models tune their architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them against real data and real workloads, and iterate in observable steps rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into lasting capability in products, policy, and operations.

Architecture decisions shape capability and operating cost for years afterward. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The most durable approach combines fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.

Strategic impact

Architecture decisions shape capability and operating cost for years afterward.

In high-quality deployments, this translates into measurable operating rules, clear ownership boundaries, and frequent review checkpoints, so teams can build confidence as complexity grows.

Technical literacy helps teams choose what is best, not just what is newest.

Better engineering choices reduce reliability problems in operation.

The future of tensor parallelism for large models

Tensor parallelism remains essential, but it is increasingly combined with "3D parallelism" (tensor + pipeline + data) and with expert parallelism for mixture-of-experts models. Frameworks such as Megatron-LM, DeepSpeed, and vLLM automate the sharding. As GPU interconnects (NVLink, NVSwitch) and optical fabrics get faster, the node-boundary limit loosens, which allows larger tensor-parallel groups. Expect smarter automatic parallelization that picks shard dimensions and group sizes to minimize communication for a given cluster topology.

Real-world applications

Training a 175B-parameter model by sharding each layer's weight matrices across 8 GPUs in a single NVLink-connected node using Megatron-LM.

Serving a 70B-parameter chat model in vLLM with tensor_parallel_size = 4 so that the weights fit across four GPUs and responses arrive in real time.

Splitting a transformer's attention heads across GPUs so that each device computes a subset, then combining the outputs for the next layer.

Combining tensor parallelism within nodes with pipeline parallelism across nodes to train trillion-parameter models on large GPU clusters.

Usage patterns

Tensor parallelism for large models in practice

Training a 175B-parameter model by sharding each layer's weight matrices across 8 GPUs in a single NVLink-connected node with Megatron-LM. Teams tend to get better results when they set sound thresholds up front, keep a human escalation path for high-impact cases, and track results over time.
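As a rough illustration of why sharding is needed at this scale, the sketch below estimates weight memory per GPU. It assumes 2 bytes per parameter and that every weight is split evenly across the tensor-parallel group; it deliberately ignores gradients, optimizer state, and activations, which add substantially to training memory. The function name is hypothetical.

```python
def weight_gb_per_gpu(num_params, bytes_per_param, tp_size):
    """Weight memory per GPU in GB (1e9 bytes), assuming an even split."""
    if tp_size < 1:
        raise ValueError("tp_size must be at least 1")
    return num_params * bytes_per_param / tp_size / 1e9


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        # 175e9 parameters at 2 bytes each, split across `tp` GPUs.
        print(tp, "GPU(s):", weight_gb_per_gpu(175e9, 2, tp), "GB of weights each")
```

Under these assumptions, the weights alone take about 350 GB on one device and about 43.75 GB per GPU when split eight ways.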

Serving a 70B-parameter chat model in vLLM with tensor_parallel_size = 4 so that the weights fit across four GPUs and responses arrive in real time. Teams tend to get better results when they define quality up front, keep a human escalation path for exceptions, and track early gains.
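A minimal sketch of what this setup might look like with vLLM's Python API follows. It uses the `LLM` class with its `tensor_parallel_size` argument and `SamplingParams`; the model name is left to the caller, the helper names are hypothetical, and the sampling values are arbitrary. The divisibility check reflects the assumption that attention heads must split evenly across tensor-parallel ranks. The vLLM imports are done lazily so the helpers can be loaded on a machine without vLLM or GPUs.

```python
def check_tp_size(num_attention_heads, tensor_parallel_size):
    """Return True if the heads divide evenly across tensor-parallel ranks."""
    return (tensor_parallel_size > 0
            and num_attention_heads % tensor_parallel_size == 0)


def build_engine(model_name, tensor_parallel_size=4):
    """Create a vLLM engine sharded across `tensor_parallel_size` GPUs."""
    from vllm import LLM  # lazy import: requires vLLM and enough GPUs at call time
    return LLM(model=model_name, tensor_parallel_size=tensor_parallel_size)


def generate(engine, prompts, max_tokens=64):
    """Run a batch of prompts and return the first completion for each."""
    from vllm import SamplingParams
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    return [result.outputs[0].text for result in engine.generate(prompts, params)]
```

On a node with four GPUs, one would call `build_engine(model_name)` with the identifier of the model being served and then pass the engine to `generate`.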

Splitting a transformer's attention heads across GPUs so that each device computes a subset, then combining the outputs for the next layer. Teams tend to get better results when they set sound thresholds up front, keep a human escalation path for exceptions, and track product outcomes and errors.
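The head split can be sketched as follows. This is an illustrative plain-Python simulation, not a real implementation: it takes per-head outputs as given, assigns contiguous blocks of heads to each simulated rank, and shows that summing each rank's partial output projection matches projecting the concatenated heads in one go. All names are hypothetical.

```python
def heads_for_rank(num_heads, tp_size, rank):
    """Contiguous block of head indices owned by `rank`."""
    if num_heads % tp_size != 0:
        raise ValueError("num_heads must be divisible by tp_size")
    per_rank = num_heads // tp_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def vec_mat(v, m):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(x * row[j] for x, row in zip(v, m)) for j in range(len(m[0]))]


def output_projection_full(head_outputs, w_o):
    """Concatenate every head's output, then apply the output projection."""
    concat = [x for head in head_outputs for x in head]
    return vec_mat(concat, w_o)


def output_projection_sharded(head_outputs, w_o, tp_size):
    """Each simulated rank projects only its own heads; partials are then summed."""
    num_heads = len(head_outputs)
    head_dim = len(head_outputs[0])
    total = [0.0] * len(w_o[0])
    for rank in range(tp_size):
        heads = heads_for_rank(num_heads, tp_size, rank)
        local_vec = [x for h in heads for x in head_outputs[h]]
        # Rows of w_o that correspond to this rank's heads (row-wise split).
        local_rows = w_o[heads.start * head_dim:heads.stop * head_dim]
        partial = vec_mat(local_vec, local_rows)
        total = [t + p for t, p in zip(total, partial)]  # stands in for the all-reduce
    return total
```

With four heads and `tp_size=2`, rank 0 owns heads 0 and 1 and rank 1 owns heads 2 and 3; the sharded and full projections should give the same vector.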

Combining tensor parallelism within nodes with pipeline parallelism across nodes to train trillion-parameter models on large GPU clusters.
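One way to picture this layout is to map each global GPU rank to a (tensor, pipeline, data) coordinate, with the tensor dimension varying fastest so that each tensor-parallel group lands on consecutive ranks of one node. The sketch below assumes that ordering and assumes ranks are numbered node by node; it is an illustrative layout, not necessarily the one any particular framework uses, and the function names are hypothetical.

```python
def rank_to_coords(rank, tp_size, pp_size, dp_size):
    """Map a global rank to its tensor, pipeline, and data-parallel indices."""
    world_size = tp_size * pp_size * dp_size
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return {
        "tp": rank % tp_size,               # fastest-varying: neighbours on one node
        "pp": (rank // tp_size) % pp_size,  # pipeline stage
        "dp": rank // (tp_size * pp_size),  # data-parallel replica
    }


def tp_group(rank, tp_size):
    """Global ranks that share this rank's tensor-parallel group."""
    first = rank - rank % tp_size
    return list(range(first, first + tp_size))


def node_of(rank, gpus_per_node):
    """Node index, assuming ranks are assigned to nodes in consecutive blocks."""
    return rank // gpus_per_node


if __name__ == "__main__":
    # Hypothetical cluster: 8 GPUs per node, tp=8, pp=4, dp=2 (64 GPUs in total).
    print(rank_to_coords(13, tp_size=8, pp_size=4, dp_size=2))
    print(tp_group(13, tp_size=8), "all on node", node_of(13, gpus_per_node=8))
```

With the tensor-parallel size equal to the number of GPUs per node, every tensor-parallel group stays inside one node, and only pipeline and data-parallel traffic crosses the slower inter-node network.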

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, safety and oversight problems can multiply.

Implementation roadmap

1. Baseline latency, quality, and cost before you deploy.

2. Benchmark under realistic workloads and real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then widen usage.
