Summary
Sequence parallelism splits one long sequence across multiple GPUs along the token (time) dimension, and Ring Attention lets those GPUs compute attention by passing key/value blocks around a ring. Together they make context windows of millions of tokens feasible without any single GPU holding the entire sequence.
Sequence parallelism and ring attention are a technical building block with consequences for model quality, infrastructure cost, latency, and reliability at scale.
Deep dive
Standard attention needs every query to see every key/value, so activation memory grows with sequence length and the full K/V must be available to every query. Sequence parallelism shards the sequence so that each GPU owns a contiguous chunk of tokens (and their queries, keys, and values). Ring Attention arranges the GPUs in a logical ring: each device keeps its queries local, while the K/V blocks travel hop by hop around the ring. When a block arrives, the GPU computes one partial piece of attention and accumulates the result using online softmax (a running max/sum trick, as in FlashAttention). After one full loop, every query has attended to every key exactly once, and no GPU has ever stored the full K/V. Most importantly, K/V communication overlaps with compute, so it adds little to wall-clock time.
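The loop described above can be sketched as a single-process simulation. This is a minimal illustration, not a distributed implementation: the function names `full_attention` and `ring_attention` are made up for this example, attention is assumed to be bidirectional (no causal mask), and the "devices" are processed one after another, so the overlap of communication and compute is not modelled.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def full_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V with all keys visible at once."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [dot(q, k) / math.sqrt(d) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / z
                    for j in range(len(V[0]))])
    return out

def ring_attention(Q, K, V, n_dev):
    """Simulate n_dev devices, each owning one contiguous chunk of Q, K, V."""
    n, d, dv = len(Q), len(Q[0]), len(V[0])
    assert n % n_dev == 0, "sequence length must divide evenly (assumption)"
    c = n // n_dev
    q_loc = [Q[i * c:(i + 1) * c] for i in range(n_dev)]
    # K/V block currently held by each device; starts as its own chunk.
    kv = [(K[i * c:(i + 1) * c], V[i * c:(i + 1) * c]) for i in range(n_dev)]
    # Per-query running state: max m, normalizer l, unnormalized output acc.
    m = [[-math.inf] * c for _ in range(n_dev)]
    l = [[0.0] * c for _ in range(n_dev)]
    acc = [[[0.0] * dv for _ in range(c)] for _ in range(n_dev)]

    for _ in range(n_dev):                      # one step per block in the ring
        for dev in range(n_dev):
            Kb, Vb = kv[dev]
            for i, q in enumerate(q_loc[dev]):
                scores = [dot(q, k) / math.sqrt(d) for k in Kb]
                m_new = max(m[dev][i], max(scores))
                scale = math.exp(m[dev][i] - m_new)   # 0.0 on the first block
                w = [math.exp(s - m_new) for s in scores]
                l[dev][i] = l[dev][i] * scale + sum(w)
                acc[dev][i] = [a * scale + sum(wi * v[j] for wi, v in zip(w, Vb))
                               for j, a in enumerate(acc[dev][i])]
                m[dev][i] = m_new
        # Ring hop: every device receives the block held by its left neighbour.
        kv = [kv[(dev - 1) % n_dev] for dev in range(n_dev)]

    # Normalize once at the end and restore the original token order.
    return [[a / l[dev][i] for a in acc[dev][i]]
            for dev in range(n_dev) for i in range(c)]

rng = random.Random(0)
Q = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
K = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
V = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
ref = full_attention(Q, K, V)
out = ring_attention(Q, K, V, n_dev=4)
max_err = max(abs(a - b) for r, o in zip(ref, out) for a, b in zip(r, o))
print("max abs difference vs full attention:", max_err)
```

At no point does a simulated device hold more than one K/V block of `n / n_dev` tokens, yet the output should agree with the reference up to floating-point rounding.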
Technical insight
Ring Attention relies on online softmax: attention can be computed block by block by keeping a running maximum and a running normalizer, and rescaling the earlier partial sums whenever a larger value appears. This is what makes the result mathematically identical to full attention. The ring only ever passes K/V tensors (sized per block, not per full sequence), and because each hop's communication overlaps with the matmul on the previous block, interconnect bandwidth, not memory, becomes the limiting factor.
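The rescaling rule can be written out for a single query. The notation here is introduced only for this sketch and is not taken from the text above: the K/V sequence is split into blocks $j = 1, \dots, B$, $s_{j,i}$ is the scaled score of the query against key $i$ of block $j$, $v_{j,i}$ is the matching value vector, and $m_j$, $\ell_j$, $o_j$ are the running maximum, normalizer, and unnormalized output after block $j$.

```latex
\begin{aligned}
m_0 &= -\infty, \qquad \ell_0 = 0, \qquad o_0 = 0, \\
m_j &= \max\bigl(m_{j-1},\ \max_i s_{j,i}\bigr), \\
\ell_j &= e^{\,m_{j-1} - m_j}\,\ell_{j-1} + \sum_i e^{\,s_{j,i} - m_j}, \\
o_j &= e^{\,m_{j-1} - m_j}\,o_{j-1} + \sum_i e^{\,s_{j,i} - m_j}\,v_{j,i}, \\
\text{output} &= \frac{o_B}{\ell_B}
  = \frac{\sum_{j,i} e^{\,s_{j,i} - m_B}\,v_{j,i}}{\sum_{j,i} e^{\,s_{j,i} - m_B}} .
\end{aligned}
```

The factor $e^{m_{j-1} - m_j}$ re-expresses everything accumulated so far relative to the new maximum, so after the last block both sums are taken relative to the global maximum $m_B$, which is exactly the numerically stable form of softmax over all keys.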
Understanding sequence parallelism and ring attention
To build a deep understanding, treat Sequence Parallelism and Ring Attention as a class of work rather than a single feature: state the outcomes you want, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams that use Sequence Parallelism and Ring Attention align their architecture, data, and infrastructure choices with reliability and cost. They write explicit success criteria, measure them on realistic data and workloads, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.
Architecture decisions shape benefits and operating costs for years. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust operating pattern combines speed of learning with governance discipline: run pilots, capture evidence, publish decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.
Strategic implications
Architecture decisions shape benefits and operating costs for years.
Technical literacy helps teams choose what is best rather than simply what is newest.
Better engineering choices reduce downstream reliability problems.
In high-quality deployments, each of these translates into measurable operating standards, clear owners, and frequent review cycles, so that teams can build confidence as complexity grows.
Real-world applications
Training a 1M-token-context LLM by sharding each sequence across 8 GPUs with Ring Attention
Megatron-LM's sequence parallelism, which reduces activation memory in the LayerNorm and dropout regions
Processing an entire book or a large codebase in a single pass without truncation
Combining ring attention with tensor parallelism to fit a very long context on a multi-GPU node
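To give a feel for why the 1M-token, 8-GPU case needs sharding, here is a rough back-of-the-envelope estimate of K/V tensor size. The model shape (32 layers, 8 key/value heads, head dimension 128, 2-byte elements) is purely hypothetical and chosen only for illustration; the estimate counts K and V alone and ignores weights, optimizer state, other activations, and the extra receive buffer a ring implementation would typically need.

```python
def kv_bytes(tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to hold keys plus values for `tokens` tokens, all layers."""
    return 2 * tokens * n_layers * n_kv_heads * head_dim * bytes_per_elem

SEQ_LEN = 1_000_000   # context length from the example above
N_GPUS = 8            # sequence-parallel degree from the example above

# Hypothetical model shape, not taken from any specific model.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

full = kv_bytes(SEQ_LEN, **cfg)                # whole sequence on one device
per_gpu = kv_bytes(SEQ_LEN // N_GPUS, **cfg)   # one contiguous shard

print(f"full sequence K/V: {full / 1e9:.1f} GB")     # 131.1 GB
print(f"per-GPU shard K/V: {per_gpu / 1e9:.1f} GB")  # 16.4 GB
```

Under these assumed numbers the unsharded K/V alone would exceed the memory of a typical single accelerator, while each of the 8 shards is an eighth of that size.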
Usage patterns
Sequence parallelism and ring attention in practice
Training a 1M-token-context LLM by sharding each sequence across 8 GPUs with Ring Attention.
Using Megatron-LM's sequence parallelism to reduce activation memory in the LayerNorm and dropout regions.
Processing an entire book or a large codebase in a single forward pass without truncation.
Combining Ring Attention with tensor parallelism to fit a very long context on a multi-GPU node.
Across all of these patterns, teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track product outcomes and the cost of errors over time.
Risks and guardrails
Optimizing for a single benchmark can hide broader weaknesses in the system.
Infrastructure and maintenance costs are often underestimated.
As systems become harder to understand, security and observability problems can multiply.
Rollout roadmap
Baseline latency, quality, and cost before adoption.
Benchmark on representative workloads and realistic data.
Instrument monitoring for errors, drift, and user impact.
Prepare rollback and incident-response paths before scaling up.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.