Technical guide

RMSNorm and Pre-Layer Normalization

Summary

RMSNorm is a lightweight normalization layer that rescales activations by their root mean square, and pre-layer normalization places the norm before each sublayer rather than after it. Together they let deep transformers train stably without special warmup tricks.

Together, RMSNorm and Pre-Layer Normalization form a building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Standard LayerNorm subtracts the mean, divides by the standard deviation computed across the vector, and then applies a learned scale and shift. RMSNorm, introduced by Zhang and Sennrich in 2019, drops both the mean-centering and the bias: it divides each vector by the root mean square of its elements and multiplies the result by a learned per-dimension gain. This removes one statistic and a few operations, cutting the cost of the normalization layer by roughly 10-50% at comparable accuracy. Separately, the 'Pre-LN' placement (normalization before attention or the MLP, with a clean residual path around it) keeps gradient magnitudes well-behaved at initialization. That is why models in the style of GPT-3, LLaMA, and PaLM can train without the carefully tuned learning-rate warmup that Post-LN transformers need.
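The difference between the two layers can be sketched in plain Python. This is a minimal illustration rather than a production kernel: `layer_norm` and `rms_norm` are hypothetical helper names, vectors are plain lists, and `eps=1e-5` is an assumed default. For a zero-mean input with the shift set to zero, both layers produce the same output, which shows that mean-centering and the bias are the only things RMSNorm drops.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm: subtract the mean, divide by the standard deviation, then scale and shift."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d  # population variance over the feature dimension
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, gain, eps=1e-5):
    """RMSNorm: no mean subtraction and no bias; divide by the root mean square, then apply the gain."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d  # the only statistic RMSNorm computes
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]


x = [1.0, -2.0, 3.0, -2.0]  # zero-mean input
ones, zeros = [1.0] * len(x), [0.0] * len(x)
ln = layer_norm(x, ones, zeros)
rms = rms_norm(x, ones)
print([round(v, 4) for v in ln])   # both lines print the same values
print([round(v, 4) for v in rms])
```

On inputs with a nonzero mean the outputs differ, because LayerNorm re-centers the vector and RMSNorm does not.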

Technical view

For a vector x of dimension d, RMSNorm computes y_i = g_i * x_i / sqrt((1/d) * sum_j x_j^2 + epsilon), where g is a learned gain vector. There is no mean subtraction and no bias. Because the residual connection in a Pre-LN block bypasses the normalization, the identity path is left untouched and gradients flow directly from the output back to the input, which is why deep stacks converge reliably.
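A minimal sketch of the two placements makes the identity-path argument concrete. Assumptions: both blocks use RMSNorm with unit gain so that they differ only in where the norm sits (the original 2017 Post-LN Transformer used LayerNorm), and `zero_sublayer` is a hypothetical stand-in for an attention or MLP sublayer whose output is zero.

```python
import math


def rms_norm(x, eps=1e-6):
    # y_i = x_i / sqrt((1/d) * sum_j x_j^2 + eps), with the learned gain g fixed to 1 for brevity
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def pre_ln_block(x, sublayer):
    # Pre-LN: x + f(norm(x)). The residual path carries x through without touching it.
    out = sublayer(rms_norm(x))
    return [a + b for a, b in zip(x, out)]


def post_ln_block(x, sublayer):
    # Post-LN: norm(x + f(x)). The normalization sits on the residual path itself.
    out = sublayer(x)
    return rms_norm([a + b for a, b in zip(x, out)])


def zero_sublayer(h):
    # Hypothetical stand-in for attention/MLP whose output is zero.
    return [0.0] * len(h)


x = [3.0, -1.0, 0.5, 2.0]
print(pre_ln_block(x, zero_sublayer))   # identical to x: the identity path is untouched
print(post_ln_block(x, zero_sublayer))  # x rescaled to unit RMS: the norm rewrites the residual stream
```

With Pre-LN, stacking any number of such blocks still returns x exactly; with Post-LN, the signal and its gradient must pass through every normalization on the way through the stack.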

Understanding RMSNorm and Pre-Layer Normalization

To build a deep understanding, treat RMSNorm and Pre-Layer Normalization as a family of techniques rather than a single capability: spell out the benefits you want, state your assumptions, and separate what the system can do reliably from what still requires human judgment.

In practice, strong teams using RMSNorm and Pre-Layer Normalization optimize architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them on real data and workloads, and then iterate in an observable way rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into lasting capability in products, policies, and operations.

Architecture decisions bring benefits and operating costs that persist for years. Meanwhile, optimizing a single benchmark can hide broader weaknesses in the system. The most robust approach combines fast learning with governance discipline: run pilots, gather evidence, publish decisions, and keep updating safeguards as operating patterns, user expectations, and regulatory requirements evolve.

Strategic implications

Architecture decisions bring benefits and operating costs that persist for years. In high-quality deployments, this translates into measurable operating criteria, clear ownership, and regular review cycles, so teams can build confidence as they expand into more complex areas.

Technical literacy helps teams choose what works best instead of defaulting to whatever is newest.

Better engineering choices reduce reliability problems in production.

The future of RMSNorm and Pre-Layer Normalization

RMSNorm is the default in most open LLMs (LLaMA, Mistral, Qwen, Gemma), so expect it to remain the standard. Research keeps refining the recipe: QK-norm applies RMSNorm to queries and keys to stop logit growth, and some labs combine pre- and post-norm ('sandwich' or 'peri-LN') for extra stability at trillion-parameter scale. Hardware kernels continue to fuse the operation for speed.
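One way to read the 'sandwich' or 'peri-LN' idea is to normalize both the input and the output of each sublayer inside the residual branch. The sketch below assumes that layout (published variants differ in exact placement), omits learned gains, and uses a hypothetical `exploding_sublayer` to mimic a sublayer whose output scale has grown very large.

```python
import math


def rms_norm(x, eps=1e-6):
    # RMSNorm with unit gain (learned gains omitted for brevity).
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def pre_ln_block(x, sublayer):
    # Plain Pre-LN: only the sublayer input is normalized.
    return [a + b for a, b in zip(x, sublayer(rms_norm(x)))]


def sandwich_block(x, sublayer):
    # Sandwich / peri-LN style: normalize the sublayer input AND its output before the residual add.
    return [a + b for a, b in zip(x, rms_norm(sublayer(rms_norm(x))))]


def exploding_sublayer(h):
    # Hypothetical sublayer whose output scale has blown up (e.g. weights grown during training).
    return [1e6 * v for v in h]


def rms(v):
    return math.sqrt(sum(t * t for t in v) / len(v))


x = [0.5, -1.0, 2.0, 0.0]
for block in (pre_ln_block, sandwich_block):
    y = block(x, exploding_sublayer)
    update = [b - a for a, b in zip(x, y)]  # what the block added to the residual stream
    print(block.__name__, f"residual update RMS = {rms(update):.3g}")
```

With plain Pre-LN the update inherits the sublayer's scale; with the extra output norm it stays at unit RMS regardless of how large the sublayer's activations become.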

Real-world deployment

LLaMA, Mistral, and Qwen all replaced LayerNorm with RMSNorm to shave per-token inference latency

Pre-LN lets GPT-style models train without the careful learning-rate warmup that the 2017 Post-LN transformer required

QK-normalization applies RMSNorm to queries and keys to keep attention logits from blowing up in large models

Mobile and edge transformers use RMSNorm because dropping the mean and bias reduces memory traffic

Usage patterns

RMSNorm and Pre-Layer Normalization in practice

Replacing LayerNorm with RMSNorm, as LLaMA, Mistral, and Qwen did, removes the mean reduction and the bias add from every normalization call, which trims per-token inference latency. Teams typically get better results when they set sensible thresholds up front, keep a human escalation path for hard cases, and track operational value and error cost over time.
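Swapping one layer for the other is not purely a speed change, because the two normalizations have different invariances: RMSNorm keeps LayerNorm's invariance to rescaling the input but gives up invariance to shifting it by a constant. The check below is a sketch with hypothetical helper names and unit gains; the tolerance in `close` absorbs the small effect of `eps`.

```python
import math


def layer_norm(x, eps=1e-5):
    # Unit gain, zero bias: (x - mean) / sqrt(var + eps)
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    # Unit gain: x / sqrt(mean(x^2) + eps)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def close(a, b, tol=1e-4):
    # Element-wise comparison; tol absorbs the small effect of eps.
    return all(abs(p - q) < tol for p, q in zip(a, b))


x = [0.3, -1.2, 2.5, 0.4]
scaled = [10.0 * v for v in x]   # same direction, 10x the magnitude
shifted = [v + 5.0 for v in x]   # same shape, offset by a constant

print(close(layer_norm(x), layer_norm(scaled)), close(rms_norm(x), rms_norm(scaled)))    # True True
print(close(layer_norm(x), layer_norm(shifted)), close(rms_norm(x), rms_norm(shifted)))  # True False
```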

Placing the normalization before each sublayer lets GPT-style models train without the carefully tuned learning-rate warmup that the original 2017 Post-LN transformer depended on.
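For reference, the warmup the 2017 Post-LN Transformer relied on was a linear ramp followed by inverse-square-root decay, lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), with d_model = 512 and warmup_steps = 4000 for the base model in that paper. The sketch below implements that formula; `noam_lr` is a hypothetical name. Pre-LN models often still use some warmup in practice, but they are much less sensitive to its length.

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    # Rises linearly for warmup_steps, peaks, then decays as 1/sqrt(step).
    step = max(step, 1)  # step ** -0.5 is undefined at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for s in (1, 1000, 4000, 16000, 64000):
    print(f"step {s:>6}: lr = {noam_lr(s):.2e}")
```

The peak lands exactly at `warmup_steps`, where the two terms inside `min` are equal.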

QK-normalization applies RMSNorm to queries and keys before their dot product, which keeps attention logits from blowing up in large models.
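The bound behind QK-normalization follows from Cauchy-Schwarz: after RMSNorm with unit gain, a d-dimensional vector has norm below sqrt(d), so the scaled logit q·k / sqrt(d) stays below sqrt(d) in magnitude whatever the raw activation scale. The sketch checks this on synthetic vectors; `attention_logit` is a hypothetical helper, and real QK-norm layers also carry learned gains, which scale the bound.

```python
import math


def rms_norm(x, eps=1e-6):
    # Unit gain for brevity; a learned gain would scale the bound below.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def attention_logit(q, k, qk_norm=False):
    # Scaled dot-product logit q.k / sqrt(d), optionally normalizing q and k first.
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


d = 64
q = [50.0 * math.sin(i) for i in range(d)]  # synthetic large-magnitude query
k = list(q)                                  # perfectly aligned key: the worst case for logit size

print(attention_logit(q, k))                # grows with the square of the activation scale
print(attention_logit(q, k, qk_norm=True))  # bounded by sqrt(d) = 8.0
```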

Mobile and edge transformers adopt RMSNorm because dropping the mean and the bias reduces memory reads and writes.
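Part of the saving is easy to count: LayerNorm stores and loads a gain and a bias per feature, while RMSNorm stores only the gain. The sketch counts normalization parameter bytes for a hypothetical configuration (d_model = 4096, 65 norm layers, 2-byte fp16 parameters); these numbers are illustrative assumptions. The count covers parameters only and says nothing about activation traffic, which depends on how the kernel is written.

```python
def norm_param_bytes(d_model, n_norm_layers, bytes_per_param, has_bias):
    # LayerNorm keeps a gain and a bias per feature; RMSNorm keeps only the gain.
    params_per_layer = d_model * (2 if has_bias else 1)
    return params_per_layer * n_norm_layers * bytes_per_param


# Hypothetical configuration (assumption, for illustration only).
D_MODEL, N_NORMS, FP16_BYTES = 4096, 65, 2

ln_bytes = norm_param_bytes(D_MODEL, N_NORMS, FP16_BYTES, has_bias=True)
rms_bytes = norm_param_bytes(D_MODEL, N_NORMS, FP16_BYTES, has_bias=False)
print(f"LayerNorm: {ln_bytes:,} bytes, RMSNorm: {rms_bytes:,} bytes")
# LayerNorm: 1,064,960 bytes, RMSNorm: 532,480 bytes
```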

Risks and guardrails

Optimizing a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, safety and observability problems can multiply.

Roadmap for adoption

1. Baseline latency, quality, and cost before rollout.
2. Benchmark under realistic load with real data.
3. Instrument monitoring for errors, drift, and user outcomes.
4. Prepare rollback and incident-response paths before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
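Steps 1 and 2 can start with a tiny harness like the one below. It is only a sketch: the helper names and the 4096-wide random vector are hypothetical, and pure-Python timings say nothing about fused GPU kernels. The point is the discipline of measuring both candidates on the same input before switching.

```python
import math
import random
import timeit


def layer_norm(x, eps=1e-5):
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def rms_norm(x, eps=1e-5):
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms for v in x]


def baseline(fn, x, repeats=5, number=50):
    # Best-of-N wall-clock time per call in microseconds; min() reduces noise from other processes.
    times = timeit.repeat(lambda: fn(x), repeat=repeats, number=number)
    return min(times) / number * 1e6


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(4096)]  # one hypothetical 4096-wide activation vector
results = {name: baseline(fn, x) for name, fn in [("layer_norm", layer_norm), ("rms_norm", rms_norm)]}
for name, us in results.items():
    print(f"{name}: {us:.1f} us/call")
```

Record these numbers alongside quality metrics so that the later gates compare against a fixed baseline rather than memory.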
