Summary
SwiGLU is an activation function that multiplies one linear projection of the input by a second projection passed through Swish, so that it acts as a learned, data-dependent gate inside a transformer's feed-forward layers. It has been reported to improve language-model quality at equal compute, which is why many recent LLMs, including LLaMA, PaLM, and Mistral, use it.
The choice of SwiGLU and other gated activations affects model quality, infrastructure cost, latency, and reliability at scale.
Deep dive
A standard transformer feed-forward block is two linear layers with a ReLU or GELU between them. Dauphin et al. introduced gated linear units (GLU) in 2016: split the first projection into two halves and use one half to gate the other through element-wise multiplication. SwiGLU, popularized by Noam Shazeer in 2020, uses the Swish (SiLU) function for that gate: output = (Swish(xW) * (xV)) W2, where * is element-wise multiplication, so the block has three weight matrices instead of two. The gating lets the network choose, per hidden dimension, whether to pass or suppress information. Because the third matrix adds parameters, implementations shrink the hidden dimension to two-thirds of its usual size (from 4d to roughly 8/3 d) so that total parameters and compute stay comparable to a GELU MLP. Shazeer's experiments showed measurable perplexity gains, and LLaMA, PaLM, and Mistral have all adopted it.
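The formula and the parameter accounting above can be sketched in a few lines of plain Python. This is a minimal illustration, not any model's actual implementation: the helper names (matvec, rand_matrix, swiglu_ffn), the tiny sizes, and the random initialization are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math
import random

def swish(z, beta=1.0):
    """Swish / SiLU: z * sigmoid(beta * z)."""
    return z / (1.0 + math.exp(-beta * z))

def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def swiglu_ffn(x, W, V, W2):
    """output = (Swish(xW) * (xV)) W2, with * taken element-wise."""
    gate = [swish(g) for g in matvec(x, W)]        # gate branch
    value = matvec(x, V)                           # value branch
    hidden = [g * v for g, v in zip(gate, value)]  # element-wise gating
    return matvec(hidden, W2)

def rand_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the demo."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

d_model = 6                        # illustrative size, chosen so 4d is divisible by 3
d_ff_gelu = 4 * d_model            # usual hidden size of a two-matrix MLP
d_ff_swiglu = 2 * d_ff_gelu // 3   # two-thirds of that, i.e. 8/3 * d_model

rng = random.Random(0)
W = rand_matrix(d_model, d_ff_swiglu, rng)
V = rand_matrix(d_model, d_ff_swiglu, rng)
W2 = rand_matrix(d_ff_swiglu, d_model, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
y = swiglu_ffn(x, W, V, W2)

# Weight counts (biases ignored): two matrices at 4d versus three at 8/3 d.
params_gelu = 2 * d_model * d_ff_gelu
params_swiglu = 3 * d_model * d_ff_swiglu
print(len(y), params_gelu, params_swiglu)
```

With these sizes both counts come out equal, which is the point of the two-thirds rule: 2 * d * 4d and 3 * d * (8/3)d are the same number.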
Technical insight
Swish is x * sigmoid(beta * x), a smooth, non-monotonic function that, unlike ReLU, lets small negative values pass through. In SwiGLU, the gate branch Swish(xW) multiplies the value branch xV element-wise, so every hidden unit is modulated by a learned, input-dependent signal. Unlike a sigmoid gate, the Swish gate is not confined to the range 0 to 1: it is close to zero for strongly negative pre-activations and grows roughly linearly for large positive ones. The third matrix is the cost; the two-thirds hidden size keeps the FLOP budget in line with a vanilla feed-forward layer.
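The properties of Swish described above can be checked numerically. The sketch below uses only the standard library; the sample points are arbitrary and beta is fixed at 1 (the SiLU case) as an assumption.

```python
import math

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta = 1 gives SiLU."""
    return x / (1.0 + math.exp(-beta * x))

def relu(x):
    return max(0.0, x)

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  swish={swish(x):+.4f}  relu={relu(x):+.4f}")

# Small negative inputs leak through Swish but are zeroed by ReLU.
leaks_negative = swish(-1.0) < 0.0 and relu(-1.0) == 0.0

# Non-monotonic: the output at -4 is higher than at -1, so the curve
# dips below zero and then climbs back toward it.
non_monotonic = swish(-4.0) > swish(-1.0)

# The gate value is not capped at 1 for large positive inputs.
unbounded_above = swish(4.0) > 1.0
```

All three flags evaluate to True, which is why the gate branch is better read as a smooth multiplier than as a 0-to-1 switch.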
Understanding SwiGLU and gated activations
To build a deeper understanding, treat SwiGLU and gated activations as an engineering practice rather than a single capability: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams that adopt SwiGLU and gated activations tune architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, measure them on real data and real workloads, and then iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability in products, policy, and operations.
Architecture decisions keep paying off, or keep costing, for years. Over that horizon, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust approach combines fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as deployment patterns, user expectations, and regulations evolve.
Strategic implications
Architecture decisions compound in quality and operating cost for years.
Technical literacy helps teams choose what is best rather than simply what is newest.
Better engineering choices reduce reliability problems in production.
In high-quality deployments, each of these translates into measurable operating standards, clear ownership, and frequent review cycles, so that teams build confidence instead of adding complexity.
Real-world applications
LLaMA, PaLM, and Mistral swap the GELU feed-forward layer for SwiGLU to lower perplexity at equal compute.
The hidden dimension is scaled to two-thirds of the usual 4d (about 8/3 d) so the extra gating matrix does not inflate FLOPs.
Mixtral-style mixture-of-experts models use a SwiGLU block as each expert's feed-forward network.
Vision and multimodal transformers borrow GeGLU/SwiGLU gating to improve their MLP sublayers.
Usage patterns
SwiGLU and gated activations in practice
LLaMA, PaLM, and Mistral replace the GELU feed-forward layer with SwiGLU to lower perplexity at equal compute.
The hidden dimension is scaled to two-thirds of the usual 4d (about 8/3 d), so the additional gating matrix does not inflate FLOPs. Because 8/3 d is rarely an integer, implementations round the result to a convenient size.
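The sizing rule can be sketched as a small helper. The function name swiglu_hidden_dim and the default multiple_of=256 are assumptions for illustration; rounding up to a hardware-friendly multiple is a common convention in open implementations, but the exact multiple varies by model.

```python
def swiglu_hidden_dim(d_model, multiple_of=256):
    """Two-thirds of the usual 4 * d_model, rounded up to a multiple.

    multiple_of is an assumed, illustrative default; pass 1 to skip rounding.
    """
    hidden = 2 * (4 * d_model) // 3  # floor of 8/3 * d_model
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

d_model = 4096
d_ff_gelu = 4 * d_model
d_ff_swiglu = swiglu_hidden_dim(d_model)

# Weight counts per feed-forward block, biases ignored.
params_gelu = 2 * d_model * d_ff_gelu
params_swiglu = 3 * d_model * d_ff_swiglu

print(d_ff_swiglu, params_gelu, params_swiglu, params_swiglu / params_gelu)
```

For d_model = 4096 the unrounded value is 10922 and the rounded value is 11008, so the three-matrix block ends up within about one percent of the two-matrix block's weight count.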
Mixtral-style mixture-of-experts models use a SwiGLU block as the feed-forward network of every expert; a router sends each token to a small number of experts, and their outputs are combined.
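A rough sketch of that arrangement, for a single token and in plain Python, is shown below. It is an illustration under stated assumptions rather than Mixtral's code: the names (moe_forward, swiglu_expert, router), the top-2 choice, the softmax over the selected logits, and the tiny random weights are all picked for the example.

```python
import math
import random

def swish(z):
    return z / (1.0 + math.exp(-z))

def matvec(x, W):
    """Row vector x times matrix W given as a list of rows."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) for j in range(len(W[0]))]

def swiglu_expert(x, params):
    """One expert: (Swish(xW) * (xV)) W2."""
    W, V, W2 = params
    hidden = [swish(g) * v for g, v in zip(matvec(x, W), matvec(x, V))]
    return matvec(hidden, W2)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router, experts, top_k=2):
    """Route one token to its top_k experts and mix their outputs."""
    logits = matvec(x, router)  # one routing logit per expert
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in top])  # normalize over chosen experts only
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = swiglu_expert(x, experts[i])
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top, weights

rng = random.Random(0)
def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

d_model, d_ff, n_experts = 4, 8, 4  # illustrative sizes
router = rand_matrix(d_model, n_experts)
experts = [(rand_matrix(d_model, d_ff), rand_matrix(d_model, d_ff),
            rand_matrix(d_ff, d_model)) for _ in range(n_experts)]

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
out, chosen, weights = moe_forward(x, router, experts)
print(chosen, weights)
```

Only the selected experts run for a given token, so per-token compute depends on top_k and the expert size rather than on the total number of experts.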
Vision and multimodal transformers borrow GeGLU/SwiGLU gating to improve their MLP sublayers.
Across all of these patterns, teams tend to get better results when they define quality thresholds up front, keep a human escalation path for problem cases, and track both product benefit and the cost of errors over time.
Risks and guardrails
Optimizing for a single benchmark can hide broader weaknesses in the system.
Infrastructure and maintenance costs are often underestimated.
As systems become harder to interpret, safety and oversight problems can multiply.
Implementation roadmap
Baseline latency, quality, and cost before adoption.
Benchmark on representative workloads and real data.
Instrument monitoring for errors, drift, and user impact.
Prepare rollback and incident-response paths before scaling.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
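One way to make the evidence gate concrete is a small comparison against the recorded baseline. Everything in this sketch is hypothetical: the function name evidence_gate, the metric names, the tolerances, and the numbers are made-up placeholders rather than measurements, and a real gate would use whatever criteria the team has agreed on.

```python
def evidence_gate(baseline, candidate, criteria):
    """Return the metrics that fail their criterion; an empty list means proceed."""
    failures = []
    for metric, (direction, tolerance) in criteria.items():
        base, cand = baseline[metric], candidate[metric]
        if direction == "lower_is_better":
            ok = cand <= base * (1.0 + tolerance)  # allow a bounded regression
        else:
            ok = cand >= base * (1.0 - tolerance)
        if not ok:
            failures.append(metric)
    return failures

# Placeholder values for illustration only, not real results.
baseline = {"latency_ms": 120.0, "quality": 0.81, "cost_per_1k": 0.40}
candidate = {"latency_ms": 126.0, "quality": 0.83, "cost_per_1k": 0.47}
criteria = {
    "latency_ms": ("lower_is_better", 0.10),   # at most 10% slower
    "quality": ("higher_is_better", 0.00),     # no quality regression
    "cost_per_1k": ("lower_is_better", 0.10),  # at most 10% more expensive
}

failed = evidence_gate(baseline, candidate, criteria)
print("proceed" if not failed else f"pause rollout, failed: {failed}")
```

In this made-up case latency and quality pass while cost exceeds its tolerance, so the gate reports a pause instead of a wider rollout.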