Technical guide

Flash Attention

Flash Attention is a clever way to compute attention inside Transformers without ever writing out the large score matrix that strains memory.

Summary

Flash Attention makes long-context models faster and more memory-efficient without changing their math.

It is a building block with consequences for model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Standard attention compares every token with every other token, producing an N-by-N score matrix that grows quadratically with sequence length. In a naive implementation, that matrix is written to and read back from the GPU's high-bandwidth memory (HBM), and this data shuttling, not the arithmetic, is the real bottleneck. Flash Attention, introduced by Tri Dao and colleagues in 2022, reorganizes the computation so that the full matrix is never stored. It processes queries, keys, and values in small tiles that fit in fast on-chip SRAM, computes partial results, and combines them using an online running-softmax trick. Mathematically, the output matches standard attention (up to floating-point rounding), but the algorithm uses memory linear in sequence length and runs several times faster, especially on long sequences.
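To make the quadratic growth concrete, the sketch below estimates how many bytes one full score matrix would occupy compared with the query, key, and value tensors themselves. The function names, the 2-byte element size (as for FP16), and the head dimension of 128 are illustrative assumptions, not figures from this guide.

```python
def score_matrix_bytes(seq_len: int, bytes_per_element: int = 2) -> int:
    """Bytes for one N-by-N score matrix (single head, single batch item)."""
    return seq_len * seq_len * bytes_per_element


def qkv_bytes(seq_len: int, head_dim: int = 128, bytes_per_element: int = 2) -> int:
    """Bytes for the query, key, and value tensors of one head (linear in N)."""
    # head_dim=128 is an assumed, illustrative value.
    return 3 * seq_len * head_dim * bytes_per_element


if __name__ == "__main__":
    for n in (1_024, 8_192, 65_536):
        ratio = score_matrix_bytes(n) / qkv_bytes(n)
        print(f"N={n}: scores={score_matrix_bytes(n)} B, "
              f"qkv={qkv_bytes(n)} B, ratio={ratio:.1f}")
```

Doubling the sequence length quadruples the first figure but only doubles the second, which is why avoiding the full matrix matters most for long sequences.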

Technical insight

The key ideas are tiling and the online softmax. Softmax normally needs a complete row of scores before it can compute its denominator, but Flash Attention maintains a running maximum and a running sum as it streams through each tile, rescaling earlier partial outputs so that the final result is exact. Because the intermediate scores stay in SRAM (roughly an order of magnitude higher bandwidth than HBM), the algorithm is IO-aware: it minimizes memory reads and writes rather than raw arithmetic.
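The running-maximum and running-sum bookkeeping can be sketched in plain Python for a single query vector. This is an illustrative sketch, not the actual GPU kernel: the function names, the block size, and the list-of-floats layout are assumptions made for readability, and a real implementation also tiles the queries and keeps each tile in on-chip memory.

```python
import math


def naive_attention(q, keys, values):
    """Reference: materialize every score for one query, then apply softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def streaming_attention(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed one block at a time.

    Only a running max, a running sum, and one output accumulator are kept;
    the full row of scores is never stored. Assumes at least one key.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]

        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max

    # Normalize once at the end by the full softmax denominator.
    return [a / running_sum for a in acc]
```

Whatever block size is chosen, the streamed result should agree with the reference up to floating-point rounding, because the correction factor re-expresses every earlier term relative to the newest maximum.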

Understanding Flash Attention

To build a deeper understanding, treat Flash Attention as a practice rather than a single capability: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use Flash Attention align their architecture, data, and infrastructure choices around reliability and cost. They write clear success criteria, measure them on real data and real workloads, and iterate with observability in place rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.

Architecture decisions shape benefits and operating costs for years afterward. Over that time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust approach combines fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as deployment patterns, user expectations, and regulations evolve.

Strategic implications

Architecture decisions shape benefits and operating costs for years afterward.

Technical literacy helps teams choose what is best, not just what is newest.

Better engineering choices reduce reliability problems in operation.

In high-quality deployments, each of these translates into measurable operating standards, clear ownership, and frequent review cycles, so that teams can scale with confidence in increasingly complex environments.

The future of Flash Attention

Flash Attention is now foundational infrastructure, with FlashAttention-2 and FlashAttention-3 extracting more from newer GPUs such as the H100 through better work partitioning and low-precision FP8 paths. Expect continued co-design with hardware, tighter integration into training and inference frameworks, and variants built for sparse, sliding-window, and very-long-context attention. As context windows grow toward millions of tokens, IO-aware kernels like these will remain essential for keeping memory use and speed in check.

Real-world applications

Training large language models such as Llama and GPT-class systems with longer context windows at lower memory cost.

Speeding up chat assistants by accelerating the prefill stage, in which a long prompt is first read.

Enabling document-analysis tools to ingest entire books or codebases by making attention over long sequences feasible on a single GPU.

Powering vision and audio transformers, where high-resolution inputs produce very long token sequences.

Usage patterns

Across each of the applications above, teams typically get better results when they define acceptance thresholds up front, keep a human escalation path for edge cases, and track product benefit and the cost of errors over time.

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to interpret, safety and oversight problems can multiply.

Implementation roadmap

1. Baseline latency, quality, and cost before adopting.

2. Benchmark on representative workloads and real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
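One way to turn the baseline and the evidence gate into something checkable is sketched below. It is a minimal sketch under stated assumptions: `median_latency_ms`, `passes_gate`, and the metric names used in the dictionaries are hypothetical helpers invented for illustration, not part of any Flash Attention library.

```python
import statistics
import time
from typing import Callable, Dict


def median_latency_ms(fn: Callable[[], object], repeats: int = 20,
                      warmup: int = 3) -> float:
    """Median wall-clock latency of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def passes_gate(measured: Dict[str, float], limits: Dict[str, float]) -> bool:
    """True only if every metric in limits was measured and is within its limit."""
    return all(name in measured and measured[name] <= limit
               for name, limit in limits.items())


if __name__ == "__main__":
    # Placeholder workload; in practice this would call the attention path under test.
    latency = median_latency_ms(lambda: sum(range(10_000)))
    measured = {"latency_ms": latency}
    limits = {"latency_ms": 50.0}  # hypothetical threshold
    print("expand rollout" if passes_gate(measured, limits) else "pause rollout")
```

On a GPU, kernel launches are asynchronous, so the timed callable would need to synchronize the device before returning; otherwise the measurement captures only the launch overhead.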
