Technical Guide

Flash Attention

Flash Attention is a way to compute the attention step inside Transformers without ever writing the huge attention matrix out to slow memory.

Overview

Flash Attention makes long-context models much faster and more memory-efficient without changing their math.

Flash Attention is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Standard attention compares every token with every other token, producing an N-by-N score matrix that grows quadratically with sequence length. In a naive implementation, that matrix is written to and read back from the GPU's high-bandwidth memory (HBM), and that memory traffic, not the matrix multiplication, is the real bottleneck. Flash Attention, introduced by Tri Dao and colleagues in 2022, reorganizes the computation so that the matrix is never fully materialized. It processes queries, keys, and values in small tiles that fit in fast on-chip SRAM, accumulates partial results, and combines them using the online (running) softmax trick. The output is mathematically identical to standard attention, but it uses memory linear in sequence length and runs several times faster, especially on long sequences.
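To make the quadratic-versus-linear difference concrete, the following is a rough back-of-the-envelope sketch. The function names are hypothetical, and the element sizes (2 bytes for fp16 scores, 4 bytes for fp32 row statistics) are assumptions chosen for illustration, not values taken from any particular implementation. It counts a single attention head on a single sequence.

```python
def score_matrix_bytes(seq_len: int, bytes_per_element: int = 2) -> int:
    """Bytes needed to materialize one N-by-N score matrix (one head, one sequence)."""
    return seq_len * seq_len * bytes_per_element


def running_stats_bytes(seq_len: int, bytes_per_element: int = 4) -> int:
    """Bytes for a per-row running max and running sum (two numbers per row)."""
    return 2 * seq_len * bytes_per_element


for n in (1024, 8192, 32768):
    full_mib = score_matrix_bytes(n) / 2**20
    stats_kib = running_stats_bytes(n) / 2**10
    print(f"N={n:>6}: score matrix {full_mib:>7.1f} MiB, row statistics {stats_kib:>6.1f} KiB")
```

Under these assumptions, doubling N quadruples the score matrix but only doubles the row statistics, which is why avoiding the full matrix matters most at long sequence lengths.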

Technical Insight

The key trick is tiling combined with an online softmax. Softmax normally needs an entire row of scores before it can compute its denominator, but Flash Attention keeps a running maximum and a running sum as it streams through each tile, rescaling the previously accumulated output so that the final result is exact. Because the intermediate scores stay in SRAM (which has roughly an order of magnitude higher bandwidth than HBM), the algorithm is IO-aware: it minimizes memory reads and writes rather than raw arithmetic operations.
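The running-maximum and running-sum bookkeeping can be sketched in plain Python for a single query vector. This is only an illustration of the online softmax idea under simplifying assumptions (one query, Python lists instead of GPU tiles, hypothetical function names); it is not how the actual GPU kernels are written.

```python
import math


def _score(q, k):
    """Scaled dot product between one query and one key."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def naive_attention(q, keys, values):
    """Reference: compute every score first, then softmax and mix the values."""
    scores = [_score(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q, keys, values, tile_size=2):
    """Stream over key/value tiles, keeping only a running max, sum, and output."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])  # unnormalized output accumulator
    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [_score(q, k) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new reference maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_tile):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.3, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
reference = naive_attention(q, keys, values)
streamed = tiled_attention(q, keys, values, tile_size=2)
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(reference, streamed)))
```

The streamed version never holds more than one tile of scores at a time, yet it agrees with the reference up to floating-point rounding, because each rescaling step re-expresses the earlier partial sums relative to the new maximum.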

Mastering Flash Attention

To build a deep understanding, treat Flash Attention as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams that use Flash Attention optimize their architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, evaluate against realistic data and workflows, and iterate based on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The most robust approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and update safeguards continuously as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Architecture decisions drive performance and operating cost for years. In high-quality deployments, this translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

Technical education helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability incidents in production.

The Future of Flash Attention

Flash Attention has become a default building block. FlashAttention-2 and FlashAttention-3 squeeze more throughput out of newer GPUs such as the H100 by improving work partitioning and by using low-precision FP8 paths. Expect continued hardware co-design, tighter integration into training and inference frameworks, and variants tuned for sparse attention, sliding-window attention, and very long-context attention. As context windows stretch toward millions of tokens, IO-aware kernels like these remain essential for keeping memory use and speed manageable.

Real-World Applications

Training large language models such as Llama and GPT-class systems with long context windows at a lower memory cost.

Serving chat assistants faster by accelerating the prefill phase, in which a long prompt is read before generation starts.

Powering document-analysis tools that ingest entire books or codebases by making long-sequence attention feasible on a single GPU.

Powering vision and audio Transformers, where high-resolution inputs create very long token sequences.

Implementation Patterns

Flash Attention in practice

The same guidance applies to each of the four applications listed above, from long-context training to vision and audio Transformers: teams typically get better results when they define quality thresholds up front, maintain a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can grow as systems become more complex.

Getting Started Guide

1. Define latency, quality, and cost targets before deployment.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident procedures before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
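One way to turn the first two steps into an evidence gate is a small benchmark harness that compares measured latency against a target agreed in advance. The sketch below is illustrative only: the function names, the choice of p95 as the metric, the one-second target, and the stand-in workload are all assumptions, and a real benchmark would call the actual attention path on production-like inputs.

```python
import statistics
import time


def measure_latencies(fn, inputs, warmup=2):
    """Time fn on each input after a few warm-up calls; returns seconds per call."""
    for x in inputs[:warmup]:
        fn(x)
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)
    return latencies


def gate(latencies, p95_target_s):
    """Return (passed, p95); the gate passes only if p95 latency meets the target."""
    # With n=20, quantiles() returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return p95 <= p95_target_s, p95


def stand_in_workload(seq_len):
    """Hypothetical placeholder for a real model call."""
    return sum(i * i for i in range(seq_len))


latencies = measure_latencies(stand_in_workload, [1000] * 50)
passed, p95 = gate(latencies, p95_target_s=1.0)
print("expand rollout" if passed else "pause rollout and close the gap")
```

Gating on a tail percentile rather than the mean is a design choice: it keeps a few slow requests from hiding behind a good average, which matters for interactive workloads.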
