Language AI Guide

FlashAttention

FlashAttention is a memory-efficient algorithm that computes exactly the same attention as standard transformers, but without writing the huge attention matrix out to slow GPU memory.

Overview

FlashAttention makes long-context training and inference much faster and cheaper.

FlashAttention is part of the language-AI stack used to read, generate, classify, and transform text and speech at scale.

Deep Dive

Standard attention computes a score for every pair of tokens, producing an N-by-N matrix. For a sequence of 4,000 tokens that is 16 million scores per attention head, and the matrix must be written to and read back from the GPU's high-bandwidth memory (HBM). That memory traffic, not the arithmetic, is the real bottleneck. FlashAttention, introduced by Tri Dao and colleagues in 2022, reorders the computation so that the full matrix is never materialized. It processes the sequence in tiles that fit in the GPU's small, very fast on-chip SRAM, computing the softmax incrementally as it goes. The result is mathematically identical to standard attention but uses far less memory and runs several times faster, which enables much longer context windows.
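To make the 16-million figure concrete, the short sketch below computes how many scores a full N-by-N matrix holds and how much memory it would take to store. The function name score_matrix_bytes is hypothetical, and the 2 bytes per score (FP16) and the single head and single sequence are assumptions chosen for illustration, not figures from this guide.

```python
def score_matrix_bytes(seq_len: int, bytes_per_score: int = 2) -> int:
    """Bytes needed to store one full seq_len x seq_len score matrix.

    bytes_per_score=2 assumes FP16 storage (an illustrative assumption).
    """
    return seq_len * seq_len * bytes_per_score


# Growth is quadratic: doubling the sequence length quadruples the matrix.
for n in (4_000, 8_000, 32_000, 128_000):
    scores = n * n
    mib = score_matrix_bytes(n) / 2**20
    print(f"N={n:>7,}: {scores:>18,} scores, {mib:>10,.1f} MiB per head")
```

Under these assumptions, a 4,000-token sequence needs 32,000,000 bytes (about 30.5 MiB) for a single head's score matrix, and that cost is multiplied by the number of heads and the batch size.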

Technical Insight

The trick is "online softmax" combined with tiling. FlashAttention loads small blocks of queries, keys, and values into SRAM, computes partial attention outputs, and rescales them using running statistics (a running maximum and a running sum of exponentials) as new blocks arrive, so the softmax normalization stays correct without ever seeing all the scores at once. Because it never stores the full N-by-N matrix in HBM, memory scales linearly with sequence length rather than quadratically, and the kernel is fused into a single GPU operation to minimize slow memory reads and writes.
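The rescaling step described above can be sketched in a few lines of plain Python. This is a CPU-only illustration for a single query vector, not the fused GPU kernel. The function names, the block size, and the omission of the usual 1/sqrt(d) score scaling are all simplifying assumptions made for this sketch.

```python
import math


def naive_attention(q, keys, values):
    """Reference: compute every score for one query, then apply softmax."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Online softmax: visit keys and values one block at a time.

    Only a running max, a running sum of exponentials, and an
    unnormalized output accumulator are kept between blocks.
    """
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old max to the new max.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    # Normalize once at the end.
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [-1.0, 3.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, keys, values))
print(tiled_attention(q, keys, values, block_size=2))
```

The two printed vectors should agree up to floating-point rounding for any block size. The working memory of tiled_attention depends on the block size and the value dimension, not on the total number of keys.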

Understanding FlashAttention

To build a deeper understanding, treat FlashAttention as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams treat FlashAttention, prompt design, retrieval, and review loops as one coordinated communication system. They write explicit success criteria, evaluate against realistic data and workflows, and iterate based on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Language workflows can move faster without sacrificing consistency. At the same time, fabricated facts can silently enter reports, support flows, or research outputs. The strongest approach combines speed of experimentation with operational discipline: run pilots, capture evidence, publish decision logs, and update safeguards continuously as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Language workflows can move faster without sacrificing consistency.

Expands access across languages and communication styles.

Teams can spend more time on judgment while automation handles repetition.

In high-quality deployments, each of these translates into measurable operating rules, ownership boundaries, and repeatable review practices, so teams can scale confidence instead of scaling ambiguity.

The Future of FlashAttention

FlashAttention has become a default building block. FlashAttention-2 improved how work is partitioned across the GPU, and FlashAttention-3 exploits newer Hopper hardware features such as asynchrony and low-precision FP8. Expect continued co-design with chips, deeper integration into inference servers for long documents, and variants tuned for sparse or sliding-window attention. As context windows push toward millions of tokens, IO-aware kernels like these remain essential to keeping training and serving costs manageable.

Real-World Applications

Training large language models such as Llama-style and GPT-style systems faster and at lower GPU cost

Serving long-context chat assistants that ingest entire books or codebases without running out of memory

Speeding up document summarization pipelines that process tens of thousands of tokens at once

Powering vision and multimodal transformers, where long sequences of image patches make attention expensive

Implementation Patterns

FlashAttention in practice

In each of the applications listed above, teams generally get better results when they define quality metrics up front, maintain a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Fabricated facts can silently enter reports, support flows, or research outputs.

Prompt sensitivity can cause inconsistent results across similar requests.

Sensitive text data may be exposed if access controls are weak.

Implementation Guide

1. Define the output format, tone, and quality standards before rollout.

2. Ground responses in trusted sources whenever accuracy matters.

3. Keep a human review checkpoint for high-stakes outputs.

4. Track failure patterns and refine prompts or workflows regularly.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and then expand usage.
