Language AI GUIDE

FlashAttention

FlashAttention is a memory-efficient algorithm that computes exactly the same attention as a standard transformer, but without ever writing the giant attention matrix to slow GPU memory.

Overview

FlashAttention made long-context training and inference faster and cheaper.

FlashAttention is part of the language-AI stack used to read, generate, organize, and transform text and speech at scale.

Deep Dive

Standard attention computes a score for every pair of tokens, producing an N-by-N matrix. For a sequence of four thousand tokens that is sixteen million scores, and the matrix has to be written to and read back from the GPU's high-bandwidth memory (HBM). That memory traffic, not the arithmetic, is the real bottleneck. FlashAttention, introduced by Tri Dao and collaborators in 2022, reorganizes the computation so that the matrix is never fully materialized. It processes the sequence in tiles that fit in the GPU's small, very fast on-chip SRAM, computing the softmax incrementally as it goes. The result is mathematically identical to standard attention but uses far less memory and runs several times faster, which enables longer context windows.
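
To make the sixteen-million figure concrete, here is a minimal sketch of the arithmetic. It assumes "four thousand" means 4,096 tokens and that each score is stored in 2 bytes (as in FP16 or BF16). Both are illustrative assumptions, and the helper name `attention_matrix_bytes` is hypothetical.

```python
def attention_matrix_bytes(seq_len: int, bytes_per_score: int = 2) -> int:
    """Bytes needed to hold one full seq_len x seq_len score matrix."""
    return seq_len * seq_len * bytes_per_score


for n in (4096, 8192, 16384):
    scores = n * n
    mib = attention_matrix_bytes(n) / 2**20
    # Doubling the sequence length quadruples the matrix.
    print(f"N={n:>6}: {scores:>12,} scores, {mib:>5.0f} MiB per head")
```

This counts a single attention head in a single layer. A standard implementation writes and re-reads a matrix of this size for every head, which is why the memory traffic grows so quickly with sequence length.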

Technical Insight

The trick is "online softmax" combined with tiling. FlashAttention loads small blocks of queries, keys, and values into SRAM, computes partial attention outputs, and rescales the running sums as new blocks arrive, so the softmax normalization stays correct without ever seeing all the scores at once. Because it never stores the full N-by-N matrix in HBM, memory scales linearly rather than quadratically with sequence length, and the kernel is fused into a single GPU operation to cut down on slow memory reads and writes.
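
The rescaling step is easier to follow in code. The sketch below is a plain-Python illustration of the online-softmax idea, not the fused GPU kernel. It handles one query row at a time and walks over keys and values in blocks, whereas the real implementation also tiles the queries and keeps each tile in SRAM. The function names and the `block` parameter are hypothetical.

```python
import math
import random


def naive_attention(Q, K, V):
    """Reference: build each full row of scores, then softmax and mix values."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block=4):
    """Same output, but only `block` scores exist at any moment per query."""
    d, dv = len(Q[0]), len(V[0])
    out = []
    for q in Q:
        running_max = -math.inf  # largest score seen so far
        running_sum = 0.0        # softmax denominator relative to running_max
        acc = [0.0] * dv         # unnormalized weighted sum of value rows
        for start in range(0, len(K), block):
            k_blk = K[start:start + block]
            v_blk = V[start:start + block]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale everything accumulated under the old maximum.
            scale = math.exp(running_max - new_max)
            running_sum *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * vj for a, vj in zip(acc, v)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


random.seed(0)
n, d = 10, 3
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

ref = naive_attention(Q, K, V)
got = tiled_attention(Q, K, V, block=4)
max_diff = max(abs(a - b) for r, g in zip(ref, got) for a, b in zip(r, g))
print("largest difference from the reference:", max_diff)
```

The two functions agree up to floating-point rounding, which is the sense in which the tiled computation is exact rather than an approximation. The only extra state per query is a running maximum, a running sum, and one accumulator vector, so memory no longer depends on the number of keys.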

Mastering FlashAttention

To build a deeper understanding, treat FlashAttention as a usage pattern rather than a single artifact: define the requirements, make the assumptions explicit, and separate what reliable automation can handle from what still needs expert judgment.

In practice, strong teams that use FlashAttention design prompts, retrieval, and evaluation loops as one integrated communication system. They write explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Language workflows can move faster without sacrificing consistency. At the same time, hallucinated facts can slip unnoticed into reports, support flows, or research outputs. A balanced approach combines rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Language workflows can move faster without sacrificing consistency.

It broadens access across languages and communication styles.

Teams can spend more time on judgment while automation handles the repetitive work.

In high-quality deployments, these benefits translate into measurable operating policies, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.

The Future of FlashAttention

FlashAttention has become a building block. FlashAttention-2 improved how work is partitioned across the GPU, and FlashAttention-3 exploits newer Hopper hardware features such as asynchrony and low-precision FP8. Expect continued co-design with chips, deeper integration into inference servers for long documents, and variants tailored to sparse or sliding-window attention. As context windows push toward millions of tokens, IO-aware kernels like this one remain essential for keeping training and serving costs manageable.

Real-World Implementation

Training large language models such as Llama and GPT-style systems faster and at lower GPU cost

Serving long-context assistants that ingest entire books or codebases without running out of memory

Running document-summarization pipelines that process tens of thousands of tokens at once

Powering vision and multimodal transformers, where long sequences of image patches make attention expensive
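
The vision case above comes down to simple arithmetic. The sketch below assumes square images cut into non-overlapping 16x16 patches with one token per patch. That patch size is a common choice but is an assumption here, and `patch_tokens` is a hypothetical helper.

```python
def patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch x patch tiles, one token each."""
    return (height // patch) * (width // patch)


for side in (224, 512, 1024):
    n = patch_tokens(side, side)
    print(f"{side}x{side} image: {n:,} patches, "
          f"{n * n:,} attention scores per head")
```

Under these assumptions a 1024x1024 image already yields 4,096 tokens, the same sequence length as the text example in the Deep Dive section, so image resolution drives attention cost just as document length does.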

Practical Patterns

Across all of these use cases, teams usually get better results when they define quality standards up front, keep a human escalation path for edge cases, and track both product gains and error costs over time.

Risks & Guardrails

Hallucinated facts can slip unnoticed into reports, support flows, or research.

Prompt sensitivity can produce inconsistent results for similar requests.

Sensitive text data can be exposed if guardrails are weak.

Implementation Roadmap

1. Define the output format, tone, and quality metrics before release.

2. Ground responses in trusted sources wherever relevant.

3. Keep human review in place for high-stakes outputs.

4. Track failure modes and retune prompts or workflows periodically.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
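
One way to read the evidence-gate idea is as a simple check that runs before each rollout stage. The sketch below is illustrative only. The metric names, observed values, and thresholds are all hypothetical, and a real deployment would pull them from its own evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    observed: float
    threshold: float
    higher_is_better: bool = True

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.observed >= self.threshold
        return self.observed <= self.threshold


def rollout_decision(gates):
    """Return ("expand", []) if every gate passes, else ("pause", failing names)."""
    failing = [g.name for g in gates if not g.passed()]
    return ("expand", []) if not failing else ("pause", failing)


# Hypothetical measurements for one rollout stage.
gates = [
    Gate("format_compliance", observed=0.99, threshold=0.98),
    Gate("grounded_answer_rate", observed=0.91, threshold=0.95),
    Gate("unreviewed_high_stakes_outputs", observed=0, threshold=0,
         higher_is_better=False),
]
decision, failing = rollout_decision(gates)
print(decision, failing)
```

With these example numbers the grounding gate fails, so the decision is to pause and close that gap before expanding usage.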
