Language AI Guide

FlashAttention

FlashAttention is a memory-efficient algorithm that computes exactly the same attention as a standard transformer, but it never writes out the large attention matrix, which sharply reduces GPU memory use.

Overview

FlashAttention has made long-context training and inference faster and cheaper.

FlashAttention is one component of the language-AI stack used to read, generate, classify, and transform text and speech at scale.

Deep Dive

Standard attention computes a score for every pair of tokens, producing an N-by-N matrix. For a 4,000-token sequence that is 16 million scores, and the matrix must be written to and read back from the GPU's high-bandwidth memory (HBM). This memory traffic, not the arithmetic, is the real bottleneck. FlashAttention, introduced by Tri Dao and colleagues in 2022, restructures the computation so that the matrix is never fully materialized. It processes the sequence in tiles that fit in the GPU's small but very fast on-chip SRAM, computing the softmax incrementally as it goes. The result is mathematically the same as standard attention (it is exact, not an approximation), but it uses far less memory and runs several times faster, enabling much longer context windows.
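To make the quadratic growth concrete, here is a small Python sketch that counts the scores in a fully materialized N-by-N matrix and the bytes it would occupy. It assumes 2 bytes per score (FP16 or BF16) and counts a single head for a single sequence; real models multiply this by the number of heads and the batch size. The helper name `score_matrix_bytes` is illustrative, not part of any library.

```python
def score_matrix_bytes(n_tokens: int, bytes_per_score: int = 2) -> int:
    """Bytes needed to store one N-by-N attention score matrix.

    Illustrative helper: assumes one head, one sequence, and 2-byte
    (FP16/BF16) scores unless told otherwise.
    """
    return n_tokens * n_tokens * bytes_per_score


if __name__ == "__main__":
    for n in (4_000, 32_000, 1_000_000):
        scores = n * n
        megabytes = score_matrix_bytes(n) / 1e6  # decimal megabytes
        print(f"N={n:>9,}: {scores:>17,} scores, {megabytes:>13,.1f} MB per head")
```

For N = 4,000 this reports 16,000,000 scores, about 32 MB per head per sequence. At one million tokens the same matrix would need roughly 2 TB, which is why avoiding it entirely matters more than shaving a constant factor.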

Technical Insight

The trick is an "online softmax" combined with tiling. FlashAttention loads small blocks of queries, keys, and values into SRAM, computes partial attention outputs, and rescales its running statistics (the row-wise maximum and sum of exponentials) as new blocks arrive, so the softmax normalization stays correct without ever seeing all the scores at once. Because it never stores the full N-by-N matrix in HBM, memory grows linearly rather than quadratically with sequence length, and the whole computation is fused into a single GPU kernel to cut memory reads and writes.
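The tiling and online-softmax idea can be sketched in pure Python. This is an illustration under simplifying assumptions, not the FlashAttention CUDA kernel: it ignores SRAM, kernel fusion, causal masking, and the backward pass, and it uses the standard 1/sqrt(d) scaling. The hypothetical `tiled_attention` walks over query blocks and, for each one, streams key/value blocks while keeping only a running maximum, a running sum, and an unnormalized output accumulator per query row. Whenever a larger score appears, the earlier statistics are rescaled by exp(m_old - m_new), so the final output matches `naive_attention`, which builds the full N-by-N score matrix. All function names and the block size are assumptions chosen for the example.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, k, v):
    """Reference attention that materializes the full N-by-N score matrix."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * dot(qi, kj) for kj in k] for qi in q]  # N x N
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        p = [math.exp(s - m) for s in row]
        z = sum(p)
        out.append([sum(pj * vj[c] for pj, vj in zip(p, v)) / z
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Blockwise attention with an online softmax; never builds the N x N matrix."""
    scale = 1.0 / math.sqrt(len(q[0]))
    d_v = len(v[0])
    out = []
    for qs in range(0, len(q), block_size):
        q_blk = q[qs:qs + block_size]
        row_max = [-math.inf] * len(q_blk)   # running max score per query row
        row_sum = [0.0] * len(q_blk)         # running sum of exp(score - row_max)
        acc = [[0.0] * d_v for _ in q_blk]   # unnormalized output per query row
        for ks in range(0, len(k), block_size):
            k_blk, v_blk = k[ks:ks + block_size], v[ks:ks + block_size]
            for i, qi in enumerate(q_blk):
                s = [scale * dot(qi, kj) for kj in k_blk]
                m_new = max(row_max[i], max(s))
                # Rescale what earlier blocks contributed; exp(-inf) == 0.0 at start.
                corr = math.exp(row_max[i] - m_new)
                p = [math.exp(x - m_new) for x in s]
                row_sum[i] = row_sum[i] * corr + sum(p)
                acc[i] = [a * corr + sum(pj * vj[c] for pj, vj in zip(p, v_blk))
                          for c, a in enumerate(acc[i])]
                row_max[i] = m_new
        # Normalize once every key/value block has been seen.
        out.extend([a / row_sum[i] for a in acc[i]] for i in range(len(q_blk)))
    return out


def rand_matrix(rows, cols):
    """Random Gaussian matrix as a list of lists (demo data only)."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n, d = 5, 3  # 5 tokens with block_size=2 exercises a ragged final block
    q, k, v = rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d)
    ref = naive_attention(q, k, v)
    tiled = tiled_attention(q, k, v, block_size=2)
    diff = max(abs(a - b) for r1, r2 in zip(ref, tiled) for a, b in zip(r1, r2))
    print(f"max abs difference: {diff:.2e}")
```

The printed difference should be tiny (floating-point rounding only). Apart from one block of scores at a time, the tiled version keeps just two numbers and one output accumulator per query row, which is the linear-memory behavior the paragraph describes.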

FlashAttention Guide

To build a deeper understanding, treat FlashAttention as part of an operating model rather than a single feature: define the desired outcome, make assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams treat FlashAttention, prompt design, retrieval, and review loops as one integrated system. They write explicit success criteria, test on realistic data and workflows, and iterate based on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Language workflows can move faster without sacrificing accuracy. At the same time, hallucinated content can quietly slip into reports, support flows, or research outputs. The most effective approach combines rapid experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep safeguards current as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Language workflows can move faster without sacrificing accuracy. In effective organizations, this translates into measurable operating standards, clear ownership boundaries, and regular review cadences, so teams build confidence instead of accumulating doubt.

It broadens access across languages and communication styles.

Teams can spend more time on judgment while automation handles repetitive work.

The Future of FlashAttention

FlashAttention has become a default building block. FlashAttention-2 improved how work is partitioned across the GPU, and FlashAttention-3 exploits features of NVIDIA's newer Hopper hardware, such as asynchrony and low-precision FP8. Expect continued co-design with chips, deeper integration into inference servers for long documents, and variants tuned for sparse or sliding-window attention. As context windows push toward millions of tokens, IO-aware kernels like this one are essential to keep training and serving affordable.

Real-World Applications

Training large language models such as Llama and GPT-style architectures faster and at lower GPU cost

Serving long-context chat assistants that ingest entire books or codebases without running out of memory

Powering document summarization pipelines that process tens of thousands of tokens at once

Powering vision and multimodal transformers, where long sequences of image patches make attention expensive

Implementation Paths

FlashAttention in practice

Training large language models such as Llama and GPT-style architectures faster and at lower GPU cost. Teams usually get better results when they define quality gates up front, keep a human escalation path for edge cases, and track production wins and error rates over time.

Serving long-context chat assistants that ingest entire books or codebases without running out of memory.

Powering document summarization pipelines that process tens of thousands of tokens at once.

Powering vision and multimodal transformers, where long sequences of image patches make attention expensive.

Risks & Safeguards

Hallucinated content can quietly slip into reports, support flows, or research outputs.

Prompt sensitivity can produce inconsistent results across similar requests.

Sensitive text data can be exposed if access controls are weak.

Roadmap

1. Define the output format, tone, and quality criteria before rollout. Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

2. Ground answers in trusted sources wherever accuracy matters.

3. Reserve human review for high-stakes outputs.

4. Track failure patterns and refine prompts or workflows regularly.
