Technical Guide

RMSNorm and Pre-Layer Normalization

RMSNorm is a lightweight normalization layer that rescales activations by their root mean square, and pre-layer normalization places that step before each sublayer rather than after it.

Overview

Together, RMSNorm and Pre-LN placement let deep transformers train stably with little reliance on learning-rate warmup tricks.

RMSNorm and Pre-Layer Normalization form a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Standard LayerNorm subtracts the mean and divides by the standard deviation across the feature dimension, then applies a learned scale and shift. RMSNorm, introduced by Zhang and Sennrich in 2019, drops the mean-centering and the bias entirely: it simply divides each vector by the root mean square of its elements and multiplies by a learned per-feature gain. This removes one statistic and several elementwise operations; the original paper reports running-time reductions of roughly 7% to 64% depending on the model, with comparable accuracy. Separately, the 'Pre-LN' ordering (normalize before attention/MLP, with a clean residual path around it) keeps gradient magnitudes well behaved at initialization, so models such as GPT-3, LLaMA, and PaLM are far less dependent on the learning-rate warmup that the original Post-LN transformer required.
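To make the difference between the two layers concrete, here is a minimal pure-Python sketch. The function names `layer_norm` and `rms_norm`, the default `eps`, and the sample vector are illustrative assumptions, not taken from any particular library.

```python
import math

def layer_norm(x, gain, bias, eps=1e-6):
    """Standard LayerNorm: center, divide by the standard deviation, then scale and shift."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gain, bias)]

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square, then scale. No centering, no bias."""
    d = len(x)
    rms = math.sqrt(sum(v * v for v in x) / d + eps)
    return [g * v / rms for v, g in zip(x, gain)]

# Illustrative input (assumed values), with unit gain and zero bias
x = [1.0, 2.0, 3.0, 4.0]
ones, zeros = [1.0] * len(x), [0.0] * len(x)

ln = layer_norm(x, ones, zeros)  # output is centered around zero
rn = rms_norm(x, ones)           # output keeps the input's direction, rescaled to unit RMS
```

With unit gain, the LayerNorm output sums to roughly zero, while the RMSNorm output is the input multiplied by a single positive scalar, so ratios between features are preserved.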

Technical Insight

For a vector x of dimension d, RMSNorm computes y_i = g_i * x_i / sqrt((1/d) * sum_j(x_j^2) + epsilon), where g is a learned gain vector and epsilon is a small constant for numerical stability. There is no mean subtraction and no bias. Because the residual stream in a Pre-LN block bypasses the normalization, the identity path stays untouched and gradients flow directly from output to input, which is why deep stacks converge more reliably.
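The identity-path argument can be illustrated with a small sketch that contrasts the two orderings. Everything here is an assumption for illustration: `pre_ln_block`, `post_ln_block`, and `zero_sublayer` are hypothetical names, and RMSNorm is used in both blocks for comparability even though the original Post-LN transformer used LayerNorm.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """y_i = g_i * x_i / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def pre_ln_block(x, sublayer, gain):
    """Pre-LN: normalize only the branch input; the residual path is a pure identity."""
    branch = sublayer(rms_norm(x, gain))
    return [xi + bi for xi, bi in zip(x, branch)]

def post_ln_block(x, sublayer, gain):
    """Post-LN: normalize after the residual add, so the skip path is rescaled too."""
    summed = [xi + bi for xi, bi in zip(x, sublayer(x))]
    return rms_norm(summed, gain)

def zero_sublayer(h):
    # Hypothetical stand-in for attention or an MLP whose output happens to be zero
    return [0.0 for _ in h]

x = [3.0, -4.0, 12.0, 0.5]
gain = [1.0] * len(x)

pre_out = pre_ln_block(x, zero_sublayer, gain)    # identical to x
post_out = post_ln_block(x, zero_sublayer, gain)  # x rescaled to unit RMS
```

When the sublayer contributes nothing, the Pre-LN block returns its input unchanged, whereas the Post-LN block still rescales it. Stacking many Post-LN blocks therefore passes the signal (and its gradient) through a normalization at every layer, while Pre-LN leaves a direct path from output to input.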

A Guide to RMSNorm and Pre-Layer Normalization

To build a deeper understanding, treat RMSNorm and Pre-Layer Normalization as an operating model rather than a single feature: define the desired outcomes, state the assumptions, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using RMSNorm and Pre-Layer Normalization weigh architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test on real data and workflows, and iterate on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Architecture decisions shape performance and operating cost for years. At the same time, optimizing a single metric can hide major weaknesses in the system. The most effective approach combines fast experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architecture decisions shape performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability risk in production.

In effective organizations, each of these points translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams can scale with confidence rather than uncertainty.

The Future of RMSNorm and Pre-Layer Normalization

RMSNorm is now the default in most open-weight LLMs (LLaMA, Mistral, Qwen, Gemma), so expect it to remain the standard. Research keeps refining the recipe: QK-norm applies RMSNorm to the attention queries and keys to keep logit growth in check, and some labs combine pre- and post-normalization ('sandwich' or 'peri-LN') for extra stability at trillion-parameter scale. Hardware kernels continue to fuse the operation for speed.
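The effect of QK-norm on attention logits can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `attention_logit` and `qk_norm_logit` are hypothetical helper names, the input values are made up, and real implementations differ in details such as learned gains or temperature scaling.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """y_i = g_i * x_i / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def attention_logit(q, k):
    """Scaled dot-product logit for a single query/key pair."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

def qk_norm_logit(q, k, q_gain, k_gain):
    """Same logit, but with RMSNorm applied to the query and the key first."""
    return attention_logit(rms_norm(q, q_gain), rms_norm(k, k_gain))

# Deliberately large, aligned query and key (assumed values) to mimic logit blow-up
d = 8
q = [50.0 * (i + 1) for i in range(d)]
k = [40.0 * (i + 1) for i in range(d)]
ones = [1.0] * d

raw = attention_logit(q, k)               # grows with the magnitude of q and k
normed = qk_norm_logit(q, k, ones, ones)  # magnitude no longer depends on input scale
```

With unit gains, each normalized vector has squared norm at most d, so by the Cauchy-Schwarz inequality the normalized logit cannot exceed sqrt(d) in magnitude, regardless of how large the raw activations become.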

Real-World Applications

LLaMA, Mistral, and Qwen all use RMSNorm in place of LayerNorm to shave per-token inference latency.

Pre-LN lets GPT-style models train with far less dependence on the learning-rate warmup that the 2017 Post-LN transformer required.

QK-normalization applies RMSNorm to attention queries and keys to stop logits from exploding in large models.

Mobile and edge transformers adopt RMSNorm because dropping the mean and the bias reduces memory traffic.

Implementation Patterns

RMSNorm and Pre-Layer Normalization in practice

Across these deployments, teams typically get the best results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and error rates over time.

Risks & Safeguards

Optimizing a single metric can hide major weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems scale.

Roadmap

1. Define latency, quality, and cost targets before implementation.

2. Benchmark under real load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Plan rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
