Overview
RMSNorm is a lightweight normalization layer that rescales activations by their root mean square, and pre-layer normalization places the norm before each sublayer rather than after it. Together they let deep transformers train stably with far less reliance on learning-rate warmup tricks.
RMSNorm and pre-layer normalization are engineering design choices that affect model quality, infrastructure cost, latency, and reliability at scale.
Deep Dive
Standard LayerNorm subtracts the mean and divides by the standard deviation across each vector, then applies a learned scale and shift. RMSNorm, introduced by Zhang and Sennrich in 2019, drops the mean-centering and the bias entirely: it simply divides each vector by the root mean square of its elements and multiplies by a learned per-element gain. This removes one statistic and several operations, cutting the layer's running time by roughly 10-50% depending on the model and implementation, while matching accuracy. Separately, the "Pre-LN" placement (normalizing the input to attention and to the MLP, with a clean residual path around each sublayer) keeps gradient magnitudes well behaved at initialization. That is why models such as GPT-3, LLaMA, and PaLM use it and are far less sensitive to the learning-rate warmup that the original Post-LN transformer depended on.
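To make the difference between the two layers concrete, here is a minimal pure-Python sketch of both, assuming a single vector and list-based arithmetic rather than any particular framework. The names `layer_norm` and `rms_norm` are illustrative, not a library API. It shows the one case where the layers coincide (zero-mean input, unit gain, zero bias) and how they diverge once the input carries a constant offset.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-6) -> List[float]:
    """LayerNorm: subtract the mean, divide by the standard deviation, then scale and shift."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: no mean, no bias; divide by the root mean square and apply a per-element gain."""
    d = len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]


if __name__ == "__main__":
    ones, zeros = [1.0] * 4, [0.0] * 4

    # Zero-mean input: the RMS equals the standard deviation, so both layers agree.
    centered = [1.0, -1.0, 2.0, -2.0]
    print(layer_norm(centered, ones, zeros))
    print(rms_norm(centered, ones))

    # Same vector shifted by +3: LayerNorm removes the offset, RMSNorm keeps it.
    shifted = [v + 3.0 for v in centered]
    print(layer_norm(shifted, ones, zeros))  # same values as the centered case
    print(rms_norm(shifted, ones))           # all positive: the offset survives
```

Because RMSNorm never subtracts the mean, a constant offset passes through it (rescaled); the working hypothesis behind RMSNorm is that the re-scaling, not the re-centering, is what stabilizes training.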
Technical Insight
For a vector x of dimension d, RMSNorm computes y_i = x_i * g_i / sqrt((1/d) * sum_j x_j^2 + epsilon), where g is a learned gain vector and epsilon is a small constant for numerical stability. There is no mean subtraction and no bias. Because the residual stream in Pre-LN bypasses the normalization, the identity path stays untouched and gradients flow directly from the output to the input, which is why deep stacks converge reliably.
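The formula and the two block layouts can be sketched in plain Python as follows. This is an illustration under simplifying assumptions: one token vector, a hypothetical `zero_sublayer` standing in for attention or the MLP, and RMSNorm used in both layouts so that only the placement differs. With the sublayer contributing nothing, the Pre-LN stack returns its input exactly, which is the untouched identity path described above; the Post-LN stack instead renormalizes the residual at every block.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> Vector:
    """y_i = x_i * g_i / sqrt((1/d) * sum_j x_j^2 + eps)."""
    d = len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    return [v * g * inv_rms for v, g in zip(x, gain)]


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector], gain: Vector) -> Vector:
    """Pre-LN: x + sublayer(norm(x)); the residual path never passes through a norm."""
    h = sublayer(rms_norm(x, gain))
    return [xi + hi for xi, hi in zip(x, h)]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector], gain: Vector) -> Vector:
    """Post-LN (2017 layout): norm(x + sublayer(x)); the residual is renormalized every block."""
    h = sublayer(x)
    return rms_norm([xi + hi for xi, hi in zip(x, h)], gain)


def zero_sublayer(h: Vector) -> Vector:
    """Hypothetical stand-in for attention/MLP whose output is still zero."""
    return [0.0] * len(h)


if __name__ == "__main__":
    gain = [1.0] * 4
    x = [3.0, -1.0, 0.5, 2.0]
    y_pre, y_post = list(x), list(x)
    for _ in range(24):  # a 24-block stack
        y_pre = pre_ln_block(y_pre, zero_sublayer, gain)
        y_post = post_ln_block(y_post, zero_sublayer, gain)
    print(y_pre)   # exactly x: the identity path is untouched
    print(y_post)  # x rescaled to roughly unit RMS by the repeated norms
```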
Mastering RMSNorm and Pre-Layer Normalization
To build a deeper understanding, treat RMSNorm and pre-layer normalization as a design pattern rather than a single component: define the outcomes you want, make assumptions explicit, and separate what can be automated reliably from what still needs expert judgment.
In practice, strong teams that adopt RMSNorm and pre-layer normalization weigh architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test against real data and workflows, and iterate based on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding becomes durable capability across product, policy, and operations.
Architecture choices drive performance and cloud spend. At the same time, optimizing for a single benchmark can hide broader system weaknesses. A sustainable approach pairs fast experimentation with governance: run pilots, collect evidence, publish decision logs, and keep refining safeguards as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Architecture choices drive performance and cloud spend.
Technical literacy helps teams pick the right stack, not just the newest one.
Better engineering decisions reduce reliability incidents in production.
In mature deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review rituals, so that teams scale trust rather than ambiguity.
Real-World Implementation
LLaMA, Mistral, and Qwen all replace LayerNorm with RMSNorm, trimming inference latency per token.
Pre-LN lets GPT-style models train stably with far less dependence on the learning-rate warmup that the 2017 Post-LN transformer required.
QK-normalization applies RMSNorm to attention queries and keys to keep attention logits from exploding in large models.
Mobile and edge runtimes adopt RMSNorm because dropping the mean and bias reduces memory traffic.
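A minimal sketch of QK-normalization for a single attention head, assuming unit gains and list-based vectors. The helper names are hypothetical, and production models differ in details such as per-head gains and whether LayerNorm or RMSNorm is used. With unit gains, each normalized vector has squared length at most d_head, so by Cauchy-Schwarz the scaled logit is bounded by sqrt(d_head) no matter how large the raw queries and keys grow.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm over the head dimension with a (here fixed) per-element gain."""
    d = len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    return [v * g * inv_rms for v, g in zip(x, gain)]


def attention_logit(q: Sequence[float], k: Sequence[float]) -> float:
    """Scaled dot-product logit: q . k / sqrt(d_head)."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def qk_norm_logit(q: Sequence[float], k: Sequence[float],
                  q_gain: Sequence[float], k_gain: Sequence[float]) -> float:
    """QK-normalization: RMSNorm the query and key separately, then take the scaled dot product."""
    return attention_logit(rms_norm(q, q_gain), rms_norm(k, k_gain))


if __name__ == "__main__":
    d_head = 8
    ones = [1.0] * d_head
    # Hypothetical queries/keys whose scale has drifted upward during training.
    q = [50.0 * (-1) ** i * (i + 1) for i in range(d_head)]
    k = [40.0 * (-1) ** i * (i + 1) for i in range(d_head)]
    print(attention_logit(q, k))            # grows with |q| * |k|
    print(qk_norm_logit(q, k, ones, ones))  # at most sqrt(d_head) with unit gains
```

With learned gains the bound becomes max|g_q| * max|g_k| * sqrt(d_head): still bounded, but no longer by sqrt(d_head) alone.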
In Practice
Across the deployments listed above, teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and the cost of errors over time.
Risks & Guardrails
Optimizing for a single benchmark can hide broader system weaknesses.
Infrastructure and maintenance costs are often underestimated.
Security and observability gaps can grow as systems become more complex.
Implementation Roadmap
Define latency, quality, and cost targets before implementation.
Benchmark under realistic load and data conditions.
Instrument monitoring for errors, drift, and user behavior.
Prepare rollback and incident-response paths before scaling up.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
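For the benchmarking step, a harness along these lines is one way to start. It is sketched in pure Python with a hypothetical width of 4096 and Gaussian inputs; substitute your own model's hidden size and recorded activations. Interpreted-Python timings say nothing about fused GPU kernels, so treat the numbers as a check of the harness rather than evidence for or against RMSNorm's speedup. What carries over to framework-level profiling is the structure: realistic shapes, several repeats, and the median rather than the best case.

```python
import math
import random
import statistics
import timeit
from typing import Callable, List, Sequence


def layer_norm(x: Sequence[float], gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-6) -> List[float]:
    """Reference LayerNorm (mean, variance, scale, shift)."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x: Sequence[float], gain: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Reference RMSNorm (one sum of squares, gain only)."""
    d = len(x)
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / d + eps)
    return [g * v * inv_rms for v, g in zip(x, gain)]


def bench(fn: Callable[[], object], repeats: int = 5, number: int = 50) -> float:
    """Median seconds per call across several timing runs; the median damps scheduler noise."""
    runs = timeit.repeat(fn, repeat=repeats, number=number)
    return statistics.median(runs) / number


if __name__ == "__main__":
    # Hypothetical workload: replace the width and input distribution with your model's
    # real hidden size and recorded activations.
    d_model = 4096
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    ones, zeros = [1.0] * d_model, [0.0] * d_model

    t_ln = bench(lambda: layer_norm(x, ones, zeros))
    t_rms = bench(lambda: rms_norm(x, ones))
    print(f"LayerNorm {t_ln * 1e6:.1f} us/call, RMSNorm {t_rms * 1e6:.1f} us/call, "
          f"RMSNorm/LayerNorm = {t_rms / t_ln:.2f}")
```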