Technical Guide

Mixed Precision Training

Mixed precision training speeds up neural network training and cuts memory use by doing most of the arithmetic in 16-bit floating point instead of 32-bit.

Overview

Mixed precision lets the same GPU train larger models faster, with almost no loss in accuracy.

Mixed precision training is an architectural technique that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Traditional training stores weights and runs arithmetic in 32-bit floating point (FP32). Mixed precision uses lower-precision 16-bit formats (FP16 or bfloat16) for the heavy matrix multiplications, while keeping a 32-bit "master copy" of the weights so that updates stay numerically stable. Because 16-bit numbers are half the size, more of them fit in GPU memory, and Tensor Cores process them up to 2 to 8 times faster. The catch is FP16's narrow range: small gradients can underflow to zero. The standard fix is loss scaling, which multiplies the loss by a large factor before backpropagation so that small gradients remain representable, then divides that factor back out before the weights are updated. NVIDIA's Apex and the built-in AMP (Automatic Mixed Precision) in PyTorch and TensorFlow automate this.
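
The underflow problem and the loss-scaling fix described above can be reproduced with nothing but the Python standard library. The following is a minimal sketch, not how any framework implements it: the helper to_fp16, the gradient value, and the scale factor 65536 are assumptions chosen for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 65536.0  # assumed scale factor (2**16), for illustration only
true_grad = 1e-8      # hypothetical gradient, below FP16's smallest subnormal (~6e-8)

# Without scaling, the gradient underflows to zero when stored in FP16.
unscaled = to_fp16(true_grad)

# Scaling the loss scales every gradient by the same factor (chain rule),
# which lifts the value into FP16's representable range.
scaled_fp16 = to_fp16(true_grad * LOSS_SCALE)

# The factor is divided out in higher precision before the weight update.
recovered = scaled_fp16 / LOSS_SCALE

print(f"FP16 without loss scaling: {unscaled}")
print(f"FP16 with loss scaling, then unscaled: {recovered:.3e}")
```

The unscaled gradient is lost entirely, while the scaled one survives the trip through FP16 with only a small rounding error.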

Technical Insight

FP16 has only 5 exponent bits, which gives it a small dynamic range and makes gradient underflow likely. Bfloat16 keeps 8 exponent bits (matching FP32's range) but has fewer mantissa bits, so it usually does not need loss scaling, a key reason Google TPUs and recent GPUs favor it. Tensor Cores speed up the work by multiplying 16-bit operands while accumulating the partial sums in FP32, which preserves precision exactly where rounding errors would otherwise compound.
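
Both claims above, the range versus precision trade-off and the value of a wider accumulator, can be checked numerically. The sketch below is an emulation under stated assumptions: to_bf16 is a hypothetical helper that keeps the top 16 bits of the FP32 encoding with round-to-nearest-even (NaN and infinity are ignored), and the accumulation loop is a scalar stand-in for what Tensor Cores do in hardware, not a model of them.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 (8 exponent bits, 7 mantissa bits), emulated."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Range: a tiny value vanishes in FP16 but survives in bfloat16.
tiny = 1e-8
print("tiny ->", to_fp16(tiny), to_bf16(tiny))

# Precision: a small offset from 1.0 survives in FP16 but not in bfloat16.
fine = 1.001
print("fine ->", to_fp16(fine), to_bf16(fine))

# Accumulation: add the same small FP16 term 10,000 times.
term = to_fp16(0.001)
acc16 = 0.0
acc32 = 0.0
for _ in range(10_000):
    acc16 = to_fp16(acc16 + term)  # accumulator rounded to FP16 every step
    acc32 = to_fp32(acc32 + term)  # accumulator kept in FP32

print("FP16 accumulator:", acc16)
print("FP32 accumulator:", acc32)
```

The FP16 accumulator stalls once the running sum is large enough that each new term is smaller than half the gap between adjacent FP16 values, while the FP32 accumulator stays close to the true sum of about 10.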

Mastering Mixed Precision Training

To build a deeper understanding, treat mixed precision training as a practice rather than a single switch: define the requirements, make the assumptions explicit, and separate what reliable automation can handle from what still needs expert judgment.

In practice, strong teams that adopt mixed precision training weigh architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test against real data and real workflows, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Infrastructure decisions drive both performance and cloud spend. At the same time, optimizing for a single benchmark can hide broader system weaknesses. A sound approach combines fast experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Infrastructure decisions drive performance and cloud spend.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering decisions reduce reliability incidents in production.

In production-grade deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

The Future of Mixed Precision Training

Precision keeps dropping. FP8 training, supported on NVIDIA Hopper and Blackwell GPUs, is becoming standard for frontier models, and research into FP4 and microscaling formats (MXFP) pushes further still. Expect systems that choose precision automatically per layer, hardware support for ever narrower formats, and quantization-aware training that blurs the line between low-precision training and inference, lowering the cost of training trillion-parameter models.

Real-World Implementation

Wrapping a training loop in PyTorch's torch.cuda.amp.autocast to roughly halve memory use and double throughput on the same GPU.

Training large language models such as GPT-style transformers in bfloat16 on TPUs to avoid loss-scale tuning.

Fitting a larger batch size on a consumer RTX GPU by switching ResNet image training from FP32 to FP16.

Using FP8 mixed precision on NVIDIA H100 GPUs to cut the cost of pretraining frontier-scale models.
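
The FP16 use cases above depend on a loss scale that adapts during training, which is the part AMP tooling automates. The following is a toy sketch of dynamic loss scaling, not the code of PyTorch or any other library: the class DynamicLossScaler, its method names, and its default values are all assumptions made for illustration. The idea is to skip the update and shrink the scale whenever a gradient overflows, and to try a larger scale after a run of clean steps.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; a simplification for illustration)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, scaled_grads):
        """Divide the scale back out of gradients computed from a scaled loss."""
        return [g / self.scale for g in scaled_grads]

    def update(self, grads):
        """Return True if the optimizer step should run, then adjust the scale."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            # Overflow: skip this step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long enough run of clean steps: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


# Made-up gradients for four training steps; the second one overflows.
scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
steps = [[0.5, -0.25], [float("inf"), 0.1], [0.1, 0.2], [0.3, 0.4]]
applied = [scaler.update(grads) for grads in steps]

print("steps applied:", applied)
print("final scale:", scaler.scale)
```

In this run the overflowing step is skipped and the scale is halved; two clean steps later the scale is doubled again. A small growth_interval is used here only so the behavior is visible in four steps.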

Implementation Practices

Mixed Precision Training in practice

Across all four of the use cases above, teams usually get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track both the gains and the cost of errors over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems grow more complex.

Implementation Roadmap

1. Define latency, quality, and cost targets before rollout.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user behavior.

4. Prepare rollback and incident procedures before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
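
One way to make the evidence gate concrete is to compare a mixed-precision run against the FP32 baseline using targets fixed in advance. The sketch below is only an illustration: the Targets fields, the evidence_gate function, the metric names, and every number are hypothetical and would need to be replaced with a team's own criteria and measurements.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    """Hypothetical rollout targets, fixed before any experiment runs."""
    max_quality_drop: float       # allowed accuracy loss vs. the FP32 baseline
    min_speedup: float            # required throughput gain vs. FP32
    max_skipped_step_rate: float  # tolerated share of overflow-skipped steps


def evidence_gate(baseline, candidate, targets):
    """Return the unmet criteria; an empty list means the gate passes."""
    failures = []
    if baseline["accuracy"] - candidate["accuracy"] > targets.max_quality_drop:
        failures.append("quality")
    if candidate["throughput"] / baseline["throughput"] < targets.min_speedup:
        failures.append("speedup")
    if candidate["skipped_step_rate"] > targets.max_skipped_step_rate:
        failures.append("stability")
    return failures


# Made-up numbers for illustration only.
targets = Targets(max_quality_drop=0.002, min_speedup=1.5,
                  max_skipped_step_rate=0.01)
fp32_run = {"accuracy": 0.761, "throughput": 1000.0}
good_run = {"accuracy": 0.760, "throughput": 1800.0, "skipped_step_rate": 0.002}
bad_run = {"accuracy": 0.740, "throughput": 1200.0, "skipped_step_rate": 0.002}

print("good run failures:", evidence_gate(fp32_run, good_run, targets))
print("bad run failures:", evidence_gate(fp32_run, bad_run, targets))
```

An empty list means the rollout can expand; any entry names the gap to close first.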
