TECHNICAL GUIDE

Mixed Precision Training

Mixed precision training speeds up neural network training and cuts memory use by doing most of the computation in 16-bit rather than 32-bit arithmetic.

Overview

Mixed precision lets the same GPUs train larger models faster with almost no loss of accuracy.

Mixed precision training is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Traditional training stores weights and performs arithmetic in 32-bit floating point (FP32). Mixed precision uses lower-precision 16-bit formats (FP16 or bfloat16) for the heavy matrix multiplications, while keeping a 32-bit "master copy" of the weights so that updates stay stable. Because 16-bit numbers are half the size, more of them fit in GPU memory, and Tensor Cores process them roughly 2-8x faster. The catch is FP16's range: small gradients can underflow to zero. The standard fix is loss scaling, which multiplies the loss by a large factor before backpropagation so that small gradients remain representable, then divides the gradients back down before the weight update. NVIDIA's Apex and the built-in AMP (Automatic Mixed Precision) in PyTorch and TensorFlow do this automatically.
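The underflow problem and the loss-scaling fix can be illustrated without a GPU. The following is a minimal sketch using only Python's standard library, where `struct`'s `"e"` format stands in for FP16 storage. The gradient value, the scale factor, and the helper name `to_fp16` are illustrative assumptions, not part of any framework's API. The sketch scales the gradient directly, whereas real training scales the loss and lets the chain rule carry the factor through.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values chosen for illustration only.
true_grad = 1e-8        # smaller than FP16's smallest subnormal (about 6e-8)
loss_scale = 2.0 ** 16  # assumed scale factor; a power of two scales exactly

# Without scaling, the gradient underflows to zero when stored in FP16.
unscaled = to_fp16(true_grad)

# With scaling, the stored value is large enough for FP16 to represent.
# (Scaling the loss scales every gradient by the same factor.)
scaled = to_fp16(true_grad * loss_scale)

# Divide the scale back out in higher precision before the weight update.
recovered = scaled / loss_scale

print(f"FP16 without scaling: {unscaled}")
print(f"FP16 with scaling, then unscaled: {recovered:.3e}")
```

Without scaling the stored gradient is exactly zero, so the weight would never move. With scaling the recovered value is close to the original, limited only by FP16's rounding.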

Technical Insight

FP16 has only 5 exponent bits, which gives it a narrow dynamic range and is what causes gradient underflow. Bfloat16 keeps 8 exponent bits (the same range as FP32) but has fewer mantissa bits, so it rarely needs loss scaling, a main reason Google TPUs and modern GPUs favor it. Tensor Cores speed up the work by multiplying 16-bit operands while accumulating the partial sums in FP32, which preserves accuracy where rounding errors would otherwise compound.
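Both trade-offs described above (range versus precision, and 16-bit versus FP32 accumulation) can be seen in a few lines of standard-library Python. This is a rough sketch under stated assumptions: the helpers `fp16`, `bf16`, and `fp32` are hypothetical names, and `bf16` simply truncates the FP32 bit pattern to its top 16 bits, whereas real hardware typically rounds to nearest.

```python
import struct


def fp16(x: float) -> float:
    """Round to the nearest IEEE half-precision value (5 exponent bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp32(x: float) -> float:
    """Round to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the FP32 encoding.

    Assumption: truncation instead of round-to-nearest, for simplicity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Range: a tiny value survives in bfloat16 but underflows in FP16.
tiny = 1e-8
print("tiny  fp16:", fp16(tiny), " bf16:", bf16(tiny))

# Precision: FP16 resolves a small offset from 1.0 that bfloat16 cannot.
fine = 1.001
print("fine  fp16:", fp16(fine), " bf16:", bf16(fine))

# Accumulation: add 1.0 a total of 4096 times.
acc16 = 0.0  # accumulator rounded to FP16 after every add
acc32 = 0.0  # accumulator rounded to FP32 after every add
for _ in range(4096):
    acc16 = fp16(acc16 + fp16(1.0))
    acc32 = fp32(acc32 + fp16(1.0))
print("sum   fp16 accumulator:", acc16, " fp32 accumulator:", acc32)
```

The FP16 accumulator stalls at 2048 because the gap between adjacent FP16 values there is 2, so adding 1 rounds back to the same number. The FP32 accumulator reaches 4096. This is the kind of compounding error that FP32 accumulation is meant to avoid.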

Mastering Mixed Precision Training

To build a deep understanding, treat mixed precision training as an operating model rather than a single feature: define the desired outcomes, make the assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use mixed precision training tune their architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, evaluate against real data and workflows, and iterate on observed failure patterns rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The most robust approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and update guardrails continuously as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Architecture decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability incidents in production.

In high-quality deployments, each of these translates into measurable operating rules, ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

The Future of Mixed Precision Training

Precision keeps dropping. FP8 training, supported on NVIDIA Hopper and Blackwell GPUs, is becoming standard for frontier models, and research on FP4 and microscaling (MXFP) formats is advancing. Expect frameworks to choose per-layer precision automatically, hardware to handle ever-narrower formats, and quantization-aware training to blur the line between low-precision training and inference, cutting the cost of training trillion-parameter models.

Real-World Applications

Wrapping the forward pass of a training loop in PyTorch's torch.cuda.amp.autocast to roughly halve memory use and double throughput on a single GPU

Training large language models such as GPT-style transformers in bfloat16 on TPUs to avoid loss-scale tuning

Fitting a larger batch size on a consumer RTX GPU by switching ResNet image training from FP32 to FP16

Using FP8 mixed precision on NVIDIA H100 GPUs to cut the training cost of frontier-scale models
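The "loss-scale tuning" that bfloat16 avoids, and that AMP tooling automates for FP16, usually means adjusting the scale dynamically: back off when gradients overflow, grow again after a run of clean steps. The sketch below illustrates that idea in plain Python. The class name `DynamicLossScaler`, its parameters, and the default values are assumptions chosen for illustration and should not be read as any framework's exact implementation.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler: shrink on overflow, grow after clean steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.clean_steps = 0  # consecutive steps without inf/NaN gradients

    def update(self, grads) -> bool:
        """Inspect this step's gradients; return True if the step should apply."""
        if any(not math.isfinite(g) for g in grads):
            # Overflow: skip the optimizer step and reduce the scale.
            self.scale *= self.backoff_factor
            self.clean_steps = 0
            return False
        self.clean_steps += 1
        if self.clean_steps >= self.growth_interval:
            # A long enough clean run: try a larger scale again.
            self.scale *= self.growth_factor
            self.clean_steps = 0
        return True


# Hypothetical gradient history: the second step overflows.
scaler = DynamicLossScaler(growth_interval=2)
history = [[0.1], [float("inf")], [0.2], [0.3]]
applied = [scaler.update(grads) for grads in history]
print(applied, scaler.scale)
```

In this run the overflowing step is skipped and the scale is halved, then two clean steps in a row double it back to its starting value.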

Implementation Patterns

Mixed Precision Training in practice

Whichever of the applications listed above a team adopts, results are generally better when quality thresholds are defined up front, a human escalation path is kept for edge cases, and both productivity gains and error costs are tracked over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can grow as systems become more complex.

Getting Started Guide

1. Define latency, quality, and cost targets before rollout.

2. Benchmark under real load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident procedures before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
