Technical guide

FP8 and low-precision formats

FP8 is an 8-bit floating-point number format that lets AI models store weights and do math using a quarter of the memory of the usual 32-bit numbers.

Summary

FP8 is a key technique for making large models cheaper and faster to train and serve.

FP8 and other low-precision formats affect model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Neural networks are made of billions of numbers. Traditionally, each of those numbers used 32 bits (FP32) or 16 bits (FP16/BF16). FP8 shrinks them to 8 bits, cutting memory and bandwidth in half compared with 16-bit. There are two common FP8 variants: E4M3 (4 exponent bits, 3 mantissa bits) gives more precision but less range, while E5M2 (5 exponent bits, 2 mantissa bits) gives more range but coarser steps. The tradeoff is precision: fewer bits mean more rounding error. To keep values from running out of range, frameworks use per-tensor or per-block scaling factors that scale values into the range FP8 can use. NVIDIA's Hopper and Blackwell GPUs added FP8 matrix engines, which makes the format practical for both training and inference. Newer formats such as MXFP8, MXFP4, and NVFP4 go even lower by using shared block scales.
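To make the range-versus-precision tradeoff concrete, the sketch below derives the largest finite value, the smallest subnormal, and the relative step size from the bit counts. It assumes the commonly used convention in which E5M2 reserves its top exponent for infinities and NaN, while E4M3 gives up only a single NaN pattern; the function name `fp8_limits` and its `ieee_like` flag are illustrative, not part of any library.

```python
def fp8_limits(exp_bits, man_bits, ieee_like):
    """Return the largest finite value, smallest subnormal, and relative step."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # E5M2 style: the all-ones exponent is reserved for inf and NaN.
        top_exp = (2 ** exp_bits - 2) - bias
        top_frac = 2 - 2 ** -man_bits
    else:
        # E4M3 style (assumed convention): only the all-ones mantissa at the
        # top exponent encodes NaN, so the top exponent stays usable.
        top_exp = (2 ** exp_bits - 1) - bias
        top_frac = 2 - 2 ** -(man_bits - 1)
    return {
        "max": top_frac * 2.0 ** top_exp,
        "min_subnormal": 2.0 ** (1 - bias - man_bits),
        "relative_step": 2.0 ** -man_bits,
    }


e4m3 = fp8_limits(4, 3, ieee_like=False)
e5m2 = fp8_limits(5, 2, ieee_like=True)
print("E4M3", e4m3)
print("E5M2", e5m2)
```

Under these assumptions E4M3 tops out at 448 with a relative step of 1/8, while E5M2 reaches 57344 with a relative step of 1/4, which is the "more range, coarser steps" trade described above.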

Technical insight

The core difficulty with FP8 is dynamic range. With so few exponent bits, large activations overflow and small ones flush to zero. The fix is scaling: multiply a tensor by a factor so its values land in the window FP8 can represent, run the FP8 multiply-accumulate, then divide the factor back out, usually while accumulating partial sums in higher precision (FP16/FP32). E4M3 is most often used for weights and activations, and E5M2 for gradients, where range matters more than precision.
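The scale, multiply-accumulate, unscale sequence can be sketched in plain Python. This is a simulation under stated assumptions, not how any particular framework implements it: `quantize_e4m3` is a hypothetical helper that rounds to the nearest E4M3 value (maximum 448, saturating), the per-tensor scale is simply 448 divided by the largest magnitude, and ordinary Python floats stand in for the higher-precision accumulator.

```python
import math

E4M3_MAX = 448.0     # largest finite E4M3 value (assumed convention)
E4M3_MIN_EXP = -6    # exponent of the smallest normal E4M3 value


def quantize_e4m3(x):
    """Round x to the nearest E4M3-representable value, saturating at the max."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)              # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)    # subnormals share the lowest exponent
    step = 2.0 ** (exp - 3)           # 3 mantissa bits: 8 steps per binade
    return sign * min(round(a / step) * step, E4M3_MAX)


def fp8_dot(a, b, scaled=True):
    """Dot product with FP8-rounded inputs and a high-precision accumulator."""
    # Per-tensor scale maps the largest magnitude onto the FP8 maximum.
    sa = E4M3_MAX / (max(abs(v) for v in a) or 1.0) if scaled else 1.0
    sb = E4M3_MAX / (max(abs(v) for v in b) or 1.0) if scaled else 1.0
    qa = [quantize_e4m3(v * sa) for v in a]
    qb = [quantize_e4m3(v * sb) for v in b]
    acc = sum(x * y for x, y in zip(qa, qb))   # accumulate in higher precision
    return acc / (sa * sb)                     # divide the scales back out


# Illustrative data: small activations that fall below E4M3's smallest step.
activations = [1e-4, -2e-4, 3e-4, 4e-4]
weights = [0.5, 0.25, -1.0, 2.0]
exact = sum(x * y for x, y in zip(activations, weights))

print("exact   :", exact)
print("unscaled:", fp8_dot(activations, weights, scaled=False))
print("scaled  :", fp8_dot(activations, weights, scaled=True))
```

Without scaling, every activation in this example rounds to zero and the result is lost; with scaling, the result stays within a few percent of the exact value.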

Understanding FP8 and low-precision formats

To build a deep understanding, treat FP8 and low-precision formats as a practice rather than a single capability: define the outcomes you want, state your assumptions, then separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use FP8 and low-precision formats tune their architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them on real data and real workloads, and then iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.

Architecture decisions shape performance and operating cost for years afterward. Over that period, optimizing for a single benchmark can hide wider weaknesses in the system. The most robust approach combines learning speed with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.

Strategic implications

Architecture decisions shape performance and operating cost for years afterward.

Technology literacy helps teams choose what is best rather than simply what is newest.

Better engineering choices reduce reliability problems in operation.

In high-quality deployments, each of these translates into measurable operating standards, clear ownership boundaries, and frequent review cycles, so teams scale confidence rather than confusion.

The future of FP8 and low-precision formats

Precision keeps dropping. Beyond FP8 there are 4-bit formats (MXFP4, NVFP4) that share a small scale factor across each small block of values, and Blackwell hardware now accelerates FP4 natively. Expect mixed-precision networks, where different weights use different bit widths, combined with smarter quantization-aware training, so that 4-bit becomes the default for inference. The endgame is fitting models onto fewer, cheaper chips with no measurable loss in quality.
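The idea of sharing one scale across a small block can be sketched as follows. This is a simplified illustration, not the MXFP4 or NVFP4 specification: it assumes a 4-bit element grid with magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6, a power-of-two scale per block, and block sizes chosen only for the demo; real formats differ in block size and in how the scale itself is encoded.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign handled separately).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_fp4(x):
    """Round x to the nearest grid magnitude, keeping its sign."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def block_quantize(values, block_size):
    """Quantize with one shared power-of-two scale per block (assumed scheme)."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Smallest power of two that brings the block maximum down to <= 6.
        scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
        out.extend(quantize_fp4(v / scale) * scale for v in block)
    return out


def mean_abs_error(values, block_size):
    quantized = block_quantize(values, block_size)
    return sum(abs(v - q) for v, q in zip(values, quantized)) / len(values)


# Illustrative tensor with a single outlier at the end.
values = [0.1, -0.2, 0.3, 0.4, 0.5, -0.6, 0.7, 100.0]
print("one scale for all 8 values:", mean_abs_error(values, 8))
print("one scale per 4 values    :", mean_abs_error(values, 4))
```

With a single scale for the whole tensor, the outlier forces every small value to zero. With a scale per block, only the block that contains the outlier pays that cost, which is why smaller shared-scale blocks make 4-bit storage workable.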

Real-world applications

Training large language models on NVIDIA Hopper/Blackwell GPUs with FP8 for roughly twice the throughput of BF16

Serving chatbot inference in FP8 so that a model fits on fewer GPUs and answers more queries per second

Using E5M2 for gradient communication during distributed training to cut network bandwidth between nodes

Running MXFP4/NVFP4-quantized models to fit a frontier-scale model on a single high-memory GPU for cheaper inference
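The "fits on fewer GPUs" claims above come down to simple arithmetic on weight storage. The sketch below assumes a hypothetical 70-billion-parameter model and counts only the weights; it ignores scale-factor metadata, activations, and any inference cache, so real deployments need more memory than shown.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 70_000_000_000  # hypothetical model size, for illustration only

for name, bits in [("FP32", 32), ("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Each halving of the bit width halves the weight footprint, so FP8 needs a quarter of the FP32 storage and half of the BF16 storage.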

Usage patterns

FP8 and low-precision formats in practice

Across the applications listed above, teams typically get better results when they define quality thresholds upfront, maintain a human escalation path for edge cases, and track product impact and the cost of errors over time.

Risks and guardrails

Optimizing for a single benchmark can hide wider weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, security and monitoring challenges can grow.

Implementation roadmap

1. Baseline latency, quality, and cost before adoption.

2. Benchmark under realistic load with real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
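One way to make the evidence gate explicit is to encode the criteria and compare a low-precision candidate against the baseline before each expansion. The sketch below is a minimal illustration: the `Metrics` and `Gate` classes, the field names, and every number are hypothetical placeholders, and real criteria would come from your own baseline measurements.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    quality: float      # higher is better, e.g. an evaluation score
    latency_ms: float   # lower is better
    cost_per_1k: float  # lower is better, cost per 1,000 requests


@dataclass
class Gate:
    max_quality_drop: float
    max_latency_ms: float
    max_cost_per_1k: float


def evaluate_gate(baseline, candidate, gate):
    """Return ("expand" | "pause", list of failed criteria)."""
    failures = []
    if baseline.quality - candidate.quality > gate.max_quality_drop:
        failures.append("quality")
    if candidate.latency_ms > gate.max_latency_ms:
        failures.append("latency")
    if candidate.cost_per_1k > gate.max_cost_per_1k:
        failures.append("cost")
    return ("expand" if not failures else "pause", failures)


# Hypothetical measurements for a BF16 baseline and an FP8 candidate.
baseline = Metrics(quality=0.812, latency_ms=120.0, cost_per_1k=0.40)
candidate = Metrics(quality=0.809, latency_ms=70.0, cost_per_1k=0.22)
gate = Gate(max_quality_drop=0.005, max_latency_ms=100.0, max_cost_per_1k=0.30)

decision, failed = evaluate_gate(baseline, candidate, gate)
print(decision, failed)
```

Writing the thresholds down before the benchmark runs keeps the decision tied to the criteria rather than to whichever result looks best afterwards.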
