Technical guide

ZeRO and Sharded Optimizers

ZeRO (Zero Redundancy Optimizer) eliminates the redundant memory that data parallelism wastes by sharding optimizer state, gradients, and weights across GPUs.

Summary

By sharding optimizer state, gradients, and weights across GPUs, ZeRO lets you train very large models with the simplicity of data parallelism at a fraction of the per-GPU memory.

The choice of sharding strategy affects model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

In plain data parallelism, every GPU holds a complete copy of the optimizer state, the gradients, and the parameters. That is wasteful, especially with Adam, where the optimizer state can be several times the size of the model itself. ZeRO, introduced by Microsoft in DeepSpeed, removes this redundancy by sharding those tensors across GPUs so that each device holds only one slice. ZeRO has three progressive stages: stage 1 shards the optimizer state, stage 2 adds gradient sharding, and stage 3 shards the parameters themselves. When a GPU needs shards it does not own, it gathers them through communication, computes, and then releases them. The combined effect is far less memory per GPU, which makes it possible to train models with billions to trillions of parameters while keeping the simple programming model of data parallelism.
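The stage-by-stage savings can be made concrete with the accounting commonly used for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance). The sketch below is only an estimate under those assumptions. `zero_memory_gb` is a hypothetical helper, and activations, temporary buffers, and fragmentation are ignored.

```python
def zero_memory_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU model-state memory in GB (1 GB = 1e9 bytes).

    Assumes mixed-precision Adam: 2 + 2 + 12 bytes per parameter.
    stage 0 means plain data parallelism (no sharding).
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 master weights + Adam momentum + variance
    if stage >= 1:
        optim /= num_gpus       # stage 1: shard optimizer state
    if stage >= 2:
        grads /= num_gpus       # stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus      # stage 3: also shard parameters
    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    # Illustrative setting: 7.5 billion parameters on 64 GPUs.
    for stage in range(4):
        print(f"stage {stage}: {zero_memory_gb(7.5e9, 64, stage):.1f} GB per GPU")
```

Under these assumptions the optimizer state dominates, which is why stage 1 alone already removes most of the redundancy.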

Technical insight

ZeRO trades extra communication for memory savings. In stage 3, before a layer runs its forward pass, an all-gather collects that layer's full parameters on every GPU. Afterwards, each GPU discards the shards it does not own to free the memory. Gradients are reduce-scattered, so each GPU ends up with only the gradient shard that matches the parameters it owns. PyTorch's FSDP (Fully Sharded Data Parallel) uses the same idea, wrapping modules so that they are sharded and unsharded on the fly.
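To illustrate the two collectives named above, here is a single-process toy simulation in plain Python. It models ranks as list entries and is not how DeepSpeed or FSDP implement communication. The function names are hypothetical and chosen only to mirror the operations.

```python
from typing import List


def shard(values: List[float], world_size: int) -> List[List[float]]:
    """Split a flat tensor into equal contiguous shards, one per rank."""
    if len(values) % world_size != 0:
        raise ValueError("length must be divisible by world_size")
    size = len(values) // world_size
    return [values[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards: List[List[float]]) -> List[List[float]]:
    """Every rank ends up with the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


def reduce_scatter(per_rank_grads: List[List[float]]) -> List[List[float]]:
    """Sum full-size gradients across ranks; each rank keeps only its shard."""
    world_size = len(per_rank_grads)
    summed = [sum(column) for column in zip(*per_rank_grads)]
    return shard(summed, world_size)


if __name__ == "__main__":
    owned = shard([1.0, 2.0, 3.0, 4.0], world_size=2)  # each rank owns 2 weights
    full = all_gather(owned)  # layer materialized on every rank for compute
    # Pretend each rank computed a full-size gradient on its own micro-batch.
    grads = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    print(reduce_scatter(grads))  # rank r receives the summed gradient for its shard
    del full  # after compute, the gathered copy is released
```

The key property is symmetry: the shard a rank receives from `reduce_scatter` lines up with the parameter shard it owns, so it can update its slice without seeing the rest.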

Understanding ZeRO and Sharded Optimizers

To build a deeper understanding, treat ZeRO and sharded optimizers as an engineering discipline rather than a single capability: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use ZeRO and sharded optimizers tune their architecture, data, and infrastructure choices for reliability and cost. They write clear success criteria, measure them against real data and real workloads, and iterate with observability in place rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into durable capability in products, policies, and operations.

Architecture decisions shape value and operating cost for years afterwards. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust approach combines fast learning with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as usage patterns, user expectations, and regulations evolve.

Strategic impact

Architecture decisions shape value and operating cost for years afterwards. In high-quality deployments, this translates into measurable operating rules, clear ownership, and frequent review cycles, so that teams can build confidence as complexity grows.

Technical literacy helps teams choose what fits best rather than simply what is newest.

Better engineering choices reduce reliability problems in production.

The future of ZeRO and Sharded Optimizers

Sharding is becoming the default for large-scale training rather than an exotic option. Expect deeper offloading (pushing shards to CPU or NVMe via ZeRO-Infinity), tighter overlap of all-gather and reduce-scatter with computation to hide their cost, and composition with tensor and pipeline parallelism. As models keep growing, memory-efficient sharded optimizers matter most for fitting them into real hardware budgets.

Real-world applications

Use DeepSpeed ZeRO Stage 2 to fine-tune a multi-billion-parameter language model that would otherwise overflow GPU memory.

Train with PyTorch FSDP, which shards parameters, gradients, and optimizer state across GPUs and gathers them layer by layer on demand.

Use ZeRO-Offload to push optimizer state into CPU memory, letting a single GPU train a model several times larger than its VRAM.

Scale a trillion-parameter model with ZeRO-Infinity by streaming parameter shards from NVMe storage once GPU and CPU memory are exhausted.

Usage patterns

ZeRO and Sharded Optimizers in practice

Use DeepSpeed ZeRO Stage 2 to fine-tune a language model with many billions of parameters that would otherwise overflow GPU memory. Teams tend to get better results when they set quality thresholds up front, keep a human escalation path for edge cases, and track product outcomes and the cost of errors over time.
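As a rough sketch of what a Stage 2 setup can look like, the snippet below builds a DeepSpeed-style configuration dictionary and prints it as JSON. The keys under `zero_optimization` follow DeepSpeed's documented configuration schema as far as I know it. The helper name `make_stage2_config` and all numeric values are illustrative assumptions, not recommendations.

```python
import json
from typing import Any, Dict


def make_stage2_config(micro_batch_size: int = 4) -> Dict[str, Any]:
    """Build an illustrative DeepSpeed config that enables ZeRO Stage 2."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": 8,        # assumed value
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,                          # shard optimizer state and gradients
            "overlap_comm": True,                # overlap communication with backward
            "contiguous_gradients": True,        # reduce memory fragmentation
            "reduce_scatter": True,
            "reduce_bucket_size": 500_000_000,   # assumed value, tune per cluster
            "allgather_bucket_size": 500_000_000,
        },
    }


if __name__ == "__main__":
    # The resulting JSON would typically be saved and passed to the DeepSpeed launcher.
    print(json.dumps(make_stage2_config(), indent=2))
```

Check the key names against the DeepSpeed version you actually run before relying on this.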

Train with PyTorch FSDP, which shards parameters, gradients, and optimizer state across GPUs and gathers them layer by layer on demand.

Use ZeRO-Offload to push optimizer state into CPU memory, letting a single GPU train a model several times larger than its VRAM.
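One way to express optimizer offloading is as a small change to an existing DeepSpeed-style configuration. The sketch below adds an `offload_optimizer` section targeting CPU memory. `with_cpu_offload` is a hypothetical helper, and the `device` and `pin_memory` keys are given as I understand the DeepSpeed schema, so verify them against your installed version.

```python
import copy
from typing import Any, Dict


def with_cpu_offload(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a DeepSpeed-style config with optimizer state offloaded to CPU."""
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    zero = new_config.setdefault("zero_optimization", {"stage": 2})
    zero["offload_optimizer"] = {
        "device": "cpu",      # keep optimizer state in host memory
        "pin_memory": True,   # pinned buffers speed up host-to-device copies
    }
    return new_config


if __name__ == "__main__":
    base = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 2}}
    print(with_cpu_offload(base)["zero_optimization"])
```

The price of offloading is extra traffic between host and device on every step, so throughput should be measured rather than assumed.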

Scale a trillion-parameter model with ZeRO-Infinity by streaming parameter shards from NVMe storage once GPU and CPU memory are exhausted.
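For NVMe offloading, a configuration in the same style might look like the sketch below. It assumes ZeRO Stage 3 with both parameters and optimizer state directed to an NVMe path. The helper `make_infinity_config` and the mount point `/local_nvme` are hypothetical, and the key names reflect my understanding of the DeepSpeed schema rather than a verified setup.

```python
from typing import Any, Dict


def make_infinity_config(nvme_path: str = "/local_nvme") -> Dict[str, Any]:
    """Build an illustrative ZeRO Stage 3 config that offloads to NVMe storage."""
    offload_target = {
        "device": "nvme",
        "nvme_path": nvme_path,  # hypothetical mount point, adjust to your machine
        "pin_memory": True,
    }
    return {
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,                              # parameter sharding is required
            "offload_param": dict(offload_target),   # stream parameter shards from NVMe
            "offload_optimizer": dict(offload_target),
        },
    }


if __name__ == "__main__":
    print(make_infinity_config()["zero_optimization"]["offload_param"])
```

NVMe bandwidth usually becomes the bottleneck in this regime, so it is worth benchmarking the storage path before committing to a long run.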

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, security and monitoring problems can multiply.

Implementation roadmap

1. Establish baselines for latency, quality, and cost before deployment.

2. Benchmark under production-like load with real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
