Technical Guide

Checkpoint Sharding and Resumable Training

The practice of saving training state in pieces (shards) so that very large models can be saved and reloaded without running into memory or disk limits, and so that a crashed run can pick up cleanly from where it left off.

Summary

Checkpoint sharding and resumable training matter for any training job that runs for days or weeks across many GPUs.

Checkpoint sharding and resumable training are building blocks whose design choices directly affect model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

A training checkpoint captures everything needed to resume a job: model weights, optimizer state, the learning-rate schedule, the data-loader position, and random-number-generator seeds and state. For large models this snapshot can run to hundreds of gigabytes, too large for a single file or for one machine's memory. Checkpoint sharding splits the snapshot across many files and ranks, so each GPU writes its own part in parallel. Resumable training loads those shards back and reconstructs the complete state. Without it, a multi-week run that crashes at hour 200 would have to start over from the beginning. Frameworks such as PyTorch and DeepSpeed, and the safetensors format used on the Hugging Face Hub, handle this routinely.
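
To make the list of checkpoint contents concrete, here is a minimal, framework-free sketch. It assumes a toy training state held in plain Python lists instead of tensors; `TrainState`, `train_step`, `capture_state`, and `restore_state` are hypothetical names for illustration, and a real job would use its framework's own state APIs (for example PyTorch's `state_dict()` on models and optimizers). The point it shows is that weights alone are not enough: momentum, step count, schedule value, data offset, and RNG state all have to be saved for a resumed run to match an uninterrupted one.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainState:
    """Hypothetical toy training state; real jobs hold tensors, not lists."""
    weights: list[float]
    momentum: list[float]      # optimizer state (momentum buffers)
    step: int = 0              # optimizer steps completed
    lr: float = 0.1            # current value of the learning-rate schedule
    data_offset: int = 0       # samples consumed by the data loader


def train_step(state: TrainState, batch_size: int = 4) -> None:
    # Toy update: draws from the global RNG stand in for gradients.
    for i in range(len(state.weights)):
        grad = random.gauss(0.0, 1.0)
        state.momentum[i] = 0.9 * state.momentum[i] + grad
        state.weights[i] -= state.lr * state.momentum[i]
    state.step += 1
    state.data_offset += batch_size
    state.lr = 0.1 * 0.99 ** state.step  # toy exponential-decay schedule


def capture_state(state: TrainState) -> dict:
    """Copy everything needed to resume, including the RNG state."""
    return {
        "weights": list(state.weights),
        "momentum": list(state.momentum),
        "step": state.step,
        "lr": state.lr,
        "data_offset": state.data_offset,
        "rng": random.getstate(),
    }


def restore_state(snapshot: dict) -> TrainState:
    """Rebuild the training state and rewind the RNG to the saved point."""
    random.setstate(snapshot["rng"])
    return TrainState(
        weights=list(snapshot["weights"]),
        momentum=list(snapshot["momentum"]),
        step=snapshot["step"],
        lr=snapshot["lr"],
        data_offset=snapshot["data_offset"],
    )
```

If any piece is left out of the snapshot (for example the RNG state or the momentum buffers), the resumed run still trains, but it no longer reproduces the run that was never interrupted.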

Technical perspective

Sharding works because distributed training already spreads weights and optimizer state across ranks (through data, tensor, or ZeRO-style parallelism). Each rank serializes the part it holds, often in formats such as safetensors that support fast, memory-mapped loading. An index file maps parameter names to shard files. To resume deterministically, the system also persists RNG state, the optimizer step count, and the exact data-loader offset, so the rerun sees the same sequence of batches.
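
The shard-plus-index layout can be sketched without any ML framework. This sketch assumes a flat dict mapping parameter names to lists of floats, writes each rank's slice to its own JSON file as a stand-in for a real safetensors shard, and records a `weight_map` from parameter name to file name. The `weight_map` key mirrors the layout of Hugging Face's `model.safetensors.index.json`, but the file naming, the round-robin assignment of parameters to ranks, the `metadata` contents, and the helper names `save_sharded` and `load_sharded` are assumptions for illustration.

```python
import json
import os
import tempfile


def save_sharded(params: dict, num_ranks: int, out_dir: str) -> str:
    """Split params across ranks, write one shard file per rank, plus an index."""
    shards = [{} for _ in range(num_ranks)]
    for i, name in enumerate(sorted(params)):
        shards[i % num_ranks][name] = params[name]   # round-robin placement (assumption)

    weight_map = {}
    for rank, shard in enumerate(shards):
        fname = f"shard-{rank:05d}-of-{num_ranks:05d}.json"
        # In a real job each rank writes its own file concurrently; here we loop.
        with open(os.path.join(out_dir, fname), "w") as f:
            json.dump(shard, f)
        for name in shard:
            weight_map[name] = fname

    index_path = os.path.join(out_dir, "index.json")
    with open(index_path, "w") as f:
        json.dump({"metadata": {"num_params": len(params)}, "weight_map": weight_map}, f, indent=2)
    return index_path


def load_sharded(index_path: str) -> dict:
    """Rebuild the full parameter dict, reading each referenced shard once."""
    base_dir = os.path.dirname(index_path)
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    params = {}
    for fname in sorted(set(weight_map.values())):
        with open(os.path.join(base_dir, fname)) as f:
            params.update(json.load(f))
    return params


if __name__ == "__main__":
    demo = {"embed": [0.1, 0.2], "layer0.w": [1.0, 2.0], "head": [3.0]}
    with tempfile.TemporaryDirectory() as d:
        idx = save_sharded(demo, num_ranks=2, out_dir=d)
        print(load_sharded(idx) == demo)
```

Because the index is the only file that knows where each parameter lives, a loader can read just the shards it needs instead of the whole checkpoint.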

Understanding Checkpoint Sharding and Resumable Training

To build real depth, treat checkpoint sharding and resumable training as a practice rather than a single skill: define the outcomes you want, state your assumptions explicitly, and separate what the system can do reliably from what still needs human judgment.

In practice, strong teams that use checkpoint sharding and resumable training weigh architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, measure against real data and workloads, and document trade-offs in a form others can review, rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into lasting capability in products, policies, and operations.

Architecture decisions create benefits and operating costs that persist for years. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust approach pairs fast iteration with governance discipline: run pilots, collect evidence, publish decisions, and keep updating safeguards as usage patterns, user expectations, and regulatory requirements evolve.

Strategic takeaways

Architecture decisions create benefits and operating costs that persist for years. In strong deployments, this translates into measurable operating targets, named owners, and regular review cadences, so teams can scale into more complex settings with confidence.

Technical understanding helps teams choose what actually works best rather than simply adopting whatever is newest.

Better engineering choices reduce reliability problems in day-to-day operation.

The Future of Checkpoint Sharding and Resumable Training

Checkpointing is moving from a stop-the-world event toward something asynchronous and nearly free. Expect in-memory snapshots and overlapped checkpointing that write shards in the background while training continues, along with erasure-coded and replicated checkpoints that survive the loss of several nodes at thousand-GPU scale. Tiered storage and faster NVMe tiers will absorb shards, and standardized formats such as safetensors will keep getting safer, faster, and more partially loadable, both for resuming training and for serving inference.
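
The shift from stop-the-world to asynchronous checkpointing can be illustrated with a small sketch: the training loop pays only for an in-memory copy of the state, and a background thread serializes that copy to disk while training continues. The write goes to a temporary file that is then renamed with `os.replace`, so a crash mid-write never leaves a half-written file under the final name. `AsyncCheckpointer` is a hypothetical helper; production systems also coordinate ranks, bound the memory used by in-flight snapshots, and surface writer errors, none of which this sketch does. `pickle` is used only for brevity; it is unsafe for untrusted files, which is one reason formats like safetensors exist.

```python
import copy
import os
import pickle
import tempfile
import threading


class AsyncCheckpointer:
    """Hypothetical helper: snapshot in memory, persist in a background thread."""

    def __init__(self, out_dir: str):
        self.out_dir = out_dir
        self._thread = None

    def save(self, state: dict, step: int) -> None:
        self.wait()                          # allow at most one write in flight
        snapshot = copy.deepcopy(state)      # training may keep mutating `state`
        self._thread = threading.Thread(target=self._write, args=(snapshot, step))
        self._thread.start()

    def _write(self, snapshot: dict, step: int) -> None:
        final = os.path.join(self.out_dir, f"ckpt-{step:08d}.pkl")
        fd, tmp = tempfile.mkstemp(dir=self.out_dir, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(snapshot, f)
            f.flush()
            os.fsync(f.fileno())             # push bytes to stable storage
        os.replace(tmp, final)               # atomic rename on POSIX filesystems

    def wait(self) -> None:
        """Block until the pending write (if any) has finished."""
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Calling `wait()` before shutdown, or before the next `save()`, is what guarantees the last snapshot actually reached disk.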

Real-world applications

A frontier model trains across thousands of GPUs that write sharded checkpoints every few hundred steps, so a failed node costs minutes, not days.

Hugging Face distributes large open-weight models as multiple safetensors shards plus an index.json, so users can download and load them shard by shard.

A researcher resumes an interrupted fine-tune, restoring optimizer momentum, the step count, and the data-loader position so the run continues correctly.

Spot-instance training on cheap cloud GPUs, where frequent checkpoints let a job survive preemption and restart.

Usage patterns

Checkpoint Sharding and Resumable Training in Practice

At frontier scale, thousands of GPUs write sharded checkpoints every few hundred steps, so losing one node costs minutes of recomputation rather than days.
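
"Minutes, not days" comes down to simple arithmetic: after a crash, at most one checkpoint interval of work is lost (about half an interval on average, if failures are equally likely at any point), plus the time to restart. The sketch below computes this and also evaluates Young's classic first-order approximation for the interval that balances checkpoint-writing cost against expected rework, T ≈ sqrt(2 · C · M), where C is the time to write one checkpoint and M is the mean time between failures. All inputs in the example are illustrative assumptions, not measurements, and `lost_work_s` and `young_interval_s` are hypothetical helper names.

```python
import math


def lost_work_s(interval_steps: int, step_time_s: float, restart_s: float) -> tuple:
    """Wall-clock time lost per failure: (worst case, average).

    Worst case: the failure hits just before the next checkpoint.
    Average: failures are assumed equally likely anywhere in the interval.
    """
    interval_s = interval_steps * step_time_s
    return interval_s + restart_s, interval_s / 2 + restart_s


def young_interval_s(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical inputs: save every 200 steps, 1.5 s/step, 2-minute restart.
    worst, avg = lost_work_s(200, 1.5, 120.0)
    print(f"lost per failure: worst {worst / 60:.1f} min, average {avg / 60:.1f} min")
    # Hypothetical inputs: 50 s to write a checkpoint, one failure every 10,000 s.
    print(f"Young interval: {young_interval_s(50.0, 10_000.0):.0f} s")
```

With these illustrative numbers a failure costs at most 7 minutes (4.5 on average), and Young's formula suggests checkpointing roughly every 1,000 seconds. Cheaper saves (for example asynchronous ones) shrink C and therefore justify shorter intervals.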

Hugging Face distributes large open-weight models as multiple safetensors shards with an index.json, so users can download and load them one shard at a time. Teams usually get better results when they set clear thresholds up front, keep a human escalation path for edge cases, and track product outcomes and the cost of errors.
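
Shard-by-shard loading works because the index tells the loader which file holds each parameter, so a shard is read only when one of its parameters is actually needed and can be released afterwards. This sketch assumes JSON shard files and an index with a `weight_map` key; `ShardedReader` and its methods are hypothetical. With real safetensors files, the per-shard read would typically go through the `safetensors` library's `safe_open` instead of `json.load`.

```python
import json
import os
import tempfile


class ShardedReader:
    """Hypothetical lazy reader: a shard is read only when one of its params is needed."""

    def __init__(self, index_path: str):
        self.base_dir = os.path.dirname(index_path)
        with open(index_path) as f:
            self.weight_map = json.load(f)["weight_map"]  # param name -> shard file
        self._cache = {}                                   # shard file -> loaded params
        self.shards_opened = 0

    def get(self, name: str):
        fname = self.weight_map[name]                      # KeyError for unknown names
        if fname not in self._cache:
            with open(os.path.join(self.base_dir, fname)) as f:
                self._cache[fname] = json.load(f)
            self.shards_opened += 1
        return self._cache[fname][name]

    def release(self, fname: str) -> None:
        """Free a shard once its parameters have been copied where they belong."""
        self._cache.pop(fname, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        shards = {"part-1.json": {"w1": [1.0]}, "part-2.json": {"w2": [2.0], "w3": [3.0]}}
        for fname, content in shards.items():
            with open(os.path.join(d, fname), "w") as f:
                json.dump(content, f)
        weight_map = {n: fn for fn, c in shards.items() for n in c}
        with open(os.path.join(d, "index.json"), "w") as f:
            json.dump({"weight_map": weight_map}, f)
        reader = ShardedReader(os.path.join(d, "index.json"))
        print(reader.get("w2"), reader.shards_opened)
```

Peak memory is then bounded by the shards held at once rather than by the full model.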

A researcher restarts an interrupted fine-tune by restoring optimizer momentum, the step count, and the data-loader position, so training continues exactly where it stopped.
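
Restoring the data-loader position is what keeps a resumed run from re-seeing or skipping samples. One common approach, sketched here under assumptions, is to derive each epoch's shuffle order from a fixed seed plus the epoch number and to checkpoint only `(epoch, offset)`; on resume, the loader regenerates the same order and skips the samples already consumed. `ResumableLoader` and its `state_dict`/`load_state_dict` methods are hypothetical names that echo a familiar convention but are not any framework's API.

```python
import random


class ResumableLoader:
    """Hypothetical loader whose position can be saved and restored exactly."""

    def __init__(self, dataset, batch_size: int, seed: int):
        self.dataset = list(dataset)
        self.batch_size = batch_size
        self.seed = seed
        self.epoch = 0
        self.offset = 0                 # samples already yielded this epoch
        self._cached = (None, [])       # (epoch, shuffled indices)

    def _order(self) -> list:
        # A private RNG seeded from (seed, epoch) gives the same order on every run.
        if self._cached[0] != self.epoch:
            rng = random.Random(self.seed * 1_000_003 + self.epoch)
            idx = list(range(len(self.dataset)))
            rng.shuffle(idx)
            self._cached = (self.epoch, idx)
        return self._cached[1]

    def next_batch(self) -> list:
        if self.offset >= len(self.dataset):  # epoch exhausted: start the next one
            self.epoch += 1
            self.offset = 0
        picked = self._order()[self.offset:self.offset + self.batch_size]
        self.offset += len(picked)            # last batch of an epoch may be smaller
        return [self.dataset[i] for i in picked]

    def state_dict(self) -> dict:
        return {"epoch": self.epoch, "offset": self.offset}

    def load_state_dict(self, state: dict) -> None:
        self.epoch = state["epoch"]
        self.offset = state["offset"]
```

Saving two integers is enough here only because the shuffle is a pure function of the seed and epoch; a loader that shuffles from global RNG state would also need that state in the checkpoint.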

Spot-instance training on cheap cloud GPUs, where frequent checkpoints let a job survive being preempted and later restored.
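
For spot or preemptible instances, a common pattern is to checkpoint periodically and also react to the termination notice. This sketch assumes the platform sends SIGTERM before reclaiming the machine (the grace period varies by provider), stores each checkpoint as a JSON file named by step, and resumes from the highest-numbered file on startup. `StopFlag`, `latest_checkpoint`, `save`, `run`, and the file layout are hypothetical, and the loop body is a stand-in for real training work.

```python
import json
import os
import re
import signal
import tempfile


class StopFlag:
    """Set by the SIGTERM handler; the training loop polls it between steps."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        self.requested = True   # do no I/O here; just record the request


def latest_checkpoint(ckpt_dir: str):
    """Return the path of the highest-numbered 'step-<n>.json', or None."""
    best = None
    for name in os.listdir(ckpt_dir):
        m = re.fullmatch(r"step-(\d+)\.json", name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), os.path.join(ckpt_dir, name))
    return best[1] if best else None


def save(ckpt_dir: str, state: dict) -> None:
    tmp = os.path.join(ckpt_dir, "partial.tmp")   # never matches the step-<n> pattern
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, os.path.join(ckpt_dir, f"step-{state['step']}.json"))


def run(ckpt_dir: str, total_steps: int, every: int, flag: StopFlag) -> dict:
    signal.signal(signal.SIGTERM, flag.handler)    # must be called from the main thread
    path = latest_checkpoint(ckpt_dir)
    if path:
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "loss_sum": 0.0}
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]   # stand-in for real training work
        done = state["step"] == total_steps
        if state["step"] % every == 0 or flag.requested or done:
            save(ckpt_dir, state)
        if flag.requested:
            break                                  # exit cleanly within the grace period
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        preempted = StopFlag()
        preempted.requested = True                 # simulate a notice arriving at step 1
        run(d, total_steps=10, every=4, flag=preempted)
        final = run(d, total_steps=10, every=4, flag=StopFlag())
        print(final["step"], os.path.basename(latest_checkpoint(d)))
```

The signal handler only sets a flag; the save happens at a step boundary, where the state is consistent.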

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are frequently underestimated.

As systems become harder to understand, safety and observability risks can grow.

Implementation roadmap

1. Establish baselines for latency, quality, and cost before deployment. Treat each step as an evidence gate: if it does not meet its criteria, pause the rollout, close the gap, and only then expand usage.

2. Benchmark under realistic load with real data.

3. Instrument monitoring for errors, drift, and user outcomes.

4. Prepare rollback and incident-response procedures before scaling up.
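
Applied to checkpointing, the evidence-gate and rollback steps can be as simple as refusing to resume from a checkpoint that fails verification. This sketch assumes each checkpoint directory carries a hypothetical `manifest.json` holding a SHA-256 digest per shard file; `verify` recomputes the digests, and `pick_resume_point` walks back from the newest checkpoint to the newest one that passes. The directory layout and function names are illustrative, not any framework's format.

```python
import hashlib
import json
import os
import tempfile
from typing import Optional


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # 1 MiB reads
            h.update(chunk)
    return h.hexdigest()


def write_manifest(ckpt_dir: str) -> None:
    """Record a digest for every shard already written to ckpt_dir."""
    digests = {
        name: sha256_of(os.path.join(ckpt_dir, name))
        for name in sorted(os.listdir(ckpt_dir))
        if name != "manifest.json"
    }
    with open(os.path.join(ckpt_dir, "manifest.json"), "w") as f:
        json.dump(digests, f, indent=2)


def verify(ckpt_dir: str) -> bool:
    """Gate: pass only if the manifest exists and every shard matches it."""
    manifest = os.path.join(ckpt_dir, "manifest.json")
    if not os.path.exists(manifest):
        return False                      # incomplete save: never resume from it
    with open(manifest) as f:
        digests = json.load(f)
    return all(
        os.path.exists(os.path.join(ckpt_dir, name))
        and sha256_of(os.path.join(ckpt_dir, name)) == expected
        for name, expected in digests.items()
    )


def pick_resume_point(checkpoint_dirs: list) -> Optional[str]:
    """Roll back newest-first to the latest checkpoint that passes the gate.

    Assumes directory names sort by step (e.g. zero-padded 'step-00000200').
    """
    for d in sorted(checkpoint_dirs, reverse=True):
        if verify(d):
            return d
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for step in (100, 200):
            d = os.path.join(root, f"step-{step:08d}")
            os.makedirs(d)
            with open(os.path.join(d, "shard-0.bin"), "wb") as f:
                f.write(bytes([step % 256]) * 16)
            write_manifest(d)
            dirs.append(d)
        with open(os.path.join(dirs[1], "shard-0.bin"), "ab") as f:
            f.write(b"torn write")        # simulate corruption in the newest save
        print(os.path.basename(pick_resume_point(dirs)))
```

Writing the manifest last means a save that died partway through has no manifest and is skipped automatically.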
