Technical guide

Collective Communication and NCCL

Collective communication is how groups of GPUs exchange and combine data, and NCCL is the NVIDIA library that makes those exchanges very fast.

Summary

Operations such as all-reduce are the heartbeat of distributed training: they combine gradients across every GPU at each step.

Collective communication and NCCL form a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

Training a large model means that each GPU computes gradients on its own slice of the data, so all GPUs must agree on one combined result before taking the next step. That coordination is done with collective operations: all-reduce sums the values across GPUs and then gives every GPU the result; all-gather collects each GPU's piece into one complete copy on all of them; broadcast sends one GPU's data to the others; reduce-scatter combines the values and then splits the result across GPUs. NCCL (NVIDIA Collective Communications Library) implements these operations efficiently across the GPUs inside a server and across servers, using topology-aware algorithms such as ring and tree all-reduce. It uses NVLink within a node and InfiniBand or RoCE between nodes, and it is the communication backbone underneath PyTorch DDP, FSDP, DeepSpeed, and Megatron.
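The semantics of the four collectives described here can be illustrated with a small pure-Python sketch. It models each GPU as one list inside an outer list, runs in a single process, and says nothing about how NCCL actually moves data; the function names are illustrative choices, not NCCL's API.

```python
from typing import List

Vector = List[float]


def all_reduce(per_rank: List[Vector]) -> List[Vector]:
    """Sum element-wise across ranks; every rank receives the full sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def all_gather(per_rank: List[Vector]) -> List[Vector]:
    """Concatenate each rank's piece; every rank receives the whole result."""
    gathered = [x for piece in per_rank for x in piece]
    return [list(gathered) for _ in per_rank]


def broadcast(per_rank: List[Vector], root: int = 0) -> List[Vector]:
    """Copy the root rank's buffer to every rank."""
    return [list(per_rank[root]) for _ in per_rank]


def reduce_scatter(per_rank: List[Vector]) -> List[Vector]:
    """Sum across ranks, then give rank i only the i-th slice of the sum."""
    n = len(per_rank)
    total = [sum(vals) for vals in zip(*per_rank)]
    chunk = len(total) // n  # assumes the length divides evenly by n
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [10.0, 20.0]]  # two simulated GPUs
    print(all_reduce(grads))      # [[11.0, 22.0], [11.0, 22.0]]
    print(all_gather(grads))      # [[1.0, 2.0, 10.0, 20.0], [1.0, 2.0, 10.0, 20.0]]
    print(broadcast(grads))       # [[1.0, 2.0], [1.0, 2.0]]
    print(reduce_scatter(grads))  # [[11.0], [22.0]]
```

Note that an all-reduce produces the same result as a reduce-scatter followed by an all-gather, which is the decomposition ring algorithms rely on.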

Technical perspective

Ring all-reduce is the classic algorithm: the GPUs form a logical ring, and the data is split into chunks that circulate around it, with every GPU sending one chunk and receiving another at each step. This makes the algorithm bandwidth-optimal, and the amount of data each GPU sends is nearly independent of the number of GPUs. The number of steps does grow with the length of the ring, so across many nodes tree-based algorithms reduce latency by combining results hierarchically. NCCL detects the topology automatically, picks the most suitable algorithm, and can offload the reduction arithmetic into the network with NVIDIA SHARP, roughly halving the data that has to cross the links.
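To make the ring schedule concrete, here is a minimal single-process simulation, assuming the buffer length is a multiple of the number of ranks. It is a teaching sketch of the textbook algorithm (a reduce-scatter phase followed by an all-gather phase), not NCCL's implementation, and `ring_all_reduce` is a hypothetical name.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> int:
    """Sum-all-reduce `buffers` in place; return the elements each rank sent.

    buffers[r] is rank r's data. Its length must be a multiple of the
    number of ranks so that it splits into equal chunks (a simplification).
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes equal-sized chunks"
    chunk = size // n
    sent_per_rank = 0

    def span(c: int) -> range:
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the complete
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all ranks "send" simultaneously.
        outgoing = []
        for r in range(n):
            c = (r - step) % n
            outgoing.append((c, [buffers[r][i] for i in span(c)]))
        for r in range(n):
            c, data = outgoing[r]
            dst = (r + 1) % n
            for offset, i in enumerate(span(c)):
                buffers[dst][i] += data[offset]
        sent_per_rank += chunk

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        outgoing = []
        for r in range(n):
            c = (r + 1 - step) % n
            outgoing.append((c, [buffers[r][i] for i in span(c)]))
        for r in range(n):
            c, data = outgoing[r]
            dst = (r + 1) % n
            for offset, i in enumerate(span(c)):
                buffers[dst][i] = data[offset]
        sent_per_rank += chunk

    return sent_per_rank


if __name__ == "__main__":
    ranks = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    sent = ring_all_reduce(ranks)
    print(ranks[0], sent)  # [11.0, 22.0, 33.0, 44.0] 4
```

Each rank sends 2 * (n - 1) chunks of size S / n, so the total per rank is 2 * (n - 1) / n * S, which approaches 2 * S as n grows. That is the sense in which the bandwidth cost barely depends on the GPU count, while the 2 * (n - 1) sequential steps explain why latency grows with ring length.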

Collective Communication and NCCL

To build deep understanding, treat collective communication and NCCL as a discipline rather than a single skill: state the benefits you want, make your assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams working with collective communication and NCCL optimize their architecture, data, and infrastructure choices for reliability and cost. They write explicit success criteria, measure them on real data and real workloads, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical knowledge turns into durable capability across products, policy, and operations.

Architecture decisions shape benefits and operating costs for years afterward. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The most robust pattern combines learning speed with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as deployment patterns, user expectations, and regulations evolve.

Strategic implications

Architecture decisions shape benefits and operating costs for years afterward.

Technical literacy helps teams choose what fits best rather than simply what is newest.

Better engineering choices reduce reliability problems in operation.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership, and frequent review cycles, so that teams can build confidence while scaling complexity.

The future of collective communication and NCCL

As clusters grow toward hundreds of thousands of GPUs, communication takes up a growing share of training time, so collective libraries are becoming a hot frontier. Expect deeper in-network computing (switches that perform the reduction), better overlap of computation and communication to hide latency, and lower-precision collectives that cut the number of bytes moved. Competition is also increasing, with multi-vendor efforts and Ethernet-based RDMA pushing alternatives, while NCCL continues to tighten its integration with NVLink, NVSwitch, and emerging optical fabrics.
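The effect of lower-precision collectives on bytes moved is simple arithmetic, sketched below for a ring all-reduce. The element sizes are the standard widths of the named formats; the function name, the one-billion-parameter model, and the eight-GPU ring are hypothetical inputs chosen for illustration. Any accuracy cost of reducing in lower precision is not modeled.

```python
# Bytes per element for a few common numeric formats.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "fp8": 1}


def ring_all_reduce_bytes_sent(num_params: int, dtype: str, num_gpus: int) -> float:
    """Bytes each GPU sends for one ring all-reduce of `num_params` values."""
    payload = num_params * BYTES_PER_ELEMENT[dtype]
    # Ring all-reduce sends 2 * (n - 1) / n times the payload per GPU.
    return 2 * (num_gpus - 1) / num_gpus * payload


if __name__ == "__main__":
    params = 1_000_000_000  # hypothetical model size
    for dtype in ("fp32", "bf16", "fp8"):
        gb = ring_all_reduce_bytes_sent(params, dtype, num_gpus=8) / 1e9
        print(f"{dtype}: {gb:.2f} GB sent per GPU per all-reduce")
```

Halving the element width halves the traffic per step, which is why reduced-precision gradients are attractive when communication dominates step time.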

Real-world applications

Synchronizing gradients at every training step across all GPUs using all-reduce in PyTorch DistributedDataParallel

Sharding optimizer state and gathering parameters on demand with all-gather and reduce-scatter in FSDP or DeepSpeed ZeRO

Broadcasting the initial model weights from one GPU to all the others at the start of a training run

Using ring all-reduce over NVLink and InfiniBand to sustain high bandwidth across multi-node GPU clusters

Usage patterns

Collective communication and NCCL in practice

Synchronizing gradients at every training step across all GPUs using all-reduce in PyTorch DistributedDataParallel. Teams typically get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track task outcomes and the cost of errors over time.
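The idea behind gradient synchronization can be shown without any GPU: each simulated rank computes a gradient on its own shard, an averaging all-reduce combines them, and every rank applies the same update. This is a sketch under simplifying assumptions (a one-parameter linear model, equal-sized shards, a list standing in for the ranks); it is not how DistributedDataParallel is implemented, and all names are illustrative.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, shard: List[Sample]) -> float:
    """d/dw of mean squared error for the model y_hat = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Stand-in for an averaging all-reduce: every rank gets the mean."""
    mean = sum(values) / len(values)
    return [mean for _ in values]


def data_parallel_step(w: float, shards: List[List[Sample]], lr: float) -> List[float]:
    """One synchronous step; returns the updated weight held by each rank."""
    grads = [local_gradient(w, shard) for shard in shards]  # per-rank backward
    synced = all_reduce_mean(grads)                         # gradient sync
    return [w - lr * g for g in synced]                     # identical updates


if __name__ == "__main__":
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
    print(data_parallel_step(0.0, shards, lr=0.01))
```

With equal-sized shards, the averaged gradient equals the gradient computed on the full batch, so the ranks stay in lockstep as if a single device had processed all the data.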

Sharding optimizer state and gathering parameters on demand with all-gather and reduce-scatter in FSDP or DeepSpeed ZeRO. Teams typically get better results when they set sound thresholds up front, keep a human escalation path for edge cases, and track product outcomes and the cost of errors over time.
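The sharded pattern can be sketched in the same single-process style: parameters live in per-rank shards, an all-gather materializes the full vector for compute, and a reduce-scatter hands each rank only the gradient slice it owns. This is a deliberately simplified illustration with plain SGD and hypothetical function names; real FSDP and ZeRO implementations differ in many details, such as per-layer gathering and optimizer state handling.

```python
from typing import List


def all_gather(shards: List[List[float]]) -> List[float]:
    """Every rank ends up with the concatenation of all shards."""
    return [x for shard in shards for x in shard]


def reduce_scatter(full_grads: List[List[float]]) -> List[List[float]]:
    """Sum full gradients across ranks; rank i keeps only slice i."""
    n = len(full_grads)
    total = [sum(vals) for vals in zip(*full_grads)]
    chunk = len(total) // n  # assumes an even split
    return [total[i * chunk:(i + 1) * chunk] for i in range(n)]


def sharded_step(
    param_shards: List[List[float]],
    per_rank_grads: List[List[float]],
    lr: float,
) -> List[List[float]]:
    """One simplified sharded data-parallel SGD update.

    param_shards[i]   : slice of the parameters owned by rank i
    per_rank_grads[i] : rank i's gradient for the full parameter vector
    """
    full_params = all_gather(param_shards)        # materialize for compute
    assert all(len(g) == len(full_params) for g in per_rank_grads)
    grad_shards = reduce_scatter(per_rank_grads)  # each rank gets its slice
    n = len(param_shards)
    # Each rank updates only the shard it owns, using the averaged gradient.
    return [
        [p - lr * g / n for p, g in zip(param_shards[i], grad_shards[i])]
        for i in range(n)
    ]


if __name__ == "__main__":
    params = [[1.0, 2.0], [3.0, 4.0]]
    grads = [[2.0, 4.0, 6.0, 8.0], [2.0, 4.0, 6.0, 8.0]]
    print(sharded_step(params, grads, lr=0.5))  # [[0.0, 0.0], [0.0, 0.0]]
```

The memory saving comes from the fact that each rank keeps only its own shard between steps; the full parameter vector exists only transiently after the all-gather.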

Broadcasting the initial model weights from one GPU to all the others at the start of a training run. Teams typically get better results when they set sound thresholds up front, keep a human escalation path for edge cases, and track product benefits and the cost of errors over time.
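Why the initial broadcast matters can be seen in a toy simulation: if every rank initializes its weights from a different random seed, the replicas disagree until one rank's copy overwrites the rest. The sketch below assumes per-rank seeds and uses hypothetical helper names; it models the semantics only.

```python
import random
from typing import List


def init_weights(seed: int, size: int) -> List[float]:
    """Each simulated rank starts from its own random initialization."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def broadcast(per_rank: List[List[float]], root: int = 0) -> None:
    """Overwrite every rank's buffer with a copy of the root's, in place."""
    source = list(per_rank[root])
    for rank in range(len(per_rank)):
        per_rank[rank][:] = source


if __name__ == "__main__":
    weights = [init_weights(seed=rank, size=4) for rank in range(3)]
    print("identical before:", all(w == weights[0] for w in weights))
    broadcast(weights, root=0)
    print("identical after: ", all(w == weights[0] for w in weights))
```

Starting from identical weights is what lets synchronized gradient updates keep the replicas identical for the rest of the run.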

Using ring all-reduce over NVLink and InfiniBand to sustain high bandwidth across multi-node GPU clusters. Teams typically get better results when they set sound thresholds up front, keep a human escalation path for edge cases, and track product benefits and errors over time.
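Whether a cluster is sustaining high bandwidth is usually judged from measured all-reduce timings. The sketch below follows the convention documented by the nccl-tests benchmark suite, where algorithm bandwidth is size divided by time and bus bandwidth rescales it by 2 * (n - 1) / n so it can be compared against link speed. The function name and the sample measurement are hypothetical.

```python
from typing import Tuple


def all_reduce_bandwidth(
    size_bytes: float, seconds: float, num_gpus: int
) -> Tuple[float, float]:
    """Return (algorithm bandwidth, bus bandwidth) in bytes per second."""
    algbw = size_bytes / seconds
    # The all-reduce correction factor makes busbw comparable to the
    # hardware link bandwidth regardless of the number of ranks.
    busbw = algbw * 2 * (num_gpus - 1) / num_gpus
    return algbw, busbw


if __name__ == "__main__":
    # Hypothetical measurement: 1e9 bytes reduced in 0.5 s on 8 GPUs.
    print(all_reduce_bandwidth(1e9, 0.5, 8))  # (2000000000.0, 3500000000.0)
```

Tracking bus bandwidth across message sizes and node counts gives a direct check on whether the interconnect, rather than the algorithm, is the bottleneck.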

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to interpret, security and oversight problems can multiply.

Deployment roadmap

1. Establish baselines for latency, quality, and cost before deployment.

2. Benchmark under realistic conditions with real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand the deployment.
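An evidence gate of this kind can be expressed as a small check that compares measured values with agreed thresholds and reports what failed. The sketch below is one possible shape; the `Criterion` class, the metric names, and the threshold values are all hypothetical and would need to be replaced with a team's own criteria.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    metric: str            # hypothetical metric name, e.g. "step_time_ms"
    threshold: float
    higher_is_better: bool


def evaluate_gate(measured: Dict[str, float], criteria: List[Criterion]) -> List[str]:
    """Return the names of the criteria that failed; empty means proceed."""
    failed = []
    for c in criteria:
        value = measured.get(c.metric)
        if value is None:
            failed.append(c.metric)  # missing evidence counts as a failure
        elif c.higher_is_better and value < c.threshold:
            failed.append(c.metric)
        elif not c.higher_is_better and value > c.threshold:
            failed.append(c.metric)
    return failed


if __name__ == "__main__":
    criteria = [
        Criterion("step_time_ms", 250.0, higher_is_better=False),
        Criterion("scaling_efficiency", 0.9, higher_is_better=True),
    ]
    failures = evaluate_gate({"step_time_ms": 300.0, "scaling_efficiency": 0.93}, criteria)
    print("pause rollout:" if failures else "proceed:", failures)
```

Treating a missing measurement as a failure keeps the gate honest: a step cannot pass simply because nobody collected the evidence.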
