Technical guide

InfiniBand and RDMA Networking

InfiniBand is a fast, low-latency interconnect that links servers and GPUs in AI clusters. RDMA lets one machine read or write another machine's memory without involving that machine's CPU.

Summary

Together, InfiniBand and RDMA form the plumbing that keeps thousands of GPUs fed with data during large-model training.

InfiniBand and RDMA networking shapes technical decisions that affect model quality, infrastructure cost, latency, and reliability at scale.

Deep dive

When you train a model on thousands of GPUs, the network is often the bottleneck, not the chips. InfiniBand is a switched fabric built for exactly this: it delivers per-link bandwidth in the hundreds of gigabits per second (NDR runs at 400 Gb/s) and latency on the microsecond scale. The most important piece is Remote Direct Memory Access (RDMA), which moves data between two nodes without passing through the operating-system kernel or the CPU copies that slow down TCP/IP. This "kernel bypass" frees CPU cycles and reduces latency. InfiniBand also provides hardware-level flow control for a lossless fabric, and NVIDIA's Quantum switches and ConnectX adapters dominate in large AI supercomputers. RoCE (RDMA over Converged Ethernet) brings similar benefits to Ethernet networks.
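To make the bandwidth figures concrete, here is a minimal sketch of the ideal time needed to move a payload across one link. It assumes a perfectly utilized link and ignores protocol overhead and congestion; the function name `transfer_time_ms` and the 1 GB payload are illustrative assumptions, not part of any library.

```python
def transfer_time_ms(payload_bytes: int, link_gbps: float, latency_us: float = 0.0) -> float:
    """Ideal one-way transfer time in milliseconds.

    Assumes the link runs at its full nominal rate (link_gbps, in Gb/s)
    and adds an optional fixed latency term (latency_us, in microseconds).
    Real transfers are slower because of encoding and protocol overhead.
    """
    serialization_ms = payload_bytes * 8 / (link_gbps * 1e9) * 1e3
    return latency_us / 1e3 + serialization_ms


if __name__ == "__main__":
    payload = 10**9  # hypothetical 1 GB gradient buffer
    # Nominal per-link rates quoted in this guide.
    for name, gbps in [("NDR", 400), ("XDR", 800)]:
        print(f"{name}: {transfer_time_ms(payload, gbps):.2f} ms")
```

Under these assumptions, a 1 GB payload needs about 20 ms at 400 Gb/s. That is long enough that a training step exchanging gradients every iteration is sensitive to link speed.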

Technical view

RDMA is built on verbs and queue pairs. An application posts work requests to a send queue and a receive queue; the network adapter (HCA) reads them and transfers the data into pre-registered regions that are pinned in memory on the remote host. Because the NIC drives the transfer in hardware and the OS kernel is bypassed, there are no intermediate data copies and no per-packet CPU interrupts. InfiniBand's credit-based link-level flow control prevents buffer overflow, so the fabric does not drop packets under congestion and avoids the retransmissions such drops would cause.
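The interaction between posted work requests and credit-based flow control can be sketched with a toy model. This is a pure-Python simulation, not the real verbs API: `ToyQueuePair`, `WorkRequest`, `post_send`, `progress`, and `return_credits` are hypothetical names chosen for illustration, only loosely analogous to posting a send and polling a completion queue in libibverbs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    wr_id: int
    payload: bytes


@dataclass
class ToyQueuePair:
    """Toy send side of a queue pair with link-level credits (illustrative only)."""
    credits: int  # free receive buffers advertised by the peer
    send_queue: deque = field(default_factory=deque)
    completions: list = field(default_factory=list)
    remote_memory: list = field(default_factory=list)  # stands in for a registered region

    def post_send(self, wr: WorkRequest) -> None:
        # The application only enqueues; the "adapter" does the transfer later.
        self.send_queue.append(wr)

    def progress(self) -> int:
        """Transfer queued requests while credits remain; return the number sent."""
        sent = 0
        while self.send_queue and self.credits > 0:
            wr = self.send_queue.popleft()
            self.remote_memory.append(wr.payload)  # data lands in remote memory
            self.credits -= 1                      # one remote buffer consumed
            self.completions.append(wr.wr_id)      # completion reported to the app
            sent += 1
        return sent

    def return_credits(self, n: int) -> None:
        # The receiver frees buffers and advertises new credits.
        self.credits += n


if __name__ == "__main__":
    qp = ToyQueuePair(credits=2)
    for i in range(3):
        qp.post_send(WorkRequest(i, b"chunk"))
    print(qp.progress(), "sent;", len(qp.send_queue), "waiting for credit")
    qp.return_credits(1)
    print(qp.progress(), "sent after credit return")
```

The point of the sketch is that the sender stalls when credits run out instead of overflowing the receiver's buffers, which is why nothing is dropped and nothing needs to be resent.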

Understanding InfiniBand and RDMA Networking

To build a deep understanding, treat InfiniBand and RDMA networking as a class of work rather than a single capability: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams working with InfiniBand and RDMA networking balance architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, measure them on real data and real workloads, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical knowledge turns into lasting capability across products, policy, and operations.

Architecture decisions drive performance and operating cost for years afterward. At the same time, optimizing for a single benchmark can hide broader weaknesses in the system. The strongest pattern combines fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as deployment patterns, user expectations, and regulations evolve.

Strategic impact

Architecture decisions drive performance and operating cost for years afterward.

Technical literacy helps teams choose what is best, not merely what is newest.

Better engineering choices reduce reliability problems in production.

In high-quality deployments, each of these translates into measurable operating standards, clear ownership, and frequent review cycles, so teams can build confidence while scaling complexity.

The future of InfiniBand and RDMA Networking

Bandwidth keeps growing: XDR InfiniBand targets 800 Gb/s per link, with roadmaps that reach 1.6 Tb/s. Competition is intensifying as the Ultra Ethernet Consortium develops Ethernet that can match InfiniBand for AI workloads, and as in-network computing (SHARP) moves collective math into the switches themselves. Expect tighter GPU-network integration, optical interconnects to cut power, and fabrics scaled to the hundreds of thousands of accelerators that frontier models demand.

Real-world applications

Connect thousands of GPUs into one AI supercomputer so that gradient data moves between nodes in microseconds during distributed training.

Let one server read another server's memory directly (RDMA) to speed up distributed file systems and databases without CPU overhead.

Run NCCL all-reduce over InfiniBand to synchronize model weights across a GPU cluster.

Use RoCE to bring RDMA-style low-latency transfers to existing Ethernet data-center networks.
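To illustrate what an all-reduce does on the wire, the sketch below simulates a ring all-reduce in plain Python. It is not NCCL's implementation and makes no claim about which algorithm NCCL selects in a given run; the ring is simply one well-known schedule. The function name `ring_allreduce` and the requirement that the vector length be divisible by the number of ranks are assumptions made to keep the example short.

```python
def ring_allreduce(vectors):
    """Simulate a sum all-reduce over a ring of len(vectors) ranks.

    Each rank starts with its own vector; every rank ends with the
    element-wise sum. Assumes the vector length is divisible by the
    number of ranks.
    """
    n = len(vectors)
    length = len(vectors[0])
    assert length % n == 0, "vector length must be divisible by rank count"
    chunk = length // n
    data = [list(v) for v in vectors]  # per-rank working copies

    def span(c):
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the full
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        outgoing = [((r - step) % n, None) for r in range(n)]
        outgoing = [(c, [data[r][i] for i in span(c)]) for r, (c, _) in enumerate(outgoing)]
        for r, (c, values) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(span(c), values):
                data[dst][i] += v

    # Phase 2: all-gather. Fully reduced chunks circulate around the ring.
    for step in range(n - 1):
        outgoing = [((r + 1 - step) % n, None) for r in range(n)]
        outgoing = [(c, [data[r][i] for i in span(c)]) for r, (c, _) in enumerate(outgoing)]
        for r, (c, values) in enumerate(outgoing):
            dst = (r + 1) % n
            for i, v in zip(span(c), values):
                data[dst][i] = v

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for r, vec in enumerate(ring_allreduce(ranks)):
        print(f"rank {r}: {vec}")
```

In this schedule each rank sends 2 × (n − 1) chunks of size length / n, so per-rank traffic stays close to twice the vector size regardless of cluster size. That is why per-link bandwidth and latency, rather than CPU speed, tend to set the pace of the collective.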

Usage patterns

InfiniBand and RDMA Networking in practice

Across the applications listed above, teams typically get better results when they set quality thresholds up front, keep a human escalation path for edge cases, and track product impact and error costs over time.

Risks and guardrails

Optimizing for a single benchmark can hide broader weaknesses in the system.

Infrastructure and maintenance costs are often underestimated.

As systems become harder to understand, safety and monitoring challenges can grow.

Implementation roadmap

1. Baseline latency, quality, and cost before you deploy.

2. Benchmark on real workloads and real data.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident-response paths before you scale.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand deployment.
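The evidence-gate idea in the roadmap can be expressed as a small check. This is a minimal sketch under stated assumptions: the `Criterion` class, the `gate` function, and the example metric names and thresholds are all hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    measured: float
    limit: float
    higher_is_better: bool = False  # False: measured must be <= limit

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.limit
        return self.measured <= self.limit


def gate(criteria):
    """Return ("expand", []) if every criterion passes, else ("pause", failed_names)."""
    failed = [c.name for c in criteria if not c.passed()]
    return ("expand", []) if not failed else ("pause", failed)


if __name__ == "__main__":
    # Hypothetical measurements and thresholds for one rollout stage.
    checks = [
        Criterion("p99_latency_us", measured=4.2, limit=5.0),
        Criterion("link_throughput_gbps", measured=310.0, limit=350.0, higher_is_better=True),
    ]
    decision, failed = gate(checks)
    print(decision, failed)
```

Keeping the thresholds in data rather than in people's heads makes the pause-or-expand decision repeatable from one rollout stage to the next.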
