Technical Guide

Collective Communication and NCCL

Collective communication is how a group of GPUs exchanges and combines data, and NCCL is the NVIDIA library that makes that exchange fast.

Overview

All-reduce operations are the heartbeat of distributed training: they synchronize gradients across all GPUs at every step.

Working with collective communication and NCCL is an engineering discipline that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Training a large model means that each GPU computes gradients on its own shard of the data, and then all GPUs must agree on a combined result before the next step. That coordination is done with collective operations: all-reduce sums values across GPUs and gives every GPU the result; all-gather collects each GPU's piece into a full copy on all of them; broadcast sends one GPU's data to the others; reduce-scatter combines the data and then splits the result across GPUs. NCCL (NVIDIA Collective Communications Library) implements these operations efficiently across the GPUs in a server and across servers, using topology-aware algorithms such as ring and tree all-reduce. It uses NVLink within a node and InfiniBand or RoCE between nodes, and it is the communication backend beneath PyTorch DDP, FSDP, DeepSpeed, and Megatron.
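To make the four operations concrete, here is a minimal single-process sketch that models each rank's buffer as a plain Python list. The function names mirror the collective names, but they are illustrative stand-ins and not NCCL or PyTorch calls; the sketch also assumes that buffer lengths divide evenly by the number of ranks.

```python
from typing import List

Buffer = List[float]  # the local data held by one rank (one GPU)


def all_reduce(buffers: List[Buffer]) -> List[Buffer]:
    """Every rank ends with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def all_gather(buffers: List[Buffer]) -> List[Buffer]:
    """Every rank ends with the concatenation of all ranks' pieces."""
    full = [x for buf in buffers for x in buf]
    return [list(full) for _ in buffers]


def broadcast(buffers: List[Buffer], root: int = 0) -> List[Buffer]:
    """Every rank ends with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def reduce_scatter(buffers: List[Buffer]) -> List[Buffer]:
    """Sum element-wise, then rank r keeps only the r-th equal slice."""
    n = len(buffers)
    total = [sum(vals) for vals in zip(*buffers)]
    chunk = len(total) // n  # assumption: length divides evenly by n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


ranks = [[1.0, 2.0], [10.0, 20.0]]  # two ranks, two values each
print(all_reduce(ranks))      # [[11.0, 22.0], [11.0, 22.0]]
print(all_gather(ranks))      # [[1.0, 2.0, 10.0, 20.0], [1.0, 2.0, 10.0, 20.0]]
print(broadcast(ranks))       # [[1.0, 2.0], [1.0, 2.0]]
print(reduce_scatter(ranks))  # [[11.0], [22.0]]
```

Note that a reduce-scatter followed by an all-gather produces the same result as an all-reduce, which is how the ring algorithm is usually built.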

Technical Insight

Ring all-reduce is the classic algorithm: the GPUs form a logical ring, and the data is split into chunks that circulate around it so that every GPU sends and receives at every step. This makes the overall transfer bandwidth-optimal and nearly independent of the number of GPUs. At large node counts, tree-based algorithms reduce latency by combining results hierarchically. NCCL auto-detects the topology, picks the best algorithm, and can offload the reduction math into the network with NVIDIA SHARP, reducing the amount of data that has to cross the links.
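The ring algorithm described above can be sketched as a single-process simulation: a reduce-scatter pass around the ring followed by an all-gather pass. This is an illustrative model of the textbook algorithm, not NCCL's implementation; `ring_all_reduce` is a hypothetical name, and the sketch assumes the buffer splits into as many equal chunks as there are ranks. It also counts the elements each rank sends, which comes out to 2(N-1)/N times the buffer size and is why the cost is nearly independent of N.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> int:
    """Sum buffers in place across ranks; return elements sent per rank."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes n equal chunks"
    chunk = size // n
    sent_per_rank = 0

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): at each step rank r sends one chunk to
    # rank r+1, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot first: all ranks send simultaneously.
        outgoing = [list(buffers[r][span((r - step) % n)]) for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            s = span((r - step) % n)
            buffers[dst][s] = [a + b for a, b in zip(buffers[dst][s], outgoing[r])]
        sent_per_rank += chunk

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2 (all-gather): pass the finished chunks around the ring.
    for step in range(n - 1):
        outgoing = [list(buffers[r][span((r + 1 - step) % n)]) for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            buffers[dst][span((r + 1 - step) % n)] = outgoing[r]
        sent_per_rank += chunk

    return sent_per_rank


# Four ranks, eight elements each; rank r holds r*10 + i at index i.
bufs = [[float(r * 10 + i) for i in range(8)] for r in range(4)]
sent = ring_all_reduce(bufs)
print(bufs[0])  # [60.0, 64.0, 68.0, 72.0, 76.0, 80.0, 84.0, 88.0]
print(sent)     # 12, i.e. 2 * (4 - 1) / 4 * 8 elements per rank
```

Each rank only ever talks to its ring neighbor, so every link carries the same load at every step; the price is 2(N-1) sequential steps, which is the latency cost that tree algorithms aim to cut.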

Mastering Collective Communication and NCCL

To build a deep understanding, treat collective communication and NCCL as a working system rather than a single component: define requirements, make assumptions explicit, and separate what reliable automation can handle from what still needs expert judgment.

In practice, strong teams that apply collective communication and NCCL weigh architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test against real data and workflows, and iterate based on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Infrastructure decisions drive performance and cloud operating cost. At the same time, optimizing for a single benchmark can hide broader system weaknesses. A robust approach combines fast experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Infrastructure decisions drive performance and cloud operating cost.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering decisions reduce reliability incidents in production.

In high-quality deployments, each of these translates into measurable operating policies, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.

The Future of Collective Communication and NCCL

As clusters scale to hundreds of thousands of GPUs, communication increasingly dominates training time, so collective libraries are an active frontier. Expect deeper in-network computing (switches performing the reductions), better overlap of computation and communication to hide latency, and lower-precision collectives that shrink the number of bytes moved. Competition is growing too, with cross-vendor efforts and Ethernet-based RDMA pushing alternatives, while NCCL continues to tighten its integration with NVLink, NVSwitch, and emerging optical fabrics.
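As a rough illustration of the lower-precision idea, the sketch below quantizes each rank's values to 8-bit integer codes with one shared scale before the exchange and sums the dequantized values on receipt. This is a toy scheme chosen for clarity, not a description of any NCCL feature; the function names are hypothetical, and real systems differ in how they scale, accumulate, and correct for quantization error.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    peak = max(abs(v) for v in values)
    scale = peak / 127.0 if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def compressed_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Each rank 'sends' int8 codes plus one scale; sums are done in float."""
    payloads = [quantize_int8(buf) for buf in buffers]
    total = [0.0] * len(buffers[0])
    for codes, scale in payloads:
        for i, v in enumerate(dequantize(codes, scale)):
            total[i] += v
    return [list(total) for _ in buffers]


grads = [[0.5, -1.0, 0.25], [0.1, 0.2, -0.3]]
approx = compressed_all_reduce(grads)[0]
exact = [sum(col) for col in zip(*grads)]
# Each rank contributes at most half a quantization step of error.
bound = sum(max(abs(v) for v in g) / 254.0 for g in grads)
print(max(abs(a - e) for a, e in zip(approx, exact)) <= bound + 1e-12)  # True
```

With one byte per value instead of four for 32-bit floats, the payload is roughly a quarter of the size, at the cost of a bounded rounding error per rank.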

Real-World Implementation

Synchronizing gradients across all GPUs at every training step using all-reduce in PyTorch DistributedDataParallel.

Sharding optimizer states and gathering parameters on demand with all-gather and reduce-scatter in FSDP or DeepSpeed ZeRO.

Broadcasting the initial model weights from one GPU to all the others at the start of a training run.

Running ring all-reduce over NVLink and InfiniBand to sustain high bandwidth on multi-node GPU clusters.
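The first and third uses above (an initial broadcast followed by a gradient all-reduce on every step) can be sketched without any GPU at all. The toy below simulates data-parallel training of a one-parameter model y = w * x in a single process; it is a conceptual model of what DistributedDataParallel arranges, not its API, and the data, learning rate, and function names are all made up for illustration.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one rank


def local_gradient(w: float, shard: Shard) -> float:
    """Gradient of mean squared error for y = w * x on one rank's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Stand-in for all-reduce followed by division by the rank count."""
    mean = sum(values) / len(values)
    return [mean for _ in values]


def train(shards: List[Shard], initial: List[float],
          steps: int, lr: float) -> List[float]:
    # Broadcast: every replica starts from rank 0's weight.
    weights = [initial[0] for _ in shards]
    for _ in range(steps):
        # Each rank computes a gradient on its own data only.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # All-reduce: every rank receives the same averaged gradient.
        synced = all_reduce_mean(grads)
        # Identical updates keep the replicas identical.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


# Hypothetical data generated from y = 3x, split across two ranks.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weights = train(shards, initial=[0.5, -1.0], steps=20, lr=0.05)
print(weights[0] == weights[1])         # True
print(abs(weights[0] - 3.0) < 1e-6)     # True
```

Because the shards are the same size, the averaged gradient equals the gradient over the full dataset, so the replicas follow the same trajectory a single worker would.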

Implementation Patterns

Collective Communication and NCCL in practice

For each of the four uses listed under Real-World Implementation, teams typically get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product outcomes and the cost of errors over time.
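The sharded-optimizer pattern (reduce-scatter the gradients, update only the local shard, all-gather the new parameters) can be sketched as follows. This is a single-process toy in the spirit of FSDP or DeepSpeed ZeRO, not their actual implementation; `sharded_step`, the momentum rule, and the numbers are assumptions made for illustration, and the parameter count is assumed to divide evenly by the number of ranks.

```python
from typing import List


def sharded_step(params: List[float],
                 grads_per_rank: List[List[float]],
                 momentum_shards: List[List[float]],
                 lr: float = 0.1, beta: float = 0.9) -> List[float]:
    """One optimizer step where rank r owns only the r-th parameter slice."""
    n = len(grads_per_rank)
    chunk = len(params) // n  # assumption: divides evenly

    # Reduce-scatter: average gradients, then rank r keeps slice r only.
    mean_grad = [sum(g) / n for g in zip(*grads_per_rank)]
    grad_shards = [mean_grad[r * chunk:(r + 1) * chunk] for r in range(n)]

    new_shards = []
    for r in range(n):
        own_params = params[r * chunk:(r + 1) * chunk]
        state = momentum_shards[r]  # optimizer state lives on rank r only
        for i, g in enumerate(grad_shards[r]):
            state[i] = beta * state[i] + g
        new_shards.append([p - lr * m for p, m in zip(own_params, state)])

    # All-gather: every rank rebuilds the full, updated parameter vector.
    return [p for shard in new_shards for p in shard]


params = [1.0, 2.0, 3.0, 4.0]
grads = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]  # one list per rank
momentum = [[0.0, 0.0], [0.0, 0.0]]  # each rank stores half the state
updated = sharded_step(params, grads, momentum)
print([round(p, 6) for p in updated])  # [0.8, 1.8, 2.8, 3.8]
```

The memory saving comes from `momentum_shards`: each rank holds optimizer state for only 1/N of the parameters, and pays for it with one reduce-scatter and one all-gather per step.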

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and tuning costs are often underestimated.

Security and observability gaps can widen as systems grow more complex.

Implementation Roadmap

1. Define latency, quality, and cost targets before implementation.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user behavior.

4. Prepare rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
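For the benchmarking step, one way to turn a measured all-reduce time into a number that can be compared against a target is sketched below. The algorithm-bandwidth and bus-bandwidth definitions follow the convention used by the nccl-tests benchmark suite as we understand it (bus bandwidth for all-reduce is algorithm bandwidth times 2(N-1)/N); the helper names, the measurement, and the target value are hypothetical.

```python
from typing import Tuple


def all_reduce_bandwidth(bytes_per_rank: float, seconds: float,
                         ranks: int) -> Tuple[float, float]:
    """Return (algorithm bandwidth, bus bandwidth) in GB/s for an all-reduce."""
    alg_bw = bytes_per_rank / seconds / 1e9
    # Scale by the traffic each rank actually moves so the figure can be
    # compared with link speed regardless of the number of ranks.
    bus_bw = alg_bw * 2 * (ranks - 1) / ranks
    return alg_bw, bus_bw


def passes_gate(measured_gbps: float, target_gbps: float) -> bool:
    """Evidence gate: expand the rollout only if the target is met."""
    return measured_gbps >= target_gbps


# Hypothetical measurement: 4 GB per rank reduced across 8 ranks in 0.5 s.
alg_bw, bus_bw = all_reduce_bandwidth(4e9, 0.5, ranks=8)
print(alg_bw, bus_bw)                         # 8.0 14.0
print(passes_gate(bus_bw, target_gbps=20.0))  # False: pause and investigate
```

Recording the target before the measurement, as step 1 asks, is what makes the gate meaningful; the same check can be rerun after each tuning change.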
