Technical Guide

ZeRO and Sharded Optimizers

ZeRO (Zero Redundancy Optimizer) eliminates the memory wasted by replication in data parallelism by sharding optimizer state, gradients, and weights across GPUs.

Overview

By sharding optimizer state, gradients, and weights across GPUs, ZeRO lets you train larger models with the simplicity of data parallelism at a fraction of the per-GPU memory.

ZeRO and sharded optimizers are a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

In standard data parallelism, every GPU keeps a full, redundant copy of the optimizer state, gradients, and parameters. This is highly wasteful, especially with Adam, where the optimizer state can be several times the size of the model itself. ZeRO, introduced by Microsoft in DeepSpeed, removes this redundancy by partitioning these tensors across GPUs so that each device holds only a slice. ZeRO comes in three progressive stages: Stage 1 shards the optimizer state, Stage 2 adds gradient sharding, and Stage 3 shards the parameters themselves. When a GPU needs shards it does not own, it gathers them through communication, computes, and then releases them. The result is much lower memory use per GPU, which enables billion- to trillion-parameter training while keeping the simple programming model of data parallelism.
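To make the three stages concrete, the sketch below estimates per-GPU memory for model states only. It assumes mixed-precision Adam with the byte counts commonly used in the ZeRO accounting: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for the fp32 master weights, momentum, and variance. The function name `zero_memory_gb` and the factor `k=12` are illustrative choices, and activations, buffers, and fragmentation are ignored.

```python
def zero_memory_gb(num_params, num_gpus, stage, k=12):
    """Approximate per-GPU memory for model states, in GB (1 GB = 1e9 bytes).

    Assumes mixed-precision Adam: fp16 weights and gradients (2 bytes each)
    plus k bytes per parameter of fp32 optimizer state.
    stage 0 means plain data parallelism with nothing sharded.
    """
    params = 2 * num_params  # fp16 parameters
    grads = 2 * num_params   # fp16 gradients
    optim = k * num_params   # fp32 master weights + Adam momentum + variance
    if stage >= 1:
        optim /= num_gpus    # Stage 1 shards the optimizer state
    if stage >= 2:
        grads /= num_gpus    # Stage 2 also shards gradients
    if stage >= 3:
        params /= num_gpus   # Stage 3 also shards parameters
    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    for stage in range(4):
        print(f"stage {stage}: {zero_memory_gb(7.5e9, 64, stage):.1f} GB")
```

Under these assumptions, a 7.5-billion-parameter model on 64 GPUs needs about 120 GB per GPU without sharding, 31.4 GB with Stage 1, 16.6 GB with Stage 2, and 1.9 GB with Stage 3.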

Technical Insight

ZeRO trades extra communication for memory savings. In Stage 3, before a layer's forward pass, an all-gather assembles that layer's full parameters on every GPU; afterwards, the shards a GPU does not own are discarded to reclaim memory. Gradients are reduce-scattered, so each GPU keeps only the gradient slice that corresponds to the parameters it owns. PyTorch's FSDP (Fully Sharded Data Parallel) implements the same idea natively, wrapping modules so they are sharded and unsharded on the fly.
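The communication pattern described above can be simulated in plain Python, with lists standing in for GPU tensors. This is a toy sketch, not how DeepSpeed or FSDP implement it: the helper names, the loss function, and the two-rank setup are all assumptions made for illustration, and gradients are summed rather than averaged across ranks.

```python
def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def reduce_scatter(per_rank_grads):
    """Sum gradients element-wise across ranks; rank r keeps only slice r."""
    world = len(per_rank_grads)
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    size = len(summed) // world
    return [summed[r * size:(r + 1) * size] for r in range(world)]


def local_gradient(full_params, inputs):
    """Toy loss 0.5 * sum((w * x)^2); its gradient w.r.t. w is w * x^2."""
    return [w * x * x for w, x in zip(full_params, inputs)]


def stage3_step(param_shards, inputs_per_rank):
    """One simulated Stage 3 step for a single layer."""
    gathered = all_gather(param_shards)  # full layer materialized on each rank
    grads = [local_gradient(p, x) for p, x in zip(gathered, inputs_per_rank)]
    del gathered                         # non-owned parameters are released
    return reduce_scatter(grads)         # each rank keeps grads for its shard


if __name__ == "__main__":
    shards = [[1.0, 2.0], [3.0, 4.0]]            # rank 0 and rank 1 each own half
    inputs = [[1.0] * 4, [2.0] * 4]              # each rank sees different data
    print(stage3_step(shards, inputs))
```

Each rank ends up holding only the summed gradient for the parameters it owns, which is exactly what its local optimizer shard needs for the update.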

Mastering ZeRO and Sharded Optimizers

To build a deeper understanding, treat ZeRO and sharded optimizers as a system to be engineered rather than a single feature: define the requirements, make the assumptions explicit, and separate what can be handled by reliable automation from what still needs expert judgment.

In practice, strong teams that adopt ZeRO and sharded optimizers weigh architecture, data, and infrastructure choices against reliability and cost. They document success criteria explicitly, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Infrastructure decisions drive performance and cloud operating cost. At the same time, optimizing for a single benchmark can hide broader system weaknesses. A robust approach combines fast experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Infrastructure decisions drive performance and cloud operating cost.

Technical education helps teams choose the right stack, not just the newest one.

Better engineering decisions reduce reliability incidents in production.

In production-grade deployments, each of these translates into measurable operating policies, ownership boundaries, and repeatable review practices, so that teams build confidence rather than accumulate ambiguity.

The Future of ZeRO and Sharded Optimizers

Sharding is becoming the default for large-scale training rather than an optional extra. Expect deeper integration with offloading (pushing shards to CPU or NVMe through ZeRO-Infinity), better overlap of all-gather and reduce-scatter with compute to hide their cost, and combinations with tensor and pipeline parallelism. As models keep growing, memory-efficient sharded optimizers are central to fitting them within realistic hardware budgets.
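Overlapping all-gather with compute usually means issuing the gather for upcoming layers early, which costs some memory. The sketch below is a simplified model of that trade-off, not a description of any framework's scheduler: it counts only parameter elements during a forward pass, assumes each GPU always holds its own shard of every layer, and treats `prefetch` (a hypothetical knob) as the number of following layers gathered ahead of time.

```python
def peak_param_elements(layer_sizes, world_size, prefetch=0):
    """Peak parameter elements resident on one GPU during a forward pass.

    Counts the persistent 1/world_size shard of every layer, plus full
    copies of the layer being computed and of up to `prefetch` following
    layers whose all-gather was issued early to overlap with compute.
    """
    shard_total = sum(layer_sizes) / world_size
    peak_gathered = max(
        sum(layer_sizes[i:i + 1 + prefetch]) for i in range(len(layer_sizes))
    )
    return shard_total + peak_gathered


if __name__ == "__main__":
    layers = [100, 400, 200, 100]  # illustrative layer sizes, in elements
    for depth in (0, 1):
        print(depth, peak_param_elements(layers, world_size=8, prefetch=depth))
```

With these illustrative sizes, gathering one layer at a time peaks at 500 elements against 800 for a full replica, while prefetching one layer ahead raises the peak to 700 in exchange for hiding communication behind compute.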

Real-World Implementation

Using DeepSpeed ZeRO Stage 2 to fine-tune a multi-billion-parameter language model that would otherwise overflow GPU memory.

Training with PyTorch FSDP, which shards parameters, gradients, and optimizer state across GPUs and gathers them layer by layer on demand.

Using ZeRO-Offload to push optimizer state to CPU memory, letting a single GPU train a model several times larger than would fit in its VRAM.

Scaling to trillion-parameter models with ZeRO-Infinity by streaming parameter shards from NVMe storage when GPU and CPU memory are exhausted.
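The three DeepSpeed scenarios above differ mainly in the `zero_optimization` section of the DeepSpeed configuration. The sketch below builds such configurations as Python dictionaries without importing DeepSpeed. The helper `zero_config`, the batch size, the bf16 setting, and the path `/local_nvme` are assumptions chosen for illustration; check the key names and the stage restrictions against the DeepSpeed configuration documentation for the version in use.

```python
import json


def zero_config(stage, optimizer_device=None, param_device=None, nvme_path=None):
    """Build a DeepSpeed-style config dict for a given ZeRO setup (a sketch)."""
    uses_nvme = "nvme" in (optimizer_device, param_device)
    if param_device and stage != 3:
        raise ValueError("parameter offload requires ZeRO Stage 3")
    if uses_nvme and (stage != 3 or not nvme_path):
        raise ValueError("NVMe offload requires Stage 3 and an nvme_path")

    def offload(device):
        section = {"device": device, "pin_memory": True}
        if device == "nvme":
            section["nvme_path"] = nvme_path
        return section

    zero = {"stage": stage, "overlap_comm": True}
    if optimizer_device:
        zero["offload_optimizer"] = offload(optimizer_device)
    if param_device:
        zero["offload_param"] = offload(param_device)
    return {
        "train_micro_batch_size_per_gpu": 1,  # illustrative value
        "bf16": {"enabled": True},            # assumes bf16-capable hardware
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    stage2 = zero_config(2)                               # ZeRO Stage 2
    zero_offload = zero_config(2, optimizer_device="cpu")  # ZeRO-Offload style
    zero_infinity = zero_config(                           # ZeRO-Infinity style
        3, optimizer_device="nvme", param_device="nvme",
        nvme_path="/local_nvme",  # hypothetical mount point
    )
    print(json.dumps(zero_infinity, indent=2))
```

A dictionary like this is typically passed to `deepspeed.initialize` through its `config` argument, or saved as a JSON file and referenced from the launch command.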

Implementation Patterns

ZeRO and Sharded Optimizers in Practice

Across the four deployments listed under Real-World Implementation, teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track product gains and cost impact over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can grow as systems become more complex.

Implementation Roadmap

1. Define latency, quality, and cost targets before rollout.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user behavior.

4. Prepare rollback and incident procedures before scaling up.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
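One way to make the evidence gate mechanical is to compare measured metrics against the targets defined in the first step. The sketch below is only an illustration: the function `failed_gates`, the metric names, and the thresholds are hypothetical and would need to be replaced with a team's own criteria.

```python
def failed_gates(measured, targets):
    """Return the names of metrics that miss their target.

    `targets` maps a metric name to (limit, kind), where kind is "max"
    (measured value must not exceed the limit) or "min" (must reach it).
    A metric that was not measured counts as a failure.
    """
    failures = []
    for name, (limit, kind) in targets.items():
        value = measured.get(name)
        if value is None:
            failures.append(name)
        elif kind == "max" and value > limit:
            failures.append(name)
        elif kind == "min" and value < limit:
            failures.append(name)
    return failures


if __name__ == "__main__":
    targets = {  # hypothetical targets for a sharded training run
        "step_time_s": (2.0, "max"),
        "peak_gpu_mem_gb": (70.0, "max"),
        "tokens_per_s": (50000, "min"),
    }
    measured = {"step_time_s": 1.8, "peak_gpu_mem_gb": 74.5, "tokens_per_s": 52000}
    blockers = failed_gates(measured, targets)
    print("pause rollout:" if blockers else "proceed", blockers)
```

An empty result means the gate passes; any returned name is a gap to close before expanding usage.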
