Overview
ZeRO (Zero Redundancy Optimizer) eliminates the memory wasted by data-parallel replicas by partitioning optimizer state, gradients, and parameters across GPUs. It lets you train very large models with the simplicity of data parallelism but only a fraction of the per-GPU memory.
ZeRO and sharded optimizers are a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.
Deep Dive
In standard data parallelism, every GPU stores a full copy of the optimizer state, gradients, and parameters. This is very wasteful, particularly for Adam, where the optimizer state can be several times the size of the model itself. ZeRO, introduced by Microsoft in DeepSpeed, removes this redundancy by partitioning these tensors across GPUs so that each device owns only one slice. ZeRO comes in three progressive stages: Stage 1 shards the optimizer state, Stage 2 adds gradient sharding, and Stage 3 shards the parameters themselves. When needed, GPUs gather the missing pieces through communication, compute, and then release them. The result is a dramatically smaller memory footprint per GPU, enabling training of billion- to trillion-parameter models while preserving the simple programming model of data parallelism.
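The savings at each stage can be made concrete with a back-of-the-envelope estimate. The sketch below assumes mixed-precision Adam with a commonly cited accounting of 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state (fp32 master weights, momentum, and variance). Those byte counts and the function name `zero_memory_per_gpu` are illustrative assumptions rather than figures from this text, and the estimate ignores activations, temporary buffers, and fragmentation.

```python
def zero_memory_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for a ZeRO stage (0 = plain data parallelism)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    # Assumed accounting for mixed-precision Adam (bytes per parameter).
    param_bytes = 2 * num_params    # fp16 weights
    grad_bytes = 2 * num_params     # fp16 gradients
    optim_bytes = 12 * num_params   # fp32 master weights + momentum + variance

    if stage >= 1:
        optim_bytes /= num_gpus     # Stage 1: shard optimizer state
    if stage >= 2:
        grad_bytes /= num_gpus      # Stage 2: also shard gradients
    if stage >= 3:
        param_bytes /= num_gpus     # Stage 3: also shard parameters
    return param_bytes + grad_bytes + optim_bytes


if __name__ == "__main__":
    for stage in range(4):
        gb = zero_memory_per_gpu(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, a 7.5-billion-parameter model on 64 GPUs goes from about 120 GB of model state per GPU with plain data parallelism to roughly 1.9 GB with Stage 3.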
Technical Insight
ZeRO trades extra communication for memory savings. In Stage 3, before a layer's forward pass, an all-gather collects that layer's full parameters on every GPU; afterwards, the shards a GPU does not own are discarded to reclaim memory. Gradients are reduce-scattered so that each GPU keeps only the gradient slice corresponding to the parameters it owns. PyTorch's FSDP (Fully Sharded Data Parallel) implements the same core idea natively, wrapping modules so that they are sharded and re-gathered on the fly.
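The gather, compute, release, and reduce-scatter cycle described above can be sketched as a single-process simulation. This is a toy model under stated assumptions: plain Python lists stand in for tensors, the helper names `all_gather` and `reduce_scatter` are hypothetical stand-ins for real collective operations, and the "gradient" is an arbitrary rule chosen only to make the data flow visible.

```python
def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def reduce_scatter(per_rank_grads):
    """Sum full-size gradients across ranks; rank r keeps only slice r of the sum."""
    world_size = len(per_rank_grads)
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    shard_len = len(summed) // world_size
    return [summed[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


# Two simulated ranks, each owning half of a 4-element parameter vector.
param_shards = [[1.0, 2.0], [3.0, 4.0]]

# Before the layer runs: all-gather rebuilds the full parameters on each rank.
full_params = all_gather(param_shards)

# Each rank computes a full-size gradient on its own micro-batch
# (toy rule: gradient = parameter * a per-rank batch scale).
batch_scale = [1.0, 10.0]
local_grads = [[p * s for p in full] for full, s in zip(full_params, batch_scale)]

# After use, the gathered copy is dropped; only the owned shard stays resident.
del full_params

# Reduce-scatter leaves each rank with the summed gradient for its own shard.
grad_shards = reduce_scatter(local_grads)
print(grad_shards)  # [[11.0, 22.0], [33.0, 44.0]]
```

Each rank ends up holding a gradient slice the same size as its parameter shard, so an optimizer step can update the owned parameters without any rank storing the full gradient.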
Mastering ZeRO and Sharded Optimizers
To build deep understanding, treat ZeRO and sharded optimizers as an operating model rather than a single feature: define the desired outcome, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams using ZeRO and sharded optimizers weigh architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test against real data and workflows, and iterate based on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across models, policies, and operations.
Architecture decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide major system weaknesses. The most effective approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Architecture decisions drive performance and operating cost for years.
Technical literacy helps teams choose the right stack, not just the newest one.
Better engineering choices reduce reliability problems in production.
In high-performing teams, each of these translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams build confidence instead of accumulating doubt.
Real-World Applications
Using DeepSpeed ZeRO Stage 2 to fine-tune a multi-billion-parameter language model that would otherwise exceed GPU memory.
Training with PyTorch FSDP, which shards parameters, gradients, and optimizer state across GPUs and gathers them layer by layer on demand.
Applying ZeRO-Offload to push optimizer state to CPU memory, letting a single GPU train a model several times larger than its VRAM.
Training trillion-parameter models with ZeRO-Infinity by streaming shards from NVMe storage when GPU and CPU memory are exhausted.
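The DeepSpeed-based setups in this list differ mainly in their `zero_optimization` settings. The sketch below builds minimal configuration dictionaries for Stage 2, ZeRO-Offload, and a ZeRO-Infinity-style run. The key names follow DeepSpeed's JSON configuration format, but the helper `zero_config`, the batch size, and the `/local_nvme` path are hypothetical placeholders, and the stage checks reflect the assumption that parameter offload and NVMe offload are used with Stage 3. Verify the options against the DeepSpeed documentation for the version you run.

```python
import json


def zero_config(stage, offload_optimizer=None, offload_param=None, nvme_path=None):
    """Build a minimal DeepSpeed-style config dict for a ZeRO stage with optional offload."""
    zero = {"stage": stage}
    for key, device in (("offload_optimizer", offload_optimizer),
                        ("offload_param", offload_param)):
        if device is None:
            continue
        if device not in ("cpu", "nvme"):
            raise ValueError("offload device must be 'cpu' or 'nvme'")
        # Assumption: parameter offload and NVMe offload are Stage 3 features.
        if stage != 3 and (key == "offload_param" or device == "nvme"):
            raise ValueError("parameter offload and NVMe offload assume ZeRO stage 3")
        entry = {"device": device}
        if device == "nvme":
            if nvme_path is None:
                raise ValueError("nvme_path is required for NVMe offload")
            entry["nvme_path"] = nvme_path
        zero[key] = entry
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "fp16": {"enabled": True},
        "zero_optimization": zero,
    }


stage2 = zero_config(2)                             # gradient + optimizer sharding
offload = zero_config(2, offload_optimizer="cpu")   # ZeRO-Offload style
infinity = zero_config(3, offload_optimizer="nvme", offload_param="nvme",
                       nvme_path="/local_nvme")     # ZeRO-Infinity style

print(json.dumps(infinity, indent=2))
```

Each dictionary can be written to a JSON file and supplied as the DeepSpeed configuration. A real ZeRO-Infinity run typically needs further tuning, such as asynchronous I/O settings, that this sketch leaves out.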
Implementation Patterns
ZeRO and Sharded Optimizers in practice
Whether the workload is ZeRO Stage 2 fine-tuning, PyTorch FSDP training, ZeRO-Offload on a single GPU, or ZeRO-Infinity streaming from NVMe, teams typically get the best results when they define quality metrics upfront, keep a human escalation path for edge cases, and track throughput gains and error rates over time.
Risks & Considerations
Optimizing a single metric can hide major system weaknesses.
Infrastructure and maintenance costs are often underestimated.
Security and observability gaps can widen as systems scale.
Roadmap
Define latency, quality, and cost targets before implementation.
Benchmark under real load and data conditions.
Instrument monitoring for errors, drift, and user impact.
Plan rollback and incident-response procedures before scaling.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.