Overview
DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make it practical to train models with billions of parameters across thousands of GPUs. Without them, today's frontier models could not fit in memory or finish training in a reasonable time.
DeepSpeed and Megatron training stacks are a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.
Deep Dive
Training a large model on a single GPU is not possible because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which slices the weight matrices inside each layer across GPUs, and it also provides pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature contribution is ZeRO (Zero Redundancy Optimizer), which shards optimizer states, gradients, and parameters across GPUs instead of replicating them, sharply cutting per-GPU memory. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. They also add mixed precision, activation checkpointing, and offloading to CPU or NVMe so that large models can train on limited hardware.
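To make the idea of slicing a weight matrix across GPUs concrete, here is a minimal sketch of a column-parallel matrix multiply. It is a toy under stated assumptions, not Megatron-LM's implementation: plain Python lists stand in for GPU tensors, list concatenation stands in for the all-gather collective, and the helper names (`split_columns`, `column_parallel_matmul`) are hypothetical.

```python
# Toy column-parallel linear layer. Plain lists stand in for GPU tensors;
# all names here are illustrative, not part of Megatron-LM.

def matmul(x, w):
    """Multiply x (rows x inner) by w (inner x cols)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split w column-wise into equal blocks, one per hypothetical GPU."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must divide evenly across shards")
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]

def column_parallel_matmul(x, w, num_shards):
    """Each shard computes its own slice of the output columns.

    Concatenating the slices plays the role of an all-gather.
    """
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

full = matmul(x, w)
sharded = column_parallel_matmul(x, w, num_shards=2)
print(full == sharded)
```

Each shard holds only half of the weight columns, which is where the per-GPU memory saving comes from. The price is the gather step, which in a real system is network traffic between GPUs.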
Technical Insight
ZeRO has three stages of increasing memory savings: Stage 1 shards the optimizer states, Stage 2 also shards the gradients, and Stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combining ZeRO-based data parallelism with tensor parallelism (intra-layer) and pipeline parallelism (inter-layer) yields "3D parallelism." The key tension is communication overhead: every additional split adds GPU-to-GPU traffic, so engineers tune the partitioning to keep that traffic on fast NVLink and InfiniBand links.
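The memory effect of the three stages can be estimated with simple arithmetic. The sketch below assumes mixed-precision training with Adam, using the accounting from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer states (fp32 weights, momentum, and variance). It ignores activations, temporary buffers, and fragmentation, so real usage will be higher. The function name is hypothetical.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for ZeRO stages 0 (none) to 3.

    Assumes mixed-precision Adam: 2 B/param weights, 2 B/param gradients,
    12 B/param optimizer states. Activations are not counted.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params
    if stage >= 1:   # Stage 1: shard optimizer states
        optim /= num_gpus
    if stage >= 2:   # Stage 2: also shard gradients
        grads /= num_gpus
    if stage >= 3:   # Stage 3: also shard parameters
        params /= num_gpus
    return params + grads + optim

for stage in range(4):
    gb = zero_bytes_per_gpu(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

For a 7.5-billion-parameter model on 64 GPUs, this works out to roughly 120 GB without ZeRO, then about 31.4 GB, 16.6 GB, and 1.9 GB for Stages 1 through 3.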
A Guide to DeepSpeed and Megatron Training Stacks
To build a deeper understanding, treat DeepSpeed and Megatron training stacks as a working system rather than a single feature: define the desired outcome, state your assumptions, and separate what the system can be relied on to do from what still requires expert judgment.
In practice, strong teams using DeepSpeed and Megatron training stacks optimize their architecture, data, and infrastructure choices against reliability and cost. They write down explicit success criteria, test on realistic data and workflows, and iterate based on observed failure modes rather than one-off wins. This is where theoretical understanding turns into durable capability across models, policies, and operations.
Architecture decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide major system weaknesses. The most effective approach pairs fast experimentation with operational discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Architecture decisions drive performance and operating cost for years.
Technical literacy helps teams choose the right stack, not just the newest one.
Better engineering choices reduce reliability problems in production.
In high-quality deployments, each of these points translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams can scale with confidence rather than ambiguity.
Real-World Applications
Training the open multilingual BLOOM-176B model with the Megatron-DeepSpeed combination across hundreds of GPUs.
Microsoft and NVIDIA training the 530-billion-parameter Megatron-Turing NLG model with 3D parallelism.
ZeRO-Offload letting researchers fine-tune multi-billion-parameter models on a single GPU by offloading optimizer states to CPU RAM.
Using activation checkpointing in these stacks to fit longer context windows by recomputing activations instead of storing them all.
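As an illustration of how the last two applications are typically switched on, here is a minimal sketch of a DeepSpeed configuration built as a Python dictionary and serialized to JSON. The section and key names (`zero_optimization`, `offload_optimizer`, `activation_checkpointing`) come from DeepSpeed's configuration format; every value is an assumption chosen for illustration, not a recommended setting, and the right numbers depend on the model and hardware.

```python
import json

# Illustrative DeepSpeed config: ZeRO Stage 2 with optimizer states
# offloaded to CPU RAM. All values are assumptions for this sketch.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    # This section typically only takes effect when the model code
    # calls DeepSpeed's own activation checkpointing functions.
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": False,
    },
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

The resulting JSON would normally be saved to a file and passed to the training launcher. Offloading trades GPU memory for PCIe traffic and CPU compute, so step time usually rises as memory pressure falls.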
Implementation Patterns
DeepSpeed and Megatron Training Stacks in Practice
The same pattern holds across all four cases: training BLOOM-176B with Megatron-DeepSpeed across hundreds of GPUs, training the 530-billion-parameter Megatron-Turing NLG with 3D parallelism, fine-tuning multi-billion-parameter models on a single GPU with ZeRO-Offload, and using activation checkpointing to fit longer context windows. In each case, teams typically get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production gains and the cost of errors over time.
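The 3D parallelism mentioned for Megatron-Turing NLG comes down to assigning every GPU a coordinate along three axes: tensor, data, and pipeline. The sketch below shows one possible layout, assuming tensor-parallel ranks are numbered consecutively so that they tend to land on the same node and share fast links. Real frameworks choose their own ordering, and the function names here are hypothetical.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (tensor, data, pipeline) coordinates.

    Assumed layout: tensor index varies fastest, then data, then pipeline.
    """
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size})")
    return {
        "tp": rank % tp,
        "dp": (rank // tp) % dp,
        "pp": rank // (tp * dp),
    }

def tensor_parallel_group(rank, tp):
    """Return the consecutive ranks that share one tensor-parallel group."""
    start = (rank // tp) * tp
    return list(range(start, start + tp))

# Example: 8 GPUs split 2-way along each axis.
for rank in range(8):
    print(rank, rank_to_coords(rank, tp=2, pp=2, dp=2))
```

Keeping the most communication-heavy axis (tensor parallelism) on adjacent ranks is one common way to limit slow cross-node traffic.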
Risks and Safeguards
Optimizing a single metric can hide major system weaknesses.
Infrastructure and maintenance costs are often underestimated.
Security and observability gaps can widen as the system scales.
Roadmap
Define latency, quality, and cost targets before implementation.
Benchmark under real load and data conditions.
Instrument monitoring for errors, drift, and user impact.
Plan rollback and incident-response paths before scaling.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
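The evidence-gate idea can be written down as a small check that compares measured results against the targets defined in the first step. This is only a sketch: the class names, field names, and threshold values are all hypothetical, and a real gate would draw its measurements from the team's own benchmarking and monitoring.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Targets:
    """Targets agreed before implementation (hypothetical fields)."""
    max_p95_latency_ms: float
    min_quality_score: float
    max_cost_per_1k_requests: float

@dataclass(frozen=True)
class Measurement:
    """Results observed under realistic load (hypothetical fields)."""
    p95_latency_ms: float
    quality_score: float
    cost_per_1k_requests: float

def gate_failures(targets: Targets, measured: Measurement) -> List[str]:
    """Return the list of unmet criteria; an empty list means the gate passes."""
    failures = []
    if measured.p95_latency_ms > targets.max_p95_latency_ms:
        failures.append("latency")
    if measured.quality_score < targets.min_quality_score:
        failures.append("quality")
    if measured.cost_per_1k_requests > targets.max_cost_per_1k_requests:
        failures.append("cost")
    return failures

targets = Targets(max_p95_latency_ms=500.0, min_quality_score=0.80,
                  max_cost_per_1k_requests=2.0)
measured = Measurement(p95_latency_ms=620.0, quality_score=0.84,
                       cost_per_1k_requests=1.5)

failures = gate_failures(targets, measured)
print("expand usage" if not failures else f"pause rollout, close gaps: {failures}")
```

Returning the list of failed criteria, rather than a single pass/fail flag, keeps the "close the gap" step concrete: the team can see which target blocked the rollout.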