Technical Guide

DeepSpeed and Megatron Training Stacks

DeepSpeed (Microsoft) and Megatron-LM (NVIDIA) are the software stacks that make it practical to train models with billions of parameters across thousands of GPUs.

Overview

Without these stacks, today's frontier models could not fit in memory or finish training in a reasonable time.

The DeepSpeed and Megatron training stacks are a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Training a large model on a single GPU is impossible because the weights, gradients, and optimizer states do not fit in its memory. These stacks split the work across many GPUs. Megatron-LM pioneered tensor parallelism, which slices the weight matrices inside each layer across GPUs, and pairs it with pipeline parallelism, which places different layers on different GPUs. DeepSpeed's signature contribution is ZeRO (Zero Redundancy Optimizer), which shards optimizer states, gradients, and parameters across GPUs instead of replicating them, sharply cutting per-GPU memory. The two are often combined (Megatron-DeepSpeed) to train models such as BLOOM-176B and Megatron-Turing NLG. They also add mixed precision, activation checkpointing, and offloading to CPU or NVMe, so that larger models can train on a fixed amount of hardware.
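The column-wise split behind tensor parallelism can be illustrated without any GPU. The following is a minimal pure-Python sketch, not Megatron-LM code: the helper names (`matmul`, `split_columns`, `column_parallel_matmul`) are hypothetical, lists of lists stand in for tensors, and list concatenation stands in for the all-gather that a real run would perform over the interconnect.

```python
def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Cut the weight matrix into `parts` equal column blocks, one per simulated GPU."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Each simulated GPU multiplies the full input by its own column block."""
    shards = split_columns(w, parts)
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for all-gather: concatenate the output slices column-wise.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_matmul(x, w, parts=2) == matmul(x, w))
```

Each simulated device holds only a fraction of the weight columns, which is where the memory saving comes from. The price is the gather step at the end.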

Technical Insight

ZeRO has three stages of increasing memory savings: Stage 1 shards the optimizer states, Stage 2 also shards the gradients, and Stage 3 shards the parameters themselves, gathering them on demand during the forward and backward passes. Combined with tensor parallelism (intra-layer) and pipeline parallelism (inter-layer), this yields "3D parallelism." The key tension is communication overhead: every additional sharding split adds GPU-to-GPU traffic, so engineers tune the partitioning so that the traffic stays on fast NVLink and InfiniBand links.
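The per-stage savings can be estimated with simple arithmetic. The sketch below assumes mixed-precision Adam training with 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance), which is the accounting used in the ZeRO paper. Activations and temporary buffers are ignored, and the function name is hypothetical.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for ZeRO stage 0 (plain data parallel) to 3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params   # fp16 weights (assumption: mixed precision)
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 weights + Adam momentum + Adam variance
    if stage >= 1:
        optim /= num_gpus     # Stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus     # Stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus    # Stage 3: also shard the parameters
    return params + grads + optim


for stage in range(4):
    gigabytes = zero_bytes_per_gpu(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gigabytes:.1f} GB per GPU")
```

For a 7.5-billion-parameter model on 64 GPUs, this estimate falls from about 120 GB per GPU with plain data parallelism to about 31.4 GB, 16.6 GB, and 1.9 GB for Stages 1, 2, and 3.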

A Guide to DeepSpeed and Megatron Training Stacks

To build a deeper understanding, treat the DeepSpeed and Megatron training stacks as a working system, not a single feature: define the desired outcome, state the assumptions, and separate what the system can be relied on to do from what still requires expert judgment.

In practice, strong teams using the DeepSpeed and Megatron training stacks weigh architecture, data, and infrastructure choices against reliability and cost. They write explicit success criteria, test on real data and workflows, and iterate on observed failure patterns instead of one-off wins. This is where theoretical understanding turns into durable capability across models, policies, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide larger systemic weaknesses. The most robust approach pairs rapid experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architecture decisions drive performance and operating cost for years.

In well-run teams, this translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams can scale confidence rather than uncertainty.

Technical literacy helps teams choose the right stack, not just the newest one.

In well-run teams, this translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams can scale confidence rather than uncertainty.

Better engineering choices reduce reliability incidents in production.

In well-run teams, this translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams can scale confidence rather than uncertainty.

The Future of DeepSpeed and Megatron Training Stacks

Expect convergence with PyTorch's native FSDP (Fully Sharded Data Parallel), which absorbs many of ZeRO's ideas and blurs the line between research stacks and the core framework. Compiler-based approaches and automatic parallelism planners aim to remove manual tuning. As training clusters grow toward hundreds of thousands of accelerators, fault tolerance, elastic scaling, and overlapping communication with computation become the main engineering frontiers, along with support for new hardware such as NVIDIA Blackwell and custom training chips.

Real-World Applications

BLOOM-176B, an open multilingual model, was trained with the combined Megatron-DeepSpeed stack across hundreds of GPUs.

Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG with 3D parallelism.

ZeRO-Offload lets researchers fine-tune multi-billion-parameter models on a single GPU by offloading optimizer states to CPU RAM.

Activation checkpointing in these stacks fits longer context windows by recomputing activations instead of storing them all.

Implementation Patterns

DeepSpeed and Megatron Training Stacks in practice

BLOOM-176B, an open multilingual model, was trained with the combined Megatron-DeepSpeed stack across hundreds of GPUs. Teams typically get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and the cost of errors over time.

Microsoft and NVIDIA trained the 530-billion-parameter Megatron-Turing NLG with 3D parallelism. Teams typically get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and the cost of errors over time.
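How a 3D-parallel job assigns GPUs can be sketched as a mapping from a flat rank to three coordinates. The example below is illustrative only: the function names are hypothetical, and the ordering (tensor-parallel index varying fastest, so that tensor-parallel peers are neighbouring ranks) is an assumption chosen for clarity, not necessarily the layout that Megatron-LM or DeepSpeed uses.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a flat rank to (data, pipeline, tensor) indices for a tp*pp*dp grid."""
    world_size = tp * pp * dp
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range for this grid")
    tp_idx = rank % tp             # fastest-varying: tensor-parallel index
    pp_idx = (rank // tp) % pp     # next: pipeline stage
    dp_idx = rank // (tp * pp)     # slowest: data-parallel replica
    return dp_idx, pp_idx, tp_idx


def tensor_parallel_group(rank, tp):
    """Ranks that share the same layer slices' siblings (same dp and pp index)."""
    start = (rank // tp) * tp
    return list(range(start, start + tp))


# Example grid: 4-way tensor, 2-way pipeline, 2-way data parallel = 16 GPUs.
for rank in range(16):
    print(rank, rank_to_coords(rank, tp=4, pp=2, dp=2))
```

Keeping tensor-parallel peers on adjacent ranks reflects the usual goal of placing the most communication-heavy group on the fastest links, typically inside a single node.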

ZeRO-Offload lets researchers fine-tune multi-billion-parameter models on a single GPU by offloading optimizer states to CPU RAM. Teams typically get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and the cost of errors over time.
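A ZeRO-Offload setup is normally expressed in DeepSpeed's JSON configuration. The sketch below builds such a configuration as a Python dictionary. The key names follow DeepSpeed's configuration schema as we understand it and should be checked against the current documentation. The batch size, accumulation steps, and choice of ZeRO Stage 2 are illustrative assumptions, not tuned values, and `build_offload_config` is a hypothetical helper.

```python
import json


def build_offload_config(micro_batch_size=1, grad_accum_steps=16):
    """Return an illustrative DeepSpeed config that offloads optimizer state to CPU."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},           # mixed precision (assumed choice)
        "zero_optimization": {
            "stage": 2,                       # shard optimizer states and gradients
            "offload_optimizer": {
                "device": "cpu",              # keep optimizer states in CPU RAM
                "pin_memory": True,
            },
        },
    }


config_text = json.dumps(build_offload_config(), indent=2)
print(config_text)
```

The printed JSON would typically be saved to a file and handed to the training script's DeepSpeed setup. Small micro-batches with gradient accumulation are a common way to stay within a single GPU's memory.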

Activation checkpointing in these stacks fits longer context windows by recomputing activations instead of storing them all. Teams typically get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and the cost of errors over time.
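The recompute-instead-of-store trade can be shown on a toy network. The following sketch is an assumption-laden illustration, not code from either library: it uses a chain of scalar `tanh` layers, hypothetical function names, and a simple count of stored activations as a stand-in for memory.

```python
import math


def forward_layer(w, a):
    return math.tanh(w * a)


def backward_full(weights, x):
    """Baseline: keep every activation from the forward pass."""
    acts = [x]
    for w in weights:
        acts.append(forward_layer(w, acts[-1]))
    grads, g = [0.0] * len(weights), 1.0
    for i in reversed(range(len(weights))):
        d = 1.0 - acts[i + 1] ** 2          # derivative of tanh at layer i
        grads[i] = g * d * acts[i]          # gradient w.r.t. weight i
        g = g * d * weights[i]              # gradient w.r.t. layer input
    return grads, len(acts)


def backward_checkpointed(weights, x, segment):
    """Keep only each segment's input, then recompute inside the segment on backward."""
    n = len(weights)
    checkpoints, a = {}, x
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = a
        a = forward_layer(w, a)
    grads, g = [0.0] * n, 1.0
    peak = len(checkpoints)
    for start in reversed(range(0, n, segment)):
        end = min(start + segment, n)
        acts = [checkpoints[start]]
        for i in range(start, end):         # recompute this segment's activations
            acts.append(forward_layer(weights[i], acts[-1]))
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for i in reversed(range(start, end)):
            d = 1.0 - acts[i - start + 1] ** 2
            grads[i] = g * d * acts[i - start]
            g = g * d * weights[i]
    return grads, peak


weights = [0.5 + 0.05 * i for i in range(16)]
full_grads, full_stored = backward_full(weights, 0.3)
ckpt_grads, ckpt_stored = backward_checkpointed(weights, 0.3, segment=4)
print(full_stored, ckpt_stored)
```

Both functions return the same gradients. The checkpointed version holds fewer activations at any one time and pays for it with one extra forward computation per segment.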

Risks & Safeguards

Optimizing a single metric can hide larger systemic weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as the system scales.

Roadmap

1. Define latency, quality, and cost targets before deployment.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

2. Benchmark under real load and data conditions.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

3. Instrument monitoring for errors, drift, and user impact.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

4. Plan rollback and incident-response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

Continue Exploring