Technical Guide

Tensor Parallelism for Large Models

A way to split the computation inside a single neural-network layer across multiple GPUs, so that a model too large for one device can still run.

Overview

Tensor parallelism matters because frontier models have hundreds of billions of parameters, more than any single GPU can hold or process quickly on its own.

It is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Tensor parallelism (also called intra-layer model parallelism) shards individual weight matrices across GPUs instead of placing whole layers on different devices. In a transformer, the large matrix multiplications (the attention projections and the feed-forward MLP) are split. For example, the MLP's first weight matrix is split by columns and the second by rows, so each GPU computes a slice and a single all-reduce combines the results. Attention is split across heads, with each GPU handling a subset of them. Because every GPU works on part of every layer at the same time, tensor parallelism reduces per-GPU memory and speeds up computation, but it requires frequent, high-bandwidth communication between GPUs at every layer. That is why it is usually confined to a single NVLink-connected node and combined with pipeline and data parallelism for large-scale training and inference jobs.
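The column-then-row split of the MLP can be sketched in plain Python. This is a minimal single-process illustration, not how any framework implements it: the "GPUs" are loop iterations, `all_reduce_sum` is a stand-in for the real collective, and all function names and the tiny matrix sizes are assumptions chosen for the example.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gelu(m):
    """Elementwise GeLU (tanh approximation)."""
    c = math.sqrt(2.0 / math.pi)
    return [[0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3))) for v in row]
            for row in m]

def split_columns(m, parts):
    """Split a matrix into `parts` equal blocks of columns."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Split a matrix into `parts` equal blocks of rows."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

def all_reduce_sum(partials):
    """Stand-in for an all-reduce: elementwise sum of every rank's partial output."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def mlp_reference(x, w1, w2):
    """Unsharded MLP: GeLU(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)

def mlp_tensor_parallel(x, w1, w2, world_size):
    """Same MLP with w1 split by columns and w2 split by rows."""
    w1_shards = split_columns(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    # Each iteration plays the role of one GPU: local matmul, local GeLU, local matmul.
    partials = [matmul(gelu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    # One all-reduce sums the partial outputs into the full result.
    return all_reduce_sum(partials)

# Small deterministic inputs: 2 tokens, hidden size 4, MLP width 8.
x = [[0.1 * (i + j) for j in range(4)] for i in range(2)]
w1 = [[0.01 * ((i * 8 + j) % 7 - 3) for j in range(8)] for i in range(4)]
w2 = [[0.02 * ((i * 4 + j) % 5 - 2) for j in range(4)] for i in range(8)]

reference = mlp_reference(x, w1, w2)
sharded = mlp_tensor_parallel(x, w1, w2, world_size=4)
max_diff = max(abs(a - b) for ra, rb in zip(reference, sharded) for a, b in zip(ra, rb))
print(f"max difference between sharded and unsharded output: {max_diff:.2e}")
```

The sharded and unsharded outputs agree up to floating-point rounding because GeLU acts elementwise, so applying it to a column block of the intermediate activations gives the same values as applying it to the full matrix and slicing afterwards.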

Technical Insight

The technique, popularized by Megatron-LM, chooses the split dimensions so that communication is kept to a minimum. Splitting the first MLP matrix column-wise lets each GPU apply the nonlinearity locally without synchronizing; splitting the second row-wise means the outputs need only a single all-reduce to sum the partial results. Each transformer layer therefore incurs roughly two all-reduces in the forward pass (one for the attention block and one for the MLP) and two in the backward pass. Because these collectives happen at every layer, communication latency dominates, so tensor parallelism lives behind fast intra-node interconnects such as NVLink rather than commodity networks.
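To get a feel for why these collectives need a fast interconnect, the traffic per layer can be estimated with a few lines of arithmetic. The sketch below assumes a ring all-reduce, in which each GPU sends 2(p - 1)/p times the message size, and assumes each all-reduce carries one activation tensor of shape batch x sequence x hidden. The function names and the example shapes are illustrative assumptions, not figures from any particular training run.

```python
def ring_allreduce_bytes_per_gpu(message_bytes, world_size):
    """Bytes each GPU sends in a ring all-reduce of one message."""
    if world_size <= 1:
        return 0.0
    return 2.0 * (world_size - 1) / world_size * message_bytes

def tp_bytes_per_layer(batch, seq_len, hidden, world_size,
                       bytes_per_elem=2, training=True):
    """Estimated bytes each GPU sends per transformer layer.

    Assumes 2 all-reduces in the forward pass and, when training,
    2 more in the backward pass, each over one activation tensor.
    """
    message_bytes = batch * seq_len * hidden * bytes_per_elem
    num_allreduces = 4 if training else 2
    return num_allreduces * ring_allreduce_bytes_per_gpu(message_bytes, world_size)

# Illustrative shapes (assumed): 1 sequence of 2048 tokens, hidden size 12288,
# 16-bit activations, 8-way tensor parallelism.
per_layer = tp_bytes_per_layer(batch=1, seq_len=2048, hidden=12288, world_size=8)
print(f"{per_layer / 2**20:.0f} MiB sent per GPU per layer")  # prints "336 MiB sent per GPU per layer"
```

Multiplying this figure by the number of layers and the number of steps per second gives a rough lower bound on the interconnect bandwidth the job needs, which is the reason the group is normally kept inside one node.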

Mastering Tensor Parallelism for Large Models

To build a deep understanding, treat tensor parallelism as an operating model rather than a single feature: define the desired outcomes, state your assumptions, and separate what the system can be relied on to do from what still requires expert judgment.

In practice, strong teams using tensor parallelism for large models optimize their architecture, data, and infrastructure choices for reliability and cost. They write down explicit success criteria, test against real data and workflows, and iterate on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Architecture decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide larger system weaknesses. The most effective approach combines experimental speed with operational discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architecture decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability problems in production.

In effective deployments, each of these translates into measurable operating standards, clear ownership boundaries, and a regular review cadence, so that teams build confidence rather than compounding uncertainty.

The Future of Tensor Parallelism for Large Models

Tensor parallelism remains foundational, but it is increasingly folded into "3D parallelism" (tensor + pipeline + data) and combined with expert parallelism for Mixture-of-Experts models. Frameworks such as Megatron-LM, DeepSpeed, and vLLM automate the sharding. As GPU interconnects (NVLink, NVSwitch) and optical fabrics get faster, the intra-node constraint relaxes, allowing larger tensor-parallel groups. Expect smarter auto-parallelism that picks the split dimensions and group sizes to minimize communication for a given topology.
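As a rough illustration of what such a search might look like, the sketch below enumerates (tensor, pipeline, data) group sizes for a cluster and picks one with a deliberately simple rule. Everything here is an assumption made for the example: the function names, the rule itself (use the smallest model-parallel group whose weight shard fits, and prefer tensor parallelism while it stays inside one node), and the model and cluster numbers. Real auto-parallel planners use far richer cost models.

```python
def candidate_layouts(total_gpus, gpus_per_node, num_layers, num_heads):
    """List (tp, pp, dp) triples with tp * pp * dp == total_gpus.

    Tensor parallelism is kept inside one node and must divide the head count;
    pipeline parallelism must divide the layer count.
    """
    layouts = []
    for tp in range(1, gpus_per_node + 1):
        if gpus_per_node % tp or num_heads % tp or total_gpus % tp:
            continue
        rest = total_gpus // tp
        for pp in range(1, rest + 1):
            if rest % pp or num_layers % pp:
                continue
            layouts.append((tp, pp, rest // pp))
    return layouts

def pick_layout(layouts, param_count, bytes_per_param, weight_budget_bytes):
    """Toy heuristic: smallest tp * pp whose weight shard fits, then largest tp."""
    fitting = [
        (tp, pp, dp) for tp, pp, dp in layouts
        if param_count * bytes_per_param / (tp * pp) <= weight_budget_bytes
    ]
    if not fitting:
        return None
    return min(fitting, key=lambda layout: (layout[0] * layout[1], -layout[0]))

# Assumed example: 70e9 parameters in 16-bit precision, 80 layers, 64 heads,
# two nodes of 8 GPUs, and a budget of 40e9 bytes of weights per GPU.
layouts = candidate_layouts(total_gpus=16, gpus_per_node=8, num_layers=80, num_heads=64)
choice = pick_layout(layouts, param_count=70e9, bytes_per_param=2, weight_budget_bytes=40e9)
print(choice)  # prints (4, 1, 4): 4-way tensor, no pipeline, 4 data-parallel replicas
```

A production planner would also have to account for activations, optimizer state, pipeline bubbles, and measured link bandwidth, none of which this heuristic models.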

Real-World Applications

Training a 175B-parameter model by sharding each layer's weight matrices across 8 GPUs in a single NVLink-connected node using Megatron-LM.

Serving a 70B-parameter model in vLLM with tensor_parallel_size=4 so that the weights fit across four GPUs and the model responds in real time.

Splitting a transformer's attention heads across GPUs so that each device computes a subset, then gathering the outputs for the next layer.

Combining tensor parallelism within nodes with pipeline parallelism across nodes to train trillion-parameter models on large GPU clusters.
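The memory arithmetic behind the first two examples can be checked in a few lines. This is a back-of-the-envelope sketch under stated assumptions: weights are stored in 16-bit precision (2 bytes per parameter), every weight is sharded evenly, and activations, the KV cache, gradients, and optimizer state are ignored. The helper name is hypothetical.

```python
def weight_gb_per_gpu(param_count, tensor_parallel_size, bytes_per_param=2):
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / tensor_parallel_size / 1e9

# 70B-parameter model served with 4-way tensor parallelism.
serving = weight_gb_per_gpu(70_000_000_000, tensor_parallel_size=4)
# 175B-parameter model sharded across the 8 GPUs of one node.
training = weight_gb_per_gpu(175_000_000_000, tensor_parallel_size=8)

print(f"70B / 4 GPUs:  {serving:.2f} GB of weights per GPU")   # 35.00
print(f"175B / 8 GPUs: {training:.2f} GB of weights per GPU")  # 43.75
```

Unsharded, the same models would need about 140 GB and 350 GB for weights alone under these assumptions. For training, gradients and optimizer state add substantially to the per-GPU figure, which is one reason tensor parallelism is paired with pipeline and data parallelism.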

Implementation Patterns

Tensor Parallelism for Large Models in practice

Each of the four applications above is also an implementation pattern: sharding every layer's weight matrices across the 8 GPUs of one NVLink-connected node with Megatron-LM, serving with tensor_parallel_size=4 in vLLM, splitting attention heads across GPUs, and combining tensor parallelism within nodes with pipeline parallelism across nodes. In each case, teams typically get the best results when they define quality metrics up front, keep a human escalation path for edge cases, and track production successes and error rates over time.

Risks & Safeguards

Optimizing a single metric can hide larger system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems scale.

Roadmap

1. Define latency, quality, and cost targets before implementation.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare fallback and response paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

Continue Exploring