Technical Guide

GPU Memory Management and Fragmentation

How AI frameworks allocate, reuse, and reclaim limited GPU memory, and why fragmentation can cause out-of-memory errors even when plenty of memory technically remains free.

Overview

Understanding how frameworks manage GPU memory is key to fitting large models and avoiding mysterious crashes.

GPU memory management and fragmentation affect model quality, hardware cost, latency, and reliability at scale.

Deep Dive

GPU memory is fixed and precious: a card may have 24, 80, or 192 GB in total, shared among model parameters, activations, gradients, optimizer states, and temporary buffers. Calling the driver to allocate memory for every operation would be slow, so frameworks such as PyTorch use a caching allocator that grabs large blocks up front, carves smaller pieces out of them, and keeps released chunks in a pool for reuse. The catch is fragmentation: as tensors of varying sizes are allocated and freed, the free space breaks into scattered pieces. You might have 5 GB free in total yet fail to allocate a single 2 GB tensor because no single contiguous gap is large enough. This is why training can crash with out-of-memory errors despite apparently having headroom.
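The gap between total free memory and the largest contiguous free block can be reproduced with a minimal sketch, assuming a toy first-fit allocator over one 10 GB arena measured in whole gigabytes. The class ToyArena and all sizes are hypothetical; PyTorch's real allocator works in bytes, rounds request sizes, and keeps separate pools, but the arithmetic of "free in total versus free in one piece" is the same.

```python
from typing import Dict, List, Optional, Tuple


class ToyArena:
    """Hypothetical first-fit allocator over one contiguous arena (units: GB)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Live allocations: handle -> (offset, size)
        self.live: Dict[str, Tuple[int, int]] = {}

    def free_gaps(self) -> List[Tuple[int, int]]:
        """Return (offset, size) for every free gap, in address order."""
        gaps: List[Tuple[int, int]] = []
        cursor = 0
        for offset, size in sorted(self.live.values()):
            if offset > cursor:
                gaps.append((cursor, offset - cursor))
            cursor = offset + size
        if cursor < self.capacity:
            gaps.append((cursor, self.capacity - cursor))
        return gaps

    def alloc(self, handle: str, size: int) -> Optional[int]:
        """Place the block in the first gap that fits; None means out of memory."""
        for offset, gap in self.free_gaps():
            if gap >= size:
                self.live[handle] = (offset, size)
                return offset
        return None

    def free(self, handle: str) -> None:
        del self.live[handle]

    def total_free(self) -> int:
        return sum(size for _, size in self.free_gaps())

    def largest_gap(self) -> int:
        return max((size for _, size in self.free_gaps()), default=0)


arena = ToyArena(capacity=10)
# Fill the arena with ten 1 GB tensors, then free every other one.
for i in range(10):
    arena.alloc(f"t{i}", 1)
for i in range(0, 10, 2):
    arena.free(f"t{i}")

print(arena.total_free())     # 5: five 1 GB gaps are free in total...
print(arena.largest_gap())    # 1: ...but none is larger than 1 GB
print(arena.alloc("big", 2))  # None: the 2 GB request fails anyway
```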

Technical Insight

PyTorch's CUDA caching allocator organizes memory into pools of blocks and reuses released blocks that fit the requested size, avoiding expensive cudaMalloc/cudaFree calls. Fragmentation arises when freed blocks cannot be merged back together. Tools such as torch.cuda.empty_cache, the expandable_segments option of PYTORCH_CUDA_ALLOC_CONF, and memory snapshots help. Newer approaches borrow ideas from virtual memory, mapping non-contiguous physical pages into one contiguous virtual range so that large requests succeed despite fragmentation.
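The virtual-memory idea in the last sentence can be sketched with a toy page-mapped allocator, assuming hypothetical 1 GB physical pages (real drivers map much smaller granules). Each allocation records which physical pages back its contiguous virtual range, so any free pages will do, wherever they sit: with five free pages scattered between live ones, a 2 GB request still succeeds. This illustrates the concept only; it is not PyTorch's expandable_segments implementation.

```python
from typing import Dict, List


class ToyPagedMemory:
    """Hypothetical page-mapped allocator: virtual ranges backed by scattered pages."""

    def __init__(self, num_pages: int, page_gb: int = 1) -> None:
        self.page_gb = page_gb
        self.free_pages: List[int] = list(range(num_pages))
        # Page table: handle -> physical page numbers backing its virtual range
        self.page_table: Dict[str, List[int]] = {}

    def alloc(self, handle: str, size_gb: int) -> bool:
        needed = -(-size_gb // self.page_gb)  # ceiling division
        if needed > len(self.free_pages):
            return False  # genuinely out of memory, not merely fragmented
        # Contiguity only matters in the virtual range, so take any free pages.
        self.page_table[handle] = [self.free_pages.pop(0) for _ in range(needed)]
        return True

    def free(self, handle: str) -> None:
        self.free_pages.extend(self.page_table.pop(handle))
        self.free_pages.sort()


mem = ToyPagedMemory(num_pages=10)
for i in range(10):
    mem.alloc(f"t{i}", 1)
for i in range(0, 10, 2):
    mem.free(f"t{i}")          # free physical pages are now 0, 2, 4, 6, 8

print(mem.alloc("big", 2))     # True: the request is served by non-adjacent pages
print(mem.page_table["big"])   # [0, 2]
```

The cost of this flexibility is an extra translation step from virtual to physical addresses, which real systems perform in the GPU's memory-management hardware rather than in Python.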

To build deeper understanding, treat GPU memory management as a system: define the desired outcome, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams working on GPU memory management weigh architecture, data, and infrastructure choices against reliability and cost. They write explicit success criteria, test against realistic data and workflows, and iterate based on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Architecture decisions shape performance and operating costs for years. At the same time, optimizing a single metric can hide larger system weaknesses. The most effective approach combines rapid experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architecture decisions shape performance and operating costs for years.

Technical understanding helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability risk in production.

In effective organizations, each of these translates into measurable operating standards, clear ownership boundaries, and regular review cadences, so that teams build confidence rather than accumulate uncertainty.

The Future of GPU Memory Management and Fragmentation

Memory management is becoming smarter and more page-based, borrowing ideas from operating systems. Techniques such as virtual-memory-style allocators and paged attention (used to manage the KV cache during inference) sharply reduce waste and fragmentation. Expect frameworks to default to expandable segments and fragmentation-aware allocators, to offer better visibility through built-in profilers, and to integrate more tightly with offloading and recomputation, so that systems automatically move data between GPU, CPU, and disk memory to keep utilization high and out-of-memory errors rare.
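To see why the KV cache is worth paging, the standard size estimate helps: every layer stores one key vector and one value vector per KV head for each token, so bytes per token = 2 × layers × KV heads × head dimension × bytes per element. The model shape below (32 layers, 32 KV heads, head dimension 128, fp16) is a hypothetical example chosen for round numbers.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int) -> int:
    """Key plus value storage for one token across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(num_tokens: int, **shape: int) -> float:
    """KV cache size in GiB for a given number of cached tokens."""
    return num_tokens * kv_cache_bytes_per_token(**shape) / 2**30


# Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes).
shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_elem=2)
print(kv_cache_bytes_per_token(**shape))  # 524288 bytes, i.e. 0.5 MiB per token
print(kv_cache_gib(4096, **shape))        # 2.0 GiB for one 4096-token sequence
```

Under these assumptions, reserving room for 4096 tokens up front for each of 32 concurrent chats would tie up 64 GiB, most of it unused for short conversations, which is the waste that on-demand paging avoids.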

Real-World Applications

A training run that crashed with "CUDA out of memory" even though the memory reservation showed free space was fixed by setting PYTORCH_CUDA_ALLOC_CONF to enable expandable segments.

Using torch.cuda.memory_summary or a memory snapshot to diagnose which tensors, and how much fragmentation, are consuming a GPU's 80 GB.

vLLM's PagedAttention manages the attention KV cache in fixed-size pages to serve concurrent chat requests without wasting memory.

Reducing the batch size or enabling gradient checkpointing to cut activation memory and avoid fragmentation-driven out-of-memory failures.

Implementation Approaches

GPU Memory Management and Fragmentation in Practice

Teams usually get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track production wins and error rates over time.
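A minimal sketch of the expandable-segments fix from the training-run example, where a "CUDA out of memory" crash despite apparently free reserved memory was resolved through PYTORCH_CUDA_ALLOC_CONF. The variable takes comma-separated option:value pairs; expandable_segments is a documented PyTorch option, though its availability and default depend on your PyTorch version. The helper build_alloc_conf is hypothetical and exists only to make the format explicit.

```python
import os
from typing import Mapping


def build_alloc_conf(options: Mapping[str, object]) -> str:
    """Format allocator options as 'option:value' pairs joined by commas."""
    return ",".join(f"{name}:{value}" for name, value in options.items())


# Set this before the CUDA caching allocator initializes; doing it before
# `import torch` is the safe choice. setdefault keeps any value that was
# already exported in the shell that launched the job.
os.environ.setdefault(
    "PYTORCH_CUDA_ALLOC_CONF",
    build_alloc_conf({"expandable_segments": True}),
)

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # e.g. expandable_segments:True

# import torch  # import torch and start training only after the variable is set
```

Exporting the same value in the launching shell is equivalent. Comparing torch.cuda.memory_reserved() with torch.cuda.memory_allocated() before and after the change helps confirm that fragmentation, rather than genuine exhaustion, caused the original crash.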

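For the diagnosis example (working out which tensors and how much fragmentation are consuming an 80 GB GPU), a useful first signal is the gap between memory reserved by the caching allocator and memory actually held by live tensors. The sketch uses the documented torch.cuda.memory_allocated, torch.cuda.memory_reserved, and torch.cuda.memory_summary functions and degrades gracefully without PyTorch or CUDA. The helper names and the "unused share of reserved" metric are illustrative: a large share hints at fragmentation but does not prove it. For per-tensor detail, recent PyTorch releases also provide underscore-prefixed (not stable) helpers, torch.cuda.memory._record_memory_history and torch.cuda.memory._dump_snapshot, for capturing a memory snapshot.

```python
def unused_reserved_fraction(allocated_bytes: int, reserved_bytes: int) -> float:
    """Share of allocator-reserved memory that is not backing live tensors."""
    if reserved_bytes <= 0:
        return 0.0
    return max(reserved_bytes - allocated_bytes, 0) / reserved_bytes


def report_gpu_memory(device: int = 0) -> None:
    """Print PyTorch's allocator summary plus the reserved-versus-allocated gap."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device available")
        return
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(torch.cuda.memory_summary(device))
    gib = 2**30
    share = unused_reserved_fraction(allocated, reserved)
    print(f"allocated {allocated / gib:.1f} GiB, reserved {reserved / gib:.1f} GiB, "
          f"unused share of reserved {share:.0%}")


# Hypothetical readings: 70 GiB reserved, 52.5 GiB in live tensors.
print(unused_reserved_fraction(int(52.5 * 2**30), 70 * 2**30))  # 0.25
```

Calling report_gpu_memory() at peak usage, for example right after the forward pass or inside an except handler for torch.cuda.OutOfMemoryError, shows the state that actually matters.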

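A minimal sketch of the paging idea behind the vLLM PagedAttention example, assuming a hypothetical block size of 16 tokens and a small shared pool; this is an illustration, not vLLM's code. Each sequence keeps a block table mapping its logical token positions to physical blocks, a new block is taken from the free list only when a sequence crosses a block boundary, and a finished sequence returns its blocks. Waste per sequence is therefore bounded by one partially filled block rather than a full maximum-length reservation.

```python
from typing import Dict, List


class ToyKVBlockManager:
    """Hypothetical paged KV-cache manager in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[str, List[int]] = {}  # sequence id -> physical blocks
        self.lengths: Dict[str, int] = {}             # sequence id -> tokens stored

    def append_token(self, seq: str) -> bool:
        """Reserve room for one more token; False means the pool is exhausted."""
        length = self.lengths.get(seq, 0)
        table = self.block_tables.setdefault(seq, [])
        if length % self.block_size == 0:  # last block is full, or there is none yet
            if not self.free_blocks:
                return False
            table.append(self.free_blocks.pop())
        self.lengths[seq] = length + 1
        return True

    def release(self, seq: str) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq, []))
        self.lengths.pop(seq, None)

    def wasted_slots(self, seq: str) -> int:
        """Unused token slots in the sequence's last block (always < block_size)."""
        return len(self.block_tables[seq]) * self.block_size - self.lengths[seq]


mgr = ToyKVBlockManager(num_blocks=8, block_size=16)
for _ in range(20):
    mgr.append_token("chat-a")  # 20 tokens occupy 2 blocks
for _ in range(5):
    mgr.append_token("chat-b")  # 5 tokens occupy 1 block

print(len(mgr.block_tables["chat-a"]), mgr.wasted_slots("chat-a"))  # 2 12
print(len(mgr.free_blocks))                                         # 5
mgr.release("chat-a")
print(len(mgr.free_blocks))                                         # 7
```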

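The batch-size and gradient-checkpointing example can be reasoned about with a rough activation-memory model. The sketch assumes, hypothetically, that every layer stores the same activation memory per sample, and that checkpointing every k layers keeps only one input per k-layer segment plus one segment's activations while it is recomputed during the backward pass. Real models deviate from this, but the trends hold: activation memory scales roughly linearly with batch size, and checkpointing trades extra recomputation for a smaller footprint, with k near the square root of the layer count minimizing the total. In PyTorch, the checkpointing itself is applied with torch.utils.checkpoint.checkpoint.

```python
import math
from typing import Optional


def activation_gb(batch_size: int, num_layers: int, gb_per_sample_per_layer: float,
                  checkpoint_every: Optional[int] = None) -> float:
    """Rough activation memory under a uniform-layer assumption (hypothetical model)."""
    per_layer = batch_size * gb_per_sample_per_layer
    if checkpoint_every is None:
        # Without checkpointing, every layer's activations wait for the backward pass.
        return num_layers * per_layer
    k = checkpoint_every
    # Keep one input per k-layer segment, plus one segment rebuilt during backward.
    return (math.ceil(num_layers / k) + k) * per_layer


layers, per_sample = 48, 0.05   # hypothetical: 48 layers, 50 MB per sample per layer
k = round(math.sqrt(layers))    # k near sqrt(n) minimizes n/k + k

print(activation_gb(16, layers, per_sample))                      # baseline
print(activation_gb(8, layers, per_sample))                       # half the batch, half the memory
print(activation_gb(16, layers, per_sample, checkpoint_every=k))  # checkpointed, same batch
```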

Risks & Pitfalls

Optimizing a single metric can hide larger system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can grow as systems become more complex.

Roadmap

1. Define latency, quality, and cost targets before implementation.
2. Benchmark under realistic load and data conditions.
3. Instrument monitoring for errors, leaks, and user impact.
4. Prepare rollback and incident-response procedures before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
