Technical Guide

GPTQ and AWQ Post-Training Quantization

GPTQ and AWQ are two leading methods for shrinking pre-trained language models to 4-bit precision so that they run on cheaper, smaller hardware.

Overview

GPTQ and AWQ are the reason you can run a capable model on a single consumer GPU instead of a datacenter rack.

Post-training quantization with GPTQ and AWQ is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Post-training quantization (PTQ) compresses a finished model without retraining it, mapping high-precision weights to 4 bits and cutting weight memory to roughly a quarter of the 16-bit footprint. The challenge is doing this without destroying accuracy. GPTQ (a refinement of Optimal Brain Quantization, OBQ) quantizes the weights layer by layer, using second-order information from a small calibration dataset to adjust the remaining weights and compensate for each rounding error. AWQ (Activation-aware Weight Quantization) takes a different angle: it observes that a small fraction of weight channels matter far more than the rest, identifies them by looking at activation magnitudes, and protects these salient channels by scaling them before quantization. Both let models such as Llama run in 4-bit, and tools like vLLM, llama.cpp, and AutoGPTQ have made 4-bit models routine for local inference and cost-efficient serving.
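To make the AWQ idea concrete, here is a minimal sketch in plain Python. It is not AutoAWQ's implementation. It assumes a toy layer stored as a list of output rows, a few calibration input vectors, and a per-row min/max 4-bit grid; the helper names (`quantize_row`, `awq_quantize`, `search_alpha`), the alpha grid, and the toy numbers are all illustrative. Each input channel is scaled by its mean absolute activation raised to a power alpha, the scaled weights are rounded, and the scale is divided back out. With alpha = 0 nothing is scaled, so plain round-to-nearest is always one of the candidates.

```python
# AWQ-style sketch: scale salient input channels before 4-bit rounding.
# All names and the toy data are illustrative assumptions, not a library API.

def quantize_row(row, bits=4):
    """Round-to-nearest onto an asymmetric min/max grid, then dequantize."""
    levels = 2 ** bits - 1
    lo, hi = min(row), max(row)
    step = (hi - lo) / levels or 1.0  # guard against a constant row
    return [round((w - lo) / step) * step + lo for w in row]

def layer_output(weight, x):
    """y = W x for one input vector x; weight is a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def output_error(weight, weight_q, inputs):
    """Sum of squared differences between original and quantized outputs."""
    total = 0.0
    for x in inputs:
        y, y_q = layer_output(weight, x), layer_output(weight_q, x)
        total += sum((a - b) ** 2 for a, b in zip(y, y_q))
    return total

def awq_quantize(weight, inputs, alpha):
    """Scale each input channel by mean|activation|**alpha, quantize, unscale."""
    n_in = len(weight[0])
    mean_abs = [sum(abs(x[j]) for x in inputs) / len(inputs) for j in range(n_in)]
    scales = [max(m, 1e-8) ** alpha for m in mean_abs]
    scaled = [[w * s for w, s in zip(row, scales)] for row in weight]
    quantized = [quantize_row(row) for row in scaled]
    # Dividing by the scale stands in for folding 1/s into the activations.
    return [[q / s for q, s in zip(row, scales)] for row in quantized]

def search_alpha(weight, inputs, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (error, alpha) with the lowest calibration error; alpha=0 is plain rounding."""
    scored = [(output_error(weight, awq_quantize(weight, inputs, a), inputs), a)
              for a in grid]
    return min(scored)

# Toy layer: 2 outputs, 4 inputs; channel 0 sees much larger activations.
weight = [[0.9, -0.2, 0.05, 0.4], [-0.3, 0.7, -0.6, 0.1]]
inputs = [[8.0, 0.5, 0.2, 0.1], [-6.0, 0.3, -0.4, 0.2], [7.0, -0.6, 0.1, -0.3]]

rtn_error = output_error(weight, [quantize_row(r) for r in weight], inputs)
best_error, best_alpha = search_alpha(weight, inputs)
```

Because the search includes alpha = 0, `best_error` can never exceed `rtn_error` on the calibration inputs. How much the scaling helps beyond that depends on the data and on the grouping used by the quantizer.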

Technical Insight

GPTQ uses an approximate Hessian (the curvature of each layer's reconstruction loss, estimated from calibration activations) to decide how rounding one weight should shift the others, minimizing the error introduced. AWQ skips Hessians entirely: it computes per-channel scaling factors so that the important weight channels keep their effective precision, then quantizes all weights on the same uniform grid. Both keep activations in higher precision and compress only the weights, since weights dominate memory while quantizing activations tends to hurt accuracy more.
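The error-compensation step can be sketched for a single linear layer in plain Python. This is a simplified illustration under stated assumptions, not the AutoGPTQ code: it fixes a per-row 4-bit min/max grid, processes columns left to right, adds a small damping term, and updates an explicit inverse Hessian in the OBQ style. Production GPTQ processes columns in blocks and works from a Cholesky factorization for numerical stability. The function names and the toy data are hypothetical.

```python
# GPTQ-style sketch for one linear layer, in plain Python.
# Assumptions: fixed per-row 4-bit grid, left-to-right column order, damping.

def invert(matrix):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def make_grid(row, bits=4):
    """Return (lowest value, step size, highest integer code) for one row."""
    levels = 2 ** bits - 1
    lo, hi = min(row), max(row)
    return lo, (hi - lo) / levels or 1.0, levels

def snap(value, grid):
    """Round a value to the nearest grid point, clamping to the grid range."""
    lo, step, levels = grid
    code = min(levels, max(0, round((value - lo) / step)))
    return code * step + lo

def gptq_quantize(weight, inputs, damp=0.01):
    n_in = len(weight[0])
    # The Hessian of the layer-wise squared error is proportional to X^T X.
    hessian = [[sum(x[i] * x[j] for x in inputs) for j in range(n_in)]
               for i in range(n_in)]
    mean_diag = sum(hessian[i][i] for i in range(n_in)) / n_in
    for i in range(n_in):
        hessian[i][i] += damp * mean_diag  # damping keeps the matrix invertible
    h_inv = invert(hessian)
    grids = [make_grid(row) for row in weight]
    work = [list(row) for row in weight]
    for i in range(n_in):
        d = h_inv[i][i]
        for row, grid in zip(work, grids):
            q = snap(row[i], grid)
            err = (row[i] - q) / d
            row[i] = q
            # Push the rounding error onto the not-yet-quantized columns.
            for j in range(i + 1, n_in):
                row[j] -= err * h_inv[i][j]
        # Remove column i from the inverse Hessian (rank-one downdate).
        for j in range(i + 1, n_in):
            for k in range(i + 1, n_in):
                h_inv[j][k] -= h_inv[j][i] * h_inv[i][k] / d
    return work

weight = [[0.9, -0.2, 0.05, 0.4], [-0.3, 0.7, -0.6, 0.1]]
calibration = [
    [1.0, 0.8, 0.1, -0.2],
    [0.5, 0.6, -0.3, 0.4],
    [-0.7, -0.5, 0.2, 0.1],
    [0.2, 0.1, 0.9, -0.6],
    [-0.4, -0.2, -0.5, 0.8],
]
quantized = gptq_quantize(weight, calibration)
```

When the calibration inputs are uncorrelated across channels, the inverse Hessian is diagonal, no error is redistributed, and the procedure reduces to plain round-to-nearest. The compensation only matters when input channels are correlated, which is the usual case for real activations.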

Mastering GPTQ and AWQ Post-Training Quantization

To build a deep understanding, treat GPTQ and AWQ post-training quantization as an operating model rather than a single feature: define the desired outcome, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using GPTQ and AWQ post-training quantization optimize architecture, data, and infrastructure choices together with reliability and cost. They write down explicit success criteria, test on real data and real workflows, and iterate on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Architectural decisions drive performance and operating cost for years. At the same time, optimizing a single metric can hide larger system weaknesses. The soundest approach combines fast experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Architectural decisions drive performance and operating cost for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability problems in production.

In well-run deployments, each of these translates into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams build confidence rather than uncertainty.

The Future of GPTQ and AWQ Post-Training Quantization

Quantization is pushing below 4 bits to 3-bit, 2-bit, and mixed-precision schemes, often combined with sparsity. Expect tighter integration with serving engines so that quantization, KV-cache compression, and speculative decoding work together. Hardware support for low-bit formats such as NVFP4 and MXFP4 is growing, and automated tooling will increasingly handle per-layer bit allocation. The broad goal is near-lossless 4-bit (and below) as the default, making capable models cheap to serve everywhere.

Real-World Applications

Running a 70-billion-parameter Llama model from 4-bit GPTQ weights on consumer GPUs. The weights alone take roughly 35 GB at 4 bits, so in practice this means two 24 GB cards or partial CPU offload rather than a single 24 GB GPU.

AWQ-quantized models served at high throughput in vLLM for cost-efficient production APIs.

llama.cpp using quantized GGUF weights to run language models locally on a laptop CPU.

The AutoGPTQ and AutoAWQ libraries, which integrate with Hugging Face Transformers, let developers quantize a downloaded model in a few lines of code.
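A back-of-the-envelope estimate shows how bit width maps to weight memory. The sketch below counts weights only (no KV cache, activations, or runtime overhead) and reports decimal gigabytes. The function name `weight_memory_gb`, the group size of 128, and the 4 bytes of per-group metadata (for a scale and zero point) are assumptions chosen for illustration; real formats differ in how they store that metadata.

```python
# Rough weight-memory estimate for a quantized model (weights only).
# group_size and meta_bytes_per_group are illustrative assumptions.

def weight_memory_gb(n_params, bits, group_size=None, meta_bytes_per_group=4):
    """Return approximate weight storage in decimal gigabytes."""
    total_bytes = n_params * bits / 8
    if group_size:
        # Each group of weights carries its own scale and zero point.
        total_bytes += (n_params / group_size) * meta_bytes_per_group
    return total_bytes / 1e9

n_params = 70_000_000_000
fp16_gb = weight_memory_gb(n_params, 16)
int4_gb = weight_memory_gb(n_params, 4)
int4_grouped_gb = weight_memory_gb(n_params, 4, group_size=128)
```

For 70 billion parameters this gives about 140 GB at 16 bits and about 35 GB at 4 bits, with per-group metadata adding a few percent on top. A model of that size therefore needs more than one 24 GB card even at 4 bits, while models in the 7 to 30 billion parameter range fit comfortably.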

Implementation Patterns

GPTQ and AWQ Post-Training Quantization in Practice

The same pattern holds whether the deployment is a 4-bit GPTQ Llama model on consumer GPUs, an AWQ model served through vLLM, a GGUF model running in llama.cpp on a laptop CPU, or a model freshly quantized with AutoGPTQ or AutoAWQ. Teams typically get the best results when they define quality thresholds up front, keep a human escalation path for edge cases, and track production wins and error rates over time.

Risks & Safeguards

Optimizing a single metric can hide larger system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems grow more complex.

Roadmap

1. Define latency, quality, and cost targets before implementation.

2. Benchmark under real load and real data conditions.

3. Monitor production for errors, drift, and user impact.

4. Prepare fallback and rollback paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
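One way to make such an evidence gate explicit is to write the targets down as data and check every measured run against them. The sketch below is a hypothetical illustration: the `Targets` fields, the metric names, and all numbers are placeholders, not recommended thresholds or measured results.

```python
from dataclasses import dataclass

# Hypothetical evidence gate for a quantized-model rollout.
# Field names and numbers are placeholders, not recommendations.

@dataclass
class Targets:
    max_latency_ms: float
    max_quality_drop: float          # allowed quality loss vs. the unquantized baseline
    max_cost_per_1k_tokens: float

def gate_failures(measured, targets):
    """Return the names of the metrics that miss their target (empty means pass)."""
    limits = {
        "latency_ms": targets.max_latency_ms,
        "quality_drop": targets.max_quality_drop,
        "cost_per_1k_tokens": targets.max_cost_per_1k_tokens,
    }
    return [name for name, limit in limits.items() if measured[name] > limit]

targets = Targets(max_latency_ms=250.0, max_quality_drop=0.5,
                  max_cost_per_1k_tokens=0.002)
measured = {"latency_ms": 180.0, "quality_drop": 0.8, "cost_per_1k_tokens": 0.0015}

failures = gate_failures(measured, targets)
ready_to_expand = not failures
```

In this made-up run the latency and cost targets are met but the quality drop is not, so `failures` names that metric and the rollout stays paused until the gap is closed.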

Continue Exploring