Technical Guide

GPTQ and AWQ Post-Training Quantization

GPTQ and AWQ are the two leading methods for shrinking already-trained language models to 4-bit precision so that they run on cheaper, smaller hardware.

Overview

Shrinking weights to 4 bits is what lets a capable model run on a single consumer GPU instead of a datacenter rack.

Post-training quantization with GPTQ or AWQ is a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.

Deep Dive

Post-training quantization (PTQ) compresses a finished model without retraining it, mapping high-precision weights down to 4 bits so that they take roughly a quarter of the memory of 16-bit weights. The challenge is doing this without degrading accuracy. GPTQ, which builds on Optimal Brain Quantization (OBQ), quantizes the weights one layer at a time, using second-order information from a small calibration dataset to adjust the weights that have not yet been quantized so that they compensate for each rounding error. AWQ (Activation-aware Weight Quantization) takes a different angle: it observes that a small fraction of weight channels matter disproportionately, identifies them by looking at activation magnitudes, and protects those salient channels by scaling them rather than quantizing them aggressively. Both let models such as Llama run at 4 bits, and tools such as vLLM, llama.cpp, and AutoGPTQ have made 4-bit inference standard for local use and cost savings.
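The error-compensation idea behind GPTQ can be sketched on a single linear layer. The snippet below is a simplified illustration under stated assumptions, not the reference implementation: it uses synthetic weights and calibration inputs, one scale per output row, and an explicit inverse Hessian that is updated column by column. Production implementations typically work on blocks of columns with a Cholesky factorization and group-wise scales. The names `snap` and `gptq_like` are hypothetical.

```python
import numpy as np

BITS = 4
QMAX = 2 ** (BITS - 1) - 1  # symmetric signed grid: integer codes in [-8, 7]

def snap(w, scale):
    """Round to the nearest point on the 4-bit grid defined by `scale`."""
    return np.clip(np.round(w / scale), -QMAX - 1, QMAX) * scale

def gptq_like(W, X, damp=0.01):
    """Quantize W (out x in) column by column with error compensation.

    X (in x samples) holds calibration inputs for the layer y = W @ X.
    """
    W = W.astype(np.float64).copy()
    n_in = W.shape[1]
    scale = np.abs(W).max(axis=1, keepdims=True) / QMAX  # one scale per row
    H = 2.0 * X @ X.T                                    # Hessian of ||W X - Q X||^2
    H += damp * np.mean(np.diag(H)) * np.eye(n_in)       # damping for stability
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for j in range(n_in):
        Q[:, j] = snap(W[:, j:j + 1], scale)[:, 0]
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Shift the not-yet-quantized columns so they absorb this rounding error.
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
        # Eliminate column j from the inverse Hessian before moving on.
        Hinv = Hinv - np.outer(Hinv[:, j], Hinv[j, :]) / Hinv[j, j]
    return Q

rng = np.random.default_rng(0)
n_out, n_in, n_samples = 16, 32, 256
# Synthetic, correlated calibration inputs: 8 latent factors plus noise.
X = rng.normal(size=(n_in, 8)) @ rng.normal(size=(8, n_samples))
X += 0.3 * rng.normal(size=(n_in, n_samples))
W = rng.normal(size=(n_out, n_in))

scale = np.abs(W).max(axis=1, keepdims=True) / QMAX
Q_rtn = snap(W, scale)        # plain round-to-nearest baseline
Q_gptq = gptq_like(W, X)      # same grid, with error compensation

err_rtn = np.linalg.norm(W @ X - Q_rtn @ X) ** 2
err_gptq = np.linalg.norm(W @ X - Q_gptq @ X) ** 2
print(f"round-to-nearest output error: {err_rtn:.2f}")
print(f"compensated output error:      {err_gptq:.2f}")
```

Both results use the same 4-bit grid; the only difference is that the compensated version lets later columns make up for earlier rounding errors, which is why correlated calibration inputs matter.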

Technical Insight

GPTQ uses an approximation of the Hessian (the curvature of the layer-wise reconstruction loss) to decide how rounding one weight should shift the others, which minimizes the error introduced. AWQ skips Hessians entirely: it computes a per-channel scaling factor so that the salient weight channels keep more effective precision, then quantizes all channels uniformly. Both methods keep activations in higher precision and compress only the weights, because weights dominate memory while quantizing activations tends to hurt accuracy more.
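The per-channel scaling used by AWQ can be sketched in the same toy setting. This is a minimal illustration, assuming synthetic data, one quantization scale per output row, and a small grid search over an exponent `alpha`; the helper names `quantize_rows` and `awq_like` are hypothetical. In a real model the inverse scale is usually folded into the preceding operation rather than divided out of the weights as done here.

```python
import numpy as np

def quantize_rows(W, bits=4):
    """Symmetric round-to-nearest quantization with one scale per row."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

def awq_like(W, X, bits=4, grid=11):
    """Search a per-input-channel scale s = act_mag**alpha that lowers output error.

    Assumes every input channel has a nonzero mean activation magnitude.
    """
    act_mag = np.abs(X).mean(axis=1)       # average activation size per channel
    ref = W @ X                            # full-precision layer output
    best = None
    for alpha in np.linspace(0.0, 1.0, grid):
        s = act_mag ** alpha               # alpha = 0 gives s = 1 (plain rounding)
        # Scale the columns up, quantize, then undo the scale. Mathematically
        # this equals quantizing W * s and feeding the layer X / s.
        W_eff = quantize_rows(W * s, bits) / s
        err = np.linalg.norm(ref - W_eff @ X) ** 2
        if best is None or err < best[0]:
            best = (err, float(alpha), W_eff)
    return best

rng = np.random.default_rng(0)
n_out, n_in, n_samples = 16, 32, 256
X = rng.normal(size=(n_in, n_samples))
X[:4] *= 20.0                              # four synthetic "salient" input channels
W = rng.normal(size=(n_out, n_in))

err_rtn = np.linalg.norm(W @ X - quantize_rows(W) @ X) ** 2
err_awq, alpha, W_awq = awq_like(W, X)
print(f"round-to-nearest output error: {err_rtn:.2f}")
print(f"scaled output error:           {err_awq:.2f} (alpha = {alpha:.1f})")
```

Because `alpha = 0` is part of the search grid, the selected scaling can never be worse than plain rounding on the calibration data; whether it generalizes still has to be checked on held-out inputs.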

Mastering GPTQ and AWQ Post-Training Quantization

To build a deep understanding, treat GPTQ and AWQ post-training quantization as an operating model, not a single feature: define the desired outcomes, make your assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use GPTQ and AWQ post-training quantization optimize their architecture, data, and infrastructure choices against reliability and cost. They write clear success criteria, evaluate against real data and workflows, and iterate on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Architectural decisions drive performance and operating costs for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The most robust approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Architectural decisions drive performance and operating costs for years.

Technical literacy helps teams choose the right stack, not just the newest one.

Better engineering choices reduce reliability incidents in production.

In high-quality deployments, each of these points translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

The Future of GPTQ and AWQ Post-Training Quantization

Quantization is pushing below 4 bits toward 3-bit, 2-bit, and mixed-precision schemes, often combined with sparsity. Expect tighter integration with serving engines so that quantization, KV-cache compression, and speculative decoding work together. Hardware support for low-bit formats such as NVFP4 and MXFP4 is growing, and automated tools will increasingly choose the bit width for each layer. The broader goal is near-lossless 4-bit (and lower) quantization as the default, making strong models cheap enough to run everywhere.
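To make per-layer bit-width selection concrete, here is one possible greedy heuristic, offered as an assumption for illustration rather than a description of how any particular tool works. It starts every layer at the lowest bit width and repeatedly upgrades the layer that buys the largest error reduction per extra bit of storage, until an average-bit budget is used up. The layer names, parameter counts, and error values are invented placeholders; in practice the errors would be measured on calibration data.

```python
def allocate_bits(layers, avg_bit_budget, choices=(2, 3, 4, 8)):
    """Greedy mixed-precision plan.

    layers: list of (name, n_params, {bits: measured_error}) tuples.
    Returns a dict mapping layer name to its chosen bit width.
    """
    total_params = sum(params for _, params, _ in layers)
    budget = avg_bit_budget * total_params          # total storage budget in bits
    plan = {name: choices[0] for name, _, _ in layers}
    used = sum(params * choices[0] for _, params, _ in layers)
    while True:
        best = None
        for name, params, err in layers:
            idx = choices.index(plan[name])
            if idx + 1 == len(choices):
                continue                            # already at the widest option
            nxt = choices[idx + 1]
            cost = params * (nxt - plan[name])      # extra bits this upgrade needs
            gain = (err[plan[name]] - err[nxt]) / cost
            fits = used + cost <= budget + 1e-9
            if fits and gain > 0 and (best is None or gain > best[0]):
                best = (gain, name, nxt, cost)
        if best is None:
            return plan                             # nothing affordable is left
        _, name, nxt, cost = best
        plan[name] = nxt
        used += cost

# Placeholder layers: (name, parameters in millions, error at each bit width).
layers = [
    ("layer_a", 10, {2: 8.0, 3: 2.0, 4: 0.5, 8: 0.0}),
    ("layer_b", 30, {2: 6.0, 3: 3.0, 4: 1.5, 8: 0.3}),
    ("layer_c", 10, {2: 1.0, 3: 0.8, 4: 0.7, 8: 0.6}),
]
plan = allocate_bits(layers, avg_bit_budget=3.6)
print(plan)  # {'layer_a': 4, 'layer_b': 4, 'layer_c': 2}
```

With these placeholder numbers the insensitive layer stays at 2 bits so that the two sensitive layers can afford 4 bits within the same average budget.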

Real-World Implementations

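Running a Llama model on consumer GPUs with 4-bit GPTQ weights: a model of roughly 30 billion parameters fits on a single 24 GB card, while a 70-billion-parameter model needs about 35 GB for its 4-bit weights alone and therefore two such cards or one 48 GB GPU.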

AWQ-quantized models served at high throughput on vLLM for cost-efficient production APIs.

llama.cpp using quantized GGUF weights to run language models locally on a CPU-only laptop.

The AutoGPTQ and AutoAWQ libraries, which integrate with Hugging Face Transformers, letting engineers quantize a downloaded model in a few lines of code.
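Whether a given model fits on a given card is mostly arithmetic on the weights. The sketch below estimates weight memory only; the KV cache, activations, and runtime overhead come on top. The per-group metadata layout (a 16-bit scale plus a 4-bit zero point for every 128 weights) is an assumed example, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params, bits, group_size=None, meta_bits=20):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    meta_bits is the assumed per-group overhead (scale plus zero point).
    """
    bits_per_weight = bits + (meta_bits / group_size if group_size else 0.0)
    return n_params * bits_per_weight / 8 / 1e9

gb_70b_fp16 = weight_memory_gb(70e9, 16)                    # 140.0
gb_70b_4bit = weight_memory_gb(70e9, 4)                     # 35.0
gb_70b_grouped = weight_memory_gb(70e9, 4, group_size=128)  # about 36.4
gb_30b_grouped = weight_memory_gb(30e9, 4, group_size=128)  # about 15.6

for label, gb in [
    ("70B at 16-bit", gb_70b_fp16),
    ("70B at 4-bit", gb_70b_4bit),
    ("70B at 4-bit, groups of 128", gb_70b_grouped),
    ("30B at 4-bit, groups of 128", gb_30b_grouped),
]:
    verdict = "fits" if gb < 24 else "does not fit"
    print(f"{label}: {gb:.1f} GB, {verdict} in 24 GB (weights only)")
```

The numbers show why 4-bit weights cut memory to about a quarter of 16-bit weights, and why a 70-billion-parameter model still exceeds a single 24 GB card at 4 bits.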

Usage Patterns

In each of the deployments listed above, teams generally get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Optimizing for a single benchmark can hide broader system weaknesses.

Infrastructure and maintenance costs are often underestimated.

Security and observability gaps can widen as systems grow more complex.

Getting Started

1. Define latency, quality, and cost targets before rollout.

2. Benchmark under realistic load and data conditions.

3. Instrument monitoring for errors, drift, and user impact.

4. Prepare rollback and incident paths before scaling.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
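An evidence gate of this kind can be as simple as comparing measured values against the targets agreed before rollout. The sketch below is one hypothetical shape for such a check; the metric names, the use of perplexity increase as the quality signal, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Targets:
    max_p95_latency_ms: float
    max_perplexity_increase_pct: float   # quality loss vs. the unquantized model
    max_cost_per_million_tokens: float

def gate(measured, targets):
    """Return the list of criteria that the measured values fail."""
    failures = []
    if measured["p95_latency_ms"] > targets.max_p95_latency_ms:
        failures.append("latency")
    if measured["perplexity_increase_pct"] > targets.max_perplexity_increase_pct:
        failures.append("quality")
    if measured["cost_per_million_tokens"] > targets.max_cost_per_million_tokens:
        failures.append("cost")
    return failures

# Illustrative targets and measurements, not real benchmark results.
targets = Targets(
    max_p95_latency_ms=250.0,
    max_perplexity_increase_pct=2.0,
    max_cost_per_million_tokens=0.50,
)
measured = {
    "p95_latency_ms": 180.0,
    "perplexity_increase_pct": 3.1,
    "cost_per_million_tokens": 0.35,
}

failures = gate(measured, targets)
decision = "expand" if not failures else "pause"
print(decision, failures)  # pause ['quality']
```

Here the quantized model meets its latency and cost targets but loses too much quality, so the rollout pauses until that gap is closed.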
