Overview
SmoothQuant is a method that makes it possible to compress large language models down to 8-bit integers for both weights and activations, without retraining. It matters because the activations of large models contain extreme outliers that usually break low-precision arithmetic, and SmoothQuant smooths those outliers out.
Together, SmoothQuant and activation quantization form a technical building block that affects model quality, infrastructure cost, latency, and reliability at scale.
Deep Dive
When you shrink a model from 16-bit floating point to 8-bit integers, the weights compress easily but the activations are the problem: a few channels carry values 10 to 100 times larger than the rest, and forcing them onto an integer grid destroys accuracy. SmoothQuant, introduced by Xiao et al. in 2022, observes that weights are smooth and easy to quantize while activations are spiky. It therefore migrates the difficulty mathematically: it divides each activation channel by a per-channel scale and multiplies the corresponding weights by the same scale. The two operations cancel, leaving the model's output unchanged, but both tensors now sit in quantization-friendly ranges. The result is W8A8 (8-bit weights and 8-bit activations) with near-zero accuracy loss; the original paper reports up to about 1.56x faster inference and about 2x lower memory use.
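To make the outlier problem concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization applied to two activation rows. The helper names (quantize_int8, dequantize, mean_abs_error) and the sample values are assumptions chosen for illustration, not data from the paper, and real kernels operate on tensors rather than Python lists.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, step size)."""
    max_abs = max(abs(v) for v in values)
    step = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / step))) for v in values]
    return codes, step


def dequantize(codes, step):
    """Map integer codes back to approximate real values."""
    return [c * step for c in codes]


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical activations for one token. The first three channels are identical
# in both rows; only the last channel differs (ordinary value vs. outlier).
rows = {
    "smooth": [0.12, -0.30, 0.25, 0.08],
    "spiky": [0.12, -0.30, 0.25, 60.0],
}

results = {}
for name, row in rows.items():
    codes, step = quantize_int8(row)
    restored = dequantize(codes, step)
    # Measure the error only on the three small channels shared by both rows.
    err = mean_abs_error(row[:3], restored[:3])
    results[name] = (codes, step, err)
    print(f"{name}: step={step:.4f} codes={codes} small-channel error={err:.4f}")
```

In the spiky row the single outlier stretches the quantization step to roughly 0.47, so the three small channels collapse to the codes 0, -1, and 1 and lose almost all of their information. This is the failure mode that smoothing is meant to avoid.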
Technical Insight
The core trick is a per-channel smoothing factor s, computed for each input channel j as s_j = max(|X_j|)^alpha / max(|W_j|)^(1-alpha), where X_j is channel j of the activations and W_j is the matching row of the weight matrix. Activations are scaled by 1/s and weights by s, so the matrix product XW is preserved. Because the scale is folded offline into the weights of the previous layer or into a fused operation, it adds zero runtime cost. The alpha hyperparameter (typically 0.5) controls how much of the outlier burden moves from the activations to the weights.
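The formula above can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the function names, the small epsilon clamp, and the sample matrices are hypothetical, X is laid out as tokens by input channels, and W as input channels by output channels.

```python
def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smoothing_scales(X, W, alpha=0.5, eps=1e-5):
    """s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha) for each input channel j."""
    scales = []
    for j in range(len(W)):
        act_max = max(max(abs(row[j]) for row in X), eps)  # clamp avoids zero scales
        w_max = max(max(abs(w) for w in W[j]), eps)
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales


def smooth(X, W, scales):
    """Divide activation channels by s and multiply the matching weight rows by s."""
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    W_s = [[w * s for w in row] for row, s in zip(W, scales)]
    return X_s, W_s


# Hypothetical data: channel 2 of the activations is the outlier channel.
X = [[0.1, -0.2, 40.0], [0.3, 0.1, -55.0]]
W = [[0.5, -0.4], [0.2, 0.6], [-0.3, 0.1]]

scales = smoothing_scales(X, W, alpha=0.5)
X_s, W_s = smooth(X, W, scales)

Y, Y_s = matmul(X, W), matmul(X_s, W_s)
max_diff = max(abs(a - b) for ra, rb in zip(Y, Y_s) for a, b in zip(ra, rb))
outlier_before = max(abs(row[2]) for row in X)
outlier_after = max(abs(row[2]) for row in X_s)
print(f"max output difference: {max_diff:.2e}")
print(f"outlier channel max: {outlier_before:.2f} -> {outlier_after:.2f}")
```

With alpha = 0.5, the per-channel maximum of the smoothed activations equals the per-channel maximum of the smoothed weights, so the quantization difficulty is split evenly between the two tensors. Raising alpha shifts more of the burden onto the weights.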
Mastering SmoothQuant and Activation Quantization
To build a deep understanding, treat SmoothQuant and activation quantization as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams that use SmoothQuant and activation quantization tune their architecture, data, and infrastructure choices against reliability and cost. They write explicit success criteria, evaluate against real data and real workflows, and iterate on observed failure patterns rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
Architecture decisions drive performance and operating costs for years. At the same time, optimizing for a single benchmark can hide broader system weaknesses. The most robust approach combines fast experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and update safeguards continuously as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Architecture decisions drive performance and operating costs for years.
Technical literacy helps teams choose the right stack, not just the newest one.
Better engineering choices reduce reliability incidents in production.
In high-quality deployments, each of these points translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.
Real-World Applications
Serving a 70B-parameter LLM in W8A8 on fewer GPUs by cutting both memory and matrix-multiply cost.
Enabling INT8 inference on NVIDIA Hopper/Blackwell tensor cores, which natively accelerate 8-bit integer math.
Deploying chat models in cost-constrained cloud environments, where higher throughput directly lowers the per-token bill.
Compressing transformer encoders for on-device speech or translation, where 8-bit kernels run faster and cooler.
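The memory side of the first use case is simple arithmetic, sketched below. The 80 GB GPU size and the 0.8 usable fraction (headroom left for the KV cache and activations) are assumptions for illustration, and the function names are hypothetical. The estimate covers weights only.

```python
import math


def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def min_gpus(memory_gb, gpu_memory_gb, usable_fraction=0.8):
    """Smallest GPU count whose usable memory can hold the weights.

    usable_fraction is an assumed headroom factor, not a measured value.
    """
    return math.ceil(memory_gb / (gpu_memory_gb * usable_fraction))


params = 70e9  # 70B-parameter model
fp16_gb = weight_memory_gb(params, 16)
int8_gb = weight_memory_gb(params, 8)

print(f"FP16 weights: {fp16_gb:.0f} GB, at least {min_gpus(fp16_gb, 80)} GPUs")
print(f"INT8 weights: {int8_gb:.0f} GB, at least {min_gpus(int8_gb, 80)} GPUs")
```

Halving the bits per weight halves the weight footprint, which is where the GPU-count reduction comes from. Actual sizing also depends on batch size, sequence length, and the serving framework.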
Implementation Patterns
SmoothQuant and activation quantization in practice
The same pattern holds across all four applications listed above: serving a 70B-parameter LLM on fewer GPUs, INT8 inference on Hopper/Blackwell tensor cores, cost-constrained cloud chat deployments, and on-device speech or translation encoders. Teams generally get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and the cost of errors over time.
Risks & Guardrails
Optimizing for a single benchmark can hide broader system weaknesses.
Infrastructure and maintenance costs are often underestimated.
Safety and observability gaps can grow as systems become more complex.
Getting Started Guide
Define latency, quality, and cost targets before deployment.
Benchmark under realistic load and data conditions.
Instrument monitoring for errors, drift, and user impact.
Prepare rollback and incident procedures before scaling.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
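One way to make the evidence gate explicit is a small check that compares pilot measurements against the targets defined in the first step. This is a minimal sketch: the class names, field names, and threshold values are all hypothetical, and a real gate would read its measurements from your own benchmarking and monitoring systems.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    max_p95_latency_ms: float
    max_quality_drop: float        # allowed drop vs. the 16-bit baseline, in points
    max_cost_per_1k_tokens: float


@dataclass
class Measurement:
    p95_latency_ms: float
    quality_drop: float
    cost_per_1k_tokens: float


def gate_failures(targets, measured):
    """Return the names of the criteria that the measurement fails to meet."""
    failures = []
    if measured.p95_latency_ms > targets.max_p95_latency_ms:
        failures.append("latency")
    if measured.quality_drop > targets.max_quality_drop:
        failures.append("quality")
    if measured.cost_per_1k_tokens > targets.max_cost_per_1k_tokens:
        failures.append("cost")
    return failures


# Hypothetical targets and pilot results for a W8A8 deployment.
targets = Targets(max_p95_latency_ms=200.0, max_quality_drop=0.5, max_cost_per_1k_tokens=0.02)
pilot = Measurement(p95_latency_ms=180.0, quality_drop=0.8, cost_per_1k_tokens=0.015)

failures = gate_failures(targets, pilot)
if failures:
    print(f"pause rollout and close gaps: {failures}")
else:
    print("criteria met, expand usage")
```

Returning the list of failed criteria, rather than a single pass or fail flag, keeps the decision log specific about which gap has to be closed before the rollout continues.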