Language AI Guide

KV cache

The KV cache stores the key and value tensors that a transformer has already computed for previous tokens, so they do not have to be recomputed for every new token it generates.

Overview

The KV cache is the main reason text generation is fast, and it is also the main consumer of your GPU memory during long conversations.

The KV cache is one part of the language-AI stack used to read, generate, classify, and transform text and speech at scale.

Deep Dive

Transformers generate text one token at a time, and each new token has to attend to every previous token. The attention mechanism turns each token into a query, a key, and a value. Without caching, generating token 1,000 means recomputing the keys and values for all 999 earlier tokens at every step, which is redundant, wasted work. The KV cache stores those keys and values once they are first computed and reuses them, so each new step only computes projections for the single new token and attends over the stored entries. The key/value computation per token then stays roughly constant instead of growing with sequence length, although the attention read over the cache still grows linearly with context. The trade-off is memory: the cache grows linearly with context length, number of layers, and number of attention heads, and it often becomes the largest memory consumer in long-context serving.
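The saving described above can be seen in a toy setting. The following is a minimal sketch, assuming a single attention head, random weights, and plain Python lists; `ToyAttention` and `compare` are hypothetical names made up for this illustration, not part of any library. It checks that cached and uncached decoding give the same output while counting how many tokens had their keys and values projected.

```python
import math
import random


def matvec(weights, x):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class ToyAttention:
    """Hypothetical single-head attention layer, for illustration only."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def make_matrix():
            return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

        self.w_q, self.w_k, self.w_v = make_matrix(), make_matrix(), make_matrix()
        self.k_cache, self.v_cache = [], []
        self.kv_projections = 0  # number of tokens whose K/V were computed

    def step_cached(self, x):
        """Project only the new token, append to the cache, attend over it."""
        self.k_cache.append(matvec(self.w_k, x))
        self.v_cache.append(matvec(self.w_v, x))
        self.kv_projections += 1
        return attend(matvec(self.w_q, x), self.k_cache, self.v_cache)

    def step_uncached(self, xs):
        """Recompute K/V for every token seen so far, then attend."""
        keys = [matvec(self.w_k, x) for x in xs]
        values = [matvec(self.w_v, x) for x in xs]
        self.kv_projections += len(xs)
        return attend(matvec(self.w_q, xs[-1]), keys, values)


def compare(n_tokens=6, dim=4):
    """Return (max output difference, cached projections, uncached projections)."""
    rng = random.Random(1)
    tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
    cached, uncached = ToyAttention(dim), ToyAttention(dim)  # same seed, same weights
    max_diff = 0.0
    for t in range(n_tokens):
        a = cached.step_cached(tokens[t])
        b = uncached.step_uncached(tokens[: t + 1])
        max_diff = max(max_diff, max(abs(p - q) for p, q in zip(a, b)))
    return max_diff, cached.kv_projections, uncached.kv_projections


if __name__ == "__main__":
    diff, with_cache, without_cache = compare()
    print(f"max output difference: {diff:.2e}")
    print(f"K/V projections with cache: {with_cache}, without: {without_cache}")
```

For six tokens the cached path projects keys and values 6 times, while the uncached path does so 1 + 2 + ... + 6 = 21 times. Note that `attend` still loops over every stored key in both paths, which is the linear attention read that caching does not remove.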

Technical Insight

During 'prefill' the model processes the whole prompt at once and fills the cache; during 'decode' it appends the K/V for one token per step and reads the whole cache back. Cache size scales as 2 (K and V) × layers × KV heads × head_dim × sequence_length × batch, multiplied by the bytes per element at the chosen precision. To shrink this, modern models use grouped-query or multi-query attention to share keys and values across heads, and serving systems such as vLLM use PagedAttention to allocate the cache in non-contiguous blocks, cutting fragmentation and waste.
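The size formula above is easy to turn into a small calculator. This is a sketch under stated assumptions: the configuration (32 layers, head_dim 128, a 4,096-token context, 16-bit elements) is a hypothetical example rather than the specification of any particular model, and `kv_cache_bytes` is a name invented here.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_element=2):
    """KV cache size in bytes: 2 (K and V) x layers x KV heads x head_dim
    x sequence length x batch x bytes per element."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_element


GIB = 1024 ** 3

if __name__ == "__main__":
    # Hypothetical configuration; every head keeps its own K/V.
    full = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
    # Same configuration with grouped-query attention sharing K/V across 8 KV heads.
    grouped = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
    print(f"32 KV heads: {full / GIB:.2f} GiB")    # 2.00 GiB
    print(f" 8 KV heads: {grouped / GIB:.2f} GiB")  # 0.50 GiB
```

Under these assumptions, sharing keys and values across groups of four heads cuts the cache to a quarter of its size, and every additional sequence in the batch adds the same amount again.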

A Guide to the KV Cache

To build a deeper understanding, treat the KV cache as an operating pattern rather than a single feature: define the desired outcome, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams that use the KV cache design modeling, retrieval, and review loops as one connected system. They write explicit success criteria, test on real data and real workflows, and iterate on observed failure patterns rather than on one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

Language workflows can move quickly without sacrificing accuracy. At the same time, hallucinated content can quietly slip into reports, support flows, or research artifacts. The most effective approach pairs experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Language workflows can move quickly without sacrificing accuracy.

Access expands across languages and communication styles.

Teams can spend more time on judgment while automation handles repetition.

In effective deployments, each of these translates into measurable operating rules, clear ownership boundaries, and a regular review cadence, so that teams build confidence rather than spreading doubt.

The Future of the KV Cache

As context windows stretch into hundreds of thousands of tokens, the KV cache becomes the main bottleneck, so compression is an active area: quantizing the cache to 8 or 4 bits, eviction policies that drop low-importance tokens, sharing prefixes across requests, and offloading to CPU or disk. Architectural changes such as multi-head latent attention compress the cache itself. Expect continued innovation in attention variants and memory systems aimed at serving long contexts cheaply and at high throughput.
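Cache quantization, the first compression technique listed above, can be illustrated with a simple scheme. The sketch below assumes symmetric per-vector 8-bit quantization with one floating-point scale per cached vector; this is one common choice for illustration, not a claim about how any particular serving system does it, and the function names are hypothetical.

```python
def quantize_int8(vector):
    """Symmetric per-vector quantization: one float scale plus codes in [-127, 127].

    The codes are plain Python ints here; a real implementation would pack
    them into an int8 tensor to actually save memory.
    """
    peak = max(abs(x) for x in vector)
    scale = peak / 127 if peak > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return scale, codes


def dequantize_int8(scale, codes):
    """Recover approximate floats from the stored scale and codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    key = [0.5, -1.27, 0.03, 1.0]  # made-up cached key vector
    scale, codes = quantize_int8(key)
    restored = dequantize_int8(scale, codes)
    worst = max(abs(a - b) for a, b in zip(key, restored))
    print("codes:", codes)
    print(f"worst-case error: {worst:.5f} (bound: {scale / 2:.5f})")
```

Each element is off by at most half a quantization step, so the error bound tightens or loosens with the largest magnitude in the vector. Moving from 16-bit to 8-bit storage roughly halves cache memory, at the cost of this rounding error and one extra scale per vector.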

Real-World Applications

Speeding up chatbot responses by reusing cached keys and values from the conversation history instead of reprocessing it on every turn.

Prefix caching that shares the cache for a long system prompt across many users, cutting cost and latency.

vLLM's PagedAttention, which manages the KV cache in blocks to serve concurrent requests efficiently on a single GPU.

Quantizing the KV cache to lower precision to fit longer contexts into a fixed amount of GPU memory.
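The prefix caching idea in the list above can be sketched as a lookup keyed by the exact token prefix. Everything here is hypothetical: `PrefixCache`, `fake_prefill`, and `serve` are illustrative names, and `fake_prefill` only returns placeholder entries where a real system would run the model's prefill pass.

```python
class PrefixCache:
    """Hypothetical store of precomputed K/V entries, keyed by token prefix."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix_tokens, compute_kv):
        key = tuple(prefix_tokens)  # lists are not hashable, tuples are
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute_kv(prefix_tokens)
        return self._store[key]


def fake_prefill(tokens):
    """Stand-in for the expensive prefill pass; returns placeholder K/V pairs."""
    return [(f"k{t}", f"v{t}") for t in tokens]


def serve(cache, system_prompt, user_tokens):
    """Reuse the shared prompt's K/V and prefill only the user-specific suffix.

    Simplification: in a real model the suffix prefill would attend over the
    cached prefix entries; the placeholder here ignores that dependency.
    """
    shared_kv = cache.get_or_compute(system_prompt, fake_prefill)
    return list(shared_kv) + fake_prefill(user_tokens)


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = [1, 2, 3]  # made-up token ids
    serve(cache, system_prompt, [10])
    serve(cache, system_prompt, [20, 21])
    print(f"hits={cache.hits} misses={cache.misses}")
```

The first request pays for the system prompt once; the second finds it in the cache and only processes its own two tokens. An exact-match key is the simplest design, and it means that changing a single token in the shared prompt invalidates the entry.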

Implementation Approaches

KV Cache in practice

For each of the applications above, teams typically get the best results when they define quality metrics up front, keep a human escalation path for edge cases, and track production gains and the cost of errors over time.
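Block-based cache management of the kind PagedAttention performs can be sketched with a pool of fixed-size blocks and a per-request block table. This is a simplified, hypothetical illustration of the idea and not vLLM's actual API or data structures; `BlockAllocator` and its methods are invented for this example.

```python
class BlockAllocator:
    """Hypothetical fixed-size block pool with one block table per request."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # request id -> list of physical block ids
        self.lengths = {}       # request id -> number of tokens stored

    def append_token(self, request_id):
        """Reserve a slot for one token's K/V and return (block id, offset)."""
        table = self.block_tables.setdefault(request_id, [])
        length = self.lengths.get(request_id, 0)
        if length == len(table) * self.block_size:  # no block yet, or last one full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[request_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def release(self, request_id):
        """Return all of a finished request's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.lengths.pop(request_id, None)


if __name__ == "__main__":
    pool = BlockAllocator(num_blocks=4, block_size=2)
    pool.append_token("a")
    pool.append_token("a")
    pool.append_token("b")  # request b takes a block between a's two blocks
    pool.append_token("a")
    print("block table for a:", pool.block_tables["a"])
    pool.release("a")
    print("free blocks after release:", len(pool.free_blocks))
```

Because request `a` gets a new block only when its last one fills up, its blocks end up non-contiguous once request `b` allocates in between. At most one partially filled block per request is wasted, instead of a large contiguous region reserved up front.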

Risks & Safeguards

Hallucinated content can quietly slip into reports, support flows, or research artifacts.

Prompt sensitivity can produce inconsistent results across similar requests.

Sensitive text data can be exposed if access controls are weak.

Roadmap

1. Define the output format, tone, and quality metrics before rollout.

2. Ground answers in trusted sources whenever accuracy matters.

3. Keep a human checkpoint for high-stakes outputs.

4. Track failure patterns and revise prompts or workflows regularly.

Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
