Language AI Guide

Word2Vec Skip-gram and CBOW

Word2Vec is a 2013 technique from Google that learns dense word vectors by predicting words from their neighbors, turning language into math where words with similar meanings end up close together.

Overview

Word2Vec made the famous "king - man + woman ≈ queen" analogy possible and helped launch the modern era of word embeddings.

Word2Vec Skip-Gram and CBOW are part of the language-AI toolkit used to read, generate, classify, and transform text and speech at scale.

Deep Dive

Word2Vec, introduced by Tomas Mikolov and colleagues at Google in 2013, learns a vector (typically 100 to 300 dimensions) for each word by training a shallow two-layer neural network over a sliding context window. It comes in two flavors. CBOW (Continuous Bag of Words) takes the surrounding context words and predicts the missing center word, averaging the context vectors together. Skip-Gram inverts this: it takes the center word and tries to predict each surrounding context word. The model never actually cares about the prediction task itself; the goal is the weight matrix it learns along the way, whose rows become the word vectors. Words appearing in similar contexts end up with similar vectors, capturing meaning purely from co-occurrence.
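To make the difference between the two flavors concrete, here is a minimal Python sketch (assuming NumPy is available) that turns one tokenized sentence into training examples for each variant. The function names, the toy sentence, and the 8-dimensional matrix are illustrative assumptions rather than any library's API; the last lines show where CBOW averages several input rows while Skip-Gram uses a single center row.

```python
import numpy as np
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: the surrounding context words are the input, the center word is the target."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-Gram: the center word is the input; each context word is a separate target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


sentence = "the king rules the land".split()
vocab = sorted(set(sentence))
index = {w: k for k, w in enumerate(vocab)}

# Input weight matrix: one row per vocabulary word. After training, these rows
# are the word vectors (toy size 8 here; real models typically use 100-300).
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), 8))

context, center = cbow_examples(sentence, window=1)[1]
h_cbow = W_in[[index[w] for w in context]].mean(axis=0)  # CBOW averages the context rows
h_skip = W_in[index[center]]                             # Skip-Gram uses the center row alone

print(cbow_examples(sentence, window=1)[1])       # (['the', 'rules'], 'king')
print(skipgram_examples(sentence, window=1)[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'rules')]
```

Note that one sentence yields far more Skip-Gram pairs than CBOW examples, which is one reason CBOW trains faster.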

Technical Insight

Training a full softmax over a large vocabulary is very slow, so Word2Vec uses tricks such as negative sampling, which recasts prediction as binary classification: distinguishing the true context word from a handful of randomly drawn "negative" words. It also subsamples very frequent words such as "the", and draws negatives from the unigram distribution raised to the 0.75 power. CBOW is faster and works well for frequent words; Skip-Gram with negative sampling handles rare words better.
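The three tricks above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference word2vec code: the helper names are hypothetical, the subsampling rule follows the discard probability 1 - sqrt(t / f) given in the 2013 follow-up paper by Mikolov et al. (the released C tool and other libraries use slightly different variants), and random toy vectors stand in for real embedding rows.

```python
import numpy as np
from collections import Counter


def noise_distribution(counts: Counter, power: float = 0.75) -> dict:
    """Unigram counts raised to `power` and renormalized. Raising to 0.75 flattens
    the distribution, so rare words are drawn as negatives more often than their
    raw frequency alone would suggest."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def keep_probability(freq: float, t: float = 1e-5) -> float:
    """Subsampling, paper form: discard a word with probability 1 - sqrt(t / f),
    i.e. keep it with probability min(1, sqrt(t / f)). `freq` is the word's
    relative corpus frequency; `t` is a small threshold."""
    return min(1.0, float(np.sqrt(t / freq)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_step(v_c, u_o, u_neg, lr=0.025):
    """One SGD step of skip-gram with negative sampling for one (center, context) pair.

    Loss = -log sigmoid(u_o . v_c) - sum_k log sigmoid(-u_k . v_c)
    Returns the updated (v_c, u_o, u_neg) and the loss *before* the update."""
    pos = sigmoid(u_o @ v_c)          # probability assigned to the true context word
    neg = sigmoid(-(u_neg @ v_c))     # shape (k,): probability each negative is "not context"
    loss = -np.log(pos) - np.log(neg).sum()
    g_pos = pos - 1.0                 # dLoss / d(u_o . v_c)
    g_neg = 1.0 - neg                 # dLoss / d(u_k . v_c), shape (k,)
    grad_v = g_pos * u_o + g_neg @ u_neg
    grad_u_o = g_pos * v_c
    grad_u_neg = np.outer(g_neg, v_c)
    return v_c - lr * grad_v, u_o - lr * grad_u_o, u_neg - lr * grad_u_neg, float(loss)


counts = Counter({"the": 100, "king": 10, "queen": 1})
noise = noise_distribution(counts)
rng = np.random.default_rng(0)
negatives = rng.choice(list(noise), size=5, p=list(noise.values()))

# Toy vectors stand in for rows of the input (v) and output (u) matrices;
# a real trainer would look up u_neg for the sampled `negatives`.
dim, k = 8, len(negatives)
v_c = rng.normal(scale=0.1, size=dim)
u_o = rng.normal(scale=0.1, size=dim)
u_neg = rng.normal(scale=0.1, size=(k, dim))

v_c, u_o, u_neg, loss_before = sgns_step(v_c, u_o, u_neg)
_, _, _, loss_after = sgns_step(v_c, u_o, u_neg)
print(noise)
print(loss_before, loss_after)
```

Because each update touches only the true context word and k sampled negatives, its cost is proportional to k times the vector size rather than to the full vocabulary.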

A Guide to Word2Vec Skip-gram and CBOW

To build deeper understanding, treat Word2Vec Skip-Gram and CBOW as an operating model rather than a single feature: define the outcomes you want, make assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using Word2Vec Skip-Gram and CBOW design models, retrieval, and review loops as one connected system. They write down explicit success criteria, test on realistic data and workflows, and iterate on observed failure patterns rather than one-off wins. This is where conceptual understanding turns into durable capability across products, policies, and operations.

Language workflows can move quickly without sacrificing accuracy. At the same time, unnoticed errors can slip quietly into reports, support flows, or research outputs. The most practical approach pairs fast experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep safeguards up to date as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Language workflows can move faster without sacrificing accuracy.

In effective organizations, this translates into measurable operating rules, clear ownership boundaries, and regular review cycles, so teams build confidence instead of accumulating doubt.

Broadens access across languages and communication styles.

Teams can spend more time on judgment while automation handles repetitive work.

The Future of Word2Vec Skip-gram and CBOW

Static embeddings such as Word2Vec have largely been superseded by contextual models (ELMo, BERT, transformers) that give the same word a different vector depending on the surrounding sentence, solving the polysemy problem in which "bank" gets a single fixed vector. Yet Word2Vec endures where speed, simplicity, and interpretability matter: recommendation systems, search, and as a teaching foundation. Its core idea, that meaning emerges from co-occurrence statistics, remains a conceptual cornerstone of modern language models.

Real-World Applications

Spotify and Airbnb adapted Skip-Gram to learn track and listing embeddings ("item2vec") from user session sequences for recommendations.

Powers semantic search and synonym expansion, so a query for "laptop" also surfaces "notebook" and "computer".

Detects analogies and relationships in text, such as capital-country pairs (Paris is to France as Tokyo is to Japan).

Provides the input embedding layer for larger NLP pipelines for sentiment analysis and document classification on limited data.

Implementation Approaches

Word2Vec Skip-gram and CBOW in practice

Spotify and Airbnb adapted Skip-Gram to learn track and listing embeddings ("item2vec") from user session sequences for recommendations. Teams usually get better results when they define quality gates up front, keep a human escalation path for edge cases, and track both production gains and error costs over time.
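For the recommendation case, the adaptation mostly changes what counts as a "sentence": a user session becomes the sequence and item IDs play the role of words. The minimal sketch below assumes sessions are already lists of item IDs (the IDs are hypothetical). An ordered session can go through the usual sliding window, while the original item2vec formulation treats each item set as unordered, so every pair of co-occurring items is a positive example. The resulting pairs would then feed an ordinary skip-gram negative-sampling trainer, which is omitted here.

```python
from itertools import permutations
from typing import List, Tuple


def windowed_pairs(session: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-Gram over an ordered session: items act as words and neighbors
    within `window` positions are the context."""
    pairs = []
    for i, item in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        pairs.extend((item, session[j]) for j in range(lo, hi) if j != i)
    return pairs


def set_pairs(session: List[str]) -> List[Tuple[str, str]]:
    """Order-free variant: every other item in the same session counts as context."""
    return list(permutations(session, 2))


# Hypothetical session logs: each inner list is one user's sequence of item IDs.
sessions = [
    ["listing_a", "listing_b", "listing_c"],
    ["track_x", "track_y"],
]

for s in sessions:
    print(windowed_pairs(s, window=1))
    print(set_pairs(s))
```

With a window of 1, `listing_a` and `listing_c` never form a pair, whereas the set-based variant links them; which behavior is preferable depends on whether order within a session carries signal.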

Powers semantic search and synonym expansion, so a query for "laptop" also surfaces "notebook" and "computer".
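Synonym-style expansion reduces to a nearest-neighbor lookup by cosine similarity. The sketch assumes hand-made 3-dimensional vectors (hypothetical, chosen so the result is easy to check) and illustrative cutoffs `k` and `min_sim`; with a large trained vocabulary you would typically precompute normalized vectors or use an approximate nearest-neighbor index instead of scanning every word.

```python
import numpy as np

# Hypothetical hand-made 3-d vectors; trained Word2Vec vectors would have 100-300 dims.
EMB = {
    "laptop":   np.array([0.9, 0.8, 0.1]),
    "notebook": np.array([0.85, 0.75, 0.2]),
    "computer": np.array([0.8, 0.9, 0.0]),
    "banana":   np.array([0.0, 0.1, 0.95]),
    "river":    np.array([0.1, 0.0, 0.9]),
}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def expand_query(term: str, emb: dict, k: int = 2, min_sim: float = 0.5) -> list:
    """Return the term plus up to k nearest neighbors with cosine similarity >= min_sim."""
    if term not in emb:
        return [term]  # out-of-vocabulary: nothing to expand with
    scored = sorted(
        ((cosine(emb[term], vec), word) for word, vec in emb.items() if word != term),
        reverse=True,
    )
    return [term] + [word for sim, word in scored[:k] if sim >= min_sim]


print(expand_query("laptop", EMB))  # ['laptop', 'notebook', 'computer']
```

The `min_sim` floor keeps weakly related words out of the expanded query even when fewer than `k` good neighbors exist.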

Detects analogies and relationships in text, such as capital-country pairs (Paris is to France as Tokyo is to Japan).
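Analogy detection is usually done with vector arithmetic followed by a cosine search, a method often called 3CosAdd. This sketch assumes small hand-built vectors (hypothetical axes for "capital", France, Japan, and Germany) so the arithmetic is easy to follow; it excludes the three query words, which is standard practice because the vector closest to b - a + c is frequently one of the inputs.

```python
import numpy as np

# Hypothetical toy vectors with axes [capital, France, Japan, Germany].
EMB = {
    "paris":   np.array([1.0, 1.0, 0.0, 0.0]),
    "france":  np.array([0.0, 1.0, 0.0, 0.0]),
    "tokyo":   np.array([1.0, 0.0, 1.0, 0.0]),
    "japan":   np.array([0.0, 0.0, 1.0, 0.0]),
    "berlin":  np.array([1.0, 0.0, 0.0, 1.0]),
    "germany": np.array([0.0, 0.0, 0.0, 1.0]),
}


def analogy(a: str, b: str, c: str, emb: dict) -> str:
    """Solve 'a is to b as c is to ?' with 3CosAdd: argmax_x cos(x, b - a + c),
    skipping the three query words themselves."""
    unit = {w: v / np.linalg.norm(v) for w, v in emb.items()}
    target = unit[b] - unit[a] + unit[c]
    target = target / np.linalg.norm(target)
    best_score, best_word = max(
        (float(unit[w] @ target), w) for w in unit if w not in (a, b, c)
    )
    return best_word


print(analogy("france", "paris", "japan", EMB))  # tokyo
print(analogy("paris", "france", "tokyo", EMB))  # japan
```

With a trained gensim model, `KeyedVectors.most_similar(positive=..., negative=...)` applies the same idea, and "king - man + woman ≈ queen" is the same query with different words.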

Provides the input embedding layer for larger NLP pipelines for sentiment analysis and document classification on limited data.
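One simple way to feed Word2Vec vectors into a downstream classifier is to average the vectors of a document's tokens. This is a sketch under stated assumptions: mean pooling is only one option (many pipelines instead load the vectors into a trainable embedding layer), and the 2-dimensional vectors are hypothetical stand-ins for trained ones.

```python
import numpy as np

# Hypothetical 2-d vectors; in practice these would come from a trained model.
EMB = {
    "great": np.array([0.9, 0.1]),
    "awful": np.array([-0.8, 0.2]),
    "movie": np.array([0.0, 1.0]),
}


def doc_vector(tokens, emb, dim):
    """Mean-pool the vectors of known tokens; return zeros if none are in the vocabulary."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


docs = [["great", "movie"], ["awful", "movie"], ["unseen", "tokens"]]
X = np.stack([doc_vector(d, EMB, dim=2) for d in docs])  # shape (n_docs, dim)
print(X)
```

The rows of `X` can then be passed to any standard classifier, such as logistic regression; skipping unknown tokens and falling back to a zero vector keeps the feature matrix rectangular.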

Risks & Safeguards

Hallucinated outputs can slip quietly into reports, support flows, or research outputs.

Prompt sensitivity can produce inconsistent results across similar requests.

Sensitive text data can be exposed if access controls are weak.

Roadmap

1. Define the output format, tone, and quality criteria before rollout.

2. Ground answers in trusted sources whenever accuracy matters.

3. Keep a human review checkpoint for high-impact outputs.

4. Track failure patterns and retune prompts or workflows regularly.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

Continue Exploring