AUDIO AI GUIDE

VITS End-to-End Speech Synthesis

VITS is a text-to-speech model that turns text directly into raw audio waveforms within a single trained system, skipping the usual two-stage pipeline.

Overview

By combining variational inference with adversarial training, VITS produces remarkably natural, expressive speech.

VITS End-to-End Speech Synthesis sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) was introduced by Kim, Kong, and Son in 2021. It unifies three ideas that older systems kept separate. A conditional variational autoencoder (VAE) learns a latent representation of speech. Normalizing flows make that latent distribution flexible enough to capture fine acoustic detail. A GAN-style discriminator pushes the generated waveform toward realism. Most importantly, VITS trains the acoustic model and the vocoder jointly rather than in two stages. This removes the mismatch that degrades quality when the modules are trained separately. It also introduces a stochastic duration predictor, so the same sentence can be spoken with a different, natural-sounding rhythm each time.
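For reference, the paper folds these components into one training objective for the generator side. The block below restates that objective as we understand it from the paper. It is a summary rather than a substitute for the original definitions, and implementations may weight the terms differently. The notation is as follows:
- $x_{mel}$ is the target mel-spectrogram, and $\hat{x}_{mel}$ is the mel-spectrogram of the generated waveform.
- $q_\phi(z \mid x_{lin})$ is the posterior computed from the linear spectrogram.
- $c = [c_{text}, A]$ combines the text condition with the text-to-frame alignment $A$.
- $f_\theta$ is the normalizing flow.
- $D^{l}$ is the $l$-th of $T$ discriminator feature maps, with $N_l$ features.
- $\mathcal{L}_{dur}$ is the negative variational lower bound of the stochastic duration predictor.

The adversarial terms use the least-squares GAN form.

```latex
\begin{aligned}
\mathcal{L}_{vae} &= \mathcal{L}_{recon} + \mathcal{L}_{kl} + \mathcal{L}_{dur} + \mathcal{L}_{adv}(G) + \mathcal{L}_{fm}(G) \\
\mathcal{L}_{recon} &= \lVert x_{mel} - \hat{x}_{mel} \rVert_1 \\
\mathcal{L}_{kl} &= \log q_\phi(z \mid x_{lin}) - \log p_\theta(z \mid c), \qquad z \sim q_\phi(z \mid x_{lin}) \\
p_\theta(z \mid c) &= \mathcal{N}\big(f_\theta(z);\, \mu_\theta(c),\, \sigma_\theta(c)\big)\, \left|\det \frac{\partial f_\theta(z)}{\partial z}\right| \\
\mathcal{L}_{adv}(G) &= \mathbb{E}_{z}\big[(D(G(z)) - 1)^2\big] \\
\mathcal{L}_{adv}(D) &= \mathbb{E}_{(y,z)}\big[(D(y) - 1)^2 + D(G(z))^2\big] \\
\mathcal{L}_{fm}(G) &= \mathbb{E}_{(y,z)}\Big[\sum_{l=1}^{T} \frac{1}{N_l}\, \lVert D^{l}(y) - D^{l}(G(z)) \rVert_1\Big]
\end{aligned}
```

Each idea maps to its own terms:
- The VAE contributes $\mathcal{L}_{recon}$ and $\mathcal{L}_{kl}$.
- The flow enters through the prior density $p_\theta(z \mid c)$.
- The discriminator contributes $\mathcal{L}_{adv}$ and $\mathcal{L}_{fm}$.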

Technical Insight

VITS solves the alignment problem with Monotonic Alignment Search (MAS). During training, MAS finds the most likely mapping between text tokens and audio frames without relying on an external aligner. The VAE posterior is computed from the real audio. The text-conditioned prior is further shaped by a normalizing flow so that it can match that posterior. At inference, the model samples from the text prior and decodes straight to a waveform, so neither a separate mel-spectrogram stage nor a separate vocoder is needed.
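The dynamic program behind MAS is short enough to sketch. The version below is a plain NumPy illustration, not the reference VITS implementation, and the function name is illustrative. It assumes you already have a matrix `log_p[i, j]` holding the log-likelihood of latent frame `j` under the prior Gaussian of text token `i`. It returns a hard 0/1 alignment with three properties:
- Every frame maps to exactly one token.
- Tokens are visited in order.
- No token is skipped.

```python
import numpy as np


def monotonic_alignment_search(log_p: np.ndarray) -> np.ndarray:
    """Return a 0/1 path of shape [T_text, T_frames] maximizing summed log-likelihood.

    log_p[i, j] is the log-likelihood of frame j under token i's prior.
    Each frame is assigned to exactly one token, tokens appear in order,
    and every token receives at least one frame.
    """
    t_text, t_frames = log_p.shape
    if t_frames < t_text:
        raise ValueError("need at least as many frames as tokens")

    # q[i, j]: best cumulative score of any valid path ending at token i, frame j.
    q = np.full((t_text, t_frames), -np.inf)
    q[0, 0] = log_p[0, 0]
    for j in range(1, t_frames):
        for i in range(min(j + 1, t_text)):
            stay = q[i, j - 1]                                # token i also covered frame j-1
            advance = q[i - 1, j - 1] if i > 0 else -np.inf  # frame j-1 belonged to token i-1
            q[i, j] = max(stay, advance) + log_p[i, j]

    # Backtrack from the last token at the last frame.
    path = np.zeros((t_text, t_frames), dtype=np.int64)
    i = t_text - 1
    for j in range(t_frames - 1, -1, -1):
        path[i, j] = 1
        # Move to the previous token when staying is impossible (i == j)
        # or when arriving from token i-1 scored higher.
        if j > 0 and i > 0 and (i == j or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path


# Toy example: 2 tokens, 4 frames; token 0 fits the first two frames best.
log_p = np.array([[0.0, 0.0, -5.0, -5.0],
                  [-5.0, -5.0, 0.0, 0.0]])
path = monotonic_alignment_search(log_p)
durations = path.sum(axis=1)  # frames per token: the targets for duration training
print(path.tolist(), durations.tolist())
# [[1, 1, 0, 0], [0, 0, 1, 1]] [2, 2]
```

The double loop runs in O(T_text × T_frames). Real training code typically runs the same recurrence in compiled code, because it is executed for every batch.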

Mastering VITS End-to-End Speech Synthesis

To build deep understanding, treat VITS End-to-End Speech Synthesis as an operating model rather than a single feature. Define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using VITS End-to-End Speech Synthesis treat quality, latency, and consent as equally important parts of their deployment strategy. They write down clear success criteria and run evaluations against real data and workflows. They iterate on observed failure patterns rather than chasing a one-off benchmark win. That is how theoretical understanding becomes a durable capability across product, policy, and operations.

VITS improves accessibility through narration, storytelling, and voice interfaces. At the same time, the risks of voice misuse and impersonation rise when consent is missing. The most robust approach pairs fast experimentation with governance discipline: run pilots, capture evidence, and publish decision logs. Keep reviewing safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

It improves accessibility through narration, storytelling, and voice interfaces.

Media teams can ship polished audio faster and on smaller budgets.

Customer-facing systems can handle spoken interactions at scale.

In high-quality deployments, each of these benefits translates into measurable operating rules, clear ownership boundaries, and recurring review practices. That way, teams can scale with confidence instead of ambiguity.

The Future of VITS End-to-End Speech Synthesis

VITS has spawned a family of successors that are widely used in open-source TTS. VITS2 streamlined the architecture and improved naturalness. YourTTS extended the approach to voice cloning and multilingual synthesis. Coqui's widely used XTTS models pursue the same goals, although XTTS is built on a different, autoregressive architecture. Expect continued work on lighter, real-time on-device models, better coverage of low-resource languages, and finer control over emotion and speaking style. The end-to-end design is an attractive, well-understood foundation to build on.

Real-World Implementations

Coqui TTS ships VITS-based models that developers fine-tune to synthesize a specific narrator's voice for audiobooks.

Open-source voice assistants on Raspberry Pi-class hardware run compact VITS models to produce speech fully offline.

Language-learning apps generate natural pronunciation examples with multilingual VITS variants such as YourTTS.

Indie game studios synthesize varied NPC dialogue lines. They rely on stochastic duration prediction to get a rhythm that does not sound robotic.

Usage Patterns

The same lesson applies across the implementations above:
- fine-tuned audiobook narrators
- offline voice assistants on Raspberry Pi-class hardware
- multilingual pronunciation examples
- varied NPC dialogue

In each case, teams usually get better results when they define quality thresholds up front and keep a human escalation path for edge cases. They should also track both productivity gains and error costs over time.
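Varied NPC lines come from sampling durations rather than predicting one fixed rhythm. VITS's actual stochastic duration predictor is a flow-based model. The sketch below swaps in a much simpler stand-in, Gaussian noise added to predicted log-durations, purely to show the mechanics:
1. Sample durations.
2. Round them up to whole frames.
3. Repeat each token's features to frame rate (a "length regulator").

The function names are hypothetical. `noise_scale` and `length_scale` loosely mirror the duration-noise and speaking-rate controls that VITS implementations typically expose, and the default values are illustrative.

```python
import numpy as np


def sample_durations(mean_log_dur, noise_scale=0.8, length_scale=1.0, rng=None):
    """Toy stochastic duration sampler (NOT the flow-based VITS predictor).

    Perturbs predicted log-durations with Gaussian noise, so the same text can
    get a different rhythm on every call, then converts them to frame counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_dur = mean_log_dur + noise_scale * rng.standard_normal(mean_log_dur.shape)
    # exp(...) > 0, so ceil yields at least one frame per token;
    # length_scale > 1 slows speech down, < 1 speeds it up.
    return np.ceil(np.exp(log_dur) * length_scale).astype(np.int64)


def expand_to_frames(token_features, durations):
    """Repeat each token's feature row durations[i] times along the time axis."""
    return np.repeat(token_features, durations, axis=0)


# Hypothetical line of dialogue: 3 tokens with 2-dim features.
tokens = np.arange(6, dtype=float).reshape(3, 2)
mean_log_dur = np.log(np.array([1.5, 3.5, 2.5]))  # predicted ~1.5, 3.5, 2.5 frames

for seed in (0, 1):
    d = sample_durations(mean_log_dur, rng=np.random.default_rng(seed))
    frames = expand_to_frames(tokens, d)
    print(seed, d.tolist(), frames.shape)
```

Different seeds give different frame counts for the same tokens, which is the source of the varied rhythm. Setting `noise_scale=0.0` removes the variation and returns the rounded-up predicted durations.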

Risks & Guardrails

Voice misuse and impersonation risks increase when consent is missing.

Accuracy can degrade across accents, dialects, or noisy environments.

Without clear labeling, synthetic audio can be mistaken for genuine speech.

Implementation Roadmap

1

Obtain explicit consent for voice capture, synthesis, and reuse.

2

Evaluate quality across speakers and background conditions.

3

Define when a human must review or approve outputs.

4

Label synthetic audio and keep auditable records for accountability.

Treat each step as an evidence gate. If its criteria are not met, pause the rollout and close the gap before you expand usage.
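The four gates can be wired into an explicit release check. Everything in this sketch is hypothetical and should be replaced with your own policy:
- the `ReleaseCandidate` record and its fields
- the consent scope names
- the per-slice quality scores (for example, listening-test ratings)
- the threshold

It only shows how "pause the rollout until the gap is closed" can become a concrete, logged decision rather than an informal judgment.

```python
from dataclasses import dataclass

# Hypothetical consent scopes for step 1; map these to your own consent forms.
REQUIRED_CONSENT = {"capture", "synthesis", "reuse"}


@dataclass
class ReleaseCandidate:
    """Hypothetical record of the evidence gathered for one synthetic voice."""
    voice_id: str
    consent_scopes: set[str]                    # step 1: scopes the speaker agreed to
    slice_scores: dict[tuple[str, str], float]  # step 2: (speaker, condition) -> score
    human_review_rule: str                      # step 3: when a human must approve output
    labels_synthetic_audio: bool                # step 4: outputs carry a synthetic label
    audit_log_enabled: bool                     # step 4: generation records are retained


def release_blockers(c: ReleaseCandidate, min_score: float = 3.5) -> list[str]:
    """List every failed gate; an empty list means the rollout may expand."""
    problems = []
    missing = REQUIRED_CONSENT - c.consent_scopes
    if missing:
        problems.append(f"consent missing for: {sorted(missing)}")
    if not c.slice_scores:
        problems.append("no quality evaluation recorded")
    # Check every speaker/background slice; a good average can hide a bad slice.
    for (speaker, condition), score in sorted(c.slice_scores.items()):
        if score < min_score:
            problems.append(f"quality {score} < {min_score} for {speaker}/{condition}")
    if not c.human_review_rule.strip():
        problems.append("no rule for when a human must review outputs")
    if not (c.labels_synthetic_audio and c.audit_log_enabled):
        problems.append("synthetic audio is not labeled or not logged")
    return problems


candidate = ReleaseCandidate(
    voice_id="narrator-01",
    consent_scopes={"capture", "synthesis"},
    slice_scores={("spk_a", "quiet"): 4.1, ("spk_a", "street_noise"): 3.2},
    human_review_rule="a human approves every public release",
    labels_synthetic_audio=True,
    audit_log_enabled=True,
)
for problem in release_blockers(candidate):
    print(problem)
# consent missing for: ['reuse']
# quality 3.2 < 3.5 for spk_a/street_noise
```

Returning a list of reasons, rather than a single boolean, gives the decision log something concrete to record each time a release is paused.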
