AUDIO AI GUIDE

UnivNet Multi-Resolution Vocoder

UnivNet is a GAN vocoder whose discriminators judge generated audio using several spectrograms computed at different STFT resolutions, which sharpens high-frequency detail.

Overview

UnivNet aims to be a universal vocoder: one model that generalizes well to unseen speakers and recording conditions.

It sits in audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

UnivNet, proposed by Jang et al. in 2021, addresses a common weakness of GAN vocoders: high frequencies that come out over-smoothed or full of artifacts. Its generator is conditioned on full-band mel-spectrograms and uses location-variable convolutions (LVC), in which the convolution kernels are predicted on the fly from the input features so that the filter adapts to local content. The headline idea is the multi-resolution spectrogram discriminator (MRSD): rather than judging only the raw waveform, UnivNet computes several STFTs with different window and hop sizes and applies discriminators to the resulting spectrogram magnitudes. This pushes the generator to get both fine spectral detail and broad temporal structure right. Trained on many speakers, UnivNet produces natural speech for voices it never saw during training, which is how it earns its "universal" label.
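To make the multi-resolution idea concrete, the following is a minimal NumPy sketch of the spectrogram side only: it computes STFT magnitudes of one signal at several resolutions, which is the kind of input a spectrogram discriminator would judge. The function names and the three `(n_fft, hop, win_length)` settings are illustrative assumptions, not values confirmed by this guide, and the discriminator networks themselves are omitted.

```python
import numpy as np

def stft_magnitude(signal, n_fft, hop, win_length):
    """Return an STFT magnitude array of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(win_length)
    # Zero-pad the window symmetrically so it matches the FFT size.
    pad = n_fft - win_length
    window = np.pad(window, (pad // 2, pad - pad // 2))
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames * window, axis=-1))

# Assumed, illustrative resolutions: (n_fft, hop, win_length).
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]

def multi_resolution_magnitudes(signal):
    """One magnitude spectrogram per resolution, each with a different
    time/frequency trade-off."""
    return [stft_magnitude(signal, *res) for res in RESOLUTIONS]

# Demo: one second of a 3 kHz tone at an assumed 24 kHz sample rate.
sample_rate = 24000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 3000 * t)
specs = multi_resolution_magnitudes(tone)
for (n_fft, hop, win), spec in zip(RESOLUTIONS, specs):
    print(f"n_fft={n_fft} hop={hop} win={win} -> shape {spec.shape}")
```

The larger FFT gives finer frequency bins but fewer frames, while the smaller one gives coarser bins and denser frames, so a generator that has to satisfy all of them cannot trade one kind of detail for the other.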

Technical Insight

UnivNet's location-variable convolution generates its kernel weights dynamically from the conditioning mel features through a small kernel-predictor network, so each local segment of the signal is effectively filtered by a content-adaptive kernel rather than by one fixed, shared kernel. Combined with the multi-resolution spectrogram discriminator, which covers several time-frequency trade-offs at once, this directly targets the high-frequency band where simpler GAN vocoders tend to blur or buzz.
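The mechanism can be sketched in a few lines of NumPy. This is a deliberately simplified, single-channel illustration under stated assumptions: the "kernel predictor" here is a single linear map (a hypothetical stand-in for the small network), each conditioning frame owns `hop` output samples, and there are no gated activations or channel mixing. The published generator is more elaborate, so treat this as a picture of the idea rather than the model's implementation.

```python
import numpy as np

def predict_kernels(mel, weight, bias):
    """Hypothetical linear kernel predictor.
    mel: (n_frames, n_mels) -> kernels: (n_frames, kernel_size)."""
    return mel @ weight + bias

def location_variable_conv(x, kernels, hop):
    """Filter each hop-length segment of x with its own predicted kernel."""
    n_frames, k = kernels.shape
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    if len(x) != n_frames * hop:
        raise ValueError("x must contain n_frames * hop samples")
    pad = k // 2
    xp = np.pad(x, (pad, pad))  # 'same' padding
    y = np.empty(len(x), dtype=float)
    for i in range(n_frames):
        # Segment i plus the context needed on both sides.
        seg = xp[i * hop : (i + 1) * hop + k - 1]
        windows = np.lib.stride_tricks.sliding_window_view(seg, k)  # (hop, k)
        y[i * hop : (i + 1) * hop] = windows @ kernels[i]
    return y

# Demo with random conditioning features (all sizes are arbitrary).
rng = np.random.default_rng(0)
n_frames, n_mels, hop, k = 4, 8, 16, 3
mel = rng.normal(size=(n_frames, n_mels))
weight = 0.1 * rng.normal(size=(n_mels, k))
bias = np.array([0.0, 1.0, 0.0])  # start near an identity filter
x = rng.normal(size=n_frames * hop)
y = location_variable_conv(x, predict_kernels(mel, weight, bias), hop)
```

The contrast with an ordinary convolution is the loop: the kernel changes from segment to segment because it is computed from that segment's mel frame, whereas a standard layer would reuse one learned kernel everywhere.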

Mastering UnivNet Multi-Resolution Vocoder

To build a deep understanding, treat UnivNet as an operating model rather than a single feature: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using UnivNet treat quality, latency, and consent as equally important parts of the deployment strategy. They write clear success criteria, benchmark against real data and real workflows, and iterate on observed failure patterns rather than on a one-time benchmark win. That is where theoretical understanding turns into durable capability across product, policy, and operations.

Audio AI improves accessibility through transcription, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation rise when consent is absent. The strongest approach combines speed of experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster and on smaller budgets.

Customer-facing systems can process spoken interactions at scale.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

The Future of UnivNet Multi-Resolution Vocoder

UnivNet's multi-resolution spectrogram discrimination has become a standard ingredient in modern TTS stacks and has influenced systems such as BigVGAN and neural audio codecs. Expect the universal, speaker-agnostic framing to keep extending to singing voice, multilingual synthesis, and full-bandwidth 48 kHz audio, while the adaptive-kernel idea informs efficient on-device models that must handle diverse voices without per-speaker fine-tuning.

Real-World Applications

Multi-speaker TTS services that must sound natural for voices absent from the training data

Voice cloning pipelines where a single universal vocoder serves many target speakers

High-fidelity audiobook and podcast narration that needs crisp detail and clean high frequencies

Backend vocoder for end-to-end TTS systems that pair a spectrogram predictor with a robust waveform generator

Usage Patterns

Across all of the applications listed above, teams generally get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and the cost of errors over time.

Risks & Guardrails

Voice misuse and impersonation risks rise when consent is absent.

Accuracy can drop across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for genuine speech without clear labeling.
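One lightweight way to act on the labeling and consent risks is to write a provenance record next to every generated file. The sketch below uses only the Python standard library; the field names, the `consent_id` scheme, and the model name are hypothetical choices for illustration, not a standard format.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(audio_bytes, consent_id, model_name):
    """Build a sidecar record that marks a clip as synthetic and ties it
    to a consent reference. Refuses to run without one."""
    if not consent_id:
        raise ValueError("a consent reference is required before labeling output")
    return {
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),  # binds record to the file
        "synthetic": True,
        "model": model_name,
        "consent_id": consent_id,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }

# Demo with placeholder bytes standing in for real audio data.
record = provenance_record(b"\x00\x01fake-pcm", "consent-0001", "univnet-demo")
sidecar_json = json.dumps(record, indent=2, sort_keys=True)
print(sidecar_json)
```

Hashing the audio bytes means the record can later be matched to the exact file it describes, which is what makes the log useful for accountability rather than just decoration.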

Implementation Guide

1. Get explicit consent for voice capture, synthesis, and reuse.

2. Test quality across speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep traceable records for accountability.

Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
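The evidence-gate idea can be sketched for the quality step as a small check over listening-test scores grouped by speaker and recording condition. Everything here is an assumption made for illustration: the `gate_report` helper, the 1 to 5 rating scale, the threshold of 3.5, and the sample ratings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def gate_report(ratings, threshold):
    """ratings: iterable of (speaker, condition, score) tuples.
    Returns per-group mean scores and the groups that fall below threshold."""
    groups = defaultdict(list)
    for speaker, condition, score in ratings:
        groups[(speaker, condition)].append(score)
    means = {group: mean(scores) for group, scores in groups.items()}
    failing = sorted(group for group, m in means.items() if m < threshold)
    return {"means": means, "failing": failing, "passed": not failing}

# Hypothetical listening-test scores on a 1-5 scale.
ratings = [
    ("spk_a", "clean", 4.4), ("spk_a", "clean", 4.2),
    ("spk_a", "noisy", 3.9),
    ("spk_b", "clean", 4.1),
    ("spk_b", "noisy", 3.2), ("spk_b", "noisy", 3.4),
]
report = gate_report(ratings, threshold=3.5)
if not report["passed"]:
    print("Pause rollout; groups below threshold:", report["failing"])
```

Gating on the worst group rather than the overall average is the design choice that matters: an average can look healthy while one speaker or one noisy condition is clearly failing.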
