AUDIO AI GUIDE

Parallel WaveGAN Vocoder

Parallel WaveGAN is a fast neural vocoder that converts a mel-spectrogram into a raw audio waveform using a compact GAN, generating all samples at once.

Overview

Parallel WaveGAN matters because it delivers real-time, high-quality speech synthesis from a compact model.

Parallel WaveGAN Vocoder sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

A vocoder is the final stage of a TTS pipeline: it converts an acoustic feature map (usually a mel-spectrogram) into the actual audio waveform you hear. Parallel WaveGAN, proposed by Yamamoto, Song, and Kim in 2019, does this with a non-autoregressive, WaveNet-style generator trained as a generative adversarial network. Instead of predicting one audio sample at a time like the original WaveNet, it generates the entire waveform in parallel, which makes it much faster. Its core recipe combines an adversarial loss with a multi-resolution short-time Fourier transform (STFT) loss, so the model matches the real signal across several time and frequency scales. The result is a compact generator (about 1.4 million parameters) that runs many times faster than real time on a GPU.
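To make the frame-to-sample relationship and the "faster than real time" claim concrete, here is a minimal sketch. The sample rate, hop length, frame count, and synthesis time are illustrative assumptions, not figures from the text above; the function names are hypothetical.

```python
def output_samples(num_frames: int, hop_length: int) -> int:
    """Number of waveform samples a vocoder emits for num_frames mel frames."""
    return num_frames * hop_length


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Synthesis time divided by audio duration; below 1.0 means faster than real time."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Assumed, illustrative settings: 24 kHz audio, one mel frame every 300 samples.
SAMPLE_RATE = 24_000
HOP_LENGTH = 300

# 400 mel frames expand to 400 * 300 = 120000 samples, i.e. 5 seconds of audio.
n = output_samples(num_frames=400, hop_length=HOP_LENGTH)

# If synthesis took 0.25 s (an assumed timing), the vocoder ran 20x faster than real time.
rtf = real_time_factor(synthesis_seconds=0.25, num_samples=n, sample_rate=SAMPLE_RATE)
print(n, rtf)
```

Because a parallel vocoder produces all `n` samples in one forward pass, the synthesis time grows far more slowly with utterance length than it does for a sample-by-sample model.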

Technical Insight

The generator is a dilated-convolution network that takes random noise as input and is conditioned on the mel-spectrogram, mapping the noise and time-aligned features to waveform samples. Training jointly minimizes two terms: a multi-resolution STFT loss, computed by comparing magnitude spectrograms at several FFT sizes and hop lengths, and an adversarial loss from a discriminator that judges whether a waveform is real. The STFT term stabilizes and speeds up adversarial training, capturing both fine detail and broad spectral shape without the teacher-student distillation that earlier parallel WaveNet-style models relied on.
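The multi-resolution STFT term can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' training code: it combines a spectral-convergence term with a log-magnitude L1 term at each resolution and averages over resolutions. The FFT sizes, hop lengths, window lengths, and the `eps` floor are assumed values chosen for the example, and a real training setup would use a differentiable framework rather than NumPy.

```python
import numpy as np


def stft_magnitude(x, fft_size, hop, win_length):
    """Magnitude spectrogram with a Hann window; shape (frames, fft_size // 2 + 1)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop
    frames = np.stack([x[i * hop : i * hop + win_length] * window for i in range(n_frames)])
    # rfft zero-pads each windowed frame from win_length up to fft_size.
    return np.abs(np.fft.rfft(frames, n=fft_size, axis=1))


def stft_loss(real, fake, fft_size, hop, win_length, eps=1e-7):
    """Spectral convergence plus log-magnitude L1 at a single resolution."""
    mag_real = np.maximum(stft_magnitude(real, fft_size, hop, win_length), eps)
    mag_fake = np.maximum(stft_magnitude(fake, fft_size, hop, win_length), eps)
    spectral_convergence = np.linalg.norm(mag_real - mag_fake) / np.linalg.norm(mag_real)
    log_magnitude = np.mean(np.abs(np.log(mag_real) - np.log(mag_fake)))
    return spectral_convergence + log_magnitude


# (fft_size, hop, win_length) triples; illustrative values, not taken from the text.
RESOLUTIONS = [(512, 50, 240), (1024, 120, 600), (2048, 240, 1200)]


def multi_resolution_stft_loss(real, fake, resolutions=RESOLUTIONS):
    """Average the single-resolution loss over several STFT settings."""
    return float(np.mean([stft_loss(real, fake, *r) for r in resolutions]))


# Toy signals: a 220 Hz sine and a noisy copy, half a second at an assumed 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(8000) / 16000.0
target = np.sin(2 * np.pi * 220.0 * t)
noisy = target + 0.1 * rng.standard_normal(t.shape)

print(multi_resolution_stft_loss(target, target))  # identical signals give 0.0
print(multi_resolution_stft_loss(target, noisy))   # a positive value
```

Short windows resolve timing well and long windows resolve frequency well, so averaging over several settings penalizes errors that any single spectrogram would blur. In training, this term would be added to a weighted adversarial loss from the discriminator.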

Mastering Parallel WaveGAN Vocoder

To build a deep understanding, treat Parallel WaveGAN Vocoder as an operating model rather than a single feature: define the outcomes you want, make your reasoning explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using Parallel WaveGAN Vocoder treat quality, latency, and consent as equally important parts of the deployment strategy. They write clear success criteria, evaluate against real data and real workflows, and iterate based on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Audio AI improves accessibility through transcription, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation rise when consent is missing. The strongest approach combines speed of experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and update safeguards continuously as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster on smaller budgets.

Customer-facing systems can process spoken interactions at scale.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review routines, so teams can scale confidence rather than ambiguity.

The Future of Parallel WaveGAN Vocoder

Parallel WaveGAN helped establish GAN vocoders as the practical default, and its multi-resolution STFT loss, along with closely related spectral losses, now appears in many later vocoders and streaming systems (HiFi-GAN, for instance, uses a mel-spectrogram loss in the same spirit). The trajectory points toward much smaller, low-latency vocoders for on-device assistants, hearing aids, and live voice conversion, as well as universal vocoders that generalize to unseen speakers. Expect tighter integration with end-to-end TTS and efficient deployment on mobile and embedded chips.

Real-World Applications

Real-time speech output in mobile voice assistants, where latency and model size matter

Serving as the waveform generator paired with acoustic models such as Tacotron 2 or FastSpeech

On-device text-to-speech for accessibility tools that cannot rely on the cloud

Voice conversion systems that resynthesize converted spectrograms into natural-sounding audio

Usage Patterns

Parallel WaveGAN Vocoder in practice

Across these applications, teams typically get better results when they define quality thresholds upfront, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

The risks of voice misuse and impersonation increase when consent is missing.

Accuracy may drop across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for genuine speech without clear labeling.

Implementation Guide

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep traceable records for accountability.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
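One way to make step 4 concrete is a small provenance record stored alongside each generated file. The sketch below uses only the Python standard library; the function name, field names, model name, and consent identifier are all hypothetical and would need to match your own schema and policies.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(audio_bytes: bytes, model_name: str, consent_id: str) -> dict:
    """Build a traceable record for one piece of synthetic audio.

    All field names here are hypothetical; adapt them to your own schema.
    """
    return {
        "synthetic": True,  # explicit label so the audio is not mistaken for a recording
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),  # ties the record to exact bytes
        "model": model_name,
        "consent_id": consent_id,  # pointer to the speaker's consent on file
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder bytes stand in for a real generated audio file.
record = provenance_record(b"\x00\x01" * 1000, "parallel-wavegan-demo", "consent-0001")
print(json.dumps(record, indent=2))
```

Hashing the exact output bytes means any later edit to the audio no longer matches the record, which is useful when someone needs to verify where a clip came from.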
