AUDIO AI GUIDE

NaturalSpeech and Latent Diffusion TTS

NaturalSpeech is Microsoft's line of text-to-speech (TTS) research aimed at human-level speech quality. Its later versions use latent diffusion to produce rich, natural-sounding voices.

Overview

NaturalSpeech shows how diffusion models, best known for image generation, can also produce expressive, controllable audio.

NaturalSpeech and latent diffusion TTS sit within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

The original NaturalSpeech (2022) was the first system reported to reach human-level quality on the LJSpeech benchmark, as judged by listeners who could not reliably tell its output from real recordings. It used a variational autoencoder with carefully matched priors to close the gap between training and inference. NaturalSpeech 2 then took a latent diffusion approach: a neural audio codec encodes speech into continuous latent vectors, and a diffusion model learns to generate those latents conditioned on text, which enables robust zero-shot voice synthesis from a short prompt. NaturalSpeech 3 introduced factorized diffusion, which decomposes speech into separate attributes such as content, prosody, timbre, and acoustic details, so that each can be modeled and controlled independently with higher fidelity and flexibility.
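The factorization idea can be pictured as a record whose fields are edited independently. The sketch below is purely illustrative: `FactorizedSpeech`, its string field values, and `swap_timbre` are hypothetical names invented for this example, not part of any NaturalSpeech release, and a real system would hold learned vectors rather than strings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FactorizedSpeech:
    """Hypothetical container with one field per attribute named in the text."""
    content: str           # what is said
    prosody: str           # rhythm and intonation
    timbre: str            # speaker identity
    acoustic_details: str  # remaining fine-grained detail


def swap_timbre(utterance: FactorizedSpeech, new_timbre: str) -> FactorizedSpeech:
    """Return a copy with only the timbre changed; other attributes are untouched."""
    return replace(utterance, timbre=new_timbre)


original = FactorizedSpeech("hello world", "calm", "speaker_a", "studio")
converted = swap_timbre(original, "speaker_b")
print(converted)
```

Keeping the record immutable makes the point of factorization visible: changing one attribute produces a new utterance while the content, prosody, and acoustic details carry over unchanged.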

Technical Insight

Latent diffusion works by adding noise to a compact latent representation of speech and training a network to reverse that noise step by step. Rather than generating raw waveforms or full spectrograms, NaturalSpeech 2 denoises codec latents, which are lower-dimensional and easier to model. Text conditioning and a reference voice guide the reverse diffusion, so the final sample resolves into speech that matches the requested content and speaker identity.
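The noise-then-denoise loop can be sketched on a toy latent vector. This is a minimal illustration under stated assumptions, not the NaturalSpeech 2 implementation: the step count `T`, the linear noise schedule, and the deterministic DDIM-style update are choices made for this example, and `oracle_noise_predictor` is a hypothetical stand-in that cheats by using the known clean latent where a real system would use a trained network conditioned on text and a reference voice.

```python
import math
import random

T = 50  # number of diffusion steps (assumed for this sketch)
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] is the share of signal variance left after t noising steps;
# alpha_bar[0] = 1.0 means no noise has been added yet.
alpha_bar = [1.0]
for beta in betas:
    alpha_bar.append(alpha_bar[-1] * (1.0 - beta))


def add_noise(z0, t, rng):
    """Forward process: jump from the clean latent straight to noise level t."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [a * z + s * e for z, e in zip(z0, eps)]


def oracle_noise_predictor(z_t, t, z0):
    """Stand-in for the trained network: recovers the noise from the known z0."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [(zt - a * z) / s for zt, z in zip(z_t, z0)]


def denoise_step(z_t, t, eps_hat):
    """One deterministic (DDIM-style) reverse step from level t to level t - 1."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    z0_hat = [(zt - s * e) / a for zt, e in zip(z_t, eps_hat)]
    a_prev = math.sqrt(alpha_bar[t - 1])
    s_prev = math.sqrt(1.0 - alpha_bar[t - 1])
    return [a_prev * z + s_prev * e for z, e in zip(z0_hat, eps_hat)]


rng = random.Random(0)
z0 = [0.5, -1.0, 0.25, 2.0]  # toy "codec latent"
z = add_noise(z0, T, rng)    # noised latent at the final noise level
for t in range(T, 0, -1):
    z = denoise_step(z, t, oracle_noise_predictor(z, t, z0))

max_error = max(abs(a - b) for a, b in zip(z, z0))
print(max_error)
```

Because the predictor here is an oracle, the loop recovers the clean latent almost exactly. With a learned predictor the result is instead a new sample shaped by the conditioning signals.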

Mastering NaturalSpeech and Latent Diffusion TTS

To build a deeper understanding, treat NaturalSpeech and latent diffusion TTS as an operating model rather than a single feature: define the desired outcomes, make assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using NaturalSpeech and latent diffusion TTS treat quality, latency, and consent as equally important parts of the deployment strategy. They write explicit success criteria, evaluate against real data and workflows, and iterate on observed failure patterns rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Audio AI improves accessibility through transcription, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation rise when consent is absent. The strongest approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and update safeguards continually as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster on smaller budgets.

Customer-facing systems can process spoken interactions at scale.

In high-quality deployments, these benefits translate into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale confidence rather than ambiguity.

The Future of NaturalSpeech and Latent Diffusion TTS

Diffusion-based and factorized TTS point toward voices that are not only natural but finely controllable, letting users adjust timbre, emotion, and prosody like independent dials. Expect faster samplers through distillation and few-step diffusion, more robust zero-shot cloning from a few seconds of audio, and tighter integration with large language models for context-aware delivery. These advances also reinforce the need for watermarking and consent safeguards, because high-fidelity cloning raises clear misuse risks.

Real-World Applications

Dubbing studios clone an actor's voice from a short sample to localize films, using NaturalSpeech 2-style zero-shot cloning.

Audiobook platforms produce human-quality narration that listeners struggle to distinguish from real voice talent.

Accessibility tools rebuild a person's own voice from old recordings for people who have lost the ability to speak.

Content-creation suites let editors adjust timbre and prosody independently, using NaturalSpeech 3's factorized attributes.

Implementation Patterns

NaturalSpeech and Latent Diffusion TTS in practice

Whether the use case is dubbing, audiobook narration, voice restoration, or content editing, teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Voice misuse and impersonation risks rise when consent is absent.

Accuracy may drop across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for authentic speech without clear labeling.
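One lightweight way to address the labeling risk is a sidecar record that ties a disclosure label to the exact audio bytes. The sketch below is an assumption-laden illustration: `build_label`, `matches`, and the field names are hypothetical, and a hash-based record is not a watermark, since it stops matching as soon as the audio is re-encoded or trimmed.

```python
import hashlib
import json


def build_label(audio_bytes: bytes, consent_id: str, model_name: str) -> dict:
    """Build a disclosure record bound to the exact audio bytes via SHA-256."""
    return {
        "synthetic": True,
        "model": model_name,
        "consent_id": consent_id,
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
    }


def matches(label: dict, audio_bytes: bytes) -> bool:
    """Check whether a label was issued for exactly these audio bytes."""
    return label["sha256"] == hashlib.sha256(audio_bytes).hexdigest()


audio = b"toy audio payload"  # placeholder bytes, not a real recording
label = build_label(audio, consent_id="consent-001", model_name="example-tts")
print(json.dumps(label, indent=2))
```

Storing the consent reference next to the hash also gives reviewers a traceable link between an output file and the permission it was generated under.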

Getting Started Guide

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across speakers and background conditions.

3. Define when a human must review or approve output.

4. Label synthetic audio and keep traceable records for accountability.

Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
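The evidence-gate idea can be expressed as a small check that runs before each expansion of usage. This is a hedged sketch: `GateEvidence`, its fields, and the `min_quality` threshold of 4.0 (on an assumed 1 to 5 listening-score scale) are hypothetical and would need to be replaced with a team's own criteria.

```python
from dataclasses import dataclass


@dataclass
class GateEvidence:
    """Hypothetical evidence collected before widening a rollout."""
    consent_on_file: bool
    worst_group_quality: float  # lowest score across tested speaker groups
    review_policy_defined: bool
    outputs_labeled: bool


def failed_gates(evidence: GateEvidence, min_quality: float = 4.0) -> list:
    """Return the names of the gates whose criteria are not met."""
    failed = []
    if not evidence.consent_on_file:
        failed.append("consent")
    if evidence.worst_group_quality < min_quality:
        failed.append("quality")
    if not evidence.review_policy_defined:
        failed.append("human_review")
    if not evidence.outputs_labeled:
        failed.append("labeling")
    return failed


def rollout_decision(evidence: GateEvidence) -> str:
    """Expand only when every gate passes; otherwise pause and name the gaps."""
    failed = failed_gates(evidence)
    return "expand" if not failed else "pause: " + ", ".join(failed)


print(rollout_decision(GateEvidence(True, 4.3, True, True)))
print(rollout_decision(GateEvidence(True, 3.1, True, False)))
```

Scoring quality by the worst-performing speaker group, rather than the average, keeps a strong overall number from hiding a failure on a particular accent or recording condition.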
