Overview
DiffWave is a diffusion-based vocoder that generates audio by iteratively denoising random noise into a waveform, conditioned on a mel-spectrogram. It brought diffusion models to high-fidelity speech synthesis, rivaling GANs and WaveNet without any adversarial training.
The DiffWave diffusion vocoder sits inside audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.
Deep Dive
DiffWave, introduced by Kong et al. in 2020, applies the denoising diffusion probabilistic model framework to raw audio. During training it gradually adds Gaussian noise to a clean waveform over many steps, and a network learns to predict that noise at each step so it can be removed. At generation time it starts from pure noise and runs the reverse process, conditioned on a mel-spectrogram, to recover clean speech. The backbone is a non-autoregressive, dilated-convolution network similar to WaveNet, but it predicts noise rather than audio samples. DiffWave matches strong vocoders in quality and is notably robust: it can even produce plausible unconditional speech, and its results are consistent across speakers. The main trade-off is speed: naive sampling needs dozens to hundreds of sequential denoising steps, although fast schedules cut this to as few as six.
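The forward and reverse processes described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear schedule and its endpoints, the function names, and the `predict_noise(x, t, mel)` callable that stands in for the trained dilated-convolution network are all hypothetical choices made for the example.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule; the step count and endpoints are illustrative assumptions."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention up to each step
    return betas, alphas, alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: jump directly from clean x0 to noisy x_t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def sample(predict_noise, mel, length, schedule, rng):
    """Reverse process: start from pure noise and denoise one step at a time."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(length)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = predict_noise(x, t, mel)  # hypothetical stand-in for the network
        # Posterior mean: subtract the predicted noise contribution for this step.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of fresh noise on every step but the last.
            sigma = np.sqrt((1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t])
            x = x + sigma * rng.standard_normal(length)
    return x

rng = np.random.default_rng(0)
schedule = make_schedule()
# Placeholder predictor that returns zeros; a real vocoder would call the trained model.
waveform = sample(lambda x, t, mel: np.zeros_like(x), mel=None,
                  length=16000, schedule=schedule, rng=rng)
```

The loop makes the speed trade-off visible: every step costs one full network evaluation, so the number of entries in the schedule sets the synthesis time directly.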
Technical Insight
DiffWave implicitly learns the gradient of the log-density of the data by training a network to predict the noise that was added at a randomly chosen diffusion step, using a simplified L2 objective. Sampling reverses a fixed noise schedule, and the number of steps trades quality against speed. Researchers found that carefully chosen short schedules of about six steps retain most of the fidelity, turning a slow, many-step process into something much closer to practical.
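The noise-prediction objective can be illustrated with a single Monte Carlo estimate of the loss. The sketch below is an assumption-laden toy, not the paper's training code: `diffusion_loss`, the schedule values, and both predictor functions are hypothetical names introduced for the example, and no gradient step is shown.

```python
import numpy as np

def diffusion_loss(predict_noise, x0, mel, alpha_bars, rng):
    """One sample of the L2 noise-prediction loss at a random diffusion step."""
    t = int(rng.integers(len(alpha_bars)))   # randomly chosen diffusion step
    eps = rng.standard_normal(x0.shape)      # the noise the network must recover
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    eps_hat = predict_noise(x_t, t, mel)
    return float(np.mean((eps - eps_hat) ** 2))

# Illustrative 50-step linear schedule (an assumption for this sketch).
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.05, 50))
rng = np.random.default_rng(0)
x0 = np.zeros(1024)  # a silent clip, so the true noise can be recovered exactly

def oracle(x_t, t, mel):
    # For x0 = 0 the noisy signal equals sqrt(1 - alpha_bar_t) * eps, so eps is known.
    return x_t / np.sqrt(1.0 - alpha_bars[t])

def untrained(x_t, t, mel):
    # A predictor that has learned nothing and always guesses zero noise.
    return np.zeros_like(x_t)

loss_oracle = diffusion_loss(oracle, x0, None, alpha_bars, rng)
loss_untrained = diffusion_loss(untrained, x0, None, alpha_bars, rng)
```

A perfect predictor drives the loss toward zero, while the zero-guessing predictor is left with roughly the variance of the injected noise. Training moves the network from the second case toward the first.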
Mastering DiffWave Diffusion Vocoder
To build a deeper understanding, treat the DiffWave diffusion vocoder as a pattern of use rather than a single artifact: define the outcomes you need, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams deploying DiffWave treat quality, latency, and consent as equally important parts of the delivery plan. They write down explicit success criteria, test against real data and real workflows, and iterate on observed failure modes rather than on a one-time benchmark win. This is where theoretical understanding turns into durable capability across product, policy, and operations.
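One way to make the latency criterion explicit is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The helper below is a hypothetical sketch; `synthesize` stands in for whatever vocoder call a team uses, and the 1.0 threshold is an assumed target, not a value from the source.

```python
import time

def real_time_factor(synthesize, num_samples, sample_rate):
    """Wall-clock synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    synthesize(num_samples)  # hypothetical vocoder call producing num_samples samples
    elapsed = time.perf_counter() - start
    return elapsed / (num_samples / sample_rate)

def meets_latency_target(rtf, max_rtf=1.0):
    """An RTF below 1.0 means audio is generated faster than it plays back."""
    return rtf < max_rtf

# Placeholder synthesizer for demonstration only; it just allocates silence.
rtf = real_time_factor(lambda n: [0.0] * n, num_samples=22050, sample_rate=22050)
ok = meets_latency_target(rtf)
```

Because each diffusion step is one network evaluation, measuring RTF at several schedule lengths shows directly how much quality a team is trading for speed.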
The technology improves accessibility through transcription, narration, and assistive voices. At the same time, voice misuse and impersonation risks grow when consent is absent. A steady approach pairs rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
Improves accessibility through transcription, narration, and assistive voices.
Media teams can ship polished audio faster and on smaller budgets.
Customer-facing systems can handle spoken interactions at scale.
In high-quality deployments, each of these translates into measurable operating policies, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.
Real-World Implementation
High-fidelity neural text-to-speech back ends that avoid the instability of GAN training
Unconditional speech generation for data augmentation and audio research
Speaker-robust voice synthesis, where a single model handles many voices consistently
Experimental fast-sampling diffusion research, using short noise schedules for real-time audio
Implementation Practices
Across all four use cases above, teams tend to get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product outcomes and the cost of errors over time.
Risks & Guardrails
Voice misuse and impersonation risks grow when consent is absent.
Accuracy can drop across languages, accents, or noisy environments.
Synthetic audio can be mistaken for genuine speech if it is not clearly labeled.
Implementation Roadmap
Obtain explicit consent for voice capture, synthesis, and reuse.
Test quality across diverse speakers and background conditions.
Define where a human must review or approve outputs.
Label synthetic audio and keep records for accountability.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
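The labeling and record-keeping step can be sketched as a small provenance record written alongside each generated clip. Everything here is a hypothetical illustration: the field names, the `consent_id` reference, and the default model label are assumptions, and a production system would follow whatever schema its governance process requires.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(audio_bytes, consent_id, model_name="diffwave"):
    """Build an accountability record for one synthetic audio clip."""
    return {
        "synthetic": True,                                   # explicit label
        "model": model_name,                                 # assumed identifier
        "consent_id": consent_id,                            # hypothetical consent reference
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),   # ties the record to the exact audio
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record(b"\x00\x01\x02\x03", consent_id="consent-0001")
log_line = json.dumps(record, sort_keys=True)  # one line per clip in an append-only log
```

Hashing the audio bytes lets a later reviewer confirm that a given file is the one the record describes, without storing the audio in the log itself.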