Audio AI GUIDE

UnivNet Multi-Resolution Vocoder

UnivNet is a GAN vocoder whose discriminators judge generated audio using multiple spectrograms computed at different STFT resolutions, which sharpens high-frequency detail.

Overview

UnivNet aims to be a universal vocoder that generalizes well to unseen speakers and recording conditions.

UnivNet Multi-Resolution Vocoder sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

UnivNet, proposed by Jang et al. in 2021, addresses a weakness common to GAN vocoders: muffled or artifact-laden high frequencies. The generator is conditioned on full-band mel-spectrograms and uses location-variable convolutions (LVC), in which the convolution kernels are predicted on the fly from the input features so that the filtering adapts to local content. The headline idea is the multi-resolution spectrogram discriminator (MRSD): rather than judging only the raw waveform, UnivNet computes several STFTs with different window and hop sizes and runs discriminators on the resulting spectrogram magnitudes. This pushes the generator to get both fine spectral detail and broad temporal structure right. Trained on many speakers, UnivNet produces natural-sounding speech for voices it never saw during training, which is how it earns the "universal" in its name.
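As a rough illustration of the multi-resolution idea, the sketch below computes STFT magnitudes of one signal at three window and hop settings. It is a minimal standard-library example, not UnivNet's implementation: the `RESOLUTIONS` values are hypothetical and deliberately tiny so the naive DFT stays fast, and the convolutional discriminators that would consume these magnitudes are omitted.

```python
import cmath
import math

# Hypothetical (n_fft, hop) pairs chosen for illustration only.
RESOLUTIONS = [(32, 8), (64, 16), (128, 32)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, n_fft, hop):
    """Return a list of frames, each holding n_fft // 2 + 1 magnitudes."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        mags = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT of one bin; a real system would use an FFT.
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                for i in range(n_fft)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def multi_resolution_magnitudes(signal):
    """One magnitude spectrogram per resolution, as a discriminator bank would see."""
    return [stft_magnitude(signal, n_fft, hop) for n_fft, hop in RESOLUTIONS]


# Test tone at 1/8 cycle per sample.
signal = [math.sin(2 * math.pi * 0.125 * t) for t in range(256)]
specs = multi_resolution_magnitudes(signal)
shapes = [(len(s), len(s[0])) for s in specs]            # (frames, bins) per resolution
peak_bins = [max(range(len(s[0])), key=s[0].__getitem__) for s in specs]
print(shapes, peak_bins)
```

The same tone occupies a different bin index at each resolution. For a sample rate fs, an N-point STFT has bin spacing fs/N and a hop of H samples gives frame spacing H/fs, so longer windows resolve frequency more finely while shorter hops resolve time more finely. Judging all resolutions together covers both ends of that trade-off.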

Technical Insight

UnivNet's location-variable convolution generates kernel weights dynamically from the conditioning mel features through a small kernel-predictor network, so each time step applies a content-adaptive filter rather than a fixed, shared kernel. Combined with the multi-resolution spectrogram discriminator, which captures several time-frequency trade-offs at once, this targets the upper frequency band where simpler GAN vocoders tend to sound blurry or buzzy.
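The following toy sketch shows the core mechanic of a location-variable convolution: a different kernel is applied to each segment of the signal, and that kernel is derived from the segment's conditioning value. Everything here is an assumption made for illustration. `predict_kernel` is a hand-written stand-in for the learned kernel-predictor network, the conditioning is a single scalar per frame rather than a mel vector, and the multi-channel layers of the real generator are left out.

```python
def predict_kernel(c):
    """Hypothetical stand-in for a learned kernel predictor.

    Blends an identity kernel (c = 0) with a 3-tap smoothing kernel (c = 1).
    """
    identity = [0.0, 1.0, 0.0]
    smooth = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
    return [(1.0 - c) * a + c * b for a, b in zip(identity, smooth)]


def location_variable_conv(signal, cond, hop):
    """Apply a per-frame kernel: frame i filters samples [i * hop, (i + 1) * hop)."""
    assert len(signal) == len(cond) * hop, "one conditioning value per hop samples"
    padded = [0.0] + list(signal) + [0.0]  # zero padding for the 3-tap kernel
    out = []
    for frame_idx, c in enumerate(cond):
        kernel = predict_kernel(c)  # kernel depends on local content
        for t in range(frame_idx * hop, (frame_idx + 1) * hop):
            out.append(sum(kernel[j] * padded[t + j] for j in range(3)))
    return out


signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
cond = [0.0, 1.0]  # first frame: pass through, second frame: smooth
out = location_variable_conv(signal, cond, hop=4)
print(out)
```

The first four samples pass through unchanged while the last four are smoothed, even though the same function processes both. A fixed shared kernel would have to pick one behavior for the whole signal.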

Mastering UnivNet Multi-Resolution Vocoder

To build a deeper understanding, treat UnivNet Multi-Resolution Vocoder as an operating model rather than a single feature: define the requirements, make the assumptions explicit, and separate what automation can do reliably from what still needs expert judgment.

In practice, strong teams that use UnivNet Multi-Resolution Vocoder treat quality, latency, and consent as equally important parts of the delivery plan. They document success criteria explicitly, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

The technology improves accessibility through transcription, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation grow when consent is missing. A sound approach pairs rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep improving safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

The technology improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster and on smaller budgets.

Customer-facing systems can automate voice interactions at scale.

In high-stakes deployments, these benefits translate into measurable operating rules, clear ownership boundaries, and repeatable review routines, so that teams scale trust rather than ambiguity.

The Future of UnivNet Multi-Resolution Vocoder

UnivNet's multi-resolution spectrogram discriminator has become a common component of modern TTS stacks and has influenced systems such as BigVGAN and neural audio codecs. Expect universal, speaker-agnostic synthesis to keep expanding toward singing voice, multilingual speech, and full-bandwidth 48 kHz audio, while the adaptive-kernel idea informs efficient on-device models that must handle varied voices without per-speaker tuning.

Real-World Implementation

Multi-speaker TTS services that must sound natural on voices absent from the training data.

Voice-conversion pipelines in which a single universal vocoder serves many target speakers.

High-fidelity audiobook and podcast narration that needs crisp sibilance and clear high frequencies.

Backend vocoder for end-to-end TTS systems that pair a spectrogram predictor with a robust waveform generator.

Implementation Practices

Across the use cases above, teams typically get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and the cost of errors over time.

Risks & Guardrails

Voice misuse and impersonation risks grow when consent is missing.

Accuracy can drop across languages, accents, or noisy environments.

Synthetic audio can be mistaken for genuine speech when it is not clearly labeled.

Implementation Roadmap

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across diverse speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep audit records for accountability.
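The labeling and record-keeping step can be sketched as a small provenance record written alongside each synthesized clip. This is one possible shape, assuming a JSON-lines audit log. Every field name, the `univnet-example` model label, and the identifiers in the usage lines are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_audit_record(audio_bytes, speaker_id, consent_ref, model_name="univnet-example"):
    """Build a provenance record for one synthesized clip (field names are hypothetical)."""
    return {
        "synthetic": True,                                   # explicit synthetic-audio label
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),   # ties the record to the exact clip
        "speaker_id": speaker_id,
        "consent_ref": consent_ref,                          # pointer to the stored consent
        "model": model_name,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def append_record(record, log_lines):
    """Append the record as one JSON line; a real log would be an append-only store."""
    log_lines.append(json.dumps(record, sort_keys=True))
    return log_lines


# Placeholder bytes stand in for real PCM audio.
record = make_audit_record(b"\x00\x01fake-pcm-bytes", "speaker-042", "consent-2024-0001")
log = append_record(record, [])
print(log[0])
```

Hashing the audio lets a reviewer later confirm that a circulating clip matches a logged synthesis, and the consent reference links the clip back to the first roadmap step.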

Keep Exploring