Audio AI GUIDE

MelGAN Generative Vocoder

MelGAN is a fully convolutional, GAN-based vocoder that converts mel-spectrograms into raw audio waveforms in a single fast forward pass.

Overview

MelGAN matters because it demonstrated high-quality, non-autoregressive speech synthesis that can run hundreds of times faster than real time on a GPU.

MelGAN Generative Vocoder sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

MelGAN, introduced by Kumar et al. in 2019, generates audio without the slow sample-by-sample loop used by WaveNet. Its generator is a stack of transposed convolutions that upsample the mel-spectrogram (typically 80 frequency bands) to the audio sample rate, with residual blocks that use dilated convolutions to widen the receptive field. The key innovation was training with multiple discriminators that operate on different audio scales (the original waveform plus downsampled versions), each looking at overlapping windows. A feature-matching loss compares discriminator activations between real and generated audio, which stabilizes GAN training. The model is small by neural-audio standards and runs faster than real time even on a CPU, making it practical for embedded and on-device text-to-speech.
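The upsampling and receptive-field arithmetic described above can be checked with a few lines of Python. This is a minimal sketch, not the reference implementation: the upsampling factors (8, 8, 2, 2), the kernel size (3), and the dilations (1, 3, 9) are assumptions taken from the configuration commonly cited for the original model, and the function names are illustrative.

```python
from math import prod

# Assumed configuration (not stated in this guide): four transposed-convolution
# stages and a residual stack of three dilated convolutions with kernel size 3.
UPSAMPLE_FACTORS = [8, 8, 2, 2]
DILATIONS = [1, 3, 9]
KERNEL_SIZE = 3


def total_upsampling(factors):
    """Number of audio samples produced per mel-spectrogram frame."""
    return prod(factors)


def samples_for_frames(num_frames, factors):
    """Waveform length generated from num_frames mel frames."""
    return num_frames * total_upsampling(factors)


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


if __name__ == "__main__":
    print("samples per frame:", total_upsampling(UPSAMPLE_FACTORS))
    print("samples for 100 frames:", samples_for_frames(100, UPSAMPLE_FACTORS))
    print("residual stack receptive field:", receptive_field(KERNEL_SIZE, DILATIONS))
```

Under these assumptions, growing the dilation geometrically lets three small convolutions cover a span that would otherwise require a much larger kernel.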

Technical Insight

MelGAN's multi-scale discriminator uses three identical networks that look at the audio at full, half, and quarter resolution, each capturing structure at a different scale. Notably, MelGAN relies on a feature-matching loss (the L1 distance between discriminator feature maps of real vs. generated audio) rather than an explicit spectrogram reconstruction loss, which encourages the generator to match the real audio's statistics layer by layer.
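A minimal, framework-free sketch of these two ideas follows. It assumes plain Python lists in place of tensors, and it uses non-overlapping averaging as a simplified stand-in for the pooling a real implementation would use; all function names are illustrative rather than taken from any library.

```python
def downsample(wave, factor=2):
    """Average non-overlapping groups of samples (simplified pooling)."""
    usable = len(wave) - len(wave) % factor
    return [sum(wave[i:i + factor]) / factor for i in range(0, usable, factor)]


def multi_scale_inputs(wave, num_scales=3):
    """Return the waveform at full, half, and quarter resolution."""
    scales = [list(wave)]
    for _ in range(num_scales - 1):
        scales.append(downsample(scales[-1]))
    return scales


def l1_mean(a, b):
    """Mean absolute difference between two equally sized feature maps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(real_feats, fake_feats):
    """Sum of per-layer L1 distances over every discriminator.

    real_feats[k][i] is the flattened activation of layer i in
    discriminator k for real audio; fake_feats mirrors it for
    generated audio.
    """
    total = 0.0
    for real_layers, fake_layers in zip(real_feats, fake_feats):
        for real_map, fake_map in zip(real_layers, fake_layers):
            total += l1_mean(real_map, fake_map)
    return total
```

In training, each of the three scales would be fed to its own discriminator, and the activations collected from every layer would be passed to `feature_matching_loss` alongside the adversarial loss.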

Mastering MelGAN Generative Vocoder

To build a deeper understanding, treat MelGAN Generative Vocoder as a pattern of use rather than a single artifact: define the requirements, make the assumptions explicit, and separate what reliable automation can handle from what still needs expert judgment.

In practice, strong teams that use MelGAN Generative Vocoder treat quality, latency, and consent as equally important parts of the delivery plan. They write down explicit success criteria, test against real data and real workflows, and iterate on observed failure modes rather than on a one-time benchmark win. This is where theoretical understanding turns into durable capability across product, policy, and operations.
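One way to make the latency criterion explicit is to measure the real-time factor (synthesis time divided by the duration of the audio produced). The sketch below is illustrative only: `vocoder` is a hypothetical callable that maps a mel-spectrogram to a list of samples, and the `max_rtf` budget of 0.5 is an assumed threshold, not a recommendation from this guide.

```python
import time


def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def timed_rtf(vocoder, mel, sample_rate):
    """Run a vocoder callable once and return its measured real-time factor."""
    start = time.perf_counter()
    waveform = vocoder(mel)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(waveform), sample_rate)


def meets_latency_budget(rtf, max_rtf=0.5):
    """Check a measured real-time factor against an agreed budget (assumed value)."""
    return rtf <= max_rtf
```

Running `timed_rtf` over a representative set of utterances on the target hardware, rather than on a single benchmark clip, gives a distribution that can be compared against the budget.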

Audio AI of this kind improves accessibility through transcription, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation grow when consent is missing. A sound approach is to pair rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

The technology improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster and on smaller budgets.

Customer-facing systems can automate spoken interactions at scale.

In high-quality deployments, each of these benefits translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.

The Future of MelGAN Generative Vocoder

MelGAN seeded a family of GAN vocoders. Its successors, HiFi-GAN and UnivNet, kept the fast non-autoregressive approach but added multi-period and multi-resolution discriminators for cleaner high frequencies. The design lives on in on-device and streaming TTS, where latency and model size matter, and its discriminator ideas continue to influence neural codecs and music generation systems in which adversarial training improves perceptual quality.
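The multi-period idea mentioned above can be illustrated by folding a 1D waveform into a 2D grid whose width equals the period, so that each column holds samples spaced one period apart. This is a simplified sketch: it pads with a constant value, whereas real implementations may pad differently, and the function names are illustrative.

```python
def reshape_by_period(wave, period, pad_value=0.0):
    """Fold a 1D waveform into rows of length `period`, padding the tail."""
    padded = list(wave)
    remainder = len(padded) % period
    if remainder:
        padded += [pad_value] * (period - remainder)
    return [padded[i:i + period] for i in range(0, len(padded), period)]


def column(grid, index):
    """Samples spaced exactly one period apart in the original waveform."""
    return [row[index] for row in grid]


if __name__ == "__main__":
    grid = reshape_by_period(list(range(7)), period=3)
    for row in grid:
        print(row)
    print("every third sample:", column(grid, 0))
```

A discriminator that convolves over this grid sees periodic structure directly, which is the motivation usually given for evaluating several different periods in parallel.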

Real-World Implementation

On-device text-to-speech in mobile assistants, where a small, fast vocoder avoids cloud round trips

Real-time voice conversion pipelines that turn a source speaker's mel-spectrogram into a target voice

Game and animation tools that generate character dialogue from synthesized spectrograms with low latency

Early audio GAN research, where MelGAN's feature-matching loss is reused for music and sound generation

Practice Patterns

MelGAN Generative Vocoder in practice

The same operating pattern applies to each use case listed under Real-World Implementation: on-device text-to-speech, real-time voice conversion, game and animation dialogue, and audio GAN research. Teams usually get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both the product gains and the cost of errors over time.

Risks & Guardrails

The risks of voice misuse and impersonation grow when consent is missing.

Accuracy can drop across languages, accents, or noisy environments.

Synthetic audio can be mistaken for authentic speech when it is not clearly labeled.

Implementation Roadmap

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across diverse speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep accountability records.

Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
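Step 4 of the roadmap (labeling synthetic audio and keeping accountability records) could be supported by a small provenance record written alongside each generated file. The sketch below is only one possible shape: the field names, the `consent_id` reference, and the default `model_name` are hypothetical and would need to match an organization's own policy.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(audio_bytes, consent_id, model_name="melgan"):
    """Describe one synthesized clip so it can be traced and labeled later."""
    return {
        "synthetic": True,  # explicit label for downstream consumers
        "model": model_name,  # hypothetical identifier for the vocoder used
        "consent_id": consent_id,  # hypothetical reference to the consent on file
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def to_log_line(record):
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(record, sort_keys=True)
```

Hashing the audio bytes lets a reviewer confirm later that a given file is the one the record describes, without storing the audio in the log itself.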
