Audio AI GUIDE

Parallel WaveGAN Vocoder

Parallel WaveGAN is a fast neural vocoder that converts mel-spectrograms into natural-sounding audio waveforms using a compact GAN, generating all samples at once.

Overview

Parallel WaveGAN matters because it delivers near-real-time, high-quality speech synthesis with a compact model.

The Parallel WaveGAN vocoder sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

The vocoder is the final stage of a TTS pipeline: it converts an acoustic feature map (usually a mel-spectrogram) into the actual sound waveform you hear. Parallel WaveGAN, introduced by Yamamoto, Song, and Kim in 2019, does this with a non-autoregressive WaveNet-style generator trained as a generative adversarial network. Instead of predicting one audio sample at a time like the original WaveNet, it generates the entire waveform in parallel, which makes it much faster. Its key recipe combines an adversarial loss with a multi-resolution short-time Fourier transform (STFT) loss, so the model matches the real signal across several time and frequency scales. The result is a small generator (about 1.4 million parameters) that runs many times faster than real time on a GPU.
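To make the parallel-generation idea concrete, the following is a minimal NumPy sketch of how the inputs of a non-autoregressive vocoder line up: one noise value per output sample plus the mel-spectrogram stretched to the same length, so a generator can produce every sample in a single pass. The generator network itself is omitted. The function names, the hop size of 300, the 24 kHz sample rate, and the nearest-neighbour upsampling are illustrative assumptions, not details taken from the text above.

```python
import numpy as np

def prepare_generator_inputs(mel, hop_size, rng):
    """Build the two inputs a parallel vocoder consumes in a single pass.

    mel: array of shape (n_mels, n_frames).
    Returns (noise, conditioning), both n_frames * hop_size samples long.
    """
    n_mels, n_frames = mel.shape
    n_samples = n_frames * hop_size
    # One Gaussian noise value per output sample, drawn all at once.
    noise = rng.standard_normal((1, n_samples))
    # Nearest-neighbour upsampling (an assumption): repeat each frame hop_size times.
    conditioning = np.repeat(mel, hop_size, axis=1)
    return noise, conditioning

def real_time_factor(synthesis_seconds, n_samples, sample_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return synthesis_seconds / (n_samples / sample_rate)

rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 100))  # dummy data: 80 mel bands, 100 frames
noise, conditioning = prepare_generator_inputs(mel, hop_size=300, rng=rng)
print(noise.shape, conditioning.shape)  # (1, 30000) (80, 30000)

# Hypothetical timing: 0.05 s to synthesize 30000 samples at 24 kHz.
rtf = real_time_factor(0.05, noise.shape[1], sample_rate=24000)
print(rtf)
```

An autoregressive model would instead need a loop of 30000 dependent steps for the same clip, which is why the parallel formulation is the source of the speedup.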

Technical Insight

The generator is a dilated-convolution network conditioned on the mel-spectrogram and fed a noise input, mapping noise and features directly to samples. Joint training minimizes a multi-resolution STFT loss, computed by comparing spectral magnitudes at several FFT sizes and window lengths, together with an adversarial loss from a discriminator that judges realism. The STFT term stabilizes and speeds up adversarial training, capturing both fine detail and broad spectral shape without distillation.
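The loss described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the text names only a multi-resolution STFT loss and an adversarial loss, so the split into spectral-convergence and log-magnitude terms, the least-squares form of the adversarial term, the three (FFT size, hop, window) resolutions, and the weight `lambda_adv` are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, fft_size, hop_size, win_length):
    """Magnitude STFT of a 1-D signal using a Hann window (no edge padding)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop_size
    frames = np.stack([x[i * hop_size : i * hop_size + win_length] * window
                       for i in range(n_frames)])
    # rfft zero-pads each frame up to fft_size when win_length < fft_size.
    mag = np.abs(np.fft.rfft(frames, n=fft_size, axis=1))
    return np.maximum(mag, 1e-7)  # floor avoids log(0)

def stft_loss(x, x_hat, fft_size, hop_size, win_length):
    """Spectral convergence plus log-magnitude L1 at one resolution (assumed form)."""
    mag = stft_magnitude(x, fft_size, hop_size, win_length)
    mag_hat = stft_magnitude(x_hat, fft_size, hop_size, win_length)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    spectral_convergence = np.linalg.norm(mag - mag_hat) / np.linalg.norm(mag)
    log_magnitude = np.mean(np.abs(np.log(mag) - np.log(mag_hat)))
    return spectral_convergence + log_magnitude

# Illustrative (fft_size, hop_size, win_length) triples; assumed values.
RESOLUTIONS = [(512, 128, 512), (1024, 256, 1024), (2048, 512, 2048)]

def multi_resolution_stft_loss(x, x_hat, resolutions=RESOLUTIONS):
    """Average the single-resolution loss over several analysis settings."""
    return sum(stft_loss(x, x_hat, *r) for r in resolutions) / len(resolutions)

def generator_loss(x, x_hat, d_fake, lambda_adv=4.0):
    """STFT loss plus a least-squares adversarial term (assumed GAN objective).

    d_fake holds discriminator scores for the generated waveform x_hat.
    """
    adversarial = np.mean((1.0 - d_fake) ** 2)
    return multi_resolution_stft_loss(x, x_hat) + lambda_adv * adversarial

rng = np.random.default_rng(0)
t = np.arange(8192) / 24000.0
target = np.sin(2 * np.pi * 440.0 * t)
noisy = target + 0.1 * rng.standard_normal(t.shape)

print(multi_resolution_stft_loss(target, target))       # 0.0
print(multi_resolution_stft_loss(target, noisy) > 0.0)  # True
```

Averaging over several resolutions is the point of the design: short windows penalize errors in transients, while long windows penalize errors in harmonic structure, so no single time-frequency trade-off dominates training.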

Mastering the Parallel WaveGAN Vocoder

To build a deeper understanding, treat the Parallel WaveGAN vocoder as an operating model rather than a single feature: define the desired outcome, make the assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams using the Parallel WaveGAN vocoder treat quality, latency, and compliance as equal parts of their deployment strategy. They write explicit success criteria, test on real data and real workflows, and iterate based on observed failure patterns rather than one-off wins. This is where theoretical understanding turns into durable capability across products, policies, and operations.

The technology improves accessibility through captioning, narration, and voice interfaces. At the same time, the risk of voice misuse and impersonation rises when consent is missing. The most effective approach combines rapid experimentation with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

It improves accessibility through captioning, narration, and voice interfaces.

Media teams can ship polished audio faster and on a smaller budget.

Customer-facing systems can handle spoken interactions at greater scale.

In effective deployments, each of these benefits is translated into measurable operating standards, clear ownership boundaries, and regular review cycles, so that teams build confidence rather than accumulate doubt.

The Future of the Parallel WaveGAN Vocoder

Parallel WaveGAN helped establish GAN vocoders as a practical default, and spectral reconstruction losses in the spirit of its multi-resolution STFT loss now appear in successors such as HiFi-GAN and in many streaming systems. The trend points toward smaller, lower-latency vocoders for on-device assistants, hearing aids, and live voice conversion, along with universal vocoders that generalize to unseen speakers. Expect tighter integration with end-to-end TTS and more efficient deployment on mobile and embedded chips.

Real-World Applications

Real-time speech synthesis in mobile voice assistants, where latency and model size matter

Serving as the waveform generator paired with acoustic models such as Tacotron 2 or FastSpeech

On-device text-to-speech for accessibility tools that cannot rely on the cloud

Voice conversion systems that resynthesize converted spectrograms into natural-sounding audio

Implementation Approaches

Parallel WaveGAN Vocoder in practice

Across the applications listed above, teams typically get the best results when they define quality gates up front, keep a human escalation path for edge cases, and track production successes and error rates over time.

Risks & Safeguards

The risk of voice misuse and impersonation rises when consent is missing.

Accuracy can drop across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for authentic speech without clear labeling.

Roadmap

1

Obtain explicit consent for voice capture, cloning, and reuse.

2

Test quality across diverse speakers and background conditions.

3

Define when a human must review or approve outputs.

4

Label synthetic audio and retain verified provenance records for accountability.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
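As one possible way to act on the labeling and provenance step, the sketch below builds a small record for a generated clip using only the Python standard library. Every field name, the model name, and the consent reference are hypothetical placeholders; the text does not prescribe a record format, so treat this as an illustration of the idea rather than a standard.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_provenance_record(audio_bytes, model_name, consent_reference):
    """Describe a synthetic clip so it can be labeled and audited later.

    All field names here are hypothetical, chosen for illustration.
    """
    return {
        "synthetic": True,  # explicit label: this audio is machine-generated
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "model": model_name,
        "consent_reference": consent_reference,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }

def matches_record(audio_bytes, record):
    """Check that a clip is byte-identical to the one the record describes."""
    return hashlib.sha256(audio_bytes).hexdigest() == record["sha256"]

audio = b"\x00\x01" * 1000  # stand-in for real waveform bytes
record = build_provenance_record(audio, "parallel-wavegan-demo", "consent-form-0001")
print(json.dumps(record, indent=2))
print(matches_record(audio, record))  # True
```

A content hash only detects byte-level changes, so a re-encoded or trimmed copy will not match; a real deployment would likely pair a record like this with an audible or embedded label.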

Continue Exploring