Audio AI GUIDE

AudioLM

AudioLM is a Google research system that generates audio, such as speech or piano music, by treating sound as a language and predicting tokens.

Overview

AudioLM matters because it showed that coherent, natural-sounding continuations can be generated without any text transcript or symbolic music notation.

AudioLM belongs to the family of audio-AI workflows that put speech, music, and sound to work for communication, accessibility, and media creation.

Deep Dive

Introduced by Google in 2022, AudioLM reframes audio generation as a language modeling problem: it converts waveforms into discrete tokens and then predicts the next token, much like guessing the next word. Its key trick is a hierarchy of token types. "Semantic" tokens (from a model such as w2v-BERT) capture long-term structure (phonetics, syntax, melody), while "acoustic" tokens (from the SoundStream neural codec) capture fine details such as speaker identity, timbre, and recording conditions. By predicting semantic tokens first and then conditioning the acoustic tokens on them, AudioLM generates continuations that stay coherent over many seconds while preserving the original voice or instrument. Given a few seconds of speech, it keeps talking in the same voice; given piano, it improvises in the same style.
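The ordering described above can be sketched with a toy example. This is not AudioLM's implementation: `next_semantic` and `next_acoustic` are hypothetical hand-written rules standing in for trained Transformer models, the vocabulary sizes are arbitrary, and the real system splits the acoustic stage further. The sketch only shows that semantic tokens are continued first, and that acoustic tokens are then produced conditioned on them and on the prompt's acoustic tokens.

```python
import random

# Hypothetical vocabulary sizes, far smaller than in any real model.
SEMANTIC_VOCAB = 8
ACOUSTIC_VOCAB = 16


def next_semantic(history, rng):
    """Toy stand-in for the semantic-stage language model."""
    prev = history[-1] if history else 0
    # Stay on or next to the previous token so the sequence has "structure".
    return (prev + rng.choice([0, 1])) % SEMANTIC_VOCAB


def next_acoustic(semantic_token, prompt_acoustic):
    """Toy stand-in for the acoustic stage.

    Conditioned on the semantic token (what to say) and on the prompt's
    acoustic tokens (how it should sound).
    """
    voice_offset = prompt_acoustic[0] % 2  # toy "speaker identity"
    return (2 * semantic_token + voice_offset) % ACOUSTIC_VOCAB


def continue_audio(prompt_semantic, prompt_acoustic, n_new, seed=0):
    """Continue a prompt: semantic tokens first, then acoustic tokens."""
    rng = random.Random(seed)
    semantic = list(prompt_semantic)
    for _ in range(n_new):
        semantic.append(next_semantic(semantic, rng))
    new_semantic = semantic[len(prompt_semantic):]
    new_acoustic = [next_acoustic(s, prompt_acoustic) for s in new_semantic]
    return new_semantic, new_acoustic


if __name__ == "__main__":
    sem, ac = continue_audio([0, 1], [5, 7], n_new=6, seed=1)
    print("semantic:", sem)
    print("acoustic:", ac)
```

With the same seed, two prompts that share semantic tokens but differ in acoustic tokens yield the same semantic continuation and different acoustic tokens, which is the toy analogue of "same content plan, different voice".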

Technical Insight

AudioLM is trained on audio only, with no transcripts. SoundStream compresses audio into acoustic tokens through residual vector quantization, while w2v-BERT supplies coarse semantic tokens. A stack of Transformer language models predicts tokens in stages: semantic tokens first for structure, then coarse and fine acoustic tokens for high-fidelity reconstruction. SoundStream's decoder finally converts the predicted tokens back into a waveform, producing audio that keeps the speaker's voice and consistent prosody.
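Residual vector quantization can be illustrated in a few lines. The sketch below is a generic, simplified version of the technique, not SoundStream's code: the two codebooks in `CODEBOOKS` are hand-written, hypothetical values (real codebooks are learned and much larger), and a single 2-dimensional `frame` stands in for an encoder output. Each stage quantizes what the previous stage left over, so earlier tokens are coarse and later tokens add detail.

```python
def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared L2 distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vector, codebooks):
    """Quantize `vector` stage by stage; each stage encodes the residual."""
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical codebooks: stage 1 is coarse, stage 2 refines the residual.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

frame = [1.3, 0.9]
tokens = rvq_encode(frame, CODEBOOKS)   # [1, 1]
approx = rvq_decode(tokens, CODEBOOKS)  # [1.25, 1.0]

if __name__ == "__main__":
    print(tokens, approx)
```

Decoding with only the first token gives `[1.0, 1.0]`; adding the second token moves the reconstruction closer to the input. This mirrors the coarse and fine acoustic tokens described above.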

Learning AudioLM

To build a deeper understanding, treat AudioLM as a usage pattern rather than a single artifact: spell out the requirements, state the assumptions, and separate what a system can do reliably from what still needs expert judgment.

In practice, strong teams working with AudioLM-style models treat quality, latency, and consent as equally important parts of the deployment plan. They write down explicit success criteria, test against real data and real workflows, and iterate on observed failure modes rather than on one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.

Audio AI of this kind can improve accessibility through captioning, narration, and assistive voices. At the same time, the risks of voice misuse and impersonation grow when consent is missing. A sound approach pairs rapid experimentation with governance: run pilots, collect evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Audio AI of this kind can improve accessibility through captioning, narration, and assistive voices.

Media teams can ship polished audio faster and on smaller budgets.

Customer-facing systems can handle spoken interactions at scale.

In high-quality deployments, each of these translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.

The Future of AudioLM

AudioLM's token-based recipe became the foundation for later systems: its ideas fed into MusicLM for text-to-music and SoundStorm for faster generation, and the wider field now combines semantic and acoustic tokens for speech, music, and sound. Expect faster, real-time generation, longer coherent outputs, and multimodal control in which text or other signals steer models trained on audio alone. The same techniques raise concerns about voice cloning and audio deepfakes.

Real-World Implementation

Continuing a speech clip in the same speaker's voice and speaking style, without a transcript

Improvising new piano music that matches the style of a short recorded prompt

Serving as the audio-generation backbone for text-to-music systems such as MusicLM

Research into speech synthesis that preserves prosody and recording acoustics from a sample

In Practice

AudioLM in practice

For each of the use cases listed under Real-World Implementation (speech continuation, piano continuation, a generation backbone for text-to-music systems, and speech-synthesis research), teams usually get better results when they define quality criteria up front, keep a human escalation path for edge cases, and track both product gains and the cost of errors over time.

Risks & Guardrails

Voice misuse and impersonation risks grow when consent is missing.

Accuracy and output quality can drop across languages, accents, or noisy environments.

Synthetic audio can be mistaken for genuine speech when it is not clearly labeled.
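One way to address the labeling risk is to store a provenance record next to every generated clip. The following is a minimal sketch under stated assumptions, not a standard or an AudioLM feature: the function names, the record fields, and the `consent_ref` identifier are all hypothetical, and a real deployment would likely also need watermarking or signed metadata.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_provenance_record(audio_bytes, model_name, consent_ref):
    """Build a JSON-serializable record that marks a clip as synthetic.

    All field names are hypothetical; adapt them to your own audit schema.
    """
    return {
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "synthetic": True,
        "model": model_name,
        "consent_ref": consent_ref,  # pointer to the stored consent evidence
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def record_matches(record, audio_bytes):
    """Check that a record belongs to exactly these audio bytes."""
    return record["sha256"] == hashlib.sha256(audio_bytes).hexdigest()


if __name__ == "__main__":
    clip = b"\x00\x01fake-audio-bytes"
    record = make_provenance_record(clip, "demo-model", "consent-0001")
    print(json.dumps(record, indent=2))
    print(record_matches(record, clip))
```

Hashing the exact bytes ties the label to one specific file, so the record stops matching once the clip is re-encoded or edited. Treat it as an accountability log rather than as tamper-proof labeling.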

Implementation Roadmap

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across diverse speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep records for accountability.

Treat each step as an evidence gate: if its criteria are not met, pause the rollout, close the gap, and only then expand usage.
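The evidence-gate idea can be expressed as a small check that runs before each rollout expansion. This is an illustrative sketch only: the `Gate` class, the metric names, and the thresholds are hypothetical and would need to come from your own success criteria.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str          # roadmap step this gate protects
    metric: str        # key to look up in the evidence dictionary
    threshold: float   # hypothetical minimum acceptable value


def evaluate_rollout(gates, evidence):
    """Walk the gates in order and stop at the first one that is not met.

    A missing metric counts as a failure, because no evidence was collected.
    """
    passed = []
    for gate in gates:
        value = evidence.get(gate.metric)
        if value is None or value < gate.threshold:
            return {"decision": "pause", "blocked_on": gate.name, "passed": passed}
        passed.append(gate.name)
    return {"decision": "expand", "blocked_on": None, "passed": passed}


# Hypothetical gates mirroring the four roadmap steps.
GATES = [
    Gate("consent", "consent_coverage", 1.0),
    Gate("quality", "worst_group_score", 0.8),
    Gate("human_review", "review_rule_coverage", 1.0),
    Gate("labeling", "labeled_output_rate", 1.0),
]

if __name__ == "__main__":
    evidence = {
        "consent_coverage": 1.0,
        "worst_group_score": 0.72,  # weakest speaker or noise condition
        "review_rule_coverage": 1.0,
        "labeled_output_rate": 1.0,
    }
    print(evaluate_rollout(GATES, evidence))
```

Using the worst-performing speaker group or background condition for the quality gate, rather than an average, keeps a strong overall score from hiding a weak subgroup.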

Keep Exploring