Overview
SoundStorm is a Google audio generation model that produces speech and sound in parallel rather than one token at a time, which makes high-quality audio generation much faster. This matters because it cuts generation latency for long clips from minutes to seconds without sacrificing fidelity.
SoundStorm Parallel Audio Generation sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.
Deep Dive
SoundStorm, introduced by Google in 2023, generates audio represented as discrete acoustic tokens from a neural codec called SoundStream. Earlier models such as AudioLM generated these tokens autoregressively, predicting each token in sequence, which is slow for long audio. SoundStorm instead uses a non-autoregressive, mask-based approach borrowed from image generation models such as MaskGIT. It starts from a largely masked token sequence and fills it in over a small number of decoding steps, predicting many tokens at once. Conditioned on semantic tokens (from a model such as AudioLM or SPEAR-TTS), it can generate 30 seconds of natural dialogue in about half a second on a TPU, roughly a hundred times faster than autoregressive baselines while matching their quality and speaker consistency.
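To make the speed argument concrete, the following is a rough back-of-the-envelope sketch that counts model forward passes for the two decoding styles. The frame rate (50 frames per second), the number of codec levels (12), and the iteration schedule (16 passes for the first level, 1 for each remaining level) are illustrative assumptions, not figures stated in this article; the function names are hypothetical.

```python
def autoregressive_steps(seconds, frames_per_second, rvq_levels):
    """One forward pass per token: frames x codec levels."""
    return int(seconds * frames_per_second) * rvq_levels


def parallel_steps(rvq_levels, first_level_iters, other_level_iters=1):
    """A fixed number of passes per codec level, independent of clip length."""
    return first_level_iters + (rvq_levels - 1) * other_level_iters


# Assumed, illustrative settings (not taken from this article).
ar = autoregressive_steps(seconds=30, frames_per_second=50, rvq_levels=12)
par = parallel_steps(rvq_levels=12, first_level_iters=16)

print(f"autoregressive passes: {ar}")
print(f"parallel passes:       {par}")
```

The pass-count ratio is not the same thing as the wall-clock speedup, because each parallel pass processes the whole sequence at once and so costs more than a single autoregressive step. The sketch only shows why the number of passes stops growing with clip length.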
Technical Insight
SoundStorm models the hierarchy of residual vector quantization (RVQ) levels produced by SoundStream. During training, random positions are masked and the model learns to predict them. At inference it runs confidence-based parallel decoding: in each iteration it predicts all masked tokens, keeps the most confident predictions, and re-masks the rest. It decodes the coarsest RVQ levels first and then the finer ones, reaching complete audio in far fewer steps than token-by-token generation.
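The confidence-based decoding loop described above can be sketched for a single RVQ level as follows. This is a minimal illustration, not SoundStorm's implementation: `toy_predict` is a hypothetical stand-in for the trained network that returns random probabilities, candidates are chosen greedily rather than sampled, and the cosine masking schedule is an assumption borrowed from MaskGIT-style decoders.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a position that is still masked


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the trained model: one random distribution per position."""
    dists = []
    for _ in tokens:
        weights = [rng.random() + 1e-9 for _ in range(vocab_size)]
        total = sum(weights)
        dists.append([w / total for w in weights])
    return dists


def parallel_decode(length, vocab_size, num_iters, predict=toy_predict, seed=0):
    """Fill a fully masked sequence in at most num_iters prediction passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for it in range(num_iters):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        dists = predict(tokens, vocab_size, rng)

        # Greedy candidate and its probability (the "confidence") per masked slot.
        scored = []
        for i in masked:
            best = max(range(vocab_size), key=dists[i].__getitem__)
            scored.append((dists[i][best], i, best))

        # Cosine schedule: how many positions should stay masked after this pass.
        target = math.floor(length * math.cos(math.pi / 2 * (it + 1) / num_iters))
        keep_masked = max(0, min(target, len(masked) - 1))  # commit at least one
        num_commit = len(masked) - keep_masked

        # Keep the most confident predictions; the rest stay masked for the next pass.
        scored.sort(reverse=True)
        for _, i, best in scored[:num_commit]:
            tokens[i] = best
    return tokens


decoded = parallel_decode(length=100, vocab_size=32, num_iters=8)
print(decoded[:10])
```

In a coarse-to-fine setup, a loop like this would run once per RVQ level, starting with the coarsest level and conditioning each finer level on the levels already decoded. That outer loop is omitted here to keep the sketch short.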
Mastering SoundStorm Parallel Audio Generation
To build a deeper understanding, treat SoundStorm Parallel Audio Generation as a practical pattern rather than a single artifact: define the outcomes you need, make your assumptions explicit, and separate what automation can handle reliably from what still requires expert judgment.
In practice, strong teams that use SoundStorm Parallel Audio Generation treat quality, latency, and consent as equally important parts of the delivery plan. They write down explicit success criteria, test against real data and workflows, and iterate on observed failure modes rather than one-off benchmark wins. This is where theoretical understanding turns into durable capability across product, policy, and operations.
The technology improves accessibility through captioning, narration, and voice interfaces. At the same time, the risks of voice misuse and impersonation grow when consent is absent. A sound approach is to pair fast experimentation with governance: run pilots, gather evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.
Strategic Impact
It improves accessibility through captioning, narration, and voice interfaces. In production-grade deployments, this translates into measurable operating rules, clear ownership boundaries, and repeatable review practices, so that teams scale trust rather than ambiguity.
Media teams can ship polished audio faster and on smaller budgets.
Customer-facing systems can personalize voice interactions at scale.
Real-World Implementation
Generating 30-second spoken dialogues for AI voice assistants in under a second
Synthesizing multi-turn conversations with consistent speaker voices for prototyping
Enabling low-latency text-to-speech in interactive agents where autoregressive models lag
Producing long-form narrated audio quickly by filling in acoustic tokens in parallel
Implementation Patterns
SoundStorm Parallel Audio Generation in practice
Across the use cases above, teams typically get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both output quality and error costs over time.
Risks & Guardrails
The risks of voice misuse and impersonation grow when consent is absent.
Accuracy can drop across languages, accents, or noisy environments.
Synthetic audio can be mistaken for real speech when it is not clearly labeled.
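One way to address the labeling risk is to attach a small provenance record to every generated clip. The sketch below is an assumption about what such a record might contain. The field names, the model name, and the consent reference are hypothetical and do not follow any particular standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_provenance_record(audio_bytes, model_name, consent_ref):
    """Build a sidecar record that marks a clip as synthetic and ties it to its bytes."""
    return {
        "synthetic": True,
        "model": model_name,          # hypothetical identifier of the generator
        "consent_ref": consent_ref,   # hypothetical pointer to the stored consent
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder bytes stand in for real audio data.
record = make_provenance_record(b"\x00\x01placeholder-audio", "example-tts-model", "consent-0001")
print(json.dumps(record, indent=2))
```

Storing the hash alongside the label lets a later audit confirm that a given file is the one the record describes.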
Implementation Roadmap
Obtain explicit consent for voice capture, synthesis, and reuse.
Test quality across diverse speakers and background conditions.
Define where a human must review or approve outputs.
Label synthetic audio and keep audit records.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
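The evidence-gate rule can be expressed as a simple check that either allows expansion or pauses the rollout and names the failing criteria. This is a minimal sketch under stated assumptions: the metric names and thresholds are hypothetical placeholders, and real criteria would come from the team's own success definitions.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool


def check_gate(name, measured, threshold, higher_is_better=True):
    """Compare one measured value against its agreed threshold."""
    passed = measured >= threshold if higher_is_better else measured <= threshold
    return GateResult(name=name, passed=passed)


def rollout_decision(results):
    """Expand only if every gate passed; otherwise pause and list the gaps."""
    failed = [r.name for r in results if not r.passed]
    return ("pause", failed) if failed else ("expand", [])


# Hypothetical criteria and measurements, for illustration only.
results = [
    check_gate("consent_coverage", measured=1.00, threshold=1.00),
    check_gate("listener_quality_score", measured=4.1, threshold=4.0),
    check_gate("latency_seconds", measured=1.4, threshold=1.0, higher_is_better=False),
]
decision, gaps = rollout_decision(results)
print(decision, gaps)
```

In this example the latency gate fails, so the decision is to pause and close that gap before widening usage.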