Audio AI Guide

MusicLM Hierarchical Music Generation

Overview

MusicLM is Google's text-to-music model. It generates several minutes of coherent audio from a description such as 'a calming violin melody backed by a distorted guitar riff.' It matters because it tackles long-range musical structure by stacking models in a hierarchy, treating music generation as language modeling over audio tokens.

MusicLM Hierarchical Music Generation sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

Announced by Google Research in early 2023, MusicLM frames music generation as predicting sequences of audio tokens, much as a language model predicts words. It uses a hierarchy of representations: semantic tokens (from a model called w2v-BERT) capture high-level structure such as melody and rhythm over long spans, while acoustic tokens (from the SoundStream neural codec) capture fine details such as timbre and texture. A first stage generates semantic tokens from the text prompt, then later stages fill in acoustic detail conditioned on those semantics. Text conditioning comes from MuLan, a joint music-text embedding trained so descriptions and audio land in the same space. This staged approach lets MusicLM stay musically consistent over minutes rather than drifting after a few seconds.
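The staged flow described above can be sketched with toy stand-ins. This is not MusicLM's code: every function name, the token rates, the vocabulary sizes, and the random "models" below are assumptions made purely for illustration. The sketch only shows the shape of the data flow: text is mapped to a conditioning signal, a first stage emits a short, slowly changing semantic sequence, and a second stage emits a longer acoustic sequence conditioned on it.

```python
from __future__ import annotations

import hashlib
import random

# Illustrative constants, assumed for this sketch only.
SEMANTIC_RATE_HZ = 25        # coarse tokens per second
ACOUSTIC_FRAME_RATE_HZ = 50  # codec frames per second
ACOUSTIC_LEVELS = 4          # quantizer levels per acoustic frame
SEMANTIC_VOCAB = 1024
ACOUSTIC_VOCAB = 1024


def embed_text(prompt: str) -> int:
    """Hypothetical stand-in for a joint music-text embedding: hash the prompt to a seed."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_semantic_tokens(text_seed: int, seconds: float) -> list[int]:
    """Stage 1 (toy): emit coarse tokens one at a time, conditioned on the text seed."""
    rng = random.Random(text_seed)
    tokens: list[int] = []
    prev = 0
    for _ in range(int(seconds * SEMANTIC_RATE_HZ)):
        # Toy "model": each token is a small step from the previous one,
        # so the sequence changes slowly, like long-range structure.
        prev = (prev + rng.randint(-3, 3)) % SEMANTIC_VOCAB
        tokens.append(prev)
    return tokens


def generate_acoustic_tokens(semantic: list[int], text_seed: int) -> list[list[int]]:
    """Stage 2 (toy): emit acoustic frames conditioned on the already-fixed semantic tokens."""
    rng = random.Random(text_seed ^ 0xA5A5)
    frames_per_semantic = ACOUSTIC_FRAME_RATE_HZ // SEMANTIC_RATE_HZ
    frames: list[list[int]] = []
    for s in semantic:
        for _ in range(frames_per_semantic):
            # Each level's code depends on the governing semantic token plus noise.
            frames.append(
                [(s * (level + 1) + rng.randint(0, 7)) % ACOUSTIC_VOCAB
                 for level in range(ACOUSTIC_LEVELS)]
            )
    return frames


def generate(prompt: str, seconds: float) -> tuple[list[int], list[list[int]]]:
    """Run both toy stages and return (semantic tokens, acoustic frames)."""
    seed = embed_text(prompt)
    semantic = generate_semantic_tokens(seed, seconds)
    acoustic = generate_acoustic_tokens(semantic, seed)
    return semantic, acoustic


if __name__ == "__main__":
    sem, aco = generate("a calming violin melody backed by a distorted guitar riff", 2.0)
    print(len(sem), "semantic tokens")
    print(len(aco) * ACOUSTIC_LEVELS, "acoustic tokens")
```

Even in this toy, the semantic sequence is several times shorter than the acoustic one for the same duration, which is why long-range structure is cheaper to model at the semantic level.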

Technical Insight

The key idea is decoupling structure from texture across a token hierarchy. Coarse semantic tokens are sparse and slow-changing, so a Transformer can model long-term form without a huge sequence length. Acoustic tokens are dense and high-rate, but they only need to be predicted conditioned on semantics that are already fixed, which keeps each stage tractable. SoundStream's residual vector quantization produces the layered acoustic codes that a final decoder turns back into 24 kHz waveforms.
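Residual vector quantization, the technique named above, can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not SoundStream's implementation: the codebooks here are random rather than learned, the helper names are hypothetical, and the neural encoder and decoder around the quantizer are omitted. Each level quantizes whatever residual the previous level left behind, so the codes form layers from coarse to fine.

```python
import math
import random


def make_codebooks(levels, codebook_size, dim, seed=0):
    """Random codebooks whose scale halves per level (assumption: real ones are learned).

    Index 0 of every codebook is the zero vector, so adding a level
    can never make the reconstruction worse.
    """
    rng = random.Random(seed)
    books = []
    for level in range(levels):
        scale = 0.5 ** level
        book = [[0.0] * dim]
        book += [[rng.gauss(0.0, scale) for _ in range(dim)]
                 for _ in range(codebook_size - 1)]
        books.append(book)
    return books


def nearest(book, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(book)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(book[i], vec)),
    )


def rvq_encode(vec, books):
    """Quantize vec level by level; each level encodes the previous level's residual."""
    residual = list(vec)
    codes = []
    for book in books:
        idx = nearest(book, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return codes


def rvq_decode(codes, books):
    """Sum the selected codewords; passing a prefix of codes gives a coarser result."""
    out = [0.0] * len(books[0][0])
    for idx, book in zip(codes, books):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


def error(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    books = make_codebooks(levels=4, codebook_size=16, dim=3, seed=1)
    x = [0.3, -0.7, 0.2]
    codes = rvq_encode(x, books)
    for k in range(len(codes) + 1):
        print(k, "levels -> error", round(error(x, rvq_decode(codes[:k], books)), 4))
```

Decoding with only the first few codes yields a rougher reconstruction, and each extra level refines it. That layering is what lets a later stage fill in fine acoustic detail on top of a coarse one.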

Mastering MusicLM Hierarchical Music Generation

To build a deeper understanding, treat MusicLM Hierarchical Music Generation as an operating model rather than a single feature: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using MusicLM Hierarchical Music Generation treat quality, latency, and consent as equally important parts of the deployment process. They document explicit success criteria, test against realistic data and workflows, and iterate based on observed failure patterns rather than one-off benchmark wins. This is where conceptual understanding turns into durable capability across product, policy, and operations.

Audio AI improves accessibility through transcription, narration, and voice interfaces. At the same time, audio misuse and impersonation risks grow when consent is missing. The most resilient approach combines experimentation speed with governance discipline: run pilots, capture evidence, publish decision records, and update safeguards regularly as model behavior, user expectations, and regulatory requirements evolve.

Strategic Impact

Audio AI improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster on smaller budgets.

Customer-facing systems can process spoken interactions at scale.

In high-stakes deployments, each of these translates into measurable operating rules, ownership boundaries, and recurring review processes, so that teams scale reliability rather than ambiguity.

The Future of MusicLM Hierarchical Music Generation

MusicLM's token-based approach helped set the pattern for later systems such as MusicGen and for commercial music tools. Expect tighter melody conditioning (hum a tune, get a full arrangement), longer fully structured songs with verses and choruses, and finer control over instruments and key. The thorny issues are legal and ethical: training-data licensing, artist consent, and watermarking generated audio so it can be distinguished from human-made music are now central to deployment.

Real-World Implementation

Turning a written scene description into a film or trailer score, e.g. 'epic orchestral build with choir'

Generating background music conditioned on an image caption or even painting descriptions for art installations

Extending a short hummed or whistled melody into a fully instrumented arrangement

Producing varied stock-music tracks at different tempos and moods for advertising and content creators

Implementation Patterns

MusicLM Hierarchical Music Generation in practice

Whether the task is scoring a scene, accompanying an image, extending a hummed melody, or producing stock music, teams usually get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Audio misuse and impersonation risks increase when consent is missing.

Accuracy can drop across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for authentic speech without clear labeling.

Implementation Playbook

1. Obtain explicit consent for voice capture, cloning, and reuse.

2. Test quality across diverse speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep provenance records for auditing.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and then expand usage.
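The evidence-gate idea can be made concrete with a small checklist structure. The sketch below is only one possible shape. The `Gate` class, the `rollout_decision` function, and the evidence strings are hypothetical names invented for illustration, not part of any MusicLM tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str       # which playbook step this gate covers
    passed: bool    # whether the criteria for the step were met
    evidence: str   # pointer to the record backing the decision (hypothetical)


def rollout_decision(gates):
    """Return ('expand', []) only if every gate passed, else ('pause', failing names)."""
    failing = [g.name for g in gates if not g.passed]
    return ("pause", failing) if failing else ("expand", [])


if __name__ == "__main__":
    gates = [
        Gate("consent", True, "signed consent forms on file"),
        Gate("quality", False, "evaluation on noisy recordings still pending"),
        Gate("human review", True, "approval rule documented"),
        Gate("labeling", True, "provenance log enabled"),
    ]
    print(rollout_decision(gates))
```

Keeping an evidence pointer next to each pass/fail flag means a paused rollout shows exactly which gap to close before usage expands.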
