Overview
MusicLM is Google's text-to-music model. It generates several minutes of coherent audio from a description such as 'a calming violin melody backed by a distorted guitar riff.' It matters because it tackles long-range musical structure by stacking models hierarchically, treating music generation as language modeling over audio tokens.
MusicLM's hierarchical music generation sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.
Deep Dive
Announced by Google Research in early 2023, MusicLM frames music generation as predicting sequences of audio tokens, much as a language model predicts words. It uses a hierarchy of representations: semantic tokens (from a model called w2v-BERT) capture high-level structure such as melody and rhythm over long spans, while acoustic tokens (from the SoundStream neural codec) capture fine detail such as timbre and texture. A first stage generates semantic tokens from the text prompt, then later stages fill in acoustic detail conditioned on those semantics. Text conditioning comes from MuLan, a joint music-text embedding trained so that descriptions and audio land in the same space. This staged approach lets MusicLM stay musically consistent over minutes rather than drifting after a few seconds.
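The staged flow described above can be sketched as a toy pipeline. This is only an illustration of the data flow under stated assumptions, not MusicLM's implementation: the function names are hypothetical, the token rates and vocabulary size are illustrative constants, and seeded random numbers stand in for the trained Transformer stages and the text embedding.

```python
import hashlib
import random

# Illustrative constants (assumptions for this sketch, not taken from the text above).
SEMANTIC_RATE_HZ = 25   # semantic tokens per second of audio
ACOUSTIC_RATE_HZ = 50   # acoustic frames per second of audio
NUM_QUANTIZERS = 12     # codes per acoustic frame (one per quantizer layer)
VOCAB_SIZE = 1024       # size of each token vocabulary


def embed_text(prompt: str) -> int:
    """Hypothetical stand-in for a joint music-text embedding: a seed derived from the prompt."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_semantic_tokens(conditioning: int, seconds: float) -> list[int]:
    """Stage 1: coarse, low-rate tokens conditioned only on the text embedding."""
    rng = random.Random(conditioning)
    return [rng.randrange(VOCAB_SIZE) for _ in range(int(seconds * SEMANTIC_RATE_HZ))]


def generate_acoustic_tokens(conditioning: int, semantic: list[int], seconds: float) -> list[list[int]]:
    """Stage 2: dense acoustic frames conditioned on the text AND the fixed semantic tokens."""
    rng = random.Random(f"{conditioning}:{semantic}")
    frames = int(seconds * ACOUSTIC_RATE_HZ)
    return [[rng.randrange(VOCAB_SIZE) for _ in range(NUM_QUANTIZERS)] for _ in range(frames)]


def generate_music_tokens(prompt: str, seconds: float) -> dict:
    """Run the two stages in order; a codec decoder would turn the acoustic tokens into audio."""
    conditioning = embed_text(prompt)
    semantic = generate_semantic_tokens(conditioning, seconds)
    acoustic = generate_acoustic_tokens(conditioning, semantic, seconds)
    return {"semantic": semantic, "acoustic": acoustic}


tokens = generate_music_tokens("a calming violin melody", seconds=10)
print(len(tokens["semantic"]), "semantic tokens,", len(tokens["acoustic"]), "acoustic frames")
```

Under these assumed rates, ten seconds of audio needs 250 semantic tokens but 500 frames of 12 codes each, which is why the long-range planning is done on the semantic sequence and the acoustic stage only fills in detail.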
Technical Insight
The key idea is decoupling structure from texture across a token hierarchy. Coarse semantic tokens are sparse and slow-changing, so a Transformer can model long-term form without a huge sequence length. Acoustic tokens are dense and high-rate, but they only need to be predicted conditioned on semantics that are already fixed, which keeps each stage tractable. SoundStream's residual vector quantization produces the layered acoustic codes that a final decoder turns back into 24 kHz waveforms.
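To make the "layered acoustic codes" concrete, here is a minimal sketch of residual vector quantization in plain Python. It assumes small random codebooks purely for illustration (a real codec such as SoundStream learns its codebooks during training), and the helper names are hypothetical. Each layer quantizes whatever error the previous layers left behind, so every extra code refines the reconstruction.

```python
import random


def nearest(codebook, vector):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def rvq_encode(vector, codebooks):
    """Quantize `vector` layer by layer; each layer encodes the residual left so far."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from each layer (a prefix of codes also works)."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


def make_codebooks(num_layers, size, dim, seed=0):
    """Random stand-in codebooks. Deeper layers use a smaller scale, and each layer
    includes the zero vector so that adding a layer can never increase the error."""
    rng = random.Random(seed)
    books = []
    for layer in range(num_layers):
        scale = 0.5 ** layer
        entries = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(size - 1)]
        books.append([[0.0] * dim] + entries)
    return books


def error(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


codebooks = make_codebooks(num_layers=4, size=64, dim=8)
frame = [random.Random(1).uniform(-1, 1) for _ in range(8)]
codes = rvq_encode(frame, codebooks)
for depth in range(1, len(codes) + 1):
    approx = rvq_decode(codes[:depth], codebooks)
    print(f"layers used: {depth}, reconstruction error: {error(frame, approx):.4f}")
```

The printed error does not grow as more layers are used, which mirrors the coarse-to-fine structure of the acoustic codes: the first layers carry most of the signal and later layers add detail.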
Mastering MusicLM Hierarchical Music Generation
To build a deeper understanding, treat MusicLM's hierarchical music generation as an operating model rather than a single feature: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still needs expert judgment.
In practice, strong teams using MusicLM-style hierarchical generation treat quality, latency, and consent as equally important parts of the deployment process. They document explicit success criteria, test against realistic data and workflows, and iterate based on observed failure patterns rather than one-off benchmark wins. This is where conceptual understanding turns into durable capability across product, policy, and operations.
Audio AI more broadly improves accessibility through transcription, narration, and voice interfaces. At the same time, voice misuse and impersonation risks grow when consent is missing. The most resilient approach combines experimental speed with governance discipline: run pilots, capture evidence, publish decision records, and update safeguards regularly as model behavior, user expectations, and regulatory requirements evolve.
Strategic Impact
Audio AI improves accessibility through transcription, narration, and voice interfaces.
Media teams can ship polished audio faster and on smaller budgets.
Customer-facing systems can process spoken conversations at large scale.
In high-stakes deployments, each of these translates into measurable operating rules, clear ownership boundaries, and recurring review procedures, so that teams scale reliability rather than carelessness.
Real-World Applications
Turning a written scene description into a film or trailer score, e.g. 'epic orchestral build with choir'
Generating background music conditioned on an image caption or even painting descriptions for art installations
Extending a short hummed or whistled melody into a fully instrumented arrangement
Producing varied stock-music tracks at different tempos and moods for advertising and content creators
Implementation Patterns
MusicLM Hierarchical Music Generation in practice
For each of the applications listed above (scoring a film or trailer from a scene description, generating background music from an image or painting caption, extending a hummed or whistled melody into a full arrangement, and producing varied stock-music tracks), teams usually get better results when they define quality thresholds up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.
Risks & Guardrails
Voice misuse and impersonation risks increase when consent is missing.
Accuracy can drop across accents, dialects, or noisy environments.
Synthetic audio can be mistaken for authentic speech without clear labeling.
Implementation Checklist
Obtain explicit consent for voice capture, cloning, and reuse.
Test quality across diverse speakers and background conditions.
Define when a human must review or approve outputs.
Label synthetic audio and keep provenance records for audit.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
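The labeling and provenance step can be sketched as a small record written alongside each generated file. This is a minimal sketch using only the Python standard library; the field names, the `build_provenance_record` and `matches_record` helpers, and the model name in the example are all hypothetical, and a real deployment would follow whatever schema its audit process requires.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_provenance_record(audio_bytes, prompt, model_name, reviewer=None):
    """Describe one generated clip: what produced it, from which prompt, and its content hash."""
    return {
        "synthetic": True,                                    # explicit label for downstream tools
        "model": model_name,                                  # hypothetical identifier
        "prompt": prompt,
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "human_reviewer": reviewer,                           # None until someone approves the clip
    }


def matches_record(audio_bytes, record):
    """Check during an audit that a file is the one the record describes."""
    return hashlib.sha256(audio_bytes).hexdigest() == record["audio_sha256"]


clip = b"\x00\x01fake-audio-bytes-for-illustration"
record = build_provenance_record(clip, "epic orchestral build with choir", "example-music-model")
print(json.dumps(record, indent=2))
print("file matches record:", matches_record(clip, record))
```

Storing a content hash rather than the audio itself keeps the record small while still letting an auditor confirm that a given file has not been swapped or edited since it was logged.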