AUDIO AI GUIDE

Mimi Streaming Audio Codec

Mimi is a neural audio codec that compresses speech into a stream of discrete tokens in real time, so AI models can listen and speak with very low latency.

Overview

Mimi is the audio backbone behind Kyutai's Moshi voice model.

Mimi Streaming Audio Codec sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep Dive

Mimi, released by the French lab Kyutai in 2024, is a neural codec that converts 24 kHz audio into a stream of discrete tokens at about 1.1 kbps and a frame rate of only 12.5 frames per second (each frame carries one token per quantizer level). It uses an encoder-decoder with residual vector quantization (RVQ) and splits the tokens into a first 'semantic' level distilled from a self-supervised speech model (WavLM) plus several 'acoustic' levels that capture the texture of the voice. Crucially, it is fully streaming and causal: it emits tokens as audio arrives rather than waiting for a complete clip, with a latency of about 80 ms. This lets a language model handle speech much as it handles text tokens, which is what allows Moshi to hold full-duplex conversations while the reconstructed audio stays intelligible and natural.
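The figures quoted above fit together arithmetically. The short check below is a sketch rather than anything taken from Kyutai's code: the sample rate, frame rate, bitrate, and latency come from the text, while the number of codebooks (8) and the codebook size (2048 entries) are assumptions drawn from the configuration reported for Mimi in the Moshi paper.

```python
import math

# Figures stated in the text.
SAMPLE_RATE_HZ = 24_000
FRAME_RATE_HZ = 12.5

# ASSUMPTIONS (not stated in the text): quantizer configuration.
NUM_CODEBOOKS = 8
CODEBOOK_SIZE = 2048

# One frame of tokens covers this many input samples and this much time.
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ
frame_duration_ms = 1000 / FRAME_RATE_HZ

# Each token is an index into one codebook, so it costs log2(size) bits.
bits_per_token = math.log2(CODEBOOK_SIZE)
tokens_per_second = FRAME_RATE_HZ * NUM_CODEBOOKS
bitrate_bps = tokens_per_second * bits_per_token

print(f"samples per frame: {samples_per_frame:.0f}")
print(f"frame duration:    {frame_duration_ms:.0f} ms")
print(f"tokens per second: {tokens_per_second:.0f}")
print(f"bitrate:           {bitrate_bps / 1000:.1f} kbps")
```

Under these assumptions one frame spans 1920 samples, which is where the roughly 80 ms latency comes from: the encoder cannot emit a frame's tokens before that frame's audio has arrived.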

Technical Insight

Mimi's trick is its split-RVQ scheme. The first codebook is trained with a distillation loss to match embeddings from WavLM, which forces it to carry phonetic 'meaning', while the remaining codebooks reconstruct waveform detail. A Transformer operates at the bottleneck, and an adversarial (GAN) loss on the decoder output sharpens quality. Causal convolutions keep the whole pipeline streamable, so latency stays close to 80 ms.
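To make the quantization idea concrete, here is a toy sketch of plain residual vector quantization in pure Python. It is not Mimi's implementation: the codebooks are random rather than learned, there is no WavLM distillation and no split between semantic and acoustic levels, and every function name is hypothetical. Each codebook also includes an all-zero entry, a convenience for this toy only, so that adding a level can never increase the reconstruction error.

```python
import random

def sq_error(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i], vec))

def rvq_encode(codebooks, vec):
    """Quantize level by level; each level codes what earlier levels missed."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Reconstruct by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def make_toy_codebooks(levels, size, dim, seed=0):
    """Random stand-ins for learned codebooks; later levels use a finer scale."""
    rng = random.Random(seed)
    books = []
    for level in range(levels):
        scale = 0.5 ** level
        book = [[0.0] * dim]  # zero entry: "add nothing at this level"
        book += [[rng.gauss(0, scale) for _ in range(dim)] for _ in range(size - 1)]
        books.append(book)
    return books

codebooks = make_toy_codebooks(levels=4, size=64, dim=4)
x = [0.3, -0.7, 0.1, 0.5]
codes = rvq_encode(codebooks, x)  # one integer token per level

# Error after decoding with only the first k levels, for k = 1..4.
errors = [sq_error(x, rvq_decode(codebooks[:k], codes[:k])) for k in range(1, 5)]
print(codes, errors)
```

The vector is represented by one small integer per level, and decoding with more levels refines the reconstruction. That is the property that lets a codec assign coarse content to the first token and finer detail to later ones.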

Mastering Mimi Streaming Audio Codec

To build a deeper understanding, treat Mimi Streaming Audio Codec as an operating model rather than a single feature: define the desired outcomes, make the assumptions explicit, and separate what the system can do reliably from what still needs expert judgment.

In practice, strong teams using Mimi Streaming Audio Codec treat quality, latency, and consent as core parts of their deployment strategy. They write clear success criteria, evaluate against real-world data and workflows, and iterate on observed failure patterns rather than one-off benchmark wins. That is where theoretical understanding turns into lasting capability across product, policy, and operations.

The technology improves accessibility through transcription, narration, and voice interfaces. At the same time, voice misuse and impersonation risks rise when consent is missing. The most robust approach pairs experimentation speed with governance discipline: run pilots, capture evidence, publish decision logs, and keep updating safeguards as model behavior, user expectations, and regulatory requirements change.

Strategic Impact

Improves accessibility through transcription, narration, and voice interfaces.

Media teams can ship polished audio faster on smaller budgets.

Customer-facing systems can process spoken interactions at scale.

In high-quality deployments, these benefits translate into measurable operating rules, clear ownership boundaries, and repeatable review practices, so teams scale confidence rather than ambiguity.

The Future of Mimi Streaming Audio Codec

Expect codecs like Mimi to become the standard interface between audio and large language models, pushing real-time voice assistants toward response times under 100 ms. Research is driving token rates lower while preserving speaker identity, emotion, and music. Because Kyutai has open-sourced both Mimi and Moshi, it is likely to seed many open speech-to-speech systems, on-device assistants, and very-low-bitrate voice communication tools.

Real-World Applications

Powering Kyutai's full-duplex Moshi voice assistant so it can listen and speak at the same time

Streaming speech tokens into a language model for real-time speech-to-speech translation

Ultra-low-bitrate (~1.1 kbps) voice calls over poor or congested networks

Tokenizing audio for generative speech and text-to-speech pipelines that treat audio like text
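All of the streaming applications above rest on the same plumbing: audio arrives in chunks of arbitrary size and has to be regrouped into fixed 80 ms frames before a codec can tokenize it. The sketch below shows one way to do that buffering in plain Python. The names `stream_frames` and `fake_microphone` are hypothetical and the audio source is simulated silence. The sketch stops at framing, where a real pipeline would hand each frame to the codec's encoder.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 24_000
FRAME_RATE_HZ = 12.5
FRAME_SIZE = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # 1920 samples = 80 ms

def stream_frames(chunks: Iterable[List[float]],
                  frame_size: int = FRAME_SIZE) -> Iterator[List[float]]:
    """Yield fixed-size frames as soon as enough samples have arrived."""
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_size:
            yield buffer[:frame_size]
            del buffer[:frame_size]
    # Any leftover samples (less than one frame) are dropped here; a real
    # pipeline would keep them for later audio or pad them.

def fake_microphone(total_samples: int, chunk_size: int) -> Iterator[List[float]]:
    """Hypothetical audio source: emits silence in arbitrary-size chunks."""
    for start in range(0, total_samples, chunk_size):
        yield [0.0] * min(chunk_size, total_samples - start)

# One second of audio delivered in 500-sample chunks.
frames = list(stream_frames(fake_microphone(total_samples=24_000, chunk_size=500)))
print(len(frames), "complete frames of", FRAME_SIZE, "samples")
```

Because the generator yields a frame the moment it is complete, downstream consumers never wait for the whole recording, which is the behavior that full-duplex conversation depends on.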

Usage Patterns

Across the applications listed above, teams generally get better results when they define quality metrics up front, keep a human escalation path for edge cases, and track both productivity gains and error costs over time.

Risks & Guardrails

Voice misuse and impersonation risks rise when consent is missing.

Accuracy can drop across accents, dialects, and noisy environments.

Synthetic audio can be mistaken for genuine speech without clear labeling.

Getting Started Guide

1. Obtain explicit consent for voice capture, synthesis, and reuse.

2. Test quality across speakers and background conditions.

3. Define when a human must review or approve the output.

4. Label synthetic audio and keep auditable records for accountability.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
