Audio AI Guide

Speech Architecture

DeepSpeech is a speech recognition model introduced by Baidu in 2014 that maps raw audio features directly to text, using a neural network that is typically trained with the CTC loss.

Summary

DeepSpeech helped the field move away from hard-to-build ASR pipelines and toward learned, data-driven systems.

DeepSpeech Architecture sits within audio-AI workflows that transform speech, music, and sound for communication, accessibility, and media production.

Deep dive

Traditional speech recognizers combined separate acoustic models, pronunciation dictionaries, and language models, many with hand-engineered components. DeepSpeech replaced most of that with a single neural network trained end to end. The architecture takes spectrogram or MFCC features computed over short audio frames and feeds them through several fully connected layers, then a bidirectional recurrent layer that captures context from both past and future time steps, and finally an output layer that produces a probability distribution over characters at each step. Most importantly, it uses Connectionist Temporal Classification (CTC), which lets the network learn the alignment between audio and text without frame-level labels. Mozilla later released a popular open-source implementation (with later versions adopting an LSTM-based, streaming-friendly design), which made the approach much easier to use.
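The "short audio frames" that feed the network can be illustrated with a minimal framing sketch. The 25 ms window and 10 ms step are common front-end defaults assumed here for illustration, not values taken from DeepSpeech itself, and `frame_signal` is a hypothetical helper name. Feature extraction (spectrogram or MFCC) would then run on each frame.

```python
def frame_signal(samples, sample_rate, win_ms=25.0, step_ms=10.0):
    """Split a 1-D sequence of samples into overlapping fixed-length frames."""
    win = int(sample_rate * win_ms / 1000)    # samples per frame
    step = int(sample_rate * step_ms / 1000)  # hop between frame starts
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    frames = []
    start = 0
    # Keep only complete frames; a real front end might pad the tail instead.
    while start + win <= len(samples):
        frames.append(list(samples[start:start + win]))
        start += step
    return frames


if __name__ == "__main__":
    one_second = [0] * 16000  # 1 s of silence at 16 kHz (assumed rate)
    frames = frame_signal(one_second, 16000)
    print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Each frame overlaps its neighbours, so the network sees a dense sequence of time steps even though every individual frame covers only a fraction of a second.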

Technical insight

The key enabler is the CTC loss. Speech and text do not line up frame by frame, so CTC introduces a "blank" symbol and sums over every alignment that collapses to the target transcription. This lets the model emit one symbol (a character or a blank) per time step and learn implicitly where sounds correspond to characters. A bidirectional RNN lets every prediction draw on the surrounding acoustic context, and an external n-gram language model is often added at decoding time to improve spelling and word choice.
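The idea of summing over all alignments can be made concrete with a small sketch. It assumes `_` as the blank symbol and uses plain probabilities rather than log probabilities, which is fine for toy inputs but would underflow on real audio. `collapse` applies the CTC rule (merge repeats, then drop blanks), `ctc_probability` is the standard forward recursion, and `brute_force_probability` enumerates every path as a sanity check. All three names are illustrative, not taken from any DeepSpeech code base.

```python
from itertools import product

BLANK = "_"  # assumed symbol for the CTC blank


def collapse(path):
    """Map a frame-level path to text: merge repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)


def ctc_probability(probs, alphabet, target):
    """Sum the probability of every path that collapses to `target`.

    probs[t][k] is the model's probability of alphabet[k] at time step t.
    """
    index = {sym: k for k, sym in enumerate(alphabet)}
    ext = [BLANK]
    for ch in target:
        ext += [ch, BLANK]  # e.g. "ab" -> _ a _ b _
    alpha = [0.0] * len(ext)
    alpha[0] = probs[0][index[BLANK]]
    if len(ext) > 1:
        alpha[1] = probs[0][index[ext[1]]]
    for t in range(1, len(probs)):
        new = [0.0] * len(ext)
        for s, sym in enumerate(ext):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between different characters.
            if s >= 2 and sym != BLANK and sym != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][index[sym]]
        alpha = new
    # Valid paths end on the last character or on the trailing blank.
    return alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)


def brute_force_probability(probs, alphabet, target):
    """Same quantity by enumerating every path; only feasible for tiny inputs."""
    total = 0.0
    for path in product(range(len(alphabet)), repeat=len(probs)):
        if collapse(alphabet[k] for k in path) == target:
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total
```

The training loss is the negative logarithm of this probability. Greedy decoding is the same `collapse` rule applied to the most likely symbol at each step; adding a language model replaces that with a beam search.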

Understanding DeepSpeech Architecture

To build a deep understanding, treat DeepSpeech Architecture as a practice rather than a single capability: define the outcomes you want, state your assumptions, and separate what the system can do reliably from what still requires expert judgment.

In practice, strong teams that use DeepSpeech Architecture treat quality, latency, and consent as core parts of the deployment plan. They write clear success criteria, measure them on realistic data and workflows, and iterate with observability in place rather than chasing one-off benchmark wins. That is where theoretical knowledge turns into durable capability in products, policy, and operations.
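One way to turn a quality criterion into something measurable is word error rate (WER), the usual ASR metric. The text does not name a metric, so WER is an assumption here. The sketch below computes it as the word-level edit distance divided by the reference length.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the words seen so far in ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,           # deletion
                           cur[j - 1] + 1,        # insertion
                           prev[j - 1] + cost))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the light", "turn on light"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.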

It improves accessibility through transcription, narration, and voice interfaces. At the same time, the risk of voice misuse and impersonation rises when consent is missing. The most robust approach pairs fast experimentation with governance discipline: run pilots, capture evidence, document decisions, and keep updating safeguards as deployment patterns, user expectations, and regulations evolve.

Strategic outcomes

It improves accessibility through transcription, narration, and voice interfaces.

Media teams can deliver clear audio faster and at lower cost.

Customer-facing systems can support conversations at greater scale.

In high-quality deployments, each of these outcomes translates into measurable operating standards, clear ownership boundaries, and frequent review cycles, so that teams can scale with confidence instead of compounding complexity.

The future of DeepSpeech Architecture

DeepSpeech itself has largely been superseded by attention- and transformer-based architectures (Conformer, Whisper, wav2vec 2.0) that capture longer-range context and, in some cases, learn from unlabeled audio through self-supervision. Its core ideas, end-to-end training and CTC decoding, remain foundational and still appear inside modern hybrid systems. The legacy is conceptual: it showed that a single learned model could beat carefully engineered pipelines, which opened the door to today's large, multilingual, self-supervised models.

Real-world applications

Offline, on-device voice command recognition for privacy-focused applications, using Mozilla's open-source DeepSpeech

Building transcription projects for podcasts or lectures without relying on a cloud service

Teaching the fundamentals of ASR and the CTC loss in university machine learning courses

Building custom voice interfaces for IoT or embedded devices that need a lightweight, streamable recognizer
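For the offline transcription cases above, a minimal sketch with the Mozilla `deepspeech` Python package might look like the following. It assumes the 0.9-series API (`Model`, `enableExternalScorer`, `sampleRate`, `stt`) and that `numpy` is installed. The model and scorer paths are placeholders you would supply, and `read_wav` and `silence_wav` are hypothetical helpers written for this example. The project is no longer actively maintained, so treat this as an illustration rather than a recommended setup.

```python
import io
import wave


def read_wav(source):
    """Return (sample_rate, raw PCM bytes) from a 16-bit mono WAV path or file."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def silence_wav(seconds, sample_rate=16000):
    """Build an in-memory WAV of silence, handy for smoke-testing read_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)
    return buf


def transcribe(model_path, wav_source, scorer_path=None):
    """Transcribe one WAV locally; no audio leaves the machine."""
    # Imported lazily so the WAV helpers work without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        # Optional external language model applied during decoding.
        model.enableExternalScorer(scorer_path)
    sample_rate, pcm = read_wav(wav_source)
    if sample_rate != model.sampleRate():
        raise ValueError("resample the audio to the model's sample rate first")
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Calling `transcribe("model.pbmm", "clip.wav", "lm.scorer")` (placeholder file names) would return the recognized text as a string.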

Use case patterns

DeepSpeech Architecture in practice

Across the applications listed above, teams often get better results when they set quality thresholds up front, keep a human escalation path for edge cases, and track product value and the cost of errors over time.

Risks and guardrails

The risk of voice misuse and impersonation rises when consent is missing.

Performance can degrade across accents, dialects, or noisy environments.

Synthetic audio can be mistaken for real speech if it is not clearly labeled.

Implementation roadmap

1. Obtain explicit consent for voice capture, cloning, and reuse.

2. Test quality across many speakers and background conditions.

3. Define when a human must review or approve outputs.

4. Label synthetic audio and keep provenance documentation for auditability.

Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
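The evidence gate can be sketched as a simple check of measured values against agreed limits. The metric names and thresholds below are hypothetical examples, not recommendations. A missing measurement is treated as a failure, on the assumption that no evidence means no rollout.

```python
def evidence_gate(measured, limits):
    """Return the criteria that block rollout; an empty list means proceed.

    measured: metric name -> observed value (lower is better)
    limits:   metric name -> maximum acceptable value
    """
    failures = []
    for name, limit in limits.items():
        value = measured.get(name)
        # A missing measurement counts as a failure: no evidence, no rollout.
        if value is None or value > limit:
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Hypothetical limits agreed before the pilot.
    limits = {"wer_quiet": 0.10, "wer_noisy": 0.25, "latency_s": 1.0}
    pilot = {"wer_quiet": 0.08, "wer_noisy": 0.31, "latency_s": 0.6}
    print(evidence_gate(pilot, limits))  # ['wer_noisy']
```

Reporting which criteria failed, rather than a single pass or fail flag, points the team at the specific gap to close before expanding usage.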
