Summary
Kaldi is a free, open-source toolkit that became the most influential research platform for building speech recognition systems. It matters because, for roughly a decade, it was the foundation of ASR work in both universities and industry.
The Kaldi Speech Recognition Toolkit belongs to the broader field of audio AI, which turns speech, music, and sound into communication, accessibility, and media production.
Deep dive
Kaldi, released in 2011 under the leadership of Daniel Povey, is written in C++, with recipes held together by bash and Perl scripts. It is built around the classic ASR pipeline: extract acoustic features (MFCCs or filterbanks), model phone sounds with Gaussian Mixture Models or, later, deep neural networks, and then combine the acoustic model, the pronunciation lexicon, and the language model into a single searchable graph. Its technical signature is the use of weighted finite-state transducers (WFSTs) from the OpenFST library to compose every knowledge source into one decoding graph. Kaldi ships "recipes" for standard datasets such as Switchboard, LibriSpeech, and Wall Street Journal, which lets researchers reproduce state-of-the-art results. It became the reference baseline against which new systems were compared.
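To make the GMM stage concrete, here is a minimal sketch of how a diagonal-covariance Gaussian mixture scores one feature frame (for example an MFCC vector) for a single HMM state. It is plain Python, not Kaldi's own C++ implementation; the function names and the toy two-component model are hypothetical. The log-sum-exp step keeps the sum over mixture components numerically stable.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_log_likelihood(x, weights, means, variances) -> float:
    """log sum_k w_k * N(x; mean_k, diag(var_k)), computed with log-sum-exp."""
    comp = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(comp)
    # Subtract the largest term before exponentiating to avoid underflow.
    return peak + math.log(sum(math.exp(c - peak) for c in comp))


if __name__ == "__main__":
    # Hypothetical 2-component, 2-dimensional model for one HMM state.
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [3.0, -1.0]]
    variances = [[1.0, 1.0], [0.5, 2.0]]
    frame = [0.2, -0.1]
    print(gmm_log_likelihood(frame, weights, means, variances))
```

In a full system, this per-state score (negated, as a cost) is what the decoder consumes for every frame and every active state.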
Technical perspective
The heart of Kaldi is the composition of four WFSTs into a single graph called HCLG. H maps the units scored by the neural network or GMM (HMM states) to context-dependent phones; C handles phonetic context (triphones); L is the pronunciation lexicon, which maps phones to words; and G is the language model. Composing these transducers and optimizing the result (determinization and minimization) yields one graph that the decoder searches with a beam-pruned Viterbi algorithm, turning audio frames into the most likely word sequence.
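The search step can be illustrated with a toy token-passing decoder. This is a minimal sketch, assuming a tiny hand-written graph in which each arc carries an input label (a hypothetical acoustic unit), an output word or `None`, and a graph cost, and where `acoustic_cost[t][label]` stands in for the negative log-likelihood the neural network or GMM would produce for frame t. It keeps only tokens within `beam` of the best cost at each frame, which is the pruning idea, but it omits self-loops, epsilon arcs, lattices, and everything else a real Kaldi decoder handles.

```python
from typing import Dict, List, Optional, Tuple

# Arc: (next_state, input_label, output_word_or_None, graph_cost)
Arc = Tuple[int, str, Optional[str], float]


def beam_viterbi(
    graph: Dict[int, List[Arc]],
    start: int,
    finals: Dict[int, float],
    acoustic_cost: List[Dict[str, float]],
    beam: float,
) -> Tuple[float, List[str]]:
    """Return (best total cost, word sequence) under beam pruning.

    Costs are negative log probabilities, so lower is better.
    """
    # Each active state keeps only its best (cost, words-so-far) token: the Viterbi rule.
    tokens: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    for frame in acoustic_cost:
        new_tokens: Dict[int, Tuple[float, List[str]]] = {}
        for state, (cost, words) in tokens.items():
            for nxt, label, word, g_cost in graph.get(state, []):
                if label not in frame:
                    continue  # this unit is not scored for the current frame
                total = cost + g_cost + frame[label]
                if nxt not in new_tokens or total < new_tokens[nxt][0]:
                    new_words = words + [word] if word is not None else words
                    new_tokens[nxt] = (total, new_words)
        if not new_tokens:
            raise ValueError("all paths were pruned or blocked")
        best = min(c for c, _ in new_tokens.values())
        # Beam pruning: drop tokens much worse than the current best.
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    # Add final-state costs and pick the best surviving path.
    candidates = [(c + finals[s], w) for s, (c, w) in tokens.items() if s in finals]
    if not candidates:
        raise ValueError("no token reached a final state")
    return min(candidates, key=lambda cw: cw[0])


if __name__ == "__main__":
    # Two hypothetical words sharing a first unit: "hi" = h,i and "ho" = h,o.
    graph = {0: [(1, "h", None, 0.0)],
             1: [(2, "i", "hi", 0.5), (3, "o", "ho", 0.2)]}
    frames = [{"h": 0.1, "i": 5.0, "o": 5.0}, {"h": 5.0, "i": 0.3, "o": 2.0}]
    print(beam_viterbi(graph, 0, {2: 0.0, 3: 0.0}, frames, beam=1.0))
```

A narrower beam makes decoding faster but risks pruning the path that would have won later; that speed/accuracy trade-off is the same one Kaldi exposes through its beam settings.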
Kaldi speech recognition toolkit
To build deep expertise, treat the Kaldi Speech Recognition Toolkit as a practice area rather than a single capability: define the outcomes you want, make your assumptions explicit, and separate what the system can do reliably from what still requires expert judgment.
In practice, strong teams using the Kaldi Speech Recognition Toolkit treat quality, latency, and consent as first-class parts of their deployment strategy. They write clear success criteria, measure them on real data and real workflows, and then iterate with observability rather than chasing one-off benchmark wins. This is where theoretical knowledge turns into lasting capability in products, policies, and operations.
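For speech recognition, the standard quality criterion is word error rate (WER): substitutions, deletions, and insertions divided by the number of reference words. The function below is an independent minimal sketch of that metric using word-level edit distance; the name `word_error_rate` is hypothetical, and it is not Kaldi's own scoring code.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)  # vs. deletion, insertion
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the kitchen light", "turn of the light"))
```

Because insertions count as errors, WER can exceed 1.0 (a one-word reference against a three-word hypothesis scores 3.0), so it is best read alongside the raw error counts.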
It improves accessibility through transcription, narration, and voice interfaces. At the same time, the risk of voice misuse and impersonation grows when consent is missing. The most robust approach combines fast learning with governance discipline: run pilots, keep evidence, publish decisions, and keep updating safeguards as deployment practices, user expectations, and regulations evolve.
Strategic implications
It improves accessibility through transcription, narration, and voice interfaces.
Media teams can produce clear audio faster and at lower cost.
Customer-facing systems can hold more natural conversations.
In high-quality deployments, each of these translates into measurable operating criteria, clear ownership boundaries, and regular review cycles, so that teams can scale with confidence in complex environments.
Real-world use
Running the LibriSpeech and Switchboard recipes in research labs to validate new acoustic-modeling research
Building voice-command systems for low-resource languages with Kaldi recipes
Forced alignment of audio and transcripts for linguistics, dataset creation, and subtitle timing
Early keyword search and dictation backends in enterprises, before end-to-end approaches matured
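Forced alignment differs from free decoding: the transcript is already known, so the search only decides where each unit starts and ends. The sketch below is a minimal dynamic-programming version over a strictly left-to-right sequence of units, assuming `cost[t][k]` is a hypothetical per-frame cost (for example a negative log-likelihood) of unit k at frame t, and that each unit covers at least one frame. A real aligner works on HMM states and handles silence and pronunciation variants, which this sketch ignores.

```python
from typing import List, Tuple


def force_align(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Split T frames into K consecutive segments, one per transcript unit.

    cost[t][k] is the cost of assigning frame t to unit k (lower is better).
    Returns (start_frame, end_frame_exclusive) for each unit.
    """
    T, K = len(cost), len(cost[0])
    if T < K:
        raise ValueError("need at least one frame per unit")
    INF = float("inf")
    # best[t][k]: lowest cost of frames 0..t using units 0..k, with frame t in unit k.
    best = [[INF] * K for _ in range(T)]
    started = [[False] * K for _ in range(T)]  # True if unit k begins at frame t
    best[0][0] = cost[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else INF
            if advance < stay:
                best[t][k], started[t][k] = advance + cost[t][k], True
            else:
                best[t][k] = stay + cost[t][k]
    # Backtrace from the last frame, which must belong to the last unit.
    bounds = []
    k, end = K - 1, T
    for t in range(T - 1, 0, -1):
        if started[t][k]:
            bounds.append((t, end))
            end, k = t, k - 1
    bounds.append((0, end))
    return bounds[::-1]


if __name__ == "__main__":
    # Hypothetical costs: frames 0-1 fit unit 0, frames 2-4 fit unit 1.
    print(force_align([[0, 9], [0, 9], [9, 0], [9, 0], [9, 0]]))
```

The resulting frame boundaries, multiplied by the frame shift, give the start and end times used for subtitle timing or dataset segmentation.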
Usage patterns
Kaldi speech recognition toolkit in practice
Academic labs rerun the LibriSpeech and Switchboard recipes to validate new acoustic-modeling research.
Teams build custom voice-command systems for low-resource languages using Kaldi recipes.
Linguists and dataset builders force-align audio with transcripts for linguistic research, dataset creation, and subtitle timing.
Enterprises used Kaldi to power early keyword search and dictation backends before end-to-end models took over.
Across all four patterns, teams typically get better results when they set clear thresholds up front, keep a human escalation path for hard cases, and track product outcomes and error costs over time.
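The threshold-and-escalation advice can be made concrete with a small routing rule. This is a hypothetical sketch: it assumes each transcript segment already carries a confidence score in [0, 1] (how that score is computed is outside this sketch), and the `Segment` type, the route names, and the threshold are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def route_segments(segments: List[Segment], review_below: float) -> Dict[str, List[Segment]]:
    """Accept confident segments automatically; escalate the rest to a human."""
    if not 0.0 <= review_below <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    routed: Dict[str, List[Segment]] = {"auto_accept": [], "human_review": []}
    for seg in segments:
        key = "human_review" if seg.confidence < review_below else "auto_accept"
        routed[key].append(seg)
    return routed


def escalation_rate(routed: Dict[str, List[Segment]]) -> float:
    """Share of segments sent to a human; a number worth tracking over time."""
    total = len(routed["auto_accept"]) + len(routed["human_review"])
    return len(routed["human_review"]) / total if total else 0.0


if __name__ == "__main__":
    segs = [Segment("hello", 0.95), Segment("uh", 0.4), Segment("world", 0.8)]
    routed = route_segments(segs, review_below=0.85)
    print([s.text for s in routed["human_review"]], escalation_rate(routed))
```

Setting the threshold up front and logging the escalation rate alongside error costs makes it visible when a threshold is too strict (reviewers overloaded) or too lax (errors slipping through).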
Risks and guardrails
Unauthorized voice use and impersonation become more likely when consent is missing.
Accuracy can degrade on accents and dialects, or in noisy environments.
Synthetic audio can be mistaken for real speech if it is not clearly labeled.
Deployment roadmap
Obtain clear consent before capturing, cloning, or reusing a voice.
Validate quality across diverse speakers and background conditions.
Define when a human must review or approve outputs.
Label synthetic audio and keep provenance records so that its use can be audited.
Treat each step as an evidence gate: if the criteria are not met, pause the rollout, close the gap, and only then expand usage.
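The evidence-gate idea pairs naturally with checking quality per speaker group and condition. A minimal sketch, assuming evaluation results are already grouped into hypothetical slices (for example by accent or noise level) as lists of (reference, hypothesis) pairs: it computes corpus-level WER per slice and lets the rollout proceed only if every slice meets the threshold. The slice names and thresholds are illustrative, not recommendations.

```python
from typing import Dict, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def slice_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    if words == 0:
        raise ValueError("slice has no reference words")
    return errors / words


def evidence_gate(
    slices: Dict[str, List[Tuple[str, str]]], max_wer: float
) -> Tuple[bool, Dict[str, float]]:
    """Pass only if every slice is at or below max_wer; also return per-slice WER."""
    per_slice = {name: slice_wer(pairs) for name, pairs in slices.items()}
    passed = all(w <= max_wer for w in per_slice.values())
    return passed, per_slice


if __name__ == "__main__":
    slices = {
        "clean": [("turn on the light", "turn on the light")],
        "noisy": [("turn on the light", "turn of the light")],
    }
    print(evidence_gate(slices, max_wer=0.15))
```

Gating on the worst slice rather than the overall average keeps a strong result on clean audio from hiding poor accuracy on accented or noisy speech.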