
Search in the Catalogues and Directories

Hits 1 – 13 of 13

1
Improving the fusion of acoustic and text representations in RNN-T ...
Zhang, Chao; Li, Bo; Lu, Zhiyun. - : arXiv, 2022
BASE
2
Joint Unsupervised and Supervised Training for Multilingual ASR ...
Bai, Junwen; Li, Bo; Zhang, Yu. - : arXiv, 2021
BASE
3
Scaling End-to-End Models for Large-Scale Multilingual ASR ...
BASE
4
Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion ...
Abstract: Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronunciation model that can be specifically trained with proper noun pronunciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2% to 7% relative on several relevant benchmarks. ...
Keyword: Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2005.09756
https://arxiv.org/abs/2005.09756
BASE
5
Deliberation Model Based Two-Pass End-to-End Speech Recognition ...
BASE
6
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model ...
BASE
7
Contextual Speech Recognition with Difficult Negative Training Examples ...
BASE
8
Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model ...
BASE
9
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models ...
BASE
10
Multilingual Speech Recognition With A Single End-To-End Model ...
BASE
11
Exemplar-based sparse representation features: from TIMIT to LVCSR
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 19 (2011) 8, 2598-2613
BLLDB
OLC Linguistik
12
Applications of broad class knowledge for noise robust speech recognition
Sainath, Tara N. - : Massachusetts Institute of Technology, 2009
BASE
13
Acoustic landmark detection and segmentation using the McAulay-Quatieri Sinusoidal Model
Sainath, Tara N. - : Massachusetts Institute of Technology, 2005
BASE

© 2013 - 2024 Lin|gu|is|tik