2
Attention-based Contextual Language Model Adaptation for Speech Recognition ...
Abstract:
Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance-level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM. ...
Keywords:
Artificial Intelligence (cs.AI); Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2106.01451 https://dx.doi.org/10.48550/arxiv.2106.01451
BASE

3
Attention-based Contextual Language Model Adaptation for Speech Recognition ...
BASE

4
Reranking Machine Translation Hypotheses with Structured and Web-based Language Models ...
BASE

5
Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings ...
BASE

6
Mispronunciation Detection in Children's Reading of Sentences
BASE

7
The SRI NIST 2010 Speaker Recognition Evaluation System (PREPRINT)
In: DTIC (2011)
BASE

15
Combining Prosodic, Lexical and Cepstral Systems for Deceptive Speech Detection
BASE

16
Combining Prosodic, Lexical and Cepstral Systems for Deceptive Speech Detection ...
BASE

20
Toward Joint Segmentation and Classification of Dialog Acts in Multiparty Meetings
In: DTIC (2005)
BASE