1 | A comparative study of several parameterizations for speaker recognition ... | BASE
2 | Speaker verification in mismatch training and testing conditions ... | BASE
3 | Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation ... | BASE
4 | A New Amharic Speech Emotion Dataset and Classification Benchmark ... | BASE
5 | Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks ... | BASE
7 | Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ... | BASE
8 | LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects ... | BASE
9 | Automatic Dialect Density Estimation for African American English ... | BASE
10 | End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system ... | BASE
11 | Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems ... | BASE
Abstract: Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, music list, proper nouns, etc. Existing methods mainly include contextual LM biasing and adding bias encoder into end-to-end ASR models. In this work, we introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model. We demonstrate the proposed model is a general biasing solution which is domain-insensitive and can be ...
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2203.00888 https://arxiv.org/abs/2203.00888
12 | SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ... | BASE
13 | Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations ... | BASE
14 | Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition ... | BASE
15 | Telepractice treatment of rhotics (Peterson et al., 2022) ... | BASE
16 | Telepractice treatment of rhotics (Peterson et al., 2022) ... | BASE
17 | Towards a Perceptual Model for Estimating the Quality of Visual Speech ... | BASE
18 | Learning and controlling the source-filter representation of speech with a variational autoencoder ... | BASE
19 | Correcting Misproducted Speech using Spectrogram Inpainting ... | BASE
20 | Decoding Neural Correlation of Language-Specific Imagined Speech using EEG Signals ... | BASE