1. The Multilingual TEDx Corpus for Speech Recognition and Translation ...
2. End-to-end ASR to jointly predict transcriptions and linguistic annotations ...
6. A Corpus for Large-Scale Phonetic Typology
In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020)
8. Analysis of Multilingual Sequence-to-Sequence speech recognition systems ...
10. Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling ...
Cho, Jaejin; Baskar, Murali Karthick; Li, Ruizhi; Wiesner, Matthew; Mallidi, Sri Harish; Yalta, Nelson; Karafiat, Martin; Watanabe, Shinji; Hori, Takaaki. - : arXiv, 2018
Abstract:
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring more data compared to conventional DNN-HMM systems. In this work, we attempt to use data from 10 BABEL languages to build a multi-lingual seq2seq model as a prior model, and then port them towards 4 other BABEL languages using transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that the transfer learning approach from the multilingual model shows substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in terms of %WER, and achieves recognition performance ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://arxiv.org/abs/1810.03459 https://dx.doi.org/10.48550/arxiv.1810.03459