1 |
How2Sign: A large-scale multimodal dataset for continuous American sign language
BASE
2 |
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models ...
3 |
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models ...
4 |
Differentiable Allophone Graphs for Language-Universal Speech Recognition ...
Abstract:
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages. While speech annotations at the language-specific phoneme or surface levels are readily available, annotations at a universal phone level are relatively rare and difficult to produce. In this work, we present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings with learnable weights represented using weighted finite-state transducers, which we call differentiable allophone graphs. By training multilingually, we build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language. These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that capture rich pronunciation variations, and re-evaluate the allophone mappings of seen languages. We demonstrate the ... : INTERSPEECH 2021. Contains additional studies on phone recognition for unseen languages ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2107.11628 https://dx.doi.org/10.48550/arxiv.2107.11628
5 |
Speech technology for unwritten languages
In: ISSN: 2329-9290 ; EISSN: 2329-9304 ; IEEE/ACM Transactions on Audio, Speech and Language Processing ; https://hal.inria.fr/hal-02480675 ; IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ (2020)
6 |
AlloVera: a multilingual allophone database
In: LREC 2020: 12th Language Resources and Evaluation Conference ; https://halshs.archives-ouvertes.fr/halshs-02527046 ; LREC 2020: 12th Language Resources and Evaluation Conference, European Language Resources Association, May 2020, Marseille, France ; https://lrec2020.lrec-conf.org/ (2020)
8 |
Towards Zero-shot Learning for Automatic Phonemic Transcription ...
9 |
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language ...
10 |
Universal Phone Recognition with a Multilingual Allophone System ...
11 |
AlloVera: a multilingual allophone database
In: LREC 2020: 12th Language Resources and Evaluation Conference ; https://halshs.archives-ouvertes.fr/halshs-02527046 ; LREC 2020: 12th Language Resources and Evaluation Conference, European Language Resources Association, May 2020, Marseille, France ; https://lrec2020.lrec-conf.org/ (2020)
12 |
Phoneme Level Language Models for Sequence Based Low Resource ASR ...
13 |
Multilingual Speech Recognition with Corpus Relatedness Sampling ...
14 |
On Leveraging the Visual Modality for Neural Machine Translation ...
15 |
Acoustic-to-Word Models with Conversational Context Information ...
16 |
Learned In Speech Recognition: Contextual Acoustic Word Embeddings ...
17 |
On Dimensional Linguistic Properties of the Word Embedding Space ...
18 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking rosetta” JSALT 2017 workshop
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01709578 ; ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018)
19 |
Late fusion of individual engines for improved recognition of negative emotion in speech - learning vs. democratic vote ...
20 |
Sequence-based Multi-lingual Low Resource Speech Recognition ...