1 | A prospective study of associations between early fearfulness and perceptual sensitivity and later restricted and repetitive behaviours in infants with typical and elevated likelihood of Autism

BASE
2 | Semi-supervised cycle-consistency training for end-to-end ASR using unpaired speech
3 | Incorporating Temporal Information in Entailment Graph Mining ...
6 | Integrating Lexical Information into Entity Neighbourhood Representations for Relation Prediction ...
7 | Investigating the Mechanisms Driving Referent Selection and Retention in Toddlers at Typical and Elevated Likelihood for Autism Spectrum Disorder. ...
10 | Enforcing constraints for multi-lingual and cross-lingual speech-to-text systems

Abstract:
The recent development of neural network-based automatic speech recognition (ASR) systems has greatly reduced state-of-the-art phone error rates in several languages. However, an ASR system trained on one language usually fails to recognize speech from another language, even when the two languages come from the same language family. This poses a problem for low-resource languages, which usually lack enough paired data to train a moderately sized ASR model and therefore require either cross-lingual adaptation or zero-shot recognition. Due to the growing interest in bringing ASR technology to low-resource languages, the cross-lingual adaptation of end-to-end speech recognition systems has recently received more attention. However, little analysis has been done to understand how the model learns a shared representation across languages and how language-dependent representations can be fine-tuned to improve the system's performance. We compare a bi-lingual CTC model with language-specific tuning at earlier LSTM layers to one without such tuning, in order to understand whether language-dependent pathways in the model help multi-lingual learning and why. We first train the network on Dutch and then transfer the system to English under the bi-lingual CTC loss, after which the representations from the two networks are visualized. The results show that the consonants of the two languages are learned well under a shared mapping, whereas vowels benefit significantly from further language-dependent transformations applied before the final classification layer. These results can guide the design of future multilingual and cross-lingual end-to-end systems. However, creating specialized processing units in the network for each training language yields increasingly large networks as the number of training languages grows, and it is unclear how to adapt such a system to zero-shot recognition.
The remainder of this work adapts two existing constraints to multi-lingual and cross-lingual ASR. The first constraint is cycle-consistency training. This method defines a shared codebook of phonetic tokens for all training languages. Input speech passes through the speech encoder of the ASR system and is quantized into discrete representations drawn from the codebook; the discrete sequence is then passed through an auxiliary speech decoder to reconstruct the input speech, and the framework constrains the reconstructed speech to stay close to the original input. The second constraint is regret-minimization training. It separates the ASR encoder into two parts: a feature extractor and a predictor. For each training sample, regret minimization defines an additional regret term as the difference between the losses of an auxiliary language-specific predictor given the real language ID and given a fake language ID. This constraint encourages the feature extractor to learn a speech-to-phone mapping that is invariant across all training languages and could potentially improve the model's generalization to new languages.
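A minimal NumPy sketch of the two constraints follows. The identity encoder/decoder, the toy auxiliary predictor, the shapes, and all function names are illustrative assumptions for exposition, not the thesis's actual implementation:

```python
import numpy as np

def quantize(encodings, codebook):
    """Assign each frame encoding to its nearest codebook entry (a discrete phonetic token)."""
    dists = ((encodings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    ids = dists.argmin(axis=1)
    return codebook[ids], ids

def cycle_consistency_loss(speech, encoder, decoder, codebook):
    """Encode -> quantize into the shared codebook -> decode, then penalize reconstruction error."""
    z = encoder(speech)            # continuous frame-level encodings
    q, _ = quantize(z, codebook)   # discrete tokens from the codebook shared across languages
    recon = decoder(q)             # auxiliary speech decoder
    return float(((recon - speech) ** 2).mean())

def regret(loss_fn, aux_predictor, features, targets, real_lang, fake_lang):
    """Regret term: auxiliary loss with the real language ID minus the loss with a fake one."""
    return loss_fn(aux_predictor(features, real_lang), targets) - \
           loss_fn(aux_predictor(features, fake_lang), targets)

# Toy demo: identity encoder/decoder on random "speech" features.
rng = np.random.default_rng(0)
speech = rng.normal(size=(50, 8))    # 50 frames, 8-dim features
codebook = rng.normal(size=(16, 8))  # 16 shared phonetic codes
loss = cycle_consistency_loss(speech, lambda x: x, lambda q: q, codebook)
# loss > 0: reconstructing through a small discrete codebook is lossy

# Toy regret demo: a hypothetical predictor that is accurate only with the true language ID.
mse = lambda pred, tgt: float(((pred - tgt) ** 2).mean())
aux = lambda feats, lang: feats + (0.0 if lang == "nl" else 0.5)
r = regret(mse, aux, speech, speech, "nl", "en")
# r == -0.25: the real language ID yields the lower auxiliary loss
```

Minimizing the cycle-consistency loss ties the discrete tokens to the acoustics of every training language, while driving the regret term toward zero pushes the feature extractor to be uninformative about language identity, i.e. language-invariant.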

Keyword:
cross-lingual; end-to-end training; multilingual; speech-to-text

URL: http://hdl.handle.net/2142/113928
11 | Knowledge base integration in biomedical natural language processing applications
12 | Learning speech embeddings for speaker adaptation and speech understanding
13 | Modeling phones, keywords, topics and intents in spoken languages
14 | Investigating the Mechanisms Driving Referent Selection and Retention in Toddlers at Typical and Elevated Likelihood for Autism Spectrum Disorder.
15 | Infant EEG theta modulation predicts childhood intelligence
16 | Neural and behavioural indices of face processing in siblings of children with autism spectrum disorder (ASD): a longitudinal study from infancy to mid-childhood
17 | Speech technology for unwritten languages

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE, 2020. ISSN 2329-9290; EISSN 2329-9304. DOI 10.1109/TASLP.2020.2973896. https://hal.inria.fr/hal-02480675
18 | Incorporating Temporal Information in Entailment Graph Mining ...
19 | How Phonotactics Affect Multilingual and Zero-shot ASR Performance ...
20 | That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages ...