1.
Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information ...
Abstract:
Deep learning-based pronunciation scoring models rely heavily on the availability of annotated non-native data, which is costly to collect and hard to scale. To deal with this data scarcity problem, data augmentation is commonly used for model pretraining. In this paper, we propose phone-level mixup, a simple yet effective data augmentation method, to improve the performance of word-level pronunciation scoring. Specifically, given a phoneme sequence from the lexicon, an artificial augmented word sample is generated by randomly sampling from the corresponding phone-level features in the training data, and the word score is the average of their GOP scores. Benefiting from arbitrary phone-level combinations, mixup can generate any word with various pronunciation scores. Moreover, we utilize multi-source information (e.g., MFCC and deep features) to further improve the scoring system's performance. Experiments conducted on Speechocean762 show that the proposed system outperforms the baseline by ... 5 pages, 2 figures. This paper is submitted to INTERSPEECH 2022 ...
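The phone-level mixup step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the phone-bank layout (each phone symbol mapped to a list of (feature vector, GOP score) pairs collected from training data), the function name `phone_level_mixup`, and the toy phone symbols are all hypothetical. Only the sampling rule (one random training instance per lexicon phone) and the averaging of GOP scores into a word score come from the abstract; how the sampled phone features are combined for the model is not specified there, so they are simply kept as an ordered list.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical phone bank: phone symbol -> list of (feature_vector, gop_score)
# instances taken from training data. Real features (e.g. MFCC or deep
# features) would be arrays; plain lists keep this sketch dependency-free.
PhoneBank = Dict[str, List[Tuple[List[float], float]]]


def phone_level_mixup(
    phones: Sequence[str], bank: PhoneBank, rng: random.Random
) -> Tuple[List[List[float]], float]:
    """Build one artificial word sample from a lexicon phone sequence.

    For each phone, one training instance of that phone is drawn at random.
    The word's features are the drawn phone features in lexicon order, and
    the word score is the average of the drawn GOP scores.
    """
    if not phones:
        raise ValueError("phone sequence must be non-empty")
    features: List[List[float]] = []
    scores: List[float] = []
    for phone in phones:
        feat, gop = rng.choice(bank[phone])  # random instance of this phone
        features.append(feat)
        scores.append(gop)
    return features, mean(scores)


if __name__ == "__main__":
    # Toy bank and a hypothetical lexicon entry for "cat".
    toy_bank: PhoneBank = {
        "K": [([0.1, 0.2], 2.0), ([0.3, 0.1], 1.0)],
        "AE": [([0.5, 0.5], 2.0)],
        "T": [([0.0, 0.9], 0.0)],
    }
    rng = random.Random(0)
    for _ in range(3):
        feats, score = phone_level_mixup(["K", "AE", "T"], toy_bank, rng)
        print(len(feats), round(score, 3))
```

Because every phone is drawn independently, the same lexicon entry can yield many samples with different word scores, which is the property the abstract credits for generating "any word with various pronunciation scores."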
Keywords:
Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG
URL: https://arxiv.org/abs/2203.01826 https://dx.doi.org/10.48550/arxiv.2203.01826
BASE
2.
The Influence of English as a Foreign Language Teachers’ Positive Mood and Hope on Their Academic Buoyancy: A Theoretical Review
In: Front Psychol (2022)
BASE