1 |
A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
|
|
|
|
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599085 ; Information, MDPI, 2022, 13 (3), pp.102. ⟨10.3390/info13030102⟩ (2022)
|
|
BASE
|
|
2 |
Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
|
|
|
|
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599076 ; Information, MDPI, 2022, 13 (3), pp.103. ⟨10.3390/info13030103⟩ (2022)
|
|
BASE
|
|
3 |
Évaluation de la perception des sons de parole chez les populations pédiatriques : réflexion sur les épreuves existantes [Assessing speech sound perception in pediatric populations: a reflection on existing tests]
|
|
|
|
In: ISSN: 0298-6477 ; EISSN: 2117-7155 ; Glossa ; https://hal.archives-ouvertes.fr/hal-03646757 ; Glossa, UNADREO - Union NAtionale pour le Développement de la Recherche en Orthophonie, 2022, 132, pp.1-27 ; https://www.glossa.fr/index.php/glossa/article/view/1043 (2022)
|
|
BASE
|
|
4 |
Learning and controlling the source-filter representation of speech with a variational autoencoder
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03650569 ; 2022 (2022)
|
|
BASE
|
|
5 |
Domestic Ubimus
|
|
|
|
In: EISSN: 2409-9708 ; EAI Endorsed Transactions on Creative Technologies ; https://hal-hprints.archives-ouvertes.fr/hprints-03602695 ; EAI Endorsed Transactions on Creative Technologies, EAI - European Alliance for Innovation, 2022, ⟨10.4108/eai.22-2-2022.173493⟩ (2022)
|
|
BASE
|
|
6 |
A comparative study of several parameterizations for speaker recognition ...
|
|
|
|
BASE
|
|
7 |
Speaker verification in mismatch training and testing conditions ...
|
|
|
|
BASE
|
|
8 |
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation ...
|
|
|
|
BASE
|
|
9 |
A New Amharic Speech Emotion Dataset and Classification Benchmark ...
|
|
|
|
BASE
|
|
11 |
Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ...
|
|
|
|
BASE
|
|
12 |
LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects ...
|
|
|
|
BASE
|
|
13 |
Automatic Dialect Density Estimation for African American English ...
|
|
|
|
BASE
|
|
15 |
Variation in Spanish /s/: Overview and New Perspectives
|
|
|
|
In: World Languages and Literatures Faculty Publications and Presentations (2022)
|
|
BASE
|
|
16 |
End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system ...
|
|
|
|
BASE
|
|
17 |
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems ...
|
|
|
|
Abstract:
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to improve recognition performance by biasing the ASR system toward particular context phrases such as person names, music lists, and proper nouns. Existing methods mainly include contextual LM biasing and adding a bias encoder to end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing: adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large context lists, and performance-balancing mechanisms to control the biasing degree of the model. We demonstrate that the proposed model is a general biasing solution which is domain-insensitive and can be ... : This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible ...
|
|
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
|
|
URL: https://dx.doi.org/10.48550/arxiv.2203.00888 https://arxiv.org/abs/2203.00888
|
|
BASE
|
|
18 |
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ...
|
|
|
|
BASE
|
|
19 |
Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations ...
|
|
|
|
BASE
|
|
20 |
Towards a Perceptual Model for Estimating the Quality of Visual Speech ...
|
|
|
|
BASE
|
|