1 |
RETRIEVING SPEAKER INFORMATION FROM PERSONALIZED ACOUSTIC MODELS FOR SPEECH RECOGNITION
|
|
|
|
In: IEEE ICASSP 2022 ; https://hal.archives-ouvertes.fr/hal-03539741 ; IEEE ICASSP 2022, 2022, Singapore, Singapore (2022)
|
|
BASE
|
|
2 |
High-resolution speaker counting in reverberant rooms using CRNN with Ambisonics features
|
|
|
|
In: EUSIPCO 2020 - 28th European Signal Processing Conference (EUSIPCO) ; https://hal.archives-ouvertes.fr/hal-03537323 ; EUSIPCO 2020 - 28th European Signal Processing Conference (EUSIPCO), Jan 2021, Amsterdam, Netherlands. pp.71-75, ⟨10.23919/Eusipco47968.2020.9287637⟩ (2021)
|
|
Abstract:
Speaker counting is the task of estimating the number of people speaking simultaneously in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization, and tracking, knowing the number of speakers at each time step is a prerequisite, or at least a strong advantage, and it also enables low-latency processing. To that end, we address the speaker counting problem with a multichannel convolutional recurrent neural network that produces an estimate at short-term frame resolution. We trained the network to predict up to 5 concurrent speakers in a multichannel mixture, using simulated data covering many different conditions in terms of source and microphone positions, reverberation, and noise. The network can predict the number of speakers with good accuracy at frame resolution.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; CRNN; reverberation; Speaker counting
|
|
URL: https://doi.org/10.23919/Eusipco47968.2020.9287637 https://hal.archives-ouvertes.fr/hal-03537323 https://hal.archives-ouvertes.fr/hal-03537323/file/eusipco2020.pdf https://hal.archives-ouvertes.fr/hal-03537323/document
|
|
BASE
|
|
3 |
Does bilingual input hurt? A simulation of language discrimination and clustering using i-vectors
|
|
|
|
In: CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-02959451 ; CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society, Jul 2020, Toronto / Virtual, Canada (2020)
|
|
BASE
|
|
4 |
Focus Particles and Extraction – An Experimental Investigation of German and English Focus Particles in Constructions with Leftward Association ...
|
|
|
|
BASE
|
|
5 |
Focus Particles and Extraction – An Experimental Investigation of German and English Focus Particles in Constructions with Leftward Association
|
|
|
|
BASE
|
|
11 |
Perceptual Associations Between Words and Speaker Age
|
|
|
|
In: Laboratory Phonology: Journal of the Association for Laboratory Phonology; Vol 7, No 1 (2016); 18 ; 1868-6354 (2016)
|
|
BASE
|
|
12 |
Measuring passers-by engagement with AmPost: a printed interactive audio poster
|
|
|
|
BASE
|
|
13 |
Partikeln im komplexen Satz : Mechanismen der Lizenzierung von Modalpartikeln in Nebensätzen und Faktoren ihrer Verwendung in komplexen Sätzen : kontrastive Untersuchung am Beispiel der Partikeln ja, doch und denn im Deutschen und vedʹ [ved'], že [že] und vot [vot] im Russischen
|
|
|
|
BLLDB
|
|
UB Frankfurt Linguistik
|
|
17 |
Prosody and informativity: a cross-linguistic investigation ...
|
|
|
|
BASE
|
|
18 |
Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment
|
|
|
|
In: DTIC (2015)
|
|
BASE