
Search in the Catalogues and Directories

Hits 1 – 10 of 10

1
Learning representations of speech from the raw waveform ; Apprentissage de représentations de la parole à partir du signal brut
Zeghidour, Neil. - : HAL CCSD, 2019
In: https://tel.archives-ouvertes.fr/tel-02278616 ; Machine Learning [cs.LG]. Université Paris sciences et lettres, 2019. English. ⟨NNT : 2019PSLEE004⟩ (2019)
BASE
2
Learning to detect dysarthria from raw speech
In: ICASSP ; ICASSP-2019 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-02274504 ; ICASSP-2019 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2019, Brighton, United Kingdom (2019)
BASE
3
Learning Filterbanks from Raw Speech for Phoneme Recognition
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01888737 ; ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018)
BASE
4
Sampling strategies in Siamese Networks for unsupervised speech representation learning
In: Interspeech 2018 ; https://hal.archives-ouvertes.fr/hal-01888725 ; Interspeech 2018, Sep 2018, Hyderabad, India (2018)
BASE
5
End-to-End Speech Recognition From the Raw Waveform
In: Interspeech 2018 ; https://hal.archives-ouvertes.fr/hal-01888739 ; Interspeech 2018, Sep 2018, Hyderabad, India. ⟨10.21437/Interspeech.2018-2414⟩ (2018)
Abstract: Accepted for presentation at Interspeech 2018 ; International audience ; State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves on the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches, and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. This is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large vocabulary task under clean recording conditions.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [SCCO.LING]Cognitive science/Linguistics; [SCCO]Cognitive science; deep; end-to-end; gammatones; scattering; speech recognition; waveform
URL: https://doi.org/10.21437/Interspeech.2018-2414
https://hal.archives-ouvertes.fr/hal-01888739
https://hal.archives-ouvertes.fr/hal-01888739/file/Zeghidour_USCD_2018_End2end_from_wav.Interspeech.pdf
https://hal.archives-ouvertes.fr/hal-01888739/document
BASE
6
SING: Symbol-to-Instrument Neural Generator
In: Conference on Neural Information Processing Systems (NIPS) ; https://hal.archives-ouvertes.fr/hal-01899949 ; Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada (2018)
BASE
7
Fully Convolutional Speech Recognition ...
BASE
8
Fader Networks: Manipulating Images by Sliding Attributes
In: 31st Conference on Neural Information Processing Systems (NIPS 2017) ; https://hal.archives-ouvertes.fr/hal-02275215 ; 31st Conference on Neural Information Processing Systems (NIPS 2017), Dec 2017, Long Beach, CA, United States. pp.5969-5978 (2017)
BASE
9
Learning Weakly Supervised Multimodal Phoneme Embeddings
In: Interspeech 2017 ; https://hal.inria.fr/hal-01687415 ; Interspeech 2017, 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-1689⟩ (2017)
BASE
10
Learning weakly supervised multimodal phoneme embeddings ...
BASE
