Home Catalogue search

eng

Refine your search:
- Keyword
- Creator / Publisher
- Year
- Medium
- Type
- BLLDB-Access:
  - free (30)
  - subject to license (1)

Search in the Catalogues and Directories






	Sort by
Simple Search

Page: 1 2

Hits 1 – 20 of 30

1	Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics ...
	Röder, Frank; Eppe, Manfred; Wermter, Stefan. - : arXiv, 2022
	BASE
	Show details

2	LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading ...
	Qu, Leyuan; Weber, Cornelius; Wermter, Stefan. - : arXiv, 2021
	BASE
	Show details

3	Neural Network Learning for Robust Speech Recognition
	Qu, Leyuan. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2021
	Abstract: Recently, end-to-end architectures have dominated the modeling of Automatic Speech Recognition (ASR) systems. Conventional systems usually consist of independent components, like an acoustic model, a language model and a pronunciation model. In comparison, end-to-end ASR approaches aim to directly map acoustic inputs to character or word sequences, which significantly simplifies the complex training procedure. Plenty of end-to-end architectures have been proposed, for instance, Connectionist Temporal Classification (CTC), Sequence Transduction with Recurrent Neural Networks (RNN-T) and attention-based encoder-decoder, which have accomplished great success and achieved impressive performance on a variety of benchmarks or even reached human level on some tasks. However, although advanced deep neural network architectures have been proposed, in adverse environments, the performance of ASR systems suffers from significant degradation because of environmental noise or ambient reverberation. To improve the robustness of ASR systems, in this thesis, we address the research questions and conduct experiments from the following perspectives: Firstly, to learn more stable visual representations, we propose LipSound and LipSound2 and investigate to what extent the visual modality contains semantic information that can benefit ASR performance. The LipSound/LipSound2 model consists of an encoder-decoder with an location-aware attention architecture and directly transforms mouth or face movement sequences to low-level speech representations, i.e. mel-scale spectrograms. The model is trained in a crossmodal self-supervised fashion and does not require any human annotations since the model inputs (visual sequences) and outputs (audio signals) are naturally paired in videos. Experimental results show that the LipSound model not only generates quality mel-spectrograms but also outperforms state-of-the-art models on the GRID benchmark dataset in speaker-dependent settings. Moreover, the improved LipSound2 model further verifies the effectiveness on generalizability (speaker-independent) and transferability (Non-Chinese to Chinese) on large vocabulary continuous speech corpora. Secondly, to exploit the fact that the image of a face contains information about the person's speech sound, we incorporate face embeddings extracted from a pretrained model for face recognition into the target speech separation model, which guide the system for predicting a target speaker mask in the time-frequency domain. The experimental results show that a pre-enrolled face image is able to benefit separating expected speech signals. Additionally, face information is complementary to voice reference. Further improvement can be achieved when combining both face and voice embeddings. Thirdly, to integrate domain knowledge, i.e. articulatory features (AFs) into end-to-end learning, we present two approaches: (a) fine-tuning networks which reuse hidden layer representations of AF extractors as input for ASR tasks; (b) progressive networks which combine articulatory knowledge by lateral connections from AF extractors. Results show that progressive networks are more effective and accomplish a lower word error rate than fine-tuning networks and other baseline models. Finally, to enable end-to-end ASR models to acquire Out-of-Vocabulary (OOV) words, instead of just fine-tuning with the audio containing OOV words, we propose to rescale loss at sentence level or word level, which encourages models to pay more attention to unknown words. Experimental results reveal that fine-tuning the baseline ASR model with loss rescaling and L2/EWC (Elastic Weight Consolidation) regularization can significantly improve the recall rate of OOV words and efficiently overcome the model suffering catastrophic forgetting. Furthermore, loss rescaling at the word level is more stable than the sentence level method and results in less ASR performance loss on general non-OOV words and the LibriSpeech dataset. In sum, this thesis contributes to the robustness of ASR systems by leveraging additional visual sequences, face information and domain knowledge. We achieve significant improvement on speech reconstruction, speech separation, end-to-end modeling and OOV word recognition tasks.
	Keyword: 004: Informatik; ddc:004:
	URL: https://ediss.sub.uni-hamburg.de/handle/ediss/9437 http://nbn-resolving.de/urn:nbn:de:gbv:18-ediss-98286
	BASE
	Hide details

4	Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives ...
	Zhong, Junpei; Cangelosi, Angelo; Wermter, Stefan. - : arXiv, 2020
	BASE
	Show details

5	Conversational Language Learning for Human-Robot Interaction
	Bothe, Chandrakant Ramesh. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2020
	BASE
	Show details

6	Crossmodal Language Grounding in an Embodied Neurocognitive Model
	Heinrich, Stefan; Yao, Yuan; Hinz, Tobias...
	In: Front Neurorobot (2020)
	BASE
	Show details

7	Incorporating End-to-End Speech Recognition Models for Sentiment Analysis ...
	Lakomkin, Egor; Zamani, Mohammad Ali; Weber, Cornelius. - : arXiv, 2019
	BASE
	Show details

8	Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots ...
	Bothe, Chandrakant; Garcia, Fernando; Maya, Arturo Cruz. - : arXiv, 2018
	BASE
	Show details

9	GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection ...
	Lakomkin, Egor; Bothe, Chandrakant; Wermter, Stefan. - : arXiv, 2018
	BASE
	Show details

10	Syntactic Reanalysis in Language Models for Speech Recognition
	Twiefel, Johannes; Hinaut, Xavier; Wermter, Stefan
	In: 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) ; https://hal.inria.fr/hal-01558462 ; 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Sep 2017, Lisbon, Portugal ; http://icdl-epirob.org/ (2017)
	BASE
	Show details

11	Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture ...
	Heinrich, Stefan; Wermter, Stefan. - : arXiv, 2017
	BASE
	Show details

12	Recurrent Neural Network for Syntax Learning with Flexible Predicates for Robotic Architectures
	Hinaut, Xavier; Twiefel, Johannes; Wermter, Stefan
	In: The Sixth Joint IEEE International Conference Developmental Learning and Epigenetic Robotics (ICDL-EPIROB) ; https://hal.inria.fr/hal-01417697 ; The Sixth Joint IEEE International Conference Developmental Learning and Epigenetic Robotics (ICDL-EPIROB), Sep 2016, Cergy, France ; http://icdl-epirob.org/ (2016)
	BASE
	Show details

13	Semantic Role Labelling for Robot Instructions using Echo State Networks
	Twiefel, Johannes; Hinaut, Xavier; Wermter, Stefan
	In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) ; https://hal.inria.fr/hal-01417701 ; European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Apr 2016, Bruges, Belgium ; https://www.elen.ucl.ac.be/esann/index.php?pg=esann16_programme (2016)
	BASE
	Show details

14	Using Natural Language Feedback in a Neuro-inspired Integrated Multimodal Robotic Architecture
	Twiefel, Johannes; Hinaut, Xavier; Borghetti, Marcelo...
	In: 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) ; https://hal.inria.fr/hal-01417706 ; 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Aug 2016, New York City, United States. pp.52 - 57, ⟨10.1109/ROMAN.2016.7745090⟩ ; http://www.tc.columbia.edu/conferences/roman2016/ (2016)
	BASE
	Show details

15	Natural language acquisition in recurrent neural architectures ; Erwerb von natürlicher Sprache in rekurrenten neuronalen Architekturen
	Heinrich, Stefan. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2016
	BASE
	Show details

16	A Recurrent Neural Network for Multiple Language Acquisition: Starting with English and French
	Hinaut, Xavier; Twiefel, Johannes; Petit, Maxime...
	In: Proceedings of the NIPS Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches (CoCo 2015) ; https://hal.inria.fr/hal-02561258 ; Proceedings of the NIPS Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches (CoCo 2015), Dec 2015, Montreal, Canada ; http://ceur-ws.org/Vol-1583/ (2015)
	BASE
	Show details

17	Toward a self-organizing pre-symbolic neural model representing sensorimotor primitives
	Zhong, Junpei; Cangelosi, Angelo; Wermter, Stefan. - 2014
	BASE
	Show details

18	Toward a self-organizing pre-symbolic neural model representing sensorimotor primitives
	Zhong, Junpei; Cangelosi, Angelo; Wermter, Stefan. - : Frontiers Media S.A., 2014
	BASE
	Show details

19	Temporal sequence detection with spiking neurons: towards recognizing robot language instructions
	Panchev, Christo; Wermter, Stefan
	In: Connection science. - Abingdon, Oxfordshire : Taylor & Francis 18 (2006) 1, 1-22
	OLC Linguistik
	Show details

20	A modular approach to self-organization of robot control based on language instruction
	Wermter, Stefan; Elmshaw, Mark; Farrand, Simon
	In: Connection science. - Abingdon, Oxfordshire : Taylor & Francis 15 (2003) 2-3, 73-94
	BLLDB
	Show details

Page: 1 2

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern