Hits 1 – 13 of 13

1	Improving the fusion of acoustic and text representations in RNN-T ...
	Zhang, Chao; Li, Bo; Lu, Zhiyun. - : arXiv, 2022
	BASE

2	Joint Unsupervised and Supervised Training for Multilingual ASR ...
	Bai, Junwen; Li, Bo; Zhang, Yu. - : arXiv, 2021
	BASE

3	Scaling End-to-End Models for Large-Scale Multilingual ASR ...
	Li, Bo; Pang, Ruoming; Sainath, Tara N.. - : arXiv, 2021
	BASE

4	Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion ...
	Peyser, Cal; Sainath, Tara N.; Pundak, Golan. - : arXiv, 2020
	BASE

5	Deliberation Model Based Two-Pass End-to-End Speech Recognition ...
	Hu, Ke; Sainath, Tara N.; Pang, Ruoming. - : arXiv, 2020
	BASE

6	Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model ...
	Kannan, Anjuli; Datta, Arindrima; Sainath, Tara N.. - : arXiv, 2019
	BASE

7	Contextual Speech Recognition with Difficult Negative Training Examples ...
	Alon, Uri; Pundak, Golan; Sainath, Tara N.. - : arXiv, 2018
	BASE

8	Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model ...
	Li, Bo; Sainath, Tara N.; Sim, Khe Chai. - : arXiv, 2017
	BASE

9	No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models ...
	Sainath, Tara N.; Prabhavalkar, Rohit; Kumar, Shankar. - : arXiv, 2017
	BASE

10	Multilingual Speech Recognition With A Single End-To-End Model ...
	Toshniwal, Shubham; Sainath, Tara N.; Weiss, Ron J.. - : arXiv, 2017
	BASE

11	Exemplar-based sparse representation features: from TIMIT to LVCSR
	Sainath, Tara N.; Kanevsky, Dimitri; Ramabhadran, Bhuvana...
	In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 19 (2011) 8, 2598-2613
	BLLDB
	OLC Linguistik

12	Applications of broad class knowledge for noise robust speech recognition
	Sainath, Tara N. - : Massachusetts Institute of Technology, 2009
	Abstract: Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. ; Cataloged from PDF version of thesis. ; Includes bibliographical references (p. 157-164). ; This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. These classes are formed by grouping together parts of the acoustic signal that have similar temporal and spectral characteristics, and therefore have much less variability than typical sub-word units used in speech recognition (i.e., phonemes, acoustic units). We explore broad classes formed along phonetic and acoustic dimensions. This thesis first introduces an instantaneous adaptation technique to robustly recognize broad classes in the input signal. Given an initial set of broad class models and input speech data, we explore a gradient steepness metric using the Extended Baum-Welch (EBW) transformations to explain how much these initial models must be adapted to fit the target data. We incorporate this gradient metric into a Hidden Markov Model (HMM) framework for broad class recognition and illustrate that this metric allows for a simple and effective adaptation technique which does not suffer from issues such as data scarcity and computational intensity that affect other adaptation methods such as Maximum a-Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR) and feature-space Maximum Likelihood Linear Regression (fMLLR). Broad class recognition experiments indicate that the EBW gradient metric method outperforms the standard likelihood technique, both when initial models are adapted via MLLR and without adaptation. Next, we explore utilizing broad class knowledge as a pre-processor for segment-based speech recognition systems, which have been observed to be quite sensitive to noise. The experiments are conducted with the SUMMIT segment-based speech recognizer, which detects landmarks - representing possible transitions between phonemes - from large energy changes in the acoustic signal. These landmarks are often poorly detected in noisy conditions. We investigate using the transitions between broad classes, which typically occur at areas of large acoustic change in the audio signal, to aid in landmark detection. We also explore broad classes motivated along both acoustic and phonetic dimensions. Phonetic recognition experiments indicate that utilizing either phonetically or acoustically motivated broad classes offers significant recognition improvements compared to the baseline landmark method in both stationary and non-stationary noise conditions. Finally, this thesis investigates using broad class knowledge for island-driven search. Reliable regions of a speech signal, known as islands, carry most of the information in the signal compared to unreliable regions, known as gaps. Most speech recognizers do not differentiate between island and gap regions during search, and as a result most of the search computation is spent in unreliable regions. Island-driven search addresses this problem by first identifying islands in the speech signal and directing the search outwards from these islands. In this thesis, we develop a technique to identify islands from broad classes which have been confidently identified from the input signal. We explore a technique to prune the search space given island/gap knowledge. Finally, to further limit the amount of computation in unreliable regions, we investigate scoring less detailed broad class models in gap regions and more detailed phonetic models in island regions. Experiments on both small and large scale vocabulary tasks indicate that the island-driven search strategy results in an improvement in recognition accuracy and computation time. ; by Tara N. Sainath. ; Ph.D.
	Keyword: Electrical Engineering and Computer Science
	URL: http://hdl.handle.net/1721.1/53300
	BASE

13	Acoustic landmark detection and segmentation using the McAulay-Quatieri Sinusoidal Model
	Sainath, Tara N. - : Massachusetts Institute of Technology, 2005
	BASE
