2 |
Cascading Oscillators in Decoding Speech: Reflection of a Cortical Computation Principle
BASE
4 |
Investigation of Back-off Based Interpolation Between Recurrent Neural Network and N-gram Language Models (Author's Manuscript)
BASE
5 |
Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment
In: DTIC (2015)
BASE
6 |
A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network
In: DTIC (2014)
Abstract:
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) to produce frame alignments. The use of an ASR-DNN system in the speaker recognition pipeline is attractive as it integrates the information from speech content directly into the statistics, allowing the standard backends to remain unchanged. The improvement of the proposed framework over a state-of-the-art system is 30% relative at the equal error rate when evaluated on the telephone conditions from the 2012 NIST speaker recognition evaluation (SRE). The proposed framework is a successful way to efficiently leverage transcribed data for speaker recognition, thus opening up a wide spectrum of research directions.
Keyword:
*SPEECH RECOGNITION; DEEP NEURAL NETWORK; NEURAL NETS; STATISTICS; Voice Communications
URL: http://oai.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA613971 http://www.dtic.mil/docs/citations/ADA613971
BASE
8 |
How Autism Affects Speech Understanding in Multitalker Environments
In: DTIC (2014)
BASE
9 |
Development and Utility of Automatic Language Processing Technologies. Volume 2
In: DTIC (2014)
BASE
11 |
Computational Modeling of Emotions and Affect in Social-Cultural Interaction
In: DTIC (2013)
BASE
12 |
What's Wrong With Automatic Speech Recognition (ASR) and How Can We Fix It?
In: DTIC (2013)
BASE
13 |
A Submodularity Framework for Data Subset Selection
In: DTIC (2013)
BASE
14 |
A Spoken Dialogue System for Command and Control
In: DTIC (2012)
BASE
15 |
Speech Synthesis Using Perceptually Motivated Features
In: DTIC (2012)
BASE
17 |
Machine Recognition vs Human Recognition of Voices
In: DTIC (2012)
BASE
18 |
Speaker Clustering for a Mixture of Singing and Reading (Preprint)
In: DTIC (2012)
BASE
19 |
Open-Source Multi-Language Audio Database for Spoken Language Processing Applications
In: DTIC (2012)
BASE
20 |
Effects of Speech Intensity on the Callsign Acquisition Test (CAT) and Modified Rhyme Test (MRT) Presented in Noise
In: DTIC (2012)
BASE