Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	Speaking Style Variability in Speaker Discrimination by Humans and Machines
	Afshan, Amber. - : eScholarship, University of California, 2022
	Abstract: A speaker's voice constantly varies in everyday situations, such as when talking to a friend, reading aloud, talking to pets, or narrating a happy incident. These changes in speaking style affect human and machine abilities to distinguish speakers based on their voice. This dissertation studies the effects of speaking style variability on speaker discrimination performance by humans and machines. We compare human speaker discrimination performance for read speech versus casual conversations. Listeners perform better when stimuli are style-matched, particularly in read speech -- read speech trials. They perform the worst in style-mismatched conditions. Moderate style variability affects the "same speaker" task more than the "different speaker" task. The speakers who are "easy" or "hard" to "tell together" are not the same as those who are "easy" or "hard" to "tell apart." Analysis of acoustic variability suggests that listeners find it easier to "tell speakers together" when they rely on speaker-specific idiosyncrasies and that they "tell speakers apart" based on their relative positions within a shared acoustic space. The effects of style variability on automatic speaker verification (ASV) systems are systematically analyzed using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. The performance is better when enrollment and test utterances are of the same style, but it substantially degrades when styles are mismatched. We hypothesize that between-frame entropy can capture style-related spectral and temporal variations. We propose an entropy-based variable frame rate (VFR) technique to address style variability in two different approaches: data augmentation and self-attentive conditioning. Both approaches improve performance in style-mismatch scenarios and are comparable in performance. Furthermore, humans and machines seem to employ different approaches to speaker discrimination. In an attempt to improve ASV performance in the presence of style variability, insights learnt from the human speaker perception experiments are used to design a training loss function, referred to as "CllrCE loss". CllrCE loss focuses on both speaker-specific idiosyncrasies and relative acoustic distances between the speakers to train the ASV system. This loss function improves ASV performance in case of style variability, especially in the case of moderate style variations from conversational speech.
	Keyword: Acoustic space analysis; Computer engineering; Electrical engineering; Human speaker perception; Self-attention conditioning; Speaker verification; Speaking style; Variable frame rate
	URL: https://escholarship.org/uc/item/3zh346jm
	BASE

2	Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification ...
	Ravi, Vijay; Fan, Ruchao; Afshan, Amber. - : arXiv, 2020
	BASE

3	Improved subject-independent acoustic-to-articulatory inversion
	Afshan, Amber; Ghosh, Prasanta Kumar. - : ELSEVIER SCIENCE BV, 2015
	BASE
