MODALITY corpus - SPEAKER 01 - SEQUENCE S6 ...
Abstract:
The MODALITY corpus is a multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. Because every utterance was labelled, the corpus can be used to develop an audio-visual speech recognition (AVSR) system. Recordings in noisy conditions can be used to test the robustness of speech recognition systems. The language material was based on a remote control scenario and includes 231 words: numbers, names of months and days, and a set of verbs and nouns related to controlling a computer device. Speakers read them as isolated words and as sequences, resulting in a set of 12 recording sessions per speaker. Half of the sessions were recorded in quiet conditions; the other half contained three kinds of intrusive signals (traffic, babble and factory noise). The corpus includes recordings of 42 speakers (33 male, 9 female). The participants ...
Keywords:
language; multimodal; native speakers; recordings
URL: https://mostwiedzy.pl/en/open-research-data/modality-corpus-speaker-01-sequence-s6,10260723221022808-0 https://dx.doi.org/10.34808/atbc-bj43