Search in the Catalogues and Directories

Hits 1 – 20 of 56

1
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
In: Language Resources and Evaluation Conference (LREC) ; https://hal.archives-ouvertes.fr/hal-01807093 ; Language Resources and Evaluation Conference (LREC), Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Pi, May 2018, Miyazaki, Japan (2018)
Abstract: Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world's languages do not have such resources, and some even lack a stable orthography. Building systems under these almost zero-resource conditions is promising not only for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists (semi-)automatically analyze and annotate audio recordings of endangered, unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It is made up of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We detail how the data was collected, cleaned and processed, and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; field linguistics; language documentation; spoken term discovery; unwritten languages; word segmentation; zero resource technologies
URL: https://hal.archives-ouvertes.fr/hal-01807093/document
https://hal.archives-ouvertes.fr/hal-01807093/file/lrec2018_mboshi_final-3.pdf
https://hal.archives-ouvertes.fr/hal-01807093
BASE
2
A database of German definitory contexts from selected web sources
In: 11th International Conference on Language Resources and Evaluation (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01798704 ; 11th International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan. pp.3068-3073 (2018)
BASE
3
A corpus of German political speeches from the 21st century
In: 11th Language Resources and Evaluation Conference (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01798703 ; 11th Language Resources and Evaluation Conference (LREC 2018), May 2018, Miyazaki, Japan. pp.792-797 (2018)
BASE
4
Speaker Recognition: Building the Mixer 4 and 5 Corpora
In: http://www.lrec-conf.org/proceedings/lrec2008/pdf/902_paper.pdf (2008)
BASE
5
Corpus support for machine translation at LDC
In: http://www.mt-archive.info/LREC-2006-Ma-1.pdf (2006)
BASE
6
Corpus support for machine translation at LDC
In: http://www.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/754_pdf.pdf (2006)
BASE
7
Integrated linguistic resources for language exploitation technologies
In: http://www.mt-archive.info/LREC-2006-Strassel.pdf (2006)
BASE
8
Integrated Linguistic Resources for Language Exploitation Technologies
In: http://papers.ldc.upenn.edu/LREC2006/LREC_2006_GALE_Paper.pdf (2006)
BASE
9
The Mixer and Transcript Reading Corpora: Resources for Multilingual
In: http://www.ll.mit.edu/mission/communications/ist/publications/060524_CampbellJ.pdf (2006)
BASE
10
The mixer and transcript reading corpora: Resources for multilingual, crosschannel speaker recognition research
In: http://www.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/530_pdf.pdf (2006)
BASE
11
Integrated linguistic resources for language exploitation technologies
In: http://www.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/pdf/745_pdf.pdf (2006)
BASE
12
The Mixer corpus of multilingual, multichannel speaker recognition data
In: http://www.lrec-conf.org/proceedings/lrec2004/pdf/771.pdf (2004)
BASE
13
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
In: http://papers.ldc.upenn.edu/LREC2004/LREC2004_Fisher_Paper.pdf (2004)
BASE
14
A Progress Report from the Linguistic Data Consortium: recent activities in resource creation and . . .
In: http://papers.ldc.upenn.edu/LREC2004/LREC2004_LDC_Paper.pdf (2004)
BASE
15
The MMSR Bilingual and Crosschannel Corpora for
In: http://isca-speech.org/archive_open/archive_papers/odyssey_04/ody4_029.pdf (2004)
BASE
16
TalkBank: Building an Open Unified Multimodal Database of Communicative
In: http://childes.psy.cmu.edu/lrec/LREC-tb.pdf (2004)
BASE
17
Linguistic Resource Creation for Research and Technology Development: A Recent Experiment
In: http://papers.ldc.upenn.edu/TALIP2003/SurpriseLang.pdf (2003)
BASE
18
Shared Resources for Robust Speech-to-Text Technology
In: http://papers.ldc.upenn.edu/Eurospeech2003/STT.pdf (2003)
BASE
19
TIDES language resources: A resource map for translingual information access
In: http://www.lrec-conf.org/proceedings/lrec2002/pdf/291.pdf (2002)
BASE
20
Language resources creation and distribution at the linguistic data consortium
In: http://www.lrec-conf.org/proceedings/lrec2002/pdf/245.pdf (2002)
BASE

