
Search in the Catalogues and Directories

Hits 1 – 20 of 25

1
Universal Dependencies 2.9
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
BASE
2
Universal Dependencies 2.8.1
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
BASE
3
Universal Dependencies 2.8
Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
BASE
4
Spoken corpus Gos 1.1
Zwitter Vitez, Ana; Zemljarič Miklavčič, Jana; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
5
Corpus of 1968 Slovenian literature Maj68 1.0
BASE
6
Corpus of term-annotated texts RSDO5 1.1
BASE
7
Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021
BASE
8
Montenegrin web corpus meWaC 1.0
Ljubešić, Nikola; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021
BASE
9
Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
Ljubešić, Nikola; Markoski, Filip; Markoska, Elena. - : Jožef Stefan Institute, 2021
BASE
10
Training corpus ssj500k 2.3
Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
11
Spoken corpus Gos VideoLectures 4.2 (transcription)
Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam; Erjavec, Tomaž; Majhenič, Simona; Žgank, Andrej. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2021
Abstract: Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040) and covers public academic speech. It can be used for training continuous speech recognition for the Slovene language, for phonetic research, or for any other research on Slovene academic speech. The corpus contains a selection of public lectures available through the web portal Videolectures.net provided by the Jožef Stefan Institute, and covers 55 lectures and 22 hours of speech. This resource contains only the annotated transcriptions of the corpus, while the audio recordings are available at http://hdl.handle.net/11356/1222. The transcriptions for Gos VideoLectures were done manually and carefully checked. The main guidelines for transcription were those of the Gos corpus (http://www.korpus-gos.net/Support/About). The transcription tool Transcriber 1.5.1 (http://trans.sourceforge.net/en/presentation.php) was used to make the transcriptions; it can also be used for reading transcriptions (.trs files) or exporting them to different formats. The transcriptions comprise the TRS files with tabular metadata and their conversions to TEI and to the vertical file format (as used e.g. by Sketch Engine). Each recording has two TRS files, one with the pronunciation-based and the other with the standardised/normalised transcription. The TRS zip also contains files with automatically produced word- and phone-level alignment with the speech signal, as well as the annotation guidelines (in Slovenian). The TEI and vertical encodings join the two transcriptions at the token level, with the normalised words also linguistically annotated. The annotations comprise the word lemma, the MULTEXT-East MSDs and the Universal Dependencies morphological features. Compared with version 4.1, this version corrects some errors and slightly changes the TEI and vertical encodings.
Keyword: academic speech; speech database; speech recognition; speech transcription; spoken corpus; TEI
URL: http://hdl.handle.net/11356/1444
BASE
12
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
BASE
13
Corpus of Croatian news portals ENGRI (2014-2018)
Bogunović, Irena; Kučić, Mario; Ljubešić, Nikola. - : University of Rijeka, Faculty of Maritime Studies, 2021
BASE
14
Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1
Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021
BASE
15
Spoken corpus Gos VideoLectures 4.1 (transcription)
Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2021
BASE
16
Corpus of Slovenian school texts SBSJ 1.0
BASE
17
Abstracts from the KAS corpus KAS-Abs 1.0
Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute, 2021. : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2021
BASE
18
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
BASE
19
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
BASE
20
Corpus of term-annotated texts RSDO5 1.0
BASE

