DE eng

Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Training corpus hr500k 1.0
Abstract: The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of the corpus is annotated with semantic role labels. The annotations (and other aspects) of the hr500k corpus are documented in the teiHeader and back element of the TEI encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications for Croatian, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, while (4) the semantic role labelling annotation guidelines are currently in the publication process.
Keyword: dependency treebank; manual annotation; named entities; parsing; part-of-speech tagging; semantic role labelling; TEI; tokenisation
URL: http://hdl.handle.net/11356/1183
BASE
Hide details
2
Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
Show details
3
Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
Show details
4
Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
Show details
5
Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
Show details

Catalogues
0
0
0
0
0
0
0
Bibliographies
0
0
0
0
0
0
0
0
0
Linked Open Data catalogues
0
Online resources
0
0
0
0
Open access documents
5
0
0
0
0
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern