
Search in the Catalogues and Directories

Hits 81 – 100 of 174

81
Bilingual terminology extraction dataset KAS-biterm 1.0
Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute, 2018
BASE
82
Terminology identification dataset KAS-term 1.0
Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute, 2018
BASE
83
Croatian language corpus Riznica 0.1
Brozović Rončević, Dunja; Ćavar, Damir; Ćavar, Małgorzata. - : Institute of Croatian Language and Linguistics, 2018
BASE
84
Training corpus hr500k 1.0
Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
BASE
85
Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0
Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja. - : Jožef Stefan Institute, 2018
BASE
86
hr500k – A Reference Training Corpus of Croatian.
In: Conference papers (2018)
BASE
87
Closing a Gap in the Language Resources Landscape: Groundwork and Best Practices from Projects on Computer-mediated Communication in four European Countries
Beißwenger, Michael [author]; Chanier, Thierry [author]; Erjavec, Tomaž [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
88
Integrating corpora of computer-mediated communication into the language resources landscape: Initiatives and best practices from French, German, Italian and Slovenian projects
Beißwenger, Michael [author]; Chanier, Thierry [author]; Chiari, Isabella [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
89
TEI-Lex0 guidelines for the encoding of dictionary information on written and spoken forms
Bański, Piotr [author]; Bowers, Jack [author]; Erjavec, Tomaž [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
90
TEI-Lex0 guidelines for the encoding of dictionary information on written and spoken forms
In: Electronic Lexicography in the 21st Century: Proceedings of ELex 2017 Conference ; https://hal.inria.fr/hal-01757108 ; Electronic Lexicography in the 21st Century: Proceedings of ELex 2017 Conference, Sep 2017, Leiden, Netherlands (2017)
BASE
91
Universal Dependencies 2.1
In: https://hal.inria.fr/hal-01682188 ; 2017 (2017)
BASE
92
Closing a gap in the language resources landscape : Groundwork and best practices from projects on computer-mediated communication in four European countries.
Beißwenger, Michael; Chanier, Thierry; Chiari, Isabella. - : HAL CCSD, 2017. : Linköping Electronic Conference Proceedings, 2017
In: CLARIN Annual Conference 2016 ; https://hal.archives-ouvertes.fr/hal-01379621 ; CLARIN Annual Conference 2016, Oct 2016, Aix-en-Provence, France. 136, Linköping Electronic Conference Proceedings, pp.1-19, 2017, Selected papers from the CLARIN Annual Conference 2016, 978-91-7685-499-0 ; http://www.ep.liu.se/ecp/contents.asp?issue=136 (2017)
BASE
93
Universal Dependencies 2.0 alpha (obsolete)
Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2017
BASE
94
Universal Dependencies 2.0
Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2017
BASE
95
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2017
BASE
96
Universal Dependencies 2.1
Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2017
BASE
97
ReLDI token+tag+lemma+NER web service for WebLicht
Ljubešić, Nikola; Perovšek, Matic; Erjavec, Tomaž. - : Jožef Stefan Institute, 2017
BASE
98
Tweet code-switching corpus Janes-Preklop 1.0
Reher, Špela; Erjavec, Tomaž; Fišer, Darja. - : Jožef Stefan Institute, 2017
BASE
99
Slovenian parliamentary corpus SlovParl 2.0
Pančur, Andrej; Šorn, Mojca; Erjavec, Tomaž. - : Institute of Contemporary History, 2017
BASE
100
CMC training corpus Janes-Tag 2.0
Abstract: Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity annotation of non-standard Slovene. As the corpus has been carefully manually annotated, it is also suitable for detailed linguistic explorations which require highly accurate and reliable annotations. As an update to version 1.2, version 2.0 corrects some minor errors and includes named entity annotation. A slightly older version of this corpus is described in: ERJAVEC, Tomaž, ČIBEJ, Jaka, ARHAR HOLDT, Špela, LJUBEŠIĆ, Nikola, FIŠER, Darja. Gold-standard datasets for annotation of Slovene computer-mediated communication. In Proceedings of RASLAN 2016: Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016, pp. 29-40, https://nlp.fi.muni.cz/raslan/raslan16.pdf Note that a related corpus, Janes-Norm, is also available; cf. http://hdl.handle.net/11356/1084.
Keyword: computer-mediated communication; lemmatisation; manual annotation; named entities; tagging; TEI; tokenisation; word normalisation
URL: http://hdl.handle.net/11356/1123
BASE
