82 |
The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.1
|
|
|
|
BASE
|
|
83 |
The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1
|
|
|
|
BASE
|
|
85 |
SemEval-2020 Task 3: Graded Word Similarity in Context
|
|
Santos Armendariz, Carlos; Purver, Matthew; Pollak, Senja. In: Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020). International Committee for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.semeval-1.3
|
|
BASE
|
|
86 |
XML-Encoding of a spoken Serbian corpus targeting forms of address
|
|
|
|
Lemmenmeier-Batinić, Dolores; Ljubešić, Nikola; Samardžić, Tanja (2020). XML-Encoding of a spoken Serbian corpus targeting forms of address. In: Conference on Language Technologies & Digital Humanities, Ljubljana, 24-25 September 2020, 127-130.
|
|
BASE
|
|
87 |
The Image of the Monolingual Dictionary Across Europe. Results of the European Survey of Dictionary use and Culture
|
|
|
|
In: International Journal of Lexicography, Oxford University Press (OUP), 2019. ISSN: 0950-3846; EISSN: 1477-4577. https://hal.archives-ouvertes.fr/hal-03512668
|
|
BASE
|
|
92 |
The CLASSLA-StanfordNLP model for UD dependency parsing of standard Croatian
|
|
|
|
BASE
|
|
93 |
The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian
|
|
|
|
BASE
|
|
94 |
The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian
|
|
|
|
BASE
|
|
97 |
CMC training corpus Janes-Tag 2.1
|
|
|
|
Abstract:
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity annotation of non-standard Slovene. As the corpus has been carefully manually annotated, it is also suitable for detailed linguistic explorations that require highly accurate and reliable annotations. As an update to version 2.0, this version corrects some minor errors in NER annotation, adds Universal Dependencies morphological features alongside the MULTEXT-East morphosyntactic descriptions, and provides the corpus in CoNLL-U format. The UD features are also included in the vert file.
The first version of this corpus is described in:
ERJAVEC, Tomaž, ČIBEJ, Jaka, ARHAR HOLDT, Špela, LJUBEŠIĆ, Nikola, FIŠER, Darja. 2016. Gold-standard datasets for annotation of Slovene computer-mediated communication. In Proceedings of RASLAN 2016: Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016, pp. 29-40, https://nlp.fi.muni.cz/raslan/raslan16.pdf
FIŠER, Darja, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž. 2018. The Janes project: language resources and tools for Slovene user generated content. Language Resources & Evaluation. https://rdcu.be/7RX4
Note that a related corpus, Janes-Norm, is also available; cf. http://hdl.handle.net/11356/1084.
|
|
Keyword:
computer-mediated communication; lemmatisation; manual annotation; named entities; part-of-speech tagging; TEI; tokenisation; word normalisation
|
|
URL: http://hdl.handle.net/11356/1238
|
|
BASE
|
|