8.
The ParlaMint corpora of parliamentary proceedings
In: Lang Resour Eval (2022)
BASE
15.
Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
BASE
16.
Montenegrin web corpus meWaC 1.0
Abstract:
The Montenegrin web corpus meWaC was built by crawling the .me top-level domain in 2019. The corpus was near-deduplicated at the paragraph level, normalised via transliteration into the Latin script, and morphosyntactically annotated, lemmatised and dependency-parsed with a prototype version of the classla pipeline (https://pypi.org/project/classla/). Each document is accompanied by its URL and title metadata. The corpus is available in CoNLL-U format and as a vertical file (with the registry included) for mounting on CQP-compatible concordancers.
Keyword:
web corpus
URL: http://hdl.handle.net/11356/1429
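Note: as an illustration of the CoNLL-U format mentioned in the abstract, the following is a minimal Python sketch of reading sentences from a CoNLL-U file using only the standard library. The function name read_conllu and the sample sentence are hypothetical and hand-written for this example; they are not taken from the meWaC distribution, and the annotation values in the sample are placeholders rather than output of the classla pipeline. The sketch keeps only the ten standard tab-separated columns and skips comment lines.

```python
from typing import Dict, Iterator, List

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper: comment lines (starting with '#') are skipped,
    and a blank line marks the end of a sentence.
    """
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level or document-level metadata; ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            # Skip lines that do not have exactly ten columns.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Hand-written sample for illustration only (not taken from the corpus).
SAMPLE = "\n".join([
    "# text = Ovo je primjer .",
    "1\tOvo\tovaj\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tje\tbiti\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tprimjer\tprimjer\tNOUN\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
])

sentences = list(read_conllu(SAMPLE))
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["deprel"])
```

For a real corpus file, the text would be read from disk (or streamed line by line) instead of from the SAMPLE string; a dedicated CoNLL-U library may be preferable when multiword tokens or the feats column need full parsing.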
BASE
17.
Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
BASE
20.
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
BASE