6 |
The ParlaMint corpora of parliamentary proceedings
|
|
|
|
In: Lang Resour Eval (2022)
|
|
BASE
|
|
|
|
7 |
Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
|
|
|
|
BASE
|
|
|
|
8 |
Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1
|
|
|
|
BASE
|
|
|
|
12 |
The FAIR Index of CMC Corpora
|
|
|
|
In: CMC Corpora through the prism of Digital Humanities (2020) ; https://hal.archives-ouvertes.fr/hal-03121698
|
|
BASE
|
|
|
|
15 |
Unfinished Business: Construction and Maintenance of a Semantically Tagged Historical Parliamentary Corpus, UK Hansard from 1803 to the present day
|
|
|
|
BASE
|
|
|
|
20 |
Corpus of Academic Slovene (BSc/BA theses) KAS-dipl 1.0
|
|
Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; Hrovat, Goran. - : Jožef Stefan Institute, 2019. : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2019
|
|
Abstract:
The KAS-dipl corpus of Slovene BSc/BA theses consists of almost 65,000 texts (3.5 million pages or 1.1 billion tokens) written between 2000 and 2018 and gathered from the digital libraries of Slovene higher education institutions via the Slovene Open Science portal (http://openscience.si). Each thesis comes with extensive metadata, and the corpus contains only its textual body, i.e. without front and back matter. The body is divided into pages, these into paragraphs, and then into sentences. The sentence tokens are morphosyntactically annotated, words are lemmatised, and English-Slovene pairs of term candidates are marked up and linked. The corpus is distributed in the canonical TEI encoding, in the so-called vertical format used by the (no)Sketch Engine and CWB concordancers, and as plain text files. Each format distribution also contains a file with thesis metadata. This repository entry contains the corpus of BSc/BA theses only; separate entries are available for PhD theses (KAS-dr: http://hdl.handle.net/11356/1265), MSc/MA theses (KAS-mag: http://hdl.handle.net/11356/1266) and the complete KAS corpus with all three (KAS: http://hdl.handle.net/11356/1244).
|
|
Keyword:
academic writing; BSc/BA theses; TEI; terminology
|
|
URL: http://hdl.handle.net/11356/1267
|
|
BASE
|
|
|
|