3. Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)
BASE
16. Czech HS Contracts Dataset (CHSC) 1.0
Szabó, Adam; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
Abstract:
The Czech Contracts dataset was created as part of the thesis Low-resource Text Classification (A. Szabó, 2021, MFF UK). The contracts were obtained from the Hlídač státu web portal. Labels in the development and training sets were assigned automatically using the keyword method described in the thesis Automatická klasifikace smluv pro portál HlidacSmluv.cz (J. Maroušek, 2020, MFF UK). Because these automatic labels contain a certain amount of noise, the goal of classification is not to reach 100% on the development set. The test set is manually annotated. The dataset contains 97,493 contracts in total.
Keywords:
contracts; Czech; document classification; Hlídač státu
URL: http://hdl.handle.net/11234/1-3731
BASE
17. RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model ...
BASE
18. ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 ...
BASE
19. Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
BASE
20. Slovak MorphoDiTa Models 170914
Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2020
BASE