
Hits 1 – 20 of 80

1	Coreference in Universal Dependencies 1.0 (CorefUD 1.0)
	Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Bourgonje, Peter; Cinková, Silvie; Hajič, Jan; Hardmeier, Christian; Krielke, Pauline; Landragin, Frédéric; Lapshinova-Koltunski, Ekaterina; Martí, M. Antònia; Mikulová, Marie; Ogrodniczuk, Maciej; Recasens, Marta; Stede, Manfred; Straka, Milan; Toldova, Svetlana; Vincze, Veronika; Žitkus, Voldemaras. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022
	Abstract: CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.0 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please familiarize yourself with its license (placed in the same directory as the data) and cite the original data resource too. Version 1.0 consists of the same corpora and languages as the previous version 0.2; however, the English GUM dataset has been updated to a newer and larger version, and in the Czech/English PCEDT dataset, the train-dev-test split has been changed to be compatible with OntoNotes. Nevertheless, the main change is in the file format (the MISC attributes have a new form and interpretation).
	Keyword: bridging relations; coreference; dependency; harmonized annotation; treebank
	URL: http://hdl.handle.net/11234/1-4698
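	Example: the abstract above says that coreference and bridging information is stored as attribute-value pairs in the MISC column of CoNLL-U files. The following is a minimal sketch of how such pairs could be read, assuming only the general CoNLL-U conventions (ten tab-separated columns, MISC last, pairs separated by `|`, `_` for an empty field). The function names and the `ExampleCoref` attribute in the sample line are hypothetical and are not taken from the CorefUD release; the actual attribute names and their interpretation are defined in the dataset documentation.

```python
def parse_misc(misc: str) -> dict[str, str]:
    """Split a CoNLL-U MISC field into a dict of attribute-value pairs."""
    # An underscore (or an empty string) marks an empty MISC field.
    if misc == "_" or not misc:
        return {}
    pairs = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else ""
    return pairs


def misc_of_token_line(line: str) -> dict[str, str]:
    """Return the parsed MISC column of a single CoNLL-U token line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    return parse_misc(columns[9])


# Hypothetical token line; "ExampleCoref" is an invented attribute name.
sample = "1\tMary\tMary\tPROPN\tNNP\tNumber=Sing\t2\tnsubj\t_\tExampleCoref=c1|SpaceAfter=No"
attributes = misc_of_token_line(sample)
```

	Comment lines (starting with `#`) and blank sentence separators would have to be skipped before calling `misc_of_token_line`; the sketch handles token lines only.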

2	GECCC Grammar Error Correction Corpus for Czech
	Náplava, Jakub; Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022

3	Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)
	Mikulová, Marie; Straka, Milan; Štěpánek, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022

4	NameTag service description
	Straková, Jana; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

5	MorfFlex CZ 2.0
	Hajič, Jan; Hlaváčová, Jaroslava; Mikulová, Marie. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

6	Universal Dependencies 2.9
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021

7	Universal Dependencies 2.8.1
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021

8	HaCzech: Dataset of Handwritten Czech
	Procházka, Štěpán; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

9	Sentiment Analysis (Czech Model)
	Vysušilová, Petra; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

10	Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
	Hajič, Jan; Bejček, Eduard; Bémová, Alevtina. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

11	POS Tagging and Lemmatization (Czech model)
	Vysušilová, Petra; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

12	RobeCzech Base
	Straka, Milan; Náplava, Jakub; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

13	NameTag 2 Models (2021-09-16)
	Straková, Jana; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

14	CoNLL-based Extended Czech Named Entity Corpus 2.0
	Konkol, Michal; Konopík, Miloslav; Ševčíková, Magda. - : University of West Bohemia, 2021

15	Universal Dependencies 2.8
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021

16	Czech HS Contracts Dataset (CHSC) 1.0
	Szabó, Adam; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021

17	RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model ...
	Straka, Milan; Náplava, Jakub; Straková, Jana. - : arXiv, 2021

18	ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5 ...
	Samuel, David; Straka, Milan. - : arXiv, 2021

19	Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)
	Guillaume, Bruno; Ramisch, Carlos; Waszczuk, Jakub. - : PARSEME, 2020

20	Slovak MorphoDiTa Models 170914
	Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2020
