1. Coreference in Universal Dependencies 1.0 (CorefUD 1.0)
Nedoluzhko, Anna; Novák, Michal; Popel, Martin; Žabokrtský, Zdeněk; Zeldes, Amir; Zeman, Daniel; Bourgonje, Peter; Cinková, Silvie; Hajič, Jan; Hardmeier, Christian; Krielke, Pauline; Landragin, Frédéric; Lapshinova-Koltunski, Ekaterina; Martí, M. Antònia; Mikulová, Marie; Ogrodniczuk, Maciej; Recasens, Marta; Stede, Manfred; Straka, Milan; Toldova, Svetlana; Vincze, Veronika; Žitkus, Voldemaras. Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2022
Abstract:
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. The current version, CorefUD 1.0, consists of 17 datasets for 11 languages in total. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The public edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available only to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute because of their original license restrictions. It also contains the test data portions of all datasets. When using any of the harmonized datasets, please read its license (placed in the same directory as the data) and also cite the original data resource. Version 1.0 covers the same corpora and languages as the previous version 0.2; however, the English GUM dataset has been updated to a newer and larger version, and the train-dev-test split of the Czech/English PCEDT dataset has been changed to be compatible with OntoNotes. The main change, nevertheless, is in the file format: the MISC attributes have a new form and interpretation.
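As a rough illustration of how attribute-value pairs in the MISC column can be read, the sketch below splits a CoNLL-U token line into its ten tab-separated columns and parses the last one. It relies only on the general CoNLL-U conventions (pairs separated by `|`, `_` for an empty column). The function names, the attribute name `Entity`, and the sample values are assumptions made for this example and are not taken from the CorefUD 1.0 data or documentation.

```python
def parse_misc(misc):
    """Parse a CoNLL-U MISC column into a dict of attribute-value pairs."""
    if misc == "_":  # "_" marks an empty column in CoNLL-U
        return {}
    attrs = {}
    for pair in misc.split("|"):  # pairs are separated by "|"
        key, _, value = pair.partition("=")  # split on the first "=" only
        attrs[key] = value
    return attrs


def misc_of(token_line):
    """Return the parsed MISC column (10th field) of one token line."""
    columns = token_line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    return parse_misc(columns[9])


# Invented token line; "Entity=e1" is a hypothetical coreference attribute.
sample = "\t".join(
    ["1", "Anna", "Anna", "PROPN", "_", "_", "2", "nsubj", "_",
     "Entity=e1|SpaceAfter=No"]
)
attributes = misc_of(sample)
print(attributes)
```

Because the abstract notes that the MISC attributes changed form and interpretation between versions 0.2 and 1.0, any real reader should follow the format description shipped with the specific release rather than this simplified sketch.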
Keywords:
bridging relations; coreference; dependency; harmonized annotation; treebank
URL: http://hdl.handle.net/11234/1-4698
BASE

3. Addressing Syntax-Based Semantic Complementation: Incorporating Entity and Soft Dependency Constraints into Metonymy Resolution
In: Future Internet; Volume 14; Issue 3; Pages: 85 (2022)
BASE

4. The Association between Mothers’ Smartphone Dependency and Preschoolers’ Problem Behavior and Emotional Intelligence
In: Healthcare; Volume 10; Issue 2; Pages: 185 (2022)
BASE

5. Multitask Pointer Network for Multi-Representational Parsing
BASE

6. Joint learning of morphology and syntax with cross-level contextual information flow
In: 2022 ; 1 ; 33 (2022)
BASE

7. Analyse en dépendances du français avec des plongements contextualisés [Dependency parsing of French with contextualized embeddings]
In: 28e Conférence sur le Traitement Automatique des Langues Naturelles ; https://hal.archives-ouvertes.fr/hal-03223424 ; 28e Conférence sur le Traitement Automatique des Langues Naturelles, Jun 2021, Lille (virtuel), France (2021)
BASE

8. Study of non-projective dependencies in French ; Étude des dépendances syntaxiques non projectives en français
In: ISSN: 1248-9433 ; EISSN: 1965-0906 ; Revue TAL ; https://hal.inria.fr/hal-03389157 ; Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2021, 62 (1) (2021)
BASE

9. Bigger is not always better: Viability selection on body mass varies across life stages in a hibernating mammal.
In: Ecology and evolution, vol 11, iss 7 (2021)
BASE

11. To be or not to be adultlike in syntax: An experimental study of language acquisition and processing in children ...
BASE