2. Infrastructure for Semantic Annotation in the Genomics Domain
BASE
4. Towards cross-platform interoperability for machine-assisted text annotation
BASE
5. Three Dimensions of Reproducibility in Natural Language Processing
BASE
16. MULTEXT-East "1984" annotated corpus 4.0
Erjavec, Tomaž; Barbu, Ana-Maria; Derzhanski, Ivan; Dimitrova, Ludmila; Garabík, Radovan; Ide, Nancy; Kaalep, Heiki-Jaan; Kotsyba, Natalia; Krstev, Cvetana; Oravecz, Csaba; Petkevič, Vladimír; Priest-Dorman, Greg; QasemiZadeh, Behrang; Radziszewski, Adam; Simov, Kiril; Tufiş, Dan; Zdravkova, Katerina. Jožef Stefan Institute, 2015
Abstract:
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words) and its translations into a number of languages. This version of the corpus contains the linguistically annotated texts, in which each word is tagged with its lemma and its MULTEXT(-East) morphosyntactic description (MSD, i.e., a fine-grained PoS tag based on feature structures). The structurally annotated texts are a separate submission (http://hdl.handle.net/11356/1044), which covers a somewhat different set of languages.
Keywords:
manual annotation; multilingual; parallel corpus; part-of-speech tagging; Slavic languages; TEI
URL: http://hdl.handle.net/11356/1043
BASE