
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
Canonicalizing the Deutsches Textarchiv
Jurish, Bryan. - 2013
Abstract: Virtually all conventional text-based natural language processing techniques - from traditional information retrieval systems to full-fledged parsers - require reference to a fixed lexicon accessed by surface form, typically trained from or constructed for synchronic input text adhering strictly to contemporary orthographic conventions. Unconventional input such as historical text which violates these conventions therefore presents difficulties for any such system due to lexical variants present in the input but missing from the application lexicon. To facilitate the extension of synchronically-oriented natural language processing techniques to historical text while minimizing the need for specialized lexical resources, one may first attempt an automatic canonicalization of the input text. This paper provides an informal overview of the various canonicalization techniques currently employed by the Deutsches Textarchiv project at the Berlin-Brandenburg Academy of Sciences and Humanities to prepare a corpus of historical German text for part-of-speech tagging, lemmatization, and integration into a robust online information retrieval system.
Keywords: ddc:430; German; corpus
URN: urn:nbn:de:kobv:b4-opus-24433
URL: https://edoc.bbaw.de/frontdoor/index/index/docId/2165
https://nbn-resolving.org/urn:nbn:de:kobv:b4-opus-24433
https://edoc.bbaw.de/files/2165/Jurish.pdf
BASE
2
Constructing a canonicalized corpus of historical German by text alignment
In: New Methods in Historical Corpora (2013), 221-234
IDS Bibliografie zur deutschen Grammatik
