
Search in the Catalogues and Directories

Hits 1 – 3 of 3

1
Benchmark for the evaluation of named entity recognition over ancient documents ...
Abstract: The dataset consists of multilingual noisy corpora for named entity recognition (NER). The noisy versions are simulated from the CoNLL-02 (Spanish and Dutch) and CoNLL-03 (English) NER corpora. The original collections are re-OCRed, and four types of noise at two different levels are added in order to simulate varied OCR output. More precisely, we first extracted the raw texts and converted them into images. These images were then contaminated with noise of the kinds commonly introduced by a scanner. We then extracted the OCRed data using the open-source Tesseract OCR engine, version 3.04.01. As a consequence of the inserted image noise, the OCRed data contains degradations. Finally, the original and noisy texts are aligned. This archive contains three folders (one per language). The folders contain the degraded images, the noisy texts extracted by the OCR, and their versions aligned with the clean data. These are the supplementary materials for the TPDL 2020 paper "Assessing and minimizing the impact of OCR quality on named entity recognition". If ...
Keyword: OCR, named entity recognition, noisy, degradation
URL: https://dx.doi.org/10.5281/zenodo.3877554
https://zenodo.org/record/3877554
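
The last step named in the abstract above, aligning the original and noisy texts, can be illustrated with a minimal sketch. The abstract does not say how the dataset's alignment was produced, so the token-level approach below (Python's standard `difflib`) is an assumption, and the function name `align_tokens` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(clean, noisy):
    """Pair clean tokens with the OCRed tokens that correspond to them.

    Hypothetical helper, not the dataset's own alignment procedure.
    """
    # autojunk=False keeps frequent tokens from being ignored on long inputs.
    matcher = SequenceMatcher(a=clean, b=noisy, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical runs align one-to-one.
            pairs.extend(zip(clean[i1:i2], noisy[j1:j2]))
        else:
            # Replaced, deleted, or inserted runs are kept as one block,
            # since OCR may split or merge tokens.
            pairs.append((" ".join(clean[i1:i2]), " ".join(noisy[j1:j2])))
    return pairs


# Invented example: OCR confuses the letter "l" with the digit "1".
clean = "Peter Blackburn works in Brussels".split()
noisy = "Peter B1ackburn works in Brusse1s".split()
pairs = align_tokens(clean, noisy)
for original, ocred in pairs:
    print(f"{original}\t{ocred}")
```

With such pairs, NER labels attached to the clean tokens could be projected onto their noisy counterparts, which is one plausible use of an aligned version.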
BASE
2
Benchmark for the evaluation of named entity recognition over ancient documents ...
BASE
3
Lexicographical-Based Order for Post-OCR Correction of Named Entities
In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov 2017, Kyoto, Japan, pp. 1192-1197, ⟨10.1109/ICDAR.2017.197⟩ (2017) ; https://hal.archives-ouvertes.fr/hal-02889925
BASE

Open access documents: 3
© 2013 - 2024 Lin|gu|is|tik