
Search in the Catalogues and Directories

Hits 1 – 20 of 257

1
Raising the Titanic: Prospects for Reviving the Century Dictionary ...
Triggs, Jeffery A.. - : Rutgers University, 2022
BASE
2
Exploiting Script Similarities to Compensate for the Large Amount of Data in Training Tesseract LSTM: Towards Kurdish OCR
In: Applied Sciences ; Volume 11 ; Issue 20 (2021)
BASE
3
Reconocimiento automático de un censo histórico impreso sin recursos lingüísticos
Anitei, Dan. - : Universitat Politècnica de València, 2021
Abstract: Automatic recognition of typeset historical documents is currently a solved problem for many collections of data. However, systems for automatic recognition of typeset historical documents still need to address several issues inherent to working with this kind of document. Degradation of the paper or smudges can increase the difficulty of correctly recognizing characters, problems that can be alleviated by using linguistic resources to train good language models that decrease the character error rate. Nonetheless, there are many collections, such as the one presented in this paper, composed of tables that contain mainly numbers and proper names, for which a language model is neither available nor useful. This paper illustrates that automatic recognition can be done successfully for a collection of documents without using any linguistic resources. The paper covers the information extraction and the targeted OCR process, specially designed for the automatic recognition of a Spanish census from the nineteenth century, registered in printed documents. Many of the problems related to historical documents are overcome by using a combination of classical computer vision techniques and deep learning. Errors, such as misrecognized characters, are detected and corrected thanks to redundant information that the census contains. Given the importance of this Spanish census for conducting demographic studies, this paper goes a step further and introduces a demonstrator model to facilitate research on this corpus by indexing the data. ; This work has been partially supported by the BBVA Foundation, as a collaboration between the PRHLT team in charge of the HisClima project and the ESPAREL project. ; Anitei, D. (2021). Reconocimiento automático de un censo histórico impreso sin recursos lingüísticos. Universitat Politècnica de València. http://hdl.handle.net/10251/172694 ; TFGM
Keyword: Censo; Census; Computer Vision; Documentos Históricos Impresos; Historical Printed Documents; LENGUAJES Y SISTEMAS INFORMATICOS; Máster Universitario en Inteligencia Artificial; Optical Character Recognition; Reconocimiento de Formas e Imagen Digital-Màster Universitari en Intel·Ligència Artificial: Reconeixement de Formes i Imatge Digital; Reconocimiento Óptico de Caracteres; Visión por Computador
URL: http://hdl.handle.net/10251/172694
BASE
4
Quality Measurement for Optical Character Recognition without ground truth data ...
Weltevrede, Mike. - : Zenodo, 2020
BASE
5
Quality Measurement for Optical Character Recognition without ground truth data ...
Weltevrede, Mike. - : Zenodo, 2020
BASE
6
AI in gastronomic tourism ...
BASE
7
Improving the recognition of Dutch Gothic machine print, at four levels in the processing pipeline, in four days ...
BASE
8
AI in gastronomic tourism ...
BASE
9
Improving the recognition of Dutch Gothic machine print, at four levels in the processing pipeline, in four days ...
BASE
10
NAT: Noise-Aware Training for Robust Neural Sequence Labeling
In: Fraunhofer IAIS (2020)
BASE
11
OPTICAL CHARACTER RECOGNITION APPLIED TO ANDROID-BASED BILINGUAL TRANSLATOR APPLICATION (ENGLISH AND INDONESIAN) TO SIGN LANGUAGE ...
Pratama, Juan Adhiasta. - : Zenodo, 2019
BASE
12
OPTICAL CHARACTER RECOGNITION APPLIED TO ANDROID-BASED BILINGUAL TRANSLATOR APPLICATION (ENGLISH AND INDONESIAN) TO SIGN LANGUAGE ...
Pratama, Juan Adhiasta. - : Zenodo, 2019
BASE
13
Bilingual text detection in natural scene images using invariant moments
Maheshwari, Karan; Joseph Raj, Alex N.; Mahesh, Vijayalakshmi G.. - : Netherlands, IOS Press, 2019
BASE
14
Wenn Algorithmen Zeitschriften lesen - vom Mehrwert automatisierter Textanreicherung ...
Wanger, Regina; Gasser, Michael. - : ETH Zurich, 2018
BASE
15
Generating a training corpus for OCR post-correction using encoder-decoder model
In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) ; International Joint Conference on Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01831147 ; International Joint Conference on Natural Language Processing, Nov 2017, Taipei, Taiwan ; https://www.aclweb.org/anthology/I17-1101 (2017)
BASE
16
Corpus linguistics for History ... : the methodology of investigating place-name discourses in digitised nineteenth-century newspapers ...
Joulain, Amelia Tahirih. - : Lancaster University, 2017
BASE
17
Radical Recognition in Off-Line Handwritten Chinese Characters Using Non-Negative Matrix Factorization
In: Senior Projects Spring 2016 (2016)
BASE
18
Using SMT for OCR error correction of historical texts
In: Afli, Haithem (ORCID 0000-0002-7449-4707); Qui, Zhengwei; Way, Andy (ORCID 0000-0001-5736-5930); Sheridan, Páraic (2016). Using SMT for OCR error correction of historical texts. In: Tenth International Conference on Language Resources and Evaluation (LREC 2016), 23-28 May 2016, Portorož, Slovenia. ISBN 978-2-9517408-9-1
BASE
19
Augmented reality applied to language translation
BASE
20
Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection ...
Bloodgood, Michael; Strauss, Benjamin. - : Digital Repository at the University of Maryland, 2016
BASE


© 2013 - 2024 Lin|gu|is|tik