
Search in the Catalogues and Directories

Hits 1–20 of 1,455

1
Assessing the impact of OCR noise on multilingual event detection over digitised documents
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
Abstract: Event detection (ED) is a crucial task in natural language processing (NLP); it involves identifying instances of specified types of events in text and classifying them into event types. Detecting events in digitised documents could enable historians to gather and combine a large amount of information into an integrated whole, a panoramic interpretation of the past. However, the degradation of digitised documents and the quality of optical character recognition (OCR) tools may hinder the performance of an event detection system. While several studies have addressed event detection in historical documents, they relied on hand-validated transcriptions, which required considerable expert effort and labour-intensive manual work. In this study, we therefore explore the robustness of two language-independent event detection models to OCR noise on two datasets that cover different event types and multiple languages. We aim to analyse their ability to mitigate problems caused by low-quality digitised documents, and we simulate transcribed data by injecting synthetic noise into clean annotated text. To create the noisy synthetic data, we use four main types of noise that commonly occur after digitisation: Character Degradation, Bleed Through, Blur, and Phantom Character. Finally, we conclude that dataset imbalance, the richness of the different annotation styles, and language characteristics are the most important factors influencing event detection in digitised documents.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Digitised Documents; Event Detection; Information Extraction
URL: https://hal.archives-ouvertes.fr/hal-03635985/file/IJDL2022-Assessing%20the%20Impact%20of%20OCR%20Noise%20on%20Multilingual%20Event%20Detection%20over%20Digitised%20Documents.pdf
https://doi.org/10.1007/s00799-022-00325-2
https://hal.archives-ouvertes.fr/hal-03635985/document
https://hal.archives-ouvertes.fr/hal-03635985
BASE
2
PROTECT: A Pipeline for Propaganda Detection and Classification
In: CLiC-it 2021- Italian Conference on Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-03417019 ; CLiC-it 2021- Italian Conference on Computational Linguistics, Jan 2022, Milan, Italy (2022)
BASE
3
Computational Measures of Deceptive Language: Prospects and Issues
In: ISSN: 2297-900X ; EISSN: 2297-900X ; Frontiers in Communication ; https://hal.archives-ouvertes.fr/hal-03629780 ; Frontiers in Communication, Frontiers, 2022, 7, pp.792378. ⟨10.3389/fcomm.2022.792378⟩ (2022)
BASE
4
Выявление транснациональных преступлений как направление уголовной политики Российской Федерации ... : Detection of transnational crimes as a direction of criminal policy of Russian Federation ...
Астахова Е.А. - : Правовая политика и правовая жизнь [Legal Policy and Legal Life], 2022
BASE
5
Potential of automatic speech processing technologies for early detection of oral language disorders: a meta-analytic review ...
Bonnet, Camille. - : Open Science Framework, 2022
BASE
6
Low-cost electronic circuitry for photoacoustic gas sensing ...
BASE
7
Low-cost electronic circuitry for photoacoustic gas sensing ...
BASE
8
Parkinson detection by analyzing speech signals ...
Μαρτινοπούλου, Ευσταθία Ηρακλή. - : Aristotle University of Thessaloniki, 2022
BASE
9
Assessing the Linguistic Quality of REST APIs for IoT Applications ...
BASE
10
Assessing the Linguistic Quality of REST APIs for IoT Applications ...
BASE
11
Assessing the Linguistic Quality of REST APIs for IoT Applications ...
BASE
12
Detection and Recognition of Asynchronous Auditory/Visual Speech: Effects of Age, Hearing Loss, and Talker Accent ...
Gordon-Salant, Sandra; Schwartz, Maya; Oppler, Kelsey. - : Digital Repository at the University of Maryland, 2022
BASE
13
Data for: Speech naturalness detection and language representation in the dog brain ...
BASE
14
Data for: Speech naturalness detection and language representation in the dog brain ...
BASE
15
MIss RoBERTa WiLDe: Metaphor Identification Using Masked Language Model with Wiktionary Lexical Definitions
In: Applied Sciences; Volume 12; Issue 4; Pages: 2081 (2022)
BASE
16
A White-Box Sociolinguistic Model for Gender Detection
In: Applied Sciences; Volume 12; Issue 5; Pages: 2676 (2022)
BASE
17
Artificial Intelligence in Digestive Endoscopy—Where Are We and Where Are We Going?
In: Diagnostics; Volume 12; Issue 4; Pages: 927 (2022)
BASE
18
Detection of Chinese Deceptive Reviews Based on Pre-Trained Language Model
In: Applied Sciences; Volume 12; Issue 7; Pages: 3338 (2022)
BASE
19
Toward Realigning Automatic Speaker Verification in the Era of COVID-19
In: Sensors; Volume 22; Issue 7; Pages: 2638 (2022)
BASE
20
Spam Reviews Detection in the Time of COVID-19 Pandemic: Background, Definitions, Methods and Literature Analysis
In: Applied Sciences; Volume 12; Issue 7; Pages: 3634 (2022)
BASE


© 2013 - 2024 Lin|gu|is|tik