
Search in the Catalogues and Directories

Hits 1 – 14 of 14

1
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques
In: Conférence en Recherche d'Informations et Applications (CORIA 2021) ; https://hal.archives-ouvertes.fr/hal-03320332 ; Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtuel), France. pp.1 - 7 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
Abstract: National audience ; This paper tackles the task of NER applied to historical texts obtained from processing digital images of newspapers using OCR techniques. The main challenge for this task is that the OCR process leads to misspellings and linguistic errors in the output text, which can impact the performance of the NER. We conduct a comparative evaluation on two historical datasets in German and French against previous state-of-the-art models, and we propose a model based on a hierarchical stack of Transformers to approach the NER task for historical data. Our findings show that the proposed model clearly improves the results on both historical datasets ; This article addresses named entity recognition (NER) applied to historical texts obtained by processing digital images of newspapers with optical character recognition (OCR) techniques. We argue that the main challenge for this task is that the OCR process produces texts containing, among other things, spelling mistakes and syntax errors. In addition, semantic variations may be present in old documents, which affects the performance of named entity recognition. We conduct a comparative evaluation against the state of the art on two historical datasets in German and French, and we propose a model based on a hierarchical stack of Transformer layers to approach named entity recognition in historical data. Our results show that the proposed model clearly improves the results on both datasets
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; données historiques; données multi-lingues; Extraction d’information; Historical data; Information extraction; Multilingual data; Named entity recognition; reconnaissance d’entités nommées
URL: https://hal.archives-ouvertes.fr/hal-03320332/document
https://hal.archives-ouvertes.fr/hal-03320332
https://hal.archives-ouvertes.fr/hal-03320332/file/main%281%29.pdf
BASE
2
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ; https://hal.archives-ouvertes.fr/hal-03418387 ; SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2328-2334, ⟨10.1145/3404835.3463255⟩ (2021)
BASE
3
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
BASE
4
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
BASE
5
Robust Named Entity Recognition and Linking on Historical Multilingual Documents
In: Conference and Labs of the Evaluation Forum (CLEF 2020) ; https://hal.archives-ouvertes.fr/hal-03026969 ; Conference and Labs of the Evaluation Forum (CLEF 2020), Sep 2020, Thessaloniki, Greece. pp.1-17, ⟨10.5281/zenodo.4068074⟩ ; http://ceur-ws.org/Vol-2696/paper_171.pdf (2020)
BASE
6
Compressive approaches for cross-language multi-document summarization
In: ISSN: 0169-023X ; Data and Knowledge Engineering ; https://hal.archives-ouvertes.fr/hal-02556889 ; Data and Knowledge Engineering, Elsevier, 2020, 125, pp.101763. ⟨10.1016/j.datak.2019.101763⟩ (2020)
BASE
7
Linking Named Entities across Languages using Multilingual Word Embeddings
In: JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020 ; ACM/IEEE Joint Conference on Digital Libraries - JCDL 2020 ; https://hal.archives-ouvertes.fr/hal-03026933 ; ACM/IEEE Joint Conference on Digital Libraries - JCDL 2020, Aug 2020, Wuhan, Hubei - Virtual event, China. pp.329-332, ⟨10.1145/3383583.3398597⟩ ; https://dl.acm.org/doi/10.1145/3383583.3398597 (2020)
BASE
8
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
BASE
9
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
BASE
10
A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming ...
BASE
11
A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming ...
BASE
12
TLR at BSNLP2019: A Multilingual Named Entity Recognition System
In: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing ; 7th Workshop on Balto-Slavic Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-02364839 ; 7th Workshop on Balto-Slavic Natural Language Processing, Aug 2019, Florence, Italy. pp.83-88, ⟨10.18653/v1/W19-3711⟩ ; https://www.aclweb.org/anthology/W19-3711/ (2019)
BASE
13
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
In: 11th International Conference on Language Resources and Evaluation (LREC) ; https://hal.archives-ouvertes.fr/hal-01722130 ; 11th International Conference on Language Resources and Evaluation (LREC), 2018, Miyazaki, Japan (2018)
BASE
14
A First Summarization System of a Video in a Target Language
In: MISSI 2018 - 11th edition of the International Conference on Multimedia and Network Information Systems ; https://hal.archives-ouvertes.fr/hal-01819720 ; MISSI 2018 - 11th edition of the International Conference on Multimedia and Network Information Systems, Sep 2018, Wrocław, Poland. pp.1-12 (2018)
BASE
