1. MasakhaNER: Named entity recognition for African languages
In: Transactions of the Association for Computational Linguistics, The MIT Press, 2021, ⟨10.1162/tacl⟩ ; EISSN: 2307-387X ; https://hal.inria.fr/hal-03350962
2. Evaluating the Morphosyntactic Well-formedness of Generated Texts ...
3. Explorations in Transfer Learning for OCR Post-Correction ...
4. Evaluating the Morphosyntactic Well-formedness of Generated Texts ...
5. Lexically Aware Semi-Supervised Learning for OCR Post-Correction ...
Abstract:
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction methods that improve the results of general-purpose OCR systems on recognition of less-well-resourced languages. However, these methods rely on manually curated post-correction data, which are relatively scarce compared to the non-annotated raw images that need to be digitized. In this paper, we present a semi-supervised learning method that makes it possible to utilize these raw images to improve performance, specifically through the use of self-training, a technique where a model is iteratively trained on its own outputs. In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically-aware decoding method that augments the neural post-correction model with a count-based language model constructed from the ...
Accepted to the Transactions of the Association for Computational Linguistics (TACL) ...
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2111.02622 https://dx.doi.org/10.48550/arxiv.2111.02622
6. Lexically-Aware Semi-Supervised Learning for OCR Post-Correction ...
7. Dependency Induction Through the Lens of Visual Perception ...
8. Dependency Induction Through the Lens of Visual Perception ...
9. AlloVera: a multilingual allophone database
In: LREC 2020: 12th Language Resources and Evaluation Conference, European Language Resources Association, May 2020, Marseille, France ; https://halshs.archives-ouvertes.fr/halshs-02527046 ; https://lrec2020.lrec-conf.org/
11. A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization ...
12. Temporally-Informed Analysis of Named Entity Recognition ...
13. Temporally-Informed Analysis of Named Entity Recognition ...
14. AlloVera: a multilingual allophone database
In: LREC 2020: 12th Language Resources and Evaluation Conference, European Language Resources Association, May 2020, Marseille, France ; https://halshs.archives-ouvertes.fr/halshs-02527046 ; https://lrec2020.lrec-conf.org/
15. Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
In: Transactions of the Association for Computational Linguistics, Vol 8, Pp 109-124 (2020)
17. Zero-shot Neural Transfer for Cross-lingual Entity Linking ...