1. Two New Datasets for Italian-Language Abstractive Text Summarization
In: Information; Volume 13; Issue 5; Pages: 228 (2022)

2. A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada, pp. 2328-2334; https://hal.archives-ouvertes.fr/hal-03418387; ⟨10.1145/3404835.3463255⟩ (2021)

4. Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets
In: Future Internet; Volume 13; Issue 11 (2021)

8. Crowdsourcing Dialect Characterization through Twitter
In: PLoS ONE, Public Library of Science, 2014, 9 (e112074); ISSN/EISSN: 1932-6203; https://hal-amu.archives-ouvertes.fr/hal-01242109; ⟨10.1371/journal.pone.0112074⟩ (2014)

9. The Bible, Truth, and Multilingual OCR Evaluation
In: http://lampsrv02.umiacs.umd.edu/pubs/Papers/CDRRVI99-BibleTruth/CDRRVI99-BibleTruth.pdf (1999)

10. The Bible, Truth, and Multilingual OCR Evaluation
In: http://lamp.cfar.umd.edu/Media/Publications/Papers/CDRRVI99-BibleTruth/CDRRVI99-BibleTruth.ps (1999)

Abstract:
Multilingual OCR has emerged as an important information technology, thanks to the increasing need for cross-language information access. While many research groups and companies have developed OCR algorithms for various languages, it is difficult to compare the performance of these OCR algorithms across languages. This difficulty arises because most evaluation methodologies rely on a document image dataset in each of these languages, and it is difficult to find document datasets in different languages that are similar in content, layout, and fonts. In this paper we propose to use the Bible as a dataset for comparing OCR accuracy across languages. Besides being available in a wide range of languages, Bible translations are closely parallel in content, carefully translated, surprisingly relevant with respect to modern-day language, and quite inexpensive. A project at the University of Maryland is currently implementing this idea. We have created a scanned image dataset with groundt…

Keyword:
Bible; datasets; groundtruth; linguistics; OCR; parallel corpus; performance evaluation
URL: http://lamp.cfar.umd.edu/Media/Publications/Papers/CDRRVI99-BibleTruth/CDRRVI99-BibleTruth.ps; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9460

11. Whole is Greater than Sum of Parts: Recognizing Scene Text Words
In: http://cvit.iiit.ac.in/papers/Anand2013Whole.pdf

12. The GERMANA database
In: 2009 10th International Conference on Document Analysis and Recognition (ICDAR 2009); http://www.cvc.uab.es/icdar2009/papers/3725a301.pdf