1 |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
|
|
|
|
In: https://hal.inria.fr/hal-03177623 ; 2021 (2021)
|
|
2 |
MasakhaNER: Named entity recognition for African languages
|
|
Adelani, David; Abbott, Jade; Neubig, Graham; D'Souza, Daniel; Kreutzer, Julia; Lignos, Constantine; Palen-Michel, Chester; Buzaaba, Happy; Rijhwani, Shruti; Ruder, Sebastian; Mayhew, Stephen; Abebe Azime, Israel; Muhammad, Shamsuddeen; Chinenye Emezue, Chris; Nakatumba-Nabende, Joyce; Ogayo, Perez; Aremu, Anuoluwapo; Gitau, Catherine; Mbaye, Derguene; Alabi, Jesujoba; Yimam, Seid; Rabiu Gwadabe, Tajuddeen; Ezeani, Ignatius; Niyongabo, Rubungo; Mukiibi, Jonathan; Otiende, Verrah; Orife, Iroro; David, Davis; Ngom, Samba; Adewumi, Tosin; Rayson, Paul; Adeyemi, Mofetoluwa; Muriuki, Gerald; Anebi, Emmanuel; Chukwuneke, Chiamaka; Odu, Nkiruka; Wairagala, Eric; Oyerinde, Samuel; Siro, Clemencia; Saul Bateesa, Tobius; Oloyede, Temilola; Wambui, Yvonne; Akinode, Victor; Nabagereka, Deborah; Katusiime, Maurice; Awokoya, Ayodele; Mboup, Mouhamadane; Gebreyohannes, Dibora; Tilaye, Henok; Nwaike, Kelechi; Wolde, Degaga; Faye, Abdoulaye; Sibanda, Blessing; Ahia, Orevaoghene; Dossou, Bonaventure; Ogueji, Kelechi; Thierno, Ibrahima; DIALLO, Abdoulaye; Akinfaderin, Adewale; Marengereke, Tendai; Osei, Salomey
|
|
In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03350962 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021, ⟨10.1162/tacl⟩ (2021)
|
|
Abstract:
We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
|
|
URL: https://hal.inria.fr/hal-03350962 https://doi.org/10.1162/tacl https://hal.inria.fr/hal-03350962/document https://hal.inria.fr/hal-03350962/file/adelani_TACL2021.pdf
|
|
3 |
Modelling Latent Translations for Cross-Lingual Transfer ...
|
|
|
|
4 |
Can Multilinguality benefit Non-autoregressive Machine Translation? ...
|
|
|
|
5 |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets ...
|
|
|
|
6 |
Evaluating Multiway Multilingual NMT in the Turkic Languages ...
|
|
|
|
7 |
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation ...
|
|
|
|
8 |
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications ...
|
|
|
|
9 |
Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara ...
|
|
|
|
10 |
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
|
|
|
|
11 |
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications
|
|
|
|