1. Automatic Detection of Entity-Manipulated Text using Factual Knowledge ...
2. Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ...
3. Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling ...
4. Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ...
5. NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ...
6. Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings ...
7. Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ...
8. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...

Abstract: Pre-trained language models (LMs) are currently integral to many natural language processing systems. Although multilingual LMs were also introduced to serve many languages, these have limitations such as high inference cost and constraints on the size and diversity of the non-English data involved in their pre-training. We remedy these issues for a collection of diverse Arabic varieties by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT. To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation. ARLUE is built using 42 datasets targeting six different task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ARLUE, our models collectively achieve new state-of-the-art results on the majority of tasks (37 out of 48 classification tasks, on the 42 datasets). Our best model acquires the highest ARLUE score (77.40) across all six task clusters, ...

Note: All authors contributed equally; author order is alphabetical.

Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences

URL: https://dx.doi.org/10.48550/arxiv.2101.01785 https://arxiv.org/abs/2101.01785
9. AraT5: Text-to-Text Transformers for Arabic Language Generation ...
10. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
11. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
12. DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ...
13. Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ...
14. One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble ...
15. Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ...
16. Automatic Detection of Machine Generated Text: A Critical Survey ...
17. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
18. Proceedings of the Fifth Arabic Natural Language Processing Workshop
19. AraWEAT: Multidimensional analysis of biases in Arabic word embeddings
20. AraNet: A Deep Learning Toolkit for Arabic Social Media ...