1. Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ...
2. NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ...
3. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
Abstract: Pre-trained language models (LMs) are currently integral to many natural language processing systems. Although multilingual LMs were also introduced to serve many languages, these have limitations such as being costly at inference time and the size and diversity of non-English data involved in their pre-training. We remedy these issues for a collection of diverse Arabic varieties by introducing two powerful deep bidirectional transformer-based models, ARBERT and MARBERT. To evaluate our models, we also introduce ARLUE, a new benchmark for multi-dialectal Arabic language understanding evaluation. ARLUE is built using 42 datasets targeting six different task clusters, allowing us to offer a series of standardized experiments under rich conditions. When fine-tuned on ARLUE, our models collectively achieve new state-of-the-art results across the majority of tasks (37 out of 48 classification tasks, on the 42 datasets). Our best model acquires the highest ARLUE score (77.40) across all six task clusters, ... All authors contributed equally. The order is alphabetical ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2101.01785 https://arxiv.org/abs/2101.01785
4. AraT5: Text-to-Text Transformers for Arabic Language Generation ...
5. DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ...
6. Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ...
7. Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ...
8. DiaNet: BERT and Hierarchical Attention Multi-Task Learning of Fine-Grained Dialect ...
9. Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level ...
11. JANA: A Human-Human Dialogues Corpus for Egyptian Dialect ...
12. Turn Segmentation into Utterances for Arabic Spontaneous Dialogues and Instance Messages ...