1. Automatic Detection of Entity-Manipulated Text using Factual Knowledge ... (source: BASE)
2. Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ...
3. Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling ...
4. Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ...
5. NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ...
6. Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings ...
7. Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ...
8. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
9. AraT5: Text-to-Text Transformers for Arabic Language Generation ...
   Abstract: Transfer learning with a unified Transformer framework (T5) that converts all language problems into a text-to-text format was recently proposed as a simple and effective transfer learning approach. Although a multilingual version of the T5 model (mT5) was also introduced, it is not clear how well it fares on non-English tasks involving diverse data. To investigate this question, we apply mT5 on a language with a wide variety of dialects: Arabic. For evaluation, we introduce a novel benchmark for ARabic language GENeration (ARGEN), covering seven important tasks. For model comparison, we pre-train three powerful Arabic T5-style models and evaluate them on ARGEN. Although pre-trained with ~49× less data, our new models perform significantly better than mT5 on all ARGEN tasks (in 52 out of 59 test sets) and set several new SOTAs. Our models also establish new SOTA on the recently proposed, large Arabic language understanding evaluation benchmark ARLUE (Abdul-Mageed et al., 2021). Our new models are publicly ...
   Comment: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). All authors contributed equally ...
   Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
   URL: https://arxiv.org/abs/2109.12068 ; https://dx.doi.org/10.48550/arxiv.2109.12068
10. ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
11. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
12. DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ...
13. Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ...
14. One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble ...
15. Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ...
16. Automatic Detection of Machine Generated Text: A Critical Survey ...
17. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
18. Proceedings of the Fifth Arabic Natural Language Processing Workshop
19. AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings
20. AraNet: A Deep Learning Toolkit for Arabic Social Media ...