1 |
Automatic Detection of Entity-Manipulated Text using Factual Knowledge ...
BASE
2 |
Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ...
3 |
Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling ...
4 |
Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ...
5 |
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ...
6 |
Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings ...
7 |
Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ...
8 |
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
9 |
AraT5: Text-to-Text Transformers for Arabic Language Generation ...
10 |
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
11 |
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
12 |
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ...
13 |
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ...
14 |
One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble ...
15 |
Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ...
Abstract:
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties. Inspired by geolocation research, we propose the novel task of Micro-Dialect Identification (MDI) and introduce MARBERT, a new language model with striking abilities to predict a fine-grained variety (as small as that of a city) given a single, short message. For modeling, we offer a range of novel spatially and linguistically-motivated multi-task learning models. To showcase the utility of our models, we introduce a new, large-scale dataset of Arabic micro-varieties (low-resource) suited to our tasks. MARBERT predicts micro-dialects with 9.9% F1, ~76X better than a majority class baseline. Our new language model also establishes new state-of-the-art on several external tasks. ... Accepted in EMNLP 2020 ...
Keyword:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2010.04900 https://arxiv.org/abs/2010.04900
16 |
Automatic Detection of Machine Generated Text: A Critical Survey ...
17 |
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
18 |
Proceedings of the Fifth Arabic Natural Language Processing Workshop
19 |
AraWEAT: Multidimensional analysis of biases in Arabic word embeddings
20 |
AraNet: A Deep Learning Toolkit for Arabic Social Media ...