1 |
Automatic Detection of Entity-Manipulated Text using Factual Knowledge ...
|
|
|
|
BASE
|
|
2 |
Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ...
|
|
|
|
BASE
|
|
3 |
Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling ...
|
|
|
|
Abstract:
Fine-tuning pre-trained language models for downstream tasks usually requires a sufficient amount of annotated data. Unfortunately, obtaining labeled data can be costly, especially for multiple language varieties and dialects. We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce varieties using only resources from data-rich ones. We demonstrate the utility of our approach in the context of Arabic sequence labeling by using a language model fine-tuned only on Modern Standard Arabic (MSA) to predict named entities (NE) and part-of-speech (POS) tags on several dialectal Arabic (DA) varieties. We show that self-training is indeed powerful, improving zero-shot MSA-to-DA transfer by as much as approximately 10% F1 (NER) and 2% accuracy (POS tagging). We achieve even better performance in few-shot scenarios with limited amounts of labeled data. We conduct an ablation study and show that the performance boost observed directly results ... : Accepted at EACL 2021 (Camera Ready Version) ...
|
|
Keyword:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; Neural and Evolutionary Computing cs.NE
|
|
URL: https://arxiv.org/abs/2101.04758 https://dx.doi.org/10.48550/arxiv.2101.04758
|
|
BASE
|
|
4 |
Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ...
|
|
|
|
BASE
|
|
5 |
NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ...
|
|
|
|
BASE
|
|
6 |
Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings ...
|
|
|
|
BASE
|
|
7 |
Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ...
|
|
|
|
BASE
|
|
8 |
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
|
|
|
|
BASE
|
|
9 |
AraT5: Text-to-Text Transformers for Arabic Language Generation ...
|
|
|
|
BASE
|
|
10 |
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ...
|
|
|
|
BASE
|
|
11 |
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
|
|
|
|
BASE
|
|
12 |
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ...
|
|
|
|
BASE
|
|
13 |
Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ...
|
|
|
|
BASE
|
|
14 |
One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble ...
|
|
|
|
BASE
|
|
15 |
Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ...
|
|
|
|
BASE
|
|
16 |
Automatic Detection of Machine Generated Text: A Critical Survey ...
|
|
|
|
BASE
|
|
17 |
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ...
|
|
|
|
BASE
|
|
18 |
Proceedings of the Fifth Arabic Natural Language Processing Workshop
|
|
|
|
BASE
|
|
19 |
AraWEAT: Multidimensional analysis of biases in Arabic word embeddings
|
|
|
|
BASE
|
|
20 |
AraNet: A Deep Learning Toolkit for Arabic Social Media ...
|
|
|
|
BASE