
Search in the Catalogues and Directories

Hits 81–100 of 13,783

81
Adapting BigScience Multilingual Model to Unseen Languages ...
BASE
82
On Efficiently Acquiring Annotations for Multilingual Models ...
BASE
83
Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ...
BASE
84
Does Corpus Quality Really Matter for Low-Resource Languages? ...
Abstract: The vast majority of non-English corpora are derived from automatically filtered versions of CommonCrawl. While prior work has identified major issues on the quality of these datasets (Kreutzer et al., 2021), it is not clear how this impacts downstream performance. Taking Basque as a case study, we explore tailored crawling (manually identifying and scraping websites with high-quality content) as an alternative to filtering CommonCrawl. Our new corpus, called EusCrawl, is similar in size to the Basque portion of popular multilingual corpora like CC100 and mC4, yet it has a much higher quality according to native annotators. For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100. Nevertheless, we obtain similar results on downstream tasks regardless of the corpus used for pre-training. Our work suggests that NLU performance in low-resource languages is primarily constrained by the quantity rather than the quality of the data, prompting for ...
Keyword: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://dx.doi.org/10.48550/arxiv.2203.08111
https://arxiv.org/abs/2203.08111
BASE
85
IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ...
Biradar, Shankar; Saumya, Sunil. - : arXiv, 2022
BASE
86
mSLAM: Massively multilingual joint pre-training for speech and text ...
Bapna, Ankur; Cherry, Colin; Zhang, Yu. - : arXiv, 2022
BASE
87
On the Representation Collapse of Sparse Mixture of Experts ...
Chi, Zewen; Dong, Li; Huang, Shaohan. - : arXiv, 2022
BASE
88
Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ...
BASE
89
L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ...
BASE
90
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ...
BASE
91
A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ...
Sun, Xin; Ge, Tao; Ma, Shuming. - : arXiv, 2022
BASE
92
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
Lees, Alyssa; Tran, Vinh Q.; Tay, Yi. - : arXiv, 2022
BASE
93
Factual Consistency of Multilingual Pretrained Language Models ...
BASE
94
Examining Scaling and Transfer of Language Model Architectures for Machine Translation ...
BASE
95
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
BASE
96
Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ...
BASE
97
Agreement ...
Tal, Shira. - : Open Science Framework, 2022
BASE
98
Agreement ...
Tal, Shira. - : Open Science Framework, 2022
BASE
99
Natural Language Descriptions of Deep Visual Features ...
BASE
100
From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ...
BASE

© 2013 - 2024 Lin|gu|is|tik