Home Catalogue search

eng

Refine your search:

Search in the Catalogues and Directories






	Sort by
Simple Search

Page: 1 2 3 4 5 6 7 8 9...690

Hits 81 – 100 of 13.783

81	Adapting BigScience Multilingual Model to Unseen Languages ...
	Yong, Zheng-Xin; Nikoulina, Vassilina. - : arXiv, 2022
	BASE
	Show details

82	On Efficiently Acquiring Annotations for Multilingual Models ...
	Moniz, Joel Ruben Antony; Patra, Barun; Gormley, Matthew R.. - : arXiv, 2022
	BASE
	Show details

83	Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ...
	Bhattacharya, Sunit; Kumar, Rishu; Bojar, Ondrej. - : arXiv, 2022
	BASE
	Show details

84	Does Corpus Quality Really Matter for Low-Resource Languages? ...
	Artetxe, Mikel; Aldabe, Itziar; Agerri, Rodrigo; Perez-de-Viñaspre, Olatz; Soroa, Aitor. - : arXiv, 2022
	Abstract: The vast majority of non-English corpora are derived from automatically filtered versions of CommonCrawl. While prior work has identified major issues on the quality of these datasets (Kreutzer et al., 2021), it is not clear how this impacts downstream performance. Taking Basque as a case study, we explore tailored crawling (manually identifying and scraping websites with high-quality content) as an alternative to filtering CommonCrawl. Our new corpus, called EusCrawl, is similar in size to the Basque portion of popular multilingual corpora like CC100 and mC4, yet it has a much higher quality according to native annotators. For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100. Nevertheless, we obtain similar results on downstream tasks regardless of the corpus used for pre-training. Our work suggests that NLU performance in low-resource languages is primarily constrained by the quantity rather than the quality of the data, prompting for ...
	Keyword: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
	URL: https://dx.doi.org/10.48550/arxiv.2203.08111 https://arxiv.org/abs/2203.08111
	BASE
	Hide details

85	IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ...
	Biradar, Shankar; Saumya, Sunil. - : arXiv, 2022
	BASE
	Show details

86	mSLAM: Massively multilingual joint pre-training for speech and text ...
	Bapna, Ankur; Cherry, Colin; Zhang, Yu. - : arXiv, 2022
	BASE
	Show details

87	On the Representation Collapse of Sparse Mixture of Experts ...
	Chi, Zewen; Dong, Li; Huang, Shaohan. - : arXiv, 2022
	BASE
	Show details

88	Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ...
	Antypas, Dimosthenis; Preece, Alun; Collados, Jose Camacho. - : arXiv, 2022
	BASE
	Show details

89	L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ...
	Velankar, Abhishek; Patil, Hrushikesh; Gore, Amol. - : arXiv, 2022
	BASE
	Show details

90	Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ...
	Amin, Saadullah; Goldstein, Noon Pokaratsiri; Wixted, Morgan Kelly. - : arXiv, 2022
	BASE
	Show details

91	A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ...
	Sun, Xin; Ge, Tao; Ma, Shuming. - : arXiv, 2022
	BASE
	Show details

92	A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
	Lees, Alyssa; Tran, Vinh Q.; Tay, Yi. - : arXiv, 2022
	BASE
	Show details

93	Factual Consistency of Multilingual Pretrained Language Models ...
	Fierro, Constanza; Søgaard, Anders. - : arXiv, 2022
	BASE
	Show details

94	Examining Scaling and Transfer of Language Model Architectures for Machine Translation ...
	Zhang, Biao; Ghorbani, Behrooz; Bapna, Ankur. - : arXiv, 2022
	BASE
	Show details

95	MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
	Nielsen, Dan Saattrup; McConville, Ryan. - : arXiv, 2022
	BASE
	Show details

96	Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ...
	Velankar, Abhishek; Patil, Hrushikesh; Joshi, Raviraj. - : arXiv, 2022
	BASE
	Show details

97	Agreement ...
	Tal, Shira. - : Open Science Framework, 2022
	BASE
	Show details

98	Agreement ...
	Tal, Shira. - : Open Science Framework, 2022
	BASE
	Show details

99	Natural Language Descriptions of Deep Visual Features ...
	Hernandez, Evan; Schwettmann, Sarah; Bau, David. - : arXiv, 2022
	BASE
	Show details

100	From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ...
	Vacareanu, Robert; Valenzuela-Escarcega, Marco A.; Barbosa, George C. G.. - : arXiv, 2022
	BASE
	Show details

Page: 1 2 3 4 5 6 7 8 9...690

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern