Page: 1 2 3 4 5 6 7 8 9... 690
81 |
Adapting BigScience Multilingual Model to Unseen Languages ...
|
|
|
|
BASE
|
|
Show details
|
|
82 |
On Efficiently Acquiring Annotations for Multilingual Models ...
|
|
|
|
BASE
|
|
Show details
|
|
83 |
Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
84 |
Does Corpus Quality Really Matter for Low-Resource Languages? ...
|
|
|
|
BASE
|
|
Show details
|
|
85 |
IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ...
|
|
|
|
BASE
|
|
Show details
|
|
86 |
mSLAM: Massively multilingual joint pre-training for speech and text ...
|
|
|
|
BASE
|
|
Show details
|
|
87 |
On the Representation Collapse of Sparse Mixture of Experts ...
|
|
|
|
BASE
|
|
Show details
|
|
88 |
Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ...
|
|
|
|
BASE
|
|
Show details
|
|
89 |
L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ...
|
|
|
|
BASE
|
|
Show details
|
|
90 |
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ...
|
|
|
|
BASE
|
|
Show details
|
|
91 |
A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ...
|
|
|
|
Abstract:
Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns. In this paper, we propose a generic and language-independent strategy for multilingual GEC, which can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pretrained cross-lingual language model (PXLM) and 2) parallel translation data between English and the language. Our approach creates diverse parallel GEC data without any language-specific operations by taking the non-autoregressive translation generated by PXLM and the gold translation as error-corrected sentence pairs. Then, we reuse PXLM to initialize the GEC model and pretrain it with the synthetic data generated by itself, which yields further improvement. We evaluate our approach on three public benchmarks of GEC in different languages. It achieves the state-of-the-art results on the NLPCC 2018 ...
|
|
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
|
|
URL: https://dx.doi.org/10.48550/arxiv.2201.10707 https://arxiv.org/abs/2201.10707
|
|
BASE
|
|
Hide details
|
|
92 |
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
|
|
|
|
BASE
|
|
Show details
|
|
93 |
Factual Consistency of Multilingual Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
94 |
Examining Scaling and Transfer of Language Model Architectures for Machine Translation ...
|
|
|
|
BASE
|
|
Show details
|
|
95 |
MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
|
|
|
|
BASE
|
|
Show details
|
|
96 |
Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ...
|
|
|
|
BASE
|
|
Show details
|
|
100 |
From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ...
|
|
|
|
BASE
|
|
Show details
|
|
Page: 1 2 3 4 5 6 7 8 9... 690
|
|