1. Does Corpus Quality Really Matter for Low-Resource Languages? ...
2. Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining ...
4. PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining ...
Abstract: Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data. In this paper, we present PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models), which extends the conventional denoising objective used to train these models by (i) replacing words in the noised sequence according to a multilingual dictionary, and (ii) predicting the reference translation according to a parallel corpus instead of recovering the original sequence. Our experiments on machine translation and cross-lingual natural language inference show an average improvement of 2.0 BLEU points and 6.7 accuracy points from integrating parallel data into pretraining, respectively, obtaining results that are competitive with several popular models at a fraction of their computational cost. ... : Preprint ...
Keywords: Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://arxiv.org/abs/2108.01887 https://dx.doi.org/10.48550/arxiv.2108.01887
6. Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring ...
7. Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders ...
8. Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders ...
9. A Call for More Rigor in Unsupervised Cross-lingual Learning ...
10. Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring ...
11. Translation Artifacts in Cross-lingual Transfer Learning ...
12. Analyzing the Limitations of Cross-lingual Word Embedding Mappings ...
13. On the Cross-lingual Transferability of Monolingual Representations ...
14. Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text ; Traducción Automática Neuronal no Supervisada, un nuevo paradigma basado solo en textos monolingües
15. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
In: Transactions of the Association for Computational Linguistics, Vol 7, Pp 597-610 (2019)
16. Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora
In: Computational Linguistics, Vol 45, Iss 3, Pp 395-421 (2019)
17. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond ...
18. Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation ...
19. Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches ; QTLeap - Traducción de calidad mediante tratamientos profundos de ingeniería lingüística