1 |
Does Corpus Quality Really Matter for Low-Resource Languages? ...
BASE
2 |
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining ...
4 |
PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining ...
6 |
Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring ...
7 |
Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders ...
8 |
Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders ...
9 |
A Call for More Rigor in Unsupervised Cross-lingual Learning ...
10 |
Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring ...
11 |
Translation Artifacts in Cross-lingual Transfer Learning ...
12 |
Analyzing the Limitations of Cross-lingual Word Embedding Mappings ...
13 |
On the Cross-lingual Transferability of Monolingual Representations ...
14 |
Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text ; Traducción Automática Neuronal no Supervisada, un nuevo paradigma basado solo en textos monolingües
15 |
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
In: Transactions of the Association for Computational Linguistics, Vol 7, Pp 597-610 (2019)
16 |
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora
In: Computational Linguistics, Vol 45, Iss 3, Pp 395-421 (2019)
17 |
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond ...
Abstract:
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts. Our system uses a single BiLSTM encoder with a shared BPE vocabulary for all languages, which is coupled with an auxiliary decoder and trained on publicly available parallel corpora. This enables us to learn a classifier on top of the resulting embeddings using English annotated data only, and transfer it to any of the 93 languages without any modification. Our experiments in cross-lingual natural language inference (XNLI dataset), cross-lingual document classification (MLDoc dataset) and parallel corpus mining (BUCC dataset) show the effectiveness of our approach. We also introduce a new test set of aligned sentences in 112 languages, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages. Our implementation, the pre-trained encoder and the multilingual test ... : TACL ...
Keyword:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://dx.doi.org/10.48550/arxiv.1812.10464 https://arxiv.org/abs/1812.10464
18 |
Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation ...
19 |
Lexical semantics, Basque and Spanish in QTLeap: Quality Translation by Deep Language Engineering Approaches ; QTLeap - Traducción de calidad mediante tratamientos profundos de ingeniería lingüística