1. Combining Static and Contextualised Multilingual Embeddings ...
Abstract:
Static and contextual multilingual embeddings have complementary strengths. Static embeddings, while less expressive than contextual language models, can be more straightforwardly aligned across multiple languages. We combine the strengths of static and contextual models to improve multilingual representations. We extract static embeddings for 40 languages from XLM-R, validate those embeddings with cross-lingual word retrieval, and then align them using VecMap. This results in high-quality, highly multilingual static embeddings. We then apply a novel continued pre-training approach to XLM-R, leveraging the high-quality alignment of our static embeddings to better align the representation space of XLM-R. We show positive results for multiple complex semantic tasks. We release the static embeddings and the continued pre-training code. Unlike most previous work, our continued pre-training approach does not require parallel text.
Accepted to Findings of ACL 2022.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2203.09326
DOI: https://dx.doi.org/10.48550/arxiv.2203.09326
2. Do Multilingual Language Models Capture Differing Moral Norms? ...
3. Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies ...
4. Pushing the right buttons: adversarial evaluation of quality estimation
In: Proceedings of the Sixth Conference on Machine Translation, pp. 625–638 (2022)
5. Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings ...
6. Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation ...
7. Anchor-based Bilingual Word Embeddings for Low-Resource Languages ...
8. Improving Machine Translation of Rare and Unseen Word Senses ...
10. Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction ...
12. Pragmatic information in translation: a corpus-based study of tense and mood in English and German ...
13. ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation ...
15. On the Language Neutrality of Pre-trained Multilingual Representations ...
19. Embedding Learning Through Multilingual Concept Induction ...