1 |
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages ...
|
|
|
|
Abstract:
Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of ...
|
|
Keyword:
Computation and Language cs.CL; Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
|
|
URL: https://dx.doi.org/10.48550/arxiv.2201.11732 https://arxiv.org/abs/2201.11732
|
|
BASE
|
|
Hide details
|
|
2 |
Improving Word Translation via Two-Stage Contrastive Learning ...
|
|
|
|
BASE
|
|
Show details
|
|
3 |
Prix-LM: Pretraining for Multilingual Knowledge Base Construction ...
|
|
|
|
BASE
|
|
Show details
|
|
4 |
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
|
|
|
|
BASE
|
|
Show details
|
|
5 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
6 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
7 |
Visually Grounded Reasoning across Languages and Cultures ...
|
|
|
|
BASE
|
|
Show details
|
|
8 |
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
|
|
|
|
BASE
|
|
Show details
|
|
9 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
10 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
|
|
|
|
BASE
|
|
Show details
|
|
11 |
Visually Grounded Reasoning across Languages and Cultures ...
|
|
|
|
BASE
|
|
Show details
|
|
12 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
|
|
|
|
BASE
|
|
Show details
|
|
13 |
Visually Grounded Reasoning across Languages and Cultures ...
|
|
|
|
BASE
|
|
Show details
|
|
14 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
|
|
|
|
BASE
|
|
Show details
|
|
15 |
Self-Alignment Pretraining for Biomedical Entity Representations
|
|
Liu, Fangyu; Shareghi, Ehsan; Meng, Zaiqiao. - : Association for Computational Linguistics, 2021. : Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
|
|
BASE
|
|
Show details
|
|
16 |
Upgrading the Newsroom: An Automated Image Selection System for News Articles ...
|
|
|
|
BASE
|
|
Show details
|
|
17 |
Upgrading the Newsroom: An Automated Image Selection System for News Articles
|
|
|
|
In: http://infoscience.epfl.ch/record/280322 (2020)
|
|
BASE
|
|
Show details
|
|
|
|