2. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Source: BASE

3. Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers

4. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

6. Visually Grounded Reasoning across Languages and Cultures
Abstract: The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet. While one can hardly overestimate how much this benchmark contributed to progress in computer vision, it is mostly derived from lexical databases and image queries in English, resulting in source material with a North American or Western European bias. Therefore, we devise a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. In particular, we let the selection of both concepts and images be entirely driven by native speakers, rather than scraping them automatically. Specifically, we focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. On top of the concepts and images obtained through this new protocol, we create a multilingual dataset for Multicultural ...
Anthology paper link: https://aclanthology.org/2021.emnlp-main.818/
Keywords: Computer programming; Intelligent System; Machine Learning; Machine translation; Natural Language Processing; Sentiment Analysis
URL: https://dx.doi.org/10.48448/yncc-sm24
URL: https://underline.io/lecture/38942-visually-grounded-reasoning-across-languages-and-cultures

9. Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
In: Transactions of the Association for Computational Linguistics, 9 (2021)

10. The Role of Syntactic Planning in Compositional Image Captioning

14. It’s Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information
In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020)