
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Visual Goal-Step Inference using wikiHow
Anthology paper link: https://aclanthology.org/2021.emnlp-main.165/
Abstract: Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15-20%. Our task will facilitate multimodal reasoning about procedural events.
Keywords: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing
URL: https://dx.doi.org/10.48448/z6gp-g111
https://underline.io/lecture/37557-visual-goal-step-inference-using-wikihow
BASE
2
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
BASE
3
Natural Language as a Scaffold for Visual Recognition
Yatskar, Mark. - 2017
BASE
4
Commonly Uncommon: Semantic Sparsity in Situation Recognition
BASE
5
For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
BASE

Catalogues: 0 | Bibliographies: 0 | Linked Open Data catalogues: 0 | Online resources: 0 | Open access documents: 5
© 2013 - 2024 Lin|gu|is|tik