41 |
A Visual Context-Aware Multimodal System for Spoken Language Processing
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/niloy-euro03.pdf (2003)
|
|
BASE
|
|
|
|
42 |
Augmenting User Interfaces with Adaptive Speech Commands
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/jfig_icmi03.pdf (2003)
|
|
|
|
43 |
Augmenting User Interfaces with Adaptive Speech Commands
|
|
|
|
In: http://web.media.mit.edu/~pgorniak/jfig.pdf (2003)
|
|
|
|
44 |
Learning Word Meanings and Descriptive Parameter Spaces from Music
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/whitman03learning.pdf (2003)
|
|
|
|
45 |
Grounded spoken language acquisition: Experiments in word learning
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2003.pdf (2003)
|
|
|
|
46 |
Grounded Spoken Language Acquisition: Experiments in Word Learning
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/ieee_multimedia_2003.pdf (2003)
|
|
|
|
48 |
Learning Visually Grounded Words and Syntax of Natural Spoken Language
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/ec.pdf (2002)
|
|
|
|
49 |
Grounded Spoken Language Acquisition: Experiments in Word Learning
|
|
|
|
In: http://www.media.mit.edu/cogmac/ieee_mm_2002.pdf (2002)
|
|
Abstract:
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world, enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences, leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.
|
|
Keyword:
Semantic Grounding
|
|
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.4945 http://www.media.mit.edu/cogmac/ieee_mm_2002.pdf
|
|
|
|
51 |
Learning Words from Sights and Sounds: A Computational Model
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/cogsci_v2.pdf (2001)
|
|
|
|
52 |
Learning visually grounded words and syntax of natural spoken language
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2000_2001.pdf (2001)
|
|
|
|
53 |
Integration Of Speech And Vision Using Mutual Information
|
|
|
|
In: http://vismod.www.media.mit.edu/people/dkroy/papers/pdf/icassp2000.pdf (2000)
|
|
|
|
54 |
Integration of speech and vision using mutual information
|
|
|
|
In: http://www.icsi.berkeley.edu/~dpwe/research/etc/icassp2000/pdf/143_717.PDF (2000)
|
|
|
|
55 |
Learning Words from Sights and Sounds: A Computational Model
|
|
|
|
In: http://www.media.mit.edu/cogmac/cogsci_2002.pdf (2000)
|
|
|
|
56 |
A Computational Model of Word Learning from Multimodal Sensory Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/iccm2000.pdf (2000)
|
|
|
|
57 |
Learning Visually Grounded Words and Syntax of Natural Spoken Language
|
|
|
|
In: http://www.media.mit.edu/cogmac/evol_comm_2002.pdf (2000)
|
|
|
|
58 |
Learning Words From Natural Audio-Visual Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/Postscript/icslp98.ps.Z (1999)
|
|
|
|
59 |
Learning Words From Natural Audio-Visual Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/icslp98.pdf (1999)
|
|
|
|
60 |
Multimodal Adaptive Interfaces
|
|
|
|
In: ftp://whitechapel.media.mit.edu/pub/tech-reports/TR-438.ps.Z (1997)
|
|
|
|