41.
A Visual Context-Aware Multimodal System for Spoken Language Processing
In: http://www.media.mit.edu/cogmac/publications/niloy-euro03.pdf (2003)

42.
Augmenting User Interfaces with Adaptive Speech Commands
In: http://www.media.mit.edu/cogmac/publications/jfig_icmi03.pdf (2003)

43.
Augmenting User Interfaces with Adaptive Speech Commands
In: http://web.media.mit.edu/~pgorniak/jfig.pdf (2003)

44.
Learning Word Meanings and Descriptive Parameter Spaces from Music
In: http://www.media.mit.edu/cogmac/publications/whitman03learning.pdf (2003)

45.
Grounded spoken language acquisition: Experiments in word learning
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2003.pdf (2003)

46.
Grounded Spoken Language Acquisition: Experiments in Word Learning
In: http://www.media.mit.edu/cogmac/publications/ieee_multimedia_2003.pdf (2003)

48.
Learning Visually Grounded Words and Syntax of Natural Spoken Language
In: http://web.media.mit.edu/~dkroy/papers/pdf/ec.pdf (2002)

49.
Grounded Spoken Language Acquisition: Experiments in Word Learning
In: http://www.media.mit.edu/cogmac/ieee_mm_2002.pdf (2002)

51.
Learning Words from Sights and Sounds: A Computational Model
In: http://web.media.mit.edu/~dkroy/papers/pdf/cogsci_v2.pdf (2001)

52.
Learning visually grounded words and syntax of natural spoken language
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2000_2001.pdf (2001)

53.
Integration Of Speech And Vision Using Mutual Information
In: http://vismod.www.media.mit.edu/people/dkroy/papers/pdf/icassp2000.pdf (2000)
Abstract:
We are developing a system which learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories which correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
1. INTRODUCTION
We are developing systems which learn words from co-occurring audio and visual input [5, 4]. Input consists of naturally spoken multiword utterances paired with visual representations of object shapes (Figure 1). Output of the system is an audio-visual lexicon of sound-shape associations which encode acoustic forms of words (or phrases) and their visually grounded referents. We assume that, in general, the audio and visual signals are uncorrelated in time. However, when a word is spoken, its visual representation ...
URL: http://vismod.www.media.mit.edu/people/dkroy/papers/pdf/icassp2000.pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.9060

54.
Integration of speech and vision using mutual information
In: http://www.icsi.berkeley.edu/~dpwe/research/etc/icassp2000/pdf/143_717.PDF (2000)

55.
Learning Words from Sights and Sounds: A Computational Model
In: http://www.media.mit.edu/cogmac/cogsci_2002.pdf (2000)

56.
A Computational Model of Word Learning from Multimodal Sensory Input
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/iccm2000.pdf (2000)

57.
Learning Visually Grounded Words and Syntax of Natural Spoken Language
In: http://www.media.mit.edu/cogmac/evol_comm_2002.pdf (2000)

58.
Learning Words From Natural Audio-Visual Input
In: http://vismod.www.media.mit.edu/~dkroy/papers/Postscript/icslp98.ps.Z (1999)

59.
Learning Words From Natural Audio-Visual Input
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/icslp98.pdf (1999)

60.
Multimodal Adaptive Interfaces
In: ftp://whitechapel.media.mit.edu/pub/tech-reports/TR-438.ps.Z (1997)