41 |
A Visual Context-Aware Multimodal System for Spoken Language Processing
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/niloy-euro03.pdf (2003)
|
|
BASE
|
|
42 |
Augmenting User Interfaces with Adaptive Speech Commands
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/jfig_icmi03.pdf (2003)
|
|
Abstract:
We present a system that augments any unmodified Java application with an adaptive speech interface. The augmented system learns to associate spoken words and utterances with interface actions such as button clicks. Speech learning is constantly active and searches for correlations between what the user says and does. Training the interface is seamlessly integrated with using the interface. As the user performs normal actions, she may optionally verbally describe what she is doing. By using a phoneme recognizer, the interface is able to quickly learn new speech commands. Speech commands are chosen by the user and can be recognized robustly due to accurate phonetic modelling of the user's utterances and the small size of the vocabulary learned for a single application. After only a few examples, speech commands can replace mouse clicks. In effect, selected interface functions migrate from keyboard and mouse to speech. We demonstrate the usefulness of this approach by augmenting jfig, a drawing application, where speech commands save the user from the distraction of having to use a tool palette.
|
|
Keyword:
speech
|
|
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.8269 http://www.media.mit.edu/cogmac/publications/jfig_icmi03.pdf
|
|
BASE
|
|
43 |
Augmenting User Interfaces with Adaptive Speech Commands
|
|
|
|
In: http://web.media.mit.edu/~pgorniak/jfig.pdf (2003)
|
|
BASE
|
|
44 |
Learning Word Meanings and Descriptive Parameter Spaces from Music
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/whitman03learning.pdf (2003)
|
|
BASE
|
|
45 |
Grounded spoken language acquisition: Experiments in word learning
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2003.pdf (2003)
|
|
BASE
|
|
46 |
Grounded Spoken Language Acquisition: Experiments in Word Learning
|
|
|
|
In: http://www.media.mit.edu/cogmac/publications/ieee_multimedia_2003.pdf (2003)
|
|
BASE
|
|
48 |
Learning Visually Grounded Words and Syntax of Natural Spoken Language
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/ec.pdf (2002)
|
|
BASE
|
|
49 |
Grounded Spoken Language Acquisition: Experiments in Word Learning
|
|
|
|
In: http://www.media.mit.edu/cogmac/ieee_mm_2002.pdf (2002)
|
|
BASE
|
|
51 |
Learning Words from Sights and Sounds: A Computational Model
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/cogsci_v2.pdf (2001)
|
|
BASE
|
|
52 |
Learning visually grounded words and syntax of natural spoken language
|
|
|
|
In: http://web.media.mit.edu/~dkroy/papers/pdf/roy_2000_2001.pdf (2001)
|
|
BASE
|
|
53 |
Integration Of Speech And Vision Using Mutual Information
|
|
|
|
In: http://vismod.www.media.mit.edu/people/dkroy/papers/pdf/icassp2000.pdf (2000)
|
|
BASE
|
|
54 |
Integration of speech and vision using mutual information
|
|
|
|
In: http://www.icsi.berkeley.edu/~dpwe/research/etc/icassp2000/pdf/143_717.PDF (2000)
|
|
BASE
|
|
55 |
Learning Words from Sights and Sounds: A Computational Model
|
|
|
|
In: http://www.media.mit.edu/cogmac/cogsci_2002.pdf (2000)
|
|
BASE
|
|
56 |
A Computational Model of Word Learning from Multimodal Sensory Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/iccm2000.pdf (2000)
|
|
BASE
|
|
57 |
Learning Visually Grounded Words and Syntax of Natural Spoken Language
|
|
|
|
In: http://www.media.mit.edu/cogmac/evol_comm_2002.pdf (2000)
|
|
BASE
|
|
58 |
Learning Words From Natural Audio-Visual Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/Postscript/icslp98.ps.Z (1999)
|
|
BASE
|
|
59 |
Learning Words From Natural Audio-Visual Input
|
|
|
|
In: http://vismod.www.media.mit.edu/~dkroy/papers/pdf/icslp98.pdf (1999)
|
|
BASE
|
|
60 |
Multimodal Adaptive Interfaces
|
|
|
|
In: ftp://whitechapel.media.mit.edu/pub/tech-reports/TR-438.ps.Z (1997)
|
|
BASE