1. C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval ...

2. Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models ...

3. The Multilingual TEDx Corpus for Speech Recognition and Translation ...

4. An Information Retrieval Test Collection for English SMS Conversations

5. Microblogging Temporal Summarization: Filtering Important Twitter Updates for Breaking News

6. Frontiers, Challenges, and Opportunities for Information Retrieval – Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne

7. Formative Evaluation for Multilingual Multimedia Search and Sense-Making

9. Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, September 19-21, 2007, Revised Selected Papers

11. Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval

Abstract:
This dissertation considers the problem of information retrieval in speech. Today's speech retrieval systems generally use a large-vocabulary continuous speech recognition system to first hypothesize the words that were spoken. Because these systems have a predefined lexicon, words that fall outside the lexicon can significantly reduce search quality, as measured by Mean Average Precision (MAP). This is particularly important because these out-of-vocabulary (OOV) words are often rare and therefore good discriminators for topically relevant speech segments. The focus of this dissertation is on handling these OOV query words. The approach is to combine results from a word-based speech retrieval system with those from vocabulary-independent ranked utterance retrieval. The goal of ranked utterance retrieval is to rank speech utterances by the system's confidence that they contain a particular spoken word; this is done by ranking the utterances by the estimated frequency of the word in each utterance. Several new approaches for estimating this frequency are considered, all motivated by the disparity between reference phoneme sequences and the error-prone phoneme sequences hypothesized by the recognizer. The first method learns alternate pronunciations or degradations from actual recognition hypotheses and incorporates these variants into a new generative estimator for term frequency. The second method learns transformations of several easily computed features in a discriminative model for the same task. Both methods significantly improved ranked utterance retrieval in an experimental validation on new speech. The best of these ranked utterance retrieval methods is then combined with a word-based speech retrieval system. The combination approach uses a normalization learned in an additive model, which maps the retrieval status values from each system to estimated probabilities of relevance that are easily combined.
Using this combination, much of the MAP lost because of OOV words is recovered. Evaluated on a collection of spontaneous, conversational speech, the system recovers 57.5% of the MAP lost on short (title-only) queries and 41.3% on longer (title plus description) queries.

Keywords:
computational linguistics; information retrieval; Information Science; Mathematics; natural language processing; ranked utterance retrieval; Speech Communication; speech retrieval; spoken document retrieval

URL: http://hdl.handle.net/1903/8881

12. Classifying Attitude by Topic Aspect for English and Chinese Document Collections

13. Overview of the CLEF-2006 cross-language speech retrieval track
In: Oard, Douglas W., Wang, Jianqiang, Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365), White, Ryen W., Pecina, Pavel, Soergel, Dagobert, Huang, Xiaoli and Shafran, Izhak (2007) Overview of the CLEF-2006 cross-language speech retrieval track. In: CLEF 2006: Workshop on Cross-Language Information Retrieval and Evaluation, 20-22 Sept. 2006, Alicante, Spain. (2007)

14. Investigating cross-language speech retrieval for a spontaneous conversational speech collection
In: Inkpen, Diana, Alzghool, Muath, Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) and Oard, Douglas W. (2006) Investigating cross-language speech retrieval for a spontaneous conversational speech collection. In: HLT-NAACL 2006 - The Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, 8-9 June 2006, New York, USA. (2006)

15. The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
In: DTIC (2006)

16. TREC-9 Experiments at Maryland: Interactive CLIR
In: DTIC (2006)

17. Complex Question Answering Based on a Semantic Domain Model of Clinical Medicine

18. Comparing User-Assisted and Automatic Query Translation
In: DTIC and NTIS (2005)