1 |
Applying data-driven learning to the web
In: Agnieszka Lenko-Szymanska & Alex Boulton (eds.), Multiple Affordances of Language Corpora for Data-driven Learning, John Benjamins, pp. 267–295, 2015, https://doi.org/10.1075/scl.69.13bou ; https://hal.archives-ouvertes.fr/hal-00937993 (2015)
2 |
Towards a Simple and Efficient Web Search Framework
In: DTIC (2014)
3 |
An investigation of using navigation aids for web searching and their disorientation (original Chinese title: 使用網頁航標及其迷途的探索)
4 |
Enhancing a Web Crawler with Arabic Search Capability
In: DTIC (2010)
5 |
A Journey in Entity Related Retrieval for TREC 2009
In: DTIC (2009)
7 |
Lucene for n-grams using the ClueWeb Collection
In: DTIC (2009)
8 |
PARADISE Based Search Engine at TREC 2009 Web Track
In: DTIC (2009)
9 |
The Internet: measures of the appropriations of a technology of the intellect (original French title: L'internet : mesures des appropriations d'une technique intellectuelle)
In: https://tel.archives-ouvertes.fr/tel-00294711 ; Humanities and Social Sciences. Ecole des Hautes Etudes en Sciences Sociales (EHESS), 2002. In French (2002)
10 |
A Probabilistic Model for Dimensionality Reduction in Information Retrieval and Filtering
In: http://www.nersc.gov/research/SCG/cding/papers_ps/jsis2.ps (2001)
Abstract:
Dimension reduction methods, such as Latent Semantic Indexing (LSI), when applied to semantic spaces built upon text collections, improve information retrieval, information filtering and word sense disambiguation. A new dual probability model based on similarity concepts is introduced to explain the observed success. Semantic associations can be quantitatively characterized by their statistical significance, i.e., their likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored, because their contribution to the overall statistical significance is negative; this gives rise to LSI, which is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows a Zipf distribution, indicating that LSI dimensions represent latent concepts. The document frequency of words follows a Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on four standard document collections both confirm and illustrate the results and concepts presented here.
Keyword:
retrieve information by exactly matching query keywords to; such as Internet search engines
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.7962 ; http://www.nersc.gov/research/SCG/cding/papers_ps/jsis2.ps