1 | NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset ... | BASE
2 | NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset ... | BASE
3 | Investigating Math Word Problems using Pretrained Multilingual Language Models ... | BASE
4 | Modeling Transitions of Focal Entities for Conversational Knowledge Base Question Answering ... | BASE
6 | Unsupervised Deep Structured Semantic Models for Commonsense Reasoning ... | BASE
7 | Cross-lingual semantic specialization via lexical relation induction | BASE
8 | Do we really need fully unsupervised cross-lingual embeddings? | BASE
14 | Diagnostic precision of tumor markers for malignant pleural effusion: a meta-analysis | BASE
15 | Racial mimesis: Translation, literature, and self-fashioning in modern China. | BASE
16 | An Empirical Study of Tokenization Strategies for Biomedical Information Retrieval | BASE
Abstract:
Due to the great variation of biological names in biomedical text, appropriate tokenization is an important preprocessing step for biomedical IR. Despite its importance, there has been little study on the evaluation of various tokenization strategies for biomedical text. In this work, we conducted a careful, systematic evaluation of a set of tokenization heuristics on all the available TREC biomedical IR test collections with two representative retrieval methods. We also studied the effect of stemming and stop word removal on the retrieval performance. As expected, our experimental results show that tokenization can significantly affect the retrieval accuracy; appropriate tokenization can improve the performance by up to 80%. In particular, it is shown that different query types require different tokenization heuristics, stemming is effective only for certain queries, and stop word removal in general does not improve the retrieval performance in biomedical text.
Keyword:
bioinformatics; information retrieval; tokenization
URL: http://hdl.handle.net/2142/11207