Automatic domain adaptation for parsing

Abstract:
Current statistical parsers tend to perform well only on their training domain and nearby genres. While strong performance on a few related domains is sufficient for many situations, it is advantageous for parsers to be able to generalize to a wide variety of domains. When parsing document collections involving heterogeneous domains (e.g., the web), the optimal parsing model for each document is typically not obvious. We study this problem as a new task: multiple source parser adaptation. Our system trains on corpora from many different domains. It learns not only statistics of those domains but also quantitative measures of domain differences and how those differences affect parsing accuracy. Given a specific target text, the resulting system proposes linear combinations of parsing models trained on the source corpora. Tested across six domains, our system outperforms all non-oracle baselines, including the best domain-independent parsing model. Thus, we are able to demonstrate the value of customizing parsing models to specific domains.

9 page(s)

Keyword:
080100 Artificial Intelligence and Image Processing

URL: http://hdl.handle.net/1959.14/153955
BASE
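The abstract above says the system proposes linear combinations of source-domain parsing models for a given target text, guided by measures of domain difference, but it does not say how the weights are computed. The Python sketch below is a simplified illustration under stated assumptions, not the authors' method. It uses cosine similarity between whitespace-token unigram counts as a hypothetical stand-in for a domain-difference measure, then normalizes those similarities into mixture weights over the source models. The function names `unigram_dist`, `cosine`, and `propose_mixture` are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(text: str) -> Counter:
    """Lowercased whitespace-token counts: a deliberately crude domain profile (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def propose_mixture(target: str, sources: dict[str, str]) -> dict[str, float]:
    """Hypothetical heuristic: weight each source-domain model by the target's
    similarity to that source corpus, normalized so the weights sum to 1.
    The weights define a linear combination of the source models."""
    t = unigram_dist(target)
    sims = {name: cosine(t, unigram_dist(corpus)) for name, corpus in sources.items()}
    total = sum(sims.values())
    if total == 0:
        # No lexical overlap with any source: fall back to a uniform mixture.
        return {name: 1.0 / len(sources) for name in sources}
    return {name: s / total for name, s in sims.items()}


if __name__ == "__main__":
    sources = {
        "news": "the stock market fell as investors sold shares",
        "biomed": "the protein binds the receptor in vitro",
    }
    print(propose_mixture("investors watched the market", sources))
```

In this toy example the news model receives the larger weight because the target shares more vocabulary with the news corpus. According to the abstract, the actual system goes further and learns how domain differences affect parsing accuracy, rather than relying on raw similarity as this sketch does.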
Structured generative models for unsupervised named-entity clustering
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
Sentence-Internal Prosody Does not Help Parsing the Way Punctuation Does
Noun-phrase co-occurrence statistics for semi-automatic semantic lexicon construction ...