2. A Part-of-Speech Tagger for Yiddish: First Steps in Tagging the Yiddish Book Center Corpus ...
3. Parsing Early Modern English for Linguistic Search
In: Proceedings of the Society for Computation in Linguistics (2022)
4. Penn-Helsinki Parsed Corpus of Early Modern English: First Parsing Results and Analysis ...
6. Bare Infinitives and External Arguments
In: North East Linguistics Society (2020)
7. Incremental Phrase Structure Generation and a Universal Theory of V2
In: North East Linguistics Society (2020)
8. Language Variation in Appalachia: A Special Case of Sentence Meaning
In: ASA Annual Conference (2019)
10. Treebank-3
In: microphone speech, newswire, telephone speech, transcribed speech, varied (2013)
15. Treebank-3
Abstract:
*Introduction*
This release contains the following Treebank-2 material:
* One million words of 1989 Wall Street Journal material annotated in Treebank II style.
* A small sample of ATIS-3 material annotated in Treebank II style.
* A fully tagged version of the Brown Corpus.
It also contains the following new material:
* Switchboard tagged, dysfluency-annotated, and parsed text
* Brown parsed text
The Treebank bracketing style is designed to allow the extraction of simple predicate/argument structure. Over one million words of text are provided with this bracketing applied.

*Data*
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation. These 2,499 stories have been distributed in both the Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42) releases of PTB. Treebank-2 includes the raw text for each story. Three "map" files, available in a compressed file (pennTB_tipster_wsj_map.tar.gz) as an additional download for users who have licensed Treebank-2, provide the relation between the 2,499 PTB filenames and the corresponding WSJ DOCNO strings in TIPSTER.

*Samples*
Please view the following samples:
* Part-of-Speech Tags
* Dysfluency Annotation
* Dysfluency Annotation & Part-of-Speech Tags
* Dysfluency Annotation, Part-of-Speech Tags & Turns Joined
* Syntactic Annotation
* Syntactic Annotation & Part-of-Speech Tags

*Updates*
After publication, it was discovered that not all of the PostScript (*.ps) files had been converted to PDFs and that some of the converted PDFs contained errors. For PDF copies of the documentation files, please see the addenda for a list of the available files. As of October 5, 2016, 252 wsj files from Treebank-2 that were previously missing were added. As of February 2017, 2,499 "raw" wsj files were added from Treebank-2 (LDC95T7). Corpus downloads after these dates will include these missing files.
URL: https://catalog.ldc.upenn.edu/LDC99T42