41 |
Review of: Vander Viana, Sonia Zyngier and Geoff Barnbrook (eds.). 2011. Perspectives on Corpus Linguistics. Amsterdam and Philadelphia: John Benjamins.
BASE
42 |
Geographical Text Analysis: Mapping and spatially analysing corpora
BASE
43 |
Integrating corpus linguistics and spatial technologies for the analysis of literature
BASE
45 |
Corpus linguistics: method, theory and practice
MPI-SHH Linguistik
49 |
Combining documentation and research: ongoing work on an endangered language
In: Proceedings of IALP 2012 (2012 International Conference on Asian Language Processing), 2012, Hanoi, Vietnam, pp. 169-172. https://halshs.archives-ouvertes.fr/halshs-00731261
BASE
50 |
Combining documentation and research: ongoing work on an endangered language
In: Proceedings of IALP 2012 (2012 International Conference on Asian Language Processing), 2012, Hanoi, Vietnam, pp. 169-172. https://halshs.archives-ouvertes.fr/halshs-00731261
BASE
51 |
Combining documentation and research: ongoing work on an endangered language
BASE
52 |
CQPweb - combining power, flexibility and usability in a corpus analysis tool
BASE
55 |
Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation
In: Hardie, Andrew; Lohani, Ram; & Yadava, Yogendra (2011). Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation. Himalayan Linguistics, 10(1). doi:10.5070/H910123572. Retrieved from: http://www.escholarship.org/uc/item/15t805x8
Abstract:
The Nepali National Corpus (NNC) was, in the process of its creation, annotated with part-of-speech (POS) tags. This paper describes the extension of automated text and corpus annotation in Nepali from POS tags to lemmatisation, enabling a more complex set of corpus-based searches and analyses. This work also addresses certain practical compromises embodied in the initial tagging of the NNC. First, some particular aspects of Nepali morphology – in particular the complexity of the agglutinative verbal inflection system – necessitated improvements to the underlying tokenisation of the text before lemmatisation could be satisfactorily implemented. In practical terms, both the tokenisation and lemmatisation procedures require linguistic knowledge resources to operate successfully: a set of rules describing the default case, and a lexicon containing a list of individual exceptions: words whose form suggests a particular rule should apply to them, but where that rule in fact does not apply. These resources, particularly the lexicons of irregularities, were created by a strongly data-driven process working from analyses of the NNC itself. This approach to tokenisation and lemmatisation, and associated linguistic knowledge resources, may be illustrative and of use to researchers looking at other languages of the Himalayan region, most especially those that have similar morphological behaviour to Nepali.
Keywords:
Corpus; Lemmatisation; Morphology; Nepali; Tagging; Tokenisation
URL: http://www.escholarship.org/uc/item/15t805x8
BASE
56 |
Visual GISting: bringing together corpus linguistics and Geographical Information Systems
BASE
57 |
Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium
BASE
58 |
Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation
BASE
60 |
Visual GISting: bringing together corpus linguistics and Geographical Information Systems
BASE