1.
Available online at www.sciencedirect.com
In: http://www.lextutor.ca/cv/crossley_cobb_mcn_13.pdf (2013)
2.
Informatica 37 (2013) 193–201: Vector Disambiguation for Translation Extraction from Comparable
In: http://www.informatica.si/PDF/37-2/14_Apidianaki - Vector Disambiguation for Translation Ex.pdf (2013)
3.
Identifying word translations from comparable documents without a seed lexicon
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/888_Paper.pdf (2012)
4.
Development and application of a cross-language document comparability metric
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/804_Paper.pdf (2012)
5.
Do Web Corpora from Top-Level Domains Represent National Varieties of English?
In: http://lexicometrica.univ-paris3.fr/jadt/jadt2012/Communications/Cook,+Paul+et+al.+-+Do+web+Corpora+from+Top-Level+Domains.pdf (2012)
Abstract:
In this study we consider the problem of determining whether an English corpus constructed from a given national top-level domain (e.g., .uk, .ca) represents the national dialect of English of the corresponding country (e.g., British English, Canadian English). We build English corpora from two top-level domains (.uk and .ca, corresponding to the United Kingdom and Canada, respectively) that contain approximately 100M words each. We consider a previously proposed measure of corpus similarity, and propose a new measure of corpus similarity that draws on the relative frequency of spelling variants (e.g., color and colour). Using these corpus similarity metrics we show that the Web corpus from a given top-level domain is indeed more similar to a corpus known to contain texts from authors of the corresponding country than to a corpus known to contain documents by authors from another country. These results suggest that English Web corpora from national top-level domains may indeed represent national dialects, which in turn suggests that techniques for building corpora from the Web could be used to build large dialectal language resources at little cost.
Keywords:
comparing corpora; corpora; dialects of English; dialectal language resources; Web-as-corpus
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.657.9797 http://lexicometrica.univ-paris3.fr/jadt/jadt2012/Communications/Cook,+Paul+et+al.+-+Do+web+Corpora+from+Top-Level+Domains.pdf
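Note on the abstract above: it mentions a corpus similarity measure based on the relative frequency of spelling variants but does not give its formula. The following is only a minimal sketch of one way such a comparison might look, not the authors' measure. The variant list, the function names `british_ratios` and `variant_distance`, and the use of a mean absolute difference are all assumptions made for illustration.

```python
from collections import Counter
import re

# Hypothetical variant pairs (American form, British form); illustrative only.
VARIANT_PAIRS = [("color", "colour"), ("center", "centre"), ("analyze", "analyse")]


def british_ratios(text, pairs=VARIANT_PAIRS):
    """Map each pair to the share of its occurrences that use the second (British) form."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    ratios = {}
    for us, uk in pairs:
        total = counts[us] + counts[uk]
        if total:  # skip pairs that never occur in this corpus
            ratios[(us, uk)] = counts[uk] / total
    return ratios


def variant_distance(text_a, text_b, pairs=VARIANT_PAIRS):
    """Mean absolute difference in British-form share over pairs seen in both corpora.

    Returns None when the two corpora share no variant pair.
    """
    ra = british_ratios(text_a, pairs)
    rb = british_ratios(text_b, pairs)
    shared = ra.keys() & rb.keys()
    if not shared:
        return None
    return sum(abs(ra[p] - rb[p]) for p in shared) / len(shared)
```

Under these assumptions, a smaller distance means the two corpora prefer the same spellings more often; a real study would need a much larger variant list and enough text per corpus for the ratios to be stable.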
6.
PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/231_Paper.pdf (2012)
7.
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/188_Paper.pdf (2012)
8.
The Twins corpus of museum visitor questions
In: http://people.ict.usc.edu/~traum/Papers/twins-corpus.pdf (2012)
9.
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
In: http://publications.idiap.ch/downloads/papers/2012/Cartoni_LREC_2012.pdf (2012)
10.
Evaluating DBMS-based access strategies to very large multi-layer corpora
In: http://www1.ids-mannheim.de/fileadmin/gra/texte/LREC2012_final.pdf (2012)
11.
Reference Lists for the Evaluation of Term Extraction Tools
In: http://hal.inria.fr/docs/00/81/65/66/PDF/tke2012-submission-36.pdf (2012)
12.
A new web interface to facilitate access to corpora: development of the ASLLRP data access interface
In: http://www.bu.edu/linguistics/UG/LREC2012/LREC-dai-final.pdf (2012)
13.
TimeBankPT: A TimeML annotated corpus of Portuguese
In: http://www.di.fc.ul.pt/~ahb/CostaBrancoLREC2012.pdf (2012)
14.
A generic formalism to represent linguistic corpora in RDF and OWL/DL
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/915_Paper.pdf (2012)
15.
Building a learner corpus
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/992_Paper.pdf (2012)
16.
Pexacc: A parallel sentence mining algorithm from comparable corpora
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/382_Paper.pdf (2012)
17.
Text Simplification Tools for Spanish
In: http://www.lrec-conf.org/proceedings/lrec2012/pdf/762_Paper.pdf (2012)
18.
Native language detection with ’cheap’ learner corpora
In: http://ftp.cs.toronto.edu/pub/gh/Brooke+Hirst-LRCbook-2013.pdf (2011)
19.
Native language detection with ’cheap’ learner corpora
In: http://ftp.cs.toronto.edu/pub/gh/Brooke+Hirst-LCR-2012-OLD.pdf (2011)
20.
Collecting spatial information for locations in a text-to-scene conversion system
In: http://ceur-ws.org/Vol-759/paper03.pdf (2011)