2 |
Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers ...
Repo, Liina; Skantsi, Valtteri; Rönnqvist, Samuel; Hellström, Saara; Oinonen, Miika; Salmela, Anna; Biber, Douglas; Egbert, Jesse; Pyysalo, Sampo; Laippala, Veronika. - : arXiv, 2021
Abstract:
We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news, are one of the primary predictors of linguistic variation and thus affect the automatic processing of language. We introduce two new register-annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform the previous state of the art in English and Finnish. Specifically, we show 1) that zero-shot cross-lingual transfer from the large English CORE corpus can match or surpass previously published monolingual models, and 2) that lightweight monolingual classification requiring very little training data can reach or surpass our zero-shot performance. We further analyse classification results, finding that certain registers continue to pose challenges, in particular for cross-lingual transfer. ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2102.07396 https://arxiv.org/abs/2102.07396
BASE
3 |
Jump-Starting Item Parameters for Adaptive Language Tests ...
BASE
4 |
Chatbots language design: the influence of language variation on user experience ...
BASE
5 |
Identifying and describing functional discourse units in the BNC Spoken 2014
BASE
7 |
Examining vocabulary acquisition through word associations: triangulating the psycholinguistic and corpus-based approaches
BASE
8 |
Working at the interface of hydrology and corpus linguistics: using corpora to identify unrecorded droughts in nineteenth-century Britain
BASE
13 |
Advancing Law and Corpus Linguistics: Importing Principles and Practices from Survey and Content Analysis Methodologies to Improve Corpus Design and Analysis
In: BYU Law Review (2017)
BASE
14 |
Discipline-specific reading expectation and challenges for ESL learners in US universities
BASE
17 |
Variationist versus text-linguistic approaches to grammatical change in English: nominal modifiers of head nouns ...
BASE
20 |
Triangulating methodological approaches in corpus-linguistic research
BASE