2. Geographic Adaptation of Pretrained Language Models ...
Abstract:
Geographic linguistic features are commonly used to improve the performance of pretrained language models (PLMs) on NLP tasks where geographic knowledge is intuitively beneficial (e.g., geolocation prediction and dialect feature prediction). Existing work, however, leverages such geographic information in task-specific fine-tuning, failing to incorporate it into PLMs' geo-linguistic knowledge, which would make it transferable across different tasks. In this work, we introduce an approach to task-agnostic geoadaptation of PLMs that forces the PLM to learn associations between linguistic phenomena and geographic locations. More specifically, geoadaptation is an intermediate training step that couples masked language modeling and geolocation prediction in a dynamic multitask learning setup. In our experiments, we geoadapt BERTić -- a PLM for Bosnian, Croatian, Montenegrin, and Serbian (BCMS) -- using a corpus of geotagged BCMS tweets. Evaluation on three different tasks, namely unsupervised (zero-shot) and ...
Keywords:
Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2203.08565 https://dx.doi.org/10.48550/arxiv.2203.08565
BASE
3. Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words ...
5. Predicting the Growth of Morphological Families from Social and Linguistic Factors
8. Predicting the Growth of Morphological Families from Social and Linguistic Factors ...