1 | Deriving Disinformation Insights from Geolocalized Twitter Callouts ... | BASE
2 | XLM-T: A Multilingual Language Model Toolkit for Twitter ... | BASE
Abstract: Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a framework for using and evaluating multilingual language models in Twitter. This framework features two main assets: (1) a strong multilingual baseline consisting of an XLM-R (Conneau et al. 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages. This is a modular framework that can easily be extended to additional tasks, as well as integrated with recent efforts also aimed at the homogenization of Twitter-specific datasets (Barbieri et al. 2020). ... : Submitted to ACL demo. Code and data available at https://github.com/cardiffnlp/xlm-t ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/2104.12250 https://dx.doi.org/10.48550/arxiv.2104.12250
3 | Distilling Relation Embeddings from Pre-trained Language Models ... | BASE
4 | Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction ... | BASE
5 | Modelling general properties of nouns by selectively averaging contextualised embeddings | BASE
6 | BERT is to NLP what AlexNet is to CV: can pre-trained language models identify analogies? | BASE
7 | Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction | BASE
8 | XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization ... | BASE
9 | Learning Cross-Lingual Word Embeddings from Twitter via Distant Supervision | BASE
In: Proceedings of the International AAAI Conference on Web and Social Media; Vol. 14 (2020): Fourteenth International AAAI Conference on Web and Social Media; 72-82; 2334-0770; 2162-3449 (2020)
10 | Analysis and Evaluation of Language Models for Word Sense Disambiguation ... | BASE
11 | Understanding the source of semantic regularities in word embeddings | BASE
12 | Learning cross-lingual word embeddings from Twitter via distant supervision | BASE
13 | Meemi: A Simple Method for Post-processing and Integrating Cross-lingual Word Embeddings ... | BASE
14 | On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning ... | BASE
16 | SenseDefs: a multilingual corpus of semantically annotated textual definitions: Exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking [Journal] | DNB Subject Category Language
17 | Improving Cross-Lingual Word Embeddings by Meeting in the Middle ... | BASE
18 | The Interplay between Lexical Resources and Natural Language Processing ... | BASE
19 | From Word to Sense Embeddings: A Survey on Vector Representations of Meaning ... | BASE