1 |
A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification ...
|
|
|
|
Abstract:
For automatically identifying hate speech and offensive content in tweets, the SATLab team proposes a system based on a classical supervised algorithm fed only with character n-grams, and thus completely language-agnostic. After optimization of the feature weighting and the classifier parameters, it reached, in the multilingual HASOC 2021 challenge, a medium performance level in English, the language for which it is easy to develop deep learning approaches relying on many external linguistic resources, but a far better level for the two less-resourced languages, Hindi and Marathi. It even ranks first when performance is averaged over the three tasks in these languages, outperforming many deep learning approaches. These results suggest that it is a useful reference level for evaluating the benefits of more complex approaches, such as deep learning, or of taking complementary resources into account. ... : A slightly modified version of the paper: "A simple language-agnostic yet strong baseline system for hate speech and offensive content identification". In Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation (10 p.). ceur-ws.org ...
|
|
Keywords:
Computation and Language cs.CL; FOS Computer and information sciences
|
|
URL: https://arxiv.org/abs/2202.02511 https://dx.doi.org/10.48550/arxiv.2202.02511
|
|
BASE
|
|
|
|
2 |
Using Fisher's Exact Test to Evaluate Association Measures for N-grams ...
|
|
|
|
BASE
|
|
|
|
3 |
LAST at CMCL 2021 Shared Task: Predicting Gaze Data During Reading with a Gradient Boosting Decision Tree Approach ...
|
|
|
|
BASE
|
|
|
|
4 |
LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures ...
|
|
|
|
BASE
|
|
|
|
7 |
Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora
|
|
|
|
Bestgen, Yves. In: Quaderns de filología. Estudis lingüístics, 22: 33-56 (2017)
|
|
BASE
|
|
|
|
8 |
Using n-grams to map registers across languages and uncover cross-linguistic contrasts: Insights from Correspondence Analysis
|
|
|
|
In: CBL (Cercle Belge de Linguistique) 2016, May 2016, Louvain-la-Neuve, Belgium (2016) ; https://hal.archives-ouvertes.fr/hal-01426811
|
|
BASE
|
|
|
|
9 |
Vers une analyse des différences interlinguistiques entre les genres textuels : étude de cas basée sur les n-grammes et l'analyse factorielle des correspondances
|
|
|
|
In: TALN 2016: Traitement Automatique des Langues Naturelles, Jul 2016, Paris, France (2016) ; https://hal.archives-ouvertes.fr/hal-01426820
|
|
BASE
|
|
|
|
10 |
Exact Expected Average Precision of the Random Baseline for System Evaluation
|
|
|
|
In: Prague Bulletin of Mathematical Linguistics, Vol 103, Iss 1, Pp 131-138 (2015)
|
|
BASE
|
|
|
|
14 |
Construction automatique de ressources lexicales pour la fouille d'opinion. ...
|
|
|
|
BASE
|
|
|
|
20 |
How to determine the meaning and use of (causal) connectives in (large) corpora : from hand-based to automatic analyses
|
|
|
|
In: Electronic Document Week (SDN 2004), Workshop ATALA "Modelling and describing discourse organisation in the age of the digital document", La Rochelle ; https://archivesic.ccsd.cnrs.fr/sic_00001224 ; Jun 2004 (2004)
|
|
BASE
|
|
|
|
|
|