1. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English
In: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland (2022). https://hal.inria.fr/hal-03629677
Abstract:
Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. Much work on biases in natural language processing has addressed biases linked to the social and cultural experience of English-speaking individuals in the United States. We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France. We build on the US-centered CrowS-Pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country and language. We introduce 1,677 sentence pairs in French that cover stereotypes in ten bias categories, such as gender and age. Of these, 1,467 sentence pairs are translated from CrowS-Pairs and 210 are newly crowdsourced and translated back into English. Each sentence pair contrasts a stereotype concerning a disadvantaged group with the same sentence concerning an advantaged group. We find that four widely used language models (three French, one multilingual) favor sentences that express stereotypes in most bias categories. We report on the translation process, which led to a characterization of stereotypes in CrowS-Pairs, including the identification of US-centric cultural traits. We offer guidelines to further extend the dataset to other languages and cultural environments.
Keyword:
Computer Science [cs] / Document and Text Processing
URL: https://hal.inria.fr/hal-03629677 https://hal.inria.fr/hal-03629677/file/ACLFinal.pdf https://hal.inria.fr/hal-03629677/document
2. Establishing a New State-of-the-Art for French Named Entity Recognition
In: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France (2020). https://hal.inria.fr/hal-02617950 ; http://www.lrec-conf.org
3. SinNer@Clef-Hipe2020: Sinful adaptation of SotA models for Named Entity Recognition in French and German
In: CLEF 2020 Working Notes. Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Sep 2020, Thessaloniki / Virtual, Greece (2020). https://hal.inria.fr/hal-02984746 ; https://impresso.github.io/CLEF-HIPE-2020/
4. CamemBERT: a Tasty French Language Model
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States (2020). DOI: 10.18653/v1/2020.acl-main.645. https://hal.inria.fr/hal-02889805
5. CamemBERT: a Tasty French Language Model
In: https://hal.inria.fr/hal-02445946 (2019)
6. Syntactic Parsing versus MWEs: What can fMRI signal tell us
In: PARSEME-FR 2019 consortium meeting, Jun 2019, Blois, France (2019). https://hal.inria.fr/hal-02272288 ; https://parsemefr.lis-lab.fr/doku.php?id=meeting-20190613
7. Adapting a system for Named Entity Recognition and Linking for 19th century French Novels
In: Digital Humanities 2019, Jul 2019, Utrecht, Netherlands (2019). https://hal.archives-ouvertes.fr/hal-02187283 ; https://dev.clariah.nl/files/dh2019/boa/0904.html
8. Un corpus libre, évolutif et versionné en entités nommées du français [A free, evolving and versioned named entity corpus for French]
In: TALN 2019 - Traitement Automatique des Langues Naturelles, Jul 2019, Toulouse, France (2019). https://hal.archives-ouvertes.fr/hal-02448590
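9. Adaptation et évaluation de systèmes de reconnaissance et de résolution des entités nommées pour le cas de textes littéraires français du 19ème siècle [Adapting and evaluating named entity recognition and resolution systems for 19th-century French literary texts]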
In: Atelier Humanités Numériques Spatialisées (HumaNS'2018), Nov 2018, Montpellier, France (2018). https://hal.archives-ouvertes.fr/hal-01925816 ; http://psig.huma-num.fr/HumaNS/
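10. Description et modélisation des chaînes de référence. Le projet ANR Democrat (2016-2020) et ses avancées à mi-parcours [Description and modeling of reference chains: the ANR Democrat project (2016-2020) and its mid-term progress]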
In: Salon de l'Innovation en TAL (Traitement Automatique des Langues) et RI (Recherche d'Informations), 5th edition, May 2018, Rennes, France (2018). https://hal.archives-ouvertes.fr/hal-01797982
11. Structured Named Entity Recognition by Cascading CRFs
In: Intelligent Text Processing and Computational Linguistics (CICling), Apr 2017, Budapest, Hungary (2017). https://hal.archives-ouvertes.fr/hal-01579109
12. Label-Dependencies Aware Recurrent Neural Networks
In: Intelligent Text Processing and Computational Linguistics (CICling), Apr 2017, Budapest, Hungary (2017). https://hal.archives-ouvertes.fr/hal-01579071 ; http://www.cicling.org/2017/
13. Structuration in named entities [French title: La structuration dans les entités nommées]
Doctoral thesis in Linguistics, Université Sorbonne Paris Cité, 2017 (in French). NNT: 2017USPCA100. https://tel.archives-ouvertes.fr/tel-01772268
14. DEMOCRAT: description et modélisation des chaînes de référence: Outils pour l'annotation de corpus et le traitement automatique [DEMOCRAT: description and modeling of reference chains: tools for corpus annotation and automatic processing]
In: Salon Partenariats Recherche et Industries de la Langue (PAREIL), 23rd Conference on Natural Language Processing (TALN 2016), Jul 2016, Paris, France (2016). https://hal.archives-ouvertes.fr/hal-01384485
15. Sequential Patterns of POS Labels Help to Characterize Language Acquisition
In: DMNLP (ECML/PKDD Workshop), 2014, Nancy, France (2014). https://hal.archives-ouvertes.fr/hal-01140542
16. Caractériser l'acquisition d'une langue avec des patrons d'étiquettes morpho-syntaxiques [Characterizing language acquisition with patterns of morphosyntactic tags]
In: JADT (Journée d'Analyse des Documents Textuels), Jun 2014, Paris, France (2014). https://hal.archives-ouvertes.fr/hal-01140342
17. Peut-on bien chunker avec de mauvaises étiquettes POS ? [Can one chunk well with bad POS tags?]
In: TALN 2014, Jul 2014, Marseille, France, pp. 125-136 (2014). https://hal.archives-ouvertes.fr/hal-01024274
18. Adapt a Text-Oriented Chunker for Oral Data: How Much Manual Effort is Necessary?
In: 14th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), Oct 2013, Hefei, China (2013). https://hal.archives-ouvertes.fr/hal-01174605
19. Intégrer des connaissances linguistiques dans un CRF : application à l'apprentissage d'un segmenteur-étiqueteur du français [Integrating linguistic knowledge into a CRF: application to learning a French segmenter-tagger]
In: TALN 2011, Jun 2011, Montpellier, France, p. 321 (2011). https://hal.archives-ouvertes.fr/hal-00620923