
Search in the Catalogues and Directories

Hits 1 – 8 of 8

1
CamemBERT: a Tasty French Language Model
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics ; https://hal.inria.fr/hal-02889805 ; ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States. ⟨10.18653/v1/2020.acl-main.645⟩ (2020)
Abstract: Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models, in all languages except English, very limited. In this paper, we investigate the feasibility of training monolingual Transformer-based language models for other languages, taking French as an example and evaluating our language models on part-of-speech tagging, dependency parsing, named entity recognition and natural language inference tasks. We show that the use of web crawled data is preferable to the use of Wikipedia data. More surprisingly, we show that a relatively small web crawled dataset (4GB) leads to results that are as good as those obtained using larger datasets (130+GB). Our best performing model CamemBERT reaches or improves the state of the art in all four downstream tasks.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
URL: https://hal.inria.fr/hal-02889805/file/ACL_2020___CamemBERT__a_Tasty_French_Language_Model-6.pdf
https://hal.inria.fr/hal-02889805
https://hal.inria.fr/hal-02889805/document
https://doi.org/10.18653/v1/2020.acl-main.645
BASE
2
How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures
In: 19th annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI) - What is text, really? TEI and beyond ; https://hal.archives-ouvertes.fr/hal-02263276 ; 19th annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI) - What is text, really? TEI and beyond, Sep 2019, Graz, Austria (2019)
BASE
3
LMF Reloaded
In: AsiaLex 2019: Past, Present and Future ; https://hal.inria.fr/hal-02118319 ; AsiaLex 2019: Past, Present and Future, Jun 2019, Istanbul, Turkey (2019)
BASE
4
Enhancing Usability for Automatically Structuring Digitised Dictionaries
In: GLOBALEX workshop at LREC 2018 ; https://hal.archives-ouvertes.fr/hal-01708137 ; GLOBALEX workshop at LREC 2018, May 2018, Miyazaki, Japan (2018)
BASE
5
Retro-digitizing and Automatically Structuring a Large Bibliography Collection
In: European Association for Digital Humanities (EADH) Conference ; https://hal.archives-ouvertes.fr/hal-01941534 ; European Association for Digital Humanities (EADH) Conference, EADH, Dec 2018, Galway, Ireland (2018)
BASE
6
Automatically Encoding Encyclopedic-like Resources in TEI
In: The annual TEI Conference and Members Meeting ; https://hal.inria.fr/hal-01819505 ; The annual TEI Conference and Members Meeting, Sep 2018, Tokyo, Japan ; https://tei2018.dhii.asia/ (2018)
BASE
7
Presenting the Nénufar Project: a Diachronic Digital Edition of the Petit Larousse Illustré
In: GLOBALEX 2018 - Globalex workshop at LREC2018 ; https://hal.archives-ouvertes.fr/hal-01728328 ; GLOBALEX 2018 - Globalex workshop at LREC2018, May 2018, Miyazaki, Japan. pp.1-6 ; https://globalex.link/globalex2018/ (2018)
BASE
8
Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields
In: electronic lexicography, eLex 2017 ; https://hal.archives-ouvertes.fr/hal-01508868 ; electronic lexicography, eLex 2017, Sep 2017, Leiden, Netherlands (2017)
BASE
