22 | How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures
In: 19th annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI) - What is text, really? TEI and beyond, Sep 2019, Graz, Austria ; https://hal.archives-ouvertes.fr/hal-02263276 (2019)
BASE
23 | Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures
In: 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7), Jul 2019, Cardiff, United Kingdom ; https://hal.inria.fr/hal-02148693 ; ⟨10.14618/IDS-PUB-9021⟩ (2019)
BASE
24 | Nénufar: Modelling a Diachronic Collection of Dictionary Editions as a Computational Lexical Resource
In: ELEX 2019: smart lexicography, Oct 2019, Sintra, Portugal ; https://hal.inria.fr/hal-02272978 (2019)
BASE
25 | LMF Reloaded
In: AsiaLex 2019: Past, Present and Future, Jun 2019, Istanbul, Turkey ; https://hal.inria.fr/hal-02118319 (2019)
BASE
26 | TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries
In: ELEX 2019: Smart Lexicography, Oct 2019, Sintra, Portugal ; https://hal.inria.fr/hal-02264033 ; https://elex.link/elex2019/ (2019)
BASE
27 | CamemBERT: a Tasty French Language Model
In: https://hal.inria.fr/hal-02445946 (2019)
Abstract: Web site: https://camembert-model.fr ; Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models, in all languages except English, very limited. Aiming to address this issue for French, we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and downstream applications for French NLP.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
URL: https://hal.inria.fr/hal-02445946
BASE
28 | TEI and the Mixtepec-Mixtec corpus: data integration, annotation and normalization of heterogeneous data for an under-resourced language
In: 6th International Conference on Language Documentation and Conservation (ICLDC), Feb 2019, Honolulu, United States ; https://hal.inria.fr/hal-02075475 (2019)
BASE
29 | Preparing the Dictionnaire Universel for Automatic Enrichment
In: 10th International Conference on Historical Lexicography and Lexicology (ICHLL), Jun 2019, Leeuwarden, Netherlands ; https://hal.inria.fr/hal-02131598 ; https://easychair.org/smart-program/ICHLL-10/ (2019)
BASE
30 | Connecting the Humanities through Research Infrastructures
In: 4th Digital Humanities in the Nordic Countries (DHN 2019), Mar 2019, Copenhagen, Denmark ; https://hal.inria.fr/hal-02047512 ; https://cst.dk/DHN2019/DHN2019.html (2019)
BASE
31 | The place of lexicography in (computer) science
In: The Future of Academic Lexicography: Linguistic Knowledge Codification in the Era of Big Data and AI, Frieda Steurs; Dirk Geeraerts; Niels Schiller; Marian Klamer; Iztok Kosem, Nov 2019, Leiden, Netherlands ; https://hal.inria.fr/hal-02358218 ; https://www.lorentzcenter.nl/lc/web/2019/1177/program.php3?wsid=1177&venue=Oort (2019)
BASE
32 | Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures ...
BASE
33 | From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
BASE
34 | From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
BASE
38 | Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
BASE
39 | TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources ...
BASE
40 | Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
BASE