Search in the Catalogues and Directories

Hits 1 – 20 of 170

1
Language identification, a tool for Corsican and for the evaluation of linguistic resources ; L'identification de langue, un outil au service du corse et de l'évaluation des ressources linguistiques
In: Traitement Automatique des Langues ; https://hal.archives-ouvertes.fr/hal-03633290 ; Traitement Automatique des Langues, 2022, Diversité Linguistique, 62 (3), pp.13-37 ; https://www.atala.org/content/diversité-linguistique-linguistic-diversity-natural-language-processing (2022)
Abstract: The constitution of corpora is one of the first priorities faced by less-resourced languages. The emergence of Internet-based resources of increasing size and covering more and more languages may suggest that this issue has been resolved, but this is not the case. Following Caswell et al. (2021), who evaluated several large resources, including one with Corsican content, we conducted an analysis of two corpora including this language: An Crúbadán and W2C. In parallel to a manual evaluation, we considered the possibility of using one or more language identification modules to filter the content of these resources, which turns out to be possible but at the cost of low recall. For this task, we tested and re-trained various systems in order to adapt them to Corsican. This work makes it possible to provide a model allowing the identification of 17 European languages as well as Corsican.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; corpora; corpus; Corsican; language identification; less-resourced languages; quality
URL: https://hal.archives-ouvertes.fr/hal-03633290/file/TAL_62_3_1_Kevers_HAL.pdf
https://hal.archives-ouvertes.fr/hal-03633290/document
https://hal.archives-ouvertes.fr/hal-03633290
BASE
2
Text+: Language- and text-based Research Data Infrastructure ...
BASE
3
Text+: Language- and text-based Research Data Infrastructure ...
BASE
4
Text+: Language- and text-based Research Data Infrastructure ...
BASE
5
Chinese Idioms: Stepping Into L2 Student’s Shoes
In: Acta Linguistica Asiatica, Vol 12, Iss 1 (2022)
BASE
6
Vers un outillage informatique optimisé pour corpus langagiers oraux en vue d'une exploitation textométrique : le cas des interrogatives partielles dans ESLO
In: Corpus ; https://halshs.archives-ouvertes.fr/halshs-03133017 ; Corpus, 2021 (2021)
BASE
7
Automatic text simplification of specialized and technical texts ; Simplification automatique de textes techniques et spécialisés
Cardon, Rémi. - : HAL CCSD, 2021
In: https://hal.archives-ouvertes.fr/tel-03343769 ; Informatique et langage [cs.CL]. Université de Lille, 2021. Français (2021)
BASE
8
LiFR-Lite (2021-11-05)
Cinková, Silvie; Chromý, Jan; Hořeňovská, Karolína. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
BASE
9
LiFR-Lite
Cinková, Silvie; Chromý, Jan; Hořeňovská, Karolína. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
BASE
10
Deixis: A Proposal for XML Annotation within the Text ; A deixis: uma proposta de anotação em XML no âmbito do texto
BASE
11
A Corpus Approach Study on the Manzanar Free Press
In: University Honors Theses (2021)
BASE
12
FORMAL-FUNCTIONAL MODELS OF THE UZBEK ELECTRON CORPUS ...
Abdurakhmonova, Nilufar. - : Zenodo, 2021
BASE
13
FORMAL-FUNCTIONAL MODELS OF THE UZBEK ELECTRON CORPUS ...
Abdurakhmonova, Nilufar. - : Zenodo, 2021
BASE
14
Kolipsi-1 Corpus v1.0
Glaznieks, Aivars; Frey, Jennifer-Carmen; Abel, Andrea. - : Institute for Applied Linguistics, Eurac Research, 2021
BASE
15
Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers
In: Language Resources and Evaluation for Language Technologies (LREC) ; https://hal.archives-ouvertes.fr/hal-02503986 ; Language Resources and Evaluation for Language Technologies (LREC), May 2020, Marseille, France (2020)
BASE
16
Parallel data extraction using word embeddings
In: Lohar, Pintu and Way, Andy orcid:0000-0001-5736-5930 (2020) Parallel data extraction using word embeddings. In: NLPTA 2020 : International Conference on NLP Techniques and Applications, 28-29 Nov 2020, London, UK (Online). (2020)
BASE
17
Towards a Corsican Basic Language Resource Kit
In: 12th Language Resources and Evaluation Conference (LREC 2020) ; https://hal.archives-ouvertes.fr/hal-02865699 ; 12th Language Resources and Evaluation Conference (LREC 2020), May 2020, Marseille, France (2020)
BASE
18
Visualizing the development of prose styles in Horse Manuals from Early Modern English to Present-Day English
In: EISSN: 2416-5999 ; Journal of Data Mining and Digital Humanities ; https://hal.archives-ouvertes.fr/hal-02283138 ; Journal of Data Mining and Digital Humanities, Episciences.org, 2020, Special Issue on Visualisations in Historical Linguistics, pp.1-33 (2020)
BASE
19
Text Corpora and the Challenge of Newly Written Languages
In: 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) ; https://hal.archives-ouvertes.fr/hal-02611209 ; 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), May 2020, Marseille, France (2020)
BASE
20
ERRATAS database of editorial principles and practices in printed editions of historical correspondence ...
BASE

© 2013 - 2024 Lin|gu|is|tik