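61. Mehrsprachigkeit im Kontext des Kurmancî-Kurdischen und des Deutschen : eine Fallstudie aus einer kurdisch-deutschen Kindertagesstätte [Multilingualism in the context of Kurmancî Kurdish and German: a case study from a Kurdish-German daycare centre]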
BLLDB
UB Frankfurt Linguistik

65. Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
In: https://hal.inria.fr/hal-03540069 ; 2022 (2022)
BASE

66. A fine-grained recognition of Named Entities in ELTeC collection using cascades
In: Final Action Event of COST Action Distant Reading for European Literary History ; https://hal.archives-ouvertes.fr/hal-03615219 ; Final Action Event of COST Action Distant Reading for European Literary History, Christof Schöch, Apr 2022, Krakow, Poland ; https://www.distant-reading.net/events/conference-programme/ (2022)
BASE

67. The genetic architecture of language functional connectivity
In: ISSN: 1053-8119 ; EISSN: 1095-9572 ; NeuroImage ; https://hal.sorbonne-universite.fr/hal-03566120 ; NeuroImage, Elsevier, 2022, 249, pp.118795. ⟨10.1016/j.neuroimage.2021.118795⟩ (2022)
BASE

68. Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition
In: IEEE ICASSP 2022 ; https://hal.archives-ouvertes.fr/hal-03539741 ; IEEE ICASSP 2022, 2022, Singapore, Singapore (2022)
BASE

69. Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
McMillan-Major, Angelina; Alyafeai, Zaid; Biderman, Stella; Chen, Kimbo; De Toni, Francesco; Dupont, Gérard; Elsahar, Hady; Emezue, Chris; Aji, Alham Fikri; Ilić, Suzana; Khamis, Nurulaqilla; Leong, Colin; Masoud, Maraim; Soroa, Aitor; Ortiz Suarez, Pedro; Talat, Zeerak; van Strien, Daniel; Jernite, Yacine
In: https://hal.inria.fr/hal-03550289 ; 2022 (2022)
Abstract:
8 pages plus appendix and references ; In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficient documentation and tools for analysis. Mindful of these pitfalls, we present our methodology for a documentation-first, human-centered data collection project as part of the BigScience initiative. We identified a geographically diverse set of target language groups (Arabic, Basque, Chinese, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. To structure this effort, we developed our online catalogue as a supporting tool for gathering metadata through organized public hackathons. We present our development process; analyses of the resulting resource metadata, including distributions over languages, regions, and resource types; and our lessons learned in this endeavor.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Applications; Collaborative Resource Construction & Crowdsourcing; LR Infrastructures and Architectures; Systems; Tools
URL: https://hal.inria.fr/hal-03550289
BASE

70. Source or target first? Comparison of two post-editing strategies with translation students
In: https://hal.archives-ouvertes.fr/hal-03546151 ; 2022 (2022)
BASE

71. Meta-Analysis of the Functional Neuroimaging Literature with Probabilistic Logic Programming
In: https://hal.archives-ouvertes.fr/hal-03590714 ; 2022 (2022)
BASE

72. Automatic Normalisation of Early Modern French
In: https://hal.inria.fr/hal-03540226 ; 2022 (2022)
BASE

73. Offline Corpus Augmentation for English-Amharic Machine Translation
In: 2022 The 5th International Conference on Information and Computer Technologies ; https://hal.archives-ouvertes.fr/hal-03547539 ; 2022 The 5th International Conference on Information and Computer Technologies, Mar 2022, New York, United States (2022)
BASE

74. The (white) ears of Ofsted: a raciolinguistic perspective on the listening practices of the schools inspectorate
BASE

75. Non-sexist Language in Vacancy Titles: A Proposal for Drafting and Translation in International Organisations
In: Journal of International Women's Studies (2022)
BASE

76. The effects of various combinations of form-focused instruction techniques on the acquisition of English articles by second language learners of English
BASE

77. A Scoping Review of Teaching Practices for Linguistically Diverse Students in Ontario
BASE