Search in the Catalogues and Directories

Hits 1 – 20 of 302

1
Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora ; https://hal.archives-ouvertes.fr/hal-03623351 ; ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://www.clarin.eu/ParlaCLARIN-III (2022)
Abstract: We present the AGODA (Analyse sémantique et Graphes relationnels pour l'Ouverture des Débats à l'Assemblée nationale) project, which aims to create a platform for consulting and exploring digitised French parliamentary debates (1881-1940) available in the digital library of the National Library of France. This project brings together historians and NLP specialists: parliamentary debates are indeed an essential source for French history of the contemporary period, but also for linguistics. This project therefore aims to produce a corpus of texts that can be easily exploited with computational methods, and that respect the TEI standard. Ancient parliamentary debates are also an excellent case study for the development and application of tools for publishing and exploring large historical corpora. In this paper, we present the steps necessary to produce such a corpus. We detail the processing and publication chain of these documents, in particular by mentioning the problems linked to the extraction of texts from digitised images. We also introduce the first analyses that we have carried out on this corpus with "bag-of-words" techniques not too sensitive to OCR quality (namely topic modelling and word embedding).
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-CY]Computer Science [cs]/Computers and Society [cs.CY]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.HIST]Humanities and Social Sciences/History; France; OCR; Parliamentary debates; Third Republic; Topic modelling; Word embedding; XML-TEI
URL: https://hal.archives-ouvertes.fr/hal-03623351/document
https://hal.archives-ouvertes.fr/hal-03623351
https://hal.archives-ouvertes.fr/hal-03623351/file/puren_bourgeois_pellet_vernus_agoda2022.pdf
BASE
2
Image Annotation Tool
Roček, Martin. - : Charles University, Faculty of Arts, 2022
BASE
3
Corpus of 1968 Slovenian literature Maj68 2.0
BASE
4
Corpus of academic Slovene KAS 2.0
Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2022. : Faculty of Computer and Information Science, University of Ljubljana, 2022
BASE
5
Collection of Slovenian paremiological units Pregovori 1.0
Babič, Saša; Peče, Miha; Erjavec, Tomaž. - : ZRC SAZU, 2022. : Jožef Stefan Institute, 2022
BASE
6
Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries
BASE
7
Giving Depth to TEI-Based Descriptions of Manuscripts: The Golden Gospel of Ham
In: Aethiopica; Vol. 24 (2021); 175–211 ; 2194-4024 ; 1430-1938 ; 10.15460/aethiopica.24.0 (2022)
BASE
8
Towards an Online Database of Ancient Dramatic Meters
In: FuturoClassico FCl; N. 7 (2021); 143-164 ; 2465-0951 (2022)
BASE
9
Understanding and reading XML ; Comprendre et lire le XML
In: https://halshs.archives-ouvertes.fr/halshs-03637142 ; École thématique. Comprendre et lire le XML, Bibliothèque du lab. CRISCO EA 4255, France. 2021, pp.72 ; Comprendre et lire le XML (2021)
BASE
10
XML and namespaces ; XML et espaces de nom
In: https://halshs.archives-ouvertes.fr/halshs-03637189 ; Doctorat. XML et espaces de nom, Bibliothèque du lab. CRISCO EA 4255, France. 2021, pp.44 ; XML et espaces de nom (2021)
BASE
11
Language Processing in Digital Editions of Russian 18th Century Texts ; Лингвистическая обработка цифровых изданий русских текстов XVIII века
In: Corpora 2021 International Conference ; https://halshs.archives-ouvertes.fr/halshs-03285725 ; Corpora 2021 International Conference, Saint-Petersburg State University, Jul 2021, Saint-Petersbourg, Russia ; https://events.spbu.ru/events/corpora-2021 (2021)
BASE
12
La Base de français médiéval et le consortium CAHIER : dix ans d'échanges et de collaborations
In: 10 ans avec CAHIER. Des corpus d'auteurs pour les humanités à leur exploitation numérique ; https://halshs.archives-ouvertes.fr/halshs-03363517 ; 10 ans avec CAHIER. Des corpus d'auteurs pour les humanités à leur exploitation numérique, Jun 2021, Bordeaux, France ; https://cahier10.sciencesconf.org/344494 (2021)
BASE
13
Expanding the content model of annotationBlock
In: Next Gen TEI, 2021 - TEI Conference and Members’ Meeting ; https://hal.archives-ouvertes.fr/hal-03380805 ; Next Gen TEI, 2021 - TEI Conference and Members’ Meeting, Oct 2021, Virtual, United States (2021)
BASE
14
ParCzech 3.0
Kopp, Matyáš; Stankov, Vladislav; Bojar, Ondřej. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2021
BASE
15
Spoken corpus Gos 1.1
Zwitter Vitez, Ana; Zemljarič Miklavčič, Jana; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
16
Corpus of 1968 Slovenian literature Maj68 1.0
BASE
17
Corpus of term-annotated texts RSDO5 1.1
BASE
18
Training corpus ssj500k 2.3
Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
19
Spoken corpus Gos VideoLectures 4.2 (transcription)
Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2021
BASE
20
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
BASE
