
Search in the Catalogues and Directories

Hits 1 – 20 of 73

1
Das ZDL-Regionalkorpus: Ein Korpus für die lexikografische Beschreibung der diatopischen Variation im Standarddeutschen
Nolda, Andreas (author); Barbaresi, Adrien (author)
IDS Mannheim
2
A Reproducible IT-Blog Corpus
In: Journal of Open Humanities Data; Vol 7 (2021); 17 ; 2059-481X (2021)
Abstract: The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along with a method to duplicate the data by updating its contents and downloading it to the user’s local machine. The targets have been hand-picked with the intention to represent the discourse on blogs and websites dedicated to questions at the intersection of technology and society from Germany and the United States of America. The texts have been retrieved by web crawling techniques. The resulting corpus is accessible through a search platform and also reproducible with freely accessible descriptors and software.
Keyword: corpus linguistics; discourse analysis; freedom of expression; internet policy; public discussion; web blogs
URL: https://doi.org/10.5334/johd.35
https://openhumanitiesdata.metajnl.com/jms/article/view/35
BASE
3
Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-9) 2021. Limerick, 12 July 2021 (Online-Event) ...
Lüngen, Harald; Kupietz, Marc; Bański, Piotr. - : Leibniz-Institut für Deutsche Sprache, 2021
BASE
4
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction ...
BASE
5
Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora
Arnold, Denis [author]; Fisseni, Bernhard [author]; Kamocki, Paweł [author]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2020
DNB Subject Category Language
6
Using Full Text Indices for Querying Spoken Language Data
Frick, Elena [author]; Schmidt, Thomas [author]; Bański, Piotr [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2020
DNB Subject Category Language
7
Proceedings of the LREC 2020 Workshop, Language Resources and Evaluation Conference, 11–16 May 2020, 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
Bański, Piotr [editor]; Barbaresi, Adrien [editor]; Clematide, Simon [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2020
DNB Subject Category Language
8
Evaluating a Dependency Parser on DeReKo
Fankhauser, Peter [author]; Do, Bich-Ngoc [author]; Kupietz, Marc [author]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2020
DNB Subject Category Language
9
Out-of-the-Box and Into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools
In: Language Resources and Evaluation Conference (LREC 2020) ; https://hal.archives-ouvertes.fr/hal-02732851 ; Language Resources and Evaluation Conference (LREC 2020), 2020, pp.5-13 (2020)
BASE
10
htmldate: A Python package to extract publication dates from web pages ...
Barbaresi, Adrien. - : Zenodo, 2020
BASE
11
Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
In: Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8). Edited by: Bański, Piotr; Barbaresi, Adrien; Clematide, Simon; Kupietz, Marc; Lüngen, Harald; Pisetta, Ines (2020). Marseille, France: European Language Resources Association. (2020)
BASE
12
What's New in EuReCo? Interoperability, Comparable Corpora, Licensing
Kupietz, Marc [author]; Margaretha, Eliza [author]; Diewald, Nils [author]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
13
The Vast and the Focused: On the need for domain-focused web corpora
Barbaresi, Adrien [author]; Bański, Piotr [editor]; Barbaresi, Adrien [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
14
Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures
Ortiz Suárez, Pedro Javier [author]; Sagot, Benoît [author]; Romary, Laurent [author]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
15
Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019. Cardiff, 22 July 2019
Bański, Piotr [editor]; Barbaresi, Adrien [editor]; Biber, Hanno [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
16
Modelling large parallel corpora. The Zurich Parallel Corpus Collection
Graën, Johannes [author]; Kew, Tannon [author]; Shaitarova, Anastassia [author]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
17
Deduplication in large web corpora
Benko, Vladimír [author]; Bański, Piotr [editor]; Barbaresi, Adrien [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
18
The best of both worlds: Multi-billion word “dynamic” corpora
Lüngen, Harald [editor]; Breiteneder, Evelyn [editor]; Barbaresi, Adrien [editor]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2019
DNB Subject Category Language
19
Diving Into The Complexities Of The Tech Blog Sphere
In: Digital Humanities 2019 ; https://hal.archives-ouvertes.fr/hal-02201532 ; Digital Humanities 2019, ADHO, Jul 2019, Utrecht, Netherlands ; https://dev.clariah.nl/files/dh2019/boa/0964.html (2019)
BASE
20
German Political Speeches Corpus ...
Barbaresi, Adrien. - : Zenodo, 2019
BASE
