Search in the Catalogues and Directories

Hits 1 – 20 of 42

1
EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news articles 1.0
Freienthal, Linda; Pelicon, Andraž; Martinc, Matej. - : Ekspress Meedia Group, 2022. : Styria Media Group, 2022
BASE
2
Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? ...
BASE
3
Word-embedding based bilingual terminology alignment ...
BASE
4
Word-embedding based bilingual terminology alignment ...
BASE
5
Ekspress news article archive (in Estonian and Russian) 1.0
Purver, Matthew; Pollak, Senja; Freienthal, Linda. - : Ekspress Meedia Group, 2021
BASE
6
Latvian user comment dataset 1.0
Shekhar, Ravi; Purver, Matthew; Pollak, Senja. - : Ekspress Meedia Group, 2021
BASE
7
Ekspress user comment dataset 1.0
Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž. - : Ekspress Meedia Group, 2021
BASE
8
24sata news comment dataset 1.0
Shekhar, Ravi; Pranjic, Marko; Pollak, Senja. - : Styria Media Group, 2021
BASE
9
Keyword extraction datasets for Croatian, Estonian, Latvian and Russian 1.0
Koloski, Boshko; Pollak, Senja; Škrlj, Blaž; Martinc, Matej. - : Ekspress Meedia Group, 2021. : Styria Media Group, 2021
Abstract: EACL Hackashop Keyword Challenge Datasets

This repository contains the ids of the articles used for the keyword extraction challenge at the EACL Hackashop on News Media Content Analysis and Automated Report Generation (http://embeddia.eu/hackashop2021/). The article ids can be used to generate the train-test split used in the paper:

Koloski, B., Pollak, S., Škrlj, B., & Martinc, M. (2021). Extending Neural Keyword Extraction with TF-IDF tagset matching. In: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, Kiev, Ukraine, pages 22–29.

Train and test splits are provided for Latvian, Estonian, Russian and Croatian. The articles with the corresponding ids can be extracted from the following datasets:
- Estonian and Russian (use the eearticles2015-2019 dataset): https://www.clarin.si/repository/xmlui/handle/11356/1408
- Latvian: https://www.clarin.si/repository/xmlui/handle/11356/1409
- Croatian: https://www.clarin.si/repository/xmlui/handle/11356/1410

The dataset_ids folder is organized as follows:
- latvian: contains latvian_train.json, a json file with ids of the train articles, to replicate the data used in Koloski et al. (2020), and latvian_test.json, a json file with ids of the test articles, to replicate the data
- estonian: contains estonian_train.json, a json file with ids of the train articles, to replicate the data used in Koloski et al. (2020), and estonian_test.json, a json file with ids of the test articles, to replicate the data
- russian: contains russian_train.json, a json file with ids of the train articles, to replicate the train data used in Koloski et al. (2020), and russian_test.json, a json file with ids of the test articles, to replicate the data
- croatian: contains croatian_id_train.tsv, a file with the sites and ids of the articles in the train set, and croatian_id_test.tsv, a file with the sites and ids of the articles in the test set. Note that ids alone are not unique across the dataset, so the site information must also be included to obtain a unique article identifier.

In addition, scripts are provided for extracting the articles (see the folder parse, which contains the scripts parse.py and build_croatian_dataset.py; the scripts require the pandas and bs4 Python libraries).

parse.py is used for extraction of the Estonian, Russian and Latvian train and test datasets.

Instructions: ESTONIAN-RUSSIAN
1) Retrieve the data ee_articles_2015_2019.zip
2) Create a folder 'data' and a subfolder 'ee'
3) Unzip the archive into the 'data/ee' folder
To extract the train/test Estonian articles: run the function 'build_dataset(lang="ee", opt="nat")' in the parse.py script
To extract the train/test Russian articles: run the function 'build_dataset(lang="ee", opt="rus")' in the parse.py script

Instructions: LATVIAN
1) Retrieve the Latvian data
2) Unzip it into the 'data/lv' folder
3) To extract the train/test Latvian articles: run the function 'build_dataset(lang="lv", opt="nat")' in the parse.py script

build_croatian_dataset.py is used for extraction of the Croatian train and test datasets.

Instructions: CROATIAN
1) Retrieve the Croatian data (file 'STY_24sata_articles_hr_PUB-01.csv')
2) Put the script 'build_croatian_dataset.py' in the same folder as the extracted data and run it (e.g., python build_croatian_dataset.py)

For additional questions: {Boshko.Koloski,Matej.Martinc,Senja.Pollak}@ijs.si
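The id-based selection described in the abstract can be illustrated with a minimal sketch. It is not the repository's parse.py or build_croatian_dataset.py; the helper names, the field names "id" and "site", the assumption that each json file holds a plain array of ids, and the assumption that the tsv files are headerless "site<TAB>id" rows are all hypothetical and chosen only for illustration. The point it shows is why the Croatian split needs a composite (site, id) key while the other languages can be matched on the id alone.

```python
import csv
import io
import json


def load_id_set(json_text):
    """Parse a JSON array of article ids into a set (assumed file layout)."""
    return set(json.loads(json_text))


def load_site_id_pairs(tsv_text):
    """Parse headerless 'site<TAB>id' rows into a set of (site, id) tuples (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {(row[0], row[1]) for row in reader if row}


def select_by_id(articles, ids):
    """Keep the articles whose id appears in the id set."""
    return [a for a in articles if a["id"] in ids]


def select_by_site_and_id(articles, pairs):
    """Keep the articles whose (site, id) pair appears in the pair set."""
    return [a for a in articles if (a["site"], str(a["id"])) in pairs]


# Toy stand-ins for the real archives; titles and ids are made up.
latvian_articles = [
    {"id": 1, "title": "A"},
    {"id": 2, "title": "B"},
    {"id": 3, "title": "C"},
]
latvian_train = select_by_id(latvian_articles, load_id_set("[1, 3]"))

# Two Croatian articles share id "10" but come from different sites,
# so the id alone cannot tell them apart.
croatian_articles = [
    {"site": "news", "id": "10", "title": "D"},
    {"site": "sport", "id": "10", "title": "E"},
    {"site": "news", "id": "11", "title": "F"},
]
croatian_train = select_by_site_and_id(
    croatian_articles, load_site_id_pairs("news\t10\nnews\t11\n")
)
```

With the real data, the strings passed to the two loaders would be replaced by the contents of the corresponding files in dataset_ids, and the article lists by the records read from the downloaded archives.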
Keyword: Croatian news articles; Estonian news articles; keyword extraction; Latvian news articles; news corpus; Russian news articles
URL: http://hdl.handle.net/11356/1403
BASE
10
24sata news article archive 1.0
Purver, Matthew; Shekhar, Ravi; Pranjić, Marko. - : Styria Media Group, 2021
BASE
11
Latvian Delfi article archive (in Latvian and Russian) 1.0
Pollak, Senja; Purver, Matthew; Shekhar, Ravi. - : Ekspress Meedia Group, 2021
BASE
12
List of single-word male and female occupations in Slovenian
Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko. - : Jožef Stefan Institute, 2021. : Faculty of Computer and Information Science, University of Ljubljana, 2021
BASE
13
SimLex-999 Slovenian translation SimLex-999-sl 1.0
Pollak, Senja; Vulić, Ivan; Pelicon, Andraž. - : University of Ljubljana, 2021
BASE
14
Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages ...
BASE
15
Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages ...
BASE
16
Evaluation of contextual embeddings on less-resourced languages ...
BASE
17
Simple Discovery of COVID IS WAR Metaphors Using Word Embeddings ...
BASE
18
Simple Discovery of COVID IS WAR Metaphors Using Word Embeddings ...
BASE
19
Investigating cross-lingual training for offensive language detection
In: PeerJ Comput Sci (2021)
BASE
20
Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech
In: Front Aging Neurosci (2021)
BASE

