
Search in the Catalogues and Directories

Hits 1 – 20 of 42

1
EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news articles 1.0
Freienthal, Linda; Pelicon, Andraž; Martinc, Matej. - : Ekspress Meedia Group, 2022. : Styria Media Group, 2022
BASE
2
Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? ...
BASE
3
Word-embedding based bilingual terminology alignment ...
BASE
4
Word-embedding based bilingual terminology alignment ...
BASE
5
Ekspress news article archive (in Estonian and Russian) 1.0
Purver, Matthew; Pollak, Senja; Freienthal, Linda. - : Ekspress Meedia Group, 2021
BASE
6
Latvian user comment dataset 1.0
Shekhar, Ravi; Purver, Matthew; Pollak, Senja. - : Ekspress Meedia Group, 2021
BASE
7
Ekspress user comment dataset 1.0
Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž. - : Ekspress Meedia Group, 2021
BASE
8
24sata news comment dataset 1.0
Shekhar, Ravi; Pranjic, Marko; Pollak, Senja. - : Styria Media Group, 2021
BASE
9
Keyword extraction datasets for Croatian, Estonian, Latvian and Russian 1.0
Koloski, Boshko; Pollak, Senja; Škrlj, Blaž. - : Ekspress Meedia Group, 2021. : Styria Media Group, 2021
BASE
10
24sata news article archive 1.0
Abstract: The 24sata news portal comprises a main portal with daily news and several smaller portals covering specific topics, such as automotive news, health, culinary content, and lifestyle advice. The dataset contains over 650,000 articles in Croatian from 2007 to 2019, along with their assigned tags.
Description of the Dataset
The dataset consists of 11 columns and 657,806 rows. Each row represents a single news article published on one of the 24sata news portals. Besides 'www.24sata.hr', the biggest news portal, articles from other niche portals affiliated with 24sata are also included.
Columns:
'article_id' - Public id of the article on the news site. The article can be accessed by concatenating the site URL and the article_id; for example, the article with article_id 614684 is available at 'www.24sata.hr/--614684'. This id is not unique across the dataset by itself: articles from different portals can share the same article_id.
'site' - The portal the article came from. There are eight different portals, ranging from daily news to more focused portals about automotive technology and trends, health and wellness, culinary trends and recipes, and lifestyle advice.
'title' - The title of the news article.
'lead' - Lead text, a short introduction to the content of the article. Can be empty.
'content' - The content of the news article, containing the bulk of the text. Can be empty if the whole article fits in the lead text.
'tags' - Tags, zero or more, separated by a '|' character. Article tags are chosen by the author of the article.
'section' - The main section of the news portal where the article was posted (not necessarily set). The most frequent section is 'Vijesti' (News).
'subsection' - The subsection of the section where the article was posted (not necessarily set). Each section can have multiple subsections.
'authors' - Article authors, zero or more, separated by a '|' character. Authors are not required to sign their articles, so this field can be empty.
'published_from' - The date when the article appeared on the portal. Journalists can write an article in advance and pick a future date and time for it to appear on the site, so 'published_from' can be much later than 'date_created'.
'date_created' - The date when the article was originally written. For all articles published before 2 February 2010, 'date_created' is set to 2 February 2010, the date when the portal was redesigned and the news article database was recreated.
Keyword: 24sata news articles; croatian news articles; news corpus
URL: http://hdl.handle.net/11356/1410
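The 'article_id', 'site', 'tags' and 'authors' columns described in the 24sata abstract can be handled with a few small helpers. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes each row has already been loaded as a dict of strings (the abstract does not state the file format), and it assumes the 'site' value is a host name such as 'www.24sata.hr', with the '/--' URL pattern inferred from the single example given. Rows are keyed on (site, article_id) because article_id alone is not unique.

```python
from typing import Dict, List, Tuple, Union


def article_url(site: str, article_id: Union[int, str]) -> str:
    """Hypothetical helper: join a portal host and an article_id following the
    'www.24sata.hr/--614684' pattern (inferred from that single example)."""
    return f"{site.rstrip('/')}/--{article_id}"


def split_multi(value: Union[str, None]) -> List[str]:
    """Split a '|'-separated field such as 'tags' or 'authors'.

    Both fields may hold zero values, so None or an empty string maps to [].
    """
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def article_key(row: Dict[str, str]) -> Tuple[str, str]:
    """'article_id' is not unique across portals, so key rows on (site, article_id)."""
    return (row["site"], row["article_id"])


# Usage with a made-up row (values are illustrative, not taken from the dataset).
row = {"site": "www.24sata.hr", "article_id": "614684",
       "tags": "sport|nogomet", "authors": ""}
print(article_url(row["site"], row["article_id"]))  # www.24sata.hr/--614684
print(split_multi(row["tags"]))                      # ['sport', 'nogomet']
print(split_multi(row["authors"]))                   # []
print(article_key(row))                              # ('www.24sata.hr', '614684')
```

Using the (site, article_id) pair as a dictionary key is one simple way to avoid silently merging articles from different portals that happen to share an id.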
BASE
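The 24sata abstract notes that 'date_created' was overwritten with 2 February 2010 for older articles, and that 'published_from' can be much later than 'date_created'. A minimal Python sketch for guarding against that placeholder when computing how far in advance an article was written might look like the following. It assumes both columns have already been parsed into datetime objects and treats "published before 2nd Feb 2010" as a test on 'published_from'; both are assumptions, and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Per the dataset description, articles published before this date had their
# 'date_created' overwritten with it when the portal database was recreated.
REDESIGN_DATE = date(2010, 2, 2)


def reliable_creation_date(date_created: datetime,
                           published_from: datetime) -> Optional[datetime]:
    """Return 'date_created', or None if it is the 2010-02-02 placeholder.

    Assumption: "published before 2nd Feb 2010" is judged by 'published_from'.
    """
    if published_from.date() < REDESIGN_DATE:
        return None
    return date_created


def scheduling_delay_days(date_created: datetime,
                          published_from: datetime) -> Optional[float]:
    """Days between writing and publication; None if 'date_created' is unreliable."""
    created = reliable_creation_date(date_created, published_from)
    if created is None:
        return None
    return (published_from - created).total_seconds() / 86400
```

Returning None rather than a zero or negative delay keeps the placeholder dates from skewing statistics on how early articles are scheduled.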
11
Latvian Delfi article archive (in Latvian and Russian) 1.0
Pollak, Senja; Purver, Matthew; Shekhar, Ravi. - : Ekspress Meedia Group, 2021
BASE
12
List of single-word male and female occupations in Slovenian
Supej, Anka; Ulčar, Matej; Robnik-Šikonja, Marko. - : Jožef Stefan Institute, 2021. : Faculty of Computer and Information Science, University of Ljubljana, 2021
BASE
13
SimLex-999 Slovenian translation SimLex-999-sl 1.0
Pollak, Senja; Vulić, Ivan; Pelicon, Andraž. - : University of Ljubljana, 2021
BASE
14
Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages ...
BASE
15
Slav-NER: the 3rd Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages ...
BASE
16
Evaluation of contextual embeddings on less-resourced languages ...
BASE
17
Simple Discovery of COVID IS WAR Metaphors Using Word Embeddings ...
BASE
18
Simple Discovery of COVID IS WAR Metaphors Using Word Embeddings ...
BASE
19
Investigating cross-lingual training for offensive language detection
In: PeerJ Comput Sci (2021)
BASE
20
Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech
In: Front Aging Neurosci (2021)
BASE


© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings