Hits 1 – 10 of 10

1	EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news articles 1.0
	Freienthal, Linda; Pelicon, Andraž; Martinc, Matej. - : Ekspress Meedia Group, 2022. : Styria Media Group, 2022
	BASE

2	Retweet communities reveal the main sources of hate speech
	Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor...
	In: PLoS One (2022)
	BASE

3	Slovenian Twitter dataset 2018-2020 1.0
	Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor. - : Jožef Stefan Institute, 2021
	BASE

4	Italian YouTube Hate Speech Corpus
	Cinelli, Matteo; Pelicon, Andraž; Mozetič, Igor. - : Jožef Stefan Institute, 2021
	BASE

5	Latvian user comment dataset 1.0
	Shekhar, Ravi; Purver, Matthew; Pollak, Senja; Pelicon, Andraž; Krustok, Ivar. - : Ekspress Meedia Group, 2021
	Abstract: The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in Russian.
	Description of the Datasets
	There are 6 CSV files:
	* ``lv-comments-2014.csv`` contains 2 753 655 comments from year 2014
	* ``lv-comments-2015.csv`` contains 2 221 122 comments from year 2015
	* ``lv-comments-2016.csv`` contains 1 897 669 comments from year 2016
	* ``lv-comments-2017.csv`` contains 1 896 083 comments from year 2017
	* ``lv-comments-2018.csv`` contains 2 222 051 comments from year 2018
	* ``lv-comments-2019.csv`` contains 1 421 883 comments from year 2019
	In sum: 12 412 463 comments
	Columns:
	* ``comment_id`` (string) - the ID of the written comment
	* ``article_id`` (string) - the ID of the article for which the comment was written
	* ``created_time`` (string) - the time and date of the comment
	* ``subject`` (string) - the title of the comment
	* ``reply_to_comment_id`` (string) - the parent comment's ID
	* ``content`` (string) - the comment itself
	* ``is_anonymous`` (string) -
		* 1 if the comment was published anonymously
		* 0 if the comment was published by a registered user
	* ``is_enabled`` (string) -
		* 1 if the comment was published (online)
		* 0 if it wasn’t published
		* Questionable field: not all have been manually moderated
		* No additional information from the moderators
	* ``channel_language`` (string) - the language of the channel
		* 'nat' for Latvian
		* 'rus' for Russian
	* ``create_user_id`` (string) - the user ID of the commenter
	* ``modereted_by`` (string) - the ID of the moderator
	Keyword: comment moderation; offensive language; user comment
	URL: http://hdl.handle.net/11356/1407
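	Example: the column list in the abstract suggests how the files might be read. The following is a minimal sketch, not part of the dataset's documentation. It assumes the CSV files are comma-delimited with a header row that uses the column names listed above, and that the flag columns hold the strings "1" and "0". The function name `count_by_language_and_status` and the sample rows (including the timestamp format) are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_by_language_and_status(csv_file):
    """Count comments per (channel_language, is_enabled) pair.

    `csv_file` is any open text file or file-like object whose first
    line is a header containing the column names from the abstract.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # All columns are documented as strings, so the flags are
        # compared and stored as text, not converted to integers.
        counts[(row["channel_language"], row["is_enabled"])] += 1
    return counts


# Invented sample rows, only to exercise the function.
sample = io.StringIO(
    "comment_id,article_id,created_time,subject,reply_to_comment_id,"
    "content,is_anonymous,is_enabled,channel_language,create_user_id,"
    "modereted_by\n"
    "c1,a1,2014-01-01 10:00:00,,,first comment,1,1,nat,,\n"
    "c2,a1,2014-01-01 10:05:00,,c1,a reply,0,1,nat,u7,\n"
    "c3,a2,2014-01-02 08:30:00,,,removed comment,1,0,rus,,m3\n"
)

counts = count_by_language_and_status(sample)
for (language, enabled), n in sorted(counts.items()):
    print(language, enabled, n)
```

	To run this against a downloaded file, pass an open file handle instead of the sample, for example `open("lv-comments-2014.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption. Reading row by row keeps memory use flat, which matters for files with around two million comments each.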
	BASE

6	Ekspress user comment dataset 1.0
	Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž. - : Ekspress Meedia Group, 2021
	BASE

7	24sata news comment dataset 1.0
	Shekhar, Ravi; Pranjic, Marko; Pollak, Senja. - : Styria Media Group, 2021
	BASE

8	SimLex-999 Slovenian translation SimLex-999-sl 1.0
	Pollak, Senja; Vulić, Ivan; Pelicon, Andraž. - : University of Ljubljana, 2021
	BASE

9	Investigating cross-lingual training for offensive language detection
	Pelicon, Andraž; Shekhar, Ravi; Škrlj, Blaž...
	In: PeerJ Comput Sci (2021)
	BASE

10	Sentiment Annotated Dataset of Croatian News
	Pelicon, Andraž; Pranjić, Marko; Miljković, Dragana. - : Jožef Stefan Institute, 2020
	BASE
