Search in the Catalogues and Directories

Hits 21 – 40 of 191

21	Slovenian Twitter dataset 2018-2020 1.0
	Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor. - : Jožef Stefan Institute, 2021

22	Slovene ontology of semantic types for nouns SLONEST-noun 1.0
	Kosem, Iztok; Pori, Eva; Gantar, Polona. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021

23	Corpus of Serbian Forms of Address 1.0
	Lemmenmeier-Batinić, Dolores; Ljubešić, Nikola; Samardžić, Tanja. - : Slavic Seminary, University of Zurich, 2021

24	Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
	Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021

25	Montenegrin web corpus meWaC 1.0
	Ljubešić, Nikola; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021

26	The Orange workflow for observing collocation clusters ColEmbed 1.0
	Kosem, Iztok; Čibej, Jaka; Ljubešić, Nikola. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021

27	Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
	Ljubešić, Nikola; Markoski, Filip; Markoska, Elena. - : Jožef Stefan Institute, 2021

28	Text collection for training the BERTić transformer model BERTić-data
	Ljubešić, Nikola. - : Jožef Stefan Institute, 2021
	Abstract: The BERTić-data text collection contains more than 8 billion tokens of mostly web-crawled text written in Bosnian, Croatian, Montenegrin or Serbian. The collection was used to train the BERTić transformer model (https://huggingface.co/classla/bcms-bertic). The data consists of web crawls before 2015, i.e. bsWaC (http://hdl.handle.net/11356/1062), hrWaC (http://hdl.handle.net/11356/1064), and srWaC (http://hdl.handle.net/11356/1063); previously unpublished 2019-2020 crawls, i.e. cnrWaC, CLASSLA-bs, CLASSLA-hr, and CLASSLA-sr; the cc100-hr and cc100-sr parts of CommonCrawl (https://commoncrawl.org/); and the Riznica corpus (http://hdl.handle.net/11356/1180). All texts were transliterated to the Latin script. The format of the text collection is one-sentence-per-line, empty-line-as-document-boundary. More details, especially on the applied near-deduplication procedure, can be found in the BERTić paper (https://arxiv.org/pdf/2104.09243.pdf).
	Keyword: language model; web corpus
	URL: http://hdl.handle.net/11356/1426
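
The one-sentence-per-line, empty-line-as-document-boundary format described in the abstract can be read with a few lines of Python. This is only a minimal sketch, not tooling distributed with the collection: the function name `iter_documents` and the sample text are hypothetical, and treating runs of several blank lines as a single boundary is an assumption.

```python
from typing import Iterable, Iterator, List


def iter_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each document as a list of sentences.

    Assumes one sentence per line and an empty line between documents.
    Consecutive empty lines are treated as one boundary (an assumption).
    """
    sentences: List[str] = []
    for raw in lines:
        sentence = raw.strip()
        if sentence:
            # Non-empty line: one sentence of the current document.
            sentences.append(sentence)
        elif sentences:
            # Empty line: the current document is complete.
            yield sentences
            sentences = []
    if sentences:
        # Last document when the input does not end with an empty line.
        yield sentences


# Hypothetical sample in the described layout (not taken from the collection).
sample = "First sentence.\nSecond sentence.\n\nAnother document.\n"
docs = list(iter_documents(sample.splitlines()))
```

Because the function accepts any iterable of lines, an open text file handle can be passed in directly, so a large file is streamed document by document rather than loaded whole.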

29	Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
	Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya. - : CLARIN ERIC, 2021

30	Corpus of Croatian news portals ENGRI (2014-2018)
	Bogunović, Irena; Kučić, Mario; Ljubešić, Nikola. - : University of Rijeka, Faculty of Maritime Studies, 2021

31	Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1
	Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž. - : Jožef Stefan Institute, 2021

32	The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.2
	Ljubešić, Nikola; Krsnik, Luka. - : Jožef Stefan Institute, 2021

33	The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3
	Ljubešić, Nikola; Krsnik, Luka. - : Jožef Stefan Institute, 2021

34	Abstracts from the KAS corpus KAS-Abs 1.0
	Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute; Faculty of Electrical Engineering and Computer Science, University of Maribor, 2021

35	Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
	Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya. - : CLARIN ERIC, 2021

36	Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
	Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya. - : CLARIN ERIC, 2021

37	Slovenian Twitter hate speech dataset IMSyPP-sl
	Kralj Novak, Petra; Mozetič, Igor; Ljubešić, Nikola. - : Jožef Stefan Institute, 2021

38	Multilingual comparable corpora of parliamentary debates ParlaMint 2.0
	Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya. - : CLARIN ERIC, 2021

39	English YouTube Hate Speech Corpus
	Ljubešić, Nikola; Mozetič, Igor; Cinelli, Matteo. - : Jožef Stefan Institute, 2021

40	Corpus of Written Standard Slovene Gigafida 2.0
	Krek, Simon; Erjavec, Tomaž; Repar, Andraž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
