
Hits 1 – 20 of 69

1	Universal Dependencies 2.9
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE

2	Universal Dependencies 2.8.1
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE

3	Universal Dependencies 2.8
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE

4	The Orange workflow for observing collocation trends ColTrend 1.0
	Kosem, Iztok; Krek, Simon; Čibej, Jaka. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

5	Slovene ontology of semantic types for nouns SLONEST-noun 1.0
	Kosem, Iztok; Pori, Eva; Gantar, Polona. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

6	Valency lexicon extracted from the Gigafida 2.1 corpus
	Krek, Simon; Gantar, Polona; Krsnik, Luka. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

7	Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
	Krek, Simon; Gantar, Apolonija; Laskowski, Cyprian. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

8	The Orange workflow for observing collocation clusters ColEmbed 1.0
	Kosem, Iztok; Čibej, Jaka; Ljubešić, Nikola; Krek, Simon; Gantar, Polona; Arhar Holdt, Špela; Logar, Nataša; Laskowski, Cyprian; Klemenc, Bojan; Dobrovoljc, Kaja; Gorjanc, Vojko; Pori, Eva. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	Abstract: ColEmbed is a workflow (.OWS file) for Orange Data Mining (open-source machine learning and data visualization software: https://orangedatamining.com/) that allows the user to observe clusters of collocation candidates extracted from corpora. The workflow consists of a series of data filters, embedding processors, and visualizers. As input, the workflow takes a tab-separated file (.TSV/.TAB) with data on collocations extracted from a corpus, along with their relative frequencies by year of publication and other optional values (such as information on temporal trends). The user selects the features used to cluster the collocation candidates, along with the embeddings generated from the selected lemmas. Either one lemma or both lemmas can be selected, depending on the clustering criteria; for instance, to cluster adjective + noun candidates by the similarity of their noun components, only the second lemma is selected for embedding generation. The resulting embedding clusters can be visualized and further processed (e.g. by finding the closest neighbors of a reference collocation). The workflow is described in more detail in the accompanying README file. The entry also contains three .TAB files that can be used to test the workflow. The files contain collocation candidates (along with their relative frequencies per year of publication and four measures describing their temporal trends; see http://hdl.handle.net/11356/1424 for more details) extracted from the Gigafida 2.0 Corpus of Written Slovene (https://viri.cjvt.si/gigafida/) with three different syntactic structures (as defined in http://hdl.handle.net/11356/1415): 1) p0-s0 (adjective + noun, e.g. rezervni sklad), 2) s0-s2 (noun + noun in the genitive case, e.g. ukinitev lastnine), and 3) gg-s4 (verb + noun in the accusative case, e.g. pripraviti besedilo). Only collocation candidates with an absolute frequency of 15 or above were extracted. The ColEmbed workflow requires the Text Mining add-on for Orange to be installed. For installation instructions, a more detailed description of the different phases of the workflow, and the measures used to observe collocation trends, please consult the README file.
	Keyword: clustering; collocations; temporal trends; word embeddings
	URL: http://hdl.handle.net/11356/1425
	BASE

9	Training corpus ssj500k 2.3
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

10	Frequency lists of collocations from the Gigafida 2.1 corpus
	Krek, Simon; Gantar, Polona; Kosem, Iztok. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

11	Corpus of Written Standard Slovene Gigafida 2.0
	Krek, Simon; Erjavec, Tomaž; Repar, Andraž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

12	Universal Dependencies 2.7
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2020
	BASE

13	Universal Dependencies 2.6
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2020
	BASE

14	Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

15	Frequency lists of words from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

16	List of formulaic sequences in spoken Slovenian
	Dobrovoljc, Kaja; Roblek, Rebeka; Vianello, Chiara. - : Jožef Stefan Institute, 2020. : Centre for Language Resources and Technologies, University of Ljubljana, 2020
	BASE

17	Consonant-vowel structures in the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

18	Consonant-vowel structures in the Gigafida 2.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

19	Consonant-vowel structures in the GOS 1.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

20	Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE
