
Search in the Catalogues and Directories

Hits 1 – 20 of 47

1
The Orange workflow for observing collocation trends ColTrend 1.0
Kosem, Iztok; Krek, Simon; Čibej, Jaka. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
2
Slovene ontology of semantic types for nouns SLONEST-noun 1.0
Kosem, Iztok; Pori, Eva; Gantar, Polona. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
3
Valency lexicon extracted from the Gigafida 2.1 corpus
Krek, Simon; Gantar, Polona; Krsnik, Luka. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
4
Morphological patterns from the Sloleks 2.0 lexicon 1.0
Arhar Holdt, Špela; Čibej, Jaka; Laskowski, Cyprian; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021. : Jožef Stefan Institute, 2021
Abstract: This entry consists of XML files with 96,290 lexical units (nouns, verbs, adjectives, and adverbs) from the Sloleks Morphological Lexicon of Slovene 2.0 (http://hdl.handle.net/11356/1230) that include codes for morphological patterns. The pattern codes were designed based on a manual analysis of automatically extracted paradigms and were obtained as follows: The lexical units from Sloleks 2.0 were first automatically clustered into groups through a rule-based approach based on (1) a number of predetermined grammatical features from the MULTEXT-East Version 6 morphosyntactic specifications for Slovenian (http://nl.ijs.si/ME/V6/), such as part of speech, gender and properness for nouns, aspect for verbs, and (2) the differentiating characteristics of their morphological paradigms (i.e. their mutable word parts, which are similar to but not always overlapping with the linguistic definition of word endings – for example: čas-Ø; čas-a; čas-om / prijatelj-Ø; prijatelj-a; prijatelj-em / odstot-ek; odstot-ka; odstot-kom). More than 1,000 automatically extracted pattern candidates were subsequently linguistically analyzed, combined into groups, and hierarchically organized. As a result, every lexical unit in the XML file features a code (listed as ) corresponding to the relevant morphological paradigm in the hierarchy (available in the accompanying file titled "nssss_morphological_pattern_hierarchy_1.0.tsv"). Because the patterns were extracted from Sloleks 2.0, they reflect the decisions that were implemented in its initial compilation, particularly in terms of the degree of morphological variation documented in the lexicon (e.g. not all morphological variants are necessarily included in the lexicon) and paradigm integrity (for instance, some nouns in Sloleks 2.0 only feature singular or plural forms). It should be noted that non-standard word forms were not included in the design of the patterns.
In addition, the XML file does not contain lexical units from Sloleks 2.0 that consist of word forms from more than one morphological paradigm (e.g. lesketati – lesketam / leskečem; or lojen – lojenega / lojnega), or other problematic units (such as those with missing or erroneous data).
Keyword: lexicon; morphological patterns; morphology; Slovenian language
URL: http://hdl.handle.net/11356/1411
BASE
5
Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
Krek, Simon; Gantar, Apolonija; Laskowski, Cyprian. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
6
The Orange workflow for observing collocation clusters ColEmbed 1.0
Kosem, Iztok; Čibej, Jaka; Ljubešić, Nikola. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
7
Training corpus ssj500k 2.3
Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
8
Frequency lists of collocations from the Gigafida 2.1 corpus
Krek, Simon; Gantar, Polona; Kosem, Iztok. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
9
Corpus of Written Standard Slovene Gigafida 2.0
Krek, Simon; Erjavec, Tomaž; Repar, Andraž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
BASE
10
Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning
In: LREC 2020 - Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-02879883 ; LREC 2020 - Language Resources and Evaluation Conference, May 2020, Marseille, France (2020)
BASE
11
List of word relations from the Sloleks 2.0 lexicon 1.0
Čibej, Jaka; Arhar Holdt, Špela; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
12
Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
13
Frequency lists of words from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
14
Consonant-vowel structures in the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
15
Consonant-vowel structures in the Gigafida 2.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
16
Consonant-vowel structures in the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
17
Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
18
Frequency lists of word parts from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
19
Training corpus ssj500k 2.2
Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019
BASE
20
Frequency lists of word parts from the Gigafida 2.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Jožef Stefan Institute, 2019
BASE

