
Search in the Catalogues and Directories

Hits 1 – 20 of 21

1
Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification
In: Front Artif Intell (2022)
Abstract: This article examines the basis of Natural Language Understanding of transformer-based language models, such as BERT. It does this through a case study on idiom token classification. We use idiom token identification as a basis for our analysis because of the variety of information types that have previously been explored in the literature for this task, including topic, lexical, and syntactic features. This variety of relevant information types means that the task of idiom token identification enables us to explore the forms of linguistic information that a BERT language model captures and encodes in its representations. The core of this article presents three experiments. The first experiment analyzes the effectiveness of BERT sentence embeddings for creating a general idiom token identification model, and the results indicate that the BERT sentence embeddings outperform Skip-Thought. In the second and third experiments we use the game theory concept of Shapley Values to rank the usefulness of individual idiomatic expressions for model training and use this ranking to analyse the type of information that the model finds useful. We find that a combination of idiom-intrinsic and topic-based properties contributes to an expression's usefulness in idiom token identification. Overall, our results indicate that BERT efficiently encodes a variety of information, ranging from topic to lexical and syntactic information. Based on these results we argue that, notwithstanding recent criticisms of language-model-based semantics, the ability of BERT to efficiently encode a variety of linguistic information types does represent a significant step forward in natural language understanding.
Keyword: Artificial Intelligence
URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964145/
https://doi.org/10.3389/frai.2022.813967
BASE
2
Training corpus hr500k 1.0
Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
BASE
3
Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian ...
BASE
4
Is it worth it? Budget-related evaluation metrics for model selection ...
BASE
5
Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
6
Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
7
Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
8
Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
BASE
9
Fine-grained human evaluation of neural versus phrase-based machine translation ...
BASE
10
Serbian-English parallel corpus srenWaC 1.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE
11
Finnish-English parallel corpus fienWaC 1.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE
12
Serbian web corpus srWaC 1.1
Ljubešić, Nikola; Klubička, Filip. - : Jožef Stefan Institute, 2016
BASE
13
Inflectional lexicon hrLex 1.0
Ljubešić, Nikola; Klubička, Filip. - : Faculty of Humanities and Social Sciences, University of Zagreb, 2016
BASE
14
Inflectional lexicon hrLex 1.2
Ljubešić, Nikola; Klubička, Filip; Boras, Damir. - : Faculty of Humanities and Social Sciences, University of Zagreb, 2016
BASE
15
Tourism English-Croatian Parallel Corpus 2.0
Toral, Antonio; Esplà-Gomis, Miquel; Klubička, Filip. - : Abu-MaTran project, 2016
BASE
16
Inflectional lexicon srLex 1.2
Ljubešić, Nikola; Klubička, Filip; Boras, Damir. - : Faculty of Humanities and Social Sciences, University of Zagreb, 2016
BASE
17
Croatian-English parallel corpus hrenWaC 2.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE
18
Inflectional lexicon srLex 1.0
Ljubešić, Nikola; Klubička, Filip. - : Faculty of Humanities and Social Sciences, University of Zagreb, 2016
BASE
19
Croatian web corpus hrWaC 2.1
Ljubešić, Nikola; Klubička, Filip. - : Jožef Stefan Institute, 2016
BASE
20
Slovene-English parallel corpus slenWaC 1.0
Ljubešić, Nikola; Esplà-Gomis, Miquel; Ortiz Rojas, Sergio. - : Jožef Stefan Institute, 2016
BASE

