1 |
Semantic Data Set Construction from Human Clustering and Spatial Arrangement
In: Computational Linguistics, Vol 47, Iss 1, Pp 69-116 (2021)

Abstract:
Research into representation learning models of lexical semantics usually utilizes some form of intrinsic evaluation to ensure that the learned representations reflect human semantic judgments. Lexical semantic similarity estimation is a widely used evaluation method, but efforts have typically focused on pairwise judgments of words in isolation, or have been limited to specific contexts and lexical stimuli. These approaches have limitations: they either provide no context for judgments, and thereby ignore ambiguity, or provide very specific sentential contexts that cannot then be used to generate a larger lexical resource. Furthermore, they do not consider similarity between more than two items. We provide a full description and analysis of our recently proposed methodology for large-scale data set construction, which produces a semantic classification of a large sample of verbs in the first phase and multi-way similarity judgments made within the resultant semantic classes in the second phase. The methodology uses a spatial multi-arrangement approach, originally proposed in cognitive neuroscience to capture multi-way similarity judgments of visual stimuli. We have adapted this method to handle polysemous linguistic stimuli and much larger samples than previous work. We specifically target verbs, but the method can equally be applied to other parts of speech. We perform cluster analysis on the data from the first phase and demonstrate how this might be useful in the construction of a comprehensive verb resource. We also analyze the semantic information captured by the second phase and discuss the potential of the spatially induced similarity judgments to better reflect human notions of word similarity. We demonstrate how the resultant data set can be used for fine-grained analyses and evaluation of representation learning models on the intrinsic tasks of semantic clustering and semantic similarity.
In particular, we find that stronger static word embedding methods still outperform ...

Keyword:
Computational linguistics. Natural language processing; P98-98.5

URL: https://doi.org/10.1162/coli_a_00396 https://doaj.org/article/852bb49f4b554372b19c37919cd823a3

2 |
Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
In: Transactions of the Association for Computational Linguistics, Vol 9, Pp 410-428 (2021)

3 |
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity
In: Computational Linguistics, Vol 46, Iss 4, Pp 847-897 (2020)

4 |
Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More
In: http://aclweb.org/anthology/P/P14/P14-2135.pdf (2014)

5 |
Multi-way Tensor Factorization for Unsupervised Lexical Acquisition (author manuscript, published in "COLING 2012, Mumbai: India (2012)")
In: http://hal.inria.fr/docs/00/78/37/11/PDF/VandeCruysEtAl2012Multi.pdf (2013)

6 |
Learning syntactic verb frames using graphical models
In: http://www.cl.cam.ac.uk/~do242/Papers/acl12_verb_frames.pdf (2012)

7 |
Latent Vector Weighting for Word Meaning in Context (author manuscript, published in "Empirical Methods in Natural Language Processing, France (2011)")
In: http://halshs.archives-ouvertes.fr/docs/00/66/64/75/PDF/D11-1094_3_.pdf (2012)

8 |
Statistical metaphor processing
In: http://wing.comp.nus.edu.sg/~antho/J/J13/J13-2003.pdf (2012)

9 |
Learning syntactic verb frames using graphical models
In: http://www.aclweb.org/anthology-new/P/P12/P12-1044.pdf (2012)

10 |
Multi-way tensor factorization for unsupervised lexical acquisition
In: http://aclweb.org/anthology/C/C12/C12-1165.pdf (2012)

11 |
Document and corpus level inference for unsupervised and transductive learning of information structure of scientific documents
In: http://aclweb.org/anthology/C/C12/C12-2097.pdf (2012)

12 |
Exploring variation across biomedical subdomains
In: http://www.cl.cam.ac.uk/~do242/Papers/coling10_domain_final.pdf (2010)

13 |
Investigating the cross-linguistic potential of VerbNet-style classification
In: http://halshs.archives-ouvertes.fr/docs/00/53/90/36/PDF/report-camera-ready.pdf (2010)

14 |
Exploring variation across biomedical subdomains
In: http://aclweb.org/anthology/C/C10/C10-1078.pdf (2010)

15 |
LexSchem: A large subcategorization lexicon for French verbs
In: http://lipn.univ-paris13.fr/~messiant/publications/messiant-lrec08.pdf (2008)

16 |
A large-scale classification of English verbs
In: http://verbs.colorado.edu/~kipper/Papers/lrec-journal.pdf (2008)

17 |
The choice of features for classification of verbs in biomedical texts
In: http://www.aclweb.org/anthology-new/C/C08/C08-1057.pdf (2008)

18 |
Verb class discovery from rich syntactic data
In: http://www.cl.cam.ac.uk/users/alk23/cicling-08.pdf (2008)

19 |
Verb class discovery from rich syntactic data
In: http://www.cl.cam.ac.uk/%7Els418/works/cicling08.pdf (2008)

20 |
Automatic classification of verbs in biomedical texts
In: http://www.cl.cam.ac.uk/users/alk23/korhonen-acl-06.pdf (2006)