Veksler, Vladislav Daniel; Govostes, Ryan Z; Gray, Wayne D
Defining the dimensions of the human semantic space (Incollection)
Sloutsky, Vladimir; Love, Brad; McRae, Ken (Eds.): 30th Annual Meeting of the Cognitive Science Society, pp. 1282-1287, Cognitive Science Society, Austin, TX, 2008.
@incollection{vdv08csc.paper,
title = {Defining the dimensions of the human semantic space},
author = { Vladislav Daniel Veksler and Ryan Z. Govostes and Wayne D. Gray},
editor = {Sloutsky, Vladimir and Love, Brad and McRae, Ken},
year = {2008},
date = {2008-01-01},
booktitle = {30th Annual Meeting of the Cognitive Science Society},
pages = {1282-1287},
publisher = {Cognitive Science Society},
address = {Austin, TX},
abstract = {We describe VGEM, a technique for converting probability-based measures of semantic relatedness (e.g. Normalized Google Distance, Pointwise Mutual Information) into a vector-based form to allow these measures to evaluate relatedness of multi-word terms (documents, paragraphs). We use a genetic algorithm to derive a set of 300 dimensions to represent the human semantic space. With the resulting dimension sets, VGEM matches or outperforms the probability-based measure, while adding the multi-word term functionality. We test VGEM's performance on multi-word terms against Latent Semantic Analysis and find no significant difference between the two measures. We conclude that VGEM is more useful than probability-based measures because it affords better performance, and provides relatedness between multi-word terms; and that VGEM is more useful than other vector-based measures because it is more computationally feasible for large, dynamic corpora (e.g. WWW), and thus affords a larger, dynamic lexicon.},
keywords = {computational linguistics, Latent Semantic Analysis, LSA, Measures of Semantic Relatedness, multidimensional semantic space, natural language processing, NGD, Normalized Google Distance, semantic dimensions, vector generation, VGEM},
pubstate = {published},
tppubtype = {incollection}
}
We describe VGEM, a technique for converting probability-based measures of semantic relatedness (e.g. Normalized Google Distance, Pointwise Mutual Information) into a vector-based form to allow these measures to evaluate relatedness of multi-word terms (documents, paragraphs). We use a genetic algorithm to derive a set of 300 dimensions to represent the human semantic space. With the resulting dimension sets, VGEM matches or outperforms the probability-based measure, while adding the multi-word term functionality. We test VGEM's performance on multi-word terms against Latent Semantic Analysis and find no significant difference between the two measures. We conclude that VGEM is more useful than probability-based measures because it affords better performance, and provides relatedness between multi-word terms; and that VGEM is more useful than other vector-based measures because it is more computationally feasible for large, dynamic corpora (e.g. WWW), and thus affords a larger, dynamic lexicon.
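The sketch below illustrates the general vector-generation idea the abstract describes: a word's vector has one component per dimension word, each component being the value of a probability-based relatedness measure between the word and that dimension word; multi-word terms are represented by summing word vectors, and relatedness is the cosine between vectors. This is only a minimal illustration under stated assumptions. The dimension words, toy document collection, and PMI estimate are hypothetical stand-ins, not the paper's genetic-algorithm-derived 300 dimensions or its web-scale co-occurrence counts.

```python
# Minimal VGEM-style sketch (illustrative assumptions, not the paper's setup):
# vector components are PMI values estimated over a small document collection,
# multi-word term vectors are sums of word vectors, relatedness is cosine similarity.

import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count per-document word occurrences and word-pair co-occurrences."""
    word_counts, pair_counts = Counter(), Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(documents)

def pmi(w1, w2, word_counts, pair_counts, n_docs):
    """Pointwise mutual information of two words over documents (0.0 if unseen)."""
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0 or word_counts[w1] == 0 or word_counts[w2] == 0:
        return 0.0
    return math.log((joint / n_docs) /
                    ((word_counts[w1] / n_docs) * (word_counts[w2] / n_docs)))

def word_vector(word, dimension_words, counts):
    """VGEM-style vector: one relatedness component per dimension word."""
    return [pmi(word, d, *counts) for d in dimension_words]

def term_vector(term, dimension_words, counts):
    """Multi-word term vector: component-wise sum of its word vectors."""
    vecs = [word_vector(w, dimension_words, counts) for w in term.lower().split()]
    return [sum(component) for component in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity used as the relatedness score between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Usage would follow the same pattern regardless of the underlying measure: build counts from a corpus, pick a set of dimension words, then compare terms with `cosine(term_vector(a, dims, counts), term_vector(b, dims, counts))`. Swapping PMI for another probability-based measure (e.g. a Normalized Google Distance style score) only changes the component function, which is the point of the vector-based reformulation.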