Cybernetics and Systems Analysis, Vol. 47, No. 4, July, 2011
A METHOD FOR THE COMPUTATION
OF THE SEMANTIC SIMILARITY
AND RELATEDNESS BETWEEN NATURAL
LANGUAGE WORDS
A. V. Anisimov, O. O. Marchenko, and V. K. Kysenko
UDC 681.3
Abstract. This paper develops methods for calculating the semantic similarity (closeness)-relatedness of
natural language words. The concept of semantic relatedness allows one to construct algorithmic models
for the context-linguistic analysis with a view to solving problems such as word sense disambiguation,
named entity recognition, natural language text analysis, etc. A new algorithm is proposed for estimating
the semantic distance between natural language words. This method is a weighted modification of the
well-known Lesk approach based on the lexical intersection of glossary entries.
Keywords: computer linguistics, semantic analysis of natural language texts, semantic
similarity-relatedness of words, semantic ambiguity of words.
INTRODUCTION
A key element in computer simulation of natural language processes is the possibility to determine the semantic
closeness (similarity), i.e., the semantic distance between concepts that is often specified on the graph of concepts (notions)
of an ontological knowledge base. The computation of semantic distances is widely used in many problems of computational
linguistics such as automatic abstracting and annotation of texts, word sense disambiguation, anaphora analysis, indexing
and search, and machine translation.
In a natural language, there are a number of classical problems of considerable complexity for the majority of tasks of
computer linguistics, namely, polysemy, homonymy, anaphoric references, pronouns, and other language phenomena whose
computer processing is impossible without some semantic analysis and semantic interpretation of a text. The essence of
polysemy and homonymy problems is that the same words denote sets of different concepts (for example, the word bank has
different semantic meanings such as a financial institution and a riverside). The context in which a given word is located
suggests the meaning in which it is used. To take into account the influence of the context and to determine the actual
meaning of some word, a computer system should find an estimate for the semantic closeness with respect to meanings of
words adjacent to it in the text for each meaning of this word. This is solved by the application of a function for computing
the semantic closeness and relatedness of concepts.
In computer linguistics, the anaphora problem is as follows: the same entity in a text is mentioned using different words (names); a particular case of an anaphora is a pronoun. For each pronoun, there can be a wide variety of candidates for
replacement (antecedents), i.e., groups of nouns that are located earlier in the text and that are denoted by this pronoun. One
can determine the candidate that is the correct antecedent by substituting each of them for a pronoun (anaphora) and
computing the degree of correspondence of the context of the candidate for replacement with the context of the pronoun
(anaphora). Such a correspondence is also found with the help of a function for computing the semantic closeness and
relatedness of concepts.
1060-0396/11/4704-0515 ©2011 Springer Science+Business Media, Inc.
Taras Shevchenko National University, Kiev, Ukraine, [email protected]. Translated from Kibernetika i Sistemnyi Analiz, No. 4, pp. 18–27, July–August 2011. Original article submitted March 10, 2011.
A semantic closeness (similarity) relationship is not merely a synonymy relationship, since the meanings of concepts can be closely related but not identical. The presence of many other relationships motivates the following refinement of semantic relatedness: engine and car are connected through the whole-part relationship, and cold and hot are antonyms. At the same time, it is difficult to establish a direct relationship between many words (for example, winter and blizzard), but, despite this, one can see that they are clearly semantically related.
Semantic closeness and semantic relatedness differ from one another. Whereas boat and launch are semantically close (similar) concepts, engine and fuel are semantically related concepts but are not similar in meaning.
Semantic closeness and semantic relatedness are relationships traditionally defined on the semantic graph of an
ontological knowledge base. The determination of the presence of some relationship between concepts is realized by
checking the existence of semantic relationships between nodes that contain corresponding concepts in an ontological
network. Such a check is often reduced to the problem of searching for the shortest path between vertices-concepts of the
graph of the knowledge base. The construction of the path is followed by the stage of its analysis and interpretation whose
purpose is the determination of the semantic meaning of the found path, i.e., the type of the semantic relationship that exists
between these concepts and the depth of this relationship.
There is also another approach to the estimation of the semantic closeness (similarity)-relatedness of concepts, which is proposed in [1]. Methods of this line of investigation compute the intersection of the lexical compositions of the entries defining two input concepts; the more words belong to this intersection, the more related the concepts are.
In this article, a new method for the determination of the semantic relatedness of concepts is proposed. We suggest that it is more expedient not to compute the simple intersection of the sets of lexemes of the two thesaurus entries that define the input concepts but to take into account the position of each word within the entry defining a concept. To this end, a thesaurus entry should be structured by partitioning it into zones with different priority levels, for example, “name,” “definition,” “links to other terms,” and “descriptive part.” Depending on the zone to which a meaningful word belongs, a definite priority weight is assigned to it. Thus, instead of a simple set of lexemes of the text defining a concept, a set of subsets of terms is considered in which each subset has its own weight. We propose to compute and analyze not the intersection of two lexical sets of the texts defining the input concepts but the intersection of these structured “multilevel” sets.
This allows one to consider all variants of pairwise intersections of subsets from the first and second set and to take into
account delicate nuances of the lexical structural organization of texts, for example, the number of common words in names
of the first and second concept (the priority weight of such an intersection is highest), the number of common words in the
definition of the first concept and in the name of the second word (as is obvious, its weight should be less than that of the
previous one), the number of common words in the definition of the first concept and in the descriptive part of the entry of
the second concept (its weight decreases still further), etc. Analyzing all possible variants of multilevel intersections and
selecting an optimal weight for each variant, one can construct a qualitatively new efficient estimate for semantic
closeness-relatedness of natural language words.
MODERN METHODS OF COMPUTATION OF SEMANTIC CLOSENESS
Let us consider existing methods for semantic distance computation. Several heuristic methods have been developed since the beginning of the 1980s.
The choice of source data, i.e., the foundation for the computation of semantic closeness, is very important. The
linguistic knowledge bases WordNet and ConceptNet are mostly used in investigations; Wikipedia and Google Search are
also involved. The most important results are obtained in using WordNet and Wikipedia [2–4].
One class of methods is based on the computation of the distance δ(c1, c2) between two concepts (nodes) c1 and c2 in some taxonomy (WordNet or the category tree of Wikipedia). For example, the shortest path between the two corresponding vertices in this taxonomy can be used. One of the first such metrics, proposed in [5], is as follows:

δ(c1, c2) = 1 / N_p,

where N_p is the number of vertices in the shortest path connecting the nodes c1 and c2. A noted drawback of this metric is that it ignores the nonuniform depth of taxonomy concepts. In [6], the following normalized version of this method is presented, which takes into account the height of the taxonomy being used:

δ(c1, c2) = −log (N_p / 2D),

where D is the maximal depth of the taxonomy tree.
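The two path-based metrics above can be sketched as follows; the toy taxonomy, graph representation, and function names are illustrative assumptions, not part of the cited methods:

```python
import math
from collections import deque

# Toy undirected taxonomy graph (illustrative).
graph = {
    "entity": ["animal"],
    "animal": ["entity", "cat", "dog"],
    "cat": ["animal"],
    "dog": ["animal"],
}

def shortest_path_vertices(graph, a, b):
    """N_p: the number of vertices in the shortest path from a to b (BFS)."""
    if a == b:
        return 1
    seen, frontier = {a}, deque([(a, 1)])
    while frontier:
        node, n = frontier.popleft()
        for nb in graph.get(node, []):
            if nb == b:
                return n + 1
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, n + 1))
    return None  # no path

def path_distance(n_p):
    """Metric of [5]: delta = 1 / N_p."""
    return 1.0 / n_p

def normalized_distance(n_p, d):
    """Normalized metric of [6]: delta = -log(N_p / 2D)."""
    return -math.log(n_p / (2.0 * d))

n_p = shortest_path_vertices(graph, "cat", "dog")  # cat - animal - dog
```

For the toy graph, the path cat–animal–dog contains three vertices, so the metric of [5] gives 1/3, while the normalized variant also factors in the tree height D.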
One more method is described in [7]. The proposed algorithm takes into account LSO(c1, c2), i.e., the lowest superordinate of the two nodes of the taxonomy graph that correspond to the concepts c1 and c2:

δ(c1, c2) = −log ( 2 · depth(LSO(c1, c2)) / (depth(c1) + depth(c2)) ),

where depth(x) is the distance from the taxonomy root to a node x.
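The LSO-based metric can be sketched in the same spirit; the child-to-parent table and all helper names are illustrative assumptions:

```python
import math

# Toy taxonomy as child -> parent links (illustrative, not WordNet).
parent = {"animal": "entity", "cat": "animal", "dog": "animal", "tiger": "cat"}

def ancestors(x):
    """Chain from x up to the root, inclusive."""
    chain = [x]
    while x in parent:
        x = parent[x]
        chain.append(x)
    return chain

def depth(x):
    """Distance from the taxonomy root to x, counting the root as depth 1."""
    return len(ancestors(x))

def lso(a, b):
    """Lowest superordinate: the deepest common ancestor of a and b."""
    anc_a = set(ancestors(a))
    for node in ancestors(b):  # walks upward, so the first hit is the lowest
        if node in anc_a:
            return node

def lso_distance(a, b):
    """Distance of [7]: -log(2*depth(LSO)/(depth(c1)+depth(c2)))."""
    return -math.log(2.0 * depth(lso(a, b)) / (depth(a) + depth(b)))
```

Here lso("tiger", "dog") is "animal", so with depths 4 and 3 the distance equals −log(4/7).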
In [8], Wikipedia was first used for the calculation of semantic distance. The WikiRelate! method applies the above-mentioned metrics to the Wikipedia category tree.
Another class of algorithms originates with M. Lesk [1]. He constructed an algorithm based on the idea that close concepts are defined with the help of similar collections of words. As the semantic distance between concepts, the ratio of the number of identical words in the definitions of the concepts to the total number of words in the two definitions is used.
In the past five years, several methods based on Wikipedia were developed whose accuracy was unattainable earlier.
In [9], the method Wikipedia Link-Based Measure (WLM) is proposed for the computation of semantic closeness on the
basis of links between pages. Its main idea is the assumption that a concept (represented by some Wikipedia entry in this
case) is rather exactly described with the help of incoming and outgoing links. Each link has its weight determined by the
frequency of its occurrence among all pages of the encyclopedia. Thus, to each entry corresponds a vector with links. The
weight of a link is computed using the well-known TF-IDF formula. The distance between entries is found with the help of
cosine distance between vectors of entry weights.
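The link-vector idea behind WLM can be illustrated with a small sketch; the IDF-style weighting shown here is a plausible stand-in for the exact formula, and all page names and counts are made up:

```python
import math

def link_vector(links, n_pages, pages_with_link):
    """Weight each link by an IDF-style score: links occurring on fewer
    pages of the encyclopedia are more informative."""
    return {l: math.log(n_pages / pages_with_link[l]) for l in set(links)}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical link statistics over a 10-page encyclopedia.
pages_with_link = {"liquid": 2, "ocean": 1, "ice": 1, "life": 5}
water = link_vector(["liquid", "ocean", "ice"], 10, pages_with_link)
sea = link_vector(["liquid", "ocean", "life"], 10, pages_with_link)
```

The closeness of the two entries is then the cosine between their link vectors; shared rare links (here "ocean") contribute the most.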
One of the most efficient methods is Explicit Semantic Analysis (ESA) described in [4]. In contrast to the well-known
algorithm Latent Semantic Analysis (LSA) in which implicit relationships between texts of entries are determined, in this
method, a concept is represented in explicit form with the help of a weighted sum of terms obtained from Wikipedia.
A given concept is projected into the space of vectors-entries of Wikipedia. Thus, semantic closeness is determined as the
cosine distance between vectors projected into the space of Wikipedia entries.
In [10], the method WikiWalk is presented that applies the technique for random walks on a graph. Two types of
graphs are considered that are constructed with the help of WordNet and Wikipedia. This method uses the following
algorithm called Personalized PageRank: a certain particle randomly wanders on graph nodes (in the case of Wikipedia, on
its entries) and passes to a new page with some probability. Thus, each graph node is characterized by a vector of
probabilities of transitions to other pages (a teleportation vector). Such a vector turns out to be a unique characteristic of
a Wikipedia page (and the corresponding described concept together with it). The semantic closeness is computed as
the distance between teleportation vectors of corresponding pages.
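The Personalized PageRank step can be sketched with plain power iteration; the tiny link graph, damping value, and iteration count are illustrative assumptions:

```python
def personalized_pagerank(adj, source, alpha=0.85, iters=100):
    """Random walk with restart: with probability 1 - alpha the particle
    teleports back to `source`; the resulting vector characterizes the page."""
    nodes = list(adj)
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iters):
        new = {n: 0.0 for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = alpha * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:                      # dangling node: return mass to source
                new[source] += alpha * rank[n]
        new[source] += 1.0 - alpha
        rank = new
    return rank

# Tiny illustrative link graph of three "entries".
adj = {"water": ["ocean", "ice"], "ocean": ["water"], "ice": ["water"]}
rank = personalized_pagerank(adj, "water")
```

The semantic closeness of two pages is then the distance between their personalized rank vectors; each iteration conserves the total probability mass.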
A METHOD FOR THE COMPUTATION OF SEMANTIC CLOSENESS-RELATEDNESS
The source data used in this paper are taken from the free Internet encyclopedia Wikipedia. At present, the English-language, Russian-language, and Ukrainian-language Wikipedias contain more than 3.5 million, more than 600 thousand, and more than 250 thousand entries, respectively. This large number of entries is a consequence of the openness of the project: each user can create, correct, and supplement entries. Owing to moderation, this does not decrease the quality of the entry texts, since practically every change is checked by a user or a group of users who have proved their competence. A very important factor is also the possibility of downloading a complete local copy of Wikipedia. However, this encyclopedia has definite drawbacks. Some entries are not completely objective; for example, an author can introduce his personal opinion concerning some question. One more minus is the insufficient rigor of the entry format, which considerably complicates the development of a program for analyzing the texts of the encyclopedia. The Internet encyclopedia Wikipedia is a unique and valuable but not formalized data source.
The Wikipedia structure has a number of properties that can be used in computing semantic closeness. These
properties can model some types of lexical relations between words.
• Synonymy. It is determined by means of redirect pages. As a rule, the contents of such entries consist of a string of the form “#REDIRECT <page name>.” For example, the entry auto refers to the page car.
• Homonymy. It is specified by special pages with a list of possible meanings of a concept. For example, the page
note contains links to different meanings of this word, for example, musical symbol, diplomatic note, financial bond, model
of tape recorders, and river name. In this work, pages of this type are used for word sense disambiguation, in particular, for
obtaining the list of possible meanings of a term.
• Cross references. They are represented by links to other Wikipedia entries. For example, the entry water contains
links to the following entries: chemical substance, liquid, ice, snow, steam, solvent, ocean, river, life, weather, climate, etc.
Such links show interrelations between concepts.
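The redirect and cross-reference conventions above can be handled with simple pattern matching; the exact wikitext patterns below are simplified assumptions about the markup:

```python
import re

REDIRECT_RE = re.compile(r"#REDIRECT\s*\[\[([^\]|]+)", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\]|#]+)")

def resolve_redirect(wikitext):
    """Return the redirect target if the entry is a redirect page, else None."""
    m = REDIRECT_RE.search(wikitext)
    return m.group(1) if m else None

def extract_links(wikitext):
    """Collect cross references ([[...]] links) from an entry body."""
    return LINK_RE.findall(wikitext)
```

A real parser must also handle piped links and templates; this sketch only shows the two lexical relations used by the method.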
The proposed method, called Estimated Weighted Overlap (EWO), is a development of the above-mentioned Lesk approach. Lesk's method proceeds from the assumption that close concepts are described (or defined) using similar collections of words, i.e., the number of common words in dictionary definitions can show the semantic closeness of the corresponding two concepts. The proposed functional-structural generalization of the Lesk method is based on the idea that a
text reflects the semantic ordering between words. Some words are more important than others depending on their position
in a text. For example, a word from the name of an entry (or from the definition of a term) usually has a greater significance
than a word from the end of the text. To introduce such a distinction, the proposed algorithm assigns to each word of an
entry text a weight corresponding to the significance of the word. Weights of words are computed from the following
features: the name of an entry contains a given word; the word belongs to the definition of the corresponding concept; the
word belongs to the first section of the entry; the word is a link to another entry; other words.
We assume that the two words to be estimated are applied to the input of the algorithm. It first fetches the corresponding entries from Wikipedia. Then the texts of the entries are divided into words. Next, the algorithm eliminates words from the “stop-list.” The stop-list contains words that do not carry a large semantic load, namely, prepositions, conjunctions, pronouns, words in general use, etc. At the next step, the algorithm divides the sets of words into subsets corresponding to the prescribed features. For example, for the features described above, the sets are as follows: L1 contains words from the name; L2, words from the concept definition; L3, words from the first section; L4, words representing cross references; L5, all other words. At the same time, if some word w belongs to Li, then it is eliminated from Lj for any j > i.
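The partitioning step can be sketched as follows, assuming an entry has already been split into named zones; the zone names, toy stop-list, and dict layout are illustrative:

```python
STOP_WORDS = {"a", "the", "of", "and", "in", "is", "to"}  # toy stop-list

def partition(entry, zones=("name", "definition", "first_section", "links", "rest")):
    """Split an entry's meaningful words into priority sets L1..Ln.
    A word is kept only in the highest-priority set in which it occurs."""
    seen = set()
    sets = []
    for z in zones:
        words = {w for w in entry.get(z, "").lower().split()
                 if w not in STOP_WORDS and w not in seen}
        seen |= words
        sets.append(words)
    return sets

entry = {
    "name": "water",
    "definition": "water is a chemical substance",
    "rest": "rivers and oceans contain water",
}
L = partition(entry)
```

Note how "water" survives only in the name set L1, reflecting the rule that a word belonging to Li is removed from every lower-priority Lj.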
This method proposes to take into account the structure of entries rather than to simply analyze intersections of two lexical collections of Wikipedia entries for two input concepts. If the names and definitions of concepts contain common terms, then the intersection of lexemes for names and definitions must have a much larger significance weight than the intersection for the rest of the entry body. It is proposed to divide the significant words of both entries according to the corresponding features into groups L1^1, ..., Ln^1 and L1^2, ..., Ln^2 and then to count the pairwise intersections Li^1 ∩ Lj^2. In the case being considered, the number of features is equal to five (in another realization of the algorithm, the number of features can differ). For each possible intersection, the corresponding priority weight is determined; it is maximal for the intersection of terms from the names, L1^1 ∩ L1^2, and minimal for common terms from the descriptive parts of the entries, L5^1 ∩ L5^2. An intermediate weight is assigned to an intermediate variant, for example, when some terms are used in the definition of the first concept but appear descriptively at the end of the entry for the second concept (the intersection L2^1 ∩ L5^2).
A weight wi is assigned to each word from the ith set Li. Based on the sets Li^1 and Lj^2, a matrix D is constructed in which an element D[i, j] is equal to the number of common words in Li^1 and Lj^2, i.e., |Li^1 ∩ Lj^2|, multiplied by the weight wij = wi · wj. We assume that the semantic closeness is equal to the normalized sum of the elements of the matrix D.
The algorithm performs the following actions.
1. For two concepts c1 and c2, retrieve the entries t1 and t2 defining these concepts. Extract all words from the entries t1 and t2. Denote the sets of words by T1 and T2, respectively.
2. Eliminate the words belonging to the stop-list from T1 and T2.
3. Divide the sets T1 and T2 into subsets L1^1, ..., Ln^1 and L1^2, ..., Ln^2 according to the given features, where n is the number of features.
4. Construct the matrix D with the elements

D[i, j] = wij · |Li^1 ∩ Lj^2|,  i, j = 1, ..., n.
5. Compute the semantic closeness value as the normalized sum

EWO(c1, c2) = ( Σ_{i=1}^{n} Σ_{j=1}^{n} D[i, j] ) / ( Σ_{i=1}^{n} wii · (|Li^1| + |Li^2|) ).
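Steps 1–5 can be condensed into a short sketch, assuming the two entries have already been reduced to their priority sets L1..Ln; the sample sets and weights are made up for illustration:

```python
def ewo(sets1, sets2, w):
    """EWO(c1, c2): sum w_i * w_j * |L_i^1 & L_j^2| over all pairs (i, j),
    normalized by sum_i w_ii * (|L_i^1| + |L_i^2|)."""
    n = len(w)
    num = sum(w[i] * w[j] * len(sets1[i] & sets2[j])
              for i in range(n) for j in range(n))
    den = sum(w[i] * w[i] * (len(sets1[i]) + len(sets2[i])) for i in range(n))
    return num / den if den else 0.0

# Two-feature example: words from the name (weight 2) and the rest (weight 1).
sets_car = [{"car"}, {"vehicle", "wheels"}]
sets_auto = [{"automobile"}, {"vehicle", "motor"}]
```

With these toy sets, only the low-priority intersection {vehicle} is shared, so the score stays small; an entry compared with itself scores much higher because the heavily weighted name terms coincide.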
The procedure for obtaining weights wi on the basis of the simulated annealing algorithm (a method for discrete
global optimization) is described below in detail.
WORD SENSE DISAMBIGUATION
Some concepts can have identical spellings but different meanings. For example, the word jaguar can denote an
animal from the cat family and a British car model. Thus, one should correctly choose a meaning (and an entry from
Wikipedia) depending on the second word of a pair. For example, if the pair of words <jaguar; lion> is applied to the input
of the algorithm, then jaguar must be considered as a big cat, and if the pair <jaguar; Mercedes> is applied, then it must be
interpreted as a car model. An algorithm is developed for resolving such ambiguities.
As above, a pair of words is applied to the input of the algorithm. For both words, we obtain a list of possible
candidate entries (meanings). Then for each pair of meanings in which the first meaning belongs to one list and the second
belongs to the other list, the semantic closeness value is computed. Then the pair with the largest value is chosen. More
formally, the algorithm is written as follows.
1. Obtain lists of meanings for both words as follows:
• retrieve the list of entries with the name of the form <word> (refinement) from the index;
• (additionally) retrieve the list of possible meanings from the page with the description of ambiguities.
2. For each pair of entries, calculate the semantic closeness (similarity)-relatedness value.
3. Choose the pair with the highest semantic closeness.
In practical implementations, this process can be optimized as follows: use only the first sections instead of the complete texts of entries. This optimization considerably reduces the computational complexity of the process but does not noticeably affect the accuracy of the computations.
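The disambiguation loop reduces to an argmax over candidate sense pairs; the similarity table below is a hypothetical stand-in for the EWO score:

```python
def disambiguate(cands1, cands2, similarity):
    """Choose the pair of candidate senses with the highest closeness value."""
    return max(((s1, s2) for s1 in cands1 for s2 in cands2),
               key=lambda p: similarity(*p))

# Hypothetical precomputed closeness values.
table = {
    ("jaguar (animal)", "lion"): 0.9,
    ("jaguar (car)", "lion"): 0.2,
    ("jaguar (animal)", "Mercedes"): 0.1,
    ("jaguar (car)", "Mercedes"): 0.8,
}
sim = lambda a, b: table.get((a, b), 0.0)
```

For the pair <jaguar; lion> this selects the animal sense, and for <jaguar; Mercedes> the car sense, exactly as described above.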
ESTIMATION OF WEIGHTS
For estimating the weights wi, the simulated annealing method [11] is used, i.e., a probabilistic heuristic technique for
solving global optimization problems. This method operates with points in a decision space. In the case being considered,
a point is a vector consisting of five weights that correspond to the chosen features. At each iteration of the algorithm, one
point is stored, namely, the current one, which can be changed according to a definite probabilistic rule. The structure of the
pseudo-code [12] of this algorithm for the maximization of a function F(x) is as follows.
1. Randomly choose an initial point x0.
2. Put x_best = x0.
3. While i < k, perform the following steps:
• randomly choose a point x among the neighbors of the point xi;
• if F(x) > F(x_best), then x_best = x;
• if F(x) > F(xi), then x_{i+1} = x;
• otherwise, if rnd < e^{(F(x) − F(xi)) / ti}, then x_{i+1} = x.
4. Return x_best.
Here, rnd is a random number between 0 and 1, and the parameter ti denotes elements of some decreasing sequence.
These values are called annealing temperatures.
On the whole, this method is similar to the gradient descent method, but the use of a probabilistic law prevents the
algorithm from “sticking” at points of local maxima. This property helps one to obtain more efficient results.
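The pseudo-code corresponds to the following runnable sketch; the target function, neighborhood, and cooling schedule are illustrative choices, not the paper's actual objective:

```python
import math
import random

def simulated_annealing(f, x0, neighbor, temps):
    """Maximize f: always accept improvements, and accept worse moves
    with probability exp((F(x) - F(x_i)) / t_i)."""
    x, best = x0, x0
    for t in temps:
        cand = neighbor(x)
        if f(cand) > f(best):
            best = cand
        if f(cand) > f(x) or random.random() < math.exp((f(cand) - f(x)) / t):
            x = cand
    return best

random.seed(1)
f = lambda v: -(v - 3.0) ** 2                        # single maximum at v = 3
neighbor = lambda v: v + random.uniform(-0.5, 0.5)   # local random step
temps = [10.0 * 0.95 ** i for i in range(500)]       # decreasing temperatures
best = simulated_annealing(f, 0.0, neighbor, temps)
```

In the paper's setting, x is the vector of five feature weights and F is the rank correlation with the training data.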
The Spearman rank correlation coefficient is used in the capacity of the function for maximization. The solution
search space is the space of vectors whose dimension is equal to the number of features used in the algorithm, i.e., to each
coordinate of such a vector corresponds the weight of some feature. To estimate weights, a small training base is created that
consists of pairs of words belonging to main classes of semantic closeness-relatedness relationships such as very close
concepts, absolutely independent concepts, words with many meanings, etc. An optimizing procedure was executed several
times and weights were chosen that provide the maximal correlation with the training base.
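The Spearman coefficient used as the objective can be computed directly from ranks; this sketch uses average ranks for ties, which is the common convention and an assumption here:

```python
def rank(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation of the rank sequences of xs and ys."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A perfectly monotone agreement between algorithm scores and expert scores gives 1, a reversed ordering gives −1.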
SOFTWARE IMPLEMENTATION
A software implementation of the proposed method is developed. The program is written in the programming
language Scala [13, 14], which is a modern well-designed language convenient for the creation of text-processing software.
The current Scala implementation compiles the initial text into a bytecode for the Java Virtual Machine (JVM). This
property allows one to execute the program on all operational systems that are supported by JVM (for example, Windows,
GNU/Linux, and MacOS X). As source data, a local copy of Wikipedia downloaded from the web-site of the project is used.
The total archive size is very large (more than 5.5 GB) and, hence, to realize an efficient fast search for entries, preprocessing was performed. We note that block archiving is used for the creation of the archive. This allows one to partition this large archive into a set of small (about 1 MB) archives and to create a search index for them. A single XML file (about 25 GB) that contains all Wikipedia entries is stored inside the archive. For retrieving entries from this file, a parser was developed that takes into account the block structure of the archive and can process large amounts of data. In general, the mentioned preprocessing can be described as follows.
1. For each entry from a local Wikipedia copy
• retrieve its name and text;
• eliminate parts of the text that are insignificant for the algorithm, for example, links to external resources,
comments, and image descriptions;
• store the name and processed entry text as a text file;
• add the pair <entry name; name of the text file in which the contents is stored> to the database.
2. After processing all Wikipedia entries, create the database index for the field “entry name.”
Thus, entries are stored in conventional text files. As a database, the modern nonrelational document-oriented database MongoDB is used, which, according to the results of a set of tests, is considered to be one of the most productive. Also important is the possibility of search based on regular expressions in the database, which is actively used in word sense disambiguation. The size of the final database equals 1.5 GB. On the whole, this approach to data storage has allowed us to achieve extremely high efficiency in searching for and retrieving entries.
To optimize weight parameters (to implement the simulated annealing method), a separate application is developed. The
interaction of the optimizer with the program is realized with the help of configuration files. The optimizer gives an answer in the
form of a vector of real numbers, i.e., weight parameters of the algorithm that provide the highest correlation with a training set.
The program for computing semantic distance is developed with console and graphic interfaces. The graphic interface
allows one to introduce pairs of words for estimating semantic closeness in interactive mode. This interface is more
user-friendly and, in addition to estimation as such, allows for browsing a large amount of additional information such as
entry texts, lists of candidate entries, weights of words, etc. The console interface can be more easily called from other
programs and is controlled with the help of parameters of the command line. We also plan to develop an additionally
downloaded separate library for a better integration with third-party applications.
For testing algorithms for the computation of semantic closeness-relatedness, the collection of scored word pairs Finkelstein WordSimilarity-353 [15] is frequently used. It contains 353 pairs of words estimated by experts; each pair is rated by a real number between 0 and 10. As an estimate of the operation of the proposed algorithm, the Spearman rank correlation coefficient was used. The coefficients of correlation of the values computed by the proposed algorithm with the estimates from Finkelstein WordSimilarity are equal to 0.63, 0.68, and 0.74, respectively, for the following three modes:
• without word sense disambiguation;
• with partial word sense disambiguation (candidates are entries whose names are of the form <word>
(<refinement>));
• with complete word sense disambiguation (candidates are obtained from entry lists of ambiguities; as a rule, the
names of these entries are of the form <word> (disambiguation)).
These values demonstrate a substantial improvement of results owing to the use of word sense disambiguation. To
compare with some other methods, the diagram presented in Fig. 1 is constructed, which reflects the results of measurements
for various algorithms of computing semantic closeness. The diagram contains estimates obtained by the following methods:
• the RND method returning a random value for a pair of words;
• methods based on the search for a path in a graph, namely, the shortest path method (PATH), Leacock–Chodorov
(LCH) method, Wu–Palmer (WUP) method, and Resnik (RES) method [8, 16];
• the WLM method [9];
• the ESA method [4, 9];
• the EWO method.
The software implementation of the EWO method shows its high efficiency, namely, it estimates 20–100 pairs of
words per second. Some results obtained by the program for computing estimates of semantic closeness-relatedness of words
from a test sample are presented in Table 1.
CONCLUSIONS
In this article, a new efficient method for the computation of semantic closeness-relatedness between natural language
words is described. The presented algorithm is a modification of the well-known Lesk approach. It is based on the positional
structurization of texts of glossary entries that provides the assignment of a priority weight to each significant term
depending on its position in the entry text, which makes it possible to compute lexical intersections of different levels with
different priority weights. In this case, nuances of lexical structures of entries containing definitions of concepts are taken
into account rather than a simple intersection of words of two texts. The Internet Encyclopedia Wikipedia is used as source
data for computations. The simulated annealing method is used for the determination of weight parameters.
The described method has demonstrated a high degree of correlation with test data. Thus, the proposed algorithm
demonstrates results at the level of best modern methods and, at the same time, is transparent and intuitive. A software
implementation of the method is developed whose high operating speed allows one to use it in solving various problems of
computer linguistics.
There are several ways to improve the estimation quality, namely,
• addition of new factors to the weight model;
• integration with other techniques for calculating semantic closeness with a view to constructing a complex
estimate.
The efficiency can be increased, for example, by developing a parallel version of the program. This will allow using modern multiprocessor and multicore computing systems.
TABLE 1. Estimates of the Semantic Closeness-Relatedness of Word Pairs

Word 1     Word 2       Expert   Algorithm
car        automobile   8.94     9.99
magician   wizard       9.02     6.93
glass      magician     2.08     1.1
money      currency     9.04     5.67
noon       string       0.54     0.82
FBI        fingerprint  6.94     4.05
tiger      cat          7.35     4.13
tiger      tiger        10       10
book       paper        7.46     4.44
computer   keyboard     7.62     4.38
computer   internet     7.58     4.04
physics    chemistry    7.35     4.28
drink      ear          1.31     1.13

Fig. 1. Correlation of the compared algorithms with expert estimates.
This program for computing the semantic closeness-relatedness between natural language words is developed within the framework of a multi-purpose applied system for the semantic analysis and semantic processing of text documents.
REFERENCES
1. M. Lesk, “Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice
cream cone,” in: Proc. of the 5th Annu. Intern. Conf. on Syst. Document SIGDOC’86, ACM, New York (1986),
pp. 24–26.
2. S. Wubben, “Using free link structure to calculate semantic relatedness,” ILK Research Group Technical Report
Series No. 08–01, Tilburg Univ., Tilburg (2008).
3. S. P. Ponzetto and M. Strube, “Knowledge derived from Wikipedia for computing semantic relatedness,” Artif. Intell.
Res., No. 30, 181–212 (2007).
4. E. Gabrilovich and S. Markovitch, “Computing semantic relatedness using Wikipedia-based explicit semantic
analysis,” in: Proc. 20th Intern. Joint Conf. on Artif. Intell. (Hyderabad, 2007), Morgan Kauffman, San Francisco
(2007), pp. 1606–1611.
5. P. Resnik, “Using information content to evaluate semantic similarity in a taxonomy,” in: Proc. Intern. Joint Conf. on
Artif. Intell. (Montreal, 1995), Morgan Kauffman, San Francisco (1995), pp. 448–453.
6. C. Leacock, M. Chodorow, and G. A. Miller, “Using corpus statistics and wordnet relations for sense identification,”
Comput. Ling., 24, No. 1, 147–165 (1998).
7. Z. Wu and M. Palmer, “Verb semantics and lexical selection,” in: Proc. 32nd. Annu. Meet. of the Assoc. for Comput.
Ling. (Las Cruces, 1994), Morgan Kauffman, San Francisco (1994), pp. 133–138.
8. M. Strube and S. P. Ponzetto, “WikiRelate! Computing semantic relatedness using Wikipedia,” in: Proc. 21st Nat.
Conf. on Artif. Intell., AAAI, Boston, MA (2006), pp. 1419–1424.
9. D. Milne and I. H. Witten, “An effective, low-cost measure of semantic relatedness obtained from Wikipedia links,”
in: Proc. 1st AAAI Workshop on Wikipedia and Artif. Intell. (CIKM’2008) (Chicago, 2008), AAAI Press, Menlo
Park (USA) (2008).
10. E. Yeh, D. Ramage, C. D. Manning, et al., “WikiWalk: Random walks on Wikipedia for semantic relatedness,” in:
ACL-IJCNLP TextGraphs-4 Workshop 2009, Singapore (2009).
11. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, New Series, No. 220,
671–680 (1983).
12. S. Luke, Essentials of Metaheuristics (2009), http://cs.gmu.edu/~sean/book/metaheuristics/.
13. M. Odersky, Scala by Example, Progr. Meth. Lab., EPFL, Lausanne (2009).
14. M. Odersky, L. Spoon, and B. Venners, Programming in Scala, Artima Press, Mountain View (2008).
15. L. Finkelstein, E. Gabrilovich, Y. Matias, et al., “Placing search in context: The concept revisited,” ACM Trans.
Inform. Systems, 20, No. 1, 116–131 (2002).
16. T. Pedersen, S. Patwardhan, and J. Michelizzi, “WordNet::Similarity — Measuring the relatedness of concepts,” in:
Proc. 19th Nat. Conf. on Artif. Intell. (San Jose, 2004), Springer, Berlin (2004), pp. 1024–1025.