Word Sense Induction for Under-Resourced Languages
Mohammad Nasiruddin
Univ. Grenoble Alpes
Laboratoire LIG - Bâtiment IM2AG B - 41 rue des Mathématiques
38400 Saint Martin d’Hères, France
Abstract. Word Sense Induction (WSI) is the task of automatically identifying the meaning of a polysemous word in a sentence in an unsupervised way, i.e. without relying on any handcrafted resources or manually annotated data. This article presents the state of the art of different approaches to and evaluation methods for WSI, and how WSI can be applied to under-resourced languages such as Bangla, Assamese, Oriya, Kannada, etc.
1 Introduction
In Computational Linguistics, Word Sense Induction (WSI) or Discrimination is an open and fundamental problem of Natural Language Processing (NLP), which concerns the automatic identification of the different uses (i.e. senses or meanings) of a target word in a given text, without relying on any external resources such as dictionaries or sense-tagged data.
Given that the output of WSI is a set of senses for the target word (i.e. sense
inventory), this task is closely related to that of Word Sense Disambiguation (WSD),
which relies on an existing sense inventory and aims at assigning a sense label to resolve the ambiguity of words in context. The pre-defined sense inventories used by WSD algorithms (such as WordNet [1]) often contain fine-grained sense distinctions, which pose serious problems for computational semantic processing [2]. Besides,
most WSD algorithms take a supervised approach, which requires a significant
amount of manually annotated training data. As the aim of WSI is to infer the correct
meaning of a particular word in a context without relying on any sense inventory
and/or sense-annotated corpora, it is considered one of the WSD approaches, known as Unsupervised WSD.

The manual construction of a sense inventory is an expensive (in terms of man-power), tedious and time-consuming task for new languages, especially under-resourced ones, and the result is highly dependent on the annotators and the domain under consideration. By applying an automatic procedure we are able to extract only the senses that are objectively present in a particular corpus, which not only allows the sense inventory to be straightforwardly adapted to a new domain but also helps to escape the Knowledge Acquisition Bottleneck problem.
WSI seeks to automatically identify the senses or uses of a given target word directly from a raw corpus [3]. First, it induces the senses of words in a fully unsupervised way from the raw corpus, and then it uses the induced sense inventory for the unsupervised disambiguation of particular occurrences of words. In the induction step, it maps words and contexts from the raw corpus into a limited number of topical dimensions in a semantic word space. In the disambiguation step, it applies the same principle to the target text to be disambiguated: each target word in that text is matched against the topical dimensions obtained during induction, and the correct sense is then selected from the appropriate topical dimension by measuring the semantic similarity between the words.
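The following Python sketch illustrates this two-step pipeline on invented data: the induction step clusters context vectors of the (hypothetical) target word, and the disambiguation step assigns a new occurrence to the closest induced cluster. It is a minimal sketch of the principle, not any particular published system.

```python
# A minimal sketch of the two-step WSI pipeline described above:
# induce senses by clustering context vectors of a target word, then
# disambiguate new occurrences against the induced clusters. The toy
# contexts are invented; a real system would use a raw corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Occurrences of the ambiguous target word "bank" with their contexts.
train_contexts = [
    "deposit money in the bank account",
    "the bank approved the loan",
    "fish along the river bank",
    "the muddy bank of the stream",
]

# Induction step: map contexts into a vector space and cluster them.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_contexts)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Disambiguation step: project a new occurrence into the same space
# and assign it to the closest induced sense cluster.
new_context = ["she opened a savings account at the bank"]
sense = kmeans.predict(vectorizer.transform(new_context))[0]
print(f"induced sense id: {sense}")
```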
2 State of the Art of WSI
Word senses are a prerequisite for the disambiguation process, which must fit them to the context in order to infer the appropriate meaning. WSI is an unsupervised WSD technique that uses machine learning methods on non-sense-tagged corpora with no a priori knowledge about the task at all. During the learning phase, algorithms induce word senses from raw text by clustering word occurrences following the Distributional Hypothesis [4], [5]. This hypothesis is popularized by the phrase "a word is characterized by the company it keeps": two words are considered semantically close if they co-occur with the same neighboring words. As a result, the focus shifts away from how to select the most suitable senses from an inventory towards how to automatically discover senses from text. By applying WSI it is possible to avoid the Knowledge Acquisition Bottleneck problem. The single common thread that binds these methods is the clustering strategy used on the words in the un-annotated corpus. Although WSI ends up finding the actual senses of a word, the clustering and classification enable the labeling of those senses, and therefore these approaches are treated as part of WSD.
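For concreteness, the following sketch (on an invented three-sentence corpus) shows how the Distributional Hypothesis can be operationalized: each word is represented by the counts of its window neighbors, and two words are compared through the cosine of their count vectors.

```python
# A small sketch, under the Distributional Hypothesis, of comparing two
# words through the neighbours they co-occur with. The corpus is a toy
# assumption; any raw, untagged text would do.
from collections import Counter, defaultdict
import math

corpus = [
    "the doctor treated the patient in the clinic".split(),
    "the nurse helped the doctor at the hospital".split(),
    "the nurse treated the patient at the clinic".split(),
]

def cooccurrence_vectors(sentences, window=2):
    # Count, for each word, the words appearing within +/- window positions.
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

vecs = cooccurrence_vectors(corpus)
# "doctor" and "nurse" share neighbours, so they come out semantically close.
print(cosine(vecs["doctor"], vecs["nurse"]))
```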
2.1 WSI Approaches
WSI algorithms extract the different senses of a word following two approaches: local and global. Local algorithms discover the senses of a word on a per-word basis, i.e. by clustering its instances in context according to their semantic similarity, whereas global algorithms discover senses in a global manner, i.e. by comparing and determining them from the senses of other words in a full-blown word space model [6]. Based on the type of clustering performed, the WSI approaches proposed in the literature are the following.
2.1.1 Clustering Approaches
Returning to the idea of [4], [5] that word meaning can be derived from context, [7]
discovers word senses from text. The underlying hypothesis of this approach is that
8/10/2019 Word Sense Induction for Under-Resourced Languages
http://slidepdf.com/reader/full/word-sense-induction-for-under-resourced-languages 3/7
words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts [8]. Lin's algorithm [9] is a prototypical example of word clustering; it is based on syntactic dependency statistics occurring in a corpus and produces sets of words for each discovered sense of a target word [10]. Using a similarity function, the following clustering algorithms are applied to a test set of word feature vectors [7]: K-means, Bisecting K-means [11], Average-link, Buckshot, and UNICON [12]. Clustering By Committee (CBC) [7] also uses syntactic contexts for the task of sense induction, but exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest. Such approaches are hard to apply on a large scale for many domains and languages.
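As a rough illustration of this family of methods, the sketch below groups words by feature vectors with average-link agglomerative clustering; the vectors are invented stand-ins for the dependency statistics a real system would extract from a large corpus.

```python
# A hedged sketch of word clustering in the spirit of the algorithms
# listed above: words are represented by (hypothetical) feature vectors
# and grouped with average-link agglomerative clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

words = ["doctor", "nurse", "hospital", "river", "stream"]
# Toy feature vectors standing in for dependency-based statistics.
features = np.array([
    [5, 4, 1, 0],   # doctor
    [4, 5, 1, 0],   # nurse
    [3, 3, 2, 0],   # hospital
    [0, 0, 1, 5],   # river
    [0, 1, 0, 4],   # stream
])

clustering = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = clustering.fit_predict(features)
for word, label in zip(words, labels):
    print(word, "-> cluster", label)
```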
2.1.2 Extended-clustering Approaches
Following the observation that words tend to exhibit one sense per collocation [13], [14] uses word triplets instead of word pairs. A well-known approach to extended clustering is the Context-group Discrimination algorithm [15], based on large matrix computation methods. Another approach, presented by [16], attempts to improve the usability of small, narrow-domain corpora through self-term expansion. [3] shows that the task of word sense induction can also be framed in a Bayesian context by considering the contexts of ambiguous words to be samples from a multinomial distribution. Other extended-clustering approaches include the bigram clustering technique proposed by [15], the clustering technique using co-occurrences within phrases presented by [17], the technique for word clustering using a context window presented by [18], and the method applying the information bottleneck algorithm to sense induction proposed by [19]. These additional clustering techniques can be broadly categorized as either choosing additional features to consider for target words or introducing more effective clustering algorithms.
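The Bayesian framing of [3] can be approximated, for illustration only, with an off-the-shelf topic model: the contexts of an ambiguous word act as documents and senses as latent topics. The contexts below are invented, and LDA here merely stands in for the authors' exact model.

```python
# A rough illustration of the Bayesian framing from [3]: contexts of an
# ambiguous word are treated as documents, senses as latent topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

contexts = [
    "deposit money in the bank account",
    "the bank raised its interest rate",
    "fish along the river bank",
    "the grassy bank of the stream",
]

counts = CountVectorizer(stop_words="english").fit_transform(contexts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each context gets a distribution over latent "senses" (topics); the
# argmax is the induced sense of the target word in that context.
print(lda.transform(counts).argmax(axis=1))
```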
2.1.3 Graph-based Approaches
The main hypothesis of co-occurrence graph approaches is that the semantics of a word can be represented by means of a co-occurrence graph, whose vertices are co-occurring words and whose edges are co-occurrence relations. These approaches are related to word clustering methods, where co-occurrences between words can be obtained on the basis of grammatical [20] or collocational relations [21]. [22] introduces the idea of a hypergraph model for these WSI approaches. HyperLex [21] is a successful graph-based approach, based on the identification of hubs in co-occurrence graphs, but it has to cope with the need to tune a large number of parameters [23]. To deal with this issue, several graph-based algorithms have been proposed that are based on simple graph patterns, namely Curvature Clustering [24], Squares, Triangles and Diamonds (SquaT++) [25], and Balanced Maximum Spanning Tree Clustering (B-MST) [26]. The patterns aim at identifying meanings using the local
structural properties of the co-occurrence graph. Chinese Whispers, proposed by [27], is a randomized algorithm that partitions the graph vertices by iteratively transferring the mainstream message (i.e. the word sense) to neighboring vertices. Co-occurrence graph approaches [28], [29], [30] have been shown to achieve state-of-the-art performance in standard evaluation tasks. [31] reinterprets the challenge of identifying sense-specific information in a co-occurrence graph as one of community detection, where a community is defined as a group of connected nodes that are more connected to each other than to the rest of the graph [32]. Recently, [33] introduced MaxMax, a linear-time graph-based soft clustering algorithm for WSI, which obtains results comparable to those of systems adopting existing state-of-the-art methods.
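To make the graph-based family concrete, here is a minimal sketch of the Chinese Whispers idea of [27] on a toy graph: each vertex repeatedly adopts the majority label of its neighbors, so densely connected regions, corresponding to candidate senses, converge to a common label. The tiny graph is an invented stand-in for a real co-occurrence graph built around a target word.

```python
# A minimal sketch of Chinese Whispers [27]: vertices of a co-occurrence
# graph iteratively adopt the most frequent label among their neighbours,
# so dense regions (candidate senses) converge to a shared label.
import random
from collections import Counter
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("money", "account"), ("account", "loan"), ("money", "loan"),   # one sense
    ("river", "stream"), ("stream", "shore"), ("river", "shore"),   # another
])

labels = {node: i for i, node in enumerate(G.nodes)}  # each node starts alone
random.seed(0)
for _ in range(10):  # a few whispering iterations usually suffice
    nodes = list(G.nodes)
    random.shuffle(nodes)
    for node in nodes:
        neighbour_labels = Counter(labels[n] for n in G.neighbors(node))
        labels[node] = neighbour_labels.most_common(1)[0][0]

print(labels)  # nodes in the same dense region share a label (sense)
```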
Other recent successful WSI approaches are based on Latent Semantic Analysis (LSA) [34], [35] over word spaces [10]: they find latent dimensions of meaning using Non-negative Matrix Factorization (NMF), use these dimensions to distinguish between the different senses of a target word, and then proceed to disambiguate each given instance of that word.
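A brief sketch of the NMF step, on an invented occurrence-by-feature matrix, might look as follows; the latent dimensions found by the factorization play the role of the meaning dimensions described above.

```python
# A sketch of the NMF step mentioned above: a non-negative word-context
# matrix is factorized into latent dimensions that can be read as broad
# topical senses. The matrix here is a toy assumption.
import numpy as np
from sklearn.decomposition import NMF

# Rows: occurrences of a target word; columns: context features.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
], dtype=float)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)    # occurrence-to-dimension weights
print(W.argmax(axis=1))     # dominant latent dimension per occurrence
```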
2.1.4 Translation-oriented Approaches
The WSI approaches described above cover only monolingual data; in the context of Machine Translation, recent work has been done to incorporate bilingual data into the sense induction task. Translation-oriented WSI approaches involve augmenting the source language context with target language equivalents. [36] describes this process using a bilingual corpus that has been word-aligned by type and token to construct two bilingual dictionaries, where each word type is associated with its translation equivalents. The lexicon is filtered in such a way that words and their translation equivalents have matching PoS tags and words appear in the translation lexicons for both directions.
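A hypothetical sketch of this filtering step follows; the toy lexicons and PoS tags are invented, and the two dictionaries stand for the type-level lexicons extracted from a word-aligned bilingual corpus in [36].

```python
# A hypothetical sketch of the lexicon filtering described in [36]:
# keep a (source, target) pair only if the PoS tags match and the pair
# is attested in the translation lexicons of both directions.
src_to_tgt = {("bank", "NOUN"): {("banque", "NOUN"), ("rive", "NOUN")}}
tgt_to_src = {("banque", "NOUN"): {("bank", "NOUN")},
              ("rive", "NOUN"): {("bank", "NOUN"), ("shore", "NOUN")}}

def filter_lexicon(src_to_tgt, tgt_to_src):
    filtered = {}
    for (src, pos), translations in src_to_tgt.items():
        kept = {
            (tgt, tgt_pos)
            for tgt, tgt_pos in translations
            if tgt_pos == pos  # matching PoS tags
            and (src, pos) in tgt_to_src.get((tgt, tgt_pos), set())  # both directions
        }
        if kept:
            filtered[(src, pos)] = kept
    return filtered

print(filter_lexicon(src_to_tgt, tgt_to_src))
```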
3 Specific Contribution and Research Plan
I address a major weakness of supervised WSD systems, namely their dependency on a fixed sense inventory and on lexical resources. This dependence represents a substantial setback for under-resourced languages, where such resources are unavailable. Furthermore, the general nature of lexical resources, and their disregard for the specific task and domain, has been shown to hinder the performance of NLP applications. In this regard, WSI, which infers senses directly from raw corpora and does not rely on predefined resources, presents a promising solution to the problem.
My contribution in this thesis is to develop a unified model using a (statistically oriented) generative probabilistic technique, Independent Component Analysis (ICA), for the automatic induction of word senses from text and for the subsequent disambiguation of particular word instances in a completely unsupervised fashion. I will then apply this model to under-resourced languages such as Bangla, Assamese, Oriya, Kannada, etc. to achieve better performance.
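As a first, speculative outline of this direction (not the thesis model itself), ICA can be applied off-the-shelf to an occurrence-by-feature matrix so that each independent component can be inspected as a candidate sense dimension; the matrix below is invented for illustration.

```python
# A speculative sketch of the planned direction: apply Independent
# Component Analysis to an occurrence-by-feature matrix and examine
# each independent component as a candidate sense dimension.
import numpy as np
from sklearn.decomposition import FastICA

# Toy matrix: rows are occurrences of a target word, columns are
# context features (in practice, counts from a raw corpus).
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
])

ica = FastICA(n_components=2, random_state=0)
S = ica.fit_transform(X)             # occurrence scores per component
print(np.abs(S).argmax(axis=1))      # dominant component per occurrence
```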
4 Current Status of the Research Plan
I am currently developing a WSI system that circumvents the question of the actual disambiguation method (which is the main source of discrepancy in Unsupervised WSD) and deals directly with raw corpora.
5 Expected Achievements
As a first-year PhD student, I look forward to continuing my current research work and exploring the new directions described above. This research will help me to explore the different avenues of WSI and apply them to under-resourced languages.
References
1. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press. Cambridge, MA (1998)
2. Ide, N., Wilks, Y.: Making Sense About Sense. In: Agirre, E., Edmonds, P. (eds.): Word Sense Disambiguation: Algorithms and Applications. Springer (2007) 47–73
3. Brody, S., Lapata, M.: Bayesian Word Sense Induction. In: Proceedings of the 12th
Conference of the European Chapter of the Association for Computational Linguistics,
EACL ‘09. Stroudsburg, PA, USA (2009) 103–111
4. Harris, Z.: Distributional Structure. Word 10 (1954) 146–162
5. Curran, J. R.: PhD Thesis: From Distributional to Semantic Similarity. University of Edinburgh, Edinburgh, UK (2004)
6. Apidianaki, M., Van de Cruys, T.: A Qualitative Evaluation of Global Word Sense Induction.
In: Proceedings of the 12th International Conference on Intelligent Text Processing and
Computational Linguistics (CICLing). Tokyo, Japan (2011) 253–264
7. Pantel, P., Lin, D.: Discovering Word Senses from Text. In: Proceedings of the 8th
International Conference on Knowledge Discovery and Data Mining (KDD) (2002) 613–
619
8. Van de Cruys, T.: PhD Thesis: Mining for Meaning – The Extraction of Lexico-semantic
Knowledge from Text. University of Groningen, The Netherlands (2010) 12–18
9. Lin, D.: Automatic Retrieval and Clustering of Similar Words. In: Proceedings of the 17th International Conference on Computational Linguistics (COLING). Montreal, Quebec, Canada
(1998) 768–774
10. Van de Cruys, T., Apidianaki, M.: Latent Semantic Word Sense Induction and Disambiguation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT). Portland, Oregon, USA (2011) 1476–1485
11. Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques.
In: Proceedings of the Workshop on Text Mining, 6th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (2000)
12. Lin, D., Pantel, P.: DIRT – Discovery of Inference Rules from Text. In: Proceedings of the 7th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD). San Francisco, CA, USA (2001) 323–328
32. Fortunato, S.: Community Detection in Graphs. Physics Reports 486(3–5) (2010) 75–174
33. Hope, D., Keller, B.: MaxMax: A Graph-based Soft Clustering Algorithm Applied to Word
Sense Induction. In: Proceedings of the International Conference on Intelligent Text
Processing and Computational Linguistics (CICLing 2013). Samos, Greece (2013) 368–381
34. Landauer, T., Dumais, S.: A Solution to Plato's Problem: The Latent Semantic Analysis Theory of the Acquisition, Induction and Representation of Knowledge. Psychological Review 104(2) (1997) 211–240
35. Landauer, T., Foltz, P., Laham, D.: An Introduction to Latent Semantic Analysis. Discourse Processes 25 (1998) 259–284
36. Apidianaki, M.: Translation-oriented Word Sense Induction Based on Parallel Corpora. In: Proceedings of LREC. Marrakech, Morocco (2008)
37. Rosenberg, A., Hirschberg, J.: V-measure: A Conditional Entropy-based External Cluster
Evaluation Measure. In: Proceedings of the 2007 Joint Conference on Empirical Methods in
Natural Language Processing and Computational Natural Language Learning (EMNLP-
CoNLL). Prague, Czech Republic (2007) 410–420
38. Manandhar, S., Klapaftis, I. P., Dligach, D., Pradhan, S. S.: SemEval-2010 Task 14: Word
Sense Induction and Disambiguation. In: Proceedings of the 5th International Workshop on
Semantic Evaluation. Uppsala, Sweden (2010) 63–68
39. Rand, W. M.: Objective Criteria for Evaluation of Clustering Methods. Journal of the
American Statistical Association 66(336) (1971) 846–850
40. Zhao, Y., Karypis, G., Fayyad, U.: Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery 10(2) (2005) 141–168
41. Di Marco, A., Navigli, R.: Clustering and Diversifying Web Search Results with Graph-based Word Sense Induction. Computational Linguistics 39(3). MIT Press (2013) 709–754