
Word Sense Induction for Under-Resourced Languages

Mohammad Nasiruddin

Univ. Grenoble Alpes

Laboratoire LIG - Bâtiment IM2AG B - 41 rue des Mathématiques

38400 Saint Martin d’Hères, France

[email protected]

Abstract. Word Sense Induction (WSI) is the task of automatically identifying the meaning of a polysemous word in a sentence in an unsupervised way, i.e. without relying on any handcrafted resources or manually annotated data. This article presents the state of the art of the different approaches to and evaluation methods for WSI, and discusses how WSI can be applied to under-resourced languages such as Bangla, Assamese, Oriya, and Kannada.

1 Introduction

In Computational Linguistics, Word Sense Induction (WSI) or Discrimination is an open and fundamental problem of Natural Language Processing (NLP), which concerns the automatic identification of the different uses (senses) of a target word, i.e. its meanings, in a given text, without relying on any external resources such as dictionaries or sense-tagged data.

Given that the output of WSI is a set of senses for the target word (i.e. a sense inventory), this task is closely related to that of Word Sense Disambiguation (WSD), which relies on an existing sense inventory and aims at assigning a sense label to resolve the ambiguity of words in context. The pre-defined sense inventories used by WSD algorithms (such as WordNet [1]) often contain fine-grained sense distinctions, which pose serious problems for computational semantic processing [2]. Besides, most WSD algorithms take a supervised approach, which requires a significant amount of manually annotated training data. As the aim of WSI is to infer the correct meaning of a particular word in context without relying on any sense inventory and/or sense-annotated corpora, it is considered one of the WSD approaches, known as Unsupervised WSD.

The manual construction of a sense inventory is an expensive (in terms of manpower), tedious and time-consuming task for new languages, especially for under-resourced ones, and the result depends heavily on the annotators and on the domain considered. An automatic procedure extracts only the senses that are objectively present in a particular corpus, which not only allows the sense inventory to be straightforwardly adapted to a new domain but also helps to escape the Knowledge Acquisition Bottleneck problem.


WSI seeks to automatically identify the senses (uses) of a given target word directly from a raw corpus [3]. It first induces the senses of words in a fully unsupervised way from the raw corpus, and then uses the induced sense inventory for the unsupervised disambiguation of particular occurrences of words. In the induction step, it maps words and contexts onto a limited number of topical dimensions in a semantic word space built from the raw corpus. In the disambiguation step, it applies the same principle to the target text: each target word to be disambiguated is matched against the topical dimensions obtained during induction, and the correct sense is then selected from the appropriate topical dimension by measuring the semantic similarity between words.
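
As a concrete illustration, the following minimal sketch (in Python, with invented toy contexts and an assumed number of senses) clusters bag-of-words context vectors of a target word for induction, then assigns a new occurrence to the nearest induced sense for disambiguation:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# Toy contexts of the ambiguous word "bank" (illustrative assumption).
train_contexts = [
    "the bank raised its interest rates",
    "she deposited money at the bank",
    "the boat drifted to the river bank",
    "they fished from the muddy bank of the stream",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(train_contexts)

# Induction step: each cluster stands for one induced sense.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Disambiguation step: map a new occurrence onto the induced senses.
new_context = vectorizer.transform(["she withdrew money from the bank"])
print("induced sense id:", kmeans.predict(new_context)[0])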

2 State of the Art of WSI

Word senses are a prerequisite for the disambiguation process, which must match them to the context in order to infer the appropriate meaning. WSI is an unsupervised WSD technique that uses machine learning methods on non-sense-tagged corpora, with no a priori knowledge about the task at all. During the learning phase, algorithms induce word senses from raw text by clustering word occurrences following the Distributional Hypothesis [4], [5]. This hypothesis is popularized by the phrase "a word is characterized by the company it keeps": two words are considered semantically close if they co-occur with the same neighboring words. This shifts the focus away from how to select the most suitable senses from an inventory towards how to automatically discover senses from text, and thereby avoids the Knowledge Acquisition Bottleneck problem. The single common thread binding these methods is the clustering strategy applied to the words of the un-annotated corpus. Although WSI ends up finding the actual senses of a word, the clustering and classification steps make it possible to label the senses, and these approaches are therefore treated as part of WSD.
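
A minimal sketch of this intuition, with made-up co-occurrence counts: words whose co-occurrence vectors point in similar directions are taken to be semantically close.

import numpy as np

# Rows: target words; columns: counts of co-occurrence with the
# neighboring words (water, money, loan, shore). Counts are invented.
cooc = {
    "bank":   np.array([3.0, 9.0, 7.0, 2.0]),
    "credit": np.array([0.0, 7.0, 9.0, 0.0]),
    "river":  np.array([8.0, 0.0, 0.0, 6.0]),
}

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print("bank~credit:", round(cosine(cooc["bank"], cooc["credit"]), 2))   # high: shared neighbors
print("river~credit:", round(cosine(cooc["river"], cooc["credit"]), 2)) # low: disjoint neighbors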

2.1 WSI Approaches

WSI algorithms extract the different senses of a word following two approaches: local and global. Local algorithms discover the senses of a word on a per-word basis, i.e. by clustering its instances in context according to their semantic similarity, whereas global algorithms discover senses in a global manner, i.e. by comparing and determining them against the senses of other words in a full-blown word space model [6]. Based on the type of clustering performed, the WSI approaches proposed in the literature are the following.

2.2.1 Clustering Approaches

Returning to the idea of [4], [5] that word meaning can be derived from context, [7] discovers word senses from text. The underlying hypothesis of this approach is that words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts [8]. Lin's algorithm [9] is a prototypical example of word clustering: it relies on syntactic dependency statistics collected from a corpus to produce a set of words for each discovered sense of a target word [10]. Using such a similarity function, the following clustering algorithms are applied to a test set of word feature vectors [7]: K-means, Bisecting K-means [11], Average-link, Buckshot, and UNICON [12]. Clustering By Committee (CBC) [7] also uses syntactic contexts for the task of sense induction, but exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest. The syntactic resources these approaches require are hard to obtain on a large scale for many domains and languages.
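
The sketch below illustrates one of the listed options, average-link (agglomerative) clustering, over toy word feature vectors; the features and counts are assumptions for illustration, not data from [7]:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

words = ["loan", "deposit", "shore", "riverbed"]
# Columns: toy co-occurrence features, e.g. counts with (money, water).
features = np.array([
    [9.0, 0.0],
    [8.0, 1.0],
    [0.0, 7.0],
    [1.0, 9.0],
])

# Average-link: merge clusters with the smallest mean pairwise distance.
labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(features)
for word, label in zip(words, labels):
    print(word, "-> sense cluster", label)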

2.2.2 Extended-clustering Approaches

Considering the observation that words tend to present one sense per collocation [13], [14] uses word triplets instead of word pairs. A well-known approach to extended clustering is the Context-group Discrimination algorithm [15], based on large matrix computation methods. Another approach, presented by [16], attempts to improve the usability of small, narrow-domain corpora through self-term expansion. [3] shows that the task of word sense induction can also be framed in a Bayesian context by considering the contexts of ambiguous words to be samples from a multinomial distribution. Other extended-clustering approaches include the bigram clustering technique proposed by [15], the clustering technique using co-occurrences within phrases presented by [17], the technique for word clustering using a context window presented by [18], and the method applying the information bottleneck algorithm to sense induction proposed by [19]. These additional techniques can be broadly categorized as either choosing additional features to consider for target words or introducing more effective clustering algorithms.
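
To make the Bayesian framing concrete, the sketch below uses scikit-learn's Latent Dirichlet Allocation as a stand-in for that family of models, reading each topic over an ambiguous word's contexts as an induced sense; this illustrates the idea, not the exact model of [3], and the data are invented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy contexts of the ambiguous word "bank" (illustrative assumption).
contexts = [
    "interest rates at the bank rose sharply",
    "the bank offered a new savings account",
    "reeds grew along the bank of the river",
    "the bank of the stream was steep and muddy",
]

X = CountVectorizer(stop_words="english").fit_transform(contexts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each context receives a distribution over the induced senses.
print(lda.transform(X).round(2))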

2.2.3 Graph-based Approaches

The main hypothesis of co-occurrence graph approaches is that the semantics of a word can be represented by means of a co-occurrence graph, whose vertices are co-occurring words and whose edges are co-occurrence relations. These approaches are related to word clustering methods, where co-occurrences between words can be obtained on the basis of grammatical [20] or collocational relations [21]. [22] introduces a Hypergraph model for these WSI approaches. HyperLex [21] is a successful graph algorithm based on the identification of hubs in co-occurrence graphs, but it has to cope with the need to tune a large number of parameters [23]. To deal with this issue, several graph-based algorithms have been proposed that rely on simple graph patterns, namely Curvature Clustering [24], Squares, Triangles and Diamonds (SquaT++) [25], and Balanced Maximum Spanning Tree Clustering (B-MST) [26]. The patterns aim at identifying meanings using the local structural properties of the co-occurrence graph. Chinese Whispers [27] is a randomized algorithm which partitions the graph vertices by iteratively transferring the mainstream message (i.e. word sense) to neighboring vertices. Co-occurrence graph approaches [28], [29], [30] have been shown to achieve state-of-the-art performance in standard evaluation tasks. [31] reinterprets the challenge of identifying sense-specific information in a co-occurrence graph as one of community detection, where a community is defined as a group of connected nodes that are more connected to each other than to the rest of the graph [32]. Recently, [33] introduced MaxMax, a linear-time graph-based soft clustering algorithm for WSI, which obtains results comparable to those of systems adopting existing state-of-the-art methods.
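
A compact rendering of the Chinese Whispers idea (a sketch over a made-up weighted co-occurrence graph, not the reference implementation of [27]):

import random
from collections import defaultdict

# Symmetric weighted co-occurrence graph: node -> {neighbor: weight}.
graph = {
    "money":   {"loan": 3.0, "deposit": 2.0},
    "loan":    {"money": 3.0, "deposit": 2.0},
    "deposit": {"money": 2.0, "loan": 2.0},
    "water":   {"shore": 3.0},
    "shore":   {"water": 3.0},
}

random.seed(0)
labels = {node: i for i, node in enumerate(graph)}  # one label per vertex
for _ in range(10):  # a handful of iterations usually converges
    nodes = list(graph)
    random.shuffle(nodes)
    for node in nodes:
        votes = defaultdict(float)
        for neighbor, weight in graph[node].items():
            votes[labels[neighbor]] += weight
        labels[node] = max(votes, key=votes.get)  # adopt the dominant label

print(labels)  # densely connected regions end up sharing one label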

Recent successful WSI approaches based on Latent Semantic Analysis (LSA) [34], [35] operate on word spaces [10]: they find latent dimensions of meaning using Non-negative Matrix Factorization (NMF), use these dimensions to distinguish between the different senses of a target word, and then proceed to disambiguate each given instance of that word.
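
A minimal sketch of this NMF-based strategy, with toy contexts and an assumed number of latent dimensions:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

contexts = [
    "the bank approved the mortgage and the loan",
    "savings accounts at the bank pay interest",
    "the river bank was covered with reeds",
    "erosion wore away the bank of the stream",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(contexts)

# Factorize the context-term matrix into two latent "sense" dimensions.
nmf = NMF(n_components=2, init="nndsvd", random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, component in enumerate(nmf.components_):
    top = component.argsort()[-3:][::-1]  # strongest terms per dimension
    print("sense", k, ":", [terms[i] for i in top])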

2.2.4 Translation-oriented Approaches

The WSI approaches described above cover only monolingual data; in the context of Machine Translation, recent work has incorporated bilingual data into the sense induction task. Translation-oriented WSI approaches involve augmenting source language contexts with target language equivalents. [36] describes this process using a bilingual corpus that has been word-aligned by type and token to construct two bilingual dictionaries, where each word type is associated with its translation equivalents. The lexicon is filtered in such a way that words and their translation equivalents have matching PoS tags and appear in the translation lexicons of both directions.
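
This filtering step can be sketched as follows, over a toy pair of direction-specific lexicons (all entries are invented for illustration):

# Each lexicon maps a (word, PoS) pair to its candidate translations.
src_to_tgt = {
    ("bank", "N"): {("banque", "N"), ("rive", "N")},
    ("bank", "V"): {("miser", "V")},
}
tgt_to_src = {
    ("banque", "N"): {("bank", "N")},
    ("rive", "N"):   {("bank", "N"), ("shore", "N")},
    ("miser", "V"):  {("bet", "V")},  # reverse direction disagrees
}

# Keep a pair only if PoS tags match and it is attested in both directions.
filtered = {}
for (word, pos), translations in src_to_tgt.items():
    kept = {(t, tpos) for (t, tpos) in translations
            if tpos == pos and (word, pos) in tgt_to_src.get((t, tpos), set())}
    if kept:
        filtered[(word, pos)] = kept

print(filtered)  # ("bank", "N") survives; ("bank", "V") is filtered out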

3 Specific Contribution and Research Plan

I address a major weakness of supervised WSD systems: their dependence on a fixed sense inventory and on lexical resources. This dependence represents a substantial setback for under-resourced languages, where such resources are unavailable. Furthermore, the general nature of lexical resources, and their disregard for the specific task and domain, has been shown to hinder the performance of NLP applications. In this regard, WSI, which infers senses directly from raw corpora and does not rely on predefined resources, presents a promising solution to the problem.

My contribution in this thesis is to develop a unified model using a (statistically oriented) probabilistic generative model, Independent Component Analysis (ICA), for the automatic induction of word senses from text and the subsequent disambiguation of particular word instances in a completely unsupervised fashion. I will then apply this model to under-resourced languages such as Bangla, Assamese, Oriya, and Kannada in order to achieve better performance.
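
As a sketch of the intended direction only (not the thesis model itself; the data, features and number of components are illustrative assumptions), ICA can be applied to context vectors of a target word, reading each independent component as a candidate sense dimension:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import FastICA

contexts = [
    "the bank lowered its lending rates",
    "customers queued inside the bank",
    "the bank of the river flooded overnight",
    "willows leaned over the grassy bank",
]

# FastICA needs a dense matrix, hence .toarray().
X = CountVectorizer(stop_words="english").fit_transform(contexts).toarray()
sources = FastICA(n_components=2, random_state=0).fit_transform(X)

# Assign each context to the component on which it loads most heavily.
print(abs(sources).argmax(axis=1))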


4 Current Status of the Research Plan

I am currently developing a WSI system which circumvents the question of the actual disambiguation method (the main source of discrepancy in Unsupervised WSD) and deals directly with raw corpora.

5 Expected Achievements

As a first-year PhD student, I look forward to continuing my current research work and exploring the new directions described above. This research will help me explore the different avenues of WSI and apply them to under-resourced languages.

References

1. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA (1998)

2. Ide, N., Wilks, Y.: Making Sense About Sense. In: Agirre, E., Edmonds, P. (eds.): Word Sense Disambiguation: Algorithms and Applications. Springer (2007) 47–73

3. Brody, S., Lapata, M.: Bayesian Word Sense Induction. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL '09). Stroudsburg, PA, USA (2009) 103–111

4. Harris, Z.: Distributional Structure. Word 10 (1954) 146–162

5. Curran, J. R.: From Distributional to Semantic Similarity. PhD Thesis, University of Edinburgh, Edinburgh, UK (2004)

6. Apidianaki, M., Van de Cruys, T.: A Qualitative Evaluation of Global Word Sense Induction. In: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). Tokyo, Japan (2011) 253–264

7. Pantel, P., Lin, D.: Discovering Word Senses from Text. In: Proceedings of the 8th International Conference on Knowledge Discovery and Data Mining (KDD) (2002) 613–619

8. Van de Cruys, T.: Mining for Meaning: The Extraction of Lexico-semantic Knowledge from Text. PhD Thesis, University of Groningen, The Netherlands (2010) 12–18

9. Lin, D.: Automatic Retrieval and Clustering of Similar Words. In: Proceedings of the 17th International Conference on Computational Linguistics (COLING). Montreal, Quebec, Canada (1998) 768–774

10. Van de Cruys, T., Apidianaki, M.: Latent Semantic Word Sense Induction and Disambiguation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT). Portland, Oregon, USA (2011) 1476–1485

11. Steinbach, M., Karypis, G., Kumar, V.: A Comparison of Document Clustering Techniques. In: Proceedings of the Workshop on Text Mining, 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000)

12. Lin, D., Pantel, P.: DIRT – Discovery of Inference Rules from Text. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). San Francisco, CA, USA (2001) 323–328


32. Fortunato, S.: Community Detection in Graphs. Physics Reports 486(3–5) (2010) 75–174

33. Hope, D., Keller, B.: MaxMax: A Graph-based Soft Clustering Algorithm Applied to Word Sense Induction. In: Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2013). Samos, Greece (2013) 368–381

34. Landauer, T., Dumais, S.: A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychological Review 104 (1997) 211–240

35. Landauer, T., Foltz, P., Laham, D.: An Introduction to Latent Semantic Analysis. Discourse Processes 25 (1998) 259–284

36. Apidianaki, M.: Translation-oriented Word Sense Induction Based on Parallel Corpora. In: Proceedings of LREC. Marrakech, Morocco (2008)

37. Rosenberg, A., Hirschberg, J.: V-Measure: A Conditional Entropy-based External Cluster Evaluation Measure. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic (2007) 410–420

38. Manandhar, S., Klapaftis, I. P., Dligach, D., Pradhan, S. S.: SemEval-2010 Task 14: Word Sense Induction and Disambiguation. In: Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala, Sweden (2010) 63–68

39. Rand, W. M.: Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association 66(336) (1971) 846–850

40. Zhao, Y., Karypis, G., Fayyad, U.: Hierarchical Clustering Algorithms for Document Datasets. Data Mining and Knowledge Discovery 10(2) (2005) 141–168

41. Di Marco, A., Navigli, R.: Clustering and Diversifying Web Search Results with Graph-based Word Sense Induction. Computational Linguistics 39(3). MIT Press (2013) 709–754