PRESENTED BY ELVIRA NURFADHILAH
Based on work by Mohammad Teduh Uliniansyah, Shun Ishizaki, and Kiyoko Uchiyama
2ND WORDNET BAHASA WORKSHOP, 15-16 JANUARY 2015
Name: Elvira Nurfadhilah
Working at: Agency for the Assessment and Application of Technology (BPPT)
Laboratory: Intelligent Computing Laboratory (ICL)
Specialisation field: Image Processing and Natural Language Processing
Educational background:
o Undergraduate: Bogor Agricultural University, Computer Science (2011)
o Graduate: Bogor Agricultural University, Computer Science (2015)
Joined BPPT: 2014
Intelligent Computing Laboratory (ICL) at the Center for Information and Communication Technology (PTIK), BPPT.
ICL deals with image processing, computer vision, language technology and signal processing.
Natural Language Processing:
o Portal Bahasa (stemmer and concordance)
o Statistical machine translation
o Text-to-speech
o Etc.
Image Processing:
o Fingerprint and latent fingerprint
o Iris
o Face and face sketch
o Blood vessel
o Etc.
o Developing a malaria diagnosis tool based on images of thin and thick blood smears.
o Background
o Experimental Data
o Method
o Results and Discussion
o Utilizing the Proposed Technique for WordNet
o Demo Program
BACKGROUND
Ambiguities arise when a single surface word can be analysed as more than one possible combination of a root and affixes.
Example: beruang
o beruang (Noun Animal): "bear"
o ber(uang (Noun Concrete)): Verb Intransitive, "to have money"
o be(ruang (Noun Abstract Concept)): Verb Intransitive, "to have space"
EXPERIMENTAL DATA
The corpus consists of articles (politics, economics, sports, etc.) downloaded from the website of the daily newspaper "Kompas" (http://www.kompas.com). It contains 20,579,771 words in 1,105,156 sentences.
There are more than 800 possible combinations of affixes (prefixes, suffixes, and infixes).
METHOD
Process 1: Retrieve all possible POS tag candidates from the root word dictionary and the affix table.
Process 2: Link all possible nodes.
Process 3: Assign linking costs between nodes, search for the minimum cost, and decide the proper POS tag for each word.
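The linking step (Process 2) can be pictured as building a lattice in which every candidate of one word is connected to every candidate of the next. The following is a minimal Python sketch, assuming the candidates from Process 1 are already available per word; the `Node` tuple layout, the `BOS`/`EOS` sentence-boundary nodes, and `link_all_nodes` are hypothetical names introduced for illustration, not part of the original system.

```python
from itertools import product

# A lattice node: (surface word, analysis, POS tag); this layout is an assumption.
Node = tuple[str, str, str]

BOS: Node = ("<s>", "BOS", "BOS")   # hypothetical sentence-start node
EOS: Node = ("</s>", "EOS", "EOS")  # hypothetical sentence-end node


def link_all_nodes(candidates: list[list[Node]]) -> list[tuple[Node, Node]]:
    """Process 2 sketch: connect every node of word i to every node of word i + 1."""
    columns = [[BOS]] + candidates + [[EOS]]
    edges: list[tuple[Node, Node]] = []
    for left, right in zip(columns, columns[1:]):
        edges.extend(product(left, right))  # all left/right combinations
    return edges


# "Dia beruang": one candidate for "dia", three for "beruang" (Process 1 output).
example = [
    [("dia", "dia", "pronoun")],
    [("beruang", "beruang", "noun animal"),
     ("beruang", "ber+uang", "verb intransitive"),
     ("beruang", "be+ruang", "verb intransitive")],
]
print(len(link_all_nodes(example)))  # 7
```

The count is 1 + 3 + 3 = 7 edges: start to "dia", "dia" to each reading of "beruang", and each reading to the end node.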
Beruang
(ber + uang)
(be + ruang)
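Process 1 for "beruang" can be sketched as a root-dictionary lookup plus prefix stripping. This is a minimal Python sketch under simplifying assumptions: the tiny `ROOT_DICT` and `PREFIX_TABLE` are hypothetical stand-ins for the real root word dictionary and affix table, only prefixes are handled, and the be- form of ber- is allowed only before roots starting with r (the real allomorphy rules are broader).

```python
from typing import Callable

# Hypothetical toy root word dictionary: root -> POS categories.
ROOT_DICT: dict[str, list[str]] = {
    "beruang": ["noun animal"],
    "uang": ["noun concrete"],
    "ruang": ["noun abstract concept"],
}

# Hypothetical affix-table rows: (surface prefix, condition on the root, POS of result).
# Simplification: ber- surfaces as be- before roots starting with r.
PREFIX_TABLE: list[tuple[str, Callable[[str], bool], str]] = [
    ("ber", lambda root: not root.startswith("r"), "verb intransitive"),
    ("be", lambda root: root.startswith("r"), "verb intransitive"),
]


def retrieve_candidates(word: str) -> list[tuple[str, str]]:
    """Process 1 sketch: list (analysis, POS) candidates for one surface word."""
    # The word itself may be a root.
    candidates = [(word, pos) for pos in ROOT_DICT.get(word, [])]
    for prefix, allowed, derived_pos in PREFIX_TABLE:
        if word.startswith(prefix):
            root = word[len(prefix):]
            if allowed(root) and root in ROOT_DICT:
                candidates.append((f"{prefix}+{root}", derived_pos))
    return candidates


print(retrieve_candidates("beruang"))
# [('beruang', 'noun animal'), ('ber+uang', 'verb intransitive'), ('be+ruang', 'verb intransitive')]
```

All three readings from the background example are produced, which is exactly the ambiguity that the later cost-based processes must resolve.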
Let p1 and p2 be POS tags, where p2 directly follows p1. The cost of the pair (p1, p2) is:
$$\mathrm{Cost}(p_1, p_2) = 2\log\frac{N}{n(p_1, p_2)}$$
where n(p1, p2) is the number of times the pair (p1, p2) appears in the data and N is the total number of POS tag pairs in the data. Frequent pairs therefore receive low costs and rare pairs receive high costs.
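The cost formula can be computed directly from POS-tagged text. A minimal Python sketch, assuming the data is given as a list of sentences, each a sequence of POS tags; `pair_costs` and the tiny data set are hypothetical. The slide writes the formula as "2log", which this sketch reads as 2 times the natural logarithm; if base-2 logarithm notation was intended, replace `2 * math.log(...)` with `math.log2(...)`.

```python
import math
from collections import Counter


def pair_costs(tagged_sentences: list[list[str]]) -> dict[tuple[str, str], float]:
    """Cost(p1, p2) = 2 * log(N / n(p1, p2)) for every adjacent POS pair in the data.

    Assumption: "2log" is read as 2 times the natural logarithm.
    """
    counts: Counter = Counter()
    for tags in tagged_sentences:
        counts.update(zip(tags, tags[1:]))  # (p1, p2) where p2 directly follows p1
    total = sum(counts.values())  # N: all POS tag pairs in the data
    return {pair: 2 * math.log(total / n) for pair, n in counts.items()}


# Tiny hypothetical tagged data, for illustration only.
data = [["pronoun", "adverb", "verb"], ["pronoun", "adverb"]]
costs = pair_costs(data)
print(round(costs[("pronoun", "adverb")], 3), round(costs[("adverb", "verb")], 3))  # 0.811 2.197
```

Pairs never observed in the data (n = 0) make the formula undefined; this sketch simply leaves them out of the table.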
Example:
Cost (pronoun, adverb) = 4.9
Cost (pronoun, adjective) = 8.3
Cost (pronoun, conjunction) = 8.9
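Because every cost shares the same N, the difference between two costs depends only on the ratio of the pair counts. Applying the cost formula to two of the example pairs (the logarithm base is left as written on the slide):

$$
\mathrm{Cost}(\text{pronoun},\text{adjective}) - \mathrm{Cost}(\text{pronoun},\text{adverb})
= 2\log\frac{n(\text{pronoun},\text{adverb})}{n(\text{pronoun},\text{adjective})}
= 8.3 - 4.9 = 3.4
$$

The logarithm of the count ratio is therefore 1.7 > 0, i.e. in the corpus a pronoun is followed by an adverb more often than by an adjective, which is why the adverb reading receives the lower cost.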
An Example of Possible Analysis for a Simple Input Sentence
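The minimum-cost search of Process 3 can be sketched with dynamic programming over the lattice. This is a Viterbi-style search, named here as one standard way to find the cheapest path; the slides do not specify the search algorithm. The example uses "Dia sedang berada", where "sedang" may be an adverb ("currently") or an adjective ("medium"). Only the two pronoun costs come from the slides; the boundary costs, the other pair costs, the `BOS`/`EOS` tags, and the `unseen_cost` penalty are hypothetical values chosen for illustration.

```python
BOS, EOS = "BOS", "EOS"  # hypothetical sentence-boundary tags


def min_cost_tags(
    candidates: list[list[str]],
    cost: dict[tuple[str, str], float],
    unseen_cost: float = 20.0,
) -> tuple[list[str], float]:
    """Process 3 sketch: choose one POS per word so the summed linking cost is minimal.

    candidates[i] lists the POS tags retrieved for word i; pairs missing from
    `cost` receive the hypothetical penalty `unseen_cost`.
    """
    # best[tag] = (cheapest accumulated cost reaching this tag, tag path so far)
    best: dict[str, tuple[float, list[str]]] = {BOS: (0.0, [])}
    for column in candidates + [[EOS]]:
        nxt: dict[str, tuple[float, list[str]]] = {}
        for tag in column:
            # Extend every path from the previous column and keep the cheapest.
            nxt[tag] = min(
                (acc + cost.get((prev, tag), unseen_cost), path + [tag])
                for prev, (acc, path) in best.items()
            )
        best = nxt
    total, path = best[EOS]
    return path[:-1], total  # drop the trailing EOS marker


costs = {
    ("BOS", "pronoun"): 3.0,                  # hypothetical
    ("pronoun", "adverb"): 4.9,               # from the slides
    ("pronoun", "adjective"): 8.3,            # from the slides
    ("adverb", "verb intransitive"): 3.5,     # hypothetical
    ("adjective", "verb intransitive"): 6.0,  # hypothetical
    ("verb intransitive", "EOS"): 4.0,        # hypothetical
}
# "Dia sedang berada": "sedang" is ambiguous between adverb and adjective.
tags, total = min_cost_tags([["pronoun"], ["adverb", "adjective"], ["verb intransitive"]], costs)
print(tags, round(total, 1))  # ['pronoun', 'adverb', 'verb intransitive'] 15.4
```

Because this sketch keys lattice entries by POS tag, two analyses with the same tag (such as ber+uang and be+ruang, both intransitive verbs) collapse into one entry; telling them apart would require storing the analysis alongside the tag.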
RESULTS & DISCUSSION
References
• Uliniansyah, M. T., Ishizaki, S., and Uchiyama, K. (2004). Solving Ambiguities in Indonesian Words by Morphological Analysis Using Minimum Connectivity Cost. Journal of Natural Language Processing, Vol. 11, No. 1.
• Kridalaksana, H. (1996). Pembentukan Kata dalam Bahasa Indonesia [Word Formation in Indonesian]. PT Gramedia Pustaka Utama.
UTILIZING THE PROPOSED TECHNIQUE FOR WORDNET
WordNet can be combined with this technique to choose the proper sense of a word in a sentence: the POS tag selected by the minimum-cost analysis can narrow down which senses are candidates.
Example:
Dia\pronoun sedang\adverb berada\verb intransitive di\particle location dalam\noun abstract location kamar\noun building
("He/She is in the room.")
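One way to apply the POS decision to sense selection is to keep only the WordNet senses whose part of speech matches the chosen tag. The Python sketch below assumes a hypothetical toy sense inventory `SENSES` (a real system would look senses up in Wordnet Bahasa) and assumes that the first word of each tag, such as "noun" in "noun building", is the coarse POS used by the inventory; the English glosses are illustrative paraphrases.

```python
# Hypothetical toy sense inventory keyed by (lemma, coarse POS); a real system
# would query Wordnet Bahasa instead.
SENSES: dict[tuple[str, str], list[str]] = {
    ("sedang", "adverb"): ["in the process of doing something"],
    ("sedang", "adjective"): ["medium, moderate"],
    ("kamar", "noun"): ["room"],
}


def coarse_pos(tag: str) -> str:
    """Assumption: the first word of a tag ('noun building' -> 'noun') is the coarse POS."""
    return tag.split()[0]


def select_senses(tagged: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Keep, for each word, only the senses whose POS matches its chosen tag."""
    return [(word, SENSES.get((word.lower(), coarse_pos(tag)), [])) for word, tag in tagged]


sentence = [
    ("Dia", "pronoun"), ("sedang", "adverb"), ("berada", "verb intransitive"),
    ("di", "particle location"), ("dalam", "noun abstract location"), ("kamar", "noun building"),
]
for word, senses in select_senses(sentence):
    print(word, senses)
```

Here "sedang" keeps only its adverb sense because the minimum-cost analysis tagged it as an adverb; words absent from the toy inventory map to an empty list.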
DEMO PROGRAM