PRESENTED BY ELVIRA NURFADHILAH*
*Based on work by Mohammad Teduh Uliniansyah, Shun Ishizaki, and Kiyoko Uchiyama
2ND WORDNET BAHASA WORKSHOP, 15-16 JANUARY 2015

WORKING PROGRESS AND OUTLINE OF PROJECT PLAN WBS 1


Name : Elvira Nurfadhilah
Working at : Agency for the Assessment & Application of Technology (BPPT)
Laboratory : Intelligent Computing Laboratory (ICL)
Specialisation field : Image Processing and Natural Language Processing
Email : [email protected] / [email protected]

Educational background
Undergraduate : Bogor Agricultural University, Computer Science (2011)
Graduate : Bogor Agricultural University, Computer Science (2015)

Joined BPPT : 2014


Intelligent Computing Laboratory (ICL) at the Center for Information and Communication Technology (PTIK), BPPT.

ICL deals with image processing, computer vision, language technology and signal processing.

Natural Language Processing
o Portal Bahasa (Stemmer and Concordance)
o Statistical Machine Translation
o Text-to-Speech
o Etc.

Fingerprint and latent fingerprint, iris, face and face sketch, blood vessel, etc.

Developing a malaria diagnosis tool based on images of thin and thick smears.


o Background
o Experimental Data
o Method
o Results and Discussion
o Utilizing the proposed technique for WordNet
o Demo Program


BACKGROUND


Ambiguities arise when a single lexical word can be derived from more than one possible combination of affixes.

Example: beruang

o beruang (Noun Animal): "bear"

o ber ( uang ( Noun Concrete )) : Verb Intransitive, "to have money"

o be ( ruang ( Noun Abstract Concept )) : Verb Intransitive, "to have room"
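The three readings of "beruang" can be reproduced with a minimal sketch. The root dictionary and prefix table below are toy assumptions made up for illustration (the names ROOTS, PREFIXES, and analyses are hypothetical), not the project's actual resources, and only prefixes are handled.

```python
# Toy resources: illustrative assumptions, not the project's real dictionary or affix table.
ROOTS = {
    "beruang": ["noun_animal"],
    "uang": ["noun_concrete"],
    "ruang": ["noun_abstract_concept"],
}
# prefix -> POS of the derived word (simplified: ber-/be- derive intransitive verbs)
PREFIXES = {"ber": "verb_intransitive", "be": "verb_intransitive"}


def analyses(word):
    """Return every (segmentation, POS) reading licensed by the toy tables."""
    # Reading 1: the whole word is itself a root.
    readings = [(word, pos) for pos in ROOTS.get(word, [])]
    # Further readings: a known prefix followed by a known root.
    for prefix, derived_pos in PREFIXES.items():
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in ROOTS:
            readings.append((f"{prefix}+{stem}", derived_pos))
    return readings


for segmentation, pos in analyses("beruang"):
    print(segmentation, pos)
```

Each reading becomes one candidate for the word; choosing among them is left to the cost-based search described in the Method part.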


EXPERIMENTAL DATA


The corpus consists of articles (politics, economics, sports, etc.) downloaded from the website of the daily newspaper "Kompas" (http://www.kompas.com). It contains 20,579,771 words in 1,105,156 sentences.


There are more than 800 combinations of affixes (prefixes, suffixes, and infixes)


METHOD


Process 1: Retrieve all possible POS tag candidates from the root word dictionary and the affix table

Process 2: Link all possible nodes

Process 3: Assign linking costs between nodes, search for the minimum cost, and decide the proper POS tag for each word


Beruang

(ber + uang)

(be + ruang)



Let p1 be one POS tag and p2 another, where p2 directly follows p1. The cost of the pair (p1,p2) is:

Cost (p1,p2) = 2log(N/n(p1,p2))

where n(p1,p2) is the number of (p1,p2) pairs that appear in the data, and N is the total number of POS tag pairs in the data.
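As a rough sketch, the cost table can be derived from a POS-tagged corpus by counting adjacent tag pairs. One assumption is made loudly here: "2log" is read as the base-2 logarithm (a flattened superscript prefix); if it instead means 2 x log, the values only differ by a constant factor. The function name pair_costs and the tiny tag sequences are hypothetical.

```python
import math
from collections import Counter


def pair_costs(tagged_sentences):
    """Map each adjacent POS pair (p1, p2) to log2(N / n(p1, p2)).

    tagged_sentences: iterable of tag sequences, one per sentence.
    ASSUMPTION: the slide's '2log' is interpreted as log base 2.
    """
    pairs = Counter()
    for tags in tagged_sentences:
        # zip(tags, tags[1:]) yields every pair of directly adjacent tags.
        pairs.update(zip(tags, tags[1:]))
    total = sum(pairs.values())  # N: total number of POS pairs in the data
    return {pair: math.log2(total / n) for pair, n in pairs.items()}


# Made-up tag sequences, only to show that frequent pairs become cheap.
toy_data = (
    [["pronoun", "adverb"]] * 4
    + [["pronoun", "adjective"]] * 2
    + [["pronoun", "conjunction"], ["noun", "verb"]]
)
costs = pair_costs(toy_data)
print(costs[("pronoun", "adverb")], costs[("pronoun", "conjunction")])
```

Pairs that never occur in the data get no entry, so a real system would need some fallback cost for unseen pairs; that choice is not specified in the slides.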


Example :


Cost (pronoun, adverb) = 4.9

Cost (pronoun, adjective) = 8.3

Cost (pronoun, conjunction) = 8.9
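The minimum-cost search over linked nodes can be sketched as a shortest-path search through a lattice of candidate tags. The three pronoun costs below are the ones quoted above; everything else is an assumption for illustration: the candidate tags per word, the costs leading into the verb, the default cost for unseen pairs, and the function name min_cost_tagging.

```python
def min_cost_tagging(candidates, cost, default_cost=20.0):
    """Return (total_cost, tags) for the cheapest path through the lattice.

    candidates: one list of candidate POS tags per word.
    cost: dict mapping (p1, p2) to the linking cost of p2 following p1.
    default_cost: ASSUMED fallback for pairs missing from the table.
    """
    # best[tag] = (cost of the cheapest path ending in tag, that path)
    best = {tag: (0.0, [tag]) for tag in candidates[0]}
    for tags in candidates[1:]:
        step = {}
        for tag in tags:
            # Link this node to every previous node and keep the cheapest link.
            step[tag] = min(
                (prev_cost + cost.get((prev, tag), default_cost), path + [tag])
                for prev, (prev_cost, path) in best.items()
            )
        best = step
    return min(best.values())


# "Dia sedang berada": candidate tags per word are assumed for illustration.
CANDIDATES = [
    ["pronoun"],
    ["adverb", "adjective", "conjunction"],
    ["verb_intransitive"],
]
COSTS = {
    ("pronoun", "adverb"): 4.9,       # from the example above
    ("pronoun", "adjective"): 8.3,    # from the example above
    ("pronoun", "conjunction"): 8.9,  # from the example above
    ("adverb", "verb_intransitive"): 3.0,       # hypothetical
    ("adjective", "verb_intransitive"): 9.0,    # hypothetical
    ("conjunction", "verb_intransitive"): 6.0,  # hypothetical
}

total, tags = min_cost_tagging(CANDIDATES, COSTS)
print(total, tags)
```

Keeping only the cheapest path per tag at each word means the search stays linear in sentence length rather than enumerating every combination of candidates.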


An Example of Possible Analysis for a Simple Input Sentence


RESULTS & DISCUSSION


Reference

• Uliniansyah, M. T., Ishizaki, S., & Uchiyama, K. (2004). Solving Ambiguities in Indonesian Words by Morphological Analysis Using Minimum Connectivity Cost. Journal of Natural Language Processing, Vol. 11, No. 1.

• Kridalaksana, H. (1996). Pembentukan Kata dalam Bahasa Indonesia. PT Gramedia Pustaka Utama.


Utilizing the proposed technique for WordNet

We can use WordNet together with this technique to choose the proper sense of a word in a sentence.

Example ("He/she is currently in the room"):

Dia\pronoun sedang\adverb berada\verb intransitive di\particle location dalam\noun abstract location kamar\noun building
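One way to picture the idea is to let the chosen POS tag narrow down the candidate senses of a word. The sketch below parses the tagged output format shown above and looks up senses in a small hypothetical inventory (SENSES); a real system would query the wordnet instead, and the glosses here are only illustrative.

```python
def parse_tagged(line):
    """Split 'word<backslash>tag ...' output into (word, tag) pairs.

    Tags may contain spaces; words are assumed to be single tokens.
    """
    chunks = line.split("\\")
    if len(chunks) < 2:
        return []
    word = chunks[0].strip()
    pairs = []
    for chunk in chunks[1:-1]:
        # Each middle chunk is "<tag words> <next word>".
        *tag_words, next_word = chunk.split()
        pairs.append((word, " ".join(tag_words)))
        word = next_word
    pairs.append((word, chunks[-1].strip()))
    return pairs


# HYPOTHETICAL sense inventory keyed by (lemma, coarse POS), for illustration only.
SENSES = {
    ("sedang", "adverb"): ["in the middle of doing something"],
    ("sedang", "adjective"): ["medium, moderate"],
    ("sedang", "conjunction"): ["whereas"],
}


def senses_for(word, tag):
    """Keep only the senses whose coarse POS matches the first word of the tag."""
    coarse = tag.split()[0] if tag else ""
    return SENSES.get((word.lower(), coarse), [])


line = (r"Dia\pronoun sedang\adverb berada\verb intransitive "
        r"di\particle location dalam\noun abstract location kamar\noun building")
for word, tag in parse_tagged(line):
    print(word, "|", tag, "|", senses_for(word, tag))
```

Here the adverb tag for "sedang" rules out the adjective and conjunction readings before any finer sense selection takes place.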


Demo Program
