Analysis of Hidden Markov Model Method Implementation in Document Topic Sentence Extraction for Information Retrieval
Alfian Akbar Gozali (113060074)
Background
Growth in the number of Internet users
Growth in the number of web pages
Search engine development
Enormous number of indexing terms
Need more than just an ordinary 'trash' thrower!
Solution: Document Extraction!!
Hidden Markov Model (HMM)
Document Extraction
The Goals Are to Analyze…
How the HMM works
Effects of the parameters in the HMM
Differences between ordinary indexing and compressed indexing with the HMM
Effects of document variation on this system
What is an HMM?
An enhancement of the Markov chain
Predicts a sequence of patterns that cannot be observed directly
Consists of two state sequences: observed and hidden
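Recovering a hidden state sequence from an observed one can be sketched with Viterbi decoding. This is a minimal illustration only: the state names (`keep`, `skip`), the observation classes (`content`, `function`), and all probabilities are made-up toy values, not the model or parameters used in this work.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

# Toy model (hypothetical values, chosen only for illustration).
states = ("keep", "skip")
start_p = {"keep": 0.5, "skip": 0.5}
trans_p = {"keep": {"keep": 0.6, "skip": 0.4},
           "skip": {"keep": 0.4, "skip": 0.6}}
emit_p = {"keep": {"content": 0.9, "function": 0.1},
          "skip": {"content": 0.2, "function": 0.8}}

path = viterbi(["content", "function", "function", "content"],
               states, start_p, trans_p, emit_p)
print(path)  # ['keep', 'skip', 'skip', 'keep']
```

Log-probabilities are used so that long sequences do not underflow; this requires every probability in the toy tables to be non-zero.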
HMM Elements
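The original slide showed a figure here. For reference, a common textbook way to write the elements of an HMM is given below; the symbols are the usual textbook notation and are an assumption, since the figure's own notation is not recoverable. The transition and emission probabilities presumably correspond to the pTrans and pEmiss quantities reported in the results.

```latex
\lambda = (A, B, \pi)
\qquad
\begin{aligned}
A &= \{a_{ij}\},   & a_{ij} &= P(q_{t+1} = S_j \mid q_t = S_i) && \text{(transition probabilities)} \\
B &= \{b_j(k)\},   & b_j(k) &= P(o_t = v_k \mid q_t = S_j)     && \text{(emission probabilities)} \\
\pi &= \{\pi_i\},  & \pi_i  &= P(q_1 = S_i)                    && \text{(initial state distribution)}
\end{aligned}
```

Here $S_1, \dots, S_N$ are the hidden states, $v_1, \dots, v_M$ the observation symbols, $q_t$ the hidden state at step $t$, and $o_t$ the observation at step $t$.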
Topic Sentence Extraction
Depends on a particular language
Doesn't depend on a particular language
• Statistical approach
• HMM Hedge
ROUGE-2
Measures the accuracy of the system extraction against the human extraction
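ROUGE-2 is based on bigram overlap between a reference (human) text and a candidate (system) text. The sketch below computes the recall form of that overlap. It assumes lowercase whitespace tokenization and a single reference; the slides do not say which ROUGE-2 variant or preprocessing was actually used, and the example sentences are invented.

```python
from collections import Counter

def bigrams(text):
    """Count adjacent word pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

def rouge_2_recall(reference, candidate):
    """Share of reference bigrams that also appear in the candidate."""
    ref = bigrams(reference)
    cand = bigrams(candidate)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by how often the bigram occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total

human = "the cat sat on the mat"
system = "the cat sat on a mat"
score = rouge_2_recall(human, system)
print(score)  # 3 of the 5 reference bigrams are matched: 0.6
```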
Overall Design
System Testing
{NAME} and {NUMERIC} tags
α parameter in decoding
Effect of extraction
Kinds of corpus
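One way to see why {NAME} and {NUMERIC} tags can shrink the number of distinct terms is the sketch below. It is an assumption about how such tagging might work, not the method used in this work: numbers are detected with a simple regular expression, names come from a hypothetical `known_names` set, and the sample sentence is invented.

```python
import re

# Matches plain numbers such as "2", "2010", "3.5" or "1,000" (illustrative rule).
NUMERIC_PATTERN = re.compile(r"^\d+([.,]\d+)*$")

def tag_tokens(tokens, known_names):
    """Replace numeric tokens and known names with placeholder tags."""
    tagged = []
    for token in tokens:
        if NUMERIC_PATTERN.match(token):
            tagged.append("{NUMERIC}")
        elif token in known_names:
            tagged.append("{NAME}")
        else:
            tagged.append(token)
    return tagged

tokens = "Alice scored 2 goals and Bob scored 3 goals in 2010".split()
known_names = {"Alice", "Bob"}  # hypothetical name list
tagged = tag_tokens(tokens, known_names)

print(tagged)
print(len(set(tokens)), "distinct terms before tagging")
print(len(set(tagged)), "distinct terms after tagging")
```

Every name collapses to one term and every number to another, so the vocabulary that must be indexed becomes smaller.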
Result – Scenario 1 (tagging)
[Chart: pTrans. Number of terms vs. number of documents (0 to 200); series: Tagging, Untagging]
[Chart: pEmiss. Number of terms vs. number of documents (0 to 200); series: Tagging, Untagging]
[Chart: Extracting. Time vs. number of documents (0 to 200); series: Tagging, Untagging]
[Chart: Evaluation. Time vs. number of documents (0 to 200); series: Tagging, Untagging]
Result – Scenario 2 (alpha)
[Chart: Average Accuracy. Accuracy vs. alpha; series: average; recovered accuracy labels: 34.00%, 37.00%, 40.00%, 40.34%]
Result – Scenario 3 (extraction)
[Chart: Execution Time; series: with compression, without compression]
[Chart: Number of Terms; series: with compression, without compression]
71.95%
56.98%
Result – Scenario 4 (corpus)
[Chart: Extraction Accuracy (0% to 80%) per corpus: www.footballtribal.com, www.fifa.com, www.nytimes.com; series: average, maximum accuracy, minimum accuracy]
Conclusion
Tagging can reduce extraction time and the number of indexed terms
The optimum values of the alpha parameter are 0.2 and 0.3
Compression can reduce indexing time and the number of indexed terms
Variation among the corpora can influence system accuracy
That’s all…
Thank You…