Multilingual document mining and navigation using self-organizing maps. Presenter: Yu-Ting LU. Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee. 2011, IPM.
Intelligent Database Systems Lab
Presenter : YU-TING LU
Authors : Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee
2011. IPM
Multilingual document mining and navigation using self-organizing maps
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Web directories are generally constructed manually, which can lead to narrow coverage and inconsistency.
• Most existing directories provide only monolingual hierarchies that organize Web pages using terms a user may not be familiar with.
資料探勘 Data mining
Objectives
• This work proposes an approach that automatically arranges multilingual Web pages into a multilingual Web directory, breaking the language barriers in Web navigation.
Methodology
Methodology – Web directory generation
• Web page preprocessing and encoding
  – English
    • Word segmentation
    • Stop-word elimination
    • Stemming
    • Keyword selection
  – Chinese
    • Select only nouns as keywords
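
The English preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the crude suffix-stripping `naive_stem` (a stand-in for a real stemmer such as Porter's), and the binary keyword encoding are all assumptions made for the example. The Chinese branch is omitted because noun selection needs a part-of-speech tagger.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Word segmentation, stop-word elimination, then stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(docs_keywords, min_doc_freq=1):
    """Keyword selection: keep terms that occur in at least min_doc_freq pages."""
    doc_freq = {}
    for keywords in docs_keywords:
        for term in set(keywords):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return sorted(term for term, n in doc_freq.items() if n >= min_doc_freq)


def encode(keywords, vocabulary):
    """Encode one page as a binary vector over the vocabulary (assumed encoding)."""
    present = set(keywords)
    return [1 if term in present else 0 for term in vocabulary]


pages = ["Mining the web pages", "Self-organizing maps for web mining"]
keywords = [extract_keywords(p) for p in pages]
vocab = build_vocabulary(keywords)
vectors = [encode(k, vocab) for k in keywords]
```

Each page ends up as a fixed-length vector, which is the form a self-organizing map expects as input.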
Methodology – Web directory generation
• Feature map generation
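
The slide gives no training details, so the following is a generic self-organizing map sketch rather than the authors' configuration. The grid size, learning-rate schedule, Gaussian neighborhood, and the function names `train_som`, `best_matching_unit`, and `label_documents` are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM with a linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                grid_dist = math.dist(pos, bmu)
                if grid_dist <= radius:
                    # Units near the winner on the grid move more strongly.
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (x[i] - w[i])
    return weights


def label_documents(weights, vectors):
    """Group document indices by the map unit that wins for each document."""
    clusters = {}
    for idx, x in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return clusters


docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
som = train_som(docs, rows=2, cols=2)
clusters = label_documents(som, docs)
```

After training, documents that share a winning unit form a cluster, and these clusters are the raw material for the directory generation step.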
Methodology – Web directory generation
• Web directory generation
  – Super cluster construction
  – Determining dominating clusters
  – Constructing hierarchy
  – Parameter setting and discussions
Methodology – Web directory generation
• Evaluation of the quality of generated hierarchies
Methodology – Multilingual Web directory generation
• Alignment of monolingual Web directories
  – Calculating semantic similarity
  – Incorporating structural similarity
  – Overall similarity
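
The slides name the three ingredients of the alignment but give no formulas, so the sketch below only illustrates one plausible way to combine them. Everything in it is an assumption: the bilingual `dictionary`, the Jaccard-style semantic score, the depth-based structural score, the weight `alpha`, and the greedy best-match `align` function are hypothetical and are not taken from the paper.

```python
def semantic_similarity(keywords_a, keywords_b, dictionary):
    """Jaccard overlap after translating keywords_a with a bilingual dictionary."""
    translated = {dictionary[k] for k in keywords_a if k in dictionary}
    other = set(keywords_b)
    union = translated | other
    return len(translated & other) / len(union) if union else 0.0


def structural_similarity(depth_a, depth_b):
    """Assumed structural score: nodes at similar depths score higher."""
    return 1.0 / (1.0 + abs(depth_a - depth_b))


def overall_similarity(sem, struct, alpha=0.7):
    """Assumed linear combination of the two scores."""
    return alpha * sem + (1.0 - alpha) * struct


def align(nodes_a, nodes_b, dictionary, alpha=0.7):
    """Map each node of hierarchy A to its best-scoring node in hierarchy B."""
    alignment = {}
    for name_a, (kw_a, depth_a) in nodes_a.items():
        alignment[name_a] = max(
            nodes_b,
            key=lambda name_b: overall_similarity(
                semantic_similarity(kw_a, nodes_b[name_b][0], dictionary),
                structural_similarity(depth_a, nodes_b[name_b][1]),
                alpha,
            ),
        )
    return alignment


# Hypothetical toy hierarchies: node label -> (keywords, depth in the hierarchy).
zh_nodes = {"資料探勘": (["資料", "探勘"], 1), "運動": (["運動", "籃球"], 1)}
en_nodes = {"Data mining": (["data", "mining"], 1),
            "Sports": (["sports", "basketball"], 1)}
dictionary = {"資料": "data", "探勘": "mining", "運動": "sports", "籃球": "basketball"}

matches = align(zh_nodes, en_nodes, dictionary)
```

In this toy case the Chinese node 資料探勘 pairs with the English node "Data mining", which is the kind of cross-language match a multilingual directory needs.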
Methodology – Multilingual Web directory generation
• Multilingual Web directory generation
Experiments - SOM training
Experiments - Hierarchy generation
Experiments - Hierarchy alignment and Web directory generation
Conclusions
• The multilingual hierarchy alignment method is fully automated and requires no human intervention.
• It will be convenient for users to have a Web directory providing multilingual category labels and categorizing multilingual Web pages.
Comments
• Advantages
  – The development of a multilingual hierarchy alignment method
  – Fully automated
• Applications
  – SOM