Intelligent Database Systems Lab

Presenter : JHOU, YU-LIANG

Authors: Shady Shehata, Fakhri Karray, Mohamed S. Kamel, Fellow

2012, IEEE

An Efficient Concept-Based Mining Model for Enhancing Text Clustering

Outline

• Motivation
• Objectives
• Methodology
• Evaluation
• Conclusions
• Comments

Motivation

• In text mining, term frequency is computed to gauge the importance of a term in a document.

• However, two terms can have the same frequency in a document while one contributes more to the meaning of its sentences than the other.

Objectives

Improve clustering quality by applying a concept-based mining model to text clustering.

Methodology Concept-Based Mining Model

Ex: a concept c appears in document d in two sentences, the first and the second. It appears five times in the verb argument structures of the first sentence s1, and three times in the verb argument structures of the second sentence s2.

Answer: the ctf value of c in d is the average over those sentences, (5 + 3) / 2 = 4.
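The averaging in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide; the function name `document_ctf` and the list-of-counts input are assumptions made for the sketch, not names from the paper.

```python
def document_ctf(per_sentence_counts):
    """Average a concept's per-sentence ctf values over the sentences
    of the document in which the concept appears.

    `per_sentence_counts` is a hypothetical input: one count per sentence,
    each being the number of times the concept occurs in that sentence's
    verb argument structures.
    """
    if not per_sentence_counts:
        raise ValueError("the concept must appear in at least one sentence")
    return sum(per_sentence_counts) / len(per_sentence_counts)


# Slide example: 5 occurrences in s1 and 3 in s2.
print(document_ctf([5, 3]))  # 4.0
```

Only sentences that contain the concept are passed in, which matches the example: the divisor is 2 because c appears in two sentences.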

Methodology Corpus-Based Concept Analysis Algorithm

Methodology Example of Conceptual Term Frequency

[ARG0 Texas and Australia researchers] have [TARGET created] [ARG1 industry-ready sheets of materials made from nanotubes that could lead to the development of artificial muscles].

[ARG1 materials] [TARGET made] [ARG2 from nanotubes that could lead to the development of artificial muscles].

[ARG1 nanotubes] [R-ARG1 that] [ARGM-MOD could] [TARGET lead] [ARG2 to the development of artificial muscles].

1. First verb argument structure, for the verb "created":
   • [ARG0 Texas and Australia researchers]
   • [TARGET created]
   • [ARG1 industry-ready sheets of materials made from nanotubes that could lead to the development of artificial muscles]

2. Second verb argument structure, for the verb "made":
   • [ARG1 materials]
   • [TARGET made]
   • [ARG2 from nanotubes that could lead to the development of artificial muscles]

3. Third verb argument structure, for the verb "lead":
   • [ARG1 nanotubes]
   • [R-ARG1 that]
   • [ARGM-MOD could]
   • [TARGET lead]
   • [ARG2 to the development of artificial muscles]

1. Concepts in the first verb argument structure, for the verb "created":
   • Texas Australia researchers
   • created
   • industry-ready sheets materials nanotubes lead development artificial muscles

2. Concepts in the second verb argument structure, for the verb "made":
   • materials
   • nanotubes lead development artificial muscles

3. Concepts in the third verb argument structure, for the verb "lead":
   • nanotubes
   • lead
   • development artificial muscles
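As a rough sketch, the sentence-level conceptual term frequency of a concept can be counted from the three concept lists above. The matching rule used here is an assumption made for illustration: a concept is counted once for each verb argument structure in which its words appear contiguously inside one of that structure's concepts. The helper names `contains_phrase` and `sentence_ctf` are hypothetical and do not come from the paper.

```python
def contains_phrase(argument, concept):
    """Return True if the words of `concept` appear contiguously in `argument`."""
    arg_words, con_words = argument.split(), concept.split()
    n = len(con_words)
    return any(arg_words[i:i + n] == con_words
               for i in range(len(arg_words) - n + 1))


def sentence_ctf(structures, concept):
    """Count the verb argument structures of one sentence that contain `concept`.

    Assumed matching rule (not confirmed by the slides): a structure counts
    if any single one of its concepts contains the phrase.
    """
    return sum(any(contains_phrase(arg, concept) for arg in structure)
               for structure in structures)


# Concepts of the example sentence, one list per verb argument structure.
structures = [
    ["Texas Australia researchers", "created",
     "industry-ready sheets materials nanotubes lead development artificial muscles"],
    ["materials", "nanotubes lead development artificial muscles"],
    ["nanotubes", "lead", "development artificial muscles"],
]

for concept in ["created", "materials", "nanotubes",
                "nanotubes lead development artificial muscles"]:
    print(concept, "->", sentence_ctf(structures, concept))
```

Under this assumed rule, "nanotubes" is counted in all three structures, "materials" in two, and "created" in one, so concepts that recur across the verb argument structures of a sentence receive a higher ctf.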

Methodology Concept-Based Similarity Measure

Experimental Result

Conclusions

The new approach enhances text clustering quality.

Comments

Advantages
- Improves the text clustering quality.

Applications
- Concept-based mining model
- Conceptual term frequency