Panel Discussion on Foundations of Data Mining

at RSCTC2004

J. T. Yao

University of Regina

Email: jtyao@cs.uregina.ca

Web: http://www2.cs.uregina.ca/~jtyao

What is the Foundations of Data Mining?

DM research mainly focuses on algorithms and methodologies.

There is a lack of study on mathematical modeling of, or foundations of, data mining.

The study of foundations of data mining is in its infancy, and there are probably more questions than answers. (Mannila 2000)

What is the Foundations of Data Mining?

Chen's approach (2002): data mining can be studied from three different but related dimensions.

The philosophical dimension deals with the nature and scope of data mining.

The technical dimension covers data mining methods and techniques.

The social dimension concerns the social impact and consequences of data mining.

What is the Foundations of Data Mining?

Xie and Raghavan's approach (2002): a logical foundation of data mining based on Bacchus' probability logic.

Precise definitions of intuitive notions, such as "pattern", "previously unknown knowledge" and "potentially useful knowledge".

A logic induction operator is defined for discovering "previously unknown and potentially useful knowledge".

What is the Foundations of Data Mining?

Lin's (2002), Tsumoto's (2002), and Yao's (2001) approaches: granular computing as a basis for data mining.

A concept consists of two parts, the intension and the extension of the concept.

The intension of a concept consists of properties of objects.

The extension of a concept is the set of instances.

A rule can be expressed in the form φ ⇒ ψ, where φ and ψ are intensions of two concepts.

Rules are interpreted using extensions of the two concepts.
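As a small illustration of interpreting a rule through extensions, the following sketch writes m(φ) for the extension of the concept whose intension is φ. The notation m(·), the universe U, and the satisfaction symbol are assumptions introduced here, not taken from the slides, and set inclusion is only one possible reading of a rule that holds without exception.

```latex
\[
m(\phi) = \{\, x \in U \mid x \models \phi \,\}, \qquad
\phi \Rightarrow \psi \ \text{holds with certainty} \iff m(\phi) \subseteq m(\psi).
\]
```

For instance, if m(φ) = {a, b} and m(ψ) = {a, b, c}, every instance of the first concept is also an instance of the second, so φ ⇒ ψ holds under this reading while ψ ⇒ φ does not.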

A Multi-level Framework for Modeling Data Mining

The kernel level focuses on the study of knowledge without reference to data mining algorithms.

The technique levels focus on data mining algorithms without reference to particular applications.

The application levels focus on the utility of discovered knowledge with respect to particular application domains.

How do Rough Sets Contribute to FDM?

Knowledge is an entity in the semantic levels of data. Knowledge embedded in data is related to semantic interpretations of data.

The existence of knowledge in data is unrelated to whether we have an algorithm to extract it.

We need to separate the study of knowledge from the study of data mining algorithms, and in turn separate both from the study of the utility of discovered knowledge.

How do Rough Sets Contribute to FDM?

Concepts are used as a primitive notion of data mining:

Every concept is understood as a unit of thought that consists of two parts, the intension and the extension of the concept.

Tarski's approach is used to study concepts through the notions of a model and satisfiability.

An information table is used as a model.

The intension of a concept is expressed by a formula of a decision language in the information table.

The extension of a concept is expressed by a subset of objects.
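The model-and-satisfiability view above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formal decision language itself: the table contents, the attribute names, and the helpers `atom`, `conj`, and `extension` are all hypothetical and chosen only for the example.

```python
# Hypothetical information table: each object maps attribute names to values.
TABLE = {
    "o1": {"height": "tall", "hair": "dark"},
    "o2": {"height": "tall", "hair": "blond"},
    "o3": {"height": "short", "hair": "dark"},
    "o4": {"height": "tall", "hair": "dark"},
}


def atom(attribute, value):
    """Atomic formula 'attribute = value', as a predicate on one table row."""
    return lambda row: row.get(attribute) == value


def conj(*formulas):
    """Conjunction of formulas: satisfied when every conjunct is satisfied."""
    return lambda row: all(f(row) for f in formulas)


def extension(formula, table=TABLE):
    """Extension of a concept: the set of objects that satisfy its formula."""
    return {obj for obj, row in table.items() if formula(row)}


# Intensions are formulas; extensions are the subsets of objects they pick out.
tall = atom("height", "tall")
dark = atom("hair", "dark")
tall_and_dark = conj(tall, dark)
```

Here `extension(tall_and_dark)` equals the intersection of `extension(tall)` and `extension(dark)`, which reflects how a conjunction of intensions corresponds to an intersection of extensions.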

How do Rough Sets Contribute to FDM?

Rules are used to express relationships between concepts.

Rules can be interpreted and classified in terms of extensions of concepts and are based on probability theory.

Many classes of rules can be defined: association rules, exception rules, peculiarity rules, similarity rules, negative association rules, and conditional association rules.

Both concepts and rules are used as examples to illustrate the focus of discussion at the kernel level.
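A probabilistic reading of rules in terms of extensions might be sketched as follows. The function names, the choice of support and confidence as measures, and the convention of comparing P(ψ | φ) with P(ψ) to separate positive from negative association are assumptions made for illustration; the slides do not fix particular measures.

```python
def rule_measures(ext_phi, ext_psi, universe):
    """Support and confidence of phi => psi, computed from extensions.

    Probabilities are estimated as relative frequencies over a finite,
    non-empty universe of objects.
    """
    n = len(universe)
    both = ext_phi & ext_psi
    support = len(both) / n
    # Confidence is undefined when no object satisfies phi.
    confidence = len(both) / len(ext_phi) if ext_phi else None
    return {"support": support, "confidence": confidence}


def association_kind(ext_phi, ext_psi, universe):
    """Classify phi => psi by comparing P(psi | phi) with P(psi).

    Integer cross-multiplication avoids floating-point equality tests:
    P(psi | phi) > P(psi)  iff  |phi & psi| * |U| > |phi| * |psi|.
    """
    if not ext_phi:
        return "undefined"
    lhs = len(ext_phi & ext_psi) * len(universe)
    rhs = len(ext_phi) * len(ext_psi)
    if lhs > rhs:
        return "positive association"
    if lhs < rhs:
        return "negative association"
    return "independent"


# Hypothetical universe of ten objects and two extensions.
universe = set(range(1, 11))
ext_phi = {1, 2, 3, 4}
ext_psi = {1, 2, 3, 9}
measures = rule_measures(ext_phi, ext_psi, universe)
```

With these hypothetical sets, three of the four objects satisfying φ also satisfy ψ, while only four of the ten objects in the universe satisfy ψ, so the rule is classified as a positive association.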

References

Chen, Z. The three dimensions of data mining foundation, FDM’02, 119-124, 2002.

Lin, T.Y. Issues in modeling for data mining, COMPSAC’02, 1152-1157, 2002.

Mannila, H. Theoretical frameworks for data mining, SIGKDD Explorations, (2), 30-32, 2000.

Tsumoto, S., Lin, T.Y., Peters, J.F. Foundations of data mining via granular and rough computing, COMPSAC’02, 1123-1124, 2002.

Yao, Y.Y. Modeling data mining with granular computing, COMPSAC’01, 638-643, 2001.

Yao, Y.Y. A step towards the foundations of data mining, SPIE Vol. 5098, 254-263, 2003.