CSCI 347 / CS 4206: Data Mining
Module 01: Introduction
Topic 03: Stages in Data Mining
Slide 3
Module 01: Introduction - Objectives
  Understand the definition of basic data mining terms
  Understand, at a general level, structural descriptions in data mining
  Understand, at a general level, the main steps/stages in data mining
  Be aware of the biases of different basic approaches to data mining
  Be aware of fielded applications in data mining
  Understand and identify technical and ethical issues in data mining
Slide 4
Stages in Data Mining
The overall approach your textbook uses to describe data mining is to
look at it according to what goes into the system (input), what happens
to it (the algorithms or processing), and what comes out (output).
  Input
    Data Acquisition
    Cleansing / Transformation
  Processing (Algorithms)
  Output
    Representation
    Evaluation
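
The stages above can be sketched as a tiny end-to-end pipeline. This is only an illustration under stated assumptions: the records, the attribute names (temp, label), and the midpoint-threshold "algorithm" are all invented for the example and do not come from the textbook.

```python
from statistics import mean


def acquire():
    """Input: data acquisition (hypothetical hard-coded records)."""
    return [
        {"temp": "21.5", "label": "mild"},
        {"temp": "30.1", "label": "hot"},
        {"temp": "", "label": "mild"},  # missing value
        {"temp": "29.4", "label": "hot"},
        {"temp": "19.8", "label": "mild"},
    ]


def cleanse(records):
    """Input: cleansing / transformation. Drop incomplete rows, parse numbers."""
    return [
        {"temp": float(r["temp"]), "label": r["label"]}
        for r in records
        if r["temp"] != ""
    ]


def process(records):
    """Processing: learn a threshold halfway between the two class means."""
    hot = mean(r["temp"] for r in records if r["label"] == "hot")
    mild = mean(r["temp"] for r in records if r["label"] == "mild")
    return (hot + mild) / 2


def represent(threshold):
    """Output: representation as a human-readable rule."""
    return f"IF temp > {threshold:.2f} THEN hot ELSE mild"


def evaluate(records, threshold):
    """Output: evaluation as the fraction of records the rule gets right."""
    correct = sum(
        ("hot" if r["temp"] > threshold else "mild") == r["label"]
        for r in records
    )
    return correct / len(records)


raw = acquire()
clean = cleanse(raw)
threshold = process(clean)
rule = represent(threshold)
accuracy = evaluate(clean, threshold)
print(rule)
print(f"accuracy on the training records: {accuracy:.2f}")
```

Note that the accuracy here is measured on the same records used to learn the threshold, so it is an optimistic figure; a real evaluation would use held-out data.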
Slide 5
Input
As the text authors state, "We are overwhelmed with data." We collect
an incredible amount of data, and there are potentially useful patterns
in that data, but the sheer volume makes it impossible to uncover these
patterns manually.
Input data varies not only by source or industry, but also by type:
  Is the data numeric or symbolic?
  Is it relatively error-free, or is there much error in it?
  Is it consistent?
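
The questions above (numeric or symbolic, how much error, how consistent) can be asked of a dataset mechanically. The following is a minimal sketch, assuming records arrive as dictionaries of strings; the function name profile_column and the sample rows are hypothetical.

```python
def profile_column(values):
    """Report whether a column looks numeric or symbolic, and how clean it is."""
    missing = sum(1 for v in values if v is None or v == "")
    present = [v for v in values if v is not None and v != ""]

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    numeric = bool(present) and all(is_number(v) for v in present)
    return {
        "kind": "numeric" if numeric else "symbolic",
        "missing": missing,
        "distinct": len(set(present)),
    }


# Hypothetical sample: one missing age, one inconsistently capitalized value.
rows = [
    {"age": "34", "outlook": "sunny"},
    {"age": "", "outlook": "rainy"},
    {"age": "51", "outlook": "Sunny"},
]
profile = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
for col, summary in profile.items():
    print(col, summary)
```

The distinct count for outlook is 3 rather than 2 because "sunny" and "Sunny" are treated as different symbols, which is exactly the kind of inconsistency the cleansing stage has to resolve.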
Slide 6
Processing
Some authors divide the data mining task into two categories:
predictive and descriptive (Tan, Steinbach, and Kumar, 2004).
  Predictive systems use some variables to predict unknown or future
  values of other variables.
  Descriptive systems find human-interpretable patterns in the data.
Some predictive systems are:
  Classification
  Regression
  Deviation Detection
Some descriptive systems are:
  Clustering
  Association Rule Discovery
  Sequential Pattern Discovery
What are some examples of these?
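
As one possible answer to the question above, here is a toy example of each category: a 1-nearest-neighbor classifier (predictive) and the support and confidence of a single association rule (descriptive). The data, labels, and item names are invented for illustration.

```python
# Predictive: 1-nearest-neighbor classification on one numeric attribute.
labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]


def classify(x):
    """Predict the label of x from its closest labeled example."""
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest_label


# Descriptive: support and confidence of the rule bread -> butter.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


rule_support = support({"bread", "butter"})
confidence = rule_support / support({"bread"})

print(classify(2.0), classify(7.0))
print(f"bread -> butter: support={rule_support:.2f}, confidence={confidence:.2f}")
```

The classifier produces a value for an unseen input, while the association rule only summarizes a pattern already present in the transactions.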
Slide 7
Output
The format of the output of the system is also important.
  Sometimes the correct answer is all that matters.
  Sometimes it is important that the patterns discovered make sense to
  human users.
For example:
  If I'm classifying sea ice, I may not be terribly concerned about the
  patterns the classifier came up with in making its decisions, as long
  as I have faith that the decisions are correct.
  If I'm a physician, I'm far more worried about having a traceable,
  human-readable decision logic if I'm going to decide whether or not
  to intervene in a pregnancy.
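
A traceable decision logic can be as simple as an ordered rule list that reports which rule fired. The sketch below is a hypothetical illustration; the attributes (outlook, humidity), the thresholds, and the decisions are invented and stand in for whatever a real system would learn.

```python
# Hypothetical rule list: (readable text, test function, decision).
RULES = [
    ("outlook is sunny and humidity > 80",
     lambda r: r["outlook"] == "sunny" and r["humidity"] > 80, "no"),
    ("outlook is rainy",
     lambda r: r["outlook"] == "rainy", "no"),
]
DEFAULT = "yes"


def decide(record):
    """Return (decision, explanation) so a person can trace the logic."""
    for text, test, decision in RULES:
        if test(record):
            return decision, f"rule fired: {text}"
    return DEFAULT, "no rule fired; default applied"


print(decide({"outlook": "sunny", "humidity": 90}))
print(decide({"outlook": "overcast", "humidity": 70}))
```

A black-box classifier would return only the decision; here every answer carries the rule that produced it, which is the property the physician example calls for.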
Slide 8
THE Mystery Sound
And what is the mystery sound for this section???