7/31/2019 Applying Data Mining
1/13
Applying Data Mining
By Susan L. Miertschin
Is Data Mining Appropriate for the Problem at Hand?
Can you clearly define the problem?
Does potentially meaningful data exist?
Does the data contain hidden knowledge, or is the data factual and useful for reporting purposes only?
Will the cost of processing the data be less than the likely benefit gained from the data mining project?
Determine if the Problem is Suitable for
Shallow Knowledge
Multidimensional Knowledge
Hidden Knowledge
Deep Knowledge
Four Types of Knowledge
Shallow knowledge: factual; manipulated in a database
Multidimensional knowledge: factual; On-line Analytical Processing (OLAP) tools are used to manipulate multidimensional knowledge
Hidden knowledge: not easily found using database query; data mining algorithms can find patterns
Deep knowledge: can only be found if given some direction about what we are looking for
Data Mining vs. Data Query: An Example
Data query: you already almost know what you are looking for
Data mining: find regularities in data that are not obvious without the aid of tools
Why?
Amount of data
Organization of data obscures patterns
Limits of human capabilities to consider many things at once
Expert system: a computer program that emulates the problem-solving skills of one or more human experts
Knowledge engineer: a person trained to interact with an expert in order to capture the expert's implicit knowledge in explicit form
Figure 1.2 Data mining vs. expert systems
Data mining: Data -> Data Mining Tool -> If Swollen Glands = Yes Then Diagnosis = Strep Throat
Expert system: Human Expert -> Knowledge Engineer -> Expert System Building Tool -> If Swollen Glands = Yes Then Diagnosis = Strep Throat
What is simple search?
Nearest neighbor classifier
K-nearest neighbor classifier
Create a table of instances with known classifications
This is the training data
Get a new instance
Compare it with the instances in the training data, using the Euclidean distance metric for comparison purposes
$d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}$
Find the instance in the training set that is closest, on the basis of the distance metric, to the new instance
Classify the new instance the same way as the one closest to it in the training data
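The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the function names, the two numeric attributes, and the training table are all hypothetical and are not taken from the slides.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_classify(training_data, new_instance):
    """Return the class of the training instance closest to new_instance.

    training_data is a list of (attribute_values, classification) pairs.
    """
    _, closest_class = min(
        training_data,
        key=lambda row: euclidean_distance(row[0], new_instance),
    )
    return closest_class


# Hypothetical training table (assumed for illustration only):
# (temperature in degrees F, days with sore throat) -> diagnosis
training_data = [
    ((98.6, 1), "Cold"),
    ((101.5, 4), "Strep Throat"),
    ((99.0, 2), "Cold"),
    ((102.0, 5), "Strep Throat"),
]

# Classify a new instance the same way as its closest training instance.
print(nearest_neighbor_classify(training_data, (101.0, 4)))
```

Every training instance is compared with the new instance on each call, so the work grows with the size of the training table.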
Problems with Nearest Neighbor Classification
Computation times will be large when the training set is large
No differentiation of relevant from irrelevant attributes
No way to tell which attributes differentiate among classes
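The attribute problem can be illustrated with a small sketch. The data below is hypothetical, and it assumes, purely for the example, that swollen glands is the relevant attribute and body weight is irrelevant. Because the distance metric treats all attributes alike, the irrelevant attribute with the larger numeric range decides the outcome.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_class(training_data, new_instance, attributes):
    """Nearest neighbor classification using only the listed attribute positions."""
    def project(values):
        return [values[i] for i in attributes]

    _, label = min(
        training_data,
        key=lambda row: euclidean_distance(project(row[0]), project(new_instance)),
    )
    return label


# Hypothetical data (assumed for illustration only):
# (swollen glands as 1/0, body weight in lb) -> diagnosis
training_data = [
    ((1, 150), "Strep Throat"),
    ((1, 210), "Strep Throat"),
    ((0, 181), "Cold"),
    ((0, 120), "Cold"),
]
new_patient = (1, 180)

# Using both attributes, weight dominates the distance.
with_weight = nearest_class(training_data, new_patient, attributes=[0, 1])
# Using only the attribute assumed to be relevant.
glands_only = nearest_class(training_data, new_patient, attributes=[0])
print(with_weight, glands_only)
```

The classifier itself gives no indication that the two runs disagree because of an irrelevant attribute; that has to be discovered by other means.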
Different algorithms are available for different data mining tasks
Different tools exist that implement different algorithms and
different versions of algorithms
e.g., Algorithms Available in Microsoft's Analysis Services
Decision Trees
Linear Regression
Naive Bayes
Clustering Algorithms
Association Rules
Sequence Clustering
Time Series Analysis
Neural Networks
Logistic Regression