
What is Data Mining?

• Process of finding correlations or patterns among dozens of fields in large relational databases

• Finding hidden information in a database

Ajay Tripathi


Buzz Words: Query

• Conventional approach: A well-defined query is stated in a language such as SQL.

• Data mining approach: The query is not well defined. The data miner might not be exactly sure of what he wants.


Buzz Words: Data

• Conventional approach: Raw facts.

• Data mining approach: Data have been cleansed and modified to better support the mining process.


Buzz Words: Output

• Conventional approach: Output of the query consists of data from the database that satisfies the query. The output is usually a subset of the database.

• Data mining approach: Output is probably not a subset of the database. Instead, it is the output of some analysis of the contents of the database.
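The difference between the two kinds of output can be sketched in a few lines of Python. This is only an illustration: the `sales` records, field names, and values are hypothetical, and a per-region average stands in for whatever analysis a real mining task would run.

```python
from statistics import mean

# Hypothetical sales records; names and values are made up for illustration.
sales = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 200},
    {"region": "south", "amount": 100},
]

# Conventional query: the result is a subset of the stored rows.
north_rows = [row for row in sales if row["region"] == "north"]

# Analysis-style output: values derived from the rows, not rows themselves.
avg_by_region = {
    region: mean(r["amount"] for r in sales if r["region"] == region)
    for region in {r["region"] for r in sales}
}
```

Every element of `north_rows` is a stored record, while `avg_by_region` holds numbers that appear nowhere in the data.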


With the Help of Data Mining, Companies...

• Build relationships among internal as well as external factors in the organization.

• Determine the impact on sales, customer satisfaction, and profit.

• View detailed transactional data in summary form.


Predictive

• Makes a prediction about values of data using known results found from different data

Descriptive

• Identifies patterns or relationships in data

• Explores the properties of the data examined, rather than predicting new properties


Predictive

• Classification

• Regression

• Time Series

• Prediction

Descriptive

• Clustering

• Summarization

• Association Rules

• Sequence Discovery


Techniques

– Statistical Techniques

• Point Estimation

• Bayes' Theorem

• Hypothesis Testing

• Regression and Correlation

– Similarity Measures

– Decision Tree

– Neural Network

– Genetic Algorithm


Decision Tree Technique

• Predictive modeling technique

• Used in classification, clustering, and prediction tasks

• Uses a “divide and conquer” technique


A decision tree is a tree where

• The root and each internal node are labeled with a question or problem.

• The branches of the tree represent each possible answer to the associated question.

• Each leaf node represents a prediction of a solution to the problem under consideration.
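The structure described above can be sketched as a small Python data type. This is a minimal illustration, not a tree-building algorithm: the `Node` class, the `predict` function, and the weather questions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    # Internal nodes carry a question; leaf nodes carry a prediction.
    question: Optional[Callable[[dict], str]] = None
    branches: Dict[str, "Node"] = field(default_factory=dict)
    prediction: Optional[str] = None


def predict(node: Node, record: dict) -> str:
    # Follow the branch that matches each answer until a leaf is reached.
    while node.prediction is None:
        answer = node.question(record)
        node = node.branches[answer]
    return node.prediction


# Hypothetical tree: "Is it raining?" at the root, then "Is it windy?".
tree = Node(
    question=lambda r: "yes" if r["raining"] else "no",
    branches={
        "yes": Node(prediction="stay in"),
        "no": Node(
            question=lambda r: "yes" if r["windy"] else "no",
            branches={
                "yes": Node(prediction="fly a kite"),
                "no": Node(prediction="go for a walk"),
            },
        ),
    },
)
```

Calling `predict(tree, {"raining": False, "windy": True})` walks from the root question down the matching branches to a leaf.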


Decision Tree Model Consists Of...

• The tree itself

• An algorithm to create the tree

• An algorithm that applies the tree to data and solves the problem


Real World Example on DT

• Assume you own a sporting goods store.

• You are sure that if the local team wins, you will be able to sell a significant number of T-shirts proclaiming the local team as the national champion.

• You expect to sell between 2,000 and 10,000 shirts at $20 each.

• You can order the shirts for $7 each. Any shirts you do not sell can be sold as scrap for $2 each.

• In addition, you estimate that there is a 60% chance of the local team winning.

• You must decide today whether to order any shirts and, if so, how many.


In this problem you face two uncertain events:

• You don't know who will win the championship game.

• You don't know how many shirts you can sell even if the local team does win.


First of all, you face a decision: you must decide how many shirts to order. Assume here that you can only order in quantities of 5,000, so you must order 5,000, 10,000, or none at all.

Once you have decided how many shirts to order, you come to the next node: the local team either wins or loses the championship.

If the team loses, you must sell all of the shirts as scrap, losing either $25,000 if you order 5,000 shirts ($2*5,000-$7*5,000) or $50,000 if you order 10,000 shirts ($2*10,000-$7*10,000).

If the team wins, you face another uncertainty: the demand for shirts.


You cannot sell more shirts than you have.

Therefore, if you order only 5,000 shirts, you will make the same profit if demand is 5,000, 7,000 or 9,000.

All shirts you have in excess of demand can be sold as scrap.
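These payoff rules can be written as one small function. The sketch below uses the $20 price, $7 cost, and $2 scrap value from the example; the function name `profit` and the convention that a loss means zero demand are assumptions made for illustration.

```python
PRICE, COST, SCRAP = 20, 7, 2  # dollars per shirt, taken from the example


def profit(ordered: int, demand: int) -> int:
    """Profit in dollars for one outcome; demand is 0 if the team loses."""
    sold = min(ordered, demand)   # you cannot sell more shirts than you have
    scrapped = ordered - sold     # excess shirts are sold as scrap
    return PRICE * sold + SCRAP * scrapped - COST * ordered
```

For instance, `profit(5000, 0)` reproduces the $25,000 loss when the team loses, and `profit(5000, 7000)` equals `profit(5000, 9000)` because sales are capped at the 5,000 shirts in stock.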

Here is the solution.


The solution indicated by this tree is that you should order 5,000 shirts because this option yields the highest expected value.
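The expected-value comparison behind this conclusion can be sketched as follows. The text does not state the demand probabilities, so the sketch assumes that demand is 5,000, 7,000, or 9,000 shirts with equal probability if the team wins; that assumption, the `DEMAND` table, and the function names are illustrative and not taken from the original tree.

```python
from fractions import Fraction

PRICE, COST, SCRAP = 20, 7, 2   # dollars per shirt, from the example
P_WIN = Fraction(3, 5)          # 60% chance the local team wins

# ASSUMPTION: three equally likely demand levels if the team wins.
DEMAND = {5000: Fraction(1, 3), 7000: Fraction(1, 3), 9000: Fraction(1, 3)}


def payoff(ordered, demand):
    sold = min(ordered, demand)
    return PRICE * sold + SCRAP * (ordered - sold) - COST * ordered


def expected_value(ordered):
    # Chance node on demand, reached only if the team wins.
    ev_win = sum(p * payoff(ordered, d) for d, p in DEMAND.items())
    # Chance node on the game; a loss means every shirt goes to scrap.
    return P_WIN * ev_win + (1 - P_WIN) * payoff(ordered, 0)


# Decision node: pick the order quantity with the highest expected value.
options = {q: expected_value(q) for q in (0, 5000, 10000)}
best = max(options, key=options.get)
```

Under these assumed probabilities the expected values are $29,000 for 5,000 shirts and $25,600 for 10,000 shirts, which is consistent with ordering 5,000; different demand probabilities could change the ranking.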
