What is Data Mining?
• The process of finding correlations or patterns among dozens of fields in large relational databases
• Finding hidden information in a database
Ajay Tripathi
Buzz Words: Query
• Conventional approach: A well-defined query is stated in a language such as SQL.
• Data mining approach: The query is not well defined. The data miner might not be exactly sure of what he wants.
Buzz Words: Data
• Conventional approach: Raw facts.
• Data mining approach: Data have been cleansed and modified to better support the mining process.
Buzz Words: Output
• Conventional approach: The output of the query consists of data from the database that satisfies the query. The output is usually a subset of the database.
• Data mining approach: The output is probably not a subset of the database. Instead, it is the output of some analysis of the contents of the database.
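The difference between the two kinds of output can be sketched in a few lines of Python. This is only an illustration: the `sales` records, their field names, and both helper functions are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical sales records; the field names are assumptions for illustration.
sales = [
    {"customer": "A", "item": "shirt", "amount": 20},
    {"customer": "B", "item": "cap", "amount": 12},
    {"customer": "A", "item": "cap", "amount": 12},
    {"customer": "C", "item": "shirt", "amount": 20},
]

def conventional_query(records, item):
    """Well-defined query: return the rows that satisfy the condition."""
    return [r for r in records if r["item"] == item]

def mining_style_summary(records):
    """Analysis: count how many purchases each item accounts for."""
    return Counter(r["item"] for r in records)

subset = conventional_query(sales, "shirt")   # rows taken from the database
summary = mining_style_summary(sales)         # derived counts, not rows
```

Here `subset` contains only rows that already exist in `sales`, while `summary` holds counts that appear nowhere in the original records.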
With the Help of Data Mining, Companies…
• Build relationships among internal as well as external factors in the organization.
• Determine the impact on sales, customer satisfaction, and profit.
• View detailed transactional data in summary form.
Predictive
• Makes a prediction about values of data using known results found from different data
Descriptive
• Identifies patterns or relationships in data
• Explores the properties of the data examined, rather than predicting new properties
Predictive
• Classification
• Regression
• Time Series
• Prediction
Descriptive
• Clustering
• Summarization
• Association Rules
• Sequence Discovery
Techniques…
– Statistical Techniques
  • Point Estimation
  • Bayes' Theorem
  • Hypothesis Testing
  • Regression and Correlation
– Similarity Measures
– Decision Tree
– Neural Network
– Genetic Algorithm
Decision Tree Technique
• Predictive modeling technique
• Used in classification, clustering, and prediction tasks
• Uses a “divide and conquer” technique
A decision tree is a tree where
• The root and each internal node is labeled with a question or problem.
• The branches of the tree represent each possible answer to the associated question.
• Each leaf node represents a prediction of a solution to the problem under consideration.
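The definition above maps naturally onto a small data structure. The following is a minimal sketch, not a definitive implementation: the class names `Node` and `Leaf`, the `predict` function, and the sample questions and answers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    prediction: str  # the predicted solution stored at this leaf

@dataclass
class Node:
    question: str  # name of the attribute this node asks about
    # One branch per possible answer to the question.
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def predict(tree, record):
    """Follow the branch matching each answer until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.question]]
    return tree.prediction

# Hypothetical hand-built tree; questions and answers are illustrative only.
tree = Node("team_wins", {
    "no": Leaf("sell as scrap"),
    "yes": Node("demand", {
        "low": Leaf("sell some, scrap the rest"),
        "high": Leaf("sell out"),
    }),
})

result = predict(tree, {"team_wins": "yes", "demand": "high"})
```

Each `Node` carries a question, each key in `branches` is one possible answer, and each `Leaf` carries the prediction returned once the walk ends.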
A Decision Tree Model Consists Of…
• The tree itself
• An algorithm to create the tree
• An algorithm that applies the tree to data and solves the problem
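The three parts can be sketched together as follows. This is a deliberately simplified illustration under stated assumptions: `build_tree` splits on attributes in the order given, whereas practical algorithms such as ID3 choose each split using a measure like information gain. The function names, the nested-dictionary representation, and the training rows are hypothetical.

```python
from collections import Counter

def build_tree(rows, attributes, target):
    """Create the tree by recursively splitting rows (divide and conquer)."""
    labels = [r[target] for r in rows]
    # Stop when all rows agree on the label or no attributes remain;
    # the leaf is the most common label among the remaining rows.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    first, rest = attributes[0], attributes[1:]
    branches = {}
    for value in sorted({r[first] for r in rows}):
        subset = [r for r in rows if r[first] == value]
        branches[value] = build_tree(subset, rest, target)
    return {"question": first, "branches": branches}

def apply_tree(tree, record):
    """Apply the tree to one record by walking from the root to a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][record[tree["question"]]]
    return tree

# Hypothetical training data, invented for this sketch.
rows = [
    {"team_wins": "yes", "demand": "high", "outcome": "profit"},
    {"team_wins": "yes", "demand": "low", "outcome": "profit"},
    {"team_wins": "no", "demand": "low", "outcome": "loss"},
    {"team_wins": "no", "demand": "high", "outcome": "loss"},
]

tree = build_tree(rows, ["team_wins", "demand"], "outcome")
answer = apply_tree(tree, {"team_wins": "yes", "demand": "low"})
```

Here `tree` is the model's tree, `build_tree` is the algorithm that creates it, and `apply_tree` is the algorithm that applies it to new data.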
Real World Example on Decision Trees
• Assume you own a sporting goods store.
• You are sure that if the local team wins you will be able to sell a significant number of T-shirts proclaiming the local team as the national champion.
• You expect to sell between 2,000 and 10,000 shirts at $20 each.
• You can order the shirts for $7 each. Any shirts you do not sell you can sell as scrap for $2 each.
• In addition, you estimate that there is a 60% chance of the local team winning.
• You must decide today whether to order any shirts, and if so, how many.
In this problem you face two uncertain events
• You don't know who will win the championship game.
• You don't know how many shirts you can sell even if the local team does win.
First of all you face a decision. You must decide how many shirts to order. Here let us assume you can only order in quantities of 5,000. This means you must order 5,000, 10,000, or none at all.
Once you have made a decision about how many shirts to order, you come to the next node: the local team either wins or loses the championship.
If the team loses, you must sell all of the shirts as scrap, losing either $25,000 if you order 5,000 shirts ($2*5,000 - $7*5,000 = -$25,000) or $50,000 if you order 10,000 shirts ($2*10,000 - $7*10,000 = -$50,000).
If the team wins, you face another uncertainty: the demand for shirts.
You cannot sell more shirts than you have.
Therefore, if you order only 5,000 shirts, you will make the same profit if demand is 5,000, 7,000 or 9,000.
All shirts you have in excess of demand can be sold as scrap.
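These payoff rules can be written as one small function. The sketch below uses the prices from the example ($20 sale price, $7 cost, $2 scrap value); the function name `profit` and the convention that a loss by the team means a demand of zero are assumptions made for illustration.

```python
PRICE, COST, SCRAP = 20, 7, 2  # dollars per shirt, taken from the example

def profit(ordered, demand):
    """Profit for a given order size and demand.

    Sales are capped at the number of shirts in stock, and every
    shirt in excess of demand is sold as scrap.
    """
    sold = min(ordered, demand)
    leftover = ordered - sold
    return PRICE * sold + SCRAP * leftover - COST * ordered

# With 5,000 shirts ordered, demand of 5,000, 7,000, or 9,000 gives the same profit.
profits_5000 = [profit(5000, d) for d in (5000, 7000, 9000)]

# If the team loses, demand is treated as zero and everything is scrapped.
loss_5000 = profit(5000, 0)
loss_10000 = profit(10000, 0)
```

Treating a loss as zero demand reproduces the $25,000 and $50,000 losses stated earlier, so one function covers both the win and lose branches.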
Here is the solution…
The solution indicated by this tree is that you should order 5,000 shirts because this option yields the highest expected value.
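The expected-value comparison behind this conclusion can be reproduced with a short calculation. The text does not give the demand probabilities, so this sketch ASSUMES that demand, given a win, is 5,000, 7,000, or 9,000 shirts with equal probability. Prices and the 60% win chance come from the example; the names `profit` and `expected_value` are hypothetical.

```python
PRICE, COST, SCRAP = 20, 7, 2  # dollars per shirt, from the example
P_WIN = 0.6                    # chance the local team wins, from the example

# ASSUMPTION: equal demand probabilities given a win (not stated in the text).
DEMAND_IF_WIN = {5000: 1 / 3, 7000: 1 / 3, 9000: 1 / 3}

def profit(ordered, demand):
    """Sales are capped at stock; unsold shirts are sold as scrap."""
    sold = min(ordered, demand)
    return PRICE * sold + SCRAP * (ordered - sold) - COST * ordered

def expected_value(ordered):
    """Probability-weighted profit over the win and lose branches."""
    win = sum(p * profit(ordered, d) for d, p in DEMAND_IF_WIN.items())
    lose = profit(ordered, 0)  # team loses: every shirt is scrapped
    return P_WIN * win + (1 - P_WIN) * lose

options = [0, 5000, 10000]
values = {n: expected_value(n) for n in options}
best = max(options, key=expected_value)
```

Under this assumption, ordering 5,000 shirts has an expected value of about $29,000 and ordering 10,000 about $25,600, so `best` is 5,000, in line with the conclusion above. Different demand probabilities could change the ranking.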