Upload
hoan-phuc
View
1.178
Download
2
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
Data Mining With SQL ServerNguyen To Hoan PhucPham Huu Khanh
2
Objectives
• Understand What is Data Mining
• Learn the Data Mining Process
• How to work with Data Mining
3
Agenda
• Data vs. Information
• What is Data Mining?
• Technical Platform
• Data Mining Process Overview
• Key Concepts and Terminology
• DEMO: Data Mining Process in Detail Using DMX
4
Data vs. Information
5
6
7
SELECT * FROM SALES
WHERE AMOUNT >= ALL (SELECT
AMOUNT
FROM PERSONSALES A
INNER JOIN COMPANYSALES B
ON A.ID = B.ID)
Which sales person had the highest amount of sales?
How many sales transactions occurred in the month of December?
Which product will have the highest amount of sales in the next month?
Which customer will buy our new product?
8
What is Data Mining?
9
Data Mining
• Technologies for analysis of data and discovery
of (very) hidden patterns
• Uses a combination of statistics, probability
analysis and database technologies
• Fairly young (<20 years old) but clever
algorithms developed through database
research
10
What does Data Mining Do?
Explores Your Data
Finds Patterns -
Trends
Performs Predictio
ns
11
Data Mining Tasks
• Classification• Phân loai, xêp hang
• Regression• Hôi quy
• Segmentation• Phân khuc
• Association• Liên kêt
• Sequence Analysis• Phân tich chuôi, day
12
The Technical Platform
13
Data acquisition and integration from multiple sources
Data transformation and synthesis
Knowledge and pattern detection through Data Mining
Data enrichment with logic rules and hierarchical views
Data presentation and distribution
Data publishing for mass recipients
Integrate Analyze Report
SQL ServerWe Need More Than Just Database Engine
14
DM – Part of Microsoft SQL Server
15
Server Mining Architecture
Analysis ServicesServer
Mining Model
Data Mining Algorithm DataSource
Excel/Visio/SSRS/Your App
OLE DB/ADOMD/XMLA
Deploy
BIDSExcelVisioSSMS
AppData
16
Data Mining Process Overview
17
Mining Model Mining ModelMining Model
Mining Process
DM EngineDM Engine
Training data
Data to be predictedMining Model
With predictions
18
Steps for Building a DM Model
1. Model Creation
• Define columns for cases: visually (BIDS), using DMX, or from PMML
2. Model Training
• Feed lots of data from a real database, or from a system log
Congratulations! We now have a model
3. Model Testing
• Test on sample data to check predictions.
• Testing data must be different from training
• If we get nonsense, adjust the algorithm, its parameters, model
design, or even data
4. Model Use (Exploration and Prediction)
• Use the model on new data to predict outcomes
19
Many Approaches
• Work the way you like:
• Database experts and SQL veterans:
• Write queries in DMX (similar to T-SQL)
• Everyone else:
• Use Business Intelligence Development Studio (BIDS) –
rich GUI included with SSAS
• Hosted in Visual Studio (included!)
• You don’t have to program – click-click instead
• Use Excel 2007 with Data Mining Add-Ins
• The “Data Mining” tab has everything you need
• “Table Analysis” tab is easier but simplified
20
Key Concepts and Terminology
21
Mining Structure
• Describes data to be mined• Columns from a data source and their:
• Data Type• Content Type
• Contains Mining Models• Often we build several different models in one
structure• Holds training data, known as Cases (if required)• Holds testing data, known as Holdout (in SQL
2008)
22
Mining Structure
23
Data Mining Model
• Container of patterns discovered by a Data
Mining Algorithm amongst the training Cases
• A table containing patterns
• Expressed by visualisers
• Specifies usage of columns already defined in
the Mining Structure
24
Cases: The Things We Study
• Case – set of columns (attributes) you want to
analyse
• Age, Gender, Region, Annual Spending
• Case Key – unique ID of a case
25
DEMOData Mining Process in DetailUsing DMX
26
Data Mining ExtensionsDMX• “T-SQL” for Data Mining• Easy! Like scripting for IT Pros
• Two types of statements:• Data Definition
• CREATE, ALTER, EXPORT, IMPORT, DROP• Data Manipulation
• INSERT INTO, SELECT, DELETE
27
DMX – Just Like T-SQLCREATE MINING MODEL CreditRisk
(CustID LONG KEY,
Gender TEXT DISCRETE,
Income LONG CONTINUOUS,
Profession TEXT DISCRETE,
Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees
INSERT INTO CreditRisk
(CustId, Gender, Income, Profession, Risk)
Select
CustomerID, Gender, Income, Profession,Risk
From Customers
Select NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
ON CreditRisk.Gender=NewCustomer.Gender
AND CreditRisk.Income=NewCustomer.Income
AND CreditRisk.Profession=NewCustomer.Profession
28
Demo’s Steps
1• Create Mining Structure
2• Create Mining Model
3• Process Mining Model
4• Test Model
5• Execute Prediction
29
30
THANK YOU !