Data Mining With SQL Server Nguyen To Hoan Phuc Pham Huu Khanh


Page 1: Data Mining With SQL Server

Data Mining With SQL Server
Nguyen To Hoan Phuc
Pham Huu Khanh

Page 2: Data Mining With SQL Server

Objectives

• Understand what data mining is

• Learn the data mining process

• Learn how to work with data mining

Page 3: Data Mining With SQL Server

Agenda

• Data vs. Information

• What is Data Mining?

• Technical Platform

• Data Mining Process Overview

• Key Concepts and Terminology

• DEMO: Data Mining Process in Detail Using DMX

Page 4: Data Mining With SQL Server

Data vs. Information

Page 7: Data Mining With SQL Server

SELECT * FROM SALES
WHERE AMOUNT >= ALL (SELECT AMOUNT
                     FROM PERSONSALES A
                     INNER JOIN COMPANYSALES B
                     ON A.ID = B.ID)

Which sales person had the highest amount of sales?

How many sales transactions occurred in the month of December?

Which product will have the highest amount of sales in the next month?

Which customer will buy our new product?
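The first two questions above ask about rows that already exist, so plain SQL can answer them. The last two ask about events that have not happened yet, which no SELECT over stored data can return; that gap is what data mining addresses. A minimal T-SQL sketch for the first two, assuming a hypothetical table Sales(SalesPersonID, Amount, SaleDate) that is not part of the original deck:

```sql
-- Assumed (hypothetical) table: Sales(SalesPersonID, Amount, SaleDate)

-- Which sales person had the highest amount of sales?
SELECT TOP (1) SalesPersonID, SUM(Amount) AS TotalAmount
FROM Sales
GROUP BY SalesPersonID
ORDER BY TotalAmount DESC;

-- How many sales transactions occurred in the month of December?
-- Counts December rows across all years; add a YEAR(SaleDate) filter for a single year.
SELECT COUNT(*) AS DecemberTransactions
FROM Sales
WHERE MONTH(SaleDate) = 12;
```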

Page 8: Data Mining With SQL Server

What is Data Mining?

Page 9: Data Mining With SQL Server

Data Mining

• Technologies for analysing data and discovering (very) hidden patterns

• Uses a combination of statistics, probability analysis and database technologies

• A fairly young field (<20 years old), but with clever algorithms developed through database research

Page 10: Data Mining With SQL Server

What does Data Mining Do?

• Explores your data

• Finds patterns and trends

• Performs predictions

Page 11: Data Mining With SQL Server

Data Mining Tasks

• Classification (classifying, ranking)

• Regression

• Segmentation

• Association

• Sequence Analysis (analysing sequences and series)

Page 12: Data Mining With SQL Server

The Technical Platform

Page 13: Data Mining With SQL Server

SQL Server: We Need More Than Just a Database Engine

Integrate
• Data acquisition and integration from multiple sources
• Data transformation and synthesis

Analyze
• Knowledge and pattern detection through Data Mining
• Data enrichment with logic rules and hierarchical views

Report
• Data presentation and distribution
• Data publishing for mass recipients

Page 14: Data Mining With SQL Server

DM – Part of Microsoft SQL Server

Page 15: Data Mining With SQL Server

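Server Mining Architecture

[Diagram: clients (Excel, Visio, SSRS, your app) connect via OLE DB/ADOMD/XMLA to the Analysis Services server, which holds the mining model, the data mining algorithm and the data source (app data); models are deployed to the server from BIDS, Excel, Visio or SSMS.]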

Page 16: Data Mining With SQL Server

Data Mining Process Overview

Page 17: Data Mining With SQL Server

Mining Process

[Diagram: training data is fed to the DM engine to produce a mining model; the data to be predicted is then passed, together with the mining model, through the DM engine, which returns the data with predictions.]

Page 18: Data Mining With SQL Server

Steps for Building a DM Model

1. Model Creation
• Define columns for cases: visually (BIDS), using DMX, or from PMML

2. Model Training
• Feed lots of data from a real database, or from a system log

Congratulations! We now have a model

3. Model Testing
• Test on sample data to check predictions
• Testing data must be different from the training data
• If we get nonsense, adjust the algorithm, its parameters, the model design, or even the data

4. Model Use (Exploration and Prediction)
• Use the model on new data to predict outcomes
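Step 3 can be sketched in DMX by joining the model to cases it was not trained on and placing the known outcome next to the predicted one. This assumes the CreditRisk model defined later in the deck; the data source [CustomerDB] and the table TestCustomers are hypothetical names that do not appear in the original.

```sql
-- Compare the known Risk of held-back customers with the model's prediction.
-- [CustomerDB] and TestCustomers are placeholder names (assumptions).
SELECT
    t.CustomerID,
    t.Risk                    AS ActualRisk,
    Predict(CreditRisk.Risk)  AS PredictedRisk
FROM CreditRisk
PREDICTION JOIN
    OPENQUERY([CustomerDB],
        'SELECT CustomerID, Gender, Income, Profession, Risk
         FROM TestCustomers') AS t
ON  CreditRisk.Gender     = t.Gender
AND CreditRisk.Income     = t.Income
AND CreditRisk.Profession = t.Profession
```

Rows where ActualRisk and PredictedRisk differ are misclassified cases; a high share of them is the "nonsense" signal that suggests revisiting the algorithm, its parameters, or the data.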

Page 19: Data Mining With SQL Server

Many Approaches

• Work the way you like:
  • Database experts and SQL veterans:
    • Write queries in DMX (similar to T-SQL)
  • Everyone else:
    • Use Business Intelligence Development Studio (BIDS), the rich GUI included with SSAS
      • Hosted in Visual Studio (included!)
      • You don’t have to program: click-click instead
    • Use Excel 2007 with Data Mining Add-Ins
      • The “Data Mining” tab has everything you need
      • The “Table Analysis” tab is easier but simplified

Page 20: Data Mining With SQL Server

Key Concepts and Terminology

Page 21: Data Mining With SQL Server

Mining Structure

• Describes data to be mined
  • Columns from a data source and their:
    • Data Type
    • Content Type
• Contains Mining Models
  • Often we build several different models in one structure
• Holds training data, known as Cases (if required)
• Holds testing data, known as Holdout (in SQL Server 2008)
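A minimal DMX sketch of these ideas: one structure that reserves a holdout set, two models built on it with different algorithms, and a single processing step that trains both. The names CustomerRisk, RiskTrees, RiskBayes and [CustomerDB] are hypothetical; the columns mirror the deck's CreditRisk example. The statements are shown together for brevity and would typically be executed one at a time.

```sql
-- 1. Structure: columns with data type and content type.
--    WITH HOLDOUT reserves 30% of the cases for testing (SQL Server 2008 and later).
CREATE MINING STRUCTURE CustomerRisk
(
    CustID     LONG KEY,
    Gender     TEXT DISCRETE,
    Income     LONG CONTINUOUS,
    Profession TEXT DISCRETE,
    Risk       TEXT DISCRETE
)
WITH HOLDOUT (30 PERCENT)

-- 2. First model in the structure: decision trees over all columns.
ALTER MINING STRUCTURE CustomerRisk
ADD MINING MODEL RiskTrees
(
    CustID,
    Gender,
    Income,
    Profession,
    Risk PREDICT
)
USING Microsoft_Decision_Trees

-- 3. Second model in the same structure: Naive Bayes.
--    Income is left out because Naive Bayes does not accept continuous columns.
ALTER MINING STRUCTURE CustomerRisk
ADD MINING MODEL RiskBayes
(
    CustID,
    Gender,
    Profession,
    Risk PREDICT
)
USING Microsoft_Naive_Bayes

-- 4. Process the structure once; every model it contains is trained.
--    [CustomerDB] is a placeholder data source name (assumption).
INSERT INTO MINING STRUCTURE CustomerRisk
(CustID, Gender, Income, Profession, Risk)
OPENQUERY([CustomerDB],
    'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')
```

Keeping both models in one structure means they share the same cases and the same holdout set, so their accuracy can be compared like for like.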

Page 22: Data Mining With SQL Server

Mining Structure

Page 23: Data Mining With SQL Server

Data Mining Model

• Container of patterns discovered by a Data Mining Algorithm amongst the training Cases

• A table containing patterns

• Expressed by visualisers

• Specifies usage of columns already defined in the Mining Structure
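Because a trained model behaves like a table of patterns, its content can be read with an ordinary-looking SELECT. A small sketch, assuming the CreditRisk decision-tree model from the DMX example later in the deck has already been trained:

```sql
-- Each row is one node of the trained model (for decision trees: a tree node).
-- NODE_CAPTION     : label of the node
-- NODE_SUPPORT     : number of training cases that reached the node
-- NODE_DESCRIPTION : the path of conditions leading to the node
SELECT NODE_CAPTION, NODE_SUPPORT, NODE_DESCRIPTION
FROM CreditRisk.CONTENT
```

The graphical viewers in BIDS present this same content visually; the meaning of each row depends on the algorithm that produced the model.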

Page 24: Data Mining With SQL Server

Cases: The Things We Study

• Case: the set of columns (attributes) you want to analyse
  • For example: Age, Gender, Region, Annual Spending

• Case Key: unique ID of a case

Page 25: Data Mining With SQL Server

DEMO: Data Mining Process in Detail Using DMX

Page 26: Data Mining With SQL Server

Data Mining Extensions (DMX)

• “T-SQL” for Data Mining
• Easy! Like scripting for IT Pros
• Two types of statements:
  • Data Definition
    • CREATE, ALTER, EXPORT, IMPORT, DROP
  • Data Manipulation
    • INSERT INTO, SELECT, DELETE
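A few of the less familiar statements from the list above, sketched against the CreditRisk model used later in the deck. The backup path is a hypothetical example, and each statement would typically be run on its own.

```sql
-- Data manipulation: clear what the model has learned but keep its definition,
-- so it can be retrained with INSERT INTO.
DELETE FROM MINING MODEL CreditRisk

-- Data definition: save the model to a file (path is a placeholder) ...
EXPORT MINING MODEL CreditRisk TO 'C:\Backup\CreditRisk.abf'

-- ... remove the model from the server ...
DROP MINING MODEL CreditRisk

-- ... and restore it from the file, here or on another server.
IMPORT FROM 'C:\Backup\CreditRisk.abf'
```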

Page 27: Data Mining With SQL Server

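DMX – Just Like T-SQL

-- 1. Create the model: declare the case columns and mark the column to predict
CREATE MINING MODEL CreditRisk
(CustID LONG KEY,
 Gender TEXT DISCRETE,
 Income LONG CONTINUOUS,
 Profession TEXT DISCRETE,
 Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees

-- 2. Train the model with known cases
-- DMX reads relational rows through OPENQUERY; [CustomerDB] is a placeholder data source name
INSERT INTO CreditRisk
(CustID, Gender, Income, Profession, Risk)
OPENQUERY([CustomerDB],
 'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')

-- 3. Predict Risk, with its probability, for new customers
SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN
OPENQUERY([CustomerDB],
 'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS NewCustomers
ON CreditRisk.Gender = NewCustomers.Gender
AND CreditRisk.Income = NewCustomers.Income
AND CreditRisk.Profession = NewCustomers.Profession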
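The prediction query above scores a whole table of new customers. For a single case typed in by hand, DMX also allows a singleton query, sketched here against the same CreditRisk model; the input values are made up for illustration.

```sql
-- Singleton prediction: the "table" is one inline row.
-- NATURAL PREDICTION JOIN matches input columns to model columns by name,
-- so no ON clause is needed.
SELECT
    Predict(Risk)            AS PredictedRisk,
    PredictProbability(Risk) AS Probability
FROM CreditRisk
NATURAL PREDICTION JOIN
(SELECT 'Female'   AS Gender,
        45000      AS Income,
        'Engineer' AS Profession) AS t
```

This form is convenient when an application needs a prediction for one customer at a time, for example while a form is being filled in.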

Page 28: Data Mining With SQL Server

Demo Steps

1. Create Mining Structure

2. Create Mining Model

3. Process Mining Model

4. Test Model

5. Execute Prediction

Page 30: Data Mining With SQL Server

THANK YOU!