37
Class 3 1 Marketing 8001: Database Marketing Class 3 Discovering and Utilizing Reliable Patterns in Customer Data

Marketing 8001: Database Marketing Class 3

  • Upload
    peta

  • View
    39

  • Download
    0

Embed Size (px)

DESCRIPTION

Marketing 8001: Database Marketing Class 3. Discovering and Utilizing Reliable Patterns in Customer Data. Today’s Agenda. Exam 1 Next Week Reminder Class 3 Materials Accessing online SAS Data Step Issues Displaying Results: Proc Format Differentiating Customers Review Homework1 - PowerPoint PPT Presentation

Citation preview

Page 1: Marketing 8001: Database Marketing Class 3

Class 3 1

Marketing 8001:Database Marketing

Class 3

Discovering and Utilizing Reliable Patterns in Customer Data

Page 2: Marketing 8001: Database Marketing Class 3

Class 3 2

Today’s Agenda

• Exam 1 Next Week• Reminder• Class 3 Materials• Accessing online SAS• Data Step Issues• Displaying Results: Proc Format• Differentiating Customers• Review Homework1• Class Exercise• Notes on Missing Data…Review on Your Own

Page 3: Marketing 8001: Database Marketing Class 3

Class 3 3

Exam 1 Next Week

• Readings and class work to date, including material and homework for next week

• SAS– Data Step programming– Procedures, including Proc Format

• Make sure you can find topics in…– LSB– Online documentation of SAS

• No multiple choice• You will need to be able to write SAS code and execute

SAS Procs to answer questions about customers.• Exam should take 2 hours…but I’ve reserved the room

until 7:30 p.m., in case some students need more time.

Page 4: Marketing 8001: Database Marketing Class 3

Class 3 4

Reminder

• Keep up with readings– LSB– D&D– Missou Media Pack

• Material in readings and Power Point slides can be on exams, even if not covered in class

• Bring questions about readings and SAS to class or contact me

• SAS is the tool we’ll use to do much of what is presented in D&D text

• Do not wait until Tuesday to do homework, unless you’ve mastered SAS

Page 5: Marketing 8001: Database Marketing Class 3

Class 3 5

Class 3 Materials

• Download Class 3 files to desktop…C:\M8001– Class 3 PowerPoint => Class3_SP2008.ppt– Word document for SAS data analysis class exercise

• ProcTabulateToProcExportExerciseClass3SP2008.doc

• Work on the Proc Tabulate part of this problem (and outputting the Proc Tabulate table) BEFORE you look at the SAS code I’ve provided for you.

– SAS program and data for SAS Class Exercise• ProcTabulateAndExportClass3SP2008.sas• HLPrimSampForTabExportClass3SP08.sas7bdat

– GJC Notes on Proc Tabulate• Notes on Using Proc Tabulate.doc

Page 6: Marketing 8001: Database Marketing Class 3

Class 3 6

Accessing Online SAS

• http://v8doc.sas.com/sashtml/• For Procs

– Click on + Base SAS Software– Click on + Base SAS Procedures Guide– Click on + Procedures– Click on Proc desired

• For Data Step documentation– Click on + SAS Language Reference:

Concepts

Page 7: Marketing 8001: Database Marketing Class 3

Class 3 7

SAS Data Step Issues

• SAS is standard tool for DBM

• Helps us deal with numbers, data

• Numbers are not abstract…they represent measurements of attributes/characteristics of real customers

• DBM is about marketing to people…and using data to do it well and profitably.

Page 8: Marketing 8001: Database Marketing Class 3

Class 3 8

SAS Data Step Issues

• Creating & manipulating variables…see LSB– Variable = expression;

• New_Variable = Old_Variable + 17;

– Math operations• Add + Var3 = Var1 + Var2;• Subtract - Var3 = Var1 – Var2• Multiply * Var3 = Var1 * Var2• Divide / Var3 = Var1 / Var2• Exponentiate ** Var3 = Var1**2• For division…remember zero divide exception

– Do not to forget to end statement with semi-

colon…;

Page 9: Marketing 8001: Database Marketing Class 3

Class 3 9

SAS Data Step Issues– Standard math rules for precedence for one line of

SAS code…see LSB• Exponentiation• Multiplication & Division• Addition & Subtraction• If many calculations in Data Step, remember SAS executes

statements in order.

– GJC Recommendation• Always use parentheses…don’t rely on standard math rules

above• Remember to use as many left “(“ as right “)”• e.g. Logit_Score=Log(Prob_Score/(1-Prob_Score))• Always check variables…especially complex ones

– Proc Freq or Tabulate for categorical variables– Proc Print for continuous

Page 10: Marketing 8001: Database Marketing Class 3

Class 3 10

SAS Data Step Issues

• SAS Functions– See LSB 3.3 for functions (pp 80-81)– If necessary, you may need to look online– Be familiar with functions…lookup when

needed

Page 11: Marketing 8001: Database Marketing Class 3

Class 3 11

IF-THEN / ELSE Statements

• Mnemonics…see LSBEQ (equals) (=)NE (not equals) (^=)GT (greater than) (>)LT (less than) (<)GE (greater than or equal) (>=)LE (less than or equal) (<=)

• GJC Recommendation– Except for “=“ sign, use the character mnemonics– You’ll make fewer errors

Page 12: Marketing 8001: Database Marketing Class 3

Class 3 12

IF-THEN / ELSE Statements

• IF condition THEN action; ELSE action;• IF Gender=‘F’ THEN Female=1; ELSE Female=0;• Note: Because Gender is character, need to place

values in single quotes. • Validate Coding

Proc Freq;Tables Gender * Female / Missing;Run;

• Remember…the more complicated the code, the more detailed your checking must be.

• Don’t want a preventable SAS bug to cost you money…or a lower grade on an exam

Page 13: Marketing 8001: Database Marketing Class 3

Class 3 13

IF-THEN / ELSE Statements

• Always remember to initialize variables to zero or a default value– ELSE does it…

IF Gender=‘F’ THEN Female=1; ELSE Female=0;

– Alternative Coding, explicitly initialize value…Female=0;

IF Gender=‘F’ THEN Female=1;

• GJC Recommendation– Explicitly initialize variable, fewer errors

Page 14: Marketing 8001: Database Marketing Class 3

Class 3 14

Displaying Results: Proc Format

• Very useful for categorizing and displaying continuous variables.

• Can also be used to combine levels of categorical variables together.

• Can define verbal labels for variables because:– Analysis takes place in your head, not in a computer!

• Proc Format does not read data• Following code will generate an error because

Proc Format does not read an input data file– Proc Format Data=Temp;

Page 15: Marketing 8001: Database Marketing Class 3

Class 3 15

Displaying Results: Proc Format

• Business managers most comfortable with data in tables.

• Formats created with Proc Format do not change any values for variables in your dataset…it only affects printing in Procs.

Page 16: Marketing 8001: Database Marketing Class 3

Class 3 16

Displaying Results: Proc Format• Example…first use Proc Format to create formats, then use the

Format statement in analysis Proc to associate format with specific variable.

Proc Format; Value AgeFmt35-55=’35-55 years’Other=‘Else’;Run;

Proc Freq Data=Temp;Tables Age;Format Age AgeFmt. ; /* Period at end of format associates it with var */

Run;

Page 17: Marketing 8001: Database Marketing Class 3

Class 3 17

Displaying Results: Proc Format

• We have seen how Proc Format works in Class 1 and 2 exercises

• Make sure you can use it!

• I recommend adapting code I’ve given you if you can.

Page 18: Marketing 8001: Database Marketing Class 3

Class 3 18

Differentiating Customers

• DBM is all about differences among customers that can be captured in data about them.

• DBMers want to use data to segment their customer base so that they can…– Market to them appropriately– Target them for increased profitability

• So, DBMers are always looking for differences in response (order, pay, return, not pay) for customers with different characteristics

Page 19: Marketing 8001: Database Marketing Class 3

Class 3 19

So, How Do We Determine if Customers with Different

Characteristics Have Different Response Rates?

• First step is to conduct a test

• Second step is to analyze the data

Page 20: Marketing 8001: Database Marketing Class 3

Class 3 20

From a Data Point-of-View, How Do We Differentiate Customers?

• Let’s say we have conducted a test and have merged our test’s “frozen file” of customer characteristics with our response variable (Order).

• The marketing team wants to know “how much higher than average do buyers of multiple Healthy Living books respond?”

• The frozen file variable NHLBKPAY measures the number of HL books paid.

Page 21: Marketing 8001: Database Marketing Class 3

Class 3 21

From a Data Point-of-View, How Do We Differentiate Customers?

• Method 1: Subset data via a Data Step and analyze separate datasets…this works, but is really inefficient.

Data Multi NotMulti; /*Creates 2 files */

Set A.TestData;

If NHLBKPAY GE 2 then Output Multi;

Else Output NotMulti;

Proc Means Data=Multi; Var Order; Run;

Proc Means Data=NotMulti; Var Order; Run;

Proc Means Data=A.TestData; Var Order; Run;

Page 22: Marketing 8001: Database Marketing Class 3

Class 3 22

From a Data Point-of-View, How Do We Differentiate Customers?

• Method 2: Use Proc Freq– Cross characteristic with response variable– Need Proc Format for NHLBKPAY because it can have many

values.– Proc Format

Value Freq Low-1 = ‘NotMulti’ 2-High=‘Multi’;Run;

– Proc Freq Data=A.TestData;Format NHLBKPAY Freq.;Tables NHLBKPAY * Order; Run;

Page 23: Marketing 8001: Database Marketing Class 3

Class 3 23

From a Data Point-of-View, How Do We Differentiate Customers?

• Method 3: Use Where statement with Proc MeansProc Means Data=A.TestData; Where NHLBKPay GE 2;Var Order; /* Order rate for Multis */Run;Proc Means Data=A.TestData; Where NHLBKPay LE 1; Var Order; /*Order rate for non-Multis*/

Run;

Proc Means Data=A.TestData;Var Order; /*Order rate for all names, the average rate*/

Run;

Page 24: Marketing 8001: Database Marketing Class 3

Class 3 24

From a Data Point-of-View, How Do We Differentiate Customers?

• Use Proc Tabulate to create tables– It’s complicated…but very powerful…so…– For simple tables, use Proc Freq and Proc

Means– For more complicated tables, use Proc

Tabulate

Page 25: Marketing 8001: Database Marketing Class 3

Class 3 25

More Complicated Tables

• Let’s say we want order rates for cells created by crossing one customer characteristic with another…a complicated problem.

• For example, calculate order rate for all 6 combinations of NHLBKBUY (in 3 categories) crossed with TLHLPAY (in 2 categories)

• A 3 * 2 tables has three rows and two columns for a total of 6 cells or combinations…will also be row results and column results.

• Draw a picture of such a table on paper!

Page 26: Marketing 8001: Database Marketing Class 3

Class 3 26

From a Data Point-of-View, How Do We Differentiate Customers?

• What do we need to calculate order rates for names with a given characteristic?…or combination of characteristics?– Number orderers with characteristic

(numerator)– Number all names with characteristic

(denominator)

• So, for a 3*2 table how can we get the numbers we need?

Page 27: Marketing 8001: Database Marketing Class 3

Class 3 27

More Complicated Tables: Use Proc Freq with Where Statement

• Calculate numbers of orderers for each combination of characteristics in the 3 * 2 table

• Proc Format Value Freq

Low-0 = ‘Zero’1= ‘One’2-High=‘2 or More’;

Value RecencyLow-12=’12 Mo Recent’13-High=‘GT 12 Mo Rec’;

Run;• Proc Freq Data=Temp;

Tables NHLBKBUY * TLHLPAY; Where Order=1; /* 3*2 table for orderers in sample */Format NHLBKBUY Freq. TLHLPAY Recency. ; Run;

Page 28: Marketing 8001: Database Marketing Class 3

Class 3 28

More Complicated Tables: Use Proc Freq with Where Statement

• Calculate total number of names for one combination of characteristics in the 3 * 2 table Proc Format Value Freq

Low-0 = ‘Zero’1= ‘One’2-High=‘2 or More’;

Value RecencyLow-12=’12 Mo Recent’13-High=‘GT 12 Mo Rec’;

Run;• Proc Freq Data=Temp;

Tables NHLBKBUY * TLHLPAY; /*3*2 table for names in sample */Format NHLBKBUY Freq. TLHLPAY Recency. ; Run;

Page 29: Marketing 8001: Database Marketing Class 3

Class 3 29

More Complicated Tables: Use Proc Freq with Where Statement

• Calculate order rates and indices by hand.– Have 6 numerators

– Have 6 denominators

Page 30: Marketing 8001: Database Marketing Class 3

Class 3 30

More Complicated Tables: Use Proc Means with Where Statement

• Mean of 0-1 coded order variable is order rate as a proportion.

• Problem: You cannot use Proc Format with Proc Means.• But, Where statement does work for ranges of variables• Proc Means Data=Temp;

Where NHLBKBUY LE 0 and TLHLPAY LE 12; Var Order;Run;

• Proc Means Data=Temp;Where NHLBKBUY EQ 1 and TLHLPAY LE 12; Var Order;Run;

• Need to make 4 more runs…but gets job done

Page 31: Marketing 8001: Database Marketing Class 3

Class 3 31

More Complicated Tables:Use Proc Tabulate

• Need Proc Format to categorize variablesProc Tabulate Data=Temp;Class NHLBKBUY TLHLPAY ; Var Order;

Table NHLBKBUY * (N PCTN MEAN*ORDER), TLHLPAY;

Format NHLBKBUY Freq. TLHLPAY Recency. ; Run;

• On paper, draw what this table would look like• Save the code for any Proc Tabulate that works…it’s a difficult Proc

to use and adapting code is easier than re-writing it.

Page 32: Marketing 8001: Database Marketing Class 3

Class 3 32

Review Homework1

Page 33: Marketing 8001: Database Marketing Class 3

Class 3 33

Class Exercise

• Summarizing Data with Proc Tabulate

• Outputting Proc Tabulate table to Excel

Page 34: Marketing 8001: Database Marketing Class 3

Class 3 34

Notes on Missing Data…a Quick Primer Review This Section on Your Own

• How SAS represents missing data…– Numeric data => with a decimal point (.)– Character data => with blanks => ’ ‘

• In real world data, almost all data have missing values, e.g….– In a SAS file with 15,000 names, the mean of VarA is

based on 14,001 names– But, mean of VarB based on 832 names– Can’t compare means (or do correlations) because

names may be different

Page 35: Marketing 8001: Database Marketing Class 3

Class 3 35

Notes on Missing Data• When you have missing data…

– Need to determine you do not have data problems…never assume data are OK!

– May need to create special flags to encode missing data for modeling and profiling, especially if names with missing data respond at different rate than names with data.

– In example above…• 1 means female• 0 means not female (i.e., male or missing)

Page 36: Marketing 8001: Database Marketing Class 3

Class 3 36

Notes on Missing Data

– Another example• 1 mean female• 2 means male• Missing Recoded to 3 where 3 means Don’t Know

GenderNew = 0;If Gender = ‘ ‘ then GenderNew = 3;If Gender = ‘F’ then GenderNew = 1;If Gender = ‘M’ then GenderNew = 2;

• Note: In this example the variable Gender is character…the variable GenderNew is numeric.

• There should no names with a value of zero!

Page 37: Marketing 8001: Database Marketing Class 3

Class 3 37

Notes on Missing Data• The Bad News

– Treatment of missing data is usually complicated– Variables with much missing data not very useful– In new business, transaction & promotion variables

mostly missing…that is, they have little coverage• The Good News

– Only our Healthy Living (HL) data have missing values

– So, for class don’t worry about it– In real world, worry about it– If you want to work with missing data, practice with

the HL data…in these data, missing values and other unique values of variables are encoded as negative integer numbers