7-Lecture Jan 15th 2014


  • 8/11/2019 7-Lecture Jan 15th 2014


    Data Mining

    Lecture #1 : Jan 15th 2014




    Data is produced at a phenomenal rate

    Our ability to store has grown

    Users expect more sophisticated information. How?





    Data Mining works with the Data Warehouse

    Data Warehousing provides the enterprise with a memory

    Data Mining provides the enterprise with intelligence



    Database Processing vs. Data Mining

    Database Processing                  Data Mining

    Query: well defined                  Query: poorly defined, no precise query language

    Output: precise                      Output: fuzzy

    Output: subset of database           Output: not a subset of database



    Data Mining: a Business Process

    Business Process: data mining is a business process that interacts with other business processes

    Data mining starts with data, then through analysis informs or inspires action, which in turn creates data that begets data mining

    Organizations wanting to excel do not view data mining as a side show. It readily fits in with other strategies for understanding markets and




    Data Mining: large amounts of data

    i. How much is a lot of data?

    ii. Excel: max rows possible? A very versatile tool for working with relatively small amounts of data

    iii. In the early days of data mining (1960s and 70s) data was scarce, and some of the techniques were developed in that period

    iv. Today computing power is readily available, and a large amount of data is not a handicap; it is an advantage

    Data mining techniques work better with a large sample population



    Data Mining: Meaningful Patterns and Rules

    i. Business Operations: generate the data as well as the patterns at the same time

    ii. Data Mining: the goal is to find patterns that are useful to the business. Helping the business is more important than amusing the miner.

    iii. Call Center Application: classifies customers as Green, Amber and Red for targeting retention and facilitating customer acquisition, the goal being to offer better customer value.

    iv. Companies are generating business models centered around data mining.



    Data Mining and Customer Relationship Management

    Firms of all sizes need to create one-to-one relationships with customers and form a learning relationship with them.

    Firms are learning to look at the value of each customer individually to focus on profitable customers.

    Moving from segmentation to personalization requires changes throughout the organization, especially in marketing, sales and customer

    Shift from a delivery-centered to a customer-centered organization

    Data Mining is only a collection of tools and techniques to support a customer-centric organisation



    What is Data Mining

    Narrow sense: a collection of tools and techniques to support the business

    Broader sense: an attitude that business actions should be based on learning, that informed decisions are better than uninformed decisions, and that measuring results is beneficial to the business

    It is a business process and methodology for applying analytical tools and techniques



    Making Money or Losing Money

    Home Equity Loans generate revenues for the banks like Fidelity Investments

    Bill Paying Service: should it be discontinued as it is losing money? Customers perceive it as a value-added service.

    A customer owns a house and has a large outstanding credit card debt: what should the bank




    Bank of America Case Study

    BofA wanted to boost its home equity loans business.

    Using common sense, the message was:

    People with college-age children want to borrow against home equity to pay tuition bills

    People with high but variable incomes want to use home equity to smooth out peaks and valleys in their income stream.



    Bank of America Case Study

    Data from 42 systems of record was cleansed. Some records dated back to 1914. Customer records had about 250 fields.

    Decision Tree techniques were applied to the customers, both those who had availed the product offering and those who spurned it. Rules were discovered and a good-prospect flag was generated by a data mining model.

    Sequential patterns were studied: when does the customer want the loan?

    Clustering was done. 14 clusters were generated; one or two had intriguing properties.

    In one such cluster, 39% of customers had both business and personal accounts

    The cluster accounted for 25%+ of the customers who had been classified as responders

    People may be using home equity loans to start a business

    Message: use your equity to do what you always wanted to do



    Virtuous Cycle

    1. Identify Business Opportunities

    2. Mining Data transform into actionable


    3. Acting on Information

    4. Measure results

    5. GO TO STEP 1 (infinite loop)

    Focus on business results rather than amusing the data miner



    Data Mining and Marketing Tests

    Control Group: chosen at random, receives message. Response measures message without model.

    Target Group: chosen by model, receives message. Response measures message with model.

    Holdout Group: chosen at random, receives no message. Response measures background response.

    Modeled Holdout: chosen by model, receives no message. Response measures model without message.

    Grid axis: Picked by Model (NO / YES)
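The four test groups can be compared by their response rates. The sketch below is one possible way to do that, not something taken from the slides: the function names, the dictionary keys and all counts are hypothetical, invented purely for illustration.

```python
def response_rate(responders, group_size):
    """Fraction of a group that responded."""
    return responders / group_size


def campaign_effects(groups):
    """groups maps a group name to (responders, group size).

    Returns differences in response rate between pairs of groups.
    """
    rate = {name: response_rate(r, n) for name, (r, n) in groups.items()}
    return {
        # Target vs. control: both receive the message, only the target is model-picked.
        "model_effect": rate["target"] - rate["control"],
        # Control vs. holdout: both chosen at random, only the control gets the message.
        "message_effect": rate["control"] - rate["holdout"],
        # Target vs. modeled holdout: both model-picked, only the target gets the message.
        "message_effect_on_model_picks": rate["target"] - rate["modeled_holdout"],
    }


# Hypothetical counts, not data from the lecture.
example = {
    "target": (500, 10_000),
    "control": (200, 10_000),
    "holdout": (100, 10_000),
    "modeled_holdout": (250, 10_000),
}
effects = campaign_effects(example)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

Keeping the modeled holdout separate matters because a model may simply pick people who would have responded anyway; comparing target against modeled holdout isolates what the message itself adds for those prospects.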



    Data Mining Systems vs Operational




    Chapter 2

    Data Mining Applications in

    Marketing and Customer Relationship Marketing



    Customer Lifecycle

    Data Mining refers to the life cycle of the customer relationship. Five major phases:

    Prospects: are in the target market

    Responders: prospects who have exhibited interest

    New Customers: responders who make a

    Established Customers

    Former Customers



    Customer Lifecycle Stages



    Subscription vs Event Based Relationships

    Event Based Relationships

    Transactions, e.g. purchasing a mobile prepaid card

    Companies communicate via broadcasts

    Encourage customers to visit websites

    Subscription Based Relationships

    Postpaid Mobile Contract

    Contracts enable a learning relationship

    Customer can be studied over time



    Customer Experience: newspaper subscribers



    Data Mining Process: Customer Acquisition

    Who are the prospects?

    The prospect base may change over time

    Will the past be a good predictor for the future?

    Prospects in a new geography may differ from current customers

    Changes to products and services may bring in a different target audience



    Data Mining: Role in Customer Acquisition

    Data availability limits the role of data mining

    Response Modeling is used for channels such as direct mail and telemarketing, as the cost of contact is relatively high

    Data availability falls into 3 categories:

    Source of prospect

    Appended individual/household data

    Appended demographic data at geographical level

    Typically prospect lists are purchased

    Modeling may be required to shortlist customers for direct marketing, perhaps based on demographic data

    The echo effect is a challenge to building models. For example, a prospect receives an e-mail but responds over the phone



    Data Mining: Role in Customer Activation

    Activation is an operational process; how can data mining help?

    Activation provides a view of new customers at the point they start. This is a very important perspective, and as a data source it needs to be preserved

    Customer activation provides the initial conditions of the customer relationship. Such initial conditions are often useful predictors of long-term customer behaviour.



    Activation Funnel: home delivery newspaper subscribers

    Prospects/Leads: new sales leads come through many channels

    Orders: only leads with verifiable addresses and credit cards become orders

    Only orders with routable addresses become

    Only some subscriptions are paid

    Data Mining can play a role in understanding whether or not customers are moving through the process the way they should be, or what characteristics cause a customer to fail during the activation stage
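A first look at where customers drop out of the activation funnel can be as simple as step-to-step conversion rates. The sketch below assumes four stages; the stage names "subscriptions" and "paid subscriptions" and every count are hypothetical, chosen only to illustrate the calculation.

```python
def funnel_conversion(stages):
    """stages: ordered list of (stage name, count).

    Returns a list of (step label, conversion rate) for consecutive stages.
    """
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rates.append((f"{prev_name} -> {name}", count / prev_count))
    return rates


# Hypothetical funnel counts, not data from the lecture.
funnel = [
    ("leads", 1000),
    ("orders", 600),
    ("subscriptions", 540),
    ("paid subscriptions", 432),
]

rates = funnel_conversion(funnel)
# The step with the lowest conversion is the first place to investigate.
weakest_step = min(rates, key=lambda step: step[1])

for label, rate in rates:
    print(f"{label}: {rate:.0%}")
print("weakest step:", weakest_step[0])
```

Conversion rates only locate the leak. Finding which customer characteristics cause the failure would need customer-level data at each stage.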



    Data Mining: Customer Relationship Management

    Primary goal is to increase customer value

    Up-selling: buy a more expensive model

    Cross-selling: broaden the customer relationship

    Usage Stimulation: loyalty points

    Customer Value Calculation: assign a future expected value to each customer

    Customer Options vs Simplicity?

    Are data mining and personalization compatible?



    Data Mining: Customer Relationship Management

    Data Mining helps dig out customer affinities

    Data Mining can play a key role in understanding the operational side of the business

    Customer retention is the key

    Predictive Modeling is often applied in this area

    Techniques include Survival Analysis, or comparing long-standing customers with customers with short tenures

    Win-back: why did customers leave? (analyze customer complaints)

    Tends to depend more on operational strategies



    Targeting Customers

    A nationwide publication determined its readers have the following characteristics:

    59% of readers are college educated

    46% have professional or executive occupations

    21% have household income >= $75K

    7% have household income >= $100K

    Targeting Objective:

    i. Any suggestions for increasing revenue for the publication?



    Targeting Customers

    Targeting Objective:

    i. Increase circulation amongst prospects matching the profile

    ii. Sell advertising space to businesses wanting to reach such an


    iii. Next Steps ?



    Who Fits the Profile

    Who matches the profile better?

    Amy: professional, college educated, earns $80K pa

    Bob: high school grad, earns $50K pa

    How will you make the comparisons?



    Who Fits the Profile


    Any room for improvement?



    Who Fits the Profile

    US Population Figures:

    College Educated = 20.3%

    Professional/Executive = 19.2%

    Income > $75K = 9.5%

    Income > $100K = 2.4%



    Who Fits the Profile

    New scores (index based) relate the publication's target audience to the US population, hence they make more sense
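The score table itself did not survive on these slides, so the following is only a sketch of one plausible index-based scheme, under stated assumptions. For a characteristic a person has, the index is the readership share divided by the US population share. For a characteristic the person lacks, it is the share of readers without it divided by the share of the population without it. The score is the sum of the indexes. The function name, the dictionary keys, the summing rule and the assumption that Bob is not in a professional occupation are all illustrative choices, not confirmed by the lecture.

```python
# Shares taken from the slides: readership profile and US population figures.
READERSHIP = {"college": 0.59, "professional": 0.46, "income_75k": 0.21, "income_100k": 0.07}
US_POPULATION = {"college": 0.203, "professional": 0.192, "income_75k": 0.095, "income_100k": 0.024}


def index_score(person):
    """person maps each characteristic to True/False; returns the summed index."""
    score = 0.0
    for trait, reader_share in READERSHIP.items():
        population_share = US_POPULATION[trait]
        if person[trait]:
            # How over-represented the trait is among readers.
            score += reader_share / population_share
        else:
            # How over- or under-represented the absence of the trait is.
            score += (1 - reader_share) / (1 - population_share)
    return score


# Amy: professional, college educated, earns $80K (above $75K, below $100K).
amy = {"college": True, "professional": True, "income_75k": True, "income_100k": False}
# Bob: high school grad earning $50K; a non-professional occupation is assumed.
bob = {"college": False, "professional": False, "income_75k": False, "income_100k": False}

print(f"Amy: {index_score(amy):.2f}")
print(f"Bob: {index_score(bob):.2f}")
```

Under these assumptions Amy scores well above Bob, and a rare characteristic such as college education counts for more than it would if the raw readership percentages were used directly.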



    Data Mining & Direct Marketing

    Advertising: reaches prospects about whom nothing is known as individuals

    Direct Marketing: requires at a minimum a phone number or email ID

    Countries have restrictions on use of data

    Household-level data can be used directly for a rough-cut segmentation based on income, car ownership. Is this dataset the right size?



    Response Modeling

    Campaign response rates are in low single digits

    Models help improve response rates to direct solicitation

    Likelihood of response

    Ranking of prospects

    Data Mining techniques are extensively applied to response


    Direct Solicitation is an expensive process and must conform

    to resource constraints (budgets)



    Response Modeling



    Or Penetration

    Lift = concentration/penetration

    Benefit = concentration - penetration
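The two formulas can be evaluated on a scored prospect list. The sketch below assumes the usual reading of the terms: penetration is the share of the ranked list that is contacted, and concentration is the share of all responders found within that share. The function name, the bin count and the example data are hypothetical.

```python
def gains_table(scored, n_bins=5):
    """scored: list of (model score, responded) pairs.

    Returns rows of (penetration, concentration, lift, benefit), one per bin,
    after ranking prospects from highest to lowest score.
    Assumes at least n_bins prospects and at least one responder.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    total_responders = sum(1 for _, responded in ranked if responded)
    rows = []
    for i in range(1, n_bins + 1):
        cutoff = round(total * i / n_bins)
        captured = sum(1 for _, responded in ranked[:cutoff] if responded)
        penetration = cutoff / total
        concentration = captured / total_responders
        lift = concentration / penetration
        benefit = concentration - penetration
        rows.append((penetration, concentration, lift, benefit))
    return rows


# Hypothetical scored list: ten prospects, three of whom responded.
scored = [
    (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
    (0.40, False), (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]
rows = gains_table(scored)
for penetration, concentration, lift, benefit in rows:
    print(f"pen={penetration:.1f} conc={concentration:.2f} lift={lift:.2f} benefit={benefit:.2f}")
```

At full penetration the lift is always 1 and the benefit 0, since contacting everyone captures every responder regardless of the model.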



    Max Benefit

    Max Benefit = penetration where the vertical distance between the curves is max

    KS statistic is also max

    The split point results in a good list and a bad list of prospects

    Maximizes the un-weighted average of sensitivity and specificity

    Sensitivity: likelihood that an actual positive is diagnosed correctly (in the medical world) = true positives / (false negatives + true positives)

    Specificity: proportion of actual negatives who get a negative result = true negatives / (true negatives + false positives)

    Max Benefit point also minimizes the expected loss
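The link between benefit and the average of sensitivity and specificity can be checked numerically. If the good list is everyone above the cutoff, concentration equals sensitivity, and with overall response rate p the benefit works out to (1 - p) x (sensitivity + specificity - 1), so both are maximized at the same cutoff. The sketch below verifies this on a hypothetical ranked list; the function name and the data are illustrative only.

```python
def cutoff_metrics(responded_ranked, cutoff):
    """responded_ranked: list of booleans, best-scored prospect first.

    The first `cutoff` prospects form the good list, the rest the bad list.
    Returns (benefit, un-weighted average of sensitivity and specificity).
    """
    total = len(responded_ranked)
    positives = sum(responded_ranked)
    negatives = total - positives
    true_pos = sum(responded_ranked[:cutoff])
    false_pos = cutoff - true_pos
    true_neg = negatives - false_pos
    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    penetration = cutoff / total
    benefit = sensitivity - penetration  # concentration equals sensitivity here
    return benefit, (sensitivity + specificity) / 2


# Hypothetical outcomes for ten prospects, ranked by model score.
ranked = [True, True, False, True, False, False, True, False, False, False]

results = {k: cutoff_metrics(ranked, k) for k in range(1, len(ranked))}
best_by_benefit = max(results, key=lambda k: results[k][0])
best_by_average = max(results, key=lambda k: results[k][1])
print(best_by_benefit, best_by_average)
```

Both criteria pick the same split point on this list, which is what the identity predicts.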



    Confusion Matrix

                    Model Prediction
                    NO                YES

    Actual NO       True Negative     False Positive

    Actual YES      False Negative    True Positive

    Sensitivity = True Positive / (False Negative + True Positive)

    Specificity = True Negative / (True Negative + False Positive)
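As a small worked example, the confusion matrix cells and the two ratios can be computed from paired actual and predicted labels. This is a minimal sketch; the function names and the example labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion matrix cells from paired boolean labels."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for is_positive, predicted_positive in zip(actual, predicted):
        if is_positive and predicted_positive:
            counts["tp"] += 1
        elif is_positive and not predicted_positive:
            counts["fn"] += 1
        elif not is_positive and predicted_positive:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


def sensitivity(counts):
    """True positives / (false negatives + true positives)."""
    return counts["tp"] / (counts["fn"] + counts["tp"])


def specificity(counts):
    """True negatives / (true negatives + false positives)."""
    return counts["tn"] / (counts["tn"] + counts["fp"])


# Hypothetical labels: three actual responders, five actual non-responders.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

counts = confusion_counts(actual, predicted)
print(counts)
print(f"sensitivity={sensitivity(counts):.3f} specificity={specificity(counts):.3f}")
```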