DATA SCIENCE AND ANALYTICS DEMYSTIFIED

Kevin Gray, Cannon Gray LLC

http://www.cannongray.com/

Cannon Gray is a marketing science and analytics company based in Tokyo

Partners with clients, marketing research agencies, consultants and ad agencies located in many regions of the world

Founded in 2008

One-man consultancy

Work from home office

Company name honors both sides of the


Provide Advanced Analytics and Consultation for a broad range of quantitative marketing research, including:
- Consumer Segmentation
- Data Mining and Predictive Analytics
- Market Response Modeling
- Demand Forecasting
- Key Driver Analysis
- Pricing Research
- Coaching and Mentoring

Company Philosophy: Marketing science is social science, and technical proficiency by itself will not make one a competent marketing scientist

It requires diverse skills, such as being able to partner with marketers to help them see the big picture and anticipate key decisions they'll have to make

Company Philosophy: Marketing scientists must deliver knowledge and insights that facilitate cost-effective and profitable decisions - this is Cannon Gray's focus, not number crunching

Each project is tailored to address specific marketing issues and to the country or countries being researched

Once upon a time…

Began my marketing research (MR) career in the 1980s in Financial Services in Manhattan

In some ways, a different world:
- No World Wide Web
- No social media
- PCs had limited capacity and were mainly regarded as curiosities

Once upon a time…

But plenty for a researcher to do:
- Survey research by postal mail or telephone
- Some postal surveys were entirely DIY, from design through analysis and reporting
- Also spent many days behind one-way mirrors observing focus groups

Once upon a time…

Because research was harder and more expensive:
- Thought more about our objectives
- More careful in planning, execution and analysis

Today, IT makes it easy to fix mistakes...so we make more of them!

Once upon a time…

For legal, operational and marketing purposes my company maintained substantial amounts of customer data

Computers were mainframes networked together into "virtual machines" and processing was quite speedy

Some data were still stored on magnetic tape but much of it had been migrated to servers

Once upon a time…

Used SAS for building data files and for analysis

Did not have a “data warehouse”

With support of very capable MIS colleagues, created data files ad hoc for specific purposes

Bigger challenge was finding out which data were kept in which parts of the organization and what the assorted data codes meant

Once upon a time…

Able to integrate "hard" customer data with "soft" data from surveys or with exogenous data such as economic trends

Performed fairly advanced statistical analyses of merged data files:
- Segmentations
- Key Driver Analysis
- Time Series Analysis

Was I a “data scientist”?

By some definitions, YES:

“Data science is, in general terms, the extraction of knowledge from data.” - Wikipedia

“I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.” – Nate Silver

"Sometimes things don’t change as much as all the terminology changes!" - veteran US marketing research recruiter

“Data science” not just for marketing research

We should note that much “data science” has no connection with marketing

Used in many other fields, for example:
- Medical and pharmaceutical research
- Fraud detection
- Credit scoring
- Human Resource Management
- Oil and gas exploration
- Military and security
- Seismology

Much has changed

Computer hardware and software have greatly advanced

Big data as many would characterize it today was scarce and Hadoop didn't exist

Bayesian methods were very limited and techniques such as LARS, Stochastic Gradient Boosting and Random Forests had not yet been developed

But much hasn’t changed

I cannot accept claims that data science is entirely new

That it has caught many marketing researchers by surprise is disturbing

That many veteran researchers have become confused about the meaning of analytics is even more worrying

Analysis is a core component of marketing


But much hasn’t changed

For many years in many countries, there have been specialist agencies and consultancies working in this advanced analytics space

Also, many clients have been doing these sorts of things internally for a long time

But much hasn’t changed

Database marketing, data mining and predictive analytics describe much of what is now called analytics or data science

These terms have been in use since the 80's and 90's

Aren't "trad" marketing science business practice areas, such as driver analysis and segmentation, analytics and data science?

What is “analytics”?

There are many ways analytics can be defined

One way is as a research procedure for decision making

Another is as statistical procedures

These are just two ways!

Analytics as a research procedure

Analytics is the discovery and communication of meaningful patterns in data. It makes use of information technology, statistics and mathematical algorithms to develop knowledge, to quantify performance or to make predictions. It uses the insights gained from this process to recommend action or to guide decision making. Analytics is best thought of as a research procedure for decision making, not simply as isolated tools or steps in a process.

My own definition, heavily inspired by Wikipedia and other knowledgeable sources

Basic components of procedure

1. Defining Objectives
2. Data Collection
3. Data Preparation and Cleaning
4. Model Building
5. Model Evaluation
6. Interpretation
7. Scoring New Data or Simulations Using the Model
8. Communication of Results and Implications to Decision Makers
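As a rough illustration only, the procedure above might be sketched in a few lines of Python. Everything here is an assumption made for the example: the toy ad-spend and sales figures are invented, the helper names (`prepare`, `fit_line`, `r_squared`, `score`) are hypothetical, and a straight-line regression stands in for whatever model a real project would call for.

```python
from statistics import mean

# Steps 1-2: objective (relate sales to ad spend) and data collection.
# Hypothetical toy figures, invented purely for illustration.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [24.0, 46.0, 66.0, 84.0, 105.0]


def prepare(xs, ys):
    """Step 3: pair up the observations and drop any with missing values."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def fit_line(pairs):
    """Step 4: ordinary least squares for y = intercept + slope * x."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(pairs, intercept, slope):
    """Step 5: share of the variance in y explained by the fitted line."""
    y_bar = mean(y for _, y in pairs)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - y_bar) ** 2 for _, y in pairs)
    return 1.0 - ss_res / ss_tot


def score(x, intercept, slope):
    """Step 7: apply the fitted model to a new observation."""
    return intercept + slope * x


pairs = prepare(ad_spend, sales)
intercept, slope = fit_line(pairs)
fit = r_squared(pairs, intercept, slope)
forecast = score(60.0, intercept, slope)

# Steps 6 and 8: interpretation and communication remain human work;
# the code can only hand over the numbers.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={fit:.3f}")
print(f"forecast at spend 60: {forecast:.1f}")
```

Note that steps 1, 6 and 8 barely appear in the code at all, which is the point of treating analytics as a research procedure and not as isolated tools.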

Some basic kinds of statistics

Descriptive and Exploratory Analysis - frequencies, means, bar charts

Models that Predict - predicting consumption frequency of new customers

Models that Explain - identifying brand choice drivers

Analysis of Cross Sectional Data - data collected at one period in time

Analysis of Longitudinal or Time Series Data - data collected at several periods in time

Text Mining - analysis of social media conversations

Some basic kinds of statistics

Models with Quantitative Dependent Variables - monthly spend

Models with Categorical Dependent Variables - product user/non user

Time to Event Models - customer churn analysis

Methods that Group Variables - factor analysis of attribute ratings

Methods that Group Cases - cluster analysis of consumers

Simulations and Forecasts - sales forecasts under various marketing mix scenarios

Some textbooks on research methods and statistics

- Handbook of Statistical Distributions (Krishnamoorthy)
- Practical Tools for Designing and Weighting Survey Samples (Valliant et al.)
- Design and Analysis of Experiments (Montgomery)
- Experimental and Quasi-Experimental Designs (Shadish et al.)
- Propensity Score Analysis (Guo and Fraser)
- Methods of Meta-Analysis (Schmidt and Hunter)
- Applied Multivariate Statistical Analysis (Johnson and Wichern)
- The R Book (Crawley)
- Regression Modeling Strategies (Harrell)
- Categorical Data Analysis (Agresti)
- Multilevel and Longitudinal Modeling (Rabe-Hesketh and


Some textbooks on research methods and statistics

- Time Series Analysis (Wei)
- Multiple Time-Series Analysis (Lütkepohl)
- Bayesian Data Analysis (Gelman et al.)
- Applied Bayesian Hierarchical Methods (Congdon)
- Risk Assessment and Decision Analysis (Fenton and Neil)
- The Data Warehouse Toolkit (Kimball and Ross)
- Data Mining Techniques (Linoff and Berry)
- Data Mining (Witten et al.)
- Applied Predictive Modeling (Kuhn and Johnson)
- An Introduction to Statistical Learning (James et al.)
- Elements of Statistical Learning (Hastie et al.)

Wars of Words

"The exact meaning of [data science] is a matter of some debate; it seems like a hybrid of a computer scientist and a statistician." Statistics and Science: A Report of the

London Workshop on the Future of the Statistical Sciences

Product of a meeting in London in November, 2013 attended by more than 100 prominent statisticians from around the world

Wars of Words

Moreover, lack of agreement about the meaning of “big data”

For example, see: http://datascience.berkeley.edu/what-is-big-data/

43 experts asked…43 definitions offered!

Computer Science vs. Statistics

Many data scientists are computer scientists mainly concerned with IT matters
- Confuse data management with data analysis
- Confuse mathematics with statistics

Often are not well-versed in statistics and some are actually distrustful of statistical models

Conjoint, structural equation modeling, time series analysis and many other statistical tools are a foreign world for some

Computer Science vs. Statistics

Statisticians (and marketers) often criticize current data science practice as:
- Mechanical and algorithm driven
- Focusing too much on the What and not enough on the Why

Computer Science vs. Statistics

Marketing is also about changing behavior, not just predicting it!
- Two people can do the same things for the same reasons
- Two people can do the same things for different reasons
- Two people can do different things for the same reasons
- Two people can do different things for different reasons

Computer Science vs. Statistics

There is also the “multiple me”. On different occasions:
- I can do the same things for the same reasons
- I can do the same things for different reasons
- I can do different things for the same reasons
- I can do different things for different reasons

A Declaration of Peace

To be fair, many statisticians (like me) should learn more about computer science

Data science teams can include computer scientists, statisticians, economists, psychologists and specialists from many other backgrounds

No need for such teams to be comprised of only one type of data scientist!!

Hype vs. Reality

"Corporations are not as sophisticated or as successful as we might grasp from the sound bytes appearing in conferences, books, and journals. Instead opinion-based decision making, statistical malfeasance, and counterfeit analysis are pandemic. We are swimming in make-believe analytics.“

Randy Bartlett in A Practitioner's Guide to Business Analytics

The author is an analytics veteran of more than 20 years with degrees in both computer science and statistics

Hype vs. Reality

We humans do not appear to be hard-wired to use data to make decisions

For years, managers have complained of information overload!

Our schooling has not prepared us to fully exploit new data sources and advanced information technology

Hype vs. Reality

As long as there are human managers and human consumers, data and analytics will never entirely replace gut feel in decisions

Many important decisions cannot simply be calculated

Even thermostats are regularly overruled by


Hype vs. Reality

More data - particularly when the numbers aren't trending in the same direction - may be more fuel for organizational politics and make decision-making more unwieldy

Also, humans naturally resist change and many companies are very bureaucratic

Abrupt and radical transformation in the way we make decisions is unlikely

The future?

However, decision-making will gradually evolve and become more, if never wholly, evidence-based

This fits in neatly with the essential purpose of marketing research!

Over the next few years decreasing emphasis on data infrastructure and more emphasis on what data tell us and how they can be leveraged

With bigger and messier data, understanding people will become more critical, not less

Demand will rise for marketing scientists able to see beyond math and programming who truly understand marketing and consumers

The future?

More analytic options also mean more risk and more need for well-trained and experienced researchers

The resurgence of Bayesian statistics is further evidence that human judgment cannot be purged from analytics

"Science cannot be done by the numbers." - Noel Cressie and Chris Wikle in Statistics for Spatio-Temporal Data

Some things Marketing Research can do

A downside of rapid technological change is increasing specialization and even more silos

In marketing research, the well-rounded generalist is becoming hard to find and over-specialization is increasing

MR educational and training programs will need to provide more cross-training to counteract this flip side of progress

“Be a jack of all trades and a master of at least two” – David McCallum (former Global MD Nielsen Customized)

Some things Marketing Research can do

Embrace new technologies and methodologies but don’t neglect less exotic activities:
- Educating clients about how to use marketing research to make better decisions
- Changing habits of thinking and improving our own decision-making skills

Some things Marketing Research can do

We need to be better at marketing marketing research

We must compellingly respond to contentions that data science has made marketing research irrelevant

We need to show that "data science" has actually been part of marketing research for a long time!

In summary

Data science is not entirely new and not entirely old

It can do amazing things but cannot work miracles

Despite the hype and hogwash, I see it much more as friend than foe

In summary

Change is a threat to those who stick too closely to the tried and true but an opportunity for those able to blend new skills with knowledge that has stood the test of time

Some more history…

The foregoing shows you what presentations looked like in the 1990’s

http://www.cannongray.com/