Ready, Set, Go! - Technology eSampler

click to find out more about this book on wiley.com


Contents

Introduction

1 Implementation Best Practices
Planning Your Implementation
1. Define Business Goals
2. Build Key Performance Indicators
3. Collect Data
4. Analyze Data
5. Test Alternatives
6. Implement Insights
Implementing and Customizing Your Code
Cross Domain Tracking
Enhanced Ecommerce
Custom Dimensions
Download Clicks
Advanced Content Tracking
Troubleshooting Code Implementation
Setting Up the Google Analytics Interface
Setting Up Goals
Focusing on Potential Customers
Removing Parameters That Do Not Point to Unique Content
Eliminating Duplicate Pages
Setting Up Site Search
Enabling Display Advertising and Demographics Reports
Excluding Referrals
Tagging Your Inbound Traffic
Tagging Custom Campaigns
Tagging FeedBurner Traffic


Managing Your Implementations Effectively
Creating Raw Data and Staging Views
Creating an Analytics Staging Property
Keeping Track of Implementation and Configuration Changes
Keeping Track of External and Overall Changes with Annotations
Summary

I Official Integrations

2 AdWords Integration
Integrating AdWords and Google Analytics
Linking AdWords and Analytics
Deleting and Editing the Google Analytics and AdWords Link
Top 10 Causes of Google Analytics and AdWords Data Discrepancies
Integration Data, Structure, and Standard Reports
AdWords Dimensions and Metrics in Google Analytics
AdWords Account Structure Overview
AdWords Standard Reports Overview
Optimizing AdWords Performance Using Google Analytics
Identifying Winners and Losers: The ABC Framework
Finding Negative Keywords with Custom Reports
Creating Remarketing Lists Using Google Analytics Data
Optimizing Shopping Campaigns
Summary

3 AdSense Integration
Integrating AdSense and Google Analytics
Linking Analytics to AdSense
Linking Multiple AdSense Accounts and/or Google Analytics Properties
Unlinking and Managing Access to Data
Data Discrepancies Between Google Analytics and AdSense
Analyzing AdSense Effectiveness Using Google Analytics
AdSense Overview
AdSense Pages
AdSense Referrers


Google Analytics Dashboard to Monitor AdSense Performance
Summary

4 Mobile Apps Integrations
Viewing Google Play and iTunes Data on Google Analytics
Android SDK v4: Setting Up Install Tracking and Campaign Measurement
iOS SDK v3: Setting Up Install Tracking and Campaign Measurement
Analyzing Mobile Apps: The Full Customer Journey
Sources Reports
Google Play Referral Flow Report
Summary

5 Webmaster Tools Integration
Linking Webmaster Tools to Google Analytics
Analyzing Webmaster Tools Data on Google Analytics
Queries Report
Landing Pages
Geographical Summary
Summary

6 YouTube Integration
Integrating YouTube Into Google Analytics
YouTube Video Tracking in Google Analytics Using Google Tag Manager
Implementation Details
Custom Report to Monitor Video Performance
Summary

II Custom Integrations

7 Custom Data Integration
Methods to Import Data into Google Analytics
The Measurement Protocol
Data Import
Real-World Examples
Importing Content Data
Importing Product Profit Margin Data


Importing Refund Data
Limitations and Best Practices
Summary

8 User Data Integration
The Siloed Dataset
The User ID
Creating a User ID View
Setting the User ID
Storing the User ID
Importing Additional Data
Summary

9 Marketing Campaign Data Integration
Google Analytics Acquisition Channels
Tagging Custom Marketing Campaigns
Measuring Online Campaigns
Measuring Offline Campaigns
Cost Data Import
The Cost Data Import Process
Analyzing Marketing Campaigns
Summary

10 A/B Testing Data Integration
Integrating Optimizely Data into Google Analytics
Sending Test Variations as Events for Advanced Segmentation
Analyzing Test Results
Ending Your Tests
Dealing with “No Significant Difference”
Summary

11 Email Data Integration
Tracking Email Opens
Step 1: Create a Custom Metric


Step 2: Create an Email Campaign
Step 3: Add the Google Analytics Code to Your Email
Step 4: Send Your Email and Analyze the Results
Tracking User Behavior Across Devices
Step 1: Set Up a User ID View in Google Analytics
Step 2: Add the MailChimp ID to the Links in Your Emails
Step 3: Send the User ID to Google Analytics
Bonus Step: Add a Custom Dimension with a User ID Value
Summary

12 Offline Data Integration
The Full Customer Journey
Implementation Details and Script
Step 1: Define Your Data Collection Needs
Step 2: Create the Google Form
Step 3: Add and Edit the Script to Match Your Needs
Step 4: Add a Trigger
Step 5: Make Sure the Form Is Being Filled
And Finally...The Script!
Summary

Index


1 Implementation Best Practices

On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

—Charles Babbage, Passages from the Life of a Philosopher

Charles Babbage’s quote is a succinct explanation of the term GIGO (garbage in, garbage out), which, in decision sciences, is commonly used to describe situations where inaccurate data is fed into a model, resulting in the production of equally inaccurate results. The same is true in this book’s context: You must make sure you are collecting accurate data before you start using it.

In order to use Google Analytics as a decision-making tool, companies cannot afford to rely on partial, inaccurate, or otherwise misaligned data. Google Analytics must be set up properly to meet the measurement needs and business objectives of companies.

In this chapter you will learn some of the most important steps toward clean, organized, and accurate data. The chapter is divided into five sections, each representing a step in successfully implementing Google Analytics on a website or app:

1. Understanding the web analytics process: Before you implement Google Analytics, it is important to understand how the data will be used and how the collection and analysis of data relate to other business areas. This will help you decide on the data needs of your company and which metrics will be used to measure success.

2. Implementing and customizing codes: Once your data needs and success metrics are defined, you should start looking for the necessary Google Analytics customizations to implement on your website or app.

3. Setting up the Google Analytics interface: Following the code implementation, you will need to set up the Google Analytics interface to make sure it processes your data in the way you want.


4. Tagging inbound traffic: In order to accurately measure all your website or app traffic, especially marketing campaigns, you will need to tag inbound links with custom URL parameters called UTMs.

5. Managing the implementation: To ensure that your implementation is always tidy, you should always keep track of changes on your Google Analytics account.
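To make step 4 more concrete, here is a minimal sketch of how an inbound link could be tagged with UTM parameters. The parameter names `utm_source`, `utm_medium`, and `utm_campaign` are the standard campaign parameters; the helper `tag_url`, the example URL, and the campaign values are hypothetical and only serve as an illustration.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl


def tag_url(url, source, medium, campaign):
    """Return url with utm_source, utm_medium, and utm_campaign appended."""
    parts = urlparse(url)
    # Keep any query parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical newsletter link pointing to a landing page.
tagged = tag_url(
    "https://www.example.com/landing?ref=1",
    source="newsletter",
    medium="email",
    campaign="spring_sale",
)
print(tagged)
```

Building the link through a small helper, rather than by hand, is one way to keep parameter names and spelling consistent across campaigns.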

Please note that this chapter does not intend to provide a comprehensive description of Google Analytics implementation methods and capabilities; rather, it focuses on the most important aspects required to build an accurate and organized data collection.

Planning Your Implementation

The objective of web analytics is to improve the experience of online customers while helping a company to achieve its results; it is not a technology to produce reports and spill data. Web analytics is a virtuous cycle that should never start with data collection; collecting data is a means to an end.

The diagram in Figure 1-1 shows a process you can use to implement web analytics in your company. It is not the process; it is a process. Each company should find the process that works best for it, but this is a simple process that might work for you.

1. Start with a clear definition of business goals.

2. Build a set of key performance indicators (KPIs) to track goal achievement.

3. Collect accurate and complete data.

4. Analyze data to extract insights.

5. Test alternatives based on assumptions learned from data analysis.

6. Implement insights based on either data analysis or website testing.

[Diagram of the web analytics process: 1. Define Goals, 2. Build KPIs, 3. Collect Data, 4. Analyze Data, 5. Test Alternatives, 6. Implement Insights]

Figure 1-1: The web analytics process


This book focuses on steps three and four of the process in Figure 1-1: collecting and analyzing data. However, it is important to take a step back, before we dive into the bits and bytes of data, to remember that data should not live in a silo; it should be strongly linked to business and customer needs. Below you will learn a little about each of the steps shown in Figure 1-1. Following this section you will dive deeper into the technical aspects of Google Analytics implementation best practices.

1. Define Business Goals

This is the first step when it comes to understanding and optimizing a website or app: You must understand your business goals in order to improve your website or app. The answer to the following question is critical in defining your goals: Why does your website or app exist?

Each website or app will have its own unique objectives. For some, the objective will be to increase pages viewed in order to sell more advertising (increase engagement); for others, the objective will be to decrease pages viewed because they want their visitors to find answers (increase satisfaction). For some, the objective will be to increase ecommerce transactions (increase revenue), and for others the objective will be to sell only if the product fits the needs of the customer (decrease product returns).

As you can see in the web analytics process proposed in Figure 1-1, the objectives are absolutely necessary in order to start the process. Only after they are defined can you proceed to build the KPIs. It is also very important to constantly revisit the goals in the light of website analyses and optimization to fine-tune them.

2. Build Key Performance Indicators

In order to measure goal achievement, you will need to create KPIs to understand whether the website results are going up or down. A KPI must be like a good work of art: It wakes you up. Sometimes it makes you happy and sometimes it makes you sad, but it should never leave you untouched, because if that is the case, you are not using the right KPIs.

And good works of art are rare. You have just a few truly touching works of art per museum, and not every work of art touches the same people. The same applies to KPIs. There are just a few truly good KPIs per company, and each person (or hierarchy level) will be interested in different KPIs: the ones that relate to their day-to-day activities. Upper management will be touched by the overall achievement of the website’s goals; middle management will be touched by campaign and site optimization results; and analysts will be touched by every single metric in the world!

Good KPIs should contain three attributes:

■ Simple: People in several departments with different backgrounds make decisions in companies. If KPIs are complex and hard to understand, it is unlikely that decision makers across the company will use them.

■ Relevant: Each company has its unique objectives; therefore, it should also have its own set of KPIs to measure improvement.

■ Timely: Even excellent KPIs are useless if it takes a month to get information when your industry changes every week.


By defining the business objectives and the metrics that will be used to measure them, you will be in a much better position to collect the data that will be needed.
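As a small illustration of a KPI that is simple and shows whether results are going up or down, the sketch below computes a conversion rate per week and its week-over-week change. The weekly totals and the function name `conversion_rate` are hypothetical; the right KPI for your company depends on the goals you defined in step 1.

```python
def conversion_rate(conversions, sessions):
    """Conversions per session, as a percentage; 0.0 when there are no sessions."""
    if sessions == 0:
        return 0.0
    return 100.0 * conversions / sessions


# Hypothetical weekly totals: (sessions, conversions).
weeks = [(12000, 240), (12500, 275), (11800, 295)]

rates = [conversion_rate(conversions, sessions) for sessions, conversions in weeks]
# Week-over-week change, in percentage points.
changes = [round(later - earlier, 2) for earlier, later in zip(rates, rates[1:])]

for week, rate in enumerate(rates, start=1):
    print(f"Week {week}: {rate:.2f}%")
print("Change (percentage points):", changes)
```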

3. Collect Data

When any company starts to collect website or app data, two questions should be asked:

■ Is my data accurate? If your data is not accurate, it is like building an empire in the sand; your foundations can be shaken too easily.

■ Am I collecting all the data that I need? If data is not collected, you will not be able to understand customer behavior properly.

You will learn more about Google Analytics data collection techniques in the following sections, so I will keep this step succinct.

4. Analyze Data

Data analysis is a rich field, which goes from simple filtering, sorting, and grouping to advanced statistical analysis. In this book you will learn about ways to analyze data using several Google Analytics reports and features, but the following are some general ideas that can help you go from data to insights:

■ Segment or die: Segmentation is an essential technique when it comes to analyzing customer behavior. By segmenting your customers into meaningful segments, you will be able to optimize their experiences more easily and effectively.

■ Look at trends, not data points: It is critical to look at your metrics over time to understand if the website results are improving or not.

■ Explore your data with visualization techniques: You can choose from an endless pool of graphs and tools to visualize numbers. Exploring data with charts will uncover patterns and trends that are hard to find by crunching numbers.
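The segmentation idea above can be sketched with a few lines of code. The example below is only an illustration: the session records, the segment names, and the helper `rate_by_segment` are hypothetical, and in practice the data would come from your Google Analytics reports. It shows how an overall conversion rate can hide very different behavior per segment.

```python
from collections import defaultdict

# Hypothetical session records: (segment, converted).
sessions = [
    ("desktop", True), ("desktop", False), ("desktop", False), ("desktop", True),
    ("mobile", False), ("mobile", False), ("mobile", True), ("mobile", False),
    ("tablet", False), ("tablet", False),
]


def rate_by_segment(records):
    """Return {segment: conversion rate} computed separately for each segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [sessions, conversions]
    for segment, converted in records:
        totals[segment][0] += 1
        totals[segment][1] += int(converted)
    return {segment: conv / count for segment, (count, conv) in totals.items()}


overall = sum(converted for _, converted in sessions) / len(sessions)
by_segment = rate_by_segment(sessions)

print(f"Overall: {overall:.0%}")
for segment, rate in by_segment.items():
    print(f"{segment}: {rate:.0%}")
```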

It’s important to note that data analysis can lead to three different outcomes (as shown in Figure 1-1):

■ To discover an insight for implementation, such as a bug or a page that does not convert for an obvious reason.

■ To develop a hypothesis regarding a low converting customer touch point that will lead to a split test.

■ To come to an understanding of a data collection failure: Important data can be either missing or inaccurate.


5. Test Alternatives

There is an African proverb that says, “No one tests the depth of a river with both feet.” In the same spirit, it is very unwise to change your website without first trying with the tip of your toes. When you test, you lower the risk of a loss in revenue due to a poor new design, and you bring science to the decision-making process in the organization.

But the most interesting outcome of experimenting is not the final result; it is the learning experience about your customers, a chance to understand what they like and dislike, which ultimately will lead to more or fewer conversions.

The web analyst must try endlessly and learn to be wrong quickly, learn to test everything and understand that the customer should choose, not the designer or the website manager. Experimenting and testing empowers an idea democracy, meaning that ideas can be created by anyone in the organization, and the customers (the market) will choose the best one; the winner is scientifically clear.

Following are a few tips when it comes to website testing:

■ Testing is not limited to landing pages: It should be implemented across the website, wherever visitors are abandoning it and wherever the website is leaving money on the table.

■ Try your tools (and your skills) with a small experiment: Sometimes it is wise to start small and then grow. Once you are familiar with your tools, try a test on an important page but for a small (or less profitable) segment. Then head for the jackpot!

■ Measure multiple goals: While you improve macro conversions, you might be decreasing registrations or newsletter signups, which might have a negative impact in the long run.

■ Test for different segments: Segments such as country and operating systems can have completely different behaviors, so the tests should also be segmented in order to understand those differences.

Google Analytics offers an A/B testing feature called Content Experiments; learn more about it at http://goo.gl/HTGX2d.
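To give a feeling for what deciding a winner can involve, here is a minimal sketch of a two-proportion z-test comparing the conversion rates of an original page (A) and a variation (B). This is a generic textbook test and an assumption on my part, not a description of how Content Experiments computes its results; the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 10,000 visitors per variation.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10000, conv_b=260, n_b=10000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; as the tips above note, it is still worth checking the result for multiple goals and for different segments.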

6. Implement Insights

Failing to implement insights is the same as doing no web analytics at all. If you go through all the preceding steps but cannot actually implement the results on your website, it is as if you did nothing. Following are some tips that can help you overcome implementation bottlenecks:

■ Get C-level support: This will be essential if you come to a point where organizational priorities must be set and resources allocated.


■ Start small: As mentioned previously, starting small helps to set expectations; people understand the tools and what is required from them.

■ Be friendly: Being a nice person is always helpful; that’s the way human nature works.

Implementing and Customizing Your Code

If you are implementing Google Analytics for the first time, you will see a wizard that will guide you to retrieving the appropriate tracking code to use, right after creating an account. The first choice: what would you like to track, a website or a mobile app? If you choose a website, you will get a JavaScript code to implement on it; if you choose an app, you will get links to download either the Android or iOS SDKs.

If you miss the previous step or would like to find your tracking info at a later stage, you can find this page by logging into Google Analytics and clicking on Admin at the top of any page. This will lead you to the Administration panel where you can find an item named Tracking Info.

While implementing the default code on your website or app will provide you with important information about customer behavior, other code customizations might be required to accommodate your business needs. In the next section, I describe the customizations that I believe to be the most important; for a comprehensive and detailed description of all customizations available, visit http://goo.gl/t1Td5T.

IMPLEMENTING GOOGLE ANALYTICS THROUGH GOOGLE TAG MANAGER

If you are an experienced analyst/developer/marketer, you are probably asking yourself, “When is he going to start talking about Google Tag Manager?” A great question! In this chapter I focus my attention on the Google Analytics methods that should be used when enhancing your implementation, regardless of how you choose to actually implement them.

As you might already be aware, Google Tag Manager is a powerful and scalable way to organizeyour Google Analytics (and other tools) implementations. It will make updates easier and cleaner,and it will transform you into a hero. Here are a few resources you should consider when imple-menting Google Analytics through Google Tag Manager:

■ The offi cial Google Tag Manager Help Center: http://goo.gl/1uXK90■ The offi cial Google Tag Manager Developer documents: http://goo.gl/CPTYH6■ Google Tag Manager Step-By-Step Guide (Web): http://goo.gl/lBiX6t■ Guide to Google Tag Manager for Mobile Apps: http://goo.gl/ib3LL7


Cross Domain Tracking

If you would like to measure multiple websites that are linked together within a single Google Analytics property, it is important to adjust the code with Cross Domain Tracking (tracking behavior across subdomains does not require additional configuration). Failing to take into account multiple domains when implementing Google Analytics can significantly decrease data accuracy. Common cases are ecommerce carts, which are sometimes hosted on different domains; if the tracking code is not set up correctly in such instances, you might see a large number of direct or self-referral sessions ending on a transaction.

In order to understand Cross Domain Tracking thoroughly and grab the necessary codes for implementation, I recommend reading through both the Developer documentation at http://goo.gl/5JvxJ1 and the Help Center at http://goo.gl/TJ0Wfp.

Enhanced Ecommerce

If your website or app offers merchandise or another type of ecommerce transaction, it is critical to implement the Google Analytics Enhanced Ecommerce functionality so that you can understand your customer journey better. This feature will enable you to have a deeper understanding of shopping behavior, campaign ROI, customer lifetime value, and other important information.

For a business and technical overview of the Enhanced Ecommerce feature, read http://goo.gl/th9Roy.

Custom Dimensions

Creating audience segments is one of the most important techniques when trying to understand and optimize customer behavior; it allows you to make your website or app more relevant to different groups of users. Google Analytics provides a powerful segmentation capability by default, using a multitude of metric and dimension combinations.

In addition to the default segments, Custom Dimensions allow you to add attributes of a user, session, or action when collecting data. For example, a business that sells different types of memberships should be able to understand how each member type behaves; a large publisher should be able to understand how each of their authors is performing; and a travel website should be able to know which kind of hotel their returning customers like the most.

You will learn more about Google Analytics Segments and Custom Reports throughout the book. However, the subject is especially important when it comes to Custom Dimensions, as those dimensions do not appear in any of the standard reports. Therefore, the best ways to analyze behavior based on Custom Dimensions are as follows:

1. Create a segment: The Segment builder enables you to create a segment that includes or excludes the behavior of specific users. For example, you might want to exclude from your reports all your existing clients (defined through a Custom Dimension) using a segment. This would be wise when analyzing customer acquisition efforts. You might also want to include in your reports only users who are part of your loyalty program (defined through a Custom Dimension) to analyze what type of content they are most interested in. Those are only two examples; to learn more about creating segments, visit http://goo.gl/6gbC2k.

2. Build a Custom Report: Google Analytics allows its users to create Custom Reports using the metrics and dimensions available in the tool to tailor their reports to their business needs. This functionality can be used to build reports including Custom Dimensions and acquisition behavior, or conversion metrics that can help you understand your users’ behavior. To learn more about Custom Reports, visit http://goo.gl/e0ADkr.

For a detailed explanation on why and how to use Custom Dimensions, read http://goo.gl/fvhL8L.

Download Clicks

Different websites have different goals. You learned previously about a way to measure ecommerce transactions, and you will learn shortly about a way to measure advertising revenue through the AdSense integration, but some websites will have downloads as their main goal. Google Analytics will not measure clicks on download links by default, so it is critical to add a customized code to your website if you are encouraging people to download any type of file. Here is a guide explaining how to do it: http://goo.gl/uUm4rq.

Advanced Content Tracking

Every website owner should be able to understand how its users consume content. However, sometimes users behave in ways that cannot be measured by a default implementation. For example, when someone lands on a long article, reads through the whole piece, and then leaves the website, from a Google Analytics perspective, this person viewed just one page and didn’t interact with the content. This is a problem when it comes to content publishers.

With that in mind, Justin Cutroni, Analytics Evangelist at Google, developed a script that sends events to Google Analytics whenever a user scrolls down a page. In addition, the script uses Custom Dimensions to categorize users into “scanners,” users who scroll to the bottom of the content in less than 60 seconds, and “readers,” users who take more than 60 seconds to reach the bottom of the content. This solution is excellent for measuring users’ content consumption patterns. Read more at http://goo.gl/21eIiO.
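The scanner/reader split described above comes down to one threshold comparison. The following is a minimal Python sketch of that classification only; it is not Justin Cutroni's script, which runs in the browser. The function name `classify_reader` and the constant are hypothetical, and treating exactly 60 seconds as a "reader" is an assumption, since the text does not say which side the boundary falls on.

```python
# Threshold taken from the text: reaching the bottom in under 60 seconds
# marks a user as a "scanner".
READER_THRESHOLD_SECONDS = 60.0


def classify_reader(seconds_to_bottom: float) -> str:
    """Label a user by the time taken to reach the bottom of the content.

    Hypothetical helper for illustration. Exactly 60 seconds is treated as
    "reader" here, which is an assumption.
    """
    if seconds_to_bottom < 0:
        raise ValueError("seconds_to_bottom must be non-negative")
    if seconds_to_bottom < READER_THRESHOLD_SECONDS:
        return "scanner"
    return "reader"
```

The resulting label is the kind of value that could be stored in a Custom Dimension and later used in a segment.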


Troubleshooting Code Implementation

If you manage a website, it is critical to keep an eye open at all times to make sure your implementation is okay, especially when you update the website code. The following list of tools created by the Google Analytics team will help you with this task:

■ Diagnostics (in-product feature): When you log in to your Google Analytics account and select a view, you will notice a bell icon in the upper-right corner of your page. You will also notice that sometimes there will be a notification number there. If you click on the bell, you will find a list of customized notifications for your code implementation and setup. Make sure you read through them and fix the issues. Learn more at http://goo.gl/8NC2Y4.

■ Real Time (in-product feature): Google Analytics provides Real Time data for website behavior, where you can see what is happening right now on your website or app. This is very useful for website debugging, since you can make changes in the code and find out how they are affecting the data in real time.

■ Tag Assistant (Chrome extension): This extension allows you to check your Google Analytics tag (and other Google tags) while browsing the website. It is a handy tool to check and troubleshoot implementations quickly. Download it from the Chrome Store at http://goo.gl/P1LstJ.

■ Google Analytics Debugger (Chrome extension): This extension provides more detailed and technical data (as compared to the Tag Assistant extension) about what is being sent from a page to Google Analytics. Download it from the Chrome Store at http://goo.gl/yn9dHj.

Setting Up the Google Analytics Interface

In this section you will learn some of the most important settings to help you create a clean Google Analytics account with a good level of data accuracy. For a comprehensive and detailed explanation of all possible tool settings, visit http://goo.gl/2aWv9b.

Setting Up Goals

Goals are the soul of a Google Analytics account; no analysis will provide valuable insights if you do not measure your goals. Goals can be measured in multiple ways: an ecommerce transaction (see previous section), a thank-you page for a newsletter subscription, a session that lasted a certain time, a visit with a certain number of pages viewed, and others. In order to help website owners set up goals, Google Analytics provides a series of templates, as shown in Figure 1-2.

Figure 1-2: Google Analytics goals templates

However, if you decide to create a custom goal based on your own needs, you can choose among four goal types:

■ Destination: Triggered when a web page or app screen loads (e.g., reaching a “thank you” page).
■ Duration: Triggered when a user stays on a website or app longer than a pre-defined amount of time in a single session.
■ Pages/Screens per session: Triggered when a user views more than a pre-defined amount of pages or screens in a single session.
■ Event: Triggered when an event is triggered by the user (e.g., clicking on a button or playing a video).
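The four goal types listed above can be read as four simple predicates over a session. The sketch below illustrates that reading in Python; the `Session` class, its field names, and the function names are all hypothetical, and Google Analytics evaluates goals on its own servers rather than through code like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical summary of one session; field names are illustrative."""
    pages_viewed: List[str] = field(default_factory=list)
    duration_seconds: float = 0.0
    events: List[str] = field(default_factory=list)


def destination_goal(session: Session, target_page: str) -> bool:
    # Destination: the target page or screen was loaded during the session.
    return target_page in session.pages_viewed


def duration_goal(session: Session, min_seconds: float) -> bool:
    # Duration: the session lasted longer than the pre-defined time.
    return session.duration_seconds > min_seconds


def pages_per_session_goal(session: Session, min_pages: int) -> bool:
    # Pages/Screens per session: more pages viewed than the threshold.
    return len(session.pages_viewed) > min_pages


def event_goal(session: Session, event_name: str) -> bool:
    # Event: the named event was triggered at least once.
    return event_name in session.events
```

For example, a session that viewed three pages including "/thank-you" would satisfy a Destination goal for that page but not a Pages/Screens goal set to "more than 3".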

Use the following guide to learn more about why and how to set up goals: http://goo.gl/YbDVqi.

Focusing on Potential Customers

Wide ranges of people may visit your website; unfortunately, that number includes employees of your own organization and service providers, neither of whom are the visitors you want to understand and optimize for. Therefore, it is important to create filters that exclude the IP range used by your organization and its service providers, such as web development and marketing agencies.


Google Analytics offers a series of predefined filters, where you will find an option to “exclude traffic from the IP addresses” (see Figure 1-3). This option is perfect if you want to exclude a simple range of addresses by using the “that begin with” or “that end with” options. If you want to filter a more complex range of IP addresses, check out http://goo.gl/PSaL15.

Figure 1-3: Predefined filter to exclude IP addresses

In addition, Google Analytics also offers the option to filter bot traffic. This filter will exclude all hits coming from the IAB known bots and spiders, allowing you to identify the real number of users who are coming to your site. To include the filter, visit your Administration panel and select a checkbox option available in the View Settings menu in the view you would like to filter; the option is labeled “Exclude all hits from known bots and spiders.”
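To make the difference between a simple "that begin with" rule and a more complex IP range concrete, here is a small Python sketch of the two kinds of matching. It only illustrates the logic; the actual filters are configured in the Google Analytics interface. The function names are hypothetical, the CIDR check is a stand-in for whatever range expression you end up using, and the addresses are reserved documentation addresses.

```python
import ipaddress


def excluded_by_prefix(ip: str, prefix: str) -> bool:
    """Mimic a "that begin with" rule using a plain string prefix.

    Include the trailing dot in the prefix: "203.0.113" would also match
    "203.0.1130..."-style strings and, more realistically, a prefix such as
    "192.168.1" would match "192.168.10.5".
    """
    return ip.startswith(prefix)


def excluded_by_network(ip: str, network: str) -> bool:
    """Check membership in a CIDR block, for ranges a prefix cannot express."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)
```

A prefix rule works when the organization owns a whole block ending on a dot boundary; a range such as addresses .64 through .95 needs the second kind of check.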

Removing Parameters That Do Not Point to Unique Content

One of the interesting insights we can learn from Google Analytics is the navigation patterns between website pages; you can find this information in the Behavior section of Google Analytics standard reports. However, websites can use multiple URL parameters to refer to the same page and, by default, Google Analytics considers one page with multiple parameter values as multiple distinct pages. Therefore, if your content is not unique for these parameters, you should remove the duplicate pages from your reports.

Google Analytics provides a simple interface to exclude URL parameters from reports; under View Settings in the Administration Panel you will find a field called “Exclude URL Query Parameters.” When you add a parameter to this field, GA will ignore the parameter, joining pages that might be considered separate.
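The effect of excluding a query parameter can be sketched in a few lines of Python. This is an illustration of the idea, assuming exact, case-sensitive parameter names; the function name `strip_query_parameters` and the `sessionid` parameter are hypothetical, and the real setting is applied by Google Analytics when it processes your data.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url: str, excluded: set) -> str:
    """Return the URL with the excluded query parameters removed."""
    parts = urlsplit(url)
    # Keep every parameter whose name is not in the exclusion set.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With `sessionid` excluded, `/product?id=42&sessionid=abc123` and `/product?id=42&sessionid=xyz789` both collapse to `/product?id=42`, so they are reported as one page.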

Eliminating Duplicate Pages

Google Analytics is case sensitive. This means that example.com/HELLO and example.com/hello would be recognized as two different pages, generating duplicate entries in your content reports. However, from a customer’s perspective, those pages are usually the same. (Check if this is the case with your website before you create the following filter.) Therefore, it is important to lowercase all URLs. Figure 1-4 shows an example of what this filter would look like. You can learn more about creating view filters at http://goo.gl/VzefpJ.

Figure 1-4: Filter to lowercase URLs

Because the same issue can affect other fields, especially campaign data, I also recommend creating lowercase filters for the following fields:

■ Campaign name
■ Campaign term
■ Campaign medium
■ Campaign source
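To see why lowercasing removes duplicate rows, consider this short Python sketch. It merges pageview counts whose keys differ only by case; the function name and the sample numbers are hypothetical, and in Google Analytics the same effect comes from a lowercase view filter rather than from code.

```python
from collections import Counter


def merge_case_variants(pageviews: dict) -> dict:
    """Sum counts for keys that are identical once lowercased."""
    merged = Counter()
    for path, count in pageviews.items():
        merged[path.lower()] += count
    return dict(merged)
```

Given `{"/HELLO": 3, "/hello": 5}`, the two rows become a single `/hello` row with 8 pageviews. The same reasoning applies to campaign fields, where "Email" and "email" would otherwise be reported as separate mediums.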

Setting Up Site Search

An excellent way to understand visitor intent is to study search terms used on the internal site search (search boxes located on the website that allow visitors to search the website content); they show what your visitors are looking for on the website.

A proper setup of the Google Analytics Site Search feature will help website owners understand which content is being searched for, which searches are yielding irrelevant results, and which ones are driving sales (or another goal) on the website. As shown in Figure 1-5, you will have the option to add up to five parameters to define a search and up to five parameters as a category. You will also be able to strip the parameters from this view (check the box below the text field), which works like removing the parameter, as explained above. Here is a guide on how to do it: http://goo.gl/jvm8wu.


Figure 1-5: Setting up site search
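The site search setting does two things: it reads the search term from a query parameter and, optionally, strips that parameter from the page URL. The Python sketch below illustrates both steps under the assumption that the search term lives in a query parameter such as `q`; the function name and parameter names are hypothetical, and Google Analytics performs this processing itself once the feature is configured.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def extract_site_search(url: str, search_params: set,
                        strip: bool = True) -> Tuple[Optional[str], str]:
    """Return (search term, page URL) for a search results URL.

    The term is taken from the first matching query parameter; when strip is
    True the search parameters are removed from the returned URL.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    term = next((value for name, value in pairs if name in search_params), None)
    if strip:
        pairs = [(name, value) for name, value in pairs
                 if name not in search_params]
    return term, urlunsplit(parts._replace(query=urlencode(pairs)))
```

Stripping the parameter keeps all searches on one results page in the content reports, while the terms themselves remain available for analysis.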

Enabling Display Advertising and Demographics Reports

Enabling both Display Advertising and Demographics and Interest Reports will bring a vast amount of insightful and actionable data into your reports. Once you enable them you will see behavior information relating to user age, gender, and interests. But even more importantly, this data can also be used to segment standard reports and create remarketing lists. (See Chapter 2, “AdWords Integration,” for more on remarketing.)

The first step to enabling these reports is to update Google Analytics to support Display Advertising, which enhances data with the DoubleClick cookie information whenever it is present (for websites), or with the Advertiser ID when it is collected (for apps). To enable this setting, log in to Google Analytics and click on Admin at the top of your screen, choose the property you would like to enable, and click on “Property Settings.” You will find an item named “Enable Advertiser Features.” Please note that once you enable the advertiser features, you might be required to update your privacy policy. You can read more about this setting and its requirements at http://goo.gl/ycVvpM.

The second step, which can also be performed in the Property Settings, is to enable the Demographics and Interest Reports. Read more about why and how to enable this set of reports at http://goo.gl/OwZpr4.

Excluding Referrals

This setting allows you to add domains to be ignored by Google Analytics as referrals. This means that a user who lands on your website from an excluded domain will be handled similarly to Direct traffic. If the user has previously visited the website through an Organic Search, he or she will still be attributed to Organic.
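The behavior described above can be summarized as a small decision rule. The Python sketch below is a deliberately simplified model of it, not Google Analytics' actual attribution logic: the function name is hypothetical, domains are compared by exact match (an assumption), and details such as campaign expiry are ignored.

```python
from typing import Optional, Set


def attribute_session(referrer_domain: Optional[str],
                      excluded_domains: Set[str],
                      previous_channel: Optional[str] = None) -> str:
    """Return the channel credited for a session in this simplified model."""
    # A session with no referrer, or with an excluded referrer, is handled
    # like Direct traffic.
    direct_like = referrer_domain is None or referrer_domain in excluded_domains
    if not direct_like:
        return "Referral"
    # Direct-like sessions keep the earlier channel when one is known,
    # for example a previous Organic Search visit.
    return previous_channel or "Direct"
```

In this model a user returning from an excluded third-party cart is credited to Organic if that was the earlier channel, and to Direct otherwise.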

Google Analytics will add your own domain to this list by default (the same domain that you added to the Property settings). Another common use would be a third-party cart where the user is redirected to your website after a purchase or a sister website that should not count as a Referral.

Tagging Your Inbound Traffic

Properly implemented, Google Analytics can help you with the important task of measuring customer acquisition campaigns. Google Analytics automatically detects when users reach a website through an Organic Search or Referral, but it won’t know a user came from a newsletter unless you give it a way to detect that. The same happens to AdWords campaigns: unless you link AdWords to Google Analytics, you won’t see accurate numbers on your reports; but this is the subject of an entire chapter. For now I will focus on other marketing platforms.

If you are sending newsletters, purchasing banner placements, or even advertising offline, it is important to use campaign tags properly. Google Analytics will show users coming from a billboard or a TV ad as Direct traffic; it can show visitors from newsletters as Direct, mail.google.com, or other email provider traffic; it can show visitors from banner campaigns as Direct, ad.doubleclick.net, or the website itself. These behaviors are clearly suboptimal when it comes to measuring campaign effectiveness.

For such cases, Google Analytics has developed a system for you to “tell” it if users came from a campaign: UTM parameters. (UTM stands for Urchin Traffic Monitor, a remnant of Urchin, the tool Google acquired in order to build Google Analytics.) Basically, the system allows you to construct links that convey specific information about how the visitor arrived at the website.

Tagging Custom Campaigns

Using UTM parameters, you can create links that include five variables that, taken together, help Google Analytics “see” how users got to the website:

■ utm_source describes the origin of the visitor. Since every visitor must come from some place, this is a required parameter. It is usually the URL of the website where the campaign is running, such as theguardian.com, online-behavior.com, newsletter, or others.

■ utm_medium describes the channel used by the visitor; it is also a required parameter. It could be cpc, display, social, email, or others.


■ utm_campaign describes the name of the campaign. It could be a special campaign such as “Launch,” an ongoing campaign such as “Product X,” or a newsletter edition such as “newsletter-jan-2015.”

■ utm_term describes the term clicked on in a campaign. It could be a search term or a term used in a newsletter. For example, if you are advertising on a search platform for the search terms “analytics” and “measurement,” you would have the source example.com, the medium cpc, the name “Analytics Campaign,” and the terms “analytics” and “measurement” for each ad.

■ utm_content describes the version of an advertisement on which a visitor clicked. It is often used to analyze the effectiveness of banner design or copy in a campaign. For example, if you advertise on cnn.com and use two different banners, you would use the same parameters for source, medium, and name, but would add a unique value for each banner on the content UTM; this would enable you to learn which banner is better.

NOTE Google has developed a tool called URL Builder for building links with these campaign variables. It can be accessed at http://goo.gl/yQycsq. In order to tag multiple URLs at once, use the following template, created by Cardinal Path, a Google Analytics Certified Partner and Google Analytics Premium reseller: http://c05tdu.
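For readers who prefer to generate tagged links in code, the following Python sketch assembles a campaign URL from the five variables. It is an illustration under stated assumptions, not a replacement for the URL Builder: the function name `build_campaign_url` is hypothetical, source and medium are treated as required as described in this section, and the campaign name is sent in the `utm_campaign` query parameter, which is the name Google Analytics reads.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_campaign_url(url: str, source: str, medium: str,
                       campaign: Optional[str] = None,
                       term: Optional[str] = None,
                       content: Optional[str] = None) -> str:
    """Append UTM parameters to a landing page URL."""
    if not source or not medium:
        raise ValueError("source and medium are required")
    tags = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "utm_term": term,
        "utm_content": content,
    }
    parts = urlsplit(url)
    # Preserve any query parameters already on the landing page URL.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs += [(name, value) for name, value in tags.items() if value]
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

For the two-banner example on cnn.com, you would call the function twice with the same source, medium, and campaign, changing only `content` (for instance "banner-a" and "banner-b", both made-up values).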

If you have existing campaigns tagged with custom link parameters (different from the UTM), there is a way to translate them into UTMs without physically changing the campaign links, but this would require an addition to the GA tracking code. For technical implementation details, check the following plugin: http://goo.gl/GytPhO.

Tagging FeedBurner Traffic

For content publishers, from individual bloggers to large content portals, Really Simple Syndication (or RSS) is a common way to inform readers of new posts/articles. RSS is a family of web feed formats used to publish frequently updated works, and FeedBurner is a tool provided by Google to create (or burn) website feeds.

To help publishers better understand traffic acquired through RSS, the FeedBurner team created a way to make sure that feed links are tagged properly with UTM parameters. This is important to have a better understanding of how and where readers consume your content.

In order to tag FeedBurner traffic, log in to your feed at http://goo.gl/SuI6rx. On the Analyze tab (the default), you will find a link on the left sidebar under Services named Configure Stats. Click on it and you will reach the screen shown in Figure 1-6.

As indicated by number 3, you will be given the option to Track Clicks as a Traffic Source in Google Analytics. Once you check the box to enable the tracking, click on Customize. You will see the screen shown in Figure 1-7.


Figure 1-6: Configuring FeedBurner links

Figure 1-7: Customizing FeedBurner links

FeedBurner allows you to use the following dynamic variables to populate the UTM parameters:

1. ${feedUri}: The feed URI
2. ${feedName}: The feed name
3. ${distributionChannel}: The channel in which the feed is distributed, usually either feed or email
4. ${distributionEndpoint}: The application where a click request originates, such as Gmail

Here is a suggestion of how you can set up the parameters in order to understand FeedBurner traffic in an effective way:

1. Source: feedburner
2. Medium: ${distributionChannel}
3. Campaign: ${feedName} Feed
4. Content: ${distributionEndpoint}
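To preview what the suggested setup produces for a given click, you can mimic the variable substitution locally. The sketch below uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax; FeedBurner performs the real substitution itself. The mapping of Source, Medium, Campaign, and Content onto `utm_source`, `utm_medium`, `utm_campaign`, and `utm_content` is an assumption, and the function name and sample values are made up.

```python
from string import Template

# The suggested setup from the list above, keyed by the UTM parameter each
# field is assumed to populate.
SUGGESTED_SETUP = {
    "utm_source": "feedburner",
    "utm_medium": "${distributionChannel}",
    "utm_campaign": "${feedName} Feed",
    "utm_content": "${distributionEndpoint}",
}


def resolve_feed_tags(values: dict) -> dict:
    """Fill the dynamic variables in each pattern with concrete values."""
    return {
        param: Template(pattern).substitute(values)
        for param, pattern in SUGGESTED_SETUP.items()
    }
```

A click from Gmail on an email-distributed feed named "Online Behavior" would resolve to the medium "email", the campaign "Online Behavior Feed", and the content "Gmail".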

Both Custom Campaigns and FeedBurner traffic discussed in this section can be found under the Acquisition tab on Google Analytics; to analyze a campaign, search for it on the All Traffic report.

Managing Your Implementations Effectively

Google Analytics implementations are a continuous process; there are always new features that require changes to the tracking code or to the account settings. In order not to lose control over what is and is not implemented, or when it was configured, you must be extremely organized. In this chapter, you will learn a method to avoid losing data and context on Google Analytics reports.

NOTE If you are not acquainted with the definitions of accounts, properties, and views, read http://goo.gl/TAv93N before proceeding. In addition, please note that when you create new views, they will start being populated from their creation date, even if another view in the property has been collecting data for longer than that.

Creating Raw Data and Staging Views

The best way to check configuration errors is to have a view that does not use any filters. By comparing it to your main view, you will be able to quickly learn if you have a misplaced or problematic filter. Once you create this view, you should also set up the same goals you have in your main view. This will make the data more relevant in case you need to use it. For example, if you find out that your main view has a filter that affected your past data, you might want to use the Raw Data view for a while.

Suppose that you decide to create a filter to lowercase URLs (as proposed earlier), but you are uncertain about how it can affect your data. The best way to proceed is to have an additional view with the exact same settings as your main view and apply the new filter to the test view only.

Once the filter is applied, you can check the data and compare the numbers to learn if anything went wrong. (Tip: Wait for at least one full day of data, as filters might take 24 hours to start filtering data.) The following article shows how to add a new view: http://goo.gl/wHHEuj.


Creating an Analytics Staging Property

If you have worked in the web analytics industry long enough, you have probably seen data corrupted as a consequence of bad implementations. Code changes should be undertaken with care. However, since code changes affect all views in a property, it is not effective to create a new view in this case.

Since most websites have a staging site where changes are tested before going live, I suggest having a different tracking code (that is, a new Google Analytics property) used for those environments to test code changes on the Google Analytics tracking code. Also make sure to have the same configurations on both properties. Learn how to set up a property at http://goo.gl/VBkTkd.

Keeping Track of Implementation and Configuration Changes

Changes are constantly made to Google Analytics views by users as they refine their website goals, improve filters, take advantage of new features, and so forth. Every change may impact data, sometimes in unexpected ways. For this reason, it is essential to have a system in place to keep track of code and view changes, especially in large organizations where more than one person is involved with Google Analytics. But even when only one person is involved, this is important, as employees may go on leave, get promoted, or leave the company.

Google Analytics offers an out-of-the-box feature called Change History that includes changes made to your account settings, such as changes in goals, filters, and user permissions. As shown in Figure 1-8, apart from the change itself, you will see who did it and when. To find this report, log in to Google Analytics and click on Admin at the top of your screen; this setting will be available under your account settings.

Figure 1-8: Change History table sample

In order to centralize the collection and sharing of the changes made to a Google Analytics account, including code changes, I propose using a Google Docs form. The form should be created so that all interested parties can be aware of all changes. These will then be saved for historical knowledge to be used by the whole team (and future team members). Figure 1-9 shows an example of such a form with fields that you might want to create.

NOTE You can learn how to build a Google Docs form at http://goo.gl/1XKAkI.


Figure 1-9: Tracking Google Analytics implementations using Google Docs

Keeping Track of External and Overall Changes with Annotations

Google Analytics Annotations is a feature that allows you to annotate data points on the Google Analytics user interface, providing context when analyzing data, which allows for richer analyses. Here are some important occasions when you should use this feature:

■ Offline marketing campaigns (radio, TV, and billboards)
■ Major changes to the website (design, structure, and content)
■ Changes to tracking (changing the tracking code and adding events)
■ Changes to goals or filters


While annotations can (and should) be used for technical changes to the website, it is important to keep them at a high level. You shouldn’t add detailed information about your changes or annotate relatively minor tweaks; otherwise the annotations will become too dense to convey meaningful information to readers.

The use of both methods described here (form and annotations) should create an optimal mix. Watch the following video to learn how to use the Annotations feature: http://goo.gl/MiHVuH.

Summary

In this chapter you learned best practices for Google Analytics implementations and recommendations on how to best set the tool so that it collects clean and accurate data. You learned about the five major steps when it comes to implementing Google Analytics in your website or app in a clean, organized, and accurate way.

1. Understand the web analytics process: Before implementing Google Analytics, it is important to understand how the data will be used and how the collection and analysis of data relate to other business areas.

2. Implement and customize codes: Once your data needs are defined, you should start looking for the necessary Google Analytics customizations to implement on your website or app.

3. Set up the Google Analytics interface: Following the code implementation, you will need to set up the Google Analytics interface to make sure it processes your data in the way you want.

4. Tag inbound traffic: In order to accurately measure all your website or app traffic, especially marketing campaigns, you will need to tag inbound links with custom URL parameters called UTMs.

5. Manage the implementation: To make sure your implementation is always tidy, you should always keep track of changes on your Google Analytics account.

In the next chapters you will learn how to integrate Google tools into Google Analytics in order to enhance your data and create a powerful, data-driven decision-making tool. For each of the integrations you will learn how to integrate it into Google Analytics and how to use the resulting reports to analyze and optimize online behavior.


Contents

Introduction

1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask
    Some Sample Data
    Moving Quickly with the Control Button
    Copying Formulas and Data Quickly
    Formatting Cells
    Paste Special Values
    Inserting Charts
    Locating the Find and Replace Menus
    Formulas for Locating and Pulling Values
    Using VLOOKUP to Merge Data
    Filtering and Sorting
    Using PivotTables
    Using Array Formulas
    Solving Stuff with Solver
    OpenSolver: I Wish We Didn’t Need This, but We Do
    Wrapping Up

2 Cluster Analysis Part I: Using K-Means to Segment Your Customer Base
    Girls Dance with Girls, Boys Scratch Their Elbows
    Getting Real: K-Means Clustering Subscribers in E-mail Marketing
        Joey Bag O’ Donuts Wholesale Wine Emporium
        The Initial Dataset
        Determining What to Measure
        Start with Four Clusters
        Euclidean Distance: Measuring Distances as the Crow Flies
        Distances and Cluster Assignments for Everybody!
        Solving for the Cluster Centers
        Making Sense of the Results

Getting the Top Deals by Cluster

The Silhouette: A Good Way to Let Different K Values Duke It Out

How about Five Clusters?

Solving for Five Clusters

Getting the Top Deals for All Five Clusters

Computing the Silhouette for 5-Means Clustering

K-Medians Clustering and Asymmetric Distance Measurements

Using K-Medians Clustering

Getting a More Appropriate Distance Metric

Putting It All in Excel

The Top Deals for the 5-Medians Clusters

Wrapping Up

3 Naïve Bayes and the Incredible Lightness of Being an Idiot

When You Name a Product Mandrill, You’re Going to Get Some Signal and Some Noise

The World’s Fastest Intro to Probability Theory

Totaling Conditional Probabilities

Joint Probability, the Chain Rule, and Independence

What Happens in a Dependent Situation?

Bayes Rule

Using Bayes Rule to Create an AI Model

High-Level Class Probabilities Are Often Assumed to Be Equal

A Couple More Odds and Ends

Let’s Get This Excel Party Started

Removing Extraneous Punctuation

Splitting on Spaces

Counting Tokens and Calculating Probabilities

And We Have a Model! Let’s Use It.

Wrapping Up

4 Optimization Modeling: Because That “Fresh Squeezed” Orange Juice Ain’t Gonna Blend Itself

Why Should Data Scientists Know Optimization?

Starting with a Simple Trade-Off

Representing the Problem as a Polytope

Solving by Sliding the Level Set

The Simplex Method: Rooting around the Corners

Working in Excel

There’s a Monster at the End of This Chapter

Fresh from the Grove to Your Glass...with a Pit Stop Through a Blending Model

You Use a Blending Model

Let’s Start with Some Specs

Coming Back to Consistency

Putting the Data into Excel

Setting Up the Problem in Solver

Lowering Your Standards

Dead Squirrel Removal: The Minimax Formulation

If-Then and the “Big M” Constraint

Multiplying Variables: Cranking Up the Volume to 11

Modeling Risk

Normally Distributed Data

Wrapping Up

5 Cluster Analysis Part II: Network Graphs and Community Detection

What Is a Network Graph?

Visualizing a Simple Graph

Brief Introduction to Gephi

Gephi Installation and File Preparation

Laying Out the Graph

Node Degree

Pretty Printing

Touching the Graph Data

Building a Graph from the Wholesale Wine Data

Creating a Cosine Similarity Matrix

Producing an r-Neighborhood Graph

How Much Is an Edge Worth? Points and Penalties in Graph Modularity

What’s a Point and What’s a Penalty?

Setting Up the Score Sheet

Let’s Get Clustering!

Split Number 1

Split 2: Electric Boogaloo

And…Split 3: Split with a Vengeance

Encoding and Analyzing the Communities

There and Back Again: A Gephi Tale

Wrapping Up

6 The Granddaddy of Supervised Artificial Intelligence—Regression

Wait, What? You’re Pregnant?

Don’t Kid Yourself

Predicting Pregnant Customers at RetailMart Using Linear Regression

The Feature Set

Assembling the Training Data

Creating Dummy Variables

Let’s Bake Our Own Linear Regression

Linear Regression Statistics: R-Squared, F Tests, t Tests

Making Predictions on Some New Data and Measuring Performance

Predicting Pregnant Customers at RetailMart Using Logistic Regression

First You Need a Link Function

Hooking Up the Logistic Function and Reoptimizing

Baking an Actual Logistic Regression

Model Selection—Comparing the Performance of the Linear and Logistic Regressions

For More Information

Wrapping Up

7 Ensemble Models: A Whole Lot of Bad Pizza

Using the Data from Chapter 6

Bagging: Randomize, Train, Repeat

Decision Stump Is an Unsexy Term for a Stupid Predictor

Doesn’t Seem So Stupid to Me!

You Need More Power!

Let’s Train It

Evaluating the Bagged Model

Boosting: If You Get It Wrong, Just Boost and Try Again

Training the Model—Every Feature Gets a Shot

Evaluating the Boosted Model

Wrapping Up

8 Forecasting: Breathe Easy; You Can’t Win

The Sword Trade Is Hopping

Getting Acquainted with Time Series Data

Starting Slow with Simple Exponential Smoothing

Setting Up the Simple Exponential Smoothing Forecast

You Might Have a Trend

Holt’s Trend-Corrected Exponential Smoothing

Setting Up Holt’s Trend-Corrected Smoothing in a Spreadsheet

So Are You Done? Looking at Autocorrelations

Multiplicative Holt-Winters Exponential Smoothing

Setting the Initial Values for Level, Trend, and Seasonality

Getting Rolling on the Forecast

And...Optimize!

Please Tell Me We’re Done Now!!!

Putting a Prediction Interval around the Forecast

Creating a Fan Chart for Effect

Wrapping Up

9 Outlier Detection: Just Because They’re Odd Doesn’t Mean They’re Unimportant

Outliers Are (Bad?) People, Too

The Fascinating Case of Hadlum v. Hadlum

Tukey Fences

Applying Tukey Fences in a Spreadsheet

The Limitations of This Simple Approach

Terrible at Nothing, Bad at Everything

Preparing Data for Graphing

Creating a Graph

Getting the k Nearest Neighbors

Graph Outlier Detection Method 1: Just Use the Indegree

Graph Outlier Detection Method 2: Getting Nuanced with k-Distance

Graph Outlier Detection Method 3: Local Outlier Factors Are Where It’s At

Wrapping Up

10 Moving from Spreadsheets into R

Getting Up and Running with R

Some Simple Hand-Jamming

Reading Data into R

Doing Some Actual Data Science

Spherical K-Means on Wine Data in Just a Few Lines

Building AI Models on the Pregnancy Data

Forecasting in R

Looking at Outlier Detection

Wrapping Up

Conclusion

Where Am I? What Just Happened?

Before You Go-Go

Get to Know the Problem

We Need More Translators

Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection

You Are Not the Most Important Function of Your Organization

Get Creative and Keep in Touch!

Index

1 Everything You Ever Needed to Know about Spreadsheets but Were Too Afraid to Ask

This book relies on you having a working knowledge of spreadsheets, and I’m going to assume that you already understand the basics. If you’ve never used a formula before in your life, then you’ve got a slight uphill battle here. I’d recommend going through a For Dummies book or some other intro-level tutorial for Excel before diving into this.

That said, even if you’re a seasoned Excel veteran, there’s some functionality that’ll keep cropping up in this text that you may not have had to use before. It’s not difficult stuff; just things I’ve noticed not everyone has used in Excel. You’ll be covering a wide variety of little features in this chapter, and the example at this stage might feel a bit disjointed. But you can learn what you can here, and then, when you encounter it organically later in the book, you can slip back to this chapter as a reference.

As Samuel L. Jackson says in Jurassic Park, “Hold on to your butts!”

EXCEL VERSION DIFFERENCES

As mentioned in the book’s introduction, these chapters work with Excel 2007, 2010, 2013, 2011 for Mac, and LibreOffice. Sadly, in each version of Excel, Microsoft has moved stuff around for the heck of it.

For example, things on the Layout tab on 2011 are on the View tab in the other versions. Solver is the same in 2010 and 2013, but the performance is actually better in 2007 and 2011 even though 2007’s Solver interface is grotesque.

The screen captures in this text will be from Excel 2011. If you have an older or newer version, sometimes your interactions will look a little different, mostly when it comes to where things are on the menu bar. I will do my best to call out these differences. If you can’t find something, Excel’s help feature and Google are your friends.

The good news is that whenever we’re in the “spreadsheet part of the spreadsheet,” everything works exactly the same.

As for LibreOffice, if you’ve chosen to use open source software for this book, then I’m assuming you’re a do-it-yourself kind of person, and I won’t be referencing the LibreOffice interface directly. Never you mind, though. It’s a dead ringer for Excel.

Some Sample Data

NOTE

The Excel workbook used in this chapter, “Concessions.xlsx,” is available for download at the book’s website at www.wiley.com/go/datasmart.

Imagine you’ve been terribly unsuccessful in life, and now you’re an adult, still living at home, running the concession stand during the basketball games played at your old high school. (I swear this is only semi-autobiographical.)

You have a spreadsheet full of last night’s sales, and it looks like Figure 1-1.

Figure 1-1: Concession stand sales

Figure 1-1 shows each sale, what the item was, what type of food or drink it was, the price, and the percentage of the sale going toward profit.

Moving Quickly with the Control Button

If you want to peruse the records, you can scroll down the sheet with your scroll wheel, track pad, or down arrow. As you scroll, it’s helpful to keep the header row locked at the top of the sheet, so you can remember what each column means. To do that, choose Freeze Panes or Freeze Top Row from the “View” tab on Windows (“Layout” tab on Mac 2011 as shown in Figure 1-2).

Figure 1-2: Freezing the top row

To move quickly to the bottom of the sheet to look at how many transactions you have, you can select a value in one of the populated columns and press Ctrl+↓ (Command+↓ on a Mac). You’ll zip right to the last populated cell in that column. In this sheet, the final row is 200. Also, note that using Ctrl/Command to jump around the sheet from left to right works much the same.
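If you think better in code than in keystrokes, here is a rough Python sketch of what that jump does. It assumes the shortcut stops at the last populated cell before the first blank, which is how I understand it to behave in a filled column; the `jump_down` helper and the column values are made up for illustration.

```python
def jump_down(column, start=0):
    """Index of the last populated cell in the unbroken run that begins at `start`."""
    i = start
    while i + 1 < len(column) and column[i + 1] not in ("", None):
        i += 1
    return i

# Hypothetical column: a header, three values, a blank, then a stray value.
column = ["Price", 3.00, 1.50, 4.00, "", 2.50]
print(jump_down(column))  # 3, the cell just above the first blank
```

In a fully populated column like the concession data, that run ends at the very last row, which is why the shortcut lands you on row 200.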

If you want to take an average of the sales prices for the night, below the price column, column C, you can jot the following formula:

=AVERAGE(C2:C200)

The average is $2.83, so you won’t be retiring wealthy anytime soon. Alternatively, you can select the last cell in the column, C200, hold Shift+Ctrl+↑ to highlight the whole column, and then select the Average calculation from the status bar in the bottom right of the spreadsheet to see the simple summary statistic (see Figure 1-3). On Windows, you’ll need to right-click the status bar to select the average if it’s not there. On Mac, if your status bar is turned off, click the View menu and select “Status Bar” to turn it on.
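For comparison, the same calculation outside Excel might look like the Python sketch below. The four prices are invented stand-ins for column C, not the real Concessions.xlsx data, so the result won’t match the $2.83 above.

```python
from statistics import mean

# Hypothetical prices standing in for column C (not the workbook's data).
prices = [3.00, 1.50, 4.00, 2.50]

# Plays the role of =AVERAGE(C2:C5) on this made-up column.
average_price = mean(prices)
print(f"${average_price:.2f}")  # $2.75
```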

Figure 1-3: Average of the price column in the status bar

Copying Formulas and Data Quickly

Perhaps you’d like to view your profits in actual dollars rather than as percentages. You can add a header to column E called “Actual Profit.” In E2, you need only to multiply the price and profit columns together to obtain this:

=C2*D2

For beer, it’s $2. You don’t have to rewrite this formula in every cell in the column. Instead, Excel lets you grab the right-bottom corner of the cell and drag the formula where you like. The referenced cells in columns C and D will update relative to where you copy the formula. If, as in the case of the concession data, the column to the left is fully populated, you can double-click the bottom-right corner of the formula to have Excel fill the whole column (see Figure 1-4). Try this double-click action for yourself, because I’ll be using it all over the place in this book, and if you get the hang of it now, you’ll save yourself a whole lot of heartache.
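Filling a formula down a column amounts to applying the same calculation to each row’s own cells. A minimal Python sketch of that idea follows; the item names, prices, and profit shares are placeholders I made up, not rows from the workbook.

```python
# Hypothetical rows: (item, price, profit share). Values are illustrative only.
rows = [
    ("Item A", 4.00, 0.50),
    ("Item B", 1.50, 0.20),
    ("Item C", 3.00, 0.25),
]

# Filling =C2*D2 down the column: each row multiplies its own price and share.
actual_profit = [round(price * share, 2) for _, price, share in rows]
print(actual_profit)  # [2.0, 0.3, 0.75]
```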

Now, what if you don’t want the cells in the formula to change relative to the target when they’re dragged or copied? Whatever you don’t want changed, just add a $ in front of it.

Figure 1-4: Filling in a formula by dragging the corner

For example, if you changed the formula in E2 to:

=C$2*D$2

then when you copy the formula down, nothing changes. The formula continues to reference row 2.

If you copy the formula to the right, however, C would become D, D would become E, and so on. If you don’t want that behavior, you need to put a $ in front of the column references as well (for example, =$C$2*$D$2). This is called an absolute reference as opposed to a relative reference.
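To make the $ rule concrete, here is a small Python sketch that mimics how references shift when a formula is copied. It is a toy, not how Excel works internally: the `copy_formula` helper is hypothetical, and it only handles single-letter columns in simple formulas like the ones above.

```python
import re

# One A1-style reference: optional $, one column letter, optional $, row number.
# Simplification: only columns A-Z are handled, and function names are ignored.
REF = re.compile(r"(\$?)([A-Z])(\$?)(\d+)")

def copy_formula(formula, rows_down=0, cols_right=0):
    """Shift the relative parts of each reference; parts marked with $ stay fixed."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        if not col_lock:
            col = chr(ord(col) + cols_right)
        if not row_lock:
            row = str(int(row) + rows_down)
        return f"{col_lock}{col}{row_lock}{row}"
    return REF.sub(shift, formula)

print(copy_formula("=C2*D2", rows_down=1))        # =C3*D3
print(copy_formula("=C$2*D$2", rows_down=1))      # =C$2*D$2
print(copy_formula("=C$2*D$2", cols_right=1))     # =D$2*E$2
print(copy_formula("=$C$2*$D$2", cols_right=1))   # =$C$2*$D$2
```

The four calls line up with the cases in the text: a plain copy down, a row-locked copy down, a row-locked copy to the right, and a fully locked copy to the right.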

Formatting Cells

Excel offers static and dynamic options for formatting values. Take a look at column E, the Actual Profit column you just created. Select column E by clicking on the gray E column label. Then right-click the selection and choose Format Cells.

From within the Format Cells menu, you can tell Excel the type of number to be found in column E. In this case you want it to be Currency. And you can set the number of decimal places. Leave it at two decimals, as shown in Figure 1-5. Also available in Format Cells are options for changing font colors, text alignment, fill colors, borders, and so on.

Figure 1-5: The Format Cells menu

But here’s a conundrum. What if you want to format only the cells that have a certain value or range of values in them? And what if you want that formatting to change with the values?

That’s called conditional formatting, and this book makes liberal use of it. Cancel out of the Format Cells menu and navigate to the Home tab. In the Styles section (Mac calls it Format), you’ll find the Conditional Formatting button (see Figure 1-6). Click the button to drop down a menu of options. The conditional formatting most used in this text is Color Scales. Pick a scale for column E and note how each cell in the column is colored based on its high or low value.
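Under the hood, a color scale is roughly a mapping from each value’s position between the column’s minimum and maximum to a color. The Python sketch below shows one way that could work for a two-color scale. It is an approximation I’m assuming for illustration, not Excel’s actual algorithm, and the `color_scale` helper, the red-to-green endpoints, and the profit values are all hypothetical.

```python
def color_scale(values, low=(255, 0, 0), high=(0, 255, 0)):
    """Map each value to an RGB color between `low` and `high` by min-max scaling."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    colors = []
    for v in values:
        t = (v - lo) / span  # 0.0 at the column minimum, 1.0 at the maximum
        colors.append(tuple(round(a + t * (b - a)) for a, b in zip(low, high)))
    return colors

profits = [0.3, 0.75, 2.0]  # made-up Actual Profit values
for value, rgb in zip(profits, color_scale(profits)):
    print(value, rgb)
```

Because the colors depend on the column’s current minimum and maximum, changing any value can recolor the rest, which is the “dynamic” part of conditional formatting.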

Figure 1-6: Applying conditional formatting to the profit

To remove conditional formatting, use the Clear Rules options under the Conditional Formatting menu.

Paste Special Values

It’s often in your best interest not to have a formula lying around like you see in column E in Figure 1-4. If you were using the RAND() formula to generate a random value, for example, it changes each time the spreadsheet auto-recalculates, which, while awesome, can also be extremely annoying. The solution is to copy and paste these cells back to the sheet as flat values.

To convert formulas to values only, simply copy a column filled with formulas (grab column E) and paste it back using the Paste Special option (found on the Home tab under the Paste option on Windows and under the Edit menu on Mac). In the Paste Special window, choose to paste as values (see Figure 1-7). Note also that Paste Special allows you to transpose the data from vertical to horizontal and vice versa when pasting. You’ll be using that a fair bit in the chapters to come.
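The difference between a live formula and a pasted value is the difference between a function you call again and a number you stored once. Here is a short Python sketch of both Paste Special tricks; `rand_cell` is a hypothetical stand-in for a cell holding =RAND(), and the table contents are made up.

```python
import random

def rand_cell():
    """Stand-in for a cell holding =RAND(): a new value on every recalculation."""
    return random.random()

# "Copy, then Paste Special as values": evaluate once and keep the plain numbers.
pasted_values = [rand_cell() for _ in range(3)]
snapshot = list(pasted_values)
rand_cell()  # a later recalculation does not touch the stored numbers
print(pasted_values == snapshot)  # True

# Paste Special with transpose: rows become columns and vice versa.
table = [["Item A", 4.00], ["Item B", 1.50]]  # hypothetical rows
transposed = [list(col) for col in zip(*table)]
print(transposed)  # [['Item A', 'Item B'], [4.0, 1.5]]
```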


Figure 1-7: The Paste Special window in Excel 2011

Inserting Charts

In the concession stand sales workbook, there’s also a tab called Calories with a tiny table that shows the calorie count of each item the concession stand sells. You can chart data like this in Excel easily. On the Insert tab (Charts on a Mac), there is a charts section that provides different visualization options such as bar charts, line graphs, and pie charts.

NOTE

In this book, we’re going to use mostly column charts, line graphs, and scatter plots. Never be caught using a pie chart. And especially never use the 3D pie charts Excel offers, or my ghost will personally haunt you when I die. They’re ugly, they don’t communicate data well, and the 3D effect has less aesthetic value than the seashell paintings hanging on the wall of my dentist’s office.

Highlighting columns A:B on the Calories tab, you can select a Clustered Column chart to visualize the data. Play around with the graph. Sections can be right-clicked to bring up formatting menus. For example, right-clicking the bars, you can select “Format Data Series…” under which you can change the fill color on the bars from the default Excel blue to any number of pleasing shades (black, for instance).

There’s no reason for the default legend, so you should select it and press delete to remove it. You might also want to select various text sections on the graph and increase the size of their font (font size is under the Home tab in Excel). This gives the graph shown in Figure 1-8.

Figure 1-8: Inserting a calories column chart

Locating the Find and Replace Menus

You’re going to use find and replace a fair bit in this book. On Windows you can either press Ctrl+F to open up the Find window (Ctrl+H for replace) or navigate to the Home tab and use the Find button in the Editing section. On Mac, there’s a search field on the top right of the sheet (press the down arrow for the Replace menu), or you can just press Cmd+F to bring up the Find and Replace menu.

Just to test it out, open up the replace menu on the Calories sheet. You can replace every instance of the word “Calories” with the word “Energy” (see Figure 1-9) by popping the words in the Find and Replace window and pressing Replace All.


Figure 1-9: Running a Find and Replace

Formulas for Locating and Pulling Values

If I didn’t assume you at least knew some formulas in Excel (SUM, MAX, MIN, PERCENTILE, and so on), we’d be here all day. And I want to get started. But there are some formulas used a lot in this book that you’ve probably not used unless you’ve dug deep into the wonderful world of spreadsheets. These formulas deal with finding a value in a range and returning its location or, on the flip side, finding a location in a range and returning its value.

I want to cover a few of those on the Calories tab.

Sometimes you want to know the place in line of some element in a column or row. Is it first, second, third? The MATCH formula handles that quite nicely. Below your calorie data, label A18 as Match. You can implement the formula one cell over in B18 to find where in the item list above the word “Hamburger” appears. To use the formula, you supply it a value to look for, a range to search in, and a 0 to force it to give you back the position of the keyword itself:

=MATCH("Hamburger",A2:A15,0)

This yields a 6, because “Hamburger” is the sixth item in the list (see Figure 1-10).

Next up is the INDEX formula. Label A19 as Index. This formula takes in a range of values and a row and column number and returns the value in the range at that location. For example, you can feed the INDEX formula our calorie table A1:B15, and to pull back the calorie count for bottled water, feed in 3 rows down and 2 columns over:

=INDEX(A1:B15,3,2)


This yields a calorie count of 0 as expected (see Figure 1-10).

Another formula you’ll see a lot in this text is OFFSET. Go ahead and label A20 as Offset, and you can play with the formula in B20.

With this formula, you provide a range that acts like a cursor which is moved around with row and column offsets (similar to INDEX for the single valued case except it’s 0-based). For example, you can provide OFFSET with a reference to the top left of the sheet, A1, and then pull back the value 3 cells below by providing a row offset of 3 and a column offset of 0:

=OFFSET(A1,3,0)

This returns the name of the third item on the list, “Chocolate Bar.” See Figure 1-10.

The last formula I want to look at in this section is SMALL (it has a counterpart called LARGE that works the same way). If you have a list of values and you want to return, say, the third smallest, SMALL does that for you. To see this, label A21 as Small and in B21 feed in the list of calorie counts and an index of 3:

=SMALL(B2:B15,3)

This hands back a value of 150 which is the third smallest after 0 (bottled water) and 120 (soda). See Figure 1-10.
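If it helps to see the same four lookups outside of a spreadsheet, here is a minimal Python sketch of them. The function names are made up for illustration, and the table is a stand-in for A1:B15: only some of its numbers appear in this chapter, and the rest are placeholders, so treat it as an assumption rather than the real workbook.

```python
# Stand-in for the calorie table in A1:B15 of the Calories tab (header row included).
# Only some of these numbers are given in the text; the lines marked "placeholder"
# are illustrative assumptions, so check them against your own workbook.
TABLE = [
    ["Item", "Calories"],
    ["Beer", 200],
    ["Bottled Water", 0],
    ["Chocolate Bar", 255],          # placeholder value
    ["Chocolate Dipped Cone", 300],  # placeholder item and value
    ["Gummy Bears", 300],            # placeholder item and value
    ["Hamburger", 320],              # placeholder value
    ["Hot Dog", 265],                # placeholder item and value
    ["Ice Cream Sandwich", 240],
    ["Licorice Rope", 280],          # placeholder item and value
    ["Nachos", 560],                 # placeholder item and value
    ["Pizza", 480],                  # placeholder value
    ["Popcorn", 500],                # placeholder item and value
    ["Popsicle", 150],               # placeholder item name
    ["Soda", 120],
]


def excel_match(value, values):
    """Like MATCH(value, range, 0): 1-based position of an exact match."""
    return values.index(value) + 1


def excel_index(table, row, col):
    """Like INDEX(range, row, col): row and col are 1-based."""
    return table[row - 1][col - 1]


def excel_offset(table, row, col, row_offset, col_offset):
    """Like OFFSET(cell, rows, cols) for one cell; (row, col) is 0-based, so A1 is (0, 0)."""
    return table[row + row_offset][col + col_offset]


def excel_small(values, k):
    """Like SMALL(range, k): the k-th smallest value, with k starting at 1."""
    return sorted(values)[k - 1]


items = [row[0] for row in TABLE[1:]]      # A2:A15
calories = [row[1] for row in TABLE[1:]]   # B2:B15

print(excel_match("Hamburger", items))     # 6
print(excel_index(TABLE, 3, 2))            # 0
print(excel_offset(TABLE, 0, 0, 3, 0))     # Chocolate Bar
print(excel_small(calories, 3))            # 150
```

Note how INDEX counts from 1 while OFFSET counts from 0, which is exactly the difference called out above.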

Now, there’s one more formula used for looking up values that’s kind of like MATCH on steroids and that’s VLOOKUP (and its horizontal counterpart HLOOKUP). That’s got its own section next because it’s a beast.

Figure 1-10: Formulas you should learn


Using VLOOKUP to Merge Data

Go ahead and flip back to the Basketball Game Sales tab. You can still reference a cell here from the previous tab, Calories, by simply placing the tab name and “!” in front of a referenced cell. For example, Calories!B2 is a reference to the calories in beer regardless of what sheet you’re working in.

Now, what if you wanted to toss the calorie data into a column back on the sales sheet so that next to each item sold the appropriate calorie count was listed? You’d somehow have to look up the calorie count of each item sold and place it into a column next to the transaction. Well, it turns out there’s a formula for that called VLOOKUP.

Go ahead and label Column F in the spreadsheet Calories for this purpose. Cell F2 will include the calorie count for the first beer transaction from the Calories table. Using the VLOOKUP formula, you supply the item name from cell A2, a reference to the table Calories!$A$1:$B$15, and the relative column offset you want your return value to be read out of, which is to say the second column:

=VLOOKUP(A2,Calories!$A$1:$B$15,2,FALSE)

The FALSE at the end of the VLOOKUP formula means that you will not accept approximate matches for “Beer.” If the formula can’t find “Beer” on the calories table, it returns an error.

When you enter the formula, you can see that 200 calories is read in from the table on the Calories tab. Since you’ve put the $ in front of the table references in the formula, you can copy this formula down the column by double-clicking the bottom-right corner of the cell. Voila! As shown in Figure 1-11, you have calorie counts for every transaction.
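For readers who think in code, the merge that VLOOKUP performs can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the short transaction list are hypothetical, and the table holds just the four calorie counts quoted in this chapter.

```python
# A slice of the Calories table; these four counts are the ones quoted in the text.
calories_table = [
    ("Beer", 200),
    ("Bottled Water", 0),
    ("Ice Cream Sandwich", 240),
    ("Soda", 120),
]


def vlookup(key, table, col_index):
    """Like VLOOKUP(key, table, col_index, FALSE): exact match on the first column.

    col_index is 1-based, as in Excel. Excel ignores letter case when matching
    text; this sketch compares strings exactly to keep things short.
    """
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    # Excel would show an #N/A error here.
    raise KeyError(f"{key!r} not found in the lookup table")


# Hypothetical item column (column A) from the sales sheet.
transactions = ["Beer", "Soda", "Beer", "Ice Cream Sandwich"]

# The equivalent of copying the formula down column F.
calorie_column = [vlookup(item, calories_table, 2) for item in transactions]
print(calorie_column)  # [200, 120, 200, 240]
```

Because the table is passed in whole on every call, it plays the same role as the $-anchored reference: the lookup range stays put while the item being looked up changes row by row.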

Figure 1-11: Using VLOOKUP to grab calorie counts


Filtering and Sorting

Now that you have calories in there, say you now want to view only those transactions from the Frozen Treats category. What you want to do then is filter the sheet. To do so, first you select the data in range A1:F200. You can put the cursor in A1 and press Shift+Ctrl+↓ then →. An even easier method is to click the top of column A and hold the click as you mouse over to column F to highlight all six columns.

Then to place auto-filtering on these six columns, you press the Filter button in the Data section of the ribbon. It looks like a gray funnel as shown in Figure 1-12.

Figure 1-12: Place auto-filter on a selected range

Once auto-filter is activated, you can click the drop-down menu that appears in cell B1 and choose to show only certain categories (in this case, only the Frozen Treats transactions will be displayed). See Figure 1-13.

Once you’ve filtered, highlighting columns of data allows the summary bar in Excel to give you rolled-up information just on the cells that remain. For example, having filtered just the Frozen Treats, we can highlight the values in column E and use the summary bar to get a quick total of profit just from that category. See Figure 1-14.


Figure 1-13: Filtering on category

Figure 1-14: Summarizing a filtered column

Auto-filter allows you to sort as well. For example, if you want to sort by profit, just click the auto-filter menu on the Profit cell (D1) and select Sort Ascending (or “Smallest to Largest” in some versions). See Figure 1-15.


Figure 1-15: Sorting in ascending order by profit

To remove all the filtering you’ve applied, either you can go back into the Category filter menu and check the other boxes, or you can un-toggle the filter button on the ribbon that you pressed in the first place. You’ll see that although you have all of your data back, the Frozen Treats are still in the order you sorted them in.
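The filter, summarize, and sort steps from this section map onto three one-liners in Python. The rows below are hypothetical stand-ins for the sales sheet (the item names, categories other than Frozen Treats, and profit figures are all made up), so read this as a sketch of the idea rather than the workbook’s data.

```python
# Hypothetical stand-in for a few rows of the sales sheet.
rows = [
    {"item": "Popsicle", "category": "Frozen Treats", "profit": 3.0},
    {"item": "Beer", "category": "Beverages", "profit": 4.0},
    {"item": "Ice Cream Sandwich", "category": "Frozen Treats", "profit": 1.5},
    {"item": "Soda", "category": "Beverages", "profit": 2.5},
]

# Auto-filter on the Category column: keep only the Frozen Treats rows.
frozen = [r for r in rows if r["category"] == "Frozen Treats"]

# The summary bar: a quick total over just the rows that remain.
frozen_profit = sum(r["profit"] for r in frozen)

# Sort Ascending (Smallest to Largest) on profit.
by_profit = sorted(frozen, key=lambda r: r["profit"])

print([r["item"] for r in frozen])     # ['Popsicle', 'Ice Cream Sandwich']
print(frozen_profit)                   # 4.5
print([r["item"] for r in by_profit])  # ['Ice Cream Sandwich', 'Popsicle']
```

One difference worth noting: `sorted` returns a new list and leaves `rows` alone, whereas sorting through auto-filter reorders the rows in the sheet itself.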

Excel also offers the Sort interface for doing more complex sorts than might be possible with auto-filter. To use the feature, you highlight the data to be sorted (grab A:F again) and select Sort from the Sort & Filter section of the Data tab in Excel. This will bring up the sort menu. On Mac, to get this window, you must press the down arrow in the sort button and select Custom Sort….

In the sort menu, shown in Figure 1-16, you can note whether your data has column headers or not, and if it does have headers like this example does, then you can select, by name, the columns to be sorted.

Now, the most awesome part of this sorting interface is that under the “Options…” button, you can select to sort left to right instead of top to bottom. That’s something you cannot do with auto-filter. Later in this book you’ll need to randomly sort data by both columns and rows in two quick steps, and this interface is going to be your friend. For now, just cancel out of it as the data is already ordered the way you want it.


Figure 1-16: Using the Sort menu

Using PivotTables

What if you wanted to know the total counts of each item type you sold? Or you wanted to know revenue totals by item?

These questions are akin to “aggregate” or “group by” queries that you’d run in a traditional SQL database. But this data isn’t in a database. It’s in a spreadsheet. That’s where PivotTables come to the rescue.

Just as when you filtered your data, you start by selecting the data you want to manipulate, in this case the purchase data in the range A1:F200. From the Insert tab (Data tab on Mac), you can press the PivotTable button and select for Excel to create a new sheet with a PivotTable. While some versions of Excel allow you to insert a PivotTable into an existing sheet, it’s standard practice to select the new sheet option unless you have a really good reason not to.

In this new sheet, the PivotTable Builder will be aligned to the right of the table (it floats on a Mac). The builder allows you to take the columns from the original selected data and use them as report filters, column and row labels for grouping, or values. A report filter is similar in function to a filter from the previous section: it allows you to select only a subset of the data, such as Frozen Treats. The Column Labels and Row Labels fill in the meat of the PivotTable report with distinct values from the selected columns.


On Windows, the initial PivotTable built will be completely empty, while on Mac it is often prepopulated with distinct values from the first selected column down the rows of the table and distinct values from the second column across the columns. If you’re on a Mac, go ahead and uncheck all the boxes in the builder, so that you can work along from an empty table.

Now, say you wanted to know total revenue by item. To get at that, you’d drag the Item tile in the PivotTable Builder into the Rows section and the Price tile into the Values section. This means that you’ll be operating on revenue grouped by item name.

Initially, however, the PivotTable is set up to merely count the number of price records that are within a group. For example, there are 20 Beer rows. See Figure 1-17.

Figure 1-17: The PivotTable builder and a count of sales by item

You need to change the count to a sum in order to examine revenue. To do so, on Windows, drop the menu down on the Price tile in the Values section of the builder and select “Value Field Settings….” On Mac, press the little “i” button. From there, “sum” can be selected from the various summary options.


What if you wanted to break out these sums by category? To do so, you drag the Category tile into the Columns section of the builder. This gives the table shown in Figure 1-18. Note that the PivotTable in the figure automatically totals up rows and columns for you.
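Since PivotTables are really just “group by” queries, the count, the sum, and the category breakout can each be sketched as a small grouping loop in Python. The five sales rows here are hypothetical (items, categories, and prices are invented for the example), so the totals are not the ones in the figures.

```python
from collections import Counter, defaultdict

# Hypothetical (item, category, price) rows standing in for the sales sheet.
sales = [
    ("Beer", "Beverages", 4.0),
    ("Beer", "Beverages", 4.0),
    ("Popsicle", "Frozen Treats", 3.0),
    ("Soda", "Beverages", 2.5),
    ("Popsicle", "Frozen Treats", 3.0),
]

# Item in Rows, Price in Values, summarized by count (the initial behavior).
count_by_item = Counter(item for item, _category, _price in sales)

# Same layout with the summary switched from count to sum: revenue by item.
revenue_by_item = defaultdict(float)
for item, _category, price in sales:
    revenue_by_item[item] += price

# Category dragged into Columns: one cell per (item, category) pair.
revenue_by_item_and_category = defaultdict(float)
for item, category, price in sales:
    revenue_by_item_and_category[(item, category)] += price

# The grand total a PivotTable adds for you.
grand_total = sum(revenue_by_item.values())

print(dict(count_by_item))    # {'Beer': 2, 'Popsicle': 2, 'Soda': 1}
print(dict(revenue_by_item))  # {'Beer': 8.0, 'Popsicle': 6.0, 'Soda': 2.5}
print(grand_total)            # 16.5
```

Swapping which field you group on is the code equivalent of dragging a different tile into the Rows or Columns section.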

Figure 1-18: Revenue by item and category

And if you want to ever get rid of something from the table, just uncheck it or grab the tile from the section it’s in and drag it out of the sheet as if you were tossing it away. Go ahead and drop the Category tile.

Once you get a report you want in a PivotTable, you can always select the values and paste them to another sheet to work on further. In this example, you can copy the table (A5:B18 on Mac) and Paste Special its values into a new tab called Revenue By Item (see Figure 1-19).

Feel free to swap in various row and column labels until you get the hang of what’s going on. For instance, try to get a total calorie count sold by category using a PivotTable.


Figure 1-19: Revenue by Item tab created by pasting values from a PivotTable

Using Array Formulas

In the concession transaction workbook, there is a tab called Fee Schedule. As it turns out, Coach O’Shaughnessy would let you run the snack stand only if you kicked some of the profit back to him (perhaps to subsidize his tube sock-buying habit). The Fee Schedule tab shows the percent cut he takes on each item sold.

So how much money do you owe him for last night’s game? To answer that question, you need to multiply the total revenue of each item from the PivotTable by the cut for the coach and sum them all up.

There’s a great formula for this operation that will do all the multiplication and summation in a single step. Rather creatively named, it’s called SUMPRODUCT. In cell E1 on the Revenue By Item sheet, add a label called Total Cut for Coach. In C2, determine the SUMPRODUCT of the revenue and the fees by adding this formula:

=SUMPRODUCT(B2:B15,'Fee Schedule'!B2:O2)


Uh oh. There’s an error; the cell just reads #VALUE!. What’s going wrong?

Even though you’ve selected two ranges of equal size and put them in SUMPRODUCT, the formula can’t see that the ranges are equal because one range is vertical and one’s horizontal.

Fortunately, Excel has a function for flipping arrays in the right direction. It’s called TRANSPOSE. You need to write the formula like this:

=SUMPRODUCT(B2:B15,TRANSPOSE('Fee Schedule'!B2:O2))

Nope! Still getting an error.

The reason you’re still getting an error is that every formula in Excel, by default, returns a single value. Even TRANSPOSE returns the first value in the transposed array. If you want the whole array returned, you have to turn TRANSPOSE into an “array formula,” which means exactly what you might think. Array formulas hand you back arrays, not single values.

You don’t have to change the way you type your SUMPRODUCT to make this happen. All you need to do is when you’re done typing the formula, instead of pressing Enter, press Ctrl+Shift+Enter. On the Mac, you use Command+Return.

Victory! As shown in Figure 1-20, the calculation now reads $57.60. But I suggest rounding that down to $50, because how many socks does Coach really need?
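The vertical-versus-horizontal problem is easier to see when the shapes are written out. Here is a minimal Python sketch, with ranges modeled as lists of rows; the helper names and the revenue and fee numbers are hypothetical, so the total below is not the $57.60 from the workbook.

```python
def shape(array):
    """(rows, columns) of a range stored as a list of rows."""
    return (len(array), len(array[0]))


def transpose(array):
    """Flip rows and columns, as TRANSPOSE does for a whole array."""
    return [list(column) for column in zip(*array)]


def sumproduct(a, b):
    """Multiply matching cells and add up the results, as SUMPRODUCT does."""
    if shape(a) != shape(b):
        # Excel reports this situation as a #VALUE! error.
        raise ValueError(f"shapes differ: {shape(a)} vs {shape(b)}")
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical numbers: revenue runs down a column, fees run across a row.
revenue = [[100.0], [50.0], [20.0]]   # 3 rows x 1 column, like B2:B4
fees = [[0.25, 0.5, 0.125]]           # 1 row x 3 columns, like B2:D2

try:
    sumproduct(revenue, fees)
except ValueError as error:
    print("error:", error)            # error: shapes differ: (3, 1) vs (1, 3)

print(sumproduct(revenue, transpose(fees)))  # 52.5
```

In Python the transposed array comes back whole without any special keystroke; the Ctrl+Shift+Enter step is what asks Excel to do the same.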

Figure 1-20: Taking a SUMPRODUCT with an array formula

Solving Stuff with Solver

Many of the techniques you’ll study in this book can be boiled down to optimization models. An optimization problem is one where you have to make the best decision (choose the best investments, minimize your company’s costs, find the class schedule with the fewest morning classes, or so on). In optimization models then, the words “minimize” and “maximize” come up a lot when articulating an objective.

In data science, many of the practices, whether that’s artificial intelligence, data mining, or forecasting, are actually just some data prep plus a model-fitting step that’s actually an optimization model. So it’d make sense to teach optimization first. But learning all there is to know about optimization is tough to do straight off the bat. So you’ll do an in-depth optimization study in Chapter 4 after you do some more fun machine learning problems in Chapters 2 and 3. To fill in the gaps though, it’s best if you get a little practice with optimization now. Just a taste.

In Excel, optimization problems are solved using an Add-In that ships with Excel called Solver.

• On Windows, Solver may be added in by going to File (in Excel 2007 it’s the top left Windows button) ➪ Options ➪ Add-ins, and under the Manage drop-down choosing Excel Add-ins and pressing the Go button. Check the Solver Add-In box and press OK.

• On Mac, Solver is added by going to Tools then Add-ins and selecting Solver.xlam from the menu.

A Solver button will appear in the Analysis section of the Data tab in every version.

All right! Now that Solver is installed, here’s an optimization problem: You are told you need 2,400 calories a day. What’s the fewest number of items you can buy from the snack stand to achieve that? Obviously, you could buy 10 ice cream sandwiches at 240 calories apiece, but is there a way to do it for fewer items than that?

Solver can tell you!

To start, make a copy of the Calories sheet, name the sheet Calories-Solver, and clear out everything but the calories table on the copy. If you don’t know how to make a copy of a sheet in Excel, you simply right-click the tab you’d like to copy and select the Move or Copy menu. This gives you the new sheet shown in Figure 1-21.

To get Solver to work, you need to provide it with a range of cells it can set with decisions. In this case, Solver needs to decide how many of each item to buy. So in Column C next to the calorie counts, label the column How many? (or whatever you feel like), and you can allow Solver to store its decisions in this column.

Excel considers blank cells to be 0s so you needn’t fill in these cells with anything to start. Solver will do that for you.


Figure 1-21: The copied Calories-Solver sheet

In cell C16, sum up the number of items to be bought above as:

=SUM(C2:C15)

And below that you can sum up the total calorie count of these items (which you’ll want eventually to equal 2,400) using the SUMPRODUCT formula:

=SUMPRODUCT(B2:B15,C2:C15)

This gives the initial sheet shown in Figure 1-22.

Now you’re ready to build the model, so bring up the Solver window by pressing the Solver button on the Data tab.


Figure 1-22: Getting calorie and item counts set up

NOTE

The Solver window, shown in Figure 1-23 in Excel 2011, looks pretty similar in Excel 2010, 2011, and 2013. In Excel 2007, the layout is slightly different, but the only substantive difference is that there is no algorithm selection box. Rather, there’s an “Assume Linear Model” checkbox under the Options menu. We’ll learn all about these elements later.

The main elements you plug into Solver to solve a problem, as shown in Figure 1-23, are an objective cell, an optimization direction (minimization or maximization), some decision variables that can be changed by Solver, and some constraints.


Figure 1-23: The uninitialized Solver window

In your case, the objective is to minimize the total items in cell C16. The cells that can be altered are the item selections in C2:C15. And the constraints are that C17, the total calories, needs to be equal to 2,400. Also, we’ll need to add a constraint that our decisions be counting numbers, so we’ll need to check the non-negative box (under the options menu in Excel 2007) and add an integer constraint to the decisions. After all, you can’t buy 1.7 sodas. These integer constraints will be covered in depth in Chapter 4.

To add in the total calorie constraint, press the Add button and set C17 equal to 2,400 as shown in Figure 1-24.

Figure 1-24: Adding the calorie constraint

Similarly, add a constraint setting C2:C15 to be integers as shown in Figure 1-25.


Figure 1-25: Adding an integer constraint

Press OK.

In Excel 2010, 2011, and 2013, make sure the solving method is set to Simplex LP. Simplex LP is appropriate for this problem, because this problem is linear (the “L” in LP stands for linear as you’ll see in Chapter 4). By linear, I mean that the problem involves nothing but linear combinations of the decisions in C2 through C15 (sums, products with constants such as calorie counts, etc.).

If we had non-linear calculations in the model (perhaps a square root of a decision, a logarithm, or an exponential function), then we could use one of the other algorithms Excel provides in Solver. Chapter 4 covers this in great detail.

In Excel 2007, you would denote the problem as linear by clicking the Assume Linear Model under the Options screen. Your final setup should appear as in Figure 1-26.

Figure 1-26: Final Solver setup for minimizing items needed for 2,400 calories


All right! Go ahead and press the Solve button. Excel should find a solution almost immediately. And that solution, as shown in Figure 1-27, is 5. Now, your Excel might pick a different 5 items than mine in the screenshot, but the minimum is 5 nonetheless.
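If you want to sanity-check that answer without Solver, the same model (minimize the item count, hit exactly 2,400 calories, buy whole non-negative quantities) can be sketched in Python. To be clear, this is not what Solver does under the hood: it swaps in a small dynamic program, the classic coin-change recurrence, in place of Simplex LP. The table is also only a slice of the real one, using the calorie counts quoted in this chapter plus a pizza value inferred from five slices adding up to 2,400.

```python
# Partial calorie table: values quoted in the chapter, plus Pizza at 480,
# which is an inference (five slices totaling 2,400), not a quoted number.
CALORIES = {
    "Beer": 200,
    "Bottled Water": 0,
    "Ice Cream Sandwich": 240,
    "Soda": 120,
    "Pizza": 480,
}


def fewest_items(calories, target):
    """Return a shortest list of items whose calories add up to exactly target.

    best[t] holds the fewest items that total t calories; choice[t] remembers
    the last item added to get there. Returns None if target is unreachable.
    """
    unreachable = float("inf")
    best = [0] + [unreachable] * target
    choice = [None] * (target + 1)
    for total in range(1, target + 1):
        for item, cal in calories.items():
            # Zero-calorie items can only add to the count, so skip them.
            if 0 < cal <= total and best[total - cal] + 1 < best[total]:
                best[total] = best[total - cal] + 1
                choice[total] = item
    if best[target] == unreachable:
        return None
    picks = []
    total = target
    while total > 0:
        picks.append(choice[total])
        total -= calories[choice[total]]
    return picks


picks = fewest_items(CALORIES, 2400)
print(len(picks), picks)  # 5 ['Pizza', 'Pizza', 'Pizza', 'Pizza', 'Pizza']
```

With only these five items the answer is forced, because nothing tops 480 calories. On the full table there may be other five-item combinations, which is why your Excel might pick different items than the screenshot.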

Figure 1-27: The optimized item selection

OpenSolver: I Wish We Didn’t Need This, but We Do

This book was originally designed to work completely with Excel’s built-in Solver. However, as it turns out, functionality was removed from Solver in later versions for mysterious and unadvertised reasons.

What that means is that while this whole book works using vanilla Solver in Excel 2007 and Excel 2011 for Mac, in Excel 2010 and Excel 2013, the built-in Solver will occasionally complain that a linear optimization model is too large (I’ll give you a heads-up in this book whenever a model gets that complex).

Luckily, there’s an excellent free tool called OpenSolver that’s available for the Windows versions of Excel that addresses this deficiency. With OpenSolver, you can still build your model in the regular Solver interface, but OpenSolver provides a button that you press to use its Simplex LP algorithm implementation, which is blazingly fast.


To set up OpenSolver, navigate to http://OpenSolver.org and download the zip file. Uncompress the file into a folder, and whenever you want to solve a beefy model, just set it up in a spreadsheet like normal and double-click the OpenSolver.xlam file, which will give you an OpenSolver section on the Data tab in Excel. Press the Solve button to solve an existing model. As shown in Figure 1-28, I’ve applied OpenSolver in Excel 2013 to the model from the previous section, and it buys five slices of pizza.

Figure 1-28: OpenSolver buys pizza like a madman

Wrapping Up

All right, you’ve learned how to navigate and select ranges quickly, how to leverage absolute references, how to paste special values, how to use VLOOKUP and other matching formulas, how to sort and filter data, how to create PivotTables and charts, how to execute array formulas, and how and when to bust out Solver.


Here’s either a depressing or fun fact depending on your perspective. I’ve known management consultants at prominent firms who earn excellent salaries by doing what I call the “consulting two-step”:

1. Talk about nonsense with clients (sports, vacation, barbeque ... not that there’s anything nonsensical about smoked meats).

2. Summarize data in Excel.

You may not know all there is to know about college football (I certainly don’t), but if you internalize this chapter, you’ll have point number two knocked out.

But you’re not here to become a management consultant. You’re here to drive deep into data science, and that starts in the next chapter where we’ll get started with a little bit of unsupervised machine learning.