Same Problems, More Zeroes: Why the Spreadsheet (and PowerPivot) Will Dominate Big Data Usage

Presentation to NYC MSBIgData Group

Rob Collie

Me

13+ years at Microsoft in Redmond

Technical design, strategic direction, and project management

Office 97, Windows Installer (MSI) v1, Excel 2003, Excel 2007, Bing

Designed much of PowerPivot v1

CTO, Pivotstream.com

PowerPivotPro.com, PowerPivotFAQ.com

“Dominate” is a dramatic word

Back end storage and processing isn’t going anywhere (but it will change slightly)

Not a threat – an opportunity

Two Agendas

What the opportunity looks like & where Excel/PowerPivot fits

How Excel earned its stigmas and how PowerPivot dispels most of them

Will swap back and forth between them

Why Excel “Sucks”

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…

1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…

1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”
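
To make items 2 and 4 concrete, here is a minimal Python sketch of a named measure evaluated over a fact table related to a dimension table, as opposed to an anonymous cell like H7 on one flat sheet. It is not DAX and not how PowerPivot works internally; the table layout, column names, and sale amounts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension table: product key -> attributes
products = {
    1: {"brand": "Corona"},
    2: {"brand": "Stella Artois"},
}

# Hypothetical fact table: one row per sale, pointing at the dimension by key
sales = [
    {"product_id": 1, "amount": 12.0},
    {"product_id": 1, "amount": 8.0},
    {"product_id": 2, "amount": 15.0},
]


def avg_sales(rows):
    """A named measure: average sale amount over whatever rows are in context."""
    return sum(r["amount"] for r in rows) / len(rows)


def by_brand(fact_rows, dim):
    """Group fact rows by a dimension attribute reached through the key relationship."""
    groups = defaultdict(list)
    for row in fact_rows:
        groups[dim[row["product_id"]]["brand"]].append(row)
    return groups


# The same measure definition is reused for every slice of the data.
avg_by_brand = {
    brand: avg_sales(rows) for brand, rows in by_brand(sales, products).items()
}
print(avg_by_brand)
```

The logic lives in one place under a readable name (`avg_sales`), and the slicing is separate from the definition, which is the separation the checklist says plain Excel lacks.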

DEMO1: MILES AND MILES OF DATA

300 Million Rows in One Workbook

If Printed Out, Those 300M Rows Would Stretch 1,000 Miles!
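
A quick sanity check on that figure, as a sketch: assuming the rows are printed one per line in a single continuous column (an assumption, the slide does not say how they are laid out), the claim implies a print density that can be computed directly.

```python
INCHES_PER_MILE = 5280 * 12  # 63,360 inches in a statute mile

rows = 300_000_000  # rows in the workbook, from the slide
miles = 1_000       # claimed printed length, from the slide

total_inches = miles * INCHES_PER_MILE
rows_per_inch = rows / total_inches

print(f"{rows_per_inch:.2f} rows per inch")
```

That works out to a little under 5 rows per inch, which is in the range of ordinary printed row heights, so the 1,000-mile figure is plausible under this assumption.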

Want Billions of Rows? Import to Tabular BISM

Import Results – Same Formulas and UX as PowerPivot, Just a Different Frame (VS vs. Excel)

Updating The Checklist – PowerPivot “Fixes” Excel

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…

1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…

1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

OPPORTUNITY

Trend #1: Data Explosion

Library of Congress:

530 miles of bookshelves

10 Terabytes (That’s it???)

[Chart: Worldwide Data Storage (EB), 2006–2011; vertical axis 0–2000]

Worldwide Data in Storage:

~180 Million TB in 2006

10x increase in 5 years!

~3 Libraries of Congress per US Household
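
For a sense of scale, the growth figures above can be turned into an annual rate. This is a rough sketch that assumes the 10x increase runs from 2006 to 2011 at a constant compound rate and uses decimal units (1 EB = 1,000,000 TB); neither assumption is stated on the slide.

```python
TB_PER_EB = 1_000_000  # decimal units, assumed

data_2006_tb = 180_000_000  # ~180 million TB in 2006, from the slide
growth_factor = 10          # 10x increase, from the slide
years = 5                   # over 5 years, from the slide

# Constant annual multiplier that compounds to 10x over 5 years
annual_factor = growth_factor ** (1 / years)

data_2006_eb = data_2006_tb / TB_PER_EB
data_2011_eb = data_2006_eb * growth_factor

print(f"Annual growth: {(annual_factor - 1) * 100:.1f}% per year")
print(f"2006: {data_2006_eb:.0f} EB, 2011: {data_2011_eb:.0f} EB")
```

Under these assumptions the stored volume grows by a bit under 60% per year, from about 180 EB to about 1,800 EB, which fits within the chart's 0–2000 EB axis.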

Trend #2: BI Spending ACCELERATES in Recessions

If Big Data is not accessible via the right tools, you might as well not even be storing it.

DEMO: WHAT NEW YORKERS DRINK

Demo Screenshot: Corona Dominates NYC Beer Sales

Note that this demo is running in my browser!
– No Excel or PowerPivot install required
– Even runs on Mac and iPad

Very “Fisher Price” UX, not scary like Excel – just a friendly website

But Stella Artois Rules Manhattan

Note that the report is sliced to Manhattan only, one click

Also note that the user cannot download the workbook, just interact with it – secure and controlled

VERY Different Bestseller list in the Bronx

“Cordina” brand holds spots 2, 3, 5, and 7

This report automatically refreshes itself with the latest data on a regular schedule – no human intervention required!

The Checklist

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…

1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…

1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

Big Data is a Matter of Opinion

The v’s

<Went looking for supporting articles>

Confirmation!

Important Points/My Opinions

Decisionmakers don’t care how data is stored

Decisionmakers don’t care how big the data is
– Even 1,000 rows is bigger than they can digest
– Humans can digest one screen at most
– They need us to give them SMALL data

Decisionmakers don’t like to learn new tools

It is pointless and counterproductive to fight any of this

Opinion: At the place where it matters, there is no difference between Big Data and BI – it’s all Insight, consumed primarily by non-technical humans

But Decisionmakers are an Obstacle

Only they know what they know

Only they know what they need
– They don’t even know what they want til they see what they don’t!

They don’t know how to explain either of the above

They don’t understand your language at all – what’s easy, what’s difficult

They budget to spend about 10% of the time required with you

True Story: How a week became an hour

In 2006, I hired a top-notch BI pro for a project at MS

I was the domain expert (the “decisionmaker”) but knew nothing of the toolset.

He was the technical pro (the “doer”) and knew nothing of the domain.

Writing and debugging a single formula took a full week of iteration and communication.

In 2009 I revisited the same project
– But thanks to PowerPivot, this time I was both decisionmaker and doer

The same formula process now took LESS THAN ONE HOUR!
– This was true even though I had forgotten every last detail of the 2006 project

Why did a week become an hour? HOW???

Communication: The “Dark Matter” of BI Projects

Knowledge Worker: internal communication at “Broadband” speed…

BI Pro: internal communication at “Broadband” speed…

…but person-to-person communication at “2400 Baud Dialup” speed

Where the Time Gets Spent

Never budgeted or accounted or rewarded… so they don’t commit

Excel Pros – Data Pros’ New Allies

300M Users

Of which, 10% create PivotTables: 30M Pros

- Every org has them
- ~7M Java Devs, 2M SQL Pros
- Each supports avg of 15 BDMs
- Support majority of informed decisions in the biz world

But Even Better…

They intrinsically know the business as well as the decisionmakers (often, they ARE decisionmakers)

They share your (IT, development) mindset more than you’d expect

They can and will pick up PowerPivot quickly

They NEED you

They’re great teammates and are thrilled to cooperate with you

Traditional Model Bottlenecks on BI Pro, and Coming Soon to Big Data

Knowledge Worker / Analyst / Excel Pro

BI Pro

BI Pro Intensely Engaged with One Project at a Time

Everyone Else Waits
– Make uninformed decisions/guesses
– Burn time inefficiently with spreadsheets
– Make costly spreadsheet mistakes
– Leak sensitive information
– Become entrenched in spreadsheet process, resistant to improvement once BI resources available

BIgData Pro Now Can Address Multiple Projects

BUDGET VS ACTUALS

The Demo

See blog posts:
– http://ppvt.pro/BudgetActuals1
– http://ppvt.pro/BudgetActuals2

The Checklist

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…

1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…

1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

Bonus Demos

Weather

Power View

Connection to Hadoop

UFO’s