Welcome
It used to be easy…
they all looked pretty much alike
NoSQL BigData MapReduce Graph Document
BigTable Shared Nothing
Column Oriented CAP Eventual
Consistency
ACID BASE Mongo Cloudera Hadoop
Voldemort Cassandra Dynamo MarkLogic Redis
Velocity HBase Hypertable Riak BDB
Now it’s downright
c0nfuZ1nG!
What Happened?
we changed scale
we changed tack
the big data conundrum
the big data conundrum?
The Internet
~1% of that is text
Which isn’t mostly text
Words: 0.6 PB
Web Pages: 40 PB
Everything: 500,000 PB
~0.01% is web pages
And there is lots of other stuff out there
mobile
sensors
Logs
video audio
Social data
weather
Gartner
80% of business is conducted on unstructured information
Big Data is now a new class of economic asset*
*World economic forum 2012
Yet 80% of enterprise databases are < 1TB (reference from 2009)
so what does Big Data mean for the enterprise?
Insight ∝ data
Data beats Algorithms
Backing up a bit…
We live in a world of, largely, private data
Where data is often changed and forwarded on
Sometimes we’re a bit more organized!
But most of our data is not generally accessible
Exposed
Core Operational
Data
Sharing is often an afterthought
How do we process, acquire, reason about and act upon information?
The Brain
Reptilian: primitive operations (balance, temperature regulation, breathing)
Mammalian (limbic): emotion, short-term memory, fight or flight, etc.
Neocortex: planning, innovation, problem solving, etc.
Our intelligence is segregated in disparate worlds
could our corporations be
more intelligent?
Siloed, closed, bespoke data makes our organisations opaque and
unresponsive
What if we exposed it all?
So what might that look like?
• Single data store
• Federated, homogeneous stores
• Federated, heterogeneous stores
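The federated, heterogeneous option can be sketched as a thin query layer that fans a predicate out to stores with different internal shapes and merges the results. This is a hypothetical illustration; the store classes and record shapes here are invented, not any particular product's API.

```python
# Hypothetical sketch of a federated query layer over heterogeneous stores.
# Both stores expose the same find(predicate) interface, so the federation
# layer can treat them uniformly despite different internals.

class KeyValueStore:
    def __init__(self, data):
        self.data = data  # dict of key -> record

    def find(self, predicate):
        return [v for v in self.data.values() if predicate(v)]


class DocumentStore:
    def __init__(self, docs):
        self.docs = docs  # list of document records

    def find(self, predicate):
        return [d for d in self.docs if predicate(d)]


def federated_find(stores, predicate):
    # Fan the query out to every store and merge the results.
    results = []
    for store in stores:
        results.extend(store.find(predicate))
    return results
```

In practice the hard part a sketch like this hides is translating one query into each store's native language and reconciling the differing schemas on the way back.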
The Google Approach
MapReduce
Google Filesystem
BigTable
Tenzing
Megastore
F1
Dremel
Spanner
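The first item in that stack, MapReduce, is worth a concrete illustration. Below is the classic word-count expressed in the map/shuffle/reduce style; it is a single-process teaching sketch, not Google's distributed implementation, which runs the phases in parallel across a cluster over files in GFS.

```python
# Illustrative single-process word-count in the MapReduce style.
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return (key, sum(values))

def word_count(documents):
    pairs = [p for doc in documents for p in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "big table"]))
# {'big': 2, 'data': 1, 'table': 1}
```

The appeal of the model is that the map and reduce functions are pure and per-key, so the framework can scale them out and retry failures without the programmer thinking about distribution.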
The eBay Approach
so is one approach
better?
Data Volume?
[Chart: overlapping data-volume ranges of the different approaches; axis in TB, up to 10,000]
We live well within the overlap region
Academic acumen?
Performance Trade-Off Curve
• Volume (pure physical size)
• Velocity (rate of change)
• Variety (number of different types of data, formats and sources)
• Static & Dynamic Complexity (do you need to interpret the effect one message has on another?)
Our ability to model data is much more of a gating factor than raw size, particularly
when considering new forms of data
Dave Campbell (Microsoft – VLDB Keynote)
Problem
Core Data Model
Gravitate around a single data model
Core Data Model
Application Specific Models
Views / linkages
Globally Accessible
Core Data Model
The data itself follows a similar pattern
Core Data
Application Specific data
Views / linkages
Globally Accessible
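The pattern above, one globally accessible core with application-specific views layered on top, can be sketched as projections over a shared dataset. The domain, record fields, and view names below are invented for illustration; the point is only that each application derives its own shape from the same core data rather than keeping a private copy.

```python
# Hypothetical sketch: a shared core dataset with per-application views.
# Each view is a projection over the core, not a separate silo.

CORE_TRADES = [
    {"id": 1, "instrument": "VOD.L", "qty": 100, "price": 184.0},
    {"id": 2, "instrument": "BP.L",  "qty": 50,  "price": 447.5},
]

def risk_view(trades):
    # The risk application cares about exposure per instrument.
    return {t["instrument"]: t["qty"] * t["price"] for t in trades}

def settlement_view(trades):
    # The settlement application cares only about ids and quantities.
    return [(t["id"], t["qty"]) for t in trades]
```

Because every view reads from the same core, a correction to a trade is visible to all applications at once; the linkage work moves from reconciling copies to maintaining the projections.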
Compose Solutions (for now)
Big Data is more than the opportunity for better insight over new data sources
It is the opportunity to make the organisation smarter, simply by making data more accessible
But the harder job, for us, is unifying the various domains to make all that
data intelligible
Thanks
http://www.benstopford.com @benstopford