Welcome
It used to be easy…
they all looked pretty much alike
NoSQL BigData MapReduce Graph Document
BigTable Shared Nothing
Column Oriented CAP Eventual
Consistency
ACID BASE Mongo Cloudera Hadoop
Voldemort Cassandra Dynamo MarkLogic Redis
Velocity HBase Hypertable Riak BDB
Now it’s downright
c0nfuZ1nG!
What Happened?
we changed scale
we changed tack
the big data conundrum
the big data conundrum?
The Internet
~1% of that is text
Which isn’t mostly text
Words: 0.6 PB
Web Pages: 40 PB
Everything: 500,000 PB
~0.01% is web pages
And there is lots of other stuff out there
mobile
sensors
Logs
video audio
Social data
weather
Gartner
80% of business is conducted on unstructured information
Big Data is now a new class of economic asset*
*World economic forum 2012
Yet 80% of enterprise databases are < 1TB (reference from 2009)
so what does Big Data mean for the enterprise?
Insight ∝ data
Data beats Algorithms
Backing up a bit…
We live in a world of, largely, private data
Where data is often changed and forwarded on
Sometimes we’re a bit more organized!
But most of our data is not generally accessible
Exposed
Core Operational
Data
Sharing is often an afterthought
How do we process, acquire, reason about and act upon information?
The Brain
Reptilian: primitive operations (balance, temperature regulation, breathing)
Mammalian (limbic): emotion, short-term memory, fight or flight, etc.
Neocortex: planning, innovation, problem solving, etc.
Our intelligence is segregated in disparate worlds
could our corporations be
more intelligent?
Siloed, closed, bespoke data makes our organisations opaque and
unresponsive
What if we exposed it all?
So what might that look like?
• Single data store
• Federated, homogeneous stores
• Federated, heterogeneous stores
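The federated, heterogeneous option can be sketched as a thin query layer that fans a predicate out to stores with different internal shapes and merges the results. This is a hypothetical illustration; the store classes and record shapes here are invented, not any particular product's API.

```python
# Hypothetical sketch of a federated query layer over heterogeneous stores.
# Both stores expose the same find(predicate) interface, so the federation
# layer can treat them uniformly despite different internals.

class KeyValueStore:
    def __init__(self, data):
        self.data = data  # dict of key -> record

    def find(self, predicate):
        return [v for v in self.data.values() if predicate(v)]


class DocumentStore:
    def __init__(self, docs):
        self.docs = docs  # list of document records

    def find(self, predicate):
        return [d for d in self.docs if predicate(d)]


def federated_find(stores, predicate):
    # Fan the query out to every store and merge the results.
    results = []
    for store in stores:
        results.extend(store.find(predicate))
    return results
```

In practice the hard part a sketch like this hides is translating one query into each store's native language and reconciling the differing schemas on the way back.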
The Google Approach
MapReduce
Google Filesystem
BigTable
Tenzing
Megastore
F1
Dremel
Spanner
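The first item in that stack, MapReduce, is worth a concrete illustration. Below is the classic word-count expressed in the map/shuffle/reduce style; it is a single-process teaching sketch, not Google's distributed implementation, which runs the phases in parallel across a cluster over files in GFS.

```python
# Illustrative single-process word-count in the MapReduce style.
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts for one word.
    return (key, sum(values))

def word_count(documents):
    pairs = [p for doc in documents for p in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "big table"]))
# {'big': 2, 'data': 1, 'table': 1}
```

The appeal of the model is that the map and reduce functions are pure and per-key, so the framework can scale them out and retry failures without the programmer thinking about distribution.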
The eBay Approach
so is one approach
better?
Data Volume?
[Chart: overlapping data-volume ranges of the different approaches; axis in TB, up to 10,000]
We live well within the overlap region
Academic acumen?
Performance Trade-Off Curve
• Volume (pure physical size)
• Velocity (rate of change)
• Variety (number of different types of data, formats and sources)
• Static & Dynamic Complexity (do you need to interpret the effect one message has on another?)
Our ability to model data is much more of a gating factor than raw size, particularly
when considering new forms of data
Dave Campbell (Microsoft – VLDB Keynote)
Problem
Core Data Model
Gravitate around a single data model
Core Data Model
Application Specific Models
Views / linkages
Globally Accessible
Core Data Model
The data itself follows a similar pattern
Core Data
Application Specific data
Views / linkages
Globally Accessible
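The pattern above, one globally accessible core with application-specific views layered on top, can be sketched as projections over a shared dataset. The domain, record fields, and view names below are invented for illustration; the point is only that each application derives its own shape from the same core data rather than keeping a private copy.

```python
# Hypothetical sketch: a shared core dataset with per-application views.
# Each view is a projection over the core, not a separate silo.

CORE_TRADES = [
    {"id": 1, "instrument": "VOD.L", "qty": 100, "price": 184.0},
    {"id": 2, "instrument": "BP.L",  "qty": 50,  "price": 447.5},
]

def risk_view(trades):
    # The risk application cares about exposure per instrument.
    return {t["instrument"]: t["qty"] * t["price"] for t in trades}

def settlement_view(trades):
    # The settlement application cares only about ids and quantities.
    return [(t["id"], t["qty"]) for t in trades]
```

Because every view reads from the same core, a correction to a trade is visible to all applications at once; the linkage work moves from reconciling copies to maintaining the projections.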
Compose Solutions (for now)
Big Data is more than the opportunity for better insight over new data sources
It is the opportunity to make the organisation smarter, simply by making data more accessible
But the harder job, for us, is unifying the various domains to make all that
data intelligible
Thanks
http://www.benstopford.com @benstopford