Big data to big understanding

Cochrane Associates

cochrane.org.uk ca-global.org

Peter Cochrane

GENERALITY! Augmenting data mining with big data

From fine detail to the big picture

WHY All The Excitement?

A new tool addresses some of our biggest challenges:

-A fully networked, connected and increasingly open world
-Disparity of disconnected/siloed disciplines and industries
-Rapid rise of complexity far exceeding human abilities
-Gross failure of old thinking, models and methods
-Rise of non-linearity and emergent behaviour
-Displacement of technologies and people
-Growth of interdisciplinary relationships
-Acceleration of technology and change
-Freedom of data and information
-Globalisation of everything
- +++++

WHY The BIG Deal?

These problem sets are way beyond:

-The desktop
-Past thinking
-Spreadsheets
-Old databases
-Simple analysis
-Basic mathematics
-Simple programming
-Relational databases
-All our past experiences
- +++++

WHY Is It All So Important?

We are in a new era of:

-Novel causalities
-Correlation discovery
-More powerful computer models
-Unusual and unexpected solutions
-Extremely rare event identification
-Unexpected behaviours and outcomes
-Original classes of relationship discovery
-Degree of freedom reduction from six to two
-Rare and improbable association identification
-Previously unseen classes of objects/behaviours
-Essential element creation for sustainable futures
- +++++

FREEDOM What Degree of Separation?

Analogue World: Disconnected, Insulated Society

Digital World: Connected, Networked

Organisations

The smaller the separation, the bigger the networking and data generated

BIG Global Dynamic Distributed

We have to address new sets of issues and dimensions created by a fast-moving digital world. These have remained largely unseen and untapped, and are of a scale and complexity never seen before

SCALE! Beyond Human Capacity

A small sample of our formidable challenges:

-Sustainability
-Genome Decode
-Protein structure/folding
-Social network associations
-Genome-protein communication
-Disease causality and propagation
-Global/National medical records analysis
-Seismic analysis for raw material location
-Money laundering and tax avoidance tracing
-Terrorist and criminal activity characterisation
-Astronomical data analysis and ‘body’ classification
- +++++

SIZE How Much Data?

[Chart: spread of data creation estimates, 2000 to 2030]

More data was created in 2002 than in all of time up to that point!

Beyond Moore’s Law: exponential growth that is accelerating


SIZE How Much Data?

>> than the per day

2003: ~5 EB/year
2015: ~5 EB/day
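The two figures above imply a growth rate that can be worked out directly. The sketch below is only a back-of-envelope check: it assumes a 365-day year, smooth compound growth between 2003 and 2015, and (for the comparison) Moore's Law read as a doubling every two years. None of these assumptions come from the slide, and the variable names are illustrative.

```python
import math

EB_PER_YEAR_2003 = 5.0   # slide figure: ~5 EB created per year in 2003
EB_PER_DAY_2015 = 5.0    # slide figure: ~5 EB created per day in 2015
DAYS_PER_YEAR = 365      # assumption: leap years ignored

# Put both figures on the same per-year basis.
eb_per_year_2015 = EB_PER_DAY_2015 * DAYS_PER_YEAR
growth_factor = eb_per_year_2015 / EB_PER_YEAR_2003

# Assumption: smooth compound growth over the 12 years in between.
years = 2015 - 2003
annual_growth = growth_factor ** (1 / years) - 1
doubling_time_years = math.log(2) / math.log(1 + annual_growth)

# Assumption: Moore's Law taken as a doubling every two years.
moore_annual_growth = 2 ** (1 / 2) - 1

print(f"Growth factor 2003 to 2015: {growth_factor:.0f}x")
print(f"Implied annual growth: {annual_growth:.1%}")
print(f"Implied doubling time: {doubling_time_years:.2f} years")
print(f"Two-year doubling, annual growth: {moore_annual_growth:.1%}")
```

Under these assumptions the annual data volume grows 365-fold in 12 years, a bit over 60% a year, which is faster than a two-year doubling would give.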

WHERE Does It All Come From?

Things, People, Machines, Education, Health Care, Institutions, Government, Communication, Transportation, Networking, Commerce, Business, Security, Policing, Science, Media, Competition, Exploration, Research, Markets, Green, Social, Open, Apps +++++

ANALYTICS Structured, Unstructured, Semi-Structured

Applicability:

-Retail
-Science
-Banking
-Security
-Defence
-Medicine
-Wholesale
-Commerce
-Production
-Technology
-Manufacturing
-Government
-Exploration
-Institutions
-Resourcing
-Innovation
-Education
-Creativity
-Logistics
-Energy
- +++++

ANALYTICS Lots of Detail v Relationships

Data Mining: Micro-View, Data & Detail
Limited, Contained, Constrained

Big Data: Macro-View, Relationships
Expanding, Tending to the Infinite

HUH? Knowns, Unknowns, Unknown Unknowns

The many problems:

-Certain and well-defined challenges
-Suspected or manifest in some way, but ill-defined
-To be discovered, become apparent, present problems
-Primary limitations are our ability to detect and characterise
-Secondary limitations include our inability to recognise significance
-Causality, probability and statistics conspire to conceal, confuse and trick us!
- +++++

TRUTH Is It Static, Veracious?

The Earth is:
-Flat
-Static
-Spherical
-An oblate spheroid
-The centre of the universe
-The axis of the sun and stars

Plants:
-Grow out of the soil
-Have no sensory facility
-Cannot grow without light

The ability to observe, measure and model with increasing accuracy creates dynamic and more relevant truths in line with our growing knowledge & reality

In general, ‘truth’ is dynamic and not a fixed entity - it mutates as we gather more information and create deeper


HUMAN Limited Reasoning and Analysis!

Big Data scale and complexity:

-Render Big Data beyond human abilities alone
-See structured and relational databases falling far short
-Make crude correlation and association analysis inadequate
-Include many disparate/hidden relationships that are confounding
-Introduce multi-dimensional visualisation/conceptualisation difficulties
-Extend analysis beyond ‘Order 5’ mathematical models/general methods
- +++++
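One way to see why crude correlation analysis falls short: a relationship can be perfectly deterministic and still score zero on a linear correlation measure. The sketch below is an illustration only, using made-up data and a hand-written Pearson coefficient; it is not a method taken from the slides.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: x runs symmetrically from -5 to 5.
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]    # y depends on x, but not linearly

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)

print(f"linear relationship:    r = {r_linear:.3f}")
print(f"quadratic relationship: r = {r_quadratic:.3f}")
```

The quadratic case is fully determined by x, yet its coefficient is zero because the positive and negative halves cancel. A linear measure alone would report no association here.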

THEORY We Do Not Have One!

Big Data really needs a Big Theory:

-Complexity confounds us
-There are no generalised solutions
-There is no suitable math framework
-To some degree we are working partially blind
-We can only use what we have already established
-Computer modelling/simulation/analogues can be used
-Hypothesis testing and experimental trials are often vital
- +++++

We Have ‘NO’ General Purpose Tools/Methodologies

SYMBIOSIS A New Partner for Mankind

Joining forces with machines appears to be the only viable future

Vital Symbiosis & Augmentation

[Figure: Low to High scale; labels: Analysis, Modelling, Processing, Mathematics, Computation, History, Intuition, Creativity, Experience]



PICTURES Vary with people, things +++:

-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++

In Form & Scale

PICTURES We Might Conjure!

This will vary with people, things, organisation:

-Social
-Intent
-Interest
-Mobility
-Browsing
-Expertise
-Ownership
-Connectivity
-Consumption
-Communication
- +++++

TOOLS Generally beyond the abilities of most companies:

-Graph theory
-Hash filtering
-Causality testing
-Weighted mapping
-Trajectory projection
-N-dimensional sifting
- +++++

Key to Success: Specialised

THE # Key to Analysis By Transformation

Very weak to strong relationship identification:

-Generally applicable
-Reveals subtle relationships
-Effective for small and big data
-Weeds out/reveals concealed links
- +++++

MAPPING Path Tracing Relationships

N-dimensional relationship characterisation:

-Easily understood
-Generally applicable
-Fits human perception
-Reveals subtle relationships
-Effective for small and big data
-Weeds out/reveals concealed links
- +++++
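As a rough illustration of path tracing, the sketch below finds the shortest chain of relationships linking two entities, using a breadth-first search over an undirected graph. The slides do not specify an algorithm, so this choice is an assumption. The function name `trace_path` and the records are hypothetical.

```python
from collections import deque

def trace_path(links, start, goal):
    """Return the shortest chain of relationships from start to goal, or None."""
    # Build an undirected adjacency map from (a, b) relationship pairs.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if start not in neighbours or goal not in neighbours:
        return None

    # Breadth-first search; came_from records how each node was first reached.
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in sorted(neighbours[node]):
            if nxt not in came_from:
                came_from[nxt] = node
                queue.append(nxt)
    return None

# Made-up records, for illustration only.
links = [
    ("account_A", "person_1"),
    ("person_1", "company_X"),
    ("company_X", "account_B"),
    ("account_A", "person_2"),
    ("person_3", "account_C"),
]

path = trace_path(links, "account_A", "account_B")
print(path)
```

Here the two accounts are never linked directly, but the search reveals the concealed chain through a person and a company. Entities with no connecting chain return `None`.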

VISUALISATION Exploit graphics & display technology

-Big
-Clear
-Static
-Colour
-Explicit
-Intuitive
-Animated
-Interactive
- +++++

In Every Dimension

ANIMATIONS Add New Dimensions, Greater Understanding

Search: Hans Rosling @ TED


GENESIS The Journey Has Just Begun

Big Data and Complexity need generalised theories, like physics needed thermodynamics and quantum mechanics

UNDERSTANDING Never Been So Difficult

Our single biggest challenge as a species:

-Our past was built on it
-Our future depends upon it
-We are not evolving to be any smarter
-Ultimately we are limited by our tools
-Technology is our only survival route

END GAME Wisdoms, Knowledge, Uncertainty

Our big conceptual challenge: moving on from a history of ‘static truths’ and mostly clear and certain answers to a world dominated by the probabilistic and uncertain, where ‘the truth’ has to be updated and rewritten

Our primary tool here: increasingly powerful instruments of observation and measurement, complemented by deep computer modelling and simulation


