Enterprise intelligence apr2012 load - romania - 30 min

© 2012 IBM Corporation

Enterprise Intelligence

Jeff Jonas, IBM Distinguished Engineer

Chief Scientist, IBM Entity Analytics

Email: jeffjonas@us.ibm.com

Blog: www.jeffjonas.typepad.com

Twitter: http://www.twitter.com/jeffjonas

My Background

Early ’80s: Founded Systems Research & Development (SRD), a custom software consultancy

Personally designed and deployed roughly 100 systems, a number of which contained multiple billions of transactions describing hundreds of millions of entities

1989 – 2003: Built numerous systems for Las Vegas casinos including a technology known as Non-Obvious Relationship Awareness (NORA)

2001: Funded by In-Q-Tel, the venture capital arm of the CIA

2005: IBM acquires SRD

Today: Primarily focused on ‘sensemaking on streams’ with special attention towards privacy and civil liberties protections

Trend: Organizations Are Getting Dumber

[Chart. Axes: Time and Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context, Enterprise Amnesia]

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”

~ Eric Schmidt, CEO Google

Amnesia, definition

A defect in memory, especially resulting from brain damage.

Enterprise Amnesia, definition

A defect in memory, resulting in wasted resources, lower revenues, unnecessary fraud losses, etc.

Trend: Organizations Are Getting Dumber

[Chart. Axes: Time and Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context. Callout: WHY?]

Algorithms at Dead End.

You Can’t Squeeze Knowledge Out of a Pixel.

scrila34@msn.com

No Context

Context, definition

Better understanding something by taking into account the things around it.

Information in Context … and Accumulating

Top 200 Customer

Job Applicant

Identity Thief

Criminal Investigation

scrila34@msn.com

The Puzzle Metaphor

Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors

What it represents is unknown – there is no picture on hand

Is it one puzzle, 15 puzzles, or 1,500 different puzzles?

Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted

Some pieces may even be professionally fabricated lies

Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with

Puzzling

Cottage Garden: © 2010 Royce B. McClure, Artist, All Rights Reserved; © 2010 Ravensburger USA, Inc.

Down Home Music: © Kay Lamb Shannon, Artist; Licensed by Cypress Fine Art Licensing; © 2011 Ravensburger USA Inc.

Neuschwanstein Beauty: © 2009 Photo Copyright Robert Cushman Hayes; © 2009 Ravensburger USA, Inc.

Vegas: Artwork provided by Hadley House Licensing, Minneapolis; © 2011 Giesla Hoelscher, All Rights Reserved; © 2011 Ravensburger USA, Inc.

270 pieces (90%)

200 pieces (66%)

150 pieces (50%)

6 pieces (2%)

30 pieces (10%, duplicates)

First Discovery

More Data Finds Data

Duplicates in Front Of Your Eyes

First Duplicate Found Here

Incremental Context – Incremental Discovery

6:40pm START

22min “Hey, this one is a duplicate!”

35min “I think some pieces are missing.”

37min “Looks like a bunch of hillbillies on a porch.”

44min “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”

150 pieces (50%)

Incremental Context – Incremental Discovery

47min “We should take the sky and grass off the table.”

2hr “Let’s switch sides, and see if we can make sense of this from different perspectives.”

2hr10m “Wait, there are three … no, four puzzles.”

2hr17m “We need a bigger table.”

2hr18m “I think you threw in a few random pieces.”

How Context Accumulates

With each new observation … one of three assertions is made: 1) un-associated; 2) placed near like neighbors; or 3) connected

Must favor the false negative

New observations sometimes reverse earlier assertions

Some observations produce novel discovery

As the working space expands, computational effort increases

Given sufficient observations, there can come a tipping point

Thereafter, confidence improves while computational effort decreases!
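The three assertions above can be illustrated with a toy sketch. This is a simplification under stated assumptions, not the method of any IBM product: observations are modeled as sets of feature strings, and the thresholds `CONNECT_MIN` and `NEAR_MIN` are hypothetical. Requiring more than one shared feature before connecting is one possible way to favor the false negative.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
CONNECT_MIN = 2  # shared features required before two things are connected
NEAR_MIN = 1     # shared features required to be placed near a neighbor


def assert_observation(obs: set[str], context: list[set[str]]) -> tuple[str, Optional[int]]:
    """Classify one new observation against the accumulated context.

    Returns the assertion made and the index of the closest cluster (if any).
    """
    best_idx, best_overlap = None, 0
    for idx, cluster in enumerate(context):
        overlap = len(obs & cluster)
        if overlap > best_overlap:
            best_idx, best_overlap = idx, overlap

    if best_overlap >= CONNECT_MIN:
        context[best_idx] |= obs          # connected: features accumulate
        return "connected", best_idx

    context.append(set(obs))              # when in doubt, keep it separate
    if best_overlap >= NEAR_MIN:
        return "near", best_idx           # placed near a like neighbor
    return "unassociated", None


context: list[set[str]] = []
first = assert_observation({"sky", "edge"}, context)
second = assert_observation({"sky", "cloud"}, context)
third = assert_observation({"sky", "edge", "bird"}, context)
```

In this run the first piece is un-associated, the second lands near the first, and the third connects to the first and adds its features to it.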

Big Data [in context]. New Physics.

More data: better predictions

– Lower false positives

– Lower false negatives

More data: bad data good

– Suddenly glad your data is not perfect

More data: less compute

Big Data

Pile of ____ In Context

One Form of Context: “Expert Counting”

Is it 5 people each with 1 account … or is it 1 person with 5 accounts?

Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?

If one cannot count … one cannot estimate vector or velocity (direction and speed).

Without vector and velocity … prediction is nearly impossible.
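A minimal sketch of the counting question, assuming (purely for illustration) that two account records belong to the same person whenever they share an exact SSN, phone or email value. Real matching rules are stricter; the field names and sample records here are hypothetical.

```python
from __future__ import annotations

# Hypothetical identifying fields; a shared exact value links two records.
LINK_FIELDS = ("ssn", "phone", "email")


def count_entities(records: list[dict]) -> int:
    """Count distinct entities using union-find over shared identifiers."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        for field in LINK_FIELDS:
            value = rec.get(field)
            if value is None:
                continue
            j = first_seen.setdefault((field, value), i)
            parent[find(i)] = find(j)      # same identifier, same entity

    return len({find(i) for i in range(len(records))})


accounts = [
    {"account": "A1", "ssn": "111", "phone": "555-0100"},
    {"account": "A2", "ssn": "111"},
    {"account": "A3", "phone": "555-0100", "email": "x@example.com"},
    {"account": "A4", "email": "x@example.com"},
    {"account": "A5", "email": "x@example.com", "ssn": "111"},
]
naive_count = len(accounts)               # 5 accounts
entity_count = count_entities(accounts)   # 1 person behind all 5
```

The naive count treats every account as a separate person; the linked count shows one person holding five accounts.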

Entity Resolution Demonstration

Entity Resolution Demonstration

VOTER: George F Balston

YOB: 1951 D/L: 4801

13070 SW Karen Blvd Apt 7

Beaverton, OR 97005

Last voted: 2008

DECEASED PERSON: George Balston

YOB: 1951 SSN: 5598

DOD: 1995

When it comes to best practices in voter matching, if only a name and year of birth match, this is insufficient proof of a match. Many different people in the U.S. share a name and year of birth.

Human review is required.

Unfortunately, there are thousands and thousands of cases just like this, and state election offices don’t have the staff (or budget) to manually review such volumes.

Now Consider This Tertiary DMV Record

VOTER: George F Balston

YOB: 1951 D/L: 4801

13070 SW Karen Blvd Apt 7

Beaverton, OR 97005

Last voted: 2008

DECEASED PERSON: George Balston

YOB: 1951 SSN: 5598

DOD: 1995

DMV: George F Balston

YOB: 1951 SSN: 5598 D/L: 4801

3043 SW Clementine Blvd Apt 210

Beaverton, OR 97005

The DMV record contains enough features to match the voter record (name, year of birth and driver’s license) and/or the deceased person record (name, year of birth and SSN). For the sake of argument, let’s say it matches the voter best.

Features Accumulate

VOTER: George F Balston

YOB: 1951 D/L: 4801

13070 SW Karen Blvd Apt 7

Beaverton, OR 97005

Last voted: 2008

DMV: George F Balston

YOB: 1951 SSN: 5598 D/L: 4801

3043 SW Clementine Blvd Apt 210

Beaverton, OR 97005

DECEASED PERSON: George Balston

YOB: 1951 SSN: 5598

DOD: 1995

The voter/DMV record now shares a name, year of birth and SSN with the deceased person record. In voter matching best practices, this evidence would be sufficient to make a determination that this voter is in fact deceased. This case no longer needs human review.

Useful Insight Revealed!

VOTER: George F Balston

YOB: 1951 D/L: 4801

13070 SW Karen Blvd Apt 7

Beaverton, OR 97005

Last voted: 2008

DMV: George F Balston

YOB: 1951 SSN: 5598 D/L: 4801

3043 SW Clementine Blvd Apt 210

Beaverton, OR 97005

DECEASED PERSON: George Balston

YOB: 1951 SSN: 5598

DOD: 1995

As features accumulate, it becomes possible to resolve previously unresolvable identity records.

As events and transactions accumulate, detection of relevance improves.

Here we can see that George, who died in 1995, voted in 2008.
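The George Balston walkthrough can be sketched in a few lines. This is an illustrative toy, not the logic of InfoSphere Identity Insight: the rule (name and year of birth plus a shared SSN or D/L), the helper names and the dictionary layout are all assumptions. It also merges every matching entity in one step, rather than first picking the best match as the slides describe.

```python
from __future__ import annotations


def name_key(name: str) -> tuple[str, str]:
    """Compare on first and last name only (assumed normalization)."""
    parts = name.lower().split()
    return parts[0], parts[-1]


def identifiers(record: dict) -> set[tuple[str, str]]:
    return {(k, record[k]) for k in ("ssn", "dl") if k in record}


def same_entity(entity: list[dict], record: dict) -> bool:
    """Name and year of birth are not enough: also require a shared SSN
    or D/L with any record already accumulated in the entity."""
    names = {name_key(r["name"]) for r in entity}
    yobs = {r["yob"] for r in entity}
    ids = set().union(*(identifiers(r) for r in entity))
    return (name_key(record["name"]) in names
            and record["yob"] in yobs
            and bool(ids & identifiers(record)))


def resolve(records: list[dict]) -> list[list[dict]]:
    entities: list[list[dict]] = []
    for record in records:
        matched = [e for e in entities if same_entity(e, record)]
        unmatched = [e for e in entities if not same_entity(e, record)]
        merged = [record] + [r for e in matched for r in e]
        entities = unmatched + [merged]
    return entities


def voted_after_death(entity: list[dict]) -> bool:
    dod = [r["dod"] for r in entity if "dod" in r]
    voted = [r["last_voted"] for r in entity if "last_voted" in r]
    return bool(dod and voted) and max(voted) > min(dod)


voter = {"source": "voter", "name": "George F Balston", "yob": 1951,
         "dl": "4801", "last_voted": 2008}
deceased = {"source": "deceased", "name": "George Balston", "yob": 1951,
            "ssn": "5598", "dod": 1995}
dmv = {"source": "dmv", "name": "George F Balston", "yob": 1951,
       "ssn": "5598", "dl": "4801"}

before = resolve([voter, deceased])        # two entities: needs human review
after = resolve([voter, deceased, dmv])    # one entity: the DMV record bridges them
```

With only the voter and deceased records the sketch keeps two entities; once the DMV record arrives all three collapse into one, and `voted_after_death(after[0])` flags the 2008 vote.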

IBM InfoSphere Identity Insight V8

MoneyGram International

Enterprise Intelligence: One Plausible Journey

Sense and Respond

[Diagram labels: Observation Space, What you know, New Observations]

Sense and Respond

[Diagram labels: Observation Space, Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?]

Sense and Respond; Explore and Reflect

[Diagram labels: Observation Space, Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?, Directed Attention, Relevance Finds You, Deep Reflection, Curated Data, Pattern Discovery]

Sense and Respond; Explore and Reflect

[Diagram labels: Observation Space, Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?, Directed Attention, NEW INTERESTS, Deep Reflection, Curated Data, Pattern Discovery]

Sense and Respond; Explore and Reflect

[Diagram labels: Observation Space, Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?, Directed Attention, NEW INTERESTS, Deep Reflection, Curated Data, Pattern Discovery, Sensemaking]

[Products shown on the diagram: InfoSphere Streams, ILog, Netezza, SPSS, Watson, Cognos]

Sense and Respond; Explore and Reflect; Report and Manage

[Diagram labels: Observation Space, Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?, Directed Attention, NEW INTERESTS, Deep Reflection, Curated Data, Pattern Discovery]

Report and Manage

[Diagram labels: Data Finds Data, Relevance Finds the Sensor (<200ms), Decide ?, Directed Attention, NEW INTERESTS, Pattern Discovery, Info Management Systems, Content Management, Case Management, Data Warehousing]

Big Data Trends

The Greater the Context, the Greater the Value

[Chart. Axes: Value of Data and Records Managed (from Big to Ludicrous Big). Labels: Pile of Data, Data in Context]

Time Is Of The Essence

The better the predictions … the faster they will be wanted.

“Why did we have to wait until the end of the day for the smart answer?”

[Chart. Axes: Willingness to Wait (Day, Hour, 200ms) and Relevance (from Iffy to Totally). Labels: Batch, Real-Time]

Closing Thoughts

The most competitive organizations are going to make sense of what they are observing fast enough to do something about it while they are observing it.

Wish This On The Competitor

[Chart. Axes: Time and Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context, Enterprise Amnesia]

The Way Forward: Enterprise Intelligence

[Chart. Axes: Time and Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context]

Related Blog Posts

Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel

Puzzling: How Observations Are Accumulated Into Context

On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others

G2 | Sensemaking – One Year Birthday Today. Cognitive Basics Emerging.

Email: jeffjonas@us.ibm.com

Blog: www.jeffjonas.typepad.com

Twitter: http://www.twitter.com/jeffjonas

Questions?

Sensemaking on Streams

My G2 Secret Little IBM Project

3+ years in the making

G2 Mission Statement

1) Evaluate each new observation against previous observations.

2) Determine if what is being observed is relevant.

3) Deliver this actionable insight to its consumer … fast enough to do something about it while it is still happening.

4) Do this with sufficient accuracy and scale to really matter.

From Pixels to Pictures to Action

Observations

Data Finds Data

Persistent Context

Relevance Finds You

Consumer (an analyst, a system, the sensor itself, etc.)

This is G2

Uniquely G2

More scalable, faster and extensible

– Designed for grid compute and sub-200ms sense and respond

Smarter

– Tolerance for disagreement (no such thing as a single version of truth)

– Support for more abstract entities (e.g., locations, products, asteroids)

– Support for more exotic features (e.g., biometrics, social circles)

Crazy stuff

– Detects on its own when it is confused and makes a “note to self”

– Geospatial reasoning including a sense of here and now

Privacy by Design (PbD)

– More privacy and civil liberties enhancing features baked-in than any other commercial technology

PbD: Self-Correcting False Positives

John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984

John T Smith, 123 Main Street, 703 111-2000, DL: 009900991

A plausible claim these two people are the same

John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991

Until this record comes into view

Which reveals this is a FALSE POSITIVE

PbD: Self-Correcting False Positives

New Best Practice: FIXED IN REAL-TIME (not end of month)

John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984

John T Smith, 123 Main Street, 703 111-2000, DL: 009900991

John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991

John T Smith, 123 Main Street, 703 111-2000, DL: 009900991
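The John T Smith example can be sketched as a toy re-resolution step. Everything here is an assumption for illustration and not how G2 works internally: the feature weights, the `MATCH_THRESHOLD`, the rule that differing generational suffixes (Jr vs Sr) block a merge, and the shortcut of re-clustering all records from scratch whenever a new one arrives.

```python
from __future__ import annotations

from itertools import combinations

MATCH_THRESHOLD = 2  # hypothetical minimum score to assert "same person"
WEIGHTS = (("address", 1), ("phone", 1), ("dl", 3), ("dob", 3))  # hypothetical


def score(a: dict, b: dict) -> int:
    """Sum the weights of features both records carry with equal values."""
    if a["name"] != b["name"]:
        return 0
    return sum(w for f, w in WEIGHTS if f in a and f in b and a[f] == b[f])


def conflicts(cluster_a: list[dict], cluster_b: list[dict]) -> bool:
    """Two different generational suffixes cannot describe one person."""
    suffixes = {r["suffix"] for r in cluster_a + cluster_b if r.get("suffix")}
    return len(suffixes) > 1


def resolve(records: list[dict]) -> list[list[int]]:
    clusters = {r["id"]: [r] for r in records}
    # Strongest evidence first, so a shared D/L outranks a shared address.
    pairs = sorted(combinations(records, 2), key=lambda p: score(*p), reverse=True)
    for a, b in pairs:
        ca, cb = clusters[a["id"]], clusters[b["id"]]
        if ca is cb or score(a, b) < MATCH_THRESHOLD or conflicts(ca, cb):
            continue
        merged = ca + cb
        for r in merged:
            clusters[r["id"]] = merged
    unique = {id(c): c for c in clusters.values()}
    return sorted(sorted(r["id"] for r in c) for c in unique.values())


base = {"name": "John T Smith", "address": "123 Main Street", "phone": "703 111-2000"}
r1 = {"id": 1, "suffix": "Jr", "dob": "03/12/1984", **base}
r2 = {"id": 2, "suffix": None, "dl": "009900991", **base}
r3 = {"id": 3, "suffix": "Sr", "dl": "009900991", **base}

earlier = resolve([r1, r2])       # records 1 and 2 asserted as one person
later = resolve([r1, r2, r3])     # record 3 arrives and the assertion is reversed
```

Before the Sr record arrives, records 1 and 2 resolve together; afterwards record 2 sits with the Sr record, and the Jr record stands alone.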

Customer Facing Systems

Data Mining

Back-of-House Accounting Systems

Fraud

This System

That System

Sensemaking
