Cognitive Computing: Reasoning By Similarity With Associative Memories
Paul Hofmann, PhD, CTO Saffron Technology October 2013
Cognitive Computing: Reasoning By Analogy
Cognitive Distance Is The Universal Measure for Reasoning By Similarity
Regularity <-> Randomness Signal <-> Noise
Distance between Strings
Cognitive Distance based on Kolmogorov Complexity:
$$CD \sim \frac{\max\{\log(x), \log(y)\} - \log(x,y)}{\log N - \min\{\log(x), \log(y)\}}$$
→ the saddle is closer to the cowboy
Cilibrasi, Rudi L.; Vitanyi, Paul M.B. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No 3, March 2007, 370–383.
Hit counts: x = 131M; “saddle” y = 87M; “movie” y = 1,890M; xy = 73M; xy = 8M
What is closer to cowboy? 1. saddle or 2. movie
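The comparison can be sketched in a few lines of Python. This is only an illustration of the distance formula cited from Cilibrasi and Vitanyi, not Saffron's implementation. Several inputs are assumptions: that x = 131M is the count for "cowboy", the value of N (the slide does not give it), and the pairing of the two co-occurrence counts with the two candidate words (the slide does not say which is which, so both pairings are tried).

```python
import math

def cognitive_distance(fx, fy, fxy, n):
    """Distance from hit counts: fx, fy are single-term counts,
    fxy is the co-occurrence count, n is the total number of pages."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

M = 1_000_000
N = 50_000 * M              # ASSUMPTION: total indexed pages, not on the slide
cowboy = 131 * M            # ASSUMPTION: x = 131M is the count for "cowboy"
saddle, movie = 87 * M, 1_890 * M

# ASSUMPTION: the pairing of xy = 73M and xy = 8M is unknown, so try both.
results = []
for saddle_xy, movie_xy in [(73 * M, 8 * M), (8 * M, 73 * M)]:
    d_saddle = cognitive_distance(cowboy, saddle, saddle_xy, N)
    d_movie = cognitive_distance(cowboy, movie, movie_xy, N)
    results.append((d_saddle, d_movie))
    print(f"saddle={d_saddle:.3f}  movie={d_movie:.3f}")
```

Under these assumptions the smaller distance belongs to "saddle" for either pairing, which agrees with the slide's conclusion.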
Pairwise Similarity Is Not Enough
Cognition Is About Context -> Semantic Triples
Context Matters
The Power Of Dependencies
Triple Store Reduces Noise
XOR Problem – X, Y, Z
Two random sources X and Y are made dependent by output Z
Employment Status (X)
Buys a new car with cash (Y)
Criminal? (Z)
Pairwise correlations are very noisy
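A minimal sketch of the XOR case, assuming X and Y are independent fair bits and Z = X XOR Y (for example, X = employment status, Y = buys a new car with cash, Z = criminal?). It shows that every pairwise correlation is zero while the triple is fully deterministic. The helper name `correlation` is introduced here for illustration.

```python
from itertools import product

# Truth table of Z = X XOR Y; all four rows are equally likely.
rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def correlation(a, b):
    """Pearson correlation of two equally weighted 0/1 columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    var_a = sum((p - ma) ** 2 for p in a) / n
    var_b = sum((q - mb) ** 2 for q in b) / n
    return cov / (var_a * var_b) ** 0.5

X, Y, Z = (list(col) for col in zip(*rows))
pairwise = {
    "XY": correlation(X, Y),
    "XZ": correlation(X, Z),
    "YZ": correlation(Y, Z),
}

# Looking at the triple instead: (X, Y) determines Z with no uncertainty.
triple = {(x, y): z for x, y, z in rows}

print(pairwise)   # every pair is uncorrelated
print(triple)     # the triple carries all the information
```

Neither X nor Y alone says anything about Z; only the combination does, which is the argument for storing triples rather than pairs.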
What is this an image of?
10/1/13 8 Saffron Technology, Inc. All Rights Reserved.
It’s close to this…
The Power of Dependencies → where the value is
NoSQL - Associative Memories Are Truly Asynchronous Computing
Connections and counts = synapses and strengths
Hopfield Network – Emerging patterns
Ising Model for order → disorder phase transition, e.g. ferromagnetism
Weights are deterministic → parameter free
[Figure: three-node Hopfield network with nodes A, B, C and edge weights +1, -1, -1, shown in three states. Training vectors ABC: 101, 011, 100. Annotations: “B same as on”, “B diff from off”.]
[Figure: network of binary nodes, each with state {0,1}, connected by weights w0 to w13.]
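The three-node example can be reproduced with a short sketch. It assumes a plain Hebbian rule on the training vectors 101, 011, 100 (the slide does not state the exact rule) and a fixed update order A, B, C for the asynchronous recall. Function and variable names are illustrative.

```python
# Training vectors over nodes (A, B, C), as listed on the slide.
patterns = [(1, 0, 1), (0, 1, 1), (1, 0, 0)]
bipolar = [[2 * bit - 1 for bit in p] for p in patterns]   # {0,1} -> {-1,+1}
n = 3

# ASSUMPTION: Hebbian rule. Each weight sums pairwise agreements over the
# patterns, so it is fully determined by the data (no tunable parameters).
W = [[0 if i == j else sum(p[i] * p[j] for p in bipolar) for j in range(n)]
     for i in range(n)]

def sign(v):
    return (v > 0) - (v < 0)

signs = {"AB": sign(W[0][1]), "AC": sign(W[0][2]), "BC": sign(W[1][2])}

def recall(state, weights, sweeps=5):
    """Asynchronous recall: update one node at a time, in order A, B, C."""
    s = [2 * bit - 1 for bit in state]
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(w * v for w, v in zip(weights[i], s))
            if field != 0:
                s[i] = 1 if field > 0 else -1
    return [(v + 1) // 2 for v in s]

print(signs)
print(recall([1, 0, 1], W), recall([0, 1, 1], W))
```

With this rule the B-C weight comes out positive and both weights on A come out negative, consistent with the +1, -1, -1 labels in the figure.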
Associative Memories - Asynchronous Computing
Large Scale Machine Learning on Sparse Matrices
Why is this so special?
• No static model → parameter free, non-linear & instant incremental learning
• Combines graph & statistics
• Input vector of millions of attributes
• Saffron stores & queries billions of triples
Example sentence: John Smith flew to London on 14 Jan 2009 aboard United Airlines to meet with Prime Minister for 2 hours on a rainy day.
Extracted attributes (category: value): refid: 1234; place: London; person: John Smith; person: Prime Minister; organization: United Airlines; time: 14-Jan-09; verb: flew; verb: meet; keyword: rainy; keyword: day; keyword: aboard; duration: 2 hours
[Figure: one associative-memory matrix per entity, shown for Organization United Airlines, Person Prime Minister, and Place London. Rows and columns of each matrix are the remaining attributes of the sentence; every off-diagonal cell holds a count of 1.]
• ETL - unify structured & un-structured data, entity extraction, and build semantic graph with counts on edges
• Reason by similarity using cognitive distance for all triples
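The matrix-per-entity idea can be sketched as a triple store with counts. This is a simplified illustration, not SaffronMemoryBase itself: the `observe` function and the nested-dictionary layout are assumptions made here, and the attribute list is taken from the example sentence about John Smith.

```python
from collections import defaultdict
from itertools import permutations

# Attributes extracted from the example sentence, as (category, value).
attributes = [
    ("refid", "1234"), ("place", "London"), ("person", "John Smith"),
    ("person", "Prime Minister"), ("organization", "United Airlines"),
    ("time", "14-Jan-09"), ("verb", "flew"), ("verb", "meet"),
    ("keyword", "rainy"), ("keyword", "day"), ("keyword", "aboard"),
    ("duration", "2 hours"),
]

# memories[entity][(row, col)] -> count: one sparse matrix per entity,
# so each stored cell is a triple (entity, row, col) with a count.
memories = defaultdict(lambda: defaultdict(int))

def observe(attrs):
    """Incrementally add one record: each entity's memory counts every
    ordered pair of the other attributes it co-occurred with."""
    for entity in attrs:
        others = [a for a in attrs if a != entity]
        for row, col in permutations(others, 2):
            memories[entity][(row, col)] += 1

observe(attributes)
united = memories[("organization", "United Airlines")]
print(len(united))   # 11 rows x 10 off-diagonal cells
```

Observing a second record only increments counters, which is what makes learning incremental: there is no model to refit.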
The 4C’s Of Cognitive Computing
Patterns: Connections, Counts, Context and Concepts in Hybrid Data
Real-time learning about entities in data, their connections, their frequencies
Non-linear, parameter free, no rules, or modeling required
Who/what is related? How? Where? When?
Who/what is similar? How similar/different?
What could happen? Where? When?
What has been done before? Did it work?
SENSE-MAKING
DECISION-MAKING
Data sources: Google, Twitter, RSS, Facebook, social networks, stocks, email, databases, Excel, Word, PDF
Reasoning by Similarity
Predictive Maintenance
I remember this feeling, it means…
Temperature was 100F+ with high winds
I remember Tail #123 reported a similar issue.
The last time the pilot reported this problem…
RESULT: 100% recall, 1% false alarms; up from 63% recall, 18% false alarms
Data Sources: structured and unstructured, maintenance records, purchase orders, work orders, everything that speaks to these issues
Predict which part will break, avoiding unplanned downtime or premature maintenance caused by category-based (not asset-based) maintenance. Use Saffron’s experience-based learning and reasoning to analyze the unique characteristics of an aircraft’s life to determine its maintenance schedule: unify a pilot’s intuition and sensory recall with the aircraft’s complete maintenance history, the aircraft’s flight routing and experience, and mechanics’ observations and experiences.
[Chart: Saffron CBM vs. Prior CBM, scale 0% to 100%. Saffron CBM: 100% hits, 1% false alarms.]
Early Warning System
Strategic Early Warning System – Igor Ansoff
Scan environment to detect weak signals & rare events to predict surprises
• Learn model
• Score threads in real time using SMB triples of incoming emails.
Real-time decision on whether a person contacting The Bill & Melinda Gates Foundation poses a threat. Real-time threat scoring system for people and groups, based on a dynamic incremental machine learning approach over unified Web, enterprise data, email, streams, and other metadata.
Incidence Reporting Metadata + E-mails
Harvested Web Pages (Terabytes & growing)
Structured and Unstructured Data
Personalized Pattern Recognition In Healthcare
The difficulty of this case
– Only 15 patients for each of 3 conditions
– 90 metrics, 6 locations, 20 time frames
– We observe & score 10,000 attributes/beat*patient -> 100 million triples/beat*patient
Sengupta, Partho P. Intelligent Platforms for Disease Assessment: Novel Approaches in Functional Echocardiography. Feigenbaum Lecture, American Society of Echocardiography, July 2013.
90% Accuracy
76% human
54% with R, C-tree
Even better: More data, more power
Higher-level hypermatrix
Automate echocardiogram diagnoses while keeping human accuracy. Using traditional statistics has failed so far.
Use Saffron’s non-parametric incremental learning approach to predict the three different conditions from the echocardiogram.
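To make "non-parametric incremental learning" concrete, here is a toy sketch of a count-based memory that classifies by overlap with previously seen attributes. It is not Saffron's algorithm. The class name `CountMemory`, the scoring rule, the attribute strings, and the condition labels are all invented for illustration.

```python
from collections import Counter, defaultdict

class CountMemory:
    """Per-label attribute counts; learning only increments counters."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # label -> attribute -> count
        self.seen = Counter()                # label -> number of examples

    def learn(self, attributes, label):
        self.counts[label].update(set(attributes))
        self.seen[label] += 1

    def classify(self, attributes):
        """Pick the label whose examples most often showed the query's
        attributes (sum of per-attribute frequencies within the label)."""
        def score(label):
            return sum(self.counts[label][a] / self.seen[label]
                       for a in set(attributes))
        return max(self.seen, key=score)

# HYPOTHETICAL observations; attribute names are made up.
memory = CountMemory()
memory.learn({"strain:low", "twist:high"}, "condition_A")
memory.learn({"strain:low", "twist:low"}, "condition_B")
memory.learn({"strain:high", "twist:low"}, "condition_C")

print(memory.classify({"strain:low", "twist:high"}))
print(memory.classify({"strain:high"}))
```

New patients are added with another `learn` call, so the memory updates instantly and no model has to be retrained.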
Watch The Video With Dr. Sengupta
Part 1 http://www.youtube.com/watch?v=rGkyDkDmZts
10:30 - nice Big data setup
12:30 - 14:00 Intelligent Computing
Part 2 http://www.youtube.com/watch?v=SAby6-tMvng
4:40 - Look inside the dataset as a matrix
5:30 - Saffron <<< here it is
6:16 - Associative Memory Reasoning
7:17 - heat map where I can see a pattern
7:56 - 8:26 compare patterns and accuracy of 89.6%
8:51 - 9:07 need to do pattern recognition for intelligent assessment
Twitter @paul_hofmann
Email [email protected]
Homepage www.paulhofmann.net
Blog www.paulhofmann.net/blog
Slide Share www.slideshare.com/paulhofmann
LinkedIn www.linkedin.com/in/hofmannpaul
Technical Positioning
• Our claim to true Cognitive Computing
– Many others assume AI logic over “facts”
– In contrast, see Hofstadter’s Surfaces and Essences: Analogy as the Fuel and Fire of Thinking
– We reason by similarity using “Cognitive Distance”
• Definition of truly “associative”
– Associations as connections (Qliktech, SAP, RDF, …)
– Associations as dependencies (a statistical meaning)
– The first is getting crowded, but we can do both
Idealized Graph Assumption
Assumed: small degree → easy to partition
But natural graphs: many high-degree vertices (power-law degree distribution) → very hard to partition
Saffron’s 3rd Generation Cognitive Computing Platform
Saffron: Advantage Market – Consumer Intelligence
OEM Partner and Customer Applications
Analytic Reasoning REST APIs: ANALOGIES, CONNECTIONS, CLASSIFICATIONS, EPISODIC PATTERNS, TEMPORAL TRENDS, and CUSTOMER DEFINED
SaffronMemoryBase: ENTITY CONNECTIONS, COUNTS AND CONTEXT SPACES, MEMORIES, MATRICES, ROWS, COLUMNS
SaffronAdmin: DATA INGESTION TEMPLATES FOR STRUCTURED DATA, Entity Extraction
3rd Party NLP Tools for Text Analysis
Structured, Unstructured, Streaming, Other
Layer labels: INSIGHT, REASONING, KNOWLEDGE, INGESTION, DATA
Open Platform with Adapters for 3rd Party Tools
Saffron Proprietary
Today
Open API Connectors
Drive Adoption
Collect and Harvest
• Unstructured Content
• Semi-Structured Data
• Structured Data
Ingest and Unify Hybrid Data
• Data Dictionaries
• Ontologies
• Name Lists
• Special parsers for disparate data types
• Connector for SAP Text
Store and Analyze
• Connections & Counts
• Semantic Analysis
• Statistical Analysis
• Clustering & Pattern
• Prediction
View and Report
• SaffronAdvantage™
• Trends & Episodes
• Emerging Patterns
• Episodic Patterns
• Prediction
Saffron API Connectors
User Experience
Statistics & Big Data Stores
Natural Language & Sentiment Processing
Social & Open Source Content
Reasoning
Public Data
• Premium Web Content
• Deep Web
• Industry News
• Blogs – Industry, Financial, more
• Social Media
• Open Source News
• Competitors Sites
Knowledge Store
• Matrices, Rows, Columns
• Entity Memories
• Enterprise Stores
• Connections, Counts, Context
• Specialized Stores
Early Signals & Emerging Trends
• Velocity
• Threat Scoring (or Opportunity)
• New & Interesting
• Export to 3rd Party Tools
Private Data
• Database
• Telephone/Email
• PDF, Word Documents
• Marketing Data
• Spreadsheets
• Production Data
• Calendars
• ERP - SCM
• CRM
• Data Warehouse
Content Management
• Temporal & Geo Tagging
• Entities, Attribute Extraction
• Custom Extractors
• Export to 3rd Party Tools
• VOC
• Sentiments
• Event Extractor
Saffron Admin™
Saffron Memory Base®
Saffron Advantage™
Culture
• Cloud Hosting
• Saffron Community Blog
• Change Management
• Strategic Industry Alliances
• Risk Management Process Design
• Reporting Customizations
• Analytic Product Support
• SenseMaking Education
• Custom Harvesting
• User Guides
• Public Content Management
Insight
• Social Listening
• Brand & Reputation
• Spatial Associations
• Connection (graph) Analysis
• Customer 360° View
• Distribution Intelligence
• Temporal Association Trends
• Competitive Intelligence
• Similarity Analysis
• Sales Intelligence
• Episodes, Repeating Patterns
• Contextual Discovery & Diagnostics
SenseMaking Culture
Platform Approach to Intelligence
COMPETITIVE LANDSCAPE
Competitive AND Complementary Positioning
Semantic Stores
Associative Memories
Statistical Packages
Data Visualization
COMPLEMENT UNIFY
• Complements
– Efficient connection store for “higher” AI to extract more formal relationships and control memory with business rules
– Massive frequency store for any “flavor” of statistics, including use of Saffron to quickly discover and build more traditional models
• Architectures
– Saffron methods of partitioning and compression fit well with column-oriented infrastructure as one storage solution for both data and memories
– Saffron is additive, not replacing data stores, but adding a memory base to a polymorphic architecture
• Visualizations
– Saffron APIs add intelligence to existing business applications and data-oriented visualizations, providing smarter, faster access for business owners and operational users
Saffron vs. Selected Competitors
• Advanced faceted search. No complete semantic graph and no counts for advanced statistics.
• Hybrid logic and statistics. No incremental ingestion/learning; the real world is not a Jeopardy question.
• Symbolics with add-on statistics. No unified representation, only a “bolt on” of traditional statistics.
• Lead in statistics, some text analytics. No semantic graph for hybrid analytics in combination.
• Associative “experience” GUI but not an associative store. No complete graph, no count statistics.
• Manual link construction in GUI. No automated intelligence, no deep analytics.
• Biologically inspired AI toolkit. Nascent and unclear mix of traditional methods and new claims.
Saffron’s Unique Capabilities v. MapReduce
MapReduce | Saffron MemoryBase
Distributed batch processing | Distributed real-time transactions
New attributes -> code change | Real-time update: no manual effort
Low level assembler-like API | High level, declarative API -> no programming
Generic framework | Optimized solution for advanced analytics
Saffron’s Unique Capabilities vs. RDBMS
RDBMS | Saffron (Matrices)
Table joins for semantics | Pre-joined matrices
Predefined schema | Schema-less
Limited keys & sorting joins | Everything is a key, globally sorted
No natural partitioning | Shared-nothing parallelism
Structured data is fact-based | Knowledge is more exploitable
Nearest-neighbor is infeasible | Nearest-neighbor is instant
WHAT DO ANALYSTS SAY ABOUT SAFFRON?
HYBRID
Converging Analytics of Hybrid of Data
STRUCTURED
CONTENT
Source: Gartner
Analytic Processes: Descriptive, Diagnostic, Predictive, Prescriptive
Market-Driven Convergence
It’s The Dawning Of The Age Of BI DBMS
Disparate data
Quickly changing requirements
“Use associative when you can’t predict the future but need to prepare for anything”, 2011
Associative index
Component | Hadoop Apache Projects | Commercial distribution or Hadoop integration
App dev / scripting | Pig, Cascading, WebHDFS |
Integrate and transform | Pig, Sqoop, Flume |
DBMS SQL | Hive, Derby |
DBMS NoSQL | Cassandra, HBase |
View on BI Ecosystem