©IBM 2016 Please do not distribute without permission
Search++: Cognitive transformation of human-system interaction
Sridhar Sudarsan, Distinguished Engineer & CTO, IBM Watson Partnerships, @sridharsudarsan
October 14, 2016
The biggest taxi company owns no cars.
The largest accommodation company owns no real estate.
The biggest media company owns no content.
The largest retailer carries no inventory.
Disruption is upon us.
Oil & Gas: 80,000 sensors in a facility produce 15 petabytes of data
Public Safety: 520 terabytes of data are produced by New York City's surveillance cameras each day
Energy & Utilities: 680m+ smart meters will produce 280 petabytes of data by 2017
Healthcare: The equivalent of 300 million books of health-related data is produced per human in a lifetime
What is driving the need for Cognitive Computing?
[Chart: daily data volume in exabytes from 2000 to 2020 by source (Sensors & Devices, VoIP, Enterprise Data, Social Media), alongside the percent of uncertain data]
80% of data will be uncertain in 2020
This disruption is fueled by three forces.
The powerful capabilities and outcomes brought on by cognitive computing.
The ability to build business in code with the API economy.
The proliferation of different types of data.
More devices are creating more information.
1,200,000 lines of code in a smartphone
80,000 lines of code in a pacemaker
100,000,000 lines of code in a new car
5,000,000 lines of code in a smart appliance
Unstructured Data Explosion
Health data will grow 99%; 88% unstructured.
Insurance data will grow 94%; 84% unstructured.
Utilities data will grow 99%; 84% unstructured.
Manufacturing data will grow 99%; 82% unstructured.
80% of this data has been “invisible” to computers, and therefore useless to us.
Until now.
[Figure: a network of relationships linking genes, chemical compounds, diseases, patients, animal models, FDA Orange Book moieties, cells, patents, drugs, and plant biology]
Meaningful insights are only gained when data reveals a universe of relationships
Humans excel at: dilemmas, compassion, dreaming, abstraction, imagination, morals, generalization, common sense (but with many biases)
Cognitive systems excel at: natural language processing at scale, locating knowledge, pattern identification, machine learning, eliminating biases, providing endless capacity
Cognitive systems are creating a new partnership between humans and technology
So what are the characteristics of a cognitive system?
• Scale in proportion
• Provide supporting evidence
• Ingest a variety of (big) data
• Respond with a degree of confidence
• Learn with every interaction
• Offer contextual guidance & insights
• Generate & evaluate hypotheses
• Understand natural language
• Understand personality at a deeper level
• Relate terms & concepts to each other
• Engage in a dialog
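Generating hypotheses, weighing supporting evidence, and responding with a degree of confidence work together as one loop. The sketch below is a minimal illustration of that loop. It assumes each candidate answer already has numeric evidence scores. The feature names, the weights, and the softmax normalization are all hypothetical choices; they are not Watson's actual scoring pipeline.

```python
import math

# Illustrative evidence dimensions and weights (assumptions, not Watson's scorers).
WEIGHTS = {"passage_match": 2.0, "type_match": 1.0, "source_reliability": 0.5}


def evidence_score(evidence):
    """Weighted sum of the evidence scores gathered for one hypothesis."""
    return sum(WEIGHTS[name] * value for name, value in evidence.items())


def rank_with_confidence(hypotheses):
    """Score each hypothesis, then turn scores into confidences that sum to 1 (softmax)."""
    scores = {h: evidence_score(ev) for h, ev in hypotheses.items()}
    top = max(scores.values())
    # Subtracting the max score keeps math.exp from overflowing.
    exps = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exps.values())
    ranked = [(h, e / total) for h, e in exps.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranked = rank_with_confidence({
    "answer_a": {"passage_match": 0.9, "type_match": 1.0, "source_reliability": 0.8},
    "answer_b": {"passage_match": 0.4, "type_match": 1.0, "source_reliability": 0.9},
})
print(ranked)
```

A softmax is only one way to turn raw scores into confidences. A trained model could instead learn calibrated probabilities from labeled question/answer pairs.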
What is IBM Watson?
Cognitive Technology
• Reads & understands natural language
• Generates multiple hypotheses with evidence
• Supports several usage patterns
• A natural extension of what humans can do at their best
• Learns over time
Engagement, Discovery, Insights, Extraction, Platform
Cognitive Interactions
Today, cognitive computing broadly enables four classes of interactions:
Discovery
• Help find questions you’re not thinking to ask
• Connect the dots & uncover new pathways
• Lead to new inspiration
Exploration
• Connect information
• Identify correlations & insights
• Explore your problem area better
Engagement
• Understand, handle & fulfill intents
• Engage in a dialog with users
• Answer questions around products & services
Decision
• Evaluate a presented condition or a pattern
• Check against a set of written policy assertions
• Simplify decision making through cognitive visualization
Interact with a cognitive system
Today, Watson has grown into a rich and flexible API ecosystem…
…with more to come
Interacting with Watson
Speech to Text
Text to Speech
Conversation
AlchemyLanguage
Demonstration: Let’s look at how we use information retrieval in one of these services
Key steps to use Watson, with example Stack Exchange forum data
Content prep
• Format content
• Ingest content
Training & Test
• Split into training and test sets
• Configure custom scorers
Integrate experience
• Look up with Watson
• Watson responds to questions
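The content prep and training/test steps could look roughly like this in Python. This is a sketch under stated assumptions. The forum records are already extracted into dictionaries with hypothetical fields (`question`, `answer_id`, `answer_text`), not the Stack Exchange dump schema. A seeded random 80/20 split stands in for however the original demo divided the data.

```python
import random

# Hypothetical, already-extracted forum records; these field names are
# assumptions, not the schema of the Stack Exchange data dump.
posts = [
    {"question": f"question {i}", "answer_id": f"a{i}", "answer_text": f"answer text {i}"}
    for i in range(10)
]


def to_documents(records):
    """Format content: one index document per accepted answer."""
    return [{"id": r["answer_id"], "body": r["answer_text"]} for r in records]


def split_train_test(records, test_fraction=0.2, seed=7):
    """Shuffle question/answer pairs reproducibly and hold out a test set."""
    pairs = [(r["question"], r["answer_id"]) for r in records]
    random.Random(seed).shuffle(pairs)
    n_test = max(1, int(len(pairs) * test_fraction))
    return pairs[n_test:], pairs[:n_test]


docs = to_documents(posts)
train, test = split_train_test(posts)
print(len(docs), len(train), len(test))  # 10 8 2
```

Fixing the seed keeps the held-out questions stable. Scores from different scorer configurations then stay comparable.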
Watson Service versus Traditional Search
Results
• Recall@1 improvement of ~50% for ranked results
• Custom scorers based on user popularity show further improvements
Notes
• Out-of-the-box Solr and Retrieve & Rank configurations
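Recall@1 is commonly read as the share of test questions whose top-ranked result is a correct answer. The slide does not define it, so that reading is an assumption. The toy rankings below only illustrate the metric; they are not the demo's data or results.

```python
def recall_at_k(rankings, relevant, k=1):
    """Share of questions with at least one correct answer among the top-k results."""
    hits = sum(
        1 for question, results in rankings.items()
        if any(doc in relevant[question] for doc in results[:k])
    )
    return hits / len(rankings)


relevant = {"q1": {"d1"}, "q2": {"d2"}}
search_only = {"q1": ["d3", "d1"], "q2": ["d2", "d5"]}  # toy baseline ranking
reranked = {"q1": ["d1", "d3"], "q2": ["d2", "d5"]}     # toy re-ranked output
print(recall_at_k(search_only, relevant), recall_at_k(reranked, relevant))  # 0.5 1.0
```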
Let’s look at using information retrieval in a simple application
Steps of the Case Study
• Design thinking to define the problem
• Identifying our corpus
• Training
• Applying real project data
The Problem
• Sponsor users: project / risk management professionals
• Methodology
  - All design and development work is iterative
  - “Playbacks” are milestones to review original goals, review designs, and communicate current state
  - Playback Zero (Design & Architecture)
  - Hills Playback
• Hill 1: Identify & source documents
• Hill 2: Train Watson to identify risks and answer related questions
• Hill 3: Test Watson
• Hill 4: Integrate with application
Hill 1: Identify & Ingest documents
- Ingested “Identifying & Managing Project Risk”
- Populated a Solr index with the ingested documents (JSON)
- Trained the Watson Retrieve and Rank service
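Populating a Solr index from ingested text usually means splitting the text into passages and posting them as JSON to Solr's `/update` handler. This is a minimal sketch with several assumptions. It targets a plain Solr instance on its default port 8983 and a collection named `risk_docs` with `id`, `title`, and `body` fields; the document id, collection, and field names are all hypothetical. The Retrieve and Rank service wrapped Solr behind its own service URL and credentials, which this sketch does not model.

```python
import json
import urllib.request


def to_solr_docs(doc_id, title, text):
    """Split a document into paragraph passages, one Solr document per passage."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": f"{doc_id}-{i}", "title": title, "body": p}
        for i, p in enumerate(paragraphs)
    ]


def build_update_request(solr_url, collection, docs):
    """Build, but do not send, a POST of JSON documents to Solr's update handler."""
    url = f"{solr_url}/solr/{collection}/update?commit=true"
    data = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


docs = to_solr_docs("risk-book-ch1", "Identifying & Managing Project Risk",
                    "First paragraph.\n\nSecond paragraph.")
req = build_update_request("http://localhost:8983", "risk_docs", docs)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

Paragraph-sized passages are one reasonable granularity for answer retrieval. Sections or sentence windows are equally plausible choices.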
Hill 2: Train Watson
• Phase 1: Train to answer common questions on risks/risk management (short tail questions)
• Phase 2: Train to surface the most relevant answer from many answers (long tail questions)
[Figure: frequency of questions across the short tail and the long tail (axis labels: 100s, 100,000s)]
Short tail (Watson Conversation): Here Watson uses reasoning strategies that focus on the language and context of the question…
Long tail (Watson Retrieve and Rank): Here Watson uses reasoning strategies that focus on identifying the most appropriate answer…
Hill 2: Train short tail questions
- Goal: Provide a dialog-based interaction
- Map intents to user input (example: “I don’t have a SME, what should I do” -> risk_expert_availability)
- Model entities for the domain (example: SME: Domain Expert, Subject Expert)
- Create dialog flows to model the conversation
- # of intents: 7
- # of entities: 5
- # of dialog flows: 1
- # of training samples: 250
Watson Conversation Toolkit
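A toy matcher can illustrate the intent and entity modeling on this slide. Watson Conversation trains its own classifier from the example utterances. The synonym normalization plus Jaccard token overlap below is a deliberately simple substitute. Only `risk_expert_availability` and the SME synonyms come from the slide; every other intent and example phrase is hypothetical.

```python
import re

# Toy stand-in for a trained intent classifier. Only risk_expert_availability and
# the SME synonyms come from the slide; "risk_definition" and the other example
# phrases are hypothetical.
ENTITIES = {"sme": ["sme", "domain expert", "subject expert"]}
INTENTS = {
    "risk_expert_availability": [
        "I don't have a SME, what should I do",
        "no subject expert is available for this project",
    ],
    "risk_definition": ["what is a project risk", "define risk"],
}


def normalize(text):
    """Lowercase, map entity synonyms to the canonical entity, and tokenize."""
    text = text.lower()
    for canonical, synonyms in ENTITIES.items():
        for synonym in sorted(synonyms, key=len, reverse=True):
            text = re.sub(rf"\b{re.escape(synonym)}\b", canonical, text)
    return set(re.findall(r"[a-z_']+", text))


def classify(utterance):
    """Pick the intent whose closest example has the highest Jaccard token overlap."""
    tokens = normalize(utterance)

    def best_overlap(examples):
        return max(
            len(tokens & normalize(e)) / len(tokens | normalize(e)) for e in examples
        )

    return max(INTENTS, key=lambda intent: best_overlap(INTENTS[intent]))


print(classify("We have no domain expert, what should we do?"))
```

Normalizing synonyms before matching means "domain expert" and "SME" count as the same token. That mirrors why the slide models SME as an entity rather than listing every phrasing as a separate training sample.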
Hill 2: Train long tail questions
Goal: Identify relevant answers / solutions for risks and rank them
- Trained on 1,000 questions
- Ingested 363 documents
- Each question mapped to multiple answers from risk-related document sources
- Train a ranker
- Evaluate ranker accuracy using precision, recall, and F-measures
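Precision, recall, and F-measure for a ranker can be computed per question and then averaged. This sketch assumes each training question has a ground-truth set of relevant answer ids and compares the ranker's returned answers against that set. The ids and the choice of macro-averaging are illustrative assumptions. With `beta=1` the F-measure is F1.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F-beta for one question's returned answers."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_average(per_question):
    """Average (precision, recall, F) tuples over all evaluated questions."""
    n = len(per_question)
    return tuple(sum(scores[i] for scores in per_question) / n for i in range(3))


p, r, f = precision_recall_f(["a1", "a7", "a3", "a9", "a4"], {"a1", "a3", "a5"})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.4 0.667 0.5
```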
Hill 3: Test Watson
Trained on PMI Risk Manual (Tom Kendrick)
• True positives: 88% ("risk" sentences were classified as "risk")
• False positives: 30% ("no_risk" sentences were classified as "risk")
> classifyText(plainText="Documentation is available on the web and in print form", apiEndpoint, classifierId, nlcUsername, nlcPW)
none       risk
0.98875386 0.01124614
> classifyText(plainText="Always keep a detailed project log and allow the team to edit it", apiEndpoint, classifierId, nlcUsername, nlcPW)
risk       none
0.5552859  0.4447141
> classifyText(plainText="Poorly defined project scope may lead to confusion among the team", apiEndpoint, classifierId, nlcUsername, nlcPW)
risk       none
0.9836238  0.0163762
> classifyText(plainText="Always keep a detailed project log", apiEndpoint, classifierId, nlcUsername, nlcPW)
none       risk
0.94648106 0.05351894
Testing using the command line
100-question set
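For each sentence, the `classifyText` transcript prints one confidence per class. The predicted class is the one with the higher confidence. The sketch below takes that argmax and computes the true positive rate and false positive rate, treating `risk` as the positive class. The confidences are copied from the transcript. The gold labels are assumptions made for illustration, and `classifyText` itself is not reproduced here.

```python
def top_label(scores):
    """Return the class with the highest confidence from a {label: confidence} map."""
    return max(scores, key=scores.get)


def true_and_false_positive_rates(gold, predicted, positive="risk"):
    """TPR: share of positive sentences labeled positive; FPR: share of negatives labeled positive."""
    pos_preds = [p for g, p in zip(gold, predicted) if g == positive]
    neg_preds = [p for g, p in zip(gold, predicted) if g != positive]
    tpr = sum(p == positive for p in pos_preds) / len(pos_preds) if pos_preds else 0.0
    fpr = sum(p == positive for p in neg_preds) / len(neg_preds) if neg_preds else 0.0
    return tpr, fpr


# Confidences copied from the classifyText transcript; the gold labels are assumptions.
outputs = [
    {"none": 0.98875386, "risk": 0.01124614},
    {"risk": 0.5552859, "none": 0.4447141},
    {"risk": 0.9836238, "none": 0.0163762},
    {"none": 0.94648106, "risk": 0.05351894},
]
gold = ["none", "none", "risk", "none"]
predicted = [top_label(o) for o in outputs]
print(predicted, true_and_false_positive_rates(gold, predicted))
```

The second sentence is a borderline call at about 0.56. Requiring a minimum confidence before accepting "risk" is one way to trade recall for fewer false positives.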
Results
- Trained on the PMI book; tested on a project design deliverable document
- Around 10% of sentences were false negatives
- Suggested next steps:
  - Improve ground truth
  - Add more documents
  - Gather feedback from users
Hill 4: Integrate with UI (Slack)
[Architecture diagram. Components: application layer (REST), AlchemyLanguage, Conversation, Cloudant, Retrieve & Rank with a Solr index. Decision point: “Is long tail?” (No: Conversation; Yes: Retrieve & Rank)]
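The diagram's “Is long tail?” decision can be sketched as a small router. The slide does not say how the decision is made. Here a question is assumed to be short tail when Conversation recognizes an intent above a hypothetical confidence threshold of 0.7. Everything else falls through to Retrieve & Rank. The confidences passed in are hard-coded stand-ins for a Conversation response.

```python
def route(question, intent_confidences, threshold=0.7):
    """Send confidently recognized intents to Conversation, everything else to Retrieve & Rank."""
    if intent_confidences:
        intent, confidence = max(intent_confidences.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return ("conversation", intent)
    return ("retrieve_and_rank", question)


print(route("I don't have a SME, what should I do?", {"risk_expert_availability": 0.92}))
print(route("How do I handle vendor delays in phase 3?", {"risk_expert_availability": 0.21}))
```

Keeping the threshold as a parameter makes it easy to tune against the test question set.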
Results
Results of Case Study
• Obtained good results when trained on SOWs, RFPs and other documents containing risk-related content
• Various risk concepts and categories can be handled by a trained language model using Alchemy
• Ideally, have a representative set of end-user questions related to risk
Conclusions
• Watson can analyze and detect project risks when handling large proposals and other client deliverables
Future Research Ideas
[Chart: Earlier Risk Detection? Number of risks identified over project time (periods 1 to 9), cognitive vs. none]
[Charts: Normalized Rate of Detection? Jr PM, Sr PM, and Engineer, without cognitive vs. with cognitive]
• Other Training data sources
• More artifacts over time
• Training effort by experts
What’s Next for Cognitive Computing (and Watson)?
The Power to “See”
Image analysis and anomaly detection, including radiological interpretation
Humanoid Interactions
Robotics form factor and humanoid gestures, inputs and outputs
Neuromorphic Computing
SyNAPSE: Neurosynaptic Systems. A brain-inspired chip to transform mobility and the Internet of Things through sensory perception
Watson Robotics: Empowering human-machine interaction
Experiments on integrating Watson with Aldebaran NAO robots (http://www.aldebaran.com/en)
• Anthropomorphic animation
• Vocal/auditory interactions
• Responses augmented with anatomical gesturing to punctuate key points
In 10 years, cognitive systems will be to computing what transaction processing is today
• Amplify human creativity
• Inspire us with new alternatives to decision options
• Bring the breadth of all human knowledge to the tip of our tongue
• Learn their behavior through formal and informal training processes
• Interact with humans on our terms, in the language of humans
• Demonstrate their expertise through trust and depth of character
• Evolve strategies of success, adapting to ever-changing knowledge and understanding
• Establish transformative relationships between humans and machines