Upload
others
View
14
Download
1
Embed Size (px)
Citation preview
1 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
© 2009 IBM CorporationIBM Confidential
February, 2013
1© 2009 IBM Corporation
In Search of Answers: Watson and a Look
Forward
Frank SteinDirector of Analytics Solution CenterWashington, [email protected]
Leveraging Information for Smarter Organizational Outcomes
2 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Agenda
What is Watson?
Double Jeopardy: Where are we going with Watson?
Final Jeopardy: Watson at Work in Health Care
3 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Watson answers a grand challengeCan we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and
retrieving, analyzing and understanding vast amounts of information in real-time?
4 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Real Language is Real Hard
Chess• A finite, mathematically well-defined search space
• Limited number of moves and states
• Grounded in explicit, unambiguous mathematical rules
Human Language• Ambiguous, contextual and implicit
• Grounded only in human cognition• Seemingly infinite number of ways to express the same meaning
5 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
The hard part for Watson is understanding the Natural Language well enough to find and validate the correct answer in the #1 spot…Computing a confidence that it’s right…and doing it fast enough to compete on Jeopardy!
Where was Einstein born?One day, from among his city views of Ulm, Otto chose a watercolor to
send to Albert Einstein as a remembrance of Einstein´s birthplace.
Welch ran this?If leadership is an art then surely Jack Welch has proved himself a master
painter during his tenure at GE.
Structured
Unstructured
Natural Language Processing
6 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
The Jeopardy! Challenge: A compelling and notable way to drive and measure the technology of automatic Question Answering along 5 Key Dimensions
Broad/Open Broad/Open DomainDomain
Complex Complex LanguageLanguage
High High PrecisionPrecision
Accurate Accurate ConfidenceConfidence
HighHighSpeedSpeed
$800In cell division, mitosis
splits the nucleus & cytokinesis splits this liquid cushioning the
nucleus
$200If you're standing, it's the
direction you should look to check out the
wainscoting. $1000Of the 4 countries in the world that the U.S. does
not have diplomatic relations with, the one that’s farthest north
7 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
7
Broad Domain
Our Focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.
We do NOT attempt to anticipate all questions and build databases.
In a random sample of 20,000 questions we found2,500 distinct types*. The most frequent occurring <3% of the time.
The distribution has a very long tail.
And for each these types 1000’s of different things may be asked.
*13% are non-distinct (e.g, it, this, these or NA)
Even going for the head of the tail willbarely make a dent
We do NOT try to build a formal model of the world
8 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Automatic Learning for “Reading”
Officials Submit Resignations (.7)People earn degrees at schools (0.9)
Inventors patent inventions (.8)
Volumes of Text Volumes of Text Syntactic FramesSyntactic Frames Semantic FramesSemantic Frames
Vessels Sink (0.7)People sink 8-balls (0.5) (in pool/0.8)
subject verb object
Sentence
Parsing Generalization &
Statistical Aggregation
Fluid is a liquid (.6)Liquid is a fluid (.5)
9 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Medical journal concept annotations
Medications
SymptomsDiseases
Modifiers
10 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Keyword Evidence
IBM Confidential
celebrated
India
In May 1898
400th anniversary
arrival in
Portugal
India
In May
Garyexplorer
celebrated
anniversary
in Portugal
Keyword MatchingKeyword Matching
Keyword MatchingKeyword Matching
Keyword MatchingKeyword Matching
Keyword MatchingKeyword Matching
Keyword MatchingKeyword Matching
10
arrived in
In May, Gary arrived in India after he celebrated his anniversary in Portugal.
In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
This evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence
11 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
celebrated
May 1898 400th anniversary
arrival in
In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Portugallanded in
27th May 1498
Vasco da Gama
Temporal Reasoning
Statistical Paraphrasing
GeoSpatial Reasoning
explorer
On the 27th of May 1498, Vasco da Gama landed in Kappad Beach
Kappad Beach
Para- phrase
s
Geo- KB
DateMath
11
IndiaStronger evidence can be much harder to find and score. The evidence is still not 100% certain.
Search Far and Wide
Explore many hypotheses
Find Judge Evidence
Many inference algorithms
DeeperEvidence
12 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
In 1968
When "60 Minutes" premiered this man was U.S. president.
this man was U.S. president.
Lyndon B Johnson
?
12
Divide and Conquer (Typical in Final Jeopardy!) Must identify and solve
sub-questions from different sources to answer
the top level question
13 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Missing Links and Common Bonds: Identifying Hidden Associations
Cross
Your T’s,Your Legs, Rubicon
Category: Common Bonds: “Your legs, your T’s, the Rubicon”
Answer: Things you cross
Technical Approach:
• Semantic Networks• Lexical Collocation• Syntactic Relationships• Wikipedia Links
14 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Answer with Confidence
Question
. . .
Models
Evidence Sources
Models
Models
Models
Models
ModelsPrimarySearch
CandidateAnswer
Generation
HypothesisGeneration
Final Confidence Merging &
RankingSynthesis
Answer Sources
Question & Topic
Analysis
Deep Evidence Scoring
Learned Modelshelp combine and
weigh the Evidence
HypothesisGeneration
Hypothesis and Evidence Scoring
QuestionDecomposition
Hypothesis and Evidence Scoring
Answer Scoring
EvidenceRetrieval
The technology behind IBM Watson
How it Really Works with Content
15 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
DeepQA: The Technology Behind Watson Massively Parallel Probabilistic Evidence-Based Architecture
Generates and scores many hypotheses using a combination of 1000’s Natural Language Processing, Information Retrieval, Machine Learning and Reasoning Algorithms.
These gather, evaluate, weigh and balance different types of evidence to deliver the answer with the best support it can find.
. . .
Answer Scoring
Models
Answer & Confidence
Question
Evidence Sources
Models
Models
Models
Models
ModelsPrimarySearch
CandidateAnswer
Generation
HypothesisGeneration
Hypothesis and Evidence Scoring
Final Confidence Merging & Ranking
Synthesis
Answer Sources
Question & Topic
Analysis
EvidenceRetrieval
Deep Evidence Scoring
Learned Modelshelp combine and
weigh the Evidence
HypothesisGeneration
Hypothesis and Evidence Scoring
QuestionDecomposition
1000’s of Pieces of Evidence
Multiple Interpretations
100,000’s Scores from many Deep Analysis
Algorithms
100’s sources
100’s Possible Answers
Balance& Combine
16 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
16
The Best Human Performance: Our Analysis Reveals the Winner’s Cloud
Winning Human Performance
Winning Human Performance
2007 QA Computer System2007 QA Computer System
Grand Champion Human Performance
Grand Champion Human Performance
Each dot represents an actual historical human Jeopardy! game
More ConfidentMore Confident Less ConfidentLess Confident
17 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Baseline
12/2007
8/2008
5/2009
10/2009
11/2010
12/2008
DeepQA: Incremental Progress in Precision and Confidence 6/2007-11/2010
5/2008
4/2010
Prec
isio
n
18 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Key Technologies Used by Watson
Natural Language processing• Computational linguistics, grammar frameworks, lexical structure• Dictionaries (Wordnet) and Ontologies (semantic web)
Information Management• Knowledge representation, Search and information retrieval• Unstructured Information Management Architecture (UIMA)
Machine learning• Predictive Analytics, Logistic Regression• Dynamic learning
Distributed computing, massively parallel processing• At HW level: clusters, generally HPC• At File System level: high performance file system, MapReduce,
Hadoop
19 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Key Technologies Used by Watson
Natural Language processing• Computational linguistics, grammar frameworks, lexical structure• Dictionaries (Wordnet) and Ontologies (semantic web)
Information Management• Knowledge representation, Search and information retrieval• Unstructured Information Management (UIMA)
Machine learning• Predictive Analytics, Logistic Regression• Dynamic learning
Distributed computing, massively parallel processing• At HW level: clusters, generally HPC• At File System level: high performance file system, MapReduce,
Hadoop
IBM Content Analytics
20 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
90 x IBM Power 750 servers 2880 POWER7 coresPOWER7 3.55 GHz chip500 GBps on-chip bandwidth10 Gb Ethernet network15 Terabytes of memory20 Terabytes of Disk (clustered)80 Teraflops (Trillion Operations/sec)10 ‘refrigerator sized’ racks
Watson – a Workload Optimized System
21 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Double Jeopardy: Where are we going with Watson?
22 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Watson Enhancements
1. Multi-Dimensional Input and Pipeline
2. Dialoging with Humans
3. Providing Enhanced Evidence
4. Automatic Learning Technology
5. Decision Support over Big Data
6. Multimedia Analytics
7. Context Sensitive Q&A
23 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
23
Answer SetProblem ScenarioEvidence
Answer j
EvidenceEvidence
EvidenceEvidence
EvidenceEvidence
Question Evidence AnswerEvidenceEvidence
….
….….
….….
….….
….
Specific, short questions produce a narrow, targeted range of hypotheses.
Any subset of factors may provide some indication of an answer (e.g., Diagnosis or Treatment)
CombineCombineCombine
Much Harder – Multi-Dimensional Input and Multi-Dimensional Confidence Merging
Moving to Multi-Dimensional Input and Processing
Factor 1
Factor n
Answer i
Answer k
How to combine confidences across factors and factor sets
Evidence
Evidence
Which subsets of factors represent fruitful hypotheses and which may be sufficiently evidenced by the content? There are ~55K fields in an EMR – any subset may predict one or more of some several 100K diseases or conditions. And the clinicians will want to know WHY.
24 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
24
Medical Record
Q: What diagnosis explains the patient’s condition?
Present FactorsRed, painful eyeBlurred visionFamily history of arthritis
Dialoguing to an answer
The first symptom of Lyme disease (also called Lyme’s disease) for about 50% of people is a small, red bull’s-eye rash, called erythema migrans, at the site of an infected tick bite. … Other early, acute Lyme symptoms are flu-like – fatigue, achy muscles or joints, fever, chills, stiff neck, swollen glands, and a headache.
The first symptom of Lyme disease (also called Lyme’s disease) for about 50% of people is a small, red bull’s-eye rash, called erythema migrans, at the site of an infected tick bite. …Other early, acute Lyme symptoms are flu-like – fatigue, achy muscles or joints, fever, chills, stiff neck, swollen glands, and a headache.
Absent FactorsCircular rashFatigueHeadache
Lyme disease can affect different body systems, such as the nervous system, joints, skin, and heart. Symptoms are often described as happening in three stages (although not everyone experiences all three): 1.A circular rash, typically within 1-2 weeks of infection, often is the first sign of infection. … 2.Along with the rash, a person may have flu-like symptoms such as swollen lymph nodes, fatigue, headache, and muscle aches.
Lyme disease can affect different body systems, such as the nervous system, joints, skin, and heart. Symptoms are often described as happening in three stages (although not everyone experiences all three):1.A circular rash, typically within 1-2 weeks of infection, often is the first sign of infection. …2.Along with the rash, a person may have flu-like symptoms such as swollen lymph nodes, fatigue, headache, and muscle aches.
Lyme disease is caused by the bacterium Borrelia burgdorferi and is transmitted to humans through the bite of infected blacklegged ticks. Typical symptoms include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans.
Lyme disease is caused by the bacterium Borrelia burgdorferi and is transmitted to humans through the bite of infected blacklegged ticks. Typical symptoms include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans.
25 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
25
It’s all about the Evidence
26 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
26
Learning Technology generates questions to enhance understanding and acquire new knowledge
Automatically generates Learning Questions…
Does “contraindicates the use of” mean “should not use” in general?
These disorders can stop the nerves and muscles in your esophagus from working right. This can cause food to move slowly or even get stuck in the esophagus.
These disorders can stop the nerves and muscles in your esophagus from working right. This can cause food to move slowly or even get stuck in the esophagus.
Patients with preexisting seizure disorder should not use bupropion due to a higher-than-proportional increase in the possibility of seizure as the dose is increased.
Patients with preexisting seizure disorder should not use bupropion due to a higher-than-proportional increase in the possibility of seizure as the dose is increased.
A
Watson considers…What would have to be
true for seizure disorder to be correct?
What would have to be true for seizure
disorder to be correct?
Does contraindicates the use of bupropion mean should not use bupropion?
Q
What neurological condition contraindicates the use of
bupropion?
27 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Increasing Volume of Information, driven by Multimedia
1990’s 2020’s
Video
Text
Exa
Peta
Tera
Giga
Dat
a Vo
lum
e
2000’s 2010’s
Structured data
Audio
ImageMed
High
Low
Com
puta
tiona
l Nee
ds
Soph
istic
atio
n of
Ana
lysi
s
Expr
essi
vene
ss
Digital Marketing
10+% of video views
Wide Area Imagery
100’s TB per day72 video hrs/minute
Media
Source: IBM Market Insights based on composite sources
Safety / Security
Healthcare
Customer
1B camera phones
1B medical images/yr
10s millions cameras
Enterprise Video
Used by 1/3 of enterprises
Video
28 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
28
Efficient decision support over “big data” unstructured (and structured) content
Unstructured DataBroad, rich in contextRapidly growing, currentInvaluable yet under utilized
SQL/XQuery
Existing BI
Inference/Rules
Structured DataPrecise, explicit Narrow, expensive
Jeopardy! Challenge
Deeper Understanding but BrittleHigh Precision at High CostNarrow Limited Coverage
Shallow UnderstandingLow Precision
Broad Coverage
Deeper Understanding, Higher Precision and Broader,
Timely Coverage at lower costs
Key WordSearch
Relevance Ranking
Open-Domain
Question-Answering
29 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Multimedia is becoming more critical to business decision making today
More than 70% of Big Data is multimedia dataExpand value and insights from multimedia analyticsNeed to push boundaries in difficult technical domains to address needs of emerging users.
Enterprise Social Media
Customer
Huge Volumes of Multimedia Data
Inte
ract
ion
“Traditional” Data
Safety/Security Health Mobile Media
Semantic Understanding
Modeling
Feature Extraction
Data collection
UAVs
EvidenceSources
30 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Spatial-temporal data is analyzed to provide most appropriate insights
Adding Context To Improve Answers (Contextual Computing)
Smart phones and web browsers generate vast amounts of data about users and customers.
People freely share their sentiments, expertise, desires, and intentions via social media.
Open Data and Open APIs start delivering breakthroughs in access and integration.
Semantic technologies like RDF enable automatic data discovery and ingestion.
New databases emerge to store and manage web- scale graphs of contextual information.
Advanced sensors and analytics can be exploited for dramatic new insights and predictions.
Adaptive interactions and visualizations deliver value and insight to Markets of One.
SocialBusiness
SocialMedia
Open DataOpen Data
31 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Final Jeopardy: Watson in Health Care
32 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Healthcare faces some of the most complex information challenges
Medical information is doubling every 5 years, much of which is unstructured
81% of physicians report spending 5 hours or less per month reading medical journals
“Medicine has become too complex. Only about 20% of the knowledge clinicians use today is evidence-base.” Steven Shapiro, Chief Medical & Scientific Officer, UPMC
33 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
IBM Watson goes to work in healthcare
Approach:
Use Case: Utilization Management
• Objective review of requested procedure
• Actionable analysis based on relevant content, policies, and guidelines
• Ingestion of disparate data – clinical/medical, and patient data
• Confidence weighted insights
• Prompt for additional information
• Learn form actions taken
Goals:• Improved patient outcomes
• Evidence based review
• Accelerated processing of requests (e.g. treatment)
• Minimize human error
• Improve consistency in reviews
• Increase effectiveness and efficiency leading to cost savings
Supporting clinicians in the review and authorization of medical procedures and treatment towards achieving better patient care and outcomes
34 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Solution (proposed)
Proposed Use Case: Oncology Diagnosis & Treatment (ODT)
• Clinical support for patient assessment based on objective evidence – patient data, medical info, research studies, articles, best practices, guidelines, etc.
• Evidence panel identifying key information used to support diagnosis, recommendations (e.g. suggested tests) and treatment options
• Systematic applied learning based on action taken and outcome derived
• Initial focus on lung, breast, prostate and colorectal cancers
Goal• Create individualized cancer diagnostic and treatment plans
• Enhance clinical confidence with greater access to and understanding of information
• Speed up time to evidence- based treatment
• Reduce diagnostic and administrative errors
• Accelerate the dissemination of practice-changing research
Assisting physicians with the diagnosis and treatment of cancer
IBM Watson goes to work in healthcare
Note: This solution is not currently in production, and is subject to change in scope, as market conditions evolve
35 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Oncology Diagnosis and Treatment
36 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
IBM Watson Is delivered as a service accessible through the cloud
Questions Answers
Private Cloud
Hybrid IT
Public Cloud
IBM Platform
Metadata (e.g. Query Pattern, Data Models, Ontologies)
User
IBM Watson as a ServiceIBM Watson as a Service
Algorithms
Public Info
Algorithms
3rd Party Data
Algorithms
Client Info
Algorithms
IBM Data
37 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
CognitiveComputing
Knowledge Alchemy
We have just begun to build a new era of Knowledge Alchemy powered by Cognitive Computing
SQL
NoSQL
InformationKnowledge
ESB
A D A PT
Context
Decisions & Actions
Feedback & Learning
Data
Traditional Computing
38 © 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
Page 38 2/26/2013
[email protected]/ascdc