MSIM 111 Session 5 (IBM Watson by Armen Pischdotchian)

2015 IBM Corporation

IBM Watson and Old Dominion University

Watson from DeepQA to Deep Learning

By: Armen Pischdotchian


Agenda
About cognitive systems
The statistics behind DeepQA
The DeepQA Pipeline in Detail
From DeepQA to Deep Learning


About Cognitive Systems


What is common among cognitive systems

The three L's:
Language: are you leveraging an NLP stack?
Levels: do you score or rank returned responses?
Learning: do you employ machine learning technologies?

Coming soon to the three L's is the fourth L, Limbs: robotics


Natural Language Processing Challenges


Deterministic vs. Probabilistic Systems


Linear Regression

Logistic Regression


NLP terminology


When recall is more important than precision

5 relevant documents (red fish)

5 irrelevant documents (blue fish)

The search has retrieved 3 relevant documents out of a total of 5 relevant documents from the corpus and 1 irrelevant document.

Recall = 3 / 5 = 0.6

Precision = 3 / 4 = 0.75 (the blue fish that were not caught are not part of either equation).
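The arithmetic on this slide can be checked with a minimal sketch. The function name and argument names are illustrative assumptions, not part of any Watson API; the numbers are the ones from the fish example.

```python
def precision_recall(retrieved_relevant, retrieved_irrelevant, total_relevant):
    """Return (precision, recall) for one search result set."""
    retrieved = retrieved_relevant + retrieved_irrelevant
    # Precision: share of retrieved documents that are relevant.
    precision = retrieved_relevant / retrieved if retrieved else 0.0
    # Recall: share of all relevant documents that were retrieved.
    recall = retrieved_relevant / total_relevant if total_relevant else 0.0
    return precision, recall


# 3 red fish and 1 blue fish in the net, 5 red fish in total.
precision, recall = precision_recall(3, 1, 5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that the count of irrelevant documents left unretrieved never appears as an input, which is why it does not affect either measure.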

These images are from www.lucidata.inc


The case of 100% recall and low precision

5 relevant documents (red fish)


In Watson Discovery Advisor, this is the preferred scenario even though there may be some irrelevant documents with a high score.

The algorithm team will then work on increasing the precision of this system.

What would be the preferred outcome for the Watson Engagement Advisor?


The case of 100% precision and low recall

5 relevant documents (red fish)


Zero false positives, 100% precision: no blue fish in the net

But there are many false negatives: many red fish in the sea

There are potentially many relevant documents that we will never consider. Perfect precision with poor recall is of no value to a DeepQA system.

These images are from www.lucidata.inc


Precision and accuracy in Jeopardy!


Stage 2: Hypothesis Generation

Precision vs. percentage attempted

Copyright 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602
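The chart itself did not survive extraction, but the relationship it names can be sketched. This is a minimal illustration, assuming the system attempts only its most confident questions first; the function name and the list of (confidence, correct) pairs are hypothetical and are not Jeopardy! data.

```python
def precision_at_attempted(results, fraction):
    """Precision over the most confident `fraction` of questions.

    results: list of (confidence, is_correct) pairs.
    fraction: share of questions attempted, in (0, 1].
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    attempted_count = max(1, round(len(ranked) * fraction))
    attempted = ranked[:attempted_count]
    correct = sum(1 for _, is_correct in attempted if is_correct)
    return correct / attempted_count


# Hypothetical results, invented for illustration only.
results = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.60, True),
    (0.50, False), (0.40, True), (0.30, False), (0.20, False), (0.10, False),
]

for fraction in (0.2, 0.5, 1.0):
    print(fraction, precision_at_attempted(results, fraction))
```

With data like this, precision falls as more questions are attempted, which is the trade-off a confidence threshold controls.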


Search Engine vs. Question Answering System

A QA system demands more processing from the system and less analysis from the user compared to a search engine.


The DeepQA Pipeline


An example Jeopardy! question

IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE

Question Analysis

Keywords: 1698, comet, paramour, pink
AnswerType(comet discoverer)
Date(1698)
Took(discoverer, ship)
Called(ship, Paramour Pink)

Primary Search

Related Content (Structured & Unstructured)

Candidate Answer Generation

Wilhelm Tempel
HMS Paramour
Isaac Newton
Halley's Comet
Pink Panther
Christiaan Huygens
Peter Sellers
Edmond Halley

Evidence Retrieval

Evidence Scoring

Spatial, Temporal, Lexical, Taxonomic

[0.58 0 -1.3 0.97]
[0.71 1 13.4 0.72]
[0.12 0 2.0 0.40]
[0.84 1 10.6 0.21]
[0.33 0 6.3 0.83]
[0.21 1 11.1 0.92]
[0.91 0 -8.2 0.61]
[0.91 0 -1.7 0.60]

Merging & Ranking (Models)

1) Edmond Halley (0.85)
2) Christiaan Huygens (0.20)
3) Peter Sellers (0.05)


How Watson responds to a Question

[Figure: pipeline diagram. Components shown: Question; Question Analysis; Primary Search (Wikipedia, etc.); Candidate Answer Generation; Answer Scoring; Evidence Retrieval; Contextual Answer Scoring; Final Merging and Ranking (Trained Models); Answer, Confidence, Evidence.]


Question Analysis (QA) Overview

What is Question Analysis?

Question Analysis is the first stage in the Watson pipeline
Ultimate goal: Understand what is being asked

Various algorithms and technologies identify as much as possible about the input question:
Named Entity Detection
Natural Language Processing (NLP)
Shallow and Deep Semantic Relation Detection

All downstream components rely on the annotations produced by QA


Stage 1: Question Analysis

Question analysis technologies include:
Part-of-speech parsing technology
Named Entity Detection
Relation Extraction
Inverse Document Frequency (IDF)


Stage 2: Hypothesis Generation


Stage 2: Hypothesis Generation - Primary search

Question: Who is the 44th President of the United States?

Keywords: 44th President United States


Stage 2: Hypothesis Generation - Candidate Answer Generation

Candidate answers:
Barack Obama
George W. Bush
Harvard Law School
Illinois
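The primary search slide reduces the question to the keywords "44th President United States". A minimal sketch of one way to get there is plain stop-word removal. This is an assumption for illustration, not Watson's actual question analysis; the stop-word list and function name are hypothetical.

```python
import re

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"who", "what", "is", "the", "of", "a", "an", "in", "on"}


def extract_keywords(question):
    """Split the question into word tokens and drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [token for token in tokens if token.lower() not in STOP_WORDS]


keywords = extract_keywords("Who is the 44th President of the United States?")
print(keywords)  # ['44th', 'President', 'United', 'States']
```

The resulting keywords would then be passed to a search engine over the corpus, and titles or passages in the results become candidate answers.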


Stage 3: Hypothesis Scoring

What is Hypothesis Scoring?

An enumeration of annotators responsible for scoring previously generated candidate answers

The results produced by these scorers are ranked by the Merging and Ranking components to produce a ranked list of answers.

Outcome: a confidence level for a generated hypothesis
Scorers can produce results in any (reasonable) range
In the final merging step, scorers are normalized according to how well their scoring heuristic correlates to the correct answer
Normalized to [0..1] in final merging
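Because scorers can report values in any range, their outputs need to be brought onto a common [0..1] scale before merging. The sketch below uses plain min-max rescaling as a stand-in. It does not model the correlation-based weighting the slide describes, and the raw scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a list of raw scorer outputs to the range [0, 1]."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # All candidates scored the same, so the scorer carries no signal.
        return [0.0 for _ in scores]
    return [(score - lowest) / (highest - lowest) for score in scores]


# Hypothetical raw outputs of one scorer for three candidate answers.
raw_scores = [-2.0, 0.0, 6.0]
normalized = min_max_normalize(raw_scores)
print(normalized)  # [0.0, 0.25, 1.0]
```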


Hypothesis Scoring - components

[Figure: the pipeline from Question through Question/Topic Analysis, Hypothesis Generation, Hypothesis & Evidence Scoring, and Final Merging & Ranking (with Trained Models) to Answer, Confidence, and Evidence. Hypothesis & Evidence Scoring is expanded to show hypotheses and evidence features, including Textual Alignment, Term and nGram Matching, and Logical Form Analysis.]


AnswerIdf scorer

Context-independent scorer

Uses the concept referred to as Inverse Document Frequency (IDF)

Ratio of total documents to documents containing the target text

Target text = candidate answer text
Large corpus (e.g., Wikipedia)
Lucene formula
Log scale

Scores in range (0, inf)

Higher score indicates more informativeness (answer text appears in few documents)

Example:
10,000 documents
Answer text appears in only 10 documents
Log (10,000 / 10) = Log (1,000) = 3
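The worked example can be reproduced with a short sketch. It follows the slide's arithmetic (a base-10 log of the document ratio) and is not Lucene's exact formula; the function name is an illustrative assumption.

```python
import math


def idf_score(total_documents, documents_with_text):
    """Base-10 log of total documents over documents containing the text."""
    if documents_with_text <= 0:
        raise ValueError("the target text must appear in at least one document")
    return math.log10(total_documents / documents_with_text)


# Slide example: 10,000 documents, answer text found in only 10 of them.
score = idf_score(10_000, 10)
print(round(score, 6))
```

A candidate that appears in nearly every document scores close to 0, while a rare one scores higher, matching the "more informative" reading on the slide.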


Textual Alignment Answer Scorer

Surface similarity measurement between:
Question
Supporting passage

Dynamic programming for subsequence alignment

Consider the following example:
Who led the Allied forces on the European front during World War 2?
Dwight D. Eisenhower was supreme commander of Allied forces during the D-Day invasion and European front during World War 2.
--Overlap is significant

Now, consider the example:
In 1698, what comet discoverer took a ship called the Paramour Pink on the first purely scientific sea voyage?
Edmund Halley made probably the first primarily scientific voyage to study the variation of the magnetic compass

--Fewer textual overlaps, likely with lower IDF scores
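The two examples above can be compared with a minimal dynamic-programming sketch. It uses the longest common subsequence of word tokens as the alignment, which is one simple form of subsequence alignment and not necessarily the scorer Watson uses; the function names and the length normalization are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, token_a in enumerate(a, start=1):
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def alignment_score(question, passage):
    """Share of question tokens aligned, in order, with the passage."""
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    return lcs_length(question_tokens, tokenize(passage)) / len(question_tokens)


question_1 = "Who led the Allied forces on the European front during World War 2?"
passage_1 = ("Dwight D. Eisenhower was supreme commander of Allied forces during "
             "the D-Day invasion and European front during World War 2.")
question_2 = ("In 1698, what comet discoverer took a ship called the Paramour Pink "
              "on the first purely scientific sea voyage?")
passage_2 = ("Edmund Halley made probably the first primarily scientific voyage "
             "to study the variation of the magnetic compass")

score_1 = alignment_score(question_1, passage_1)
score_2 = alignment_score(question_2, passage_2)
print(score_1 > score_2)  # True
```

The second passage supports the correct answer yet aligns poorly on the surface, which is why a single scorer like this is not enough on its own.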



Contextual Answer Scoring

Hypotheses:
Barack Obama is the 44th President of the United States
George W. Bush is the 44th President of the United States
Harvard Law School is the 44th President of the United States
Illinois is the 44th President of the United States

Evidence passages:
Barack Hussein Obama II (born August 4, 1961) is the 44th and current President of the United States.

George Walker Bush (born July 6, 1946) is an American politician who served as the 43rd President of the United States from 2001 to 2009 and the 46th Governor of Texas from 1995 to 2000.

Contextual answer scores:
Barack Obama .95
George W. Bush .80
Harvard Law School .05
Illinois .10


Stage 4: Final Merger and Ranking

[Figure: pipeline diagram with the Final Merging and Ranking step highlighted. It takes the Answer Scoring and Contextual Answer Scoring results for each candidate answer, applies Trained Models, and outputs Answer, Confidence, Evidence.]


Stage 4: Final Merger and Ranking - confidence scoring

Challenge: Heterogeneous feature types and values

Candidate Answer      Answer Scoring   Contextual Answer Scoring   Confidence
Barack Obama          0.90             0.90                        .95
George W. Bush        0.90             0.80                        .65
Harvard Law School    0.10             0.05                        .05
Illinois              0.15             0.10                        .10
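The merging step can be pictured as a trained model that turns each candidate's feature scores into a single confidence. The sketch below uses a logistic function with made-up weights. The weights and bias are hypothetical, not Watson's trained model, so it reproduces the ranking in the table but not the exact confidence values.

```python
import math

# Hypothetical model parameters, chosen by hand for illustration only.
WEIGHTS = {"answer_scoring": 3.0, "contextual_answer_scoring": 3.0}
BIAS = -3.0

# Feature scores taken from the table on this slide.
candidates = {
    "Barack Obama": {"answer_scoring": 0.90, "contextual_answer_scoring": 0.90},
    "George W. Bush": {"answer_scoring": 0.90, "contextual_answer_scoring": 0.80},
    "Harvard Law School": {"answer_scoring": 0.10, "contextual_answer_scoring": 0.05},
    "Illinois": {"answer_scoring": 0.15, "contextual_answer_scoring": 0.10},
}


def confidence(features):
    """Logistic combination of feature scores into a value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


ranked = sorted(candidates, key=lambda name: confidence(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {confidence(candidates[name]):.2f}")
```

In a real system the weights would be learned from questions with known answers, which is how heterogeneous feature types end up on one comparable scale.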


Watson is Deep Learning


University of Texas Watson university competition demo


Watson is going Deep Learning
