Page 1

Search Support in Information Analysis and Synthesis

Emily S. Patterson, PhD, Research Scientist

Associate Director, Converging Perspectives on Data (CPoD)

Not for distribution – please contact [email protected]

Page 2

Converging Perspectives on Data (CPoD)

• Ohio State University (OSU) consortium:
• Innovate solutions to data overload (beyond “tweaks”)
• Advance methods for envisioning useful support for information analysis and comprehension (IA&C)
• Develop interdisciplinary talent (Cognitive Systems Engineering, Design)
• Innovate by pursuing leverage points:
1. Process vulnerability with high consequences for failure
2. “New” technological capability
3. Grounded basis for predicting performance improvement

Page 3

©2000 Christoffersen, Woods, Malin

Page 4

Design Seed

• Modular design concept

– Generalizes across software, architecture, scenario, domain

• Instantiated in a case

– Ideally animated mockup (ani-mock)

• Addresses process vulnerability

• Leverages technological advance

– Multiple levels for relying on machine processing

• Elicits feedback on concept “usefulness”

Page 5

Presentation Outline

• Study: 10 NASIC analysts on Ariane 501
• Repeated inaccurate information
– “High profit” attribute-based document search
– Social bookmarking of documents/messages
– Automated event detection
– Circular reporting alerts (magnified phrases)
– Overview visualizations
• Inappropriately applied default assumption
– Broaden consideration of perspectives on data
• Missed update that overturned assessment
– Mixed-initiative update detection

Page 6

Study: Observe 10 NASIC analysts (lab)

Time: 2 hour session (avg = 55 min)

Task: Causes, impacts of Ariane 501 accident

Participants: 10 NASIC analysts (avg = 13 yrs)

Tools: Search/browse features of Pathfinder, Word

Data: “On topic” database (~2000 documents)

Briefing: Verbal (video-taped)

Protocol: Think aloud, semi-structured interviews

Analysis: Process tracing, briefing accuracy

Page 7

Quick Reaction Task

In 1996, the European Space Agency lost a satellite during the first qualification launch of a new rocket design. Give a short briefing about the basic facts of the incident: when it was, why it occurred, and what the immediate impacts were?

Page 8

Reasons for Inaccurate Statements

1. Repeated inaccurate information

2. Relied on default assumption

3. Missed update that changed assessment

Page 9

Inaccurate Information on Accident Contributors

Software failure
June 4, 1996
When
Why - operational contributors
Inertial reference system
Backup and primary IRS
Embedded software
No guidance data because IRS shut down
Diagnostic data interpreted as guidance data
Booster and main engine nozzles swiveled abnormally
Rocket self-destructed
Rocket veered off course
Numerical overflow occurred because the horizontal velocity had more digits than programmed
Flight profile different on A5 because a faster rocket than A4
IRS shut down because of numerical overflow
Re-used software from Ariane 4
Software not needed after liftoff
No protection for common-mode failure
Insufficient testing requirements
No protection for numerical overflow on horizontal velocity
No integrated testing “in the loop”
Poor communication across organizations
No software qualification review
Multiple contractors poorly coordinated
Review process was inadequate
What happened
Why - design and testing contributors
Why - organizational contributors
Where
Less than a minute after liftoff

©1999 Patterson

Page 10

Constructed Database

~2000 text documents
• on target 60%
• context 35%
• off topic 5%
• high profit 0.5% (9 documents)

Sources of Inaccurate Information

• Highly biased or deceptive sources
• Lack of expertise in the subject area
• Distanced from the original data
• Language translation
• Predictions
• “Moving target” of data, events

Page 11

“High Profit” Documents

• Ariane 5 Flight 501 Failure: Report by the Inquiry Board (July 19, 1996)
• Inertial Reference Software Error Blamed for Ariane 5 Failure; Defense Daily (July 24, 1996)
• Software Design Flaw Destroyed Ariane 5; next flight in 1997; Aerospace Daily (July 24, 1996)
• Ariane 5 Rocket Faces More Delay; The Financial Times Limited (July 24, 1996)
• Flying Blind: Inadequate Testing led to the Software Breakdown that Doomed Ariane 5; The Financial Times Limited (July 25, 1996)
• Board Faults Ariane 5 Software; Aviation Week and Space Technology (July 29, 1996)
• Ariane 5 Explosion Caused by Faulty Software; Satellite News (August 5, 1996)
• Ariane 5 Report Details Software Design Errors; Aviation Week and Space Technology (September 9, 1996)
• Ariane 5 Loss Avoidable with Complete Testing; Aviation Week and Space Technology (September 16, 1996)

Page 12

Information Sampling by Participant 5

Participant 5: 96 minutes
Experience: 17 years
Query 1: ESA | (european & space & agency)
Query 2: (ESA | (european & space & agency)) > (19960601) Infodate

Diagram counts: 2000 in Database; 725 Query 1; 419 Query 2; 28 read; 24 on-topic; 8 cut and paste; 3 key; 6 High Profit; 3 High Profit

Legend: Key documents; Key documents that are high profit; High profit documents
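The counts on this slide describe how far Participant 5 narrowed the database before settling on key documents. The sketch below works through that arithmetic. It assumes the stages nest in the order listed (database, Query 1, Query 2, read, cut and paste, key), which is one reading of the diagram and not something the slide states; the names `STAGES` and `narrowing` are illustrative.

```python
# Counts as read off the Participant 5 slide; the nesting order is an assumption.
STAGES = [
    ("in database", 2000),
    ("returned by Query 1", 725),
    ("returned by Query 2", 419),
    ("read", 28),
    ("cut and pasted", 8),
    ("key documents", 3),
]


def narrowing(stages):
    """Return (label, count, share of previous stage, share of database) rows."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / total))
        previous = count
    return rows


for label, count, step, overall in narrowing(STAGES):
    print(f"{label:22s} {count:5d}  {step:7.1%} of previous  {overall:7.2%} of database")
```

Under this reading, the largest single drop is between the final query results and the documents actually opened, which is the step the slide's sampling diagram draws attention to.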

Page 13

S2: 73 minutes
esa & ariane*
(esa & ariane*) & failure

S3: 24 minutes
europe 1996
(europe 1996) & (launch failure)
(europe 1996) & ((launch failure):%2)

S4: 68 minutes
(european space agency):%3 & ariane & failure & (launcher | rocket))

S5: 96 minutes
ESA | (european & space & agency)
(ESA | (european & space & agency)) > (19960601) Infodate

S6: 32 minutes
1996 & Ariane
(1996 & Ariane) & (destr* | explo*)
(1996 & Ariane) & (destr* | explo*) & (fail*)

S7: 73 minutes
software & guidance

S8: 27 minutes
esa & ariane
ariane & 5
(ariane & 5):%2
((ariane & 5):%2) & (launch & failure)

S9: 44 minutes
1996 & European Space Agency & satellite
1996 & European Space Agency & lost
1996 & European Space Agency & lost & rocket

Counts shown in the figure: 161, 29, 22, 5, 169, 15, 419, 28, 7, 184, 66, 14, 12, 194, 4, 29

Legend: Key documents; Key documents that are high profit; High profit documents
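To make the participants' query strings concrete, here is a minimal sketch of how the `&`, `|`, parenthesis, and trailing-`*` parts of such queries could be evaluated against a small corpus. It is not Pathfinder's implementation: the proximity (`:%2`) and date (`> (19960601) Infodate`) operators are left out, `&` is assumed to bind tighter than `|`, and the three documents in `DOCS` are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|&|\||[A-Za-z0-9]+\*?)")


def tokenize(query):
    """Split a query into parentheses, operators, and (optionally wildcarded) words."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if not match:
            raise ValueError(f"unsupported syntax at: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive-descent parse into a predicate over a set of lowercase words."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        if peek() is None:
            raise ValueError("unexpected end of query")
        token = take()
        if token == "(":
            inner = either()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        if token in (")", "&", "|"):
            raise ValueError(f"unexpected token {token!r}")
        term = token.lower()
        if term.endswith("*"):  # trailing wildcard: prefix match
            prefix = term[:-1]
            return lambda words: any(w.startswith(prefix) for w in words)
        return lambda words: term in words

    def both():  # chains of '&'
        left = factor()
        while peek() == "&":
            take()
            right = factor()
            left = (lambda l, r: lambda words: l(words) and r(words))(left, right)
        return left

    def either():  # chains of '|', lower precedence than '&' (an assumption)
        left = both()
        while peek() == "|":
            take()
            right = both()
            left = (lambda l, r: lambda words: l(words) or r(words))(left, right)
        return left

    predicate = either()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return predicate


def search(query, documents):
    """Return ids of documents whose word set satisfies the query."""
    predicate = parse(tokenize(query))
    return [doc_id for doc_id, text in documents.items()
            if predicate(set(re.findall(r"[a-z0-9]+", text.lower())))]


# Hypothetical three-document corpus, for illustration only.
DOCS = {
    "d1": "ESA confirms Ariane 5 launch failure in 1996",
    "d2": "Ariane 4 launches commercial satellite",
    "d3": "Rocket destroyed after software error; Ariane explosion in 1996",
}

print(search("(1996 & Ariane) & (destr* | explo*)", DOCS))
print(search("(esa & ariane*) & failure", DOCS))
```

Even on this toy corpus the two queries return disjoint results, which illustrates how much the choice of terms determines what an analyst samples.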

Page 14

More Time to Find High Profit Documents

Participants whose key documents were not high profit documents

Participant  Experience (years)  Time (min.)  Final query (# hits)  Documents (# read)  High profit docs (# read)
3            4                   24           22                    5                   0
6            8                   32           184                   7                   2
8            11                  27           194                   12                  0
9            18                  44           29                    4                   0
Average:     10.3                32*          107                   7*                  0.5*

Participants whose key documents were high profit documents

Participant  Experience (years)  Time (min.)  Final query (# hits)  Documents (# read)  High profit docs (# read)
2            8                   73           161                   29                  3
4            8                   68           169                   15                  2
5            17                  96           419                   28                  2
7            9                   73           66                    14                  5
Average:     10.5                78*          204                   22*                 3*

* Significant differences using Wilcoxon-Mann-Whitney Non-Parametric Test
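The asterisked comparisons rely on the Wilcoxon-Mann-Whitney test. As a rough illustration of what that test does with two groups of four, the sketch below computes the U statistic and an exact two-sided permutation p-value for the session times in the table. The slides do not say which variant of the test or which software was used, so the exact-permutation approach and the function names here are assumptions.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U for sample a: each pair (x, y) with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_two_sided_p(a, b):
    """Share of all relabelings whose U is at least as far from its midpoint as observed."""
    pooled = list(a) + list(b)
    n_a = len(a)
    center = n_a * len(b) / 2.0
    observed = abs(mann_whitney_u(a, b) - center)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(mann_whitney_u(group_a, group_b) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Session times in minutes, taken from the two tables on this slide.
time_without_high_profit = [24, 32, 27, 44]  # participants 3, 6, 8, 9
time_with_high_profit = [73, 68, 96, 73]     # participants 2, 4, 5, 7

u = mann_whitney_u(time_without_high_profit, time_with_high_profit)
p = exact_two_sided_p(time_without_high_profit, time_with_high_profit)
print(f"U = {u}, exact two-sided p = {p:.4f}")
```

Because every participant in the second group spent longer than every participant in the first, U is 0 and only 2 of the 70 possible relabelings are as extreme, which is the smallest two-sided p-value two groups of four can produce.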

Page 15

More Accurate with High Profit Documents

Participants whose key documents were not high profit documents

Participant  Accurate  Vague  Inaccurate  Nothing
3            5         2      2           11
6            11        1      3           5
8            9         0      0           11
9            5         3      1           11
Average:     7.5       1.5    1.5*        9.5

Participants whose key documents were high profit documents

Participant  Accurate  Vague  Inaccurate  Nothing
2            5         2      0           13
4            11        2      0           7
5            12        3      0           5
7            8         1      0           11
Average:     11        2      0*          6.75

* Significant differences using Wilcoxon-Mann-Whitney Non-Parametric Test

Page 16

Think Aloud: Assessing Information Accuracy

Article Date/Content

July 5, 1996 (Report 858): Ariane 5 lifts off much faster… information… exhausted the temporary memory (buffer) capacity… both systems simultaneously declared themselves to be in an irredeemable error situation and commenced a reset procedure… when the system was reset, the vehicle’s position at that time… was adopted as the reference base

September 16, 1996 (Report 1385): the active inertial reference system transmitted essentially diagnostic information to the launcher’s main computer, where it was interpreted as flight data and used for flight control calculations

July 29, 1996 (Report 1440): as a result of the double failure, the active IRS only transmitted diagnostic information to the booster’s on-board computer, which was interpreted as flight data and used for flight control calculations

Participant’s Response

nothing

“We know there was a problem because the guidance platforms shut down. After they shut down, the inertial reference system sent diagnostic information so they’re designed to shut down when something goes wrong. Assuming the other system has taken over, it’s sending diagnostic information so that the people on the ground can figure out what went wrong with it. Having them both shut down, the guidance computer is interpreting the diagnostic information as where it’s at and instead of getting numbers, it’s getting other things…”

“...In this article, it says when it shut down, it started a reset procedure. In the other article, it says diagnostic information. This article and the other one… are incompatible, inconsistent with each other… Of course messages that can’t both be right happen all the time. I’m finding it hard to believe that the vehicle is going to fly without any inertial inputs whatsoever… let’s look at the source… FBIS report. Translated text… the other one was later also… it sounds good. If I had to guess, I would go with the other one.”

Participant 7 Briefing: “numerical values beyond the programmed limits of the flight computer… the platforms initiated a diagnostic “reset” mode that fed incorrect values to the flight computer”

Page 17

Accuracy “Cues” and Judgments

lack of technical expertise
errors in translation
biased interpretations
lack of privileged access
lack updates
predictions of future events
contestable interpretation
Observations
Hypotheses
discrepant
Source
reputation for credibility
reputation of bias
reputation for expertise in a particular area
officiality of responsibility to do the analysis
Document
temporal relationship to events
amount quoted directly from official document
translated
depth and breadth of theme coverage
length
Description
corroborative
Interactions between descriptions
corroboration
same source
deception
temporal relationship to updates
level of sensationalism
level of technical sophistication
abstract

©1999 Patterson

Page 18

Generic “High profit” Attribute-based Search

Page 19

Tailored “High profit” Attribute-based Search: Technology Forecasting

Feedback from Analysts:
• It would be important to filter out a group like press releases, not just select them
• With new searches, you change people, places, things, but not your area (communications, command and control)
• I want profile to generate documents; then I want to visualize what portions contribute; then I want to see how this changes the results
• I want to see what the high value input is from each query and see progress over queries

Page 20

Social Bookmarking: Del.icio.us (now Yahoo)

Page 21

Del.icio.us: Katrina tag

Page 22

Social Tagging: Barriers to Implementation

Feedback from Analysts:
• What scale is needed for emergent properties to emerge? If you’re really lucky, you might get 10,000 people to use this inside analysis agencies. If you have 50, does it work?
• We want to automate tags. We want one database to talk to another. These are the big issues.
• It’s only going to work to find related people if people use it. Juniors need seniors to do it to find them in the network.
• Not always English, text, or open source
• Distinguish between crime scene analogy vs text as data; in Iraq non-verbal stuff can tell you if someone is a friend or foe
• We are doing all of this already. It’s hard to see what’s new about this.
• All documents already have tags.
• This requires someone to type in something.
• Tags are such a big hassle that we just put down anything.

Page 23

3 Important Events in Ariane 501

Ariane 501 Launch Failure (June 4, 1996)
Inquiry Board Report (July 19, 1996)
Ariane 502 Launch (October 30, 1997)

Early reports: details, eyewitnesses, immediate reactions, inaccuracies
Summaries of board findings
Comprehensive, long-term reactions, less diversity
Short summaries, Updates on themes

©1999 Patterson

Page 24

Automated Event Detection: Query Results

ariane
Entire Database
failure
ariane AND failure; narrowed by time
ariane AND failure AND software; narrowed by time
501
Report
502
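One simple way to approximate event detection from query results is to look for months in which unusually many matching reports were published. The sketch below flags such bursts. It is only an illustration of the idea: the monthly counts are hypothetical rather than taken from the study database, the threshold rule (mean plus a multiple of the standard deviation) is an assumption, and months with no reports are ignored.

```python
from collections import Counter
from statistics import mean, pstdev


def monthly_counts(report_dates):
    """Count reports per 'YYYY-MM' from ISO 'YYYY-MM-DD' date strings."""
    return Counter(day[:7] for day in report_dates)


def burst_months(counts, k=1.0):
    """Months whose report count exceeds mean + k * standard deviation."""
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(month for month, n in counts.items() if n > threshold)


# Hypothetical number of query hits per month (not data from the study).
HYPOTHETICAL_HITS = {
    "1996-05": 2, "1996-06": 12, "1996-07": 9, "1996-08": 3,
    "1996-09": 2, "1997-10": 8, "1997-11": 3,
}
dates = [f"{month}-01" for month, n in HYPOTHETICAL_HITS.items() for _ in range(n)]

print(burst_months(monthly_counts(dates), k=0.5))
```

With these made-up counts the flagged months line up with the three events marked on the slide (the 501 failure, the Inquiry Board report, and the 502 launch); real report streams would need a more careful baseline.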

Page 25

Participant 9: Illusory Confirmation

Article Date/Content

November 3, 1997 (second time): a software failure caused the rocket to veer off course and fall apart

August 4, 1996: failure was due to Ariane-5’s “brain.” It turned out that the computer software in the Ariane-5 was originally designed for the Ariane-4, a much slower rocket. Seconds after take-off, Ariane-5 reached a velocity that exceeded the “brain’s” computing capacity. It lost all guidance and attitude information, and the on-board computer tried to supersede the software programme and activated the rocket’s solid fuel propellant boosters

November 3, 1997 (third time): a software failure caused the rocket to veer off course and fall apart

Participant’s Response

“OK, this is the same one [the Ariane 501 launch]. This is after the fact. Uh oh. Remember I said how data changes? I’m looking…apparently it says a mechanical failure and then I come along. What’s this say? Failure was due to the brain. It turned out that computer software which was designed for 4, which is much slower. So it turns out now my analysis has changed. It now looks like it was an integral failure. Period. Brought on by internal software. So I’ll qualify this (draws an arrow from previous note below a line and writes “#1435 wrong software used, software for AR4 used in AR5 launch”). That was the problem. Lost guidance. Launch software.”

“Software failure. It’s a confirmation of the previous message saying it’s a software failure… or is this the same message? Yeah, yeah, that’s where the highlighter…”

nothing

Page 26

Circular Reporting Protection: Magnified Phrases

Feedback from Analysts:
• I want to copy pieces, not just whole document
• I don’t want to have to always highlight phrases
• I want multiple phrases shown from where in the document
• Add marking function to come back later (reminder)
• Add freetext “stickies”
• Add manual tags:
– reliable, not reliable, questionable, no evidence

© 2005 Patterson
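Circular reporting alerts depend on noticing that an apparent confirmation is really the same text seen again. A minimal sketch of that check, assuming word-shingle overlap (Jaccard similarity) as the repeat detector, is given below. The slides do not describe how the magnified-phrase design seed detects repeats, so the technique, the threshold, and the report ids are all assumptions; the texts are excerpts quoted on the Participant 9 slide.

```python
import re
from itertools import combinations


def shingles(text, size=4):
    """Set of consecutive word tuples of the given size, case-insensitive."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a, b, size=4):
    """Jaccard similarity between the word shingles of two texts."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def repeated_reports(reports, threshold=0.8, size=4):
    """Pairs of report ids whose texts overlap at or above the threshold."""
    return [(x, y) for (x, tx), (y, ty) in combinations(reports.items(), 2)
            if overlap(tx, ty, size) >= threshold]


# Excerpts from the Participant 9 slide; the ids are hypothetical labels.
REPORTS = {
    "1996-08-04": "failure was due to Ariane-5's brain. It turned out that the "
                  "computer software in the Ariane-5 was originally designed "
                  "for the Ariane-4, a much slower rocket.",
    "1997-11-03 (second reading)": "a software failure caused the rocket to "
                                   "veer off course and fall apart",
    "1997-11-03 (third reading)": "a software failure caused the rocket to "
                                  "veer off course and fall apart",
}

for first, second in repeated_reports(REPORTS):
    print(f"possible repeat: {first} and {second}")
```

A flag like this would have told Participant 9 that the "confirmation" was the message already read, not an independent source.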

Page 27

Extension: Tagging Magnified Phrases

Feedback from Analysts:
• All: Want both folders and tags; add hierarchies (1000 tags; 1600 folders (range 450-2100))
• All: Want for >1 project; want to flag/file under other topics than working on
• Want to see emerging themes; need support for reconfiguring as model changes
• Pre-generate tags without typing, but also give flexibility (no strict ontology)
• Automated tags (classification) on bottom and my tags on top
• I put like things together to help with focus and understanding, but not everyone does
• 425 tags are too many
• Tags are such a hassle that we put anything
• I print and stack by topics and studies so I’m not sure if I’d use this
• Want to sort/view differently: by classified/unclassified, humint/sigint, date/source/title/classification

© 2005 Patterson

Page 28

Low and High Profit Documents

“Low-profit” key articles
• Europe: Causes of Ariane 5 Failure (July 5, 1996)
• Ariane 5 Failure: Inquiry Board Findings (July 25, 1996)
• False computer command blamed in Ariane V failure (June 6, 1996)

“High-profit” key articles
• Software design flaw destroyed Ariane V; next flight in 1997 (July 24, 1996)
• Board Faults Ariane 5 Software (July 29, 1996)
• Ariane 5 loss avoidable with complete testing (September 16, 1996)

Page 29

Visual Narratives

report dates
report space
disrupting event
event thread
prediction of future event
analysis of past event
(ongoing plan)
Selection Mechanism (e.g., keyword search)
Epoch
landmark event
updates
Models of document types (e.g., high profit document)
Models of scenario elements (e.g., accident)

Page 30

Overview Visualization: Documentspace + Themespace

© 1999 Tinapple, Woods, Patterson

Page 31

Overview + Detailed Integrated Workspace

Page 32

Reasons for Inaccurate Statements

1. Repeated inaccurate information

2. Relied on default assumption

3. Missed update that changed assessment

Page 33

Participant 3: Inaccurate Statement

…the basic impact of the launch is a failure. The monetary loss can be recovered by the insurance...

Page 34

Participant 6: Confidence Assessment

I am very confident. Everybody agrees. It was the official inquiry board. The reports, they weren’t written by the French, they were written by other people and they don’t disagree. If they did, they would say so.

Page 35

Broaden Assessment of Perspectives on Data

• Provide “perspective” information on demand:
– Organizational role
– Spatial
– Temporal
– Political group affiliation
• Query expansion suggestions
– SS-4 missile = R-12 missile
– Cuban missile crisis = Caribbean crisis; October crisis
• Similar query detection
– Find who has opened similar documents
– Find who has written similar analyses
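The query expansion suggestions on this slide amount to a lookup of alternative names for the same thing. A minimal sketch, assuming a hand-curated table of equivalent terms, is shown below. The two groups come from the slide; the function name, the substring matching, and the sample queries are illustrative assumptions.

```python
# Equivalences taken from the slide; a real table would need curation.
EQUIVALENTS = [
    {"ss-4 missile", "r-12 missile"},
    {"cuban missile crisis", "caribbean crisis", "october crisis"},
]


def suggest_expansions(query, equivalents=EQUIVALENTS):
    """Return alternative names for any known term that appears in the query."""
    lowered = query.lower()
    suggestions = set()
    for group in equivalents:
        present = {term for term in group if term in lowered}
        if present:
            suggestions |= group - present
    return sorted(suggestions)


print(suggest_expansions("Soviet deployments during the Cuban missile crisis"))
print(suggest_expansions("SS-4 missile range"))
```

Offering the other names as suggestions, instead of silently rewriting the query, leaves the analyst in control of which perspectives to include.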

Page 36

Reasons for Inaccurate Statements

1. Repeated inaccurate information

2. Relied on default assumption

3. Missed update that changed assessment

Page 37

Inaccurate Information on Accident Impacts

Ariane 5 Program Impacts
Loss of rocket booster
Loss of payload
Delay A5 qualification
Delay 502 launch
No paying customer for 503
Cluster Satellite Program Impacts
Loss in market share
Ariane 4 Program Impacts
Program extended
Additional launchers ordered
Insurance rates rise
No 502 payload
Delay 503 launch
Program cancelled
Rebuild 1
Additional funds found: rebuild 4
Cannot launch on A5: launch on Soyuz

© 1999 Patterson

Page 38

(Inaccurate?) Predicted Launch Delays

Prior to 501 Launch: September ‘96
Right after 501 incident (June 4, ‘96): December ‘96
After Inquiry Board Report (July 19, ‘96): December ‘96
March ‘97
March - June ‘97
July ‘97
September ‘97
Actual 502 Launch Date: October 30, ‘97

Chart axes: Projected Launch Date (Launch Date) and Announcement Date (Report Date), each marked monthly from June 96 to Nov 97
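The size of each prediction error can be worked out by comparing every projected 502 launch date on the slide with the actual launch on October 30, 1997. The sketch below does that in whole months. Representing each projection by the first day of its month, and the range "March - June ‘97" by its later bound, are assumptions made for the arithmetic.

```python
from datetime import date

ACTUAL_502_LAUNCH = date(1997, 10, 30)

# Projected launch months from the slide; day 1 is a placeholder, and the
# "March - June '97" range is represented by its later bound (an assumption).
PROJECTED_LAUNCHES = [
    ("September '96", date(1996, 9, 1)),
    ("December '96", date(1996, 12, 1)),
    ("March '97", date(1997, 3, 1)),
    ("March - June '97", date(1997, 6, 1)),
    ("July '97", date(1997, 7, 1)),
    ("September '97", date(1997, 9, 1)),
]


def months_between(earlier, later):
    """Whole calendar months from one date's month to another's."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def slips(projections, actual=ACTUAL_502_LAUNCH):
    """Pair each projection label with how many months it missed the actual launch by."""
    return [(label, months_between(projected, actual)) for label, projected in projections]


for label, months in slips(PROJECTED_LAUNCHES):
    print(f"{label:18s} missed the actual launch by {months} month(s)")
```

Every projection on the slide fell short of the actual date, so a report that repeats any of them as fact would be inaccurate in hindsight.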

Page 39

(Inaccurate?) Predicted Disruptions to Plans

Ariane 4 Program
502
Ariane 5 Program
Cluster Satellite Program
lost satellites (no insurance)
program cancelled
rebuild 1
additional funds found: rebuild 4
cannot launch on Ariane 5: launch on Soyuz
investigation
delay 502 launch
no 502 payload
delay 503 launch
no 503 payload
program extended
additional launchers ordered
insurance rates rise
501

Page 40

Study Limitations

• Small number of study participants

• Single scenario

– Correct answer known

– Prediction plays a minor role

– No prior experience in knowledge area

– Does not include intentional deception

• Simulated rather than naturalistic setting

– No ability to print documents

– No ability to access other analysts

– Unfamiliar tool and restricted feature use (Pathfinder)

• Verbal protocols more “read aloud” than “think aloud”