Page 1

MITRE

TIDES IFE-Bio KickOff Meeting

David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever

October 17, 2001

[Chart: Cases, New_cases, and Dead plotted over time. x-axis: TIME (weekly, 10/13/2000 to 11/24/2000); y-axis: Number Cases (0 to 400).]

Track_id | Date | Disease | Country | City_name | Cases | New_cases | Dead
Ebola | 10/30/00 | Ebola | Uganda | Gulu | 224 | 19 | 73
Ebola | 10/31/00 | Ebola | Uganda | Gulu | 239 | 15 | 75
Ebola | 11/01/00 | Ebola | Uganda | Gulu | 251 | 12 | 80
Ebola | 11/11/00 | Ebola | Uganda | Gulu | 269 | 4 | 87
Ebola | 11/13/00 | Ebola | Uganda | Gulu | 321 | 1 | 104
Ebola | 11/17/00 | Ebola | Uganda | Gulu | 329 | 4 | 107
Ebola | 11/17/00 | Ebola | Uganda | Masindi | 4 | 0 | 4
Ebola | 11/19/00 | Ebola | Uganda | Mbarara | 12 | 2 | 9
Ebola | 11/20/00 | Ebola | Tanzania | Mwanza | 7 | 2 | 0
Ebola | 11/21/00 | Ebola | Kenya | Busia | 3 | 3 | 0

Page 2

Agenda
• Current Status and Experiments (Laurie)
• User Feedback on MiTAP and Exercise (Eric)
• Lessons Learned (Laurie)
• Architecture Briefing (Jay & Scott)
• Geospatial Processing (George)
• Schedule (Jay)
• Issues and Discussion (All)

Page 3

Status of MiTAP
• Availability: excellent
  - Available ~100% to users inside, outside firewall
  - 12 individual user accounts, 6 group accounts
  - 8 daily users on average, mostly repeat users
• Data capture: rich & dynamic
  - ~70 working sources, new source added in 30 min
  - Average 5.8K msgs/day, 1 min latency
  - 250K msgs total in system
• Analysis tools: improving
  - Messages in 6 languages (with COTS translation)
  - Sorted into 173 newsgroups
  - Color coded tagging (pers/org/loc/disease)
  - Popup summarization
• Product: need to understand how system is being used

Page 4

MiTAP Activity: Messages and Users Over Time

[Chart: # messages (axis 0 to 14,000) and # users (axis 0 to 14) over time, 7/1 to 10/21. Annotated events: "Aug Experiment" and "Attack on America".]

Page 5

Performance Summary: Sudan 1999 vs Attack on America 2001

Metric | Sudan Incident (July 1999) | Attack on America (September 2001) | Comments
Availability | NA | 95% | Security via IP filtering
Users | 5 | 10 |
Capture:
Msgs | 1000 | 40,000 | 250,000 msgs total
Sources | 20 | 70 | 29 new sources added; 30 min/source
Throughput | NA | 8000 msgs/day | Latency for feeds: < 1 min
Languages | 1 | 6 | French, Spanish, Portuguese, Russian, Chinese, English
Analysis:
News groups | NA | 173 | 89 new groups
Tagging | No | Yes | People, organizations, locations, date, diseases
Translation | No | Yes | 5 languages, variable quality
Search | No (web only) | Yes | Boolean, sort by date/relevance

Page 6

Disease of the Month Experiments

 | August | September | October
Who | MiTAP Team: control vs test | UMass/NYU: no control | MiTAP Team: control vs test
What | dengue fever | dengue fever | bio threats
Why | debug experiment, underlying processes | stimulate thinking re info extraction, IR | see what system collected since exercise
Findings | MiTAP report had more detail, more up-to-date, poorer coverage | (nothing evaluated) | MiTAP user wrote report with 1/5 searches, 1/2 docs, more up-to-date
Lessons Learned | useless for report writing, search difficult, online capture config hard | search more difficult with more docs, search poorly integrated, need better viz tools | summaries useless, duplicates hard to distinguish
Outcomes | improved source integration (faster, easier) | (brainstorming session cancelled due to change in priorities) | improvements on search

Page 7

Feedback from Eric
• Report on Bio-Threats
• Deployment for N2
• MiTAP Status
  - Utility
  - Usability
  - Accessibility

Page 8

Lessons Learned

Availability
• User accounts for production system
• No training needed (instructions available on website)
• Stronger security (e.g., intrusion detection)
• Better back-up, monitoring of throughput
• More processing power

Capture
• Reduced latency on scheduled downloads and spidering, hourly capture of headlines
• Distributed capture processing
• Better capture of formatted sources
• Some badly filtered, excess volume causes backlog
• Poor zoning/formatting/decoding of some sources

Page 9

Lessons Learned (2)

Analysis
• Improved search (e.g., by date/relevance, popups, integrated with news server)
• Improved “normalization” of names, regions
• Too much data! - need better filtering, topic detection & clustering, summarization
• Better MT, support for Arabic
• Q&A
• Geospatial & temporal visualization
• Advanced search
• Better information extraction

Page 10

Lessons Learned (3)

Product
• No environment for preparing reports
• Workspace
  - Drag & drop repository
  - Editing capabilities
  - Multidoc summarization
  - Collaboration feature (chat & shared workspace)

Page 11

Catalyst Update: Recent work
• Usability for developers
  - Logger
  - Configuration file refinements
• Improvements for distributed systems
  - Redesign of I/O polling procedures
  - Explicit synchronization feature for Language Processor developers

Page 12

Logger

[Diagram: Catalyst processing pipeline with two "catlogger" processes attached. Labels in the diagram: Documents, Tokenize, Tagger, Sentence, Entity Extraction, Entities, MetaData, Word.Text, Word.POS.]

Page 13

In progress
• Usability for developers
  - Monitor (system status capability)
  - Native XML I/O! (for ease of debugging & for lightweight Catalyst)
• Information retrieval
  - Integration between Catalyst and new IR engine
  - Pushing stream filters toward archived streams
• Documentation

Page 14

Monitor

[Diagram: Catalyst processing pipeline with two "Monitor" processes attached. Labels in the diagram: Documents, Tokenize, Tagger, Sentence, Entity Extraction, Entities, MetaData, Word.Text, Word.POS.]

Page 15

XML I/O

[Diagram: two pipelines, labeled "Present" and "With XML I/O feature". Labels in the diagram: XML doc, XML to Catalyst, Event Extraction, Catalyst to XML.]

Easier to debug!

Page 16

XML I/O

[Diagram, "With XML I/O feature". Labels in the diagram: Non-Catalyst Process, XML Wrapper Process, Catalyst Processes (twice).]

Easier path to integrate existing language processing systems!
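The wrapper idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give Catalyst's XML format, so the `<doc>`, `<text>`, `<annotations>`, and `<token>` element names are hypothetical, and `legacy_tokenizer` is a stand-in for an existing non-Catalyst tool.

```python
import xml.etree.ElementTree as ET


def legacy_tokenizer(text):
    """Stand-in for an existing (non-Catalyst) language processing tool."""
    return text.split()


def xml_wrapper(xml_in, process=legacy_tokenizer):
    """Read an XML doc, run the wrapped process on its text, emit an XML doc.

    The element names used here are hypothetical, not Catalyst's own format.
    """
    doc = ET.fromstring(xml_in)
    text = doc.findtext("text", default="")
    annotations = ET.SubElement(doc, "annotations")
    for position, token in enumerate(process(text)):
        elem = ET.SubElement(annotations, "token", index=str(position))
        elem.text = token
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(xml_wrapper("<doc><text>Ebola cases reported in Gulu</text></doc>"))
```

Because the wrapper only sees XML on both sides, the wrapped tool needs no knowledge of the surrounding pipeline, which is the integration benefit the slide points to.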

Page 17

Archived streams

[Diagram: Question Answering Application. Labels in the diagram: XML doc, Index Refinement, Candidate Selection, Coreference, Answer Extraction, filter criteria, Origination point, Indices.]

Filter criteria must be pushed upstream from their origination point toward the indices so that processing is reduced to little more than is absolutely necessary.
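A minimal sketch of the idea, assuming the filter criteria are a set of required terms and the index is a simple in-memory inverted index. Both are assumptions made for illustration; the slides do not describe the actual index or stream format, and `expensive_stage` is a hypothetical stand-in for a costly stage such as answer extraction.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates_from_index(index, required_terms):
    """Apply the filter criteria at the index: intersect posting sets."""
    postings = [index.get(term.lower(), set()) for term in required_terms]
    return set.intersection(*postings) if postings else set()


def expensive_stage(text):
    """Stand-in for a costly downstream stage such as answer extraction."""
    return text.upper()


def answer(docs, index, required_terms):
    """Run the costly stage only on documents that pass the pushed-up filter."""
    selected = sorted(candidates_from_index(index, required_terms))
    return {doc_id: expensive_stage(docs[doc_id]) for doc_id in selected}
```

With the filter applied at the index, the downstream stage touches only the selected documents rather than every archived document.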

Page 18

For the Midterm - 12/12/2001
• Monitor
• XML I/O support in the Catalyst library
• Lightweight Catalyst design
• Documentation

Page 19

Catalyst collaborations
• Qanda
  - Catalyst-based Qanda used for TREC
  - Catalyst-based Qanda deployed at AFIWC
• Information retrieval
  - Archived annotation streams (for creating IR indexes)
  - Seekable streams (for processing IR queries)
• Other projects
  - ACE/Alembic (Information Extraction)
  - Audio hot-spotting (Speech Retrieval)
  - Reading-comp (Question Answering)

Page 20

Document Management
• Process scheduling
• System linkage
• Inter-site cooperation support
• User features

Page 21

Process Scheduling
• Problem: MiTAP needs the ability to prioritize sources
  - ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source
• Solution:
  - Preprocessing daemon will notify scheduler of incoming content
  - Scheduler assigns jobs to available resources based on priority
• Status:
  - Prototype scheduler delivered (Ponte)
  - Preprocessing daemon rewrite in mid-November (Wohlever)
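The notify-then-assign flow described on this slide might look roughly like the sketch below. It is not the delivered prototype: the `Scheduler` class, its method names, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq
import itertools


class Scheduler:
    """Toy priority scheduler: a lower priority number means more urgent."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def notify(self, source, priority):
        """Called by the (hypothetical) preprocessing daemon on new content."""
        heapq.heappush(self._queue, (priority, next(self._order), source))

    def assign(self, free_workers):
        """Hand the most urgent pending jobs to the available workers."""
        assignments = {}
        for worker in free_workers:
            if not self._queue:
                break
            _, _, source = heapq.heappop(self._queue)
            assignments[worker] = source
        return assignments
```

In this sketch a backlog from a newly added source can sit in the queue at low priority while content from an important existing source is still assigned first.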

Page 22

System Linkage
• Problem: Ever notice how new features tend to only apply to new content?
  - MiTAP is not flexible - difficult to:
    - Reprocess and repost a message that has errors
    - Find the original source document
    - Etc.
  - Currently, retroactive changes require 11th hour hacking (or sometimes 12th hour hacking)
• Solution: Keep database of linkage information to make the system more flexible
• Status:
  - Additional information currently being logged
  - Linkage database - March
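As a rough illustration of what a linkage database could record, the sketch below uses SQLite to map each posted message back to its source. The table name, column names, and helper functions are hypothetical; the slides do not give the actual schema.

```python
import sqlite3


def open_linkage_db(path=":memory:"):
    """Create a tiny linkage table: posted message -> original source."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS linkage (
               message_id TEXT PRIMARY KEY,
               newsgroup  TEXT NOT NULL,
               source_url TEXT NOT NULL,
               raw_path   TEXT NOT NULL
           )"""
    )
    return conn


def record_link(conn, message_id, newsgroup, source_url, raw_path):
    """Log where a posted message came from at posting time."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO linkage VALUES (?, ?, ?, ?)",
            (message_id, newsgroup, source_url, raw_path),
        )


def find_source(conn, message_id):
    """Return (source_url, raw_path) for a message, or None if unknown."""
    return conn.execute(
        "SELECT source_url, raw_path FROM linkage WHERE message_id = ?",
        (message_id,),
    ).fetchone()
```

With such a record in place, finding the original source document or reprocessing a faulty message becomes a lookup rather than last-minute hacking.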

Page 23

Inter-site Cooperation Support
• Problem: Collaboration with other TIDES contractors who have large legacy systems
  - Issue of communication more than scalability
• Solution:
  - Linkage database for annotations, similar to the one used for system maintenance
  - Web client server communication
  - Path to scalable solution w/richer interactions
• Status:
  - Data management - January
  - Communications: investigation of relevant protocols and preliminary design - completed
  - Native XML support for Catalyst - December

Page 24

User features
• Problem: MiTAP helps you find good information, then what?
• Solution:
  - Web accessible support for user views and data organization to assist in reporting and analysis
  - Automated view construction/feedback incorporating additional TIDES technologies
• Status:
  - Schema for v.1 of workspace developed (Ponte, Anderson)
  - Supporting code in progress (Ponte)
  - Prototype - December

Page 25

Geo-Spatial Normalization - Goal

Goal:
• We have: Text containing place names
• We want: Points on maps

Process:
• Extract place names
• Look up places on a list
• Determine Lat-Long
• Display

Seattle: 47.6 N 122.317 W

Problems:
• Place name not on list
• More than one place with same name
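The lookup step and its two failure cases can be sketched as follows. The `GAZETTEER` dictionary is a hypothetical stand-in for the place list; only Seattle's coordinates come from the slide, and the Paris entries are approximate values added for illustration.

```python
# Hypothetical gazetteer: place name -> list of (description, lat, lon).
# West longitudes are negative. Only Seattle's coordinates come from the
# slide; the other entries are approximate illustrative values.
GAZETTEER = {
    "Seattle": [("Seattle, Washington, USA", 47.6, -122.317)],
    "Paris": [
        ("Paris, France", 48.86, 2.35),
        ("Paris, Texas, USA", 33.66, -95.56),
    ],
}


def lookup(place_name):
    """Return (status, candidates); status is 'resolved', 'ambiguous',
    or 'unknown'."""
    candidates = GAZETTEER.get(place_name, [])
    if not candidates:
        return "unknown", []      # problem 1: place name not on list
    if len(candidates) == 1:
        return "resolved", candidates
    return "ambiguous", candidates  # problem 2: several places share the name
```

Under these assumptions, `lookup("Seattle")` resolves to a single point, `lookup("Paris")` is flagged as ambiguous, and a name missing from the list comes back as unknown.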

Page 26

Geo-Spatial Normalization - Solution

Solution, Part 1: A significant portion of the references can be resolved using easy methods.
• Unambiguous: Seattle, Toulouse
• Ambiguous: Paris, Washington
• Disambiguated: Paris, Texas; The State of Washington

Solution, Part 2: Use the “easily resolved” references as training data for a machine learning classifier which will distinguish the rest.
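One "easy method" could be to treat an explicit qualifier after the name, as in "Paris, Texas", as resolving the reference, and to keep the resolved sentence as a labeled training example. The sketch below is an assumption-laden illustration, not the team's method: the `CANDIDATES` table, the regular expression, and the output tuple format are all hypothetical, and phrasings such as "The State of Washington" are not handled.

```python
import re

# Hypothetical candidates for ambiguous names: name -> {qualifier: label}.
CANDIDATES = {
    "Paris": {"Texas": "Paris, Texas, USA", "France": "Paris, France"},
    "Washington": {"D.C.": "Washington, D.C., USA"},
}

# Matches "Name, Qualifier", e.g. "Paris, Texas".
QUALIFIED = re.compile(r"\b([A-Z][a-z]+), ([A-Z][A-Za-z.]+)")


def easy_resolutions(sentence):
    """Yield (name, label, sentence) for references that an explicit
    qualifier resolves; these could serve as classifier training examples."""
    for name, qualifier in QUALIFIED.findall(sentence):
        label = CANDIDATES.get(name, {}).get(qualifier)
        if label is not None:
            yield name, label, sentence
```

References that this pass leaves unresolved, such as a bare "Paris", would be the ones handed to the classifier described in Part 2.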

Page 27

Geo-Spatial Normalization - Plans

For MidTerm (Dec. 12, 2001)
• Detect a significant portion of the “easily resolvable” references
• Display with some map tool
  - Web delivery desirable

After MidTerm (May, 2002)
• Try to find more “easily resolvable” references
• Do the machine learning part
• Integrate with other mapping tools

Page 28

IFE-Bio Schedule

Area | What | Why | When
Availability | Add user accounts | Widen access to system | by request
Data Capture | Improve quality of online capture | Improve system utility | as sources are added
 | Build new message processing daemon | Increase throughput, decrease posting latency | mid-November
 | Replace tides2000 with more powerful machine | Increase throughput, decrease posting latency | November
 | Simplify document processing scripts & improve logging and error detection | Simplify admin duties | December
Analysis | Augment search page functionality | Simplify finding relevant data | ongoing
 | Handle zoning & encoding issues better | Improve translations | ongoing
 | Add MT for other languages | Support Arabic, others | as available
 | Add question answering | Simplify finding relevant data | December
 | Improve sorting, filtering, thumbnail "key entity" list | Provide better filtering (e.g., FBIS, ReliefWeb), provide better name tagging to be used for better sorting into newsgroups | soon
Product | (see architecture schedule, following) | |
Evaluation | Disease of the Month Experiments | Assess utility, evaluate usability, measure progress | monthly

Page 29

Architecture Schedule

Area | What | Why | When
Data Capture | Scheduler Prototype | Support of new message capture daemon | Delivered, support ongoing
 | DB Tools | Prerequisite for system linkage and intersite cooperation | January
 | System Linkage DB | Enable addition of new features; ease system administration | March
Analysis | Architecture support for Q&A | | December
Product | User Workspace Prototype | Support for report construction | December
Infrastructure | Catalyst Monitor | Ease development and debugging | December
 | Native XML Support | Support for legacy systems | December
 | Documentation | Usability | Ongoing

Page 30

Issues and Discussion
• How is MiTAP currently being used?
  - Who are the users?
  - What are the users doing?
  - What do users want?
• Prioritization of issues
  - Integrated feasibility experiment versus operational prototype:
    - Possible deployment vs integration of other TIDES technologies (Do we need to adjust our priorities?)
  - Along what dimensions should we optimize?
    - Availability, capture, analysis, presentation