TIDES IFE-Bio KickOff Meeting. David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever. October 17, 2001.
MITRE
TIDES IFE-Bio KickOff Meeting
David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever
October 17, 2001
[Figure: cases over time. X axis: TIME (weekly ticks from 10/13/2000 to 11/24/2000); Y axis: Number Cases (0 to 400); series: Cases, New_cases, Dead]
Track_id  Date      Disease  Country   City_name  Cases  New_cases  Dead
Ebola     10/30/00  Ebola    Uganda    Gulu       224    19         73
Ebola     10/31/00  Ebola    Uganda    Gulu       239    15         75
Ebola     11/01/00  Ebola    Uganda    Gulu       251    12         80
Ebola     11/11/00  Ebola    Uganda    Gulu       269    4          87
Ebola     11/13/00  Ebola    Uganda    Gulu       321    1          104
Ebola     11/17/00  Ebola    Uganda    Gulu       329    4          107
Ebola     11/17/00  Ebola    Uganda    Masindi    4      0          4
Ebola     11/19/00  Ebola    Uganda    Mbarara    12     2          9
Ebola     11/20/00  Ebola    Tanzania  Mwanza     7      2          0
Ebola     11/21/00  Ebola    Kenya     Busia      3      3          0

Agenda
• Current Status and Experiments (Laurie)
• User Feedback on MiTAP and Exercise (Eric)
• Lessons Learned (Laurie)
• Architecture Briefing (Jay & Scott)
• Geospatial Processing (George)
• Schedule (Jay)
• Issues and Discussion (All)

Status of MiTAP
• Availability: excellent
  - Available ~100% to users inside, outside firewall
  - 12 individual user accounts, 6 group accounts
  - 8 daily users on average, mostly repeat users
• Data capture: rich & dynamic
  - ~70 working sources, new source added in 30 min
  - Average 5.8K msgs/day, 1 min latency
  - 250K msgs total in system
• Analysis tools: improving
  - Messages in 6 languages (with COTS translation)
  - Sorted into 173 newsgroups
  - Color coded tagging (pers/org/loc/disease)
  - Popup summarization
• Product: need to understand how system is being used

MiTAP Activity: Messages and Users Over Time
[Figure: X axis: dates from 7/1 to 10/21 at two-week intervals; Y axes: # Messages (0 to 14000) and # Users (0 to 14); series: # messages, # users; annotated events: Aug Experiment, Attack on America]

Performance Summary: Sudan 1999 vs Attack on America 2001
Measure | Sudan Incident (July 1999) | Attack on America (September 2001) | Comments
Availability | NA | 95% | Security via IP filtering
Users | 5 | 10 |
Capture
Msgs | 1000 | 40,000 | 250,000 msgs total
Sources | 20 | 70 | 29 new sources added; 30 min/source
Throughput | NA | 8000 msgs/day | Latency for feeds: < 1 min
Languages | 1 | 6 | French, Spanish, Portuguese, Russian, Chinese, English
Analysis
News groups | NA | 173 | 89 new groups
Tagging | No | Yes | People, organizations, locations, date, diseases
Translation | No | Yes | 5 languages, variable quality
Search | No (web only) | Yes | Boolean, sort by date/relevance

Disease of the Month Experiments
Month | August | September | October
Who | MiTAP Team: control vs test | UMass/NYU: no control | MiTAP Team: control vs test
What | dengue fever | dengue fever | bio threats
Why | debug experiment, underlying processes | stimulate thinking re info extraction, IR | see what system collected since exercise
Findings | MiTAP report had more detail, more up-to-date, poorer coverage | (nothing evaluated) | MiTAP user wrote report with 1/5 searches, 1/2 docs, more up-to-date
Lessons Learned | useless for report writing, search difficult, online capture config hard | search more difficult with more docs, search poorly integrated, need better viz tools | summaries useless, duplicates hard to distinguish
Outcomes | improved source integration (faster, easier) | (brainstorming session cancelled due to change in priorities) | improvements on search

Feedback from Eric
• Report on Bio-Threats
• Deployment for N2
• MiTAP Status
  - Utility
  - Usability
  - Accessibility

Lessons Learned
Availability
  - User accounts for production system
  - No training needed (instructions available on website)
  - Stronger security (e.g., intrusion detection)
  - Better back-up, monitoring of throughput
  - More processing power
Capture
  - Reduced latency on scheduled downloads and spidering, hourly capture of headlines
  - Distributed capture processing
  - Better capture of formatted sources
  - Some badly filtered, excess volume causes backlog
  - Poor zoning/formatting/decoding of some sources

Lessons Learned (2)
Analysis
  - Improved search (e.g., by date/relevance, popups, integrated with news server)
  - Improved “normalization” of names, regions
  - Too much data! - need better filtering, topic detection & clustering, summarization
  - Better MT, support for Arabic
  - Q&A
  - Geospatial & temporal visualization
  - Advanced search
  - Better information extraction

Lessons Learned (3)
Product
  - No environment for preparing reports
  - Workspace
    - Drag&drop repository
    - Editing capabilities
    - Multidoc summarization
    - Collaboration feature (chat & shared workspace)

Catalyst Update: Recent work
• Usability for developers
  - Logger
  - Configuration file refinements
• Improvements for distributed systems
  - Redesign of I/O polling procedures
  - Explicit synchronization feature for Language Processor developers

Logger
[Diagram: Documents pass through Tokenize, Sentence, Tagger, and Entity Extraction processes; stream labels: MetaData, Word.Text, Sentence, Word.POS, Entities; two catlogger processes attached to the pipeline]

In progress
• Usability for developers
  - Monitor (system status capability)
  - Native XML I/O! (for ease of debugging & for lightweight Catalyst)
• Information retrieval
  - Integration between Catalyst and new IR engine
  - Pushing stream filters toward archived streams
• Documentation

Monitor
[Diagram: Documents pass through Tokenize, Sentence, Tagger, and Entity Extraction processes; stream labels: MetaData, Word.Text, Sentence, Word.POS, Entities; two Monitor processes attached to the pipeline]

XML I/O
[Diagram: Present: XML doc -> XML to Catalyst -> Event Extraction -> Catalyst to XML -> XML doc. With XML I/O feature: XML doc -> Event Extraction -> XML doc]
Easier to debug!

XML I/O
[Diagram: a Non-Catalyst Process connected via XML to Catalyst Processes, either directly or through a Wrapper Process]
With XML I/O feature: Easier path to integrate existing language processing systems!
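The wrapper idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`doc`, `text`, `annotations`, `span`) and the `legacy_tagger` function are hypothetical and do not reflect Catalyst's actual XML format or any existing processor. The sketch shows an existing, non-XML-aware function being wrapped so that it reads an XML document and writes its results back as stand-off annotations.

```python
import xml.etree.ElementTree as ET


def legacy_tagger(text):
    """Hypothetical stand-in for an existing language processor.

    Returns a sorted list of (start, end, label) character spans.
    """
    labels = {"Uganda": "loc", "Ebola": "disease"}
    spans = []
    for name, label in labels.items():
        start = text.find(name)
        if start != -1:
            spans.append((start, start + len(name), label))
    return sorted(spans)


def wrap(xml_in, process):
    """Run `process` on the <text> of an XML doc and append its spans as XML."""
    doc = ET.fromstring(xml_in)
    text = doc.findtext("text", default="")
    annotations = ET.SubElement(doc, "annotations")
    for start, end, label in process(text):
        # Stand-off annotation: offsets into the text plus a type label.
        ET.SubElement(annotations, "span",
                      start=str(start), end=str(end), type=label)
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(wrap("<doc><text>Ebola cases in Uganda</text></doc>", legacy_tagger))
```

Because both input and output are XML, the wrapped process can be inspected with ordinary text tools, which is the debugging benefit the slides point to.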

Archived streams
[Diagram: Question Answering Application with stages Candidate Selection, Coreference, and Answer Extraction over XML docs, fed from an Index with Refinement; filter criteria flow from the Origination point back toward the Indices]
Filter criteria must be pushed upstream from their origination point toward the indices so that processing is reduced to little more than is absolutely necessary.
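A small Python sketch may help make the point about pushing filter criteria upstream concrete. Everything here is an assumption for illustration: the toy archive, the inverted index, and the `expensive_stage` function stand in for archived streams, IR indices, and downstream language processing. The sketch compares filtering at the application (after every document has been processed) with filtering at the index (so only matching documents are processed).

```python
from collections import defaultdict

# Hypothetical archived documents: doc id -> text.
ARCHIVE = {
    1: "Ebola outbreak reported in Gulu, Uganda",
    2: "Dengue fever cases rise in Brazil",
    3: "Ebola cases confirmed in Masindi",
    4: "Election results announced",
}


def tokens(text):
    """Lowercase, comma-stripped whitespace tokens."""
    return text.lower().replace(",", " ").split()


def build_index(archive):
    """Map each token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in archive.items():
        for token in tokens(text):
            index[token].add(doc_id)
    return index


def expensive_stage(text, counter):
    """Stand-in for costly downstream processing; counts how often it runs."""
    counter["processed"] += 1
    return text.upper()


def filter_downstream(archive, term, counter):
    """Process every document, then apply the filter at the application."""
    out = {}
    for doc_id, text in archive.items():
        processed = expensive_stage(text, counter)
        if term in tokens(text):
            out[doc_id] = processed
    return out


def filter_pushed_upstream(archive, index, term, counter):
    """Apply the filter at the index, so only matching docs are processed."""
    out = {}
    for doc_id in sorted(index.get(term, ())):
        out[doc_id] = expensive_stage(archive[doc_id], counter)
    return out
```

Both functions return the same documents for a given term; the difference is how many documents reach the expensive stage (all four versus only the two that mention the term in this toy archive).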

For the Midterm - 12/12/2001
• Monitor
• XML I/O support in the Catalyst library
• Lightweight Catalyst design
• Documentation

Catalyst collaborations
• Qanda
  - Catalyst-based Qanda used for TREC
  - Catalyst-based Qanda deployed at AFIWC
• Information retrieval
  - Archived annotation streams (for creating IR indexes)
  - Seekable streams (for processing IR queries)
• Other projects
  - ACE/Alembic (Information Extraction)
  - Audio hot-spotting (Speech Retrieval)
  - Reading-comp (Question Answering)

Document Management
• Process scheduling
• System linkage
• Inter-site cooperation support
• User features

Process Scheduling
• Problem: MiTAP needs the ability to prioritize sources
  - ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source
• Solution:
  - Preprocessing daemon will notify scheduler of incoming content
  - Scheduler assigns jobs to available resources based on priority
• Status:
  - Prototype scheduler delivered (Ponte)
  - Preprocessing daemon rewrite in mid-November (Wohlever)
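The scheduling idea above can be sketched with a priority queue. This is a minimal illustration, not the delivered prototype: the `Scheduler` class, its method names, the source names, and the convention that a lower number means higher priority are all assumptions made for the example.

```python
import heapq
import itertools


class Scheduler:
    """Toy priority scheduler: lower priority number = more urgent."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs of equal priority come out in arrival order.
        self._arrival = itertools.count()

    def notify(self, source, priority, message):
        """Called by a (hypothetical) preprocessing daemon for new content."""
        heapq.heappush(self._heap,
                       (priority, next(self._arrival), source, message))

    def next_job(self):
        """Return the most urgent (source, message), or None if idle."""
        if not self._heap:
            return None
        _, _, source, message = heapq.heappop(self._heap)
        return source, message

    def assign(self, workers):
        """Give each available worker the most urgent remaining job."""
        assignments = {}
        for worker in workers:
            job = self.next_job()
            if job is None:
                break
            assignments[worker] = job
        return assignments
```

With this arrangement, a backlog queued at low priority for a newly added source does not delay a message from an important existing source that arrives later, which is the problem the slide describes.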

System Linkage
• Problem: Ever notice how new features tend to only apply to new content?
  - MiTAP is not flexible - difficult to:
    - Reprocess and repost a message that has errors
    - Find the original source document
    - Etc.
  - Currently, retroactive changes require 11th hour hacking (or sometimes 12th hour hacking)
• Solution: Keep database of linkage information to make the system more flexible
• Status:
  - Additional information currently being logged
  - Linkage database - March
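One way to picture a linkage database is a single table that ties each posted message back to where it came from. The sketch below uses Python's built-in `sqlite3` module; the table name, column names, and helper functions are hypothetical and are not taken from the MiTAP design.

```python
import sqlite3


def open_linkage_db(path=":memory:"):
    """Create (if needed) and open a toy linkage database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS linkage (
               message_id TEXT PRIMARY KEY,  -- id of the posted message
               source_url TEXT NOT NULL,     -- where the content was captured
               raw_path   TEXT NOT NULL,     -- stored copy of the original
               newsgroup  TEXT NOT NULL      -- where the message was posted
           )"""
    )
    return conn


def record_posting(conn, message_id, source_url, raw_path, newsgroup):
    """Log the linkage for one posted message (re-posting overwrites)."""
    conn.execute(
        "INSERT OR REPLACE INTO linkage VALUES (?, ?, ?, ?)",
        (message_id, source_url, raw_path, newsgroup),
    )
    conn.commit()


def find_original(conn, message_id):
    """Return (source_url, raw_path) for a message, or None if unknown."""
    return conn.execute(
        "SELECT source_url, raw_path FROM linkage WHERE message_id = ?",
        (message_id,),
    ).fetchone()


def messages_from_source(conn, source_url):
    """List message ids captured from one source, e.g. for reprocessing."""
    rows = conn.execute(
        "SELECT message_id FROM linkage WHERE source_url = ? ORDER BY message_id",
        (source_url,),
    )
    return [row[0] for row in rows]
```

With records like these, finding the original source document or selecting every message from one source for reprocessing becomes a query rather than a manual search.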

Inter-site Cooperation Support
• Problem: Collaboration with other TIDES contractors who have large legacy systems
  - Issue of communication more than scalability
• Solution:
  - Linkage database for annotations, similar to the one used for system maintenance
  - Web client server communication
  - Path to scalable solution w/richer interactions
• Status:
  - Data management - January
  - Communications: investigation of relevant protocols and preliminary design - completed
  - Native XML support for Catalyst - December

User features
• Problem: MiTAP helps you find good information, then what?
• Solution:
  - Web accessible support for user views and data organization to assist in reporting and analysis
  - Automated view construction/feedback incorporating additional TIDES technologies
• Status:
  - Schema for v.1 of workspace developed (Ponte, Anderson)
  - Supporting code in progress (Ponte)
  - Prototype - December

Geo-Spatial Normalization - Goal
Goal:
  We have: Text containing place names
  We want: Points on maps
Process:
  Extract place names
  Look up places on a list
  Determine Lat-Long
  Display
Example: Seattle, 47.6 N 122.317 W
Problems:
• Place name not on list
• More than one place with same name
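The process and its two problems can be sketched as a simple list lookup in Python. This is an illustration under assumptions: the gazetteer is a tiny hand-made dictionary, the coordinates other than Seattle's (which come from the slide) are approximate values added for the example, and the capitalized-word extractor is a naive stand-in for a real name tagger.

```python
import re

# Hypothetical gazetteer: place name -> list of (lat, lon) candidates.
# Seattle's coordinates are from the slide; the others are approximate.
GAZETTEER = {
    "Seattle": [(47.6, -122.317)],
    "Paris": [(48.857, 2.352), (33.661, -95.556)],  # France; Texas
}


def extract_candidates(text):
    """Naive stand-in for a name tagger: every capitalized word."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def resolve(name, gazetteer):
    """Look a name up on the list and report which case applies."""
    matches = gazetteer.get(name, [])
    if not matches:
        return ("not_on_list", None)      # problem 1: name not on list
    if len(matches) > 1:
        return ("ambiguous", matches)     # problem 2: several places share it
    return ("resolved", matches[0])       # a single point to display


if __name__ == "__main__":
    sentence = "Flights from Seattle to Paris were delayed; Gulu reported new cases."
    for name in extract_candidates(sentence):
        print(name, resolve(name, GAZETTEER))
```

Only the "resolved" case yields a point that can be put on a map directly; the other two cases are the problems listed on the slide.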

Geo-Spatial Normalization - Solution
Solution: Part 1: A significant portion of the references can be resolved using easy methods.
  Unambiguous: Seattle, Toulouse
  Ambiguous: Paris, Washington
  Disambiguated: Paris, Texas; The State of Washington
Solution: Part 2: Use the “easily resolved” references as training data for a machine learning classifier which will distinguish the rest.
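The first half of this two-part approach, collecting the easily resolved references as labeled examples, can be sketched as follows. The sense inventory, the qualifier pattern, and the function name are assumptions made for illustration, and the classifier that would later be trained on the labeled examples is not shown.

```python
import re

# Hypothetical sense inventory: place name -> possible referents.
SENSES = {
    "Seattle": ["Seattle, Washington, USA"],
    "Toulouse": ["Toulouse, France"],
    "Paris": ["Paris, France", "Paris, Texas, USA"],
}

# Hypothetical in-text qualifiers that settle an otherwise ambiguous name.
QUALIFIED = {
    r"\bParis, Texas\b": ("Paris", "Paris, Texas, USA"),
}


def label_easy(sentence):
    """Split the place names in a sentence into auto-labeled and unresolved.

    Returns (labeled, unresolved): labeled items are (name, sense, context)
    and could serve as training examples; unresolved items are
    (name, context) and are left for a classifier to decide.
    """
    labeled, unresolved = [], []
    consumed = set()
    for pattern, (name, sense) in QUALIFIED.items():
        if re.search(pattern, sentence):
            labeled.append((name, sense, sentence))
            consumed.add(name)
    for name, senses in SENSES.items():
        if name in consumed:
            continue
        if not re.search(r"\b%s\b" % re.escape(name), sentence):
            continue
        if len(senses) == 1:
            labeled.append((name, senses[0], sentence))   # unambiguous
        else:
            unresolved.append((name, sentence))           # needs the classifier
    return labeled, unresolved
```

The surrounding sentence is kept with each labeled example because a classifier for the unresolved cases would presumably need context features to learn from.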

Geo-Spatial Normalization - Plans
For MidTerm (Dec. 12, 2001)
• Detect a significant portion of the “easily resolvable” references
• Display with some map tool
  - Web delivery desirable
After MidTerm (May, 2002)
• Try to find more “easily resolvable” references
• Do the machine learning part
• Integrate with other mapping tools

IFE-Bio Schedule
Area | What | Why | When
Availability | Add user accounts | Widen access to system | by request
Data Capture | Improve quality of online capture | Improve system utility | as sources are added
Data Capture | Build new message processing daemon | Increase throughput, decrease posting latency | mid-November
Data Capture | Replace tides2000 with more powerful machine | Increase throughput, decrease posting latency | November
Data Capture | Simplify document processing scripts & improve logging and error detection | Simplify admin duties | December
Analysis | Augment search page functionality | Simplify finding relevant data | ongoing
Analysis | Handle zoning & encoding issues better | Improve translations | ongoing
Analysis | Add MT for other languages | Support Arabic, others | as available
Analysis | Add question answering | Simplify finding relevant data | December
Analysis | Improve sorting, filtering, thumbnail "key entity" list | Provide better filtering (e.g., FBIS, ReliefWeb), provide better name tagging to be used for better sorting into newsgroups | soon
Product | (see architecture schedule, following) | |
Evaluation | Disease of the Month Experiments | Assess utility, evaluate usability, measure progress | monthly

Architecture Schedule
Area | What | Why | When
Data Capture | Scheduler Prototype | Support of new message capture daemon | Delivered, support ongoing
Data Capture | DB Tools | Prerequisite for system linkage and intersite cooperation | January
Data Capture | System Linkage DB | Enable addition of new features; ease system administration | March
Analysis | Architecture support for Q&A | | December
Product | User Workspace Prototype | Support for report construction | December
Infrastructure | Catalyst Monitor | Ease development and debugging | December
Infrastructure | Native XML Support | Support for legacy systems | December
Infrastructure | Documentation | Usability | Ongoing

Issues and Discussion
• How is MiTAP currently being used?
  - Who are the users?
  - What are the users doing?
  - What do users want?
• Prioritization of issues
  - Integrated feasibility experiment versus operational prototype:
    - Possible deployment vs integration of other TIDES technologies (Do we need to adjust our priorities?)
  - Along what dimensions should we optimize?
    - Availability, capture, analysis, presentation