The proliferation of plans can result in debilitating information overload in public health and medical emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states and each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they are in alignment with one another. Human analysis of plans is time-consuming and difficult, so text analysis software tools are needed that can help humans (a) compare plans to find gaps or discrepancies and (b) locate relevant sections of plans and display links to them. This research-in-progress describes two text analysis tools being developed at the Los Alamos National Laboratory as part of E-SOS (Emergency Situation Overview and Synthesis): the Theme Awareness Tool (THEMAT) and Content Awareness Tool (CAT). Both tools were used to analyze pan flu plans from the White House, the US Department of Health and Human Services, the Centers for Disease Control, and the 50 states.
Form 836 (7/06)
LA-UR-09-02971
Approved for public release; distribution is unlimited.
Title: Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning
Author(s): Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake (Los Alamos National Laboratory); Geoffrey Hoare, Rhonda White (Florida Department of Health)
Intended for: 6th International Conference on Information Systems for Crisis Response and Management, Special Session on Solutions for Information Overload, May 10-13, 2009, Göteborg, Sweden
Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.
Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009. J. Landgren and S. Jul, eds.
Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning
James E. Powell, Presenting Author
Los Alamos National Laboratory
Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake
Los Alamos National Laboratory
Geoffrey Hoare, Rhonda White
Florida Department of Health
Problem – Information Overload
The proliferation of plans can result in debilitating information overload in public health and medical emergencies
Pandemic influenza (pan flu) example
• The US Department of Health and Human Services (HHS) and its Centers for Disease Control and Prevention (CDC) have pan flu plans intended to coordinate the 50 states’ responses
• Each of the 50 states has its own pan flu plan
• These plans and related implementing materials are often hundreds of pages long and get updated frequently
• Plans need to be analyzed, compared, and revised so that they are in alignment with one another
• Human analysis of plans is time-consuming and difficult
Solution – Text Analysis and Document Linking
Text analysis software tools are needed that can help humans
1. Compare plans to find gaps or discrepancies by:
• Extracting key concepts or themes from each plan
• Identifying unique and common themes
• Displaying the themes in a table and in a network visualization to facilitate comparison
2. Locate relevant sections of plans by:
• Conducting a federated search of indexed plans
• Displaying links to relevant plans
• Executing both tasks while users are writing sections of a plan, based on what they are writing
Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US
Two text analysis tools are being developed as part of E-SOS: Emergency Situation Overview and Synthesis
• Project began in 2007
• Builds on several years of prior work by team members
E-SOS employs a number of tools
• Collaborative workspaces
• Semantic web technologies
• Digital library technologies
• Awareness tools
Two of the awareness tools are being used to analyze pan flu plans
• THEMAT (Theme Awareness Tool)
• CAT (Content Awareness Tool)
E-SOS – System Goals
Collaborative workspaces where users can report and discuss information
“Awareness tools” which display information that’s relevant to what users are currently reporting and discussing
Technologies that synthesize information from heterogeneous sources
Texts – Pan Flu Plans
The White House Implementation Plan for the National Strategy for Pandemic Influenza
US Health and Human Services Federal Guidance to Assist States in Improving State-Level Pandemic Influenza Operating Plans
The Centers for Disease Control and Prevention (CDC) Influenza Pandemic Operation Plan (OPLAN)
Pan flu plans for the states
• In the initial study (December 2008 – February 2009) the Florida Department of Health team provided the LANL team with the above pan flu plans as well as pan flu plans for six states in the US
• In the subsequent study (March 2009 – present) the LANL team expanded the project to include pan flu plans for all 50 states in the US, in keeping with its mission as a national laboratory
THEMAT – Methodology
Extract themes, called knowledge signatures (kSigs), from each document
Create sets of kSigs (taxonomies) for each document or set of documents
Compare taxonomies to identify unique and common themes
Compute the network analytic relationships among themes
Generate theme network (tNet) visualizations for each document or set of documents
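The first three methodology steps can be illustrated with a minimal sketch. The slides do not say how THEMAT derives kSigs, so the frequency-based `extract_themes` below is only a stand-in technique, and the function names, stopword list, and toy taxonomies are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not THEMAT's).
STOPWORDS = {"this", "that", "with", "from", "have", "will", "their"}

def extract_themes(text, top_n=5):
    """Return the top_n most frequent longer words as stand-in themes."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return {term for term, _ in counts.most_common(top_n)}

def compare_taxonomies(taxonomies):
    """Split themes into those common to every document and those unique to one.

    taxonomies maps a document name to its set of themes.
    """
    common = set.intersection(*taxonomies.values())
    unique = {}
    for name, themes in taxonomies.items():
        others = set().union(*(t for n, t in taxonomies.items() if n != name))
        unique[name] = themes - others
    return common, unique

# Made-up taxonomies, for illustration only.
toy = {
    "federal": {"vaccine", "authorities", "countries"},
    "state": {"vaccine", "schools"},
}
common, unique = compare_taxonomies(toy)
print("common:", sorted(common))
for name in sorted(unique):
    print(name, "only:", sorted(unique[name]))
```

Keeping each taxonomy as a plain set makes the common and unique themes simple set operations, which maps directly onto a side-by-side column display.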
THEMAT – Results
A set of knowledge signatures (kSigs) was generated for each document
The common themes in pan flu plans were identified
The unique themes in pan flu plans were identified
The common and unique themes were displayed side-by-side in columns to facilitate comparison
The kSigs were highlighted in all documents
Theme networks (tNets) were created for each plan to display key concepts and their relationships
Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, Centers for Disease Control, and six states showing similarities and differences: for example, “authorities” and “countries” are important in the federal plans but not the state plans
Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, and Centers for Disease Control showing similarities and differences: for example, “global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan
Interactive interface for finding the specific files where a knowledge signature (kSig) is located: in this case, “global influenza preparedness” can be found in Appendix C of the Health and Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan
Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in this case, “global influenza preparedness” in the US White House pan flu plan
Navigation column with links to all of the pages and kSigs in a document
Interactive interface visualizing the theme network (tNet) for the US Centers for Disease Control pan flu plan, showing which themes are related to each other and the importance of each theme (most important to least important = red, blue, green, yellow)
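One way to picture the data behind a theme network is a co-occurrence graph. The sketch below assumes that two themes are related when they appear in the same section and that a theme's importance is its total edge weight; the slides do not state how THEMAT defines either, so both choices and all names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_theme_network(sections, themes):
    """Count how often each pair of themes appears in the same section.

    sections is a list of text strings; themes is an iterable of phrases.
    Matching is a plain substring test, which is crude but keeps the sketch short.
    """
    edges = defaultdict(int)
    for section in sections:
        text = section.lower()
        present = sorted(t for t in themes if t in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return dict(edges)

def rank_themes(edges):
    """Rank themes by summed edge weight, highest first (ties broken by name)."""
    strength = defaultdict(int)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up sections, for illustration only.
sections = [
    "vaccine distribution and surveillance",
    "vaccine and surveillance",
    "school closure",
]
edges = build_theme_network(sections, ["vaccine", "surveillance", "distribution", "closure"])
for theme, score in rank_themes(edges):
    print(theme, score)
```

A ranking like this could then be bucketed into the four color bands the caption describes, though the thresholds would be a further assumption.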
Visualization of the relative number of themes in the plans and the degree of similarity among and between them: in this case, the CDC plan and the California plan are least similar to each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California plan; the White House plan is a few degrees closer to the CDC plan; and the White House plan contains more of the core themes than the other plans do
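The slides do not name the similarity measure behind this visualization. As a rough illustration of how plans could be compared by their themes, the sketch below uses Jaccard similarity over theme sets, which is a swapped-in technique rather than the measure THEMAT uses. Plan names and themes are made up.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_similarity(plans):
    """Map each pair of plan names to the similarity of their theme sets."""
    return {
        (x, y): jaccard(plans[x], plans[y])
        for x, y in combinations(sorted(plans), 2)
    }

def least_similar(plans):
    """Return the pair of plan names with the lowest similarity."""
    sims = pairwise_similarity(plans)
    return min(sims, key=sims.get)

# Hypothetical theme sets, for illustration only.
plans = {
    "plan_a": {"vaccine", "surveillance", "antivirals"},
    "plan_b": {"vaccine", "surveillance"},
    "plan_c": {"vaccine", "schools"},
}
print(least_similar(plans))
```

In a layout like the one described, the least similar pair would anchor the axis and the remaining plans would be placed by their similarity to each anchor.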
THEMAT Analysis of Pan Flu Plans – Preliminary Conclusions
Displays of common and unique themes reveal similarities and differences in the plans
• Some of the differences reflect differences in the authorities and responsibilities of the government agencies that created the plans
• Some of the differences reflect inconsistencies in terminology, which is a potential human factors problem
Displays of knowledge signatures (kSigs) highlighted in the text reflect the original document structure
• A common structure for all of the documents and plans would make it easier for users to read through the highlighted kSigs and compare plans
Theme networks (tNets) provide a more compact view of themes than lists
• Visualizations are not as familiar as lists, so some users may find them difficult to understand
CAT – Methodology
Step 1: Create a repository of content at LANL
Step 2: Configure the CAT to include this repository as one of the targets of the federated search
Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans
Step 4: Revise the targets as needed
CAT Components
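The flow in the methodology steps can be sketched as follows. The deck describes segments of parsed text being sent as REST queries to each target; this sketch replaces the network calls with in-memory targets and a simple word-overlap score, so the scoring, the function names, and the example.org URLs are all assumptions.

```python
import re

def segment(text):
    """Split composed text into sentence-like segments, one query per segment."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokens(text):
    """Lowercased set of alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def best_link(target_docs, query):
    """Return the URL of the document sharing the most words with query, or None."""
    q = tokens(query)
    best_url, best_score = None, 0
    for url, body in target_docs:
        score = len(q & tokens(body))
        if score > best_score:
            best_url, best_score = url, score
    return best_url

def federated_search(targets, text):
    """Query every target with every segment; keep at most one link per target."""
    results = []
    for seg in segment(text):
        for name, docs in targets.items():
            url = best_link(docs, seg)
            if url is not None:
                results.append((seg, name, url))
    return results

# Hypothetical targets: one local repository and one website.
targets = {
    "local": [
        ("https://example.org/local/antivirals", "antiviral stockpile distribution"),
        ("https://example.org/local/schools", "school closure guidance"),
    ],
    "web": [("https://example.org/web/vaccines", "vaccine priority groups")],
}
for row in federated_search(targets, "School closure triggers. Vaccine priority for responders."):
    print(row)
```

Revising the targets (Step 4) then amounts to editing the `targets` mapping and rerunning the same search.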
CAT – Results
A repository of content was created at LANL
The CAT was configured to include this repository as one of the targets of the federated search
The number of links returned depends on the number of targets
• One link per target is returned for each segment of parsed text (aka REST query)
• In order to increase the number of links, the number of targets has to be increased
The repository of pan flu documents was subsequently divided into a number of smaller targets in order to increase the number of links
This allows for finer-grained access
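Because one link per target is returned for each segment, the upper bound on links is the number of segments times the number of targets. The sketch below shows that arithmetic and one hypothetical way to divide a repository into smaller targets; grouping by a `source` field is an assumption, since the slides do not say how the repository was split.

```python
def max_links(num_segments, num_targets):
    """Upper bound on links when each target returns one link per segment."""
    return num_segments * num_targets

def split_repository(documents, key):
    """Group documents into smaller targets, one target per distinct key value."""
    targets = {}
    for doc in documents:
        targets.setdefault(key(doc), []).append(doc)
    return targets

# Made-up document records, for illustration only.
repository = [
    {"source": "federal", "title": "plan 1"},
    {"source": "state", "title": "plan 2"},
    {"source": "state", "title": "plan 3"},
]
targets = split_repository(repository, key=lambda d: d["source"])

print("one target:", max_links(3, 1))
print("after split:", max_links(3, len(targets)))
```

Splitting on a finer key, such as one target per plan, raises the bound further for the same composed text.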
CAT – Screen Shot
Results of a federated search of (1) one local repository of pandemic flu plans and (2) four websites with pandemic flu content
CAT Linking of Pan Flu Plans – Preliminary Conclusions
The resulting output, showing links to relevant texts, reflects the target structure
• Creating a collection of smaller, finer-grained targets optimizes the federated search capabilities of the CAT
Under these conditions, the CAT facilitates:
• Locating content that is related to the topic the user is writing about
• Following links to that content
• Creating a list of references for the topic
Future Work
Use another E-SOS tool, the Web Awareness Tool (WEBAT), to collect more pan flu content from the web
• Focus on adding pan flu plans from other countries
Use the THEMAT to analyze the additional content
Use the CAT to make the additional content a target of a federated search
Bridge the two tools in order to better support planning
Share the resulting text analysis with the appropriate federal agencies in the US
Share relevant portions of the text analysis with the appropriate state agencies in the US and request that the person(s) responsible for writing pan flu plans complete a questionnaire providing feedback
Questions?
CAT – Drupal – Screen Shot