• Information overload
– more than 12 million references already in MEDLINE
– thousands more each day
– well-articulated queries retrieve many relevant articles
• Most information from an article is not used
• Breast cancer risk factors are not well understood
– Other than age and gender, current risk factors explain only half of breast cancer occurrence
Tammy Tengs, Sc.D.
Health Priorities Research Group
School of Social Ecology
University of California, Irvine
Catherine Blake, M.S., MCompSc
Information and Computer Science
University of California, Irvine
Wanda Pratt, Ph.D.
Information School and
Division of Biomedical & Health Informatics
University of Washington
• Traditional meta-analysis considers only the 29 studies where smoking is a primary topic
• Information synthesis considers secondary information from the remaining 58 studies
• Including secondary information can reduce certain publication biases
[Figure: Breast Cancer Studies that Report Smoking. Legend: smoking reported as primary topic / secondary topic; scale: 1 study]
Research Goals
(1) Understand how scientists in medicine and public health currently use biomedical literature to answer research questions
(2) Design and implement technology to support the observed information behaviors and work practices
(3) Use the technology to quantify the risk of smoking and breast cancer
Motivation
Conclusion
Method
• User study
– 2 groups of scientists in medicine and public health
– observations and interviews during a systematic review
– retrospective analysis after a meta-analysis
• Meta-analysis training
• Literature review
Information Synthesis
Information Synthesis Framework
[Framework diagram labels: Research Question, Contextual Questions, Domain Corpus, Selection, MEDLINE, Extraction, Concepts, Facts, External Data, Analysis, Verification, Collaboration]
• During a systematic review, scientists iterate between retrieval, extraction and analysis
• Most of the required information is located only within the full text of an article (not the abstract or title)
• Information synthesis requires collaboration
• Pilot study demonstrated
– Automated extraction looks promising
– Automated analysis successful
Future Work
• Framework to support scientists as they retrieve, extract and analyze information from articles
• Supports observed user behaviors and work practices
Pilot implementation
• Automated Extraction
– heuristic approach
– training set precision and recall = (0.84, 0.86)
– test set precision and recall = (0.68, 0.71)
• Automated Analysis
– Random-effects meta-analysis module implemented in Java
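A minimal sketch of random-effects pooling, to make the analysis step concrete. The poster does not say which estimator the Java module uses; the DerSimonian-Laird method is assumed here, the sketch is written in Python rather than Java for brevity, and the function name `random_effects` and the sample inputs are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool per-study effect sizes (e.g. log odds ratios) with the
    DerSimonian-Laird random-effects estimator (an assumed choice).

    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2


# Hypothetical inputs: two studies with equal variance and differing effects.
pooled, se, tau2 = random_effects([0.0, 1.0], [0.01, 0.01])
```

When the studies agree closely, the between-study variance is truncated to zero and the result reduces to the fixed-effect estimate.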
Critical components
Implementation and Use
What are women with breast cancer exposed to?
What are women in a similar population exposed to?
Are these rates significantly different?
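One way to compare the two exposure rates is a two-proportion z-test. The poster does not state which test is used, so this is a sketch under that assumption; the function name `two_proportion_z_test` and the counts are hypothetical, not results from the study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1 of n1 women with breast cancer were exposed to the risk factor;
    x2 of n2 women in a similar population were exposed.
    Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: 60 of 100 exposed versus 40 of 100 exposed.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
```

The normal approximation behind this test is only reasonable when each group has enough exposed and unexposed subjects; small studies would call for an exact test instead.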
(2) External database of risk factors
For each study
• number of patients
• age of patients
• risk-factor exposure
• …
Codebook
• age, gender
• % responses
• location
• …
(1) Breast cancer studies
3
Using Information Synthesis to Quantify Breast Cancer Risk Factors
Automated Information Extraction and Analysis for
Information Synthesis
Blake, C. and Pratt, W. (In Press). Collaborative Information Synthesis. American Society for Information Science and Technology (ASIST), 2002, Philadelphia, PA.
Blake, C. (2002). Information Synthesis: A Process Used by Scientists in Medicine and Public Health to Overcome Information Overload. Fourth International Conference on Conceptions of Library and Information Science: Emerging Frameworks and Methods (CoLIS 4), Doctoral Forum, Seattle, WA.
Tengs, T. and Osgood, N.D. (2001). The Link Between Smoking and Impotence: Two Decades of Evidence. Preventive Medicine, 32(6), 447-452.
• Further evaluation of extraction algorithms
• Implementation of verification component
• Final smoking and breast cancer analysis
Related Work
• Identify information from text
• Verify the information extracted
• Synthesis using meta-analysis
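Verifying extracted information is commonly scored with precision and recall, the two measures reported for the pilot extraction. The sketch below shows how such scores could be computed against a manually verified set of facts. The tuple layout (study, field, value), the function name `precision_recall`, and the sample facts are hypothetical.

```python
def precision_recall(extracted, gold):
    """Score extracted facts against a manually verified gold set.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold facts
    """
    extracted = set(extracted)
    gold = set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical facts, each a (study, field, value) tuple.
extracted = [("study1", "patients", 120), ("study1", "age", 54),
             ("study2", "patients", 80)]
gold = [("study1", "patients", 120), ("study2", "patients", 80),
        ("study2", "age", 61), ("study3", "patients", 200)]
precision, recall = precision_recall(extracted, gold)
```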
Funded by California Breast Cancer Research Program