
• Information overload
– more than 12 million references already in MEDLINE
– thousands more each day
– well-articulated queries retrieve many relevant articles
• Most information from an article is not used
• Breast cancer risk factors are not well understood
– other than age and gender, current risk factors explain only half of breast cancer occurrence

Tammy Tengs, Sc.D.
Health Priorities Research Group, School of Social Ecology
University of California, Irvine

Catherine Blake, M.S., MCompSc
Information and Computer Science
University of California, Irvine

Wanda Pratt, Ph.D.
Information School and Division of Biomedical & Health Informatics
University of Washington

• Of the 87 breast cancer studies that report smoking, traditional meta-analysis considers only the 29 in which smoking is a primary topic
• Information synthesis also uses the secondary smoking information reported in the remaining 58 studies
• Including secondary information can reduce certain publication biases

[Figure: Breast Cancer Studies that Report Smoking, grouped by whether smoking is a primary or a secondary topic (legend: 1 study)]

Research Goals
(1) Understand how scientists in medicine and public health currently use the biomedical literature to answer research questions

(2) Design and implement technology to support the observed information behaviors and work practices

(3) Use the technology to quantify the breast cancer risk associated with smoking

Motivation

Conclusion

Method
• User study
– 2 groups of scientists in medicine and public health
– observations and interviews during a systematic review
– retrospective analysis after a meta-analysis
• Meta-analysis training
• Literature review

Information Synthesis

Information Synthesis Framework

[Figure: Information Synthesis Framework, with components Research Question, Selection, Domain Corpus (MEDLINE), Extraction, Facts, Concepts, Contextual Questions, External Data, Analysis, Verification and Collaboration]

• During a systematic review, scientists iterate between retrieval, extraction and analysis
• Most of the information required is located only in the full text of an article (not in the abstract or title)
• Information synthesis requires collaboration
• The pilot study demonstrated that:
– automated extraction looks promising
– automated analysis was successful

Future Work

• Framework to support scientists as they retrieve, extract and analyze information from articles
• Supports observed user behaviors and work practices

Pilot implementation
• Automated Extraction
– heuristic approach
– training set precision and recall = (0.84, 0.86)
– test set precision and recall = (0.68, 0.71)
• Automated Analysis
– random-effects meta-analysis module implemented in Java
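The poster describes the extraction step only as a "heuristic approach" scored by precision and recall. The sketch below illustrates what such a heuristic and its scoring could look like. It is an assumption throughout: the regular expression, the function names `extract_patient_count`, `precision_recall` and `f_measure`, and the balanced F-measure are hypothetical illustrations, not the authors' extraction rules.

```python
import re
from typing import Optional

# Hypothetical heuristic: a count (optionally with thousands separators)
# followed by a population noun such as "women" or "cases".
PATIENT_COUNT = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(?:women|patients|cases|controls|subjects)\b",
    re.IGNORECASE,
)


def extract_patient_count(sentence: str) -> Optional[int]:
    """Return the first count matched by the heuristic pattern, or None."""
    match = PATIENT_COUNT.search(sentence)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def precision_recall(extracted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold standard."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(extract_patient_count("We enrolled 1,234 women aged 40-69."))  # 1234
print(round(f_measure(0.84, 0.86), 2))  # training set: 0.85
print(round(f_measure(0.68, 0.71), 2))  # test set: 0.69
```

Combining the reported pairs this way gives a single F-measure of about 0.85 on the training set and 0.69 on the test set, which makes the drop from training to test data easy to see at a glance.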
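The poster states that a random-effects meta-analysis module was implemented in Java but does not name the estimator. As a hedged illustration only, the Python sketch below uses the widely used DerSimonian-Laird method-of-moments estimator and assumes that each study contributes a log odds ratio and its variance. The names `dersimonian_laird` and `PooledEstimate` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    log_effect: float  # pooled log odds ratio
    se: float          # standard error of the pooled log odds ratio
    tau2: float        # estimated between-study variance

    def ci95(self) -> tuple[float, float, float]:
        """Return (odds ratio, lower 95% bound, upper 95% bound)."""
        lo = self.log_effect - 1.96 * self.se
        hi = self.log_effect + 1.96 * self.se
        return math.exp(self.log_effect), math.exp(lo), math.exp(hi)


def dersimonian_laird(log_effects: list[float], variances: list[float]) -> PooledEstimate:
    """Pool per-study log odds ratios with a DerSimonian-Laird random-effects model."""
    k = len(log_effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return PooledEstimate(pooled, se, tau2)
```

When the studies agree, Q falls below its degrees of freedom, `tau2` is truncated to zero and the result reduces to a fixed-effect analysis. When they disagree, `tau2` grows and smaller studies receive relatively more weight, which is the usual rationale for a random-effects model across heterogeneous breast cancer studies.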

Critical components

Implementation and Use

(1) Breast cancer studies: what are women with breast cancer exposed to?
– for each study: number of patients, age of patients, risk-factor exposure, …

(2) External database of risk factors: what are women in a similar population exposed to?
– codebook: age, gender, % responses, location, …

(3) Are these rates significantly different?
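The comparison in step (3) can be illustrated with a standard two-proportion z-test. This is a sketch under stated assumptions, not the poster's method. The record types `StudyRecord` and `ReferenceRate`, the lookup of the reference rate by an `age_group` key, and the reconstruction of a count from "% responses" times the number of respondents are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical fields extracted from one breast cancer study."""
    n_patients: int
    n_exposed: int   # patients reporting the risk-factor exposure
    age_group: str   # used to find a comparable population row


@dataclass
class ReferenceRate:
    """Hypothetical row from the external risk-factor database."""
    n_respondents: int
    pct_exposed: float  # "% responses" reporting the exposure


def compare_exposure(study: StudyRecord,
                     table: dict[str, ReferenceRate]) -> tuple[float, float, float, float]:
    """Two-proportion z-test: study exposure rate vs. matched population rate."""
    ref = table[study.age_group]  # assumed matching on age group only
    n1, n2 = study.n_patients, ref.n_respondents
    x1 = study.n_exposed
    x2 = ref.pct_exposed / 100.0 * n2  # approximate count from a percentage
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p1, p2, z, p_value
```

Only age is used for matching here to keep the sketch short; a fuller version would also match on the other codebook fields (gender, location) before comparing rates.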

Using Information Synthesis to Quantify Breast Cancer Risk Factors

Automated Information Extraction and Analysis for Information Synthesis


• Further evaluation of extraction algorithms
• Implementation of the verification component
• Final smoking and breast cancer analysis

Related Work

• Identify information from text
• Verify the information extracted
• Synthesis using meta-analysis

Funded by the California Breast Cancer Research Program