Aiding Biomedical Researchers with Tools to Assist Discovery Neil R. Smalheiser May 18, 2006


Page 1: Aiding Biomedical Researchers with Tools to Assist Discovery Neil R. Smalheiser May 18, 2006


Page 2

Don Swanson: Undiscovered Public Knowledge
“A affects B”, (separately) “B affects C”
Does A affect C?
The pieces are all public, but need to be put together to see a pattern

Page 3

One node (open) search:
formulate a problem (literature A)
find a different literature C containing complementary information
focus on implicit links between A and C
But… most scientists already have more hypotheses and leads than they can handle!

Page 4

The Two Node (Closed) Search

Link between A and C is either known (often newly discovered) or hypothesized

Examine title terms B in common between A and C as possibly pointing to meaningful links

A and C don’t have to be disjoint!
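As a rough illustration of the two node idea, the sketch below collects title words from two small sets of titles and keeps the words they share as candidate B-terms. The function names, the tokenisation rule, the toy titles and the tiny stoplist are all assumptions made for this example; the slides do not describe how Arrowsmith itself extracts terms.

```python
import re

def title_terms(titles, stoplist=frozenset()):
    """Collect lowercase single-word terms from a list of titles."""
    terms = set()
    for title in titles:
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in stoplist:
                terms.add(word)
    return terms

def shared_b_terms(titles_a, titles_c, stoplist=frozenset()):
    """Return the title terms that occur in both literature A and literature C."""
    return title_terms(titles_a, stoplist) & title_terms(titles_c, stoplist)

# Made-up toy titles, for illustration only.
titles_a = [
    "Magnesium and vascular reactivity",
    "Magnesium deficiency and platelet aggregation",
]
titles_c = [
    "Platelet aggregation in migraine",
    "Vascular tone during migraine attacks",
]
toy_stoplist = frozenset({"and", "in", "during"})

b_terms = shared_b_terms(titles_a, titles_c, toy_stoplist)
print(sorted(b_terms))
```

A real system would presumably also handle multi-word phrases and a much larger stoplist; this sketch only shows the set intersection at the core of the approach.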

Page 5

The Arrowsmith Project
Human Brain Project, NLM, NIMH
Public web interfaces for one and two node searches
Develop the system further in collaboration with neuroscience field testers

Page 6

http://arrowsmith.psych.uic.edu

Page 7

Lessons from Field Testers
Used Arrowsmith two node search for many daily information needs
finding, assessing or prioritizing hypotheses
Items studied in common to two literatures
Browsing unfamiliar literature C for the subset that is likely to be most relevant to familiar literature A
Arrowsmith as an extension of PubMed searches

Page 8

Lessons for the “Back End”
Two node searches need to be fast (seconds, not minutes)
B-list needs to be assessed quickly (seconds or minutes, not hours)
No need to be comprehensive
No need to find only “novel” links

Page 9

Filtering and Ranking B-terms
Features permitting users to filter and rank B-terms:
Semantic categories
Frequency
Recency
MeSH
Characteristic-ness
Coherence
Stoplist
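To make the filtering step concrete, here is a minimal sketch of how a B-list might be narrowed by frequency, recency, semantic category and stoplist. The `BTerm` record, its field names, the reading of "recency" as the first year a term appears, and the sample data are all hypothetical; the slides name the features but do not define them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BTerm:
    term: str        # the shared title term
    count_a: int     # titles in literature A containing it (assumed field)
    count_c: int     # titles in literature C containing it (assumed field)
    first_year: int  # earliest year the term appears (assumed meaning of recency)
    category: str    # semantic category label (assumed field)

def filter_b_terms(terms, min_count=1, since_year=None,
                   categories=None, stoplist=frozenset()):
    """Keep terms passing every active filter, most frequent first."""
    kept = []
    for t in terms:
        if t.term in stoplist:
            continue
        if min(t.count_a, t.count_c) < min_count:
            continue
        if since_year is not None and t.first_year < since_year:
            continue
        if categories is not None and t.category not in categories:
            continue
        kept.append(t)
    # Rank by the smaller of the two literature counts, highest first.
    return sorted(kept, key=lambda t: min(t.count_a, t.count_c), reverse=True)

# Made-up candidates, for illustration only.
candidates = [
    BTerm("calcium", 12, 7, 1975, "chemical"),
    BTerm("study", 300, 250, 1950, "other"),
    BTerm("serotonin", 3, 20, 1980, "chemical"),
    BTerm("stress", 1, 9, 1985, "finding"),
]
kept = filter_b_terms(candidates, min_count=2,
                      categories={"chemical"}, stoplist={"study"})
print([t.term for t in kept])
```

Each filter is optional, which mirrors the idea of letting the user decide which features to apply to a given search.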

Page 10

A quantitative model for filtering and ranking B-terms
Even though each search is different, and each person has their own idea of “relevance”, we can identify features that are associated with chosen B-terms
Chose 5 gold standards, with user-chosen positive and negative B-terms
Combined all 7 features into a single logistic regression model (optimal weighting of each feature; one score for each B-term; the score varies for each two node search)
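A logistic regression model of this kind can be sketched as a weighted sum of the seven features passed through the logistic function. The feature names below follow the list on the previous slide, but the numeric encoding of each feature and the weights are placeholders invented for this example; the fitted weights are not given in the slides, and the slides do not say whether the reported B-term score is the weighted sum or the resulting probability.

```python
import math

# Feature names taken from the slide; how each is encoded as a number is assumed.
FEATURES = (
    "semantic_category", "frequency", "recency", "mesh",
    "characteristic", "coherence", "stoplist",
)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def b_term_score(features, weights, bias=0.0):
    """Combine the seven feature values into one probability-like score."""
    z = bias + sum(weights[name] * features[name] for name in FEATURES)
    return sigmoid(z)

# Placeholder weights: NOT the fitted values from the model in the slides.
placeholder_weights = {name: 0.0 for name in FEATURES}
placeholder_weights["coherence"] = 2.0

example_term = {name: 0.0 for name in FEATURES}
example_term["coherence"] = 1.0

score = b_term_score(example_term, placeholder_weights)
print(round(score, 3))
```

In practice the weights would be estimated from the gold standard searches, using the user-chosen positive and negative B-terms as labels.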

Page 11

Table columns: ID; A-literature query; C-literature query; Raw B-terms; Relevant B-terms sought

ID 1
A-literature query: retinal detachment[ti] (n = 5122)
C-literature query: aortic aneurysm[ti] (n = 5687)
Raw B-terms: n = 2294
Relevant B-terms sought:
a) diseases or syndromes in which both features have been described (n = 30)
b) surgical procedures used for diagnosis or treatment of both (n = 26)

ID 2
A-literature query: mglur5[ti] OR (metabotropic glutamate receptor[ti] OR metabotropic glutamate receptors[ti]) (n = 2032)
C-literature query: Lewy body[ti] OR Lewy bodies[ti] (n = 1141)
Raw B-terms: n = 820
Relevant B-terms sought:
a) signaling molecules that directly or indirectly modulate or are modulated by mGluR5 and that either modulate Lewy bodies or are altered in diseases that have Lewy bodies (n = 19)
b) specific brain regions studied in both (n = 42)

ID 3
A-literature query: "magnesium"[MeSH Terms] AND magnesium[ti] AND ("1900"[PDAT] : "1987/12/31"[PDAT]) (n = 6238)
C-literature query: ("migraine disorders"[MeSH] AND migraine[ti]) AND ("1900"[PDAT] : "1987/12/31"[PDAT]) (n = 3205)
Raw B-terms: n = 1879
Relevant B-terms sought: terms described as relevant in the JASIST paper (ref. 23, in Appendix), excluding two judged too general to be useful (reactivity and spreading) (n = 41)

ID 4
A-literature query: beta-amyloid precursor protein[ti] OR amyloid precursor protein[ti] OR APP[ti] AND ("amyloid"[MeSH Terms] OR amyloid[Text Word]) (n = 2118)
C-literature query: reelin[All Fields] (n = 493)
Raw B-terms: n = 1003
Relevant B-terms sought: genes or proteins shared in Reelin and APP (amyloid precursor protein) signal transduction pathways (n = 54)

ID 5
A-literature query: ("nitric oxide"[MeSH Terms] OR nitric oxide[ti]) AND (("mitochondria"[MeSH Terms] OR mitochondria[ti]) OR mitochondrial[ti]) (n = 786)
C-literature query: (psd[ti] OR psd93[ti] OR psd95[ti] OR psds[ti]) OR "postsynaptic density"[ti] OR "postsynaptic densities"[ti] (n = 545)
Raw B-terms: n = 584
Relevant B-terms sought: physiological or pathological responses that link the action of nitric oxide on mitochondria and the normal function of post-synaptic densities (n = 51)

Page 12

Some Findings of the Model
Coherence was most important in identifying relevant B-terms.
Characteristic value, semantic category mapping, frequency and recency all contributed significantly as well.
More than 5% of the marked relevant B-terms in the gold standard searches were terms found on the 1400-word stoplist (e.g., Down Syndrome)

Page 13

[Figure: number of B-terms (y-axis, 0 to 120) plotted against B-term score (x-axis, -0.2 to 0.2) for the search retinal detachment vs aortic aneurysm. Legend: predicted non-relevant, predicted relevant.]

Page 14

[Figure: predicted precision and predicted recall (both axes 0 to 1) for the search retinal detachment vs aortic aneurysm.]
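One plausible way to obtain a predicted precision/recall curve without relevance judgments is to treat each B-term's model output as its probability of being relevant and accumulate expected counts down the ranked list. The sketch below does exactly that; it is an assumption about how such a curve could be computed, not a description of the method behind the figure, and the probabilities in the example are made up.

```python
def predicted_precision_recall(probabilities):
    """Return (cutoff, predicted precision, predicted recall) for each rank.

    probabilities: per-term estimates of the chance a B-term is relevant.
    """
    ranked = sorted(probabilities, reverse=True)
    total_expected = sum(ranked)  # expected number of relevant terms overall
    curve = []
    running = 0.0
    for k, p in enumerate(ranked, start=1):
        running += p  # expected relevant terms among the top k
        precision = running / k
        recall = running / total_expected if total_expected > 0 else 0.0
        curve.append((k, precision, recall))
    return curve

# Made-up probabilities, for illustration only.
curve = predicted_precision_recall([0.8, 0.2, 0.6, 0.4])
for k, precision, recall in curve:
    print(k, round(precision, 3), round(recall, 3))
```

Because the list is ranked by probability, predicted precision can only fall or stay level as the cutoff moves down the list, while predicted recall rises to 1.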

Page 15

[Figure: four panels, each plotting number of B-terms (y-axis) against B-term score (x-axis, -0.2 to 0.2). Panel titles: two randomly selected literatures; mesothelioma vs machiavellianism; retinal detachment vs aortic aneurysm; mesothelioma/etiology vs mesothelioma/physiology. Legend: predicted non-relevant, predicted relevant.]

Page 16

Implications
We can now rank all B-terms rigorously and automatically, in order of the probability that they will be found relevant by SOME user
We can now predict the NUMBER of relevant B-terms in any given search
Can apply to B-terms arising within abstracts
We now have a global measure of OVERALL implicit information linking two (topical, disjoint) literatures
Can apply to one node searches too!
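If each B-term carries an estimated probability of being relevant, ranking and counting both follow directly: sort by probability, and sum the probabilities to get the expected number of relevant terms. The sketch below assumes the model output can be read as such a probability; the function name and the sample values are hypothetical.

```python
def rank_and_count(term_probabilities):
    """Rank terms by probability and estimate how many are relevant.

    term_probabilities: dict mapping each B-term to its estimated
    probability of being judged relevant.
    """
    ranked = sorted(term_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    expected_relevant = sum(term_probabilities.values())
    return ranked, expected_relevant

# Made-up probabilities, for illustration only.
ranked, expected_relevant = rank_and_count(
    {"calcium": 0.9, "study": 0.1, "serotonin": 0.5}
)
print(ranked)
print(expected_relevant)
```

The same sum, taken over all B-terms of a search, is one candidate for a single number summarising how much implicit information links the two literatures.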

Page 17

Conclusion
The two node search can now be conducted and analyzed in a matter of minutes, not hours or days
Can be utilized by the general scientific public for a variety of information needs, including but NOT restricted to searching for and assessing hypotheses

Page 18

Thanks to…
Vetle Torvik
Don Swanson
Wei Zhou
Maryann Martone & Guy Perkins
Ramin Homayouni
Bob Bilder & Don Kalar