Lars Juhl Jensen Biomedical text mining

Lars Juhl Jensen

Biomedical text mining

exponential growth

~45 seconds per paper

information retrieval

named entity recognition

augmented browsing

text corpora

information extraction

information retrieval

find the relevant papers

ad hoc retrieval

user-specified query

“yeast AND cell cycle”

PubMed

indexing

fast lookup
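
The indexing idea can be sketched as an inverted index: each word points to the set of papers that contain it, so a Boolean AND query becomes a set intersection instead of a scan over all text. This is a minimal illustration, not how PubMed is implemented; the three toy documents and the function names `build_index` and `search_and` are assumptions made for the example, and "cell cycle" is treated as two separate words rather than as a phrase.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token.strip(".,;:()")].add(doc_id)
    return index

def search_and(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

# Toy corpus (hypothetical titles, for illustration only).
docs = {
    1: "Cell cycle regulation in yeast",
    2: "Yeast two-hybrid screening",
    3: "The human cell cycle",
}
index = build_index(docs)
hits = search_and(index, ["yeast", "cell", "cycle"])
print(hits)
```

Only document 1 contains all three words, so it is the only hit.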

stemming

word endings
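
Stemming can be illustrated with a crude suffix stripper that maps different word endings to one shared stem before indexing. This is a deliberately simplified sketch and not a real stemming algorithm such as Porter's; the suffix list and the name `crude_stem` are assumptions chosen for the example.

```python
# Suffixes are tried in this order; the list is an illustrative assumption.
SUFFIXES = ("ing", "ion", "ed", "es", "e", "s")

def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

forms = ["phosphorylate", "phosphorylates", "phosphorylated",
         "phosphorylating", "phosphorylation"]
stems = {crude_stem(form) for form in forms}
print(stems)
```

All five forms collapse to the same stem, so a query for one of them would also retrieve papers that use the others.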

dynamic query expansion

MeSH terms
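
Query expansion can be sketched as replacing each query term with an OR group of related terms taken from a controlled vocabulary. The thesaurus below is a toy table invented for illustration; its entries are not copied from MeSH, and `expand_query` is a hypothetical helper, not a PubMed function.

```python
# Toy thesaurus; entries are illustrative assumptions, not actual MeSH data.
THESAURUS = {
    "yeast": ["yeast", "saccharomyces cerevisiae"],
    "cell cycle": ["cell cycle", "cell division cycle"],
}

def expand_query(terms, thesaurus=THESAURUS):
    """Turn each term into an OR group of its variants and AND the groups."""
    groups = []
    for term in terms:
        variants = thesaurus.get(term.lower(), [term.lower()])
        groups.append("(" + " OR ".join(f'"{v}"' for v in variants) + ")")
    return " AND ".join(groups)

expanded = expand_query(["yeast", "cell cycle"])
print(expanded)
```

Terms that are missing from the thesaurus pass through unchanged, so the expansion never narrows the original query.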

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find that

named entity recognition

computer

as smart as a dog

teach it specific tricks

identify the concepts

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

comprehensive lexicon

proteins

chemicals

compartments

tissues

diseases

organisms

CDC2

cyclin dependent kinase 1

orthographic variation

upper- and lower-case

CDC2

Cdc2

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

prefixes and postfixes

CDC2

hCDC2

“black list”

SDS
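
The lexicon-based approach with orthographic variation and a black list can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's tagger: the two-name lexicon, the identifier `CDK1` attached to each synonym, the single optional `h` prefix, and the assumption that `SDS` appears in the lexicon as a gene name are all choices made for the example.

```python
import re

# Hypothetical mini-lexicon: surface name -> entity identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",  # assumed lexicon entry that collides with a common reagent
}
# Names that are too ambiguous in running text are never tagged.
BLACKLIST = {"sds"}

def name_pattern(name):
    """Regex that ignores case, treats spaces and hyphens as
    interchangeable, and allows an optional 'h' prefix (as in hCDC2)."""
    parts = re.split(r"[\s\-]+", name)
    body = r"[\s\-]?".join(re.escape(part) for part in parts)
    return re.compile(r"\bh?" + body + r"\b", re.IGNORECASE)

def tag(text):
    """Return (start offset, matched text, identifier), sorted by offset."""
    hits = []
    for name, identifier in LEXICON.items():
        if name.lower() in BLACKLIST:
            continue
        for match in name_pattern(name).finditer(text):
            hits.append((match.start(), match.group(0), identifier))
    return sorted(hits)

text = "hCDC2, also called Cdc2 or cyclin-dependent kinase 1, was run on an SDS gel."
mentions = tag(text)
print(mentions)
```

All three spellings resolve to the same identifier, while the blacklisted `SDS` is left untagged.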

scalable implementation

text corpora

>10 km

<10 hours

most use Medline

~22 million abstracts

few use full-text articles

no access

PDF files

layout-aware extraction

millions of full-text articles

information extraction

formalize the facts

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

two approaches

co-mentioning

counting

within documents

within paragraphs

within sentences

co-mentioning score
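
Co-mention counting can be sketched as a weighted sum over the levels at which two entities appear together. The weights and the function `comention_scores` below are assumptions for illustration and not the scoring scheme used by the author; the paragraph level is left out to keep the sketch short, and the input is assumed to be already tagged, with each sentence reduced to the set of entities it mentions.

```python
from collections import Counter
from itertools import combinations

def comention_scores(docs, w_doc=1.0, w_sent=2.0):
    """docs: list of documents; each document is a list of sentences;
    each sentence is the set of entity names found in it."""
    scores = Counter()
    for sentences in docs:
        # Document level: every pair mentioned anywhere in the same document.
        doc_entities = set().union(*sentences)
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += w_doc
        # Sentence level: pairs in the same sentence get an extra weight.
        for entities in sentences:
            for pair in combinations(sorted(entities), 2):
                scores[pair] += w_sent
    return scores

docs = [
    [{"Cdc28", "Swe1"}, {"Cdc5"}],   # document 1: two sentences
    [{"Cdc28"}, {"Swe1"}],           # document 2: two sentences
]
scores = comention_scores(docs)
print(scores.most_common())
```

The pair that shares a sentence ends up with the highest score, reflecting the idea that closer co-mentions are stronger evidence of an association.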

NLP

Natural Language Processing

grammatical analysis

part-of-speech tagging

multiword detection

semantic tagging

sentence parsing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

extract stated facts

high precision

poor recall
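
The trade-off can be seen in a tiny pattern-based extractor. It is a sketch under strong assumptions and stands in for the full grammatical analysis described above: the `GENE` regex is a crude substitute for a real entity tagger, and the single hand-written pattern covers only passive phrasings such as "X is controlled by Y".

```python
import re

# Crude stand-in for an entity tagger: a capital letter, letters, then digits.
GENE = r"[A-Z][A-Za-z]*\d+"

# One hand-written rule: "<genes> is/are controlled/regulated by <gene>".
PATTERN = re.compile(
    rf"(?P<targets>{GENE}(?:(?:, | and ){GENE})*) "
    rf"(?:is|are) (?P<verb>controlled|regulated) by (?P<regulator>{GENE})"
)

def extract_relations(sentence):
    """Return (regulator, verb, target) triples stated by the sentence."""
    relations = []
    for match in PATTERN.finditer(sentence):
        for target in re.findall(GENE, match.group("targets")):
            relations.append((match.group("regulator"), match.group("verb"), target))
    return relations

stated = extract_relations(
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
)
missed = extract_relations("HAP1 controls the expression of CYC1")
print(stated)
print(missed)
```

The rule extracts both relations from the first sentence and rarely fires by accident, but the same fact phrased in the active voice is missed entirely, which is the high-precision, poor-recall behavior.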

Exercise

Go to http://diseases.jensenlab.org

Find TYMS disease associations

Inspect the text-mining evidence

Look for examples of synonym usage

Find genes linked to colorectal cancer

thank you!