The pragmatic text miner: It’s just another type of poorly standardized data

Page 1: The pragmatic text miner: It’s just another type of poorly standardized data

Lars Juhl Jensen

The pragmatic text miner: It’s just another type of poorly standardized data

why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

text corpus

comprehensive lexicon

synonyms

expansion rules

prefixes and suffixes

flexible matching

hyphens and spaces

“black list”

a
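
The named entity recognition slides above (lexicon, synonyms, flexible matching of hyphens and spaces, black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the identifiers `ID1` to `ID3`, the tiny lexicon, and the function names are all hypothetical, and treating "a" as a black-listed name is one reading of the slide.

```python
import re

# Hypothetical mini-lexicon: identifier -> names and synonyms.
LEXICON = {
    "ID1": ["CDK1", "CDC2"],
    "ID2": ["IL-2", "interleukin 2"],
    "ID3": ["a"],  # a name too ambiguous to be useful
}

# Names that are in the lexicon but should never be matched.
BLACKLIST = {"a"}


def name_pattern(name):
    """Build a case-insensitive regex in which hyphens and spaces
    inside the name are interchangeable and optional."""
    tokens = [re.escape(t) for t in re.split(r"[-\s]+", name) if t]
    body = r"[-\s]?".join(tokens)
    # Require that the match is not embedded in a longer alphanumeric word.
    return re.compile(
        r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE
    )


def tag(text):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    hits = []
    for ident, names in LEXICON.items():
        for name in names:
            if name.lower() in BLACKLIST:
                continue  # skip black-listed names entirely
            for m in name_pattern(name).finditer(text):
                hits.append((m.start(), m.end(), m.group(0), ident))
    return sorted(hits)


if __name__ == "__main__":
    for hit in tag("IL2 and interleukin-2 activate a CDC2 homolog."):
        print(hit)
```

Compiling one pattern per name keeps the sketch short; a real tagger over a large lexicon would need a faster lookup structure.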

co-mentioning

within documents

within paragraphs

within sentences

weighted score
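
One way to picture the co-mentioning slides above is a score that adds a contribution for each level (document, paragraph, sentence) at which two entities appear together. The sketch below assumes a corpus already tagged with entity identifiers; the weights `W_DOC`, `W_PAR`, and `W_SENT` are arbitrary placeholders, not the values used in any published resource.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights for the three co-mention levels.
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 3.0


def pairs(entities):
    """All unordered pairs of entities, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(corpus):
    """corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity IDs."""
    scores = defaultdict(float)
    for doc in corpus:
        sent_pairs, par_pairs, doc_entities = set(), set(), set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
                par_entities |= set(sentence)
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        # Each document contributes at most once per level and pair.
        for pair in pairs(doc_entities):
            scores[pair] += W_DOC
            if pair in par_pairs:
                scores[pair] += W_PAR
            if pair in sent_pairs:
                scores[pair] += W_SENT
    return dict(scores)


if __name__ == "__main__":
    corpus = [
        [[{"A", "B"}, {"C"}], [{"D"}]],  # A and B share a sentence
        [[{"A"}], [{"B"}]],              # A and B share only the document
    ]
    for pair, score in sorted(comention_scores(corpus).items()):
        print(pair, score)
```

A raw count like this still has to be normalized against how often each entity is mentioned overall before it is comparable across pairs.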

unifying text & data

text mining

curated knowledge

experimental data

computational predictions

protein networks

string-db.org Szklarczyk et al., Nucleic Acids Research, 2015

chemical networks

stitch-db.org Kuhn et al., Nucleic Acids Research, 2014

subcellular localization

compartments.jensenlab.org Binder et al., Database, 2014

tissue expression

tissues.jensenlab.org Santos et al., submitted, 2015

disease associations

diseases.jensenlab.org Frankild et al., Methods, 2015

many databases

different formats

different identifiers

variable quality

not comparable

hard work

common identifiers

quality scores

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
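
Calibrating quality scores against a gold standard, as the slides above put it, can be illustrated with a simple binning scheme: rank the scored items that the gold standard covers, split them into bins, and record the fraction that is correct in each bin. This is only a sketch under that assumption; the cited work may fit its calibration differently, and `calibrate`, `to_probability`, and the toy data are hypothetical.

```python
def calibrate(scored, bin_size):
    """scored: list of (raw_score, is_correct) pairs for items covered
    by the gold standard. Returns a list of (mean_raw_score, precision)
    tuples, one per bin, from highest to lowest raw score."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    curve = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        precision = sum(1 for _, ok in chunk if ok) / len(chunk)
        curve.append((mean_score, precision))
    return curve


def to_probability(raw_score, curve):
    """Map a raw score to the precision of the nearest calibration bin."""
    return min(curve, key=lambda point: abs(point[0] - raw_score))[1]


if __name__ == "__main__":
    toy = [(9, True), (8, True), (7, True), (6, False),
           (3, False), (2, True), (1, False), (0, False)]
    curve = calibrate(toy, bin_size=4)
    print(curve)
    print(to_probability(8, curve))
```

Once every evidence type is expressed as a probability of being correct, scores from different sources become directly comparable.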

general framework

interactive web resources

semantic web services

augmented browsing

reflect.ws Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

medical data mining

Jensen et al., Nature Reviews Genetics, 2012

structured data

Jensen et al., Nature Reviews Genetics, 2012

119 million diagnoses

6.2 million patients

distributions

Jensen et al., Nature Communications, 2014

trajectories

Jensen et al., Nature Communications, 2014

clinical narrative

unstructured text

Danish

busy doctors

comprehensive lexicon

adverse drug events

drugs

Clozapine

clozapin

clossapin

klozapine

chlosapin

chlosapine

chlozapin

chlozapine

klossapin

closapine

klozapin

klosapin

Clozapine
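
The spelling variants listed above follow a regular pattern (c/k/ch, then z/s/ss, then an optional final e), so a single expansion rule can cover all of them. The regular expression below is derived only from the variants shown on the slide; it is an illustrative sketch, not the rule set of the system described in the talk, and `find_drug_mentions` is a hypothetical helper.

```python
import re

# One pattern covering the variants on the slide:
# (ch|c|k) + "lo" + (ss|s|z) + "apin" + optional "e".
CLOZAPINE = re.compile(r"\b(?:ch|c|k)lo(?:ss|s|z)apine?\b", re.IGNORECASE)

# Variants as listed on the slide.
VARIANTS = [
    "clozapin", "clossapin", "klozapine", "chlosapin", "chlosapine",
    "chlozapin", "chlozapine", "klossapin", "closapine", "klozapin",
    "klosapin", "Clozapine",
]


def find_drug_mentions(text):
    """Return every substring of text that matches the variant pattern."""
    return [m.group(0) for m in CLOZAPINE.finditer(text)]


if __name__ == "__main__":
    print(all(CLOZAPINE.fullmatch(v) for v in VARIANTS))
    print(find_drug_mentions(
        "Started klozapin; earlier chlosapine was stopped. Also on clonazepam."
    ))
```

Writing the variation as a rule rather than enumerating spellings also catches combinations that happen not to appear in the list, at the cost of possible false positives for other drug names with similar spellings.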

rule-based system

Eriksson et al., Drug Safety, 2014

[Figure labels: Identification start, Drug introduction, Drug discontinuation, Adverse event, Negative modifier, Indication, Pre-existing condition, Adverse drug reaction, Possible adverse drug reaction, ADR of additional drug]

direct medical implications

Acknowledgments

STRING/STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Sune Pletscher-Frankild, Jianyi Lin, Pablo Minguez, Christian von Mering, Peer Bork

Text mining: Sune Pletscher-Frankild, Jasmin Saric, Evangelos Pafilis, Alberto Santos, Janos Binder, Kalliopi Tsafou, Heiko Horn, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue

EHR mining: Anders Boeck Jensen, Robert Eriksson, Peter Bjødstrup Jensen, Andreas Bok Andersen, Sabrina Gade Ellesøe, Henriette Schmock, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak