The pragmatic text miner: It’s just another type of poorly standardized data


Lars Juhl Jensen


why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

text corpus

comprehensive lexicon

synonyms

expansion rules

prefixes and suffixes

flexible matching

hyphens and spaces

“black list”

a
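The named entity recognition steps above (comprehensive lexicon, synonyms, expansion rules with prefixes and suffixes, flexible matching of hyphens and spaces, a black list) can be illustrated with a small dictionary-based tagger. This is a minimal sketch under stated assumptions: the toy lexicon, its identifiers, the `protein`/`gene` suffix rule and the one-entry black list are all hypothetical, and greedy longest match is one simple strategy, not necessarily the one used by the tools cited in this talk.

```python
import re

# Hypothetical toy lexicon: synonym -> canonical identifier.
LEXICON = {
    "interleukin 6": "IL6",
    "IL6": "IL6",
    "tumor necrosis factor": "TNF",
    "TNF": "TNF",
    "A": "GENE_A",  # hypothetical entity whose name is an ordinary English word
}
# Hypothetical black list of names that match far too much ordinary text.
BLACKLIST = {"a"}
# Hypothetical expansion rule: every synonym may also appear with these suffixes.
SUFFIXES = ["", " protein", " gene"]


def key_of(text):
    """Lowercase and drop hyphens/spaces so 'IL-6', 'IL 6' and 'IL6' share one key."""
    return "".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_dictionary(lexicon, blacklist, suffixes):
    """Expand each non-blacklisted synonym with the suffix rules."""
    dictionary = {}
    for synonym, ident in lexicon.items():
        if key_of(synonym) in blacklist:
            continue
        for suffix in suffixes:
            dictionary[key_of(synonym + suffix)] = ident
    return dictionary


def tag(text, dictionary, max_tokens=5):
    """Greedy longest match over token n-grams; returns (matched text, identifier)."""
    tokens = list(re.finditer(r"[A-Za-z0-9]+", text))
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = "".join(t.group().lower() for t in tokens[i:i + n])
            if key in dictionary:
                matches.append((text[tokens[i].start():tokens[i + n - 1].end()],
                                dictionary[key]))
                i += n
                break
        else:  # no n-gram starting at this token matched
            i += 1
    return matches


DICTIONARY = build_dictionary(LEXICON, BLACKLIST, SUFFIXES)
print(tag("Interleukin-6 and IL6 protein rose, and TNF was a marker.", DICTIONARY))
# [('Interleukin-6', 'IL6'), ('IL6 protein', 'IL6'), ('TNF', 'TNF')]
```

Dropping all separators makes `IL-6`, `IL 6` and `IL6` share one key; the price is occasional collisions between unrelated names, which a curated black list can also catch.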

co-mentioning

within documents

within paragraphs

within sentences

weighted score
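One way to turn co-mentions within documents, paragraphs and sentences into a weighted score is sketched below. The weights, the exponent `ALPHA` and the formula itself (the raw weighted count blended with an observed-versus-expected enrichment) are assumptions chosen for illustration, not presented as the published scoring scheme. Input is assumed to be nested lists: documents, then paragraphs, then sentences, then entity IDs.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder weights (assumptions): closer co-mentions contribute more.
W_DOC, W_PARA, W_SENT = 1.0, 2.0, 0.5
ALPHA = 0.6  # assumed balance between raw count and enrichment


def pairs(entities):
    """All unordered pairs of distinct entity IDs, each pair sorted."""
    return set(combinations(sorted(set(entities)), 2))


def co_mention_counts(documents):
    """documents -> paragraphs -> sentences -> entity IDs; one weighted count per pair."""
    counts = defaultdict(float)
    for doc in documents:
        in_doc = pairs(e for para in doc for sent in para for e in sent)
        in_para = set().union(*(pairs(e for sent in para for e in sent) for para in doc))
        in_sent = set().union(*(pairs(sent) for para in doc for sent in para))
        for pair in in_doc:
            # Each level counts at most once per document.
            counts[pair] += W_DOC + W_PARA * (pair in in_para) + W_SENT * (pair in in_sent)
    return counts


def weighted_scores(counts):
    """Blend raw count c with enrichment: c^a * (c * N / (c_i * c_j))^(1 - a)."""
    marginal = defaultdict(float)
    for (a, b), c in counts.items():
        marginal[a] += c
        marginal[b] += c
    total = sum(marginal.values())  # sum over the full symmetric count matrix
    return {(a, b): c ** ALPHA * (c * total / (marginal[a] * marginal[b])) ** (1 - ALPHA)
            for (a, b), c in counts.items()}


docs = [
    [[["A", "B"]], [["C"]]],  # A and B share a sentence; C is in another paragraph
    [[["A"], ["C"]]],         # A and C share a paragraph but not a sentence
]
counts = co_mention_counts(docs)
scores = weighted_scores(counts)
```

Here A and B, mentioned in the same sentence once, end up with a higher score than B and C, which only share a document.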

unifying text & data

text mining

curated knowledge

experimental data

computational predictions

protein networks

Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org

chemical networks

Kuhn et al., Nucleic Acids Research, 2014
stitch-db.org

subcellular localization

Binder et al., Database, 2014
compartments.jensenlab.org

tissue expression

Santos et al., submitted, 2015
tissues.jensenlab.org

disease associations

Frankild et al., Methods, 2015
diseases.jensenlab.org

many databases

different formats

different identifiers

variable quality

not comparable

hard work

common identifiers

quality scores

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
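Calibrating raw scores against a gold standard can be illustrated with simple binning: sort scored pairs by raw score, measure the fraction of each bin that appears in the gold standard, and interpolate between bins to map any raw score onto that scale. This is a minimal sketch; the bin size, the assumption that every scored pair absent from the gold standard counts as a negative, and the linear interpolation are illustrative choices, not the procedure of the cited paper.

```python
from bisect import bisect_left


def calibration_points(scored_pairs, gold_standard, bin_size=100):
    """scored_pairs: list of (pair, raw_score); gold_standard: set of true pairs.
    Returns one (mean raw score, fraction in gold standard) point per bin."""
    ranked = sorted(scored_pairs, key=lambda item: item[1])
    points = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        mean_score = sum(score for _, score in chunk) / len(chunk)
        precision = sum(pair in gold_standard for pair, _ in chunk) / len(chunk)
        points.append((mean_score, precision))
    return points


def to_probability(raw, points):
    """Piecewise-linear interpolation between bin points, clamped at both ends."""
    xs = [x for x, _ in points]
    if raw <= xs[0]:
        return points[0][1]
    if raw >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, raw)  # xs[i - 1] < raw <= xs[i], so x1 > x0 below
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)


# Toy data: ten pairs with raw scores 0..9; the five highest are in the gold standard.
scored = [(("p", i), float(i)) for i in range(10)]
gold = {("p", i) for i in range(5, 10)}
points = calibration_points(scored, gold, bin_size=5)
print(points)                       # [(2.0, 0.0), (7.0, 1.0)]
print(to_probability(4.5, points))  # 0.5
```

Once every source is mapped onto the same calibrated scale, scores from text mining, curated databases and experiments become directly comparable.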

general framework

interactive web resources

semantic web services

augmented browsing

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
reflect.ws

medical data mining

Jensen et al., Nature Reviews Genetics, 2012

structured data

Jensen et al., Nature Reviews Genetics, 2012

119 million diagnoses

6.2 million patients

distributions

Jensen et al., Nature Communications, 2014

trajectories

Jensen et al., Nature Communications, 2014
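A first step toward trajectories is counting which of two diagnoses tends to come first. The sketch below is a simplified illustration, not the published pipeline: it keeps each patient's first date per diagnosis, counts ordered pairs, and applies a one-sided exact binomial test for directionality. The record format `(patient, diagnosis, day)` is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def first_diagnoses(records):
    """records: (patient, diagnosis, day) tuples; keep each patient's first day per diagnosis."""
    first = {}
    for patient, diagnosis, day in records:
        key = (patient, diagnosis)
        if key not in first or day < first[key]:
            first[key] = day
    per_patient = {}
    for (patient, diagnosis), day in first.items():
        per_patient.setdefault(patient, {})[diagnosis] = day
    return per_patient


def ordered_pair_counts(per_patient):
    """Count, per ordered pair (A, B), the patients whose first A precedes their first B."""
    counts = Counter()
    for diagnoses in per_patient.values():
        for a, b in combinations(sorted(diagnoses), 2):
            if diagnoses[a] < diagnoses[b]:
                counts[(a, b)] += 1
            elif diagnoses[b] < diagnoses[a]:
                counts[(b, a)] += 1
    return counts


def direction_p_value(forward, backward):
    """One-sided exact binomial test (p = 0.5) that A->B is more common than B->A."""
    n = forward + backward
    return sum(comb(n, k) for k in range(forward, n + 1)) / 2 ** n


records = [("p1", "D1", 1), ("p1", "D2", 5), ("p1", "D1", 8),
           ("p2", "D1", 2), ("p2", "D2", 3),
           ("p3", "D1", 4), ("p3", "D2", 9),
           ("p4", "D2", 1), ("p4", "D1", 7)]
counts = ordered_pair_counts(first_diagnoses(records))
print(counts[("D1", "D2")], counts[("D2", "D1")])  # 3 1
print(direction_p_value(3, 1))                     # 0.3125
```

With millions of patients the same counting scales to all diagnosis pairs, and significant directional pairs can then be chained into longer trajectories.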

clinical narrative

unstructured text

Danish

busy doctors

comprehensive lexicon

adverse drug events

drugs

Clozapine

clozapin

clossapin

klozapine

chlosapin

chlosapine

chlozapin

chlozapine

klossapin

closapine

klozapin

klosapin

Clozapine
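Because an exact search for `clozapine` would miss every misspelling above, the lexicon itself has to list the variants. A minimal sketch, assuming all the variants on the slide map to one canonical name and that notes are tokenized on word characters (Python's `\w` is Unicode-aware for `str` patterns, so Danish letters such as `å` stay inside tokens); the Danish example sentence is illustrative only.

```python
import re

# Spelling variants from the slide; mapping them all to one canonical name is the assumption.
CLOZAPINE_VARIANTS = [
    "clozapine", "clozapin", "clossapin", "klozapine", "chlosapin", "chlosapine",
    "chlozapin", "chlozapine", "klossapin", "closapine", "klozapin", "klosapin",
]
DRUG_LEXICON = {variant: "clozapine" for variant in CLOZAPINE_VARIANTS}


def find_drugs(note):
    """Return the canonical drug name for every token that is a known spelling variant."""
    return [DRUG_LEXICON[token] for token in re.findall(r"\w+", note.lower())
            if token in DRUG_LEXICON]


print(find_drugs("Patienten er startet på klozapin."))  # ['clozapine']
```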

rule-based system

Eriksson et al., Drug Safety, 2014

[Figure: rule-based timeline. Events: Drug introduction, Drug discontinuation, Identification start, Adverse event. Labels: Negative modifier, Indication, Pre-existing condition, Adverse drug reaction, Possible adverse drug reaction, ADR of additional drug]
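The decision logic of such a rule-based system can be illustrated with a heavily simplified, hypothetical rule set: a mention is discarded if it is negated, if it is a known indication for the drug, or if the same condition was already recorded before the drug was introduced; remaining mentions inside the treatment period are flagged as possible adverse drug reactions. The `Mention` type, the `INDICATIONS` table and the category names are assumptions for illustration, not the rules of Eriksson et al.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    term: str
    day: int
    negated: bool = False  # set when a negative modifier applies ("no sedation")


# Hypothetical indication table: conditions a drug is prescribed for are not its ADRs.
INDICATIONS = {"clozapine": {"schizophrenia"}}


def classify(drug, start_day, stop_day: Optional[int], event, history):
    """Classify one adverse-event mention against one drug's treatment period."""
    if event.negated:
        return "negated"
    if event.term in INDICATIONS.get(drug, set()):
        return "indication"
    if any(m.term == event.term and m.day < start_day and not m.negated for m in history):
        return "pre-existing condition"
    if event.day < start_day:
        return "before treatment"
    if stop_day is None or event.day <= stop_day:
        return "possible adverse drug reaction"
    return "after discontinuation"


history = [Mention("weight gain", 2)]
print(classify("clozapine", 10, 40, Mention("sedation", 15), history))
# possible adverse drug reaction
```

Keeping each rule as a separate, ordered check makes it easy to inspect why a given mention was accepted or rejected.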

direct medical implications

Acknowledgments

STRING/STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Sune Pletscher-Frankild, Jianyi Lin, Pablo Minguez, Christian von Mering, Peer Bork

Text mining: Sune Pletscher-Frankild, Jasmin Saric, Evangelos Pafilis, Alberto Santos, Janos Binder, Kalliopi Tsafou, Heiko Horn, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue

EHR mining: Anders Boeck Jensen, Robert Eriksson, Peter Bjødstrup Jensen, Andreas Bok Andersen, Sabrina Gade Ellesøe, Henriette Schmock, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak