Lars Juhl Jensen
The pragmatic text miner
It’s just another type of poorly standardized data
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
comprehensive lexicon
synonyms
expansion rules
prefixes and suffixes
flexible matching
hyphens and spaces
“black list”
a
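The dictionary-based tagging steps above (lexicon with synonyms, flexible matching of hyphens and spaces, and a black list) can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the lexicon entries, the identifiers, the blacklisted name, and the function names `name_to_pattern` and `tag` are all hypothetical, and prefix/suffix expansion rules are left out.

```python
import re

# Hypothetical toy lexicon: lowercase name or synonym -> made-up identifier.
LEXICON = {
    "cdc2": "ID_CDC2",
    "cyclin-dependent kinase 1": "ID_CDC2",
    "il-2": "ID_IL2",
    "sds": "ID_SDS",
}

# Hypothetical black list: names judged too ambiguous to tag.
BLACKLIST = {"sds"}


def name_to_pattern(name):
    """Build a regex in which hyphens and spaces inside a name are optional.

    The name is split into runs of letters and runs of digits, so "cdc2"
    also matches "cdc-2" and "cdc 2". Other characters are ignored, which
    is a simplification for this sketch.
    """
    parts = re.findall(r"[A-Za-z]+|\d+", name)
    return r"\b" + r"[-\s]?".join(re.escape(p) for p in parts) + r"\b"


def tag(text):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    hits = []
    for name, identifier in LEXICON.items():
        if name in BLACKLIST:
            continue  # skip names on the black list
        pattern = name_to_pattern(name)
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), identifier))
    return sorted(hits)


hits = tag("Cdc-2 binds IL 2 in SDS buffer")
```

In this example `hits` contains matches for "Cdc-2" and "IL 2" even though the lexicon spells them "cdc2" and "il-2", while "SDS" is suppressed by the black list.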
co-mentioning
within documents
within paragraphs
within sentences
weighted score
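One way to picture a weighted co-mention score is sketched below. It assumes the text has already been tagged, so each sentence is a set of entity identifiers, and it adds a weight each time a pair co-occurs in a sentence, a paragraph, or a document. The three weights and the function name `comention_scores` are assumptions made for illustration; the talk does not give the actual weights or formula.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count for more.
W_DOCUMENT, W_PARAGRAPH, W_SENTENCE = 1.0, 2.0, 4.0


def comention_scores(corpus):
    """Score entity pairs by weighted co-mention counts.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity IDs.
    Every co-mentioning sentence, paragraph, and document adds its weight.
    """
    scores = defaultdict(float)
    for document in corpus:
        doc_entities = set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                for pair in combinations(sorted(sentence), 2):
                    scores[pair] += W_SENTENCE
                par_entities |= sentence
            for pair in combinations(sorted(par_entities), 2):
                scores[pair] += W_PARAGRAPH
            doc_entities |= par_entities
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += W_DOCUMENT
    return dict(scores)


corpus = [
    [[{"A", "B"}, {"A"}], [{"C"}]],  # document 1: two paragraphs
    [[{"A"}, {"C"}]],                # document 2: one paragraph
]
scores = comention_scores(corpus)
```

Here the pair A, B shares a sentence and therefore scores highest, A and C share a paragraph once and a document twice, and B and C only ever share a document.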
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
protein networks
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
chemical networks
Kuhn et al., Nucleic Acids Research, 2014
stitch-db.org
subcellular localization
Binder et al., Database, 2014
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org Santos et al., submitted, 2015
disease associations
diseases.jensenlab.org Frankild et al., Methods, 2015
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
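Calibration against a gold standard can be illustrated by binning the raw scores and asking what fraction of the pairs in each bin are confirmed by the gold standard; that fraction can then serve as a comparable quality score across sources. The sketch below makes strong simplifying assumptions: the function name `calibrate`, the bin edges, and the example data are hypothetical, and any pair missing from the gold standard is treated as a negative.

```python
def calibrate(scored_pairs, gold_positives, bin_edges):
    """Return, for each raw-score bin, the fraction of pairs in the gold standard.

    scored_pairs: dict mapping a pair to its raw score.
    gold_positives: set of pairs regarded as true by the gold standard.
    bin_edges: ascending edges; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, score in scored_pairs.items():
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positives
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Made-up raw scores and gold standard, for illustration only.
raw = {("A", "B"): 7.0, ("A", "C"): 4.0, ("B", "C"): 1.0, ("A", "D"): 1.5}
gold = {("A", "B"), ("A", "D")}
precision_per_bin = calibrate(raw, gold, [0, 2, 5, 10])
```

In practice one would fit a smooth curve through such per-bin fractions so that any raw score can be converted, rather than use the bins directly.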
general framework
interactive web resources
semantic web services
augmented browsing
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
reflect.ws
medical data mining
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
119 million diagnoses
6.2 million patients
distributions
Jensen et al., Nature Communications, 2014
trajectories
Jensen et al., Nature Communications, 2014
clinical narrative
unstructured text
Danish
busy doctors
comprehensive lexicon
adverse drug events
drugs
Clozapine
clozapin
clossapin
klozapine
chlosapin
chlosapine
chlozapin
chlozapine
klossapin
closapine
klozapin
klosapin
Clozapine
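Spelling variants like those listed above can be folded onto one lexicon entry by reducing each token to a normalized key before lookup. The rules below are hypothetical and hand-written to cover only the spellings shown here; they are not the method from the talk, and on a real drug lexicon they would need checking for collisions, since every "s" becomes "z".

```python
import re


def spelling_key(name):
    """Reduce a drug name to a crude key that ignores common misspellings."""
    key = name.lower()
    key = re.sub(r"^(chl|kl)", "cl", key)  # chl-/kl- at the start -> cl-
    key = re.sub(r"ss|s|z", "z", key)      # s, ss and z are treated alike
    key = re.sub(r"e$", "", key)           # optional trailing e
    return key


# Hypothetical one-entry lexicon keyed by the normalized spelling.
DRUGS = {spelling_key("clozapine"): "clozapine"}


def normalize_drug(token):
    """Return the canonical drug name for a token, or None if unknown."""
    return DRUGS.get(spelling_key(token))


variants = [
    "clozapin", "clossapin", "klozapine", "chlosapin", "chlosapine",
    "chlozapin", "chlozapine", "klossapin", "closapine", "klozapin",
    "klosapin", "Clozapine",
]
resolved = [normalize_drug(v) for v in variants]
```

All twelve spellings resolve to the same entry, while an unrelated token returns `None`.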
rule-based system
Eriksson et al., Drug Safety, 2014
[Figure labels: drug introduction, drug discontinuation, adverse event, identification start, negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, ADR of additional drug]
direct medical implications
Acknowledgments
STRING/STITCH
Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Sune Pletscher-Frankild, Jianyi Lin, Pablo Minguez, Christian von Mering, Peer Bork
Text mining
Sune Pletscher-Frankild, Jasmin Saric, Evangelos Pafilis, Alberto Santos, Janos Binder, Kalliopi Tsafou, Heiko Horn, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
EHR mining
Anders Boeck Jensen, Robert Eriksson, Peter Bjødstrup Jensen, Andreas Bok Andersen, Sabrina Gade Ellesøe, Henriette Schmock, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak