Mining text and data on chemicals
Lars Juhl Jensen
three parts
text mining
data integration
medical records
Part 1: text mining
exponential growth
some things are constant
~45 seconds per paper
information retrieval
find the relevant papers
still too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
identify the concepts
small molecules
proteins
diseases
comprehensive lexicon
synonyms
orthographic variation
“black list”
unfortunate names
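The dictionary-based tagging outlined on the slides above (lexicon, synonyms, orthographic variation, black list) can be sketched roughly as follows. This is only an illustrative sketch, not the system described in the talk: the lexicon entries, the identifiers such as `CHEM:aspirin`, the blacklisted name, and the function names `normalize`, `build_index`, and `tag` are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: identifier -> synonyms (not from any real resource).
LEXICON = {
    "CHEM:aspirin": ["aspirin", "acetylsalicylic acid"],
    "PROT:cdc2": ["CDC2", "cdc-2", "CDK1"],
    "PROT:cat": ["CAT"],
}
# Hypothetical black list of "unfortunate names" that clash with common words.
BLACKLIST = {"cat"}


def normalize(name):
    """Collapse orthographic variation: case, hyphens, and whitespace."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(lexicon, blacklist):
    """Map every normalized, non-blacklisted synonym to its identifiers."""
    blocked = {normalize(b) for b in blacklist}
    index = {}
    for identifier, names in lexicon.items():
        for name in names:
            key = normalize(name)
            if key not in blocked:
                index.setdefault(key, set()).add(identifier)
    return index


def tag(text, index, max_words=3):
    """Greedy longest-match tagging over windows of up to max_words tokens."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9\-]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            span = text[tokens[i][0]:tokens[i + n - 1][1]]
            key = normalize(span)
            if key in index:
                matches.append((span, sorted(index[key])))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches


index = build_index(LEXICON, BLACKLIST)
print(tag("Acetylsalicylic acid did not affect Cdc 2 or CAT.", index))
```

Normalizing both the lexicon and the text lets "Cdc 2", "cdc-2", and "CDC2" resolve to the same entry, while the black list keeps the ambiguous name out of the index entirely.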
Reflect
augmented browsing
browser add-on
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
Firefox
Internet Explorer
Google Chrome
Safari
Utopia Documents
web services
collaboration
SciVerse
information extraction
formalize the facts
co-mentioning
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
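The simpler of the two information extraction approaches named above, co-mentioning, can be sketched as counting how often two tagged entities appear in the same document. The slides do not say how co-mentions are scored, so this sketch only produces raw counts; the function name `comention_counts` and the toy input are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count, for each entity pair, the number of documents mentioning both.

    `documents` is an iterable of entity lists, one list per document,
    as a named entity recognition step might produce.
    """
    counts = Counter()
    for entities in documents:
        # Deduplicate so each document contributes at most once per pair,
        # and sort so every pair has one canonical ordering.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


# Toy input: entities already recognized in three abstracts.
docs = [
    ["aspirin", "PTGS2"],
    ["PTGS2", "aspirin", "PTGS1", "aspirin"],
    ["PTGS1"],
]
for pair, n in comention_counts(docs).most_common():
    print(pair, n)
```

Raw counts favor frequently mentioned entities, so a real system would normally normalize them against how often each entity occurs on its own.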
Part 2: data integration
STITCH
Kuhn et al., Nucleic Acids Research, 2012
~300,000 small molecules
~2.6 million proteins
1100+ genomes
experimental data
physical binding
chemical–protein
protein–protein
curated knowledge
drug targets
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
text mining
co-mentioning
NLP (Natural Language Processing)
many data types
many databases
different formats
different identifiers
variable quality
not comparable
spread over many genomes
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
probabilistic scores
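One way to picture calibration against a gold standard: sort the raw scores from one evidence source, bin them, and measure in each bin the fraction of pairs that the gold standard confirms. That fraction serves as a probabilistic score for raw scores in that range. The cited paper describes the actual procedure; the binning below is a simplified assumption, and `calibrate` is a hypothetical name.

```python
def calibrate(scored_pairs, bin_size):
    """Turn raw scores into a (mean raw score, precision) calibration curve.

    `scored_pairs` is a list of (raw_score, in_gold_standard) tuples, where
    in_gold_standard is True if the gold standard confirms the association.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    ordered = sorted(scored_pairs, key=lambda pair: pair[0])
    curve = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_raw = sum(score for score, _ in chunk) / len(chunk)
        precision = sum(1 for _, confirmed in chunk if confirmed) / len(chunk)
        curve.append((mean_raw, precision))
    return curve


# Toy data: higher raw scores are confirmed more often.
toy = [(1.0, False), (2.0, False), (3.0, True),
       (4.0, False), (5.0, True), (6.0, True)]
print(calibrate(toy, bin_size=2))
```

Because every evidence source is mapped onto the same scale (the probability of being correct according to the gold standard), scores from otherwise incomparable data types become comparable.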
orthology transfer
combine the evidence
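Once each evidence type carries a probabilistic score, a simple way to combine them is to assume the sources are independent and compute the probability that at least one of them is right. The slides do not give the formula, so the independence assumption and the function name `combine_evidence` are illustrative choices rather than the method used in STITCH.

```python
def combine_evidence(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - p_i)."""
    remaining = 1.0  # probability that every evidence source is wrong
    for p in scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score {p!r} is not a probability")
        remaining *= 1.0 - p
    return 1.0 - remaining


# Toy example: experimental, curated, and text mining scores for one pair.
print(combine_evidence([0.5, 0.4, 0.2]))
```

Under this scheme several weak but independent lines of evidence can add up to a confident association, and adding a source never lowers the combined score.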
Part 3: patient records
a hard problem
in Danish
by busy doctors
about psychiatric patients
no lexicon
acronyms
typos
delusions
domain specific system
patient record excerpt
F20
F200
Negation
Family
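The excerpt slide marks diagnosis codes together with negation and family mentions. A rule-based version of that idea might look like the sketch below. The real records are in Danish and the actual system is not described on the slides, so the English term list, the cue words, and the function name `annotate` are all assumptions for illustration only.

```python
import re

# Hypothetical term-to-code dictionary (ICD-10 codes as written on the slides).
TERMS = {
    "schizophrenia": "F20",
    "paranoid schizophrenia": "F200",
}
# Hypothetical English cue words; a Danish system would need its own lists.
NEGATION = re.compile(r"\b(no|not|denies|without)\b")
FAMILY = re.compile(r"\b(mother|father|brother|sister|family)\b")


def annotate(sentence):
    """Find dictionary terms and flag negated or family-related mentions."""
    lowered = sentence.lower()
    taken = []  # character spans already claimed by a longer term
    hits = []
    # Longest terms first, so the more specific diagnosis wins.
    for term in sorted(TERMS, key=len, reverse=True):
        for m in re.finditer(re.escape(term), lowered):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            context = lowered[:m.start()]  # text preceding the mention
            hits.append({
                "code": TERMS[term],
                "negated": bool(NEGATION.search(context)),
                "family": bool(FAMILY.search(context)),
            })
    return hits


print(annotate("Mother has paranoid schizophrenia."))
print(annotate("No signs of schizophrenia."))
```

Flagging these contexts matters because a negated mention, or one about a relative, should not be counted as a diagnosis of the patient.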
medication
adverse drug events
diagnoses
pharmacovigilance
patient stratification
Roque et al., PLoS Computational Biology, 2011
disease comorbidity
Roque et al., PLoS Computational Biology, 2011
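Disease comorbidity can be quantified by comparing how often two diagnoses occur in the same patient with how often they would co-occur by chance. The slides do not state which measure was used, so the relative risk below is one common choice shown as an assumption; the function name `relative_risk` and the toy patient data are hypothetical.

```python
def relative_risk(patients, code_a, code_b):
    """Observed co-occurrence divided by the expectation under independence.

    `patients` is a list of sets, each holding one patient's diagnosis codes.
    A value above 1 means the two diagnoses co-occur more often than chance.
    """
    n = len(patients)
    n_a = sum(1 for codes in patients if code_a in codes)
    n_b = sum(1 for codes in patients if code_b in codes)
    n_ab = sum(1 for codes in patients if code_a in codes and code_b in codes)
    if n_a == 0 or n_b == 0:
        raise ValueError("both diagnoses must occur at least once")
    return (n_ab * n) / (n_a * n_b)


# Toy cohort of eight patients with made-up diagnosis sets.
cohort = [
    {"F20", "E11"}, {"F20", "E11"}, {"F20"}, {"E11"},
    {"I10"}, {"I10"}, {"I10"}, {"I10"},
]
print(relative_risk(cohort, "F20", "E11"))
```

With few patients the ratio is unstable, so in practice it would be paired with a significance test before any pair is reported as comorbid.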
DNA sequencing
genotype
phenotype
Acknowledgments
Reflect: Sune Frankild, Heiko Horn, Evangelos Pafilis, Juan-Carlos Silla-Castro, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Christian von Mering, Peer Bork
EPJ-mining: Francisco S Roque, Peter B Jensen, Robert Eriksson, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Søren Brunak
larsjuhljensen