Mining text and data on chemicals Lars Juhl Jensen

Mining text and data on chemicals

Lars Juhl Jensen

three parts

text mining

data integration

medical records

Part 1: text mining

exponential growth

some things are constant

~45 seconds per paper

information retrieval

find the relevant papers

still too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

identify the concepts

small molecules

proteins

diseases

comprehensive lexicon

synonyms

orthographic variation

“black list”

unfortunate names
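
The lexicon-based recognition outlined on the preceding slides (synonyms, orthographic variation, a black list for unfortunate names) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tiny lexicon, the identifiers, the normalization rule, and the function names are hypothetical and are not taken from the talk or from any particular tagger.

```python
import re

# Hypothetical mini-lexicon: identifier -> synonyms (illustrative only).
LEXICON = {
    "CHEM:aspirin": ["aspirin", "acetylsalicylic acid"],
    "PROT:TP53": ["TP53", "p53"],
    "PROT:IL2": ["IL-2", "interleukin 2"],
    "PROT:SDS": ["SDS"],  # clashes with a common reagent abbreviation
}
# Normalized names considered too ambiguous to tag (the "black list").
BLACK_LIST = {"sds"}


def normalize(name):
    """Collapse case, hyphens and spaces so 'IL-2', 'IL 2' and 'il2' all match."""
    return re.sub(r"[\s\-]+", "", name.lower())


def build_index(lexicon, black_list):
    """Map every normalized, non-blacklisted synonym to its identifier."""
    index = {}
    for identifier, names in lexicon.items():
        for name in names:
            key = normalize(name)
            if key not in black_list:
                index[key] = identifier
    return index


def tag(text, index, max_words=3):
    """Greedy longest-match tagging over word n-grams.

    Returns a list of (surface form, identifier) pairs in text order.
    """
    words = re.findall(r"[A-Za-z0-9\-]+", text)
    hits, i = [], 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            identifier = index.get(normalize(surface))
            if identifier:
                hits.append((surface, identifier))
                i += n
                break
        else:
            i += 1  # no name starts at this word
    return hits
```

With this sketch, `tag("Aspirin does not bind interleukin-2 or IL 2 in SDS buffer, but P53 is affected.", build_index(LEXICON, BLACK_LIST))` tags both spellings of interleukin 2 with the same identifier and skips the blacklisted "SDS". A real lexicon would hold millions of names, so the normalization and matching would need to be far more careful than this.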

Reflect

augmented browsing

browser add-on

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010

Firefox

Internet Explorer

Google Chrome

Safari

Utopia Documents

web services

collaboration

SciVerse

information extraction

formalize the facts

co-mentioning
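
Co-mentioning can be sketched as counting how often two recognized entities appear in the same document. The function name and the input layout (one collection of entity identifiers per abstract) are assumptions for illustration; the slides do not specify how counts are turned into scores.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents):
    """Count in how many documents each unordered pair of entities co-occurs.

    `documents` is an iterable of collections of entity identifiers,
    one collection per abstract or paper. Repeated mentions within a
    document are counted once.
    """
    pair_counts = Counter()
    for entities in documents:
        for a, b in combinations(sorted(set(entities)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

Raw counts favor frequently mentioned entities, so a practical system would normally compare each pair count against what the individual mention frequencies would predict by chance.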

NLP (Natural Language Processing)

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
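
To make the bracketed annotation above easier to inspect, the following sketch parses the nested `[label words ...]` notation into a flat list of labelled chunks. The parser and its function name are hypothetical helpers written for this example. The labels (`nxexpr`, `nxgene`, `nxpg`) are copied verbatim from the slide, and their exact definitions are not given there.

```python
def parse_chunks(text):
    """Parse '[label words [label words]]' notation.

    Returns (label, chunk text) pairs, innermost chunks first.
    Text outside any bracket (such as the verb phrase) is ignored.
    Raises IndexError if a closing bracket has no matching opener.
    """
    chunks = []
    stack = []  # each entry: [label, list of text pieces]
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            # The label runs from the bracket to the next space or bracket.
            j = i + 1
            while j < len(text) and not text[j].isspace() and text[j] not in "[]":
                j += 1
            stack.append([text[i + 1:j], []])
            i = j
        elif ch == "]":
            label, pieces = stack.pop()
            content = " ".join("".join(pieces).split())
            chunks.append((label, content))
            if stack:
                # The enclosing chunk also contains this text.
                stack[-1][1].append(content)
            i += 1
        else:
            if stack:
                stack[-1][1].append(ch)
            i += 1
    return chunks


sentence = ("[nxexpr The expression of [nxgene the cytochrome genes "
            "[nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]")
names = [chunk for label, chunk in parse_chunks(sentence) if label == "nxpg"]
```

Here `names` holds the two `nxpg` chunks, "CYC1 and CYC7" and "HAP1", which are the arguments a relation-extraction rule keyed on the verb "is controlled by" would need.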

Part 2: data integration

STITCH

Kuhn et al., Nucleic Acids Research, 2012

~300,000 small molecules

~2.6 million proteins

1100+ genomes

experimental data

physical binding

chemical–protein

protein–protein

curated knowledge

drug targets

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

text mining

co-mentioning

NLP (Natural Language Processing)

many data types

many databases

different formats

different identifiers

variable quality

not comparable

spread over many genomes

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005

probabilistic scores
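
One simple way to turn raw quality scores into probabilistic scores is to bin them and measure, per bin, the fraction of scored pairs that agree with a gold standard. The sketch below uses plain binning as a stand-in; it is not claimed to be the calibration procedure of von Mering et al., and the function name, arguments, and the assumption that every scored pair can be judged against the gold standard are all illustrative.

```python
import bisect


def calibrate(scored_pairs, gold_positives, bin_edges):
    """Estimate P(true | raw score) per score bin.

    scored_pairs:   list of (pair, raw_score) tuples
    gold_positives: set of pairs regarded as true in the gold standard
    bin_edges:      ascending thresholds; bin k covers
                    bin_edges[k] <= score < bin_edges[k + 1]

    Returns one value per bin: the fraction of pairs in that bin that
    are gold-standard positives, or None for an empty bin.
    """
    totals = [0] * (len(bin_edges) - 1)
    hits = [0] * (len(bin_edges) - 1)
    for pair, score in scored_pairs:
        k = bisect.bisect_right(bin_edges, score) - 1
        if 0 <= k < len(totals):  # scores outside all bins are skipped
            totals[k] += 1
            hits[k] += pair in gold_positives
    return [h / t if t else None for h, t in zip(hits, totals)]
```

Because each evidence type is calibrated against the same gold standard, the resulting probabilities become comparable across otherwise incompatible scoring schemes. A smooth fitted curve would usually replace the step function produced by binning.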

orthology transfer

combine the evidence
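
Once each evidence channel carries a probabilistic score, the channels can be combined. A common textbook choice, used here as an assumption rather than as the exact formula behind STITCH, is to treat the channels as independent and compute the probability that at least one of them is right: 1 - (1 - p1)(1 - p2)...(1 - pn).

```python
def combine(probabilities):
    """Combine independent evidence as 1 - prod(1 - p_i)."""
    remaining = 1.0  # probability that every channel is wrong
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
        remaining *= 1.0 - p
    return 1.0 - remaining
```

For example, two channels that each give 0.5 combine to 0.75. The independence assumption overstates confidence when channels share underlying data, so a production scheme may also correct for the prior probability of an association.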

Part 3: patient records

a hard problem

in Danish

by busy doctors

about psychiatric patients

no lexicon

acronyms

typos

delusions

domain specific system

patient record excerpt

F20

F200

Negation

Family
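
The labels on this slide suggest that a diagnosis mention must be flagged when it is negated or refers to a family member rather than to the patient. The sketch below shows one rule-based way to do that. It is a hypothetical illustration: the cue words are English stand-ins (the actual records are in Danish), the term list is a two-entry toy, and only text preceding the mention within the same sentence is inspected. The codes follow ICD-10, where F20 is schizophrenia and F20.0 (written F200 on the slide) is paranoid schizophrenia.

```python
import re

# Hypothetical English cue words; a real system would use Danish cues.
NEGATION_CUES = ("no", "not", "denies", "without")
FAMILY_CUES = ("mother", "father", "brother", "sister")
# Toy term list mapping disease names to ICD-10 codes.
TERM_TO_CODE = {"paranoid schizophrenia": "F200", "schizophrenia": "F20"}


def has_cue(context, cues):
    """True if any cue occurs as a whole word in the context."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", context) for cue in cues)


def annotate(sentence):
    """Return one dict per diagnosis mention with negation and family flags."""
    lowered = sentence.lower()
    findings = []
    taken = []  # character spans already claimed by a longer term
    for term in sorted(TERM_TO_CODE, key=len, reverse=True):
        for match in re.finditer(re.escape(term), lowered):
            if any(match.start() < end and start < match.end() for start, end in taken):
                continue  # part of a longer, more specific term
            taken.append(match.span())
            context = lowered[:match.start()]
            findings.append({
                "code": TERM_TO_CODE[term],
                "negated": has_cue(context, NEGATION_CUES),
                "family": has_cue(context, FAMILY_CUES),
            })
    return findings
```

Matching longer terms first ensures that "paranoid schizophrenia" yields the specific code F200 and not an additional, less specific F20 hit.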

medication

adverse drug events

diagnoses

pharmacovigilance

patient stratification

Roque et al., PLoS Computational Biology, 2011

disease comorbidity
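
Disease comorbidity can be quantified by comparing how often two diagnoses occur in the same patient with how often they would co-occur if they were independent. The sketch below computes that observed-to-expected ratio. It is a generic illustration and is not claimed to reproduce the statistics of Roque et al.; the function name and input layout are assumptions.

```python
from collections import Counter
from itertools import combinations


def comorbidity_ratios(patient_diagnoses):
    """Observed/expected co-occurrence for every diagnosis pair seen together.

    patient_diagnoses: dict mapping patient id -> collection of diagnosis codes.
    Expected co-occurrence assumes independence: n_a * n_b / n_patients.
    A ratio above 1 means the pair co-occurs more often than chance predicts.
    """
    n_patients = len(patient_diagnoses)
    single = Counter()
    pairs = Counter()
    for codes in patient_diagnoses.values():
        codes = set(codes)
        single.update(codes)
        pairs.update(combinations(sorted(codes), 2))
    return {
        (a, b): count * n_patients / (single[a] * single[b])
        for (a, b), count in pairs.items()
    }
```

With few patients the ratio is noisy, so a significance test on the underlying counts would normally accompany it before any pair is reported.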

Roque et al., PLoS Computational Biology, 2011

DNA sequencing

genotype

phenotype

Acknowledgments

Reflect: Sune Frankild, Heiko Horn, Evangelos Pafilis, Juan-Carlos Silla-Castro, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue

STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Pablo Minguez, Tobias Doerks, Manuel Stark, Christian von Mering, Peer Bork

EPJ-mining: Francisco S Roque, Peter B Jensen, Robert Eriksson, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Søren Brunak

larsjuhljensen