Network biology: Large-scale data and text mining

Lars Juhl Jensen

protein networks

medical networks

guilt by association

protein networks

STRING

functional associations

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood

Korbel et al., Nature Biotechnology, 2004
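
Gene neighborhood evidence can be illustrated by counting the genomes in which two genes sit next to each other. A minimal Python sketch under simplifying assumptions: each genome is one linear list of orthologous-group labels, strand and circular chromosomes are ignored, and the `conserved_neighborhood` function, its `max_gap` parameter and the example genomes are all hypothetical.

```python
# Minimal sketch: in how many genomes are two genes conserved neighbors?
# Assumptions: one linear chromosome per genome, one copy of each gene,
# genes labelled by orthologous group so labels are comparable across genomes.
from typing import Dict, List


def conserved_neighborhood(genomes: Dict[str, List[str]], g1: str, g2: str,
                           max_gap: int = 1) -> int:
    """Count genomes where g1 and g2 lie at most `max_gap` positions apart."""
    count = 0
    for order in genomes.values():
        if g1 in order and g2 in order:
            if abs(order.index(g1) - order.index(g2)) <= max_gap:
                count += 1
    return count


# Hypothetical gene orders in three genomes.
genomes = {
    "genome1": ["ogX", "ogA", "ogB", "ogY"],
    "genome2": ["ogB", "ogA", "ogZ"],
    "genome3": ["ogA", "ogX", "ogY", "ogB"],
}

print(conserved_neighborhood(genomes, "ogA", "ogB"))  # prints 2 (genome1, genome2)
```

The more genomes in which a pair stays together, the stronger the evidence; a real method would also need to account for how closely related those genomes are.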

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
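
A phylogenetic profile records in which genomes a protein has an ortholog; proteins with matching profiles are candidates for functional association. The sketch below assumes binary presence/absence profiles and scores a pair by the fraction of genomes where the two profiles agree. This agreement measure, the `profile_similarity` function and the example profiles are illustrative assumptions, not necessarily the scoring used in STRING.

```python
# Minimal sketch: binary phylogenetic profiles and a simple agreement score.
# Assumption: 1 = an ortholog is present in that genome, 0 = absent.
from typing import Sequence


def profile_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of genomes where both profiles agree (both present or both absent)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical profiles over six genomes.
profiles = {
    "protA": [1, 1, 0, 1, 0, 1],
    "protB": [1, 1, 0, 1, 0, 1],  # same pattern as protA
    "protC": [0, 1, 1, 0, 1, 0],  # largely complementary pattern
}

for other in ("protB", "protC"):
    print(other, profile_similarity(profiles["protA"], profiles[other]))
```

Counting shared absences as agreement is the simplest choice; a refinement could weight shared presences more heavily, since many proteins are absent from many genomes.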

experimental data

gene coexpression
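
Coexpression evidence comes from genes whose expression levels rise and fall together across many conditions. As a minimal sketch, assuming already normalized expression values, the Pearson correlation of two profiles can serve as a raw coexpression score. The gene profiles below are made up, and such a raw score would still need calibration before it can be compared with other evidence types.

```python
# Minimal sketch: Pearson correlation between two expression profiles.
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)


# Hypothetical expression values for three genes across five conditions.
gene1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene2 = [2.1, 3.9, 6.2, 8.0, 9.8]
gene3 = [5.0, 4.0, 3.0, 2.0, 1.0]

print(pearson(gene1, gene2))  # close to 1: coexpressed
print(pearson(gene1, gene3))  # close to -1: anti-correlated
```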

protein interactions

Jensen & Bork, Science, 2008

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard

von Mering et al., Nucleic Acids Research, 2005
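
Calibrating against a gold standard puts scores from different evidence types on one probabilistic scale. The minimal sketch below assumes a benchmark of pairs labelled as known associations or not (for example, pairs sharing a curated pathway) and equal-width score bins: the fraction of true pairs within a bin becomes the calibrated score. It then combines channels as 1 - (1 - p1)(1 - p2)..., which assumes independent evidence. The bin count, data, function names and combination rule are illustrative assumptions rather than the exact procedure of von Mering et al.

```python
# Minimal sketch: calibrate raw scores against a gold standard by binning,
# then combine calibrated scores from several evidence channels.
from math import prod
from typing import Callable, Optional, Sequence


def make_calibrator(raw: Sequence[float], is_true: Sequence[bool],
                    n_bins: int = 4) -> Callable[[float], Optional[float]]:
    """Return a function mapping a raw score to the precision of its bin."""
    lo, hi = min(raw), max(raw)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal scores

    def bin_of(score: float) -> int:
        # Clamp so scores outside the benchmark range land in the end bins.
        return max(0, min(int((score - lo) / width), n_bins - 1))

    hits = [0] * n_bins
    totals = [0] * n_bins
    for score, truth in zip(raw, is_true):
        i = bin_of(score)
        totals[i] += 1
        hits[i] += int(truth)

    def calibrated(score: float) -> Optional[float]:
        i = bin_of(score)
        return hits[i] / totals[i] if totals[i] else None  # None: empty bin

    return calibrated


def combine(probabilities: Sequence[float]) -> float:
    """Combine channel probabilities, assuming independent evidence."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical benchmark: raw scores for ten protein pairs and whether each
# pair is a known association in the gold standard.
raw_scores = [0.0, 0.1, 0.15, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
in_gold_standard = [False, False, False, False, True, True, True, True, True, True]

calibrated = make_calibrator(raw_scores, in_gold_standard)
print(calibrated(0.35))     # 0.5: one of the two benchmark pairs in that bin is true
print(combine([0.5, 0.5]))  # 0.75
```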

homology-based transfer

Franceschini et al., Nucleic Acids Research, 2013
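
Homology-based transfer projects an association observed between two proteins in one organism onto their orthologs in another. A minimal sketch, assuming a one-to-one ortholog map and a hypothetical `penalty` factor that down-weights transferred evidence; the identifiers are placeholders, and this is not the published procedure of Franceschini et al.

```python
# Minimal sketch: transfer scored associations from a source species to a
# target species through a one-to-one ortholog map. Real orthology is often
# one-to-many, which this sketch ignores.
from typing import Dict, FrozenSet, Tuple

DEFAULT_PENALTY = 0.8  # hypothetical down-weighting of transferred evidence


def transfer(source_links: Dict[Tuple[str, str], float],
             orthologs: Dict[str, str],
             penalty: float = DEFAULT_PENALTY) -> Dict[FrozenSet[str], float]:
    transferred: Dict[FrozenSet[str], float] = {}
    for (a, b), score in source_links.items():
        if a in orthologs and b in orthologs:
            pair = frozenset((orthologs[a], orthologs[b]))
            # If several source pairs map to the same target pair, keep the best.
            transferred[pair] = max(transferred.get(pair, 0.0), score * penalty)
    return transferred


# Hypothetical links in a source species and orthologs in a target species.
source_links = {("s1", "s2"): 0.9, ("s1", "s3"): 0.7}
orthologs = {"s1": "t1", "s2": "t2"}  # s3 has no ortholog in the target

print(transfer(source_links, orthologs))
```

Only pairs whose members both have orthologs are transferred, which is why the s1-s3 link is dropped here.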

visualization

string-db.org

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

CDC2

cyclin dependent kinase 1

expansion rules

hCdc2

CDC2

flexible matching

cyclin-dependent kinase 1

cyclin dependent kinase 1

“black list”

SDS
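
The dictionary-based recognition steps above (a comprehensive lexicon of synonyms such as CDC2 and cyclin dependent kinase 1, an expansion rule that also accepts hCdc2, flexible matching of hyphens and case, and a black list for ambiguous names such as SDS) can be sketched in Python. The single "h" prefix rule, the tokenizer, the `max_words` limit and the tiny lexicon are illustrative assumptions; a real tagger needs far larger dictionaries and many more rules.

```python
# Minimal sketch of dictionary-based named entity recognition with
# expansion rules, flexible matching and a black list.
import re
from typing import Dict, List, Set, Tuple


def normalize(name: str) -> str:
    """Flexible matching: ignore case and treat hyphens and spaces alike."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def build_lexicon(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized synonym, plus expanded variants, to an identifier."""
    lexicon: Dict[str, str] = {}
    for entity_id, names in synonyms.items():
        for name in names:
            lexicon[normalize(name)] = entity_id
            # Hypothetical expansion rule: single-word names may carry an
            # 'h' (human) prefix, so "CDC2" also matches "hCdc2".
            if " " not in name:
                lexicon[normalize("h" + name)] = entity_id
    return lexicon


def tag(text: str, lexicon: Dict[str, str], blacklist: Set[str],
        max_words: int = 4) -> List[Tuple[str, str]]:
    """Greedy longest-match tagging; returns (matched text, identifier)."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            key = normalize(candidate)
            if key in lexicon and key not in blacklist:
                hits.append((candidate, lexicon[key]))
                i += n
                break
        else:
            i += 1  # no dictionary match starts at this token
    return hits


# Hypothetical lexicon: identifiers mapped to synonyms.
SYNONYMS = {"CDK1": ["CDC2", "cyclin dependent kinase 1"], "SDS": ["SDS"]}
LEXICON = build_lexicon(SYNONYMS)
# "SDS" often refers to sodium dodecyl sulfate, so it is black-listed here.
BLACKLIST = {normalize("SDS")}

print(tag("hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel.",
          LEXICON, BLACKLIST))
```

Normalizing both the lexicon and the candidate text with the same function is what lets "cyclin-dependent kinase 1" match the entry "cyclin dependent kinase 1".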

augmented browsing

Reflect

browser add-on

real-time text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010

information extraction

co-mentioning

within documents

within paragraphs

within sentences
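
Co-mentioning within documents, paragraphs and sentences can be turned into a score by counting co-occurrences at each level and weighting tighter contexts more. A minimal sketch, assuming text already split into paragraphs and sentences and entities already tagged; the weights `W_DOC`, `W_PAR` and `W_SEN` and the example documents are hypothetical placeholders, not published parameters.

```python
# Minimal sketch: weighted co-mention counts at document, paragraph and
# sentence level. A document is a list of paragraphs, a paragraph is a list
# of sentences, and a sentence is the set of entity identifiers tagged in it.
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Document = List[List[Set[str]]]

# Hypothetical weights: co-mentions in tighter contexts count more.
W_DOC, W_PAR, W_SEN = 1.0, 2.0, 3.0


def pairs(entities: Iterable[str]) -> Set[FrozenSet[str]]:
    """All unordered pairs of distinct entities."""
    return {frozenset(p) for p in combinations(sorted(set(entities)), 2)}


def comention_scores(docs: List[Document]) -> Dict[FrozenSet[str], float]:
    scores: Dict[FrozenSet[str], float] = defaultdict(float)
    for doc in docs:
        doc_entities: Set[str] = set()
        par_pairs: Set[FrozenSet[str]] = set()
        sen_pairs: Set[FrozenSet[str]] = set()
        for paragraph in doc:
            par_entities: Set[str] = set()
            for sentence in paragraph:
                par_entities |= sentence
                sen_pairs |= pairs(sentence)
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        # Each pair is counted at most once per level in a given document.
        for pair in pairs(doc_entities):
            scores[pair] += W_DOC
        for pair in par_pairs:
            scores[pair] += W_PAR
        for pair in sen_pairs:
            scores[pair] += W_SEN
    return dict(scores)


# Hypothetical tagged documents.
docs: List[Document] = [
    [[{"CDK1", "CCNB1"}, {"TP53"}], [{"CDK1"}]],  # two paragraphs
    [[{"CDK1"}], [{"TP53"}]],
]

for pair, score in comention_scores(docs).items():
    print(sorted(pair), score)
```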

natural language processing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

text corpus

~22 million abstracts

millions of full-text articles

medical networks

Jensen et al., Nature Reviews Genetics, 2012

opt-out

opt-in

structured data

Jensen et al., Nature Reviews Genetics, 2012

unstructured data

Danish

busy doctors

psychiatric patients

custom dictionaries

drugs

adverse drug events

complex filters

Eriksson et al., submitted, 2013
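
Once drug and adverse event mentions have been extracted and filtered, each candidate drug-event pair needs a significance estimate. One generic option, assumed here and not necessarily the test used by Eriksson et al., is a one-sided hypergeometric test: among N patients, K took the drug and n had the event; how surprising is it that k patients had both? The `hypergeom_pvalue` function and the counts below are hypothetical.

```python
# Minimal sketch: one-sided hypergeometric test for drug-event co-occurrence.
from math import comb


def hypergeom_pvalue(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k), X ~ Hypergeometric(N patients, K on the drug, n with the event).

    k is the observed number of patients who took the drug and had the event.
    """
    # math.comb returns 0 when choosing more items than exist, so impossible
    # terms vanish automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20 patients, 5 on the drug, 4 with the event, 3 with both.
print(hypergeom_pvalue(k=3, N=20, K=5, n=4))  # about 0.032
```

Exact integer arithmetic keeps the sketch simple; for registry-sized counts a library implementation of the hypergeometric tail would be faster.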

new adverse drug reactions

Eriksson et al., submitted, 2013

Drug substance        ADE                   p-value
Chlordiazepoxide      Nystagmus             4.0e-8
Simvastatin           Personality changes   8.4e-8
Dipyridamole          Visual impairment     4.4e-4
Citalopram            Psychosis             8.8e-4
Bendroflumethiazide   Apoplexy              8.5e-3

temporal correlation

diagnosis trajectories

Jensen et al., in preparation, 2013
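
Diagnosis trajectories build on pairs of diagnoses that not only co-occur but tend to occur in a consistent order. A minimal sketch, assuming each patient's first date for each diagnosis code is known: count patients with A before B and with B before A, then apply an exact one-sided binomial test against a 50/50 ordering. The test choice, the placeholder codes "A" and "B", and the patients are illustrative assumptions.

```python
# Minimal sketch: do patients tend to get diagnosis A before diagnosis B?
# Exact one-sided binomial test against a 50/50 ordering.
from datetime import date
from math import comb
from typing import Dict, List, Tuple

Patient = Dict[str, date]  # diagnosis code -> date of first diagnosis


def count_orders(patients: List[Patient], a: str, b: str) -> Tuple[int, int]:
    """Count patients with A strictly before B, and B strictly before A."""
    a_first = b_first = 0
    for p in patients:
        if a in p and b in p:
            if p[a] < p[b]:
                a_first += 1
            elif p[b] < p[a]:
                b_first += 1
            # Same-day diagnoses carry no ordering information and are skipped.
    return a_first, b_first


def order_pvalue(a_first: int, b_first: int) -> float:
    """P(X >= a_first) for X ~ Binomial(a_first + b_first, 0.5)."""
    n = a_first + b_first
    return sum(comb(n, i) for i in range(a_first, n + 1)) / 2 ** n


# Hypothetical patients with placeholder diagnosis codes "A" and "B".
patients: List[Patient] = (
    [{"A": date(2000, 1, 1), "B": date(2001, 1, 1)}] * 10
    + [{"A": date(2002, 1, 1), "B": date(2000, 6, 1)}]
    + [{"A": date(2003, 1, 1)}]  # never diagnosed with B
)

a_first, b_first = count_orders(patients, "A", "B")
print(a_first, b_first, order_pvalue(a_first, b_first))
```

Significant pairs found this way can then be chained into longer trajectories.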

national discharge registry

6.2 million patients

14 years

confounding factors

age and gender
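
Age and gender confound co-occurrence: two unrelated diagnoses that are both common in elderly men will still co-occur. One textbook correction, indirect standardization (chosen here for illustration; the published analysis may adjust differently), compares the observed number of B diagnoses among patients with A to the number expected from age- and gender-specific rates of B. The strata, records and function names below are hypothetical.

```python
# Minimal sketch: indirect standardization of the A -> B association for
# age and gender. Each record is (stratum, has_A, has_B), where the stratum
# is a hypothetical (gender, age band) label.
from collections import Counter
from typing import List, Tuple

Record = Tuple[Tuple[str, str], bool, bool]


def standardized_ratio(records: List[Record]) -> float:
    """Observed B cases among A patients / cases expected from stratum rates."""
    totals: Counter = Counter()
    b_cases: Counter = Counter()
    for stratum, _has_a, has_b in records:
        totals[stratum] += 1
        b_cases[stratum] += int(has_b)
    observed = sum(int(has_b) for _s, has_a, has_b in records if has_a)
    # Each A patient is expected to have B at the rate of their own stratum.
    expected = sum(b_cases[s] / totals[s] for s, has_a, _b in records if has_a)
    if expected == 0:
        raise ValueError("no B cases expected; ratio undefined")
    return observed / expected


def crude_ratio(records: List[Record]) -> float:
    """Same ratio, but using the overall B rate and ignoring strata."""
    overall_rate = sum(int(b) for _s, _a, b in records) / len(records)
    a_patients = [r for r in records if r[1]]
    observed = sum(int(b) for _s, _a, b in a_patients)
    return observed / (len(a_patients) * overall_rate)


OLD_MEN, YOUNG_WOMEN = ("M", "70+"), ("F", "20-29")
records: List[Record] = (
    [(OLD_MEN, True, True)] * 4
    + [(OLD_MEN, True, False)] * 4
    + [(OLD_MEN, False, True)] * 1
    + [(OLD_MEN, False, False)] * 1
    + [(YOUNG_WOMEN, True, False)] * 2
    + [(YOUNG_WOMEN, False, False)] * 8
)

print(crude_ratio(records))         # about 1.6: A looks associated with B
print(standardized_ratio(records))  # 1.0 in this toy data once strata are used
```

In this toy data the apparent association disappears because A patients are mostly elderly men, the stratum in which B is common anyway.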

Jensen et al., submitted, 2013

[Figure panels: Female, Male; In-patient, Out-patient, Emergency room]

lifestyle

reporting biases

complex trajectories

Jensen et al., submitted, 2013

medical implications

Acknowledgments

STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork

Text mining: Sune Frankild, Jasmin Saric, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue

EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Robert Eriksson, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak