Literature mining and large-scale data integration Lars Juhl Jensen EMBL Heidelberg

Computational and Systems Biology Course, Centre for Computational and Systems Biology (CoSBi), Trento, Italy, March 10-14, 2008

Literature mining and large-scale data integration

Lars Juhl Jensen, EMBL Heidelberg

literature mining

why?

too much to read

information retrieval

finding the papers

ad hoc retrieval

user-specified query

“yeast AND cell cycle”
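A minimal sketch of ad hoc retrieval with a Boolean AND query like the one above, assuming a toy in-memory inverted index. The abstracts and the function names `build_index` and `search_and` are hypothetical, and the phrase "cell cycle" is simplified to two separate terms:

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_and(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical toy abstracts, invented for illustration only.
docs = {
    1: "yeast cell cycle regulation by cdc28",
    2: "human cell cycle checkpoints",
    3: "yeast metabolism under stress",
}
index = build_index(docs)
hits = search_and(index, ["yeast", "cell", "cycle"])
print(hits)
```

Only the first toy abstract contains all three terms, so it is the only hit.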

stemming

yeast / yeasts
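A rough sketch of what stemming does to the pair above. The function `naive_stem` is a hypothetical, deliberately crude plural stripper, not the stemmer used by any particular search engine; real systems typically use a full algorithm such as Porter's:

```python
def naive_stem(token):
    """Strip a few common English plural endings (a crude stand-in for a real stemmer)."""
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


# Both surface forms reduce to the same index term.
print(naive_stem("yeast"), naive_stem("yeasts"))
```

Indexing and querying on the stemmed form lets a search for "yeast" also match "yeasts".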

dynamic query expansion

yeast / S. cerevisiae
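Query expansion can be sketched as rewriting each query term into an OR-group of its synonyms. The synonym table and the function `expand_query` below are hypothetical; a real system would draw on a curated dictionary rather than a hard-coded one:

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "yeast": ["yeast", "S. cerevisiae", "Saccharomyces cerevisiae"],
}


def expand_query(terms):
    """Rewrite each term as an OR-group of its synonyms, joined by AND."""
    groups = []
    for term in terms:
        variants = SYNONYMS.get(term.lower(), [term])
        quoted = ['"%s"' % v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


query = expand_query(["yeast", "cell cycle"])
print(query)
```

Terms without an entry in the table pass through unchanged, so the expansion never narrows the original query.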

ranking

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find it

entity recognition

identifying the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Cdc28 yeast

Cdc28 cell cycle

good synonyms list

manual curation

orthographic variation

CDC28

Cdc28p
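One way to handle orthographic variation is to generate spelling variants of each name in the synonyms list before matching. The sketch below covers the two variants shown above (all-uppercase and the trailing "p" used for yeast proteins). The hyphen and space variants, and the function name `orthographic_variants`, are assumptions added for illustration:

```python
import re


def orthographic_variants(name):
    """Generate simple spelling variants of a yeast-style gene/protein name."""
    variants = {name, name.upper(), name.capitalize()}
    # Yeast proteins are often written with a trailing 'p' (e.g. Cdc28p).
    variants.add(name.capitalize() + "p")
    # Assumed extra variants: a hyphen or space between letters and number.
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match:
        letters, digits = match.groups()
        for separator in ("-", " "):
            variants.add(letters + separator + digits)
    return variants


variants = orthographic_variants("Cdc28")
print(sorted(variants))
```

Generating variants up front keeps the matching step a plain dictionary lookup, at the cost of a larger dictionary.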

disambiguation

hairy

SDS

APC

Cdc2

still too much to read

information extraction

formalizing the facts

co-mentioning

statistical methods
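A sketch of co-mention scoring, assuming entity recognition has already produced a list of entities per abstract. Pointwise mutual information is used here as one possible statistic; the slides do not say which statistic is meant, and the toy abstracts and the function name `comention_scores` are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations


def comention_scores(docs):
    """Score entity pairs by pointwise mutual information of co-occurrence."""
    n = len(docs)
    single = Counter()
    pair = Counter()
    for entities in docs:
        unique = sorted(set(entities))
        single.update(unique)
        pair.update(combinations(unique, 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) )
        scores[(a, b)] = math.log((n_ab / n) / ((single[a] / n) * (single[b] / n)))
    return scores


# Hypothetical entity lists, one per abstract, invented for illustration.
docs = [
    ["Cdc28", "Swe1"],
    ["Cdc28", "Swe1", "Cdc5"],
    ["Cdc5", "Swe1"],
    ["Cdc28", "Clb2"],
    ["Hap1", "Cyc1"],
]
scores = comention_scores(docs)
for names, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(names, round(score, 2))
```

Pairs that co-occur more often than their individual frequencies predict get a positive score; pairs that co-occur less often than expected get a negative one.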

NLP (Natural Language Processing)

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no new discoveries

text mining

undiscovered links

Raynaud’s syndrome

fish oil
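The search for undiscovered links can be sketched as looking for two terms that are never mentioned together but share intermediate terms. The co-mention graph below is a hand-made toy, the intermediate terms are illustrative assumptions, and `undiscovered_links` is a hypothetical function name:

```python
def undiscovered_links(cooccurs, start):
    """Return terms linked to `start` only indirectly, with the bridging terms."""
    direct = cooccurs.get(start, set())
    candidates = {}
    for bridge in direct:
        for target in cooccurs.get(bridge, set()):
            if target != start and target not in direct:
                candidates.setdefault(target, set()).add(bridge)
    return candidates


# Toy co-mention graph, invented for illustration only.
cooccurs = {
    "Raynaud's syndrome": {"blood viscosity", "platelet aggregation"},
    "blood viscosity": {"Raynaud's syndrome", "fish oil"},
    "platelet aggregation": {"Raynaud's syndrome", "fish oil"},
    "fish oil": {"blood viscosity", "platelet aggregation"},
}
links = undiscovered_links(cooccurs, "Raynaud's syndrome")
print(links)
```

In this toy graph, fish oil is never co-mentioned with Raynaud's syndrome but is reachable through both intermediate terms, so it is returned as a candidate link.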

temporal trends

buzzwords

data integration

association networks

information extraction

curated knowledge

protein interaction data

genetic interaction data

gene expression data

computational predictions

conserved neighborhood

gene fusion

phylogenetic profiles

variable reliability

raw quality scores

not comparable

benchmarking

calibrate vs. gold standard

probabilistic scores
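Calibration against a gold standard can be sketched as binning the raw scores of one evidence type and measuring, per bin, the fraction of scored pairs that the gold standard confirms. This is a simplified illustration with invented data, not the exact procedure behind any specific resource; `calibrate` and the bin edges are assumptions:

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Estimate P(true | raw-score bin) as the gold-standard fraction per bin."""
    bins = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [pair for pair, score in scored_pairs if low <= score < high]
        if in_bin:
            fraction = sum(pair in gold_positive for pair in in_bin) / len(in_bin)
        else:
            fraction = None  # no data in this bin, so no estimate
        bins.append(((low, high), fraction))
    return bins


# Hypothetical raw scores and gold standard, invented for illustration.
scored_pairs = [
    (("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "D"), 0.7),
    (("B", "C"), 0.3), (("C", "D"), 0.2), (("A", "D"), 0.1),
]
gold_positive = {("A", "B"), ("A", "C"), ("C", "D")}
calibration = calibrate(scored_pairs, gold_positive, [0.0, 0.5, 1.0])
print(calibration)
```

Once every evidence type is mapped onto the same probability scale in this way, scores from different sources become comparable.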

spread over many species

373 genomes

transfer by orthology

combine all evidence

P = 1 − (1 − P₁)(1 − P₂)(1 − P₃)…
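The formula above can be computed directly: the combined probability is one minus the probability that every evidence type is wrong. The sketch assumes the evidence types are independent, which is what the product form implies; `combine_evidence` is a hypothetical function name:

```python
def combine_evidence(probabilities):
    """Combine per-evidence probabilities as 1 - product of (1 - P_i)."""
    p_all_wrong = 1.0
    for p in probabilities:
        p_all_wrong *= 1.0 - p
    return 1.0 - p_all_wrong


# Two evidence types, each 50% reliable on its own.
print(combine_evidence([0.5, 0.5]))
```

Adding another evidence type can only raise the combined score, never lower it, and a single evidence type is returned unchanged.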

web resources

signaling networks

phosphoproteomics

in vivo phosphosites

kinases are unknown

computational methods

overprediction

context

scaffolders

association networks

NetworKIN

benchmarking

2.5-fold better accuracy

web resources

summary

literature mining is good

data integration is better

Acknowledgments

Reflect & NLP
– Evangelos Pafilis
– Jasmin Saric
– Rossitza Ouzounova
– Sean O’Donoghue
– Isabel Rojas

STRING & STITCH
– Christian von Mering
– Michael Kuhn
– Manuel Stark
– Samuel Chaffron
– Philippe Julien
– Tobias Doerks
– Jan Korbel
– Berend Snel
– Martijn Huynen
– Peer Bork

NetworKIN & NetPhorest
– Rune Linding
– Martin Lee Miller
– Gerard Ostheimer
– Francesca Diella
– Karen Colwill
– Jing Jin
– Pavel Metalnikov
– Vivian Nguyen
– Adrian Pasculescu
– Jin Gyoon Park
– Leona D. Samson
– Nikolaj Blom
– Rob Russell
– Peer Bork
– Søren Brunak
– Michael Yaffe
– Tony Pawson

http://larsjuhljensen.wordpress.com