Bioinformatics Predrag Radivojac INDIANA UNIVERSITY



Basics of Molecular Biology

Can we understand how cells function?

Eukaryotic cell


Bioinformatics is multidisciplinary!

• What is Bioinformatics?

– Integrates: computer science, statistics, chemistry, physics, and molecular biology

– Goal: organize and store huge amounts of biological data and extract knowledge from it

• Major areas of research
– Genomics
– Proteomics
– Databases

• Practical discipline

Some major applications

• Drug design
• Evolutionary studies
• Genome characterization


Interesting Problems

• Sequence alignment
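Sequence alignment can be formulated in several ways; one standard formulation is global alignment by dynamic programming (Needleman-Wunsch). The sketch below assumes a simple, hypothetical scoring scheme (match +1, mismatch -1, linear gap penalty -1) instead of a substitution matrix such as BLOSUM62; the function name is illustrative only.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); the scoring parameters are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The table fill is O(nm) in time and memory; local alignment (Smith-Waterman) differs mainly in clamping cell scores at zero and tracing back from the best cell.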



Interesting Problems

• Sequence assembly

Goal: solve the puzzle, i.e., connect the pieces into one genomic sequence
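A minimal sketch of the puzzle-solving idea, assuming error-free reads and a greedy strategy: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a textbook simplification (a greedy shortest-common-superstring heuristic), not how production assemblers work; they build overlap or de Bruijn graphs and must cope with sequencing errors and repeats. The reads, the `min_len` threshold, and the function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the longest overlap until no overlap remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads that are fully contained in another read
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining reads do not overlap: they stay separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With these four reads the greedy merges recover a single 19-base contig; on repetitive genomes the locally best merge can produce misassemblies, which is one reason graph-based assemblers are used in practice.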


Interesting Problems

• Proteomics

Figure: tandem mass spectrum (scan 1708, RT 54.47, Full ms2 of precursor m/z 638.00, scan range 165.00 to 1925.00); x-axis: m/z, 200 to 2000; y-axis: relative abundance, 0 to 100; labeled peaks at m/z 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6

Mass spectrometry


Interesting Problems

• Microarray data


Interesting Problems

• Functional Genomics
• Gene Regulation


Diseases are interconnected…

Goh et al. PNAS 104: 8685 (2007).


Disease

www.cancer.gov

• Development of tools that can be used to understand and treat human disease

• Prediction of disease-associated genes

• Important from several standpoints
– biological
– medical
– computational

• Background
– human genome
– low-throughput data
– high-throughput data
– ontologies for protein function at multiple levels

The Time is Right!


Alzheimer’s disease

Top PhenoPred hits:

1) CDK5

2) NTN1

AUC = 77.5%


Loss/Gain of function and disease

Pauling et al. Science 110: 543 (1949).
Chui & Dover. Curr Opin Pediatr 13: 22 (2001).

Sickle cell disease:
• autosomal recessive disorder
• the E6V mutation in HBB causes an interaction with F85 and L88
• formation of amyloid fibrils
• abnormally shaped red blood cells, leading to sickle cell anemia
• manifestation of the disease varies widely across patients

Figure labels: 2hbs, E6V, 4hhb (PDB structures); source: http://gingi.uchicago.edu/hbs2.html


Lipitor (ATORVASTATIN)



Proteins = chains of amino acids

• biomolecule, macromolecule
– proteins make up more than 50% of the dry weight of cells
• polymer of amino acids connected into linear chains
• strings of symbols
• machinery of life
– play a central role in the structure and function of cells
– regulate and execute many biological functions

Figure: (a) amino acid; (b) amino acid chain

Introduction to Protein Structure by Branden and Tooze


• peptide bonds are planar and strong
• rotation about the two single bonds flanking each Cα atom (the φ and ψ torsion angles) lets the chain adopt a three-dimensional structure

Protein structure

Introduction to Protein Structure by Branden and Tooze


Protein function

• Multi-level phenomenon
– biochemical function
– biological function
– phenotypical function

• Example: kinase
– biochemical function: transferase
– biological function: cell cycle regulation
– phenotypical function: disease

• Function is everything that happens to or through a protein (Rost et al. 2003)


Myoglobin: X-ray structure at 1.4 Å resolution (PDB: 2jho), 153 residues

Edge criterion: Cα-Cα distance < 6 Å

Protein contact graph



Residue neighborhood

Example: S113 of isocitrate dehydrogenase

Notation:
$G = (V, E)$
$f: V \to \mathcal{A}$, $\mathcal{A} = \{A, C, D, \ldots, W, Y\}$
$g: V \to \{-1, +1\}$
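One way to realize a residue neighborhood with this notation is to keep every vertex of G within a fixed number of hops of the residue of interest, together with its amino-acid label f(v) and the edges among those vertices. The hop radius and the function name are assumptions for illustration; the slide does not specify how the neighborhood is bounded.

```python
from collections import deque


def residue_neighborhood(adj, labels, pivot, radius=2):
    """Labeled subgraph induced by all vertices within `radius` hops of `pivot`.

    adj: dict vertex -> set of neighbours (the contact graph G)
    labels: dict vertex -> amino-acid letter (the labeling f)
    Returns ({vertex: label}, {(u, v) edges with u < v}).
    """
    dist = {pivot: 0}
    queue = deque([pivot])
    while queue:  # breadth-first search that stops expanding at `radius`
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    return {v: labels[v] for v in nodes}, edges


adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # a 4-residue chain
labels = {0: "S", 1: "A", 2: "G", 3: "K"}
print(residue_neighborhood(adj, labels, pivot=0))
```

Keeping the induced edges (not just the vertex set) matters here, because the graphlets counted later are induced subgraphs of this neighborhood.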


Graphlets are small non-isomorphic connected graphs.

Different positions of the pivot vertex with respect to the graphlet correspond to the graph-theoretical concept of automorphism orbits, or simply orbits.


Przulj et al. Bioinformatics 20: 3508 (2004).
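For graphlets with three vertices the pivot's orbit can be read off directly: the pivot sits at an end of a 3-path, in the middle of a 3-path, or in a triangle. The sketch below counts these cases for one pivot by brute force over pairs of nearby vertices. It uses descriptive names rather than the numeric orbit indices of Przulj et al., and the adjacency-set input format and function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def three_node_orbits(adj, pivot):
    """Count the pivot's orbit in every connected 3-vertex induced subgraph containing it."""
    # Any connected 3-vertex set containing the pivot lies within 2 hops of it.
    near = set(adj[pivot])
    for u in adj[pivot]:
        near |= adj[u]
    near.discard(pivot)

    counts = Counter()
    for u, v in combinations(sorted(near), 2):
        pu, pv, uv = u in adj[pivot], v in adj[pivot], v in adj[u]
        edges = pu + pv + uv
        if edges < 2:
            continue  # three vertices are connected iff at least two of the three edges exist
        if edges == 3:
            counts["triangle"] += 1
        elif pu and pv:
            counts["path_middle"] += 1  # pivot adjacent to both others
        else:
            counts["path_end"] += 1
    return counts


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(three_node_orbits(adj, 0))
```

Counting the same quantities for four-vertex graphlets requires distinguishing 11 orbits, which is where a systematic enumeration scheme pays off.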



2-graphlets: 01
3-graphlets: 011, 012
4-graphlets: 0111, 0112, 0122, 0123

Key insight: efficient combinatorial enumeration of graphlets/orbits over 7 disjoint cases, using breadth-first search
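We read each digit string as the breadth-first-search distances of a graphlet's vertices from the pivot (0 is the pivot itself). Under that reading, every connected vertex set of two to four vertices that contains the pivot matches exactly one of the 7 patterns, because adjacent vertices differ by at most one BFS level. The sketch below enumerates such sets by brute force and buckets them by pattern; it illustrates the case split only and is not the efficient combinatorial enumeration the slide refers to. Function names are hypothetical.

```python
from collections import Counter, deque


def bfs_levels(adj, pivot):
    """Breadth-first-search distance of every reachable vertex from the pivot."""
    level = {pivot: 0}
    queue = deque([pivot])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def connected_sets_with_pivot(adj, pivot, max_size=4):
    """All connected vertex sets of size 2..max_size that contain the pivot (brute force)."""
    frontier, found = {frozenset([pivot])}, set()
    for _ in range(max_size - 1):
        # grow each set by one adjacent vertex; every connected set is reachable this way
        frontier = {s | {v} for s in frontier for u in s for v in adj[u] if v not in s}
        found |= frontier
    return found


def level_patterns(adj, pivot, max_size=4):
    """Bucket the connected sets by their sorted BFS-level string, e.g. '0112'."""
    level = bfs_levels(adj, pivot)
    return Counter(
        "".join(str(d) for d in sorted(level[v] for v in s))
        for s in connected_sets_with_pivot(adj, pivot, max_size)
    )


adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(level_patterns(adj, 0))
```

Within each pattern, the orbit of the pivot can then be decided from the few edges among the chosen vertices, which is what makes a case-by-case treatment attractive.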


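Feature vector indexed by labeled orbits: A, C, D, E, F, G, H, I, K, …, Y, AA, AC, AD, …

Number of labelings per orbit: o1: $|\mathcal{A}|$; o2: $|\mathcal{A}|^2$; o5, o6, o11: $|\mathcal{A}|^3$; o3, o4: ?

$\mathcal{A} = \{0, 1\}$: 00, 01 = 10, 11 (3)
$\mathcal{A} = \{0, 1, 2\}$: 00, 11, 22, 01 = 10, 02 = 20, 12 = 21 (6)

binomial (multinomial) coefficients

$|\mathcal{A}| = 20$, dimensionality = 1,062,420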
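The two small examples follow the multiset coefficient: k interchangeable vertices labeled from an alphabet of size n admit C(n + k - 1, k) labelings, giving 3 for n = 2, k = 2 and 6 for n = 3, k = 2. The sketch below tallies labeled orbits for all graphlets of one to four vertices under our own reading of the slide: the pivot's label is counted separately (one factor n), and non-pivot vertices that an automorphism fixing the pivot can swap are interchangeable. With n = 20 this reproduces the dimensionality of 1,062,420 stated above; the orbit descriptions in the comments are ours, not the slide's numbering.

```python
from math import comb


def multiset_count(n, k):
    """Labelings of k interchangeable vertices from an alphabet of size n: C(n + k - 1, k)."""
    return comb(n + k - 1, k)


def labeled_orbit_dimension(n):
    """Number of labeled orbits for graphlets of 1 to 4 vertices over an alphabet of size n.

    Assumes the pivot label is counted separately (the final factor n) and that
    non-pivot vertices swapped by an automorphism fixing the pivot are interchangeable.
    """
    pair = multiset_count(n, 2)    # two interchangeable non-pivot vertices
    triple = multiset_count(n, 3)  # three interchangeable non-pivot vertices
    non_pivot_labelings = [
        1,         # pivot alone
        n,         # edge
        n ** 2,    # 3-path, pivot at an end
        pair,      # 3-path, pivot in the middle
        pair,      # triangle
        n ** 3,    # 4-path, pivot at an end
        n ** 3,    # 4-path, pivot at an interior vertex
        n * pair,  # 3-star, pivot at a leaf
        triple,    # 3-star, pivot at the center
        n * pair,  # 4-cycle
        n * pair,  # paw (triangle with a tail), pivot at the tail end
        n ** 3,    # paw, pivot at a degree-2 triangle vertex
        n * pair,  # paw, pivot at the degree-3 vertex
        n * pair,  # diamond, pivot at a degree-2 vertex
        n * pair,  # diamond, pivot at a degree-3 vertex
        triple,    # 4-clique
    ]
    return n * sum(non_pivot_labelings)


print(multiset_count(2, 2), multiset_count(3, 2))  # 3 6
print(labeled_orbit_dimension(20))                 # 1062420
```

The dominant terms are the three orbits with no symmetry (n^3 non-pivot labelings each), which is why the dimensionality grows roughly like n^4.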


Graphlet kernel

Inner product between vectors of counts of labeled orbits:

$$K(x, y) = \langle \varphi(x), \varphi(y) \rangle = \sum_i \varphi_i(x)\,\varphi_i(y)$$

where $\varphi_i(x)$ is the number of times labeled orbit $i$ occurs in the graph; both vectors are indexed by A, C, D, E, F, G, H, I, K, …, Y, AA, AC, AD, ….

K is a kernel because matrices of inner products are symmetric and positive semidefinite (proof due to David Haussler).
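Because the count vectors live in a space of about a million labeled orbits while any one neighborhood contains only a handful of them, the inner product is naturally computed over nonzero entries only. The sketch below stores each count vector as a `Counter` keyed by labeled-orbit strings; the key scheme and the example counts are hypothetical (a real encoding must also record which orbit a label string belongs to), and any normalization of the kernel is omitted.

```python
from collections import Counter


def graphlet_kernel(phi_x, phi_y):
    """K(x, y) = sum_i phi_i(x) * phi_i(y), summed over the nonzero entries only."""
    if len(phi_x) > len(phi_y):
        phi_x, phi_y = phi_y, phi_x  # iterate over the sparser vector
    return sum(count * phi_y[i] for i, count in phi_x.items() if i in phi_y)


def gram_matrix(phis):
    """Symmetric matrix of pairwise kernel values for a list of count vectors."""
    return [[graphlet_kernel(a, b) for b in phis] for a in phis]


# Hypothetical labeled-orbit counts for two residue neighborhoods
phi_x = Counter({"S": 1, "SA": 2, "SG": 1})
phi_y = Counter({"S": 1, "SA": 1, "ST": 3})
print(gram_matrix([phi_x, phi_y]))  # [[6, 3], [3, 11]]
```

A Gram matrix computed this way can be handed to any kernel method that accepts precomputed kernels, for example an SVM configured with a precomputed kernel.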