Bioinformatics Predrag Radivojac INDIANA UNIVERSITY

Basics of Molecular Biology

Can we understand how cells function?

Eukaryotic cell

Bioinformatics is multidisciplinary!

• What is Bioinformatics?

– Integrates: computer science, statistics, chemistry, physics, and molecular biology

– Goal: organize and store huge amounts of biological data and extract knowledge from it

• Major areas of research

– Genomics

– Proteomics

– Databases

• Practical discipline

Some major applications

• Drug design

• Evolutionary studies

• Genome characterization

Interesting Problems

• Sequence alignment

• Sequence assembly

Goal: solve the puzzle, i.e., connect the pieces into one genomic sequence
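The "puzzle" view of assembly can be illustrated with a small sketch. The code below is not the method used by any particular assembler and is not taken from the slides; it is a greedy heuristic, assumed here only for illustration, that repeatedly merges the two reads with the longest suffix-prefix overlap. The function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the toy reads are all hypothetical.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the largest overlap until none remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no pair overlaps by at least min_len; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy reads (made up for this example) that tile a 13-letter sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])
print(contigs)  # ['ATGGCGTACGTTA']
```

The greedy choice is simple but can be misled by repeats; it is shown only to make the goal concrete.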

• Proteomics

Figure: Mass spectrometry. Spectrum header: S#: 1708, RT: 54.47, Full ms2 638.00 [165.00 - 1925.00]. Axes: m/z (200 to 2000) vs. relative abundance (0 to 100). Labeled peaks include 850.3, 687.3, 588.1, 949.4, and 1048.6.

• Microarray data

• Functional Genomics

• Gene Regulation

Diseases are interconnected…

Goh et al. PNAS, 104: 8685 (2007).

Disease

www.cancer.gov

• Development of tools that can be used to understand and treat human disease

• Prediction of disease-associated genes

• Important from

– biological standpoint

– medical standpoint

– computational standpoint

• Background

– human genome

– low-throughput data

– high-throughput data

– ontologies for protein function at multiple levels

The Time is Right!

Alzheimer’s disease

Top PhenoPred hits:

1) CDK5

2) NTN1

AUC = 77.5%

Loss/Gain of function and disease

Pauling et al. Science 110: 543 (1949). Chui & Dover. Curr Opin Pediatr, 13: 22 (2001).

Sickle Cell Disease:

• Autosomal recessive disorder

• E6V in HBB causes interaction with F85 and L88

• Formation of amyloid fibrils

• Abnormally shaped red blood cells, leading to sickle cell anemia

• Manifestation of disease varies widely across patients

Figure: PDB structures 2hbs (E6V) and 4hhb. Source: http://gingi.uchicago.edu/hbs2.html

Lipitor (ATORVASTATIN)

Proteins = chains of amino acids

• biomolecule, macromolecule

– more than 50% of the dry weight of cells is proteins

• polymer of amino acids connected into linear chains

• strings of symbols

• machinery of life

– play a central role in the structure and function of cells

– regulate and execute many biological functions

a) amino acid b) amino acid chain

Introduction to Protein Structure by Branden and Tooze

• peptide bonds are planar and strong

• rotation about the backbone bonds at each amino acid lets proteins adopt their three-dimensional structure

Protein structure

Introduction to Protein Structure by Branden and Tooze

Protein function

• Multi-level phenomenon

– biochemical function

– biological function

– phenotypical function

• Example: kinase

– biochemical function: transferase

– biological function: cell cycle regulation

– phenotypical function: disease

• Function is everything that happens to or through a protein (Rost et al. 2003)

Myoglobin, 1.4 Å X-ray structure, PDB: 2jho, 153 residues

Cα–Cα < 6 Å

Protein contact graph
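A contact graph of this kind can be sketched in a few lines. The example below assumes one 3D coordinate per residue (which atom represents a residue is an assumption of this sketch) and connects two residues when their distance is below a 6 Å threshold. The function name `contact_graph` and the toy coordinates are hypothetical; real coordinates would come from a PDB file such as 2jho.

```python
import math
from itertools import combinations


def contact_graph(coords, threshold=6.0):
    """Return an adjacency dict: residue index -> set of residues closer than threshold (in Å)."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) < threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Toy coordinates in Å (made up): consecutive residues are 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
graph = contact_graph(coords)
print(graph[1])  # {0, 2, 3}
```

The all-pairs loop is quadratic in the number of residues, which is fine for a single protein of a few hundred residues; a spatial index would be a natural refinement for larger structures.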

S113 of isocitrate dehydrogenase

Notation:

$G = (V, E)$

$f: V \to \mathcal{A}$, $\mathcal{A} = \{A, C, D, \ldots, W, Y\}$

$g: V \to \{-1, +1\}$

Residue neighborhood

22

Graphlets are small non-isomorphic connected graphs.

Different positions of the pivot vertex with respect to the graphlet correspond to the graph-theoretical concept of automorphism orbits, or simply orbits.

Przulj et al. Bioinformatics 20: 3508 (2004).

Results

2-graphlets: 01

3-graphlets: 011, 012

4-graphlets: 0111, 0112, 0122, 0123

Key insight: efficient combinatorial enumeration of graphlets / orbits over 7 disjoint cases, via breadth-first search

A C D E F G H I K … Y AA AC AD …

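$o_1$: $|\mathcal{A}|$

$o_2$: $|\mathcal{A}|^2$

$o_5, o_6, o_{11}$: $|\mathcal{A}|^3$

$o_3, o_4$: ?

$\mathcal{A} = \{0, 1\}$: 00, 01 = 10, 11 (3)

$\mathcal{A} = \{0, 1, 2\}$: 00, 11, 22, 01 = 10, 02 = 20, 12 = 21 (6)

binomial (multinomial) coefficients

$|\mathcal{A}| = 20$, dimensionality = 1,062,420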
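The dimensionality figure can be checked with a short counting sketch. It rests on assumptions that are not spelled out in the slides: features are pivot-rooted labeled graphlets with 1 to 4 vertices, the pivot and each distinguishable vertex contribute a factor of |A|, and each group of k interchangeable (symmetric) vertices contributes the number of multisets of size k, C(|A| + k - 1, k). The table of symmetry groups per orbit is this sketch's own enumeration, listed by shape rather than by the slide's orbit numbers, and the names `multiset_count`, `ORBIT_SYMMETRY_GROUPS`, and `dimensionality` are hypothetical.

```python
from math import comb, prod


def multiset_count(alphabet_size, k):
    """Number of unordered labelings of k interchangeable vertices."""
    return comb(alphabet_size + k - 1, k)


# For each pivot position: sizes of the groups of interchangeable non-pivot vertices.
# (Assumed enumeration for connected graphlets with up to 4 vertices.)
ORBIT_SYMMETRY_GROUPS = {
    "single vertex": [],
    "edge": [1],
    "3-path, end": [1, 1],
    "3-path, middle": [2],
    "triangle": [2],
    "4-path, end": [1, 1, 1],
    "4-path, inner": [1, 1, 1],
    "star, leaf": [1, 2],
    "star, center": [3],
    "4-cycle": [2, 1],
    "tailed triangle, tail end": [1, 2],
    "tailed triangle, degree-2 vertex": [1, 1, 1],
    "tailed triangle, degree-3 vertex": [1, 2],
    "diamond, degree-2 vertex": [2, 1],
    "diamond, degree-3 vertex": [1, 2],
    "4-clique": [3],
}


def dimensionality(alphabet_size):
    """Total number of labeled orbits: the pivot label times the labelings of the other vertices."""
    return sum(
        alphabet_size * prod(multiset_count(alphabet_size, k) for k in groups)
        for groups in ORBIT_SYMMETRY_GROUPS.values()
    )


print(multiset_count(2, 2), multiset_count(3, 2))  # 3 6
print(dimensionality(20))  # 1062420
```

The two small multiset counts reproduce the (3) and (6) worked cases for alphabets {0, 1} and {0, 1, 2}, and the total for 20 amino acids agrees with the stated 1,062,420.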

Graphlet kernel

Inner product between vectors of counts of labeled orbits:

$K(x, y) = \langle \phi(x), \phi(y) \rangle = \sum_i \phi_i(x)\,\phi_i(y)$

where $\phi_i(x)$ is the number of times labeled orbit $i$ occurs in the graph.

K is a kernel because matrices of inner products are symmetric and positive semidefinite (proof due to David Haussler).
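Because the count vectors are very sparse relative to their dimensionality, the inner product between labeled-orbit counts is naturally computed over sparse maps. The sketch below assumes the counts have already been obtained by some enumeration step and are stored as `collections.Counter` objects; the function name `graphlet_kernel` and the key strings such as `"o1:AC"` are hypothetical placeholders, not the authors' encoding.

```python
from collections import Counter


def graphlet_kernel(counts_x, counts_y):
    """Inner product of two sparse count vectors keyed by labeled orbit."""
    # Iterate over the smaller map; orbits missing from the other side contribute zero.
    if len(counts_x) > len(counts_y):
        counts_x, counts_y = counts_y, counts_x
    return sum(c * counts_y.get(orbit, 0) for orbit, c in counts_x.items())


# Made-up counts for two residue neighborhoods.
x = Counter({"o1:AC": 2, "o2:ACD": 1})
y = Counter({"o1:AC": 3, "o3:AAA": 5})
print(graphlet_kernel(x, y))  # 6
```

Only orbits present in both neighborhoods contribute, so the cost depends on the number of nonzero counts rather than on the full feature dimensionality.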