Bioinformatics [3mm] Some selected examples and a bit of ... · Some selected examples ... and a...

Preview:

Citation preview

BioinformaticsSome selected examples . . . and a bit of an overview

Ingo Ruczinski

Department of BiostatisticsJohns Hopkins Bloomberg School of Public Health

July 19, 2007 @ EnviroHealth Connections

Ingo Ruczinski Bioinformatics / Computational Biology

Bioinformatics and Computational Biology

Wikipedia:

Bioinformatics and computational biology involve the use oftechniques including applied mathematics, informatics, statistics,computer science, artificial intelligence, chemistry, and biochemistryto solve biological problems usually on the molecular level.· · ·

Major research efforts in the field include sequence alignment, genefinding, genome assembly, protein structure alignment, proteinstructure prediction, prediction of gene expression and protein-proteininteractions, and the modeling of evolution.

Ingo Ruczinski Bioinformatics / Computational Biology

Bioinformatics and Computational Biology

Wikipedia:

The terms bioinformatics and computational biology are often usedinterchangeably. However bioinformatics more properly refers to thecreation and advancement of algorithms, computational and statisticaltechniques, and theory to solve formal and practical problemsinspired from the management and analysis of biological data.Computational biology, on the other hand, refers to hypothesis-driveninvestigation of a specific biological problem using computers, carriedout with experimental or simulated data, with the primary goal ofdiscovery and the advancement of biological knowledge.

Ingo Ruczinski Bioinformatics / Computational Biology

Bioinformatics and Computational Biology

NIH definition of Bioinformatics and Computational Biology:

Bioinformatics and computational biology are rooted in life sciencesas well as computer and information sciences and technologies. Bothof these interdisciplinary approaches draw from specific disciplinessuch as mathematics, physics, computer science and engineering,biology, and behavioral science.· · ·

Bioinformatics applies principles of information sciences andtechnologies to make the vast, diverse, and complex life sciencesdata more understandable and useful. Computational biology usesmathematical and computational approaches to address theoreticaland experimental questions in biology. Although bioinformatics andcomputational biology are distinct, there is also significant overlapand activity at their interface.

Ingo Ruczinski Bioinformatics / Computational Biology

Bioinformatics and Computational Biology

NIH definition of Bioinformatics and Computational Biology:

The NIH Biomedical Information Science and Technology InitiativeConsortium agreed on the following definitions of bioinformatics andcomputational biology recognizing that no definition could completelyeliminate overlap with other activities or preclude variations ininterpretation by different individuals and organizations.

Bioinformatics: Research, development, or application ofcomputational tools and approaches for expanding the use ofbiological, medical, behavioral or health data, including those toacquire, store, organize, archive, analyze, or visualize such data.

Computational Biology: The development and application ofdata-analytical and theoretical methods, mathematical modeling andcomputational simulation techniques to the study of biological,behavioral, and social systems.

Ingo Ruczinski Bioinformatics / Computational Biology

The central dogma of biology

Drawn by Ebbe Sloth Andersen http://old.mb.au.dk/

Ingo Ruczinski Bioinformatics / Computational Biology

Topics

DNA −→ RNA −→ Protein

DNA Sequence analysis, genome annotation, evolutionarybiology, phylogeny, DNA alterations, comparativegenomics, SNP association studies

RNA Analysis of gene expression and regulation

Proteins Analysis of protein expression, protein-protein docking,prediction of protein structure

Systems Biology:modeling biological systems, gene/protein networks

Ingo Ruczinski Bioinformatics / Computational Biology

Some selected examples

1 Chromosomal alterations

2 Protein structure prediction

3 2D gel electrophoresis

Ingo Ruczinski Bioinformatics / Computational Biology

Some selected examples

1 Chromosomal alterations

2 Protein structure prediction

3 2D gel electrophoresis

Ingo Ruczinski Bioinformatics / Computational Biology

Karyotypes

Ingo Ruczinski Bioinformatics / Computational Biology

Trisomy

http://www.medgen.ubc.ca

Ingo Ruczinski Bioinformatics / Computational Biology

DNA changes

Ingo Ruczinski Bioinformatics / Computational Biology

The data

Ingo Ruczinski Bioinformatics / Computational Biology

Deletion

Ingo Ruczinski Bioinformatics / Computational Biology

FISH

Ingo Ruczinski Bioinformatics / Computational Biology

Amplification

Ingo Ruczinski Bioinformatics / Computational Biology

Uniparental Isodisomy

Ingo Ruczinski Bioinformatics / Computational Biology

Cancer samples

Ingo Ruczinski Bioinformatics / Computational Biology

Mosaicism

Ingo Ruczinski Bioinformatics / Computational Biology

SNPchip S4 classes and methods

Ingo Ruczinski Bioinformatics / Computational Biology

Estimation

1 By SNP:

Estimate genotype and copy number for each SNP.

2 Within a sample:

Borrow strength between SNPs to infer regions of LOHand copy number changes.

3 Between samples:

Comparison between normal and disease populations tofind chromosomal alterations associated with disease.

Ingo Ruczinski Bioinformatics / Computational Biology

Vanilla ICE

1

2

3

5

1

2

3

4

5A B CD E

ICE

Van

DeletionNormal

LOHAmplification

50 51 52 53 54 55

1

2

3

A

69 70 71 72 73 74

D

174 176 178

B

238 240 242 244 246

E

Mb

ICE

Van

Ingo Ruczinski Bioinformatics / Computational Biology

A HapMap sample

100 150 200

1

2

345

ICE

Van

DeletionNormal

LOHAmplification

190

135 140 145 150

1

2

345

143

Mb

149 149.5 150 150.5 151 151.5 152

ICE

Van

Ingo Ruczinski Bioinformatics / Computational Biology

Many HapMap samples

Ingo Ruczinski Bioinformatics / Computational Biology

SNP Trio

Ingo Ruczinski Bioinformatics / Computational Biology

HMM for SNP Trio

chromosome 10

position (Mb)

0 20 40 60 80 100 120 140

MI−S

MI−D

UPI−M

UPI−F

BPI

BPI

non−BPI

chromosome 22

position (Mb)

15 20 25 30 35 40 45 50

MI−S

MI−D

UPI−M

UPI−F

BPI

BPI

non−BPI

Ingo Ruczinski Bioinformatics / Computational Biology

HMM for SNP Trio

80.9

1

MI−S

MI−D

UPI−M

UPI−P

BPI

HMM

child

copy

num

ber

−2−1.5

−1−0.5

00.5

1

mother

copy

num

ber

−2−1.5

−1−0.5

00.5

1

father

copy

num

ber

−2−1.5

−1−0.5

00.5

1

0 20 40 60 80 100 120 135

80.9

1

MI−S

MI−D

UPI−M

UPI−P

BPI

HMM

child

copy

num

ber

−2−1.5

−1−0.5

00.5

1

mother

copy

num

ber

−2−1.5

−1−0.5

00.5

1

father

copy

num

ber

−2−1.5

−1−0.5

00.5

1

15 20 25 30 35 40 45 49

Ingo Ruczinski Bioinformatics / Computational Biology

Some selected examples

1 Chromosomal alterations

2 Protein structure prediction

3 2D gel electrophoresis

Ingo Ruczinski Bioinformatics / Computational Biology

Proteins

−→ Amino acids are the building blocks of proteins.

Ingo Ruczinski Bioinformatics / Computational Biology

Proteins

−→ Both figures show the same protein (the bacterial protein L).

The right figure also highlights the secondary structure elements.

Ingo Ruczinski Bioinformatics / Computational Biology

Proteins

From Lehninger, Principles of Biochemistry

Ingo Ruczinski Bioinformatics / Computational Biology

Functional Annotation

Ingo Ruczinski Bioinformatics / Computational Biology

Genome Wide Annotation

Ingo Ruczinski Bioinformatics / Computational Biology

Some selected examples

1 Chromosomal alterations

2 Protein structure prediction

3 2D gel electrophoresis

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

A:1 A:2 A:3 A:4 A:5 A:6 A:7 A:8 A:9 A:10 A:11 A:12 B:1 B:2 B:3 B:4 B:5 B:6 B:7 B:8 B:9 B:10 B:11 B:12

A B

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

A:1 A:2 A:3 A:4 B:1 B:2 B:3 B:4

A B

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

−20

0

20

40

60

% r

educ

tion

of c

once

ntra

tion

as c

ompa

red

to b

ackg

roun

d 1st Trimester

3rd Trimester

Folate Placebo

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

Ingo Ruczinski Bioinformatics / Computational Biology

2D Gel Electrophoresis

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

1300

1200

1100

1000

900

800

700

600

500

400

300

200

100

0

790

1056

853

662

1458

948

1026

770

586

768

797

332

248

Ingo Ruczinski Bioinformatics / Computational Biology

http://biostat.jhsph.edu/∼iruczins/

Ingo Ruczinski Bioinformatics / Computational Biology

Recommended