Introduction to Bioinformatics

Human Genome Project
Genome Health Implications
• A new disease encyclopedia
• New genetic fingerprints
• New diagnostics
• New treatments
Goals
• Identify the approximately 40,000 genes in human DNA
• Determine the sequences of the 3 billion bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social issues that arise from genome research

BT (biotechnology) and IT (information technology)
Bioinformatics
Biocomputing

What is Bioinformatics?
Informatics – computer science
Bio – molecular biology
Bioinformatics – solving problems arising from biology using methodology from computer science.

Basics in Molecular Biology

Chromosomes
DNA in a human cell: 2 m
DNA in a human body: 2 × 10^11 km
Earth-to-Sun distance: 1.5 × 10^8 km
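The body-wide figure follows from multiplying the DNA length per cell by the number of cells. A minimal sketch of that arithmetic, assuming the slide figures of 2 m of DNA per cell and 10^14 cells in the body (both rough estimates; the constant and function names are illustrative):

```python
# Rough estimates taken from the slides; not precise measurements.
DNA_PER_CELL_M = 2.0      # metres of DNA in one human cell
CELLS_IN_BODY = 1e14      # approximate number of cells in the body
EARTH_SUN_KM = 1.5e8      # approximate Earth-to-Sun distance in km


def total_dna_km(dna_per_cell_m: float, cells: float) -> float:
    """Return the total DNA length in kilometres."""
    return dna_per_cell_m * cells / 1000.0


total_km = total_dna_km(DNA_PER_CELL_M, CELLS_IN_BODY)
ratio = total_km / EARTH_SUN_KM

print(f"Total DNA length: {total_km:.1e} km")
print(f"Equivalent Earth-to-Sun distances: {ratio:.0f}")
```

Under these assumptions the DNA in one body would span the Earth-to-Sun distance more than a thousand times.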

DNA (Deoxyribonucleic Acid)
Composed of nucleotides
Nucleotide = Sugar + Phosphate + Nitrogenous base
Base pairs: Adenine – Thymine, Guanine – Cytosine
Double-helix structure
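Because A pairs with T and G pairs with C, one strand of the double helix determines the other. A minimal sketch of that pairing rule, assuming the input contains only the letters A, C, G and T (the function names and the example strand are illustrative):

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


example = "AACCTGCGGAAGG"  # short illustrative strand
print(complement(example))
print(reverse_complement(example))
```

The reversal reflects that the two strands run in opposite directions; applying `reverse_complement` twice returns the original strand.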

DNA Base-pairs

DNA
AACCTGCGGAAGGATCATTACCGAGTGCGGGTCCTTTGGGCCCAACCTCCCATCCGTGTCTATTGTACCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCTCTGCCCCCCGGGCCCGTGCCCGCCGGAGACCCCAACACGAACACTGTCTGAAAGCGTGCAGTCTGAGTTGATTGAATGCAATCAGTTAAAACTTTCAACAATGGATCTCTTGGTTCCGGCATGCAATCAGTCCCGTTGCTTCGGCACTGTCTGAAAGCGCCTTTGGGCCCAACCTCCCATCCGTGTCTATTGTACCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGCCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGCTATTGTACCCGTTGCTTCGGATCTCTTGGGGATCTCTTGGTTCCGGCATGCAATCAGTCCCGTTGCTTCGGCACTGTCTGAAAGCGCCTTTGGGCCCAACCTCCCACCGTTGCTTCGGCGGGCCCGCCGCTTGTCGGCCGCCGGGGGGGCGGCCGCCGGGGGCACTGTCTGAAAGCTCGGCCGCC

Some Facts
DNA differs between humans by 0.2% (1 in 500 bases).
Human DNA is 98% identical to that of chimpanzees.
97% of DNA in the human genome has no known function.
3.2 × 10^9 letters in the DNA code in every cell in your body.
10^14 cells in the body.
12,000 letters of DNA decoded by the Human Genome Project every second.

Gene and Genome
Gene: fundamental unit of heredity; contains the information needed to synthesize a protein; part of the genome
Genome: the entire DNA of an organism

Numbers of Genes
Humans: 25,000 to 40,000
C. elegans (worm): 19,000
S. cerevisiae (yeast): 6,000
Tuberculosis microbe: 4,000

RNA (Ribonucleic Acid)
Bases: A, C, G, U (Uracil)
mRNA: transcribed from a gene in DNA; carries the information to the ribosome, the cellular machinery that synthesizes proteins
tRNA: acts as an adaptor between the mRNA and the amino acids while the ribosome builds a protein
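Transcription can be sketched in two equivalent ways: rewrite the coding strand with U in place of T, or build the RNA by pairing bases against the template strand. A minimal sketch, assuming strands are written 5' to 3' and contain only A, C, G and T (the function names and example strands are illustrative):

```python
# Pairing of a DNA template base with the RNA base placed opposite it.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """Build the mRNA by pairing against the template strand.

    The template is read in reverse so that the mRNA comes out 5' to 3'.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template_strand.upper()))


coding = "ATGGATCTCTTGG"     # illustrative coding strand
template = "CCAAGAGATCCAT"   # its reverse complement
print(transcribe(coding))
print(transcribe_template(template))
```

Both calls produce the same mRNA, since the template strand is the reverse complement of the coding strand.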

Molecular Biology: Flow of Information (Central Dogma)
DNA (“gene”) → RNA → Protein → Folded Protein

DNA (gene) → RNA → Protein
[Figure: structure of a gene and its expression. Labels: control statement, TATA (start), Termination (stop), control statement, Ribosome binding, gene, 5' UTR, 3' UTR. Steps: gene → Transcription (RNA polymerase) → mRNA → Translation (Ribosome) → Protein.]

Codon
A tRNA binds to three nucleotides of the mRNA: a codon
Number of combinations: 4^3 = 64, specifying the 20 amino acids and the stop codons
Each codon specifies one amino acid, and the amino acids join to form a protein.
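The count of 64 comes from choosing one of four bases at each of three positions. A minimal sketch that enumerates the codons, written with DNA letters (T rather than U) as an assumption to match the genetic code table:

```python
from itertools import product

BASES = "TCAG"  # order commonly used in genetic code tables

# Every ordered triple of bases is one codon.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]

print(len(codons))   # 4 ** 3 combinations
print(codons[:4])    # the codons that start with "TT"
```

Since 64 codons cover only 20 amino acids plus the stop signal, most amino acids are specified by more than one codon.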
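
Genetic Code: 3 bases = 1 amino acid
First position (5' end) | Second: T | Second: C | Second: A | Second: G | Third position (3' end)
T | Phe | Ser | Tyr  | Cys  | T
T | Phe | Ser | Tyr  | Cys  | C
T | Leu | Ser | STOP | STOP | A
T | Leu | Ser | STOP | Trp  | G
C | Leu | Pro | His  | Arg  | T
C | Leu | Pro | His  | Arg  | C
C | Leu | Pro | Gln  | Arg  | A
C | Leu | Pro | Gln  | Arg  | G
A | Ile | Thr | Asn  | Ser  | T
A | Ile | Thr | Asn  | Ser  | C
A | Ile | Thr | Lys  | Arg  | A
A | Met | Thr | Lys  | Arg  | G
G | Val | Ala | Asp  | Gly  | T
G | Val | Ala | Asp  | Gly  | C
G | Val | Ala | Glu  | Gly  | A
G | Val | Ala | Glu  | Gly  | G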
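Reading a sequence three bases at a time and looking each codon up in the table gives the encoded protein. A minimal sketch, assuming the standard genetic code, DNA letters (T rather than U), one-letter amino acid symbols instead of the three-letter names, and a reading frame that starts at the first base (the names `CODON_TABLE` and `translate` are illustrative):

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in one-letter symbols, ordered by first, second and
# third base (each cycling through T, C, A, G); "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = dict(
    zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate a coding sequence until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":   # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTTGGTTAA"))  # Met, Phe, Gly, then STOP
```

Any trailing bases that do not fill a complete codon are ignored in this sketch.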

Protein Structure

Human Genetic Variations (Single Nucleotide Polymorphisms)
SNPs: “genetic individuality”
• About 1 in 1,000 bases varies between any two humans
• Make us more/less susceptible to diseases
• May influence the effect of drug treatments
TTTGCTCCGTTTTCA
TTTGCTCYGTTTTCA
TTTGCTCTGTTTTCA
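The example sequences differ at a single position, and the version containing Y appears to use the IUPAC ambiguity code for "C or T". A minimal sketch of finding such a difference between two already-aligned sequences of equal length (the function names are illustrative, and real SNP calling involves alignment and quality checks not shown here):

```python
# IUPAC ambiguity codes for pairs of bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def find_snps(seq_a: str, seq_b: str) -> list:
    """Return (1-based position, base in seq_a, base in seq_b) per mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


def consensus(seq_a: str, seq_b: str) -> str:
    """Merge two sequences, writing an ambiguity code where they differ."""
    return "".join(
        a if a == b else IUPAC_PAIRS.get(frozenset((a, b)), "N")
        for a, b in zip(seq_a, seq_b)
    )


first = "TTTGCTCCGTTTTCA"
second = "TTTGCTCTGTTTTCA"
print(find_snps(first, second))   # one C/T difference at position 8
print(consensus(first, second))   # the merged sequence with Y
```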

SNP (Single Nucleotide Polymorphism)
Finding single nucleotide changes at specific regions of genes
• Diagnosis of hereditary diseases
• Personalized drugs
• Finding more effective drugs and treatments

Human Individuality

Flood of Data! (SWISS-PROT)
[Figure: number of sequences (× 1000) in SWISS-PROT by year of release, 1988 to 1996; vertical axis from 0 to 80.]

How Can We Analyze the Flood of Data?
Data: don’t just store it, analyze it!
By comparing sequences, one can find out about things like
• ancestors of organisms
• phylogenetic trees
• protein structures
• protein function
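One simple way to compare two sequences is the edit distance: the minimum number of single-letter substitutions, insertions and deletions that turn one into the other. The sketch below uses it as a stand-in for sequence comparison; this is an assumption for illustration, since practical tools typically use alignment with biologically motivated scoring. The three sequences are invented, not real data.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Invented example sequences (hypothetical, for illustration only).
sequences = {
    "seq_a": "ACGTTGCA",
    "seq_b": "ACGTTGCT",
    "seq_c": "ACCTAGCA",
}

for (name_1, s1), (name_2, s2) in combinations(sequences.items(), 2):
    print(name_1, name_2, edit_distance(s1, s2))
```

Pairwise distances like these are one possible input for building a phylogenetic tree, with smaller distances suggesting closer relationships.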

Bioinformatics Is About:
Elicitation of DNA sequences from genetic material
Sequence annotation (e.g. with information from experiments)
Understanding the control of gene expression (i.e. under what circumstances genes are transcribed from DNA and translated into proteins)
The relationship between the amino acid sequence of proteins and their structure.

Aim of Research in Bioinformatics
Understand the functioning of living things – to “improve the quality of life”.
• Drug design
• Identification of genetic risk factors
• Gene therapy
• Genetic modification of food crops and animals, etc.

Extension of Bioinformatics Concept
Genomics
• Functional genomics
• Structural genomics
Proteomics: large scale analysis of the proteins of an organism
Pharmacogenomics: developing new drugs that will target a particular disease
Microarray: DNA chip, protein chip