8/8/2019 Gene and Protein
1/37
Gene, Proteins, and Genetic Code
Protein Synthesis in a Cell
Protein and Amino Acids
Protein
Protein
GOT Ecoli
A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
A protein sequence may contain from a few hundred to several
thousand amino acids.
Protein synthesis
Genetic code
... ATT CAC AGT GGA ...
...  I   H   S   G  ...
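The codon-to-amino-acid mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and DNA (T rather than U) codons; the names `CODON_TABLE` and `translate` are hypothetical and not part of the original slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base."""
    dna = dna.upper()
    # Any trailing one or two bases that do not form a full codon are ignored.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATTCACAGTGGA"))  # the four codons shown on the slide
```

Translating GGT, GGC, GGA and GGG with the same table gives glycine each time, a case where the third base does not change the amino acid.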
Notes on translation
Three reading frames (per strand)
The third base of a codon often does not change the amino acid
Read 5' -> 3'
Start and stop codons
Open Reading Frame (ORF)
Each gene is an ORF, but not all ORFs are genes.
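The notes above can be made concrete with a small ORF scanner. This is a sketch under stated assumptions, not a gene finder: it uses ATG as the only start codon, TAA/TAG/TGA as stop codons, and scans only the three forward reading frames of the strand it is given. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (frame, start, end) for every ORF in the three forward frames.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i          # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None       # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATGACC"))
```

Running the same function on the reverse complement would cover the other strand. Raising `min_codons` filters out short ORFs that are unlikely to be genes, at the cost of missing genuinely short genes.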
The Central Dogma of Molecular Biology
DNA -> RNA -> Protein (transcription, then translation)
DNA -> DNA (replication)
genotype -> phenotype
Exception: retroviruses
RNA -> DNA (reverse transcription), then DNA -> RNA -> Protein (transcription, then translation)
DNA -> DNA (replication)
genotype -> phenotype
[Figure: biology links DNA (genotype) to protein and phenotype]
Genes
One gene encodes one protein (or sometimes RNA).
Like a program, it starts with a start codon (e.g. ATG), then each group of three bases codes for one amino acid. Then a stop codon (e.g. TGA) signifies the end of the gene.
Genes are dense in prokaryotes and sparse in eukaryotes.
In the middle of a eukaryotic gene, there are introns that are spliced out (as junk) after transcription. The parts that remain are called exons. Telling exons from introns is the task of gene finding.
Gene-related diseases
Hemophilia: on the X chromosome.
Sickle-cell anemia: single-nucleotide mutation in the first exon of the beta-globin gene (removes a cutting site). 1 in 12 African Americans are carriers (homozygotes are sick).
BRCA1 gene (chr. 17q): responsible for inherited breast cancer (10% of breast cancer).
Fragile X syndrome (intellectual disability): 1 in 1250 males, 1 in 2500 females (dominant, but females have a partially expressed good gene). FMR-1 gene: more than 200 tri-nucleotide repeats cause the disease.
P53 gene: chr. 17p, tumor suppressor protein.
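The tri-nucleotide repeat criterion for FMR-1 can be checked mechanically. The sketch below counts the longest uninterrupted run of a repeat unit; the unit CGG is an assumption added here (it is the repeat usually associated with FMR-1, but the slide does not name it), and the function name `longest_repeat_run` is hypothetical.

```python
import re


def longest_repeat_run(dna, unit="CGG"):
    """Return the largest number of back-to-back copies of `unit` in `dna`."""
    # Each match is one uninterrupted run such as CGGCGGCGG.
    runs = re.findall(f"(?:{re.escape(unit)})+", dna.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


toy = "AA" + "CGG" * 5 + "T" + "CGG" * 2   # toy sequence, not real data
print(longest_repeat_run(toy))
print(longest_repeat_run(toy) > 200)       # threshold quoted on the slide
```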
Genetic Test
Example: http://www.myriad.com/index.php
Pros and cons:
Can possibly avoid the disease or diagnose it early
Can make you unhappier
Can help insurance companies discriminate against carriers of the defective gene
Gene Prediction and Annotation
Prokaryotes
1. Start/stop codon (ORF)
2. Promoters
3. Content
4. Sequence similarity
Start Codon
May miss short genes.
It is unclear which start codon to use.
ORFs in different reading frames can overlap.
Promoters
5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG XXXX-3'
       -35            -10       Gene to be transcribed
-10 (Pribnow box):  T    A    T    A    A    T
                    77%  76%  60%  61%  56%  82%
-35:                T    T    G    A    C    A
                    69%  79%  61%  56%  54%  54%
In prokaryotes, the promoter consists of two short sequences at positions -10 and -35
upstream of the gene, that is, prior to the gene in the direction of transcription. The
sequence at -10 is called the Pribnow box and usually consists of the six nucleotides
TATAAT. The Pribnow box is absolutely essential to start transcription in prokaryotes. The
other sequence at -35 usually consists of the six nucleotides TTGACA. Its presence allows
a very high transcription rate.
These rules are only
approximately correct.
Scoring a 6-mer as Pribnow box
Computers work with exact formulae, not English descriptions.
We need a score function to measure the likelihood that a 6-mer is a Pribnow box.
An example function for Pribnow box fitness evaluation
log()
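The formula on this slide did not survive extraction, so the following is only one plausible form of such a score, not necessarily the one originally shown. It is a log-odds sum over the six positions, using the -10 consensus frequencies from the promoter slide. Two assumptions are added here: non-consensus bases share the remaining probability equally, and the background probability of every base is 0.25. The names `pribnow_score` and `best_pribnow` are hypothetical.

```python
import math

# Consensus base and its frequency at each -10 position (from the slides).
CONSENSUS = [("T", 0.77), ("A", 0.76), ("T", 0.60),
             ("A", 0.61), ("A", 0.56), ("T", 0.82)]
BACKGROUND = 0.25  # assumption: uniform base composition outside promoters


def position_prob(pos, base):
    cons_base, freq = CONSENSUS[pos]
    # Assumption: the slides give no frequencies for the other three bases,
    # so the remaining probability is split equally among them.
    return freq if base == cons_base else (1.0 - freq) / 3.0


def pribnow_score(six_mer):
    """Sum of log(p_position(base) / background) over the six positions."""
    six_mer = six_mer.upper()
    if len(six_mer) != 6 or set(six_mer) - set("ACGT"):
        raise ValueError("expected six bases from A, C, G, T")
    return sum(math.log(position_prob(i, b) / BACKGROUND)
               for i, b in enumerate(six_mer))


def best_pribnow(dna):
    """Return (score, index) of the best-scoring 6-mer; dna needs >= 6 bases."""
    return max((pribnow_score(dna[i:i + 6]), i) for i in range(len(dna) - 5))


print(pribnow_score("TATAAT"), pribnow_score("GGGCCC"))
print(best_pribnow("GGCGTATAATGCC"))
```

A positive score means the 6-mer looks more like the -10 box than like background sequence; a negative score means the opposite.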
Content I: codon bias
A codon XYZ occurs with different frequencies in coding regions and non-coding regions.
Different amino acids have different frequencies.
Different codons for the same amino acid have different frequencies.
In non-coding regions, the frequency of XYZ is approximately p(X)*p(Y)*p(Z).
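One way to turn codon bias into a number is a log-odds score: compare each codon's frequency in known coding sequence with the product p(X)*p(Y)*p(Z) expected in non-coding sequence. The sketch below assumes a set of known coding sequences and a background sequence are available; the tiny sequences used here are made-up toy data, and all function names and the pseudocount are hypothetical choices.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Estimate codon frequencies from in-frame coding sequences."""
    counts = Counter()
    for seq in coding_seqs:
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    # The pseudocount keeps unseen codons from getting probability zero.
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def base_frequencies(background):
    """Single-base frequencies; assumes all four bases occur in background."""
    counts = Counter(background)
    return {b: counts[b] / len(background) for b in "ACGT"}


def coding_log_odds(window, codon_freq, base_freq):
    """Sum over codons of log(f_coding(XYZ) / (p(X) * p(Y) * p(Z)))."""
    score = 0.0
    for i in range(0, len(window) - 2, 3):
        x, y, z = window[i:i + 3]
        score += math.log(codon_freq[x + y + z]
                          / (base_freq[x] * base_freq[y] * base_freq[z]))
    return score


toy_codon_freq = codon_frequencies(["ATGGCTGCTGCTTAA"])  # toy training data
toy_base_freq = base_frequencies("ACGT" * 10)            # toy background
print(coding_log_odds("GCTGCT", toy_codon_freq, toy_base_freq))
print(coding_log_odds("CCCCCC", toy_codon_freq, toy_base_freq))
```

In practice the codon frequencies would come from a large set of annotated genes for the organism, and the score would be computed in each reading frame over a sliding window.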
http://www.kazusa.or.jp/codon/
Content II - Hidden Markov
Model (HMM)
Eukaryotes
Basic idea similar to prokaryotes
Difference:
DNA-specific transcription factors
These are the basis of gene-regulatory networks
Another hot area in bioinformatics
Splicing
Consensus sequences have been identified as necessary but not sufficient for splicing. In vertebrates, these sequences are (the slash identifies the exon-intron or intron-exon junction):
(C or A)AG/GT(A or G)AGT "donor" splice site
(T or C)n N (C or T)AG/G "acceptor" splice site
A third sequence, which in yeast is TACTAAC, is necessary within the intron sequence.
These rules are only
approximately correct.
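The two vertebrate consensus patterns can be written as regular expressions to list candidate splice sites. This is a rough sketch: the slide does not fix the length n of the pyrimidine run in the acceptor pattern, so a minimum of 4 is assumed here, and because the consensus is only approximately correct, real sites that deviate from it will be missed. The function names and the toy sequence are hypothetical.

```python
import re

# Consensus patterns from the slide with the "/" junction removed.
# Lookaheads are used so that overlapping candidates are all reported.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")
# Assumption: n in (T or C)n is not given on the slide; require at least 4.
ACCEPTOR = re.compile(r"(?=([TC]{4,}[ACGT][CT]AG)G)")


def donor_sites(dna):
    """Indices of the first intron base (the G of GT) at donor matches."""
    return [m.start() + 3 for m in DONOR.finditer(dna)]


def acceptor_sites(dna):
    """Indices of the first exon base following each acceptor match."""
    # Several start positions in one pyrimidine run give the same junction.
    return sorted({m.end(1) for m in ACCEPTOR.finditer(dna)})


# Toy sequence: exon, intron, exon (made up for illustration).
exon1, intron, exon2 = "AAACAG", "GTAAGTACGATTTTTTACAG", "GAA"
dna = exon1 + intron + exon2
d, a = donor_sites(dna)[0], acceptor_sites(dna)[0]
print(d, a, dna[:d] + dna[a:])  # junction indices and the spliced sequence
```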
Gene Prediction Software
Try GENSCAN at http://genes.mit.edu/GENSCAN.html by
using the sequence at
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=3253144
Did GENSCAN work well?