Gene architecture and sequence annotation Week 2

Gene architecture and sequence annotationandrew-michaelson.com/fweb/lab_website/Bio345/Week-2.pdf · between the two types of Nucleic Acid Sequences 1) Genomic—the sequence of nucleotides


Page 1

Gene architecture and

sequence annotation

Week 2

Page 2

Last week:

1) How to search genomic databases such as NCBI and Ensembl

2) How to obtain sequence files

Page 3

Sequence of the

Cystic Fibrosis

Gene: CFTR

This week we will learn to identify genetic

architecture within sequence files

Page 4

This week we will learn the differences

between the two types of Nucleic Acid

Sequences

1) Genomic—the sequence of nucleotides

on a chromosome

2) Expressed sequences—the sequence

of nucleotides in mRNA/cDNA

Page 5

DNA → RNA → protein

Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

The expression of genomic

information

Page 6

DNA (genome) → RNA (transcriptome) → protein (proteome)

Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

Page 7

[Figure: DNA → RNA → protein → phenotype, labeled with data sources: genomic DNA databases; cDNA, ESTs and UniGene; protein sequence databases]

Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

Page 8

Learning Objectives:

Understand sequence differences between genomic

and expressed sequences

Use programs to determine the correct open reading

frame (ORF) of an expressed sequence

Annotate sequence files

Page 9

Genomic DNA is one source

of nucleic acid sequence

Strachan, T. & Read, A.P. Human Molecular Genetics. (New York; Wiley-Liss, 1999).

Page 10

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

The chemical properties of DNA are

important for sequence analysis

Page 11

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

DNA is composed of two anti-parallel strands

5’ is the beginning of the sequence and 3’ is

the end of the sequence

DNA sequence is always written with 5’ at the

left side and 3’ at the right side

Page 12

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).


Strand 1: 5’ GAT…

Page 13

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).


Strand 1: 5’ GAT…

Strand 2: 5’ AGT…

Page 14

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

DNA has strict base pairing rules that determine

the sequence of the complementary strand
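As a minimal sketch of the base pairing rules (A pairs with T, G pairs with C), the Python below derives the complementary strand and checks whether two strands, both written 5’ to 3’, can pair with each other. The function names and the example sequence GATTACA are illustrative assumptions and are not taken from the slides.

```python
# Watson-Crick base pairing rules for DNA.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement of seq.

    If seq is written 5'->3', the result reads 3'->5'.
    """
    return "".join(PAIRS[base] for base in seq.upper())


def are_complementary(strand1, strand2):
    """Both strands are given 5'->3'. They pair if strand2, read
    backwards (3'->5'), is the complement of strand1."""
    return complement(strand1) == strand2.upper()[::-1]


print(complement("GATTACA"))                    # CTAATGT
print(are_complementary("GATTACA", "TGTAATC"))  # True
```

Because the strands are anti-parallel, the second strand has to be reversed before it is compared base by base.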

Page 15

DNA → RNA → protein

Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

Transcription is the process of making

RNA from a DNA template

Page 16

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

During transcription an RNA molecule is

synthesized from genomic DNA

Page 17

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

RNA polymerase adds bases to the 3’ end

of the growing RNA molecule

Page 18

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

The rules of complementary base pairing are followed during transcription

During transcription, uracil (U) is incorporated instead of thymine (T). Uracil base pairs with adenine.

In bioinformatics we ignore this fact: every U is written as T.
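The U-to-T convention can be sketched in a couple of lines of Python. The function names and the example sequence are hypothetical and serve only to illustrate the notation change.

```python
def to_sequence_file_notation(rna):
    """Write an RNA sequence the way sequence files do: U becomes T."""
    return rna.upper().replace("U", "T")


def to_rna_notation(seq):
    """Convert sequence-file notation back to RNA letters: T becomes U."""
    return seq.upper().replace("T", "U")


print(to_sequence_file_notation("AUGGUCUUA"))  # ATGGTCTTA
print(to_rna_notation("ATGGTCTTA"))            # AUGGUCUUA
```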

Page 19

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

Template strand = antisense strand

The template strand is anti-parallel to the growing mRNA molecule

[Figure: the 5’ and 3’ ends of the template strand and of the mRNA are labeled]

Page 20

Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

Template strand = antisense strand

Non-template strand = sense strand. This strand has the same sequence as the mRNA molecule

The template strand is anti-parallel to the growing mRNA molecule

[Figure: the 5’ and 3’ ends of both DNA strands and of the mRNA are labeled]
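The relationship between the template strand, the sense strand and the mRNA can be sketched as follows. This assumes every sequence is written 5’ to 3’; the six-base template is a made-up example and the function names are illustrative.

```python
# Translation table implementing the base pairing rules.
PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse so the result reads 5'->3'."""
    return seq.upper().translate(PAIRS)[::-1]


def transcribe(template_5to3):
    """Return the mRNA made from a template (antisense) strand.

    The mRNA is anti-parallel and complementary to the template, so it
    has the same sequence as the sense strand, with U in place of T.
    """
    sense = reverse_complement(template_5to3)
    return sense.replace("T", "U")


template = "GACCAT"                  # hypothetical template strand, 5'->3'
print(reverse_complement(template))  # ATGGTC  (sense strand)
print(transcribe(template))          # AUGGUC  (mRNA)
```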

Page 21

Genes can be found on both strands of a chromosome

[Figure: forward strand and reverse strand, each labeled at its 5’ end]

Page 22

The original RNA molecule undergoes

processing that changes the sequence

Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

Page 23

The original RNA molecule is processed

Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

Exons are segments

of DNA that are found

in mature mRNA

Page 24

The original RNA molecule is processed

Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

Introns are segments of the gene that are transcribed but then removed from the RNA by splicing. They are not found in mature mRNA

Page 25

The original RNA molecule is processed

Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

The sequence in red is

the coding sequence

(often abbreviated

CDS)


Page 27

In the mRNA the exons are joined together

as one continuous sequence

Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
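Joining exons into one continuous sequence can be sketched as below. The genomic sequence, the exon coordinates (1-based, inclusive) and the function name are all invented for illustration; they are not KRAS or CFTR data.

```python
def splice(genomic, exons):
    """Join exon segments of a genomic sequence into one continuous
    sequence. Exons are (start, end) pairs, 1-based and inclusive."""
    return "".join(genomic[start - 1:end] for start, end in exons)


# Toy gene: two exons separated by one intron.
exon1 = "ATGGCC"           # positions 1-6
intron1 = "GTAAGTCCTTAG"   # positions 7-18
exon2 = "GATTAA"           # positions 19-24
genomic = exon1 + intron1 + exon2

print(splice(genomic, [(1, 6), (19, 24)]))  # ATGGCCGATTAA
```

The intron bases never appear in the output, which is what distinguishes an expressed sequence from the genomic sequence it came from.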

Page 28

Translation is the process by which an

mRNA molecule is used to make a protein

+1 is the first translated nucleotide (usually the A of the start codon ATG; ATG = methionine)

Page 29

Translation is the process by which an

mRNA molecule is used to make a protein

The red indicates all the sequence within

the mRNA that will be used during

translation to code for protein

Page 30

The sequences within an mRNA that do

not directly code for protein are called

Untranslated Regions

5’ UTR: untranslated region before the start codon; does not code for protein

3’ UTR: untranslated region after the stop codon; does not code for protein
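Given the coding sequence coordinates from an annotation, an mRNA can be divided into its three regions. The sketch below assumes 1-based, inclusive CDS coordinates; the toy sequence and the function name are made up for this example.

```python
def split_mrna(mrna, cds_start, cds_end):
    """Split an mRNA into (5' UTR, CDS, 3' UTR).

    cds_start and cds_end are 1-based, inclusive positions of the first
    base of the start codon and the last base of the stop codon.
    """
    five_utr = mrna[:cds_start - 1]
    cds = mrna[cds_start - 1:cds_end]
    three_utr = mrna[cds_end:]
    return five_utr, cds, three_utr


# Toy mRNA: 6-base 5' UTR, 9-base CDS (ATG ... TAA), 6-base 3' UTR.
mrna = "GGCACC" + "ATGGTCTAA" + "GCTTTT"
print(split_mrna(mrna, 7, 15))  # ('GGCACC', 'ATGGTCTAA', 'GCTTTT')
```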

Page 31

mRNA is converted to cDNA using reverse

transcription

Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

Page 32

Because it is cDNA, not mRNA, that is sequenced, we use T, not U, in sequence files

Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

Page 33

How do we identify introns/exons in our

sequence files?

Page 34

We will use KRAS as an example

Page 35

The KRAS gene produces 4 transcripts

(splice variants)

Transcript

Table

Page 36

This is the transcript diagram for this gene

region

Page 37

The Transcript Diagram shows the organization

of the transcripts generated from the gene locus

Page 38

Use the link under the “Transcript ID” column to identify the exons and introns in a specific transcript

Page 39

The exon/intron map for a specific transcript

The lines are intronic sequence

Page 40

The exon/intron map for a specific transcript

The lines are intronic sequence

Bars are exonic sequence: filled bars

mean coding sequence and unfilled bars

are UTR sequence

Page 41

The exon/intron map for a specific transcript

The number of introns is always the number of exons minus 1: 5 exons means 4 introns
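The exons-minus-one rule follows from introns being the gaps between consecutive exons. A minimal sketch, using invented exon coordinates (1-based, inclusive) and an illustrative function name:

```python
def introns_from_exons(exons):
    """Return intron (start, end) pairs lying between consecutive exons.

    Exons are (start, end) pairs, 1-based and inclusive, on one transcript.
    """
    exons = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Hypothetical transcript with 5 exons.
exons = [(1, 120), (500, 620), (1400, 1560), (3000, 3120), (4500, 5000)]
introns = introns_from_exons(exons)
print(len(exons), len(introns))  # 5 4
print(introns)  # [(121, 499), (621, 1399), (1561, 2999), (3121, 4499)]
```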

Page 42

The RefSeq link will direct you to the NCBI

nucleotide record for that gene

Page 43

NCBI nucleotide record

Page 44

NCBI nucleotide record continued

Page 45

NCBI nucleotide record also contains the

sequence

Page 46


Every nucleotide within the sequence has

an exact position

Each nucleotide has a number associated

with its position
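Sequence records number nucleotides starting from 1, while Python strings are indexed from 0, so a small conversion is needed when looking up positions in code. The helper names and the short example sequence below are assumptions made for illustration.

```python
def base_at(seq, position):
    """Return the nucleotide at a 1-based position."""
    return seq[position - 1]


def subsequence(seq, start, end):
    """Return the nucleotides from start to end, 1-based and inclusive."""
    return seq[start - 1:end]


seq = "CGCATGGTCTTACGC"
print(base_at(seq, 4))         # A
print(subsequence(seq, 4, 6))  # ATG
```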

Page 47

NCBI nucleotide contains the annotation of

the sequence

Page 48

The numbers refer to nucleotide positions

Page 49

Viewing features within the

sequence file

Page 50

Once you select a sequence feature, the nucleotide sequence of the feature becomes highlighted

Page 51

CDS stands for coding sequence and this

will also show you the translation of the

nucleotide sequence into amino acid

sequence

Page 52

DNA → RNA → protein

Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

The genetic code

Page 53

The genetic code is based on three

nucleotides “coding” for one amino acid

Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).

Codons

Amino acid
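The triplet code can be sketched as a lookup table from codons to one-letter amino acid symbols. The table below is the standard genetic code written in T-not-U notation, with `*` marking a stop codon; the variable and function names are illustrative.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from the first base; '*' marks a stop."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)
    )


print(CODON_TABLE["ATG"])         # M
print(translate("ATGGTCTTATAA"))  # MVL*
```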

Page 54

An Open Reading Frame (ORF)

begins with ATG and ends with TAA,

TAG or TGA

Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).
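The ORF definition above (ATG, then whole codons, then TAA, TAG or TGA) can be sketched as a simple scan. This looks only at the strand as given and returns the first complete ORF it finds; the function name and the short test sequence are invented for illustration, and this is not how ORF-finder itself is implemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq):
    """Return the first ATG...stop stretch found, or None.

    Each ATG is tried in turn; codons are read in frame from that ATG
    until a stop codon is reached.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


print(first_orf("CCATGAAATGA"))  # ATGAAATGA
print(first_orf("ATGAAA"))       # None (no in-frame stop codon)
```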

Page 55

To find the coding sequence you must

identify the start and stop codons within the

sequence

Page 56

Which start codon is right?

Page 57

Which start codon is right?

The correct ORF is usually the longest translated sequence

Page 58

Any sequence has 6 possible

reading frames

Two strands of DNA

Triplet code (three

nucleotides in a codon)

Page 59

Any sequence has 6 possible

reading frames

5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’

5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1

5’ C GCA TGG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTT AA 3’ FRAME +2

5’ CG CAT GGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TTA A 3’ FRAME +3
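The three forward frames shown above can be reproduced with a short Python sketch that shifts the starting position by 0, 1 or 2 bases and groups the rest into triplets. The function name `split_frame` is an illustrative assumption.

```python
def split_frame(seq, offset):
    """Group seq into codons starting at the given offset (0, 1 or 2).

    Leftover bases before the first codon and after the last complete
    codon are kept as shorter groups, as in the layout above.
    """
    parts = []
    if offset:
        parts.append(seq[:offset])
    for i in range(offset, len(seq), 3):
        parts.append(seq[i:i + 3])
    return " ".join(parts)


seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
for offset in range(3):
    print(f"FRAME +{offset + 1}: {split_frame(seq, offset)}")
```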

Page 60

The next three reading frames are based

on the reverse complement sequence

5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’

3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence

5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement

Page 61

Generating the reverse complement

sequence

5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’

3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence

5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
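The two steps shown above (complement each base, then reverse the result so it reads 5’ to 3’) can be sketched in Python as follows. The function names are illustrative; the sequence is the one from the slide.

```python
# Translation table implementing the base pairing rules.
PAIRS = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base; the result reads 3'->5'."""
    return seq.upper().translate(PAIRS)


def reverse_complement(seq):
    """Complement, then reverse, so the result reads 5'->3'."""
    return complement(seq)[::-1]


seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(complement(seq))          # GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT
print(reverse_complement(seq))  # TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG
```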

Page 62

The 6 possible reading frames

5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’

3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence

5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement

5’ TTA AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC CAT GCG 3’ FRAME -1

5’ T TAA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACC ATG CG 3’ FRAME -2

5’ TT AAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CCA TGC G 3’ FRAME -3

Page 63

The correct reading frame will

have the largest ORF

M V L R W S S H G S V Ter (amino acids, written N-terminus to C-terminus)

5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’

5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1

ATG (M) is the start codon: the ORF always begins with ATG

TAA, TAG or TGA are the three stop codons. They do not code for an amino acid, and the ORF always ends with one of them
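Putting the pieces together, the sketch below searches all six reading frames of the example sequence for ATG-to-stop ORFs, keeps the longest, and translates it. It is a simplified illustration under the "largest ORF" rule of thumb, not a reimplementation of ORF-finder; all function names are assumptions.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_frame(seq, offset):
    """Yield each ATG...stop ORF read in the frame starting at offset."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    start = None
    for idx, codon in enumerate(codons):
        if start is None and codon == "ATG":
            start = idx
        elif start is not None and codon in STOP_CODONS:
            yield "".join(codons[start:idx + 1])
            start = None


def longest_orf(seq):
    """Return (frame label, ORF) for the longest ORF in all six frames."""
    best = ("", "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for orf in orfs_in_frame(s, offset):
                if len(orf) > len(best[1]):
                    best = (f"{strand}{offset + 1}", orf)
    return best


def translate(orf):
    """Translate an ORF, leaving out the final stop codon."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 3, 3))


seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
frame, orf = longest_orf(seq)
print(frame, translate(orf))  # +1 MVLRWSSHGSV
```

For this sequence the only other complete ORF is a shorter one in frame -1, so frame +1 is selected.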

Page 64

Using the ORF-finder

program to identify ORFs

http://www.ncbi.nlm.nih.gov/gorf/gorf.html

Or Google “ORF-finder”

Page 65

Using ORF-finder

Page 66

Using ORF-finder

Page 67

Using ORF-finder

Page 68

Results from ORF-finder

Page 69

There are 6 possible reading

frames

Page 70

For our purposes, the largest

ORF is the correct one

Page 71

Selecting an ORF gives you

the translation

Page 72

ORFs begin with a start codon

and end with a stop codon

Page 73

ORF-finder results match with

NCBI nucleotide

Page 74

Some sequences found in the genomic DNA are removed from the mRNA

Page 75

Some sequences found in the genomic DNA are removed from the mRNA

Introns are the

sequences that

are removed

The mature mRNA

sequence contains only

exonic sequence

Page 76

An mRNA sequence includes 5’UTR,

ORF, 3’UTR

5’ UTR: untranslated region before the start codon; does not code for protein

3’ UTR: untranslated region after the stop codon; does not code for protein

Coding sequence

(red)

Page 77

There are 6 possible reading frames in a

nucleic acid sequence

Page 78

The correct ORF is usually the largest

Page 79

ORFs start with ATG and end with a stop

codon

Page 80

Worksheet