Promoter search


Regulatory sequences and promoter search

Bq. Francisco Duarte, Ph.D. Biotechnology

Contents

1. Background

2. Representation of regulatory motifs

3. Promoter search algorithms

4. Databases related to promoter search

Probability of occurrence of each nucleotide

for -10 sequence

  T    A    T    A    A    T
 77%  76%  60%  61%  56%  82%

for -35 sequence

  T    T    G    A    C    A
 69%  79%  61%  56%  54%  54%
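The frequencies above can be read as a very simple position-specific model. The sketch below scores a candidate hexamer against the -10 or -35 consensus. The slide only gives the frequency of the consensus base at each position, so the code assumes that the three non-consensus bases share the remaining probability equally. That assumption, and the names `MINUS_10`, `MINUS_35` and `hexamer_log_score`, are illustrative and not part of the original material.

```python
import math

# Consensus base and its observed frequency at each position (from the slide).
MINUS_10 = [("T", 0.77), ("A", 0.76), ("T", 0.60), ("A", 0.61), ("A", 0.56), ("T", 0.82)]
MINUS_35 = [("T", 0.69), ("T", 0.79), ("G", 0.61), ("A", 0.56), ("C", 0.54), ("A", 0.54)]


def consensus(motif):
    """Return the consensus hexamer of a motif."""
    return "".join(base for base, _ in motif)


def hexamer_log_score(hexamer, motif):
    """Log-probability of a hexamer under the simplified model."""
    hexamer = hexamer.upper()
    if len(hexamer) != len(motif):
        raise ValueError("hexamer and motif must have the same length")
    score = 0.0
    for base, (cons_base, p) in zip(hexamer, motif):
        # ASSUMPTION: non-consensus bases split the remaining probability equally.
        prob = p if base == cons_base else (1.0 - p) / 3.0
        score += math.log(prob)
    return score


# The consensus scores higher than a hexamer with one mismatch.
print(consensus(MINUS_10), hexamer_log_score("TATAAT", MINUS_10))
print("TATGAT", hexamer_log_score("TATGAT", MINUS_10))
```

A higher (less negative) score means a closer match to the consensus. A real position weight matrix would use the full four-base frequency table at each position, which the slide does not provide.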

TRANSFAC: structure of a eukaryotic gene

[Figure: general scheme of the hierarchical structure of the transcription regulatory regions in eukaryotic genes. Labels: contig, gene, splice variants, mRNA, regulatory elements, 5'-UTR, CDS, 3'-UTR, transcription, primary transcript, splicing, alternative exon, promoter, enhancer 1, enhancer 2, TSS, TATA box, initiator (Inr), composite element, boxes A, A', A'', B, C, D, D', E, F, G.]

What is a transcription factor?

A transcription factor is a protein that regulates transcription after nuclear translocation by specific interaction with DNA or by stoichiometric interaction with a protein that can be assembled into a sequence-specific DNA-protein complex.

http://www.gene-regulation.com/pub/databases/transfac/clSM.html

Regulatory regions

Gene regulation

• Virtually every cell in your body contains a complete set of genes

• But they are not all turned on in every tissue

• Each cell in your body expresses only a small subset of genes at any time

• During development, different cells express different sets of genes in a precisely regulated fashion

• Gene regulation occurs at the level of transcription, that is, the production of mRNA

• A given cell transcribes only a specific set of genes and not others

• For example, insulin is made by pancreatic cells

Characteristics of regulatory regions

Check:

http://www.ccg.unam.mx/Computational_Genomics/PromoterTools/

http://molbiol-tools.ca/Promoters.htm

http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter

http://www.fruitfly.org/seq_tools/promoter.html

http://linux1.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb

Central dogma

Genetic information generally flows from DNA to RNA to protein
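The DNA to RNA to protein flow can be illustrated with a minimal sketch. It assumes the standard genetic code, takes the coding (sense) strand as input, and starts translating at the first base. The function names `transcribe` and `translate` are illustrative choices, not taken from the slides.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """DNA -> mRNA: same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    mrna = mrna.upper()
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTTGGTAA")
print(mrna)             # AUGUUUUGGUAA
print(translate(mrna))  # MFW
```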

Gene regulation has been well studied in E. coli

When a bacterial cell encounters a potential food source, it manufactures the enzymes necessary to metabolize that food

Gene Regulation

In addition to sugars like glucose and lactose, E. coli cells also require amino acids. One essential amino acid is tryptophan.

When E. coli is swimming in tryptophan (milk and poultry), it absorbs the amino acid from the media. When tryptophan is not present in the media, the cell must manufacture its own.

Trp Operon

E. coli uses several proteins encoded by a cluster of 5 genes to manufacture the amino acid tryptophan.

All 5 genes are transcribed together as a unit called an operon, which produces a single long piece of mRNA for all the genes.

RNA polymerase binds to a promoter located at the beginning of the first gene and proceeds down the DNA, transcribing the genes in sequence.

Gene regulation

In addition to amino acids, E. coli cells also metabolize sugars in their environment.

In 1959 Jacques Monod and François Jacob looked at the ability of E. coli cells to digest the sugar lactose.

In the presence of the sugar lactose, E. coli makes an enzyme called beta-galactosidase.

Beta-galactosidase breaks down the sugar lactose so that E. coli can digest it for food.

It is the lac Z gene in E. coli that codes for the enzyme beta-galactosidase.

Lac Z Gene

The tryptophan genes are turned on when there is no tryptophan in the media. That is when the cell needs to make its own tryptophan.

E. coli cells cannot make the sugar lactose. They can only have lactose when it is present in their environment. Then they turn on genes to break down lactose.

E. coli only needs beta-galactosidase if there is lactose in the environment to digest. There is no point in making the enzyme if there is no lactose to break down.

It is the combination of the promoter and the adjacent regulatory DNA that determines when a gene will be transcribed.

This combination of a promoter and the genes it controls is called an OPERON

THE OPERON

An operon is a cluster of genes encoding related enzymes that are regulated together

An operon consists of:

• a promoter site where RNA polymerase binds and begins transcribing the message

• a region that makes a repressor

The repressor sits on the DNA at a spot between the promoter and the gene to be transcribed.

This site is called the operator.

LAC Z GENE

• E. coli regulates the production of beta-galactosidase by using a regulatory protein called a repressor

• The repressor binds to the lac Z gene at a site between the promoter and the start of the coding sequence

• The site the repressor binds to is called the operator

LAC Z GENE

• Normally the repressor sits on the operator, repressing transcription of the lac Z gene

• In the presence of lactose the repressor binds to the sugar and releases the operator, which allows the polymerase to move down the lac Z gene

LAC Z GENE

This results in the production of beta-galactosidase, which breaks down the sugar.

When there is no sugar left, the repressor returns to its spot on the chromosome and stops the transcription of the lac Z gene.
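The on/off behaviour described on the lac Z slides can be sketched as a toy simulation. This is only a qualitative illustration: the function name `simulate_lac_operon`, the discrete time steps, and the rule that one unit of lactose is consumed per step while the gene is transcribed are all assumptions made for the example, not measured kinetics.

```python
def simulate_lac_operon(lactose_units, steps):
    """Toy model of lac Z regulation by the repressor.

    Returns a list of (repressor_on_operator, transcribing, lactose_left)
    tuples, one per time step.
    """
    history = []
    for _ in range(steps):
        # Without lactose the repressor sits on the operator.
        repressor_on_operator = lactose_units == 0
        # The polymerase can only proceed when the operator is free.
        transcribing = not repressor_on_operator
        if transcribing:
            # ASSUMPTION: beta-galactosidase removes one lactose unit per step.
            lactose_units -= 1
        history.append((repressor_on_operator, transcribing, lactose_units))
    return history


for state in simulate_lac_operon(lactose_units=2, steps=4):
    print(state)
```

With two units of lactose the gene is transcribed for two steps, after which the repressor returns to the operator and transcription stops.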

Mechanism: operon switched off

GENE REGULATION

• In eukaryotic organisms like ourselves there are several methods of regulating protein production

• Most regulatory sequences are found upstream from the promoter

• Genes are controlled by regulatory elements in the promoter region that act like on/off switches or dimmer switches

GENE REGULATION

• Specific transcription factors bind to these regulatory elements and regulate transcription

• Regulatory elements may be tissue specific and will activate their gene only in one kind of tissue

• Sometimes the expression of a gene requires the function of two or more different regulatory elements

INTRONS AND EXONS

• Eukaryotic DNA differs from prokaryotic DNA in that the coding sequences along the gene are interspersed with noncoding sequences.

• The coding sequences are called

– EXONS

• The non coding sequences are called

– INTRONS

INTRONS AND EXONS

• After the initial transcript is produced, the introns are spliced out to form the completed message, ready for translation

• Introns can be very large and numerous, so some genes are much bigger than the final processed mRNA

INTRONS AND EXONS

• Muscular dystrophy

• DMD gene is about 2.5 million base pairs long

• Has more than 70 introns

• The final mRNA is only about 17,000 base pairs long

RNA Splicing

• Provides a point where the expression of a gene can be controlled

• Exons can be spliced together in different ways

• This allows a variety of different polypeptides to be assembled from the same gene

• Alternative splicing is common in insects and vertebrates, where 2 or 3 different proteins can be produced from one gene
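Splicing and alternative splicing can be sketched as joining exon intervals of a primary transcript. The sequence and exon coordinates below are a made-up toy example (exons in upper case, introns in lower case), and `splice` is a hypothetical helper name.

```python
def splice(pre_mrna, exons, keep=None):
    """Join exons of a primary transcript.

    exons: list of (start, end) pairs, 0-based and end-exclusive.
    keep:  indices of the exons to retain, in order (default: all exons).
    """
    if keep is None:
        keep = range(len(exons))
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


# Hypothetical transcript: three exons separated by two introns.
pre_mrna = "AAAAgtccagCCCCgtttagGGGG"
exons = [(0, 4), (10, 14), (20, 24)]

print(splice(pre_mrna, exons))               # all exons: AAAACCCCGGGG
print(splice(pre_mrna, exons, keep=[0, 2]))  # exon 2 skipped: AAAAGGGG
```

Choosing a different `keep` list yields a different mature message from the same gene, which is the idea behind producing several polypeptides from one gene.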

Protein domains in regulator sequences

TFBS: transcription factor binding sites

Motif representations: from alignments to motifs

Transcription factors

[Figure: sequence-specific DNA-binding and non-DNA-binding transcription factors. Labels: TF1, TF2, TF3, TF4, adapter, co-activator, HAT, DNA; Layer I, Layer II, Layer III.]

Structure of transcription factors

[Figure: USF-1, dimer. Labels: DNA-binding domain, activation domain, oligomerization domain, ligand-binding domain, protein-protein interaction domain.]

N    Gene                               Factors          Schema and positions of a CE   TRANSCompel accession number

1.   Scavenger receptor, Homo sapiens   AP-1, Ets        Enhancer -4500/-4100           C00080
2.   GM-CSF, Mus musculus               AP-1, Ets        -53, -40                       C00081
3.   Collagenase, Homo sapiens          AP-1, Ets        -89, -82, -72, -66             C00083
4.   IgH, Mus musculus                  AP-1, Ets        Enhancer at 3' flank           C00133
5.   Interleukin 2, Homo sapiens        AP-1, NFAT       -283, -268                     C00109
6.   Interleukin 2, Homo sapiens        AP-1, NF-κB      -167, -142                     C00165
7.   2, Mus musculus                    AP-1, Oct-2      -167, -142                     C00158
8.   IgH, Homo sapiens                  Ets, CBF         (none given)                   C00173
9.   A1, Rattus norvegicus              NF-κB, C/EBP     -117, -73                      C00101
10.  IRF-1, Mus musculus                NF-κB, STAT-1    -123, -113, -49, -40           C00192

Ternary complex NFATp - AP1 - DNA

[Figure: three panels showing factors F1 and F2. Panel labels: synergistic activation of transcription; low level of transcription; low level of transcription.]

Composite elements

Minimal functional units where both protein-DNA and protein-protein

interactions contribute to a highly specific pattern of gene expression

and provide cross-coupling of different signal transduction pathways.

[Figure: integration of signals and cross-coupling of signal transduction pathways. Labels: membrane receptor, Src (SH2, SH3), Ras (GDP, GTP), adaptors, PLC, PI3-K, PKB/Akt, phosphorylation (P), IP3, Ca2+, Ca2+-dependent channel, calcineurin, ERK, JNK, p38 MAPK, NFATp, c-Fos, c-Jun, ATF-2, IL-2, composite element; compartments: cytoplasm, nucleus.]

Integration of signals. Cross-coupling of signal transduction pathways

[Figure: composite elements (CE) in the mouse IL-4 promoter and in the Homo sapiens GM-CSF promoter with its T-cell specific inducible enhancer at -3500 bp. Labels: AP-1, NFAT, HMG Y(I), STAT 6, NF-Y, c-MAF, NF-κB p50/p65, NF-κB c-Rel/p65, CBF, CD28 response element, TATA, TATTT, ST, +1; marked positions: -249, -180, -150, -114, -88, -60, -54, -28.]

Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.

Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

Enhanceosome

[Figure labels: site-specific TF, co-activator, p300/CBP, PCAF, acetylase, acetylation, closed nucleosomes, TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA pol II.]

Databases on gene regulation

http://regulondb.ccg.unam.mx/

Find the .gbk file and the 100 base pairs upstream
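Extracting the 100 bp upstream of a gene can be sketched in plain Python once the genome sequence and the gene coordinates are known. In practice both would come from the .gbk file (for instance parsed with Biopython's SeqIO), a step that is omitted here. The function name `upstream_region`, the 0-based end-exclusive coordinate convention, and the short test genome are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=100):
    """Return up to `length` bases upstream of a gene.

    start, end: gene coordinates on the forward strand (0-based, end-exclusive).
    strand:     +1 for the forward strand, -1 for the reverse strand.
    The result is always written 5'->3' on the strand of the gene.
    """
    if strand == 1:
        # Upstream lies before the gene start; clip at the sequence boundary.
        return genome[max(0, start - length):start]
    if strand == -1:
        # For a reverse-strand gene, upstream lies after `end` on the forward strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be +1 or -1")


# Hypothetical mini-genome, with a gene at positions 5..8.
genome = "ACGTTGCAAGGCTA"
print(upstream_region(genome, 5, 8, strand=1, length=4))   # CGTT
print(upstream_region(genome, 5, 8, strand=-1, length=4))  # GCCT
```

The sketch treats the genome as linear; a circular bacterial chromosome would need the region to wrap around the origin.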

Exercise

BLASTp against NR to find probable orthologs

>malE - 100 bases upstream

aaagaactacctgaatttcgagattaggcctt

gatcgcgccggggtgaaagcgttatact

gacgcgcaaacgtttgcgcaatttgggcacag

agggggtt

>malE - 100 bases upstream

aggaggatggaaagaggatgtcatagaaagaa

actaaagaccgttaagcgacctctgcgt

atccacgagcaatatacacaaatggaaaagga

cgggttat
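As a first pass before using the web tools, the two upstream sequences of the exercise can be scanned for the window that best matches a consensus hexamer. The sketch below counts mismatches against TATAAT (the -10 consensus) by default. It is a naive scan and not the method used by any of the listed tools; the names `best_match`, `upstream_1` and `upstream_2` are illustrative.

```python
def best_match(sequence, consensus="TATAAT"):
    """Return (position, window, mismatches) for the window closest to the
    consensus; the first window wins on ties. Returns None if the sequence
    is shorter than the consensus."""
    sequence = sequence.upper()
    best = None
    for i in range(len(sequence) - len(consensus) + 1):
        window = sequence[i:i + len(consensus)]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if best is None or mismatches < best[2]:
            best = (i, window, mismatches)
    return best


# The two 100-base upstream sequences from the exercise.
upstream_1 = (
    "aaagaactacctgaatttcgagattaggcctt"
    "gatcgcgccggggtgaaagcgttatact"
    "gacgcgcaaacgtttgcgcaatttgggcacag"
    "agggggtt"
)
upstream_2 = (
    "aggaggatggaaagaggatgtcatagaaagaa"
    "actaaagaccgttaagcgacctctgcgt"
    "atccacgagcaatatacacaaatggaaaagga"
    "cgggttat"
)

for name, seq in (("sequence 1", upstream_1), ("sequence 2", upstream_2)):
    print(name, "-10:", best_match(seq, "TATAAT"), "-35:", best_match(seq, "TTGACA"))
```

Positions are 0-based from the start of the 100-base region. A fuller search would also check the spacing between the -35 and -10 hits, which this sketch does not do.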

http://molbiol-tools.ca/Promoters.htm

http://www.prodoric.de/vfp/vfp_promoter.php

http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter