Highly Conserved Non-Coding Sequences are Associated with Vertebrate Development

PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.

Yvonne Li, paper presentation for MEDG505

Jan 27, 2005

Outline

Motivation

Method and Results

Discussion

Motivation
Gene regulatory networks for development have been described in invertebrates but not characterized for vertebrates.

Studies have shown that:
• a number of developmental genes are regulated by highly conserved enhancer regions at distances of hundreds of kb
• ultra-conserved elements are more frequent than expected by chance
• there is a significant association between these highly conserved elements and genes encoding DNA-binding proteins

Goal: look for all such elements in the entire human genome and see how they relate to development.

Method

Computationally identify

Computationally analyze

Experimentally validate

Sequence Data
CNE: highly Conserved Non-coding Element

Which two species should be used for whole-genome alignment?

Identifying

Sequence Data
Which two species should be used for whole-genome alignment? Human and Fugu:
• Fugu has 1/8 the genome size of human but a similar gene repertoire.
• Fugu's developmental blueprint is very similar to that of human.

Identifying

Two ways to detect CNEs:
1. Whole-genome alignment
2. Regional alignments

Obtaining CNEs (Identifying)

Stats:
• Core set of 1373 elements
• Length: average 199 bp, maximum 736 bp
• Identity: average 84%, maximum 98%
• 1365 are conserved in mouse, 1316 in rat, 1310 in chicken and 1093 in zebrafish

Pipeline:
1. Start with the Fugu genome assembly.
2. MegaBLAST it against the Ensembl human genome (v18.34.1).
3. Remove alignments < 100 bp in length.
4. Mask coding and non-coding RNA content.
5. Remove telomere-like sequences and transposons.
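The length and masking filters above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: hits are assumed to be hypothetical (chromosome, start, end) intervals on the human genome, and masked regions (coding exons, ncRNAs, telomere-like repeats, transposons) are assumed to be supplied as intervals, with any hit that overlaps one dropped outright. Masking the sequence itself, as the slide describes, could instead trim a hit rather than discard it.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open, 0-based.
Interval = Tuple[str, int, int]

MIN_LENGTH = 100  # alignments shorter than 100 bp are discarded (from the slide)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_alignments(hits: List[Interval], masked: List[Interval]) -> List[Interval]:
    """Keep hits that are at least MIN_LENGTH long and touch no masked region.

    Assumption: masked regions are given as intervals and a hit overlapping
    any of them is dropped entirely (no trimming).
    """
    kept = []
    for hit in hits:
        if hit[2] - hit[1] < MIN_LENGTH:
            continue  # too short
        if any(overlaps(hit, m) for m in masked):
            continue  # falls in coding/ncRNA/repeat sequence
        kept.append(hit)
    return kept


if __name__ == "__main__":
    hits = [("chr2", 1000, 1250), ("chr2", 5000, 5060), ("chr7", 300, 600)]
    masked = [("chr7", 550, 700)]  # e.g. a coding exon (made-up coordinates)
    print(filter_alignments(hits, masked))  # [('chr2', 1000, 1250)]
```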

CNE Distribution
CNEs are found on all human chromosomes except 21 and Y.

The distribution of CNEs is highly clustered. Clustering CNEs by genomic location gives 165 clusters; the 20 largest clusters each contain ≥ 20 CNEs.

Analyzing
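Clustering CNEs by genomic location can be done in several ways. The sketch below assumes simple single-linkage grouping along each chromosome: a CNE joins the current cluster if it lies within a hypothetical max_gap of the previous CNE. The slide does not give the authors' clustering rule or threshold, so max_gap and the positions used are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical CNE record: (chromosome, position).
Cne = Tuple[str, int]


def cluster_cnes(cnes: List[Cne], max_gap: int) -> List[List[Cne]]:
    """Single-linkage clustering along each chromosome.

    Consecutive CNEs (sorted by position) stay in the same cluster while the
    gap between neighbours is <= max_gap. max_gap is an assumed parameter,
    not the paper's.
    """
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in cnes:
        by_chrom.setdefault(chrom, []).append(pos)

    clusters: List[List[Cne]] = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [(chrom, positions[0])]
        for prev, pos in zip(positions, positions[1:]):
            if pos - prev <= max_gap:
                current.append((chrom, pos))  # close enough: extend cluster
            else:
                clusters.append(current)      # gap too large: start a new one
                current = [(chrom, pos)]
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Made-up positions for illustration only.
    cnes = [("chr13", 100_000), ("chr13", 180_000), ("chr13", 2_000_000), ("chr2", 50_000)]
    for cluster in cluster_cnes(cnes, max_gap=300_000):
        print(cluster)
```

On real data, filtering the result with `len(cluster) >= 20` would give the analogue of the "largest clusters" count on the slide.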

CNE-associated genes
For each CNE, extract the closest gene from Ensembl, then find the most statistically over-represented GO terms among these genes. 12 of the 13 terms relate to transcriptional regulation and development ("transdev" genes).
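One common way to score GO-term over-representation is a one-sided hypergeometric test. The slide does not say which statistic the authors used, so treat this as an assumed stand-in. Here N is the number of background genes, K the background genes annotated with the term, n the CNE-associated genes, and k the CNE-associated genes carrying the term.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Used here as an assumed enrichment test: the chance of seeing at least
    k term-annotated genes among n CNE-associated genes if genes were drawn
    at random from the background.
    """
    upper = min(K, n)
    total = comb(N, n)
    # Exact big-integer arithmetic; comb() returns 0 for impossible terms.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


if __name__ == "__main__":
    # Made-up numbers for illustration only, not from the paper.
    p = hypergeom_sf(k=30, N=20_000, K=600, n=200)
    print(f"p = {p:.3g}")
```

With these made-up numbers about 200 × 600 / 20,000 = 6 annotated genes would be expected by chance, so observing 30 yields a very small p-value; a real GO analysis would also correct for testing many terms.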

How many clusters are situated near such transdev genes? Over 93% of clusters have a transdev gene within 500 kb of their CNEs; 15% have two or more.

CNEs are generally located at large distances from the nearest gene: the average distance between a CNE and the 5' end of the closest human gene is 182 kb, with 93 CNEs more than 500 kb away and 12 CNEs more than 1 Mb away.
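Assigning each CNE to the gene whose 5' end is closest, and recording that distance, might look like the sketch below. Gene records are hypothetical (name, chromosome, start, end, strand) tuples, a CNE is reduced to a single position, and the 5' end is taken as the start coordinate on the + strand and the end coordinate on the - strand; none of these conventions come from the slide.

```python
from typing import List, Optional, Tuple

# Hypothetical gene record: (name, chromosome, start, end, strand)
Gene = Tuple[str, str, int, int, str]


def five_prime_end(gene: Gene) -> int:
    """5' end of a gene: start on the + strand, end on the - strand."""
    _, _, start, end, strand = gene
    return start if strand == "+" else end


def closest_gene(chrom: str, pos: int, genes: List[Gene]) -> Optional[Tuple[str, int]]:
    """Return (gene name, distance in bp) for the gene whose 5' end is nearest
    to a CNE at (chrom, pos), or None if no gene lies on that chromosome."""
    candidates = [g for g in genes if g[1] == chrom]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: abs(five_prime_end(g) - pos))
    return best[0], abs(five_prime_end(best) - pos)


if __name__ == "__main__":
    genes = [
        ("geneA", "chr11", 31_800_000, 31_830_000, "-"),  # made-up coordinates
        ("geneB", "chr11", 32_400_000, 32_450_000, "+"),
    ]
    print(closest_gene("chr11", 32_000_000, genes))  # ('geneA', 170000)
```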

Transdev genes are located in regions of low gene density: the average number of genes within 500 kb upstream or downstream is 16 for all human genes but only 6 for transdev genes.
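The gene-density comparison could be computed by counting genes whose start coordinate falls within 500 kb of each gene of interest, using binary search over sorted start positions. Counting by start position, and including the focal gene in its own count, are assumptions; the slide does not say how genes straddling the window edge were handled.

```python
from bisect import bisect_left, bisect_right
from typing import List

WINDOW = 500_000  # 500 kb upstream or downstream, as on the slide


def genes_in_window(center: int, sorted_starts: List[int], window: int = WINDOW) -> int:
    """Count genes whose start lies in [center - window, center + window].

    sorted_starts holds the start positions of all genes on one chromosome in
    ascending order. The focal gene itself is counted if it is in the list.
    """
    lo = bisect_left(sorted_starts, center - window)
    hi = bisect_right(sorted_starts, center + window)
    return hi - lo


def mean_density(centers: List[int], sorted_starts: List[int]) -> float:
    """Average neighbourhood gene count over a non-empty set of genes of interest."""
    return sum(genes_in_window(c, sorted_starts) for c in centers) / len(centers)
```

Calling `mean_density` once with every gene start as a center and once with only the transdev gene starts would give the two averages the slide compares.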

Analyzing

Obtaining rCNEs (regional CNEs)
Use MLAGAN (localized multiple alignment) to identify additional conserved sequences around specific genes.

MLAGAN is more sensitive than the whole-genome alignment:
• it uses more species: human, Fugu, mouse and rat
• the algorithm itself is more sensitive
• it requires only a 40 bp window with 60% identity
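The 40 bp / 60% identity criterion is a sliding-window rule of the kind used in VISTA-style conservation plots. The sketch below applies it to one pairwise alignment given as two gapped strings; treating gaps as mismatches and merging overlapping windows into runs are assumptions for illustration, not details from the slide.

```python
from typing import List, Tuple

WINDOW = 40          # window length in alignment columns (from the slide)
MIN_IDENTITY = 0.60  # minimum fraction of identical columns (from the slide)


def conserved_windows(seq_a: str, seq_b: str,
                      window: int = WINDOW,
                      min_identity: float = MIN_IDENTITY) -> List[Tuple[int, int]]:
    """Return merged (start, end) column ranges covered by windows whose
    identity is >= min_identity.

    seq_a and seq_b are equal-length aligned strings; '-' marks a gap and
    never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper())]
    hits: List[Tuple[int, int]] = []
    count = sum(matches[:window])  # identical columns in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide one column: add the new right column, drop the old left one.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            end = start + window
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # overlaps the previous run: extend it
            else:
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Made-up alignment: identical for 50 columns, then fully diverged.
    print(conserved_windows("A" * 100, "A" * 50 + "C" * 50))  # [(0, 66)]
```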

Chose 4 cluster regions containing different types of developmental genes: SOX21, PAX6, HLXB9 and SHH.

Sometimes, the CNEs are more conserved than the gene’s coding exons!

Identifying

[Figure: SOX21 MLAGAN alignment]

Vertebrates vs. Invertebrates
Are the CNEs also found in invertebrates?

All CNEs and rCNEs were searched against the whole-genome sequences of Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Anopheles gambiae.

No significant matches were found (even though the associated genes have clear homologs in these species).

43 CNEs show significant similarity to at least one other CNE (their associated genes have clear paralogous relationships).

Method

Computationally identify CNEs

Computationally analyze CNEs

Experimentally validate a few CNEs

Experimental Validation
Co-inject CNEs with a green fluorescent protein (GFP) reporter into zebrafish embryos.

Idea: CNEs contain something that affects the transcription of a transdev gene, and the transdev gene affects development.

Examine the ability of CNEs to up-regulate GFP reporter expression.

Experimental Validation
Chose 25 regions for the GFP assay: 10 CNEs and 15 rCNEs. Look for GFP expression in live embryos.

Controls: an average of 200 embryos screened per control; no up-regulation.

Elements: an average of 188 embryos screened per element; all but 2 elements produced GFP expression, in 4% to 44% of embryos.

SOX21-associated elements
Known:

SRY-related box gene that acts as a transcriptional repressor during early development.

Expressed in a complex pattern in the CNS, and in the nasal epithelium, the lens and retina of the eye, and the inner ear.

PAX6-associated elements
Known:

Paired-box-containing transcription factor, known to be influenced by cis-acting elements in upstream, intronic and downstream positions.

Expressed in the developing eye, forebrain, hindbrain and spinal cord.

HLXB9-associated elements
Known:

Homeobox gene; mutations cause an autosomal dominant disorder (Currarino syndrome).

The zebrafish ortholog is expressed in the notochord, hypochord, tail mesoderm and tailbud.

SHH-associated elements
Known:

A signaling molecule. The zebrafish ortholog is expressed mainly in midline structures such as the floor plate and notochord, but also in the branchial arches, pectoral fin buds and retina.

Limitations
• CNEs may be associated with the wrong gene, especially in gene-rich regions (the assay results give some indication of when this happens).
• CNEs are missed because the whole-genome analysis is stringent.
• Down-regulation of expression will not be detected.
• Elements were assayed out of context and individually; each element had cases of unexpected expression.
• Tissues made up of few cells are underrepresented, and tissues or cell types that develop after 24 h are missed completely.

Summary
• Identified a set of 1373 vertebrate CNEs.
• Experimentally showed a CNE-transdev gene association.
• CNEs are found in clusters around transdev genes.
• CNEs act at large distances from the coding sequence.
• The relative order and positions of CNEs are conserved.
• No vertebrate CNEs were found in invertebrates, even though the genes have clear homologs.

Many of these results are paralleled by a similar paper (Sandelin et al. 2004): ultraconserved regions (UCRs) defined as >50 bp with >95% human/mouse identity; 3583 human/mouse/pufferfish UCRs; average length 125 bp.

Discussion
Almost all CNEs are associated with developmental regulators. Do most transdev genes have CNEs associated with them?

CNEs act at large distances from the gene; they could be enhancers or silencers.

The relative positioning and order of CNEs are completely conserved. Do they play a role in structuring the genomic architecture around transdev genes?

Discussion
No vertebrate CNEs are found in invertebrates. Are there CNEs in invertebrates at all?

But PAX6 in Drosophila has been shown to have a highly effective LE9 enhancer that is also well conserved in vertebrates (The Interactive Fly).

Why was it not found in this analysis? It is only 52 bp long! (But MLAGAN should have found it.)

So maybe invertebrate enhancers/CNEs are shorter, and we should look for shorter CNEs in vertebrates.

Discussion

Whole-genome CNEs are missed due to the stringency of the parameters. Try discontiguous MegaBLAST, which does not require an exact word match of 20.
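The toy sketch below shows why a spaced ("discontiguous") seed can find hits that an exact 20-mer word cannot. Two aligned sequences differ at every third position (a codon-wobble-like pattern), so no run of 20 identical bases exists, yet a spaced template that ignores every third position still matches. The template is a made-up illustration, not one of the templates discontiguous MegaBLAST actually uses, and real seeding hashes words from unaligned sequences rather than comparing a pre-aligned pair column by column.

```python
def has_contiguous_seed(a: str, b: str, word: int) -> bool:
    """True if the aligned strings share `word` identical consecutive columns."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= word:
            return True
    return False


def has_spaced_seed(a: str, b: str, template: str) -> bool:
    """True if, at some offset, a and b agree at every '1' of the template.

    '0' positions are "don't care" and may mismatch.
    """
    care = [i for i, t in enumerate(template) if t == "1"]
    span = len(template)
    for off in range(len(a) - span + 1):
        if all(a[off + i] == b[off + i] for i in care):
            return True
    return False


if __name__ == "__main__":
    a = "ACG" * 20
    b = "ACT" * 20          # every third base differs
    template = "110" * 7    # hypothetical spaced seed: 14 care positions over 21 columns
    print(has_contiguous_seed(a, b, word=20))  # False
    print(has_spaced_seed(a, b, template))     # True
```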

Only 109 of the 256 non-coding ultraconserved regions from Bejerano et al. are identified.

Discussion
What is in the CNE?

• Modules of transcription factor binding sites? These would hardly account for the high level of conservation. Perform assays on portions of the CNEs; use computational methods.
• Regulatory RNAs (e.g. microRNAs)? There is a lack of EST evidence. Use regulatory RNA gene finders?
• Something else entirely?

One thing everyone agrees on: more functional studies are needed.

Discussion
Do CNEs work together? How can combinations of elements be tested robustly?

Mutations in CNEs can cause human disease: studies are showing that mutations in CNEs cause disorders, and CNEs at very distal locations can still affect transcription.

CNEs may therefore be candidates for genetic screens seeking sequence variation associated with disease.

Check it out with dbSNP!
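A first pass at checking CNEs against dbSNP amounts to intersecting known variant positions with CNE coordinates. The sketch below shows only that intersection step, using binary search over sorted, non-overlapping CNE intervals. The input structures and rs identifiers are hypothetical, loading dbSNP data (for example from its VCF downloads) is not shown, and both datasets would need to use the same genome assembly.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical inputs: CNE intervals per chromosome as sorted, non-overlapping,
# half-open (start, end) pairs, and SNPs as (rs_id, chromosome, position).
Cnes = Dict[str, List[Tuple[int, int]]]
Snp = Tuple[str, str, int]


def snps_in_cnes(snps: List[Snp], cnes: Cnes) -> List[Snp]:
    """Return the SNPs whose position falls inside some CNE interval."""
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in cnes.items()}
    hits = []
    for snp in snps:
        _, chrom, pos = snp
        ivs = cnes.get(chrom)
        if not ivs:
            continue  # no CNEs on this chromosome
        # Index of the last interval that starts at or before pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            hits.append(snp)
    return hits


if __name__ == "__main__":
    cnes = {"chr7": [(100, 300), (1000, 1200)]}  # made-up coordinates
    snps = [("rsA", "chr7", 150), ("rsB", "chr7", 500), ("rsC", "chr7", 1199)]
    print(snps_in_cnes(snps, cnes))  # [('rsA', 'chr7', 150), ('rsC', 'chr7', 1199)]
```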

References & Acknowledgements

Thanks to Misha Bilenky for lots of fun discussion.

Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJ, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.

Elgar, G. Identification and analysis of cis-regulatory elements in development using comparative genomics with the pufferfish, Fugu rubripes. Semin Cell Dev Biol. 2004 Dec;15(6):715-9.

Venkatesh B, Yap WH. Comparative genomics using fugu: a tool for the identification of conserved vertebrate cis-regulatory elements. Bioessays. 2005 Jan;27(1):100-7.

Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004 Dec 21;5(1):99.

The Interactive Fly. http://www.sdbonline.org/fly/aimain/1aahome.htm