Genomics Informed Medicine
NGS approaches (Massively Parallel Sequencing), a very dynamic field.
Choice of test:
Whole Exome Sequencing (WES): only coding regions explored, 700 €/sample.
Whole Genome Sequencing (WGS): entire genome explored, approx. 10,000 €/sample; current efforts aim at 1,000 €/sample.
RNA-seq: determines sequence and levels of gene expression, 800 €/sample (can be performed for large RNAs, mRNA/lncRNA, and for small RNAs, miRNA).
ChIP-seq: determines DNA sequences bound by proteins, 700 €/sample.
Array CGH: uses arrays to detect deletions and copy number variations.
FIRST GENOME SEQUENCING: SANGER METHOD, 1.5 BILLION US $
DNA structure
[Figure: the DNA double helix. The two strands are antiparallel, each running 5′ to 3′ in opposite directions, with base pairs T = A, C = G, A = T, G = C.]
Chromosomes
In all living beings, DNA has the same structure, varying only in size.
- viruses: DNA = 10^3 to 10^4 base pairs
- bacteria: DNA = 10^6 bp, circular DNA, a single chromosome
- eukaryotes: DNA = 3 x 10^9 bp (in humans), divided among several chromosomes; the DNA is associated with proteins.
The human genome:
- 3.2 billion base pairs across 23 chromosomes;
- approximately 21,000-22,000 protein-coding genes (under 2% of the genome)
(gene = a region of DNA that contains the information needed to synthesize a messenger RNA (mRNA))
DNA replication
Flow of genetic information:
DNA (REPLICATION) → RNA (transcription) → proteins (translation)
Eukaryotes: DNA → primary RNA (transcription) → messenger RNA (maturation) → proteins (translation)
Transcription
1. Transcription of DNA into RNA: general principles
5′ A G C T ... 3′  DNA, sense (coding) strand
3′ T C G A ... 5′  DNA, antisense (non-coding) strand
5′ A G C U ... 3′  RNA
Eukaryotes: 3 polymerases:
RNA polymerase I: ribosomal RNA (5.8S, 18S, 28S)
RNA polymerase II: mRNA, microRNA
RNA polymerase III: tRNA + 5S rRNA + small nuclear RNAs
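The relationship between the sense strand, the antisense strand, and the RNA can be sketched in a few lines of Python. This is a minimal illustration; the function names are hypothetical and the example sequence is made up.

```python
# Watson-Crick pairing for DNA bases: A-T and C-G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_3to5(sense: str) -> str:
    """Antisense strand written 3'->5', base by base under the sense strand."""
    return "".join(DNA_COMPLEMENT[base] for base in sense.upper())

def antisense_5to3(sense: str) -> str:
    """Antisense strand written 5'->3' (the reverse complement)."""
    return complement_3to5(sense)[::-1]

def transcribe(sense: str) -> str:
    """The RNA has the sequence of the sense strand, with U in place of T."""
    return sense.upper().replace("T", "U")

print(complement_3to5("AGCT"))   # strand drawn under the sense strand
print(transcribe("AGCT"))        # RNA copy of the sense strand
print(antisense_5to3("AGCTTG"))  # same strand, read in its own 5'->3' direction
```

The RNA is synthesized by pairing against the antisense strand, which is why its sequence matches the sense strand.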
3. Synthesis of mRNA: eukaryotes
- Begins with the synthesis of a primary RNA
The primary RNA undergoes maturation in 3 steps, ending with the production of the mRNA:
- addition of a 5′ cap
- splicing: removal of the introns and retention of the exons
- polyadenylation of the 3′ end of the RNA
Primary RNA: exon intron exon intron exon
mRNA: 5′ cap, exon exon exon, AAAAAA (poly(A) tail)
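The three maturation steps above can be sketched as a toy string operation. The sequences, the exon coordinates, and the `m7G-` text marker standing in for the 5′ cap are all illustrative assumptions, not real gene data.

```python
def mature_mrna(primary_rna, exons, poly_a_length=6):
    """Toy model of mRNA maturation.

    exons: list of (start, end) intervals, 0-based and end-exclusive.
    Splicing keeps the exons and drops everything between them; the cap
    is represented by a text marker and the tail by a run of A's.
    """
    spliced = "".join(primary_rna[start:end] for start, end in sorted(exons))
    return "m7G-" + spliced + "A" * poly_a_length

# Hypothetical primary RNA: exon, intron, exon, intron, exon.
exon_1, intron_1 = "AUGGCC", "GUAAGUCAG"
exon_2, intron_2 = "UUUGGA", "GUACCAG"
exon_3 = "CCCUAA"
primary = exon_1 + intron_1 + exon_2 + intron_2 + exon_3
exons = [(0, 6), (15, 21), (28, 34)]

print(mature_mrna(primary, exons))
```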
3. Synthesis of mRNA: eukaryotes
The primary RNA is synthesized by RNA polymerase II, which is recruited to the transcription initiation site by the general transcription factors.
The general transcription factors form a protein complex anchored to the DNA by the TATA-box binding protein (TBP).
[Figure: TBP, the general transcription factors, and RNA pol II assembled on the DNA at the initiation site (+1), where synthesis of the primary RNA starts.]
- The set of cis sequences located near the transcription initiation site, together with the TATA or TATA-like box, constitutes the promoter of the gene.
- Other groups of cis sequences can be found at a distance from the transcription initiation site; they form enhancers or silencers, depending on the type of trans factors (activators or repressors) that they recruit.
Structure of a eukaryotic gene:
[Figure: a promoter (300-500 bp) made of cis sequences and the TATA box, followed by the exons + introns, with distal groups of cis sequences forming enhancers/silencers.]
The 3D Genome in Transcriptional Regulation
Adapted from B. Ren, Cell Stem Cell 2014 June 5; 14(6): 762–775.
How can one genome sequence give rise to so many
different cell types? The answer lies, at least
in part, in the ability of distinct cell types to express genes
at different levels and in different combinations. “Lineage-specific”
regulation of gene expression occurs at the level
of transcription.
Features of the genome beyond its primary nucleotide
sequence must contribute to the lineage-specific gene
regulation that underlies cellular identity.
However, no linear representation of the human
genome – no matter how well annotated with
functional elements – can fully capture the
molecular mechanisms responsible for lineage-
specific transcriptional regulation.
The role of non-linear interactions in transcriptional
regulation is exemplified by two fundamental
properties of metazoan enhancer function:
1) Enhancers can direct the expression of target genes
located far away in linear distance (i.e. number of
intervening base pairs).
2) The gene most heavily influenced by an enhancer is
not always the gene that is closest by linear distance.
“Long-range” regulation is possible because
enhancers are in close physical proximity to the
promoters of their target genes in vivo, despite long
stretches of intervening nucleotides.
This physical proximity allows protein complexes
bound at enhancers to interact with those bound at
promoters, thereby influencing transcription of
target genes.
To detect physical interactions like those between an enhancer and a promoter,
a series of molecular techniques based on the concept of Chromosome
Conformation Capture (3C) is used.
Chemical crosslinking secures 3D contacts between genomic loci
occurring in live cells.
This cross-linked chromatin is then isolated and digested with a
restriction enzyme.
Re-ligation is performed in extremely dilute solution, so that
only loci that were contacting each other in vivo (and thus fixed
together by crosslinking) will be ligated together.
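Once ligation products have been sequenced, each one links two genomic positions, and counting them per pair of genomic bins gives the contact matrix that the later slides call C-data. The sketch below is a minimal illustration under that assumption; the function name and the toy coordinates are hypothetical.

```python
def contact_matrix(ligation_pairs, genome_length, bin_size):
    """Count ligation products per pair of genomic bins.

    ligation_pairs: iterable of (position_a, position_b) in base pairs,
    one entry per sequenced ligation junction.
    Returns a symmetric matrix as a list of lists.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in ligation_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix

# Toy data: three junctions link the two ends of a 1,000 bp region.
pairs = [(100, 950), (120, 980), (400, 450), (900, 130)]
for row in contact_matrix(pairs, genome_length=1000, bin_size=250):
    print(row)
```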
Higher-order genome organization
B. Ren Cell Stem Cell 2014 June 5; 14(6): 762–775.
The genome is organized at many levels, beginning with
higher-order structures that are visible under
the microscope. The most fundamental unit of
higher-order genome organization is the
chromosome. Each chromosome occupies its own
sub-volume of the interphase nucleus, known as a
Chromosome Territory (CT).
CTs can be visualized by Fluorescent in Situ Hybridization
(FISH) using probe sets designed to paint entire
chromosomes. CTs are also evident in C-data, which
demonstrate a consistent preference for intra-chromosomal
over inter-chromosomal interactions.
Gene-rich regions tend to localize to the periphery of CTs,
which facilitates access to the transcriptional machinery and sharing of
this machinery between active genes on different
chromosomes.
Specific regions can shift position from the CT interior to the
CT periphery as genes in those regions become active during
development.
The position of a given CT within the nucleus is
highly stable through interphase.
Level 1: Chromosomes occupy distinct sub-
regions of the nucleus known as chromosome
territories (CTs). Individual chromosomes are
indicated by different colors.
Genomic regions at the nuclear periphery have been
studied in further detail using a method that can identify
regions that come into contact with proteins of the nuclear
lamina, a filamentous network of proteins abutting the
inner nuclear membrane.
Genomic regions that contact the nuclear lamina, which
are known as Lamin Associated Domains (LADs), are
characterized by low levels of transcriptional activity, low
gene density, and repressive histone modifications
including H3K27me3 and H3K9me.
These observations suggest a link between transcriptional
silencing and the nuclear lamina.
Chromatin organization
• Nuclear intermediate filament (IF) proteins.
• Cytoskeleton component of nucleus
• Associated with inner nuclear membrane.
• Contribute to chromatin regulation, regulate gene expression, DNA replication, DNA repair,
cell proliferation & differentiation etc.
• Developmental processes, including tissue formation and homeostasis and organogenesis
[Figure: Gene regulation during development. Labels: Transcription Factor, DNA methylation & Histone modifications, Transcription, Nuclear Lamina, Nuclear Lamins. Courtesy of J. Staerk]
The family of Lamin proteins
LMNA: Lamin A, C, Δ10 and C2
• Provide viscosity and stiffness to the nucleus.
• Involved in postnatal development.
• Lamin A/C KO mice are born apparently normal but develop growth retardation, have a muscular disease phenotype, and die.
• Mutations in LMNA are associated with ∼14 distinct human diseases.
LMNB1: Lamin B1; LMNB2: Lamin B2 and B3
• Confer elasticity to the nucleus.
• Involved in cellular processes during embryogenesis.
• Lamin B1 KO mice have major defects in the lungs and brain and die shortly after birth.
• Few disease-causing mutations have been identified in LMNB1 or LMNB2, and they are mostly embryonic-lethal.
Courtesy of J. Staerk
Lamin associated domains (LADs)
• LADs are large chromosomal domains that associate with the lamina.
• Most genes in LADs are transcriptionally inactive and enriched in repressive histone marks
such as H3K27me3 and H3K9me2, suggesting a repressive role for LADs.
• Lamina-genome interactions are widely involved in the control of gene expression programs
during lineage commitment and terminal differentiation.
[Figure: LADs in embryonic stem cells, neural precursor cells, and astrocytes. Molecular Cell, Volume 38, Issue 4, 2010, 603-613. Courtesy of J. Staerk]
The association of specific genes with the nuclear
lamina often coincides with their transcriptional
silencing during differentiation (Peric-Hupkes et al.,
2010).
Examples include the key pluripotency genes Oct4,
Nanog, and Klf4. Conversely, loss of association
with the lamina and re-positioning away from the
nuclear periphery often coincides with
transcriptional activation.
Level 2: Transcriptionally inactive regions are enriched at the
nuclear periphery where they contact the nuclear lamina (red).
Actively transcribed genes often co-localize at RNA polymerase
II transcription factories (yellow). These and other instances of
colocalization between regions with similar transcriptional
activity may provide the physical basis for the observations of A
and B compartments in C-data.
Topological domains
Chromosomes are comprised of structural units called
Topological Domains, also known as Topologically-
Associating Domains (TADs).
TADs are regions of high local contact frequency, which are
separated by sharp boundaries across which contacts are
relatively infrequent.
Mammalian genomes contain roughly 2000 TADs covering
more than 90% of the mappable genome. They vary in size from
a few hundred kilobases (kb) to several megabases, with an
average size of approximately 1 Mb. TADs are too small to be
resolved directly by current microscopy-based methods, but FISH
measurements are generally consistent with C-data.
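The definition of a TAD boundary as a position that few contacts cross suggests a simple way to look for one in a contact matrix: for each position, add up the contacts that span it. This is a simplified, insulation-style score given only as an illustration. It is not the method used in the cited review, and the toy matrix is invented.

```python
def insulation_scores(matrix, window):
    """Sum the contacts that span each candidate boundary.

    The score at position i adds up contacts between the `window` bins
    just before bin i and the `window` bins starting at bin i.
    Low scores mark positions that few contacts cross.
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        scores[i] = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
    return scores

# Toy contact matrix: two 4-bin domains, dense inside, sparse across.
toy = [[10 if a // 4 == b // 4 else 1 for b in range(8)] for a in range(8)]
scores = insulation_scores(toy, window=2)
boundary = min(scores, key=scores.get)
print(scores)
print("candidate boundary before bin", boundary)
```

In this toy case the minimum falls exactly between the two domains. Real data are noisier and need normalization before any such score is meaningful.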
TADs are a fundamental unit of genome
organization. TADs have now been described in
every mouse and human cell type in which they
have been scrutinized as well as in Drosophila
(TAD size is considerably smaller in Drosophila
at ~100 kb on average).
The boundaries between TADs are strikingly consistent
across cell types.
Roughly 50-90% of TAD boundaries overlap in pairwise
comparisons between cell types. The locations of TAD
boundaries are also highly conserved between mouse and
human, indicating that both the existence and location of
TADs have functional significance that is under selective
pressure.
TADs are not detectable during mitosis.
TADs frequently overlap with regions demarcated by other
functional annotations related to transcriptional activity including
histone modifications, replication timing, and association with the
nuclear lamina.
Transitions between compartment A and compartment B also
frequently occur at TAD boundaries. A given TAD tends to be all in
the active compartment A, or all in the inactive compartment B.
TADs in the active compartment A tend to contain a higher density of
internal interactions, as might be expected given the role of
interactions between cis-regulatory elements in transcriptional
activity.
The same TAD can be found in different compartments (i.e. A or B)
in different cell types.
Cis interactions across TAD boundaries are
infrequent. These boundaries may limit the
potential target genes of a given enhancer or,
conversely, limit the potential enhancers of a given
target gene.
Promoters and enhancers within the same TAD
often show coordinated activity.
The insertion of a reporter construct designed to act as a
regulatory sensor into different locations within the same
TAD yields highly similar patterns of reporter gene
expression in transgenic mouse embryos. Well-described
cases of long-range regulation involve a promoter and
distal enhancer that lie within the same TAD.
The HOXD gene cluster straddles the border between two
TADs, and is influenced by distal regulatory elements from
those different TADs at different stages in development.
CCCTC-binding factor (CTCF) binds three
regularly spaced repeats of the core sequence
CCCTC in the Myc promoter, hence its name
(Lobanenkov et al., Oncogene 5 (12): 1743–53, 1990).
It binds to the sequence CCGCGNGGNGGCAG.
CTCF binds to 15,000-40,000 sites in the human genome.
CTCF is an 11-zinc-finger protein that makes varied use
of its zinc fingers for DNA engagement.
Binding sites for the protein CTCF are highly
enriched at TAD boundaries.
CTCF can function as a transcriptional insulator in
certain contexts by blocking enhancer-promoter
interactions and/or preventing the spread of epigenetic
marks.
Deletion of a specific TAD boundary containing
CTCF binding sites led to an increase in interactions
between adjacent TADs.
Level 3: Topological domains, or Topologically-Associating Domains
(TADs) are regions of frequent local interactions separated by
boundaries across which interactions are less frequent. CTCF binding
sites and other sequence features (TSS, SINEs; not depicted here) are
enriched at TAD boundaries. Note that CTCF also binds within TADs.
Cohesin is often present at TAD boundaries.
Level 4: Transcriptional regulation depends on long-range
interactions between cis-regulatory elements such as enhancers (light
red) and promoters (light yellow). These cis-regulatory interactions
are facilitated by proteins including Transcription Factors (“TFs”;
blue), co-factors such as Mediator (“Med”; red) and Cohesin (purple
ring), and RNA Polymerase II (“Pol II”; yellow).
Knockdown of CTCF leads to an increase in interactions
between adjacent domains (so-called “inter-domain
interactions”), though not complete abrogation of TAD
boundaries.
Loss of Cohesin (recruited by CTCF and present at many
TAD boundaries) also leads to an increase in inter-domain
interactions.
TAD boundaries are also enriched for SINE elements and
Transcriptional Start Sites (TSSs, particularly those of so-
called “housekeeping” genes), but the requirement of these
elements for boundary activity has not been explored in as
much detail.
TAD boundaries range in size from tens of kb to more
than 100 kb. The lack of precise boundary locations
may be due in part to limited resolution of the C-
technologies used to identify TAD boundaries
(currently between ~10-40 kb).
The formation of a TAD boundary requires more than
one sequence element – for example, the combination
of several CTCF binding sites, and perhaps
housekeeping TSSs and SINEs, spread over several kb.
TOPOLOGICALLY ASSOCIATING DOMAINS
TADs and A/B compartments
A) Diagrammatic representation of
two neighboring TADs.
3C: Capturing Chromosome Conformation
Mediator, CTCF, and Cohesin are ARCHITECTURAL PROTEINS
involved in connecting cis-acting elements (promoter-enhancer,
promoter-promoter, enhancer-enhancer)
Interactions between cis-regulatory
elements direct lineage-specific
transcription
Certain contacts occur far more often than expected by
chance based on the linear distance between the loci
involved.
Interaction describes the relationship between loci that are
in contact more frequently than would be expected based
on linear distance. The term “looping” is sometimes used to
describe such interactions.
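The notion of contact "more frequent than expected based on linear distance" can be made concrete by dividing each observed contact count by the average count of all bin pairs at the same separation. The sketch below assumes a symmetric contact matrix with equally sized bins; the function names and toy numbers are illustrative.

```python
def expected_by_distance(matrix):
    """Average contact count for each bin separation d (mean of diagonal d)."""
    n = len(matrix)
    return [sum(matrix[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]

def observed_over_expected(matrix):
    """Ratio of each contact count to the mean count at the same separation."""
    expected = expected_by_distance(matrix)
    n = len(matrix)
    ratio = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            if e > 0:
                ratio[i][j] = matrix[i][j] / e
    return ratio

# Toy matrix: contacts halve with each bin of separation,
# except for one strengthened pair (bins 1 and 4).
decay = [32, 16, 8, 4, 2, 1]
toy = [[decay[abs(i - j)] for j in range(6)] for i in range(6)]
toy[1][4] = toy[4][1] = 16

oe = observed_over_expected(toy)
print(oe[1][4], oe[0][3], oe[0][1])
```

A ratio well above 1 flags a candidate interaction in the sense used here. Real analyses also need a statistical test, which this sketch omits.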
Using C-technologies and other molecular techniques it
was revealed that the promoters of active β-globin genes
interact with an upstream regulatory sequence known as
the Locus Control Region (LCR), despite more than 40 kb
of intervening sequence. These interactions were not
observed in cell types where β-globin genes are silent.
Reproducible interactions are common in mammalian
genomes, and interacting loci are highly enriched
for characteristics of cis-regulatory elements.
One recent C-study detected more than a million
interactions genome-wide between loci that are on average
separated by roughly 100 kb, including approximately
30,000 interactions between active promoters and putative
enhancers.
The vast majority of these interactions did not cross a TAD
boundary, consistent with the role of TAD boundaries in
constraining 3D interactions.
Interacting partners are not readily predicted by linear
distance. Fewer than 10% of all interactions between TSSs
and distal regions involved the closest TSS by linear
distance.
Enhancers and promoters do not interact in a simple 1:1
relationship:
-One promoter often interacts with multiple enhancers
-One enhancer often interacts with multiple promoters
-Promoters often interact with other promoters
-Enhancers often interact with other enhancers.
Cis-regulatory interactions often vary between cell types,
which is particularly true for interactions between
promoters and putative enhancers.
The presence of putative enhancer-promoter interactions is
highly correlated with a gene’s transcriptional activity.
Housekeeping genes tend to be highly expressed but not
involved in interactions with putative enhancers.
Lineage-specific genes are particularly dependent on long-
range regulatory interactions. Some broadly-expressed
genes (e.g. Myc) interact with distinct sets of enhancers in
different cell types.
Interactions between the LCR and β-globin genes are not
simply a consequence of transcription, because inhibition of
transcription by treatment with RNA polymerase II inhibitors
does not disrupt these interactions, despite a drastic reduction
in β-globin transcription.
Forced ectopic interactions between the LCR and β-globin
promoter (i.e. the creation of LCR-promoter interactions in
cells where such an interaction is not naturally present)
stimulate β-globin transcription.
Deng and colleagues (2012) created an ectopic interaction
between the β-globin promoter and LCR in the pro-erythroblast
cell line G1E, which normally neither expresses
β-globin nor displays an interaction between promoter and
LCR. Creation of this ectopic interaction caused a dramatic
increase in β-globin expression.
Genome-wide evidence on the action of Tumor Necrosis
Factor on target cells demonstrates that enhancer-promoter
interactions often exist prior to the onset of transcription
for a particular response.
As differentiation proceeds, cells gain priming interactions
for stimuli that are important at later stages of
differentiation, while losing priming interactions required
at earlier stages.
cis-regulatory interactions are
secured by TFs and architectural
proteins
Central to any discussion of cis-regulatory interactions is a
consideration of how, at the molecular level, these
interactions are established and maintained.
At the sequence level both promoters and enhancers are
composed of binding sites for TFs. Promoters bind a core
set of General Transcription Factors (GTFs), and these
GTFs in turn recruit RNA polymerase II and additional
cofactors.
The repertoire of TFs that bind at enhancers is more
contingent on the cell type in question.
The sequence-specific DNA binding factor CTCF stands
apart from other TFs with respect to genome
organization.
Regions bound by CTCF are frequently engaged in
physical interactions with themselves as well as with
other regions.
CTCF is described as a “master weaver of genome” and
as an “architectural protein”. CTCF is ubiquitously
expressed, and binds to tens of thousands of sites
throughout the genome.
Part of CTCF’s function is to establish a structural
framework that is similar between cell types. Although the
involvement of CTCF in 3D interactions is integral to
its function, the impact of CTCF binding on
transcription depends on the locus and cell type in
question.
CTCF and other TFs share the ability to recruit cofactors that are
also involved in the formation of cis-regulatory interactions.
One such cofactor is the Cohesin complex. Cohesin is well known
for its role in holding sister chromatids together until anaphase
when they are separated and migrate to opposite spindle poles.
Cohesin is commonly found at enhancers, where it acts together
with the Mediator complex to maintain physical interaction
between promoters and enhancers.
Mediator can directly interface with factors bound at enhancers
and those bound at promoters, facilitating communication
between them. Cohesin is also present at CTCF binding sites,
many of which are outside of traditional enhancers and lack
Mediator binding.
In a given cell type (including ESCs) the majority of
Cohesin binding falls into one of two categories:
1) Sites that are co-occupied by Mediator and multiple
TFs
OR
2) Sites that are co-occupied by CTCF (Kagey et al., 2010,
Yan et al., 2013, Faure et al., 2012, Hnisz et al., 2013).
80% of cis interactions involved loci bound by some
combination of Cohesin, Mediator, and/or CTCF, leading
the authors to label these factors as “architectural proteins”.
Cohesin-Mediator interactions also occurred over shorter
distances (mean <100 kb) than did Cohesin-CTCF
interactions (mean >1 Mb). Cohesin may function as a
general stabilizer of these interactions.
A complex picture emerges in which a number of trans factors,
including lineage-specific TFs, CTCF, Mediator, and
Cohesin, are involved in anchoring different types of cis-regulatory
interactions (including, but not limited to,
interactions between promoters and enhancers).
Interactions are anchored by factors that recognize DNA
in a sequence-specific manner, thereby determining
which specific loci are most likely to participate in
stable interactions.
These DNA binding factors in turn recruit cofactors
such as Cohesin and Mediator, which further promote
and stabilize the interactions.
A newly described class of non-coding RNA (ncRNA-a)
can direct the transcriptional upregulation of other
genes in cis, thus functioning analogously to classically
defined enhancer elements.
As ncRNA-a are transcribed, they engage in physical
interactions with their target promoters, and these
interactions are dependent on the recruitment of Mediator
by the nascent ncRNA-a. Like TFs, ncRNA-a can anchor
cis-regulatory interactions and recruit cofactors to further
stabilize these interactions.
Genome organization and pluripotency
Pluripotent cells have the same features of genome
organization as differentiated cells, including A/B
compartments, LADs, TADs, and cis-regulatory interactions.
One unique feature is that chromatin is generally less
condensed and more loosely organized in pluripotent cells
than in lineage committed cells.
Correspondingly, histone modifications that mark
heterochromatin expand during lineage commitment to cover
a substantially larger portion of the genome in differentiated
cells than in ESCs.
C-data revealed that transcriptionally inactive regions tend
to participate in fewer specific long-range interactions in
ESCs than in non-ESCs. These results are all consistent
with a chromatin conformation that is particularly
malleable in pluripotent cells, and which may function to
maintain a state of permissiveness for the different
transcriptional programs required for lineage specification.
Although condensed heterochromatin is less prevalent in
ESCs than in other cell types, transcriptional repression is
still important to the pluripotent state.
The repression of many genes associated with lineage
commitment requires Polycomb group (PcG) proteins.
Genomic regions enriched for PcG binding and/or its
associated repressive histone modification H3K27me3
contact each other at high frequency in C-data generated
from ESCs.
Another unique feature of higher-order genome
organization in pluripotent cells is that regions with a high
density of binding sites for the key pluripotency TFs
Oct4, Sox2, and Nanog (together abbreviated as OSN)
tend to co-localize in nuclear space.
OSN are directly involved in higher-order genome
organization in ESCs, which is further supported by the
demonstration that loss of either Oct4 or Nanog
diminishes long-range contacts between OSN-bound
regions.
Surprisingly, binding of CTCF and Cohesin is not
enriched at long-range contact sites in ESCs, suggesting
that the role of OSN in shaping higher-order structure
of the pluripotent genome is independent of
architectural proteins.
OSN also anchor short-range cis-regulatory interactions
that do require Cohesin.
OSN and other key pluripotency genes are in contact
with the silencing environment of the nuclear lamina
less frequently in ESCs than in differentiated derivatives.
Interactions between the promoters of different
pluripotency genes can be detected in ESCs both in cis
and in trans, indicating that they colocalize in the
pluripotent nucleus, perhaps at shared RNA Polymerase
II transcription factories.
A more comprehensive view of the genome as a 3D entity
is required. Many of the functional modules in the genome
are arranged in linear fashion.
Exons are always transcribed in linear order, and
promoters are always located immediately upstream of the
transcription unit. The transcriptional machinery is processive
(that is, it moves along a stretch of DNA in a line), and thus the
functional modules on which the machinery acts are
arranged in linear fashion in the genome.
Unlike the exons of a gene, the enhancers that regulate a
gene’s transcription are often not arranged in a linear
fashion with respect to the gene in question. Enhancers
can be found upstream or downstream of the genes they
regulate, can act over large linear distances, and can skip
over intervening genes.
The machinery of transcriptional regulation is structural –
that is, it relies on 3D interactions between modules that
may be separated by considerable linear distance.
Genome organization plays a role in myriad other
processes including DNA repair, DNA replication, and X
chromosome inactivation.
Mutations in genes that encode genome and nuclear
architectural components (including subunits of the
Cohesin complex, Mediator complex, and Nuclear Lamins)
can result in severe developmental phenotypes.
Genes encoding Mediator subunits, Cohesin subunits, and
CTCF are also mutated at significant frequency in cancer,
raising questions about the potential contribution of defects
in genome organization to malignancy.
SNPs that are linked to human disease by Genome-Wide
Association Study (GWAS) are commonly found within
enhancers, suggesting that perturbation of long-range
regulation is the mechanism behind a sizable portion of
pathogenic sequence variation.
Genome Sequencing: from The
Human Genome Project to
Clinical Applications
Lander et al., Nature 2001, 409, 877
Plasmids: 4 kb
Cosmids: 40 kb
BAC, YAC: 100-500 kb
Bacterial genome: 2 Mb
E. coli F plasmid: BACs allow
stable cloning of up to 1 million bp
Lander et al., Nature 2001, 409, 877
Long repetitive sequences make full-resolution difficult
Lander et al., Nature 2001, 409, 877
Technology Evolution to Massively Parallel Sequencing
An emulsion method for DNA amplification and a special instrument
Bentley et al. 2008
-Human genome = about 30,000 genes (2001 estimate)
-Hundreds of genes acquired by horizontal transfer from bacteria
-Dozens of genes acquired from transposons
-50% of the genome is derived from transposable elements, of which
DNA and LTR transposons are inactive
-Segmental duplication is frequent in the human genome and involves
pericentromeric and subtelomeric regions
-Recombination rates are higher in distal regions of chromosomes,
in a pattern that promotes the occurrence of 1 crossover per chromosome
arm in each meiosis
-Alu transposable elements predominate in GC-rich regions, while
GC-poor regions are associated with dark G-bands in karyotypes
Lander et al., Nature 2001, 409, 877
-SNP = Single Nucleotide Polymorphism
-Two human genomes compared will differ at about 2.5 million places,
corresponding to a frequency of 1 per 1,300 nucleotide pairs
-Rate of nucleotide change per genome: about 5 nucleotides per 1,000 change in
1 million years, a low rate owing to the accuracy of replication
-Human and chimpanzee chromosomes are separated by 5
million years of evolution and are very similar; human and mouse
chromosomes, separated by 100 million years of evolution,
are much more different
Lander et al., Nature 2001, 409, 877
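The two SNP figures above can be checked against each other with one line of arithmetic. The genome size of 3.2 billion base pairs is the value given earlier in these slides; the variable names are illustrative.

```python
GENOME_SIZE_BP = 3.2e9   # human genome size quoted earlier in the slides
SNP_SPACING_BP = 1300    # one difference per 1,300 nucleotide pairs

# Expected number of differences between two genomes at that spacing.
expected_differences = GENOME_SIZE_BP / SNP_SPACING_BP
print(f"{expected_differences / 1e6:.2f} million differences")
```

The result, a little under 2.5 million, agrees with the rounded figure on the slide.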
-Number of genes: 6000 for yeast Saccharomyces cerevisiae; 18,000
for the nematode C. elegans; 13,000 for Drosophila melanogaster;
30,000 for humans
-A total of 3 billion years of evolution
Lander et al., Nature 2001, 409, 877
Chimpanzee and human chromosomes are almost identical except
for human chromosome 2
99% of Alu repeats (a type of transposon repeat) are in
the same place in the human and chimpanzee genomes
The 1% of repeats that differ contain human-specific Alu elements, which are still active
and can induce genetic diseases.
Human Alu sequences (1 million) and mouse B1 sequences (400,000) evolved
from the 7SL RNA gene, which encodes the RNA of the signal recognition particle (SRP)
AluI restriction site = AG/CT
Lander et al., Nature 2001, 409, 877
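The Alu family takes its name from the AluI restriction enzyme, which cuts at AG/CT. Digestion with a restriction enzyme, also used in the 3C protocol earlier in these slides, can be sketched as cutting a string at each occurrence of the recognition site. The function name and the example sequence are hypothetical, and only one strand is modelled.

```python
def digest(sequence, site="AGCT", cut_offset=2):
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset is the position of the cut within the site:
    2 for AluI, which cuts AG/CT between G and C.
    Returns the fragments in order.
    """
    sequence = sequence.upper()
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

seq = "TTAGCTGGGAGCTAA"
fragments = digest(seq)
print(fragments)
```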
-3 million transposable element remnants in the human genome
-Presently these elements are responsible for new mutations,
about 2 in 1,000 mutations
-Hypothesis: 170 million years ago, critical speciation events
leading to the mammalian radiation from a common ancestor may
have involved a burst in transposition activity
Lander et al., Nature 2001, 409, 877
Transposons in the Human Genome
Transposons:
-LINEs (LINE1 is still active!) are the most ancient, are 6 kb in length, encode 2 ORFs
and have a polymerase II promoter; a complex of the proteins and the RNA moves to the
nucleus; an endonuclease makes a single-strand nick and the RT uses the nicked DNA
to prime reverse transcription from the 3′ end of the RNA; the process is imperfect,
leaving unfinished 5′ ends; new insertions are flanked by 7-20 bp target site
duplications; LINEs target AT-rich, gene-poor regions because TTTT/A is the
preferred cleavage site of the endonuclease
-Long interspersed nuclear elements can insert into the gene for
Factor VIII and produce hemophilia
-SINEs are short (100-400 bp) and use LINEs to function
-Their promoter regions are shared with tRNA sequences or with the
7SL RNA of the signal recognition particle; the latter subfamily
of SINEs is the Alu repeat
-LTR transposons are flanked by LTRs and contain gag and pol genes,
coding for RT and RNase H; transposition occurs via the RT in
a cytoplasmic virus-like structure, primed by a tRNA, as opposed to
chromosomal priming for SINEs. They generated extracellular
retroviruses by acquiring an envelope protein
-DNA transposons resemble bacterial transposons, having terminal
inverted repeats and using cut-and-paste mechanisms; they are
short-lived elements
-Rapid transcription of SINE elements into RNA can only occur
near genes in opened chromatin; SINE RNA can appear in massive
amounts, inhibit PKR, stimulating translation.
-Stress induced SINE transcription which leads to massive
increases in protein translation- mechanism of evolution?
-Y chromosome has lower levels of somatic genes transcribed
and shows lower than expected numbers of Alu elements; the
reverse is true for chromosome 19
Lander et al., Nature 2001, 409, 877
-LINE1 and Alu constitute 60% of all interspersed repeat sequence
-LINE1 and Alu are vertically transmitted
-Genomes of the worm, fly, and mustard weed have many more
types of recent active transposons of which LINE and SINE
elements are 5-6%
-The rate of housecleaning through small genomic deletions is
75 fold higher in flies than in mammals
-New spontaneous mutations due to LINES are 30 times more
likely to occur in mice than humans
Lander et al., Nature 2001, 409, 877
Gene expression: measuring mRNA
Microarrays: 3,000-6,000 genes at a time
Northern blot: one gene at a time
RNA sequencing: genome-wide
Promoter activation and gene regulation
Chromatin immunoprecipitation + PCR for specific DNA sequences
Chromatin immunoprecipitation + chip hybridization
Chromatin immunoprecipitation + massively parallel sequencing
[Figure: Epo signaling. Labels: Epo, JAK2 (cytoplasm), phosphorylated STAT 1, 3, 5, STAT 5 in the nucleus, DNA.]
Chromatin Immunoprecipitation (ChIP)
1) Epo treatment (100 U/ml), 15 minutes
2) Cross-link DNA-proteins in vivo (formaldehyde)
3) Chromatin extraction
4) Sonication
5) Incubation with Ab-STAT
6) Precipitation of Ab-STAT/protein complexes (protein A sepharose)
7) Cross-link reversal, proteinase K (PK) and RNase A treatments, purification of DNA
8) With known targets: PCR, analyzed on an agarose gel (lanes: -, +, M)
ChIP-seq allows identification of DNA sequences physically
bound by a transcription factor. It does not give information
about transcriptional activation.
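The idea of identifying bound sequences from ChIP-seq reads can be illustrated by counting reads per genomic window and comparing against a control sample. This is a toy sketch and not a real peak caller. The use of a control sample, the fold threshold, the pseudocount, and all coordinates are assumptions made for illustration, and library-size normalization is left out.

```python
from collections import Counter

def bin_counts(read_starts, bin_size):
    """Number of reads whose start position falls in each bin."""
    return Counter(pos // bin_size for pos in read_starts)

def enriched_bins(chip_reads, control_reads, bin_size, min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in ChIP over control."""
    chip = bin_counts(chip_reads, bin_size)
    control = bin_counts(control_reads, bin_size)
    hits = []
    for b in sorted(chip):
        # The pseudocount avoids division by zero in bins with no control reads.
        fold = (chip[b] + pseudocount) / (control[b] + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, (b + 1) * bin_size, fold))
    return hits

# Toy read start positions: a pile-up near position 1,000 in the ChIP sample.
chip_reads = [1010, 1020, 1050, 1080, 1090, 1100, 1150, 5000, 9000]
control_reads = [1000, 3000, 5000, 9000]
hits = enriched_bins(chip_reads, control_reads, bin_size=500)
print(hits)
```

As the slide notes, an enriched window only indicates binding. Whether the bound factor activates transcription has to be measured separately, for example by RNA-seq.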
Microarrays give information about mRNA expression levels
not sequence of mRNAs.
Golub et al., Science 1999, 286, 531.
ALL AML
Golub et al., Science 1999, 286, 531.
RNA-seq gives information about RNA levels and sequence
Methodology
RNA arm: RNA extraction & Illumina library preparation; small RNA sequencing; identification of sncRNAs.
DNA arm: DNA extraction & Illumina library preparation; OxBS sequencing; identification of 5hmC.
Analysis: MOABS (MOdel based Analysis of Bisulfite Sequencing data) / novel pipeline
miRNA/small RNA-seq
A type of RNA-seq that differs from other RNA-seq in that the input material is enriched for small RNAs.
It makes it possible to examine tissue-specific expression and disease associations, and to discover previously uncharacterized small RNAs.
A combination of ChIP-seq, 3C (Chromosome Conformation Capture),
and RNA-seq gives information about chromatin shape, promoter
and enhancer usage, the consequences for gene expression, and the sequence
of transcripts.
Exome sequencing gives information on the sequence of protein-coding genes
in a sample, but not about possible deletions and
insertions or copy number variations. It also misses regulatory sequences.
Array CGH approaches (molecular karyotype) give information about
copy number variations.
Whole genome sequencing gives information about coding and non-coding
sequences and copy number variations, and will replace exome sequencing +
aCGH.