Mapping Sites of Transcription Across the Drosophila Genome Using High Resolution Tiling Microarrays
LBNL, Berkeley CA
August 20, 2007
A. Willingham, Affymetrix, Inc
Presentation Outline
I. Affymetrix’s Contribution to Specific Aims and Milestones
II. Previous Studies
• Manak et al. analysis of developmental transcriptome
III. Initial Results for Aim I
• sample preparation & data processing
• first look at cell line data on 35bp arrays
• pilot analysis of brand-new 7bp arrays
IV. RACE-array
• example of ENCODE extension analysis of genes on Chr21 & 22
V. Summary and Steps for Moving Forward
Specific Aim 1
• 480 samples on 35-bp genome tiling arrays
• 24 samples on 7-bp genome tiling array sets
• 160 RACE-fragment pools (16,000 products)
Specific Aim 2
• RNAi of 120 RNA binding proteins on arrays
Specific Aim 3
• Northern blotting of ncRNA models
RNA Samples and Genome Tiling Arrays
35 bp Arrays
                      total   A+     nuclear   polysomal   total #    total # arrays
                      RNA     RNA    RNA       RNA         samples    (triplicates)
Life Cycle
  Embryos             12      12     12        12          48         144
  Larvae              10      10     -         -           20         60
  Pupae               5       5      -         -           10         30
  Adults (male)       8       8      -         -           16         48
  Adults (female)     8       8      -         -           16         48
Cell Lines
  Survey              100     -      -         -           100        300
  Selected            -       30     30        30          90         270
Dissected tissues     90      90     -         -           180        540
TOTAL                                                      480        1440
7 bp Arrays
                      A+ RNA   small RNAs          total # array sets
                               (strand-specific)   (triplicates)*
Life Cycle
  Embryos             8        8                   72
  Larvae              2        2                   18
  Pupae               2        2                   18
  Adults (male)       2        2                   18
  Adults (female)     2        2                   18
Cell Lines
  Survey              -        -                   -
  Selected            2        2                   18
Dissected tissues     6        6                   54
TOTAL                 24       24                  216
Milestones
Aim        Description                 Group                             Total
Aim 1.1    RNA sample preparation      Bloomington (Cherbas)             480 RNAs in triplicate
Aim 1.2    35-bp tiling arrays         Affymetrix (Gingeras)             1,440 35-bp arrays
Aim 1.3    5-bp tiling arrays          Affymetrix (Gingeras)             504 5-bp arrays (72 sets)**
Aim 1.4    pooled RT-PCR products      Affymetrix (Gingeras)             160 pools of 100 products
Aim 2.1    data synthesis              Wash U (Brent)                    data synthesis
Aim 2.2a   RT-PCR sequencing           Wash U (Brent)                    20,000 RT-PCR to validate exons
Aim 2.2b   RACE sequencing             LBNL (Celniker)                   20,000 RACE to map TSSs
Aim 2.2c   cDNA screening              LBNL (Celniker)                   6,000 cDNA screens for new transcripts
Aim 2.3a   ncRNA sequencing            LBNL (Celniker)                   16 runs (454 Life Sciences)
Aim 2.3b   ncRNA Northern blots        Affymetrix (Gingeras)             1,000 Northern blots
Aim 2.4    RNA splicing                UCHC (Graveley); UCB (Brenner)    120 RNAi knock-downs of RNA binding proteins
Aim 3.1    RNAi of ncRNAs              Harvard (Perrimon)                100 ncRNA transcripts
Aim 3.2    overexpression of ncRNAs    Harvard (Perrimon)                100 ncRNA transcripts
Aim        Year 1           Year 2           Year 3              Year 4
           Life Cycle       Cell Lines       Dissected Tissues   Advanced Dissections
Aim 1.1    120 RNAs x 3     120 RNAs x 3     120 RNAs x 3        120 RNAs x 3
Aim 1.2    360 arrays       360 arrays       360 arrays          360 arrays
Aim 1.3    18 array sets    18 array sets    18 array sets       18 array sets
Aim 1.4    40 pools         40 pools         40 pools            40 pools
Aim 2.1    data synthesis   data synthesis   data synthesis      data synthesis
Aim 2.2a   5,000 RT-PCR     5,000 RT-PCR     5,000 RT-PCR        5,000 RT-PCR
Aim 2.2b   5,000 RACE       5,000 RACE       5,000 RACE          5,000 RACE
Aim 2.2c   1,500 cDNAs      1,500 cDNAs      1,500 cDNAs         1,500 cDNAs
Aim 2.3a   4 runs           4 runs           4 runs              4 runs
Aim 2.3b   250 blots        250 blots        250 blots           250 blots
Aim 2.4    30 knock-downs   30 knock-downs   30 knock-downs      30 knock-downs
Aim 3.1    25 ncRNAs        25 ncRNAs        25 ncRNAs           25 ncRNAs
Aim 3.2    25 ncRNAs        25 ncRNAs        25 ncRNAs           25 ncRNAs
• stepwise nature of individual aims & responsibilities
• involvement & interdependencies of each step
• propose shifting milestones to more of a “ramp-up” model
Timeline for Milestones
                        months 0-12   months 12-24   months 24-36   months 36-48
current
  35bp arrays           120x3         120x3          120x3          120x3
  7bp array sets        18            18             18             18
  RACE pools (100)      40            40             40             40
  Northerns             250           250            250            250
  RNA-binder K.D.       30            30             30             30
proposed
  35bp arrays           70x3          110x3          140x3          160x3
  7bp array sets        10            20             20             22
  RACE pools (100)      20            40             50             50
  Northerns             100           200            400            400
  RNA-binder K.D.       15            30             30             45
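As a quick sanity check on the ramp-up proposal, the per-year counts from the timeline can be summed and compared with the current flat schedule. The sketch below only transcribes the slide's numbers; the dictionary keys are illustrative labels, and it assumes that the final "160" entry for 35bp arrays is a sample count run in triplicate like the other years.

```python
# Per-year milestone counts transcribed from the timeline (years 1 to 4).
# Keys are illustrative labels, not names used by the project.
current = {
    "35bp samples (x3 arrays)": [120, 120, 120, 120],
    "7bp array sets": [18, 18, 18, 18],
    "RACE pools (100 products)": [40, 40, 40, 40],
    "Northern blots": [250, 250, 250, 250],
    "RNA-binder knock-downs": [30, 30, 30, 30],
}
proposed = {
    # Assumption: the last entry (160) is also run in triplicate.
    "35bp samples (x3 arrays)": [70, 110, 140, 160],
    "7bp array sets": [10, 20, 20, 22],
    "RACE pools (100 products)": [20, 40, 50, 50],
    "Northern blots": [100, 200, 400, 400],
    "RNA-binder knock-downs": [15, 30, 30, 45],
}


def totals(schedule):
    """Sum each milestone over the four years."""
    return {name: sum(per_year) for name, per_year in schedule.items()}


current_totals = totals(current)
proposed_totals = totals(proposed)

for name in current:
    print(f"{name:28s} current={current_totals[name]:5d} "
          f"proposed={proposed_totals[name]:5d}")
```

With the numbers as transcribed, the 35bp, 7bp, RACE and knock-down totals are unchanged by the ramp-up, while the proposed Northern blots sum to 1,100 rather than 1,000.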
Transcription Analysis of Early (0-24 hr) Drosophila Embryogenesis
• 70% Annotated
• 30% Unannotated
Manak et al. Nature Genetics, v38 Sep 2006
Differential expression in Drosophila embryogenesis (~40kb region of Chromosome 3R)
[Figure: expression tracks for twelve 2-hour embryonic time points, 0-2 hr through 22-24 hr; labels: 5’ TSS, 19Kb]
Maternally Expressed Genes (Restarted in two patterns)
Unannotated transcription updates known gene annotations
Manak et al. Nature Genetics, v38 Sep 2006
Drosophila: 5’ sites predicted by transcription co-regulation
• ~1500 genes
• avg 1st intron size = ~20kb
• avg 1st annotated intron = ~1.7kb
Affymetrix sample preparation & data generation pipeline
sample treatment & QC
• DNase-treat
• BioAnalyzer
1st-strand cDNA synth.
• random primed
• Superscript-II
2nd-strand cDNA synth.
• DNA Pol-I
• save aliquot for downstream QC
label & hybridize to arrays
• TdT-based end labeling
• CEL file generation
signal graph generation
• median-scaling
• q-norm bioreps
• select bandwidth
transfrag generation
• select min-run
• select max-gap
data distribution
• tomeweb hosting
• FTP to servers?
• deliver to DCC, GEO, etc.
This example highlights the method for generating RNA maps, but the pipeline is similar for other applications:
• RNA maps of long and short RNAs
• RACE-array maps
• RNAi knockdown experiments
• chromatin immunoprecipitation
quality control
• overlap w/ RACE
• Northern blots
• QPCR of cDNA
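The signal graph step of the pipeline (median-scaling, quantile normalization of replicates, bandwidth selection) can be illustrated with a small sketch. This is not the TAS implementation: the function names and toy intensities are hypothetical, a plain median stands in for whatever window estimator the software actually applies, and the target of 200 is borrowed from the normalization target quoted for the pilot study.

```python
from statistics import median


def median_scale(intensities, target=200.0):
    """Rescale one array so that its median intensity equals `target`."""
    m = median(intensities)
    return [x * target / m for x in intensities]


def quantile_normalize(replicates):
    """Map equal-length replicate arrays onto one shared distribution."""
    n = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    # Reference distribution: mean of the k-th smallest value across replicates.
    reference = [sum(s[k] for s in sorted_reps) / len(replicates)
                 for k in range(n)]
    normalized = []
    for rep in replicates:
        order = sorted(range(n), key=lambda i: rep[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def smooth(positions, values, bandwidth):
    """Median of all probes within +/- bandwidth bp of each probe (O(n^2))."""
    smoothed = []
    for p in positions:
        window = [v for q, v in zip(positions, values) if abs(q - p) <= bandwidth]
        smoothed.append(median(window))
    return smoothed


# Toy data: five probes spaced 35 bp apart, two replicates (made-up values).
positions = [0, 35, 70, 105, 140]
rep1 = [100.0, 120.0, 900.0, 950.0, 110.0]
rep2 = [200.0, 260.0, 1500.0, 1700.0, 240.0]

scaled = [median_scale(rep) for rep in (rep1, rep2)]
qnorm = quantile_normalize(scaled)
combined = [sum(vals) / len(vals) for vals in zip(*qnorm)]
signal = smooth(positions, combined, bandwidth=50)
print(signal)
```

A larger bandwidth pools more neighbouring probes into each estimate, which reduces noise at the cost of blurring the edges of short features.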
Current Sample Prep
(5 cell line samples completed in triplicate)
(for 3 other cell lines, several samples failed)
RNA  Biological  sample       AFFX:      AFFX: RNA     AFFX:        AFFX: cDNA   AFFX: ug  AFFX:    AFFX:      AFFX:
ID   sample ID                DNase      ug yield      Agilent      yield (ug)   amt hyb   array    hyb date   comment
                              treat      (post-DNase)  BioAnalyzer  (15ug RNA)   to array  type
14   BS10        S2-DRSC      8/7/2007   69.3          8/8/2007     15.3         8ug       35bp-AS  8/13/2007
16   BS11        S2-DRSC      8/7/2007   72.8          8/8/2007     16.5         8ug       35bp-AS  8/13/2007
18   BS12        S2-DRSC      8/7/2007   73.0          8/8/2007     4.4          -         -        -          repeat cDNA synth
86   BS28        CME L1       8/7/2007   35.3          8/8/2007     14.4         8ug       35bp-AS  8/13/2007
88   BS29        CME L1       8/7/2007   60.1          8/8/2007     12.9         8ug       35bp-AS  8/13/2007
90   BS30        CME L1       8/7/2007   49.0          8/8/2007     14.5         8ug       35bp-AS  8/13/2007
134  BS13        ML-DmD16-c3  8/7/2007   22.8          8/8/2007     11.5         8ug       35bp-AS  8/13/2007
136  BS14        ML-DmD16-c3  8/7/2007   35.4          8/8/2007     -            -         -        -          RNA sample degraded
138  BS15        ML-DmD16-c3  8/7/2007   48.0          8/8/2007     10.7         8ug       35bp-AS  8/13/2007
170  BS1         ML-DmD11     7/25/2007  38.4          7/25/2007    17.3         8ug       35bp-AS  8/13/2007
172  BS2         ML-DmD11     7/25/2007  36.4          7/25/2007    18.9         8ug       35bp-AS  8/13/2007
174  BS3         ML-DmD11     7/25/2007  28.8          7/25/2007    16.8         8ug       35bp-AS  8/13/2007
182  BS4         ML-DmD20-c2  7/25/2007  31.7          7/25/2007    15.3         8ug       35bp-AS  8/13/2007
184  BS5         ML-DmD20-c2  7/25/2007  27.6          7/25/2007    13.4         8ug       35bp-AS  8/13/2007
186  BS6         ML-DmD20-c2  7/25/2007  33.5          7/25/2007    14.4         8ug       35bp-AS  8/13/2007
188  BS16        ML-DmD20-c5  8/7/2007   62.8          8/8/2007     14.1         8ug       35bp-AS  8/13/2007
190  BS17        ML-DmD20-c5  8/7/2007   64.8          8/8/2007     16.2         8ug       35bp-AS  8/13/2007
192  BS18        ML-DmD20-c5  8/7/2007   32.3          8/8/2007     16.3         8ug       35bp-AS  8/13/2007
194  BS7         ML-DmD17-c3  7/25/2007  23.3          7/25/2007    16.1         8ug       35bp-AS  8/13/2007
196  BS8         ML-DmD17-c3  7/25/2007  5.14          7/25/2007    -            -         -        -          sample lost in EtOH precip
198  BS9         ML-DmD17-c3  7/25/2007  58.1          7/25/2007    14.6         8ug       35bp-AS  8/13/2007
200  BS34        Sg4          8/7/2007   30.4          8/8/2007     16.0         8ug       35bp-AS  8/13/2007
202  BS35        Sg4          8/7/2007   49.8          8/8/2007     12.2         8ug       35bp-AS  8/13/2007
204  BS36        Sg4          8/7/2007   44.6          8/8/2007     14.4         8ug       35bp-AS  8/13/2007
•Hosted at http://transcriptome.affymetrix.com/download/modENCODE/
RNA QC by Agilent BioAnalyzer
sample name              conc (mg/ml)   comments
#170_BS1_ML-DmD11        0.384          date: 07-25-07
#86_BS28_CME L1          0.70613        date: 08-08-07
#136_BS14_ML-DmD16-c3    0.70777        date: 08-08-07; RNA sample degraded, obtain more from Cherbas group
[Figure: gel image (GEL) and electropherogram (EGRAM) for each sample (Samples 1, 4 and 8); fluorescence [FU] versus time [s]]
transcription in 4 Drosophila cell lines: overlapping transcription
[Figure: “union of transcription in 4 cell lines”. Base-pair coverage (axis 0 to 3.5E+07) for the cumulative unions Kc167; Kc167+DmBG3; Kc167+DmBG3+Cl8; Kc167+DmBG3+Cl8+DmD4]
transcription in 4 Drosophila cell lines: overlapping annotation
[Figure: “union of transcription in 4 cell lines”. % overlap (relative to blanket transfrags), axis 0% to 70%, for the same cumulative unions; series: exons, 500bp promoter, 5’ exons, introns, intergenic regions]
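The cumulative "union of transcription" plotted for the four cell lines amounts to merging transfrag intervals as each cell line is added and counting the covered bases. A minimal sketch of that bookkeeping follows; the coordinates are made up, a single chromosome is assumed, and the helper names are hypothetical (the deck does not say which tool produced these unions).

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals):
    """Number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def cumulative_union(transfrags_by_line, order):
    """Base-pair coverage of the union as each cell line is added in turn."""
    pooled, coverage = [], []
    for name in order:
        pooled.extend(transfrags_by_line[name])
        coverage.append((name, covered_bp(pooled)))
    return coverage


# Toy transfrags on one chromosome (hypothetical coordinates).
toy = {
    "Kc167": [(100, 300), (500, 650)],
    "DmBG3": [(250, 400), (900, 1000)],
    "Cl8": [(120, 280)],
    "DmD4": [(600, 700), (1500, 1600)],
}
cumulative = cumulative_union(toy, ["Kc167", "DmBG3", "Cl8", "DmD4"])
for name, bp in cumulative:
    print(f"after adding {name}: {bp} bp")
```

A cell line whose transfrags fall entirely inside regions already detected (Cl8 in the toy data) adds nothing to the union, which is why such a curve flattens as more samples are added.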
RNA Samples and Genome Tiling Arrays: 7 nt resolution arrays
• new 7bp design
  • 5 arrays, total of ~14.4 million probes
  • by comparison, 35bp array has ~3.1 million probes
  • 5bp design required 7 arrays… 40% more chips required
    • 1512 arrays instead of 1080
    • replicates & strand not calculated in original budget
  • updated genome version (release 5) used for design
  • repeats can be masked or unmasked
    • virtual probes
• existing 35-bp design
  • 1 array, total of ~3.1 million probes
  • Affy commercial group will produce an “updated” 2.0 design
    • 39bp resolution, release5-based design
    • however, we will continue using the current design
• 35bp resolution more optimal for RNA maps
• 7bp arrays have better coverage & newer design
• question of $ cost per array?
• comparison of nucleotide coverage (dm3, release5)
  • 35bp array = 111,117,940 nt
  • 7bp array masked = 107,355,171 nt
  • 7bp array unmasked = 118,523,115 nt
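The chip counts quoted above appear to follow from the 216 array sets in the 7 bp sample table: 216 sets of 5 chips give 1080 arrays, and 216 sets of 7 chips give 1512. That reading is an inference from the numbers rather than something the slide states, so the sketch below labels it as an assumption; the variable names are illustrative.

```python
# Assumption: "1512 arrays instead of 1080" refers to the 216 array sets
# (triplicates, strand-specific) listed in the 7 bp sample table.
array_sets = 216
chips_per_set = {"7bp design": 5, "5bp design": 7}

arrays_needed = {design: array_sets * chips
                 for design, chips in chips_per_set.items()}
extra_fraction = arrays_needed["5bp design"] / arrays_needed["7bp design"] - 1

# Probe counts and nucleotide coverage as quoted on the slide.
probes = {"7bp": 14.4e6, "35bp": 3.1e6}
coverage_nt = {
    "35bp": 111_117_940,
    "7bp masked": 107_355_171,
    "7bp unmasked": 118_523_115,
}
probe_ratio = probes["7bp"] / probes["35bp"]
unmasked_gain = coverage_nt["7bp unmasked"] - coverage_nt["35bp"]

print(arrays_needed)
print(f"5bp design needs {extra_fraction:.0%} more chips")
print(f"7bp set carries {probe_ratio:.1f}x the probes of the 35bp array")
print(f"unmasked 7bp design covers {unmasked_gain:,} nt more than the 35bp array")
```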
•Cherbas total RNA samples from 2 cell lines (KC & clone8)
•Same labeled reactions hyb’d to 35bp and 7bp arrays
•Signal graphs generated in TAS: 2 technical replicates for each sample were q-norm together
•Bandwidth = 30 (7bp) or 50 (35bp), Norm target = 200
•Transfrags generated in TAS using 5% bacterial negative controls
•7bp arrays: min-run 50, max-gap 10
•35bp arrays: min-run 50, max-gap 90
•Intersections of 7bp vs 35bp and overlap with FlyBase annotations performed in Galaxy
•Hosted at: http://transcriptome.affymetrix.com/download/modENCODE/pilot_studies/Dros-7bp-pilot/
•Share with modENCODE DCC & ArrayExpress to determine whole-chromosome vs whole-chip data hosting
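To make the min-run and max-gap settings listed above concrete, here is a simplified sketch of threshold-based transfrag calling. It is not the TAS algorithm itself: the function names, the toy signal and the percentile rule for turning a 5% negative-control rate into a threshold are all assumptions made for illustration.

```python
def fpr_threshold(negative_control_signal, fpr=0.05):
    """Intensity exceeded by roughly `fpr` of the negative-control probes."""
    ranked = sorted(negative_control_signal)
    cut = min(int(len(ranked) * (1.0 - fpr)), len(ranked) - 1)
    return ranked[cut]


def call_transfrags(positions, signal, threshold, min_run, max_gap):
    """Join above-threshold probes at most max_gap bp apart and keep runs
    spanning at least min_run bp. `positions` must be sorted."""
    transfrags = []
    run_start = run_end = None
    for pos, value in zip(positions, signal):
        if value <= threshold:
            continue
        if run_start is not None and pos - run_end <= max_gap:
            run_end = pos  # extend the current run
        else:
            if run_start is not None and run_end - run_start >= min_run:
                transfrags.append((run_start, run_end))
            run_start = run_end = pos  # start a new run
    if run_start is not None and run_end - run_start >= min_run:
        transfrags.append((run_start, run_end))
    return transfrags


def toy_signal(pos):
    """Two hypothetical exons (0-98 and 140-238) split by a short intron."""
    return 500.0 if 0 <= pos <= 98 or 140 <= pos <= 238 else 20.0


controls = [10.0 + i for i in range(100)]  # made-up bacterial control values
threshold = fpr_threshold(controls, fpr=0.05)

pos_7bp = list(range(0, 350, 7))
pos_35bp = list(range(0, 350, 35))
tf_7bp = call_transfrags(pos_7bp, [toy_signal(p) for p in pos_7bp],
                         threshold, min_run=50, max_gap=10)
tf_35bp = call_transfrags(pos_35bp, [toy_signal(p) for p in pos_35bp],
                          threshold, min_run=50, max_gap=90)
print("7bp :", tf_7bp)
print("35bp:", tf_35bp)
```

In this toy case the 7bp settings keep the two exons separate, while the 35bp settings bridge the 42 bp intron and report a single transfrag.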
New 7-bp 5-chip array compared to 35-bp 1-chip array
Pseudo-ROC curves comparing base-pair coverage & overlap with annotated exons
[Figure: “probe FPR thresholds: 7bp vs 35bp arrays”. x-axis: % false bacterial transfrags (0.0% to 2.5%); y-axis: % base-pair coverage (0% to 60%); series: 35bp exon overlap, 35bp total txn, 7bp exon overlap, 7bp total txn]
• five different thresholds for the calculated probe false-positive rate were used
  • 1%...3%...5%...7%...10% (7% and 10% not shown for 35bp array)
• 7bp arrays clearly have a significantly lower false-positive rate for forming transfrags from bacterial negative regions
  • ~4-5 fold lower than 35bp arrays
  • attributable to higher probe density and different min-run & max-gap rules
• 35bp and 7bp arrays have a similar amount of bp coverage in transfrags
  • BUT 7bp arrays have 50-65% more transfrags
  • 7bp transfrags are more “fragmented” and do a better job of delineating exons with small introns
  • 7-bp array has better “resolution” of small exons
• Intersection with annotations shows both 35bp and 7bp arrays are detecting similar amounts of transcription as measured by bp coverage
sample                    bp coverage   # of transfrags   bp coverage   # of transfrags
CME-W1-Cl8 35bp array     19,419,814    75,586            12,939,004    43,798
CME-W1-Cl8 7bp array      19,205,875    113,791           13,686,476    61,929
KC167 35bp array          21,604,108    78,879            13,772,604    44,243
KC167 7bp array           21,744,331    130,819           14,412,649    64,075
*transfrags were generated using bacterial controls false-positive rate of 5%
total array detected
sample        % false bact   % bp coverage    % bp coverage
              transfrags     (exon overlap)   (total)
35bp_5%fpr    2.3%           46.4%            17.2%
7bp_5%fpr     0.5%           50.4%            18.0%
Summary: 7bp arrays
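The headline ratios in this summary can be reproduced from the tabulated values. The sketch below uses the first pair of bp coverage and transfrag count columns and the 5% FPR rows; the variable names are illustrative only.

```python
# Values transcribed from the summary tables (first bp coverage / transfrag
# column pair, transfrags called at a 5% bacterial-control FPR).
transfrag_counts = {
    "CME-W1-Cl8": {"35bp": 75_586, "7bp": 113_791},
    "KC167": {"35bp": 78_879, "7bp": 130_819},
}
bp_coverage = {
    "CME-W1-Cl8": {"35bp": 19_419_814, "7bp": 19_205_875},
    "KC167": {"35bp": 21_604_108, "7bp": 21_744_331},
}
false_bacterial_pct = {"35bp": 2.3, "7bp": 0.5}

# Percent increase in transfrag count on the 7bp arrays.
extra_transfrags_pct = {
    sample: round((n["7bp"] / n["35bp"] - 1) * 100, 1)
    for sample, n in transfrag_counts.items()
}
# Ratio of bp coverage, 7bp relative to 35bp.
coverage_ratio = {
    sample: round(bp["7bp"] / bp["35bp"], 3)
    for sample, bp in bp_coverage.items()
}
fpr_fold = false_bacterial_pct["35bp"] / false_bacterial_pct["7bp"]

print(extra_transfrags_pct)
print(coverage_ratio)
print(f"false bacterial transfrags: {fpr_fold:.1f}-fold lower on 7bp arrays")
```

These come out at roughly 50% and 66% more transfrags, near-identical bp coverage, and a 4.6-fold lower false bacterial transfrag rate, in line with the "50-65%" and "~4-5 fold" statements on the slide.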
modENCODE RACE array methodology
•5` RACE for 16,000 Drosophila genes
•choice of tissues?
•hybridize products (in pools of 100) to 35bp arrays
•1Mb separation between genes
•confirm presence of transfrags
•identify new, “rare” transfrags made detectable by PCR amplification
•human ENCODE project has done a similar study on the genes present on chromosomes 21 & 22
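One way to picture the pooling constraint above (pools of 100 products, 1Mb separation between genes) is a greedy assignment in which a gene joins the first pool that has room and holds no gene within 1 Mb on the same chromosome. This is only a sketch of the idea, not the project's pooling procedure; the function name, gene identifiers and coordinates are hypothetical.

```python
def assign_pools(genes, pool_size=100, min_separation=1_000_000):
    """Greedily assign (gene_id, chrom, position) tuples to pools.

    Genes in the same pool and on the same chromosome are kept at least
    `min_separation` bp apart. Returns a list of pools (lists of gene ids).
    """
    pools = []  # each pool: {"ids": [...], "sites": {chrom: [positions]}}
    for gene_id, chrom, pos in sorted(genes, key=lambda g: (g[1], g[2])):
        for pool in pools:
            if len(pool["ids"]) >= pool_size:
                continue  # pool is full
            neighbours = pool["sites"].get(chrom, [])
            if all(abs(pos - other) >= min_separation for other in neighbours):
                break  # this pool can take the gene
        else:
            # No existing pool fits: open a new one.
            pool = {"ids": [], "sites": {}}
            pools.append(pool)
        pool["ids"].append(gene_id)
        pool["sites"].setdefault(chrom, []).append(pos)
    return [pool["ids"] for pool in pools]


# Toy input: hypothetical gene ids and 5' end coordinates.
toy_genes = [
    ("g1", "2L", 100_000),
    ("g2", "2L", 600_000),
    ("g3", "2L", 1_200_000),
    ("g4", "3R", 150_000),
]
toy_pools = assign_pools(toy_genes)
print(toy_pools)
```

Keeping pooled genes far apart means a RACE signal on the array can be attributed to a single primer's locus; with 16,000 genes and pools of 100, at least 160 pools are needed.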
Conclusions
• array types & applications
  • pilot analysis of 7bp arrays
  • updated for dm3-release5 genome annotation: bpmaps & IGB
• sample processing pipeline & data generation
  • multiple applications require different types of graphs & transfrags
  • bandwidth0 versus smoothing (e.g. bandwidth50)
• RACE array
  • lessons learned by ENCODE
• QC and validation
  • some of the specific aims (Northerns, RACE) will address these
  • additional analysis such as RT-PCR and QPCR validation of novel transcripts
• data hosted at affy-transcriptome website:
  • http://transcriptome.affymetrix.com/download/modENCODE/
  • sharing pilot data with DCC (Nicole Washington) to facilitate the process
Steps Moving Forward
• adjusting milestones?
• changes in samples? (usage of 7bp versus 35bp)
• shifting focus in favor of more analysis of small RNAs?
• data hosting and transfer issues?
Acknowledgements
Computation: S. Ghosh, H. Tammana, N. Garg, S. Dike, J. Cheng
Molecular Biology: I. Bell, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, P. Kapranov, A. Willingham, J. Manak
AFFX Transcriptome Group: Tom Gingeras