Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Copyright � 2007 by the Genetics Society of AmericaDOI: 10.1534/genetics.106.065961
The Carnegie Protein Trap Library: A Versatile Tool for DrosophilaDevelopmental Studies
Michael Buszczak,* Shelley Paterno,* Daniel Lighthouse,* Julia Bachman,* Jamie Planck,*Stephenie Owen,* Andrew D. Skora,* Todd G. Nystul,* Benjamin Ohlstein,* Anna Allen,*
James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis,†
Nahathai Srivali,* Roger A. Hoskins‡ and Allan C. Spradling*,1
*Howard Hughes Medical Institute Research Laboratories, Department of Embryology, Carnegie Institution of Washington, Baltimore,Maryland 21218, †Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205 and
‡Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720
Manuscript received September 18, 2006Accepted for publication December 18, 2006
ABSTRACT
Metazoan physiology depends on intricate patterns of gene expression that remain poorly known.Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancertrap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. Bysequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced greenfluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, wefound that 600–900 different genes are trapped in our collection. A core set of 244 lines trapped differentidentifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegiecollection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggestnew strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions.
THE central challenge of postsequence genomics isto learn how an enhanced knowledge of genes,
transcripts, and proteins can be applied to betterunderstand the biology of multicellular organisms.Gaining an accurate picture of where and when meta-zoan genes are expressed remains a prerequisite formany such advances (Stathopoulos and Levine 2005).The discovery of distinctive, regulated programs ofgene expression at a fine scale has the potential to revealnew cell types and substructures that make up tissues andthe biological processes that govern their function.However, sensitive and widely applicable methods will berequired to detect and distinguish developmentallyprogrammed gene expression changes from thosecaused simply by cell cycling or environmental pertur-bation.
Several methods for analyzing gene expression withintissues are currently available. Particular cell types cansometimes be cultured in vitro into populations of usefulsize. However, isolated cells in artificial media frequentlybehave differently from cells in vivo interacting withprecisely positioned neighbors in three-dimensionalmicroenvironments. Another approach is to isolatetissue cells by flow sorting, microdissection, or lasercapture and then determine their expression profiles
in depth (reviewed in Espina et al. 2006). Visualizingpatterns of gene expression within the intact tissues oftransgenic organisms containing gene expression re-porters may be the most general method (Tomancak
et al. 2002). Epitope tagging, enhancer trapping, andgene trapping all have the added advantage that geneexpression can subsequently be observed in livingtissues, revealing dynamic processes that are largelybeyond the reach of methods based on fixed material(reviewed in Herschman 2003; Dirks and Tanke 2006).
Protein trapping is a variation of gene trapping inwhich endogenous genes are engineered to produceunder normal controls protein segments fused to areporter such as GFP. The great potential of thistechnology has been extensively documented in yeast,where large collections of strains that each trap adifferent gene have been generated using transposableelements (Ross-Macdonald et al. 1999) or by homol-ogous recombination (Huh et al. 2003). Extensive geneand protein trapping has also been carried out in cul-tured embryonic stem (ES) cells (Gossler et al. 1989;Friedrich and Soriano 1991), where fusions withmore than half of annotated mouse genes have beenrecovered (see Skarnes et al. 2004). However, relativelyfew of these ES cell lines have so far been used to gener-ate corresponding mouse strains where the versatilityand sensitivity of the method for analyzing tissue struc-ture can be tested.
1Corresponding author: Carnegie Institution, 3520 San Martin Dr.,Baltimore, MD 21218. E-mail: [email protected]
Genetics 175: 1505–1531 (March 2007)
Large-scale protein trap screens may also reveal newinformation about genome structure and function.Identifying in an unbiased manner locations through-out a genome where a coding exon can be expressedtests the accuracy and completeness of its annotation.Characterizing the splicing patterns that lead to normalor aberrant GFP expression tests the current catalog oftranscript isoforms generated by alternative splicing.Moreover, by recovering insertions in the same genethat splice differently and produce GFP with varyingefficiency, such a project might generate a data set usefulfor studying the determinants of splice site selection andtranscript stability.
Drosophila provides a favorable system for applyinggene traps to diverse developmental and genomicstudies. The genome sequence has been extensivelyannotated on the basis of experimental data (Misra
et al. 2002). Thousands of enhancer trap lines have beengenerated in large-scale transposon screens and culledof redundant strains by the gene disruption project (seeBellen et al. 2004). In contrast, producing Drosophilaprotein trap lines has remained difficult. Several hun-dred such lines were generated using a mobile GFP-containing exon flanked by both splice acceptor anddonor sites (Morin et al. 2001; Clyne et al. 2003).However, the process was highly inefficient, with as fewas 1 in 1500 progeny flies expressing GFP. Positive linesoften contained more than one insertion, preferentiallytagged a small number of hotspot loci, and tagged manysites not predicated to fuse the GFP exon in frame toany known coding region (Morin et al. 2001). Kelso
et al. (2004) found that the recovery of lines could beincreased by using an automated embryo sorter to selectGFP-positive embryos and established a website, Fly-Trap, to gather information on Drosophila protein traplines. Consequently, we initiated a large-scale proteintrap screen to increase gene coverage, test the genomeannotation, and address some of the remaining techni-cal difficulties in efficient line production.
Here we report the production of lines that trap 600–900 Drosophila genes, including 244 where one or moretrapped proteins can currently be identified. Using theDrosophila ovary as a test system we confirm that proteintrap lines reveal fine-scale details of developmentallyregulated protein expression, making them exception-ally valuable discovery tools for a wide range of studies.Finally, mapping RNA splicing patterns downstreamfrom .1200 insertions provides insight into how anadded exon affects splicing and suggests how the pro-duction of protein trap lines can be expanded to cover alarger fraction of the Drosophila proteome.
MATERIALS AND METHODS
Generation of P-element lines for protein trap screening:The P-element-based protein trap screens presented here uti-lized the pPGA, pPGB, and pPGC vectors described in Morin
et al. (2001). These elements carry a mini-white transgene inthe opposite orientation to an enhanced green gluorescentprotein (EGFP) exon, which is composed of EGFP sequence,without start or stop codons, flanked by splice acceptor anddonor sites from the Drosophila MHC locus. A, B, and C referto the position of the splice sites within the first and last codonsof the EGFP exon sequence. Previously used pPGA, pPGB, andpPGC third chromosome insertions (Morin et al. 2001) wereremobilized in the presence of balancer chromosomes. Newinsertions that mapped to the CyO balancer chromosome, didnot express EGFP, and exhibited remobilization rates off of theCyO balancer of at least 60% in single-pair mating assays wererecovered and used as starting stocks in the screen (see below).
piggyBac protein trap vectors: To make a shuttle vector forsubcloning the EGFP exon into different transposable ele-ments, the entire EGFP exons from the pPGA, pPGB, or pPGCplasmids were excised from the original P-element plasmids(kind gift of W. Chia) using EcoRI and PstI and subcloned intopBluescript (Stratagene, La Jolla, CA). These plasmids werethen cut with EcoRV and KpnI, end filled using Klenow, andreligated to themselves to create pBS-GFPA, pBS-GFPB, andpBS-GFPC. The resulting plasmids carry the EGFP exonsequence between a unique EcoRI site at the 59 end andunique PstI, SmaI, BamHI, and XbaI sites at the 39 end. Newtagging sequences can be inserted between the splice acceptorand donor sites of the exon using unique NcoI and XhoI sites.
Two different piggyBac protein trap vectors (Figure 1) wereconstructed using pBac{D. m. w1} (Handler and Harrell
1999) (kind gift of A. Handler). The pBac{D. m. w1} plasmidwas digested with ClaI to remove most of the mini-whitesequence and a linker containing HpaI, XhoI, and SpeI siteswas inserted in its place to form pBAC{DClaI}. To createpBAC{BglII-GFP}, the EGFP exons from pBS-GFPA, pBS-GFPB,and pBS-GFPC were subcloned into the BglII and MfeI sitesof pBAC{DClaI}. To create pBAC{HpaI-GFP}, EGFP exon se-quences were inserted between the MfeI and HpaI sites ofpBAC{DClaI}. An intronless yellow transgene from the yellow-BSX plasmid (Bellen et al. 2004) was then subcloned into theunique SpeI site of both pBAC{BglII-GFP} and pBAC{HpaI-GFP}to form either pBAC{BglII-GFP; y1}, which has the EGFP exonand yellow transgene oriented away from each other, orpBAC{HpaI-GFP; y1}, which has the EGFP exon and yellowtransgene pointing toward each other (Figure 1A). BothpBAC{BglII-GFP; y1} and pBAC{HpaI-GFP; y1} vectors carryingthe EGFP exon in the A frame were transformed into y w fliesusing the phspBac helper plasmid (Handler and Harrell
1999) (kind gift from A. Handler).We created stable genomic sources of the piggyBac tranpo-
sase using P-element transformation vectors. To place thepiggyBac transposase under control of the ubiquitin promoter,the piggyBac transposase ORF was excised from phspBac usingBamHI and DraI and ligated into the BamHI and SmaI sites ofthe pCasper3-Up2-RX poly(A) P-element vector (Ward et al.1998) (kind gift of R. Fehon), which carried a modifiedmultiple cloning site (kind gift of A. Hudson), to form pP{Ub-pBACtrans}. To make an inducible piggyBac transposase source,phspBac was digested with EcoRI and DraI and the fragmentcontaining both the Drosophila hsp70 promoter and piggyBactransposase ORF was ligated into the EcoRI and StuI sites ofpCasper4 to form pP{hsp70-pBACtrans}. These vectors wereused to transform y w flies, using standard P-element trans-formation techniques.
To test the activity of the piggyBac transposase transgenes,single-pair matings were set up using the pBAC{HpaI; y1}24.3insertion, which mapped to the X chromosome, and pP{Ub-pBACtrans} or pP{hsp70-pBACtrans} stocks. The pBAC{HpaI;y1}24.3 insertion was mobilized in males that were thenoutcrossed to y w females. Phenotypically yellow1 males in
1506 M. Buszczak et al.
the next generation were scored as new insertions. ThepP{hsp70-pBACtrans} was able to remobilize the pBAC{HpaI;y1}24.3 insert in 43% (n ¼ 30) of single-pair matings testedwhereas the pP{Ub-pBACtrans} was able to remobilize thepBAC{HpaI; y1}24.3 insert in 48% (n ¼ 21) of single-pairmatings tested. The pBAC{HpaI; y1}24.3 insertion was remobi-lized in the presence of a CyO balancer chromosome. Newinsertions that did not express EGFP and mapped to the CyOchromosome were used in the pBAC-based protein trap screen.
Generation of embryos with novel transpositions: Weisolated new EGFP-expressing P-element insertions using thefollowing genetic scheme:
F1 :yw
Y;
Sco
CyO pPGfw 1; GFP exong;1
13
yw
yw;
1
1;Ki; Pfry1t7 :2 ; D2 -3g99B
Ki; Pfry1t7 :2 ; D2 -3g99B
F2 :yw
Y;
1
CyO pPGfw 1; GFP exong;1
Ki; Pfry1t7 :2 ; D2 -3g99B3
yw
yw;
1
1;
1
1
Sort embryos: :
New pBAC insertions were generated through a similargenetic scheme:
F1 :yw
Y;
Sco
CyO pBACfy 1; GFP exong;1
13
yw
yw;
1
1;Pfw 1; pBACtransgPfw 1; pBACtransg
F2 :yw
Y;
1
CyO pBACfy 1; GFP exong;1
Pfw 1; pBACtransg3yw
yw;
1
1;
1
1
Sort embryos:
Hereafter, P-element and piggyBac element-based protein traplines were treated the same. For the F1 cross several hundredmales and females of the appropriate genotypes were matedin bottles to produce several thousand males in which theelements were mobilized for the F2 cross. These males werecrossed to 8000–10,000 virgin y w females in a population cage.These virgin females were obtained using a virgining stock thatcarried a heat-shock-inducible hid transgene on the Y chro-mosome (kind gift of R. Lehmann). Two separate overnightembryo collections from each population cage were screenedfor EGFP expression. We limited the number of times wescreened embryos from a particular cage to try to minimize thenumber of identical insertions recovered due to premeoiticinsertion events.
Embryo sorting and line establishment: We screened forEGFP expression in embryos using a COPAS Drosophilaembryo sorter (Union Biometrica). Embryos were dechorio-nated in 50% bleach for 2.5 min and washed extensively withwater. Dechorionated embryos were then washed into sortingsolution (0.53 PBS, 2% Tween-20). With the exception of thesorting solution, the COPAS sorter was used according to themanufacturer’s protocol, using the manufacturer’s solutions.The sorter and sample pressures of the COPAS machine andembryo density were maintained so that the COPAS sorterscreened 15–20 embryos/sec. The sorter used a 488, 514 nmmultiline argon laser. EGFP fluorescence was detected usingPMT1 set to 510 nm. Red fluorescence, used as a measure ofembryo autofluorescence, was detected using a second PMTset to 580 nm. Baseline values for each fluorescent axis were set
empirically using previously isolated fly strains that express lowlevels of EGFP and y w non-EGFP expressing embryos (Figure1). Approximately 250,000 embryos were sorted in five 50,000-embryo batches per day. Sorted embryos were collected andwashed in dH2O. All the embryos from a single batch wereplaced together in standard food vials. We estimate that�80%of the sorted embryos survived to adulthood. Sorted flies thatsurvived to adulthood and did not carry the Ki, P{ry1t7.2;D2-3}99B or P{w1; pBACtrans} chromosomes were individuallyoutcrossed to a y w stock. New lines that carried EGFP-expressing insertions that did not map to the starting CyOchromosome were maintained as stocks.
DNA sequencing, RT–thermal asymmetric interlaced PCRanalyses, and prediction of fusion potential: Genomic se-quences flanking either P-element or piggyBac protein trapinsertions were determined by members of the LawrenceBerkeley Lab group using an established protocol for sequenc-ing inverse PCR products from genomic DNA (Bellen et al.2004). Database software developed for the annotation of theDrosophila gene disruption project (Bellen et al. 2004) wasused to manage the sequence data. Once the insertion site of agiven protein trap line was determined, a FileMaker Pro data-base that contained information [version 3.2 of the Drosoph-ila genome annotation (Misra et al. 2002)] for all Drosophilatranscripts, exons and introns, and their reading frames wasused to predict which gene(s) and transcript(s) were beingtrapped by a given protein trap insertion.
We developed a reverse transcriptase coupled thermalasymmetric interlaced PCR (RT–TAIL) protocol largely onthe basis of methods used to determine T-DNA insertion sitesin Arabidopsis (Singer and Burke 2003). This methodallowed us to determine the mRNA sequence adjacent to theEGFP exon without using gene-specific primers. Total RNAwas isolated from 15 adult flies using an RNAqueous-96automated kit (catalog no. 1812; Ambion, Austin, TX). Thesamples were ground in 200 ml of sample buffer and spun for 5min at 14,000 rpm. The supernatant was placed in a 96-wellplate and 100 ml of 100% EtOH was mixed with each sample.The sample was transferred to the filter plate, washed, andthen treated with Dnase I (Ambion) for 15 min. Rebindingbuffer was added to each well of the filter plate, and the platewas washed extensively. The RNA was eluted off the filter andprecipitated with 7.5 m LiCl solution (Ambion). The resultingRNA pellet was washed with 75% EtOH and then retreatedwith Dnase I for 30 min at 37�. Dnase inactivation reagent(Ambion) was added to the samples. The RNA samples werespun and the supernatant was transferred to a new plate. Adetailed protocol is available upon request.
The following GFP-specific primers were used for RT–TAILPCR:
GFP-For1, 59-GGAGGACGGCAACATCCTGG-39;GFP-For2, 59-CAACGTCTATATCATGGCCG-39;GFP-For3, 59-AGACCCCAACGAGAAGCGCG-39;GFP-Rev1, 59-GTCGTGCTGCTTCATGTGGTCG-39;GFP-Rev2, 59-GACACGCTGAACTTGTGGCCG-39;GFP-Rev3, 59-AGCTCCTCGCCCTTGCTCACC-39 .
The arbitrary degenerate (AD) primers used in this studywere originally described by Singer and Burke (2003) but arelisted here for convenience:
AD3, 59-AGWGNAGWANCAWAGG-39;AD4, 59-STTGNTASTNCTNTGC-39;AD5, 59-NTCGASTWTSGWGTT-39;AD6, 59-WGTGNAGWANCANAGA-39.
A pool of the AD primers was mixed according to Singer
and Burke (2003).
Protein Trap Lines in Drosophila 1507
The first round of RT–TAIL PCR was set up in 96-well formatusing a one-step RT–PCR kit (QIAGEN, Valencia, CA). Forevery reaction 5 ml of total RNA was mixed with 10 ml 53 buffer,2 ml 10 mm dNTP solution, 1 ml GFP-For1 or -Rev1 primer, 12.5ml AD primer mix, 2 ml enzyme mix, and 17.5 ml of dH2O. Thereverse transcription reaction was carried out at 50� for 30 min.The sample was then heated to 95� for 15 min and then cycledfor primary TAIL–PCR according to Singer and Burke
(2003), using a MJ Research (Watertown, MA) thermal cycler.The secondary and tertiary TAIL–PCR reactions were carriedout according to Singer and Burke (2003), using GFP-For2 or-Rev2 primers and GFP-For3 or -Rev3 primers, respectively, andregular TAQ DNA polyermase (Roche, Indianapolis). ThePCR products of the tertiary reaction were treated with exoSAP(United States Biochemical, Cleveland) and sequenced usingGFP-For3 or -Rev3 primers.
The RT–TAIL PCR protocol using the three GFP-Forprimers, which amplified off of the 39 end of EGFP, consis-tently yielded better results than the same reaction using theRev primers. Therefore most of the RT–TAIL PCR data definesplicing products at the 39 end of the EGFP sequence. Toidentify sequence fusing to the 59 end of EGFP, we employed a59 RACE kit according to the manufacturer’s protocol (Am-bion), using the EGFP reverse primers listed above.
Analysis of protein expression in tissues: Samples weredissected in Grace’s medium, placed in 48-well plates outfittedwith a nylon mesh bottom, and fixed in 4% paraformaldehydebuffered in 13 PBS for 10 min at room temperature. The platewas washed extensively with PBT (13 PBS, 0.5% Triton X-100,0.3% BSA) and incubated overnight at 4� with rabbit anti-GFPantibody (Torrey Pines) (1:2000) in PBT. The samples werethen washed extensively with PBT and incubated with goatanti-rabbit Alexa488 (Molecular Probes, Eugene, OR) (1:400)for 4 hr at room temperature. The samples were then washedwith PBT, stained with 2 mg/ml of DAPI, and mounted inVectashield (Vector Laboratories, Burlingame, CA). Imageswere collected using a Leica SP2 confocal microscope.
RESULTS
Generating a large initial collection of tagged strainsexpressing EGFP: Our initial strategy was to generate amuch larger number of lines containing new proteintrap vector insertions than in previous screens and toinstitute additional technical improvements. Because oftheir proven utility, we used the same P-element-basedprotein trap vectors employed by Morin et al. (2001),but we also constructed a similar set of vectors withpiggyBac (Figure 1A). As described in materials and
methods, we set up crosses in small population cages tolimit the recovery of clusters, utilized dominant markersto remove the transposase source from all new lines, andidentified rare GFP-expressing embryos rapidly andsensitively using an automated embryo sorter (Figure1B). This protocol allowed us to screen .60 millionembryos over a period of 2.5 years, to identify .7500‘‘green’’ embryos, and to use each one to start anindividual culture (see Table 1). Ultimately, 7404 strainswere successfully established, maintained by selectionfor white1 eye color, and analyzed further as dia-grammed in Figure 1C.
The same scheme was used with both transposons;however, in practice the piggyBac vectors were not nearly
as efficient at generating EGFP-positive candidate linesas the P-element vectors (Table 1). P-element vectorstypically exhibited 70% mobilization and generated �1EGFP-expressing embryo per 1000 sorted. In compari-son, the piggyBac vectors displayed nearly 50% mobili-zation, but they yielded only 1 EGFP-expressing embryoper 50,000 sorted. Thus, the piggyBac vectors wereslightly less efficient at mobilization, but drastically lessefficient at generating EGFP-positive lines upon in-sertion. Consequently, we soon abandoned attempts togenerate large numbers of piggyBac protein trap inser-tions (Table 1), but continued to characterize the lineswe did recover to learn if they would shed any light onthe lower frequency of trapping observed.
Localizing insertions on the annotated genome: Toidentify candidate proteins that may have been fusedwithin individual lines, we determined the genomicDNA sequence flanking the insertion(s) in collabora-tion with the Berkeley Drosophila Genome Project(BDGP) gene disruption project (materials and
methods). In most cases, the sequences from boththe 59 and 39 vector end junctions mapped by BLASTanalysis to a unique insertion site within the Drosophilagenome sequence. Lines for which the sequencingreaction failed, the sequence matched repetitive DNA,or the 59 and 39 sequences differed (indicating that twoor more insertions were present in the stock) wererecycled back into the starting pool, and frequently aunique single insert was eventually identified. Alto-gether the insertions in 1375 C frame, 3172 B frame,and 1009 A frame P-element and 164 piggyBac A framelines were localized to unique genomic sites.
Knowing the genomic location of an insertionallowed us to predict which transcripts would incorpo-rate the EGFP exon and whether they would undergosplicing and translation into a functional fusion protein.First, we removed �1550 duplicate lines derived frompremeiotic clusters that were identified because theybore insertions identical in position and orientation tothose in one or more sibling lines. Of the 4170 in-dependent lines remaining, 2149 (52%) were associatedwith a gene correctly oriented for possible fusion (i.e.,located between �500 and the 39 end). We also clas-sified the ways an insert can be located relative to itsclosest annotated transcript into general categories asdiagrammed in Figure 1D and classified all the lines(Table 2).
Expression of a fusion protein is expected when theGFP exon resides between two coding exons within anintron of matching reading frame (class 1A). Such inser-tions made up only 23% of the total localized insertionsand defined 192 different genes (Table 4). Forty percentof insertions were close to an annotated gene but werenot predicted to express the EGFP exon (classes 2–4),while 37% of the lines were not located within 0.5 kb of acorrectly oriented gene. EGFP production from theselines might be explained by the use of unannotated
1508 M. Buszczak et al.
Figure 1.—Generation and classification of protein trap vector insertions. (A) Schematic of protein trap vectors (after Morin
et al. 2001). (B) Sample output from automated sorting of Drosophila embryos mobilized from site not expressing GFP. RareGFP1 embryos (red circles) registering above a threshold value are diverted by the machine and later used to start individualcultures. (C) Scheme for characterization of putative protein trap lines (see text). (D) Classification of the general types ofrelationships between transposon inserts and the local genome annotation. Classes 1–4 consist of insertions in the appropriateorientation located within a codon intron (class 1), a noncoding transcribed region (class 2), an upstream genomic region (class3), or an exon (class 4). For each class, the insert was either of the appropriate frame (subclass A) or of nonappropriate frame
(Continued)
Protein Trap Lines in Drosophila 1509
genic elements, by noncanonical splicing events, or bythe presence of a second insertion at a canonical site.
The Drosophila genome annotation is highly sup-ported by experimental evidence (Misra et al. 2002);hence the frequency of these discrepancies was sur-prising, but a similar outcome has been reported inprevious protein trap screens using both yeast (Ross-Macdonald et al. 1999) and Drosophila (Morin et al.2001). Some protein-coding genes may have beenmissed within cDNA libraries (Hild et al. 2003), and asubstantial number of genes may contain unannotatedfar-upstream promoters and alternative translation startsites. There might be a large class of RNA genes thathave escaped detection but that can drive expression ofEGFP using cryptic start sites. Alternatively, the highselective pressure used to isolate EGFP1 strains mayhave led to the recovery of rare events in which thenormal gene or transcript structure has changed.
Analysis of fusion transcripts: We sequenced por-tions of the fusion transcripts to address how EGFPexpression arises in lines of various classes. A limitednumber of 59 RNA sequences were obtained by 59 RACEanalysis or by TAIL–PCR (see materials and methods).As expected, several lines in class 1A were found toinitiate in normal exons upstream from the insertionsite. However, we discovered several CB lines that
spliced into the EGFP exon from the 59 P-element se-quences of the vector. The P-element promoter is highlyefficient at enhancer trapping, and the entire first exonof the transposase gene is present in the vector alongwith the start of intron 1, which is in the frame com-patible with CB lines. These lines were associated withnuclear localized EGFP, possibly due to fusion of thefirst exon of P transposase with EGFP. Further evidenceof enhancer trapping was observed in the analysis of lineCA07138. The EGFP RNA was fused to sequences, in-cluding an in-frame ATG start codon, derived by tran-scription and splicing from the noncoding strand of themini-white transgene carried in the P-element vector(Figure 1E). These observations suggest that EGFP ex-pression in a significant number of the lines depends ontranscripts initiated from within the transposon itself byenhancer trapping rather than on EGFP exon additionby splicing into exogenous transcripts.
Analysis of downstream transcript sequences: Togain additional information, we analyzed the sequencesdownstream from the EGFP exon from many lines in thecollection by carrying out RT–TAIL–PCR in a 96-wellformat (materials and methods). 39 sequences up to700 bp in length and defining the location of one tosix downstream exons were obtained for .1200 lines(Table 3). The pattern of downstream splicing allowedproductive fusions to be identified and indicated linesthat splice out-of-frame and likely become subject tononsense-mediated decay (NMD) (Vasudevan andPeltz 2003). Fusions within lines of classes 2–4 often
(subclass B) to fuse to the protein if splicing continued to the next annotated exon splice acceptor site. Class 5 consists of trans-posons inserted .0.5 kb from a correctly oriented annotated gene. (E) The structure of cryptic transcripts initiated within theDrosophila mini-white marker gene that contain an ATG codon and splice in frame to EGFP, thereby allowing expression inde-pendent of an endogenous transcript in some lines. (F) Western blot analysis of Dlg1 and eIF-4E protein production in controlanimals (y w, CC00380) and insertion lines predicted to trap Dlg1 (CC01936) or eIF-4E (CC00392, CC00375, and CC01492). (G)Abnormal nuclear accumulation of CG15015-EGFP in line CC01311 whose insertion lies within the FHC domain (left). Tissueculture cells expressing N-terminal or C-terminal fusions are found in the cytoplasm (center and right).
TABLE 1
Project summary
Type No. %
Total setup 7404 100CA 1344 18CB 4000 54CC 1870 25piggyBac A 190 3Aligned sequence 5720 77Clusters 1550 21Independent aligned 4170 56Gene hits 2149 29Spacing hits 2093 28Different genes 1154 16Protein traps 244Enhancer traps 256Novel gene/exon 50Unclassified 300Balanced stocks 878 12
TABLE 2
Line types
Lines
Type Code N %
Coding intron, in frame 1A 1337 23Coding intron, out of frame 1B 173 3Noncoding intron, in frame 2A 29 1Noncoding intron, out of frame 2B 429 81–500 bp upstream, in frame 3A 56 11–500 bp upstream, out of frame 3B 756 13Exon 4 804 14.500 bp to next oriented gene 5 2093 37Total 5677 100
See class definitions in Figure 1D.
1510 M. Buszczak et al.
were predicted to encode a ‘‘linker peptide,’’ whichmight or might not include a stop codon, derived fromthe translation of a small segment of upstream nucleo-tides. The RNA analysis also revealed the presence of asecond insertion in 14% of type 1A lines, but between 24and 50% of the other classes. The second insertionsfound within class 2–5 lines were often valid protein trapalleles (class 1A) and were frequently the true source ofthe lines’ EGFP production. This information allowedus to identify additional candidate fusions (Table 4), tocorrect many initial line classifications, and to moreaccurately estimate the total number of trapped genes(Table 1: 600–900). By the time lines were selectedand balanced, secondary insertions or damage did notcontribute substantially to the phenotypes reported inTable 4. Tests estimated the frequency of backgroundlethal mutations among balanced, saved lines at 7–21%,similar to the best transposon screens (Spradling et al.1999).
Novel splicing suggests new genes and exons: Wecompared the splicing observed downstream from theinserted exons with that of the genome annotation(Misra et al. 2002) to identify new Drosophila gene andtranscript isoform candidates. To identify new candi-date genes, we focused on lines inserted .0.5 kb froman appropriately oriented known gene and for whichRNA sequence data were also available. In 114 of these205 lines, the RNA sequence coincided with the positionand orientation of the insertion and therefore indicatedthe splicing pattern downstream of the single EGFPexon. Most of the lines spliced to one or more novelexons. At least 8 probably correspond to unannotatedgenes because they match previous gene predictions(Hild et al. 2003) or are supported by EST data (seeTable 4). Most of the remaining exons do not predictproteins with homologs in other species and representeither aberrant splicing events or novel or untranslatedexons.
Similar analysis of 297 lines with intron insertionsallowed us to test for novel exons and transcriptisoforms. We examined 443 splicing events and identi-fied a total of 35 (7.9%) that did not correspond tocurrent gene models (Misra et al. 2002). Since at least
some of these differences probably resulted from aber-rant splicing induced by the insertions, this represents amaximum estimate of the fraction of unannotatedgenomic exons and emphasizes the high accuracy andcompleteness of current Drosophila gene models, atleast for abundant transcript forms. Often, the RNAdata indicated which isoform among several predictedto fuse in frame is likely to predominate in ovariantissue. For example, we could determine that lineCA06613 in ovarian tissue predominantly fuses theSu(Tpl) gene rather than Mi-2, in whose transcriptionunit it also lies in frame.
The nature of the noncanonical splices observed wasinteresting. The most common events (21/35) were forinsertions in large introns to splice to a novel exon(s)prior to joining the predicted downstream exon. Somesimply appear to define alternative isoforms that skipexons or utilize different exon combinations not pre-viously documented. Some of these events may havebeen induced by the abnormal position of the EGFPexon within the primary transcript. However, severallines appear to define alternative isoforms because theyutilize different combinations of known exons in nopreviously documented transcript isoforms. Three linesutilized 59 start sites for exons that differed by 6, 21, or27 bp from the annotated exon. The CC01473 transcriptreads through an annotated exon into the adjacentintron and probably defines a novel alternate transcript39 end. Although we consider it likely that many of thesedifferences reflect endogenous Drosophila gene ex-pression, all of the candidate novel genes and transcriptisoforms require independent confirmation in strainslacking protein trap insertions. Such tests were beyondthe scope of our project.
Protein trap insertions likely vary in the fraction of theendogenous protein that is tagged with EGFP for avariety of reasons. First, in many lines only some of themultiple-transcript isoforms contributing to proteinproduction are tagged by the insertion and fused inframe. Second, the splicing efficiency of the EGFP exonmight vary due to its surrounding genomic context. Toinvestigate this issue, we analyzed the protein productsof tagged genes by Western blotting. The tagged pro-teins were easily distinguished from their wild-typecounterparts on the basis of size and by probing withprotein-specific and anti-EGFP antibodies (Figure 1F).In line CC01936 all three isoforms are predicted toincorporate the EGFP exon in frame, and nearly all ofthe ovarian Dlg1 protein incorporated EGFP as in-dicated by its mobility. A similar result was reportedpreviously in the case of line CB02119 (Buszczak andSpradling 2006), where the precursors of five of sixannotated transcripts are predicted to contain theinsertion, although only two fuse in frame. In contrast,only �50% of the ovarian wild-type eIF-4E protein istagged with EGFP (Figure 1F) despite the fact that six ofseven annotated eIF-4E transcripts initiate upstream
TABLE 3
RNA analysis
Type SuccessfulConfirmed
DNASecondinsert
%confirmed
CA 316 224 50 82CB 328 166 75 69CC 572 165 72 70Piggy A 13
Totals 1229 555 197 74
Protein Trap Lines in Drosophila 1511
TA
BL
E4
Iden
tifi
edtr
app
edp
rote
ins
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
14-3
-3e
CA
0650
614
0689
623R
1L
eth
alR
A1.
51
4R
B1.
51
4R
C1.
51
4R
D1.
51
41A
26-2
9-p
CA
0673
513
9873
133L
1h
vR
A0.
51
33A
Aco
nC
C00
758
2116
9155
2L1
Let
hal
RB
1.5
14
1AA
ctn
CC
0196
119
2733
2X
�h
vR
B2.
52
10R
A2.
52
10R
C2.
52
101A
AG
O1
CA
0691
498
4163
32R
�h
vR
C3.
53
8R
A3.
52
7R
B0.
52
71A
Alh
CC
0136
729
3565
33R
�h
vR
A5.
52
10R
D5.
52
101A
apt
CC
0139
219
4683
192R
1h
vR
B1.
51
5R
D1.
52
6R
E1.
53
6R
C1.
51
51A
apt
CC
0118
619
4738
082R
1h
vR
B1.
51
5R
D2.
52
6R
E2.
53
6R
C1.
51
51A
arg
CB
0357
941
7662
X1
hv
RA
3.5
15
1AA
rgk
CB
0549
290
5163
33L
�L
eth
alR
A3.
52
4R
B2.
52
31A
Arg
kC
B03
789
9056
078
3L�
Let
hal
RA
2.5
24
RB
0.5
23
1AA
tpa
CC
0031
916
7834
183R
1L
eth
alR
C1.
53
10R
A1.
51
10R
B0.
52
9R
D0.
53
61A
baz
CC
0194
117
0725
82X
1h
vR
A1.
51
71A
bel
CC
0086
944
8535
03R
�h
vR
A1.
51
41A
Bes
t1C
B02
354
5996
196
3R�
hv
RA
12
74
bT
ub
56D
CC
0206
915
3385
632R
�L
eth
alR
B1.
51
2R
C1.
52
2R
D1.
52
21A
bo
nC
B02
667
1641
8866
3R1
Sem
ilet
hal
RA
0.5
110
3BB
sgC
A06
978
8104
393
2L1
hv
RB
2.5
27
RA
2.5
27
RD
2.5
27
RC
2.5
27
1Ab
un
CB
0343
112
4829
432L
�h
vR
A3.
51
5R
B1
13
RD
1.5
13
RE
2.5
24
1AC
amC
C00
814
8149
270
2R1
Let
hal
RA
2.5
25
1AC
AP
CA
0692
461
9037
82R
1h
vR
I9.
51
14R
H7.
53
12R
J8.
52
13R
G7.
53
121A
CA
PC
A07
185
Tra
nsp
oso
n2R
1h
vR
I/R
F3.
51
141A
Cat
CC
0090
718
8159
513L
1L
eth
alR
A1.
51
31A
Cg
CC
0146
910
0631
842R
1h
vR
D1.
51
10R
B1.
51
11R
C2.
53
11R
A2.
52
111A
CG
1072
4C
A07
499
1340
6231
3L1
hv
RA
1.5
17
RB
1.5
17
1AC
G11
138
CA
0684
412
4725
70X
�h
vR
C1.
51
41A
CG
1125
5C
B04
917
1301
5302
3L�
hv
RA
1.5
13
1AC
G11
266
CC
0139
170
3315
72L
1h
vR
A2.
52
6R
D2.
58
9R
G2.
57
8R
F1.
56
71A
CG
1196
3C
C06
238
4764
704
3R1
Sem
ilet
hal
RA
2.5
29
1AC
G12
163
CC
0062
510
7693
53R
�h
vR
A1.
51
6R
B1.
51
51A
CG
1278
5C
C06
135
1209
8103
3R�
Let
hal
RA
11
64
CG
1392
0C
C01
646
1650
510
3L1
Let
hal
RA
1.5
13
1AC
G14
207
CB
0206
919
5024
92X
1L
eth
alR
A1.
51
4R
B1.
51
51A
CG
1440
CA
0728
783
3169
9X
1h
vR
A1.
51
51A
CG
1464
8C
A06
610
2292
313R
1h
vR
A1.
51
6R
B1.
53
61A
CG
1465
6C
A06
996
6240
153R
�L
eth
alR
A2.
51
31A
CG
1592
6C
B04
063
1236
5107
X1
hv
RA
0.5
25
3AC
G16
00C
B03
410
3416
244
2R�
hv
RA
1.5
23
RC
1.5
13
RB
1.5
23
1AC
G17
273
CC
0129
416
6549
863R
�L
eth
alR
A1.
51
51A
CG
1764
6C
B02
833
1737
431
2L1
hv
RB
1.5
212
RA
0.5
212
2BC
G18
88C
B02
075
5434
043
2R�
hv
RA
1.5
12
1B
(con
tin
ued
)
1512 M. Buszczak et al.
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
CG
1910
CC
0149
127
5738
623R
1h
vR
B1.
52
4R
A1.
51
4R
D0.
52
41A
CG
3036
CA
0680
148
9415
72L
1h
vR
A2.
52
71A
CG
3101
2C
A06
686
2664
0622
3R1
hv
RC
1.5
16
1AC
G31
012
CA
0681
026
6460
373R
1h
vR
C4.
51
6R
D1.
51
31A
CG
3169
4C
A07
748
2873
276
2L�
hv
RA
1.5
15
1AC
G32
062
CC
0051
110
5156
083L
1h
vR
B2.
52
13R
D2.
52
141A
CG
3242
3C
C00
236
5177
892
3L�
hv
RA
3.5
310
RD
2.5
29
RB
3.5
310
RC
12
81A
CG
3247
9C
A06
614
8820
413L
1L
eth
alR
A4.
52
101A
CG
3248
6C
C00
904
3060
297
3L�
hv
RD
1.5
18
1AC
G32
560
CA
0677
217
6324
69X
1h
vR
A3.
51
81A
CG
3287
CB
0448
326
9996
72R
1h
vR
B2.
52
5R
C1.
51
41A
CG
3393
6C
C01
586
5176
796
3R1
Let
hal
RA
2.5
36
RB
1.5
25
2BC
G38
10C
A07
694
1802
294
X�
hv
RC
1.5
24
RB
12
4R
A0.
51
32B
CG
3939
CA
0756
230
6987
1X
1h
vR
A1.
51
21A
CG
5059
CA
0692
620
5097
423L
�h
vR
A1.
51
5R
C1.
52
5R
B1.
51
5R
D1.
52
51A
CG
5060
CC
0052
616
0790
413R
1L
eth
alR
A1.
51
71A
CG
5130
CB
0361
920
4865
523L
�h
vR
A1.
52
4R
B1
24
2AC
G51
74C
A07
176
1430
9292
2R1
hv
RJ
1.5
16
RI
1.5
16
RA
1.5
16
RH
1.5
15
1AC
G53
92B
A00
207
1500
1227
3L�
RA
2.5
28
1AC
G61
51C
A07
529
1580
8775
3L�
Let
hal
RA
2.5
15
RC
2.5
15
RB
2.5
15
1AC
G63
30C
B03
223
2277
9020
3R�
Let
hal
RB
2.5
26
RA
0.5
15
1AC
G64
16C
C00
858
8627
040
3L1
Let
hal
RE
3.5
19
RF
3.5
19
RA
2.5
28
RG
2.5
25
1AC
G64
24C
C00
677
1360
4099
2R�
hv
RA
23
4R
B0.
52
34
CG
6783
CA
0696
073
9231
33R
�h
vR
B1.
51
3R
C1.
51
3R
A1.
52
41A
CG
6854
CA
0733
215
0993
553L
1L
eth
alR
C1.
51
4R
A1.
52
4R
B1.
52
51A
CG
6930
CA
0655
675
9741
03R
�h
vR
A0.
51
5R
B2.
52
6R
C1.
51
51A
CG
6945
CC
0086
415
0882
633L
�L
eth
alR
A0.
53
33A
CG
7185
CC
0064
583
0808
93L
�h
vR
A1.
51
71A
CG
7484
CB
0410
117
6472
863L
1h
vR
B0.
51
33A
CG
8209
CB
0208
679
6969
33L
1h
vR
A1.
51
31A
CG
8213
BA
0016
948
6330
92R
�R
A1.
51
81A
CG
8351
CA
0722
846
3129
93R
1L
eth
alR
A4.
51
51A
CG
8443
CA
0660
412
0751
982R
1h
vR
A1.
51
71A
CG
8552
CA
0735
281
5958
02L
�h
vR
A0.
51
73A
CG
8583
CA
0660
373
5438
83L
1h
vR
A1.
51
41A
CG
8920
CC
0082
516
2081
992R
1h
vR
C2.
52
8R
B2.
52
8R
A1.
52
41A
CG
9331
CB
0496
220
8245
592L
1h
vR
B1.
52
5R
D1.
52
5R
C1.
52
6R
E1.
51
51A
CG
9772
CB
0218
816
3496
3R�
hv
RB
1.5
15
RA
0.5
15
RC
0.5
12
1AC
G97
96C
C00
817
9226
998
3R�
hv
RA
1.5
14
1AC
G98
94C
C00
719
2755
189
2L1
hv
RB
2.5
24
RA
1.5
13
1AC
p1
CC
0137
798
5205
72R
1h
vR
B2.
52
4R
A2.
52
4R
C2.
51
41A
(con
tin
ued
)
Protein Trap Lines in Drosophila 1513
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
Crc
CA
0650
754
5615
83R
�h
vR
A2.
51
41A
Crp
CB
0307
316
2802
972L
�h
vR
A1.
51
31A
Cyc
BC
C01
846
1869
4008
2R�
hv
RA
1.5
15
RD
1.5
15
RB
11
5R
C0.
51
41A
Dek
CC
0092
112
7439
402R
1h
vR
A1.
51
11R
C1.
51
11R
B0.
51
10R
D1.
51
91A
Dek
CA
0661
612
7447
772R
1h
vR
A2.
51
11R
C2.
51
11R
B1.
51
10R
D2.
51
91A
des
at1
CC
0169
482
6973
83R
1h
vR
A1.
52
5R
C1.
52
5R
E1.
52
5R
B1.
52
52B
Df3
1C
B02
104
2162
9393
2L�
hv
RB
2.5
24
RA
1.5
13
RF
1.5
13
1Ad
lg1
CC
0193
611
2862
74X
1h
vR
F5.
51
9R
B6.
52
17R
H5.
52
16R
E2.
52
131A
DL
PC
A06
573
6480
860
2L1
hv
RA
0.5
14
3AD
oa
CB
0388
924
7174
833R
1h
vR
A3.
51
13R
B3.
52
13R
E0.
52
12R
F3.
51
131A
do
mB
A00
164
1721
1471
2R1
RD
1.5
215
RA
1.5
214
RE
1.5
211
2AD
pC
A06
594
9111
298
2R1
hv
RA
1.5
19
1Ad
rlC
C00
251
1919
0343
2L1
hv
RA
0.5
14
3BE
f2b
CC
0192
421
6816
942L
�L
eth
alR
A1.
51
5R
C1.
52
5R
B1
25
1Aef
fC
C01
915
1056
5092
3R�
Let
hal
RA
2.5
26
1AeI
F-2
bC
C06
208
1251
9527
3L�
Let
hal
RA
1.5
13
1AeI
F3-
S9C
B04
769
1342
3919
2R1
hv
RB
1.5
25
RA
11
42A
eIF
-4a
CB
0372
159
8247
42L
1L
eth
alR
A1.
51
5R
C1.
51
5R
B0.
51
5R
D0.
51
51A
eIF
-4E
CC
0039
293
9507
83L
�h
vR
D1.
52
6R
B1.
52
6R
C1.
51
5R
A1.
52
61A
eIF
-5A
BA
0015
519
9456
882R
1R
B1.
52
4R
A1.
52
42A
eIF
-5C
BA
0028
014
2519
23R
�R
A1.
52
8R
C2.
53
9R
F1.
52
8R
D1.
52
82A
Eip
63E
CA
0674
235
6939
93L
1h
vR
D2.
51
11R
E3.
52
12R
A4.
53
12R
B3.
52
111A
Elf
CA
0651
512
4357
892L
1h
vR
A1.
51
71A
eRF
1C
B03
931
2034
2600
3L1
Let
hal
RC
2.5
28
RB
2.5
28
RE
2.5
28
RG
2.5
28
1AF
as2
CB
0361
340
2945
4X
�h
vR
A9
210
4Afa
xC
C01
359
1640
3817
3L�
Let
hal
RA
1.5
15
RC
1.5
15
1AF
er1H
CH
CA
0650
326
2127
913R
�L
eth
alR
A1.
51
3R
B2.
52
4R
C2.
52
4R
D2.
52
41A
Fer
2LC
HC
A07
607
2621
5006
3R1
Let
hal
RA
33
4R
B3
34
RC
11
24
Fim
CC
0149
317
1857
87X
�h
vR
A1.
51
5R
C2.
52
6R
D0.
51
51A
Fkb
p13
CA
0734
017
3850
642R
�h
vR
A0.
52
5R
B1.
51
51A
Fp
ps
CB
0493
771
9469
82R
�h
vR
A1.
51
61A
Fs(
2)K
etC
A07
301
2073
5659
2L1
hv
RA
2.5
26
1AG
di
CA
0710
894
9396
72L
�L
eth
alR
A1.
51
31A
gish
CB
0280
412
1063
143R
1h
vR
D2.
52
12R
B2.
52
12R
E2.
52
12R
A0.
53
121A
Glc
AT
-SC
A07
168
9616
795
2L1
hv
RA
1.5
15
RB
0.5
15
1AG
liC
B02
989
1576
2784
2L�
hv
RA
0.5
27
RB
0.5
25
RC
0.5
38
RD
0.5
27
3AG
-oa
47A
CA
0665
863
3178
32R
1Se
mil
eth
alR
B2.
52
8R
C2.
53
9R
D2.
52
8R
E2.
52
81A
gp21
0C
C00
195
1647
884
2R1
hv
RA
0.5
119
5H
DA
C4
CA
0713
413
1728
50X
�h
vR
A2.
52
14R
C1.
51
121A
hep
hC
C00
664
2776
3272
3R�
Let
hal
RB
4.5
414
RA
4.5
414
RK
3.5
313
RH
5.5
516
1AH
is2A
vC
C00
358
2269
3293
3R1
Let
hal
RA
2.5
14
1A
(con
tin
ued
)
1514 M. Buszczak et al.
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
ho
mer
CB
0212
167
2293
32L
�h
vR
A1.
51
6R
B0.
51
6R
C1.
51
71A
ho
wC
A07
414
1788
1934
3R1
Let
hal
RA
2.5
28
RB
2.5
27
RC
2.5
28
1AH
rb87
FC
C00
189
9485
980
3R�
hv
RA
1.5
13
RB
1.5
13
1AH
rb98
DE
CC
0156
324
4259
693R
1h
vR
A1.
51
5R
E1.
51
5R
B0.
51
5R
C0.
51
51A
Hrb
98D
EC
A06
921
2442
7038
3R1
Let
hal
RA
2.5
15
RE
2.5
15
RB
2.5
15
RC
2.5
15
1AH
sc70
Cb
CB
0265
614
0312
993L
1L
eth
alR
A2.
52
6R
B2.
52
6R
C1.
51
51A
Imp
CB
0457
310
7026
76X
�h
vR
F2.
52
6R
H3.
53
7R
G2.
52
6R
D2.
52
61A
Ind
yC
C00
377
1883
3638
3L�
hv
RB
1.5
19
RC
1.5
29
RA
0.5
29
1Ain
x7C
B04
539
6892
590
X�
hv
RB
4.5
35
1Bju
mu
CC
0029
461
8224
53R
1h
vR
A1.
51
31A
Jup
iter
CB
0519
074
3065
13R
�h
vR
D1.
51
4R
H1.
51
5R
A1.
51
5R
E0.
52
61A
kay
CC
0115
625
6082
143R
1h
vR
A1.
51
3R
B0.
51
31A
kek1
CB
0219
012
8228
342L
�h
vR
A0.
51
23A
kis
CC
0146
622
0632
2L�
hv
RA
12.5
218
RB
1.5
17
1Aki
sC
C00
801
2219
242L
�h
vR
A12
.52
18R
B1
17
1Al(
1)G
0084
CC
0136
819
5256
02X
�h
vR
A4.
52
121A
l(1)
G01
68C
C00
492
1539
3000
X1
hv
RA
2.5
17
RB
0.5
26
1Al(
1)G
0320
CA
0668
494
4736
8X
�h
vR
A1.
51
21A
l(2)
0871
7C
A06
962
1468
8632
2R�
hv
RB
2.5
25
RA
1.5
14
1Al(
3)02
640
CA
0746
013
3628
53L
1L
eth
alR
A2.
51
41A
l(3)
82F
dC
A07
520
1123
169
3R�
Let
hal
RL
7.5
319
RB
7.5
319
RJ
7.5
319
RF
11
131A
Lam
CB
0374
955
4307
02L
1L
eth
alR
A1.
52
42A
Lam
CC
B04
957
1046
2132
2R�
Let
hal
RA
1.5
14
1Ala
rpC
C06
230
2415
2038
3R�
hv
RB
1.5
46
RC
1.5
26
RA
1.5
46
RD
1.5
16
2AL
sd-2
CA
0705
114
9696
07X
�h
vR
A1.
51
41A
M6
CA
0660
221
5024
473L
1h
vR
A1.
52
52B
Map
205
CC
0010
927
8915
533R
�h
vR
A1.
51
21A
Map
mo
du
lin
CC
0139
813
7568
942R
�h
vR
B3.
53
7R
A2.
52
61A
mas
kC
C00
924
2006
0119
3R1
hv
RA
1.5
116
RB
1.5
116
1AM
dh
CB
0496
822
9781
923R
1h
vR
A1.
51
71A
me3
1BC
B05
282
1024
0237
2L1
hv
RA
1.5
15
RB
1.5
25
1AM
enC
C06
325
8545
125
3R�
hv
RB
2.5
14
RA
2.5
14
1AM
i-2C
A06
598
1990
1556
3L�
Let
hal
RA
1.5
15
RA
1.5
55
RB
1.5
55
1AM
ob
1C
B04
396
1154
6502
3L�
hv
RC
1.5
14
RD
1.5
14
1Am
od
(md
g4)
CA
0701
217
2003
823R
�h
vR
R4.
52
5R
A4.
52
5R
F4.
52
5R
D4.
52
51A
mu
bC
C01
995
2190
9824
3L1
hv
RA
7.5
29
RB
7.5
29
1AN
etB
BA
0025
314
5964
60X
�h
vR
A3.
52
91A
NFA
TC
A07
788
1353
4781
X1
hv
RA
1.5
110
1AN
lpC
C01
224
2583
1370
3R1
hv
RA
2.5
13
1AN
md
mc
CB
0264
748
7368
43R
�h
vR
A1.
52
3R
B1.
51
3R
A1
12
1AN
rx-I
VC
A06
597
1214
1797
3L1
hv
RA
1.5
112
RB
1.5
114
1A
(con
tin
ued
)
Protein Trap Lines in Drosophila 1515
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
Od
aC
C01
311
Tra
nsp
oso
n3L
�L
eth
alR
A2.
51
111A
Od
aC
B03
751
8060
372
2R�
hv
RA
1.5
12
1AO
rc2
CB
0440
097
8960
43R
1h
vR
A0.
51
5R
A0.
52
43A
osa
CC
0044
513
5294
723R
�L
eth
alR
B4.
52
16R
A4.
52
161A
Pab
p2
CC
0038
040
1973
62R
1h
vR
A2.
52
5R
B2.
52
51A
Pas
t1C
B02
132
8523
601
3R1
hv
RA
1.5
14
RB
11
31A
Pd
e8C
A07
101
1955
4469
2R1
hv
RA
3.5
219
RE
4.5
320
RB
0.5
218
1AP
di
CA
0652
615
1346
693L
�L
eth
alR
A1.
51
2R
B1.
51
2R
D1.
52
2R
C0.
51
21A
Pd
p1
CB
0224
678
4794
43L
�h
vR
F1.
51
5R
B1.
52
6R
G1.
52
71B
Pic
ot
CA
0747
412
5487
872R
�L
eth
alR
A2.
52
61A
Pkn
CC
0165
451
5799
52R
�h
vR
B2.
52
10R
C2.
52
11R
F2.
52
10R
D2.
52
101A
Pli
CB
0304
019
7125
733R
�h
vR
A2.
52
101A
Pm
m45
AC
B02
099
4996
761
2R�
hv
RB
11
4R
A1
13
4Ap
olo
CC
0132
620
3036
433L
1L
eth
alR
A1.
51
51A
psq
CC
0164
5T
ran
spo
son
2R1
Let
hal
RB
3.5?
210
1AP
tp10
DC
C06
344
1153
8719
X1
hv
RB
2.5
214
RC
1.5
110
1Ap
Uf6
8C
A06
961
1501
291
3L�
hv
RC
3.5
46
RD
3.5
68
RA
2.5
15
RB
2.5
46
1Ap
um
CC
0047
949
8377
13R
�L
eth
alR
A8.
52
13R
D8.
52
13R
C8.
52
13R
B6.
51
111A
Rab
11C
A07
717
1693
7950
3R1
Let
hal
RA
1.5
14
RB
2.5
25
1AR
ab2
CA
0746
525
8459
22R
1L
eth
alR
A1.
51
41A
Rm
62C
B02
119
1833
559
3R�
Mfs
teri
leR
E1.
52
6R
C2.
53
7R
D2.
52
6R
B1.
52
61A
Rp
L10
Ab
CB
0265
311
8159
183L
1L
eth
alR
A0.
51
3R
C0.
52
3R
B0.
51
23A
Rp
L13
AC
C01
920
1449
810
3R1
hv
RB
0.5
13
3AR
pL
30C
B03
373
1900
9261
2L�
Let
hal
RB
0.5
23
3AR
tnl1
CA
0652
350
0077
22L
�h
vR
B3.
52
7R
E2.
51
6R
D0.
52
6R
A1.
51
51A
Rtn
l1C
A06
547
4997
815
2L�
hv
RB
3.5
27
RE
2.5
16
RD
2.5
26
RA
0.5
15
1AS6
kC
C01
583
5802
962
3L�
hv
RA
1.5
110
1ASa
p-r
CA
0724
126
7145
393R
�h
vR
A1.
51
7R
B1
26
1Asa
r1C
A07
674
1818
4952
3R1
Let
hal
RA
4.5
26
1Asc
rib
CA
0768
322
3937
843R
1h
vR
C10
214
4sd
CA
0757
515
7120
98X
1h
vR
B3.
53
12R
A1.
52
101A
Sdc
CC
0087
117
3629
462R
�h
vR
C2.
52
6R
A2.
52
7R
B2.
52
61A
Sec6
1aC
C00
735
6479
544
2L�
Let
hal
RA
2.5
14
1ASe
ma-
1aC
A07
125
8592
539
2L1
hv
RA
1.5
120
1ASe
ma-
2aC
A06
989
1241
1885
2R1
hv
RC
2.5
215
RB
2.5
215
RA
2.5
215
1Asg
gC
A06
683
2536
739
X1
hv
RB
2.5
210
RA
2.5
210
RE
2.5
210
RF
2.5
210
1ASh
3bC
C01
823
7361
764
3L�
Let
hal
RB
1.5
12
RA
2.5
23
1ASi
nC
C01
921
2102
4944
3L1
Let
hal
RA
11
34
sls
CA
0674
421
0759
33L
�h
vR
C0.
51
1R
A2.
52
141A
smC
C00
233
1550
4045
2R�
hv
RA
2.5
210
RC
2.5
29
1Asm
i21F
CA
0721
111
1909
42L
�h
vR
B2.
52
5R
A1
24
1A
(con
tin
ued
)
1516 M. Buszczak et al.
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
sno
CC
0103
213
1046
08X
�h
vR
B1.
51
12R
A1.
51
31A
snR
NP
69D
CB
0293
212
7277
083L
�L
eth
alR
A0.
51
23A
sop
CB
0229
498
9750
32L
�L
eth
alR
A1
22
4ASP
oC
kC
A06
644
2276
6451
3L1
hv
RA
0.5
24
RC
0.5
25
3ASp
t6C
A07
692
6162
464
X1
hv
RA
2.5
16
1Asq
dC
B02
655
9470
746
3R�
hv
RB
1.5
17
RC
1.5
16
RA
1.5
15
1Ast
wl
CA
0724
914
4026
793L
�L
eth
alR
A1.
51
21A
Su(v
ar)2
-10
CC
0201
350
0459
12R
1h
vR
I1.
51
8R
H1.
51
7R
D1.
51
9R
C1
18
1ASu
rf4
CC
0168
411
1712
913R
�L
eth
alR
A1.
51
4R
B2.
52
51A
sws
CC
0171
178
6128
8X
�h
vR
A1.
51
111A
Sxl
CB
0556
269
8603
4X
�Se
mil
eth
alR
B2
23
RC
1.5
16
RN
1.5
16
RO
1.5
18
1AT
ER
94C
B04
973
5877
016
2R1
hv
RA
1.5
15
RB
1.5
24
1AT
m1
CC
0171
011
1161
233R
1L
eth
alR
B3.
52
10R
J3.
52
10R
D3.
52
10R
G3.
52
101A
Tm
1C
C00
578
1111
7364
3R1
Let
hal
RB
3.5
210
RJ
3.5
210
RD
3.5
210
RG
3.5
210
1Atm
od
CC
0041
626
3892
393R
1h
vR
E2.
52
7R
D1.
51
6R
F2.
52
7R
C2.
52
71A
tmo
dC
A07
346
2640
0579
3R1
Let
hal
RE
6.5
27
RD
5.5
16
RF
6.5
27
RC
6.5
27
1AT
op
1C
C01
414
1521
4506
X1
hv
RA
1.5
18
1Atr
a2C
C01
925
1049
1357
2R�
hv
RC
33
7R
B2.
52
5R
A2.
52
6R
E1
14
4tr
alC
A06
517
1250
8905
3L1
Let
hal
RA
1.5
17
1AT
rxr-
1C
A06
750
8137
659
X1
hv
RA
1.5
14
RB
11
41A
Tsp
42E
eC
C01
420
2903
327
2R1
hv
RA
2.5
25
1AT
sp96
FC
C01
830
2170
7093
3R�
Let
hal
RA
13
RA
11
51B
tsr
CC
0139
319
9329
912R
�L
eth
alR
A1.
51
41A
Tu
do
r-SN
CC
0073
726
2842
3L�
hv
RA
1.5
14
1Atu
nC
C00
482
1168
1495
2R�
hv
RG
2.5
216
RA
2.5
216
RC
2.5
216
RE
2.5
215
1Atw
inC
A06
641
2004
4629
3R�
hv
RB
4.5
27
RE
5.5
28
RC
5.5
18
RD
5.5
34
1AU
ev1A
CA
0749
653
5882
13L
�L
eth
alR
A1.
51
41A
VA
Ch
TC
A06
666
1453
8229
3R1
Let
hal
RA
22
2R
A1.
51
81A
Vh
a13
CA
0764
415
4698
163R
�L
eth
alR
A1.
51
31A
Vh
a16
CA
0670
825
1899
62R
�L
eth
alR
A2.
52
4R
B2.
52
4R
C1.
51
3R
D2.
52
41A
Vh
a26
CC
0138
014
1789
23R
1L
eth
alR
B2.
52
5R
A1.
51
41A
Vh
a55
CA
0763
484
5273
83R
�L
eth
alR
B2.
52
4R
A1.
51
31A
vib
CB
0533
015
0459
623R
�L
eth
alR
A2.
52
91A
vkg
CC
0079
150
1900
52L
�h
vR
A2.
52
91A
vsg
CA
0700
497
0777
13L
1h
vR
A1.
51
2R
D1.
51
2R
B2.
52
3R
C2.
52
31A
xl6
CB
0324
869
1878
62L
�h
vR
A1.
51
2R
A0.
51
21A
yps
CA
0679
112
1172
673L
�L
eth
alR
A1.
51
41A
zip
CC
0162
620
8968
452R
�L
eth
alR
B2.
52
14R
A2.
51
141A
Zn
72D
CA
0770
316
1026
553L
�h
vR
A4.
53
5R
B4.
53
61A
CB
0231
846
3810
23R
1h
v5B
CC
0130
980
2255
63L
1h
v5B
(con
tin
ued
)
Protein Trap Lines in Drosophila 1517
from the insertion site. These examples suggest that theEGFP incorporation level varies between genes in largepart due to the tagging of a subset of gene isoforms thatthemselves display differing expression levels.
In contrast, we found little evidence of short-rangecontext effects. Less than twofold variation in proteinexpression as measured by Western blotting with anti-EGFP antibodies was observed between lines withinsertions at sites within the same intron (N. Srivali
and A. Spradling, unpublished data). However, theseexperiments did reveal that insertions of the piggyBac-based vector consistently produced less EGFP proteinthan lines with the corresponding P-element-basedvector that were inserted in the same intron (N. Srivali
and A. Spradling, unpublished data). This suggeststhat some aspect of the structure of the piggyBac vectorused compromised splicing efficiency.
Identification of protein trap and enhancer trap corecollections: To help identify a core set of valid gene traplines we examined the EGFP expression patterns ofmany nonredundant lines in both the adult ovary andlarval salivary gland. There was a strong correlationbetween insert location and the nature of the stainingpatterns observed. More than 95% of lines in class 1A,the in-frame fusions, produced patterned EGFP expres-sion above background in at least some ovarian cells orin the salivary gland. In contrast, a much smaller, but stillsignificant, fraction of lines in classes 2–5 also expressedEGFP in a regulated manner. Combining informationon insert location, genome annotation, RNA transcriptsequence, and EGFP pattern, we identified a set of 244lines predicted to produce fusions between EGFP and431 protein isoforms of 233 distinct genes (Table 4).These new protein trap lines express EGFP in a widevariety of cellular compartments under diverse devel-opmental controls (see Figures 2 and 3).
A second major class of lines in the collection showedthe properties expected of EGFP enhancer traps (Ta-ble 5). These CB lines were susceptible to enhancertrapping from the P-element promoter, were locatedmostly upstream of the annotated start site, expressedEGFP in nuclei, and the RNA analysis, if available, didnot indicate fusion in frame downstream. The expres-sion patterns of such insertions in well-characterizedgenes supported this interpretation. For example, lineCB02030 in ptc showed strong expression in the innersheath cells of the germarium (Forbes et al. 1996), whileline CB04353 in Dad strongly expressed in the germlinestem cells and immediately downstream germ cells (Kai
and Spradling 2003; Casanueva and Ferguson 2004).The characterization of a significant number of lines
in the collection remains incomplete (Table 1). Many ofthese contain insertions located .0.5 kb from an ap-propriately oriented annotated gene but where RNAsequence data were not obtained. Others are insertedwithin genes at locations not predicted to generateprotein fusions or gene traps. Some of these lines
TA
BL
E4
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
CB
0265
812
5220
672R
�h
v5B
CC
0167
012
9448
373R
1h
v5B
CC
0193
221
8568
623R
�h
v5B
CA
0743
656
4435
82R
�sl
5BC
B03
064
8536
402
2L1
hv
5BC
C00
523
2280
6145
3R�
hv
5B
aA
nn
ota
tio
nre
leas
e5.
0.
1518 M. Buszczak et al.
express EGFP in the ovary, and we cannot rule out thatothers express transcripts in other tissues. On the basisof the processing of previous lines in these same classesthis suggests that a significant number of new enhancertraps and a handful of new protein traps could be sortedout from a larger number of lines with secondary inser-tions in already trapped genes. Consequently, thenumber of different genes trapped in the collectionis likely to increase beyond the 600 or so currentlycharacterized.
Even when a protein is tagged in frame, the insertionof the EGFP sequence is expected to disrupt its normalstructure and localization some fraction of the time. Forexample, line CC01311 traps CG15015, the Drosophilahomolog of mammalian Cip4, a modular proteinthat interacts with Cdc42 and helps to regulate theactin cytoskeleton (Aspenstrom 1997). The CC01311P element is inserted between the first two coding exonsand thus disrupts the FES/Cip4 domain of CG15015(Figure 1G). The protein trap fusion product localizesto the nucleolus while transgenes of CG15015 tagged at
either the very N or C termini localize to the cytoplasmwhen expressed in S2 cells. We observed that 3 otherprotein trap fusion products of 107 analyzed accumu-lated in the nucleus when they were expected to residein the cytoplasm.
Diverse behavior of tagged proteins: The 244 iden-tified protein trap lines of the core collection exhibitextremely diverse patterns of EGFP expression, suggest-ing that proteins occupying a wide range of cellularcompartments can be tagged in vivo. We observed manylines with EGFP fluorescence in the cytoplasm (Figure2A) or nucleus (Figure 2B) as expected. Localization tointracellular membranous structures was also com-monly seen, as illustrated by a trap in the ER structuralcomponent Reticulon-1 (Figure 2D). Gene trapping ofsecretory proteins is thought to be inefficient due toretention of the fusion proteins in the ER where theactivity of the fusion gene may be affected (Skarnes
et al. 1995). The full-length protein traps we constructedcould label secreted proteins, as indicated by theextracellular localization of EGFP in CA06735, a fusion
Figure 2.—Protein traps for the study of pro-tein subcellular localization. Patterns of subcellu-lar localization of EGFP expressed from thefollowing lines that trap the indicated genes wereobserved: (A) cytoplasmic, CA06607 (CG17342);(B) nuclear, BA00164 (dom); (C) enhancer trapnuclear, CB04353 (Dad) in stem cells and earlycystocytes; (D) endoplasmic reticulum,CA06523 (Rtnl1); (E) extracellular, CA06735 (ca-thepsin K); (F) membrane, CA07474 (Picot); (G)apical, CC01941 (Baz); (H) chromatin, CA07249(stwl); (I) nuclear membrane, CA07301(Fs(2)Ket); (J) lipid droplets, CA07051 (Lsd-2);(K) novel structure, CA07332 (CG6854); (L)novel structure, CC01326 (polo).
Protein Trap Lines in Drosophila 1519
Figure 3.—Protein traps for the study of developmental regulation during oogenesis. The expression in the ovary of variousprotein trap lines is shown to illustrate how they can be used to associate genes with developmental processes. (A) Schematic of anovariole tip. The terminal filament (TF), cap cells (CpC), germline stem cells (GSC), cystoblast (CB), and escort cells (ES) areillustrated. (B–H) Cell type identification. (B) Terminal filament, CB02069 (CG14207); (C) cap cells, CB03410 (CG1600); (D)escort cells, CC01359 (fax); (E) follicle cells, CC06135 (CG12785); (F) outer border cells and posterior follicle cells, CB02349(NK7.1); (G) oocyte nucleus equals the germinal vesicle (arrow), CB04219 (CG13776); (H) novel sheath cell type, CC01646(CG12920). (I–L) Analyzing developmental processes. (I) Novel structure in center of midstage follicle, CC00523 (Msn); (J) fu-some, CC01436 (Sec61); (K) germline and somatic ring canals, CA07004 (Vsg); (L) chorion gene amplification, CB04400 (Orc2).(M–P) Localization of proteins in the oocyte. (M) Posterior pole, CC01442 (EIF-4E); (N) posterior pole, CA06517 (Tral); (O)posterior pole, CC00236 (CG32423); (P) anterior pole, CA07529 (CG6151). (Q–T) Developmental regulation of gene expressionin early germ cells. (Q) Control with little change, CC01961 (Actn); (R) GSC/CB enriched, CC06238 (CG11963); (S) GSC andearly cyst enriched, CC01915 (Eff); (T) GSC and forming cyst enriched, CC01442 (EIF-4E).
1520 M. Buszczak et al.
TA
BL
E5
En
han
cer
trap
alle
les
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
1.28
CB
0449
223
8942
52R
1h
vR
A0.
51
13
4EH
PC
B05
417
1993
4636
3R�
hv
RA
11
44
abo
/l(
2)06
225
CB
0327
210
9753
072L
�h
vR
A0.
51
23
Ad
f1C
B02
276
2550
455
2R�
Let
hal
RA
2.5
14
1A
dk3
/C
G46
74C
B04
307
6715
035
3R�
hv
RB
0.5
22
RA
0.5
22
3ao
pC
B02
762
2178
398
2L�
Let
hal
RB
12
44
AT
PC
L/
CG
8370
CB
0442
711
8887
672R
�h
vR
A0.
52
93
bic
CB
0385
587
5772
72R
1L
eth
alR
A0.
52
2R
B0.
51
13
bn
lC
B02
854
1566
2816
3R�
hv
RA
0.5
25
RB
0.5
25
3b
oca
CB
0207
034
5428
32R
�h
vR
A0.
51
23
bo
cksb
eute
lC
B03
586
5416
706
3R�
hv
RA
11
24
Brd
CB
0266
514
9654
673L
�h
vR
A1
14
btn
CB
0350
118
4140
773R
1h
vR
A1
11
4C
BP
CB
0376
272
3096
5X
1h
vR
A0.
51
43
Cct
1C
B02
171
1546
105
3L1
hv
RA
12
44
cdi
CB
0512
914
9199
503R
1h
vR
A1
28
22
CG
1022
5/T
bp
-1C
B02
343
1957
8503
3R1
hv
RA
0.5
13
3C
G10
272
CB
0356
722
3308
33R
1L
eth
alR
A1.
53
9R
B1
28
RC
1.5
36
RD
1.5
39
2C
G10
399
CB
0475
369
6041
62L
1h
vR
A0.
51
23
CG
1044
4C
B05
265
1616
1990
2R�
Let
hal
RA
11
24
CG
1086
3C
B05
229
3942
458
3L1
hv
RB
11
63
CG
1099
0C
B02
351
1364
8912
X1
hv
RA
12
45
CG
1104
2C
B03
720
9046
478
X1
hv
RA
0.5
11
3C
G11
16C
B03
511
1045
469
3R�
hv
RA
11
5R
B1
14
RC
11
54
CG
1138
2C
B02
249
1103
926
X1
hv
RB
0.5
11
3C
G11
526
CB
0222
833
2588
33L
1h
vR
A1.
52
3R
B1.
51
32
CG
1153
7C
B03
533
3143
415
3L�
hv
RC
11
84
CG
1163
8/C
G32
814
CB
0233
792
0725
X1
hv
RA
12
54
CG
1177
9C
B05
200
1498
3800
3R1
hv
RC
1.5
15
RA
1.5
14
RD
1.5
14
1C
G11
940
CB
0502
419
7425
60X
�h
vR
A1.
51
3R
B0.
52
31
CG
1236
0C
B02
135
8856
387
3R1
hv
RA
0.5
25
RB
0.5
25
3C
G12
367
CB
0531
280
3326
02R
1h
vR
A1
13
4C
G12
40C
B02
937
2767
042
3L1
Let
hal
RA
12
24
CG
1241
8C
B05
329
5700
359
3R1
hv
RA
0.5
11
3C
G12
744/
cbx
CB
0509
157
6281
72R
1h
vR
A1
12
RB
0.5
23
3C
G12
797
CB
0350
210
7425
042R
1h
vR
A0.
51
13
CG
1329
5C
B04
422
6068
307
3L1
hv
RA
11
34
CG
1389
5C
B04
175
7082
353L
�h
vR
A0.
52
33
CG
1421
5C
B02
255
1954
6304
X1
hv
RA
0.5
17
3C
G14
430
CB
0223
668
7985
1X
1h
vR
A0.
51
13
CG
1447
8C
B02
133
1334
5190
2R1
hv
RA
12
24
(con
tin
ued
)
Protein Trap Lines in Drosophila 1521
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
CG
1454
2C
B02
373
2187
8549
3R�
Let
hal
RA
0.5
16
3C
G14
709
CB
0439
773
9488
63R
1h
vR
A0.
51
83
CG
1621
CB
0204
233
8060
22R
�h
vR
A0.
51
23
CG
1667
CB
0547
757
3128
02R
1Se
mil
eth
alR
A2
15
4C
G16
708
CB
0438
811
9353
33R
�h
vR
A1.
52
42
CG
1681
7C
B02
934
5503
678
3R1
hv
RA
11
44
CG
1697
1C
B03
898
2241
063L
1h
vR
B1
23
RC
12
3R
D1
23
4C
G17
002/
vim
arC
B05
094
2873
307
2R1
hv
RB
11
54
CG
1709
0C
B03
116
5442
483L
1h
vR
B1.
52
102
CG
1709
0C
B02
270
5442
853L
�h
vR
A1.
51
25
CG
1732
3C
B02
962
1882
3565
2L1
hv
RA
0.5
25
3C
G17
836
CB
0526
314
7495
493R
1h
vR
B3.
53
5R
A3.
53
5R
C2.
53
4R
D0.
52
31
CG
1910
/l(
3)s1
921
CB
0370
227
5732
143R
�L
eth
alR
B1.
52
4R
A0.
51
42
CG
2051
CB
0362
516
1485
83R
1h
vR
B1
13
RA
0.5
13
RC
0.5
13
4C
G21
86C
B05
020
1075
1743
X�
hv
RA
11
74
CG
2446
CB
0370
311
6087
41X
�h
vR
A0.
53
4R
E1
34
RB
13
4R
D1
23
3C
G26
98C
B03
026
3827
319
3R1
hv
RA
12
84
CG
2865
CB
0302
321
8754
9X
�h
vR
A0.
51
23
CG
2926
CB
0223
214
1409
03R
1h
vR
A0.
51
13
CG
2974
CB
0404
799
8061
4X
�h
vR
A0.
51
23
CG
3005
5C
B02
739
8477
121
2R1
hv
RA
11
14
CG
3049
7C
B02
106
3667
177
2R�
hv
RA
22
34
CG
3124
1C
B02
140
1408
1632
3R1
Let
hal
RA
22
24
CG
3147
5C
B04
600
1500
6356
3R1
hv
RA
0.5
24
3C
G31
522
CB
0269
327
9018
3R�
hv
RB
12
10R
C1
22
4C
G31
64C
B02
987
1294
462L
1h
vR
A1.
51
8R
C1.
51
8R
B0.
51
7R
D0.
51
71
CG
3165
0C
B05
467
5043
131
2L�
hv
RA
1.5
24
RB
12
4R
C1
24
2C
G31
688
CB
0357
020
4290
452L
1h
vR
A1
26
4C
G31
689
CB
0323
927
3530
02L
1h
vR
C1.
52
11R
D1.
52
11R
A1.
52
11R
B1.
52
112
CG
3178
2C
B03
345
1671
6141
2L�
hv
RA
2.5
16
RB
2.5
17
1C
G32
043
CB
0324
794
5566
83L
1h
vR
B2
25
RA
22
54
CG
3209
CB
0544
519
9565
112R
1h
vR
A1
17
RB
11
64
CG
3234
5C
B03
988
6281
403L
�h
vR
A0.
51
15
CG
3243
6C
B02
614
2134
0125
3L�
hv
RA
1.5
15
1C
G32
436
CB
0414
821
3402
683L
1h
vR
A1.
51
51
CG
3248
6C
B03
414
3070
840
3L1
Let
hal
RD
0.5
18
3C
G33
21C
B05
150
1015
3408
3R1
hv
RA
22
2R
B2
22
3C
G33
214
CB
0417
321
5006
373L
�h
vR
A1
16
4C
G33
232
CB
0374
024
6687
53L
1h
vR
A0.
53
93
CG
3355
8C
B05
689
2859
128
2R�
hv
RA
0.5
216
3C
G33
967
CB
0440
110
5494
983R
�L
eth
alR
A1
19
4
(con
tin
ued
)
1522 M. Buszczak et al.
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
CG
3398
2C
B04
965
1552
7849
3L1
hv
RA
11
13
CG
3409
CB
0441
226
0232
72R
�h
vR
A1.
53
72
CG
3411
0C
B04
263
2101
0670
3R1
Let
hal
RC
2.5
19
1C
G34
28C
B02
131
9480
857
3L�
hv
RA
11
34
CG
3654
/U
ch-L
3C
B04
163
9470
143
3L�
hv
RA
0.5
14
3C
G40
91C
B02
284
1958
9843
2R�
Let
hal
RA
12
3R
C1
34
4C
G43
00C
B04
249
1227
0328
3L�
hv
RA
11
4R
B1
14
4C
G45
70C
B02
124
6682
617
3R1
hv
RA
11
14
CG
4612
CB
0533
120
4136
412R
�h
vR
A0.
52
33
CG
5381
CB
0461
610
3219
492L
1h
vR
A0.
52
73
CG
5543
CB
0485
419
6967
642R
1h
vR
A0.
51
13
CG
5548
CB
0566
714
9578
72X
�h
vR
A0.
52
23
CG
5677
CB
0205
420
0557
233R
�L
eth
alR
A1
11
4C
G60
14C
B04
785
2139
0388
3L1
hv
RA
1.5
37
2C
G62
18C
B03
213
1116
8745
3R1
hv
RA
12
54
CG
6311
/N
edd
4C
B04
145
1752
2938
3L1
hv
RC
11
54
CG
6439
CB
0383
617
8448
243R
�L
eth
alR
A1
16
4C
G64
99C
B05
192
1107
5733
3R�
Let
hal
RA
21
54
CG
6540
CB
0392
218
5422
28X
1h
vR
A1
12
4C
G67
70C
B02
632
1204
6132
2L�
hv
RA
0.5
11
3C
G71
10C
B04
925
1339
9559
2L1
hv
RB
22
74
CG
7228
CB
0311
579
9418
12L
�h
vR
A1
25
RB
12
54
CG
7331
CB
0515
441
7556
33R
1h
vR
A1
11
4C
G76
37C
B03
644
6698
443
2R�
hv
RA
0.5
12
3C
G80
36C
B04
958
4495
764
3R1
Let
hal
RB
2.5
34
RC
1.5
23
1C
G80
92C
B05
336
1110
1113
2R�
Let
hal
RA
11
6R
B1
13
4C
G81
28C
B02
087
1559
1543
X1
hv
RA
12
44
CG
8206
CB
0452
715
6345
96X
1h
vR
A0.
51
13
CG
8444
CB
0383
750
8636
63R
1L
eth
alR
A1
13
4C
G85
83C
B04
752
7353
717
3L�
hv
RA
11
44
CG
9062
CB
0205
671
7145
82R
�h
vR
B1
18
4C
G91
71C
B02
706
5800
187
2L�
hv
RA
14
8R
B1
37
4C
G93
28C
B03
031
2079
7707
2L�
hv
RB
0.5
23
3C
G95
91C
B05
694
9508
902
3R�
Let
hal
RA
22
94
CG
9666
CB
0450
319
2575
793L
�L
eth
alR
A1
34
4C
G96
99C
B02
196
1658
1750
X�
hv
RA
2.5
25
RF
2.5
25
RB
2.5
25
RD
2.5
25
1C
G98
21C
B03
335
4646
100
3R�
Let
hal
RB
12
2R
A1
22
4C
G99
21C
B02
890
1622
6793
X�
hv
RA
0.5
23
3C
G99
24C
B03
517
9852
398
3R�
hv
RB
2.5
39
2C
hd
64C
B03
690
4122
396
3L�
hv
RB
11
34
chrb
CB
0542
911
4809
493L
1h
vR
C1
14
4
(con
tin
ued
)
Protein Trap Lines in Drosophila 1523
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
Cp
190
CB
0493
311
1009
853R
1h
vR
A0.
52
52
cro
lC
B03
039
1180
9411
2L�
hv
RD
14
8R
A0.
54
8R
B0.
54
7R
C0.
54
84
Csp
CB
0221
722
2607
503L
1L
eth
alR
A1
15
RB
11
7R
C1
16
4C
tBP
CB
0407
388
3794
03R
�L
eth
alR
A1.
51
6R
B1.
52
8R
C1.
52
61
Cyc
B3
CB
0298
120
6965
203R
�L
eth
alR
A1
11
4C
ycD
CB
0261
815
8036
75X
1h
vR
C1
15
RB
12
6R
A0.
52
64
Dad
CB
0435
312
8822
903R
1Se
mil
eth
alR
A2.
53
8R
B1.
52
7R
C1.
53
81
dal
lyC
B03
495
8820
577
3L1
hv
RA
0.5
19
2d
apC
B05
233
5599
791
2R1
Let
hal
RA
0.5
13
RB
0.5
13
2D
ap16
0C
B04
282
2114
2183
2L�
Let
hal
RA
1.5
211
RB
1.5
210
2D
cp-1
/p
ita
CB
0316
019
4390
262R
1L
eth
alR
A1
13
4D
ekC
B02
288
1274
3725
2R�
hv
RA
0.5
111
RD
0.5
19
RC
0.5
111
RB
0.5
110
3d
esat
1C
B02
105
8269
738
3R1
hv
RA
1.5
25
RC
1.5
25
RE
1.5
25
RB
1.5
25
2D
lC
B02
040
1515
1950
3R�
Let
hal
RA
0.5
18
RB
0.5
16
3D
p1/
imd
CB
0375
414
2995
092R
�h
vR
A0.
53
7R
B0.
53
7R
C0.
53
73
Dre
f/R
pL
13C
B02
226
9967
309
2L�
hv
RA
11
44
drk
CB
0297
493
8965
62R
�L
eth
alR
E1.
52
6R
C1.
52
6R
B1.
52
6R
A1.
52
62
du
pC
B04
090
1126
8107
2R�
Let
hal
RA
11
34
eas
CB
0262
016
1723
62X
1h
vR
B1
25
RA
12
5R
E1
26
RD
0.5
37
4ed
lC
B04
040
1456
1061
2R�
hv
RA
0.5
22
3eI
FC
B02
125
1426
796
3R1
Let
hal
RA
12
8R
G1
28
RC
0.5
39
RF
0.5
28
4eI
F3-
S10
CB
0549
325
9579
3R�
Let
hal
RA
11
4R
B0.
53
54
eIF
5BC
B05
358
3460
609
3L�
hv
RB
0.5
19
3E
ip75
BC
B05
160
1800
7641
3L�
Let
hal
RB
1.5
16
1em
cC
B02
035
7492
953L
1h
vR
A0.
51
23
end
oA
CB
0508
414
7323
503R
�Se
mil
eth
alR
A0.
51
23
En
oC
B02
039
1727
775
2L�
Let
hal
RB
33
4R
E3
34
RC
33
4R
D3
34
4es
gC
B02
017
1533
3865
2L1
Let
hal
RA
11
14
fas
CB
0299
295
1051
02R
1h
vR
A0.
53
133
fjC
B04
634
1412
0268
2R1
hv
RA
11
14
flfl
/C
G95
91C
B05
447
9510
094
3R1
Let
hal
RA
1.5
211
RB
1.5
211
2fo
k/n
ebC
B05
794
2008
5050
2L�
Let
hal
RA
11
24
for
CB
0295
636
3219
02L
�L
eth
alR
A4.
53
10R
B2.
51
8R
H3.
52
9R
I4.
53
101
fs(1
)K10
/kz
CB
0279
021
3612
7X
�h
vR
A0.
51
23
ftz-
f1C
B05
043
1875
8843
3L�
Let
hal
RB
1.5
19
1F
ur1
CB
0348
921
2988
143R
�h
vR
A0.
54
12R
B0.
54
12R
C0.
54
11R
D0.
53
103
fwC
B05
224
1189
8010
X�
hv
RA
0.5
215
3fz
2C
B02
997
1916
3698
3L�
Sem
ilet
hal
RB
13
34
glec
CB
0236
417
6819
683R
�Se
mil
eth
alR
A1
12
4G
lyco
gen
inC
B05
177
1708
7979
2R1
Sem
ilet
hal
RB
0.5
15
RA
0.5
13
3G
rip
163
CB
0513
911
9886
093L
�L
eth
alR
A4
15
4
(con
tin
ued
)
1524 M. Buszczak et al.
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
grp
CB
0489
416
6839
202L
1h
vR
B1.
52
7R
C1.
53
82
Gst
E1
CB
0495
614
2858
712R
1L
eth
alR
A0.
51
13
Gst
S1C
B05
261
1298
4903
2R�
hv
RB
12
5R
C1
25
RA
0.5
25
4gu
khC
B04
444
1480
9830
3R�
hv
RA
11
6R
B1
15
RC
11
64
Gyc
76C
CB
0311
719
7836
693L
�L
eth
alR
C2.
55
19R
A1.
54
18R
B1.
53
182
hd
cC
B05
774
2610
3641
3R1
hv
RC
0.5
16
3H
LH
m7
CB
0224
321
8626
373R
1h
vR
A0.
51
13
Hm
gDC
B02
648
1760
0985
2R1
hv
RA
1.5
23
RB
1.5
34
2H
P1b
CB
0284
989
7531
4X
1h
vR
A2
22
4H
r39
CB
0503
921
2507
902L
1h
vR
C2.
53
5R
B2.
53
8R
A2.
53
8R
D2.
53
82
Hsc
70-4
CB
0560
311
0684
123R
�h
vR
A1
22
RC
0.5
22
RE
0.5
22
4H
sr&
oh
grC
B03
168
1712
2190
3R1
hv
RA
0.5
11
RB
0.5
11
RC
0.5
11
3IP
3K1
CB
0222
797
8257
72L
1h
vR
A1
14
4Ir
pC
B02
050
6238
617
3R1
hv
RA
11
104
kuz
CB
0200
013
5502
472L
1L
eth
alR
A1
212
RB
12
12R
C1
212
4l(
2)gl
CB
0233
118
536
2L�
hv
RB
2.5
29
RA
13
10R
C2.
52
9R
D1
39
1la
ma
CB
0545
753
4845
93L
�h
vR
C1.
53
5R
B0.
53
54
Lan
AC
B04
172
6211
152
3L�
hv
RA
0.5
115
3L
BR
CB
0317
317
6079
922R
�h
vR
B0.
53
4R
A0.
52
3R
C0.
51
23
lea
CB
0289
814
2053
12L
�L
eth
alR
A0.
51
143
Lk6
CB
0212
075
9018
43R
�h
vR
A0.
51
63
lola
CB
0288
864
2921
52R
�h
vR
C0.
52
6R
G0.
52
6R
H0.
52
6R
J0.
52
63
Map
60C
B03
167
5478
371
2R1
hv
RA
11
24
mb
cC
B04
603
1960
8306
3R�
Let
hal
RA
1.5
114
1M
bs
CB
0215
016
0451
083L
�L
eth
alR
B1
218
RA
12
18R
C1
219
4M
ESR
3C
B02
595
1861
7241
2L1
hv
RA
0.5
34
3M
ESR
4C
B04
813
1343
5844
2R�
Let
hal
RA
0.5
23
3m
irr
CB
0268
912
6865
473L
1L
eth
alR
A0.
51
5R
B0.
51
53
Mn
fC
B03
632
1111
3376
3L�
hv
RA
1.5
29
RE
1.5
210
RD
1.5
29
2M
ocs
1C
B04
106
1106
3638
3L1
hv
RA
0.5
14
RC
0.5
14
RB
0.5
44
3m
od
CB
0217
227
8783
413R
1h
vR
A3
36
4m
Rp
S17
CB
0366
320
0619
522R
�h
vR
A1
12
4M
yo31
DF
CB
0437
710
5067
732L
1h
vR
A1
210
RB
11
94
neb
/fo
kC
B04
551
2008
5119
2L1
Let
hal
RA
1.5
12
1n
esC
B05
687
1925
3774
3L1
hv
RA
12
4R
B1
24
RC
12
44
Neu
3C
B02
076
1052
3038
3R�
Let
hal
RB
11
84
NK
7.1
CB
0234
910
1988
283R
�h
vR
A0.
52
43
nm
oC
B02
015
7972
207
3L1
RA
13
9R
B1
28
RE
12
8R
C1
47
4n
mo
CB
0463
579
7243
93L
�L
eth
alR
A1.
53
9R
B1.
52
8R
E1.
52
8R
C1.
54
73
Nrg
CB
0488
384
1140
1X
1h
vR
C0.
52
8R
B0.
52
8R
A0.
52
83
orb
2C
B04
897
8946
768
3L�
hv
RD
2.5
36
RB
1.5
25
RC
0.5
25
2
(con
tin
ued
)
Protein Trap Lines in Drosophila 1525
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
Orc
2C
B03
773
9790
734
3R�
Let
hal
RA
22
44
Pak
3C
B04
117
1227
9337
3R�
hv
RA
11
2R
B1
12
4P
i3K
21B
CB
0513
229
7734
2L1
hv
RA
0.5
13
3P
ka-C
1C
B03
724
9699
251
2L�
Let
hal
RB
12
2R
A1
22
RC
12
24
PN
UT
S-R
BC
B04
203
8704
882L
�L
eth
alR
B1
25
RC
13
4R
D1
15
4p
pa
CB
0355
118
5765
122R
�h
vR
A0.
51
13
Psc
CB
0508
388
6824
12R
�h
vR
A1
25
4p
tcC
B02
030
4537
352
2R1
Let
hal
RA
11
64
pxb
CB
0562
511
4918
663R
�h
vR
B0.
52
5R
A0.
52
53
PyK
CB
0424
718
1941
693R
1L
eth
alR
B1.
52
42
Rab
1C
B03
450
1709
4755
3R�
Let
hal
RA
11
5R
B1
14
4R
ab23
CB
0378
115
0912
03R
�h
vR
A1.
52
32
rdgB
CB
0385
613
6567
72X
1h
vR
A0.
52
12R
C0.
53
13R
D0.
52
123
Rga
CB
0213
914
3830
23R
�L
eth
alR
A1
28
RB
12
84
rgr
CB
0224
244
8794
92R
1h
vR
A1
15
4rh
oC
B05
698
1463
699
3L1
hv
RA
0.5
24
3R
ho
GD
IC
B05
226
1992
2267
3L1
Let
hal
RA
1.5
25
2R
pd
3C
B04
302
4626
763
3L1
Let
hal
RA
11
44
sas
CB
0330
729
8850
53R
1Se
mil
eth
alR
B0.
52
103
Sc2
CB
0463
839
0351
23L
1h
vR
A1
14
4Se
p2
CB
0501
816
4140
543R
1h
vR
A1
12
4SF
1C
B05
052
1336
6171
3R1
hv
RA
11
64
SF2
CB
0302
812
1677
403R
�h
vR
A0.
51
43
sgl
CB
0510
169
5788
73L
�h
vR
A1
13
4sh
gC
B02
169
1694
4287
2R�
Let
hal
RA
11
24
Sir2
/D
naJ
-HC
B02
821
1316
5875
2L1
hv
RA
11
24
siz
CB
0328
821
0277
583L
�L
eth
alR
A1
16
4sk
dC
B02
029
2101
8904
3L�
Let
hal
RD
1.5
213
RC
1.5
213
2So
dC
B03
598
1110
6792
3L�
hv
RA
11
24
Ssd
pC
B02
353
1402
2949
3R�
Let
hal
RA
1.5
22
RB
1.5
22
RC
1.5
22
2st
gC
B03
726
2508
1026
3R�
hv
RA
11
24
stu
mp
sC
B05
360
1041
7302
3R1
Let
hal
RA
0.5
15
3st
wl
CB
0522
314
4029
823L
1h
vR
A1
12
4ta
nky
rase
CB
0324
921
4869
053R
�h
vR
A1
17
4ta
raC
B02
767
1206
3444
3R1
hv
RA
1.5
12
1ta
raC
B03
523
1207
5785
3R�
Let
hal
RA
1.5
25
RB
1.5
25
1T
bh
CB
0404
578
8948
8X
1h
vR
B0.
52
83
Tct
pC
B03
738
7036
007
3R�
hv
RA
11
14
To
mC
B02
307
1496
2360
3L1
hv
RA
0.5
11
3tr
bl
CB
0274
420
3947
213L
�Se
mil
eth
alR
A0.
51
23
trn
CB
0511
413
1071
423L
1h
vR
A0.
52
23
(con
tin
ued
)
1526 M. Buszczak et al.
with CG8947, a Drosophila cathepsin K homolog(Figure 2E), and the membrane location of a trap inPicot, a phosphate symporter (Figure 2F). Proteins thatare apically localized in polarized epithelia such asovarian follicle cells were easily visualized, as observedfor Bazooka (Par3) (Figure 2G). Subcompartmental-ized nuclear proteins were also readily apparent. Forexample, a trap of the Stonewall (Stwl) HMG-relatedprotein involved in germ cell chromatin organization(Clark and McKearin 1996) labeled nurse cell nucleiand the oocyte nucleus (inset) differently (Figure 2H).Fs(2)Ket, a protein involved in nuclear import, waslocalized to the nuclear periphery (Figure 2I). In somecases, cell-specific cellular compartments were labeled,such as in CA07051, which traps Lsd-2 and exhibitedEGFP localization to lipid droplets that arise in late-stage nurse cells (Figure 2J).
These studies provide a high-resolution view ofknown protein locations in living cells and also identifymany proteins that were not previously known to residewithin these compartments. In addition, the value ofprotein trapping as a discovery tool was illustrated by thefact that we observed new patterns of localization as well.For example, the HDAC4 protein, fused by the CA07134trap, labeled a small body often found in only one nursecell within an egg chamber (Figure 2K). A spindle-likestructure in young nurse cells was labeled with a proteintrap in the polo gene encoding a mitotic kinase (Figure2L). Antibodies specific for the trapped protein can beused to isolate and further investigate the proteinspresent in these novel structures.
Analysis of developmental regulation—ovarian cells:All the lines in the core collection were characterized onthe basis of their patterns of expression in germ cellsand follicle cells during oogenesis. These experimentsidentified lines expressing in the major classes ofsomatic cells, including terminal filament cells (Figure3B), cap cells (Figure 3C), escort cells (Figure 3D),profollicle cells (Figure 3E), and border and posteriorcells (Figure 3F). Other lines expressed in germ cells ofvarious ages, including some that were highly enrichedin the oocyte nucleus (germinal vesicle) (Figure 3G). Asin the case of subcellular compartments, these studiesdocumented patterns of developmental expression formany genes that were not previously known. Thesegenes become attractive candidates for study of theirfunction in the corresponding processes.
Strikingly, the collection also revealed the likelyexistence of new cell types and novel biological pro-cesses previously unrecognized despite many years ofstudy of ovarian biology. Line CC01646 traps theCG12920 protein and is expressed in a small subset ofovarian sheath cells that likely represent a novel cell type(Figure 3H). In line CC00523 we observed accumula-tion of Msn-EGFP preferentially at the center of mid-stage growing follicles (Figure 3I). It was not previouslyknown that this region was the site of unique protein
TA
BL
E5
(Co
nti
nu
ed)
Gen
eL
ine
Site
aC
hr
Stra
nd
Ph
eno
typ
eT
1In
sert
Met
Sto
pT
2In
sert
Met
Sto
pT
3In
sert
Met
Sto
pT
4In
sert
Met
Sto
pT
ype
trx
CB
0515
610
1084
783R
�h
vR
A1.
53
9R
B1.
53
8R
C1.
52
7R
D1.
52
82
ttk
CB
0227
427
5507
553R
1h
vR
E1.
52
5R
F1.
52
5R
B1
25
RC
12
52
Ugt
37c1
CB
0290
012
7332
282R
�L
eth
alR
A0.
51
13
Ugt
86D
aC
B03
314
6982
675
3R1
hv
RA
0.5
13
3V
ha1
00-2
CB
0340
414
2246
303R
�h
vR
B1.
52
62
Vh
aPP
A1-
1C
B02
209
1072
9723
3R�
Let
hal
RA
12
24
wu
n2
CB
0226
753
0176
02R
1h
vR
A0.
51
63
Xb
p1
CB
0206
117
0311
152R
1L
eth
alR
B1
13
RA
11
24
Xe7
CB
0561
014
9504
33R
1L
eth
alR
A1
18
4Z
4/C
G12
974
CB
0230
521
2792
453L
1Se
mil
eth
alR
A1
12
4
aA
nn
ota
tio
nre
leas
e5.
0.
Protein Trap Lines in Drosophila 1527
accumulation. Msn encodes a protein involved in Junkinase signaling, suggesting that a special intercellularjunction may assemble in this region to structurallyorganize the nurse cells. We observed a similar expres-sion program (not shown) for the line CB03040 thatfuses the Pli gene, encoding a protein associated withthe NF-kB signaling response.
Developmentally specific subcellular structures, in-cluding the fusome (Figure 3J, arrow) and both somaticand germline ring canals (Figure 3K), were also labeledby rare lines. A general and extremely useful applicationof the collection is to identify new proteins that areassociated with such structures and analyze the effect ofmutations. For example, the preferential accumulationof Sec61 in the fusome observed here has been validatedin recent studies (Snapp et al. 2004). Proof of principleexperiments of this type that focus on the fusome will bedescribed elsewhere.
Another valuable capacity of protein trap lines is theability to follow important developmental processes athigh resolution and in living cells. During oogenesis, atleast four major clusters of chorion genes undergospecific gene amplification in stage 10B follicles, aprocess that can be visualized as small ‘‘amplificationdots’’ of BrdU incorporation (Calvi et al. 1998). Theamplifying genes specifically contain substantial amo-unts of replication initiation proteins such as Orc2 atthis time, whereas normally Orc2 is found throughoutthe cell nuclei (Royzman et al. 1999). A protein trap linein Orc2 allows the amplifying loci to be directly visu-alized (Figure 3L). Inspection shows that the dots arenot present in preamplification stage follicle cells butstrongly label amplifying gene loci at stage 10B (Figure3L, arrow).
The Drosophila oocyte represents an importantmodel system for studying RNA and protein localiza-tion. Several biochemical and genetic studies haveidentified proteins enriched at either the anterior orthe posterior pole of the oocyte (Lasko 2003; Wilhelm
and Smibert 2005). Protein traps in genes identified inthese studies, including EIF-4E (Figure 3M) and Tral(Figure 3N), faithfully recapitulate the localization pat-terns of their endogenous counterparts to the posteriorpole of the oocyte (Wilhelm et al. 2003, 2005). Severalother proteins tagged in the collection display posteriorlocalization patterns including a trap in CG32423(Figure 3O), a largely uncharacterized RNA-bindingprotein. In addition, a trap in CG6151 appears to beenriched at the anterior of the oocyte (Figure 3P).While future work will clarify the role of these proteinsin oocyte patterning, these examples show that proteintrapping can complement other approaches and beused to identify new components of localized RNPcomplexes within the cells.
Protein traps provide unique opportunities to analyzegene regulation during development in populations ofcells that are difficult to isolate and in those that are
sensitive to loss of cellular context. We illustrate thepotential of this approach using the regulation of germcell development within and just downstream from thegermline stem cells (GSCs). Many protein trap lines,including CC01961 in Actn, showed uniform expressionin GSCs, CBs, and developing germline cysts (Figure3Q). However, it was possible to find other exampleswhere expression levels between GSCs and early germcells differed from those in other germ cells within thegermarium. One of the most striking examples was lineCC06238 that traps the putative Drosophila succinateCoA ligase gene. Expression was stronger in stem cells(and sometimes early cystoblasts) than in later germcells as illustrated in Figure 3R. Several other genes weredownregulated shortly after GSC division, including effete(Figure 3S), encoding the UbcD1 ubiquitin-conjugatingenzyme that has been shown toaffect germline cyst forma-tion (Lilly et al. 2000). Another line whose EGFP expres-sion was downregulated slightly later, at the completion ofcyst formation, trapped the Drosophila eIF-4E gene(Figure 3T). Downregulation of a related gene CG8023was previously observed at a slightly earlier time, duringcyst divisions (Kai et al. 2005).
Studies on the limitations of current protein trapmethods: We also tested the sensitivity of the approachused here to detect Drosophila genes by looking at theexpression of the lines in the core collection in germlinestem cells. First, although lines were selected on thebasis of expression in embryos, we found ovarianexpression above background in .90% of lines in thecore collection. However, this does not address whethermany other genes exist that were fused but expressedEGFP at levels too low to detect in either tissue. Analysisof germline stem cell RNA by hybridization to Affymet-trix arrays detected transcripts from �6500 Drosophilagenes over an �1000-fold dynamic range (Kai et al.2005). Although translational regulation and differen-tial protein stability, not to mention differences in stain-ing sensitivity between different preparations, would beexpected to introduce potential variation, we werecurious whether protein trap lines could detect stemcell gene transcripts across the full range of expressionlevels.
We observed a strong correspondence between thesetwo measures of stem cell gene expression (Figure 4, A–E). Lines with very strong EGFP expression tended tohave RNA levels at least 10-fold higher than lines withabove background but relatively weak expression. Theselines in turn had signals higher than most lines scored asbelow the level of detection on arrays. The correlationwas not perfect; for example, some lines showed moreEGFP expression in stem cells than might be expectedfrom the microarray study (Figure 4F). The existence ofsuch lines was not unexpected, because some lines likelystill carry second insertions, and the microarray usedwas based on release 1 gene models. Overall, we coulddetect EGFP above background in GSCs from nearly all
1528 M. Buszczak et al.
lines whose levels of mRNA are called as ‘‘present’’ onAffymetrix arrays (Kai et al. 2005). This suggests thatprotein trap lines are not limited to a relatively smallnumber of highly expressed genes, but can be used tofollow a large fraction of gene activity.
DISCUSSION
The Carnegie protein trap collection—a versatileresearch tool: These experiments significantly expandthe number of protein trap lines available for studies ofgene expression within a complex multicellular animal(also see accompanying article by Quinones-Coello
et al. 2007, this issue). Our initial characterization ofthese lines extends previous documentation that thebehavior of the EGFP-tagged protein often correspondsto the behavior of the protein to which it is fused.Moreover, we demonstrate that collections such as oursare exceptionally useful as tools of gene discovery.Candidate genes can be selected on the basis of thedevelopmental expression, subcellular localization, ordynamic behavior of particular protein isoforms. Thesame line can subsequently be used to purify the proteinand its associated complexes and to create deletions forfurther genetic analysis. The Carnegie collection isavailable for research use from the Carnegie Institution.Information is available at http://www.ciwemb.edu/resources/proteintrapcollection.html and at http://flytrap.med.yale.edu/.
Subcellular distribution of protein location: Previ-ously, gene and protein trapping in yeast has been usedto estimate the fraction of proteins that are localizedto various cellular compartments (Ross-Macdonald
et al. 1999; Kumar et al. 2002; Huh et al. 2003). Morethan half of all proteins showed a simple localizationto the cytoplasm or nucleus. Other subcellular struc-tures labeled by tagged proteins in yeast included the
plasma membrane, the ER, mitochondria, the lysosome,and the perixosome. We obtained similar results. Thedistribution patterns of EGFP-tagged proteins withinDrosophila ovarian cells generally matched those ofyeast and most known structures within egg chambershave now been labeled with at least one protein trap line(this study; Morin et al. 2001; Clyne et al. 2003). More-over, the large size of ovarian cells often allowed us todistinguish the fine structure of several subcellular com-partments labeled by EGFP fusion proteins generatedin this screen.
Developmental regulation of protein expression:Despite the fact that only one tissue was examinedclosely, a large number of proteins in the core collectionwere expressed and many were developmentally regu-lated. A diverse array of cell types within the germariumincluding the terminal filament, cap cells, escort cells,germline stem cells, and prefollicle cells were labeled invarious lines. However, expression frequently varies fromstage to stage, not only in cell type but also in subcellularlocation, complicating the problem of accurate annota-tion. Currently, protein trap images within the ovary arebeing curated in the FlyTrap database. Because of therelative cellular simplicity of the germarium and de-veloping ovarian follicles, it may be possible to developtools for displaying expression patterns at single-cellresolution in this tissue. It will be particularly valuable toadd data from many other tissues and developmentalstages for these same lines, to facilitate comparisons.
Identification of insertions not predicted by genomeannotation: One of the surprising results of our studieswas the relatively high frequency of EGFP-positive linesthat were located at sites not predicted to fuse to anyannotated Drosophila transcript. However, similar re-sults were observed in previous studies of gene traptransposons. At least 44% of insertions analyzed byMorin et al. (2001) were not within annotated genes;moreover, the reading frame of insertions in genes was
Figure 4.—Studies of protein trap expressionWe compared the apparent intensity of GFP-protein staining in the GSCs with the RNA levelof the corresponding gene as determined by Af-fymetrix arrays (Kai et al. 2005). (A–F) The pat-tern of protein trap expression of the indicatedgene (see Table 4 for strain names). The expres-sion level from Affymetrix software (mean ofthree measurements) is given.
Protein Trap Lines in Drosophila 1529
not determined. In yeast, Ross-Macdonald et al. (1999)observed that while 1346 EGFP-positive insertions werein the correct frame, another 480 were not. Since mostlacked an alternative start site, they postulated that ahigher than expected frequency of translational frame-shifting may occur. In a recent study in the mouse, 24%of genes were trapped in more than one reading frame(De-Zolt et al. 2006). We also observed this phenome-non; however, our studies emphasized the difficulty ofdrawing final conclusions until the location of everyinsertion and the actual pattern of splicing within themutant strains have been characterized.
Sensitivity of gene traps: A potential limitation ofprotein trapping in vivo is that many gene products maybe expressed at levels so low that EGFP expressed at thesame level could not be detected above background.Only 20% of mouse secretory traps that are G418resistant express detectable CD2, even though theneophosphotransferase gene is fused to CD2 (De-Zolt
et al. 2006). Only 33% of b-geo lines resistant to G418express detectable lacZ. This probably indicates thatmany genes exist that generate enough neophospho-transferase to confer G418 resistance, but not enoughb-galactosidase to be scored as lacZ positive (De-Zolt
et al. 2006). Consistent with this, Ross-Macdonald et al.(1999) found that 415 of 1340 in-frame fusions (31%)could be detected above background by immunofluo-rescence. In contrast, Huh et al. (2003), who taggedcomplete proteins, detected signals above backgroundfor 4156 of 6029 (69%) genes. The system we employedis also designed to tag full-length proteins, and this mayhave enhanced its sensitivity.
The requirement that each line generate EGFPfluorescence in embryos might provide a limitation onthe number of genes that could be tagged. However, ourexperiments argue that this poses relatively little selec-tion on which genes can be fused. We found that genesexpressed in germline stem cells at a wide variety oflevels on the basis of microarray studies had been fusedin our collection of protein trap lines. There was a roughcorrelation between the levels observed using antibodystaining in these cells and the microarray results. Thiswould indicate that the protein trap methodology canpotentially be used to analyze thousands of diverseDrosophila genes.
Increasing proteome coverage: Our analysis revealedtwo major limitations of the current strategy for gener-ating protein traps using P elements. Despite theadvantages of embryo sorting, the inherent 59 bias ofP-element insertion (Bellen et al. 2004) greatly limitsthe rate at which new genes can be trapped. Many of theinsertions were recovered when an insertion occurred atan internal promoter that lies within a coding intron ofanother gene isoform. Many genes lack such alternativepromoters, so it will be necessary to use differentmethods to efficiently recover a more diverse collectionof protein trap strains.
At least two alternative approaches are worthy ofconsideration. First, it should be possible to take advan-tage of the extensive collection of P-element insertionsin Drosophila genes that have been generated by theBDGP gene disruption project and other members ofthe Drosophila community (reviewed in Matthews
et al. 2005). We calculate that �2000 genes already havean existing P-element insertion within a coding intron.Moreover, P elements can recombine into the sites ofexisting P elements in the presence of transposase (Sepp
and Auld 1999). Consequently, a protein trap allele ofeach of these genes could, in principle, be generated bycombining a protein trap insertion of the appropriatereading frame with the ‘‘target’’ gene insertion in asingle strain, ideally using inserts bearing scorable mark-ers and then screening for replacement.
Alternatively, transposons with a broader insertionalspecificity than the P element would be worthwhile.piggyBac elements are suitable for widespread mutagen-esis of Drosophila genes (Thibault et al. 2004) and asgene traps (Bonin and Mann 2004). Minimal sequencesfor piggyBac transposition were defined recently (Li et al.2005). We obtained several hundred piggyBac proteintrap insertions that express EGFP, but the piggyBac genetrap vector appeared to be less efficient, on the basis ofEGFP intensity and Western blotting, than P-elementgene traps in nearby locations (also see accompanyingarticle by Quinones-Coello et al. 2007, this issue). Newvectors containing different splice acceptor and donorsites should be tested within the context of a piggyBacelement. Several new transposons with diverse inser-tional specificities, including Minos (Arensburger et al.2005; Metaxakis et al. 2005), are also worthy of con-sideration. Continued efforts to provide greater cover-age within the Drosophila proteome are warrantedbecause of the exceptional utility of protein traps inanalyzing the development and physiology of multicel-lular organisms.
We thank Lynn Cooley for helpful discussions. We thank X. Morin,W. Chia, A. Hudson, R. Lehmann, and A. Handler for reagents. We aregrateful to the following people for their assistance with diverse aspectsof the project: Alison Pinder, Joseph Carlson, Martha Evans-Holm,Crista Sewald, Becca Sheng, Melanie Issigonis, Megan Kutzer, EmilySeay, and Dianne Williams. We thank the Howard HughesMedical Institute, Carnegie Institution, and the National Institutesof Health for support. M.B. was a fellow of the American Cancer Soci-ety. T.G.N. is a Howard Hughes Fellow of Life Sciences ResearchFoundation.
LITERATURE CITED
Arensburger, P., Y. J. Kim, J. Orsetti, C. Aluvihare, D. A. O’Brochta
et al., 2005 An active transposable element, Herves, fromthe African malaria mosquito Anopheles gambiae. Genetics 169:697–708.
Aspenstrom, P., 1997 A Cdc42 target protein with homology to thenon-kinase domain of FER has a potential role in regulating theactin cytoskeleton. Curr. Biol. 7: 479–487.
Bellen, H. J., R. W. Levis, G. Liao, Y. He, J. W. Carlson et al.,2004 The BDGP gene disruption project: single transposon
1530 M. Buszczak et al.
insertions associated with 40% of Drosophila genes. Genetics167: 761–781.
Bonin, C. P., and R. S. Mann, 2004 A piggyBac transposon gene trapfor the analysis of gene expression and function in Drosophila.Genetics 167: 1801–1811.
Buszczak, M., and A. C. Spradling, 2006 The Drosophila P68 RNAhelicase regulates transcriptional deactivation by promotingRNA release from chromatin. Genes Dev. 20: 977–989.
Calvi, B. R., M. A. Lilly and A. C. Spradling, 1998 Cell cycle con-trol of chorion gene amplification. Genes Dev. 12: 734–744.
Casanueva, M. O., and E. L. Ferguson, 2004 Germline stem cellnumber in the Drosophila ovary is regulated by redundantmechanisms that control Dpp signaling. Development 131:1881–1890.
Clark, K. A., and D. M. McKearin, 1996 The Drosophila stonewallgene encodes a putative transcription factor essential for germcell development. Development 122: 937–950.
Clyne, P. J., J. S. Brotman, S. T. Sweeney and G. Davis, 2003 Greenfluorescent protein tagging Drosophila proteins at their nativegenomic loci with small P elements. Genetics 165: 1433–1441.
De-Zolt, S., F. Schnutgen, C. Seisenberger, J. Hansen, M.Hollatz et al., 2006 High-throughput trapping of secretorypathway genes in mouse embryonic stem cells. Nucleic AcidsRes. 34: e25.
Dirks, R. W., and H. J. Tanke, 2006 Advances in fluorescent track-ing of nucleic acids in living cells. Biotechniques 40: 489–496.
Espina, V., J. Milia, G. Wu, S. Cowherd and L. A. Liotta,2006 Laser capture microdissection. Methods Mol. Biol. 319:213–229.
Forbes, A. J., A. C. Spradling, P. W. Ingham and H. Lin, 1996 Therole of segment polarity genes during early oogenesis in Dro-sophila. Development 122: 3283–3294.
Friedrich, G., and P. Soriano, 1991 Promoter traps in embryonicstem cells: a genetic screen to identify and mutate developmentalgenes in mice. Genes Dev. 5: 1513–1523.
Gossler, A., A. L. Joyner, J. Rossant and W. C. Skarnes,1989 Mouse embryonic stem cells and reporter constructs todetect developmentally regulated genes. Science 244: 463–465.
Handler, A. M., and R. A. Harrell, II, 1999 Germline transforma-tion of Drosophila melanogaster with the piggyBac transposon vec-tor. Insect Mol. Biol. 8: 449–457.
Herschman, H. R., 2003 Molecular imaging: looking at problems,seeing solutions. Science 302: 605–608.
Hild, M., B. Beckmann, S. A. Haas, B. Koch, V. Solovyev et al.,2003 An integrated gene annotation and transcriptional pro-filing approach towards the full gene content of the Drosophilagenome. Genome Biol. 5: R3.
Huh, W. K., J. V. Falvo, L. C. Gerke, A. S. Carroll, R. W. Howson
et al., 2003 Global analysis of protein localization in buddingyeast. Nature 425: 686–691.
Kai, T., and A. Spradling, 2003 An empty Drosophila stem cellniche reactivates the proliferation of ectopic cells. Proc. Natl.Acad. Sci. USA 100: 4633–4638.
Kai, T., D. Williams and A. C. Spradling, 2005 The expression pro-file of purified Drosophila germline stem cells. Dev. Biol. 283:486–502.
Kelso, R. J., M. Buszczak, A. T. Quinones, C. Castiblanco, S.Mazzalupo et al., 2004 Flytrap, a database documenting aGFP protein-trap insertion screen in Drosophila melanogaster. Nu-cleic Acids Res. 32: D418–D420.
Kumar, A., S. Agarwal, J. A. Heyman, S. Matson, M. Heidtman
et al., 2002 Subcellular localization of the yeast proteome.Genes Dev. 16: 707–719.
Lasko, P., 2003 Cup-ling oskar RNA localization and translationalcontrol. J. Cell Biol. 163: 1189–1191.
Li, X., R. A. Harrell, A. M. Handler, T. Beam, K. Hennessy
et al., 2005 piggyBac internal sequences are necessary for effi-cient transformation of target genomes. Insect Mol. Biol. 14:17–30.
Lilly, M. A., M. de Cuevas and A. C. Spradling, 2000 Cyclin Aassociates with the fusome during germline cyst formation inthe Drosophila ovary. Dev. Biol. 218: 53–63.
Matthews, K. A., T. C. Kaufman and W. M. Gelbart, 2005 Re-search resources for Drosophila: the expanding universe. Nat.Rev. Genet. 6: 179–193.
Metaxakis, A., S. Oehler, A. Klinakis and C. Savakis, 2005 Minosas a genetic and genomic tool in Drosophila melanogaster. Genetics171: 571–581.
Misra, S., M. A. Crosby, C. J. Mungall, B. B. Matthews, K. S.Campbell et al., 2002 Annotation of the Drosophila melanogastereuchromatic genome: a systematic review. Genome Biol. 3:RESEARCH0083.
Morin, X., R. Daneman, M. Zavortink and W. Chia, 2001 A pro-tein trap strategy to detect GFP-tagged proteins expressed fromtheir endogenous loci in Drosophila. Proc. Natl. Acad. Sci. USA98: 15050–15055.
Quinones-Coello, A. T., L. N. Petrella, K. Ayers, A. Melillo, S.Mazzalupo et al., 2007 Exploring strategies for protein trap-ping in Drosophila. Genetics 175: 1089–1104.
Ross-Macdonald, P., A. Sheehan, C. Friddle, G. S. Roeder and M.Snyder, 1999 Transposon mutagenesis for the analysis of pro-tein production, function, and localization. Methods Enzymol.303: 512–532.
Royzman, I., R. J. Austin, G. Bosco, S. P. Bell and T. L. Orr-Weaver,1999 ORC localization in Drosophila follicle cells and theeffects of mutations in dE2F and dDP. Genes Dev. 13: 827–840.
Sepp, K. J., and V. J. Auld, 1999 Conversion of lacZ enhancer traplines to GAL4 lines using targeted transposition in Drosophila mel-anogaster. Genetics 151: 1093–1101.
Singer, T., and E. Burke, 2003 High-throughput TAIL-PCR as a tool toidentify DNA flanking insertions. Methods Mol. Biol. 236: 241–272.
Skarnes, W. C., J. E. Moss, S. M. Hurtley and R. S. Beddington,1995 Capturing genes encoding membrane and secreted pro-teins important for mouse development. Proc. Natl. Acad. Sci.USA 92: 6592–6596.
Skarnes, W. C., H. von Melchner, W. Wurst, G. Hicks, A. S. Nord
et al., 2004 A public gene trap resource for mouse functionalgenomics. Nat. Genet. 36: 543–544.
Snapp, E. L., T. Iida, D. Frescas, J. Lippincott-Schwartz and M. A.Lilly, 2004 The fusome mediates intercellular endoplasmicreticulum connectivity in Drosophila ovarian cysts. Mol. Biol. Cell15: 4512–4521.
Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty et al.,1999 The Berkeley Drosophila Genome Project gene disrup-tion project: single P-element insertions mutating 25% of vitalDrosophila genes. Genetics 153: 135–177.
Stathopoulos, A., and M. Levine, 2005 Genomic regulatory net-works and animal development. Dev. Cell 9: 449–462.
Thibault, S. T., M. A. Singer, W. Y. Miyazaki, B. Milash, N. A.Dompe et al., 2004 A complementary transposon tool kit forDrosophila melanogaster using P and piggyBac. Nat. Genet. 36:283–287.
Tomancak, P., A. Beaton, R. Weiszmann, E. Kwan, S. Shu et al.,2002 Systematic determination of patterns of gene expres-sion during Drosophila embryogenesis. Genome Biol. 3:RESEARCH0088.
Vasudevan, S., and S. W. Peltz, 2003 Nuclear mRNA surveillance.Curr. Opin. Cell Biol. 15: 332–337.
Ward, R. E., IV, R. S. Lamb and R. G. Fehon, 1998 A conserved func-tional domain of Drosophila coracle is required for localizationat the septate junction and has membrane-organizing activity. J.Cell Biol. 140: 1463–1473.
Wilhelm, J. E., and C. A. Smibert, 2005 Mechanisms of transla-tional regulation in Drosophila. Biol. Cell 97: 235–252.
Wilhelm, J. E., M. Hilton, Q. Amos and W. J. Henzel, 2003 Cup isan eIF4E binding protein required for both the translational re-pression of oskar and the recruitment of Barentsz. J. Cell Biol.163: 1197–1204.
Wilhelm, J. E., M. Buszczak and S. Sayles, 2005 Efficient proteintrafficking requires trailer hitch, a component of a ribonucleo-protein complex localized to the ER in Drosophila. Dev. Cell9: 675–685.
Communicating editor: T. Schupbach
Protein Trap Lines in Drosophila 1531