27
Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.106.065961 The Carnegie Protein Trap Library: A Versatile Tool for Drosophila Developmental Studies Michael Buszczak,* Shelley Paterno,* Daniel Lighthouse,* Julia Bachman,* Jamie Planck,* Stephenie Owen,* Andrew D. Skora,* Todd G. Nystul,* Benjamin Ohlstein,* Anna Allen,* James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, Nahathai Srivali,* Roger A. Hoskins and Allan C. Spradling* ,1 *Howard Hughes Medical Institute Research Laboratories, Department of Embryology, Carnegie Institution of Washington, Baltimore, Maryland 21218, Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205 and Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 Manuscript received September 18, 2006 Accepted for publication December 18, 2006 ABSTRACT Metazoan physiology depends on intricate patterns of gene expression that remain poorly known. Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. By sequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced green fluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, we found that 600–900 different genes are trapped in our collection. A core set of 244 lines trapped different identifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256 additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegie collection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggest new strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions. T HE central challenge of postsequence genomics is to learn how an enhanced knowledge of genes, transcripts, and proteins can be applied to better understand the biology of multicellular organisms. Gaining an accurate picture of where and when meta- zoan genes are expressed remains a prerequisite for many such advances (Stathopoulos and Levine 2005). The discovery of distinctive, regulated programs of gene expression at a fine scale has the potential to reveal new cell types and substructures that make up tissues and the biological processes that govern their function. However, sensitive and widely applicable methods will be required to detect and distinguish developmentally programmed gene expression changes from those caused simply by cell cycling or environmental pertur- bation. Several methods for analyzing gene expression within tissues are currently available. Particular cell types can sometimes be cultured in vitro into populations of useful size. However, isolated cells in artificial media frequently behave differently from cells in vivo interacting with precisely positioned neighbors in three-dimensional microenvironments. Another approach is to isolate tissue cells by flow sorting, microdissection, or laser capture and then determine their expression profiles in depth (reviewed in Espina et al. 2006). Visualizing patterns of gene expression within the intact tissues of transgenic organisms containing gene expression re- porters may be the most general method (Tomancak et al. 2002). Epitope tagging, enhancer trapping, and gene trapping all have the added advantage that gene expression can subsequently be observed in living tissues, revealing dynamic processes that are largely beyond the reach of methods based on fixed material (reviewed in Herschman 2003; Dirks and Tanke 2006). Protein trapping is a variation of gene trapping in which endogenous genes are engineered to produce under normal controls protein segments fused to a reporter such as GFP. The great potential of this technology has been extensively documented in yeast, where large collections of strains that each trap a different gene have been generated using transposable elements (Ross-Macdonald et al. 1999) or by homol- ogous recombination (Huh et al. 2003). Extensive gene and protein trapping has also been carried out in cul- tured embryonic stem (ES) cells (Gossler et al. 1989; Friedrich and Soriano 1991), where fusions with more than half of annotated mouse genes have been recovered (see Skarnes et al. 2004). However, relatively few of these ES cell lines have so far been used to gener- ate corresponding mouse strains where the versatility and sensitivity of the method for analyzing tissue struc- ture can be tested. 1 Corresponding author: Carnegie Institution, 3520 San Martin Dr., Baltimore, MD 21218. E-mail: [email protected] Genetics 175: 1505–1531 (March 2007)

The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

Copyright � 2007 by the Genetics Society of AmericaDOI: 10.1534/genetics.106.065961

The Carnegie Protein Trap Library: A Versatile Tool for DrosophilaDevelopmental Studies

Michael Buszczak,* Shelley Paterno,* Daniel Lighthouse,* Julia Bachman,* Jamie Planck,*Stephenie Owen,* Andrew D. Skora,* Todd G. Nystul,* Benjamin Ohlstein,* Anna Allen,*

James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis,†

Nahathai Srivali,* Roger A. Hoskins‡ and Allan C. Spradling*,1

*Howard Hughes Medical Institute Research Laboratories, Department of Embryology, Carnegie Institution of Washington, Baltimore,Maryland 21218, †Department of Cell Biology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205 and

‡Department of Genome Biology, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720

Manuscript received September 18, 2006Accepted for publication December 18, 2006

ABSTRACT

Metazoan physiology depends on intricate patterns of gene expression that remain poorly known.Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancertrap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. Bysequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced greenfluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, wefound that 600–900 different genes are trapped in our collection. A core set of 244 lines trapped differentidentifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegiecollection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggestnew strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions.

THE central challenge of postsequence genomics isto learn how an enhanced knowledge of genes,

transcripts, and proteins can be applied to betterunderstand the biology of multicellular organisms.Gaining an accurate picture of where and when meta-zoan genes are expressed remains a prerequisite formany such advances (Stathopoulos and Levine 2005).The discovery of distinctive, regulated programs ofgene expression at a fine scale has the potential to revealnew cell types and substructures that make up tissues andthe biological processes that govern their function.However, sensitive and widely applicable methods will berequired to detect and distinguish developmentallyprogrammed gene expression changes from thosecaused simply by cell cycling or environmental pertur-bation.

Several methods for analyzing gene expression withintissues are currently available. Particular cell types cansometimes be cultured in vitro into populations of usefulsize. However, isolated cells in artificial media frequentlybehave differently from cells in vivo interacting withprecisely positioned neighbors in three-dimensionalmicroenvironments. Another approach is to isolatetissue cells by flow sorting, microdissection, or lasercapture and then determine their expression profiles

in depth (reviewed in Espina et al. 2006). Visualizingpatterns of gene expression within the intact tissues oftransgenic organisms containing gene expression re-porters may be the most general method (Tomancak

et al. 2002). Epitope tagging, enhancer trapping, andgene trapping all have the added advantage that geneexpression can subsequently be observed in livingtissues, revealing dynamic processes that are largelybeyond the reach of methods based on fixed material(reviewed in Herschman 2003; Dirks and Tanke 2006).

Protein trapping is a variation of gene trapping inwhich endogenous genes are engineered to produceunder normal controls protein segments fused to areporter such as GFP. The great potential of thistechnology has been extensively documented in yeast,where large collections of strains that each trap adifferent gene have been generated using transposableelements (Ross-Macdonald et al. 1999) or by homol-ogous recombination (Huh et al. 2003). Extensive geneand protein trapping has also been carried out in cul-tured embryonic stem (ES) cells (Gossler et al. 1989;Friedrich and Soriano 1991), where fusions withmore than half of annotated mouse genes have beenrecovered (see Skarnes et al. 2004). However, relativelyfew of these ES cell lines have so far been used to gener-ate corresponding mouse strains where the versatilityand sensitivity of the method for analyzing tissue struc-ture can be tested.

1Corresponding author: Carnegie Institution, 3520 San Martin Dr.,Baltimore, MD 21218. E-mail: [email protected]

Genetics 175: 1505–1531 (March 2007)

Page 2: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

Large-scale protein trap screens may also reveal newinformation about genome structure and function.Identifying in an unbiased manner locations through-out a genome where a coding exon can be expressedtests the accuracy and completeness of its annotation.Characterizing the splicing patterns that lead to normalor aberrant GFP expression tests the current catalog oftranscript isoforms generated by alternative splicing.Moreover, by recovering insertions in the same genethat splice differently and produce GFP with varyingefficiency, such a project might generate a data set usefulfor studying the determinants of splice site selection andtranscript stability.

Drosophila provides a favorable system for applyinggene traps to diverse developmental and genomicstudies. The genome sequence has been extensivelyannotated on the basis of experimental data (Misra

et al. 2002). Thousands of enhancer trap lines have beengenerated in large-scale transposon screens and culledof redundant strains by the gene disruption project (seeBellen et al. 2004). In contrast, producing Drosophilaprotein trap lines has remained difficult. Several hun-dred such lines were generated using a mobile GFP-containing exon flanked by both splice acceptor anddonor sites (Morin et al. 2001; Clyne et al. 2003).However, the process was highly inefficient, with as fewas 1 in 1500 progeny flies expressing GFP. Positive linesoften contained more than one insertion, preferentiallytagged a small number of hotspot loci, and tagged manysites not predicated to fuse the GFP exon in frame toany known coding region (Morin et al. 2001). Kelso

et al. (2004) found that the recovery of lines could beincreased by using an automated embryo sorter to selectGFP-positive embryos and established a website, Fly-Trap, to gather information on Drosophila protein traplines. Consequently, we initiated a large-scale proteintrap screen to increase gene coverage, test the genomeannotation, and address some of the remaining techni-cal difficulties in efficient line production.

Here we report the production of lines that trap 600–900 Drosophila genes, including 244 where one or moretrapped proteins can currently be identified. Using theDrosophila ovary as a test system we confirm that proteintrap lines reveal fine-scale details of developmentallyregulated protein expression, making them exception-ally valuable discovery tools for a wide range of studies.Finally, mapping RNA splicing patterns downstreamfrom .1200 insertions provides insight into how anadded exon affects splicing and suggests how the pro-duction of protein trap lines can be expanded to cover alarger fraction of the Drosophila proteome.

MATERIALS AND METHODS

Generation of P-element lines for protein trap screening:The P-element-based protein trap screens presented here uti-lized the pPGA, pPGB, and pPGC vectors described in Morin

et al. (2001). These elements carry a mini-white transgene inthe opposite orientation to an enhanced green gluorescentprotein (EGFP) exon, which is composed of EGFP sequence,without start or stop codons, flanked by splice acceptor anddonor sites from the Drosophila MHC locus. A, B, and C referto the position of the splice sites within the first and last codonsof the EGFP exon sequence. Previously used pPGA, pPGB, andpPGC third chromosome insertions (Morin et al. 2001) wereremobilized in the presence of balancer chromosomes. Newinsertions that mapped to the CyO balancer chromosome, didnot express EGFP, and exhibited remobilization rates off of theCyO balancer of at least 60% in single-pair mating assays wererecovered and used as starting stocks in the screen (see below).

piggyBac protein trap vectors: To make a shuttle vector forsubcloning the EGFP exon into different transposable ele-ments, the entire EGFP exons from the pPGA, pPGB, or pPGCplasmids were excised from the original P-element plasmids(kind gift of W. Chia) using EcoRI and PstI and subcloned intopBluescript (Stratagene, La Jolla, CA). These plasmids werethen cut with EcoRV and KpnI, end filled using Klenow, andreligated to themselves to create pBS-GFPA, pBS-GFPB, andpBS-GFPC. The resulting plasmids carry the EGFP exonsequence between a unique EcoRI site at the 59 end andunique PstI, SmaI, BamHI, and XbaI sites at the 39 end. Newtagging sequences can be inserted between the splice acceptorand donor sites of the exon using unique NcoI and XhoI sites.

Two different piggyBac protein trap vectors (Figure 1) wereconstructed using pBac{D. m. w1} (Handler and Harrell

1999) (kind gift of A. Handler). The pBac{D. m. w1} plasmidwas digested with ClaI to remove most of the mini-whitesequence and a linker containing HpaI, XhoI, and SpeI siteswas inserted in its place to form pBAC{DClaI}. To createpBAC{BglII-GFP}, the EGFP exons from pBS-GFPA, pBS-GFPB,and pBS-GFPC were subcloned into the BglII and MfeI sitesof pBAC{DClaI}. To create pBAC{HpaI-GFP}, EGFP exon se-quences were inserted between the MfeI and HpaI sites ofpBAC{DClaI}. An intronless yellow transgene from the yellow-BSX plasmid (Bellen et al. 2004) was then subcloned into theunique SpeI site of both pBAC{BglII-GFP} and pBAC{HpaI-GFP}to form either pBAC{BglII-GFP; y1}, which has the EGFP exonand yellow transgene oriented away from each other, orpBAC{HpaI-GFP; y1}, which has the EGFP exon and yellowtransgene pointing toward each other (Figure 1A). BothpBAC{BglII-GFP; y1} and pBAC{HpaI-GFP; y1} vectors carryingthe EGFP exon in the A frame were transformed into y w fliesusing the phspBac helper plasmid (Handler and Harrell

1999) (kind gift from A. Handler).We created stable genomic sources of the piggyBac tranpo-

sase using P-element transformation vectors. To place thepiggyBac transposase under control of the ubiquitin promoter,the piggyBac transposase ORF was excised from phspBac usingBamHI and DraI and ligated into the BamHI and SmaI sites ofthe pCasper3-Up2-RX poly(A) P-element vector (Ward et al.1998) (kind gift of R. Fehon), which carried a modifiedmultiple cloning site (kind gift of A. Hudson), to form pP{Ub-pBACtrans}. To make an inducible piggyBac transposase source,phspBac was digested with EcoRI and DraI and the fragmentcontaining both the Drosophila hsp70 promoter and piggyBactransposase ORF was ligated into the EcoRI and StuI sites ofpCasper4 to form pP{hsp70-pBACtrans}. These vectors wereused to transform y w flies, using standard P-element trans-formation techniques.

To test the activity of the piggyBac transposase transgenes,single-pair matings were set up using the pBAC{HpaI; y1}24.3insertion, which mapped to the X chromosome, and pP{Ub-pBACtrans} or pP{hsp70-pBACtrans} stocks. The pBAC{HpaI;y1}24.3 insertion was mobilized in males that were thenoutcrossed to y w females. Phenotypically yellow1 males in

1506 M. Buszczak et al.

Page 3: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

the next generation were scored as new insertions. ThepP{hsp70-pBACtrans} was able to remobilize the pBAC{HpaI;y1}24.3 insert in 43% (n ¼ 30) of single-pair matings testedwhereas the pP{Ub-pBACtrans} was able to remobilize thepBAC{HpaI; y1}24.3 insert in 48% (n ¼ 21) of single-pairmatings tested. The pBAC{HpaI; y1}24.3 insertion was remobi-lized in the presence of a CyO balancer chromosome. Newinsertions that did not express EGFP and mapped to the CyOchromosome were used in the pBAC-based protein trap screen.

Generation of embryos with novel transpositions: Weisolated new EGFP-expressing P-element insertions using thefollowing genetic scheme:

F1 :yw

Y;

Sco

CyO pPGfw 1; GFP exong;1

13

yw

yw;

1

1;Ki; Pfry1t7 :2 ; D2 -3g99B

Ki; Pfry1t7 :2 ; D2 -3g99B

F2 :yw

Y;

1

CyO pPGfw 1; GFP exong;1

Ki; Pfry1t7 :2 ; D2 -3g99B3

yw

yw;

1

1;

1

1

Sort embryos: :

New pBAC insertions were generated through a similargenetic scheme:

F1 :yw

Y;

Sco

CyO pBACfy 1; GFP exong;1

13

yw

yw;

1

1;Pfw 1; pBACtransgPfw 1; pBACtransg

F2 :yw

Y;

1

CyO pBACfy 1; GFP exong;1

Pfw 1; pBACtransg3yw

yw;

1

1;

1

1

Sort embryos:

Hereafter, P-element and piggyBac element-based protein traplines were treated the same. For the F1 cross several hundredmales and females of the appropriate genotypes were matedin bottles to produce several thousand males in which theelements were mobilized for the F2 cross. These males werecrossed to 8000–10,000 virgin y w females in a population cage.These virgin females were obtained using a virgining stock thatcarried a heat-shock-inducible hid transgene on the Y chro-mosome (kind gift of R. Lehmann). Two separate overnightembryo collections from each population cage were screenedfor EGFP expression. We limited the number of times wescreened embryos from a particular cage to try to minimize thenumber of identical insertions recovered due to premeoiticinsertion events.

Embryo sorting and line establishment: We screened forEGFP expression in embryos using a COPAS Drosophilaembryo sorter (Union Biometrica). Embryos were dechorio-nated in 50% bleach for 2.5 min and washed extensively withwater. Dechorionated embryos were then washed into sortingsolution (0.53 PBS, 2% Tween-20). With the exception of thesorting solution, the COPAS sorter was used according to themanufacturer’s protocol, using the manufacturer’s solutions.The sorter and sample pressures of the COPAS machine andembryo density were maintained so that the COPAS sorterscreened 15–20 embryos/sec. The sorter used a 488, 514 nmmultiline argon laser. EGFP fluorescence was detected usingPMT1 set to 510 nm. Red fluorescence, used as a measure ofembryo autofluorescence, was detected using a second PMTset to 580 nm. Baseline values for each fluorescent axis were set

empirically using previously isolated fly strains that express lowlevels of EGFP and y w non-EGFP expressing embryos (Figure1). Approximately 250,000 embryos were sorted in five 50,000-embryo batches per day. Sorted embryos were collected andwashed in dH2O. All the embryos from a single batch wereplaced together in standard food vials. We estimate that�80%of the sorted embryos survived to adulthood. Sorted flies thatsurvived to adulthood and did not carry the Ki, P{ry1t7.2;D2-3}99B or P{w1; pBACtrans} chromosomes were individuallyoutcrossed to a y w stock. New lines that carried EGFP-expressing insertions that did not map to the starting CyOchromosome were maintained as stocks.

DNA sequencing, RT–thermal asymmetric interlaced PCRanalyses, and prediction of fusion potential: Genomic se-quences flanking either P-element or piggyBac protein trapinsertions were determined by members of the LawrenceBerkeley Lab group using an established protocol for sequenc-ing inverse PCR products from genomic DNA (Bellen et al.2004). Database software developed for the annotation of theDrosophila gene disruption project (Bellen et al. 2004) wasused to manage the sequence data. Once the insertion site of agiven protein trap line was determined, a FileMaker Pro data-base that contained information [version 3.2 of the Drosoph-ila genome annotation (Misra et al. 2002)] for all Drosophilatranscripts, exons and introns, and their reading frames wasused to predict which gene(s) and transcript(s) were beingtrapped by a given protein trap insertion.

We developed a reverse transcriptase coupled thermalasymmetric interlaced PCR (RT–TAIL) protocol largely onthe basis of methods used to determine T-DNA insertion sitesin Arabidopsis (Singer and Burke 2003). This methodallowed us to determine the mRNA sequence adjacent to theEGFP exon without using gene-specific primers. Total RNAwas isolated from 15 adult flies using an RNAqueous-96automated kit (catalog no. 1812; Ambion, Austin, TX). Thesamples were ground in 200 ml of sample buffer and spun for 5min at 14,000 rpm. The supernatant was placed in a 96-wellplate and 100 ml of 100% EtOH was mixed with each sample.The sample was transferred to the filter plate, washed, andthen treated with Dnase I (Ambion) for 15 min. Rebindingbuffer was added to each well of the filter plate, and the platewas washed extensively. The RNA was eluted off the filter andprecipitated with 7.5 m LiCl solution (Ambion). The resultingRNA pellet was washed with 75% EtOH and then retreatedwith Dnase I for 30 min at 37�. Dnase inactivation reagent(Ambion) was added to the samples. The RNA samples werespun and the supernatant was transferred to a new plate. Adetailed protocol is available upon request.

The following GFP-specific primers were used for RT–TAILPCR:

GFP-For1, 59-GGAGGACGGCAACATCCTGG-39;GFP-For2, 59-CAACGTCTATATCATGGCCG-39;GFP-For3, 59-AGACCCCAACGAGAAGCGCG-39;GFP-Rev1, 59-GTCGTGCTGCTTCATGTGGTCG-39;GFP-Rev2, 59-GACACGCTGAACTTGTGGCCG-39;GFP-Rev3, 59-AGCTCCTCGCCCTTGCTCACC-39 .

The arbitrary degenerate (AD) primers used in this studywere originally described by Singer and Burke (2003) but arelisted here for convenience:

AD3, 59-AGWGNAGWANCAWAGG-39;AD4, 59-STTGNTASTNCTNTGC-39;AD5, 59-NTCGASTWTSGWGTT-39;AD6, 59-WGTGNAGWANCANAGA-39.

A pool of the AD primers was mixed according to Singer

and Burke (2003).

Protein Trap Lines in Drosophila 1507

Page 4: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

The first round of RT–TAIL PCR was set up in 96-well formatusing a one-step RT–PCR kit (QIAGEN, Valencia, CA). Forevery reaction 5 ml of total RNA was mixed with 10 ml 53 buffer,2 ml 10 mm dNTP solution, 1 ml GFP-For1 or -Rev1 primer, 12.5ml AD primer mix, 2 ml enzyme mix, and 17.5 ml of dH2O. Thereverse transcription reaction was carried out at 50� for 30 min.The sample was then heated to 95� for 15 min and then cycledfor primary TAIL–PCR according to Singer and Burke

(2003), using a MJ Research (Watertown, MA) thermal cycler.The secondary and tertiary TAIL–PCR reactions were carriedout according to Singer and Burke (2003), using GFP-For2 or-Rev2 primers and GFP-For3 or -Rev3 primers, respectively, andregular TAQ DNA polyermase (Roche, Indianapolis). ThePCR products of the tertiary reaction were treated with exoSAP(United States Biochemical, Cleveland) and sequenced usingGFP-For3 or -Rev3 primers.

The RT–TAIL PCR protocol using the three GFP-Forprimers, which amplified off of the 39 end of EGFP, consis-tently yielded better results than the same reaction using theRev primers. Therefore most of the RT–TAIL PCR data definesplicing products at the 39 end of the EGFP sequence. Toidentify sequence fusing to the 59 end of EGFP, we employed a59 RACE kit according to the manufacturer’s protocol (Am-bion), using the EGFP reverse primers listed above.

Analysis of protein expression in tissues: Samples weredissected in Grace’s medium, placed in 48-well plates outfittedwith a nylon mesh bottom, and fixed in 4% paraformaldehydebuffered in 13 PBS for 10 min at room temperature. The platewas washed extensively with PBT (13 PBS, 0.5% Triton X-100,0.3% BSA) and incubated overnight at 4� with rabbit anti-GFPantibody (Torrey Pines) (1:2000) in PBT. The samples werethen washed extensively with PBT and incubated with goatanti-rabbit Alexa488 (Molecular Probes, Eugene, OR) (1:400)for 4 hr at room temperature. The samples were then washedwith PBT, stained with 2 mg/ml of DAPI, and mounted inVectashield (Vector Laboratories, Burlingame, CA). Imageswere collected using a Leica SP2 confocal microscope.

RESULTS

Generating a large initial collection of tagged strainsexpressing EGFP: Our initial strategy was to generate amuch larger number of lines containing new proteintrap vector insertions than in previous screens and toinstitute additional technical improvements. Because oftheir proven utility, we used the same P-element-basedprotein trap vectors employed by Morin et al. (2001),but we also constructed a similar set of vectors withpiggyBac (Figure 1A). As described in materials and

methods, we set up crosses in small population cages tolimit the recovery of clusters, utilized dominant markersto remove the transposase source from all new lines, andidentified rare GFP-expressing embryos rapidly andsensitively using an automated embryo sorter (Figure1B). This protocol allowed us to screen .60 millionembryos over a period of 2.5 years, to identify .7500‘‘green’’ embryos, and to use each one to start anindividual culture (see Table 1). Ultimately, 7404 strainswere successfully established, maintained by selectionfor white1 eye color, and analyzed further as dia-grammed in Figure 1C.

The same scheme was used with both transposons;however, in practice the piggyBac vectors were not nearly

as efficient at generating EGFP-positive candidate linesas the P-element vectors (Table 1). P-element vectorstypically exhibited 70% mobilization and generated �1EGFP-expressing embryo per 1000 sorted. In compari-son, the piggyBac vectors displayed nearly 50% mobili-zation, but they yielded only 1 EGFP-expressing embryoper 50,000 sorted. Thus, the piggyBac vectors wereslightly less efficient at mobilization, but drastically lessefficient at generating EGFP-positive lines upon in-sertion. Consequently, we soon abandoned attempts togenerate large numbers of piggyBac protein trap inser-tions (Table 1), but continued to characterize the lineswe did recover to learn if they would shed any light onthe lower frequency of trapping observed.

Localizing insertions on the annotated genome: Toidentify candidate proteins that may have been fusedwithin individual lines, we determined the genomicDNA sequence flanking the insertion(s) in collabora-tion with the Berkeley Drosophila Genome Project(BDGP) gene disruption project (materials and

methods). In most cases, the sequences from boththe 59 and 39 vector end junctions mapped by BLASTanalysis to a unique insertion site within the Drosophilagenome sequence. Lines for which the sequencingreaction failed, the sequence matched repetitive DNA,or the 59 and 39 sequences differed (indicating that twoor more insertions were present in the stock) wererecycled back into the starting pool, and frequently aunique single insert was eventually identified. Alto-gether the insertions in 1375 C frame, 3172 B frame,and 1009 A frame P-element and 164 piggyBac A framelines were localized to unique genomic sites.

Knowing the genomic location of an insertionallowed us to predict which transcripts would incorpo-rate the EGFP exon and whether they would undergosplicing and translation into a functional fusion protein.First, we removed �1550 duplicate lines derived frompremeiotic clusters that were identified because theybore insertions identical in position and orientation tothose in one or more sibling lines. Of the 4170 in-dependent lines remaining, 2149 (52%) were associatedwith a gene correctly oriented for possible fusion (i.e.,located between �500 and the 39 end). We also clas-sified the ways an insert can be located relative to itsclosest annotated transcript into general categories asdiagrammed in Figure 1D and classified all the lines(Table 2).

Expression of a fusion protein is expected when theGFP exon resides between two coding exons within anintron of matching reading frame (class 1A). Such inser-tions made up only 23% of the total localized insertionsand defined 192 different genes (Table 4). Forty percentof insertions were close to an annotated gene but werenot predicted to express the EGFP exon (classes 2–4),while 37% of the lines were not located within 0.5 kb of acorrectly oriented gene. EGFP production from theselines might be explained by the use of unannotated

1508 M. Buszczak et al.

Page 5: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

Figure 1.—Generation and classification of protein trap vector insertions. (A) Schematic of protein trap vectors (after Morin

et al. 2001). (B) Sample output from automated sorting of Drosophila embryos mobilized from site not expressing GFP. RareGFP1 embryos (red circles) registering above a threshold value are diverted by the machine and later used to start individualcultures. (C) Scheme for characterization of putative protein trap lines (see text). (D) Classification of the general types ofrelationships between transposon inserts and the local genome annotation. Classes 1–4 consist of insertions in the appropriateorientation located within a codon intron (class 1), a noncoding transcribed region (class 2), an upstream genomic region (class3), or an exon (class 4). For each class, the insert was either of the appropriate frame (subclass A) or of nonappropriate frame

(Continued)

Protein Trap Lines in Drosophila 1509

Page 6: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

genic elements, by noncanonical splicing events, or bythe presence of a second insertion at a canonical site.

The Drosophila genome annotation is highly sup-ported by experimental evidence (Misra et al. 2002);hence the frequency of these discrepancies was sur-prising, but a similar outcome has been reported inprevious protein trap screens using both yeast (Ross-Macdonald et al. 1999) and Drosophila (Morin et al.2001). Some protein-coding genes may have beenmissed within cDNA libraries (Hild et al. 2003), and asubstantial number of genes may contain unannotatedfar-upstream promoters and alternative translation startsites. There might be a large class of RNA genes thathave escaped detection but that can drive expression ofEGFP using cryptic start sites. Alternatively, the highselective pressure used to isolate EGFP1 strains mayhave led to the recovery of rare events in which thenormal gene or transcript structure has changed.

Analysis of fusion transcripts: We sequenced por-tions of the fusion transcripts to address how EGFPexpression arises in lines of various classes. A limitednumber of 59 RNA sequences were obtained by 59 RACEanalysis or by TAIL–PCR (see materials and methods).As expected, several lines in class 1A were found toinitiate in normal exons upstream from the insertionsite. However, we discovered several CB lines that

spliced into the EGFP exon from the 59 P-element se-quences of the vector. The P-element promoter is highlyefficient at enhancer trapping, and the entire first exonof the transposase gene is present in the vector alongwith the start of intron 1, which is in the frame com-patible with CB lines. These lines were associated withnuclear localized EGFP, possibly due to fusion of thefirst exon of P transposase with EGFP. Further evidenceof enhancer trapping was observed in the analysis of lineCA07138. The EGFP RNA was fused to sequences, in-cluding an in-frame ATG start codon, derived by tran-scription and splicing from the noncoding strand of themini-white transgene carried in the P-element vector(Figure 1E). These observations suggest that EGFP ex-pression in a significant number of the lines depends ontranscripts initiated from within the transposon itself byenhancer trapping rather than on EGFP exon additionby splicing into exogenous transcripts.

Analysis of downstream transcript sequences: Togain additional information, we analyzed the sequencesdownstream from the EGFP exon from many lines in thecollection by carrying out RT–TAIL–PCR in a 96-wellformat (materials and methods). 39 sequences up to700 bp in length and defining the location of one tosix downstream exons were obtained for .1200 lines(Table 3). The pattern of downstream splicing allowedproductive fusions to be identified and indicated linesthat splice out-of-frame and likely become subject tononsense-mediated decay (NMD) (Vasudevan andPeltz 2003). Fusions within lines of classes 2–4 often

(subclass B) to fuse to the protein if splicing continued to the next annotated exon splice acceptor site. Class 5 consists of trans-posons inserted .0.5 kb from a correctly oriented annotated gene. (E) The structure of cryptic transcripts initiated within theDrosophila mini-white marker gene that contain an ATG codon and splice in frame to EGFP, thereby allowing expression inde-pendent of an endogenous transcript in some lines. (F) Western blot analysis of Dlg1 and eIF-4E protein production in controlanimals (y w, CC00380) and insertion lines predicted to trap Dlg1 (CC01936) or eIF-4E (CC00392, CC00375, and CC01492). (G)Abnormal nuclear accumulation of CG15015-EGFP in line CC01311 whose insertion lies within the FHC domain (left). Tissueculture cells expressing N-terminal or C-terminal fusions are found in the cytoplasm (center and right).

TABLE 1

Project summary

Type No. %

Total setup 7404 100CA 1344 18CB 4000 54CC 1870 25piggyBac A 190 3Aligned sequence 5720 77Clusters 1550 21Independent aligned 4170 56Gene hits 2149 29Spacing hits 2093 28Different genes 1154 16Protein traps 244Enhancer traps 256Novel gene/exon 50Unclassified 300Balanced stocks 878 12

TABLE 2

Line types

Lines

Type Code N %

Coding intron, in frame 1A 1337 23Coding intron, out of frame 1B 173 3Noncoding intron, in frame 2A 29 1Noncoding intron, out of frame 2B 429 81–500 bp upstream, in frame 3A 56 11–500 bp upstream, out of frame 3B 756 13Exon 4 804 14.500 bp to next oriented gene 5 2093 37Total 5677 100

See class definitions in Figure 1D.

1510 M. Buszczak et al.

Page 7: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

were predicted to encode a ‘‘linker peptide,’’ whichmight or might not include a stop codon, derived fromthe translation of a small segment of upstream nucleo-tides. The RNA analysis also revealed the presence of asecond insertion in 14% of type 1A lines, but between 24and 50% of the other classes. The second insertionsfound within class 2–5 lines were often valid protein trapalleles (class 1A) and were frequently the true source ofthe lines’ EGFP production. This information allowedus to identify additional candidate fusions (Table 4), tocorrect many initial line classifications, and to moreaccurately estimate the total number of trapped genes(Table 1: 600–900). By the time lines were selectedand balanced, secondary insertions or damage did notcontribute substantially to the phenotypes reported inTable 4. Tests estimated the frequency of backgroundlethal mutations among balanced, saved lines at 7–21%,similar to the best transposon screens (Spradling et al.1999).

Novel splicing suggests new genes and exons: Wecompared the splicing observed downstream from theinserted exons with that of the genome annotation(Misra et al. 2002) to identify new Drosophila gene andtranscript isoform candidates. To identify new candi-date genes, we focused on lines inserted .0.5 kb froman appropriately oriented known gene and for whichRNA sequence data were also available. In 114 of these205 lines, the RNA sequence coincided with the positionand orientation of the insertion and therefore indicatedthe splicing pattern downstream of the single EGFPexon. Most of the lines spliced to one or more novelexons. At least 8 probably correspond to unannotatedgenes because they match previous gene predictions(Hild et al. 2003) or are supported by EST data (seeTable 4). Most of the remaining exons do not predictproteins with homologs in other species and representeither aberrant splicing events or novel or untranslatedexons.

Similar analysis of 297 lines with intron insertionsallowed us to test for novel exons and transcriptisoforms. We examined 443 splicing events and identi-fied a total of 35 (7.9%) that did not correspond tocurrent gene models (Misra et al. 2002). Since at least

some of these differences probably resulted from aber-rant splicing induced by the insertions, this represents amaximum estimate of the fraction of unannotatedgenomic exons and emphasizes the high accuracy andcompleteness of current Drosophila gene models, atleast for abundant transcript forms. Often, the RNAdata indicated which isoform among several predictedto fuse in frame is likely to predominate in ovariantissue. For example, we could determine that lineCA06613 in ovarian tissue predominantly fuses theSu(Tpl) gene rather than Mi-2, in whose transcriptionunit it also lies in frame.

The nature of the noncanonical splices observed wasinteresting. The most common events (21/35) were forinsertions in large introns to splice to a novel exon(s)prior to joining the predicted downstream exon. Somesimply appear to define alternative isoforms that skipexons or utilize different exon combinations not pre-viously documented. Some of these events may havebeen induced by the abnormal position of the EGFPexon within the primary transcript. However, severallines appear to define alternative isoforms because theyutilize different combinations of known exons in nopreviously documented transcript isoforms. Three linesutilized 59 start sites for exons that differed by 6, 21, or27 bp from the annotated exon. The CC01473 transcriptreads through an annotated exon into the adjacentintron and probably defines a novel alternate transcript39 end. Although we consider it likely that many of thesedifferences reflect endogenous Drosophila gene ex-pression, all of the candidate novel genes and transcriptisoforms require independent confirmation in strainslacking protein trap insertions. Such tests were beyondthe scope of our project.

Protein trap insertions likely vary in the fraction of theendogenous protein that is tagged with EGFP for avariety of reasons. First, in many lines only some of themultiple-transcript isoforms contributing to proteinproduction are tagged by the insertion and fused inframe. Second, the splicing efficiency of the EGFP exonmight vary due to its surrounding genomic context. Toinvestigate this issue, we analyzed the protein productsof tagged genes by Western blotting. The tagged pro-teins were easily distinguished from their wild-typecounterparts on the basis of size and by probing withprotein-specific and anti-EGFP antibodies (Figure 1F).In line CC01936 all three isoforms are predicted toincorporate the EGFP exon in frame, and nearly all ofthe ovarian Dlg1 protein incorporated EGFP as in-dicated by its mobility. A similar result was reportedpreviously in the case of line CB02119 (Buszczak andSpradling 2006), where the precursors of five of sixannotated transcripts are predicted to contain theinsertion, although only two fuse in frame. In contrast,only �50% of the ovarian wild-type eIF-4E protein istagged with EGFP (Figure 1F) despite the fact that six ofseven annotated eIF-4E transcripts initiate upstream

TABLE 3

RNA analysis

Type SuccessfulConfirmed

DNASecondinsert

%confirmed

CA 316 224 50 82CB 328 166 75 69CC 572 165 72 70Piggy A 13

Totals 1229 555 197 74

Protein Trap Lines in Drosophila 1511

Page 8: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

Iden

tifi

edtr

app

edp

rote

ins

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

14-3

-3e

CA

0650

614

0689

623R

1L

eth

alR

A1.

51

4R

B1.

51

4R

C1.

51

4R

D1.

51

41A

26-2

9-p

CA

0673

513

9873

133L

1h

vR

A0.

51

33A

Aco

nC

C00

758

2116

9155

2L1

Let

hal

RB

1.5

14

1AA

ctn

CC

0196

119

2733

2X

�h

vR

B2.

52

10R

A2.

52

10R

C2.

52

101A

AG

O1

CA

0691

498

4163

32R

�h

vR

C3.

53

8R

A3.

52

7R

B0.

52

71A

Alh

CC

0136

729

3565

33R

�h

vR

A5.

52

10R

D5.

52

101A

apt

CC

0139

219

4683

192R

1h

vR

B1.

51

5R

D1.

52

6R

E1.

53

6R

C1.

51

51A

apt

CC

0118

619

4738

082R

1h

vR

B1.

51

5R

D2.

52

6R

E2.

53

6R

C1.

51

51A

arg

CB

0357

941

7662

X1

hv

RA

3.5

15

1AA

rgk

CB

0549

290

5163

33L

�L

eth

alR

A3.

52

4R

B2.

52

31A

Arg

kC

B03

789

9056

078

3L�

Let

hal

RA

2.5

24

RB

0.5

23

1AA

tpa

CC

0031

916

7834

183R

1L

eth

alR

C1.

53

10R

A1.

51

10R

B0.

52

9R

D0.

53

61A

baz

CC

0194

117

0725

82X

1h

vR

A1.

51

71A

bel

CC

0086

944

8535

03R

�h

vR

A1.

51

41A

Bes

t1C

B02

354

5996

196

3R�

hv

RA

12

74

bT

ub

56D

CC

0206

915

3385

632R

�L

eth

alR

B1.

51

2R

C1.

52

2R

D1.

52

21A

bo

nC

B02

667

1641

8866

3R1

Sem

ilet

hal

RA

0.5

110

3BB

sgC

A06

978

8104

393

2L1

hv

RB

2.5

27

RA

2.5

27

RD

2.5

27

RC

2.5

27

1Ab

un

CB

0343

112

4829

432L

�h

vR

A3.

51

5R

B1

13

RD

1.5

13

RE

2.5

24

1AC

amC

C00

814

8149

270

2R1

Let

hal

RA

2.5

25

1AC

AP

CA

0692

461

9037

82R

1h

vR

I9.

51

14R

H7.

53

12R

J8.

52

13R

G7.

53

121A

CA

PC

A07

185

Tra

nsp

oso

n2R

1h

vR

I/R

F3.

51

141A

Cat

CC

0090

718

8159

513L

1L

eth

alR

A1.

51

31A

Cg

CC

0146

910

0631

842R

1h

vR

D1.

51

10R

B1.

51

11R

C2.

53

11R

A2.

52

111A

CG

1072

4C

A07

499

1340

6231

3L1

hv

RA

1.5

17

RB

1.5

17

1AC

G11

138

CA

0684

412

4725

70X

�h

vR

C1.

51

41A

CG

1125

5C

B04

917

1301

5302

3L�

hv

RA

1.5

13

1AC

G11

266

CC

0139

170

3315

72L

1h

vR

A2.

52

6R

D2.

58

9R

G2.

57

8R

F1.

56

71A

CG

1196

3C

C06

238

4764

704

3R1

Sem

ilet

hal

RA

2.5

29

1AC

G12

163

CC

0062

510

7693

53R

�h

vR

A1.

51

6R

B1.

51

51A

CG

1278

5C

C06

135

1209

8103

3R�

Let

hal

RA

11

64

CG

1392

0C

C01

646

1650

510

3L1

Let

hal

RA

1.5

13

1AC

G14

207

CB

0206

919

5024

92X

1L

eth

alR

A1.

51

4R

B1.

51

51A

CG

1440

CA

0728

783

3169

9X

1h

vR

A1.

51

51A

CG

1464

8C

A06

610

2292

313R

1h

vR

A1.

51

6R

B1.

53

61A

CG

1465

6C

A06

996

6240

153R

�L

eth

alR

A2.

51

31A

CG

1592

6C

B04

063

1236

5107

X1

hv

RA

0.5

25

3AC

G16

00C

B03

410

3416

244

2R�

hv

RA

1.5

23

RC

1.5

13

RB

1.5

23

1AC

G17

273

CC

0129

416

6549

863R

�L

eth

alR

A1.

51

51A

CG

1764

6C

B02

833

1737

431

2L1

hv

RB

1.5

212

RA

0.5

212

2BC

G18

88C

B02

075

5434

043

2R�

hv

RA

1.5

12

1B

(con

tin

ued

)

1512 M. Buszczak et al.

Page 9: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

CG

1910

CC

0149

127

5738

623R

1h

vR

B1.

52

4R

A1.

51

4R

D0.

52

41A

CG

3036

CA

0680

148

9415

72L

1h

vR

A2.

52

71A

CG

3101

2C

A06

686

2664

0622

3R1

hv

RC

1.5

16

1AC

G31

012

CA

0681

026

6460

373R

1h

vR

C4.

51

6R

D1.

51

31A

CG

3169

4C

A07

748

2873

276

2L�

hv

RA

1.5

15

1AC

G32

062

CC

0051

110

5156

083L

1h

vR

B2.

52

13R

D2.

52

141A

CG

3242

3C

C00

236

5177

892

3L�

hv

RA

3.5

310

RD

2.5

29

RB

3.5

310

RC

12

81A

CG

3247

9C

A06

614

8820

413L

1L

eth

alR

A4.

52

101A

CG

3248

6C

C00

904

3060

297

3L�

hv

RD

1.5

18

1AC

G32

560

CA

0677

217

6324

69X

1h

vR

A3.

51

81A

CG

3287

CB

0448

326

9996

72R

1h

vR

B2.

52

5R

C1.

51

41A

CG

3393

6C

C01

586

5176

796

3R1

Let

hal

RA

2.5

36

RB

1.5

25

2BC

G38

10C

A07

694

1802

294

X�

hv

RC

1.5

24

RB

12

4R

A0.

51

32B

CG

3939

CA

0756

230

6987

1X

1h

vR

A1.

51

21A

CG

5059

CA

0692

620

5097

423L

�h

vR

A1.

51

5R

C1.

52

5R

B1.

51

5R

D1.

52

51A

CG

5060

CC

0052

616

0790

413R

1L

eth

alR

A1.

51

71A

CG

5130

CB

0361

920

4865

523L

�h

vR

A1.

52

4R

B1

24

2AC

G51

74C

A07

176

1430

9292

2R1

hv

RJ

1.5

16

RI

1.5

16

RA

1.5

16

RH

1.5

15

1AC

G53

92B

A00

207

1500

1227

3L�

RA

2.5

28

1AC

G61

51C

A07

529

1580

8775

3L�

Let

hal

RA

2.5

15

RC

2.5

15

RB

2.5

15

1AC

G63

30C

B03

223

2277

9020

3R�

Let

hal

RB

2.5

26

RA

0.5

15

1AC

G64

16C

C00

858

8627

040

3L1

Let

hal

RE

3.5

19

RF

3.5

19

RA

2.5

28

RG

2.5

25

1AC

G64

24C

C00

677

1360

4099

2R�

hv

RA

23

4R

B0.

52

34

CG

6783

CA

0696

073

9231

33R

�h

vR

B1.

51

3R

C1.

51

3R

A1.

52

41A

CG

6854

CA

0733

215

0993

553L

1L

eth

alR

C1.

51

4R

A1.

52

4R

B1.

52

51A

CG

6930

CA

0655

675

9741

03R

�h

vR

A0.

51

5R

B2.

52

6R

C1.

51

51A

CG

6945

CC

0086

415

0882

633L

�L

eth

alR

A0.

53

33A

CG

7185

CC

0064

583

0808

93L

�h

vR

A1.

51

71A

CG

7484

CB

0410

117

6472

863L

1h

vR

B0.

51

33A

CG

8209

CB

0208

679

6969

33L

1h

vR

A1.

51

31A

CG

8213

BA

0016

948

6330

92R

�R

A1.

51

81A

CG

8351

CA

0722

846

3129

93R

1L

eth

alR

A4.

51

51A

CG

8443

CA

0660

412

0751

982R

1h

vR

A1.

51

71A

CG

8552

CA

0735

281

5958

02L

�h

vR

A0.

51

73A

CG

8583

CA

0660

373

5438

83L

1h

vR

A1.

51

41A

CG

8920

CC

0082

516

2081

992R

1h

vR

C2.

52

8R

B2.

52

8R

A1.

52

41A

CG

9331

CB

0496

220

8245

592L

1h

vR

B1.

52

5R

D1.

52

5R

C1.

52

6R

E1.

51

51A

CG

9772

CB

0218

816

3496

3R�

hv

RB

1.5

15

RA

0.5

15

RC

0.5

12

1AC

G97

96C

C00

817

9226

998

3R�

hv

RA

1.5

14

1AC

G98

94C

C00

719

2755

189

2L1

hv

RB

2.5

24

RA

1.5

13

1AC

p1

CC

0137

798

5205

72R

1h

vR

B2.

52

4R

A2.

52

4R

C2.

51

41A

(con

tin

ued

)

Protein Trap Lines in Drosophila 1513

Page 10: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

Crc

CA

0650

754

5615

83R

�h

vR

A2.

51

41A

Crp

CB

0307

316

2802

972L

�h

vR

A1.

51

31A

Cyc

BC

C01

846

1869

4008

2R�

hv

RA

1.5

15

RD

1.5

15

RB

11

5R

C0.

51

41A

Dek

CC

0092

112

7439

402R

1h

vR

A1.

51

11R

C1.

51

11R

B0.

51

10R

D1.

51

91A

Dek

CA

0661

612

7447

772R

1h

vR

A2.

51

11R

C2.

51

11R

B1.

51

10R

D2.

51

91A

des

at1

CC

0169

482

6973

83R

1h

vR

A1.

52

5R

C1.

52

5R

E1.

52

5R

B1.

52

52B

Df3

1C

B02

104

2162

9393

2L�

hv

RB

2.5

24

RA

1.5

13

RF

1.5

13

1Ad

lg1

CC

0193

611

2862

74X

1h

vR

F5.

51

9R

B6.

52

17R

H5.

52

16R

E2.

52

131A

DL

PC

A06

573

6480

860

2L1

hv

RA

0.5

14

3AD

oa

CB

0388

924

7174

833R

1h

vR

A3.

51

13R

B3.

52

13R

E0.

52

12R

F3.

51

131A

do

mB

A00

164

1721

1471

2R1

RD

1.5

215

RA

1.5

214

RE

1.5

211

2AD

pC

A06

594

9111

298

2R1

hv

RA

1.5

19

1Ad

rlC

C00

251

1919

0343

2L1

hv

RA

0.5

14

3BE

f2b

CC

0192

421

6816

942L

�L

eth

alR

A1.

51

5R

C1.

52

5R

B1

25

1Aef

fC

C01

915

1056

5092

3R�

Let

hal

RA

2.5

26

1AeI

F-2

bC

C06

208

1251

9527

3L�

Let

hal

RA

1.5

13

1AeI

F3-

S9C

B04

769

1342

3919

2R1

hv

RB

1.5

25

RA

11

42A

eIF

-4a

CB

0372

159

8247

42L

1L

eth

alR

A1.

51

5R

C1.

51

5R

B0.

51

5R

D0.

51

51A

eIF

-4E

CC

0039

293

9507

83L

�h

vR

D1.

52

6R

B1.

52

6R

C1.

51

5R

A1.

52

61A

eIF

-5A

BA

0015

519

9456

882R

1R

B1.

52

4R

A1.

52

42A

eIF

-5C

BA

0028

014

2519

23R

�R

A1.

52

8R

C2.

53

9R

F1.

52

8R

D1.

52

82A

Eip

63E

CA

0674

235

6939

93L

1h

vR

D2.

51

11R

E3.

52

12R

A4.

53

12R

B3.

52

111A

Elf

CA

0651

512

4357

892L

1h

vR

A1.

51

71A

eRF

1C

B03

931

2034

2600

3L1

Let

hal

RC

2.5

28

RB

2.5

28

RE

2.5

28

RG

2.5

28

1AF

as2

CB

0361

340

2945

4X

�h

vR

A9

210

4Afa

xC

C01

359

1640

3817

3L�

Let

hal

RA

1.5

15

RC

1.5

15

1AF

er1H

CH

CA

0650

326

2127

913R

�L

eth

alR

A1.

51

3R

B2.

52

4R

C2.

52

4R

D2.

52

41A

Fer

2LC

HC

A07

607

2621

5006

3R1

Let

hal

RA

33

4R

B3

34

RC

11

24

Fim

CC

0149

317

1857

87X

�h

vR

A1.

51

5R

C2.

52

6R

D0.

51

51A

Fkb

p13

CA

0734

017

3850

642R

�h

vR

A0.

52

5R

B1.

51

51A

Fp

ps

CB

0493

771

9469

82R

�h

vR

A1.

51

61A

Fs(

2)K

etC

A07

301

2073

5659

2L1

hv

RA

2.5

26

1AG

di

CA

0710

894

9396

72L

�L

eth

alR

A1.

51

31A

gish

CB

0280

412

1063

143R

1h

vR

D2.

52

12R

B2.

52

12R

E2.

52

12R

A0.

53

121A

Glc

AT

-SC

A07

168

9616

795

2L1

hv

RA

1.5

15

RB

0.5

15

1AG

liC

B02

989

1576

2784

2L�

hv

RA

0.5

27

RB

0.5

25

RC

0.5

38

RD

0.5

27

3AG

-oa

47A

CA

0665

863

3178

32R

1Se

mil

eth

alR

B2.

52

8R

C2.

53

9R

D2.

52

8R

E2.

52

81A

gp21

0C

C00

195

1647

884

2R1

hv

RA

0.5

119

5H

DA

C4

CA

0713

413

1728

50X

�h

vR

A2.

52

14R

C1.

51

121A

hep

hC

C00

664

2776

3272

3R�

Let

hal

RB

4.5

414

RA

4.5

414

RK

3.5

313

RH

5.5

516

1AH

is2A

vC

C00

358

2269

3293

3R1

Let

hal

RA

2.5

14

1A

(con

tin

ued

)

1514 M. Buszczak et al.

Page 11: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

ho

mer

CB

0212

167

2293

32L

�h

vR

A1.

51

6R

B0.

51

6R

C1.

51

71A

ho

wC

A07

414

1788

1934

3R1

Let

hal

RA

2.5

28

RB

2.5

27

RC

2.5

28

1AH

rb87

FC

C00

189

9485

980

3R�

hv

RA

1.5

13

RB

1.5

13

1AH

rb98

DE

CC

0156

324

4259

693R

1h

vR

A1.

51

5R

E1.

51

5R

B0.

51

5R

C0.

51

51A

Hrb

98D

EC

A06

921

2442

7038

3R1

Let

hal

RA

2.5

15

RE

2.5

15

RB

2.5

15

RC

2.5

15

1AH

sc70

Cb

CB

0265

614

0312

993L

1L

eth

alR

A2.

52

6R

B2.

52

6R

C1.

51

51A

Imp

CB

0457

310

7026

76X

�h

vR

F2.

52

6R

H3.

53

7R

G2.

52

6R

D2.

52

61A

Ind

yC

C00

377

1883

3638

3L�

hv

RB

1.5

19

RC

1.5

29

RA

0.5

29

1Ain

x7C

B04

539

6892

590

X�

hv

RB

4.5

35

1Bju

mu

CC

0029

461

8224

53R

1h

vR

A1.

51

31A

Jup

iter

CB

0519

074

3065

13R

�h

vR

D1.

51

4R

H1.

51

5R

A1.

51

5R

E0.

52

61A

kay

CC

0115

625

6082

143R

1h

vR

A1.

51

3R

B0.

51

31A

kek1

CB

0219

012

8228

342L

�h

vR

A0.

51

23A

kis

CC

0146

622

0632

2L�

hv

RA

12.5

218

RB

1.5

17

1Aki

sC

C00

801

2219

242L

�h

vR

A12

.52

18R

B1

17

1Al(

1)G

0084

CC

0136

819

5256

02X

�h

vR

A4.

52

121A

l(1)

G01

68C

C00

492

1539

3000

X1

hv

RA

2.5

17

RB

0.5

26

1Al(

1)G

0320

CA

0668

494

4736

8X

�h

vR

A1.

51

21A

l(2)

0871

7C

A06

962

1468

8632

2R�

hv

RB

2.5

25

RA

1.5

14

1Al(

3)02

640

CA

0746

013

3628

53L

1L

eth

alR

A2.

51

41A

l(3)

82F

dC

A07

520

1123

169

3R�

Let

hal

RL

7.5

319

RB

7.5

319

RJ

7.5

319

RF

11

131A

Lam

CB

0374

955

4307

02L

1L

eth

alR

A1.

52

42A

Lam

CC

B04

957

1046

2132

2R�

Let

hal

RA

1.5

14

1Ala

rpC

C06

230

2415

2038

3R�

hv

RB

1.5

46

RC

1.5

26

RA

1.5

46

RD

1.5

16

2AL

sd-2

CA

0705

114

9696

07X

�h

vR

A1.

51

41A

M6

CA

0660

221

5024

473L

1h

vR

A1.

52

52B

Map

205

CC

0010

927

8915

533R

�h

vR

A1.

51

21A

Map

mo

du

lin

CC

0139

813

7568

942R

�h

vR

B3.

53

7R

A2.

52

61A

mas

kC

C00

924

2006

0119

3R1

hv

RA

1.5

116

RB

1.5

116

1AM

dh

CB

0496

822

9781

923R

1h

vR

A1.

51

71A

me3

1BC

B05

282

1024

0237

2L1

hv

RA

1.5

15

RB

1.5

25

1AM

enC

C06

325

8545

125

3R�

hv

RB

2.5

14

RA

2.5

14

1AM

i-2C

A06

598

1990

1556

3L�

Let

hal

RA

1.5

15

RA

1.5

55

RB

1.5

55

1AM

ob

1C

B04

396

1154

6502

3L�

hv

RC

1.5

14

RD

1.5

14

1Am

od

(md

g4)

CA

0701

217

2003

823R

�h

vR

R4.

52

5R

A4.

52

5R

F4.

52

5R

D4.

52

51A

mu

bC

C01

995

2190

9824

3L1

hv

RA

7.5

29

RB

7.5

29

1AN

etB

BA

0025

314

5964

60X

�h

vR

A3.

52

91A

NFA

TC

A07

788

1353

4781

X1

hv

RA

1.5

110

1AN

lpC

C01

224

2583

1370

3R1

hv

RA

2.5

13

1AN

md

mc

CB

0264

748

7368

43R

�h

vR

A1.

52

3R

B1.

51

3R

A1

12

1AN

rx-I

VC

A06

597

1214

1797

3L1

hv

RA

1.5

112

RB

1.5

114

1A

(con

tin

ued

)

Protein Trap Lines in Drosophila 1515

Page 12: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

Od

aC

C01

311

Tra

nsp

oso

n3L

�L

eth

alR

A2.

51

111A

Od

aC

B03

751

8060

372

2R�

hv

RA

1.5

12

1AO

rc2

CB

0440

097

8960

43R

1h

vR

A0.

51

5R

A0.

52

43A

osa

CC

0044

513

5294

723R

�L

eth

alR

B4.

52

16R

A4.

52

161A

Pab

p2

CC

0038

040

1973

62R

1h

vR

A2.

52

5R

B2.

52

51A

Pas

t1C

B02

132

8523

601

3R1

hv

RA

1.5

14

RB

11

31A

Pd

e8C

A07

101

1955

4469

2R1

hv

RA

3.5

219

RE

4.5

320

RB

0.5

218

1AP

di

CA

0652

615

1346

693L

�L

eth

alR

A1.

51

2R

B1.

51

2R

D1.

52

2R

C0.

51

21A

Pd

p1

CB

0224

678

4794

43L

�h

vR

F1.

51

5R

B1.

52

6R

G1.

52

71B

Pic

ot

CA

0747

412

5487

872R

�L

eth

alR

A2.

52

61A

Pkn

CC

0165

451

5799

52R

�h

vR

B2.

52

10R

C2.

52

11R

F2.

52

10R

D2.

52

101A

Pli

CB

0304

019

7125

733R

�h

vR

A2.

52

101A

Pm

m45

AC

B02

099

4996

761

2R�

hv

RB

11

4R

A1

13

4Ap

olo

CC

0132

620

3036

433L

1L

eth

alR

A1.

51

51A

psq

CC

0164

5T

ran

spo

son

2R1

Let

hal

RB

3.5?

210

1AP

tp10

DC

C06

344

1153

8719

X1

hv

RB

2.5

214

RC

1.5

110

1Ap

Uf6

8C

A06

961

1501

291

3L�

hv

RC

3.5

46

RD

3.5

68

RA

2.5

15

RB

2.5

46

1Ap

um

CC

0047

949

8377

13R

�L

eth

alR

A8.

52

13R

D8.

52

13R

C8.

52

13R

B6.

51

111A

Rab

11C

A07

717

1693

7950

3R1

Let

hal

RA

1.5

14

RB

2.5

25

1AR

ab2

CA

0746

525

8459

22R

1L

eth

alR

A1.

51

41A

Rm

62C

B02

119

1833

559

3R�

Mfs

teri

leR

E1.

52

6R

C2.

53

7R

D2.

52

6R

B1.

52

61A

Rp

L10

Ab

CB

0265

311

8159

183L

1L

eth

alR

A0.

51

3R

C0.

52

3R

B0.

51

23A

Rp

L13

AC

C01

920

1449

810

3R1

hv

RB

0.5

13

3AR

pL

30C

B03

373

1900

9261

2L�

Let

hal

RB

0.5

23

3AR

tnl1

CA

0652

350

0077

22L

�h

vR

B3.

52

7R

E2.

51

6R

D0.

52

6R

A1.

51

51A

Rtn

l1C

A06

547

4997

815

2L�

hv

RB

3.5

27

RE

2.5

16

RD

2.5

26

RA

0.5

15

1AS6

kC

C01

583

5802

962

3L�

hv

RA

1.5

110

1ASa

p-r

CA

0724

126

7145

393R

�h

vR

A1.

51

7R

B1

26

1Asa

r1C

A07

674

1818

4952

3R1

Let

hal

RA

4.5

26

1Asc

rib

CA

0768

322

3937

843R

1h

vR

C10

214

4sd

CA

0757

515

7120

98X

1h

vR

B3.

53

12R

A1.

52

101A

Sdc

CC

0087

117

3629

462R

�h

vR

C2.

52

6R

A2.

52

7R

B2.

52

61A

Sec6

1aC

C00

735

6479

544

2L�

Let

hal

RA

2.5

14

1ASe

ma-

1aC

A07

125

8592

539

2L1

hv

RA

1.5

120

1ASe

ma-

2aC

A06

989

1241

1885

2R1

hv

RC

2.5

215

RB

2.5

215

RA

2.5

215

1Asg

gC

A06

683

2536

739

X1

hv

RB

2.5

210

RA

2.5

210

RE

2.5

210

RF

2.5

210

1ASh

3bC

C01

823

7361

764

3L�

Let

hal

RB

1.5

12

RA

2.5

23

1ASi

nC

C01

921

2102

4944

3L1

Let

hal

RA

11

34

sls

CA

0674

421

0759

33L

�h

vR

C0.

51

1R

A2.

52

141A

smC

C00

233

1550

4045

2R�

hv

RA

2.5

210

RC

2.5

29

1Asm

i21F

CA

0721

111

1909

42L

�h

vR

B2.

52

5R

A1

24

1A

(con

tin

ued

)

1516 M. Buszczak et al.

Page 13: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

sno

CC

0103

213

1046

08X

�h

vR

B1.

51

12R

A1.

51

31A

snR

NP

69D

CB

0293

212

7277

083L

�L

eth

alR

A0.

51

23A

sop

CB

0229

498

9750

32L

�L

eth

alR

A1

22

4ASP

oC

kC

A06

644

2276

6451

3L1

hv

RA

0.5

24

RC

0.5

25

3ASp

t6C

A07

692

6162

464

X1

hv

RA

2.5

16

1Asq

dC

B02

655

9470

746

3R�

hv

RB

1.5

17

RC

1.5

16

RA

1.5

15

1Ast

wl

CA

0724

914

4026

793L

�L

eth

alR

A1.

51

21A

Su(v

ar)2

-10

CC

0201

350

0459

12R

1h

vR

I1.

51

8R

H1.

51

7R

D1.

51

9R

C1

18

1ASu

rf4

CC

0168

411

1712

913R

�L

eth

alR

A1.

51

4R

B2.

52

51A

sws

CC

0171

178

6128

8X

�h

vR

A1.

51

111A

Sxl

CB

0556

269

8603

4X

�Se

mil

eth

alR

B2

23

RC

1.5

16

RN

1.5

16

RO

1.5

18

1AT

ER

94C

B04

973

5877

016

2R1

hv

RA

1.5

15

RB

1.5

24

1AT

m1

CC

0171

011

1161

233R

1L

eth

alR

B3.

52

10R

J3.

52

10R

D3.

52

10R

G3.

52

101A

Tm

1C

C00

578

1111

7364

3R1

Let

hal

RB

3.5

210

RJ

3.5

210

RD

3.5

210

RG

3.5

210

1Atm

od

CC

0041

626

3892

393R

1h

vR

E2.

52

7R

D1.

51

6R

F2.

52

7R

C2.

52

71A

tmo

dC

A07

346

2640

0579

3R1

Let

hal

RE

6.5

27

RD

5.5

16

RF

6.5

27

RC

6.5

27

1AT

op

1C

C01

414

1521

4506

X1

hv

RA

1.5

18

1Atr

a2C

C01

925

1049

1357

2R�

hv

RC

33

7R

B2.

52

5R

A2.

52

6R

E1

14

4tr

alC

A06

517

1250

8905

3L1

Let

hal

RA

1.5

17

1AT

rxr-

1C

A06

750

8137

659

X1

hv

RA

1.5

14

RB

11

41A

Tsp

42E

eC

C01

420

2903

327

2R1

hv

RA

2.5

25

1AT

sp96

FC

C01

830

2170

7093

3R�

Let

hal

RA

13

RA

11

51B

tsr

CC

0139

319

9329

912R

�L

eth

alR

A1.

51

41A

Tu

do

r-SN

CC

0073

726

2842

3L�

hv

RA

1.5

14

1Atu

nC

C00

482

1168

1495

2R�

hv

RG

2.5

216

RA

2.5

216

RC

2.5

216

RE

2.5

215

1Atw

inC

A06

641

2004

4629

3R�

hv

RB

4.5

27

RE

5.5

28

RC

5.5

18

RD

5.5

34

1AU

ev1A

CA

0749

653

5882

13L

�L

eth

alR

A1.

51

41A

VA

Ch

TC

A06

666

1453

8229

3R1

Let

hal

RA

22

2R

A1.

51

81A

Vh

a13

CA

0764

415

4698

163R

�L

eth

alR

A1.

51

31A

Vh

a16

CA

0670

825

1899

62R

�L

eth

alR

A2.

52

4R

B2.

52

4R

C1.

51

3R

D2.

52

41A

Vh

a26

CC

0138

014

1789

23R

1L

eth

alR

B2.

52

5R

A1.

51

41A

Vh

a55

CA

0763

484

5273

83R

�L

eth

alR

B2.

52

4R

A1.

51

31A

vib

CB

0533

015

0459

623R

�L

eth

alR

A2.

52

91A

vkg

CC

0079

150

1900

52L

�h

vR

A2.

52

91A

vsg

CA

0700

497

0777

13L

1h

vR

A1.

51

2R

D1.

51

2R

B2.

52

3R

C2.

52

31A

xl6

CB

0324

869

1878

62L

�h

vR

A1.

51

2R

A0.

51

21A

yps

CA

0679

112

1172

673L

�L

eth

alR

A1.

51

41A

zip

CC

0162

620

8968

452R

�L

eth

alR

B2.

52

14R

A2.

51

141A

Zn

72D

CA

0770

316

1026

553L

�h

vR

A4.

53

5R

B4.

53

61A

CB

0231

846

3810

23R

1h

v5B

CC

0130

980

2255

63L

1h

v5B

(con

tin

ued

)

Protein Trap Lines in Drosophila 1517

Page 14: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

from the insertion site. These examples suggest that theEGFP incorporation level varies between genes in largepart due to the tagging of a subset of gene isoforms thatthemselves display differing expression levels.

In contrast, we found little evidence of short-rangecontext effects. Less than twofold variation in proteinexpression as measured by Western blotting with anti-EGFP antibodies was observed between lines withinsertions at sites within the same intron (N. Srivali

and A. Spradling, unpublished data). However, theseexperiments did reveal that insertions of the piggyBac-based vector consistently produced less EGFP proteinthan lines with the corresponding P-element-basedvector that were inserted in the same intron (N. Srivali

and A. Spradling, unpublished data). This suggeststhat some aspect of the structure of the piggyBac vectorused compromised splicing efficiency.

Identification of protein trap and enhancer trap corecollections: To help identify a core set of valid gene traplines we examined the EGFP expression patterns ofmany nonredundant lines in both the adult ovary andlarval salivary gland. There was a strong correlationbetween insert location and the nature of the stainingpatterns observed. More than 95% of lines in class 1A,the in-frame fusions, produced patterned EGFP expres-sion above background in at least some ovarian cells orin the salivary gland. In contrast, a much smaller, but stillsignificant, fraction of lines in classes 2–5 also expressedEGFP in a regulated manner. Combining informationon insert location, genome annotation, RNA transcriptsequence, and EGFP pattern, we identified a set of 244lines predicted to produce fusions between EGFP and431 protein isoforms of 233 distinct genes (Table 4).These new protein trap lines express EGFP in a widevariety of cellular compartments under diverse devel-opmental controls (see Figures 2 and 3).

A second major class of lines in the collection showedthe properties expected of EGFP enhancer traps (Ta-ble 5). These CB lines were susceptible to enhancertrapping from the P-element promoter, were locatedmostly upstream of the annotated start site, expressedEGFP in nuclei, and the RNA analysis, if available, didnot indicate fusion in frame downstream. The expres-sion patterns of such insertions in well-characterizedgenes supported this interpretation. For example, lineCB02030 in ptc showed strong expression in the innersheath cells of the germarium (Forbes et al. 1996), whileline CB04353 in Dad strongly expressed in the germlinestem cells and immediately downstream germ cells (Kai

and Spradling 2003; Casanueva and Ferguson 2004).The characterization of a significant number of lines

in the collection remains incomplete (Table 1). Many ofthese contain insertions located .0.5 kb from an ap-propriately oriented annotated gene but where RNAsequence data were not obtained. Others are insertedwithin genes at locations not predicted to generateprotein fusions or gene traps. Some of these lines

TA

BL

E4

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

CB

0265

812

5220

672R

�h

v5B

CC

0167

012

9448

373R

1h

v5B

CC

0193

221

8568

623R

�h

v5B

CA

0743

656

4435

82R

�sl

5BC

B03

064

8536

402

2L1

hv

5BC

C00

523

2280

6145

3R�

hv

5B

aA

nn

ota

tio

nre

leas

e5.

0.

1518 M. Buszczak et al.

Page 15: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

express EGFP in the ovary, and we cannot rule out thatothers express transcripts in other tissues. On the basisof the processing of previous lines in these same classesthis suggests that a significant number of new enhancertraps and a handful of new protein traps could be sortedout from a larger number of lines with secondary inser-tions in already trapped genes. Consequently, thenumber of different genes trapped in the collectionis likely to increase beyond the 600 or so currentlycharacterized.

Even when a protein is tagged in frame, the insertionof the EGFP sequence is expected to disrupt its normalstructure and localization some fraction of the time. Forexample, line CC01311 traps CG15015, the Drosophilahomolog of mammalian Cip4, a modular proteinthat interacts with Cdc42 and helps to regulate theactin cytoskeleton (Aspenstrom 1997). The CC01311P element is inserted between the first two coding exonsand thus disrupts the FES/Cip4 domain of CG15015(Figure 1G). The protein trap fusion product localizesto the nucleolus while transgenes of CG15015 tagged at

either the very N or C termini localize to the cytoplasmwhen expressed in S2 cells. We observed that 3 otherprotein trap fusion products of 107 analyzed accumu-lated in the nucleus when they were expected to residein the cytoplasm.

Diverse behavior of tagged proteins: The 244 iden-tified protein trap lines of the core collection exhibitextremely diverse patterns of EGFP expression, suggest-ing that proteins occupying a wide range of cellularcompartments can be tagged in vivo. We observed manylines with EGFP fluorescence in the cytoplasm (Figure2A) or nucleus (Figure 2B) as expected. Localization tointracellular membranous structures was also com-monly seen, as illustrated by a trap in the ER structuralcomponent Reticulon-1 (Figure 2D). Gene trapping ofsecretory proteins is thought to be inefficient due toretention of the fusion proteins in the ER where theactivity of the fusion gene may be affected (Skarnes

et al. 1995). The full-length protein traps we constructedcould label secreted proteins, as indicated by theextracellular localization of EGFP in CA06735, a fusion

Figure 2.—Protein traps for the study of pro-tein subcellular localization. Patterns of subcellu-lar localization of EGFP expressed from thefollowing lines that trap the indicated genes wereobserved: (A) cytoplasmic, CA06607 (CG17342);(B) nuclear, BA00164 (dom); (C) enhancer trapnuclear, CB04353 (Dad) in stem cells and earlycystocytes; (D) endoplasmic reticulum,CA06523 (Rtnl1); (E) extracellular, CA06735 (ca-thepsin K); (F) membrane, CA07474 (Picot); (G)apical, CC01941 (Baz); (H) chromatin, CA07249(stwl); (I) nuclear membrane, CA07301(Fs(2)Ket); (J) lipid droplets, CA07051 (Lsd-2);(K) novel structure, CA07332 (CG6854); (L)novel structure, CC01326 (polo).

Protein Trap Lines in Drosophila 1519

Page 16: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

Figure 3.—Protein traps for the study of developmental regulation during oogenesis. The expression in the ovary of variousprotein trap lines is shown to illustrate how they can be used to associate genes with developmental processes. (A) Schematic of anovariole tip. The terminal filament (TF), cap cells (CpC), germline stem cells (GSC), cystoblast (CB), and escort cells (ES) areillustrated. (B–H) Cell type identification. (B) Terminal filament, CB02069 (CG14207); (C) cap cells, CB03410 (CG1600); (D)escort cells, CC01359 (fax); (E) follicle cells, CC06135 (CG12785); (F) outer border cells and posterior follicle cells, CB02349(NK7.1); (G) oocyte nucleus equals the germinal vesicle (arrow), CB04219 (CG13776); (H) novel sheath cell type, CC01646(CG12920). (I–L) Analyzing developmental processes. (I) Novel structure in center of midstage follicle, CC00523 (Msn); (J) fu-some, CC01436 (Sec61); (K) germline and somatic ring canals, CA07004 (Vsg); (L) chorion gene amplification, CB04400 (Orc2).(M–P) Localization of proteins in the oocyte. (M) Posterior pole, CC01442 (EIF-4E); (N) posterior pole, CA06517 (Tral); (O)posterior pole, CC00236 (CG32423); (P) anterior pole, CA07529 (CG6151). (Q–T) Developmental regulation of gene expressionin early germ cells. (Q) Control with little change, CC01961 (Actn); (R) GSC/CB enriched, CC06238 (CG11963); (S) GSC andearly cyst enriched, CC01915 (Eff); (T) GSC and forming cyst enriched, CC01442 (EIF-4E).

1520 M. Buszczak et al.

Page 17: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

En

han

cer

trap

alle

les

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

1.28

CB

0449

223

8942

52R

1h

vR

A0.

51

13

4EH

PC

B05

417

1993

4636

3R�

hv

RA

11

44

abo

/l(

2)06

225

CB

0327

210

9753

072L

�h

vR

A0.

51

23

Ad

f1C

B02

276

2550

455

2R�

Let

hal

RA

2.5

14

1A

dk3

/C

G46

74C

B04

307

6715

035

3R�

hv

RB

0.5

22

RA

0.5

22

3ao

pC

B02

762

2178

398

2L�

Let

hal

RB

12

44

AT

PC

L/

CG

8370

CB

0442

711

8887

672R

�h

vR

A0.

52

93

bic

CB

0385

587

5772

72R

1L

eth

alR

A0.

52

2R

B0.

51

13

bn

lC

B02

854

1566

2816

3R�

hv

RA

0.5

25

RB

0.5

25

3b

oca

CB

0207

034

5428

32R

�h

vR

A0.

51

23

bo

cksb

eute

lC

B03

586

5416

706

3R�

hv

RA

11

24

Brd

CB

0266

514

9654

673L

�h

vR

A1

14

btn

CB

0350

118

4140

773R

1h

vR

A1

11

4C

BP

CB

0376

272

3096

5X

1h

vR

A0.

51

43

Cct

1C

B02

171

1546

105

3L1

hv

RA

12

44

cdi

CB

0512

914

9199

503R

1h

vR

A1

28

22

CG

1022

5/T

bp

-1C

B02

343

1957

8503

3R1

hv

RA

0.5

13

3C

G10

272

CB

0356

722

3308

33R

1L

eth

alR

A1.

53

9R

B1

28

RC

1.5

36

RD

1.5

39

2C

G10

399

CB

0475

369

6041

62L

1h

vR

A0.

51

23

CG

1044

4C

B05

265

1616

1990

2R�

Let

hal

RA

11

24

CG

1086

3C

B05

229

3942

458

3L1

hv

RB

11

63

CG

1099

0C

B02

351

1364

8912

X1

hv

RA

12

45

CG

1104

2C

B03

720

9046

478

X1

hv

RA

0.5

11

3C

G11

16C

B03

511

1045

469

3R�

hv

RA

11

5R

B1

14

RC

11

54

CG

1138

2C

B02

249

1103

926

X1

hv

RB

0.5

11

3C

G11

526

CB

0222

833

2588

33L

1h

vR

A1.

52

3R

B1.

51

32

CG

1153

7C

B03

533

3143

415

3L�

hv

RC

11

84

CG

1163

8/C

G32

814

CB

0233

792

0725

X1

hv

RA

12

54

CG

1177

9C

B05

200

1498

3800

3R1

hv

RC

1.5

15

RA

1.5

14

RD

1.5

14

1C

G11

940

CB

0502

419

7425

60X

�h

vR

A1.

51

3R

B0.

52

31

CG

1236

0C

B02

135

8856

387

3R1

hv

RA

0.5

25

RB

0.5

25

3C

G12

367

CB

0531

280

3326

02R

1h

vR

A1

13

4C

G12

40C

B02

937

2767

042

3L1

Let

hal

RA

12

24

CG

1241

8C

B05

329

5700

359

3R1

hv

RA

0.5

11

3C

G12

744/

cbx

CB

0509

157

6281

72R

1h

vR

A1

12

RB

0.5

23

3C

G12

797

CB

0350

210

7425

042R

1h

vR

A0.

51

13

CG

1329

5C

B04

422

6068

307

3L1

hv

RA

11

34

CG

1389

5C

B04

175

7082

353L

�h

vR

A0.

52

33

CG

1421

5C

B02

255

1954

6304

X1

hv

RA

0.5

17

3C

G14

430

CB

0223

668

7985

1X

1h

vR

A0.

51

13

CG

1447

8C

B02

133

1334

5190

2R1

hv

RA

12

24

(con

tin

ued

)

Protein Trap Lines in Drosophila 1521

Page 18: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

CG

1454

2C

B02

373

2187

8549

3R�

Let

hal

RA

0.5

16

3C

G14

709

CB

0439

773

9488

63R

1h

vR

A0.

51

83

CG

1621

CB

0204

233

8060

22R

�h

vR

A0.

51

23

CG

1667

CB

0547

757

3128

02R

1Se

mil

eth

alR

A2

15

4C

G16

708

CB

0438

811

9353

33R

�h

vR

A1.

52

42

CG

1681

7C

B02

934

5503

678

3R1

hv

RA

11

44

CG

1697

1C

B03

898

2241

063L

1h

vR

B1

23

RC

12

3R

D1

23

4C

G17

002/

vim

arC

B05

094

2873

307

2R1

hv

RB

11

54

CG

1709

0C

B03

116

5442

483L

1h

vR

B1.

52

102

CG

1709

0C

B02

270

5442

853L

�h

vR

A1.

51

25

CG

1732

3C

B02

962

1882

3565

2L1

hv

RA

0.5

25

3C

G17

836

CB

0526

314

7495

493R

1h

vR

B3.

53

5R

A3.

53

5R

C2.

53

4R

D0.

52

31

CG

1910

/l(

3)s1

921

CB

0370

227

5732

143R

�L

eth

alR

B1.

52

4R

A0.

51

42

CG

2051

CB

0362

516

1485

83R

1h

vR

B1

13

RA

0.5

13

RC

0.5

13

4C

G21

86C

B05

020

1075

1743

X�

hv

RA

11

74

CG

2446

CB

0370

311

6087

41X

�h

vR

A0.

53

4R

E1

34

RB

13

4R

D1

23

3C

G26

98C

B03

026

3827

319

3R1

hv

RA

12

84

CG

2865

CB

0302

321

8754

9X

�h

vR

A0.

51

23

CG

2926

CB

0223

214

1409

03R

1h

vR

A0.

51

13

CG

2974

CB

0404

799

8061

4X

�h

vR

A0.

51

23

CG

3005

5C

B02

739

8477

121

2R1

hv

RA

11

14

CG

3049

7C

B02

106

3667

177

2R�

hv

RA

22

34

CG

3124

1C

B02

140

1408

1632

3R1

Let

hal

RA

22

24

CG

3147

5C

B04

600

1500

6356

3R1

hv

RA

0.5

24

3C

G31

522

CB

0269

327

9018

3R�

hv

RB

12

10R

C1

22

4C

G31

64C

B02

987

1294

462L

1h

vR

A1.

51

8R

C1.

51

8R

B0.

51

7R

D0.

51

71

CG

3165

0C

B05

467

5043

131

2L�

hv

RA

1.5

24

RB

12

4R

C1

24

2C

G31

688

CB

0357

020

4290

452L

1h

vR

A1

26

4C

G31

689

CB

0323

927

3530

02L

1h

vR

C1.

52

11R

D1.

52

11R

A1.

52

11R

B1.

52

112

CG

3178

2C

B03

345

1671

6141

2L�

hv

RA

2.5

16

RB

2.5

17

1C

G32

043

CB

0324

794

5566

83L

1h

vR

B2

25

RA

22

54

CG

3209

CB

0544

519

9565

112R

1h

vR

A1

17

RB

11

64

CG

3234

5C

B03

988

6281

403L

�h

vR

A0.

51

15

CG

3243

6C

B02

614

2134

0125

3L�

hv

RA

1.5

15

1C

G32

436

CB

0414

821

3402

683L

1h

vR

A1.

51

51

CG

3248

6C

B03

414

3070

840

3L1

Let

hal

RD

0.5

18

3C

G33

21C

B05

150

1015

3408

3R1

hv

RA

22

2R

B2

22

3C

G33

214

CB

0417

321

5006

373L

�h

vR

A1

16

4C

G33

232

CB

0374

024

6687

53L

1h

vR

A0.

53

93

CG

3355

8C

B05

689

2859

128

2R�

hv

RA

0.5

216

3C

G33

967

CB

0440

110

5494

983R

�L

eth

alR

A1

19

4

(con

tin

ued

)

1522 M. Buszczak et al.

Page 19: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

CG

3398

2C

B04

965

1552

7849

3L1

hv

RA

11

13

CG

3409

CB

0441

226

0232

72R

�h

vR

A1.

53

72

CG

3411

0C

B04

263

2101

0670

3R1

Let

hal

RC

2.5

19

1C

G34

28C

B02

131

9480

857

3L�

hv

RA

11

34

CG

3654

/U

ch-L

3C

B04

163

9470

143

3L�

hv

RA

0.5

14

3C

G40

91C

B02

284

1958

9843

2R�

Let

hal

RA

12

3R

C1

34

4C

G43

00C

B04

249

1227

0328

3L�

hv

RA

11

4R

B1

14

4C

G45

70C

B02

124

6682

617

3R1

hv

RA

11

14

CG

4612

CB

0533

120

4136

412R

�h

vR

A0.

52

33

CG

5381

CB

0461

610

3219

492L

1h

vR

A0.

52

73

CG

5543

CB

0485

419

6967

642R

1h

vR

A0.

51

13

CG

5548

CB

0566

714

9578

72X

�h

vR

A0.

52

23

CG

5677

CB

0205

420

0557

233R

�L

eth

alR

A1

11

4C

G60

14C

B04

785

2139

0388

3L1

hv

RA

1.5

37

2C

G62

18C

B03

213

1116

8745

3R1

hv

RA

12

54

CG

6311

/N

edd

4C

B04

145

1752

2938

3L1

hv

RC

11

54

CG

6439

CB

0383

617

8448

243R

�L

eth

alR

A1

16

4C

G64

99C

B05

192

1107

5733

3R�

Let

hal

RA

21

54

CG

6540

CB

0392

218

5422

28X

1h

vR

A1

12

4C

G67

70C

B02

632

1204

6132

2L�

hv

RA

0.5

11

3C

G71

10C

B04

925

1339

9559

2L1

hv

RB

22

74

CG

7228

CB

0311

579

9418

12L

�h

vR

A1

25

RB

12

54

CG

7331

CB

0515

441

7556

33R

1h

vR

A1

11

4C

G76

37C

B03

644

6698

443

2R�

hv

RA

0.5

12

3C

G80

36C

B04

958

4495

764

3R1

Let

hal

RB

2.5

34

RC

1.5

23

1C

G80

92C

B05

336

1110

1113

2R�

Let

hal

RA

11

6R

B1

13

4C

G81

28C

B02

087

1559

1543

X1

hv

RA

12

44

CG

8206

CB

0452

715

6345

96X

1h

vR

A0.

51

13

CG

8444

CB

0383

750

8636

63R

1L

eth

alR

A1

13

4C

G85

83C

B04

752

7353

717

3L�

hv

RA

11

44

CG

9062

CB

0205

671

7145

82R

�h

vR

B1

18

4C

G91

71C

B02

706

5800

187

2L�

hv

RA

14

8R

B1

37

4C

G93

28C

B03

031

2079

7707

2L�

hv

RB

0.5

23

3C

G95

91C

B05

694

9508

902

3R�

Let

hal

RA

22

94

CG

9666

CB

0450

319

2575

793L

�L

eth

alR

A1

34

4C

G96

99C

B02

196

1658

1750

X�

hv

RA

2.5

25

RF

2.5

25

RB

2.5

25

RD

2.5

25

1C

G98

21C

B03

335

4646

100

3R�

Let

hal

RB

12

2R

A1

22

4C

G99

21C

B02

890

1622

6793

X�

hv

RA

0.5

23

3C

G99

24C

B03

517

9852

398

3R�

hv

RB

2.5

39

2C

hd

64C

B03

690

4122

396

3L�

hv

RB

11

34

chrb

CB

0542

911

4809

493L

1h

vR

C1

14

4

(con

tin

ued

)

Protein Trap Lines in Drosophila 1523

Page 20: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

Cp

190

CB

0493

311

1009

853R

1h

vR

A0.

52

52

cro

lC

B03

039

1180

9411

2L�

hv

RD

14

8R

A0.

54

8R

B0.

54

7R

C0.

54

84

Csp

CB

0221

722

2607

503L

1L

eth

alR

A1

15

RB

11

7R

C1

16

4C

tBP

CB

0407

388

3794

03R

�L

eth

alR

A1.

51

6R

B1.

52

8R

C1.

52

61

Cyc

B3

CB

0298

120

6965

203R

�L

eth

alR

A1

11

4C

ycD

CB

0261

815

8036

75X

1h

vR

C1

15

RB

12

6R

A0.

52

64

Dad

CB

0435

312

8822

903R

1Se

mil

eth

alR

A2.

53

8R

B1.

52

7R

C1.

53

81

dal

lyC

B03

495

8820

577

3L1

hv

RA

0.5

19

2d

apC

B05

233

5599

791

2R1

Let

hal

RA

0.5

13

RB

0.5

13

2D

ap16

0C

B04

282

2114

2183

2L�

Let

hal

RA

1.5

211

RB

1.5

210

2D

cp-1

/p

ita

CB

0316

019

4390

262R

1L

eth

alR

A1

13

4D

ekC

B02

288

1274

3725

2R�

hv

RA

0.5

111

RD

0.5

19

RC

0.5

111

RB

0.5

110

3d

esat

1C

B02

105

8269

738

3R1

hv

RA

1.5

25

RC

1.5

25

RE

1.5

25

RB

1.5

25

2D

lC

B02

040

1515

1950

3R�

Let

hal

RA

0.5

18

RB

0.5

16

3D

p1/

imd

CB

0375

414

2995

092R

�h

vR

A0.

53

7R

B0.

53

7R

C0.

53

73

Dre

f/R

pL

13C

B02

226

9967

309

2L�

hv

RA

11

44

drk

CB

0297

493

8965

62R

�L

eth

alR

E1.

52

6R

C1.

52

6R

B1.

52

6R

A1.

52

62

du

pC

B04

090

1126

8107

2R�

Let

hal

RA

11

34

eas

CB

0262

016

1723

62X

1h

vR

B1

25

RA

12

5R

E1

26

RD

0.5

37

4ed

lC

B04

040

1456

1061

2R�

hv

RA

0.5

22

3eI

FC

B02

125

1426

796

3R1

Let

hal

RA

12

8R

G1

28

RC

0.5

39

RF

0.5

28

4eI

F3-

S10

CB

0549

325

9579

3R�

Let

hal

RA

11

4R

B0.

53

54

eIF

5BC

B05

358

3460

609

3L�

hv

RB

0.5

19

3E

ip75

BC

B05

160

1800

7641

3L�

Let

hal

RB

1.5

16

1em

cC

B02

035

7492

953L

1h

vR

A0.

51

23

end

oA

CB

0508

414

7323

503R

�Se

mil

eth

alR

A0.

51

23

En

oC

B02

039

1727

775

2L�

Let

hal

RB

33

4R

E3

34

RC

33

4R

D3

34

4es

gC

B02

017

1533

3865

2L1

Let

hal

RA

11

14

fas

CB

0299

295

1051

02R

1h

vR

A0.

53

133

fjC

B04

634

1412

0268

2R1

hv

RA

11

14

flfl

/C

G95

91C

B05

447

9510

094

3R1

Let

hal

RA

1.5

211

RB

1.5

211

2fo

k/n

ebC

B05

794

2008

5050

2L�

Let

hal

RA

11

24

for

CB

0295

636

3219

02L

�L

eth

alR

A4.

53

10R

B2.

51

8R

H3.

52

9R

I4.

53

101

fs(1

)K10

/kz

CB

0279

021

3612

7X

�h

vR

A0.

51

23

ftz-

f1C

B05

043

1875

8843

3L�

Let

hal

RB

1.5

19

1F

ur1

CB

0348

921

2988

143R

�h

vR

A0.

54

12R

B0.

54

12R

C0.

54

11R

D0.

53

103

fwC

B05

224

1189

8010

X�

hv

RA

0.5

215

3fz

2C

B02

997

1916

3698

3L�

Sem

ilet

hal

RB

13

34

glec

CB

0236

417

6819

683R

�Se

mil

eth

alR

A1

12

4G

lyco

gen

inC

B05

177

1708

7979

2R1

Sem

ilet

hal

RB

0.5

15

RA

0.5

13

3G

rip

163

CB

0513

911

9886

093L

�L

eth

alR

A4

15

4

(con

tin

ued

)

1524 M. Buszczak et al.

Page 21: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

grp

CB

0489

416

6839

202L

1h

vR

B1.

52

7R

C1.

53

82

Gst

E1

CB

0495

614

2858

712R

1L

eth

alR

A0.

51

13

Gst

S1C

B05

261

1298

4903

2R�

hv

RB

12

5R

C1

25

RA

0.5

25

4gu

khC

B04

444

1480

9830

3R�

hv

RA

11

6R

B1

15

RC

11

64

Gyc

76C

CB

0311

719

7836

693L

�L

eth

alR

C2.

55

19R

A1.

54

18R

B1.

53

182

hd

cC

B05

774

2610

3641

3R1

hv

RC

0.5

16

3H

LH

m7

CB

0224

321

8626

373R

1h

vR

A0.

51

13

Hm

gDC

B02

648

1760

0985

2R1

hv

RA

1.5

23

RB

1.5

34

2H

P1b

CB

0284

989

7531

4X

1h

vR

A2

22

4H

r39

CB

0503

921

2507

902L

1h

vR

C2.

53

5R

B2.

53

8R

A2.

53

8R

D2.

53

82

Hsc

70-4

CB

0560

311

0684

123R

�h

vR

A1

22

RC

0.5

22

RE

0.5

22

4H

sr&

oh

grC

B03

168

1712

2190

3R1

hv

RA

0.5

11

RB

0.5

11

RC

0.5

11

3IP

3K1

CB

0222

797

8257

72L

1h

vR

A1

14

4Ir

pC

B02

050

6238

617

3R1

hv

RA

11

104

kuz

CB

0200

013

5502

472L

1L

eth

alR

A1

212

RB

12

12R

C1

212

4l(

2)gl

CB

0233

118

536

2L�

hv

RB

2.5

29

RA

13

10R

C2.

52

9R

D1

39

1la

ma

CB

0545

753

4845

93L

�h

vR

C1.

53

5R

B0.

53

54

Lan

AC

B04

172

6211

152

3L�

hv

RA

0.5

115

3L

BR

CB

0317

317

6079

922R

�h

vR

B0.

53

4R

A0.

52

3R

C0.

51

23

lea

CB

0289

814

2053

12L

�L

eth

alR

A0.

51

143

Lk6

CB

0212

075

9018

43R

�h

vR

A0.

51

63

lola

CB

0288

864

2921

52R

�h

vR

C0.

52

6R

G0.

52

6R

H0.

52

6R

J0.

52

63

Map

60C

B03

167

5478

371

2R1

hv

RA

11

24

mb

cC

B04

603

1960

8306

3R�

Let

hal

RA

1.5

114

1M

bs

CB

0215

016

0451

083L

�L

eth

alR

B1

218

RA

12

18R

C1

219

4M

ESR

3C

B02

595

1861

7241

2L1

hv

RA

0.5

34

3M

ESR

4C

B04

813

1343

5844

2R�

Let

hal

RA

0.5

23

3m

irr

CB

0268

912

6865

473L

1L

eth

alR

A0.

51

5R

B0.

51

53

Mn

fC

B03

632

1111

3376

3L�

hv

RA

1.5

29

RE

1.5

210

RD

1.5

29

2M

ocs

1C

B04

106

1106

3638

3L1

hv

RA

0.5

14

RC

0.5

14

RB

0.5

44

3m

od

CB

0217

227

8783

413R

1h

vR

A3

36

4m

Rp

S17

CB

0366

320

0619

522R

�h

vR

A1

12

4M

yo31

DF

CB

0437

710

5067

732L

1h

vR

A1

210

RB

11

94

neb

/fo

kC

B04

551

2008

5119

2L1

Let

hal

RA

1.5

12

1n

esC

B05

687

1925

3774

3L1

hv

RA

12

4R

B1

24

RC

12

44

Neu

3C

B02

076

1052

3038

3R�

Let

hal

RB

11

84

NK

7.1

CB

0234

910

1988

283R

�h

vR

A0.

52

43

nm

oC

B02

015

7972

207

3L1

RA

13

9R

B1

28

RE

12

8R

C1

47

4n

mo

CB

0463

579

7243

93L

�L

eth

alR

A1.

53

9R

B1.

52

8R

E1.

52

8R

C1.

54

73

Nrg

CB

0488

384

1140

1X

1h

vR

C0.

52

8R

B0.

52

8R

A0.

52

83

orb

2C

B04

897

8946

768

3L�

hv

RD

2.5

36

RB

1.5

25

RC

0.5

25

2

(con

tin

ued

)

Protein Trap Lines in Drosophila 1525

Page 22: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

Orc

2C

B03

773

9790

734

3R�

Let

hal

RA

22

44

Pak

3C

B04

117

1227

9337

3R�

hv

RA

11

2R

B1

12

4P

i3K

21B

CB

0513

229

7734

2L1

hv

RA

0.5

13

3P

ka-C

1C

B03

724

9699

251

2L�

Let

hal

RB

12

2R

A1

22

RC

12

24

PN

UT

S-R

BC

B04

203

8704

882L

�L

eth

alR

B1

25

RC

13

4R

D1

15

4p

pa

CB

0355

118

5765

122R

�h

vR

A0.

51

13

Psc

CB

0508

388

6824

12R

�h

vR

A1

25

4p

tcC

B02

030

4537

352

2R1

Let

hal

RA

11

64

pxb

CB

0562

511

4918

663R

�h

vR

B0.

52

5R

A0.

52

53

PyK

CB

0424

718

1941

693R

1L

eth

alR

B1.

52

42

Rab

1C

B03

450

1709

4755

3R�

Let

hal

RA

11

5R

B1

14

4R

ab23

CB

0378

115

0912

03R

�h

vR

A1.

52

32

rdgB

CB

0385

613

6567

72X

1h

vR

A0.

52

12R

C0.

53

13R

D0.

52

123

Rga

CB

0213

914

3830

23R

�L

eth

alR

A1

28

RB

12

84

rgr

CB

0224

244

8794

92R

1h

vR

A1

15

4rh

oC

B05

698

1463

699

3L1

hv

RA

0.5

24

3R

ho

GD

IC

B05

226

1992

2267

3L1

Let

hal

RA

1.5

25

2R

pd

3C

B04

302

4626

763

3L1

Let

hal

RA

11

44

sas

CB

0330

729

8850

53R

1Se

mil

eth

alR

B0.

52

103

Sc2

CB

0463

839

0351

23L

1h

vR

A1

14

4Se

p2

CB

0501

816

4140

543R

1h

vR

A1

12

4SF

1C

B05

052

1336

6171

3R1

hv

RA

11

64

SF2

CB

0302

812

1677

403R

�h

vR

A0.

51

43

sgl

CB

0510

169

5788

73L

�h

vR

A1

13

4sh

gC

B02

169

1694

4287

2R�

Let

hal

RA

11

24

Sir2

/D

naJ

-HC

B02

821

1316

5875

2L1

hv

RA

11

24

siz

CB

0328

821

0277

583L

�L

eth

alR

A1

16

4sk

dC

B02

029

2101

8904

3L�

Let

hal

RD

1.5

213

RC

1.5

213

2So

dC

B03

598

1110

6792

3L�

hv

RA

11

24

Ssd

pC

B02

353

1402

2949

3R�

Let

hal

RA

1.5

22

RB

1.5

22

RC

1.5

22

2st

gC

B03

726

2508

1026

3R�

hv

RA

11

24

stu

mp

sC

B05

360

1041

7302

3R1

Let

hal

RA

0.5

15

3st

wl

CB

0522

314

4029

823L

1h

vR

A1

12

4ta

nky

rase

CB

0324

921

4869

053R

�h

vR

A1

17

4ta

raC

B02

767

1206

3444

3R1

hv

RA

1.5

12

1ta

raC

B03

523

1207

5785

3R�

Let

hal

RA

1.5

25

RB

1.5

25

1T

bh

CB

0404

578

8948

8X

1h

vR

B0.

52

83

Tct

pC

B03

738

7036

007

3R�

hv

RA

11

14

To

mC

B02

307

1496

2360

3L1

hv

RA

0.5

11

3tr

bl

CB

0274

420

3947

213L

�Se

mil

eth

alR

A0.

51

23

trn

CB

0511

413

1071

423L

1h

vR

A0.

52

23

(con

tin

ued

)

1526 M. Buszczak et al.

Page 23: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

with CG8947, a Drosophila cathepsin K homolog(Figure 2E), and the membrane location of a trap inPicot, a phosphate symporter (Figure 2F). Proteins thatare apically localized in polarized epithelia such asovarian follicle cells were easily visualized, as observedfor Bazooka (Par3) (Figure 2G). Subcompartmental-ized nuclear proteins were also readily apparent. Forexample, a trap of the Stonewall (Stwl) HMG-relatedprotein involved in germ cell chromatin organization(Clark and McKearin 1996) labeled nurse cell nucleiand the oocyte nucleus (inset) differently (Figure 2H).Fs(2)Ket, a protein involved in nuclear import, waslocalized to the nuclear periphery (Figure 2I). In somecases, cell-specific cellular compartments were labeled,such as in CA07051, which traps Lsd-2 and exhibitedEGFP localization to lipid droplets that arise in late-stage nurse cells (Figure 2J).

These studies provide a high-resolution view ofknown protein locations in living cells and also identifymany proteins that were not previously known to residewithin these compartments. In addition, the value ofprotein trapping as a discovery tool was illustrated by thefact that we observed new patterns of localization as well.For example, the HDAC4 protein, fused by the CA07134trap, labeled a small body often found in only one nursecell within an egg chamber (Figure 2K). A spindle-likestructure in young nurse cells was labeled with a proteintrap in the polo gene encoding a mitotic kinase (Figure2L). Antibodies specific for the trapped protein can beused to isolate and further investigate the proteinspresent in these novel structures.

Analysis of developmental regulation—ovarian cells:All the lines in the core collection were characterized onthe basis of their patterns of expression in germ cellsand follicle cells during oogenesis. These experimentsidentified lines expressing in the major classes ofsomatic cells, including terminal filament cells (Figure3B), cap cells (Figure 3C), escort cells (Figure 3D),profollicle cells (Figure 3E), and border and posteriorcells (Figure 3F). Other lines expressed in germ cells ofvarious ages, including some that were highly enrichedin the oocyte nucleus (germinal vesicle) (Figure 3G). Asin the case of subcellular compartments, these studiesdocumented patterns of developmental expression formany genes that were not previously known. Thesegenes become attractive candidates for study of theirfunction in the corresponding processes.

Strikingly, the collection also revealed the likelyexistence of new cell types and novel biological pro-cesses previously unrecognized despite many years ofstudy of ovarian biology. Line CC01646 traps theCG12920 protein and is expressed in a small subset ofovarian sheath cells that likely represent a novel cell type(Figure 3H). In line CC00523 we observed accumula-tion of Msn-EGFP preferentially at the center of mid-stage growing follicles (Figure 3I). It was not previouslyknown that this region was the site of unique protein

TA

BL

E5

(Co

nti

nu

ed)

Gen

eL

ine

Site

aC

hr

Stra

nd

Ph

eno

typ

eT

1In

sert

Met

Sto

pT

2In

sert

Met

Sto

pT

3In

sert

Met

Sto

pT

4In

sert

Met

Sto

pT

ype

trx

CB

0515

610

1084

783R

�h

vR

A1.

53

9R

B1.

53

8R

C1.

52

7R

D1.

52

82

ttk

CB

0227

427

5507

553R

1h

vR

E1.

52

5R

F1.

52

5R

B1

25

RC

12

52

Ugt

37c1

CB

0290

012

7332

282R

�L

eth

alR

A0.

51

13

Ugt

86D

aC

B03

314

6982

675

3R1

hv

RA

0.5

13

3V

ha1

00-2

CB

0340

414

2246

303R

�h

vR

B1.

52

62

Vh

aPP

A1-

1C

B02

209

1072

9723

3R�

Let

hal

RA

12

24

wu

n2

CB

0226

753

0176

02R

1h

vR

A0.

51

63

Xb

p1

CB

0206

117

0311

152R

1L

eth

alR

B1

13

RA

11

24

Xe7

CB

0561

014

9504

33R

1L

eth

alR

A1

18

4Z

4/C

G12

974

CB

0230

521

2792

453L

1Se

mil

eth

alR

A1

12

4

aA

nn

ota

tio

nre

leas

e5.

0.

Protein Trap Lines in Drosophila 1527

Page 24: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

accumulation. Msn encodes a protein involved in Junkinase signaling, suggesting that a special intercellularjunction may assemble in this region to structurallyorganize the nurse cells. We observed a similar expres-sion program (not shown) for the line CB03040 thatfuses the Pli gene, encoding a protein associated withthe NF-kB signaling response.

Developmentally specific subcellular structures, in-cluding the fusome (Figure 3J, arrow) and both somaticand germline ring canals (Figure 3K), were also labeledby rare lines. A general and extremely useful applicationof the collection is to identify new proteins that areassociated with such structures and analyze the effect ofmutations. For example, the preferential accumulationof Sec61 in the fusome observed here has been validatedin recent studies (Snapp et al. 2004). Proof of principleexperiments of this type that focus on the fusome will bedescribed elsewhere.

Another valuable capacity of protein trap lines is theability to follow important developmental processes athigh resolution and in living cells. During oogenesis, atleast four major clusters of chorion genes undergospecific gene amplification in stage 10B follicles, aprocess that can be visualized as small ‘‘amplificationdots’’ of BrdU incorporation (Calvi et al. 1998). Theamplifying genes specifically contain substantial amo-unts of replication initiation proteins such as Orc2 atthis time, whereas normally Orc2 is found throughoutthe cell nuclei (Royzman et al. 1999). A protein trap linein Orc2 allows the amplifying loci to be directly visu-alized (Figure 3L). Inspection shows that the dots arenot present in preamplification stage follicle cells butstrongly label amplifying gene loci at stage 10B (Figure3L, arrow).

The Drosophila oocyte represents an importantmodel system for studying RNA and protein localiza-tion. Several biochemical and genetic studies haveidentified proteins enriched at either the anterior orthe posterior pole of the oocyte (Lasko 2003; Wilhelm

and Smibert 2005). Protein traps in genes identified inthese studies, including EIF-4E (Figure 3M) and Tral(Figure 3N), faithfully recapitulate the localization pat-terns of their endogenous counterparts to the posteriorpole of the oocyte (Wilhelm et al. 2003, 2005). Severalother proteins tagged in the collection display posteriorlocalization patterns including a trap in CG32423(Figure 3O), a largely uncharacterized RNA-bindingprotein. In addition, a trap in CG6151 appears to beenriched at the anterior of the oocyte (Figure 3P).While future work will clarify the role of these proteinsin oocyte patterning, these examples show that proteintrapping can complement other approaches and beused to identify new components of localized RNPcomplexes within the cells.

Protein traps provide unique opportunities to analyzegene regulation during development in populations ofcells that are difficult to isolate and in those that are

sensitive to loss of cellular context. We illustrate thepotential of this approach using the regulation of germcell development within and just downstream from thegermline stem cells (GSCs). Many protein trap lines,including CC01961 in Actn, showed uniform expressionin GSCs, CBs, and developing germline cysts (Figure3Q). However, it was possible to find other exampleswhere expression levels between GSCs and early germcells differed from those in other germ cells within thegermarium. One of the most striking examples was lineCC06238 that traps the putative Drosophila succinateCoA ligase gene. Expression was stronger in stem cells(and sometimes early cystoblasts) than in later germcells as illustrated in Figure 3R. Several other genes weredownregulated shortly after GSC division, including effete(Figure 3S), encoding the UbcD1 ubiquitin-conjugatingenzyme that has been shown toaffect germline cyst forma-tion (Lilly et al. 2000). Another line whose EGFP expres-sion was downregulated slightly later, at the completion ofcyst formation, trapped the Drosophila eIF-4E gene(Figure 3T). Downregulation of a related gene CG8023was previously observed at a slightly earlier time, duringcyst divisions (Kai et al. 2005).

Studies on the limitations of current protein trapmethods: We also tested the sensitivity of the approachused here to detect Drosophila genes by looking at theexpression of the lines in the core collection in germlinestem cells. First, although lines were selected on thebasis of expression in embryos, we found ovarianexpression above background in .90% of lines in thecore collection. However, this does not address whethermany other genes exist that were fused but expressedEGFP at levels too low to detect in either tissue. Analysisof germline stem cell RNA by hybridization to Affymet-trix arrays detected transcripts from �6500 Drosophilagenes over an �1000-fold dynamic range (Kai et al.2005). Although translational regulation and differen-tial protein stability, not to mention differences in stain-ing sensitivity between different preparations, would beexpected to introduce potential variation, we werecurious whether protein trap lines could detect stemcell gene transcripts across the full range of expressionlevels.

We observed a strong correspondence between thesetwo measures of stem cell gene expression (Figure 4, A–E). Lines with very strong EGFP expression tended tohave RNA levels at least 10-fold higher than lines withabove background but relatively weak expression. Theselines in turn had signals higher than most lines scored asbelow the level of detection on arrays. The correlationwas not perfect; for example, some lines showed moreEGFP expression in stem cells than might be expectedfrom the microarray study (Figure 4F). The existence ofsuch lines was not unexpected, because some lines likelystill carry second insertions, and the microarray usedwas based on release 1 gene models. Overall, we coulddetect EGFP above background in GSCs from nearly all

1528 M. Buszczak et al.

Page 25: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

lines whose levels of mRNA are called as ‘‘present’’ onAffymetrix arrays (Kai et al. 2005). This suggests thatprotein trap lines are not limited to a relatively smallnumber of highly expressed genes, but can be used tofollow a large fraction of gene activity.

DISCUSSION

The Carnegie protein trap collection—a versatileresearch tool: These experiments significantly expandthe number of protein trap lines available for studies ofgene expression within a complex multicellular animal(also see accompanying article by Quinones-Coello

et al. 2007, this issue). Our initial characterization ofthese lines extends previous documentation that thebehavior of the EGFP-tagged protein often correspondsto the behavior of the protein to which it is fused.Moreover, we demonstrate that collections such as oursare exceptionally useful as tools of gene discovery.Candidate genes can be selected on the basis of thedevelopmental expression, subcellular localization, ordynamic behavior of particular protein isoforms. Thesame line can subsequently be used to purify the proteinand its associated complexes and to create deletions forfurther genetic analysis. The Carnegie collection isavailable for research use from the Carnegie Institution.Information is available at http://www.ciwemb.edu/resources/proteintrapcollection.html and at http://flytrap.med.yale.edu/.

Subcellular distribution of protein location: Previ-ously, gene and protein trapping in yeast has been usedto estimate the fraction of proteins that are localizedto various cellular compartments (Ross-Macdonald

et al. 1999; Kumar et al. 2002; Huh et al. 2003). Morethan half of all proteins showed a simple localizationto the cytoplasm or nucleus. Other subcellular struc-tures labeled by tagged proteins in yeast included the

plasma membrane, the ER, mitochondria, the lysosome,and the perixosome. We obtained similar results. Thedistribution patterns of EGFP-tagged proteins withinDrosophila ovarian cells generally matched those ofyeast and most known structures within egg chambershave now been labeled with at least one protein trap line(this study; Morin et al. 2001; Clyne et al. 2003). More-over, the large size of ovarian cells often allowed us todistinguish the fine structure of several subcellular com-partments labeled by EGFP fusion proteins generatedin this screen.

Developmental regulation of protein expression:Despite the fact that only one tissue was examinedclosely, a large number of proteins in the core collectionwere expressed and many were developmentally regu-lated. A diverse array of cell types within the germariumincluding the terminal filament, cap cells, escort cells,germline stem cells, and prefollicle cells were labeled invarious lines. However, expression frequently varies fromstage to stage, not only in cell type but also in subcellularlocation, complicating the problem of accurate annota-tion. Currently, protein trap images within the ovary arebeing curated in the FlyTrap database. Because of therelative cellular simplicity of the germarium and de-veloping ovarian follicles, it may be possible to developtools for displaying expression patterns at single-cellresolution in this tissue. It will be particularly valuable toadd data from many other tissues and developmentalstages for these same lines, to facilitate comparisons.

Identification of insertions not predicted by genomeannotation: One of the surprising results of our studieswas the relatively high frequency of EGFP-positive linesthat were located at sites not predicted to fuse to anyannotated Drosophila transcript. However, similar re-sults were observed in previous studies of gene traptransposons. At least 44% of insertions analyzed byMorin et al. (2001) were not within annotated genes;moreover, the reading frame of insertions in genes was

Figure 4.—Studies of protein trap expressionWe compared the apparent intensity of GFP-protein staining in the GSCs with the RNA levelof the corresponding gene as determined by Af-fymetrix arrays (Kai et al. 2005). (A–F) The pat-tern of protein trap expression of the indicatedgene (see Table 4 for strain names). The expres-sion level from Affymetrix software (mean ofthree measurements) is given.

Protein Trap Lines in Drosophila 1529

Page 26: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

not determined. In yeast, Ross-Macdonald et al. (1999)observed that while 1346 EGFP-positive insertions werein the correct frame, another 480 were not. Since mostlacked an alternative start site, they postulated that ahigher than expected frequency of translational frame-shifting may occur. In a recent study in the mouse, 24%of genes were trapped in more than one reading frame(De-Zolt et al. 2006). We also observed this phenome-non; however, our studies emphasized the difficulty ofdrawing final conclusions until the location of everyinsertion and the actual pattern of splicing within themutant strains have been characterized.

Sensitivity of gene traps: A potential limitation ofprotein trapping in vivo is that many gene products maybe expressed at levels so low that EGFP expressed at thesame level could not be detected above background.Only 20% of mouse secretory traps that are G418resistant express detectable CD2, even though theneophosphotransferase gene is fused to CD2 (De-Zolt

et al. 2006). Only 33% of b-geo lines resistant to G418express detectable lacZ. This probably indicates thatmany genes exist that generate enough neophospho-transferase to confer G418 resistance, but not enoughb-galactosidase to be scored as lacZ positive (De-Zolt

et al. 2006). Consistent with this, Ross-Macdonald et al.(1999) found that 415 of 1340 in-frame fusions (31%)could be detected above background by immunofluo-rescence. In contrast, Huh et al. (2003), who taggedcomplete proteins, detected signals above backgroundfor 4156 of 6029 (69%) genes. The system we employedis also designed to tag full-length proteins, and this mayhave enhanced its sensitivity.

The requirement that each line generate EGFPfluorescence in embryos might provide a limitation onthe number of genes that could be tagged. However, ourexperiments argue that this poses relatively little selec-tion on which genes can be fused. We found that genesexpressed in germline stem cells at a wide variety oflevels on the basis of microarray studies had been fusedin our collection of protein trap lines. There was a roughcorrelation between the levels observed using antibodystaining in these cells and the microarray results. Thiswould indicate that the protein trap methodology canpotentially be used to analyze thousands of diverseDrosophila genes.

Increasing proteome coverage: Our analysis revealedtwo major limitations of the current strategy for gener-ating protein traps using P elements. Despite theadvantages of embryo sorting, the inherent 59 bias ofP-element insertion (Bellen et al. 2004) greatly limitsthe rate at which new genes can be trapped. Many of theinsertions were recovered when an insertion occurred atan internal promoter that lies within a coding intron ofanother gene isoform. Many genes lack such alternativepromoters, so it will be necessary to use differentmethods to efficiently recover a more diverse collectionof protein trap strains.

At least two alternative approaches are worthy ofconsideration. First, it should be possible to take advan-tage of the extensive collection of P-element insertionsin Drosophila genes that have been generated by theBDGP gene disruption project and other members ofthe Drosophila community (reviewed in Matthews

et al. 2005). We calculate that �2000 genes already havean existing P-element insertion within a coding intron.Moreover, P elements can recombine into the sites ofexisting P elements in the presence of transposase (Sepp

and Auld 1999). Consequently, a protein trap allele ofeach of these genes could, in principle, be generated bycombining a protein trap insertion of the appropriatereading frame with the ‘‘target’’ gene insertion in asingle strain, ideally using inserts bearing scorable mark-ers and then screening for replacement.

Alternatively, transposons with a broader insertionalspecificity than the P element would be worthwhile.piggyBac elements are suitable for widespread mutagen-esis of Drosophila genes (Thibault et al. 2004) and asgene traps (Bonin and Mann 2004). Minimal sequencesfor piggyBac transposition were defined recently (Li et al.2005). We obtained several hundred piggyBac proteintrap insertions that express EGFP, but the piggyBac genetrap vector appeared to be less efficient, on the basis ofEGFP intensity and Western blotting, than P-elementgene traps in nearby locations (also see accompanyingarticle by Quinones-Coello et al. 2007, this issue). Newvectors containing different splice acceptor and donorsites should be tested within the context of a piggyBacelement. Several new transposons with diverse inser-tional specificities, including Minos (Arensburger et al.2005; Metaxakis et al. 2005), are also worthy of con-sideration. Continued efforts to provide greater cover-age within the Drosophila proteome are warrantedbecause of the exceptional utility of protein traps inanalyzing the development and physiology of multicel-lular organisms.

We thank Lynn Cooley for helpful discussions. We thank X. Morin,W. Chia, A. Hudson, R. Lehmann, and A. Handler for reagents. We aregrateful to the following people for their assistance with diverse aspectsof the project: Alison Pinder, Joseph Carlson, Martha Evans-Holm,Crista Sewald, Becca Sheng, Melanie Issigonis, Megan Kutzer, EmilySeay, and Dianne Williams. We thank the Howard HughesMedical Institute, Carnegie Institution, and the National Institutesof Health for support. M.B. was a fellow of the American Cancer Soci-ety. T.G.N. is a Howard Hughes Fellow of Life Sciences ResearchFoundation.

LITERATURE CITED

Arensburger, P., Y. J. Kim, J. Orsetti, C. Aluvihare, D. A. O’Brochta

et al., 2005 An active transposable element, Herves, fromthe African malaria mosquito Anopheles gambiae. Genetics 169:697–708.

Aspenstrom, P., 1997 A Cdc42 target protein with homology to thenon-kinase domain of FER has a potential role in regulating theactin cytoskeleton. Curr. Biol. 7: 479–487.

Bellen, H. J., R. W. Levis, G. Liao, Y. He, J. W. Carlson et al.,2004 The BDGP gene disruption project: single transposon

1530 M. Buszczak et al.

Page 27: The Carnegie Protein Trap Library: A Versatile Tool for ... · James E. Wilhelm,* Terence D. Murphy,* Robert W. Levis,* Erika Matunis, ... or aberrant GFP expression tests the current

insertions associated with 40% of Drosophila genes. Genetics167: 761–781.

Bonin, C. P., and R. S. Mann, 2004 A piggyBac transposon gene trapfor the analysis of gene expression and function in Drosophila.Genetics 167: 1801–1811.

Buszczak, M., and A. C. Spradling, 2006 The Drosophila P68 RNAhelicase regulates transcriptional deactivation by promotingRNA release from chromatin. Genes Dev. 20: 977–989.

Calvi, B. R., M. A. Lilly and A. C. Spradling, 1998 Cell cycle con-trol of chorion gene amplification. Genes Dev. 12: 734–744.

Casanueva, M. O., and E. L. Ferguson, 2004 Germline stem cellnumber in the Drosophila ovary is regulated by redundantmechanisms that control Dpp signaling. Development 131:1881–1890.

Clark, K. A., and D. M. McKearin, 1996 The Drosophila stonewallgene encodes a putative transcription factor essential for germcell development. Development 122: 937–950.

Clyne, P. J., J. S. Brotman, S. T. Sweeney and G. Davis, 2003 Greenfluorescent protein tagging Drosophila proteins at their nativegenomic loci with small P elements. Genetics 165: 1433–1441.

De-Zolt, S., F. Schnutgen, C. Seisenberger, J. Hansen, M.Hollatz et al., 2006 High-throughput trapping of secretorypathway genes in mouse embryonic stem cells. Nucleic AcidsRes. 34: e25.

Dirks, R. W., and H. J. Tanke, 2006 Advances in fluorescent track-ing of nucleic acids in living cells. Biotechniques 40: 489–496.

Espina, V., J. Milia, G. Wu, S. Cowherd and L. A. Liotta,2006 Laser capture microdissection. Methods Mol. Biol. 319:213–229.

Forbes, A. J., A. C. Spradling, P. W. Ingham and H. Lin, 1996 Therole of segment polarity genes during early oogenesis in Dro-sophila. Development 122: 3283–3294.

Friedrich, G., and P. Soriano, 1991 Promoter traps in embryonicstem cells: a genetic screen to identify and mutate developmentalgenes in mice. Genes Dev. 5: 1513–1523.

Gossler, A., A. L. Joyner, J. Rossant and W. C. Skarnes,1989 Mouse embryonic stem cells and reporter constructs todetect developmentally regulated genes. Science 244: 463–465.

Handler, A. M., and R. A. Harrell, II, 1999 Germline transforma-tion of Drosophila melanogaster with the piggyBac transposon vec-tor. Insect Mol. Biol. 8: 449–457.

Herschman, H. R., 2003 Molecular imaging: looking at problems,seeing solutions. Science 302: 605–608.

Hild, M., B. Beckmann, S. A. Haas, B. Koch, V. Solovyev et al.,2003 An integrated gene annotation and transcriptional pro-filing approach towards the full gene content of the Drosophilagenome. Genome Biol. 5: R3.

Huh, W. K., J. V. Falvo, L. C. Gerke, A. S. Carroll, R. W. Howson

et al., 2003 Global analysis of protein localization in buddingyeast. Nature 425: 686–691.

Kai, T., and A. Spradling, 2003 An empty Drosophila stem cellniche reactivates the proliferation of ectopic cells. Proc. Natl.Acad. Sci. USA 100: 4633–4638.

Kai, T., D. Williams and A. C. Spradling, 2005 The expression pro-file of purified Drosophila germline stem cells. Dev. Biol. 283:486–502.

Kelso, R. J., M. Buszczak, A. T. Quinones, C. Castiblanco, S.Mazzalupo et al., 2004 Flytrap, a database documenting aGFP protein-trap insertion screen in Drosophila melanogaster. Nu-cleic Acids Res. 32: D418–D420.

Kumar, A., S. Agarwal, J. A. Heyman, S. Matson, M. Heidtman

et al., 2002 Subcellular localization of the yeast proteome.Genes Dev. 16: 707–719.

Lasko, P., 2003 Cup-ling oskar RNA localization and translationalcontrol. J. Cell Biol. 163: 1189–1191.

Li, X., R. A. Harrell, A. M. Handler, T. Beam, K. Hennessy

et al., 2005 piggyBac internal sequences are necessary for effi-cient transformation of target genomes. Insect Mol. Biol. 14:17–30.

Lilly, M. A., M. de Cuevas and A. C. Spradling, 2000 Cyclin Aassociates with the fusome during germline cyst formation inthe Drosophila ovary. Dev. Biol. 218: 53–63.

Matthews, K. A., T. C. Kaufman and W. M. Gelbart, 2005 Re-search resources for Drosophila: the expanding universe. Nat.Rev. Genet. 6: 179–193.

Metaxakis, A., S. Oehler, A. Klinakis and C. Savakis, 2005 Minosas a genetic and genomic tool in Drosophila melanogaster. Genetics171: 571–581.

Misra, S., M. A. Crosby, C. J. Mungall, B. B. Matthews, K. S.Campbell et al., 2002 Annotation of the Drosophila melanogastereuchromatic genome: a systematic review. Genome Biol. 3:RESEARCH0083.

Morin, X., R. Daneman, M. Zavortink and W. Chia, 2001 A pro-tein trap strategy to detect GFP-tagged proteins expressed fromtheir endogenous loci in Drosophila. Proc. Natl. Acad. Sci. USA98: 15050–15055.

Quinones-Coello, A. T., L. N. Petrella, K. Ayers, A. Melillo, S.Mazzalupo et al., 2007 Exploring strategies for protein trap-ping in Drosophila. Genetics 175: 1089–1104.

Ross-Macdonald, P., A. Sheehan, C. Friddle, G. S. Roeder and M.Snyder, 1999 Transposon mutagenesis for the analysis of pro-tein production, function, and localization. Methods Enzymol.303: 512–532.

Royzman, I., R. J. Austin, G. Bosco, S. P. Bell and T. L. Orr-Weaver,1999 ORC localization in Drosophila follicle cells and theeffects of mutations in dE2F and dDP. Genes Dev. 13: 827–840.

Sepp, K. J., and V. J. Auld, 1999 Conversion of lacZ enhancer traplines to GAL4 lines using targeted transposition in Drosophila mel-anogaster. Genetics 151: 1093–1101.

Singer, T., and E. Burke, 2003 High-throughput TAIL-PCR as a tool toidentify DNA flanking insertions. Methods Mol. Biol. 236: 241–272.

Skarnes, W. C., J. E. Moss, S. M. Hurtley and R. S. Beddington,1995 Capturing genes encoding membrane and secreted pro-teins important for mouse development. Proc. Natl. Acad. Sci.USA 92: 6592–6596.

Skarnes, W. C., H. von Melchner, W. Wurst, G. Hicks, A. S. Nord

et al., 2004 A public gene trap resource for mouse functionalgenomics. Nat. Genet. 36: 543–544.

Snapp, E. L., T. Iida, D. Frescas, J. Lippincott-Schwartz and M. A.Lilly, 2004 The fusome mediates intercellular endoplasmicreticulum connectivity in Drosophila ovarian cysts. Mol. Biol. Cell15: 4512–4521.

Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty et al.,1999 The Berkeley Drosophila Genome Project gene disrup-tion project: single P-element insertions mutating 25% of vitalDrosophila genes. Genetics 153: 135–177.

Stathopoulos, A., and M. Levine, 2005 Genomic regulatory net-works and animal development. Dev. Cell 9: 449–462.

Thibault, S. T., M. A. Singer, W. Y. Miyazaki, B. Milash, N. A.Dompe et al., 2004 A complementary transposon tool kit forDrosophila melanogaster using P and piggyBac. Nat. Genet. 36:283–287.

Tomancak, P., A. Beaton, R. Weiszmann, E. Kwan, S. Shu et al.,2002 Systematic determination of patterns of gene expres-sion during Drosophila embryogenesis. Genome Biol. 3:RESEARCH0088.

Vasudevan, S., and S. W. Peltz, 2003 Nuclear mRNA surveillance.Curr. Opin. Cell Biol. 15: 332–337.

Ward, R. E., IV, R. S. Lamb and R. G. Fehon, 1998 A conserved func-tional domain of Drosophila coracle is required for localizationat the septate junction and has membrane-organizing activity. J.Cell Biol. 140: 1463–1473.

Wilhelm, J. E., and C. A. Smibert, 2005 Mechanisms of transla-tional regulation in Drosophila. Biol. Cell 97: 235–252.

Wilhelm, J. E., M. Hilton, Q. Amos and W. J. Henzel, 2003 Cup isan eIF4E binding protein required for both the translational re-pression of oskar and the recruitment of Barentsz. J. Cell Biol.163: 1197–1204.

Wilhelm, J. E., M. Buszczak and S. Sayles, 2005 Efficient proteintrafficking requires trailer hitch, a component of a ribonucleo-protein complex localized to the ER in Drosophila. Dev. Cell9: 675–685.

Communicating editor: T. Schupbach

Protein Trap Lines in Drosophila 1531