

Mutations in Argonaute5 Illuminate Epistatic Interactions of the K1 and I Loci Leading to Saddle Seed Color Patterns in Glycine max

Young B. Cho, Sarah I. Jones, and Lila O. Vodkin1

Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801

ORCID ID: 0000-0001-7866-3152 (L.O.V.)

The soybean (Glycine max) seed coat has distinctive, genetically programmed patterns of pigmentation, and the recessive k1 mutation can epistatically overcome the dominant I and i^i alleles, which inhibit seed color by producing small interfering RNAs (siRNAs) targeting chalcone synthase (CHS) mRNAs. Small RNA sequencing of dissected regions of immature seed coats demonstrated that CHS siRNA levels cause the patterns produced by the i^i and i^k alleles of the I locus, which restrict pigment to the hilum or saddle region of the seed coat, respectively. To identify the K1 locus, we compared RNA-seq data from dissected regions of two Clark isolines having similar saddle phenotypes mediated by CHS siRNAs but different genotypes (homozygous i^k K1 versus homozygous i^i k1). By examining differentially expressed genes, mapping information, and genome resequencing, we identified a 129-bp deletion in Glyma.11G190900 encoding Argonaute5 (AGO5), a member of the Argonaute family. Amplicon sequencing of several independent saddle pattern mutants from different genetic backgrounds revealed independent lesions affecting AGO5, thus establishing Glyma.11G190900 as the K1 locus. Nonfunctional AGO5 from k1 alleles leads to altered distributions of CHS siRNAs, thus explaining how the k1 mutation reverses the phenotype of the seed coat regions from yellow to pigmented, even in the presence of the normally dominant I or i^i alleles.

INTRODUCTION

The interaction of the I and K1 loci in soybean (Glycine max) is an intriguing classical example of epistasis in which the phenotype of the dominant I and i^i alleles (inhibition of seed coat pigmentation) is not manifested in seed homozygous for the recessive k1 allele. The dominant I and i^i alleles are found in most commercial cultivars and in many of the standard varieties used in soybean breeding programs. For example, the Williams, Williams 82 (the cultivar used for the reference soybean genome), and Clark varieties are homozygous for the i^i allele, which produces a pigmented hilum (where the seed coat attaches to the pod), but otherwise the majority of the seed coat proper is yellow (nonpigmented). Figure 1 shows the seed coat phenotypes of the four alleles of I in backcrossed or mutant isolines in the background of the Clark variety. Independent, naturally occurring mutations in a number of yellow-seeded cultivars result in completely pigmented (black) seed coats and are homozygous for the recessive i allele. Many of these isogenic recessive mutations result from naturally occurring deletions at the inverted repeat chalcone synthase (CHS) clusters CHS1-3-4 and CHS4-3-1 present in the i^i allele (Todd and Vodkin, 1996; Tuteja et al., 2004). This unusual structure of the dominant i^i allele spawns tissue-specific primary and secondary CHS short-interfering RNAs (siRNAs) that target at least nine CHS genes. The unlinked CHS7 and CHS8 are the main genes downregulated in immature seed coats (Tuteja et al., 2009; Cho et al., 2013), resulting in yellow rather than pigmented seed coats. The rare i^k allele is not used commercially but was introgressed into various lines by backcrossing (see a pedigree summary in Supplemental Figure 1). The i^k allele results in a two-colored seed coat, known as the saddle pattern, since the pigment extends from the hilum to occupy a saddle-shaped region on both sides of the seed coat proper.

Most soybean varieties also contain the dominant K1 locus.

Interestingly, a recessive k1 spontaneous mutation interacts epistatically to partially overcome the effect of the dominant I and i^i alleles and to extend the pigmented region over a larger surface of the seed coat, as shown in Figure 1. Thus, seed with the i^i k1 genotype have a saddle pattern that mimics the i^k K1 phenotype. The effect of the k1 mutation on the dominant I allele is even more pronounced, and the seed with the I k1 genotype are either completely black or "near black" with a narrow strip of nonpigmented region at the outer edge of the seed coat, which is visible in fully expanded seed before desiccation but often is not apparent on the mature seed. The k1 allele is found in some unadapted germplasm, and new mutations from K1 to k1 have occurred in modern varieties. Except for a brief description of the intriguing phenotypes of this genetic interaction described in a 1958 abstract (Williams, 1958) and in review chapters by Bernard and Weiss (1973), Palmer and Kilen (1987), and Palmer et al. (2004), nothing is known about the nature of the K1 locus other than it segregates independently from the I locus and has a putative map position in the soybean composite reference map at Soybase (Grant et al., 2010).

In this work, our objectives are (1) to determine whether the pattern phenotypes specified by i^i K1 (pigmented hilum genotype) or the i^k K1 and i^i k1 saddle genotypes are conditioned by different levels of CHS siRNAs in the sectors of seed coats with different pigment phenotypes and (2) to use global expression analysis combined with mapping data and structural analysis of whole-genome resequencing data to identify the K1 locus and how it interacts epistatically with alleles of the I locus to reverse the phenotype normally specified by the dominant I and i^i alleles. Using small RNA analyses and RNA-seq, we demonstrate that CHS siRNAs are the cause of saddle seed coat patterns produced by the i^i and i^k alleles of the I locus, which restrict pigment to certain regions of the seed coats. By combining RNA-seq and whole-genome resequencing, we also show that the epistatic K1 locus, which modifies the spatial regulation of CHS siRNA production within the seed coats, encodes Argonaute5 (AGO5), a member of the Argonaute family of proteins. Independently occurring spontaneous mutations having the saddle pattern phenotype also showed different lesions in the AGO5 gene with dramatic effects on its predicted protein structure. The function of AGO5 appears to be integral to the spatial distribution of the CHS siRNAs, thus explaining how the k1 allele reverses the phenotype of the seed coat regions from yellow to pigmented, even in the presence of the normally dominant i^i or I alleles, and accounting for the classical genetic interactions of these two loci.

1 Address correspondence to [email protected].
The author responsible for distribution of materials integral to the finding presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Lila O. Vodkin (l-vodkin@illinois.edu).
www.plantcell.org/cgi/doi/10.1105/tpc.17.00162

The Plant Cell, Vol. 29: 708–725, April 2017, www.plantcell.org © 2017 ASPB.

RESULTS

The Recessive k1 Mutation Modifies the Pigmentation Patterns Specified by Alleles of the I Locus

As shown in Figure 1A, the i^i allele produces a pigmented hilum and the i^k allele produces a saddle pattern in the presence of the dominant K1 allele. However, in the presence of a homozygous recessive k1 allele, the pigment occupies a larger area, resulting in a saddle pattern on seed with the i^i k1 genotype that mimics the i^k K1 phenotype. To examine the differential abundance of CHS siRNAs in the two regions, we dissected seed coats and conducted high-throughput sequencing using small RNA libraries, as described in the next section. Since the pigment is not yet visible in the immature green seed stages during which CHS siRNAs are most abundant, we used position to determine how to dissect the two regions. Figure 1B shows some of the actual tissue as well as a schematic of the dissections. While the morphology of the hilum region can be determined visibly, there is no way to differentiate the saddle prior to pigment formation, except by positioning. We dissected the saddle region in a conservative manner including only the central region to ensure that no yellow tissue was likely to be included. Likewise, the yellow tissue was taken far distal from the saddle area. In total, we analyzed data from 16 different small RNA libraries yielding from 22 to 88 million reads per library (Supplemental Data Set 1).

Distribution of CHS siRNAs Results in the Pigmented Hilum Pattern of the Dominant i^i Allele

The Williams cultivar is homozygous for the dominant i^i allele, which produces a pattern with pigment restricted to the hilum region of the seed coat. To investigate whether the presence of CHS siRNAs is limited to only the nonpigmented region, we dissected the hilum from the seed coat proper from two stages of the immature seed, 25 to 50 mg and 50 to 100 mg (stage demarcations based on the weight of individual seeds). We had previously determined that these two stages contained high levels of small RNAs in the total seed coats of Williams (Cho et al., 2013). We used at least 10 seeds in RNA extractions to even out the biological variation for the dissected hilum and seed coat proper. High-throughput small RNA sequencing was conducted for two biological repeats of each seed coat region and the data are shown in Figures 2A and 2B and in Supplemental Data Set 2. The normalized CHS siRNAs that align to the CHS coding regions were highest in yellow regions of the seed coat and were significantly reduced in the pigmented hilum region in both repeats of both seed weight ranges. The siRNAs representing target genes CHS7 and CHS8 had the highest levels, as expected, each exhibiting an ~20-fold difference between the pigmented and the nonpigmented regions. The fold increase is not as pronounced in the 25 to 50 mg stage, likely due to the fact that in the earlier stages it is more difficult to obtain tissue dissections of the hilum regions that are free of cells from the seed coat proper. At the immature green stage, the pigment has not formed (Figure 1B); thus, one cannot discern the precise boundaries of the pigmented and nonpigmented tissues. In summary, these data clearly show that the i^i allele specifies the high level accumulation of the CHS siRNAs only in the regions of the seed coat that are not pigmented. Thus, not only does the i^i allele control highly tissue-specific production of CHS siRNAs (Tuteja et al., 2009; Cho et al., 2013), but it also directs the pattern-specific production of CHS small RNAs within seed coats of the same genotype.

Figure 1. Pattern Alleles Exhibited by the Interaction of the I and K1 Loci in Soybean.

Cultivars or isolines are homozygous for the indicated alleles, and, for brevity, only one of the alleles is indicated in the genotypes.
(A) Seeds on the left have the commonly found dominant K1 allele and seeds in the right column are homozygous for the rare, recessive k1 mutation. Dominance relations of the I alleles are in the order I > i^i > i^k > i, and the dominant I and i^i alleles are found in most commercially used cultivars. The Williams and Clark cultivars have the i^i allele that limits pigment to the hilum region near the embryonic axis. The saddle pattern isolines used in this study are in the genetic background of the cultivar Clark and are referred to as Clark 8 (PI 547450, i^k K1) and Clark 18a (PI 547439, i^i k1). The arrow denotes that Clark 18a, which was found in 1956 in Ames, Iowa, was a spontaneous mutation from a Clark parent line. The pigment can be either black or brown of varying intensities depending upon other genes present in the lines (Williams, 1952; Bernard and Weiss, 1973).
(B) An illustration of the dissection of immature green seed coats to obtain sections from seed with the pigmented hilum or saddle pattern phenotypes. Since the immature green seed do not show the pigmented phenotype, the dissections were conducted in a conservative manner to take sections that would avoid tissues with mixed phenotypes at maturity. Left: Actual dissected regions from the immature green seed coats of the 100 to 200 mg seed weight range. Right: An illustration of the regions dissected as outlined on the phenotype of the mature seed.

Quantitative Variation in CHS siRNAs Results in the Saddle Pigment Patterns on Seed Coats

Four small RNA libraries from two biological repeats were constructed from the pigmented saddle region and the nonpigmented seed coat of the Clark 8 isoline, which is homozygous for the i^k saddle pattern allele in the presence of the dominant K1 allele (Supplemental Data Set 1). For the saddle pattern genotypes, we used a slightly larger seed weight range to prevent mixing of phenotypes in dissected tissue. Both CHS7 and CHS8 siRNAs were ~15-fold higher in the nonpigmented region than the pigmented regions dissected from 100 to 200 mg seed (Figure 2C). The biological repeat data also show that expression levels of all CHS siRNAs, especially CHS7 and CHS8 siRNAs, are much higher in the nonpigmented seed coat than in the pigmented saddle (Figure 2D).

We next examined the presence of CHS siRNAs in the pigmented saddle region versus the nonpigmented seed coat region in four small RNA libraries from two biological repeats of the Clark 18a isoline (i^i k1). In this isoline, the recessive k1 mutation extends the pigmented region from the hilum to form a saddle pattern that is similar in phenotype to the Clark 8 isoline having the i^k K1 genotype. Figures 2E and 2F show that the presence of CHS siRNAs was predominantly limited to the nonpigmented yellow regions as in the case of the Clark 8 isoline. CHS7 and CHS8 siRNAs were ~26-fold higher in the nonpigmented regions in both biological repeats. Supplemental Data Set 2 shows the levels of small RNAs matching each of the annotated CHS Glyma models in the soybean genome within the different regions of the saddle pattern seed coats having the same genotype.
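The fold differences reported for the dissected regions rest on normalizing raw siRNA counts to library size. As a minimal sketch of that arithmetic, the following Python snippet converts counts to reads per million (RPM) and takes the ratio between regions. The read counts and library sizes are hypothetical values chosen for illustration and are not data from this study; the function names are likewise illustrative.

```python
def reads_per_million(count, total_reads):
    """Normalize a raw read count to reads per million (RPM)."""
    return count * 1_000_000 / total_reads


def fold_difference(rpm_high, rpm_low):
    """Ratio of two normalized abundances."""
    return rpm_high / rpm_low


# Hypothetical example: CHS7 siRNA counts in two dissected regions.
yellow_rpm = reads_per_million(count=26_000, total_reads=40_000_000)
saddle_rpm = reads_per_million(count=1_100, total_reads=44_000_000)

print(yellow_rpm, saddle_rpm, fold_difference(yellow_rpm, saddle_rpm))
```

Normalizing before taking the ratio matters here because the libraries in this study ranged from 22 to 88 million reads, so raw counts from different libraries are not directly comparable.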

Transcriptome Data Show CHS siRNA and mRNA Levels Have Inverse Patterns of Abundance in the Different Regions of the Seed Coat Saddle Patterns

Figures 2G and 2H clearly show that the expression levels of CHS7 siRNAs and their target CHS7 mRNAs demonstrated inverse patterns of abundance in two biological repeats of small RNA in which the same biological samples were also subjected to RNA-seq. CHS7 siRNAs were more highly expressed in the nonpigmented region where the expression level of CHS7 mRNA was low. However, in the pigmented regions, CHS7 mRNAs were highly expressed and siRNAs were not. The biological repeat shows a similar expression pattern for both CHS7 siRNAs and mRNAs. The results are similar for CHS8 (Supplemental Data Set 3). Taken together, the small RNA and RNA-seq data demonstrate that CHS siRNAs have a critical role in the formation of the pigment pattern on soybean seed coats through the downregulation of their target CHS mRNAs.

Differential Expression of RNA-Seq from i^k K1 versus i^i k1 Genotypes Revealed a Small Number of Candidate Genes for the K1 Locus

To identify the K1 locus, we compared the transcriptome data from the two saddle pattern Clark isolines, Clark 8 (i^k K1) versus Clark 18a (i^i k1). In total, we analyzed 12 mRNA libraries containing three biological repeats from the two dissected regions of each genotype, as indicated in Supplemental Data Set 1. The RNA-seq reads (43 to 97 million reads total from each library) were aligned to all 88,647 soybean gene models and splice variants and normalized as reads per kilobase of gene model size per million mapped reads (RPKM) (Mortazavi et al., 2008). We compared the transcriptome data of the pigmented region of Clark 8 (K1 allele) to the pigmented region of Clark 18a (k1 allele) in order to minimize variation due to position on the seed coat while searching for genetic differences specific to the K1 locus. Likewise, the nonpigmented regions were compared between the two varieties. We used both the Cufflinks package (Trapnell et al., 2012) and the DESeq software (Anders and Huber, 2010) for analyses of transcriptome data. Supplemental Figure 2 shows Cufflinks scatterplots and density plots comparing the RNA-seq data. According to soybean public map resources shown in Supplemental Figure 3, the K1 locus has been mapped to Chromosome 11; thus, we concentrated our RNA-seq analyses on that chromosome region near a mapped marker and purposefully took a wide swath of 10 million bases representing ~30% of the chromosome. Figure 3A illustrates that there were relatively few significantly differentially expressed genes in the DESeq analysis with a Benjamini-Hochberg adjusted P value of <0.05 that were found within a 5-Mb range on either side of the marker used.
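The two quantities named above, RPKM and the Benjamini-Hochberg adjusted P value, can be sketched in a few lines of Python. This is a generic illustration of the standard definitions, not the Cufflinks or DESeq implementation used in the study; the read counts, gene length, and P values are hypothetical.

```python
def rpkm(reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads."""
    return reads * 1e9 / (gene_length_bp * total_mapped_reads)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical gene: 750 reads, 2,000-bp model, 50 million mapped reads.
example_rpkm = rpkm(reads=750, gene_length_bp=2_000, total_mapped_reads=50_000_000)

# Hypothetical raw P values for four genes.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in example_adjusted]

print(example_rpkm, example_adjusted, significant)
```

Genes would then be retained as candidates only if their adjusted value falls below the 0.05 threshold and they lie within the 10-Mb window around the marker.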
There were seven differentially expressed genes in the pigmented saddle tissue, and they were also found in the 25 differentially expressed genes in the yellow distal tissue, thus yielding only a small number of potential candidate genes in this region of 10 Mb on chromosome 11, as shown in Supplemental Data Set 4.

We further determined that Glyma.11G190900 (Chr11:26,361,722..26,367,787), annotated as an Argonaute protein (AGO5), is the closest of these candidate genes to the marker BARC-040309-07711 at position Gm11:25,085,355, which is a closely linked marker to the K1 locus in the current Wm82.a2 reference genome (shown in Supplemental Figure 3D). We also searched for the AGO5 Glyma model in the older version of the soybean reference (Wm82.a1), where it corresponds to Glyma11g19650 (Gm11:16394576-16400897) in the Wm82.a1.v1.1 assembly and is 800 kb from the position of BARC-040309-07711 (Gm11:15,576,898) in Wm82.a1. Thus, Glyma.11G190900 was the strongest candidate gene since others were located much further from the marker position on the genetic map.

Figure 3B shows that the yellow sectors of the K1 genotype had the highest AGO5 transcripts at ~7.5 RPKM. It also depicts that AGO5 was expressed at significantly higher levels in both the black saddle and yellow sectors of the K1 genotypes compared with seed coats of the k1 genotypes. In summary, Figure 3B shows that AGO5 transcripts were higher by 2.3-fold and 6.2-fold in the black saddle and the yellow regions, respectively, of the K1 compared with k1 seed coats.

Figure 2. Distribution of CHS siRNAs or mRNAs within Sectored Regions of Seed Coats of Different Genotypes.

Black bars represent pigmented regions of the seed coats, and yellow bars represent the nonpigmented regions dissected from immature seed coats of the indicated genotypes and seed phenotypes. The number of total small RNA reads in millions (M) derived from each of the tissue sections is indicated below the

Resequencing Reveals a Small Deletion in AGO5 (Glyma.11G190900) in the Saddle Mutant Having the Recessive k1 Allele

Since the location of genetic map markers is only an approximation, we investigated all candidate genes and searched for structural variants including insertions, deletions, and single-nucleotide polymorphisms by comparing whole-genome resequencing data (Supplemental Data Set 1) between the nonpigmented progenitor Clark (i^i K1) and its isogenic saddle mutant Clark 18a (i^i k1). Figure 4A illustrates that a small deletion (absence of alignments) was found in the exon 7 region of Glyma.11G190900 in the saddle mutant (k1), while its progenitor (K1) does not have this structural variant. The depth of alignments from the mutant k1 line dropped in the vicinity of the exon 7 region. Bowtie2 allows gaps and partial read alignments at the insertion/deletion breakpoint. Since only partial reads aligned to the K1 reference allele, these delineate the breakpoint. Inspection of the alignments showed there are two extra sequences, 5'-CTTTGGTATCT-3' and 5'-TTGCCTGTT-3', flanking the alignments at Gm11:26,365,578 around the 3' end of intron 6 of Glyma.11G190900 and at Gm11:26,365,450 in exon 7 of the gene. Thus, the chimeric sequence is predicted to be 5'-CTTTGGTATCTTTTTTGCCTGTT-3'. Based on the whole-genome resequencing data showing this chimeric sequence, we project the deletion mutation omits the 129-bp region between Gm11:26,365,450 and Gm11:26,365,578.

To confirm the deletion detected by genomic resequencing data, we performed PCR amplifying the fragment across the deleted region that spans exon 7. Figure 4B shows that each of the seven soybean lines carrying the dominant K1 allele displayed bands of ~500 bp in length, while the Clark 18a (i^i k1) saddle mutant had a 400-bp fragment. Finally, we subjected the entire 6-kb AGO5 gene amplified from the k1 allele of Clark 18a to next-generation sequencing and found the same chimeric sequence (Figure 4C) that was observed in the whole genomic resequencing data. This chimeric sequence reveals that the 129-bp deletion occurs in a T-rich region and removes the AG at the end of intron 6 and all but 12 bases of the 139 bases of exon 7 of Glyma.11G190900.
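The numbers reported for the deletion can be cross-checked with simple arithmetic. The sketch below uses only the coordinates, sequences, and base counts stated in the text; the one added assumption is that both breakpoint coordinates are counted inclusively. Variable names are illustrative.

```python
# Breakpoint coordinates on Gm11 (Wm82.a2) as reported in the text.
BREAKPOINT_LOW = 26_365_450
BREAKPOINT_HIGH = 26_365_578

# Assumption: both coordinates are included in the deleted span.
span_bp = BREAKPOINT_HIGH - BREAKPOINT_LOW + 1

# Bases accounted for by the description of what the deletion removes:
# the AG dinucleotide ending intron 6 plus all but 12 of the 139 exon 7 bases.
INTRON6_AG_BP = 2
EXON7_BP = 139
EXON7_RETAINED_BP = 12
described_bp = INTRON6_AG_BP + (EXON7_BP - EXON7_RETAINED_BP)

# The predicted chimeric sequence should begin and end with the two
# flanking sequences observed at the alignment breakpoints.
FLANK_A = "CTTTGGTATCT"
FLANK_B = "TTGCCTGTT"
CHIMERIC = "CTTTGGTATCTTTTTTGCCTGTT"
flanks_match = CHIMERIC.startswith(FLANK_A) and CHIMERIC.endswith(FLANK_B)

print(span_bp, described_bp, flanks_match)  # prints: 129 129 True
```

Both routes give 129 bp, which is also in line with the roughly 100-bp size difference between the PCR fragments from K1 lines and Clark 18a.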

The 129-bp Genomic Deletion Results in Loss of the Entire Exon 7 in the k1 Transcript by Exon-Skipping Alternative Splicing

As shown in Figure 3B, the levels of the k1 transcript are apparently affected by the 129-bp genomic deletion since the RPKMs for Glyma.11G190900 were reduced in k1 compared with K1 by 2.3-fold and 6.2-fold in the black saddle and the yellow seed coat regions, respectively. Thus, the deletion in the recessive k1 allele reduces the expression of Glyma.11G190900 rather than abolishing it entirely. We analyzed position graphs of the RNA-seq data alignments in detail to understand which part of the gene is transcribed in the presence of the deletion. Figure 4D shows that the 129-bp genomic deletion appeared to abolish the entire 139 bases of exon 7 from the mRNA transcript even though a partial region of the 3' end of exon 7 (12 bp) remains in the genome. This was demonstrated by the absence of alignments to the entire exon 7 of Glyma.11G190900 in the recessive k1 saddle mutant (Clark 18a). Since the genomic deletion removes the AG of the 3' splicing signal immediately prior to exon 7 (Figure 4C), we propose that the loss of the 3' splice site causes the intron splicing machinery to skip to the next 3' splicing signal immediately prior to exon 8, thus eliminating the entire exon 7 from the final cytoplasmic transcript.

To test that all of exon 7 is removed by splicing, we modified the transcript sequence of Glyma.11G190900 by omitting the full 139 bp of exon 7 to use as a reference for the RNA-seq alignments. We used very stringent conditions not allowing any mismatch with the Bowtie1 program, which does not allow any gaps, even a single base deletion or insertion in the alignments. As shown in Figure 4D, the alignment gap was found against the original transcript but not the modified transcript (right panel). Figure 4E shows that the chimeric transcripts found in RNA-seq data of the k1 saddle mutant (Clark 18a) cleanly fused exon 6 to exon 8, thus confirming that the 129-bp genomic deletion leads to skipping of the entire 139 bp of exon 7.
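The logic of this test can be illustrated with a toy example: a read that spans an exon 6 to exon 8 junction can match without gaps or mismatches only against a reference from which exon 7 has been omitted. The sketch below uses plain substring matching as a stand-in for ungapped, zero-mismatch alignment; it does not call Bowtie1, and the short exon sequences are hypothetical placeholders, not AGO5 sequence.

```python
def build_transcript(exons, skip=None):
    """Concatenate exon sequences in order, optionally omitting one exon."""
    return "".join(seq for number, seq in exons if number != skip)


def aligns_exactly(read, reference):
    """Stand-in for an ungapped, zero-mismatch alignment."""
    return read in reference


# Hypothetical toy exons (numbers mirror the exons discussed in the text).
exons = [(6, "ATGGCCAAGT"), (7, "CCGTTAGACA"), (8, "GGATTCTGAA")]

original_reference = build_transcript(exons)
modified_reference = build_transcript(exons, skip=7)

# A read spanning the exon 6 / exon 8 junction of an exon-skipped transcript.
junction_read = "CCAAGTGGATTC"

print(aligns_exactly(junction_read, original_reference))
print(aligns_exactly(junction_read, modified_reference))
```

Under these toy inputs the junction read matches only the modified reference, which mirrors the observation that the alignment gap appears against the original transcript but not the exon 7-less one.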

The Clark 18a k1 Deletion Results in Early Termination of the AGO5 Protein

Glyma.11G190900 encodes an AGO5 protein, one of the AGO family members. AGO proteins have critical roles in the function of small RNAs in posttranscriptional regulation of target genes due to their ability to bind to nucleic acids (Chapman and Carrington, 2007; Fang and Qi, 2016). The AGO complex recruits single-stranded RNA after Dicer cleaves double-stranded RNA and guides it to the target transcript, which is cleaved. AGO5 (Glyma.11G190900) likely plays a crucial role, either directly or indirectly, in regulating the level of CHS siRNAs in a spatial manner causing the saddle pattern phenotype in the recessive k1 allele saddle mutant (Clark 18a).

Figure 2. (continued).

arrow. RPM, normalized total number of reads per million; BR, biological repeat. The full data for siRNA or mRNA levels for all CHS genes are shown in Supplemental Data Sets 2 and 3.
(A) and (B) CHS siRNAs from two stages of Williams (black hilum; i^i K1) immature seed at weight ranges 25 to 50 or 50 to 100 mg. A second biological repeat of each weight range shows the same trend (Supplemental Data Set 1).
(C) and (D) CHS siRNAs from two biological repeats of Clark 8 (black saddle; i^k K1) at the 100 to 200 mg seed weight range.
(E) and (F) CHS siRNAs from two biological repeats of Clark 18a (black saddle; i^i k1) at the 100 to 200 mg seed weight range.
(G) Inverse relationship of CHS7 mRNA and siRNA (smRNA) levels from the two biological repeats of Clark 8 (black saddle; i^k K1).
(H) Inverse relationship of CHS7 mRNA and siRNA levels from the two biological repeats of Clark 18a (black saddle; i^i k1).

Figure 5A illustrates the amino acid sequence of AGO5 (Glyma.11G190900) in K1 varieties. AGO5 has 922 amino acids and includes the PAZ and PIWI domains, which are reported to be functional domains for interaction with small RNAs. The PAZ domain (amino acids 317–434) is an interface for binding to nucleic acids and the PIWI domain (positions 603–877) is an anchoring site of the 5' RNA guide strand. Next, we examined the coding potential of AGO5 without exon 7, as found in Clark 18a transcripts. The elimination of exon 7 results in a frame-shifted protein at amino acid 376 and premature termination after 389 amino acids omitting the essential PIWI domain and part of the PAZ domain (Figure 5B). The prematurely terminated k1 protein from the Clark 18a mutation is clearly nonfunctional.
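Why skipping exon 7 truncates the protein follows from the exon length and the domain coordinates given above. The sketch below works through that reasoning under one stated assumption: that the native AGO5 sequence is retained only up to residue 375, the residue before the reported frameshift position. The helper function is illustrative.

```python
EXON7_BP = 139

# 139 is not a multiple of 3, so removing exon 7 shifts the reading frame.
causes_frameshift = EXON7_BP % 3 != 0

# Domain boundaries (amino acid positions) as reported in the text.
PAZ = (317, 434)
PIWI = (603, 877)

# Assumption: residues from position 376 onward no longer match native AGO5.
LAST_NATIVE_RESIDUE = 375


def retained_residues(domain, last_native_residue):
    """Number of native residues of a domain that survive the truncation."""
    start, end = domain
    kept_end = min(end, last_native_residue)
    return max(0, kept_end - start + 1)


paz_total = PAZ[1] - PAZ[0] + 1
paz_kept = retained_residues(PAZ, LAST_NATIVE_RESIDUE)
piwi_kept = retained_residues(PIWI, LAST_NATIVE_RESIDUE)

print(causes_frameshift, paz_kept, paz_total, piwi_kept)  # prints: True 59 118 0
```

Under this assumption about half of the PAZ domain and none of the PIWI domain remain, consistent with the statement that the truncated protein omits the PIWI domain and part of the PAZ domain.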

Multiple Independent Saddle Pattern Mutations Have Lesions in the AGO5 Protein

The recessive k1 allele with the defective AGO5 on chromosome 11 (Clark 18a) arose as a de novo mutation in the variety Clark. Records indicate it was found in Ames, Iowa in 1956. We extended our examination to other black saddle mutations that arose in independent soybean varieties and which have been preserved in the germplasm collection maintained by the USDA (Supplemental Data Set 5). Initially, we surveyed regions of the entire AGO5 gene split into overlapping sections to determine whether any structural changes were visible by gel electrophoresis as was found to be the

Figure 3. An Argonaute Gene, Glyma.11G190900 (AGO5), Is the Most Likely Candidate for the k1 Mutation Based on Analysis of RNA-Seq Data in the Region of a Mapped Marker.

(A) An illustration of the small number of significantly differentially expressed genes found in a large region of 10 Mb surrounding a previously mapped marker (BARC 0400309-07711) in the vicinity of the K1 locus on chromosome 11 at ~25 Mb. Black arrows indicate that only seven genes were significantly differentially expressed in this region when RNA-seq was compared from the black saddle tissue, and yellow arrows indicate only 25 genes were found to be significantly differentially expressed in the yellow seed coat sectors between the two genotypes.
(B) Statistically significant variation in RPKM levels for Glyma.11G190900 (AGO5) between the black saddle comparisons and yellow region comparisons of K1 to k1 genotypes. Data represent the average of three biological repeats of RNA-seq with SE bars.


Figure 4. Structural Variation in Glyma.11G190900 and Its Transcripts Encoding AGO5 Found in the Clark 18a k1 Black Saddle Mutation.

(A) The absence of Bowtie2 alignments in genome resequencing reveals a small deletion within exon 7 in the recessive k1 black saddle mutant compared with the dominant K1 allele of yellow seed. The graph represents normalized counts of alignments in 0.25 scale (count at base × 1,000,000/total number of reads as displayed by IGV).


case for the 129-bp deletion within Clark 18a. However, none were observable. We then designed primers to amplify the AGO5 gene (6067 bases of Glyma.11G190900 + 196 bp upstream of the 5′ UTR [untranslated region]); the PCR products were barcoded and subjected to next-generation sequencing and assembly as described in Methods. Supplemental Data Set 5 shows the changes found in the DNA sequence of each amplicon structure. Figures 5B to 5F summarize the predicted effects on the AGO5 protein, and Supplemental Figure 4 shows a Multalin alignment of the amino acid sequences of the nine AGO5 amplicons. Compared with the Lincoln parent line, which produces the full-length protein, a mutant in the variety Lincoln found in 1954 had a single base pair deletion in exon 13, which immediately introduces a stop codon that prematurely terminates the protein at amino acid 617. An independent saddle pattern mutation in Lincoln in 1945 was missing four amino acids due to 12 nucleotides missing within exon 10. Compared with the Calland parent, a saddle mutant found in Calland showed a 15-bp deletion at the junction of the 5′ UTR with exon 1, including the initiation methionine codon. The next in-frame methionine is not until position 152 within the protein. Finally, two large-seeded and saddle-patterned varieties, known as Kurakake and Kurakake Daizu, which were collected in Japan, both had the same deletion of 4 bp in exon 6 that results in a frameshift at position 351 and a premature stop codon at position 373 in the protein. These data showing multiple, independent, and large lesions that would inactivate the AGO5 protein confirm Glyma.11G190900 to be the K1 locus and the origin of different k1 alleles that specify a saddle pattern phenotype. The fact that the 1945 Lincoln saddle mutant is missing only four amino acids suggests that these amino acids may be critical to the function of the AGO5 protein.
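The reading-frame consequences of the lesion sizes reported above can be checked with simple arithmetic. The following is a minimal sketch, not the analysis pipeline used in this work: the function name `frame_effect` and the informal lesion labels are illustrative assumptions, and the check looks only at the number of coding bases lost, ignoring position, splicing, and start-codon effects.

```python
# Illustrative sketch only: a length-based reading-frame check.
# `frame_effect` and the labels below are hypothetical names, not from the paper.
CODON = 3

def frame_effect(bases_lost_from_cds: int) -> str:
    """Classify a loss of coding sequence as in-frame or frameshift by length alone."""
    if bases_lost_from_cds <= 0:
        raise ValueError("expected a positive number of bases")
    if bases_lost_from_cds % CODON == 0:
        return f"in-frame, {bases_lost_from_cds // CODON} codons removed"
    return "frameshift"

# Sizes as reported in the text and in Figure 4.
lesions = {
    "Clark 18a (exon 7 skipped, 139 bp lost from the transcript)": 139,
    "Lincoln 1954 (1-bp deletion, exon 13)": 1,
    "Lincoln 1945 (12-nt deletion, exon 10)": 12,
    "Kurakake (4-bp deletion, exon 6)": 4,
}

for name, size in lesions.items():
    print(f"{name}: {frame_effect(size)}")
```

Two cases fall outside this simple check. The 129-bp genomic deletion in Clark 18a is itself a multiple of three, but because it removes the 3′ splice site the entire 139-bp exon 7 is skipped, and 139 is not a multiple of three. The 15-bp Calland deletion is also a multiple of three, but it removes the initiation codon, so its effect does not depend on the reading frame.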

As shown in the amplicon sequencing results of Supplemental Data Set 5, the number of reads assembled ranges from 69,609 to 191,356, and the representation of the perfect matches of each base in the sequencing is generally much greater than 1000, allowing very accurate calls of the consensus contig. We found one difference between the Phytozome reference genome Glyma.11G190900 (AGO5 in the Williams 82 cultivar) and all of the other AGO5 genes in the nine soybean varieties and mutants that we subjected to amplicon sequencing, including Williams, the progenitor recurrent parent of Williams 82. There was an extra A in exon 20, which would extend the reading frame of the AGO5 protein to 922 amino acids rather than the 890 called for Glyma.11G190900 in the reference genome. We think this represents a correction to the AGO5 reference sequence at the 3′ end, based on amplicon sequencing of nine different lines. The only other difference relative to the Glyma.11G190900 gene model in the Phytozome call was a single deleted T within intron 6 in both the Calland K1 parent and the Calland k1 mutant line, which is likely a true variant occurring in the parent line relative to the Phytozome sequence.

Epistatic Interaction of the Mutated AGO5 k1 Gene with the Dominant I Allele

Figure 6A illustrates the nearly complete nullification of dominance of the I allele in the presence of a recessive k1 mutation. This epistatic interaction leads to a near fully pigmented black or near brown seed coat phenotype. While the color of the pigment is dependent on the R locus, known to be a MYB transcription factor regulating enzymes in the later stages of the flavonoid pathway (Gillman et al., 2011; Zabala and Vodkin, 2014), the extent of distribution of the pigment is controlled by the I locus and its interaction with alleles of the K1 locus. Figure 6A illustrates the original line numbers as well as the current Plant Introduction numbers by which these isolines are maintained in the germplasm collection since their release in 1968 by its curator Richard L. Bernard. These are among the relatively few materials and records of the unusual interaction of the two loci (Williams, 1958; Bernard and Weiss, 1973). The source of the k1 allele used in the creation of the near black (Clark 18b) and near brown (Clark 18c) isolines was Clark 18a (L67-3479), the spontaneous mutation that we have demonstrated to be a 129-bp deletion in the AGO5 gene. Thus, we examined these lines by PCR, and Figure 6B shows that both contain the smaller PCR fragment that is characteristic of the k1 allele present in Clark 18a, as expected.

Multiple AGOs Are Expressed in the Immature Seed Coats

Alignments to the soybean genome with Glyma.11G190900 revealed high similarity to Glyma.12G083500, which is also annotated as AGO5. Although pairwise alignments of the two

Figure 4. (continued).

(B) PCR validates that the recessive k1 allele in Clark 18a has a small deletion. All varieties carrying the dominant K1 allele show a band of ~500 bp in length, while the recessive k1 allele shows 400 bp. Abbreviations for different cultivars are as follows: W82, Williams 82; W, Williams; UC2, Clark; C8, Clark 8; C18a, Clark 18a; UC1, UC7, and UC9 are other Clark isolines with other alleles of the I locus but all are homozygous for K1.
(C) Sequencing the entire 6-kb AGO5 amplicon from the Clark 18a line containing the k1 mutation reveals a 129-bp deletion. The top line is the sequence of Glyma.11G190900 exon 7 and junction regions in Clark K1. The bottom line shows the chimeric sequence covering the deletion point that is found in the Clark 18a line containing the k1 mutation. The 129-bp deletion removes the AG (red) of the 3′ splice site preceding exon 7 and all except 12 bases of exon 7. Exon 7 sequences are shown in blue. See Supplemental Data Set 5 for the complete amplicon sequence of Clark 18a.
(D) Position graphs of RNA-seq alignments of Clark 18a (k1) against the full 3514-bp transcript of Glyma.11G190900 (AGO5) from the reference genome (Williams 82, K1 genotype) or to a modified 3375-bp transcript that lacks the entire 139 bp of exon 7 (right panel). Bowtie1, which does not allow any insertions or deletions, was used with no mismatches allowed. The red arrow marks the position of exon 7 or the fusion position of exon 6 to exon 8 (x axis, position on the transcript; y axis, non-normalized count of the number of times each base occurs within the aligned reads).
(E) Chimeric transcripts validate that the entire 139 bp of exon 7 is spliced out in the recessive k1 saddle mutant. Top: The sequence of the flanking region of exon 7 in the transcript of Glyma.11G190900. Bottom: Chimeric transcripts that were found in the RNA-seq as shown in the alignments in (D) (right) to the modified transcript missing exon 7.


transcripts showed 90% nucleotide similarity overall (Supplemental Figure 5), the polymorphisms are distributed throughout the coding regions. Therefore, alignments by Bowtie1, which allows up to only three mismatches and no gaps, will distinguish between these two paralogous transcripts. The chromosome 12 AGO5 with <0.6 RPKM in the yellow sectors was expressed much less than the chromosome 11 AGO5 at 8 RPKM. The AGO5 on chromosome 12 does not have the 129-bp deletion observed in the k1 mutant and has intact PAZ and PIWI domains. Although its transcripts are much less abundant, it is possible that it could perform some of the function of AGO5 in the k1 mutant line.
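The reasoning that a three-mismatch, ungapped alignment separates two transcripts with about 90% identity can be illustrated with a toy example. This is a sketch under stated assumptions, not Bowtie1's algorithm: the sequences are invented, the names `chr11_like` and `chr12_like` are hypothetical stand-ins for the two paralogs, and the read is compared at a fixed offset rather than searched across the reference.

```python
# Toy illustration only: invented sequences, fixed alignment offset,
# and a plain mismatch count in place of a real aligner.

def mismatches(read: str, ref: str, offset: int) -> int:
    """Count mismatches of an ungapped read placed at `offset` on `ref`."""
    window = ref[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(a != b for a, b in zip(read, window))

def assign(read: str, refs: dict, offset: int, max_mismatches: int = 3) -> list:
    """Return the references the read matches within the mismatch limit."""
    return [name for name, seq in refs.items()
            if mismatches(read, seq, offset) <= max_mismatches]

def mutate_every_tenth(seq: str) -> str:
    """Substitute every tenth base, giving a copy with 90% identity."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if i % 10 == 9 else b for i, b in enumerate(seq))

base = "ACGTTGCAAGCTTAGCCATG" * 3          # 60-nt invented sequence
refs = {"chr11_like": base, "chr12_like": mutate_every_tenth(base)}

read = base[5:55]                           # 50-nt read drawn from chr11_like
print(mismatches(read, refs["chr12_like"], 5))
print(assign(read, refs, offset=5))
```

In this toy case a 50-nt read accumulates more mismatches against the 90%-identical copy than the limit allows, so it is assigned only to the sequence it came from. Real reads differ in length and the polymorphisms between the paralogs are not evenly spaced, so the separation in practice depends on where each read falls.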

In total, there are 20 genes annotated as AGO in the soybean genome that have been examined by homology to known Arabidopsis thaliana AGO genes (Liu et al., 2014). A phylogenetic tree of these 20 proteins is shown in Supplemental Figure 6, and the RNA-seq levels found for 12 AGO family members with the highest expression are shown in Figure 7. The highest level of transcripts in the seed coats was observed for two AGO1 genes (Glyma.16G217300.1 and Glyma.09G167100.1) at ~22 and 28 RPKM, respectively. The full table of RPKM levels of all tissues and genotypes is shown in Supplemental Data Set 6. Several of the AGOs, including AGO5b (Glyma.11G190900), AGO10g, and AGO10d, differed in transcript levels within the black and yellow sectors of the same genotype. Of these 20 AGO genes, only AGO5b (Glyma.11G190900) showed statistically significant expression differences between K1 and k1 genotypes. The other AGO5-related sequence, Glyma.12G083500, was not among the 12 most highly expressed AGOs.

In summary, RNA-seq analyses of isolines with patterned tissues contrasting black saddle genotypes (ik K1 versus ii k1) combined with genome resequencing of isogenic mutant lines enabled a focus on a reduced number of candidate genes in the vicinity of the legacy map position of the K1 gene including Glyma.11G190900 encoding AGO5. The presence of a 129-base deletion in the k1 mutation that arose in 1956 in the variety Clark was a strong indication that AGO5 is the k1 allele producing a prematurely terminated protein. Amplicon sequencing of several independent saddle pattern mutations from different genetic

Figure 5. Schematic of the Predicted Lesions in the AGO5 Protein Found in Independent Saddle Pattern Mutants.

(A) The AGO5 structure found in standard K1 lines and parents including Williams, Clark, Lincoln, and Calland showing the relative positions of PAZ and PIWI domains.
(B) Protein prematurely terminated in the k1 allele of the Clark 18a mutation.
(C) Protein prematurely terminated in the k1 allele of a 1954 saddle mutation in Lincoln.
(D) Proteins prematurely terminated in two different Kurakake lines, large seeded Plant Introduction lines collected in Japan.
(E) Protein missing four amino acids in a 1945 saddle pattern mutant of Lincoln.
(F) Protein of the saddle mutant found in Calland in 1970 and putatively missing a large part of its N terminus due to absence of the initiation codon.
Example seed phenotypes are illustrated. Dark lines represent amino acids of the AGO5 sequence, a tick mark indicates the relative position of altered amino acids introduced after a frameshift mutation, and dotted lines represent missing portions of the protein sequence. The Clark parent line is based on whole-genome resequencing data but the other nine are based on full-length sequencing of the AGO5 amplicon. See Supplemental Data Set 5 for a more detailed characterization of the affected positions in the genes and proteins. Amino acid numbers shown underneath are relative to the initiation codon of Glyma.11G190900 in the standard K1 allele. All 10 lines sequenced showed an extra A in exon 20, which would increase the reading frame of the AGO5 protein to 922 amino acids rather than 890 as called by Phytozome for the Glyma.11G190900 predicted protein in the reference genome, representing a correction to the reference.


backgrounds revealed independent lesions affecting production of the AGO5 protein, thus establishing Glyma.11G190900 as the K1 locus and implicating a role for AGO5 in controlling the spatial patterning of the CHS siRNAs generated at the I locus.

DISCUSSION

Based on knowledge of the small RNA pathway in other organisms, we previously presented a model for the naturally occurring dominant I and ii alleles in preventing pigment formation in soybean seed coats (Tuteja et al., 2009). The long inverted repeat of the ii allele contains two inverted repeat clusters, cluster A (CHS1-3-4) and cluster B (CHS4-3-1), each of which also has inverted repeats of CHS genes within it. A precursor CHS double-stranded RNA (dsRNA) forms at some point within the region. The cleavage of the progenitor CHS dsRNA from the CHS1-3-4 cluster regions by Dicer-Like proteins (DCL) generates primary CHS siRNAs, some of which have similarity to the more distantly related CHS7 and CHS8 transcripts that are expressed in seed coat development. After cleavage at the CHS7/8 sites targeted by the primary CHS siRNAs, RNA-dependent RNA polymerase synthesizes dsRNA from the cleaved CHS7 and CHS8 mRNAs, which are then diced to generate secondary CHS siRNAs that also target CHS7 and CHS8. Thus, the action of RNA-dependent RNA polymerase amplifies the silencing response and spreads it over a larger region of the target. Sequence polymorphisms between the CHS1-3-4 genes at the origin of the dsRNA and the CHS7/8 gene targets allowed us to differentiate the primary

Figure 6. Epistatic Interaction of the Mutated AGO5 k1 Gene with the Dominant I Allele.

(A) Seed of the indicated isolines illustrating the almost fully pigmented phenotypes (known as near black or near brown) that were created by crossing the Clark 18a black saddle k1 mutation with an isoline containing the dominant I allele that specifies all yellow hila and seed coats. The increase in pigmented surface area is an interaction of the dominant I allele in the presence of the k1 mutation, while the color of the pigment is determined by the genotype of the R locus (Williams, 1952; Bernard and Weiss, 1973). The PI and L numbers are official USDA germplasm numbers assigned to these isolines, which were released in 1968 by Richard L. Bernard.
(B) PCR shows that the small deletion in the AGO5 gene of Clark 18a (black saddle, ii k1 R) is also found in the Clark 18b (near black, I k1 R) and Clark 18c (near brown, I k1 r) isolines.


CHS1-3-4 siRNAs that were present in the very small seed at 12 to 14 d after flowering and the transition to the secondary CHS7/8 siRNAs beginning at the 5 to 6 mg seed weight range through their maximal expression at the 50 to 75 mg seed weight (Cho et al., 2013). The primary CHS siRNAs were lower in abundance but composed of a higher proportion of 22-nucleotide small RNAs, whereas the more abundant secondary CHS siRNAs were primarily 21 nucleotides.
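The comparison of 21- and 22-nucleotide size classes amounts to tabulating read lengths within each group of siRNAs. The sketch below shows one way to do this; it is an illustration only, the function name `size_class_fractions` is a hypothetical helper, and the reads are placeholder strings rather than data from this study.

```python
# Illustrative sketch: tabulate the fraction of small RNA reads in each
# length class. The reads below are placeholders, not sequencing data.
from collections import Counter

def size_class_fractions(reads):
    """Return {read length: fraction of reads} for a list of read sequences."""
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(len(r) for r in reads)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

# Placeholder reads: three of 21 nt and one of 22 nt.
toy_reads = ["A" * 21, "C" * 21, "G" * 21, "T" * 22]
for length, fraction in size_class_fractions(toy_reads).items():
    print(f"{length} nt: {fraction:.2f}")
```

Applied separately to reads mapping to the CHS1-3-4 origin region and to reads mapping to CHS7 and CHS8, such a tabulation would show which size class predominates in each group.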

C2-Idf, a dominant allele of the C2 locus, which encodes chalcone synthase in maize (Zea mays), also has repeated copies of the CHS genes and operates via endogenous silencing through siRNAs (Della Vedova et al., 2005), though silencing by this locus is not tissue specific or pattern specific as is the case for the alleles of the soybean I locus. Small RNA sequencing of cosuppressed, nonpigmented transgenic petunia (Petunia hybrida) flowers with introduced CHS genes has shown that CHS siRNAs are the causative factor of silencing the pigment pathway in flowers (De Paoli et al., 2009). There are many commonalities between the naturally occurring soybean seed coat and the transgenic petunia systems (Eckardt, 2009). The first observations of silencing of

Figure 7. Comparison of the Expression Levels of 12 Argonaute Genes within Different Tissues of the Same Genotypes.

Top: RNA-seq levels within the black versus yellow tissue sectors of the ik K1 saddle pattern genotype. Bottom: RNA-seq levels within the black versus yellow tissue sectors of the ii k1 saddle pattern genotype. Data represent the average of three biological repeats of RNA-seq data normalized as RPKM with SE bars. Black bars are from the black saddle region and yellow are from the yellow tissue. The 12 Argonaute genes with highest expression are shown, and the complete data set for all 20 soybean Argonaute genes is in Supplemental Data Set 6. The subclassification of the AGOs by a, b, c, etc., follows the nomenclature used by Liu et al. (2014). The red box denotes several Argonautes with variable expression between the sectors. The red underline denotes the K1 locus, Glyma.11G190900, which is AGO5b, the more highly expressed of the two AGO5 soybean genes.


chalcone synthase by cosuppression in transgenic petunia (Napoli et al., 1990; van der Krol et al., 1990) were also associated with variegated or patterned flowers as well as white flowers. More recently, it has been shown that the naturally occurring color patterns in petunia are the result of tandem arrangements of CHS genes and that CHS siRNAs are present in the nonpigmented sectors (Morita et al., 2012). Morita et al. propose that other loci are likely involved in the patterning phenomena in petunia, but none have been identified to date. Here, we investigated the soybean system to determine whether the patterns in soybean are due to the differential levels of CHS siRNAs and whether we could identify the K1 locus that might be interacting with the small RNA pathway since it so dramatically changes the phenotype mediated by the I and ii alleles.

Even more intriguing than the tissue specificity exemplified by the I locus are the pattern-specific alleles that result in the two color regions of yellow and pigmented tissue on the same seed coats (Figure 1A). Using small RNA sequencing and RNA-seq of dissected regions, we demonstrated definitively that these patterns are due to the distribution of CHS siRNAs in the sectors destined to be yellow and not in the regions destined to be pigmented (Figure 2). These results clearly show that the levels of the CHS siRNAs follow a developmental program that leads to their increased expression within certain subregions of the seed coat that is mediated in genotypes with the ii and ik alleles.

The K1 Locus, Identified as AGO5, Influences the Spatial Distribution of CHS siRNAs

The RNAi pathway in plants evolved as a defense mechanism against RNA viruses and endogenous transposable elements (Baulcombe, 2004; Matzke and Birchler, 2005; Fusaro et al., 2006; Chapman and Carrington, 2007; Borges and Martienssen, 2015). Certain viral proteins can interfere with the pathway and suppress posttranscriptional silencing by siRNAs (Mallory et al., 2002; Diaz-Pendon et al., 2007). The posttranscriptional downregulation of CHS mRNAs can be interrupted by viral proteins that are suppressors of silencing (Senda et al., 2004; Nagamatsu et al., 2007) leading to pigment mottling on the seed coat in plants infected with soybean mosaic virus. Cold temperature also interferes with the production of small RNAs and is associated with partial discoloration of the yellow seed with pigment (Kasai et al., 2009).

In addition to the environmental effects of viruses and temperature, the seed color is influenced genetically by the K1 locus, which interacts with the I locus to control the pigment distribution in the seed coat. The k1 mutant allele extends the pigmented region of the ii allele to form a saddle pattern that mimics the phenotype of the ik allele (Figure 1A). Thus, it appears to overcome the silencing of the I locus. Here, we asked whether the phenotype was mediated also by the distribution of CHS siRNAs within the different color regions of the seed coat. Similar to patterned seed coats with genotype ik K1, those with genotype ii k1 also displayed quantitative variation and inverse correlation of the levels of CHS siRNAs and CHS7/8 mRNAs (Figures 2C to 2H). Thus, we conclude that the k1 allele results in altered spatial distribution of the CHS siRNAs. Interestingly, the effect of the k1 mutation on the dominant I allele is even more pronounced with the inhibition of silencing over most of the seed coat surface (Figures 1 and 6).

Using a combination of RNA-seq and genomic resequencing coupled with the known chromosome location of the K1 locus, we identified the K1 locus as Glyma.11G190900 encoding an AGO5 protein (Figures 3 and 4). The chromosome 11 AGO5 in the Clark 18a k1 line appears to be a nonfunctional protein. More evidence that Glyma.11G190900 is the K1 locus is the finding of different lesions in the AGO5 protein in independent mutant pairs in Lincoln and Calland as identified by amplicon sequencing (Figure 5). We conclude that AGO5 at the K1 locus participates in the spatial patterning of the CHS siRNAs produced from the CHS1-3-4 clusters of the I locus.

How Could a Deficit of AGO5 Affect Spatial Biogenesis of CHS siRNAs?

A complete lack of AGO family proteins would be expected to have highly detrimental effects, since the small RNA pathway is critical to many cellular functions through the action of microRNAs (miRNAs) that often regulate transcription factors as well as through siRNAs that are involved in methylation. Glyma.11G190900 is one of at least 20 different gene models annotated as encoding Argonaute proteins in the soybean genome. There is an AGO5 paralog, Glyma.12G083500, which has 90% nucleotide similarity to Glyma.11G190900 that encodes the K1 locus. The AGO5a (Glyma.12G083500) on chromosome 12 was also expressed in seed coats, though at a much lower level than AGO5b (Glyma.11G190900) on chromosome 11. In addition, there are many AGOs more highly expressed in the seed coats (Figure 7) that may partially compensate for the loss of function of one of the AGO5 genes in processing essential siRNAs and miRNAs. Multiple genes for Dicer proteins also exist in soybean. Mutant plants constructed with zinc finger nuclease technology disrupting either DCL1a or DCL1b homologs did not show a pronounced morphological or molecular phenotype, but double mutant plants were severely affected in both (Curtin et al., 2015).

The quantities of CHS siRNAs were very different in the pigmented and nonpigmented regions of seed coats with the saddle phenotype, implying that the production of CHS siRNAs is affected. Reduced or nonfunctional AGO5 could lead to altered efficiency of particular miRNAs or other siRNAs to downregulate their targets. One of these targets might be a transcription factor affecting a suite of genes or other miRNAs that regulate spatial patterning in the seed coats. In Arabidopsis, for example, a transcription factor complex that generates the precursor of miR166 defines cell fate in roots (Carlsbecker et al., 2010).

The AGOs have been grouped into three major phylogenetic clades (Fang and Qi, 2016) based on nomenclature for the 10 AGO genes in Arabidopsis: AGO1/5/10, AGO2/3/7, and AGO4/6/8/9. AGO5 belongs to the same group as the founding Argonaute member AGO1, whose function is to load miRNAs and trans-acting siRNAs that are involved in many developmental processes. Sorting of small RNAs into Argonaute complexes can be directed by the 5′ terminal nucleotide of the small RNA with different AGOs having preferences for different 5′ nucleotides. Although in the same group as AGO1, AGO5 has been shown to preferentially recruit sequences with a 5′ terminal cytosine, which are derived from intergenic sequences (Mi et al., 2008). AGO5 has been implicated to be important in male germline


development in Arabidopsis (Borges et al., 2011) and to participate in female gametogenesis in the ovules (Tucker et al., 2012). A protein complex MEL1/AGO5c is associated with 21-nucleotide phased siRNAs (phasiRNAs), which are mostly generated from long noncoding intergenic regions in Arabidopsis that are initially cleaved by the 22-nucleotide miR2118 (Komiya et al., 2014). The closest homolog in maize, AGO5c, is also potentially the binding partner in anther development of premeiotic 21-nucleotide phasiRNAs, which are initially cleaved by the 22-nucleotide miR2118 (Zhai et al., 2015).

PhasiRNAs are particularly abundant in soybean with over 500 loci having been found, of which 483 overlapped protein coding genes and 41% of those corresponded to the abundant NB-LRR class of proteins that are often involved in disease resistance (Arikit et al., 2014). To date, we have not observed any small RNAs mapping to the intergenic regions of the I locus clusters on chromosome 8, rather only small RNAs mapping to the CHS transcript regions. Since the CHS siRNAs are not formed from an initial cleavage step by a miRNA, but rather by dicing of a dsRNA originating from the unique structure of the dominant ii allele, they are not classified as phasiRNAs. The secondary amplification that produces phasiRNAs can be triggered by 22-nucleotide miRNAs (Chen et al., 2010; Cuperus et al., 2010; Fei et al., 2013). Interestingly, the primary CHS siRNAs representing the CHS1-3-4 origin region are higher in 22-nucleotide small RNAs compared with the more abundant 21-nucleotide secondary CHS siRNAs that map to the target CHS7 and CHS8 genes (Cho et al., 2013). It is possible that the soybean AGO5 homolog might participate as a binding partner for some of the CHS siRNAs as it does for phasiRNAs in Arabidopsis (Komiya et al., 2014) and likely also in maize (Zhai et al., 2015). This could explain the lack of silencing in the black saddle region but cannot readily explain the effective silencing of the target CHS7 and CHS8 mRNAs in the yellow sectors of the ii k1 seed coats where the AGO5 protein is terminated prematurely. More likely, the need for functional AGO5 in mediating patterning may be manifested only very early in development of the seed coat either directly in the function of the low abundance primary CHS siRNAs or indirectly by altering pattern development through the action of other miRNAs or siRNAs.

The I and K1 Alleles Reflect an Epistatic Interaction Involving the Small RNA Pathway

The I locus is one of the examples showing how small RNAs determine the dominance of alleles by their trans-acting influence on gene expression. We think understanding the interaction between the ii and k1 alleles, producing the saddle pattern phenotype of soybean seed, can shed light on the molecular mechanism of epistatic interactions through the small RNA pathway. Various molecular mechanisms of epistatic interactions have been discovered in diverse organisms (reviewed in Lehner, 2011). However, few reports have demonstrated that small RNAs are involved in the epistatic interactions between classical gene loci.

How is the expression of CHS siRNAs so precisely regulated within the same seed coat tissue with the same genotype (ii k1) to lead to a clear phenotypic boundary of the saddle color pattern? It is very possible that the boundary represents a larger suite of genes and small RNAs that are undergoing developmental shifts as the cells of the seed coat expand and differentiate, but only the change in CHS mRNAs and siRNAs is reflected as an easily visible phenotype. To address this question, we are currently correlating the global view of mRNA, siRNA, and miRNA populations from the black versus yellow sectors within each genotype as well as across genotypes. In this manner, we may find miRNAs and non-CHS siRNAs that show significant differential expression, which would give us insight into whether other small RNAs are affected that could lead to developmental timing of the sharp boundary between the two interacting ii and k1 alleles. While many of the AGO family members appear to have near equal expression in the black versus yellow regions of seed coats with the same genotype, some of them, including AGO5b, AGO10g, and AGO10d, showed differential expression within the different regions of the seed coat (Figure 7). Differential expression of AGOs between tissue types or in response to stress or pathogens is one of the many factors that influence miRNA biogenesis and function (Jeong et al., 2013).

The mechanism of the dominant alleles of the I locus (I and ii) is due to the unusual structure that generates CHS siRNAs. The sequence of the ii allele is well defined through BAC sequencing (Clough et al., 2004; Tuteja and Vodkin, 2008). Restriction fragment length polymorphism and PCR mapping previously showed that several of the independent, naturally occurring, recessive i mutations that result in complete seed coat pigmentation were deletions in the CHS1-3-4 cluster regions (Todd and Vodkin, 1996; Tuteja et al., 2004). In this article, we demonstrated that the first step in the mechanism of the epistatic k1 mutation is mediated through deficient AGO5, which effectively reduces the quantities of CHS siRNAs in a pattern-specific manner when k1 is in the presence of the dominant ii or I alleles.

The recessive k1 allele with the defective AGO5b on chromosome 11 (Clark 18a) arose as a de novo mutation in the variety Clark. Records indicate it was found in Ames, Iowa in 1956. Since all commercially developed soybean varieties carry the yellow seed coat combination of either I K1 or ii K1, any rare seed arising with fully pigmented or saddle pattern forms is easily visible in the field harvests of inbred lines (Williams, 1945; Wilcox, 1988). The USDA germplasm collection maintains at least 60 of these pigmented lines that were initially observed as de novo mutations in field plots of named varieties dating from 1945 through the present. The lines breed true for the pigmented phenotype and are often assumed to be mutations of either the classical I or K1 loci, although traditional tests of allelism have not been conducted for most of them. However, the basic genetics of interaction of the dominant I alleles with a k1 mutation was delineated by Leonard F. Williams (Williams, 1958) and Richard L. Bernard (Bernard and Weiss, 1973), who served among the earliest soybean geneticists, breeders, and curators of the USDA germplasm collection located at the University of Illinois.

Using these seed resources from the 1960s, we found that two isolines created to demonstrate the epistatic interaction of the k1 allele with the dominant I allele also carry the small AGO5 deletion (Figure 6). This is as expected, since the source of the k1 allele was derived from the Clark 18a mutation. Even more powerful evidence than cosegregation of the molecular variant with the seed phenotype is our finding that three independent black saddle


mutations also display lesions in the AGO5 gene (Figure 5) relative to their progenitor parent lines (two from the variety Lincoln and one from Calland). The types of lesions displayed in the AGO5 gene are primarily from small deletions occurring in exonic regions (ranging from 1–129 nucleotides in size) that generally affect the reading frame and induce premature termination of the protein. Exon skipping was also demonstrated for the Clark 18a variant, deletion of the initiation codon for the Calland mutation, and deletion of only four amino acids for one of the Lincoln mutations. Kurakake varieties are large pod, flat-seeded, and black saddle pattern varieties used in Japan for vegetable (edamame) soybeans. The earliest report of a genetic factor named k was for the black saddle trait in Kurakake that segregated as recessive to a non-saddle variety (Takagi, 1929). Crosses with two of the early spontaneous saddle pattern mutations in named varieties developed in the United States revealed allelism with the Kurakake saddle pattern gene (reviewed in Bernard and Weiss, 1973). As we showed in Figure 5, two Kurakake black saddle varieties also have lesions in the AGO5 gene.

While mutations arising in the ii K1 varieties are easy to spot as they produce either the black or brown saddle pattern, those that occur in an I K1 parent line are near black or brown, which are difficult to distinguish from recessive i mutations that are larger deletions within the CHS1-3-4 regions of the I locus. Thus, we initially concentrated on the saddle pattern mutations rather than those that are possibly near black or near brown. Now that K1 is known to be AGO5, all of the lines could be surveyed by amplicon sequencing to verify de novo k1 mutations in the AGO5 locus. It is also possible that some de novo saddle pattern phenotypes that revert from yellow seed to saddle or completely pigmented may be controlled by loci other than K1 or I. Thus, they would reflect a release of the posttranscriptional silencing imposed by the dominant I or ii alleles. For example, classical tests of allelism have shown that a black saddle mutant derived from an X-rayed Clark line is a different gene, and it was named k3 (Bernard and Weiss, 1973). It will be of interest to determine whether examination of the k3 line or other de novo mutations with saddle phenotypes leads to discovery of different genes involved in controlling the pattern distributions of the CHS siRNAs. The distribution of the CHS siRNAs is likely to represent a larger developmental program than that reflected in the highly visible phenotype of seed color.

In summary, we have shown that spatial regulation of the production of soybean CHS siRNAs in the seed coat is influenced by two genetically defined variants of the I locus (ii and ik) as well as the independent k1 allele using quantitative data from small RNA and mRNA populations of these variant lines. These results illustrate that not only are the CHS siRNAs in the soybean system tissue specific as shown in a large data set of RNA-seq results (Cho et al., 2013), but also that they are exquisitely regulated in a pattern-specific manner during development. The nature of an epistatic interaction between the I locus alleles and the k1 mutation has been shown to operate via chromosome 11 AGO5, which has a role in pattern distribution of CHS siRNAs generated by the I locus on chromosome 8 to downregulate the target mRNAs from the CHS7 and CHS8 genes that reside on chromosomes 1 and 11, respectively. The k1 mutation should provide a way to assess other small RNAs and their target genes and pathways affected by AGO5 malfunction in the k1 allele. Molecular and genomic investigation of additional multiple independent mutations producing saddle pattern phenotypes may reveal additional genes interacting with the small RNA pathway in soybean.

METHODS

Plant Materials and Tissue Collection

The soybean (Glycine max) lines used in this study are inbred and homozygous for the indicated loci. All lines were developed by soybean geneticists and breeders during the 1960s and 1970s and are available from the USDA germplasm collection through GRIN (Germplasm Resources Information Network). The variety Williams (PI 548631), maturity group III, has genotype ii K1 with a black hilum on an otherwise yellow seed coat. The genome of the closely related Williams 82 isoline was recently sequenced (Schmutz et al., 2010) and is used as a reference genome standard in soybean. The saddle pattern isolines are in the background of the cultivar Clark (ii K1), maturity group IV, and have internal lab numbers of Clark 8 and Clark 18a. Clark 8 (PI 547450, L70-4204) has the saddle pattern with genotype ik K1 and results from repetitive backcrossing and selection of the ik allele from Black Eyebrow into Clark. Clark 18a (PI 547439, L67-3479) has the saddle pattern specified by interaction of the ii allele and the k1 mutation originally found in Clark. The PI number is an accession number by which information on the cultivar is searchable in the GRIN database. The L number is a breeding line number and is also searchable. Supplemental Figure 1 is a pedigree summary. Other lines with the saddle pattern mutations that were investigated include Lincoln (UC80, PI548362) and two Lincoln black saddle mutations arising independently in 1945 (UC145, L88-5422, PI634895) and 1954 (UC146, L88-5424, PI634896); Calland (UC82, PI548527), and a Calland black saddle mutant arising in 1970 (UC144, L88-5344, PI634873); and two large-seeded Japanese varieties Kurakake (UC150, PI506948) and Kurakake Daizu (UC151, PI506949), which both have the black saddle phenotype.

Soybean plants were grown in the greenhouse and immature seeds were harvested over the course of several weeks. The seeds were shelled, pooled together, and then sorted by weight range. The seeds were first dissected to separate the seed coat and the cotyledon. To obtain the regions within the seed coat, they then were dissected again to separate the pigmented and the nonpigmented regions of the seed coat with each part placed in different 15-mL tubes. Figure 1B shows an illustration of the dissection method since the pigmentation cannot be observed in the green stages at which the CHS siRNA levels are maximal. The seed coats were frozen in liquid nitrogen for 10 min and stored in the freezer (−80°C) until they were lyophilized. At least 10 samples were used for each extraction in order to minimize biological variation.

RNA and Small RNA Isolation

Total RNA was isolated by standard methods using phenol and chloroform extractions (Wang and Vodkin, 1994) and precipitated with ethanol but without lithium chloride in order to preserve small RNAs. For mRNA isolation, the sample was further purified by lithium chloride precipitation. In the case of the seed coats with pigments, the pigmented regions contain proanthocyanidins that bind to the RNA in the regions of the seed coat that will eventually contain the anthocyanin pigments (Todd and Vodkin, 1993). The modified RNA isolation method that was designed specifically to overcome this problem uses hydrated polyvinylpolypyrrolidone, polyproline, and BSA as competitors for the proanthocyanidin compounds in order to extract high-quality RNA (Wang and Vodkin, 1994; Wang et al., 1994) throughout seed coat development with pigmented genotypes.


Small RNA-Seq and Data Analysis

Small RNA libraries were prepared using the TruSeq Small RNA kit according to the manufacturer’s instructions (Illumina) and sequenced with the Genome Analyzer-II and HiSeq 2000 (Illumina) by the Keck Center (University of Illinois, Urbana, IL) using standard Illumina protocols (https://support.illumina.com). Some of the sequences were barcoded for sequencing within a single lane. Generally, a total of 20 to 80 million reads were obtained from these deep sequencing libraries. Adapter trimming was performed using Illumina’s Flicker pipeline, which locates the adapter in each read by finding the best alignment of the adapter to the read and then removes it. The sizes of the small RNAs after adapter trimming ranged from 14 to 33 nucleotides with the majority in the range of 18 to 25 nucleotides. Adapter trimmed sequences were compared to obtain the number and occurrences of unique sequences. Alignments of siRNA sequences to CHS Glyma models (Phytozome, Joint Genome Institute) from the current assembly of the Williams 82 reference genome (Wm82.a2) of G. max (Schmutz et al., 2010) and to the known CHS genes from BACs from Williams 82 including BAC77G7a, accession EF623858, containing the I locus clusters (Tuteja and Vodkin, 2008; Clough et al., 2004) were performed using the Bowtie v.1 program (Langmead et al., 2009). Supplemental Data Set 2 shows the correspondence of CHS gene names to Glyma models. Alignments were made to individual CHS sequences with no mismatches allowed. Small RNA sequencing data were normalized in reads per million.
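The steps described above (adapter removal, collapsing reads to unique sequences, exact matching to a target, and normalization in reads per million) can be illustrated with a minimal Python sketch. It is not the Flicker or Bowtie implementation: trimming here considers only exact adapter matches, exact alignment is replaced by a plain substring search on both strands, and the adapter constant and all function names are assumptions introduced for this illustration.

```python
from collections import Counter

# Placeholder 3' adapter sequence, assumed only for this illustration.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Remove the adapter and everything after it (exact matches only)."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Otherwise look for a partial adapter at the 3' end of the read.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def collapse(reads):
    """Count the occurrences of each unique trimmed sequence."""
    return Counter(reads)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_exact_hits(unique_counts, target):
    """Sum reads that occur in the target on either strand with no mismatches."""
    rc = reverse_complement(target)
    return sum(
        n for seq, n in unique_counts.items()
        if seq and (seq in target or seq in rc)
    )


def reads_per_million(count, total_reads):
    """Normalize a raw count to reads per million total reads in the library."""
    return count * 1_000_000 / total_reads
```

In this sketch, a raw count for a CHS target would be passed to `reads_per_million` together with the total read number of the library, so that libraries of 20 to 80 million reads can be compared on the same scale.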

mRNA Sequencing Data Analysis

Transcriptome libraries were prepared using the TruSeq RNA sample kit or Stranded TruSeq RNA Sample Kit (Supplemental Data Set 1) per the manufacturer’s instructions (Illumina) and sequenced with the HiSeq 2000 (Illumina) by the Keck Center using the standard Illumina protocol (https://support.illumina.com). A total of 43 to 97 million reads of either 75 or 100 nucleotides were obtained from each of these deep sequencing libraries. Alignments of mRNA sequences to all 88,647 Glyma models, including splice variants, from the Williams 82 reference genome of G. max (Phytozome, Joint Genome Institute) were performed using the Bowtie program v.1 (Langmead et al., 2009). The DESeq package (Anders and Huber, 2010) was used for statistical testing, and differentially expressed genes were selected based on an adjusted P value <0.05 controlled for false discovery (Benjamini and Hochberg, 1995). Independently, the Cufflinks package (Trapnell et al., 2012) was also used to compare differential gene expression. Transcriptome data were also normalized in RPKM, as read counts depend on transcript length (Mortazavi et al., 2008).
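Two of the calculations named above can be written out as a short Python sketch: RPKM normalization (reads per kilobase of transcript per million mapped reads) and the Benjamini and Hochberg adjustment of P values. This illustrates the arithmetic only; it does not reproduce the statistical model that DESeq uses to obtain the P values, and the function names are assumptions.

```python
def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values never increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(gene_ids, p_values, alpha=0.05):
    """Keep genes whose adjusted P value is below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, q in zip(gene_ids, adjusted) if q < alpha]
```

With the threshold used in this study, `select_significant` would be called with `alpha=0.05`; the gene identifiers and P values passed to it would come from the statistical test, not from this sketch.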

DNA Isolation and PCR Amplifications

Soybean genomic DNA was isolated from freeze-dried shoot tips by standard methods using phenol-chloroform precipitation. PCR experiments were performed in a 50-µL reaction mixture containing 1 µg of genomic DNA in 5 µL of PCR buffer, 2.5 mM Mg2+, 0.4 mM deoxynucleotide triphosphate, 2.5 units of Takara EX Taq polymerase, and 0.2 µM forward and reverse primers. The following primers were used to span exon 7 of Glyma.11G190900 with a predicted fragment of 583 bp: BC41F, 5′-GTGAGCAAGTTATGCTATTGTTTCCCTTC-3′, and BC41R, 5′-TAGAGTTTTCTCTATCATGAGGACGCTGA-3′. PCR amplifications were conducted in a PTC-200 programmable thermocycler (MJ Research) via an initial denaturation step at 94°C for 1 min followed by 30 cycles of denaturing at 94°C for 10 s, annealing at 55°C for 1 min, and elongation at 72°C for 1 min, to end with a 2-min extension at 72°C. The amplified reactions were separated on a 0.7% agarose gel and bands stained via GelGreen (Biotium).

Whole-Genome Resequencing and Data Analysis

Whole-genome sequencing library construction and high-throughput sequencing were performed by the Keck Center. The libraries were prepared with Kapa library construction kits (https://www.kapabiosystems.com), quantitated by qPCR, sequenced on one lane for 100 cycles from each end of the fragments on a HiSeq 2000 (Illumina) using a TruSeq SBS sequencing kit version 3, and analyzed with Casava 1.8 (pipeline 1.9). The average size of the DNA fragments was around 500 nucleotides; insert size is 300 nucleotides. Quality check for sequencing data was done by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Alignments were performed using Bowtie2 (Langmead and Salzberg, 2012). We used default alignment and reporting options. Bowtie2 output files, in SAM format, were converted to the BAM format and sorted by Samtools (Li et al., 2009). The Integrative Genomics Viewer (IGV) was used to convert from BAM to TDF format and visualize normalized [count at base × one million/total number of reads] data (Robinson et al., 2011). To call single-nucleotide polymorphisms, we used the Samtools command (mpileup -uf) and converted BCF to VCF format to visualize in the IGV.
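The normalization given in brackets above, and the way a deletion would appear in such normalized coverage, can be sketched in Python as follows. This is an illustrative reading of the formula, not the IGV implementation; the function names and the idea of scanning for zero-coverage runs are assumptions added for this example.

```python
def normalize_coverage(per_base_counts, total_reads):
    """Scale each per-base count to counts per one million total reads."""
    return [count * 1_000_000 / total_reads for count in per_base_counts]


def zero_coverage_runs(coverage, min_length=1):
    """Return half-open (start, end) index pairs of runs with zero coverage."""
    runs = []
    start = None
    for pos, value in enumerate(coverage):
        if value == 0:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                runs.append((start, pos))
            start = None
    # Close a run that extends to the end of the window.
    if start is not None and len(coverage) - start >= min_length:
        runs.append((start, len(coverage)))
    return runs
```

A window with uninterrupted coverage in one line and a run of uncovered bases in another would, under these assumptions, be flagged by `zero_coverage_runs` as a candidate deletion to inspect in the viewer.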

Amplicon Sequencing and Analyses

A 6.2-kb amplicon containing the entire region of Glyma.11G190900 plus 196 bases of the region upstream of the 5′ UTR was amplified from nine different soybean parent lines or mutations containing either K1 or k1 genotypes using the following primers: forward primer, 5′-TTCACCACACTACTCAAGAT-3′, and reverse primer, 5′-ATTAATATAACAAGCTGACG-3′. PCR amplifications were performed as above in Corning Axygen thin walled tubes in a PTC-200 DNA Engine from MJ Research using modified reaction times as follows: initial denaturation step at 96°C for 2 min followed by 39 cycles of denaturing at 96°C for 20 s, annealing at 55°C for 1 min, and elongation at 72°C for 4 min, to end with a 7-min extension at 72°C. For each soybean genotype, four reactions were pooled and purified with a Zymo DNA Clean and Concentrator kit and the concentration adjusted to between 40 and 64 ng/µL. A total of 35 µL was submitted for barcoding of PCR reactions, amplicon sequencing with the Illumina MiSeq, and automated assembly by the Center for Computational and Integrative Biology DNA Core Facility at Massachusetts General Hospital (Cambridge, MA). Coverage of each amplicon ranged from 69,609 to 191,356 reads, and representation of each base in an assembled contig sequence was generally >1000 with the great majority being perfect matches to the consensus contig. Amplicon sequences were aligned to the gene, transcript, or protein sequences of Glyma.11G190900 AGO5 using Multalin (Corpet, 1988).

Accession Numbers

The data have been entered into Gene Expression Omnibus under superseries GSE43347 for the 16 small RNA samples comparing the patterned seed coats and series GSE89126 for the 12 RNA-seq samples of PI547450 (Clark 8 ik K1) and PI547439 (Clark 18a ii k1), respectively, and SRP092073 and SRP092133 for the paired-end genomic sequencing of PI548533 Clark (ii K1) and PI547439 (Clark 18a, ii k1), respectively. Amplicon sequences were entered in GenBank as KY621334 for Clark 18a (ii k1) and KY631911 to KY631917 and KY785328 for other K1 and k1 lines as shown in Supplemental Data Set 5. Accession numbers for CHS genes can be found in Supplemental Data Set 2. Sequences for Glyma.11G190900 (AGO5b) and Glyma.12G083500 (AGO5a) can be found in Phytozome at https://phytozome.jgi.doe.gov.

Supplemental Data

Supplemental Figure 1. The Pedigree of Clark Isolines That Show the Saddle Pattern Phenotype.


Supplemental Figure 2. Cufflinks Plots Comparing RNA-Seq Data from Clark (ii K1) and Clark 18a (ii k1).

Supplemental Figure 3. Mapping Information Resources at Soybase for the K1 Locus.

Supplemental Figure 4. Alignment of Nine AGO5 Proteins Derived from Amplicon Sequences Comparing K1 and k1 Lines.

Supplemental Figure 5. Alignment of Two Closely Related Soybean AGO5 Coding Sequence Transcripts.

Supplemental Figure 6. Phylogenetic Tree for Soybean Argonaute Genes.

Supplemental Data Set 1. Summary of Sequencing Data Generated for Small RNAs, mRNAs, and Genomic DNAs.

Supplemental Data Set 2. Small RNA Levels of 21 CHS Glyma Models Comparing Tissues from Pattern Genotypes as Shown in Figures 3 and 4.

Supplemental Data Set 3. RNA-Seq Levels of 21 CHS Glyma Models Comparing Tissues from the Pattern Genotypes of Clark 8 (ik K1) and Clark 18a (ii k1).

Supplemental Data Set 4. Glyma Models with Differential RNA-Seq Expression on Chromosome 11 in the Region of a Putative Marker near the K1 Locus.

Supplemental Data Set 5. Summary of AGO Amplicon Sequencing Results for Standard K1 and Mutant k1 Lines and Example Output.

Supplemental Data Set 6. Expression from RNA-Seq Data for 20 AGO Genes within Different Regions of the Saddle Pattern Seed Coats Having the Same Genotypes.

Supplemental File 1. Fasta File of the Alignment Used to Generate the Phylogenetic Tree in Supplemental Figure 6.

ACKNOWLEDGMENTS

We thank Alvaro Hernandez and staff of the University of Illinois High-Throughput Sequencing Unit of the Biotechnology Center for the small RNA, RNA-seq, and genome resequencing services. We thank Nicole Stang-Thomann and the staff of the Center for Computational and Integrative Biology (CCIB) at Massachusetts General Hospital for the use of the CCIB DNA Core Facility (Cambridge, MA), which provided amplicon sequencing services. We thank undergraduate and academic assistants Drew Metz, Achira Kulasekara, and Berwin Xie for help with data handling and Python scripting. We acknowledge seed obtained from the USDA Soybean Germplasm Collections (Urbana, IL) available from the Germplasm Resources Information Network (GRIN). The research was supported by grants from the United Soybean Board, the USDA, and the Illinois Soybean Association.

AUTHOR CONTRIBUTIONS

Y.B.C. and S.I.J. collected tissue samples, performed experiments, analyzed and interpreted data, and drafted results. L.O.V. designed initial approach, led and coordinated the project, interpreted data, drafted sections, and edited the manuscript. All authors read and approved the manuscript.

Received February 26, 2017; revised March 28, 2017; accepted March 28, 2017; published March 28, 2017.

REFERENCES

Anders, S., and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11: R106.

Arikit, S., Xia, R., Kakrana, A., Huang, K., Zhai, J., Yan, Z., Valdés-López, O., Prince, S., Musket, T.A., Nguyen, H.T., Stacey, G., and Meyers, B.C. (2014). An atlas of soybean small RNAs identifies phased siRNAs from hundreds of coding genes. Plant Cell 26: 4584–4601.

Baulcombe, D. (2004). RNA silencing in plants. Nature 431: 356–363.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to statistical testing. J. R. Stat. Soc. B 57: 289–300.

Bernard, R.L., and Weiss, M.G. (1973). Qualitative genetics. In Soybeans: Improvement, Production, and Uses, 1st ed, B.E. Caldwell, ed (Madison, WI: American Society of Agronomy), pp. 117–149.

Borges, F., Pereira, P.A., Slotkin, R.K., Martienssen, R.A., and Becker, J.D. (2011). MicroRNA activity in the Arabidopsis male germline. J. Exp. Bot. 62: 1611–1620.

Borges, F., and Martienssen, R.A. (2015). The expanding world of small RNAs in plants. Nat. Rev. Mol. Cell Biol. 16: 727–741.

Carlsbecker, A., et al. (2010). Cell signalling by microRNA165/6 directs gene dose-dependent root cell fate. Nature 465: 316–321.

Chapman, E.J., and Carrington, J.C. (2007). Specialization and evolution of endogenous small RNA pathways. Nat. Rev. Genet. 8: 884–896.

Chen, H.-M., Chen, L.T., Patel, K., Li, Y.H., Baulcombe, D.C., and Wu, S.H. (2010). 22-Nucleotide RNAs trigger secondary siRNA biogenesis in plants. Proc. Natl. Acad. Sci. USA 107: 15269–15274.

Cho, Y.B., Jones, S.I., and Vodkin, L. (2013). The transition from primary siRNAs to amplified secondary siRNAs that regulate chalcone synthase during development of Glycine max seed coats. PLoS One 8: e76954.

Clough, S.J., Tuteja, J.H., Li, M., Marek, L.F., Shoemaker, R.C., and Vodkin, L.O. (2004). Features of a 103-kb gene-rich region in soybean include an inverted perfect repeat cluster of CHS genes comprising the I locus. Genome 47: 819–831.

Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16: 10881–10890.

Cuperus, J.T., Carbonell, A., Fahlgren, N., Garcia-Ruiz, H., Burke, R.T., Takeda, A., Sullivan, C.M., Gilbert, S.D., Montgomery, T.A., and Carrington, J.C. (2010). Unique functionality of 22-nt miRNAs in triggering RDR6-dependent siRNA biogenesis from target transcripts in Arabidopsis. Nat. Struct. Mol. Biol. 17: 997–1003.

Curtin, S.J., et al. (2015). MicroRNA maturation and microRNA target gene expression regulation are severely disrupted in soybean dicer-like1 double mutants. G3 (Bethesda) 6: 423–433.

Della Vedova, C.B., Lorbiecke, R., Kirsch, H., Schulte, M.B., Scheets, K., Borchert, L.M., Scheffler, B.E., Wienand, U., Cone, K.C., and Birchler, J.A. (2005). The dominant inhibitory chalcone synthase allele C2-Idf (inhibitor diffuse) from Zea mays (L.) acts via an endogenous RNA silencing mechanism. Genetics 170: 1989–2002.

De Paoli, E., Dorantes-Acosta, A., Zhai, J., Accerbi, M., Jeong, D.-H., Park, S., Meyers, B.C., Jorgensen, R.A., and Green, P.J. (2009). Distinct extremely abundant siRNAs associated with cosuppression in petunia. RNA 15: 1965–1970.

Diaz-Pendon, J.A., Li, F., Li, W.X., and Ding, S.W. (2007). Suppression of antiviral silencing by cucumber mosaic virus 2b protein in Arabidopsis is associated with drastically reduced accumulation of three classes of viral small interfering RNAs. Plant Cell 19: 2053–2063.


Eckardt, N.A. (2009). Tissue-specific siRNAs that silence CHS genes in soybean. Plant Cell 21: 2983–2984.

Fang, X., and Qi, Y. (2016). RNAi in plants: an Argonaute-centered view. Plant Cell 28: 272–285.

Fei, Q., Xia, R., and Meyers, B.C. (2013). Phased, secondary, small interfering RNAs in posttranscriptional regulatory networks. Plant Cell 25: 2400–2415.

Fusaro, A.F., Matthew, L., Smith, N.A., Curtin, S.J., Dedic-Hagan, J., Ellacott, G.A., Watson, J.M., Wang, M.B., Brosnan, C., Carroll, B.J., and Waterhouse, P.M. (2006). RNA interference-inducing hairpin RNAs in plants act through the viral defence pathway. EMBO Rep. 7: 1168–1175.

Gillman, J.D., Tetlow, A., Lee, J.-D., Shannon, J.G., and Bilyeu, K. (2011). Loss-of-function mutations affecting a specific Glycine max R2R3 MYB transcription factor result in brown hilum and brown seed coats. BMC Plant Biol. 11: 155.

Grant, D., Nelson, R.T., Cannon, S.B., and Shoemaker, R.C. (2010). SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res. 38 (suppl. 1): D843–D846.

Jeong, D.-H., Thatcher, S.R., Brown, R.S., Zhai, J., Park, S., Rymarquis, L.A., Meyers, B.C., and Green, P.J. (2013). Comprehensive investigation of microRNAs enhanced by analysis of sequence variants, expression patterns, ARGONAUTE loading, and target cleavage. Plant Physiol. 162: 1225–1245.

Kasai, A., Ohnishi, S., Yamazaki, H., Funatsuki, H., Kurauchi, T., Matsumoto, T., Yumoto, S., and Senda, M. (2009). Molecular mechanism of seed coat discoloration induced by low temperature in yellow soybean. Plant Cell Physiol. 50: 1090–1098.

Komiya, R., Ohyanagi, H., Niihama, M., Watanabe, T., Nakano, M., Kurata, N., and Nonomura, K. (2014). Rice germline-specific Argonaute MEL1 protein binds to phasiRNAs generated from more than 700 lincRNAs. Plant J. 78: 385–397.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10: R25.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357–359.

Lehner, B. (2011). Molecular mechanisms of epistasis within and between genes. Trends Genet. 27: 323–331.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R.; 1000 Genome Project Data Processing Subgroup (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.

Liu, X., Lu, T., Dou, Y., Yu, B., and Zhang, C. (2014). Identification of RNA silencing components in soybean and sorghum. BMC Bioinformatics 15: 4.

Mallory, A.C., Reinhart, B.J., Bartel, D., Vance, V.B., and Bowman, L.H. (2002). A viral suppressor of RNA silencing differentially regulates the accumulation of short interfering RNAs and micro-RNAs in tobacco. Proc. Natl. Acad. Sci. USA 99: 15228–15233.

Matzke, M.A., and Birchler, J.A. (2005). RNAi-mediated pathways in the nucleus. Nat. Rev. Genet. 6: 24–35.

Mi, S., et al. (2008). Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133: 116–127.

Morita, Y., Saito, R., Ban, Y., Tanikawa, N., Kuchitsu, K., Ando, T., Yoshikawa, M., Habu, Y., Ozeki, Y., and Nakayama, M. (2012). Tandemly arranged chalcone synthase A genes contribute to the spatially regulated expression of siRNA and the natural bicolor floral phenotype in Petunia hybrida. Plant J. 70: 739–749.

Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5: 621–628.

Nagamatsu, A., Masuta, C., Senda, M., Matsuura, H., Kasai, A., Hong, J.-S., Kitamura, K., Abe, J., and Kanazawa, A. (2007). Functional analysis of soybean genes involved in flavonoid biosynthesis by virus-induced gene silencing. Plant Biotechnol. J. 5: 778–790.

Napoli, C., Lemieux, C., and Jorgensen, R. (1990). Introduction of a chimeric chalcone synthase gene into petunia results in reversible co-suppression of homologous genes in trans. Plant Cell 2: 279–289.

Palmer, R.G., and Kilen, T.C. (1987). Qualitative genetics and cytogenetics. In Soybeans: Improvement, Production and Uses, 2nd ed, J.R. Wilcox, ed (Madison, WI: American Society of Agronomy), pp. 135–209.

Palmer, R.G., Pfeiffer, T.W., Buss, G.R., and Kilen, T.C. (2004). Qualitative genetics. In Soybeans: Improvement, Production and Uses, H.G. Boerma and J.E. Specht, eds (Madison, WI: American Society of Agronomy), pp. 137–233.

Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29: 24–26.

Schmutz, J., et al. (2010). Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183.

Senda, M., Masuta, C., Ohnishi, S., Goto, K., Kasai, A., Sano, T., Hong, J.S., and MacFarlane, S. (2004). Patterning of virus-infected Glycine max seed coat is associated with suppression of endogenous silencing of chalcone synthase genes. Plant Cell 16: 807–818.

Takagi, F. (1929). On the inheritance of some characters in Glycine soja (Bentham) (soybean). Sci. Rep. Tohuku Univ. Ser. 4: 577–589.

Todd, J.J., and Vodkin, L.O. (1993). Pigmented soybean (Glycine max) seed coats accumulate proanthocyanidins during development. Plant Physiol. 102: 663–670.

Todd, J.J., and Vodkin, L.O. (1996). Duplications that suppress and deletions that restore expression from a chalcone synthase multigene family. Plant Cell 8: 687–699.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7: 562–578.

Tucker, M.R., Okada, T., Hu, Y., Scholefield, A., Taylor, J.M., and Koltunow, A.M. (2012). Somatic small RNA pathways promote the mitotic events of megagametogenesis during female reproductive development in Arabidopsis. Development 139: 1399–1404.

Tuteja, J.H., Clough, S.J., Chan, W.C., and Vodkin, L.O. (2004). Tissue-specific gene silencing mediated by a naturally occurring chalcone synthase gene cluster in Glycine max. Plant Cell 16: 819–835.

Tuteja, J.H., and Vodkin, L.O. (2008). Structural features of the endogenous CHS silencing and target loci in the soybean genome. Crop Sci. 48: 49–69.

Tuteja, J.H., Zabala, G., Varala, K., Hudson, M., and Vodkin, L.O. (2009). Endogenous, tissue-specific short interfering RNAs silence the chalcone synthase gene family in Glycine max seed coats. Plant Cell 21: 3063–3077.

van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N.M., and Stuitje, A.R. (1990). Flavonoid genes in petunia: addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell 2: 291–299.

Wang, C.S., and Vodkin, L.O. (1994). Extraction of RNA from tissues that contain high levels of procyanidins that bind RNA. Plant Mol. Biol. Report. 12: 132–145.

Wang, C.S., Todd, J.J., and Vodkin, L.O. (1994). Chalcone synthase mRNA and activity are reduced in yellow soybean seed coats with dominant I alleles. Plant Physiol. 105: 739–748.

Wilcox, J.R. (1988). Performance and use of seed coat mutants in soybean. Crop Sci. 28: 30–32.


Williams, L.F. (1945). Off-colored seeds in the Lincoln soybean. Soybean Digest 5: 50–61.

Williams, L.F. (1952). The inheritance of certain black and brown pigments in the soybean. Genetics 37: 208–215.

Williams, L.F. (1958). Alteration of dominance and apparent change in direction of gene action by a mutation at another locus affecting the pigmentation of the seed coat of the soybean. Proc. Intl. Natl. Cong. Genet. 10: 315–316.

Zabala, G., and Vodkin, L.O. (2014). Methylation affects transposition and splicing of a large CACTA transposon from a MYB transcription factor regulating anthocyanin synthase genes in soybean seed coats. PLoS One 9: e111959.

Zhai, J., Zhang, H., Arikit, S., Huang, K., Nan, G.-L., Walbot, V., and Meyers, B.C. (2015). Spatiotemporally dynamic, cell-type-dependent premeiotic and meiotic phasiRNAs in maize anthers. Proc. Natl. Acad. Sci. USA 112: 3146–3151.


Young B. Cho, Sarah I. Jones, and Lila O. Vodkin. Mutations in Argonaute5 Illuminate Epistatic Interactions of the K1 and I Loci Leading to Saddle Seed Color Patterns in Glycine max. Plant Cell 2017;29;708-725; originally published online March 28, 2017; DOI 10.1105/tpc.17.00162
