
The crown-of-thorns starfish genome as a guide for biocontrol of this coral

reef pest

Supplementary Notes

1. Background information on the crown-of-thorns starfish

2. Genome and transcriptome sequencing, assembly and annotation

3. Comparison of GBR and OKI genomes

4. Phylogenomic and population genomic analyses

5. Protein domain analyses

6. Tissue gene expression analyses

7. Exoproteome analyses

8. Identification and analysis of ependymin-related genes

9. Identification and analysis of GPCRs

10. References

WWW.NATURE.COM/NATURE | 1

SUPPLEMENTARY INFORMATION doi:10.1038/nature22033

1. Background information on the crown-of-thorns starfish

The crown-of-thorns starfish (COTS), Acanthaster planci, is a corallivorous, predatory asteroid

echinoderm (Class Asteroidea, Order Valvatida) and part of the native biodiversity of coral

reef ecosystems throughout the Indo-Pacific. When population densities are low their

predation rate has little noticeable impact on coral cover (Pratchett et al. 2014). However,

the species exhibits dramatic fluctuations in population density with outbreak populations

numbering hundreds of thousands to several millions with densities of >150,000 COTS per km²

(Kayal et al. 2012). Outbreak events typically lead to a significant loss of scleractinian coral

cover, as well as changes to overall coral reef biodiversity (Porter 1972, Bouchon 1985,

Pratchett 2010). COTS outbreaks were reported 82 times prior to 1990 across the Indo-

Pacific but since then have been noted at least 246 times and in the most severe cases more

than 96% of coral cover can be consumed (Moran 1986, Birkeland and Lucas 1990, Pratchett

et al. 2014).

Outbreaks of COTS continue to occur across the Indo-Pacific, from the coast of South Africa

to the Gulf of California, and are accentuating the degradation of coral reefs due to the

cumulative effects of other regional and global stressors (Adjeroud et al. 2009; Barham et

al. 1973; Cameron et al. 1991; Pearson 1981; Schleyer and Celliers 2003). Whereas

numerous stressors have been identified as drivers of coral cover loss throughout their

geographical range, COTS have been responsible for far more loss of coral in many areas of

the Indo-Pacific than attributed to any other cause with the exception of the sweeping

indiscriminate destructive high energy impacts of cyclones (Osborne et al. 2011, Sweatman

et al. 2011; De’ath et al. 2012, Kayal et al. 2012; Riegl et al. 2013, Gouezo et al. 2015, Mori

et al. 2015). The effects of COTS outbreaks can result in a dramatic decrease of live coral

cover ranging from 37% to over 99% (De’ath et al. 2012; Yasuda et al. 2009; Riegl et al.

2013; Hock et al. 2014). Recently it has been shown that coral reefs with high densities of

COTS are potentially more susceptible to climate change and storm activity (Baird et al.

2013). Recovery from COTS outbreak events requires 10-20 years, such that

repetitive occurrences lead to decreased resilience of coral reefs, and mitigation strategies


are increasingly considered necessary to control this species when in pest phase (Connell et

al. 2015).

Most starfish are carnivores and can occupy the top trophic level in benthic communities.

Several species are classified as keystone predators as they are consumers that have a

disproportionately large effect on benthic communities and ecosystems and COTS are no

exception (Menge and Freidenburg 2001; Menge and Sanford 2013; Pratchett 2010; Rilov

and Shiel 2011). Several biological attributes predispose COTS to warrant their keystone

species status (Stump 1990, Lawrence 2013). COTS are corallivores with a virtually unlimited

supply of pasture-like coral fields on extensive expansions of reef flats upon which to graze

and reefs that have high coral cover are particularly rich feeding grounds. The feeding mode

of COTS, via external digestion of coral flesh, includes extrusion of their stomach (under

neuroendocrine control) with a surface area up to 10-fold greater than other coral reef

starfish such as Linckia and Culcita (Semmens et al. 2013). With the ability to consume up to

7 to 10 m2 of coral flesh per year, the assimilation of nutritious coral flesh is highly efficient

with rapid somatic growth and gonadal development (Birkeland and Lucas 1990, Pratchett

et al. 2014). The tube feet are major organs of respiration in starfish but are significantly

covered by the extruded stomach during feeding (Yamaguchi 1975). To partially counteract

this, COTS are covered with extensive respiratory papillae on their dorsal surface, which allow

high oxygen consumption during feeding and digesting (Farmanfarmaian 1966, Cole and

Burggren 1981). Predation on adult COTS is minimized as they are adorned with numerous

sharp articulated calcareous spines overlaid by epidermal tissue coated with toxin-laden

mucus; puncture wounds from these spines result in envenomation with toxins, such as

plancitoxins, that have diverse biological effects and can be lethal (Shiomi et al. 2004, Dong et al.

2011, Savitri et al. 2011). In addition, spinal glands can release saponins into the immediate

water column which can traverse fish gills causing haemorrhagic activity, respiratory

distress and death (Yasumoto et al. 1964, Hostettmann and Marston 1995). Furthermore, a

tendency to form large aggregations provides considerable barriers to many more

persistent predators. COTS are also able to combat a degree of predation with an ability

common to the echinoderms, and one which they retain throughout life, which is to

regenerate their tissue when lost via a predatory attack, or indeed culling by slicing animals

in half, i.e. these events do not necessarily result in death (Messmer et al. 2013).


COTS have extreme fecundity, with disproportionate increases with increasing size; a 30 cm

diameter female is capable of carrying 15 million eggs whereas a 50 cm one can release 120

million (Conand 1984; Kettle and Lucas 1987). With fecundity tightly scaled to body size,

large COTS are remarkably prolific as well as having the highest recorded external

fertilization rates reported for marine invertebrates (Benzie et al. 1994). This dioecious

species forms particularly pronounced pre-spawning aggregations with synchronous

spawning often observed. Aggregations can result in the synchronized release of billions of

gametes, and if coupled to high successful fertilization, can result in a next generation

settlement recruitment of many millions of individuals (Babcock et al. 1994). Moreover, the

eggs themselves are toxic as are the larvae containing pronounced levels of saponins that

deter many coral reef fish predators (Yamaguchi 1973, 1977; Lucas et al. 1979; Gladstone

1992). With a highly elastic planktonic phase, being as short as 9-12 days or as long as many

months, larval dispersal can result in highly localised recruitment, or wide dispersal across

open oceans that populate distant reefs (Yamaguchi 1977, Timmers et al. 2012, Nakamura

et al. 2014). In short, COTS possess all the biological prerequisites of

an outbreak species. Extreme population fluctuations, and hence outbreaks, are part and

parcel of this species. Outbreaks will occur and the challenge is to identify potential control

points that can mitigate them when they inevitably do take place.

It is generally accepted that outbreaks of COTS are one of the few disturbances on coral

reefs that are amenable to pest control management by direct intervention (Tables S1.1 and

S1.2). Integrated pest control deploys a variety of actions to ensure favourable ecological

and economic consequences (Babendreier 2007). The use of either chemical or biological

control has proven successful against a variety of terrestrial pests over large areas (Van

Lenteren 1988). Pest control targets specific life stages of the animal and in COTS this could

include the 10-40 day larval phase, the 6-month coralline algae feeding juvenile stage and

the 1-7 year coral eating adult phase, including pre-spawning aggregations (Birkeland and

Lucas 1990). Of these, the only one readily amenable to pest control has been upon adults

(Fraser et al. 2000). For COTS, control approaches can be broadly classified as mechanical,

including hand-picking, lethal injection and barriers (Rivera-Posada and Pratchett 2012).

Physical removal of individual adults and disposal by burial ashore managed to remove


220,000 COTS in 1957 during an outbreak on the Miyako Islands, Japan (Table S1.1; Moran

1986). However, the majority of control measures have been through chemical control,

which was first used against starfish in 1936 (Galtsoff and Loosanoff 1939). For over half a

century the primary method against COTS was through lethal injection with a range of

chemical substances with at least 100 reported mitigation programs culling approximately

16 million COTS (Table S1.1; Rivera-Posada and Pratchett 2012). Although lethal injection

has been in use for over 50 years the only advances have been in the efficiency of the toxic

substance used, which has included sodium bisulfate, sodium hypochlorite, ammonia,

ammonium hydroxide, acetic acid, formaldehyde and copper sulfate (Table S1.2; Birkeland

and Lucas 1990; Pratchett et al. 2014). Many of these substances have unintentional

detrimental impacts due to their generalised toxicity on the immediate environment. These

chemicals were administered in solution through hypodermic syringe injections requiring

multiple injections to kill an adult. Advances in this approach have been extremely limited

and restricted to an efficiency improvement; previously multiple injections were required to

kill an animal whereas single shot methods have been developed to include ox bile and lime

juice as the toxic agents (Rivera-Posada et al. 2014; Moutardier et al. 2015).

For pest control management there is a strong case to examine other potentially more

effective avenues and in particular to preferentially discover species-specific vulnerabilities

to develop a more efficient and large scale control technology against COTS. For example,

identification of insect neuropeptide receptors and their guanine nucleotide-binding (G)

protein-coupled receptor (GPCR) signalling systems is leading to the development of new

generation species-specific pest control (Caers et al. 2012, Cohen 2014, Audsley and Down

2015). For equivalent approaches to be developed against COTS there is the requirement for

a thorough knowledge of its genome. In order to gain insight into possible genetic

mechanisms involved in rapid and recent population expansion, we sequenced, annotated,

and compared the genomes of two individual COTS specimens, one from the Great Barrier

Reef (‘GBR’) and the other from OKInawa (‘OKI’).

1.1 Control program and methods

For more than 50 years mitigation efforts have been used to limit or prevent the impact of

mass population influx of COTS onto coral reefs (Birkeland and Lucas 1990). The ultimate


objective of these control programs has been to prevent coral mortality from going beyond

acceptable limits. Over 120 culling efforts have been reported since 1957 with over 17.6

million starfish being eliminated (Table S1.1). The methods applied to cull COTS have been,

and remain, limited to either physical removal or injection with a solution of a toxic

chemical by divers. These established methods all involve the treatment of each individual

starfish. The tendency for COTS to be concealed during daylight hours and reinfest through

immigration from neighbouring areas requires systematic and repeated sweeps over each

area being controlled. Such efforts are very labour intensive, and potentially exorbitantly

expensive, and by their very nature limited to local scales.

Manual collection followed by disposal ashore is the most common technique used in the

Pacific. Chopping up starfish in situ has also been deployed but the animal must be

quartered, at a minimum, through the central disc and nerves to prevent regeneration

(Messmer et al. 2013). An important operational parameter for efficiency is time, and

handling and chopping a starfish into pieces is not optimal. Air injection of COTS has been

trialled, with the resultant floating starfish collected from the surface but maximal

harvesting rates are only 21 starfish per hour (Kenchington and Pearson 1981). Both these

methods entail a high risk of injury as handling exposes the operator to puncture wounds

from the numerous toxic spines of COTS (Lee et al. 2013). The use of lethal injections, with

an extension arm attached to syringe reservoirs, improves protection to the operator and

this approach is being increasingly used in culling programs (Rivera-Posada and Pratchett

2012). Whereas culling efficiency is approximately 38 COTS per hour through physical

removal, it increases to 132 COTS per hour when lethal injection is used (Kenchington and

Pearson 1981). A wide variety of toxic chemicals has been either tested or deployed in

culling programs (Table S1.2). Toxicity action ranges from compounds which disrupt acid-

base metabolism, from very low to very high pKas (i.e. 1.9 for sodium bisulphate to 9.25 for

ammonia), to heavy metal poisoning (i.e. copper), to fixatives (i.e. formalin), to detergents

(i.e. bile salts). Some of these compounds require high concentrations and multiple

injections, decreasing culling rate efficiency. Others may induce autotomy and not outright

death. Recently, ox gall (ox bile), which acts as a detergent, has been used in culling programs on the Great

Barrier Reef and has proven lethal with a single injection, with a resultant improvement of culling

efficiency to +211 COTS per hour (Rivera-Posada et al. 2014). Although there have been


marginal improvements in efficiencies, all of these methods require treatment of COTS one-

by-one, such that efficiency must plateau, although further improvements may be made through the

use of autonomous vehicles (Takemura et al. 2015). Nevertheless, there are clearly benefits

to be made if genomic technologies can be developed with integrated pest management

strategies to improve culling efficiency.

Additional Supplementary Tables

Table S1.1. History of COTS control programs, numbers culled and method deployed*

* Original references are quoted in the major reviews of Randall 1972, Cheney 1973, Endean and

Chesher 1973, Yamaguchi 1986, Zann and Weaver 1988, Birkeland and Lucas 1990 and Rivera-

Posada and Pratchett 2012.

Table S1.2. Chemicals used for lethal injection in COTS control programs


2. Genome and transcriptome sequencing, assembly and annotation

2.1 Genome sequencing and assembly

Sequencing statistics, genome size estimates, assembly, heterozygosity estimates, gene

model, and transcriptome results for the GBR and OKI genomes are summarized in

Extended Data Table 1 and Fig. S2.1.

Paired-end assemblies yielded 17,868 and 17,265 contigs with N50s of 54,939 and 54,788

bp for GBR and OKI, respectively (Extended Data Table 1), and scaffolding with mate-pair sequencing data

(Boetzer et al. 2011) resulted in final assemblies of 383.5 and 383.8 Mb in 3,274 and 1,765

scaffolds with N50s of 917 and 1,521 kb for the GBR and OKI genomes, respectively (Extended

Data Table 1). Both genomes have a GC content of 41.3%.
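
For readers less familiar with the N50 statistic quoted above, the following minimal sketch shows how it is computed from a list of contig or scaffold lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (kb), for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness; it is best read alongside the scaffold and contig counts in Table S2.1.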

A comparison of the COTS genomes with other marine invertebrate deuterostome genomes

suggests that they are of relatively high quality (Table S2.1). This may be because of the low

heterozygosity within each genome; 0.88 and 0.92% within GBR and OKI genomes

(Extended Data Table 1).

2.2 Genome size estimation

Genome size was estimated in three ways: (i) k-mer analysis (Chapman et al. 2011) of raw reads indicated a genome size of 441 Mb for GBR

and 421 Mb for OKI (Table S2.1); (ii) flow cytometry estimates for the OKI genome were 480

Mb; and (iii) scaffolded assemblies for both GBR and OKI equalled 384 Mb (Extended Data

Fig. 2). For the k-mer based genome size estimates, the optimal k-mer length for subsequent

analysis was determined to be 17 (Chikhi & Medvedev 2014; Extended Data Fig. 2).
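
The idea behind a k-mer based size estimate can be sketched as follows: the total number of k-mers observed in the reads is divided by the depth at the main peak of the k-mer depth histogram. This is a simplified illustration, not the procedure of the cited tools; the histogram values, the `min_depth` error cutoff and the function name are all hypothetical, and a heterozygous genome would additionally require handling of the half-depth peak.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cutoff is an assumption and would be chosen from the real data).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the single-copy peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the usable part of the histogram.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth


# Toy histogram: a low-depth error tail plus a peak at depth 30.
toy_histogram = {1: 1000, 2: 300, 28: 50, 29: 100, 30: 200, 31: 100, 32: 50}
print(estimate_genome_size(toy_histogram))
```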

2.3 Transcriptome sequencing and assembly

A summary of the tissue transcriptomes generated from GBR and OKI used in this study is

shown in Table S2.2 and histograms of Tuxedo genome-guided transcript expression are

shown in Fig. S2.2.


2.4 Gene modelling and annotation

Gene models were predicted using Augustus and PASA, and final gene models were

evaluated in EVidenceModeler (EVM; Fig. S2.1; Haas et al. 2008). The weights (importance)

assigned to each type of evidence in EVM are shown in Table S2.3. Based on this approach, we

predict 24,747 genes in GBR and 24,323 genes in OKI (Extended Data Table 1). The A. planci

genome can be found on NCBI as BioProject PRJDB3175 (Table S2.4).

Fig. S2.1. COTS Genome Assembly and Annotation Pipeline. This figure summarizes the methods

used to sequence (in blue), assemble (in black), and annotate (purple and orange) two A. planci

genomes, OKI (red) and GBR (green), in parallel.


Table S2.1. Marine invertebrate deuterostome genomes (adapted largely from Cameron et al. 2015)

| Species name, genome version | Phylum | GenBank accession | Length (Mb) | Scaffold number | Scaffold N50 (kb) | Contig number | Contig N50 (kb) | GC (%) | Genes (#) | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| Acanthaster planci (COTS), GBR-v1.0 | Echinodermata | - | 383 | 3274 | 916 | 25012 | 35.7 | 41.31 | 24747 | this report |
| Acanthaster planci (COTS), OKI-v1.0 | Echinodermata | - | 383 | 1765 | 1521 | 24314 | 36.3 | 41.3 | 24323 | this report |
| Patiria miniata, v1.0 | Echinodermata | GCA_000285935.1 | 811 | 60183 | 53 | 179756 | 9.4 | 40.2 | 29697 | Cameron et al. (2015) |
| Strongylocentrotus purpuratus, v4.0 | Echinodermata | GCA_000002235.3 | 1032 | 31879 | 431 | 140454 | 17.6 | 38.3 | 31871 | Cameron et al. (2015) |
| Lytechinus variegatus, v2.0 | Echinodermata | GCA_000239495.2 | 1061 | 322936 | 46 | 481804 | 9.7 | 36.4 | 28204 | Cameron et al. (2015) |
| Saccoglossus kowalevskii, v1.1 | Hemichordata | - | 758 | 7282 | 552 | 20913 | 89 | 38 | 34239 | Simakov et al. (2015) |
| Ptychodera flava, v0.6 | Hemichordata | - | 1229 | 218255 | 196 | 322077 | 7.6 | 37 | 34647 | Simakov et al. (2015) |
| Ciona intestinalis, vKH | Chordata | GCA_000224145.2 | 115 | 1280 | 3102 | 6381 | 37 | 36.02 | 14983 | Satou et al. (2008) |
| Ciona savignyi | Chordata | GCA_000149265.1 | 587 | 34009 | 601 | 74923 | 23 | 37.1 | - | Small et al. (2007) |
| Botryllus schlosseri | Chordata | GCA_000444245.1 | 580 | 120139 | 7 | 130124 | 7 | 40.6 | - | Voskoboynik et al. (2013) |
| Oikopleura dioica | Chordata | GCA_000209535.1 | 70 | 4196 | 22 | 6678 | 11 | 39.9 | 13505 | Denoeud et al. (2010) |
| Branchiostoma floridae, v2.0 | Chordata | GCA_000003815.1 | 522 | 398 | 2587 | 41927 | 28 | 41.2 | 28627 | Putnam et al. (2008) |


Table S2.2. Summary of Acanthaster planci transcriptomes

| Source | Tissue | Trinity (de novo): Genes (#) | Trinity (de novo): Isoforms (#) | Trinity (de novo): Contig N50 | Trinity (de novo): GC (%) | Tuxedo (genome guided): Genes (#) | Tuxedo (genome guided): Isoforms (#) | Tuxedo (genome guided): Aligned/paired reads (%) |
|---|---|---|---|---|---|---|---|---|
| GBR | Testis* | 103915 | 193591 | 3440 | 44.22 | 27819 | 35469 | 78.3 |
| GBR | Podia* | 96841 | 153629 | 3043 | 43.64 | 23083 | 30145 | 78.7 |
| GBR | Spine* | 70975 | 97780 | 1949 | 40.97 | 21105 | 24780 | 76.7 |
| GBR | Stomach* | 91997 | 154134 | 3132 | 44.16 | 23104 | 29842 | 78.7 |
| GBR | Body Wall* | 74119 | 103046 | 1774 | 40.55 | 23833 | 27789 | 78.5 |
| GBR | (All GBR reads) | 93094 | 153191 | 3255 | 43.72 | 29635 | 52365 | N/A |
| OKI | Testisº | 40482 | 35852 | 811 | 42.32 | 18857 | 22387 | 73.2 |
| OKI | Podiaº | 85307 | 56760 | 2642 | 42.94 | 22215 | 28768 | 74.1 |
| OKI | Spineº | 104055 | 64509 | 2833 | 43.16 | 24289 | 31576 | 73.2 |
| OKI | Mouthº | 25147 | 22322 | 801 | 38.47 | 13065 | 13681 | 72.7 |
| OKI | Nerve-Female#1 | 73842 | 53860 | 3006 | 43.41 | 21244 | 27173 | 66.4 |
| OKI | Nerve-Female#2 | 67649 | 50909 | 3006 | 43.32 | 22211 | 26848 | 65.5 |
| OKI | Nerve-Male#1 | 78054 | 56489 | 2352 | 43.15 | 25124 | 31221 | 65.0 |
| OKI | Oocyte | 164663 | 118728 | 1425 | 41.80 | 51470 | 55967 | 62.6 |
| OKI | Early Gastrula | 75552 | 49745 | 2306 | 43.29 | 21244 | 27173 | 64.0 |
| OKI | Middle Gastrula | 147017 | 82413 | 2772 | 43.38 | 29068 | 36306 | 63.1 |
| OKI | (All OKI reads) | 186200 | 110737 | 2853 | 43.11 | 33036 | 69261 | N/A |
| OKI/GBR | Total | 259329 | 147429 | 3171 | 43.44 | N/A | N/A | N/A |


Fig. S2.2. Histograms of tissue RNA transcript abundance. Histograms of Tuxedo genome-

guided transcript expression. For each tissue sampled in either OKI or GBR, the overall

expression level (fpkm) for all transcripts of each tissue/sample type are plotted by

frequency. Overall, expression level histograms for similar tissue types confirm a general

overlap of expression patterns between OKI and GBR.

Table S2.3. Specific EVM weights assigned to transcript evidence used for EVM gene prediction

| Evidence type | Description | EVM weight* |
|---|---|---|
| TRANSCRIPT | PASA assembled transcripts from combined tissue transcriptomes | 8 |
| ABINITIO_PREDICTION | Augustus | 10 |
| ABINITIO_PREDICTION | SNAP | 5 |
| ABINITIO_PREDICTION | GeneMarkES | 1 |
| OTHER_PREDICTION | TransDecoder peptides predicted from PASA assembled transcripts | 1 |

*least (1) to most (10) important
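
As an illustration of how the weights in Table S2.3 could be supplied to EVM, the sketch below writes them out as a three-column, tab-separated weights file (evidence class, source label, weight). The weights are those of Table S2.3; the source labels in the middle column are hypothetical placeholders, since they must match whatever appears in column 2 of the actual evidence GFF3 files, which is not stated here.

```python
# Weights from Table S2.3. The source labels (middle element) are
# hypothetical and must match column 2 of the evidence GFF3 files.
EVM_WEIGHTS = [
    ("TRANSCRIPT", "pasa_assemblies", 8),
    ("ABINITIO_PREDICTION", "augustus", 10),
    ("ABINITIO_PREDICTION", "snap", 5),
    ("ABINITIO_PREDICTION", "genemark", 1),
    ("OTHER_PREDICTION", "transdecoder", 1),
]


def format_weights(rows):
    """Render rows as tab-separated lines: class, source label, weight."""
    return "".join(f"{cls}\t{src}\t{weight}\n" for cls, src, weight in rows)


print(format_weights(EVM_WEIGHTS), end="")
```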

Additional Supplementary Tables

Table S2.4. Details on the A. planci NCBI BioProject PRJDB3175.

[Figure panels, embedded title "Supplemental Figure 3.1: Histograms of RNA transcript expression level": density of genes (y-axis) against log10(fpkm) (x-axis), one curve per condition. OKI conditions: X01_podia, X02_spine, X03_testis, X04_nerve_f1, X05_nerve_f2, X06_nerve_m1, X07_eg, X08_mg, X09_oocyte, X10_moucl. GBR conditions: A_Body_wall, A_Gonad, A_Podia, A_Spine, A_Stomach.]


3. Comparison of GBR and OKI genomes

The final assembly and annotation of OKI and GBR genomes do not support either

cryptic speciation or marked genetic divergence between these two individuals,

despite being separated by 5000 km. This is based on single-nucleotide

polymorphism (SNP) analysis, gene model liftOver and whole genome alignment,

which all confirm that the two genomes share a high degree of conservation in both

the overall nucleotide sequence, and gene structure and organization.

3.1 Estimation of intra-genome heterozygosity

Overall genome heterozygosity was estimated by SNP analysis. 3,359,642 and

3,425,577 SNPs were identified in the GBR and OKI genomes, respectively, equating

to a SNP rate of 0.88% and 0.92% (Extended Data Fig. 2; Extended Data Table 1).
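
The SNP rate is simply the number of heterozygous SNPs divided by the number of bases considered. The short sketch below reproduces the arithmetic using the final assembly lengths from section 2.1 as a stand-in denominator. This choice is an assumption: the denominator actually used (for example, only callable bases) is not stated here, so the values come out close to, but not necessarily identical with, the reported rates.

```python
def snp_rate_percent(snp_count, bases):
    """Return heterozygous SNPs per 100 bases (i.e. a percentage)."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    return 100.0 * snp_count / bases


# SNP counts from section 3.1; assembly lengths (bp) from section 2.1.
# Using the assembly length as the denominator is an assumption.
gbr_rate = snp_rate_percent(3_359_642, 383.5e6)
oki_rate = snp_rate_percent(3_425_577, 383.8e6)
print(f"GBR: {gbr_rate:.2f}%  OKI: {oki_rate:.2f}%")
```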

3.2 Estimates of inter-genome heterozygosity

OKI and GBR COTS genomic assemblies were aligned by reciprocal BLASTN+

(Camacho et al. 2009) and found to be 98.8% identical, for either scaffolds longer

than 10 kb, or alignments with bit-scores over 10,000 (Extended Data Fig. 2; Fig.

S3.1). A 1.4% SNP rate was detected by mapping OKI reads to the GBR genome, and

vice versa, which is consistent with the heterozygosity rate as measured by BLASTN+

alignments (Extended Data Table 1). Further, of these SNPs, approximately 64% are

common to both genomes (Fig. S3.2). A histogram of the number of SNPs per

sliding 100 bp window, taken at 50 bp increments along the respective mapping

alignments, was generated (Fig. S3.2). Both COTS genomes show a geometric

distribution of SNPs, which suggests that COTS SNPs are caused by recombination

and not by random mutation (i.e. as would be the case if a Poisson distribution were

observed), consistent with low overall genomic heterozygosity (Simakov et al. 2015).
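
The windowed SNP counts described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: SNP positions are assumed to be 0-based coordinates on a single sequence, and the function names and toy data are hypothetical. The variance-to-mean ratio is included as one simple way to contrast the two distributions, since Poisson counts have a variance equal to their mean whereas geometrically distributed counts have a variance larger than their mean.

```python
from collections import Counter
from statistics import fmean, pvariance


def snps_per_window(positions, seq_len, window=100, step=50):
    """Count SNPs in each sliding window of `window` bp taken every `step` bp."""
    pos = sorted(positions)
    counts = []
    lo = hi = 0  # indices delimiting the SNPs that fall inside the current window
    for start in range(0, max(seq_len - window, 0) + 1, step):
        end = start + window
        while lo < len(pos) and pos[lo] < start:
            lo += 1
        while hi < len(pos) and pos[hi] < end:
            hi += 1
        counts.append(hi - lo)
    return counts


def count_histogram(counts):
    """Map SNPs-per-window -> number of windows with that count."""
    return dict(sorted(Counter(counts).items()))


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 if overdispersed."""
    mean = fmean(counts)
    return pvariance(counts) / mean if mean else float("nan")


# Toy example: five SNPs on a hypothetical 300 bp sequence.
toy_counts = snps_per_window([10, 20, 60, 120, 260], seq_len=300)
print(toy_counts)
print(count_histogram(toy_counts))
```

On real data the ratio would be computed over all windows of a genome; a toy example this small is only meant to show the mechanics.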


Fig. S3.1. Intergenomic heterozygosity by BLASTN alignment. OKI and GBR scaffolds were

aligned to each other using BLASTN+ to measure overall genomic heterozygosity between

OKI and GBR. The histograms above show the distribution of alignments with greater than

95% identity (top), or longer than 10 kb (bottom) for OKI scaffolds aligned to GBR (left) or

GBR scaffolds aligned to OKI (right). The arithmetic means for each set of BLASTN+

alignments are consistent with heterozygosity as measured by SNP analysis (Fig S3.2, Table

S2.1).

[Figure panels, embedded title "Supplemental Figure 4.1: Inter-genomic heterozygosity by blastN alignment": four histograms of Frequency (y-axis) against percent identity from 95 to 100 (x-axis): oki scaffolds blastN to gbr scaffolds, for alignments with greater than 95% identity; oki scaffolds blastN to gbr scaffolds, for alignments longer than 10 kb; gbr scaffolds blastN to oki scaffolds, for alignments with greater than 95% identity; gbr scaffolds blastN to oki scaffolds, for alignments longer than 10 kb.]


Fig. S3.2. SNP analysis. a, Venn diagrams of the counts of SNPs common to both genomes,

based on remapping of SNPs across OKI and GBR. b, Histograms of SNP count per 100 bp

SNP window for OKI, GBR, and GBR reads mapped to OKI. These histograms show the

number of SNPs found in a 100 bp window, taken at 50 bp increments along the respective

mapping alignments. A geometric distribution (blue) suggests that the SNP distribution is

caused by recombination and not random mutation, consistent with genomes containing

low heterozygosity (Simakov et al. 2015).

3.3 LiftOver analysis between GBR and OKI genomes

Gene models were compared between the two genomes using Coordinate Conversion (liftOver) from the UCSC Genome Browser Utilities (Kent et

al. 2002). Settings were optimised to procure the maximal number of significant

gene model matches between GBR and OKI genomes (Fig. S3.3, Table S3.1). We

noted that relaxing the Similarity score from 1 to 0.95 led to the most dramatic

increase in lifted over gene models (e.g. from ~7,000 to 15,000-20,000 gene models,

across all block coverage scores), while relaxing beyond 0.95 Similarity score, or

relaxing the Block coverage score resulted in a more gradual increase in the

number of lifted over genes. Thus, at the genomic level, the differences in gene

models between OKI and GBR are mostly the result of point mutations or small

differences in similarity (e.g. <0.95% ID), but differences in intron order or gene


synteny are also present. Next, to develop a final list of lifted over genes, we

compared different settings, and found that a similarity score of 0.9 and a block

coverage of 0.75 recovers 100% of the gene models that were recovered using a

similarity score of 0.5 and a block coverage of 0.5; however, the former recovers

7,598 more gene models than the latter (Fig. S3.4).

Fig. S3.3. Optimisation of similarity and block coverage variables in liftOver analysis.

Thus, for the final liftOver gene model list, we used a similarity score of 0.9 and a

block coverage value of 0.75 for the genes lifted over from GBR to OKI genomes

(Table S3.1). This resulted in 22,820 gene model liftOvers, based on 20,551 unique

OKI genes, and 20,997 unique GBR gene models (Tables S3.1, S3.2). Additionally, we

analysed the reasons why the remaining GBR gene models were not recovered

with these parameters (Fig. S3.5; Tables S3.3-S3.6).
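
The parameter sweep described in this section can be sketched as a grid of liftOver command lines, one per combination of similarity (`-minMatch`) and block coverage (`-minBlocks`). The sketch only builds the commands and does not run them. The `-genePred` input mode, the chain file and all file names are assumptions made for illustration; the input format and chain construction actually used are not described here.

```python
from itertools import product

# Threshold values swept for both parameters.
THRESHOLDS = [1, 0.95, 0.9, 0.85, 0.8, 0.75]


def liftover_command(min_match, min_blocks, in_file, chain, out_prefix):
    """Build (but do not run) one liftOver command line as an argument list."""
    tag = f"match{min_match}_block{min_blocks}"
    return [
        "liftOver",
        "-genePred",                  # assumed input format (hypothetical)
        f"-minMatch={min_match}",     # minimum ratio of bases that must remap
        f"-minBlocks={min_blocks}",   # minimum ratio of blocks/exons that must remap
        in_file,
        chain,
        f"{out_prefix}.{tag}.genePred",
        f"{out_prefix}.{tag}.unmapped",
    ]


def build_grid(in_file, chain, out_prefix):
    """Return {(min_match, min_blocks): command} for every combination."""
    return {
        (m, b): liftover_command(m, b, in_file, chain, out_prefix)
        for m, b in product(THRESHOLDS, THRESHOLDS)
    }


# Hypothetical file names, for illustration only.
grid = build_grid("gbr_models.genePred", "gbr_to_oki.chain", "lifted")
print(len(grid))
print(" ".join(grid[(0.9, 0.75)]))
```

Each argument list could be passed to `subprocess.run` to execute the sweep, after which the lifted gene models would be counted per setting.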

[Figure plot: number of gene models lifted over (y-axis, 0 to 25,000) for each combination of Block coverage (1, 0.95, 0.9, 0.85, 0.8, 0.75) and Similarity (1, 0.95, 0.9, 0.85, 0.8, 0.75); two series: Genes lifted from Gbr to Oki, and Genes lifted from Oki to Gbr.]


Fig. S3.4. Comparison and optimisation of liftOver settings.


Table S3.1. Comparison of liftOver parameters and number of matching GBR and OKI gene models.

| minBlock, minimum ratio of alignment blocks or exons that must remap | minMatch, minimum ratio of bases that must remap | GBR models to OKI models | OKI models to GBR models | % GBR to OKI | % OKI to GBR |
|---|---|---|---|---|---|
| 1 | 1 | 7,255 | 7,173 | 29.32% | 29.49% |
| 1 | 0.95 | 16,370 | 16,004 | 66.15% | 65.80% |
| 1 | 0.9 | 16,806 | 16,421 | 67.91% | 67.51% |
| 1 | 0.85 | 16,925 | 16,540 | 68.39% | 68.00% |
| 1 | 0.8 | 16,975 | 16,596 | 68.59% | 68.23% |
| 1 | 0.75 | 17,011 | 16,629 | 68.74% | 68.37% |
| 0.95 | 1 | 7,256 | 7,173 | 29.32% | 29.49% |
| 0.95 | 0.95 | 16,923 | 16,530 | 68.38% | 67.96% |
| 0.95 | 0.9 | 17,372 | 16,960 | 70.20% | 69.73% |
| 0.95 | 0.85 | 17,491 | 17,082 | 70.68% | 70.23% |
| 0.95 | 0.8 | 17,541 | 17,139 | 70.88% | 70.46% |
| 0.95 | 0.75 | 17,577 | 17,172 | 71.03% | 70.60% |
| 0.9 | 1 | 7,260 | 7,179 | 29.34% | 29.52% |
| 0.9 | 0.95 | 18,345 | 17,942 | 74.13% | 73.77% |
| 0.9 | 0.9 | 18,900 | 18,526 | 76.37% | 76.17% |
| 0.9 | 0.85 | 19,049 | 18,690 | 76.97% | 76.84% |
| 0.9 | 0.8 | 19,109 | 18,756 | 77.22% | 77.11% |
| 0.9 | 0.75 | 19,148 | 18,792 | 77.38% | 77.26% |
| 0.85 | 1 | 7,262 | 7,182 | 29.34% | 29.53% |
| 0.85 | 0.95 | 19,186 | 18,750 | 77.53% | 77.09% |
| 0.85 | 0.9 | 19,870 | 19,478 | 80.29% | 80.08% |
| 0.85 | 0.85 | 20,061 | 19,695 | 81.06% | 80.97% |
| 0.85 | 0.8 | 20,143 | 19,784 | 81.40% | 81.34% |
| 0.85 | 0.75 | 20,187 | 19,831 | 81.57% | 81.53% |
| 0.8 | 1 | 7,263 | 7,185 | 29.35% | 29.54% |
| 0.8 | 0.95 | 19,512 | 19,072 | 78.85% | 78.41% |
| 0.8 | 0.9 | 20,279 | 19,874 | 81.95% | 81.71% |
| 0.8 | 0.85 | 20,504 | 20,131 | 82.85% | 82.77% |
| 0.8 | 0.8 | 20,608 | 20,251 | 83.27% | 83.26% |
| 0.8 | 0.75 | 20,659 | 20,307 | 83.48% | 83.49% |
| 0.75 | 1 | 7,269 | 7,192 | 29.37% | 29.57% |
| 0.75 | 0.95 | 20,122 | 19,621 | 81.31% | 80.67% |
| 0.75 * | 0.9 * | 20,997 * | 20,551 * | 84.85% * | 84.49% * |
| 0.75 | 0.85 | 21,286 | 20,883 | 86.01% | 85.86% |
| 0.75 | 0.8 | 21,435 | 21,062 | 86.62% | 86.59% |
| 0.75 | 0.75 | 21,517 | 21,166 | 86.95% | 87.02% |

Boxed row (marked with * here) designates the liftOver parameters used in this study.
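
The percentage columns of Table S3.1 are consistent with the number of matched gene models divided by the total number of predicted gene models in the source genome (24,747 for GBR and 24,323 for OKI; section 2.4). The short check below reproduces the values for the parameters used in this study; the function and variable names are illustrative only.

```python
# Total predicted gene models per genome (section 2.4).
GENE_MODELS = {"GBR": 24_747, "OKI": 24_323}


def percent_lifted(matched, total):
    """Percentage of gene models recovered, rounded to two decimals."""
    return round(100.0 * matched / total, 2)


# Row with minBlock = 0.75 and minMatch = 0.9 in Table S3.1.
gbr_to_oki = percent_lifted(20_997, GENE_MODELS["GBR"])
oki_to_gbr = percent_lifted(20_551, GENE_MODELS["OKI"])
print(gbr_to_oki, oki_to_gbr)
```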


Fig. S3.5. Flow diagram outlining the reasons for incomplete liftOver between GBR and OKI

genomes.


Additional Supplementary Tables

Table S3.2. List of GBR-OKI gene model matches using liftOver parameters.

Table S3.3. Total list of genes that did not liftOver.

Table S3.4. List of genes that did not liftOver because of boundary problems.

Table S3.5. List of genes that did not liftOver because partially deleted.

Table S3.6. List of genes that did not liftOver because completely deleted.


4. Phylogenomic and population genomic analyses

4.1 Phylogenomic analyses

Our bioinformatics pipeline retained only genes that were inferred to be orthologous

among the HaMStR model organisms core orthologue set and sampled from at least

15 of 28 taxa. This resulted in a final matrix of 427 orthologous groups (OGs) totalling

95,585 amino acid positions in length. After excluding Alicut/Aliscore-trimmed

alignments shorter than 50 amino acids in length, the average OG length was 224

amino acids and the longest was 531 amino acids. All OGs were sampled from at

least 15 taxa but some were sampled for as many as 25 taxa with an average of 17

taxa sampled per OG. Missing data in the complete dataset were 47.31%.

We analysed a concatenated supermatrix (427 genes, 95,585 amino acids, 45.16% missing data), recovering a fully resolved tree consistent with the growing consensus of echinoderm phylogeny (Telford et al. 2014; O’Hara et al. 2014; Cannon et al. 2014; Reich et al. 2015). With the exception of support for hemichordate monophyly (bootstrap support, bs = 98%), we found maximal support for all phylum- and class-level taxa as well as relationships among them. Each of the five major lineages of Echinodermata was recovered as monophyletic, with Crinoidea sister to all other echinoderms (Eleutherozoa). Within Eleutherozoa, we found strong support for Echinozoa (Echinoidea + Holothuroidea) sister to Asterozoa (Ophiuroidea + Asteroidea; bs = 100%), consistent with other recent investigations that had more limited sampling within Asteroidea (Telford et al. 2014; O’Hara et al. 2014; Cannon et al. 2014; Reich et al. 2015). Taxon sampling, and annotations and characteristics of each gene analysed, are presented in Tables S4.1 and S4.2.

4.2 Population genomic analysis

Historical effective population size was estimated with MSMC

(Schiffels and Durbin 2014). Time estimates can be significantly influenced by the

mutation rate and generation time. Using a generation time of three years, the


mutation rate was set from 0.5 × 10⁻⁸ to 1.5 × 10⁻⁸ and the effective population size

decline and recovery times were estimated (Table S4.3).

Table S4.3. MSMC estimated decline and recovery time with different mutation rates.

                GBR                             OKI
Mutation rate   Recovery time   Decline time    Recovery time   Decline time
                (years ago)     (years ago)     (years ago)     (years ago)
5.00E-09        45,248          91,672          43,580          88,292
7.50E-09        30,166          61,115          29,053          58,862
1.00E-08        22,624          45,836          21,790          44,146
1.25E-08        18,099          36,669          17,432          35,317
1.50E-08        15,083          30,557          14,527          29,431
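The dependence of these dates on the assumed mutation rate can be illustrated with a minimal sketch. It assumes the usual conversion for mutation-scaled coalescent times (generations = scaled time / mutation rate; years = generations × generation time). The function name and the example scaled time are hypothetical and are not taken from the study.

```python
def scaled_time_to_years(scaled_time, mutation_rate, generation_time_years=3.0):
    """Convert a mutation-scaled coalescent time to calendar years.

    Assumes years = (scaled_time / mutation_rate) * generation_time_years.
    """
    generations = scaled_time / mutation_rate
    return generations * generation_time_years


# Hypothetical scaled time, chosen only for illustration (not a study value).
EXAMPLE_SCALED_TIME = 7.5e-5

for mu in (5.0e-9, 7.5e-9, 1.0e-8, 1.25e-8, 1.5e-8):
    years = scaled_time_to_years(EXAMPLE_SCALED_TIME, mu)
    print(f"mutation rate {mu:.2e}: {years:,.0f} years ago")
```

Under this conversion the estimate is inversely proportional to the mutation rate, which matches the pattern in Table S4.3: doubling the rate from 5.00E-09 to 1.00E-08 halves the GBR recovery time from 45,248 to 22,624 years.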

Additional Supplementary Tables

Table S4.1. Taxon sampling, data sources, and number of genes sampled per

taxon.

Table S4.2. Gene sampling and annotation.


5. Protein domain analyses

5.1 Protein domain annotations

In addition to COTS, protein domains were identified in the following metazoan taxa: Mnemiopsis leidyi, Amphimedon queenslandica, Trichoplax adhaerens, Nematostella vectensis, Lottia gigantea, Lingula anatina, Capitella teleta, Caenorhabditis elegans, Drosophila melanogaster, Branchiostoma floridae, Ciona intestinalis, Danio rerio, Xenopus tropicalis, Homo sapiens, Saccoglossus kowalevskii, Ptychodera flava, and Strongylocentrotus purpuratus. We downloaded protein-coding genes for all 18 species and used HMMER to annotate all known protein domains based on the Pfam database (version 29.0) (Finn et al. 2015). If a domain occurred multiple times in a protein sequence, it was counted only once (Tables S5.1, S5.2).
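The counting rule (a domain that occurs several times in one protein is counted once) can be sketched as follows. This is an illustration only: the input format, a list of (protein ID, Pfam accession) pairs, and the function name are assumptions and do not describe the actual HMMER post-processing used.

```python
from collections import defaultdict


def count_proteins_per_domain(hits):
    """Count, for each Pfam accession, the number of distinct proteins carrying it.

    hits: iterable of (protein_id, pfam_accession) pairs, one pair per domain hit.
    Repeated hits of the same domain within one protein are collapsed by the set.
    """
    proteins_by_domain = defaultdict(set)
    for protein_id, pfam_accession in hits:
        proteins_by_domain[pfam_accession].add(protein_id)
    return {acc: len(proteins) for acc, proteins in proteins_by_domain.items()}


# Hypothetical hits: protein_A carries the same domain twice.
example_hits = [
    ("protein_A", "PF02494"),
    ("protein_A", "PF02494"),
    ("protein_A", "PF02822"),
    ("protein_B", "PF02494"),
]
domain_counts = count_proteins_per_domain(example_hits)
print(domain_counts)
```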

Table S5.2. Pfam statistics.

Species                         Number of   Proteins with     Percent with
                                proteins    Pfam annotation   Pfam annotation
Amphimedon queenslandica        40122       20466             51.0
Mnemiopsis leidyi               16559       9630              58.2
Trichoplax adhaerens            11520       9368              81.3
Nematostella vectensis          24773       17305             69.9
Acanthaster planci              24323       14254             58.6
Strongylocentrotus purpuratus   28987       22587             77.9
Saccoglossus kowalevskii        34239       28286             82.6
Branchiostoma floridae          50817       39302             77.3
Ptychodera flava                34647       19793             57.1
Ciona intestinalis              16667       9453              56.7
Danio rerio                     25642       23118             90.1
Xenopus tropicalis              18442       15417             83.6
Homo sapiens                    20313       20092             98.9
Drosophila melanogaster         13918       10813             77.7
Caenorhabditis elegans          20447       13852             67.7
Capitella teleta                32175       20571             63.9
Lottia gigantea                 23349       14450             61.9
Lingula anatina                 47943       22623             47.2

We then iteratively conducted Fisher's exact tests using R (R Development Core Team

2008), comparing the number of counts in Pfam families found in each species to the


background, defined as the average of the counts in the remaining species (Table

S5.3).
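A single species-versus-background comparison of this kind can be sketched with a two-sided Fisher's exact test. The analysis itself was run in R; the Python version below uses only the standard library and is an illustration. The 2 × 2 layout (rows: focal species and background; columns: proteins with and without the domain) and the example counts are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_probability(a)
    smallest, largest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(smallest, largest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Hypothetical counts: 3 of 4 proteins carry the domain in the focal species,
# against 1 of 4 in the background.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p:.4f}")
```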

5.2 Lineage-specific domain enrichment

To assess the differences in protein domains across metazoan genomes, we examined protein domain (Pfam) expansion and contraction in each species, based on the total number of unique genes that contained each Pfam domain. Heat maps are ordered by increasing lineage-specificity, from basal metazoans, protostomes, deuterostomes and ambulacrarians to echinoderms, with A. planci on the far right. We used the scaled value for each individual Pfam domain as a proxy for expansion, whereby any value greater than the mean was considered a domain expansion and any value less than the mean a domain contraction. Using this methodology, we find that certain taxa show dramatic, lineage-specific domain expansion events (Extended Data Fig. 3, Table S5.1), as exemplified by B. floridae and S. kowalevskii (Figure 5.1a). Further investigation of deuterostome taxa reveals many ambulacrarian-specific expansions (Figure 5.1b and Tables S5.4, S5.5), as well as some echinoderm- and A. planci-specific domain expansions (Extended Data Fig. 3, Tables S5.6, S5.7).
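The use of a scaled value as a proxy for expansion can be sketched as follows. The sketch assumes that scaling means centring each domain's per-species gene counts on their mean and dividing by the sample standard deviation (as R's scale() does by default). The function names and counts are hypothetical.

```python
from statistics import mean, stdev


def scale_counts(counts):
    """Centre and scale one domain's gene counts across species (z-scores)."""
    centre = mean(counts)
    spread = stdev(counts)
    if spread == 0:
        return [0.0 for _ in counts]  # identical counts in every species
    return [(count - centre) / spread for count in counts]


def classify(scaled_value):
    """Label a scaled value relative to the mean, which is 0 after scaling."""
    if scaled_value > 0:
        return "expansion"
    if scaled_value < 0:
        return "contraction"
    return "at mean"


# Hypothetical gene counts for one Pfam domain in three species.
species = ["species_1", "species_2", "species_3"]
gene_counts = [10, 12, 50]
labels = dict(zip(species, (classify(z) for z in scale_counts(gene_counts))))
print(labels)
```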

Additional Supplementary Tables

Table S5.1. Collation of Pfam domains in select metazoan genomes.

Table S5.3. Scaled analysis of gene expansion and contractions of metazoan Pfam

domains.

Table S5.4. Raw number of genes and scaled analysis of gene expansion and

contractions of deuterostome Pfam domains.

Table S5.5. Scaled analysis of gene expansion of deuterostome Pfam domains

showing expanded ambulacrarian domains.

Table S5.6. Raw number of genes and scaled analysis of gene expansion and

contractions of ambulacrarian Pfam domains.


Table S5.7. Raw number of genes and scaled analysis of gene expansion and

contractions of echinoderm Pfam domains.


6. Tissue gene expression analyses

6.1 Similarity of tissue transcriptomes

Seven and ten tissue transcriptomes were sequenced from GBR and OKI,

respectively (Tables S2.1, S2.3, and S6.1). Trimmed reads from all these

transcriptomes, except GBR “He-new” and GBR “new”, were mapped to the 24,747

and 24,323 gene models in GBR and OKI, respectively (Tables S2.1, S2.3).

Table S6.1. Transcriptome Statistics.

Source  Tissue        Total gene   Number expressed   Percent expressed
                      models       genes              genes
GBR     Body wall     24,747       17,030             69%
GBR     Gonad         24,747       21,197             86%
GBR     He-new        24,747       17,230             70%
GBR     Podia         24,747       18,424             74%
GBR     Spine         24,747       16,330             66%
GBR     Stomach       24,747       18,727             76%
GBR     new           24,747       16,323             66%
OKI     01 Podia      24,323       14,662             60%
OKI     02 Spine      24,323       17,868             73%
OKI     03 Testes     24,323       13,919             57%
OKI     04 Nerve f1   24,323       16,216             67%
OKI     05 Nerve f2   24,323       15,958             66%
OKI     06 Nerve m1   24,323       16,678             69%
OKI     07 eg         24,323       15,944             66%
OKI     08 mg         24,323       18,746             77%
OKI     09 oocyte     24,323       18,264             75%
OKI     10 mouth      24,323       10,268             42%

All tissue transcriptomes from GBR and OKI locations cluster closely together in

comparison to the oocyte transcriptome (Fig. S6.1). When the oocyte transcriptome

is removed, 51% of the variance lies along PC1, clearly distinguishing the OKI mouth,

GBR body wall, and GBR spine from the remaining 13 transcriptomes (Fig. S6.1). All

three radial nerve samples cluster tightly together. Based on these PCAs, there is no

evidence that the samples are segregating based on geographical location.


Fig. S6.1. Similarity of GBR and OKI tissue transcriptomes. Principal component analyses

using total FPKM values for gene models shared between GBR and OKI, including a, and

excluding b, oocyte tissue. Heat maps depict sample similarity based on Euclidean distance,

with red corresponding to a high degree of similarity. OKI transcriptomes, red dots; GBR

transcriptomes, blue dots.

Based on the PCAs of all 17 transcriptomes, we selected the following eight tissues

to characterise gene expression in COTS: male and female radial nerves, tube foot

(podia), spine, body wall, stomach, mouth and spent testes (Table S6.2). The


presented order of these tissues was derived from the Euclidean distance (Love et al.

2014) of these transcriptomes based on both expression (Fig. S6.2) and Pfam (Finn et

al. 2015) similarity (Fig. S6.3).
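The Euclidean distance used to compare transcriptomes can be sketched as below. This is a minimal illustration: the expression values are hypothetical, each tissue is represented by a vector over the same ordered set of genes, and any normalisation or transformation applied before the distance calculation in the study is not reproduced here.

```python
from math import dist


def pairwise_euclidean(profiles):
    """Return Euclidean distances between all pairs of expression profiles.

    profiles: dict mapping a tissue name to a list of expression values,
    with every list ordered by the same genes.
    """
    names = sorted(profiles)
    return {
        (first, second): dist(profiles[first], profiles[second])
        for index, first in enumerate(names)
        for second in names[index + 1:]
    }


# Hypothetical two-gene profiles, chosen so the distances are easy to check.
example_profiles = {
    "tissue_a": [0.0, 0.0],
    "tissue_b": [3.0, 4.0],
    "tissue_c": [3.0, 0.0],
}
distances = pairwise_euclidean(example_profiles)
for pair, value in distances.items():
    print(pair, value)
```

Tissues with the smallest pairwise distances are the ones that end up adjacent when samples are ordered or clustered by similarity.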

Table S6.2. Tissue transcriptomes used to characterise gene expression.

Source  Tissue           No. genes     Median        No. genes    % genes     #Pfams
                         with counts   count value   >= median    >= median   >= median
OKI     Radial Nerve ♂   16356         14            8299         51%         3283
OKI     Radial Nerve ♀   15374         11            7719         50%         3050
GBR     Tube foot        16100         13            8445         52%         3364
OKI     Spine            16805         16            8656         52%         3419
GBR     Spent Testes     18417         17            9497         52%         3669
GBR     Stomach          16408         17            8333         51%         3289
OKI     Mouth            10667         19            5644         53%         2600
GBR     Body-wall        15486         9             7547         49%         3003

Fig. S6.2. Transcriptome similarity of selected COTS tissues. a, Heat map showing the

Euclidean distance between COTS tissue transcriptomes based on the FPKM expression

values of the 22,820 liftOver gene models between OKI and GBR gene models and b,

corresponding PCA plot of the same data. Full FPKM values can be located in Table S6.3.


Fig. S6.3. Domain enrichment in tissue transcriptomes. a, Heat map showing the Euclidean

distance between COTS tissue transcriptomes based on similarity of expressed Pfam

domains. b, Heat map indicating the total domain expansion and contraction between each

tissue and c, Pfam domains that are enriched in the transcriptome of a single tissue or

closely related tissue transcriptome based on Euclidean distance. Tables S6.4 and S6.5

contain the specific number of genes and Pfam domain descriptions for each of the two

heat maps (b and c) presented in the same order, respectively. Table S6.6 contains the

number of all Pfam domains for all tissues with the corresponding gene models.

6.2 Tissue-enriched gene expression

Further analyses rely on the identification of genes that are expressed in a tissue-

specific manner. Therefore, to obtain a list of genes that are differentially

expressed in each tissue, we identified all genes with an expression value (FPKM)

above the median for each respective tissue (Table S6.2). Median FPKM value was

calculated for each transcriptome after genes with zero counts were removed (Table

S6.2), and tissue-specific lists (Table S6.2) were defined by the genes with an FPKM

value above the median, resulting in approximately 50% of the total genes for all

tissues (Table S6.2). We determined the number of Pfam domains represented by

these lists of ‘expressed’ genes (Table S6.2), finding that each tissue is characterised

by a similar number of protein domains.
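The median-based threshold can be sketched as follows. Genes with zero expression are removed before the median is taken, as described above. The comparison is written as >= to match the column headers of Table S6.2, which is an assumption about how ties were handled. The function name and values are hypothetical.

```python
from statistics import median


def genes_at_or_above_median(expression_by_gene):
    """Return (median, genes) for one tissue, ignoring genes with zero expression.

    expression_by_gene: dict mapping a gene ID to its expression value.
    """
    expressed = {gene: value for gene, value in expression_by_gene.items() if value > 0}
    threshold = median(expressed.values())
    selected = sorted(gene for gene, value in expressed.items() if value >= threshold)
    return threshold, selected


# Hypothetical expression values for five gene models in one tissue.
example_tissue = {"gene_1": 0.0, "gene_2": 2.0, "gene_3": 4.0, "gene_4": 6.0, "gene_5": 8.0}
threshold, tissue_genes = genes_at_or_above_median(example_tissue)
print(threshold, tissue_genes)
```

Because the threshold is the median of the expressed genes, roughly half of them pass it in every tissue, which is why the percentages in Table S6.2 all sit close to 50%.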


Using the median-FPKM value as a putative threshold for gene expression, we find

that each transcriptome is characterised by genes that are both enriched in specific

tissues and ubiquitously expressed across various tissue types (Fig. S6.4).

Fig. S6.4. Heat map indicating the expression profiles (FPKM value scaled by row) for all

liftOver genes across 8 tissue transcriptomes. Darker colours indicate higher expression.

Columns are ordered based on the Euclidean distance of tissue-specific transcriptomes, as

shown in Fig. S6.3. Scaled and unscaled FPKM values used to generate this heat map can be

viewed in full in Table S6.3.

6.3 Tissue-enriched domain expansions

Using the lists of tissue-specific genes (Table S6.2), we performed Pfam enrichment

analyses to identify domains that are more prevalent in certain tissue-types (Fig.

S6.3). To supplement tissue similarity analyses based on expression profiles, we also

clustered the tissue-specific transcriptomes based on Pfam domain similarity using


Euclidean distance (Love et al. 2014) (Fig. S6.3a and Table S6.4). Euclidean distance

based on Pfam resulted in a similar dendrogram and clustering order as the analyses

based on expression values, providing further support for the patterns of tissue

similarity illustrated by gene expression data. Complete lists of protein domains and

corresponding gene model IDs can be found in Table S6.6.

Tissue-specific domain enrichment analyses were performed using the same

methods as the domain expansion analyses outlined in the Online Methods and

Supplementary Note 5. All tissues are both enriched and depleted in certain protein

domains (Fig. S6.3b and Table S6.4); the mouth exhibits the highest degree of

domain depletion, suggesting decreased transcriptome complexity with respect to

other tissues. This is supported by expression data, which indicate that the mouth

transcriptome has the lowest number of expressed genes. In addition, we identified

(i) all domains that are specific to an individual tissue and closest-clustering pairs

(Fig. S6.3c and Table S6.5), (ii) the domains that have the same number of genes

across all tissues (Table S6.7) and (iii) the genes that are enriched in different

clustering blocks based on the Euclidean distance (Table S6.8).

Additional Supplementary Tables

Table S6.3. FPKM values for OKI – GBR LiftOver gene models.

Table S6.4. Actual and scaled values for the number of genes expressed above the

median in each tissue for each Pfam domain.

Table S6.5. Actual and scaled values for the number of genes per tissue with each

Pfam, indicating tissue-specific domain expansions.

Table S6.6. The number of all Pfam domains for all tissues and corresponding gene

models.

Table S6.7. Pfam domains with the same number of genes in all tissues.

Table S6.8. Actual and scaled values for the number of genes per tissue with each

Pfam, indicating tissue-specific domain expansions.


7. Exoproteome Analyses

7.1 In silico secretome prediction

In silico prediction of secreted proteins from COTS (OKI) using three datasets (for

details see the Online Methods) predicts approximately 1,775 secreted proteins (Fig.

S7.1). To obtain the final list of secreted proteins without any transmembrane

prediction, all predicted transmembrane proteins were subtracted from the protein

list. In summary, 1,207 genes represent the final in silico secretome prediction. This

dataset was used to aid in the mass spectrometry analysis of seawater obtained

from COTS at aggregation and in the presence of giant triton (alarm).

Fig. S7.1. In silico secretome prediction. Pipeline of bioinformatic analysis of secreted

proteins. Venn diagram shows bioinformatics data of signal peptide prediction using three

methods.


7.2 Exoproteome purification and identification

Of the estimated 1,125 proteins identified as forming the COTS protein secretome,

some may be released into the surrounding water as potential conspecific signalling

cues. COTS aggregation-conditioned seawater and COTS exposed to giant triton-

conditioned seawater were collected for exoproteome analysis by nano liquid

chromatography coupled with Triple Time-of-Flight mass spectrometry (nanoLC-

MS/MS), with the workflow shown in Fig. S7.2. See below for further description of

methods.

Fig. S7.2. Exoproteome extraction and analysis workflow. Proteins were extensively

fractionated followed by identification with high-accuracy nanoLC-Triple-TOF-MS.


7.3.1 In-solution trypsin digestion, fractionation and NanoHPLC-ESI-Triple Time-of-

Flight analysis

Reconstituted samples containing about 1 mg exoproteins in 100 µl extraction buffer

were reduced in 5 µl of 200 mM dithiothreitol (DTT) for 60 min at 37°C. Alkylation

was carried out in 20 µl of 200 mM of iodoacetamide (IAA) prepared in 25 mM of

NH4HCO3 for 60 min at room temperature. 20 µl of 200 mM DTT was then added and

the mixture incubated at room temperature for 30‐60 min. The urea concentration

was reduced with 775 µl MilliQ-H2O, and digestion was performed overnight with

trypsin (1:50 ratio) at 37°C. The reaction was stopped by adjusting the pH of the

solution to <3 by adding 10% formic acid. Tryptic peptides (50 μg) were fractionated

on a Biobasic SCX HPLC (2.1 × 150 mm) column (Thermo Fisher Scientific, Waltham,

MA) using a PerkinElmer Series 200 HPLC system (Perkin-Elmer, Boston, MA) at a

flow rate of 0.2 ml/min. The mobile phases used were composed of (A) 2.5 mM

ammonium acetate in 25% acetonitrile, pH 4.5 and (B) 250 mM ammonium

acetate in 25% acetonitrile, pH 4.5. Seventeen fractions were collected using an 80

min gradient of 2% B for 5 min, 2-50% B for 40 min, 50–98% B for 5 min, and 98% B

for 10 min, 98-2% B for 5 min followed by 10 min at 2% B at a flow rate of 0.2

ml/min. The SCX fractions were dried in SpeedVac SC250EXP (Thermo Fisher

Scientific, Waltham, MA) at 40°C.

Dried fractions were resuspended in 0.1% v/v formic acid (25 μl) and analysed on a

Shimadzu Prominence Nano high pressure liquid chromatography system (Kyoto,

Japan) coupled to a Triple Time-Of-Flight 5600 mass spectrometer (Triple TOF-MS,

AB SCIEX, Concord, Canada) equipped with a nano electrospray ion source (ESI;

NanoLC-ESI-Triple-TOF-MS). Eight µl of each digested sample was injected onto a

C18 trap column (50 mm x 300 μm, Agilent Technologies, Sydney, Australia) at 30

µl/min. The samples were de-salted on the trap column for 5 min by flushing with

solvent A (0.1% aqueous formic acid) at 30 µl/min. The trap column was then placed

in-line with a 150 mm x 75 μm 300SBC18, 3.5 µm analytical nano HPLC column

(Agilent Technologies, Australia). Linear gradients of 1-40% solvent B [90/10

acetonitrile/0.1% formic acid (aq)] over 40 min at 300 nl/min flow rate, followed by a

steeper gradient from 40-80% solvent B in 10 min were used for peptide elution.


Solvent B was held at 80% for 5 min for washing the column and returned to 1%

solvent B for equilibration prior to the next sample injection. The ion spray voltage

was set to 2,400 V, declustering potential (DP) 100 V, curtain gas flow 25, nebuliser gas

1 (GS1) 12 and interface heater at 150°C. The mass spectrometer acquired 500 ms

full scan TOF-MS data followed by 20 × 50 ms full scan product ion data in an

Information Dependent Acquisition (IDA) mode. Full scan TOF-MS data were acquired

over the mass range 350-1800 and for product ion MS/MS 100-1800. Ions observed

in the TOF-MS scan exceeding a threshold of 100 counts and a charge state of +2 to

+5 were set to trigger the acquisition of product ion MS/MS spectra of the resultant

20 most intense ions. Data were acquired and processed using Analyst TF 1.5.1

software (AB SCIEX, Concord, Canada).

7.3.2 Protein identification parameters

PEAKS Studio was run with the following parameters: precursor ion mass tolerance, 0.1 Da; fragment ion mass tolerance, 0.1 Da; fully tryptic enzyme specificity with up to two missed cleavages; monoisotopic precursor mass; a fixed modification of cysteine carbamidomethylation; and variable modifications including methionine oxidation, conversion of glutamine and glutamic acid to pyroglutamic acid, acetylation of lysine and deamidation of asparagine. Proteins were grouped if they were supported by the same peptides, and the number of high-confidence supporting peptides uniquely mapped to the protein group was set to ≥1. Although a minimum of two peptides is commonly used to consider protein identifications statistically significant, identifications with one proteotypic peptide were allowed due to the high homology of proteins derived from the two genomes. The protein confidence score (-10*lgP) was calculated from the confidence scores of its supporting peptides.
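The -10*lgP notation expresses a P-value on a negative logarithmic scale, so that higher scores mean higher confidence. The conversion for a single P-value is sketched below. How the software combines peptide-level scores into a protein-level score is not described here and is not reproduced; the function name is hypothetical.

```python
import math


def minus_ten_lg_p(p_value):
    """Convert a P-value (0 < p <= 1) to the -10*lgP scale."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -10 * math.log10(p_value)


for p in (0.05, 0.01, 0.001):
    print(f"P = {p}: -10*lgP = {minus_ten_lg_p(p):.1f}")
```

On this scale each tenfold decrease in the P-value adds 10 to the score.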

7.3.3 Validation of quantitative proteomics and data deposition

The results were validated sequentially with the charge of featured peptides

between 2 and 5, fold change ≥1 and detected in more than one sample of the

triplicate, while the FDR of protein was set to ≤1%, the number of unique peptides

and fold change of each protein were set to ≥1 and ≥2, respectively; peptide ratio


versus quality-score and ratio versus average-area (MS signal intensity) were set to

recommended values of 8, respectively. The quantitative patterns of grouped

secreted proteins in normalised log2(ratio) were presented, and proteins were

clustered using one minus Pearson correlation (Lin 1989). The mass spectrometry

proteomics data and protein database have been deposited to the

ProteomeXchange Consortium via the PRIDE (Cote et al. 2012; Vizcaino et al. 2014,

2016) partner repository with the dataset identifier PXD005409.
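The distance used to cluster the protein quantitation profiles, one minus the Pearson correlation, can be sketched as follows. The profiles are hypothetical log2(ratio) vectors; the clustering step itself and the software used for it are not shown.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation of two equal-length profiles with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def one_minus_pearson(x, y):
    """Distance in [0, 2]: 0 for perfectly correlated profiles, 2 for opposite ones."""
    return 1.0 - pearson_correlation(x, y)


# Hypothetical log2(ratio) profiles across three samples.
protein_1 = [1.0, 2.0, 3.0]
protein_2 = [2.0, 4.0, 6.0]   # same trend as protein_1
protein_3 = [3.0, 2.0, 1.0]   # opposite trend

print(one_minus_pearson(protein_1, protein_2))
print(one_minus_pearson(protein_1, protein_3))
```

Because the measure depends on the shape of a profile rather than its magnitude, proteins that change in the same direction across samples are grouped together even when their absolute ratios differ.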

In total, 394 proteins were identified from the exoproteome of COTS using the

combined databases of GBR and OKI (Fig. S7.3, Table S7.1). Of these, 108 are derived

from precursor protein sequences containing the hallmarks of a typical secreted

protein (Fig. S7.4, Table S7.2). A total of 71 were present in preparations derived

from aggregation, 14 within the alarm and 23 within both. Of those exoproteins

identified, several have putative or defined functions as described in COTS or other

species, including the plancitoxins, vitellogenin, ependymin, peroxidasin and

pentraxin. Plancitoxins were the most abundant proteins identified; these are known

to function as potent toxins, causing liver cells to undergo apoptosis (cellular death)

by entering the nucleus of the cell and then degrading DNA (Lee et al. 2014).

Another protein with high abundance was vitellogenin; this protein appears to be

produced in both female and male COTS, and is the major yolk protein in asteroids

(Alqaisi et al. 2016).

Fig. S7.3. Number of proteins detected in seawater samples specific to aggregated (244) and

triton-alarmed (77) COTS. 73 proteins were common to both conditions.


A disintegrin and metalloproteinase with thrombospondin motifs; oki.137.43; gbr.350.28

MMLSASLFLSFAAMALAGPPVPNHRFSKDELQRYFGVDSDDKAPEYEIVYPEYVTADTKRSMDRSSIT

ASHLDVYVDAFGETLHMTVEHDDSGIKPGLEAEYLTDEGIIKVPVQTDCIYSGKVVGEGDSLVSVTTC

VGLMAIAYHASGPTYIEPLDDEHAYKRDIGRGLPHVAYKNKPQNGASCPVRSPLCVEGDVKPSGTKYL

KLAFVGDALLHYLRGNSTLTALTTVFNAVKNILQLDSLAGKDLVPKLVHMIILTVRQPGFWISWNAYRY

LPSAAEWLDTKQYPPSDDRHWDNAAIFSGLPFVNGILGLAYVGVCDSKEAVSVSRFHSLDEATATAAH

EIGHNLGMCHDNQDNSCPSTGYVMAPYESEVKIEPIWSSCSRRYYNRFVDSNTCYNDS

Reprolysin family propeptide (PF01562); Reprolysin (PF01421)

Acid ceramidase; oki.24.59; gbr.394.7

MLVRVLAPLAVLVMVFAPSLGQDVYPYTDKVCRTDVQYPPADAKTKVPSFVLNLDMNPEDRWRPLMKN

KSAELKAMINDIIDLVGAFFKNKTKVVNLLDEVLGPLAYTLPQPYQAELIGIADASGVPLGQIVLYNI

FYEVFTVCTSIVAESPNGTLYHARNLDFGLFLGWDVKNQKWELTERLRPLVVNVDYQVQGKTVAKAVH

FAGYVGVLTGIKPGVVTMTMNERYNIDGGYIGILEWIMGKRSEQWMGFFMRETLLNATDTKAKVINAK

ILAPGYIILGGSKTGEGAVIVIDRKEPAYVEELDPKKGKWFLVETNYDPTEKPPFFDDRRTPALKCMT

NTTQKAVGLGPIYNVLSTKPNLNLLTTYTAMMQVNSGYLETYIRQCPQPCYPCPGVPHSWGIIACNSRF

KHVFNVEMTCEGCSGAVTRVLNKLGDKVAKFDVDLAQKKVVIESALSSDELLETLKKTGKETTYVGQE

N

NAAA-beta (PF15508); CBAH (PF02275); HMA (PF00403)

Hyalin-like; oki.8.102; gbr.4.107

MGSLILLIVAVLPSLVSSIVNDLSCEYRPTAGTGEYTFDTNVLNDGEFSFSFKVLTNSDVRIMLSPYAE

EDTAAYVITSFQRPGPIASRGSGSWHQGRHEGHVMAKRTLSSIGQNNLLTTIKRYDGSVLVSHSVADES

PFRESELRCMDHCMNGYWICFQSDTIKLGRAGDSAPYLEYQVPSGETFNPRYVGFATGSTNIGLFGDFR

FADECVSQGSGSELGGTTTKQPRDCRNVDCGDTTCAFGFKTDSNGCQTCDCKDSPCDPLSSCTETCKH

GHEKNEYQCETCTCTPGPCDDDPCQNGAYCYGIGSNDYGCYCMEGYSGKNCEEDVQEPSIKCPGNMNQ

TTDEGVGFASVTYPEVTATDNSGQEPSISCTPNTATLPVRSNLIQCVATDDSNNKASCSYTILITDEE

PPKLTCSDPINRPTDSGEPFATVDYDLPLVEDNYAASGVSSCDKDQDYKFPIGDTEVTCTALDLYGNE

GKCSFNVTVSDQEPPKFECPLPMLDETLDQGESFATVDYTLPDVTDNVDSDLTVSCTEGPGSQFPVGE

WTVTCSAIDKAGNSKECSFPVEVEDDQPPKLVCPDLISNTTDPGKAYATLQYSLPQAVDNADPKPAVS

CELGPGSKFWVKPNTVVCTATDKYGNSNTCDITVEIKDEELPKITCPEPMENVSVDTGKAYATVDYAP

SGVQDNEDRTPDVSCDGPKDSQFDIGESTVTCTVTDRSDNSASCSFTINVIDDEKPKIICPIPMPDVK

TDTGKRTATVDYGEATATDNADPNPEVTCDKGTNTEFGIGTTTVRCKAIDDAGNKKGCKFDVTVEDDE

EPTLECPESMDPPTDEGKSFATVEYPPPAVSDNVDASPKVTCSETTGSEFSFGPTTVECTAVDSSMNE

ATCNFTINVKDTEKPELDCPIVITEPVEEGKSYAIVEFSPNVFDNVDPEPQVSCTPPSGEEFDIGKTE

VTCTAVDDDGNFDSCDIEVLVEDNEGPIIKCQIPMPSSADPGSSSTFVNYNMPTATDNSGNVPTIECV

PPSGSEFTIGTKNVTCTATDSSGNSNQCSFGVIVKDTEAPTFDCPDDMAPSMDEGQSFATVSYKTPTA

EDNWDNDPRVTCTPRSATTFNAGKNTVTCTAIDTARNKKNCTFTIFVEDTEEPSITCPIDMDEPTEVG

VSYAVIDFETPKAFDNADPRPKVDCDRVADSQFDIGSTSVNCTARDNAGNEETCTFTIVINDEESPNI

TCPSRMDKVSVDWHQAYATVNYDPPTVRDNADLSPSVSCVKASGSEFGLGETMVTCTASDKYGNSKSC

KFPINVVDDEKPEITCPDRIDQPTDRDQDYATVIYPEFTVKDYVDDDLTVTCQKSSGSKFYIEDNHVT

CTARDDAGNEASCTFPVVVTDEQPPKLTCPVNMTKSLDEGKPYATIKYTKPTATDNVDPQPEVSCNVP

PDSEIYELGLFPVTCIAIDYAENANSCMFTVEVKDEEKPKITCPMNMAPSTDPGEKYATILYTEPIVV

DNFDPSPEVTCSVGQFEQFQVGSKTVTCTATDSSDNTNSCTFKVTVGDTEPPSMQCRVNNIYKFTDKG

KATATVDYALPDVSDNVDPSPSVSCNPSSNTQFSTGSTTITCTATDSANNVNDCQFSVIVT

Antistasin (PF02822); HYR (PF02494) x 16

Lectican/chondroitin sulfate proteoglycan (aggrecan/neurocan core protein-like); oki.91.45; gbr.124.31

MDMSAFEAIVFLVGLYASQAAVCPADWRQYGDICYFPITTSMSWEEANQACFAKEAQLALPRSQMEQD

AIWGMLQETVNGQFPERIWIACSDFEEEGNWRPCPLRDDGSNAYENWRGNQPDNNNGADCAAMIRSNG

GRWGDRPCRELNFAVCQLPAILMSVPTVCLLINTDGRPASACLVGHNLKEIPVTGVVECGMACRLEPR

CRSFNLVKQGRAGMLCQLKNVTWSEADEKSFMKTQENCYFFEL

Lectin_C (PF00059); PAN_1 (PF00024)


Alpha-L-fucosidase; oki.12.80; gbr.3.112

MTCITLLKIFVLMAFLESILAQGQPPYQPNWQSIDSRPLPGWYDDSKFGIFMHWGVYSVPSYGSEWFW

RYWASNVKSYVDFMKQNYPPDFTYADFAKEFRTELFNASHFAEIVEASGARYYVLTTKHHEGYTMWPS

RYSWNWNAMDVGPKRDLVGEIATAIRTLKTKVYFGLYHSLFAWYHPLYLQDEANNFQTRLYPQQISTP

MLHELVENYHPEIIWSDGFEKTTGPEYWNSTGFLAWLYNSSPVRDIVVVNDRWGEGTVCHHGGYYTCH

DRYNPGTLQNHKWENAMTVDKHSWGYRRNVKLADYLDAEELIASLAKTVSCGGNMLLNIGPTHDGRIV

PIFEERLRQIGAWLKVNGEAIYGSHPWKAQNDTKTQNIWYTSQMNGTTVEAVYAIILKWPTNNHVFLG

APISSKLTTVSMLGVEPPLKWEMGPKGVGMNVTMPTLNPVQIPCQWAWVLKLCNIL

Alpha_L_fucos (PF01120); Fucosidase_C (PF16757)

Alpha-L-fucosidase; oki.11.195; gbr.81.217

MAGYATKLSVVALCIFFGAIHQCLGKRVKYDYEYDYEPNWPSLDSRPLPPWYDEAKIGIFMHWGVFSV

PSFGSEWFWWYWQGQPQAPYVEFMKENYRPGFTYADFASMFTAEFFDPNAFAEIIEASGAKYFVLTSK

HHEGFTNWPSKYSWNWNAMDVGPKRDLVGELATAIRTTAKDVHFGLYHSLFEWFNPLFLQDQKAGFKT

QAYVQDICLPELYEIVMSYKPDLLWSDGDWSATPDYWNSTEFLAWLYNSSPVKDTIVTNDRWGSGTLC

QHGGYYTCSDRYNPGTLQKHKWENAMTIDKKSWGYRREATLADYLSIDELVAILAQTISCGGNLLMNI

GPTHDGRIVPVFEERLRQMGAWLKVNGDAIYASKPWRVQNDTKTKDVWYTSKMEGSLLSVFAIVLDWP

VTNQLLLGAPIATNQTQINMLGYSIPLQWKQAPSGGGIVVTMPALNPAQLPCQWAWVIKMQTVQ

Alpha_L_fucos (PF01120); Fucosidase_C (PF16757)

Alpha-L-fucosidase; oki.11.192, oki.11.193, oki.11.194, gbr.81.214, gbr.81.215, gbr.81.216

MFPQHVLVCCSLIAAVAGVQARNEPNWESLDARPLPSSYDEAKIGVFSHWGVYSVPSYGSEWFWWNWQ

ALCYPGYVKFMQKTRPSGFTYQDFAAEFKVELFDTEQFTDILQASGANYFVLTSNHHEGFTDWPSKYS

WNWNSMDVGPKRDLVGEVAAAVRSKTNLRFGLYHSQFEWFNPLYIQDLKHLFTTNDYVTKVYMPELME

LVETYRPELVWSDGSGEGVYQYWKSTEFLAWLYNDSPVKDSVVVNVRWGILCECNHGGYYTCQDRYNP

GVLQKHKFENAMTLDLSSLGYRRDATLKDIMDIDVLIATLAQTVSCGGNLLVNVGPTRDGRIVPIFEE

RLRQMGAWLKVNGEAIYSSKPWKAQKDVKTESVWYTSKQNETVTAVYVIVLDWPIDNDIILGSTMPTN

RTTVSVLGHEGALKWTRGPSGEGMTVTLPILNPTQMPCHWAWVLKIQNLK

Alpha_L_fucos (PF01120); Fucosidase_C (PF16757)

C-type lectin domain family (alpha-N-acetylgalactosamine-specific lectin); oki.105.24; gbr.330.8

MAFLRVVFAVFIVGLASDLVARCQAGCPTCPPTWTLHFGSCYRLFATPQTFDEAEKHCQQIAGSRKGH

LVSIHNKAENQFVYRMWTTAIVNYNYLWIGMDDRTKEGHFHWTDGSNVDYNAWGESQPDNHNNEDCVH

LRPQKTYADWNDIPCSHKYAYICKMSTTRVGHVTPHNLMVTYLNKPPFAVIEATYLH

Lectin_C (PF00059)

Angiotensin-converting enzyme-like; oki.25.24; gbr.112.9

MGLPRPSFALVALVLALVSLEGAGAKFSPTSAMLELEIIKEHSPSLLRNLCCEDRDQIYDCIIRAWGSI

GGGIHECCSYLRGEQITNEELAAAWLRELDYLSVETAYAGSNFNWNFQTNMTAQNSLYTKNSTMLIGD

FSLEMKNQARQFDTSMFQDPSIKRQFMLLLRGGYLNDRAKRENMTRIANEMENIYGKGTVCRENGECL

TLEPDLEDLMANKRDYDELLWAWKGWRDAVGRKIRPLYPQYVELKNEGARTNSYADESEVWQERYEMR

GDAFEEMLGDIYDAVKPMYQQLHAYVRRKLAERYGKDKVDTNGLIPAHLLGNMWGQQWNNIYDLVIPY

PDVPDLDVTQEMRRQGYTVHKMFKVAERFFKSLGMDPMPDSFWENSMLEKPKDREVVCHGSAHNFFKN

REVRIKMCTEITMEDLYTVHHEMGHCEYYLQYHRQPVVFSSGANPGFHEAVGDTIALSVVTQDYLHEI

GLLDKVSRNKEADINYLMKVALSKIAFLPFGLMIDKWRWGVFRGEIKPESYNEAWWKLREEYQGLKPP

VERTEEDFDPGAKYHIPSGTPYIRYFISFVIQFQFHRAMCEEKGHVGPLHLCNNYNSKRAGQKLRDML

SLGISVPWWEALEVLTGSEFIDPSAIQEYFAPLIEWLKEQNGDNVGWQKHL

Peptidase_M2 (PF01401)

Angiotensin-converting enzyme-like; oki.27.88; gbr.41.46

MSRVLSRMYLSLLLALISAKAGSSLQQARESDPITDQNEANEFLRHYTEQAQIIIYAGALASWGYYTN

VTAYNQQKDAHNQGLVLGTKPQTEPKSRVGKGPMSVIRPELKGTLIGDREVEASLVSAAFRQEAYQNA

SRFDVSGFDEDVKRQFQKIKDIGTAALEPSKVEEYNNVVNKMTDNYSAGTVCKEDQPTECLQLEPGLA

HIMATSTNWDELVWAWKGFRDAVGTPNKPLYKKFVKLANEAAVANGHADMGAYWRSDYESATIVDEAY


KIYDQILPLYQQLHAYVRRIMHKTHGSKVDVKGKIPSCLLGDMWGRFWGNIFGQVVPYPDRPNIDVTD

TMVAKNFTPRIMFELADDFFASLGLLRVNPAFWSNSMIVKPKDGRQVVCHPSAWDLGNGEDFRVKMCT

EVTMDFFQTIHHELGHTQYQMQYSDLPFPFRDGANGAFHEAVGEVMTLSISTPAHLSHPDIGLLEPGS

GTDEETDINFLLKTALNTIGTLPFSLALDQWRWDVFAGDISEDKWTERWWQLKHDLVGTEAPVSRTED

DFDPGAMYHIVVAYPFLGYYMRTIIQFQFQKALCDAAGHTGPLHRCDFYKSQEAGTKFANMLKLGCSK

PWPDAMEAITGQRAISADAINAYFEPLMTWLTETNRKNGEQIGWKAPGNGAILPNVSLVVLVLCLLVT

SRLL

Peptidase_M2 (PF01401) partial; Peptidase_M2 (PF01401)

Arylsulfatase G-like; oki.47.7; gbr.81.161

MQIFIVFLMFLTQTLWSNSILPCEAETVRDEGGVYAEKGQGAKPNFIILFVDDLGWGDLGANWNEDGL

PSDTPFLDELAAKGTRFTDFHAGASLCTPSRAALLTGRLGKRTGVVGNFNVQSCAGMPLNETTMAETF

NAAGYRTGMIGKWHLGIYGGFGPVHRGFDSYLGIPYSDDMGCVDNPGYNLPPCPPCNHSTGYGTLYDL

FLKILAKGTFPCNKMAAVPLIENTTIIEQPVDLAAVAGHYKKHASSFIKQSAQDGKPFLLYVAFTHVH

VPLLFNAKFAGTSARGPFGDTVRELDDTVAGIMEAVSQAGVENDTMVWFTSDNGPWAAKCLYAGSSGP

FLGLWQKTKGGGGSTAKMTIWEAGHREPTFAYWPGHIPAGRVSDSLLSALDIYPTIASIAGIPMPKYR

GYDGMDVKDVLFGGSIYERTLFHPNSMASGALGDIGAVRQGRYKAVYQIGSARPNCLGETSPPRHRNP

PLIFDIYKDPAEEQPLNQSSAEYKSALQDIEASLQAFLEDVKQDNTTVADYRQDPRCIPCCNPNQVDC

RCSG

Sulfatase (PF00884); Sulfatase_C (PF14707)

Arylsulfatase I-like; oki.82.31; gbr.36.19

MMMTTFRLMFPLTILLIFSVIVTEHPVLAKTKNNIRYEEPKSKMQPHIIIILADDLGWNDVSFHGSYQ

IPTPHLDELAYSGVLLSNYYVLPICTPTRSALMTGRYPIHTGMWHSVIIAAEPWGLGMDEVILPQLMK

QQGYHTHMVGKWHLGFFDEEHIPSQRGFDSYFGYYLGKGDYWTHYDTEPLFLYFAHQAVHSGNDAQHA

LQAPMQYYDRFPNITDHKRRMFAAMTAVMDESVGNVTRALKQAGLYDNSVIIFSTDNGGPAHGFDFNH

ANNYPLRGTKHSLWEGGVRGTAFVHSNLLTKPGRISHDLLHVSDWLPTIYNLAGGDSSKLKNLDGFDI

WPTLSRGVKSPRSEVLLNIDTISGVSALRIGDMKIMYGDIEHGKWSGWYKPEGLPPHYVPPPPPAGAFA

VHCPPKPSNASTNCDPFKAACLYNITNDPCEFYNLADWNQDVVASMTQRLNEYKTTMAPARNKPSDPDA

NPDLHGGYWVPWVKND

Sulfatase (PF00884)

Beta-D-xylosidase 2-like; oki.182.61; gbr.167.8

MAAKSRLLGLLSFICLLLSQTFHQSSESSEFPFQNYSLSWDERLDDLISRLQLDEIVLQLARGGVGPNG

PSPPIPRLGIGPYNWNTVCLRGDFNAGNATSYPQPLGLAATFSRSITGGVAIATSEEVRAKYNNFTQH

GIYDDHCGLSCLSPVVNIMRHPLWGRNQETFGEDPFLASEMARAFVRGLQEPSYAPGTEARYLRTSAG

CKFFGVHNGPEDYPSSRYTFNAGFGFKGYVVSDRNALEYVLLKHGYTETPLQTAVAGVKAGCNLEQSDS

AENVYTNLTEAVQMGVVSEDELKQLVRPLFYSRMRLGEFDPPGMNSYSRFNASDMVQCLQYRNLALAGA

IKSFVVLKNENNTLPVGTIQKLAIVGPFANSPWDLFGSFAPQTDPRFISTPWDGLRGLGQFQRLAPGC

NNPTCDQYNKTSIMDAVVGADFVIVCLGTALGKTLVLLLFNAGPLDILWADQSPGVHAIVECFFPAQA

TGAALKKLFTNADPGMPAGRLPFTWPASLEQVPAMSNYNMTNRTYRYFTGEPLYPFGYGLAFTQFNYT

NLTLGQTTIDPCDDLMVHVTMINTGKYDGYEVVQVYIKWHNASVPTPKIQLAAFDRFKATINNTVTFF

LKMPARVRAVFTDELVLEPGMFTVFAGGQQPGQKRQNYSLPWDKRLDDLISRLQLDDIVLQLARGGSG

PNGPSPPIPRLGIGPYNWNTECLRGDVGAGNATSFPQALGLAASFSISLVNSVAKATSEEVRAKYNNL

TKHGIHRDHGGLSCFSPVVNIMRHPLWGRNQETYGEDPYLTGEMAKAFVRGLQGLQGNFSRYLRTSAG

CKHFDVHSGPENYPSSRYTFDAKVSEHDMYMTYLPAFHECVKAGTYSVMCSYNSINGVPSCVNHKFLT

DILRSQFGFKGYIVSDQKAIEYVFLKHKYTHSPLQTAVAAVKAGCNLELCYSAKNVYTNLTDAVQMGL

VSEDELKQLVRPLFYTRMRLGEFDPPWMNPYARFVASEMVESFPHRNLALLSASKTFVLLKNEKNTLP

VGGIHTLAIIGPFADSPQDLFGSYAPQTDPKFISTPWQGLRSLGRTQRLAPGCNNPVCDQYNETAIMD

AVTGADLVIVCLGTGTKVEREGLDRRTMSLPGHQLQLLQNAVKYALGKPLVLLLFNAGPLDILWADQS

PGVHAIVECFFPAQATGAALKQLFTNAEPANPAARLPFTWPASLDQVPPMTNYSMMNRTYRYFFGEPL

YPFGYGLSFTQFSYTNLTLGRTTISPCDDLLVYVTLVNIGKYPGDETVQVYIKWHNASVPTPNIQLAA

FNRFKTTVKNTVTALLRVPARVRAVFTDELVLEPGVFILFAGGQQPGQKRQVGSSVLNTTFTVEGPVT

PLSHCPQ

Glyco_hydro_3 (PF00933); Glyco_hydro_3_C (PF01915) x 2; Fn3-like (PF14310); Glyco_hydro_3 (PF00933); Glyco_hydro_3_C (PF01915); Fn3-like (PF14310)

Beta-D-xylosidase 2-like; oki.116.42; gbr.227.11 MMMFLSQPVAVVGLFVLLVGAVPPLIGTQGKNFPFWNDSLPWDERLDDLLSRLTLDDVTLQMARGGSGP

NGPAPPIPRLGIGPYDWDTECLRGDVEAGNATSFPQALGLAATFSTQLLYDVAQATGVEVRAKHNNFT

KHGIYKTHGGLSCFSPVINIMRHPLWGRNQETYGEDPFLTGEMARAFVTGLQGNDPRYVLANAGCKHF

DAYAGPENYPQSRFSFNAEVSDRDLFMTFLPQFHECVKAGSYSIMCSYNSINGVPACANTRFLRDILR

DQFGFEGYVVSDELAVEFIRLAHRYTNTSLETAVACVKAGLNLELSNFKNVVYLHIAEAVEKQLLSAD

EVFALVRPLFYTRMRLGEFDPPARNPYTKLTVDDVVESERHRSLAVDAAVQSFVLLRNDGVLPLKSIN

KLAVVGPFGNNSEQLFGDYAPNSLPEYITTPLQGLASIAQSTRFAAGCNRSLCTEYNQSSVLQAVSGA

DFVIVCLGSGTSIESEGLDRRNLALPGHQLQLLQDAVKYAGGKPVVLLLFTAGPFDISWAVNSPDVPV

IVQCFLPAQATGVAIRNMFTNEQGANPAGRLPYTWPASMDQVPPMTNYTMVDRTYRYFSGTPLYQFGY

GLSYTVFQYKQLVLKPDRIMPCDNVTLTVTLANVGKYSGDEVVQVYIKWANATVPVPKIQLAAFERVS

IATGKMTSVTLSIPARVRAAYTDRLVLQPGSFGVFVGGHQPGRSVGGHPSSNVMTGSFHVDGPETDLA

KCPK

Glyco_hydro_3 (PF00933); Glyco_hydro_3_C (PF01915); Fn3-like (PF14310)

Beta-D-xylosidase 2-like; oki.182.63; gbr.167.5 MLTAKSLCISLHCVLLLHIITSTAAELPFRNVSLSWDERLNDLIPRLYLDEIASQMTRAGYKENGPTLP

IPRLGIGPYNWVTECLRGDVESGNATSYAMPIGLAASFSVDLLTAVGTATSIEVRAKYNNYTSHGIYK

DFGGLSCFSPVINIVRHPLWGRIQETYGEDPFISGELAKAYVAGLHGDHPRYVRTSSGCKHIFAYDGP

EDIPSPRFSFNSVVNDADMQMTYLPMFHECVKAGTFNLVCSYNSINGIPACASKKYLTDIVRNQWGFK

GYVSSDDGALEYLHSAHNYTKGPLDSTVAAIQAGCNLELTGFKTPVYTHLTQAVQLGLISIEEMTTLV

RPLFYTRMRLGEFDPPDMNPYTKLNVDEVVESAEHQELAVSVAVRTFVLLKHIGNVLPVGKIATLAVV

GPMADSPYDPFGDYPPGTLREYITTTREGLKSIASIVKYAGGCSSPRCTDYDPKEIISAVTDVDFVVV

CLGTGTSIESESRDRPNMDLPGSQLQLLQDAVKYADGRPVVLLLFNAGPLNITWADESPDVHAIVECX

XXXXXXXXXXXXFMTNGPEGNPAARLPYTWPASMEDVPPMTNYSFYNRSYRYFTGTPLYPFGYGLSYT

EFTYNRITVSNPLLKPCDDLHISVTLTNVGHYAGDEVIQVYVGWPDAAYPVPKLQLGAFLRVSTTPQN

EITNYVTIPARVRAVYNETLVLQPGKFMLYAGGQQPGQKRRVSSNVLITGFTVVGPATKLSECPP

Glyco_hydro_3 (PF00933); Glyco_hydro_3_C (PF01915); Fn3-like (PF14310)

Beta-D-xylosidase 2-like; oki.5.153; gbr.137.1 MVDKTCFLSPVVLILLSLACPQLVWLTEFPFQNATLPWEERLNDLVSRLELEDIILQLARGGAGPNGPA

PPIPRLGIGPYNWNTECLHGDAEAGNASTWPQVIGVAASFAADLAKSVASATSEEVRAKYNNFTRHGI

RRDHCGLTCFSPVANIMRHPLWGRNQETFGEDPFMTGEMVSSYVNGLQGLDGPVARYLRTGAGCKDFA

VFSGPEDYPASKYTFNSGATERDLYMTYLPAFHECIKAGAYSVMCGYNKIDSVPACLNQRFLKDILRT

EFGFKGYVVSEKSALEYAFLKDNYTQTALETAVAAVKAGCNLEQSDTPHNIYTNLTAAVQMKLVTVDE

LRELVRPLFYTRLRLGEFDPPSMNPWGRFNASEVVESLQHQNFALEGALQTFVMLKNENNTLPVGGVP

IIAIVGPFADSPQEILGSFAPQTDPKFISTPWGGLGGLGKVRRLAPGCNNPVCDQYNQTAIMEAVTGA

DLVIVCLGTGTQIENVGLDRRNMSLPGHQLLLLQQAVKYALGKPLVLLLFSGGPLDIGWADSNPGVHT

ILQCFFPGQATGGALKNLFTNSQFPVAAGPSGKLPFTWPASMDQVPPITNYNMTGRTYRYFTGDPLYP

FGYGLSFTSIKYINVSVGNTTINPCDDLMVYVALVNTGTVYAYESVQVYIKWHNASVPAPNIQLAAFT

RLRTTMDNPVTVYLRMPARVRAVFTDQLVLEPGMFTVYGGGQQPNQKRQAPSNVVNTTFTVQGPVTPL

SKCP

Glyco_hydro_3 (PF00933); Glyco_hydro_3_C (PF01915); Fn3-like (PF14310)

Beta-glucuronidase-like; oki.29.106; gbr.19.105 MWTFKAAVLGSYFMLMTSAMITDNSVFLNSQLPQIRIPPKPMLYPRESETREIKELNGLWKFRADDSP

SRNEGFSAKWYGYAMSIDISLDMQEVSKAVNIETPFVLSCFVCLFFFLKQTGPVIDMPVPSSFNDVTQ

DRTLRDFVGWVWYDREFFAPIGWRNPDVRVVLRFASAHYNTVVFWFVXQWTVFTGTPSPVTNGVDGLT

VTIVPSYPPGYFVQNVQFDFFNYAGIHRWVHLYTTPRVHISDITVTTDLEGSTGIMNYTVLVGGLTSH

SPAATVELKDPSEGGRVVASSKSLSGVFTVSDVKPWWPYTMSNNSAFLYTLQVCVRNGATSDVYRLPV

GFRTVSVNNKKLFINNKPFYFHGVNKHEDNDVRGKGLDLPLTIKDINLMKWMGANSLRTSHYPYAEEF

LDLCDQHGIVVIDESPGVGIKLESNMGPVSLAHHLEVMMEMYQRDKNRPSVVMWSVANEPDSTLPTAP

HYFGTVINFTRSLDPTRLVTFVLGGTSVEREKVAQWCDVLCLNTYFSWYSDSGHLELVEMQSNTSLWD

WHLKFNKTIIQSEYGADTIPGLHMDPPQMFTEDYQCDMMAGYHATFDILRHNFLTGELIWNFADFMTV

QSVTRVVGNKKGVFTRHRQPKAAAHLLRRRYLSLAEESKV

Glyco_hydro_2_N (PF02837); Glyco_hydro_2 (PF00703); Glyco_hydro_2_C (PF02836)

Conserved uncharacterized protein; oki.116.41; gbr.227.10 MIGPVVLFLAAAVAAFPGAADAAPPDPSMSVVRMLQANEVRVDEVAWTGLSDGQSLYGGSPCSCQGKE

CGCCQSVKIPALGINSKACANVTFLSEQIGAKLSLSIDGKVIFDQTVSVKNPEPICKNFKGRFQICGE

LYKLSVSPEEFEACARLQAKAYGRVIATVDLGCFKIPLKEDELALVESTDTVDEWLDLSPSVSANGPC

SCSHDQCQCCQRIKIKAIKINNNICIKVQFLSSNIGVSLSLTIDGKTVFTKTLSLKNPPPICEPLGVG

KVCISLYDLSLTKDALSGCGRLQAKLLGKTVATVKLGCFKIPLHLALYGRSPEVEGILHMGDLAEALA

GPVMGGELTADNFPLTIALEPYLTADDGEN

DUF4773 (PF15998) x 2

C-type lectin domain family (C-type lectin 2); oki.8.451; gbr.442.11 MNALTASLVLTALVTAVFAGCGPSCPEGYLNWEHDCYKLYDEAKNWAAAEQRCVADGAHLTSVHSAEE

DDFLNQLSQQGTAGNKHTWIGLNDHQAEGSYVWTDGSPTDYLNWHKGEPNNHGKGEHCMEINFFELDG

TWNDHFCDREHRFICKMPPIYD

Lectin_C (PF00059)

C-type lectin domain family (C-type lectin 2); oki.8.452; gbr.442.12 MNALASSLVLTAVVSTVFASCGPFLCPPGYTKWQTNCYKLFDEVKNWAAAEQRCVADGAHLASVHSAA

ENNFVNQMALQGTAGGQETWIGLNDLQTENSFVWTDGSPIDYTNWELDEPNDFYPGEDCSHLNHVASN

GEWNDFYCDQEFRFICKK

Lectin_C (PF00059)

Sialate O-acetylesterase-like; oki.22.70 MRSASFLHACVHILILFTQSRGQRSAPLNSGVLRTHGARHNGHLLNLIRLKFIPQLQNEFKEAFSLFD

KDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREA

FRVFDKDGNGFISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEETCKLASYYSNYMVLQQ

QPHNAVVWGYSDIMATIEVKVGDKMYKASIETQFPSRVGVWKVKLDPMPAGGPYDIVVFCEDWRAVNTV

VLKDVMFGDVWICSGQSNMAFTVDQSFNGSKELNESINYPDIRLLAVKQVLSETPYNDLHGLYEPWSK

PSPETLGSKAATFTYFSALCWFFGRDLYDTLQYPIGLISTNWGGTPIEAWSSPDVLKTCGTRNRSVSK

TNDRLTGPVKGPSMPSSLWNSMIHPLLNFTIKGAIWYQGESNTMDPDPYKCLFKTMINDWREKWHQGT

EGSTDLHFPFGFMQLCTSNSPSSEIIGPFPTLRWHQTYDYGYVPNDVMQNVFMGVGIDLPDVKSPYGP

IHPRDKQDMGTRLSLAGRAIAYGQNVSYAGPYPTSFTVNTTTLTLVIEYSGGKANIRLVGKTGFEVCC

GGSPPCTYYDTWVPAKITGQPTTSSISLSYYCYNRDATAVRYLWRDMPCAFKDCPVYSVENNLPGPPFI

LNL

EF-hand_7 (PF13499) x 2; SASA/DUF303 (PF03629)

Carboxypeptidase Q; oki.38.68; gbr.63.59 MLSYYVIILLELCTPTTLQEVLLISLPSVSRSKPLLRRPDYDLQKIKQEIASYKDVANEIMAYIVNGSA

KGQVYNRLALFTDMFGNRLVGTKNLENSIDFMLNELQKDGLQNVHGEEVVVPHWVRGNESAVMLEPRRY

NLIMLGLGSSVGTPPEGITAEAIVVSSFDELKKRASEVPGKIVVFNQPWVNYGVSVAYRDFAAVNTAKL

GGVASLVRSIAAFSIHSPHTGWQALSIVKQLGLRPKRTMRMVMWTGEEVGGVGSLQYYQRHKANASNY

DLVLESDMGTFTPYGIEIRGSNETLEIVKGIVELLGPVNATTFRKGEDGLDVSYWEKDGVPGGSLLNH

NEHYFWFHHSDGDTMSVQDPHQMDLCAAVWTVVSYIVADLDNMLPRK

Peptidase_M28 (PF04389)

Lysosomal protective protein; oki.231.4; gbr.449.3 MQVMKMNGSPALAVLVCVFCVTSGQPAADEITSLPGVSGSLSSRQYSGYLRASGTIKLHYWFVESERD

PENDPLVLWMNGGPGCSSLDGYLSELGPYQVNDDGMTLRANKYSWNQVANVIFLEAPAGVGFSYSDDK

NYTTNDDETAENNYLALLDFFKKFPNMANKPFFITGESYGGIYVPTLSVRVMANASINFKGFAIGNGL

LDTYLNTETAVYYAYYHGIIGEDIWAKLQKYCCTNGSCSFVTPPNKQCSLALMETQDFTMNKGLNPYD

VTGDCAGGVPSDSFKERTRQVLSYFFEPPHPKQPLKNKVSGSLDSNNIIPCINTTAENNYLNRPEVRK

ALHIPDVVPKWKVCSVLDYHRVYNSMRAQFNALLPKHRGLVYNGDADIMCNFLGDQKFVASLKRTERG

ERRPWIYNQQVAGFVKDYDQVTFMTVKAAGHMVPSFKPGASLQMITNFLSNQPQ

Peptidase_S10 (PF00450)

Cathepsin D-like; oki.21.76; gbr.162.41 MKLVLLAVLLCAAVVRCSRIPLYKMRRTRRILSENELTWKPSKYSASPNGTVRIVDYMDAQYYGPITI

GTPPQKFNVIFDTGSSNLWVPASNCSLLDVACDLHNKYDWSKSSTYQPNGTRFSIQYGTGSCSGILDI

DTVQVGGDKALKQTFGAADHEPGITFIVAKFDGILGMGYPQISVDDVQPFFDTLMAQKSLDKDVFSFY

LDRKEGAAVGGELILGSSDPKYYTGDFHYVDVSKQGYWQFAMDGVQVVKDDKPLLSLCSGGCQAICDT

GTSLLVGPKAEVEKIQMAIGAAPLFEGEYLVECSKIPTMPNVTFTLNGKVFVLTPQDYVLKESEAGET

LCLSGFLGMDIPKPIGPLWILGDVFIGKYYTEFDRVNNRVGFATAIKGVDIKVHN

Asp (PF00026)

Cathepsin L-like; oki.117.13; gbr.131.13 MLRVAVFCVLAVAALGMPYTFNTELDGDWELFKKVHSKQYRAFNEEAHRRSIWEDNVKIIAKHNLEYD

LGNHTYRLGMNSYGDMTSQEFKDVMNGYKTRANPPKATVTFREPQNVKYPDTVDWREKGYVTPVKNQG

QCGSCWAFSATGSLEGQNFAKTGILPSLSEQNLVDCSYVEGDDGCNGGLMDDAFTYVKENNGIDKETC

YKYKAKDETKCKYNQTTGCVKGFCTGFVDIEAGSETDLLAACATKGPISVAIDASHSSFQLYREGVYN

EPQCSSRELDHGVLVVGYGVNSGEDYWLVKNSWGTDWGVDGYIMMSRNKNNQCGIATSASYPLV

Inhibitor_I29 (PF08246); Peptidase_C1 (PF00112)

Cathepsin L-like; oki.7.160; gbr.29.79 MFASTRILQTAFVLFVVCLSLGFLRATSDEVGKAGCGLHWEEWKEAHEKKYDSLTEEVDKRQVWEKNI

VVVREHNSKTGRSFDLAMNKFGDQTHAEMISQMKYPVDPIQIPITPLLNGIAEPPSSVDWRTKGYVTP

VRNQGACGGSVAYAAADTVASREAIHEAKPARVLSAQEINDCCAITHRACLPPIVLDKVFDCIHSIGG

LCMADSYHKSKNFTCNNGTCSPFAKVPNGGVQVATGDEKALAAAVAIEPILVGLDANHTSFFMYRSGI

YSEPNCKTKEPNHAMVLVGYGSQNGQDYWICKNSWDGIPNYLKGESLRKLATDLGRDGWDLGLVLGFT

VGELQIFKIDNKDNKREETRSMLAEYIERTAKGDVLGGLLGGLRAIGRNNLCIKLEKAAEDEGDFK

Inhibitor_I29 (PF08246); Peptidase_C1 (PF00112); Death (PF00531)

Cholinesterase-like; oki.186.13; gbr.181.14 MNLCVFTCLLALAGAAMAGPLIQTRNGRIEGVTETFKEDKYLKVDKEIDIYRGIPYAEPPVGHRRFQP

PVPVNSWSGTLNAARHGPACIQYFISFSGMDEDCLFLDVYRPHTVSKTAAVMVFIHGGAFFFGYGSMP

EYLGQPISAVGDVIYVAINYRLGPFGFLSTGDSAAPGNVGLLDQVLALKWVKDNIQRFGGDPDNITIF

GESAGGASVSFHLVSKHSRGLFKRAIMQSGTSTSFFAYQRSLDYAKNQAKEVGLKAGCPTGTTAEMIS

CLQALPARELRSVAYKVGLAYLPVVDGSFLHDKPENILAAGDFQKLDILIGTMQDEGSLVALVENLFS

FFASKAPPMSHDEFLKTYPGWIYNYGDVANNTAMKQAIETRYVSDSQAADPASDYLDNFIRIMTDYIW

IVPTEVTAQAHLREGNKVYMYQMTHTPTVSIFHIFFLGPKWVGAIHADDLPFVFGNAWIPKVFYKSTK

PLPEERMMSNTIMKYWTNFAKTGSPNDGVVPDWPEYNLDKKQYKDISVYFPTKSGGIRQDYVSFWTND

IPKLVTRSDVISDPTRGIEAFFEVFNLVEQCKAQFDENGLYIRQEKSDS

COesterase (PF00135)

Complement C1q and tumor necrosis factor-related protein 9; oki.86.54; gbr.2181.1 MQLRFAISLSIALLLSLNLRAALTGPVPAMGNAWAKGENGEMRTLRNNCTKGNNKMDLPAKVGPRGQIG

LVEATWEGGDKEHKDKPGKTPVDSAHQVAFTMYMLSSSHTSINENTRLPFSSSITYVGVTRFNFGTGT

FFCDVPGVYVFTFSAATYFYNPLIIHLRKNFDFVISARNNDKMQEEQVSGSAVVVLEKDDFVYLSYLG

VVYSASFRRYTTFSGFLLYPK

C1q (PF00386)

Complement C1q tumor necrosis factor-related protein 2; oki.86.53; gbr.410.13 MQLRFAISLSIALLLSLNLRAALTGPVPAMGNEGSKGENGEMRTLRNNCTKGNNKMDLPAKVGPRGQIC

LVQAIEEAGVKGQKDKRGKTPDDSANQVAFTVYRVSESRQSSNQDTRLPFHLSKTLLPGTSFDFRTGT

FTCNKPGTYVFTFSAARSRSFTLILHLRKNGSILASARNSDKSQEEQVSGSAVVVLEWRDTVYLSFFG

KVFGMFGRAYTTFSGFLLYPK

C1q (PF00386)

Deleted in malignant brain tumors 1 protein-like; oki.96.79, oki.83.49; gbr.504.1, gbr.645.4 MKTLTLFLLILPVVFGHGGIPEPDVTVRLAGSNHYNEGRVEVYYQGQWGTVCDDEWDISDADVVCRQL

GFARATRAVSEAGFGEGTGQILLDDVACSGRESRLELCANRGWGVENCDHSEDAGVVCHGSITVRLVG

GENDNEGRVEVYYQGQWGTVCDDEWDISDANVVCRQLGFAGAARAVSEAEFGEGTGQILLDDVACTGD

ESRLEDCPNSGWGVENCFHYEDAGVVCLSFDVNNVRLVGPDPNVGRVEVYHNGVWGTVCDDDWDIDDA

SVVCRQLGFTNGAARAASDAQFGEGAGPIFLDDVACAGTESRLAHCPNPGWEVENCGHSEDAGVVCLP

DEEPGVTVRLAGSNHYNEGRVEVYYQGEWGTVCDDEWDISDANVVCRQLGFAGATQAVSEARFGEGTG

RILLDDVACTGRESRLEVCINRGWGEENCDHSEDAGVVCHSGTYM

SRCR (PF00530) x 4

Deleted in malignant brain tumors 1 protein-like; oki.96.82; gbr.645.1 MKTLAVFLLIFPMVFGYGFLPDEGDVRLVGRDPNTGRVEVYHNGVWGTVCDDGWDFDDASVVCRQLGF

TNGAARAATGANFGAGEGPIFLDDVACAGTESRLVDCPNPGWEVENCGHSEDAGVVCLPDEEPDVTVR

LAGSDNHSEGRVEVYYPGEWGTVCDDEWDISDADVVCRQLGFSSATEAVPGARFGEGIGQILLDDVAC

TGDESRLEDCPNRGWGVENCGHGEDAGVVCLNSEPVSLRLVGGENDNEGRVEVFYQGEWGTVCDDDWD

LNDANVVCKQLGFASAEEAVPEARFGQGTGEILLDDVACTGDESRLEDCPNSGWGVENCWHGEDAGVV

CLCADVSNIRLIGPDPGTGRVEVYHNGVWGTVCDDYWDIDDANVVCRQLGFTQGAVRAASFAEFGEGE

GPIFLDNVACAGTESRLVDCPNPGWEVENCGHSEDAGVVCLFNEVADSEGAVRLAQGPNGPHEGRVEI

YHDSQWGTVCDDTWYTDYNAQVVCRQLGYAGVEEVKRLAFFQEGEGPIWMDDVYCEGDEAGLADCPFA

GWGVNDCGHYEDVGVVCLTDFSEGDVRLYGPDPNVGRVEVYHNGVWGTVCDDDWDFDDASVVCRQLGF

TNGAARAATDANFGAGAGPIFLDDVACAGTESRLVDCPNPGWEVENCDHDEDAGVVCLPDEEPDVTVR

LAGSNHYNEGRVEVYYQGEWGTVCDDEWDISDANVVCRQLGFERATRAVSEAGFGEGTGRILLDDVAC

SGRESKLELCANRGWGEENCDHSEDAGVVCHGSVTVRLVGGENDNEGRVEVYYQGEWGTVCDDEWDIS

DANVVCRQLGFAGAARAVSEAGFGQGTGQILLDDVACTGDEFRLEDCPNRGWGVENCFHHEDAGVVCL

SFGMYMKKMVVLGACHFIKWHAADSYENCNRSMECNVDDKNLPRYGLGKGFFHVERIVSLFVLVEGNSL

PSKNFAYVLTS

SRCR (PF00530) x 8

di-N-acetylchitobiase-like; oki.86.66; gbr.245.15 MATSFRVVLVFAVVCQGWCLDVFSRSNGDCPCEDPALCNVIKTPPRKELFAFWVGGTHWKEYDWSKLTT

VVMFGSHYEADLMCYAHSKGVRVTLLGEFPAANLTSVADRSAWVMQQVDRAIQGHMDGINFDIEYPLD

ASKAKYLTALVDETTKAFHLSIPGSQVTFDVAWSPNCIDGRCYDYKGIADSCDFLFVMSYDEQSQIFG

PCIAMANSPYNKTAGGVESYLKLGIPADQLVLGVPWYGYNYNCTSLSTKTNVCHIPHVPFRGVNCSDA

AGKQVSYQGIIQLLMQNSTSGRLYNTTYQAPYFNYVDATTGDHHQVWYDNPQSLTTRYKYAQKMKLRG

VGMWNADTLDYRDNPTSKKLTKEMWDAIGKFFLP

Glyco_hydro_18 (PF00704)

Dipeptidyl peptidase 1-like; oki.165.35; gbr.627.4 MMATVKVFLAVVAFLPVVLADTPANCTYDDIVGRWVFKVSAGGGDNTLRCSDPGPVDHTVIVDLKFPD

VAVDATYGHTGFWTLIYNQGFEVVLNKKKYFAFSKYVKEGKKYKSICDETLPGWSHNVVGTDWACYVG

TKNQTKNRPPPEKPVDASKQLYKIDRDLINKINAAQSSWKAGVYPEYEKMTVEEMVRRRGGRASIMAS

KPSPAPVTEAVRNLAKTLPLSFDWRNVNGQNFVSPVRNQGGCGSCYAFGSMAMYEARLRIATNNTKQL

VMSPQDVVSCSEYSQGCEGGFPYLIAGKYAEDFGLVEESCTPYVGEDTPCKKNTCKRYYATDYKYVGG

FYGGCNEELMRIQLVKDGPIAVSFEVYPDFQAYKGGIYHHVGLTDRPGYRFNPFEITNHVVLVVGYGA

DPKTGEKFWVVKNSWGKYWGEQGYFRIRRGTDECAIESIAVETFPIYP

CathepsinC_exc (PF08773); Peptidase_C1 (PF00112)

Disintegrin and metalloproteinase domain-containing protein; oki.137.42; gbr.350.29 MMLPASLFLCLSAMAFAGPPVPSHRFSKDELQRYFGVDSDEKAPEYEIVYPEYATDDMKRSVGRSAIA

ALSLDVYVDAFGETLHMTVERDDSGIKPGLEVEYYTDEGIITEPVQTDCIYTGKVVGEADSLVSATTC

EGLMAIAYHTSGPTYIEPLDDEHAYKRDIGRGLPHVAYKNKPKNGASCPVRSLRCTEGDIKPSSTKYL

ELAFVGDAVLYYRRGNTTQTSLTTLFNAPGYRISWDLYRYLPSAAEWLDDNKYPSSDDRHWDNAVVLS

GMHFNYGILGLAYVGACDSKEAVSVSSFLSFDEATDTAAHEIGHNLGMCHDSEGNSCPPSGYVMAAYE

NEEKIKPIWSSCSRRYYNSFVEKMTCYNDS

Pep_M12B_propep (PF01562); Reprolysin (PF01421)

Dopamine beta-hydroxylase-like; oki.27.87; gbr.41.45 MLAASLFLLYAVTGGFYYPVCSEPHSAFPYSILLVAEGQDSDQAATLFWAVDFEKETVDFRLSVPLIY

GGTLRENGEWFAFGMSPDGTLSNADLVVFEFPQGELKLTEAYTDDTGEVHEDGNDHDYVLLGWRVSPG

VGGNDESPAHLEVEFRRKFDTCDRHDYLIDSGTVNLFYLRGSHSSISSGYIDPREAEIAFQRAQLLKS

TRRTPPLPPNVKTADIVMHNTAIPNQATTYWCRVQKFPDIEEDHHIIQYEAVVTPGNEGIVHHMEVFH

CILPAGVKVPDYNGECEGEDLPAELASCRRVIGAWAMGAKAFFYPEEVGVPIGGSSVSSYVVLQVHYN

NPRLREGVLDSSGIRFHYTSSLRPYDAGVLDSSGIRFHYTSSLRPYDAGILEIGATYSPNLSIPPESN

GFYFTGYCSPDCTDKGIPDRGIRVFASQLHTHLSGTAIWTKHIRNGVEMKELNRDDHYNAMFQEIRFL

QKDVTVLPGDALITTCRYDTSARQNVTLGGFGIQDEMCVNYMHYYPSINLEVCKSTIASTALAAFFKT

VDSRTGDSPHDVNLTSPSAVANCFLHMTWTPSVKLLWQYVINEAPLDIECLKSSGQPFTGKWKDQPPPS

VKIPLPRKKRKCPHKKIHPFQRKLGSSPTT

DOMON (PF03351); Cu2_monooxygen (PF01082); Cu2_monoox_C (PF03712)

Endothelin-converting enzyme 1-like; oki.52.105; gbr.39.67 MILSVICVLCQAVVSLGLPVQDATLVSPGPICLDPECVIDSGVILSALDQTADPCQDFYRYACGGWMD

KAQIPPWDSSISKSFGGLYHANLKTVKTILEADSPMYSALQKARDYYAACMDLQGMEKAGAQPLLKLI

EVIGGWSLLPSLKLEFTTNNPQFLTTLIAVQKITGSPLFDMGVTIDDKNSSRHIIEFVQSGLWLGARE

LYLGGHDDLLAAYVKFGVTLASLLAWDKGLSIAGDFLVQTERKMKEILAFEIELAKISVSMAELRNPW

KTYHKMTLSEFAQLVPDVDVQSYVNGVFGREIPMDEEVLVPTLSYFPKMNELTKRTPQRTIHDYVVWN

LVASLSGSLSQAYREAVLEFTSAFTGTMTVAPRWMTCAGSANQVLGFATGAEFVRKRHSLEIKDKIKA

IVENVRKTFIDRLPSVDWMDNTTKSLAVQKAQAIQEKLVAPSWLEDTNRIDDYYSKLMVNSKSFFNNI

LSAGKFYSEKNLAKYGEPVDRMEWDMVPAEVNAYYTSSMNEIVFPAGILQFPFYSSALPSSINYGSIG

WVIGHELTHGFDDRGRNYDDVGNLHNWWKNASAQAYKERAQCVLEQYSSFKIGDKHVNGLLTLGENIA

DNGGLRLALQAYHSYRELKGGKETRLPGLQDMTPEQIFFIGAGQTWCKLDTPQHAVLKLLSDPHSPGK

YRVIGTFSNTEEFSEAFKCPKGSVMNPEKKCHIW

Peptidase_M13_N (PF05649); Peptidase_M13 (PF01431)

Ependymin-related; oki.140.50; gbr.184.58 MSLARVSTLVLALVLVGTVSAVSQTPCCYPKQFVARLEIIDTVLGNGTVVVFEKKTEMAYDEINEKTAE

ITEVYNDVTGVVEKTKLIYDYKKGKKYIIEGGHCTTKHLSYSFLPQCIPPIATFDETVPIGLGDAFPV

SSFHFLIPSSSPSFNLLVRYSVGAEGCIPYQLGIFAVQKTTRGSSRGLPVMDPVPPSHPMFPWYASPT

AGPYRKLSSAYQYSDYQNGIEDPAYWFDPPSGCFPGLRGSKSTIGGPLEKVATMNKILRNRVL

Ependymin (PF00811)

Ependymin-related; oki.66.102; gbr.184.59 MVSYLSTTLLLACLAAVGADEPTPCCPPKQFTIRIDEETTDLVDGVVTTLEQQVHKAYDATNKLLTDIL

FQYDSVTGATVQSKTIYLFPQGAKYVIRNGKCTKETLHYQFLELCVPAVAKYGHSYTFGLGEYSANLF

YLEIPHQGYNQVYDFVYGADKCIPYQYAVSRKAAGTTSSASSAGFNGTLTLPVPFNQSRLAVGAEVPL

SIRAQYFDYVEGISDPAYWFKLPAECKQLEQQPSEKQKAAAKMMTKKLMAKLNSDKKFVMY

Ependymin (PF00811)

Ependymin-related; oki.227.19; gbr.213.18 MNRCSVVRLVALLAATVLVADADNYCCTAPQFVIRADQLQGLQLPSGQGIAQLNVLDMALDFTNERAGE

EIDSYSMGRLTKIKVIIDKKKNVMYTIIGQNCTKTEARGEFLQCIPKTAHFDGSSYLGDNELTLDAFS

FPLDEPKVVRGNVTVSVTHGNCIYAGTWLVGEATQTEPPIPLVSSTSFVNFKRGIADPSRWFDVPSFC

QQEKSTRARRSVRPVSDDHKDVMKLVSIFNTMMLPDVGHAKVLKPQKPSSKQ

Ependymin (PF00811)

Ependymin-related; oki.227.20; gbr.213.19 MNPCSVVRLVALLAATVLVADADNYCCTAPQFVIRADQLQGIKLPSGQGLAQLNLVDIAFDFTNERLGE

EIDSYSLGRLTKIKVIIDKKKNVMYTIIGQNCTKTEARGEFMQCIPKTAHFDGSSYLGDNELTLDTFS

FPVDEPKVVSGNVSMSVTHGNCIFAGTLLVGEATQTEPPIPLVSSTSFVNFKRGIADPSRWFDVPSFC

QQKKSTRARRSVRSVSDDHKDVMKLVSIFNTMMLPDVEPAKVLKPQKP

Ependymin (PF00811)

Ependymin-related; oki.11.62; gbr.218.31 MQTEACVCSPLYQAMLAMACRVVLAVVCLLLVVEIWAEEPQPCCWINQMTGKFELQQTEELEGGTVVL

ETTNAEFAYDKTFSCKAFIFTETFANGTQRVDRLVEFYNRIEKAVYQGNVTLGDDALSGNTWELNFAQ

GGYVANRTYLLAADYCVPITVNFVLMNFNTDPPEANAGTIAYADIELGICDRDKFFHIPDECPQDIMT

PFLEKISKRRQLLEMQYEAFM

Ependymin (PF00811)

Ependymin-related; oki.11.63a; gbr.218.32a MSFSSLACLLMVVIAGAQANVAQGLPFRGVYNPFSHHADEPQPCSIPRYFTYLQEDFVTTVEEGRLKVE

EESWEGAYDTIAQKYSAKKEERFFNGTDDYSKIIFDEGKGVEYIIEKRGEETVCLVLDIEGQKFPDTY

TFPKNAKFVGEATLGDRDLVVNVWYYPSEDNTKHTVKTYTREECVPIGLRHRKFDPQTGVEIELRESS

LYDIKLGICDEEKYFKVPDECKEGRALKEPTLTMMKIRNHFN

Ependymin (PF00811)

Ependymin-related; oki.11.65; gbr.218.33 MLSSFIVCVLVVSAGAMAVKKPVLNAQTVFGISPREGEDERKPCATPKYFTFDQADFMTTLEEGRLTVE

YVNFHGAYDEVLKRYSVETFLDFFNGTLLHLKGKGYLVQRLGDEIECEVYDLEGKKFPEEFSFPEDAT

FLADSTLGDRDLTVESWYYVSEDGTKHNVKTVTKDECVPVSLFSRKFDPETGEELEVVNGQVINFKLG

ICDPESYFKIPEECKEVGVSKELSKNMKKMHRFGLM

Ependymin (PF00811)

Ependymin-related; oki.11.66a; gbr.218.34 MKDHGLLCVSTIVIQIIDLGNLVLRACRSESWSVALGLKTARHTHHNPYQGPPIAAMYAAVILCILVVA

ASASPEKFIFGNKLDQPAPEKCCAEPYYTFQADSVFNTLQEGSLLTELIRARGAYDSIDKKFGLKVDLH

ISNGTVELYQLINDFKEGLGYYIYTEEEETKCVEFPITSGFPYNCIPEGSTYVGSVTIGDRALRAANW

YFNDKTDPTKDMHIVFSIKEEECIDLGYLARTFDPETGTEISVDRTGISDYNLGICDPDTYFKPPEEC

KSAKVKRVNSVPKRIGGLRGPRGQRLFQ

Ependymin (PF00811)

Ependymin-related; oki.11.66b; gbr.218.35 MWSGLLVCMLVVGAMAFQPSVPFGASQPGERHEPCCAPRYFTFNQVTTTTSVQDGSLLVEYDNAEGAYD

AKFERIAVKLVIDYFNGTEVYLRLIEDYIKGVSYYILEHAGEDICFVGRTEGRFNEECLNDDAQFLSE

ATLGDYDLVLDNWYVVSEDKTEHSVKSVQHEGCVPVGLLTRTFDPDTGKELKVDDSRVLDFKLGICDP

DKYFKPPASCDEGRAVDKPTPQMLKYRRKGLFRRSISE

Ependymin (PF00811)

Ependymin-related; oki.141.70; gbr.60.100c MGLKSVAFVLFVVVAASYAKSLANVKPCCYPDKYEISSGTQAGLSRNGRGTGYSIQSVSAVDATAMKIG

EKGIFFEEDGTAHEFRNIKDYAKKEEYRIDPKGEQCEVEPLEEDMPLCVPENATFSDSSYLGNDSLIV

DSYIYFYNYPGYVVGHQSVGVAKEGCIPTSYTFSGSLGKGRRRTDILTITGFYNYKDGISDEASFFDV

PGYCETSVNKNAELWELLRYKQVPIHF

Ependymin (PF00811)

Ependymin-related; oki.141.71; gbr.60.101 MGLKSVAFVLLVVVAASYAKSLANIKPCCYPDKYEISSGTQAGLSKNGRGTGYSIQSVSAVDATAMKIG

EKGTFFAEDGMAYEFRNIKDYAKEEEYRIDPKRETCEVVELKDEMPRCVPDNATYSDSSYLGNDSLTV

DSYIYFYHYPGVVVGQQSVGVAKEGCIPTSYTFSGSLGKGRRRTDILTVTGFYNYVDGISDEASFFDV

PEYCEESVNKNAELWELLRYKQVPIHF

Ependymin (PF00811)

Ependymin-related; oki.141.72; gbr.60.102a MDIKLAALLLLVGLATSYAQPAEGPCCFPDQFVVGVDSEAFLGQGYPWQLEEEQSRESASKIQPEGVVL

GSETAVGGRGGTKRLAIVGQSAVDVNKNMIGNEFTLFQSDEERPSRQRLIYDYEQGFQYIIDTDELKC

SKTKLQGDIPRCVPPGVDFNTTVYLGDRQLFIDSYHYQIQQLFKNGRSSVSVTKEGCIPNSASFSGTT

FRTSILSFAGYFNYEAGIADPDRFFEVPEYCPTVRL

Ependymin (PF00811)

Ependymin-related; oki.141.73; gbr.60.102b MDIKSAVLLLLVGLAVSYAQPAEGPCCFPDQFEVGYGSETAVGGRGGTKRLALVGQAAIDVTKNMIGNE

FMLFQSDEERPSRQRLVQDYEQGFQYIIDTDELKCSKTKLQGDIPHCVPPGVDFNTTVYLGDRQLFID

SYHYQIQQLFKNGRSSVSVTKEGCIPNSASFSGTTFRTSILSLTGYFNYEAGIADPDRFFEVPEYCPT

YIEKEPKLLELLKYSQVLF

Ependymin (PF00811)

Ependymin-related; oki.141.74; gbr.60.103 MEVKSVFLLLLVVVATSLAQKKCCFPKEFEALDGQVVGTISAGKPVAVLESIQFAFDYFNQRAGEFAFI

QDGAEVYMYQIIVDYKAQTEYIIQAHTQTCQKIPLPAGTNMSHCVPDDATYESSFYVGDNKMTADSFT

YSLKQGVVAGNVILSVSKGDCIPYSVTFFGQHQGTPVLQVTGFVNYTSGIQDPARYFTVPEYCMEQSF

SQAPKMYNSFIHLFV

Ependymin (PF00811)

Ependymin-related; oki.141.75; gbr.60.104 MSILSVVFLCQAFLAISYAQKKCCYPDQFVSIEGIEIGLSQSGKGSATVGKIQVAFDYTNKRVAELGIM

TTDQKSEELQGIADYSKGVQYFIQPKARECIKVPLAGPMPHCVPDNATSVGSIYLGNHKLTVDVYNLP

VDEGIVSLSVTHGSFLAVTGFFNYTAGIKDPSKFFTVPDYCPKSFLEVPVFQKLPSYKMFF

Ependymin (PF00811)

Epididymis-specific alpha-mannosidase-like; oki.12.113; gbr.3.77 MAIHPLLMLIAAVTAVCTAAPTSMTPPMESATPTSMLVTLHTYYAIDSELNELRRTQKTAYDNYKMDRF

HDEDIYISSTMVNIAGMPDLSGLTNADRFAMNESDLLQLHGSNLQIFRNLMFFVEADEALSNSPINFGN

NFSTIRVRLNRVVAMLEAVMEQRDFFQTGFPTPPAVYLQPNGNELERNTRDLFVLQEMHSYLHVAHYDL

YTWPVVHASLARAQSPTGGSNGDDATPTMIQAFIIPHSHMDVGWIYTVQESMEAYAADVYTTVVANLV

KDSKRRFIAVEQEFFRLWWTTVASDTQKVEYFRTVMNQGCFIQTHDYLHSSRKVTLFACFFFLPEGHS

FIYETFGKRPRFSWHVDPFGASSATPTLFALMGFDAHLTSRIDYDIKEQMQKNKGLQFMWRASPSLGE

SQQIFTHVMDQYSYCTPGRLPFSLKTGFYWNGYAVFPKPPPGVSYPDMSLPVTNENIKKYAAVLVNNI

KQRAAWFRTQQLLWPWGCDKQFFNATIQFENMDKLVAYINQNAKSLGVQVQYATLGEYFQAVHQTNLS

WALKQEGDYLSYSSAANAAWTGFYTSRSALKGIARRAQSTLHAGETLFSIYLHQPKLNRTVNTTEVLN

SLQGLRWASAEVQHHDGITGTDSVKVKGMFEDHLETAENKTLASMKKVFQDLIKNPGRDEEEPDILTS

VGPGRILDVNRDKPLAPILVYNSLGWSVKRLVQTSITDPNVTVIDIHGRDIGSQVNPPLEPGGPYHLF

FYAQLAPLNLDVYYIKYKTHPSDTTAHQGKLESLGAVALDITPSKDTKPMQGADVKSISNDCFEVTYD

TTTNMLMAITDKKNGKTIPMEQVFMEYYSHYNAMFGQTSNLYVFRPWGAGPHTAGESAKLDIVTGPYV

NETRQSIFNMYDPKNSRFVVTLRIFDLPHSHNDDIVCGHIELDFKVGPLMPNKELVYRFSTKLDSSRV

LYTNDNGFQTMKRTWRPNKPEPEAQNYYPLVSTAYIESPKDDIRLTVMAERSHGVGSLNNGQMEVMLL

RRLITNSGYDDKNNLTLEEPEVAMPTLWLLLGNRTHSSELQRRAWLHLENPPIIMAVNQNPEVLQKKL

KGKSPNPLPTVLTDLPLNVHLLTMKIPGWTYKTSHKEHLRSLQARLRQGSYDRSEEDPNLDRILLRLQ

HLYEKGEHPVLSKPATIDLAKFLSPLGTIASISERSLTAIWQADQVKRWTWKVKDDPSASQQFNNSGA

ATPRARNTTIFTLNPQEIKTFFITLQKTEA

Glyco_hydro_38 (PF01074); Alpha-mann_mid (PF09261); Glyco_hydro_38C (PF07748)

Fibrillin-1-like; oki.2.22; gbr.73.44 MLKSLVLTVWIVWIVLLNTIPESVDGDVTLEMLLTTYTGNDRDFDNNCCDFCGFFNGDSCDISFDISIN

NMGGTPIYVLSTGLIQNDAESVTFGSVVGTSPNPIQMTFSSWPWRVKMVLDVYDDDSINVIGGQPELVD

TYETTITHVPEANSSIAVTHYYSETGTRANNPTTLTFELKVYCNSNYYGTDCATYCQATDGNSGHYTC

HPLTGDKICLDGYEDPTTNCLTETDECLSSPCLNNATCTDQINRFTCVCSDGFEGTLCETNTDECASN

ACQNDAICLDEINGYTCVCPDGFEDQINRFTCVCSDGFEGTLCETNTDECTSNPCPNDATCLDEINGY

TCVCPDGFEGTQCETNTYECASNPCQNDAICLDEINGYTCVCPEGFEDTTPDIRSTTQEQVETLQTTS

QSATFDFTTDNAEISASLGGHTYLTRLQFSTDEPSNGVTTVVHPLRTEQELATTVPVFRPTTRKHTTSL

ETTVQPSTTVANVVGTESATLLEDQTQSTISSSAAFTTHRTTDASRSATEPVISPTTASEACGSQPCQ

NGGVCSDVFEGYECSCPLGFVGADCELIDLCIPSPCYNGAKCSMNSHQNFTCTCARGYNGALCADDID

ECEEEGDCPDRSECVNSIGSYTCVCIQGFKGQECSERDFCADGPCLNNATCVSVVEKFSCECKGGYTG

TRCETIIESPCQPSPCLNGGECMFGQGDGQPQCLCGQQFLGENCEIDTDIFSCKMYVEGASVDTETFK

TDTANLIKSTMADNAGSYSTVEVVLVSTADYESAETGEPVTLVTYLVLVNGTALTPAEVTDLMEGTSDD

VMDDIVSYKPFKGNVSARSEMVRAKKQKQKVHSVDECDASKEVMSVQAEVYLASHLDGNVEQSTV

DSL (PF01414); EGF (PF00008) x 5; EGF_CA (PF07645); EGF (PF00008)

Ficolin-2-like; oki.73.19; gbr.259.26 MMKFCMLLLLACCVVGIRAENGNKHQSTEPSNPEAVQSAVTSEVLKLIEVAIKVAQYTDSAEKQKAVGW

VQERLSNIKDLLDSTNKQPTEPETVYYEDCSALLSEGFSESGLYDIYPAYPESTDPMQVYCDQETDGG

GWIVFQRRVDGSESFERTWDEYRQGFGNLEGEFWLGNDNLVLLTVPGQVSGGSGGNVELRVDIRDWAD

NTAFAKYLDFSVQGSEFTLTADNFAEESTAGDALDYHNGMDFTTIDHDNDRASGGNCANWMRGGFWFN

YCLTADPNGPYLPEEGEHGALSPGTDQGVTWTTYTGHGYPGREYSLKGTEMKLRRAPVILS

Fibrinogen_C (PF00147)

Gamma-interferon-inducible lysosomal thiol reductase; oki.287.11; gbr.460.4 MMGYRSTVAVLLVVFATQSARGMSCTFPPELWCSSSEIAESCHVVKQCSEWQSPVKDAPKVNFTLYYE

SYCPDCQLFISGQLHDAYMAVSEIMNLTMVPYGNAEEKREGSKYVFQCQHGQEECRGNVLETCILHFA

PFQTAFQTIYCMEVSRDPVTYARECMEKMKVNPEQVFACANGSLGNALEHQMALKTDALKPPHQYVPW

VTLNGVHTEKIQNEAEMNLKKLICDTYQGAKKPPACSQDKPKLRSRMRE

SapA (PF02199); GILT (PF03227)

Glutathione peroxidase-like; oki.208.14; gbr.467.2 MATAASAAMPFSALLVVLWTVAALSTAATGPLESVCVREGSASVHLFSLGSLNDTSPPVPLSRYAGKVL

LLYHQLNALAERYEGMLEILALPCNQFGLQEPGENDEILNGVKYVRPGGGFEPAFPVFAKIDVNGKKE

HELYTHLKSVCPPVKLEIGDKSKLYWSDIKIGDITWNFEKFLVGGDGQAYKRYDPSIHPKGIEADIEG

LILRERLRSEEERRDFEAFLHEKVY

GSHPx (PF00255)

Heme-binding protein 2-like; oki.355.13; gbr.476.10 MAISQGSLVLLLALTGFIVCTGFSINKHDKGESGPPFFCHELECPKFKEDYNSSDYQIRRYETSKWVS

TTITGIDYQAASEEAFLRLYEYIQGQNDQKVKIPMTVPVINSVQPGLGPVCASNFTFSFFVPFEFQSN

TPKPTNPELFLTTLDQHKAYVRVYSGFTNEKVFPKEAAALAAALNSTQTYDKSYYYLAGYDSPFVVHN

RHNEIWFIATEK

SOUL (PF04832)

KDEL motif-containing protein 1-like; oki.156.24; gbr.130.23 MSRISRTFVVTLLCCVLACCAVKDVVTGYDRERVVCPLKSRVWGPGLEANFNVPARFFFIQAVDFENN

SFTYSPGPNAFQVSVRPTSGRGRVWTQVLDRNDGSFIVRFRLYESYPGMRIEVKSGDRHVQGSPFLIS

DPVYDEKCYCPEQIQSQWQADMRCRSEMHPQIEEDLSIFPAIDLDRLATETVNRFARHHSLCHYSIINN

RVQLPDVEFFINLGDWPLEKRGADHSPLPILSWCGSDESRDIVLPTYDITESTLETLGRVTLDMLSVQ

ANTGPKWVNKTNKAFWRGRDSRRERLNLVKLSREQPELIDAALTNFFFFRNEEAEYGPKVKHISFFDF

FKYKFQINIDGTVAAYRLPYLLSGDSTVLKHDSVFYEHFYKQLEPWVHYIPFKKDLSDLVDKIKWAQT

HDEEAKTIAQNAQQFAREHLMSNNIFCYYFQLLQEYAKRQTGPPVVREGMELVEQPQDGTPCQCSRLS

PIVVKATTMDHWMRNKWIEKVLSKFDRNAVTPISDDQEPLACQVLNIKVLDKVVGSIGGVAEVHDCELY

IKALFSVDAIRRFEEREESTSCAEFVELRNSLLDLTRYHIMVDVQQQIEKSDIAICVEDFAMTVYKALP

GTWRTPLQPAIHHPTVAHDLRRMWRAKYAPDEPDTCQSQDQSQDSPDKCTLSVLLEAMENTSQQASELS

EAEQDQPIEGPTGTLKSPEESEETCASRVAFELGSAGCDESGQGTVEALEKAVKEVETKLIVAKNFPEL

AERIPEYLRDAQGFSVEDLPQDEELDLSRSELLECWISEEALEKLCGIEEWKPDYIPPSSLPVVSSSSS

EVISFLTEPSADNQAEKMPKDRLEDPSQDGVTQCGSKRKLSSWQIGESPHKAAKNSVSGQGGVVCAGDV

DCEAGADLGESSEMVLDIDDPDREVVTEPSDKQTDAEAASKEGRADRRISCNCESSPEEEGLTALCQCL

TPLEEALGETENQVNQQDESLVSNTDSTDTSLVVSCEMPSQSVNSDPDKDQEDLVIFHKLLNVSSEAES

LLGVNDMPVTVDSSMSPAQTGRNETLQCSGERPVTTSCTSDTNSKDAAEHVEPSNTIDEPEAAGQMADT

SAEVLLVDAEILNVEHVSNSSGQRDVDTSPLPAALKQRANSTPSKTVSEQVTDDPAKPQQVPDKCKEAS

NTGPSCLPSTTRTVQTNWSVNVIPLKKTVSSTASQKATSESRDERPGHRPITQLRSPVILPSISTPVHS

LTSHQPRRTQQSETASSLNTNEEALKSASSREAPPGGARSLMLHSVAF

Filamin (PF00630); Glyco_transf_90 (PF05686)

Laccase-like; oki.97.25; gbr.140.46 MTGFVTAVVFLTVVSVVTALRPADDHPCYRQCDFTTPMICTYNWTVNWYYALSNQICSGDEVVVSVTN

NLQNNEGVTIHWHGIFQNGSQFMDGLPMVTQCPIPSPGNFEYRFTPYEPGTHWFHSHTGLQRSDGLMG

PFIIRESRRTDPHGDLYDLDVAENVIFLNDWHHDTSVSEFAKSQWGGGSRVDSILINGKGIAPGISPP

HPPLEVFNVEQGKRYRFRVINGASSFCNMQFSVQNHTLLVIASDGGPFQPQAVEYFRINGGERMDFVL

NTNQTVDNYQIQVIGLPCGNTVPSKQIAYLRYDGANDPPAINELPNITTYGNSSTWWQAFPGKSFAQV

DSADADYRSTEGAEDRQEYIQVGFGSRRDPTTNQLMTYPQLNDITYYFPTSPVLSQFGDLPENIFCNQS

DFPQGECAGEKLCSCVHTINVGEKERIEVILVNSDDFGVLHPMHLHGQQFEVVAMERMEGVTIPLVRE

LLANGSILRNNNGPRKDVIIVPDMGYTIFQFQGWNPGWWIFHCHIEFHLEVN

Cu-oxidase_3 (PF07732); Cu-oxidase (PF00394); Cu-oxidase_2 (PF07731)

Lysosomal alpha-glucosidase-like; oki.157.9; gbr.498.9 MQSLRATGCHRAIFGIAFISYLLSGVTLTSGMQCSGIASSYRFDCHPEEDATQQECQRRGCCWLASKN

LDEGVPYCFYPSDFPTYQMGSPQPTAFGYTVMLKRTTKGYYPNDVMQLQMDLYFETSYRLHFKIYDPK

SKRYQVPVPTPKVSKRVPSTDYKVQFSRSPMGLIVTRRVDGTAIFNSTLSPLIFADQFLQLSTSLVTP

YIYGLGEHRGRFRVASDWETYPMWSRDQPPHVGDNLYGVHPFHLGLEGSSGNSHGVFLLNSNAMEVVL

QPAPALTYRTIGGILDFYIFMGPRSDQVVQQYTEVIGRPFMPPYWGLGFHLCRWGYGSSNRTLEIVKQ

MRAASIPQDTQWNDIEYAVGRKDFTVNNGSFAGLPELIGYLHSVGMHYIPITDPGISSTQPAGTYPPY

DTGIAQDVFIKDSNGKPIVGRVWPGSTVFPDFFNNKTKSWWLDQIRDFHSKVPFDGLWIDMNEPSNFV

NGSVNGCPQNSWNNPPYTPAIVGGKLSAMTLCPSATQAIGKHYDLHSLYGYSEAIATHEALVEIRHKR

SFVISRSTYPGSGKYTGHWLGDNYSLWPDMAYSIAGILSFNLFGIPLVGADICGFNLNTTEELCQRWM

QLGAFYPFSRNHNTLGAMDQDPTSFSQDMQTSTRKALQLRYSLLPYIYTLFHFAHTQGSTVARPLFFE

FLGDPQLYDVDKQFMLGSAIIVTPVLEKGATSVTGIFPKGVWLGVYYEFSINLFDMDYKNLGTHFQVD

STQPVTLPAPLNEINVHVREGRIVPLQLPSGGNPTTTTQYRQLAFTLLITRSDRSPGTGQLFWDDGDS

LDTYESGNYLLVEFASDATMMNSTVVHDGYSGSQPVTVETIILGAVPRPVSQVTINGTAVPFHYLPDVQ

NVYVDKLKVPMQFNFYVKWKYQTQE

Trefoil (PF00088); NtCtMGAM_N (PF16863); Gal_mutarotas_2 (PF13802); Glyco_hydro_31 (PF01055)

Lysozyme; oki.43.110; gbr.99.40 MMRLAVLPVFGFVVLMIAEPCMLTSVNASAGPVPYNCMHCICIVESNCKMPNPVCHMDVGSLSCGPYQ

IKKAYWNDARLKGGSLMGDWKKCTATFTCSEDAVQGYMERYAIYSRLGHNPTCEDFARIHNGGPNGFK

NPATIPYWDKVKNCLERK

Destabilase (PF05497)

Melanotransferrin; oki.3.93; gbr.11.89 MAQAAIFLTTVLLVLCLHHATSQVTEMRWCTTSSHEEEKCLAMKNAFASNNLKTLNCVAGESAMHCMR

LISTNQADLITLDGGDVYVAGKEFNMIPIMQEVYAGNDMGYYAIAVVKKNNTGFGLRDLQGKKSCHTG

VRKTAGWNVPVGYLLEAGYMIPTDCQDDIRSTGAFFSESCAPGALSSEYNPDGNNPESLCALCQTTTP

IKCPRNSNEPFYNYGGAFRCMAVGGGDVAFIKPVTITENTDGNNQADWAVSLRSQDFQLLCKDNTRAE

VGQHESCNLAFVPSHAVMASKNFDSAVLQDFRAVLGQAQELFGPDTNTNGFSMFDSSLYGASNLLFKD

STQMLADVTKEYDAFLGADYLATLKGLDKCPDGTLRWCAISAQEKSKCRAMKAAFSGAGITPNISCYE

SYSADACAVDIAGDEADLVSLDGGELYEHGREGRVAPILAEDYGTGDPTARYWGVAVVKRSSSFTIND

LKGKKSCHTGYMRSAGWVVPIGFLINRGDIVSSHACDIPKAVGEFFSQSCVPGVLEPSNNPFNTNPDN

LCALCKGQGENKCKPNHNEPYVGYDGAFRCLVEDSGDVAFVKHSTVPSNVNGDQSWNSGVRKEDYQLL

CPDGTRKNIDDYRDCNLAKLPSHAVVTAVGKTTSQRDAMKTVLKSGQDQFRFDNSPQGMFKMFDSAGY

GANARDLLFKDVTLYLNDTPTTYDQFLSQEYRDALDVLYCNPSSRPSAAAGLLPSLLVMMLAWVMHRL

ARG

Transferrin (PF00405) x 2

N-acetylglucosamine-6-sulfatase-like; oki.277.11; gbr.290.3 MDHLWRICLVMSLVITVYGNGKRPNIVFILTDDQDVTMNGMTPLVNTVSLIGKQGITFNNMFVSSPLC

CPSRSSIFTGNYVHNHKTLNNSISGGCSNKNWQSGPERSTFATYLKGMGYSTFFAGKYLNQYGTKLAG

GVSHIPPGWDEWNGLVKNSKYYNYTLSVNGKAEQHGDDYHHDYLTDLINNRSHEFLEKQSESTPPFFV

MVSTPACHAPFDSAPQYVQNFTKNAAPRGPSFNKAGKDKHWLIRHAPNPMAKSSVTILDDVFRKRYSR

QFSLPMDKRQLYEFDIRVPLLVRGPKIKAGTVTDHIAVNIDIMPTIVHLAGDPAPQNVDGISLVPILI

PNITESDEDKRDCLTKNKEENTVEDCPNTDPHTSFQEYYDLNKDPHQLTNTIKSVKPPDLAAMHKLLVL

LQLCKGDNCRQLIPPHTRH

Sulfatase (PF00884)

Nucleobindin-2-like; oki.21.120; gbr.122.8 MERWRSLVLLLALLAVCRAAPVLPEKTDTDDEGDIEDSEHTGLEYDRYLRKVVKTLENDPVMKKKLDDL

SLDDLKTGKMMEILDGVSDKMRARLDDLKRSEIQRLRQVARQRFQQANHRPMDQKQIEDMTGHLDINNM

DQFGGKDFEKLIKKATEDLDKADKERKEEFKRYEMQKEIERRQKLKEMDEAKRKEAEQEHEERRKKLKE

KHMKHPGSKPQLEDVWEETDHLEREDFNPRTFFNLHDTNGDGTLDAFELEALFVKQVEKMYAERHNSD

PREKFEEISRMREHVMNEIDKNKDKMVSRKEFMEATEAEDFDEDKGWDDLEEQDIYTEDELETYEKNI

QEEMKKLKLRLQQEHDQYKENRVPQPGDPVVMNQAKAAISNPQQLADQVIQAAADHEAEKFHQASQQMK

KLVDEVANQVKANAQGGQKQQDNTQHQQQQQQ

EF-hand_7 (PF13499)

Otoancorin-like; oki.67.66; gbr.101.44, gbr.65.74 MMQLFQLSLLVVFLAGLKYLEIRALERDDQWRSPAATRGHEWHEAFDPKEHDTDVLDDSRLHEVAKAYG

DFFVPDGSGRTDLNMSAVMYQAEKQQREAMAGGGLGRPAFIKQPILPNTVLKRISPDIIRNMTASEVFQ

LQKVTEMSYESLAHLLSNLQPQEFDKLIALARNATNFETKIPDPTIAKAVIDSCHRQWGPVSEWTTKQV

SKIGPMLTSLNIEQVKQLNEEVLIEVIRTFARVGFRGPATRTLVLKAKRSWGAIRTWNPDQFATLGPLL

IYLTPRDLKHIQPAQNVTSLLPLLGSLELERGQARAIISIAASSPDWRWTLEQVQSLGKMRQYLDPMEL

KAAPASAFASPELLEEMLPSNRRGHHRQTKEIARVLKESKGDVSAWGTEDFRSMGRAASGLGVSDLEQL

DPSVVQGAIEDIADADYSPRQRKVLLRKYQRARGVQNTAMSAAEVRQMKGLAADLSTSDLAHMDPEDIR

ESVDVFAKNAKRMKKTQKREIIKQLKEAPGGIKDAIRDMGDMVKELPLKDLDNLCTANFTTMADQNSTS

VAAAGLMNWTDGQSMKLFRCFKDEVLGSGEAEGADLPALPTPTTIRYLGSIARGMTCSDINGFVADDIL

PTVGSMMEQEGWSPRQLDCTHRKVRSSLSEAHSDYTADFTETEIASLTGQLLKEFSVVELDTIPSSHCE

MLYSEISEEDLMGLKREKRKVLTSRALQCLGVDGELDTLEQDTMDVIGNLACDLDADALRRLSSSTFTD

NLYSLQKCCLDVDQLVVVGERLVQELGSPNQWLSDTISDIGPLLVSLSELEIQSLEEDQFSLVAEDVMV

RFAEYKDKWRRHCDIDLQPQDISSREEGFVSVAIKAKDALVAVSQADGSRRKRRESTYSPTCDEIESLG

DGNIAWTVDELGAMSADTFDSCGYALGEVTGFTDAQLAALLNKAKEAWGQAADMTPDQISQLGHIASK

FTPAEISQLNLTETDTVYAIAQYHIYTTDQLGAGVARFLELSGVSVASLDSLDLTALGNFLCGLTVVE

MASIPSSAYQEAASTIGDLRSCDAGQWASLKAKAVEEYGAIDTWMPEVFAEVGSLVAGFTAEELSSLS

DVSIAGIKPHAVSLIGPQTLAAGWSSSQLSKLDQLQAEAVTEEQLAALSEERRSALLDAEYGDDVSLA

EMDEVTEEENDGTQGKSAGYQSTGLVPTIIAMCNVISTAMQ

Mesothelin (PF06060)

Ovoperoxidase; oki.26.138; gbr.13.85 MVIFSSFTLLPGQGQDTSQRLLLLALLVSLGSCTLLDKDVGMLEDLEDLLLKDENMKGYSNQKNGRTPF

GLWNQFNSARRSPMNRKKLQTYLKMLEERDPTFTRVTERLTAHVTXIAKPEQGAAETLLIRVVDNAYD

DGLSKPRTKSVAGGPLPNARNASRAVFDNRETTLSDLTTLAMHFGQLTDHDLTAVHTPSDVNCSDCSV

DGECFSILINNADPVFGGVQACFPFVRSNFETDSSGVRQHINSITGYLDASFVYGSDDASALDLIDSN

GFLLHDTDGVTGRQLLPPDVDLDLCAGVNETEGIYCGKAGDGRAPEQPGLTALHTLFLREHNRVADAL

LSLDSTLSPFNVYQTARKIVGAEWQHIVYNEFLPLIVGADLYASEGLSPDSTYAYNPAVDASSANVFA

AAAFRFGHTLVPFDITRVNRNYRPRFDEIKLTEAFFNATYIYDESIPDGAVDSILRGMTVQNSQKVDQ

HFSDAITDNLFGDPNKEGDGFDLTALNIQRARDHGLPGYTTIRKDFCELGEINSFRDLFDDGVMTRQN

FRNLRDTYADVRDIDAFVGFVLEKPLPNALVGPTLACIFADQFRRLKFGDRFFYQSLGQFTPAQIQEI

EKASMARLLCDNVEAVDEIQPYIFMKARNFGNLERRQQGKRGPYTSFYEYSRSGAWPHKRETVLEGLD

NRRVSCTATSGEIPVVDLSKFV

An_peroxidase (PF03098)

Palmitoyl protein thioesterase 1-like; oki.142.20; gbr.18.23 MDMELQNLAFFLVVLSPALATTPLVMWHGMGDSCCNPLSLGRIQTLVEKEVPGIYVRSLEIGNNIIED

TLNGFLMNANKQVEMACQKIRSDPKLAKGYNSMGFSQGGQFLRTVAQRCPTPPMFNLISVGGQHQGVF

GFPRCPGNYSSICEYIRKLLNFGAYLPIVQAEYWHDPLNEEEYRQKSIYLADINQERKVNETYKTNLQ

KLKKFVMVMFGNDTMVQPKESEWFGFYEPGQDKEVYTLKESPLYTEDKLGLKEMDEQGKLVFLTSYTD

HLQFTESWFVQNLIPYIK

Palm_thioest (PF02089)

PC3-like endoprotease variant B-like; oki.102.30; gbr.67.22 MGLRALSPLLLVIATLLGGTATTDNFGEEREFHNEWAVEVPGGEEVAREVAEEFGFTFGRRIGALENM

YSLQKDNHERRSRRQAVDVTSLLVEDARVAWVEQQETHFHEKRDAAAVSHNQTGDDYVPPFNDPRFPD

QWYLHNDGQNDATPGVDMNLAPVWKLGIMGQGSVVAVVDDGVDGTHPDLQANYDPLASWDFNGNDSDP

HPDDRAEVGGKNNHGTRCAGEIAAAANNGICGVGAAPKAGIGGIRFLDGRVTDMMEAEALTFNNQHVD

IYSCCWGPSDNGKTMREPGKLMTEALAQGCREGRGGKGSIYVWASGNGGHNDDDCGADAYVGNIHTIS

VGSINDKGESVYFMESCPSTMGVILSGGLNDRSSLGEMKKRQNLVITTDIHDACIDNFIGTSSAAPLA

AGLLALVLQANPNLTWRDIQHIVVNGAHIPNTVESGWHVNGAGFHVNEKFGFGMLDAGKMVELALTWS

NVGKQKICEVPTFKIDKSVLQGTSYLTHLQVDCPHMTKLEHTLVRISFKAPRRGDVSLKVWSPFGTPS

ELLSRRRHDNSTDEVVRFPFMSVRNWGENPTGTWGFELMYHFTPPDVQPKPKWPDIPRQHDSMVELMD

VQLILYGTADGEDEGTSNSDDANPEVNTEFTGRLDSDEVVDIFNDEQTDDEEIVVDLDDLAAGVRPPPN

TAHNLDKNREGVIDTGQFDKVLQDPEVRSLIHDIYVHKKAEALRRLAAKRLHQDKRYHDQRQRSQDLDR

AHDEEESRREALEHVLELLHDGGL

S8_pro-domain (PF16470); Peptidase_S8 (PF00082); P_proprotein (PF01483)

Conserved uncharacterized protein; oki.1.62; gbr.2.191 MRALVAAVLVLGLVGFQMTNAWNEDGLDFLERFLKEEVAETKKSACSTNLPIVGKITWQSSTLGAYSS

NKAVDGSSSSNLYPSQHCSHTITNAKNPWWMVDLGSNHCITQVRILNRGDCCSKRLEGAVVRIGPSVT

ATENWACGSPVTAAQAAPLGGTIEFTCQPALKGRYVSVDIPGSATLQLCEVTLEEIPQGQCPDSQPFD

IVGKPAEQSTTYDKRFTANKAVDGSSSSIISHCSHTGERLQNAIVRAGTSETATANQACGAPITANQAQ

PLGGTINIKCDRPLRARYVSVDIPGTATLQLCEVSVEVLSSPDC

F5_F8_type_C (PF00754)

Peptidase inhibitor 15/16-like; oki.208.17; gbr.334.16 MLRFGDVFFLLGLYLFWGAGVNARVDADGTYTAVSLTPEEKDVFLNAHNELRSNVDPEAANMMFMNWD

KSLALMAQAWSAKCIWDHGQPTPNISPFTSLGQNMYLITGYGNRPSGRAVSTFWYNENRHYTFETDAC

SGVTCDHYTQLVWAKSRSLGCGMAFCKSVSNTKWRDVWIVTCNYGPRCPRFSPLRPAYRSSIRTTASA

PPIPERTL

CAP (PF00188)

Peptidyl-prolyl cis-trans isomerase FKBP2; oki.19.124, gbr.72.126 MVPCKSIVSAIVLVILVISCTFFTDVEAGDKPKKLQIGVKKKVENCEQRSKSGDTLHMHYTGTLQDGTE

FDSSIPRGSPFVFTLGAGQVIKGWDQGLLNMCVGEKRKLVIPSDLGYGDRGSPPKIPGGATLIFEVELM

KIDRKKEL

FKBP_C (PF00254)

Phospholipase A2-like; oki.8.262; gbr.558.2 MNFLVVIVTTVSLAGAASAGEIQNLYQFGKMVMCLGNLNVLEGLEYNGYGCYCGRGGKGTPLDDTDRC

CKQHDECYERATDEMGCWSIETYATTYDYTKSKVSGKCTIKCSSEEGKVHKRKLISPDDNTISRRVRP

ECQNDVPIWTLSLPTELESDYSRFTIRKKCKAFICECDRIGAQCFADKRSTFNRSLISYTKDKC

Phospholip_A2_1 (PF00068)

Phospholipase A2-like; oki.8.264; gbr.558.1 MKTFLILAMAVALAKAQSTDEITNLVQFGKLVMCLGNIGYTEGLEYNGYGCFCGKGGKGTPVDATDRC

CEVHDNCYGQAVKEGKCWSIETYGTTYWYDKSTSSGSCSIRCWEENEYNRFVPSKACKAAICECDRKA

AQCFADNRPTFNRKYLSYAKDTC

Phospholip_A2_1 (PF00068)

Phospholipase A2-like; oki.8.261, gbr.558.4 MKTFLILTMAVALAIQFGNLVMCLGNIGYTEGLEYDGYGCFCGKGGKGTPVDATDRCCEVHDNCYGQA

VEEGKCWSVETYGTTYWYDQSTSGSCSIRCWEEGDYNSLVPRKACKAAICECDRKAAQCFADNRPTFN

RKYLNYAKDTC

Phospholip_A2_1 (PF00068)

Plancitoxin-1; oki.27.35a; gbr.147.30 MPCCVMTFTFLVLTAIMVGTSEAAVTCKGANGYPVDWFIVYKLPQDSSSSVQVIKDGYGQMYMDVNNP

VLTLSSASLKDTNHAIAYTLEEIYRNQGNDDLAYVMYNDQPPPSKEIQTGLNGHTKGVLAFDDDTGFW

LVHSVPKFAPPASKEYKWPDNARRNGQTLLCITFNYNQFEKIGQQLKYNHPLVYNYDLPPDLAKDNPS

IKDVINGVHVTVAPWNRALTLQSKDGQTFVSFNKAGKFSADLYKDWLAPYFKSGLYCETWQNGRARKL

NSSCVGGIDVYNVREVSLRGGSDFKGTKDHSKWAVTTKPGLKWTCIGGINRVNLSPHQAEQTLAVPGS

LLSNSWSGLYDVLIYREFGCTLMIAVPHFPYPAPALALYSLVRTAKALIYSGYIPFPSRLRNVACPRWT

AMEVILGRKIFQRHLRVKVIFGEMKKNRFIVYKLPQDSASSIPEIRDGYGQMYMDVNNPVLTLSSASL

KDTNHAIAYTLEEIYRNHGTGNLAQVMYNDQPPPGKEIQSGLYGHTKGVLAFDDDTGFWLVHSVPKFP

LPSSKSYNWPDNARRNGQTLLCVTYNYNQFEKIGQQLKYNYPWVYDNTIPPDLVGQTPSIVDLVNNIH

VTSPPWNRRLNLQSKNGQTFVSFNKAGKWGKDLYAGWIALDFNSGLYVETWQNGGRNLNSSCIGGLNV

YNVKQVNLSGGSNFKGTKDHSKWAVATKSGLIWTCFGGINRQNLSPHQAEQTLAVPGSLLSNSWSGSA

E

DNase_II (PF03265) x 2

Plancitoxin-1; oki.27.35b; gbr.147.31 MSTMVVILMLLAVTVLTAMMGTSQQLKYNYPGVYDSDLPSKLVGKTPSIVDLVKNVHVTSPPWNRQLN

LQSKSGQTFVSFNKASKWGEDLYKNWLATHFKSGLYCETWQNGGRNLNSSCEAGLNVYNVKKVSLSGG

SDFKGTKDHSKWAVTTKSGLKWTCIGEPFTASSRADPSSPREFVEQIMEWILWYSASSKPVIREGYGQ

MYMDVNNQALKFSSTSLKDDDHAIAYTVDDIYKNHGKGNLAHVMYNDQPPAGEEIQSGLVGHTKGVLA

FDGTSGFWLVHSVPKFPLPASRSYDWPDNAKRNGQTLLCITFKYDQFEKIGQQLKYNYPGVYDSDFPS

RLVGQTPSIVDLVNNIHVTSPPWRQLNLQSRSGQSFDSFNKASKWGADLYKDWLATHFKSGLYCETWQ

NGERNLNSSCEGGLNVYNVMKVSLSGGSDFKGTKDHSKWAVTTKPGQKWTCIGGINRQNLAPHQTEQT

LAVLGVQLMECEYLLSTHTDLGNRSANFTTLDFERLEVSGVKTTRFVVYKLPQDSASSVQEIKDGYAH

MYMDVNNPVLTLSSASLKDTNHAIAYTLEEIYRNQGSDNFAYLMYNDQPPAGKEIQSGLVGHSKGVVA

FDDYTGFWLVHSVPKFPIPGSKGYTWPDNARRNGQTLLCVTYPYNQFEKIGQQLKYNYPGVYDSQLPS

SLAGDNPSIKDVINGVHVTVAPWNRELSLQSKDGKIFVSFNKASKWGLDLYKDWLATRFKSGLYCETW

QNGGRNLNSSCEAGLNVYNVKKVSLKGGSDFKGTKDHSKWAVTTKSGLQWACFGGINRQTSQMYRGGG

AVCFEHPDVHKTFYDCVAEYEPCT

DNase_II (PF03265) x 3

Tachylectin-like; oki.66.97; gbr.732.1 MFVNTRQYFPLSLVWAVCILCEVLRPNRVSACPQATTSWTQLSGALKHVSVGNSGVWGVNTHNWIYYK

GTSYGEEESPTCRAWEKVSGSLTQLDVGHNIVWGVNVHNNIYYRQGITASNPKGRDWVQVSGALKHVS

VSQRGHVWGVNRQDYIYHRIGASNCNPAGDSWRKLDGRLKQISVGSGGVWGVNSGNNIYYRVGTYGDL

PSDPDGSDWKQIQGSLKYISSADMIYGVNSNDNIYYRVGVSEGTPWGTRWEQIPGALKQIESLSCVVW

GVNRSDNIYKKKTDN

Hyd_WA (PF06462) x 4

Deleted in malignant brain tumors 1 protein-like; oki.42.51; gbr.22.25 MIASMHNFVGRSARPWWSWLLHCWASFCVIWASAQYHEDSSSRSDISRPPNNLPCKDYTGRKYYHGET

YHIDKCTSCTCNNATVKCLFESCPVPTCRRPISFPGECCRLCPYNITVNKVRPVIPRSQSIQEGRAEN

NLTVNLDVLYANTNDTTSVTGQGLWQTAMWISSMEDGSVKLPGTYVGNVLTEGQESLDLRKRGSISTNF

YINDIIYPVDMSNLTCDEARYLCAKFNRGENPQVAKSFLAFHFEARPSEDVLTGCSPIEDCKGCCTDQL

ESGESTPLSEPNGVPLFQIGTRVVRGPDWKWGDQDGFPPGKGTIVDELESDGWIAVLWDAGERHFYRM

GAEGKYDLKLIEDSRVRLVDGVDELSGHVEINHDGTWGTVCDVRWDMRDANVVCRQLGSFLKAVEIKK

GSFYGESDRPIVLSRVKCKGTETRLADCPFVSTINHPCASLQVAGVVCRPKLYSLRLVGGSDRLRGHV

EIYLGGIWGTLGDNDWDIDDARVVCRQLGFSGASQAMSGAHQGDGPVHMDGLACDGSEERLADCPSYS

RKKPARVRAADAWVVCRGD

VWC (PF00093); MIB_HERC2 (PF06701); SRCR (PF00530) x 2

Deleted in malignant brain tumors 1 protein-like; gbr.504.3 MKILAVFLLIFQAVFGNEVIPEPDVTLRLVGSENDKEGRVEVYYQGEWGTVCDDQWDINDANVVCKQL

GFVNATEAVLGARFGEGTGRILLDDVDCTGDESRLEDCPNRGVVPDEGDVRLIGPEPNLGRVEVNHNG

IWGTVCDDEWDIDDANVVCRQLGFTNGAARAASEAEFGQGENPIFLVDVACGGTESRLVDCSNPGWIV

EKCGHSEDAGVVCLPNEEPDVTLRLVGSDNDKEGRVEVYYQGEWGTVCDDQWDINDANVVCKQLGFVN

ATEAVLGARFGEGTGRILLDDVDCTGDESRLEDCGSRGWGVENCWHNEDAGVVCHSSEIEEDVRLVGP

EPNSGRVEVKYNNVWGTVCDDNWDIEDANVVCRQLGYTNGAARALSGAQFGRGEDPILLDEVACVGTE

NRLVDCSNAGWGTTDCSHSEDAGVVCLPSEEPDVTVRLAGSENDKEGRVEVYYQGEWGTVCDDEWDVT

DANVVCKQLGFAGAIQAVSGARFGQGSGQILLDDVGCTGNETRLEDCANRGWGVQYCGHDEDAGVICY

GYETEGNIRLVGPETNLGRVEVNHNGIWGTVCDDNWDIEDANVVCRQLGFTNGASRAATRALFGQGTD

PIYLDEVQCNGTESRLVDCSNAGWGTTDCSHSEDAGVVCLPSEGPNVTVRLAGSENYNEGRVEVYYRG

EWGTVCDDEWDTSDANITCKQLGFESAIEAVSRARFGQGLGPVLLDDLGCTGDESRLEDCPNRGWGVE

NCEHYEDAGVICSSGSFSQ

SRCR (PF00530) x 7

Serine carboxypeptidase CPVL-like; oki.52.56, gbr.411.2 MSSVVLTTIVVALMLAVMPGALHAVRGPFQAMFAANPPAHADMTGVDPGQPLFLTPYIERGQIQEGQRL

SLVGDLNGTSVKSYSGFLTVNKDYNSNMFFWFFPAQHERITTRDKRAPVMSDTILLHANLSHIERHQL

RTVPVSGTRENYITPHNLGSFYKAAGLAEILWRQ

Peptidase_S10 (PF00450)

calcineurin-like phosphoesterase domain-containing protein 1-like; oki.62.127, oki.62.127a, oki.62.128; gbr.6.127, gbr.6.127b MASALFAIFSLILLSGVSAKEPRPCCWISQMTGQAELQGTKKLEDGTVILEESELQWAYDATFQAEAFI

FDEILENSTVIKGRIVNFYDQGLSFYIIDNQGVERCAKIRIPARFPKKCIPEHAEYYGNRTLGNDSLVA

HEWKLNTVAGGDVLNVSFLIQDEYCVPISFGVVSQEHEGTWRGPFTFIASADPQYGLTAGWNNPGLLD

WTQEVELTQRGIERINKMKPRPRFLVVLGDMLDAMPGRPGRDDQKVSFTEVFSKVDPEIPLVFLPGNH

DLSDSPSMEDIKLYRDTFGDDFYSFWVGGVRFIVLNSQFYADSSKCQEARDEQDTWLNEQLEDVQATG

CKHLVVFQHIPWFLKNPEEENEYFNLDVNLRLPMLEKLRKAGVRIIFCGHYHRNAGGFYKDMEEVVTS

AWGLQLGEDKSGLRVVKVTEDSISHQYFSVDDIPLERQAPDPASSDESRWMIPTEDPCQCTEVQLGGLN

ILPSLQPFRVKYWNTLLRSRDSEEIQTWMKCEVCRDPLGMENSDIPDNALSASDVGHTGYGIRRSRLN

FKAAWCPRSIDENQWIRIDLQAPTTVAGLITQGRHHSDVWVTSYAVQYSDDGINWNNVTGSDGTTAQF

QANTDNETPVTNIFPASLTTRFIQIRPLAWARYICLRLELLGCRSV

Metallophos (PF00149); F5_F8_type_C (PF00754)

Protein SpAN-like; oki.27.95, gbr.41.55 MRLILLAMLLGCALAASTNRVRRNKFEKGNRLRPTPKATLDRDGEDFNRPRVKRKSQFEKGNRPTPTPT

MPLDLVDGEDNRSGRDRRAPPATWPNNKVCYEFASGFNDDNMKKNLRNAMNEFERVSCMKFIEAGTSD

ASCTSQSSPLLTVQNTAEGCWAHVGCHSSTNTVNVPTKCDLYDEIGVLIHELFHALGRYHEHTRPDRD

HFVTVQWDNVLDGQAHNFEKHTSDDFITFGIPYDYESIMHYGRSFFAKDKSQPTLTLIDTKYNDKVGK

QKQLSPSDVLYVNKLYECYEGEPEPCKNGGERMASSDCYCIFPFMGTDCGDVDPGVTVTENDGSGPIS

LVSLNYPDHYPLNTVTQNLLMCTDTSKKVQIVVEDFDVEGYGCYYDKFRYTLKAGDVNPETKCEDDLK

GSTVKADGNSIFITLVSDDMYTFKGYKMEFTCV

Astacin (PF01400); CUB (PF00431)

Aqualysin-1-like; gbr.117.1 MHTLVLLLLVGVAAAGLAPLYEVEENVKGHYLIKFKDEVDSDMTAEGIQRHVQQQRLGEAIISHRYYN

VLKGVAAKLSEEAVQYVRTLDDVEYVSQDGMAYAAAIPWGLDRIGQRRLPLDGRFTPDPVYNEGQGVE

IWIVDTGIRPTHSDFGNRASIVFDAYGGNVITVGATNSRDERCRFSNYGNCVDIFAPGRAIVSAGWKT

DTSITTMSGTSTACAHVAGIVALYLAQNRTLSPAEIKSKLKTSATTHLLSNVGQGSPNRLAYIDP

Inhibitor_I9 (PF05922); Peptidase_S8 (PF00082)

Subtilisin-like protease; oki.571.1, oki.406.1; gbr.685.5 MHTLVLLLLVGVAAAGLAPLYEVEENVKGHYLIKFKDEVDSDMTAEGIQRHVQQQRLGEAIISHRYYN

VLKGVAAKLSEEAVQYVRTLDDVEYVSQDGMAYAAAIPWGLDRIGQRRLPLXWTLHT

Inhibitor_I9 (PF05922)

Conserved uncharacterized protein; oki.258.16; gbr.501.3 MRIFLLLAFVAVVRGASLPWSVVPWEEWDVDLGITGPEVSMPSPSDSPNSIKVCCEGYTGTVEDGCPTP

VCDSPCLNGGTCAAPNHCICPKAYTGQTCDTLKREFVWSSWANVNQKPLLDSFQMTLNLYHIRDEVEA

CKNVVPEDIECRTRENHMDHTQTGQVVTCDRRSGFKCRNDEQPYGEPCLDYEVRILCPVKDQVTKRGF

TCIHETLEYNVGEVVKEGDCSACECQSDGRWNCHRDNHFCDRVRTGQCVVGERVFNHGDTTTLDCKMCT

CDTTTGWSCVNIDDHTCVAPPEDDYCRIDNHFIFFHGQTAKKDCNTCICKNREWDCTQMICPDKTNKAA

YFHKCTNPLTNERHNDGEIIKRDCNVCVCTQGKWDCTRDPCVEGTSLESEYSLANRKVCTTPTGLPFPH

GATISQDCNACKCFDGKWQCTRRYCSPETYPNSTCLDDVTISLVKSGQTIRRGCDVCVCNGGNLTCTTK

PCETNNGMCFDEQRRFVTTSTSFNRQYRPACNPDGTFRPIQCNPVHGVCFCVTTLGQVIPGTAVKQDT

GSPNCSPYTTGNAWSFMAVYEPMDVPQTNTQTGSDPCIETTRNTDTHLTGYNTPKCLRNGFFAPMQCD

LHTGVCYCVTMEGATIPGTVMHVSQGRPNCDVIREDMTPCQQERATAEHYKLPFVPECNQLGYYQPIQ

VNTMLGVRFCVDTNGVVIPHTITKSSNLHCDNIVDSEDVEPYTNKVDENHAPLIFQTYPIKDAIHKTP

RCKLQRLSSQFVSTVYRPTCLSDGRYTPMQCPEGVGVCFCVDVTGAIIPETVSRSGNTPNCNDYRSIF

DVVTPMNTSTVIIHPDVSLIHTGWELKLREIDHPTHVVGPQYQSTTTPQVLKTNVTTCIRQQQIASMV

PGVFYPSCHRNGTFNTMQHTPLNSKVYCVDKNGVKIELTERRMDVHLTLPECNKYWNTTIEINSLNKH

WTSTHQDVMTTTPIPLDKLFPAGLHHHPNCTRQQEITSRLEGAVVPTCEPNGTYTPLQCNATTGVCYC

ILPYGDVLPETVLPMVKGTPNCWHLRNASIQHSTTPVPDVCRLPISRGPCLARFPRWAYSTLLNKCHL

FVYGGCRGNDNRFETKEECENTCGYMTPRDSRPVTCSRKLNRFYRMSLTQRLRMKRPTCTPDGNYAPK

QCHLTRTERGLIESVCECIHRITGEPQVCPEEEVMVNPEFIDTIGVSTTLAPVPVIDGCLHERVEYHE

GDNFFQECNQWWVWPILVFGGGECIRRLMWPVWAIGENCYVNGQVAVEKDQGCEHCTCMGGQWDCEPMA

NCDTDPGPMSESGTSTHCVMEDGTSMTVGSSFYNKCNKCTCLSTGVVSCQNNYCIPRHCYERGIPYHTE

DTIMRDCNRCTCFSGQWQCQEYTCDPYIRTMVRVEGCQYKEKTYSNDETFYDVCNLCRCLGNNQTSCNR

RFCNPTSCYVKEVEIRNGESEIFECKTCSCINGLLDCHEDEECLNVMVDPVPEVNRDTSVTCNFEGKT

YTNGDEFFKQCQPCRCFDDGQVQCEEKACSPNLCYMEGVGYRSGKVLLTDIHKCTCYRGGHWDCQDKP

ADAGPPEVIECLDHQPVTVCATNPCNHVICPTNRDAICIPDPCNHQCHPAFFDIYGNRLRC

Mucin2_WxxW (PF13330); Thyroglobulin_1 (PF00086) x 6; Kunitz_BPTI (PF00014); Thyroglobulin_1 (PF00086); VWC (PF00093)

Transforming growth factor-beta-induced protein ig-h3-like; gbr.80.55 MAAGRILPVLALVALAILANPEETSGAGVLEVAEDYGAASFVSFARKCAYLKNLLETSTQPGGFTLFA

FSEKAYSESPPALRKLIANNTQTLQWVLEYHVALGPYMSSDIKDNLLLTSLYPSPTSGASLPLQYLRT

NIYTILSEAERERLGTGKVVTAGGAPIVQADLQASNGIVHIIDKVMFPLPTGADMTEFVNDDGRFSSL

FGFLQTANLTKALETDPSRPLTLFAPNNQAFKNLPKNVVQKLANVTFLQQVLEYHVVPGAYYAAGLWD

NQILHPLYNKPLLVERGQGGIYLQNSKVLQADNTVSNGVVHEISAVLIPPK

Fasciclin (PF02469) x 2

Transforming growth factor-beta-induced protein ig-h3-like; oki.2.137; gbr.80.58 MAAGRILPVLALVALAILATPEETSGAGVLEVAEDYGAASFVSFARKCAYLKNLLETSTQPGGFTLFA

FSEKAYSESPPALRKLIANNTQTLQWVLEYHVALGPYMSSDIKDNLLLTSLYPSPTSGASLPLQDLRT

NIYTILSRERI

Fasciclin (PF02469)

Transforming growth factor-beta-induced protein ig-h3-like; oki.2.139; gbr.80.54 MYLISAALFLLSLLGEYSLAHDVRNQEGNILQVCEKAGALTFVKYARATPWVNKTLVSGIGYMALAPT

DRAFGDLPLVVKIALKDPQTLEWYLRYHIALSVAYKQEINNNLRIPSAFRPPGKTEDLPEPVLPIRFN

VYDVLADYLSDGSFGQYLRTNIYTILSEAERERLGTGKVVTAGGAPIVQADLPASNGIVHIIDKVMFP

LPTGADMTEFVNDDGRFSSLFGFLQTANLTKALETDPSRPLTLFAPNNQAFKNLPKSVVQKLANVTFL

QQVLEYHVVPGAYYAAGLWDNQILHPLYNKPLLVERGQGGIYLQNSKVLQADNTVSNGVVHEISAVLI

PPK

Fasciclin (PF02469) x 2

Serine protease inhibitor Kazal-type-like; gbr.307.8 MRKLILITCIAVLLCSVYDAKGEPDPVLPDFCGKYGILPACPRILWPVCASTGTTYDNLCLMCADMLR

EEIPTSVTYTTGRCATD

Kazal_1 (PF00050)

Conserved uncharacterized protein; oki.13.128; gbr.43.33 MSQLALVALLGVIGHALSAQGQEHIPLTLNVESGLQDVPCGGFRYFSVEVTDPCKDLRVMVTKIEGEPD

VYIGRGNNMFPTDNTLAWSSYEWGSENLTVSSWDPEFEVGTFYIGVHAYCGIDVHTGNTSSKVKVLAES

LATSHMHPEITAGSPIRDGRVDAQGYNYYRFCLPHKCANVEVKLENCLSGADCPDSYGYPELLVSRSIV

RPSINDHSWKLATVTRRSVYLRHDDPDVKPGHYFVGVYGWCTPDENCPDKSTCGPCEYVANMAYSVSII

MTDVADCNPNPEKRNALSVEGQKHVPLTLNVESGLYEVPCGEFRYFSVEVTDPCKDLRVRIQAIQGEPD

LYISRGNDKFPTDKSLTWTSYNWGSEDLTVSSWDPEFEVGPFYIGVHAYCGSDVGTGHTPSKFTILAES

VPTSHPHDEITVNSPIRDGRVVAQGYNYYRFCLPQKCANVEVKLENCLSGADCPDSYGYPELLVSRSIV

QPSINDHSWKLASIYRRSVYLRHDDPDVKPGHYYVGVYGWCTPDEHCPDKNSCGPCEYVPNMTYNVSLI

LTDVADCHPTAGNTATISMSSFVAVQFSGFVMNCSFSFSFGSNDNN

Conserved uncharacterized protein; oki.132.13; gbr.332.10 MRQAMRLPIYVATLLTVVSAIDIPVFTNFVTTEVATISHSSTVKLIGYLCGQARGNHVNVSLELPHNPD

WNPSLGVLYYYVVDSPSKGEADALCTNLHQGVPGPSCNVKSWPSVGDLYITVHSGLADAVAFSVIGTSY

TKREVNPGVGGERTRKVFSPLKPPTANRVPRLPPGQETKEGQVIYLAEFVTLMGPQAIPRLQKAHLNFT

FCPTPRMGRVSAIDVRVATKFTTTEVTTLSQNSTVNLIGFLCGQAKGNTVNVTVVLNNNPYWRLNQGVL

YYYVVDSPSKGEADALCTNMNQGAPWAYCTVKSWPSLGDLYIKGRSGPVEAVSFSLDAERTQQSEVSFS

ADAVKIDKVPASAQKAPTSTAEPLPRVPPGYKLRDGKTVYLTEYITLTASQVVPRLQEAHLNFTFCPTP

ETGSQYSIIGTVIAKEGRSSWVQYICNKYPCELSHPENIIAYNGRQLPINTVVTGAGQWKQLYALIICW

GGPFDPKSGLYIGDFLFDAVANKV

Conserved uncharacterized protein; oki.172.23; gbr.106.45 MKAAVVMVYLVAVASAHICLLNPHQRGSMKGINVKDAGDCGLLIKPCGNRIQDKPGIQIKGGSPYTVVF

QKNLNHFETFPNGTATNGYFEISFLTSTDVQMLSKVNDGATPSLTLYYPNVTMPRGPIGVPAILQLTYV

TNNAEVPMGGIFYQCADIELF

Conserved uncharacterized protein; oki.172.24; gbr.106.46 MKAAVVMVCLVAVASAHLCLLNPHQRGSMKGINVKEADDCGLLVKPCGNRTQEKPGIQIKGGSPYTVVF

QKNLNHFETFPNGTVTNGYFEISFWTSTECHILGKVDDGATPSLTLYSPVVTMPTGPIGVPAVLQLTYV

TNNAEVPMGGIFYQCADIELF

Novel uncharacterized protein; oki.111.24; gbr.283.2 MFRTLVVVLLATVAVSVFGKDYPVSKDCNDDPAGCKICVQTYQFMKDSLMNAAFVDDTRKYCGYICPSA

RNSSLPSCQPKAVQLREGDTCVLNEDGPVCGVCTGTVVWLKEMLLNKQLIMTADVYLNFYCDLAASPCV

RQICRQYVREMNKLAMALGTVLDAKSVCQPMCSSSVTVHTPNLAGAIADFLRHVTEVMDVVAVGHRSAT

PDPPMGHCFADEQIHRWANGTPGPPLDHWYLLVGRETILYCSVLSA

Novel uncharacterized protein; oki.327.14; gbr.309.8 MAHCFSLTVLAIFLLGVGLVVPKKVEEEGSYARMETVKLKNLQLENRLRGFEPANLESSKGFMRDDVAK

RRRRVPKTDEMLASKKRSSVSN

Novel uncharacterized protein; oki.107.43; gbr.33.86 MRYIAVALLLTILLFDVVVGDGVKAQAPDKHSPAVADSRARRAVPHAGLINPADLGMRLKSRRSDPFRR

APRMKRYRRMVEDPEMYKRGLPGGKQIRVKKASKV

Conserved uncharacterized protein; oki.132.11; gbr.332.7 MKHSKMILLSVGLLFSVAWAIPXXXXXXXXXXXIIVPVVTNYITTEVQTLAQDASITFIGQLCEQYKGM

GVQVTVLLNNNPDWDPSVGVLYYYVVDDPNKGSSAALCNNSPGGSPQANCTVKAWPSNGDIYIKGHAGQ

SEAISFTLSVVIRPKASKGGASIKPRPFSFIKPPLRNLGGTPSANNLTVYLSEVVTLTATQPLGYHKKA

LLAFTFCPTPQTGTRYEVQSTTSATDGRSSYASYICDKLPCVVDGENVIAHNGRQLPSNIVRTGSGTWK

TLYALIVCWGGVWNPPKDEYIGHFIFNAHIM

Conserved uncharacterized protein; gbr.332.9 MRQAMRLPIYVATLLTVVSAIDIPVFTNFVTTEVATISHSSTVKLIGYLCGQARGNHVNVSLELPHNPD

WNPSLGVLYYYVVDSPSKGEADALCTNLHQGVPGPSCNVKSWPSVGDLYITVHSGLADAVAFSVIGTSY

TKREVNPGVGGERTRKVFSPLKPPTANRVPRLPPGQETKEGQVIYLAEVVTLMGPQAIPRLQKAHLNFT

FCPTPRTGSQYVIEARVISVDMKSTWSQCICDKYPCELSHPENIIGCNNTPLPINTVVTARGQWKQLYA

LIVCFDGPYDPKSHSYVGQFEFTAVAIKV

Novel uncharacterized protein; oki.33.77; gbr.422.5 MFRTLVVVLLVTVAVSVFGKDHPVSKDCNDDPAGCRICVETYQFMKASLTNAAFVRSNVEYFKLSTCPF

VPEGQPEHRLCVNHFNEIYGHVRLFVSMYLDNTRKYCGYVCPSARNSSLPSCQPKAVQQREGDTCVLNK

EGPVCGVCTGTVIWLKEMFLNQQFIQSAGVYLNVYCDLAKSPCLQKICRRYVRETEMVFLAFGAALDAK

SVCQPMCSGSATASIPNLASTVVDFLRRVTEAKDTVGSK

Conserved uncharacterized protein; oki.13.130; gbr.43.30 MSQLALVVLLGVICHALSAQGQEHIPLTLNVESGLQDVPCGGFRYFSVEVTDPCKDLRVMVTKIEGEPD

VYIGRGNNMFPTDNTLAWSSYEWGSENLTVSSWDPEFEVGTFYIGVHAYCGIDVHTGNTSSKVKVLAES

LATSHMHPEITAGSPIRDGRVDAQGYNYYRFCLPHKCANVEVKLENCLSGADCPDSYGYPELLVSRSIV

RPSINDHSWKLASIYRRSVYLQHNDSDVQPGHYFVGIYGWCTPDENCPDKSTCGPCEYVANMAYNVSVI

MTDVADCDLNPEKCNALSVEGQKHVPLTLNVESGLHKVPCGEFRYFSVEVTDPCKDLRVRVNMVEGEPD

LYIGRGNNKFPTDNTLAWSSYNWGSEDLTVSSWDPEFEVGTFYIGVHAYCGIDVATGDTPSKVTILAES

IPTTHPHNEITVNSSIRDGRVNAEGYNYYRFCLPHKCANVEVKLENCLSAAECPDSYGYPELLVSRSIV

RPSINDHSWKLASIYRRSVYLQHNDSDVQPGHHYVGVYGWCTPDENCPDKSTCGPCDYVANMTYSVSII

MTDIPDCDPNAWNPDPCNTNKAQILMPSYLVFCVSAILMKIFF

Novel uncharacterized protein; gbr.452.4 MRSLFWLLYVGMAVIIMMDFCTAQESQYTTDCTRFITLPGCPKNYRPHCTTVDGTVYQYANLCLLCRAI

ESEELPEDVSFARCTEEQLENNQ

Novel uncharacterized protein; oki.50.195; gbr.58.38 MFRFTAILLVLGMVAIQASRARPETRRYETNLKDSLKRREDLDLVNQLLERLLASEEKREVQKDCPEAT

ICSIGECYYVTDTPPGYGLVLYNRNSDKLLFQFDEDTLRRLDDKKTNVTVALLSTYFPYTSEDFEVDGE

TYNVKVDNVTVDDEGNITVDGFCLVLKEEWSRQRQLARRFY

Novel uncharacterized protein; oki.327.23; gbr.309.15 MANGSLKIQMLLLVIFAGVVAFAAPVNQQDNGFLGNMNRATENKTQSNADNDMQQEDLQEKNMHDLQNL

KKRPNWLIFRERHLMDKPSLRIPLSHVDFFPHLQDDDPQATKIVNKF

Conserved uncharacterized protein; oki.460.2 MNTLIVFASCCLLLSLQAAGSPFGYDEIHSMSKKYTEKSEPETAWAIPVSDSSAPPPINVSVVTNYITT

EVQTLAQEAEITFIGQLCEQSKGFEVEVTVLLNNNPDWDPSVGVLYYYVVDDPNKGSSAALCNNSPGGT

PQANCTVKAWPSNGDIYIKGHAGQSEAVSFTLSVTMKQKAGIKPTPFFFDKSPFRNLERTPSANNPTVY

LSQVVTLSSIQPLGYHKSALLAFTFCPTPQTGTRYEVQSATSATDGHSSYASYICDKLPCVVDGENVIA

ANGRQLPSNIVQTESGTWKTLYALIVGWGGVWNPPKDEYIGHFIFDAQTM

Vitellogenin; oki.185.68; gbr.198.26 MKLLLFLVGIALANAATTIIQNNDLEKSITIHRNAPETYPTEFQLSKTHRFNYTGDIRTSFPQTGNET

VGQRLQCIVEIYPLTTTLWNMRLVNPTIYEINGTHDKPQWMVNSTKVTEELRELLQYNVSILINQGKI

EAVFVNKSEAEWIVNLKKGIMNMISLTVEKDNVYEVDEEGVSGICKTLYTIKENKNEDLVTIMNITKV

RDLTNCSKTAFNKLMTFKAKDCDDCEKSKDSMEALATFKYNITGTKQRYIINSVQSDAHYVFFPMGVK

GGSVLVHVNQTLKLINVTSDPYFEKSPTQESRGGLTYLFPEIVKVEEDMLNTVQKVDFMLKKLEKQTN

SSLTIDSPSYFLLLVRSLFEANYETLEDIWNLVRHKPEQRKWLIEAIPFVNRPDMTLLIKELITTEET

LLTTEEKITILTRLGFIRQPSETTVEAIKALLCDLGNHVQDICEHRVLQWHHNTKVNTTYVYNLVRMR

HACYRSLGIQINSLKKTAKFVPPALVHSLLHCEKHFDDSTKATPWDPTVYKQMGVNQVTSEVDDNLTLE

EAQALVDVTKKTKVREMCLVAMGQVGLPDHIPYIEPILHQTVTNQTQDVRIAACFALRKMKNIPNKVL

SLLLPILRNPYEDPELRMIAYLVTTVTNKRPSVFTLIGQELNKEKSKQVRSFIYSHLKTLSESVVPCE

KDLAIAARYALVFTKPFNLGPTYSKVRNVGFYDHETKLGAKLKTRTMFSEGELTPSRTNIGLEVEMLG

KKTNFLEVEMKTSGMTNLLDKMFGVDGHFYKRKSIFDLLKESNRKRRSVSSNEEELNKINKKLSPYEV

TPEDPRFRLYISLVGNRILDMDVNKEYLTTLIKEGKFMTDFTSLDEELAKKQMFNTTKGMILMDHMFQ

HPTIMGMPLSYNTTVASILRIKVNATVKADPSLLLTQNTLKGSFNITPEVVVNAFFKMGCHIPQIKFG

SAINSTFNVTLPVHTNISIDFTKKKYEVIFPQIERNLTFLNFTHKIYTFWQKKDEMENQTIVNTTLTN

RKPVKKVLCIEPINDTKLCLNVTFTPRLGHPNSPYHPLTGPSYLSIFQNKTVYSPKIIKFKYQVNNPEL

WETEKNVDIVMTTMGLKHDLMEYENITTFNVRIHHPNKRIVLEVTDSAHPEWRSEIRAVNMDGEVDLKA

YWGNPHDHVVKINNTDMRLTREYFLQINSQSWTFNDTRNITIFWAELPQWLKNVTYLTKTYVLPVVLEA

IKKTTDTEFFVEPLLNPIVNSTTVSLKIKTNYTMDVEIHLPIEKITVVNVTVPVNTSVLKYNVPDAVAT

IQRRLTEKFLSGNCTYNHTGTFVTYDQLFYKYNLTGSCPHVLTKDCSPKKRFTVLVQNTRQSNLLVRT

NPLVTVYIDNHKIELITRNDEEVYMKYNGIEHLTQRDSTPIETDTCIIERNHTHITVKAKIGLTVVYC

KHNVTTSLSPWYVNKTCGMCGDFNGEIFREMKNTTAEEVNNSTKFGASWLVHGDDCMDETCKLTKEDL

YALPETVYLANKPAVCFSKEPVKLCPIGCNNESPENIFTSTLVPKKQFVKVPFFCMASENLEETKTLMK

TRTDLIRSQPVDLYRDIELPHDCICTSNCNIKV

Vitellogenin_N (PF01347) x 2; DUF1943 (PF09172); VWD (PF00094)

Conserved uncharacterized protein; oki.10.197, gbr.10.204, gbr.10.206; gbr.127.19, gbr.127.26, gbr.127.28 MDMTVRGFLFIFCFLPASFELARSQSTDDTMQPGAEGTYVVISTLRRLSQLTEDSSGDLQPDHGFLRRV

AWVDTRDGTAAGAYDPNYHGGIWRVDRSVYDATQAMLEDSRYSSIFGRIRQLFDINWDQTTWQDCRKPI

YSALAARLYFHRLQSTIPKRLSDQATLWWNEYHTRPTDTVENFIRKVSALEGTYATVCALNMAKTRDRG

TDLTVVAEASGTAVVEATLQKINRLMTEQDGSGSQPGQTGLQADFNFMRRLAWVETLDGTAEFTYSDPA

YHGGIWRVDRGPLQATLEMTNVPEYRAIFRTIEQIFCIDWKETSWEDCRKPLYSALAARLYFHNQKPAI

PQGVSDQAEYLGMQDQSRPNKTVTNFVMNVTNLDSESECLVRGIDLVFVMDSSGSVTSSNFELMKNFV

LEVVDFFDIGPDRTRVSVIRYASDASIQFSLNKYTDKTLLKQAIQRIEYSGGGTQTVTALNLMESQSF

LEGNGARPANQGLPRVAVVITDGQSQGPQAVAIPADRAREKGITLFAIGVTSSVNDDELNAIANKPSE

TYVFHVDNFQAIANIGVTLQGTTCNQATPITEPIINGTLDGEATQYLQQAVPAEGVTLAIEASNGSVA

MYVSITTPNPNEALHDFLLVAVAGAGSVEVFLGPENFDGPVTVAAGGGSAGGTSRKRRDVPNSNQTLAG

SPIGTVYMTIQGRLAENKFVLVVDRGDVTEPSTTKPATDVGVATATTVAYRLLLALTIVTAALVPTLLS

C

VWA (PF00092)

Conserved uncharacterized protein; oki.52.142; gbr.150.20 MLRSLAAVVLMLGVLGFQMTTAWNEDELGQLLKRVLKEEVEETKKSIDSSNLPIVGKITWQSSTLSPY

SSDKAVDGSSSSNLYSSEHCSHTIAAARDPWWMVDLGSNHCISKVRILNRGDCCSKRLKDAVVRVGPN

VAATENWACGSPVTAAQAAPWGDTIEFICYPTLKGRYVSVDIPGSATLHLCEVALEEVPLGHCPDSQP

FSVFGKPAEQSTTHAAGYAASFAVDGSSSAIMYPDRHCSSTVTNSNHPWWKVDLGGEQCVTKVTILNR

GDCCSERLQNAIVRAGTYKTVAANQACGAPITARQAQPLGGTIEIKCDRPLRARYVSVDIPGTATLQL

CEVSVEVLSSPDC

F5_F8_type_C (PF00754) x 2

Conserved uncharacterized protein; oki.1.63; gbr.2.190 MLRSLAAVVLVLGLVGFQMTTAWNGDKLDLLERVLKEEVEESKKSGDSTNLPIVGKITWQSSTLGAYS

SNKAVDGSSSSNLYPSQHCSHTITAARDPWWMVDLGSNHCISKVRILNRGDCCSERLEGAVVRVGPSV

TGTENWACGSPVTAAQAAPSGGTIEFTCYPALKGRYVSVDIPGSATLQLCEVTLEEIPLSQCPVPQPF

NVVGKTAEQSTTHGAGYTADLAVDGSSSAILYPARHCSHTVTNSNHPWWKVDLGGEQCVTKVTILNRG

DCCSDRLQNAIVRAGTSETATANQACGAPITASQAQPLGGTIEIKCDRPLRARYVSVDIPGTATLQLC

EVSVEVLSSPDC

F5_F8_type_C (PF00754) x 2

Fig. S7.4. Derived amino acid sequences and annotation of COTS proteins secreted into

seawater during aggregation or when alarmed. The predicted gene family name and OKI and

GBR gene model identifiers precede the sequence. Yellow, predicted signal peptide; red,

predicted cleavage site; green, cysteine; blue, predicted glycosylation site; grey boxes,

predicted domains using Pfam search with an E-value threshold of 1.0. The predicted domains

are listed after the sequence in order of appearance from the N-terminus. See Table

S7.2a-n for further details about these proteins.
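The colour coding described in the legend does not survive in plain text, but two of the annotated features can be approximated directly from a sequence. The following is a minimal sketch, assuming that candidate N-linked glycosylation sites are the standard N-X-S/T sequon (X not proline); the tool actually used to predict glycosylation sites is not stated here, and the function names and the toy peptide are hypothetical.

```python
import re

# N-linked glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def cysteine_positions(seq: str) -> list[int]:
    """Return 1-based positions of every cysteine residue."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]


if __name__ == "__main__":
    toy = "MKCANGTLLNPSCQNNSTC"  # hypothetical peptide, not taken from Fig. S7.4
    print("sequons:", find_sequons(toy))
    print("cysteines:", cysteine_positions(toy))
```

Any sequence from the figure can be passed to these functions after joining its lines into one string; sequon matching only flags candidate sites and is not a substitute for a dedicated predictor.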

7.3. Behavioural response of COTS to signals from starfish aggregations

Behavioural responses of adult COTS were examined in the Australian Institute of

Marine Science (AIMS) SeaSim aquarium precinct (www.aims.gov.au/seasim) as

described in the Online Methods. Starfish test subjects were placed in a starter box

at the distal end of a 4.4 m long Y-maze. For control assays a COTS was subjected to

ambient flowing sea water entering via both of the Y-maze arms. The control subject

was removed from the Y-maze after every assay and a new animal used for the next

control experiment. Twenty-four hours prior to the treatment experiments six COTS

were placed in the header tank where they formed an aggregation. A test subject

was then exposed to the aggregated COTS-conditioned sea water in the header tank

via one arm, and ambient flowing sea water via the other arm. As for the controls,

fresh test subjects were used for each experiment. Given that COTS are nocturnal

and sedentary, these experiments were performed at night and basic indicators of

motivation and activity were recorded. A test animal was scored as motivated if it

moved out of the original starter box. Experiments were run for 45 min. In the sea

water negative controls only 10% of starfish moved out of the starter box (N=32;

90% did not respond at all) whereas 96% of those exposed to the aggregation-

conditioned seawater (N=22) did so (Fig. 2a; Supplementary Video S1) with 23%

moving into the treatment arm. These changes in motivation were graphically

represented in heat maps where the frequency of a specific position in a 2D space

was visualised as a colour representing the minimum and maximum per-pixel

frequency over the duration of the experiment (see Online Methods). Activity (how

long and how frequently a test subject was active) was determined as the number of

changed pixels for the current sample divided by the total number of pixels in the

arena. Using this measure, activity state thresholds can be arbitrarily set as inactive,

moderately active, active and highly active. Activity is not necessarily a simple

measure of distance moved, as anxiety movement will be detected as activity and

such behaviour is typically triggered by a stimulus.
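The activity measure described above (changed pixels divided by total arena pixels) can be sketched as follows. This is an illustration only: the frame representation, the `min_change` tolerance and the cut-off values are assumptions, since the tracking software settings are given in the Online Methods rather than here.

```python
def activity_fraction(prev_frame, curr_frame, min_change=0):
    """Fraction of arena pixels whose intensity changed between two samples.

    Frames are equal-sized lists of rows of pixel intensities (an assumed format).
    """
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same number of rows")
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        if len(row_prev) != len(row_curr):
            raise ValueError("frames must have the same number of columns")
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > min_change:
                changed += 1
    return changed / total


def classify_activity(fraction, cutoffs=(0.01, 0.05, 0.20)):
    """Map an activity fraction to a state; the cut-offs are hypothetical."""
    labels = ("inactive", "moderately active", "active", "highly active")
    for label, cutoff in zip(labels, cutoffs):
        if fraction < cutoff:
            return label
    return labels[-1]


if __name__ == "__main__":
    before = [[0, 0], [0, 0]]
    after = [[0, 9], [0, 0]]  # one of four pixels changed
    frac = activity_fraction(before, after)
    print(frac, classify_activity(frac))
```

Because the measure counts any changed pixel, an animal moving its arms in place registers as active even when it covers no distance, which is the distinction the text draws between activity and distance moved.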

Fig. S7.5. Response of COTS to COTS aggregation-conditioned sea water over 8 h. a, Heat

maps showing the cumulative response of COTS over 8 h to water conditioned with six

aggregating COTS (N=22) and ambient (control) sea water (N=32). Red, area in which COTS

spent most of the time with descending time to blue; black, no presence. Green outline

represents the Y-maze and the arm divider that prevents recirculation of water into the

opposite arm; starter zones are shown in yellow. b, The duration of movement (highly active

threshold set at >60%; p<0.05) and c, the meander (change in direction of movement;

p<0.05) of active animals over 8 h. Control, header tank containing ambient sea water only;

Aggregation, header tank containing six COTS. Mean ± standard error.

Given the encouraging results from the short-term exposure to aggregated COTS,

the experiments were largely replicated but with observations spanning 8 h to establish

a definitive response. A threshold of >60% active time was imposed as a measure of

‘highly active’. When starfish were exposed long term to an aggregation water-borne

signal they were highly active for 45% of the time. There was a significant difference in

the activity of starfish exposed to aggregated COTS-conditioned seawater

(aggregation; mean=216.8, standard error of the mean (SEM)=33.2) compared to seawater

only (control; mean=0.66, SEM=0.45); t-test for equality of means (N=52), t=7.89,

two-tailed p<0.001 (Fig. S7.5b, Supplementary Video S2). Further, those COTS exposed

to the conditioned water showed a reduction in meandering, indicating that they were

moving more directly towards the cue (Fig. S7.5c). As the COTS in the header tank were

inaccessible to the test subjects, they could not physically join the aggregation, but

they were clearly agitated, exhibiting behaviour indicative of searching for the source

regardless of the time observed.
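The reported test statistic can be cross-checked from the summary values alone. The sketch below assumes an equal-variance (pooled) two-sample t-test and group sizes of 22 (aggregation) and 32 (control), as given in the Fig. S7.5 legend; both are assumptions, and the function name is hypothetical. Under these assumptions the degrees of freedom are 22 + 32 - 2 = 52, which matches the "52" quoted with the test.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Equal-variance two-sample t statistic computed from means and SEMs."""
    # Recover each group's sample variance from its standard error of the mean.
    var1 = sem1 ** 2 * n1
    var2 = sem2 ** 2 * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Group sizes are an assumption taken from the Fig. S7.5 legend.
t_stat, df = pooled_t_from_summary(216.8, 33.2, 22, 0.66, 0.45, 32)
print(f"t = {t_stat:.2f}, df = {df}")
```

With the rounded means and SEMs quoted in the text, this gives a statistic close to the reported 7.89 on 52 degrees of freedom; small differences are expected from rounding of the inputs.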

7.4. Behavioural response of COTS to signals from its main predator, the giant

triton Charonia tritonis

Behavioural responses of adult COTS were examined in the presence of the giant

triton, Charonia tritonis, using the same Y-maze setup as for the aggregation assays.

One starfish test subject was placed in each of the two Y-maze arms. Twenty-four

hours prior to the treatment experiments a giant triton was placed in the header

tank. One COTS was subjected to ambient flowing sea water entering via one of the

Y-maze arms while the second test subject was exposed to the giant triton-

conditioned sea water in the header tank via the second arm. Fresh control and test

subjects were used for each experiment. Given that COTS are nocturnal and

sedentary, these experiments were performed under simulated night conditions and

basic indicators of motivation and activity were recorded. A test animal was scored as

motivated if it moved out of the Y-maze arm and into the distal leg. Experiments

were run for 45 min (N=18). Over 94% of COTS in the control group (sea water only)

did not move, whereas 50% of COTS exposed to the giant triton-conditioned water

moved, with 33% moving completely out of the arm. The control

COTS spent most of their time in one position whereas the exposed COTS moved

further downstream in the arm or as far away from the signal as they could get

(Extended Data Fig. 5, Supplementary Video S3). Some exposed COTS travelled to

the arm that was without predator water-borne signal. The pattern of responses of

COTS across the three movement categories was significantly associated with giant

triton-derived chemical signals in the water (Fisher's exact test, p < 0.01). These

changes in motivation were graphically represented in heat maps as for the

aggregation assays.
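To illustrate how an exact test of association works on data of this kind, the sketch below implements a two-sided exact test for a 2×2 table using only the standard library. It is not the analysis reported above, which compared three movement categories. The counts are back-calculated from the reported percentages under the assumption of 18 animals per group (1 of 18 controls moving, 9 of 18 exposed animals moving), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # Number of tables with k in the top-left cell and the same margins.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(n, col1)


# Assumed counts: rows are control and triton-exposed; columns are moved / did not move.
p_value = fisher_exact_2x2(1, 17, 9, 9)
print(f"p = {p_value:.4f}")
```

Integer table counts are compared rather than floating-point probabilities, so ties between equally likely tables are handled exactly. Under the assumed counts the collapsed table also gives p < 0.01, in line with the reported result.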

The cumulative duration of movement, i.e. mobility state, was approximately 10

times greater in the COTS exposed to predator water-borne signal (298±87 s)

compared to controls (29±16 s) (p<0.05) (Extended Data Fig. 5). Control COTS

meandered twice as much (25.8±2.63 deg mm⁻¹) as animals exposed

to the giant triton-conditioned sea water (12.7±1.34 deg mm⁻¹) (p<0.05) (Extended

Data Fig. 5). Overall, COTS alarmed by the giant triton moved further, more quickly

and in a nearly straight line compared with controls. As the giant triton in the

header tank was inaccessible to the test subjects, the COTS could not visually detect

the predator, yet they were clearly agitated, exhibiting behaviour indicative of a

flight response.
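The two comparisons quoted above ("approximately 10 times" and "twice as much") follow directly from the reported group means. A short worked check, using only the values in the text (the helper name is hypothetical):

```python
def fold_change(numerator, denominator):
    """Ratio of two group means."""
    return numerator / denominator


# Cumulative duration of movement (seconds): exposed versus control.
mobility_ratio = fold_change(298, 29)
# Meander (degrees per mm): control versus exposed.
meander_ratio = fold_change(25.8, 12.7)

print(f"mobility: {mobility_ratio:.1f}-fold, meander: {meander_ratio:.1f}-fold")
```

The ratios use means only and ignore the reported standard errors, so they describe effect size rather than significance.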

Additional Supplementary Tables

Table S7.1. Summary of all COTS exoproteins detected in the seawater.

Table S7.2a-n. Summary of 108 COTS secreted proteins detected in the seawater.

Supplementary Videos

Supplementary Video S1. Response of crown-of-thorns starfish over 45 minutes to

factors released by aggregating starfish. Time-lapse videos of 45 min Y-maze

behavioural assays showing in the first instance two crown-of-thorns starfish

subjected to flowing ambient seawater (control) and then two different COTS

subjected to flowing seawater conditioned with factors released by aggregating

COTS. Two example Y-mazes are shown (1, 2), with right (R) and left (L) arms. 270x

real time speed.

Supplementary Video S2. Response of crown-of-thorns starfish over 8 hours to

factors released by aggregating starfish. Time-lapse videos of 8 h Y-maze

behavioural assays showing in the first instance two crown-of-thorns starfish

subjected to flowing ambient seawater (control) and then two different COTS

subjected to flowing seawater conditioned with factors released by aggregating

COTS. Two example Y-mazes are shown (1, 2), with right (R) and left (L) arms. 480x

real time speed.

Supplementary Video S3. Response of crown-of-thorns starfish over 45 minutes to

factors released by their predator, the giant triton. Time-lapse videos of 45 min Y-

maze behavioural assays showing two crown-of-thorns starfish, one subjected to

flowing ambient seawater (control) and the other subjected to flowing seawater

conditioned with factors released by their predator, the giant triton. Two Y-mazes

are shown (1, 2), with right (R) and left (L) arms. 270x real time speed.

8. Identification and analysis of ependymin-related genes

8.1 Identification and manual curation of ependymin-related genes

Potential ependymin-related genes (EPDRs) were identified and aligned as described

in the Online Methods. Comparison of the alignment with the corresponding transcript

sequences indicated that several of the predicted proteins were incorrect.

The transcripts were then used as queries to identify the correct

intron/exon architecture of the genes in the genome assemblies. Using this method,

26 EPDR genes were found in the GBR and OKI A. planci genomes (Table S8.1). The gene

arrangement, intron-exon structure and intergenic distances are largely consistent

between the OKI and GBR genomes, with two inconsistencies identified that likely

originate from the genome assembly (Fig. S8.1). The majority of the genes are found

in two clusters of 11 and 8 genes, with the remaining genes occurring as pairs or

single genes on other scaffolds (Fig. S8.1; created using FancyGene, Rambaldi and

Ciccarelli 2009).

8.2 Characteristics of COTS ependymin-related proteins

An alignment of the manually curated GBR ependymin-related proteins is presented

in Fig. S8.2. All proteins are predicted to possess signal peptides and are therefore

expected to be secreted. A sequence logo indicating the overall frequencies of amino

acids reveals a high level of conservation of a number of residues within the

alignment, most notably of six cysteine residues. The conservation of these residues

most likely indicates that they play a structural role in this class of proteins.
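The per-position residue frequencies summarised by a sequence logo can be sketched as follows. This is a minimal illustration, not the procedure used to produce Fig. S8.2: the function names, the gap character and the choice to divide by the total number of sequences (gapped ones included) are assumptions, and the 0.5 default simply mirrors the 50% shading threshold given in the figure legend.

```python
from collections import Counter


def column_conservation(aligned_seqs, gap="-"):
    """For each alignment column, return (most common residue, fraction of all sequences carrying it)."""
    n = len(aligned_seqs)
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(s[i] for s in aligned_seqs if s[i] != gap)
        if not counts:
            result.append((gap, 0.0))
            continue
        residue, count = counts.most_common(1)[0]
        # Assumption: the denominator is the total number of sequences.
        result.append((residue, count / n))
    return result


def conserved_positions(aligned_seqs, threshold=0.5):
    """Return 0-based column indices whose top residue reaches the threshold."""
    return [
        i
        for i, (_, fraction) in enumerate(column_conservation(aligned_seqs))
        if fraction >= threshold
    ]
```

Applied to an alignment of the curated proteins, columns holding the six conserved cysteines would be expected to appear in the output of `conserved_positions` with `C` as the top residue.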

Fig. S8.1. Genomic arrangement of OKI and GBR ependymin-related genes. The gene

arrangement, intron-exon structure, and intergenic distances are largely consistent between

the two genomes. Gene orthology is indicated by a dotted line between scaffolds of each

genome. Most of the genes are found in tandem on scaffolds 141/60 and 11/218 in the OKI

and GBR genomes, respectively. GBR scaffold 60 appears to be incorrectly assembled, as the

likely paralogue of oki.141.76c is found on a separate, short scaffold (gbr_scaffold1230); its

probable location on gbr_scaffold60 is indicated by a red dotted line. The regions of the

gene models shown in red for oki.141.72 and oki.141.73 lie within gaps in the scaffold; the

intron-exon arrangement has been predicted based upon the corresponding GBR gene

models. Most gene models have a conserved architecture of six exons, except those found

on scaffolds OKI 140/GBR 184, and OKI 8/GBR 8 that possess five and three exons,

respectively.

Figure S8.2. Alignment of COTS ependymin-related proteins. The signal peptide is indicated

by a blue box. The size of the amino acid residues in the sequence logo displayed beneath

the alignment indicates the proportion of proteins possessing that residue at the same

position. Residues conserved in at least 50% of the sequences are shaded in

yellow.

8.3 Distribution of ependymin-related genes in other species

Given the large number of EPDR genes found in the COTS genomes (these genes are

generally found as single copies in non-teleost vertebrate genomes; Suárez-Castillo

and García-Arrarás 2007), we investigated the content of EPDR genes in a range of

metazoan species as described in the Online Methods (Tables S8.1 - S8.3).

The number of unique ependymin domain-containing genes found in each species

was extremely variable (Tables S8.2 and S8.3). Some genomes, including the

nematode Caenorhabditis elegans and the arthropod Tribolium castaneum, appear

to have lost the gene family in its entirety; previous studies have also failed to find

ependymin-related genes in ecdysozoans (Suárez-Castillo and García-Arrarás 2007).

Expansions of the gene family have occurred in other lineages; however, the

occurrence of these expansions does not appear to correlate with phylogeny. For

instance, although expansions appear to have occurred in the molluscs Pinctada

fucata and Lottia gigantea, a third molluscan species, Octopus bimaculoides,

possesses only two EPDR genes. From this survey, the largest expansion appears to

have occurred in COTS, which possesses 26 ependymin-related genes. Most other

ambulacrarians possess significantly fewer ependymin-related genes, with the

caveat that this analysis was performed on transcriptomes for most of these taxa,

and some members of the gene family may not have been identified. Interestingly,

however, 22 EPDR genes were identified in Patiria miniata, a closely related valvatid

asteroid. This points towards a potential expansion of the EPDR gene family in the

order Valvatida.

8.4 Phylogenetic analysis of ependymin-related genes

To reveal the relationship between COTS ependymin-related genes and those

previously identified (Suárez-Castillo and García-Arrarás 2007), phylogenetic analysis

was performed using a subset of the sequences identified in Table S8.3 (Fig. 3b; Fig.

S8.3). The alignment included ependymin-related sequences identified in taxa for

which whole genome data are available, as well as those identified in the

transcriptomes of two other valvatid asteroids, P. miniata and Asterina pectinifera,

and a more distantly-related forcipulatid asteroid, Labidaster annulatus. To

determine whether the apparent expansion of EPDRs in COTS and other valvatid

asteroids resulted from a lineage-specific expansion, an additional analysis was

performed using all sequences identified in ambulacrarian transcriptomes (Table

S8.3, Extended Data Fig. 7). Analyses were performed as described in the Online

Methods. The ML trees are presented in Fig. 3b, Extended Data Fig. 7 and Fig. S8.3.

The analysis reveals that the majority of the COTS ependymin-related genes fall

within two clades (Fig. 3b; clades 4 and 5 in Extended Data Fig. 7; Fig. S8.3), and in

many cases have closely related orthologues in P. miniata and A. pectinifera, and less

frequently L. annulatus. The purple sea urchin (Strongylocentrotus purpuratus)

sequences can be found in a separate clade. This indicates that there has been a

large expansion of EPDR genes within both the Asteroidea and the Valvatida. In

COTS, the majority of genes within each of the clades are found clustered within the

genome (Figs 3c and S8.1), indicating that gene expansion has occurred via tandem

duplication.

Clade 1 of the tree (Extended Data Fig. 7) contains members with a broad

taxonomic distribution, including vertebrates, cnidarians and molluscs, and

encompasses many of the EPDR genes reported in Suárez-Castillo and García-Arrarás

(2007). The true ependymins, which are fish-specific, restricted to the brain, and

involved in memory formation (Shashoua 1991; indicated in Figs 3 and S8.3 by a

star), also fall within this clade. The clade has one COTS member that groups closely

with P. miniata and S. purpuratus genes. Interestingly, this putative ancestral

ependymin is found on a separate scaffold from the other COTS EPDR genes and has a

distinct intron-exon arrangement (Fig. S8.1).

Table S8.2. Ependymin-related gene family numbers in metazoan genomes, and echinoderm genomes and transcriptomes

| Name | EPDRs | Phylum | Class | Superorder | Order | Family | Reference | Accession/version/source |
|---|---|---|---|---|---|---|---|---|
| Acanthaster planci | 26 | Echinodermata | Asteroidea | Valvatacea | Valvatida | Acanthasteridae | | |
| Asterina pectinifera | 7* | Echinodermata | Asteroidea | Valvatacea | Valvatida | Asterinidae | Reich et al. 2015 | SRX445871 |
| Patiria miniata | 22* | Echinodermata | Asteroidea | Valvatacea | Valvatida | Asterinidae | Reich et al. 2015# | SRX445851/SRX1261879 |
| Solaster stimpsoni | 0* | Echinodermata | Asteroidea | Valvatacea | Valvatida | Solasteridae | Telford et al. 2014 | |
| Luidia clathrata | 3* | Echinodermata | Asteroidea | Valvatacea | Paxillosida | | Reich et al. 2015 | SRX445684 |
| Luidia senegalensis | 2* | Echinodermata | Asteroidea | Valvatacea | Paxillosida | | O’Hara et al. 2014 | SRX1625090 |
| Asterias amurensis | 0* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445872 |
| Asterias forbesi | 0* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445857 |
| Asterias vulgaris | 2* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445860 |
| Labidaster annulatus | 13* | Echinodermata | Asteroidea | Forcipulatacea | | | Cannon et al. 2014 | SAMN03012748/12747 |
| Leptasterias sp. | 3* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445863 |
| Marthasterias glacialis | 7* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445866 |
| Pisaster ochraceus | 2* | Echinodermata | Asteroidea | Forcipulatacea | | | Reich et al. 2015 | SRX445868 |
| Echinaster spinulosus | 12* | Echinodermata | Asteroidea | Spinulosacea | | | Reich et al. 2015 | SRX446364 |
| Henricia sp. | 9* | Echinodermata | Asteroidea | Spinulosacea | | | Reich et al. 2015 | SRX445861 |
| Amphipholis squamata | 10* | Echinodermata | Ophiuroidea | | | | Telford et al. 2014 | |
| Astrotoma agassizii | 3* | Echinodermata | Ophiuroidea | | | | Cannon et al. 2014 | SAMN03012756 |
| Ophiactis abyssicola | 5* | Echinodermata | Ophiuroidea | | | | O’Hara et al. 2014 | SRX1625094 |
| Ophiocoma echinata | 7* | Echinodermata | Ophiuroidea | | | | Reich et al. 2015 | SRX445856 |
| Ophiomyxa australis | 5* | Echinodermata | Ophiuroidea | | | | O’Hara et al. 2014 | SRX1625098 |
| Eucidaris tribuloides | 0* | Echinodermata | Echinoidea | | | | Reich et al. 2015 | SRX445845 |
| Strongylocentrotus purp. | 8 | Echinodermata | Echinoidea | | | | Sea Urchin GSC 2006 | GCA_000002235.2 Ensembl |
| Leptosynapta clarki | 2* | Echinodermata | Holothuroidea | | | | Cannon et al. 2014 | SAMN03012745 |
| Parastichopus californ. | 5* | Echinodermata | Holothuroidea | | | | Cannon et al. 2014 | SAMN03012744 |
| Apometra wilsoni | 4* | Echinodermata | Crinoidea | | | | O’Hara et al. 2014 | SRX1625091 |
| Dumetocrinus sp | 5* | Echinodermata | Crinoidea | | | | Cannon et al. 2014 | SAMN03012750 |
| Saccoglossus kowalevskii | 4 | Hemichordata | | | | | Simakov et al. 2015 | PRJNA42857 |
| Cephalodiscus gracilis | 5* | Hemichordata | | | | | Cannon et al. 2014 | SAMN03012629 |
| Ptychodera bahamensis | 2* | Hemichordata | | | | | Cannon et al. 2014 | SAMN03012539 |
| Danio rerio | 4 | Chordata | | | | | | GCA_000002035.3 Ensembl |
| Homo sapiens | 1 | Chordata | | | | | | GCA_000001405.20 Ensembl |
| Mus musculus | 1 | Chordata | | | | | | GCA_000001635.6 Ensembl |
| Oncorhynchus mykiss | 6 | Chordata | | | | | Berthelot et al. 2014 | |
| Tetraodon nigroviridis | 2 | Chordata | | | | | | TETRAODON 8.0 Ensembl |
| Takifugu rubripes | 4 | Chordata | | | | | | FUGU 4.0 Ensembl |
| Xenopus tropicalis | 1 | Chordata | | | | | Hellsten et al. 2010 | GCA_000004195.1 Ensembl |
| Lottia gigantea | 16 | Mollusca | | | | | Simakov et al. 2013 | v.1.0 JGI Proteins:Filt.Models |
| Pinctada fucata | 13 | Mollusca | | | | | Takeuchi et al. 2016 | v.2.0 |
| Octopus bimaculoides | 2 | Mollusca | | | | | Albertin et al. 2015 | PRJNA270931 |
| Capitella teleta | 4 | Annelida | | | | | Simakov et al. 2013 | v.1.0 JGI Proteins:Filt.Models |
| Tribolium castaneum | 0 | Arthropoda | | | | | Tribolium GSC 2008 | GCA_000002335.2 Ensembl |
| Caenorhabditis elegans | 0 | Nematoda | | | | | | GCA_000002985.3 Ensembl |
| Trichoplax adhaerens | 10 | Placozoa | | | | | Srivastava et al. 2008 | Triad1 JGI best_proteins |
| Acropora digitifera | 0 | Cnidaria | | | | | Shinzato et al. 2011 | |
| Nematostella vectensis | 1 | Cnidaria | | | | | Putnam et al. 2007 | v.1.0 JGI Proteins:Filt.Models |
| Amphimedon queensl. | 12 | Porifera | | | | | Srivastava et al. 2010 | |

* Estimate is from a transcriptome and may not represent the complete gene complement in that species. # Also unpublished.

Figure S8.3. Phylogenetic tree of ependymin-related genes. This is a more detailed depiction of the

tree presented in Fig. 3b. Midpoint rooted maximum likelihood phylogenetic tree of EPDR genes in

selected lineages. GBR COTS sequence names are in red; ovals next to the names indicate whether

the protein was found within the exoproteome in aggregating (red) or in both aggregating and

alarmed (green) animals. The true ependymins (expressed in the brain of teleosts) are indicated by a

star. Branches with ML bootstrap values >70 and Bayesian posterior probability values >0.9 are

indicated by a solid line, those with lower values are indicated by a dashed line. The scale bar

indicates the number of substitutions per site. Ape, Asterina pectinifera; Cte, Capitella teleta; Dre,

Danio rerio; Hsa, Homo sapiens; Lan, Labidaster annulatus; Lgi, Lottia gigantea; Mmu, Mus

musculus; Nve, Nematostella vectensis; Omy, Oncorhynchus mykiss; Pmi, Patiria miniata; Sko,

Saccoglossus kowalevskii; Spu, Strongylocentrotus purpuratus; Tni, Tetraodon nigroviridis; Tru,

Takifugu rubripes; Xtr, Xenopus tropicalis.

Additional Supplementary Tables

Table S8.1. GBR-OKI EPDRs (curated).

Table S8.3. EPDRs used in analyses.

9. Identification and analysis of GPCRs

We screened the protein models of 6 deuterostome species – Acanthaster planci (both OKI

and GBR genomes), Branchiostoma floridae, Homo sapiens, Saccoglossus kowalevskii,

Ptychodera flava, and Strongylocentrotus purpuratus – using pfam_scan.pl

(ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/) against version 27 of the PFAM-A

database (Krishnan et al. 2014; Table S9.1). Sequences annotated by pfam_scan.pl with

domains in the GPCR_A Pfam clan (CL0192), and with at least 5 transmembrane regions

according to hmmtop (Tusnady and Simon 2001), were considered to be GPCRs and were

further annotated with InterProScan 5.8-49.0 (Jones et al. 2014).
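The two selection criteria above (a domain in clan CL0192 and at least five predicted transmembrane regions) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `ProteinAnnotation` record and the function names are hypothetical, and the code assumes the pfam_scan.pl and hmmtop outputs have already been parsed into per-protein summaries.

```python
from dataclasses import dataclass, field

GPCR_A_CLAN = "CL0192"  # Pfam clan named in the text
MIN_TM_REGIONS = 5      # minimum transmembrane regions named in the text


@dataclass
class ProteinAnnotation:
    """Hypothetical per-protein summary of domain and topology predictions."""
    protein_id: str
    pfam_clans: set = field(default_factory=set)  # clans of all Pfam-A domains hit
    tm_regions: int = 0                           # predicted transmembrane helices


def is_candidate_gpcr(protein):
    """Apply both criteria: GPCR_A clan membership and enough TM regions."""
    return GPCR_A_CLAN in protein.pfam_clans and protein.tm_regions >= MIN_TM_REGIONS


def select_gpcrs(proteins):
    """Return the IDs of proteins passing both criteria, in input order."""
    return [p.protein_id for p in proteins if is_candidate_gpcr(p)]
```

Keeping the thresholds as named constants makes it straightforward to test how sensitive the GPCR count is to the transmembrane cut-off.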

Large-scale phylogenetic analysis of olfactory receptors (ORs) from several chordates

showed that in most species the ORs expanded in a lineage-specific manner (Adipietro et al.

2012, Khan et al. 2015, Niimura 2012). However, identification of ORs in non-chordates

based simply on sequence similarity has been difficult (Nei et al. 2008). Helpful to the

present analysis is the genome of another echinoderm, the sea urchin S. purpuratus, which

encodes a large taxon-specific expansion of rhodopsin family GPCRs (Raible et al. 2006). A

substantial fraction of these appear to be OR-like, based on gene architecture (single-exon

genes, as in vertebrate ORs) and on their expression in tissues with known chemosensory

function, including pedicellariae and tube feet (Raible et al. 2006).

Many sequences were annotated as rhodopsin; hence, sequences annotated with

PF00001 were trimmed specifically to the region annotated as “7 transmembrane receptor

(rhodopsin family)” by InterProScan and subsequently parsed into subfamilies using

FastOrtho, a modified version of OrthoMCL (Li et al. 2003), with an inflation parameter of 1.5.

FastOrtho identified 957 groups of at least two GPCRs in the rhodopsin family (7tm_1)

(Table S9.2).
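Trimming a protein to its annotated domain region amounts to slicing the sequence by the reported coordinates. The sketch below is illustrative only: the function name is hypothetical, and it assumes the domain coordinates are 1-based and inclusive, which should be checked against the annotation output actually used.

```python
def trim_to_domain(sequence, start, end):
    """Return the residues from start to end (assumed 1-based, inclusive)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("domain coordinates fall outside the sequence")
    # Convert the 1-based inclusive interval to a Python slice.
    return sequence[start - 1:end]
```

For example, `trim_to_domain("MKTAYIAKQR", 3, 6)` keeps residues 3 to 6 of a ten-residue toy sequence.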

The other GPCRs were similarly trimmed to the transmembrane receptor region for

phylogenetic analysis. The annotations used for trimming for each of these GPCRs were as

follows: 7TM_3/Glutamate (PF00003); Dicty_CAR (PF05462) “G-protein coupled receptors

family 2 profile 2”; Frizzled (PF01534) “Frizzled/Smoothened family membrane region”;

GpcrRhopsn4 (PF10192) “rhodopsin-like GPCR transmembrane domain”; Lung_7-TM_R

(PF06814) “Lung seven transmembrane receptor”; and Ocular_alb (PF02101) “Ocular

albinism type 1 protein”. Phylogenetic analyses were conducted on the transmembrane

receptor region for each GPCR family using FastTree 2 (Price et al. 2010) with the slow

gamma model options (Tables S9.3-S9.6).

9.1 Tissue expression of GPCRs

To examine tissue-specific patterns of GPCR gene expression, we inferred the expression of

all genes in each of our transcriptomes using RSEM (Li and Dewey 2011). Expression data, in terms

of fragments per kilobase of transcript per million mapped reads (FPKM), were then

assembled into a data matrix and visualized using Pretty Heatmap

(https://cran.r-project.org/web/packages/pheatmap) in R (R Core Team 2015).
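The assembly of per-tissue expression values into a gene-by-tissue matrix can be sketched as follows. This is a hedged illustration, not the pipeline used here: all function names are hypothetical, `fpkm` is only the textbook definition (RSEM reports its own estimates), filling missing genes with zero is an assumption, and the log2(FPKM + 1) transform is a common convention for heatmaps that the text does not specify.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def build_expression_matrix(fpkm_by_tissue):
    """Turn {tissue: {gene: FPKM}} into (genes, tissues, rows), one row per gene."""
    tissues = sorted(fpkm_by_tissue)
    genes = sorted({g for values in fpkm_by_tissue.values() for g in values})
    # Assumption: a gene absent from a tissue's table is treated as FPKM 0.
    matrix = [[fpkm_by_tissue[t].get(g, 0.0) for t in tissues] for g in genes]
    return genes, tissues, matrix


def log2_transform(matrix):
    """Apply log2(x + 1) to every cell (an assumed, commonly used scaling)."""
    return [[math.log2(value + 1.0) for value in row] for row in matrix]
```

The resulting rows and column labels could then be written out as a table and passed to a heatmap function such as pheatmap in R.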

9.2 Identification and phylogenetic analysis of olfactory receptor-like genes

Olfactory receptors (ORs) constitute the largest multigene family in mammals and the

number and diversity of ORs vary markedly from species to species (Nei et al. 2008). In

vertebrates, olfaction is largely mediated by ORs belonging to the rhodopsin family (Class A)

of GPCRs (Fredriksson et al. 2003). OR repertoires have been characterised in a number of

chordates, including terrestrial mammals, fishes, sea lamprey and the cephalochordate

amphioxus (Niimura 2009). Vertebrate ORs are categorised into Type I and Type II ORs,

which are further classified into six (α-ζ) and five (η-λ) groups, respectively (Niimura 2009).

Mapping of OR repertoires in terrestrial and marine chordates showed that Type I α and γ

groups are more specific for detecting airborne odourants, whereas Type I δ, ε, and ζ and

Type II η are likely to detect water-borne chemical signals (Niimura 2009). Large-scale

phylogenetic analysis of ORs from several chordates showed that in most species, the ORs

expanded in a lineage-specific manner (Adipietro et al. 2012, Khan et al. 2015, Niimura

2012). Given these lineage-specific expansions, which have resulted in a diversity of

chordate ORs, identification of ORs in non-chordates based simply on sequence similarity

has been difficult (Nei et al. 2008). A substantial fraction of the large taxon-specific

expansion of rhodopsin family GPCRs in S. purpuratus appear to be OR-like, based on gene

architecture (single-exon genes, as in vertebrate ORs) and their expression in chemosensory

tissues (Raible et al. 2006).

To identify putative OR-like genes in the COTS genome, we thus used a methodology

similar to that described previously (Niimura 2009, Niimura 2013), with some

modifications to incorporate the approaches of Raible et al. (2006). Initially, we conducted a

Pfam search against all predicted protein models of both GBR and OKI COTS genomes, using

an e-value cut-off of 0.001. Although we identified 766 sequences that align with the 7tm_1

domain (PF00001), representative of the class A or rhodopsin-like GPCRs, a screen with the

7tm_4 Pfam HMM model (PF13853; e-value cut-off of 0.001) identified no ORs in either

COTS genome. This anomaly appears to arise because the seed alignment for the 7tm_4

model contains only mammalian representatives, which comprise recently duplicated

OR-like genes. Analogous scenarios were observed in amphioxus (Churcher and Taylor

2009), and in other invertebrates including S. purpuratus (Churcher and Taylor 2011), where

the 7tm_4 model failed to readily identify OR-like sequences. Therefore, to search for putative

OR-like sequences in A. planci, we built 13 distinct HMM profiles from previously curated OR

repertoires, comprising those from fish (fugu, medaka, pufferfish, zebrafish and stickleback),

amphioxus, sea urchin (“Specific rapidly expanded lineages of rhodopsin family” GPCRs

(Surreal GPCRs) groups A-F) and manually curated ORs from Swiss-Prot. All non-redundant

hits were retrieved from the combined results of all HMM searches. As anticipated, because

all class A GPCRs share key residues, including an E/DRY motif at the junction of TM3 and

the intracellular loop and an NPxxY motif in TM7, the non-redundant dataset contained a

large number of non-ORs. To distinguish ORs from the rest of the 12 rhodopsin

subfamilies (non-ORs), we conducted a BLASTP search (default settings) against a local

database containing all class A or rhodopsin-like GPCRs from the Swiss-Prot database. As

observed for the S. purpuratus Surreal GPCRs (Raible et al. 2006), this approach yielded an

unexpectedly low number of rhodopsin GPCR genes in both GBR and OKI genomes (four

each) that could be unambiguously categorised into the OR subfamily. This is because

the top five BLASTP hits of A. planci rhodopsin-family GPCRs contained both ORs and

non-OR rhodopsin subfamilies, preventing the unambiguous classification of these as

ORs. Conversely, an all-against-all comparison of COTS rhodopsin-like GPCRs revealed a

number of paralogous clusters of putative OR-like genes, as observed earlier in S.

purpuratus Surreal GPCRs.
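The ambiguity described above, where the top BLASTP hits of a query mix ORs with other rhodopsin subfamilies, can be made explicit with a small decision rule. This is a sketch under assumptions: the function name and the labels are hypothetical, and the rule that a query is called an OR only when all of its top five hits are ORs is one plausible reading of the text rather than the documented criterion.

```python
def classify_by_top_hits(hit_labels, top_n=5):
    """Classify a query from the labels of its BLASTP hits, ordered best first.

    hit_labels: list of "OR" or "non-OR" strings (hypothetical labels assigned
    to the Swiss-Prot subjects beforehand).
    """
    top = hit_labels[:top_n]
    if not top:
        return "no hit"
    kinds = set(top)
    if kinds == {"OR"}:
        return "OR"            # every top hit is an OR
    if "OR" in kinds:
        return "ambiguous"     # ORs mixed with other rhodopsin subfamilies
    return "non-OR"
```

Under this rule, most COTS rhodopsin-family GPCRs would fall into the "ambiguous" class, consistent with the low number of unambiguous ORs reported above.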

To determine if these COTS paralogous clusters of Class A GPCRs are species-specific – as is

the case for the Surreal GPCRs - and to resolve their relationship to other “Class A”

deuterostome GPCRs, we conducted a large-scale comparative phylogenetic analysis. The

dataset included class A rhodopsin-like GPCRs from S. purpuratus, which includes the

Surreal GPCRs, from two hemichordates (P. flava and S. kowalevskii), as well as ORs from

fish (fugu, medaka, pufferfish, zebrafish, stickleback) and amphioxus. All sequences that

contained 5 to 7 transmembrane helices were considered for phylogenetic analysis. The

final dataset (2615 sequences) was aligned using MAFFT V7 with the FFT-NS-2 progressive

method (Katoh and Standley 2013) and the alignment was manually trimmed to conserved

blocks of transmembrane regions for phylogenetic tree reconstruction. The ML phylogenetic

tree topology shown in Fig. 4b was built in MEGA7 using a Poisson model with rate

uniformity across sites (Kumar et al. 2016). Attempts to verify the inferred tree topology

using the Bayesian approach implemented in MrBayes3.2 (Ronquist et al. 2012) and ML

methods implemented in RAxML (Stamatakis et al. 2006) yielded unresolved trees, likely

due to size and divergence of the dataset. Nonetheless, the reported ML topology supports

several paralogous clusters (a to k) of COTS rhodopsin family GPCRs that appear to be

closely related to Surreal GPCRs in the S. purpuratus genome (Raible et al. 2006). These

paralogous clusters are distinct from non-OR rhodopsin-like GPCRs as well as distinct from

fish and amphioxus ORs, implying that both A. planci and S. purpuratus have undergone

lineage-specific expansions of rhodopsin-like GPCRs that may perform analogous

chemosensory functions to ORs in other species.

9.3 Identification of OR-like Motifs

Although ORs have undergone taxon-specific expansions and diversifications in several

chordate taxa, they share a few characteristic OR-like amino acid motifs. Such conserved

characteristic motifs include LxxPxYxxxxxLxxxDxxxxxxxxP, KAxxTxxxH, MAxDRYxxxCxPLxY,

and SSxxNPxxY (Churcher and Taylor 2009, Churcher and Taylor 2011); the latter two may not be

specific to ORs as they comprise the DRY and NPxxY motifs usually found in all rhodopsin-like

GPCRs. These motifs were previously used to query/search for ORs in non-chordate species,

and were also found to occur in echinoderm and cnidarian OR-like sequences, although not

all residues were found to be conserved (Churcher and Taylor 2009, Churcher and Taylor

2011). We here considered LxxPxYxxxxxLxxxDxxxxxxxxP and KAxxTxxxH to be indicative of

OR-related sequences and searched for these in putative COTS OR-like paralogous clusters

identified as described above. Of these, the LxxxxxxxxxxLxxxD motif is encoded in most

genes comprising COTS OR-like clusters; the KAxxTxxxH is largely absent. An alignment of

representative mammalian, amphioxus, sea urchin and COTS OR-like sequences reveals a

strong conservation of the LxxxxxxxxxxLxxxD motif across deuterostome phyla (Fig. S9.1).
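Motifs written in this notation, where "x" stands for any residue, translate directly into regular expressions. The following is a minimal sketch rather than the search procedure used here: the function names are hypothetical, the motif strings are copied from the text, and only exact matches at the fixed positions are reported, whereas the text notes that not all residues are conserved in every lineage.

```python
import re

# Motif strings as written in the text; lowercase "x" means any residue.
OR_MOTIFS = [
    "LxxPxYxxxxxLxxxDxxxxxxxxP",
    "KAxxTxxxH",
    "LxxxxxxxxxxLxxxD",  # relaxed form of the first motif
]


def motif_to_regex(motif):
    """Compile a motif into a regex, mapping "x" to a single-character wildcard."""
    return re.compile("".join("." if c == "x" else re.escape(c) for c in motif))


def find_motifs(sequence, motifs=OR_MOTIFS):
    """Return {motif: [0-based start positions]} for non-overlapping matches."""
    return {
        motif: [m.start() for m in motif_to_regex(motif).finditer(sequence)]
        for motif in motifs
    }
```

Running `find_motifs` over every member of a paralogous cluster and counting the sequences with at least one hit gives a quick per-cluster summary of motif occurrence.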

Additional Supplementary Tables

Table S9.1. Comparison of ambulacrarian and amphioxus GPCR repertoires.

Table S9.2. Class A GPCRs – rhodopsin receptors.

Table S9.3. Class B GPCRs – adhesion and secretin receptors.

Table S9.4. Class C GPCRs - glutamate receptors.

Table S9.5. Class F GPCRs - Frizzled receptors.

Table S9.6. Other GPCR receptors.

Fig. S9.1. Alignment showing the presence of one of the characteristic OR-like motifs conserved across

deuterostome taxa. A few representative sequences are aligned from each cluster shown on the left.

Residues conserved in at least 50% of the sequences are shaded in yellow. Panels on the

right show amino acid logos of the corresponding clusters made using all sequences of each cluster.

Residues that aligned with the characteristic LxxPxYxxxxxLxxxDxxxxxxxxP motif are highlighted in red

in both the alignment and the amino acid logo.

10. References

Adipietro KA, Mainland JD, Matsunami H (2012) Functional evolution of mammalian odorant receptors. PLoS Genet 8:e1002821.

Adjeroud M et al. (2009) Recurrent disturbances, recovery trajectories, and resilience of coral assemblages on a South Central Pacific reef. Coral Reefs 28:775–780.

Albertin CB et al. (2015) The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524:220-224.

Alqaisi KM et al. (2016) A comparative study of vitellogenesis in Echinodermata: Lessons from the sea star. Comp Biochem Physiol A Mol Integr Physiol 198:72-86.

Audsley N, Down RE (2015) G protein coupled receptors as targets for next generation pesticides. Insect Biochem Mol Biol 67:27-37.

Babcock RC, Mundy CN, Whitehead D (1994) Sperm diffusion models and in situ confirmation of long-distance fertilization in the free-spawning Asteroid Acanthaster planci. Biol Bull 186:17-28.

Babendreier D (2007) Pros and Cons of Biological Control. pp 403-418. In ‘Biological Invasions’, Nentwig W (ed) Ecol Ser 193. Springer-Verlag, Berlin.

Baird AH, Pratchett MS, Hoey AS, Herdiana Y, Campbell SJ (2013) Acanthaster planci is a major cause of coral mortality in Indonesia. Coral Reefs 32:803-812.

Barham EG, Gowdy RW, Wolfson FH (1973) Acanthaster (Echinodermata, Asteroidea) in the Gulf of California. US Nat Mar Fish Serv Fish Bull 71:927–942.

Barnes JH (1966) The crown-of-thorns starfish as a destroyer of coral. Aust Nat Hist 15:257-261.

Bateman A et al. (2004) The Pfam protein families database. Nucl Acids Res 32:D138–D141.

Benzie JAH, Black KP, Moran PJ, Dixon P (1994) Small-scale dispersion of eggs and sperm of the crown-of-thorns starfish (Acanthaster planci) in a shallow coral reef habitat. Biol Bull 186:153–167.

Berthelot C et al. (2014) The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Comm 5:3657.

Birkeland C (1982) Terrestrial runoff as a cause of outbreaks of Acanthaster planci (Echinodermata: Asteroidea). Mar Biol 69:175-185.

Birkeland C, Randall RH (1979) Report on the Acanthaster planci (Alamea) studies in Tutuila, American Samoa. NOAA: Local climatological data. Annual summary with comparative data. Pago Pago, American Samoa.

Birkeland CE, Lucas JS (1990) Acanthaster planci: major management problems of coral reefs. CRC Press, Boca Raton, Florida; 257 p.

Boetzer M et al. (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinform 27:578–79.

Bouchon C (1985) Quantitative study of scleractinian coral communities of Tiahura Reef (Moorea Island, French Polynesia). Proc 5th Coral Reef Congr 6:279-284.

Branham JM, Reed SA, Bailey JH, Caperon J (1971) Coral-eating sea stars Acanthaster planci in Hawaii. Science 172:1155.

Caers J et al. (2012) More than two decades of research on insect neuropeptide GPCRs: an overview. Front Endocrin 3:2-30.

Camacho C et al. (2009) BLAST+: architecture and applications. BMC Bioinform 10:421.

Cameron A et al. (2015) Do echinoderm genomes measure up? Mar Genomics 22:1-9.

Cameron AM, Endean R, DeVantier LM (1991) Predation on massive corals: Are devastating population outbreaks of Acanthaster planci novel events? Mar Ecol Prog Ser 75:251–258.

Cannon JT et al. (2014) Phylogenomic resolution of the hemichordate and echinoderm clade. Curr Biol 24:2827–2832.

Chapman JA et al. (2011) Meraculous: de novo genome assembly with short paired-end reads. PLOS ONE 6:e23501.

Cheney DP (1973) An analysis of the Acanthaster control programs in Guam and Trust Territory of the Pacific Islands. Micronesica 9:171.

Chikhi R, Medvedev P (2013) Informed and automated k-mer size selection for genome assembly. Bioinform 30:31-37.

Churcher AM, Taylor JS (2009) Amphioxus (Branchiostoma floridae) has orthologs of vertebrate odorant receptors. BMC Evol Biol 9:242.

Churcher AM, Taylor JS (2011) The antiquity of chordate odorant receptors is revealed by the discovery of orthologs in the cnidarian Nematostella vectensis. Genome Biol Evol 3:36-43.

Cohen E (2014) Advances in Insect Physiology: Target Receptors in the Control of Insect Pests: Part II. Elsevier, Amsterdam, 495 p.

Cole RN, Burggren WW (1981) The contribution of respiratory papulae and tube feet to oxygen uptake in the sea star Asterias forbesi. Mar Biol Lett 2:279-287.

Conand C (1984) Distribution, reproductive cycle and morphometric relationships of Acanthaster planci (Echinodermata: Asteroidea) in New Caledonia, western tropical Pacific. Proc 5th Intl Echinoderm Conf:499–506.

Connell JH, Hughes TP, Wallace CC (2015) A 30-year study of coral abundance, recruitment and disturbance at several scales in space and time. Ecol Monogr 67:461-488.

Cote RG et al. (2012) The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium. Mol Cell Proteomics 11:1682-1689.

De’ath G, Fabricius K, Sweatman H, Puotinen M (2012) The 27-year decline of coral cover on the Great Barrier Reef and its causes. Proc Natl Acad Sci USA 109:17995-17999.

Dong G et al. (2011) Chemical constituents and bioactivities of starfish. Chem Biodiv 8:740-791.

Endean R, Chesher RH (1973) Temporal and spatial distribution of Acanthaster planci population explosions in the Indo-West Pacific region. Biol Conserv 5:87.

Farmanfarmaian A (1966) The Respiratory Physiology of Echinoderms. pp 245-265. In Boolootian RA (editor) Physiology of Echinodermata. John Wiley & Sons, New York.

Fraser N, Crawford B, Kusen J (2000) Best practices guide for crown-of-thorns clean-ups. Coastal Resources Center Coastal Management Report #2225. Proyek Pesisir, CRC/URI CRMP, NRM Secretariat, Ratu Plaza Building 18th Floor Jl. Jenderal Sudirman 9, Jakarta Selatan 10270, Indonesia, 37 pp.

Fredriksson R, Lagerstrom MC, Lundin LG, Schioth HB (2003) The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharm 63:1256-1272.

Galtsoff PS, Loosanoff VL (1939) Natural history and method of controlling the starfish (Asterias forbesi). Bull US Bur Fish 49:75-132.

Garlovsky DF, Bergquist A (1970) Crown-of-thorns starfish in Western Samoa. S Pac Bull 20:47.

Garner D (1971) A report on the preliminary findings of a brief study on the crown-of-thorns starfish (Acanthaster planci) carried out on the island of Malaita in the British Solomon Islands Protectorate. Regional Symp Conserv Nature – Reef and Lagoons, South Pacific Commission, Noumea, New Caledonia.

Gladstone W (1992) Observations of crown-of-thorns starfish spawning. Mar Freshw Res 43:535-537.

Gouezo M et al. (2015) Impact of two sequential super typhoons on coral reef communities in Palau. Mar Ecol Prog Ser 540:73-85.

Haas BJ et al. (2008) Automated eukaryotic gene structure annotation using evidence modeler and the program to assemble spliced alignments. Genome Biol 9:1-22.

Hellsten U et al. (2010) The genome of the Western clawed frog Xenopus tropicalis. Science 328:633-636.

Hock K, Wolff NH, Condie SA, Anthony RN, Mumby PJ (2014) Connectivity networks reveal the risks of crown-of-thorns starfish outbreaks on the Great Barrier Reef. J Appl Ecol 51:1188-1196.

Hostettmann K, Marston A (1995) Saponins. Cambridge University Press, Cambridge. 555 p.

Jones P et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240.

Kanayama RK (1970) The Crown-of-thorns starfish. Aloha Aina Dept Land Nat Resources Hawaii 1:16-18.

Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 33:511–518.

Kayal M et al. (2012) Predator crown-of-thorns starfish (Acanthaster planci) outbreak, mass mortality of corals and cascading effects on reef fish and benthic communities. PLOS ONE 7:e47363.

Kenchington RA, Pearson R (1981) Crown of thorns starfish on the Great Barrier Reef; a situation report. Proc 4th Intl Coral Reef Symp 2:597-600.

Kent JW et al. (2002) The human genome browser at UCSC. Genome Res 12:996–1006.

Kettle BT, Lucas JS (1987) Biometric relationships between organ indices, fecundity, oxygen consumption and body size in Acanthaster planci (Echinodermata; Asteroidea). Bull Mar Sci 41:541–551.

Khan I et al. (2015) Olfactory receptor subgenomes linked with broad ecological adaptations in Sauropsida. Mol Biol Evol 32:2832-2843.

Krishnan A et al. (2014) The GPCR repertoire in the demosponge Amphimedon queenslandica: insights into the GPCR system at the early divergence of animals. BMC Evol Biol 14:270.

Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol. [in press]

Lawrence JM (2013) Starfish: Biology and Ecology of the Asteroidea. Johns Hopkins University Press, Baltimore, 267 p.

Lee C-C, Tsai W-S, Hsieh HJ, Hwang D-F (2013) Hemolytic activity of venom from crown-of-thorns starfish Acanthaster planci. J Venom Anim Toxins Incl Trop Dis 19:22.

Lee C-C, Hsieh HJ, Hwang D-F (2014) Cytotoxic and apoptotic activities of the plancitoxin I from the venom of crown-of-thorns starfish (Acanthaster planci) on A375.S2 cells. J Appl Toxicol 35:407-417.

Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform 12:323.

Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178-2189.

Lin LI (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255-268.

Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550.

Lucas JS, Hart RJ, Howden ME, Salathe R (1979) Saponins in eggs and larvae of Acanthaster planci (L.) (Asteroidea) as chemical defences against planktivorous fish. J Expt Mar Biol Ecol 40:155–165.

Marcais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinform 27:764–70.

Marsh JA (1972) Past and present status of Acanthaster planci in Palau. Proc Univ Guam – Trust Territory Acanthaster planci Workshop. Tsuda RT (ed) Univ Guam Mar Lab Tech Rep No 3, 24.

Mendonça VA et al. (2010) Persistent and expanding population outbreaks of the corallivorous starfish Acanthaster planci in the Northwestern Indian Ocean: are they really a consequence of unsustainable starfish predator removal through overfishing in coral reefs, or a response to a changing environment? Zool Stud 49:108–123.

Menge BA, Freidenburg TL (2001) Keystone species. pp 613-631. In Levin SA (ed) Encyclopedia of Biodiversity. Academic Press, New York.

Menge BA, Sanford E (2013) Ecological role of sea stars from populations to meta-ecosystems. pp 67-80. In Lawrence JM (ed) Starfish: Biology and Ecology of the Asteroidea. Johns Hopkins University Press, Baltimore.

Messmer V, Pratchett MS, Clark TD (2013) Capacity for regeneration in crown of thorns starfish, Acanthaster planci. Coral Reefs 32:461.

Moran PJ (1986) The Acanthaster phenomenon. Ocean Marine Biol: Ann Rev 24:379-480.

Mori C et al. (2015) Through bleaching and tsunami: Coral reef recovery in the Maldives. Mar Poll Bull 98:188-200.

Moutardier G et al. (2015) Lime juice and vinegar injections as cheap and natural alternative to control COTS outbreaks. PLOS ONE 10:e0137605.

Muhando CA, Lanshammar F (2008) Ecological effects of the crown-of-thorns starfish removal programme on Chumbe Island Coral Park, Zanzibar, Tanzania. Proc 11th Int Coral Reef Symp 23:1127-1131.

Nakamura K (1972) Past and present status of Acanthaster planci in Yap. Proc Univ Guam – Trust Territory Acanthaster planci Workshop. Tsuda RT (ed) Univ Guam Mar Lab Tech Rep No 3, 23.

Nakamura M, Okaji K, Higa Y, Yamakawa E, Mitarai S (2014) Spatial and temporal population dynamics of the crown-of-thorns starfish, Acanthaster planci, over a 24-year period along the central west coast of Okinawa Island, Japan. Mar Biol 161:2521–2530.

Nei M, Niimura Y, Nozawa M (2008) The evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity. Nat Rev Genet 9:951-963.

Niimura Y (2009) On the origin and evolution of vertebrate olfactory receptor genes: comparative genome analysis among 23 chordate species. Genome Biol Evol 1:34-44.

Niimura Y (2012) Olfactory receptor multigene family in vertebrates: from the viewpoint of evolutionary genomics. Curr Genomics 13:103-114.

Niimura Y (2013) Identification of chemosensory receptor genes from vertebrate genomes. Meth Mol Biol 1068:95-105.

O’Hara TD, Hugall AF, Thuy B, Moussalli A (2014) Phylogenomic resolution of the class Ophiuroidea unlocks a global microfossil record. Curr Biol 24:1874-1879.

Omori M (2011) Degradation and restoration of coral reefs: Experience in Okinawa, Japan. Mar Biol Res 7:3-12.

Onizuka EW (1976) Studies on the effects of crown-of-thorns starfish on marine game fish habitat. Job Progress Report, Statewide Dingell-Johnson Program, Hawaii.

Osborne K, Dolman AM, Burgess SC, Johns KA (2011) Disturbance and the dynamics of coral cover on the Great Barrier Reef (1995-2009). PLOS ONE 6:e17516.

Owens D (1971) Acanthaster planci starfish in Fiji: a survey of incidence and biological studies. Fiji Agric J 33:15.

Pearson R (1981) Recovery and recolonization of coral reefs. Mar Ecol Prog Ser 4:105–122.

Porter JW (1972) Predation by Acanthaster and its effect on coral species diversity. Amer Nat 106:487-492.

Pratchett MS (2010) Changes in coral communities during an outbreak of Acanthaster planci at Lizard Island, northern Great Barrier Reef (1995-1999). Coral Reefs 29:717-725.

Pratchett MS, Caballes CF, Rivera-Posada JA, Sweatman HPA (2014) Limits to understanding and managing outbreaks of crown-of-thorns starfish (Acanthaster planci). pp 133-200. In Hughes RM, Hughes DJ, Smith IP (eds) Ocean Mar Biol Ann Rev 52. CRC Press.

Putnam NH et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86-94.

Rambaldi D, Ciccarelli FD (2009) FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci. Bioinformatics 25:2281-2282.

Randall JE (1972) Chemical pollution and the sea and the crown-of-thorns starfish (Acanthaster planci). Biotropica 4:132-144.

Rand C, Medvedev P (2014) Informed and automated k-mer size selection for genome assembly. Bioinform 30:31–37.

Reich A, Dunn C, Akasaka K, Wessel G (2015) Phylogenomic analyses of Echinodermata support the sister groups of Asterozoa and Echinozoa. PLOS ONE 10:e0119627.

Richards S et al. (2008) The genome of the model beetle and pest Tribolium castaneum. Nature 452:949-955.

Riegl B, Berumen M, Bruckner A (2013) Coral population trajectories, increased disturbance and management intervention: a sensitivity analysis. Ecol Evol 3:1050-1064.

Rivera-Posada J, Owens L (2014) Osmotic shock as alternative method to control Acanthaster planci. J Coast Life Med 2:99-106.

Rivera-Posada J, Pratchett MS (2012) A review of existing control efforts for A. planci; limitations to success. Report to the Department of Sustainability, Environment, Water, Population & Communities. NERP, Tropical Environmental Hub. Townsville, 26 pp.

Rivera-Posada J, Caballes CF, Pratchett MS (2013) Lethal doses of oxbile, peptones and thiosulfate-citrate-bile-sucrose agar (TCBS) for Acanthaster planci; exploring alternative population control options. Mar Poll Bull 75:133-139.

Rivera-Posada J, Pratchett MS, Aguilar C, Grand A, Caballes CF (2014) Bile salts and the single-shot lethal method for killing crown-of-thorns sea stars (Acanthaster planci). Ocean Coast Manag 102:383-390.

Rivera-Posada J, Pratchett MS (2012) A review of existing control efforts for A. planci; limitations to successes. Report to the Department of Sustainability, Environment, Water, Population & Communities, NERP, Tropical Environmental Hub. Townsville, June 5 2012. 26 p.

Rilov G, Schiel DR (2011) Community regulation: The relative importance of recruitment and predation intensity on an intertidal community dominant in a seascape context. PLOS ONE 6:e23958.

Ronquist F et al. (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539-542.

Sablan B (1972) Past and present status of Acanthaster planci in the Marshall Islands. Proc Univ Guam – Trust Territory Acanthaster planci Workshop. Tsuda RT (ed) Univ Guam Mar Lab Tech Rep No 3, 21.

Sapp J (1999) What is Natural? – Coral Reef Crisis. Oxford University Press, Oxford. 275 pp.

Savitri IKE, Ibrahim F, Sahlan M, Wijanarko A (2011) Rapid and efficient purification method of phospholipase A2 from Acanthaster planci. Int J Pharm Bio Sci 2:401-406.

Schiffels S, Durbin R (2014) Inferring human population size and separation history from multiple genome sequences. Nat Genet 46:919-925.

Schleyer MH, Celliers L (2003) Biodiversity on the marginal coral reefs of South Africa: what does the future hold? Zool Verh 345:387–400.

Semmens DC et al. (2013) Discovery of a novel neurophysin-associated neuropeptide that triggers cardiac stomach contraction and retraction in starfish. J Exp Biol 216:4047-4053.

Shiomi K, Midorikawa S, Ishida M, Nagashima Y, Nagai H (2004) Plancitoxins, lethal factors from the crown-of-thorns starfish Acanthaster planci, are deoxyribonucleases II. Toxicon 44:499-506.

Sluka RD, Miller MW (1999) Status of crown-of-thorns starfish in Laamu Atoll, Republic of Maldives. Bull Mar Sci 65:253–258.

Shashoua VE (1991) Ependymin, a brain extracellular glycoprotein, and CNS plasticity. Ann NY Acad Sci 627:94-114.

Shinzato C et al. (2011) Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476:320-323.

Shoguchi E et al. (2013) Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr Biol 23:1399–1408.

Simakov O et al. (2013) Insights into bilaterian evolution from three spiralian genomes. Nature 493:526-531.

Simakov O et al. (2015) Hemichordate genomes and deuterostome origins. Nature 527:459–65.

Sodergren E et al. (2006) The genome of the sea urchin Strongylocentrotus purpuratus. Science 314:941-952.

Srivastava M et al. (2008) The Trichoplax genome and the nature of placozoans. Nature 454:955-960.

Srivastava M et al. (2010) The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466:720-726.

Stump R (1990) Life history characteristics of Acanthaster planci populations, potential clues to causes of outbreaks. pp 105-118. In Engelhardt U, Lassig B (eds) The Possible Causes and Consequences of Outbreaks of the crown-of-thorns Starfish. GBRMPA Workshop Series No. 18.

Suárez-Castillo EC, García-Arrarás JE (2007) Molecular evolution of the ependymin protein family: a necessary update. BMC Evol Biol 7:23.

Sweatman H, Delean S, Syms C (2011) Assessing loss of coral cover on Australia’s Great Barrier Reef over two decades, with implications for longer term trends. Coral Reefs 30:521-531.

Takemura F et al. (2015) Development of an acetic acid injection device for crown-of-thorns starfish controlled by a remotely operated underwater robot. J Robot Mech 27:571-580.

Takeuchi T et al. (2016) Bivalve-specific gene expansion in the pearl oyster genome: implications of adaptation to a sessile lifestyle. Zool Lett 2:3.

Telford MJ et al. (2014) Phylogenomic analysis of echinoderm class relationships supports Asterozoa. Proc R Soc Lond B Biol Sci 281:20140479.

Timmers MA, Bird CE, Skillings DJ, Smouse PE, Toonen R (2012) There’s no place like home: crown-of-thorns outbreaks in the central Pacific are regionally derived and independent events. PLOS ONE 7:e31159.

Tsuda RT (1972) History of Acanthaster planci in Guam and the Trust Territory. Proc Univ Guam – Trust Territory Acanthaster planci Workshop. Tsuda RT (ed) Univ Guam Mar Lab Tech Rep No 3, 4.

Tusnady GE, Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinform 17:849-850.

Van Lenteren JC (1988) Implementation of biological control. Amer J Alt Agric 3:102-109.

Vizcaino JA et al. (2016) Update of the PRIDE database and its related tools. Nucleic Acids Res 44:D447-456.

Vizcaino JA et al. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nature Biotech 32:223-226.

Wass RC (1973) Acanthaster population levels and control efforts on Ponape, Eastern Caroline Islands. Micronesica 9:167.

Wilkinson C (2000) Status of the Coral Reefs of the World: 2000. Townsville, Australia, Global Coral Reef Monitoring Network and Reef and Rainforest Research Centre.

Yamaguchi M (1973) Early life history of coral reef asteroids with special reference to Acanthaster planci. pp 369-387. In Jones OA, Endean R (eds) Biology and Geology of Coral Reefs. Vol. II, Biology 1. Academic Press, New York.

Yamaguchi M (1975) Coral-reef asteroids of Guam. Biotropica 7:12-23.

Yamaguchi M (1977) Larval behaviour and geographical distribution of coral reef asteroids in the Indo-West Pacific. Micronesica 13:283-296.

Yamaguchi M (1986) Acanthaster planci infestations of reefs and coral assemblages in Japan: a retrospective analysis of control efforts. Coral Reefs 5:23-30.

Yamamoto T, Otsuka T (2013) Experimental validation of dilute acetic acid solution injection to control crown-of-thorns starfish (Acanthaster planci). Naturalistae 17:63-65.

Yamazato K (1969) Acanthaster planci, a coral predator. Kon-nichi no Ryukyu 13:7.

Yasuda N, Nagai S, Hamaguchi M, Okaji K, Gérard K (2009) Gene flow of Acanthaster planci (L.) in relation to ocean currents revealed by microsatellites analysis. Mol Ecol 18:1574-1590.

Yasumoto T, Watanabe T, Hashimoto Y (1964) Physiological activities of starfish saponins. Bull Jap Soc Sci Fish 30:357-364.

Zann L, Bell L (1991) The effects of the crown-of-thorns starfish on Samoan Reefs. FAO/UNDP SAM/89/002 Field Report No. 8.

Zann L, Brodie J, Vuki V (1990) History and dynamics of the crown-of-thorns starfish Acanthaster planci (L.) in the Suva area, Fiji. Coral Reefs 9:135-144.

Zann LP, Weaver K (1988) An evaluation of crown-of-thorns starfish control programs undertaken on the Great Barrier Reef. Proc 6th Intl Coral Reef Symp 2:183-188.
