WWW.NATURE.COM/NATURE
SUPPLEMENTARY INFORMATION doi:10.1038/nature13761
SUPPLEMENTARY METHODS
Ribosome display-based protein barcoding
This section describes the procedure to generate PRMC complexes from barcoded DNA templates:
1. Prepare linear barcoded DNA templates by PCR as follows:
PCR components | Volume (µL)
Template (e.g., plasmids; see Supplementary Table 5 for DNA sequences) | x (~5-10 ng)
Upstream barcoded primer (“Template barcoded (F)”, 100 µM) | 0.5
Downstream universal primer (“Template (R)”, 100 µM) | 0.5
10× PCR buffer (supplied with Taq) | 5
MgCl2 (50 mM) | 1.5
DMSO | 2.5
Betaine (5 M) | 10
dNTPs (10 mM each) | 1
Platinum Taq DNA polymerase (5 U/µL) | 0.4
dH2O | to 50
30 cycles of thermal cycling with an annealing temperature of 54 °C. Notes: 1) All oligos used in this study were purchased from Integrated DNA Technologies; their sequences can be found in Supplementary Table 5. 2) DMSO and betaine were found to enhance the yield.
2. Purify PCR products with a QIAquick PCR purification kit (Qiagen) and measure their
concentrations by NanoDrop. Mix barcoded DNA templates for subsequent assays and analyses.
3. In vitro transcribe mRNA templates by using a HiScribe T7 kit (NEB) as follows:
In vitro transcription components | Volume (µL)
Mixed linear DNA templates | x (~6-10 µg)
10× transcription buffer (supplied with the kit) | 20
20× ribonucleotide mix (supplied with the kit) | 10
20× HMW mix (supplied with the kit) | 10
T7 polymerase (500 U/µL) | 10
dH2O | to 200
The IVT reaction is incubated at 42 °C for 2 h. Note: A long incubation time can increase mRNA hydrolysis.
4. Remove DNA templates from mRNAs by using a TURBO DNA-free kit (Cat# AM1907, Ambion),
and purify transcribed mRNAs with an RNeasy Mini kit (Qiagen). Purified mRNAs can be stored at -80 °C for later use.
5. Prepare mRNA-cDNA hybrids via reverse transcription as follows:
Reverse transcription components (Mix 1) | Volume (µL)
mRNA templates | x (~0.2 µM)
RT primer (“RT primer”, 10 µM) | 20
dNTPs (10 mM each) | 10
dH2O | to 110
Incubate Mix 1 at 65 °C for 5 min and then place it on ice for at least 1 min.
Reverse transcription components (Mix 2) | Volume (µL)
10× RT buffer | 20
MgCl2 (25 mM) | 40
DTT (0.1 M) | 10
RNaseOUT (40 U/µL) | 10
SuperScript III RT (200 U/µL) | 10
Incubate the combined Mix 1 and Mix 2 at 50 °C for ~30 min. The RT reaction can be scaled up (e.g., ≥1 mL) using multiple tubes. Note: Avoid prolonged incubation of mRNAs in the presence of Mg2+ to reduce mRNA hydrolysis.
6. Precipitate mRNA–cDNA hybrids from the reaction mixture with isopropanol. For example, add 60 µL ammonium acetate (5 M) and 12 µL EDTA (0.5 M) to a 0.5-mL reaction mixture and then mix it with 0.6 mL isopropanol. After incubation at -20 °C for 30 min, collect the precipitates by centrifugation (14,000 g, 4 °C, 15 min) and wash them with 70% ethanol (DEPC treated). Quantify mRNA–cDNA hybrids by measuring their cDNAs via real-time PCR. Note: mRNAs lacking barcoding DNA can lead to formation of non-barcoded proteins, which can be separated from PRMC complexes via streptavidin pull-down (see step 10).
7. In vitro translate and display proteins on PRMC complexes by using a PURExpress Δ Ribosome Kit (NEB) as follows:
In vitro translation components | Volume (µL)
Solution A | 40
Factor mix | 12
Ribosomes (13.3 µM) | 2.2 (~0.3 µM final)
RNase inhibitor (40 U/µL, Cat# M0314, NEB) | 2
mRNA–cDNA hybrids | x (~0.4 µM final)
dH2O | to 100
Incubate the reaction at 37 °C for 30 min. Note: mRNA–cDNA templates were added at a higher molar concentration than that of ribosomes to decrease polysome formation.
8. Quench the reaction by adding 100 µL ice-cold buffer HKM (50 mM HEPES, pH 7.0, 250 mM KOAc, 25 mM Mg(OAc)2, 0.25 U/mL RNasin (Promega), 0.5 mg/mL chloramphenicol, 5 mM 2-mercaptoethanol and 0.1% (v/v) Tween 20). Centrifuge the tube (14,000 g, 4 °C) for 10 min to remove insoluble components. Note: PRMC complexes should be kept on ice or in a cold room to improve their
stability.
9. Purify PRMC complexes containing full-length proteins of interest by Flag-tag affinity purification. Incubate a 200-µL reaction mixture with 40 µL anti-Flag M2 magnetic beads (Sigma-Aldrich), pre-blocked with buffer HKM supplemented with 100 µg/mL yeast tRNA and 10 mg/mL BSA, with gentle mixing for ~2-4 h in a cold room. Elute bound PRMC complexes with buffer HKM containing 0.1 mg/mL Flag peptide.
10. To remove ribosome complexes lacking barcoding DNAs, as well as the Flag peptide, which might interfere with subsequent assays, further purify PRMC complexes with streptavidin-coated magnetic beads (Dynabeads M-270 Streptavidin, Life Technologies). For example, incubate 200 µL eluate (from step 9) for 1 h in a cold room with gentle mixing with 100 µL streptavidin magnetic beads that were pretreated with 0.1 M NaOH (refer to the bead manual) and blocked with buffer HKM containing 100 µg/mL yeast tRNA and 10 mg/mL BSA. Elute bound PRMC complexes with 20 µL buffer HKM containing 5 mM biotin.
11. Quantify PRMC complexes by measuring their cDNAs via real-time PCR. Note: Estimated yields of PRMC complexes varied from 2.5 to 10.6% of the molar amounts of added mRNA–cDNA hybrids, based on a test of individually displayed proteins of different sizes.
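Steps 6 and 11 quantify mRNA–cDNA hybrids and PRMC complexes by real-time PCR of their cDNAs, and the yield in the note to step 11 is a molar ratio. The sketch below illustrates that arithmetic under one common assumption, absolute quantification against a 10-fold dilution standard curve (Ct = slope × log10(copies) + intercept); the function names, standard values and input amounts are hypothetical and are not taken from this study.

```python
"""Sketch: qPCR standard-curve quantification and molar yield of PRMC complexes.

Assumption: cDNA copy numbers are obtained from a serial-dilution standard
curve. Function names and all example numbers are hypothetical.
"""

AVOGADRO = 6.02214076e23  # molecules per mol


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Efficiency implied by the curve slope (1.0 = perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def molar_yield_percent(recovered_copies, input_pmol):
    """Recovered copies as a percentage of input mRNA-cDNA hybrid molecules."""
    input_copies = input_pmol * 1e-12 * AVOGADRO
    return 100.0 * recovered_copies / input_copies


if __name__ == "__main__":
    # Hypothetical standards: 1e3 to 1e7 copies with an ideal slope of -3.32.
    logs = [3, 4, 5, 6, 7]
    cts = [38.0 - 3.32 * x for x in logs]
    slope, intercept = fit_standard_curve(logs, cts)
    print(f"slope {slope:.2f}, efficiency {amplification_efficiency(slope):.1%}")

    # Hypothetical: 40 pmol hybrids added; 2.0e12 PRMC copies recovered in total.
    print(f"yield {molar_yield_percent(2.0e12, 40.0):.1f}%")
```

Scaling a per-well copy number up to the whole eluate (by the fraction of eluate assayed) is left to the user, because the sampling volumes are not specified above.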
HaloTag-based protein barcoding
Enzymatic tags (e.g., HaloTag, SNAP-tag and CLIP-tag) can be applied to the covalent coupling of various proteins to a barcoding DNA (ref. 30). Compared with chemical conjugation methods, they can improve the binding of proteins to an enzyme ligand-modified DNA and catalyze formation of the covalent bond.
This section describes how to prepare conjugates of HaloTagged proteins and barcoding dsDNAs
(Extended Data Fig. 2).
1. Prepare a HaloTag ligand-modified primer. Incubate a 100-µL conjugation reaction containing an amino-modified oligo (100 µM), a succinimidyl ester (O4) Halo-ligand (10 mM, freshly prepared in DMSO, Promega) and 50 µL formamide in 50 mM Na2HPO4, pH 8.0, 150 mM NaCl at room temperature for 1 h. Purify the ligand-modified oligo by reverse-phase HPLC using a Zorbax Eclipse XDB-C18 column (5 µm, 9.4 × 250 mm, Agilent Technologies) and an elution gradient of 5-70% CH3CN/H2O (0.1 M triethylammonium acetate). Lyophilize the modified oligo for further use. Note: Formamide denaturation of the oligo was found to improve the conjugation efficiency.
2. Prepare barcoded templates via the first PCR:
PCR components | Volume (µL)
Universal backbone DNA template (“Universal template of barcoding DNA 1 or 2”, 1 nM) | 1
Upstream barcoded primer (“Barcoding DNA-1 or 2 (F)”, 12.5 µM) | 1
Universal downstream primer (“Barcoding DNA (R)”, 12.5 µM) | 1
Platinum PCR SuperMix (Life Technologies) | 22.5
25 cycles of thermal cycling with an annealing temperature of 58 °C. Barcoded primers were prepared in 96-well plates.
3. Prepare barcoding dsDNAs with desthiobiotin, acrydite and Halo-ligand modifications via the
secondary PCR:
PCR components | Volume (µL)
Barcoded template (~0.1-1 nM) | 1
Universal upstream modification primer (“Barcoding DNA modification (F)”, 25 µM) | 2
Universal downstream modification primer (“Barcoding DNA modification (R)”, 25 µM) | 2
Platinum PCR SuperMix | 45
30 cycles of thermal cycling with an annealing temperature of 60 °C.
Purify PCR products with AMPure XP beads (Beckman Coulter) and quantify them with NanoDrop.
4. To generate protein–DNA conjugates, incubate ~0.5-2 µM barcoding dsDNAs and ~2-5 µM HaloTagged proteins in a conjugation buffer (50 mM HEPES, pH 7.5, 150 mM NaCl, 2 mM EDTA and 5% glycerol) with gentle shaking at room temperature for 2-4 h. Note: The yields of protein–DNA conjugates were estimated to be above 15% based on a test of proteins of various sizes.
5. To remove free barcoding dsDNAs, purify the conjugates (together with free proteins) using anti-Flag M2 or His-tag (Dynabeads) magnetic beads, and elute them with 50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 1 mM EDTA, 5% glycerol and 0.1% (v/v) Tween 20 containing 0.1 mg/mL Flag peptide or 250 mM imidazole, respectively.
6. To remove free proteins, purify the conjugates by using the M-270 streptavidin-coated magnetic
beads, and elute them with assay buffers in the presence of 5 mM biotin.
7. Quantify protein–DNA conjugates by real-time PCR. The conjugates can be stored at -80 °C for future use.
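Step 3 quantifies barcoding dsDNAs with NanoDrop (a mass concentration), whereas step 4 specifies molar concentrations. A minimal conversion sketch follows, assuming the commonly used average of ~617.96 g/mol per base pair plus ~36.04 g/mol for a linear dsDNA; the desthiobiotin, acrydite and Halo-ligand modifications are ignored, and the 300-bp length, NanoDrop reading and protein stock concentration are hypothetical.

```python
"""Sketch: convert NanoDrop dsDNA readings to molarity and set up step 4.

Assumptions: linear dsDNA averages 617.96 g/mol per bp plus 36.04 g/mol;
the desthiobiotin, acrydite and Halo-ligand modifications are ignored.
All example values are hypothetical.
"""


def dsdna_mw(length_bp: int) -> float:
    """Approximate molecular weight (g/mol) of a linear dsDNA."""
    return length_bp * 617.96 + 36.04


def ng_per_ul_to_um(conc_ng_per_ul: float, length_bp: int) -> float:
    """ng/µL equals mg/L; mg/L divided by g/mol is mmol/L; x1000 gives µM."""
    return conc_ng_per_ul / dsdna_mw(length_bp) * 1e3


def stock_volume_ul(stock_um: float, final_um: float, final_volume_ul: float) -> float:
    """C1*V1 = C2*V2: stock volume needed to reach final_um in final_volume_ul."""
    if final_um > stock_um:
        raise ValueError("stock is more dilute than the target concentration")
    return final_um * final_volume_ul / stock_um


if __name__ == "__main__":
    length_bp = 300                               # hypothetical barcoding dsDNA length
    dna_um = ng_per_ul_to_um(200.0, length_bp)    # hypothetical NanoDrop reading
    v_dna = stock_volume_ul(dna_um, 0.5, 50.0)    # 0.5 µM DNA in a 50-µL reaction
    v_protein = stock_volume_ul(20.0, 2.0, 50.0)  # hypothetical 20 µM protein stock
    print(f"DNA stock {dna_um:.2f} µM -> {v_dna:.1f} µL; protein {v_protein:.1f} µL; "
          f"protein:DNA molar ratio {2.0 / 0.5:.0f}:1")
```

Keeping the protein in molar excess over the DNA, as in the concentration ranges of step 4, is why the example targets the low end of the DNA range and the low end of the protein range.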
Array deposition
This section outlines the protocol to immobilize SM barcoded proteins on the surface of a glass microscope slide.
1. Clean glass slides and coverslips (e.g., 24 × 60 mm rectangular, No. 2) by sonication in 5% Contrad 70, 1 M NaOH, 0.1 N HCl and Milli-Q H2O, and air-dry them in an AirClean PCR hood.
2. Treat the slide surface with Bind-Silane (GE Healthcare). A detailed protocol can be found at
http://arep.med.harvard.edu/polony/polony_protocols/bind_silane.htm.
3. Prepare a gel-casting solution as follows:
Gel-casting solution components | Volume (µL)
2× deposition buffer (40 mM HEPES, pH 7.0, 100 mM KOAc, 12 mM Mg(OAc)2, 0.5 U/mL RNasin (Promega) and 0.2% Tween 20) | 45
40% acrylamide/bis-acrylamide (19:1, molecular grade, Ambion) | 15
Bridge amplification primer (F) (“Bridge amplification (F)”, 1 mM) | 25
Bridge amplification primer (R) (“Bridge amplification (R)”, 1 mM) | 25
dH2O | to 90
Note: Because oxygen trapped in solution or on the glass surface can inhibit acrylamide polymerization, the reagents are degassed with argon and placed in an anaerobic chamber (Coy Lab). Reagent mixing and gel polymerization are carried out in the chamber.
4. Prior to immobilization, dilute samples with the deposition buffer to a protein concentration of 0.1 to 1 nM. Note: The protein concentration can be adjusted to optimize polony densities.
5. Prepare a gel-casting mix by adding 10 µL diluted protein sample to 90 µL gel-casting solution.
6. Add 1 µL 10% (v/v) TEMED and 1 µL 5% (w/v) ammonium persulfate to the gel-casting mix, and apply ≥20 µL of the gel-casting mix to the Bind-Silane-treated slide surface. To form a gel layer less than 5 µm thick, place a coverslip on top of the liquid and press it tightly against the slide so that the liquid spreads evenly over its surface. Note: A degassed gel-casting mix polymerizes faster than usual, so complete this process quickly or otherwise reduce the amounts of TEMED and ammonium persulfate.
7. Allow the gel to polymerize in the chamber for ~4 h.
8. Gently remove the coverslip under Milli-Q H2O with the help of a steel blade. Wash the slide with Milli-Q H2O in a Coplin jar, dry it by a quick spin and place it face up in a PCR hood.
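Step 4 notes that the protein concentration sets polony density. As a rough, order-of-magnitude sketch (an estimate under simplifying assumptions, not a measured relation), if the 10-fold dilution of step 5 gives the final concentration and every molecule in a gel layer of thickness h projects onto a single imaging plane, the expected number of molecules per µm² is c × N_A × h. The 5-µm thickness is the upper bound stated in step 6; the function names are hypothetical. The sketch also reports the mean nearest-neighbour distance of a two-dimensional Poisson pattern, 1/(2√density), and the number of chance neighbours within a colocalization threshold.

```python
"""Order-of-magnitude sketch: expected molecules per µm² in the deposition gel.

Assumptions (hypothetical, for illustration only): molecules are uniformly
distributed, every molecule in the gel layer projects onto one imaging
plane, and every molecule would yield a detectable polony.
"""
import math

AVOGADRO = 6.02214076e23
LITERS_PER_CUBIC_UM = 1e-15


def surface_density_per_um2(sample_molar, dilution, gel_thickness_um):
    """Molecules per µm² = final concentration x N_A x layer thickness."""
    per_cubic_um = (sample_molar / dilution) * AVOGADRO * LITERS_PER_CUBIC_UM
    return per_cubic_um * gel_thickness_um


def mean_nearest_neighbour_um(density_per_um2):
    """Mean nearest-neighbour distance of a 2D Poisson pattern: 1 / (2 sqrt(density))."""
    return 0.5 / math.sqrt(density_per_um2)


def expected_chance_neighbours(density_per_um2, threshold_um):
    """Expected objects of another species within threshold_um purely by chance."""
    return density_per_um2 * math.pi * threshold_um ** 2


if __name__ == "__main__":
    for sample_nm in (0.1, 1.0):  # deposition range in step 4
        d = surface_density_per_um2(sample_nm * 1e-9, dilution=10, gel_thickness_um=5.0)
        print(f"{sample_nm} nM sample: {d:.3f} per µm², "
              f"mean spacing {mean_nearest_neighbour_um(d):.2f} µm, "
              f"chance neighbours within 0.7 µm {expected_chance_neighbours(d, 0.7):.3f}")
```

These numbers are upper bounds: only a fraction of barcodes are amplified into polonies (the fractions β and γ in Supplementary Note 2), and the actual gel layer may be thinner than 5 µm.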
Polony amplification, linearization and blocking
This procedure converts barcoding DNAs into linearized, 3′-OH-blocked polonies prior to sequencing. The process is partly similar to the cluster generation used on Illumina platforms (ref. 10). To facilitate changing reagents and buffers during polony amplification, a protein-loaded slide was assembled into an FC 81 transmission flow cell with a 1.85-mm-thick polycarbonate flow channel
(BioSurface Technologies). The flow cell temperature was controlled by a VWR modular heating
block.
1. Prepare the following buffers:
Buffer | Components | Volume
RNA digesting buffer | 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2 and 0.1% (v/v) Triton X-100 | 100 mL
Amplification buffer | 20 mM Tris-HCl, pH 8.8, 10 mM ammonium sulfate, 2 mM magnesium sulfate, 0.1% (v/v) Triton X-100, 1.3% (v/v) DMSO and 2 M betaine | 1,000 mL
Linearization buffer | 20 mM Tris-HCl, pH 8.8, 10 mM KCl, 10 mM ammonium sulfate, 2 mM magnesium sulfate and 0.1% (v/v) Triton X-100 | 100 mL
Blocking buffer | 20 mM Tris-acetate, pH 7.9, 50 mM KOAc, 10 mM Mg(OAc)2 and 0.25 mM CoCl2 | 100 mL
Wash buffer W1 | 1× SSC and 70% formamide | 500 mL
Wash buffer W2 | 0.3× SSC and 0.1% (v/v) Tween 20 | 200 mL
Note: Milli-Q H2O and molecular biology grade reagents are used to avoid nuclease contamination.
2. Clean flow cell components, including the polycarbonate flow channel and a coverslip, by sonication in 5% Contrad 70 and Milli-Q H2O, and air-dry them in an AirClean PCR hood.
3. For samples containing PRMC complexes, digest mRNAs by filling the flow cell with the RNA digesting buffer containing 10 U/mL RNase H (NEB) and incubating it at 37 °C for 20 min. Wash the flow cell with wash buffer W2 (3 × 3 mL).
4. Increase the flow cell temperature to 60 °C and maintain it throughout the polony amplification process (steps 5-8).
5. Wash the flow cell with deionized formamide (3 × 3 mL, Ambion).
6. Wash the flow cell with the amplification buffer (3 × 3 mL).
7. Fill the flow cell with the amplification buffer containing 200 µM dNTPs and 80 U/mL Bst polymerase (NEB) and incubate it for 5 min.
8. Repeat steps 5-7 for an additional 31 cycles.
9. Decrease the flow cell temperature to 37 °C.
10. Wash the flow cell with wash buffer W2 (3 × 3 mL) and the linearization buffer (3 × 3 mL).
11. To linearize polonies, add the linearization buffer containing 10 U/mL USER enzyme (NEB) and incubate the flow cell at 37 °C for 1 h.
12. Wash off the excised strands with wash buffers W1 (3 × 3 mL) and W2 (3 × 3 mL).
13. Wash the flow cell with the blocking buffer (3 × 3 mL).
14. To block the 3′-OH ends of polonies and primers, add the blocking buffer containing 10 µM ddNTPs and 250 U/mL terminal transferase (NEB) and incubate the flow cell at 37 °C for 10 min. To drive the reaction to completion, refill the flow cell with fresh reagents and repeat this step twice. Note: The 3′-OH blocking can prevent nonspecific ligation of labeled oligos to polonies and gel-anchored primers during sequencing.
15. Wash the flow cell with wash buffer W2 (3 × 3 mL).
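Steps 3-15 form a fixed fluidic program. The sketch below expands it into an explicit step list and tallies wash volumes; the step labels and tuple layout are hypothetical, enzyme fills are logged with 0 mL because the flow-cell fill volume is not stated, and reading step 14's "repeat this step twice" as three incubations in total is an assumption.

```python
"""Sketch: expand steps 3-15 into an explicit flow-cell program.

Hypothetical representation: each step is (action, reagent, wash volume in mL).
Only the 3 x 3 mL washes are tallied; enzyme fills are logged with 0 mL
because the flow-cell fill volume is not specified.
"""
from collections import Counter

WASH_3X3_ML = 9.0  # three 3-mL washes

AMPLIFICATION_CYCLE = [
    ("wash", "deionized formamide", WASH_3X3_ML),                  # step 5
    ("wash", "amplification buffer", WASH_3X3_ML),                 # step 6
    ("incubate 5 min", "amplification buffer + dNTPs + Bst", 0.0),  # step 7
]


def build_program(rnase_digest=True, n_cycles=32, blocking_rounds=3):
    """Steps 5-7 plus 31 repeats (step 8) give n_cycles = 32 by default.

    blocking_rounds=3 reads step 14's "repeat this step twice" as three
    incubations in total; this interpretation is an assumption.
    """
    program = []
    if rnase_digest:  # step 3, PRMC samples only
        program += [("incubate 20 min, 37 °C", "RNA digesting buffer + RNase H", 0.0),
                    ("wash", "W2", WASH_3X3_ML)]
    program.append(("set 60 °C", None, 0.0))                       # step 4
    program += AMPLIFICATION_CYCLE * n_cycles                      # steps 5-8
    program += [
        ("set 37 °C", None, 0.0),                                  # step 9
        ("wash", "W2", WASH_3X3_ML),                               # step 10
        ("wash", "linearization buffer", WASH_3X3_ML),
        ("incubate 1 h", "linearization buffer + USER", 0.0),      # step 11
        ("wash", "W1", WASH_3X3_ML),                               # step 12
        ("wash", "W2", WASH_3X3_ML),
        ("wash", "blocking buffer", WASH_3X3_ML),                  # step 13
    ]
    # Step 14: ddNTP/terminal transferase incubations with fresh reagents.
    program += [("incubate 10 min", "blocking buffer + ddNTPs + TdT", 0.0)] * blocking_rounds
    program.append(("wash", "W2", WASH_3X3_ML))                    # step 15
    return program


def wash_volumes(program):
    """Total mL of each wash reagent used by the program."""
    totals = Counter()
    for _action, reagent, volume_ml in program:
        if volume_ml:
            totals[reagent] += volume_ml
    return totals


if __name__ == "__main__":
    for reagent, ml in wash_volumes(build_program()).items():
        print(f"{reagent}: {ml:.0f} mL")
```

With 32 cycles, formamide and amplification-buffer washes each come to 288 mL, which sits within the 1,000 mL of amplification buffer prepared in step 1 before counting the enzyme fills.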
Polony sequencing-by-ligation and colocalization analysis
Polonies generated by our approach are compatible with both sequencing-by-synthesis and
sequencing-by-ligation chemistries. Programmable synthetic barcodes can expand choices of
sequencing strategies. In this work, we modified a sequencing-by-ligation method reported by our lab (ref. 11; http://www.polonator.org/protocols/). Because detailed protocols of the sequencing method can be found in our previous reports (refs 11, 51; http://arep.med.harvard.edu/Polonator/), this section focuses only on differences in the current protocol.
1. To facilitate the deconvolution of sequencing signals from colocalized protein and probe polonies, two rounds of sequencing with different anchor primers (“Sequencing 1” and “Sequencing 2”, Supplementary Table 5) were conducted successively for protein and probe libraries.
2. Because polony sequencing was performed with a three-channel fluorescence imaging setup, a three-color sequencing method was designed to decode synthetic barcodes composed only of A, T and C. Thus, for each query position (e.g., positions 1 to 5), an anchor primer is ligated with three fluorescently labeled degenerate nonamer pools. As previously described (refs 11, 51), each sequencing-by-ligation cycle comprises four steps:
(i) Hybridize an anchor primer (10 µM) to polonies in a hybridization buffer (5× SSC and 0.1% (v/v) Tween 20) at 60 °C for 10 min and then decrease the temperature to 40 °C.
(ii) Ligate polony-bound anchor primers with nonamers (2 µM each pool) in a ligation buffer (50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM ATP and 5 mM DTT) containing 30 U/µL T4 DNA ligase (Enzymatics) at room temperature for 20 min, and then increase the temperature to 35 °C and maintain it for 40 min.
(iii) Scan the polony slide with a fluorescence microscope to identify ligated nonamers.
(iv) Strip off polony-bound primers by washing with buffer W1 at 60 °C and then with buffer W2.
Note: To conserve ligase and oligos in each cycle, the hybridization and ligation steps were performed in a gasket chamber (~0.5 mL) assembled from a polony slide and a microarray gasket slide (Cat# G2534-60008, Agilent Technologies). Stripping was performed in a Coplin jar.
3. For polony colocalization analysis, the reference images constructed for protein and probe polony sequencing are aligned with the help of a cross-library reference. To obtain this reference, protein and probe polonies were hybridized with anchor primers labeled with different fluorophores (“Sequencing 1-Cy3” and “Sequencing 2-Cy5”, Supplementary Table 5), and their superimposed images served as the reference.
4. Polony colocalization analysis was performed at each image position. MATLAB scripts analyze all combinations of protein polony and probe polony positions to identify and count the protein polonies within a threshold distance (e.g., 0.7 µm) of probe polonies.
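Step 4 describes an all-pairs search for protein polonies lying within a threshold distance of any probe polony. The original MATLAB scripts are not reproduced here; the following is a minimal Python sketch of the same brute-force idea, assuming polony centroids have already been registered to a common reference frame (step 3) and are expressed in µm. Function names are hypothetical; the ratio follows the colocalization-ratio definition in Supplementary Note 1.

```python
"""Sketch: count protein polonies within a threshold of any probe polony."""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid in µm, already registered


def colocalized_mask(proteins: Sequence[Point], probes: Sequence[Point],
                     threshold_um: float = 0.7) -> List[bool]:
    """True for each protein polony that has a probe polony closer than the threshold."""
    return [any(math.hypot(px - qx, py - qy) < threshold_um for qx, qy in probes)
            for px, py in proteins]


def colocalization_ratio(proteins: Sequence[Point], probes: Sequence[Point],
                         threshold_um: float = 0.7) -> float:
    """Percentage of protein polonies colocalized with probe polonies."""
    if not proteins:
        return 0.0
    mask = colocalized_mask(proteins, probes, threshold_um)
    return 100.0 * sum(mask) / len(proteins)


if __name__ == "__main__":
    protein_polonies = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]  # hypothetical
    probe_polonies = [(0.3, 0.4), (10.8, 0.0)]                # hypothetical
    print(colocalized_mask(protein_polonies, probe_polonies))
    print(f"{colocalization_ratio(protein_polonies, probe_polonies):.1f}%")
```

The all-pairs loop scales as the product of the two polony counts per image, which matches the "all combinations" description; a spatial index would be the usual choice for much larger images.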
SUPPLEMENTARY NOTES
1. Colocalization statistics
To compare degrees of colocalization between different protein and probe pairs in an experiment, we measured colocalization ratios, defined as the percentages of protein polonies colocalized with the corresponding probe polonies, and performed Student's t-tests on the measurements at multiple imaging positions. The contribution from random colocalization can be estimated by calculating the mean value of the pair cross-correlation function (PCCF) over the distance interval from zero to the colocalization threshold. In addition, the PCCF statistic (ref. 39) can be applied to characterize colocalization patterns of two polony species that overlap fully or partially. The PCCF values were calculated as follows.
Let $i$ and $j$ be two types of objects for colocalization analysis and $A$ be the sampled array area. A cross-correlation Ripley K-function $\hat{K}_{i,j}(r)$ can be estimated (ref. 52) as

$$\hat{K}_{i,j}(r) = \frac{1}{A\,\hat{\lambda}_i\hat{\lambda}_j}\sum_{k}\sum_{l}\omega(i_k, j_l)\,I(d_{i_k,j_l} < r)$$

where $d_{i_k,j_l}$ is the distance between the centroids of the $k$-th type $i$ object and the $l$-th type $j$ object, and $I(d_{i_k,j_l} < r)$ is the indicator function, with value 1 if $d_{i_k,j_l} < r$ is true and 0 otherwise. The density $\hat{\lambda}_i$ of type $i$ objects can be estimated as

$$\hat{\lambda}_i = \frac{N_i}{A}$$

where $N_i$ is the total number of type $i$ objects. The weight function $\omega(i_k, j_l)$ provides an edge correction but was ignored here ($\omega(i_k, j_l) \equiv 1$). The normalized function $\hat{K}_{i,j}(r)/(\pi r^2)$ can be interpreted as the ratio of the number of $i$ and $j$ objects localized within radius $r$ of each other to the number that would be expected by chance. Following ref. 39, we also computed a PCCF that considers colocalization within a radial interval $[r, r + \Delta r)$ via

$$\mathrm{PCCF}_{i,j}(r) = \frac{1}{A\,\hat{\lambda}_i\hat{\lambda}_j\,(2\pi r\,\Delta r + \pi\,\Delta r^2)}\sum_{k}\sum_{l} I(r \le d_{i_k,j_l} < r + \Delta r)$$

where $\sum_{k}\sum_{l} I(r \le d_{i_k,j_l} < r + \Delta r)$ and $A\,\hat{\lambda}_i\hat{\lambda}_j\,(2\pi r\,\Delta r + \pi\,\Delta r^2)$ are, respectively, the actual count of colocalized $i$ objects and the average number of $i$ objects that are colocalized with $j$ objects by chance. The mean PCCF values were calculated over the interval from 0 to the colocalization threshold ($r = 0$ and $\Delta r$ = the colocalization threshold). In computing a PCCF value for an experiment in which $Q$ images were analyzed, colocalization events were aggregated over all images and divided by $Q$ times the expected number of random colocalizations per image. By definition, randomly colocalized objects
should have PCCF values of 1. However, to assess whether PCCFs derived in actual experiments were
statistically significantly different from 1, following ref. 39, we estimated 95% confidence intervals of the PCCFs of randomly colocalized objects using Monte-Carlo simulations. Specifically, each simulation assumed $Q$ images, each containing $N_i$ protein polony objects and $N_j$ probe polony objects, where $Q$ was the number of images analyzed in the experiment whose PCCF was being evaluated, and $N_i$ and $N_j$ were the mean numbers of protein polony and probe polony objects observed in that experiment. Coordinates of the protein and probe polonies were drawn from a uniform distribution. All dimensions were scaled to the actual image dimensions in pixels. For each simulation, a PCCF was computed in the same manner as in the actual experiment, by aggregating colocalization events over the $Q$ random images. Finally, means and confidence intervals for these random PCCFs were obtained from 1,000 simulations.
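The PCCF over [0, threshold) and its Monte-Carlo null interval can be sketched as follows. This is a minimal pure-Python illustration that follows the description above (edge correction ignored, events aggregated over Q images, uniform random coordinates), not the code used in this work; image sizes, counts, seeds and function names are hypothetical. With r = 0 and Δr equal to the threshold t, the expected random pair count per image is N_i N_j π t² / A.

```python
"""Sketch: PCCF over [0, t) aggregated over Q images, with a Monte-Carlo null.

Follows the description above: no edge correction, r = 0 and dr = t, so the
expected random pair count per image is N_i * N_j * pi * t**2 / A. Units are
pixels throughout. Function names, sizes and seeds are hypothetical.
"""
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = Tuple[Sequence[Point], Sequence[Point]]  # (type-i objects, type-j objects)


def pair_count(objs_i: Sequence[Point], objs_j: Sequence[Point], t: float) -> int:
    """Number of (i, j) pairs whose centroid distance is below t."""
    return sum(1 for xi, yi in objs_i for xj, yj in objs_j
               if math.hypot(xi - xj, yi - yj) < t)


def pccf(images: Sequence[Image], t: float, area: float) -> float:
    """Observed pairs summed over images / expected random pairs summed over images.

    With equal counts in every image the denominator equals Q times the
    expected number of random colocalizations per image.
    """
    observed = sum(pair_count(a, b, t) for a, b in images)
    expected = sum(len(a) * len(b) * math.pi * t ** 2 / area for a, b in images)
    if expected == 0:
        raise ValueError("no objects to compare")
    return observed / expected


def _uniform_points(n: int, width: float, height: float, rng: random.Random) -> List[Point]:
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def null_pccf(q: int, n_i: int, n_j: int, width: float, height: float,
              t: float, n_sim: int = 1000, seed: int = 0):
    """Mean and empirical 95% interval of PCCFs for uniformly placed objects."""
    rng = random.Random(seed)
    area = width * height
    values = sorted(
        pccf([(_uniform_points(n_i, width, height, rng),
               _uniform_points(n_j, width, height, rng)) for _ in range(q)], t, area)
        for _ in range(n_sim))
    lower = values[int(0.025 * (n_sim - 1))]
    upper = values[int(0.975 * (n_sim - 1))]
    return sum(values) / n_sim, lower, upper


if __name__ == "__main__":
    # Hypothetical: 5 images of 256 x 256 px, 100 objects of each type per
    # image, threshold 3 px; 100 simulations to keep the run short.
    mean, lo, hi = null_pccf(q=5, n_i=100, n_j=100, width=256, height=256,
                             t=3.0, n_sim=100)
    print(f"random PCCF {mean:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

Because edges are not corrected, pairs near image borders are undercounted, so the simulated null mean typically sits slightly below 1; comparing an experimental PCCF with the simulated interval, rather than with exactly 1, absorbs that bias.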
2. Initial mathematical model of SM-based protein library vs. probe library binding assay
This note describes a mathematical model intended to aid understanding of the sensitivity and specificity of detecting protein–probe interactions in complex mixtures. The following assumptions are made:
1) $m$ species of barcoded proteins $P_1, P_2, \dots, P_m$ are allowed to interact with $n$ species of barcoded probes $R_1, R_2, \dots, R_n$ in a one-pot assay. It is assumed that each protein is present at the same concentration and that the total protein concentration is $P^\#$. Similarly, it is assumed that the total concentration of probes is $R^\#$ and that the concentration of each $R_i$ is $R^\#/n$. It is assumed that probe concentrations are titratable and that $R^\#/n \gg P^\#/m$. For simplicity, we will assume here that $m = n$ and that for each protein $P_i$ one probe $R_i$ (denoted with the same index) has been chosen or designed to specifically target the protein.
2) Due to folding and other issues relating to the efficiency of ribosome display, only a fraction $\alpha$ of each protein is in an active form capable of binding specifically to its targeting probe. The active and inactive forms of protein $P_i$ will be denoted $P_i^+$ and $P_i^-$, with total concentrations $\alpha P^\#/n$ and $(1-\alpha)P^\#/n$, respectively. For similar reasons, only a fraction $\rho$ of probes are active and can specifically bind to their targeted proteins; their active and inactive forms will similarly be denoted $R_i^+$ and $R_i^-$, with concentrations $\rho R^\#/n$ and $(1-\rho)R^\#/n$. These fractions are assumed to be stable throughout the assay, and active and inactive forms of the proteins and probes are assumed not to interconvert. The fractions $\alpha$ and $\rho$ will be assumed to apply to all proteins and all probes, respectively.
3) For $i = 1, 2, \dots, n$, the active forms of protein $P_i$ and its specifically targeting probe $R_i$ will interact according to the reaction
$$(\mathrm{S1})\qquad P_i^+ + R_i^+ \;\underset{K_D}{\rightleftharpoons}\; (P_i^+R_i^+)_{\mathrm{S}}$$

where $(P_i^+R_i^+)_{\mathrm{S}}$ denotes the complex formed by the specific interaction and $K_D$ is the dissociation constant of this complex; $K_D$ applies equally to each such protein–probe pair. All forms of protein $P_i$ will also interact non-specifically with all forms of all probes, including the specific probe $R_i$. This leads to four reactions between the active or inactive protein $P_i$ and each of the $n$ probes $R_j$ ($j = 1, 2, \dots, n$), all of which are assumed to be characterized by the same non-specific dissociation constant $U$:

$$\begin{aligned}
&(\mathrm{U1}) & P_i^+ + R_j^+ &\;\underset{U}{\rightleftharpoons}\; (P_i^+R_j^+)_{\mathrm{NS}} \\
&(\mathrm{U2}) & P_i^+ + R_j^- &\;\underset{U}{\rightleftharpoons}\; (P_i^+R_j^-)_{\mathrm{NS}} \\
&(\mathrm{U3}) & P_i^- + R_j^+ &\;\underset{U}{\rightleftharpoons}\; (P_i^-R_j^+)_{\mathrm{NS}} \\
&(\mathrm{U4}) & P_i^- + R_j^- &\;\underset{U}{\rightleftharpoons}\; (P_i^-R_j^-)_{\mathrm{NS}}
\end{aligned}\qquad (j = 1, \dots, n)$$

It will also be assumed that (i) non-specific interactions between probes and proteins are always binary, so that ternary or higher-order complexes can be neglected, and (ii) probes interact non-specifically only with proteins, and proteins only with probes, so that probes do not interact with other probes, nor proteins with other proteins.
4) After these reactions reach equilibrium, protein–probe complexes of all of these sorts are irreversibly captured by chemical crosslinking, and free probes are removed from the solution, leaving a residual concentration $R_0$. It is assumed that both free and complexed protein and probe molecules are then deposited on the surface of the array in proportion to their solution concentrations, and then immobilized on the array. Of these, it is assumed that only a fraction $\beta$ of protein molecules and a fraction $\gamma$ of probe molecules bear barcoding DNAs that can be successfully amplified into polonies and detected on the array, and that the amplifiability of protein and probe DNAs is independent of whether the proteins and probes are free or in complex.
5) The following simplifications will be made regarding computation of PCCF statistics (see above): instead of computing PCCFs by counting all pairs of $P_i$ and $R_i$ polonies within a specified distance threshold, PCCFs will be calculated from the numbers of $P_i$ polonies that are found colocalized with $R_i$ polonies in either of the following ways: (i) specifically and non-specifically
bound $P_i\cdot R_i$ complexes in which both components form polonies (as per assumption 4) will be counted as intrinsically colocalized polonies; (ii) $P_i$ and $R_i$ polonies that are formed on the array by other means may be found to be randomly colocalized. A central value for random colocalization will be computed as the number of non-$(P_i\cdot R_i)$-derived $P_i$ polonies that are expected to be found by chance within the distance threshold of non-$(P_i\cdot R_i)$-derived $R_i$ polonies, given the numbers of these polonies obtained from assumption 4 above. The sum of (i) and (ii) will be used to compute a central PCCF for $P_i$ and $R_i$ on the array, and variation from this central value will be estimated by the random simulations described below. This calculation of PCCF differs from the formal definition given above and in ref. 39 by being non-symmetrical in $P_i$ and $R_i$. Also, by counting $P_i$ polonies that are near $R_i$ polonies instead of counting all pairs of neighboring $P_i$ and $R_i$ polonies, it ignores the extra pairs that the formally defined PCCF would count should a $P_i$ polony be found near multiple $R_i$ polonies, and it is thus conservative regarding colocalization counts compared with its formal definition.
The equilibria of the five reactions in assumption 3, together with the assumption 1 that $R^\#/n \gg P^\#$, yield $2n + 1$ equations involving the concentration $[P_i^+]$ of free $P_i^+$ and $2n$ equations involving the concentration $[P_i^-]$ of free $P_i^-$, where $\rho$ denotes the active fraction of probes (assumption 2):

$$\begin{aligned}
&(\mathrm{S1}') & [P_i^+]\,\frac{\rho R^\#}{nK_D} &= [(P_i^+R_i^+)_{\mathrm{S}}] \\
&(\mathrm{U1}') & [P_i^+]\,\frac{\rho R^\#}{nU} &= [(P_i^+R_j^+)_{\mathrm{NS}}] && (j = 1,\dots,n) \\
&(\mathrm{U2}') & [P_i^+]\,\frac{(1-\rho)R^\#}{nU} &= [(P_i^+R_j^-)_{\mathrm{NS}}] && (j = 1,\dots,n)
\end{aligned}$$

and

$$\begin{aligned}
&(\mathrm{U3}') & [P_i^-]\,\frac{\rho R^\#}{nU} &= [(P_i^-R_j^+)_{\mathrm{NS}}] && (j = 1,\dots,n) \\
&(\mathrm{U4}') & [P_i^-]\,\frac{(1-\rho)R^\#}{nU} &= [(P_i^-R_j^-)_{\mathrm{NS}}] && (j = 1,\dots,n)
\end{aligned}$$

Note that there is a single (S1′) equation, involving the one specifically targeting probe $R_i$, but $n$ instances each of (U1′)-(U4′), one for each $R_j$, $j = 1, \dots, n$.
From assumption 2 and equations (S1′), (U1′) and (U2′), we get
$$[P_i^+] + [(P_i^+R_i^+)_{\mathrm{S}}] + \sum_{j=1}^{n}[(P_i^+R_j^+)_{\mathrm{NS}}] + \sum_{j=1}^{n}[(P_i^+R_j^-)_{\mathrm{NS}}] = \frac{\alpha P^\#}{n}$$

or

$$[P_i^+]\left(1 + \frac{\rho R^\#}{nK_D} + \sum_{j=1}^{n}\frac{\rho R^\#}{nU} + \sum_{j=1}^{n}\frac{(1-\rho)R^\#}{nU}\right) = [P_i^+]\left(1 + \frac{\rho R^\#}{nK_D} + \frac{R^\#}{U}\right) = \frac{\alpha P^\#}{n}$$

which leads in turn to

$$[P_i^+] = \frac{\alpha P^\#}{n + R^\#\left(\dfrac{\rho}{K_D} + \dfrac{n}{U}\right)} = \frac{\alpha P^\#}{n + R^\#/\tilde{K}_D} = \frac{\alpha P^\#\,\tilde{K}_D}{n\tilde{K}_D + R^\#}$$

where $\tilde{K}_D$ can be interpreted as an adjusted specific dissociation constant:

$$\tilde{K}_D = \frac{1}{\dfrac{\rho}{K_D} + \dfrac{n}{U}}$$
Similarly, from assumption 2 and equations (U3′) and (U4′), we get

$$[P_i^-] = \frac{(1-\alpha)P^\#}{n + nR^\#/U} = \frac{(1-\alpha)P^\#\,U}{n(U + R^\#)}$$
Using equations (S1′) and (U1′)-(U4′), the total concentration $[(P_i\cdot R_i)]$ of $(P_i\cdot R_i)$ complexes between protein $P_i$ and its specifically targeting probe $R_i$, in any of their active and inactive forms, is

$$\begin{aligned}
[(P_i\cdot R_i)] &= \frac{\alpha P^\#\tilde{K}_D}{n\tilde{K}_D + R^\#}\left(\frac{\rho R^\#}{nK_D} + \frac{R^\#}{nU}\right) + \frac{(1-\alpha)P^\#U}{n(U + R^\#)}\left(\frac{R^\#}{nU}\right) \\
&= \frac{P^\#R^\#}{n}\left(\frac{\alpha\left(1 - \dfrac{(n-1)\tilde{K}_D}{U}\right)}{n\tilde{K}_D + R^\#} + \frac{1-\alpha}{n(U + R^\#)}\right)
\end{aligned}$$
The total free protein concentration can also be computed as

$$[P_i^+] + [P_i^-] = \frac{\alpha P^\#\tilde{K}_D}{n\tilde{K}_D + R^\#} + \frac{(1-\alpha)P^\#U}{n(U + R^\#)} = P^\#\left(\frac{\alpha\tilde{K}_D}{n\tilde{K}_D + R^\#} + \frac{(1-\alpha)U}{n(U + R^\#)}\right)$$
We also have a total concentration $[(P_i\cdot R_{j\neq i})]$ of $(P_i\cdot R_j)$ complexes between $P_i$ and the probes $R_j$ ($j \neq i$) that are not targeted to $P_i$, in any of their active and inactive forms. This simplifies to

$$[(P_i\cdot R_{j\neq i})] = \left([P_i^+] + [P_i^-]\right)\frac{(n-1)R^\#}{nU}$$
Finally, we must also consider that probe $R_i$ will be in non-specific complexes with proteins $P_{j\neq i}$ other than its specific target. By our assumptions above, since all proteins $P_j$ ($j \neq i$) behave identically with
respect to their targeting and non-targeting probes, just as $P_i$ does, we have $[P_j^+] + [P_j^-] = [P_i^+] + [P_i^-]$ for all $j \neq i$, and therefore

$$[(P_{j\neq i}\cdot R_i)] = \left([P_i^+] + [P_i^-]\right)\frac{(n-1)R^\#}{nU}$$
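The closed-form expressions above can be evaluated numerically. The sketch below transcribes them directly under assumptions 1-3, with ρ as the active probe fraction of assumption 2; the class and method names and all parameter values in the example are hypothetical. It also checks the mass balance for active protein, which should return αP#/n.

```python
"""Sketch: evaluate the equilibrium expressions of the multiplexed binding model.

All concentrations must share one unit (e.g. nM). rho is the active probe
fraction. Parameter values in the example are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class BindingModel:
    n: int          # number of protein species (= number of probe species)
    p_total: float  # P#, total protein concentration
    r_total: float  # R#, total probe concentration
    alpha: float    # active protein fraction
    rho: float      # active probe fraction
    kd: float       # specific dissociation constant K_D
    u: float        # non-specific dissociation constant U

    @property
    def kd_adj(self) -> float:
        """Adjusted specific dissociation constant 1 / (rho/K_D + n/U)."""
        return 1.0 / (self.rho / self.kd + self.n / self.u)

    def free_active(self) -> float:
        """[P_i+] = alpha P# K~_D / (n K~_D + R#)."""
        k = self.kd_adj
        return self.alpha * self.p_total * k / (self.n * k + self.r_total)

    def free_inactive(self) -> float:
        """[P_i-] = (1 - alpha) P# U / (n (U + R#))."""
        return (1 - self.alpha) * self.p_total * self.u / (self.n * (self.u + self.r_total))

    def free_total(self) -> float:
        return self.free_active() + self.free_inactive()

    def complex_target(self) -> float:
        """[(P_i . R_i)]: specific plus non-specific, all active/inactive forms."""
        r_each = self.r_total / self.n
        return (self.free_active() * (self.rho * r_each / self.kd + r_each / self.u)
                + self.free_inactive() * r_each / self.u)

    def complex_off_target(self) -> float:
        """[(P_i . R_j!=i)], which also equals [(P_j!=i . R_i)] under the model."""
        return self.free_total() * (self.n - 1) * self.r_total / (self.n * self.u)


if __name__ == "__main__":
    m = BindingModel(n=100, p_total=10.0, r_total=1000.0,
                     alpha=0.5, rho=0.8, kd=1.0, u=1e5)  # hypothetical, nM
    # Mass balance for active protein should return alpha * P# / n.
    balance = m.free_active() * (1 + m.rho * m.r_total / (m.n * m.kd) + m.r_total / m.u)
    print(f"K~_D = {m.kd_adj:.3f} nM, balance {balance:.4f} vs {m.alpha * m.p_total / m.n:.4f}")
    print(f"[(Pi.Ri)] = {m.complex_target():.4f} nM, "
          f"[(Pi.Rj!=i)] = {m.complex_off_target():.4f} nM, free = {m.free_total():.4f} nM")
```

Computing the complex concentration from the first (unsimplified) line of the derivation and comparing it against the simplified closed form is a quick way to catch transcription errors in either.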
Arraying, polony formation, and colocalization statistics
It is now assumed that the mixture is arrayed for SM assaying and that polonies are formed on the array. Following assumption 4, the fractions of polonies relevant to evaluating $P_i$ and $R_i$ colocalization can be computed as follows:
$$f(P_iR_i) = \frac{\beta\gamma\,[(P_i\cdot R_i)]}{C}$$
Fraction of $(P_i\cdot R_i)$ complexes between $P_i$ and its specifically targeting probe $R_i$ that are detectable on the array as intrinsically colocalized polonies.

$$f(P_iR_{j\neq i}) = \frac{\beta\gamma\,[(P_i\cdot R_{j\neq i})]}{C}$$
Fraction of $(P_i\cdot R_j)$ complexes between $P_i$ and other probes $R_j$ ($j \neq i$) that are detectable on the array as polonies of $P_i$ intrinsically colocalized with those of other probes.

$$f_{P_i} = \frac{\beta(1-\gamma)\left([(P_i\cdot R_i)] + [(P_i\cdot R_{j\neq i})]\right) + \beta\left([P_i^+] + [P_i^-]\right)}{C}$$
Fraction of $P_i$ polonies that do not appear intrinsically colocalized with probe polonies.

$$f(P_{j\neq i}R_i) = \frac{\beta\gamma\,[(P_{j\neq i}\cdot R_i)]}{C}$$
Fraction of $(P_j\cdot R_i)$ complexes between probe $R_i$ and other proteins $P_j$ ($j \neq i$) that are detectable on the array as polonies of $R_i$ intrinsically colocalized with those of the other proteins.

$$f_{R_i} = \frac{(1-\beta)\gamma\left([(P_i\cdot R_i)] + [(P_{j\neq i}\cdot R_i)]\right) + \gamma R_0/n}{C}$$
Fraction of $R_i$ polonies that do not appear intrinsically colocalized with protein polonies.

where

$$C = \left(1 - (1-\beta)(1-\gamma)\right)[(P_i\cdot R_i)] + \beta\,[(P_i\cdot R_{j\neq i})] + \gamma\,[(P_{j\neq i}\cdot R_i)] + \beta\left([P_i^+] + [P_i^-]\right) + \frac{\gamma R_0}{n}$$
Note that, as per assumption 5, $f(P_iR_i)$ determines the number of intrinsically colocalized $P_i$ and $R_i$
polonies found on the array. The other fractions will be used in calculation of the number of randomly
colocalized polonies below. First we will compute the numbers of polonies of the various sorts, and then
we will calculate random colocalization.
Let it now be assumed that $N_i$ polonies are detected for protein $P_i$. These $N_i$ polonies may be apportioned as

$$N(P_iR_i) = \frac{N_i\,f(P_iR_i)}{f(P_iR_i) + f(P_iR_{j\neq i}) + f_{P_i}}$$
Polonies of $P_i$ intrinsically colocalized with polonies of $R_i$.

$$N(P_iR_{j\neq i}) = \frac{N_i\,f(P_iR_{j\neq i})}{f(P_iR_i) + f(P_iR_{j\neq i}) + f_{P_i}}$$
Polonies of $P_i$ intrinsically colocalized with polonies of other probes $R_j$ ($j \neq i$).

$$N_{P_i} = \frac{N_i\,f_{P_i}}{f(P_iR_i) + f(P_iR_{j\neq i}) + f_{P_i}}$$
Polonies of $P_i$ that are not intrinsically colocalized with probe polonies.
It follows from the fractions derived above that the following numbers of polonies are detected for the probe $s_i$ in addition to those counted with the $N_i$ $P_i$ protein polonies above (the only $s_i$ polonies considered with the $P_i$ polonies above are the $N(ii)$ instances of $s_i$ polonies colocalized with $P_i$ polonies).
$$N(ji) = \frac{N_i\, f(ji)}{f(ii) + f(ij) + f_{P_i}}$$
Polonies of $s_i$ intrinsically colocalized with polonies of other proteins $P_j$ ($j \neq i$).
$$N_{s_i} = \frac{N_i\, f_{s_i}}{f(ii) + f(ij) + f_{P_i}}$$
Polonies of $s_i$ that are not intrinsically colocalized with protein polonies.
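Because all five fractions share the denominator $C$, the apportioning step reduces to a single scale factor, $N_i/(f(ii) + f(ij) + f_{P_i})$, applied to every fraction. A minimal sketch with hypothetical names and arbitrary example fractions:

```python
def apportion_polonies(N_i: float, f: dict) -> dict:
    """Expected polony counts for P_i and s_i, given N_i detected P_i polonies.

    f holds the five fractions f_ii, f_ij, f_ji, f_P, f_s, all normalized by the same C.
    """
    # the N_i detected P_i polonies correspond to the P_i-containing fractions only
    scale = N_i / (f["f_ii"] + f["f_ij"] + f["f_P"])
    return {
        "N_ii": scale * f["f_ii"],  # P_i colocalized with s_i
        "N_ij": scale * f["f_ij"],  # P_i colocalized with other probes s_j
        "N_P": scale * f["f_P"],    # P_i not colocalized with any probe polony
        "N_ji": scale * f["f_ji"],  # s_i colocalized with other proteins P_j
        "N_s": scale * f["f_s"],    # s_i not colocalized with any protein polony
    }


counts = apportion_polonies(1000, {"f_ii": 0.1, "f_ij": 0.2, "f_P": 0.2, "f_ji": 0.3, "f_s": 0.2})
print(counts)
```

By construction, the three $P_i$ counts add back up to $N_i$, while the two $s_i$ counts are expressed on the same scale.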
In preparing to compute random colocalization and the final PCCF statistic, a question arises in the context of our highly multiplexed SM assay as to whether $P_i$ polonies from uncomplexed $P_i^{+}$ and $P_i^{-}$ objects and $P_i$ polonies formed from $(P_i \cdot s_{j\neq i})$ complexes should be treated equivalently with regard to whether they can be randomly colocalized (and similarly for $s_i$ polonies). It could be the case that $P_i$ polonies formed within complexes cannot be colocalized with $s_i$ polonies to the degree that $P_i$ polonies formed from uncomplexed $P_i$ objects can, owing to steric constraints or other factors. In non-multiplexed assays, such as those considered in ref. 39, this question never arises because the non-targeting partners in $(P_i \cdot s_{j\neq i})$ and $(P_{j\neq i} \cdot s_i)$ complexes would never be surveyed for detection, and the resulting
$P_i$ and $s_i$ polonies would all be considered isolated objects that could appear near each other by chance in the same way. A broader issue concerns the fact that the PCCF is specifically a Pair Cross-Correlation Function39, and the question arises whether, for multiplexed assays, it might be better to develop and employ a higher-order multivariate statistic that compares actual vs. expected random colocalization for many kinds of objects at once, somewhat as multivariate ANOVAs analyze the variances of many variables and their interactions at once. However, in this initial model, we treat polonies derived from free probe and protein molecules and those derived from complexes equivalently in terms of their potential for random colocalization, within the constraints indicated in assumption 5. Notably, even when only pairwise colocalization is considered, as in the application of PCCF in ref. 39, where the objects are labeled antibodies, the prima facie distinction between objects colocalized by virtue of targeted physical interactions and isolated objects that appear as random background is an idealization: the apparently isolated objects are likely interacting non-specifically with many other kinds of unsurveyed molecules and complexes in the cell matrix, and PCCF remains a useful statistic even though these interactions are ignored.
Random colocalization
As noted in assumption 5 and discussed in the comment above, random colocalization will be considered between $P_i$ and $s_i$ polonies that do not arise from intrinsic colocalization in $(P_i \cdot s_i)$ complexes. We now know the number of such polonies to be $N(ij) + N_{P_i}$ for $P_i$, and $N(ji) + N_{s_i}$ for $s_i$. Given imaged array area $A$ and polony radius $r$, we can estimate the density of these $s_i$ polonies that could appear anywhere on the array by chance as
$$\rho = \frac{N_{s_i} + N(ji)}{A}$$
and the probability of a probe $s_i$ polony appearing in the vicinity of a $P_i$ protein polony by chance would then be
$$\pi(2r)^2\rho$$
Thus, the expected number of the $N(ij) + N_{P_i}$ $P_i$ polonies that will have an $s_i$ polony localized nearby by chance will be
$$N(ii)_{\mathrm{rand}} = \pi(2r)^2\rho\left(N(ij) + N_{P_i}\right)$$
Thus, the total number of $P_i$ polonies colocalized with $s_i$ polonies will be
$$N(ii)_{\mathrm{tot}} = N(ii) + N(ii)_{\mathrm{rand}}$$
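The random-colocalization step translates directly into code. In this sketch (hypothetical names, arbitrary example counts), $A$ and $r$ may be in any length unit as long as the two are consistent, and $\pi(2r)^2\rho$ is treated as a probability, which is reasonable only while it is much smaller than 1.

```python
import math


def random_colocalization(N_ii, N_ij, N_P, N_ji, N_s, A, r):
    """Return (expected random colocalizations, total colocalizations) of P_i with s_i.

    A and r must use the same length unit (e.g. A in um^2 and r in um).
    """
    # density of s_i polonies that are not tied up in (P_i . s_i) complexes
    rho = (N_s + N_ji) / A
    # chance that such an s_i polony lands within 2r of a given P_i polony;
    # only meaningful as a probability while it is << 1
    p_near = math.pi * (2 * r) ** 2 * rho
    # only P_i polonies not already paired with s_i can gain a chance neighbor
    N_rand = p_near * (N_ij + N_P)
    return N_rand, N_ii + N_rand


N_rand, N_tot = random_colocalization(200, 400, 400, 600, 400, A=1e6, r=1.0)
print(N_rand, N_tot)
```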
PCCF statistic
To complete the PCCF statistic as specified in assumption 5, we must divide $N(ii)_{\mathrm{tot}}$ by the expected number of $P_i$ polonies colocalized with $s_i$ polonies, assuming that all of these individual polonies
(including the ones in $(P_i \cdot s_i)$ complexes) could be colocalized by chance. Following the same logic as above, the total density of $s_i$ objects will now be
$$\sigma = \frac{N_{s_i} + N(ii) + N(ji)}{A}$$
and the probability of a probe $s_i$ polony appearing in the vicinity of a $P_i$ protein polony will then be
$$\pi(2r)^2\sigma$$
and, therefore,
$$\mathrm{PCCF} = \mathrm{PCCF}(P_{\#}, s_{\#}, s_0, n, \alpha, \varepsilon, K_D, \kappa, \beta, \gamma, N_i, A, r) = \frac{N(ii)_{\mathrm{tot}}}{\pi(2r)^2\sigma N_i}$$
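Combining the steps gives the central PCCF value. The sketch below, with hypothetical names, evaluates it from expected polony counts. It also exposes a property worth checking: if $N(ii) = 0$, then $\rho = \sigma$ and $N_i = N(ij) + N_{P_i}$, so the numerator equals the denominator and the PCCF is exactly 1; values above 1 come from intrinsic colocalization.

```python
import math


def pccf(N_ii, N_ij, N_P, N_ji, N_s, A, r):
    """Central PCCF for the pair (P_i, s_i) from expected polony counts (all assumed > 0 overall)."""
    N_i = N_ii + N_ij + N_P                 # all detected P_i polonies
    vicinity = math.pi * (2 * r) ** 2       # area within which two polonies count as colocalized
    rho = (N_s + N_ji) / A                  # s_i polonies free to land near P_i by chance
    N_rand = vicinity * rho * (N_ij + N_P)  # expected random colocalizations
    N_tot = N_ii + N_rand                   # intrinsic + random colocalizations
    sigma = (N_s + N_ii + N_ji) / A         # all s_i polonies, complexes included
    expected = vicinity * sigma * N_i       # colocalizations if every polony were placed at random
    return N_tot / expected


print(pccf(0, 400, 400, 600, 400, A=1e6, r=1.0))    # 1.0
print(pccf(200, 400, 400, 600, 400, A=1e6, r=1.0))  # 200 specific pairs
```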
Random simulations
To estimate the degree of variation to which the PCCF statistic may be subject under a given set of parameters, we compute a distribution of PCCF values using the formula above, assuming that the six terms $N_{P_i}$, $N(ii)$, $N(ij)$, $N(ji)$, $N_{s_i}$ and $N(ii)_{\mathrm{rand}}$ are all randomly drawn from Poisson distributions whose means are the values computed above within the model. Because these simulations do not take into account variation in actual samples or assay conditions, and because Poisson error may itself underrepresent the variability inherent in the underlying system vs. the model, these estimates must be considered lower bounds for the variance that will be encountered in actual assays.
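The Monte Carlo procedure can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the code used for the figure: the Poisson sampler is Knuth's product-of-uniforms method (exact, but slow for very large means), $N_i$ is recomputed from the drawn terms, draws that leave the denominator at zero are skipped, and percentiles come from `statistics.quantiles` with inclusive interpolation. All names are hypothetical.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Exact Poisson draw by Knuth's product-of-uniforms method.

    The mean is split into chunks of at most 30 so exp(-chunk) never underflows;
    a sum of independent Poisson draws is itself Poisson. Cost grows linearly
    with lam, so means in the millions call for a faster sampler.
    """
    total, remaining = 0, lam
    while remaining > 0:
        chunk = min(remaining, 30.0)
        remaining -= chunk
        limit, k, prod = math.exp(-chunk), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
    return total


def pccf_from_draws(N_ii, N_ij, N_P, N_ji, N_s, N_rand, A, r):
    # Assumption: N_i is recomputed from the drawn terms rather than held fixed.
    N_i = N_ii + N_ij + N_P
    expected = math.pi * (2 * r) ** 2 * (N_s + N_ii + N_ji) / A * N_i
    return (N_ii + N_rand) / expected if expected > 0 else math.nan


def simulate_pccf(means: dict, A: float, r: float, n_sims: int = 10_000, seed: int = 0):
    """Draw the six terms from Poisson(mean); return 1st and 99th PCCF percentiles and all values."""
    rng = random.Random(seed)
    keys = ("N_ii", "N_ij", "N_P", "N_ji", "N_s", "N_rand")
    values = []
    for _ in range(n_sims):
        draw = {k: poisson(means[k], rng) for k in keys}
        v = pccf_from_draws(**draw, A=A, r=r)
        if not math.isnan(v):  # skip draws with an empty denominator
            values.append(v)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[0], cuts[98], values


means = {"N_ii": 200, "N_ij": 400, "N_P": 400, "N_ji": 600, "N_s": 400, "N_rand": 10.05}
lo, hi, values = simulate_pccf(means, A=1e6, r=1.0, n_sims=1000)
print(lo, hi)
```

With these example means the deterministic formula gives a central PCCF of about 13.9, and the simulated 1st and 99th percentiles bracket it.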
Detection of specific vs. non-specific binding as a function of $K_D$ and $n$
As an application of the model, we compare the PCCF values computed for a mixture of $n$ proteins and targeting probes that specifically interact with dissociation constant $K_D$, where $n$ is allowed to vary over a large range, with the PCCF for mixtures of the same numbers of proteins and probes in which all the proteins and probes interact only non-specifically with dissociation constant $\kappa$. In particular, we assume an array in which $5 \times 10^8$ protein polonies can be detected and that these are divided equally among the $n$ proteins, where $n$ ranges between 500 and 500,000 (so that the number of detected polonies per protein species, $N_i$, correspondingly varies between 1,000,000 and 1,000). We consider three specific dissociation constants $K_D$, and compute non-specific PCCFs by letting $K_D \to \infty$. All parameters other than $K_D$, $n$ and $N_i$ are assigned the following fixed values, consistent with literature and experimental data.
šš# = 20 šššššššš /100 šššš
Approximate values which can be used in the assay š š # = 200 šššššššš /100 šššš
š š 0 = 100 šššš
$A = 75 \times 25\ \mathrm{mm}^2$   Standard microscope slide area
$r = 0.7\ \mu\mathrm{m}$   Colocalization threshold distance used in our experiments
$\alpha = 0.8$   Approximate values based on our test of a few proteins
$\varepsilon = 0.8$
šš = 10 šššš Assumed non-specific proteināprobe complex dissociation
constant
$\beta = 0.75$   Approximate values based on this study and our previous measurement7
$\gamma = 0.75$
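The fixed array parameters determine two derived quantities that drive the results: the per-species polony count $N_i = 5 \times 10^8/n$ and the fraction of the slide covered by one colocalization disc, $\pi(2r)^2/A$. The arithmetic below reads $r$ as 0.7 µm and $A$ as 75 × 25 mm², and ignores edge effects; the constant names are hypothetical.

```python
import math

TOTAL_PROTEIN_POLONIES = 5e8   # detected protein polonies on the array (from the text)
A_MM2 = 75 * 25                # slide area in mm^2
R_MM = 0.7e-3                  # r = 0.7 um expressed in mm

# detected polonies per protein species, N_i = 5e8 / n
for n in (500, 5_000, 50_000, 500_000):
    print(f"n = {n:>7,}: N_i = {TOTAL_PROTEIN_POLONIES / n:,.0f}")

# fraction of the slide covered by one disc of radius 2r, i.e. the chance that
# a single uniformly placed polony falls within 2r of a given polony
disc_fraction = math.pi * (2 * R_MM) ** 2 / A_MM2
print(f"pi*(2r)^2 / A = {disc_fraction:.3e}")
```

The disc fraction is about $3.3 \times 10^{-9}$, so, for example, 1,000 chance-available $s_i$ polonies give $\pi(2r)^2\rho \approx 3.3 \times 10^{-6}$ per $P_i$ polony.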
Results are summarized in the figure below. In this figure, error bars span the range between the 1st and 99th percentiles of the randomly simulated PCCF distributions described above, with the following exceptions. (i) For large values of $n$, the 99th percentile of the non-specific PCCF distribution could be no greater than the central value computed by the model, which would place the upper error bar at or below the central value. In these cases, the maximum value observed in the PCCF distribution was used to set the upper error bar instead of the 99th percentile, and the upper end of the error bar was marked with an asterisk (*). This situation arises because the numbers of $P_i$ and $s_i$ polonies become very small, so that simulations yield no or very few colocalized polonies except for a small number of outliers. (ii) Because PCCFs are presented below via their log10 values, PCCF values of 0 cannot be portrayed directly. In some cases, however, the 1st percentile of PCCF values was 0, and this is indicated by a downward-pointing arrowhead on the lower error bar. Note that markers and error bars are slightly jittered so that overlapping error bars can be seen clearly. For each set of $K_D$, $n$ and $N_i$ values, 10,000 random simulations were performed.
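The two exceptions for drawing error bars can be written as a small rule. This sketch is one reading of the text, not the plotting code used for the figure; the percentile interpolation (`statistics.quantiles`, inclusive method) and the function names are assumptions.

```python
import math
import statistics


def error_bar(samples, central):
    """Error-bar limits for one simulated PCCF distribution.

    Returns (lower, upper, upper_is_max, lower_is_zero):
      upper_is_max:  the 99th percentile was <= central, so the sample maximum is used (asterisk)
      lower_is_zero: the 1st percentile is 0, which has no log10 value (downward arrowhead)
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    lower, upper = cuts[0], cuts[98]
    upper_is_max = upper <= central
    if upper_is_max:
        upper = max(samples)
    lower_is_zero = lower == 0
    return lower, upper, upper_is_max, lower_is_zero


def log10_or_none(x):
    """log10 for plotting on a log axis; None where the value cannot be drawn."""
    return math.log10(x) if x > 0 else None


# a sparse distribution: almost all draws give PCCF = 0, a few outliers give 5
print(error_bar([0.0] * 995 + [5.0] * 5, central=1.0))
```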
A conclusion that may be drawn from these simulations is that order-of-magnitude differences between specific $K_D$ values can be clearly distinguished from each other and from non-specific binding in mixtures of up to ~63,000 distinct protein and probe species under the conditions assumed in the model. Note, however, that while the lack of overlap between error bars that indicate 1st and 99th percentiles implies that the PCCF distributions for these different $K_D$ values overlap with P < 0.0001, these probabilities are not corrected for multiple hypothesis testing.
SUPPLEMENTARY DISCUSSION
Comparison of protein interaction profiling technologies based on nucleic acid barcoding and
high-throughput sequencing
A large number of techniques have been developed to study protein–protein interactions, and their features and applications have been well covered by numerous reviews53-57. These techniques are built on a variety of protein detection methods (e.g., mass spectrometry, immunostaining, spectrophotometry), and some of them are also applicable to other types of interactions, such as protein–nucleic acid and protein–small molecule interactions. Many techniques can be categorized broadly as “protein barcoding” technologies, including in vivo and in vitro approaches that use DNA sequences, such as protein coding sequences (CDSs) or non-CDSs, to identify proteins of interest. Of note, some techniques have been successfully adapted for use with massively parallel sequencing technologies to improve throughput and cost-effectiveness16,27,40,42-44,58-60. Here, we compare them with SMI-Seq, focusing on differences in the protein barcoding and decoding methods employed (Extended Data Table 3). Their demonstrated applications in compound screening16,30,45-50,61 are also included in the table.
Protein barcoding methods. Protein barcoding methods can be grouped into two general categories. The first category includes methods that couple proteins and DNAs in natural or synthetic compartments. Yeast two-hybrid (Y2H)62-64 and protein-fragment complementation assays (PCA)65-68 are well-established in vivo techniques in which proteins and barcoding DNAs (CDSs) are paired in cellular compartments. Protein–protein interactions are detected in intracellular environments with the help of a transcriptional or spectroscopic reporter. These techniques are relatively easy to implement and have been successfully applied to screen Gateway-compatible ORFeome libraries for interactome mapping2,31,69,70. Other prominent examples in this category are cell- or virus-based protein display, in which proteins of interest are presented on the surface of cells or viral particles and can be subjected directly to binding assays. Cell-surface display occurs in nature, e.g., immunoglobulin expression on the B lymphocyte surface, and can also be engineered in various expression systems (e.g., phage display71,72, yeast display73,74, bacterial display75,76 and mammalian cell display77,78). Similarly, coupling of proteins to their DNA templates can be achieved in non-biological compartments, such as water-in-oil emulsions, via in vitro transcription and translation (e.g., bead surface display79,80). Although all these techniques are of great utility for screening protein–protein interactions, it is difficult to use them to obtain quantitative measures of protein binding, partly because each compartment contains a different number of protein molecules, so their effective concentrations are variables that are difficult to control and measure.
The other category of protein barcoding attaches DNAs (or RNAs) to proteins molecularly. Molecular junctions can be obtained simply by non-covalent binding, e.g., formation of biotinylated protein–streptavidin–biotinylated DNA complexes, or by covalent chemical crosslinking (refer to the Pierce crosslinking reagents technical handbook81) or enzymatic conjugation (e.g., sortase82, SNAP tag30). In principle, these methods are applicable to almost all proteins and complexes that can be functionally produced in available expression systems; however, because proteins and DNAs need to be individually coupled, the cost scales almost linearly with library size. In contrast, cell-free protein display techniques, such as ribosome display6, mRNA display83 and DNA display84, enable one-pot barcoding of a whole library (up to $10^{15}$ proteins), and the time and effort required for each assay are independent of library size. Nevertheless, the range of proteins that can be synthesized in a functional form by in vitro display systems can be limited by the lack of factors that assist protein synthesis, folding, modification and assembly. mRNA display was found to work efficiently only for small proteins (≤ 300 amino acids)85. In addition to the above methods, proteins can be indirectly barcoded by binding to barcoded antibodies or nucleic acid aptamers (e.g., proximity ligation assay (PLA)13,14 and proximity extension assay (PEA)15). The use of capture reagents allows direct analysis of proteins from biological samples and has very versatile applications. These techniques have been used to measure protein abundance13,44,86-88, to detect protein–protein and protein–DNA interactions14,89-91 and post-translational modifications92,93, and to screen compounds50. However, a limitation of these techniques is that they require capture reagents of both high affinity and high specificity, which can be difficult to produce, and this can be especially constraining in multiplexed binding assays with large libraries. In general, compared with compartmentalization, DNA-attached proteins can be precisely quantitated by measuring the abundance of their DNA barcodes, thus providing a basis for the quantification of protein interactions.
Quantification of protein interactions by high-throughput sequencing. High-throughput protein interaction screening involves detection and quantitation of the barcoding DNAs of interacting proteins. DNA barcodes can be quantified by real-time PCR, microarray hybridization or next-generation sequencing (NGS). However, for large-scale measurements, NGS technologies hold distinct throughput and cost advantages and are quickly coming into wide use (Extended Data Table 3). For example, NGS has been applied to Y2H to quantitate the enrichment level of each positive interactor (QIS-Seq)40. In library-vs.-library screens, the genes of each interacting pair need to be individually joined by PCR prior to sequencing (Stitch-Seq)27, which limits throughput. NGS has been widely used for in-depth profiling of complex antibody repertoires by simultaneously analyzing immunoglobulin genes from millions of B cells (Ig-seq, recently reviewed by Georgiou et
al.41). Likewise, it has also been coupled with phage display for autoantigen discovery (PhIP-Seq)42
and in vitro antibody selection60, mRNA display for screening proteins generated from random cDNA
fragments (IVV-HiTSeq)43 and ribosome display for the interaction profiling of full-length human
ORFeome (PLATO)16,61. Moreover, NGS was applied to PLA for simultaneous quantitation of 35
proteins in blood plasma (ProteinSeq)44, and PEA for a one-pot binding assay with three barcoded
proteins and 262 barcoded small molecules (IDUP)30.
While NGS techniques can provide true digital quantification of protein molecules through their DNA barcodes, the ability to precisely quantitate protein interactions can be compromised when it is necessary to separate interacting from non-interacting proteins and sequence the interacting protein barcodes alone. This is because the sequencing data then lack the quantities of the protein molecules that did not interact with the baits or probes, which are required to calculate binding affinities. Separations of this sort include growth selection in medium (Y2H), flow cytometry sorting (cell-based protein display) and affinity enrichment (cell-free protein display). In principle, this problem can be alleviated by pre-controlling protein concentrations or measuring them in an additional assay. This is not possible for all detection methods (such as Y2H, see above), but even when possible, these measures can introduce biases and extra sources of variance compared with in situ sequencing of whole mixtures, in which the abundances of both free and interacting proteins can be measured in the same assay.
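To make the point concrete, consider the textbook case of simple 1:1 binding, $P + s \rightleftharpoons P{\cdot}s$, with the free probe concentration known (an assumption; in practice it must be estimated). Since $K_D = [P][s]/[P{\cdot}s]$, and counting bound and unbound molecules of one protein in the same sample volume gives $[P]/[P{\cdot}s] = n_{\mathrm{free}}/n_{\mathrm{bound}}$, the unbound count cannot be dropped. This illustrates the general argument and is not the analysis used in SMI-Seq; the function name is hypothetical.

```python
def kd_from_counts(n_bound: int, n_free: int, free_probe_conc: float) -> float:
    """Apparent K_D for 1:1 binding, P + s <-> P.s, from molecule counts.

    K_D = [P][s] / [P.s]. Bound and free P are counted in the same volume,
    so [P] / [P.s] = n_free / n_bound and the volume cancels.
    """
    if n_bound == 0:
        raise ValueError("no bound molecules counted; K_D is not determined")
    return free_probe_conc * n_free / n_bound


# 100 bound and 900 unbound molecules of one protein, free probe at 10 (any concentration unit)
print(kd_from_counts(100, 900, 10.0))  # 90.0
```

Sequencing only an enriched, bound fraction supplies `n_bound` but not `n_free`, which is exactly the missing quantity described above.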
In situ SM quantification. In addition to sharing the advantages of other techniques, such as the highly efficient barcoded library construction conferred by ribosome display, SMI-Seq offers a fundamental new advantage in its use of in situ SM sequencing to simultaneously identify and count both bound and unbound proteins in solution. In situ counting of numerous different SM proteins in solution can lead to ultimate sensitivity and accuracy5,94 and has been a major goal for modern analytical techniques because it can dramatically increase assay throughput and multiplexity. This is demonstrated by our ability to conduct a 200 × 55 library-by-library screen, much larger than the 5 × 5 demonstration provided by its most similar non-SM method14, and our mathematical modeling suggests that, theoretically, interactions of tens of thousands of proteins with tens of thousands of probe proteins could be quantitatively measured in a one-pot assay based on half a billion polony reads, a throughput within the capability of current NGS platforms. Even though the assays were performed in a library vs. library format in this work, this technique holds the promise of direct molecular counting of all-by-all pairwise or even higher-order interactions in a complex mixture. SMI-Seq, as well as recent in situ sequencing techniques95-98, represents a further extension of how imaging-based sequencing technology can glean new and valuable information by analyzing the spatial patterning as well as the sequence content and numbers of arrayed DNAs.
Supplementary Figure 1 | Expression vectors used in this study. pRD-NHA-SecM was used to generate ribosome display-barcoded proteins; pEco-CSBPHis, pEco-CHaloFlagHis and pEco-NHalo-CHis were used for E. coli in vivo and in vitro protein expression; pBac-NFlagHA was used for baculovirus expression of GPCRs; pIRES-CHaloFlagHis and pIRES-CHaloFlagHis-Gateway were used to express HaloTagged proteins in the human IVT system. T7 pro., T7 promoter sequence; T7 term., T7 terminator sequence; polyhedrin pro., polyhedrin promoter sequence; RBS, ribosomal binding site; IRES, internal ribosome entry site. DNA sequences can be found in Supplementary Table 5.
SUPPLEMENTARY REFERENCES
2 Dreze, M. et al. High-quality binary interactome mapping. Methods Enzymol. 470, 281-315 (2010).
5 Weiss, S. Fluorescence spectroscopy of single biomolecules. Science 283, 1676-1683 (1999).
6 Hanes, J. & Pluckthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. U.S.A. 94, 4937-4942 (1997).
7 Mitra, R. D. & Church, G. M. In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 27, e34 (1999).
10 Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (2008).
11 Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728-1732 (2005).
13 Fredriksson, S. et al. Protein detection using proximity-dependent DNA ligation assays. Nat. Biotechnol. 20, 473-477 (2002).
14 Hammond, M., Nong, R. Y., Ericsson, O., Pardali, K. & Landegren, U. Profiling cellular protein complexes by proximity ligation with dual tag microarray readout. PLoS One 7 (2012).
15 Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102 (2011).
16 Zhu, J. et al. Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat. Biotechnol. 31, 331-334 (2013).
27 Yu, H. et al. Next-generation sequencing to generate interactome datasets. Nat. Methods 8, 478-480 (2011).
30 McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs from combined libraries of small molecules and unpurified protein targets in cell lysates. J. Am. Chem. Soc. 136, 3264-3270 (2014).
31 Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659-661 (2011).
39 Philimonenko, A. A., Janacek, J. & Hozak, P. Statistical evaluation of colocalization patterns in immunogold labeling experiments. J. Struct. Biol. 132, 201-210 (2000).
40 Lewis, J. D. et al. Quantitative interactor screening with next-generation sequencing (QIS-Seq) identifies Arabidopsis thaliana MLO2 as a target of the Pseudomonas syringae type III effector HopZ2. BMC Genomics 13 (2012).
41 Georgiou, G. et al. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32, 158-168 (2014).
42 Larman, H. B. et al. Autoantigen discovery with a synthetic human peptidome. Nat. Biotechnol. 29, 535-541 (2011).
43 Fujimori, S. et al. Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data. Sci. Rep. 2, 691-691 (2012).
44 Darmanis, S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One 6 (2011).
45 Young, K. et al. Identification of a calcium channel modulator using a high throughput yeast two-hybrid screen. Nat. Biotechnol. 16, 946-950 (1998).
46 Nishihara, T. et al. Estrogenic activities of 517 chemicals by yeast two-hybrid assay. J. Health Sci. 46, 282-298 (2000).
47 Chidley, C., Haruki, H., Pedersen, M. G., Muller, E. & Johnsson, K. A yeast-based screen reveals that sulfasalazine inhibits tetrahydrobiopterin biosynthesis. Nat. Chem. Biol. 7, 375-383 (2011).
48 Wrighton, N. C. et al. Small peptides as potent mimetics of the protein hormone erythropoietin. Science 273, 458-463 (1996).
49 Lowman, H. B. Bacteriophage display and discovery of peptide leads for drug development. Annu. Rev. Biophys. Biomol. Struct. 26, 401-424 (1997).
50 Leuchowius, K. J. et al. High content screening for inhibitors of protein interactions and post-translational modifications in primary cells by proximity ligation. Mol. Cell. Proteomics 9, 178-183 (2010).
51 Porreca, G. J., Shendure, J. & Church, G. M. Polony DNA sequencing. Curr. Protoc. Mol. Biol. 7.8 (2006).
52 Hanisch, K. H. & Stoyan, D. Formulas for second-order analysis of marked point processes. Math. Operationsforsch. Statist., Ser. Statistics 14, 559 (1979).
53 Phizicky, E. M. & Fields, S. Protein-protein interactions: methods for detection and analysis. Microbiol. Rev. 59, 94-123 (1995).
54 Zhu, H. & Snyder, M. Protein chip technology. Curr. Opin. Chem. Biol. 7, 55-63 (2003).
55 Piehler, J. New methodologies for measuring protein interactions in vivo and in vitro. Curr. Opin. Struct. Biol. 15, 4-14 (2005).
56 Boozer, C., Kim, G., Cong, S., Guan, H. & Londergan, T. Looking towards label-free biomolecular interaction analysis in a high-throughput format: a review of new surface plasmon resonance technologies. Curr. Opin. Biotechnol. 17, 400-405 (2006).
57 Berggard, T., Linse, S. & James, P. Methods for the detection and analysis of protein-protein interactions. Proteomics 7, 2833-2842 (2007).
58 Dias-Neto, E. et al. Next-generation phage display: integrating and comparing available molecular tools to enable cost-effective high-throughput analysis. PLoS One 4 (2009).
59 Mendez-Rios, J. & Uetz, P. Global approaches to study protein-protein interactions among viruses and hosts. Future Microbiol. 5, 289-301 (2010).
60 Ravn, U. et al. By-passing in vitro screening-next generation sequencing technologies applied to antibody display and in silico candidate selection. Nucleic Acids Res. 38 (2010).
61 Larman, H. B., Liang, A. C., Elledge, S. J. & Zhu, J. Discovery of protein interactions using parallel analysis of translated ORFs (PLATO). Nat. Protoc. 9, 90-103 (2014).
62 Fields, S. & Song, O. K. A novel genetic system to detect protein-protein interactions. Nature 340, 245-246 (1989).
63 Chien, C. T., Bartel, P. L., Sternglanz, R. & Fields, S. The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U. S. A. 88, 9578-9582 (1991).
64 Rual, J. F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).
65 Rossi, F., Charlton, C. A. & Blau, H. M. Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation. Proc. Natl. Acad. Sci. U. S. A. 94, 8405-8410 (1997).
66 Stagljar, I., Korostensky, C., Johnsson, N. & te Heesen, S. A genetic system based on split-ubiquitin for the analysis of interactions between membrane proteins in vivo. Proc. Natl. Acad. Sci. U. S. A. 95, 5187-5192 (1998).
67 Remy, I. & Michnick, S. W. Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc. Natl. Acad. Sci. U. S. A. 96, 5394-5399 (1999).
68 Pelletier, J. N., Arndt, K. M., Pluckthun, A. & Michnick, S. W. An in vivo library-versus-library selection of optimized protein-protein interactions. Nat. Biotechnol. 17, 683-690 (1999).
69 Rual, J. F., Hill, D. E. & Vidal, M. ORFeome projects: gateway between genomics and omics. Curr. Opin. Chem. Biol. 8, 20-25 (2004).
70 Yashiroda, Y., Matsuyama, A. & Yoshida, M. New insights into chemical biology from ORFeome libraries. Curr. Opin. Chem. Biol. 12, 55-59 (2008).
71 Clackson, T., Hoogenboom, H. R., Griffiths, A. D. & Winter, G. Making antibody fragments using phage display libraries. Nature 352, 624-628 (1991).
72 Bratkovic, T. Progress in phage display: evolution of the technique and its applications. Cell. Mol. Life Sci. 67, 749-767 (2010).
73 Kieke, M. C., Cho, B. K., Boder, E. T., Kranz, D. M. & Wittrup, K. D. Isolation of anti-T cell receptor scFv mutants by yeast surface display. Protein Eng. 10, 1303-1310 (1997).
74 Gai, A. S. & Wittrup, D. K. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467-473 (2007).
75 Stahl, S. & Uhlen, M. Bacterial surface display: Trends and progress. Trends Biotechnol. 15, 185-192 (1997).
76 Samuelson, P., Gunneriusson, E., Nygren, P. A. & Stahl, S. Display of proteins on bacteria. J. Biotechnol. 96, 129-154 (2002).
77 Beerli, R. R. et al. Isolation of human monoclonal antibodies by mammalian cell display. Proc. Natl. Acad. Sci. U. S. A. 105, 14336-14341 (2008).
78 Zhou, C., Jacobsen, F. W., Cai, L., Chen, Q. & Shen, W. D. Development of a novel mammalian cell surface antibody display platform. MAbs 2, 508-518 (2010).
79 Diamante, L., Gatti-Lafranconi, P., Schaerli, Y. & Hollfelder, F. In vitro affinity screening of protein and peptide binders by megavalent bead surface display. Protein Eng. Des. Sel. 26, 713-724 (2013).
80 Huang, L. C. et al. Linking genotype to phenotype on beads: high throughput selection of peptides with biological function. Sci. Rep. 3 (2013).
81 http://www.piercenet.com/page/crosslinking-reagents-technical-handbook-1601673.
82 Pritz, S. et al. Synthesis of biologically active peptide nucleic acid-peptide conjugates by sortase-mediated ligation. J. Org. Chem. 72, 3909-3912 (2007).
83 Wilson, D. S., Keefe, A. D. & Szostak, J. W. The use of mRNA display to select high-affinity protein-binding peptides. Proc. Natl. Acad. Sci. U.S.A. 98, 3750-3755 (2001).
84 Yonezawa, M., Doi, N., Kawahashi, Y., Higashinakagawa, T. & Yanagawa, H. DNA display for in vitro selection of diverse peptide libraries. Nucleic Acids Res. 31 (2003).
85 Cotten, S. W., Zou, J. W., Valencia, C. A. & Liu, R. H. Selection of proteins with desired properties from natural proteome libraries using mRNA display. Nat. Protoc. 6, 1163-1182 (2011).
86 Schallmeiner, E. et al. Sensitive protein detection via triple-binder proximity ligation assays. Nat. Methods 4, 135-137 (2007).
87 Darmanis, S. et al. Sensitive plasma protein analysis by microparticle-based proximity ligation assays. Mol. Cell. Proteomics 9, 327-335 (2010).
88 Ke, R., Nong, R. Y., Fredriksson, S., Landegren, U. & Nilsson, M. Improving precision of proximity ligation assay by amplified single molecule detection. PLoS One 8 (2013).
89 Soderberg, O. et al. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat. Methods 3, 995-1000 (2006).
90 Gustafsdottir, S. M. et al. In vitro analysis of DNA-protein interactions by proximity ligation. Proc. Natl. Acad. Sci. U. S. A. 104, 3067-3072 (2007).
91 Soderberg, O. et al. Characterizing proteins and their interactions in cells and tissues using the in situ proximity ligation assay. Methods 45, 227-232 (2008).
92 Jarvius, M. et al. In situ detection of phosphorylated platelet-derived growth factor receptor beta using a generalized proximity ligation method. Mol. Cell. Proteomics 6, 1500-1509 (2007).
93 Leuchowius, K. J., Weibrecht, I., Landegren, U., Gedda, L. & Soderberg, O. Flow cytometric in situ proximity ligation analyses of protein interactions and post-translational modification of the epidermal growth factor receptor family. Cytometry A 75A, 833-839 (2009).
94 Walt, D. R. Optical methods for single molecule detection and analysis. Anal. Chem. 85, 1258-1263 (2013).
95 Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857-860 (2013).
96 Lee, J. H. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360-1363 (2014).
97 Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nat. Biotechnol. 29, 659-664 (2011).
98 Buenrostro, J. D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562-568 (2014).