Supporting Information
Glessner et al. 10.1073/pnas.1000274107
SI Methods
PennCNV-Affy. CNV calling on the Affymetrix 6.0 platform used an algorithm very similar to that used for the Illumina arrays, but with different signal preprocessing steps. Unlike the Illumina platform, where normalized signal intensities [log R ratio (LRR) and B allele frequency (BAF)] can be exported directly from the BeadStudio software, these signal intensity measures in the Affymetrix platform must be calculated from the collection of genotyped samples. We used APT (https://www.affymetrix.com/support/developer/powertools/changelog/index.html) to perform data normalization and signal extraction from the raw CEL files generated in genotyping experiments. The “median smoothing” and “quantile normalization” options were used in APT. In addition, the “expr.genotype = true” option was used to specify allele-specific signal extraction. This step uses a self-normalization algorithm that requires information contained within all of the genotyped samples. The APT software was used for genotype calling as well, with a “confidence score” assigned to each genotype call. For each SNP marker, we then relied on the allele-specific signal intensity for the AA, AB, and BB genotypes on all genotyped samples to construct three canonical genotype clusters, similar to the Illumina clustering generation approach. We did not use genotype calls with a confidence score < 0.1 in constructing canonical genotype clusters. Once the canonical genotype clusters have been constructed, the signal intensity values for each SNP can be transformed to LRR and BAF values.
The Affymetrix arrays contain nonpolymorphic (NP) markers to provide better genome coverage than SNP markers alone. These markers can be handled in a fashion similar to SNPs for CN inference, but there are some differences. First, the R value is calculated as the signal intensity of the NP marker, rather than as the sum of two alleles. The expected R value for each NP marker is calculated as the median signal intensity across all genotyped samples at this marker. In addition, BAF values cannot be derived for NP markers; consequently, they are not used in the likelihood calculation. Finally, because fewer probes are used, the variance of LRR values for NP markers may differ from that for SNP markers; thus, the likelihood model parameters for LRR are different for NP markers and SNP markers.
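The R and LRR definitions above can be illustrated with a minimal Python sketch. It assumes the standard definition of LRR as the base-2 logarithm of observed over expected R; the function names and the example intensities are hypothetical and are not taken from PennCNV-Affy or APT.

```python
import math
from statistics import median


def snp_r(a_intensity, b_intensity):
    """R for a SNP marker: the sum of the two allele-specific intensities."""
    return a_intensity + b_intensity


def expected_r_np(intensities_all_samples):
    """Expected R for an NP marker: median intensity over all genotyped samples."""
    return median(intensities_all_samples)


def lrr(observed_r, expected_r):
    """Log R ratio, assumed here to be log2(observed R / expected R)."""
    return math.log2(observed_r / expected_r)


# Hypothetical intensities for one NP marker across five samples.
np_marker = [980.0, 1000.0, 1020.0, 510.0, 1490.0]
expected = expected_r_np(np_marker)
for r in np_marker:
    print(f"R = {r:7.1f}  LRR = {lrr(r, expected):+.3f}")
```

In this toy example, a sample whose intensity is half the expected value has an LRR of −1, as would be expected for a single-copy deletion under ideal conditions; real data are noisier.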
CNV Validation by Quantitative PCR. Universal Probe Library (UPL; Roche) probes were selected using ProbeFinder v2.41 (Roche). Quantitative PCR was performed with an ABI 7500 Real-Time PCR instrument or an ABI Prism 7900HT Sequence Detection System (Applied Biosystems). Each sample was analyzed in quadruplicate either in 25 μL of reaction mixture [250 nM probe, 900 nM each primer, FastStart TaqMan Probe Master (Roche), and 10 ng genomic DNA] or in 10 μL of reaction mixture [100 nM probe, 200 nM each primer, 1× Platinum Quantitative PCR SuperMix-Uracil-DNA-Glycosylase (UDG) with ROX (Invitrogen), and 25 ng genomic DNA]. The values were evaluated using Sequence Detection Software v2.2.1 (Applied Biosystems). Data analysis was also performed using either the ΔΔCT method or qBase. Reference genes, chosen from COBL, GUSB, and SNCA, were included based on the minimal coefficient of variation, and then the data were normalized by setting a normal control to a value of 1.
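As an illustration of the ΔΔCT calculation mentioned above, the following Python sketch computes a relative gene dosage normalized so that the normal control equals 1. It assumes 100% amplification efficiency (a doubling per cycle) and a single reference gene; the function name and the CT values are hypothetical.

```python
from statistics import mean


def relative_dosage(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Relative gene dosage by the delta-delta-CT method.

    Each argument is a list of replicate CT values (e.g., quadruplicates).
    Assumes perfect doubling per cycle, so dosage = 2 ** (-ddCT).
    """
    delta_sample = mean(ct_target) - mean(ct_reference)
    delta_control = mean(ct_target_ctrl) - mean(ct_reference_ctrl)
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)


# Hypothetical quadruplicate CT values: the test sample needs one extra
# cycle at the target locus relative to the normal control.
dosage = relative_dosage(
    ct_target=[26.0, 26.0, 26.0, 26.0],
    ct_reference=[24.0, 24.0, 24.0, 24.0],
    ct_target_ctrl=[25.0, 25.0, 25.0, 25.0],
    ct_reference_ctrl=[24.0, 24.0, 24.0, 24.0],
)
print(dosage)  # 0.5, consistent with a heterozygous deletion
```

By construction, the normal control compared with itself yields a dosage of 1.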
Illumina Infinium Assay for CNV Calling. The genotype data content together with the intensity data provided by the genotyping array provides high confidence for CNV calls. The array platform used in this study provides highly robust and reproducible SNP clustering because of the random placement of SNP-specific beads with ∼18-fold redundancy for each SNP. Using a SNP array provides allele frequency data that can be analyzed and more closely controlled for redundancy and high performance compared with public databases. This establishes a more robust definition for normal diploid states than can be provided by intensity alone. The genotype clustering establishes the probe performance at each locus for the expected heterozygous genotype state. Based on the hybridization efficiency, this may tend more to the DNP-tagged red range or the biotin-tagged green range for any given locus. The normalization performed to calculate BAF from theta adjusts the SNP-specific range to an expected value of 0.5. This creates more continuous data, because the heterozygous state is properly modeled based on extensive genotyping. Another key technical strength of our study is that the same array was typed at the same genotyping facility at the same time, with the same cluster file for cases and controls. The data analysis is also standardized as described in Methods, and CNVs are called with the same version of PennCNV.
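The theta-to-BAF normalization can be sketched as a piecewise linear interpolation between the canonical cluster centers, so that a sample sitting on the heterozygous cluster receives a BAF of 0.5 regardless of where that cluster falls for a given SNP. This is a minimal sketch of the commonly described approach, not the BeadStudio implementation; the function name and the cluster positions are hypothetical.

```python
def baf_from_theta(theta, theta_aa, theta_ab, theta_bb):
    """Map a sample's theta to BAF by interpolating between cluster centers.

    theta_aa, theta_ab, theta_bb are the canonical cluster means for one SNP,
    assumed to satisfy theta_aa < theta_ab < theta_bb.
    """
    if theta <= theta_aa:
        return 0.0
    if theta >= theta_bb:
        return 1.0
    if theta < theta_ab:
        # Between the AA and AB clusters: BAF runs from 0 to 0.5.
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    # Between the AB and BB clusters: BAF runs from 0.5 to 1.
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)


# Hypothetical SNP whose heterozygous cluster is shifted toward one dye.
print(baf_from_theta(0.60, theta_aa=0.10, theta_ab=0.60, theta_bb=0.95))  # 0.5
```

The design point is that the interpolation is SNP-specific: a heterozygous cluster skewed toward either dye is still mapped to 0.5.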
CNV Filtering Steps. We performed multiple CNV filtering steps as part of the analysis. First, it is important to note that of the 1,736,438 markers (848,415 SNPs and 888,023 CNs) with chromosome annotation, noncomplete genotyping failure, three genotype states observed, and normal theta patterns on the Affymetrix 6.0 array, 33,797 (10,687 SNPs and 23,110 CNs) (1.95%) showed deletion and 44,023 (16,618 SNPs and 27,405 CNs) (2.54%) showed duplication in at least two unrelated cases in the MGS/CHOP discovery cohort (frequency ≥ 0.205%). The threshold of two cases was selected because it is the minimal case frequency that provides certainty that the calls are reliable in a given region. We find this upfront exclusion to be very similar to the inclusion threshold of 1% minor allele frequency in GWA SNP genotype studies. It drastically decreases the number of tests that require correction for genome-wide testing.
Second, all CNVs were called simultaneously in both cases and controls and classified into CNVRs as defined in the main text. A total of 70 deletion CNVRs and 50 duplication CNVRs were identified. Third, to search for novel CNVs, we first filtered out all CNVRs that were not nominally significantly overrepresented in the CHOP cases (P < 0.05) and carefully reviewed the raw data (BAF and LRR) for accurate CNV calling and statistical significance as described in Methods. This left us with 20 deletion CNVRs and 5 duplication CNVRs that we subsequently divided into two categories: (i) CNVs present in cases only and absent in controls (n = 5 deletions and 2 duplications) and (ii) CNVs nominally significantly overrepresented in the cases (n = 15 deletions and 3 duplications). For (i), based on the inclusion significance criteria, there were at least two cases per individual CNV. Thus, the dataset comprising (i) and (ii) defines the CNVRs from the discovery cohort that we used to test for novel schizophrenia CNVs.
We next attempted to replicate these CNVRs in the independent case-control dataset (MGS/CHOP). Seven deletion CNVRs and one duplication CNVR survived our replication criteria (P < 0.05 after adjustment for the number of tests performed) or were absent in the independent control set and were subsequently experimentally validated with two independent methods, qPCR and Illumina Human Hap550 BeadChip. These results are shown in Table 2.
In Table 2, CNVRs that survived multiple testing with Bonferroni adjustment in the discovery phase (P < 0.05 after correction for 20 tests in the case of deletions and 5 tests in the case of duplications) and also survived replication and experimental validation are presented in bold type. CNVRs significant in the discovery phase but not in the replication phase are listed in Table S4.
Glessner et al. www.pnas.org/cgi/content/short/1000274107 1 of 10
Conversely, only one CNV locus overrepresented in controls reached nominal significance. Thus, the CNVs overrepresented in cases exceeded our null expectations. Given the diploid state of the vast majority of the genome, the existence of CNVs protective against the development of schizophrenia seems unlikely.
It is important to note that CNV calling is not unequivocally attained by any one single platform for multiple reasons, including variations in DNA provided, array type, DNA processing, data processing, quality control, CNV calling algorithm, genomic features, genomic coverage, and statistical presentation of regions. This can lead to a high false-positive rate on initial inspection despite exhaustive efforts to standardize and control each confounding contribution.
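The arithmetic behind the filtering steps can be checked with a short Python sketch. It recomputes the marker percentages quoted in the text and applies a Bonferroni adjustment for the 20 deletion and 5 duplication tests. The function names and the example P values are hypothetical, and the sketch assumes the simple form of the adjustment (P multiplied by the number of tests, capped at 1).

```python
def percent_of_markers(n_flagged, n_total):
    """Percentage of markers showing a CNV, rounded to two decimals."""
    return round(100.0 * n_flagged / n_total, 2)


def bonferroni_adjusted_p(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def survives_correction(p_value, n_tests, alpha=0.05):
    """True if the adjusted P value stays below alpha."""
    return bonferroni_adjusted_p(p_value, n_tests) < alpha


N_MARKERS = 1_736_438        # from the text
N_DELETION_TESTS = 20        # deletion CNVRs tested
N_DUPLICATION_TESTS = 5      # duplication CNVRs tested

print(percent_of_markers(33_797, N_MARKERS))   # 1.95
print(percent_of_markers(44_023, N_MARKERS))   # 2.54

# Hypothetical discovery P values for one deletion and one duplication CNVR.
print(survives_correction(0.001, N_DELETION_TESTS))     # True
print(survives_correction(0.02, N_DUPLICATION_TESTS))   # False
```

Equivalently, a deletion CNVR must reach an unadjusted P below 0.05/20 = 0.0025, and a duplication CNVR must reach P below 0.05/5 = 0.01.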
UPenn Control Population Typed at CHOP. The control population collected at UPenn and typed at CHOP included HHDL and PennCATH research participants. HHDL is a cross-sectional study of genetic factors contributing to elevated HDL cholesterol levels. Probands with elevated HDL cholesterol (>75th percentile for age and sex) are identified by physician referrals or through the Hospital of the University of Pennsylvania clinical laboratory. Participants complete a lifestyle questionnaire and provide a blood sample for the measurement of HDL and other lipid-related traits (1). PennCATH comprises consecutive participants undergoing coronary angiography at University of Pennsylvania Health System hospitals. Of the subjects in the PennCATH study, 85% were Caucasian, 55% were male, and the average age was 62 years (2). Only Caucasian samples validated by Eigenstrat principal components analysis were used.
Inclusion criteria were as follows:
The subject must give signed, informed consent.
The proband must have a consensus best-estimate DSM-IV diagnosis of SZ (schizophrenia) or of schizoaffective disorder with at least 6 months’ duration of the “A” criteria for schizophrenia.
The subject must be over 18 years of age at interview (male or female).
The informant should have known the subject for at least 2 years, be familiar with the psychiatric history, and have at least 1 h of contact per week with the proband (close family members preferred).
Exclusion criteria were as follows:
The subject is unable to give informed consent to all aspects ofthe study.
The subject is unable to speak and be interviewed in English (to ensure validity of the interviews).
Psychosis is deemed secondary to substance use by the consensus diagnostic procedure because psychotic symptoms are limited to periods of likely intoxication or withdrawal, or there are persistent symptoms likely related to substance use (e.g., increasing paranoia after years of amphetamine use, symptoms limited to visual hallucinations after extensive hallucinogen use).
The psychotic disorder is deemed secondary to a neurological disorder, such as epilepsy, based on the nature and timing of symptoms. For example, nonspecific, nonfocal EEG abnormalities are common in SZ, but subjects with psychosis that emerged in the context of temporal lobe epilepsy would be excluded.
The subject has severe mental retardation (MR). A subject with mild MR (IQ ≥ 55 or based on clinical and educational history) can be included if SZ symptoms and history can be clearly established.
1. Edmondson AC, et al. (2009) Loss-of-function variants in endothelial lipase are a cause of elevated HDL cholesterol in humans. J Clin Invest 119:1042–1050.
2. Lehrke M, et al. (2007) CXCL16 is a marker of inflammation, atherosclerosis, and acute coronary syndromes in humans. J Am Coll Cardiol 49:442–449.
3. Xu B, et al. (2008) Strong association of de novo CN mutations with sporadic schizophrenia. Nat Genet 40:880–885.
Fig. S1. Attempt to replicate 1q21.1 deletion association with schizophrenia. We observed 1q21.1 deletion in three cases and one control (P = 0.110).
Fig. S2. Attempt to replicate 15q13.3 deletion association with schizophrenia. We observed 15q13.3 deletion in two cases and three controls (P = 0.670).
Fig. S3. Frequency of CNVs observed in study subjects. Red indicates schizophrenia case deletion; blue, schizophrenia case duplication; black, schizophrenia control deletion; purple, schizophrenia control duplication. The maximum value displayed is 0.2, to make low-frequency CNVs (the majority of loci) visible.
Fig. S4. Affymetrix genotyping console browser showing the log 2 ratio of schizophrenia cases. Deleted 3′ of CACNA1B on 9q34.3 and of RET on 10q11.21.
Table S1. Schizophrenia cases: Clinical demographic data

Sex
Male      70%
Female    30%

Age at interview, years
Mean      43.8
Median    45
SD        11.4

Age at onset, years
Mean      21.0
Median    20
SD        6.58

Location   Cases
PA         387
AU         216
IL         138
CO         123
NY         112
GA         97
Israel     94
IA         87
CA         77
MO         58
LA         15
TX         5

FH psychosis
None        71.7%
Suspected   16.1%
Definite    12.2%

Months Ill
Mean      273.2
Median    276
SD        138.9

Months mood
Mean      24.1
Median    1
SD        58.5

dx_cs_1   Percentage   Description
295       100%         Schizophrenic disorders

dx_cs_2   Percentage   Description
303.9     31%          Alcoholism
305       17%          Drug abuse ND
311       16%          Depressive disorder
296.8     10%          Bipolar
305.2     9%           Cannabis abuse ND
304.3     7%           Cannabis dependence

dx_cs_3   Percentage   Description
305.2     23%          Cannabis abuse ND
304.3     20%          Cannabis dependence
305       11%          Abuse of drugs ND
303.9     10%          Alcoholism
311       7%           Depressive disorder
296.8     6%           Bipolar
305.7     5%           Amphetamine abuse ND

SumCritA*   Percentage
0           1%
2           2%
3           7%
4           21%
5           24%
6           24%
7           21%

Criterion   0 (absent)   1 (present)
cs_A1       1%           99%
cs_A1a      34%          66%
cs_A2       5%           95%
cs_A2a      50%          50%
cs_A3       32%          68%
cs_A4       32%          68%
cs_A5       19%          81%

FH, family history; ND, nondependent; Ill, residual schizophrenia; Mood, full mood syndromes (mania or depression).
*All subjects with schizophrenia or schizoaffective disorder diagnoses were judged as meeting DSM-IV criteria B-F for schizophrenia.
Consensus rating for DSM-IV schizophrenia criteria: A1, delusions; A1a, implausible, impossible, including Schneiderian delusions; A2, hallucinations; A2a, voices that converse with each other and/or comment on the subject’s thoughts or behavior; A3, disorganized speech, frequent derailment or incoherence; A4, grossly disorganized or catatonic behavior; A5, negative symptoms, affective flattening, alogia, avolition (1 = present; 0 = absent).
Table S2. Schizophrenia controls: Clinical demographic data

Sex
Male      46%
Female    54%

Age, years
Mean      50.8
Median    51
SD        16.8

Location, n cases
PA    74
FL    73
TX    67
OH    66
IL    51
CA    45
NY    43
WA    42
WI    41
NC    40
VA    39
AZ    33
CO    33
IA    30
MA    30
MO    29
GA    27
NJ    27
OK    27
SC    27
MN    26
OR    24
MI    23
MD    22
AL    21
IN    21
UT    21
LA    19
KY    18
AR    15
ID    14
CT    13
NE    13
SD    13
WV    12
Table S3. Attempts to replicate observations of increased frequency of schizophrenia cases with novel genes affected by CNV

CNV association conditions   100-kb CNV size   DGV genes excluded   10-probe CNV size   10-probe DGV genes excluded   Multiple novel genes impacted
Cases                        9.93%             5.83%                48.21%              38.59%                        16.48%
Controls                     13.49%            6.74%                50%                 36.99%                        17.03%
P value                      0.009             0.399                0.386               0.443                         0.741
Table S4. CNVRs statistically overrepresented in schizophrenia cases and not replicated in an independent cohort

CNVR                        Probes   Cases discovery   Controls discovery   Discovery P value   Cases replication   Controls replication   Combined P value   OR      Gene             Distance from gene   Type   Canary
chr7:32177451–32392975      198      3                 0                    3.53 × 10^−2        0                   0                      3.67 × 10^−2       NA      PDE1C            0                    Dup    N
chr3:61803641–61811383      9        4                 0                    1.16 × 10^−2        0                   1                      4.47 × 10^−2       8.051   PTPRG            0                    Del    N
chr4:162417655–162424561    12       4                 0                    1.16 × 10^−2        6                   6                      1.70 × 10^−2       3.361   FSTL5, RAPGEF2   99.9 kb, 1.92 Mb     Del    Y
chr4:135276704–135408238    21       7                 2                    7.44 × 10^−3        6                   7                      2.06 × 10^−2       2.916   PABPC4L*         0                    Del    N
chr5:2097129–2111366        17       9                 6                    4.86 × 10^−2        4                   5                      4.76 × 10^−2       2.384   IRX4             161 kb               Del    N
chr6:57268143–57272458      13       4                 0                    1.16 × 10^−2        0                   2                      9.91 × 10^−2       4.024   PRIM2A, RAB23    17.9 kb, 73.1 kb     Del    N
chr12:60558836–60563972     10       11                7                    2.00 × 10^−2        0                   4                      1.12 × 10^−1       2.015   FAM19A2          0                    Del    N
chr19:426716–434473         5        3                 0                    3.53 × 10^−2        7                   10                     1.51 × 10^−1       2.015   SHC2             14.7 kb              Dup    N
chr5:52702915–52718131      12       7                 4                    4.77 × 10^−2        1                   5                      3.01 × 10^−1       1.789   FST              109 kb               Del    N
chr15:99980078–100033288    36       5                 2                    4.22 × 10^−2        0                   3                      3.16 × 10^−1       2.012   TM2D3, TARSL2    0                    Dup    N
chr6:16499554–16508717      20       6                 2                    1.80 × 10^−2        0                   5                      3.79 × 10^−1       1.724   ATXN1            0                    Dup    N
chr6:162740476–162741040    2        5                 2                    4.22 × 10^−2        1                   5                      3.79 × 10^−1       1.724   PARK2            0                    Del    N
chr15:32717247–32765105     50       15                13                   2.51 × 10^−2        8                   24                     4.10 × 10^−1       1.252   GJD2             0                    Del    N
chr7:142941348–142963649    10       8                 3                    7.84 × 10^−3        0                   8                      4.66 × 10^−1       1.463   AL833583         10.7 kb              Del    N
chr4:114573691–114581335    27       4                 1                    4.26 × 10^−2        0                   5                      7.39 × 10^−1       1.340   CAMK2D           11.7 kb              Del    N
chr1:92014319–92021028      10       5                 2                    4.22 × 10^−2        0                   6                      7.70 × 10^−1       1.256   TGFBR3           0                    Del    N
chr12:69158942–69164294     9        8                 5                    3.69 × 10^−2        5                   19                     8.61 × 10^−1       1.089   PTPRB, KCNMB4    32.6 kb, 47.7 kb     Del    N

NA, not applicable.
*From Xu et al. (3).
Table S5. Independent validation of CNVRs with qPCR and Illumina Human Hap550 BeadChip

CNVR                       CNV type   Sample ID    Relative gene dosage   SE      Illumina chip ID   Tag SNP ID   Illumina log R ratio
chr22:17404806–19941349    Del        1222439226   0.524                  0.035   4290041416_21      rs1934895    −1.052
chr22:17404806–19941349    Del        9626794429   0.521                  0.011   4276098785_11      rs1934895    −0.996
chr22:17404806–19941349    Del        04C28087A*   1.000                  0.173   4562262038_21      rs1934895    −0.018
chr22:17404806–19941349    Del        04C28139A*   1.029                  0.122   4562369091_21      rs1934895    −0.120
chr16:29425212–30134444    Dup        7873015771   1.461                  0.089   4079019681_A       rs4563056    0.498
chr16:29425212–30134444    Dup        8623080628   1.489                  0.007   1582065333_A       rs4563056    0.595
chr16:29425212–30134444    Dup        9163054078   1.508                  0.096   1846673715_A       rs4563056    0.369
chr16:29425212–30134444    Dup        04C28087A*   1.000                  0.023   4562262038_21      rs4563056    −0.063
chr16:29425212–30134444    Dup        04C28139A*   0.975                  0.027   4562369091_21      rs4563056    −0.221
chr16:68743639–68770545    Del        151169809    0.548                  0.034   1587851079_A       rs17028422   −0.135
chr16:68743639–68770545    Del        04C28087A*   1.000                  0.031   4562262038_21      rs2287983    −0.017
chr16:68743639–68770545    Del        04C28139A*   0.954                  0.017   4562369091_21      rs2287983    −0.059
chr9:140145139–140152969   Del        1475148472   0.507                  0.246   4147907270_B       rs11137379   −1.765
chr9:140145139–140152969   Del        3005849912   0.473                  0.008   4068230324_B       rs11137379   −2.270
chr9:140145139–140152969   Del        4311028436   0.475                  0.029   4276098403_12      rs11137379   −2.711
chr9:140145139–140152969   Del        5678778794   0.545                  0.128   1846673296_A       rs11137379   −2.025
chr9:140145139–140152969   Del        6711973667   0.428                  0.154   1796039438_A       rs11137379   −1.951
chr9:140145139–140152969   Del        8934645510   0.432                  0.023   4276098713_22      rs11137379   −2.440
chr9:140145139–140152969   Del        9140263548   0.474                  0.020   4276098270_12      rs11137379   −2.804
chr9:140145139–140152969   Del        04C28087A*   1.000                  0.036   4562262038_21      rs11137379   −0.003
chr9:140145139–140152969   Del        04C28139A*   1.035                  0.091   4562369091_21      rs11137379   −0.136
chr10:42932615–42934354    Del        300030062    0.617                  0.016   4276098188_12      rs715106     −0.175
chr10:42932615–42934354    Del        1207317307   0.527                  0.041   4523255137_11      rs715106     −0.204
chr10:42932615–42934354    Del        1299194495   0.455                  0.126   4506261167_11      rs715106     −0.161
chr10:42932615–42934354    Del        5442260823   0.488                  0.168   4562297116_21      rs715106     −0.174
chr10:42932615–42934354    Del        9508038552   0.375                  0.009   4157398294_A       rs715106     −0.460
chr10:42932615–42934354    Del        04C28087A*   1.000                  0.026   4562262038_21      rs715106     −0.003
chr10:42932615–42934354    Del        04C28139A*   1.057                  0.049   4562369091_21      rs715106     −0.093
chr3:4063809–4074877       Del        325927264    0.480                  0.022   4240108555_11      rs317528     −0.508
chr3:4063809–4074877       Del        2577168153   0.452                  0.006   1890578271_A       rs317528     −0.607
chr3:4063809–4074877       Del        04C28087A*   1.000                  0.068   4562262038_21      rs317528     −0.040
chr3:4063809–4074877       Del        04C28139A*   1.040                  0.041   4562369091_21      rs317528     −0.028
chr4:9881886–9884092       Del        332702531    0.510                  0.020   4290041726_12      rs10939814   −0.640
chr4:9881886–9884092       Del        6483240361   0.440                  0.170   4243114252_11      rs10939814   −0.752
chr4:9881886–9884092       Del        9655625304   0.611                  0.013   1837427556_A       rs10939814   −0.585
chr4:9881886–9884092       Del        9966812554   0.482                  0.024   4276098355_21      rs10939814   −0.502
chr4:9881886–9884092       Del        04C28087A*   1.000                  0.110   4562262038_21      rs10939814   −0.040
chr4:9881886–9884092       Del        04C28139A*   0.823                  0.025   4562369091_21      rs10939814   −0.059
chr18:38310567–38311765    Del        1317180605   0.000                  0.000   4256206108_21      rs10468964   −4.483
chr18:38310567–38311765    Del        3613918399   0.000                  0.000   4276098785_12      rs10468964   −4.855
chr18:38310567–38311765    Del        3673606183   0.000                  0.000   4240108637_11      rs10468964   −4.646
chr18:38310567–38311765    Del        5301838910   0.000                  0.000   4523280020_21      rs10468964   −4.984
chr18:38310567–38311765    Del        8334564658   0.000                  0.000   4079300087_A       rs10468964   −5.693
chr18:38310567–38311765    Del        04C28087A*   1.000                  0.057   4562262038_21      rs10468964   −0.009
chr18:38310567–38311765    Del        04C28139A*   0.987                  0.071   4562369091_21      rs10468964   0.033

*Negative control samples (normal diploid).
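One way to read the relative gene dosage column is to convert it to an approximate integer copy number. The Python sketch below assumes that the normal diploid control (dosage 1.000) corresponds to two copies and simply rounds to the nearest integer; this rule and the function name are illustrative assumptions, not the calling procedure used in the study.

```python
def estimated_copy_number(relative_dosage, reference_copies=2):
    """Approximate copy number from qPCR relative gene dosage.

    Assumes the normal control (dosage 1.0) carries `reference_copies`
    copies and rounds the scaled dosage to the nearest integer.
    """
    return round(relative_dosage * reference_copies)


# Relative gene dosage values taken from Table S5.
for dosage in (0.524, 1.461, 0.000, 1.000):
    print(dosage, estimated_copy_number(dosage))
```

Under this assumption, dosages near 0.5 correspond to heterozygous deletions (one copy), dosages near 1.5 to duplications (three copies), and the chr18 samples with a dosage of 0.000 to homozygous deletions.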