Supplementary Information for

Genetic variation across the human olfactory receptor repertoire alters odor perception

Trimmer C, Keller A, Murphy NR, Snyder LL, Willer JR, Nagai MH, Katsanis N, Vosshall LB, Matsunami H, Mainland JD

Casey Trimmer
Email: [email protected]

This PDF file includes:
Supplementary text
Figs. S1 to S4
Table S1
Captions for Datasets S1 to S6
References for SI reference citations
Other supplementary materials for this manuscript include the following:
Datasets S1 to S6
www.pnas.org/cgi/doi/10.1073/pnas.1804106116
Supplementary Information Text

Materials and Methods

Psychophysical testing. The collection of psychophysical data was previously reported by Keller et al. and approved by the Rockefeller University Institutional Review Board (1, 2). Briefly, 391 subjects rated both the intensity and pleasantness of 66 odors at two concentrations (designated “high” and “low”) and two solvents on a scale of 1 to 7, with 1 being “extremely weak” or “extremely unpleasant” and 7 being “very strong” or “extremely pleasant”. The high and low odorant concentrations were intensity-matched to 1/1,000 and 1/10,000 dilutions of 1-butanol, respectively. For six odorants, pure odorant was rated as less intense than the 1/1,000 dilution of 1-butanol, and these were not diluted for testing (odors described in Table S1). In addition, the detection thresholds for three odorants (pentadecalactone, vanillin, and isovaleric acid) were determined for each subject.

Subjects rated the intensity and pleasantness of each odorant/concentration twice. Within-subject variability in odorant rating was determined by calculating the Pearson’s correlation between the first and second rating of all odorants. Test-retest correlation was calculated by examining the Pearson’s correlation between the first and second rating for each olfactory phenotype where duplicate trials were run.

For each subject, the average intensity and pleasantness ratings at each odorant concentration (low and high) were ranked from 1 to 68, such that the odorant with the highest rated intensity for a concentration was ranked as 68 and the odorant with the lowest rated intensity was ranked as 1. Solvents (propylene glycol and paraffin oil) were rated three times at a single concentration; these ratings were averaged and included in the ranking with the other 66 odorants at both concentrations. Ranking on a per-subject basis controlled for the different use of the rating scale among subjects.
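The per-subject ranking described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `rank_ratings`, the toy ratings, and the handling of ties (tied ratings share the mean of their ranks) are assumptions made for illustration, since the text does not state how ties were resolved.

```python
from statistics import mean


def rank_ratings(avg_ratings):
    """Rank one subject's average ratings from 1 (lowest) upward.

    Tied ratings share the mean of the ranks they span (an assumption;
    the source does not specify tie handling).
    """
    ordered = sorted(avg_ratings.items(), key=lambda kv: kv[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to the end of the run of ratings tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2  # mean of 1-based positions
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


# Hypothetical subject: odorants rated twice, the solvent rated three times.
subject = {
    "vanillin": mean([5, 6]),
    "diacetyl": mean([2, 3]),
    "guaiacol": mean([6, 7]),
    "paraffin oil": mean([1, 1, 2]),
}
ranks = rank_ratings(subject)
print(ranks)
```

With the full stimulus set, each subject would contribute 68 ranks per concentration (66 odorants plus the two solvents), so that subjects who use different parts of the 1 to 7 scale become comparable.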
Detection thresholds were ranked on a per-odorant basis, such that the subject with the highest detection threshold for a particular odorant received a ranking of 1 and the subject with the lowest detection threshold for an odorant received a ranking of 391. Three measurements were used to calculate general olfactory acuity: percentage of odorants where the high concentration was rated as more intense than the low concentration, percentage of odorants where the high concentration was rated as more intense than solvent, and percentage of odorants where the low concentration was rated as more intense than the solvent. All three measurements were ranked on a per-task basis, the ranks were averaged for each subject, and finally this average was expressed as a rank among all subjects from 1 (lowest acuity) to 391 (highest acuity). Therefore, 276 different phenotypes were examined: perceived intensity and pleasantness rank of 66 odorants at two concentrations (264 phenotypes), two solvents at one concentration but included in the ranking for both high and low odorant concentration (8 phenotypes), detection threshold rank of three odorants, and overall olfactory acuity.

Sequencing sample preparation and genotyping. Genomic DNA was prepared from venous blood samples with the PAXgene Blood DNA kit (Qiagen). DNA was sheared (Covaris) and ligated to adapters necessary for both sequencing and barcoding samples using the TruSeq kit (Illumina). The OR subgenome was captured using an Agilent SureSelect Target Enrichment kit custom-designed to enrich for the open reading frame of ORs and human orthologs of 256 additional genes expressed in mouse olfactory sensory neurons (SureSelect ELID 0352781, Dataset S6) (3). Paired-end sequencing was carried out on 332 participants using an Illumina GAIIx with a read length of 2 x 75 base pairs. Each sample was individually barcoded and twelve samples were multiplexed per lane.
Sequence variants were identified using a custom-made pipeline that followed the current best practices recommended for variant detection by the Broad Institute (4, 5). Reads were aligned to the hg19 reference genome using BWA (6), PCR duplicates were removed with Picard Tools (7),
and the Genome Analysis Toolkit (GATK) (8) was used to realign reads around insertions and deletions (indels), recalibrate base quality scores, and genotype variant sites (SNPs and indels) across all 332 subjects simultaneously using variant quality score recalibration (SNPs) or standard hard filtering (indels). SNPs were phased with SHAPEIT (9) (excluding SNPs genotyped at a frequency < 95% and SNPs which deviated from Hardy-Weinberg equilibrium (p < 0.00001) (10)), and a custom-written R (11) script was used to translate the phased variant call file into 836 full-length haplotypes (418 ORs x a maternal and a paternal haplotype) for each subject.

Finally, Sanger sequencing was used to confirm the sequence of 10 ORs in at least 68 subjects. Genomic DNA was amplified with Phusion DNA Polymerase (Thermo Fisher Scientific) or EmeraldAmp PCR Master Mix (Clontech) using primers up- and downstream of the OR’s open reading frame. PCR products were purified by ultrafiltration on a vacuum manifold (NucleoFast 96 PCR, Macherey-Nagel) and sequenced (ABI 3730XL) at the University of Pennsylvania DNA Sequencing Facility.

Association Analysis. The association between olfactory phenotypes and single OR genotype was analyzed using multiple linear regression to regress the haplotype count (0, 1, or 2) of individual ORs against all 276 phenotype measurements using the R statistical package. The analysis was limited to haplotypes found at a frequency greater than 5% in our cohort, and ORs with low genotype frequency were removed (29 ORs: OR4F5, OR2T8, OR2T7, OR2T5, OR2T29, OR2T35, OR2T27, OR5AC1, OR5H8, OR2A4, OR10AC1, OR2A42, OR2A7, OR2A1, OR4F21, OR51A2, OR52Z1, OR52N1, OR5G3, OR8G1, OR11H12, OR11H2, OR4Q2, OR11H7, OR4E1, OR4F4, OR1D5, OR1D4, OR4F17). Note that the eliminated ORs did not contain any single SNPs that significantly associated with odorant perception (see below for details on SNP association analysis).
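As an illustration of the haplotype coding used in the association analysis, the following Python sketch counts copies of a haplotype per subject (0, 1, or 2) and keeps only haplotypes above a frequency cutoff. The authors used a custom R script; this sketch is not that script. The function names, the toy haplotype labels, and the choice to compute frequency over all chromosomes in the cohort are assumptions for illustration.

```python
from collections import Counter


def haplotype_counts(phased, haplotype):
    """Per-subject copy number (0, 1 or 2) of `haplotype`.

    `phased` maps a subject ID to a (maternal, paternal) pair of
    haplotype labels for one OR (hypothetical data layout).
    """
    return {
        subject: int(maternal == haplotype) + int(paternal == haplotype)
        for subject, (maternal, paternal) in phased.items()
    }


def common_haplotypes(phased, min_freq=0.05):
    """Haplotypes whose frequency among all chromosomes exceeds min_freq."""
    pool = Counter(h for pair in phased.values() for h in pair)
    total = sum(pool.values())
    return {h: n / total for h, n in pool.items() if n / total > min_freq}


# Toy cohort of four subjects for a single OR (labels are made up).
phased = {
    "s1": ("ref", "ref"),
    "s2": ("ref", "alt"),
    "s3": ("alt", "alt"),
    "s4": ("ref", "rare"),
}
print(haplotype_counts(phased, "alt"))
print(common_haplotypes(phased, min_freq=0.2))
```

The resulting counts are the additive genotype coding that the regression treats as the genetic term for each OR.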
To correct for population structure (ancestry), the first two PCs calculated from all genetic data were incorporated as covariates in the linear model (12). Principal components were calculated using the SNPRelate (13) package in R (after removing SNPs genotyped at a rate < 95% and SNPs in linkage disequilibrium > 0.5 (10)). P-values were corrected for multiple comparisons using false discovery rate (14). Although our ranked data are not normally distributed, linear regression was used to find the coefficients for each haplotype. Genotype/phenotype association was also analyzed using a Kruskal-Wallis test, which demonstrated that p-values from the parametric and non-parametric tests were significantly correlated (r2 = 0.77, p < 0.0001).

To analyze phenotype association with the genotype of single SNPs, individual SNP counts were regressed against phenotype measurements, and the first two PCs were incorporated as covariates to correct for ancestry in the study population using PLINK (15, 16). The analysis was limited to variants with minor allele frequencies greater than 5% that did not significantly deviate from Hardy-Weinberg equilibrium (p > 0.00001) (10). P-values were corrected for multiple comparisons using false discovery rate. To examine linkage disequilibrium among SNPs in OR loci, SNP genotype correlations were calculated from 1000 Genomes data (European population) using LocusZoom (17, 18).

Contribution of ancestry, age, and gender to perception. First, the correlations between olfactory phenotypes and the first two PCs of genetic variation were calculated, and p-values were corrected for multiple comparisons using false discovery rate. The total contribution of single OR genotype, ancestry, age, and gender to phenotype variability was calculated by regressing the count of individual OR haplotypes against all 276 phenotypes and incorporating the first two PCs calculated from all genetic data, age (in years), and gender as covariates (full model).
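The false discovery rate correction applied throughout these analyses can be sketched as follows, assuming the Benjamini-Hochberg step-up procedure of reference (14). The function name and the example p-values are illustrative and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), and a running
    minimum taken from the largest p-value downward keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four hypothetical genotype/phenotype tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

An association would then be reported as passing FDR when its adjusted p-value falls below the chosen threshold, such as the 0.05 line drawn in Fig. S1.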
P-values were corrected for multiple comparisons using false discovery rate. To calculate the individual contribution (r2) of single OR genotype, ancestry, age, and gender to each phenotype, the percentage of variance explained by a linear model in which each covariate was
removed was compared to the full linear model. To determine if a covariate significantly altered the linear model, a one-way analysis of variance was used to compare models with and without the covariate.

Consensus odorant receptor. The online version of MAFFT version 7 was used to create OR6Y1, OR6B2, and OR56A4 consensus proteins based on the alignment of the orthologs found in Homo sapiens, Gorilla gorilla gorilla, Pan paniscus, Pan troglodytes, Pongo abelii, Macaca mulatta, Mandrillus leucophaeus, Callithrix jacchus, Microcebus murinus, Rattus norvegicus, and Mus musculus (19, 20). The consensus genes were then designed for expression in human cells using the IDT Codon Optimization Tool and synthesized as a standard IDT gBlocks Gene Fragment.

OR cloning. OR haplotypes for functional testing in cell culture were generated by cloning the respective sequence from pooled genomic DNA, from a specific subject, or by generating polymorphic SNPs by site-directed mutagenesis using overlap extension PCR (21). The open reading frame of each OR was amplified with Phusion polymerase and cloned into the pCI vector (Promega) containing the first 20 amino acids of human rhodopsin (22). Cloned sequences were confirmed by Sanger sequencing (ABI 3730XL) at the University of Pennsylvania DNA Sequencing Facility.

Luciferase assay. The Dual-Glo Luciferase Assay System (Promega) was used to measure in vitro OR activity as described previously (23, 24). Hana3A cells were co-transfected with OR, a short form of receptor transporter protein 1 (RTP1S) (25), the type 3 muscarinic acetylcholine receptor (M3-R) (26), Renilla luciferase driven by an SV40 promoter, and firefly luciferase driven by a cyclic AMP response element. At 18-24 hours post-transfection, the transfected cells were treated with medium or serial dilutions of odorants spanning 1 nM to 1 mM in triplicate. Four hours after odorant stimulation, luciferase activity was measured using the Synergy 2 (BioTek).
Normalized luciferase activity was calculated by dividing firefly luciferase values by Renilla luciferase values for each well. Results represent mean response (for 3 wells) ± s.e.m. Responses were fit to a three-parameter sigmoidal curve. An odorant was considered an agonist if the standard error of the logEC50 was less than 1 log unit, the 95% confidence intervals for the top and bottom parameters of the curve did not overlap, and the extra sum-of-squares test confirmed that the odorant activated OR-transfected cells more than empty-vector-transfected cells. An extra sum-of-squares test was also used to determine if one model fit the data from two haplotypes better than two separate models. Data were analyzed with GraphPad Prism 6.
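The quantities in this paragraph can be sketched in Python as follows. The analysis itself was done in GraphPad Prism 6, so this is only an illustration under stated assumptions: the three-parameter curve is assumed to be the standard-slope form (Hill slope fixed at 1), the extra sum-of-squares comparison is written as the usual F ratio for nested models, and all function names and numbers are hypothetical. The sketch evaluates the model and the F statistic only; it does not fit the curve or compute a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(firefly, renilla):
    """Well-by-well ratio of firefly to Renilla luciferase signal."""
    return [f / r for f, r in zip(firefly, renilla)]


def mean_sem(values):
    """Mean and standard error of the mean across replicate wells."""
    return mean(values), stdev(values) / sqrt(len(values))


def three_param_response(log_conc, bottom, top, log_ec50):
    """Three-parameter sigmoid with the Hill slope fixed at 1 (assumed form)."""
    return bottom + (top - bottom) / (1 + 10 ** (log_ec50 - log_conc))


def extra_sum_of_squares_f(ss_simple, df_simple, ss_complex, df_complex):
    """F ratio comparing a simpler nested model with a more complex one."""
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)
    return numerator / (ss_complex / df_complex)


# Hypothetical triplicate wells at one odorant concentration.
ratios = normalize([200.0, 220.0, 180.0], [100.0, 100.0, 100.0])
avg, sem = mean_sem(ratios)
half_max = three_param_response(-5.0, bottom=0.0, top=100.0, log_ec50=-5.0)
f_ratio = extra_sum_of_squares_f(120.0, 10, 60.0, 8)
print(avg, sem, half_max, f_ratio)
```

Converting the F ratio to a p-value requires the F distribution with (df_simple - df_complex, df_complex) degrees of freedom, which a statistics package would supply.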
Fig. S1. Association between OR haplotype and odorant perception. ORs are plotted by chromosomal position against association with an olfactory phenotype (-log10 p-values): perceived intensity (circles) or pleasantness (squares) rank of 68 odorants at high (filled) and low (open) concentrations, detection threshold rank of 3 odorants (triangle), or general olfactory acuity (diamond). The gray line represents p = 0.05 following multiple comparisons correction (FDR). The black line represents p = 0.66 (following FDR), the significance cutoff of the top 50 associations which were tested in cell culture (colored points). Associations with mixtures above the black line are shown in gray (spearmint, citronella, orange, and lime), because we did not analyze mixtures as ligands in our in vitro assay. Color indicates the associated odorant. For loci where multiple ORs associate with an odorant, only the top association is labeled. Associated ORs are identified at the top of the graph. Previously published associations are shown in white (different datasets (27, 28)) or indicated by black arrows (same dataset (1, 29)).
[Fig. S1 plot. x-axis: Chromosome (1 to 22, X); y-axis: −log10(p). Reference lines: Pass FDR (p = 0.05) and Top 50 (p = 0.66). Symbol legend: Acuity, Detection, Intensity [High], Intensity [Low], Pleasantness [High], Pleasantness [Low]. Previously published associations marked: OR11H7/Isovaleric acid and OR2J3/Cis-3-hexen-1-ol. Associated ORs and odorant structures are labeled on the plot.]
Fig. S2. Genetic variation in OR clusters associates with odorant perception. Association (-log10 p-values) between SNPs in 27 different OR loci and the 38 perceptual phenotypes shown in the top right corner of each. For loci associated with multiple phenotypes, an * indicates the phenotype shown in the panel. The most highly associated SNP is shown in purple, and flanking SNPs are colored according to their linkage disequilibrium with the best-associated SNP (pairwise r2 values from 1000 Genomes EUR data (17, 18)). Recombination rates in each locus are shown in blue (right axis). ORs labeled at the top of the plot were tested in cell culture for their response to the associated odorant, and associated ORs are shown in bold. Unlabeled lines at the top of the plot are non-OR genes or ORs that were not tested in cell culture due to low correlation with the top SNP.
[Fig. S2 plots, panels a to aa. Each panel shows one OR locus. x-axis: Position on Chromosome (Mb); left y-axis: −log10(p); right y-axis: Recombination Rate (cM/Mb); color scale: r2 (0.2, 0.4, 0.6, 0.8). The associated phenotypes are listed in the top right corner of each panel and the ORs in the locus are listed at the top of each panel.]
Fig. S3. Modified OR6Y1 responds to diacetyl. (a) Comparison of human OR6Y1 to orthologous receptors from ten species. The ten species differ at only nine amino acids, and the most common amino acid for each position, highlighted in gray, was used to make a consensus receptor with nine amino acid changes from human OR6Y1. (b) Response of human and modified OR6Y1 to increasing doses of diacetyl. Y-axis values are normalized to the response of the consensus haplotype. Error bars, s.e.m. of three replicates.
[Fig. S3 panel a. Amino acid differences among OR6Y1 orthologs.]

Amino acid #            11  16  32  35  122  176  197  202  259
Consensus               R   H   L   V   V    T    S    L    T
Homo sapiens            H   R   F   I   I    M    A    M    M
Gorilla gorilla         R   H   F   I   I    T    A    L    T
Pan paniscus            R   H   F   I   I    T    A    L    T
Pan troglodytes         R   H   F   I   I    T    A    L    T
Pongo abelii            R   H   F   I   I    T    A    L    T
Macaca mulatta          R   H   L   V   V    T    S    L    T
Mandrillus leucophaeus  R   H   L   V   V    T    S    L    T
Callithrix jacchus      R   H   L   V   V    T    S    L    T
Microcebus murinus      G   H   L   V   V    T    S    L    T
Rattus norvegicus       W   H   L   V   V    T    S    L    T
Mus musculus            W   H   L   V   V    T    S    L    T

[Fig. S3 panel b. x-axis: [Diacetyl] (log M), from -6 to -3; y-axis: Normalized Luciferase Value, from -50 to 150. Series: Human OR6Y1, Consensus OR6Y1, Vector control.]
Fig. S4. Genetic ancestry correlates with olfactory perception. (a) Self-reported ancestry clustered when participants were plotted according to the first two principal components (PCs) calculated from all available genotype data for our subject cohort. (b) Correlation between the first PC of genetic variation calculated using SNPs from all targeted genes (OR and non-OR genes) and the first PC calculated using SNPs identified in non-OR genes only (r2 = 0.98, p < 0.0001). (c) Percent variance explained by PC1 for all 276 phenotypes (ordered by percent variance explained). PC1 explains greater than 4% of the variance for 6 phenotypes (labeled) (p < 0.01 following FDR). Bold labels indicate high odorant concentration, and plain labels indicate low odorant concentration. (d) Correlation between PC1 and the perceptual ranking for the pleasantness of vanillin (r2 = 0.077, p = 0.0001). (e) Percent variance explained by PC2 for all 276 phenotypes (ordered by percent variance explained). The top 5 phenotypes are labeled (r2 > 0.027, p > 0.05 following FDR). Bold labels indicate high odorant concentration, and plain labels indicate low odorant concentration. (f) Correlation between PC2 and the perceived pleasantness of spearmint (r2 = 0.042, p = 0.054).
[Fig. S4 plots, panels a to f.
(a) x-axis: Principal Component 1; y-axis: Principal Component 2. Legend (self-reported ancestry): African-American, Caucasian, Asian, Native American, Other, Do Not Wish To Specify.
(b) x-axis: Principal Component 1: All SNPs; y-axis: Principal Component 1: non-OR SNPs. Annotation: r2 = 0.98, p < 0.0001.
(c) x-axis: Phenotype; y-axis: % Variance Explained by PC1. Recovered labels: 2-Decenal Intensity, Vanillin Detection, Undecanal Intensity, Octyl acetate Pleasantness, Overall acuity. Legend: High Odorant Concentration, Low Odorant Concentration.
(d) x-axis: Principal Component 1; y-axis: Vanillin [Low] Pleasantness Rank (less pleasant to more pleasant). Annotation: r2 = 0.077, p < 0.0001.
(e) x-axis: Phenotype; y-axis: % Variation Explained by PC2. Recovered labels: Spearmint Pleasantness, Methyl salicylate Intensity, Anise Pleasantness, Undecanal Intensity, Decyl aldehyde Intensity. Legend: High Odorant Concentration, Low Odorant Concentration.
(f) x-axis: Principal Component 2; y-axis: Spearmint [High] Pleasantness Rank (less pleasant to more pleasant). Annotation: r2 = 0.042, p = 0.054.]
Table S1. Odorants used in psychophysical testing

Name                      CAS #       [Low]       [High]      Solvent
(-)-Menthol               2216-51-5   1/400       1/40        propylene glycol
(+)-menthol               15356-60-2  1/400       1/40        propylene glycol
1-butanol                 71-36-3     1/10000     1/1000      paraffin oil
2-butanone                78-93-3     1/10000     1/5000      paraffin oil
2-decenal                 3913-71-1   1/10000     1/1000      paraffin oil
2-ethylfenchol            18368-91-7  1/100000    1/5000      paraffin oil
2-methoxy-4-methylphenol  93-51-6     1/1000000   1/100000    paraffin oil
4-methylvaleric acid      646-07-1    1/5000000   1/10000     paraffin oil
ambrette                  8015-62-1   1/1000000   1/1000      paraffin oil
androstadienone           794-58-9    1/100000    1/1000      propylene glycol
androstenone              18339-16-7  1/100000    1/1000      propylene glycol
anise                     8007-70-3   1/50000     1/5000      paraffin oil
banana                                1/250000    1/10000     paraffin oil
bourgeonal                18127-01-0  1/2000      1/200       paraffin oil
butyl acetate             123-86-4    1/1000000   1/1000      paraffin oil
butyric acid              107-92-6    1/1000000   1/250000    paraffin oil
caproic acid              142-62-1    1/1000000   1/250000    paraffin oil
cedarwood                 68990-83-0  1/5000      1/2000      paraffin oil
cineole                   470-82-6    1/100000    1/1000      paraffin oil
cinnamon                  8015-91-6   1/50000     1/10000     paraffin oil
cis-3-hexen-1-ol          928-96-1    1/250000    1/100000    paraffin oil
citral                    5392-40-5   1/50000     1/5000      paraffin oil
citronella                8000-29-1   1/250000    1/10000     paraffin oil
decyl aldehyde            112-31-2    1/25000     1/5000      paraffin oil
diacetyl                  431-03-8    1/10000000  1/10000     paraffin oil
diallyl sulfide           592-88-1    1/2000000   1/100000    paraffin oil
diphenyl ether            101-84-8    1/500       1/200       paraffin oil
ethyl vanillin            121-32-4    1/1000      1/200       propylene glycol
ethylene brassylate       105-95-3    1/500       1/100       paraffin oil
eugenol                   97-53-0     1/25000     1/1000      paraffin oil
eugenol acetate           93-28-7     1/1000000   1/100       paraffin oil
eugenol methyl ether      93-15-2     1/500       1/10        paraffin oil
fenchone                  7787-20-4   1/25000     1/1000      paraffin oil
fir                       8002-09-3   1/100000    1/10000     paraffin oil
galaxolide                1222-05-5   1/10        1/1000      paraffin oil
geranyl acetate           105-87-3    1/10000     1/200       paraffin oil
guaiacol                  90-05-1     1/50000000  1/1000000   paraffin oil
heptaldehyde              111-71-7    1/10000000  1/25000     paraffin oil
heptyl acetate            112-06-1    1/25000     1/2500      paraffin oil
hexyl butyrate            2639-63-6   1/1000      1/100       paraffin oil
isobornyl acetate         125-12-2    1/2000000   1/100       paraffin oil
isobutyraldehyde          78-84-2     1/100000    1/1000      paraffin oil
isobutyric acid           79-31-2     1/10000     1/1000      paraffin oil
isoeugenol                97-54-1     1/25000     1/2000      paraffin oil
isovaleric acid           503-74-2    1/2000000   1/20000     paraffin oil
jasmine                   8022-96-6   1/200000    1/1000      paraffin oil
lime                      8008-26-2   1/2000000   1/5000      paraffin oil
linalool                  78-70-6     1/100000    1/100       paraffin oil
methanethiol              5188-07-8   1/50000000  1/10000000  water
methyl salicylate         119-36-8    1/25000     1/1000      paraffin oil
nonyl aldehyde            124-19-6    1/100000    1/5000      paraffin oil
nutmeg                    8008-45-5   1/25000     1/1000      paraffin oil
octyl acetate             112-14-1    1/200       1/100       paraffin oil
octyl aldehyde            124-13-0    1/250000    1/25000     paraffin oil
orange                    8008-57-9   1/2500      1/100       paraffin oil
paraffin oil              8012-95-1
pentadecalactone          106-02-5    1/2000      1/500       propylene glycol
phenyl acetaldehyde       122-78-1    1/8000000   1/2000000   paraffin oil
propylene glycol          57-55-6
pyrazine                  290-37-9    1/500       1/10        propylene glycol
r-carvone                 6485-40-1   1/100000    1/1000      paraffin oil
r-limonene                5989-27-5   1/250       1/10        paraffin oil
sandalwood                8006-87-9   1/10000     1/1000      paraffin oil
spearmint                 8008-79-5   1/250000    1/250000    paraffin oil
terpineol                 98-55-5     1/10000     1/100       paraffin oil
terpinyl acetate          80-26-2     1/1000      1/500       paraffin oil
undecanal                 112-44-7    1/10000     1/1000      paraffin oil
vanillin                  121-33-5    1/1000      1/200       propylene glycol
Additional Datasets (separate files)

Dataset S1. Association between OR haplotypes and olfactory phenotypes
Dataset S2. Association between individual SNPs and olfactory phenotypes
Dataset S3. OR haplotypes tested in cell culture
Dataset S4. Percent variance explained by genetic and demographic factors
Dataset S5. Association between PC1 and PC2 and olfactory phenotypes
Dataset S6. Genome locations targeted in sequencing

References

1. Keller A, Zhuang H, Chi Q, Vosshall LB, Matsunami H (2007) Genetic variation in a human odorant receptor alters odour perception. Nature 449(7161):468–72.
2. Keller A, Hempstead M, Gomez IA, Gilbert AN, Vosshall LB (2012) An olfactory demography of a diverse metropolitan population. BMC Neurosci 13:122.
3. Mainland JD, Willer JR, Matsunami H, Katsanis N (2013) Next-Generation Sequencing of the Human Olfactory Receptors. Olfactory Recept Methods Protoc 1003:133–147.
4. Auwera GA Van Der, et al. (2014) From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics, eds Bateman A, Pearson WR, Stein LD, Stormo GD, Yates JR doi:10.1002/0471250953.bi1110s43.
5. DePristo MA, et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43(5):491–8.
6. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760.
7. Broad Institute Picard Tools 1.119. Available at: http://broadinstitute.github.io/picard.
8. McKenna A, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–303.
9. Delaneau O, Marchini J, Zagury J-F (2012) A linear complexity phasing method for thousands of genomes. Nat Methods 9(2):179–81.
10. Anderson CA, et al. (2010) Data quality control in genetic case-control association studies. Nat Protoc 5(9):1564–73.
11. R Development Core Team (2008) R: A language and environment for statistical computing. R Found Stat Comput Vienna, Austria.
12. Price AL, Zaitlen NA, Reich D, Patterson N (2010) New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11(7):459–463.
13. Zheng X, et al. (2012) A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28(24):3326–3328.
14. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300.
15. Purcell S, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–75.
16. Purcell S PLINK 1.07.
17. Pruim RJ, et al. (2011) LocusZoom: Regional visualization of genome-wide association scan results. Bioinformatics 27(13):2336–2337.
18. Abecasis GR, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319):1061–73.
19. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol 30(4):772–780.
20. Katoh K MAFFT version 7.
21. Heckman KL, Pease LR (2007) Gene splicing and mutagenesis by PCR-driven overlap extension. Nat Protoc 2(4):924–32.
22. Krautwurst D, Yau K, Reed RR, Hughes H (1998) Identification of Ligands for Olfactory Receptors by Functional Expression of a Receptor library. Cell 95:917–926.
23. Zhuang H, Matsunami H (2008) Evaluating cell-surface expression and measuring activation of mammalian odorant receptors in heterologous cells. Nat Protoc 3(9):1402–13.
24. Trimmer C, Snyder LL, Mainland JD (2014) High-throughput analysis of mammalian olfactory receptors: measurement of receptor activation via luciferase activity. J Vis Exp (88):1–10.
25. Zhuang H, Matsunami H (2007) Synergism of accessory factors in functional expression of mammalian odorant receptors. J Biol Chem 282(20):15284–93.
26. Li YR, Matsunami H (2011) Activation state of the M3 muscarinic acetylcholine receptor modulates mammalian odorant receptor signaling. Sci Signal 4(155):ra1.
27. Menashe I, et al. (2007) Genetic elucidation of human hyperosmia to isovaleric acid. PLoS Biol 5(11):e284.
28. McRae JF, et al. (2012) Genetic variation in the odorant receptor OR2J3 is associated with the ability to detect the “grassy” smelling odor, cis-3-hexen-1-ol. Chem Senses 37(7):585–93.
29. Mainland JD, et al. (2014) The missense of smell: functional variability in the human odorant receptor repertoire. Nat Neurosci 17(1):114–20.