SNP molecular function, evolution and disease
Md Imtiyaz Hassan, Ph.D
Effect on molecular function
Phenotype
Natural selection
Medical Genetics
Structural Biology Biochemistry
Evolutionary Genetics
Predicting the effect of mutations in proteins
Why is this useful?
Understanding variation in molecular function and structure
Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
Linkage analysis
Rare
Classical association studies
Control
Disease
Common
Quantitative trait
Mendelists Biometricians
Forces to maintain variation:
Selection
Mutation
Common disease / Common variant
Trade-off (antagonistic pleiotropy)
Balancing selection
Recent positive selection
Reverse in direction of selection
Examples
APOE: Alzheimer’s disease
AGT: Hypertension
CYP3A: Hypertension
CAPN10: Type 2 diabetes
Individual human genome is a target for deleterious mutations!
~40% of human Mendelian diseases are due to hypermutable sites
Frequency of deleterious variants is directly proportional to mutation rate (q = μ/s)
Multiple mostly rare variants
Many deleterious alleles in mutation-selection balance
Examples
Plasma level of HDL-C
Plasma level of LDL-C
Colorectal adenomas
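The proportionality between the frequency of deleterious variants and the mutation rate can be checked numerically. The sketch below assumes the simplest haploid textbook model, in which the deleterious allele arises at rate mu per generation and carriers have fitness 1 - s; the function names and the example values of mu and s are illustrative and not taken from the slides.

```python
def equilibrium_frequency(mu, s):
    """Mutation-selection balance under the assumed haploid model: q = mu / s."""
    if not 0 < s <= 1:
        raise ValueError("s must be in (0, 1]")
    return mu / s


def iterate_frequency(mu, s, generations=5000, q0=0.0):
    """Iterate selection followed by mutation and return the final frequency."""
    q = q0
    for _ in range(generations):
        # Selection: carriers of the deleterious allele have fitness 1 - s.
        q = q * (1 - s) / (1 - q * s)
        # Mutation: a fraction mu of wild-type alleles becomes deleterious.
        q = q + mu * (1 - q)
    return q


if __name__ == "__main__":
    mu, s = 1e-6, 0.01          # hypothetical values
    print(equilibrium_frequency(mu, s))
    print(iterate_frequency(mu, s))
```

Doubling mu doubles the equilibrium frequency, which is why hypermutable sites are expected to contribute disproportionately to the deleterious variants seen in a population.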
Harmful mutations
Function: damaging
Evolution: deleterious
Phenotype: detrimental
Advantageous pseudogenization (Zhang et al. 2006)
Gain of function disease mutations
Sickle Cell Anemia
N E L V T L T C L A R G F S - P K D V L V R W L
R E S A T I T C L V T G F S - P A D V F V Q W M
G G S L R L S C V A S G I T - F S G Y D M Q W V
T P G L T L T C T V S G F S - L S S Y D M G W V
G Q K A K M R C I P E - - - - K G H P V V F W Y
G Q E A T L W C E P I - - - - S G H S A V F W Y
G Q Q V T L S C F P I - - - - S G H L S L Y W Y
R K D V S L T C L V V G F N - P G D I S V E W T
G Q K L T L K C Q Q N - - - - F N H D T M Y W Y
R D K A T F T C F V V G S D - L K D A H L T W E
S K S A T L T C R V S N M V N A D G L E V S W W
G A R T S L N C T F S D - - - S A S Q Y F W W Y
G A S L Q L R C K Y S Y - - - S A T P Y L F W Y
N G A P K L T C L V V D L E S E K N V N V T W N
E A T V T L T C V V S N - - A P Y G V N V S W T
Profile
Ala   -1.2   1.1  -0.6  -0.8   0.3   ...   ...
Arg    0.6  -0.3  -0.3  -0.5   0.6   ...   ...
Asn   -1.1  -0.5  -0.5  -0.7   0.4   ...   ...
Asp   -0.9  -0.3  -0.3  -0.5   0.6   ...   ...
Cys    0.4  -0.5   0.6   0.8  -0.3   ...   ...
Gln    ...   ...   ...   ...   ...   ...   ...
...    ...   ...   ...   ...   ...   ...   ...
protein
multiple alignment
profile
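The protein, multiple alignment, profile pipeline can be illustrated with a small position-specific scoring sketch. This is not PolyPhen's actual profile method: it assumes a uniform background frequency of 1/20, a flat pseudocount, no sequence weighting, and it ignores gaps. The function names are hypothetical, and the four example rows are the start of the alignment shown earlier, read as 24-column rows.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per column mapping residue -> log-odds score.

    Assumptions: uniform background (1/20) and gaps ('-') are skipped.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def substitution_score(profile, position, wild_type, variant):
    """Score difference; a large positive value means the variant residue is
    much less typical for this column than the wild-type residue."""
    return profile[position][wild_type] - profile[position][variant]


rows = [
    "NELVTLTCLARGFS-PKDVLVRWL",
    "RESATITCLVTGFS-PADVFVQWM",
    "GGSLRLSCVASGIT-FSGYDMQWV",
    "TPGLTLTCTVSGFS-LSSYDMGWV",
]
profile = build_profile(rows)
# Column index 7 is the conserved Cys; replacing it scores as unusual.
print(round(substitution_score(profile, 7, "C", "A"), 2))  # 1.61
```

A real predictor would combine such a score with structural and annotation features; the sketch only shows why a substitution at a conserved column stands out.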
PolyPhen
Prediction rate of damaging substitutions
                     possibly   probably
Disease mutations       82%        57%
Divergence               9%         3%
Polymorphism            27%        15%
10% of PolyPhen false-positives are due to compensatory substitutions
Neutral mutation model
Human       ACCTTGCAAAT
Chimpanzee  ACCTTACAAAT
Baboon      ACCTTACAAAT
Prob(TAC->TGC), Prob(TGC->TAC)
Prob(XY₁Z->XY₂Z): 64 x 3 matrix
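A context-dependent table of this kind can be filled from three-species alignments. The sketch below is one possible way to do it and is an assumption, not the method used in the slides: it takes the baboon as outgroup, treats the outgroup base as ancestral (parsimony), skips sites where both human and chimpanzee differ from the outgroup, and requires the flanking bases to be identical in all three species. Sequences are assumed to contain only A, C, G and T.

```python
from collections import defaultdict

BASES = "ACGT"


def count_context_substitutions(human, chimp, outgroup):
    """Count substitutions XY1Z -> XY2Z polarised by the outgroup."""
    subs = defaultdict(int)      # (ancestral triplet, derived base) -> count
    contexts = defaultdict(int)  # ancestral triplet -> usable sites
    for i in range(1, len(human) - 1):
        flanks_agree = all(
            human[j] == chimp[j] == outgroup[j] for j in (i - 1, i + 1)
        )
        if not flanks_agree:
            continue
        h, c, o = human[i], chimp[i], outgroup[i]
        if h != o and c != o:
            continue  # direction of change is ambiguous
        triplet = human[i - 1] + o + human[i + 1]
        contexts[triplet] += 1
        for derived in (h, c):
            if derived != o:
                subs[(triplet, derived)] += 1
    return subs, contexts


def substitution_matrix(subs, contexts):
    """64 x 3 table: for every triplet, the observed fraction of sites at
    which the middle base changed to each of the three other bases."""
    matrix = {}
    for x in BASES:
        for y in BASES:
            for z in BASES:
                triplet = x + y + z
                n = contexts.get(triplet, 0)
                matrix[triplet] = {
                    b: (subs.get((triplet, b), 0) / n if n else 0.0)
                    for b in BASES if b != y
                }
    return matrix


subs, contexts = count_context_substitutions(
    "ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT"
)
print(dict(subs))  # {('TAC', 'G'): 1}
```

With the three sequences from the slide, the single difference is assigned to the human lineage as TAC->TGC. Real estimates need many aligned sites per context; the fractions from this toy input are not meaningful rates.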
Strongly detrimental mutations
Effectively neutral mutations
Mildly deleterious mutations
54 genes, 757 individuals
inflammatory response
236 genes, 46-47 individuals
DNA repair and cell cycle pathways
518 genes, 90-95 individuals
Fitness and selection coefficient
Wild type: N1 = 4, fitness = 1
New mutation: N2 = 3, fitness = N2/N1 = 1 – s
s: selection coefficient
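As a worked example, the numbers on this slide can be turned into a selection coefficient. The sketch assumes that N1 and N2 are offspring counts for the wild type and the new mutation, and that the relative fitness of the mutant is N2/N1 = 1 - s; the function name is illustrative.

```python
from fractions import Fraction


def selection_coefficient(n_wild_type, n_mutant):
    """Return s such that the mutant's relative fitness N2/N1 equals 1 - s."""
    relative_fitness = Fraction(n_mutant, n_wild_type)
    return 1 - relative_fitness


s = selection_coefficient(4, 3)
print(s, float(s))  # 1/4 0.25
```

With N1 = 4 and N2 = 3 the mutant's relative fitness is 3/4, so s = 0.25.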
Genetic polymorphism
•Genetic Polymorphism: A difference in DNA sequence among individuals, groups, or populations.
•Genetic Mutation: A change in the nucleotide sequence of a DNA molecule.
Genetic mutations are a kind of genetic polymorphism.
Single nucleotide polymorphism (point mutation)
Repeat heterogeneity
Genetic Variation
SNP Single Nucleotide Polymorphisms
•A Single Nucleotide Polymorphism is a source of variance in a genome.
•A SNP ("snip") is a single base mutation in DNA.
•SNPs are the simplest form and most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).
•There are two types of nucleotide base substitutions resulting in SNPs:
–Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.
–Transversion: substitution between a purine and a pyrimidine.
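The two substitution classes can be told apart with a few lines of code. This is a minimal sketch; the function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different DNA bases")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("C", "T"))  # transition
print(classify_substitution("A", "C"))  # transversion
```

Only 4 of the 12 possible ordered base changes are transitions, so their two-thirds share of observed SNPs reflects a mutational bias rather than the number of possible changes.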
SNP
Instead of using restriction enzymes, these are found by direct sequencing.
They are extremely useful for mapping.
Markers
Classical Mendelian   100
RFLPs                 7000
SNPs                  1.4 x 10^6
-----------------------ACGGCTAA
-----------------------ATGGCTAA
SNPs occur every 300-1000 bp along the 3-billion-bp human genome
Many SNPs have no effect on cell function
Human Genome and SNPs
• Human genome is (mostly) sequenced; attention is turning to the evaluation of variation
• Alterations in DNA involving a single base pair are called single nucleotide polymorphisms, or SNPs
• Map of ~1.4 million SNPs (Feb 2001)
• It is estimated that ~60,000 SNPs occur within exons
Goals of SNP Initiatives
• Immediate goals:
– Detection/identification of all SNPs estimated to be present in the human genome
– Interest also in other organisms, e.g. potatoes(!)
– Establishment of SNP Database(s)
SNPs
Humans are genetically >99 per cent identical: it is the tiny percentage that is different
Much of our genetic variation is caused by single-nucleotide differences in our DNA; these are called single nucleotide polymorphisms, or SNPs. As a result, each of us has a unique genotype that typically differs in about three million nucleotides from every other person.
SNPs occur about once every 300-1000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population.
Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences".
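The spacing and genome size quoted on these slides imply a rough count of SNP sites. The sketch below is back-of-envelope arithmetic only and assumes SNPs are spread evenly over a 3-billion-bp genome; the function name is illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000


def expected_snp_count(spacing_bp, genome_size_bp=GENOME_SIZE_BP):
    """Number of SNP sites if one occurs every `spacing_bp` base pairs."""
    return genome_size_bp // spacing_bp


low = expected_snp_count(1000)
high = expected_snp_count(300)
print(low, high)  # 3000000 10000000
```

The lower bound of about three million sites is of the same order as the number of nucleotide differences between two people quoted above.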
Longer term goals: Areas of SNP Application
• Gene discovery and mapping
• Association-based candidate polymorphism testing
• Diagnostics/risk profiling
• Response prediction
• Homogeneity testing/study design
• Gene function identification etc.
Polymorphism
• Technical definition: most common variant (allele) occurs with less than 99% frequency in the population
• Also used as a general term for variation
• Many types of DNA polymorphisms, including RFLPs, VNTRs, micro-satellites
• ‘Highly polymorphic’ = many variants
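The technical definition can be applied directly to a list of observed alleles at one site. This is a minimal sketch; the function name and the sample data are hypothetical.

```python
from collections import Counter


def is_polymorphic(alleles, threshold=0.99):
    """True if the most common allele has a frequency below `threshold`."""
    if not alleles:
        raise ValueError("at least one observed allele is required")
    counts = Counter(alleles)
    major_frequency = max(counts.values()) / len(alleles)
    return major_frequency < threshold


# 200 sampled chromosomes: the minor allele T is seen 3 times (1.5%).
site = ["C"] * 197 + ["T"] * 3
print(is_polymorphic(site))  # True
```

With a single T among 200 chromosomes the major allele frequency would be 99.5%, and the site would not count as polymorphic under this definition.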
SNPs in Genetic Analysis
• Abundance – lots
• Position – throughout genome
• Haplotype patterns – groups of SNPs may provide exploitable diversity
• Rapid and efficient to genotype
• Increased stability over other types of mutation
• Recombination patterns – e.g. ‘hot spots’
Coding Region SNPs
Occasionally, a SNP may actually cause a disease.
SNPs within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein.
•Types of coding region SNPs
–Synonymous: the substitution causes no amino acid change to the protein it produces. This is also called a silent mutation.
–Non-Synonymous: the substitution results in an alteration of the encoded amino acid. A missense mutation changes the codon so that it encodes a different amino acid. A nonsense mutation changes the codon to a stop codon, causing premature termination.
–One half of all coding sequence SNPs result in non-synonymous codon changes.
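The classification of coding SNPs can be sketched with the standard genetic code. The function name is illustrative, and the extra "stop-lost" label is an addition for completeness that the slides do not mention.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of a DNA codon."""
    codon = codon.upper()
    mutated = codon[:position] + alt_base.upper() + codon[position + 1:]
    if mutated == codon:
        raise ValueError("alt_base must differ from the reference base")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-lost"  # not named on the slides
    return "missense"


print(classify_coding_snp("GAG", 1, "T"))  # missense (Glu -> Val)
print(classify_coding_snp("GAA", 2, "G"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("TGG", 2, "A"))  # nonsense (Trp -> stop)
```

The first call is the GAG to GTG change that replaces glutamate with valine, the substitution behind sickle cell anemia.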
Intergenic SNPs
• Researchers have found that most SNPs are not responsible for a disease state because they are intergenic SNPs
• Instead, they serve as biological markers for pinpointing a disease on the human genome map, because they are usually located near a gene found to be associated with a certain disease.
• Scientists have long known that diseases caused by single genes and inherited according to the laws of Mendel are actually rare.
• Most common diseases, like diabetes, are caused by multiple genes. Finding all of these genes is a difficult task.
• Recently, there has been focus on the idea that all of the genes involved can be traced by using SNPs.
• By comparing the SNP patterns in affected and non-affected individuals—patients with diabetes and healthy controls, for example—scientists can catalog the specific DNA variations that underlie susceptibility for diabetes
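Comparing SNP patterns between affected and non-affected individuals usually comes down to a test on allele counts. The sketch below uses a plain Pearson chi-square test on a 2 x 2 table as one common choice, not necessarily the one the author has in mind; the function name and the counts are hypothetical.

```python
def allele_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom) for a 2 x 2 table.

    Each argument is (count of allele 1, count of allele 2).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical allele counts at one SNP: 100 chromosomes per group.
chi2 = allele_chi_square(case_counts=(60, 40), control_counts=(40, 60))
print(chi2)  # 8.0
```

A statistic of 8.0 exceeds 3.84, the 5% critical value for one degree of freedom. When many SNPs are tested, the threshold has to be made stricter to account for multiple testing.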
Polymorphic Sites Revealed in Sequencing
Medium- and Low-throughput SNP Genotyping
I. SNP Discovery and validation.
A. Data base mining, “resequencing” on microarrays, de novo sequencing of EST libraries.
B. Genotyping of pooled samples for determining heterozygosity.
II. How many SNPs are to be typed in how many samples?
A. What degree of multiplexing is possible for the “before-typing” PCR reactions?
B. What degree of multiplexing is possible for the genotyping reactions?
III. What is the appropriate platform given the size of the project, the budget and the degree of automation desired?
July 2003 NCBI build 34
Red = at least 1 SNP per 100 kb
Black = Gaps in genome coverage
• 92% of genome within 100 kb of a SNP
• 83% of genome within 50 kb of a SNP
• 50% of genome within 15 kb of a SNP
• 25% of genome within 5 kb of a SNP
Mapping 100K Coverage: 116,204 SNPs
Chemistry/Demultiplexing/Detection Options in SNP Genotyping
Columns: Enzyme Chemistry | Demultiplexing | Detection Method | Platform/Company
Allele-Specific Hybridization
Allele-Specific Extend + Ligate
Allele-Specific PCR
Sequenom iPlex™
Mass Spec.
“DASH”, Amplicon Tm
Fluorescence Resonance Energy Transfer (FRET)
Luminex 100 Flow Cytometry
Single Nucleotide Primer Extension
Oligonucleotide Ligation Assay
Capillary Electrophoresis
Homogeneous
Semi-Homogeneous
Fluorescence
Solid-phase microarray
Solid-phase microspheres
Mass Spectrometry
ABI SNPlex™
ABI SNaPshot™
Fluorescence Polarization
Microarray Minisequencing
Perkin-Elmer FP-TDI
ABI Taqman™
5’-Nuclease
Illumina BeadArray™
Enzymatic Options in SNP Genotyping
Schematic panels: single base primer extension (“minisequencing”); allele-specific primer extension; allele-specific primer extension and ligation; allele-specific hybridization; PCR only: Tm-shift primers (short GC, long GC).
Schematic labels: SBE primer; LSO; probes; ddC-biot or ddA-biot; ddA-biot, dATP, dTTP, dGTP.
SNP Genotyping on Beads/Microarrays
Selection of SNPs
Design of PCR and “Tag” SBE/ASPE primers
Preparation of beads with “Anti-Tag” primers
Multiplex PCR
Cyclic SBE/ASPE with biot(fluor.)-ddNTP/dNTP
Capture of products on beads
Signal measurement in flow cytometer/scanner
Pastinen, et al., Gen. Res. 7, 606, 1997
Single Base Extension (SBE) of Targets on Microarrays
SBE (Minisequencing) of Target DNA with Glass-immobilized Primers
Allele-Specific Extension & Identification in CE: “Minisequencing” (ABI SNaPshot™)
dR6G
dR110
Degree of Multiplexing Depends on Resolution in CE
ABI SNaPshot® on 3130xl
Gen. Res. 9: 492, 1999
Fluorescence Polarization
SBE (Minisequencing) with Detection by Fluorescence Polarization
PCR Amplification
Single Base Extension
SAP Treatment
MALDI-TOF Mass Spec
Spot on 384-place Chips
Genotyping by SBE and Mass Spectrometry
Allele-specific Primer Extension (ASPE) with Chain Termination
Use of Allele-specific Probes in Genotyping by Melting Curve Analysis: “DASH”
One base mismatch Matched
Heterozygote
Nature Biotech. 17: 87, 1999
Intercalating dye
Wang, et al., Biotechniques 39: 885, 2005
Use of Modified Tm-shifting Primers in Genotyping
Bead Arrays: DNA immobilized on silica or polystyrene beads, random array requires decoding steps.
1) Lynx (www.lynxgen.com). In rows. Limited to ca. 20 bases/read.
2) Illumina BeadChip (www.illumina.com). In etched microwells.
3) Luminex coded microspheres (luminexcorp.com). Measurements by flow cytometry.
4) 454 LifeSciences (www.454.com). Clonal amplification and sequencing on 28 µm beads. Minimum 100 bases/read.
Bead Technologies for SNP Genotyping/Gene Expression and Massively Parallel Sequencing (not currently supported in CIF)
Lynx/Solexa Bead Arrays for Gene Expression and MPSS
Clones on Beads
Brenner et al., PNAS 97: 1665, 2000, and Nature Biotech. 18: 630, 2000
Separate loaded from unloaded beads (FACS), ligate to anti-tag.
1.8 x 10^15 unique tags
Competitively hybridize beads with labeled libraries, then sort by FACS, OR…
Sequence signatures with type IIs res. enz. & labeled, encoded adaptors.
Expression profiling with Illumina BeadChips in Microwells
Gen. Res. 14: 870 & 2347, 2004
Total setup costs, satellite facility <$6000.
HumanRef-8: 24k probes, $100/sample, $50 labeling.
Random loading of beads in etched 3 µm microwells
Decoding by sequential hybridization: 11012202. 3^8 = 6561 codes. (4^8 = 65,536)
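The size of the code space follows from the number of hybridization stages and the number of distinguishable states per stage. The sketch below assumes that a signature such as 11012202 is a string of eight digits, one per stage, each taking one of three states; the function names are illustrative.

```python
def code_space(states, stages):
    """Number of distinct bead codes: states ** stages."""
    return states ** stages


def decode_signature(signature, states=3):
    """Map a signature string to a bead index by reading it as a number
    written in base `states`."""
    return int(signature, states)


print(code_space(3, 8))              # 6561
print(code_space(4, 8))              # 65536
print(decode_signature("11012202"))  # 3071
```

Adding a fourth state per stage raises the number of addressable bead types about tenfold without adding hybridization stages.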
5’3’
Illumina Allele Specific Primer Extension (ASPE) and Ligation
ASOs and LSOs
Cy3 and Cy5-labeled universal primers
Luminex coded microspheres and multiplexed assays
Green laser: Up to 100 different transcripts can be monitored simultaneously in high-throughput by flow cytometry, e.g., with “PR” genes in Arabidopsis, Gen. Res. 11: 1888, 2001 and 217 miRNAs in human cancers, Nature 435: 834, 2005.
Red laser: Coding is in ratio of red and orange fluorescence inside microsphere.
SNP Genotyping Costs by Platform
Platform | #SNPs/sample | # samples | $Oligo Set/$SNP | $Mix/SNP | $ per SNP | Min $
Illumina (UCLA) | 1536 | 488 | | | 0.09 | 69,892
AB SNPlex (ABI 3730) | 48 | 5000; 500 | 72/0.0144; 72/0.0144 | 0.04; 0.20 | 0.078; 0.214 | 14,840
AB SNaPshot (ABI 3100) | 50 | 500 | 50/0.10 | 0.476 | 0.576 | 14,400
AB Taqman (ABI 7700) | 1 | 750 | 310/0.413 | 0.75 | 1.21 | 910
Allele-specific PCR | 50 | 5000; 500 | 17.60/0.0035; 17.60/0.035 | 0.422 | 0.43 |
S.-H. Lee et al., Theor. Appl. Genet. 110:167, 2004