SNP molecular function, evolution and disease Md Imtiyaz Hassan, Ph.D


[Diagram: effect on molecular function, phenotype and natural selection, linking structural biology and biochemistry, medical genetics, and evolutionary genetics.]

Predicting the effect of mutations in proteins

Why is this useful?

Understanding variation in molecular function and structure

Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection

[Diagram: study designs by allele frequency: linkage analysis (rare variants); classical association studies, disease vs. control (common variants); quantitative traits; Mendelists vs. Biometricians.]

Forces to maintain variation:

Selection

Mutation

Common disease / Common variant

Trade-off (antagonistic pleiotropy); balancing selection; recent positive selection; reversal in the direction of selection

Examples: APOE (Alzheimer’s disease); AGT (hypertension); CYP3A (hypertension); CAPN10 (type 2 diabetes)

Each individual human genome is a target for deleterious mutations!

~40% of human Mendelian diseases are due to hypermutable sites

Frequency of deleterious variants is directly proportional to the mutation rate and inversely proportional to the selection coefficient: $q \approx \mu/s$
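The balance relation can be explored numerically. The sketch below is a minimal illustration, assuming an allele whose harm is expressed in heterozygotes (so q ≈ μ/s); for a fully recessive allele the standard approximation is q ≈ √(μ/s). The function name `equilibrium_frequency` and the values of μ and s are hypothetical, chosen only to show the proportionality.

```python
import math


def equilibrium_frequency(mu: float, s: float, recessive: bool = False) -> float:
    """Approximate equilibrium frequency q of a deleterious allele.

    mu: mutation rate toward the deleterious allele, per generation
    s:  selection coefficient against affected carriers, 0 < s <= 1
    Uses q ~ mu / s when the allele's harm is expressed in heterozygotes,
    and q ~ sqrt(mu / s) for a fully recessive allele.
    """
    if not 0 < s <= 1:
        raise ValueError("s must lie in (0, 1]")
    if mu < 0:
        raise ValueError("mu must be non-negative")
    ratio = mu / s
    return math.sqrt(ratio) if recessive else ratio


# Hypothetical values: doubling mu doubles q; doubling s halves it.
mu, s = 1e-6, 0.01
print(equilibrium_frequency(mu, s))
print(equilibrium_frequency(2 * mu, s))
print(equilibrium_frequency(mu, 2 * s))
```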

Multiple mostly rare variants

Many deleterious alleles in mutation-selection balance

Examples

Plasma level of HDL-C; plasma level of LDL-C; colorectal adenomas

Harmful mutations

Function: damaging

Evolution: deleterious

Phenotype: detrimental

Advantageous pseudogenization (Zhang et al. 2006)

Gain of function disease mutations

Sickle Cell Anemia

Example multiple alignment (15 sequences, 24 columns):

NELVTLTCLARGFS-PKDVLVRWL
RESATITCLVTGFS-PADVFVQWM
GGSLRLSCVASGIT-FSGYDMQWV
TPGLTLTCTVSGFS-LSSYDMGWV
GQKAKMRCIPE----KGHPVVFWY
GQEATLWCEPI----SGHSAVFWY
GQQVTLSCFPI----SGHLSLYWY
RKDVSLTCLVVGFN-PGDISVEWT
GQKLTLKCQQN----FNHDTMYWY
RDKATFTCFVVGSD-LKDAHLTWE
SKSATLTCRVSNMVNADGLEVSWW
GARTSLNCTFSD---SASQYFWWY
GASLQLRCKYSY---SATPYLFWY
NGAPKLTCLVVDLESEKNVNVTWN
EATVTLTCVVSN--APYGVNVSWT

Profile (score of each amino acid at each alignment position; excerpt):

Ala  -1.2   1.1  -0.6  -0.8   0.3  ... ...
Arg   0.6  -0.3  -0.3  -0.5   0.6  ... ...
Asn  -1.1  -0.5  -0.5  -0.7   0.4  ... ...
Asp  -0.9  -0.3  -0.3  -0.5   0.6  ... ...
Cys   0.4  -0.5   0.6   0.8  -0.3  ... ...
Gln  ...  ...  ...  ...  ...  ... ...
...  ...  ...  ...  ...  ...  ... ...

protein → multiple alignment → profile → PolyPhen
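The protein → alignment → profile → prediction idea can be illustrated with a toy position-specific profile. This is a minimal sketch, not PolyPhen's method: PolyPhen uses PSIC profile scores together with structural information, whereas the code below assumes a plain log-odds profile with a pseudocount of 1 and a uniform background frequency of 1/20. All function names are hypothetical; the five rows are taken from the example alignment.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column: str, pseudocount: float = 1.0) -> dict[str, float]:
    """Log-odds score of each amino acid in one alignment column.

    Gaps are ignored; a uniform background frequency of 1/20 is assumed.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    background = 1 / len(AMINO_ACIDS)
    return {
        aa: math.log((counts[aa] + pseudocount) / total / background)
        for aa in AMINO_ACIDS
    }


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """One score dictionary per column of an equal-length alignment."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have the same length")
    return [column_scores("".join(row[i] for row in alignment)) for i in range(width)]


def substitution_score(profile, position: int, wild: str, mutant: str) -> float:
    """Profile score difference (wild type minus mutant) at a 0-based column.

    Large positive values flag a mutant residue that is rare at a position
    where the wild-type residue is common.
    """
    column = profile[position]
    return column[wild] - column[mutant]


rows = [
    "NELVTLTCLARGFS-PKDVLVRWL",
    "RESATITCLVTGFS-PADVFVQWM",
    "GGSLRLSCVASGIT-FSGYDMQWV",
    "GQKAKMRCIPE----KGHPVVFWY",
    "SKSATLTCRVSNMVNADGLEVSWW",
]
profile = build_profile(rows)
print(round(substitution_score(profile, 7, "C", "Y"), 2))  # 1.79: conserved Cys column
print(round(substitution_score(profile, 0, "N", "G"), 2))  # -0.41: variable first column
```

A conserved column yields a large score difference for any substitution, while a variable column tolerates substitutions toward residues already seen there.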

PolyPhen: prediction rate of damaging substitutions

Disease mutations: 82% possibly damaging, 57% probably damaging

Divergence: 9% possibly damaging, 3% probably damaging

Polymorphism: 27% possibly damaging, 15% probably damaging

10% of PolyPhen false-positives are due to compensatory substitutions

Neutral mutation model

Human      ACCTTGCAAAT
Chimpanzee ACCTTACAAAT
Baboon     ACCTTACAAAT

Prob(TAC->TGC), Prob(TGC->TAC)

$P(XY_1Z \to XY_2Z)$: a 64 × 3 matrix (64 trinucleotide contexts × 3 possible replacements of the middle base)
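A context-dependent rate table of this kind can be estimated by counting substitutions on the human lineage, using chimpanzee and baboon to infer the ancestral state. The sketch below assumes simple parsimony (a site is used only when chimpanzee and baboon agree on the ancestral trinucleotide) and ignores multiple hits; it is not the estimation procedure of the original study. The function name `context_rates` is hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"


def context_rates(human: str, chimp: str, outgroup: str) -> dict[tuple[str, str], float]:
    """Estimate P(XY1Z -> XY2Z) on the human lineage by parsimony.

    Returns a 64 x 3 table as {(ancestral_trinucleotide, new_middle_base): rate}.
    """
    if not len(human) == len(chimp) == len(outgroup):
        raise ValueError("sequences must be aligned to equal length")
    seen = defaultdict(int)     # ancestral contexts observed
    changed = defaultdict(int)  # middle-base changes on the human lineage
    for i in range(1, len(human) - 1):
        ancestral = chimp[i - 1:i + 2]
        if ancestral != outgroup[i - 1:i + 2] or set(ancestral) - set(BASES):
            continue  # ancestral state ambiguous
        seen[ancestral] += 1
        derived = human[i - 1:i + 2]
        flanks_kept = derived[0] == ancestral[0] and derived[2] == ancestral[2]
        if flanks_kept and derived[1] != ancestral[1] and derived[1] in BASES:
            changed[(ancestral, derived[1])] += 1
    rates = {}
    for x, y1, z in product(BASES, repeat=3):
        context = x + y1 + z
        for y2 in BASES:
            if y2 != y1:
                n = seen[context]
                rates[(context, y2)] = changed[(context, y2)] / n if n else 0.0
    return rates


rates = context_rates("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
print(len(rates))           # 192 = 64 contexts x 3 alternative bases
print(rates[("TAC", "G")])  # 1.0: the single TAC context became TGC
```

With genome-scale alignments the same table fills in; the reverse probability Prob(TGC->TAC) is the ("TGC", "A") entry.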

Strongly detrimental mutations

Effectively neutral mutations

Mildly deleterious mutations

Mildly deleterious mutations

54 genes, 757 individuals

Inflammatory response: 236 genes, 46-47 individuals

DNA repair and cell cycle pathways: 518 genes, 90-95 individuals

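Fitness and selection coefficient

Wild type: $N_1 = 4$; new mutation: $N_2 = 3$

Fitness of the wild type = 1; fitness of the new mutation = $N_2/N_1 = 1 - s$, where $s$ is the selection coefficient

In this example, $s = 1 - 3/4 = 0.25$.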
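The fitness definition, and what it implies for allele frequency over time, can be sketched as follows. The frequency update assumes a haploid population with discrete generations, a simplification not stated on the slide; the function names and the starting frequency are hypothetical.

```python
def selection_coefficient(n_wild: float, n_mutant: float) -> float:
    """s defined by relative fitness N2 / N1 = 1 - s."""
    return 1 - n_mutant / n_wild


def next_frequency(q: float, s: float) -> float:
    """Mutant allele frequency after one generation of haploid selection."""
    mean_fitness = (1 - q) * 1.0 + q * (1 - s)
    return q * (1 - s) / mean_fitness


s = selection_coefficient(4, 3)  # slide example: N1 = 4, N2 = 3
print(s)                         # 0.25
q = 0.5                          # hypothetical starting frequency
for generation in range(1, 6):
    q = next_frequency(q, s)
    print(generation, round(q, 4))
```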


Genetic polymorphism

•Genetic Polymorphism: A difference in DNA sequence among individuals, groups, or populations.

•Genetic Mutation: A change in the nucleotide sequence of a DNA molecule.

Genetic mutations are a kind of genetic polymorphism.

Genetic variation: single nucleotide polymorphism (point mutation); repeat heterogeneity

SNP: Single Nucleotide Polymorphisms

•A single nucleotide polymorphism is a source of variation in a genome.

•A SNP ("snip") is a single-base mutation in DNA.

•SNPs are the simplest and most common form of genetic polymorphism in the human genome (about 90% of all human DNA polymorphisms).

•There are two types of nucleotide base substitutions resulting in SNPs:

–Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute about two thirds of all SNPs.

–Transversion: substitution between a purine and a pyrimidine.
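The transition/transversion distinction is easy to encode, and doing so shows why the two-thirds figure is notable: only 4 of the 12 possible ordered substitutions are transitions, so an unbiased process would give about one third. A minimal sketch; the function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Enumerate all 12 ordered substitutions and count the transitions.
changes = [(a, b) for a in "ACGT" for b in "ACGT" if a != b]
n_transitions = sum(substitution_type(a, b) == "transition" for a, b in changes)
print(n_transitions, len(changes))  # 4 12
```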

SNP

Unlike RFLPs, which are detected with restriction enzymes, SNPs are found by direct sequencing. They are extremely useful for mapping.

Markers

Classical Mendelian: 100; RFLPs: 7,000; SNPs: 1.4 × 10^6

-----------------------ACGGCTAA

-----------------------ATGGCTAA

SNPs occur roughly every 300-1,000 bp along the 3-billion-bp human genome

Many SNPs have no effect on cell function
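Detecting a SNP between two aligned reads, such as the two fragments shown, amounts to a position-by-position comparison. A minimal sketch, assuming gap-free sequences of equal length (real SNP discovery must also handle sequencing errors, gaps and heterozygous calls); the function name is hypothetical.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two sequences are already aligned to equal length without gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


print(find_snps("ACGGCTAA", "ATGGCTAA"))  # [(2, 'C', 'T')]
```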

Human Genome and SNPs

• The human genome is (mostly) sequenced; attention is turning to the evaluation of variation

• Alterations in DNA involving a single base pair are called single nucleotide polymorphisms, or SNPs

• Map of ~1.4 million SNPs (Feb 2001)

• It is estimated that ~60,000 SNPs occur within exons

Goals of SNP Initiatives

• Immediate goals:

– Detection/identification of all SNPs estimated to be present in the human genome

– Interest also in other organisms, e.g. potatoes (!)

– Establishment of SNP Database(s)

SNPs

Humans are genetically more than 99 percent identical: it is the tiny remaining percentage that makes each of us different.

Much of our genetic variation is caused by single-nucleotide differences in our DNA : these are called single nucleotide polymorphisms, or SNPs. As a result, each of us has a unique genotype that typically differs in about three million nucleotides from every other person.

SNPs occur about once every 300-1000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population.

Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences".

Longer term goals: Areas of SNP Application

• Gene discovery and mapping

• Association-based candidate polymorphism testing

• Diagnostics/risk profiling

• Response prediction

• Homogeneity testing/study design

• Gene function identification etc.

Polymorphism

• Technical definition: the most common variant (allele) occurs with less than 99% frequency in the population

• Also used as a general term for variation

• Many types of DNA polymorphisms, including RFLPs, VNTRs, microsatellites

• ‘Highly polymorphic’ = many variants
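The 99% rule translates directly into a check on allele counts at a site. A minimal sketch; the function name and the example counts are hypothetical.

```python
def is_polymorphic(allele_counts: dict[str, int], threshold: float = 0.99) -> bool:
    """True if the most common allele has frequency below `threshold`."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return max(allele_counts.values()) / total < threshold


print(is_polymorphic({"A": 950, "G": 50}))  # True  (major allele at 95%)
print(is_polymorphic({"A": 995, "G": 5}))   # False (major allele at 99.5%)
```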

SNPs in Genetic Analysis

• Abundance: very numerous (millions of sites)

• Position – throughout genome

• Haplotype patterns – groups of SNPs may provide exploitable diversity

• Rapid and efficient to genotype

• Greater stability than other types of polymorphism (e.g., microsatellites)

• Recombination patterns – e.g. ‘hot spots’

Coding Region SNPs

Occasionally, a SNP may actually cause a disease.

SNPs within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein.

•Types of coding region SNPs:

–Synonymous: the substitution causes no amino acid change in the encoded protein. This is also called a silent mutation.

–Non-synonymous: the substitution alters the encoded amino acid. A missense mutation changes the codon so that a different amino acid is encoded. A nonsense mutation creates a premature stop codon, truncating the protein.

–About one half of all coding-sequence SNPs result in non-synonymous codon changes.
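These categories can be assigned mechanically by translating the reference and mutant codons with the standard genetic code. A minimal sketch; the function name is hypothetical, and the "stop codon lost" branch is an extra case not covered on the slide. The GAG→GTG example is the Glu→Val change underlying sickle cell anemia.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a substitution at `position` (0, 1 or 2) of a DNA codon."""
    codon, alt = codon.upper(), alt.upper()
    if position not in (0, 1, 2) or codon[position] == alt:
        raise ValueError("position must be 0-2 and alt must differ from ref")
    mutant = codon[:position] + alt + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "non-synonymous (stop codon lost)"
    return "missense"


print(classify_coding_snp("GAG", 1, "T"))  # missense: Glu -> Val (GAG -> GTG)
print(classify_coding_snp("GAG", 2, "A"))  # synonymous: GAG and GAA both Glu
print(classify_coding_snp("TGG", 2, "A"))  # nonsense: Trp -> stop (TGA)
```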

Intergenic SNPs

• Researchers have found that most SNPs are not responsible for a disease state; many of them are intergenic SNPs

• Instead, such SNPs can serve as biological markers for locating a disease gene on the human genome map when they lie near, and are inherited together with, a gene associated with the disease.

• Scientists have long known that diseases caused by single genes and inherited according to the laws of Mendel are actually rare.

• Most common diseases, like diabetes, are caused by multiple genes. Finding all of these genes is a difficult task.

• Recently, there has been focus on the idea that all of the genes involved can be traced by using SNPs.

• By comparing the SNP patterns in affected and unaffected individuals (patients with diabetes and healthy controls, for example), scientists can catalog the specific DNA variations that underlie susceptibility to diabetes.
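One common way to compare SNP patterns between patients and controls is an allelic chi-square test on a 2 × 2 table of allele counts. The sketch below is a minimal, hypothetical illustration: the counts are invented, it tests a single SNP, and a real study would also correct for testing many SNPs and for population structure.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) on a 2 x 2 table of allele counts.

    Rows: cases, controls. Columns: alternative allele, reference allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 1,000 allele copies in each group.
chi2, p = allelic_chi_square(case_alt=300, case_ref=700, control_alt=200, control_ref=800)
print(round(chi2, 2), p)
```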

Polymorphic Sites Revealed in Sequencing

Medium- and Low-throughput SNP Genotyping

I. SNP Discovery and validation.

A. Database mining, “resequencing” on microarrays, de novo sequencing of EST libraries.

B. Genotyping of pooled samples for determining heterozygosity.

II. How many SNPs are to be typed in how many samples?

A. What degree of multiplexing is possible for the “before-typing” PCR reactions?

B. What degree of multiplexing is possible for the genotyping reactions?

III. What is the appropriate platform given the size of the project, the budget and the degree of automation desired?
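Genotyping pooled samples yields allele-frequency estimates rather than individual genotypes; from those, the expected heterozygosity of a candidate SNP can be computed as H = 1 − Σ p_i². A minimal sketch, assuming Hardy-Weinberg proportions; the function name and frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2), the chance that two randomly drawn alleles differ."""
    total = sum(allele_freqs)
    if abs(total - 1) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1 - sum(p * p for p in allele_freqs)


print(expected_heterozygosity([0.5, 0.5]))  # 0.5: most informative biallelic SNP
print(expected_heterozygosity([0.9, 0.1]))  # about 0.18
```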

[Figure: SNP map, July 2003, NCBI build 34. Red = at least 1 SNP per 100 kb; black = gaps in genome coverage.]

• 92% of genome within 100 kb of a SNP

• 83% of genome within 50 kb of a SNP

• 50% of genome within 15 kb of a SNP

• 25% of genome within 5 kb of a SNP

Mapping 100K Coverage: 116,204 SNPs
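Coverage figures like these can be computed from sorted SNP positions by merging the windows of ±d bp around each SNP. A minimal sketch on a single hypothetical sequence; positions are 0-based and the function name is hypothetical.

```python
def fraction_within(positions: list[int], length: int, distance: int) -> float:
    """Fraction of positions in [0, length) lying within `distance` bp of a SNP."""
    covered = 0
    run_start = run_end = None
    for p in sorted(positions):
        start, end = max(0, p - distance), min(length, p + distance + 1)
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous merged window
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping window: extend it
    if run_end is not None:
        covered += run_end - run_start
    return covered / length


print(fraction_within([10, 50], length=100, distance=5))  # 0.22
print(fraction_within([10, 14], length=100, distance=5))  # 0.15
```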

Chemistry/Demultiplexing/Detection Options in SNP Genotyping

Options are grouped by enzyme chemistry, demultiplexing, detection method and platform/company:

Enzyme chemistry: allele-specific hybridization; allele-specific extend + ligate; allele-specific PCR; single nucleotide primer extension; oligonucleotide ligation assay; 5’-nuclease

Demultiplexing: homogeneous; semi-homogeneous; solid phase microarray; solid phase microspheres; capillary electrophoresis

Detection method: fluorescence; fluorescence resonance energy transfer (FRET); fluorescence polarization; mass spectrometry; flow cytometry; amplicon Tm

Platform/company: Sequenom iPlex™; “DASH”; Luminex 100; ABI SNPlex™; ABI SNaPshot™; Perkin-Elmer FP-TDI; microarray minisequencing; ABI TaqMan™; Illumina BeadArray™

Enzymatic Options in SNP Genotyping

[Figure panels: single base primer extension (“minisequencing”) with an SBE primer and ddC-biot or ddA-biot; allele-specific primer extension; allele-specific primer extension and ligation (LSO); allele-specific hybridization (probes); PCR only: Tm-shift primers (short vs. long GC tail). Nucleotide mix shown: ddA-biot, dATP, dTTP, dGTP.]

SNP Genotyping on Beads/Microarrays

1) Selection of SNPs

2) Design of PCR and “Tag” SBE/ASPE primers

3) Preparation of beads with “Anti-Tag” primers

4) Multiplex PCR

5) Cyclic SBE/ASPE with biot(fluor.)-ddNTP/dNTP

6) Capture of products on beads

7) Signal measurement in flow cytometer/scanner

Pastinen, et al., Gen. Res. 7, 606, 1997

Single Base Extension (SBE) of Targets on Microarrays

SBE (Minisequencing) of Target DNA with Glass-immobilized primers

Allele-Specific Extension & Identification in CE: “Minisequencing” (ABI SNaPshot™)

[Figure labels: dR6G, dR110]

Degree of Multiplexing Depends on Resolution in CE

ABI SNaPshot® on 3130xl

Gen. Res. 9: 492, 1999

Fluorescence Polarization

Gen. Res. 9: 492, 1999

SBE (Minisequencing) with Detection by Fluorescence Polarization

Genotyping by SBE and Mass Spectrometry

PCR amplification → SAP treatment → single base extension → spot on 384-place chips → MALDI-TOF mass spec

Allele-specific Primer Extension (ASPE) with Chain Termination

Use of Allele-specific Probes in Genotyping by Melting Curve Analysis: “DASH”

[Figure labels: one-base mismatch, matched, heterozygote]

Nature Biotech. 17: 87, 1999

[Figure label: intercalating dye]

Wang, et al., Biotechniques 39: 885, 2005

Use of Modified Tm-shifting Primers in Genotyping

Bead Arrays: DNA immobilized on silica or polystyrene beads, random array requires decoding steps.

1) Lynx (www.lynxgen.com). In rows. Limited to ca. 20 bases/read.

2) Illumina BeadChip (www.illumina.com). In etched microwells.

3) Luminex coded microspheres (luminexcorp.com). Measurements by flow cytometry.

4) 454 Life Sciences (www.454.com). Clonal amplification and sequencing on 28 µm beads. Minimum 100 bases/read.

Bead Technologies for SNP Genotyping/Gene Expression and Massively Parallel Sequencing (not currently supported in CIF)

Lynx/Solexa Bead Arrays for Gene Expression and MPSS

Clones on Beads

Brenner et al., PNAS 97: 1665, 2000, and Nature Biotech. 18: 630, 2000

Separate loaded from unloaded beads (FACS), ligate to anti-tag.

1.8 × 10^15 unique tags

Competitively hybridize beads with labeled libraries, then sort by FACS, OR…

Sequence signatures with type IIS restriction enzymes and labeled, encoded adaptors.

Expression profiling with Illumina BeadChips in Microwells

Gen. Res. 14: 870 & 2347, 2004

Total setup costs, satellite facility <$6000.

HumanRef-8: 24k probes, $100/sample, $50 labeling.

Random loading of beads in etched 3 µm microwells

Decoding by sequential hybridization (example code: 11012202): 3^8 = 6,561 codes (4^8 = 65,536).
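The code count grows as (number of hybridization outcomes)^(number of rounds). A minimal sketch that also writes a bead index as its code string, one digit per round; the index-to-code mapping is a hypothetical illustration, not Illumina's actual decoding scheme.

```python
def decoding_capacity(states: int, rounds: int) -> int:
    """Number of distinct codes from `rounds` hybridizations with `states` outcomes each."""
    return states ** rounds


def bead_code(index: int, states: int, rounds: int) -> str:
    """Write `index` as a fixed-width code, one digit per hybridization round."""
    if not 0 <= index < decoding_capacity(states, rounds):
        raise ValueError("index out of range for this code length")
    digits = []
    for _ in range(rounds):
        index, digit = divmod(index, states)
        digits.append(str(digit))
    return "".join(reversed(digits))


print(decoding_capacity(3, 8), decoding_capacity(4, 8))  # 6561 65536
print(bead_code(3071, states=3, rounds=8))               # 11012202
```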


Illumina Allele Specific Primer Extension (ASPE) and Ligation

ASOs and LSOs

Cy3 and Cy5-labeled universal primers

Luminex coded microspheres and multiplexed assays

Green laser: Up to 100 different transcripts can be monitored simultaneously in high-throughput by flow cytometry, e.g., with “PR” genes in Arabidopsis, Gen. Res. 11: 1888, 2001 and 217 miRNAs in human cancers, Nature 435: 834, 2005.

Red laser: Coding is in ratio of red and orange fluorescence inside microsphere.

SNP Genotyping Costs by Platform

Platform | SNPs/sample | Samples | $ oligo set / $ per SNP | $ mix per SNP | $ per SNP | Min $
Illumina (UCLA) | 1536 | 488 | | | 0.09 | 69,892
AB SNPlex (ABI 3730) | 48 | 5000 / 500 | 72 / 0.0144; 72 / 0.0144 | 0.04 / 0.20 | 0.078 / 0.214 | 14,840
AB SNaPshot (ABI 3100) | 50 | 500 | 50 / 0.10 | 0.476 | 0.576 | 14,400
AB Taqman (ABI 7700) | 1 | 750 | 310 / 0.413 | 0.75 | 1.21 | 910
Allele-specific PCR | 50 | 5000 / 500 | 17.60 / 0.0035; 17.60 / 0.035 | 0.422 | 0.43 |

S.-H. Lee et al., Theor. Appl. Genet. 110:167, 2004
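Assuming the minimum project cost is simply SNPs × samples × cost per SNP genotype, with the per-genotype cost equal to the oligo-set price spread over all samples plus the reaction mix, the SNaPshot row can be reproduced; other rows may include additional terms not shown in the table. The function names are hypothetical.

```python
def cost_per_genotype(oligo_set_per_snp: float, samples: int, mix_per_snp: float) -> float:
    """Oligo-set price amortized over all samples, plus reaction mix."""
    return oligo_set_per_snp / samples + mix_per_snp


def minimum_cost(snps: int, samples: int, per_genotype: float) -> float:
    """Total cost of typing every SNP in every sample."""
    return snps * samples * per_genotype


# AB SNaPshot row: $50 oligo set per SNP, 500 samples, $0.476 mix, 50 SNPs
per_genotype = cost_per_genotype(50, 500, 0.476)
print(round(per_genotype, 3))                      # 0.576
print(round(minimum_cost(50, 500, per_genotype)))  # 14400
```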
