SNP molecular function, evolution and disease Md Imtiyaz Hassan, Ph.D


• Effect on molecular function: Structural Biology, Biochemistry

• Phenotype: Medical Genetics

• Natural selection: Evolutionary Genetics


Predicting the effect of mutations in proteins


Why is this useful?

Understanding variation in molecular function and structure

Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
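One widely used way to compare polymorphism and divergence between functional categories is the McDonald–Kreitman framework. The slides do not name a specific test, so this is our choice of illustration. A minimal sketch, assuming counts of nonsynonymous (Pn, Dn) and synonymous (Ps, Ds) polymorphisms and fixed differences are already available; the counts in the usage line are hypothetical.

```python
def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """McDonald-Kreitman neutrality index NI = (Pn/Ps) / (Dn/Ds).

    NI > 1: excess nonsynonymous polymorphism relative to divergence,
            consistent with mildly deleterious variants still segregating.
    NI < 1: excess nonsynonymous divergence, consistent with positive selection.
    """
    if min(ps, dn, ds) == 0:
        raise ValueError("Ps, Dn and Ds must be non-zero")
    return (pn / ps) / (dn / ds)


def adaptive_fraction(pn: int, ps: int, dn: int, ds: int) -> float:
    """Estimated proportion of adaptive nonsynonymous substitutions, 1 - NI."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


# Hypothetical counts for one functional category
print(neutrality_index(pn=20, ps=40, dn=10, ds=40))  # 2.0
```

Under strict neutrality the two ratios are expected to be equal (NI = 1); the statistical test of that equality (e.g. Fisher's exact test on the 2 × 2 table) is omitted here.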

Linkage analysis: rare variants

Classical association studies

Common variants, compared between disease cases and controls

Quantitative trait

Mendelists vs. Biometricians

Forces to maintain variation:

• Selection

• Mutation

Common disease / Common variant

• Trade-off (antagonistic pleiotropy)

• Balancing selection

• Recent positive selection

• Reversal in the direction of selection

Examples:

• APOE: Alzheimer’s disease

• AGT: hypertension

• CYP3A: hypertension

• CAPN10: type 2 diabetes

Each individual human genome is a target for deleterious mutations!

~40% of human Mendelian diseases are due to hypermutable sites

Under mutation-selection balance, the frequency of a deleterious variant is directly proportional to the mutation rate and inversely proportional to the selection coefficient: $q = \mu / s$, where $\mu$ is the mutation rate and $s$ the selection coefficient.
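The relation $q = \mu / s$ can be checked numerically. A minimal sketch, assuming a deterministic haploid model (equivalently, an allele whose harmful effect is expressed in every carrier); for fully recessive alleles the equilibrium is instead approximately $\sqrt{\mu / s}$. The values of mu and s are illustrative.

```python
def equilibrium_frequency(mu: float, s: float, generations: int = 20000) -> float:
    """Iterate a haploid mutation-selection model, starting from q = 0."""
    q = 0.0
    for _ in range(generations):
        # selection: the deleterious allele has relative fitness 1 - s
        q = q * (1.0 - s) / (1.0 - s * q)
        # mutation: wild-type alleles mutate to the deleterious allele at rate mu
        q = q + mu * (1.0 - q)
    return q


mu, s = 1e-5, 0.01
q_sim = equilibrium_frequency(mu, s)
print(q_sim, mu / s)  # both approximately 0.001
```

With selection applied before mutation in each generation, the fixed point of this recursion is exactly $\mu / s$, so halving $s$ doubles the equilibrium frequency.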

Multiple, mostly rare variants

Many deleterious alleles in mutation-selection balance

Examples:

• Plasma level of HDL-C

• Plasma level of LDL-C

• Colorectal adenomas

Harmful mutations

• Function: damaging

• Evolution: deleterious

• Phenotype: detrimental

These three notions do not always coincide:

• Advantageous pseudogenization (Zhang et al. 2006)

• Gain-of-function disease mutations

• Sickle cell anemia

NELVTLTCLARGFS-PKDVLVRWL
RESATITCLVTGFS-PADVFVQWM
GGSLRLSCVASGIT-FSGYDMQWV
TPGLTLTCTVSGFS-LSSYDMGWV
GQKAKMRCIPE----KGHPVVFWY
GQEATLWCEPI----SGHSAVFWY
GQQVTLSCFPI----SGHLSLYWY
RKDVSLTCLVVGFN-PGDISVEWT
GQKLTLKCQQN----FNHDTMYWY
RDKATFTCFVVGSD-LKDAHLTWE
SKSATLTCRVSNMVNADGLEVSWW
GARTSLNCTFSD---SASQYFWWY
GASLQLRCKYSY---SATPYLFWY
NGAPKLTCLVVDLESEKNVNVTWN
EATVTLTCVVSN--APYGVNVSWT

Profile (score of each amino acid at each alignment position):

Position     1      2      3      4      5    ...
Ala       -1.2    1.1   -0.6   -0.8    0.3   ...
Arg        0.6   -0.3   -0.3   -0.5    0.6   ...
Asn       -1.1   -0.5   -0.5   -0.7    0.4   ...
Asp       -0.9   -0.3   -0.3   -0.5    0.6   ...
Cys        0.4   -0.5    0.6    0.8   -0.3   ...
Gln        ...    ...    ...    ...    ...   ...
...

protein → multiple alignment → profile
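A profile turns each alignment column into a score per amino acid. A minimal sketch of one simple way to do this: log-odds scores with pseudocounts, assuming a uniform background frequency of 1/20 per residue and ignoring gap characters. This is not how PolyPhen computes its own profile scores, and the numbers will not match the table above. It uses the first five sequences of the alignment shown above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(alignment, pseudocount=1.0):
    """Position-specific scores log2(p_observed / p_background) per column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    scores = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS]  # drop gaps
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return scores


alignment = [
    "NELVTLTCLARGFS-PKDVLVRWL",
    "RESATITCLVTGFS-PADVFVQWM",
    "GGSLRLSCVASGIT-FSGYDMQWV",
    "TPGLTLTCTVSGFS-LSSYDMGWV",
    "GQKAKMRCIPE----KGHPVVFWY",
]
prof = profile(alignment)
# The invariant cysteine (position 8) scores high; absent residues score below zero.
print(prof[7]["C"], prof[7]["A"])
```

A substitution can then be judged by how far the score of the new residue falls below that of the original one at the same position.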


PolyPhen

Prediction rate of damaging substitutions

                     Possibly damaging   Probably damaging
Disease mutations          82%                 57%
Divergence                  9%                  3%
Polymorphism               27%                 15%

10% of PolyPhen false positives are due to compensatory substitutions.

Neutral mutation model

Human       ACCTTGCAAAT
Chimpanzee  ACCTTACAAAT
Baboon      ACCTTACAAAT

Prob(TAC → TGC), Prob(TGC → TAC)

General case: $P(XY_1Z \to XY_2Z)$, a 64 × 3 matrix (64 trinucleotide contexts $XY_1Z$, each with 3 possible replacements $Y_2$ of the central base)
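The 64 × 3 matrix can be estimated by counting lineage-specific substitutions in their trinucleotide context. A minimal sketch, assuming simple parsimony (where chimpanzee and the outgroup agree, their base is taken as ancestral) and equal-length, gap-free aligned sequences; this is an illustration, not necessarily the estimation procedure behind the slide. Real estimates need genome-scale data; the toy input is the three-species example above.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def context_substitution_rates(human: str, chimp: str, outgroup: str) -> dict:
    """Estimate P(XY1Z -> XY2Z) on the human lineage from a three-way alignment."""
    subs, contexts = Counter(), Counter()
    for i in range(1, len(human) - 1):
        anc = chimp[i - 1:i + 2]
        if anc != outgroup[i - 1:i + 2] or any(b not in BASES for b in anc):
            continue  # ancestral context cannot be inferred by parsimony
        contexts[anc] += 1
        derived = human[i - 1:i + 2]
        # count only central-base changes with unchanged flanking bases
        if derived[0] == anc[0] and derived[2] == anc[2] and derived[1] in BASES and derived[1] != anc[1]:
            subs[(anc, derived[1])] += 1
    rates = {}
    for ctx in ("".join(p) for p in product(BASES, repeat=3)):  # 64 contexts
        for alt in BASES:
            if alt != ctx[1]:  # 3 alternative central bases each
                rates[(ctx, alt)] = subs[(ctx, alt)] / contexts[ctx] if contexts[ctx] else 0.0
    return rates


rates = context_substitution_rates("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
print(rates[("TAC", "G")], len(rates))  # 1.0 192
```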


Strongly detrimental mutations


Effectively neutral mutations


Mildly deleterious mutations

Mildly deleterious mutations

• 54 genes, 757 individuals

• Inflammatory response: 236 genes, 46–47 individuals

• DNA repair and cell cycle pathways: 518 genes, 90–95 individuals

Fitness and selection coefficient

Wild type: N1 = 4; new mutation: N2 = 3

Fitness of the wild type is 1; relative fitness of the new mutation is $w = N_2 / N_1 = 1 - s$, where $s$ is the selection coefficient (here $w = 3/4$, so $s = 0.25$).

Genetic polymorphism

• Genetic polymorphism: a difference in DNA sequence among individuals, groups, or populations.

• Genetic mutation: a change in the nucleotide sequence of a DNA molecule.

Genetic mutations are a kind of genetic polymorphism.

Genetic variation:

• Single nucleotide polymorphism (point mutation)

• Repeat heterogeneity

SNP: Single Nucleotide Polymorphism

• A single nucleotide polymorphism is a source of variation in a genome. A SNP ("snip") is a single-base mutation in DNA.

• SNPs are the simplest and most common form of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).

• There are two types of nucleotide base substitution that give rise to SNPs:

– Transition: substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.

– Transversion: substitution between a purine and a pyrimidine.
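The transition/transversion distinction can be checked mechanically from the two alleles. A minimal sketch; the function names are illustrative, not from any particular library.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions in a list of (ref, alt) pairs."""
    types = [substitution_type(r, a) for r, a in snps]
    return types.count("transition") / types.count("transversion")


print(substitution_type("C", "T"), substitution_type("A", "C"))  # transition transversion
```

Of the 12 possible single-base changes, only 4 are transitions, so if all changes were equally likely transitions would make up one third of SNPs. Observing two thirds (a Ts/Tv ratio of about 2) therefore reflects a mutational bias toward transitions.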

SNP

Instead of being detected with restriction enzymes (as RFLPs are), SNPs are found by direct sequencing. They are extremely useful for mapping.

Number of available markers:

Classical Mendelian: 100
RFLPs: 7,000
SNPs: $1.4 \times 10^6$

-----------------------ACGGCTAA

-----------------------ATGGCTAA

SNPs occur roughly once every 300–1,000 bp along the 3-billion-bp human genome.

Many SNPs have no effect on cell function
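Locating a SNP between two aligned sequences, as in the ACGGCTAA/ATGGCTAA example above, amounts to finding the columns where they differ. A minimal sketch, assuming the sequences are already aligned to equal length and that gap columns should be skipped rather than reported. The last lines turn the stated spacing into an expected SNP count.

```python
def snp_positions(seq_a: str, seq_b: str):
    """Return (0-based position, base_a, base_b) for single-base differences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and "-" not in (a, b)
    ]


print(snp_positions("ACGGCTAA", "ATGGCTAA"))  # [(1, 'C', 'T')]

# One SNP every 300-1,000 bp over ~3e9 bp implies roughly 3 to 10 million SNPs.
GENOME_BP = 3_000_000_000
for spacing_bp in (1000, 300):
    print(spacing_bp, GENOME_BP // spacing_bp)
```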

Human Genome and SNPs

• The human genome is (mostly) sequenced; attention is turning to the evaluation of variation.

• Alterations in DNA involving a single base pair are called single nucleotide polymorphisms, or SNPs.

• Map of ~1.4 million SNPs (Feb 2001)

• It is estimated that ~60,000 SNPs occur within exons.

Goals of SNP Initiatives

• Immediate goals:

– Detection/identification of all SNPs estimated to be present in the human genome

– Interest also in other organisms, e.g. potatoes (!)

– Establishment of SNP database(s)

SNPs

Humans are more than 99 percent genetically identical; it is the tiny remaining percentage that makes us different.

Much of our genetic variation is caused by single-nucleotide differences in our DNA, called single nucleotide polymorphisms, or SNPs. As a result, each of us has a unique genotype that typically differs from every other person's at about three million nucleotides.

SNPs occur about once every 300–1,000 base pairs in the genome, and the frequency of a particular polymorphism tends to remain stable in the population.

Because only about 1 to 2 percent of a person's DNA sequence codes for proteins, most SNPs are found outside of coding sequences.


Longer term goals: Areas of SNP Application

• Gene discovery and mapping

• Association-based candidate polymorphism testing

• Diagnostics/risk profiling

• Response prediction

• Homogeneity testing/study design

• Gene function identification etc.

Polymorphism

• Technical definition: the most common variant (allele) occurs with less than 99% frequency in the population

• Also used as a general term for variation

• Many types of DNA polymorphisms, including RFLPs, VNTRs, and microsatellites

• ‘Highly polymorphic’ = many variants
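The technical definition is a simple threshold on the major allele frequency. A minimal sketch, assuming the input is a list of observed alleles (one per chromosome); the sample below is hypothetical.

```python
from collections import Counter


def major_allele_frequency(alleles) -> float:
    """Frequency of the most common allele among the observed chromosomes."""
    counts = Counter(alleles)
    return max(counts.values()) / sum(counts.values())


def is_polymorphic(alleles, threshold: float = 0.99) -> bool:
    """Technical definition: the most common allele occurs at < 99% frequency."""
    return major_allele_frequency(alleles) < threshold


sample = ["A"] * 196 + ["G"] * 4  # 200 chromosomes, hypothetical
print(major_allele_frequency(sample), is_polymorphic(sample))  # 0.98 True
```

For a biallelic site this is the same as requiring a minor allele frequency above 1%.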

SNPs in Genetic Analysis

• Abundance: very numerous

• Position: throughout the genome

• Haplotype patterns: groups of SNPs may provide exploitable diversity

• Rapid and efficient to genotype

• Greater stability than other types of polymorphism

• Recombination patterns, e.g. ‘hot spots’

Coding Region SNPs

Occasionally, a SNP may actually cause a disease.

SNPs within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein.

• Types of coding-region SNPs:

– Synonymous: the substitution causes no amino acid change in the encoded protein. This is also called a silent mutation.

– Non-synonymous: the substitution alters the encoded amino acid. A missense mutation changes the codon to one encoding a different amino acid; a nonsense mutation changes it to a stop codon, causing premature termination.

– One half of all coding-sequence SNPs result in non-synonymous codon changes.
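Whether a coding SNP is synonymous, missense or nonsense follows from translating the reference and mutant codons. A minimal sketch using the standard genetic code; the function name is illustrative, and the "stop-loss" branch is an extra category added here so that stop-to-sense changes are not mislabeled as missense.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a substitution at codon position 0, 1 or 2."""
    codon, alt = codon.upper(), alt.upper()
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


# Sickle cell mutation in HBB codon 6: GAG (Glu) -> GTG (Val)
print(classify_coding_snp("GAG", 1, "T"))  # missense
```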

Intergenic SNPs

• Researchers have found that most SNPs are not responsible for a disease state, largely because they are intergenic.

• Instead, they can serve as biological markers for locating disease genes on the human genome map, when they lie near a gene associated with a certain disease.

• Scientists have long known that diseases caused by single genes and inherited according to Mendel's laws are actually rare.

• Most common diseases, such as diabetes, are caused by multiple genes. Finding all of these genes is a difficult task.

• Recently, attention has focused on the idea that all of the genes involved can be traced using SNPs.

• By comparing SNP patterns in affected and unaffected individuals (patients with diabetes and healthy controls, for example), scientists can catalog the specific DNA variations that underlie susceptibility to diabetes.
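At a single SNP, the case-control comparison is often an allelic test on a 2 × 2 table of allele counts. A minimal sketch of a Pearson chi-square test (1 degree of freedom, no continuity correction) plus the allelic odds ratio; the counts are hypothetical. A real genome-wide study would also need quality control and correction for multiple testing.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square with 1 df: P(|Z| > sqrt(chi2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    return (case_alt * control_ref) / (case_ref * control_alt)


# Hypothetical allele counts: 200 case and 200 control chromosomes
chi2 = allelic_chi_square(60, 140, 40, 160)
print(round(chi2, 2), p_value_1df(chi2), odds_ratio(60, 140, 40, 160))
```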

Polymorphic Sites Revealed in Sequencing

Medium- and Low-throughput SNP Genotyping

I. SNP discovery and validation.

A. Database mining, “resequencing” on microarrays, de novo sequencing of EST libraries.

B. Genotyping of pooled samples to determine heterozygosity.

II. How many SNPs are to be typed in how many samples?

A. What degree of multiplexing is possible for the “before-typing” PCR reactions?

B. What degree of multiplexing is possible for the genotyping reactions?

III. What is the appropriate platform given the size of the project, the budget and the degree of automation desired?

Mapping 100K Coverage: 116,204 SNPs (July 2003, NCBI build 34)

Red = at least 1 SNP per 100 kb; black = gaps in genome coverage

• 92% of genome within 100 kb of a SNP

• 83% of genome within 50 kb of a SNP

• 50% of genome within 15 kb of a SNP

• 25% of genome within 5 kb of a SNP
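Coverage figures like "92% of genome within 100 kb of a SNP" can be computed by merging windows around each SNP. A minimal sketch for one chromosome, assuming 0-based SNP positions and ignoring the sequence gaps that a real analysis would exclude from the denominator; the inputs are toy values.

```python
def fraction_within(snp_positions, chrom_length, distance):
    """Fraction of positions in [0, chrom_length) lying within `distance` bp of a SNP."""
    covered = 0
    current_start = current_end = None
    for p in sorted(snp_positions):
        start = max(0, p - distance)
        end = min(chrom_length, p + distance + 1)  # half-open window [start, end)
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start  # close the previous merged window
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # overlapping windows are merged
    if current_end is not None:
        covered += current_end - current_start
    return covered / chrom_length


print(fraction_within([100, 150, 500], chrom_length=1000, distance=30))  # 0.172
```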

Chemistry/Demultiplexing/Detection Options in SNP Genotyping

(Figure columns: enzyme chemistry, demultiplexing, detection method, platform/company)

Enzyme chemistry: allele-specific hybridization; allele-specific extend + ligate; allele-specific PCR; single nucleotide primer extension; oligonucleotide ligation assay; 5’-nuclease

Demultiplexing: homogeneous; semi-homogeneous; capillary electrophoresis; solid-phase microarray; solid-phase microspheres; mass spectrometry

Detection method: mass spec.; fluorescence; fluorescence resonance energy transfer (FRET); fluorescence polarization; flow cytometry; amplicon Tm

Platform/company: Sequenom iPlex™; ABI SNPlex™; ABI SNaPshot™; ABI TaqMan™; Illumina BeadArray™; Luminex 100; Perkin-Elmer FP-TDI; “DASH”; microarray minisequencing

Enzymatic Options in SNP Genotyping

Figure panels:

• Allele-specific hybridization (probes)

• Single base primer extension, “minisequencing” (SBE primer; ddC-biot or ddA-biot)

• Allele-specific primer extension

• Allele-specific primer extension and ligation (LSO)

• PCR only: Tm-shift primers (short GC and long GC tails)

Nucleotide mix shown: ddA-biot, dATP, dTTP, dGTP

SNP Genotyping on Beads/Microarrays

1) Selection of SNPs

2) Design of PCR and “tag” SBE/ASPE primers

3) Preparation of beads with “anti-tag” primers

4) Multiplex PCR

5) Cyclic SBE/ASPE with biotinylated (or fluorescent) ddNTP/dNTP

6) Capture of products on beads

7) Signal measurement in flow cytometer/scanner
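The final measurement step yields two allele-specific signals per SNP and sample, from which a genotype is called. A minimal sketch using the fraction of signal from allele B; all thresholds are hypothetical placeholders, since real pipelines set them per SNP, typically by clustering many samples.

```python
def call_genotype(signal_a, signal_b, min_total=100.0,
                  aa_max=0.2, bb_min=0.8, ab_range=(0.3, 0.7)):
    """Call a biallelic genotype from two allele-specific signal intensities."""
    total = signal_a + signal_b
    if total < min_total:
        return "no call"  # too little signal to trust
    b_fraction = signal_b / total  # fraction of signal from allele B
    if b_fraction <= aa_max:
        return "AA"
    if b_fraction >= bb_min:
        return "BB"
    if ab_range[0] <= b_fraction <= ab_range[1]:
        return "AB"
    return "no call"  # falls between clusters


print(call_genotype(900, 50), call_genotype(500, 480), call_genotype(30, 900))  # AA AB BB
```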

Single Base Extension (SBE) of Targets on Microarrays

Pastinen et al., Genome Res. 7: 606, 1997

SBE (Minisequencing) of Target DNA with Glass-immobilized Primers

Allele-Specific Extension & Identification in CE: “Minisequencing” (ABI SNaPshot™)

Degree of Multiplexing Depends on Resolution in CE

ABI SNaPshot® on 3130xl (dye labels shown: dR6G, dR110)

Fluorescence Polarization

Genome Res. 9: 492, 1999

SBE (Minisequencing) with Detection by Fluorescence Polarization

Genome Res. 9: 492, 1999

Genotyping by SBE and Mass Spectrometry

PCR amplification → SAP (shrimp alkaline phosphatase) treatment → single base extension → spot on 384-place chips → MALDI-TOF mass spec

Allele-specific Primer Extension (ASPE) with Chain Termination

Use of Allele-specific Probes in Genotyping by Melting Curve Analysis: “DASH”

Nature Biotech. 17: 87, 1999

(Figure panels: matched probe; one-base mismatch; heterozygote. Detection with an intercalating dye.)
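In melting curve analysis, a matched probe-target duplex melts at a higher temperature than a one-base mismatch, and a heterozygote shows both melting transitions. A minimal sketch of calling genotypes from peaks of −dF/dT, assuming synthetic sigmoid melting curves and hypothetical melting temperatures (62 °C matched, 54 °C mismatched); "A" here denotes the probe-matched allele.

```python
import math


def melt_curve(temps, tm, width=1.0):
    """Synthetic dye fluorescence: high while the duplex is intact, dropping around tm."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def melt_peaks(temps, fluor, min_height=0.05):
    """Temperatures of local maxima of -dF/dT, estimated by central differences."""
    deriv = [
        (temps[i], -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1]))
        for i in range(1, len(temps) - 1)
    ]
    peaks = []
    for j in range(1, len(deriv) - 1):
        t, v = deriv[j]
        if v >= min_height and v > deriv[j - 1][1] and v >= deriv[j + 1][1]:
            peaks.append(t)
    return peaks


def call_from_peaks(peaks, tm_match, tm_mismatch, tol=1.5):
    """'AA' = matched-duplex peak only, 'BB' = mismatch peak only, 'AB' = both."""
    match = any(abs(p - tm_match) <= tol for p in peaks)
    mismatch = any(abs(p - tm_mismatch) <= tol for p in peaks)
    if match and mismatch:
        return "AB"
    if match:
        return "AA"
    if mismatch:
        return "BB"
    return "no call"


temps = [40.0 + 0.5 * i for i in range(81)]  # 40 to 80 degrees C
TM_MATCH, TM_MISMATCH = 62.0, 54.0  # hypothetical melting temperatures
het = [0.5 * a + 0.5 * b for a, b in zip(melt_curve(temps, TM_MATCH), melt_curve(temps, TM_MISMATCH))]
print(call_from_peaks(melt_peaks(temps, het), TM_MATCH, TM_MISMATCH))  # AB
```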

Use of Modified Tm-shifting Primers in Genotyping

Wang et al., Biotechniques 39: 885, 2005

Bead Technologies for SNP Genotyping/Gene Expression and Massively Parallel Sequencing (not currently supported in CIF)

Bead arrays: DNA immobilized on silica or polystyrene beads; random arrays require decoding steps.

1) Lynx (www.lynxgen.com): beads arranged in rows. Limited to ca. 20 bases/read.

2) Illumina BeadChip (www.illumina.com): beads in etched microwells.

3) Luminex coded microspheres (luminexcorp.com): measurements by flow cytometry.

4) 454 Life Sciences (www.454.com): clonal amplification and sequencing on 28 µm beads. Minimum 100 bases/read.

Lynx/Solexa Bead Arrays for Gene Expression and MPSS

Brenner et al., PNAS 97: 1665, 2000, and Nature Biotech. 18: 630, 2000

Clones on beads: $1.8 \times 10^{15}$ unique tags

Separate loaded from unloaded beads (FACS), ligate to anti-tag.

Competitively hybridize beads with labeled libraries, then sort by FACS, OR…

Sequence signatures with type IIS restriction enzymes and labeled, encoded adaptors.

Expression profiling with Illumina BeadChips in Microwells

Genome Res. 14: 870 & 2347, 2004

Total setup costs, satellite facility: <$6000.

HumanRef-8: 24k probes, $100/sample, $50 labeling.

Random loading of beads in etched 3 µm microwells

Decoding by sequential hybridization (e.g., code 11012202): $3^8 = 6561$ codes ($4^8 = 65{,}536$).
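With 8 hybridization rounds and 3 possible readouts per round, each bead's readout string is a base-3 number, which is why $3^8 = 6561$ bead types can be distinguished. A minimal sketch, assuming (hypothetically) that readouts are recorded as digits 0–2 and that bead-type IDs are simply the base-3 value of the string; the actual mapping used on the chips is not given here.

```python
def decode(states: str, n_states: int = 3) -> int:
    """Convert a per-round readout string such as '11012202' to a bead-type ID."""
    return int(states, n_states)


def encode(bead_id: int, rounds: int = 8, n_states: int = 3) -> str:
    """Inverse of decode: the readout expected for a given bead-type ID."""
    digits = []
    for _ in range(rounds):
        bead_id, d = divmod(bead_id, n_states)
        digits.append(str(d))
    if bead_id:
        raise ValueError("bead_id too large for the number of rounds")
    return "".join(reversed(digits))


print(decode("11012202"), 3 ** 8, 4 ** 8)  # 3071 6561 65536
```

A fourth readout state per round would raise the capacity to $4^8 = 65{,}536$ codes without adding rounds.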

Illumina Allele Specific Primer Extension (ASPE) and Ligation

ASOs (allele-specific oligonucleotides) and LSOs (locus-specific oligonucleotides)

Cy3- and Cy5-labeled universal primers

Luminex coded microspheres and multiplexed assays

Green laser: up to 100 different transcripts can be monitored simultaneously, in high throughput, by flow cytometry, e.g. “PR” genes in Arabidopsis (Genome Res. 11: 1888, 2001) and 217 miRNAs in human cancers (Nature 435: 834, 2005).

Red laser: bead identity is encoded in the ratio of red and orange fluorescence inside each microsphere.

SNP Genotyping Costs by Platform

Platform                 SNPs/sample  Samples  Oligo set $ / $ per SNP  Mix $/SNP  $ per SNP  Min $
Illumina (UCLA)          1536         488      –                        –          0.09       69,892
AB SNPlex (ABI 3730)     48           5000     72 / 0.0144              0.04       0.078      14,840
                                      500      72 / 0.0144              0.20       0.214
AB SNaPshot (ABI 3100)   50           500      50 / 0.10                0.476      0.576      14,400
AB TaqMan (ABI 7700)     1            750      310 / 0.413              0.75       1.21       910
Allele-specific PCR      50           5000     17.60 / 0.0035           0.422      0.43
                                      500      17.60 / 0.035
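The AB SNaPshot row is consistent with a simple cost model: the oligo set cost is amortized over the samples, the mix cost is added per SNP, and the total is SNPs × samples × cost per SNP. A minimal sketch of that assumed model. Not every row reduces exactly to it (for TaqMan, 0.413 + 0.75 = 1.163 versus the listed 1.21), so treat it as an approximation.

```python
def per_snp_cost(oligo_set_cost: float, n_samples: int, mix_cost_per_snp: float) -> float:
    """Oligo set cost amortized over all samples, plus reaction mix cost per SNP assay."""
    return oligo_set_cost / n_samples + mix_cost_per_snp


def project_cost(n_snps: int, n_samples: int, cost_per_snp: float) -> float:
    """Total cost = SNPs per sample x samples x cost per SNP genotype."""
    return n_snps * n_samples * cost_per_snp


# AB SNaPshot row: 50 SNPs, 500 samples, $50 oligo set, $0.476 mix per SNP
c = per_snp_cost(50, 500, 0.476)
print(round(c, 3), round(project_cost(50, 500, c)))  # 0.576 14400
```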


S.-H. Lee et al., Theor. Appl. Genet. 110:167, 2004