Single nucleotide polymorphisms and applications Usman Roshan BNFO 601




SNPs

• DNA sequence variations that occur when a single nucleotide is altered.

• Must be present in at least 1% of the population to be a SNP.

• Occur on average once every 100 to 300 bases along the 3-billion-base human genome.

• Many have no effect on cell function but some could affect disease risk and drug response.


Toy example


SNPs on the chromosome

[Figure: chromosome diagram with the labels SNP, Chromosome, and Gene]


Bi-allelic SNPs

• Most SNPs have one of two nucleotides at a given position

• For example:
  – A/G denotes the varying nucleotide as either A or G. We call each of these an allele
  – Most SNPs have two alleles (bi-allelic)


SNP genotype

• We inherit two copies of each chromosome (one from each parent)

• For a given SNP the genotype defines the type of alleles we carry

• Example: for the SNP A/G one’s genotype may be
  – AA if both copies of the chromosome have A
  – GG if both copies of the chromosome have G
  – AG or GA if one copy has A and the other has G
  – The first two cases are called homozygous and the latter two are heterozygous
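
The genotype definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_genotype` and the five sample genotypes are hypothetical and do not come from the slides.

```python
from collections import Counter


def classify_genotype(genotype):
    """Return 'homozygous' if both alleles match, otherwise 'heterozygous'."""
    if len(genotype) != 2:
        raise ValueError("a genotype holds exactly two alleles, e.g. 'AG'")
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


# Hypothetical genotypes of five people at a single A/G SNP.
genotypes = ["AA", "AG", "GA", "GG", "AG"]
kinds = [classify_genotype(g) for g in genotypes]

# Each person contributes two alleles, so five people give ten alleles.
allele_counts = Counter("".join(genotypes))
```

Counting alleles rather than people is what later feeds a case-control contingency table: every subject adds two entries to the allele totals.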


SNP genotyping


Real SNPs

• SNP consortium: snp.cshl.org

• SNPedia: www.snpedia.com


Application of SNPs: association with disease

• Experimental design to detect cancer associated SNPs:
  – Pick random humans with and without cancer (say breast cancer)
  – Perform SNP genotyping
  – Look for associated SNPs
  – Also called genome-wide association study


Case-control example

• Study of 100 people:
  – Case: 50 subjects with cancer
  – Control: 50 subjects without cancer

• Count number of alleles and form a contingency table

            #Allele1    #Allele2
Case           10          90
Control         2          98
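
The slides do not say which statistic to compute from this table. One common choice is Pearson's chi-square test together with the odds ratio, sketched below in plain Python. The function names are hypothetical, no continuity correction is applied, and the p-value uses the closed form for one degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if allele and case/control status were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def odds_ratio(table):
    """Odds of allele 1 among cases divided by the odds among controls."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Rows: case, control. Columns: #Allele1, #Allele2 (counts from the slide).
table = [[10, 90], [2, 98]]
stat, p_value = chi_square_2x2(table)
ratio = odds_ratio(table)
```

For the table above this gives a statistic of about 5.67 (p-value below 0.05) and an odds ratio of about 5.44, so allele 1 is more common among the cases than chance alone would suggest.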


Effect of population structure on genome-wide association studies

• Suppose our sample is drawn from a population of two groups, I and II

• Assume that group I carries mostly the first allele and group II mostly the second allele.

• Further assume that most case subjects belong to group I and most control subjects to group II

• This leads to the false conclusion that the first allele (the major allele in group I) is associated with the disease
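
The scenario above can be made concrete with a small deterministic example. All numbers are hypothetical: allele 1 is assumed to have frequency 80% in group I and 20% in group II regardless of disease status, so there is no true association inside either group.

```python
def odds_ratio(case, control):
    """Odds ratio of allele 1 vs. allele 2, cases relative to controls."""
    return (case[0] * control[1]) / (case[1] * control[0])


def allele_counts(n_people, allele1_percent):
    """(#allele1, #allele2) for n_people, each carrying two alleles."""
    n_alleles = 2 * n_people
    n_allele1 = n_alleles * allele1_percent // 100
    return (n_allele1, n_alleles - n_allele1)


# Hypothetical allele-1 frequency (percent) per group, identical for cases and controls.
FREQ = {"I": 80, "II": 20}
# Hypothetical sampling: most cases come from group I, most controls from group II.
people = {"case": {"I": 80, "II": 20}, "control": {"I": 20, "II": 80}}

counts = {
    status: {group: allele_counts(n, FREQ[group]) for group, n in groups.items()}
    for status, groups in people.items()
}

# Within each group there is no association: the odds ratio is exactly 1.
within = {g: odds_ratio(counts["case"][g], counts["control"][g]) for g in FREQ}


def pooled(status):
    """Allele counts for one status with the two groups merged."""
    return tuple(sum(c) for c in zip(*counts[status].values()))


# Ignoring group membership produces a spurious association.
pooled_ratio = odds_ratio(pooled("case"), pooled("control"))
```

Here the within-group odds ratios are both 1.0, while the pooled odds ratio is about 4.5. The apparent effect comes entirely from the unequal sampling of the two groups.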


Effect of population structure on genome-wide association studies

• We can correct for this effect if cases and controls are sampled equally from all sub-populations

• To do this we need to know the population structure


Population structure prediction

• Treated as an unsupervised learning problem (i.e. clustering)


Clustering

• Suppose we want to cluster n vectors in $\mathbb{R}^d$ into two groups. Define $C_1$ and $C_2$ as the two groups.

• Our objective is to find $C_1$ and $C_2$ that minimize

$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2$$

where $m_i$ is the mean of class $C_i$


K-means algorithm for two clusters

Input: $x_i \in \mathbb{R}^d$, $i = 1 \ldots n$

Algorithm:

1. Initialize: assign $x_i$ to $C_1$ or $C_2$ with equal probability and compute means:

$$m_1 = \frac{1}{|C_1|} \sum_{x_i \in C_1} x_i, \qquad m_2 = \frac{1}{|C_2|} \sum_{x_i \in C_2} x_i$$

2. Recompute clusters: assign $x_i$ to $C_1$ if $\|x_i - m_1\| < \|x_i - m_2\|$, otherwise assign to $C_2$

3. Recompute means $m_1$ and $m_2$

4. Compute objective

$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2$$

5. Compute objective of new clustering. If difference is smaller than $\delta$ then stop, otherwise go to step 2.
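
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the function names, the `max_iter` cap, the redraw when the random initialization leaves a cluster empty, and the rule of keeping the old mean if a cluster empties are all choices made here, and the six sample points are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def objective(points, labels, means):
    """Sum of squared distances from each point to its cluster mean."""
    return sum(sq_dist(x, means[c]) for x, c in zip(points, labels))


def two_means(points, delta=1e-9, seed=0, max_iter=100):
    """K-means for two clusters; needs at least two points."""
    rng = random.Random(seed)
    # Step 1: random assignment (redrawn if one cluster is empty), then means.
    labels = [rng.randrange(2) for _ in points]
    while len(set(labels)) < 2:
        labels = [rng.randrange(2) for _ in points]
    means = [mean([x for x, c in zip(points, labels) if c == k]) for k in range(2)]
    prev = objective(points, labels, means)
    history = [prev]
    for _ in range(max_iter):
        # Step 2: assign each point to the closer mean (ties go to cluster 1).
        labels = [0 if sq_dist(x, means[0]) < sq_dist(x, means[1]) else 1
                  for x in points]
        # Step 3: recompute means; an emptied cluster keeps its old mean.
        for k in range(2):
            members = [x for x, c in zip(points, labels) if c == k]
            if members:
                means[k] = mean(members)
        # Steps 4-5: stop once the objective improves by less than delta.
        cur = objective(points, labels, means)
        history.append(cur)
        if prev - cur < delta:
            break
        prev = cur
    return labels, means, history


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, means, history = two_means(points)
```

On this toy data the first three points end up in one cluster and the last three in the other, and `history` records an objective that never increases from one iteration to the next.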


K-means

• Is it guaranteed to find the clustering which optimizes the objective?

• No. It is only guaranteed to find a local optimum

• We can prove that the objective never increases in subsequent iterations


Proof sketch of convergence of k-means

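$$\sum_{i=1}^{2} \sum_{x_j \in C_i} \|x_j - m_i\|^2 \;\ge\; \sum_{i=1}^{2} \sum_{x_j \in C_i^*} \|x_j - m_i\|^2 \;\ge\; \sum_{i=1}^{2} \sum_{x_j \in C_i^*} \|x_j - m_i^*\|^2$$

Here $C_i^*$ denotes the clusters after reassignment and $m_i^*$ their recomputed means.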

Justification of first inequality: by assigning xj to the closest mean the objective decreases or stays the same

Justification of second inequality: for a given cluster its mean minimizes squared error loss
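
The second justification can be checked with a short derivation. As notation assumed here, let $C$ be any one cluster, $\bar{m}$ its mean, and $m$ an arbitrary point in $\mathbb{R}^d$:

```latex
\begin{align*}
\sum_{x_j \in C} \|x_j - m\|^2
  &= \sum_{x_j \in C} \|(x_j - \bar{m}) + (\bar{m} - m)\|^2 \\
  &= \sum_{x_j \in C} \|x_j - \bar{m}\|^2
     + 2\,(\bar{m} - m)^{\top} \sum_{x_j \in C} (x_j - \bar{m})
     + |C|\,\|\bar{m} - m\|^2 \\
  &= \sum_{x_j \in C} \|x_j - \bar{m}\|^2 + |C|\,\|\bar{m} - m\|^2
  \;\ge\; \sum_{x_j \in C} \|x_j - \bar{m}\|^2 .
\end{align*}
```

The cross term vanishes because $\sum_{x_j \in C} (x_j - \bar{m}) = 0$ by the definition of the mean, so replacing the old mean $m_i$ with the recomputed mean $m_i^*$ cannot increase the squared error of cluster $C_i^*$.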