The Human Genome
What’s in it?
How do we know?
Gary Benson
Department of Computer Science, Department of Biology
Program in Bioinformatics, Boston University
Outline of Talk
• Protein Genes
• SNPs
• Haplotypes
• Finding a Disease Locus
Size of the Genomes
[Figure: bar chart of genome size in millions of base pairs (axis 0 to 3500) for Human, Maize, Rice, Arabidopsis (flowering plant), Drosophila (fruit fly), C. elegans (round worm), S. cerevisiae (yeast) and E. coli (bacteria).]
The Human Genome
What the letters stand for
DNA has four chemical subunits, called nucleotide bases, abbreviated A, C, G and T.
GATTACA
http://en.wikipedia.org/wiki/Nucleotide
What’s in the Genome?
• Chromosomes – 23 pairs
  – Genes
    • Protein genes
    • RNA genes
    • MicroRNA genes
  – Repeats
    • Tandem repeats
    • Inverted repeats
    • Transposons
    • Segmental duplications
  – Regulatory regions
    • Promoters
    • Transcription factor binding sites
Protein Genes
A protein gene contains the genetic code for a protein. Producing a protein involves transcription (copying DNA to RNA) and translation (using the RNA code to produce a protein).
http://www.slic2.wsu.edu:82/hurlbert/micro101/images/TransTranscrip.gif
http://nobelprize.org/medicine/educational/dna/a/translation/polysome_em.html
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Miller_Beatty3.jpg
[Figure: transcription and translation]
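As a minimal sketch of the transcription step, assuming the DNA is written as the coding (sense) strand from 5' to 3': the mRNA then has the same sequence with U in place of T. The function names are hypothetical; `transcribe_template` covers the other convention, where the template strand is given and the mRNA is its reverse complement.

```python
# Hypothetical helpers illustrating transcription (DNA -> mRNA).


def transcribe(coding_strand: str) -> str:
    """mRNA from the coding (sense) strand: same sequence with U for T.

    Case is preserved, so an upper/lower-case exon/intron convention survives.
    """
    return coding_strand.translate(str.maketrans("Tt", "Uu"))


def transcribe_template(template_strand: str) -> str:
    """mRNA from the template (antisense) strand: pair each base
    (A->U, C->G, G->C, T->A), then reverse to read 5' to 3'."""
    pairing = str.maketrans("ACGTacgt", "UGCAugca")
    return template_strand.translate(pairing)[::-1]


print(transcribe("GATTACA"))           # GAUUACA
print(transcribe_template("TGTAATC"))  # GAUUACA
```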
Finding Protein Genes
Before the sequencing of genomes, protein genes were found experimentally. Now, new genes are predicted computationally using a gene model.
Building a Gene Model
Gene models for prediction are based on the structure of genes in DNA and their messenger RNAs (mRNAs). This structure includes exons, introns, promoters and the polyadenylation signal.
http://xray.bmc.uu.se/Courses/Bke2/Exercises/Exercise_answers/pre_mRNA_processing.gif
Exons
In this example, EXONS are uppercase and introns are lowercase. Exons contain the code for a protein; introns interrupt the exons. Before translation, introns are removed from the messenger RNA.
DNA: …ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca…
RNA (before removal of introns): …ACUGCUACAGucuauugaGAACAACAUAGucacgaacuuaacgugcaGUUUAACAGCACGucucgaagggca…
RNA (after removal of introns): …ACUGCUACAGGAACAACAUAGGUUUAACAGCACG…
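The intron removal in this example can be sketched in a few lines. This relies entirely on the slide's convention that exons are uppercase and introns lowercase (real pre-mRNA carries no such marks); the leading and trailing "…" are dropped, and the function names are hypothetical.

```python
import re

# Pre-mRNA from the example, without the elided ends ("…").
PRE_MRNA = ("ACUGCUACAGucuauugaGAACAACAUAGucacgaacuuaacgugca"
            "GUUUAACAGCACGucucgaagggca")


def splice(pre_mrna: str) -> str:
    """Drop introns, assuming exons are uppercase and introns lowercase."""
    return "".join(base for base in pre_mrna if base.isupper())


def introns(pre_mrna: str) -> list[str]:
    """Return the lowercase (intron) stretches in order."""
    return re.findall(r"[a-z]+", pre_mrna)


print(splice(PRE_MRNA))   # ACUGCUACAGGAACAACAUAGGUUUAACAGCACG
print(introns(PRE_MRNA))  # ['ucuauuga', 'ucacgaacuuaacgugca', 'ucucgaagggca']
```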
The sequence of an exon contains codons. Each codon is a triplet of nucleotides that codes for a single amino acid. Amino acids are the building blocks of a protein.
Finding Exons
http://en.wikipedia.org/wiki/Genetic_code
Genetic Code. There are 64 codons: 61 specify one of the twenty amino acids, and three are stop codons, which signal the end of translation.
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/code.gif
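To make the codon-to-amino-acid mapping concrete, here is a sketch that builds the standard genetic code as a lookup table and translates an mRNA in frame +1. The 64-letter string is the usual one-letter encoding of the standard code (NCBI translation table 1) with "*" for stop; `translate` and its `to_stop` flag are hypothetical names, not a library API.

```python
# Standard genetic code (NCBI translation table 1) in one-letter amino acid
# code. The first, second and third codon bases each run through U, C, A, G;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(rna: str, to_stop: bool = False) -> str:
    """Translate frame +1 codon by codon; a trailing partial codon is ignored.

    With to_stop=True, translation ends at the first stop codon, which is
    not included in the result.
    """
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCUUAA"))                   # MA*
print(translate("AUGGCUUAAGGG", to_stop=True))  # MA
```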
Open Reading Frame (ORF)
An open reading frame (ORF) is a sequence of codons that does not contain a stop codon.
[Figure: genetic code with alanine, threonine, glutamic acid, leucine, arginine, serine and STOP labelled]
Sequence: acggacucuagccuaaugugacgacugacauagguaaauucgcuc
Even though this sequence contains stop codons, they fall at different places in each reading frame; frame +3, for example, has only one stop codon, near its start.
frame +1: acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc
frame +2: a cgg acu cua gcc uaa ugu gac gac uga cau agg uaa auu cgc uc
frame +3: ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c
Very short ORFs are unlikely.
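The three forward reading frames of the example can be generated and scanned for stop codons with a short sketch. It looks only at the forward strand (frames +1 to +3) and measures the longest stop-free stretch in each frame; the helper names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# The example sequence from the slide.
SEQUENCE = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"


def codons(seq: str, frame: int) -> list[str]:
    """Codons of forward frame +1, +2 or +3 (frame +1 starts at the first
    base). A trailing partial codon is dropped."""
    s = seq.upper().replace("T", "U")
    return [s[i:i + 3] for i in range(frame - 1, len(s) - 2, 3)]


def longest_stop_free_run(seq: str, frame: int) -> int:
    """Length, in codons, of the longest stretch of the frame with no stop."""
    best = run = 0
    for codon in codons(seq, frame):
        if codon in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


for frame in (1, 2, 3):
    print(f"frame +{frame}:", " ".join(codons(SEQUENCE, frame)),
          "| longest stop-free run:", longest_stop_free_run(SEQUENCE, frame))
```

For this sequence the longest stop-free stretches are 6, 4 and 11 codons in frames +1, +2 and +3.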
Finding Introns
Introns usually start with GT (GU in the RNA) and end with AG.
Sequence: acggacucuagccuaaugugacgacugacauagguaaauucgcuc
A gene can contain open reading frames connected across stop codons by an intron.
frame +1: acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc
frame +3: ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c
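A sketch of the GT–AG rule applied to this sequence: list every span that starts with GU (GT in DNA) and ends with AG. This is only a naive filter; the rule alone proposes many false introns, and the slide does not say which span it intends, so the output is a list of candidates, not the intron. The function name is hypothetical.

```python
SEQUENCE = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"


def candidate_introns(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """0-based spans [start, end) that begin with GU and end with AG."""
    s = seq.lower().replace("t", "u")  # accept DNA as well as RNA
    donors = [i for i in range(len(s) - 1) if s[i:i + 2] == "gu"]
    acceptors = [i for i in range(len(s) - 1) if s[i:i + 2] == "ag"]
    return [(d, a + 2) for d in donors for a in acceptors
            if a >= d + 2 and a + 2 - d >= min_len]


for start, end in candidate_introns(SEQUENCE):
    print(start, end, SEQUENCE[start:end])  # 17 33 gugacgacugacauag
```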
How many genes are there?
Estimates
• pre 2000: 100,000, based on estimates of the number of genes required to account for human complexity
• 2001: 30,000 – 40,000, based on the first draft of the human genome
• 2003: 23,000 – 24,500, based on gene prediction computer programs
Why so low?
• alternative splicing of exons
• complex regulatory mechanisms
• inability to predict genes that are unlike those seen before
http://www.ornl.gov/sci/techresources/Human_Genome/faq/genenumber.shtml
RNA Genes
RNA genes do not code for proteins. Instead, the RNA molecule itself is functional in the cell.
Examples include:
1. Ribosomal RNA – these molecules form the major component of the protein-building machinery.
2. Transfer RNA – these molecules work with ribosomal RNA to insert the correct amino acids into growing proteins.
3. MicroRNA – a newly discovered class of RNA that helps regulate gene expression.
Ribosome
http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/RNA/images/fig_rna12.jpg
RNA Genes
MicroRNAs are short and show little or no sequence conservation.
Unlike protein genes, RNA genes do not contain codons or open reading frames, but they do contain inverted repeats.
Inverted Repeats (IRs)
RNA: G A C U U G A U C A A G U C
[Figure: the first pattern, complemented and then reversed, gives the second pattern]
Two patterns, one the reverse complement of the other
IR Nomenclature
RNA: G A C U U G A U C A A G U C
Left arm: GACUUG   Spacer: AU   Right arm: CAAGUC
Stem-Loop Structure
[Figure: the left and right arms pair to form the stem; the spacer forms the loop]
The structure forms by pairing of complementary bases.
MicroRNA
MicroRNAs come from a precursor that contains a stem-loop.
http://www.ma.uni-heidelberg.de/apps/zmf/argonaute/interface/mirna.jpeg
Detection of Approximate Inverted Repeats
Human Chr. 3 ~173,291,101
AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA
AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA
CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA
TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT
TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG
ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG
AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC
AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG
GCATTTCCCC CTACGT
Arms are 72 nt long; the spacer is 42 nt long.
The Problem: Find the Inverted Repeat
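The slides do not say how this repeat was found. As an illustration only, here is a brute-force sketch that scans for approximate inverted repeats with a given arm length and a range of spacer lengths, counting substitutions between the left arm's reverse complement and the candidate right arm. It ignores insertions and deletions, which real approximate repeats such as the 72 nt arms above usually contain, and it is slow on whole chromosomes; the function name and parameters are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len, min_spacer, max_spacer, max_mismatches):
    """Return (left_start, spacer_len, mismatches) for every left arm of
    length arm_len whose reverse complement matches the segment starting
    spacer_len bases after it with at most max_mismatches substitutions.
    Insertions and deletions are not handled."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_len - min_spacer + 1):
        left_rc = reverse_complement(seq[i:i + arm_len])
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + arm_len + spacer
            right = seq[j:j + arm_len]
            if len(right) < arm_len:
                break  # longer spacers run off the end as well
            mismatches = sum(a != b for a, b in zip(left_rc, right))
            if mismatches <= max_mismatches:
                hits.append((i, spacer, mismatches))
    return hits


# Toy sequence with a planted IR: GACCTG, spacer AAA, then CAGCTC, which is
# the reverse complement of GACCTG (CAGGTC) with one substitution.
toy = "TTTTGACCTGAAACAGCTCTTTT"
print(find_inverted_repeats(toy, 6, 2, 5, max_mismatches=1))
```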
Single Nucleotide Polymorphisms (SNPs)
A SNP is a single position in the genome (a locus) that is not the same in all people. Some people have one type of nucleotide and other people have a different nucleotide. Differences in the population at a single locus are called polymorphisms, and the individual types are called alleles.
SNPs are found experimentallySNPs are found experimentally
acattcct
acgttatt
[Figure: the positions where the two sequences differ are the SNPs]
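Once two sequences are aligned, the SNP positions are simply the columns where they differ. A minimal sketch, assuming the sequences are already aligned with no insertions or deletions (positions are 0-based, and the function name is hypothetical); finding real SNPs still requires the experimental work described above.

```python
def snp_positions(seq1: str, seq2: str) -> list[int]:
    """0-based positions where two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


print(snp_positions("acattcct", "acgttatt"))  # [2, 5, 6]
```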
Haplotypes
A haplotype is a collection of SNP alleles on a single chromosome in an individual.
Shown are the SNPs on the two chromosomes of each individual.
acgttcat
acattcat
tcgttcat
acagatat
acattcct
acattcct
atagtcca
acagtcca
tcattcat
acattcaa
Homozygous (same alleles)
Heterozygous (different alleles)
Rare alleles
Strong linkage (usually occur together)
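The haplotype properties above can be computed with a few counts. This sketch reads the ten haplotypes off the figure and assumes consecutive pairs belong to the same individual; the helper names are hypothetical, and the 10% cut-off for a "rare" allele is an arbitrary choice. D, the standard linkage disequilibrium coefficient (D = p_AB - p_A * p_B), is one common way to quantify how often two alleles occur together; the slides do not say which measure they use, and the SNP pair chosen here (positions 6 and 7) is arbitrary because the figure's highlighting is lost.

```python
from collections import Counter

# The ten haplotypes from the figure. Assumption: consecutive pairs are the
# two chromosomes of one individual.
HAPLOTYPES = [
    "acgttcat", "acattcat",
    "tcgttcat", "acagatat",
    "acattcct", "acattcct",
    "atagtcca", "acagtcca",
    "tcattcat", "acattcaa",
]


def allele_frequencies(haps, pos):
    """Fraction of haplotypes carrying each allele at SNP position pos."""
    counts = Counter(h[pos] for h in haps)
    return {allele: n / len(haps) for allele, n in counts.items()}


def rare_alleles(haps, max_freq=0.1):
    """(position, allele) pairs whose frequency is at most max_freq."""
    return [(pos, allele)
            for pos in range(len(haps[0]))
            for allele, freq in sorted(allele_frequencies(haps, pos).items())
            if freq <= max_freq]


def heterozygous_positions(hap1, hap2):
    """SNP positions where an individual's two chromosomes differ."""
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


def linkage_d(haps, pos1, allele1, pos2, allele2):
    """LD coefficient D = p(both alleles) - p(allele1) * p(allele2)."""
    n = len(haps)
    p1 = sum(h[pos1] == allele1 for h in haps) / n
    p2 = sum(h[pos2] == allele2 for h in haps) / n
    p12 = sum(h[pos1] == allele1 and h[pos2] == allele2 for h in haps) / n
    return p12 - p1 * p2


print(rare_alleles(HAPLOTYPES))                              # [(1, 't'), (4, 'a'), (5, 't')]
print(heterozygous_positions(HAPLOTYPES[0], HAPLOTYPES[1]))  # [2]
print(heterozygous_positions(HAPLOTYPES[4], HAPLOTYPES[5]))  # []
print(round(linkage_d(HAPLOTYPES, 6, "c", 7, "a"), 3))       # 0.08
```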
Linkage Analysis
SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis, and evidence of a connection is called linkage disequilibrium (LD).
mom: acattcat, atagtcca
dad: acagatat, tcattcat
child: acagtcca, acagacat
Recombination and inheritance
Recombination in the mother’s chromosomes
Recombination in the father’s chromosomes
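A single crossover can be modelled as taking the start of one parental haplotype and the rest of the other. The sketch below, with hypothetical helper names, uses the pedigree haplotypes under the assumption that the mother's chromosomes are acattcat and atagtcca and the father's are acagatat and tcattcat (the figure's layout is not fully recoverable), and lists the crossover points that explain each of the child's haplotypes.

```python
def crossover(first: str, second: str, point: int) -> str:
    """Haplotype taking SNPs [0, point) from first and the rest from second."""
    return first[:point] + second[point:]


def single_crossover_points(first: str, second: str, child: str) -> list[int]:
    """All points 1..n-1 at which one crossover turns first/second into child."""
    return [p for p in range(1, len(child))
            if crossover(first, second, p) == child]


# Mother's chromosomes -> child's haplotype acagtcca
print(crossover("acattcat", "atagtcca", 3))                           # acagtcca
print(single_crossover_points("acattcat", "atagtcca", "acagtcca"))    # [2, 3]
# Father's chromosomes -> child's haplotype acagacat
print(single_crossover_points("acagatat", "tcattcat", "acagacat"))    # [5]
```

For the mother, a crossover just before SNP 2 or SNP 3 (0-based) reproduces the child's haplotype; either way, SNPs 3 to 7 travel together from her second chromosome.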
Two to three crossovers per chromosome per generation
Key point: Alleles that are physically close together tend to be inherited together because the chance of a crossover between them is small. They exhibit strong linkage.
Finding an Unknown Disease Locus
For many diseases, the location on the genome is unknown. SNPs and haplotypes are being used to search for disease loci using linkage analysis.
The dad has the disease, and so does the child.
Linkage Analysis – Dominant Model
Assume the disease is caused by a dominant allele, meaning one copy is enough to cause the disease.
• SNP alleles in the father that are not in the mother
• SNP allele in the child, inherited from the father with the disease
• SNP allele and disease are linked, indicating a possible disease locus
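The dominant-model reasoning above can be written as a filter over SNP positions. This sketch assumes the figure rows belong to mom (acattcat, atagtcca), dad (acagatat, tcattcat) and child (acagtcca, acagacat), which is an inference because the layout is not fully recoverable; positions are 0-based and the function name is hypothetical.

```python
# Pedigree haplotypes (assumed assignment of figure rows to family members).
MOM = ("acattcat", "atagtcca")    # unaffected
DAD = ("acagatat", "tcattcat")    # has the disease
CHILD = ("acagtcca", "acagacat")  # has the disease


def dominant_candidates(affected_parent, unaffected_parent, affected_child):
    """(position, allele) pairs where the affected parent carries an allele
    the unaffected parent lacks, and the affected child inherited it."""
    hits = []
    for pos in range(len(affected_parent[0])):
        parent_only = ({h[pos] for h in affected_parent}
                       - {h[pos] for h in unaffected_parent})
        child_alleles = {h[pos] for h in affected_child}
        hits.extend((pos, allele) for allele in sorted(parent_only)
                    if allele in child_alleles)
    return hits


print(dominant_candidates(DAD, MOM, CHILD))  # [(4, 'a')]
```

The father carries alleles absent from the mother at positions 0, 4 and 5, but the child inherited only the one at position 4, which makes it the candidate locus in this toy pedigree.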
Linkage Analysis – Recessive Model
Assume the disease is caused by a recessive allele, meaning two copies are required to cause the disease.
• Homozygous SNP alleles in the father that are heterozygous in the mother
• Homozygous SNP allele in the child, identical to the father’s
• SNP allele and disease are linked, indicating a possible disease locus
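The recessive-model steps can be sketched the same way: keep positions where the affected father is homozygous, the mother is heterozygous but carries the father's allele, and the affected child is homozygous for that allele. The assignment of figure rows to mom, dad and child is an assumption (the layout is not fully recoverable), positions are 0-based, and the function name is hypothetical.

```python
# Pedigree haplotypes (assumed assignment of figure rows to family members).
MOM = ("acattcat", "atagtcca")    # assumed carrier
DAD = ("acagatat", "tcattcat")    # has the disease
CHILD = ("acagtcca", "acagacat")  # has the disease


def recessive_candidates(affected_parent, carrier_parent, affected_child):
    """(position, allele) pairs consistent with a recessive disease allele."""
    hits = []
    for pos in range(len(affected_parent[0])):
        p1, p2 = (h[pos] for h in affected_parent)
        c1, c2 = (h[pos] for h in carrier_parent)
        k1, k2 = (h[pos] for h in affected_child)
        if p1 == p2 and c1 != c2 and p1 in (c1, c2) and k1 == k2 == p1:
            hits.append((pos, p1))
    return hits


print(recessive_candidates(DAD, MOM, CHILD))  # [(1, 'c')]
```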
BMI = weight/height² in kg/m²; BMI > 25 is overweight, BMI > 30 is obese
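The BMI formula as a small sketch. The category labels use only the two thresholds quoted on the slide (> 25 and > 30); the "not overweight" label for everything else is a placeholder, not a clinical category, and the function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Uses only the two thresholds quoted on the slide."""
    if value > 30:
        return "obese"
    if value > 25:
        return "overweight"
    return "not overweight"


value = bmi(80, 2.0)
print(value, bmi_category(value))  # 20.0 not overweight
```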
Other Differences – Microdeletions
A microdeletion is the loss of a small piece of DNA, perhaps as small as 1000 bases. These pieces can contain genes, parts of genes, or regulatory regions.
[Figure: SNP haplotypes on two chromosomes, with stretches of SNPs missing where microdeletions occur]
Heterozygous: the deletion is on only one of the two chromosomes.
Homozygous: the deletion is on both chromosomes.
Miscalled homozygous: when one copy is deleted, the remaining allele can be genotyped as homozygous.
Apparent Inheritance Inconsistency
[Figure: SNP haplotypes of mom, dad and child]
aa + tt → at by Mendelian inheritance: a child of an aa parent and a tt parent should be at.
A cluster of inconsistencies suggests a microdeletion.
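The inconsistency check and the clustering step can be sketched as follows. Genotypes are two-letter strings per SNP, a child is consistent if one allele can come from each parent, and runs of consecutive inconsistent SNPs are reported as candidate microdeletions. The genotypes are made up for illustration (the figure's values are not recoverable), and both the minimum run of 2 and the use of SNP index instead of physical distance are assumptions; the function names are hypothetical.

```python
def is_mendelian(mom: str, dad: str, child: str) -> bool:
    """True if the child's genotype can take one allele from each parent."""
    c1, c2 = child
    return (c1 in mom and c2 in dad) or (c2 in mom and c1 in dad)


def inconsistent_positions(mom, dad, child):
    """SNP indices where the trio violates Mendelian inheritance."""
    return [i for i, trio in enumerate(zip(mom, dad, child))
            if not is_mendelian(*trio)]


def clusters(positions, min_run=2):
    """Runs of at least min_run consecutive SNP indices, as (first, last)."""
    runs, start, prev = [], None, None
    for p in positions:
        if prev is not None and p == prev + 1:
            prev = p
            continue
        if start is not None and prev - start + 1 >= min_run:
            runs.append((start, prev))
        start = prev = p
    if start is not None and prev - start + 1 >= min_run:
        runs.append((start, prev))
    return runs


# Hypothetical genotypes: at SNPs 2 and 3 the child shows only the mother's
# allele, as if the father's transmitted chromosome carried a deletion and
# the child's single remaining allele was miscalled homozygous.
MOM = ["aa", "ct", "aa", "gg", "cc", "at"]
DAD = ["at", "cc", "tt", "cc", "gg", "aa"]
CHILD = ["aa", "cc", "aa", "gg", "cg", "aa"]

bad = inconsistent_positions(MOM, DAD, CHILD)
print(bad)            # [2, 3]
print(clusters(bad))  # [(2, 3)]
```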
Microdeletions
Hundreds of microdeletion haplotypes have been discovered recently. They may be a major contributor to human differences and disease.
Resources
UCSC Human Genome Browser: http://genome.ucsc.edu/cgi-bin/hgGateway
National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/
PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed