Genetic Variations Resources

Developed by:

Ansuman Chattopadhyay, Ph.D.
Information Specialist in Molecular Biology and Genetics

Health Sciences Library System
University of Pittsburgh

Ansuman@pitt.edu

Introduction

Scientists expect that comparison of genomic sequences taken from two unrelated individuals will reveal that they are 99.9% identical. The 0.1% difference is due to genetic variations, and mainly one form of variation called single nucleotide polymorphisms.

These polymorphisms are considered one of the key factors that make each of us different, and they can have a major impact on how we respond to diseases; to environmental insults such as bacteria, viruses, and chemicals; and to drugs and other therapies. This makes genetic variations of great value for biomedical research and for developing pharmaceutical products or medical diagnostics.

This module will focus on human genetic variations and mainly cover Single Nucleotide Polymorphisms (SNP).

Objectives

At the end of this module participants will be able to:

• Understand the basic concepts behind different forms of genetic variations
• Understand the terminologies used by researchers studying genetic variations
• Identify genetic variation databases and interpret database search results
• Understand and use online resources for functional analysis of variation information
• Understand the significance of the International HapMap Project

Questions - A Few Examples

Participants will be able to answer questions like:

•Mutations in the BRCA1 gene have been reported to be associated with the early onset of breast cancer. Retrieve all non-synonymous and validated coding SNPs for BRCA1 from dbSNP.

•What disorders are caused by a mutation in the gene HFE? Do all known substitutions in this gene cause disease? How many SNPs have been located in the HFE gene?

•A gene variant found primarily in African Americans slightly increases the risk of developing an irregular heartbeat, known as arrhythmia. The variant occurs in the cardiac sodium channel gene SCN5A and changes the amino acid at position 1102 from serine to tyrosine (S to Y). Can you predict the effect of this non-synonymous SNP (rs7626962)?

Human Genetic Variations

Primarily two types of genetic mutation events create all forms of variations:

• Single base mutation which substitutes one nucleotide for another

-- Single Nucleotide Polymorphisms (SNP)

• Insertion or deletion of one or more nucleotide(s)

-- Tandem Repeat Polymorphisms

-- Insertion/Deletion Polymorphisms

Single Nucleotide Polymorphisms

Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA.
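The example above can be checked with a few lines of Python. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; the function name is made up for illustration.

```python
def find_single_base_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for every mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


# The sequences from the slide differ at position 2 (A to T).
print(find_single_base_differences("AAGGCTAA", "ATGGCTAA"))
```

A real comparison between individuals would start from a sequence alignment; this sketch only covers the single-substitution case described in the text.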

SNPs are the most common class of polymorphisms.

Tandem Repeat Polymorphisms

Tandem repeats, or variable number of tandem repeats (VNTRs), are a very common class of polymorphism consisting of sequence motifs of variable length that are repeated in tandem in a variable copy number.

VNTRs are subdivided into two subgroups based on the size of the tandem repeat unit:

• Microsatellites or Short Tandem Repeats (STR): repeat unit of 1-6 bp (e.g., the dinucleotide repeat CACACACACACA)

• Minisatellites: repeat unit of 14-100 bp

Example:

Spinocerebellar ataxia type 10 (SCA10) (OMIM: +603516) is caused by the largest tandem repeat expansion seen in the human genome. The normal population has 10-22 copies of the pentanucleotide ATTCT repeat in intron 9 of the SCA10 gene, whereas SCA10 patients have 800-4500 repeat units, which makes the disease allele up to 22.5 kb larger than the normal one.
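The size estimate in the SCA10 example follows from simple arithmetic on the repeat counts. The sketch below counts back-to-back copies of a motif and computes how much longer an expanded allele is. The toy sequence and the function names are illustrative assumptions, not real SCA10 sequence.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif found in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def expansion_size_bp(normal_units: int, expanded_units: int, motif: str) -> int:
    """Extra length, in base pairs, of the expanded allele relative to the normal one."""
    return (expanded_units - normal_units) * len(motif)


# Hypothetical allele carrying 12 copies of the ATTCT pentanucleotide.
toy_allele = "GGC" + "ATTCT" * 12 + "TTAG"
print(longest_tandem_run(toy_allele, "ATTCT"))   # 12 repeat units

# 4500 units versus 10 units of a 5 bp motif: 22,450 bp, i.e. roughly 22.5 kb.
print(expansion_size_bp(10, 4500, "ATTCT"))
```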

Insertion/Deletion Polymorphisms

Insertion/Deletion (INDEL) polymorphisms are quite common and widely distributed throughout the human genome.

Sequence repetitiveness in the form of direct or inverted tandem repeats has been shown to predispose DNA to localized rearrangements between homologous repeats. Such rearrangements are thought to be one of the mechanisms that create INDEL polymorphisms.

Example: An association between coronary heart disease and a 287 bp INDEL polymorphism located in intron 16 of the angiotensin converting enzyme (ACE) gene has been reported (OMIM 106180). This INDEL, known as ACE/ID, is responsible for 50% of the inter-individual variability of plasma ACE concentration.

Chromosome Aberrations

Gross chromosomal aberrations, such as deletions, inversions, or translocations involving large segments of DNA sequence, were thought to be quite rare. However, numerous clinically characterized genomic syndromes have been reported to be associated with chromosomal aberrations.

Example: Velocardiofacial syndrome (VCFS), characterized by features such as cleft palate, cardiac anomalies, and learning disabilities, is associated with a deletion on chromosome 22q11.2 (OMIM: 192430).

Estimated Numbers

•SNPs appear at average intervals of 0.3-1 kb. Given the size of the entire human genome, 3 × 10^9 bp, the total number scales up to 5-10 million (Altshuler et al., 2000).

•In silico estimates put the number of potentially polymorphic VNTRs at over 100,000 across the human genome.

•Short insertions/deletions are very difficult to quantify; their number is likely to fall between those of SNPs and VNTRs.

Polymorphisms and Disease Markers

•Very few of these polymorphisms have a direct impact on a deleterious phenotype.

•Non-disease-causing polymorphisms, when mapped to the genome, may serve as markers to identify and map other genes that do cause disease when mutated.

•If these non-disease-causing variations are found to be inherited with a particular trait, but do not cause the trait, they may provide evidence of where the trait's gene is located in the genome.

Single Nucleotide Polymorphisms (SNP)

Common Terminologies

Allele: Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent.

Polymorphism: Difference in DNA sequence among individuals.

Linkage Disequilibrium (LD): If two alleles tend to be inherited together more often than would be predicted by chance, then the alleles are in linkage disequilibrium.

Haplotype refers to the set of alleles on one particular chromosome. Each person has two haplotypes in a given region, and each haplotype will be passed on as a complete unit.
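"More often than would be predicted" can be made concrete. One common way to quantify linkage disequilibrium, not defined in this module and used here only as an illustration, is the coefficient D: the observed frequency of two alleles occurring together on a haplotype, minus the product of their individual frequencies. The haplotype counts below are invented.

```python
def linkage_disequilibrium(haplotypes, allele_1, allele_2):
    """D = P(allele_1 and allele_2 on the same haplotype) - P(allele_1) * P(allele_2).

    haplotypes is a list of (allele at locus 1, allele at locus 2) tuples.
    """
    total = len(haplotypes)
    p_1 = sum(h[0] == allele_1 for h in haplotypes) / total
    p_2 = sum(h[1] == allele_2 for h in haplotypes) / total
    p_both = sum(h == (allele_1, allele_2) for h in haplotypes) / total
    return p_both - p_1 * p_2


# Hypothetical sample of 10 chromosomes: A travels with B far more often than not.
sample = [("A", "B")] * 4 + [("a", "b")] * 4 + [("A", "b")] + [("a", "B")]
print(round(linkage_disequilibrium(sample, "A", "B"), 3))  # 0.15
```

A value of zero would mean the two alleles are inherited independently; a positive value means they are found together more often than expected.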

Transitions and Transversions

SNPs include single base substitutions such as:

•Transitions: change of one purine (A,G) for a purine, or a pyrimidine (C,T) for a pyrimidine;

•Transversions: change of a purine (A,G) for a pyrimidine (C,T), or vice versa.

•CpG dinucleotides are first methylated and then deaminated to form either CpA or TpG.

G>A and C>T transitions account for 25% of all SNPs in the human genome.
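The two definitions above translate directly into a small classifier. This is a minimal sketch; the function name is an assumption made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("G", "A"))  # transition (purine to purine)
print(classify_substitution("C", "T"))  # transition (pyrimidine to pyrimidine)
print(classify_substitution("A", "T"))  # transversion (purine to pyrimidine)
```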

SNPs and Mutations

Terminology for variation at a single nucleotide position is defined by allele frequency.

•A single base change, occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP).

•When a single base change occurs at <1% it is considered to be a mutation.
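The 1% rule can be written as a one-line test. The sketch below is illustrative only: the counts are invented, and a frequency of exactly 1%, which the definitions above leave open, is treated as a mutation by assumption.

```python
def classify_by_frequency(variant_count: int, total_chromosomes: int, threshold: float = 0.01):
    """Return (frequency, label) using the >1% SNP / <1% mutation convention.

    Assumption: a frequency exactly equal to the threshold is labeled "mutation".
    """
    frequency = variant_count / total_chromosomes
    label = "SNP" if frequency > threshold else "mutation"
    return frequency, label


# Hypothetical survey of 200 chromosomes (100 people).
print(classify_by_frequency(3, 200))  # frequency 1.5%: SNP
print(classify_by_frequency(1, 200))  # frequency 0.5%: mutation
```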

Life Cycle of SNPs and Mutations

Classification of SNPs

A SNP may occur at any position in a gene and, based on its location, can be classified as intronic, exonic, promoter-region, and so on.

Coding SNPs can be further subdivided into two groups:

•Synonymous: when the single base substitution does not cause a change in the resultant amino acid

•Non-synonymous: when the single base substitution causes a change in the resultant amino acid.
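The synonymous/non-synonymous distinction can be illustrated by translating the codon before and after the substitution. The sketch below builds the standard genetic code table and compares the two amino acids. The example codons are chosen for illustration and are not taken from any particular gene.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str):
    """Return (reference amino acid, variant amino acid, classification)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


print(classify_coding_snp("GAA", "GAG"))  # Glu to Glu: synonymous
print(classify_coding_snp("TCT", "TAT"))  # Ser to Tyr: non-synonymous
print(classify_coding_snp("GAG", "TAG"))  # Glu to stop (*): non-synonymous
```

In the table, "*" marks a stop codon, so a substitution that creates a premature stop is reported here as non-synonymous.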

Coding SNPs

Image from the Geospiza Green Arrow(TM) tutorial by Sandra Porter, Ph.D., on SNP or Sequencing Error

Genetic Variations Databases

SNP Databases

•dbSNP http://www.ncbi.nlm.nih.gov/SNP/index.html

•Human Genome Variation Database (HGVbase) http://hgvbase.cgb.ki.se/

•TSC: The SNP Consortium http://snp.cshl.org/

dbSNP

URL: http://www.ncbi.nlm.nih.gov/SNP/index.html

The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms.

This collection of polymorphisms includes:

•Single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs)

•Small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs),

•Microsatellite repeat variations (also called short tandem repeats or STRs).

dbSNP: Statistics

http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi

dbSNP Data Type

The SNP database has two major classes of content:

• Submitted data, i.e., original observations of sequence variation;

Submitted SNPs (SS) with ss# (ss 5586300)

• Computed data, i.e., content generated during the dbSNP "build" cycle by computation on original submitted data.

Reference SNP Clusters (Ref SNP) with rs# (rs 4986582)

Submitted SNP (ss) Details Page

•Provides information on the SNP and conditions under which it was collected.

•Provides links to collection methods (assay technique), submitter information (contact data, individual submitter), and variation data (frequencies, genotypes).

http://www.ncbi.nih.gov/SNP/snp_ss.cgi?subsnp_id=5586300

Reference SNP Clusters

•Ref SNPs are similar to records in RefSeq, as they are curated by NCBI staff.

•Ref SNP Clusters define a "non-redundant set of annotated markers".

•When SNPs are first submitted by a researcher, the SNP is given an ss#.

•Non-redundant SNPs are then provided a unique RefSNP number.

•Submitted SNPs that represent redundant data are instead deposited into the matching RefSNP cluster.

•RefSNP Clusters are identified by RS numbers.
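The relationship between ss# and rs# records can be pictured with a toy grouping routine. This is only a sketch of the idea that redundant submissions collapse into one cluster. It is not dbSNP's build procedure, and every identifier and coordinate below is hypothetical.

```python
from collections import defaultdict


def cluster_submissions(submissions, first_rs_number=1):
    """Group submitted SNPs that share a location into numbered clusters.

    submissions: iterable of (ss_id, chromosome, position) tuples.
    Returns a dict mapping a made-up rs identifier to its member ss identifiers.
    """
    by_location = defaultdict(list)
    for ss_id, chromosome, position in submissions:
        by_location[(chromosome, position)].append(ss_id)

    clusters = {}
    for offset, location in enumerate(sorted(by_location)):
        clusters[f"rs{first_rs_number + offset}"] = sorted(by_location[location])
    return clusters


# Two independent submissions of the same site plus one unique submission.
toy_submissions = [
    ("ss101", "17", 5000),
    ("ss102", "17", 5000),
    ("ss103", "17", 7200),
]
print(cluster_submissions(toy_submissions))
# {'rs1': ['ss101', 'ss102'], 'rs2': ['ss103']}
```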

The dbSNP Build Cycle

Image from The NCBI Handbook

Ref SNP Graphic Summary

Validation Methods

dbSNP Search Options

•http://www.ncbi.nlm.nih.gov/SNP/

•The NCBI Handbook

Entrez SNP

The dbSNP is now a part of the Entrez integrated information retrieval system and may be searched using either qualifiers (aliases) or a combination of 28 different search fields. A complete list of the qualifiers and search fields can be found on the Entrez SNP site.

Entrez SNP: Limit Options

•The extensive limits screen in Entrez SNP covers a variety of features, including:

-- Function Class (coding non synonymous; intron; etc.)
-- Chromosome (including W and Z for nonmammals)
-- Organisms
-- Observed Alleles (using IUPAC, International Union of Pure and Applied Chemistry, codes)
-- Map Weight (how many times in genome)
-- "Created" and "Updated" Builds
-- Records with links to other NCBI data domains (OMIM, Nucleotide, Protein, Structure, PubMed)
-- Type of validation
-- % Heterozygosity
-- Success Rate (likelihood that the SNP is real; = 1 minus false positive rate)
-- SNP Class
-- Method Class
-- Population Class
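The "Observed Alleles" limit relies on IUPAC nucleotide ambiguity codes. The lookup below is a minimal sketch that converts an allele string such as A/G into its one-letter code. The function name is an assumption, and how Entrez expects the code to be entered should be checked on the limits screen itself.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
ALLELES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC_CODES.items()}


def iupac_code(observed_alleles: str) -> str:
    """Convert a slash-separated allele string (e.g. 'A/G') to its IUPAC code."""
    alleles = frozenset(observed_alleles.upper().split("/"))
    return ALLELES_TO_CODE[alleles]


print(iupac_code("A/G"))  # R
print(iupac_code("C/T"))  # Y
```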

SNP Class and Method Class

Q1. Find SNPs for a gene

Mutations in the BRCA1 gene have been reported to be associated with the early onset of breast cancer. Retrieve all non-synonymous and validated coding reference SNPs for BRCA1 from dbSNP.

dbSNP

Answer

Flow Chart

1. dbSNP - Search dbSNP to find SNP records for a gene

Step By Step Guide

1. dbSNP
• Enter "BRCA1 [Gene Name]" in the search box
• Click on "Limits"
• Go to "Function class" and select "coding nonsynonymous"
• Go to "Organism(s)" and select "Homo Sapiens"
• Go to "Validation" and select all options except "no info"
• Click on "Details" and review "Query Translation"
• Click on "GO"
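The same search can in principle be expressed as a single Entrez query string. The sketch below only assembles the text of such a query and a URL for NCBI's E-utilities esearch endpoint; it does not contact the server. Apart from "BRCA1 [Gene Name]", the field tags are assumptions modeled on the limit labels in the guide and may not match the tags Entrez actually uses, so compare the result with the "Query Translation" shown under "Details". The validation limit is left out.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_snp_query(gene: str, function_class: str, organism: str) -> str:
    """Assemble an Entrez-style query; the [Function Class] and [Organism] tags are assumed."""
    return (
        f'{gene}[Gene Name] AND "{function_class}"[Function Class] '
        f'AND "{organism}"[Organism]'
    )


def build_esearch_url(term: str, database: str = "snp") -> str:
    """Return the esearch URL for the given query without sending it."""
    return f"{EUTILS_ESEARCH}?{urlencode({'db': database, 'term': term})}"


query = build_snp_query("BRCA1", "coding nonsynonymous", "Homo sapiens")
print(query)
print(build_esearch_url(query))
```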

dbSNP Search Result

Ref SNP Cluster Report

• http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=4986852

Ref SNP FASTA Sequence

RefSNP Variation and Validation Summary

Gene-Oriented SNP Visualization

SNP: GeneView

GeneView for All SNPs

Display of all known ref SNPs overlaid on the gene structure

Genome-Oriented SNP Visualization

RefSNP Summary Info

Map Viewer Icon for SNPs (1)

Map Viewer Icon for SNPs (2)

Map Viewer Icon for SNPs (3)

Map Viewer Icon for SNPs (4)

NCBI and dbSNP

Question 2: Genome oriented SNP visualization

Mutations in the Dopamine Receptor 5 (DRD5) gene have been observed in patients with various neurological disorders. Search dbSNP and find how many refSNP records have been reported for DRD5. Show all refSNPs in the context of a chromosome.

Answer

Flow Chart

1. Entrez Gene - Search the Entrez Gene database to find gene-centered information and use links to access dbSNP and Map Viewer

2. Entrez SNP - Find SNP information for a gene

3. Map Viewer - Display all SNPs in the context of a chromosome

Step By Step Guide

1. Entrez Gene
• Enter "DRD5" in the search box
• Click on "Limits"
• Select "Gene Name" from the drop-down list of "To limit your search to a specific field"
• Go to "Limit by Taxonomy" and select "Homo sapiens"
• Click "Go"
• Click on "DRD5" in the "Entrez Gene" search result to view gene information
• Click on "Links" and select "SNP" to retrieve all SNP records from dbSNP
• Click on "Links" and select "GeneView in dbSNP" to find the location of SNPs on the gene
• Click on "Links" and select "Map Viewer" to display all SNPs in Map Viewer

2. Map Viewer
• Click on "Map and Options" (in the left side bar)
• In the pop-up window that appears, select "Variation", available under "Sequence Maps" in the "Available Maps" section, and click on the "ADD" button to include it in the "Maps Displayed (left to right)" box
• Select "Variation" in the "Maps Displayed (left to right)" box and click on the "Make Master/Move to Bottom" button
• Click on the "Apply" button

Map Viewer Display Option

SNP: Genome View

SNP: Chromosome Report

http://www.ncbi.nlm.nih.gov/SNP/maplists/maplist-newmap.html

Online Mendelian Inheritance in Man (OMIM)

OMIM: A Brief Overview

•URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

•Online Mendelian Inheritance in Man is a full-text knowledgebase of human genes and genetic disorders.

•The resource was created by Dr. Victor McKusick (Johns Hopkins University), and is curated by McKusick and colleagues.

•The print version, Mendelian Inheritance in Man, was first published in the 1960s; the web-based version has been available since 1995.

•Each record in OMIM serves as a summary of the current state of knowledge of the gene or disorder; records may be thought of as review articles.

•Records contain information on a variety of topics, including:

-- description and clinical features of the gene or disorder;
-- biochemical and other features;
-- cytogenetics and mapping;
-- molecular and population genetics;
-- diagnosis and clinical management;
-- animal models for the disorder;
-- and allelic variants.

•OMIM is searchable via Entrez, and records link to other NCBI resources.

OMIM Statistics

OMIM Allelic Variants

•The OMIM database includes genetic disorders caused by all levels of mutation/variation, from nucleotide substitutions to large-scale chromosomal abnormalities.

•The allelic variants that are listed in and searchable through the "Allelic Variants" field tend to be more "SNP-like":

-- nucleotide substitutions;
-- small insertions and deletions (indels);
-- frame shifts caused by these indels.

•Allelic variants are represented by a 10-digit OMIM number (e.g., 141900.0003): the six-digit OMIM number of the parent locus, followed by a decimal point and a 4-digit variant number.
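The numbering scheme is regular enough to split mechanically. The sketch below parses an allelic variant number into its parent locus and variant parts. The class and function names are made up for illustration.

```python
import re
from typing import NamedTuple


class AllelicVariantId(NamedTuple):
    locus: str     # six-digit OMIM number of the parent locus
    variant: int   # variant number (written with four digits after the decimal point)


ALLELIC_VARIANT_PATTERN = re.compile(r"^(\d{6})\.(\d{4})$")


def parse_allelic_variant(number: str) -> AllelicVariantId:
    """Split an OMIM allelic variant number such as '141900.0003' into its parts."""
    match = ALLELIC_VARIANT_PATTERN.match(number.strip())
    if match is None:
        raise ValueError(f"not a 6-digit.4-digit allelic variant number: {number!r}")
    return AllelicVariantId(locus=match.group(1), variant=int(match.group(2)))


print(parse_allelic_variant("141900.0003"))
# AllelicVariantId(locus='141900', variant=3)
```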

Finding Allelic Variants in OMIM

•OMIM is an Entrez database, and follows Entrez searching conventions.

•Allelic variants may be retrieved from the database in two main ways:

-- Search for a particular topic of interest (a gene, disease, etc.) and, when it is retrieved, view its allelic variants.

-- Use the limits screen to limit the initial search to retrieve only records that contain allelic variant information, or to search for particular terms within the allelic variants field.

OMIM Search Example

•Search OMIM for information on the gene GLUCOSE-6-PHOSPHATE DEHYDROGENASE; G6PD. What known disorders are caused by allelic variants in this gene?

OMIM Search Result

•Our search retrieves records related to this gene; +305900 is the record specific to the gene.

•Click the hyperlinked OMIM number to retrieve the record of interest.

OMIM Search Contd

•Once a record is retrieved, note the link to "Allelic Variants" on the blue border to the left of the screen.

•Clicking on "Allelic Variants" will lead to a discussion of selected variants.

•Clicking on the "View List" link will retrieve a list of variants; each variant is linked to its description.

Viewing the List of Allelic Variants

•The List of Allelic Variants provides the:

-- OMIM allelic variant number;
-- name of associated disorder or condition;
-- gene symbol;
-- type/location of variant.

Viewing the List of Allelic Variants Contd

•Click the hyperlinks to retrieve the record of interest.

Limitations of "Allelic Variants" in OMIM

•Only "selected" mutations are included in the "allelic variants" subset of the database. Included variants are chosen on the following criteria:

-- the first mutation to be discovered;
-- high population frequency;
-- distinctive phenotype;
-- historic significance;
-- unusual mechanism of mutation;
-- unusual pathogenic mechanism;
-- distinctive inheritance.

•Few neutral polymorphisms are included in OMIM; most included allelic variants are known disease-producing mutations.

Mutation Database Catalogs

•Nucleic Acids Research Database Issue http://nar.oupjournals.org/content/vol32/suppl_1/

•HUGO Mutation Database Initiative http://www2.ebi.ac.uk/mutations/cotton/dblist/dblist.html

Genetic Variations and Map Viewer

NCBI Map Viewer offers integration of variation data from several data sources. The list includes:

• dbSNP (Variation)
• Mitelman Breakpoint
• OMIM (Morbid/Disease)

Question 3: How to create a genetic variation map

Generate an integrated variation map with reference SNPs, Mitelman breakpoints, and OMIM diseases for chromosome 17, region 7,773,000-7,792,000 bp. What gene(s) have you found in this region?

Answer

Step By Step Guide

Map Viewer
• Click on "Homo sapiens (Human)", which appears under the "mammals" node in the tree diagram
• Click on chromosome 17
• Specify the region by entering "7773,000" and "7792,000", respectively, in the "Region Shown" boxes in the left side bar
• Click on the "Go" button
• Click on "Map and Options" (in the left side bar)
• In the pop-up window that appears, select "Variation", "Mitelman Breakpoints", and "OMIM/Morbid Diseases" from the "Available Maps" section
• Click on the "ADD" button to include the selected map options in the "Maps Displayed (left to right)" box
• Select "Variation" in the "Maps Displayed (left to right)" box and click on the "Make Master/Move to Bottom" button
• Click on the "Apply" button

Integrated Variations Map

Functional Analysis of Polymorphisms

Gene Structure

SNPs and The Structure of a Gene

A Decision Tree for SNP Analysis

Exonic Splicing Enhancer/Silencer

Question 4: Functional analysis of a SNP

A gene variant found primarily in African Americans slightly increases the risk of developing an irregular heartbeat, known as arrhythmia. The variant occurs in the cardiac sodium channel gene SCN5A and changes the amino acid at position 1102 from serine to tyrosine (S to Y). Can you predict the effect of this non-synonymous SNP (rs7626962)?

Answer

Flow Chart

1. Entrez SNP - Search Entrez SNP by refSNP ID to find SNP information

2. Entrez Protein - Find protein information, including its amino acid sequence and the presence of functional domains

3. NCBI Amino Acid Explorer - Compare amino acids in terms of physicochemical properties

4. NCBI Mutation Analyzer - Predict the effect of an amino acid change on the protein structure

5. TMHMM Server v. 2.0 - Predict the presence of transmembrane helices in a protein sequence

6. Russell et al., Amino Acid Properties Table - Predict the effect of an amino acid change on the protein structure

Step By Step Guide

1. Entrez SNP
• Enter "rs7626962" in the search box and click on "GO"
• Click on "GeneView" and note the amino acid change at position 1102
• Click on "NP_000326", which appears in the Protein column in GeneView, to view the protein information for SCN5A in the Entrez Protein database

2. Entrez Protein
• Select "FASTA" from the "Display" drop-down menu and click on "Display" to get the amino acid sequence for SCN5A
• Click on "Domains" to see the conserved domains present in the protein sequence
• Select the domain that covers amino acid position 1102 and click on the domain to view the sequence alignment. Check whether "Ser" at position 1102 is conserved among the family members

3. NCBI Amino Acid Explorer
• Go to the "Compare" option in the left side bar, select "S-Ser" to "Y-Tyr", and click "Compare"

4. NCBI Mutation Analyzer
• Select "ser" to "tyr" and click on the "Mutate" button
• On the "results of mutating serine to tyrosine" page, note the color that indicates the amino acid substitution score based on the BLOSUM62 matrix

5. TMHMM Server v. 2.0
• Copy the FASTA-formatted sequence for SCN5A from the Entrez Protein step and paste it into the sequence submission box
• Select output format "Extensive, with graphics" and click "Submit"
• Find the topology (transmembrane helix/inside/outside) around position 1102

6. Russell et al., Amino Acid Properties Table
• Click on "S" in the "Overview of Amino Acid Properties" figure
• Check the substitution score for Ser to Tyr
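Part of this workflow can be mimicked offline. The sketch below parses a substitution written as S1102Y, applies it to a sequence after checking that the reference residue matches, and looks up a substitution score in a small hand-copied excerpt of the BLOSUM62 matrix. The peptide is a made-up toy, not SCN5A, and the excerpt holds only the four pairs shown; a full matrix from a sequence-analysis library would be used in practice.

```python
import re

# Hand-copied excerpt of BLOSUM62 (assumption: only these pairs are included).
BLOSUM62_EXCERPT = {("S", "S"): 4, ("S", "T"): 1, ("S", "Y"): -2, ("Y", "Y"): 7}


def parse_substitution(notation: str):
    """Split a substitution such as 'S1102Y' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", notation.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized substitution: {notation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(protein: str, notation: str) -> str:
    """Return the protein with the substitution applied (positions are 1-based)."""
    ref, position, alt = parse_substitution(notation)
    if not 1 <= position <= len(protein) or protein[position - 1] != ref:
        raise ValueError(f"residue {position} of the sequence is not {ref}")
    return protein[: position - 1] + alt + protein[position:]


def excerpt_score(first: str, second: str):
    """Look up a pair in the excerpt in either order; None if it is not included."""
    return BLOSUM62_EXCERPT.get((first, second), BLOSUM62_EXCERPT.get((second, first)))


print(parse_substitution("S1102Y"))        # ('S', 1102, 'Y')
print(apply_substitution("MKSLV", "S3Y"))  # toy peptide: MKYLV
print(excerpt_score("S", "Y"))             # -2, a disfavored substitution
```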

SNP Gene View for SCN5A

Sequence Alignment

Amino Acid Comparison Text View

NCBI Amino Acid Explorer

Amino Acid Comparison Graphic View

NCBI Amino Acid Explorer

Substitution Matrix

Amino Acid Properties Table: http://www.russell.embl.de/aas/

Substitution Preferences for Ser

Pharmacogenomics

Pharmacogenomics is a science that examines the inherited variations in genes that dictate drug response and explores the ways these variations can be used to predict whether a patient will have a good response to a drug, a bad response to a drug, or no response at all.

SOURCE: NCBI A Science Primer

Pharmacogenomics: Example

PharmGKB

The Pharmacogenetics and Pharmacogenomics Knowledge Base: URL: http://www.pharmgkb.org/index.jsp

PharmGKB: Search Example

Search PharmGKB for Albuterol:

HapMap

Haplotype Study

Whole-genome genotyping of 10 million SNPs

•Technologically daunting

•Prohibitively expensive

Researchers are trying to downsize the problem of genome-wide genotyping by studying haplotypes.

A haplotype is a contiguous, linear set of SNP alleles along a genome that is inherited as a block.

Genetic Terminologies

Allele: Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent.

Genotype: Each person has two copies of all chromosomes except the sex chromosomes. The set of alleles that a person has is called a genotype.

The term genotype can refer to the SNP alleles that a person has at a particular SNP, or for many SNPs across the genome.

Genotyping: A method that discovers what genotype a person has is called genotyping.

•Sets of nearby SNPs on the same chromosome are inherited in blocks, called haplotype blocks, which are 12 or more kb long.

•Between 65% and 85% of the human genome is organized in haplotype blocks.

•Each block comes in three or four common versions that capture the majority of genetic diversity throughout the entire human population.

Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the haplotypes in a block. The specific SNPs that identify the haplotypes are called tag SNPs.
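The idea of tag SNPs can be demonstrated with a brute-force search for the smallest set of sites that tells every haplotype in a block apart. The haplotypes below are invented, and real tag-SNP selection methods are more sophisticated; this exhaustive search is only practical for small blocks.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of 0-based site indexes that distinguishes all haplotypes."""
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == n_distinct:
                return sites
    return tuple(range(n_sites))


# Four hypothetical haplotypes across a block of six SNPs.
block = ["AACGTA", "AATGCA", "GACGTC", "GATGCC"]
print(find_tag_snps(block))  # (0, 2): genotyping two SNPs identifies all four haplotypes
```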

Haplotype Blocks

International HapMap Project

The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation.

The HapMap will be a tool that will allow researchers to find genes and genetic variations that affect health and disease.

The HapMap Home Page

URL: http://www.hapmap.org/index.html.en

HapMap Project: Population and Sample

The DNA samples for the HapMap will come from a total of 270 people:

•Yoruba people in Ibadan, Nigeria (30 both-parent-and-adult-child trios)

•Japanese in Tokyo (45 unrelated individuals)

•Han Chinese in Beijing (45 unrelated individuals)

•U.S. residents with ancestry from Northern and Western Europe, from the Centre d'Etude du Polymorphisme Humain (CEPH) collection (30 trios)

SOURCE: International HapMap Project http://www.hapmap.org/abouthapmap.htm
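The total of 270 people follows from the sample sizes listed above, counting each trio as three individuals. The short check below uses shortened population labels of my own.

```python
TRIO_SIZE = 3  # two parents and one adult child

hapmap_samples = {
    "Yoruba, Ibadan": 30 * TRIO_SIZE,
    "Japanese, Tokyo": 45,
    "Han Chinese, Beijing": 45,
    "CEPH, U.S. residents": 30 * TRIO_SIZE,
}

total_people = sum(hapmap_samples.values())
print(total_people)  # 270
```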

HapMap Project: Scientific Strategy

To develop the HapMap, the samples will be genotyped for at least 1 million SNPs across the human genome.

When the Project started, 2.8 million SNPs were in the public database dbSNP. However, many chromosome regions had too few SNPs, and many SNPs were too rare to be useful, so millions of additional SNPs were needed to develop the HapMap.

The Project discovered another 2.8 million SNPs by September of 2003, and SNP discovery continues.

Participating Centers: Canada, China, Japan, the United Kingdom, and the United States

The Project initially will produce a map of 600,000 SNPs evenly spaced across the genome, which is a density of one SNP every 5000 bases.

SOURCE: International HapMap Project http://www.hapmap.org/abouthapmap.htm
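The quoted density can be reproduced by dividing the genome size by the number of evenly spaced SNPs. The sketch assumes a genome of about 3 billion bases, a round figure used only for this estimate.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumption: approximate size of the human genome


def average_spacing_bp(n_snps: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Average distance between SNPs if they were spread evenly across the genome."""
    return genome_size_bp / n_snps


print(average_spacing_bp(600_000))    # 5000.0 bases between SNPs in the initial map
print(average_spacing_bp(1_000_000))  # 3000.0 bases at the 1 million SNP target
```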

Question 5: Identify disease-causing mutations

What disorders are caused by a mutation to the gene HFE? Do all known substitutions in this gene cause disease? How many SNPs have been located in the HFE gene?

Answer

Question 6: Analysis of exonic SNPs

Germ line mutations in the BRCA1 gene lead to a predisposition to breast and ovarian cancer. A single point mutation, a G to T substitution in exon 18 at nucleotide 5199 (codon 1694), has been observed in a group of breast and ovarian cancer patients. This mutation changes a glutamic acid to a stop codon (Glu 1694 ter). Further study revealed that instead of expressing any transcript with exon 18 containing the stop codon, the mutant allele produces only mRNA in which the entire exon 18 has been skipped. Explain the cause of this exon skipping phenomenon.

Answer

Question 7: Find SNPs in a given base pair range on an assembled genome

Have any SNPs been discovered on mouse chromosome 5 between chromosome positions 38000000 and 39000000? Which of these SNPs have observed alleles of A/G? Can these SNPs be viewed on a map?

Answer
