SDD1.101


  • 7/27/2019 SDD1.101

    1/7

    SNPPEB 1.1

    Software Design Document

    (current document version 1.101)

    Document update history:

    Version 1.0
    Created by Tony on Aug 4, 2004
    Description: First draft for general idea

    Version 1.01
    Modified by Tony on September 29, 2004
    Draw flowchart to show design, and also address the issues in SRS 1.01

    Version 1.011
    Modified by Tony on Oct 1, 2004
    Still addresses SRS 1.01
    More detailed module design in section 5.

    Version 1.012
    Modified by Tony on Oct 6, 2004
    Still addresses SRS 1.01
    Modify module design in section 5 from version 1.011.

    Version 1.1
    Modified by Tony on Jan 31, 2005
    Addresses SRS 1.1
    Major changes:
    1. Provide service to the new Beckman GenomeLab SNPstream Genotyping System
    2. Set up local databases instead of using XML files from NCBI

    Version 1.101
    Modified by Tony on Mar 10, 2005
    Still addresses SRS 1.1
    Redefine DB design


    1. Description

    The requirements in the SRS will be fully addressed in this software design document; where a requirement cannot be met, an alternative solution will be given. We will use reference sequence data from NCBI, in FASTA and XML files, to set up our local databases. Also, "Primer3" (http://frodo.wi.mit.edu/primer3/primer3_code.html) will be integrated into our application within the usage conditions of its copyright document.

    2. Function Design

    In this version of the design document, we present a preliminary design that addresses the issues in SRS 1.1, and draw a design flowchart.

    a. Input and criteria

    Input:

    List of SNP ids

    Two locations in a chromosome (Two STS markers?)

    Criteria:

    Orientation: Original, all forward, all reverse

    SNP types: (any combination)

    6 types

    Exclude coding SNPs?

    Flanking sequence length?

    Number of SNPs to be separated

    Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
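
    The input and criteria listed above could be carried through the application as a single validated object. The following is only a sketch under assumptions: the class name `SnpQuery`, its field names, and the default flanking length of 200 are illustrative and do not come from the SRS or the prototype. The six SNP types and the "number of SNPs to be separated" criterion are left out because this document does not define them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Orientation(Enum):
    """The three orientation choices listed under Criteria."""
    ORIGINAL = "original"
    FORWARD = "forward"
    REVERSE = "reverse"


@dataclass
class SnpQuery:
    """Hypothetical container for the form input; names are illustrative."""
    snp_ids: List[str] = field(default_factory=list)
    # (chromosome, start, end) when querying by two chromosome locations
    region: Optional[Tuple[str, int, int]] = None
    orientation: Orientation = Orientation.ORIGINAL
    exclude_coding: bool = False
    flank_length: int = 200  # assumed default, not taken from the document

    def validate(self) -> None:
        """Raise ValueError if the query cannot be answered as given."""
        if not self.snp_ids and self.region is None:
            raise ValueError("provide SNP ids or a chromosome region")
        if self.region is not None:
            _, start, end = self.region
            if start < 1 or end < start:
                raise ValueError("region must satisfy 1 <= start <= end")
        if self.flank_length <= 0:
            raise ValueError("flank_length must be positive")
```

    A query would be built from the submitted form and checked with `validate()` before any database access.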


    b. Query local databases

    Database name: snppeb

    i. ER:

    ii. Table and column definition:

    Table genome_contig:

    accession: accession.version format, example: NT_077402.1
    ctg_id: internal ID, example: CONTIG:77451
    tax_id: 9606 is Homo sapiens
    ctg_length: length of contig
    chr: chromosome. Un is not placed on any chromosome
    chr_from: chromosome coordinate, reported in 1-based coordinates, starts from 1. 0 means not localized or placed on any chromosome
    chr_to: chromosome coordinate, reported in 1-based coordinates. 0 means not localized or placed on any chromosome
    orient: +, -, 0, where 0 indicates uncertainty in orientation
    assembly: this value is used to associate contigs with a particular assembly (e.g., reference assembly vs alternate assemblies provided by other groups or representing other haplotypes)
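
    As an illustration of how the genome_contig columns might be declared and queried for a chromosome interval (the "two locations in a chromosome" input), here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it runs stand-alone. The column types, the helper name `contigs_overlapping`, and the sample coordinates are assumptions; only the column names and the example accession come from the definition above.

```python
import sqlite3

# Column names follow the table definition; the types are assumptions.
SCHEMA = """
CREATE TABLE genome_contig (
    accession  TEXT,     -- accession.version, e.g. NT_077402.1
    ctg_id     TEXT,     -- internal ID, e.g. CONTIG:77451
    tax_id     INTEGER,  -- 9606 is Homo sapiens
    ctg_length INTEGER,
    chr        TEXT,     -- 'Un' when not placed on any chromosome
    chr_from   INTEGER,  -- 1-based; 0 means not localized
    chr_to     INTEGER,  -- 1-based; 0 means not localized
    orient     TEXT CHECK (orient IN ('+', '-', '0')),
    assembly   TEXT
);
"""


def contigs_overlapping(conn, chrom, start, end):
    """Return accessions of placed contigs that overlap [start, end]."""
    rows = conn.execute(
        "SELECT accession FROM genome_contig "
        "WHERE chr = ? AND chr_from > 0 AND chr_from <= ? AND chr_to >= ? "
        "ORDER BY chr_from",
        (chrom, end, start),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Sample row: the coordinates and length are made up for the demonstration.
conn.execute(
    "INSERT INTO genome_contig VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("NT_077402.1", "CONTIG:77451", 9606, 1000, "1", 5001, 6000, "+", "reference"),
)
print(contigs_overlapping(conn, "1", 5500, 7000))
```

    The `chr_from > 0` condition skips contigs that are not localized, following the convention that 0 means "not placed".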

    Table genome_contig_set:

    accession: accession.version format, example: NT_077402.1


    segment_id: this is associated with ctg_from and ctg_to.
        Let ctg_from <= m <= ctg_to
        segment_id = int((m - 1) / 200 + 1);

    #ctg_from: contig coordinate, reported in 1-based coordinates, starts from 1. Not added in DB, can be calculated from segment_id:
        ctg_from = 200 * (segment_id - 1);

    #ctg_to: contig coordinate, reported in 1-based coordinates. Not put in DB. Can be calculated from segment_id and seq:
        ctg_to = 200 * (segment_id - 1) + seq.length - 1;

    seq: sequence segment from contig, lower case means repetitive
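
    The segment arithmetic can be checked with a few lines of code. This sketch reads the formulas as segment_id = int((m - 1) / 200 + 1), ctg_from = 200 * (segment_id - 1) and ctg_to = 200 * (segment_id - 1) + seq.length - 1. The function names are hypothetical. Read this way, ctg_from and ctg_to are each one less than the 1-based position of the first and last base of the segment (segment 1 gives 0 to 199 for positions 1 to 200); whether one should be added to get 1-based values is an open question that this sketch does not settle.

```python
SEGMENT_SIZE = 200  # bases stored per genome_contig_set row


def segment_id(m: int) -> int:
    """Segment holding 1-based contig position m: int((m - 1) / 200 + 1)."""
    if m < 1:
        raise ValueError("contig positions are 1-based")
    return (m - 1) // SEGMENT_SIZE + 1


def segment_bounds(seg_id: int, seq_length: int):
    """ctg_from and ctg_to computed exactly as the formulas are read here."""
    ctg_from = SEGMENT_SIZE * (seg_id - 1)
    ctg_to = SEGMENT_SIZE * (seg_id - 1) + seq_length - 1
    return ctg_from, ctg_to


def segments_for_range(start: int, end: int):
    """All segment ids needed to cover 1-based positions start..end."""
    return list(range(segment_id(start), segment_id(end) + 1))
```

    For example, a flanking region covering contig positions 150 to 450 needs the rows with segment ids 1, 2 and 3.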

    Table snp_info:

    id: rs#
    tax_id: species id, 9606 for human
    build_create: build to create this SNP
    build_update: last build to update this SNP
    allele_1, allele_2: nucleotides in SNP site, 1 and 2 are in alphabetical order (example: A C, not C A)
    allele_1_frq: average frequency of allele_1
    allele_2_frq: average frequency of allele_2
    frq_count: number of all chromosomes contributing to frequency calculation.
    validated_pop: T|F, at least one ss in cluster was validated by independent assay
    validated_frq: T|F, at least one subsnp in cluster has frequency data submitted
    validated_clu: T|F, cluster has 2+ submissions, with 1+ submission assayed with a non-computational method
    validated_2h2: T|F, all alleles have been observed in 2+ chromosomes
    validated_hap: T|F, validated by HapMap project
    ctg_accession: mapping contig in accession.version format, example: NT_077402.1
    ctg_chr: chromosome of mapping contig
    ctg_loc: snp location mapped to contig
    chr_loc: snp location mapped to chromosome
    ctg_ori: orientation of snp and flanking sequence to contig
    ctg_fxn: functional relationship of SNP to genes at contig location: locus-region | coding | coding-synon | coding-nonsynon | mrna-utr | intron | splice-site | reference | exception
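
    Two of the snp_info conventions translate directly into small helpers: the alphabetical ordering of allele_1 and allele_2, and the "Exclude coding SNPs?" criterion applied to ctg_fxn. The sketch below assumes each row is available as a dictionary keyed by column name, and that the three ctg_fxn values beginning with "coding" are the ones to exclude. Both points are assumptions, and the helper names are hypothetical.

```python
# ctg_fxn values treated as "coding" (an assumption about the criterion).
CODING_CLASSES = {"coding", "coding-synon", "coding-nonsynon"}


def ordered_alleles(a: str, b: str):
    """Return the two alleles in alphabetical order, e.g. ('A', 'C')."""
    first, second = sorted((a.upper(), b.upper()))
    return first, second


def exclude_coding(snps):
    """Keep only rows whose ctg_fxn is not one of the coding classes."""
    return [snp for snp in snps if snp["ctg_fxn"] not in CODING_CLASSES]


rows = [
    {"id": "rs1", "ctg_fxn": "intron"},
    {"id": "rs2", "ctg_fxn": "coding-synon"},
    {"id": "rs3", "ctg_fxn": "mrna-utr"},
]
print([snp["id"] for snp in exclude_coding(rows)])
```

    The rs numbers here are placeholders, not real dbSNP records.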

    Table snp_flanking:

    id: rs#
    side: 5|3, the 5' or 3' side
    fragment: index number of a fragment of a flanking sequence, in order. The 5' side starts from the far end and runs toward the SNP site; the 3' side starts from the site immediately neighboring the SNP.
    seq: fragment of flanking sequence
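
    Because fragments on both sides are numbered in reading order (far end toward the SNP on the 5' side, SNP outward on the 3' side), a flanking sequence can be rebuilt by concatenating its fragments in ascending index order. The sketch below assumes rows arrive as (side, fragment, seq) tuples with side stored as the integer 5 or 3. The function names are hypothetical, and the reverse complement is included only to illustrate the "all forward / all reverse" orientation criterion.

```python
# Complement table that preserves case (lower case marks repeats).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a sequence, keeping the case of each base."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_flank(rows, side: int) -> str:
    """Join the fragments of one side (5 or 3) in ascending index order."""
    parts = sorted((frag, seq) for s, frag, seq in rows if s == side)
    return "".join(seq for _, seq in parts)


# Made-up fragments for a single SNP, deliberately out of order.
rows = [(5, 2, "GGTA"), (3, 1, "CTTG"), (5, 1, "AACC")]
five_prime = assemble_flank(rows, 5)
three_prime = assemble_flank(rows, 3)
print(five_prime, three_prime)
```

    To present the SNP on the opposite strand, the two flanks swap sides as well: the new 5' flank is the reverse complement of the old 3' flank, and vice versa.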


    c. Information retrieval

    SNP Information displayed:

    Checkbox for further primer design

    SNP id

    Allele

    Allele frequencies

    Flanking sequences, length and orientation

    Verification information

    Function class (coding nonsynon, coding synon, ...)

    location info (chr, contig, ...)

    Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/

    d. Primer design

    i. Generate text file for autoprimer.com

    ii. Primer in batch (call primer3)

    Parameter setup

    This page will be similar to the primer3 web application and will be used to set up the parameters for running primer3. The default values will be set according to suggestions from our lab specialists.

    Display Result

    This page will display the primers for a list of SNPs. The format will be customized

    by our lab specialists.
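
    Batch primer design amounts to writing one Primer3 input record per selected SNP and feeding the whole file to the primer3 program. The sketch below only builds the input text. It assumes the Boulder-IO tag names of Primer3 2.x (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_FIRST_BASE_INDEX, PRIMER_PRODUCT_SIZE_RANGE); older releases use different tag names. The function names and the product size range of 100-300 are placeholders, not the defaults suggested by the lab specialists.

```python
def primer3_record(seq_id, five_flank, allele, three_flank,
                   size_range="100-300"):
    """Build one Boulder-IO record that targets the SNP base."""
    template = five_flank + allele + three_flank
    target_start = len(five_flank) + 1  # 1-based position of the SNP
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},1",
        "PRIMER_FIRST_BASE_INDEX=1",  # make positions explicitly 1-based
        f"PRIMER_PRODUCT_SIZE_RANGE={size_range}",
        "=",  # record terminator
    ]
    return "\n".join(lines) + "\n"


def primer3_batch(snps):
    """Concatenate records for (id, 5' flank, allele, 3' flank) tuples."""
    return "".join(primer3_record(*snp) for snp in snps)


# Toy sequences, far too short for real primer design.
batch = primer3_batch([
    ("rs1", "ACGT", "A", "TTGC"),
    ("rs2", "GGCC", "C", "AATT"),
])
print(batch)
```

    The resulting text would be written to a file or piped to the primer3 executable, and its output parsed for the result page.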


    3. Flowchart


    4. System Requirements and Running Environment

    Programming tools: Java, PHP, Perl, CGI, BioPerl, XML::Twig

    Primer design software: Primer3

    Running environment: Red Hat Enterprise Linux WS 3, Dell Precision 670 workstation

    Database server: MySQL

    Server: bioinfo.vipbg.vcu.edu/SNPPEB

    Client: IE, Mozilla, or Netscape browser and internet connection