7/27/2019 SDD1.101
SNPPEB 1.1
Software Design Document
(current document version 1.101)
Document update history:
Version 1.0
Created by Tony on Aug 4, 2004
Description: First draft of the general idea
Version 1.01
Modified by Tony on September 29, 2004
Drew a flowchart to show the design, and addressed the issues in SRS 1.01
Version 1.011
Modified by Tony on Oct 1, 2004
Still addresses SRS 1.01
More detailed module design in section 5
Version 1.012
Modified by Tony on Oct 6, 2004
Still addresses SRS 1.01
Modified the module design in section 5 from version 1.011
Version 1.1
Modified by Tony on Jan 31, 2005
Addresses SRS 1.1
Major changes:
1. Provide service to the new Beckman GenomeLab SNPstream Genotyping System
2. Set up local databases instead of using XML files from NCBI
Version 1.101
Modified by Tony on Mar 10, 2005
Still addresses SRS 1.1
Redefined the DB design
1. Description
The requirements in the SRS will be fully addressed in this software design document, or an
alternative solution will be given. We will use reference sequence data from NCBI, in FASTA files
and XML files, to set up our local databases. Also, "Primer3"
(http://frodo.wi.mit.edu/primer3/primer3_code.html) will be integrated into our application within
the usage conditions in its copyright document.
2. Function Design
In this version of the design document, we have a primary design to address the issues in SRS 1.1, and
draw a design flowchart.
a. Input and criteria
Input:
List of SNP ids
Two locations in a chromosome (Two STS markers?)
Criteria:
Orientation: Original, all forward, all reverse
SNP types: (any combination)
6 types
Exclude coding SNPs?
Flanking sequence length?
Number of SNPs to be separated
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
b. Query local databases
Database name: snppeb
i. ER:
ii. Table and column definition:
Table genome_contig:
accession: accession.version format, example: NT_077402.1
ctg_id: internal ID, example: CONTIG:77451
tax_id: 9606 is Homo sapiens
ctg_length: length of contig
chr: chromosome. Un means not placed on any chromosome
chr_from: chromosome coordinate, reported in 1-base coordinates, starts from 1. 0 means not localized or placed on any chromosome
chr_to: chromosome coordinate, reported in 1-base coordinates. 0 means not localized or placed on any chromosome
orient: +, -, 0, where 0 indicates uncertainty in orientation
assembly: this value is used to associate contigs with a particular assembly (e.g., reference assembly vs alternate assemblies provided by other groups or representing other haplotypes)
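The chr_from, chr_to and orient columns are what allow the "two locations in a chromosome" input to be turned into contig coordinates. The following is a minimal sketch of that mapping, written in Python for illustration only (it is not one of the project's listed tools). The function names, the dictionary layout of a row, and the rule used for "-" contigs (counting back from chr_to) are assumptions, not part of this design.

```python
def chr_to_contig(pos, chr_from, chr_to, orient):
    """Map a 1-based chromosome position to a 1-based contig position.

    Assumes a '+' contig runs chr_from -> chr_to and a '-' contig runs
    chr_to -> chr_from (an assumption, not stated in the table definition).
    """
    if chr_from == 0 or chr_to == 0:
        raise ValueError("contig is not placed on a chromosome")
    if not chr_from <= pos <= chr_to:
        raise ValueError("position lies outside this contig")
    if orient == "+":
        return pos - chr_from + 1
    if orient == "-":
        return chr_to - pos + 1
    raise ValueError("contig orientation is uncertain")


def contigs_between(contigs, chrom, start, end):
    """Return accessions of placed contigs on `chrom` overlapping [start, end].

    `contigs` is a list of dicts keyed by the genome_contig column names.
    """
    return [
        c["accession"]
        for c in contigs
        if c["chr"] == chrom
        and c["chr_from"] != 0
        and c["chr_from"] <= end
        and c["chr_to"] >= start
    ]
```

A query between two chromosome locations would first call contigs_between to find the candidate contigs, then chr_to_contig to convert each boundary.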
Table genome_contig_set:
accession: accession.version format, example: NT_077402.1
segment_id: this is associated with ctg_from and ctg_to.
Let ctg_from <= m <= ctg_to
segment_id = int((m - 1)/200 + 1);
#ctg_from: contig coordinate, reported in 1-base coordinates, starts from 1. Not added in DB, can be calculated from segment_id:
ctg_from = 200 * (segment_id - 1);
#ctg_to: contig coordinate, reported in 1-base coordinates. Not put in DB. Can be calculated from segment_id and seq:
ctg_to = 200 * (segment_id - 1) + seq.length - 1;
seq: sequence segment from contig, lower case means repetitive
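The segment arithmetic above can be sketched as follows, in Python for illustration only. The sketch cuts a contig into 200-base segments, looks up the segment holding a 1-based position m, and rebuilds an arbitrary region from the stored segments. The function names and the in-memory dictionary standing in for the genome_contig_set table are assumptions. The code treats 200 * (segment_id - 1) as the zero-based offset of a segment's first base, which is one reading of the ctg_from formula.

```python
SEGMENT_LENGTH = 200  # segment size used in the segment_id formula


def segment_id_for(m):
    """Segment holding 1-based contig position m: int((m - 1)/200 + 1)."""
    return (m - 1) // SEGMENT_LENGTH + 1


def split_into_segments(contig_seq):
    """Cut a contig into consecutive segments keyed by segment_id.

    Case is preserved, since lower case marks repetitive sequence.
    """
    return {
        i // SEGMENT_LENGTH + 1: contig_seq[i:i + SEGMENT_LENGTH]
        for i in range(0, len(contig_seq), SEGMENT_LENGTH)
    }


def fetch_region(segments, start, end):
    """Return contig bases start..end (1-based, inclusive)."""
    first, last = segment_id_for(start), segment_id_for(end)
    joined = "".join(segments[s] for s in range(first, last + 1))
    # Zero-based offset of `start` inside the first fetched segment.
    offset = (start - 1) - SEGMENT_LENGTH * (first - 1)
    return joined[offset:offset + (end - start + 1)]
```

Storing only segment_id and seq keeps each row small, and a region query touches only the segments between segment_id_for(start) and segment_id_for(end).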
Table snp_info:
id: rs#
tax_id: species id, 9606 for human
build_create: build in which this SNP was created
build_update: last build in which this SNP was updated
allele_1, allele_2: nucleotides at the SNP site, 1 and 2 are in alphabetical order (example: A C, not C A)
allele_1_frq: average frequency of allele_1
allele_2_frq: average frequency of allele_2
frq_count: number of all chromosomes contributing to the frequency calculation
validated_pop: T|F, at least one ss in the cluster was validated by an independent assay
validated_frq: T|F, at least one subsnp in the cluster has frequency data submitted
validated_clu: T|F, the cluster has 2+ submissions, with 1+ submission assayed with a non-computational method
validated_2h2: T|F, all alleles have been observed in 2+ chromosomes
validated_hap: T|F, validated by the HapMap project
ctg_accession: mapping contig in accession.version format, example: NT_077402.1
ctg_chr: chromosome of the mapping contig
ctg_loc: SNP location mapped to the contig
chr_loc: SNP location mapped to the chromosome
ctg_ori: orientation of the SNP and flanking sequence relative to the contig
ctg_fxn: functional relationship of the SNP to genes at the contig location:
locus-region | coding | coding-synon | coding-nonsynon | mrna-utr | intron | splice-site | reference | exception
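The snp_info columns are enough to apply two of the query criteria: SNP type and exclusion of coding SNPs. A minimal Python sketch follows, for illustration only. It assumes that the "6 types" are the six unordered pairs of A, C, G and T (which is what alphabetically ordered allele_1/allele_2 values can form), that alleles are stored in upper case, and that a SNP counts as coding when its ctg_fxn value starts with "coding". The function names and the rs ids used in examples are hypothetical.

```python
from itertools import combinations

# Assumed meaning of "6 types": AC, AG, AT, CG, CT, GT.
SNP_TYPES = ["".join(pair) for pair in combinations("ACGT", 2)]


def snp_type(record):
    """Type label built from the alphabetically ordered alleles."""
    return record["allele_1"] + record["allele_2"]


def is_coding(record):
    """True for ctg_fxn values coding, coding-synon and coding-nonsynon."""
    return record["ctg_fxn"].startswith("coding")


def filter_snps(records, wanted_types, exclude_coding=False):
    """Keep records whose type is wanted and, optionally, non-coding."""
    wanted = set(wanted_types)
    return [
        r for r in records
        if snp_type(r) in wanted and not (exclude_coding and is_coding(r))
    ]
```

Each record here is a dict keyed by snp_info column names; in the application the same conditions could equally be expressed in the database query.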
Table snp_flanking:
id: rs#
side: 5|3, the 5' or 3' side
fragment: index number of a fragment of a flanking sequence, in order. The 5' side starts from the far end and runs toward the SNP site; the 3' side starts from the site immediately neighboring the SNP
seq: fragment of the flanking sequence
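Because fragments on both sides are numbered in reading order, a flanking sequence can be rebuilt by sorting on the fragment index and concatenating. The Python sketch below (illustration only) does that, trims each side to a requested flanking length, and flips a SNP to the opposite strand for the "all forward" or "all reverse" orientation criteria. The function names and row layout are assumptions, and how ctg_ori values decide when to flip is left open because the design does not list them.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_flank(rows, snp_id, side):
    """Join the fragments of one side (5 or 3) in fragment order."""
    parts = sorted(
        (r["fragment"], r["seq"])
        for r in rows
        if r["id"] == snp_id and r["side"] == side
    )
    return "".join(seq for _, seq in parts)


def trim_flanks(five, three, length):
    """Keep the `length` bases nearest the SNP on each side."""
    return five[max(0, len(five) - length):], three[:length]


def flip(five, three):
    """Return the (5', 3') flanks as read on the opposite strand."""
    return reverse_complement(three), reverse_complement(five)
```

Trimming takes the end of the 5' flank and the start of the 3' flank, since those are the bases adjacent to the SNP site under the fragment ordering described above.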
c. Information retrieval
SNP Information displayed:
Checkbox for further primer design
SNP id
Allele
Allele frequencies
Flanking sequences, length and orientation
Verification information
Function class (coding-nonsynon, coding-synon, ...)
Location info (chr, contig, ...)
Prototype is available at: http://bioinfo.vipbg.vcu.edu/SNPPEB/prototypes/
d. Primer design
i. Generate text file for autoprimer.com
ii. Primer in batch (call primer3)
Parameter setup
This page will be similar to the primer3 web application to setup parameters to run
primer3. The default value will be given according to suggestions from our lab
specialists.
Display Result
This page will display the primers for a list of SNPs. The format will be customized
by our lab specialists.
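One way to run Primer3 in batch is to write one Boulder-IO record per selected SNP and feed the whole file to primer3_core. The Python sketch below (illustration only) builds such records. It uses the tag names of Primer3 release 2 (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET); earlier releases use different names, so the tags must be matched to the installed version. Writing the SNP site as N, the function names, and the input dictionary layout are assumptions, not part of this design.

```python
def primer3_record(snp_id, five, three, extra_tags=None):
    """Build one Boulder-IO record targeting the SNP site.

    The template is 5' flank + N + 3' flank; the target is the single
    SNP base, given as "start,length" with 1-based indexing.
    """
    tags = {
        "SEQUENCE_ID": snp_id,
        "SEQUENCE_TEMPLATE": five + "N" + three,
        "SEQUENCE_TARGET": f"{len(five) + 1},1",
        "PRIMER_FIRST_BASE_INDEX": "1",
    }
    # Parameters chosen on the setup page override or extend the defaults.
    tags.update(extra_tags or {})
    lines = [f"{name}={value}" for name, value in tags.items()]
    # A line holding only "=" terminates each record.
    return "\n".join(lines) + "\n=\n"


def primer3_batch(snps, extra_tags=None):
    """Concatenate records for a list of {'id', 'five', 'three'} dicts."""
    return "".join(
        primer3_record(s["id"], s["five"], s["three"], extra_tags)
        for s in snps
    )
```

The values collected on the parameter setup page would be passed as extra_tags, for example {"PRIMER_PRODUCT_SIZE_RANGE": "100-300"}.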
3. Flowchart
4. System Requirements and Running Environment
Programming tool: Java, PHP, Perl, CGI, BioPerl, XML::Twig
Primer design software: Primer3
Running environment: Red Hat Enterprise Linux WS 3, Dell Precision 670 workstation
Database server: MySQL
Server: bioinfo.vipbg.vcu.edu/SNPPEB
Client: IE, Mozilla, or Netscape browser and internet connection