3/1/2004 MSE Presentation I 1
ESTMD System -- A Web-based EST Model Database System
Yinghua Dong
Outline
Project Overview
Requirements
Cost Estimation
Project Plan
Potential Risks
Demonstration
References
Acknowledgments
Project Overview -- Objective
Build a web-based, user-friendly Expressed Sequence Tags
model database (ESTMD) system that helps biologists search
expressed sequences and related information to support
further decisions
Project Overview -- Background
ESTs (Expressed Sequence Tags) are partial sequences of
randomly chosen cDNA, each obtained from a single DNA
sequencing reaction. Typically, EST processing includes
raw sequence cleaning, cleaned sequence assembly, and
unique sequence annotation and functional assignment.
EST processing pipeline (figure):
Trace Files
  -> Phred -> Raw (clone) sequences
  -> Cross_match & PERL program -> Cleaned (EST) sequences
  -> Cap3 -> Assembled (unique) sequences
  -> Blast -> Unique sequences with hit
Project Overview -- Background (cont’d)
Gene Ontology
A set of controlled vocabularies used to describe biological
features within a specified domain of biological knowledge.
Gene Ontology describes the molecular functions, biological
processes, and cellular components of gene products.
Pathway
The sequence of enzyme-catalyzed reactions by which an
energy-yielding substance is utilized by protoplasm.
Project Overview -- System Architecture
Client Tier
Responsible for presenting data and receiving user inputs
Application-server Tier
Responsible for recording and abstracting business processes
Data-server Tier
Responsible for data storage
(Figure: Three-tier Architecture)
Project Overview -- Technologies and Tools
HTML with JavaScript will be used to build client interfaces
Java Servlets, JSP (JavaServer Pages) and JDBC will be used on the server side
XML and XSLT will be used to describe and present the Gene Ontology tree structure
MySQL 4.0 is chosen as the database management system
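As a rough illustration of the XML description of the Gene Ontology tree mentioned above, the sketch below builds a tiny tree and renders it as indented text. It is written in Python rather than the project's Java/XSLT stack, and the recursive `render` function only stands in for the XSLT presentation step. The element names (`go_tree`, `type`, `term`), the attribute names, and the nesting of the two terms are assumptions made for illustration; only the GO IDs and term names come from the sample output in this presentation.

```python
import xml.etree.ElementTree as ET


def build_go_tree():
    """Build a small, hypothetical XML description of a GO subtree."""
    root = ET.Element("go_tree")
    # One node per ontology type; nesting below it is illustrative only.
    component = ET.SubElement(root, "type", name="Cellular_component")
    ribosome = ET.SubElement(
        component, "term", id="GO:0005840", name="ribosome"
    )
    ET.SubElement(
        ribosome,
        "term",
        id="GO:0005843",
        name="cytosolic small ribosomal subunit (sensu Eukarya)",
    )
    return root


def render(node, depth=0):
    """Return the tree below `node` as a list of indented text lines."""
    lines = []
    for child in node:
        label = child.get("name")
        if child.get("id"):
            label = f"{child.get('id')} {label}"
        lines.append("  " * depth + label)
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tree = build_go_tree()
    print(ET.tostring(tree, encoding="unicode"))
    print("\n".join(render(tree)))
```

Keeping the tree as plain nested elements means the same document could be handed to a stylesheet for presentation without changing how it is stored.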
Project Overview -- Technologies and Tools (cont’d)
JBuilder Enterprise 9 is used as the development tool
Rational Rose is used to create UML models
MS-Project is used for the project plan
Some verification and validation software (such as Alloy, USE, or SPIN) will be used for the formal requirement specification
Project Overview -- E-R Model
Requirements
Requirements (cont’d) -- Search in Detail
Users search detailed information by gene name or symbol, sequence ID, FlyBase ID, or GenBank ID
Users can decide which fields are shown in the result
The output format is HTML/text
A sample output
unisequenceID: Contig1
uniSeq: CGCGGCCGCGTCGACGAGATTCGGAGGTTAGAAACATGACTCGCAAACGCCGTAATGGAGGACGGGCTAAGCACGGCCGTGGCCACGTTAAGGCGGTGAGATGCACCAACTGCGCGCGTTGCGTGCCTAAGGACAAAGCTATCAAAAAGTTCGTGATCAGGAATATTGTCGAAGCGGCTGCCGTCAGGGATATCAACGAAGCTTCCGTATATGCATCATTCCAGCTGCCGAAGCTGTATGCAAAGCTCCACTACTGCGTCTCCTGCGCCATCCACAGCAAAGTTGTGCGCAACAGGTCTAAGAAGGACAGGAGAATCCGCACACCACCCAAGAGCACCTTCCCCAGGGACATGCAGCGCCCACAGAATGTGCAAAGGAAGTGAAGTGATTTACAATAAATTTTAAGAAAACCC
flybaseID: FBgn0004413
evalue: 2.00E-49
hitLength: 114
bitScore: 190
identity: 93/115
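One way the "search in detail" requirement could be realised is a query builder that accepts a search key and a user-chosen list of output fields. The sketch below is a minimal illustration in Python with an in-memory SQLite database, not the project's actual Java/JDBC/MySQL code. The column names follow the sample output; the table name `unisequence`, the function names, and the shortened `uniSeq` value are assumptions.

```python
import sqlite3

# Column names follow the sample output; the table name is hypothetical.
SEARCH_KEYS = ("unisequenceID", "flybaseID")
OUTPUT_FIELDS = ("unisequenceID", "uniSeq", "flybaseID", "evalue",
                 "hitLength", "bitScore", "identity")


def build_detail_query(search_key, fields):
    """Return a parameterized SELECT restricted to whitelisted columns."""
    if search_key not in SEARCH_KEYS:
        raise ValueError(f"unsupported search key: {search_key}")
    unknown = [f for f in fields if f not in OUTPUT_FIELDS]
    if unknown or not fields:
        raise ValueError(f"unsupported output fields: {unknown}")
    columns = ", ".join(fields)
    # Only whitelisted names are interpolated; the value stays a parameter.
    return f"SELECT {columns} FROM unisequence WHERE {search_key} = ?"


def search_in_detail(conn, search_key, value, fields):
    """Run the search and return one dict per matching row."""
    sql = build_detail_query(search_key, fields)
    rows = conn.execute(sql, (value,)).fetchall()
    return [dict(zip(fields, row)) for row in rows]


def demo_connection():
    """Create an in-memory database holding one illustrative record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE unisequence (unisequenceID TEXT, uniSeq TEXT, "
        "flybaseID TEXT, evalue REAL, hitLength INTEGER, "
        "bitScore INTEGER, identity TEXT)"
    )
    conn.execute(
        "INSERT INTO unisequence VALUES (?, ?, ?, ?, ?, ?, ?)",
        # uniSeq is shortened here for readability.
        ("Contig1", "CGCGGCCGCG", "FBgn0004413", 2.0e-49, 114, 190, "93/115"),
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(search_in_detail(conn, "flybaseID", "FBgn0004413",
                           ["unisequenceID", "bitScore", "identity"]))
```

Whitelisting the selectable columns keeps the user's field choice from ever reaching the SQL text unchecked, while the search value itself is always bound as a parameter.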
Requirements (cont’d) -- Search by Keyword
Users search the sequences at each stage by keyword
The output includes sequence ID, length (with a link to the sequence), gene name, symbol, and a link to the contig view image
A sample output
cloneID                      RawLength  CleanedLength  UnisequenceID  UnisequenceLength  GeneName               symbol  ContigView
pb42ad-1_001_a07.pb42primer  876        409            Contig1        413                Ribosomal protein S26  RpS26   View link
pb42ad-1_001_f07.pb42primer  886        205            Contig1        413                Ribosomal protein S26  RpS26   View link
pyes2-ct_012_c12.p1ca        291        286            Contig1        413                Ribosomal protein S26  RpS26   View link
pyes2-ct_034_h06.p1ca        803        398            Contig1        413                Ribosomal protein S26  RpS26   View link
Requirements (cont’d) -- Gene Ontology Search
Users search gene ontology information by gene names, symbols, IDs, or a text file.
The output is a table including GO ID, term, type, sequence ID, hit ID, and gene symbol.
The hyperlinks on terms can show the gene ontology tree structure.
A sample output
GO ID       Term                                               Type                Sequence ID  Hit ID       Gene Symbol
GO:0006412  protein biosynthesis                               Biological_process  Contig1      FBgn0004413  RpS26
GO:0005843  cytosolic small ribosomal subunit (sensu Eukarya)  Cellular_component  Contig1      FBgn0004413  RpS26
GO:0005840  ribosome                                           Cellular_component  Contig1      FBgn0004413  RpS26
GO:0003735  structural constituent of ribosome                 Molecular_function  Contig1      FBgn0004413  RpS26
Requirements (cont’d) -- Gene Ontology Classification
Users input a batch of gene names/symbols, or a local text file containing sequence IDs.
Users can choose the gene ontology types which they want to classify.
The output is a table including gene ontology type, subtype, sequence count, and percentage of sequences.
A sample output
type                subtype                         sequence_count  %
Cellular_component  cell                            3               75%
Biological_process  Cell growth and/or maintenance  3               75%
Molecular_function  enzyme                          1               25%
Molecular_function  Protein tagging                 1               25%
Molecular_function  Structural molecule             3               75%
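The classification output above can be thought of as a count of distinct sequences per (type, subtype) pair, divided by the number of submitted sequences. The following Python sketch shows that calculation under stated assumptions: the percentage is taken relative to all input sequences (which matches 3 of 4 = 75% in the sample), and the sequence IDs, the input mapping, and the function name are hypothetical.

```python
from collections import defaultdict


def classify(annotations, go_types):
    """Count sequences per (type, subtype) for the chosen GO types.

    `annotations` maps a sequence ID to a list of (type, subtype) pairs.
    Returns rows of (type, subtype, sequence_count, percentage).
    """
    total = len(annotations)
    members = defaultdict(set)
    for seq_id, terms in annotations.items():
        for go_type, subtype in terms:
            if go_type in go_types:
                # A set ensures each sequence is counted once per subtype.
                members[(go_type, subtype)].add(seq_id)
    rows = []
    for (go_type, subtype), seqs in sorted(members.items()):
        rows.append((go_type, subtype, len(seqs), 100.0 * len(seqs) / total))
    return rows


if __name__ == "__main__":
    # Hypothetical annotations for four input sequences.
    example = {
        "seq1": [("Cellular_component", "cell"),
                 ("Molecular_function", "Structural molecule")],
        "seq2": [("Cellular_component", "cell"),
                 ("Molecular_function", "Structural molecule")],
        "seq3": [("Cellular_component", "cell"),
                 ("Molecular_function", "Structural molecule")],
        "seq4": [("Molecular_function", "enzyme")],
    }
    for row in classify(example, {"Cellular_component", "Molecular_function"}):
        print(row)
```

Because a sequence may carry several subtypes, the percentages in one type need not sum to 100%, which is also visible in the sample table.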
Cost Estimation
The effort of the project is estimated by
Function Point Analysis (FPA)
COCOMO II Model
Cost Estimation -- Function Point Analysis
Unadjusted Function Points
Function Type  Simple          Average         Complex         Total UFP
               Amount  Weight  Amount  Weight  Amount  Weight
Inputs         7       3       0       4       0       6       21
Outputs        2       4       5       5       0       7       33
Inquiries      11      3       0       4       2       6       43
Files          0       7       3       10      0       15      30
Interfaces     1       5       1       7       0       10      12
Total UFP                                                      138
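Each row total in the table above is a weighted sum: the amount in each complexity class multiplied by that class's weight. The sketch below shows the rule in Python, with the weights transcribed from the table; the dictionary and function names are illustrative. Applied to the printed counts it reproduces the totals for Inputs, Outputs, Files, and Interfaces; for Inquiries the printed counts (11, 0, 2) give 45 rather than the 43 shown, and the sketch makes no attempt to decide which figure was intended.

```python
# Weights per function type as (simple, average, complex), from the table.
WEIGHTS = {
    "Inputs": (3, 4, 6),
    "Outputs": (4, 5, 7),
    "Inquiries": (3, 4, 6),
    "Files": (7, 10, 15),
    "Interfaces": (5, 7, 10),
}


def row_ufp(function_type, counts):
    """Weighted sum of (simple, average, complex) counts for one row."""
    weights = WEIGHTS[function_type]
    return sum(count * weight for count, weight in zip(counts, weights))


def total_ufp(table):
    """Sum the row totals for a mapping of function type to counts."""
    return sum(row_ufp(name, counts) for name, counts in table.items())


if __name__ == "__main__":
    for name, counts in [("Inputs", (7, 0, 0)), ("Outputs", (2, 5, 0)),
                         ("Files", (0, 3, 0)), ("Interfaces", (1, 1, 0))]:
        print(name, row_ufp(name, counts))
```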
Cost Estimation -- Function Point Analysis (cont’d)
Total Unadjusted Function Points (UFP) = 138
Product Complexity Adjustment (PC) = 0.65 + (0.01 × 40) = 1.05
Total Adjusted Function Points (FP) = UFP × PC = 144.9
Language Factor (LF) for Java assumed as 35
Source Lines of Code (SLOC) = FP × LF = 5071.5
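The adjustment and size figures above follow directly from the stated formulas, as this short Python sketch illustrates. The function and parameter names are assumptions; in particular, the value 40 is treated here as the summed complexity rating that enters the adjustment factor, which is how the formula on the slide reads.

```python
def product_complexity(total_rating):
    """Product Complexity Adjustment: PC = 0.65 + 0.01 * rating."""
    return 0.65 + 0.01 * total_rating


def adjusted_function_points(ufp, total_rating):
    """Adjusted Function Points: FP = UFP * PC."""
    return ufp * product_complexity(total_rating)


def source_lines_of_code(fp, language_factor):
    """Estimated size: SLOC = FP * LF."""
    return fp * language_factor


if __name__ == "__main__":
    fp = adjusted_function_points(138, 40)
    sloc = source_lines_of_code(fp, 35)  # LF = 35 assumed for Java
    print(round(fp, 1), round(sloc, 1))
```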
Cost Estimation -- COCOMO II
For application programs:
Thousands of Delivered Source Instructions (KDSI) = 5.0715
Programmer Effort (PM) = 2.4 × (KDSI)^1.05 = 13.2 person-months
Development Time in months (TDEV) = 2.5 × (PM)^0.38 = 6.66 months
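The effort and schedule figures can be reproduced from the size estimate with a few lines of Python. The coefficients (2.4, 1.05) and (2.5, 0.38) are taken from the slide; they coincide with the organic-mode constants of the Basic COCOMO model. The function names and default arguments are illustrative.

```python
def cocomo_effort(kdsi, a=2.4, b=1.05):
    """Effort in person-months: PM = a * KDSI ** b."""
    return a * kdsi ** b


def cocomo_schedule(person_months, c=2.5, d=0.38):
    """Development time in months: TDEV = c * PM ** d."""
    return c * person_months ** d


if __name__ == "__main__":
    kdsi = 5071.5 / 1000  # size estimate from the function point analysis
    pm = cocomo_effort(kdsi)
    tdev = cocomo_schedule(pm)
    print(f"PM = {pm:.1f} person-months, TDEV = {tdev:.2f} months")
```

Because the exponent on size is only slightly above 1, effort grows almost linearly with KDSI at this scale, while the schedule grows much more slowly than the effort.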
Project Plan
Phase I: Requirement (1/12/04 ~ 3/1/04)
Phase II: Design (2/23/04 ~ 4/23/04)
Phase III: Implementation and Test (4/26/04 ~ 7/30/04)
Project Plan (cont’d)
Potential Risks
The requirements may change continually
Some biology knowledge is needed
Some new technologies, such as XML and XSLT, need to be learned
Demonstration
http://129.130.115.72:8080/estmd/index.html
References
IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications, 1998 Edition, IEEE, 1998
IEEE Std 730-1998, IEEE Standard for Software Quality Assurance Plans
Walker Royce, Software Project Management -- A Unified Framework, 1998
Marty Hall, Core Servlets and JavaServer Pages, 2000
Roger S. Pressman, Software Engineering: A Practitioner’s Approach, 5th Edition
Dr. Gustafson, CIS 540 lecture
http://sunset.usc.edu/research/COCOMOII/index.html
Acknowledgments
Committee:
Dr. Mitchell L. Neilsen
Dr. Gurdip Singh
Dr. Daniel Andresen
Suggestions and Comments
Thank You!