Developing a Virtual Laboratory for E-Science
Ersin C. Kaletas
©2003, E. Kaletas
VLAM-G: Grid-based Virtual Laboratory AMsterdam
A support environment for collaborative experimental sciences
(Dutch ICES/KIS project 1999-2003)
Enable VLAM-G users to define, execute, and monitor their collaborative experiments by providing:
• location-independent experimentation
• a familiar experimentation environment
• assistance during experimentation
Designing & integrating middleware to bridge the gap between the Grid services and application layers
Developing application prototypes to check ideas and to learn
Outline
• Development approach
• Problem domain
• Requirements analysis
• Related work
• VLAM-G
• VIMCO
• Summary & acknowledgements
Field study
• Procedure / protocol characterizations
• Experiment characterizations
• Requirements
Related research
• Experiment characterizations
• Requirements
• Available solutions
• Shortcomings
generalization
Design
• Concepts and models
• Architecture
• Implementation
Validation
• Modifications
• Improvements
• Further generalization
Problem Domain: E-Science Domains
E-Science Paradigm
• Large amounts of data are generated by either simulations or 'networked' instruments (i.e. instruments connected to storage and computing facilities through computer networks)
• Many steps in experiments are automated (e.g. re-plating a biological sample with a pipetting robot)
• Information and communication technologies (ICT) are used extensively throughout the entire experiment life-cycle, from experiment design and execution to results analysis and interpretation
A new way of performing scientific research using advanced computing, information, and communication technologies
E-Science Domains
• Bio-sciences: drug discovery, food informatics
• Physics: material analysis, high-energy physics
• Chemistry: new product development
Gas Chromatography Mass Spectrometer
Micro-array spotter
Fourier Transform Infra Red
Confocal Microscopy
© June 2003 - H. Afsarmanesh
E-Science Challenges
• Complexity
  – Complex instruments
  – Long and complex experimentation procedures
• Data size
  – In biology, sequence databases double in size every 14 months
  – In physics, hundreds of MB of data are generated per day
  Database               Number of entries 1996   Number of entries 1997
  EMBL nucleotides       850,000,000              1,200,000,000
  EMBL entries           1,000,000                3,000,000
  SwissProt amino acids  30,000,000               50,000,000
  SwissProt entries      80,000                   90,000
• Data heterogeneity
  – Wide variety of types of scientific information
  – Various representations / formats
  – Various access mechanisms
• Need for collaboration
  – Sharing resources (data, hardware, software, etc.)
  – Collaborative experimentation
• Lack of standards for modeling and representation of information
  – Specific solutions for common problems
  – Wasted efforts
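The data-size figures on this slide can be turned into yearly growth factors with a few lines of arithmetic. The sketch below copies the four table rows and the "doubles every 14 months" statement; the function names are illustrative and not part of VLAM-G.

```python
# Entry counts copied from the table on this slide (1996 -> 1997).
counts = {
    "EMBL nucleotides": (850_000_000, 1_200_000_000),
    "EMBL entries": (1_000_000, 3_000_000),
    "SwissProt amino acids": (30_000_000, 50_000_000),
    "SwissProt entries": (80_000, 90_000),
}


def yearly_factor(before: int, after: int) -> float:
    """Growth factor over the one-year interval between the two counts."""
    return after / before


def factor_from_doubling(months_to_double: float, months: float = 12.0) -> float:
    """Growth factor over `months` for a quantity that doubles every `months_to_double` months."""
    return 2.0 ** (months / months_to_double)


# Year-over-year factor for every row of the table.
growth = {name: yearly_factor(a, b) for name, (a, b) in counts.items()}

for name, factor in growth.items():
    print(f"{name}: x{factor:.2f} per year")
print(f"doubling every 14 months: x{factor_from_doubling(14):.2f} per year")
```

A doubling period of 14 months corresponds to a factor of 2^(12/14) per year, which gives a common scale for comparing the individual table rows.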
Life Sciences: A Challenging Scenario
• target disease
• the gene responsible for this disease is already known!
• a drug against this disease is under development
• is the drug really preventing the disease?
How to do it?
DNA micro-arrays: genome-wide monitoring of changes in gene expression levels
Micro-Array Experiments
Study the characteristics of thousands of genes in a single experiment, to help with:
• Identifying genes responding to certain stimuli (e.g. drugs, toxins)
• Monitoring gene expression changes during disease progression
• Better understanding of mechanisms of gene regulation
• Assigning functions to novel genes
• Identifying metabolic pathways
The number of useful data points produced per experiment ranges from 12,000-20,000 for yeast to 200,000-300,000 for human
Main Difficulties!
• Long and complex experimental procedures
  – a typical micro-array experiment consists of 5 phases
  – the clone preparation phase alone contains up to 51 steps
• Archival of experiment setup, conditions, and experiment results, as well as the proper links among them
• Different levels of quality requirements for different stages of experimentation
• Access to an extremely diverse set of biological data resources necessary for comparison and interpretation of experiment results
  – 281 biological database sources around the world are listed in the biological databases collection of 2001, classified in 18 different categories
• Security of proprietary information, for instance to support the privacy of experiment setup and results for drug discovery experiments in a pharmaceutical company
[Figure: Experiment definition. Phases: 1 Clone preparation, 2 Micro-array production, 3 cDNA-probe preparation, 4 Hybridization, scanning & image analysis, 5 Data analysis]
Enhancement Environments
data/information handling, management, and analysis
high performance computation infrastructures
interoperation and collaboration infrastructures
development of enhancement environments to support scientists with their complex activities
Enhanced supporting environments are needed!
Requirements Analysis
General Requirements
• Infrastructure requirements: storage facilities, computing facilities, networking facilities, instrumentation facilities, software environment
• Interface requirements: user interfaces, programming interfaces
• Functionality requirements: experiment management, information management, user management, collaboration, security
• Architectural / technological implementation requirements
Information Management Requirements
• Modeling requirements: high expressive power, generic and uniform, evolvable, easy to understand
• Storage requirements: multiple databases for diverse scientific information
• Manipulation requirements
  – storage, access and manipulation mechanisms for various types of information
  – query capability
  – version management
  – different granularity levels
  – administrative mechanisms
• Implementation requirements
• Collaboration requirements
  – distributed / federated information management and query processing
  – secure mechanisms for data / information sharing and exchange
  – data models and mechanisms for defining syntax and semantics of objects
• Security requirements: define and enforce access rights for data security and information visibility
• Interoperability requirements
  – standards for information modeling, information exchange, data manipulation, and database implementation must be followed
  – mechanisms to resolve model / paradigm heterogeneity, semantic heterogeneity, and data definition / manipulation language heterogeneity
Related Work: Enhanced Support Environments
Enhanced Support Environments
Science Portals
Problem Solving Environments
Virtual Laboratories
Help scientists during their experimentations
Provide similar / complementary functionality
Science Portals
• Provide a uniform means for accessing resources
• Resources are of importance to a specific scientific community
  – computational, storage, and networking resources, electronic whiteboards, or a digital library
• Usually made available to users as services
• Familiar and simplified interfaces, typically through Web browsers
• A common set of services (e.g. job submission, file transfer)
• Users themselves are responsible for the correct and efficient usage of the available resources
Examples: CLRC Data Portal, Gateway, Unicore, VirtualSky, Enter The Grid
Problem Solving Environments
• Provide the computational facilities needed to solve a target class of problems
• Features provided by PSEs:
  – solve simple or complex problems
  – support rapid prototyping or detailed analysis by exploiting technologies such as interactive color graphics, powerful processors, and networks of specialized services
  – assistance through automatic and semiautomatic selection of a proper solution method from the available set of solution methods
  – assistance by providing ways to easily incorporate new solution methods
Examples: Cactus, ASC, Pellpack, ECCE
Virtual Laboratories
• Provide an electronic workspace for distributed collaboration and experimentation in research or other scientific creative activities
• Support an aggregation of people who pursue a related set of research activities and share resources
• Resources, including the people, may be geographically distributed and associated with different institutions
• Bring together the best combination of skills, expertise, and tools to carry out the same type of research that is done in a single real laboratory
Examples: The VL Project, Softlab, VNMRF, DOE 2000 Programme, Tele-Actor, Virtual Lab
“center without walls, in which the nation's researchers can perform their research without regard to geographical location - interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries” (William Wulf)
SP vs. PSE vs. VL
Virtual Laboratories
Problem Solving Environments
Science Portals
Middleware
Resources
Related Work: Enabling Technologies, Standards and Paradigms
Enabling Technologies, Standards, Paradigms
• Information models and standards
  – address representational aspects of scientific information
  – standardization is important for sharing and exchanging information
• Distributed information management systems
  – focus on distributed / federated information management
  – provide generic mechanisms for sharing and exchange of heterogeneous information among multiple autonomous information centers
• Resource management technologies
  – efficient resource management
  – high availability and performance of resources
  – ease their usage by scientists
• Other related technologies, paradigms and tools
  – include workflow management systems, the virtual organization paradigm, and toolkits for Grid software development
Information Models and Standards
• ODMG: standard at the level of an object store
  – a standard data model for defining object schemas
  – a standard language for object querying and manipulation
  – a standard format for object exchange
  – addresses syntactic heterogeneity among multiple object stores
• WebDAV: Distributed Authoring and Versioning protocol
  – provides a Web-based environment and mechanisms for multiple users to publish, co-author and annotate documents
  – aims to overcome systematic heterogeneity
• Dublin Core: metadata standard at the level of content
  – standardizes elements and their meanings in the descriptions of documents
  – addresses semantic heterogeneity
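As a small illustration of what "standardizing elements and their meanings" means in practice, the sketch below checks a document description against the 15 element names of the Dublin Core Metadata Element Set. The `validate` helper and the example record are illustrative only; the record values are taken from the title slide of this presentation.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def validate(record: dict) -> list:
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DUBLIN_CORE_ELEMENTS)


# A description of this presentation using only standard element names.
record = {
    "title": "Developing a Virtual Laboratory for E-Science",
    "creator": "Ersin C. Kaletas",
    "date": "2003",
}

unknown = validate(record)                            # no unknown keys expected
bad = validate({"title": "x", "author": "y"})         # "author" is not a DC element
print(unknown, bad)
```

Because every site agrees on the element names and their meaning, a "creator" in one collection can be compared with a "creator" in another without per-site mapping rules.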
Federated Information Management
Federated Database Systems
• Loosely coupled
• Tightly coupled
  – Single federation
  – Multiple federations
• Sharing and exchange of information among collaborating sites
• Preserve autonomy
• Ensure security of private information
• Hide data model, data manipulation language, and semantic heterogeneity
[Figure: class diagram of the experiment information schema, shown three times on the slide. Classes: Project, Experiment, Step, ActivityStep, DataStep, Property, Parameter, Comment, Person, User, Organization, Hardware, Software, HwTool, SwTool. Associations include hasExperiments / isPartOfProject, hasSteps / isPartOfExperiment, hasNextStep / hasPrevStep, hasSubStep / hasSuperStep, hasNextExperiment / hasPrevExperiment, hasOwner / ownsExperiments, hasContributors / contributedExperiments, isPerformedBy / hasPerformed, hasEmployees / isEmployeeOf, hasComments, isMadeBy, hasProperties, hasParameters, hasHardware, hasSoftware, hasVendor.]
Examples: PEER, Donaji, DiscoveryLink, Virtuoso
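The goals listed on this slide (sharing among sites, autonomy, security of private information) can be sketched with a toy federation in which each site decides which fields of its local records are exported. Everything here is hypothetical: the `Site` class, `federated_query`, and the sample records are illustrative and do not describe PEER or any of the systems named above.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One autonomous site in the federation (hypothetical structure)."""
    name: str
    records: list                                     # local data, never shared directly
    export_fields: set = field(default_factory=set)   # fields visible to partners

    def export(self) -> list:
        """Return local records reduced to the fields this site chose to share."""
        return [
            {k: v for k, v in rec.items() if k in self.export_fields}
            for rec in self.records
        ]


def federated_query(sites: list, predicate) -> list:
    """Evaluate `predicate` over the exported view of every site."""
    results = []
    for site in sites:
        for rec in site.export():
            if predicate(rec):
                results.append({"site": site.name, **rec})
    return results


# Made-up sample data; the owner names are private and must not leak.
site_a = Site("A", [{"name": "Exp-108", "status": "done", "owner": "alice"}],
              export_fields={"name", "status"})
site_b = Site("B", [{"name": "Exp-7", "status": "running", "owner": "bob"}],
              export_fields={"name"})

hits = federated_query([site_a, site_b], lambda r: r.get("status") == "done")
print(hits)
```

Each site keeps full control over its export set, so a query only ever sees what the owning site has agreed to share.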
Grid Technology
“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” Carl Kesselman and Ian Foster
http://www.globus.org
What is the Grid?
• On-demand creation of powerful virtual computing systems
• Uniform, high-performance access to computational resources
[Figure labels: sensor nets, data archives, computers, software catalogs, colleagues]
Grid & Data Management
What is out there?
Several ongoing projects and research efforts:
• Global Grid Forum: DAIS (Database Access and Integration Services) Working Group
• EU DataGrid Project: Data Management Work Package
• UK e-Science Programme: Database Task Force
Grid & Data Management
What is Missing?
Focus is on:
• integrating database access within the Grid
• using existing Grid services as much as possible
Federation: so far, only means retrieving data from different data sources
No mention of:
• Semantic integration of data/information
• Heterogeneous data models
• Access rights and visibility at different levels
Recent research & emerging technology: not mature yet!
Research on federated databases & collaborating virtual organizations must be incorporated!
Other Technologies, Standards, Paradigms
• Virtual Organizations
  – “a temporary alliance of enterprises that come together to share skills or core competencies and resources in order to better respond to business opportunities, whose cooperation is supported by computer networks”
  – allow organizing the collaborative activities by defining and enforcing collaboration rules and conditions
• Workflow Management Systems
  – allow users to organize the activities required to accomplish a task, and specify rules for the correct execution and successful completion of the activities
  – provide the coordination system required for experiment execution
• Grid Programming Toolkits
  – provide convenient interfaces to the Grid (e.g. Java and Perl interfaces)
VLAM-G: Grid-based Virtual Laboratory Amsterdam
VLAM-G Overview
[Figure labels: VIRTUAL LABORATORY, FTIR, DNA MICRO-ARRAY, INTERNET, CAVE, MICRO_BEAM, Grid / Globus Services. FTIR inset labels: MCT focal plane array, FTS 6000 spectrometer, step-scan mirror, fixed mirror, 50/50 beamsplitter, ceramic mid-IR source, turning mirrors, CaF2 lens, Cassegrainian condenser, Cassegrainian objective, sample]
Virtual Laboratory
Application Layer
Grid Layer
VLAM-G Middleware
VLAM-G: Functionality Design
Use-Case Analysis
Virtual Laboratory actors: Scientist, Domain Expert, Tool Developer, Administrator
Scientists’ Use Cases
• Design experiments
• Execute experiments
• Analyze experiment results
Domain Experts’ Use Cases (Domain Expert «extends» Scientist)
• Model domain-specific information
• Define protocols
• Design experiment templates/procedures
Tool Developers’ Use Cases
• Develop software tools
• Deploy software tools
Administrators’ Use Cases
• Monitor/manage resources
• Manage user accounts
VLAM-G: Architecture Design
4 Tiers of the VLAM-G Architecture
• application presentation tier: Science Portals (DNA-Array Portal, MACS Portal, other portals …); application-specific functionality
• application toolkit tier: VLAM-G Middleware; generic VLAM-G functionality
• grid services tier: Grid Middleware; distributed resource management
• VLAM-G resources tier: Computing / Networking / Storage Resources
VLAM-G Architecture
• application presentation tier: Front-End (PFT Editor, PFT Instantiator, Topology Editor)
• application toolkit tier: VLAM-G Middleware (Session Manager, Collaboration, Assistant, VIMCO, RTS)
• grid services tier: Globus Toolkit
• VLAM-G resources tier
  – data / information resources: VIMCO DB, VLAM-G archive, Expressive, MACS, RTS DB
  – HW resources: Linux cluster, CAVE, µ-Beam
  – SW resources: Module repository
Components of VLAM-G Architecture
[Figure labels: users; Front-End; Session Manager; Collaboration; Assistant; RTS; VIMCO; Grid / Globus services; Module Repository (M1, M2, M3, M4); Resource A, Resource B, …; RTS DB; Project DB; VIMCO DB; Application DBs]
VLAM-G: Conceptual Design
General Structure of Scientific Experiments
Project * Experiment * Experiment Step
Physical Entity
Activity
Data Element
EXP      RFLAG  NAME                           TYPE     CH1I  CH1B  CH1D  CH2I
y744n40         GENOMIC 450NG/UL               CONTROL  493   191   302   566
y744n40         GENOMIC 150NG/UL               CONTROL  321   212   109   384
y744n40         GENOMIC 450NG/UL               CONTROL  771   200   571   824
y744n40         GENOMIC 150NG/UL               CONTROL  333   213   120   404
y744n40         GENOMIC 50NG/UL                CONTROL  471   210   261   537
y744n40         3XSSC                          CONTROL  399   216   183   461
y744n40         GENOMIC 50NG/UL                CONTROL  355   214   141   403
y744n40         3XSSC                          CONTROL  394   213   181   442
y744n40         POLY A                         CONTROL  916   216   700   1204
y744n40         LAMBDA                         CONTROL  611   215   396   664
y744n40         POLY A                         CONTROL  420   228   192   491
y744n40         LAMBDA                         CONTROL  419   228   191   444
y744n40         GENOMIC 450NG/UL               CONTROL  371   212   159   430
y744n40         GENOMIC 150NG/UL               CONTROL  338   227   111   392
y744n40         GENOMIC 450NG/UL               CONTROL  1174  218   956   1132
y744n40         GENOMIC 150NG/UL               CONTROL  564   226   338   643
y744n40         GENOMIC 50NG/UL                CONTROL  916   225   691   913
y744n40         3XSSC                          CONTROL  390   201   189   441
y744n40         GENOMIC 50NG/UL                CONTROL  523   235   288   574
y744n40         3XSSC                          CONTROL  392   224   168   442
y744n40         POLY A                         CONTROL  2168  251   1917  3895
y744n40         LAMBDA                         CONTROL  1195  230   965   2033
y744n40         POLY A                         CONTROL  841   250   591   944
y744n40         LAMBDA                         CONTROL  461   207   254   559
y744n40         SALMON SPERM                   CONTROL  420   236   184   424
y744n40         PBR322                         CONTROL  542   222   320   490
y744n40         SALMON SPERM                   CONTROL  608   254   354   701
y744n40         PBR322                         CONTROL  763   213   550   656
y744n40         PKRX PLASMID CONTAINING LACZ   CONTROL  394   242   152   436
y744n40         PKRX PLASMID CONTAINING LACZ   CONTROL  923   206   717   529
y744n40         PLASMID CONTAINING GFP         CONTROL  345   233   112   389
y744n40         PLASMID CONTAINING GFP         CONTROL  424   223   201   454
y744n40         3XSSC                          CONTROL  391   226   165   440
y744n40         3XSSC                          CONTROL  468   211   257   507
y744n40         3XSSC                          CONTROL  331   243   88    374
y744n40         3XSSC                          CONTROL  384   202   182   437
Common Aspects of Scientific Experiments
• Scientist performing the experiment (owner of the experiment)
• Input (e.g. data acquired from a device)
• A set of activities (e.g. computational processes), applied on the input
• Output (e.g. data obtained by applying a process on an input data)
• Devices (hardware) (e.g. to acquire data, or to perform a process)
• Software (e.g. to control a specific device, or to perform a specific process)
• Conditions and parameters for the processes, devices and software
• A recursive flow of processes and data where a specific order is followed during an experiment
Process Data Flows
Process data flow (PDF) for micro-array experiments
[Figure labels: ORGANISM, GENE, CLONE, MICRO-ARRAY, HYBRIDIZATION, HYBRIDIZED ARRAY, ARRAY SCANNING, SCANNER, ARRAY IMAGE, ARRAY IMAGE ANALYSIS, IMG. ANALYSIS PRG., ARRAY MEASUREMENT]
Step-by-step definition of an experiment
• Steps involved in an experiment
• Attributes of a step
• Relationships among steps
VLAM-G Experiment Model
Process Flow Template
Study
Experiment
Experiment Topology
• General Structure of Scientific Experiments
• Common Aspects of Scientific Experiments
• Process Data Flows in Experiments
uniform, generic, step-wise modeling of scientific experiments
Process Flow Template
• defines the approach to solve a particular scientific problem by defining the typically involved steps
• standardizes the experimental approach for experiments of the same type
[Figure labels: ORGANISM, GENE, CLONE, MICRO-ARRAY, HYBRIDIZATION, HYBRIDIZED ARRAY, ARRAY SCANNING, SCANNER, ARRAY IMAGE, ARRAY IMAGE ANALYSIS, IMG. ANALYSIS PRG., ARRAY MEASUREMENT]
Study
• an instantiation of a process flow template
• describes the solution: how a particular experiment is accomplished, with a description of each step involved in the experiment
• provides the context for a particular experiment
[Figure: study entries]
• Mouse tissue array scanning: MAD-lab-219, Exp-108, 2003-02-01
• Mouse tissue array image: MAD-lab-220, Exp-108, /home/user/images/mouse-tissue.jpg
• Mouse tissue array image analysis: MAD-lab-221, Exp-108, 2003-02-02
• Mouse tissue array measurement: MAD-lab-222, Exp-108, /home/user/data/raw/mouse-tissue.dat
• AB Laser Scanner: MAD-lab-Dev-100, Scanner for type A micro-arrays, Model-MAA2001
• Lab Image Analyzer: MAD-lab-SW-78, Image analyzer for images produced by MAA2001 scanners, MAA2001-v7.1
• Mouse tissue micro-array: MAD-lab-219, Exp-108, 3 cm x 8 cm, 1000 spots, Freezer 18, rack 4
• …
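The relationship between a process flow template and a study can be sketched as "template steps with required attributes" that a study fills in with concrete values. This is a minimal illustration, not the VLAM-G implementation: the class names, the `instantiate` function, and the attribute labels (`name`, `id`, `experiment`, `date`, `location`) are assumptions; only the values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateStep:
    """A step type in a process flow template (hypothetical structure)."""
    name: str
    required: tuple          # attribute names a study must fill in


@dataclass
class StudyStep:
    """A template step filled in for one concrete experiment."""
    template: TemplateStep
    values: dict


def instantiate(template: list, values: dict) -> list:
    """Build a study from a template; raise if a required attribute is missing."""
    study = []
    for step in template:
        given = values.get(step.name, {})
        missing = [attr for attr in step.required if attr not in given]
        if missing:
            raise ValueError(f"{step.name}: missing {missing}")
        study.append(StudyStep(step, given))
    return study


# Two steps of the micro-array template; attribute labels are assumed.
pft = [
    TemplateStep("ARRAY SCANNING", ("name", "id", "experiment", "date")),
    TemplateStep("ARRAY IMAGE", ("name", "id", "experiment", "location")),
]

study = instantiate(pft, {
    "ARRAY SCANNING": {"name": "Mouse tissue array scanning", "id": "MAD-lab-219",
                       "experiment": "Exp-108", "date": "2003-02-01"},
    "ARRAY IMAGE": {"name": "Mouse tissue array image", "id": "MAD-lab-220",
                    "experiment": "Exp-108",
                    "location": "/home/user/images/mouse-tissue.jpg"},
})
print([s.values["id"] for s in study])
```

Keeping the template separate from the values is what lets every experiment of the same type follow the same steps while recording its own context.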
Experiment Topology
• during its processing and analysis, data flows from one process to another
• the data flow is represented by a directed graph
  – nodes => computational processes
  – connecting arcs => data flowing through the processes
• this data flow graph is called the experiment topology
[Figure: FILE_READER (read array image), IMAGE_ANALYZER (analyze and quantify array image), FILE_WRITER (write image analysis raw data)]
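A topology of this kind can be sketched as a directed graph that is executed in dependency order. The example below uses the three module names from the slide and assumes they are chained in the order listed; the module bodies are stand-ins, since the slides do not describe the real modules, and `run` is a hypothetical helper rather than the VLAM-G run-time system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Arcs of the topology: each module maps to the modules whose output it consumes.
topology = {
    "FILE_READER": set(),
    "IMAGE_ANALYZER": {"FILE_READER"},
    "FILE_WRITER": {"IMAGE_ANALYZER"},
}

# Stand-in module bodies; each takes the list of its inputs and returns one output.
modules = {
    "FILE_READER": lambda inputs: "array image",
    "IMAGE_ANALYZER": lambda inputs: f"raw data from {inputs[0]}",
    "FILE_WRITER": lambda inputs: f"wrote {inputs[0]}",
}


def run(topology: dict, modules: dict) -> dict:
    """Run each module after all of its predecessors; return outputs by module."""
    outputs = {}
    for node in TopologicalSorter(topology).static_order():
        inputs = [outputs[pred] for pred in sorted(topology[node])]
        outputs[node] = modules[node](inputs)
    return outputs


order = list(TopologicalSorter(topology).static_order())
outputs = run(topology, modules)
print(order)
print(outputs["FILE_WRITER"])
```

Representing the topology as data rather than code means the same runner can execute any graph a scientist draws, as long as it contains no cycles.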
Process Flow Template, Study, Experiment Topology
[Figure: the process flow template, study, and experiment topology for the micro-array example combined in one view. In addition to the study entries listed earlier, the study contains:]
• Mouse tissue micro-array hybridization: MAD-lab-220, Exp-108, 2003-02-01, 65 °C, 2 µl of 3 M sodium acetate (pH 5.5)
• Mouse tissue hybridized micro-array: MAD-lab-221, Exp-108, Freezer 21, rack 0
VLAM-G: Implementation
VLAM-G Implementation
[Figure: the VLAM-G components diagram annotated with implementation technologies: Matisse ODBMS; 32-node PIII Linux cluster (DAS2); several stand-alone Linux boxes; Globus Toolkit; C / C++ programs; Matisse Java API, JDBC; Java; HTTP, RMI; CORBA; Globus libraries; XML. Two components are marked "Not implemented yet".]
VLAM-G Implementation – Current
[Figure: two client/server variants connected over the Internet]
• Variant 1: heavy client application; RMI; Session Manager RMI server with other VLAM-G components; service-based
• Variant 2: heavy client application; HTTP(S); Session Manager web server (servlet-based) with other VLAM-G components; service-based
VLAM-G Implementation – Alternative I
[Figure: heavy client applet (in a browser); HTTP(S) over the Internet; Session Manager web server (servlet-based) with other VLAM-G components; service-based]
• Slow (especially in a browser)
• Security issues (because of applets)
• Requires a powerful client machine
• Easy to maintain
VLAM-G Implementation – Alternative II
[Figure: heavy client application (Java Web Start); HTTP(S) over the Internet; Session Manager web server (servlet-based) with other VLAM-G components; service-based]
• Slow (only once for the first use)
• Requires a powerful client machine
• Easy to maintain
VLAM-G Implementation – Future
[Figure: thin client, applet or similar (in a browser); SOAP over the Internet; Session Manager web service, VLAM-G X Manager web service, VLAM-G Y Manager web service, other VLAM-G components; event-based]
• Easy to maintain
• Fast
• Off-the-shelf client machine
VLAM-G: In Action
PFT Editor
PFT Viewer
Topology Editor
Histogram demo
Floating ball experiment
MRI Scan experiment
VIMCO: Virtual Laboratory Information Management for Collaboration
VIMCO Overview
VIMCO objectives for scientists:
• Assistant
• Enabler / Facilitator
• Information Manager
VIMCO objectives for VLAM-G:
• Service & Session Information Manager
Virtual laboratory Information Management for Co-Operation
VIMCO: Information Modeling
Data model with generic constructs
Extending the generic constructs
Application specific database schemas
[Class diagram: generic experiment data model]
• Project: name : String, id : int, type : String, description : String, startDate : Date, endDate : Date, url : String
• Experiment: name : String, id : int, projectId : int, description : String, type : String, subject : String, status : String, lastUpdateDate : Date, publishedIn : String, relatedPublications : String, url : String
• Step: name : String, id : int, experimentId : int, description : String
• ActivityStep: activityDate : Date
• DataStep
• Property: name : String, value : String, unit : String
• Parameter: name : String, id : int, value : String, unit : String
• Comment: commentDate : Date, comments : String
• Person: name : String, id : int
• User: username : String, certSubject : String
• Organization: name : String, id : int, activityType : String
• ContactInfo: postalAddress : String, phone : String, fax : String, email : String, url : String
• Hardware: name : String, id : int, description : String, serialNo : String
• Software: name : String, id : int, description : String, serialNo : String, version : String
• HwTool
• SwTool
Associations: isPartOfProject / hasExperiments, hasNextExperiment / hasPrevExperiment, ownsExperiments / hasOwner, contributedExperiments / hasContributors, isPartOfExperiment / hasSteps, hasNextStep / hasPrevStep, hasSubStep / hasSuperStep, isPerformedBy / hasPerformed, hasEmployees / isEmployeeOf, hasComments, isMadeBy, hasProperties, hasParameters, hasHardware, hasSoftware, hasVendor, hasContactInfo
[Figure: UML class diagram of the complete application database schema. Recoverable classes and attributes:]
ActivityStep: activityDate : Timestamp
DataStep
Property: name, value, unit : String
ArraySpotting
Hybridization: isManuallyPerformed : Boolean; hybTemp, washTemp, washTime, washSolution, dryTime : String
CellSampleTreatment: aim, treatmentType, timeElapsed, temperature : String
RNAIsolation
RNALabeling: labelUsed : String; labelingRatio : Float
ArrayScanning
ArrayImageAnalysis
DataAnalysis
MicroArray: slideType, arrayDimensions, spotDimensions, pitch : String; numSpots : Integer; substrate : String
CellSample: phenotype, genotype, geneticVariation, organ, tissue, cellType, cellLine, cellCulture, sex, strain, developmentStage : String; age : Integer; ageUnit, growthConditions, pathologyDescription : String
TreatedCellSample
RNA: volume : String
Target: volume, concentration : String
HybridizedArray
ArrayImage: imageLocationURL : String
ArrayMeasurement: rawDataLocationURL : String
ChMeasurement: signal, bkgd, intensity : Double
Process
Protocol: name : String; id : Integer; description : String
ArrayTemplate: name : String; id : Integer; description : String
Spot: spotLocation : String; flag : Short; control : Boolean
Clone: name : String; id : Integer; description, type : String; length : Integer; sequence, unigeneId : String
Organism: name : String; id : Integer; description, subtype, ncbiTaxonomyAccNo : String
Gene: name : String; id : Integer; description, function, chromosome, emblAccNo : String
Person: name : String; id : Integer
User: username, certSubject : String
Organization: name : String; id : Integer; activityType : String
ContactInfo: postalAddress, phone, fax, email, url : String
Comment: commentDate : Timestamp; comments : String
Software: name : String; id : Integer; description, serialNo, version : String
Hardware: name : String; id : Integer; description, serialNo : String
HwTool, SwTool, DNALabHwTool, DNALabSwTool
Parameter: name : String; id : Integer; value, unit : String
Project: name : String; id : Integer; type, description : String; startDate, endDate : Timestamp; url : String
Experiment: name : String; id, projectId : Integer; description, type, subject, status : String; lastUpdateDate : Timestamp; publishedIn, relatedPublications, url : String
Step: name : String; id, experimentId : Integer; description : String
AttProperty: attName, className : String; handlingType, accessMod : Integer; isNullable : Boolean; displayName, description : String
PFT: name : String; id, pftGroupId, version : Integer; description : String
PFTElement: displayName : String; id, pftId : Integer; description, className, classCodebase : String; minCardinality, maxCardinality : Integer; isRTSProcess : Boolean; reusePolicy : Integer
PFTConnection: displayName : String; id : Integer; description, relName, inverseRelName : String; reusePolicy : Integer
PFTGUI: shape, color : String; icon : List<Byte>; x, y, w, h : Integer
StudyPFT: studyId, pftId : Integer
StudyElmPFTElm: studyElmId, pftElmId : Integer
OriginCopy: originOid, copyOid, updatePolicy : Integer
Association roles (multiplicity): hasProtocol 0..*, hasSource 0..1, hasGenes 0..*, hasOrganism 0..1, hasGene 0..1, containsClone 1..1, hasTemplate 1..1, usedByArrays 0..*, isMeasurementFor 0..1, hasMeasurement 0..*, hasTarget 0..*, isMadeBy 0..1, usesSwTool 0..1, usesHwTool 0..1, hasVendor 0..1, hasEmployees 0..*, isEmployeeOf 0..1, hasNextExperiment 0..1, hasPrevExperiment 0..1, hasComments 0..*, ownsExperiments 0..*, hasOwner 0..1, hasParameters 0..*, hasHardware 0..1, hasSoftware 0..1, hasNextStep 0..*, hasPrevStep 0..*, hasSubStep 0..1, hasSuperStep 0..1, isPartOfExperiment 0..1, hasSteps 0..*, hasProperties 0..*, isPerformedBy 0..1, hasPerformed 0..*, isFromPFTElement 0..1, hasOutgoingConnections 0..*, isToPFTElement 0..1, hasIncomingConnections 0..*, hasGUIProperties 0..1, isPartOfPFT 0..1, hasStartElements 0..*, isPartOfProject 0..1, hasExperiments 0..*, contributedExperiments 0..*, hasContributors 0..*, elements 0..*, isInArrayTemplate 0..1, hasSpots 0..*, hasChMeasurements 0..*, isForArrayMeasurement 1..1, isForSpot 1..1, isForTarget 0..1, hasContactInfo 0..*
Legend: generic types, extended types
Process and data flow in experiments
Micro-array experiments
[Figure: process-data flow of a micro-array experiment. Elements: ARRAY_SPOTTING, ARRAY_SPOTTER, ARRAY_SPOTTER_PRG, MICRO_ARRAY, SPOT, CLONE, GENE, ORGANISM, CELL_SAMPLE, CELL_SAMPLE_TREATMENT, TREATED_CELL_SAMPLE, mRNA_EXTRACTION, mRNA, mRNA_LABELING (name : string, id : string, description : string, activityDate : date, labelUsed : string, labelingRatio : float), mRNA_PROBE, HYBRIDIZATION, SLIDE_PROCESSOR, ARRAY_HYBRIDIZED, ARRAY_SCANNING, SCANNER, ARRAY_IMAGE, ARRAY_IMG_ANALYSIS, IMG_ANALYSIS_PRG, ARRAY_MEASUREMENT, SPOT_MEASUREMENT, CHANNEL_MEASUREMENT]
Material analysis experiments
[Figure: FTS 6000 spectrometer set-up. Components: ceramic mid-IR source, 50/50 beamsplitter, fixed mirror, step-scan mirror, turning mirrors, CaF2 lens, Cassegrainian condenser, sample, Cassegrainian objective, MCT focal plane array]
Process-data flow
Common aspects of experiments…
Approach
Representing Scientific Experiments
Three data models for the three components of the VLAM-G experiment model
• Process Flow Template: PFT Data Model
• Study: Experimentation Environment Data Model
• Experiment Topology: Module and Topology Data Models
PFT Data Model
• Represents PFTs as a set of PFT elements and the connections between them
• PFTElements correspond to data types defined in an application database schema
• PFTConnections correspond to relationships defined between data types in an application database schema
• PFTs are stored in application databases
[Figure: UML class diagram of the PFT data model]
PFT: name : String; id, pftGroupId, version : Integer; description : String
PFTElement: displayName : String; id, pftId : Integer; description, className, classCodebase : String; minCardinality, maxCardinality : Integer; isRTSProcess : Boolean; reusePolicy : Integer
PFTConnection: displayName, description : String; id : Integer; relName, inverseRelName : String; reusePolicy : Integer
PFTGUI: shape, color : String; icon : List<Byte>; x, y, w, h : Integer
Association roles (multiplicity): isPartOfPFT 0..1, hasStartElements 0..*, isFromPFTElement 0..1, hasOutgoingConnections 0..*, isToPFTElement 0..1, hasIncomingConnections 0..*, hasGUIProperties 0..1
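The element-and-connection structure of a PFT can be pictured as a small directed graph. The following is a minimal sketch, assuming a plain in-memory representation; the class bodies, the `walk` method, and the example element names are illustrative and are not taken from the VIMCO implementation. Only the attribute and role names (`displayName`, `relName`, `inverseRelName`, start elements, incoming and outgoing connections) come from the diagram.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory mirror of the PFT data model (not the VIMCO classes).
class PFTElement {
    final int id;
    final String displayName;
    final List<PFTConnection> outgoing = new ArrayList<>(); // hasOutgoingConnections 0..*
    final List<PFTConnection> incoming = new ArrayList<>(); // hasIncomingConnections 0..*

    PFTElement(int id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

class PFTConnection {
    final String relName;        // relationship name in the application schema
    final String inverseRelName; // name of the inverse relationship
    final PFTElement from;       // isFromPFTElement
    final PFTElement to;         // isToPFTElement

    PFTConnection(String relName, String inverseRelName, PFTElement from, PFTElement to) {
        this.relName = relName;
        this.inverseRelName = inverseRelName;
        this.from = from;
        this.to = to;
        from.outgoing.add(this);
        to.incoming.add(this);
    }
}

class PFT {
    final String name;
    final List<PFTElement> startElements = new ArrayList<>(); // hasStartElements 0..*

    PFT(String name) {
        this.name = name;
    }

    // Depth-first walk from the start elements; each element is reported once.
    List<String> walk() {
        Set<PFTElement> seen = new LinkedHashSet<>();
        for (PFTElement start : startElements) {
            visit(start, seen);
        }
        List<String> names = new ArrayList<>();
        for (PFTElement e : seen) {
            names.add(e.displayName);
        }
        return names;
    }

    private static void visit(PFTElement e, Set<PFTElement> seen) {
        if (!seen.add(e)) {
            return; // already visited
        }
        for (PFTConnection c : e.outgoing) {
            visit(c.to, seen);
        }
    }
}

class PFTDemo {
    // Illustrative fragment: RNA -> RNA LABELING -> TARGET
    static PFT example() {
        PFTElement rna = new PFTElement(1, "RNA");
        PFTElement labeling = new PFTElement(2, "RNA LABELING");
        PFTElement target = new PFTElement(3, "TARGET");
        new PFTConnection("hasNextStep", "hasPrevStep", rna, labeling);
        new PFTConnection("hasNextStep", "hasPrevStep", labeling, target);
        PFT pft = new PFT("micro-array fragment");
        pft.startElements.add(rna);
        return pft;
    }

    public static void main(String[] args) {
        System.out.println(example().walk());
    }
}
```

Storing both `relName` and `inverseRelName` on a connection lets a tool traverse the template in either direction while still naming the relationship as the application schema does.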
Experimentation Environment Data Model
• Generic constructs for common aspects of experiments
• Represents studies as instances of process flow templates
• EEDM elements correspond to objects in an application database
[Figure: UML class diagram of the generic EEDM constructs]
Project: name : String; id : int; type, description : String; startDate, endDate : Date; url : String
Experiment: name : String; id, projectId : int; description, type, subject, status : String; lastUpdateDate : Date; publishedIn, relatedPublications, url : String
Step: name : String; id, experimentId : int; description : String
ActivityStep: activityDate : Date
DataStep
Property: name, value, unit : String
Parameter: name : String; id : int; value, unit : String
Comment: commentDate : Date; comments : String
Person: name : String; id : int
User: username, certSubject : String
Organization: name : String; id : int; activityType : String
ContactInfo: postalAddress, phone, fax, email, url : String
HwTool, SwTool
Hardware: name : String; id : int; description, serialNo : String
Software: name : String; id : int; description, serialNo, version : String
Association roles: isMadeBy, hasVendor, hasEmployees, isEmployeeOf, hasNextExperiment, hasPrevExperiment, hasComments, ownsExperiments, hasOwner, hasParameters, hasHardware, hasSoftware, hasNextStep, hasPrevStep, hasSubStep, hasSuperStep, isPartOfExperiment, hasSteps, hasProperties, isPerformedBy, hasPerformed, isPartOfProject, hasExperiments, contributedExperiments, hasContributors, hasContactInfo
Modeling Application Specific Information
[Figure: UML class diagram of the micro-array specific types built on the generic type DataStep]
DataStep
MicroArray: slideType, arrayDimensions, spotDimensions, pitch : String; numSpots : Integer; substrate : String
CellSample: phenotype, genotype, geneticVariation, organ, tissue, cellType, cellLine, cellCulture, sex, strain, developmentStage : String; age : Integer; ageUnit, growthConditions, pathologyDescription : String
TreatedCellSample
RNA: volume : String
Target: volume, concentration : String
HybridizedArray
ArrayImage: imageLocationURL : String
ArrayMeasurement: rawDataLocationURL : String
ChMeasurement: signal, bkgd, intensity : Double
ArrayTemplate: name : String; id : Integer; description : String
Spot: spotLocation : String; flag : Short; control : Boolean
Clone: name : String; id : Integer; description, type : String; length : Integer; sequence, unigeneId : String
Organism: name : String; id : Integer; description, subtype, ncbiTaxonomyAccNo : String
Gene: name : String; id : Integer; description, function, chromosome, emblAccNo : String
Association roles (multiplicity): hasSource 0..1, hasGenes 0..*, hasOrganism 0..1, hasGene 0..1, containsClone 1..1, hasTemplate 1..1, usedByArrays 0..*, isMeasurementFor 0..1, hasMeasurement 0..*, hasTarget 0..*, isInArrayTemplate 0..1, hasSpots 0..*, hasChMeasurements 0..*, isForArrayMeasurement 1..1, isForSpot 1..1, isForTarget 0..1
Legend: generic EEDM type, extended application-specific types, related application-specific types
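Extending the generic constructs with application-specific types can be sketched with ordinary inheritance. The example below is a minimal illustration under stated assumptions: that ActivityStep and DataStep both specialize Step, that RNALabeling is modeled as an ActivityStep, and that RNA and Target are DataSteps. The constructors, `linkNext`, and `StepChain` are hypothetical helpers, not part of the deck's schema.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Generic constructs (names from the slides; bodies are illustrative).
abstract class Step {
    final int id;
    final String name;
    final List<Step> hasNextStep = new ArrayList<>();
    final List<Step> hasPrevStep = new ArrayList<>();

    Step(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Maintains both directions of the next/previous relationship.
    void linkNext(Step next) {
        hasNextStep.add(next);
        next.hasPrevStep.add(this);
    }
}

class ActivityStep extends Step {
    Date activityDate;

    ActivityStep(int id, String name) {
        super(id, name);
    }
}

class DataStep extends Step {
    DataStep(int id, String name) {
        super(id, name);
    }
}

// Application-specific extensions for the micro-array domain (assumed placement).
class RNA extends DataStep {
    String volume;

    RNA(int id, String volume) {
        super(id, "RNA");
        this.volume = volume;
    }
}

class RNALabeling extends ActivityStep {
    String labelUsed;
    float labelingRatio;

    RNALabeling(int id, String labelUsed, float labelingRatio) {
        super(id, "RNA labeling");
        this.labelUsed = labelUsed;
        this.labelingRatio = labelingRatio;
    }
}

class Target extends DataStep {
    String volume;
    String concentration;

    Target(int id, String volume, String concentration) {
        super(id, "Target");
        this.volume = volume;
        this.concentration = concentration;
    }
}

class StepChain {
    // Generic code: follows hasNextStep without knowing the extended types.
    static List<String> describe(Step first) {
        List<String> out = new ArrayList<>();
        Step s = first;
        while (s != null) {
            String kind = (s instanceof ActivityStep) ? "activity" : "data";
            out.add(kind + ":" + s.name);
            s = s.hasNextStep.isEmpty() ? null : s.hasNextStep.get(0);
        }
        return out;
    }

    static Step example() {
        RNA rna = new RNA(1, "10 ul");
        RNALabeling labeling = new RNALabeling(2, "Cy3", 1.0f);
        Target target = new Target(3, "10 ul", "unknown");
        rna.linkNext(labeling);
        labeling.linkNext(target);
        return rna;
    }

    public static void main(String[] args) {
        System.out.println(describe(example()));
    }
}
```

The point of the sketch is that code written against the generic constructs (here `StepChain.describe`) keeps working when a new domain adds its own subclasses.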
Module Data Model
Module (software entity): self-contained executable program
Description of:
• Tasks
• Input/output
• Run-time requirements
• Parameters
[Figure: UML class diagram of the module data model]
mSoftwareEntity: name : string; id : integer; description, classification : string; registrationDate : timestamp; minNumberCPUReq, avgNumberCPUReq, maxNumberCPUReq, minCPUTimeReq, avgCPUTimeReq, maxCPUTimeReq, minMemoryReq, avgMemoryReq, maxMemoryReq, minStorageReq, avgStorageReq, maxStrorageReq, version, manual : string
mAggSoftwareEntity, mAtomSoftwareEntity
mExecutable: executablePlatform, executableOs, executableURL : string; dynamicLibs : list<string>; executionHost, executionLocation, authCertSubject : string
mConnection: name : string; id : integer
mPort: name : string; id : integer; description, direction : string
mDataType: name : string; id : integer; description : string
mParameter: name, type, description : string; isRequired : boolean; defaultValue, values : string
mEnvVariable: name, defaultValue : string
mArgument: name, description : string; order : integer; isRequired : boolean; argumentSwitch, values : string
GUI: shape, color : string; icon : list<byte>; x, y, w, h : integer
Person: name : string; id : integer
User: username, certSubject : string
Organization: name : string; id : integer; activityType : string
ContactInfo: postalAddress, phone, fax, email, url : string
Association roles (multiplicity): mConsistsOf 0..*, mHasExecutables 0..*, mContainsConnection 0..*, hasInConn 0..1, hasOutConn 0..*, mHasToPort 0..1, mHasFromPort 0..1, mIsOutPortForSoftwareEntity 0..1, mHasOutPorts 0..*, mIsInPortForSoftwareEntity 0..1, mHasInPorts 0..*, mHasParameters 0..*, mHasVariables 0..*, mHasArguments 0..*, mIsDataTypeForPort 0..*, mHasDataType 0..1, mHasDevelopedSoftwareEntities 0..*, mHasDeveloper 1..1, hasGUIProperties 0..1, hasContactInfo 0..*, hasEmployees 0..*, isEmployeeOf 0..*
Topology Data Model
Experiment Topology (processing): graph defined by attaching modules to each other
Run-time info:
• Host
• Actual parameters
• Environment variables
[Figure: UML class diagram of the topology data model]
Processing: name : string; id : integer; description : string; creationDate : timestamp
Process
Port
Connection
Parameter: name, value : string
EnvVariable: name, value : string
RunTimeInfo: globusContext, logFileURL, hostName, rsl, commandLineArgs : string
GUI: shape, color : string; icon : list<byte>; x, y, w, h : integer
Person: name : string; id : integer
User: username, certSubject : string
Organization: name : string; id : integer; activityType : string
ContactInfo: postalAddress, phone, fax, email, url : string
Association roles (multiplicity): hasProcesses 0..*, isProcessInProcessing 0..1, containsConnections 0..*, hasParameters 0..*, hasVariables 0..*, hasToPort 0..1, hasFromPort 0..1, hasInConn 0..1, hasOutConn 0..*, isInPortForProcess 0..1, hasInPorts 0..*, isOutPortForProcess 0..1, hasOutPorts 0..*, hasRunTimeInfo 0..1, isDefinedBy 0..1, hasDefinedProcessings 0..*, hasGUIProperties 0..1, hasContactInfo 0..*, hasEmployees 0..*, isEmployeeOf 0..*
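The topology model describes a graph of processes joined through ports and connections. A minimal sketch of that structure follows, assuming that `hasInConn 0..1` means a port accepts at most one incoming connection while `hasOutConn 0..*` allows any number of outgoing ones. The class `ProcessNode` stands in for the diagram's Process (renamed here only to avoid confusion with `java.lang.Process`), and the `connect` check is an illustration, not the RTS behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory mirror of the topology data model.
class Port {
    final String name;
    Connection inConn;                                   // hasInConn 0..1 (assumed meaning)
    final List<Connection> outConns = new ArrayList<>(); // hasOutConn 0..*

    Port(String name) {
        this.name = name;
    }
}

class Connection {
    final Port from; // hasFromPort
    final Port to;   // hasToPort

    Connection(Port from, Port to) {
        this.from = from;
        this.to = to;
    }
}

class ProcessNode {
    final String name;
    final List<Port> inPorts = new ArrayList<>();  // hasInPorts 0..*
    final List<Port> outPorts = new ArrayList<>(); // hasOutPorts 0..*

    ProcessNode(String name) {
        this.name = name;
    }

    Port addInPort(String portName) {
        Port p = new Port(name + "." + portName);
        inPorts.add(p);
        return p;
    }

    Port addOutPort(String portName) {
        Port p = new Port(name + "." + portName);
        outPorts.add(p);
        return p;
    }
}

class Processing {
    final List<ProcessNode> processes = new ArrayList<>();  // hasProcesses 0..*
    final List<Connection> connections = new ArrayList<>(); // containsConnections 0..*

    // Attaches an output port to an input port, enforcing the assumed 0..1 rule.
    Connection connect(Port from, Port to) {
        if (to.inConn != null) {
            throw new IllegalStateException("in-port already connected: " + to.name);
        }
        Connection c = new Connection(from, to);
        from.outConns.add(c);
        to.inConn = c;
        connections.add(c);
        return c;
    }
}

class TopologyDemo {
    public static void main(String[] args) {
        Processing topology = new Processing();
        ProcessNode reader = new ProcessNode("reader");
        ProcessNode filter = new ProcessNode("filter");
        topology.processes.add(reader);
        topology.processes.add(filter);
        topology.connect(reader.addOutPort("out"), filter.addInPort("in"));
        System.out.println(topology.connections.size() + " connection(s)");
    }
}
```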
VIMCO: Functionality Modeling
VIMCO Functionality
VIMCO functionality for scientists:
Assistant
• Modeling scientific experiments
• Information presentation
• Customizing experimentation environment
• Information manipulation
Enabler / Facilitator
• Multi-disciplinary experiments
• Information integration
• Collaboration
Information Manager
• Information management services
• Several information management libraries
VIMCO functionality for VLAM-G:
Service & Session Information Manager
• Services Information Management
• Sessions Information Management
• User & Access-Rights Management
Modeling Scientific Experiments
Project * Experiment * Experiment Step
Application domains: Micro-Array Experiments, Material Analysis Experiments, Other Scientific Application Domains
[Figure: fragment of the micro-array process-data flow. Recoverable elements and attributes:]
ARRAY_SPOTTING
ARRAY_SPOTTER: has_sw_tool : SW_TOOL
MICRO_ARRAY: platform_type, array_dimensions, spot_dimensions, pitch : STRING; num_grids, num_rows, num_cols : INTEGER; substrate, binding_protocol, location : STRING
SPOT: grid, row, col : INTEGER; flag : SHORT
ORGANISM: name, id : STRING
CELL_SAMPLE: organ, tissue, cell_type, cell_line, cell_culture, development_stage, sex, strain : STRING
CELL_SAMPLE_TREATMENT
TREATED_CELL_SAMPLE
mRNA_EXTRACTION: cell_rupture, rna_isolated : STRING; volume : SHORT
mRNA
cDNA_LABELING: label : STRING; efficiency : FLOAT
cDNA_PROBE
Relationships (multiplicity): has_next_elm 0..N, has_prev_elm 0..N, has_hybridization 0..N, has_spot 0..N, has_measurement 0..N, has_spot_measurement 0..N, has_channel_measurement 0..N, has_clone 1..1, has_hw_tool 1..1, has_source 1..1
Generic constructs for modeling scientific experiments
Methodology for modeling complex experimental procedures
Modeling application specific experimental data / information by extending the generic constructs
Information Presentation
Experiment Topology
– Graphical representation of self-contained data processing modules attached to each other in a workflow
Process-Flow Template
– Graphical representation of data elements and processing steps in an experimental procedure
– Information to support context-sensitive assistance
Study
– Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies
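Because a study is an instance of a PFT, a tool can check a study against its template. The sketch below assumes the check uses the `minCardinality` and `maxCardinality` attributes of PFT elements and that a maximum of -1 means "unbounded"; both the interpretation of -1 and the classes `TemplateElement` and `StudyValidator` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reduced view of a PFT element: only the cardinality bounds.
class TemplateElement {
    final String className;
    final int minCardinality;
    final int maxCardinality;

    TemplateElement(String className, int minCardinality, int maxCardinality) {
        this.className = className;
        this.minCardinality = minCardinality;
        this.maxCardinality = maxCardinality;
    }
}

class StudyValidator {
    static final int UNBOUNDED = -1; // assumption: -1 means "no upper limit"

    // Returns one message per template element whose instance count is out of bounds.
    static List<String> violations(List<TemplateElement> template, Map<String, Integer> counts) {
        List<String> problems = new ArrayList<>();
        for (TemplateElement e : template) {
            Integer boxed = counts.get(e.className);
            int n = (boxed == null) ? 0 : boxed;
            if (n < e.minCardinality) {
                problems.add(e.className + ": too few (" + n + ")");
            }
            if (e.maxCardinality != UNBOUNDED && n > e.maxCardinality) {
                problems.add(e.className + ": too many (" + n + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<TemplateElement> template = new ArrayList<>();
        template.add(new TemplateElement("RNA", 0, 1));
        template.add(new TemplateElement("Comment", 0, UNBOUNDED));

        Map<String, Integer> study = new LinkedHashMap<>();
        study.put("RNA", 2);     // exceeds the upper bound of 1
        study.put("Comment", 5); // fine, no upper bound
        System.out.println(violations(template, study));
    }
}
```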
Customizing Experimentation Environment (future work)
Ontology management
• Dictionary for formal definitions of entities (data elements, processes, and their attributes) in a specific scientific domain
• Definitions for inter-disciplinary research
User-defined data types
• Extending the application data model with customized data types, e.g. for modeling user-specific experimental steps and results
User-defined experiment templates
• For experienced scientists: more specialized experiments through customized PFTs
Molecular Biology Ontology (example entries)
• Micro-Array: Definition: The glass slide where the clones are spotted. Attributes: Pitch, Number of spots, Template used …
• Target: Definition: mRNA extracted from samples under study.
• Probe: Definition: See target.
Material Analysis Ontology (example entries as shown on the slide)
• Sample: Definition: The glass slide where the clones are spotted. Attributes: Pitch, Number of spots, Template used …
• Artist: Definition: mRNA extracted from samples under study.
• Surface analysis: Definition: See target.
User-defined data type example: BioSample { targetRna; sampleCovering; … }
[Figure: customized PFT fragment. Each element is annotated with a reuse policy, a pair of numbers, and a boolean flag; each relationship with a reuse policy.]
Elements: RNA (COPY)(0, 1)(false); RNA COMMENT (COPY)(0, -1)(false); RNA PROPERTY (COPY)(0, -1)(false); RNA OWNER (LINK)(0, 1)(false); RNA COMMENTATOR (LINK)(0, -1)(false); RNA LABELING (COPY)(0, 1)(false); RL COMMENT (COPY); RL PROPERTY (COPY)(0, -1)(false); RL PROTOCOL (COPY)(0, 1)(false); RL OPERATOR (LINK)(0, 1)(false); RL COMMENTATOR (LINK)(0, -1)(false)
Relationships: hasComments (COPY), hasProperties (COPY), hasProtocol (COPY), isPerformedBy (LINK), isMadeBy (LINK), hasPerformed (NOREUSE), hasNextStep (NOREUSE), hasPrevStep (NOREUSE)
Information Manipulation
PFT manipulation
• Creating, modifying, deleting PFTs
• Version control
Experiment topology manipulation
• Reading and writing of experiment topologies
Study manipulation
• Creating, modifying, deleting studies
• Linking studies to their corresponding PFTs
• Linking studies to experiment topologies
• Linking studies to large (raw) data sets
Query facilities
– Querying for specific experiment steps
• All related information about the queried step
– Application-specific query interfaces
[Figure: example study elements: PROJECT, EXPERIMENT, ARRAY SPOTTING, MICRO-ARRAY, COMMENT, PROPERTY, PROTOCOL, DNA LAB HW TOOL, DNA LAB SW TOOL, ARRAY SPOTTER, ARRAY SPOTTER SOFTWARE, PARAMETER]
Multi-Disciplinary Projects
Project Design
Micro-Array Experiment
Material Analysis Experiment
Simulation & Visualization
Information Integration (future work)
ARCHIPEL: Generic cooperative framework for:
• cross-institutional data sharing and exchange
• (semantic) integration of diverse information from heterogeneous sources
• management of integrated data
Using the available distributed computing facilities:
– VLAM-G RTS
– Grid
[Figure: UML class diagram of the generic data model; the same diagram appears three times on the slide]
Project: name : String; id : Integer; type, description : String; startDate, endDate : Timestamp; url : String
Experiment: name : String; id, projectId : Integer; description, type, subject, status : String; lastUpdateDate : Timestamp; publishedIn, relatedPublications, url : String
Step: name : String; id, experimentId : Integer; description : String
ActivityStep: activityDate : Timestamp
DataStep
Property: name, value, unit : String
Parameter: name : String; id : Integer; value, unit : String
Comment: commentDate : Timestamp; comment : String
Person: name : String; id : Integer
User: phone, fax, email, url : String
Organization: name : String; id : Integer; activityType, postalAddress, phone, fax, email, url : String
HwTool, SwTool
Hardware: name : String; id : Integer; description, serialNo : String
Software: name : String; id : Integer; description, serialNo, version : String
Association roles (multiplicity): hasComments 0..*, hasNextStep 0..*, hasPrevStep 0..*, hasSubStep 0..1, hasSuperStep 0..1, hasProperties 0..*, hasParameters 0..*, hasSoftware 0..1, hasHardware 0..1, hasVendor 0..1, isMadeBy 0..1, isPerformedBy 1, hasPerformed 0..*, hasEmployees 0..*, isEmployeeOf 0..1, isPartOfProject 0..1, hasExperiments 0..*, hasNextExperiment 0..1, hasPrevExperiment 0..1, isPartOfExperiment 0..1, hasSteps 0..*, ownsExperiments 0..*, hasOwner 0..1, hasContributors 0..*, contributedExperiments 0..*
Collaboration
Information sharing
• Complete studies or study steps
• Experiment topology designs
• Modules
Sharing policies
• Copy/reuse policy: make a copy of the shared object or make a link to it
• Update policy: automatic or manual update of shared objects, or no update
Basic access control to information
• Access-rights management at different levels:
– Study, study step, step attribute
– PFT
– Topology
– Module
Multi-disciplinary projects
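The two sharing policies can be sketched together. This is a minimal illustration, assuming the reuse policies are COPY, LINK, and NOREUSE (the labels used in the PFT figures) and that the update policy takes three values matching "automatic, manual, no update". The classes `SharedObject`, `OriginCopy` (named after the schema class with `originOid`, `copyOid`, `updatePolicy`), and `SharingService` are hypothetical and do not reproduce VIMCO's actual behaviour.

```java
import java.util.ArrayList;
import java.util.List;

enum ReusePolicy { COPY, LINK, NOREUSE }

enum UpdatePolicy { AUTOMATIC, MANUAL, NONE }

// Hypothetical shared object: an oid plus a single value field.
class SharedObject {
    final int oid;
    String value;

    SharedObject(int oid, String value) {
        this.oid = oid;
        this.value = value;
    }
}

// Remembers which copy came from which origin, and how it should be updated.
class OriginCopy {
    final SharedObject origin;
    final SharedObject copy;
    final UpdatePolicy updatePolicy;

    OriginCopy(SharedObject origin, SharedObject copy, UpdatePolicy updatePolicy) {
        this.origin = origin;
        this.copy = copy;
        this.updatePolicy = updatePolicy;
    }
}

class SharingService {
    private final List<OriginCopy> copies = new ArrayList<>();
    private int nextOid = 1000;

    // Applies the reuse policy: COPY duplicates, LINK shares, NOREUSE yields nothing.
    SharedObject reuse(SharedObject origin, ReusePolicy reuse, UpdatePolicy update) {
        switch (reuse) {
            case LINK:
                return origin;
            case COPY: {
                SharedObject copy = new SharedObject(nextOid++, origin.value);
                copies.add(new OriginCopy(origin, copy, update));
                return copy;
            }
            default:
                return null;
        }
    }

    // Propagates a change of the origin to copies with an AUTOMATIC update policy only.
    void originChanged(SharedObject origin) {
        for (OriginCopy oc : copies) {
            if (oc.origin == origin && oc.updatePolicy == UpdatePolicy.AUTOMATIC) {
                oc.copy.value = origin.value;
            }
        }
    }
}

class SharingDemo {
    public static void main(String[] args) {
        SharingService service = new SharingService();
        SharedObject protocol = new SharedObject(1, "protocol v1");
        SharedObject copy = service.reuse(protocol, ReusePolicy.COPY, UpdatePolicy.AUTOMATIC);
        protocol.value = "protocol v2";
        service.originChanged(protocol);
        System.out.println(copy.value);
    }
}
```

A linked object needs no update policy because there is only one instance; the update policy only matters for copies.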
Information Manager
Information management services for:
• Domain information
– Available data sources, data types, access mechanisms
• Domain-specific ontology information
• Domain-specific experimental information
– PFTs, studies, modules, topologies
• User information
– Profile, preferences, roles, permissions / access-rights
• Session information
– Active sessions, active PFTs, studies, jobs
Several information management libraries:
• Multi-platform: Java / C / C++, Grid, JDBC / ODBC
• Several interfaces: RMI, Activatable RMI, HTTP/HTTPS Servlet
VIMCO: Architecture Design
Position of VIMCO within VLAM-G
[Figure: VLAM-G architecture. Components shown: users; Front-End; Collaboration; Session Manager; RTS; Assistant; VIMCO; Grid / Globus services; modules M1, M2, M3, M4; Resource Repository with Resource A, Resource B, …; RTS DB, Project DB, VIMCO DB, and application DBs]
VIMCO Architecture
[Figure: three-tier VIMCO architecture. Nodes and components shown per tier:]
Communication tier: VIMCO Communication Server Node with VIMCO RMI Server {OR} VIMCO Activatable RMI Server {OR} VIMCO HTTP Server
Application server tier: VIMCO Core Functionality Server Node (VIMCO Server, VIMCO Server Manager, Lookup Server, Connection Manager, XML Manager, Log Manager); VLAM-G Server Node (VLAM-G Session Manager)
(Data) resources tier: DB Server Node-0 to DB Server Node-4, hosting VIMCO DB, PROJECT DB, RTS DB, EXPRESSIVE DB, and MACS DB
VIMCO: Databases
VIMCO Databases
VIMCO DB
• User information (roles, access-rights)
• Active sessions
• Available data sources
RTS DB
• Definitions of available software entities
• Processings
Project DB
• Experiments involved in multi-disciplinary projects
Application databases (Expressive DB, MACS DB, other application databases)
• Expressive DB: DNA micro-array procedures and contexts
• MACS DB: Material analysis procedures and contexts
Multi-Disciplinary Projects in Project DB
[Figure: a Project held in the PROJECT DB, shown together with Experiment * Experiment Step structures in the EXPRESSIVE DB and the MACS DB]
Project * Experiment * Experiment Step
VIMCO: Implementation
VIMCO Implementation
Interfaces: RMI, RMI Activation, HTTP(S) Servlet
Technologies: Java, C, C++; XML; JDBC; Mts2Java API; VLAM-G RTS
Database: Matisse ODBMS
Hardware: 32-node PIII Linux cluster (DAS2); 4-processor Sun Ultra4
[Figure: the VIMCO architecture diagram (VIMCO Communication Server Node, VIMCO Core Functionality Server Node, DB Server Node-0 to Node-4 with VIMCO DB, PROJECT DB, RTS DB, EXPRESSIVE DB, and MACS DB), annotated with the technologies and machines listed above]
Main VIMCO Components
[Figure: UML class diagram of the main components]
ConnectionManager
+init()
+getConnection(in databaseObject : Object, in username : String, in password : String, in sessionId : int) : IConnection
+removeConnection(in databaseObject : Object, in username : String, in sessionId : int)
+removeAllConnections()
+removeSessionConnections(in sessionId : int)
«interface» IConnection (shown with JMatisse and MtsJava)
+connect(in database : Object, in username : String, in password : String)
+disconnect()
+startTransaction()
+commitTransaction()
+abortTransaction()
+startReadOnlyTransaction()
+endReadOnlyTransaction()
+execSQL(in query : String) : Vector
+readObject(in oid : int) : Object
+writeObject(in object : Object) : Object
+deleteObject(in oid : int)
+deleteObject(in object : Object)
+isConnectionOpen() : boolean
+createEmptyObject(in typeName : String) : int
+readAllClassInfos() : Vector
«type» XML: id : int; value : String
XMLManager, JSXManager, ElectricXMLManager, VimcoXMLManager (each with the same two operations)
+readXML(in xml : XML) : Object
+writeXML(in object : Object) : XML
«uses»: JSX, ElectricXML, VimcoXML
LookupServer
-sessionList : Hashtable
-databaseList : Hashtable
-userList : Hashtable
-roleList : Hashtable
-restrictionList : Hashtable
-publicUserList : Hashtable
-password : String
-sessionId : String
-identOid : int
VimcoServer
+authenticateUser(in distinguishedName : String) : int
+getUserType(in userId : int) : int
+getAvailableServices(in userId : int) : Hashtable
+readUser(in sessionId : int, in dbId : int, in userId : int) : Hashtable
+writeUser(in sessionId : int, in dbId : int, in user : XML) : Hashtable
+deleteUser(in sessionId : int, in dbId : int, in userId : int)
+readOrganization(in sessionId : int, in dbId : int, in organizationId : int) : Hashtable
+readAllOrganizations(in sessionId : int, in dbId : int) : Hashtable
+writeOrganization(in sessionId : int, in dbId : int, in organization : XML) : Hashtable
+deleteOrganization(in sessionId : int, in dbId : int, in organizationId : int)
+readRole(in sessionId : int, in dbId : int, in roleId : int) : Hashtable
+writeRole(in sessionId : int, in dbId : int, in role : XML) : Hashtable
+deleteRole(in sessionId : int, in dbId : int, in roleId : int)
+readRestriction(in sessionId : int, in dbId : int, in restrictionId : int) : Hashtable
+writeRestriction(in sessionId : int, in dbId : int, in restriction : XML) : Hashtable
+deleteRestriction(in sessionId : int, in dbId : int, in restrictionId : int)
+addUserRole(in sessionId : int, in dbId : int, in userId : int, in roleId : int)
+removeUserRole(in sessionId : int, in dbId : int, in userId : int, in roleId : int)
+addRoleRestriction(in sessionId : int, in dbId : int, in roleId : int, in restrictionId : int)
+removeRoleRestriction(in sessionId : int, in dbId : int, in roleId : int, in restrictionId : int)
+createSession(in sessionObject : Object)
+updateSession(in sessionObject : Object)
+terminateSession(in sessionId : int)
+readPFT(in sessionId : int, in dbId : int, in pftId : int) : Hashtable
+writePFT(in sessionId : int, in dbId : int, in pft : XML) : Hashtable
+writePFTAs(in sessionId : int, in dbId : int, in pft : XML) : Hashtable
+deletePFT(in sessionId : int, in dbId : int, in pftId : int)
+readStudy(in sessionId : int, in dbId : int, in studyId : int) : Hashtable
+getReuseObjects(in sessionId : int, in dbId : int, in copyStudyElementId : int, in currentPFTElementId : int) : Hashtable
+writeStudy(in sessionId : int, in dbId : int, in study : XML) : Hashtable
+writeStudyAs(in sessionId : int, in dbId : int, in study : int) : Hashtable
+deleteStudy(in sessionId : int, in dbId : int, in studyId : int)
+readModule(in sessionId : int, in dbId : int, in moduleId : int) : Hashtable
+readAllModules(in sessionId : int, in dbId : int) : Hashtable
+writeModule(in sessionId : int, in dbId : int, in module : XML) : Hashtable
+writeModuleAs(in sessionId : int, in dbId : int, in module : XML) : Hashtable
+deleteModule(in sessionId : int, in dbId : int, in moduleId : int)
+readTopology(in sessionId : int, in dbId : int, in topologyId : int) : Hashtable
+writeTopology(in sessionId : int, in dbId : int, in topology : XML) : Hashtable
+writeTopologyAs(in sessionId : int, in dbId : int, in topology : XML) : Hashtable
+deleteTopology(in sessionId : int, in dbId : int, in topologyId : int)
+getAllClassInfos(in sessionId : int, in dbId : int) : Hashtable
+writeObjects(in sessionId : int, in dbId : int, in objects : XML) : Hashtable
+execQuery(in sessionId : int, in dbId : int, in query : String) : Hashtable
+getNextId() : int
+instantiateSession(in sessionId : int, in userId : int) : Hashtable
+readAttProperties(in sessionId : int, in dbId : int) : Hashtable
+readAttProperties(in sessionId : int, in dbId : int, in pftId : int) : Hashtable
ConnectionManager: Connection caching
LookupServer: Fast memory access
XMLManager: Independence from 3rd party tools
VimcoServer: Uniform API
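A minimal sketch of how a client might walk through a few of the operations listed for VimcoServer: authenticate, open a session, read a study, close the session. The four method names and parameter lists follow the operation list; everything else is an assumption made for illustration, including the class name VimcoApiSketch, the in-memory tables, the seeding helpers addUser and addStudy, the -1 return for an unknown user, and the idea that a session id is drawn from getNextId(). It does not reproduce the actual VIMCO implementation.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

/** Hypothetical in-memory stand-in for a small subset of the VimcoServer operations. */
class VimcoApiSketch {
    private final Hashtable<String, Integer> userIdsByName = new Hashtable<String, Integer>();
    private final Hashtable<Integer, Hashtable<String, Object>> studiesById =
            new Hashtable<Integer, Hashtable<String, Object>>();
    private final Set<Integer> openSessions = new HashSet<Integer>();
    private int nextId = 1;

    /** Listed operation: hands out a fresh identifier. */
    int getNextId() {
        return nextId++;
    }

    /** Hypothetical seeding helper, not part of the listed API. */
    int addUser(String distinguishedName) {
        int id = getNextId();
        userIdsByName.put(distinguishedName, id);
        return id;
    }

    /** Hypothetical seeding helper, not part of the listed API. */
    int addStudy(String title) {
        int id = getNextId();
        Hashtable<String, Object> study = new Hashtable<String, Object>();
        study.put("id", id);
        study.put("title", title);
        studiesById.put(id, study);
        return id;
    }

    /** Listed operation; returning -1 for an unknown user is an assumption. */
    int authenticateUser(String distinguishedName) {
        Integer id = userIdsByName.get(distinguishedName);
        return id == null ? -1 : id;
    }

    /** Listed operation; the contents of the returned table are an assumption. */
    Hashtable<String, Object> instantiateSession(int sessionId, int userId) {
        openSessions.add(sessionId);
        Hashtable<String, Object> session = new Hashtable<String, Object>();
        session.put("sessionId", sessionId);
        session.put("userId", userId);
        return session;
    }

    /** Listed operation; dbId is accepted but ignored by this single-store sketch. */
    Hashtable<String, Object> readStudy(int sessionId, int dbId, int studyId) {
        if (!openSessions.contains(sessionId)) {
            throw new IllegalStateException("No open session " + sessionId);
        }
        Hashtable<String, Object> study = studiesById.get(studyId);
        return study == null ? new Hashtable<String, Object>() : study;
    }

    /** Listed operation. */
    void terminateSession(int sessionId) {
        openSessions.remove(sessionId);
    }

    public static void main(String[] args) {
        VimcoApiSketch api = new VimcoApiSketch();
        api.addUser("CN=Example User");
        int studyId = api.addStudy("Example study");

        int userId = api.authenticateUser("CN=Example User");
        int sessionId = api.getNextId();
        api.instantiateSession(sessionId, userId);
        System.out.println(api.readStudy(sessionId, 0, studyId).get("title"));
        api.terminateSession(sessionId);
    }
}
```

Every data operation in the list takes sessionId and dbId first, which is what lets one facade serve several databases behind a uniform call shape.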
VIMCO Object Model
// Generated by SchemaClassGenerator - Do not modify!
package nl.wtcw.vlamg.app.expressive.db.user;

import nl.wtcw.vlamg.vimco.db.driver.Persistent;
import java.util.Vector;

public class MicroArray extends DataStep implements Persistent {
    // Attributes
    public String substrate;
    public int numSpots;
    public String pitch;
    public String spotDimensions;
    public String arrayDimensions;
    public String slideType;

    // Associations
    public Vector hasMeasurement;
    public Vector hasTemplate;

    // Do not modify the empty constructor.
    // If needed, add new constructors or custom methods.
    public MicroArray() { }
}
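The generated class exposes its attributes as public fields, so a generic layer can inspect any such object without class-specific code. A minimal sketch of that idea using Java reflection follows. It is an illustration only: the Persistent interface here is a local stand-in for the driver's interface, MicroArray is trimmed to four attributes with DataStep and the association Vectors left out, and the toAttributeMap helper is hypothetical and not taken from VIMCO.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

class FieldMappingSketch {
    /** Local stand-in for the driver's marker interface (assumption). */
    interface Persistent { }

    /** Trimmed stand-in for the generated class shown on the slide. */
    static class MicroArray implements Persistent {
        public String substrate;
        public int numSpots;
        public String pitch;
        public String slideType;
    }

    /**
     * Collects the non-null public instance fields of an object into a
     * name-to-value map. A TreeMap keeps the keys in alphabetical order,
     * because reflection does not guarantee field order.
     */
    static Map<String, Object> toAttributeMap(Persistent object) {
        Map<String, Object> attributes = new TreeMap<String, Object>();
        for (Field field : object.getClass().getFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue; // skip constants; only instance state is of interest
            }
            try {
                Object value = field.get(object);
                if (value != null) {
                    attributes.put(field.getName(), value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        MicroArray array = new MicroArray();
        array.substrate = "glass";
        array.numSpots = 9600;
        System.out.println(toAttributeMap(array));
    }
}
```

Unset String attributes are skipped, while the primitive numSpots always appears because it is boxed to an Integer and can never be null.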
MicroArray
+slideType : String
+arrayDimensions : String
+spotDimensions : String
+pitch : String
+numSpots : Integer
+substrate : String
+hasTemplate 1..1
+usedByArrays 0..*
+isMeasurementFor 0..1
+hasMeasurement 0..*
Removing object-boundary problem
Development approach Problem domain Requirements analysis Related work VLAM-G VIMCO Summary & Acknowledgements
[Figure: instrument diagram. Labels: MCT focal plane array; FTS 6000 Spectrometer; step-scan mirror; fixed mirror; 50/50 beamsplitter; ceramic mid-IR source; turning mirrors; CaF2 lens; Cassegrainian condenser; Cassegrainian objective; sample]
[Figure: VLAM-G component diagram. Labels: Session Manager; VIMCO Server; EXPRESSIVE, MACS, VLARCHIVE; Front-End; Process Flow Template; Experiment Topology; Distributed RTS (Node A, Node B); Start PSE; End PSE]
[Figure: object graph annotated with reuse policies (COPY, LINK, NOREUSE).
Classes: PROJECT (LINK); EXPERIMENT (COPY); EXPERIMENT (LINK); COMMENT (COPY); OWNER (LINK); CONTRIBUTOR (LINK); COMMENTATOR (LINK); ARRAYMEASUREMENT (COPY); PROPERTY (COPY); DATAANALYSIS (COPY).
Relations: hasExperiments (NOREUSE); hasSteps (NOREUSE); hasComments (COPY); hasOwner (LINK); isPartOfProject (NOREUSE); ownsExperiments (NOREUSE); hasContributors (LINK); contributedExperiments (NOREUSE); hasNextExperiment (NOREUSE); hasPrevExperiment (NOREUSE); isPartOfExperiment (NOREUSE); isMadeBy (LINK); hasProperties (COPY); isPerformedBy (LINK); hasPerformed (NOREUSE); hasNextStep (NOREUSE); hasPrevStep (NOREUSE)]
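The figure tags each class and relation with COPY, LINK, or NOREUSE. One plausible reading, stated here as an assumption and not confirmed by the slides, is that reusing an existing experiment duplicates COPY targets, shares LINK targets by reference, and drops NOREUSE relations. The sketch below illustrates that reading only. All names in it (ReusePolicySketch, Node, Edge, reuse) are hypothetical, and it assumes COPY relations form a tree so that the recursion terminates.

```java
import java.util.ArrayList;
import java.util.List;

class ReusePolicySketch {
    /** Policy labels taken from the figure; their semantics here are assumed. */
    enum Reuse { COPY, LINK, NOREUSE }

    /** Hypothetical graph node standing for a stored object. */
    static class Node {
        final String type;
        final List<Edge> edges = new ArrayList<Edge>();

        Node(String type) {
            this.type = type;
        }

        /** Adds a named relation to a target and returns this node for chaining. */
        Node relate(String name, Reuse policy, Node target) {
            edges.add(new Edge(name, policy, target));
            return this;
        }
    }

    /** Hypothetical relation carrying its reuse policy. */
    static class Edge {
        final String name;
        final Reuse policy;
        final Node target;

        Edge(String name, Reuse policy, Node target) {
            this.name = name;
            this.policy = policy;
            this.target = target;
        }
    }

    /** Builds a new object graph from an existing one according to the policies. */
    static Node reuse(Node original) {
        Node copy = new Node(original.type);
        for (Edge edge : original.edges) {
            switch (edge.policy) {
                case COPY:
                    // Duplicate the target recursively.
                    copy.relate(edge.name, edge.policy, reuse(edge.target));
                    break;
                case LINK:
                    // Share the very same target object.
                    copy.relate(edge.name, edge.policy, edge.target);
                    break;
                case NOREUSE:
                    // Leave the relation out of the new graph.
                    break;
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Node owner = new Node("OWNER");
        Node experiment = new Node("EXPERIMENT")
                .relate("hasComments", Reuse.COPY, new Node("COMMENT"))
                .relate("hasOwner", Reuse.LINK, owner)
                .relate("hasNextExperiment", Reuse.NOREUSE, new Node("EXPERIMENT"));
        Node reused = reuse(experiment);
        System.out.println(reused.edges.size() + " relations carried over");
    }
}
```

Under this reading a reused experiment keeps its own copies of comments and properties, still points to the same owner and project, and starts without predecessor or successor links.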
Experimentation in VLAM-G
Acknowledgements
CAPS (Computer Architecture & Parallel Systems)
director: Prof. Bob Hertzberger
CO-IM (Co-operative Information Management) Group
group leader: Dr. Hamideh Afsarmanesh
&
The VLAM-G Team