SimDB and SimTAP
Dealing with a complex data model
Gerard Lemson, Nara, 2010-12-10
SimDB and SimDAL
Protocols to support
• describing simulations
– Simulation Data Model: Model for N-body 3+1D any simulations http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/specification/uml/SimDB_DM.png
• publishing simulations
– Simulation Database (SimDB): protocol for accessing a database built according to SimDM.
• finding simulations
– SimDB/TAP
– queryData in SimDAL
– SimTAP
• retrieving simulation data, whole, in parts, manipulated
– SimDAL getData services (not in this talk)
• Btw: “simulation” can be
– simulation run
– simulation result
– simulation data
– post-processing of simulation results
SimDB/REST
• “simple” access to SimDB
• Uses XML representation of model
– XML schema http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/xsd
• Examples http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples
– PDR http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples/external/PDR
– Gadget2 http://volute.googlecode.com/svn-history/r1382/trunk/projects/theory/snapdm/specification/examples/external/Gadget2/Gadget2.xml
– TODO more (SVO)
• VO-URP
– validator http://www.g-vo.org/SimDB-browser/Validate.do
– upload http://www.g-vo.org/SimDB-browser
– download http://www.g-vo.org/SimDB-browser
SimDB/TAP
• Model complex
– Too(?) complex for trivial (parameter based) query language
– Need special navigation tools (vo-urp@gavo)
– Need powerful query language
• Implement TAP on database built according to SimDM
• Map UML to RDB model
– TAP_SCHEMA for SimDM (vo-urp@gavo old) http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/tap
– create table + inserts
– VODataService
• VO-URP SQL query http://www.g-vo.org/SimDB-browser/Query.do
• Not always easy!
Model complex
• Normalised (see image)
• General, abstract
– e.g. parameters must be fully defined, no assumptions
• Hard to deal with quantities with a priori unknown units
– ParameterSetting table has value AND unit attributes (Quantity datatype)
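A minimal sketch of why the value AND unit pair makes queries harder: a range constraint only makes sense after every value has been brought to a common unit. The `Quantity` and `UnitFactor` tables and the conversion factor are illustrative assumptions, not part of SimDM; the example uses Python's built-in sqlite3 module as a stand-in for a TAP service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-in for a table carrying Quantity attributes (value AND unit).
    CREATE TABLE Quantity (ownerId INTEGER, value REAL, unit TEXT);
    -- Hypothetical lookup table: factor that converts each unit to M_sun.
    CREATE TABLE UnitFactor (unit TEXT PRIMARY KEY, to_msun REAL);
""")
conn.executemany("INSERT INTO Quantity VALUES (?, ?, ?)", [
    (1, 1e14, "M_sun"),
    (2, 1.98847e44, "kg"),      # roughly 1e14 M_sun, expressed in kg
    (3, 5e12, "M_sun"),
])
conn.executemany("INSERT INTO UnitFactor VALUES (?, ?)", [
    ("M_sun", 1.0),
    ("kg", 1.0 / 1.98847e30),   # approximate solar mass in kg
])

# Normalise to M_sun inside the query, then apply the range constraint.
ids = [row[0] for row in conn.execute("""
    SELECT q.ownerId
      FROM Quantity q JOIN UnitFactor u ON u.unit = q.unit
     WHERE q.value * u.to_msun BETWEEN 5e13 AND 2e14
     ORDER BY q.ownerId
""")]
print(ids)
```

Without the conversion join, a constraint on the raw value column would silently miss the row stored in kg.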
Example queries
• Find synthetic spectra of white dwarf stars
• Find cosmological simulations with Ω=0.9, ΩΛ= 0.7 and Ωb=0.02
• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun
select e.*
  from experiment e
     , targetObject t
     , result r
     , product p
 where t.label = 'white_dwarf'
   and t.containerid = e.id
   and r.containerid = e.id
   and r.targetId = t.id
   and p.containerid = r.id
   and p.productType = 'spectrum'
Example queries
• Find (cosmological) simulations with Ω=0.9, ΩΛ= 0.7 and Ωb=0.02
select e.*
  from Experiment e
     , InputParameter ip1, ParameterSetting ps1
     , InputParameter ip2, ParameterSetting ps2
     , InputParameter ip3, ParameterSetting ps3
 where ps1.containerId = e.id and ps1.parameterId = ip1.id
   and ip1.label = 'omega_lambda' and ps1.numericalValue_value = 0.7
   and ps2.containerId = e.id and ps2.parameterId = ip2.id
   and ip2.label = 'omega_baryon' and ps2.numericalValue_value = 0.02
   and ps3.containerId = e.id and ps3.parameterId = ip3.id
   and ip3.label = 'omega' and ps3.numericalValue_value = 0.9
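The join pattern of this query can be tried on toy data. The sketch below assumes a heavily cut-down version of the three tables (only the columns the query touches) and uses Python's built-in sqlite3 module in place of a TAP service; the ids and values are made up for illustration. Each parameter constraint needs its own InputParameter/ParameterSetting alias pair, joined both to the experiment and to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced toy schema: only the columns used by the query.
    CREATE TABLE Experiment (id INTEGER PRIMARY KEY);
    CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE ParameterSetting (containerId INTEGER, parameterId INTEGER,
                                   numericalValue_value REAL);
    INSERT INTO Experiment VALUES (123), (345);
    INSERT INTO InputParameter VALUES
        (456, 'omega_baryon'), (457, 'omega_lambda'), (458, 'omega');
    INSERT INTO ParameterSetting VALUES
        (123, 456, 0.02), (123, 457, 0.7), (123, 458, 0.9),
        (345, 456, 0.04), (345, 457, 0.7), (345, 458, 1);
""")

# One (ip, ps) alias pair per constrained parameter.
QUERY = """
select e.id
  from Experiment e
     , InputParameter ip1, ParameterSetting ps1
     , InputParameter ip2, ParameterSetting ps2
     , InputParameter ip3, ParameterSetting ps3
 where ps1.containerId = e.id and ps1.parameterId = ip1.id
   and ip1.label = 'omega_lambda' and ps1.numericalValue_value = 0.7
   and ps2.containerId = e.id and ps2.parameterId = ip2.id
   and ip2.label = 'omega_baryon' and ps2.numericalValue_value = 0.02
   and ps3.containerId = e.id and ps3.parameterId = ip3.id
   and ip3.label = 'omega' and ps3.numericalValue_value = 0.9
"""
ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```

Every additional parameter constraint adds two more tables to the FROM clause, which is where the join count comes from.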
Example queries
• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun
select e.*
  from Experiment e
     , ExperimentRepresentationObject ero
     , RepresentationObjectType rot
     , TargetObject tobj
     , Property p
     , StatisticalSummary s
 where ero.containerId = e.id and ero.typeId = rot.id
   and rot.label = 'sph.particle'
   and tobj.containerId = e.id and tobj.label = 'galaxy.cluster'
   and p.containerId = tobj.id and p.label = 'mass'
   and s.propertyId = p.id and s.statistic = 'value'
   and s.numericalValue_value = 1e14
   and s.numericalValue_unit = 'M_sun'
An example from Paris. Find typical values of mass, x, y, z properties in a given simulation result

SELECT r.id as id, r.publisherdid as publisherdid
     , s0.numericValue_value as mass
     , s1.numericValue_value as x
     , s2.numericValue_value as y
     , s3.numericValue_value as z
  FROM result r, product o
     , statisticalsummary s0, property p0
     , statisticalsummary s1, property p1
     , statisticalsummary s2, property p2
     , statisticalsummary s3, property p3
 WHERE r.containerid = 6
   AND o.containerid = r.id
   and s0.containerid = o.id and s1.containerid = o.id
   and s2.containerid = o.id and s3.containerid = o.id
   and p0.publisherdid = 'mass' and s0.propertyid = p0.id and s0.statistic = 'nominal'
   and p1.publisherdid = 'x' and s1.propertyid = p1.id and s1.statistic = 'nominal'
   and p2.publisherdid = 'y' and s2.propertyid = p2.id and s2.statistic = 'nominal'
   and p3.publisherdid = 'z' and s3.propertyid = p3.id and s3.statistic = 'nominal'
SELECT r.id as id, r.publisherdid
     , max(case when p.publisherdid = 'mass' and s.statistic = 'nominal' then s.numericValue_value else null end) as mass
     , max(case when p.publisherdid = 'x' and s.statistic = 'nominal' then s.numericValue_value else null end) as x
     , max(case when p.publisherdid = 'y' and s.statistic = 'nominal' then s.numericValue_value else null end) as y
     , max(case when p.publisherdid = 'z' and s.statistic = 'nominal' then s.numericValue_value else null end) as z
  FROM result r, product o, statisticalsummary s, property p
 WHERE r.containerid = 6
   AND o.containerid = r.id
   and s.containerid = o.id
   and p.id = s.propertyid
 group by r.id, r.publisherdid, o.id
Conclusions
• Some queries can be phrased nicely
• Others can be phrased in standard SQL, but the level of normalisation and abstraction means MANY joins are required
• Can we simplify this a bit?
zoom

ParameterSetting
containerId  value  unit  parameterId
...          ...    ...   ...
123          0.02         456
123          0.7          457
123          0.9          458
345          0.04         456
345          0.7          457
345          1            458
...          ...    ...   ...

+

InputParameter
id   name     label         datatype  description
456  omega_b  omega.baryon  real      ...
457  omega_l  omega.lambda  real      ...
458  omega    omega         real      ...
...  ...      ...           ...       ...

simtap.Experiment
id   omega_b  omega_l  omega  ...
123  0.02     0.7      0.9
345  0.04     0.7      1
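The step from the two normalised tables to the wide Experiment table can be sketched as a pivoting view. This is an illustration under assumptions, not the SimTAP definition: it uses Python's built-in sqlite3 module, loads the rows shown in the zoom tables, and names the view `simtap_Experiment` (with an underscore, since this sketch does not set up a separate `simtap` schema).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, name TEXT, label TEXT);
    CREATE TABLE ParameterSetting (containerId INTEGER, value REAL, unit TEXT,
                                   parameterId INTEGER);
    INSERT INTO InputParameter VALUES
        (456, 'omega_b', 'omega.baryon'),
        (457, 'omega_l', 'omega.lambda'),
        (458, 'omega',   'omega');
    INSERT INTO ParameterSetting VALUES
        (123, 0.02, NULL, 456), (123, 0.7, NULL, 457), (123, 0.9, NULL, 458),
        (345, 0.04, NULL, 456), (345, 0.7, NULL, 457), (345, 1,   NULL, 458);

    -- One row per experiment, one column per input parameter.
    -- max() skips the NULLs produced by the non-matching CASE branches.
    CREATE VIEW simtap_Experiment AS
    SELECT ps.containerId AS id,
           max(CASE WHEN ip.name = 'omega_b' THEN ps.value END) AS omega_b,
           max(CASE WHEN ip.name = 'omega_l' THEN ps.value END) AS omega_l,
           max(CASE WHEN ip.name = 'omega'   THEN ps.value END) AS omega
      FROM ParameterSetting ps
      JOIN InputParameter ip ON ip.id = ps.parameterId
     GROUP BY ps.containerId;
""")

wide = conn.execute("SELECT * FROM simtap_Experiment ORDER BY id").fetchall()
matches = conn.execute(
    "SELECT id FROM simtap_Experiment "
    "WHERE omega_l = 0.7 AND omega_b = 0.02 AND omega = 0.9"
).fetchall()
print(wide)
print(matches)
```

Once the view exists, the parameter search needs no joins at all; the cost of the pivot is paid once, in the view definition.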
SimTAP
• When Protocol is fixed, TAP schema can be simplified
– parameter columns in simtap.Experiment table
– property characterisation columns in product-specific characterisation table(s)
– ...
Instead of this

select e.*
  from Experiment e
     , InputParameter ip1, ParameterSetting ps1
     , InputParameter ip2, ParameterSetting ps2
     , InputParameter ip3, ParameterSetting ps3
 where ps1.containerId = e.id and ps1.parameterId = ip1.id
   and ip1.label = 'omega_lambda' and ps1.numericalValue_value = 0.7
   and ps2.containerId = e.id and ps2.parameterId = ip2.id
   and ip2.label = 'omega_baryon' and ps2.numericalValue_value = 0.02
   and ps3.containerId = e.id and ps3.parameterId = ip3.id
   and ip3.label = 'omega' and ps3.numericalValue_value = 0.9

this

select e.*
  from simtap.Experiment e
 where omegaLambda = 0.7
   and omegaBaryon = 0.02
   and omega = 0.9
Table definitions can be derived
• From a Protocol definition
– input parameters
– for each Representation object type
  • a table with statistical summaries of properties
– target object type
  • à la SimDM (units in ADQL required)
  • pivoted per project?
– input data sets (urls)
• Pivoting queries can be generated
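As a rough illustration of generating a pivoting query from a Protocol's input parameters, the sketch below builds a CREATE VIEW statement from a mapping of column names to parameter labels. The function name `pivot_query`, the view name, and the reduced table layout (ParameterSetting with containerId, value, parameterId; InputParameter with id, label) are assumptions made for this example.

```python
def pivot_query(parameters, view="simtap_Experiment"):
    """Return SQL for a view with one column per input parameter.

    `parameters` maps the desired column name to the InputParameter label.
    Names and labels are assumed to come from a trusted Protocol definition.
    """
    columns = []
    for column, label in parameters.items():
        if not column.isidentifier():
            raise ValueError(f"not a usable column name: {column!r}")
        quoted = label.replace("'", "''")   # escape quotes inside the SQL literal
        columns.append(
            f"       max(case when ip.label = '{quoted}' then ps.value end) as {column}"
        )
    return (
        f"create view {view} as\n"
        "select ps.containerId as id,\n"
        + ",\n".join(columns) + "\n"
        "  from ParameterSetting ps\n"
        "  join InputParameter ip on ip.id = ps.parameterId\n"
        " group by ps.containerId"
    )


sql = pivot_query({"omega_b": "omega.baryon", "omega_l": "omega.lambda"})
print(sql)
```

The same approach could in principle produce the per-product characterisation tables, with property labels taking the place of parameter labels.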
Proposal
• SimDAL services MAY include a SimTAP service
• 1 SimTAP schema per Protocol
• Each such schema contains
– 1 Experiment table with columns for parameters
– >=1 Product tables with characterisation of properties
– Possibly other tables from SimDB/TAP