A.Z. Fazliev, N.A. Lavrentiev, A.I. Privezentsev
V.E. Zuev Institute of Atmospheric Optics SB RAS, Academician Zuev Square 1, Tomsk 634021, Russia
E-mail: [email protected]
HITRAN Conference, Cambridge, 16-18 June 2010
Introduction
  e-Science. Service oriented architecture
  e-Science. Three layers information systems
  The Data, Information and Knowledge Lifecycle
Architecture of W@DIS
Model of Quantitative Molecular Spectroscopy
Data Validity
  Formal constraints
  Selection rules. Primary data sources
  Publication constraints
  Non-formal constraints
Current State of W@DIS
e-Science. Service oriented architecture
De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned
for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
e-Science. Three layers information systems
De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
The Data-Computation Layer
“As soon as computers are interconnected and communicating we have a distributed system, and the issues in designing, building and deploying distributed computer systems have now been explored over many years. First it positions the Grid within the bigger picture of distributed computing, asking whether it is subsumed by current solutions. Then we look in more detail at the requirements and currently deployed technologies in order to identify issues for the next generation of the infrastructure. Since much of the grid computing development has addressed the data-computation layer, this section particularly draws upon the work of that community.”
The Information Layer
“This layer focuses firstly on the Web. The Web’s information handling capabilities are clearly an important component of the e-Science infrastructure, and the web infrastructure is itself of interest as an example of a distributed system that has achieved global deployment. The second aspect addressed is support for collaboration, something which is key to e-Science. The information layer aspects build on the idea of a ‘collaboratory’, defined as a ‘centre without walls, in which the nation’s researchers can perform their research without regard to geographical location - interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries.’”
The Knowledge Layer
“The aim of the knowledge layer is to act as an infrastructure to support the management and application of scientific knowledge to achieve particular types of goal and objective. In order to achieve this, it builds upon the services offered by the data-computation and information layers. The first thing to reiterate at this layer is the problem of the sheer scale of content we are dealing with. We recognise that the amount of data that the data grid is managing will be huge. By the time that data is equipped with meaning and turned into information we can expect order of magnitude reductions in the amount. However the amount of information remaining will certainly be enough to present us with a problem – a problem recognised as infosmog – the condition of having too much information to be able to take effective action or apply it in an appropriate fashion to a specific problem. Once information is delivered that is destined for a particular purpose, we are in the realm of the knowledge grid that is fundamentally concerned with abstracted and annotated content, with the management of scientific knowledge.”
The Data, Information and Knowledge Lifecycle
De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
Acquire
Modelling
Retrieve
Publish
Maintain
The challenge of knowledge publishing or disseminating can be described as getting the right data, information and knowledge, in the right form, to the right person or system, at the right time … .
Getting the right data, …
What is the hierarchy of the problems?
1. Lifecycle. Implement a Distributed Information System that makes it simpler for the investigator to acquire, retrieve and publish data and information.
2. Publish. Create publishing tools.
3. Key question. Guarantee data validity.
4. Constraint types.
   1. Restrictions on physical entity values (for instance, selection rules). Verification of the restrictions is identical to verification of the statement ∀X(S).
   2. Existence (publication) restrictions (∃Y(S)).
∃ - existential quantifier, ∀ - universal quantifier
S – spectroscopy domain, X – physical entity characterized by quantum numbers, Y – published data set
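The two constraint types can be read as quantified checks. The Python sketch below is only an illustration under stated assumptions: a value restriction has to hold for every physical entity (universal check), while a publication restriction only asks that at least one published data set contains the entity (existential check). The function names and the sample records are hypothetical; the sample restriction is the selection rule Ka + Kc = J or J + 1 quoted on the Data Validity slide.

```python
# Hypothetical sketch: the two constraint types as quantified checks.

def satisfies_value_restriction(entities, restriction):
    """Universal check: the restriction must hold for every entity X."""
    return all(restriction(x) for x in entities)

def is_published(entity, published_sets):
    """Existential check: some published data set Y contains the entity."""
    return any(entity in y for y in published_sets)

# Illustrative entities: (J, Ka, Kc) rotational quantum numbers.
entities = [(1, 0, 1), (2, 1, 2), (3, 1, 2)]

# Illustrative restriction: Ka + Kc = J or J + 1.
def rule(q):
    j, ka, kc = q
    return ka + kc in (j, j + 1)

# Illustrative published data sets (made-up contents).
published_sets = [{(1, 0, 1), (2, 1, 2)}, {(3, 1, 2)}]

all_valid = satisfies_value_restriction(entities, rule)
unpublished = [x for x in entities + [(4, 0, 4)]
               if not is_published(x, published_sets)]
```

Here `all_valid` reports whether every entity passes the restriction, and `unpublished` lists the entities for which no published data set was found.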
Node Architecture of W@DIS. Semantic Web approach
Layers: Data-computation layer, Information layer, Knowledge layer
Web-services:
  Web-service of publications database synchronization
  Web-service for the formation of a homogeneous set of properties of inverse and direct tasks' solutions in a distributed system
  Web-service for the formation of an ontology of molecular spectroscopy tasks' solutions properties
Interfaces:
  Protégé interface
  W@DIS, CaD@DIS
Node applications:
  System of direct and inverse spectroscopy problems' solutions input
  Spectral functions calculation
  Formation of composite problems' solutions
  Inference engine
  Logical consistency check
  Decomposition of problems' solutions according to publications
  Description of non-calculable properties of molecular spectroscopy inverse and direct problems' solutions
  Computation of calculated properties of direct and inverse spectroscopy problems' solutions
Data:
  Composite solutions of spectroscopy tasks
  Primary solutions of inverse spectroscopy tasks
  Primary solutions of direct spectroscopy tasks
  Publications DB
  Ontology of spectroscopy tasks' solutions properties
  Molecular spectroscopy tasks' solutions properties
Model of Quantitative Molecular Spectroscopy
Two chains of problems are selected for the domain: Direct Problems and Inverse Problems.
[Diagram: the two chains, with side labels Computations and Measurements]
Isolated molecule physical characteristics (T1)
Isolated molecules spectral line parameters (T2)
Spectral line profile parameters (T3)
Spectral functions calculation (T4)
Quantum numbers assignment to spectral lines (T5)
Einstein coefficients (T6)
Isolated molecule energy levels (T7)
Spectral functions measurement (E)
Interacting molecule spectral line parameters (ET)
Elementary solution of spectroscopic problem
Elementary source characteristics
molecule – H2O
the list of physical quantities – energy levels E (cm-1), quantum numbers (v1 v2 v3 J Ka Kc), …
publication – Schwenke D.W., New H2O Rovibrational Line Assignments. // Journal of Molecular Spectroscopy, 1998, v. 190, no. 2, p. 397-402
data – …
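As a rough illustration of the four characteristics listed above (molecule, list of physical quantities, publication, data), an elementary solution could be held in a small record type. This is a minimal sketch, not the W@DIS data model: the class name `ElementarySolution` and its field names are hypothetical, and the data rows are left empty because the slide elides them.

```python
from dataclasses import dataclass, field

@dataclass
class ElementarySolution:
    """Hypothetical record for one elementary solution (one primary data source)."""
    molecule: str
    quantities: list    # names of the physical quantities, with units where given
    publication: str    # bibliographic reference of the source
    data: list = field(default_factory=list)  # rows of values; elided on the slide

schwenke_1998 = ElementarySolution(
    molecule="H2O",
    quantities=["energy level E (cm-1)", "v1", "v2", "v3", "J", "Ka", "Kc"],
    publication=("Schwenke D.W., New H2O Rovibrational Line Assignments. "
                 "Journal of Molecular Spectroscopy, 1998, v. 190, no. 2, p. 397-402"),
)
```

Keeping the publication inside the record reflects the slide's point that every elementary solution is tied to exactly one source.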
IUPAC Data group approach (information aspect)
Data Validity
Formal constraints
  Data type – quantum numbers: natural numbers, …; intensity, halfwidth, frequency, energy levels: positive real numbers, …
  Variation interval – 0 < wavenumber < 45000 cm-1, 10^-30 cm/mol < intensity < 10^-16 cm/mol
  Selection rules – normal modes: Ka + Kc = J or J + 1, …
Publication constraints
  Whether data are published or not
  Other constraints (transitivity and antisymmetry axioms)
  …
Non-formal constraints
  Experts’ opinion
XML, OWL DL
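The formal constraints above lend themselves to a mechanical check. The following Python sketch is an illustration only, not the W@DIS validation component: the record layout and field names are hypothetical, "natural numbers" is taken to include zero, and the intensity interval is read as 10^-30 to 10^-16 cm/mol.

```python
def check_line(record):
    """Return the list of formal constraints violated by one transition record.

    The field names (J, Ka, Kc, wavenumber, intensity) are hypothetical.
    """
    errors = []

    # Data type: quantum numbers are natural numbers (zero allowed here).
    quantum_ok = True
    for name in ("J", "Ka", "Kc"):
        value = record[name]
        if not (isinstance(value, int) and value >= 0):
            errors.append(f"{name} must be a natural number")
            quantum_ok = False

    # Variation intervals quoted on the slide.
    if not (0 < record["wavenumber"] < 45000):
        errors.append("wavenumber outside (0, 45000) cm-1")
    if not (1e-30 < record["intensity"] < 1e-16):
        errors.append("intensity outside (1e-30, 1e-16) cm/mol")

    # Selection rule for normal modes: Ka + Kc = J or J + 1.
    if quantum_ok:
        j, ka, kc = record["J"], record["Ka"], record["Kc"]
        if ka + kc not in (j, j + 1):
            errors.append("selection rule Ka + Kc = J or J + 1 violated")

    return errors

good = {"J": 2, "Ka": 1, "Kc": 2, "wavenumber": 1500.0, "intensity": 1e-22}
bad = {"J": 2, "Ka": 2, "Kc": 2, "wavenumber": 50000.0, "intensity": 1e-22}
good_errors = check_line(good)
bad_errors = check_line(bad)
```

Returning a list of messages rather than a single flag keeps every violated constraint visible, which matters when a data source is judged record by record.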
Selection rules. Primary data sources
Notation, e.g. 9 (2): total number of data sources (number of correct data sources)

Molecule   Problem T1, Problem T7    Problem T2, Problem T6    Problem T3, Problem T5
           Direct    Inverse         Direct    Inverse         Direct    Inverse
H2O        9 (2)     30 (24)         5 (0)     91 (47)         5 (0)     183 (167)
H2^17O     4 (0)     19 (15)         5 (1)     40 (31)         4 (0)     19 (16)
H2^18O     4 (0)     18 (18)         5 (1)     59 (35)         4 (0)     29 (17)
HDO        1 (0)     32 (28)         3 (0)     83 (56)         2 (0)     8 (3)
HD^17O     -         3 (3)           2 (0)     3 (3)           2 (0)     6 (6)
HD^18O     -         5 (4)           2 (0)     6 (6)           2 (0)     7 (7)
D2O        1 (1)     18 (8)          3 (0)     38 (26)         3 (0)     10 (7)
D2^17O                               1 (0)     3 (3)           2 (0)     1 (1)
D2^18O                               2 (0)     6 (6)           2 (0)     1 (1)
Total      15 (3)    125 (100)       28 (2)    318 (207)       26 (0)    264 (225)

Privesentsev A.I., Ontological knowledge base implementation and software for information resources description in molecular spectroscopy, Tomsk State University, PhD Dissertation, 2009, 238 pages
Validity. Publication constraints
Informatics restriction. RFC 2396: A (information) resource can be anything that has identity.
Decomposition. Mathematical basis. Axiom of reflexivity: for each a, a = a.
Physical constraint
a -- 1st type of measurements --> A1
a -- 2nd type of measurements --> A2
Experimental accuracy: ε
Criterion of identity: |A1 – A2| < ε
The complete validated data sets are published by the IUPAC data group (J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part I. Energy Levels and Transition Wavenumbers for H2^17O and H2^18O, Journal of Quantitative Spectroscopy and Radiative Transfer, July 2009, V.110, no.9-10, P.573-596).
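The criterion of identity on this slide replaces exact equality (a = a) with agreement inside the experimental accuracy. A minimal Python sketch, assuming the bound is a single number supplied by the user; the function name `identical` and the sample values are hypothetical and are not taken from any data source.

```python
def identical(a1, a2, eps):
    """Criterion of identity: two measured values of the same quantity are
    treated as identical if they differ by less than the accuracy eps."""
    return abs(a1 - a2) < eps

# Illustrative values in cm-1 (made up for the example).
same = identical(1594.746, 1594.748, eps=0.01)
different = identical(1594.746, 1595.100, eps=0.01)
```

With this reading, two sources reporting the same transition agree when `identical` holds for their values, and disagree otherwise.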
Decomposition. HITRAN-2008 (H2^18O)
H2^18O (N = 9753)
Columns: (cm-1) | Data source (inverse problem / direct problem) | Residual

N = 31, 0-10 cm-1
10^-1 | 1971_StBe 1970_PoJo 2006_GoMaGuKn 1972_LuHeCoGo | 28
10^-1 | 2008_ZoShOvPo 2007_ZoOvShPo 2007_ScPaTa_a 2007_ScPaTa_b | 8

N = 15, 10-30 cm-1
10^-3 | 2006_GoMaGuKn 1972_LuHeCoGo 1987_BeKoPoTr 1999_MaNaNAOd 1976_FlGi 1981_Partridg | 0
10^-2 | 2008_ZoShOvPo 2007_ZoOvShPo 2007_ScPaTa_a 2007_ScPaTa_b | 0

N = 33, 30-50 cm-1
10^-2 | 1999_MaNaNAOd 1985_Johns 1981_Partridg 1976_FlGi | 17
10^-2 | 2007_ScPaTa_a 2007_ScPaTa_b 2007_ZoOvShPo 2008_ZoShOvPo | 0

N = 342, 50-200 cm-1
10^-1 | 1999_MaNaNAOd 1985_Johns 1980_KaKy 1978_KaKaKy 1977_Winther | 205
10^-1 | 2007_ScPaTa_a 2007_ScPaTa_b 2008_ZoShOvPo 2007_ZoOvShPo | 2

N = 6889, 200-10000 cm-1
10^-1 | 1985_Johns 1978_KaKaKy 1980_KaKy 1977_Winther 2003_MiTyMe 1998_Toth 1992_Toth 2006_LiDuSoWa 1983_Guelachv 1993_Toth 1971_WiNaJo 1978_JoMc 1975_ToMa 1994_Toth_a 1969_FrNaJo 1983_PiCoCaFl 1973_CaFlGuAm 1983_ToBr 2007_JeDaReTy 2002_MiTyStAl 1985_ChMaFlCa 1977_ToCaFl_a 1986_ChMaFlCa 1989_UlZhSh 2006_LiNaSoVo 1986_ChMaCaFl_b 2004_MaRoMiNa 1994_Toth_b 2007_MiLeKaCa 1977_ToFlCa_b 2001_MoSaGiCi 2001_MoSaGiCi 2005_ToTe 2006_LiHuCaMa 1987_ChMaFlCa 2005_ToNaZoSh | 1259
10^0 | 2007_ScPaTa_a 2007_ScPaTa_b 2008_ZoShOvPo 2007_ZoOvShPo 1983_PiCoCaFl | 302

N = 2443, 10000-20000 cm-1
10^-1 | 1987_ChMaFlCa 2005_ToNaZoSh 2007_MaToCa 1995_ByNaPeSc 2002_TaBrTe 2005_TaNaBrTe | 164
10^0 | 2007_ZoOvShPo 2007_ScPaTa_a 2007_ScPaTa_b 2008_ZoShOvPo | 209
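One possible reading of the table, offered here as an assumption rather than as the authors' procedure, is that the first column is a matching tolerance in cm-1 and the Residual column counts HITRAN lines for which none of the listed sources holds a value within that tolerance. Under that assumption a residual count can be sketched in Python as follows; the function name `residual` and all numbers are hypothetical.

```python
from bisect import bisect_left

def residual(database_lines, source_lines, tol):
    """Count database line positions with no source value within tol (cm-1)."""
    src = sorted(source_lines)
    unmatched = 0
    for nu in database_lines:
        i = bisect_left(src, nu)
        # The closest source values are the neighbours of the insertion point.
        near = [src[j] for j in (i - 1, i) if 0 <= j < len(src)]
        if not any(abs(nu - s) <= tol for s in near):
            unmatched += 1
    return unmatched

# Made-up line positions in cm-1, for illustration only.
hitran = [10.0, 20.5, 30.2, 45.0]
measured = [10.04, 30.9]                   # stands in for inverse-problem sources
calculated = [10.3, 20.45, 30.25, 44.0]    # stands in for direct-problem sources

res_inverse = residual(hitran, measured, tol=0.1)
res_direct = residual(hitran, calculated, tol=0.1)
```

Sorting the source values once and using a binary search keeps the comparison cheap even for ranges with several thousand lines, such as the 200-10000 cm-1 block.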
Publication constraints
0. L.S. Rothman, R.R. Gamache, A. Goldman, L.R. Brown, R.A. Toth, H.M. Pickett, R.L. Poynter, J.-M. Flaud, C. Camy-Peyret, A. Barbe, N. Husson, C.P. Rinsland, and M.A.H. Smith, “The HITRAN database: 1986 Edition,” Appl. Opt. 26, 4058-4097 (1987).
26. H. Partridge and D.W. Schwenke, “The determination of an accurate isotope dependent potential energy surface for water from extensive ab initio calculations and experimental data,” J. Chem. Phys. 106, 4618-4639 (1997).
28. J.P. Chevillard, J.-Y. Mandin, J.-M. Flaud, and C. Camy-Peyret, “H2^18O: line positions and intensities between 9500 and 11 500 cm-1. The (041), (220), (121), (201), (102), and (003) interacting states,” Can. J. Phys. 65, 777-789 (1987).
30. R.A. Toth, “Linelist of water vapor parameters from 500 to 8000 cm-1,” see http://mark4sun.jpl.nasa.gov/data/spec/H2O.
34. Calculation from K.V. Jucks, private communication (2000).
Composite data source
Unpublished data source
Published data and data in HITRAN are not the same
Non-formal constraints
J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part I. Energy Levels and Transition Wavenumbers for H2^17O and H2^18O, Journal of Quantitative Spectroscopy and Radiative Transfer, July 2009, V.110, no.9-10, P.573-596.
J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part II. Energy Levels and Transition Wavenumbers for HDO, HD^17O and HD^18O, Journal of Quantitative Spectroscopy and Radiative Transfer, 2010.
Current State of W@DIS
http://wadis.saga.iao.ru
http://saga.molsp.phys.spbu.ru
http://atmos.appl.sci-nnov.ru
e-Library (Primary Data): H2O, H2S, SO2, CO2, N2O, NH3, CH4, C2H2, CO, O2, OCS, HNCO (~2000 articles)
Digitized Data: H2O, H2S, CO2, CO, CH4 (~1200 data sets)
Upload Systems: H2O, H2S, SO2, O3, N2O, OCS, NH3, C2H2, COHBrOCO2, CH4 (at the end of August 2010)
Data Base & Knowledge Base of DIS: CO2, H2O, H2S (now); + NH3, CO, CH4 (at the end of 2010)
Summary
► A node prototype of a Distributed Information System for acquiring, retrieving, publishing and maintaining data, information and knowledge in quantitative molecular spectroscopy has been developed and implemented
► A component of the publishing tools that provides formal validation of data has been implemented
► IS W@DIS – http://wadis.saga.iao.ru, http://atmos.appl.sci-nnov.ru, http://saga.molsp.phys.spbu.ru
Acknowledgements
We thank Prof. J. Tennyson for his assistance in creating all the data collections and Dr. S. Tashkun for his contribution to the CO2 data collection.
A. Fazliev thanks Prof. Vl.G. Tyuterev for fruitful discussions on the publication constraints.
This work has received partial support from RFBR and the 7th Framework Programme.