Fox CI and X-informatics - CSIG 2008, Aug 11
Community cyberinfrastructure and X-informatics - Assessment of convergence and innovation based on project experience
Peter Fox
High Altitude Observatory, NCAR
Work performed in part with Deborah McGuinness (RPI), Rob Raskin (JPL), Krishna Sinha (VT), Luca Cinquini
(NCAR), Patrick West (NCAR), Stephan Zednik (NCAR), Paulo Pinheiro da Silva (UTEP), Li Ding (RPI) and
others
Outline
• Background and inevitabilities
• Informatics -> e-Science
• Informatics methodology, e.g. Semantic Web as an approach and a technology
– Virtual Observatories: use cases, some examples, and non-specialist use
– Data ingest, integration, mining and where we are heading
• Discussion
Background
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed
And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…
But data has Lots of Audiences
From “Why EPO?”, a NASA internal report on science education, 2005
[Figure: audiences for information products, ranked from More Strategic to Less Strategic. Annotation: SCIENTISTS TOO]
Shifting the Burden from the User to the Provider
The Astronomy approach; data-types as a service
[Diagram: VO App1, VO App2, VO App3, … reach DB1, DB2, DB3, … DBn through a VO layer made up of VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol and Simple Time Access Protocol]
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
Under review
Open Geospatial Consortium:
Web {Feature, Coverage, Mapping} Service
Sensor Web Enablement:
Sensor {Observation, Planning, Analysis} Service
use the same approach
Mind the Gap!
• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the
science of (data and) information, the practice
of information processing, and the engineering
of information systems. Informatics studies the
structure, behavior, and interactions of natural
and artificial systems that store, process and
communicate (data and) information. It also
develops its own conceptual and theoretical
foundations. Since computers, individuals and
organizations all process information,
informatics has computational, cognitive and
social aspects, including study of the social
impact of information technologies. Wikipedia.
Progression after progression
IT Cyber
Infrastructure
Cyber Informatics
Core Informatics
Science Informatics,
aka
Xinformatics
Science, SBAs
Informatics
Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.
Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language: i.e. appear to be local and integrated
Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage -> thus part IT, part CI, part Informatics
[Diagram: VO Portal, Web Serv. and VO API over DB1, DB2, DB3, … DBn, topped by two semantic mediation layers (VSTO, low level; mid-upper level) that reach education, clearinghouses, other services, disciplines, etc. Each level is marked "Added value": metadata, schema, data; query, access and use of data; semantic query, hypothesis and inference; semantic interoperability]
Mediation Layer
• Ontology: capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
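The mapping step of a mediation layer can be sketched in a few lines. This is a minimal illustration only, not the VSTO implementation: the registry contents, the dataset names (`fpi_obs`, `lidar_obs`) and the native parameter codes are hypothetical, and a real system would derive them from the ontology rather than from a hard-coded dictionary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessRequest:
    source: str     # which underlying database to contact
    dataset: str    # dataset identifier inside that source (hypothetical)
    parameter: str  # native parameter code used by that source (hypothetical)


# Hypothetical registry: ontology-level concept -> (source, dataset, native code).
REGISTRY = {
    "NeutralTemperature": [("DB1", "fpi_obs", "TN"), ("DB2", "lidar_obs", "temp_n")],
    "NeutralWind": [("DB1", "fpi_obs", "VN")],
}


def mediate(concept: str) -> List[AccessRequest]:
    """Map one ontology-level parameter concept to source-specific access requests."""
    return [AccessRequest(s, d, p) for s, d, p in REGISTRY.get(concept, [])]


requests = mediate("NeutralTemperature")
```

The user asks for a concept; the mediator, not the user, knows which databases hold it and what each one calls it.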
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
Use Case
Small Team, mixed skills
Analysis
Adopt Technology Approach
Leverage Technology
Infrastructure
Rapid Prototype
Open World: Evolve, Iterate,
Redesign, Redeploy
Use Tools
Science/Expert Review & Iteration
Develop model/
ontology
Science and technical use cases
Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
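The "any order, any combination" requirement in the second use case can be illustrated with optional constraints that are applied only when supplied. This is a sketch under assumptions: the `Record` type, the `query` function and the holdings are hypothetical, and no SOAP interface is modelled here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


def query(records: Iterable[Record],
          instrument: Optional[str] = None,
          parameter: Optional[str] = None,
          start: Optional[date] = None,
          end: Optional[date] = None) -> List[Record]:
    """Keep records that satisfy every constraint supplied; None means unconstrained."""
    checks = []
    if instrument is not None:
        checks.append(lambda r: r.instrument == instrument)
    if parameter is not None:
        checks.append(lambda r: r.parameter == parameter)
    if start is not None:
        checks.append(lambda r: r.day >= start)
    if end is not None:
        checks.append(lambda r: r.day <= end)
    return [r for r in records if all(check(r) for check in checks)]


# Hypothetical holdings, for illustration only.
HOLDINGS = [
    Record("InstrumentA", "NeutralTemperature", date(2005, 1, 26)),
    Record("InstrumentA", "NeutralWind", date(2005, 1, 27)),
    Record("InstrumentB", "NeutralTemperature", date(2008, 3, 21)),
]

by_parameter = query(HOLDINGS, parameter="NeutralTemperature")
by_both = query(HOLDINGS, parameter="NeutralTemperature", instrument="InstrumentB")
```

Because the constraints are keyword arguments, a caller can start from the parameter, the instrument or the date range and add the others later.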
Inferred plot type and return required axes data
What is a Non-Specialist Use Case?
A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.
Someone should be able to query a virtual observatory without having specialist knowledge
What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via research VOs but the search for brightness, or green/red line emission is mediated for them
3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights
4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
Semantic Information Integration: Concept map for educational use of science data in a lesson plan
Informatics issues for Virtual Observatories
• Scaling to large numbers of data providers and redefining the roles/relations among them
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Crossing discipline boundaries
• Data quality, preservation, stewardship
• Security, access to resources, policies
Provenance
• Origin or source from which something comes, its intention for use, whom or what it was generated for, the manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery; documented in detail sufficient to allow reproducibility
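One way to picture this definition is a record per data product that names who or what produced it and what it was derived from, so the history can be walked back to the source. The sketch below is an assumption for illustration, not the SPCDIS or PML model: the field names, file names and agents are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    artifact: str                       # the data product being described
    agent: str                          # person or program that produced it
    process: str                        # how it was produced
    derived_from: Optional[str] = None  # parent artifact, if any


def lineage(artifact: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Walk derived_from links from an artifact back to its original source."""
    chain: List[str] = []
    current: Optional[str] = artifact
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        record = records.get(current)
        current = record.derived_from if record else None
    return chain


# Hypothetical records, for illustration only.
RECORDS = {
    r.artifact: r
    for r in [
        ProvenanceRecord("raw.fits", "instrument", "acquisition"),
        ProvenanceRecord("calibrated.fits", "pipeline", "calibration", "raw.fits"),
        ProvenanceRecord("quicklook.gif", "pipeline", "image rendering", "calibrated.fits"),
    ]
}

history = lineage("quicklook.gif", RECORDS)
```

Questions such as "who added this?" or "why does this image look bad?" then become lookups along the chain rather than guesses.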
Use cases
• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005, 18:49:09 UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900 UT missing?
• Why does this image look bad?
Quick look browse
Yasukawa: Computer crash
Yasukawa: Rain, cloud
Visual browse
Search
A Better Way to Access Data
The Problem
Scientists only use data from a single instrument because it is difficult to access, process, and understand data from multiple instruments. A typical data query might be:
“Give me the temperature, pressure, and water vapor from the AIRS instrument from Jan 2005 to Jan 2008”
“Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”
A Solution
Using a simple process, SESDI allows data from various sources to be registered in an ontology so that it can be easily accessed and understood. Scientists can use only the ontology components that relate to their data. An SESDI query might look like:
“Show all areas in California where sulfur dioxide (SO2) levels were above normal between Jan 2000 and Jan 2007”
This query will pull data from all available sources registered in the ontology and allow seamless data fusion. Because the query is measurement related, scientists do not need to understand the details of the instruments and data types.
Determine the statistical signatures of volcanic forcings on the height of the tropopause
Detection and attribution relations…
Leveraged VSTO semantic framework indicating how volcano and atmospheric parameters and databases can immediately be plugged in to the semantic data framework to enable data integration.
Discussion (1)
• Taken together, a growing body of collected experience manifests an emerging informatics core capability that is starting to take data intensive science into a new realm of realizability and, potentially, sustainability
– Use cases
– X-informatics
– Core Informatics
– Cyber Informatics
• Evolvable technical infrastructure
Progression after progression
IT Cyber
Infrastructure
Cyber Informatics
Core Informatics
Science Informatics
Science, Societal Benefit
Areas, Edu
Informatics
One example:
• CI = OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store
• Core informatics = Reasoning engine (Pellet), OWL, CMAP
• Science (X) informatics = Use cases, science domain terms, concepts in an ontology
Discussion (2)
• The data and information challenges are (almost) being identified as increasingly common
• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
• Semantics are a key ingredient for progress in informatics
• A sustained involvement of key inter-disciplinary team members is very important -> leads to incentives, rewards, etc. and a balance of research and production
Summary
• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure
– This is evident due to the emergence of Xinformatics (world-wide)
• Our experience is implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
– VSTO is only one example of success
– Data mining, data integration, smart search, provenance
• Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic
More Information
• Virtual Solar Terrestrial Observatory (VSTO): http://vsto.hao.ucar.edu, http://www.vsto.org
• Semantically-Enabled Science Data Integration (SESDI): http://sesdi.hao.ucar.edu
• Semantic Provenance Capture in Data Ingest Systems (SPCDIS): http://spcdis.hao.ucar.edu
• SAM/Semantic Knowledge Integration Framework (SKIF): http://skif.hao.ucar.edu
• Conferences: numerous
• Journals: Earth Science Informatics
• Texts: <empty>, a few are in progress
• Courses:
– Semantic e-Science, fall 2008 course at RPI
– Geoinformatics, at Purdue
• Contact: Peter Fox pfox@ucar.edu
Spare room
Translating the Use-Case - non-monotonic?
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
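The last step of this translation, turning "high geomagnetic activity" into concrete dates, amounts to a threshold test on the proxy index. A minimal sketch, assuming daily Kp values keyed by ISO date; the function name and the sample values are made up for illustration and are not real geomagnetic data.

```python
from typing import Dict, List

HIGH_KP_THRESHOLD = 8  # the hasHighThreshold value given on the slide


def high_activity_days(daily_kp: Dict[str, float],
                       threshold: float = HIGH_KP_THRESHOLD) -> List[str]:
    """Return the ISO dates whose Kp value meets or exceeds the threshold."""
    return sorted(day for day, kp in daily_kp.items() if kp >= threshold)


# Made-up Kp values, for illustration only.
SAMPLE_KP = {"2000-01-01": 3, "2000-01-02": 8, "2000-01-03": 9, "2000-01-04": 7}

days = high_activity_days(SAMPLE_KP)
```

The resulting dates would then fill the Date/time slot of the CEDARWEB query specification listed above.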
VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org
Web Service
Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Semantic filtering by domain or instrument hierarchy
Semantic Web Services
OWL document returned using VSTO ontology - can be used either syntactically or semantically
VSTO achievements
• Conceptual model and architecture developed by a combined team: KR experts, domain experts, and software engineers
• Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year)
• Production portal released, includes security, etc. with community migration (and so far endorsement)
• VSTO ontology version 1.2 (vsto.owl) in production, 2.0 in preparation
• Web Services encapsulation of semantic interfaces in use
• Solar Terrestrial use-cases are driving the completion of the ontologies (e.g. instruments)
• Using ontologies and the overall framework in other applications (volcanoes, climate, oceans, water, …)
Semantic Web Basics
• The triple: {subject-predicate-object}
Interferometer is-a optical instrument
Optical instrument has focal length
An ontology is a representation of this knowledge
• W3C is the primary (but not sole) governing organization for languages, specifications, best practices, etc.
– RDF - Resource Description Framework
– OWL 1.0 - Web Ontology Language (OWL 1.1 on the way)
• Encode the knowledge in triples, in a triple-store; software is built to traverse the semantic network, which can be queried or reasoned upon
• Put semantics between/in your interfaces, i.e. between layers and components in your architecture, i.e. between ‘users’ and ‘information’ to mediate the exchange
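The two example triples above are enough to show what "queried or reasoned upon" means. The sketch below uses a plain Python set as a stand-in for a triple-store and a hand-written traversal as a stand-in for a reasoner; it is not RDF or OWL tooling, and the third triple (OpticalInstrument is-a Instrument) is an assumption added to make the inference visible.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),  # assumption, added for illustration
}


def match(triples: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Query: return the triples that agree with every position that was given."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def superclasses(triples: Set[Triple], cls: str) -> Set[str]:
    """Reasoning: follow is-a links transitively from cls."""
    found: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(triples, s=current, p="is-a"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def inherited_properties(triples: Set[Triple], cls: str) -> Set[str]:
    """Properties stated on cls or on any of its superclasses."""
    classes = {cls} | superclasses(triples, cls)
    return {o for c in classes for _, _, o in match(triples, s=c, p="has")}


props = inherited_properties(TRIPLES, "Interferometer")
```

No triple says that an interferometer has a focal length; the traversal infers it from the is-a link, which is the kind of conclusion a reasoner draws from an ontology.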
Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis, transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
Example 1: Registration of Volcanic Data
SO2 Emission from Kilauea east rift zone - vehicle-based (Source: HVO)
Abbreviations: t/d=metric tonne (1000 kg)/day, SD=standard deviation, WS=wind speed, WD=wind direction east of true north, N=number of traverses
Location Codes:
• U - Above the 180° turn at Holei Pali (upper Chain of Craters Road)
• L - Below Holei Pali (lower Chain of Craters Road)
• UL - Individual traverses were made both above and below the 180° turn at Holei Pali
• H - Highway 11
Registering Volcanic Data (1)
Registering Volcanic Data (2)
• No explicit lat/long data
• Volcano identified by name
• Volcano ontology framework will link name to location
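The name-to-location link can be pictured as a lookup performed at registration time. This is a sketch only: the dictionary stands in for the volcano ontology, the function name is hypothetical, and the Kilauea coordinates are approximate values used for illustration.

```python
from typing import Dict, Tuple

# Stand-in for the volcano ontology: name -> (latitude, longitude) in degrees.
# The Kilauea coordinates are approximate and for illustration only.
VOLCANO_LOCATIONS: Dict[str, Tuple[float, float]] = {"Kilauea": (19.4, -155.3)}


def add_location(record: dict) -> dict:
    """Return a copy of the record with lat/lon resolved from the volcano name."""
    lat, lon = VOLCANO_LOCATIONS[record["volcano"]]
    return {**record, "lat": lat, "lon": lon}


# Hypothetical registration record; location code "U" is one of the slide's codes.
registered = add_location({"volcano": "Kilauea", "location_code": "U"})
```

Once every record carries coordinates, data identified only by volcano name can be combined with data that is indexed by position.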
Example 2: Registration of Atmospheric Data
Satellite data for SO2 emissions
Abbreviation: SCD = Slant Column Density, in Dobson Units (DU)
Registering Atmospheric Data (1)
SAM Project Objectives
S. Graves, R. Ramachandran
• To create a prototype Semantic Analysis and Mining framework (SAM) comprising:
– Data mining and knowledge extraction web services
– Linked ontologies describing the mining services, data and the problem domain
– Web-based client
• To allow users to discover and explore existing data and services, compose workflows for mining and invoke these workflows.
– Semantic search
– Automated web service invocation
– Automated web service composition
Data Mining Ontology: Design
Courtesy: R. Ramachandran
Data Mining Ontology: Snapshot
Courtesy: R. Ramachandran
The Information Era: Interoperability
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:
• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries
Virtual Observatories
• Conceptual examples:
• In-situ: Virtual measurements
– Related measurements
• Remote sensing: Virtual, integrative measurements
– Data integration
• Managing virtual data products/sets
Virtual Solar Terrestrial Observatory
• A distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental, and model databases.
• Subject matter covers the fields of solar, solar-terrestrial and space physics
• Provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use
• 3 year NSF-funded (OCI/SCI) project - completed
• Several follow-on projects
Problem definition
• Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control
• Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects
• The task of event determination and feature classification is onerous and we don't do it until after we get the data
Building blocks
• Data formats and metadata: IAU standard FITS, with SOHO keyword convention, JPEG, GIF
• Ontologies: OWL-DL and RDF
• The Proof Markup Language (PML) provides an interlingua for capturing the information agents need to understand results and to justify why they should believe the results.
• The Inference Web toolkit provides a suite of tools for manipulating, presenting, summarizing, analyzing, and searching PML, so that end users can understand information and its derivation, thereby facilitating trust in and reuse of information.
• Capturing semantics of data quality, event, and feature detection within suitable community ontology packages (SWEET, VSTO)