Semantic and Streaming Grids
Chinese Academy of Sciences, Dec 6 2005
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories


Indiana University, Bloomington IN

Four Data Streaming Application Areas
- Data assimilation, applied to link the data deluge (satellites, sensors, seismometers) in real time to small- and large-scale parallel simulations
  - Used in earthquake science
- The Department of Defense (and Homeland Security) have built the Global Information Grid with a target architecture, NCOW (Network Centric Operations and Warfare)
  - They submit no jobs; rather, they stream data to brokers, from which it is filtered and distributed
  - This includes their rather dated distributed simulation system, HLA
- Audio-video conferencing implemented with services and Grid messaging
- Hand-held Grid linking PDAs/cell phones to Grids

Data Deluged Science
- In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn't consider it as an enabler of new science and new ways of computing
- Data assimilation was not central to HPCC
- DoE ASCI was set up because it didn't want test data!
- Now particle physics will get 100 petabytes from CERN
  - Nuclear physics (Jefferson Lab) is in the same situation
  - Use around 30,000 CPUs simultaneously, 24x7
- Weather, climate, solid earth (EarthScope)
- Bioinformatics curated databases (Biocomplexity has only 1000s of data points at present)
- Virtual Observatory and SkyServer in astronomy
- Environmental sensor nets

Information/Knowledge Grids
- Distributed (10s to 1000s of) data sources (instruments, file systems, curated databases)
- Data deluge: 1 (now) to 100s of petabytes/year (2012)
  - Moore's law for sensors
- Possible filters assigned dynamically (on demand)
  - Run an image processing algorithm on a telescope image
  - Run a gene sequencing algorithm on compiled data
- Needs a decision support front end with what-if simulations
- Metadata (provenance) is critical to annotate data
- Integrate across experiments, as in multi-wavelength astronomy
- The data deluge comes from the pixels/year available

[Figure: a Grid of services (Database, Sensor Services SS, Filter Services FS, Other Services OS, MetaData MD, Portal, Another Service, Another Grid) exchanging SOAP messages that turn Raw Data → Data → Information → Knowledge → Wisdom → Decisions]

Semantic Grid and Services
- Implications of SOA (Service Oriented Architectures) for SG (Semantic Grid)
  - Build services to implement SG
- Implications of SG for SOA
  - Build metadata-rich systems of services using SG
- Services receive data in SOAP messages, manipulate it, and produce transformed data as further messages
- Meta-data is carried in SOAP messages
- Meta-data controls the processing and transport of SOAP messages
- Knowledge is created from data by services
- The Grid enhances Web services with semantically rich system- and application-specific management
- One must exploit and work around the different approaches to meta-data, and to its manipulation, in Web services

Structure of SOAP Messages
- SOAP messages have system information in the header, including WS-Policy-based meta-data defining processing options
  - Processed by handlers
- Application data and meta-data are the body (controversies here!)
  - Processed by the service itself
- Some meta-data, like WS-RF, is logically only in messages
- Other meta-data, like that in WS-Context or the SRB, is stored in the logical equivalent of XML databases
- We only need to preserve the semantic structure (the XML/SOAP Infoset), so we can transport it in fast XML and store it in efficient relational databases

[Figure: SOAP message with header blocks H1 to H4 and a Body; handlers F1 to F4 in the service container; workflow between containers]

What Types of Services Are There?
- There is a horde of support services supplying security, collaboration, database access and user interfaces
  - Support services are associated with either the system or the application
  - We will study the WS-* and GS-* specifications, which implicitly or explicitly define many support services
- There are generalized filter services, which are applications that accept messages and produce new messages with some data derived from that in the input
  - Simulations (including PDEs and reactive systems), data-mining, transformations, agents and reasoning are all termed filters here
- There are services like "author ontology", "parse RDF" or "attach provenance" that directly support the Semantic Grid
- But all services and their interactions are bathed in a sea of meta-data, and so implicitly need and support the Semantic Grid

It's a Composite Hierarchical World
- Filters can be a workflow, which means they are just collections of other, simpler services
  - One needs meta-data to control the workflow
- Services are programs that accept messages and produce messages
- Grids are distributed collections of services supporting managed shared resources
  - Management requires meta-data
- Grids are distributed systems that accept distributed messages and produce distributed result messages
- One can always talk about Grids and view a service or a workflow as a special case of a Grid
  - It just requires meta-data to send a message to a Grid and have it routed to the correct computer holding the requested service
  - Meta-data allows mapping of virtual to real addresses

Semantically Rich Services with a Semantically Rich Distributed Operating Environment
[Figure: a Grid of Database, Sensor (SS), Filter (FS), Other (OS) and MetaData (MD) services, a Portal, Another Service and Another Grid, linked by SOAP message streams that turn Raw Data → Data → Information → Knowledge → Wisdom → Decisions]
- Grids of Grids: the architecture is the same as that of an outward-facing application service

GIS Grids and Sensor Grids
- OGC has defined a suite of data structures and services to support Geographical Information Systems and sensors
  - GML (Geography Markup Language) defines the specification of geo-referenced data
  - SensorML and O&M (Observations and Measurements) define meta-data and data structures for sensors
  - Services like Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces to access GIS and sensor information
- Grid workflow links services that are designed to support streaming input and output messages
- We are building Grid (Web) service implementations of these specifications for NASA's SERVOGrid

A Screen Shot from the WMS Client
[Figure: WMS client screenshot]

WMS Uses WFS, Which Uses Data Sources
[Figure: WMS map screenshot; surviving label: Northridge2, Wald D. J.]

Electric Power and Natural Gas Data from LANL Interdependent Critical Infrastructure Simulations
[Figure: WMS client with zoom-in, zoom-out, FeatureInfo, measure distance, clear distance, drag-and-drop and refresh-to-initial-map modes]

Typical Use of Grid Messaging in NASA
[Figure: Grid Eventing linking a Datamining Grid, a Sensor Grid and a GIS Grid]

Typical Use of Grid Messaging
[Figure: HPSearch manages NaradaBrokering, which connects a Sensor Grid, a filter or data-mining service, WS-Context (stores dynamic data) and the GIS Grid's Web Feature Service (GIS data) with its Grid database archives; labelled flows: post before processing, post after processing, notify, subscribe]

Real Time GPS and Google Maps
- Subscribe to a live GPS station. Position data from SOPAC is combined with Google Maps clients.
- Select and zoom to a GPS station location; click icons for more information.
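The messaging pattern in the "Typical Use of Grid Messaging" slides (a sensor posts raw data before processing, a filter or data-mining service posts its results after processing, and clients such as map displays are notified by subscription) can be sketched in a few lines. This is a minimal in-process sketch under stated assumptions: `Broker`, `attach_filter`, `in_region`, the topic names and the bounding box are all hypothetical, and none of it is the NaradaBrokering or HPSearch API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]
Handler = Callable[[Message], None]


class Broker:
    """Hypothetical in-process topic broker standing in for a messaging
    system such as NaradaBrokering (this is not NaradaBrokering's API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Deliver to every subscriber of the topic, in subscription order.
        for handler in list(self._subscribers[topic]):
            handler(message)


def attach_filter(broker: Broker, in_topic: str, out_topic: str,
                  fn: Callable[[Message], Optional[Message]]) -> None:
    """Filter service: consumes data posted before processing and posts
    its result after processing; fn returning None drops the message."""
    def on_message(msg: Message) -> None:
        result = fn(msg)
        if result is not None:
            broker.publish(out_topic, result)
    broker.subscribe(in_topic, on_message)


def in_region(msg: Message) -> Optional[Message]:
    # Hypothetical spatial filter: keep readings inside a lat/lon box and
    # tag the survivors with provenance meta-data.
    if 32.0 <= msg["lat"] <= 36.0 and -121.0 <= msg["lon"] <= -114.0:
        return {**msg, "provenance": "in_region filter"}
    return None


if __name__ == "__main__":
    broker = Broker()
    archive: List[Message] = []     # stands in for a database archive
    map_client: List[Message] = []  # stands in for a map display
    attach_filter(broker, "gps/raw", "gps/filtered", in_region)
    broker.subscribe("gps/raw", archive.append)
    broker.subscribe("gps/filtered", map_client.append)

    broker.publish("gps/raw", {"station": "A", "lat": 34.1, "lon": -118.2})
    broker.publish("gps/raw", {"station": "B", "lat": 47.6, "lon": -122.3})
    print(len(archive), [m["station"] for m in map_client])  # 2 ['A']
```

Because a filter is just another subscriber and publisher, filters can be chained by pointing one filter's output topic at another's input. This is the sense in which a filter can itself be a workflow of simpler services.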
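The split described in "Structure of SOAP Messages" (header blocks carrying system meta-data are processed by handlers, and the body is processed by the service itself) can be illustrated with Python's standard `xml.etree.ElementTree`. The envelope uses the real SOAP 1.2 namespace. The application namespace, the `Priority` and `Trace` header blocks, `HANDLERS`, `service` and `dispatch` are hypothetical. This sketches the idea and does not describe how any particular service container is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 namespace
APP_NS = "urn:example:sensor"  # hypothetical application namespace

MESSAGE = f"""<env:Envelope xmlns:env="{SOAP_NS}" xmlns:app="{APP_NS}">
  <env:Header>
    <app:Priority>high</app:Priority>
    <app:Trace>station-7</app:Trace>
  </env:Header>
  <env:Body>
    <app:Reading><app:Value>3.5</app:Value></app:Reading>
  </env:Body>
</env:Envelope>"""

Context = Dict[str, str]


def q(ns: str, tag: str) -> str:
    """ElementTree's {namespace}tag form of a qualified name."""
    return f"{{{ns}}}{tag}"


# Handlers: each consumes one kind of header block (system meta-data).
HANDLERS: Dict[str, Callable[[ET.Element, Context], None]] = {
    q(APP_NS, "Priority"): lambda block, ctx: ctx.update(priority=block.text or ""),
    q(APP_NS, "Trace"): lambda block, ctx: ctx.update(trace=block.text or ""),
}


def service(body: ET.Element, ctx: Context) -> Dict[str, object]:
    # The service itself processes the application data in the body.
    value = float(body.findtext(f"{q(APP_NS, 'Reading')}/{q(APP_NS, 'Value')}"))
    return {"value": value, **ctx}


def dispatch(xml_text: str) -> Dict[str, object]:
    envelope = ET.fromstring(xml_text)
    ctx: Context = {}
    header = envelope.find(q(SOAP_NS, "Header"))
    if header is not None:
        for block in header:
            # Unregistered header blocks are skipped; a real SOAP stack must
            # also fault on unknown blocks marked mustUnderstand.
            handler = HANDLERS.get(block.tag)
            if handler is not None:
                handler(block, ctx)
    return service(envelope.find(q(SOAP_NS, "Body")), ctx)


if __name__ == "__main__":
    print(dispatch(MESSAGE))  # {'value': 3.5, 'priority': 'high', 'trace': 'station-7'}
```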
Integrating Archived Web Feature Services and Google Maps
- Google Maps can be integrated with Web Feature Service archives to filter and browse seismic records.

Google Maps as a Service Accessed from Our WMS Client
[Figure: WMS client screenshot]

Three XML Databases of Importance
- WS-Context, controlling a workflow
- (Extended) UDDI, supporting semantic service discovery
- WFS or ASFS (see later), providing an application-specific data/meta-data repository
- These have different performance, scalability and data unit size requirements
- In our implementation, each is currently just an Oracle/MySQL database front-ended by filters that convert between XML (GML for WFS) and an object-relational schema
  - An example of the difference between semantics (XML) and representation (SQL)
- OGSA-DAI offers a Grid interface to databases that we could use, but we don't, as we only need to expose WFS, not MySQL, to the Grid

Information Management/Processing
- SOAP messages transport information, expressed in a semantically rich fashion, between sources and services that enhance and transform it, so that the complete system provides a Data → Information → Knowledge transformation
  - Semantic Web technologies like RDF and OWL help us achieve rich expressivity
- We build an application-specific information management/transformation system, ASIS, for each application domain
- One special domain is the system itself, where the meta-data associated with services, sessions, Grids, messages, streams and workflow is itself managed and supported by an SIIS

Generalizing a GIS
- Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds
  - They define a geography syntax (GML) and ways to store, access, query, manipulate and display geographical features
  - In SOA, a GIS corresponds to a domain-specific XML language and a suite of services for the functions above
- However, such a universal information model has not been developed in other areas, even though there are many fields in which it appears possible:
  - BIS: Biological Information System
  - MIS: Military Information System
  - IRIS: Information Retrieval Information System
  - PAIS: Physics Analysis Information System
  - SIIS: Service Infrastructure Information System

ASIS: Application Specific Information System I
a) Discovery capabilities that are best provided using WS-* standards
b) Domain-specific meta-data and data, including a search/store/access interface (cf. WFS). Let's call the generalization ASFS (Application Specific Feature Service)
  - A language to express domain-specific features (cf. GML). Let's call this ASL (Application Specific Language)
  - Tools to manipulate information expressed in the language and key data of the application (cf. coordinate transformations). Let's call these ASTT (Application Specific Tools and Transformations)
  - ASL must support data sources such as sensors (cf. OGC meta-data and data sensor standards) and repositories. Sensors need support (common across applications) for streams of data
  - Queries need to support archived data (find all relevant data in the past) and streaming data (find all data in the future with given properties)
  - Note that all AS services behave like sensors, and all sensors are wrapped as services
  - Any domain will have raw (binary) data and data that has been filtered to ASL
    Let's call the raw data ASBD (Application Specific Binary Data)

ASIS: Application Specific Information System II
- Let's call this ASVS (Application Specific Visualization Services), generalizing WMS for GIS
  - The ASVS should both visualize information and provide a way of navigating the database (the ASFS) (cf. GetFeatureInfo)
  - The ASVS can itself be federated and presents an ASFS output interface
d) There should be an application service interface for ASIS from which all ASIS services inherit
e) There will be other user services interfacing to ASIS
- All user and system services will input and output data in ASL, using filters to cope with ASBD
[Figure: AS Tools (generic), AS Sensor, AS Repository, AS Service (user defined) and ASVS Display exchanging messages in ASL; labels: filter, transformation, reasoning, data-mining, analysis]

Everything Is a Service or a Message/Information Nugget
[Figure: Military Information Management System built directly on GS-* and WS-*, with ASVS and filters/ASTT]

MIO or Military Information Object
- Unit of managed information expressed in ASL
[Figure: labels OGSA-DAI and sensor standards, Info-D, WS-Notification, WS-Eventing, ASFS]

Two-level Programming I
- The Web Service (Grid) paradigm implicitly assumes a two-level programming model
- We make a service (the same as a distributed object or a computer program running on a remote computer) using conventional technologies:
  - C++, Java or Fortran Monte Carlo module
  - Data streaming from a sensor or satellite
  - Specialized (JDBC) database access
- Such services accept and produce data from users' files and databases
- The Grid is built by coordinating such services, assuming we have solved the problem of programming the service
[Figure: a service and its data]

Two-level Programming II
- The Grid is discussing the composition of distributed services with runtime interfaces to the Grid, as opposed to UNIX pipes/data streams
- This is familiar from the use of UNIX shell, Perl or Python scripts to produce real applications from core programs
- Such interpretive environments are the single-processor analog of Grid programming
- Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately
[Figure: Service1, Service2, Service3 and Service4 composed]

3-Layer Programming Model
[Figure: Web Service 1, WS 2, …, WS N-1, Web Service N]
- Level 1: programming inside services. Application expressed in Java, Fortran, C++, MPI etc.
- Level 2: choosing services by virtualization. Application semantics (metadata, ontology): the Semantic Grid
- Level 3: Grid programming, composing multiple services. Service workflow, transactions, mediation; WS-* infrastructure
- Substantial work in the UK e-Science program and the international semantic web community

Consequences of the Rule of the Millisecond
- Useful to remember critical time scales:
  1) … ms: a CPU does a calculation
  2a) … to 0.01 ms: parallel computing MPI latency
  2b) … to 0.01 ms: overhead of a method call
  3) 1 ms: wake up a thread or process (either?)
  4) 10 to 1000 ms: Internet delay (workflow)
- So use pointers and the computer memory system when latencies must be under 1 millisecond, but use a URI looked up in a context store when longer delays are allowed
- Transfer data when it is read-only and long latency is allowed
- Always choose the slowest allowed methodology, and remember, when in doubt, that Moore's law favors computer performance and that systems always get more complex and harder to maintain
[Figure: Classic Programming]

GlobalMMCS Web Service Architecture
[Figure: SIP, H.323, Access Grid, native XGSP and Admire clients; gateways convert to uniform XGSP; messaging is high-performance (RTP) and XML/SOAP and ..; media servers, filters, and a session server with XGSP-based control; NaradaBrokering carries all messaging]
- Use multiple media servers to scale to many codecs and many versions of audio/video mixing
- NaradaBrokering (NB) scales as distributed Web services

GlobalMMCS Architecture
[Figure: the Event Messaging Service (NaradaBrokering) linking the XGSP Conference Control Service with the Audio Video, Instant Messaging, Shared Display and Shared … Web Services]
- Non-WS collaboration control protocols are gatewayed to XGSP
- NaradaBrokering supports TCP (chat, control, shared display, PowerPoint etc.) and UDP (audio-video conferencing)

XGSP Example: New Session
[XML example; surviving values: GameRoom, chess, chess-0, John, false, chess-0, Bob, black, chess-0, Jack, white]

XGSP AV Signaling Protocol with H.323
[Figure: signaling diagram]

NaradaBrokering
- Messaging infrastructure for collaboration, peer-to-peer and Grids
- Implements JMS and native high-performance protocols (message transit time of 1 to 2 ms per hop)
- Order-preserving message transport with QoS and security profiles
- Support for different underlying transports such as TCP, UDP, Multicast and RTP
- SOAP message support and WS-Eventing, WS-RM and WS-Reliability; WS-Notification when the specification is agreed
- Active replay support: pause and replay live streams
- Stream linkage: can permanently link multiple streams, used in annotation of real-time video streams
- Replicated storage support for fault tolerance and resiliency to storage failures
- Management: HPSearch scripting interface to streams and brokers (uses WS-Management)
- Broker topics and message discovery: locate appropriate …
- Integration with Axis2 Web Service Container (?)
- High-performance transport supporting the SOAP Infoset

Average Video Delays for One Broker
- Performance scales proportionally to the number of brokers
[Plot: latency (ms) versus number of receivers, for one session and for multiple sessions, at 30 frames/sec]

Collaboration Grid
[Figure: UDDI, NaradaBrokers, HPSearch, WS-Context, gateways and WS-Security around an XGSP Media Service (video mixer, transcoder, audio mixer, replay, record, annotate, thumbnail), plus whiteboard, shared display and shared WS]

GlobalMMCS SWT Client
[Figure: chat, TV, webcam and video mixer panels alongside a GIS view]

e-Annotation Player
[Figure: archived stream player, annotation/whiteboard player, archived stream list, real-time stream list, real-time stream player and e-Annotation whiteboard]

Location of Software for Grid Projects in Community Grids Laboratory
- http://www.naradabrokering.org provides Web service (and JMS) compliant distributed publish-subscribe messaging (a software overlay network)
- http://www.globalmmcs.org is a service-oriented (Grid) collaboration environment (audio-video conferencing)
- … is an OGC (Open Geospatial Consortium) compliant Geographical Information System (GIS) and Sensor Grid (with the POLIS center); it has WS-Context, Extended UDDI etc.
- The work is still in progress, but the core part of NaradaBrokering is quite mature
- All software is open source and freely available

Summary
- Virtualization everywhere
- Focus on semantics, not representation, to get performance combined with expressivity for transport and data access
- All this is enabled by powerful meta-data services
- Grids add management to a rich but potentially chaotic set of Web services; management and coherence are enabled by meta-data
- One can define general information architectures (ASIS, GIS, SIIS) for both applications and the system
- Knowledge comes from filters that span simulations, data-mining, reasoning and agents
- A service is just a special case of a Grid
- Build systems from SubGrids (Gridlets)
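The summary's point that a service is just a special case of a Grid, together with the earlier point that meta-data maps virtual to real addresses, can be sketched as a composite. Here `Service` and `Grid` expose the same `handle` method, so a caller cannot tell them apart, and a Grid forwards each message using a virtual path carried in the message meta-data. The class names and the `meta["to"]` path convention are assumptions made for illustration. In a real Grid the lookup would resolve to a network endpoint rather than to an in-process object.

```python
from typing import Any, Callable, Dict, Union

Message = Dict[str, Any]


class Service:
    """A program that accepts a message and produces a message."""

    def __init__(self, fn: Callable[[Message], Message]) -> None:
        self.fn = fn

    def handle(self, msg: Message) -> Message:
        return self.fn(msg)


class Grid:
    """Services and sub-Grids behind the same handle() interface."""

    def __init__(self) -> None:
        self.members: Dict[str, Union[Service, "Grid"]] = {}

    def register(self, name: str, member: Union[Service, "Grid"]) -> None:
        # Map a virtual name to the member that actually serves it.
        self.members[name] = member

    def handle(self, msg: Message) -> Message:
        # The virtual address travels as meta-data, e.g. "sensors/gps":
        # the first segment selects a member; the rest is forwarded.
        head, _, rest = msg["meta"]["to"].partition("/")
        member = self.members[head]
        if isinstance(member, Grid):
            forwarded = {**msg, "meta": {**msg["meta"], "to": rest}}
            return member.handle(forwarded)
        if rest:
            raise KeyError(f"{head} is a service; cannot route to {rest!r}")
        return member.handle(msg)


if __name__ == "__main__":
    sensors = Grid()  # a SubGrid (Gridlet)
    sensors.register("gps", Service(lambda m: {"reply": "position of " + m["body"]}))
    top = Grid()
    top.register("sensors", sensors)
    print(top.handle({"meta": {"to": "sensors/gps"}, "body": "STN1"}))
    # {'reply': 'position of STN1'}
```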
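The ASIS requirement that queries cover both archived data (all relevant data in the past) and streaming data (all data in the future with given properties) can be sketched as a store that answers a query from its archive and keeps the same predicate as a standing subscription. `FeatureStore` and its methods are hypothetical. They are loosely modelled on the ASFS role, not on the WFS interface, and the sample features are made up.

```python
from typing import Any, Callable, Dict, List, Tuple

Feature = Dict[str, Any]
Predicate = Callable[[Feature], bool]
Callback = Callable[[Feature], None]


class FeatureStore:
    """Hypothetical ASFS-like store: one query covers archived data (the
    past) and streaming data (the future)."""

    def __init__(self) -> None:
        self.archive: List[Feature] = []
        self.standing: List[Tuple[Predicate, Callback]] = []

    def insert(self, feature: Feature) -> None:
        self.archive.append(feature)
        # Streaming side: push the new feature to every matching standing query.
        for predicate, callback in self.standing:
            if predicate(feature):
                callback(feature)

    def query(self, predicate: Predicate, callback: Callback) -> List[Feature]:
        # Archived side: collect all past matches...
        past = [f for f in self.archive if predicate(f)]
        # ...then keep the same predicate for all future matches.
        self.standing.append((predicate, callback))
        return past


if __name__ == "__main__":
    store = FeatureStore()
    store.insert({"id": 1, "magnitude": 5.2})  # made-up sample features
    store.insert({"id": 2, "magnitude": 3.1})
    future: List[Feature] = []
    past = store.query(lambda f: f["magnitude"] >= 4.0, future.append)
    store.insert({"id": 3, "magnitude": 6.0})
    print([f["id"] for f in past], [f["id"] for f in future])  # [1] [3]
```

Treating a query as "archive lookup plus subscription" is one way to honor the slide's note that all AS services behave like sensors: after the initial answer, the query result is simply another stream.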