Towards Personalized and Active Information Management for Meteorological Investigations
Beth Plale, Indiana University, USA

Slide 2: Problem Statement
- Mesoscale meteorology research is highly data-driven.
- A large percentage of the data streams in from observational platforms and is available from OPeNDAP servers.
- Data that is more than 10 minutes old is too old.
- Researchers are currently working to increase real-time responsiveness to developing weather conditions.
- Mesoscale meteorology is a vast information space: forecasting models assimilate data from a growing number of sources.

Slide 3: Solution Statement
- The Internet has proven the utility of a user-oriented view of information space management:
  - browsers, with bookmarks to organize;
  - blogs and web page tools (FrontPage, Dreamweaver) to publish.
- We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
- myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Slide 4: Motivation for LEAD
- Each year, mesoscale weather (floods, tornadoes, hail, strong winds, lightning, and winter storms) causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses of more than $13B.

Slides 5 to 10: Conventional Numerical Weather Prediction
- Observations: radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar-orbiting satellites, wind profilers, GPS satellites
- Analysis/Assimilation: quality control, retrieval of unobserved quantities, creation of gridded fields
- Prediction: PCs to teraflop systems
- Product generation, display, dissemination
- End users: NWS, private companies, students
- The process is entirely serial and pre-scheduled: no response to weather!

Slides 11 and 12: The LEAD Vision: No Longer Serial or Static
- Same diagram components as Slides 5 to 10: observations, analysis/assimilation, prediction, product generation/display/dissemination, end users.

Slide 13: LEAD Data: Initial Working Data Set
- ETA model gridded analysis
- METAR surface observations
- Rawinsonde upper-air balloon observations
- ACARS commercial aircraft temperature and wind observations
- NEXRAD Level II data
- GOES visible satellite data

Slide 15: Returning to the Solution Statement
- We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
- myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.

Slide 16: Information Space Management Tool
- At its core is a metadata catalog. Why?
  - Observational products are already being stored elsewhere.
  - The files are public and can be large, so we do not want to copy them into the user's file system.
  - Instead, myLEAD maintains a bookmark.
- It must scale to support thousands of distributed users, including individual investigators, pre-college classroom investigators, and casual observers.

Slide 17: Technical Challenges
- Querying must be efficient:
  - over data products described by rich, domain-specific metadata;
  - over data products whose descriptions can be augmented over time.
- Obtaining metadata is hard: automate as much as possible.
- Privacy must be fully enforced: any data product that the user designates as private must remain private.
- Publishing:
  - publish a product to the larger community: a data file, model output, or a full experiment;
  - publishing must be under user control;
  - information that has been made public must be discoverable.
- Building trust:
  - a user may work within the myLEAD space for, for instance, 5 years of graduate work;
  - the user must be convinced of privacy, reliability, longevity, etc.

Slide 18: Rundown on Implementation Specs
- Built on top of MCS and OGSA-DAI:
  - MCS provides an extensible database schema, a general database schema, and security infrastructure that is already in place;
  - OGSA-DAI provides the grid/web service architecture.
- The database is MySQL 5.0, which supports stored procedures.
- OGSA-DAI connects to MySQL through JDBC.
- Data product descriptions going into and out of the database conform to a LEAD-specific XML schema.
- The myLEAD server and the myLEAD agent are written in Java.
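Slides 16 and 17 describe a catalog that stores bookmarks (a pointer plus metadata) instead of copies of the data. Privacy is enforced on every query, and publishing is left to the owner. The Python sketch below illustrates that idea in memory only. The class names, attribute names, and example.org URL are hypothetical. The real myLEAD catalog sits on MCS, OGSA-DAI, and MySQL, not on Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One catalog entry: a pointer to a product stored elsewhere, plus its metadata."""
    owner: str
    url: str  # where the product actually lives, e.g. an OPeNDAP server
    attributes: dict = field(default_factory=dict)
    public: bool = False  # every entry starts out private


class MetadataCatalog:
    """Hypothetical in-memory catalog of bookmarks; nothing here copies data files."""

    def __init__(self):
        self._entries = []

    def add(self, owner, url, **attributes):
        entry = Bookmark(owner, url, dict(attributes))
        self._entries.append(entry)
        return entry

    def publish(self, requester, entry):
        # Publishing stays under user control: only the owner may make an entry public.
        if requester != entry.owner:
            raise PermissionError(f"{requester} does not own {entry.url}")
        entry.public = True

    def query(self, requester, **criteria):
        # Privacy is checked on every query: a requester sees their own entries
        # and published entries, never another user's private ones.
        return [
            e
            for e in self._entries
            if (e.owner == requester or e.public)
            and all(e.attributes.get(k) == v for k, v in criteria.items())
        ]


catalog = MetadataCatalog()
radar = catalog.add("abe", "http://opendap.example.org/nexrad/KXYZ.nc", platform="WSR-88")
print(len(catalog.query("bing", platform="WSR-88")))  # 0: still private
catalog.publish("abe", radar)
print(len(catalog.query("bing", platform="WSR-88")))  # 1: now published
```

The catalog only holds the URL and attributes, so a large public file is never duplicated. Queries never need to know where the bytes live.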

Slide 19: Related Work
- mySpace (AstroGrid, UK)
  - Similar to myLEAD in reining in the information space.
  - Creates swatches in a large federation of data archives for the cache and persistent data of a community.
  - Provides common query access over the cache space and the persistent space.
- RDF (Resource Description Framework)
  - The basic building block is the subject-predicate-object triple, [S] P -> [O], e.g., [Dickens] hasWritten -> [Pickwick Papers].
  - Good for storing detailed relationship information, i.e., for understanding the relationship between two terms.
- NEESgrid (NCSA)
  - Uses RDF.
  - Little is available in the public literature.
- myGrid Information Repository (MIR) (myGrid, Manchester)
  - Most similar to myLEAD.
  - Supports text search of scientific papers; uses the Life Sciences Identifier (LSID).
  - myLEAD has a stronger personal orientation (guarantees, publishing, automatic metadata generation).

Slide 20: myLEAD Architecture
- Diagram of server-side and client-side services. Components shown: myLEAD service, MCS, myLEAD stored procedures, OGSA-DAI, JDBC, relational DB, MCS client, myLEAD agent, portal access to myLEAD, and the user interface; a myLEAD data model appears on both the server and the client side.

Slide 21: myLEAD Use Scenario
- Diagram: a factory creates myLEAD agent instances; a workflow contains a WRF model and data mining tasks; myLEAD services and Storage Repository Services (RLS) run at IU and NCSA; the myLEAD portlet is a component of the LEAD portal; the scratch space shown is /var/tmp/wrf_tmp.
- The workflow consults the myLEAD agent to determine the location of scratch space.

Slide 22: Metadata Catalog Data Model
- Users, e.g., Abe, Bing, Caru
- Investigations, e.g., Tornado April 20 Chicago Illinois
- Experiments, e.g., an ensemble: a run of 100 simultaneous forecast models, each parameterized slightly differently
- Collections
- Logical files: input observational files, input parameters, derived files, analysis results, images, model results, workflows, execution status messages

Slide 23: Data Model
- Diagram entities: User, Investigation, Collection, Logical file, each described with Dublin Core attributes.
- Attributes are stored in type tables, e.g., string, float, temporal, int.
- This gives great extensibility, but naming must be carefully controlled, and efficient querying could also be an issue.

Slide 24: Data Model (example myWorkspace view)
- myWorkspace: J. Kowaleski, with preferences and favorite spaces (home disk space, Thor cluster scratch space)
- Experiment 1: Norman, OK (21Oct04:23:11:45)
- Workflow templates: vizEta (03Aug04:13:35:40), WRF (15May04:05:25:59)
- Collections such as input parameters; input observational (NEXRAD 26Oct04:13:45:40, GOES-infrared 26Oct04:12:00:00, METAR 26Oct04:09:10:05); WRF-out (Wrf-out1-26Oct04:13:35:40, Wrf-out2-26Oct04:13:37:25, Wrf-out3-26Oct04:13:43:15); and a workflow instance
- At both the collection level and the logical file level, each item has an associated set of attributes that describe the data product.
- The browser gives the user a hierarchical view of a space that is essentially flat. Users like hierarchy.

Slide 25: myLEAD Agent
- A separate, transient grid/web service.
- Holds state about the user and the current investigation and experiment.
- Embeds the myLEAD client API.
- Purpose:
  - controls naming;
  - helps use the database structure in a repeatable, meaningful way;
  - maintains a finite-state machine (FSM) of the current state of execution and stores products into a new collection based on that state: input, model run, analysis, final results;
  - derives metadata attributes for a new data product object created during the course of a workflow, by means of case-based reasoning, internal state, and consulting an ontology.

Slide 26: Data Product Metadata
- Resources: things that need describing (i.e., with metadata).
- Resources shown: data products (observational data, model-generated data, derived data), collections, data analytics and data mining, data analytics resources (statistics table), workflow scripts, compute resources, storage resources, services, model input resources.

Slide 27: Notes (metadata attributes)
- Global ID: LSID for geosciences
- Temporal coverage: same as spatial
- Spatial coverage: GML, THREDDS, FGDC, COARDS-CF
- Geophysical quantity: defined by a common vocabulary
- Platform: GOES-10, GOES-8; WSR-88, CASA
- Instrument type, site: east-west; KXYZ
- Model run info: model-derived data product
- Syntactic description: binary format of the data product
- Contact info: Dublin Core
- Physical location of service
- Protocol to access service
- Dataset summary: Dublin Core
- List of predecessors: GID of input data products, workflow instance
- Event: mesocyclone, storm cell, tornado
- Quality: complex
- Completeness

Slide 28: Current Research Challenges
- Publishing:
  - publishing a data product to the larger community: a data file, model output, or a full experiment;
  - discovery of information that has been made public.
- Guarantees:
  - any data product that the user designates as private must remain private;
  - when a request for a product is issued, the product must exist.
- Flexible yet efficient schema: inherited from MCS, it supports an evolving understanding of a data product over time by means of extended attributes.
- Immutable investigations: collections, views, and logical files can be reused from earlier investigations without destroying the integrity of the earlier investigation.
- Proactive agent: infers metadata attributes from the context of the active experiment using case-based reasoning.

Slide 29
Beth Plale
We are 4 days away from our national elections; wish us well.
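Slide 19 describes RDF's basic building block, the subject-predicate-object triple. The Python sketch below stores triples in a set and matches them by pattern, with None as a wildcard. Only the Dickens triple comes from the slide. The `match` helper and the Wrf-out1 predecessor triples are hypothetical illustrations. myLEAD itself stores metadata relationally (Slide 18), not as RDF.

```python
def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Each fact is a (subject, predicate, object) tuple, i.e. [S] P -> [O].
store = {
    ("Dickens", "hasWritten", "Pickwick Papers"),  # the example from Slide 19
    ("Wrf-out1", "derivedFrom", "NEXRAD-26Oct04"),  # hypothetical provenance triple
    ("Wrf-out1", "derivedFrom", "METAR-26Oct04"),  # hypothetical provenance triple
}

print(match(store, p="hasWritten"))
# [('Dickens', 'hasWritten', 'Pickwick Papers')]
print(match(store, s="Wrf-out1", p="derivedFrom"))
# [('Wrf-out1', 'derivedFrom', 'METAR-26Oct04'), ('Wrf-out1', 'derivedFrom', 'NEXRAD-26Oct04')]
```

Relationship-style facts, such as the "list of predecessors" attribute on Slide 27, fit this form naturally. That is the kind of detailed relationship information the slide credits RDF with.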
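Slides 22 to 24 describe a hierarchy (users, investigations, experiments, collections, logical files) that is stored flat and shown to the user as a tree. Attributes are kept in per-type tables (string, float, temporal, int). The sketch below approximates that design with Python's built-in sqlite3 standing in for MySQL. Every table, column, and attribute name here (for example `grid_spacing_km`) is a hypothetical choice, not the MCS schema.

```python
import sqlite3
from datetime import datetime

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE object (id INTEGER PRIMARY KEY, kind TEXT, name TEXT, parent INTEGER);
    CREATE TABLE attr_string   (obj INTEGER, name TEXT, value TEXT);
    CREATE TABLE attr_float    (obj INTEGER, name TEXT, value REAL);
    CREATE TABLE attr_int      (obj INTEGER, name TEXT, value INTEGER);
    CREATE TABLE attr_temporal (obj INTEGER, name TEXT, value TEXT);  -- ISO 8601 strings
    """
)

# One type table per value type, mirroring the string/float/temporal/int split.
TYPE_TABLE = {str: "attr_string", float: "attr_float", int: "attr_int", datetime: "attr_temporal"}


def add_object(kind, name, parent=None):
    """Every object lives in one flat table; parent ids encode the hierarchy."""
    cur = db.execute("INSERT INTO object (kind, name, parent) VALUES (?, ?, ?)", (kind, name, parent))
    return cur.lastrowid


def set_attr(obj, name, value):
    table = TYPE_TABLE[type(value)]  # table names come only from this fixed mapping
    if isinstance(value, datetime):
        value = value.isoformat()
    db.execute(f"INSERT INTO {table} (obj, name, value) VALUES (?, ?, ?)", (obj, name, value))


def find_float(name, low, high):
    """Names of objects whose float attribute `name` lies in [low, high]."""
    rows = db.execute(
        "SELECT o.name FROM object o JOIN attr_float a ON a.obj = o.id "
        "WHERE a.name = ? AND a.value BETWEEN ? AND ? ORDER BY o.name",
        (name, low, high),
    )
    return [row[0] for row in rows]


def tree(parent=None, depth=0):
    """Rebuild the hierarchical view the browser shows from the flat table."""
    rows = db.execute(
        "SELECT id, kind, name FROM object WHERE parent IS ? ORDER BY id", (parent,)
    ).fetchall()
    lines = []
    for oid, kind, name in rows:
        lines.append("  " * depth + f"{kind}: {name}")
        lines.extend(tree(oid, depth + 1))
    return lines


user = add_object("user", "J. Kowaleski")
investigation = add_object("investigation", "Tornado April 20 Chicago Illinois", user)
experiment = add_object("experiment", "Ensemble", investigation)
wrf_out = add_object("collection", "WRF-out", experiment)
f1 = add_object("logical file", "Wrf-out1", wrf_out)
set_attr(f1, "grid_spacing_km", 3.0)  # hypothetical attribute name and value
set_attr(f1, "created", datetime(2004, 10, 26, 13, 35, 40))
print("\n".join(tree()))
print(find_float("grid_spacing_km", 1.0, 4.0))  # ['Wrf-out1']
```

Per-type tables keep each value in a natively typed column, so a numeric range query like `find_float` can use SQL comparisons directly. The trade-off Slide 23 names still shows here: nothing in this layout stops two users from naming the same quantity differently. Attribute naming therefore has to be controlled elsewhere, for instance by the agent.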
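Slide 25 says the agent maintains an FSM of the current execution state and files each new product into a collection chosen by that state (input, model run, analysis, final results). The following is a minimal sketch under assumed names. The event names, the naming scheme, and the `AgentSketch` class are hypothetical. The metadata it derives comes only from internal state; the case-based reasoning and ontology lookup mentioned on the slide are not modeled.

```python
from enum import Enum


class Stage(Enum):
    INPUT = "input"
    MODEL_RUN = "model_run"
    ANALYSIS = "analysis"
    FINAL_RESULTS = "final_results"


# Hypothetical transition table: (current stage, workflow event) -> next stage.
TRANSITIONS = {
    (Stage.INPUT, "model_started"): Stage.MODEL_RUN,
    (Stage.MODEL_RUN, "analysis_started"): Stage.ANALYSIS,
    (Stage.ANALYSIS, "workflow_finished"): Stage.FINAL_RESULTS,
}


class AgentSketch:
    """Tracks the execution stage and files products into one collection per stage."""

    def __init__(self, user, investigation, experiment):
        self.user = user
        self.investigation = investigation
        self.experiment = experiment
        self.stage = Stage.INPUT
        self.collections = {Stage.INPUT: []}

    def on_event(self, event):
        try:
            self.stage = TRANSITIONS[(self.stage, event)]
        except KeyError:
            raise ValueError(f"event {event!r} is not valid in stage {self.stage.value!r}") from None
        # Entering a new stage opens a new collection for the products it creates.
        self.collections.setdefault(self.stage, [])

    def store(self, product):
        # The agent controls naming, so names are built the same way every time.
        name = f"{self.experiment}/{self.stage.value}/{product}"
        self.collections[self.stage].append(name)
        # Attributes derived purely from the agent's internal state.
        attributes = {
            "creator": self.user,
            "investigation": self.investigation,
            "experiment": self.experiment,
            "stage": self.stage.value,
        }
        return name, attributes


agent = AgentSketch("abe", "Tornado April 20 Chicago Illinois", "exp1")
print(agent.store("NEXRAD")[0])    # exp1/input/NEXRAD
agent.on_event("model_started")
print(agent.store("wrf-out1")[0])  # exp1/model_run/wrf-out1
```

Because the workflow only reports events, it never has to choose collection names itself. An out-of-order event raises an error instead of silently misfiling a product.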