Scott Jensen
Data to Insight Center
Indiana University
Adaptable and Incremental Metadata Capture in e-Science
University of Chicago – March 2, 2012
What is Metadata?
Data About Data
• “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage any other resource” (National Information Standards Organization)
• Alternately, answers the who, what, when, and why questions about a dataset (ISO 19115 standard)
– Where (spatial metadata)
– How (configuration)
Why Does Metadata Matter?
• Data Reuse
– “Metadata is key to being able to share results” (U.K. e-Science Core Programme)
– “A significant need exists in many disciplines for long-term, distributed, and stable data and metadata repositories” (NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure)
– “Preservation of digital data is arguably a ‘grand challenge’ of the information age” (Francine Berman)
• Trusting and Understanding Data
– The ability to understand and evaluate the quality of data is key to reuse after discovery. If the data carries too much uncertainty, potential users will not use it. (Ann Zimmermann)
• Data that is Costly and Irreplaceable
– Can other data be regenerated?
• Data Management Plans
Metadata Capture
• Historically done at the end of the data lifecycle
– Research is completed
– Data and results tarred up as a dataset, with metadata at the dataset level
– Inserts are full metadata documents
• Metadata often captured at the collection level
– Generalized and not specific to each data product
– Collection-level metadata for discovery (e.g., WCS)
– Detailed metadata stored as an object
• Data search is coarse
– Based on keywords or text search
– Spatial bounding box and temporal range
– Not specific to a data product; details not searchable
– Sometimes just browse capabilities
How Much Metadata to Capture?
[Figure: Cost / Benefit Trade-offs. A spectrum running from less structure to more structure: name/value pairs, flat schemata (unqualified DC), core metadata, and structured metadata schemata (FGDC, EML). Less structure means lower barriers to entry; more structure means richer metadata to search over.]
Research Problem
• Early Capture of Ephemeral Metadata
– Incremental, not at the end of the lifecycle
– Incremental capture must be efficient
• Deluge, Tsunami, Bonanza
– Requires automation
– Detailed metadata for discovery
– Scalability
• Variable and Dynamic Data
– Must accommodate new metadata
– Accommodate different domains and schemata
Research Focus
• Identified the concept-based character of scientific metadata schemas that differentiates them as a class from other XML schemas.
• Capture metadata incrementally and efficiently early in the scientific process
– Capture detailed metadata without full update
– Reconstruct metadata on-the-fly after incremental capture
– Automated metadata extraction from data objects
• Incremental capture must be efficient and scalable
• Architecture must generalize across schemas and domains
• Detailed metadata must be discoverable
• Extensible without schema modifications
Metadata Schemas - a Bag of Concepts
Core FGDC Spatial Schema (standard metadata)
• Identification
• Data Quality
• Spatial Data
• Spatial Reference
• Entity Attribute
• Distribution
• Metadata Reference
ISO 19115
• Identification
• Constraints
• Data Quality
• Spatial
• Reference System
• Distribution
• Metadata Extension
• and more …
DDI (version 2.0)
• Description
• Study description
• Physical file description
• Logical description (variables)
• other
Astronomy
• Identity
• Curation
• Content
• Coverage
• Spatial
• Temporal
• Data Quality
Ecology (EML)
• General
• Geographic
• Temporal
• Taxonomic
• Methods
• Data table metadata
Concepts have Complex Structure
• Schemata are often composed of complex concepts (compound elements)
– “Compound elements represent higher-level concepts that cannot be represented by an individual data element”
• Increased structure → Increased reusability
• Flat schema → difficulty harvesting
– Harvesting Dublin Core led to incomplete and inconsistent data (California Digital Libraries)
– Similar issues at the National Science Digital Library made it difficult to build services on harvested Dublin Core.
• Performance bottleneck when converting XML to name/value pairs
Concepts & Incremental Metadata Capture
• As an experiment runs, adding a concept does not require editing the existing metadata.
• Can capture ephemeral metadata such as workflow notifications and add them to a detailed metadata document.
• Metadata can be harvested from files and added as queryable metadata at different levels of the hierarchy.
Partitioning a Schema on Concepts
[Figure: a schema tree partitioned into metadata concepts and the elements within each concept. Numbers on the nodes give the global order.]
Metadata
• Identification
– Citation: Originator, Publication Date, Publication Time, Title, ..., Publication Info (Publication Place, Publisher), Larger Work Citation, ...
– Keywords: Theme (Thesaurus, Theme Keyword), Temporal (Thesaurus, Temporal Keyword), ...
• Entity and Attribute
– Detailed Desc: Entity Type (Type Label, Type Definition, Definition Source), Attribute (Attribute Label, Definition, Definition Source, Domain Values), ...
• Distribution
– Distributor, Standard Order Process
Concept Requirements:
• Recursion is within a concept
• Elements where cardinality can exceed one are concepts or contained in concepts
• Beneficial when CRUD operations are at the concept level or higher
Global ordering of concept elements and higher levels
• Incremental ingest: no need to modify existing concepts.
• Efficient reconstruction based on concept-sized fragments
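The two properties of global ordering listed on this slide (append-only ingest and reconstruction from concept-sized fragments) can be sketched in a few lines of Python. This is an illustrative toy, not the XMC Cat implementation: the `ConceptStore` class, its method names, the order numbers, and the `metadata` root element are all assumptions, and the sketch ignores intermediate wrapper elements that a real schema would need to re-create.

```python
from collections import defaultdict


class ConceptStore:
    """Toy in-memory store: one XML fragment (CLOB) per concept instance."""

    def __init__(self):
        # object_id -> list of (global_order, arrival_sequence, xml_fragment)
        self._clobs = defaultdict(list)
        self._seq = 0

    def add_concept(self, object_id, global_order, xml_fragment):
        # Incremental ingest: append only, existing fragments are never edited.
        self._seq += 1
        self._clobs[object_id].append((global_order, self._seq, xml_fragment))

    def reconstruct(self, object_id, root="metadata"):
        # Sort by schema position, then by arrival order, and concatenate.
        ordered = sorted(self._clobs[object_id])
        body = "".join(fragment for _, _, fragment in ordered)
        return f"<{root}>{body}</{root}>"


store = ConceptStore()
# Concepts arrive out of schema order as the experiment runs.
store.add_concept("obj-1", 7, "<keywords><themekey>ADAS</themekey></keywords>")
store.add_concept("obj-1", 5, "<citation><title>CONUS ADAS 10km</title></citation>")
store.add_concept("obj-1", 7, "<keywords><themekey>height</themekey></keywords>")
document = store.reconstruct("obj-1")
print(document)
```

Because each fragment carries its own position, a late-arriving concept only adds a row; nothing already stored has to be parsed or rewritten.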
Shredding XML Concepts
• Metadata documents are “shredded” into concepts and then concepts are shredded into elements using XSLT.
• Once CLOBs are stored, metadata cannot be lost.
• CLOBs are indexed on Object ID and their global ordering.
• Shredded metadata is only a search index, allowing for strong typing – even if types do not match XML.
[Figure: a metadata document (ID, Concept, Concept, Concept, …) is stored per concept as a CLOB with its global order; each shredded concept (Name, Source, Sub-concept *, Element *) holds elements (Name, Source, Typed Value). Callouts: Fast Response, Detailed Search.]
Ingest & Search Using Incremental Capture
[Figure: the XMC Cat database holds shredded concepts and concept CLOBs.]
• Ingest: a new concept passes through Determine Schema, Validate, and Shred.
• Search: Build Query takes a search based on concepts and queries the shredded metadata for matching objects, which returns object IDs; Build Result then queries for CLOBs based on those IDs.
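The two-step search on this slide (shredded metadata yields object IDs, then CLOBs are fetched by ID) can be sketched with Python's built-in `sqlite3` module. The table names, column names, and the `ingest` and `search` helpers below are assumptions made for illustration; the schema-determination and validation steps are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_clobs (
        object_id TEXT, global_order INTEGER, clob TEXT
    );
    CREATE TABLE shredded_elements (
        object_id TEXT, concept TEXT, name TEXT, value TEXT
    );
""")


def ingest(object_id, global_order, concept, clob, elements):
    # Store the fragment itself, then the shredded rows that index it.
    conn.execute("INSERT INTO concept_clobs VALUES (?, ?, ?)",
                 (object_id, global_order, clob))
    conn.executemany(
        "INSERT INTO shredded_elements VALUES (?, ?, ?, ?)",
        [(object_id, concept, name, value) for name, value in elements.items()],
    )


def search(concept, name, value):
    # Step 1: the shredded metadata yields the matching object IDs.
    ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT object_id FROM shredded_elements "
        "WHERE concept = ? AND name = ? AND value = ?",
        (concept, name, value))]
    # Step 2: fetch each object's CLOBs in global order and build the result.
    results = {}
    for object_id in ids:
        rows = conn.execute(
            "SELECT clob FROM concept_clobs WHERE object_id = ? "
            "ORDER BY global_order", (object_id,))
        results[object_id] = "".join(row[0] for row in rows)
    return results


ingest("obj-1", 7, "keywords", "<keywords/>", {"themekey": "ADAS"})
ingest("obj-1", 5, "citation", "<citation/>", {"publisher": "IU/GEOG"})
ingest("obj-2", 5, "citation", "<citation/>", {"publisher": "Other"})
hits = search("keywords", "themekey", "ADAS")
print(hits)
```

The search never parses XML: the shredded rows act purely as an index, and the stored fragments are returned as they were ingested.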
Exploded Datasets
(Describing data in a broader context)
• Not a tarball at the end of a project
• Automated capture during an experiment
• Data objects are generated throughout a workflow
• Experiment data hierarchies vary by domain
• Provides scientists access to incremental metadata
[Figure: Incremental Capture During a Workflow or Experiment. Labels: Gateway (Browse, Search, Query For Data, Compose Workflow); Message Bus (Workflow Notifications); Metadata Catalog; Workflow Inputs; Intermediate Results; Workflow Outputs; Workflow Notifications.]
Automated Metadata Capture
[Figure: Workflow Nodes Register Data Products. Labels: workflow nodes; Science Gateway; Data Management Agent; Data Repository; XMC Cat Metadata Catalog; Database; data registration event queue; workers with plug-ins.]
• Archived to the data repository
• Minimal source metadata is recorded
• Registration events added to queue
• Post-processing of data registration events
Domain Schema → Generalized Architecture
[Figure: XMC Cat architecture. Labels: XMC Cat Browser; XMC Cat Web Service; XMC Cat WSDL; Functionality Plug-ins; Domain Metadata Schema; Domain Schema XML Bean; Domain Schema XSLT Shredding Templates; Domain Concept Database Script; Post-Processing Plug-ins; Distributed Shredders; Data Store; External Services; XMC Cat Database (Shred Data on Ingest, Cast Metadata on Ingest). One group of components is marked “Generated with XMC Cat Builder”.]
Adaptable Metadata Store
• Shares characteristics of clinical genomics databases and relational RDF stores such as Jena.
• Definition of concepts is based on schema structure.
• Dynamic concepts can be defined based on metadata content instead of structure.
• Every concept is stored as a CLOB
• Concepts can optionally be parsed into concepts, sub-concepts, and elements.
A Generic Structure for Searching Domain Concepts
[Figure: schema-based XML shredding maps complex domain-specific concepts to generalized concepts, sub-concepts, and elements. The XML is stored both as XML fragments (CLOBs) and as shredded leaf data elements. A detailed query based on shredded data elements returns object IDs; a query for CLOBs by object ID then builds the XML response.]
Metadata Schema : Concept +
Concept : Sub-Concept *, Atomic Element *
Sub-Concept : Sub-Concept *, Atomic Element *
Atomic Element : date | time | timestamp | integer | float | spatial | string
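The grammar above maps naturally onto a small recursive data model. The following Python sketch is one possible reading of it, not the XMC Cat data model: the class names, the `cast_value` helper, and the choice to return `None` for a value that does not cast are assumptions, and the `time` and `spatial` types are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional, Union

TypedValue = Union[date, datetime, int, float, str]


@dataclass
class AtomicElement:
    name: str
    source: str
    value: TypedValue


@dataclass
class SubConcept:
    # Sub-Concept : Sub-Concept *, Atomic Element *
    name: str
    source: str
    sub_concepts: List["SubConcept"] = field(default_factory=list)
    elements: List[AtomicElement] = field(default_factory=list)


@dataclass
class Concept(SubConcept):
    """Top-level concept; the grammar gives it the same shape as a sub-concept."""


def cast_value(raw: str, type_name: str) -> Optional[TypedValue]:
    # Hypothetical casting step: a value that cannot be cast is not indexed.
    casters = {
        "date": date.fromisoformat,
        "timestamp": datetime.fromisoformat,
        "integer": int,
        "float": float,
        "string": str,
    }
    try:
        return casters[type_name](raw)
    except ValueError:
        return None


citation = Concept(
    name="citation",
    source="LEAD",
    sub_concepts=[SubConcept(
        name="pubInfo",
        source="LEAD",
        elements=[AtomicElement("publisher", "LEAD", "IU/GEOG")],
    )],
)
print(cast_value("2012-03-02", "date"), cast_value("Unknown", "date"))
```

Under this reading, a publication date of "Unknown" simply produces no typed date entry in the search index, while the original text stays available in the stored CLOB.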
Shredding Domain Metadata
<lead:LEADresource xmlns:lead="http://schemas.leadproject.org/2007/01/lms/lead"
    xmlns:le="http://schemas.leadproject.org/2007/01/lms/leadelements"
    xmlns:fgdc="http://schemas.leadproject.org/2007/01/lms/fgdc">
  <le:resourceID>urn:uuid:97afbef7-58c8-4143-9b05-f0b9d82d27ef</le:resourceID>
  <lead:data>
    <lead:idinfo>
      <lead:citation>
        <fgdc:origin>/C=US/O=National Center for Supercomputing Applications/CN=Anne Wilson</fgdc:origin>
        <fgdc:pubdate>Unknown</fgdc:pubdate>
        <fgdc:title>LEAD CONUS ADAS Catalog/CONUS ADAS 10km</fgdc:title>
        <fgdc:pubinfo>
          <fgdc:pubplace>unknown</fgdc:pubplace>
          <fgdc:publish>IU/GEOG</fgdc:publish>
        </fgdc:pubinfo>
      </lead:citation>
      <lead:descript>
        <fgdc:abstract>Real-time meteorological data assimilations with CONUS coverage at 10km resolution produced hourly by CAPS at OU. The List of contents provides the OPeNDAP URLs for the files within the collection. They have a form: http://lead.unidata.ucar.edu/cgi-bin/nph-dods/test-data/ADAS/OU/ad{date}.nc where {date} has the form: YYYYMMDDHH and indicates the hour for which the data assimilation is valid. </fgdc:abstract>
        <fgdc:purpose>Scientific research and education</fgdc:purpose>
      </lead:descript>
      . . .
      <lead:keywords>
        <fgdc:theme>
          <fgdc:themekt>DatasetTypes.lead.org</fgdc:themekt>
          <fgdc:themekey>ADAS</fgdc:themekey>
        </fgdc:theme>
        <fgdc:theme>
          <fgdc:themekt>CF-1.0</fgdc:themekt>
          <fgdc:themekey>projection_x_coordinate</fgdc:themekey>
          <fgdc:themekey>projection_y_coordinate</fgdc:themekey>
          <fgdc:themekey>height</fgdc:themekey>
          <fgdc:themekey>geopotential_height</fgdc:themekey>
Citation Concept
Description Concept
2nd Theme Keyword Concept
Shredded Citation Metadata
<objectClobProperty myPos="5" (namespaces omitted here) >
  <objectClob>
    <lead:citation xmlns:lead="http://schemas.leadproject.org/2007/01/lms/lead"
        xmlns="http://schemas.leadproject.org/2007/01/lms/lead"
        xmlns:fgdc="http://schemas.leadproject.org/2007/01/lms/fgdc"
        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <fgdc:origin>/C=US/O=National Center for Supercomputing Applications/CN=Anne Wilson</fgdc:origin>
      <fgdc:pubdate>Unknown</fgdc:pubdate>
      <fgdc:title>LEAD CONUS ADAS Catalog/CONUS ADAS 10km</fgdc:title>
      <fgdc:pubinfo>
        <fgdc:pubplace>unknown</fgdc:pubplace>
        <fgdc:publish>IU/GEOG</fgdc:publish>
      </fgdc:pubinfo>
    </lead:citation>
  </objectClob>
  <objectProperty myName="citation" mySource="LEAD">
    <objectProperty myName="pubInfo" mySource="LEAD">
      <objectElement myName="pubPlace" mySource="LEAD" myVal="unknown"/>
      <objectElement myName="publisher" mySource="LEAD" myVal="IU/GEOG"/>
    </objectProperty>
    <objectElement myName="originator" mySource="LEAD" myVal="/C=US/O=National Center for Supercomputing Applications/CN=Anne Wilson"/>
    <objectElement myName="pubDate" mySource="LEAD" myVal="Unknown"/>
    <objectElement myName="pubDateTime" mySource="LEAD" myVal="Unknown"/>
    <objectElement myName="title" mySource="LEAD" myVal="LEAD CONUS ADAS Catalog/CONUS ADAS 10km"/>
  </objectProperty>
</objectClobProperty>
CLOB forCitation Concept
pubInfoSub-concept
All Shredded Metadata Conforms to the Same Schema
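The slides state that shredding is done with XSLT. As a rough illustration of the same mapping, the sketch below uses Python's `xml.etree.ElementTree` instead, turning the citation concept into the generic `objectProperty` / `objectElement` form. The `NAMES` rename table is a hypothetical stand-in read off the slide's output, the derived `pubDateTime` element is not produced, and children are emitted in document order, which differs from the order shown on the slide.

```python
import xml.etree.ElementTree as ET

FGDC = "http://schemas.leadproject.org/2007/01/lms/fgdc"
LEAD = "http://schemas.leadproject.org/2007/01/lms/lead"

CITATION = f"""
<lead:citation xmlns:lead="{LEAD}" xmlns:fgdc="{FGDC}">
  <fgdc:origin>/C=US/O=National Center for Supercomputing Applications/CN=Anne Wilson</fgdc:origin>
  <fgdc:pubdate>Unknown</fgdc:pubdate>
  <fgdc:title>LEAD CONUS ADAS Catalog/CONUS ADAS 10km</fgdc:title>
  <fgdc:pubinfo>
    <fgdc:pubplace>unknown</fgdc:pubplace>
    <fgdc:publish>IU/GEOG</fgdc:publish>
  </fgdc:pubinfo>
</lead:citation>
""".strip()

# Hypothetical rename table (schema tag -> shredded name), inferred from the slide.
NAMES = {
    "origin": "originator", "pubdate": "pubDate", "pubinfo": "pubInfo",
    "pubplace": "pubPlace", "publish": "publisher",
}


def local_name(tag):
    # ElementTree reports tags as "{namespace}local"; keep only the local part.
    return tag.rsplit("}", 1)[-1]


def shred(node, source="LEAD"):
    name = NAMES.get(local_name(node.tag), local_name(node.tag))
    children = list(node)
    if not children:
        # Leaf: becomes an element carrying its value as an attribute.
        return ET.Element("objectElement", myName=name, mySource=source,
                          myVal=(node.text or "").strip())
    # Non-leaf: becomes a (sub-)concept that contains its shredded children.
    prop = ET.Element("objectProperty", myName=name, mySource=source)
    for child in children:
        prop.append(shred(child, source))
    return prop


shredded = shred(ET.fromstring(CITATION))
print(ET.tostring(shredded, encoding="unicode"))
```

Whatever the domain schema looks like, the output uses only the two generic tags, which is what lets one search structure serve every domain.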
Dynamic Concepts Based on Content
[Figure: the Entity and Attribute / Detailed Desc branch of the schema tree: Entity Type (Type Label, Type Definition, Definition Source) and Attribute (Attribute Label, Definition, Definition Source, Domain Values), with global order numbers on the nodes and the annotations below.]
• CLOB parsed out and saved based on global order (schema structure)
• Concept defined based on “entity” label and source
• Sub-concept and elements defined based on “attribute” label and source
New domain concepts without schema changes
• Concept CLOBs are always saved based on global order, even if the concept is not defined.
• To be queryable, new concepts and elements must be defined, but no schema change is required.
XMC Cat Builder: Concepts
Deployed in Diverse Domains
• Linked Environments for Atmospheric Discovery (LEAD)
– NSF-funded science gateway
– Metadata describing 500TB of data, intermediate results, and workflow output
– Data objects each described by up to 2,202 elements
– Individual workspaces of up to 15,000 objects
• One Degree Imager (ODI), WIYN Consortium
– Component in the data subsystem
– Data-driven workflows
• SEAD Project
– Sustainability science
– Provide search capability over archived use metadata
Comparing to a Native XML Database
Concurrent Insert/Query Execution Time in Milliseconds
• Except for queries based on object IDs, XMC Cat at 8X the base workload performs better than Berkeley XML at 1/10th of the base workload.
• XMC Cat experiment inserts include validation that is not reflected in the Berkeley results. With validation eliminated, the XMC Cat experiment insert at 8X the workload takes 2,477 ms.
Scott Jensen, Devarshi Ghoshal, and Beth Plale, Evaluation of Two XML Storage Approaches for Scientific Metadata, Indiana University CS Technical Report TR698, October 2011.
Projected insert and query workload as multiples of projected LEAD workload based on LEAD technical report and insert/query ratios of the TPC-E benchmark.
                   Inserting Metadata                                                           Queries
Workload           Minimal (core)  Moderate (file)  Extensive (experiment)  Additional Concept  Query on ID  Context Query
Berkeley 1/10th                87              138                   2,659                  78           23          1,086
XMC Cat - Percentage of Base Workload
1X                             52               74                   2,316                  26           27             63
2X                             53               76                   2,954                  27           27             63
4X                             60               80                   4,803                  29           31             69
6X                             67               88                   4,628                  32           54             72
8X                             69               89                   4,719                  36           34            145
Performance Compared to Inlining
[Figure: two panels, “Inlining 4X Base Workload” and “XMC Cat 9X Base Workload”, each plotting average processing time (ms, 0 to 200,000) against processing start time (minute in test, 0 to 20) for four series: File Query, Experiment Query, Batch File Insert, and Experiment Insert.]
Scott Jensen and Beth Plale, Using Characteristics of Computational Science Schemas for Workflow Metadata Management, In Proceedings of the 2008 IEEE Congress on Services, IEEE 2008 Second International Workshop on Scientific Workflows (SWF 2008) , Hawaii, July 2008.
Eventual Consistency
Browse versus Search Metadata
[Figure: the Metadata Catalog web service and the distributed concept shredders, connected by a catalog queue. The slide numbers the steps 1a, 1b, 2, 3, 4, 5, 6a, and 6b.]
Metadata Catalog: storage of concepts so a user can browse their workspace
• Experiments adding metadata, and adding metadata to existing experiments
• Parse metadata into concept CLOBs
• Store concept CLOBs to object’s metadata
• Queue concept’s ID for eventual shredding
Distributed Concept Shredders: shred of concepts for eventually consistent querying of the workspace
• Query for a batch of concepts
• CLOBs added to queue
• Worker threads dequeue a CLOB and shred it into sub-concepts and elements
• Successfully shredded? If yes, shredded metadata is added to the metadata catalog and the entry for the concept is removed from the processing queue
Scott Jensen and Beth Plale, Trading Consistency for Scalability in Scientific Metadata, In Proceedings of the 2010 IEEE International Conference on e-Science, Brisbane, Australia, December 2010.
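The split between browsing (available as soon as the CLOB is stored) and searching (available once a shredder catches up) can be illustrated with a single-process Python sketch. It is only a stand-in for the design on the slide: the real system uses a catalog-side queue polled in batches by distributed shredders, whereas this toy uses `queue.Queue` and one worker thread, and the `shred` function is a placeholder that splits text on whitespace.

```python
import queue
import threading

clobs = {}          # concept_id -> stored fragment; browsable immediately
search_index = {}   # concept_id -> shredded values; filled in later
pending = queue.Queue()


def store_concept(concept_id, clob):
    # Store the CLOB, then queue only its ID for deferred shredding.
    clobs[concept_id] = clob
    pending.put(concept_id)


def shred(clob):
    # Placeholder for real shredding into sub-concepts and elements.
    return clob.split()


def shredder_worker():
    while True:
        concept_id = pending.get()
        if concept_id is None:          # sentinel: stop the worker
            pending.task_done()
            break
        search_index[concept_id] = shred(clobs[concept_id])
        pending.task_done()             # entry leaves the processing queue


worker = threading.Thread(target=shredder_worker, daemon=True)

store_concept(1, "citation title")
browsable_before = 1 in clobs           # the user can already browse it
searchable_before = 1 in search_index   # but search has not caught up yet

worker.start()
pending.join()                          # wait until the backlog is shredded
searchable_after = 1 in search_index

pending.put(None)
worker.join()
print(browsable_before, searchable_before, searchable_after)
```

The ingest path does no shredding work at all, which is where the scalability gain described in the cited paper comes from; the cost is the window during which a stored concept is not yet searchable.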
Bounds on Eventual Consistency
EC_t = W_t + T_t + R_t + S_t + I_t
W_t: Time a concept ID is queued
T_t: Time to “tag” as taken when fetching (64.42 ms per batch of 100 concepts)
R_t: Time to fetch tagged concepts (74.58 ms per batch of 100 concepts)
S_t: Time in local shredder queue and shredded by a worker thread (3.48 ms)
I_t: Time to insert shredded concept into the metadata catalog (13.74 ms)
Above times are averages for fetching a batch of 100 concepts (T_t and R_t) and then processing each concept (S_t and I_t).
Total wait time is dominated by W_t. If the distributed shredders keep pace with the ingest rate, the frequency of the shredders fetching determines W_t.
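As a worked example of the bound, the sketch below adds up the measured terms in Python. It assumes that every concept in a batch experiences the full batch tag and fetch times (rather than a 1/100 share), and the 5,000 ms queue wait is a made-up value standing in for W_t, which the slide does not quantify.

```python
# Measured averages from the slide, in milliseconds.
TAG_BATCH_MS = 64.42     # T_t, per batch of 100 concepts
FETCH_BATCH_MS = 74.58   # R_t, per batch of 100 concepts
SHRED_MS = 3.48          # S_t, per concept
INSERT_MS = 13.74        # I_t, per concept


def consistency_delay_ms(queued_wait_ms):
    """EC_t for one concept, given its time in the catalog queue (W_t)."""
    return queued_wait_ms + TAG_BATCH_MS + FETCH_BATCH_MS + SHRED_MS + INSERT_MS


# Everything except the queue wait.
fixed_part = consistency_delay_ms(0.0)

# Hypothetical case: the concept waits 5 seconds for the next shredder fetch.
example = consistency_delay_ms(5000.0)
print(round(fixed_part, 2), round(example, 2))
```

Under these assumptions the measured terms sum to about 156 ms, so a queue wait of even a few seconds accounts for nearly all of the delay, which is consistent with the slide's remark that W_t dominates.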
Evaluation of Eventual Consistency
• Eventual consistency scales higher
• Strict consistency scaled to 8X the projected workload
• Mostly due to deferred shredding
• Using two eventually consistent shredders on a separate server
[Figure: mean execution time (ms, 0 to 140) against multiple of base workload (0 to 8). Legend entries: Total Processing, Inserting Shredded Metadata, Strict Consistency, Total With Shredding, Total - Deferred Shredding, Eventual Consistency. Annotation: strict consistency is 42% longer at 6X the base workload.]
Domain-Adaptable Metadata Search
• Metadata search criteria are often limited to keywords or text, a spatial bounding box, and temporal bounds.
• If rich metadata is captured as a BLOB, it is available as use metadata, but not discovery metadata.
Instead …
• Use domain concepts and dynamic concepts to define search criteria.
• Generic architecture for shredded metadata -> search criteria can include any shredded domain metadata.
Dynamic Search Definition
metadata_categories: Category_id, Category_description
concept_definitions: Concept_id, Category_id, Concept_name, Concept_source, Schema_order_id, Concept_description, Concept_short_desc, Top_concept_id, Parent_sequence, Parent_id
element_definitions: Element_id, Concept_id, Element_type, Element_name, Element_source, Element_description, Element_short_desc
Callouts on the XML definition: “general information” is a category, “citation” is a concept, “pubinfo” is a sub-concept, and “publisher name” is an element definition.
<metadataDefinition>
  <metadataCategoryDef>
    <categoryId>1</categoryId>
    <categoryName>General Information</categoryName>
    <metadataConceptDef>
      <conceptId>1</conceptId>
      <conceptName>citation</conceptName>
      <conceptSource>FGDC</conceptSource>
      <conceptDesc>citation</conceptDesc>
      <conceptShortDesc>citation</conceptShortDesc>
      <metadataElementDef>
        <elementId>1</elementId>
        <elementName>originator</elementName>
        <elementSource>FGDC</elementSource>
        <elementDesc>citation originator</elementDesc>
        <elementShortDesc>originator</elementShortDesc>
        <elementType>6</elementType>
      </metadataElementDef>
      . . .
      <metadataConceptDef>
        <conceptId>3</conceptId>
        <conceptName>pubinfo</conceptName>
        <conceptSource>FGDC</conceptSource>
        <conceptDesc>publication information</conceptDesc>
        <conceptShortDesc>pub info</conceptShortDesc>
        . . .
        <metadataElementDef>
          <elementId>12</elementId>
          <elementName>publish</elementName>
          <elementSource>FGDC</elementSource>
          <elementDesc>publisher name</elementDesc>
          <elementShortDesc>publisher</elementShortDesc>
          <elementType>6</elementType>
        </metadataElementDef>
      </metadataConceptDef>
Search Adjusts to Domain Concepts
When the target is selected, all concepts are listed as search options, grouped by their categories.
When a concept is selected, all of its sub-concepts and elements are listed as options.
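How a search form could be driven from the definition tables can be sketched with Python's `sqlite3` module. The table names and the columns used come from the Dynamic Search Definition slide, but the column types, the reduced column set, the use of `parent_id` to link pubinfo to citation, and the two helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata_categories (
        category_id INTEGER PRIMARY KEY, category_description TEXT
    );
    CREATE TABLE concept_definitions (
        concept_id INTEGER PRIMARY KEY, category_id INTEGER,
        concept_name TEXT, concept_source TEXT, parent_id INTEGER
    );
    CREATE TABLE element_definitions (
        element_id INTEGER PRIMARY KEY, concept_id INTEGER,
        element_type INTEGER, element_name TEXT, element_source TEXT
    );
    -- Sample rows taken from the XML definition on the slide.
    INSERT INTO metadata_categories VALUES (1, 'General Information');
    INSERT INTO concept_definitions VALUES (1, 1, 'citation', 'FGDC', NULL);
    INSERT INTO concept_definitions VALUES (3, 1, 'pubinfo', 'FGDC', 1);
    INSERT INTO element_definitions VALUES (1, 1, 6, 'originator', 'FGDC');
    INSERT INTO element_definitions VALUES (12, 3, 6, 'publish', 'FGDC');
""")


def top_level_options():
    # All top-level concepts, grouped by their category.
    grouped = {}
    rows = conn.execute("""
        SELECT c.category_description, d.concept_name
        FROM concept_definitions AS d
        JOIN metadata_categories AS c ON c.category_id = d.category_id
        WHERE d.parent_id IS NULL
        ORDER BY c.category_id, d.concept_id
    """)
    for category, concept in rows:
        grouped.setdefault(category, []).append(concept)
    return grouped


def options_for(concept_name):
    # Sub-concepts and elements offered once a concept is selected.
    (concept_id,) = conn.execute(
        "SELECT concept_id FROM concept_definitions WHERE concept_name = ?",
        (concept_name,)).fetchone()
    subs = [r[0] for r in conn.execute(
        "SELECT concept_name FROM concept_definitions "
        "WHERE parent_id = ? ORDER BY concept_id", (concept_id,))]
    elems = [r[0] for r in conn.execute(
        "SELECT element_name FROM element_definitions "
        "WHERE concept_id = ? ORDER BY element_id", (concept_id,))]
    return subs, elems


print(top_level_options())
print(options_for("citation"))
```

Since the options are read from rows rather than hard-coded, defining a new concept or element changes what the search form offers without any change to the database schema or the interface code.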
Strongly Typed Search Criteria
Current Work
• Handle hierarchies based on multiple schemas
– Experiments bringing together data from multiple sources described by different standards.
– Data described by different metadata standards can be combined in a single dataset.
– Metadata can be queried based on different schemas.
• Faceted search
– Added to XMC Cat web service.
– Can alternate between facets and details.
– Unified criteria for multiple schemas.
[Figure: example data sources: Simulation, Forecast, Sensor Data, Ecological Data, Satellite Data, Census Data.]
Thank You!
Scott Jensen
Thanks also to:
- The NSF-funded Linked Environments for Atmospheric Discovery (LEAD) project
- Data to Insight Center