transparency, collaboration and information sharing solution architecture tools and techniques using the social data web george thomas, 1105 ea2009


Page 1:

transparency, collaboration and information sharing

solution architecture tools and techniques using the social data web

george thomas, 1105 ea2009

Page 2:

agenda

• An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.

• Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.

• Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low cost, large scale data federation will be described.

• Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.

• A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.

• Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.


Page 4:

Web Oriented Architecture (WOA)

• REpresentational State Transfer (REST)
  – the architectural style of the World Wide Web
  – aka Resource Oriented Architecture (ROA)
• hyperlinks dereference (information) resource representations
  – HTTP URIs and content negotiation
    • user agent prefers .htm, .xml, .rdf, .etc
• statefulness
  – servers maintain resource state, clients maintain application state
• RESTful Web services
  – HTTP uniform interface
    • CRUD analog to HTTP PUT/GET/POST/DELETE
  – contrast to Remote Procedure Call (RPC) style Web services
    • SOAP/WSDL, you design the methods to invoke
• global visibility (the Web) and persistence (permalinks)
  – caching, crawling, indexing
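A minimal sketch of the uniform interface idea above, using an in-memory dictionary in place of a real server. The `handle` function, the `/records` path, and the status codes are illustrative assumptions, not part of the deck. The verb mapping follows one common convention (POST creates under a collection, PUT creates or replaces at a known URI); the slide lists the verbs in a different order, and either reading supports the same point: a handful of generic verbs replace custom RPC methods.

```python
import itertools

# In-memory stand-in for server-side resource state (illustrative only).
store = {}
_ids = itertools.count(1)

def handle(method, path, body=None):
    """Apply one uniform-interface verb to the resource named by `path`."""
    if method == "GET":                       # Read
        if path in store:
            return 200, store[path]
        return 404, None
    if method == "PUT":                       # Create or replace at a known URI
        status = 200 if path in store else 201
        store[path] = body
        return status, body
    if method == "POST":                      # Create under a collection
        new_path = f"{path.rstrip('/')}/{next(_ids)}"
        store[new_path] = body
        return 201, new_path
    if method == "DELETE":                    # Delete
        if path in store:
            del store[path]
            return 204, None
        return 404, None
    return 405, None                          # verb not supported by this sketch

status, location = handle("POST", "/records", {"title": "draft"})
print(status, location)
print(handle("GET", location))
```

Note that the client only ever varies the verb and the URI; no method names are designed per service, which is the contrast with SOAP/WSDL drawn on the slide.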


Page 6:

XForms - human data capture

• Orbeon server side XForms engine, Ajax browser GUIs
• catalog and builder apps
• create new XSD bound forms
• populate, persist, search
• Tomcat and eXist
• off-line capability
• transformation pipeline

Page 7:

Atom Publishing Protocol (APP)

• automated invocation of the RESTful Web service
  – HTTP PUT/POST the spreadsheet or XML instance doc
    • to atomserver.codehaus.org
• where else is APP used?
  – Google Data APIs, Microsoft Live Framework
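As a rough illustration of "HTTP POST the XML instance doc", the sketch below wraps a small XML payload in a minimal Atom entry and prepares the POST request without sending it. The collection URL, the entry id, and the `<record>` payload are hypothetical; a production entry would also carry author information.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)

def build_entry(entry_id, title, updated, xml_payload):
    """Wrap an XML instance document in a minimal Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "application/xml"})
    content.append(ET.fromstring(xml_payload))
    return ET.tostring(entry, encoding="utf-8")

def build_post(collection_url, entry_bytes):
    """Prepare (but do not send) the POST that would add the entry to a collection."""
    return urllib.request.Request(
        collection_url,
        data=entry_bytes,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )

request = build_post(
    "http://example.org/atom/records",              # hypothetical collection URI
    build_entry(
        "urn:example:record:1",                     # hypothetical entry id
        "Sample record",
        "2009-01-01T00:00:00Z",
        "<record><amount>42</amount></record>",     # hypothetical instance doc
    ),
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs without a server.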

Page 8:

Atom Syndication Format

• transform XForm or APP captured info into XHTML+RDFa
• (permalinked) public recordset in feed entry <content>
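One way to picture the transformation step: render a flat record as an XHTML `<div>` carrying RDFa attributes, then place it in an Atom entry's `<content type="xhtml">`. This is a sketch under assumptions; the record fields, the permalink, and the choice of the Dublin Core terms vocabulary are illustrative and do not come from the deck.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

def record_to_rdfa(subject_uri, fields):
    """Render a flat record as an XHTML <div> whose spans carry RDFa properties."""
    spans = "".join(
        f'<span property={quoteattr("dc:" + name)}>{escape(value)}</span>'
        for name, value in fields.items()
    )
    return (
        '<div xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:dc="http://purl.org/dc/terms/" '
        f'about={quoteattr(subject_uri)}>{spans}</div>'
    )

def wrap_in_entry(entry_id, xhtml_div):
    """Place the XHTML+RDFa fragment inside a minimal Atom entry."""
    return (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        f'<id>{escape(entry_id)}</id>'
        f'<content type="xhtml">{xhtml_div}</content>'
        '</entry>'
    )

permalink = "http://example.org/records/1"          # hypothetical permalink
entry_xml = wrap_in_entry(
    permalink,
    record_to_rdfa(permalink, {"title": "Notice 1", "publisher": "Example Office"}),
)
print(entry_xml)
```

Because the entry id and the RDFa `about` attribute share the same permalink, the human-readable page and the machine-readable statements describe the same resource.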

Page 9:

the london-gazette.co.uk

Page 10:

london-gazette.co.uk/listing

small, discrete, component ontology/data-domain-metamodels

Page 11:

web page = web service

Page 12:

RDFa enabled 'deep link' discovery

• Rich Snippets from Google
• SearchMonkey from Yahoo


Page 14:

goal: federated dataset correlation

• graph based dynamic schema evolution across silos
  – centralization/normalization not required (or realistic/practical!)

Page 15:

Web as DB - Web API

• Linking Open (Government) Data (LOD)
• SPARQL endpoints

linkeddata.org

Page 16:

browse: from web of docs to web of data

Page 17:

http://data.linkedmdb.org/page/actor/10

• content negotiation, user agent prefers:
  – human (html) or machine (rdf/xml) readable

RDF/N3
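A small sketch of content negotiation from both sides. The client states its preference in the `Accept` header; the server picks a representation. The `choose_representation` helper is a deliberately simplified assumption (it ignores q-values and wildcards), and the mapping to `page` and `data` mirrors the two URL path segments shown on these slides. No request is actually sent.

```python
import urllib.request

RESOURCE = "http://data.linkedmdb.org/resource/actor/10"

def negotiated_request(url, media_type):
    """Build (but do not send) a GET request that states the preferred format."""
    return urllib.request.Request(url, headers={"Accept": media_type})

def choose_representation(accept_header):
    """Simplified server-side choice: first listed media type that can be served."""
    available = {"text/html": "page", "application/rdf+xml": "data"}
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in available:
            return available[media_type]
    return "page"  # fall back to the human-readable view

human = negotiated_request(RESOURCE, "text/html")
machine = negotiated_request(RESOURCE, "application/rdf+xml")
for request in (human, machine):
    print(request.get_header("Accept"), "->", choose_representation(request.get_header("Accept")))
```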

Page 18:

http://data.linkedmdb.org/page/actor/10

• now at the bottom of the same page/actor/10
  – triple is Subject (S) Predicate (P) Object (O)
    • 10 (S) vocabulary:property (P) <object> (O)
  – properties link to other dataset instances
    • that use different datatype definitions
  – note D2R app, expose RDB as RDF, SPARQL to SQL

Page 19:

http://data.linkedmdb.org/data/actor/10

• <subject> has predicate {space} object1 , objectN ; repeat until .

<http://data.linkedmdb.org/resource/actor/10>
    foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> ,
        <http://www.imdb.com/name/nm0000564/> ;
    owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> ,
        <http://dbpedia.org/resource/Peter_O%27Toole> ;
    rdf:type movie:actor ,
        foaf:Person .

• this is an 'N3' RDF serialization, instead of RDF/XML (or others)
• some properties have RESTful SPARQL queries as <objects>

foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>>
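To make the ',' and ';' shorthand concrete, the sketch below expands the actor statement from this slide into its individual triples. The `expand` helper is a hypothetical illustration of the abbreviation rules, not an N3 parser.

```python
def expand(subject, predicate_objects):
    """Expand one subject with ';'-separated predicates and ','-separated objects."""
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects
        for obj in objects
    ]

actor = "<http://data.linkedmdb.org/resource/actor/10>"
triples = expand(actor, [
    ("foaf:page", [
        "<http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e>",
        "<http://www.imdb.com/name/nm0000564/>",
    ]),
    ("owl:sameAs", [
        "<http://mpii.de/yago/resource/Peter_O%27Toole>",
        "<http://dbpedia.org/resource/Peter_O%27Toole>",
    ]),
    ("rdf:type", ["movie:actor", "foaf:Person"]),
])

# Print one fully spelled-out statement per line, each ending in '.'
for s, p, o in triples:
    print(s, p, o, ".")
```

Three predicates with two objects each yield six separate subject/predicate/object statements, all about the same subject.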

Page 20:

Web based SPARQL query builder

http://dbpedia.org/ is powered by 'Virtuoso' from http://www.openlinksw.com, which provides a 'SPARQL endpoint' (DRM 'query point')

Page 21:

creates dbpedia.org query

• use response data in next query
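A hedged sketch of chaining queries: encode a SPARQL query as a GET URL, read URIs out of a JSON results document, and use them to form the next query. The endpoint path `http://dbpedia.org/sparql` is an assumption (the slide names only the site), and the response below is a hand-written sample in the SPARQL JSON results layout, not real endpoint output.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint path, not given on the slide

def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as a RESTful GET URL (the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})

def uris_from_results(results_json, variable):
    """Pull URI values for one variable out of a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [b[variable]["value"] for b in bindings if b[variable]["type"] == "uri"]

first_query = (
    "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
    "SELECT ?same WHERE { <http://dbpedia.org/resource/Peter_O%27Toole> owl:sameAs ?same }"
)
first_url = sparql_get_url(ENDPOINT, first_query)

# Hand-written sample response for illustration; not real endpoint output.
sample_response = json.dumps({
    "head": {"vars": ["same"]},
    "results": {"bindings": [
        {"same": {"type": "uri", "value": "http://data.linkedmdb.org/resource/actor/10"}}
    ]},
})

# Use the response data to build the next query.
next_queries = [f"DESCRIBE <{uri}>" for uri in uris_from_results(sample_response, "same")]
print(first_url)
print(next_queries)
```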

Page 22:

authoritative metadata - provided tags!!

• using standardized datatype and property specifications
• ontologies emerge from social folksonomy

http://commontag.org


Page 24:

indexing/searching the Data Web

Page 25:

aggregation and live data reporting

http://sig.ma

Page 26:

many to many set visualization

http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on http://freebase.com

Page 27:

ad-hoc analyst/end-user 'meshups'

Page 28:

schema/bizmo/federal_enterprise

• bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
  – Obama is an instance of the Federal Enterprise type
    • Federal Enterprise (S) Fed Ent Goal (P) Goal (O)

Page 29:

/rdf/bizmo.federal_enterprise (excerpt)

• (W3C/FBase) <subject/topic> <predicate/property> <object/topic>

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
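The excerpt above is line-oriented, one triple per line, so it can be read with a very small parser. The sketch below is an assumption-laden illustration: the regular expression covers only the forms visible in the excerpt (URI objects and plain, language-tagged, or datatyped literals) and is not a complete N-Triples parser.

```python
import re

# Handles only the forms seen in the excerpt; escaped quotes and blank nodes
# are deliberately not covered.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<uri>[^>]*)>|"(?P<lit>[^"]*)"(?:@(?P<lang>[\w-]+)|\^\^<(?P<dtype>[^>]*)>)?)'
    r'\s*\.\s*$'
)

def parse_line(line):
    """Split one triple line into (subject, predicate, object)."""
    match = TRIPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a supported triple line: {line!r}")
    obj = match["uri"] if match["uri"] is not None else match["lit"]
    return match["s"], match["p"], obj

NS = "http://rdf.freebase.com/ns/"
TYPE = NS + "base.bizmo.federal_enterprise"
lines = [
    f'<{TYPE}> <{NS}type.object.name> "Federal Enterprise"@en.',
    f'<{TYPE}> <{NS}freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.',
    f'<{TYPE}> <{NS}type.type.properties> <{TYPE}.federal_enterprise_goal>.',
    f'<{TYPE}> <{NS}type.type.properties> <{TYPE}.federal_enterprise_objective>.',
]

triples = [parse_line(line) for line in lines]
# List the properties declared on the Federal Enterprise type.
properties = [o for s, p, o in triples if p == NS + "type.type.properties"]
print(properties)
```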

Page 30:

connecting the data dots:

• create the following subject/predicate/object or topic/property/topic schema:

Goal / amplifies / Vision

Objective / quantifies / Goal

Federal Enterprise / (has) Fed Ent Goal / (of type) Goal

Federal Agency / maintains / Exhibit 53

Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)

Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal

• then create instances with data from http://it.usaspending.gov:

Obama / is of type / Federal Enterprise

Obama / has a Fed Ent Goal / Health Care Reform

HHS / is of type / Federal Agency

HHS / maintains / HHS Exhibit 53

HHS Exhibit 53 / contains / Nat Health Info Network Connect

Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
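The schema and instance statements above can be held as plain subject/predicate/object tuples and queried by pattern. This is a minimal sketch; the `match` helper and its wildcard convention are illustrative assumptions, and the strings are copied from the slide.

```python
TRIPLES = [
    # schema
    ("Goal", "amplifies", "Vision"),
    ("Objective", "quantifies", "Goal"),
    ("Federal Enterprise", "Fed Ent Goal", "Goal"),
    ("Federal Agency", "maintains", "Exhibit 53"),
    ("Exhibit 53", "contains", "Exhibit 53 Recordset"),
    ("Exhibit 53 Recordset", "Supports Federal Goal", "Goal"),
    # instances
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Everything that points at the Health Care Reform goal.
for triple in match(TRIPLES, o="Health Care Reform"):
    print(triple)
```

Adding a new kind of relationship means appending tuples; no table definition has to change first.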

Page 31:

search all 'bases' for 'Exhibit 53'

http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com

Page 32:

base/bizmo/e53 returns

• a collection (2 instances) of an Exhibit 53 topic
  – one each from HHS and GSA (data from it.usaspending.gov)
• triple in Exhibit 53 topic schema
  – Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)

Page 33:

discovering unknown data structures

• the power of 'faceted' search and browsing
• interactive query – which of these?
  – Ex53 Recordset (S) Supports Federal Goal (P) ? (O)

Page 34:

traversing the data graph

• from info about an IT investment
• to info about Administration priorities
• 2 Ex53's to 3 Recordsets to 1 that has Obama Goal
  – <uri> (S) <uri> (P) <uri> (O)
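The traversal described above amounts to following a chain of predicates through the graph. The sketch below walks from an agency to the goal its investment supports. It uses only the HHS instance statements listed earlier in the deck (the GSA recordsets are not spelled out there, so they are omitted), and the `build_index` and `follow` helpers are hypothetical names.

```python
from collections import defaultdict

INSTANCES = [
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
]

def build_index(triples):
    """Index objects by (subject, predicate) for quick edge lookups."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index

def follow(index, start, predicates):
    """Follow a chain of predicates from `start`, returning the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in index.get((node, predicate), [])]
    return frontier

index = build_index(INSTANCES)
# From an agency, via its Exhibit 53 and recordset, to an Administration goal.
goals = follow(index, "HHS", ["maintains", "contains", "supports Obama Goal"])
print(goals)
```

Each hop narrows or widens the frontier depending on how many edges match, which is the same effect the faceted browser shows as result counts change.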

Page 35:

http://freemix.it - more faceted filtering

Page 36:

scatter chart driven by tag clouds

Page 37:

more multi-dataset faceted meshups

Page 38:

drag & drop metadata/data 'curation'

Page 39:

publish new freemix merged dataset

choose a stylesheet, view lenses and facets to include for your end users to interact with


Page 41:

crowdsourced analytics

shown using 'TopBraid Composer Maestro' from http://topquadrant.com

'SPARQLMotion' script – also see Yahoo | DERI: http://pipes.yahoo.com | http://pipes.deri.org

Page 42:

cloud scale analytics (petabyte batch)

• proprietary Google
  – GFS, BigTable and MapReduce
  – page rank impl
• open source Apache Hadoop
  – HDFS, HBase and MapReduce
  – entity, RDFa extraction
• Amazon EMR, Cloudera
  – COSS prof service providers

facebook.com
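For readers new to MapReduce, here is a single-process sketch of the map, shuffle, and reduce phases, counting how often each predicate occurs in a set of triples. It illustrates the programming model only; it does not use Hadoop or any cluster API, and the function names are hypothetical. The sample records reuse triples shown earlier in the deck.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (predicate, 1) for one subject/predicate/object record."""
    _subject, predicate, _obj = record
    yield predicate, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def run(records):
    pairs = chain.from_iterable(map_phase(record) for record in records)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

ACTOR = "http://data.linkedmdb.org/resource/actor/10"
records = [
    (ACTOR, "rdf:type", "movie:actor"),
    (ACTOR, "rdf:type", "foaf:Person"),
    (ACTOR, "owl:sameAs", "http://dbpedia.org/resource/Peter_O%27Toole"),
]
print(run(records))
```

In a real deployment the map and reduce functions run on many machines over partitioned input, which is what makes petabyte batch jobs practical.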

Page 43:

talis.com/platform - cloud graph store

• Software as a Service, enabling rapid development with zero deployment costs

• a simple, consistent web API for storing, managing and retrieving both structured and unstructured data

• flexible, schema-free metadata that allows applications to be easily evolved

• a range of data access and query options enabling easy integration into both new and existing applications

• access control options to support hosting of both public and private data

• a data hosting solution that is founded on open internet standards and web architectural best practices

• ...

• every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request

• SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF

• content negotiation can be used to retrieve data as RDF, XML, or JSON allowing you to choose the right format for your application


Page 45:

application to EA discipline

getting there from here

– stop:
  • publishing / analyzing / visualizing unstructured data
  • using structured data only in file or message exchanges
– start:
  • align Gov and Web architecture (including EA KBs!)
  • publish component ontologies on the Web
  • and begin linking their metadata and data
  • using the Social Data Web
– continue:
  • embrace emergent structure and continuous improvement
  • using open source and enabling long-tail crowd-sourcing

Page 46:

q&a - discussion

• thanks for your time and attention!
• contact me
  – http://xri.net/=george.thomas
  – GSA OCIO Chief Enterprise Architect
  – FCIOC-AIC Services Subcommittee Chair
  – W3C eGov IG invited expert
  – OMG GovDTF Steering Committee
  – Graduate School Faculty SOA Instructor