Semantic Web Technologies on HPC for Life Sciences and Other Domains

Page 1: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Semantic Web Technologies on HPC for Life Sciences and Other Domains

Sean Martin
Founder & CTO
Cambridge Semantics Inc.

Contact: [email protected], +1 617 606 341

©2011 Cambridge Semantics Inc. All rights reserved. Company Confidential.

Page 3: Semantic Web Technologies on HPC for Life Sciences and Other Domains

What is/are Semantic Technologies anyway?

Semantics (from Greek sēmantiká, neuter plural of sēmantikós) is the study of meaning.

10 Semantics experts in a room = 11 opinions

Page 4: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Little “s” semantics

• Usually proprietary, mostly heuristics/statistics based
• Search (not query)
• Usually extract meaning from unstructured data (text/video etc.)
• Examples:
  – Enterprise search
  – Entity extraction, automated tagging, text analytics
  – Natural Language Processing (NLP) technologies
  – Automated translation, e.g. Google Translate
  – SMILA & UIMA open source frameworks

Page 5: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Big “S” Semantics – Paint starting to dry

• W3C recommendations (open data standards)
• Machine readable, query (not search) & instant data integration
• The Semantic Web
  – Also known as “Linked Open Data”
  – Also known as “Web 3.0”
• Examples:
  – Google “rich snippets”
  – OpenGraph
  – The GoodRelations ontology
  – Public government data (USA, Europe, UK)
  – All sorts of startup activity

Page 6: Semantic Web Technologies on HPC for Life Sciences and Other Domains

What are the W3C’s Open Data Standards?

• RDF
• OWL
• SPARQL

There are others, but these are the key ones

Page 7: Semantic Web Technologies on HPC for Life Sciences and Other Domains

RDF

• Self-describing (tagged) instance data
• Facts or triples:

<subject> <predicate> <object/value>

• A collection of triples forms a directed labeled graph
• <subject> and <predicate> are globally unique strings or URIs, e.g. http://www.cambridgesemantics.com/people/sean
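
To make the triple-to-graph idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: apart from the /people/sean URI from the slide, every URI, predicate name and value below is hypothetical, and plain tuples stand in for a real RDF library.

```python
from collections import defaultdict

# Only the /people/sean URI comes from the slides; everything else is hypothetical.
SEAN = "http://www.cambridgesemantics.com/people/sean"
ORG = "http://example.org/org/acme"        # hypothetical organisation URI
VOCAB = "http://example.org/vocab#"        # hypothetical vocabulary prefix

# Each fact is a (subject, predicate, object/value) triple.
triples = [
    (SEAN, VOCAB + "worksFor", ORG),       # object is another subject: a graph edge
    (SEAN, VOCAB + "role", "CTO"),         # object is a literal value
    (ORG, VOCAB + "label", "Acme"),
]


def build_graph(facts):
    """Group triples by subject: each subject is a node, each triple a labeled edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for predicate, obj in graph[SEAN]:
    print(predicate, "->", obj)
```

The edge from SEAN to ORG is what makes the collection a directed graph: the object of one triple is the subject of another.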

Page 8: Semantic Web Technologies on HPC for Life Sciences and Other Domains

OWL

• OWL (Web Ontology Language)
  – Describes data models the way a domain expert would
    • What triples or facts are needed to properly describe something and its relationship to other similarly described things?
  – Relationships for inference and other kinds of reasoning
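
As a rough illustration of "relationships for inference", the sketch below applies two simple subclass rules until no new facts appear. This is an assumption-laden toy, not an OWL reasoner: the predicate names `subClassOf` and `type` and the example classes are hypothetical shorthand chosen for readability.

```python
def infer(triples):
    """Apply two RDFS-style rules until a fixed point is reached.

    Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
    Rule 2: x type A,       A subClassOf B  =>  x type B
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p not in ("subClassOf", "type"):
                continue
            for s2, p2, o2 in facts:
                # Both rules chain through a subClassOf fact whose subject is `o`.
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:          # nothing new was derived
            return facts
        facts |= new


# Hypothetical life-sciences style example data.
facts = infer([
    ("Kinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("CDK2", "type", "Kinase"),
])
print(("CDK2", "type", "Protein") in facts)
```

Only three facts were stated, yet a program can now answer "is CDK2 a Protein?" because the model carries the relationships needed to derive it.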

Page 9: Semantic Web Technologies on HPC for Life Sciences and Other Domains

SPARQL

• The first standards-based distributed query language for RDF data & the Web
  – Wow!
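
The core idea behind a SPARQL query, matching triple patterns that share variables, can be sketched in a few lines of Python. This is not SPARQL and not any real engine's API: the `query` and `match` helpers and all data are hypothetical, and the sketch roughly mirrors a query of the form `SELECT ?person ?org WHERE { ?person :worksFor ?org . ?org :locatedIn :Boston }`.

```python
def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` equals `triple`, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                  # variable term
            if result.setdefault(term, value) != value:
                return None                       # conflicts with an earlier binding
        elif term != value:                       # constant term must match exactly
            return None
    return result


def query(patterns, triples):
    """Evaluate patterns left to right, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data.
triples = [
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "Boston"),
    ("bob", "worksFor", "globex"),
    ("globex", "locatedIn", "London"),
]

results = query(
    [("?person", "worksFor", "?org"), ("?org", "locatedIn", "Boston")],
    triples,
)
print(results)
```

Each extra pattern is effectively another join over the same set of triples, which is relevant to the performance discussion later in the deck.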

Page 10: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Important properties of RDF

• Machine readable model / programs can “understand”
• Unique identity of every data element
  – Subject is a unique identifier
  – Predicate (the relationship) is also a unique identifier
  – Object can be a unique identifier pointing to another subject
    • That’s how we get directed graphs
  – Allows annotation (the unique subject string provides an “anchor” for 3rd party metadata)
  – Allows provenance (especially useful when data travels beyond its source system or needs to be updated)
• Semantic type (not just primitive data types)
  – Lets programs immediately know what type of data they are dealing with, allowing automated contextualization of information
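
The annotation and provenance points can be sketched as follows. This is one possible representation, assumed for illustration only: a fourth tuple element records where each fact came from, and the identifier, predicate names and source names are all hypothetical.

```python
from collections import defaultdict

GENE = "http://example.org/gene/BRCA1"    # hypothetical globally unique subject

# Facts from the source system, tagged with their origin (fourth element).
source_system = [
    (GENE, "symbol", "BRCA1", "lab-db"),
]

# A third party annotates the same subject without touching the source system;
# the shared subject identifier is the "anchor".
third_party = [
    (GENE, "reviewedBy", "curator-17", "annotation-service"),
]


def describe(subject, *datasets):
    """Merge every fact about `subject`, keeping track of each fact's source."""
    merged = defaultdict(list)
    for dataset in datasets:
        for s, predicate, value, source in dataset:
            if s == subject:
                merged[predicate].append((value, source))
    return dict(merged)


description = describe(GENE, source_system, third_party)
for predicate, values in description.items():
    print(predicate, values)
```

Because both datasets use the same subject string, merging needs no mapping step, and each value still carries its origin after it has left its source system.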

Page 11: Semantic Web Technologies on HPC for Life Sciences and Other Domains

So what does any of this change?

• Adoption of the semantic standards will be disruptive in at least two ways that create enormous value
  1. Who can do what. Much easier.
    • Pushing the bar further and further towards end user self-service
  2. How long it takes. Much faster.
    • Each new wave of technology brings at least an order of magnitude productivity increase, often more
    • Recent waves: Web Services/SOA; Java (no manual memory management); Virtualization etc.
    • Semantic technology is another wave

Page 12: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Where do these benefits come from?

• Using Semantic Technologies, the end user’s understanding of their data need be the only system or application model required
• This allows the construction of applications & systems to move from what have until now been carefully planned, structure-dependent “all up front” designs over to malleable conceptual representations that can be evolved quickly
  – Systems go from being brittle to flexible
  – Systems can change at the speed the business does
  – End users can increasingly make more of these changes directly themselves

Page 13: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Preserving the end user’s model

Traditional middleware

• Relational Model Physical
• Relational Model Logical
• Object Relational Model
• Business Objects Model
• User Interface Model
• User’s idea of the Model

Semantic middleware

• User’s Model*

*Warning: dramatically oversimplified to make a point

Page 14: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Paying the price for all this flexibility

• Exploding data volumes
  – Tagging creates 10x more data
• Random access is expensive
  – More than 35 years of optimization around the RDBMS is not helping: too many “self-joins” on a three-column table
  – No index support
• Adding an additional layer of indirection is expensive
  – Every time you want to display a value you need to dereference it
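
The “self-join” cost is easy to see when triples are stored naively in a relational table. The sketch below uses Python's built-in sqlite3 module with a hypothetical three-column table and made-up data; it illustrates the pattern only and is not how any particular triple store is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical data.
        ("person:1", "name", "Alice"),
        ("person:1", "worksFor", "org:1"),
        ("org:1", "name", "Acme"),
        ("org:1", "locatedIn", "Boston"),
    ],
)

# "Name of each person and the city of their employer".
# Every property touched needs its own alias of the same table,
# so three properties already mean two self-joins.
rows = conn.execute(
    """
    SELECT pname.o, city.o
    FROM triples AS pname
    JOIN triples AS job  ON job.s  = pname.s AND job.p  = 'worksFor'
    JOIN triples AS city ON city.s = job.o   AND city.p = 'locatedIn'
    WHERE pname.p = 'name'
    """
).fetchall()
print(rows)
conn.close()
```

In a conventional schema this would be one join between a person table and an organisation table. Here the number of joins grows with the number of properties in the question, and each join hops to an unrelated part of the table, which is the random access pattern the slide refers to.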

Page 15: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Paying the price for all this flexibility – enabling trends

• W3C Semantic standards
• A decade of semantic middleware + storage R&D
• Multi-core CPUs
• Fast networking
• Cheap RAM
  – Web 2.0 blazing the trail with a new RAM-based application model?
  – Disk is the new tape? Twitter, Facebook, LinkedIn and iostat
• SSD
  – The changing cost of the sub-4k random access read and what it means to transaction processing systems and the applications that run on them

Page 16: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Spot the difference

[Figure: “Then” and “Now” images]

Page 17: Semantic Web Technologies on HPC for Life Sciences and Other Domains

And finally, so what does any of this have to do with HPC?

• Cray’s XMT Systems
  + Very large quantities of RAM arranged in a contiguous block
  + Very low latency memory access
  + Large number of CPUs
  + Large number of cheap threads
  = Full pipelines
• Great for interactive applications creating random access query patterns, particularly complex ones requiring many joins

Page 18: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Other HPC related Semantic efforts

• Raytheon BBN’s SPARQL on MapReduce clusters
• WebPIE: VU University Amsterdam’s OWL Horst inference on MapReduce
• Clustered RDF triple stores
  – OpenLink’s Virtuoso data store
  – Ontotext’s BigOWLIM
  – Franz Inc’s AllegroGraph

Page 19: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Semantics & the Enterprise – not waiting for the network effect

Overview of Cambridge Semantics Middleware Platform

A W3C-based semantic middleware for real-time user driven operational intelligence

Allows business users & customers/partners to:

• Discover & connect to any data in databases & other systems on the fly
• Create dashboards & applications on demand

Allows IT to:

• Rapidly integrate data across silos and firewalls
• Expose business policies, rules & workflow to business users
• Implement manual intervention with automated response
• Enterprise-class security, governance, provenance, …

Page 20: Semantic Web Technologies on HPC for Life Sciences and Other Domains

Thanks for listening

• Further interest and a completely different view
  – Sir Tim Berners-Lee’s TED Talk on the next web
• Questions/Objections?
  – Stop me & ask/state
• Contact details again

Sean Martin
Cambridge Semantics Inc.
[email protected]
+1 617 606 341