Stardog is a fast, scalable, lightweight RDF database for complex SPARQL queries. It features OWL 2 reasoning, transactions, a robust security layer, integrity constraint validation via Pellet 3, and world-class support.
Kendall Clark, CEO
Clark & Parsia, LLC
Thursday, March 17, 2011
About C&P
• We build semantic technology infrastructure and enterprise solutions
• Pellet, the leading OWL reasoner
• POPS Expertise Location system
• Bootstrapped since 2005
• Offices in DC and Cambridge, MA
• Government & enterprise customers
• First talk ever was at LOC in 2005 :)
TLDR?
• Java RDF database (“quad store”) (no native code)
• Freemium model:
• enterprise & community editions
• OEM
• Performance for complex SPARQL queries
• Best available reasoning support
NoSQL and SemWeb
• SemWeb is schemaless and schema-rich
• As agile as NoSQL stores
• More expressive than SQL
• Standards based
• Graph DBs are all ad hoc
• Query Language and, you know, joins
• Do you really want to write map-reduce programs...only?! We sure don’t...!
Why another RDF DB?
• We’re scratching our own itch: fast queries for integration & decision-support apps
• aimed at db-reasoner “tweener” space
• operationally agile
• There’s a hole in the market; or: markets are normal distributions (probably)
• Gives us a complete semantic application platform
Commercial Market
• 6 products
• Technically homogeneous:
• Sagan-like scale obsession
• Mostly ad hoc reasoning
• Weak perf on complex queries
• Ho-hum feature sets & integrations
• See http://bit.ly/92P8eN for more
Stardog 1.0: Overview
• Fast
• Lightweight
• Rich API support
• Logical & statistical inference
• Transactions
• Full-text search
• Graph algorithms and path language
• awesome mascot!
Fast? No, Really Fast!
• The first design goal for Stardog is fast evaluation of complex SPARQL queries on a single machine in the default configuration
• Next, total queries per second
• In-memory mode available, when needed
• Early testing is promising: fastest RDF DB on the SP2B benchmark, often several times faster.
Performance
• Do yr own testing; the only queries that matter are yours; don’t trust, test.
• It’s not ready till it’s very, very fast.
• Flatten the RDF performance tax
• About 256 GB of RAM for ~2B triples in main-memory mode, i.e., a $20k Dell box.
• When in doubt: Add. More. RAM.
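The slide's sizing figure works out to roughly 128 bytes per triple (256 GB / 2B triples). A back-of-envelope sketch for other dataset sizes, assuming memory grows linearly with triple count and that "GB" means 10^9 bytes; real footprints will vary with data shape and indexes:

```python
# Back-of-envelope RAM sizing derived from the slide's figure:
# ~256 GB for ~2 billion triples in main-memory mode.
# Assumptions: memory scales linearly with triple count, and GB means
# 10**9 bytes (decimal), which gives ~128 bytes per triple.

SLIDE_BYTES = 256 * 10**9
SLIDE_TRIPLES = 2 * 10**9
BYTES_PER_TRIPLE = SLIDE_BYTES / SLIDE_TRIPLES  # 128.0


def estimate_ram_gb(triples: int) -> float:
    """Estimate the main-memory footprint in decimal GB for `triples` triples."""
    return triples * BYTES_PER_TRIPLE / 10**9


if __name__ == "__main__":
    for n in (100_000_000, 500_000_000, 2_000_000_000):
        print(f"{n:>13,} triples -> ~{estimate_ram_gb(n):.0f} GB")
```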
Scalability
• Stardog 1.0: scale up
• Disk-based joins for very large intermediate structures
• Triples compression
• Ideally efficient on-disk indices
• Stardog 2.0: scale out (shared-disk cluster)
• We think it’s easier to scale a fast DB than to speed up a scalable one...
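The slide doesn't say how Stardog compresses triples. One common technique in RDF stores is dictionary encoding: each distinct term is stored once and triples become tuples of small integer IDs, which keeps indexes and join structures compact. A minimal sketch of that general technique, not Stardog's actual on-disk format; the `ex:`/`foaf:` data is hypothetical:

```python
# A minimal sketch of dictionary encoding, one common way RDF stores
# compress triples: each distinct term (IRI or literal) is stored once and
# triples become tuples of integer IDs.
# Assumption: this illustrates the general technique only; it is not
# Stardog's actual storage format.

from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class TermDictionary:
    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def encode(self, term: str) -> int:
        """Return the ID for `term`, assigning the next free ID if it is new."""
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id: int) -> str:
        """Map an ID back to the original term."""
        return self._terms[term_id]


def encode_triples(triples: List[Triple], d: TermDictionary) -> List[Tuple[int, int, int]]:
    """Replace every term in every triple with its dictionary ID."""
    return [(d.encode(s), d.encode(p), d.encode(o)) for s, p, o in triples]


if __name__ == "__main__":
    d = TermDictionary()
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:alice"),
    ]
    print(encode_triples(data, d))
```

Joins can then compare integers instead of strings, and only final results need decoding.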
Lightweight
• ~34 KLOC for core system, ~10 KLOC of tests (1034 unit tests)
• Trivially simple installation:
• copy JAR & restart servlet container
• If you’ve ever used Sesame...
• Can run embedded or client-server, in main-memory or disk-backed mode, in any combination
Interfaces
• SNARL (Stardog Native API for RDF Language)
• Avro RPC (esp. the low-level TCP transport, coming soon...) for Java & non-Java clients
• Sesame & Jena
• SPARQL Protocol (HTTP)
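Because the SPARQL Protocol is plain HTTP, any language can query the database without a vendor driver: the query text goes in the `query` URL parameter of a GET request. A minimal standard-library sketch; the endpoint URL is a hypothetical placeholder, not Stardog's actual address or path layout:

```python
# A minimal sketch of a SPARQL Protocol query over HTTP GET using only the
# standard library. ENDPOINT is a hypothetical placeholder, not Stardog's
# actual address or path layout.

import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8080/sparql"  # hypothetical endpoint


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    The query text is passed in the `query` URL parameter; the Accept
    header asks for SPARQL JSON results.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_query_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
    print(req.full_url)
    # To send it to a live endpoint:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```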
Logical Inference
1. OWL 2 QL, EL, and RL “query-time” reasoning
• No materialization (so: fast bulk loading)
• reasoning enabled per query
2. OWL 2 DL reasoning via Pellet 3.0
• in-memory, schema reasoning
3. Integrity Constraint Validation via OWL 2
4. user-defined & SWRL rules
OWL validation of RDF
• Use OWL ontologies to validate RDF instance data in Stardog.
• May be used as a guard to database modifications (so, if resulting data is invalid, transaction fails).
• W3C Member Submission to formalize this approach; stay tuned for details.
• See http://clarkparsia.com/pellet/icv/ for details
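The guard idea can be sketched as: apply the proposed change to a candidate copy of the data, check constraints under a closed-world reading (a missing value counts as a violation rather than as unknown), and reject the write if anything fails. The constraint shape and `ex:` names below are hypothetical, and Stardog's ICV expresses constraints as OWL axioms, which this toy checker does not parse:

```python
# A minimal sketch of integrity-constraint validation used as a guard on a
# write, under closed-world semantics. The hard-coded constraint
# ("every ex:Employee has some ex:worksFor value") and the ex: names are
# hypothetical; real ICV reads constraints from OWL axioms.

from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def violations(data: Set[Triple], cls: str, prop: str) -> Set[str]:
    """Return instances of `cls` that have no value for `prop`.

    Under an open-world reading a missing value is merely unknown;
    treating it as a violation is what turns the axiom into a constraint.
    """
    instances = {s for s, p, o in data if p == RDF_TYPE and o == cls}
    has_value = {s for s, p, _ in data if p == prop}
    return instances - has_value


def guarded_add(data: Set[Triple], new: Set[Triple]) -> Set[Triple]:
    """Return the updated data if it is valid; otherwise raise and change nothing."""
    candidate = data | new
    bad = violations(candidate, "ex:Employee", "ex:worksFor")
    if bad:
        raise ValueError(f"transaction rejected; invalid: {sorted(bad)}")
    return candidate
```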
OWL 2 Support
• Stardog 1.0: query-time, query rewriting reasoner for SPARQL entailment regimes
• It will support all of OWL 2 QL, EL, and RL, with exceptions:
• limited support for datatype reasoning
• i.e., won’t support user-defined datatypes
• will depend on customer demand
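Query rewriting, as opposed to materialization, can be illustrated with the simplest case, `rdfs:subClassOf`. Instead of storing inferred `rdf:type` triples at load time, the reasoner expands a type pattern at query time into a UNION over every subclass. A minimal sketch; the class hierarchy and `ex:` names are hypothetical, and real OWL 2 QL/EL/RL rewriting covers many more axiom types:

```python
# A minimal sketch of query-time reasoning by query rewriting, limited to
# rdfs:subClassOf: a type pattern is expanded into a UNION over every
# subclass instead of materializing inferred rdf:type triples.
# The hierarchy and ex: names are hypothetical.

from typing import Dict, List, Set


def subclasses(cls: str, sub_of: Dict[str, List[str]]) -> Set[str]:
    """All classes C' with C' subClassOf* cls (reflexive and transitive)."""
    # Invert the child -> parents map into parent -> children.
    children: Dict[str, List[str]] = {}
    for child, parents in sub_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen, stack = {cls}, [cls]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def rewrite_type_pattern(var: str, cls: str, sub_of: Dict[str, List[str]]) -> str:
    """Rewrite `var rdf:type cls` into a SPARQL UNION of type patterns."""
    branches = [f"{{ {var} rdf:type {c} }}" for c in sorted(subclasses(cls, sub_of))]
    return " UNION ".join(branches)


if __name__ == "__main__":
    hierarchy = {"ex:Manager": ["ex:Employee"], "ex:Employee": ["ex:Person"]}
    print(rewrite_type_pattern("?x", "ex:Person", hierarchy))
```

The trade-off matches the slides: loading stays fast because nothing is materialized, and the cost moves to query time.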
Statistical Inference
• Corleone is a machine learning system for RDF and OWL
• Optimized for Stardog
• Multiple classifier & cluster algorithms
• Clusters (similarity) and classifies (predicts) by RDF class & individual
• Machine learning must still be tuned; no magic bullets
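The slide doesn't describe Corleone's algorithms. As a hypothetical stand-in, similarity-based classification of RDF individuals can be sketched by treating each individual as its set of (property, value) pairs, comparing sets with Jaccard similarity, and letting an unlabeled individual take the class of its most similar labeled neighbor (1-nearest-neighbor). All `ex:` data is made up:

```python
# A toy illustration of similarity-based classification over RDF
# individuals: each individual is a set of (property, value) pairs, pairs
# are compared with Jaccard similarity, and an unlabeled individual takes
# the class of its most similar labeled neighbor (1-nearest-neighbor).
# Hypothetical stand-in only; not Corleone's actual algorithms.

from typing import Dict, FrozenSet, Tuple

Features = FrozenSet[Tuple[str, str]]


def jaccard(a: Features, b: Features) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_class(target: Features, labeled: Dict[str, Tuple[Features, str]]) -> str:
    """Return the class label of the labeled individual most similar to `target`."""
    best = max(labeled.values(), key=lambda fc: jaccard(target, fc[0]))
    return best[1]
```

As the slide warns, choices like the feature set and similarity measure still need tuning per dataset.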
Transactions
• Supports optional ACID transactions on database mutations
• Two-phase commit based on the Java Transaction API (JTA)
• Transactional writes are 2x to 8x slower, depending on many variables
• Writes may be asynchronous & queued
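The two-phase commit protocol can be sketched in a few lines. In phase 1 the coordinator asks every participant to prepare and vote. In phase 2 it commits everywhere only if all voted yes, and otherwise rolls everyone back. The `Participant` class is hypothetical, and this Python toy does not model the Java Transaction API that Stardog builds on:

```python
# A minimal sketch of the two-phase commit protocol.
# Phase 1: the coordinator asks every participant to prepare (vote).
# Phase 2: commit everywhere only if all voted yes; otherwise roll back.
# The Participant class is hypothetical and does not model JTA.

from typing import List


class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # A real participant would durably log its pending writes here
        # before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1: collect votes; all() stops at the first "no" vote.
    if all(p.prepare() for p in participants):
        # Phase 2a: unanimous "yes", so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one "no", so roll every participant back.
    for p in participants:
        p.rollback()
    return False
```

The extra prepare round trip (plus the durable logging it implies in real systems) is one reason transactional writes cost more than untransacted ones.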
Search
• Indexes RDF individuals and literals
• Results are 2-tuples (url|value, score)
• Based on Lucene: very fast, very scalable
• Can use 1 of 6 algorithms to partition RDF individuals from a graph
• via SPARQL DESCRIBE hook
• Will be integrated with SPARQL syntax...
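To make the 2-tuple result shape concrete, here is a toy inverted index over literals that returns (individual, score) hits. Scoring is plain term frequency, a deliberate simplification since Lucene's relevance scoring is considerably more sophisticated. The class name and `ex:` data are hypothetical:

```python
# A toy full-text index illustrating the (individual, score) result shape.
# Scoring is plain term frequency, a deliberate simplification of Lucene's
# relevance scoring. LiteralIndex and all ex: data are hypothetical.

import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class LiteralIndex:
    def __init__(self) -> None:
        # token -> {subject: occurrences of token in that subject's literals}
        self._postings: Dict[str, Counter] = defaultdict(Counter)

    def add(self, subject: str, literal: str) -> None:
        for tok in tokenize(literal):
            self._postings[tok][subject] += 1

    def search(self, query: str) -> List[Tuple[str, float]]:
        scores: Counter = Counter()
        for tok in tokenize(query):
            scores.update(self._postings.get(tok, Counter()))
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(((s, float(n)) for s, n in scores.items()),
                      key=lambda hit: (-hit[1], hit[0]))
```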
RDF as Graph
• SPARQL isn’t ideal for every use case
• Graph algorithm processing on RDF purely as a graph
• Stardog supports Gremlin, the ad hoc standard for graph database query languages
• Gremlin makes graph algorithms easy to write
• More optimized Gremlin support for 1.0
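"RDF purely as a graph" means ignoring query semantics and walking edges. Subjects and objects become nodes, each triple is a directed edge, and a breadth-first search finds the shortest path between two resources, something awkward to express in SPARQL 1.0. A minimal sketch in plain Python (not Gremlin syntax) with hypothetical `ex:` data:

```python
# A minimal sketch of treating RDF purely as a graph: subjects and objects
# become nodes, each triple a directed edge, and breadth-first search finds
# a shortest path between two resources. Plain Python, not Gremlin; the
# ex: data in any usage is hypothetical.

from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def shortest_path(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Return the node sequence of a shortest directed path, or None."""
    adjacency: Dict[str, List[str]] = {}
    for s, _, o in triples:
        adjacency.setdefault(s, []).append(o)
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to `start`, then reverse.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = previous[cur]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```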
Implementations
[Architecture diagram; component labels only: Sesame, Jena, and Empire bindings; HTTP API, Native API, and Avro API; Stardog API; SPI; Runtime; Transactions; Stardog RDF; Stardog Core; Query Exec; Optimizer; Plan Filter API; Plan API; Query Rewriting/Reasoning; Index API; utility libraries CP Util, IO Util, Stardog Util, and Sesame Ext]
Status
• Stardog 0.4.6 alpha released to alpha testers on 15 March 2011
• It feels damn good to ship code, even if it’s just an alpha! :)
• Weekly updates till beta period starts, then bimonthly updates till 1.0 release
The Private Beta
• Doin’ it old school: private beta, invitation only
• Helps us keep commercial focus
• ~1 April to 30 May
• Email [email protected] if you’re interested: give name, org, area of interest, etc.
• Rolling releases, new features, bug fixes, etc.
• ~90 organizations signed up for beta so far
Roadmap
• 1.0 in mid-summer
• SPARQL 1.1, MRMW
• stored procedures in any JVM lang
• Shiro-based security layer
• native OWL 2 RL reasoner
• provenance API
• graph algorithms & an RDF path language
• performance improvements continuously
Thanks! Questions?
• http://stardog.com/
• http://clarkparsia.com/
• http://twitter.com/candp
• http://twitter.com/stardog_db