Stardog talk-dc-march-17

Kendall Clark, CEO, Clark & Parsia, LLC

Thursday, March 17, 2011

About C&P

• We build semantic technology infrastructure and enterprise solutions

• Pellet, the leading OWL reasoner

• POPS Expertise Location system

• Bootstrapped since 2005

• Offices in DC and Cambridge, MA

• Government & enterprise customers

• First talk ever was at LOC in 2005 :)



TLDR?

• Java RDF database (“quad store”); no native code

• Freemium model:

• enterprise & community editions

• OEM

• Performance for complex SPARQL queries

• Best available reasoning support


NoSQL and SemWeb

• SemWeb is schemaless and schema-rich

• As agile as NoSQL stores

• More expressive than SQL

• Standards based

• Graph DBs are all ad hoc

• A query language and, you know, joins

• Do you really want to write map-reduce programs...only?! We sure don’t...!


Why another RDF DB?

• We’re scratching our own itch: fast queries for integration & decision support apps

• aimed at db-reasoner “tweener” space

• operationally agile

• There’s a hole in the market; or: markets are normal distributions (probably)

• Gives us a complete semantic application platform


Commercial Market

• 6 products

• Technically homogeneous:

• Sagan-like scale obsession

• Mostly ad hoc reasoning

• Weak perf on complex queries

• Ho-hum feature sets & integrations

• See http://bit.ly/92P8eN for more



Stardog 1.0: Overview

• Fast

• Lightweight

• Rich API support

• Logical & statistical inference

• Transactions

• Full-text search

• Graph algorithms and path language

• awesome mascot!


Fast? No, Really Fast!

• The first design goal for Stardog is fast evaluation of complex SPARQL queries on a single machine in the default configuration

• Next, total queries per second

• In-memory mode available, when needed

• Early testing is promising: the fastest RDF DB on the SP2B benchmark, often several times faster.


Performance

• Do yr own testing; the only queries that matter are yours; don’t trust, test.

• It’s not ready till it’s very, very fast.

• Flatten the RDF performance tax

• About 256 GB of RAM for ~2B triples in main-memory mode, i.e., a $20k Dell box.

• When in doubt: Add. More. RAM.
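The 256 GB / ~2B triples figure works out to roughly 128 bytes per triple. A back-of-envelope sizing helper, assuming memory grows roughly linearly with triple count (an assumption; real usage depends on data shape, indexes, and configuration). The `headroom` factor is a hypothetical safety margin, not a Stardog setting:

```python
# Derived from the slide: ~256 GB of RAM for ~2 billion triples in main-memory mode.
# Assumption: memory scales roughly linearly with the number of triples.
SLIDE_RAM_BYTES = 256 * 10**9   # 256 GB (decimal gigabytes)
SLIDE_TRIPLES = 2 * 10**9       # ~2B triples

BYTES_PER_TRIPLE = SLIDE_RAM_BYTES / SLIDE_TRIPLES  # 128.0


def estimate_ram_gb(num_triples: int, headroom: float = 1.25) -> float:
    """Estimate RAM (decimal GB) for an in-memory store holding num_triples.

    headroom is a hypothetical safety factor for JVM heap overhead, query
    intermediates, and the OS; 1.25 means 25% extra.
    """
    return num_triples * BYTES_PER_TRIPLE * headroom / 10**9


print(BYTES_PER_TRIPLE)              # 128.0
print(estimate_ram_gb(500_000_000))  # 80.0
```

In the spirit of the slide: when in doubt, round up.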


Scalability

• Stardog 1.0: scale up

• Disk-based joins for very large intermediate structures

• Triples compression

• Ideally efficient on-disk indices

• Stardog 2.0: scale out (shared-disk cluster)

• We think it’s easier to scale a fast DB than to speed up a scalable one...
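The slide does not say how Stardog compresses triples. Dictionary encoding is one common technique in RDF stores: each distinct IRI or literal is stored once and triples become tuples of small integers. A minimal sketch of that technique, assuming string terms and an in-memory dictionary (not Stardog's actual format):

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class Dictionary:
    """Bidirectional map between RDF terms (strings) and integer IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def encode(self, term: str) -> int:
        # Assign the next free ID the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def decode(self, term_id: int) -> str:
        return self._terms[term_id]


def compress(triples: List[Triple], d: Dictionary) -> List[Tuple[int, int, int]]:
    # Each (s, p, o) of strings becomes an (int, int, int) tuple; repeated
    # IRIs and literals are stored only once, in the dictionary.
    return [(d.encode(s), d.encode(p), d.encode(o)) for s, p, o in triples]


d = Dictionary()
data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
encoded = compress(data, d)
print(encoded)  # [(0, 1, 2), (3, 1, 2), (0, 4, 3)]
```

Integer tuples are also cheaper to sort and compare, which helps both on-disk indices and joins.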


Lightweight

• ~34 KLOC for core system, ~10 KLOC of tests (1034 unit tests)

• Trivially simple installation:

• copy JAR & restart servlet container

• If you’ve ever used Sesame...

• Runs embedded or client-server, in main-memory or disk-backed mode, in any combination


Interfaces

• SNARL (Stardog Native API for RDF Language)

• Avro RPC (esp. the low-level TCP transport, coming soon...) for Java & non-Java clients

• Sesame & Jena

• SPARQL Protocol (HTTP)
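Because Stardog speaks the standard SPARQL Protocol over HTTP, any HTTP client can query it. A minimal sketch using only the Python standard library, building (not sending) a query-via-GET request; the endpoint URL is a hypothetical placeholder, not Stardog's actual path:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute your server's actual SPARQL endpoint.
ENDPOINT = "http://localhost:8080/sparql"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query-via-GET request.

    The protocol carries the query text in a URL-encoded `query` parameter;
    the Accept header asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```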


Logical Inference

1. OWL 2 QL, EL, and RL “query-time” reasoning

• No materialization (so: fast bulk loading)

• reasoning enabled per-query

2. OWL 2 DL reasoning via Pellet 3.0

• in-memory, schema reasoning

3. Integrity Constraint Validation via OWL 2

4. user-defined & SWRL rules
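Query-time reasoning means inferences are computed by rewriting the query rather than by materializing inferred triples at load time, which is why bulk loading stays fast. A toy sketch of the idea for subclass hierarchies only, assuming a hypothetical ontology; real OWL 2 QL/EL/RL rewriting covers far more axiom types than this:

```python
from typing import Dict, List, Set

# Hypothetical toy schema: child class -> direct superclasses.
SUBCLASS_OF: Dict[str, List[str]] = {
    "ex:Manager": ["ex:Employee"],
    "ex:Intern": ["ex:Employee"],
    "ex:Employee": ["ex:Person"],
}


def subclasses(cls: str) -> Set[str]:
    """All classes entailed to be subclasses of cls (including cls itself)."""
    result = {cls}
    changed = True
    while changed:  # fixpoint over the transitive subclass relation
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result


def rewrite_type_query(var: str, cls: str, reasoning: bool) -> str:
    """Rewrite the pattern `?var a cls` into a UNION over its subclasses.

    With reasoning off (it is enabled per query) the pattern is unchanged;
    with it on, the query is expanded, so nothing is materialized.
    """
    if not reasoning:
        return f"{{ ?{var} a {cls} }}"
    branches = [f"{{ ?{var} a {c} }}" for c in sorted(subclasses(cls))]
    return " UNION ".join(branches)


print(rewrite_type_query("x", "ex:Employee", reasoning=True))
# { ?x a ex:Employee } UNION { ?x a ex:Intern } UNION { ?x a ex:Manager }
```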


OWL validation of RDF

• Use OWL ontologies to validate RDF instance data in Stardog.

• May be used as a guard on database modifications (so, if the resulting data is invalid, the transaction fails).

• W3C Member Submission to formalize this approach; stay tuned for details.

• See http://clarkparsia.com/pellet/icv/ for details
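Read as an integrity constraint, an OWL axiom like “every Employee works for something” is checked under closed-world semantics: a missing value is an error rather than merely unknown. A minimal sketch of using such a check as a transaction guard, assuming one hypothetical hand-coded constraint and an in-memory triple set (not Stardog's or Pellet ICV's API):

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class ConstraintViolation(Exception):
    pass


def violations(triples: Set[Triple]) -> List[str]:
    """Hypothetical constraint with closed-world (ICV-style) reading:
    every ex:Employee must have at least one ex:worksFor value."""
    employees = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Employee"}
    employed = {s for s, p, o in triples if p == "ex:worksFor"}
    return sorted(employees - employed)


class GuardedStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def commit(self, additions: Set[Triple]) -> None:
        # Validate the would-be state first and only swap it in if valid,
        # so a failed transaction leaves the store unchanged.
        candidate = self.triples | additions
        bad = violations(candidate)
        if bad:
            raise ConstraintViolation(f"missing ex:worksFor for: {bad}")
        self.triples = candidate


store = GuardedStore()
store.commit({("ex:alice", "rdf:type", "ex:Employee"),
              ("ex:alice", "ex:worksFor", "ex:acme")})
try:
    store.commit({("ex:bob", "rdf:type", "ex:Employee")})
except ConstraintViolation as e:
    print("rolled back:", e)
print(len(store.triples))  # 2
```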



OWL 2 Support

• Stardog 1.0: query-time, query rewriting reasoner for SPARQL entailment regimes

• It will support all of OWL 2 QL, EL, and RL, with exceptions:

• limited support for datatype reasoning

• i.e., won’t support user-defined datatypes

• will depend on customer demand


Statistical Inference

• Corleone is a machine learning system for RDF and OWL

• Optimized for Stardog

• Multiple classifier & cluster algorithms

• Clusters (similarity) and classifies (predicts) by RDF class & individual

• Machine learning must still be tuned; no magic bullets
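Corleone's actual algorithms and features are not described here. To illustrate the general idea of classifying RDF individuals by similarity, a minimal k-nearest-neighbour sketch, assuming each individual is described by the hypothetical feature set of predicates it uses, compared with Jaccard similarity:

```python
from collections import Counter
from typing import Dict, FrozenSet, Tuple

# Assumption: an individual's features are the predicates it uses.
Features = FrozenSet[str]


def jaccard(a: Features, b: Features) -> float:
    """Similarity = |shared features| / |all features|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_class(x: Features, labeled: Dict[str, Tuple[Features, str]], k: int = 1) -> str:
    """k-nearest-neighbour vote over labeled individuals."""
    ranked = sorted(labeled.values(), key=lambda fl: jaccard(x, fl[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


labeled = {
    "ex:alice": (frozenset({"ex:worksFor", "ex:salary", "ex:manager"}), "ex:Employee"),
    "ex:bob": (frozenset({"ex:worksFor", "ex:salary"}), "ex:Employee"),
    "ex:carol": (frozenset({"ex:enrolledIn", "ex:advisor"}), "ex:Student"),
}
print(predict_class(frozenset({"ex:enrolledIn", "ex:advisor", "ex:salary"}), labeled))
# ex:Student
```

Note how the answer depends on `k` and on the feature choice: with `k=3` the same query is outvoted as `ex:Employee`, which is the kind of tuning the slide warns about.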


Transactions

• Supports optional ACID transactions on database mutations

• 2-phase commit based on Java Transaction API

• Transactional writes are 2x to 8x slower, depending on many variables

• Writes may be asynchronous & queued
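Two-phase commit first asks every participating resource to prepare and vote, then commits only if all voted yes; otherwise every participant rolls back. A minimal in-process sketch of that protocol, assuming hypothetical `Participant` objects (JTA's `XAResource` exposes analogous `prepare`, `commit`, and `rollback` operations, but this is not that API). The extra prepare round is one plausible contributor to transactional writes being slower:

```python
from typing import List


class Participant:
    """Hypothetical resource (e.g., an index or a log) in a transaction."""

    def __init__(self, name: str, will_succeed: bool = True) -> None:
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: get ready to commit and vote yes/no; nothing is visible yet.
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (prepare/vote): stop at the first "no" vote.
    for p in participants:
        if not p.prepare():
            # Any "no" aborts the transaction for every participant.
            for q in participants:
                q.rollback()
            return False
    # Phase 2 (commit): everyone voted yes, so everyone commits.
    for p in participants:
        p.commit()
    return True


good = [Participant("index"), Participant("log")]
bad = [Participant("index"), Participant("log", will_succeed=False)]
print(two_phase_commit(good), [p.state for p in good])  # True ['committed', 'committed']
print(two_phase_commit(bad), [p.state for p in bad])    # False ['rolled_back', 'rolled_back']
```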


Search

• Indexes RDF individuals and literals

• Results are 2-tuples (URI|value, score)

• Based on Lucene: very fast, very scalable

• Can use 1 of 6 algorithms to partition RDF individuals from a graph

• via SPARQL DESCRIBE hook

• Will be integrated with SPARQL syntax...
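The search results are (resource, score) pairs produced from an inverted index. A toy sketch of that shape, assuming a naive term-count score; Lucene's real tokenization and relevance ranking are far more sophisticated, and nothing here is Stardog's search API:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


class LiteralIndex:
    """Toy inverted index: word -> {resource URI: occurrence count}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def tokens(text: str) -> List[str]:
        return re.findall(r"\w+", text.lower())

    def add(self, uri: str, literal: str) -> None:
        for tok in self.tokens(literal):
            self.postings[tok][uri] += 1

    def search(self, query: str) -> List[Tuple[str, float]]:
        """Return (uri, score) 2-tuples, best first; score = matched term count."""
        scores: Dict[str, float] = defaultdict(float)
        for tok in self.tokens(query):
            for uri, tf in self.postings.get(tok, {}).items():
                scores[uri] += tf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


idx = LiteralIndex()
idx.add("ex:stardog", "Stardog is a fast RDF database")
idx.add("ex:pellet", "Pellet is an OWL reasoner")
idx.add("ex:sesame", "Sesame is an RDF framework")
print(idx.search("fast RDF"))  # [('ex:stardog', 2.0), ('ex:sesame', 1.0)]
```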


RDF as Graph

• SPARQL isn’t ideal for every use case

• Graph algorithm processing on RDF purely as a graph

• Stardog supports Gremlin, the de facto standard among graph database query languages

• Gremlin makes graph algorithms easy to write

• More optimized Gremlin support for 1.0
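Treating RDF “purely as a graph” means reading each triple as a labelled edge from subject to object, so ordinary graph algorithms apply. For example, SPARQL 1.0 has no arbitrary-length path queries (property paths only arrive with SPARQL 1.1), but a breadth-first search finds a shortest connection easily. A minimal sketch in plain Python, not Gremlin and not Stardog's API, with hypothetical data:

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Treat each triple as a directed, predicate-labelled edge s -> o."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
    return adj


def shortest_path(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node list start..goal, or None."""
    adj = adjacency(triples)
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for _, nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


kb = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:worksWith", "ex:dave"),
    ("ex:dave", "foaf:knows", "ex:carol"),
    ("ex:carol", "foaf:knows", "ex:erin"),
]
print(shortest_path(kb, "ex:alice", "ex:erin"))
# ['ex:alice', 'ex:bob', 'ex:carol', 'ex:erin']
```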


Implementations

[Architecture diagram; layer labels as extracted: Sesame, Jena, Empire; HTTP API, Native API, Avro API; Stardog API; SPI Runtime; Transactions; Stardog RDF; Stardog Core; Query; Exec; Optimizer; Plan Filter API; Query Rewriting/Reasoning; Index API SPI; CP Util, IO Util, Stardog Util, Sesame Ext; Plan API]


Status

• Stardog 0.4.6 alpha released to alpha testers on 15 March 2011

• It feels damn good to ship code, even if it’s just an alpha! :)

• Weekly updates till beta period starts, then bimonthly updates till 1.0 release


The Private Beta

• Doin’ it old school: private beta, invitation only

• Helps us keep commercial focus

• ~1 April to 30 May

• Email [email protected] if yr interested: give name, org, area of interest, etc.

• Rolling releases, new features, bug fixes, etc

• ~90 organizations signed up for beta so far



Roadmap

• 1.0 in mid-summer

• SPARQL 1.1, MRMW

• stored procedures in any JVM lang

• Shiro-based security layer

• native OWL 2 RL reasoner

• provenance API

• graph algorithms & an RDF path language

• performance improvements continuously


Thanks! Questions?

• http://stardog.com/

• http://clarkparsia.com/

• http://twitter.com/candp

• http://twitter.com/stardog_db





