Berlin SPARQL Benchmark (BSBM) Presented by: Nikhil Rajguru Christian Bizer and Andreas Schultz


Page 1

Berlin SPARQL Benchmark (BSBM)

Presented by: Nikhil Rajguru

Christian Bizer and Andreas Schultz

Page 2

Agenda

• Need for a benchmark for RDF stores
• Existing benchmarks
• Design of BSBM: dataset generator and query mixes
• Evaluation results
• Contributions
• My work
• Q&A

Page 3

Motivation

• A large number of Semantic Web applications represent their data as RDF

• Many RDF stores support the SPARQL query language and the SPARQL protocol

• Need to compare the performance of various RDF stores, and also of traditional relational database solutions exposed through SPARQL wrappers

Page 4

Existing benchmarks

• SP2Bench
  – Uses a synthetic, scalable version of the DBLP bibliography dataset
  – Queries designed for comparing different RDF store layouts
  – Not designed around realistic workloads; no parameterized queries and no warm-up
• DBpedia Benchmark
  – Uses DBpedia as the benchmark dataset
  – Very specific queries; dataset not scalable
• Lehigh University Benchmark (LUBM)
  – Compares OWL reasoning engines
  – Does not cover SPARQL-specific features such as OPTIONAL, FILTER, UNION and DESCRIBE
  – Does not employ parameterized queries, concurrent clients or warm-up

Page 5

Main Goals of BSBM

• Compare different stores that expose SPARQL endpoints

• Use realistic, use-case-motivated datasets and query mixes

• Test query performance against large RDF datasets (as needed for integration and visualization), rather than complex reasoning

Page 6

BSBM Dataset

• Built around an e-commerce use case
• Dataset generator
  – Scales to arbitrary sizes (scale factor = number of products)
  – Data generation is deterministic
• Dataset objects: Product, ProductType, ProductFeature, Producer, Vendor, Offer, Review, Reviewer and ReviewingSite
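The deterministic, scale-factor-driven generation described above can be sketched in a few lines. This is a minimal illustration only; `gen_products` and its property names are hypothetical, not the actual BSBM generator API:

```python
import random

def gen_products(scale_factor: int, seed: int = 42):
    """Sketch of a deterministic dataset generator: the scale factor is the
    number of products, and a fixed seed makes every run produce identical
    data (the determinism property the slide describes)."""
    rng = random.Random(seed)  # fixed seed -> reproducible output
    products = []
    for i in range(scale_factor):
        products.append({
            "id": f"Product{i}",
            # hypothetical properties, just to show derived entity counts
            "producer": f"Producer{rng.randint(1, max(1, scale_factor // 20))}",
            "numeric1": rng.randint(1, 2000),
        })
    return products

# Two runs with the same seed and scale factor yield the same dataset.
assert gen_products(100) == gen_products(100)
```

Determinism matters for a benchmark because every store under test must load byte-identical data at each scale factor.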

Page 7

BSBM Data set sizes

Page 8

BSBM Query Mix

• Simulates how customers browse, review and select items online

• Operations include
  – Look for products with some generic features
  – Look for products without some specific features
  – Look for similar products
  – Look for reviews and offers
  – Pull up all information about a specific product
  – Find the best deal for a product
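BSBM's queries are parameterized: the test driver fills placeholders in query templates with randomly chosen values so that each query mix touches different data. A minimal sketch of that substitution step, assuming an illustrative `%Placeholder%` syntax and made-up parameter names:

```python
import random

# Illustrative template in the spirit of a BSBM product-search query;
# the prefixes and placeholder names are assumptions, not BSBM's exact text.
TEMPLATE = """
SELECT ?product ?label WHERE {
  ?product rdfs:label ?label .
  ?product rdf:type %ProductType% .
  ?product bsbm:productFeature %Feature% .
}
"""

def instantiate(template: str, rng: random.Random) -> str:
    """Replace every placeholder with a randomly drawn value."""
    bindings = {
        "%ProductType%": f"bsbm:ProductType{rng.randint(1, 50)}",
        "%Feature%": f"bsbm:ProductFeature{rng.randint(1, 500)}",
    }
    for placeholder, value in bindings.items():
        template = template.replace(placeholder, value)
    return template

query = instantiate(TEMPLATE, random.Random(7))
assert "%" not in query  # all placeholders were filled
```

Seeding the driver's random generator keeps the benchmark repeatable while still avoiding a single hard-coded query.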

Page 9

BSBM Query Mix

Page 10

BSBM Queries

Page 11

BSBM Queries

Page 12

BSBM Query Characteristics

Page 13

Experimental Setup

• RDF stores tested
  – Jena SDB
  – Virtuoso
  – Sesame
  – D2R Server (with MySQL as the underlying RDBMS)
• DELL workstation
  – Processor: Intel Core 2 Quad Q9450, 2.66 GHz
  – Memory: 8 GB DDR2 667
  – Hard disks: 160 GB (10,000 rpm) SATA2, 750 GB (7,200 rpm) SATA2
  – OS: Ubuntu 8.04 64-bit

Page 14

Load times (sec)

• Data loaded as
  – D2R Server: relational representation of the BSBM dataset (MySQL dumps)
  – Triple stores: N-Triples representation of the BSBM dataset

[Chart: reported load times of 3.3 min, 3.6 hr, 7.7 hr and 13.6 hr]
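The N-Triples format mentioned above is simply one subject-predicate-object statement per line, which is what makes it easy to bulk-load. A minimal serializer sketch (the example URI and helper name are made up):

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples to N-Triples:
    one '<s> <p> <o> .' statement per line. Objects that do not look
    like URIs are emitted as quoted literals (a simplification; real
    N-Triples also supports datatypes and language tags)."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

data = [
    ("http://example.org/Product1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Product 1"),
]
print(to_ntriples(data))
# <http://example.org/Product1> <http://www.w3.org/2000/01/rdf-schema#label> "Product 1" .
```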

Page 15

Overall Run Time

• 50 query mixes, 1,250 queries in all
• Test driver and store under test running on the same machine
• 10 query mixes executed for warm-up
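The measurement procedure above (unmeasured warm-up mixes followed by timed mixes) can be sketched as a small driver loop; `execute` stands in for sending one query to the SPARQL endpoint and is supplied by the caller:

```python
import time

def run_benchmark(execute, query_mix, warmup_mixes=10, timed_mixes=50):
    """Sketch of a BSBM-style test driver: run warm-up mixes that are
    not measured (to fill caches), then time the real mixes and report
    how many queries ran and the total elapsed seconds."""
    for _ in range(warmup_mixes):          # warm-up phase, not measured
        for q in query_mix:
            execute(q)
    start = time.perf_counter()
    queries_run = 0
    for _ in range(timed_mixes):           # measured phase
        for q in query_mix:
            execute(q)
            queries_run += 1
    elapsed = time.perf_counter() - start
    return queries_run, elapsed

# With a 25-query mix and 50 timed mixes this reports 1250 queries,
# matching the slide's "50 query mixes, 1250 queries in all".
n, t = run_benchmark(lambda q: None, ["q"] * 25)
assert n == 1250
```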

Page 16

Average Run Time Per Query

• Gives a different perspective on query performance for the stores

• No store performs best for all query types at all dataset sizes (50K – 25M triples)

• Sesame is fastest for queries 1 – 4 but performs poorly on queries 5 – 9

• D2R Server is fastest for queries 6 – 9 but slow on the lower-numbered queries

• Similarly mixed results for Jena SDB and Virtuoso

Page 17

Average Run Time Per Query

Page 18

Average Run Time Per Query

Page 19

Average Run Time Per Query

Page 20

Contributions

• First benchmark to compare stores that implement the SPARQL query language and protocol for data access

• Dataset generator (RDF, XML and Relational representation)

• First benchmark to test RDF stores with realistic workloads of use case motivated queries

Page 21

My Work

• Build a scalable RDF store for storing the Smart Grid data
  – Sensor readings, building information, weather data, time schedule for each customer
• Scale to 50,000 sensors (20M triples to be loaded every 15 minutes)
• Load fast- and slow-changing data
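As a quick sanity check on that load target, 20M triples per 15-minute window implies a sustained ingest rate of roughly 22,000 triples per second:

```python
# Sustained ingest rate implied by "20M triples every 15 minutes".
triples_per_batch = 20_000_000
batch_seconds = 15 * 60            # 900 seconds per batch window
rate = triples_per_batch / batch_seconds
print(f"{rate:,.0f} triples/sec")  # 22,222 triples/sec sustained
```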

Page 22

My work

• Support a range of SPARQL queries on the store
• Web portal (latency ~ seconds)
  – 100 customers x 100 columns = 10,000 triples
• Schedule trigger (latency ~ minutes)
  – ~50,000 customers x 5 schedule events per day x 4 triples = 1,000,000 triples
• Forecast training (latency ~ hours)
  – 3 years x 365 days x 100 readings x 200 buildings x 2 sensors x 25 columns = 1,095,000,000 triples
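The per-workload triple-count estimates above check out arithmetically:

```python
# Verifying the workload size estimates from the slide.
web_portal = 100 * 100                     # customers x columns
schedule   = 50_000 * 5 * 4                # customers x events/day x triples/event
forecast   = 3 * 365 * 100 * 200 * 2 * 25  # yrs x days x readings x bldgs x sensors x cols

assert web_portal == 10_000
assert schedule == 1_000_000
assert forecast == 1_095_000_000
```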

Page 23

Thank you

Questions ?