Database Systems Research in Dan Olteanu’s Group @Oxford, DBOnto Kick-Off, Sept 25, 2014

Overview of Dan Olteanu's Research presentation


DESCRIPTION

Overview of Dan Olteanu's Research - Factorized databases - Probabilistic databases - Datalog engines


Database Systems Research

in Dan Olteanu’s Group @Oxford

DBOnto Kick-Off Sept 25, 2014


Outline

Factorized Databases

Probabilistic Databases

Datalog Engines


Factorized Databases by Example

Orders

customer  day     pizza
Mario     Monday  Capricciosa
Mario     Friday  Capricciosa
Pietro    Friday  Hawaii
Lucia     Friday  Hawaii

Pizzas

pizza        item
Capricciosa  base
Capricciosa  ham
Capricciosa  mushrooms
Hawaii       base
Hawaii       ham
Hawaii       pineapple

Items

item       price
base       6
ham        1
mushrooms  1
pineapple  2

Consider the natural join of the three relations above:

Orders ⋈ Pizzas ⋈ Items

customer  day     pizza        item       price
Mario     Monday  Capricciosa  base       6
Mario     Monday  Capricciosa  ham        1
Mario     Monday  Capricciosa  mushrooms  1
Mario     Friday  Capricciosa  base       6
Mario     Friday  Capricciosa  ham        1
Mario     Friday  Capricciosa  mushrooms  1
...       ...     ...          ...        ...
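The join above can be reproduced with a few lines of Python. This is only a minimal sketch: the list-of-dicts encoding and the helper name `natural_join` are assumptions made for illustration, not part of the original slides, and a nested-loop join is used purely for brevity.

```python
# The three example relations, stored as lists of dicts keyed by attribute name.
orders = [
    {"customer": "Mario", "day": "Monday", "pizza": "Capricciosa"},
    {"customer": "Mario", "day": "Friday", "pizza": "Capricciosa"},
    {"customer": "Pietro", "day": "Friday", "pizza": "Hawaii"},
    {"customer": "Lucia", "day": "Friday", "pizza": "Hawaii"},
]
pizzas = [
    {"pizza": p, "item": i}
    for p, i in [
        ("Capricciosa", "base"), ("Capricciosa", "ham"), ("Capricciosa", "mushrooms"),
        ("Hawaii", "base"), ("Hawaii", "ham"), ("Hawaii", "pineapple"),
    ]
]
items = [
    {"item": "base", "price": 6},
    {"item": "ham", "price": 1},
    {"item": "mushrooms", "price": 1},
    {"item": "pineapple", "price": 2},
]


def natural_join(left, right):
    """Nested-loop natural join: combine rows that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


joined = natural_join(natural_join(orders, pizzas), items)
print(len(joined))  # 12 tuples: 4 orders, each pizza has 3 items
```

Each of the 12 result tuples repeats the customer, day, and pizza values once per item, which is the redundancy the following slides address.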


A flat relational algebra expression encoding the above query result is:

〈Mario〉 × 〈Monday〉 × 〈Capricciosa〉 × 〈base〉 × 〈6〉 ∪

〈Mario〉 × 〈Monday〉 × 〈Capricciosa〉 × 〈ham〉 × 〈1〉 ∪

〈Mario〉 × 〈Monday〉 × 〈Capricciosa〉 × 〈mushrooms〉 × 〈1〉 ∪

〈Mario〉 × 〈Friday〉 × 〈Capricciosa〉 × 〈base〉 × 〈6〉 ∪

〈Mario〉 × 〈Friday〉 × 〈Capricciosa〉 × 〈ham〉 × 〈1〉 ∪

〈Mario〉 × 〈Friday〉 × 〈Capricciosa〉 × 〈mushrooms〉 × 〈1〉 ∪ . . .

It uses relational product (×), union (∪), and singleton relations (e.g., 〈1〉).

The attribute names are not shown to avoid clutter.

The previous relational expression entails a lot of redundancy due to the joins.

We can factorize the expression following the join structure, e.g.:

〈Capricciosa〉 × (〈Monday〉 × 〈Mario〉 ∪ 〈Friday〉 × 〈Mario〉)
              × (〈base〉 × 〈6〉 ∪ 〈ham〉 × 〈1〉 ∪ 〈mushrooms〉 × 〈1〉)
∪ 〈Hawaii〉 × 〈Friday〉 × (〈Lucia〉 ∪ 〈Pietro〉)
           × (〈base〉 × 〈6〉 ∪ 〈ham〉 × 〈1〉 ∪ 〈pineapple〉 × 〈2〉)

[Figure: nesting structure of the factorization over the attributes pizza, day, customer, item, price.]

There are several algebraically equivalent factorized representations defined by distributivity of product over union and commutativity of product and union.
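To make the size difference concrete, the factorized expression above can be written down as a small expression tree and compared with its flat form. The sketch below is illustrative only: the tagged-tuple encoding and the helpers `val`, `union`, `prod`, `tuples`, and `size` are hypothetical names chosen here, not the FDB engine's data structures.

```python
from itertools import product

# Hypothetical encoding: ("val", attr, v) is a singleton relation;
# ("union", [...]) and ("prod", [...]) combine sub-expressions.
def val(attr, v):
    return ("val", attr, v)

def union(*exprs):
    return ("union", list(exprs))

def prod(*exprs):
    return ("prod", list(exprs))

def tuples(e):
    """Enumerate the flat tuples (as dicts) encoded by expression e."""
    if e[0] == "val":
        return [{e[1]: e[2]}]
    if e[0] == "union":
        return [t for child in e[1] for t in tuples(child)]
    out = []  # product: merge one tuple from each factor
    for combo in product(*(tuples(c) for c in e[1])):
        merged = {}
        for part in combo:
            merged.update(part)
        out.append(merged)
    return out

def size(e):
    """Number of singletons in expression e."""
    return 1 if e[0] == "val" else sum(size(c) for c in e[1])

factorized = union(
    prod(val("pizza", "Capricciosa"),
         union(prod(val("day", "Monday"), val("customer", "Mario")),
               prod(val("day", "Friday"), val("customer", "Mario"))),
         union(prod(val("item", "base"), val("price", 6)),
               prod(val("item", "ham"), val("price", 1)),
               prod(val("item", "mushrooms"), val("price", 1)))),
    prod(val("pizza", "Hawaii"), val("day", "Friday"),
         union(val("customer", "Lucia"), val("customer", "Pietro")),
         union(prod(val("item", "base"), val("price", 6)),
               prod(val("item", "ham"), val("price", 1)),
               prod(val("item", "pineapple"), val("price", 2)))),
)

# The flat form: one product of five singletons per result tuple.
flat = union(*(prod(*(val(a, v) for a, v in t.items())) for t in tuples(factorized)))

print(len(tuples(factorized)))  # 12 tuples
print(size(factorized))         # 21 singletons
print(size(flat))               # 60 singletons
```

Both expressions encode the same 12 tuples; the factorized one needs 21 singletons where the flat one needs 60.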


Key Properties of Factorized Representations

Factorized representations of results for queries with select, project, join, aggregate, group-by, and order-by operators:

  • Very high compression rate
      - Can be exponentially more succinct than the relations they encode.
      - Arbitrarily better than generic compression schemes, e.g., bzip2.
      - Factorized representations within asymptotically tight size bounds are computable directly from the input database and query.

  • Querying in the compressed domain
      - Factorizations are relational expressions and can be composed with queries.
      - We developed the FDB in-memory query engine for this purpose.
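A small experiment illustrates both properties. The sketch below reuses a hypothetical tagged-tuple encoding (`("val", attr, v)`, `("union", [...])`, `("prod", [...])`) and assumes that every union is over disjoint sub-relations; it is not how the FDB engine is implemented. The function `count` computes the number of encoded tuples directly on the factorization, without enumerating them, and `chain` builds a product of n two-value unions.

```python
def count(e):
    """Number of encoded tuples, computed on the factorization itself.

    Assumes unions are disjoint, so counts add over unions and
    multiply over products.
    """
    if e[0] == "val":
        return 1
    if e[0] == "union":
        return sum(count(c) for c in e[1])
    n = 1
    for c in e[1]:
        n *= count(c)
    return n


def size(e):
    """Number of singletons in expression e."""
    return 1 if e[0] == "val" else sum(size(c) for c in e[1])


def chain(n):
    """Product of n unions, each over two singleton values of a distinct attribute."""
    return ("prod", [("union", [("val", f"A{i}", 0), ("val", f"A{i}", 1)])
                     for i in range(n)])


e = chain(20)
print(size(e))   # 40 singletons in the factorization
print(count(e))  # 1048576 tuples, i.e. 2**20
```

The factorization grows linearly in n (2n singletons) while the encoded relation grows as 2^n, and the count aggregate is answered in time proportional to the factorization size rather than the relation size.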


Current Focus

  • Reduce communication cost in distributed database systems
      - Factorization of temporary query results exchanged between nodes
      - Many systems already employ limited factorizations: Google MegaStore and F1, FoundationDB, Microsoft Cloud SQL Server
      - Google Faculty Research Award

  • Reduce space requirements of large-scale feature vectors in predictive modelling
      - Feature vectors = relations with high cardinality
      - Improvements of 10-100x on LogicBlox client data


Outline

Factorized Databases

Probabilistic Databases

Datalog Engines

Πgora, ENFrame, SPROUT2


Probabilistic Data is Commonplace

Facts of life:

  • Real-world data is often uncertain
  • Current probabilistic databases are on the order of billions of records
      - Generated from web data by NELL, Google Squared & Knowledge Vault
  • Curating before processing is a time & money black hole
  • We would like to query uncertain data asap!

MayBMS/SPROUT probabilistic database system

  • Open-source, built on top of PostgreSQL
  • 3000+ downloads (as of Dec 2013)
  • The probabilistic database (PDB) most benchmarked against

SPROUT2 = SPROUT on Google Squared

  • Caught the interest of UK Defence Science and Technology Lab


A Squared Query Engine for Uncertain Web Data


Outline

Factorized Databases

Probabilistic Databases

Datalog Engines


A New Breed of Smart Database Systems

  • Unified and declarative programming model for the enterprise tech stack
  • Can freely mix transactions, analytics, graph queries, mathematical programming and optimization, and probabilistic programming
  • Makes possible new classes of hybrid applications
  • Typical app in retail sector:
      - 50K Datalog++ LOC (vs. millions of C++ LOC)
      - One system (vs. tens)


Live Programming in the Database

  • Flexible spreadsheet backed by a scalable, full-fledged DBMS
  • Users can define formulas or change the schema
      - Triggers addition/deletion of Datalog code on the DB server

[Figure labels: program, edbs, idbs, revised idbs, execution graph, revised execution graph, (meta-data), (actual data)]


Our Approach

Use declarative programming to improve the implementation of declarative systems!

  • Internal library for declarative and incremental maintenance of program state, using a small Datalog engine
      - In the LogicBlox engine since May 2014
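As a rough illustration of what incremental maintenance of derived facts with a small Datalog engine can look like, the sketch below maintains the classic two-rule reachability program under edge insertions. It is not the LogicBlox library: the program, the function name `insert_edges`, and the set-of-pairs encoding are all assumptions made for this example, and deletions are not handled.

```python
# Datalog program being maintained (illustrative only):
#   path(x, y) :- edge(x, y).
#   path(x, z) :- path(x, y), edge(y, z).

def insert_edges(edges, path, new_edges):
    """Add new_edges to edges and update path incrementally (insertions only).

    Only facts that depend on newly added or newly derived facts are
    recomputed, in the style of semi-naive evaluation.
    """
    new_edges = set(new_edges) - edges
    edges |= new_edges
    # Facts that use a new edge: rule 1, and rule 2 with the new edge as the last hop.
    delta = set(new_edges)
    delta |= {(x, z) for (x, y) in path for (y2, z) in new_edges if y == y2}
    delta -= path
    while delta:
        path |= delta
        # Rule 2 applied only to the path facts derived in the previous round.
        delta = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2} - path
    return path


edges, path = set(), set()
insert_edges(edges, path, [(1, 2), (2, 3)])
print(sorted(path))  # [(1, 2), (1, 3), (2, 3)]
insert_edges(edges, path, [(3, 4)])
print(sorted(path))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The second call derives only the three facts ending in node 4 instead of recomputing the whole `path` relation, which is the point of maintaining derived state incrementally.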
