AvocadoDB query language (DRAFT!)

© 2012 triAGENS GmbH | 2012-04-13 1

AvocadoDB query language

Jan Steemann (triAGENS)

Database query languages / paradigms

There are many database query languages and paradigms around. Some examples:

SQL: declarative query language for relational databases, well-known and popular

UNQL: declarative query language for document databases, with SQL-like syntax, embeds JSON

graph query languages (Cypher, Gremlin, ...): declarative languages focusing on graph queries

fluent query languages/interfaces, e.g. db.user.find(...).sort(...)

map/reduce: imperative query formulation/programming

...

AvocadoDB query language: status quo

There is a query language in AvocadoDB

The language syntax is very similar to SQL / UNQL

The language currently supports reading data from collections (i.e. equivalent to an SQL/UNQL SELECT query)

Some complex access patterns (e.g. joins using multiple collections) are also supported

There are some special features, such as creating inline lists from a list of documents (named: LIST JOIN)



Syntax example:

    SELECT { "user": u, "friends": f }
    FROM users u
    LIST JOIN friends f
    ON (u.id == f.uid)
    WHERE u.type == 1
    ORDER BY u.name

Language problems

The current query language has the problem that some queries cannot be expressed very well with it

This might be due to the query language being based on SQL, and SQL being a query language for relational databases

AvocadoDB is mainly a document-oriented database, and its object model only partly overlaps with the SQL object model:

SQL (relational): tables, (homogeneous) rows, columns, scalars, references

AvocadoDB (document-oriented): collections, (inhomogeneous) documents, attributes, scalars, lists, edges

Language problems: multi-valued attributes

Attributes in AvocadoDB can and shall be stored denormalised (multi-valued attributes, lists, ...):

    { "user": { "name": "Fred",
                "likes": [ "Fishing", "Hiking", "Swimming" ] } }

In an SQL database, this storage model would be an anti-pattern

Problem: SQL is not designed to access multi-valued attributes/lists but in AvocadoDB we want to support them via the language

UNQL addresses this partly, but does not go far enough


Language problems: graph queries

AvocadoDB also supports querying graphs

Neither SQL nor UNQL offers any "natural" graph traversal facilities

Instead, there are:

SQL language extensions, e.g. CONNECT BY (proprietary)

SQL stored procedures, e.g. PL/SQL: imperative code that does not match well with the declarative nature of SQL

Neither SQL nor UNQL is the language of choice for graph queries, but we want to support graph queries in AvocadoDB

AvocadoDB query language, version 2

During the past few weeks we thought about moving AvocadoDB's query language from the current SQL-/UNQL-based syntax to something else

We did not find an existing query language that addresses our problems well enough

So we tried to define a syntax for a new query language


The new AvocadoDB query language should:

have an easy-to-understand syntax for the end user

offer a way to declaratively express queries

avoid ASCII art queries

still allow more complex queries (joins, sub-queries etc.)

allow accessing lists and list elements more naturally

be usable with the different data models AvocadoDB supports (e.g. document-oriented, graph, "relational")

be consistent and easy to process

have one syntax regardless of the underlying client language


A draft of the new language version is presented on the following slides

It is not yet finalized and not yet implemented

Your feedback on it is highly appreciated

Slides will be uploaded to http://www.avocadodb.org/


Data types

The language has the following data types:

absence of a value: null

boolean truth values: false, true

numbers (signed double precision), e.g. 1, -34.24

strings, e.g. "John", "goes fishing"

lists (with elements accessible by their position), e.g. [ "one", "two", false, -1 ]

documents (with elements accessible by their name), e.g. { "user": { "name": "John", "age": 25 } }

Note: names of document attributes can also be used without surrounding quotes

Bind parameters

Queries can be parametrized using bind parameters

This allows separation of query text and actual query values

Any literal values, including lists and documents can be bound

Collection names can also be bound

Bind parameters can be accessed in the query using the @ prefix

Example:

    @age
    u.name == @name
    u.state IN @states

Operators

The language has the following operators:

logical (will return a boolean value or an error): && || !

arithmetic (will return a numeric value or an error): + - * / %

relational (will return a boolean value or an error): == != < <= > >= IN

ternary (will return the true or the false part): ? :

String concatenation will be provided via a function

Type casts

Typecasts can be achieved by explicitly calling typecast functions

No implicit type cast will be performed

Performing an operation with invalid/inappropriate types will result in an error

When performing an operation that does not have a valid or defined result, the outcome will be an error:

    1 / 0 => error
    1 + "John" => error

Errors might be caught and converted to null in a query or bubble up to the top, aborting the query. This depends on settings
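The error rules above can be emulated outside the database to get a feel for them. The following Python sketch is only an illustration under assumptions: the names QueryError, strict_add, strict_div and evaluate, and the errors_to_null switch standing in for the "settings" mentioned above, are hypothetical and not part of the draft.

```python
class QueryError(Exception):
    """Raised when an operation has no valid or defined result."""


def is_number(value):
    # bool is a subclass of int in Python, so it is excluded explicitly:
    # the draft treats booleans and numbers as different types.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def strict_add(a, b):
    # No implicit type casts: both operands must already be numbers.
    if not (is_number(a) and is_number(b)):
        raise QueryError("invalid operand types for +")
    return a + b


def strict_div(a, b):
    if not (is_number(a) and is_number(b)):
        raise QueryError("invalid operand types for /")
    if b == 0:
        raise QueryError("division by zero")
    return a / b


def evaluate(operation, *operands, errors_to_null=False):
    # errors_to_null is a hypothetical setting: either convert the error
    # to null (None) or let it bubble up and abort the "query".
    try:
        return operation(*operands)
    except QueryError:
        if errors_to_null:
            return None
        raise
```

With errors_to_null=True, evaluate(strict_div, 1, 0) yields None; with the default, the QueryError propagates to the caller.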


Null

When referring to something non-existing (e.g. a non-existing attribute of a document), the result will be null:

    users.nammme => null

Using the comparison operators, null can be compared to other values and also to null itself. The result will be a boolean (not null as in SQL)

Type comparisons

When comparing two values, the following algorithm is used

If the types of the compared values are not equal, the compare result is as follows:

    null < boolean < number < string < list < document

Examples:

    null < false        0 != null
    false < 0           null != false
    true < 0            false != ""
    true < [ 0 ]        "" != [ ]
    true < [ ]          null != [ ]
    0 < [ ]
    [ ] < { }

Type comparisons

If the types are equal, the actual values are compared

For boolean values, the order is:

    false < true

For numeric values, the order is determined by the numeric value

For string values, the order is determined by bytewise comparison of the strings' characters

Note: at some point, collations will need to be introduced for string comparisons

Type comparisons

For list values, the elements from both lists are compared at each position. For each list element value, the described comparisons will be done recursively:

    [ 1 ] > [ 0 ]
    [ 2, 0 ] > [ 1, 2 ]
    [ 99, 4 ] > [ 99, 3 ]
    [ 23 ] > [ true ]
    [ [ 1 ] ] > 99
    [ ] > 1
    [ true ] > [ ]
    [ null ] > [ ]
    [ true, 0 ] > [ true ]

Type comparisons

For document values, the attribute names from both documents are collected and sorted. The sorted attribute names are then checked individually: if one of the documents does not have the attribute, it will be considered "smaller". If both documents have the attribute, a value comparison will be done recursively:

    { } < { "age": 25 }
    { "age": 25 } < { "age": 26 }
    { "age": 25 } > { "name": "John" }
    { "name": "John", "age": 25 } == { "age": 25, "name": "John" }
    { "age": 25 } < { "age": 25, "name": "John" }
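The comparison rules from the type comparison slides can be written down as one recursive function. The Python sketch below is an illustration, not the AvocadoDB implementation; the names type_rank and compare are made up, JSON-like Python values stand in for the language's data types, and UTF-8 is assumed as the byte encoding for the bytewise string comparison.

```python
def type_rank(value):
    """Position of a value's type in: null < boolean < number < string < list < document."""
    if value is None:
        return 0
    if isinstance(value, bool):  # must be checked before int (bool is an int subclass)
        return 1
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, str):
        return 3
    if isinstance(value, list):
        return 4
    if isinstance(value, dict):
        return 5
    raise TypeError("unsupported type: %r" % type(value))


def compare(a, b):
    """Return -1, 0 or 1 if a is smaller than, equal to or greater than b."""
    rank_a, rank_b = type_rank(a), type_rank(b)
    if rank_a != rank_b:
        return -1 if rank_a < rank_b else 1
    if rank_a == 0:                      # null == null
        return 0
    if rank_a in (1, 2):                 # booleans (false < true) and numbers
        return (a > b) - (a < b)
    if rank_a == 3:                      # strings: bytewise (UTF-8 assumed)
        bytes_a, bytes_b = a.encode("utf-8"), b.encode("utf-8")
        return (bytes_a > bytes_b) - (bytes_a < bytes_b)
    if rank_a == 4:                      # lists: position by position
        for x, y in zip(a, b):
            result = compare(x, y)
            if result != 0:
                return result
        # all shared positions are equal: the shorter list is smaller
        return (len(a) > len(b)) - (len(a) < len(b))
    # documents: walk the sorted union of attribute names
    for name in sorted(set(a) | set(b)):
        if name not in a:
            return -1
        if name not in b:
            return 1
        result = compare(a[name], b[name])
        if result != 0:
            return result
    return 0
```

For instance, compare([ None ], [ ]) gives 1 and compare({ "age": 25 }, { "name": "John" }) gives 1, matching the list and document examples on the slides.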

Base building block: lists

A good part of the query language is about processing lists

There are several types of lists:

statically declared lists, e.g. [ { "user": { "name": "Fred" } }, { "user": { "name": "John" } } ]

lists of documents from collections, e.g. users, locations

result lists from filters/queries, e.g. NEAR(locations, [ 43, 10 ], 100)

FOR: List iteration

The FOR keyword can be used to iterate over all elements from a list

Example (collection-based, collection "users"):

    FOR u IN users

A result document (named: u) is produced on each iteration

The above example produces the following result list:

    [ u1, u2, u3, ..., un ]

Note: this is comparable to the following SQL:

    SELECT * FROM users u

In each iteration, the individual element is accessible via its name (u)

FOR: List iteration

Nesting of multiple FOR blocks is possible

Example: cross product of users and locations (u x l):

    FOR u IN users
      FOR l IN locations

A result document containing both variables (u, l) is produced on each iteration of the inner loop

Note: this is equivalent to the following SQL queries:

    SELECT * FROM users u, locations l
    SELECT * FROM users u INNER JOIN locations l ON (1=1)

FOR: List iteration

Example: cross product of years & quarters (non collection-based):

    FOR year IN [ 2011, 2012, 2013 ]
      FOR quarter IN [ 1, 2, 3, 4 ]

Note: this is equivalent to the following SQL query:

    SELECT * FROM
      (SELECT 2011 UNION SELECT 2012 UNION SELECT 2013) year,
      (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) quarter
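For readers who think in loops, nested FOR blocks behave like nested loops in a general-purpose language. The following Python sketch mirrors the years and quarters example. The shape of the result documents (one attribute per loop variable) is an assumption based on the statement that each result document contains both variables.

```python
from itertools import product

years = [2011, 2012, 2013]
quarters = [1, 2, 3, 4]

# Nested iteration: the inner loop runs once per element of the outer list,
# and each inner iteration yields one result document with both variables.
nested = [{"year": year, "quarter": quarter}
          for year in years
          for quarter in quarters]

# The same cross product expressed with itertools.product.
via_product = [{"year": year, "quarter": quarter}
               for year, quarter in product(years, quarters)]
```

Both lists contain 3 x 4 = 12 result documents in the same order, starting with { "year": 2011, "quarter": 1 }.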

FILTER: results filtering

The FILTER keyword can be used to restrict the results to elements that match some definable condition

Example: retrieve all users that are active

    FOR u IN users
    FILTER u.active == true

Note: this is equivalent to the following SQL:

    SELECT * FROM users u WHERE u.active = true

The individual elements of the FOR list are accessed using the variable name u


The FILTER keyword in combination with nested FOR blocks can be used to perform joins

Example: retrieve all users that have matching locations

    FOR u IN users
      FOR l IN locations
    FILTER u.a == l.b

Note: this is equivalent to the following SQL queries:

    SELECT * FROM users u, locations l WHERE u.a = l.b
    SELECT * FROM users u (INNER) JOIN locations l ON u.a = l.b

The individual list elements are accessed using their variable names

Base building block: scopes

The query language is scoped

Variables can only be used after they have been declared

Example:

    FOR u IN users            (introduces u)
      FOR l IN locations      (introduces l)
    FILTER u.a == l.b         (can use both u and l)

Scopes can be made explicit using brackets (will be shown later)


Thanks to scopes, the FILTER keyword can be used wherever SQL needs multiple keywords: ON, WHERE, HAVING


That means: in AvocadoDB you would use FILTER:

    FOR u IN users
      FOR l IN locations
    FILTER u.a == l.b

whereas in SQL you would use either ON:

    SELECT * FROM users u (INNER) JOIN locations l ON u.a = l.b

or WHERE:

    SELECT * FROM users u, locations l
    WHERE u.a = l.b


FILTER can be used to model both an SQL ON and an SQL WHERE in one go:

    FOR u IN users
      FOR l IN locations
    FILTER u.active == 1 && u.a == l.b

This is equivalent to the following SQL query:

    SELECT * FROM users u (INNER) JOIN locations l
    ON u.a = l.b WHERE u.active = 1


More than one FILTER condition is allowed per query

The following queries are all equivalent; it is the optimizer's job to figure out the best positions for applying the FILTERs

    FOR u IN users
    FILTER u.c == 1
      FOR l IN locations
      FILTER l.d == 2
    FILTER u.a == l.b

    FOR u IN users
      FOR l IN locations
    FILTER u.c == 1 && l.d == 2 && u.a == l.b

    FOR u IN users
      FOR l IN locations
      FILTER l.d == 2 && u.a == l.b
    FILTER u.c == 1
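The equivalence of different FILTER positions can be checked with plain nested loops. The Python sketch below uses made-up sample data and hypothetical function names; it only illustrates that applying the conditions early or late selects the same (u, l) pairs, while the early variant skips inner iterations.

```python
# Hypothetical sample data using the attribute names a, b, c, d from the slides.
users = [{"a": 1, "c": 1}, {"a": 2, "c": 1}, {"a": 3, "c": 0}]
locations = [{"b": 1, "d": 2}, {"b": 2, "d": 5}, {"b": 3, "d": 2}]


def filters_applied_early(users, locations):
    """Each condition is checked as soon as the variables it needs exist."""
    result = []
    for u in users:
        if u["c"] != 1:
            continue                      # skips the whole inner loop
        for l in locations:
            if l["d"] != 2:
                continue
            if u["a"] == l["b"]:
                result.append({"u": u, "l": l})
    return result


def single_combined_filter(users, locations):
    """All conditions are checked together on the full cross product."""
    return [{"u": u, "l": l}
            for u in users
            for l in locations
            if u["c"] == 1 and l["d"] == 2 and u["a"] == l["b"]]


early = filters_applied_early(users, locations)
late = single_combined_filter(users, locations)
```

With this data, both variants return the single pair in which u.a and l.b are both 1.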

RETURN: results projection

The RETURN keyword produces the end result documents from the intermediate results produced by the query

Comparable to the SELECT part in an SQL query

The RETURN part is mandatory at the end of a query (and at the end of each subquery)

RETURN is partly left out in this presentation for space reasons


Example:

    FOR u IN users
    RETURN {
      "name" : u.name,
      "likes" : u.likes,
      "numFriends": LENGTH(u.friends)
    }

Produces one such document for each u found


To return all documents as they are in the original list, there is the following variant:

    FOR u IN users
    RETURN u

Would produce:

    [ { "name": "John", "age": 25 }, { "name": "Tina", "age": 29 }, ... ]

Note: this is similar to SQL's SELECT u.*


To return just the names for all users, the following query would do:

    FOR u IN users
    RETURN u.name

Would produce:

    [ "John", "Tina", ... ]

Note: this is similar to SQL's SELECT u.name


To return a hierarchical result (e.g. data from multiple collections), the following query could be used:

    FOR u IN users
      FOR l IN locations
    RETURN { "user": u, "location" : l }

Would produce:

    [ { "user": { "name": "John", "age": 25 }, "location": { "x": 1, "y": -1 } },
      { "user": { "name": "Tina", "age": 29 }, "location": { "x": -2, "y": 3 } },
      ... ]


To return a flat result from hierarchical data (e.g. data from multiple collections), the MERGE() function can be employed:

    FOR u IN users
      FOR l IN locations
    RETURN MERGE(u, l)

Would produce:

    [ { "name": "John", "age": 25, "x": 1, "y": -1 },
      { "name": "Tina", "age": 29, "x": -2, "y": 3 },
      ... ]
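The difference between the hierarchical and the flat projection can be illustrated with dictionaries. This Python sketch is an approximation: the sample documents are taken from the slides, but the behaviour of MERGE() for attribute names present in both documents is not specified in the draft, so the dictionary merge used here (the later document wins) is an assumption.

```python
users = [{"name": "John", "age": 25}, {"name": "Tina", "age": 29}]
locations = [{"x": 1, "y": -1}, {"x": -2, "y": 3}]

# RETURN { "user": u, "location": l } keeps each source document nested.
hierarchical = [{"user": u, "location": l}
                for u in users
                for l in locations]

# RETURN MERGE(u, l) puts the attributes of both documents on one level.
# Assumption: on a name clash the value from l would replace the one from u.
flat = [{**u, **l}
        for u in users
        for l in locations]
```

As written, the sketch produces the full cross product (four documents), of which the slides show two.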

SORT: Sorting

The SORT keyword will force a sort of the list of intermediate results according to one or multiple criteria

Example (sort by first and last name first, then by id):

    FOR u IN users
      FOR l IN locations
    SORT u.first, u.last, l.id DESC

This is very similar to ORDER BY in SQL

LIMIT: Result set slicing

The LIMIT keyword allows slicing the list of result documents using an offset and a count

Example for top 3 (offset = 0, count = 3):

    FOR u IN users
    SORT u.first, u.last
    LIMIT 0, 3
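SORT followed by LIMIT corresponds to sorting a list and then taking a slice. The Python sketch below mirrors the top 3 example with made-up user documents; the function name sort_and_limit is hypothetical, and Python's string ordering (by code point) is only a stand-in for the bytewise comparison described in the draft.

```python
# Hypothetical sample documents with the attributes used in the example.
users = [
    {"first": "Tina", "last": "Young"},
    {"first": "John", "last": "Smith"},
    {"first": "John", "last": "Adams"},
    {"first": "Bob", "last": "Miller"},
]


def sort_and_limit(docs, offset, count):
    # SORT u.first, u.last: a tuple key sorts by first name, then last name.
    ordered = sorted(docs, key=lambda u: (u["first"], u["last"]))
    # LIMIT offset, count: skip `offset` documents, then keep `count`.
    return ordered[offset:offset + count]


top3 = sort_and_limit(users, 0, 3)
```

Here top3 contains Bob Miller, John Adams and John Smith, in that order.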

LET: variable creation

The LET keyword can be used to create a variable using data from a subexpression (e.g. a FOR expression)

Example (will populate variable t with the result of the FOR):

    LET t = (
      FOR u IN users
    )

The parentheses are explicit scope bounds

This will populate t with:

    [ u1, u2, u3, u4, ... un ]


The results created using LET can be filtered afterwards using the FILTER keyword

This is then similar to the behaviour of HAVING in SQL

Example using a single collection (users):

    FOR u IN users
      LET friends = (
        FOR f IN u.friends
      )
    FILTER LENGTH(friends) > 5

The inner FOR iterates over an attribute ("friends") of each u

LENGTH() is a function to retrieve the length of a list


Example using two collections (users, friends):

    FOR u IN users
      LET friends = (
        FOR f IN friends
        FILTER u.id == f.uid
      )
    FILTER LENGTH(friends) > 5

Differences to previous one collection example:

replaced f IN u.friends with just f IN friends

added inner filter condition
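The two-collection LET example can be read as "build a list per user, then filter on its length". The following Python sketch follows that reading with made-up data; the function name users_with_many_friends and the shape of the returned documents are assumptions, since the example query on the slide omits its RETURN part.

```python
# Hypothetical sample data: user 1 has six friends, user 2 has one.
users = [{"id": 1, "name": "John"}, {"id": 2, "name": "Tina"}]
friends = ([{"uid": 1, "name": "friend%d" % i} for i in range(6)]
           + [{"uid": 2, "name": "only friend"}])


def users_with_many_friends(users, friends, minimum=5):
    result = []
    for u in users:
        # LET friends = ( FOR f IN friends FILTER u.id == f.uid )
        user_friends = [f for f in friends if f["uid"] == u["id"]]
        # FILTER LENGTH(friends) > 5, applied after the list has been built
        if len(user_friends) > minimum:
            result.append({"u": u, "friends": user_friends})
    return result


popular = users_with_many_friends(users, friends)
```

Only John passes the filter, and his six friend documents stay available as a list rather than being collapsed into a string.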


SQL approach:

    SELECT u.*, GROUP_CONCAT(f.uid) AS friends
    FROM users u (INNER) JOIN friends f
    ON u.id = f.uid
    GROUP BY u.id HAVING COUNT(f.uid) > 5

Notes:

we are using 2 different tables now

the GROUP_CONCAT() aggregate function will create the friend list as a comma-separated string

need to use GROUP BY to aggregate

non-portable: GROUP_CONCAT is not part of standard SQL (it is a MySQL extension)


More complex example (selecting users along with logins and group membership):

    FOR u IN users
      LET logins = (
        FOR l IN logins_2012
        FILTER u.id == l.uid
      )
      LET groups = (
        FOR g IN group_memberships
        FILTER u.id == g.uid
      )
    RETURN { "user": u, "logins": logins, "groups": groups }

for each user, all of the user's logins are put into variable "logins"

for each user, all group memberships are put into variable "groups"

logins and groups are independent of each other

COLLECT: grouping

The COLLECT keyword can be used to group a list by one or multiple group criteria

Difference to SQL: in AvocadoDB COLLECT performs grouping, but no aggregation

Aggregation can be performed later using LET or RETURN

The result of COLLECT is a (grouped/hierarchical) list of documents, containing one document for each group

This document contains the group criteria values

The list of documents for the group can optionally be retrieved by using the INTO keyword


COLLECT: grouping

Example: retrieve the users per city (non-aggregated):

    FOR u IN users
    COLLECT city = u.city
    INTO g
    RETURN { "c": city, "u": g }

COLLECT city = u.city defines the group criterion (name: "city", value: u.city)

INTO g captures the group values into variable g; g contains all group members

Produces the following result:

    [ { "c": "cgn", "u": [ { "u": {..} }, { "u": {..} }, { "u": {..} } ] },
      { "c": "ffm", "u": [ { "u": {..} }, { "u": {..} } ] },
      { "c": "ddf", "u": [ { "u": {..} } ] } ]

COLLECT: grouping

Example: retrieve the number of users per city (aggregated):

    FOR u IN users
    COLLECT city = u.city
    INTO g
    RETURN { "c": city, "numUsers": LENGTH(g) }

Produces the following result:

    [ { "c": "cgn", "numUsers": 3 },
      { "c": "ffm", "numUsers": 2 },
      { "c": "ddf", "numUsers": 1 } ]
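Grouping without aggregation, followed by a separate aggregation step, can be sketched with a dictionary of lists. In the Python sketch below, the function name collect and the sample users are made up; group members are wrapped as { "u": <document> } to match the result shown in the non-aggregated example, and groups come out in order of first appearance, which the draft does not specify.

```python
def collect(docs, variable_name, criterion):
    """Group docs by criterion(doc); only grouping, no aggregation."""
    groups = {}
    for doc in docs:
        # INTO g: each group member keeps the name of its loop variable.
        groups.setdefault(criterion(doc), []).append({variable_name: doc})
    return groups


# Hypothetical users: three in "cgn", two in "ffm", one in "ddf".
users = [
    {"name": "A", "city": "cgn"}, {"name": "B", "city": "cgn"},
    {"name": "C", "city": "cgn"}, {"name": "D", "city": "ffm"},
    {"name": "E", "city": "ffm"}, {"name": "F", "city": "ddf"},
]

grouped = collect(users, "u", lambda u: u["city"])

# RETURN { "c": city, "u": g }: non-aggregated, one document per group
per_city = [{"c": city, "u": g} for city, g in grouped.items()]

# RETURN { "c": city, "numUsers": LENGTH(g) }: aggregation happens afterwards
counts = [{"c": city, "numUsers": len(g)} for city, g in grouped.items()]
```

With this data, counts equals the aggregated result shown on the slide: 3 users for "cgn", 2 for "ffm" and 1 for "ddf".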

Aggregate functions

The query language should provide some aggregate functions, e.g. MIN(), MAX(), SUM(), LENGTH()

Input to aggregate functions is a list of values to process. Example:

    [ { "user": { "type": 1, "rating": 1 } },
      { "user": { "type": 1, "rating": 4 } },
      { "user": { "type": 1, "rating": 3 } } ]

Problem: how to access the "user.rating" attribute of each value inside the aggregate function?

Aggregate functions

Solution 1: use the "access to all list members" shortcut:

    FOR u IN [ { "user": { "type": 1, "rating": 1 } },
               { "user": { "type": 1, "rating": 4 } },
               { "user": { "type": 1, "rating": 3 } } ]
    COLLECT type = u.user.type
    INTO g
    RETURN { "type": type, "maxRating": MAX(g[*].u.user.rating) }

g[*] will iterate over all elements in g and return each element's u.user.rating attribute

Aggregate functions

Solution 2: use FOR sub-expression to iterate over group elements

    FOR u IN users
    COLLECT city = u.city
    INTO g
    RETURN {
      "c" : city,
      "numUsers" : LENGTH(g),
      "maxRating": MAX((FOR e IN g RETURN e.user.rating))
    }

INTO g captures the group values; g is a variable containing all group members

(FOR e IN g RETURN e.user.rating) is a sub-expression that iterates over all elements in the group
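Both solutions amount to extracting one attribute from every group member and handing the resulting list to the aggregate function. The Python sketch below uses the three sample documents from the aggregate functions slides. It assumes that grouping is done on the nested user.type attribute and that group members are wrapped as { "u": <document> }, as in the COLLECT result shown earlier.

```python
docs = [
    {"user": {"type": 1, "rating": 1}},
    {"user": {"type": 1, "rating": 4}},
    {"user": {"type": 1, "rating": 3}},
]

# Grouping step: one list of wrapped members per distinct type.
groups = {}
for u in docs:
    groups.setdefault(u["user"]["type"], []).append({"u": u})

result = []
for group_type, g in groups.items():
    # Equivalent of g[*].u.user.rating (or of a FOR sub-expression over g):
    # pull the rating out of every group member into a plain list.
    ratings = [member["u"]["user"]["rating"] for member in g]
    result.append({"type": group_type, "maxRating": max(ratings)})
```

For the sample data there is a single group (type 1) whose ratings are 1, 4 and 3, so maxRating is 4.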

Unions and intersections

Unions and intersections can be created by invoking functions on lists:

UNION(list1, list2)

INTERSECTION(list1, list2)

There will not be special keywords as in SQL


Graph queries

In AvocadoDB, relations between documents can be stored using graphs

Graphs can be used to model tree structures, networks etc.

Popular use cases:

find friends of friends

find similarities

find recommendations

Graph queries

In AvocadoDB, a graph is a composition of:

vertices: the nodes in the graph

edges: the relations between nodes in the graph

Vertices are stored as documents in regular collections

Edges are stored as documents in special edge collections, with each edge having the following attributes:

_from: id of linked vertex (incoming relation)

_to: id of linked vertex (outgoing relation)

Additionally, all documents have an _id attribute

The _id values are used for linking in the edge collections

Graph queries

Task: find direct friends of users

Data: users are related (friend relationships) to other users

Example data (vertex collection "users"):

    [ { "_id": 123, "name": "John", "age": 25 },
      { "_id": 456, "name": "Tina", "age": 29 },
      { "_id": 235, "name": "Bob", "age": 15 },
      { "_id": 675, "name": "Phil", "age": 12 } ]

Example data (edge collection "relations"):

    [ { "_id": 1, "_from": 123, "_to": 456 },
      { "_id": 2, "_from": 123, "_to": 235 },
      { "_id": 3, "_from": 456, "_to": 123 },
      { "_id": 4, "_from": 456, "_to": 235 },
      { "_id": 5, "_from": 235, "_to": 456 },
      { "_id": 6, "_from": 235, "_to": 675 } ]

Graph queries

To traverse the graph, the PATHS function can be used

It traverses a graph's edges defined in an edge collection and produces a list of paths found

Each path object will have the following properties:

_from: id of vertex the path started at

_to: id of vertex the path ended with

_edges: edges visited along the path

_vertices: vertices visited along the path

Graph queries

Example:

    FOR u IN users
      LET friends = (
        FOR p IN PATHS(relations, OUTBOUND, 1)
        FILTER p._from == u._id
      )

PATHS arguments: edge collection (relations), direction (OUTBOUND), max path length (1)

path variable name: p

The FILTER only considers paths starting at the current user (using the user's _id attribute)

Graph queries

Produces:

    [ { "u": { "_id": 123, "name": "John", "age": 25 },
        "p": [ { "_from": 123, "_to": 456, ... },
               { "_from": 123, "_to": 235, ... } ] },
      { "u": { "_id": 456, "name": "Tina", "age": 29 },
        "p": [ { "_from": 456, "_to": 123, ... },
               { "_from": 456, "_to": 235, ... } ] },
      { "u": { "_id": 235, "name": "Bob", "age": 15 },
        "p": [ { "_from": 235, "_to": 456, ... },
               { "_from": 235, "_to": 675, ... } ] },
      { "u": { "_id": 675, "name": "Phil", "age": 12 },
        "p": [ ] } ]

Note: _edges and _vertices attributes for each p left out for space reasons
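The direct-friends example can be reproduced with a small stand-in for PATHS. The Python sketch below uses the example vertex and edge data from the slides. The function outbound_paths is hypothetical and not the PATHS implementation; storing vertex ids (rather than full documents) in _vertices and never reusing an edge within one path are assumptions, since the draft does not define these details.

```python
users = [
    {"_id": 123, "name": "John", "age": 25},
    {"_id": 456, "name": "Tina", "age": 29},
    {"_id": 235, "name": "Bob", "age": 15},
    {"_id": 675, "name": "Phil", "age": 12},
]
relations = [
    {"_id": 1, "_from": 123, "_to": 456},
    {"_id": 2, "_from": 123, "_to": 235},
    {"_id": 3, "_from": 456, "_to": 123},
    {"_id": 4, "_from": 456, "_to": 235},
    {"_id": 5, "_from": 235, "_to": 456},
    {"_id": 6, "_from": 235, "_to": 675},
]


def outbound_paths(edges, max_length):
    """List all outbound paths made of 1..max_length edges."""
    current = [[edge] for edge in edges]       # all paths of length 1
    found = list(current)
    for _ in range(max_length - 1):
        # Extend every path by each edge that starts where the path ends.
        current = [path + [edge]
                   for path in current
                   for edge in edges
                   if edge["_from"] == path[-1]["_to"] and edge not in path]
        found.extend(current)
    return [{"_from": path[0]["_from"],
             "_to": path[-1]["_to"],
             "_edges": path,
             "_vertices": [path[0]["_from"]] + [edge["_to"] for edge in path]}
            for path in found]


# FOR u IN users, keeping only the paths that start at the current user
friends_by_user = [
    {"u": u,
     "p": [p for p in outbound_paths(relations, 1) if p["_from"] == u["_id"]]}
    for u in users
]
```

With a maximum path length of 1, John's paths end at 456 and 235, and Phil, who has no outgoing edges, gets an empty list, as in the result above.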

Summary: main keywords

    Keyword             Use case
    FOR ... IN          List iteration
    FILTER              Results filtering
    RETURN              Results projection
    SORT                Sorting
    LIMIT               Result set slicing
    LET                 Variable creation
    COLLECT ... INTO    Grouping

Q & A

Your feedback on the draft is highly appreciated

Please let us know what you think: [email protected]

[email protected]@triagens.de#AvocadoDB

And please try out AvocadoDB: http://www.avocadodb.org/

https://github.com/triAGENS/AvocadoDB
