DESCRIPTION
This is the RFC for AvocadoDB's query language. AvocadoDB is an open source NoSQL database (see www.avocadodb.org) offering a mixture of data models such as key-value pairs, documents and graphs. The REST API for AvocadoDB is already available and stable, and people are writing APIs using it. Awesome. As AvocadoDB offers more complex data structures like graphs and lists, REST is not enough. We implemented a first version of a query language some time ago which is very similar to SQL and UNQL. Then we realized that this approach was not completely satisfying, as some queries cannot be expressed very well with it, especially queries on multi-valued attributes/lists. UNQL addresses this partly, but does not go far enough. Another issue is graphs. AvocadoDB supports querying graphs, but neither SQL nor UNQL offers any "natural" graph traversal facilities. As we did not find any existing query language that addresses the problems we found, we had to define a new query language, which is presented in this presentation. Have some feedback on this? Come to www.avocadodb.org and tell us what you think about it. :-)
© 2012 triAGENS GmbH | 2012-04-13 1
AvocadoDB query language
Jan Steemann (triAGENS)
Database query languages / paradigms
There are many database query languages and paradigms around
Some examples:
SQL: declarative query language for relational databases, well-known and popular
UNQL: declarative query language for document databases, SQL-like syntax, embeds JSON
graph query languages (Cypher, Gremlin, ...): declarative languages focusing on graph queries
fluent query languages/interfaces: e.g. db.user.find(...).sort(...)
map/reduce: imperative query formulation/programming
...
AvocadoDB query language: status quo
There is a query language in AvocadoDB
The language syntax is very similar to SQL / UNQL
The language currently supports reading data from collections (i.e. equivalent to an SQL/UNQL SELECT query)
Some complex access patterns (e.g. joins using multiple collections) are also supported
There are some special features, such as creating inline lists from a list of documents (named: LIST JOIN)
AvocadoDB query language: status quo
Syntax example:
    SELECT { "user": u, "friends": f }
    FROM users u
    LIST JOIN friends f ON (u.id == f.uid)
    WHERE u.type == 1
    ORDER BY u.name
Language problems
The current query language has the problem that some queries cannot be expressed very well with it
This might be due to the query language being based on SQL, and SQL being a query language for relational databases
AvocadoDB is mainly a document-oriented database, and its object model only partly overlaps with the SQL object model:
SQL (relational):
    tables
    (homogeneous) rows
    columns
    scalars
    references
AvocadoDB (document-oriented):
    collections
    (inhomogeneous) documents
    attributes
    scalars
    lists
    edges
Language problems: multi-valued attributes
Attributes in AvocadoDB can and shall be stored denormalised (multi-valued attributes, lists, ...):
    { "user": { "name": "Fred",
                "likes": [ "Fishing", "Hiking", "Swimming" ] } }
In an SQL database, this storage model would be an anti-pattern
Problem: SQL is not designed to access multi-valued attributes/lists but in AvocadoDB we want to support them via the language
UNQL addresses this partly, but does not go far enough
Language problems: graph queries
AvocadoDB also supports querying graphs
Neither SQL nor UNQL offers any "natural" graph traversal facilities
Instead, there are:
SQL language extensions: e.g. CONNECT BY, proprietary
SQL stored procedures: e.g. PL/SQL; imperative code, does not match well with the declarative nature of SQL
Neither SQL nor UNQL is the language of choice for graph queries, but we want to support graph queries in AvocadoDB
AvocadoDB query language, version 2
During the past few weeks we thought about moving AvocadoDB's query language from the current SQL-/UNQL-based syntax to something else
We did not find an existing query language that addresses the problems we had well enough
So we tried to define a syntax for a new query language
AvocadoDB query language, version 2
The new AvocadoDB query language should:
have an easy-to-understand syntax for the end user
offer a way to declaratively express queries
avoid ASCII art queries
still allow more complex queries (joins, sub-queries etc.)
allow accessing lists and list elements more naturally
be usable with the different data models AvocadoDB supports (e.g. document-oriented, graph, "relational")
be consistent and easy to process
have one syntax regardless of the underlying client language
AvocadoDB query language, version 2
A draft of the new language version is presented on the following slides
It is not yet finalized and not yet implemented
Your feedback on it is highly appreciated
Slides will be uploaded to http://www.avocadodb.org/
Data types
The language has the following data types:
absence of a value: null
boolean truth values: false, true
numbers (signed double precision): 1, -34.24
strings, e.g. "John", "goes fishing"
lists (with elements accessible by their position), e.g. [ "one", "two", false, -1 ]
documents (with elements accessible by their name), e.g. { "user": { "name": "John", "age": 25 } }
Note: names of document attributes can also be used without surrounding quotes
Bind parameters
Queries can be parametrized using bind parameters
This allows separation of query text and actual query values
Any literal values, including lists and documents can be bound
Collection names can also be bound
Bind parameters can be accessed in the query using the @ prefix
Example:
    @age
    u.name == @name
    u.state IN @states
Operators
The language has the following operators:
logical (will return a boolean value or an error): && || !
arithmetic (will return a numeric value or an error): + - * / %
relational (will return a boolean value or an error): == != < <= > >= IN
ternary (will return the true or the false part): ? :
String concatenation will be provided via a function
Type casts
Typecasts can be achieved by explicitly calling typecast functions
No implicit type cast will be performed
Performing an operation with invalid/inappropriate types will result in an error
When performing an operation that does not have a valid or defined result, the outcome will be an error:
    1 / 0 => error
    1 + "John" => error
Errors might be caught and converted to null in a query or bubble up to the top, aborting the query. This depends on settings
Null
When referring to something non-existing (e.g. a non-existing attribute of a document), the result will be null:
    users.nammme => null
Using the comparison operators, null can be compared to other values and also to null itself. The result will be a boolean (not null as in SQL)
Type comparisons
When comparing two values, the following algorithm is used
If the types of the compared values are not equal, the compare result is as follows:
    null < boolean < number < string < list < document
Examples:
    null < false        0 != null
    false < 0           null != false
    true < 0            false != ""
    true < [ 0 ]        "" != [ ]
    true < [ ]          null != [ ]
    0 < [ ]
    [ ] < { }
Type comparisons
If the types are equal, the actual values are compared
For boolean values, the order is: false < true
For numeric values, the order is determined by the numeric value
For string values, the order is determined by bytewise comparison of the strings' characters
Note: at some point, collations will need to be introduced for string comparisons
Type comparisons
For list values, the elements from both lists are compared at each position. For each list element value, the described comparisons will be done recursively:
    [ 1 ] > [ 0 ]
    [ 2, 0 ] > [ 1, 2 ]
    [ 99, 4 ] > [ 99, 3 ]
    [ 23 ] > [ true ]
    [ [ 1 ] ] > 99
    [ ] > 1
    [ true ] > [ ]
    [ null ] > [ ]
    [ true, 0 ] > [ true ]
Type comparisons
For document values, the attribute names from both documents are collected and sorted. The sorted attribute names are then checked individually: if one of the documents does not have the attribute, it will be considered "smaller". If both documents have the attribute, a value comparison will be done recursively:
    { } < { "age": 25 }
    { "age": 25 } < { "age": 26 }
    { "age": 25 } > { "name": "John" }
    { "name": "John", "age": 25 } == { "age": 25, "name": "John" }
    { "age": 25 } < { "age": 25, "name": "John" }
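The comparison rules from the "Type comparisons" slides can be sketched in Python. This is only an illustration of the described algorithm, not AvocadoDB code: the names type_rank and compare are made up for this sketch, strings are assumed to be compared by their UTF-8 bytes, and attribute names are assumed to sort by Python's default string order.

```python
def type_rank(value):
    """Rank of a value's type: null < boolean < number < string < list < document."""
    if value is None:
        return 0
    if isinstance(value, bool):          # must be tested before numbers: bool is an int in Python
        return 1
    if isinstance(value, (int, float)):
        return 2
    if isinstance(value, str):
        return 3
    if isinstance(value, list):
        return 4
    if isinstance(value, dict):
        return 5
    raise TypeError("unsupported type: " + type(value).__name__)


def compare(a, b):
    """Return -1, 0 or 1 if a is smaller than, equal to or greater than b."""
    rank_a, rank_b = type_rank(a), type_rank(b)
    if rank_a != rank_b:                 # different types: the type order decides
        return -1 if rank_a < rank_b else 1
    if a is None:                        # null == null
        return 0
    if isinstance(a, list):              # position by position, then by length
        for x, y in zip(a, b):
            result = compare(x, y)
            if result != 0:
                return result
        return (len(a) > len(b)) - (len(a) < len(b))
    if isinstance(a, dict):              # sorted attribute names, missing attribute is "smaller"
        for name in sorted(set(a) | set(b)):
            if name not in a:
                return -1
            if name not in b:
                return 1
            result = compare(a[name], b[name])
            if result != 0:
                return result
        return 0
    if isinstance(a, str):               # bytewise comparison (UTF-8 assumed here)
        a, b = a.encode("utf-8"), b.encode("utf-8")
    return (a > b) - (a < b)             # booleans, numbers, encoded strings
```

For instance, compare([ 23 ], [ True ]) yields 1 because a number ranks above a boolean, matching the slide example [ 23 ] > [ true ].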
Base building block: lists
A good part of the query language is about processing lists
There are several types of lists:
statically declared lists, e.g. [ { "user": { "name": "Fred" } }, { "user": { "name": "John" } } ]
lists of documents from collections, e.g. users, locations
result lists from filters/queries, e.g. NEAR(locations, [ 43, 10 ], 100)
FOR: List iteration
The FOR keyword can be used to iterate over all elements from a list
Example (collection-based, collection "users"):
    FOR u IN users
A result document (named: u) is produced on each iteration
The above example produces the following result list:
    [ u1, u2, u3, ..., un ]
Note: this is comparable to the following SQL:
    SELECT * FROM users u
In each iteration, the individual element is accessible via its name (u)
FOR: List iteration
Nesting of multiple FOR blocks is possible
Example: cross product of users and locations (u x l):
    FOR u IN users
      FOR l IN locations
A result document containing both variables (u, l) is produced on each iteration of the inner loop
Note: this is equivalent to the following SQL queries:
    SELECT * FROM users u, locations l
    SELECT * FROM users u INNER JOIN locations l ON (1=1)
FOR: List iteration
Example: cross product of years & quarters (non collection-based):
    FOR year IN [ 2011, 2012, 2013 ]
      FOR quarter IN [ 1, 2, 3, 4 ]
Note: this is equivalent to the following SQL query:
    SELECT * FROM
      (SELECT 2011 UNION SELECT 2012 UNION SELECT 2013) year,
      (SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) quarter
FILTER: results filtering
The FILTER keyword can be used to restrict the results to elements that match some definable condition
Example: retrieve all users that are active
    FOR u IN users
      FILTER u.active == true
Note: this is equivalent to the following SQL:
    SELECT * FROM users u WHERE u.active = true
The individual elements of the FOR list are accessed using the variable name u
FILTER: results filtering
The FILTER keyword in combination with nested FOR blocks can be used to perform joins
Example: retrieve all users that have matching locations
    FOR u IN users
      FOR l IN locations
        FILTER u.a == l.b
Note: this is equivalent to the following SQL queries:
    SELECT * FROM users u, locations l WHERE u.a = l.b
    SELECT * FROM users u (INNER) JOIN locations l ON u.a = l.b
The individual list elements are accessed using their variable names (u, l)
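The join semantics of nested FOR blocks with a FILTER can be pictured as plain nested loops. The following Python sketch is only an illustration under assumptions: the sample data and the function name join_users_locations are invented here, and a missing attribute is read as None to mirror the null behaviour described earlier.

```python
def join_users_locations(users, locations):
    """Nested-loop sketch of: FOR u IN users FOR l IN locations FILTER u.a == l.b"""
    result = []
    for u in users:                          # FOR u IN users
        for l in locations:                  # FOR l IN locations
            if u.get("a") == l.get("b"):     # FILTER u.a == l.b (missing attribute -> None)
                result.append({"u": u, "l": l})
    return result


# hypothetical sample data
users = [{"name": "John", "a": 1}, {"name": "Tina", "a": 2}]
locations = [{"city": "cgn", "b": 1}, {"city": "ffm", "b": 3}]

matches = join_users_locations(users, locations)
```

A real query optimizer would be free to pick a different execution strategy (e.g. an index lookup); the nested loop only describes the result.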
Base building block: scopes
The query language is scoped
Variables can only be used after they have been declared
Example:
    FOR u IN users              (introduces u)
      FOR l IN locations        (introduces l)
        FILTER u.a == l.b       (can use both u and l)
Scopes can be made explicit using brackets (will be shown later)
FILTER: results filtering
Thanks to scopes, the FILTER keyword can be used everywhere where SQL needs multiple keywords:
ON
WHERE
HAVING
FILTER: results filtering
That means: in AvocadoDB you would use FILTER:
    FOR u IN users
      FOR l IN locations
        FILTER u.a == l.b
whereas in SQL you would use either ON:
    SELECT * FROM users u (INNER) JOIN locations l ON u.a = l.b
or WHERE:
    SELECT * FROM users u, locations l WHERE u.a = l.b
FILTER: results filtering
FILTER can be used to model both an SQL ON and an SQL WHERE in one go:
    FOR u IN users
      FOR l IN locations
        FILTER u.active == 1 && u.a == l.b
This is equivalent to the following SQL query:
    SELECT * FROM users u (INNER) JOIN locations l ON u.a = l.b WHERE u.active = 1
FILTER: results filtering
More than one FILTER condition is allowed per query
The following queries are all equivalent
It is the optimizer's job to figure out the best positions for applying FILTERs
Query 1:
    FOR u IN users
      FILTER u.c == 1
      FOR l IN locations
        FILTER l.d == 2
        FILTER u.a == l.b
Query 2:
    FOR u IN users
      FOR l IN locations
        FILTER u.c == 1 && l.d == 2 && u.a == l.b
Query 3:
    FOR u IN users
      FOR l IN locations
        FILTER l.d == 2 && u.a == l.b
        FILTER u.c == 1
RETURN: results projection
The RETURN keyword produces the end result documents from the intermediate results produced by the query
Comparable to the SELECT part in an SQL query
The RETURN part is mandatory at the end of a query (and at the end of each subquery)
RETURN is partly left out in this presentation for space reasons
RETURN: results projection
Example:
    FOR u IN users
      RETURN {
        "name": u.name,
        "likes": u.likes,
        "numFriends": LENGTH(u.friends)
      }
Produces one such document for each u found
RETURN: results projection
To return all documents as they are in the original list, there is the following variant:
    FOR u IN users
      RETURN u
Would produce:
    [ { "name": "John", "age": 25 },
      { "name": "Tina", "age": 29 },
      ... ]
Note: this is similar to SQL's SELECT u.*
RETURN: results projection
To return just the names for all users, the following query would do:
    FOR u IN users
      RETURN u.name
Would produce:
    [ "John", "Tina", ... ]
Note: this is similar to SQL's SELECT u.name
RETURN: results projection
To return a hierarchical result (e.g. data from multiple collections), the following query could be used:
    FOR u IN users
      FOR l IN locations
        RETURN { "user": u, "location": l }
Would produce:
    [ { "user": { "name": "John", "age": 25 }, "location": { "x": 1, "y": -1 } },
      { "user": { "name": "Tina", "age": 29 }, "location": { "x": -2, "y": 3 } },
      ... ]
RETURN: results projection
To return a flat result from hierarchical data (e.g. data from multiple collections), the MERGE() function can be employed:
    FOR u IN users
      FOR l IN locations
        RETURN MERGE(u, l)
Would produce:
    [ { "name": "John", "age": 25, "x": 1, "y": -1 },
      { "name": "Tina", "age": 29, "x": -2, "y": 3 },
      ... ]
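What MERGE() does to a pair of documents can be sketched in Python as a dictionary merge. This is a hedged illustration only: the draft does not say what happens when both documents carry the same attribute name, so the sketch simply assumes that later arguments win; the helper name merge and the sample data are made up.

```python
def merge(*documents):
    """Flatten several documents into one (assumption: later arguments win on name clashes)."""
    merged = {}
    for doc in documents:
        merged.update(doc)
    return merged


# hypothetical sample data
users = [{"name": "John", "age": 25}]
locations = [{"x": 1, "y": -1}]

# FOR u IN users FOR l IN locations RETURN MERGE(u, l)
flat = [merge(u, l) for u in users for l in locations]
```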
SORT: Sorting
The SORT keyword will force a sort of the list of intermediate results according to one or multiple criteria
Example (sort by first and last name first, then by id):
    FOR u IN users
      FOR l IN locations
        SORT u.first, u.last, l.id DESC
This is very similar to ORDER BY in SQL
LIMIT: Result set slicing
The LIMIT keyword allows slicing the list of result documents using an offset and a count
Example for top 3 (offset = 0, count = 3):
    FOR u IN users
      SORT u.first, u.last
      LIMIT 0, 3
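SORT followed by LIMIT amounts to ordering the intermediate results and then slicing them. A minimal Python sketch, assuming ascending order, that every document has all sort attributes, and that their values are mutually comparable (the function name sort_and_limit and the sample data are invented for this illustration):

```python
def sort_and_limit(docs, sort_keys, offset, count):
    """Sketch of: SORT <sort_keys> LIMIT offset, count"""
    # SORT: order by the given attributes, ascending
    ordered = sorted(docs, key=lambda d: tuple(d[k] for k in sort_keys))
    # LIMIT offset, count: slice the ordered list
    return ordered[offset:offset + count]


# hypothetical sample data
users = [
    {"first": "Tina", "last": "Young"},
    {"first": "John", "last": "Smith"},
    {"first": "John", "last": "Doe"},
    {"first": "Bob", "last": "Miller"},
]

# FOR u IN users SORT u.first, u.last LIMIT 0, 3
top3 = sort_and_limit(users, ["first", "last"], 0, 3)
```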
LET: variable creation
The LET keyword can be used to create a variable using data from a subexpression (e.g. a FOR expression)
Example (will populate variable t with the result of the FOR):
    LET t = (
      FOR u IN users
    )
The parentheses are explicit scope bounds
This will populate t with:
    [ u1, u2, u3, u4, ... un ]
LET: variable creation
The results created using LET can be filtered afterwards using the FILTER keyword
This is then similar to the behaviour of HAVING in SQL
Example using a single collection (users):
    FOR u IN users
      LET friends = (
        FOR f IN u.friends
      )
      FILTER LENGTH(friends) > 5
The inner FOR iterates over an attribute ("friends") of each u
LENGTH() is a function to retrieve the length of a list
LET: variable creation
Example using two collections (users, friends):
    FOR u IN users
      LET friends = (
        FOR f IN friends
          FILTER u.id == f.uid
      )
      FILTER LENGTH(friends) > 5
Differences from the previous single-collection example:
replaced f IN u.friends with just f IN friends
added inner filter condition
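The two-collection LET example can be read as: build a per-user list in a sub-expression, then filter on its length. The Python sketch below illustrates that reading; the function name users_with_more_friends_than, the shape of the returned documents and the sample data are assumptions made for this example.

```python
def users_with_more_friends_than(users, friends, threshold):
    """Sketch of: FOR u IN users LET friends = (...) FILTER LENGTH(friends) > threshold"""
    result = []
    for u in users:                                           # FOR u IN users
        # LET friends = ( FOR f IN friends FILTER u.id == f.uid )
        user_friends = [f for f in friends if f["uid"] == u["id"]]
        if len(user_friends) > threshold:                     # FILTER LENGTH(friends) > threshold
            result.append({"user": u, "friends": user_friends})
    return result


# hypothetical sample data: user 1 has three friends, user 2 has one
users = [{"id": 1, "name": "John"}, {"id": 2, "name": "Tina"}]
friends = [{"uid": 1}, {"uid": 1}, {"uid": 1}, {"uid": 2}]

popular = users_with_more_friends_than(users, friends, 2)
```

Unlike the SQL GROUP BY / HAVING approach, the per-user list stays a real list and no aggregation step is needed to build it.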
LET: variable creation
SQL approach:
    SELECT u.*, GROUP_CONCAT(f.uid) AS friends
    FROM users u (INNER) JOIN friends f ON u.id = f.uid
    GROUP BY u.id
    HAVING COUNT(f.uid) > 5
Notes:
we are using 2 different tables now
the GROUP_CONCAT() aggregate function will create the friend list as a comma-separated string
need to use GROUP BY to aggregate
non-portable: GROUP_CONCAT is not part of standard SQL (it is available in MySQL, for example)
LET: variable creation
More complex example (selecting users along with logins and group membership):
    FOR u IN users
      LET logins = (
        FOR l IN logins_2012
          FILTER u.id == l.uid
      )
      LET groups = (
        FOR g IN group_memberships
          FILTER u.id == g.uid
      )
      RETURN { "user": u, "logins": logins, "groups": groups }
for each user, all of the user's logins are put into variable "logins"
for each user, all group memberships are put into variable "groups"
logins and groups are independent of each other
COLLECT: grouping
The COLLECT keyword can be used to group a list by one or multiple group criteria
Difference to SQL: in AvocadoDB COLLECT performs grouping, but no aggregation
Aggregation can be performed later using LET or RETURN
The result of COLLECT is a (grouped/hierarchical) list of documents, containing one document for each group
This document contains the group criteria values
The list of documents for the group can optionally be retrieved by using the INTO keyword
COLLECT: grouping
Example: retrieve the users per city (non-aggregated):
    FOR u IN users
      COLLECT city = u.city INTO g
      RETURN { "c": city, "u": g }
Produces the following result:
    [ { "c": "cgn", "u": [ { "u": {..} }, { "u": {..} }, { "u": {..} } ] },
      { "c": "ffm", "u": [ { "u": {..} }, { "u": {..} } ] },
      { "c": "ddf", "u": [ { "u": {..} } ] } ]
city = u.city is the group criterion (name: "city", value: u.city)
INTO g captures the group values into variable g
g contains all group members
COLLECT: grouping
Example: retrieve the number of users per city (aggregated):
    FOR u IN users
      COLLECT city = u.city INTO g
      RETURN { "c": city, "numUsers": LENGTH(g) }
Produces the following result:
    [ { "c": "cgn", "numUsers": 3 },
      { "c": "ffm", "numUsers": 2 },
      { "c": "ddf", "numUsers": 1 } ]
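The split between grouping (COLLECT ... INTO) and later aggregation (LENGTH in RETURN) can be sketched in Python. This is an illustration under assumptions: the helper name collect is invented, groups are returned in order of first appearance (the draft does not specify an order), and the sample users are made up so that the counts match the result above.

```python
def collect(docs, variable_name, criterion):
    """Sketch of COLLECT key = criterion(doc) INTO g: group without aggregating.

    Returns (key, group) pairs; each group member is wrapped as
    { variable_name: doc }, mirroring the { "u": {..} } entries shown above.
    """
    groups = {}
    for doc in docs:
        groups.setdefault(criterion(doc), []).append({variable_name: doc})
    return list(groups.items())


# hypothetical sample data
users = [
    {"name": "A", "city": "cgn"}, {"name": "B", "city": "ffm"},
    {"name": "C", "city": "cgn"}, {"name": "D", "city": "ddf"},
    {"name": "E", "city": "ffm"}, {"name": "F", "city": "cgn"},
]

# FOR u IN users COLLECT city = u.city INTO g RETURN { "c": city, "numUsers": LENGTH(g) }
per_city = [
    {"c": city, "numUsers": len(g)}
    for city, g in collect(users, "u", lambda u: u["city"])
]
```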
Aggregate functions
The query language should provide some aggregate functions, e.g.
MIN()
MAX()
SUM()
LENGTH()
Input to aggregate functions is a list of values to process. Example:
    [ { "user": { "type": 1, "rating": 1 } },
      { "user": { "type": 1, "rating": 4 } },
      { "user": { "type": 1, "rating": 3 } } ]
Problem: how to access the "user.rating" attribute of each value inside the aggregate function?
Aggregate functions
Solution 1: use the "access to all list members" shortcut:
    FOR u IN [ { "user": { "type": 1, "rating": 1 } },
               { "user": { "type": 1, "rating": 4 } },
               { "user": { "type": 1, "rating": 3 } } ]
      COLLECT type = u.user.type INTO g
      RETURN { "type": type, "maxRating": MAX(g[*].u.user.rating) }
g[*] will iterate over all elements in g and return each element's u.user.rating attribute
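The g[*] shortcut behaves like a projection over all group members, whose result is then fed into the aggregate function. A Python sketch using the sample list from the slide, grouping on the nested user.type attribute of that data (the variable names are chosen for this illustration only):

```python
docs = [
    {"user": {"type": 1, "rating": 1}},
    {"user": {"type": 1, "rating": 4}},
    {"user": {"type": 1, "rating": 3}},
]

# COLLECT type = <user type> INTO g: each group member is wrapped as { "u": doc }
groups = {}
for u in docs:
    groups.setdefault(u["user"]["type"], []).append({"u": u})

# MAX(g[*].u.user.rating): project the rating out of every member, then aggregate
result = [
    {"type": group_type, "maxRating": max(e["u"]["user"]["rating"] for e in g)}
    for group_type, g in groups.items()
]
```

With the three sample documents, all ratings fall into the single group for type 1.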
Aggregate functions
Solution 2: use FOR sub-expression to iterate over group elements
    FOR u IN users
      COLLECT city = u.city INTO g
      RETURN {
        "c": city,
        "numUsers": LENGTH(g),
        "maxRating": MAX((FOR e IN g RETURN e.user.rating))
      }
INTO g captures the group values
g is a variable containing all group members
(FOR e IN g ...) is a sub-expression to iterate over all elements in the group
Unions and intersections
Unions and intersections can be created by invoking functions on lists:
UNION(list1, list2)
INTERSECTION(list1, list2)
There will not be special keywords as in SQL
Graph queries
In AvocadoDB, relations between documents can be stored using graphs
Graphs can be used to model tree structures, networks etc.
Popular use cases:
find friends of friends
find similarities
find recommendations
Graph queries
In AvocadoDB, a graph is a composition of
vertices: the nodes in the graph
edges: the relations between nodes in the graph
Vertices are stored as documents in regular collections
Edges are stored as documents in special edge collections, with each edge having the following attributes:
_from: id of linked vertex (incoming relation)
_to: id of linked vertex (outgoing relation)
Additionally, all documents have an _id attribute
The _id values are used for linking in the edge collections
Graph queries
Task: find direct friends of users
Data: users are related (friend relationships) to other users
Example data (vertex collection "users"):
    [ { "_id": 123, "name": "John", "age": 25 },
      { "_id": 456, "name": "Tina", "age": 29 },
      { "_id": 235, "name": "Bob", "age": 15 },
      { "_id": 675, "name": "Phil", "age": 12 } ]
Example data (edge collection "relations"):
    [ { "_id": 1, "_from": 123, "_to": 456 },
      { "_id": 2, "_from": 123, "_to": 235 },
      { "_id": 3, "_from": 456, "_to": 123 },
      { "_id": 4, "_from": 456, "_to": 235 },
      { "_id": 5, "_from": 235, "_to": 456 },
      { "_id": 6, "_from": 235, "_to": 675 } ]
Graph queries
To traverse the graph, the PATHS function can be used
It traverses a graph's edges defined in an edge collection and produces a list of paths found
Each path object will have the following properties:
_from: id of vertex the path started at
_to: id of vertex the path ended with
_edges: edges visited along the path
_vertices: vertices visited along the path
Graph queries
Example:
    FOR u IN users
      LET friends = (
        FOR p IN PATHS(relations, OUTBOUND, 1)
          FILTER p._from == u._id
      )
PATHS arguments: edge collection: relations, direction: OUTBOUND, max path length: 1
path variable name: p
the FILTER only considers paths starting at the current user (using the user's _id attribute)
Graph queries
Produces:
    [ { "u": { "_id": 123, "name": "John", "age": 25 },
        "p": [ { "_from": 123, "_to": 456, ... },
               { "_from": 123, "_to": 235, ... } ] },
      { "u": { "_id": 456, "name": "Tina", "age": 29 },
        "p": [ { "_from": 456, "_to": 123, ... },
               { "_from": 456, "_to": 235, ... } ] },
      { "u": { "_id": 235, "name": "Bob", "age": 15 },
        "p": [ { "_from": 235, "_to": 456, ... },
               { "_from": 235, "_to": 675, ... } ] },
      { "u": { "_id": 675, "name": "Phil", "age": 12 },
        "p": [ ] } ]
Note: _edges and _vertices attributes for each p left out for space reasons
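How a PATHS-like function could enumerate outbound paths up to a maximum length can be sketched in Python using the example edge data. This is a guess at the semantics, not the AvocadoDB implementation: the function name outbound_paths is invented, vertices are represented by their ids only, and the sketch assumes that a path never revisits a vertex.

```python
relations = [
    {"_id": 1, "_from": 123, "_to": 456},
    {"_id": 2, "_from": 123, "_to": 235},
    {"_id": 3, "_from": 456, "_to": 123},
    {"_id": 4, "_from": 456, "_to": 235},
    {"_id": 5, "_from": 235, "_to": 456},
    {"_id": 6, "_from": 235, "_to": 675},
]


def outbound_paths(edges, max_length):
    """Enumerate all outbound paths with 1..max_length edges (depth-first)."""
    edges_by_source = {}
    for edge in edges:
        edges_by_source.setdefault(edge["_from"], []).append(edge)

    found = []

    def extend(path_edges, vertices):
        if path_edges:
            found.append({
                "_from": vertices[0],
                "_to": vertices[-1],
                "_edges": list(path_edges),
                "_vertices": list(vertices),   # vertex ids only in this sketch
            })
        if len(path_edges) == max_length:
            return
        for edge in edges_by_source.get(vertices[-1], []):
            if edge["_to"] in vertices:        # assumption: do not revisit vertices
                continue
            extend(path_edges + [edge], vertices + [edge["_to"]])

    for start in edges_by_source:              # every vertex with outgoing edges
        extend([], [start])
    return found


# FOR p IN PATHS(relations, OUTBOUND, 1) FILTER p._from == 123
johns_paths = [p for p in outbound_paths(relations, 1) if p["_from"] == 123]
```

With a maximum path length of 1, each path is a single edge, so the paths starting at John (123) end at Tina (456) and Bob (235), as in the result above.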
Summary: main keywords
Keyword             Use case
FOR ... IN          List iteration
FILTER              Results filtering
RETURN              Results projection
SORT                Sorting
LIMIT               Result set slicing
LET                 Variable creation
COLLECT ... INTO    Grouping
Q & A
Your feedback on the draft is highly appreciated
Please let us know what you think: [email protected]
[email protected]
@triagens.de
#AvocadoDB
And please try out AvocadoDB: http://www.avocadodb.org/
https://github.com/triAGENS/AvocadoDB