
Translating openCypher Queries to SQL: Working Draft

MÁRTON ELEKES, JÁNOS BENJAMIN ANTAL, JÓZSEF MARTON, and GÁBOR SZÁRNYAS

In the last decade, many database management systems have been developed using various NoSQL techniques. Graph databases are one group of these systems: they allow users to store and query their data as graphs. This data model is often a better fit than the traditional relational model for representing strongly interlinked data sets, and its conciseness can lead to better performance. That said, relational databases have been developed and optimized for almost 50 years, and it is an open question whether efficient processing of graph data requires specialized databases.

New query languages, such as openCypher, were developed for querying and processing graph data. These languages usually offer a more intuitive way to express graph queries than SQL-like languages. However, most enterprises still store their data in traditional relational databases, so using a graph database would require loading their data into it. This is often impractical or infeasible for production databases. Our goal is to let users write queries in expressive graph query languages while leveraging the performance of existing relational databases and avoiding the overhead of transferring data between different systems. To this end, we developed a transpiler that transforms openCypher graph queries into SQL.

Comparing the performance of database systems requires standard benchmarks. For relational databases, this need is met by the benchmarks of the Transaction Processing Performance Council. Because graph databases are relatively immature, only a limited number of benchmarks are available for graph query workloads. We joined the development of the LDBC (Linked Data Benchmark Council) Social Network Benchmark. We reworked and significantly improved existing implementations of its Interactive Workload, then performed a thorough evaluation and detailed analysis of database systems.

ACM Reference Format:
Márton Elekes, János Benjamin Antal, József Marton, and Gábor Szárnyas. 2019. Translating openCypher Queries to SQL: Working Draft. 1, 1 (March 2019), 11 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 PRELIMINARIES

1.1 Case study

As a running example, we use the small social network shown in Figure 1.

[Figure 1: Alice, 24 years old, speaks English; Bob, 53 years old, speaks English and German. Alice knows Bob and is interested in Neofolk. Neofolk has class Folk, Folk is a subclass of Music, and Music is a subclass of Art.]

Fig. 1. Example graph

Authors’ address: Márton Elekes; János Benjamin Antal; József Marton, [email protected]; Gábor Szárnyas, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2019 Association for Computing Machinery.
Manuscript submitted to ACM


2 Márton Elekes, János Benjamin Antal, József Marton, and Gábor Szárnyas

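[Figure 2: the example graph with vertex identifiers, labels, and properties.
a :Person:Student {name: 'Alice', age: 24, speaks: ['en']}
b :Person {name: 'Bob', age: 53, speaks: ['en', 'de']}
c :Tag {topic: 'Neofolk'}
d :Class {subject: 'Folk'}, e :Class {subject: 'Music'}, f :Class {subject: 'Art'}
Edges: 1 (a)-[:KNOWS {since: 2014}]->(b), 2 (a)-[:INTEREST {level: 4}]->(c), 3 (c)-[:CLASS]->(d), 4 (d)-[:SUBCLASS_OF]->(e), 5 (e)-[:SUBCLASS_OF]->(f)]

Fig. 2. Example graph visualized. ◇ Person, ◻ Tag, △ Class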

1.2 Data models

Data model | Node types | Node props | Edge types | Edge props
directed graph | ○ | ○ | ○ | ○
labelled graph | ● | ○ | ○ | ○
semantic graph | ● | ∗ | ● | ○
object-oriented model | ● | ● | ● | ○
property graph | ● | ● | ● | ●

Table 1. Comparison of data models

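$$
\begin{aligned}
L &= \{\text{Person}, \text{Student}, \text{Class}, \ldots\} \\
T &= \{\text{KNOWS}, \text{INTEREST}, \text{CLASS}, \text{SUBCLASS\_OF}\} \\
P_v &= \{\text{name}, \text{speaks}, \text{age}, \ldots\} \\
P_e &= \{\text{since}, \text{level}\} \\
V &= \{a, b, c, d, e, f\} \\
E &= \{1, 2, 3, 4, 5\} \\
\mathrm{st} &: 1 \mapsto \langle a, b \rangle,\; 2 \mapsto \langle a, c \rangle, \ldots \\
\mathrm{lbl} &: b \mapsto \{\text{Person}\},\; a \mapsto \{\text{Person}, \text{Student}\}, \ldots \\
\mathrm{typ} &: 1 \mapsto \text{KNOWS},\; 3 \mapsto \text{CLASS}, \ldots \\
\mathrm{name} &: a \mapsto \text{“Alice”},\; b \mapsto \text{“Bob”},\; e \mapsto \text{NULL}, \ldots \\
\mathrm{since} &: 1 \mapsto 2014,\; 2 \mapsto \text{NULL},\; 3 \mapsto \text{NULL}, \ldots \\
&\;\;\vdots
\end{aligned}
$$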
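Here is a minimal sketch that encodes the definition above as Python data, which makes it easy to check the example by hand. The `PropertyGraph` class and its field names are hypothetical and exist only for this illustration. Edge endpoints follow the edge relation of Figure 6. Vertex and edge properties share one dictionary because, in this example, vertex identifiers (letters) and edge identifiers (numbers) never collide. A missing property is modelled as `None`, which stands in for NULL.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class PropertyGraph:
    """Hypothetical encoding of the example graph: V, E, st, lbl, typ and the
    property functions, the latter folded into one (element, key) -> value map."""
    V: Set[str]
    E: Set[int]
    st: Dict[int, Tuple[str, str]]   # edge -> (source vertex, target vertex)
    lbl: Dict[str, Set[str]]         # vertex -> set of labels from L
    typ: Dict[int, str]              # edge -> type from T
    props: Dict[Tuple[Any, str], Any] = field(default_factory=dict)

    def prop(self, element: Any, key: str) -> Any:
        # A property that is not set evaluates to NULL, modelled as None.
        return self.props.get((element, key))


G = PropertyGraph(
    V={"a", "b", "c", "d", "e", "f"},
    E={1, 2, 3, 4, 5},
    st={1: ("a", "b"), 2: ("a", "c"), 3: ("c", "d"), 4: ("d", "e"), 5: ("e", "f")},
    lbl={"a": {"Person", "Student"}, "b": {"Person"}, "c": {"Tag"},
         "d": {"Class"}, "e": {"Class"}, "f": {"Class"}},
    typ={1: "KNOWS", 2: "INTEREST", 3: "CLASS", 4: "SUBCLASS_OF", 5: "SUBCLASS_OF"},
    props={("a", "name"): "Alice", ("a", "age"): 24, ("a", "speaks"): ["en"],
           ("b", "name"): "Bob", ("b", "age"): 53, ("b", "speaks"): ["en", "de"],
           ("c", "topic"): "Neofolk", ("d", "subject"): "Folk",
           ("e", "subject"): "Music", ("f", "subject"): "Art",
           (1, "since"): 2014, (2, "level"): 4},
)

print(G.st[1], G.prop("e", "name"), G.prop(2, "since"))  # ('a', 'b') None None
```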

1.2.1 Relational data model. The relational model of the example is shown in Figure 3.

1.3 Query languages

This section uses the following example query: “return the name of each person and the number of their friends”.

1.3.1 Relational algebra. [4]

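Example. The example query can be expressed in relational algebra as:

$$\gamma^{\text{p.name}}_{\text{p.name},\;\text{count}(f)}\Big(\pi_{p,\;\text{p.name}}(\text{Persons}) \mathbin{\text{⟕}} \big(\pi_{p,\;f}(\text{Knows}) \bowtie \pi_{f}(\text{Persons})\big)\Big)$$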

1.3.2 Cypher. Cypher [1, 2]

Example. The example query in Cypher.

MATCH (p:Person)
OPTIONAL MATCH (p)-[:KNOWS]->(f:Person)
RETURN p.name, count(f)

Listing 1. Cypher example code


Translating openCypher Queries to SQL 3

id | name | age
a | Alice | 24
b | Bob | 53

(a) Persons

personId
a

(b) Students

id | topic
c | Neofolk

(c) Tags

id | subject
d | Folk
e | Music
f | Art

(d) Classes

tag | class
c | d

(e) HasClass

src | trg
d | e
e | f

(f) SubclassOf

src | trg | since
a | b | 2014

(g) Knows

person | tag | level
a | c | 4

(h) Interests

personId | lang
a | en
b | de
b | en

(i) Speaks

Fig. 3. Relational model for the example

[Figure 4: three data models connected pairwise by Object-Relational, Object-Graph, and Graph-Relational mappings. Objects: instances of Class1 and Class2, each with attributes attr1: String and attr2: int. Relation: a table with columns Key, Col2, Col3 and rows (0, A, 24) and (1, B, 53). Graph: a :Student node (name: 'Alice') and two :Course nodes (name: 'Math1' and name: 'Math2'), connected by ATTEND and PREREQUISITE edges.]

Fig. 4. Mapping between object-oriented, relational, and graph data models

vertex(vertex_id BIGINT)
edge(edge_id BIGINT, from BIGINT, to BIGINT, type TEXT)
label(parent BIGINT, name TEXT)
vertex_property(parent BIGINT, key TEXT, value JSONB)
edge_property(parent BIGINT, key TEXT, value JSONB)

Fig. 5. Relational schema for representing property graphs. Key attributes and many-to-one relationships are marked with symbols; foreign keys are denoted with a double underline.

Example. The example query in SQL, based on the tables of Figure 3:

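SELECT p.name, COUNT(f.id)
FROM persons AS p
-- nesting the inner join keeps persons without friends (with NULLs), so COUNT(f.id) = 0 for them
LEFT JOIN (knows JOIN persons AS f ON knows.trg = f.id)
  ON p.id = knows.src
GROUP BY p.name;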

Listing 2. SQL example code
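To sanity-check the query, the following sketch loads the Persons and Knows tables of Figure 3 into an in-memory SQLite database using Python's standard `sqlite3` module. SQLite serves only as a convenient stand-in for the relational engine. For portability, the inner join is written as a derived table rather than a parenthesized join; this rewrite is an assumption of the sketch and is equivalent in meaning.

```python
import sqlite3

# In-memory SQLite stand-in for the Persons and Knows tables of Figure 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE knows (src TEXT, trg TEXT, since INTEGER);
    INSERT INTO persons VALUES ('a', 'Alice', 24), ('b', 'Bob', 53);
    INSERT INTO knows VALUES ('a', 'b', 2014);
""")

# Persons LEFT OUTER JOIN (Knows JOIN Persons), grouped by name.
# Persons without friends survive the outer join with a NULL friend,
# which COUNT(...) ignores, so they get a count of 0.
rows = conn.execute("""
    SELECT p.name, COUNT(fr.f) AS friends
    FROM persons AS p
    LEFT JOIN (SELECT k.src AS p, f.id AS f
               FROM knows AS k JOIN persons AS f ON k.trg = f.id) AS fr
           ON p.id = fr.p
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)  # [('Alice', 1), ('Bob', 0)]
```

If the friend join were a plain inner join applied after the outer join, Bob's row would disappear instead of reporting 0 friends.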

2 MAPPING GRAPH QUERIES TO RELATIONAL DATABASES

2.1 Requirements for mapping

Cypher has several language constructs that only a few SQL dialects support.


vertex_id
a
b
c
d
e
f

(a) vertex relation

parent | name
a | Person
a | Student
b | Person
c | Tag
d | Class
e | Class
f | Class

(b) label relation

parent | key | value
a | name | "Alice"
a | age | 24
a | speaks | ["en"]
b | name | "Bob"
b | age | 53
b | speaks | ["en", "de"]
. . .

(c) vertex_property relation

edge_id | from | to | type
1 | a | b | KNOWS
2 | a | c | INTEREST
3 | c | d | CLASS
4 | d | e | SUBCLASS_OF
5 | e | f | SUBCLASS_OF

(d) edge relation

parent | key | value
1 | since | 2014
2 | level | 4

(e) edge_property relation

Fig. 6. Relations representing the example property graph
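The mapping from a property graph to the generic relations of Figures 5 and 6 can be sketched as a small Python function. The name `to_relations` and the dictionary layout of its input are hypothetical. Property values are serialized with `json.dumps` to mimic the JSONB `value` columns.

```python
import json

# Compact copy of the example graph (vertex ids, labels, properties, edges).
labels = {"a": ["Person", "Student"], "b": ["Person"], "c": ["Tag"],
          "d": ["Class"], "e": ["Class"], "f": ["Class"]}
vertex_props = {"a": {"name": "Alice", "age": 24, "speaks": ["en"]},
                "b": {"name": "Bob", "age": 53, "speaks": ["en", "de"]},
                "c": {"topic": "Neofolk"}, "d": {"subject": "Folk"},
                "e": {"subject": "Music"}, "f": {"subject": "Art"}}
edges = {1: ("a", "b", "KNOWS", {"since": 2014}),
         2: ("a", "c", "INTEREST", {"level": 4}),
         3: ("c", "d", "CLASS", {}),
         4: ("d", "e", "SUBCLASS_OF", {}),
         5: ("e", "f", "SUBCLASS_OF", {})}


def to_relations(labels, vertex_props, edges):
    """Shred a property graph into the five generic relations of Figure 5.
    Values are serialized with json.dumps to mimic the JSONB value columns."""
    vertex = [(v,) for v in labels]
    label = [(v, name) for v, names in labels.items() for name in names]
    vertex_property = [(v, key, json.dumps(val))
                       for v, ps in vertex_props.items() for key, val in ps.items()]
    edge = [(e, src, trg, typ) for e, (src, trg, typ, _) in edges.items()]
    edge_property = [(e, key, json.dumps(val))
                     for e, (_, _, _, ps) in edges.items() for key, val in ps.items()]
    return vertex, label, vertex_property, edge, edge_property


vertex, label, vertex_property, edge, edge_property = to_relations(labels, vertex_props, edges)
print(edge_property)  # [(1, 'since', '2014'), (2, 'level', '4')]
```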

Fig. 7. Translating Cypher queries to SQL

Handling arrays.¹ ² ³

Dynamic typing.

Recursive queries.

2.2 Mapping Cypher queries to SQL (C2S)

2.2.1 Mapping workflow.

2.2.2 Get-vertices.

Example. Collect all students’ names and ids

MATCH (p:Person:Student)
RETURN p, p.name

p | p.name
a | Alice

$$\bigcirc^{\text{Person},\,\text{Student}}_{p,\;\text{p.name}}$$

SELECT vertex_id AS "p",
       (SELECT "value" FROM vertex_property
        WHERE parent = vertex_id AND key = 'name') AS "p.name"
FROM vertex
WHERE NOT EXISTS(VALUES ('Person'), ('Student')
                 EXCEPT ALL
                 SELECT name FROM label WHERE parent = vertex_id)
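The `NOT EXISTS(VALUES ... EXCEPT ALL ...)` condition keeps a vertex exactly when subtracting the vertex's own labels from the multiset of required labels leaves nothing. Below is a sketch of the same check in Python, assuming the relations of Figure 6 are held in plain lists. Subtraction on `collections.Counter` plays the role of `EXCEPT ALL`. The helper name `get_vertices` is hypothetical.

```python
from collections import Counter

# Label and vertex_property relations of Figure 6 (excerpt).
VERTEX = ["a", "b", "c", "d", "e", "f"]
LABEL = [("a", "Person"), ("a", "Student"), ("b", "Person"), ("c", "Tag"),
         ("d", "Class"), ("e", "Class"), ("f", "Class")]
VERTEX_PROPERTY = [("a", "name", "Alice"), ("a", "age", 24),
                   ("b", "name", "Bob"), ("b", "age", 53)]


def get_vertices(variable, required_labels, attributes):
    rows = []
    for v in VERTEX:
        own = Counter(name for parent, name in LABEL if parent == v)
        # Counter subtraction is multiset difference, i.e. EXCEPT ALL:
        # an empty result means every required label is present (NOT EXISTS).
        if Counter(required_labels) - own:
            continue
        row = {variable: v}
        for key in attributes:
            # Scalar subquery: the property value, or NULL (None) if absent.
            row[f"{variable}.{key}"] = next(
                (val for parent, k, val in VERTEX_PROPERTY
                 if parent == v and k == key), None)
        rows.append(row)
    return rows


print(get_vertices("p", ["Person", "Student"], ["name"]))  # [{'p': 'a', 'p.name': 'Alice'}]
```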

2.2.3 Get-edges.

¹ https://www.postgresql.org/docs/10/static/arrays.html
² https://dev.mysql.com/worklog/task/?id=2081
³ https://www.sqlite.org/datatype3.html


Example. Collect students, their interests and interest levels

MATCH (s:Student)-[i:INTEREST]->(t)
RETURN s, i, i.level, t, t.topic

s | i | i.level | t | t.topic
a | 2 | 4 | c | Neofolk

$$\left[\bigcirc^{\text{Student}}_{s} \xrightarrow[i,\;\text{i.level}]{\text{INTEREST}} \bigcirc_{t,\;\text{t.topic}}\right]$$

SELECT "from" AS "s", edge_id AS "i", "to" AS "t",
       (SELECT "value" FROM edge_property
        WHERE parent = edge_id AND key = 'level') AS "i.level",
       (SELECT "value" FROM vertex_property
        WHERE parent = "to" AND key = 'topic') AS "t.topic"
FROM edge
WHERE type IN ('INTEREST') AND
      NOT EXISTS(VALUES ('Student')
                 EXCEPT ALL
                 SELECT name FROM label WHERE parent = "from")

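$$\left[\bigcirc^{L_1}_{v} \xleftrightarrow[e]{T} \bigcirc^{L_2}_{w}\right] \equiv \left[\bigcirc^{L_1}_{v} \xrightarrow[e]{T} \bigcirc^{L_2}_{w}\right] \cup \left[\bigcirc^{L_2}_{w} \xrightarrow[e]{T} \bigcirc^{L_1}_{v}\right]$$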
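Below is a sketch of the get-edges operator over the edge relation of Figure 6. It includes the undirected variant, expressed as the union of the two directed scans with the endpoint roles swapped. The function names are hypothetical, and the label check is simplified to a subset test on in-memory sets.

```python
# Edge relation of Figure 6 and the labels of each vertex.
EDGE = [(1, "a", "b", "KNOWS"), (2, "a", "c", "INTEREST"), (3, "c", "d", "CLASS"),
        (4, "d", "e", "SUBCLASS_OF"), (5, "e", "f", "SUBCLASS_OF")]
LABEL = {"a": {"Person", "Student"}, "b": {"Person"}, "c": {"Tag"},
         "d": {"Class"}, "e": {"Class"}, "f": {"Class"}}


def get_edges(types, src_labels=(), trg_labels=()):
    """Directed pattern (v:L1)-[e:T]->(w:L2), returned as (v, e, w) rows."""
    return [(src, e, trg) for e, src, trg, t in EDGE
            if t in types
            and set(src_labels) <= LABEL[src]
            and set(trg_labels) <= LABEL[trg]]


def get_undirected_edges(types, labels1=(), labels2=()):
    """(v:L1)-[e:T]-(w:L2) as the UNION ALL of both directions: the second scan
    matches (w:L2)-[e:T]->(v:L1), so its endpoints are swapped back to (v, e, w)."""
    forward = get_edges(types, labels1, labels2)
    backward = [(trg, e, src) for src, e, trg in get_edges(types, labels2, labels1)]
    return forward + backward


print(get_edges({"INTEREST"}, src_labels={"Student"}))  # [('a', 2, 'c')]
print(get_undirected_edges({"KNOWS"}))                  # [('a', 1, 'b'), ('b', 1, 'a')]
```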

2.2.4 Selection and Projection.

Example. Unique names of persons under 30 years old

MATCH (p:Person)
WHERE p.age < 30
RETURN DISTINCT p.name AS name

name
Alice

$$\delta\Big(\underbrace{\pi_{\text{p.name}\rightarrow\text{name}}\big(\underbrace{\sigma_{\text{p.age}<30}\big(\underbrace{\bigcirc^{\text{Person}}_{p,\;\text{p.age},\;\text{p.name}}}_{q_0}\big)}_{q_1}\big)}_{q_2}\Big)$$

WITH q0 AS ( -- GetVertices
  SELECT vertex_id AS "p",
         (SELECT value FROM vertex_property
          WHERE parent = vertex_id AND key = 'age') AS "p.age",
         (SELECT value FROM vertex_property
          WHERE parent = vertex_id AND key = 'name') AS "p.name"
  FROM vertex
  WHERE NOT EXISTS(VALUES ('Person')
                   EXCEPT ALL SELECT name FROM label WHERE parent = vertex_id)),
q1 AS ( -- Selection
  SELECT * FROM q0 WHERE "p.age" < 30),
q2 AS ( -- Projection
  SELECT "p.name" AS "name" FROM q1)
-- DuplicateElimination
SELECT DISTINCT * FROM q2
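The CTE chain evaluates one operator per step. The sketch below runs the same pipeline in Python over a few rows of Figure 6, which makes each intermediate result explicit. The in-memory layout of the relations is an assumption of the sketch.

```python
# Excerpt of the label and vertex_property relations of Figure 6.
LABEL = {"a": {"Person", "Student"}, "b": {"Person"}, "c": {"Tag"}}
PROPS = {("a", "name"): "Alice", ("a", "age"): 24,
         ("b", "name"): "Bob", ("b", "age"): 53}

# q0: GetVertices (p:Person) with attributes p.age and p.name
q0 = [{"p": v, "p.age": PROPS.get((v, "age")), "p.name": PROPS.get((v, "name"))}
      for v, labels in LABEL.items() if "Person" in labels]
# q1: Selection p.age < 30 (as in SQL, a NULL age never satisfies the predicate)
q1 = [row for row in q0 if row["p.age"] is not None and row["p.age"] < 30]
# q2: Projection p.name AS name
q2 = [{"name": row["p.name"]} for row in q1]
# DuplicateElimination: one row per distinct value, in first-seen order
result = list({tuple(sorted(row.items())): row for row in q2}.values())
print(result)  # [{'name': 'Alice'}]
```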

2.2.5 Grouping and unwinding.


Example. Collect the languages and the names of their speakers

MATCH (p:Person)
WITH p, p.name AS name
UNWIND p.speaks AS lang
RETURN lang, collect(name) AS speakers

lang | speakers
en | [Alice, Bob]
de | [Bob]

$$\gamma^{\text{lang}}_{\text{lang},\;\text{collect}(\text{name})\rightarrow\text{speakers}}\Big(\underbrace{\omega_{\text{p.speaks}\rightarrow\text{lang}}\big(\underbrace{\pi_{p,\;\text{p.speaks},\;\text{p.name}\rightarrow\text{name}}\big(\underbrace{\bigcirc^{\text{Person}}_{p,\;\text{p.speaks},\;\text{p.name}}}_{q_0}\big)}_{q_1}\big)}_{q_2}\Big)$$

WITH q0 AS ( /* GetVertices: (p:Person) | attributes: p.speaks, p.name */ ),
q1 AS ( /* Projection: p, p.speaks, p.name AS name */ ),
q2 AS ( -- Unwind
  SELECT "p", "name", unnest("p.speaks") AS "lang"
  FROM q1)
-- Grouping
SELECT "lang", array_agg("name") AS "speakers"
FROM q2
GROUP BY "lang"
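Below is a sketch of the unwind and grouping steps applied to the rows of q1, in Python. `unnest` becomes a nested loop, and `array_agg ... GROUP BY` becomes a dictionary of lists. The input rows are taken from the example graph; the row layout is an assumption of the sketch.

```python
from collections import defaultdict

# q1: projected rows (p, p.speaks, name) for the two Person vertices.
q1 = [{"p": "a", "p.speaks": ["en"], "name": "Alice"},
      {"p": "b", "p.speaks": ["en", "de"], "name": "Bob"}]

# q2: Unwind, one output row per list element (like unnest in SQL).
q2 = [{"p": row["p"], "name": row["name"], "lang": lang}
      for row in q1 for lang in row["p.speaks"]]

# Grouping on lang, collecting names (like array_agg ... GROUP BY "lang").
speakers = defaultdict(list)
for row in q2:
    speakers[row["lang"]].append(row["name"])
print(dict(speakers))  # {'en': ['Alice', 'Bob'], 'de': ['Bob']}
```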

2.2.6 Natural join.

Example. Friends of friends of Alice

MATCH (p:Person {name: 'Alice'})<-[e1:KNOWS]->(f)<-[e2:KNOWS]->(foaf)
RETURN foaf.name

foaf.name
∅

$$\pi_{\text{foaf.name}}\Bigg(\underbrace{\not\equiv_{e_1,\,e_2}\Big(\underbrace{\underbrace{\sigma_{\text{p.name}=\text{“Alice”}}\big(\underbrace{\left[\bigcirc^{\text{Person}}_{p,\;\text{p.name}} \xleftrightarrow[e_1]{\text{KNOWS}} \bigcirc_{f}\right]}_{q_2 = q_0 \cup q_1}\big)}_{q_3} \bowtie \underbrace{\left[\bigcirc_{f} \xleftrightarrow[e_2]{\text{KNOWS}} \bigcirc_{\text{foaf},\;\text{foaf.name}}\right]}_{q_6 = q_4 \cup q_5}}_{q_7}\Big)}_{q_8}\Bigg)$$

WITH q0 AS ( -- GetEdges: (p:Person)-[e1:KNOWS]->(f) | attributes: p.name
  SELECT "from" AS "p", edge_id AS "e1", "to" AS "f", ⋅⋅⋅ AS "p.name" FROM edge ⋅⋅⋅),
q1 AS ( -- GetEdges: (p:Person)<-[e1:KNOWS]-(f) | attributes: p.name
  SELECT "from" AS "f", edge_id AS "e1", "to" AS "p", ⋅⋅⋅ AS "p.name" FROM edge ⋅⋅⋅),
q2 AS ( -- UnionAll: q0 ∪ q1
  SELECT ⋅⋅⋅ FROM q0 UNION ALL SELECT ⋅⋅⋅ FROM q1),
q3 AS ( -- Selection: p.name = 'Alice'
  SELECT * FROM q2 WHERE ("p.name" = 'Alice')),
q4 AS ( -- GetEdges: (f)-[e2:KNOWS]->(foaf) | attributes: foaf.name
  SELECT "from" AS "f", edge_id AS "e2", "to" AS "foaf", ⋅⋅⋅ FROM edge ⋅⋅⋅),
q5 AS ( -- GetEdges: (f)<-[e2:KNOWS]-(foaf) | attributes: foaf.name
  SELECT "from" AS "foaf", edge_id AS "e2", "to" AS "f", ⋅⋅⋅ FROM edge ⋅⋅⋅),
q6 AS ( -- UnionAll: q4 ∪ q5
  SELECT ⋅⋅⋅ FROM q4 UNION ALL SELECT ⋅⋅⋅ FROM q5),
q7 AS ( -- Join
  SELECT "left"."p", "left"."p.name", "left"."e1", "left"."f",
         "right"."e2", "right"."foaf", "right"."foaf.name"
  FROM q3 AS "left" INNER JOIN q6 AS "right" ON "left"."f" = "right"."f"),
q8 AS ( -- AllDifferent
  SELECT * FROM q7 WHERE is_unique(ARRAY["e1", "e2"]))
-- Projection
SELECT "foaf.name" FROM q8
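The example graph shows why the all-different step matters. Without it, the two hops could traverse the single KNOWS edge out and back, returning Alice as her own friend of a friend. Below is a Python sketch of q3 to q8; the in-memory rows and names are illustrative.

```python
# Undirected KNOWS edges as (v, edge, w) rows in both directions (q2 / q6).
KNOWS = [("a", 1, "b")]
undirected = KNOWS + [(w, e, v) for v, e, w in KNOWS]
NAME = {"a": "Alice", "b": "Bob"}

# q3: first hop, restricted to p.name = 'Alice'
q3 = [{"p": v, "e1": e, "f": w} for v, e, w in undirected if NAME.get(v) == "Alice"]
# q6: second hop, with its endpoints renamed to f and foaf
q6 = [{"f": v, "e2": e, "foaf": w} for v, e, w in undirected]
# q7: natural join on the shared variable f
q7 = [{**l, **r} for l in q3 for r in q6 if l["f"] == r["f"]]
# q8: AllDifferent, since Cypher never binds two edge variables to the same edge
q8 = [row for row in q7 if len({row["e1"], row["e2"]}) == 2]

print([NAME[row["foaf"]] for row in q7])  # ['Alice'] (back over edge 1)
print([NAME[row["foaf"]] for row in q8])  # []
```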

2.2.7 Antijoin.


Example. Categories without a superclass

MATCH (c:Class)
WHERE NOT (c)-[:SUBCLASS_OF]->()
RETURN c.subject

c.subject
Art

$$\pi_{\text{c.subject}}\Big(\underbrace{\underbrace{\bigcirc^{\text{Class}}_{c,\;\text{c.subject}}}_{q_0} \mathbin{\triangleright} \underbrace{\left[\bigcirc_{c} \xrightarrow[e_1]{\text{SUBCLASS\_OF}} \bigcirc_{v_1}\right]}_{q_1}}_{q_2}\Big)$$

WITH q0 AS ( /* GetVertices: (c:Class) | attributes: c.subject */ ),
q1 AS ( /* GetEdges: (c)-[e1:SUBCLASS_OF]->(v1) */ ),
q2 AS ( -- AntiJoin
  SELECT * FROM q0 AS "left"
  WHERE NOT EXISTS(
    SELECT 1 FROM q1 AS "right"
    WHERE "left"."c" = "right"."c"))
-- Projection
SELECT "c.subject" FROM q2
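Below is a sketch of the antijoin in Python, assuming q0 and q1 have already been computed as lists of rows for the example graph. The row layout is illustrative.

```python
# q0: GetVertices (c:Class) with attribute c.subject
q0 = [{"c": "d", "c.subject": "Folk"}, {"c": "e", "c.subject": "Music"},
      {"c": "f", "c.subject": "Art"}]
# q1: GetEdges (c)-[e1:SUBCLASS_OF]->(v1), from edges 4 and 5 of Figure 6
q1 = [{"c": "d", "e1": 4, "v1": "e"}, {"c": "e", "e1": 5, "v1": "f"}]

# q2: AntiJoin keeps the left rows that have no right row with the same c,
# like NOT EXISTS (SELECT 1 FROM q1 WHERE left.c = right.c).
q2 = [l for l in q0 if not any(r["c"] == l["c"] for r in q1)]
print([row["c.subject"] for row in q2])  # ['Art']
```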

2.2.8 Transitive join.

Example. Persons reachable from Bob in at most 6 steps

MATCH (p:Person {name: 'Bob'})<-[el:KNOWS*1..6]->(foaf)
RETURN foaf.name

foaf.name
Alice

$$\pi_{\text{foaf.name}}\Bigg(\underbrace{\not\equiv_{\mathit{el}}\Big(\underbrace{\underbrace{\underbrace{\sigma_{\text{p.name}=\text{“Bob”}}\big(\underbrace{\bigcirc^{\text{Person}}_{p,\;\text{p.name}}}_{q_0}\big)}_{q_1} \mathbin{\bowtie^{*}_{1..6}} \underbrace{\left[\bigcirc_{p} \xleftrightarrow[\mathit{el}]{\text{KNOWS}} \bigcirc_{\text{foaf}}\right]}_{q_4 = q_2 \cup q_3}}_{q_5} \bowtie \underbrace{\bigcirc_{\text{foaf},\;\text{foaf.name}}}_{q_6}}_{q_7}\Big)}_{q_8}\Bigg)$$

WITH q0 AS ( /* GetVertices: (p:Person) | attributes: p.name */ ),
q1 AS ( /* Selection: p.name = 'Bob' */ ),
⋅⋅⋅
q4 AS ( /* GetEdges: (current_from)<-[current_edge:KNOWS]->(current_to) */ ),
q5 AS ( -- TransitiveJoin
  WITH RECURSIVE recursive_table AS (
    SELECT "p" AS start_vertex,
           ARRAY[]::BIGINT[] AS edge_list,
           "p" AS end_vertex,
           "p.name"
    FROM q1
    UNION ALL
    SELECT start_vertex,
           (edge_list || current_edge),
           current_to AS end_vertex,
           "p.name"
    FROM q4 INNER JOIN recursive_table
      ON "current_edge" <> ALL (edge_list) AND  -- no edge twice on a path
         end_vertex = current_from AND
         -- cardinality() is 0 for the empty array; array_length(edge_list, 1) would be NULL
         cardinality(edge_list) < 6)
  SELECT start_vertex AS "p",
         edge_list AS "el",
         end_vertex AS "foaf",
         "p.name"
  FROM recursive_table
  WHERE cardinality(edge_list) >= 1),
q6 AS ( /* GetVertices: (foaf) | attributes: foaf.name */ ),
q7 AS ( /* Join: q5 ⋈ q6 */ ),
q8 AS ( -- AllDifferent
  SELECT * FROM q7)
-- Projection
SELECT "foaf.name" FROM q8
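The recursive CTE grows paths one edge at a time until the hop limit is reached. Below is a Python sketch of the same iteration; the function name `transitive_join` and the row layout are hypothetical. The path-length checks mirror the bounds on `edge_list` in the CTE, and the `e not in path` test mirrors `"current_edge" <> ALL (edge_list)`.

```python
# Undirected KNOWS edges (q4) as (current_from, current_edge, current_to) rows.
KNOWS = [("a", 1, "b")]
q4 = KNOWS + [(w, e, v) for v, e, w in KNOWS]
NAME = {"a": "Alice", "b": "Bob"}


def transitive_join(starts, edges, min_hops=1, max_hops=6):
    """Iterate like WITH RECURSIVE: extend each partial path by one edge whose
    source is the path's current end vertex, never reusing an edge on the path."""
    frontier = [(s, [], s) for s in starts]  # (start_vertex, edge_list, end_vertex)
    results = [row for row in frontier if len(row[1]) >= min_hops]
    while frontier:
        frontier = [(start, path + [e], to)
                    for start, path, end in frontier if len(path) < max_hops
                    for frm, e, to in edges if frm == end and e not in path]
        results += [row for row in frontier if len(row[1]) >= min_hops]
    return results


q5 = transitive_join(["b"], q4)                     # starting from Bob (q1)
print([(p, el, NAME[foaf]) for p, el, foaf in q5])  # [('b', [1], 'Alice')]
```

Because each step adds an edge that is not yet on the path and the path length is capped, the loop always terminates, even on cyclic graphs.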


2.2.9 Mapping optional graph patterns.

Example. Persons and the topics they are interested in with an interest level above 3 (if there are any)

MATCH (p:Person)
OPTIONAL MATCH (p)-[i:INTEREST]->(t:Tag)
WHERE i.level > 3
RETURN p.name, t.topic

p.name | t.topic
Alice | Neofolk
Bob | NULL

$$\pi_{\text{p.name},\;\text{t.topic}}\Big(\underbrace{\underbrace{\bigcirc^{\text{Person}}_{p,\;\text{p.name}}}_{q_0} \mathbin{\text{⟕}}_{\text{i.level}>3} \underbrace{\left[\bigcirc_{p} \xrightarrow[i,\;\text{i.level}]{\text{INTEREST}} \bigcirc^{\text{Tag}}_{t,\;\text{t.topic}}\right]}_{q_1}}_{q_2}\Big)$$

WITH q0 AS (/* GetVertices: (p:Person) | attributes: p.name */),
q1 AS (/* GetEdges: (p)-[i:INTEREST]->(t:Tag) | attributes: i.level, t.topic */),
q2 AS (-- ThetaLeftOuterJoin
  SELECT "left"."p", "left"."p.name",
         "right"."i.level", "right"."i", "right"."t", "right"."t.topic"
  FROM q0 AS "left" LEFT OUTER JOIN q1 AS "right"
    ON "left"."p" = "right"."p" AND "i.level" > 3)
-- Projection
SELECT "p.name", "t.topic" FROM q2;
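The condition `i.level > 3` sits in the `ON` clause, not in a `WHERE` clause. A filter applied after the outer join would drop Bob's NULL-padded row, because `NULL > 3` is not true. Below is a Python sketch of a theta left outer join that illustrates this; the helper name and row layout are hypothetical.

```python
# q0: GetVertices (p:Person) with p.name
q0 = [{"p": "a", "p.name": "Alice"}, {"p": "b", "p.name": "Bob"}]
# q1: GetEdges (p)-[i:INTEREST]->(t:Tag) with i.level and t.topic
q1 = [{"p": "a", "i": 2, "i.level": 4, "t": "c", "t.topic": "Neofolk"}]


def theta_left_outer_join(left, right, key, theta, right_columns):
    """Keep every left row; pad with NULLs (None) when no right row matches.
    theta is part of the join condition, so it never removes left rows."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key] and theta(r)]
        if matches:
            out += [{**l, **{c: r[c] for c in right_columns}} for r in matches]
        else:
            out.append({**l, **{c: None for c in right_columns}})
    return out


cols = ["i", "i.level", "t", "t.topic"]
q2 = theta_left_outer_join(q0, q1, "p", lambda r: r["i.level"] > 3, cols)
print([(row["p.name"], row["t.topic"]) for row in q2])  # [('Alice', 'Neofolk'), ('Bob', None)]
```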

2.2.10 Create.

Example. Create new persons and KNOWS edges

MATCH (p:Person)
CREATE (p)-[k:KNOWS {since: 2018}]->(c:Person:Student {name: 'Carol'})

$$\underbrace{\zeta_{\bigcirc_{p} \xrightarrow[k,\;\text{k.since}=\ldots]{\text{KNOWS}} \bigcirc_{c}}\Big(\underbrace{\zeta_{\bigcirc^{\text{Person},\,\text{Student}}_{c,\;\text{c.name}=\ldots}}\big(\underbrace{\bigcirc^{\text{Person}}_{p}}_{q_0}\big)}_{q_1\text{--}q_4}\Big)}_{q_5\text{--}q_7}$$

WITH q0 AS ( /* GetVertices: (p:Person) */ ),
q1 AS ( -- GenerateId
  SELECT *, nextval('vertex_seq') AS "c" FROM q0),
q2 AS ( -- InsertVertex
  INSERT INTO vertex SELECT "c" AS vertex_id FROM q1),
q3 AS ( -- InsertLabels
  INSERT INTO label SELECT q1."c" AS parent, labels.l AS name
  FROM q1, (VALUES ('Person'), ('Student')) AS labels(l)),
q4 AS ( -- InsertVertexProperty
  INSERT INTO vertex_property
  SELECT "c" AS parent, 'name' AS key, 'Carol' AS value FROM q1),
q5 AS ( -- GenerateId
  SELECT *, nextval('edge_seq') AS "k" FROM q1),
q6 AS ( -- InsertEdge
  INSERT INTO edge
  SELECT "k" AS edge_id, "p" AS "from", "c" AS "to", 'KNOWS' AS type FROM q5),
q7 AS ( -- InsertEdgeProperty
  INSERT INTO edge_property
  SELECT "k" AS parent, 'since' AS key, 2018 AS value FROM q5)
SELECT "p", "k", "c" FROM q5
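Below is a Python sketch of the create pipeline. It uses `itertools.count` as a stand-in for the PostgreSQL sequences `vertex_seq` and `edge_seq` (the starting value 100 is arbitrary) and plain lists for the generic relations. Because q1 and q5 are computed once and then reused, every insert step sees the same generated identifiers.

```python
from itertools import count

# Generic relations (Figure 5) as lists; only the Person vertices a and b exist here.
vertex = [("a",), ("b",)]
label = [("a", "Person"), ("a", "Student"), ("b", "Person")]
vertex_property, edge, edge_property = [], [], []
vertex_seq = count(100)  # stand-ins for the vertex_seq / edge_seq sequences
edge_seq = count(100)

q0 = [{"p": "a"}, {"p": "b"}]                                          # GetVertices (p:Person)
q1 = [{**row, "c": next(vertex_seq)} for row in q0]                     # GenerateId for c
vertex += [(row["c"],) for row in q1]                                   # InsertVertex
label += [(row["c"], l) for row in q1 for l in ("Person", "Student")]   # InsertLabels
vertex_property += [(row["c"], "name", "Carol") for row in q1]          # InsertVertexProperty
q5 = [{**row, "k": next(edge_seq)} for row in q1]                       # GenerateId for k
edge += [(row["k"], row["p"], row["c"], "KNOWS") for row in q5]         # InsertEdge
edge_property += [(row["k"], "since", 2018) for row in q5]              # InsertEdgeProperty

print([(row["p"], row["k"], row["c"]) for row in q5])  # [('a', 100, 100), ('b', 101, 101)]
```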

2.3 Mapping queries to a predefined schema

2.3.1 Nullary operators on a given schema.


Fig. 8. Example use of the gTop schema description approach (based on [3])

Example. Nodes labelled Message and their content properties

MATCH (m:Message)
RETURN m.content

$$\pi_{\text{m.content}}\Big(\underbrace{\bigcirc^{\text{Message}}_{m,\;\text{m.content}}}_{q_0\text{--}q_2}\Big)$$

WITH q0 AS ( -- GetVertices
  SELECT make_vertex_id('post', "id") AS "m",
         "content" AS "m.content"
  FROM "post"),
q1 AS ( -- GetVertices
  SELECT make_vertex_id('comment', "id") AS "m",
         "content" AS "m.content"
  FROM "comment"),
q2 AS ( -- UnionAll
  SELECT "m", "m.content" FROM q0
  UNION ALL
  SELECT "m", "m.content" FROM q1)
-- Projection
SELECT "m.content" FROM q2
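When the data lives in a predefined schema, a label such as Message may be stored in several tables, so get-vertices becomes a union of per-table scans. Below is a Python sketch. `make_vertex_id` here is a hypothetical stand-in for the function called in the SQL; the assumption is that it combines the table name with the per-table key. The table rows are invented for illustration.

```python
def make_vertex_id(table, key):
    """Hypothetical stand-in: prefix the per-table key with the table name,
    so that post 1 and comment 1 become distinct vertices."""
    return f"{table}:{key}"


# Invented rows of the "post" and "comment" tables.
post = [{"id": 1, "content": "Hello"}]
comment = [{"id": 1, "content": "Hi!"}]

# q0, q1: one GetVertices per table that stores Message nodes.
q0 = [{"m": make_vertex_id("post", r["id"]), "m.content": r["content"]} for r in post]
q1 = [{"m": make_vertex_id("comment", r["id"]), "m.content": r["content"]} for r in comment]
# q2: UnionAll (list concatenation keeps duplicates, like UNION ALL).
q2 = q0 + q1
# Projection on m.content.
print([row["m.content"] for row in q2])  # ['Hello', 'Hi!']
```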


[Figure 9: one panel per query. x axis: scale factor [#nodes/edges − SF1: 3/17M, SF3: 9/52M, SF10: 30/177M]; y axis: execution time [s] (logarithmic scale). Series: PostgreSQL; Cypher-to-SQL transpiler on PostgreSQL.]

Fig. 9. Execution times for the complex read queries 1–12 of the LDBC SNB Interactive workload (hand-coded PostgreSQL queries vs. transpiled ones)

Example. Edges typed LIKES and their end points

MATCH (p)-[l:LIKES]->(m:Message)
RETURN m.content, l.creationDate

$$\pi_{\text{m.content},\;\text{l.creationDate}}\Big(\underbrace{\left[\bigcirc_{p} \xrightarrow[l,\;\text{l.creationDate}]{\text{LIKES}} \bigcirc^{\text{Message}}_{m,\;\text{m.content}}\right]}_{q_0\text{--}q_2}\Big)$$

WITH q0 AS ( -- GetEdges
  SELECT
    make_vertex_id('person', edgeTable."personId") AS "p",
    make_edge_id('LIKES', edgeTable."personId", edgeTable."postId") AS "l",
    make_vertex_id('post', edgeTable."postId") AS "m",
    toTable."content" AS "m.content",
    edgeTable."creationDate" AS "l.creationDate"
  FROM "person_LIKES_post" edgeTable
  JOIN "post" toTable ON (edgeTable."postId" = toTable."id")),
q1 AS ( /* GetEdges: (p:Person)-[l:LIKES]->(m:Comment)
           attributes: m.content, l.creationDate */ ),
q2 AS ( /* UnionAll: q0 ∪ q1 */ )
-- Projection
SELECT "m.content", "l.creationDate" FROM q2
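Edges are handled similarly. Each edge table is joined with the table of its target vertices, and edge identifiers are derived from the endpoint keys. Below is a Python sketch with hypothetical `make_vertex_id` and `make_edge_id` stand-ins and invented rows. The comment branch (q1) is elided in the SQL, so the sketch assumes it mirrors the post branch.

```python
def make_vertex_id(table, key):
    # Hypothetical stand-in: table name plus per-table key.
    return f"{table}:{key}"


def make_edge_id(edge_type, src_key, trg_key):
    # Hypothetical stand-in: an edge is identified by its type and endpoint keys.
    return f"{edge_type}:{src_key}:{trg_key}"


# Invented rows: edge tables keyed by endpoint ids, and the target tables.
person_likes_post = [{"personId": 7, "postId": 1, "creationDate": "2012-05-01"}]
person_likes_comment = [{"personId": 7, "commentId": 3, "creationDate": "2012-06-01"}]
post = [{"id": 1, "content": "Hello"}]
comment = [{"id": 3, "content": "Hi!"}]


def get_edges(edge_rows, target_rows, target_table, target_key):
    """GetEdges on a predefined schema: join the edge table with its target table."""
    return [{"p": make_vertex_id("person", e["personId"]),
             "l": make_edge_id("LIKES", e["personId"], e[target_key]),
             "m": make_vertex_id(target_table, e[target_key]),
             "m.content": t["content"],
             "l.creationDate": e["creationDate"]}
            for e in edge_rows for t in target_rows if e[target_key] == t["id"]]


q0 = get_edges(person_likes_post, post, "post", "postId")
q1 = get_edges(person_likes_comment, comment, "comment", "commentId")
q2 = q0 + q1  # UnionAll
print([(r["m.content"], r["l.creationDate"]) for r in q2])
```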

3 EVALUATION

Figures 9 and 10 show preliminary benchmark results.


[Figure 10: panels for queries 2, 5, 8, 11, and 12. x axis: scale factor [#nodes/edges − SF1: 3/17M, SF3: 9/52M, SF10: 30/177M]; y axis: execution time [s] (logarithmic scale). Series: PostgreSQL; Cypher-to-SQL transpiler on PostgreSQL.]

Fig. 10. Individual query execution times for the complex read queries of the LDBC SNB Interactive workload (hand-coded PostgreSQL queries vs. transpiled ones)

REFERENCES

[1] Nadime Francis et al. 2018. Cypher: An Evolving Query Language for Property Graphs. In SIGMOD Conference. ACM, 1433–1445.

[2] József Marton, Gábor Szárnyas, and Dániel Varró. 2017. Formalising openCypher Graph Queries in Relational Algebra. In ADBIS (Lecture Notes in Computer Science), Vol. 10509. Springer, 182–196. https://doi.org/10.1007/978-3-319-66917-5_13

[3] Benjamin A. Steer, Alhamza Alnaimi, Marco A. B. F. G. Lotz, Félix Cuadrado, Luis M. Vaquero, and Joan Varvenne. 2017. Cytosm: Declarative Property Graph Queries Without Data Migration. In GRADES@SIGMOD. ACM, 4:1–4:6.

[4] Gábor Szárnyas, József Marton, János Maginecz, and Dániel Varró. 2018. Reducing Property Graph Queries to Relational Algebra for Incremental View Maintenance. CoRR abs/1806.07344 (2018). arXiv:1806.07344 http://arxiv.org/abs/1806.07344
