BIG DATA QUERY LANDSCAPE – N1QL AND MORE
Yingyi Bu | Couchbase
©2015 Couchbase Inc. 2
About Myself
Sr. Software Engineer @ Couchbase
Committer @ AsterixDB (Research Project under Apache Incubation)
PhD Student @ UC Irvine
N1QL
[email protected]
@buyingyi
Agenda
Introduction
Operational Query Processing
Analytical Query Processing
Comparison and Unification
Summary
Introduction
Research Projects
[Diagram labels: NoSQL, SQL-on-Hadoop, ETL, Connector, SQL++, Unification]
Language Unification Research (SQL++)
  SQL Backward Compatible
  Rich Data Model
  Configurable Semantics
System Unification Research
  A Single Language Interface
  Scale-out for Both Workloads
  Resource Scheduling Underneath
Operational Query Processing
    import java.net.URI;
    import java.util.ArrayList;
    import com.couchbase.client.CouchbaseClient;

    ArrayList<URI> nodes = new ArrayList<URI>();

    // Add one or more nodes of your cluster
    nodes.add(URI.create("http://127.0.0.1:8091/pools"));

    // Try to connect to the client
    CouchbaseClient client = null;
    try {
      client = new CouchbaseClient(nodes, "default", "");
    } catch (Exception e) {
      System.err.println("Error connecting to Couchbase: " + e.getMessage());
      System.exit(1);
    }

    // Put the key-value pair into Couchbase.
    client.set("hello", "couchbase!").get();

    // Return the result and cast it to string
    String result = (String) client.get("hello");
    System.out.println(result);
Put / Get

What If?
  JSON
  Filtering
  Flatten
  Group-by
  Aggregation
  Join
  Ordering
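With only put and get, the items on the "What If?" list have to be handled in application code. The following is a minimal sketch of that situation, not the Couchbase API: a plain Python dict stands in for the key-value store, and the `put`/`get` helpers and the sample documents are hypothetical.

```python
import json
from collections import defaultdict

# A dict stands in for the key-value store (assumption, not a real client).
store = {}

def put(key, doc):
    store[key] = json.dumps(doc)

def get(key):
    return json.loads(store[key])

put("b1", {"type": "beer", "category": "Lager", "abv": 5.0})
put("b2", {"type": "beer", "category": "Lager", "abv": 4.0})
put("b3", {"type": "beer", "category": "Ale", "abv": 6.5})
put("w1", {"type": "brewery", "name": "bro"})

# Filtering, group-by and aggregation all happen on the client:
# every document is fetched, decoded and inspected by hand.
groups = defaultdict(list)
for key in store:
    doc = get(key)
    if doc.get("type") == "beer":
        groups[doc["category"]].append(doc["abv"])

avg_abv = {cat: sum(v) / len(v) for cat, v in sorted(groups.items())}
print(avg_abv)
```

Every extra requirement (join, ordering, flattening) adds another hand-written loop of this kind, which is the gap a declarative query language is meant to close.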
N1QL – SQL for NoSQL

Nested Data
Heterogeneous Data
Dynamic typing

Data:

    [
      {
        "beer-sample": {
          "brewery_id": "bro",
          "abv": {"m1": 1, "m2": 2},
          "category": "North American Lager",
          "type": "beer"
        }
      },
      {
        "beer-sample": {
          "abv": 9.5,
          "brewery_id": "brouwerij"
        }
      }
    ]

Query:

    SELECT category, type, abv.m1
    FROM `beer-sample`
    WHERE type = "beer"

Result:

    [
      { "category": "North American Lager", "type": "beer", "m1": 1 }
    ]

Standard SELECT pipeline
Joins, subqueries, set operators
UNNEST and NEST
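To make the effect of heterogeneous data and dynamic typing concrete, here is a rough Python sketch that evaluates the query above over the two sample documents. It is an illustration only, not how N1QL is implemented; the `MISSING` sentinel and the `nav` and `select` helpers are hypothetical names.

```python
MISSING = object()  # marks an absent field, distinct from None (JSON null)

docs = [
    {"brewery_id": "bro", "abv": {"m1": 1, "m2": 2},
     "category": "North American Lager", "type": "beer"},
    {"abv": 9.5, "brewery_id": "brouwerij"},
]

def nav(value, *path):
    """Follow field names; return MISSING instead of raising when a
    field is absent or the value is not an object."""
    for name in path:
        if not isinstance(value, dict) or name not in value:
            return MISSING
        value = value[name]
    return value

def select(doc):
    # SELECT category, type, abv.m1 (MISSING fields are left out of the row)
    row = {
        "category": nav(doc, "category"),
        "type": nav(doc, "type"),
        "m1": nav(doc, "abv", "m1"),
    }
    return {k: v for k, v in row.items() if v is not MISSING}

# WHERE type = "beer": a MISSING type never satisfies the predicate.
result = [select(d) for d in docs if nav(d, "type") == "beer"]
print(result)
```

The second document has a numeric `abv` and no `type`; neither causes an error, it is simply filtered out.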
Cassandra

SQL-like query language

Feature       N1QL   Cassandra
Lookup        ✔      ✔
Filtering     ✔      ✔
Ordering      ✔      ✔
Aggregation   ✔      ✖
Join          ✔      ✖
Subqueries    ✔      ✖
Unnest        ✔      ✖
Schema-free   ✔      ✖

    SELECT firstname, lastname
    FROM users
    WHERE birth_year = 1981 AND country = 'FR'
    ALLOW FILTERING;

    SELECT * FROM posts
    WHERE userid='john doe'
      AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01');
MongoDB

JavaScript-like language

Feature       N1QL   MongoDB
Lookup        ✔      ✔
Filtering     ✔      ✔
Ordering      ✔      ✔
Aggregation   ✔      ✔
Join          ✔      ✖
Subqueries    ✔      ✖
Unnest        ✔      ✔
Schema-free   ✔      ✔

    db.sales.aggregate([
      {
        $group: {
          _id: {
            month: { $month: "$date" },
            day: { $dayOfMonth: "$date" },
            year: { $year: "$date" }
          },
          totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
          averageQuantity: { $avg: "$quantity" },
          count: { $sum: 1 }
        }
      }
    ])

    db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)
Analytical Query Processing
Hive

    INSERT OVERWRITE TABLE school_summary
    SELECT subq1.school, COUNT(1)
    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid AND a.ds='2009-03-20')) subq1
    GROUP BY subq1.school

[Query plan diagram. Operators: Scan (a), Scan (b), Filter, Project, Project, ReduceSink, ReduceSink, Join, Group-by, FileSink, Scan, ReduceSink, Group-by, FileSink. Stages: M1, R1, M2, R2]

More data types than SQL
Hadoop or Tez as runtime
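The stage labels M1, R1, M2, R2 suggest two map/reduce rounds. The sketch below shows, in plain Python, how a join followed by a group-by can be run that way. It is an assumption-laden illustration, not Hive's operator implementation: the sample rows and the `shuffle` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (userid, status, ds) and (userid, school, gender).
status_updates = [
    (1, "hi", "2009-03-20"), (2, "yo", "2009-03-20"),
    (3, "hey", "2009-03-20"), (1, "old", "2009-03-19"),
]
profiles = [(1, "UCI", "F"), (2, "UCI", "M"), (3, "MIT", "F")]

def shuffle(pairs):
    """Group (key, value) pairs by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Round 1 map: filter a on ds, tag each row with its table, key by userid.
m1 = [(u, ("a", s)) for u, s, ds in status_updates if ds == "2009-03-20"]
m1 += [(u, ("b", school)) for u, school, _ in profiles]

# Round 1 reduce: join the rows that share a userid.
joined = []
for rows in shuffle(m1).values():
    a_rows = [v for tag, v in rows if tag == "a"]
    b_rows = [v for tag, v in rows if tag == "b"]
    joined += [(status, school) for status in a_rows for school in b_rows]

# Round 2 map: key by school. Round 2 reduce: count per school.
m2 = [(school, 1) for _, school in joined]
school_summary = {school: sum(ones) for school, ones in shuffle(m2).items()}
print(school_summary)
```

The intermediate `joined` list corresponds to data written out between the two rounds, which is one reason this style of execution is costly for interactive queries.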
Impala

    INSERT OVERWRITE TABLE school_summary
    SELECT subq1.school, COUNT(1)
    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid AND a.ds='2009-03-20')) subq1
    GROUP BY subq1.school

[Query plan diagram. Operators: HDFS Scan (a), HDFS Scan (b), Filter, Project, Project, Hash Join, Pre-Agg, Merge-Agg, HDFS Write]

ANSI SQL-92
HDFS/HBase as the storage
Native MPP execution engine
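The Pre-Agg and Merge-Agg operators in the plan point at two-phase aggregation. A minimal sketch of the idea, assuming the join output is already spread over three nodes; the partition contents and function names are hypothetical and this is not Impala code.

```python
from collections import Counter

# Hypothetical join output (school per row), partitioned across three nodes.
partitions = [
    ["UCI", "UCI", "MIT"],
    ["MIT", "UCSD"],
    ["UCI"],
]

def pre_aggregate(rows):
    """Runs on each node: count schools locally."""
    return Counter(rows)

def merge_aggregate(partials):
    """Runs after the exchange: add up the per-node partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

school_summary = merge_aggregate(pre_aggregate(p) for p in partitions)
print(school_summary)
```

Only the small per-node counters cross the network, not the individual rows.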
Spark SQL

DataFrames:

    // Assumes an existing SparkContext named sc.
    val ctx = new HiveContext(sc)
    val users = ctx.table("users")
    val young = users.where(users("age") < 21)
    println(young.count())

SQL:

    SELECT count(*) FROM users WHERE age < 21

[Diagram: SQL / DataFrames → Unresolved Logical Plan → Logical Plan → Physical Plans → Selected Physical Plan → RDDs; side inputs: Catalog, Cost Model]
Drill

ANSI SQL-92
Nested Data
Schema Inference

Centralized schema
  Static
  Managed by DBAs

Self-describing or schema-less
  Dynamic, evolving
  Managed by applications
  Embedded in data
  CSV, JSON, Parquet, ORC
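As a rough illustration of what inferring a schema from self-describing data can look like, the sketch below scans a few JSON records and collects the types seen for each top-level field. This is a simplified stand-in and not Drill's actual algorithm; the records and the `infer_schema` helper are hypothetical.

```python
import json

# Hypothetical self-describing records, one JSON document per line.
records = [
    '{"name": "a", "age": 30}',
    '{"name": "b", "address": {"city": "Irvine"}}',
    '{"name": "c", "age": "unknown"}',
]

def infer_schema(lines):
    """Map each top-level field to the set of Python type names seen for it."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

schema = infer_schema(records)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

The `age` field ends up with two types and `address` appears in only one record, which is exactly the kind of evolution a centralized, static schema would reject.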
Comparison and Unification
AsterixDB – System Unification Research
Query language?
Language Comparisons
SQL++ – Language Unification Research
N1QL and SQL++

[Diagram labels: Research Projects, SQL++, Unification]
AsterixDB (Apache incubator)

NoSQL data model with schema flexibility
Declarative full-fledged query language (AQL)
Partitioned native LSM-based storage
Secondary index (B-Tree, R-Tree, and keyword index)
Single-row transaction
Spatial/temporal data types
External data (HDFS) access and indexing
Native MPP query execution engine

[Diagram labels: Operational, Analytical]
Query Language?

    SELECT subq1.school, COUNT(1)
    FROM (SELECT a.status, a.date, b.school, b.region
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid AND a.date='2009-03-20')) subq1
    GROUP BY subq1.school

Relational → JSON
  Nested tuples/collections
  Partial/missing schema
  Heterogeneity
  Complex values

What If?
  Replace COUNT(1) with "(select * from subq1 order by date limit 3)";
  "school" is not in the schema of the "profiles" table;
  "school" is missing in some profiles;
  "school" is a nested tuple.
Language Comparison: Data Model

System      Top-level Values   Heterogeneity   Arrays   Bags   Maps   Nested Tuples   Primitive Values
Hive        Bags/Tuples        ✖               ✔        ✖      P      ✔               ✔
Impala      Bags/Tuples        ✖               ✖        ✖      ✖      ✖               ✔
Spark SQL   Bags/Tuples        ✖               ✔        ✖      ✔      ✔               ✔
Drill       Bags/Tuples        ✖               ✔        ✖      ✔      ✔               ✔
N1QL        Bags/Tuples        ✔               ✔        ✖      ✖      ✔               ✔
Cassandra   Bags/Tuples        ✖               P        ✖      P      ✖               ✔
MongoDB     Bags/Tuples        ✔               ✔        ✖      ✖      ✔               ✔
AsterixDB   Any Values         ✔               ✖        ✔      ✖      ✔               ✔
Language Comparison: Types

System      Dynamic Type Check   Static Type Check   Any Type   Open Type   Union Type   Optional
Hive        ✖                    ✔                   ✖          ✖           ✖            ✖
Impala      ✖                    ✔                   ✖          ✖           ✖            ✖
Spark SQL   ✖                    ✔                   ✖          ✖           ✖            ✖
Drill       ✖                    ✔                   ✖          ✖           ✖            ✖
N1QL        ✔                    ✖                   –          –
Cassandra   ✖                    ✔                   ✖          ✖           ✖            ✖
MongoDB     ✔                    ✖                   –          –
AsterixDB   ✔                    ✔                   ✔          ✔           ✖            ✔
Language Comparison: Path Navigation

System      Tuple Nav.   Tuple Nav.   Array Nav.   Array Nav.   Map Nav.   Map Nav.
            absent       mismatch     absent       mismatch     absent     mismatch
Hive        error        error        null         error        null       error
Impala      error        error        --           --           --         --
Spark SQL   error        error        error        error        null       error
Drill       error        error        error        error        null       error
N1QL        missing      missing      missing      missing      --         --
Cassandra   error        error        --           --           --         --
MongoDB     missing      missing      --           --           --         --
AsterixDB   null         error        error        error        --         --

No Errors!
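The table can be read as a pair of policies per system: what to do when the field is absent, and what to do when the value has the wrong type. The sketch below applies the tuple-navigation columns for three of the rows. It is a simplified model; `tuple_nav`, `NavError`, `MISSING` and the sample `profile` are hypothetical names introduced for illustration.

```python
MISSING = object()  # "absent" marker, distinct from None (null)

class NavError(Exception):
    pass

def tuple_nav(value, field, absent, mismatch):
    """Evaluate value.field; each policy is 'error', 'null' or 'missing'."""
    def apply(policy, reason):
        if policy == "error":
            raise NavError(reason)
        return None if policy == "null" else MISSING

    if not isinstance(value, dict):
        return apply(mismatch, "not a tuple")
    if field not in value:
        return apply(absent, "no field " + field)
    return value[field]

# (absent, mismatch) settings copied from three rows of the table.
POLICIES = {
    "Hive": ("error", "error"),
    "N1QL": ("missing", "missing"),
    "AsterixDB": ("null", "error"),
}

profile = {"userid": 1}  # "school" is absent in this profile

def school_of(system):
    absent, mismatch = POLICIES[system]
    try:
        return tuple_nav(profile, "school", absent, mismatch)
    except NavError:
        return "error"

for name in POLICIES:
    print(name, school_of(name))
```

Under the N1QL settings the same navigation never raises, which is what lets a query run over documents that lack the field.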
Language Comparison: SELECT Clause

System      Project Tuples with       Project Tuples with   Project
            Non-scalar Subqueries     Nested Collections    Non-Tuples
Hive        ✖                         ✔                     ✖
Impala      ✖                         ✖                     ✖
Spark SQL   ✖                         ✔                     ✖
Drill       ✖                         ✔                     ✖
N1QL        ✔                         ✔                     ✔
Cassandra   ✖                         ✖                     ✖
MongoDB     ✖                         ✔                     ✔
AsterixDB   ✔                         ✔                     ✔
Language Comparison: FROM Clause

System      Subquery   Joins   Inner Unnest   Outer Unnest   Ordinal Positions
Hive        ✔          ✔       ✔              ✔              ✔
Impala      ✔          ✔       ✖              ✖              ✖
Spark SQL   ✔          ✔       ✖              ✖              ✖
Drill       ✔          ✔       ✔              ✖              ✖
N1QL        ✔          ✔       ✔              ✔              ✖
Cassandra   ✖          ✖       ✖              ✖              ✖
MongoDB     ✖          ✖       ✔              ✖              ✖
AsterixDB   ✔          ✔       ✔              ✖              ✔
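The difference between the Inner Unnest and Outer Unnest columns can be shown with a short sketch. The documents and the `unnest` helper are hypothetical, and `None` is used here as a stand-in for whatever placeholder a given system emits for a parent without children.

```python
# Hypothetical documents: one with items, one with an empty array,
# one without the field at all.
orders = [
    {"id": 1, "items": ["a", "b"]},
    {"id": 2, "items": []},
    {"id": 3},
]

def unnest(docs, field, outer=False):
    """Pair each parent with each element of its array.
    With outer=True, parents without elements are kept once."""
    rows = []
    for doc in docs:
        elements = doc.get(field) or []
        for element in elements:
            rows.append((doc["id"], element))
        if outer and not elements:
            rows.append((doc["id"], None))
    return rows

inner = unnest(orders, "items")
outer = unnest(orders, "items", outer=True)
print(inner)
print(outer)
```

An inner unnest silently drops orders 2 and 3; an outer unnest keeps them, which matters when the result feeds a count of parents.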
SQL++ (The "++" Part)

JSON data model
INNER/OUTER FLATTEN CLAUSE
Arbitrary subqueries in SELECT
Configurable parameters for semantics
  Path navigations
  Equality evaluations
  Collection coercions

Supported by N1QL!
Made consistent in N1QL!
SQL++ Configuration for N1QL

Configuration   Parameter          Value     Parameter                 Value
@path           tuple_nav.absent   missing   tuple_nav.type_mismatch   missing
                array_nav.absent   missing   array_nav.type_mismatch   missing
                map_nav.absent     missing   map_nav.type_mismatch     missing
@eq             complex            yes       type_mismatch             false
                null_eq_null       null      null_eq_value             null
                null_eq_missing    missing   missing_eq_missing        missing
                missing_eq_value   missing   null_and_missing          missing
                null_and_true      null      null_and_null             null
                missing_and_true   missing   missing_and_missing       missing
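The @eq rows can be turned into a small truth-table sketch. This is an illustration of the listed settings only, not N1QL's implementation: the `Marker` class, `kind`, `eq` and `and_` are hypothetical helpers, and the rule that FALSE dominates AND is an assumption that does not appear in the table.

```python
class Marker:
    """Named sentinel so that printed results stay readable."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

MISSING, NULL = Marker("missing"), Marker("null")

def kind(v):
    """Coarse JSON type: ints and floats are both 'number'."""
    if isinstance(v, bool):
        return "boolean"
    if isinstance(v, (int, float)):
        return "number"
    return type(v).__name__

def eq(a, b):
    if a is MISSING or b is MISSING:
        return MISSING   # null_eq_missing, missing_eq_missing, missing_eq_value
    if a is NULL or b is NULL:
        return NULL      # null_eq_null, null_eq_value
    if kind(a) != kind(b):
        return False     # type_mismatch = false
    return a == b        # complex = yes: arrays and objects compare by value

def and_(a, b):
    """Operands are expected to be True, False, NULL or MISSING."""
    if a is False or b is False:
        return False     # assumption: not listed in the table
    if a is MISSING or b is MISSING:
        return MISSING   # missing_and_true, missing_and_missing, null_and_missing
    if a is NULL or b is NULL:
        return NULL      # null_and_true, null_and_null
    return True

print(eq(NULL, NULL), eq(NULL, MISSING), eq(1, "1"), and_(NULL, True))
```

Note that no combination raises an error: a type mismatch in a comparison yields false rather than an exception, in line with the path-navigation settings.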
Summary
N1QL in a Bigger Context
Operational Query Processing
  Rich Data Model
  SQL is BACK, but with EXTENSIONS!
Analytical Query Processing
  Rich Data Model is a MUST!
Unification
  The trend!
Thank you.
Q & A