34
Bringing SQL to the OpenStack World Apache Tajo on Swift Jihoon Son Apache Tajo PMC member

Apache Tajo on Swift: Bringing SQL to the OpenStack World

Embed Size (px)

Citation preview

Page 1: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Bringing SQL to the OpenStack World

Apache Tajo on Swift

Jihoon SonApache Tajo PMC member

Page 2: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Who am I

● Jihoon Son○ Ph.D candidate (Computer Science & Engineering,

2010.3 ~) ○ Tajo project co-founder (2010)○ Apache Tajo PMC and Committer (2014.5.1 ~)

● Contacts○ Email: jihoonson AT apache.org○ LinkedIn: https://www.linkedin.com/in/jihoonson

Page 3: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Outline

● OpenStack Swift● Apache Tajo● Tajo on Swift● Location-aware Computing of Tajo● Brief Evaluation Results● Roadmap

Page 4: Apache Tajo on Swift: Bringing SQL to the OpenStack World

OpenStack Swift

● Popular object storage○ Images, videos, logs, ...

● Enterprises store objects on Swift to provide their services○ Usually private clusters

Page 5: Apache Tajo on Swift: Bringing SQL to the OpenStack World

SQL on Swift

● Data analysis is important to improve the quality of their services○ SQL is one of the most powerful and popular query

languages● Many enterprise data analysis tools relying on

SQL○ OLAP, visualization, data mining, …

● Need for using SQL on Swift

Page 6: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Apache Tajo

● Scalable, efficient, and fault-tolerant data warehouse system○ Support SQL standards compliance ○ Efficient batch execution and interactive ad-hoc

analysis■ Low latency and high throughput■ No use of MapReduce

○ No single point of failure

Page 7: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Apache Tajo

Pluggable Storage Layer

...

MasterMasterTajoMaster

TajoWorker

TajoWorker

TajoWorker

TajoWorker

...

Page 8: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Apache Tajo

Tajo cluster

...

Page 9: Apache Tajo on Swift: Bringing SQL to the OpenStack World

● Activity summary

Apache Tajo

● Active open source project○ Apache incubating

project (2013)○ Apache Top-Level

Project (2014)○ 18 committers + 16

contributors

Page 10: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Tajo on Swift

...Swift

Tajo cluster

Page 11: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Tajo on Swift

● No need to modify code of Tajo and Swift○ Tajo can access Swift with the Hadoop-openstack

library■ But, doesn’t need to install or run Hadoop

○ Just use it

Swift

Network

Page 12: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Tajo on Swift

● Focusing on private clusters○ Tajo and Swift may share the same cluster○ Able to optimize the cluster configuration and data

deployment● More efficient data analysis!

○ By reducing the overhead of bottlenecks

Page 13: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Data Locality Problem

● Network can be usually the bottleneck

Worker

StorageNode

Interconnection Network

Node A

Worker

Node B

StorageNode

SignificantNetwork

Overhead

Page 14: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Data Locality Problem

● Network can be usually the bottleneck

Worker

StorageNode

Interconnection Network

Node A

Worker

Node B

StorageNode

SignificantNetwork

Overhead

Page 15: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Data Locality Problem

● Important to reduce data transmission over the network

Worker

StorageNode

Interconnection Network

Node A

Worker

Node B

StorageNode

Page 16: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Location-aware Computing

● Moving the processing close to the data○ Avoiding the performance degradation due to the

data transfer over the network

Master

Worker Worker

A B

Read B Read A

< Moving data to tasks >

Master

Worker Worker

A B

Read A Read B

< Moving tasks to data >

Page 17: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Location-aware Computing

● Three cases of user data deployment○ Case 1: small objects without segmentation○ Case 2: large objects with segmentation○ Case 3: large objects without segmentation

Page 18: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 1: Small Objects W/O Segmentation

● Each task processes an object○ The object size should be sufficiently small to be

processed by a taskStorage nodeWorker

Task

Task

... ...

Page 19: Apache Tajo on Swift: Bringing SQL to the OpenStack World

StorageNode

Case 1: Small Objects W/O Segmentation

1) Getting object locations from the ring

QueryMaster

MasterMasterProxyServer

Get object locations

StorageNode

StorageNode

Page 20: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 1: Small Objects W/O Segmentation

2) Assigning tasks based on object locationsQueryMaster

Worker Worker Worker ...

StorageNode

StorageNode

StorageNode

...

Assign tasks close to objects

Directly read object data

Page 21: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 2: Large Objects with Segmentation

● Each task processes a segment○ The segment size should be sufficiently small to be

processed by a taskStorage nodeWorker

Task

Task

... ...

Page 22: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 2: Large Objects with Segmentation

1) Getting segments of the given objects

QueryMaster

MasterMasterProxyServer

Get segments of the input objects

Page 23: Apache Tajo on Swift: Bringing SQL to the OpenStack World

StorageNode

Case 2: Large Objects with Segmentation

2) Getting segment locations from the ring

QueryMaster

MasterMasterProxyServer

Get segment locations

StorageNode

StorageNode

Page 24: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 2: Large Objects with Segmentation

3) Assigning tasks based on segment locationsQueryMaster

Worker Worker Worker ...

StorageNode

StorageNode

StorageNode

...

Assign tasks close to segments

Directly read segments

Page 25: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 3: Large Objects W/O Segmentation

● Inevitable performance degradation due to the suboptimal data deployment○ Each Tajo worker processes large objects, which

causes coarse-grained load balancingWorker Worker Worker

Page 26: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Case 3: Large Objects W/O Segmentation

● Alternatively, Tajo will provide the data load feature○ Load objects into the Tajo’s optimized internal

storage

Load

Page 27: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Location-aware Computing

● Current status○ Support location-aware computing without

segmentation● Future support

○ Location-aware computing for segmented objects○ Data load into the Tajo’s storage

Page 28: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Brief Evaluations

● Performance comparison with on another distributed storage○ Swift VS Hadoop Distributed File System (HDFS)

● Scalability test of Swift

Page 29: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Brief Evaluations

● Cluster configuration○ 1 master + 8 slaves

■ Each worker is equipped with 1 disk○ Swift: 1 proxy server + 8 storage nodes○ HDFS: 1 namenode + 8 datanodes○ Tajo: 1 master + 8 workers

■ Each worker can process 2 tasks simultaneously

Page 30: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Brief Evaluations

● Data set○ Crawled twitter log

■ # of objects: 1014■ Avg size of objects: 70MB■ Total size: 69.5GB

● Query○ Full scan query

■ select * from twitter_log

Page 31: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Comparison with HDFS

● Scan time ● Schedule time

Page 32: Apache Tajo on Swift: Bringing SQL to the OpenStack World

● Result● Data

Scalability Test

Page 33: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Our Roadmap

● Storage layer specialized for Swift○ Location-aware computing for segmented objects○ Data load into the Tajo’s storage

● Block storage support○ Cinder and Ceph

● Provisioning Tajo clusters○ Sahara○ Heat, TOSCA

Page 34: Apache Tajo on Swift: Bringing SQL to the OpenStack World

Thanks!http://tajo.apache.org/