The BI on Hadoop Benchmark

Bay Area Big Data Meetup – March 2016

www.atscale.com

© 2015 ATSCALE, INC. ALL RIGHTS RESERVED. CONFIDENTIAL & PROPRIETARY

Agenda

❑Market Context

❑AtScale Overview

❑Benchmark Setup

❑Results!!

❑Lessons Learned

❑Wrap Up & Q & A

Market Context

But Seriously…

Hadoop Use Cases

Yesterday Today

What AtScale Does

I.T. needs Control & Consistency

The Business needs Freedom & Self-Service

The Business Interface for Hadoop

How We Do It

❑Any BI tool

❑ Industry standards

❑Schema on demand

❑Write once

Demo Time

Design Center

Designers

The AtScale Virtual Cube

On-Demand Aggregate Engine

Your Business Team

Benchmark Ingredients

1. Hadoop Cluster

12-node cluster with:
• 1 master node
• 1 AtScale gateway node
• 10 data nodes

RAM per node: 128G
CPU specs, data (worker) nodes: 32 CPU cores
Storage specs, data (worker) nodes: 2x 512mb SSD

Benchmark Ingredients

2. SQL-on-Hadoop Engines

Spark SQL
Version: 1.6-SNAPSHOT
Hive Version: 1.2
File Format: Parquet
Workers: 70
Memory per worker: 14G
Cores per worker: 4

Impala
Version: 2.3
Hive Version: 1.2
File Format: Parquet
Workers: 10
Memory per worker: 110G

Hive on Tez
Tez Version: 0.7
Hive Version: 1.2
File Format: ORC
hive.tez.container.size: 4096mb
hive.cbo.enabled: true
hive.auto.convert.join.noconditionaltask.size: 3036549120
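As a rough cross-check of the settings above, the sketch below totals the memory two of the engines could claim and compares it with the physical RAM of the data nodes. It assumes the 70-worker configuration is Spark SQL and the 10-worker configuration is Impala (inferred from the tuning slides, not stated here), and that "G" means the same unit throughout. All names in the code are illustrative.

```python
# Hypothetical cross-check of the slide's numbers; names are illustrative.
DATA_NODES = 10          # from the Hadoop cluster slide
RAM_PER_NODE_G = 128     # from the Hadoop cluster slide

# Worker counts and memory per worker as listed on the engines slide.
# The engine labels are an assumption inferred from the tuning slides.
ENGINES = {
    "spark_sql": {"workers": 70, "memory_per_worker_g": 14},
    "impala": {"workers": 10, "memory_per_worker_g": 110},
}


def total_memory_g(workers: int, memory_per_worker_g: int) -> int:
    """Total memory an engine may claim across all of its workers."""
    return workers * memory_per_worker_g


cluster_ram_g = DATA_NODES * RAM_PER_NODE_G

for name, cfg in ENGINES.items():
    total = total_memory_g(cfg["workers"], cfg["memory_per_worker_g"])
    share = total / cluster_ram_g
    print(f"{name}: {total}G of {cluster_ram_g}G ({share:.0%})")
```

Under these assumptions Spark SQL could claim 980G and Impala 1,100G of the 1,280G on the data nodes, so the two memory budgets are in the same range but not identical.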

Benchmark Ingredients

3. Benchmark Data Set

Star-Schema Benchmark (SSB)

Table Name  Number of Rows
CUSTOMER    1 Billion
LINEORDER   6 Billion
SUPPLIER    2 Million
PART        2 Million
DATE        16 Thousand

Benchmark Ingredients

4. Benchmark Queries

Query ID  Joins  Largest Join Table  Group Bys  Filters  Comments
Q1.1      1      16,799              0          3        1 range condition, 1 comparative filter condition in fact table
Q1.2      1      16,799              0          3        2 range filter conditions directly on LINEORDER table
Q1.3      1      16,799              0          4        2 range filter conditions directly on fact, 2 conditions on joined table
Q2.1      3      2,000,000           2          2        filter on p_category (less selective)
Q2.2      3      2,000,000           2          2        filter on p_brand, 2 values (more selective)
Q2.3      3      2,000,000           2          2        filter on p_brand, 1 value (most selective)
Q3.1      3      1,050,000,000       3          3        filter on region (less selective)
Q3.2      3      1,050,000,000       3          3        filter on nation (more selective)
Q3.3      3      1,050,000,000       3          3        filter on city (most selective)
Q3.4      3      1,050,000,000       3          3        filter on city (most selective) and month (vs. year)
Q4.1      4      1,050,000,000       2          2
Q4.2      4      1,050,000,000       3          3        includes filter on year (more selective)
Q4.3      4      1,050,000,000       3          3        includes filter on year and nation (most selective)
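To make the Q1.1 row concrete, here is a minimal sketch of a query with that shape (1 join, 0 group bys, 3 filters), run against a toy in-memory SQLite database. The deck does not show the query text, so this is an illustration only: the column names follow common SSB naming, and the literal values and sample rows are made-up assumptions.

```python
import sqlite3

# Toy, in-memory stand-in for two benchmark tables; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "date" (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
    CREATE TABLE lineorder (
        lo_orderdate INTEGER,
        lo_quantity INTEGER,
        lo_extendedprice INTEGER,
        lo_discount INTEGER
    );
    INSERT INTO "date" VALUES (19930101, 1993), (19940101, 1994);
    INSERT INTO lineorder VALUES
        (19930101, 10, 100, 2),  -- passes every filter
        (19930101, 30, 100, 2),  -- fails the quantity comparison
        (19930101, 10, 100, 5),  -- fails the discount range
        (19940101, 10, 100, 2),  -- fails the year filter
        (19930101, 24,  50, 3);  -- passes every filter
    """
)

# Assumed Q1.1-style query: one join to the date dimension, no GROUP BY,
# and three filter conditions. Literal values are illustrative.
Q1_1_SHAPE = """
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder
    JOIN "date" ON lo_orderdate = d_datekey   -- the single join
    WHERE d_year = 1993                        -- filter on the joined table
      AND lo_discount BETWEEN 1 AND 3          -- range condition on the fact table
      AND lo_quantity < 25                     -- comparative condition on the fact table
"""

revenue = conn.execute(Q1_1_SHAPE).fetchone()[0]
print(revenue)  # 100*2 + 50*3 = 350 for the toy rows above
```

The later query families follow the same pattern with more joins, GROUP BY columns, and progressively more selective dimension filters.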

Benchmark Ingredients

5. Real Bearded Wizard

Benchmark Framework

❑Performs on Big Data

❑Fast on Small Data

❑Stable for Many Users

Performs on Big Data: 6B Rows

Fast on Small: Adaptive Cache

Stable for Many: Concurrency

Data Formats & Partitioning

❑ ORC for Hive, because the majority of Hive's speed-ups (vectorization, CBO, etc.) only work on ORC tables

❑ Parquet for Impala and Spark, because most of the performance work for these engines targets Parquet

❑ The tables were left unpartitioned to achieve a true test of performance against large data sets

Impala tuning

❑ Impala required the least tuning. We configured it to use the same amount of memory as the other engines.

❑ For the queries:
  – Changed the formatting so they would run on Impala.
  – Changed the join order, which gave a 10-20% performance increase on a few queries.

Spark SQL tuning

❑ 14G of memory per worker, 3 cores per worker, and 70 workers was the best combination

❑ spark.sql.autoBroadcastJoinThreshold is your friend! The dimension tables were all small enough that we could make every query use broadcast joins.

❑ Changing the join order for the queries yields a ~5% performance increase.
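For readers new to broadcast joins, the sketch below shows the idea in plain Python: the small dimension table is hashed once in memory, and the large fact table is streamed past it without shuffling or sorting. This is a conceptual illustration, not Spark's implementation. The function names are hypothetical, and `should_broadcast` is a simplified stand-in for the size check that a setting like spark.sql.autoBroadcastJoinThreshold controls.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def broadcast_hash_join(
    fact_rows: Iterable[dict],
    dim_rows: Iterable[dict],
    fact_key: str,
    dim_key: str,
) -> Iterator[dict]:
    """Join a large fact stream against a small, fully in-memory dimension."""
    # Build phase: the whole dimension is hashed once. In a cluster this
    # hash table is what gets shipped ("broadcast") to every worker.
    lookup = defaultdict(list)
    for dim in dim_rows:
        lookup[dim[dim_key]].append(dim)

    # Probe phase: fact rows are streamed and never shuffled or sorted.
    for fact in fact_rows:
        for dim in lookup.get(fact[fact_key], []):
            yield {**fact, **dim}


def should_broadcast(table_bytes: int, threshold_bytes: int) -> bool:
    """Toy size-threshold rule: broadcast only tables at or under the limit."""
    return 0 <= table_bytes <= threshold_bytes


# Made-up sample rows for illustration.
dates = [{"d_datekey": 19930101, "d_year": 1993}]
orders = [
    {"lo_orderdate": 19930101, "lo_revenue": 10},
    {"lo_orderdate": 19940101, "lo_revenue": 20},  # no matching date row
]
joined = list(broadcast_hash_join(orders, dates, "lo_orderdate", "d_datekey"))
print(joined)
```

The trade-off is memory: every worker holds a full copy of the dimension, which is why the approach depends on the dimensions being small relative to worker memory.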

Hive 1.2.1 Tuning

❑ Required a different setup for each concurrent query test (setting hive.server2.tez.sessions.per.default.queue and hive.server2.tez.default.queues) to better support different concurrency levels.

❑ As with Spark's broadcast threshold, hive.auto.convert.join.noconditionaltask.size is your friend. We set it high enough (3,036,549,120 in our case) that all our queries would run as broadcast joins.

❑ The exceptions were queries Q4.1 to Q4.3, which we had to force to be sort-joins because of the GC pressure the broadcast join caused.

❑ Unlike Spark and Impala, we did not have to change the join order, as the Hive CBO did that automatically.
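To put the join threshold in perspective, the following sketch converts it to larger units and compares it with the Tez container size quoted earlier in the deck (4096mb). It assumes the threshold is expressed in bytes and that "mb" can be treated as MiB for this comparison. The variable names are illustrative.

```python
# Values quoted on the slides.
NOCONDITIONALTASK_SIZE_BYTES = 3_036_549_120  # hive.auto.convert.join.noconditionaltask.size
TEZ_CONTAINER_MB = 4096                       # hive.tez.container.size

BYTES_PER_MIB = 1024 * 1024

threshold_mib = NOCONDITIONALTASK_SIZE_BYTES / BYTES_PER_MIB
threshold_gib = threshold_mib / 1024

# Assumption: the slide's "mb" is treated as MiB for this comparison.
share_of_container = threshold_mib / TEZ_CONTAINER_MB

print(f"threshold = {threshold_mib:.0f} MiB ({threshold_gib:.2f} GiB)")
print(f"share of one container = {share_of_container:.0%}")
```

Under these assumptions the threshold is about 2.8 GiB, roughly 71% of one container. That may be part of why the largest joins ran into GC pressure, though the slides do not state this link.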

Benchmark Key Findings

❑ No outright winner - different engines have different sweet spots

❑ SparkSQL and Impala are better options for “Small Data” queries

❑ Impala is the clear winner as concurrency increases, though all engines scaled linearly

❑ Nobody is standing still, and there is plenty more to do!
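The deck does not describe the harness used for the concurrency tests. As a hedged illustration of how such a measurement could be taken, the sketch below runs the same query from a growing number of simulated users and records the mean latency at each level. `fake_query` is a placeholder that only sleeps; a real test would submit SQL to an engine instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable


def run_concurrency_level(run_query: Callable[[], None], users: int) -> float:
    """Run `users` simultaneous queries and return the mean latency in seconds."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    # One thread per simulated user, all submitted at once.
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed_call, range(users)))
    return mean(latencies)


def fake_query() -> None:
    """Placeholder for submitting one SQL query to an engine."""
    time.sleep(0.01)


# The concurrency levels here are arbitrary examples.
results = {users: run_concurrency_level(fake_query, users) for users in (1, 5, 10)}
for users, latency in results.items():
    print(f"{users:>2} concurrent users: mean latency {latency:.3f}s")
```

Plotting mean latency against the number of users is one way to see whether an engine degrades linearly or worse as load grows.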

Next time

❑ Latest engines: Spark 2.0, Impala 2.5, Hive 2.0 with LLAP

❑ New engines: Drill, Presto, HAWQ...

❑ New queries, analytics, window functions

❑ Data model variations (embedded dimensions as maps, etc.)