Dashboard Engine for Hadoop June 2015 Matt McDevitt Sr. Project Manager Pavan Challa Sr. Data Engineer Think Big Start Smart Scale Fast

Page 1

Dashboard Engine for Hadoop

June 2015

Matt McDevitt, Sr. Project Manager

Pavan Challa, Sr. Data Engineer

Think Big Start Smart Scale Fast

Page 2

CONFIDENTIAL | 2

Agenda

• Think Big Overview

• Engagement Model

• Solution Offerings

• Dashboard Engine

• Demo

• Q&A

© 2015 Think Big, a Teradata Company

Page 3

Page 4

• Founded in 2010, acquired in 2014, International in 2015

• First and leading professional services firm exclusively focused on big data

• End to End Services: Strategy, Design, Implementation, IP/Software, Support and Managed Services

• Academy to scale delivery capability

• Extend and integrate open source with UDA

• Team-based delivery with Solution Center

• Growing quickly: we’re hiring!

Think Big Overview

Think Big

Founded 2010


PRESTO

Page 5

Think Big Engagement Model

Page 6

Big Data

Program Mgt

Business Analytics

Managed Services

Data Engineering

Think Big Analytics VELOCITY Methodology

• Solutions

• Planning and Design

• Prioritization

• Capability Backlog

• Grooming for engineering

• Engineering

• Sprint(s)

• Releases

• Quality Assurance & Test

• Managed Support

• Break Fix

• Sustaining Engineering

• New Models

• New Analytics

• New Insights

• New Data Requirements

• New Data

• Big Data Approach

• Use Cases

• Roadmap

• Data Science

• Discovery

• R&D

Big Data Lab

Page 7

1. Big Data Strategy Roadmap

2. Data Lake Starter Program

3. Data Lake Optimization

4. Data Lake Managed Services

5. Presto for the Enterprise – new as of June 10, 2015

6. Big Data Managed Services

7. Think Big Academy

Think Big Solution Offerings


• Device Data Manufacturing Operations

• Omni-Channel Marketing Analytics

• Financial Services Fraud/Risk Analytics

• Healthcare personalization

Custom Analytics Solution Services

• Device Data Behavior Analytics

• IT Threat Detection

• Public Sector Risk Analysis

• Gaming Analytics

Page 8

MAKING BIG DATA COME ALIVE

Data Lake Implementation

Page 9

Data Lake: Starter Program

− Stand up a Data Lake and build 3 governed batch data ingest streams

− Includes Services and Subscription Software Frameworks

Data Lake: Optimization

− Add governance to your Data Lake

− For Data Lakes not originally built by Think Big

Data Lake: Dashboard Engine Reporting

− Install and configure engine with Data Lake to build dashboard analytics for deep dimensional rollup reporting capabilities with Tableau on Hadoop

Data Lake: Security

− Data Security & InfoSec, Cluster Hardening, Perimeter, Connectivity

Data Lake: Managed Services

− Only for Data Lakes that Think Big Designs and Builds

− On Premise, Public Cloud (AWS) and Private Cloud (Teradata and Altiscale)

Data Lake Program Offers

Page 10

Think Big Data Lake Starter Program (8 Week Engagement)

Phases: Design | Build & Test | Integrate & Tune | Assess, Mentor & Plan

• Collaborative workshops with business groups

• Identification and prioritization of high-value data streams

• Gap analysis

• Develop Ingest workflows

• Install Metadata and Info Security Services

• Prepare Cluster for Integration test

• Install Ingest & System Test

• Begin Profiling Data

• Learn about Information Security and data wrangling

• Begin Building DL Reporting

• Final tuning, assessment and next steps

Develop & Unit Testing

Data Stream Prioritization

Info Security Objectives

Data Profiling and Capability

Follow-up Roadmap

2 weeks | 2 weeks | 2 weeks | 2 weeks

Executive Presentation

Objective: Design, Develop and Deploy Data Lake Ingestion with Governance

Software Component Installation

Data Sources

Organization & Training

Cluster configuration & Integration

System Integration Testing

Page 11

Enterprise Data Lake

Information Sources

Evaluate Source Data

Ingest

Collect & Manage

Metadata

Apply Structure

Sequence

Compress

Automate

Protect

Prepare Data for Ingest

Prepare Source Metadata

Perimeter-Authentication-Authorization

InfoSec

Downstream Applications

Dashboard Engine

Think Big Enterprise Data Lake

Page 12

Data Lab

Data Repository

Security, Archival

RainStor – System of Record, Archive

Governed Ingestion

CDC

Buffer Server

Spark

Msg Queue

Kafka

Experimental Data

Raw Data

Processing

Derived Views

Loom – integrated Metadata, lineage, Wrangling

Metadata Repository

Dashboard Engine

API

Realtime Processing

API

Discovery Zone

Statistics

Machine Learning

Graph

Analytics

Page 13

Page 14

Why a Dashboard Engine?


Events Hadoop

Page 15

Think Big Dashboard Engine Strengths

• Near real-time analytics

• Easily scales to hundreds of simultaneous users

• Query latency typically under 100 ms

• Deep dimensional drill-down

• Works with popular BI tools

− JavaScript, jQuery

− Tableau

− others to be announced soon

Page 16

Using Tableau without Dashboard Engine

Hadoop

Middle Tier Server

Extract

• Queryable data is limited by the size of the server.

• Does not scale as the number of users grows.

Page 17

• For the time the query is running, most or all of the cluster is dedicated to that one query.

− Has limitations if the cluster has other loads

− Has limitations for simultaneous dashboard users

• Low latencies possible only if all the event data is in RAM at query time.

Using Impala without Think Big Dashboard Engine

Page 18

Dashboard Engine Architecture

Page 19

• Uses the power of Apache Spark to pre-aggregate data

• Scales as event volume grows.

• Scales as number of users grows.

Think Big’s Dashboard Engine for Hadoop

API

Page 20

Store cube data

[Figure: pre-aggregated cube stored as key-value pairs, one key per metric, dimension combination, and day; keys and values are listed in slide order]

Keys: Arrivals-2014-01-02, Arrivals-2014-01-03, Arrivals-2014-01-04
Values: 14147, 14158, 14269

Keys: Arrivals-s:CA-2014-01-02, Arrivals-s:CA-2014-01-03, Arrivals-s:CA-2014-01-04
Values: 2053, 1911, 1965

Keys: Arrivals-a:SFO-s:CA-2014-01-02, Arrivals-a:SFO-s:CA-2014-01-03, Arrivals-a:SFO-s:CA-2014-01-04
Values: 429, 479, 433

…
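The key layout on this slide (metric, optional "a:" airport and "s:" state parts, then the day) suggests how pre-aggregation can turn a dashboard query into a single key lookup. The following is a minimal single-process sketch of that idea in plain Python, not the engine's Spark job. The function names, the sample events, and the choice to materialize every dimension subset are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Key prefixes as they appear on the slide: "a:" for airport, "s:" for state.
DIMENSIONS = [("a", "airport"), ("s", "state")]


def cube_keys(metric, record):
    """Yield one key per dimension subset, e.g. 'Arrivals-a:SFO-s:CA-2014-01-04'."""
    for size in range(len(DIMENSIONS) + 1):
        for subset in combinations(DIMENSIONS, size):
            parts = [metric]
            parts += [f"{prefix}:{record[field]}" for prefix, field in subset]
            parts.append(record["day"])
            yield "-".join(parts)


def build_cube(metric, records):
    """Sum the metric under every rollup key so that reads become lookups."""
    cube = Counter()
    for record in records:
        for key in cube_keys(metric, record):
            cube[key] += record["count"]
    return cube


# Hypothetical sample events, not the demo data.
events = [
    {"day": "2014-01-04", "airport": "SFO", "state": "CA", "count": 3},
    {"day": "2014-01-04", "airport": "LAX", "state": "CA", "count": 2},
    {"day": "2014-01-04", "airport": "JFK", "state": "NY", "count": 4},
]
cube = build_cube("Arrivals", events)
print(cube["Arrivals-2014-01-04"])             # all arrivals that day
print(cube["Arrivals-s:CA-2014-01-04"])        # arrivals in California
print(cube["Arrivals-a:SFO-s:CA-2014-01-04"])  # arrivals at SFO
```

With these sample events the three lookups return 9, 5, and 3. Materializing every subset grows quickly with the number of dimensions, which is presumably why the engine lets indexes declare which combinations to build.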

Page 21

• Aggregate API that understands metrics, dimensions, time ranges.

• Relational API that understands (some) SQL.

API - Connecting to the Dashboard Engine

Aggregate API

SQL API

Page 22

Demo

Page 23

Flight Data Statistics for Demo

• Running on a 16-node cluster (TD Appliance for Hadoop)

• Processes and stores all data in ~2 hours

                 Rows          Storage space
Flight records   160 million   30 GB
MOLAP cube       35 billion    2.1 TB
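The demo figures imply a few ratios worth making explicit. The arithmetic below is a sketch that assumes decimal units (1 TB = 1000 GB) and treats "~2 hours" as exactly two hours; the variable names are illustrative.

```python
# Figures from the demo slide; decimal units (1 TB = 1000 GB) are assumed.
flight_rows = 160_000_000
flight_bytes = 30 * 10**9
cube_rows = 35_000_000_000
cube_bytes = 2_100 * 10**9
load_seconds = 2 * 3600  # "~2 hours", taken as exactly two hours

rows_per_record = cube_rows / flight_rows        # cube rows per flight record
storage_factor = cube_bytes / flight_bytes       # cube size relative to raw data
bytes_per_cube_row = cube_bytes / cube_rows      # average stored size of one cube row
records_per_second = flight_rows / load_seconds  # raw records processed per second

print(f"{rows_per_record:.2f} cube rows per flight record")
print(f"{storage_factor:.0f}x storage expansion")
print(f"{bytes_per_cube_row:.0f} bytes per cube row")
print(f"{records_per_second:.0f} flight records per second")
```

Under those assumptions each flight record fans out into roughly 219 cube rows, the cube takes about 70 times the raw storage at about 60 bytes per row, and the load runs at roughly 22,000 records per second. That is the trade the engine makes: more storage and batch time in exchange for lookups at query time.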

Page 24

• Sends SQL queries to the API

SQL Query to REST API Example

SELECT FlightData.Date AS "none_Date_ok",
       FlightData.State AS "none_State_nk",
       SUM(FlightData.Arrivals) AS "sum_Arrivals_nk"
FROM "default"."FlightData" "FlightData"
GROUP BY "none_Date_ok", "none_State_nk"

• Translated to Aggregate API queries

http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=1970-01-01&dimension=State:&metric=Arrivals
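The translation step can be pictured as assembling an Aggregate API URL from the parts of the SQL query: the grouped date becomes the period, each grouped column becomes a dimension, and the summed column becomes the metric. Here is a hedged sketch that assumes the SQL has already been parsed into those parts. The parameter names and host come from the slide URLs; the function name `aggregate_url` and its signature are hypothetical.

```python
from urllib.parse import urlencode

# Host and path as shown on the slide.
BASE_URL = "http://10.25.12.241:52080/clickstream/aggregate/v1/"


def aggregate_url(metric, period="day", start="1970-01-01", end=None,
                  dimensions=(), headers=False):
    """Build an Aggregate API URL.

    dimensions is a sequence of (name, value) pairs; a value of None
    produces 'Name:' (no filter value), as in the slide's 'dimension=State:'.
    """
    params = [("period", period), ("start", start)]
    if end:
        params.append(("end", end))
    for name, value in dimensions:
        params.append(("dimension", f"{name}:{value or ''}"))
    params.append(("metric", metric))
    if headers:
        params.append(("headers", "on"))
    # Keep ':' unescaped so the URL matches the form shown on the slides.
    return BASE_URL + "?" + urlencode(params, safe=":")


# GROUP BY Date, State with SUM(Arrivals) maps to:
print(aggregate_url("Arrivals", dimensions=[("State", None)]))
```

Called this way, the function reproduces the slide's URL. A list of pairs is used instead of a dict because the `dimension` parameter can repeat.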

Page 25

Example index: List all Airports for a specific State

<index name="AirportsByState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State" />
  </indexDimensions>
  <listDimensions>
    <dimension name="Airport" />
  </listDimensions>
</index>
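An index definition like this one is small enough to inspect programmatically. The sketch below reads it with Python's standard `xml.etree.ElementTree`; the element and attribute names are the ones on the slide, while `parse_index` and the dictionary layout it returns are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Index definition copied from the slide.
INDEX_XML = """
<index name="AirportsByState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State" />
  </indexDimensions>
  <listDimensions>
    <dimension name="Airport" />
  </listDimensions>
</index>
"""


def parse_index(xml_text):
    """Return the index name, periods, and both dimension lists as a dict."""
    root = ET.fromstring(xml_text.strip())

    def dimension_names(section):
        return [d.get("name") for d in root.findall(f"{section}/dimension")]

    return {
        "name": root.get("name"),
        "periods": [p.text for p in root.findall("periods/period")],
        "index_dimensions": dimension_names("indexDimensions"),
        "list_dimensions": dimension_names("listDimensions"),
    }


index = parse_index(INDEX_XML)
print(index)
```

For this index the result has `State` as the only index dimension and `Airport` as the only list dimension, which matches the caption: filter by state, list the airports.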

Page 26

Aggregate use: Show arrivals for all airports for NY


http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=Airport:&dimension=State:NY&metric=Arrivals&headers=on

Day Start    Airport  State  Arrivals
2014-01-04   ALB      NY     20
2014-01-04   ART      NY     1
2014-01-04   BUF      NY     40
...
2014-01-04   JFK      NY     167
2014-01-04   LGA      NY     206
2014-01-04   ROC      NY     17
2014-01-04   SWF      NY     2
2014-01-04   SYR      NY     14
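A client can turn rows like these into records for charting. The sketch below assumes the response with `headers=on` is whitespace-separated text with four fields per data row, which is a guess based on how the slide renders it; the actual wire format is not shown. `parse_rows` is a hypothetical helper.

```python
# Rows copied from the slide; "..." marks rows the slide elides.
RESPONSE = """\
Day Start Airport State Arrivals
2014-01-04 ALB NY 20
2014-01-04 ART NY 1
2014-01-04 BUF NY 40
...
2014-01-04 JFK NY 167
2014-01-04 LGA NY 206
2014-01-04 ROC NY 17
2014-01-04 SWF NY 2
2014-01-04 SYR NY 14
"""


def parse_rows(text):
    """Parse data rows into dicts, skipping the header and elision lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields; the header has five, "..." has one.
        if len(fields) != 4:
            continue
        day, airport, state, arrivals = fields
        rows.append({"day": day, "airport": airport,
                     "state": state, "arrivals": int(arrivals)})
    return rows


rows = parse_rows(RESPONSE)
busiest = max(rows, key=lambda r: r["arrivals"])
print(busiest["airport"], busiest["arrivals"])
```

Among the rows visible on the slide, the busiest airport is LGA with 206 arrivals.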

Page 27

Index: List Flight No / Carrier / City / State combinations

<index name="ListFlightNoCarrierCityState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
  </indexDimensions>
  <listDimensions>
    <dimension name="State" />
    <dimension name="City" />
    <dimension name="Carrier" />
    <dimension name="FlightNo" />
  </listDimensions>
</index>

Page 28

Dimensions use: Show all Flight/Carrier/City/State


http://10.25.12.241:52080/clickstream/dimensions/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=State:&dimension=City:&dimension=Carrier:&dimension=FlightNo:

"results":[
  ["AK","Anchorage, AK","AS","101"],
  ["AK","Anchorage, AK","AS","102"],
  ["AK","Anchorage, AK","AS","103"],
  ["AK","Anchorage, AK","AS","106"],
  ["AK","Anchorage, AK","AS","108"],
  ...
  ["AL","Huntsville, AL","DL","1782"],
  ["AL","Huntsville, AL","DL","2077"],
  ...
  ["WY","Rock Springs, WY","OO","7413"]]
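A dashboard would typically reshape this flat list into a nested structure for drill-down menus. The sketch below assumes the `"results"` array is a member of a JSON object and that each row follows the order of the requested dimensions (State, City, Carrier, FlightNo). The sample payload uses a few rows from the slide, and `flights_by_state_carrier` is a hypothetical helper.

```python
import json
from collections import defaultdict

# A few rows from the slide, wrapped in an object (the wrapper is an assumption).
RESPONSE = """
{"results": [
  ["AK", "Anchorage, AK", "AS", "101"],
  ["AK", "Anchorage, AK", "AS", "102"],
  ["AL", "Huntsville, AL", "DL", "1782"],
  ["AL", "Huntsville, AL", "DL", "2077"],
  ["WY", "Rock Springs, WY", "OO", "7413"]
]}
"""


def flights_by_state_carrier(payload):
    """Group flight numbers by state, then by carrier."""
    tree = defaultdict(lambda: defaultdict(list))
    for state, _city, carrier, flight_no in json.loads(payload)["results"]:
        tree[state][carrier].append(flight_no)
    return tree


tree = flights_by_state_carrier(RESPONSE)
print(sorted(tree))        # states present in the sample
print(tree["AL"]["DL"])    # flight numbers for carrier DL in Alabama
```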

Page 29

Index Question

Q: How do you drill down to a list of Delta flights that caused delays in Colorado?

A: Create the index below, rerun the index creation step, then query the delay metrics for the given state and carrier while listing flight numbers with dimension=FlightNo:

<index name="ListFlightNoByCarrierState">
  <periods>
    <period>day</period>
  </periods>
  <indexDimensions>
    <dimension name="State" />
    <dimension name="Carrier" />
  </indexDimensions>
  <listDimensions>
    <dimension name="FlightNo" />
  </listDimensions>
</index>
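One way to reason about why a new index is needed is to check whether an existing index covers the query. The matching rule below is inferred from the examples in this deck (dimensions with a filter value appear under indexDimensions, dimensions listed with an empty value appear under listDimensions); it is not documented engine behavior. The dictionary layout, the function `index_can_serve`, and the codes `CO` and `DL` for Colorado and Delta are assumptions.

```python
def index_can_serve(index, dimensions):
    """Check an index against a query.

    dimensions is a list of (name, value) pairs; a value of None means
    'list all values' (the 'dimension=FlightNo:' form in the URLs).
    """
    filtered = {name for name, value in dimensions if value is not None}
    listed = {name for name, value in dimensions if value is None}
    return (filtered <= set(index["index_dimensions"])
            and listed <= set(index["list_dimensions"]))


# The two relevant indexes from the deck, written as plain dicts.
airports_by_state = {
    "index_dimensions": ["State"],
    "list_dimensions": ["Airport"],
}
flight_no_by_carrier_state = {
    "index_dimensions": ["State", "Carrier"],
    "list_dimensions": ["FlightNo"],
}

# Delta flights in Colorado, listing flight numbers.
query = [("State", "CO"), ("Carrier", "DL"), ("FlightNo", None)]

print(index_can_serve(airports_by_state, query))           # existing index
print(index_can_serve(flight_no_by_carrier_state, query))  # new index
```

Under this rule the existing AirportsByState index cannot answer the question, while ListFlightNoByCarrierState can, which is consistent with the slide's answer to create it and rerun index creation.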

Page 30

Questions?

Page 31

DATA ANALYTICS

DATA ENGINEERS

DATA SOLUTIONS

Think Big International

We are hiring!!!

http://thinkbigcareers.teradata.com/