Drawing the Big Picture: Multi-Platform Data Architectures, Queries, and Analytics

Philip Russom, TDWI Research Director for Data Management

August 26, 2015

Drawing the Big Picture - 1105 Media · download.1105media.com/pub/tdwi/Files/082615Teradata.pdf · 2015-08-25




Sponsor


Speakers

Imad Birouty, Director, Technical Product Marketing, Teradata

Philip Russom, TDWI Research Director, Data Management


Agenda

• The Mission

– Queries, analytics, and other BI that reach multiple warehouse and data platforms simultaneously

• Enabling Technologies

– Modern data warehouse environments (DWEs)

– Single-console tools

– Data exploration and discovery

– Standard SQL, but extended

– Grid, fabric, virtualization, logical DW…

• Benefits of the single big picture

– New ways to view data and develop queries or analytics

– Simplification for architecture, governance, stewardship, compliance, auditing, security...

• Recommendations

PLEASE TWEET: @pRussom, @Teradata, #TDWI, #Analytics, #BigData


The Mission Redux

• Today’s BI/DW/analytics demands:

– As much data as possible

– From more sources and source types

– In many structures, or structure-free

– Persisted on old and new data platform types

– Virtualized, as appropriate

– All the above, available all the time, for everyone

• We’ve always aspired toward these goals:

– But success is more likely today, because we have better software, hardware, skills, best practices…

– We also have better executive support

• Organizations want more business value from big data, new data, analytics, new data-driven business programs…


Enablers for the Revised Mission

• New tool types and functions, plus their disciplines & practices

– Data exploration and data discovery

– More agile data preparation

– Data visualization – ease of use, analytics, fun & compelling presentations, storytelling…

• New data platforms

– Hadoop, whether open source or vendor distro

– MPP RDBMSs, appliances & columnar

• Old skills and technologies, too

– SQL & other relational techs are as important as ever

• All the above, integrated and interoperable

– Single console – or as few tools as possible

– Single access & query method – SQL, but for any data and any platform

– Data architecture – to integrate the back end


DEFINITION

Multi-Platform Data Warehouse Environments

• Many enterprise data warehouses (EDWs) are evolving into multi-platform data warehouse environments (DWEs).

• Users continue to add standalone data platforms to their warehouse tool and platform portfolio.

• The new platforms don’t replace the core warehouse, because it is still the best platform for the data that goes into standard reports, dashboards, performance management, and OLAP.

• Instead, the new platforms complement the warehouse, because they are optimized for workloads that manage, process, and analyze new forms of big data, non-structured data, and real-time data.


Modern DW Architectures are Complex

• The tech stack for DW, BI, DI, & analytics has always been a multi-platform environment.

• What’s new? The trend toward a portfolio of many physical data platforms has accelerated. Logical architecture that integrates them is very important.

• Why do it? More platform types to serve more types of users, data & workloads.

[Figure: the many physical platforms a DWE accumulates over the passage of time: a data warehouse (star or snowflake schema), federated data marts, customer mart or ODS, real-time ODS, data staging areas, detailed source data, OLAP cubes and OLAP DBMSs, multidimensional data models, metrics for performance management, a DW from a merger, DW appliances, columnar DBMSs, cloud-based DBMSs, Hadoop (distributed file system and MapReduce), NoSQL databases, an analytic sandbox, data federation & virtualization, complex event processing, and streaming data tools.]

Logical Data Warehouse: It’s a logical and/or virtual layer of the DW architecture that complements the physical layer of architecture under it.


DEFINITIONS OF THE Logical Data Warehouse

• TDWI: A data warehouse is a user-defined data architecture

– The architecture & its design components must be populated by data

– But the data can be physical, logical/virtual, or both

– So, most DW architectures have two key layers: physical & logical

• Gartner’s view: A Logical DW depends on virtual tech

– From simple federation to object-oriented virtualization, plus virtual views, indices, semantics, server memory…

• Building out the Logical Layer of your DW is important

– The logical layer enables cross-platform integration and interoperability, for broad queries, exploration, analytics…


DEFINITIONS OF THE Logical Data Warehouse (LDW)

• The LDW layer provides a unified view (or a collection of views) of data in multiple platforms

– Plus a simplified (yet diverse & high-performance) collection of interfaces into such sources and targets to achieve interoperability, especially for queries

• The point of the LDW layer is to provide:

– A fairly comprehensive big picture of data in the DWE

– A single layer through which data can be accessed, thereby reducing data redundancy, movement, processing

– A simplified view & related mechanisms that enable more user types

• Similar concepts:

– Virtual DW (LDW is often partially virtual, but mostly physical)

– Real-Time DW, Operational DW, Active DW, Dynamic DW

– Query Grid, Data Grid, Data Fabric
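The LDW idea of "a single layer through which data can be accessed" can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not any vendor's API: `LogicalLayer`, `register`, `platform_of`, and `scan` are names invented here, and in-memory lists stand in for a relational DW and for Hadoop. It shows consumers asking for a logical view by name while rows stream from wherever they physically reside, without being copied into the layer.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


class LogicalLayer:
    """Hypothetical LDW layer: maps logical view names to physical sources."""

    def __init__(self) -> None:
        # view name -> (platform label, function that yields rows on demand)
        self._views: Dict[str, Tuple[str, Callable[[], Iterable[Row]]]] = {}

    def register(self, view: str, platform: str,
                 reader: Callable[[], Iterable[Row]]) -> None:
        """Expose a physical source under a logical view name."""
        self._views[view] = (platform, reader)

    def platform_of(self, view: str) -> str:
        """Report which physical platform backs a logical view."""
        return self._views[view][0]

    def scan(self, view: str) -> Iterator[Row]:
        """Stream rows from the backing platform; nothing is copied into the layer."""
        _, reader = self._views[view]
        yield from reader()

    def views(self) -> List[str]:
        return sorted(self._views)


# Illustrative stand-ins for two physical platforms.
warehouse_customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
hadoop_clicks = [{"customer_id": 1, "page": "/home"},
                 {"customer_id": 1, "page": "/buy"},
                 {"customer_id": 2, "page": "/home"}]

ldw = LogicalLayer()
ldw.register("customers", "relational DW", lambda: warehouse_customers)
ldw.register("clickstream", "Hadoop", lambda: hadoop_clicks)

# A consumer queries by logical name, unaware of where the data lives.
acme_clicks = [r for r in ldw.scan("clickstream") if r["customer_id"] == 1]
print(ldw.views(), len(acme_clicks))
```

A real logical layer would add metadata, security, and query optimization on top; the sketch only captures the name-to-platform indirection.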


NEW ARCHITECTURES

Hadoop Integrated with a Relational DBMS

The strengths of one balance the weaknesses of the other

• A Relational DBMS is good at:

– Metadata management

– Complex query optimization

– Table joins, views, keys, etc.

– Security, including roles, directories

• HDFS & other Hadoop tools are good at:

– Massive, linear scalability

– Multi-structured & no-schema data

– Some ETL and ELT functions

– Custom code for algorithmic analytics

• Other platforms are also being tightly integrated w/relational DW

– Analytic DBMSs based on columnar, appliance, MapReduce, graph

• To make this integration of diverse data platforms practical:

– Good design by users for the logical DW architectural layer

– Vendor tools that can reach all the above and more from one query


Importance of Data Exploration

• Exploring data is a first step to leveraging new data

– Never allow new data into a DW without proper vetting

– Assess value & use cases for new (big) data via exploration

• Exploring data is a prerequisite to analyzing data

– By its nature, analysis makes correlations across data of diverse sources, structures, subjects, and vintages

– Finding just the right combination for successful analysis depends on data exploration as a first step

• High ease of use for user productivity

– Some users are biz people who need a biz-friendly view

– Ease of use accelerates developers’ productivity, too

• Support for all data platforms, from relational to Hadoop

– A modern data exploration tool will merge diverse data via a single complex query

• A data exploration tool must do more than exploration

– Profile data to understand its content and condition

– Extract data, model the result set, index big data

– Deduce data’s structure and develop metadata

– Perform tasks as you go, not ahead of time, for greater agility
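Two of the tasks listed above, profiling data and deducing its structure, can be illustrated with a small Python sketch. It is a hypothetical example, not how any particular exploration tool works: `infer_type` and `profile` are invented names, the type rules (int, then float, then string, with empty values counted as nulls) are a simplifying assumption, and the sample records are made up.

```python
from collections import Counter
from typing import Dict, List, Optional


def infer_type(value: Optional[str]) -> str:
    """Classify one raw string value as 'null', 'int', 'float', or 'string'."""
    if value is None or value == "":
        return "null"
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"


def profile(records: List[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Deduce a type per field and count nulls and distinct non-null values."""
    fields = sorted({k for r in records for k in r})
    result: Dict[str, Dict[str, object]] = {}
    for field in fields:
        values = [r.get(field) for r in records]   # absent fields become None
        types = Counter(infer_type(v) for v in values)
        non_null = {t for t in types if t != "null"}
        if non_null <= {"int"}:
            deduced = "int" if non_null else "unknown"
        elif non_null <= {"int", "float"}:
            deduced = "float"                      # ints widen to float
        else:
            deduced = "string"
        result[field] = {
            "type": deduced,
            "nulls": types["null"],
            "distinct": len({v for v in values if v not in (None, "")}),
        }
    return result


# Made-up raw records with inconsistent shapes, as new data often arrives.
records = [
    {"vin": "1A", "miles": "1200", "temp": "88.5"},
    {"vin": "2B", "miles": "", "temp": "90"},
    {"vin": "3C", "miles": "5400"},
]
p = profile(records)
for name, stats in p.items():
    print(name, stats)
```

The deduced types could then seed metadata for the new data, in the spirit of performing such tasks as you go rather than ahead of time.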


ITERATIVE, FOUR-STEP PROCESS FOR Exploratory Analytics with New (Big) Data

[Figure: a repeating cycle of four steps: Explore, Data Prep, Analyze, Visualize]


A FEW REQUIREMENTS FOR Advanced Analytics

• Market direction: Seamless integration

– In one tool environment, exploration, data prep, analysis, visualization, and more

– The iterative, four-step process of exploratory analytics demands tight tool integration

• Advanced forms of analytics

– Mining, predictive, statistics, NLP (not OLAP)

– Algorithmic, as well as query based

• Both canned and home-grown algorithms

– Tool should include library of pre-built algorithms

– Tool should also help you write your own

• High ease-of-use for broad collaboration

– Functions for both technical and business users

– Both develop analytic apps and consume them

– Assume that many user types will share their work



SQL is More Important than Ever

• Data professionals want and depend on SQL

– It must be ANSI standard, high performance, iterative, optimized

– Why? To leverage user skills and SQL-based tool portfolios

• SQL on Hadoop versus SQL off Hadoop argument

– Users interviewed want BOTH!

– In survey, SQL on Hadoop is a “must have” (69%)

– Only 4% don’t need SQL on Hadoop

Source: TDWI survey run in late 2014, based on 99 respondents.


SQL-Based Analytics

• Data Exploration = Ad-hoc queries on steroids

– A query grows in size, scope, and complexity with each iteration

• KLOCs = Thousands of Lines of [SQL] Code

– Whether tool-generated, hand-written, or both

• Complex SQL expresses many things

– Data access via many interfaces, near real time

– Data models, even dimensional ones

– Multi-way joins, but also complex transformations

• Growing number and diversity of users

– Data analysts, data scientists, BI/DW pros, business analysts

• All the above demand a hefty tool environment

– As described on the next slide…
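How an exploratory query "grows in size, scope, and complexity with each iteration" can be sketched with Python's built-in `sqlite3` module. The tables, rows, and the three iterations are made-up assumptions for illustration; SQLite merely stands in for whatever SQL engine or cross-platform query tool is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0), (4, 3, 20.0);
""")

# Iteration 1: a first look at a single table.
q1 = "SELECT COUNT(*) FROM orders"

# Iteration 2: the analyst adds a join to bring in customer attributes.
q2 = """
SELECT c.region, SUM(o.amount) AS total
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region
"""

# Iteration 3: scope grows again, with a CTE and a filter on the aggregate.
q3 = """
WITH region_totals AS (
    SELECT c.region, SUM(o.amount) AS total, COUNT(*) AS n_orders
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region
)
SELECT region, total, n_orders FROM region_totals
WHERE total > 100
ORDER BY total DESC
"""

for i, q in enumerate((q1, q2, q3), start=1):
    print(f"iteration {i}:", conn.execute(q).fetchall())
```

Multiply this pattern across many sources and iterations, and tool-generated or hand-written SQL quickly reaches the thousands-of-lines scale the slide describes.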


SUMMARY & CONCLUSION: TOOLS AND REQUIREMENTS FOR Logical Data Warehousing and Other Complex Data Ecosystems

• Look for tools and environments that enable:

– Designing and architecting a “big picture”

– Interoperability among diverse systems and data types

– Data operations optimized across multiple platforms

– ANSI SQL support; performance for iterative queries

• Features that help with complex data architectures:

– Distributed queries, in the extreme

– High performance, even with multiple platforms

– Metadata management and metadata deduction

– Easy ingestion of new data, whether streaming or static

– Real-time indexing, to keep pace with data ingestion

– Single-sign-on security, despite multiple systems
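"Real-time indexing, to keep pace with data ingestion" can be illustrated by an inverted index that is updated as each record arrives, rather than rebuilt in batches. This is a hypothetical Python sketch: `IncrementalIndex`, `ingest`, and `search` are invented names, whitespace tokenization is a simplifying assumption, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


class IncrementalIndex:
    """Hypothetical inverted index updated at ingest time, so new records
    are searchable immediately instead of after a batch rebuild."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self._next_id = 0

    def ingest(self, text: str) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = text
        # Index only the new record's tokens; existing postings are untouched.
        for token in text.lower().split():
            self.postings[token].add(doc_id)
        return doc_id

    def search(self, *terms: str) -> List[int]:
        """Return ids of records containing all terms (AND semantics)."""
        # .get() avoids creating empty entries in the defaultdict.
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


idx = IncrementalIndex()
for line in ["engine overheat P0217", "brake wear warning", "engine misfire P0300"]:
    idx.ingest(line)   # each record is searchable as soon as this returns

print(idx.search("engine"), idx.search("engine", "P0300"))
```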


RECOMMENDATIONS

Draw the Big Picture for its Benefits

• Benefits of the unified big picture of data.

– New ways to view data & develop queries & analytics

– Simplification for data architecture, governance, stewardship, compliance, auditing, security...

• Revisit your mission as a data professional

– Tons of data, sources, and source types, in many structures (or structure-free), persisted on old and new data platform types (virtualized, as appropriate)

– All the above, available all the time, for everyone

• Satisfy new requirements with tools/platforms that provide unified view

– Virtual DW and miscellaneous approaches to Real-Time DW

– Query Grid, Data Grid, Data Fabric

– Special functions: Hadoop, exploration, SQL-based analytics…


Teradata QueryGrid™

Imad Birouty, Director, Teradata Product Marketing


Data Mart (1990s): Just Give Me Some Data and Fast!

EDW/IDW (2000s): Give Me Good Data But Do It Efficiently!

Logical Data Warehouse (2010s): Give Me All Data Fast, Simple & Effectively!


What’s Different Today?

There Is No Single Technology That Can Do Everything

Higher volume of data

New sources of data

New types of data

New technologies

New economic models

Increased prevalence of analytics


What’s The Same Today?

• Users need access to all relevant data to make informed business decisions

• Users need timely access to data when they need it

• User skills and tools


Shift from a Single Platform to an Ecosystem

“We will abandon the old models based on the desire to implement for high-value analytic applications.”

"Logical" Data Warehouse


Not All Data Should Be Treated Equally

• Data of different value

– High value density: ERP, CRM,…

– Low value density: sensors, weblogs, social,…

• Different processing techniques required

– Structured data: SQL

– Multi-structured data: SQL, NoSQL

• Different integration requirements

– Pre-define schema and integrate upon data acquisition (schema-on-write)

– Define schema during query runtime (schema-on-read)

Regardless, data and analytics should be accessible.
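The contrast between schema-on-write and schema-on-read can be made concrete with a short Python sketch. Everything here is a hypothetical illustration: `WARRANTY_SCHEMA`, `load_with_schema`, `raw_sensor_log`, and `read_field` are invented names, a Python list stands in for a warehouse table, and JSON lines stand in for raw multi-structured data on a platform such as Hadoop.

```python
import json
from typing import Dict, List

# Schema-on-write: the schema is fixed up front and enforced at load time.
WARRANTY_SCHEMA = {"vin": str, "claim_amount": float}


def load_with_schema(record: Dict[str, object],
                     table: List[Dict[str, object]]) -> None:
    """Validate and coerce a record before storing it; reject what does not fit."""
    row = {}
    for column, col_type in WARRANTY_SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing column {column!r}")
        row[column] = col_type(record[column])
    table.append(row)


# Schema-on-read: raw text is stored as-is; each query applies structure.
raw_sensor_log: List[str] = []


def read_field(raw_lines: List[str], field: str) -> List[object]:
    """Parse raw JSON lines at query time, projecting one field (None if absent)."""
    return [json.loads(line).get(field) for line in raw_lines]


warranty: List[Dict[str, object]] = []
load_with_schema({"vin": "1A", "claim_amount": "250.0"}, warranty)
try:
    load_with_schema({"vin": "2B"}, warranty)   # missing claim_amount
    rejected = False
except ValueError:
    rejected = True

# Records of different shapes are accepted as-is at load time.
raw_sensor_log.append('{"vin": "1A", "dtc": "P0217", "temp": 118}')
raw_sensor_log.append('{"vin": "2B", "rpm": 3100}')
dtcs = read_field(raw_sensor_log, "dtc")
print(warranty, rejected, dtcs)
```

The trade-off the slide points to is visible here: schema-on-write rejects bad data early but needs the schema first, while schema-on-read accepts anything and defers interpretation (and its errors) to query time.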


Data Fabric Enabled by QueryGrid™

Analytic Flexibility to Meet Your Business Needs

• Pick your best-of-breed technology:

– Data types

– Analytic engines

– Economic options

• Run the right analytic on the right platform:

– Minimize data movement; process data where it resides

– Minimize data duplication

– Optimized work distribution through “push-down” processing

– Bi-directional data movement

Users direct their queries to a cohesive data fabric using existing SQL skills & tools

Focus on data and business questions, not integrating separate systems
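Why "push-down" processing minimizes data movement can be shown with a toy Python comparison: the same filter is evaluated either after shipping every row from a remote platform, or on the remote side so that only matching rows travel. This is a hypothetical sketch, not QueryGrid's mechanism: `RemoteSource`, `fetch`, and the two query functions are invented names, and "rows shipped" is a stand-in for network transfer.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class RemoteSource:
    """Hypothetical remote platform that can evaluate a filter next to its data."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def fetch(self, predicate: Optional[Predicate] = None) -> Tuple[List[Row], int]:
        """Return the selected rows and how many rows were shipped to the caller."""
        selected = [r for r in self._rows if predicate is None or predicate(r)]
        return selected, len(selected)


def query_without_pushdown(src: RemoteSource, pred: Predicate) -> Tuple[List[Row], int]:
    rows, shipped = src.fetch()          # ship every row, then filter locally
    return [r for r in rows if pred(r)], shipped


def query_with_pushdown(src: RemoteSource, pred: Predicate) -> Tuple[List[Row], int]:
    return src.fetch(pred)               # filter runs where the data resides


def is_overheat(row: Row) -> bool:
    return row["dtc"] == "P0217"


# Made-up data: 1 row in 100 carries the code of interest.
src = RemoteSource([{"id": i, "dtc": "P0217" if i % 100 == 0 else None}
                    for i in range(10_000)])
pulled, shipped_pull = query_without_pushdown(src, is_overheat)
pushed, shipped_push = query_with_pushdown(src, is_overheat)
print(len(pulled), shipped_pull, shipped_push)
```

Both strategies return the same answer; only the volume moved between systems differs, which is the point of processing data where it resides.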


Teradata QueryGrid™ Demo


Multi-System Query

For all cars that received warranty repair, find the reported Diagnostic Trouble Code (DTC)

– Requires data from Hadoop and Teradata data warehouse

– Query passed through, data not persisted

[Figure: Teradata QueryGrid connecting two systems. Teradata production data: VINs, service records, warranty data, DTC descriptions. Hadoop raw multi-structured data: massive amounts of detailed sensor data.]
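One way a query like the warranty/DTC example could be evaluated across two systems is a semijoin: send the set of repaired VINs to the sensor-data scan and stream back only matching records, enriched with descriptions from the warehouse. The Python below is a hypothetical sketch of that idea only; it is not QueryGrid's implementation or syntax. All names are invented, plain Python structures stand in for the Teradata tables and the Hadoop scan, and the DTC descriptions are illustrative.

```python
from typing import Dict, Iterator, List, Set

# Stand-ins for Teradata production data (illustrative values only).
teradata_warranty: List[Dict[str, str]] = [
    {"vin": "1A", "repair": "cooling system"},
    {"vin": "3C", "repair": "ignition coil"},
]
dtc_descriptions: Dict[str, str] = {
    "P0217": "engine over-temperature",
    "P0300": "random misfire",
}


def hadoop_sensor_stream() -> Iterator[Dict[str, str]]:
    """Simulates scanning raw sensor records on the Hadoop side."""
    yield {"vin": "1A", "dtc": "P0217"}
    yield {"vin": "2B", "dtc": "P0300"}
    yield {"vin": "3C", "dtc": "P0300"}


def warranty_dtcs() -> Iterator[Dict[str, str]]:
    """For cars with warranty repairs, yield their reported DTCs with descriptions.

    Only the VIN set crosses to the sensor scan, and matches are streamed
    back as a generator; no intermediate table is written on either side.
    """
    repaired: Set[str] = {row["vin"] for row in teradata_warranty}
    for rec in hadoop_sensor_stream():
        if rec["vin"] in repaired:
            yield {
                "vin": rec["vin"],
                "dtc": rec["dtc"],
                "description": dtc_descriptions.get(rec["dtc"], "unknown"),
            }


result = list(warranty_dtcs())
for row in result:
    print(row)
```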



Questions?


Contact Information

If you have further questions or comments:

Philip Russom, TDWI

Imad Birouty, Teradata