Getting Real With Hadoop
Jim Scott, Director, Enterprise Strategy & Architecture
@kingmesal #BigDataEverywhere #Chicago - October 1st, 2014
© 2014 MapR Technologies

Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)

Can’t We All Just Get Along?

We Have All Contributed…

The Reality is Architecture Matters

High Availability (HA) Everywhere

• No NameNode architecture
– Distributed metadata can self-heal
– No practical limit on # of files
• MapReduce/YARN HA
– Jobs are not impacted by failures
– Meet your data processing SLAs
• NFS HA
– High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
• Instant recovery
– Files and tables are accessible within seconds of a node failure or cluster restart
• Rolling upgrades
– Upgrade the software with no downtime
• HA is built in
– No special configuration to enable HA
– All MapR customers operate with HA

RDBMS Hammer

Hadoop Hammer

Data Everywhere!

Social Media

Messages

Audio

Sensors

Mobile Data

Email

Clickstream

Friends don’t let friends run name nodes.

Too Many Files!

Friends don’t let friends run name nodes.

Volumes

100K volumes are OK, create as many as needed

Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions

/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson

MapR M7: The Best In-Hadoop Database

MapR Enterprise Database Edition (M7)

• NoSQL Columnar Store
• Apache HBase API
• Integrated with Hadoop

The most scalable, enterprise-grade NoSQL database that supports online applications and analytics

[Diagram: storage stacks compared]
Other Distros: HBase / JVM / HDFS / JVM / ext3/ext4 / Disks
MapR-DB: Tables/Files / Disks

Tradeoffs with Other NoSQL Solutions

• Reliability: 24x7 applications with strong data consistency
• Performance: Continuous low latency with horizontal scaling
• Easy Administration: Easy day-to-day management with minimal learning curve

Consistent, Low Read Latency

[Chart: M7 Read Latency vs. Others Read Latency]

MapR Integrates Security into Hadoop

Hadoop Security

Authorization to ensure the right access to files and databases

Authentication for users and user-created job requests

Encryption to ensure user credentials and data are always secure

Integration with existing security infrastructure

Fine-Grained Access Control

Full POSIX permissions on files and directories

ACLs on tables, column families and columns

ACLs on MapReduce jobs and queues

Administration ACLs on cluster and volumes

ACLs for Apache Hive, Apache Drill and Impala

Seamless Integration with Direct Access NFS

• MapR is POSIX compliant
– Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier to use non-Java programming languages
– No need for most proprietary Hadoop connectors
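
To make the "random reads/writes" and "non-Java programming languages" points concrete, here is a minimal Python sketch. It assumes the cluster is already NFS-mounted somewhere on the local machine, so ordinary file calls are all that is needed. The function name and the file path passed to it are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def append_then_patch(path, record, offset, patch):
    """Append a record, then overwrite bytes in place at an arbitrary offset.

    `path` is assumed to live under an NFS mount of the cluster; nothing in
    this function is Hadoop-specific, which is the point of the slide.
    """
    # Streaming-style ingest: append to the end of the file.
    with open(path, "ab") as f:
        f.write(record)
    # Random write: seek to an offset and overwrite without rewriting the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)
    # Read the result back through the same standard file interface.
    with open(path, "rb") as f:
        return f.read()
```

Because the code only uses standard file operations, the same sketch can be tried against a local temporary directory before pointing it at a mounted cluster path.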

Disaster Recovery: Mirroring

• Flexible
– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast
– No performance impact
– Block-level (8KB) deltas
– Automatic compression
• Safe
– Point-in-time consistency
– End-to-end checksums
• Easy
– Graceful handling of network issues
– No third-party software
– Takes less than two minutes to configure!

[Diagram: mirroring over WAN between Production in Datacenter 1 and Production/Research in Datacenter 2, and over WAN to EC2]
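
The idea behind "block-level (8KB) deltas" can be illustrated with a short Python sketch. This is not MapR's mirroring implementation, only a toy model of the technique: compare two versions of the data in 8KB blocks and ship only the blocks that changed. The function names are hypothetical.

```python
BLOCK_SIZE = 8 * 1024  # 8KB, the delta granularity named on the slide


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return (offset, data) for every block of `new` that differs from `old`."""
    deltas = []
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != block:
            deltas.append((offset, block))
    return deltas


def apply_deltas(old: bytes, deltas, new_length: int) -> bytes:
    """Rebuild the new version on the mirror side from the old copy plus deltas."""
    # Truncate or zero-pad the old copy to the new length, then patch blocks.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for offset, block in deltas:
        buf[offset:offset + len(block)] = block
    return bytes(buf)
```

With a single byte changed in a 24KB buffer, only one 8KB block is transferred instead of the whole buffer, which is why delta-based mirroring keeps WAN traffic small.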

MapR Advantages

Feature | MapR-DB | Others
99.999% uptime | ✓ | X
Instant recovery from failures | ✓ | X
Continuous low latency (no compactions) | ✓ | X
Zero administration (no processes to manage, self-tuning) | ✓ | X
Online data protection (snapshots, mirroring) | ✓ | X
Scalability (number of tables supported) | Trillion | Hundreds
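
As a worked example of what a 99.999% uptime figure means in practice, the following Python sketch converts an availability percentage into a yearly downtime budget. The function name is hypothetical and a 365-day year is assumed.

```python
def downtime_budget_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Minutes of allowed downtime over `days` at the given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * 24 * 60


# "Five nines" over a 365-day year works out to roughly 5.26 minutes.
five_nines = downtime_budget_minutes(99.999)
```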

Packages Supported by Various Distributions

Category | Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
Core Hadoop | Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
Batch | Map Reduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
Batch | Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
Batch | Tez | 0.4 (Dev Preview Only) | X | 0.4 | 0.5
Batch | Pig | 0.12 | 0.12 | 0.12 | 0.12
Batch | Cascading | 2.1.6 | X | X | 2.5
Batch | Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Interactive SQL | Impala | 1.2.3 | 1.4 | X | 1.4
Interactive SQL | Drill | 0.5 | X | X | 0.5
Interactive SQL | SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
NoSQL and Search | HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
NoSQL and Search | Phoenix | X | X | 4.0.0 | 4.1.0
NoSQL and Search | AsyncHBase | 1.5 | X | X | 1.5
NoSQL and Search | Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
Machine Learning and Graph | Mahout | 0.9 | 0.9 | 0.9 | 0.9
Machine Learning and Graph | MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Machine Learning and Graph | GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
Streaming/Messaging | Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
Data Integration | Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
Data Integration | Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
Data Integration | Knox | X | X | 0.4 | 0.4
Coordination | Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
Coordination | Zookeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
GUI, Configuration, Monitoring | Management | MCS | CM | Ambari | Ambari
GUI, Configuration, Monitoring | Hue | 3.5 | 3.6 | 2.5.1 | 3.6

Legend: Red = lacking; Blue = leading

Sources:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html

Pick the Right Tool for the Job

MapR Distribution for Apache Hadoop

[Diagram: platform stack]

APACHE HADOOP AND OSS ECOSYSTEM

EXECUTION ENGINES (Batch; ML, Graph; SQL; NoSQL & Search; Streaming):
YARN, MapReduce v1 & v2, Pig, Cascading, Spark, Tez*, Mahout, MLLib, GraphX, Hive, Impala, SparkSQL, Drill, HBase, Solr, Accumulo*, Spark Streaming, Storm*

DATA GOVERNANCE AND OPERATIONS (Security; Workflow & Data Governance; Provisioning & coordination; Data Integration & Access):
Sentry*, Knox*, Oozie, Falcon*, ZooKeeper, Whirr, Juju, Savannah*, Sqoop, Flume, HttpFS, Hue

Management

MapR Data Platform

* Certification/support planned for 2014

MapR Distribution for Apache Hadoop

[Diagram: platform stack with APIs and management]

APACHE HADOOP AND OSS ECOSYSTEM

EXECUTION ENGINES (Batch; ML, Graph; SQL; NoSQL & Search; Streaming):
YARN, MapReduce v1 & v2, Pig, Cascading, Spark, Tez*, Mahout, MLLib, GraphX, Hive, Impala, SparkSQL, Drill, HBase, Solr, Accumulo*, Spark Streaming, Storm*

DATA GOVERNANCE AND OPERATIONS (Security; Workflow & Data Governance; Provisioning & coordination; Data Integration & Access):
Sentry*, Knox*, Oozie, Falcon*, ZooKeeper, Whirr, Juju, Savannah*, Sqoop, Flume, HttpFS, Hue

APIs: NFS, HDFS API, HBase API, JSON API

MapR Control System (Management and Monitoring): GUI, REST API, CLI

* Certification/support planned for 2014

1.65 TB WITH 298 SERVERS

1/7th the Hardware Footprint

Forrester Wave™: Big Data Hadoop Solutions, Q1 ’14

February 2014, “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014”

APACHE DRILL

• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications

40+ contributors
150+ years of experience building databases and distributed systems

Drill Supports Schema Discovery On-The-Fly

Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)

Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
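
The contrast between a declared schema and one discovered from self-describing data can be sketched in a few lines of Python. This is only an illustration of the concept, not how Drill is implemented. The function name and the sample records are hypothetical.

```python
import json


def discover_schema(json_lines):
    """Map each field name to the set of value types seen across all records."""
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            # No schema is declared up front; fields appear as records are read.
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical self-describing records; the second one adds a new field.
records = [
    '{"user": "msmith", "clicks": 3}',
    '{"user": "bjohnson", "clicks": 7, "device": "mobile"}',
]
```

The `device` field shows up in the discovered schema even though no one declared it, which is the "evolving schema" case on the slide.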

Operational Analytics

Must Be Able to Scale

[Diagram: Real-time and Actionable]

Operational:
• Mobile application server
• Web application server

Hadoop (MapR M7):
• User profiles and state
• User interactions
• Real-time location data
• Web and mobile session state
• Comments/rankings

Analytics:
• Customer 360 dashboard
• Churn analysis (predictive analytics)
• Product/service optimization and personalization
• Real-time ad targeting
• Data exploration (SQL)

General Application Monitoring

Hard Drive Failure Rates

Recommendation Engines

20M SONGS

Media Content Recommendation Engine

Fraud Detection

104M CARD MEMBERS

Offer Serving, Credit Risk & Fraud

More than $600B

PEOPLE

100M Data Points per second

Fastest Data Ingest Rates

Speed and Intelligence…

Forrester Wave™: NoSQL Key-Value Databases, Q3 ’14

September 2014, “The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014”

MapR Editions

• Control System, NFS Access, Performance, Unlimited Nodes, Free
• All the Features of M5, Simplified Administration for HBase, Increased Performance, Consistent Low Latency, Unified Snapshots, Mirroring
• Control System, NFS Access, Performance, High Availability, Snapshots & Mirroring, 24x7 Support, Annual Subscription

Fastest On-Ramp: MapR Sandbox for Hadoop

Engage with us!

@mapr
maprtech
MapR
maprtech
mapr-technologies