Meruvian - Introduction to MapR

Page 1

© 2014 MapR Technologies

Frans Thamura / Meruvian / [email protected], March 2014

Page 2

MapR Overview

• BIG DATA: Hadoop
• BEST PRODUCT: Top Ranked
• BUSINESS IMPACT: Production Success

Page 3

3 Trends Forcing a Revolution in Enterprise Architecture

Page 4

Industry Leaders Compete and Win with Data (Trend 1)

More Data Beats Better Algorithms
• Collecting interaction data from e-commerce, social media, offline, and call centers enables a “customer 360 view” and consumer intimacy

Competitive Advantage Is Decided by 0.5%
• Consumer financial services: a 1% improvement in fraud detection means hundreds of millions of dollars
• Advertising and retail: a 0.5% improvement in lift means millions of dollars of additional profitability
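A back-of-the-envelope calculation shows why a fraction of a percent matters at scale. The $10B revenue and 20% margin below are hypothetical inputs chosen for illustration, not figures from the deck:

```python
def incremental_profit(baseline_revenue: float, lift: float, margin: float) -> float:
    """Extra profit from a relative revenue lift at a fixed profit margin.

    lift is a fraction: 0.005 means a 0.5% improvement.
    """
    return baseline_revenue * lift * margin


# Hypothetical retailer: $10B annual revenue, 20% margin on incremental sales
extra = incremental_profit(10e9, 0.005, 0.20)
print(f"${extra / 1e6:.0f}M per year")  # $10M per year
```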

Page 5

Big Data Is Overwhelming Traditional Systems (Trend 2)

[Diagram labels: Enterprise Data Architecture, Outside Sources, Enterprise Users, Operational Systems, Analytical Systems]

Operational systems, production requirements:
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery

Analytical systems, production requirements:
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery

Page 6

Hadoop: The Disruptive Technology at the Core of Big Data (Trend 3)

[Charts: job trends from Indeed.com; Google Trends interest over time, 2004-2014]

Page 7

And 3 Realities

Page 8

Hadoop Relieves the Pressure from Enterprise Systems (Reality 1)

[Diagram labels: Operational Systems, Analytical Systems, Enterprise Users]

Workloads Hadoop takes on:
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions

Keys for Production Success
1. Reliability and DR
2. Interoperability
3. High performance
4. Supports operations and analytics

Page 9

What Would Google Do? (Reality 2)

Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
1. The transition from batch to real-time: in 2004 the web index was batch (GFS/MapReduce); by 2010 the web index was real-time (BigTable).
2. The explosion in operational applications.

Timeline: 2003 GFS, 2004 MapReduce, 2006 BigTable

Page 10

Architecture Matters for Success (Reality 3)

FOUNDATION

Page 11

Architecture Matters for Success (Reality 3)

Foundation:
• Data protection & security
• High performance
• Multi-tenancy
• Operational & analytical workloads
• Open standards for integration

Delivers: new applications, SLAs, trusted information, lower TCO

Page 12

MapR: Architecture Matters

Page 13

Fortune 100 Financial Services Company

104M card members

Page 14

Advertising Automation Cloud

Sellers cloud • Buyers cloud

100B ad auctions per day

Page 15

Fortune 100 Retailer

45M shoppers analyzed each month

Page 16

20M SONGS

Page 17

Largest Biometric Database in the World

1.3B people

Page 18

Common Use Cases: Taking Advantage of Hadoop

ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration

MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization

RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis

OPERATIONAL INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis

Page 19

MapR is the Hadoop Technology Leader


Page 20

The Power of the Open Source Community

Apache Hadoop and OSS ecosystem:
• Execution engines
  – Batch: MapReduce v1 & v2, Pig, Cascading, Spark, Tez*
  – SQL: Hive, Impala, Shark, Drill
  – Streaming: Spark Streaming, Storm*
  – NoSQL & Search: HBase, Solr, Accumulo*
  – ML, Graph: Mahout, MLLib, GraphX
  – Resource management: YARN
• Data governance and operations
  – Workflow & Data Governance: Oozie, Falcon*
  – Provisioning & coordination: ZooKeeper, Juju, Savannah*
  – Data Integration & Access: Sqoop, Flume, HttpFS, Hue
  – Security: Sentry*, Knox*
• Management

MapR Data Platform: MapR-FS, MapR-DB

* Certification/support planned for 2014

Page 21

MapR Distribution for Hadoop

[Same component diagram as Page 20, running on the MapR Data Platform (MapR-FS, MapR-DB)]

Enterprise-grade
• High availability • Data protection • Disaster recovery

Security
• Enterprise security authorization
• Wire-level authentication
• Data governance

Operational
• Ability to support predictive analytics, real-time database operations, and high-arrival-rate data

Performance
• 2X to 7X higher performance
• Consistent, low latency

Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators

Interoperability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support

Page 22

MapR Distribution for Hadoop

[Same component diagram as Page 20; this version also lists Whirr and marks Drill as planned for 2014 (*)]

Enterprise-grade: High availability • Data protection • Disaster recovery
  – Instant stateful failover
  – 99.999% availability
  – Consistent snapshots
  – Point-in-time recovery
  – Self-healing
  – WAN replication
  – RTO with mirroring
  – JobTracker HA
  – System resource protection
  – Job isolation and user quotas

Security: Enterprise security authorization • Wire-level authentication • Data governance
  – Kerberos support
  – Native key-based authentication
  – Enterprise directory integration (LDAP/NIS/AD)
  – Linux PAM
  – Role-based access control with Boolean expressions
  – Intel AES-NI high-performance encryption

Operational: Ability to support predictive analytics, real-time database operations, and high-arrival-rate data
  – Integrated in-Hadoop database
  – Consistent low latency
  – Instant recovery for database operations
  – No compactions
  – Elimination of read/write amplification
  – Zero administration

Performance: 2X to 7X higher performance • Consistent, low latency
  – No-NameNode distributed architecture
  – Database performance with no compactions or defragmentation
  – Automated compression

Multi-tenancy: Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
  – Data placement control
  – Job placement control
  – Logical volumes
  – Ability to leverage enterprise access control to isolate and secure data access
  – Enforce SLAs, provide job isolation

Interoperability: Standard file access • Standard database access • Pluggable services • Broad developer support
  – NFS support
  – POSIX
  – Random read/write
  – Concurrent read/write
  – JDBC/ODBC
  – Nagios/Ganglia integration
  – REST API

Page 23

MapR: Best Solution for Customer Success

• Top ranked
• Exponential growth
• 500+ customers
• Premier investors
• >2x annual bookings
• 80% of accounts expand 3X
• 90% software licenses
• <1% lifetime churn
• >$1B in incremental revenue generated by 1 customer

Page 24

Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at the time and are subject to change.

MapR: The Top Ranked Current Offering

“The score speaks for itself. MapR has added some unique innovations to its Hadoop distribution, including support for Network File System (NFS), running arbitrary code in the cluster, performance enhancements for HBase, as well as high-availability and disaster recovery features.”

[Figure: Forrester Wave plot of current offering (weak to strong) against strategy (weak to strong), with market presence also shown; segments: Risky Bets, Contenders, Strong Performers, Leaders]

Page 26

High Availability & Data Protection

Page 27

Business Continuity
• High Availability
• Data Protection
• Disaster Recovery

What are your requirements? What do you have for your enterprise storage, databases and data warehouses?

Page 28

High Availability (HA) Everywhere

• No NameNode architecture: distributed metadata can self-heal; no practical limit on # of files
• MapReduce/YARN HA: jobs are not impacted by failures; meet your data processing SLAs
• NFS HA: high throughput and resilience for NFS-based data ingestion, import/export and multi-client access
• Instant recovery: files and tables are accessible within seconds of a node failure or cluster restart
• Rolling upgrades: upgrade the software with no downtime
• HA is built in: no special configuration to enable HA; all MapR customers operate with HA

Page 29

Apache Hadoop NameNode High Availability

[Diagram: HDFS-based distributions in three configurations, each with DataNodes holding blocks A-F and one or more NameNodes holding the metadata]

Single NameNode (metadata for A-F):
• Single point of failure
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory

HDFS HA (primary NameNode plus standby NameNode, each holding metadata for A-F):
• Only one active NameNode
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports

HDFS Federation (NameNode pairs, each holding one slice of the namespace: A B, C D, E F):
• Multiple single points of failure without HA
• Needs 20 NameNodes for 1 billion files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
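The federation figure follows from simple division: at the 50 million-file end of the per-NameNode range, one billion files need 20 namespaces. The sketch also estimates the heap needed to keep that metadata in memory, assuming the commonly quoted rule of thumb of roughly 150 bytes per file or block object and one block per file; both are illustrative assumptions, not figures from the deck:

```python
import math


def namenodes_needed(total_files: int, files_per_namenode: int) -> int:
    """Federated namespaces (active NameNodes) needed to hold total_files."""
    return math.ceil(total_files / files_per_namenode)


def namenode_heap_gb(total_files: int, blocks_per_file: float = 1.0,
                     bytes_per_object: int = 150) -> float:
    """Rough NameNode heap: one in-memory object per file plus one per block.

    bytes_per_object=150 is an assumed rule of thumb, not a measured value.
    """
    objects = total_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1e9


billion = 1_000_000_000
print(namenodes_needed(billion, 50_000_000))   # 20
print(namenodes_needed(billion, 200_000_000))  # 5
print(namenode_heap_gb(billion))               # 300.0
```

Under HDFS HA, each active NameNode also needs a standby, so the 20-namespace case means twice as many NameNode processes.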

Page 30

No-NameNode Architecture

[Diagram: an HDFS NameNode holding all metadata (A-F) compared with MapR, where each metadata slice A-F is replicated three times across the cluster's nodes]

• Up to 1T files (> 5000x advantage)
• Significantly less hardware & OpEx
• Higher performance
• No special config to enable HA
• Automatic failover & re-replication
• Metadata is persisted to disk
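The diagram's point, metadata split into slices (A to F) with each slice kept on three different nodes instead of on one NameNode, can be illustrated with a toy round-robin placement. The node names and the placement rule are hypothetical; this is not MapR's actual metadata or container-placement algorithm:

```python
def place_shards(shards: str, nodes: list[str], replicas: int = 3) -> dict[str, list[str]]:
    """Assign each metadata shard to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for offset, shard in enumerate(shards):
        # Consecutive positions modulo len(nodes) are distinct while replicas <= len(nodes)
        placement[shard] = [nodes[(offset + i) % len(nodes)] for i in range(replicas)]
    return placement


def survives_failure(placement: dict[str, list[str]], failed: str) -> bool:
    """True if every shard still has a live replica after node `failed` is lost."""
    return all(any(node != failed for node in owners) for owners in placement.values())


nodes = [f"node{i}" for i in range(1, 10)]  # nine hypothetical nodes
layout = place_shards("ABCDEF", nodes)
print(layout["A"])                        # ['node1', 'node2', 'node3']
print(layout["F"])                        # ['node6', 'node7', 'node8']
print(survives_failure(layout, "node2"))  # True
```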

Page 31

Data Protection: Replication and Snapshots

Replication
• Protect from hardware failures
• File chunks, table regions and metadata are automatically replicated (3x by default)
• At least one replica on a different rack
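The rule "at least one replica on a different rack" can be shown with a small greedy placement. Node and rack names are hypothetical, and the greedy strategy is an illustration, not MapR's placement logic:

```python
def place_replicas(node_racks: dict[str, str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct nodes, putting the second one on another rack."""
    nodes = sorted(node_racks)
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    first = nodes[0]
    chosen = [first]
    if replicas > 1:
        # Second replica: first node found on a rack other than the first replica's.
        # If every node shares one rack, the rule cannot be met and this finds nothing.
        for node in nodes:
            if node_racks[node] != node_racks[first]:
                chosen.append(node)
                break
    # Fill the remaining replicas with any unused nodes
    for node in nodes:
        if len(chosen) >= replicas:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


cluster = {"n1": "rack1", "n2": "rack1", "n3": "rack1", "n4": "rack2", "n5": "rack2"}
picked = place_replicas(cluster)
print(picked)                                # ['n1', 'n4', 'n2']
print(sorted({cluster[n] for n in picked}))  # ['rack1', 'rack2']
```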

Snapshots
• Protect from user and application errors
• Point-in-time recovery
• Redirect on write
• No performance or scale impact
• Read files and tables directly from snapshot

[Diagram: chunks C1-C7 each stored three times across nodes; an active volume and a snapshot share blocks A, B, C and D, and a later write to D goes to a new block D₁]
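Redirect on write is what keeps snapshots cheap: taking a snapshot copies only the map of block pointers, and later writes go to new blocks (D₁ in the diagram) instead of overwriting shared ones. A toy model with hypothetical class and method names, not MapR's on-disk format:

```python
from typing import Optional


class Volume:
    """Toy volume: an ordered map of logical blocks to physical block ids."""

    def __init__(self, blocks: list[bytes]):
        self.store: dict[int, bytes] = {}  # physical block id -> contents
        self.next_id = 0
        self.active = [self._alloc(b) for b in blocks]
        self.snapshots: dict[str, list[int]] = {}

    def _alloc(self, data: bytes) -> int:
        block_id = self.next_id
        self.store[block_id] = data
        self.next_id += 1
        return block_id

    def snapshot(self, name: str) -> None:
        # Copy only the list of block pointers; no data blocks are duplicated.
        self.snapshots[name] = list(self.active)

    def write(self, index: int, data: bytes) -> None:
        # Redirect on write: new data goes to a fresh block and only the
        # active map is repointed, so snapshots keep seeing the old block.
        self.active[index] = self._alloc(data)

    def read(self, index: int, snapshot: Optional[str] = None) -> bytes:
        block_map = self.snapshots[snapshot] if snapshot is not None else self.active
        return self.store[block_map[index]]


vol = Volume([b"A", b"B", b"C", b"D"])
vol.snapshot("snap1")
vol.write(3, b"D1")
print(vol.read(3))           # b'D1'
print(vol.read(3, "snap1"))  # b'D'
print(len(vol.store))        # 5 (A, B, C and D are shared; only D1 is new)
```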

Page 32

Disaster Recovery: Mirroring
• Flexible
  – Choose the volumes/directories to mirror
  – You don’t need to mirror the entire cluster
  – Active/active
• Fast
  – No performance impact
  – Block-level (8KB) deltas
  – Automatic compression
• Safe
  – Point-in-time consistency
  – End-to-end checksums
• Easy
  – Graceful handling of network issues
  – No third-party software
  – Takes less than two minutes to configure!

[Diagram: mirroring over the WAN between Production in Datacenter 1 and Production and Research clusters in Datacenter 2, and over the WAN to EC2]
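Block-level deltas mean only the 8 KB blocks that changed since the last mirror cross the WAN. The sketch compares per-block checksums of two versions and compresses the changed blocks; SHA-256 and zlib are stand-ins chosen for illustration, since the deck does not say which checksum or compression MapR uses, and the function names are hypothetical:

```python
import hashlib
import zlib

BLOCK_SIZE = 8 * 1024  # 8 KB blocks, matching the slide


def block_digests(data: bytes) -> list[bytes]:
    """SHA-256 digest of each BLOCK_SIZE chunk."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> dict[int, bytes]:
    """Map block index -> compressed contents for blocks of `new` that differ from `old`."""
    old_digests = block_digests(old)
    delta = {}
    for i, digest in enumerate(block_digests(new)):
        if i >= len(old_digests) or old_digests[i] != digest:
            delta[i] = zlib.compress(new[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_len: int) -> bytes:
    """Rebuild the new version on the mirror from its old copy plus the delta."""
    blocks = [old[i:i + BLOCK_SIZE] for i in range(0, len(old), BLOCK_SIZE)]
    for i in sorted(delta):
        chunk = zlib.decompress(delta[i])
        if i < len(blocks):
            blocks[i] = chunk
        else:
            blocks.append(chunk)  # blocks past the old end arrive in index order
    return b"".join(blocks)[:new_len]


old = bytes(64 * 1024)           # 64 KB of zeros (8 blocks)
new = bytearray(old)
new[10_000:10_010] = b"x" * 10   # change a few bytes inside block 1
new = bytes(new)

delta = changed_blocks(old, new)
print(sorted(delta))                             # [1]
print(apply_delta(old, delta, len(new)) == new)  # True
```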

Page 33

Interoperability

Page 34

Seamless Integration with Direct Access NFS
• MapR is POSIX compliant
  – Random reads/writes
  – Simultaneous reading and writing to a file
  – Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
  – Stream data into the cluster
  – Leverage thousands of tools and applications
  – Easier to use non-Java programming languages
  – No need for most proprietary Hadoop connectors
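Because the cluster is exposed as a POSIX file system over NFS, ordinary file APIs work unchanged, including in-place random writes that the append-only HDFS write path does not allow. In this sketch a temporary directory stands in for the NFS mount point; the `/mapr/<cluster>` path in the comment is an assumed layout, and `overwrite_at` is a hypothetical helper:

```python
import os
import tempfile


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes at `offset` without rewriting the whole file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Stand-in for a directory on the NFS mount (for example /mapr/<cluster>/data)
mount = tempfile.mkdtemp()
path = os.path.join(mount, "records.dat")

with open(path, "wb") as f:
    f.write(b"0000000000")     # a 10-byte file

overwrite_at(path, 4, b"XY")   # modify bytes 4 and 5 in place

with open(path, "rb") as f:
    contents = f.read()
print(contents)                # b'0000XY0000'
```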

Page 35

When Hadoop Looks Like a NAS…
• Data ingestion is easy
  – Popular online gaming company changed data ingestion from a complex Flume cluster to a 17-line Python script
• Database bulk import/export with standard vendor tools
  – Large telco saved $30M on EDW costs (5 years) by leveraging MapR to pre-process and store raw data prior to loading into EDW
• 1000s of applications/tools
  – Large credit card company uses MapR volumes as the user home directories on the Hadoop gateway servers

[Diagram: application servers and logs alongside the cluster, which users work with through standard commands:]
$ find . | grep log
$ cp
$ vi results.csv
$ scp
$ tail -f part-00000
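The deck does not show the gaming company's 17-line script, but a hypothetical sketch of the idea is short: poll a local log file and append whatever is new to a file on the NFS-mounted cluster, much like `tail -f` redirected into the cluster. Paths and function names are placeholders, and the sketch ignores log rotation:

```python
import os
import tempfile
import time


def copy_new_data(src_path: str, dst_path: str, offset: int) -> int:
    """Append bytes added to src_path since `offset` onto dst_path; return new offset."""
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        # On a real cluster dst_path would sit on the NFS mount (assumed layout)
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
    return offset + len(chunk)


def follow(src_path: str, dst_path: str, interval_s: float = 1.0) -> None:
    """Poll forever, shipping new log data to the cluster."""
    offset = 0
    while True:
        offset = copy_new_data(src_path, dst_path, offset)
        time.sleep(interval_s)


# Demo with temporary files standing in for the local log and the cluster copy
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "app.log")
target = os.path.join(workdir, "app.log.cluster")
with open(log, "wb") as f:
    f.write(b"line 1\n")
pos = copy_new_data(log, target, 0)
with open(log, "ab") as f:
    f.write(b"line 2\n")
pos = copy_new_data(log, target, pos)
with open(target, "rb") as f:
    copied = f.read()
print(copied, pos)  # b'line 1\nline 2\n' 14
```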

Page 36

Multi-Tenancy & Security

Page 37

Volumes

100K volumes are OK; create as many as needed.

Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions

[Example volume layout: /projects/tahoe, /projects/yosemite, /user/msmith, /user/bjohnson]
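Since each volume is mounted at a path and carries its own policies, finding which policies apply to a file amounts to a longest-prefix match on the path. The model below is hypothetical: the field names, schedules and quotas are illustrative, not MapR's configuration schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumePolicy:
    """Per-volume settings; field names are hypothetical, for illustration only."""
    name: str
    mount_path: str
    replication: int = 3
    snapshot_schedule: Optional[str] = None
    mirror_schedule: Optional[str] = None
    quota_gb: Optional[int] = None


def volume_for(path: str, volumes: list[VolumePolicy]) -> Optional[VolumePolicy]:
    """Return the volume whose mount path is the longest prefix of `path`."""
    matches = [v for v in volumes
               if path == v.mount_path
               or path.startswith(v.mount_path.rstrip("/") + "/")]
    return max(matches, key=lambda v: len(v.mount_path), default=None)


volumes = [
    VolumePolicy("root", "/"),
    VolumePolicy("projects.tahoe", "/projects/tahoe", snapshot_schedule="hourly"),
    VolumePolicy("projects.yosemite", "/projects/yosemite", mirror_schedule="daily"),
    VolumePolicy("user.msmith", "/user/msmith", quota_gb=500),
    VolumePolicy("user.bjohnson", "/user/bjohnson", quota_gb=500),
]

print(volume_for("/projects/tahoe/raw/day1.log", volumes).name)  # projects.tahoe
print(volume_for("/user/msmithers/notes.txt", volumes).name)     # root
```

Matching on the path component boundary (the trailing `/`) keeps `/user/msmithers` from being mistaken for part of the `/user/msmith` volume.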

Page 38

Multi-tenancy

Isolation
• Tasks sandboxed so they don’t impact other tasks or system daemons
• System resources protected from runaway jobs
• Volume-based data placement
• Label-based job scheduling

Quotas
• Storage quotas by volume/user/group
• CPU and memory quotas by queue/user/group

Security and delegation
• Wire-level authentication and encryption (Kerberos not required)
• Fine-grained administration permissions, including volume-level delegation
• Authenticate users to AD, LDAP and Kerberos via Linux PAM

Reporting
• Detailed reporting on resource usage (75+ different metrics)
• All reports are available via UI, CLI and REST API
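Label-based job scheduling can be pictured as restricting a job to the nodes whose labels satisfy its requirement. In this hypothetical sketch a job simply requires a set of labels that a node must all carry; the label names are invented and MapR's actual label syntax may differ:

```python
def eligible_nodes(node_labels: dict[str, set[str]], required: set[str]) -> list[str]:
    """Nodes that carry every label the job requires, sorted by name."""
    return sorted(node for node, labels in node_labels.items() if required <= labels)


# Hypothetical cluster: labels mark hardware or purpose
cluster = {
    "n1": {"fast", "ssd"},
    "n2": {"fast"},
    "n3": {"ssd", "highmem"},
    "n4": set(),
}
print(eligible_nodes(cluster, {"ssd"}))          # ['n1', 'n3']
print(eligible_nodes(cluster, {"fast", "ssd"}))  # ['n1']
print(eligible_nodes(cluster, set()))            # ['n1', 'n2', 'n3', 'n4']
```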

Page 39

MapR Integrates Security into Hadoop

Page 40

Making Security Easy

• > 99% of consumers accessing online banks use strong wire-level authentication
• < 5% of organizations deploying Hadoop enable strong wire-level authentication

Page 41

Hadoop Security

Authorization to ensure the right access to files and databases

Authentication for users and user-created job requests

Encryption to ensure user credentials and data are always secure

Integration with existing security infrastructure

Page 42

… Along with Fine-Grained Access Control
• Full POSIX permissions on files and directories
• ACLs on tables, column families and columns
• ACLs on MapReduce jobs and queues
• Administration ACLs on cluster and volumes
• Access control expressions for easy, role-based control
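Access control expressions can be evaluated with a small recursive-descent parser. The sketch assumes a notation of `u:` (user), `g:` (group) and `r:` (role) principals combined with `!`, `&`, `|` and parentheses; the grammar and function names are assumptions for illustration, not a specification of MapR's expression syntax:

```python
import re

# An operator, or a principal such as u:alice, g:finance, r:admin
TOKEN = re.compile(r"\s*(?:([()!&|])|([ugr]):([\w.-]+))")


def tokenize(expr: str) -> list[tuple[str, str]]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at {pos}: {expr[pos:]!r}")
        tokens.append(("op", m.group(1)) if m.group(1) else (m.group(2), m.group(3)))
        pos = m.end()
    return tokens


def allowed(expr: str, user: str, groups: set[str], roles: set[str]) -> bool:
    """Evaluate expr with precedence ! over & over |, plus parentheses."""
    tokens = tokenize(expr)
    pos = 0

    def peek() -> tuple:
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance() -> None:
        nonlocal pos
        pos += 1

    def parse_or() -> bool:
        value = parse_and()
        while peek() == ("op", "|"):
            advance()
            rhs = parse_and()  # always parse, so every token is consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == ("op", "&"):
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        kind, text = peek()
        if (kind, text) == ("op", "!"):
            advance()
            return not parse_not()
        if (kind, text) == ("op", "("):
            advance()
            value = parse_or()
            if peek() != ("op", ")"):
                raise ValueError("missing ')'")
            advance()
            return value
        if kind in ("u", "g", "r"):
            advance()
            return {"u": text == user, "g": text in groups, "r": text in roles}[kind]
        raise ValueError(f"unexpected token {text!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


rule = "u:alice | (g:finance & !g:contractors)"
print(allowed(rule, "alice", set(), set()))                       # True
print(allowed(rule, "bob", {"finance"}, set()))                   # True
print(allowed(rule, "carol", {"finance", "contractors"}, set()))  # False
```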

Page 43

Integration with Existing Security Infrastructure
• SSO with existing Kerberos infrastructure (optional)
• Linux PAM integration enables third-party user directories

MapR supports wire-level authentication with and without Kerberos.

[Diagram: the Hadoop cluster and the existing security infrastructure (Kerberos KDC; user directory such as AD, LDAP, NIS, …). A client without Kerberos sends a username/password over HTTPS, which is checked against the user directory; a Kerberos-enabled client presents a Kerberos service ticket, with the KDC checking the username/password against the directory.]

Page 44

Native Security Authentication

Ease of Deployment
• Hadoop initiates and maintains secure key communication* throughout the cluster without requiring external validation
• Users authenticate themselves through a simple and secure login/password mechanism
• All cluster nodes authenticate and interact with each other through secure keys

Cluster-wide Security
All operations on Hadoop are secured natively, including:
• User operations such as file reads and writes, database manipulations, and MapReduce job submissions
• Intra-cluster node-to-node interactions, including remote procedure calls
• Inter-cluster operations such as mirroring

* MapR leverages standard cryptography: NSA Suite B Cryptography (AES-256 and SHA-384)
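As an illustration of key-based authentication using the SHA-384 hash the footnote names, the sketch signs a short-lived ticket with HMAC-SHA-384 under a shared cluster key and verifies it with a constant-time comparison. The ticket layout and names are hypothetical, not MapR's ticket format; a real system would also encrypt traffic (for example with AES-256) and protect key distribution:

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

CLUSTER_KEY = secrets.token_bytes(48)  # hypothetical shared secret for the cluster


def issue_ticket(user: str, ttl_s: int = 3600, key: bytes = CLUSTER_KEY) -> str:
    """Return 'payload.signature', signed with HMAC-SHA-384."""
    payload = json.dumps({"user": user, "expires": int(time.time()) + ttl_s}, sort_keys=True)
    sig = hmac.new(key, payload.encode(), hashlib.sha384).hexdigest()
    return f"{payload}.{sig}"


def verify_ticket(ticket: str, key: bytes = CLUSTER_KEY) -> Optional[str]:
    """Return the user name if the signature is valid and unexpired, else None."""
    payload, _, sig = ticket.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha384).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):  # constant-time
        return None
    claims = json.loads(payload)
    return claims["user"] if claims["expires"] > time.time() else None


ticket = issue_ticket("alice")
print(verify_ticket(ticket))                                  # alice
print(verify_ticket(ticket.replace('"alice"', '"mallory"')))  # None
```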

Page 45

Performance Leader

Page 46

World-Record Performance

New MinuteSort world record:
• MapR: 1.65 TB in 1 minute on 298 nodes
• Previous record: 1.6 TB with 2200 nodes

MapR: with a fraction of the hardware
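Normalizing both records by node count makes the "fraction of the hardware" claim concrete. This assumes decimal units (1 TB = 1000 GB) and treats the node counts as directly comparable; it says nothing about per-node hardware differences:

```python
def tb_per_node(total_tb: float, nodes: int) -> float:
    """Data sorted per node within the one-minute window."""
    return total_tb / nodes


mapr = tb_per_node(1.65, 298)
previous = tb_per_node(1.6, 2200)
print(f"MapR: {mapr * 1000:.2f} GB/node, previous: {previous * 1000:.2f} GB/node")
print(f"per-node ratio {mapr / previous:.1f}x, node-count ratio {2200 / 298:.1f}x")
# MapR: 5.54 GB/node, previous: 0.73 GB/node
# per-node ratio 7.6x, node-count ratio 7.4x
```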

Page 47

Comparative Study of Hadoop Distributions: Read and Write Throughput Benchmarks

[Chart: DFSIO read throughput and DFSIO write throughput, in MB per second, for IDH, CDH, HDP and MapR; values shown on the chart: 212, 59, 262, 69, 276, 64, 475, 465]

Source: Flux7 Labs Study, October 2013

Page 48

MapR-DB: The Best In-Hadoop Database

• NoSQL wide-column store
• Apache HBase API
• Integrated with Hadoop

[Diagram: in other distros, HBase runs in a JVM on top of HDFS (another JVM) on ext3/ext4 on disks; in MapR, MapR-DB stores tables and files directly on disks]

MapR Enterprise Database Edition (M7): the most scalable, enterprise-grade NoSQL database that supports online applications and analytics

Page 49

Consistent, Low Latency

[Chart: read latency, M7 compared with others]

Page 50

Operations + Analytics = Real-time, Personalized Services

On the MapR Distribution for Hadoop:
• Real-time operational applications: online transactions, fraud detection, personalized offers
• Analytics: clickstream analysis, fraud investigation tool
• Shared data: fraud model, recommendations table
• Users: fraud investigator, interactive marketer

Page 51

Ensuring Your Success

Page 53

Committed to our Customers’ Success

• Educational Services: instructor-led courses & web-based training for Hadoop cluster administration, HBase & MapReduce programming and more
• Professional Services: core Hadoop services, data engineering, data science, advanced analytics, M7/HBase practice
• Customer Support: Hadoop engineering experts provide 24x7x365 global coverage

Page 54

Worldwide Presence & Customer Support

[Map: worldwide locations, with HQ marked]

Page 55

Key MapR Advantage Partners
• Business services
• Infrastructure & cloud
• Analytics & business intelligence
• Applications & OS
• Consultants & integrators
• Data warehouse & integration

Page 56

From Redundant Processing Silos and Data Science Experiments…

Opportunity to Revolutionize Enterprise Data Architecture

Page 57

… to Consolidated Operational and Analytical Workloads

The Production Enterprise Data Hub


Page 58

Summary

• BIG DATA: Hadoop
• BEST PRODUCT: Top Ranked
• BUSINESS IMPACT: Production Success

Page 59

Q & A

Engage with us!

@mapr • maprtech • [email protected] • MapR • mapr-technologies

Page 60

Extra slides

Page 61

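Packages Supported by Various Distributions

| Category | Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache versions (Sep 12th, 2014) |
|---|---|---|---|---|---|
| Core Hadoop | Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1 |
| Batch | MapReduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2 |
| Batch | Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13 |
| Batch | Tez | 0.4 (Dev Preview only) | X | 0.4 | 0.5 |
| Batch | Pig | 0.12 | 0.12 | 0.12 | 0.12 |
| Batch | Cascading | 2.1.6 | X | X | 2.5 |
| Batch | Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1 |
| Interactive SQL | Impala | 1.2.3 | 1.4 | X | 1.4 |
| Interactive SQL | Drill | 0.5 | X | X | 0.5 |
| Interactive SQL | SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1 |
| NoSQL and Search | HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98 |
| NoSQL and Search | Phoenix | X | X | 4.0.0 | 4.1.0 |
| NoSQL and Search | AsyncHBase | 1.5 | X | X | 1.5 |
| NoSQL and Search | Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA |
| Machine Learning and Graph | Mahout | 0.9 | 0.9 | 0.9 | 0.9 |
| Machine Learning and Graph | MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1 |
| Machine Learning and Graph | GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1 |
| Streaming/Messaging | Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1 |
| Streaming/Messaging | Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2 |
| Streaming/Messaging | Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1 |
| Data Integration | Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5 |
| Data Integration | Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0 |
| Data Integration | Knox | X | X | 0.4 | 0.4 |
| Coordination | Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1 |
| Coordination | ZooKeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5 |
| GUI, Configuration, Monitoring | Management | MCS | CM | Ambari | Ambari |
| GUI, Configuration, Monitoring | Hue | 3.5 | 3.6 | 2.5.1 | 3.6 |

X = not included. In the original slide, red marks lacking versions and blue marks leading versions.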

Sources:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html

Page 63

The Cloud Leaders Pick MapR

• Google chose MapR to provide Hadoop on Google Compute Engine
• Amazon EMR is the largest Hadoop provider in revenue and # of clusters