MapR 5.2 Product Update

© 2016 MapR Technologies

5.2 Product Update

MapR Product Mgmt & Product Marketing

Aug 17, 2016

Today’s Presenters

Sameer Nori, Sr. Product Marketing Manager

Prashant Rathi, Sr. Product Manager

Ian Downard, Technical Marketing Engineer

Balaji Mohanam, Product Manager

Today’s Agenda

• Recent Product Announcements

• The Spyglass Initiative & Demo

• MapR Ecosystem Pack (MEP)

• Spark and Drill updates

The MapR Converged Data Platform

Recent Product Announcements

• Quick Start Solution focused on Risk Management for Financial Services – July 2016

• Enterprise-Grade Spark Distribution – June 2016

• Quick Start Migration Service – May 2016

• Stream Processing On-Demand Training (ODT) – April 2016

• Apache Drill 1.6 – March 2016

Four Big Themes in the 5.2 Release

Major new features
• MapR-DB
  – JSON Table replication
  – Binary Elastic Search v2.x support
  – Drill DB JSON improvements
• Streams
  – Performant Spark Streaming
  – Stream Admin APIs

Easier Management
• Spyglass: deep visibility across cluster ops
  – Deep visibility: search across metrics and logs
  – Full control: customizable, sharable dashboards
  – Extensible
• Various Graphical Installer improvements

Community Innovation
• MapR Eco Pack 1.0
  – Supportability and Stability
  – Currency and Commitment to SLA
  – Easy deployment and upgrade

Customer requested features
• POSIX: HardLink and StatFS feature
• Fast Failover for client
• Fuse Client performance
• Rack Reliability for data placement enhancement
• File Client Impersonation enhancements

5.2 Ecosystem Support

These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of them have already been released for 5.1.

            |------------- Eco on 5.1 today -------------|
Component   Released with 5.1   Subsequently released for 5.1   MEP 1.0 on 5.2
Drill       1.4                 1.6                             1.6
Spark       1.5.2               1.6.1                           1.6.1
Impala      2.2.0               2.5                             2.5
Storm       0.10.0              0.10.1                          0.10.1
Mahout      0.11.2              0.12.2                          0.12.2

4 Reasons to Step Up to MapR 5.2

1. New features in the MapR Converged Data Platform

2. Ecosystem updates

3. Continuing quality improvements

4. End-of-maintenance for prior releases

Project Spyglass

MapR Vision: Maximizing User/Operator Productivity

Deep Visibility

Easy Management

Full Control

The MapR Spyglass Initiative

• New approach for increasing user and administrator productivity
  – Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with upcoming release
  – Phase 1 – MapR Monitoring
  – Initial focus on operational visibility
• Helps community innovate faster
  – Extensive use of open source visualization and dashboarding tools

Spyglass Initiative Phase 1 - MapR Monitoring

Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.

Converged, Customizable, Extensible

MapR Monitoring Architecture

[Architecture diagram. Stages: Collection, Aggregation & Storage, Visualization.]

• Data sources: node environmentals (CPU, Mem, I/O) and service daemons (YARN, Drill, Hive, etc.)
• Collection: log shippers and metrics collectors
• Other diagram labels: Alerting, MapR Control System, Future

Project Spyglass – Monitoring All You Care About

Node/Infrastructure Monitoring
• Global aggregate (Average, Min, Max) charts (e.g. CPU, disk utilization)
• Per-node charts (e.g. I/O throughput by disk)
• MFS reads/writes and throughput
• DB puts, gets, scans and cache metrics

Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage utilization trend
• Utilization per volume and per accountable entity (data, volume, snapshot and total size)

YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers: pending, active
• vCores & RAM: allocated & used
• Per-queue charts: containers, vCores, RAM

Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services (includes YARN, Drill and Spark)
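
The "global aggregates" idea from the node monitoring list can be illustrated with a small sketch. This is not MapR Monitoring code; the function name, node names, and readings below are hypothetical and only show how per-node samples reduce to the Average, Min, and Max values a cluster-wide chart would plot.

```python
from statistics import mean


def global_aggregates(samples):
    """Return average, min, and max of one metric across all nodes.

    `samples` maps a node name to that node's latest reading.
    """
    values = list(samples.values())
    if not values:
        raise ValueError("no samples to aggregate")
    return {"avg": mean(values), "min": min(values), "max": max(values)}


# Hypothetical CPU utilization readings (percent), one per node.
cpu_percent = {"node-a": 40.0, "node-b": 70.0, "node-c": 55.0}
cpu_summary = global_aggregates(cpu_percent)
print(cpu_summary)
```

A per-node chart would plot the raw entries of `cpu_percent` instead, which is why both views can be drawn from the same collected samples.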

Customizable Dashboards for Visualizing Metrics

Log Analytics

Destination to Learn and Collaborate

Blog about topics and ideas

Share code snippets and dashboards

View demos, tutorials, and videos

Engage in use case discussion/development

Dashboards are defined with JSON and are easy to export and import in Grafana and Kibana

Extend/Integrate using REST API

The Exchange
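
Because dashboards are plain JSON, sharing one amounts to serializing it, passing the text along, and parsing it again. The sketch below shows that round trip with the standard `json` module only. The dashboard structure and both function names are hypothetical and heavily simplified; real Grafana or Kibana exports contain many more fields, and their own import screens or REST APIs would do the actual loading.

```python
import json

# Hypothetical, heavily simplified dashboard definition.
dashboard = {
    "title": "Cluster CPU",
    "panels": [{"title": "CPU utilization", "type": "graph"}],
}


def export_dashboard(definition):
    """Serialize a dashboard definition to a JSON string for sharing."""
    return json.dumps(definition, indent=2, sort_keys=True)


def import_dashboard(text, new_title=None):
    """Parse a shared dashboard and optionally rename the imported copy."""
    definition = json.loads(text)
    if new_title is not None:
        definition["title"] = new_title
    return definition


shared = export_dashboard(dashboard)
copy = import_dashboard(shared, new_title="Cluster CPU (team copy)")
```

The imported copy is an independent object, so a team can customize it without touching the original definition.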

Dashboards can be viewed on mobile devices.

Summary

● Data collection and storage infrastructure (packaged and supported)
  ○ Collection/storage of metrics & logs across node, storage, services
● Visualization dashboard (driven via community)
  ○ Sample dashboards for Grafana & Kibana

5.2 - Spyglass 1.0 GA

CUSTOMIZABLE, shareable and mobile-ready dashboards

CONVERGED monitoring with deep search

EXTENSIBLE and easy to integrate with REST API

MapR Ecosystem Pack (MEP)

What is the MapR Ecosystem Pack (MEP)?

• What is the “MapR Ecosystem”?
  – A selected set of stable and popular components from the Hadoop Ecosystem that we fully support on the MapR platform.
• What is the “Pack”?
  – A single repository of selected versions of these components fully tested to be interoperable.
  – Available via installer or package.
  – Delivered with a predictable cadence.

Where Does MEP Fit In?

• Extended Ecosystem: community supported.
• MapR Ecosystem: fully supported, updates tied to MapR core.
• MEP: fully supported, updates follow MEP process.

An Example: Drill in MEP releases

An example of how this would look for Drill:

[Timeline: August through January, covering MapR 5.2 and MapR 6.0]

• MEP 1.0: Drill 1.6
• MEP 1.1: Drill 1.8
• MEP 2.0: Drill 1.9
• MEP 3.0: Drill 2.X

On our current release plan, MapR 5.2 will receive 3 different versions of Drill before updates cease.

MEP Can Be Installed Using the 5.2 Installer

Can select MapR and MEP version. Can manually select components.

Competitor Process Comparison

How our new process stacks up against the competition:

[Comparison table. Columns: MapR MEP Process, Cloudera, Hortonworks. Rows:]

• Predictable Cadence
• Required Component Upgrades
• Updates independent of core release
• Developer Previews
• Support For Multiple Versions
• Packaged updates

Drill and Spark Updates

Drill Product Evolution

Drill 1.0 GA
• Drill GA

Drill 1.1
• Automatic partitioning for Parquet files
• Window functions support
  – Aggregate functions: AVG, COUNT, MAX, MIN, SUM
  – Ranking functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK and ROW_NUMBER
• Hive impersonation
• SQL Union support
• Complex data enhancements and more

Drill 1.2
• Native Parquet reader for Hive tables
• Hive partition pruning
• Multiple Hive versions support
• Hive 1.2.1 version support
• New analytical functions (LEAD, LAG, NTILE, etc.)
• Multiple window PARTITION BY clauses support
• DROP TABLE syntax
• Metadata caching
• Security support for web UI
• INT96 data type support

Drill 1.3/1.4
• Improved Tableau experience with faster LIMIT 0 queries
• Metadata (INFORMATION_SCHEMA) query speed-ups on Hive schemas/tables
• Robust partition pruning (more data types, large number of partitions)
• Optimized metadata cache
• Improved window functions resource usage and performance

Drill 1.5/1.6
• Enhanced stability & scale
  – New memory allocator
  – Improved uniform query load distribution via connection pooling
• Enhanced query performance
  – Early application of partition pruning in query planning
  – Hive tables query planning improvements
  – Row count based pruning for LIMIT N queries
• JDK 1.8 support

Drill 1.7
• Enhanced MaxDir/MinDir functions
• Access to Drill logs in the Web UI
• Addition of JDBC/ODBC client IP in Drill audit logs
• Monitoring via JMX
• Hive CHAR data type support
• Partition pruning enhancements
• Ability to return file names as part of queries

Themes across these releases:
• ANSI SQL Window Functions
• Enhanced Hive Compatibility
• Query Performance & Scale
• Drill on MapR-DB JSON tables
• Easy Monitoring of deployments
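
The ranking window functions named under Drill 1.1 differ only in how they treat ties. The sketch below is a plain Python illustration of the standard SQL semantics of ROW_NUMBER, RANK, and DENSE_RANK over a single descending ordering. It is not Drill code, and the function name and sample values are hypothetical.

```python
def rank_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending.

    ROW_NUMBER always increments, RANK repeats on ties and then skips,
    DENSE_RANK repeats on ties without leaving gaps.
    """
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number
            dense_rank += 1
            previous = value
        result.append((value, row_number, rank, dense_rank))
    return result


# Hypothetical star ratings with one tie.
ranked = rank_rows([4.0, 5.0, 4.0, 3.0])
for row in ranked:
    print(row)
```

With the tie at 4.0, RANK jumps from 2 to 4 for the last row while DENSE_RANK continues with 3.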

Converging SQL and JSON with Apache Drill 1.6

• Flexible and operational analytics on NoSQL
  – MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
  – Pushdown capabilities provide optimal interactive experience
• Enhanced query performance
  – Provides better query performance via partition pruning, metadata caching and other optimizations
  – Delivers up to 10-60X performance gains in query planning compared to the previous releases of Drill
• Better memory management
  – Delivers greater stability and scale, which enables customers to run not only larger but also more SQL workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
  – Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop
  – Enhanced SQL Window functions
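
Partition pruning, listed above as one source of the performance gains, means deciding at planning time which partitions can possibly match a filter and never reading the rest. The sketch below shows the idea for data laid out in one directory per year. It is an illustration only, not Drill's planner; the paths and the function name are hypothetical.

```python
def prune_partitions(partitions, year):
    """Keep only the directories whose partition value matches the filter.

    `partitions` maps a directory path to the year stored in it, so the
    decision needs no access to the data files themselves.
    """
    return [path for path, part_year in partitions.items() if part_year == year]


# Hypothetical layout: one directory per year of log data.
partitions = {"/logs/2014": 2014, "/logs/2015": 2015, "/logs/2016": 2016}

# A filter such as "year = 2016" leaves a single directory to scan.
to_scan = prune_partitions(partitions, 2016)
print(to_scan)
```

The earlier this decision is made during planning, the less work later planning and execution steps have to do.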

Drill ANSI SQL Capabilities Directly on JSON

0: jdbc:drill:drillbit=10.10.103.32> SELECT * FROM mfs.yelp_maprdb.business LIMIT 1;
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":true,"Parking":{"garage":false,"lot":true,"street":false,"valet":false,"validated":false},"Price Range":1.0,"Ambience":{},"Good For":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur BlvdWestsideLas Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Westside"] | true | 4.0 | 4.0 | NV | business |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+

0: jdbc:drill:drillbit=10.10.103.32> SELECT count(*) FROM mfs.yelp_maprdb.business;
+---------+
| EXPR$0  |
+---------+
| 42153   |
+---------+

Simplified Deployment with YARN (Drill 1.8)

● Drill as a long running application in YARN
● Key features
  ○ Client tool to launch Drill as YARN application
  ○ New Drill application master (AM)
  ○ CPU & memory controls
  ○ Add/remove nodes to cluster
  ○ Multiple Drill clusters

Drill Configuration w/YARN

Spark 2.0

What’s in Spark 2.0?

• Structured Streaming with Spark SQL
  – The ability to perform interactive queries against live streaming data.
  – Output can now be aggregated in a stream for continuous applications.
  – Pre-computation of analytics can occur continuously as the data is generated.
• Whole Stage Code-gen
  – Provided by the second-generation Tungsten engine.
  – Eliminates the need for multiple JVM calls by flattening SQL queries into a single function evaluated as bytecode at runtime.
• Dataset APIs
  – Run on the same engine as Spark SQL.
  – Allow access to data from a variety of different data sources.
  – Can run database-like operations or allow for passing in custom code.
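
The whole-stage code generation bullet can be made concrete with a small analogy. The sketch below is plain Python, not Spark and not generated bytecode: it contrasts a chain of separate scan, filter, and project operators with a single fused loop that computes the same result, which is the shape of function that code generation aims to produce. All names and the sample rows are hypothetical.

```python
def scan(rows):
    """Leaf operator: hand rows to the parent one at a time."""
    for row in rows:
        yield row


def filter_op(child, predicate):
    """Pass through only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project_op(child, column):
    """Keep a single column from each row."""
    for row in child:
        yield row[column]


def sum_operator_at_a_time(rows):
    """Separate operators: every row crosses several call boundaries."""
    pipeline = project_op(
        filter_op(scan(rows), lambda r: r["stars"] >= 4.0), "review_count"
    )
    return sum(pipeline)


def sum_fused(rows):
    """The same query collapsed into one loop with no operator hand-offs."""
    total = 0
    for row in rows:
        if row["stars"] >= 4.0:
            total += row["review_count"]
    return total


rows = [
    {"stars": 4.0, "review_count": 4},
    {"stars": 3.5, "review_count": 10},
    {"stars": 5.0, "review_count": 7},
]
print(sum_operator_at_a_time(rows), sum_fused(rows))
```

Both functions return the same sum; the fused version simply avoids the per-row hand-offs between operators.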

Spark 2.0: Structured Streaming with Spark SQL (Alpha)

// Alpha-era syntax as shown on the slide; the released Spark 2.0 API
// uses readStream/writeStream instead of read...stream/write...startStream.
val records = sqlContext.read.format("json").stream("hdfs://input")
val counts = records.groupBy("user").count()
counts.write
  .trigger(ProcessingTime("5 sec"))
  .outputMode(UpdateInPlace("user"))
  .format("jdbc")
  .startStream("mysql://...")

[Diagram labels: Repeated Queries, DB]

User      Count
User 1    10
User 2    23
User 3    16
……..      ……..

Store only the processed output instead of every single record.

● Query executed repeatedly as and when the data arrives.
● Read the result from persistent storage, instead of processing the entire data set, resulting in faster access.
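
The "store only the processed output" point can be sketched without Spark. In the illustration below, a running per-user count is updated each time a batch of records arrives, so a reader only ever looks at the small result table rather than the full history. The class name, record shape, and in-memory `Counter` (standing in for the database table) are all hypothetical.

```python
from collections import Counter


class RunningUserCounts:
    """Keeps per-user counts up to date as new batches of records arrive."""

    def __init__(self):
        # Stands in for the result table that would live in the database.
        self.counts = Counter()

    def process_batch(self, records):
        """Fold one batch into the stored result and return a snapshot."""
        self.counts.update(record["user"] for record in records)
        return dict(self.counts)


state = RunningUserCounts()
state.process_batch([{"user": "User 1"}, {"user": "User 2"}])
latest = state.process_batch([{"user": "User 1"}])
print(latest)
```

Each batch is discarded after it is folded in, which is why reading the result stays cheap no matter how many records have been processed.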

Spark 2.0 Whole Stage Code-gen: Planner

[Diagram: the same query plan shown twice, with regions labeled "Whole Stage Codegen". Operators shown: ParquetRelation, Filter, Project, Broadcast Hash join, Project, TungstenAggregate, Exchange, plus a second ParquetRelation, Filter, Project branch.]

Q & A

Engage with us!

1. Spyglass Initiative
   https://www.mapr.com/products/spyglass-initiative
   https://community.mapr.com/docs/DOC-1088

2. Ask Questions:
   – Ask Us Anything about Spyglass in the MapR Community from Mon (Aug 29th) to Fri (Sep 2nd)
   – https://community.mapr.com/