Confidential ACHIEVING OPERATIONAL EXCELLENCE WITH HIVE AND MAPREDUCE

5 Best Practices for Monitoring Hive and MapReduce Application Performance


Page 1: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


ACHIEVING OPERATIONAL EXCELLENCE WITH HIVE AND MAPREDUCE

Page 2: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Heterogeneous Application Environments | Cluster Performance Monitoring | Application Performance Monitor

Production Hadoop Environments Contain a Variety of Application Technologies

CHALLENGES

Page 3: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Application Performance Monitor | Cluster Performance Monitoring | Heterogeneous Application Environments

Cluster Monitoring Products Do Not Provide Application Insight

CHALLENGES

Page 4: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Cluster Performance Monitoring | Application Performance Monitor | Heterogeneous Application Environments

Existing Tools Offer Limited Value for Monitoring Application Performance

Leaving us blind to the business context, priority, ownership and performance of our data applications

CHALLENGES

Page 5: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Enterprise Scale Monitoring and Management for Big Data Apps

Business & Operational Context

Data & Technology

Connecting Business and Data

PERFORMANCE MONITORING & VISIBILITY

Page 6: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


5 BEST PRACTICES TO ACHIEVE OPERATIONAL EXCELLENCE


Visibility

1. Performance monitoring and visibility into all of your big data applications
   • Increase the quality and efficiency of your deployments with a single integrated view of your data applications and real-time performance metrics across all environments.

2. Segmenting users, applications and environments
   • Quickly understand what is happening, where and by whom, in ways that are meaningful and aligned to how your business operates.

3. Identify performance issues, bottlenecks and noncompliant applications and queries
   • Spend less time wading through Hadoop logs, the ResourceManager and source code to find issues with your data pipelines. Instead, use that time to optimize your environment.

4. Add business context to better monitor your applications
   • Immediately understand the business impact of an issue, including the downstream implications, so you can rapidly take the right corrective action.

5. Collaborate across teams to resolve issues faster
   • When all roles that interact with an application (data scientists, developers and operations) collaborate, the quality and efficiency of your application increase.

Page 7: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


PERFORMANCE MONITORING & VISIBILITY


Pinpoint bottlenecks and identify causes

Monitor current executions and performance

Comprehensive view of all your data processing executions

Fully visualize your entire data pipeline

Immediately understand the status of all your data applications

See all successful, failed, pending processes…

Page 8: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


PERFORMANCE MONITORING & VISIBILITY


Fully visualize your queries and data pipelines

Comprehensive view of all your data processing executions

[Figure: query and pipeline visualization. Labels: Results, Join Operations, Source, Sink, Surface HQL]

Page 9: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS


Segmentation


Page 10: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


SEGMENTATION


Pinpoint bottlenecks and identify causes

Signal Over Noise

Quickly find and filter what you are looking for and save as a custom view

Views can be private, shared with a team, or made public

Quickly view application data by cluster, owner, technology, etc.
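
As a rough illustration of this kind of segmentation, the sketch below groups application run records by one attribute (cluster, owner or technology) and filters them the way a saved view might. The record fields, sample values and function names are hypothetical; they are not Driven's data model or API.

```python
from collections import defaultdict

# Hypothetical application run records; field names are illustrative only.
RUNS = [
    {"app": "daily_sales", "cluster": "A", "owner": "alice",
     "technology": "Hive", "status": "SUCCESSFUL"},
    {"app": "clickstream", "cluster": "B", "owner": "bob",
     "technology": "MapReduce", "status": "FAILED"},
    {"app": "inventory", "cluster": "A", "owner": "alice",
     "technology": "MapReduce", "status": "PENDING"},
]


def segment(runs, key):
    """Group application names by the value of one attribute, e.g. 'cluster'."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[key]].append(run["app"])
    return dict(groups)


def filter_runs(runs, **criteria):
    """Return the runs whose fields match every criterion, like a saved view."""
    return [r for r in runs if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(segment(RUNS, "cluster"))
    print(filter_runs(RUNS, owner="alice", technology="MapReduce"))
```

Keeping the filter criteria as plain key/value pairs means the same view definition could be stored and shared, which is the idea behind private, team and public views.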

Page 11: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS


Identify Problems


Page 12: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Create JIRA issues with views and data for quickly collaborating to resolve performance problems

With one click, create a Jira issue with a link to this view

QUICKLY DRILL DOWN AND EXPOSE ROOT CAUSE

Page 13: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


IDENTIFY BOTTLENECKS AND SLOWDOWNS


Pinpoint bottlenecks and identify causes


CHOOSE METRICS | UNDERSTAND BEHAVIORS | VISUALIZE SLOWDOWNS | DRILL DOWN TO QUERY PERFORMANCE VIEW

Page 14: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS


Add Context


Page 15: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


UNDERSTAND THE BUSINESS CONTEXT


Leverage metadata to align applications with their business context

View and sort by application metadata

Visualize executions and resource contention

Understand concurrency

Page 16: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


SURFACE ALL FAILURES


Quickly identify all failing applications

App Name, Owner, Organization, Cluster A or B, Privacy Level, Production or Dev, Custom Tags, More …

Not all problems are created equal
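
One way to act on this is to rank failing applications by their metadata. The sketch below is a minimal example under stated assumptions: the tag names echo the slide's examples, but the records, the `ENV_RANK` ordering (production before dev) and the function name are all hypothetical rather than anything Driven prescribes.

```python
# Hypothetical application metadata; tags mirror the examples listed above.
APPS = [
    {"name": "etl_nightly", "owner": "ops",
     "environment": "Dev", "status": "FAILED"},
    {"name": "billing_rollup", "owner": "finance",
     "environment": "Production", "status": "FAILED"},
    {"name": "ad_hoc_report", "owner": "analytics",
     "environment": "Production", "status": "SUCCESSFUL"},
]

# Assumed ranking rule: a lower number means a more urgent failure.
ENV_RANK = {"Production": 0, "Dev": 1}


def failures_by_priority(apps):
    """Return failed apps, most urgent first (by environment, then name)."""
    failed = [a for a in apps if a["status"] == "FAILED"]
    return sorted(
        failed,
        key=lambda a: (ENV_RANK.get(a["environment"], len(ENV_RANK)), a["name"]),
    )


if __name__ == "__main__":
    for app in failures_by_priority(APPS):
        print(app["environment"], app["name"], app["owner"])
```

Applications with an unrecognized environment sort after the known ones, so an untagged failure is still surfaced rather than dropped.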

Page 17: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


5 BEST PRACTICES TO ACHIEVE OPERATIONAL READINESS


Collaborate


Page 18: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Ensure that business, development and IT operations can collaborate seamlessly when it matters

NURTURE A CULTURE OF OPERATIONAL EXCELLENCE

Page 19: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


LET’S TAKE A TOUR

For a walk-through of all the features of Driven, go to our Showcase interactive demo:

http://showcase.driven.io

Page 20: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


THANK YOU

Page 21: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


APPENDIX

Page 22: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


• End-to-end operational telemetry metadata for big data applications
• Accessible via Web browser, command-line interface (CLI), or simple search queries
• Easy integrations through JMX and upcoming Driven SDK

… THROUGH A SCALABLE, SEARCHABLE METADATA STORE


[Architecture diagram. Labels: Hadoop apps and infrastructure; Hadoop clusters; YARN; applications; plugin; telemetry metadata (SSL); server (scale out); web app server with WAR files (scale out); access via Web, CLI and JMX]

Page 23: 5 Best Practices for Monitoring Hive and MapReduce Application Performance


Go to our website: www.driven.io

TO LEARN MORE