Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager.
Taming the Elephant - Learn how Monsanto manages their Hadoop clusters to enable Genome/Sequence processing
Erich Hochmuth, Mark Seidenstricker
Bala Venkatrao, Aparna Ramani
• Hadoop World 2012, New York, October 25th, 2012
Agenda
• Introductions
• Monsanto Hadoop Use Case
  • Operational Challenges
  • How Monsanto leverages Cloudera Manager & Product Demo
  • Key benefits of using Cloudera Manager
• Cloudera Manager
  • Overview
  • Key Features
  • Roadmap
• Q&A
Introductions
• Monsanto
  • Erich Hochmuth – R&D IT Data & Analytics Lead
  • Mark Seidenstricker – Infrastructure R&D Architect
• Cloudera
  • Bala Venkatrao – Director, Products
  • Aparna Ramani – Director, Engineering
Monsanto Serves Farmers Around the World
Working With Growers Large and Small, Row Crops and Vegetables
Monsanto’s Approach to Driving Yield
A System of Agriculture Working Together to Boost Productivity
• Biotechnology: The science of improving plants by inserting genes into their DNA
• Breeding: The art and science of combining genetic material to produce a new seed
• Agronomics: The farm management practices involved in growing plants
Increasing Yield through Big Data
At the Cornerstone of Yield Increases is Information & Analytics
Volume
• PBs of NGS (next-generation sequencing) data
• Tens of TBs of genomic data
• TBs of yield data
• Billions of genotyping data points
Variety
• Raw sequence data
• Unstructured sensor data
• Poly-structured genomic data
• Spatial data
Velocity
• Tens of millions of yield data points/day
• Hundreds of millions of genotyping data points/day
• TBs of NGS data/week
Increased Yield
What are the Challenges of Managing a Hadoop Cluster?
Software Provisioning & Configuration Management
• Automated & simplified installation/patch management
• Streamlined cluster configuration
Enterprise-ready Tools
• Enterprise-grade monitoring & management capabilities
• Integration with existing enterprise IT stack
Reporting & Monitoring
• Proactive monitoring & alerting
• Capacity planning
Support
• Midwest location
• Lack of Hadoop expertise
What are the Solutions?
With Cloudera Manager, you get…
Intuitive Management Console
• Mission-control-style dashboard for the entire cluster
• Centralized management of the entire Hadoop ecosystem
• Treat the cluster as an appliance
• Configuration change audit & validation
Integration with Enterprise IT Management Tools
• Connect to corporate LDAP
• Cloudera Manager API integrates with existing BMC platform
Comprehensive Monitoring & Alerting
• Proactive service-level alerts
• Summarized cluster-level graphs & charts
• Real-time series charts (MapReduce & HBase)
Historical Cluster Metrics/Reports
• Capacity planning: disk usage/slot capacity
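The capacity-planning item above (historical disk usage) lends itself to a simple forecast. The following is a minimal sketch, assuming roughly linear growth and a hypothetical 80% planning threshold; the function name `days_until_threshold` and the sample numbers are illustrative and not part of Cloudera Manager, which would supply the historical usage figures through its reports.

```python
from typing import Optional, Sequence


def days_until_threshold(
    daily_used_tb: Sequence[float],
    capacity_tb: float,
    threshold: float = 0.8,
) -> Optional[float]:
    """Estimate days until usage reaches threshold * capacity.

    Fits a least-squares line to daily usage samples (oldest first) and
    extrapolates it. Returns 0.0 if the threshold is already reached and
    None if usage is flat or shrinking (no crossing is projected).
    """
    n = len(daily_used_tb)
    if n < 2:
        raise ValueError("need at least two daily samples")

    # Least-squares slope: average growth in TB per day.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_tb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_tb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den

    limit = threshold * capacity_tb
    current = daily_used_tb[-1]
    if current >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - current) / slope


if __name__ == "__main__":
    # Hypothetical samples: 2 TB/day growth on a 200 TB cluster.
    print(days_until_threshold([100, 102, 104, 106], capacity_tb=200))  # prints 27.0
```

A least-squares slope is used instead of the difference between the first and last samples so that a single noisy day does not dominate the projection.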
What are the Benefits of Cloudera Manager?
Lowers the barrier for Hadoop administration
• Do not need to rely solely on experts
• Reduces the number of administrators needed
Provides a “one-stop” holistic view
• Easy to understand how the overall cluster is performing
Includes pre-tuned configuration with best practices
• Get straight to solving the business problem
Integrates with Cloudera support
• Leverage the real experts, not just for bugs
Cloudera Enterprise – The Platform for Big Data
Why Do You Need Cloudera Manager?
Complexity
Hadoop is more than a dozen services running across many machines
• Hundreds of hardware components
• Thousands of settings
• Limitless permutations
Context
Hadoop is a system, not just a collection of parts
• Everything is interrelated
• Raw data about individual pieces is not enough
• Must extract what’s important
Efficiency
Managing Hadoop with multiple tools & manual processes takes longer
• Complicated, error-prone workflows
• Longer issue resolution
• Lack of consistent & repeatable processes
Cloudera Manager
End-to-End Administration for CDH
1. Deploy: Install, configure & start your cluster in 3 simple steps
2. Configure & Optimize: Ensure optimal settings for all hosts & services
3. Monitor, Diagnose & Report: Find & fix problems quickly; view current & historical activity & resource usage
Managing Complexity
One Tool For Everything
[Diagram: Cloudera Enterprise vs. do-it-yourself tooling across deployment & configuration, monitoring, workflows, events & alerts, log search, diagnostics, reporting and activity monitoring]
“In a recent Cloudera survey, >95% of respondents emphasized the importance of having a single end-to-end tool to manage their Hadoop Operations”
Raw Data vs. Hadoop Intelligence
Providing Context
1. Smart Configuration: Auto-sets configurations & guards against user error
2. Workflows: Ensures that multi-step tasks are accomplished completely & in the correct sequence
3. Dependencies: Aware of how a particular action affects the rest of the cluster & manages the impact
4. Events & Alerts: Makes you aware of what’s important at a Hadoop system level
5. History: Compares current & past activities for context
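The Workflows and Dependencies items above describe running multi-step operations in an order that respects how services affect one another. As a minimal sketch of that idea (not Cloudera Manager's actual implementation), the snippet below derives a start order in which every service comes up after the services it depends on, and stops services in the reverse order. The dependency map is a simplified, hypothetical example, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for illustration only: each service lists the
# services that must already be running before it can start.
SERVICE_DEPENDENCIES = {
    "zookeeper": set(),
    "hdfs": set(),
    "mapreduce": {"hdfs"},
    "hbase": {"hdfs", "zookeeper"},
    "hive": {"mapreduce"},
}


def start_order(deps):
    """Return a start order in which every service follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stop in reverse start order so no running service loses a dependency."""
    return list(reversed(start_order(deps)))


if __name__ == "__main__":
    print("start:", start_order(SERVICE_DEPENDENCIES))
    print("stop: ", stop_order(SERVICE_DEPENDENCIES))
```

Stopping in reverse start order means, for example, that HBase is shut down before the ZooKeeper and HDFS services it relies on.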
Cloudera Manager Key Features
• Automated Deployment: Installs the complete Hadoop stack in minutes via a wizard-based interface
• Centralized Management: Gives you complete, end-to-end visibility and control over your Hadoop cluster from a single interface
• Multi-Cluster Management: Allows you to manage multiple clusters from a single instance of Cloudera Manager
• LDAP Authentication: Integrates Cloudera Manager with Active Directory
• Global Time Control: Establishes the time context globally for almost all views; correlates jobs, activities, logs, system changes, configuration changes and service metrics along a single timeline to simplify diagnosis
• Service & Configuration Management: Sets server roles, configures services and manages security across the cluster; gracefully starts, stops and restarts services as needed
• Role-Based Administration: Supports Administrator and Read-Only users
• Audit Trails: Maintains a complete record of configuration changes with the ability to roll back to previous states
• Proactive Health Checks: Monitors dozens of service performance metrics and alerts you when you approach critical thresholds
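Proactive Health Checks, as listed above, compare service metrics against thresholds and raise an alert as a critical level approaches. The sketch below shows that general warning/critical pattern under stated assumptions: the metric names and limits are hypothetical, and this is an illustration of the technique, not Cloudera Manager's own health-test implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    warning: float   # value at which the check reports WARNING
    critical: float  # value at which the check reports CRITICAL


# Hypothetical metric names and limits, for illustration only.
THRESHOLDS: Dict[str, Threshold] = {
    "hdfs_capacity_used_pct": Threshold(warning=80.0, critical=90.0),
    "datanode_heap_used_pct": Threshold(warning=85.0, critical=95.0),
}


def classify(value: float, limits: Threshold) -> str:
    """Map a metric value to OK / WARNING / CRITICAL."""
    if value >= limits.critical:
        return "CRITICAL"
    if value >= limits.warning:
        return "WARNING"
    return "OK"


def health_report(metrics: Dict[str, float]) -> Dict[str, str]:
    """Classify every metric that has a configured threshold; ignore the rest."""
    return {
        name: classify(value, THRESHOLDS[name])
        for name, value in metrics.items()
        if name in THRESHOLDS
    }


def alerts(metrics: Dict[str, float]) -> Dict[str, str]:
    """Keep only the metrics whose status should raise an alert."""
    return {name: status for name, status in health_report(metrics).items() if status != "OK"}
```

For example, `alerts({"hdfs_capacity_used_pct": 83.5, "datanode_heap_used_pct": 40.0})` returns `{"hdfs_capacity_used_pct": "WARNING"}`: the warning level fires before the critical one, which is what makes the check proactive.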
Cloudera Manager Key Features (Contd.)
• Intelligent Log Management: Lets you gather, view and search Hadoop logs collected from across the cluster; scans Hadoop logs for irregularities and warns you before they impact the cluster
• Event Management: Creates and aggregates relevant Hadoop events pertaining to system health, log messages, user services and activities, and makes them available for alerting and searching
• Alerting: Generates email alerts when certain events occur
• Activity Monitoring: Consolidates all cluster activity into a single, real-time view
• Host-Level Monitoring: Displays information about the hosts in your cluster, including status, resident memory, virtual memory and roles
• Heatmaps: Visualize health status and metrics across the cluster to quickly identify problem nodes and take action
• Operational Reports: Visualize current and historical disk usage by user, group and directory; track MapReduce activity on the cluster by job or user
• Support Integration: Takes a snapshot of the cluster state and automatically sends it to Cloudera support to assist with resolution
• Comprehensive API: Easily integrate Cloudera Manager with your existing enterprise-wide management and monitoring tools
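The Comprehensive API feature, like the BMC integration mentioned earlier, rests on Cloudera Manager's REST API. The following is a minimal Python sketch, assuming the API is reachable over HTTP on port 7180 under an `/api/v1` prefix with basic authentication, and that a services listing returns an `items` array whose entries carry `name` and `healthSummary` fields; check these assumptions against the API documentation for your release. The host name `cm-host.example.com` and the sample payload are hypothetical.

```python
import base64
import json
import urllib.request

# Assumptions: host, port, API version and JSON field names are illustrative;
# verify them against the Cloudera Manager API documentation for your release.
CM_BASE_URL = "http://cm-host.example.com:7180/api/v1"


def fetch_json(path, username, password):
    """GET a Cloudera Manager API resource using HTTP basic authentication."""
    request = urllib.request.Request(CM_BASE_URL + path)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def unhealthy_services(payload):
    """Return (name, health) for each service not reporting GOOD health."""
    return [
        (svc["name"], svc.get("healthSummary", "UNKNOWN"))
        for svc in payload.get("items", [])
        if svc.get("healthSummary") != "GOOD"
    ]


# Hypothetical sample payload, shaped like an assumed services listing response.
SAMPLE_PAYLOAD = {
    "items": [
        {"name": "hdfs1", "type": "HDFS", "healthSummary": "GOOD"},
        {"name": "hbase1", "type": "HBASE", "healthSummary": "CONCERNING"},
    ]
}

if __name__ == "__main__":
    print(unhealthy_services(SAMPLE_PAYLOAD))
```

Against a live server you would call something like `fetch_json(f"/clusters/{cluster_name}/services", user, password)` with your own cluster name and credentials, pass the result to `unhealthy_services`, and forward the list to your enterprise monitoring platform. Keeping the parsing separate from the HTTP call lets the filtering logic be tested without a running Cloudera Manager instance.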
Cloudera Manager Roadmap
• Cloudera Manager 4.1 – Released 10/24
  • Platform support for CDH4.1
  • Cloudera Impala management & monitoring
  • New monitoring – ZooKeeper, Flume NG
  • Maintenance Mode
  • Host Decommissioning
  • Several usability enhancements
• Cloudera Manager 4.5 – Early 2013
  • Rolling upgrades/restarts
  • Enhanced monitoring, cluster heatmaps, etc.
  • Role groups configuration
  • Cloud support
  • Others – SNMP support, error handling, ISV integration, etc.
Why Cloudera Manager?
• Simple: End-to-end Hadoop administration in a single tool
• Intelligent: Manages Hadoop at a system level – Cloudera’s experience realized in software
• Efficient: Simplifies complex workflows & makes administrators more productive
• Best-in-Class: The only enterprise-grade Hadoop management application available
Next Steps
• Try out the FREE edition of Cloudera Manager
• Download from: http://www.cloudera.com/products-services/tools/
• Support available via [email protected]
• For Cloudera Enterprise subscriptions, please contact: [email protected]
Q&A
Cloudera Manager Key Features
Install A Cluster In 3 Simple Steps
1. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
2. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
3. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.
View Service Health & Performance
Get Host-Level Snapshots
Monitor & Diagnose Cluster Workloads
Gather, View & Search Hadoop Logs
Track Events From Across The Cluster
Report On System Performance & Usage
Visualize Health Status With Heatmaps
Manage Multiple CDH Clusters
Easily Configure High Availability
Set The Time Context Globally