2
Cloudera University’s three-day administrator training course for Apache Hadoop provides System Administrators a comprehensive understanding of all the steps necessary to operate and manage Hadoop clusters. From installation and configuration, through load balancing and tuning your cluster, Cloudera’s Administration course has you covered. Through lecture and interactive, hands-on exercises, attendees will cover topics such as • Introduction to Apache Hadoop and HDFS • Apache Hadoop architecture • Proper cluster configuration and deployment • Populating HDFS using Apache Sqoop • Management and monitoring tools • Job scheduling • Best practices for maintaining Apache Hadoop in Production • Installing and managing other Apache Hadoop projects • Diagnosing, tuning and solving Apache Hadoop issues Audience This course is intended for experienced developers who wish to write, maintain, and/or optimize Apache Hadoop jobs. A background in Java is preferred, but experience with other programming language such as PHP, Python, or C# is sufficient. Knowledge of algorithms and other computer science topics is a plus. Upon completion of the course, attendees are able to attempt the Cloudera Certified Developer for Apache Hadoop (CCDH) exam. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of their skills. TRAINING SHEET Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification “Cloudera Administrator Training for Apache Hadoop helped me to advance my use of Apache Hadoop and cultivate a better understanding of the platform’s inner workings. The course material, interactive labs and exercises really helped cement together all the little bits and pieces that I had bumped into prior to the class into a useful mental model of how Apache Hadoop works.” Eric Marshall, Senior System Administrator Administrator Training for Apache Hadoop Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com ©2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

Cloudera Administrator Training for Apache Hadoop

Embed Size (px)

DESCRIPTION

Cloudera-Administrator-Training-for-Apache-Hadoop

Citation preview

  • Cloudera Universitys three-day administrator training course for Apache Hadoop provides System Administrators a comprehensive understanding of all the steps necessary to operate and manage Hadoop clusters. From installation and configuration, through load balancing and tuning your cluster, Clouderas Administration course has you covered.

    Through lecture and interactive, hands-on exercises, attendees will cover topics such as

    Introduction to Apache Hadoop and HDFS

    Apache Hadoop architecture

    Proper cluster configuration and deployment

    Populating HDFS using Apache Sqoop

    Management and monitoring tools

    Job scheduling

    Best practices for maintaining Apache Hadoop in Production

    Installing and managing other Apache Hadoop projects

    Diagnosing, tuning and solving Apache Hadoop issues

    AudienceThis course is intended for experienced developers who wish to write, maintain, and/or optimize Apache Hadoop jobs. A background in Java is preferred, but experience with other programming language such as PHP, Python, or C# is sufficient. Knowledge of algorithms and other computer science topics is a plus.

    Upon completion of the course, attendees are able to attempt the Cloudera Certified Developer for Apache Hadoop (CCDH) exam. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of their skills.

    TRAINING SHEET

    Take your knowledge to the next level with Clouderas Apache Hadoop Training and Certification

    Cloudera Administrator Training for Apache Hadoop helped me to advance my use of Apache Hadoop and cultivate a better understanding of the platforms inner workings. The course material, interactive labs and exercises really helped cement together all the little bits and pieces that I had bumped into prior to the class into a useful mental model of how Apache Hadoop works.

    Eric Marshall, Senior System Administrator

    Administrator Training for Apache Hadoop

    Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com

    2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

  • Cloudera Certified Administrator for Apache Hadoop (CCAH)Establish yourself as a trusted and valuable resource by completing the online certification exam for Apache Hadoop Administrators. The exam is demanding and is designed to test your fluency with concepts and terminology in the following areas:

    TRAINING SHEET

    Administrator Training for Apache Hadoop

    Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com

    2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

    An Introduction To Hadoop And HDFS o Why Hadoop? o HDFS o MapReduce o Hive, Pig, HBase and other ecosystem projects o Hands-On Exercise: Installing a pseudo- distributed cluster

    Planning Your Hadoop Cluster o General Planning Considerations o Choosing The Right Hardware o Node Topologies o Choosing The Right Software

    Deploying Your Cluster o Installing Hadoop o Using SCM Express for easy installation o Typical Configuration Parameters o Configuring Rack Awareness o Using Configuration Management Tools o Hands-On Exercise: Installing a Hadoop Cluster

    Managing and Scheduling Jobs o Starting and stopping MapReduce jobs o Hands-On Exercise: Managing jobs o The FIFO Scheduler o The Fair Scheduler o Hands-On Exercise: Using the FairScheduler

    Cluster Maintenance o Checking HDFS with fsck o Hands-On Exercise: Breaking the Cluster o Copying data with distcp

    Cluster Maintenance (continued) o Rebalancing cluster nodes o Adding and removing cluster nodes o Hands-On Exercise: Verifying the Clusters Self- Healing Features o Backup And Restore o Upgrading and Migrating o Hands-On Exercise: Backing Up and Restoring the NameNode Metadata

    Cluster Monitoring, Troubleshooting and Optimizing o Hadoop Log Files o Using the NameNode and JobTracker Web UIs o Interpreting Job Logs o Monitoring with Ganglia o Other monitoring tools o General Optimization Tips o Benchmarking Your Cluster

    Populating HDFS From External Sources o Using Sqoop o Using Flume o Best Practices for Data Ingestion

    Installing And Managing Other Hadoop Projects o Hive o Pig o HBase o Hands-On Exercise: Configuring the Hive Shared Metastore

    Cloudera Certified Administrator Exam

    Course Outline: Cloudera Administrator Training for Apache Hadoop

    Apache Hadoop Cluster Overview Daemons and normal operation of an Apache Hadoop cluster, both in data storage and in data processing. The current features of computing systems that motivate a system like Apache Hadoop

    Apache Hadoop Cluster Planning Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster

    Apache Hadoop ClusterManagement

    Cluster handling of disk and machine failures. Regular tools for monitoring and managing the Apache Hadoop file system

    Job Scheduling How the default scheduler and the fair scheduler handle the tasks in a mix of jobs running on a cluster

    Monitor and Logging Basic functionality and features of Apache Hadoops logging and monitoring systems