Slide ‹#›© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Top 5 Tasks of a Hadoop Developer
Session Objectives
This session will cover:
ᗍ Introduction to Big Data and Hadoop
ᗍ Roles & Scope of a Hadoop Developer
ᗍ Top 5 Tasks of Hadoop Developers
ᗍ Introduction to Hadoop Clusters & HBase
ᗍ Job Trends for Hadoop
Big Data Challenges
Why Hadoop?
What is Hadoop?
ᗍ Hadoop is an open-source framework for big data that provides both distributed storage and distributed processing
ᗍ Hadoop is reliable and fault tolerant, and it does not rely on the hardware for these properties
ᗍ Hadoop scales horizontally: capacity grows by adding more nodes to the cluster
Tasks of a Hadoop Developer
The following are the tasks of a Hadoop Developer:
ᗍ Development and implementation
ᗍ Loading from disparate data sets
ᗍ Pre-processing
ᗍ Designing, building, installing, configuring and supporting Hadoop
ᗍ Translating complex functional and technical requirements into detailed design
ᗍ Performing analysis on big data
ᗍ Securing data
ᗍ Creating scalable and high-performance web services for data tracking
ᗍ High-speed querying
ᗍ Managing and deploying
Let us Look at the Top 5 Tasks of a Hadoop Developer with Examples
Task 1: Development and Implementation
A Hadoop developer is responsible for the actual coding/programming of Hadoop applications
One of the most important components of Hadoop is MapReduce, for which you typically write Java programs; a basic Java background is enough to get started
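To make the MapReduce idea concrete, here is a minimal sketch of the map, shuffle and reduce steps for a word count. It is written in plain Python rather than against the Hadoop Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input; on a real cluster the lines would come from files in HDFS.
counts = word_count(["big data needs Hadoop", "Hadoop stores big data"])
```

In a real Hadoop job the developer writes only the map and reduce logic; the framework distributes the input, performs the shuffle and collects the output.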
Task 2: Loading from Disparate Data Sets
Disparate data is heterogeneous data
Such data sets are neither similar to one another nor easy to integrate with an organization's database management system. They differ in one or more aspects of an information system
Disparate data may be characterized by these basic problems:
ᗍ When a database system is implemented in an organization, there is no complete and integrated inventory of all its data
ᗍ High data redundancy all over the organization
ᗍ High variability of data formats and contents
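The format and redundancy problems listed above can be illustrated with a small sketch that reads the same kind of record from a CSV source and a JSON source, maps both onto one common record shape, and drops redundant copies. The field names, sample values and helper functions (from_csv, from_json, load_all) are hypothetical and chosen only for this example.

```python
import csv
import io
import json

# Hypothetical sample sources: the same kind of data in two different formats.
CSV_SOURCE = "city,temp_c\nPune,31\nDelhi,38\n"
JSON_SOURCE = (
    '[{"City": "pune", "temperature": {"celsius": 31}},'
    ' {"City": "Mumbai", "temperature": {"celsius": 33}}]'
)


def from_csv(text):
    """Yield common-shape records from CSV text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"city": row["city"].strip().title(), "temp_c": float(row["temp_c"])}


def from_json(text):
    """Yield common-shape records from JSON text with a nested layout."""
    for item in json.loads(text):
        yield {
            "city": item["City"].strip().title(),
            "temp_c": float(item["temperature"]["celsius"]),
        }


def load_all(*sources):
    """Merge all sources, keeping the first record seen for each city."""
    seen = {}
    for records in sources:
        for record in records:
            seen.setdefault(record["city"], record)  # redundant copies are dropped
    return sorted(seen.values(), key=lambda record: record["city"])


records = load_all(from_csv(CSV_SOURCE), from_json(JSON_SOURCE))
```

Here "pune" from the JSON source is recognized as a redundant copy of "Pune" from the CSV source, so three records remain after loading.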
Task 2: Loading from Disparate Data Sets – Scenario
Consider a web application in which a user queries a variety of information about a particular city, such as crime statistics, weather, hotels and demographics
Traditionally, all of this information would have to be stored in a single database with a single schema
But such data would be expensive to collect and difficult for any single enterprise to process
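As a rough illustration of the scenario, the sketch below leaves each data set in its own shape and assembles the answer only when a city is queried, instead of forcing everything into one schema first. All names and values (crime_stats, weather, hotels, city_report and the sample figures) are made up for the example.

```python
# Hypothetical sample data: each source keeps its own structure.
crime_stats = {"Pune": {"incidents_per_100k": 210}}
weather = [("Pune", "2015-06-01", 31.0), ("Delhi", "2015-06-01", 38.0)]
hotels = [
    {"name": "Hotel A", "location": {"city": "Pune"}},
    {"name": "Hotel B", "location": {"city": "Delhi"}},
]


def city_report(city):
    """Collect what each source knows about one city at query time."""
    return {
        "city": city,
        "crime": crime_stats.get(city),  # None when the source has no entry
        "weather": [(day, temp) for name, day, temp in weather if name == city],
        "hotels": [h["name"] for h in hotels if h["location"]["city"] == city],
    }


report = city_report("Pune")
```

A source that has nothing for the requested city simply contributes an empty result, so new data sets can be added without redesigning a shared schema.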
Task 3: Perform Analysis on Big Data
A Hadoop developer performs analysis on big data
Task 3: Perform Analysis on Big Data – Example
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?
Today, managing big data at this scale is becoming a problem for all of us
Task 4: Securing Data
One of the biggest concerns in our present age revolves around the security and protection of sensitive information
Network security breaches by internal and external attackers are on the rise; they often take months to detect and can severely affect the organizations involved
Task 5: Managing and Deploying
A Hadoop developer manages the Hadoop cluster and deploys applications on it. HBase, the distributed column-oriented database that runs on top of HDFS, is used to manage the data stored on the cluster
Job Trends – Hadoop
Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to MapReduce
Module 4: Advanced MapReduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Why SkillSpeed?
Course Curriculum from Industry Experts
Instructor Led Live Virtual Sessions
Lifetime access to Course Content via LMS
100% Placement Assistance
24x7 Support
Corporate Partners
Contact us
Lines open 24/7
To know more about the course, please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at [email protected]