Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved The Elephant Meets Scrum Rommel Garcia

Hadoop Meets Scrum

The Elephant Meets Scrum

Rommel Garcia

Agenda

• Introductions

• Why Scrum?

• Scrum Basic Concepts

• Scrum Team

• Scrum Framework

• Hadoop Meets Scrum

• Scrum Exercise

• Open Forum

Introductions

What’s your name?

What’s your role?

Why are you here?

Why Scrum?

Nobody wants to fail too big… too co$tly… on projects.

Monolithic SDLC

• A small change impacts everything

• The cost of failure is extremely high

• Slow, unpredictable progress

• Hard to prioritize

• Not business friendly

Scrum..

• Produces immediate results

• Makes the development team nimble and adaptable

• Gives full visibility into the development process

• Is a perfect fit for Hadoop

  • Hadoop isolates data and processing (HDFS and YARN, respectively)

  • Failure in Hadoop is cheap

  • Complete traceability of apps: who deployed, ran, and tested them, and when and where

Scrum Concepts

Agile. Iterative. Adaptive. Fast results.

Scrum is..

• A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value

• A framework to employ various processes and techniques

• Lightweight

• Simple to understand

• Difficult to master… if the RULES are not followed religiously

Success of SCRUM depends on..

• Transparency

  • A common language must be shared by all team members

  • What does “Done” mean?

• Inspection

  • Check progress on Scrum artifacts frequently

  • But be careful not to overdo it, or it gets in the way of work

• Adaptation

  • Adjust properly and promptly when the process deviates outside acceptable limits

  • Adjust immediately to prevent further deviation

Scrum Formal Events

1. Sprint Planning

2. Daily Scrum

3. Sprint Review

4. Sprint Retrospective

Scrum consists of..

• Team

• Roles

• Events

• Artifacts

• Rules

Scrum Team

Committed or Involved.

The ‘Ham-n-Eggs’ Paradigm

The Team

• Product Owner

• Development Team

• Scrum Master

Product Owner

• Mainly responsible for Product Story management

• Clearly defines Product Story items

• Effectively orders items in the Product Story

• Ensures the Product Story is visible, transparent, and clear to all, and shows what the Scrum Team will work on next

• Validates with the Scrum Team that they understand the items in the Product Story

• In the real world, this could be the Project Manager, Program Manager, Development Manager, or Product Manager

Development Team

• Self-organizing

  • They decide how to produce and release incremental, releasable functionality

  • The Scrum Master has no influence on how the team develops functionality

• Cross-functional

  • Pig, Hive, HDFS, YARN, and more

  • Develop and release features faster

• Accountability belongs to the Development Team as a whole

• Team size: 3 to 9 members

• Normally composed of Hadoop Developer, Hadoop Architect, Data Scientist, Data Analyst, and QA

Scrum Master

• Ensures Scrum theory, practices, and rules are enacted

• Servant-leader for the Scrum Team

  • Coaches the Development Team in self-organization and cross-functionality

  • Removes impediments to the Development Team’s progress

• Serves the Product Owner

  • Finds techniques for effective Product Story management

  • Helps with clear, concise definition of Product Story items

  • Ensures the Product Owner knows how to arrange the Product Story to maximize value

  • Facilitates Scrum events as requested or needed

• Serves the Organization

  • Leads Scrum adoption

  • Works with other Scrum Masters to increase the effectiveness of Scrum in the organization

Scrum Framework

Fail fast in Hadoop. Move fast with Scrum.

The Sprint

• It is the heart of Scrum

• Time-boxed at 1 month or less. 2 weeks is pretty common.

• New Sprint starts immediately after conclusion of previous Sprint

• Consists of

  • Sprint Planning

  • Daily Scrums

  • Development Work

  • Sprint Review

  • Sprint Retrospective

During the Sprint

• No changes are made that would compromise the Sprint Goal

• Quality goals do NOT decrease

• Scope may be clarified and re-negotiated between Product Owner and Development team as more is learned

• ONLY Product Owner has the authority to cancel a Sprint

Sprint Planning

• Time-boxed

  • 8 hours of planning for a 1-month Sprint, or 2 hours of planning for a 2-week Sprint

• Answers the questions:

  • What can be done this Sprint?

    – The Development Team forecasts which Product Story items it will deliver

    – Output is the Sprint Goal

  • How will the chosen work get done?

    – The Development Team determines how to deliver the increments

    – Output is the Sprint Story

Daily Scrum

• Driven by Scrum Master

• Time-boxed at 15 minutes

  • Synchronize activities and plan for the next 24 hours

  • Each Development Team member is asked:

    – What was done yesterday?

    – What needs to be done today?

    – What issues prevented incremental progress on the work?

• Highlights and promotes quick decision-making

• Improves communication and eliminates other meetings

Sprint Review

• Time-boxed

  • A 4-hour review for a 1-month Sprint, or a 2-hour review for a 2-week Sprint

• The Scrum Team and Stakeholders collaborate on what was done in the Sprint

• An informal meeting, NOT a status meeting. A demo of the product is presented

• The Scrum Team discusses

  • What went well during the Sprint

  • What issues were faced

  • What could be improved

• Output is a revised set of Product Story items for the next Sprint

Sprint Retrospective

• Time-boxed

  • A 3-hour meeting for a 1-month Sprint, or under 2 hours for a 2-week Sprint

• Main purpose

  • Review how the previous Sprint went with respect to people, relationships, process, and tools

  • Identify and order the major items that went well and the potential improvements

  • Create a plan for implementing improvements to the way the Scrum Team does its work

Scrum Timelines

[Timeline diagram: Product Story → Sprint Planning → Sprint → Sprint Review → Sprint Retrospective. Labels on the diagram: Business Input; Driven by Product Owner, Stakeholders, Scrum Master; 4 hours for 2 wk Sprint; 2 hours; 2 weeks, Daily Scrum; 2 hours; <2 hours. Transitions between phases are marked "Immediate".]
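The time-boxes quoted on the event slides can be collected into a small lookup. The sketch below is illustrative only: the names `TIME_BOX_HOURS`, `STRICT_LIMITS`, and `fits_time_box` are hypothetical, and the hours are simply the figures stated on the Sprint Planning, Daily Scrum, Sprint Review, and Sprint Retrospective slides.

```python
from typing import Dict, Set, Tuple

# Hours per event, as stated on the slides, keyed by Sprint length.
# The Daily Scrum is 15 minutes (0.25 h) regardless of Sprint length.
TIME_BOX_HOURS: Dict[str, Dict[str, float]] = {
    "Sprint Planning": {"1 month": 8.0, "2 weeks": 2.0},
    "Daily Scrum": {"1 month": 0.25, "2 weeks": 0.25},
    "Sprint Review": {"1 month": 4.0, "2 weeks": 2.0},
    "Sprint Retrospective": {"1 month": 3.0, "2 weeks": 2.0},
}

# The slides give "<2 hours" for the 2-week Retrospective, so that limit is exclusive.
STRICT_LIMITS: Set[Tuple[str, str]] = {("Sprint Retrospective", "2 weeks")}


def fits_time_box(event: str, sprint_length: str, planned_hours: float) -> bool:
    """Return True if a planned meeting length respects the slide's time-box."""
    limit = TIME_BOX_HOURS[event][sprint_length]
    if (event, sprint_length) in STRICT_LIMITS:
        return planned_hours < limit
    return planned_hours <= limit
```

For example, `fits_time_box("Sprint Planning", "2 weeks", 3)` is `False`, because the slide allows 2 hours of planning for a 2-week Sprint.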

Hadoop Meets Scrum

Scrum Tools

Scrum Tools: Go Modern or Archaic

• Agile software is available, e.g. www.rallydev.com

• LCD Projector

• Whiteboard and colored markers

• Long, contiguous wall

• Clustered cubicles

• Index cards

Hadoop Meets Scrum

Supporting Scrum in Hadoop Development

What is needed in Hadoop to support Scrum?

• Multi-tenancy is critical

  • Set up security: LDAP/AD, Ranger, Kerberos, Knox

  • Set up an HDFS quota for each Scrum Team

  • Set up a Capacity Scheduler queue for each Scrum Team or member

• High Availability is important but not critical

  • Set up NN HA

  • Set up RM HA

  • Set up HiveServer2 HA

  • Set up Hive Metastore HA

  • Set up multiple HBase Masters

  • Set up a multi-Knox cluster
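A per-team HDFS quota and a per-team Capacity Scheduler queue could be prepared along these lines. This is a minimal sketch, not a definitive setup: the team names, the `/teams` base directory, the equal capacity split, and both function names are assumptions. It only builds strings; the `hdfs dfsadmin -setSpaceQuota` commands would be run by an HDFS administrator, and the `yarn.scheduler.capacity.*` values would be entered through Ambari rather than edited into config files by hand.

```python
from typing import Dict, List


def hdfs_quota_commands(quotas: Dict[str, str], base_dir: str = "/teams") -> List[str]:
    """Build one `hdfs dfsadmin -setSpaceQuota` command per team directory.

    `quotas` maps a (hypothetical) team name to a size such as "500g" or "2t".
    """
    return [
        f"hdfs dfsadmin -setSpaceQuota {quota} {base_dir}/{team}"
        for team, quota in sorted(quotas.items())
    ]


def capacity_scheduler_properties(teams: List[str]) -> Dict[str, str]:
    """Split root capacity evenly across one queue per team (assumed policy).

    Capacities are computed in hundredths of a percent so they sum to exactly 100.
    """
    if not teams:
        raise ValueError("at least one team is required")
    base, extra = divmod(10000, len(teams))
    props = {"yarn.scheduler.capacity.root.queues": ",".join(teams)}
    for i, team in enumerate(teams):
        hundredths = base + (1 if i < extra else 0)
        props[f"yarn.scheduler.capacity.root.{team}.capacity"] = f"{hundredths / 100:.2f}"
    return props
```

An even split is only a starting point; a team running heavier workloads would likely be given a larger share.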

What is needed in Hadoop to support Scrum?

• Establish a habit of regular, disciplined performance tuning of Hadoop

  • YARN, Hive, Tez, Spark, Kafka, Storm, Flume, HBase, Solr, MapReduce, etc.

• Truncate logs regularly

  • All Hadoop component logs

  • Truncate when at 80% disk utilization

• Logs are a gold mine. Learn to interpret them correctly.

  • Troubleshooting purposes

  • Understanding how components operate and interoperate

• Turn off Hadoop services that are not needed

  • Saves CPU, memory, and disk space

  • Do not forget to turn on maintenance mode. Ask your Hadoop Admin why.
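The 80% rule for log truncation can be checked with a short script. The sketch below assumes that logs end in `.log` and that utilization is measured on the disk holding the log directory; the function names are hypothetical. It only reports candidates and never modifies or deletes a file.

```python
import shutil
from pathlib import Path
from typing import List

THRESHOLD = 0.80  # "Truncate when at 80% disk utilization"


def needs_truncation(used: int, total: int, threshold: float = THRESHOLD) -> bool:
    """Return True when disk utilization is at or above the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total >= threshold


def logs_to_truncate(log_dir: str, threshold: float = THRESHOLD) -> List[Path]:
    """List *.log files under log_dir if its disk has reached the threshold.

    Returns an empty list when utilization is below the threshold.
    """
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    if not needs_truncation(usage.used, usage.total, threshold):
        return []
    return sorted(Path(log_dir).rglob("*.log"))
```

Keeping the decision (`needs_truncation`) separate from the file listing makes the threshold easy to test without touching a real cluster.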

What is needed in Hadoop to support Scrum?

• Know your tools

Component               Best used for
Sqoop                   Ingesting RDBMS tables into HDFS and/or Hive
Flume                   Ingesting flat files from network file systems or file servers. Capped at 400,000 records/sec
NFS                     Ingesting flat files from NFS-based file servers. ONLY ingest files smaller than 1 GB
Kafka, Storm, HBase     Realtime, streaming, and online processing. Perfect for IoT and CEP. They all go together in realtime systems.
Slider                  Deploying custom long-running applications, e.g. Tomcat apps
Spark                   Data science (Spark ML), micro-batch streaming (Spark Streaming)
MapReduce               Only use it when Pig and Hive can’t do the job
Pig                     Perfect for ETL processing. Data mining and statistics (Apache DataFu)
Hive                    Reporting and analytics. Data warehousing. Always use ORC!
Tez                     Never turn it off. Enable it for both Pig and Hive for fast data processing
Falcon                  Process orchestration and data lineage
Knox, Kerberos, Ranger  AuthN, AuthZ, audit. Preventing impersonation.
Ambari                  Do NOT update config files manually. Use the Ambari UI to make config changes in Hadoop.
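The Hive and Tez advice ("Always use ORC", "Never turn it off") can be made concrete with a small helper that emits HiveQL. This is a sketch under assumptions: the function name, table name, and columns are hypothetical, and the generated text would still need to be run through a Hive client.

```python
from typing import List, Tuple


def orc_table_ddl(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build HiveQL that selects Tez as the engine and creates an ORC-backed table.

    `columns` is a list of (name, Hive type) pairs, e.g. ("actual_temp", "INT").
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        "SET hive.execution.engine=tez;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "STORED AS ORC;"
    )
```

Calling `orc_table_ddl("hvac", [("building_id", "INT"), ("actual_temp", "INT")])` yields a `SET` line followed by a `CREATE TABLE ... STORED AS ORC;` statement.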

Let’s Scrum!

Putting Hadoop and Scrum to the test

The Project – HVAC Sensor Analytics

• The business wants to understand how its buildings consume energy, starting with HVAC. It wants to determine which HVAC systems are working hardest and prioritize them for maintenance or replacement.

• Determine which HVAC products have the highest temperature deviation and order them by age.

• Identify which buildings likely have the poorest maintenance practices
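Before writing the Hive queries, the second requirement can be prototyped in plain Python to agree on what "highest temperature deviation, ordered by age" means. The sketch below is one possible reading, not the exercise's official answer: the `Reading` fields, the use of mean absolute deviation from the target temperature, the `top_n` cut-off, and oldest-first ordering are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reading:
    """One sensor reading (hypothetical schema)."""
    building_id: int
    product: str
    system_age: int      # years
    target_temp: float
    actual_temp: float


def rank_products(readings: List[Reading], top_n: int = 3) -> List[Tuple[str, float, int]]:
    """Return (product, mean absolute deviation, oldest age) for the top_n
    products by deviation, ordered oldest first."""
    deviations: Dict[str, List[float]] = defaultdict(list)
    oldest: Dict[str, int] = {}
    for r in readings:
        deviations[r.product].append(abs(r.actual_temp - r.target_temp))
        oldest[r.product] = max(oldest.get(r.product, 0), r.system_age)
    summary = [(p, sum(d) / len(d), oldest[p]) for p, d in deviations.items()]
    worst = sorted(summary, key=lambda row: row[1], reverse=True)[:top_n]
    return sorted(worst, key=lambda row: row[2], reverse=True)
```

Agreeing on this definition first gives the Sprint Story a clear "Done" criterion that the Hive implementation can be checked against.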

TODO

• Apply SCRUM principles and rules

• Properly size your team

• Break down the requirements into Product Story

• Determine Sprint Goal

• Generate one Sprint Story

• Develop the app in Hive

• Any performance tuning of your tables and CREATE statements is a big plus

Rules

• Spend 15 minutes on Sprint Planning

• We will do a 2-hour Sprint

• We will hold one Daily Scrum in the middle of the 2-hour Sprint

• Spend 15 minutes on the Sprint Review
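The exercise rules translate into a simple schedule once a start time is chosen. This is an illustrative sketch: the function name and dictionary keys are hypothetical, and it assumes the Sprint begins as soon as planning ends and the review begins as soon as the Sprint ends.

```python
from datetime import datetime, timedelta
from typing import Dict


def exercise_schedule(start: datetime) -> Dict[str, datetime]:
    """Lay out the exercise: 15 min planning, 2 h Sprint with one Daily Scrum
    at its midpoint, then a 15 min Sprint Review."""
    sprint_start = start + timedelta(minutes=15)
    sprint_end = sprint_start + timedelta(hours=2)
    return {
        "planning_start": start,
        "sprint_start": sprint_start,
        "daily_scrum": sprint_start + timedelta(hours=1),
        "sprint_end": sprint_end,
        "review_end": sprint_end + timedelta(minutes=15),
    }
```

Starting at 09:00, the whole exercise takes two and a half hours and ends at 11:30.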

Q&A… Discussion