Josh Evans - Director of Operations Engineering
November 16, 2015
Beyond DevOps: How Netflix Bridges the Gap
Technical Debt
Java 6
Perforce
Single Master Jenkins
Ant
CentOS
Asgard/Mimir
Fall 2013
Java 6: needed to move forward on Java but struggled to drive adoption
Perforce: many teams moving to Git; no story for supporting Perforce in the cloud
Jenkins: long queues & build times
Ant: long build times, inefficient dependency management
CentOS: slow delivery of new kernel and userland binaries
Asgard: served us well as a deployment & cloud management tool
Mimir: gave us a great prototype and we learned a lot
Tech debt kept us from doing our jobs well
How do we drive broad-based change?
Does this sound familiar? Have any of you been on one side or the other of this situation?
The Paved Road
Java 7
Stash
Jenkins Shards
Gradle
Ubuntu
To move forward we defined the concept of the paved road.
The paved road promises a well-supported, integrated developer experience.
Java 7: just to move forward; Java 8 already on the horizon
Git: organically adopted by many teams
Gradle: build times reduced due to efficient dependency management
Ubuntu: more frequent, well-vetted userland binaries & kernels
Jenkins: shards to fix long build times
Started building our next-generation cloud console & continuous delivery platform, Spinnaker
We staffed up and went for it big bang
Some said:
You're overloading us
Too many projects
Poor targeting
Others said:
What took you so long?
We've moved on
Now we need to migrate
That's great, but
We're paying a high tax
Expectations gap
Division of labor
Timing of solutions
Leadership
Affects
Reputation
Relationships
Lost opportunities
Organizational Debt
How do we bridge the gap?
Remember that TIME is money
Read to the audience:
He that can earn ten shillings a day by his labour, and goes abroad, or sits idle one half of that day, tho' he spends but sixpence during his diversion or idleness, ought not to reckon that the only expense; he has really spent or rather thrown away five shillings besides.
- Advice to a Young Tradesman
Time is a form of currency
Please raise your hand if you know which puritanical workaholic wrote this.
In addition to the obvious intent behind this, there is a more profound message. Time spent working is related to the money you make, but time is also in and of itself a form of currency. It's the exchange or giving of time that drives the economics of an engineering organization.
Our time today
Product Engineering
Operations Engineering
Challenges & Strategies
Product Innovation
winning moments of truth
Every facet of the product
1400 AB tests in the last year & accelerating
Continuous Innovation
But wait, there's more
Build It
design
code
build
bake
test
deploy
Run It
configure
monitor
triage
fix
at scale, globally
You build it, you run it
Netflix has a freedom & responsibility culture. "You build it, you run it" aligns perfectly with our values around autonomy & ownership.
Internet
1000s of starts per second
100,000s of requests per second
100,000,000 hours of content / day
3 AWS Regions, 3 AZs per region
Relentless product innovation
Building & running micro-services at scale, globally
This leads to a high-pressure situation and creates a shortage of time.
Our time today
Product Engineering
Operations Engineering
Challenges & Strategies
DevOps is a software development method that emphasizes the roles of both software developers and other information-technology (IT) professionals with an emphasis on IT Operations.
- Wikipedia
The Gap
Read definition out loud.
Out of curiosity, who agrees with this definition? Who disagrees?
Not only is there disagreement, but the general construct isn't really that helpful.
Why? How?
It doesn't address how to bridge the gap or why it matters to do so.
What are the strategies for success?
It's the practices, tools, culture.
Motivations: the reason for doing DevOps is to achieve operational excellence.
Quality
Velocity
Operational Excellence
Operational Excellence is the continuous improvement of the management, design, and function of operational environments to achieve greater quality, velocity, and competitive advantage.
Engineering Tools
Insight & Real-time Analytics
Performance & Reliability
Operations Engineering is the application of software engineering practices to achieve and sustain operational excellence.
We do the undifferentiated heavy lifting for our customers. This means we take on the operationally oriented, common engineering work across teams so that each team can focus on its core charter.
Operations Engineering
Service provider
Operational excellence driver
Cross-cutting solutions
Undifferentiated heavy lifting
Our time today
Product Engineering
Operations Engineering
Challenges & Strategies
You're overloading us
What took you so long?
Remember that feedback?
We made assumptions
Requirements: what & when
Time for non-product work
How do we
Move from assumptions to knowledge?
Effect change without imposing a tax?
Achieve and sustain operational excellence?
Time is a form of currency
Going back to our Ben Franklin quote: time is a form of currency. In our engineering world, time really is currency. We don't pay each other to do work. We commit time to projects. In other words, we have a time-based economy.
5 strategies for success
in time-based economies
software & organizational engineering
Audience: can anyone name one of the strategies?
1. Reach out
What are your biggest operational pain points?
How can we help?
How well are we meeting your needs today?
What would you like to see from us in the future?
Listen
Shower, rinse, repeat
Talk to your engineering customers
Grease the Squeaky Wheels
low tolerance for tax
more vocal than most
Stop spamming us!
What they wanted
High impact solutions
Clarity on deliverables
Lower operational tax
Leadership, innovation, and partnership
Our commitments
Deliver on solutions
Better road map definition & communication
A more aggressive stance on automation
Deeper investment into leadership, innovation, planning
2. Make an impact
Apply what you've learned
Deliver what matters
global cloud console
end to end delivery
automation platform
velocity with confidence
Pipelines - Automated Global Delivery
3. Make it easy to do the right thing
Audience: can anyone name one of the strategies?
A free chaos monkey for good ones
Engineering time is scarce
We must do more heavy lifting
Supply & Demand
Provide on-ramps
Spinnaker manual step
Automated migrations Mimir
Automate proven practices
Alerting and Monitoring
Apache & Tomcat Hardening
Automated Canary Analysis
Autoscaling
Chaos Participation
Consistent Naming
ELB Configuration
Healthcheck Configured
Red-Black Pipeline
Squeeze Testing
Timeout & Fallback Tuning
Workload Reliability
Production Ready?
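One way to picture the "Production Ready?" checklist is as a per-application scorecard. The sketch below is illustrative only: the checklist items come from the slide, while the function name `readiness_report` and the pass/fail inputs are assumptions made for the example, not Netflix's actual tooling.

```python
# Checklist items as listed on the slide.
CHECKLIST = [
    "Alerting and Monitoring",
    "Apache & Tomcat Hardening",
    "Automated Canary Analysis",
    "Autoscaling",
    "Chaos Participation",
    "Consistent Naming",
    "ELB Configuration",
    "Healthcheck Configured",
    "Red-Black Pipeline",
    "Squeeze Testing",
    "Timeout & Fallback Tuning",
    "Workload Reliability",
]


def readiness_report(app_checks):
    """Return (fraction of items satisfied, list of missing items).

    app_checks maps a checklist item to True/False; items that are
    absent from the mapping count as not satisfied.
    """
    missing = [item for item in CHECKLIST if not app_checks.get(item, False)]
    score = (len(CHECKLIST) - len(missing)) / len(CHECKLIST)
    return score, missing


# Hypothetical application that satisfies everything except two items.
example_checks = {item: True for item in CHECKLIST}
example_checks["Chaos Participation"] = False
example_checks["Squeeze Testing"] = False

score, missing = readiness_report(example_checks)
print(f"{score:.0%} production ready; missing: {', '.join(missing)}")
```

A report like this gives each team a short, concrete list of gaps rather than a single yes/no answer.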
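[Diagram: canary deployment. A load balancer sends 95% of customer traffic to 100 servers running the old version (v1.0) and 5% to 5 canary servers running the new version (v1.1). Metrics are collected from the canaries.]
[Diagram: completed rollout. The load balancer sends 100% of customer traffic to 100 servers running the new version (v1.1); the old version (v1.0) has 0 servers.]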
Define
Metrics
A threshold
Every n minutes
Classify metrics
Compute score
Make a decision
Automated Canary Analysis
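The loop on these slides (define metrics and a threshold, then classify metrics, compute a score, and make a decision) can be sketched roughly as follows. This is a simplified illustration, not Netflix's actual algorithm: the mean comparison, the 10% tolerance, the 80-point threshold, the assumption that lower values are better, and all metric names and values are assumptions made for the example.

```python
from statistics import mean


def classify_metric(baseline, canary, tolerance=0.10):
    """Label one metric 'pass' or 'fail'.

    Assumes lower values are better and compares the canary mean to the
    baseline mean, allowing a relative increase of up to `tolerance`.
    """
    base, can = mean(baseline), mean(canary)
    if base == 0:
        return "pass" if can == 0 else "fail"
    return "pass" if (can - base) / base <= tolerance else "fail"


def canary_score(metrics, tolerance=0.10):
    """Return (score from 0 to 100, per-metric labels)."""
    labels = {
        name: classify_metric(baseline, canary, tolerance)
        for name, (baseline, canary) in metrics.items()
    }
    passed = sum(1 for label in labels.values() if label == "pass")
    return 100.0 * passed / len(labels), labels


def decide(score, threshold=80.0):
    """Continue the rollout if the score meets the threshold, else roll back."""
    return "continue" if score >= threshold else "rollback"


# Hypothetical samples gathered during one analysis interval:
# metric name -> (baseline samples, canary samples)
samples = {
    "error_rate": ([0.010, 0.012, 0.011], [0.011, 0.010, 0.012]),
    "latency_ms": ([120, 118, 125], [150, 160, 155]),
    "cpu_percent": ([40, 42, 41], [43, 42, 44]),
    "memory_mb": ([900, 910, 905], [915, 920, 910]),
}

score, labels = canary_score(samples)
print(score, labels, decide(score))
```

In practice this would run every n minutes against fresh metrics, with the decision feeding back into the delivery pipeline.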
Canary Analysis
Performance
Integration Tests
Chaos
Conformity
Static
Unit Tests
Make it easy to do the right thing
Static & Functional Testing
4. Reduce the cost of change
Ongoing migrations
Library propagation
100s of micro-services
Complex dependencies
Continuous, Broad-based Change
There are several approaches that you might take to solve this problem. I'll explore each one.
Change Engineering
Locate
Communicate
Facilitate
Automated forensics
Who last touched x?
What team?
Who was their manager?
Who owns this artifact, repository, service?
Whitepages
Workday wrapper
App & REST API
Organization hierarchy
Metadata
Change log
(###) ###-####
Krieger
REST-based service
Sources
Whitepages
Stash
Edda
Jenkins
Spinnaker
Etc
{
  "content": {},
  "_links": {
    "employees": { "href": "/api/employees/" },
    "projects": { "href": "/api/projects/" },
    "teams": { "href": "/api/teams/" },
    "applications": { "href": "/api/applications/" },
    "jobs": { "href": "/api/build/jobs" },
    "masters": { "href": "/api/build/masters" },
    "projectDistribution": { "href": "/api/teams/projectDistribution" }
  }
}
/api/employees?q=jevans
  "employees": [
    {
      "id": "241",
      "firstName": "Josh",
      "lastName": "Evans",
      "username": "jevans",
      "email": "[email protected]",
      "jobTitle": "Director of Operations Engineering",
      "isManager": true,
      "isCurrent": true,
      "title": "Josh Evans (jevans) - Operations Engineering",
      "_links": {
        "self": { "href": "/api/employees/241" },
        "manager": { "href": "/api/employees/117890" },
        "team": { "href": "/api/teams/f9134a81" },
        "projects": { "href": "/api/teams/f9134a81/projects" }
      }
    }
  ]
}
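The `_links` entries in these responses let a client move from an employee to their manager, team, and projects without hard-coding URLs. A minimal sketch of that navigation, using a trimmed copy of the employee record above: the host `krieger.example.com` and the helper `resolve_links` are hypothetical, and no network call is made.

```python
import json
from urllib.parse import urljoin

# Hypothetical host; the slides show only relative paths.
BASE_URL = "https://krieger.example.com"

# Trimmed copy of the employee search result shown on the slide.
RESPONSE = json.loads("""
{
  "employees": [
    {
      "id": "241",
      "username": "jevans",
      "isManager": true,
      "_links": {
        "self": { "href": "/api/employees/241" },
        "manager": { "href": "/api/employees/117890" },
        "team": { "href": "/api/teams/f9134a81" },
        "projects": { "href": "/api/teams/f9134a81/projects" }
      }
    }
  ]
}
""")


def resolve_links(resource, base_url=BASE_URL):
    """Map each link relation of a resource to an absolute URL."""
    return {
        rel: urljoin(base_url, link["href"])
        for rel, link in resource.get("_links", {}).items()
    }


employee = RESPONSE["employees"][0]
links = resolve_links(employee)
print(links["manager"])
print(links["team"])
```

Following `manager` and `team` in turn is one way to answer the forensics questions from the earlier slide: what team, and who was their manager.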
Security vulnerabilities
Who owns this service?
Platform updates
Who is using this version of this library?
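Both questions reduce to a lookup over service metadata. The sketch below shows the idea with an in-memory inventory; the `Service` record, the function names, and every service, team, and version value are hypothetical stand-ins for data that a service such as Krieger would aggregate from its sources.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical inventory record for one micro-service."""
    name: str
    owner_team: str
    dependencies: dict = field(default_factory=dict)  # library -> version


def services_using(services, library, version):
    """Return the services that depend on this exact library version."""
    return [s for s in services if s.dependencies.get(library) == version]


def notify_list(services, library, version):
    """Group affected service names by owning team."""
    by_team = {}
    for service in services_using(services, library, version):
        by_team.setdefault(service.owner_team, []).append(service.name)
    return by_team


# Made-up inventory for illustration.
inventory = [
    Service("service-a", "team-x", {"guava": "18.0"}),
    Service("service-b", "team-y", {"guava": "19.0"}),
    Service("service-c", "team-x", {"guava": "18.0"}),
]

print(notify_list(inventory, "guava", "18.0"))
```

The result is a targeted contact list per team, which is the opposite of broadcasting a migration notice to every engineer.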
Today: Targeted Coordination
Automated, efficient technical project management
Communication
Guidance
Tracking
Low tax for TPMs & engineers
Future Change Campaigns
Security Fix
Java 9
Guava
5. Develop Partnerships
Beyond supply & demand
And once you've proven that you can deliver, you have some money in the bank. You have earned a seat at the table. Now you're ready to build strong partnerships.
Spinnaker 1.0 1H 2015
Nearing completion
Aggressive schedule
Unexpected delays
Commitment to June delivery
Edge Engineering
Built their own continuous delivery solution
Not positioned for engineering-wide support
Believes in common solutions
Partnership in Action
Strong relationship
Open discussions about concerns
Decision - leaned forward
+2 engineers on Spinnaker
Successful 1.0 launch
Moving Forward Together
Containers?
Achieving alignment
Collaborative exploration
Edge, Platform, Operations
A new paved road?
Payoffs
Paved Road adopted
Adding new ones
Production Ready ongoing
Migrations easier
Reputation improving
Improved:
Service uptime
Rate of change
Putting it to the test in 2016
Streaming production & test - EC2 Classic to VPC
Highly cross-functional
Complex dependencies
Zero downtime
Stay tuned
Five Strategies
1. Reach out
2. Make an impact
3. Make it easy to do the right thing
4. Reduce the cost of change
5. Develop partnerships
Open Sourced!
https://netflix.github.io/
Josh Evans
[email protected]
@ops_engineering
Questions?