Evolution of LinkedIn SRE & How Catalyzers Shaped It
SREcon18 Asia

Bruno Connelly, SRE
Isha Ganeriwal, Technical Program Manager


TPM @ SREcon

Relevance and importance of non-SREs in the world of SREs

Looking back: 2010 ...

LinkedIn Operations (2010)

● Classical, stratified model: Systems, Networks, Applications, DBA
● Heavyweight processes driven by tickets and heroes
● Culture of not trusting developers in any deployed environment
● Huge wall and growing frustration between Dev and Ops teams (and within Ops itself)
● Seven engineers in total made up NOC, SRE, and Release Operations: “Site Operations”
● On-call was horrible

Is the Site Up? (2010)

● Peak traffic periods: Mon-Wed, ~8am
● Regular capacity-related outages: Mon-Wed, ~8am
● Zero tolerance for failure in the application stack
● Near-zero instrumentation
● Bi-weekly maintenance downtime

Member Growth

[Chart: # Members by year, x-axis 2003 to 2016, y-axis 0 to 500,000,000. Annotations: “SRE Established”, “7 Years of Tech Debt”, “32% YOY Growth”. Slide label: 2017.]


What is SRE @ LinkedIn?

Core Principles

1. Site Up
2. Empower Developer Ownership
3. Operations is an Engineering Problem


LinkedIn’s Engineering Hierarchy of Needs

Magic

Site Up & Secure

Technology at Scale

Development at Scale

Solid APIs & Building Blocks

High Level

SREs Globally
● SF: San Francisco
● SNV: Sunnyvale
● BLR: Bangalore
● NYC: New York City

Composed of Software, Database, and Infrastructure Engineering generalists who make LinkedIn work

Generalists, you say?

"The fox knows many things, but the hedgehog knows one big thing."
-- Archilochus, Greek poet


"Expect the best, plan for the worst, and prepare to be surprised."


Catalyzers: Technical Program Managers

TPM

1. Partners & Leaders in your organization
2. Execute Right & Execute what is Right
3. Metric Oriented


Poll Upcoming: Go to www.slido.com - Code: #X563

Unplanned vs. Planned

PLANNED WORK

UNPLANNED WORK

Vote at www.slido.com

Let’s answer a few common SRE questions!

1. Enter event code: #X563 / SRECon APAC 2018
2. Answer polls

https://wall2.sli.do/event/vdlvm7hl


Poll Results

Data Points TPMs Capture

LEVEL 1
● The number and severity of incidents over a month
● The availability of your services
● Growth projections of your services / capacity planning

LEVEL 2
● How does your on-call data look?
● How much of your work is planned vs. unplanned?

LEVEL 3
● How do you feel about your craftsmanship? How about your partners?
● How do you feel about the relationship with your partners?
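The Level 1 and Level 2 data points above reduce to simple arithmetic once the raw records exist. The following is a minimal Python sketch, not LinkedIn's tooling: the `Incident` and `WorkItem` records, their field names, and the severity labels are all hypothetical, and availability is approximated as the fraction of a period not spent in incident downtime.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    severity: str            # hypothetical label, e.g. "sev1" or "sev2"
    downtime_minutes: float  # user-visible downtime attributed to the incident


@dataclass
class WorkItem:
    hours: float
    planned: bool            # False for interrupts, incidents, ad hoc requests


def incidents_by_severity(incidents):
    """Count incidents per severity label (Level 1: number and severity)."""
    return dict(Counter(i.severity for i in incidents))


def availability(incidents, period_minutes):
    """Fraction of the period without incident downtime (Level 1)."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    downtime = sum(i.downtime_minutes for i in incidents)
    return 1.0 - downtime / period_minutes


def unplanned_share(items):
    """Fraction of logged hours spent on unplanned work (Level 2)."""
    total = sum(w.hours for w in items)
    if total == 0:
        return 0.0
    return sum(w.hours for w in items if not w.planned) / total
```

For example, calling `unplanned_share` on one 30-hour planned item and one 10-hour unplanned item returns 0.25. The Level 3 questions are about sentiment and would need survey data rather than a formula.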

The BLINDSPOTS

Labels: SELF, TEAM

Zones: Transparent & Known, Blind Area, The Hidden Zone, Unknown

● Things I know
● Things I don’t know but users know
● Things that collectively we all know
● Things that team members know but are not vocalized


The FEEDBACK Loop

Process / Project / Tool

Iterative Feedback Loop for course correction

Fast Feedback Loops

The 5 STEP Plan

1. Approach the Feedback
2. Know your audience
3. Isolate and Triage Issue
4. Remove the Facade
5. Roll with the solution

Overall Takeaways for SRE

1. Keep calm & trust your TPM
2. What gets measured gets fixed
3. If you’re not a part of the solution, you’re a part of the problem


Catalyzers: Security

Security

“Changes in production applications are happening at a greater rate than ever before. New product ideas can be visualized in the morning and implemented in code in the afternoon.”

Innovation and Rate of Change

Embrace the Error Budget
• Self Healing & Auto Remediation
• Reduction of Manual Process

Inject Engineering Discipline
• Review when architecture changes reach a certain complexity point.

“Trust but Verify”
• Security to follow SRE “trust but verify” approach towards engineering partners
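As a rough illustration of the error budget idea above: an error budget is the amount of unreliability an availability target permits over a window. The sketch below is an assumption-laden example, not something from the talk. The 99.9% target, the 30-day window, and the `can_ship_risky_change` policy are all hypothetical.

```python
def error_budget_minutes(slo: float, period_minutes: float) -> float:
    """Downtime permitted by an availability target over a period."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    return (1.0 - slo) * period_minutes


def budget_remaining(slo: float, period_minutes: float,
                     downtime_minutes: float) -> float:
    """Minutes of budget left; a negative value means it is overspent."""
    return error_budget_minutes(slo, period_minutes) - downtime_minutes


def can_ship_risky_change(slo: float, period_minutes: float,
                          downtime_minutes: float) -> bool:
    """Hypothetical policy: allow risky changes only while budget remains."""
    return budget_remaining(slo, period_minutes, downtime_minutes) > 0.0
```

With a 99.9% target over 30 days (43,200 minutes), the budget is roughly 43.2 minutes. A policy like this lets the rate of change rise and fall with measured reliability instead of a fixed release calendar.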


“Microservice architectures are exploding to meet scalability requirements”

Microservice Architecture
SECURITY CHALLENGES ARE SIMILAR TO SRE

SRE Challenges
● Latency & Performance Impact
● Cascading Failure Scenarios
● Service Discovery

Security Challenges
● Authentication
● Authorization
● Access Control Logic


Catalyzers: Security @ SRE

SRE SECURITY

Production Access & Change Control

Configuration as code, leveraging source code control paradigms, is a huge boon to security.

Roll back ruthlessly.

● Start with a known-good state
● Asset management and change control discipline
● Ensure visibility
● Validate consistently and constantly
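One way to picture "start with a known-good state" and "validate consistently and constantly" is a drift check that compares what is deployed against what source control says should be deployed. The sketch below is illustrative only: it assumes each host's configuration is a JSON-serializable dictionary, and the function names and the "missing", "changed", and "unmanaged" labels are hypothetical.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable config."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(known_good: dict, observed: dict) -> dict:
    """Map each drifted host to 'missing', 'changed', or 'unmanaged'."""
    drift = {}
    for host, expected in known_good.items():
        if host not in observed:
            drift[host] = "missing"
        elif fingerprint(observed[host]) != fingerprint(expected):
            drift[host] = "changed"
    # Hosts seen in production but absent from source control.
    for host in observed.keys() - known_good.keys():
        drift[host] = "unmanaged"
    return drift


def rollback_plan(known_good: dict, drift: dict) -> dict:
    """Configs to reapply so drifted, managed hosts return to known-good."""
    return {host: known_good[host] for host in drift if host in known_good}
```

Running `find_drift` on a schedule gives the visibility bullet something concrete to report, and `rollback_plan` returns only hosts that have a known-good definition. Unmanaged hosts are flagged for asset management and are not touched automatically.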

Overall Lessons for SRE

1. Remove single points of security failure like you do for availability
2. Assume that an attacker can be anywhere in your system or flow
3. Capture and measure meaningful security telemetry


Giveaways

Align with your catalyzers, and let them help you.

TPMs and Security can help you reduce your tech debt.

Measure your data, and isolate the issues.

Failing to plan is planning to fail.

Generalists will always need specialists, and vice versa. That’s how we grow together.