Failing Gracefully As A Feature Lorne Kligerman Director of Product, Gremlin @lklig


Failing Gracefully As A Feature

Lorne Kligerman, Director of Product, Gremlin

@lklig


Be down in 10!

T-Ho 2017

Hey team… bit of a spill but I’m fine.


We Expect Technology To Just Work™


Technical Issues Likely Cost Retailers Billions (12.01.16)

Macy’s, Lowe’s hit by Black Friday technical glitches (11.27.17)

Retail outages online leave shoppers frustrated on Black Friday (11.23.18)

People.com

Black Friday Failures


Wells Fargo accidentally foreclosed hundreds of homeowners (8.7.18)

Customers report difficulty accessing Chase Bank mobile and online (2.16.19)

Citibank Website down, not working (2.28.19)

Investopedia

Breaking Banks


Computer Problems Blamed For Flight Delays (4.1.19)

Major US Airlines hit by delays after glitch at vendor (4.1.19)

Pilots of doomed Boeing 737 MAX fought the plane’s software and lost (4.4.19)

Airline Incidents


Technology is fragile. When it breaks, we shouldn’t notice.


Plan ahead to keep your users happy

FAILURE

GRACEFUL DEGRADATION


Why Are Failures So Common?


Legacy Systems


Lack of Testing

Test layers shown on the slide: Failure, UI, End to end, Integration, Unit


With Scale Comes Complexity


What Can We Do About It?


Design For Failure


Designing For Failure

Key User Stories & Features

Edge Cases From Unexpected User Behaviour

Dependency Failures


Loading Screens Are Not Graceful


Inject Failure

By Breaking Things On Purpose


Inject failure one service at a time.

Maintain critical functionality.
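
The two statements above can be read as a test loop: switch off one dependency, check whether the critical path still works, restore it, and move on to the next. The following is a minimal in-process sketch of that loop, not a depiction of any real chaos tooling. The names `SERVICES`, `CRITICAL`, `FaultInjected`, `render_page`, and `run_experiments` are hypothetical, and the choice of "content" as the only critical dependency is an assumption made for illustration.

```python
from typing import Callable, Dict, Optional


class FaultInjected(Exception):
    """Raised by a dependency that the experiment has switched off."""


# Hypothetical dependencies; each returns the data it would normally serve.
SERVICES: Dict[str, Callable[[], str]] = {
    "content": lambda: "article body",
    "recommendations": lambda: "you may also like...",
    "comments": lambda: "3 comments",
}

CRITICAL = "content"  # assumption: only the article body is critical


def call(name: str, failing: Optional[str]) -> str:
    """Call a dependency unless it is the one under experiment."""
    if name == failing:
        raise FaultInjected(name)
    return SERVICES[name]()


def render_page(failing: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Render every section; non-critical failures become empty sections."""
    page: Dict[str, Optional[str]] = {}
    for name in SERVICES:
        try:
            page[name] = call(name, failing)
        except FaultInjected:
            if name == CRITICAL:
                raise  # nothing sensible to show without the critical part
            page[name] = None  # degrade: drop the section, keep the page
    return page


def run_experiments() -> Dict[str, bool]:
    """Fail one service at a time and record whether the page still rendered."""
    results: Dict[str, bool] = {}
    for name in SERVICES:
        try:
            render_page(failing=name)
            results[name] = True
        except FaultInjected:
            results[name] = False
    return results
```

Calling `run_experiments()` returns one pass/fail entry per service. A `False` entry marks a dependency whose failure takes the critical path down with it, which is where redundancy or a fallback would be worth planning first.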


Common Failure Modes

THAT DEGRADE THE USER EXPERIENCE

● Errors: HTTP 400, 401, 402, 500, 503

● Blackhole

● Latency
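
To make the three failure modes on this slide concrete, here is a small deterministic sketch of what a caller observes under each one. It uses simulated elapsed times rather than real network calls, and the names `Response`, `simulate`, and `classify`, along with the specific timings and the one-second timeout, are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: Optional[int]  # None means no response ever arrived
    elapsed_s: float       # simulated time until the response


def simulate(mode: str) -> Response:
    """Return what a caller would observe under each injected failure mode."""
    if mode == "error":
        return Response(status=503, elapsed_s=0.05)       # fast, but failed
    if mode == "latency":
        return Response(status=200, elapsed_s=2.0)        # slow, but successful
    if mode == "blackhole":
        return Response(status=None, elapsed_s=math.inf)  # traffic dropped
    return Response(status=200, elapsed_s=0.05)           # healthy baseline


def classify(resp: Response, timeout_s: float = 1.0) -> str:
    """Decide how the caller should treat the response."""
    if resp.status is None or resp.elapsed_s > timeout_s:
        return "timeout"
    if resp.status >= 500:
        return "server_error"
    if resp.status >= 400:
        return "client_error"
    return "ok"
```

In this model, latency and blackhole both surface as a timeout, which only happens because the caller sets one. A caller without a timeout would wait indefinitely on a blackholed dependency, which is one reason the mode is worth testing separately from plain errors.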


Degrade Gracefully


Graceful Degradation

● Provide the best possible experience

● All but the most critical functionality can fall off

● Don’t give up on your users; hold state as long as possible
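
One possible reading of "hold state as long as possible" is to keep the last successful response and serve it when the dependency fails. The sketch below illustrates that idea only; the class name `LastKnownGood` and its `get` method are hypothetical, and catching every `Exception` is a simplification that real code would narrow to specific network or timeout errors.

```python
from typing import Callable, Dict, Optional, Tuple


class LastKnownGood:
    """Serve fresh data when possible, otherwise the last successful value."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def get(self, key: str, fetch: Callable[[], str]) -> Tuple[Optional[str], str]:
        """Return (value, freshness), where freshness is fresh, stale, or unavailable."""
        try:
            value = fetch()
        except Exception:  # simplification: a real caller would catch narrower errors
            if key in self._cache:
                return self._cache[key], "stale"  # degrade to the held state
            return None, "unavailable"            # nothing held; caller hides the feature
        self._cache[key] = value                  # remember the latest good value
        return value, "fresh"
```

Returning the freshness label alongside the value lets the caller decide how to present degraded data, for example by showing a stale profile without comment but hiding a section that is entirely unavailable.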


When one dependency fails, users are often affected

Diagram labels: Storage, Auth, User Data, Content, Cache, Feature 1, Feature 2
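
A dependency map makes it possible to ask which features are affected when one dependency fails. The sketch below reuses the box names from this slide, but the edges between features and dependencies are assumed purely for illustration, since the extracted text does not show them. `FEATURE_DEPS` and `affected_features` are hypothetical names.

```python
from typing import Dict, List, Set

# Assumed wiring: the slide names these boxes, but the edges are invented here.
FEATURE_DEPS: Dict[str, Set[str]] = {
    "Feature 1": {"Auth", "User Data", "Cache"},
    "Feature 2": {"Auth", "Content", "Storage"},
}


def affected_features(failed: str) -> List[str]:
    """List the features that depend on the failed component, sorted by name."""
    return sorted(
        feature for feature, deps in FEATURE_DEPS.items() if failed in deps
    )
```

With this assumed wiring, `affected_features("Auth")` returns both features while `affected_features("Cache")` returns only `"Feature 1"`, which hints at the order in which failure experiments might be prioritised.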


Implemented As Designed


Added Latency


Blocked Video Link


Blocked jQuery Request


Delight Your Users


Graceful Degradation Done Right


Positive Business Impact

Product Launch

Delight users with new features

Success Metrics

Quantitative goals of the launch

Product Landing

Were the goals achieved? Why or why not? What got in the way?


Maintain release velocity

Deliver a positive user experience

Engineers spend less time in war rooms

Plan Experiments Early


RELIABILITY THROUGH CHAOS ENGINEERING

Design for Failure

Identify the most critical end user functionality.

Inject Failure

Impact your system to be sure your user experience isn’t impacted.

Degrade Gracefully

Plan for non critical functionality not to get in the way.

Delight Your Users

Your product metrics will show behaviour, no matter the condition.

Graceful Degradation As a Feature


USE LORNE FOR 20% OFF


gremlin.com/lorne


Q&A

Lorne Kligerman, Director of Product, Gremlin

@lklig