LinkedIn Traffic Shifting
Michael Kehoe, Senior Site Reliability Engineer, LinkedIn

SouthBay SRE Meetup Jan 2016



2

$ whoami
Michael Kehoe

• Sr Site Reliability Engineer (SRE)
• Member of PROD-SRE
• https://www.linkedin.com/in/michaelkkehoe

3

LinkedIn Multicolo History

4

What is a Traffic Shift?

• Edge (PoP) shift
• Datacenter load shift
• Single master failovers

5

Why do we do traffic shifts?

• To mitigate user impact from problems with a 3rd-party provider or LinkedIn’s infrastructure/services

• To validate Disaster Recovery (DR) in case of any datacenter failure

• To validate and test capacity headroom across our datacenters

• To expose bugs and suboptimal configurations by load testing one or more datacenters

• To perform planned maintenance

• To validate and exercise the traffic shift automation

6

Traffic shifting: How do we do it?

7

Edge Traffic shifts: How does it work?

• We use IPVS to load balance at our edges
• We can withdraw anycast routes to remove traffic from that PoP
• Health checks on our edge proxy are tested by DNS providers to verify whether that PoP is in rotation
• We can fail those health checks to remove unicast traffic from that PoP
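A minimal sketch of the two edge drain levers described above, assuming a single per-PoP `drained` flag drives both. The `PopState`, `announce_anycast`, and `health_check` names are hypothetical illustrations, not LinkedIn's tooling: the anycast announcement is modeled as a boolean that a BGP speaker would act on, and the health check as the HTTP status a DNS provider's checker would see.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class PopState:
    """Hypothetical per-PoP state; the single `drained` flag is an assumption."""
    name: str
    drained: bool = False


def announce_anycast(pop: PopState) -> bool:
    # Advertise the anycast prefix only while the PoP is in rotation.
    # Withdrawing it lets BGP steer anycast users to another PoP.
    return not pop.drained


def health_check(pop: PopState) -> tuple[HTTPStatus, str]:
    # Response a DNS provider's health checker would see. Failing it makes
    # the provider stop handing out this PoP's unicast address.
    if pop.drained:
        return HTTPStatus.SERVICE_UNAVAILABLE, "drained"
    return HTTPStatus.OK, "ok"


if __name__ == "__main__":
    pop = PopState("pop-example")
    print(announce_anycast(pop), health_check(pop))
    pop.drained = True  # shift traffic away from this PoP
    print(announce_anycast(pop), health_check(pop))
```

Driving both levers from one flag is a design choice in this sketch: anycast and DNS-steered unicast users are moved off the PoP by the same action, so neither population is left behind by a half-finished shift.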

8

Edge Traffic shifts

9

Datacenter Traffic shifts: How does it work?

• Different traffic types are partitioned and controlled separately
  • Logged-in vs. logged-out
  • CDN
  • Monitoring
  • Microsites

• Logged-in users are placed into ‘buckets’ and have primary/secondary datacenter assignments

• Buckets are marked online/offline to move site traffic
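The bucket mechanism above can be sketched as follows, under stated assumptions: member IDs are hashed into a fixed number of buckets (100 here), each bucket carries a primary and a secondary datacenter, and marking a bucket offline sends its members to the secondary. The hash function, bucket count, datacenter names, and helper names are all hypothetical, not LinkedIn's actual scheme.

```python
import hashlib
from dataclasses import dataclass

NUM_BUCKETS = 100  # hypothetical bucket count


@dataclass
class Bucket:
    primary: str
    secondary: str
    online: bool = True  # offline => serve this bucket from `secondary`


def bucket_for(member_id: int) -> int:
    # Stable member-to-bucket mapping; SHA-256 is an illustrative choice.
    digest = hashlib.sha256(str(member_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def route(member_id: int, buckets: list[Bucket]) -> str:
    # Serve from the primary datacenter unless the bucket is marked offline.
    b = buckets[bucket_for(member_id)]
    return b.primary if b.online else b.secondary


def shift_out(datacenter: str, buckets: list[Bucket]) -> None:
    # Mark offline every bucket whose primary is `datacenter`,
    # moving those members to their secondary datacenter.
    for b in buckets:
        if b.primary == datacenter:
            b.online = False


if __name__ == "__main__":
    dcs = ["dc-a", "dc-b", "dc-c"]  # hypothetical datacenter names
    buckets = [Bucket(dcs[i % 3], dcs[(i + 1) % 3]) for i in range(NUM_BUCKETS)]
    shift_out("dc-a", buckets)
    print(sum(route(m, buckets) == "dc-a" for m in range(1000)))
```

Because the member-to-bucket mapping is stable, a member stays pinned to one datacenter until its bucket's state changes, and a shift moves whole buckets at a time. Only logged-in traffic is modeled; per the slide, other traffic types are controlled separately.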

10

Mitigating Impact: What a traffic shift looks like

11

Load testing: How do we do it?

12

Load testing: How do we do it?
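The load-testing slides are diagrams only. One question a load test or DR exercise answers is whether the remaining datacenters can absorb the traffic of one that is shifted out. A back-of-the-envelope version of that check is sketched below, assuming drained traffic spreads across survivors in proportion to their capacity and an 80% utilization ceiling; the function names, numbers, and redistribution model are hypothetical.

```python
def utilization_after_failover(load_rps: dict[str, float],
                               capacity_rps: dict[str, float],
                               failed: str) -> dict[str, float]:
    # Assumption: the failed datacenter's load is spread across survivors
    # in proportion to each survivor's capacity.
    survivors = [dc for dc in load_rps if dc != failed]
    pool = sum(capacity_rps[dc] for dc in survivors)
    moved = load_rps[failed]
    return {
        dc: (load_rps[dc] + moved * capacity_rps[dc] / pool) / capacity_rps[dc]
        for dc in survivors
    }


def has_headroom(load_rps: dict[str, float],
                 capacity_rps: dict[str, float],
                 threshold: float = 0.8) -> bool:
    # True if losing any single datacenter keeps every survivor under threshold.
    # With no survivors at all, there is no headroom (default=inf fails the check).
    return all(
        max(utilization_after_failover(load_rps, capacity_rps, dc).values(),
            default=float("inf")) <= threshold
        for dc in load_rps
    )


if __name__ == "__main__":
    load = {"dc-a": 400.0, "dc-b": 400.0, "dc-c": 400.0}
    cap = {"dc-a": 1000.0, "dc-b": 1000.0, "dc-c": 1000.0}
    print(utilization_after_failover(load, cap, "dc-a"), has_headroom(load, cap))
```

With three equal datacenters at 40% utilization, losing one pushes each survivor to 60%; at 60% each, survivors would reach 90% and fail the 80% ceiling. A real shift moves traffic by bucket assignment, so the redistribution is lumpier than this model.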

13

Single Master Failover: How does it work?

• Only used in extreme cases
• Leverages distributed locking in Apache ZooKeeper
• Single master services have a Spring component that checks the mastership of the service in a particular datacenter
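A minimal sketch of mastership gating during a single master failover. It uses an in-memory `FakeLockService` as a stand-in for a ZooKeeper lock (this is not the ZooKeeper or Curator API), and a hypothetical `MastershipGuard` in place of the Spring component the slide mentions: a datacenter may serve writes only while it holds the lock, and a failover means the current holder releases it so another datacenter can acquire it.

```python
import threading
from typing import Optional


class FakeLockService:
    """In-memory stand-in for a ZooKeeper lock path (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._owner: Optional[str] = None

    def try_acquire(self, owner: str) -> bool:
        # First caller wins; returns True if `owner` now holds the lock.
        with self._mutex:
            if self._owner is None:
                self._owner = owner
            return self._owner == owner

    def release(self, owner: str) -> None:
        # Only the current holder can release the lock.
        with self._mutex:
            if self._owner == owner:
                self._owner = None

    def owner(self) -> Optional[str]:
        with self._mutex:
            return self._owner


class MastershipGuard:
    """Hypothetical per-datacenter check that gates master-only work."""

    def __init__(self, locks: FakeLockService, datacenter: str) -> None:
        self.locks = locks
        self.datacenter = datacenter

    def is_master(self) -> bool:
        return self.locks.owner() == self.datacenter

    def write(self, record: str) -> str:
        # Refuse writes unless this datacenter currently holds mastership.
        if not self.is_master():
            raise PermissionError(f"{self.datacenter} is not master")
        return f"{self.datacenter} wrote {record}"
```

In real ZooKeeper, lock holders typically create ephemeral znodes, so a crashed holder's lock disappears when its session expires; the explicit `release` call here only models the planned-failover path.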

14

Single Master Failover: How does it work?

15

Conclusion

• The best way to prepare for a disaster is to practice one regularly!

• Tooling and automation are your best friends during an outage
• Capacity planning/management is extremely important

16

Questions?
Thank You

©2014 LinkedIn Corporation. All Rights Reserved.