Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The Game!
@diego_pacheco Diego Pacheco @ ilegra.com Principal Software Architect
Cloud Native
- Stateless Services
- Ephemeral Instances
- Everything fails all the time
- Auto Scaling / Down Scaling
- Multi-Region
- No SPOF
- Design for Failure (expected)
SRE
Reliability is defined as “the probability of failure-free software operation for a specified period of time in a specified environment”
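Read as a formula, the definition above is a survival probability: R(t) = P(T > t), where T is the time to the first failure. A minimal sketch, assuming a hypothetical list of observed times-to-failure (the numbers below are made up for illustration, and the function name is not from the talk), estimates it empirically:

```python
def empirical_reliability(times_to_failure_hours, mission_hours):
    """Fraction of observed runs that ran longer than mission_hours without failing."""
    if not times_to_failure_hours:
        raise ValueError("need at least one observation")
    survived = sum(1 for t in times_to_failure_hours if t > mission_hours)
    return survived / len(times_to_failure_hours)


# Hypothetical observations: hours each run lasted before its first failure.
observed = [12.0, 30.0, 45.0, 72.0, 100.0]

# 4 of the 5 runs lasted longer than 24 hours.
print(empirical_reliability(observed, 24.0))  # 0.8
```

The "specified environment" part of the definition matters here: an estimate like this only holds for the environment the observations came from.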
- Design for Failure
- Test Chaos and Failures
- Automate as much as possible
- There are Tools that could help us
- Create Culture of Chaos/Failure Testing periodically
- Ops Tooling / Metrics
- Incident Training Chaos / Failures
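One way to picture automated failure testing is a wrapper that makes a dependency fail on purpose for a fraction of calls. The sketch below is only an illustration of the idea: `with_chaos`, `ChaosError`, and `lookup_price` are hypothetical names, not the API of any real chaos tool.

```python
import random


class ChaosError(RuntimeError):
    """Failure raised on purpose by the chaos wrapper."""


def with_chaos(func, failure_rate, rng=None):
    """Wrap func so that roughly failure_rate of the calls raise ChaosError."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # rng.random() is in [0, 1), so rate 0 never fails and rate 1 always fails.
        if rng.random() < failure_rate:
            raise ChaosError("injected failure")
        return func(*args, **kwargs)

    return wrapper


def lookup_price(item):
    """Hypothetical dependency standing in for a remote service."""
    return {"book": 10}.get(item, 0)


# Inject failures into about 30% of calls and count how many the caller sees.
flaky_lookup = with_chaos(lookup_price, failure_rate=0.3, rng=random.Random(42))
failures = 0
for _ in range(1000):
    try:
        flaky_lookup("book")
    except ChaosError:
        failures += 1
print(f"{failures} of 1000 calls failed on purpose")
```

Running the callers of such a wrapper on a schedule is one way to exercise failure handling periodically instead of waiting for a real incident.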
New Practices
- Exception Handling
- Isolate Failure: Avoid JEE-like Cascading
- Redundancy: No SPOF
- Auto-Scaling Clusters
- Fault Tolerance and Isolation
- Fallbacks and Degraded Experience
- Protect Customer from failures: Don’t throw Failures -> Failures VS Errors
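A fallback with a degraded experience might look like the sketch below. It is an illustration under assumptions: the recommendation service, `fetch_personalized`, and the default list are all hypothetical, and treating connection problems and timeouts as "failures" (while letting programming errors propagate) is one possible reading of the failures-versus-errors distinction.

```python
import logging

logger = logging.getLogger("recommendations")

# Static content served when the personalized service is down (hypothetical data).
DEFAULT_RECOMMENDATIONS = ["top-seller-1", "top-seller-2"]


def fetch_personalized(user_id):
    """Hypothetical remote call; here it always fails to simulate an outage."""
    raise ConnectionError("recommendation service unavailable")


def recommendations_for(user_id, fetch=fetch_personalized):
    """Return personalized items, or a degraded default if the dependency fails."""
    try:
        return {"items": fetch(user_id), "degraded": False}
    except (ConnectionError, TimeoutError) as exc:
        # Infrastructure failure: log it for ops, but do not throw it at the customer.
        logger.warning("falling back for user %s: %s", user_id, exc)
        return {"items": list(DEFAULT_RECOMMENDATIONS), "degraded": True}


# The customer still gets a response, flagged as degraded.
print(recommendations_for("u1"))
```

Only the two listed exception types trigger the fallback, so a bug such as a `TypeError` inside `fetch` would still surface instead of being silently masked.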
Design For Failure
https://github.com/Netflix/SimianArmy
Runtime Testing