Breaking Azure for Fun and Profit

Pavel Michailov, Identity Division

Service Challenges

Cloud Services - Resilience

▪ Not a solved problem

▪ Goals:
  ▪ 100% uptime
  ▪ No degradation
  ▪ Responsive

Cloud Services - Deployment

Cloud Services – Testing challenges

▪ Continuous evolution

▪ Multiple dependencies

▪ Global distribution

▪ Traffic fluctuation

Fault Injection System

▪ Inject faults in deployed service

▪ Verify correct service response

▪ Overcome limitations of traditional testing

Agenda

System Overview

Applications

System Architecture

[Diagram: Fault Management Service, Fault Agents on the Target Service VMs, Cloud Management Service]

Faults

▪ Resource pressure

▪ Network

▪ Processes

▪ Virtual machine

▪ Application specific

▪ Custom

Resource Pressure Faults

▪ CPU

▪ Memory

▪ Hard disk
  ▪ Capacity
  ▪ Read
  ▪ Write
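
A minimal Python sketch of the resource pressure idea, assuming a fault agent running inside the target VM. The function names (`burn_cpu`, `memory_pressure`, `disk_capacity_pressure`) are hypothetical: each one spins a core, commits memory, or fills disk space, and releases it when the fault ends. Disk read/write pressure is not shown.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from typing import Iterator, Optional

MIB = 1024 * 1024


def burn_cpu(duration_s: float) -> int:
    """Keep one core busy for roughly `duration_s` seconds; returns the loop count.

    A real agent would run one burner per core (e.g. separate processes)
    to reach a target utilization.
    """
    deadline = time.monotonic() + duration_s
    spins = 0
    while time.monotonic() < deadline:
        spins += 1
    return spins


@contextmanager
def memory_pressure(megabytes: int) -> Iterator[bytearray]:
    """Hold `megabytes` of committed memory for the duration of the block."""
    block = bytearray(megabytes * MIB)
    # Touch one byte per 4 KiB page so the OS actually backs the allocation.
    for offset in range(0, len(block), 4096):
        block[offset] = 1
    yield block  # freed once the block exits and the caller drops its reference


@contextmanager
def disk_capacity_pressure(megabytes: int, directory: Optional[str] = None) -> Iterator[str]:
    """Consume `megabytes` of disk space in a temp file that is deleted on exit."""
    fd, path = tempfile.mkstemp(prefix="fault-fill-", dir=directory)
    try:
        chunk = b"\0" * MIB
        with os.fdopen(fd, "wb") as f:
            for _ in range(megabytes):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the space is really allocated
        yield path
    finally:
        os.remove(path)
```

Context managers tie each fault to a scope, so the pressure is released even if the checks running inside the block raise.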

Network faults

▪ Types
  ▪ Disconnect
  ▪ Latency

▪ Filters
  ▪ Domain / IP / Subnet
  ▪ Port
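
The filters decide which traffic a network fault applies to. The following is a minimal sketch of that matching logic only. It assumes, though the slide does not say so, that filter dimensions combine with AND, values within a dimension combine with OR, and an empty dimension matches everything. `NetworkFault` and `decide` are hypothetical names, and actually dropping or delaying packets would happen at the OS or firewall level, which is not shown.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class NetworkFault:
    kind: str                              # "disconnect" or "latency"
    latency_ms: int = 0
    domains: FrozenSet[str] = frozenset()
    subnets: Tuple[str, ...] = ()          # CIDR strings; a bare IP means one host
    ports: FrozenSet[int] = frozenset()

    def matches(self, host: str, ip: str, port: int) -> bool:
        """True if a connection to (host, ip, port) is covered by this fault."""
        if self.ports and port not in self.ports:
            return False
        if self.domains and not any(host == d or host.endswith("." + d) for d in self.domains):
            return False
        if self.subnets:
            addr = ipaddress.ip_address(ip)
            if not any(addr in ipaddress.ip_network(s, strict=False) for s in self.subnets):
                return False
        return True


def decide(faults: Iterable[NetworkFault], host: str, ip: str, port: int) -> Tuple[str, int]:
    """Return ("drop", 0), ("delay", total_ms) or ("pass", 0) for one connection."""
    delay_ms = 0
    for fault in faults:
        if fault.matches(host, ip, port):
            if fault.kind == "disconnect":
                return ("drop", 0)             # a disconnect wins over any latency
            delay_ms += fault.latency_ms       # overlapping latency faults add up
    return ("delay", delay_ms) if delay_ms else ("pass", 0)
```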

Process faults

▪ Stop / Kill

▪ Restart

▪ Crash

▪ Hang
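
On a POSIX system, most of these process faults map onto signals. This is a minimal Python sketch, assuming the agent launched the target itself and therefore holds a `subprocess.Popen` handle. A Windows agent would need different APIs, the crash fault is not shown, and the helper names are hypothetical.

```python
import os
import signal
import subprocess
import sys
import time
from typing import List


def kill(proc: subprocess.Popen) -> int:
    """Kill fault: SIGKILL, no chance to clean up. Returns the exit status."""
    proc.kill()
    return proc.wait()


def stop(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Stop fault: SIGTERM, escalating to SIGKILL after `grace_s` seconds."""
    proc.terminate()
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        return kill(proc)


def restart(proc: subprocess.Popen, argv: List[str]) -> subprocess.Popen:
    """Restart fault: stop the process, then launch a fresh instance."""
    stop(proc)
    return subprocess.Popen(argv)


def hang(proc: subprocess.Popen, duration_s: float) -> None:
    """Hang fault: freeze the process with SIGSTOP, then resume it with SIGCONT."""
    os.kill(proc.pid, signal.SIGSTOP)
    try:
        time.sleep(duration_s)
    finally:
        os.kill(proc.pid, signal.SIGCONT)


if __name__ == "__main__":
    argv = [sys.executable, "-c", "import time; time.sleep(30)"]
    target = subprocess.Popen(argv)
    hang(target, 0.5)                # frozen for half a second, then resumed
    target = restart(target, argv)
    print("killed with status", kill(target))  # negative signal number on POSIX
```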

Virtual Machine / OS faults

▪ Stop

▪ Restart

▪ Re-image

▪ Machine Hang

▪ Change date

Application specific faults

▪ Hooks
  ▪ Instrument service code

▪ Intercept / Re-route calls
  ▪ No access to service code
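
With the hooks approach, the service ships with named injection points that do nothing until a fault is enabled for them. This minimal Python sketch assumes an in-process registry that the agent can toggle. `FaultHooks`, the `token_cache.lookup` point, and `lookup_token` are hypothetical. The intercept/re-route approach, for services whose code is not available, is not shown.

```python
import functools
import threading
from typing import Callable, Dict, Optional


class FaultHooks:
    """Registry of named injection points compiled into the service."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, Callable[[], None]] = {}

    def enable(self, point: str, action: Callable[[], None]) -> None:
        with self._lock:
            self._active[point] = action

    def disable(self, point: str) -> None:
        with self._lock:
            self._active.pop(point, None)

    def fire(self, point: str) -> None:
        """Called from service code; a no-op unless a fault is enabled for `point`."""
        with self._lock:
            action: Optional[Callable[[], None]] = self._active.get(point)
        if action is not None:
            action()  # run outside the lock so a slow fault does not block toggling

    def hook(self, point: str):
        """Decorator that fires `point` before the wrapped function runs."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                self.fire(point)
                return fn(*args, **kwargs)
            return wrapper
        return decorator


hooks = FaultHooks()


@hooks.hook("token_cache.lookup")
def lookup_token(user: str) -> str:
    return f"token-for-{user}"


def raise_timeout() -> None:
    raise TimeoutError("injected fault: token cache timeout")
```

Calling `hooks.enable("token_cache.lookup", raise_timeout)` makes every `lookup_token` call fail until the point is disabled again. A short sleep, which adds latency, fits the same slot.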

Custom Faults

▪ Support for custom code execution

▪ Partner teams contribute as needed

▪ Faults subject to security review
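
One way to ensure that only reviewed custom faults run is to approve the exact script content by hash. This minimal sketch assumes custom faults are Python snippets executed in a separate interpreter. `CustomFaultRunner` is a hypothetical name, and the sketch says nothing about how the actual review process works.

```python
import hashlib
import subprocess
import sys
from typing import Set


class CustomFaultRunner:
    """Runs partner-supplied fault scripts only if their exact text was approved."""

    def __init__(self) -> None:
        self._approved: Set[str] = set()

    @staticmethod
    def digest(code: str) -> str:
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def approve(self, code: str) -> str:
        """Record a script that passed security review; returns its SHA-256."""
        h = self.digest(code)
        self._approved.add(h)
        return h

    def run(self, code: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
        """Run an approved script in a separate interpreter with a timeout."""
        # Any edit changes the hash, so modified scripts are refused here.
        if self.digest(code) not in self._approved:
            raise PermissionError("custom fault has not passed security review")
        return subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
```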

Injection mechanism

▪ VM external

▪ VM internal, outside the service code: Agent

▪ VM internal, inside the service code: Hooks

External injection

▪ VM / Region Stop

▪ VM / Region Restart

▪ Re-image

[Diagram: Cloud Management Service, Target VMs]

VM internal injection - Agent

▪ Resource pressure

▪ Network

▪ Processes

▪ OS

▪ Detours

▪ …

[Diagram: Target Service VM with Target Application, Operating System, Virtual Machine, and Fault Agent]
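
Inside the VM, the agent's core job is to receive a fault request and dispatch it to the matching handler. This minimal Python sketch assumes a JSON message with `id`, `type`, and `params` fields, which is a hypothetical format. Authentication and the secure channel from the Fault Management Service are left out.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict

Handler = Callable[[dict], str]


@dataclass
class FaultCommand:
    fault_id: str
    fault_type: str
    params: dict = field(default_factory=dict)


class FaultAgent:
    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, fault_type: str) -> Callable[[Handler], Handler]:
        def decorator(fn: Handler) -> Handler:
            self._handlers[fault_type] = fn
            return fn
        return decorator

    def handle(self, raw: str) -> dict:
        """Parse one request, run the matching handler, and report the outcome."""
        msg = json.loads(raw)
        cmd = FaultCommand(msg["id"], msg["type"], msg.get("params", {}))
        handler = self._handlers.get(cmd.fault_type)
        if handler is None:
            return {"id": cmd.fault_id, "status": "rejected", "reason": "unknown fault type"}
        try:
            detail = handler(cmd.params)
        except Exception as exc:  # report failures instead of crashing the agent
            return {"id": cmd.fault_id, "status": "failed", "reason": repr(exc)}
        return {"id": cmd.fault_id, "status": "applied", "detail": detail}


agent = FaultAgent()


@agent.register("cpu")
def cpu_fault(params: dict) -> str:
    # Placeholder: a real handler would start CPU pressure here.
    return f"cpu pressure {params['percent']}% for {params['seconds']}s"
```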

VM internal injection - Hooks

▪ Application behavior

▪ Flexibility

▪ Service specific

[Diagram: Target Application]

Security and Safety

▪ Azure AD integration

▪ Granular access control

▪ Secure communication

▪ Kill-switch/automated removal
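
A kill switch and automated removal both depend on the agent tracking every fault it has applied, together with a way to undo it. This minimal sketch assumes each fault is registered with a time-to-live and a rollback callback. `ActiveFaults` is hypothetical, and a real agent would call `expire()` from a periodic watchdog.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class ActiveFaults:
    """Tracks applied faults so each one is rolled back on expiry or by the kill switch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._faults: Dict[str, Tuple[float, Callable[[], None]]] = {}

    def add(self, fault_id: str, ttl_s: float, remove: Callable[[], None]) -> None:
        with self._lock:
            self._faults[fault_id] = (time.monotonic() + ttl_s, remove)

    def active(self) -> List[str]:
        with self._lock:
            return sorted(self._faults)

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Roll back every fault whose time-to-live has elapsed."""
        now = time.monotonic() if now is None else now
        return self._remove_where(lambda expires_at: expires_at <= now)

    def kill_switch(self) -> List[str]:
        """Roll back everything immediately."""
        return self._remove_where(lambda _expires_at: True)

    def _remove_where(self, predicate: Callable[[float], bool]) -> List[str]:
        with self._lock:
            doomed = {k: v for k, v in self._faults.items() if predicate(v[0])}
            for k in doomed:
                del self._faults[k]
        errors = []
        for _, remove in doomed.values():  # outside the lock: rollbacks may be slow
            try:
                remove()
            except Exception as exc:  # keep rolling back the remaining faults
                errors.append(exc)
        if errors:
            raise RuntimeError(f"{len(errors)} fault(s) failed to roll back") from errors[0]
        return sorted(doomed)
```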

Applications

Resilience verification

Test new features

Training

Resilience Verification

Automated Regression Testing

▪ Scheduled periodic test runs

▪ Verify alert generation

▪ Verify telemetry and service behavior
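
A single automated run can be sketched as: inject, let the fault settle, run the checks, and always roll back. The `FaultScenario` structure and the toy checks below are hypothetical. In practice the checks would query the alerting system and telemetry, and a scheduler would invoke `run_scenario` periodically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FaultScenario:
    name: str
    inject: Callable[[], None]
    remove: Callable[[], None]
    checks: List[Callable[[], bool]] = field(default_factory=list)
    settle_s: float = 0.0  # time to let the fault take effect before checking


def run_scenario(s: FaultScenario) -> dict:
    s.inject()
    try:
        time.sleep(s.settle_s)
        results: Dict[str, bool] = {c.__name__: bool(c()) for c in s.checks}
    finally:
        s.remove()  # roll the fault back even if a check raises
    return {"scenario": s.name, "passed": all(results.values()), "checks": results}


# Toy example: a flag stands in for the real fault and the real service state.
state = {"dependency_down": False}


def alert_fired() -> bool:
    return state["dependency_down"]


def requests_served() -> bool:
    return True


scenario = FaultScenario(
    name="dependency-outage",
    inject=lambda: state.update(dependency_down=True),
    remove=lambda: state.update(dependency_down=False),
    checks=[alert_fired, requests_served],
)
```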

Scheduled Runs

Verify Alert Generation

▪ Integration with internal alerting system

▪ Configurable time window, expected field values

▪ Incident auto-mitigation/resolution
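
Verifying alert generation amounts to checking whether an alert with the expected field values appeared within the configured time window after injection. This minimal sketch assumes alerts are available as dictionaries with a `raised_at` timestamp. The field names are hypothetical and do not reflect the internal alerting system's schema.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, Mapping, Optional


def find_expected_alert(
    alerts: Iterable[Mapping[str, Any]],
    injected_at: datetime,
    window: timedelta,
    expected: Mapping[str, Any],
) -> Optional[Mapping[str, Any]]:
    """Return the earliest alert raised within `window` after `injected_at`
    whose fields include every key/value pair in `expected`, else None."""
    deadline = injected_at + window
    for alert in sorted(alerts, key=lambda a: a["raised_at"]):
        if not injected_at <= alert["raised_at"] <= deadline:
            continue  # outside the configured time window
        if all(alert.get(k) == v for k, v in expected.items()):
            return alert
    return None
```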

Verify Service Behavior

Security Verification

▪ Custom Faults
  ▪ Local User Creation
  ▪ Malware upload – EICAR test file

▪ Verify security alerting

New Feature Verification

▪ Fill gap in testing frameworks

▪ Manual injection of relevant faults

▪ Existing regression tests catch edge-cases

Challenges – Moving Parts

▪ Multiple unmocked components

▪ Complex scenarios difficult to verify reliably

▪ Time consuming

Challenges – Adoption

▪ Full benefit only when applied across stack

▪ Non-functional testing often deprioritized

▪ Multi-team coordination difficult

Recovery Games

Recovery Games - Planning

▪ Attacker prepares weekly fault

▪ Identify area of interest

▪ Develop and test fault

Recovery Games – During the Game

▪ Attacker injects fault, provides hints

▪ Defender assesses impact

▪ Defender provides mitigation plan

▪ Senior team members and managers observe

Recovery Games - Goals

▪ Familiarize with monitoring tools

▪ Recognize outage patterns

▪ Train on assessing the impact

▪ Root-cause / mitigation mindset

▪ Practice log analysis

Recovery Games – Issue Discovery