Continuous availability: from the shift paradigm to unmanned operation

Pietro Tiberi

17 January 2018 – TIPS Contact Group

Agenda

1 Introduction
2 Continuous Availability
3 Results
4 Conclusions and perspective

Introduction TIPS Non-functional requirements - Reliability / Availability

o Transactions lost (RPO = 0)
o Downtime (RTO = 15 minutes)
o Availability: 99.9%
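To put the 99.9% figure in perspective, the sketch below converts an availability target into a downtime budget and counts how many RTO-length outages fit into it. It assumes that 99.9% is the availability target and that it is measured over a 365-day year of 24-hour operation; the function name is illustrative and not part of TIPS.

```python
def downtime_budget_minutes(availability: float, period_hours: float = 365 * 24) -> float:
    """Return the allowed downtime, in minutes, for a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours * 60.0


budget = downtime_budget_minutes(0.999)  # assumed: 99.9% measured over one year
rto_minutes = 15                         # RTO from the slide

print(f"yearly downtime budget: {budget:.1f} min ({budget / 60:.2f} h)")
print(f"outages of {rto_minutes} min that fit in the budget: {int(budget // rto_minutes)}")
```

Under these assumptions the budget is about 8.76 hours per year, so each recovery that uses the full 15-minute RTO consumes a noticeable share of it.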

Introduction Datacenter Operations

Human based (on shifts)

Unmanned

CONTINUOUS OPERATION

Continuous Availability From high availability to continuous availability

o Redundancy
o Fault Tolerance
o Clustering
o Active-Active configuration
o Proactive monitoring
o Continuous delivery
o Automatic remediation
o Dynamic capacity management

Continuous Availability Proactive Monitoring

o Infrastructure monitoring
o Application monitoring
o Detect events before failures
o Trigger automatic actions
o Analyze the event
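The "detect events before failures" and "trigger automatic actions" points can be sketched as a simple rule: a warning level set below the failure point, and an action fired once the metric stays above it for a few consecutive samples. This is a minimal illustration only; the `Rule` class, the `scale_out` action, the metric name, and all numbers are hypothetical and do not describe the monitoring tool used in the project.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """Fire `action` once `metric` stays above `warn_level` for `sustain` samples."""
    metric: str
    warn_level: float   # chosen below the level at which the service would fail
    sustain: int        # consecutive breaching samples required before acting
    action: Callable[[str, float], None]
    _streak: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Feed one sample; return True if the action was triggered by it."""
        self._streak = self._streak + 1 if value > self.warn_level else 0
        if self._streak == self.sustain:  # fire once per sustained breach
            self.action(self.metric, value)
            return True
        return False


triggered: List[str] = []


def scale_out(metric: str, value: float) -> None:
    # Stand-in for a real remediation step; here it only records the event.
    triggered.append(f"scale_out because {metric}={value}")


rule = Rule(metric="cpu_percent", warn_level=80.0, sustain=3, action=scale_out)
samples = [55, 82, 85, 79, 81, 84, 90, 93]  # invented CPU readings
fired_at = [i for i, v in enumerate(samples) if rule.observe(v)]
print(fired_at, triggered)
```

Requiring a sustained breach is one possible design choice: it avoids reacting to a single spike, at the cost of a short detection delay.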

Continuous Availability IT Automation

Continuous Availability From Agile to DevOps

Continuous Availability DevOps - Everything as Code

Code

Virtual Infrastructure
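One way to read "Code → Virtual Infrastructure" is that the desired infrastructure is declared as data and a tool works out the steps needed to reach it. The sketch below shows that idea in a few lines, assuming a hypothetical inventory of node pools; the pool names and sizes are invented for illustration and `plan` is not the API of any real provisioning tool.

```python
from typing import Dict, List

# Desired state, as it might be kept under version control (hypothetical sizes).
desired: Dict[str, int] = {"message-router": 2, "message-processor": 4, "kafka-broker": 3}

# State currently reported by an imaginary virtualization platform.
actual: Dict[str, int] = {"message-router": 2, "message-processor": 3, "legacy-batch": 1}


def plan(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Return the actions that turn `actual` into `desired`, sorted by pool name."""
    actions: List[str] = []
    for name in sorted(set(desired) | set(actual)):
        want, have = desired.get(name, 0), actual.get(name, 0)
        if want > have:
            actions.append(f"create {want - have} x {name}")
        elif want < have:
            actions.append(f"destroy {have - want} x {name}")
    return actions


for step in plan(desired, actual):
    print(step)
```

Running the plan again after the actions have been applied yields an empty list, which is the property that makes a declarative description safe to re-apply without an operator checking the current state first.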

Continuous Availability Dynamic Capacity Management

o Consumption trend analysis
o Resource utilization rate optimization
o What-if scenarios
o Predict future requirements and trends
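Trend analysis and prediction can be illustrated with a least-squares line fitted to past utilization and extrapolated to a capacity limit. This is a deliberately simple sketch, assuming evenly spaced samples and a linear trend; the utilization figures and the 80% limit are invented, and a real capacity tool would likely use richer models.

```python
from typing import Optional, Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def periods_until(values: Sequence[float], limit: float) -> Optional[float]:
    """Periods after the last sample until the trend reaches `limit` (None if never)."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    return max(0.0, (limit - intercept) / slope - (len(values) - 1))


# Invented weekly storage utilization, in percent.
usage = [40.0, 42.0, 44.0, 46.0, 48.0]
slope, intercept = linear_trend(usage)
print(slope, intercept, periods_until(usage, limit=80.0))
```

A what-if scenario then amounts to changing the inputs, for example calling `periods_until` with a different limit or with a usage series scaled by an expected traffic increase.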

Test Plant Architecture

[Figure: test plant architecture diagram]

o Users: User A, User B
o Application Layer: Message Router (two instances), Message Processor
o Message Layer: Kafka Broker (arrows labeled write, read, store)
o Database Layer: Aerospike Database (arrows labeled put, get)
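The flow suggested by the diagram labels (router writes, processor reads, then gets and puts records) can be mimicked with in-memory stand-ins. The sketch below uses a `deque` in place of a Kafka topic and a `dict` in place of the Aerospike database; it does not call either product's client library, and the per-sender counter is a hypothetical placeholder for whatever the real Message Processor stores.

```python
from collections import deque
from typing import Deque, Dict, Tuple

topic: Deque[Tuple[str, str]] = deque()  # stand-in for a Kafka topic
store: Dict[str, int] = {}               # stand-in for the Aerospike database


def route(sender: str, payload: str) -> None:
    """Message Router: write the incoming message to the message layer."""
    topic.append((sender, payload))


def process_all() -> int:
    """Message Processor: read each message, then get and put a per-sender counter."""
    handled = 0
    while topic:
        sender, _payload = topic.popleft()  # read
        count = store.get(sender, 0)        # get
        store[sender] = count + 1           # put
        handled += 1
    return handled


route("User A", "msg-1")
route("User B", "msg-2")
route("User A", "msg-3")
handled = process_all()
print(handled, store)
```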

Results Test Architecture

Specific tests to verify the relevant domain functions.

Common simulation layer to reproduce the real operational environment.

executed on

Results Simulation – continuous delivery (1)

Normal traffic condition (500 msg/s), timeout = 10,000 ms

Kafka cluster rolling update

0 messages lost

0 timeouts expired

SIMUL.APP.01: message latency (1 s average)

Results Simulation – continuous delivery (2)

Heavy traffic condition (2,000 msg/s), timeout = 10,000 ms

Kafka cluster rolling update

0 messages lost

some timeouts expired

SIMUL.APP.02: message latency (1 s average)

07 November 2017 – CMG Impact 2017
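The continuous delivery simulations report three quantities: messages lost, timeouts expired, and message latency averaged over one second. The sketch below shows one way such figures could be derived from raw samples. The sample format and the data are invented, and it assumes that a message answered after the 10,000 ms timeout counts as a timeout but not as a loss; the actual simulation layer may define these differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

TIMEOUT_MS = 10_000  # timeout stated on the slides

# (send time in seconds, end-to-end latency in ms, or None if no reply ever arrived)
Sample = Tuple[float, Optional[float]]


def summarize(samples: List[Sample]) -> Tuple[int, int, Dict[int, float]]:
    """Return (messages lost, timeouts expired, average latency per 1 s bucket)."""
    lost = sum(1 for _, lat in samples if lat is None)
    timeouts = sum(1 for _, lat in samples if lat is not None and lat > TIMEOUT_MS)
    buckets: Dict[int, List[float]] = defaultdict(list)
    for sent, lat in samples:
        if lat is not None:
            buckets[int(sent)].append(lat)
    averages = {sec: sum(v) / len(v) for sec, v in sorted(buckets.items())}
    return lost, timeouts, averages


# Invented data: a broker restart during second 1 delays two messages.
samples = [(0.1, 40.0), (0.6, 50.0), (1.2, 9_000.0), (1.7, 12_000.0), (2.3, 44.0)]
lost, timeouts, averages = summarize(samples)
print(lost, timeouts, averages)
```

With this invented data no message is lost, one timeout expires, and the one-second average shows a spike during the restart.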

Results Simulation – proactive monitoring

Normal traffic condition (500 msg/s)

Average E2E processing time = 45 ms

High vCPU load added to Message Processor nodes.

T0-T1: below threshold

T2-T3: threshold exceeded
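Intervals such as T2-T3 can be extracted from a recorded series by scanning for runs of samples above the threshold. The sketch below assumes the threshold applies to the E2E processing time; the slide does not state which metric or value was used, so the 100 ms threshold and the sample data are invented.

```python
from typing import List, Sequence, Tuple


def breach_intervals(times: Sequence[float], values: Sequence[float],
                     threshold: float) -> List[Tuple[float, float]]:
    """Return (start, end) time pairs during which `values` stays above `threshold`."""
    intervals: List[Tuple[float, float]] = []
    start = None
    last = None
    for t, v in zip(times, values):
        if v > threshold:
            if start is None:
                start = t          # first sample of a new breach
            last = t               # most recent breaching sample
        elif start is not None:
            intervals.append((start, last))
            start = None
    if start is not None:          # series ended while still above the threshold
        intervals.append((start, last))
    return intervals


# Invented E2E processing times (ms), sampled once per second.
times = [0, 1, 2, 3, 4, 5, 6, 7]
e2e_ms = [45, 46, 44, 47, 120, 135, 128, 46]
print(breach_intervals(times, e2e_ms, threshold=100))
```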

Conclusions and perspective

Phased Approach

Bi-modal Data Center

Tool

Pietro Tiberi (pietro.tiberi@bancaditalia.it)

Thanks for your attention