
OPNFV Summit 2015

Doctor - Fault Management

Gerald Kunzmann, DOCOMO

Carlos Goncalves, NEC

Ryota Mibu, NEC


Doctor Overview

• Goal

– Build fault management and maintenance framework

• Approach

– Identify requirements
– Gap analysis
– Implementation work upstream (OpenStack)
– Integration and testing

• Status

– Initial requirement study, architecture design, gap analysis: Done
– Collaborative development: Ongoing (3 merged blueprints in OpenStack Liberty)
– Standardization sync: Ongoing (through NFV member efforts and joint meetings)


Doctor Members

• At project creation (Dec 2014)

– NTT DOCOMO, Sprint
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco

• Now (Oct 2015)

– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE

Membership has roughly doubled (2x) since project creation.

Assumption of VNF (NFV Application)

• Telco applications are typically deployed in active-standby or active-active fashion

[Figure: an active and a standby application instance, each running in its own VM on a separate physical machine]

• The application and its manager (VNFM) cannot detect hardware failures directly
• The application state is switched over when a failure occurs
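The active-standby assumption can be pictured with a minimal Python sketch. It assumes that the VNFM learns about a hardware fault only through a notification naming the affected VM, and promotes the standby when the active VM is hit. `ActiveStandbyPair`, `on_fault_notification` and the role strings are hypothetical names for illustration. They are not part of any OpenStack or Doctor API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """One replica of a telco application, running in its own VM."""
    vm_id: str
    role: str  # "active", "standby" or "failed"


class ActiveStandbyPair:
    """Hypothetical VNFM-side view of an active-standby application pair."""

    def __init__(self, active_vm: str, standby_vm: str) -> None:
        self.instances = {
            active_vm: Instance(active_vm, "active"),
            standby_vm: Instance(standby_vm, "standby"),
        }

    def active(self) -> Optional[str]:
        """Return the VM ID of the current active replica, if any."""
        return next((i.vm_id for i in self.instances.values() if i.role == "active"), None)

    def on_fault_notification(self, vm_id: str) -> None:
        """React to a fault notification that names an affected VM.

        The application never sees the hardware fault itself; it only learns
        which VM is affected and switches roles if that VM was active.
        """
        failed = self.instances.get(vm_id)
        if failed is None:
            return  # not one of this pair's VMs
        was_active = failed.role == "active"
        failed.role = "failed"
        if was_active:
            for inst in self.instances.values():
                if inst.role == "standby":
                    inst.role = "active"  # switch the standby over
                    break


if __name__ == "__main__":
    pair = ActiveStandbyPair("VM-1", "VM-7")
    pair.on_fault_notification("VM-1")
    print(pair.active())  # VM-7
```

A fault on the standby only marks it as failed. No switchover happens, because the active replica keeps serving.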

Use Case 1: Fault management

[Figure: consumers C1, C2 and C3 above the Virtualized Infrastructure Manager (VIM), e.g. OpenStack, which they reach through the OpenStack northbound interface. Below the VIM, the resource pool contains hardware servers S1, S2 and S3. Each server runs a hypervisor that hosts the VMs listed in the resource map, and a fault is marked in the resource pool.]

Resource map held by the VIM:
– Server – VM mapping: Server S1: VM-1, VM-2; Server S2: VM-7; Server S3: VM-4
– Ownership information: VM-1, VM-7: Consumer C1; VM-2: Consumer C2; VM-4: Consumer C3

1. Fault monitoring: hardware fault, hypervisor fault, host OS fault
2. Inform the consumer? If yes, find the owner of the affected VMs in the database
3. FaultNotification(VM ID, Fault ID) to the consumer over the northbound interface
4. The consumer switches to the standby (SBY) configuration
5. Instruction(VM ID) from the consumer to the VIM
6. Execute the instruction, e.g. migrate VM
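The lookup in steps 2 and 3 can be sketched in Python as below. The resource map values are the ones shown on the slide. The function name `fault_notifications` and the fault ID `F-001` are hypothetical. The sketch only builds the FaultNotification(VM ID, Fault ID) payloads and does not send them over a real northbound interface.

```python
from collections import defaultdict

# Resource map from the slide: server -> VMs running on it ...
SERVER_TO_VMS = {"S1": ["VM-1", "VM-2"], "S2": ["VM-7"], "S3": ["VM-4"]}
# ... and VM -> owning consumer.
VM_OWNER = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}


def fault_notifications(server: str, fault_id: str) -> dict:
    """Return FaultNotification(VM ID, Fault ID) payloads grouped by consumer.

    Step 2: find the VMs on the faulty server and their owners.
    Step 3: build one (VM ID, Fault ID) pair per affected VM.
    Nothing is sent; delivery over the northbound interface is out of scope.
    """
    by_consumer = defaultdict(list)
    for vm_id in SERVER_TO_VMS.get(server, []):
        by_consumer[VM_OWNER[vm_id]].append((vm_id, fault_id))
    return dict(by_consumer)


if __name__ == "__main__":
    print(fault_notifications("S1", "F-001"))
    # {'C1': [('VM-1', 'F-001')], 'C2': [('VM-2', 'F-001')]}
```

Grouping by consumer reflects that each owner is notified only about its own VMs. A fault on S1 therefore reaches C1 and C2 but not C3.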

Use Case 2: Maintenance

[Figure: the same consumers, VIM, resource map (servers S1 to S3, VMs VM-1, VM-2, VM-4 and VM-7 and their owners) and resource pool as in Use Case 1, with an Administrator attached to the VIM.]

1. Maintenance Request (Server S3) from the Administrator
2. Which VMs are affected? Find the consumer owning the VM(s) in the database
3. Maintenance Notification (VM ID) to the consumer over the northbound interface
4. The consumer switches to the standby (SBY) configuration
5. Instruction (VM ID) from the consumer to the VIM
6. Execute the instruction, e.g. migrate VM
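A minimal Python sketch of the maintenance sequence follows. It models each consumer as a callback that receives a maintenance notification for one VM, switches its application to standby, and returns an instruction such as `migrate`. `handle_maintenance` and the callback shape are hypothetical. A real VIM would deliver the notification asynchronously and call its own migration API in step 6.

```python
from typing import Callable, Dict, List

# Resource map values copied from the slide.
SERVER_TO_VMS = {"S1": ["VM-1", "VM-2"], "S2": ["VM-7"], "S3": ["VM-4"]}
VM_OWNER = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}

# Hypothetical consumer hook: gets a Maintenance Notification for one VM,
# switches its application to standby, and answers with an instruction.
Consumer = Callable[[str], str]


def handle_maintenance(server: str, consumers: Dict[str, Consumer]) -> List[str]:
    """Walk the maintenance use case for one server and return executed actions."""
    actions = []
    for vm_id in SERVER_TO_VMS.get(server, []):      # 2. which VMs are affected?
        owner = VM_OWNER[vm_id]                       #    find the owning consumer
        instruction = consumers[owner](vm_id)         # 3.-5. notify, get instruction
        actions.append(f"{instruction} {vm_id}")      # 6. execute, e.g. migrate VM
    return actions


if __name__ == "__main__":
    def consumer_c3(vm_id: str) -> str:
        print(f"C3: switching to standby for {vm_id}")  # step 4
        return "migrate"

    # 1. Maintenance Request (Server S3) from the Administrator
    print(handle_maintenance("S3", {"C3": consumer_c3}))
```

Unlike the fault case, the request comes from the Administrator before anything breaks. The consumer can therefore switch over in a controlled way before the VM is moved.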

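Fault Management Sequence

[Figure: layered view. The applications (App, App, App) and the VIM user and administrator are on top, the Virtualized Infrastructure Manager (VIM) = OpenStack is in the middle, and the virtualized infrastructure (virtual compute, virtual storage, virtual network, virtualization layer, hardware resources) is at the bottom. The sequence is split into Detection and Reaction, and the figure marks the Doctor scope.]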

Key Requirements for the VIM

• Immediate notification
• Consistent resource state awareness
• Extensible monitoring
• Fault correlation

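Doctor Architecture and Typical Scenario

[Figure: Doctor functional blocks between the Application's Manager and the virtualized infrastructure (resource pool): several Monitors, an Inspector, several Controllers and a Notifier, together with the Resource Map, the Failure Policy and the alarm configuration (Alarm Conf.).]

0. Set alarm: the Manager configures an alarm in the Notifier
1. Raw failure: a Monitor reports a failure in the virtualized infrastructure to the Inspector
2. Find affected: the Inspector identifies the affected virtual resources
3. Update state: the Inspector updates the resource state in the Controllers
4. Notify all: the Controllers notify the state change
5. Notify error: the Notifier notifies the Manager of the error
6-. Action (step 6 onwards): the Manager reacts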
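The numbered scenario can be traced end to end in a small in-process Python sketch. The component names (Notifier, Controller, Inspector, Monitor, Manager) come from the slide. All method names, the `host_down` failure type and the `error` state are hypothetical. In a deployment each block is a separate service (for example Nova as a Controller and Ceilometer as the Notifier), not a Python object.

```python
from dataclasses import dataclass, field
from typing import Callable

# Callback a Manager registers with the Notifier: receives (vm_id, state).
AlarmAction = Callable[[str, str], None]


@dataclass
class Notifier:
    """Holds the alarm configuration and tells Managers about errors."""
    alarms: dict = field(default_factory=dict)  # vm_id -> AlarmAction

    def set_alarm(self, vm_id: str, action: AlarmAction) -> None:
        # 0. Set alarm: the Manager asks to be told about errors on vm_id.
        self.alarms[vm_id] = action

    def on_state_change(self, vm_id: str, state: str) -> None:
        # 5. Notify error: only errors on VMs with an alarm are forwarded.
        if state == "error" and vm_id in self.alarms:
            self.alarms[vm_id](vm_id, state)


@dataclass
class Controller:
    """Owns the resource state and publishes every change."""
    notifier: Notifier
    vm_state: dict = field(default_factory=dict)  # vm_id -> state

    def update_state(self, vm_id: str, state: str) -> None:
        self.vm_state[vm_id] = state
        # 4. Notify all: every state change goes to the Notifier.
        self.notifier.on_state_change(vm_id, state)


@dataclass
class Inspector:
    """Maps raw failures to affected VMs using the resource map and policy."""
    controller: Controller
    resource_map: dict    # host -> list of VMs running on it
    failure_policy: dict  # raw failure type -> new VM state

    def on_raw_failure(self, host: str, failure: str) -> None:
        # 1. Raw failure: called by a Monitor.
        state = self.failure_policy.get(failure)
        if state is None:
            return  # the policy ignores this kind of raw failure
        for vm_id in self.resource_map.get(host, []):  # 2. Find affected
            self.controller.update_state(vm_id, state)  # 3. Update state


if __name__ == "__main__":
    notifier = Notifier()
    controller = Controller(notifier)
    inspector = Inspector(controller, {"S1": ["VM-1", "VM-2"]}, {"host_down": "error"})
    # The Manager is the callback; step 6 would be its reaction.
    notifier.set_alarm("VM-1", lambda vm, st: print(f"Manager: {vm} is {st}, switching to standby"))
    inspector.on_raw_failure("S1", "host_down")  # the Monitor's report
```

Both VMs on S1 change state, but only VM-1 has an alarm configured. The Manager therefore hears about VM-1 alone, which matches the split between "Notify all" and "Notify error".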

Doctor OSS Map

[Figure: the Doctor architecture and typical scenario, annotated with candidate open source components]

• Controller: Nova, Neutron, Cinder
• Notifier: Ceilometer
• Monitor and Inspector candidates: e.g. Zabbix, e.g. Monasca

Doctor OSS Development

[Figure: the OSS map (Controller: Nova, Neutron, Cinder; Notifier: Ceilometer; e.g. Zabbix, e.g. Monasca) with the two upstream development items highlighted]

• Event Alarm in Ceilometer
• State Correction in Nova

Doctor Blueprints in the Liberty Cycle

• Ceilometer
– Event Alarm Evaluator: spec drafter Ryota Mibu (NEC), developer Ryota Mibu (NEC); Completed (Liberty)
• Nova
– New nova API call to mark nova-compute down: spec drafter Tomi Juvonen (Nokia), developer Roman Dobosz (Intel); Completed (Liberty)
– Support forcing service down: spec drafter Tomi Juvonen (Nokia), developer Carlos Goncalves (NEC); Completed (Liberty)
– Get valid server state: spec drafter Tomi Juvonen (Nokia); Spec approved (Mitaka)
– Add notification for service status change: spec drafter Balazs Gibizer (Ericsson), developer Balazs Gibizer (Ericsson); Waiting for spec approval (Mitaka)

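Doctor BP Detail: Nova – Mark Nova-Compute Down

[Figure: a host/machine runs the hypervisor, its VMs, nova-compute, a vSwitch and a BMC. nova-compute connects to nova-api, nova-conductor, nova-scheduler, the nova DB and the message queue. An external monitoring service with a monitoring client sits beside the host.]

• EXISTING: nova-compute updates its service state periodically
• NEW: Force-down API, a new API to update the nova-compute service state, called through the monitoring client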
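As far as we know, the force-down API from this blueprint is exposed in the Nova compute API from microversion 2.11 as `PUT /os-services/force-down`, with `host`, `binary` and `forced_down` in the body. Later microversions moved service updates to a different URL, so check the Nova API reference for your release. The sketch below only builds the request with Python's standard library. The endpoint URL, token and host name are placeholders.

```python
import json
import urllib.request


def build_force_down_request(endpoint: str, token: str, host: str,
                             forced_down: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request that marks nova-compute on `host` down.

    Assumes the Liberty-era microversion 2.11 form of the call; newer
    microversions use a different URL for service updates.
    """
    body = {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    return urllib.request.Request(
        url=f"{endpoint}/os-services/force-down",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,                   # Keystone token (placeholder)
            "X-OpenStack-Nova-API-Version": "2.11",  # opt in to the microversion
        },
        method="PUT",
    )


if __name__ == "__main__":
    req = build_force_down_request("http://controller:8774/v2.1", "TOKEN", "compute-1")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; not done here.
```

The external monitoring service would issue this call as soon as it detects a host failure. Nova then treats the compute service as down immediately, instead of waiting for the periodic state updates to time out.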

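Doctor BP Detail: Ceilometer - Event Alarm

[Figure: Nova, Neutron and Cinder emit notifications that Ceilometer processes into samples, stats and events. Alarms go to the Manager, and an Audit Service is also shown.]

• EXISTING (polling-based): alarms are evaluated periodically against sample statistics
• NEW shortcut (notification-based): a notification-driven alarm evaluator works directly on incoming events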
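The difference between the existing polling-based path and the new notification-based shortcut can be sketched as an evaluator that checks alarms the moment each event arrives. This is a simplified Python model, not Ceilometer's code. `EventAlarm`, `EventAlarmEvaluator` and the trait-equality query are assumptions, and `compute.instance.update` with a `state` trait serves only as an example event.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class EventAlarm:
    """Hypothetical event alarm: fires when a matching event is received."""
    event_type: str
    query: Dict[str, Any]            # trait name -> required value
    action: Callable[[dict], None]   # e.g. notify the Manager

    def matches(self, event: dict) -> bool:
        traits = event.get("traits", {})
        return event.get("event_type") == self.event_type and all(
            traits.get(name) == value for name, value in self.query.items()
        )


class EventAlarmEvaluator:
    """Evaluates alarms per incoming notification instead of on a polling timer."""

    def __init__(self) -> None:
        self.alarms: List[EventAlarm] = []

    def add(self, alarm: EventAlarm) -> None:
        self.alarms.append(alarm)

    def on_event(self, event: dict) -> int:
        """Called once per notification; returns how many alarms fired."""
        fired = 0
        for alarm in self.alarms:
            if alarm.matches(event):
                alarm.action(event)  # no wait for the next polling interval
                fired += 1
        return fired


if __name__ == "__main__":
    evaluator = EventAlarmEvaluator()
    evaluator.add(EventAlarm("compute.instance.update", {"state": "error"},
                             lambda e: print("alarm:", e["traits"]["instance_id"])))
    evaluator.on_event({"event_type": "compute.instance.update",
                        "traits": {"state": "error", "instance_id": "VM-1"}})
```

In the polling model, latency is bounded below by the evaluation interval. Here the alarm action runs in the same call that delivers the event, which is what makes the shortcut suitable for immediate notification.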

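Doctor Southbound API

[Figure: the User and Admin sit above the Doctor blocks (Controller, Inspector, Notifier), which take configuration (Conf.) and policy inputs. Several Monitors sit on the NFVI below.]

• Configuration of each Monitor: threshold, enable
• Fault messaging: Monitors report faults to the Inspector through a unified event API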
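The two southbound concerns, monitor configuration and fault messaging, can be illustrated with a hypothetical Python sketch. The Monitor honours an enable flag and a threshold. It emits faults in one shared ("unified") JSON shape that the Inspector could parse regardless of which Monitor sent it. The field names, the default threshold of 90.0 and the `check` function are all assumptions, not a defined Doctor API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MonitorConfig:
    """Configuration the admin pushes to a Monitor (hypothetical defaults)."""
    enabled: bool = True
    threshold: float = 90.0


@dataclass
class FaultEvent:
    """Hypothetical unified event format shared by all Monitors."""
    monitor: str
    hostname: str
    fault_type: str
    value: float


def check(monitor: str, hostname: str, fault_type: str, value: float,
          config: MonitorConfig) -> Optional[str]:
    """Return a JSON fault message for the Inspector, or None if nothing to report."""
    if not config.enabled or value <= config.threshold:
        return None  # monitor disabled, or measurement within threshold
    return json.dumps(asdict(FaultEvent(monitor, hostname, fault_type, value)))


if __name__ == "__main__":
    cfg = MonitorConfig(threshold=80.0)
    print(check("monitor-1", "S1", "cpu_temperature", 95.0, cfg))
```

Keeping one event shape for all Monitors is what lets new monitoring tools be added without changing the Inspector. This is the extensible monitoring requirement seen from the southbound side.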

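Doctor Status

[Figure: progress of each functional block (Notifier, Monitor, Controller, Inspector) and candidate project (Ceilometer, Zabbix, Nova, Monasca?, DPDK, Neutron, Cinder) through the phases To-Be Arch., Design, Gap Analysis, Blueprint, Coding, Integration and OPNFV Release. The chart marks what is done and the next steps, on a timeline marked Dec 2014, Mar 2015, Sep 2015 and Feb 2016.]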

Don’t miss out...

• “Doctor – Fault Management”: Project Theater, Wednesday, 3:55 pm – 4:15 pm

• “Doctor: Failure Detection and Notification for NFV”: DOCOMO booth, PoC Demo Zone
