Peter van der Veen QNX Software Systems

Designing High-Performance Network Elements Using Multiprocessing Technology and Adaptive Partitioning


Typical Hardware Architecture

[Figure: a chassis attached to multiple networks; line cards and a control card are linked by a high-speed interconnect and a low-speed bus]

Typical Netcom System Software Constraints

[Figure: software components: Kernel (RTOS), Applications, TCP/IP Stack, Filesystem, SS7 Stack, Device Drivers]

Many millions of lines of code

Tens to hundreds of S/W components

Hundreds to thousands of processes and threads

Strict availability requirements

Software Architecture

[Figure: Threads A to E inside the Route Manager, File System and Ethernet Driver, ordered by priority and scheduled by the QNX Neutrino realtime scheduler (OS)]

Symmetric Multiprocessing

[Figure: four CPUs, each with its own cache, sharing memory over a high-bandwidth CPU bus]

Multiple processors sharing common hardware
► Common memory bus and address space
► Access to all peripheral devices and interrupts

OS manages tasks running on processors – true concurrency

Transparent to application programs
► No incremental hardware
► No application software changes needed

SMP Memory Organization

[Figure: e600 core0 and e600 core1, each with an MMU, mapping the OS, Apps "A", Apps "B" and shared memory into physical memory]

The OS kernel resides at physical memory address 0, addressable by both cores

The MMU relocates applications and shared memory appropriately


Making the Most of SMP

Concurrency … divide and conquer
► Write software components using threads
► Remove serializations from dataflow

Caches … keep them hot
► Minimize writes to globally shared data
► Process data on the same processor where possible

Scheduling … get your ducks in a row
► Take advantage of the OS scheduler
► Use diagnostic tools to adjust runmasks and priorities
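The advice to minimize writes to globally shared data can be illustrated with a small sketch. It is written in Python only to show the structure; the function name `count_ports_sharded` and the packet layout are hypothetical, and Python threads do not demonstrate real SMP speedup. Each worker accumulates into a private counter, and shared state is written once, in a final merge.

```python
import threading
from collections import Counter

def count_ports_sharded(packets, workers=4):
    """Count packets per destination port using per-thread private counters."""
    # Each worker owns one private Counter, so the hot loop never
    # writes to data that another worker also writes.
    partials = [Counter() for _ in range(workers)]

    def work(idx):
        local = partials[idx]
        # Static slice: worker idx handles packets idx, idx+workers, ...
        for pkt in packets[idx::workers]:
            local[pkt["dst_port"]] += 1

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Single merge step after the parallel phase.
    total = Counter()
    for c in partials:
        total.update(c)
    return total
```

On a real SMP target the same shape (private per-thread state, one merge) keeps each processor's cache lines from bouncing between cores.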

SMP Optimizing Tools

System Profiler
► Provide a timeline view of activity in the system
► Identify resource contention and serialization
► Analyze SMP scheduling thrashing
► Visualize distributed message passing

CPU Performance Counters
► Count operations such as cache misses
► Statistically sample based on significant events

Adaptive Partitioning

Introducing Adaptive Partitioning

What is Adaptive Partitioning?
► Adaptive partitioning is a new QNX product that extends the Neutrino RTOS
► Allows you to build secure compartments or “partitions” around a set of applications or threads
► Partitions enforce CPU guarantees for applications, controlled by easy-to-use budgets

Why is it Adaptive?
► Patent-pending design ensures all available CPU cycles are given to partitions that need processing time – no CPU cycles wasted
► Provides performance advantage by permitting full processor utilization to accommodate spikes in demand

Easy to get started
► No changes to how designers work today
► POSIX programming model for the same, familiar design, programming & debugging techniques
► No code changes are required to implement partitions

Understanding “Adaptive”

[Figure: stacked bar chart of processing load (0% to 100%) for four scenarios (System Restart, Steady State, Topology Change, Reconfiguration), divided among Routing & Forwarding, Management Interfaces (CLI, SNMP), Maintenance and Idle Time]

Defining Partitions

Given the processing scenarios, choose a partitioning approach and appropriate partition budgets

[Figure: partitions on the QNX Neutrino microkernel, including Management Interface and Maintenance; budgets shown are 5%, 75% and 20%]

Understanding “Adaptive” Partitioning

[Figure: two bar charts of CPU use (0% to 100%) across the Restart, Steady State, Topology Change and Reconfigure scenarios, one for static and one for adaptive partitioning, with partition budgets of 5%, 75% and 20% on the QNX Neutrino microkernel]

Static: CPU time wasted when partitions do not consume their budget. Applications cannot benefit from available time.

Adaptive: Budgets enforced when CPU is loaded

Adaptive: Applications can use free CPU time if available from other partitions
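The difference between static and adaptive budgets can be made concrete with a toy calculation. The sketch below is a simplification with hypothetical names (`allocate`, partitions "A", "B", "C"): it works on whole-window percentages and splits free time in proportion to unmet demand, whereas the scheduler described in these slides hands out free time by thread priority.

```python
def allocate(budgets, demands, adaptive=True):
    """Return the CPU share (percent) each partition receives in one window.

    budgets: dict name -> guaranteed percent (assumed to sum to 100)
    demands: dict name -> percent the partition would use if unconstrained
    """
    # Every partition first gets what it asks for, up to its budget.
    grant = {p: min(budgets[p], demands[p]) for p in budgets}
    if not adaptive:
        return grant  # static: unused budget is simply lost

    free = 100 - sum(grant.values())
    unmet = {p: demands[p] - grant[p] for p in budgets}
    total_unmet = sum(unmet.values())
    if total_unmet > 0:
        # Assumption for illustration: share free time in proportion
        # to unmet demand.
        give = min(free, total_unmet)
        for p in budgets:
            grant[p] += give * unmet[p] / total_unmet
    return grant
```

With budgets of 5%, 75% and 20% and a load where "B" wants 95% while "A" is idle and "C" uses 5%, the static result caps "B" at 75% and leaves 20% of the CPU unused; the adaptive result lets "B" reach 95%. When every partition demands 100%, both modes return exactly the budgets.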

Uses for Adaptive Partitioning


Security Threats

Embedded systems are becoming network connected
► Untrusted interfaces and network threats
► Untrusted add-on software

If appropriate measures are not included by design, your product’s security and availability can be compromised

Rogue software can launch a denial of service (DoS) attack and starve core applications of CPU time
► Need to ensure untrusted, add-on software can be contained to guard against attacks

Distributed DoS attacks can busy your system with network processing

[Figure: File System, Networking, Core Applications, Add-Ons and Device Drivers on the QNX Neutrino microkernel; the networking stack is hogging CPU time and a rogue add-on is stealing CPU time]

Partitioning to Contain Threats

Create OS-enforced partitions to ensure critical system resources are protected
► Ensure CPU available for core functions
► Partition inheritance ensures applications get CPU time for OS services (such as drivers, file systems, networking)

Contain threats and protect core applications
► Limit impact of rogue applications

[Figure: File System, Networking, Core Applications, Add-Ons and Device Drivers on the QNX Neutrino microkernel, now partitioned; networking is still consuming CPU time but the rogue add-on is thwarted]

How Adaptive Partitioning Works

Partition Accounting

What does “30% CPU Budget” mean?
► CPU usage is calculated over a sliding window.
► Partition budget: guaranteed percentage of CPU time, balanced over the sliding window
► Partition usage: CPU time executed during the last sliding window, expressed as a percentage

Accuracy
► Counting ticks is not enough. “Micro-billing” is used to track actual CPU utilization even when threads don’t use their whole timeslice
► Microsecond and nanosecond resolution
► Threads are billed based on real usage, not statistics

“windowsize” is configurable as an argument to kernel at boot
► Trade off maximum READY-state latency against accuracy of CPU budgeting
► 100ms window -> 1% accuracy or better.
► Internal arithmetic accurate to 0.5% or better

[Figure: sliding window from T = -100ms to T = now; partitions on the QNX Neutrino microkernel labeled User Interface, Diagnostics, Route Calculation and Data Acquisition, with budgets of 30%, 40% and 30%]
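The sliding-window accounting described above can be sketched as follows. This is an illustrative model, not the kernel's implementation: the class name `PartitionAccount`, its methods and the microsecond units are assumptions, and runs are assumed to be billed in chronological order.

```python
from collections import deque

class PartitionAccount:
    """Tracks CPU time billed to one partition over a sliding window."""

    def __init__(self, budget_pct, window_us=100_000):
        self.budget_pct = budget_pct
        self.window_us = window_us
        self.runs = deque()  # (start_us, duration_us), oldest first

    def bill(self, start_us, duration_us):
        # Micro-billing: record the time actually used, not whole ticks.
        self.runs.append((start_us, duration_us))

    def usage_pct(self, now_us):
        window_start = now_us - self.window_us
        # Drop runs that ended before the window began.
        while self.runs and self.runs[0][0] + self.runs[0][1] <= window_start:
            self.runs.popleft()
        used = 0
        for start, dur in self.runs:
            # Count only the part of each run that overlaps the window.
            lo = max(start, window_start)
            hi = min(start + dur, now_us)
            used += max(0, hi - lo)
        return 100.0 * used / self.window_us

    def has_budget(self, now_us):
        return self.usage_pct(now_us) < self.budget_pct
```

For a 30% budget and a 100 ms window, a partition that ran for 20 ms and then 15 ms is over budget at the end of the window; 10 ms later, part of the first run has slid out of the window and the partition has budget again.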

Behavior During Normal Load

Hard real-time, priority-based scheduler under normal load

Running thread selected as highest priority READY thread

No delay on scheduling if adaptive partition has budget

[Figure: Blocked, Ready and Running threads labeled with their priorities, in two partitions that both have CPU budget available]

Behavior During Overload

Partition budgets are enforced when the CPU is fully loaded

Highest priority READY thread in partition with budget runs

No delay on scheduling if partition has budget

[Figure: Blocked and Ready threads labeled with their priorities; one partition has CPU budget available and the other has exceeded its budget. A thread in the partition with budget runs before a higher-priority thread that is Ready but has no budget.]

Behavior with Free CPU Time

If no partitions with remaining budget have READY threads, highest priority READY thread is selected to run from other partitions

This allows “free” time to be given based upon priority
► “Free” time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)

[Figure: Blocked, Ready and Running threads labeled with their priorities in three partitions; two have exceeded their CPU budget and one has CPU budget available]
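The selection rule across the three load cases (normal load, overload, free CPU time) can be summarized in a few lines. This is a minimal sketch of the rule as stated on the slides; the function `pick_thread` and the tuple layout are hypothetical, and critical threads are not modeled here.

```python
def pick_thread(ready, has_budget):
    """Pick the next thread to run.

    ready: list of (priority, partition, name) tuples for READY threads
    has_budget: dict partition -> True if the partition has budget left
    """
    if not ready:
        return None
    # Prefer threads whose partition still has budget in the window.
    funded = [t for t in ready if has_budget[t[1]]]
    # If no funded partition has a READY thread, the CPU time is "free":
    # fall back to plain priority order across all partitions.
    candidates = funded or ready
    return max(candidates, key=lambda t: t[0])
```

When every partition has budget this reduces to ordinary priority scheduling; under overload a priority-7 thread in a funded partition is chosen ahead of a priority-10 thread whose partition has exceeded its budget.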

Partition Inheritance

When a server process does work requested by a client, the time is “billed” to the client

Prevents runaway client processes from monopolizing system services such as device drivers and server processes

Ensures fair CPU scheduling

Allows you to create servers and assign server budgets independent of number of clients

Builds on Neutrino micro-kernel and client-server, message passing architecture

[Figure: Application threads and File System threads on the QNX Neutrino microkernel. Inheritance: File System operation uses application’s budget]
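Partition inheritance amounts to a billing rule: time a server spends on a client's request is charged to the client's partition. The sketch below models only that rule; `Ledger`, `FileSystemServer` and the partition names are hypothetical, and the `housekeeping` method (work the server does on its own behalf, billed to its own partition) is an assumption added for contrast.

```python
class Ledger:
    """Accumulates CPU time (microseconds) billed to each partition."""

    def __init__(self):
        self.billed = {}

    def charge(self, partition, usec):
        self.billed[partition] = self.billed.get(partition, 0) + usec


class FileSystemServer:
    """Toy server that bills request handling to the requesting client."""

    def __init__(self, ledger, own_partition="filesystem"):
        self.ledger = ledger
        self.own_partition = own_partition

    def handle(self, client_partition, work_usec):
        # Work done on behalf of a client runs against the client's budget.
        self.ledger.charge(client_partition, work_usec)

    def housekeeping(self, work_usec):
        # Work not tied to any client is billed to the server's own partition.
        self.ledger.charge(self.own_partition, work_usec)
```

Under this rule a client that floods the server with requests exhausts its own budget rather than the server's, which is why a server budget can be chosen independently of the number of clients.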

Borrowed Time: Critical Threads

Critical threads still run (based on priority) even if partition has no budget

Critical threads provide deterministic scheduling even in overload

Critical threads are given critical budget and can go into short-term debt
► Critical time is accounted and has to be repaid
► Exceeding critical budget is considered an error and causes notification/action

[Figure: Blocked, Ready and Running threads labeled with their priorities; a critical thread at priority 11 runs in a partition whose CPU budget is exceeded, while the other partition has CPU budget available]
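The critical-thread rules can be sketched as a small accounting object. The names `Partition`, `can_run`, `bill_critical` and the `bankrupt` flag are illustrative assumptions; repayment of critical time and the specific notification or action taken on an overrun are not modeled.

```python
class Partition:
    """Toy model of a partition's critical budget."""

    def __init__(self, critical_budget_us):
        self.critical_budget_us = critical_budget_us
        self.critical_used_us = 0  # critical time run while out of budget
        self.bankrupt = False      # set when the critical budget is exceeded

    def can_run(self, is_critical, has_budget):
        # Ordinary threads need normal budget; critical threads may also
        # run on borrowed time when the partition's budget is exhausted.
        return has_budget or is_critical

    def bill_critical(self, duration_us):
        # Account for time a critical thread ran while out of budget.
        self.critical_used_us += duration_us
        if self.critical_used_us > self.critical_budget_us:
            # Treated as an error condition: a real system would notify
            # or take a configured action here.
            self.bankrupt = True
```

A partition with a 2 ms critical budget can absorb a 1.5 ms critical burst while out of budget, but a further 1 ms burst pushes it past the limit and raises the error flag.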

Adaptive Partition APIs and Utilities

Control of Adaptive Partitioning Scheduler is done through a kernel API
► API is restricted to privileged processes (root)
► Must be called from within default (system) partition
► Partitions are created with budget (normal and possibly critical)

“aps” system utility provided
► “aps” utility part of adaptive partitioning package
► Can be used to create and modify partitions
► Also provides usage stats over time
► Use “on” to launch processes into partitions

Boot script syntax extended
► Define partitions within the build file
► Launch processes into specific partitions

Partition configuration completely dynamic
► Can create partitions, modify budgets at runtime
► Averaging window can also be changed at runtime

Getting Started with Adaptive Partitioning

[Figure: four steps (Step 1 to Step 4): Install Adaptive Partitioning, Build Image, Define Partitions and Budgets, Launch Applications in Partitions. Labels: CODE CHANGES, POSIX PROGRAMMING ALLOWED]

Summary

SMP is a key enabler for enhancing scalability

SMP delivers measurable performance gains in real-world applications

QNX provides transparent support for SMP systems

Adaptive partitioning can be used to increase your system’s security and availability

Adaptive partitioning is easy to apply to existing designs and implementations

Adaptive partitioning helps you integrate complex systems to improve time to market
