SSM White Paper NOV-2010

Optimizing System Management in the Platform SoC Era

Howard Pakosh, ChipStart and Phil Casini, AdvanceTech Marketing

November 2010

Introduction

Consumer focused SoCs have evolved

into platform architectures that are now

being driven by requirements from

operating systems such as Android,

iPhone, Linux, and Windows and the

thousands of applications they support.

Over time, more of the system is moving

into silicon. As a result, system

management functions have moved into

the SoC. Traditional feature based

regression testing at the silicon level

must now be increasingly complemented

with complex system level testing in

order to maintain a high level of system

coverage across SoC road maps.

Balancing price-performance-power and

high system level test coverage therefore

creates complex system management

design challenges that affect both

hardware and software operation.

System management must now be

considered as a central feature and

responsibility of the SoC architecture,

not just as a tactical design consideration

for the development of each individual

SoC. System management should

provide adequate synchronization of

hardware state changes driven by

software, maintain reasonable time to

market and maximize system test

coverage and support.

The remainder of this paper will discuss

design considerations and compare and

contrast three system management

architectures. The first is ad hoc

system management, which comprises

combinations of hardware and

software elements that serve a dual

purpose: normal operation

and system management. The

second includes system management

as part of the on-chip interconnect

implementation. The third architecture

introduces a control plane approach for

system management which complements

the data centric global interconnect.

Finally, the paper will discuss the

growing importance of integrated

subsystem design and IP for SoCs and

how system level partitioning will play a

growing role in achieving efficient

system management.

System management design considerations

One of the key challenges associated

with designing SoC system management

schemes stems from the growing number

of programmable devices on-chip.

Programmable devices exponentially

increase the number of combinations of

software operations that drive hardware

state changes in real time. This in turn

complicates system level testing in order

to achieve reasonable test coverage.

Optimizing the SoC design for a single

operating system provides little relief,

because the diversity of applications

running on the SoC continues to

multiply the testing complexities at the

system level.
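The growth described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each programmable core contributes an independent set of hardware states, so the joint state space is at most the product of the per-core counts. The function name and the figure of 8 states per core are hypothetical and do not come from any measured design.

```python
from math import prod


def combined_state_count(states_per_core):
    """Upper bound on joint hardware states, assuming the cores are independent."""
    return prod(states_per_core)


# Hypothetical example: every programmable core exposes 8 distinct states.
for cores in (1, 2, 4, 8):
    print(cores, "cores:", combined_state_count([8] * cores), "joint states")
```

Under these assumptions, each added core multiplies the number of states a system level regression suite would need to reach, which is why testing the cores in isolation says little about system level coverage.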

System level testing via traditional

silicon level functional and data path

regressions must now be augmented by

system functional test suites that include the

programmable elements and their impact

on hardware state changes. Each

programmable core can be isolated and

tested to achieve a high level of code

coverage, and each execution path

through the different core combinations

can be tested, but the combinations of

hardware state changes they require as a

result of application behavior make it

almost impossible to achieve adequate

system level coverage solely from

testing the cores and the buses in

isolation or even in pseudo-random

combinations.

It is at this point that compromises are

often made in the SoC design. How

much risk is affordable when trading off

the cost and time to build these complex

system level regression suites with the

actual test coverage achieved? As

volumes grow, the answer is that risk must be

mitigated, and it therefore becomes

essential to minimize these tradeoffs.

This paper challenges the increasing

“tax” on the project costs to balance

adequate system level test coverage, and

risk, based on current system

management architecture assumptions.

Specifically, instead of continuing to

grow regression suites and make risk

choices based on the assumption that the

associations between the levels of

hardware and system testing are tightly

coupled, abstraction layers can be

inserted into the architecture to decouple

the hardware, operating system, and

applications support functions.

Furthermore, each of these components

can be tested through independent

elements introduced into the SoC

architecture.

In fact, this trend has already begun. The

growing use of decoupled global

interconnect structures, such as those

that employ OCP or similar features,

provides a proven example of how to

ease chip architecture design as it

evolves from single to multicore or

multi-layer. By “abstracting” the data

plane, and allowing the associations

between the IP cores to become linked

through the independent global

interconnect structure, system

performance at the hardware level

becomes more predictable and tunable

(CPU to off chip memory for example).

This predictability affords opportunities

to streamline the design process because

these loosely coupled associations are

less affected by specific design changes.

This leads to more rapid timing closure

even though the complexity of the data

plane has grown significantly.

Similar abstraction techniques can be

applied to system management. The

software and hardware layers, the system

management, and the functional

operation of the SoC can be decoupled,

making it easier to test each component

of the system level architecture while

considering the system level driven

hardware state changes. This results in a

system level design which is more easily

understood and has better test coverage.

This approach also abstracts the system

management operational complexities

between hardware and software even

as the number of applications

grows.

The next section of the paper will

discuss three potential methods of

abstraction that lead to varied degrees of

optimizing system management.

System Management Scheme Comparisons

Given that the objective is to reduce

overall system management complexity,

there are three baseline characteristics

against which system management schemes

should be benchmarked:

1. How well does the approach

achieve independence between

the silicon, operating system,

and application layers?

2. How flexible is the approach to

adapt to each derivative design

in a SoC road map?

3. How much test coverage does

the resultant system

management scheme achieve

for the SoC architecture?

By applying these benchmark criteria,

three methods can be evaluated.

Method 1: Using a single operating

system hosted on a “master” CPU. This

has been a popular approach to perform

system management because silicon

elements already required for real time

operation also execute system

management functions.

When SoC complexities are relatively

low, this scheme is very efficient. No

extra silicon, some extra software

development, but very containable.

However, the complexity growth

associated with multicore SoCs for

consumer designs today has weakened

the effectiveness of this approach

because as system tasks become

distributed, that is, more interdependent

as more cores are added to the SoC, the

visibility and control of any one core

over any of the others is reduced with

each new element added. The visibility

and control become more dependent on

the global interconnect as well as the

cores, adding even more complexity to

execute control functions. The addition

of the global interconnect as part of the

system testing is required in this case

because it controls access to external

memory, a key element in system

operations.

If the master CPU can no longer manage

and verify the hardware state changes of

the other core elements, the growing number of

possible states results in

unpredictable coverage and the

methodology no longer has value.

Extending the scheme then to add

system test does not return meaningful

dividends on the potentially massive

investment of developing the tests and

verification infrastructure.

Applying the criteria then to this method

for today’s platform SoCs:

1. This approach fundamentally

breaks down for multicore SoCs

because it will not adequately

allow the economical

construction of operating system

and application level system test

layers.

2. This criterion is considered

inconsequential given that the

approach failed the first criterion.

[Figure: block diagram with Host CPU, IP Core, IP Core, and I/O blocks]

3. This approach will yield

extremely low system test

coverage and therefore its

usefulness is directly dependent

on the complexity of the SoC.

Method 2: Introducing global

interconnect structures and additional

logic to support pseudo-control plane

system management functions. This

approach is an extension of method 1

because often the host CPU continues to

act as the system management master.

Side band signaling, either contained in

the interconnect or designed separately,

is used for the control functions.

Mixing data plane and control functions

introduces abstraction levels that aid in

achieving higher system test coverage as

long as the SoC does not drive the

interconnect requirements to become so

complex that the control functions

become a small, lower-priority part of the

overall mix of functions. When this

occurs the control tasks are executed

sub-optimally as delays occur from

priority choices between functional

operations and system management tasks

because of complex arbitration

sequences and delayed communication

through blocked hierarchical buses.


Applying the criteria then to this method

for today’s SoCs:

1. This approach introduces levels

of abstraction that make the

approach feasible for some

multicore SoCs.

2. However, the approach also has a

ceiling of usefulness which is

normally reached when extra

logic is required to manage

“special” cases for each of the

derivatives in the SoC road map

and the inefficiencies tolerated

to minimize time to

market mount. One area where this

occurs is when the system

management master, usually the

host CPU, requests that another

core should power down.

Inefficiencies sometimes occur

when complex arbitration

schemes and blocked requests

delay the actual action of

powering down the core. These

delays can often be measured in

thousands of cycles, during which

power is consumed for no useful

system function and is therefore

wasted.

3. As a result of the ceiling in the

benefits of the approach, overall

coverage is directly dependent on

the complexity of the SoC and as

such is useful only within a range

of SoC complexity.
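The cost of a delayed power-down request, as described in the second criterion above, can be estimated with a short sketch. It assumes the core draws a constant active power for the whole time the request is stalled. The function name and all numeric values (5,000 cycles, 500 MHz, 0.2 W) are hypothetical and chosen only to make the arithmetic concrete.

```python
def wasted_energy_joules(delay_cycles, clock_hz, active_power_watts):
    """Energy burned by a core that stays powered while its power-down request is stalled."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    delay_seconds = delay_cycles / clock_hz
    return delay_seconds * active_power_watts


# Hypothetical figures: 5,000 stalled cycles at 500 MHz, core drawing 0.2 W.
energy = wasted_energy_joules(5_000, 500e6, 0.2)
print(f"{energy * 1e6:.2f} microjoules wasted per delayed power-down")
```

A single event of this size is small, but the estimate scales linearly with how often power-down requests are issued and how long each one is blocked.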

Method 3: Introducing a control plane

that complements a data plane global

interconnect.

[Figure: block diagram with Host CPU, I/O, and four IP Core blocks]

This approach differs from the first two

methods because it does not extend the

traditional host CPU system master

approach. Rather, it introduces a

separate control plane and an

independent system controller to

perform system management tasks.

An independent control plane essentially

abstracts the system management tasks

from any one entity. As such, it can be

controlled by any or all SoC elements as

required, and therefore offers multiple

layers of abstraction. System testing can

be developed by software, hardware,

verification, and system engineers and

applied using a common framework with

equal effectiveness.
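A toy software model may help make the idea of an independent system controller concrete. The sketch below is not the architecture of any specific product. The class, its methods, and the element names ("cpu", "dsp", "media_engine") are hypothetical, and the model assumes only what the text states: any SoC element may direct the controller, and requests pass through one common framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"


@dataclass
class SystemController:
    """Toy model of an independent system controller on a control plane."""
    states: dict = field(default_factory=dict)  # element name -> PowerState
    log: list = field(default_factory=list)     # (requester, target, state) history

    def register(self, element):
        """Add an SoC element; elements start powered on in this model."""
        self.states[element] = PowerState.ON

    def request(self, requester, target, state):
        """Any registered element may request a state change for any other."""
        if requester not in self.states or target not in self.states:
            raise KeyError("unknown SoC element")
        self.states[target] = state
        self.log.append((requester, target, state))
        return state


controller = SystemController()
for name in ("cpu", "dsp", "media_engine"):
    controller.register(name)

# The DSP, not the host CPU, powers down the media engine.
controller.request("dsp", "media_engine", PowerState.OFF)
print(controller.states["media_engine"])
```

Because every request is recorded in one place regardless of who issued it, the same log could serve as an observation point for tests written by software, hardware, or verification engineers.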

This approach is also advantageous

because it separates targeted control

tasks ideally executed with low latency

from longer, more complex, and often

performance sensitive data plane tasks.

This separation is often necessary when

complexity is high, because traditional

approaches reach the ceiling of

effectiveness discussed during method 2.
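The latency benefit of this separation can be sketched with a deliberately simple model. It assumes a shared interconnect that serves requests strictly in arrival order with no priority for control traffic, so a control request waits for every data burst queued ahead of it, while a dedicated control plane handles it directly. Real arbitration schemes are more elaborate. The function names and burst lengths are hypothetical.

```python
def shared_interconnect_latency(pending_data_bursts, control_cycles=1):
    """Cycles until a control request completes when queued behind data bursts (FIFO, no priority)."""
    return sum(pending_data_bursts) + control_cycles


def control_plane_latency(control_cycles=1):
    """Cycles until a control request completes on a dedicated control plane."""
    return control_cycles


# Hypothetical queue of data bursts, lengths in cycles.
pending = [64, 128, 256, 512]
print("shared interconnect:", shared_interconnect_latency(pending), "cycles")
print("control plane:", control_plane_latency(), "cycles")
```

In this model the shared-interconnect latency grows with data plane load, whereas the control plane latency stays constant, which is the separation the paragraph above describes.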


Applying the criteria then to this method

for today’s SoCs:

1. This approach creates maximum

levels of abstraction for system

management but introduces new

control plane functionality.

2. This approach introduces high

levels of flexibility as both

control and data plane functions

can be tuned for each SoC

derivative without changing the

base architecture.

3. This approach also maximizes

the coverage achievable because

any source can direct the system

management and as such

operations (applications) can be

isolated and tested within the

approach without compromising

overall coverage.

Summary:

While method 3 introduces new control

plane functionality, it also enables SoCs

of virtually any complexity to be tested

and operated with maximum efficiency

using the same approach. As

such it is best suited for roadmaps that

contain a wide variety of complexity or

when extreme flexibility is required for

the SoC architecture. The ability to

direct the system controller using any

SoC core is especially noteworthy

because it allows multiple applications

to directly control the hardware states in

real time when needed and without the

overhead of channeling their requests

through other entities, thus avoiding

inter-function dependencies,

complexities and delays.

The Impact of SoC Subsystems on System Management.

The basic theme in achieving better

system management is successful

partitioning in order to reach adequate

levels of system test coverage. This is

why method 3 was chosen as the most

effective for today’s system management

needs.

[Figure: System Controller, CPU, DSP, Media Engine, DRAM Controller, Low Speed I/O, and High Speed I/O blocks connected by a Global Interconnect and a Control Plane]

It stands to reason, then, that the impact

of subsystem utilization further abstracts

the system management tasks. However,

creating systems within systems also

introduces hierarchies of complexity and

as such, further renders traditional

methods of system management useless.

The growing use of subsystems over the

next generations of SoC design will

therefore accelerate the adoption of

control plane based system management

as the preferred architecture,

so that hierarchical levels of complexity

can be absorbed into the system

management architecture while

maintaining a common architecture that

provides flexibility and scalability

and minimizes the risks and costs of

expensive architecture redesigns, which

will accelerate as system requirements

continue to become more complex.
