Presentation: Processor with integrated real-time garbage collection

© TLB GmbH, Karlsruhe 2012

A Novel RISC Processor Architecture For Garbage Collection in Embedded Systems


Buffer Overflows are Responsible for a Large Number of Today’s Security and Safety Problems

In a standard computer system, dynamically growing data structures can overwrite unrelated data (buffer overflow)

The standard processor architecture lacks protection mechanisms against buffer overflows

Buffer overflow errors are a common cause of critical security vulnerabilities


Garbage Collection Helps Reduce Buffer Overflows, But Causes High Overhead & Unpredictable Pauses

Automatic dynamic memory management releases and compacts dynamically allocated memory after its last use

Such “garbage collection” reduces common sources of error that lead to buffer overflows

Existing garbage collection is mostly software-based, incurs high overhead, and causes unpredictable pauses in program execution

The limited resources of embedded systems typically do not allow for efficient garbage collection in real time


A Novel Approach Enables Parallel Garbage Collection And Parallel Synchronization in Real-Time

The novel RISC processor architecture is optimized for security:

Strict separation of pointers from ordinary non-pointer data by using distinct register sets for pointers and data

The dedicated coprocessor performs the garbage collection:

The coprocessor uses an optimized Baker-style copying collector algorithm that runs in parallel with the main processor

A new garbage collection cycle is started by the coprocessor when the available memory falls below a chosen threshold

Simple hardware extensions to the processor pipeline support the synchronization between garbage collector and main processor

This synchronization support is key to an efficient implementation that avoids unbounded pauses
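The collector behaviour described on this slide can be illustrated with a minimal software sketch of a Baker-style incremental copying collector. This is not the coprocessor's microcode. The names `Obj`, `BakerHeap`, `epoch`, `load` and `step` are hypothetical, the heap is counted in objects rather than bytes, and calls to `step` merely stand in for the coprocessor running in parallel with the main processor.

```python
class Obj:
    """Heap object; pointer fields are kept apart from non-pointer data."""

    def __init__(self, pointers, data, epoch):
        self.pointers = list(pointers)  # references to other objects
        self.data = data                # ordinary non-pointer payload
        self.epoch = epoch              # semispace generation the object lives in
        self.forward = None             # forwarding reference once evacuated


class BakerHeap:
    """Two-semispace copying heap (sizes counted in objects, an assumption)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.to_space = []      # objects the main program is allowed to see
        self.from_space = []    # old semispace, evacuated during a cycle
        self.roots = []         # stands in for the pointer registers
        self.epoch = 0
        self.scan = 0           # Cheney-style scan index into to_space
        self.collecting = False

    def free(self):
        return self.capacity - len(self.to_space)

    def start_cycle(self):
        # Flip the semispaces and evacuate the roots.
        self.from_space, self.to_space = self.to_space, []
        self.epoch += 1
        self.scan = 0
        self.collecting = True
        self.roots = [self.evacuate(r) for r in self.roots]

    def evacuate(self, obj):
        # Return the to-space version of obj, copying it on first contact.
        if obj is None or obj.epoch == self.epoch:
            return obj
        if obj.forward is None:
            obj.forward = Obj(obj.pointers, obj.data, self.epoch)
            self.to_space.append(obj.forward)
        return obj.forward

    def alloc(self, pointers=(), data=None):
        # A new cycle starts when free memory falls below the threshold.
        if not self.collecting and self.free() < self.threshold:
            self.start_cycle()
        refs = [self.evacuate(p) for p in pointers]
        if self.free() <= 0:
            raise MemoryError("semispace exhausted")
        # Simplification: new objects join the scanned region; the scan
        # later visits them, which is harmless.
        obj = Obj(refs, data, self.epoch)
        self.to_space.append(obj)
        return obj

    def load(self, obj, index):
        # Read barrier: the main program only ever receives to-space references.
        ref = self.evacuate(obj.pointers[index])
        obj.pointers[index] = ref
        return ref

    def step(self, budget=1):
        # Bounded unit of collector work: scan at most `budget` objects.
        while self.collecting and budget > 0:
            if self.scan == len(self.to_space):
                self.from_space = []      # everything left behind is garbage
                self.collecting = False
                break
            obj = self.to_space[self.scan]
            obj.pointers = [self.evacuate(p) for p in obj.pointers]
            self.scan += 1
            budget -= 1
```

Because every pointer load passes through `load`, the main program never observes a from-space reference, so the collector can copy objects in small bounded steps instead of stopping the program for a whole cycle.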


This Novel Approach Improves Performance By Leaving The Cache Largely Unaffected

Software garbage collectors usually displace the entire contents of the cache repeatedly

Examine the entire heap during a single cycle

The coprocessor directly connects to the memory controller

Does not access memory through the main processor’s cache

The cache remains largely unaffected by the garbage collection

The coprocessor ensures cache coherency

Inspects and selectively flushes single cache lines through a dedicated cache port (resembles a snoop port)

The coprocessor eliminates unnecessary memory traffic

Invalidates all cache lines with dead objects at the end of a garbage collection cycle
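The traffic saving from invalidating dead lines can be illustrated with a toy model of a copy-back cache. This is a sketch under stated assumptions, not the prototype's cache logic: `CopyBackCache`, `end_of_cycle` and the idea that dead lines are handed over as a set of line addresses are all hypothetical.

```python
class CopyBackCache:
    """Toy copy-back cache: dirty lines reach memory only when evicted."""

    def __init__(self):
        self.lines = {}       # line address -> dirty flag
        self.writebacks = 0   # number of lines written back to memory

    def read(self, line):
        # Bring the line in clean if it is not cached yet.
        self.lines.setdefault(line, False)

    def write(self, line):
        # A write marks the line dirty; memory is not updated yet.
        self.lines[line] = True

    def evict(self, line):
        # A dirty line costs one write-back when it leaves the cache.
        if self.lines.pop(line, False):
            self.writebacks += 1

    def invalidate(self, line):
        # Drop the line without writing it back, even if it is dirty.
        self.lines.pop(line, None)


def end_of_cycle(cache, dead_lines):
    """Invalidate every cached line that holds only dead objects."""
    for line in dead_lines:
        cache.invalidate(line)
```

In this model, four dirty lines of which two hold only dead objects cause two write-backs instead of four once `end_of_cycle` has run, since the contents of dead objects never need to reach memory.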


A Fully Functional Prototype Exists And Has Been Used For Performance Measurements

Main processor & GC coprocessor modeled at register-transfer level in VHDL, synthesized for Altera APEX 20K1000C (at 25 MHz)

Pipelined RISC processor, statically scheduled

up to 3 instructions per clock cycle (3-way multiple issue, “in order”)

16 pointer registers, 16 data registers, 8 predicate registers

8K execution cache, 8K data cache, 2K attribute cache

two-way set-associative copy-back cache

Micro-coded garbage collection coprocessor

256 x 80 bit on-chip microcode memory

Uses less than 20% of the chip surface area

Software

Native Java bytecode compiler developed for the architecture. An included code scheduler rearranges instructions to take advantage of the processor’s parallel execution units and to hide instruction latencies

Subset of the Java class libraries supporting text-based apps in order to facilitate the execution of representative programs (includes NFS client)
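What a code scheduler for a statically scheduled 3-way issue machine does can be sketched with a simple greedy list scheduler. This is an illustration only and not the compiler's actual algorithm: the function `schedule`, the instruction tuples and the latencies are hypothetical, only true data dependences are modeled, instruction names must be unique, and latencies are assumed to be at least one cycle.

```python
def schedule(instrs, width=3):
    """Pack instructions into issue bundles of at most `width` per cycle.

    instrs: list of (name, deps, latency) tuples in program order, where
    deps names the instructions whose results are needed.
    Returns one list of names per clock cycle; an empty list is a stall.
    """
    ready_at = {}              # name -> first cycle its result is available
    remaining = list(instrs)
    bundles = []
    cycle = 0
    while remaining:
        bundle = []
        for ins in list(remaining):
            if len(bundle) == width:
                break
            name, deps, latency = ins
            # Issue only if every operand has been produced by now.
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                bundle.append(name)
                remaining.remove(ins)
                ready_at[name] = cycle + latency
        bundles.append(bundle)
        cycle += 1
    return bundles
```

Given two loads with a latency of two cycles, each feeding a dependent arithmetic instruction, the scheduler hoists the second load into the first bundle so that both load latencies overlap instead of being paid one after the other.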


An Experimental Computer System Was Assembled Based On The Garbage-Collection Processor


Pauses Caused By Garbage Collection Do Not Exceed 500 Clock Cycles

[Figure: frequency distribution of synchronization pauses (shown for javac). X-axis: pause duration in clock cycles; Y-axis: frequency.]
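To relate the 500-cycle bound to wall-clock time, the cycle count can be converted using the prototype's 25 MHz clock stated earlier in the deck. The helper name `cycles_to_microseconds` is hypothetical, and the result applies to the FPGA prototype only; a faster clock would shorten the pause accordingly.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a number of clock cycles into microseconds."""
    return cycles * 1_000_000 / clock_hz


# 500 cycles at the prototype's 25 MHz clock
pause_us = cycles_to_microseconds(500, 25_000_000)
print(pause_us)  # 20.0
```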


The Runtime Overhead For The Hardware-Based Garbage Collection Is Small


The Advantages Of This Approach Could Enable Real-Time Garbage Collection in Embedded Systems

Limits pauses from garbage collection to 500 clock cycles

Efficient synchronization

No code overhead

Low total runtime overhead of only a few percent

Undisturbed cache locality

Exact (non-conservative) garbage collection

Compiler & code are independent of the garbage collector


BACKUP


Efficient Implementation


Coprocessor Architecture


Synchronization I


Synchronization II


Synchronization III


Synchronization IV


The Runtime Overhead Is Small - I


The Runtime Overhead Is Small - II


The Runtime Overhead Is Small - III

