Introduction to Data Path Acceleration Architecture (DPAA)

FTF-NET-F0146

April 2014

Mary Kung | Digital Networking Applications Engineering

Session Introduction

• This session will provide:

− Introduction to the QorIQ Data Path Acceleration Architecture (DPAA)

− Discussion of how each component interacts with the core and with each other

Session Objectives

• After completing this session you will be able to:

− Understand the purpose of DPAA

− Describe the building blocks of DPAA

− Understand how the DPAA blocks interact with each other

− Understand DPAA implementations on various Freescale devices


Agenda

• Overall DPAA Architecture

• DPAA Implementation Differences

• FMAN

• BMAN

• QMAN

• SEC

• PME

• DCE

• RMAN

• Life of an Ingress Packet

• Additional Product Diagrams…


Multicore Data Path Issues and Requirements

Multicore SoCs have a number of new requirements related to packet processing when compared to single core SoCs:

− Load spreading of arriving packets across pools of cores for parallel processing

− Packet ordering issues after processing

− Pipelined processing of packets using cores

− Network I/O sharing between cores

− Hardware accelerator “virtualization”

− Inter-core communication

[Diagram: several cores, each with its own D$ and I$, sharing network I/O and a hardware accelerator]

More Multicore Data Path Requirements

• Addressing these requirements can lead to new requirements:

− Hardware-managed queues

− Hardware-supported active queue management

− Network interfaces must be able to parse, classify, and distribute frames

• High-bandwidth network I/O on QorIQ devices also drives data path requirements:

− Queue congestion driven flow control

− Resource depletion driven flow control

− Hardware buffer management

What is the Data Path Acceleration Architecture (DPAA)?

The QorIQ DPAA is a comprehensive architecture which integrates all aspects of packet processing in the SoC

− Addresses issues and requirements resulting from the multicore nature of QorIQ SoCs

The DPAA includes:

− Network and Packet I/Os

− Hardware offload accelerators

− Infrastructure required to facilitate the flow of packets between the above


Example HW Difference: Buffer Descriptor Rings vs. DPAA

DPAA infrastructure replaces descriptor rings:

• Queuing split from buffer management

• Queues can be shared by multiple cores

• Data reception no longer throttled by how fast software can service ring entries

• Data can be stashed into cache just before it is processed

[Diagram: cores, each with D$ and I$, connected to Ethernet network I/O through the Queue Manager and Buffer Manager]

QorIQ DPAA Fundamental Components

• Infrastructure components

− QMan: Queue Manager

− BMan: Buffer Manager

• Hardware accelerators

− SEC: Security Engine

− PME: Pattern Matching Engine

− and more

• Network and packet I/O

− FMan: Frame Manager (Ethernet)

− RMan: RapidIO Message Manager (RapidIO messaging)

• Cores


DPAA Ethernet MAC Component Differences

• QorIQ P class devices have both:

− Datapath Three-Speed Ethernet Controller (dTSEC)

− 10-Gigabit Ethernet Media Access Controller (10GEC)

• QorIQ T class devices have:

− Multi-rate Ethernet Media Access Controller (mEMAC)


QorIQ P4080 DPAA Components

[Block diagram: QorIQ P4080. Power Architecture e500-mc cores, each with 32KB D-cache, 32KB I-cache, and 128KB backside L2 cache; two 1024KB frontside L3 caches, each with a 64-bit DDR-2/3 memory controller; CoreNet coherency fabric with PAMUs (Peripheral Access Management Units); two Frame Managers, each with four 1GE and one 10GE; Queue Manager; Buffer Manager; SEC; PME; RapidIO Message Unit (RMU); 2x DMA; PCIe and SRIO over an 18-lane 5GHz SerDes; eLBC, eOpenPIC, USB, SD/MMC, DUART, I2C, SPI, GPIO, security monitor, and real-time debug blocks]

QorIQ T4xxx DPAA Components

Hardware Accelerators: saving CPU cycles for higher-value work

• FMAN (Frame Manager): 50 Gbps aggregate Parse, Classify, Distribute

• BMAN (Buffer Manager): 64 buffer pools

• QMAN (Queue Manager): Up to 2^24 queues

• RMAN (RapidIO Manager): Seamless mapping of sRIO to DPAA

• SEC (Security): 40Gbps IPSec, SSL; Public Key 25K/s 1024b RSA

• PME (Pattern Matching): 10Gbps aggregate

• DCE (Data Compression): 20Gbps aggregate

Callouts on the slide (blocks are marked as New or Enhanced):

− Compress and decompress traffic across the Internet

− Protects against internal and external Internet attacks

− Frees CPU from draining repetitive RSA, VPN and HTTPs traffic

− Identifies traffic and targets CPU or accelerator

− Line rate 50Gbps networking

− Quality of Service for FCoE in converged data center networking


Network I/O: FMAN

Frame Manager (FMan) supports:

• (P4080) One 10GE MAC and four GE MACs

− Max 12xGE parse+classify

• (T4xxx) Two 10GE MACs and six GE MACs

• L2/L3/L4 protocol parse and validate

− User-defined protocols supported

• Hash-based queue selection for load spreading

• Exact-match classification queue selection

• IEEE 1588 timestamping

• RMON/ifMIB stats

• Color-aware dual-rate, 3-color policing

• “Right size” buffer acquisition from BMan buffer pools

• Per port egress rate limiting

• TCP/UDP TX checksum calculation
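
The hash-based queue selection listed above can be illustrated with a minimal sketch. It is not the FMan Keygen algorithm: the slides do not specify the hash, so CRC32 over the 5-tuple stands in for it, and `select_fqid`, `base_fqid`, and `num_queues` are hypothetical names chosen for illustration. The point is only that every packet of one flow lands on the same frame queue while different flows spread across the range.

```python
import struct
import zlib


def select_fqid(src_ip, dst_ip, src_port, dst_port, proto, base_fqid, num_queues):
    """Map a flow's 5-tuple to a frame queue ID in [base_fqid, base_fqid + num_queues).

    CRC32 is an assumption standing in for the hardware hash.
    """
    key = struct.pack("!4s4sHHB", src_ip, dst_ip, src_port, dst_port, proto)
    return base_fqid + (zlib.crc32(key) % num_queues)


# Two hypothetical TCP flows (protocol 6) towards the same server.
flow_a = (bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), 1234, 80, 6)
flow_b = (bytes([10, 0, 0, 3]), bytes([10, 0, 0, 2]), 4321, 80, 6)

fq_a = select_fqid(*flow_a, base_fqid=0x100, num_queues=8)
fq_b = select_fqid(*flow_b, base_fqid=0x100, num_queues=8)
```

Because the result depends only on the flow key, packet order within a flow is preserved by the queue, which is what makes this kind of spreading safe for parallel cores.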

[Diagram: Frame Manager (FMan) with one 10GE and four GE MACs, Parser, Classifier, Keygen (Distribution), Policer, DMA, and buffer memory; BMI connects to BMan, QMI connects to QMan, and DMA connects to CoreNet]

FMan Modular Architecture Processing Pipeline

Rx processing steps:

• MAC Rx and validate

• BMI streams the frame and allocates an internal buffer for the incoming frame + IC (Internal Context)

• Calculate raw L4 checksum for parser

• Based upon Layer-2 packet size, BMI requests a “right sized” buffer from BMan

• BMI instructs DMA to transfer frame to external buffer

• Parse / Classify / Distribute

• Determine queue ID#

• Per-group policing (RFC2698/4115)

• BMI instructs DMA to write frame IC and header to external buffer

• QMI instructs QMan to enqueue FD

• BMI releases internal buffer on completion

• QMan active queue management and scheduling (WRED)

• Core dequeue & processing

[Diagram: FMan on a QorIQ T4240 with 1GE and 10GE MACs, BMI, QMI, DMA, FPM, Parser, Keygen, Classify/Distribute, Policer, and internal context in shared memory, connected to BMan and QMan; the stages are labelled MAC, BMI, BMI / BMan, BMI / DMA, PCD, Policer, BMI / DMA, QMI / BMI, QMan / BMan, QMan, and Core / SW; a legend marks FD, BP, and Frame paths; FMC is also shown]
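
The policing step refers to RFC 2698, the two-rate three-color marker. The sketch below follows the marking rules of that RFC in color-aware form; it is an illustration of the algorithm, not of the FMan Policer's registers or profile format, and the class name and parameters (`cir`, `cbs`, `pir`, `pbs`, in bytes per second and bytes) are chosen here for readability. RFC 4115 defines a different marker and is not covered.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"


class TwoRateThreeColorMarker:
    """RFC 2698 marker with two token buckets: committed (C) and peak (P)."""

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs = cir, cbs   # committed rate and burst size
        self.pir, self.pbs = pir, pbs   # peak rate and burst size
        self.tc = cbs                   # both buckets start full
        self.tp = pbs
        self.last = 0.0                 # time of the previous update, in seconds

    def mark(self, now, size, precolor=GREEN):
        # Refill both buckets for the elapsed time, capped at the burst sizes.
        elapsed = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)

        # Color-aware rules: a packet is never promoted above its pre-color.
        if precolor == RED or self.tp < size:
            return RED
        if precolor == YELLOW or self.tc < size:
            self.tp -= size
            return YELLOW
        self.tp -= size
        self.tc -= size
        return GREEN


marker = TwoRateThreeColorMarker(cir=1000, cbs=1500, pir=2000, pbs=3000)
colors = [marker.mark(0.0, 1500) for _ in range(3)]
```

With every packet pre-colored green (the default), the marker behaves as the color-blind mode of the RFC. A policer would then drop or forward frames according to the returned color.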


DPAA Infrastructure: BMAN

Buffer Manager (BMan) supports:

• 64 pools of buffer pointers

− All buffers in a pool are expected to have “like” characteristics

− BMan places no restrictions on these characteristics

• Hardware (and software) acquire and release of buffer pointers from/to pools

− BMan is primarily intended to reduce the buffer management load on SW

• Pool depletion thresholds for pool replenishment and lossless flow control

− All thresholds have hysteresis
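
A minimal software model of a pool with a depletion threshold and hysteresis is sketched below. It is not the BMan programming interface: `BufferPool`, `enter_depletion`, and `exit_depletion` are hypothetical names, and real pools hold buffer pointers managed in hardware. The sketch only shows why two thresholds are used: the depleted state is entered at the low mark and left at the higher mark, so it does not flap when the free count hovers around one value.

```python
from collections import deque


class BufferPool:
    """Pool of buffer pointers with a depletion state that has hysteresis."""

    def __init__(self, buffers, enter_depletion, exit_depletion):
        assert enter_depletion < exit_depletion
        self.free = deque(buffers)
        self.enter = enter_depletion    # depleted once free count <= this
        self.exit = exit_depletion      # recovered once free count >= this
        self.depleted = len(self.free) <= self.enter

    def _update_state(self):
        count = len(self.free)
        if not self.depleted and count <= self.enter:
            self.depleted = True        # e.g. trigger replenishment or flow control
        elif self.depleted and count >= self.exit:
            self.depleted = False

    def acquire(self):
        """Return a buffer pointer, or None if the pool is empty."""
        if not self.free:
            return None
        buf = self.free.popleft()
        self._update_state()
        return buf

    def release(self, buf):
        """Return a buffer pointer to the pool."""
        self.free.append(buf)
        self._update_state()


# Eight hypothetical 2 KB buffers; depleted at 2 free, recovered at 5 free.
pool = BufferPool([0x10000 + i * 2048 for i in range(8)],
                  enter_depletion=2, exit_depletion=5)
```

In this model either hardware blocks or software could call `acquire` and `release`; the depleted flag is what a flow-control or replenishment mechanism would watch.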

[Diagram: Buffer Manager (BMan) with list engines and an internal stockpile; hardware portals to two FMans, SEC, and PME; software portals to the cores; connected to CoreNet]


Terminology…

• Buffer: Unit of contiguous memory, allocated by software

• Frame: Buffer(s) that hold a data element (generally a packet)

− Frames can be single buffers or multiple buffers (scatter/gather lists)

  − A “simple frame” has one delimited data element

  − A “multi buffer frame” has two or more data elements

• Frame Descriptor (FD): Proxy structure used to represent frames

• Frame Queue:

− FIFO of related Frame Descriptors (e.g. a TCP session)

− The basic queuing structure supported by QMan

• Frame Queue Descriptor (FQD): Structure used to manage Frame Queues

[Diagram: an Ethernet frame (preamble, destination address, source address, type, data, CRC) stored in one or more buffers; FDs reference the buffers; an FQD heads a list of FDs]

Queue “Building Blocks”

• Frame Queues (FQs) are the basic queuing structure supported by QMan

− FIFO lists of Frame Descriptors (FDs)

− Each FD describes a frame which is a delineated piece of data (e.g. a packet) in buffer(s) in memory

− Multi-buffer frames are described using Scatter/Gather Tables

− FQs are in turn enqueued on Work Queues (WQs)

• A channel is a collection of 8 WQs which have relative priority

− Class scheduling is performed at a channel

− FQs are an ordered list of frames which need to be processed in the same way

− WQs are an ordered list of FQs which all have the same priority

• Portal is an interface used to access QMan facilities (e.g. Enqueue or Dequeue) possibly for multiple channels
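
The relationship between FDs, FQs, WQs, and channels can be sketched as plain data structures. This is a simplified software model under stated assumptions, not QMan's actual scheduler: it treats WQ0 as the highest priority and applies strict priority across all eight WQs, and the names `FrameDescriptor`, `FrameQueue`, `dest_wq`, and `Channel` are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FrameDescriptor:
    addr: int       # address of the buffer (or scatter/gather table)
    length: int     # frame length in bytes


@dataclass
class FrameQueue:
    fqid: int
    dest_wq: int                                # work queue this FQ is scheduled on
    fds: deque = field(default_factory=deque)   # FIFO of frame descriptors


class Channel:
    """Eight work queues; each WQ is an ordered list of non-empty FQs."""

    NUM_WQS = 8

    def __init__(self):
        self.wqs = [deque() for _ in range(self.NUM_WQS)]

    def enqueue(self, fq, fd):
        was_empty = not fq.fds
        fq.fds.append(fd)
        if was_empty:
            self.wqs[fq.dest_wq].append(fq)     # FQ becomes schedulable

    def dequeue(self):
        """Return (fqid, fd) from the highest-priority non-empty WQ, or None."""
        for wq in self.wqs:                     # assumption: WQ0 is served first
            if wq:
                fq = wq.popleft()
                fd = fq.fds.popleft()
                if fq.fds:
                    wq.append(fq)               # still has frames: back of the same WQ
                return fq.fqid, fd
        return None


channel = Channel()
low = FrameQueue(fqid=1, dest_wq=5)
high = FrameQueue(fqid=2, dest_wq=0)
channel.enqueue(low, FrameDescriptor(0x1000, 64))
channel.enqueue(low, FrameDescriptor(0x2000, 128))
channel.enqueue(high, FrameDescriptor(0x3000, 256))
```

Frames within one FQ always leave in arrival order, while the WQ index decides which FQ is served next. A portal would call `dequeue` on the channels it serves.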

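[Diagram: QMan data structures and user memory. A portal serves channels; each channel holds WQ0 through WQ7; WQs hold FQs; FQs hold FDs and context; an FD points to a buffer in user memory, or to a Scatter/Gather Table (SGT) that points to several buffers]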

DPAA Infrastructure: QMAN

Queue Manager (QMan) supports:

• Low latency, prioritized queuing of descriptors between cores, network I/O, and accelerators

• Lockless shared queues for load spreading and device “virtualization”

• Order restoration as well as order preservation through queue affinity

• Active queue management (WRED)

• Optimized core interface which can pre-position data/context/descriptors in core’s cache

• Delivery of per-queue accelerator specific commands and context information to offload accelerators along with dequeued descriptors
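
Active queue management with WRED can be illustrated with the textbook form of the algorithm. The sketch below is generic and does not reflect QMan's parameter encoding: the averaging weight, the thresholds, and the per-color profiles are all hypothetical values picked for the example.

```python
def ewma(avg, sample, weight):
    """Exponentially weighted moving average of the queue depth."""
    return (1 - weight) * avg + weight * sample


def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability rises linearly from 0 at min_th to max_p at max_th."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# "Weighted": each traffic class (here, each policer color) has its own profile,
# so yellow frames start being dropped earlier than green ones.
profiles = {
    "green": {"min_th": 40, "max_th": 80, "max_p": 0.05},
    "yellow": {"min_th": 20, "max_th": 60, "max_p": 0.10},
}

avg = 0.0
for depth in [50] * 200:           # queue depth samples, in frames
    avg = ewma(avg, depth, weight=0.1)

p_green = wred_drop_probability(avg, **profiles["green"])
p_yellow = wred_drop_probability(avg, **profiles["yellow"])
```

An enqueue would then be rejected when a uniform random number falls below the computed probability, which sheds load gradually before the queue is full.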

[Diagram: Queue Manager (QMan) with queuing engines, an FQD cache, and FD memory; hardware portals to two FMans, SEC, and PME; software portals to the cores; connected to CoreNet; frame descriptors pass through the portals]

Core Interface: QMan Software Portals

• Software portals provide the DPAA interface to cores and software

− Portal per core

− Can be used by a core to access multiple channels or queues directly

• Low latency, lock free dequeue and enqueue of descriptors

• Portals can work closely with a core to (optionally) position the following in L1 or L2 cache:

− Descriptors

− Packet data

− Software-defined per-queue context or state information

• Queues can be “held” on a portal to ensure temporary affinity for order preservation

[Diagram: QMan with two Power Architecture cores (D-cache, I-cache, L2 cache), each attached through its own SW portal; the portals serve dedicated channels and a pool channel, each channel holding WQ0 through WQ7; held FQs are shown at a portal]


SEC 5.x

• Public Key Hardware Accelerators (PKHA)

− RSA and Diffie-Hellman (to 4096b)

− Elliptic curve cryptography (1023b)

− Supports Run Time Equalization

• Data Encryption Standard Accelerators (DESA)

− DES, 3DES (2K, 3K)

− ECB, CBC, OFB modes

• Advanced Encryption Standard Accelerators (AESA)

− Key lengths of 128-, 192-, and 256-bit

− ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS

• Message Digest Hardware Accelerators (MDHA)

− SHA-1, SHA-2 256,384,512-bit digests

− MD5 128-bit digest

− HMAC with all algorithms

• ARC Four Hardware Accelerators (AFHA)

− Compatible with RC4 algorithm

• Kasumi/F8 Hardware Accelerators (KFHA)

− F8, F9 as required for 3GPP

− A5/3 for GSM and EDGE

− GEA-3 for GPRS

• Snow 3G Hardware Accelerators (STHA)

− Implements Snow 3.0

• CRC Unit

− CRC32, CRC32C, 802.16e OFDMA CRC

• Random Number Generator, random IV generation

• Header & Trailer off-load for the following Security Protocols:

− IPSec, 802.1ae, SSL/TLS, SRTP, 802.11i, 802.16e

• Modular & Scalable with simplified device driver

[Diagram: SEC with a Queue Manager interface (to QMan/BMan), Job Queue Controller, Descriptor Controllers, CHAs, and RTIC, attached to CoreNet through the on-chip system interface]


Pattern Matching Engine (PME) 2.x

[Diagram: Pattern Matching Engine components. Pattern Matcher Frame Agent (PMFA) with QMan and BMan interfaces; Key Element Scanning Engine (KES) with hash tables; Data Examination Engine (DXE); Stateful Rule Engine (SRE) producing user-definable reports; caches provide access to pattern descriptors and state through the on-chip system bus interface]

• Regex support plus extensions:

− Patterns can be split into 256 sets, each of which can contain 16 subsets

− 32K patterns of up to 128B length

− 9.6 Gbps raw performance

• Combined hash/NFA technology

− No “explosion” in number of patterns due to wildcards

− Low system memory utilization

− Fast pattern database compilations and incremental updates

• Pattern identification in streamed data by matching across “work units”

• Utilizes a pipeline of processing blocks to provide a complete pattern matching solution



Data Compression Engine (DCE)

• Deflate

− RFC1951

• GZIP

− RFC1952

• Zlib

− RFC1950

− Interoperability with Zlib 1.2.5 compression library

• Encode

− RFC4648: Supports Base 64 encoding and decoding

• Operates at up to 600 MHz

− 10Gbps Compression rate

− 10Gbps Decompression rate

− 20Gbps Aggregate
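
The four formats listed above differ only in framing, which can be seen with the Python standard library. This is a software illustration of the RFC formats, not of the DCE interface; the payload is arbitrary test data.

```python
import base64
import gzip
import zlib

payload = b"DPAA " * 200

# RFC 1951: raw Deflate stream (negative wbits means no header or trailer).
compressor = zlib.compressobj(wbits=-15)
raw_deflate = compressor.compress(payload) + compressor.flush()

# RFC 1950: zlib stream (Deflate plus a 2-byte header and an Adler-32 trailer).
zlib_stream = zlib.compress(payload)

# RFC 1952: gzip stream (Deflate plus a gzip header and a CRC-32 trailer).
gzip_stream = gzip.compress(payload)

# RFC 4648: Base64 encoding of the original data.
b64_text = base64.b64encode(payload)

# Each format round-trips to the original payload.
assert zlib.decompress(raw_deflate, wbits=-15) == payload
assert zlib.decompress(zlib_stream) == payload
assert gzip.decompress(gzip_stream) == payload
assert base64.b64decode(b64_text) == payload
```

All three compressed streams carry the same Deflate data, so an engine that implements Deflate can serve zlib and gzip by adding or checking the wrapper.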

[Diagram: DCE with a Frame Agent (QMan, BMan, and bus interfaces), a Compressor, a Decompressor, and 4KB and 32KB history buffers; connected to a QMan portal, a BMan portal, and CoreNet]


RapidIO Message Manager (RMan)

[Diagram: RMan between RapidIO and QMan. Inbound RapidIO traffic passes through classification units (inbound rule matching) and reassembly units (with reassembly contexts) into QMan channels, each with WQ0 through WQ7; outbound RapidIO traffic is taken from QMan channels through an arbiter (ARB) and segmentation units (with disassembly contexts). Cores, the Frame Manager, SEC, and PME attach to the same QMan]

RMan Unit Comparison

| Feature | QorIQ P4080 | QorIQ P2040, P3, P5, T2, T4 |
|---|---|---|
| Outbound | | |
| Transactions Supported | Type 10 Doorbells; Type 11 Messaging | Type 5 NWRITE; Type 6 SWRITE; Type 8 Port-Write; Type 9 Data Streaming; Type 10 Doorbells; Type 11 Messaging |
| Queues | 1 Type 10 Doorbell; 2 Type 11 Messaging | Thousands of queues supporting Type 5, 6, 8-11 |
| Queue Arbitration | Round Robin | Data Path Acceleration Architecture: 3+3+1 SP+WRR |
| Segmentation Resources | 2 Segmentation Units | 4 Segmentation Units |
| Multicast Support | Type 11 256B PDU to 16 Destinations | Type 11 256B PDU to 32 Destinations |
| Inbound | | |
| Transactions Supported | Type 8 Port-Write; Type 10 Doorbells; Type 11 Messaging | Type 8 Port-Write; Type 9 Data Streaming; Type 10 Doorbells; Type 11 Messaging |
| Queues | 1 Type 8 Port-Write; 1 Type 10 Doorbell; 2 Type 11 Messaging | 1 Type 8 Port-Write; 1000s Type 9-11 |
| Classification | 2 Rules (Fixed); Type 11: [mbox] | 64 Rules (Exact or Wildcards), or map selected header fields to queue ID |
| Simultaneous Reassembly Contexts | 2 Type 11 | 16 Type 9, 11 |
| Additional Features | | |
| Traffic Management | N/A | Type 9: End-to-end XON/XOFF Per-Queue Flow Control |


Life of an Ingress Packet

• FMan receives packets

− Allocates internal buffers

− Retrieves data from MAC

• BMI

− Acquires a buffer from BMan

− Uses DMA to store data in it

• Parse+classify+keygen select a queue and policer profile

• Policer “colors” and optionally discards frame

• QMan applies active queue management and enqueues frame

• Frame is enqueued to one of a pool of cores

• An available core dequeues the FD for processing

[Diagram: a packet arrives at an FMan MAC (one 10GE and four GE) and passes through BMI, Parser, Classifier, Keygen, Policer, and QMI; BMI sends a buffer request to the Buffer Manager, which returns a buffer pointer, and DMA writes the packet to DDR memory; the Queue Manager applies WRED and enqueues the FD on one of WQ0 through WQ7; one of several Power Architecture cores (D-cache, I-cache, L2 cache) dequeues the FD]
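
The ingress steps can be strung together in a small software model. It is a sketch under simplifying assumptions: the policer and active queue management steps are left out, a CRC32 over the first 12 bytes of the frame stands in for parse, classify, and keygen, a single deque models the pool channel shared by the cores, and all function and field names are hypothetical.

```python
import zlib
from collections import deque


def ingress(frame, free_buffers, memory, pool_channel, num_fqs=4):
    """Model FMan receiving one frame and handing its FD to QMan."""
    if not free_buffers:
        return None                       # pool depleted: frame cannot be stored
    buf = free_buffers.popleft()          # BMI acquires a buffer from BMan
    memory[buf] = frame                   # DMA stores the frame data in the buffer
    fqid = zlib.crc32(frame[:12]) % num_fqs   # stand-in for parse + classify + keygen
    fd = {"addr": buf, "length": len(frame), "fqid": fqid}
    pool_channel.append(fd)               # QMan enqueues the FD towards the cores
    return fd


def core_dequeue(pool_channel, memory, free_buffers):
    """Model an available core taking the next FD and processing the frame."""
    if not pool_channel:
        return None
    fd = pool_channel.popleft()
    data = memory.pop(fd["addr"])         # core reads the frame from the buffer
    free_buffers.append(fd["addr"])       # buffer pointer is released to the pool
    return fd["fqid"], data


free_buffers = deque([0x1000, 0x2000])    # two hypothetical buffer addresses
memory = {}
pool_channel = deque()

frame = bytes(12) + b"payload"
fd = ingress(frame, free_buffers, memory, pool_channel)
result = core_dequeue(pool_channel, memory, free_buffers)
```

Only the descriptor travels through the queue; the frame data stays in the buffer until a core reads it, which is the split between queuing and buffer management described earlier.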

Channel Enqueue / Dequeue Example (QorIQ P4080)

[Diagram: QMAN with software portals n and n+1 (Core0 and Core1), each with an EQCR and a DQRR, and direct connect portals DCP Portal 0 (FMAN 1, with 1GE and 10GEC MACs, PCD, and QMI) and DCP Portal 3 (PME). Each portal has a dedicated channel with WQ0 through WQ7, and a pool channel with WQ0 through WQ7 is also shown. On enqueue, the destination work queue is taken from the frame queue descriptor (FQD[Dest_WQ]); dequeues are delivered through the portal that serves the channel]

DCP = Direct Connect Portal / Hardware Portal



QorIQ P4040

[Block diagram: QorIQ P4040. Power Architecture e500-mc cores, each with 32-Kbyte D-cache, 32-Kbyte I-cache, and 128-Kbyte backside L2 cache; two 512-Kbyte frontside L3 caches, each with a 64-bit DDR-2/3 memory controller; CoreNet coherency fabric with PAMUs (Peripheral Access Management Units); two Frame Managers (parse, classify, distribute; buffer), each with four 1GE and one 10GE; Queue Manager; Buffer Manager; Security 4.0; Pattern Match Engine 2.0; SRIO Message Unit; DMA; PCIe and SRIO over an 18-lane 5GHz SerDes; eLBIU, eOpenPIC, USB, SD/MMC, DUART, I2C, SPI, GPIO, security monitor, and real-time debug blocks]

QorIQ P3 Series – P3041 Block Diagram

• Quad e500mc Power Architecture

− 4 cores (up to 1.5GHz)

− Each with 128KB backside L2 cache

− 1MB Shared L3 Cache w/ECC

• Memory Controller

− DDR3/3L SDRAM up to 1.3 GHz

− 32/64 bit data bus w/ECC

• High Speed Interconnect

− 4 PCIe 2.0 Controllers

− 2 sRapidIO 2.1 Controllers, Type 9 and 11 messaging

− 2 SATA 2.0

• CoreNet Switch Fabric

• Ethernet

− 5 x 10/100/1000 Ethernet Controllers, or 4x 2.5Gb/s SGMII

− 1 x 10GE Controller

− All w/ Classification, H/W Queuing, Policing, Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588

− Up to 1 XAUI, 4 SGMII or 2.5Gb/s SGMII, 2 RGMII

• Device

− 45nm SOI Process

− 1295-pin package (37.5x37.5mm), pin compatible with P4040

[Block diagram: QorIQ P3041. Power Architecture e500-mc cores, each with 32 KB D-cache, 32 KB I-cache, and 128 KB backside L2 cache; 1024 KB frontside L3 cache with a 64-bit DDR3/3L memory controller; CoreNet coherency fabric with PAMUs; one Frame Manager (parse, classify, distribute; buffer) with five 1GE and one 10GE; Queue Manager; Buffer Manager; SEC 4.0; Pattern Match Engine 2.0; RMan; 2x DMA; four PCIe, two SRIO, and two SATA 2.0 over an 18-lane 5 GHz SerDes; eLBC, eOpenPIC, USB, SD/MMC, DUART, I2C, SPI, GPIO, security monitor, and real-time debug blocks]

QorIQ P5 Series – P5020 DPAA Components

• Dual e500mc-64 Power Architecture

− 2x 64-bit e500mc cores (up to 2 GHz)

− Each with 512 KB backside L2 cache

− Dual 1MB Shared L3 Cache w/ECC

− Supports up to 64GB addressability (36 bit physical addressing)

• Memory Controller

− Dual DDR3, 3L up to 1.3 GHz

− 32/64 bit data bus w/ECC

• High Speed Interconnect

− 4 PCIe 2.0 Controllers

− 2 SRIO 2.1 Controllers, Type 9 and 11 messaging

− 2 SATA 3Gb/s

− 2 USB 2.0 with PHY

• CoreNet Switch Fabric

• Ethernet

− 5 x 10/100/1000 Ethernet Controllers

− 1 x 10GE Controller (XAUI)

− All w/ Classification, Policing, H/W Queuing, Buffer Management, Checksum Offload, QoS, Lossless Flow Control, IEEE 1588v2, 4 SGMII, QSGMII

• Data Path Acceleration

− SEC 4

− PME 2

− RapidIO Messaging

• Device

− 45nm SOI Process

− 1295-pin package

[Block diagram: QorIQ P5020. Power Architecture e500mc-64 2GHz cores, each with 32 KB D-cache, 32 KB I-cache, and 512 KB backside L2 cache; 1024 KB frontside L3 cache with a 64-bit DDR-3 memory controller; CoreNet coherency fabric with PAMUs; one Frame Manager (parse, classify, distribute; buffer) with five 1GE and one 10GE; Queue Manager; Buffer Manager; SEC 4; Pattern Match Engine 2; RAID 5/6 Engine; SRIO Manager; 2x DMA; four PCIe, two SRIO, and two SATA 2.0 over an 18-lane 5 GHz SerDes; eLBC, eOpenPIC, USB, SD/MMC, DUART, I2C, SPI, GPIO, security monitor, and real-time debug blocks]


DPAA Component Comparison Reference

| Component | QorIQ P3041 | QorIQ P4040/80 | QorIQ P5020/40 | QorIQ T4240 / T2080 |
|---|---|---|---|---|
| Cores | 4 | 4/8 | 2/4 | 12 cores, 24 threads / 4 cores, 8 threads |
| QMan | 100M ops/sec; 256 CongGrp; 10 SP | 100M ops/sec; 256 CongGrp; 10 SP | 100M ops/sec; 256 CongGrp; 10 SP | 295M ops/sec; 256 CongGrp; 50 SP |
| BMan | 64 BufferPool | 64 BufferPool | 64 BufferPool | 64 BufferPool |
| Network IO | | | | |
| FMan | 18Mpps | 2 * 18Mpps | 18Mpps | 2 * 37.2 Mpps / 1 * 27.2 Mpps |
| Accelerator | | | | |
| SEC | 5Gbps (v4.2) | 10Gbps (v4.0) | 10Gbps (v4.2) | 40Gbps (v5) |
| PME | 5Gbps | 9.6Gbps | 9.6Gbps | 9.6Gbps |
| RE | n/a | n/a | Yes | n/a |
| RMan | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud | n/a (SRIO Rev 1.2) | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud | 1x, 2x, 4x @ 1.25, 2.5, 3.125 & 5G baud |
| DCE | n/a | n/a | n/a | 20Gbps |
| DCB | n/a | n/a | n/a | Yes |

Session Summary

• The Data Path Acceleration Architecture components include:

− Frame Manager

− Buffer Manager

− Queue Manager

− Hardware Accelerators (SEC, PME, DCE, RMan)

• These components are integrated to address multicore requirements such as:

− Load spreading

− Packet ordering

− Device virtualization

− Inter-core communication

− HW buffer management


For Further Information

• Freescale Website: DPAA

− http://www.freescale.com/webapp/sps/site/overview.jsp?code=QORIQ_DPAA

• Freescale Website: DPAA Reference Manual rev 2.0

− See individual device’s webpage

• Freescale Infocenter: SDK / USDPAA Information

− http://www.freescale.com/infocenter

• FTF Presentations

− FTF-NET-F0147 Data Path Acceleration Architecture (DPAA) Usage Scenarios

− FTF-NET-F0148 Data Path Acceleration Architecture (DPAA) Debug

− FTF-NET-F0031 QorIQ T4240 Communications Processor Deep Dive

− FTF-NET-F0111 Overview of Autonomous IPSec with QorIQ T Series Processors

− FTF-NET-F0246 Troubleshooting Techniques for QorIQ eTSEC and DPAA Platforms

− FTF-SDS-F0004 QorIQ Optimization Suite (QOS) Packet Analysis Tool


Session Closing

By now, you should be able to:

• Describe, at a high level, the DPAA module and how it is used in Freescale’s devices

• Apply the knowledge gained in this presentation to begin or refine your design efforts

Introducing the QorIQ LS2 Family

Breakthrough, software-defined approach to advance the world’s new virtualized networks

• New, high-performance architecture built with ease-of-use in mind

− Groundbreaking, flexible architecture that abstracts hardware complexity and enables customers to focus their resources on innovation at the application level

• Optimized for software-defined networking applications

− Balanced integration of CPU performance with network I/O and C-programmable datapath acceleration that is right-sized (power/performance/cost) to deliver advanced SoC technology for the SDN era

• Extending the industry’s broadest portfolio of 64-bit multicore SoCs

− Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling interconnect and peripherals to provide a complete system-on-chip solution

QorIQ LS2 Family Key Features

Unprecedented performance and ease of use for smarter, more capable networks

High performance cores with leading interconnect and memory bandwidth

• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2 cache, w/ Neon SIMD

• 1MB L3 platform cache w/ECC

• 2x 64b DDR4 up to 2.4GT/s

A high performance datapath designed with software developers in mind

• New datapath hardware and abstracted acceleration that is called via standard Linux objects

• 40 Gbps packet processing performance with 20Gbps acceleration (crypto, Pattern Match/RegEx, Data Compression)

• Management complex provides all init/setup/teardown tasks

Leading network I/O integration

• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE

• Integrated L2 switching capability for cost savings

• 4 PCIe Gen3 controllers, 1 with SR-IOV support

• 2 x SATA 3.0, 2 x USB 3.0 with PHY

[Graphic labels: SDN/NFV Switching; Data Center; Wireless Access]

See the LS2 Family First in the Tech Lab!

4 new demos built on QorIQ LS2 processors:

Performance Analysis Made Easy

Leave the Packet Processing To Us

Combining Ease of Use with Performance

Tools for Every Step of Your Design