
ECE 526 – Network Processing Systems Design


Page 1: ECE 526 – Network Processing Systems Design

ECE 526 – Network Processing Systems Design

IXP XScale and Microengines

Chapter 18 & 19: D. E. Comer

Page 2: ECE 526 – Network Processing Systems Design

Ning Weng ECE 526 2

Overview

• Recalled
  ─ Packet processing functions (forwarding, queuing…)
  ─ Traditional network processing systems (CPU + NICs)
  ─ General network processor architecture and tradeoffs
  ─ Intel IXP network processors overall architecture
• Focus on individual components of Intel IXP chip
  ─ Control processor (slow path): XScale core
    • Overall architecture
    • Typical functions
    • Processor features
  ─ Packet processing processor (fast path): Microengines
    • Architecture and features
    • Differences to conventional processors
    • Pipelining and multi-threading

Page 3: ECE 526 – Network Processing Systems Design


Purpose of Control Processor

• Functions typically executed by the embedded control processor:
  ─ Bootstrapping
  ─ Exception handling
  ─ Higher-layer protocol processing
  ─ Interactive debugging
  ─ Diagnostics and logging
  ─ Memory allocation
  ─ Application programs (if needed)
  ─ User interface and/or interface to the GPP
  ─ Control of packet processors
  ─ Other administrative functions

Page 4: ECE 526 – Network Processing Systems Design


XScale Memory Architecture

• Memory architecture
  ─ Uses 32-bit linear address space
  ─ Configurable endian mode
  ─ Byte addressable
• Memory mapping
  ─ Allocation of address space (2^32) to different system components
  ─ Accesses to memory are translated into accesses to the component
  ─ Needs to be carefully crafted
• XScale assumes byte-addressable memory
  ─ Underlying memory uses a different size (SDRAM)
  ─ How does this work?
• Support for virtual memory
  ─ For demand paging to secondary storage
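
One plausible answer to the "How does this work?" question is that the memory interface fetches the whole wide word that contains the requested byte and then selects the right byte lane. The sketch below illustrates that idea only. The 4-byte word width and the helper name `read_byte` are assumptions made for the example, not details of the IXP memory controller.

```python
WORD_BYTES = 4  # assumed word width, for illustration only

def read_byte(words, byte_addr, big_endian=True):
    """Return the byte at byte_addr from a memory stored as 32-bit words."""
    word = words[byte_addr // WORD_BYTES]    # fetch the whole containing word
    lane = byte_addr % WORD_BYTES            # position of the byte in the word
    if big_endian:
        shift = 8 * (WORD_BYTES - 1 - lane)  # byte 0 is the most significant
    else:
        shift = 8 * lane                     # byte 0 is the least significant
    return (word >> shift) & 0xFF

# Two 32-bit words standing in for the underlying wide memory.
memory = [0x11223344, 0xAABBCCDD]
first_byte_big = read_byte(memory, 0)
first_byte_little = read_byte(memory, 0, big_endian=False)
```

The `big_endian` flag mirrors the configurable endian mode listed above: the same word yields a different byte at address 0 depending on the mode.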

Page 5: ECE 526 – Network Processing Systems Design


Shared Memory Address Issues

• Memory is shared between XScale and Microengines
• Same data, but different addresses
• What impact does this have?
  ─ Pointers need to be translated
  ─ Data structures with pointers cannot be shared
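
Pointer translation can be pictured as rebasing an address from one processor's view of a memory region to the other's. The sketch below is a minimal illustration. The two base addresses and the function names are hypothetical and are not taken from the IXP documentation.

```python
# Hypothetical start of the same SRAM region as seen by each processor.
XSCALE_SRAM_BASE = 0x8000_0000  # assumed value, for illustration only
UE_SRAM_BASE = 0x0000_0000      # assumed value, for illustration only

def xscale_to_ue(addr):
    """Rebase an XScale address into the microengine's view of SRAM."""
    return addr - XSCALE_SRAM_BASE + UE_SRAM_BASE

def ue_to_xscale(addr):
    """Rebase a microengine SRAM address into the XScale's linear view."""
    return addr - UE_SRAM_BASE + XSCALE_SRAM_BASE

# The same datum has two different addresses, one per view.
xscale_ptr = 0x8000_0100
ue_ptr = xscale_to_ue(xscale_ptr)
```

A pointer stored inside a shared structure is valid in only one of the two views, so every embedded pointer would need this translation whenever the other processor follows it.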

Page 6: ECE 526 – Network Processing Systems Design


Microengines

• Microengines are the data-path packet processors of the IXP
• The IXP2400 has 8 Microengines
• Simpler than XScale
• Low-level device that acts as a micro-sequencer
• Optimized for packet processing
• More complex to use
• Often abbreviated as uE

Page 7: ECE 526 – Network Processing Systems Design


uE Functions

• uEs handle ingress and egress packet processing:
  ─ Packet ingress from physical layer hardware
  ─ Checksum verification
  ─ Header processing and classification
  ─ Packet buffering in memory
  ─ Table lookup and forwarding
  ─ Header modification
  ─ Checksum computation
  ─ Packet egress to physical layer hardware

Page 8: ECE 526 – Network Processing Systems Design


uE Architecture

• uE characteristics:
  ─ Programmable microcontroller
  ─ RISC design
  ─ 256 general-purpose registers
  ─ 512 transfer registers
  ─ 128 next neighbor registers
  ─ Hardware support for 8 threads and context switching
  ─ 640 words of local memory
  ─ Control of an Arithmetic and Logic Unit
  ─ Direct access to various functional units
  ─ A unit to compute a Cyclic Redundancy Check (CRC)

Page 9: ECE 526 – Network Processing Systems Design


uE as Micro-sequencer

• Micro-sequencer does not contain native instructions for every possible operation
  ─ Instead of using instructions, uE invokes functional units to perform operations
  ─ Control unit is much “simpler”
• Example 1:
  ─ uE does not have an ADD R2,R3 instruction
  ─ Instead: ALU ADD R2, R3
  ─ “ALU” indicates that the ALU should be used
  ─ “ADD” is a parameter to the ALU
• Example 2:
  ─ Memory access not by simple LOAD R2, 0xdeadbeef
  ─ Instead: SRAM LOAD R2, 0xdeadbeef
• Altogether similar to a normal processor, but more basic
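
The two examples above can be mimicked with a toy model in which the sequencer only routes a request to a named functional unit and passes the operation along as a parameter. This is a sketch for intuition, not microengine syntax. The class name `MicroSequencer`, the register set, and the reading of `ALU ADD R2, R3` as R2 = R2 + R3 are all assumptions.

```python
class MicroSequencer:
    """Toy sequencer: it dispatches to functional units, it has no opcodes."""

    def __init__(self):
        self.regs = {"R1": 0, "R2": 0, "R3": 0}
        self.sram = {}  # address -> value, standing in for external SRAM
        # The first token of each line selects one of these functional units.
        self.units = {"ALU": self._alu, "SRAM": self._sram}

    def _alu(self, op, dst, src):
        # The operation is a parameter handed to the ALU.
        if op == "ADD":
            self.regs[dst] = (self.regs[dst] + self.regs[src]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported ALU operation: {op}")

    def _sram(self, op, reg, addr):
        if op == "LOAD":
            self.regs[reg] = self.sram.get(addr, 0)
        elif op == "STORE":
            self.sram[addr] = self.regs[reg]
        else:
            raise ValueError(f"unsupported SRAM operation: {op}")

    def execute(self, unit, op, *args):
        """Invoke the named functional unit with the operation as a parameter."""
        self.units[unit](op, *args)

seq = MicroSequencer()
seq.sram[0xDEADBEEF] = 7
seq.regs["R3"] = 5
seq.execute("SRAM", "LOAD", "R2", 0xDEADBEEF)  # SRAM LOAD R2, 0xdeadbeef
seq.execute("ALU", "ADD", "R2", "R3")          # ALU ADD R2, R3
```

The only logic in `execute` is the lookup of the unit, which is the sense in which the control unit stays simple.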

Page 10: ECE 526 – Network Processing Systems Design


uE Instruction Set

• General
  ─ ALU, etc.
• Branch and Jump
  ─ BR: branch unconditionally
• CAM
  ─ CAM_CLEAR: clear all entries in local memories
• I/O and context swap
  ─ SCRATCH (read and write)
• For details, see Figures 19.1 and 19.2 in Comer.

Page 11: ECE 526 – Network Processing Systems Design


uE Memories

• uEs view memories differently than XScale does
  ─ Do not map memories and I/O devices into a linear address space
  ─ Do not view memories as a seamless, uniform repository
• uE ISA requires a separate instruction for each type of memory and I/O device
  ─ SRAM[read, $$x, address1, address2…]
• Programmer must permanently bind each data item to a specific type of memory
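
The contrast between the two views can be sketched as follows. In the XScale-style view, one linear address implicitly selects the component. In the uE-style view, the program names the memory type on every access. The address ranges, the `decode` helper, and the `MicroengineView` class are hypothetical and chosen only for illustration.

```python
# Hypothetical linear map: (first address, last address, component).
ADDRESS_MAP = [
    (0x0000_0000, 0x7FFF_FFFF, "SDRAM"),
    (0x8000_0000, 0xBFFF_FFFF, "SRAM"),
    (0xC000_0000, 0xFFFF_FFFF, "IO"),
]

def decode(addr):
    """XScale-style view: the address alone determines the component."""
    for start, end, component in ADDRESS_MAP:
        if start <= addr <= end:
            return component, addr - start
    raise ValueError("unmapped address")

class MicroengineView:
    """uE-style view: every access states which memory it targets."""

    def __init__(self):
        self.memories = {"SRAM": {}, "SDRAM": {}, "SCRATCH": {}}

    def write(self, mem_type, offset, value):
        self.memories[mem_type][offset] = value

    def read(self, mem_type, offset):
        return self.memories[mem_type].get(offset, 0)

ue = MicroengineView()
ue.write("SRAM", 0x10, 42)      # the data item is bound to SRAM explicitly
value = ue.read("SRAM", 0x10)
component, offset = decode(0x8000_0010)
```

In the second style, moving a data item to a different memory type means changing every access to it, which is why the binding is effectively permanent.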

Page 12: ECE 526 – Network Processing Systems Design


Execution Pipeline

• What is a pipeline?
• Why is a pipeline employed?
  ─ One instruction is executed per cycle if the pipeline is properly designed
• uEs use a five-stage or six-stage pipeline:

Page 13: ECE 526 – Network Processing Systems Design


Pipelining
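
The benefit of pipelining can be estimated with the standard idealized model: on a k-stage pipeline with no stalls, the first instruction takes k cycles and each later one completes one cycle after its predecessor. The sketch below applies that textbook formula. The function names are chosen for this example.

```python
def pipelined_cycles(n_instructions, n_stages):
    """Cycles to finish n instructions on an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # Fill the pipeline once, then retire one instruction per cycle.
    return n_stages + n_instructions - 1

def unpipelined_cycles(n_instructions, n_stages):
    """Cycles when each instruction passes through all stages before the next starts."""
    return n_instructions * n_stages

def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))

five_stage = pipelined_cycles(100, 5)
six_stage = pipelined_cycles(100, 6)
```

For long instruction sequences the speedup approaches the number of stages, which is the "one instruction per cycle" behavior of a properly designed pipeline.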

Page 14: ECE 526 – Network Processing Systems Design


Pipelining Problems

• Possible sources of pipelining problems
  ─ Data dependencies
  ─ Control dependencies
  ─ Resource dependencies
  ─ Memory accesses
• How pipelining problems impact system performance
• How these impacts can be removed or reduced
  ─ Remove the sources so that no stall happens
  ─ Hide the impact of pipeline stalls

Page 15: ECE 526 – Network Processing Systems Design


Pipeline Stalls

• K: ALU ADD R2, R1, R2
• K+1: ALU ADD R3, R2, R3
• Control dependencies and memory accesses have an even bigger impact
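
The stall in the K / K+1 example comes from a read-after-write data dependency: instruction K+1 reads R2 before instruction K has written it back. A minimal sketch of detecting such pairs follows. It assumes the first operand of each instruction is the destination and the other two are sources. The tuple encoding and the name `raw_hazards` are made up for the example.

```python
def raw_hazards(program):
    """Return index pairs (i, i+1) where instruction i+1 reads what i writes.

    Each instruction is a tuple (dst, src_a, src_b) of register names.
    """
    hazards = []
    for i in range(len(program) - 1):
        dst = program[i][0]
        _, src_a, src_b = program[i + 1]
        if dst in (src_a, src_b):
            hazards.append((i, i + 1))
    return hazards

# K:   ALU ADD R2, R1, R2
# K+1: ALU ADD R3, R2, R3
program = [("R2", "R1", "R2"), ("R3", "R2", "R3")]
found = raw_hazards(program)
```

Only adjacent instructions are checked here. A real pipeline may also need to consider instructions further apart, depending on how many stages separate the register read from the write-back.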

Page 16: ECE 526 – Network Processing Systems Design


Threading Illustration

Page 17: ECE 526 – Network Processing Systems Design


Hardware Threads

• uEs support 8 hardware thread contexts
  ─ Only one thread can execute at any given time
  ─ When a stall occurs, the uE can switch to another thread (if it is not stalled)
• Very low overhead for context switch
  ─ “Zero-cycle context switch”
  ─ Effectively can take around three cycles due to pipeline flush
• Switching rules
  ─ If a thread stalls, check if the next one is ready for processing
  ─ Keep trying until a ready thread is found
  ─ If none is available, stall the uE and wait for any thread to unblock
• Improves overall throughput
• Questions:
  ─ Why not 16 or 32 threads?
  ─ Why not have 48 uEs with 1 thread each?
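
The switching rules above can be explored with a small simulation. The sketch assumes a deliberately simple workload that is not from the slides: every thread alternates between `compute_cycles` of execution and a memory access that blocks it for `memory_cycles`. Context-switch cost is ignored, and the function name `utilization` is chosen for the example.

```python
def utilization(n_threads, compute_cycles, memory_cycles, total_cycles=10_000):
    """Fraction of cycles in which some thread is executing instructions."""
    ready_at = [0] * n_threads  # cycle at which each thread may run again
    current = 0                 # where the round-robin search starts
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        # Check threads in round-robin order until a ready one is found.
        for step in range(n_threads):
            t = (current + step) % n_threads
            if ready_at[t] <= cycle:
                run = min(compute_cycles, total_cycles - cycle)
                busy += run
                cycle += run
                ready_at[t] = cycle + memory_cycles  # thread blocks on memory
                current = (t + 1) % n_threads
                break
        else:
            # No thread is ready: the uE stalls until the first one unblocks.
            cycle = min(ready_at)
    return busy / total_cycles

one_thread = utilization(1, compute_cycles=10, memory_cycles=90)
eight_threads = utilization(8, compute_cycles=10, memory_cycles=90)
sixteen_threads = utilization(16, compute_cycles=10, memory_cycles=90)
```

In this toy model, utilization rises with the thread count only until the other threads' compute time covers the memory latency. Beyond that point extra threads add nothing, which is one angle on the questions above.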

Page 18: ECE 526 – Network Processing Systems Design


Summary

• Control processor (slow path): XScale core
  ─ Overall architecture
  ─ Typical functions
  ─ Processor features
• Packet processing processor (fast path): Microengines
  ─ Architecture and features
  ─ Differences to conventional processors
  ─ Pipelining and multi-threading

Page 19: ECE 526 – Network Processing Systems Design


Lab3 Brief

• Intel Reference Systems
• SDK Tutorial
• Lab 3

Page 20: ECE 526 – Network Processing Systems Design


Intel Reference Systems

• Hardware testbed
  ─ IXP2400 network processors
  ─ QDRM-SRAM, Flash ROM and other memories
  ─ 1G optical Ethernet ports
  ─ 100M Ethernet management port
  ─ Serial interface
  ─ PCI interfaces
• SDK (software development kit)
  ─ Compiler
  ─ Assembler, linker
  ─ Simulator
  ─ Reference code

Page 21: ECE 526 – Network Processing Systems Design


Lab3: Forwarding, Counting & Classification

• Goal: to explore the basic functionality of the IXP2400 software development kit and Microengines.
• 3 parts:
  ─ Part I: collecting a number of workload statistics from the IXP SDK simulator. Follow the steps of the lab instructions.
  ─ Part II: adding one counting block to count the number of packets.
  ─ Part III: implementing a simple packet classification mechanism.
• Tools: All three parts require access to a machine that has the Intel SDK installed. If you want, you can also request an installation CD for your own machine; check with the TA.

Page 22: ECE 526 – Network Processing Systems Design


Part I: Forwarding Simulation

• Run an implementation of IP forwarding on the IXP2400 simulator. All the code is provided to you.
• Collect a set of workload statistics that are reported by the simulator.

Page 23: ECE 526 – Network Processing Systems Design


Part II: Forwarding and Counting

• Modify the above application by adding a counter block
• Store how many packets are received.

Page 24: ECE 526 – Network Processing Systems Design


Part III: Classification and Counting

• Classify packets based on the packet header information. There are four types of traffic that are considered in this lab:
  ─ Web traffic over TCP over IPv4
  ─ Non-Web traffic over TCP over IPv4
  ─ UDP over IPv4
  ─ IPv6
• Modify the code to report the number of packets of each type.
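
The classification logic can be prototyped at a high level before it is written for the Microengines. The sketch below is in Python rather than microengine code and makes several assumptions that the lab instructions may define differently: "Web traffic" is taken to mean TCP with source or destination port 80, packets arrive as already-parsed header fields, and the category labels are invented for the example.

```python
from collections import Counter

TCP, UDP = 6, 17  # IANA-assigned IP protocol numbers
HTTP_PORT = 80    # assumption: "Web traffic" means TCP port 80

def classify(ip_version, protocol=None, src_port=None, dst_port=None):
    """Map parsed header fields to one of the four lab traffic types."""
    if ip_version == 6:
        return "ipv6"
    if ip_version == 4 and protocol == TCP:
        if HTTP_PORT in (src_port, dst_port):
            return "web_tcp_ipv4"
        return "nonweb_tcp_ipv4"
    if ip_version == 4 and protocol == UDP:
        return "udp_ipv4"
    return "other"  # anything outside the four types considered in the lab

def count_types(packets):
    """Count the packets of each type; packets are dicts of header fields."""
    return Counter(classify(**p) for p in packets)

packets = [
    {"ip_version": 4, "protocol": TCP, "src_port": 40000, "dst_port": 80},
    {"ip_version": 4, "protocol": TCP, "src_port": 40001, "dst_port": 22},
    {"ip_version": 4, "protocol": UDP, "src_port": 53, "dst_port": 40002},
    {"ip_version": 6},
]
counts = count_types(packets)
```

On the Microengines the same decisions would be made on raw header bytes, and the per-type counters would have to live in a specific memory.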

Page 25: ECE 526 – Network Processing Systems Design


How to do Lab3

• Windows machine with SDK installed
• Download lab instructions and source code from Blackboard
• Start early.
• Very exciting lab.
• Due dates
  ─ Part I and Part II: 10/13
  ─ Part III: 10/20