
Department of Particle & Particle Astrophysics

Sea-Of-Flash-Interface (SOFI): introduction and status

The PetaCache Review

Michael Huffer, [email protected]

Stanford Linear Accelerator Center
November 02, 2006


Outline

• Background
  – History of PPA involvement
  – Synergy with current activities
• Requirements
  – Usage model
  – System requirements
  – Individual client requirements
• Implementation
  – Abstract model and features
  – Building blocks
• Deliverables
  – Packaging
• Schedule
  – Status
  – Milestones
• Summary
  – Reuse
  – Conclusions


Background

• The Research Engineering Group (REG) supports a wide range of activities with limited resources
  – LSST, SNAP, ILC, SiD, EXO, LHC, LCLS, etc.
• Utilizing these resources most effectively requires understanding:
  – core competencies
  – the requirements of future electronics systems
• Two imperatives for REG:
  – Support upcoming experiments
  – Build for the future by advancing core competencies
• What are:
  – More detailed examples of a couple of upcoming experiments?
  – The necessary core competencies?


LSST

• SLAC/KIPAC is the lead institution for the camera
  – Camera contains > 3 gigapixels
    • > 6 gigabytes of data/image
    • Readout time is 1-2 seconds
  – KIPAC delivers the camera DAQ system

"The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15 second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."


SNAP

• SLAC is the lead institution for all non-FPA related electronics
  – One contact every 24 hours
  – Requires data to be stored on board the instrument
  – Storage capacity is roughly 1 terabyte (includes redundancy)
  – Examining NAND flash as a solution to the storage problem

"The Supernova/Acceleration Probe (SNAP) satellite observatory is capable of measuring thousands of distant supernovae and mapping hundreds to thousands of square degrees of the sky for gravitational lensing each year. The results will include a detailed expansion history of the universe over the last 10 billion years, determination of its spatial curvature to provide a fundamental test of inflation - the theoretical mechanism that drove the initial formation of structure in the universe, precise measures of the amounts of the key constituents of the universe, Ω_M and Ω_Λ, and the behavior of the dark energy and its evolution over time."


Core competencies

• System on Chip (SOC)
  – Integrated processors and functional blocks on an FPGA
• Small-footprint, high-performance, persistent memory systems
  – NAND flash
• Open Source R/T kernels
  – RTEMS (Real-Time Executive for Multiprocessor Systems)
• High-performance serial data transport and switching
  – MGTs (Multi-Gigabit Transceivers)
• Modern networking protocols:
  – 10 Gigabit Ethernet
  – InfiniBand
  – PCI-Express


PetaCache consistent with mission?

Project      Uses core technology?
             SOC    Memory   R/T kernels   H/S transport
LSST         yes    no       yes           yes
SNAP         no     yes      yes           no
PetaCache    yes    yes      yes           yes

Main Entry: syn·er·gy. Pronunciation: \ˈsi-nər-jē\. Function: noun. Inflected Form(s): plural -gies. Etymology: New Latin synergia, from Greek synergos, working together. 1 : SYNERGISM; broadly : combined action or operation. 2 : a mutually advantageous conjunction or compatibility of distinct business participants or elements (as resources or efforts).


Usage model

• System requirements:
  – Scalable, both in:
    • Storage capacity
    • Number of concurrent clients
  – Large address space
  – Random access
  – Support population evolution
• Features:
  – Changes are quasi-adiabatic
    • "Write once, read many"
  – Able to treat as a Read-Only system
• Requirements not addressed in this phase:
  – Access control
  – Redundancy
  – Cost

[Diagram: many clients, on many hosts, reach the data storage through a distribution, transport & management layer]

"Lots of storage, shared concurrently by many clients, distributed over a large number of hosts"


Client Requirements

• Uniform access time to fetch a "fixed" amount of data from storage
  – Implies: deterministic and relatively "small" latency in round-trip time
    • Where "fixed" is O(8 Kbytes) and "small" is O(200 microseconds)
  – Needs approximately 40 Mbytes/sec between client & storage (checked below)
• Access time scales independently of:
  – Address
  – Number of concurrent clients

Two contributions to latency:
  – Storage access time (the PetaCache project's focus is on this issue alone)
  – Distribution, transport, and management overhead
The SOFI architecture attempts to address both issues.
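
As a quick consistency check, the 40 Mbytes/sec requirement is simply the fixed fetch size divided by the round-trip latency budget:

$$ \frac{8\ \mathrm{KB}}{200\ \mu\mathrm{s}} = \frac{8192\ \mathrm{bytes}}{2\times 10^{-4}\ \mathrm{s}} \approx 40\ \mathrm{MB/s} $$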


Abstract model

• Key features:
  – Available concurrency and bandwidth scale with storage capacity
  – Many individual "memory servers"
    • Access granularity is 8 Kbytes
    • 16 GBytes of memory/server
    • 40 Mbytes/sec/server
  – Load leveling (see the sketch below)
    • Data randomly distributed over memory servers
    • Multicast for concurrent addressing
    • Both client and server side caching
  – Two address spaces
    • Physical page access
    • Logical block access
    • Hides data distribution from the client
  – Network attached storage

[Diagram: clients ↔ content-addressable switching ↔ memory servers, each a Flash Memory Controller (FMC)]
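
A minimal sketch of the static load-leveling idea above, assuming a hypothetical serverFor() helper and a generic 64-bit mixing hash; none of this is taken from the SOFI sources, and the actual distribution scheme may differ:

```cpp
#include <cstdint>

// Map a logical block to one of N memory servers pseudo-randomly, so a
// population spreads evenly and no single server becomes a hot spot.
// The mixer is a standard 64-bit finalizer, used here for illustration.
uint32_t serverFor(uint64_t logicalBlock, uint32_t numServers) {
  uint64_t h = logicalBlock;
  h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return static_cast<uint32_t>(h % numServers);
}
```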


Building Blocks

[Block diagram, host side to storage side:]
• Host (1 of n) runs the Client Interface (SOFI)
• 1 Gigabit Ethernet (0.1 GByte/sec) connects each host to an application-specific host interconnect
• Cluster Inter-Connect Module (CIM) bridges the host interconnect to the network attached storage over 8 x 10 G-Ethernet (8 GByte/sec)
• Slice Access Module (SAM) attaches via 10 G-Ethernet
• 1 GByte/sec PGP (Pretty Good Protocol) links each SAM to its Four Slice Module (FSM)
• Four Slice Module (FSM) carries 256 GBytes of flash


Four Slice Module (FSM)

[Block diagram: a clock/configuration FPGA hosts the PGP & command encode/decode initiator, connected through the PHY; inbound transfer & encode and outbound transfer & decode paths, each with its own arbiter and CRC-In/CRC-Out checking, fan out to four Flash Memory Controllers (FMC1-FMC4); 1 DIMM (8 devices) = 32 GBytes; 1 x 4 slices]


Flash Memory Controller (FMC)

• Implemented as an IP core
• Controls 16 GBytes of memory (4 devices) in units of:
  – Pages (8 Kbytes)
  – Blocks (512 Kbytes)
• Queues operations (sketched below):
  – Read Page (in units of 128-byte chunks)
  – Write Page
  – Erase Block
  – Read statistics counters
  – Read device attributes
• Transfers data at 40 Mbytes/sec
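
A hedged sketch of what the queued command set might look like as a C++ type; the names and layout are illustrative only (the real FMC is an IP core in the FPGA fabric):

```cpp
#include <cstdint>

// Illustrative encoding of the FMC operations listed above. Pages are 8 KB
// (read out in 128-byte chunks), erase blocks are 512 KB (64 pages), and one
// FMC spans 16 GB, i.e. 2^21 pages of 2^13 bytes each.
enum class FmcOp : uint8_t {
  ReadPage,        // one 8 KB page, delivered as 64 chunks of 128 bytes
  WritePage,       // program one 8 KB page
  EraseBlock,      // erase one 512 KB block
  ReadStatistics,  // statistics counters
  ReadAttributes   // device attributes
};

struct FmcCommand {   // hypothetical queue entry
  FmcOp    op;
  uint32_t page;      // physical page index, 0 .. 2^21 - 1
};
```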


Universal Protocol Adapter (UPA)

[Board diagram:]
• FPGA (SOC): Xilinx XC4VFX60, 200 DSPs, lots of gates
• PPC-405 (450 MHz)
• Configuration memory (128 Mbytes): Samsung K9F5608
• Memory (512 Mbytes): Micron RLDRAM II
• Multi-Gigabit Transceivers (MGT): 8 lanes
• 100-baseT Ethernet
• Fabric clock and MGT clock
• Reset, reset options, JTAG

The SAM is one half of a UPA pair.


UPA Features

• "Fat" memory subsystem
  – Sustains 8 GBytes/sec
  – "Plug-in" DMA interface (PIC)
• Designed as a set of IP cores
• Designed to work in conjunction with MGT and protocol cores
• Bootstrap loader (with up to 16 boot options and images)
• Interface to configuration memory
• Open Source R/T kernel (RTEMS)
• 100 base-T Ethernet interface
• Full network stack

"Think of the UPA as a Single Board Computer (SBC) which interfaces to one or more busses through its MGTs"


UPA Customization for the SAM

• Implements two cores:
  – PGP
  – 10-GE
• All 8 MGT lanes are used:
  – 4 lanes for the PGP core
  – 4 lanes for 10-GE
• Network driver to interface the 10-GE core to the network stack
• Executes application code to satisfy:
  – Server side of the SOFI client interface
    • Physical-to-logical translation
    • Server side caching
  – FSM management software
    • Proxies the FMC command set
    • Maintains bad blocks
    • Maintains available blocks


Cluster Inter-Connect Module (CIM)

[Block diagram:]
• High-speed data switch: 24 x 10-GE (Fulcrum FM2224), linking to the SAMs (high-speed) and to the host interconnect (data network)
• Low-speed management switch: 24 x FE + 4 x GE (Zarlink ZL33020), linking over 100 baseT to the SAMs (low-speed) and over 1000 baseT to the host interconnect (management network)
• Switch management (UPA), attached to the data switch by two 10-GE (XAUI) ports


Client/Server Interface

• The Client Interface resides on the host
• Servers reside on the SAMs
• Any one client on any one host has uniform access to all flash storage
• Clients access flash through the network interconnect
• Abstract interconnect model
  – Delivered implementation is IP (UDP and multicast services)
• The interface delivers three types of services (sketched below):
  – Random Read access to objects within the store
  – Population of objects within the store (Write and Erase access)
  – Access to performance metrics
• The Client Interface is object-oriented (C++)
  – Class library (distributed as a set of binaries and header files)
• Two address spaces (physical & logical)
  – Clients access information only in logical space
  – Clients are not sensitive to the actual physical location of information
  – Population distribution is pseudo-random (static load leveling)
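
The slides do not show the delivered class library, but a minimal sketch of its three service types might look like the following; all names are hypothetical, with stubs standing in for the UDP transport:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shape of the SOFI client interface: random reads, population
// (write access), and performance metrics, addressed purely in logical space.
namespace sofi {

struct Metrics { uint64_t reads = 0, bytes = 0; };

class Client {
public:
  explicit Client(const std::string& host) : host_(host) {}

  // Random Read: fetch 'length' bytes starting at logical block 'offset'
  std::vector<uint8_t> read(const std::string& partition,
                            const std::string& bundle,
                            uint64_t offset, uint32_t length) {
    (void)partition; (void)bundle; (void)offset;
    return std::vector<uint8_t>(length);  // stub: would issue a UDP request
  }

  // Population: write one 8 KB logical block into a bundle
  void write(const std::string& partition, const std::string& bundle,
             uint64_t offset, const std::vector<uint8_t>& block) {
    (void)partition; (void)bundle; (void)offset; (void)block;  // stub
  }

  Metrics metrics() const { return {}; }  // stub: performance counters

private:
  std::string host_;  // server reached over IP (UDP + multicast)
};

}  // namespace sofi
```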


Addressing

Physical addressing (1 page = 8 Kbytes). The physical address decomposes into interconnect, manager, controller, slice, and page fields; reading the field widths from the original figure as powers of two:

2^0 interconnects x 2^32 managers x 2^2 controllers x 2^2 slices x 2^21 pages = 2^57 pages, i.e. 128 peta-pages (1M peta-bytes; worked below).

Logical addressing (1 block = 8 Kbytes). A logical address names an interconnect plus partition, bundle, and block identifiers, each drawn from a 2^64 space.
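
As a check on the figure's arithmetic (the extraction mangled the exponents, so this is a reconstruction):

$$ 2^{0}\cdot 2^{32}\cdot 2^{2}\cdot 2^{2}\cdot 2^{21} = 2^{57}\ \mathrm{pages} = 128\ \mathrm{peta\,pages}, \qquad 2^{57}\ \mathrm{pages} \cdot 2^{13}\ \tfrac{\mathrm{bytes}}{\mathrm{page}} = 2^{70}\ \mathrm{bytes} = 1\mathrm{M}\ \mathrm{peta\,bytes} $$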


Using the interface

• A partition is a management tool
  – Logically segments storage into disjoint sets
  – One-to-one correspondence between a partition and a server
  – One SAM may host more than one server
• A bundle is an organization tool
  – A bundle belongs to one (and only one) partition
  – A bundle is an access-pattern hint, allowing:
    • fetch look-ahead
    • optimization of overlapping fetches from different clients
• Both partitions and bundles are assigned unique identifiers (over all time)
• Identifiers may have character names (aliases)
  – Assigned at population time
• A client query is composed of: partition/bundle/offset/length (see the example below)
  – offset is expressed in units of blocks
  – length is expressed in units of bytes
• Clients may query by either identifier or alias
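
A small worked example of the query units, with hypothetical values: since the offset counts 8 Kbyte blocks while the length counts bytes, the byte range a query covers is:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t kBlockBytes = 8 * 1024;  // one logical block = 8 KB
  const uint64_t offset = 12;             // query offset, in blocks
  const uint32_t length = 16 * 1024;      // query length, in bytes (2 blocks)
  const uint64_t begin  = offset * kBlockBytes;
  std::printf("query covers bytes [%llu, %llu)\n",
              (unsigned long long)begin,
              (unsigned long long)(begin + length));
  return 0;
}
```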


Deliverables

• Two FSMs (8 slices)
  – 1/2 TByte
• Two SAMs
  – Enough to support FSM operations
• Client/Server interface (SOFI)
  – Targeted to Linux
• How will the hardware be packaged?
  – Where packaging is defined as:
    • How the building blocks are partitioned
    • The specification of the electro-mechanical interfaces


The "Chassis"

[Mechanical sketch: an 8U chassis with a passive backplane; accepts DC power; X2 (XENPAK MSA) optical modules; 1U fan tray; 1U air inlet and 1U air outlet; Supervisor Card (8U); Line Card (4U)]

• 2 FSMs/card
  – 1/2 TByte
• 16 cards/bank
  – 8 TByte
• 2 banks/chassis
  – 64 SAMs
  – 1 CIM
  – 16 TByte
• 3 chassis/rack
  – 48 TByte (multiplied out below)
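
The capacity chain multiplies out as:

$$ 2\ \mathrm{FSM} \times 256\ \mathrm{GB} = 0.5\ \mathrm{TB/card};\quad 16 \times 0.5\ \mathrm{TB} = 8\ \mathrm{TB/bank};\quad 2 \times 8\ \mathrm{TB} = 16\ \mathrm{TB/chassis};\quad 3 \times 16\ \mathrm{TB} = 48\ \mathrm{TB/rack} $$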


48 TByte facility

[Diagram: one chassis feeding a Catalyst 6500 (3 x 4 10GE, 2 x 48 1GE), which serves SOFI hosts (1 x 96) running xRootD servers]


Schedule/Status

• Methodology:
  – Hardware
    • Implement 3 "platforms"
      – One for each type of module
    • Decouple packaging from architectural & implementation issues
      – Evaluate layout issues concerning high-speed signals
      – Evaluate potential packaging solutions
      – Allow concurrent development of VHDL & CPU code

[Diagram: Host side (Client API, logical/physical translation, cache management, IP protocol implementation) talks across "the wire" to the SAM side (IP protocol implementation, logical/physical translation, cache management, FSM interface)]

  – Software
    • Emulate the FSM component of the server software
      – Complete/debug in the absence of hardware
      – Allows clients an "early look" at the interface


Evaluation platforms

• UPA
  – Memory subsystem
  – Bootstrap loader
  – Configuration memory
  – RTEMS
  – Network stack/network driver interface issues
• CIM
  – Low- and high-speed management
  – Evaluate different physical interfaces (including X2)
• FSM line card (depending on packaging, this could be the production prototype)
  – FMC debug
  – PGP debug


Schedule

[Gantt chart spanning October through March:
• Activities: specification, design, schematic, layout, spin/load, debug, implement
• Hardware products: (1) UPA platform, (2) CIM platform, (3) Line Card PCB, (4) Supervisor PCB, (5) Backplane, plus chassis/mechanical
• Software products: PIC, RTEMS/UPA, UPA/PGP, UPA/10GE driver, UPA/10GE MAC, SOFI]


Milestones

Milestone                                    Date
RTEMS running on UPA evaluation platform     2nd week of December 2006
SOFI (emulation) ready                       3rd week of January 2007
Supervisor PCB ready for debug               3rd week of January 2007
Chassis & PCBs complete                      3rd week of February 2007
Start Test & Integrate                       2nd week of March 2007


Status

(Phases tracked: specification, design, implementation)
• SOFI: in progress (specification and design)
• DIMM, FCS, FSM: in progress
• SAM: in progress
• CIM: in progress
• UPA: in progress
• PGP core
• 10-GE core
• The "chassis"


Products & Reuse

Product         Targeted for use?
                PetaCache   LSST Camera DAQ   SNAP   LCLS DAQ   ATLAS Trigger Upgrade
UPA             yes         yes               no     yes        yes
10-GE core      yes         yes               no     yes        yes
PGP core        yes         yes               no     yes        yes
FCS             yes         no                yes    no         no
CIM             yes         yes               no     yes        yes
FSM             yes         no                no     no         no
SAM             yes         no                no     no         no
DIMM            yes         no                no     no         no
SOFI            yes         no                no     no         no
The "chassis"   yes         maybe             no     maybe      maybe


Conclusions

• Robust and well-developed architecture
  – Concurrency and bandwidth scale as storage is added
  – The logical address space hides the actual data distribution from the client
  – Network attached storage
  – Scalable (in size and number of users)
• The packaging solution may need an iteration
• Schedule
  – Somewhat unstable; however:
    • the sequence and activities are to a large degree correct
    • the risk is in the development of 10-GE
  – Well along the implementation road
• Well-developed synergy between PetaCache and the current activities of ESE
  – A great mechanism to develop core competencies
  – Many of the project deliverables are directly usable in other experiments