
Arista @ HPC on Wall Street 2012



Page 1

HPC on Wall Street - 2012

20-25% CAGR in market volumes

Competitive advantage hinges on speed, transparency, and proximity to data sources. The application must be in the data path – seamlessly

Quest to balance risk/compliance with performance

Page 2


10GbE Switches for the Virtualized Data Center, but a software company at the core

• >1300 customers
• >325 employees
• Profitable, self-funded, pre-IPO network infrastructure provider
• Open Linux-based OS
• Fully automated testing and SW development

Page 3

Arista Application Switch - 7124FX

• Couples an ultra-low-latency switch with a next-generation programmable FPGA and memory subsystem

• Customer-programmable FPGA and control plane provide total control over the network: forwarding, inspection, redirection, etc.

• Targeted at early adopters of hardware-accelerated applications such as risk analysis, data arbitrage, and order routing

Page 4


Exegy believes…

• Exegy believes in continually challenging the status quo of market data delivery systems and trading platforms.
  – First to market with hardware-accelerated market data appliances based on FPGA technology.
  – Best-of-breed solutions for major use cases faced by low-latency, high-capacity consumers of financial market data feeds.

• Exegy believes that delivery and consumption of quality market data should be as easy and painless as possible.
  – Fully managed and constantly monitored appliances to assure optimal performance and the best customer experience.
  – A passion to help our customers succeed in the face of escalating complexity and the increasing demands placed on them.

Page 5

Impulse C, Custom FPGA-Accelerated Solutions for the Arista 7124FX

Brian Durwood, Co-founder

Converting C to multiple streaming hardware processes ain’t that hard.
• Focus on reducing clock cycles
• Verify as you go
• Iterate, iterate, iterate (no “magic button”)

The tool flow is a bit awkward for first-timers:
• Visual Studio or equivalent
• Impulse C co-development, analysis & compile
• Altera Quartus II for place & route into FPGA

Things you can do to get up to speed quickly:
• Work from known-good SW modules
• Get up-front training or factory engineering

Page 6

Programming With Impulse C

Not a new language: based on standard ANSI C

• C-language for FPGA programming
  – For embedded and HPC applications
  – Supports standard C development tools
  – Supports multi-process partitioning

• A software-to-hardware compiler
  – Optimizes C code for parallelism
  – Generates HDL, ready for FPGA synthesis
  – Also generates hardware/software interfaces

• Purpose
  – Describe hardware accelerators using C
  – Move compute-intensive functions to FPGAs

[Diagram: C-language applications; generate accelerator hardware (HDL files); generate hardware interfaces; generate software interfaces (C software libraries); target: Arista’s on-board FPGA]

www.ImpulseAccelerated.com

Page 7

Reference slides from here on

www.ImpulseC.com

Page 8

Custom FPGA-Accelerated Solutions for the Arista 7124FX

Brian Durwood, Co-founder

Converting C to Multiple Streaming Hardware Processes

Page 9

FPGAs – Advantages Over Software

• Massive parallelism
  – At system level, loop level, instruction level

• One FPGA can replace multiple CPUs
  – For specific tasks/algorithms, using much lower power

• No need for a separate NIC card
  – Enables inline processing at near line speed

• Minimizes OS interference in filtering
  – Especially during high-transaction-load events
  – Reduces jitter and other interference

• Offloads standard CPUs with customized pre-processors
  – e.g. select limited analysis of X message types that meet X criteria for X symbols
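The pre-processor idea in the last bullet can be sketched as a plain C software model. The message layout, the type codes, and the names `msg_t`, `filter_t`, and `prefilter` are hypothetical, chosen only for illustration; this is not Impulse C output or any Arista or Exegy interface. In an FPGA the three checks could be evaluated side by side for every message; the C version simply runs them in sequence.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical normalized market-data message; layout is illustrative only. */
typedef struct {
    char     type;       /* e.g. 'A' = add order, 'E' = execution (assumed codes) */
    char     symbol[8];  /* space-padded ticker, not NUL-terminated */
    uint32_t price;      /* price in ticks */
    uint32_t qty;        /* shares */
} msg_t;

/* Filter criteria: accepted message types, watched symbols, minimum size. */
typedef struct {
    const char  *types;       /* NUL-terminated list of accepted type codes */
    const char (*symbols)[8]; /* watchlist of space-padded symbols */
    size_t       n_symbols;
    uint32_t     min_qty;
} filter_t;

static int symbol_watched(const filter_t *f, const char sym[8]) {
    for (size_t i = 0; i < f->n_symbols; i++)
        if (memcmp(f->symbols[i], sym, 8) == 0)
            return 1;
    return 0;
}

/* Copy only matching messages to out[]; return how many were kept. */
size_t prefilter(const msg_t *in, size_t n, const filter_t *f, msg_t *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        const msg_t *m = &in[i];
        /* strchr would match the terminator for '\0', so reject it explicitly */
        if (m->type == '\0' || strchr(f->types, m->type) == NULL) continue;
        if (m->qty < f->min_qty) continue;
        if (!symbol_watched(f, m->symbol)) continue;
        out[kept++] = *m;
    }
    return kept;
}
```

For example, with accepted types "AE", a two-symbol watchlist, and a 100-share minimum, only add and execution messages for watched symbols at or above 100 shares would be forwarded to the host CPU.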

Page 10

3 Popular FPGA Configurations

Usage option 1: Create a hardware module (generated hardware module in the FPGA)

Usage option 2: Accelerate an embedded CPU (generated hardware accelerators alongside an embedded CPU core in the FPGA)

Usage option 3: Accelerate an external/host CPU or computing cluster (FPGA coprocessor containing generated hardware accelerators)

Page 11

Configurations Can Be Combined

Combining streaming, embedded processor, and host processor

FPGA strategies can be coded in C, both for hardware and for the embedded CPU, with shared RAM for hash-table lookups or other local data.

[Diagram: FPGA attached to 10G Ethernet, containing stream processing and parsing, matching algorithm and strategy, host message generation, an embedded CPU for configuration, and embedded and shared RAM]

Page 12


Impulse C Programming Model

• Communicating C-language processes
  – Supports dataflow and message-based communications
  – Supports parallelism at the application level and at the level of individual processes
  – Allows simulation and debugging of parallel software processes

[Diagram: S/W process → H/W process → H/W process → H/W process → S/W process]

Page 13

Parallelism via Multiple Processes

• Spatial parallelism

• Temporal parallelism (system-level pipelining)
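Temporal parallelism can be illustrated with a small cycle-by-cycle model of a three-stage pipeline in ordinary C. The stage operations and the names `stage_reg_t`, `stage_op`, and `run_pipeline` are hypothetical; the point is only the timing. Once the pipeline fills, one result leaves per cycle, so n items take n + 2 cycles instead of 3n stage-steps. Spatial parallelism, which would replicate a stage so several items are handled in the same cycle, is not modeled here.

```c
#include <stddef.h>

#define STAGES 3

/* One pipeline register per stage: a valid flag plus the value it holds. */
typedef struct { int valid; int value; } stage_reg_t;

/* Hypothetical per-stage operations; any three dependent steps would do. */
static int stage_op(int stage, int x) {
    switch (stage) {
    case 0:  return x + 1;   /* e.g. parse */
    case 1:  return x * 2;   /* e.g. compute */
    default: return x - 3;   /* e.g. format */
    }
}

/* Clock the pipeline until all n inputs have left the last stage.
   Writes results to out[] and returns the number of clock cycles used. */
size_t run_pipeline(const int *in, size_t n, int *out) {
    stage_reg_t reg[STAGES] = {{0, 0}};
    size_t fed = 0, done = 0, cycles = 0;

    while (done < n) {
        cycles++;
        /* Update from the last stage backwards so each value moves one stage per cycle. */
        for (int s = STAGES - 1; s >= 0; s--) {
            if (s == 0) {
                if (fed < n) { reg[0].valid = 1; reg[0].value = stage_op(0, in[fed++]); }
                else         { reg[0].valid = 0; }
            } else {
                reg[s].valid = reg[s - 1].valid;
                if (reg[s - 1].valid) reg[s].value = stage_op(s, reg[s - 1].value);
            }
        }
        if (reg[STAGES - 1].valid) out[done++] = reg[STAGES - 1].value;
    }
    return cycles;
}
```

With five inputs the model reports 7 cycles: the first result appears on cycle 3 and one more follows on each later cycle.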

Page 14

An Impulse C Process

• Processes are independently synchronized

• Multiple methods of process-to-process communication are supported:
  – Stream inputs and outputs
  – Register inputs and outputs
  – Signal inputs and outputs
  – Shared memory block reads/writes
  – App Monitor outputs

Page 15

Compile and Optimize

• Optimize the results using interactive tools
  – Pipeline analysis
  – Loop unrolling
  – Instruction scheduling

• Generate FPGA hardware
  – VHDL or Verilog
  – Low-level interfaces to memory, I/O, and buses
  – ModelSim test bench
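Loop unrolling can be shown with a hand-unrolled dot product in plain C. This illustrates the transformation only; it is not what the Impulse tools emit, and the function names are hypothetical. Splitting the accumulator into four independent partial sums removes the dependency between consecutive multiply-adds, which is what allows a scheduler to issue them in parallel.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one multiply-accumulate per iteration. */
int64_t dot_rolled(const int32_t *a, const int32_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return acc;
}

/* Unrolled by 4 with independent partial sums, so four multiplies per
   iteration have no data dependency on each other; the tail loop handles
   the remaining elements when n is not a multiple of 4. */
int64_t dot_unrolled4(const int32_t *a, const int32_t *b, size_t n) {
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += (int64_t)a[i]     * b[i];
        s1 += (int64_t)a[i + 1] * b[i + 1];
        s2 += (int64_t)a[i + 2] * b[i + 2];
        s3 += (int64_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)
        s0 += (int64_t)a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```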

Page 16

Debug and Verify

• Use C tools for application debugging
  – Source-level debuggers
  – C-language testing

• Test and analyze parallel dataflow with the Impulse Application Monitor

• Automatically generate VHDL or Verilog test benches

Page 17

Constructs Familiar to C Programmers

co_stream_create      Used in configuration
co_stream_open        Open the stream (clear eos)
co_stream_close       Close the stream (set eos)
co_stream_eos         Check end of stream (eos)
co_stream_read        Read from stream (with rdy, en)
co_stream_write       Write to stream (with rdy, en)
co_stream_read_nb     Non-blocking read (no rdy)
co_stream_write_nb    Non-blocking write (no rdy)

Concept is similar to getc(), putc() in C for I/O
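To make the stream semantics in the table concrete, here is a software stand-in: a bounded FIFO with an end-of-stream flag. All names (`sw_stream_t`, `sw_stream_open`, `sw_stream_read_nb`, and so on) and the FIFO depth are hypothetical; this is not the Impulse C library and does not reproduce its signatures. In this model, closing the stream sets eos, but a reader only sees end of stream after draining the data already written.

```c
#include <stddef.h>

#define SW_STREAM_DEPTH 4   /* FIFO depth: an arbitrary choice for the sketch */

/* Hypothetical software model of a stream: bounded FIFO plus a writer-open flag. */
typedef struct {
    int    buf[SW_STREAM_DEPTH];
    size_t head, count;
    int    open;   /* 1 while the writer has the stream open */
} sw_stream_t;

enum { SW_OK = 0, SW_EOS = 1, SW_NOT_READY = 2 };

void sw_stream_open(sw_stream_t *s)  { s->head = 0; s->count = 0; s->open = 1; } /* clears eos */
void sw_stream_close(sw_stream_t *s) { s->open = 0; }                            /* sets eos   */

/* End of stream: the writer has closed AND no data is left to drain. */
int sw_stream_eos(const sw_stream_t *s) { return !s->open && s->count == 0; }

/* Non-blocking write: fails immediately instead of waiting when the FIFO is full. */
int sw_stream_write_nb(sw_stream_t *s, int v) {
    if (!s->open) return SW_EOS;
    if (s->count == SW_STREAM_DEPTH) return SW_NOT_READY;
    s->buf[(s->head + s->count) % SW_STREAM_DEPTH] = v;
    s->count++;
    return SW_OK;
}

/* Non-blocking read: SW_NOT_READY if empty but still open, SW_EOS once drained and closed. */
int sw_stream_read_nb(sw_stream_t *s, int *v) {
    if (s->count == 0) return s->open ? SW_NOT_READY : SW_EOS;
    *v = s->buf[s->head];
    s->head = (s->head + 1) % SW_STREAM_DEPTH;
    s->count--;
    return SW_OK;
}
```

A blocking read in hardware presumably waits on the rdy handshake; in this single-threaded model the equivalent is retrying `sw_stream_read_nb` while it returns `SW_NOT_READY`. Draining a closed stream with `while (sw_stream_read_nb(&s, &v) == SW_OK)` mirrors the familiar `while ((c = getc(f)) != EOF)` loop.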

Page 18

Credible Solution in Use By:

• Multiple confidential financial teams (NDA-covered)

Page 19

Impulse Platform Support Package

• PSP generates HW/SW wrappers between the FPGA core and system elements

• Produces:
  – Extensions (scripts and wrapper generators)
  – Platform-specific library functions
  – Documentation and tutorials
  – Current ready-to-run examples for the platform

[Diagram: Impulse CoDeveloper™ and the FPGA fabric processing core, connected to Ethernet, host interfaces, memory resources, the FPGA embedded processor, and other I/O]

Page 20

Examples of FPGA processing:

• Financial feed kernel bypass or full hardware-based trading
  – Direct handling of financial feeds
  – Parsing incoming feeds and triggering outbound orders: your strategy in hardware

• Normalization or protocol conversion
  – Gateway sending a sub-feed of data

• Pre-trade risk checking
  – Low-latency broker-dealer compliance

• Financial valuations
  – Co-processor offloading for Monte Carlo and other algorithms
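Pre-trade risk checking reduces to a few independent comparisons per order, which is why it suits inline hardware. The sketch below assumes hypothetical order and limit layouts (`order_t`, `limits_t`), integer prices in 1/10000 currency units, and three example checks: maximum quantity, maximum notional, and a price collar in basis points. Real broker-dealer compliance rules are broader. It also assumes prices and quantities small enough that the products fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical order layout; prices are integers in 1/10000 currency units. */
typedef struct {
    char     side;    /* 'B' buy, 'S' sell */
    uint32_t qty;     /* shares */
    uint64_t price;   /* 1/10000 units */
} order_t;

/* Hypothetical per-order limits. */
typedef struct {
    uint32_t max_qty;       /* per-order share limit */
    uint64_t max_notional;  /* per-order qty * price limit, same price units */
    uint64_t ref_price;     /* reference price, e.g. last trade */
    uint32_t collar_bps;    /* allowed deviation from ref_price, in basis points */
} limits_t;

typedef enum { RISK_OK, RISK_BAD_SIDE, RISK_QTY, RISK_NOTIONAL, RISK_PRICE_COLLAR } risk_result_t;

/* Each test depends only on the order and fixed limits, so hardware could
   evaluate them in parallel; this C version checks them in sequence. */
risk_result_t pre_trade_check(const order_t *o, const limits_t *l) {
    if (o->side != 'B' && o->side != 'S') return RISK_BAD_SIDE;
    if (o->qty == 0 || o->qty > l->max_qty) return RISK_QTY;
    if ((uint64_t)o->qty * o->price > l->max_notional) return RISK_NOTIONAL;
    /* Outside the collar when |price - ref| / ref > collar_bps / 10000. */
    uint64_t diff = o->price > l->ref_price ? o->price - l->ref_price
                                            : l->ref_price - o->price;
    if (diff * 10000u > l->ref_price * l->collar_bps) return RISK_PRICE_COLLAR;
    return RISK_OK;
}
```

With a reference price of 1,000,000 (100.00) and a 500 bp collar, an order at 104.00 passes while one at 110.00 is rejected.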

Page 21

Stand-Alone Feed Handling Solution

[Diagram (Usage 3): 1G or 10G Ethernet MAC; RX adapter (Verilog); feed handler and outbound UDP (Impulse C); TX adapter (Verilog)]

Page 22

Network Processing Pipeline

UDP and TCP/IP implemented directly in FPGA hardware for low latency

[Diagram: FPGA containing a 1/10GigE MAC, Enet filter, UDP parser and/or TCP/IP stack, custom filtering application, embedded CPU, and host I/O interface; host system with driver, user application, and host memory]
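The UDP parsing stage of such a pipeline can be sketched in software to show which header fields are involved. The sketch assumes an untagged Ethernet II frame carrying IPv4 and UDP; it skips checksum verification, VLAN tags, and IP fragments, and the names `udp_view_t` and `parse_udp` are hypothetical. Field offsets follow the standard Ethernet, IPv4 (RFC 791), and UDP (RFC 768) header layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse result: UDP destination port and the payload span inside the frame. */
typedef struct {
    uint16_t       dst_port;
    const uint8_t *payload;
    size_t         payload_len;
} udp_view_t;

/* Read a big-endian (network order) 16-bit field. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Returns 0 on success, -1 if the frame is not IPv4/UDP or is truncated. */
int parse_udp(const uint8_t *frame, size_t len, udp_view_t *out) {
    if (len < 14) return -1;                             /* Ethernet II header */
    if (be16(frame + 12) != 0x0800) return -1;           /* EtherType: IPv4 */
    const uint8_t *ip = frame + 14;
    size_t ip_avail = len - 14;
    if (ip_avail < 20 || (ip[0] >> 4) != 4) return -1;   /* IP version 4 */
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;             /* IP header length, bytes */
    if (ihl < 20 || ip_avail < ihl + 8) return -1;
    if (ip[9] != 17) return -1;                           /* protocol: UDP */
    const uint8_t *udp = ip + ihl;
    uint16_t udp_len = be16(udp + 4);                     /* UDP header + payload */
    if (udp_len < 8 || ip_avail - ihl < udp_len) return -1;
    out->dst_port    = be16(udp + 2);
    out->payload     = udp + 8;
    out->payload_len = (size_t)udp_len - 8;
    return 0;
}
```

In hardware the same field extractions would typically be fixed-offset byte selects on the incoming data path rather than pointer arithmetic.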

Page 23

Complex Order Support

Incoming and outgoing processing on an FPGA or FPGA-based board:
• Replace NIC
• Apply trade logic
• Processing without OS
• Revert feed to exchange formats
• Hardwire potential X required responses
• Normalizing across feeds
• Ultra-fast pattern matching
• Message management with exchanges
• Decompression
• Pull and present opportunities
• Decryption
• Produce sub-feed
• Manage risk
• Insert risk limitations awaiting confirmation

Connected on both the incoming and outgoing sides over 10 Gb/s Ethernet:
• Exchanges, feed handlers, and order data sources, via direct connection (Impulse UDP/TCP)
• Standard and custom feed handler formats, e.g. ITCH, OUCH, OPRA, BATS, and generic UDP
• Adapters: RMDS, Bloomberg, and custom
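Normalizing across feeds means mapping each source’s wire format onto one internal record. The two 16-byte formats below are invented for illustration: "feed A" is big-endian with prices in cents, and "feed B" is little-endian with prices in 1/10000 units. They do not correspond to ITCH, OPRA, BATS, or any other real protocol, and `norm_quote_t` and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common internal quote record: price in 1/10000 units. */
typedef struct {
    char     symbol[8];
    uint64_t price_e4;
    uint32_t qty;
} norm_quote_t;

static uint32_t rd_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}
static uint32_t rd_le32(const uint8_t *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}

/* Hypothetical feed A: symbol[8], big-endian price in cents, big-endian qty. */
void normalize_feed_a(const uint8_t m[16], norm_quote_t *q) {
    memcpy(q->symbol, m, 8);
    q->price_e4 = (uint64_t)rd_be32(m + 8) * 100;   /* cents -> 1/10000 */
    q->qty      = rd_be32(m + 12);
}

/* Hypothetical feed B: little-endian qty, little-endian price in 1/10000, symbol[8]. */
void normalize_feed_b(const uint8_t m[16], norm_quote_t *q) {
    q->qty      = rd_le32(m);
    q->price_e4 = rd_le32(m + 4);
    memcpy(q->symbol, m + 8, 8);
}
```

Downstream logic (matching, strategy, risk) then handles only `norm_quote_t`, so adding a feed means adding one more normalizer rather than touching the strategy.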

Page 24

Three Ways To Get Started

• Learn the tools
  – Acquire an Impulse CoDeveloper license.
  – Work from the included reference designs.
  – Experiment with ways to optimize your algorithms to run efficiently as multiple streaming processes in FPGA.

• Turnkey system (“bump in the wire”)
  – License above + UDP or other network-attached FPGA-enabled reference design.
  – FPGA-based accelerator platform.
  – Impulse factory engineers to help get your system online.

• Turnkey system running a target algorithm
  – License and turnkey system above + Impulse engineers, under NDA, refactor your target algorithm(s) for efficient compilation to FPGA.
  – Impulse engineers train your team on how the refactoring works.

Page 25

About Impulse

• Most widely used C-to-FPGA tool

• Pure ANSI C: no PAR or HW statements inserted

• Founded in 2002 by part of the original ABEL team

Page 26

Additional Resources

Engineering: [email protected]

Tutorials: www.ImpulseAccelerated.com/Tutorials

Book: Practical FPGA Programming in C

Page 27

Arista Application Switch – Systems Design

Compute, Storage, Memory, I/O, Application Acceleration – Together

Page 28

Application Switching for Cloud Networks

Platform Details

• 24 wirespeed 1G/10G SFP/SFP+ ports: 16 base SFP/SFP+ ports and 8 FX SFP/SFP+ ports

• Console port, management port, USB port

• Clock input

• Air vents

• High availability: dual hot-swappable power supplies, multiple hot-swappable fan units

• Designed for data center + colocation: flexible front-to-rear or rear-to-front airflow, choice of AC or DC power supplies

Page 29

Arista Application Switch - 7124FX

Ultra-Low-Latency 24-Port 10GbE Switch
• 16 10GbE ports connected to the LLE ASIC
• 8 10GbE ports connected through a Stratix V FPGA
• Built-in 50GB SSD
• Optional chip-scale atomic clock and external clock source

Page 30


Application Switch Markets

Page 31

Financial Services Applications

• Instrument transaction performance at high resolution

• Offload line arbitration to dramatically improve application performance

• Reducing system latency increases performance of trading strategies

• Algorithmic trading

• Feed handling and A/B arbitration

• Real-time data analysis

• Order protocol conversion: convert or normalize multiple order entry formats to a common format

• Order execution routing: set order policies for best execution

• Inline risk analysis: low-latency broker-dealer compliance

Application Switching for Cloud Networks March 19, 2011
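A/B line arbitration takes the same sequenced messages delivered over two redundant lines and publishes each sequence number once, whichever copy arrives first. The sketch below assumes one sequence number per packet and leaves gap handling to the caller; the names `ab_arbiter_t` and `ab_arbitrate` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical arbiter state: the next sequence number expected downstream. */
typedef struct { uint64_t next_seq; } ab_arbiter_t;

typedef enum { AB_PUBLISH, AB_DUPLICATE, AB_GAP } ab_action_t;

/* Called for every packet from either line (A or B), in arrival order.
   The first copy of each sequence number wins and is published; the later
   copy from the other line is dropped as a duplicate. A sequence number
   ahead of next_seq means a gap: the caller decides whether to buffer the
   packet and wait for the other line or to request recovery. */
ab_action_t ab_arbitrate(ab_arbiter_t *arb, uint64_t seq) {
    if (seq < arb->next_seq) return AB_DUPLICATE;
    if (seq > arb->next_seq) return AB_GAP;
    arb->next_seq++;
    return AB_PUBLISH;
}
```

If line A delivers 1 and 3 while line B delivers 1 and 2, the arbiter publishes A:1, drops B:1, flags A:3 as a gap, and then publishes B:2 to fill it.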

Page 32


Developing on the Application Switch

Page 33

Application Switch Development Partners

• Complete integrated appliance model
  – Novasparks: 100% hardware market data solution
  – Exegy: appliance-based robust ticker plant

• System integrators and development support
  – Impulse C: C-to-RTL tools
  – Enyx: customer trading solutions and IP blocks

Page 34

Arista Application Switch 7124FX

• A new category of product that provides a network-accelerated platform for high-performance app vendors to develop on

• Combining a true network switch, with full routing and switching protocols, and fully programmable hardware creates a new market for the most demanding applications

• Application logic inserted into real-time environments with complete transparency