Copyright (C) 2016 Intel Corporation All Rights Reserved
IvyTown Xeon + FPGA: The HARP Program
Xeon+FPGA tutorial @ ISCA 2016
David Sheffield / Intel Labs/ [email protected]
Intel ConfidentialCopyright (C) 2008-2015 Intel Corporation All Rights Reserved - Do not DistributeCopyright (C) 2016 Intel Corporation All Rights Reserved
Legal Disclaimer
Copyright (C) 2008-2016 Intel Corporation All Rights Reserved.
The source code contained or described herein and all documents related to the source code ("Material") are owned by Intel Corporation or its suppliers or licensors. Title to the Material remains with Intel Corporation or its suppliers and licensors. The Material contains trade secrets and proprietary and confidential information of Intel or its suppliers and licensors. The Material is protected by worldwide copyright and trade secret laws and treaty provisions. No part of the Material may be copied, reproduced, modified, published, uploaded, posted, transmitted, distributed, or disclosed in any way without Intel's prior express written permission.
Prototype Xeon+FPGA* system disclaimer
Results and details in this presentation were generated using pre-production hardware and software, and may not reflect production or future systems.
This talk is about prototype hardware and software which has been made available to universities in the HARP program.
Details of production Xeon+FPGA systems will be made available at a later date.
*System described today is officially known as the “Intel QuickAssist QPI FPGA Platform”
IvyTown Xeon + Stratix V FPGA
Accelerating Workloads using Xeon and coherently attached FPGA in-socket
[Figure: block diagram of the two sockets (SKT 0, SKT 1): an Intel® Xeon® E5-2600 v2 Product Family processor and the FPGA module connected by QPI, with DDR3 channels, DMI2, PCIe* 3.0 x8 links, HSSI x8, a top-side high-speed connector, and JTAG.]
Processor: Intel® Xeon® E5-26xx v2 Processor
FPGA Module: Altera Stratix V
QPI Speed: 6.4 GT/s full width
Memory to FPGA Module: 2 channels of DDR3 (not used on HARP platform)
Features: Configuration Agent, Caching Agent, (optional) Memory Controller
Software: Accelerator Abstraction Layer (AAL) runtime, drivers, sample applications
Anatomy of an IvyTown Xeon+FPGA solution
[Figure: software/hardware stack spanning the Xeon and the FPGA, color-keyed as user defined or Intel provided.]
• SW Application: user is free to program in any language with C++ bindings
• Intel AAL library: AAL provides C++ API for FPGA resource management
• Intel AAL kernel module: Linux kernel module manages FPGA page table
• Intel IvyTown Xeon: 10-core Intel Xeon E5-2680 v2 on an Intel S2600CP motherboard with custom BIOS
• Main Memory
• Intel QPI IP: Intel cache coherent interconnect
• Intel SPL2 IP: virtual memory support and transaction reordering
• User logic (FPGA-based accelerator): Intel provides hardware template, user defines accelerator RTL (assuming HDL flow)
IvyTown Xeon + FPGA tool flow
HDL Programming
• C/C++ -> AAL SDK / compiler -> exe (runs on Intel® Xeon®)
• Verilog -> Quartus -> bit-stream (loaded on FPGA)
OpenCL™ Programming
• OpenCL host and OpenCL kernels -> OpenCL for FPGA compiler / run-time -> exe (Intel® Xeon®) and bit-stream (FPGA)
Agenda
• An overview of the IvyTown + FPGA system
• Hardware details
• HARP program overview
• Current status
• What’s next?
HARP System – IvyTown Xeon + Stratix V FPGA
Standard 2U server – FPGA module is the only custom hardware!
HARP FPGA module
Top: Stratix V FPGA (5SGXEA7N1F45C1)
Bottom: Socket R (LGA 2011)
System Logical View
Accelerator Function Units (AFUs) can access coherent cache on FPGA
• AFU written by end-user
Intel® Quick Path Interconnect (Intel® QPI) IP participates in cache coherency with Xeon CPUs
[Figure: logical view of the processor (cores, LLC, DRAM) and the FPGA (Intel QPI IP with cache, AFUs, DDR DRAM) connected by QPI, with the AFUs attached over CCI. Two regions are marked: the multi-processor coherence domain and the cache access domain.]
IvyTown Xeon + FPGA microarchitecture
PHY: implements the Intel QPI PHY 1.1 (analog/digital)
Intel QPI Link layer: provides flow control and reliable communication
Intel QPI Protocol: implements Intel QPI Cache Agent + Configuration Agent
Cache Controller: determines cache hit/miss and generates Intel QPI protocol requests
Cache Tag: tracks state of each cacheline (MESI + internal states for tracking outstanding requests)
Coherency Table: programmable table that implements coherency protocol rules
System Protocol Layer (SPL2): implements address translation. Can provide up to 2 GB of device virtual address space to the AFU. SPL2 cannot handle page faults.
AFU: user-designed Accelerator Function Unit
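The address translation performed by SPL2 can be pictured with a small software model. This is only an illustrative sketch: the 2 MB page size, the dictionary-based page table, and the names `translate`, `PAGE_SIZE`, and `WORKSPACE_LIMIT` are assumptions made for the example, not details of the SPL2 hardware. It shows a device virtual address being split into a page number and an offset, looked up in a page table, and rejected when no mapping exists, since hardware that cannot handle page faults needs every page mapped in advance.

```python
PAGE_SIZE = 2 * 1024 * 1024        # assumed page size (hypothetical, not from the slides)
WORKSPACE_LIMIT = 2 * 1024**3      # up to 2 GB of device virtual address space


def translate(page_table, virt_addr):
    """Map a device virtual address to a physical address.

    page_table: dict of virtual page number -> physical page base address.
    Raises instead of faulting, because the modeled hardware cannot
    service page faults.
    """
    if not 0 <= virt_addr < WORKSPACE_LIMIT:
        raise ValueError("address outside the 2 GB workspace")
    vpn, offset = divmod(virt_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"virtual page {vpn} is not mapped")
    return page_table[vpn] + offset


# Two virtual pages backed by non-contiguous physical pages (made-up addresses).
table = {0: 0x4000_0000, 1: 0x1220_0000}
print(hex(translate(table, 0x40)))              # 0x40000040
print(hex(translate(table, PAGE_SIZE + 0x80)))  # 0x12200080
```

In this model the software side would have to fill `table` before the accelerator starts, which mirrors the role the slides give to the Linux kernel module that manages the FPGA page table.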
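[Figure: block diagram of the Intel QPI FPGA IP. The Intel Xeon® processor connects over QPI to the QPI PHY (Rx/Tx align, Rx/Tx control) and the QPI link / protocol control, followed by the cache controller with cache data, cache tag, and cache table. CCI-S links the cache controller to SPL2 (address translation, reorder buffer), and CCI-E links SPL2 to the Accelerator Function Unit (AFU).]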
SPL2/AFU2 details
[Figure: SPL sits between the QPI/PCIe side (CCI-S) and the AFU (CCI-E). A virtual-to-physical address translator, backed by a page table, converts AFU virtual addresses to physical addresses, and a re-order buffer turns out-of-order responses into in-order responses for the AFU.]
Xeon and FPGA can share pointer-based data-structures with CCI-E
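The re-order buffer's job, turning out-of-order responses into in-order responses, can be sketched in a few lines. This is a behavioral toy model under stated assumptions: the class name `ReorderBuffer`, its methods, and the use of integer tags are all hypothetical and say nothing about how SPL2 implements the buffer in hardware.

```python
from collections import deque


class ReorderBuffer:
    """Release responses in request-issue order, whatever order they arrive in."""

    def __init__(self):
        self._pending = deque()  # tags in the order the requests were issued
        self._arrived = {}       # tag -> data for responses not yet released

    def issue(self, tag):
        """Record that a request with this tag has been sent."""
        self._pending.append(tag)

    def receive(self, tag, data):
        """Accept one response; return the responses that can now be released in order."""
        self._arrived[tag] = data
        released = []
        while self._pending and self._pending[0] in self._arrived:
            head = self._pending.popleft()
            released.append((head, self._arrived.pop(head)))
        return released


rob = ReorderBuffer()
for tag in (0, 1, 2):
    rob.issue(tag)

print(rob.receive(2, "C"))  # []  (tags 0 and 1 are still outstanding)
print(rob.receive(0, "A"))  # [(0, 'A')]
print(rob.receive(1, "B"))  # [(1, 'B'), (2, 'C')]
```

The cost of in-order delivery is visible in the first call: a response that arrives early is held until every older request has completed.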
CCI-S vs CCI-E
Capability | CCI-S | CCI-E with SPL2
Req Header/Address width | 32 bits | 58 bits
Address granularity | 64 B | 64 B
Addressing mode | Physical addressing | Virtual addressing
Maximum workspace size | Function of OS; up to 4 MB per workspace possible | 2 GB workspace
# of workspaces | >1 | 1 (current restriction)
Read response reorder buffer | No | Yes
Write ordering | No | No
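To relate the workspace sizes in the table to the 64 B address granularity, the number of cache lines per workspace can be computed directly. This sketch assumes binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes), which the slide does not state; the helper name `cache_lines` is made up for the example.

```python
CACHE_LINE_BYTES = 64  # address granularity from the table


def cache_lines(workspace_bytes):
    """Number of 64-byte cache lines in a workspace of the given size."""
    return workspace_bytes // CACHE_LINE_BYTES


cci_s_lines = cache_lines(4 * 2**20)  # 4 MB CCI-S workspace (binary MB assumed)
cci_e_lines = cache_lines(2 * 2**30)  # 2 GB CCI-E workspace (binary GB assumed)

# Print each line count and the number of bits needed to index that many lines.
print(cci_s_lines, cci_s_lines.bit_length() - 1)  # 65536 16
print(cci_e_lines, cci_e_lines.bit_length() - 1)  # 33554432 25
```

Under these assumptions, a cache-line index into a 2 GB workspace needs 25 bits, and one into a 4 MB workspace needs 16 bits.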
Cache protocol flows
Cache hit/miss scenarios
• Read Miss (CPU cache hit)
• Read Miss (CPU cache miss)
• Read Hit
• Write Miss
• Write Hit
[Figure: numbered request flows for each scenario across the AFU, FPGA cache, CPU LLC, and system memory, connected by CCI, QPI, and DDR.]
CCI: Core Cache Interface
For IvyTown + FPGA latency and bandwidth statistics, see academic literature:
Choi, Young-kyu, et al. "A quantitative analysis on microarchitectures of modern CPU-FPGA platforms." Proceedings of the 53rd Annual Design Automation Conference. ACM, 2016.
AFU Interface
[Figure: signal-level view of the Accelerator Function Unit (AFU) interface.]
Tx signals: C0 Tx Header, C0 Tx RdValid, C1 Tx Header, C1 Tx WrValid, C1 Tx Data, C1 Tx IntrValid
Rx signals: C0 Rx Header, C0 Rx Data, C0 Rx RdValid, C0 Rx WrValid, C0 Rx CfgValid, C0 Rx UMsgValid, C0 Rx IntrValid, C1 Rx Header, C1 Rx WrValid, C1 Rx IntrValid
Flow control and status: C0 Tx Almost Full, C1 Tx Almost Full, LP Init Done
Clocks and resets: Clk32UI, Clk16UI, SoftReset_n, SystemReset_n
Widths shown: Tx headers 99/61b, Rx headers 18/24b, data 512b
• 2 Tx channels:
  • Ch0 for Memory Read
  • Ch1 for Memory Write
• Independent flow control on each Tx channel. Each channel will accept up to 4 more requests after Almost Full is asserted
• MData field: used to identify requests and responses
• No flow control on Rx channels: the AFU must provision Rx buffer space before a request is sent
• SoftReset_n must clear all AFU state
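The interaction between these rules (Tx back-pressure, tags carried with each request, and Rx buffer space reserved before a request goes out) can be illustrated with a toy software model. Everything here is hypothetical: the class `AfuReadPort`, its method names, and the one-tag-per-Rx-slot scheme are assumptions for the sketch, and a real AFU would be written in RTL. The model also stops issuing as soon as Almost Full is seen, which is more conservative than the interface requires, since it accepts up to 4 further requests.

```python
class AfuReadPort:
    """Toy model of an AFU issuing memory reads on Tx channel 0."""

    def __init__(self, rx_slots):
        self.free_tags = list(range(rx_slots))  # one tag per reserved Rx buffer slot
        self.in_flight = {}                     # tag -> requested address
        self.rx_buffer = {}                     # tag -> response data

    def try_issue(self, addr, almost_full):
        """Issue a read only if the channel is not almost full and an Rx slot is free.

        Returns the tag used, or None if the request was held back.
        """
        if almost_full or not self.free_tags:
            return None
        tag = self.free_tags.pop(0)
        self.in_flight[tag] = addr
        return tag

    def on_response(self, tag, data):
        """Store a response; the slot for this tag was reserved at issue time."""
        addr = self.in_flight.pop(tag)
        self.rx_buffer[tag] = data
        return addr

    def consume(self, tag):
        """Hand the data to the accelerator logic and free the slot for reuse."""
        data = self.rx_buffer.pop(tag)
        self.free_tags.append(tag)
        return data


port = AfuReadPort(rx_slots=2)
t0 = port.try_issue(0x100, almost_full=False)
t1 = port.try_issue(0x140, almost_full=False)
print(port.try_issue(0x180, almost_full=False))  # None: no Rx slot left
print(hex(port.on_response(t1, b"line1")))       # 0x140: tag identifies the request
port.consume(t1)
print(port.try_issue(0x180, almost_full=True))   # None: channel is almost full
print(port.try_issue(0x180, almost_full=False))  # 1: freed tag is reused
```

Tying each tag to a reserved buffer slot is one simple way to guarantee that a response always has somewhere to land, which is what the absence of Rx flow control demands.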
End-user develops AFUs
Simulation environment - ASE
True HW/SW co-development environment for Intel Xeon+FPGA
[Figure: side-by-side comparison of HW deployment and the ASE simulation environment, color-keyed as Intel-provided (Intel AAL SDK, ASE, SPL components) or user-provided (AFU components, AAL user application).]
HW deployment: Host Application; Service API, Virtual Memory API, Physical Memory API; address translation and response re-ordering; CCI extended and CCI standard; QPI link, protocol, and PHY on both the CPU and the FPGA; Accelerator Function Units (AFU); system memory.
ASE simulation environment: AAL SDK SW application and hardware AFU RTL (CCI AFU / SPL AFU, SPL) connect through AALXLAPI and CCI Standard / CCI Extended to the AFU Simulation Environment (ASE). ASE contains a SW backend interface (inter-process communication; memory management with fake physical memory and range checking), a protocol engine, a checker, a latency model, and simulator handles (3rd party tool).
Demo – Sudoku solver on IvyTown + FPGA
• Took Sudoku solver RTL from OpenCores and added SPL interface
• http://opencores.org/project,sudoku
• Simple application built in C++ with AAL
• Optimized Xeon CPU implementation
Demo
Agenda
• An overview of the IvyTown + FPGA system
• HARP program overview
• Current status
• What’s next?
Recap from HW overview: Xeon+FPGA unique features
FPGA with coherent low-latency interconnect:
Simplified programming model
• Support for virtual addressing
• Data caching on FPGA
Enables new classes of algorithms for acceleration with:
• Full access to system memory
• Support for efficient irregular data pattern access
Remapping of algorithms from off-load model to hybrid (CPU+FPGA) processing model
• Fine grained interactions
Intel Xeon+FPGA academic program goals
Win academic mindshare by providing
1. Early access to prototype Xeon+FPGA systems
2. Generate community around Xeon+FPGA
3. Technology leadership through world-class FPGA research
Nurture early users to demonstrate value and accelerate future Xeon+FPGA adoption
HARP - call for proposals (early 2015)
http://www.sigarch.org/2015/01/17/call-for-proposals-intel-altera-heterogeneous-architecture-research-platform-program/
HARP program – areas of interest
Architecture research
Application research
Programming systems research
OS and middleware research
Algorithm research
HARP call for proposals generated buzz
“The project is called HARP, short for Heterogeneous Architecture Research Platform. Mars and his team are applying to receive and test one of these boards. Although the world is moving toward things like FPGAs, traditional server chips will continue to be used as well, and these boards explore ways of marrying the two. ‘It’s a very good move for Intel,’ Mars says.”
http://www.wired.com/2015/03/intel-exploring-biggest-takeover-ever/
Awardees & Research Areas
Received 100+ proposals, awarded 25 systems
• Application evaluation - 16 proposals
• Architectural tools and frameworks – 8 proposals
• Programming tools and methodology – 6 proposals
• OS and networking techniques – 3 proposals
More accepted proposals than awarded systems (some research groups sharing a system)
Global community, across continents
HARP program – history
Call for proposals early 2015
Notified awardees mid spring 2015
Full day tutorial @ Intel in June of 2015
• Intel provided full day of tutorials and hands-on activities
• 50 attendees
Systems shipped to universities mid-summer 2015
Agenda
• HARP program overview
• Current status
• What’s next?
Winning Academic Mindshare with HARP
1. Academics are focusing on novel hybrid CPU-FPGA use cases
• Before: what can I offload to FPGA?
• Now: what is the CPU great at? What is the FPGA great at? How should they collaborate?
• E.g., Genomics, Database, Graph/irregular, Sort
2. Academics are rethinking hybrid CPU-FPGA systems
• FPGA is becoming a first-class citizen, with tighter integration to the CPU
• What technologies are needed to best take advantage of hybrid CPU-FPGA systems?
• E.g., JIT to FPGA, SPARK cloud + FPGA, OpenMP for FPGA
Winning Academic Mindshare with HARP (cont.)
3. Academics are publishing at top FPGA conferences using Xeon-FPGA
• ISFPGA (Feb 2016): 1 out of 20 full papers uses HARP
• FCCM (May 2016): 2 out of 18 full papers use HARP
Publications using HARP platform
UCLA
• “The SMEM Seeding Acceleration for DNA Sequence Alignment” – FCCM16
• “A Quantitative Analysis on Microarchitectures of Modern CPU-FPGA Platforms” – DAC16
CMU
• “A Study of Pointer-Chasing Performance on Shared-Memory Processor-FPGA Systems” – ISFPGA16
ETHZ
• “Runtime Parameterizable Regular Expression Operators for Databases” – FCCM16
USC
• “High Throughput Large Scale Sorting on a CPU-FPGA Heterogeneous Platform” – RAW 2016
Even more papers in submission!
Agenda
• HARP program overview
• Current status and outcomes
• What’s next?
Where do we go with HARP?
“While at the ACM FPGA Conference in Monterey, we met with several students and professors who do not have access to the program who were just generally interested in how things were going.” – USC feedback
We continuously get email from well-known researchers asking if systems are still available
We want to grow the HARP program!
Where to go with HARP -- thoughts
1. Increase productivity of existing HARP community
• Share apps, tools, and IP repository
• Regular online symposiums / seminars. Workshop at Intel
• Cloud Xeon-FPGA setup to put systems/tools within reach
2. Grow community by attracting more experts to join
• Tutorial / workshops @ premier conferences
• Please contact me if you’re interested in participating
Questions
Thank you for attending!
If you are interested in getting access to the SDK/HDK, please email me @ [email protected]