
Distributed Shared Memory Architecture

Di Xu, Jun Song, Fu Wang

Alibaba Group {di.x, justin.song, yunfan.wf}@alibaba-inc.com

Fig. 1. Alibaba production cluster traces

Fig. 2. Memory and Storage Media Hierarchy

[1] Compute Express Link Consortium. https://www.computeexpresslink.org/about-cxl

[2] Intel Corporation. Intel Non-Volatile Memory 3D XPoint. http://www.intel.com/content/www/us/en/architecture-and-technology/non-volatile-memory.html?wapkw=3d+xpoint

[3] Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18).

[4] Gen-Z Consortium. https://genzconsortium.org/white-papers/

Fig. 3. Distributed Shared and Tiered Memory Architecture with inline acceleration

• The distributed shared memory architecture that we are proposing is illustrated in Figure 3.

• Tier 1 memory is composed of local system DDR, which has the best performance and is intended to serve high-SLA workloads.

• Tier 2 memory is a pool of SCM with large capacity that supplements Tier 1. We plan to use FPGAs with a CXL interface to control the SCM devices. The FPGAs will support RDMA or Gen-Z [4] protocols to interconnect with each other and form a high-speed data plane.

• The FPGA also serves as a near-memory computing engine that accelerates common operations such as compression and encryption. A minimal allocation-and-offload sketch for this tiered layout follows these bullets.
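As a rough illustration of how software might use this tiered layout, the sketch below shows a hypothetical tier-aware allocation and near-memory offload interface. None of the names (dsm_alloc, dsm_tier_t, dsm_offload_compress) come from the proposal; the stub bodies exist only so the example compiles, and a real implementation would route Tier 2 requests to the CXL-attached SCM pool and the offload call to the FPGA engine.

/* Minimal, hypothetical sketch of a tier-aware allocation and near-memory
 * offload API for the architecture in Fig. 3.  None of these names come
 * from the proposal; the stubs below only make the sketch compilable. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    DSM_TIER1_DDR,   /* local DDR: lowest latency, for high-SLA data      */
    DSM_TIER2_SCM    /* pooled SCM behind CXL/FPGA: large, shared, slower */
} dsm_tier_t;

/* Stub: a real implementation would map Tier 2 requests to the CXL-attached
 * SCM pool; here both tiers fall back to ordinary heap memory. */
static void *dsm_alloc(size_t bytes, dsm_tier_t tier)
{
    (void)tier;
    return malloc(bytes);
}

static void dsm_free(void *ptr) { free(ptr); }

/* Stub for a near-memory offload: a real FPGA engine would compress the
 * Tier 2 buffer in place without moving raw data across the CPU. */
static int dsm_offload_compress(void *buf, size_t in_bytes, size_t *out_bytes)
{
    (void)buf;
    *out_bytes = in_bytes;   /* pretend nothing was saved */
    return 0;
}

int main(void)
{
    /* Keep the hot index in Tier 1 DDR and the bulk data in the Tier 2 pool. */
    char *hot  = dsm_alloc(4096,        DSM_TIER1_DDR);
    char *cold = dsm_alloc(1024 * 1024, DSM_TIER2_SCM);
    if (!hot || !cold)
        return 1;

    memset(cold, 0, 1024 * 1024);   /* ordinary stores work on both tiers */

    size_t packed = 0;
    if (dsm_offload_compress(cold, 1024 * 1024, &packed) == 0)
        printf("cold data occupies %zu bytes after offload\n", packed);

    dsm_free(hot);
    dsm_free(cold);
    return 0;
}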

• We propose distributed shared memory, an architecture that provides a shared and tiered memory space using a pool of servers, each with expansion memory modules attached via a high-bandwidth, low-latency, cache-coherent interface such as Compute Express Link (CXL) [1].

• The distributed shared memory architecture is motivated by three datacenter trends: data-intensive applications that require data sharing; emerging high-bandwidth, low-latency, cache-coherent interfaces that enable memory expansion within a compute node or across nodes; and storage class memory technologies [2] that bridge the gap between DDR and flash.

• With CXL, applications can perform native memory load and store instructions to access both local and far memory in this global memory space (a minimal load/store sketch follows this list).

• Application data can be made persistent if storage class memory technologies are incorporated.

• In-memory databases, PageRank, graph processing, and similar workloads make up an important class of data-intensive applications. They are usually deployed on a cluster of server nodes connected via a high-bandwidth commodity network, which requires careful data placement to reduce the overhead of data movement. It also involves replicating data across server nodes, which consumes expensive DDR capacity.

• Our goal is to build a shared memory layer that is transparent to applications, eases programmers' work, and lowers system memory TCO by reducing data replication and improving memory utilization.
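The load/store bullet above can be made concrete with a small sketch. It assumes the pooled memory has been surfaced to Linux as a device-DAX node (the device path below is only an example); once the region is mapped, far memory is accessed with ordinary loads and stores rather than RDMA verbs or explicit messages.

/* Sketch: accessing far (pooled) memory with native loads and stores.
 * Assumes the CXL-attached pool is exposed to Linux as a device-DAX node
 * (the path below is only an example); error handling is kept minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define POOL_DEV  "/dev/dax0.0"      /* hypothetical device for the SCM pool */
#define POOL_SIZE (64UL << 20)       /* map 64 MiB of the shared region      */

int main(void)
{
    int fd = open(POOL_DEV, O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Once mapped, the far memory is just part of the address space:
     * no RDMA verbs or message passing in the data path. */
    uint64_t *pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    if (pool == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    pool[0] = 42;                            /* a native store to far memory */
    printf("read back %lu from the pool\n",  /* and a native load            */
           (unsigned long)pool[0]);

    munmap(pool, POOL_SIZE);
    close(fd);
    return 0;
}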

Figure 1 shows Alibaba's production cluster traces [3]. Without memory sharing, a physical server becomes the boundary of resource allocation, and it is difficult to achieve high memory utilization. As shown in Figure 2, Storage Class Memory is an emerging class of persistent memory. Compared with DDR, SCM offers capacity expansion at lower cost and increased data reliability due to its persistence.
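To complement the persistence point above, here is a minimal sketch of how an application could make an update durable on SCM using x86 cache-line write-back and fence intrinsics. It assumes the buffer is backed by a persistent mapping and a CPU with CLWB support (compile with -mclwb); the persist_range helper is hypothetical, not part of the proposal.

/* Sketch: making data durable on storage class memory.  Assumes the buffer
 * already lives in a persistent (SCM-backed) mapping; compile with -mclwb
 * on a CPU that supports the CLWB instruction. */
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64

/* Write back every cache line covering [addr, addr+len) and fence, so the
 * data reaches the persistence domain before we consider it committed. */
static void persist_range(const void *addr, size_t len)
{
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;

    for (; p < end; p += CACHE_LINE)
        _mm_clwb((void *)p);
    _mm_sfence();
}

int main(void)
{
    /* Stand-in for a buffer mapped from SCM; a real application would get
     * this pointer from an mmap of the persistent region. */
    static char record[256];

    memcpy(record, "committed state", 16);
    persist_range(record, 16);   /* the update survives power loss only if
                                    `record` really is SCM-backed */
    return 0;
}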