IMC Summit 2016 Breakout - Brian Bulkowski - NVMe, Storage Class Memory and Operational Databases:...

NVMe, Storage Class Memory, and Operational Databases: Real World Results

Brian Bulkowski, CTO and Founder

May 24, 2016

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

Who needs a fast operational database?

© 2016 Aerospike Inc. All rights reserved.

The Scalable Operational Database Problem

1 Identified where scale is mission critical – the AdTech industry

2 Built a database that can scale for the RTB use case for these customers and more

3 Deployed at digital enterprises that need consumer-scale databases

4 Expanding into other use cases in BFSI, Telco, Retail, IoT

Architecture Overview

[Diagram. Labels: Business Transactions ( Web views, Payments, Mobile Queries, Recommendation, And More ); Your Application Framework; Decisioning Engine; High Performance NoSQL ( “Real-Time Big Data”, “Decisioning” ); XDR; Legacy Database ( Mainframe ); Data Warehouse / Data Lake ( Legacy RDBMS, HDFS based )]

Architecture Overview ( same diagram, with load figures )

500 business transactions per sec × 5,000 calculations per sec = 2.5 M database transactions per sec
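The load figure on this slide is a simple product, and a minimal sketch of the arithmetic follows. The variable names are illustrative, and the units are taken literally from the slide, which does not say whether each calculation maps to exactly one database operation.

```python
# Figures as stated on the slide; names are illustrative only.
business_tx_per_sec = 500    # "500 Business Trans per sec"
calculations = 5_000         # "5000 Calculations per sec"

# The slide multiplies the two to get the database load.
db_tx_per_sec = business_tx_per_sec * calculations

print(f"{db_tx_per_sec:,} database transactions per sec")
```

The point of the slide is that a modest business transaction rate can still translate into millions of database operations per second.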

Prevent Only Fraudulent Transactions

Business Challenge
• Meet payment SLA of 750 ms
• Differentiate between fraudulent and legitimate orders in real-time
• Stop loss of business due to latency
• Support thousands of DB reads/writes per credit card transaction
• Bring new algorithms and data online within weeks

Prevent Only Fraudulent Transactions

Bespoke Algorithms
• Blend Internet behavior
• Track every IP address
• Recent fraudulent cart items
• Machine learning generated filters
• Expand to new platforms

Not business process modeling

Easier to code and validate than SQL queries

[Diagram. Labels: Credit Card Processing System; Fraud Detection & Protection App; Rules ( Rule 1, Rule 2, Rule 3 ); Rule 1 - Passed, Rule 2 - Passed, Rule 3 - Failed; Historical Data; Account Behavior; Static Data; Account Statistics]
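The rules diagram on this slide (three rules, each passing or failing) can be pictured as plain application code rather than SQL. The following is a minimal sketch under stated assumptions: the `Rule` type, the `evaluate` helper, the three rule bodies, and every field name are hypothetical and are not taken from the presentation or from any Aerospike API. In a real deployment the `account` data would come from database reads.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """A named predicate over an order and the account's stored data (hypothetical)."""
    name: str
    check: Callable[[dict, dict], bool]


def evaluate(rules: List[Rule], order: dict, account: dict) -> Dict[str, bool]:
    """Run every rule and record whether it passed."""
    return {rule.name: rule.check(order, account) for rule in rules}


# Illustrative rules loosely inspired by the slide's bullets.
RULES = [
    Rule("Rule 1", lambda order, acct: order["ip"] not in acct["flagged_ips"]),
    Rule("Rule 2", lambda order, acct: order["amount"] <= acct["typical_max_amount"]),
    Rule("Rule 3", lambda order, acct: not (set(order["cart"]) & acct["recent_fraud_items"])),
]

# Made-up sample data.
order = {"ip": "203.0.113.7", "amount": 120.0, "cart": ["gift-card", "headphones"]}
account = {
    "flagged_ips": {"198.51.100.9"},
    "typical_max_amount": 500.0,
    "recent_fraud_items": {"gift-card"},
}

results = evaluate(RULES, order, account)
approved = all(results.values())

for name, passed in results.items():
    print(f"{name} - {'Passed' if passed else 'Failed'}")
print("approve" if approved else "decline")
```

Each rule is an ordinary function, so it can be unit tested in isolation, which is one reading of the slide's claim that such algorithms are easier to code and validate than SQL queries.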

Intra-day System of Record

Challenge
• DB2 (RDBMS) stores positions for 10 million customers
• Must update stock prices, show balances on 300 positions, process 250M transactions, 2M updates/day
• Single view of position across Risk, Mobile, Fraud
• Enable hard-to-code Risk and Fraud algorithms

Intra-day System of Record

High performance NoSQL
• Immediate consistency, no data loss
• Predictable low latency at high throughput
• Flash provides durability, fast restarts
• Cross data center (XDR) for replication

Highly parallel account-to-trades data model

Easy-to-validate Java-based algorithms

[Diagram. Labels: IBM DB2 ( Mainframe ); Real-Time App; Record App; Finance App; Real-Time Data Feed; Start of the Day Data Loading; End of Day Reconciliation; Account Positions]

Similar use cases

■ Internet cache replacement
■ Telecom integrated real-time billing and routing
■ Retail predictive analytics
■ Machine learning
  ■ Online learning updates models rapidly
■ Social messaging
■ Gaming and gambling

Aerospike Architecture

■ Every node in a cluster is identical, handles both transactions and long running tasks

■ Direct attach storage for 1M+ TPS performance

■ Highly available clustering, integrated transaction processing

OHIO Data Center

Aerospike ACT certification tool for Flash

■ Tests simultaneous read and write performance
  ■ Small block ( 1K ) random read latency
  ■ During continual large-block write load
  ■ 50-50 read-write load
  ■ Defragmentation ( large blocks )
  ■ 24 hours ( variance often found at 7 to 10 hours )
■ Operational databases need to accept reads and writes
  ■ Reads are not localized or predictable
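Results in this deck are quoted in the form "N% < 1 ms" (for example "95% < 1ms" or "99.7% < 1ms"). As a rough illustration of that metric only, the sketch below computes the share of read latencies under a threshold. It is not the ACT tool and does not reproduce its output format; the function name and the sample data are made up.

```python
from typing import Sequence


def fraction_under(latencies_us: Sequence[float], threshold_us: float) -> float:
    """Return the fraction of samples strictly below the threshold."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    under = sum(1 for latency in latencies_us if latency < threshold_us)
    return under / len(latencies_us)


# Made-up read latencies in microseconds: 950 fast reads and 50 slow ones.
samples_us = [200.0] * 950 + [1500.0] * 50

share = fraction_under(samples_us, threshold_us=1000.0)
print(f"{share * 100:.1f}% of reads completed in under 1 ms")
```

A certification-style run would gather such samples for the full 24 hours while the write and defragmentation load continues, since the slide notes that variance often appears only after 7 to 10 hours.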

Status of Wide SATA, PCIe, NVMe

Historical Perspective – Early Days

■ Early SATA ( 2009, 2010 )
  ■ Intel X25M
  ■ Samsung SS805
  ■ Devices provide 95% < 1 ms for about 2,000 IOPS
  ■ $3 / GB
■ FusionIO - 2010
  ■ PCIe, but custom driver
  ■ CPU, bus load
  ■ $8 / GB

Historical Perspective – 2013, 2014

■ Micron proves fast PCIe possible
  ■ P320 ( SLC ) with low bus overhead, excellent driver
  ■ Over 200,000 IOPS with 99.7% < 1 ms
  ■ ( SFF-8639 hot-swap 2.5” PCIe drives )
■ “Wide SATA” generally used
  ■ 12 to 20 2.5” SATA drives per 2U chassis
  ■ Intel S3700, S3500 ; Samsung 843 favored
  ■ 8 drives per controller ( many issues )
  ■ 150,000 IOPS per chassis achievable
■ Violin, FusionIO troubled, DSSD sold to EMC
  ■ NVMe available but not practical

NVMe Arrives – 2015 to present

■ Linux, Windows drivers achieve performance
  ■ U.2 and M.2 form factors available
■ Intel P3700, P3600, P3500 available
  ■ 250k IOPS per card
■ Samsung PM1735 available
  ■ 120k IOPS per card
■ Micron 9100
■ HGST, Toshiba – 30k to 50k per card
■ SAS / SATA lingers
  ■ Samsung SM1635, PM1633; Intel S3700; Micron S600 still shipping

NVMe’s crushing superiority

■ High, predictable performance
  ■ High transfer speed reduces jitter
  ■ 2.8 µs NVMe vs 5.0 µs SATA
  ■ Better controllers
  ■ Mature and tuned Linux driver
  ■ U.2 front panel hot swap available ( and 24-wide )
■ NVMe arrays available
  ■ Apeiron ADS1000 “direct scale-out flash”
  ■ EMC D5, Mangstor NX63020
■ All new Aerospike deployments on NVMe

1 Million TPS on 1 Server

Options for storage on a database before Aerospike:

• RAM, which was fast, but allowed very limited storage
• Disk, which allowed for a lot of storage, but was limited in speed

Intel achieved 1M TPS using 4 Intel P3700 SSDs with 1.6 TB capacity on a single Aerospike server. The cost per GB is a fraction of the cost of RAM, while still having very high performance.

Flash in the Public Cloud

■ Every public cloud provider has Flash
  ■ AWS / EC2 has sophisticated offerings
  ■ Google Compute is high performance
■ Private clouds manage Flash
  ■ Docker offers storage metadata
  ■ Pivotal manages Flash and traditional storage

Storage Class Memory ( and trends )

Trends in NAND

■ Diverging “Drive Writes Per Day”
■ Low-write devices
  ■ 1~2 DWPD
  ■ SanDisk InfiniFlash, Micron
  ■ Increase density, lower cost
  ■ Hadoop / Data lake “all flash” use
■ High-write devices
  ■ 10~15 DWPD
  ■ P3700, Hitachi, Samsung
  ■ Optane to the rescue
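To make the DWPD figures concrete: a drive rated at N drive writes per day can absorb roughly N times its capacity in writes each day over its rated life. The sketch below turns the slide's two DWPD ranges into a daily write budget. The 1.6 TB capacity is an assumption borrowed from the P3700 figure quoted earlier in the deck, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_write_budget_tb(capacity_tb: float, dwpd: float) -> float:
    """Terabytes that may be written per day at the rated endurance."""
    return capacity_tb * dwpd


def sustained_write_mb_per_sec(capacity_tb: float, dwpd: float) -> float:
    """The same budget expressed as an average write rate (decimal units)."""
    return daily_write_budget_tb(capacity_tb, dwpd) * 1_000_000 / SECONDS_PER_DAY


capacity_tb = 1.6  # assumed drive size, for illustration only

# DWPD ranges as given on the slide.
for label, dwpd_range in (("low-write", (1, 2)), ("high-write", (10, 15))):
    for dwpd in dwpd_range:
        budget = daily_write_budget_tb(capacity_tb, dwpd)
        rate = sustained_write_mb_per_sec(capacity_tb, dwpd)
        print(f"{label:>10} {dwpd:>2} DWPD: {budget:5.1f} TB/day, about {rate:6.1f} MB/s average")
```

Comparing the two ranges shows why a write-heavy operational database would favor the high-write class, while read-mostly data lake workloads can use the denser, cheaper low-write devices.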

Intel’s 3D XPoint roadmap ( public info )

■ Optane this year
  ■ 3D XPoint in 2.5” NVMe package
  ■ “7x faster” – limited by NVMe !
  ■ Very high write durability
  ■ Replaces SLC for some use cases
  ■ Unknown pricing
■ NVDIMM
  ■ Removes NVMe limit
  ■ Intel cagy on delivery – “uncommitted”
  ■ Competes with DRAM
  ■ Really changes the world

How to architect for DDR 3D XPoint

■ It’s not exactly like DRAM
  ■ When a system restarts, need to clear
  ■ Slower
  ■ Different kind of data structures required
■ It’s not exactly like storage
  ■ It’s on the memory bus
  ■ New instructions for persistence
■ Think of it more like DRAM
  ■ Lower power consumption
  ■ Much higher density ( 1T++ )

Thank You

Questions?
