
(FIN401) Seismic Shift: Nasdaq's Migration to Amazon Redshift | AWS re:Invent 2014


DESCRIPTION

Jason Timmes led the migration of the primary data warehouse for Nasdaq's Transaction Services U.S. business unit (which operates Nasdaq's U.S. equity and options exchanges) from a traditional on-premises MPP database to Amazon Redshift. The project significantly reduced operational expenses. Jason, an Associate Vice President of Software Development at Nasdaq, describes how his team migrated a warehouse that loads approximately 7 billion rows a day into the cloud, satisfied several security and regulatory audits, optimized read and write performance, ensured high availability, and orchestrated other back-office activities that depend on the warehouse's daily loads completing. Along with sharing several technical lessons learned, Jason will discuss Nasdaq's roadmap for integrating Redshift with more AWS services, as well as with more Nasdaq products, to offer even greater benefit to clients (internal and external) in the months ahead.


Page 2:

We make the world’s capital markets move faster, more efficient, more transparent.

• Public company in S&P 500
• Develop and run markets globally in all asset classes
• We provide technology, trading, intelligence and listing services
• Intense operational focus on efficiency and competitiveness

We provide the infrastructure, tools and strategic insight to help our customers navigate the complexity of global capital markets and realize their capital ambitions.

Get to know us

We have uniquely transformed our business from predominately a U.S. equities exchange to a global provider of corporate, trading, technology and information solutions.

Page 3:

• Leading index provider with 41,000+ indexes across asset classes and geographies
• Over 10,000 corporate clients in 60 countries
• Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries
• 100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
• 26 markets, 3 clearing houses, 5 central securities depositories
• Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value

Page 5:

Our warehouse can be used to analyze market share and client activity, support surveillance, power our billing, and more…

Page 13:

• Idempotence: a quality of an action such that repetitions of the action have no further effect on the outcome
  – In other words, f(x) = f(f(x)) = f(f(f(x))), etc.
• Ingest process is designed as a workflow engine, with each step in each workflow being idempotent.
• Failures are easily recovered by repeating the failed step after resolving the root cause of any failure.
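To make the idea concrete, here is a minimal Python sketch of an idempotent step that can be repeated after a failure. It is not the workflow engine from the talk; `stage_rows`, `run_with_retry`, the in-memory `store`, and the simulated failure are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

Store = Dict[str, List[int]]


def stage_rows(store: Store, key: str, rows: List[int]) -> None:
    """Idempotent step: writes by key, so a repeat overwrites instead of appending."""
    store[key] = list(rows)


def run_with_retry(step: Callable[[], None], attempts: int = 3) -> int:
    """Repeat a step until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            step()
            return attempt
        except RuntimeError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


# Hypothetical usage: the step fails once after it has already written its data.
store: Store = {}
failures = {"remaining": 1}


def flaky_stage() -> None:
    stage_rows(store, "2014-09-17/nbbo", [1, 2, 3])
    if failures["remaining"] > 0:  # simulated failure after the write happened
        failures["remaining"] -= 1
        raise RuntimeError("simulated failure")


attempts_used = run_with_retry(flaky_stage)
```

Because the step overwrites by key, running it a second time leaves the store in the same state as a single successful run, which is what makes a blind repeat safe.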

Page 14:

• Use a manifest file inside a transaction with a table lock, and keep a record of completed ingests
• If the S3 COPY (insert) fails, rollback the transaction
• If the insert succeeds, write a record of the completed ingest, and commit the transaction
• Idempotence: start transaction, lock destination table, check for prior successful ingest, and only start insert if data hasn’t already been loaded today
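The sequence above can be sketched in Python. This is only an illustration under stated assumptions: SQLite (in memory) stands in for both the destination table and the ingest record, `BEGIN IMMEDIATE` stands in for the destination table lock, a plain `INSERT` stands in for the Redshift COPY, and the function name, the `table_ingest_status` schema, and the sample rows are invented.

```python
import sqlite3
from typing import Iterable, Tuple


def ingest_nbbo_once(conn: sqlite3.Connection, load_date: str,
                     rows: Iterable[Tuple[str, float]]) -> bool:
    """Load rows for load_date at most once; return True if a load happened."""
    conn.execute("BEGIN IMMEDIATE")  # stand-in for locking the destination table
    try:
        done = conn.execute(
            "SELECT 1 FROM table_ingest_status "
            "WHERE table_name = 'nbbo' AND load_date = ?",
            (load_date,),
        ).fetchone()
        if done:  # a prior ingest already succeeded, so do nothing
            conn.execute("ROLLBACK")
            return False
        # Stand-in for the S3 COPY driven by a manifest file.
        conn.executemany("INSERT INTO nbbo (symbol, price) VALUES (?, ?)", rows)
        # Record the completed ingest in the same transaction as the data.
        conn.execute(
            "INSERT INTO table_ingest_status (table_name, load_date) "
            "VALUES ('nbbo', ?)",
            (load_date,),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")  # failed insert: discard partial rows
        raise


# Hypothetical setup and usage with made-up rows.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE nbbo (symbol TEXT, price REAL)")
conn.execute(
    "CREATE TABLE table_ingest_status ("
    "table_name TEXT, load_date TEXT, PRIMARY KEY (table_name, load_date))"
)
sample_rows = [("TEST1", 1.0), ("TEST2", 2.0)]
first = ingest_nbbo_once(conn, "2014-09-17", sample_rows)
second = ingest_nbbo_once(conn, "2014-09-17", sample_rows)
row_count = conn.execute("SELECT COUNT(*) FROM nbbo").fetchone()[0]
```

The second call finds the completion record and returns without inserting, so repeating the step does not duplicate rows.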

Page 15:

• Pay close attention to the mandatory flag!

• Redshift UNLOAD always sets this to false!!!
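As a rough illustration, a COPY manifest is a JSON document with an `entries` list, where each entry has a `url` and a `mandatory` flag; with `mandatory` set to true, COPY fails if that file is missing instead of skipping it. The Python sketch below builds such a manifest and rewrites an existing one so every entry is mandatory. The helper names and the file names under the `my_ingest` bucket are hypothetical.

```python
import json
from typing import List


def build_manifest(urls: List[str]) -> str:
    """Build a COPY manifest in which every listed file is mandatory."""
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def force_mandatory(manifest_text: str) -> str:
    """Rewrite an existing manifest (e.g. one written by UNLOAD) so all entries are mandatory."""
    manifest = json.loads(manifest_text)
    for entry in manifest["entries"]:
        entry["mandatory"] = True
    return json.dumps(manifest, indent=2)


# Hypothetical file names, for illustration only.
manifest_text = build_manifest([
    "s3://my_ingest/2014-09-17/nbbo.part0.gz",
    "s3://my_ingest/2014-09-17/nbbo.part1.gz",
])
unloaded = json.dumps(
    {"entries": [{"url": "s3://my_ingest/2014-09-17/nbbo.part0.gz", "mandatory": False}]}
)
fixed_text = force_mandatory(unloaded)
```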

Page 17:

• TableIngestStatus
  – We originally put this table in Redshift itself
  – Turns out Redshift is not efficient on really small data sets
  – Significantly impacted performance, and increased concurrency contention
• Solution: Moved TableIngestStatus to a separate transactional RDBMS (MySQL)
  – We were already using a MySQL instance to persist workflow states

Page 18:

• Multiple layers of security
  – Direct Connect (private lines)
  – VPC
  – HTTPS/SSL/TLS (Encryption in flight)
  – AES-256 (Encryption at rest in S3)
  – Redshift encryption (Encryption at rest in Redshift)
  – HSM integration (Redshift master key managed on premise)
  – CloudTrail/STL_CONNECTION_LOG to monitor for unauthorized DB connections

Page 19:

• Direct Connect
  – No company data travels over internet circuits
• VPC
  – Isolate our Redshift servers from other tenants/internet connectivity
  – Security Groups restrict inbound/outbound connectivity

Page 20:

• All AWS API calls are made over HTTPS

• All Redshift JDBC connections must use SSL/TLS
  – Parameter Group: require_ssl = true
  – Use Redshift cluster SSL certificate to verify cluster identity
    • See http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-ssl-support.html for details
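For clients that connect through libpq-style drivers rather than JDBC (a swap from what the slide describes), certificate verification is usually requested with the `sslmode=verify-full` and `sslrootcert` connection parameters. The Python sketch below only assembles such a connection string; the host, database, user, and certificate path are hypothetical, and values containing spaces would need quoting that this sketch does not handle.

```python
def build_dsn(host: str, dbname: str, user: str,
              ca_cert_path: str, port: int = 5439) -> str:
    """Assemble a libpq-style connection string that verifies the server certificate."""
    params = {
        "host": host,
        "port": port,  # 5439 is the default Redshift port
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",     # require TLS and check the certificate and host name
        "sslrootcert": ca_cert_path,  # CA certificate used to verify the cluster
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical values, for illustration only.
dsn = build_dsn(
    "example-cluster.example.com", "warehouse", "ingest_user",
    "/etc/ssl/redshift-ca.pem",
)
```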

Page 21:

• All Redshift load files staged in S3 are AES-256 encrypted (client side, not S3 SSE)
  – Key is provided to Redshift in the S3 COPY command:

copy nbbo from 's3://my_ingest/2014-09-17/nbbo.manifest'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>;master_symmetric_key=<master_key>'
manifest encrypted gzip;

• Enable cluster encryption on Redshift
  – Only specified during cluster creation, cannot be changed
  – Applies to backups/snapshots as well
  – Performance penalty, but not optional for Nasdaq

Page 22:

• Redshift will store the cluster key in a single customer premise HSM (or CloudHSM)
  – SafeNet Luna SA HSM, firmware version should match CloudHSM
  – Requires certificate exchange between cluster and HSM
  – Requires cluster have an EIP
    • On our side, required static 1-to-1 NAT of HSM private IP
    • VPC Security Groups still apply; can still isolate cluster from others
  – Encrypted database key decrypted in HSM, passed over encrypted channel to cluster on startup, stored in memory to decrypt data encryption (block) keys
  – If running an HSM HA group, must synchronize keys after creation

Page 23:

• HSM integration was critical to Nasdaq adoption

• Monitor cluster access, react to any unauthorized connections
  – STL_CONNECTION_LOG
    • Query system table on a timed basis, alert to any unexpected access
  – CloudTrail to Splunk Redshift connection & user logs
    • Captures all API calls, not activity inside Redshift
  – STL_DDLTEXT
    • Audits all schema changes in the cluster
• In response to an alert, Redshift/HSM connectivity is severed, and cluster is immediately shut down
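The timed check against STL_CONNECTION_LOG might look roughly like the Python sketch below. It is an assumption-laden illustration, not Nasdaq's monitor: the query selects a few STL_CONNECTION_LOG columns, while the allowlist, the sample rows, and the function name are invented, and running the query and raising the alert are left to the caller.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Query to run on a timer; the %s placeholder is the time of the previous check.
CONNECTION_LOG_QUERY = """
SELECT recordtime, event, remotehost, username, dbname
FROM stl_connection_log
WHERE recordtime > %s
ORDER BY recordtime
"""


def unexpected_connections(rows: Iterable[Dict[str, str]],
                           allowed: Set[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Return log rows whose (remote host, user) pair is not on the allowlist."""
    flagged = []
    for row in rows:
        # STL character columns are space padded, so strip before comparing.
        pair = (row["remotehost"].strip(), row["username"].strip())
        if pair not in allowed:
            flagged.append(row)
    return flagged


# Hypothetical allowlist and rows, for illustration only.
allowed_pairs = {("10.0.0.5", "ingest_user")}
sample_log = [
    {"event": "initiating session", "remotehost": "10.0.0.5   ", "username": "ingest_user  "},
    {"event": "initiating session", "remotehost": "10.9.9.9   ", "username": "unknown_user "},
]
alerts = unexpected_connections(sample_log, allowed_pairs)
```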

Page 24:

• With validation, data integrity, and security requirements met, the challenge remains to optimize ingest
• Why?
  – Concurrency is a huge performance factor; can’t afford to be loading yesterday’s data when clients are running queries

Page 26:

[Chart: S3 (over HTTPS) Multithreaded Throughput. X-axis: Concurrent Threads (1 to 18). Y-axis: Throughput (MB/sec), 0 to 140.]
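The chart plots S3 throughput against the number of concurrent threads. A minimal Python sketch of uploading load files with a configurable thread count is shown below; it is not the uploader used in the talk. `upload_all` and the `upload_one` callable are hypothetical, and a real `upload_one` would wrap whatever S3 client is in use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def upload_all(paths: List[str], upload_one: Callable[[str], int],
               threads: int) -> Dict[str, int]:
    """Upload files concurrently; return bytes uploaded per path."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with paths.
        sizes = list(pool.map(upload_one, paths))
    return dict(zip(paths, sizes))


# Hypothetical stand-in uploader that just reports a fake size.
def fake_upload(path: str) -> int:
    return len(path)


uploaded = upload_all(["nbbo.part0.gz", "nbbo.part1.gz"], fake_upload, threads=4)
```

Keeping the thread count a parameter makes it easy to measure throughput at different concurrency levels, as the chart does.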

Page 33:

[Diagram: architecture across three zones: On premise; AWS Regional (Multi-AZ) Scope; AWS (US-East, primary AZ/VPC). Labels: S3; SNS; Redshift Database Cluster; HSM Key Appliance Cluster; MySQL; Redshift Load files/Manifests; Redshift Snapshots/Backups; Data Loaded Topic; RMS Input Sources (multiple systems); Data Ingest Process.]

Page 42:

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals