© 2014 Aerospike. All rights reserved. Confidential 1
What Enterprises Can Learn from Real-time Bidding
How, and why, to achieve Operational Big Data
Brian Bulkowski, CTO and co-founder, Aerospike
REQUIREMENTS FOR INTERNET ENTERPRISES
Introduction to Advertising: Real-time Bidding
North American RTB speeds & feeds
■ 1 to 6 billion cookies tracked
  ■ Some companies track 200M, some track 20B
■ Each bidder has their own data pool
  ■ Data is your weapon
  ■ Recent searches, behavior, IP addresses
  ■ Audience clusters (K-cluster, K-means) from offline Hadoop
■ “Remnant” from Google and Yahoo is about 0.6 million / sec
■ Facebook Exchange: about 0.6 million / sec
■ “Other” is about 0.5 million / sec
Currently about 3.0M / sec in North America
Advertising requirements
■ 100 millisecond or 150 millisecond ad delivery
  ■ De facto standard set in 2004 by the Washington Post and others
■ North America is 70 to 90 milliseconds wide
  ■ Two or three data centers
■ Auction is limited to 30 milliseconds
  ■ Typically closes in 5 milliseconds
■ Winners have more data and better models, in 5 milliseconds
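The 5 ms closing time above leaves no room for a slow profile read: a bidder has to answer on time or not at all. A minimal Python sketch, assuming a hypothetical `bid_with_deadline` helper, hypothetical lookup functions, and a made-up pricing rule: the profile lookup runs on a worker thread, and the bidder gives up (no bid) when the budget expires.

```python
import concurrent.futures
import time
from typing import Callable, Optional

# One shared pool; a late lookup keeps running, but the caller stops waiting for it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def bid_with_deadline(lookup_profile: Callable[[str], dict],
                      cookie_id: str,
                      budget_ms: float = 5.0) -> Optional[float]:
    """Return a bid price, or None (no bid) if the profile lookup is too slow."""
    future = _POOL.submit(lookup_profile, cookie_id)
    try:
        profile = future.result(timeout=budget_ms / 1000.0)
    except concurrent.futures.TimeoutError:
        return None  # the auction has moved on; a late bid is worthless
    # Hypothetical pricing rule: users in more segments are worth more.
    return 0.10 * len(profile.get("segments", []))


def fast_lookup(cookie_id: str) -> dict:
    return {"segments": ["sports", "travel"]}  # e.g., an in-memory store hit


def slow_lookup(cookie_id: str) -> dict:
    time.sleep(0.2)  # e.g., a rotational-disk read
    return {"segments": ["sports"]}


# Generous budget on the fast path so the demo is not timing-sensitive.
fast_bid = bid_with_deadline(fast_lookup, "cookie-123", budget_ms=100)
late_bid = bid_with_deadline(slow_lookup, "cookie-123", budget_ms=5)
```

The slow path shows the point of the design: the bidder returns no bid rather than a late one, so a slow data store costs auctions, not correctness.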
Advertising Technology Stack
[Diagram: millions of consumers and billions of devices → app servers. App servers write real-time context to, and read recent content from, the operational DB, which holds a profile store (cookies, email, device ID, IP address, location, segments, clicks, likes, tweets, search terms...) and real-time analytics (best sellers, top scores, trending tweets). Context is also written to the data warehouse for batch analytics (discover patterns, segment data: location patterns, audience affinity), which returns insights.]
Financial Services – Intraday Positions
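[Diagram: the legacy database (mainframe) supplies start-of-day data loading into the account positions store and receives end-of-day reconciliation; a real-time data feed updates positions; the finance app, records app, and real-time reporting app read, write, and query positions; XDR (cross-datacenter replication) copies positions to another data center.]
■ 10M+ user records
■ Primary key access
■ 1M+ TPS planned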
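The intraday flow (start-of-day load, real-time feed, end-of-day reconciliation) can be sketched in Python as below. The function names and the (account, symbol) key layout are assumptions for illustration; a plain dict stands in for the primary-key store.

```python
from typing import Dict, List, Tuple

Positions = Dict[Tuple[str, str], int]  # (account, symbol) -> quantity


def start_of_day_load(mainframe_snapshot: Positions) -> Positions:
    """Copy the system of record's opening positions into the fast store."""
    return dict(mainframe_snapshot)


def apply_trade(store: Positions, account: str, symbol: str, qty: int) -> None:
    """Apply one event from the real-time feed (negative qty = sell)."""
    key = (account, symbol)
    store[key] = store.get(key, 0) + qty


def end_of_day_reconcile(store: Positions, mainframe_eod: Positions) -> List[Tuple[str, str]]:
    """Return the positions where the intraday store disagrees with the mainframe."""
    keys = set(store) | set(mainframe_eod)
    return sorted(k for k in keys if store.get(k, 0) != mainframe_eod.get(k, 0))


store = start_of_day_load({("acct1", "IBM"): 100})
apply_trade(store, "acct1", "IBM", -40)
apply_trade(store, "acct2", "AAPL", 10)
mismatches = end_of_day_reconcile(store, {("acct1", "IBM"): 60, ("acct2", "AAPL"): 5})
```

Every intraday read and write is a primary-key access, which is what lets the fast store absorb the load while the mainframe stays the system of record.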
Social Media
[Diagram: a Java application tier handles data abstraction and sharding over MySQL or Postgres (rotational disk) and a modified Redis (SSD enabled); stored data: recent user-generated content, content and historical data.]
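The application-tier sharding on this slide can be sketched as a hash-routing layer. `ShardedStore` is hypothetical, and dicts stand in for the MySQL/Postgres and Redis instances.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Hypothetical data-abstraction layer that routes each user to one shard."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one database instance.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, user_id: str) -> int:
        # A stable hash: Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, user_id: str, content: str) -> None:
        self.shards[self.shard_for(user_id)][user_id] = content

    def get(self, user_id: str) -> Optional[str]:
        return self.shards[self.shard_for(user_id)].get(user_id)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user{i}", f"post by user{i}")
```

Routing with `hash % num_shards` is the simplest choice, but changing the shard count remaps most users; that resharding cost is one reason to push sharding out of the application and into the data layer.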
Travel Portal
[Diagram: a poller polls the pricing database (rate limited) for pricing changes and stores the latest price in a pricing data store; the travel app reads prices from that store and keeps session data in a session management store; XDR replicates to another data center.]
■ Airlines forced interstate banking
■ Legacy mainframe technology
■ Multi-company reservation and pricing
■ Requirement: 1M TPS, allowing for overhead
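A minimal sketch of the poll-and-cache pattern above, assuming a hypothetical change-log API (`changes_since`) on the pricing database: one poll copies the latest prices into a fast store, and app reads never touch the rate-limited database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PricingDatabase:
    """Stand-in for the rate-limited system of record (hypothetical API)."""
    change_log: List[Tuple[str, float]] = field(default_factory=list)
    calls: int = 0

    def changes_since(self, offset: int) -> List[Tuple[str, float]]:
        self.calls += 1  # every call counts against the rate limit
        return self.change_log[offset:]


class PriceCache:
    """Fast store that keeps only the latest price per fare key."""

    def __init__(self, db: PricingDatabase) -> None:
        self.db = db
        self.offset = 0
        self.latest: Dict[str, float] = {}

    def poll(self) -> int:
        """One polling cycle; returns the number of changes applied."""
        changes = self.db.changes_since(self.offset)
        for fare_key, price in changes:
            self.latest[fare_key] = price  # later changes overwrite earlier ones
        self.offset += len(changes)
        return len(changes)

    def read_price(self, fare_key: str) -> Optional[float]:
        # App reads are served here and never reach the pricing database.
        return self.latest.get(fare_key)


db = PricingDatabase()
db.change_log += [("SFO-JFK", 320.0), ("SFO-JFK", 299.0), ("BOS-ORD", 150.0)]
cache = PriceCache(db)
applied = cache.poll()
prices = [cache.read_price("SFO-JFK") for _ in range(1000)]
```

A thousand app reads cost one call against the rate limit; the read rate is decoupled from what the legacy pricing system can sustain.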
QoS & Real-Time Billing for Telcos
■ In-switch, per-HTTP-request billing
■ US telcos: 200M subscribers, 50 metros
■ In-memory use case
[Diagram: a request from the source device/user passes real-time checks (authentication, QoS, billing) before the request executes to the destination; a config module app updates device and user settings; XDR replicates to a hot standby.]
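The per-request check can be sketched as a single lookup-and-debit against the subscriber record. The `Subscriber` fields, QoS classes, and return values are hypothetical; in a real deployment the check and debit must be atomic per record, which this single-threaded sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Subscriber:
    authorized: bool
    qos_class: str       # hypothetical classes: "gold" or "standard"
    balance_bytes: int   # prepaid data remaining


def check_and_bill(subs: Dict[str, Subscriber], device_id: str, request_bytes: int) -> str:
    """Per-request gate: authenticate, check balance, debit, then pick a QoS lane."""
    sub = subs.get(device_id)
    if sub is None or not sub.authorized:
        return "reject"
    if sub.balance_bytes < request_bytes:
        return "reject"
    sub.balance_bytes -= request_bytes  # billing happens before the request executes
    return "priority" if sub.qos_class == "gold" else "best_effort"


subs = {
    "dev-1": Subscriber(authorized=True, qos_class="gold", balance_bytes=1_000),
    "dev-2": Subscriber(authorized=True, qos_class="standard", balance_bytes=100),
    "dev-3": Subscriber(authorized=False, qos_class="gold", balance_bytes=10_000),
}
results = [
    check_and_bill(subs, "dev-1", 400),
    check_and_bill(subs, "dev-2", 500),   # not enough balance
    check_and_bill(subs, "dev-3", 10),    # not authorized
    check_and_bill(subs, "dev-1", 400),
    check_and_bill(subs, "dev-1", 400),   # balance exhausted
    check_and_bill(subs, "dev-2", 50),
]
```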
The New Architecture
[Diagram: millions of consumers and billions of devices → app servers → a single tier for transactions and hot analytics; app servers write context to batch analytics, which returns insights.]
Old Architecture (scale out in 2000)
[Diagram: content delivery network → load balancer → app servers (request routing and sharding) → cache → database → storage.]
Modern Scale Out Architecture
[Diagram: content delivery network → load balancer (simple, stateless) → app servers (fast, stateless) → in-memory NoSQL; research warehouse for long-term cold storage.]
Modern Scale Out Architecture (continued)
[Diagram: the same stack, with an HDFS-based research warehouse.]
Build a data layer
Use open source
Focus on Key Value
Use In-memory NoSQL
Use Flash
How Fast You Can Go, and How To Do It
SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY
■ Every node in a cluster is identical and handles both transactions and long-running tasks
■ Data is replicated synchronously with immediate consistency within the cluster
■ Data is replicated asynchronously across data centers
[Diagram: Ohio data center]
LESSONS
1. Optimize key-value code paths
  ■ No hot spots (e.g., robust DHT)
  ■ Scales up easily (e.g., easy to size)
  ■ Avoids points of failure (e.g., single node type)
  ■ Binary protocol
2. Code in C
  ■ read() / write() / Linux AIO (don’t trust a library)
  ■ Multithread!
  ■ Direct device access
  ■ C++ could work, but leads to structural complexity
3. Memory allocation matters
  ■ Stack-based allocators
  ■ Our own stack allocator
  ■ jemalloc for pools
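Lesson 1's “no hot spots” can be illustrated by hashing keys into a fixed set of partitions and spreading partitions over nodes. Aerospike uses 4,096 partitions and a RIPEMD-160 key digest; this Python sketch substitutes SHA-1 for the digest and uses rendezvous hashing for partition ownership, which is an assumption for illustration rather than Aerospike's exact algorithm. `partition_id`, `owner`, and `partition_map` are hypothetical names.

```python
import hashlib
from collections import Counter
from typing import Dict, List

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4,096 partitions


def partition_id(set_name: str, key: str) -> int:
    # Aerospike digests the key with RIPEMD-160; SHA-1 stands in here because
    # not every hashlib build provides RIPEMD-160.
    digest = hashlib.sha1(f"{set_name}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def owner(pid: int, nodes: List[str]) -> str:
    """Rendezvous hashing: the node with the highest (node, partition) score wins."""
    def score(node: str) -> int:
        h = hashlib.sha1(f"{node}/{pid}".encode("utf-8")).digest()
        return int.from_bytes(h[:8], "big")
    return max(nodes, key=score)


def partition_map(nodes: List[str]) -> Dict[int, str]:
    return {pid: owner(pid, nodes) for pid in range(N_PARTITIONS)}


before = partition_map(["A", "B", "C"])
after = partition_map(["A", "B", "C", "D"])
moved = [p for p in range(N_PARTITIONS) if before[p] != after[p]]
load = Counter(after.values())
```

With rendezvous hashing, adding node D moves only the partitions D now owns (roughly a quarter of them); every other partition stays put, and each node ends up with close to 1,024 partitions, so no single node becomes a hot spot.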
LESSONS (cont’d)
4. Innovation: masters in a shared-nothing system
  ■ Fast cluster organization
  ■ Fast transaction capabilities
  ■ Can be CP or AP, and resolve data accurately
5. Clients are hard
  ■ Fast, stable connection pools are hard
  ■ API design matters
  ■ Slow languages need Aerospike more
6. Aggregations and queries are required
  ■ Row oriented
  ■ Secondary indexes as filters, and MapReduce (MR) style
  ■ Data transformation hurts
LESSONS (cont’d)
7. Network interrupts are painful
  ■ TCP is still better than UDP
  ■ Some great hardware out there (Solarflare)
  ■ Network queues, RRD
  ■ New PCIe and other interfaces are not there yet
  ■ Larger clusters solve interface issues
8. “Large data types” (better documents)
  ■ Ordered list, map, set, stack
  ■ Time series and documents
  ■ Beyond per-row storage layout; more efficient than documents
9. UDFs for flexibility
WRITING RELIABLY WITH HIGH PERFORMANCE
Writing with Immediate Consistency
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback/rollforward)
[Diagram: master and replica]
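Steps 2 through 5 of the write path can be modeled in a few lines. This is a toy, single-process Python sketch: `PartitionMaster` is hypothetical, the replica is a second dict rather than a remote node, and the disk queue is only filled, never drained.

```python
import queue
import threading
from typing import Any, Dict, Tuple


class PartitionMaster:
    """Toy model of one partition's in-memory master and replica copies."""

    def __init__(self) -> None:
        self.master: Dict[str, Any] = {}
        self.replica: Dict[str, Any] = {}
        self._latches: Dict[str, threading.Lock] = {}
        self._latch_guard = threading.Lock()
        self.disk_queue: "queue.Queue[Tuple[str, Any]]" = queue.Queue()

    def _latch(self, key: str) -> threading.Lock:
        with self._latch_guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key: str, value: Any) -> bool:
        # Step 1 (routing the write to this master) is assumed to have happened.
        with self._latch(key):                 # step 2: one writer per record at a time
            self.master[key] = value           # step 3: master memory ...
            self.replica[key] = value          # ... and replica memory, before the ack
            self.disk_queue.put((key, value))  # step 4: persistence is queued
        return True                            # step 5: signal completion


p = PartitionMaster()
threads = [threading.Thread(target=p.write, args=(f"k{i % 10}", i)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because master and replica are updated inside the same per-record latch, concurrent writers to one key can never leave the two copies disagreeing; the disk write is off the acknowledgment path.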
Adding a Node
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
Transactions continue during migration.
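The migration steps can be sketched the same way: writes that arrive while a partition is being copied are journaled, then replayed on the destination before the source copy is dropped. `Migration` is a hypothetical, single-threaded Python model; it does not show the atomic ownership switch in step 5.

```python
from typing import Any, Dict, List, Tuple


class Migration:
    """Toy partition migration: copy a snapshot, journal concurrent writes, replay."""

    def __init__(self, source: Dict[str, Any]) -> None:
        self.source = source
        self.destination: Dict[str, Any] = {}
        self.journal: List[Tuple[str, Any]] = []
        self.active = False

    def start(self) -> None:
        self.active = True  # step 4: journaling starts when the migration starts

    def client_write(self, key: str, value: Any) -> None:
        self.source[key] = value  # transactions keep going to the current owner
        if self.active:
            self.journal.append((key, value))

    def copy_snapshot(self, snapshot: Dict[str, Any]) -> None:
        self.destination.update(snapshot)  # bulk move of the partition's data

    def finish(self) -> Dict[str, Any]:
        for key, value in self.journal:  # step 6: apply journal, then drop source
            self.destination[key] = value
        self.active = False
        self.source.clear()
        return self.destination


src = {"a": 1, "b": 2}
m = Migration(src)
m.start()
snapshot = dict(src)       # copy taken when the migration starts
m.client_write("b", 20)    # a transaction that lands mid-migration
m.client_write("c", 3)
m.copy_snapshot(snapshot)
dest = m.finish()
```

The snapshot alone would miss the two mid-migration writes; replaying the journal is what lets transactions continue without being lost.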
Key Value Store + Lists, Maps
■ Namespaces (policy containers)
  ■ Determine storage: DRAM or flash
  ■ Determine replication factor
  ■ Contain records and sets
■ Sets (tables) of records
  ■ Arbitrary grouping
■ Records (rows) of key and bins
  ■ Block size (128 KB to 2 MB)
■ Bins with the same name can contain values of different types
  ■ String, integer, bytes (raw, blob, etc.)
  ■ List (an ordered collection of values)
  ■ Map (a collection of keys and values)
■ Bins can be added at any time
■ Metadata
  ■ Generation counter, so apps can ensure that a record was not modified since the last read
  ■ Time-to-live value for automatic expiration, keeping the most recent context or "hot" data and aging out historical context
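The generation counter and time-to-live can be modeled as a small in-memory store. `Store`, `put`, and `expect_generation` are hypothetical names, not the Aerospike client API; the clock is injectable so expiry can be shown without waiting, and a write without a `ttl` is assumed never to expire.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Record:
    bins: Dict[str, Any]
    generation: int
    expires_at: Optional[float]  # None means the record never expires


class GenerationError(Exception):
    """Raised when a check-and-set write finds a newer generation."""


class Store:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.records: Dict[str, Record] = {}
        self.clock = clock

    def get(self, key: str) -> Optional[Record]:
        rec = self.records.get(key)
        if rec is not None and rec.expires_at is not None and self.clock() >= rec.expires_at:
            del self.records[key]  # lazily age out expired data
            return None
        return rec

    def put(self, key: str, bins: Dict[str, Any], ttl: Optional[float] = None,
            expect_generation: Optional[int] = None) -> int:
        rec = self.get(key)
        current = rec.generation if rec else 0
        if expect_generation is not None and expect_generation != current:
            raise GenerationError(f"{key}: expected {expect_generation}, found {current}")
        merged = dict(rec.bins) if rec else {}
        merged.update(bins)  # new bins can be added at any time
        expires_at = self.clock() + ttl if ttl is not None else None
        self.records[key] = Record(merged, current + 1, expires_at)
        return current + 1


now = [0.0]
s = Store(clock=lambda: now[0])
g1 = s.put("user:1", {"segments": ["sports"]}, ttl=60)
g2 = s.put("user:1", {"clicks": 3}, ttl=60, expect_generation=g1)
try:
    s.put("user:1", {"clicks": 4}, expect_generation=g1)  # stale: g1 is out of date
    stale_write_rejected = False
except GenerationError:
    stale_write_rejected = True
bins_before_expiry = s.get("user:1").bins
now[0] = 120.0
expired = s.get("user:1") is None
```

The rejected write is the read-modify-write race the generation counter exists to catch: the writer read generation 1, but someone else already moved the record to generation 2.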
KVS + Lists, Maps + Queries + UDFs
STREAM AGGREGATIONS (INDEXED MAP-REDUCE)
Pipe query results through UDFs
■ Filter, transform, aggregate: map, reduce
■ Enforce security
■ UDFs in Lua to:
  ■ CRUD a record
  ■ Perform calculations based on data within a record
  ■ Iterate through a set or namespace of records
■ UDFs for real-time analytics and aggregations
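A stream aggregation is a filter, map, and reduce applied to query results. Aerospike stream UDFs are written in Lua; this Python stand-in (`run_stream` and `merge_counts` are hypothetical) runs the pipeline per node over made-up records, then merges the partial results as a client would.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]


def run_stream(records: Iterable[Record],
               keep: Callable[[Record], bool],
               mapper: Callable[[Record], Any],
               reducer: Callable[[Any, Any], Any],
               initial: Any) -> Any:
    """Filter -> map -> reduce over one node's query results."""
    return reduce(reducer, (mapper(r) for r in records if keep(r)), initial)


def merge_counts(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Reducer: add per-key counts (used on each node and again on the client)."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out


node1 = [{"gender": "M", "age": 30}, {"gender": "F", "age": 25}]
node2 = [{"gender": "M", "age": 41}, {"gender": "M", "age": 17}]
partials = [
    run_stream(n, lambda r: r["age"] >= 18, lambda r: {r["gender"]: 1}, merge_counts, {})
    for n in (node1, node2)
]
total = reduce(merge_counts, partials, {})
```

Using the same associative reducer on the nodes and on the client means only small partial results cross the network, not the matching records.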
SQL & NoSQL
➤ Secondary index
  § Equality, range, IN (...), compound
  § e.g. WHERE group_id = 1234, WHERE last_activity > 1349293398, WHERE branch_id IN (5,6,7,8)
➤ Filters
  § SQL: WHERE clause with non-indexed “AND”s (e.g. “AND gender = ‘M’”)
  § NoSQL: map step
➤ Aggregation
  § SQL: GROUP BY, ORDER BY, LIMIT, OFFSET
  § NoSQL: reduce step
[Diagram: query path. A query goes from the secondary key to the primary key to the record (DRAM and SSD); the server runs filter/map and aggregate steps, and the client runs the final reduce and aggregate.]
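The secondary-index predicates above (equality, range, IN) can all be served from one sorted structure. A minimal Python sketch, assuming a hypothetical `SecondaryIndex` class kept as a sorted list of (value, primary key) pairs over made-up records; the non-indexed `gender` test is then applied to the fetched records, like the SQL “AND” or the NoSQL map step.

```python
import bisect
from typing import Any, Dict, Iterable, List, Tuple


class SecondaryIndex:
    """Sorted (value, primary key) pairs over one integer bin (hypothetical)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str]] = []

    def add(self, value: int, pk: str) -> None:
        bisect.insort(self.entries, (value, pk))

    def between(self, lo: int, hi: int) -> List[str]:
        """Range predicate: primary keys with lo <= value <= hi."""
        start = bisect.bisect_left(self.entries, (lo, ""))  # "" sorts before any key
        out = []
        for value, pk in self.entries[start:]:
            if value > hi:
                break
            out.append(pk)
        return out

    def equal(self, value: int) -> List[str]:
        return self.between(value, value)

    def any_of(self, values: Iterable[int]) -> List[str]:
        """IN (...) predicate: one equality lookup per distinct value."""
        return [pk for v in sorted(set(values)) for pk in self.equal(v)]


records: Dict[str, Dict[str, Any]] = {
    "u1": {"group_id": 1234, "branch_id": 5, "gender": "M"},
    "u2": {"group_id": 1234, "branch_id": 9, "gender": "F"},
    "u3": {"group_id": 77, "branch_id": 6, "gender": "M"},
}
by_group, by_branch = SecondaryIndex(), SecondaryIndex()
for pk, rec in records.items():
    by_group.add(rec["group_id"], pk)
    by_branch.add(rec["branch_id"], pk)

# WHERE group_id = 1234 AND gender = 'M': index lookup, then a non-indexed filter.
males_in_group = [pk for pk in by_group.equal(1234) if records[pk]["gender"] == "M"]
in_branches = by_branch.any_of([5, 6, 7, 8])  # WHERE branch_id IN (5,6,7,8)
ranged = by_branch.between(6, 100)            # WHERE branch_id >= 6
```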
Flash storage
The Power of Flash Storage
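[Diagram: a traditional database reaches SSD or HDD through the OS file system, page cache, and block interface (“Ask me. I’ll look up the answer and then tell it to you.”); a flash-optimized database reaches SSDs through the block interface or Open NVM directly (“Ask me and I’ll tell you the answer.”).]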
HYBRID MEMORY SYSTEM™: FLASH OPTIMIZED, HIGH PERFORMANCE
• Direct device access
• Large block writes
• Indexes in DRAM
• Highly parallelized
• Log-structured FS (“copy-on-write”)
• Fast restart with shared memory
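The log-structured, copy-on-write layout with indexes in DRAM can be sketched as follows. `LogStructuredStore` is hypothetical and keeps its “device” in memory, whereas the real system writes to raw flash devices. Updates append to the current write block and repoint the index, so old copies become garbage for defragmentation to reclaim.

```python
from typing import Dict, List, Optional, Tuple

WRITE_BLOCK = 128 * 1024  # large write blocks, as in the ACT workload


class LogStructuredStore:
    """Append-only write blocks on a simulated device, with the index in DRAM."""

    def __init__(self, block_size: int = WRITE_BLOCK) -> None:
        self.block_size = block_size
        self.blocks: List[bytes] = []  # flushed, immutable blocks (the "device")
        self.buffer = bytearray()      # current write block being filled
        self.index: Dict[str, Tuple[int, int, int]] = {}  # key -> (block, offset, length)

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self.block_size:
            raise ValueError("record larger than a write block")
        if len(self.buffer) + len(value) > self.block_size:
            self.flush()
        # Copy-on-write: never overwrite in place; any older copy becomes garbage.
        self.index[key] = (len(self.blocks), len(self.buffer), len(value))
        self.buffer += value

    def flush(self) -> None:
        if self.buffer:
            self.blocks.append(bytes(self.buffer))  # one large sequential write
            self.buffer = bytearray()

    def get(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)
        if loc is None:
            return None  # index miss: no device read at all
        block, offset, length = loc
        src = self.buffer if block == len(self.blocks) else self.blocks[block]
        return bytes(src[offset:offset + length])  # one random read


store = LogStructuredStore(block_size=8)  # tiny blocks to make flushing visible
store.put("a", b"hello")
store.put("b", b"world")  # does not fit: block 0 is flushed first
store.put("a", b"HI")     # new copy of "a"; b"hello" in block 0 is now garbage
```

Reads cost at most one device access because the DRAM index already knows the location, and writes reach the device only as whole blocks.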
Measure your drives!
Aerospike Certification Tool (ACT): http://github.com/aerospike/act
Transactional database workload:
■ Reads: 1.5 KB (random; reads can’t be batched or cached)
■ Writes: 128 KB blocks (log-based layout, plus defragmentation)
Turn up the load until latency exceeds the required SLA.
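The “turn up the load” procedure amounts to stepping the load and checking the latency distribution at each step. The thresholds and limits below (at most 5% of reads over 1 ms, 1% over 8 ms, 0.1% over 64 ms), the helper names, and the latency samples are all assumptions for illustration, not ACT's output format.

```python
from typing import Dict, List, Optional, Sequence

# Assumed SLA: threshold (ms) -> maximum percentage of reads allowed above it.
DEFAULT_LIMITS: Dict[float, float] = {1: 5.0, 8: 1.0, 64: 0.1}


def pct_over(latencies_ms: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Percentage of samples strictly above each threshold."""
    n = len(latencies_ms)
    return {t: 100.0 * sum(1 for x in latencies_ms if x > t) / n for t in thresholds}


def meets_sla(latencies_ms: Sequence[float],
              limits: Dict[float, float] = DEFAULT_LIMITS) -> bool:
    over = pct_over(latencies_ms, list(limits))
    return all(over[t] <= limit for t, limit in limits.items())


def max_passing_load(results: Dict[int, List[float]],
                     limits: Dict[float, float] = DEFAULT_LIMITS) -> Optional[int]:
    """Step up the load; return the highest level that still meets the SLA."""
    best = None
    for load in sorted(results):
        if not meets_sla(results[load], limits):
            break  # latency is over the SLA; stop turning up the load
        best = load
    return best


# Hypothetical read-latency samples (ms) at 1x, 2x, and 3x load.
results = {
    1: [0.5] * 100,
    2: [0.5] * 97 + [2.0] * 3,   # 3% over 1 ms: within the assumed 5% limit
    3: [0.5] * 90 + [2.0] * 10,  # 10% over 1 ms: fails
}
```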
Aerospike’s Flash Experience
■ Know your flash
  ■ ACT benchmark: http://github.com/aerospike/act
  ■ Read/write benchmark results going back to 2011
■ All clouds support flash now
  ■ New EC2 instances
  ■ Google Compute
  ■ Internap, SoftLayer, GoGrid…
■ Write durability is usually not a problem with modern flash
  ■ Durability is high (e.g., 5 drive writes per day for 5 years)
  ■ Read performance suffers under write load anyway
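The durability rating is easy to turn into a write budget. Taking the 800 GB drive size from the densities slide and the “5 drive writes per day for 5 years” rating above, the hypothetical helper below works out the daily, lifetime, and sustained write rates (decimal units assumed, as drives are marketed).

```python
SECONDS_PER_DAY = 86_400


def endurance(capacity_gb: float, dwpd: float, years: float) -> dict:
    """Write budget implied by a drive-writes-per-day (DWPD) rating."""
    daily_gb = capacity_gb * dwpd                      # full-drive writes per day
    total_pb = daily_gb * 365 * years / 1_000_000      # lifetime writes, GB -> PB
    sustained_mb_s = daily_gb * 1_000 / SECONDS_PER_DAY
    return {"daily_gb": daily_gb, "total_pb": total_pb, "sustained_mb_s": sustained_mb_s}


budget = endurance(capacity_gb=800, dwpd=5, years=5)
```

That works out to 4 TB per day, about 7.3 PB over the drive's life, or roughly 46 MB/s of writes around the clock, which a read-heavy transactional workload rarely sustains.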
Aerospike’s Flash Experience
■ Densities increasing
  ■ 100 GB two years ago → 800 GB today
  ■ SATA vs. PCIe
  ■ Appliances: 50 TB per 1U this year
■ Prices still dropping: perhaps $1/GB next year
■ Intel P3700 results
  ■ 250K per device at $2.50/GB
  ■ Old standard: Micron P320h, 500K at $8/GB
■ “Wide SATA”
  ■ 20 SATA drives
  ■ LSI “pass-through mode”
  ■ 250K+ per server
Use Open Source
Aerospike: the trusted In-Memory NoSQL
Performance
• Over ten trillion transactions per month
• 99% of transactions < 2 ms
• 150K TPS per server
Scalability
• Billions of Internet users
• Clustered software
• Maintenance without downtime
• Scale up and scale out
Reliability
• 50 customers; zero downtime
• Immediate consistency
• Rapid failover; data center replication
Price/Performance
• Makes impossible projects affordable
• Flash-optimized
• 1/10 the servers required