DESCRIPTION
Introducing Bazaarvoice datastore (EmoDB)

EmoDB is a RESTful HTTP server used by Bazaarvoice for storing JSON objects and for watching for changes to those objects. It also supports a blob store, a queueing service, and a data bus to track events. It is designed to span multiple data centers, using eventual consistency (AP) and multi-master conflict resolution. It relies on Apache Cassandra for persistence and cross-data center replication.

About Bazaarvoice

Bazaarvoice is based on a simple truth: when people talk to each other, people buy stuff they are happy about because they trust the opinions of others. We see a day when all voices are connected and, together, help the marketplace function better. We’ve built a network that connects businesses together to amplify the authentic voices of people wherever they shop: online, in-store and mobile. Our mission, just like our name, is to be the "voice of the marketplace", one authentic conversation at a time. Each month, more than 450 million people view and share opinions, questions and experiences on more than 20 million products in the Bazaarvoice network. Our technology platform channels these voices into places that help consumers make purchasing decisions. Our engineers have the opportunity to work on many areas of computer science, including distributed computing, natural language processing, big data analytics, large-scale system design, and user interface design, just to name a few. We use the latest technologies in cloud, NoSQL, advanced client-side JavaScript, and more to solve problems at a scale that few companies can offer.

About Fahd Siddiqui

Fahd Siddiqui is a Senior Software Engineer on the data infrastructure team at Bazaarvoice. His interests include highly scalable, distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin.
Confidential and Proprietary. © 2013 Bazaarvoice, Inc.
EmoDB
Store your feelings here
www.bazaarvoice.com
SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights.
Thousands of clients
Hundreds of millions of pieces of content
Hundreds of millions of unique visitors per month
Tens of billions of pageviews per month
Austin-based company founded in 2005
Engineering offices: Austin, San Francisco, New York
$ whoami
Fahd Siddiqui
Senior Software Engineer, Data Infrastructure
Bazaarvoice
linkedin.com/in/fahdsiddiqui
fahd.siddiqui@bazaarvoice.com
Global Monthly Unique Visitors
[Chart: values 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M; series labels not preserved]
Monthly stats as of July 2013
Values shown: 16B, 1B, 480M, 250M, 118M, 3M, 4000, 2500
Labels shown: Review impressions; Pageviews (37k rps); Unique users; Products in catalog; Total reviews; Monthly new reviews; Customer implementations; Servers
95 Engineers
Data Infrastructure
Goals for EmoDB
Store just about anything in a flexible way
Support “Universal Content Type” – store any content type without any re-architecture
Watch for changes to data (events)
Expose a RESTful API
Multi-master, multi-datacenter, fault tolerant, with horizontal scaling for reads and writes
EmoDB Overview
• System of Record
• Databus
• Queue Service
• Blob Store
... backed by Cassandra
SoR - tables
What is an Emo table?
• A bucket that contains JSON documents.
• Creating one is cheap, and you may create as many as you want, e.g., review:testcustomer.
• Any particular row can be fetched by its row ID.
• Complete table scans use splits so that they can run in parallel.
SoR - tables
Create a table
$ curl -s -XPUT -H "Content-Type: application/json" \
    "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
    --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
{
  "success": true
}
Store a document
$ curl -s -XPUT -H "Content-Type: application/json" \
    "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
    --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
{
  "success": true
}
SoR – rows
• A row is composed of deltas.
• Writers append deltas, and readers resolve deltas to produce a resolved object.
• Compaction occurs only once the data has been replicated to all data centers.
• Because deltas accumulate until then, EmoDB is not a good fit for systems with a high update-to-create ratio.
Data Access in EmoDB
3 ways to read data out of EmoDB:
• Lookup by primary key
• Bulk extract (scan)
• Change feed (using the EmoDB databus)
What’s missing?
• Where, join, group by: anything other than primary key lookup
• Use another indexing mechanism for complex queries (such as Elasticsearch, Solr, etc.)
SoR – Data storage challenges
Problem 1:
• Need a way to cheaply create tens of thousands of “tables”
• As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node
• Way too much overhead to dedicate a CF to each user-defined table
Hint: We’ll use only one column family to store all tables
SoR – Data storage challenges
Problem 2 (once Problem 1 is solved):
• Need to scan entire tables so that they can be indexed by Polloi (Elasticsearch)
• Requires a way to split tables into shards that enable sequential scans
• Shards for each table should be fully distributed over the Cassandra cluster
SoR – Data storage
Solution to both Problem 1 and 2:
• Row key byte buffer contains a 9-byte “table prefix” followed by the content key:
  • Byte 0: 8-bit shard identifier
  • Bytes 1–8: 64-bit table UUID
  • Remaining N bytes: UTF-8-encoded content key
• Shard identifier is determined by the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key)
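The row key layout described on this slide can be sketched in Java as follows. This is a minimal illustration, not EmoDB's actual code: the class and method names are hypothetical, the table UUID is modeled as a `long`, and the hash seed (0), the big-endian byte order, and the exact bytes fed to the hash (table UUID followed by the UTF-8 content key) are assumptions. The hash itself is a plain MurmurHash3 x86 32-bit implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeySketch {
    /** MurmurHash3 x86 32-bit over the whole array. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length / 4;
        for (int i = 0; i < blocks; i++) {
            // Read each 4-byte block as a little-endian int.
            int k = (data[4 * i] & 0xff) | ((data[4 * i + 1] & 0xff) << 8)
                    | ((data[4 * i + 2] & 0xff) << 16) | ((data[4 * i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int tail = blocks * 4;
        int k = 0;
        switch (data.length & 3) {   // intentional fall-through for the 1-3 trailing bytes
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b; h ^= h >>> 13; h *= 0xc2b2ae35; h ^= h >>> 16;
        return h;
    }

    /** Bottom 8 bits of the hash of (table UUID | content key); input layout and seed are assumptions. */
    static byte shard(long tableUuid, byte[] contentKey) {
        byte[] input = ByteBuffer.allocate(8 + contentKey.length).putLong(tableUuid).put(contentKey).array();
        return (byte) murmur3_32(input, 0);
    }

    /** Row key = 1-byte shard, 8-byte table UUID, then the UTF-8 content key. */
    static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(9 + key.length)
                .put(shard(tableUuid, key))
                .putLong(tableUuid)
                .put(key)
                .array();
    }
}
```

Because the shard byte comes first, rows of one table land in 256 different regions of a byte-ordered key space, while rows within a shard stay contiguous.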
SoR – Scanning Table
• The shard identifier spreads content for a given table across the cluster to avoid hotspots (using ByteOrderedPartitioner)
• All content for a table can be fetched in parallel using 2^8 = 256 range queries
• There you have it: a single CF offering range scans for segments (tables) that are fully distributed over the cluster!
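As a rough sketch of where the 256 range queries come from, the code below derives one key range per shard for a table. It assumes the 9-byte prefix layout (shard byte, then an 8-byte table UUID modeled as a `long`) and unsigned byte-wise key ordering; the class and method names are hypothetical, and no Cassandra client call is shown.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class ShardRangeSketch {
    /** 9-byte prefix shared by every row of one table within one shard. */
    static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    /** An exclusive upper bound for all keys starting with the prefix, or null if none exists. */
    static byte[] exclusiveEnd(byte[] prefix) {
        byte[] end = prefix.clone();
        for (int i = end.length - 1; i >= 0; i--) {
            if (++end[i] != 0) {
                return end;      // incremented without overflowing this byte
            }
            // byte wrapped from 0xFF to 0x00: carry into the next byte to the left
        }
        return null;             // prefix was all 0xFF bytes: the range is open-ended
    }

    /** One [start, end) key range per shard: 2^8 = 256 ranges in total. */
    static List<byte[][]> scanRanges(long tableUuid) {
        List<byte[][]> ranges = new ArrayList<>(256);
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            ranges.add(new byte[][] {start, exclusiveEnd(start)});
        }
        return ranges;
    }
}
```

Each range can be handed to a separate worker, since the ranges never overlap.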
SoR – Scanning Table
Table UUIDs also solved another problem for us:
• Multiple tables can now be stored in the same CF
• Since row keys use the UUID rather than the table name, we can DROP a table and CREATE another with the same name
• DROP’ed tables are deleted lazily, which is especially important in an eventually consistent world
SoR – Parallel Scan
Call getSplits() method to get a list of split identifiers
Then, in parallel, scan the data in each split by calling the getSplit() method
Java:
Collection<String> getSplits(String table, int desiredRecordsPerSplit);
Iterator<Map<String, Object>> getSplit(String table, String split, @Nullable String fromKeyExclusive, long limit, ReadConsistency consistency);
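One way to use these two methods together is sketched below: fetch the split identifiers, then scan each split on a thread pool. The `DataStore` interface here only mirrors the two signatures above; the `ReadConsistency` values, the `inMemory` stand-in (which ignores `fromKeyExclusive` and `limit`), the split size of 1000, and the class name are assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelScanSketch {
    enum ReadConsistency { WEAK, STRONG }   // stand-in values, assumed for this sketch

    interface DataStore {
        Collection<String> getSplits(String table, int desiredRecordsPerSplit);
        Iterator<Map<String, Object>> getSplit(String table, String split, String fromKeyExclusive,
                                               long limit, ReadConsistency consistency);
    }

    /** Counts every row in the table by scanning all splits in parallel. */
    static long countRows(DataStore store, String table, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (String split : store.getSplits(table, 1000)) {
                parts.add(pool.submit(() -> {
                    long n = 0;
                    Iterator<Map<String, Object>> rows =
                            store.getSplit(table, split, null, Long.MAX_VALUE, ReadConsistency.STRONG);
                    while (rows.hasNext()) {
                        rows.next();
                        n++;
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                try {
                    total += part.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** In-memory stand-in used only to exercise the sketch; one list of rows per split. */
    static DataStore inMemory(Map<String, List<Map<String, Object>>> rowsBySplit) {
        return new DataStore() {
            public Collection<String> getSplits(String table, int desiredRecordsPerSplit) {
                return rowsBySplit.keySet();
            }
            public Iterator<Map<String, Object>> getSplit(String table, String split, String fromKeyExclusive,
                                                          long limit, ReadConsistency consistency) {
                return rowsBySplit.get(split).iterator();
            }
        };
    }
}
```

A real scanner would also pass the last key seen as `fromKeyExclusive` to resume a split after a failure.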
SoR – Parallel Scan
Java code sample
SoR - Deltas
• Documents are stored as a sequence of deltas
• Readers evaluate the deltas in order to produce the document
• Create, update, and delete documents by creating deltas
• Weak consistency: no document-level locking
SoR - Deltas
• Concurrent writes such as t2 and t3 would typically be a replication conflict
• But since each delta specifies only the fields it modifies, the deltas merge together cleanly and produce the desired result
• No synchronous cross-data center communication is required for concurrent modification
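The merge behavior can be illustrated with a small resolver that applies deltas in timestamp order. This is a simplified sketch, not EmoDB's delta engine: each delta is modeled as a flat map of the fields it sets, timestamps are plain `long` values rather than time UUIDs, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;

final class DeltaResolveSketch {
    /** Applies deltas oldest-first; each delta touches only the fields it names. */
    static Map<String, Object> resolve(SortedMap<Long, Map<String, Object>> deltasByTime) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map<String, Object> delta : deltasByTime.values()) {
            document.putAll(delta);   // a later delta wins, but only for its own fields
        }
        return document;
    }
}
```

Because the deltas are keyed by time, two data centers that receive t2 and t3 in different arrival orders still resolve to the same document once both deltas have replicated.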
SoR - Deltas
Recursive, pattern matching approach
Operations available for:
• Setting a value
• Deleting a value
• Updating a value for a key in a map
No operation for modifying a list
• Model a list using a map
• Time UUID is a good candidate for list keys
SoR - Deltas
Literal – “smash” operation
Delete Map
SoR - Deltas
Conditional
• Perform a delta conditionally
• Designed to help resolve the most common concurrent write conflict situations
• Simple and reliable
• E.g., mark a review “approved” only if moderation hasn’t begun
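A conditional delta can be pictured as a guard evaluated against the resolved document at the point where the delta is applied. The sketch below is an illustration only: the `status` field, the rule that "moderation hasn't begun" means no `status` is present, and the class and method names are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

final class ConditionalDeltaSketch {
    /** Returns a copy of the document with the delta applied only if the condition holds. */
    static Map<String, Object> applyIf(Map<String, Object> document,
                                       Predicate<Map<String, Object>> condition,
                                       Map<String, Object> delta) {
        Map<String, Object> result = new LinkedHashMap<>(document);
        if (condition.test(document)) {
            result.putAll(delta);
        }
        return result;
    }

    /** Hypothetical rule: approve only while no moderation status has been recorded. */
    static Map<String, Object> approveIfUnmoderated(Map<String, Object> review) {
        Map<String, Object> delta = new LinkedHashMap<>();
        delta.put("status", "APPROVED");
        return applyIf(review, doc -> !doc.containsKey("status"), delta);
    }
}
```

The condition travels with the delta, so every reader evaluates it the same way regardless of which data center accepted the write.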
SoR – Deltas
Other types of conditions
• Equal, Intrinsic, Is, Map, And, Or, Not, Constant
• E.g., {..,"type":or("product","category"),"client":"TestCustomer"}
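To make the example condition concrete, here is a small sketch that models a few condition types as Java predicates. It is not EmoDB's condition implementation: only Equal, Or, and a map condition that ignores unlisted keys (the assumed meaning of the leading `..`) are covered, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class ConditionSketch {
    /** Equal condition: matches exactly one value. */
    static Predicate<Object> equal(Object expected) {
        return actual -> Objects.equals(actual, expected);
    }

    /** Map condition: every listed field must match; fields not listed are ignored. */
    static Predicate<Object> mapContaining(Map<String, Predicate<Object>> fieldConditions) {
        return value -> {
            if (!(value instanceof Map)) {
                return false;
            }
            Map<?, ?> map = (Map<?, ?>) value;
            for (Map.Entry<String, Predicate<Object>> field : fieldConditions.entrySet()) {
                if (!field.getValue().test(map.get(field.getKey()))) {
                    return false;
                }
            }
            return true;
        };
    }

    /** Models {..,"type":or("product","category"),"client":"TestCustomer"}. */
    static Predicate<Object> productOrCategoryForTestCustomer() {
        Map<String, Predicate<Object>> fields = new LinkedHashMap<>();
        fields.put("type", equal("product").or(equal("category")));
        fields.put("client", equal("TestCustomer"));
        return mapContaining(fields);
    }
}
```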
SoR - Deltas
Read-Modify-Write
• Read the original state
• Compute the new version
• The write succeeds, or
• Eventually, the write conflicts, and the databus fires an event so that the application can detect the conflict and retry the write.
SoR - Deltas
Data center A: T1, Conditional T3
Data center B: T1, T2
SoR - Deltas
Compaction
• For efficiency, older deltas get compacted and replaced by a single delta, a “compaction” record
• Ensures intrinsics like ~version, ~firstUpdateAt, etc. are maintained
• Compaction happens opportunistically, whenever documents are read
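The idea of replacing older deltas with one record while keeping intrinsics intact can be sketched as below. This is a simplified model under stated assumptions: deltas are flat maps, `version` stands in for `~version` and is assumed to count applied deltas, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    /** Resolved content plus the number of deltas folded into it (standing in for ~version). */
    static final class Record {
        final Map<String, Object> content;
        final long version;

        Record(Map<String, Object> content, long version) {
            this.content = content;
            this.version = version;
        }
    }

    /** Applies the deltas on top of a base record, oldest first. */
    static Record resolve(Record base, List<Map<String, Object>> deltas) {
        Map<String, Object> content = new LinkedHashMap<>(base.content);
        for (Map<String, Object> delta : deltas) {
            content.putAll(delta);
        }
        return new Record(content, base.version + deltas.size());
    }

    /** Folds the older deltas into a single compaction record. */
    static Record compact(List<Map<String, Object>> olderDeltas) {
        return resolve(new Record(new LinkedHashMap<>(), 0), olderDeltas);
    }
}
```

Resolving from the compaction record plus the remaining deltas gives the same content and version as resolving the full delta history, which is what lets a reader discard the older deltas.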
Databus
• Allows applications to get notified of updates to the SoR
• Must create a persistent subscription
  • To a table, or to multiple tables (based on the values of their attributes)
• The SoR “DVR”s updates for all subscriptions
• Supports multiple concurrent writers and readers (polls and acks)
• No guarantees on order
  • To help, the SoR provides ~version and ~signature
• Exposes a RESTful API
Databus – Subscription Management
Subscribe to changes to a set of tables in the System of Record
• Table filters are the same as conditions for deltas
• Follow events on all tables for which the condition evaluates to true
• To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’
Databus
Subscribe for multiple tables
Count events
Databus
Poll for events
• Check for unclaimed, unacknowledged events
• If events are not ack’d, they will be returned by another poll after the claim period expires
Renew claims
Acknowledge Claims
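The poll, renew, and acknowledge cycle on this slide behaves like a queue with time-limited claims. The in-memory sketch below illustrates those semantics only; it is not the databus implementation or its REST API, the clock is passed in explicitly as a `long`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClaimQueueSketch {
    // Event key -> time at which its current claim expires (0 means never claimed).
    private final Map<String, Long> claimExpiry = new LinkedHashMap<>();

    void add(String eventKey) {
        claimExpiry.putIfAbsent(eventKey, 0L);
    }

    /** Returns up to `limit` events that are unclaimed or whose claim has expired, and claims them. */
    List<String> poll(long now, long claimTtl, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> event : claimExpiry.entrySet()) {
            if (claimed.size() >= limit) {
                break;
            }
            if (event.getValue() <= now) {
                event.setValue(now + claimTtl);
                claimed.add(event.getKey());
            }
        }
        return claimed;
    }

    /** Extends the claim on events that are still being processed. */
    void renew(Collection<String> eventKeys, long now, long claimTtl) {
        for (String key : eventKeys) {
            claimExpiry.computeIfPresent(key, (k, old) -> now + claimTtl);
        }
    }

    /** Acknowledged events are removed and never returned again. */
    void ack(Collection<String> eventKeys) {
        claimExpiry.keySet().removeAll(eventKeys);
    }
}
```

A consumer that crashes after polling simply lets its claims lapse, and the same events come back on a later poll.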
Blob Store
• REST storage service for photos.
• No single point of failure (data loss only after 3 servers fail).
• Sweet spot is blobs of a few MB, not GB (not designed for video).
• Data replicates to all data centers
  • Except where replication is restricted for legal reasons
Why not Amazon S3?
• Lower latency: reads and writes are always served out of the local data center.
• If you don't read cross-data center or you don't mind writing to buckets in multiple regions, use S3 or S3+CloudFront.
Highly Scalable Architecture
We serve traffic out of three AWS regions simultaneously
DNS Global Traffic Management sends user requests to the fastest region
Application services are all auto-scaled and self-healing
Our Cassandra-based EmoDB operates out of multiple Availability Zones, so that an AZ failure doesn’t result in downtime
Cassandra replicates across all three regions
Emo/Polloi Contributors
Aaron Dixon
Ahaduzzaman Munna
Dave Barcelo
Fahd Siddiqui
John Roesler
Mark Brandt
Matt Bogner
Nate Bauernfiend
Shawn Smith
Steven Grotten
@Bazaarvoice
@BazaarvoiceDev
http://www.bazaarvoice.com/
http://blog.developer.bazaarvoice.com/
Learn more