No sql findings

NoSQL Findings

Christian van der Leeden

Thursday, September 23, 2010

Our problem

• Growth is neither linear nor predictable

  • e.g. the History::Session table now has > 30 million entries

  • Activities has > 26 million entries

• Postgres will be the performance bottleneck


Criteria

• Allow us to scale from 100k Daily Active Users (DAU) to 1 million DAU and up to 10 million DAU

• Scale horizontally (“Just add servers”)

• Good Ruby performance

• Good transition from Rails/Postgres -> Rails/NoSQL

• Actively developed


Goal

• Scores (@ 10 million Daily Active Users)

  • 10 million scores/day == 350 inserts/second

  • around the same read rate for leaderboards

• Game with 10 million players

  • Leaderboard with 10 million entries

• Session (@ 10 million DAU)

  • > 10 million session handshakes/day
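The 350 inserts/second target is higher than the plain daily average. A quick check of the arithmetic, as a sketch: the slide does not say how the figure was derived, so reading the difference as headroom for peak hours is an assumption, and `average_rate` is a hypothetical helper.

```ruby
# Back-of-the-envelope load estimate for score inserts.
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

# Average events per second if `per_day` events were spread evenly over a day.
def average_rate(per_day)
  per_day.to_f / SECONDS_PER_DAY
end

scores_per_day = 10_000_000 # from the slide
target_rate    = 350        # inserts/second, from the slide

avg = average_rate(scores_per_day)
# Assumption: the gap between target and average is headroom for peaks.
peak_factor = target_rate / avg

puts format("average: %.1f inserts/s, implied peak factor: %.1f", avg, peak_factor)
```

Evenly spread, 10 million scores/day is about 116 inserts/second, so the 350 target corresponds to roughly three times the average load.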


Data Patterns

• Most data is accessed by time (the most recent data is accessed most often)

• Write and read rates are about the same

• Eventual consistency is good enough most of the time


Rating criteria

• Type (Document Store, Key/Value Store, Big Table)

• Deployment

  • How easy is it to scale?

• Existing installations

  • How big are known installations?

• Heritage and activity

  • Where does the solution come from, and how actively is it developed, and by whom?


Products evaluated

• MongoDB

• Redis

• Cassandra

• HBase

• Membase


MongoDB

• Document store

• “SQL DB” without relations

• Easy transition with MongoMapper, Mongoid

• Supports sharding over replica sets (since August 2010)

• Haven’t found a big sharded server installation


Experience with Mongo

• Nice/easy to program with

• Deployment woes we’ve encountered (1.6.0)

  • segmentation fault

  • cannot read because: invalid BSON object

• When the index is larger than RAM, performance degrades (queries go from 20 ms to 200 ms)

• Global write lock makes data migrations slow


Cassandra

• Big Table data store

• Was developed by Facebook and is actively maintained

• Easy to add servers and to set up (peer-to-peer concept)

• Thrift API to Ruby was slow in tests (our tests: around 150 write ops/second)

• Avro API promises to be faster (will be an option in 0.7)

• Used by Facebook

• Not using it because it is too slow with Ruby


Redis

• Memcache with simple persistence

• Supports many different data types and atomic operations on them

• Sharding is done client side (difficult to add new servers)

• We’re using it for indexes on SQL data

• Very fast (our tests: 4000 write operations/second)
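Why client-side sharding makes adding servers difficult can be illustrated with a small sketch. It assumes the simplest scheme, hash of the key modulo the number of servers; the slides do not say which scheme was used, and `shard_for` and `moved_keys` are hypothetical helpers, not part of any Redis client.

```ruby
require "zlib"

# Hypothetical helper: pick a shard by hashing the key and taking the remainder.
def shard_for(key, shard_count)
  Zlib.crc32(key) % shard_count
end

# Number of keys that map to a different shard after the server count changes.
def moved_keys(keys, old_count, new_count)
  keys.count { |k| shard_for(k, old_count) != shard_for(k, new_count) }
end

keys  = (1..10_000).map { |i| "score:#{i}" }
moved = moved_keys(keys, 4, 5)

puts "#{moved} of #{keys.size} keys would move when going from 4 to 5 servers"
```

With a uniform hash, a key keeps its shard only when its hash leaves the same remainder modulo 4 and modulo 5, which is the case for about one key in five. The rest would have to be migrated by the application, since the servers themselves know nothing about the sharding.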


HBase

• Big Table database

• Complex to set up and to maintain

• Very often used for analytics jobs with Hadoop/Hive, e.g. on Amazon Elastic MapReduce

• For analytics, also look at Scribe for data collection


Membase

• Key-Value Store

• Distributed, persistent Memcache

• Easy to add nodes

• Used by Zynga


Example Leaderboards

• User has many scores

• Each score has one result (integer)

• Game has many scores

• Query: the leaderboard for one game

• Insert one score into the leaderboard

• What is my rank?

• Give me 10 scores starting at position 100,000
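The three operations can be pinned down with a naive in-memory reference model. This is only a sketch to make the semantics explicit, not how any of the evaluated stores work; the class name `NaiveLeaderboard` and the choice of a 1-based rank are assumptions.

```ruby
# A score as described above: belongs to a user and a game, has an integer result.
Score = Struct.new(:id, :user_id, :game_id, :result)

# Naive leaderboard for a single game; every query scans or sorts all scores.
class NaiveLeaderboard
  def initialize
    @scores = []
  end

  # Insert one score into the leaderboard.
  def insert(score)
    @scores << score
    self
  end

  # 1-based rank: one more than the number of strictly better scores.
  def rank(score)
    @scores.count { |s| s.result > score.result } + 1
  end

  # `limit` scores starting at 0-based position `offset`, best first.
  def page(offset, limit)
    @scores.sort_by { |s| -s.result }.slice(offset, limit) || []
  end
end

board = NaiveLeaderboard.new
a = Score.new(2563, 52345, 57142, 100)
b = Score.new(96877, 2541, 57142, 99)
c = Score.new(6752, 3652, 57142, 96)
[c, a, b].each { |s| board.insert(s) }

puts board.rank(b)                        # 2
puts board.page(1, 2).map(&:id).inspect   # [96877, 6752]
```

Both `rank` and `page` touch every score here, which is what becomes expensive at 10 million entries and what the stores compared below try to avoid in different ways.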


SQL vs NoSQL

• SQL: think about data

  • Redundancy is bad

  • Indexes are managed by the DB

  • Query over relations

  • Always exact results

• NoSQL: think about queries

  • Redundancy is ok

  • Roll your own indexes depending on queries

  • No joins and no connecting entities

  • Query results don’t have to reflect the latest write operation


SQL vs NoSQL

• SQL

  • Standardized query language and DDL

  • All DBs are “the same”

• NoSQL

  • Some solutions share standards

  • Many different approaches

    • Document store

    • Big Table

    • Key Value


Postgres

• Create new score:
    score = Score.new(attributes)
    score.save   # => insert into scores ...;

• What is my rank? (number of better scores; the rank is this count + 1)
    select count(*) from scores inner join games on (games.id = scores.game_id) where result > #{my_score.result} and games.name = '#{game_name}';

• Give me 10 scores in the leaderboard from position 100000:
    select * from scores inner join games on (games.id = scores.game_id) where games.name = '#{game_name}' order by result desc offset 100000 limit 10;

Schema: User 1:n Score, Score n:1 Game


Redis

• New score:
    redis.zadd("Jewels", result, score_id)

• My rank?
    redis.zrevrank("Jewels", score_id)   # takes the member (score_id), not the result; 0-based

• 10 scores from position 100000:
    redis.zrevrange("Jewels", 100000, 100009)   # start and stop positions, both inclusive

SortedSet (key: game_name, score: result, value: score_id)

key: "Jewels"
    100 <2563>
    99 <96877>
    96 <6752>
    ...

key: "Bug Landing"

key: "Toss It"

...

KeyValue Store (key: score_id, value: marshalled score object)

2563: { result: 100, user_id: 52345, game_id: 57142 }
96877: { result: 99, user_id: 2541, game_id: 57142 }
6752: { result: 96, user_id: 3652, game_id: 57142 }
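The argument meanings of the sorted-set commands are easy to mix up, so here is a toy pure-Ruby model of one sorted set plus the key-value side. `TinySortedSet` is hypothetical and only imitates the positional semantics of ZADD, ZREVRANK and ZREVRANGE; it is not a Redis client, and its tie-breaking is a simplification.

```ruby
# Toy model of a single Redis sorted set, to illustrate argument meanings only.
class TinySortedSet
  def initialize
    @score_of = {} # member => score
  end

  # Same argument order as ZADD: score first, then member.
  def zadd(score, member)
    @score_of[member] = score
  end

  # 0-based position of `member`, highest score first; nil if absent.
  def zrevrank(member)
    ordered.index(member)
  end

  # Members from position `start` to `stop`, both inclusive, highest first.
  def zrevrange(start, stop)
    ordered[start..stop] || []
  end

  private

  # Members by descending score (ties broken by member, a simplification).
  def ordered
    @score_of.sort_by { |member, score| [score, member.to_s] }.reverse.map(&:first)
  end
end

jewels = TinySortedSet.new
jewels.zadd(100, 2563)
jewels.zadd(99, 96877)
jewels.zadd(96, 6752)

# Key-value side: score_id => score attributes (marshalled in the real setup).
scores = {
  2563  => { result: 100, user_id: 52345, game_id: 57142 },
  96877 => { result: 99,  user_id: 2541,  game_id: 57142 },
  6752  => { result: 96,  user_id: 3652,  game_id: 57142 }
}

rank    = jewels.zrevrank(96877) + 1 # zrevrank is 0-based
top_two = jewels.zrevrange(0, 1).map { |id| scores[id] }

puts rank            # 2
puts top_two.inspect
```

The sorted set only returns score ids, so a leaderboard page takes a second lookup per id in the key-value store. This is the "roll your own index" pattern from the comparison slide.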


Mongo

• New score:
    Score.create!(attributes)
    db.scores.insert( { result: 100, user_id: 52345, game_id: 57142 } )

• What is my rank?
    db.scores.count( { result: { $gt: #{my_score.result} } } )

• 10 scores from position 100000:
    db.scores.find({}).sort({ result: -1 }).skip(100000).limit(10)

Collection (key: Scores)

{ _id: 6752, result: 96, user_id: 3652, game_id: 57142 }
{ _id: 96877, result: 99, user_id: 2541, game_id: 57142 }
{ _id: 2563, result: 100, user_id: 52345, game_id: 57142 }


Cassandra

ColumnFamily: Leaderboards (row_key: game_name)

row_key: "Jewels"
    100: 2563
    99: 96877
    96: 6752
    ...

row_key: "Bug Landing"

row_key: "Toss It"

...

ColumnFamily: Scores (row_key: score_id)

row_key: 2563
    result: 100, game_id: 57142, user_id: 6325

row_key: 96877
    result: 99, game_id: 57142, user_id: 2375

row_key: 6752
    result: 96, game_id: 57142, user_id: 2311

...


Cassandra

• Insert new score:
    client.insert("Leaderboards", "Jewels", { result => id })
    client.insert("Scores", id, { :result => result, :user_id => user_id, :game_id => game_id })

• What is my rank?
    => not easy, needs help from other tools

• Give me the next 10 scores starting at score X:
    client.get("Leaderboards", "Jewels", :start => X.result, :count => 10)

ColumnFamily: Leaderboards (row_key: game_name)

row_key: "Jewels"
    100: 2563
    99: 96877
    96: 6752
    ...

row_key: "Bug Landing"

row_key: "Toss It"

...
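The difference between "start at score X" and "start at position N" can be sketched with a toy model of one Leaderboards row. `ToyRow` and `slice_desc` are hypothetical and only imitate a slice over sorted column names; this is not the cassandra gem, and descending order is an assumption made for the leaderboard use case.

```ruby
# Toy model of one row: column name (result) => column value (score_id).
class ToyRow
  def initialize
    @columns = {}
  end

  # Column names are unique within a row, so two scores with the same
  # result would overwrite each other in this layout.
  def insert(columns)
    @columns.merge!(columns)
  end

  # Up to `count` columns in descending name order, beginning at `start`.
  def slice_desc(start:, count:)
    @columns.select { |name, _| name <= start }
            .sort_by { |name, _| -name }
            .first(count)
  end

  # Rank in this model means counting every column above `name`.
  def rank(name)
    @columns.count { |other, _| other > name } + 1
  end
end

jewels = ToyRow.new
jewels.insert(100 => 2563, 99 => 96877, 96 => 6752)

puts jewels.slice_desc(start: 99, count: 2).inspect # [[99, 96877], [96, 6752]]
puts jewels.rank(96)                                # 3
```

A slice needs only a starting column name and a count, which fits the sorted row well. A rank needs the number of columns above a name, and in this model that means walking all of them, which is one way to read the "not easy" remark above.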


Findings

• Test the tools you want to use at the scale at which you are going to use them

• There is no “Best NoSQL” solution

• Mix and match the tools you need

• NoSQL requires a lot of rethinking and changes in your Ruby code


Links

• Cassandra: http://cassandra.apache.org/

• Cassandra API: http://wiki.apache.org/cassandra/API

• Twitter on Cassandra: http://github.com/ericflo/twissandra

• Redis: http://code.google.com/p/redis/

• Redis API: http://code.google.com/p/redis/wiki/CommandReference

• Membase: http://www.membase.org/

• HBase: http://hbase.apache.org/

• Scribe: http://github.com/facebook/scribe

• Mongo: http://www.mongodb.org/

