NoSQL Findings
Christian van der Leeden
Thursday, September 23, 2010
Our problem
• Growth is neither linear nor predictable
  • e.g. the History::Session table now has > 30 million entries
  • Activities: > 26 million entries
• Postgres will become the performance bottleneck
Criteria
• Allow us to scale from 100k daily active users (DAU) to 1 million and on to 10 million DAU
• Scale horizontally (“Just add servers”)
• Good Ruby performance
• Good transition from Rails/Postgres -> Rails/NoSQL
• Actively developed
Goal
• Scores (@ 10 million daily active users)
  • 10 million scores/day == 350 inserts/second
  • about the same read rate for leaderboards
• Game with 10 million players
  • Leaderboard with 10 million entries
• Session (@ 10 million DAU)
  • > 10 million session handshakes/day
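As a sanity check on the load target: 10 million scores spread evenly over a day average about 116 inserts per second, so the 350 inserts/second on the slide is roughly three times the flat average. The slides do not say where that factor comes from; reading it as headroom for peak hours is an assumption. A minimal Ruby sketch of the arithmetic (method names are illustrative):

```ruby
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

# Average events per second if the daily volume were spread evenly.
def average_rate(events_per_day)
  events_per_day.to_f / SECONDS_PER_DAY
end

# Ratio between a target rate and the flat daily average.
# Interpreting this ratio as a peak-hour allowance is an assumption.
def peak_factor(target_rate, events_per_day)
  target_rate / average_rate(events_per_day)
end

average = average_rate(10_000_000)
factor  = peak_factor(350, 10_000_000)

puts format("average: %.1f inserts/s, target is %.2fx the average", average, factor)
```

Sizing for a multiple of the average rather than the average itself matters here because the deck states that growth and load are not predictable.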
Data patterns
• Most data is accessed by recency (the most recent data is accessed most often)
• Write and read rates are about the same
• Eventual consistency is good enough most of the time
Rating criteria
• Type (Document Store, Key/Value Store, Big Table)
• Deployment
  • How easy is it to scale?
• Existing installations
  • How big are known installations?
• Heritage and activity
  • Where does the solution come from, who develops it, and how actively?
Products evaluated
• MongoDB
• Redis
• Cassandra
• HBase
• Membase
MongoDB
• Document store
• “SQL DB” without relations
• Easy transition with MongoMapper, Mongoid
• Supports sharding over replica sets (since August 2010)
• Haven’t found a big sharded server installation
Experience with Mongo
• Nice/easy to program with
• Deployment woes we’ve encountered (1.6.0)
  • segmentation fault
  • cannot read because: invalid BSON object
• Performance degrades when the index is larger than RAM (queries went from 20 ms to 200 ms)
• Global write lock makes data migrations slow
Cassandra
• Big Table data store
• Was developed by Facebook and is actively maintained
• Easy to add servers and to set up (peer-to-peer concept)
• Thrift API to Ruby was slow in tests (our tests: around 150 write ops/second)
• Avro API promises to be faster (will be an option in 0.7)
• Used by Facebook
• Not using it because it is too slow with Ruby
Redis
• Memcache with simple persistence
• Supports many different data types and atomic operations on them
• Sharding is done client side (difficult to add new servers)
• We’re using it for indexes on SQL data
• Very fast (Our tests: 4000 write operations/second)
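To illustrate why client-side sharding makes adding servers difficult, here is a minimal sketch assuming the client picks a server by hashing the key modulo the number of servers. That scheme, the server names and the key format are assumptions for illustration; the slides do not say which scheme was used.

```ruby
require "zlib"

# Pick a server for a key: hash the key, take it modulo the server count.
# Server names are hypothetical placeholders.
def shard_for(key, servers)
  servers[Zlib.crc32(key) % servers.size]
end

# Fraction of keys that map to a different server once the list changes.
def fraction_moved(keys, before, after)
  moved = keys.count { |key| shard_for(key, before) != shard_for(key, after) }
  moved.to_f / keys.size
end

keys          = (1..10_000).map { |id| "score:#{id}" }
three_servers = %w[redis-a redis-b redis-c]
four_servers  = three_servers + %w[redis-d]

puts fraction_moved(keys, three_servers, four_servers)
```

With modulo placement, going from three to four servers relocates about three quarters of the keys in expectation, so existing data would have to be migrated or re-read from its old location. Consistent hashing is a common way to reduce this; it is not covered in the slides.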
HBase
• Big Table database
• Complex to set up and to maintain
• Very often used for analytics jobs with Hadoop/Hive, e.g. on Amazon Elastic MapReduce
• For analytics, also look at Scribe for data collection
Membase
• Key-Value Store
• Distributed, persistent Memcache
• Easy to add nodes
• Used by Zynga
Example: Leaderboards
• User has many scores
• Each score has one result (integer)
• Game has many scores
• Query: the leaderboard for one game
• Insert one score into the leaderboard
• What is my rank?
• Give me 10 scores starting at position 100,000
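The three operations can be pinned down with a deliberately naive in-memory reference in plain Ruby. This is a sketch to fix the semantics, not a scalable implementation; the Struct and the class name NaiveLeaderboard are hypothetical, and treating rank as 1-based is an assumption.

```ruby
# Hypothetical in-memory model; field names follow the slides.
Score = Struct.new(:id, :user_id, :game_id, :result)

class NaiveLeaderboard
  def initialize
    @scores = Hash.new { |hash, game_id| hash[game_id] = [] }
  end

  # Insert one score into the leaderboard of its game.
  def insert(score)
    @scores[score.game_id] << score
    score
  end

  # 1-based rank: one plus the number of strictly better results.
  def rank(score)
    @scores[score.game_id].count { |other| other.result > score.result } + 1
  end

  # `limit` scores of one game, best first, starting at 0-based `offset`.
  def page(game_id, offset, limit)
    sorted = @scores[game_id].sort_by { |score| -score.result }
    sorted[offset, limit] || []
  end
end

board = NaiveLeaderboard.new
board.insert(Score.new(2563, 52345, 57142, 100))
second = board.insert(Score.new(96877, 2541, 57142, 99))
board.insert(Score.new(6752, 3652, 57142, 96))

puts board.rank(second)                         # 2
puts board.page(57142, 1, 2).map(&:id).inspect  # [96877, 6752]
```

Both rank and page touch every score of the game here, which is exactly the cost that becomes a problem at 10 million entries per leaderboard.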
SQL vs NoSQL
• SQL
  • Think about data
  • Redundancy is bad
  • Indexes are managed by the DB
  • Query over relations
  • Always exact results
• NoSQL
  • Think about queries
  • Redundancy is OK
  • Roll your own indexes depending on queries
  • No joins, no connecting of entities
  • Query results don’t have to reflect the latest write operation
SQL vs NoSQL
• SQL
  • Standardized query language and DDL
  • All DBs are “the same”
• NoSQL
  • Some solutions share standards
  • Many different approaches
    • Document store
    • Big Table
    • Key Value
Postgres
• Create new score:
  score = Score.new(attributes)
  score.save  => insert into scores ...;
• What is my rank? (counts the better scores; the rank is that count plus one)
  select count(*) from scores inner join games on (games.id = scores.game_id) where scores.result > #{my_score.result} and games.name = '#{game_name}';
• Give me 10 scores in the leaderboard from position 100000:
  select scores.* from scores inner join games on (games.id = scores.game_id) where games.name = '#{game_name}' order by scores.result desc offset 100000 limit 10;
• Data model: User 1 : n Score n : 1 Game
Redis
• New score
  redis.zadd("Jewels", result, score_id)
• My rank? (0-based, looked up by member, not by result)
  redis.zrevrank("Jewels", score_id)
• 10 scores from position 100000 (start and stop index, both inclusive)
  redis.zrevrange("Jewels", 100000, 100009)
Sorted set
  key: game_name, score: result, value: score_id
  key: "Jewels"
    100 <2563>
    99 <96877>
    96 <6752>
    ...
  key: "Bug Landing"
  key: "Toss It"
  ...
Key/value store
  key: score_id, value: marshalled score object
  2563: { result : 100, user_id : 52345, game_id: 57142 }
  96877: { result : 99, user_id : 2541, game_id: 57142 }
  6752: { result : 96, user_id : 3652, game_id: 57142 }
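The two structures work together: the sorted set is a hand-rolled index that holds only score ids, and the key/value store holds the marshalled objects. A minimal in-memory sketch of that read and write path follows. No Redis server or client library is involved, and the class name LeaderboardIndex and its methods are hypothetical stand-ins rather than Redis commands.

```ruby
class LeaderboardIndex
  def initialize
    @sorted_sets = Hash.new { |hash, game| hash[game] = {} } # game => { score_id => result }
    @key_value   = {}                                        # score_id => marshalled score
  end

  # Write path: store the object once, then index its id by result.
  def add(game, score_id, attributes)
    @key_value[score_id] = Marshal.dump(attributes)
    @sorted_sets[game][score_id] = attributes[:result]
  end

  # 0-based rank of a member, highest result first.
  def rank(game, score_id)
    ids_by_result(game).index(score_id)
  end

  # Read path: slice ids out of the index, then load each object.
  # Both positions are inclusive.
  def page(game, start, stop)
    (ids_by_result(game)[start..stop] || []).map do |score_id|
      Marshal.load(@key_value[score_id])
    end
  end

  private

  def ids_by_result(game)
    @sorted_sets[game].sort_by { |_id, result| -result }.map(&:first)
  end
end

index = LeaderboardIndex.new
index.add("Jewels", 2563,  { result: 100, user_id: 52345, game_id: 57142 })
index.add("Jewels", 96877, { result: 99,  user_id: 2541,  game_id: 57142 })
index.add("Jewels", 6752,  { result: 96,  user_id: 3652,  game_id: 57142 })

p index.rank("Jewels", 96877)                             # 1
p index.page("Jewels", 0, 1).map { |score| score[:result] } # [100, 99]
```

Every write has to update both structures, which is the price of rolling your own index: nothing keeps them in sync except the application code.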
Mongo
• New score
  Score.create!(attributes)
  db.scores.insert( { result: 100, user_id: 52345, game_id: 57142 } )
• What is my rank? (counts the better scores of the same game)
  db.scores.count( { game_id: 57142, result: { $gt: #{my_score.result} } } )
• 10 scores from position 100000
  db.scores.find( { game_id: 57142 } ).sort({ result: -1 }).skip(100000).limit(10)
Collection
  key: Scores
  { _id: 6752, result : 96, user_id : 3652, game_id: 57142 }
  { _id: 96877, result : 99, user_id : 2541, game_id: 57142 }
  { _id: 2563, result : 100, user_id : 52345, game_id: 57142 }
Cassandra
ColumnFamily: Leaderboards
  row_key: game_name
  row_key: "Jewels"
    100: 2563    99: 96877    96: 6752    ...
  row_key: "Bug Landing"
  row_key: "Toss It"
  ...
ColumnFamily: Scores
  row_key: score_id
  row_key: 2563
    result: 100    game_id: 57142    user_id: 6325
  row_key: 96877
    result: 99    game_id: 57142    user_id: 2375
  row_key: 6752
    result: 96    game_id: 57142    user_id: 2311
  ...
Cassandra
• Insert new score:
  client.insert("Leaderboards", "Jewels", { result => id })
  client.insert("Scores", id, { :result => result, :user_id => user_id, :game_id => game_id })
• What is my rank?
  => not easy, needs help from other tools
• Give me the next 10 scores starting at score X
  client.get("Leaderboards", "Jewels", :start => X.result, :count => 10)
ColumnFamily: Leaderboards
  row_key: game_name
  row_key: "Jewels"
    100: 2563    99: 96877    96: 6752    ...
  row_key: "Bug Landing"
  row_key: "Toss It"
  ...
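One reading of why paging by "start at score X" is straightforward in this layout while rank is not: a row keeps its columns sorted by name, so a slice from a known column name is a natural operation, but nothing in the layout records how many columns precede a given one. The sketch below models a single wide row in memory. It only mimics the data layout and is not the Cassandra client API; the class name WideRow and its methods are hypothetical.

```ruby
# In-memory model of one wide row: column name (result) => value (score_id).
class WideRow
  def initialize
    @columns = {}
  end

  # A column name is unique within a row, so in this layout a second score
  # with the same result overwrites the first one.
  def insert(result, score_id)
    @columns[result] = score_id
  end

  # Up to `count` columns in descending name order, starting at `start`.
  def slice_desc(start, count)
    names = @columns.keys.sort.reverse.select { |name| name <= start }
    names.first(count).map { |name| [name, @columns[name]] }
  end

  # Rank needs the number of columns above a result. With only slices
  # available, this sketch has to walk the columns and count them.
  def rank(result)
    @columns.keys.count { |name| name > result } + 1
  end
end

jewels = WideRow.new
jewels.insert(100, 2563)
jewels.insert(99, 96877)
jewels.insert(96, 6752)

p jewels.slice_desc(99, 10) # [[99, 96877], [96, 6752]]
p jewels.rank(96)           # 3
```

Counting columns for every rank lookup grows with the size of the leaderboard, which fits the slide's remark that rank needs help from other tools.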
Findings
• Test the tools you want to use at the scale you are going to use them
• There is no “Best NoSQL” solution
• Mix and match the tools you need
• NoSQL requires a lot of rethinking and changes in your Ruby code
Links
• Cassandra: http://cassandra.apache.org/
• Cassandra API: http://wiki.apache.org/cassandra/API
• Twitter on Cassandra: http://github.com/ericflo/twissandra
• Redis: http://code.google.com/p/redis/
• Redis API: http://code.google.com/p/redis/wiki/CommandReference
• Membase: http://www.membase.org/
• HBase: http://hbase.apache.org/
• Scribe: http://github.com/facebook/scribe
• Mongo: http://www.mongodb.org/