datastax-academy
Playlists at Spotify
A massively scalable storage system
Marcus Better
Software Engineer
Spotify by the Numbers
‣ 75 million active users
  – 20 million paying subscribers
‣ 30 million songs
‣ 1.5 billion playlists created
‣ 6 000 servers in 4 data centres
‣ Available in 58 markets
Architecture overview
● 400+ loosely coupled services
● Backend is mostly Java
● Storage options:
  – Cassandra
  – PostgreSQL
  – Sparkey (our own open-source database for static data sets)
● 120+ Cassandra clusters
Playlists
Requirements
● Over 1 billion lists
● > 100k reqs/s
● Collaborative editing
● Concurrent changes
● Offline editing
Version control
Playlists as versioned objects
Store all changes
Changes are immutable!

Revision           Change                     Resulting list
ROOT
1,2bfd16           ADD i=0, items=[A,B,C]     A, B, C
2,f7a9ba           MOV from=1, to=0, len=1    B, A, C
3,def87a (head)    REM from=0, len=1          A, C
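To make the revision chain concrete, the three changes above can be replayed against an empty list. This is a minimal sketch rather than Spotify's implementation: the dictionary encoding of a change and the function names are assumptions, and so is the reading of MOV's `to` as an index into the list after the moved block has been taken out (the slide's example gives the same result under either reading).

```python
def apply_change(items, op):
    """Return a new list with one change applied; the input list is left untouched."""
    kind = op["op"]
    if kind == "ADD":
        i = op["i"]
        return items[:i] + list(op["items"]) + items[i:]
    if kind == "REM":
        start, n = op["from"], op["len"]
        return items[:start] + items[start + n:]
    if kind == "MOV":
        start, n, to = op["from"], op["len"], op["to"]
        moved = items[start:start + n]
        rest = items[:start] + items[start + n:]
        # Assumption: `to` indexes the list after the moved block is removed.
        return rest[:to] + moved + rest[to:]
    raise ValueError(f"unknown operation: {kind}")


def replay(changes):
    """Build the playlist content by applying every change in revision order."""
    items = []
    for op in changes:
        items = apply_change(items, op)
    return items


# The three revisions from the slide, in order.
CHANGES = [
    {"op": "ADD", "i": 0, "items": ["A", "B", "C"]},
    {"op": "MOV", "from": 1, "to": 0, "len": 1},
    {"op": "REM", "from": 0, "len": 1},
]

print(replay(CHANGES[:1]))  # ['A', 'B', 'C']
print(replay(CHANGES[:2]))  # ['B', 'A', 'C']
print(replay(CHANGES))      # ['A', 'C']
```

Because changes are immutable, any revision can be rebuilt by replaying a prefix of the chain.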
Branches
ROOT
 └ 1,2bfd16
    ├ 2,81ahcd
    └ 2,f7a9ba
Two heads!
Concurrent updates lead to branching.
These will be automatically merged by the system.
Merging
Concurrent updates lead to branching.
These will be automatically merged by the system.
Revisions: ROOT; 1,2bfd16; 2,81ahcd and 2,f7a9ba; 3,39acc and 3,8a0ba
ADD i=5, [A]    REM i=2, len=3
ADD i=2, [A]    REM i=2, len=3
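The slides do not spell out the merge algorithm. One reading of the example is that each change is rebased over the concurrent change from the other branch: an ADD at index 5 becomes an ADD at index 2 once three items starting at index 2 have been removed, while the REM is unaffected. The sketch below illustrates that index adjustment only. The function names, the six-item base list, and the handling of edge cases are assumptions made for illustration.

```python
def transform_add(add_i, rem_from, rem_len):
    """Rebase an ADD index so it can be applied after a concurrent REM."""
    rem_end = rem_from + rem_len
    if add_i >= rem_end:
        return add_i - rem_len   # insertion point was after the removed range
    if add_i > rem_from:
        return rem_from          # insertion point was inside the removed range
    return add_i                 # insertion point was before the removed range


def transform_rem(rem_from, rem_len, add_i, add_len):
    """Rebase a REM so it can be applied after a concurrent ADD."""
    if add_i <= rem_from:
        return rem_from + add_len, rem_len   # removed range shifts right
    if add_i >= rem_from + rem_len:
        return rem_from, rem_len             # insertion is past the range
    raise NotImplementedError("ADD inside the removed range is not handled here")


def add(items, i, new):
    return items[:i] + new + items[i:]


def rem(items, start, n):
    return items[:start] + items[start + n:]


base = list("abcdef")  # hypothetical common ancestor with six items

# Branch 1 first: apply ADD i=5, then the REM rebased over that ADD.
r_from, r_len = transform_rem(2, 3, add_i=5, add_len=1)
via_add = rem(add(base, 5, ["A"]), r_from, r_len)

# Branch 2 first: apply REM i=2 len=3, then the ADD rebased over that REM.
via_rem = add(rem(base, 2, 3), transform_add(5, 2, 3), ["A"])

print(transform_add(5, 2, 3))  # 2
print(via_add)                 # ['a', 'b', 'A', 'f']
print(via_rem)                 # ['a', 'b', 'A', 'f']
```

Both orders end in the same list, which is the property an automatic merge needs.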
Playlist data model

Revision           Change                     Resulting list
ROOT
1,2bfd16           ADD i=0, items=[A,B,C]     A, B, C
2,f7a9ba           MOV from=1, to=0, len=1    B, A, C
3,def87a (head)    REM from=0, len=1          A, C
Typical requests
“Give me all changes since rev 2”
“Give me the latest snapshot of the playlist”
Playlist changes
● Column family playlist_change stores changes
● Row key = playlist ID
● Column name = revision ID

Row key: spotify:user:mbetter:playlist:1234
Column name    Column value
1,2bfd16       ADD i=0, [A,B,C]
2,f7a9ba       MOV from=1, to=0, len=1
3,def87a       REM from=0, len=1
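With revisions as column names, the request “Give me all changes since rev 2” becomes a slice over the columns of one row. The sketch below models that row in memory; it does not use a Cassandra client. Representing the column name as a `(number, hash)` tuple so that revisions sort numerically is an assumption, as is the function name `changes_since`.

```python
from bisect import bisect_right

# In-memory stand-in for the playlist_change column family:
# row key -> columns sorted by column name (revision number, revision hash).
playlist_change = {
    "spotify:user:mbetter:playlist:1234": [
        ((1, "2bfd16"), "ADD i=0, [A,B,C]"),
        ((2, "f7a9ba"), "MOV from=1, to=0, len=1"),
        ((3, "def87a"), "REM from=0, len=1"),
    ],
}


def changes_since(playlist_id, revision):
    """Return every (revision, change) column that sorts after `revision`."""
    columns = playlist_change.get(playlist_id, [])
    names = [name for name, _ in columns]
    start = bisect_right(names, revision)  # first column strictly after `revision`
    return columns[start:]


pid = "spotify:user:mbetter:playlist:1234"
print(changes_since(pid, (2, "f7a9ba")))
# [((3, 'def87a'), 'REM from=0, len=1')]
```

A client that already holds revision 2 only needs the returned columns to catch up, which is why this layout suits syncing.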
Head pointers
● Column family playlist_head stores head pointers

Row key: spotify:user:mbetter:playlist:1234
Column name    Column value
3,def87a       <empty>
Snapshot cache
● playlist_change works well for syncing
● Not so well for fetching new playlists
● Snapshot cache

Row key: spotify:user:mbetter:playlist:1234
Column name    Column value
snapshot       [A, C]
Full data model

Column family        Row key          Column name    Column value
playlist_snapshot    playlist:1234    snapshot       [A, C]
playlist_change      playlist:1234    1,2bfd16       ADD i=0, [A,B,C]
                                      2,f7a9ba       MOV from=1, to=0, len=1
                                      3,def87a       REM from=0, len=1
playlist_head        playlist:1234    3,def87a       <empty>
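The three column families can be pictured together as a small in-memory store. This is an illustrative sketch, not the service's code: the class and method names are hypothetical, and so are the write rules it uses (a new revision replaces its parent in playlist_head, and the cached snapshot is dropped on every write). The slides only state what each column family holds. The sketch also shows how two writers starting from the same parent leave two head pointers behind.

```python
class PlaylistStore:
    """In-memory stand-in for the three column families (hypothetical API)."""

    def __init__(self):
        self.playlist_change = {}    # playlist id -> {revision id: change}
        self.playlist_head = {}      # playlist id -> set of head revision ids
        self.playlist_snapshot = {}  # playlist id -> cached list of items

    def append_change(self, playlist_id, parent, revision, change):
        # Changes are immutable: a revision is only ever added, never rewritten.
        self.playlist_change.setdefault(playlist_id, {})[revision] = change
        heads = self.playlist_head.setdefault(playlist_id, set())
        heads.discard(parent)   # assumption: the parent stops being a head
        heads.add(revision)
        # Assumption: a cached snapshot is invalidated by any new change.
        self.playlist_snapshot.pop(playlist_id, None)

    def heads(self, playlist_id):
        return sorted(self.playlist_head.get(playlist_id, set()))


store = PlaylistStore()
pid = "playlist:1234"
store.append_change(pid, "ROOT", "1,2bfd16", "ADD i=0, [A,B,C]")

# Two clients both extend revision 1 concurrently.
store.append_change(pid, "1,2bfd16", "2,f7a9ba", "MOV from=1, to=0, len=1")
store.append_change(pid, "1,2bfd16", "2,81ahcd", "(concurrent change, not given in the slides)")

print(store.heads(pid))  # ['2,81ahcd', '2,f7a9ba']
```

More than one column in a playlist_head row is then the signal that the branches need merging.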
The playlist cluster
‣ 90 Cassandra nodes
‣ 18 service hosts
‣ Uses FusionIO solid-state drives
‣ 30 TB of data
‣ 1.5 billion playlists
‣ 170k reqs/s at peak globally
‣ 50 playlists created every second
Pain points (ouch!)
‣ Repairs
‣ JVM garbage collection
‣ Tombstones
‣ Bulk ingestion
Open source from Spotify
Get yours on spotify.github.io!
– Cassandra Reaper – automates repairs
– Cassandra Ops Tools
– hdfs2cass – bulk load data into Cassandra
– Heroic – time series database backed by Cassandra
Other contributions:
– Date-tiered compaction strategy (DTCS)
Thank you!
Questions?
We're hiring!
https://www.spotify.com/jobs
Twitter: @SpotifyEng