Audiotopsy: Finding insights and trends from music data
Goal
Ingest the Million Song Dataset
Provide an option for ad-hoc querying
Enable really fast access to data
Where does the data come from
1,000,000 songs / files
273 GB of data
44,745 unique artists
515,576 dated tracks starting from 1922
Data Pipeline!!
[Diagram: pipeline components labeled Batch Processing, Pig, Real Time Queries, REST API, End User]
HBase Schema
Key          Column Family
2008019123   Artist: Adele; Song: Rolling in the deep
2009017241   Artist: Gotye; Song: Somebody that I used to know
2009032523   Artist: Bruno Mars; Song: Locked out of heaven
Key layout: 2009 017 123 = Year (2009), Inverted Hotttnesss Factor (017), Song Id (123)
Getting the top songs for the year 2009
Perform a partial scan on the keys
Can avoid client side sorting :)
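The key design can be illustrated with a small sketch. HBase stores rows sorted by key, so putting the year first and an inverted hotttnesss value second means a scan over one year's prefix returns the hottest songs first. The sketch below simulates this with a sorted Python list rather than a real HBase client. The function names, the 0 to 1 hotttnesss range, the three-digit scaling, and the three-digit song id are assumptions for illustration; the slides do not say how the inverted factor was computed.

```python
from bisect import bisect_left


def make_row_key(year, hotttnesss, song_id):
    """Build a key of the form <year><inverted hotttnesss><song id>.

    Assumption: hotttnesss lies in [0, 1] and is inverted onto 000..999,
    so hotter songs get smaller values and sort earlier.
    """
    inverted = round((1.0 - hotttnesss) * 999)
    return f"{year:04d}{inverted:03d}{song_id:03d}"


def prefix_scan(sorted_keys, prefix, limit):
    """Simulate a partial scan: walk sorted keys that start with prefix."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(result) == limit:
            break
        result.append(key)
    return result


# Keys taken from the schema example; the store keeps them sorted.
keys = sorted(["2009032523", "2008019123", "2009017241"])

# Top songs for 2009 come back already ordered by hotttnesss.
top_2009 = prefix_scan(keys, "2009", limit=10)
print(top_2009)
```

Because the ordering is encoded in the key itself, the client only has to stop after the first N rows of the scan.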
Insights/Challenges
Compression really helps! (360.601 sec with compression vs 885.129 sec without)
Getting all components to talk to each other
Dealing with noisy data
Finding a sweet-spot for precision of Geohash
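The Geohash precision trade-off can be seen in a minimal encoder sketch. Each extra character adds 5 bits and narrows the cell, so a shorter hash groups more locations together while a longer one separates them. This is the standard Geohash algorithm written out by hand; the function name is hypothetical, and the slides do not state which library or precision the project settled on.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Lower precision yields a prefix of the higher-precision hash (a coarser cell).
for precision in (3, 5, 8):
    print(precision, geohash_encode(57.64911, 10.40744, precision))
```

Since a shorter hash is always a prefix of a longer one for the same point, precision can be tuned by truncation without re-encoding the data.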
About Me – Denny Abraham Cheriyan