Audiotopsy: Finding insights and trends from music data

Goals

Ingest the Million Song Dataset

Provide an option for ad-hoc querying

Enable fast access to the data

Where does the data come from? The Million Song Dataset

1,000,000 songs (one file per song)

273 GB of data

44,745 unique artists

515,576 dated tracks starting from 1922

Data Pipeline!!

[Pipeline diagram: Batch Processing (Pig), Real Time Queries, REST API, End User]

HBase Schema

Row key | Column family
2008019123 | Artist: Adele, Song: Rolling in the Deep
2009017241 | Artist: Gotye, Song: Somebody That I Used to Know
2009032523 | Artist: Bruno Mars, Song: Locked Out of Heaven

Row key layout, e.g. 2009 017 123:

Year (2009) | Inverted Hotttnesss Factor (017) | Song Id (123)
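The key layout above can be sketched as a small key builder. This is a minimal sketch, not the project's code: the scaling of hotttnesss to three digits (`round(h * 1000)`, capped at 999, then subtracted from 999) and the helper name `make_row_key` are assumptions. The point is that inverting the score makes hotter songs produce smaller numbers, so they sort first within a year.

```python
def make_row_key(year: int, hotttnesss: float, song_id: int) -> str:
    """Hypothetical row-key builder: YYYY + inverted hotttnesss (3 digits) + song id (3 digits)."""
    # Reject out-of-range scores; NaN also fails this check.
    if not 0.0 <= hotttnesss <= 1.0:
        raise ValueError(f"hotttnesss must be in [0, 1], got {hotttnesss}")
    if not 0 <= song_id <= 999:
        raise ValueError(f"song_id must fit in 3 digits, got {song_id}")
    # Assumed scaling: map the score to 0..999 so it fits in three digits.
    scaled = min(round(hotttnesss * 1000), 999)
    # Invert it: hotter songs get smaller numbers and therefore sort first.
    inverted = 999 - scaled
    return f"{year:04d}{inverted:03d}{song_id:03d}"


print(make_row_key(2009, 0.982, 123))
```

Under this assumed scaling, `make_row_key(2009, 0.982, 123)` returns `2009017123`, which matches the example key; the original deck may have scaled the score differently. Zero-padding every field keeps the keys fixed-width, so lexicographic order (the order HBase uses) agrees with numeric order.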

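Getting the top songs for the year 2009:

Perform a partial scan over the keys that start with 2009

HBase keeps rows sorted by key, and the inverted hotttnesss factor puts the hottest songs first, so client-side sorting can be avoided :)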
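The partial scan can be sketched in memory with the three rows from the schema table. This is only an illustration of the idea: a sorted list plus `bisect` stands in for HBase's sorted row storage, and `prefix_scan` is a hypothetical helper. In HBase itself, a scan with start row `2009` and stop row `2010` selects the same range on the server.

```python
from bisect import bisect_left

# Rows from the schema table, kept sorted by row key the way HBase stores them.
ROWS = sorted(
    [
        ("2008019123", {"Artist": "Adele", "Song": "Rolling in the Deep"}),
        ("2009017241", {"Artist": "Gotye", "Song": "Somebody That I Used to Know"}),
        ("2009032523", {"Artist": "Bruno Mars", "Song": "Locked Out of Heaven"}),
    ],
    key=lambda row: row[0],
)


def prefix_scan(rows, prefix, limit=None):
    """Return rows whose key starts with prefix, in key order (hypothetical helper)."""
    keys = [key for key, _ in rows]
    start = bisect_left(keys, prefix)  # first key >= prefix
    result = []
    for key, value in rows[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((key, value))
        if limit is not None and len(result) >= limit:
            break
    return result


for key, value in prefix_scan(ROWS, "2009"):
    print(key, value["Artist"], "-", value["Song"])
```

The 2009 rows come back with Gotye (inverted factor 017) before Bruno Mars (032), so the hotter song is first without any sorting on the client. Passing `limit=10` would give a top-10 list.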

Insights/Challenges

Compression really helps! (360.601 s with compression vs. 885.129 s without, roughly 2.45× faster)

Getting all components to talk to each other

Dealing with noisy data

Finding the sweet spot for Geohash precision
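The precision trade-off is easier to see with the encoding in front of you. This is a minimal sketch of the standard Geohash algorithm, assuming the inputs are latitude/longitude pairs. How the project applied it is not specified. Each character adds 5 bits that alternately halve the longitude and latitude ranges, so every extra character shrinks the cell. `geohash_encode` and `cell_size_deg` are hypothetical helper names. The coordinates are the common textbook example (57.64911, 10.40744).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a latitude/longitude pair as a geohash with `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cell_size_deg(precision: int) -> tuple[float, float]:
    """Return (height, width) in degrees of a geohash cell of the given length."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when the count is odd
    lat_bits = bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


for p in (3, 5, 7):
    h, w = cell_size_deg(p)
    print(p, geohash_encode(57.64911, 10.40744, p), f"{h:.5f} x {w:.5f} deg")
```

A shorter hash puts more points into each cell, which makes grouping coarse. A longer hash separates nearby points but leaves many cells nearly empty. Because a longer hash extends the shorter one (`u4pru` is a prefix of `u4pruydqqvj`), the chosen length directly sets how finely location data gets grouped.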

About Me – Denny Abraham Cheriyan