sky-yin
Company
// Stitch Fix is an online personal clothes shopping service
// We recommend clothes through a combination of human stylists and algorithms
Redshift, Spark and Presto
// Redshift: managed data warehouse from AWS
// Spark: distributed general-purpose computing engine
// Presto: distributed SQL query engine
Redshift
// Fast
// Familiar to use: SQL
// Managed by AWS
// Expensive
// Can scale up and down on demand*
Trouble
// Too many people querying in the morning
// Production pipelines and ad-hoc queries share the same cluster
Solution: yet another Redshift cluster?
// One for production, one for ad-hoc
// Data must be kept in sync between the two clusters
// Too expensive
// Will hit the scalability problem again
Solution
// Presto for light ad-hoc queries
// Spark for heavy jobs
// Store data in S3 as the single source of truth
// Run on EMR to scale up and down quickly
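// The split between engines can be sketched as a simple routing rule. This is a hypothetical illustration, not the actual Stitch Fix implementation. The names Job, route_job and LIGHT_SCAN_LIMIT_GB are invented for the example. The 10 GB cutoff is also an arbitrary assumption. Ad-hoc work counts as "light" only when its estimated scan is small. Both engines are assumed to read the same tables in S3.

```python
from dataclasses import dataclass

# Hypothetical cutoff (assumption): ad-hoc queries scanning at most this
# much data count as "light" and go to Presto.
LIGHT_SCAN_LIMIT_GB = 10.0


@dataclass
class Job:
    name: str
    scan_gb: float       # estimated data scanned from the shared S3 tables
    is_ad_hoc: bool      # ad-hoc query (True) vs. production pipeline (False)


def route_job(job: Job) -> str:
    """Return the engine for a job; both engines read the same S3 data."""
    # Light ad-hoc queries go to Presto for fast interactive results.
    if job.is_ad_hoc and job.scan_gb <= LIGHT_SCAN_LIMIT_GB:
        return "presto"
    # Everything else (production pipelines, heavy scans) goes to Spark.
    return "spark"


if __name__ == "__main__":
    for job in [
        Job("quick_lookup", 0.5, True),
        Job("big_adhoc_scan", 50.0, True),
        Job("daily_pipeline", 500.0, False),
    ]:
        print(job.name, "->", route_job(job))
```

// Because S3 holds the only copy of the data, the routing choice never requires copying or syncing tables between clusters. This avoids the sync problem of the two-cluster Redshift option.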