oren-raboy
AGENDA (PART 1)
• Background about Totango and our data architecture
• Spark in the Totango Architecture
• Quality: Testing Spark code in production
• ~500M accounts
• ~$5B revenue under management
• ~100M events per day
Our Customers: The World’s Leading Cloud Services
Totango Data Architecture
[Diagram: data sources Pixel, 3rd Party (SFDC) and CSV; stages Collection, Real-time processing, Batch processing and Serving Layer; AWS components ELB, Kinesis and S3.]
• Hosted on AWS
• ‘Lambda Architecture’
• AWS and open-source technologies
• Java with a dash of Python
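In a Lambda architecture the serving layer answers a query by combining the batch view, recomputed periodically from all raw data, with a real-time view covering events that arrived since the last batch run. A minimal sketch of that merge in Python, assuming the metric is a per-account event count; the function name and account IDs are hypothetical, and this is the generic pattern rather than Totango's serving code:

```python
from collections import Counter


def serve_event_counts(batch_view: dict[str, int], realtime_view: dict[str, int]) -> dict[str, int]:
    """Hypothetical serving-layer query: combine the batch view (complete up to
    the last batch run) with the real-time view (events since that run)."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts instead of replacing them
    return dict(merged)
```

For example, a batch count of 10 for `acct-1` plus 2 real-time events yields 12, while accounts seen only by one layer pass through unchanged.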
Batch Processing
• Executed once a day (midnight at the customer’s local time)
• Each task calculates a set of account metrics (e.g. Health, Change)
• One Spark cluster runs all tasks for all customers
• Pipeline executed by Pipeline Runner, using Spotify Luigi
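Running "at midnight in the customer’s local time" on one shared cluster means converting each customer’s next local midnight to a common clock. A minimal sketch in Python, assuming each customer has a known time zone; the function name is hypothetical, and the fixed-offset zones used in the usage note are for illustration only (a DST-aware `zoneinfo.ZoneInfo` can be passed instead):

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_local_midnight_utc(now_utc: datetime, customer_tz: tzinfo) -> datetime:
    """Return the customer's next local midnight, expressed in UTC."""
    if now_utc.tzinfo is None:
        # A naive datetime would be silently interpreted as the server's local time.
        raise ValueError("now_utc must be timezone-aware")
    local_now = now_utc.astimezone(customer_tz)
    next_day = local_now.date() + timedelta(days=1)
    # Build midnight on the customer's next calendar day, then convert to UTC.
    local_midnight = datetime.combine(next_day, time(0), tzinfo=customer_tz)
    return local_midnight.astimezone(timezone.utc)
```

For a customer at UTC-5, calling this at 12:00 UTC on 1 March 2024 schedules the run for 05:00 UTC on 2 March 2024.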
[Diagram: task pipeline from Raw Events to Account Documents: calc some metrics, calc other metrics and more → merge results → some dependent computation → merge results into final document.]
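Luigi resolves each task’s dependencies (declared in its `requires()` method) and runs a task only after its inputs are ready. The same dependency ordering can be sketched with Python’s standard-library `graphlib`; the task names and edges below are a hypothetical reading of the diagram, not Totango’s actual pipeline or Luigi code:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on
# (what a Luigi task would return from requires()).
PIPELINE = {
    "calc_some_metrics": {"raw_events"},
    "calc_other_metrics": {"raw_events"},
    "merge_results": {"calc_some_metrics", "calc_other_metrics"},
    "some_dependent_computation": {"merge_results"},
    "merge_into_final_document": {"merge_results", "some_dependent_computation"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle."""
    return list(TopologicalSorter(graph).static_order())
```

Here `raw_events` always comes first and `merge_into_final_document` last; the two metric tasks have no edge between them, which is what lets a scheduler run them in parallel.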
Environment
• Multi-tenant: shared infrastructure for all Totango customers (Services)
• Daily, hourly and on-demand schedule
• Standalone Spark cluster on AWS EC2 instances
• Input and output on S3; final results also indexed on Elasticsearch
[Diagram: the same task pipeline, from Raw Events to Account Documents, repeated once per Service (Service A … Service XYZ).]
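On shared, multi-tenant infrastructure the same pipeline runs once per customer (Service). A minimal sketch of a per-tenant driver loop in Python, under the assumption (not stated on the slide) that one tenant’s failure should be recorded rather than abort the whole run; `run_for_all_tenants` and its parameters are hypothetical:

```python
from typing import Callable


def run_for_all_tenants(tenants: list[str], run_pipeline: Callable[[str], object]) -> dict[str, str]:
    """Run the same pipeline once per tenant (Service) on shared infrastructure.

    Assumption for illustration: a failure for one tenant is recorded and the
    remaining tenants are still processed. Returns {tenant: error description}.
    """
    failures: dict[str, str] = {}
    for tenant in tenants:
        try:
            run_pipeline(tenant)
        except Exception as exc:  # isolate tenants from each other's errors
            failures[tenant] = repr(exc)
    return failures
```

For example, with `int` as a stand-in pipeline, `run_for_all_tenants(["1", "not-a-number", "3"], int)` still processes all three tenants and reports only `"not-a-number"` as failed.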
Requirements from infrastructure:
• Reliability: calculate metrics accurately at all times
• Velocity: frequent release of new data processing code
Challenge: High quality and highly automated regression testing
[Diagram: the same pipeline, with the ‘calc some metric’ task replaced by a NEW VERSION.]
How do we make sure the new version didn’t break anything?
[Diagram: the NEW VERSION task runs in SHADOW mode alongside the OLD VERSION in the pipeline; the outputs of the two versions are compared as CSV.]
Testing In Production: How
• Before deployment, run the release candidate side by side with the older version
• The new version runs in shadow mode and does not propagate its results
• Compare old and new version results and output any unexpected diffs
• Deploy to production only if there are no diffs across all customer data sets
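The comparison and the deployment gate can be sketched as follows, assuming each version writes one CSV per customer data set with an `account_id` column plus one column per metric. The column names, the floating-point tolerance and all function names are assumptions for illustration, not Totango’s actual code:

```python
import csv
import io
import math


def load_metrics(csv_text: str, key: str = "account_id") -> dict[str, dict]:
    """Index the rows of one version's metrics CSV by account id.
    Extra, unnamed fields are collected under a string key so columns sort cleanly."""
    reader = csv.DictReader(io.StringIO(csv_text), restkey="_extra_fields")
    return {row[key]: row for row in reader}


def values_match(old, new, rel_tol: float = 1e-9) -> bool:
    """Numeric cells are equal within a small relative tolerance (assumed);
    everything else must match exactly."""
    try:
        return math.isclose(float(old), float(new), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return old == new


def unexpected_diffs(old_csv: str, new_csv: str) -> list[str]:
    """Report every account/column where the shadow (new) run disagrees
    with the production (old) run for one customer data set."""
    old_rows, new_rows = load_metrics(old_csv), load_metrics(new_csv)
    diffs = []
    for account in sorted(old_rows.keys() | new_rows.keys()):
        old_row, new_row = old_rows.get(account), new_rows.get(account)
        if old_row is None or new_row is None:
            diffs.append(f"{account}: present in only one version")
            continue
        for column in sorted(old_row.keys() | new_row.keys()):
            old_val, new_val = old_row.get(column), new_row.get(column)
            if old_val is None or new_val is None or not values_match(old_val, new_val):
                diffs.append(f"{account}.{column}: {old_val!r} -> {new_val!r}")
    return diffs


def safe_to_deploy(diffs_per_customer: dict[str, list[str]]) -> bool:
    """Gate the rollout: deploy only if no customer data set produced a diff."""
    return all(not diffs for diffs in diffs_per_customer.values())
```

The tolerance keeps harmless floating-point noise (for example from a different summation order in Spark) from blocking a release; whether any tolerance is acceptable is a per-metric design decision.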
Deployment Flow
1. Unit testing
2. Test environment: integration testing
3. Side-by-side testing of new code in production
4. New code rolled out, with the old version running side by side as backup
5. Rollout complete!
• We know the new version works correctly on real customer data
• We do not need to think of every corner case in advance
• We do not need to write lots of regression tests
QUESTIONS?
• labs.totango.com <-- engineering team blog
• [email protected] <-- me!
• Yes, we are hiring!