www.dataloop.io | @dataloopio | [email protected]
Monitoring for Online Services
Disclaimer
• Not an Erlang developer
• May defer questions to Tomasz
• Based on a true story
What is Dataloop?
• Performance
• Up / Down Alerts
• Dev Env
• Enterprise Stuff
Architecture
First Year
Measure
Putting out the fire
• Rollup worker
• Metric worker
Problems
• NodeJS metrics workers were not scaling
• Memory management was an issue
• Needed big caches to reduce database load
• GC cycles were too long
• 8 single-process workers on an 8-core server
Languages
• Decided on Erlang
• Memory management
• Fault tolerance
• Good libraries for Rabbit and Riak
• Live code tracing
Metric worker re-write
• Approximately 6 weeks from no Erlang experience to a working version
• No more crashes
• Reduced servers needed from 16 to 8
New Features
Dalmatiner DB
• Open-source time-series DB
• Written in Erlang
• Based on Riak-Core and uses ZFS
• Optimised for write throughput
• Needed for developer analytics features
• https://dalmatiner.io/
Modifications
• Floating-point support
• Interfaces with C via NIF
• Lots of fixes for our shape of data
New metrics worker
• Worked with Erlang Solutions
• Cross-trained the team (Dave and Tomasz)
• Removed Redis
• Reduced servers needed from 8 to 2
new metrics worker
New Things
• Lager
• Pooler
• Dialyzer
• QuickCheck
• Rebar3
• Recon
• Dave
Data Migration
• Reused existing NodeJS code with Node_Erlastic
• Uses the Erlang ports interface
• Saved a lot of development time compared with writing from scratch
• Migrated one organisation at a time over several weeks
• Ran Riak and Dalmatiner in parallel, then switched
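The per-organisation migration described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used at Dataloop (which reused NodeJS code over Erlang ports). `InMemoryStore` and `MigrationRouter`, and all of their methods, are hypothetical names; the two stores are in-memory stand-ins for the old store (Riak) and the new one (Dalmatiner), and the sketch assumes a simple "dual-write, backfill, verify, switch" sequence.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a metrics store (old or new)."""
    points: dict = field(default_factory=dict)  # (org, metric, ts) -> value

    def write(self, org, metric, ts, value):
        self.points[(org, metric, ts)] = value

    def read(self, org, metric):
        # Return (timestamp, value) pairs for one metric, oldest first.
        return sorted(
            (ts, v) for (o, m, ts), v in self.points.items()
            if o == org and m == metric
        )

    def org_points(self, org):
        return {k: v for k, v in self.points.items() if k[0] == org}


class MigrationRouter:
    """Hypothetical router that moves one organisation at a time."""

    def __init__(self, old, new):
        self.old = old
        self.new = new
        self.dual_write = set()  # orgs currently written to both stores
        self.switched = set()    # orgs fully served by the new store

    def write(self, org, metric, ts, value):
        if org in self.switched:
            self.new.write(org, metric, ts, value)
            return
        self.old.write(org, metric, ts, value)
        if org in self.dual_write:
            self.new.write(org, metric, ts, value)

    def read(self, org, metric):
        store = self.new if org in self.switched else self.old
        return store.read(org, metric)

    def migrate(self, org):
        # 1. Run both stores in parallel for this organisation.
        self.dual_write.add(org)
        # 2. Backfill historical points into the new store.
        for (o, metric, ts), value in self.old.org_points(org).items():
            self.new.write(o, metric, ts, value)
        # 3. Switch only if both stores agree; otherwise keep dual-writing.
        if self.new.org_points(org) != self.old.org_points(org):
            return False
        self.switched.add(org)
        self.dual_write.discard(org)
        return True
```

Calling `migrate("acme")` in this sketch would move only that organisation's reads and writes to the new store, while every other organisation keeps using the old store until its own turn.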
Today
Happy Ending
Next
• Convert the other workers to Erlang
• Add metric dimensions to Dalmatiner DB
• Make RabbitMQ more robust
• Hire more Erlang developers
Q&A