- 1. DevOps Down Under 2011 Sprinkling DevOps Magic in Other
People's Environments Robert Postill
2. How's This Gonna Go Down?
- Everybody's got a story and this is ours 4. Our first
architecture 5. Learning from failure 6. A brief aside 7. Getting
better 8. Tough messages 9. Where to from here
10. C3's Story
- There was a dream 11. It makes the excel go into the data
warehouse 12. And it's done badly 13. So we built a prototype 14.
Then we made a sale
15. A little bit of how it works 16. Priorities of our first
architecture
- Works! 17. Restarts when the machine restarts 18. Remotely
deploy updates 19. Not a lot of state on the VM
20. Our first architecture 22.
Lesson: Most customers will accept a small selection of services if
you give them a report from that service 23.
create_deployment.sh
- Poor man's capistrano 24. A shell script that:
- Fetched the latest from github 25. Exported it to a datestamped
directory 26. Made a set of symlinks point to the right places 27.
Restarted the app
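The four steps above can be sketched in Ruby roughly as follows. This is an illustration only, not the original create_deployment.sh: the `releases`/`current` directory layout, the single symlink, and the `fetch` and `restart` callables are all assumptions, since the talk does not give the repository, the paths, or the restart command.

```ruby
require "fileutils"

# Hypothetical sketch of a create_deployment-style flow.
#   fetch:   callable that exports the latest code into the given directory
#   restart: callable that restarts the app
# Both are injected because the talk names neither the repo nor the service.
def create_deployment(root, fetch:, restart:, now: Time.now)
  # Steps 1-2: export the latest code into a datestamped release directory.
  release = File.join(root, "releases", now.strftime("%Y%m%d%H%M%S"))
  FileUtils.mkdir_p(release)
  fetch.call(release)

  # Step 3: repoint the "current" symlink. The link is built under a
  # temporary name and renamed over the old one, so "current" always exists.
  current  = File.join(root, "current")
  tmp_link = "#{current}.tmp"
  FileUtils.rm_f(tmp_link)
  File.symlink(release, tmp_link)
  File.rename(tmp_link, current)

  # Step 4: restart the app so it picks up the new release.
  restart.call
  release
end
```

In practice `fetch` might shell out to git and `restart` to whatever supervises the app. Older releases stay on disk, so rolling back is a matter of repointing the symlink.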
28. Flaws
- We knew practically nothing about what was happening on the box
29. The logs... THE LOGS FIX THOSE FREAKING LOGS!!!
30. And the worst flaw of all...
- We started to get calls that started with:
- Integrity's down, what's the score?
- Then we'd have a look... 31. And it would be the database
32. Lesson: Things you don't own go badly wrong and the first
people to know are the end users 33. A lot of sad face 34. So we
revved the architecture 35. Then more stuff happened...
- We continued to get calls that started with:
Integrity's down, what's the score?
- Then we'd have a look... 36. And it would be the VM, mounted
disks read-only
37. Lesson: Virtual Machines are prone to at least a couple of
novel modes of failure 38. Which started to lead to the inevitable
39. So the next problem... Us
- New Relic gives you slow transaction reports 40. In Ruby,
select, collect and friends are ways of making in-memory decisions
over collections of things 41. Which works on test set sizes of ten
or so 42. But doesn't on large volumes of things, like say a couple
of million objects 43. We'd created a technical debt mountain
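A small sketch of why this bites, assuming plain Ruby with no database involved. `Reading`, `each_reading`, `eager_ids` and `lazy_ids` are hypothetical names made up for illustration. The eager version materialises every object and an intermediate array before answering; the lazy version streams and stops as soon as it has enough. In a database-backed app the analogous move would be pushing the filter into the query, which is not shown here.

```ruby
# Hypothetical record type, for illustration only.
Reading = Struct.new(:id, :value)

# Stand-in for a data source that yields rows one at a time.
def each_reading(count)
  return enum_for(:each_reading, count) unless block_given?
  count.times { |i| yield Reading.new(i, i % 100) }
end

# Eager: load everything, then select and collect in memory.
# Fine for ten rows; memory and time grow with the full collection size.
def eager_ids(count)
  each_reading(count).to_a.select { |r| r.value > 97 }.collect(&:id)
end

# Lazy: stream the rows and stop once `limit` matches have been found,
# so no full-size intermediate array is ever built.
def lazy_ids(count, limit)
  each_reading(count).lazy.select { |r| r.value > 97 }.collect(&:id).first(limit)
end
```

Calling `lazy_ids(2_000_000, 3)` touches only the first couple of hundred rows, whereas `eager_ids(2_000_000)` has to build two million objects first.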
44. Hiring someone new 45. A brief trip to the metaworld
- We're devops by necessity 46. There is no ops department 47.
Our devs cover a lot of ground
- Architecture 48. Operations 49. Database Administration 50.
Networking 51. Support 52. Business Analysis
53. Behold the AnDevOpSuptecht
- It used to be that a lot of places had Systems Programmers 54.
Now it feels like architects are going the same way 55. Where's the
limit going to be drawn on the responsibility of an individual...
56. Are we thinking about the roles we play in the wrong way?
57. Crap Maths Applied To Recruitment
- Australian Population: 21,874,900 58. Melbourne Population:
3,478,138 59. 22.6% 'professionals' in 2006 census: 786,059 60.
Professionals in 'information, media and telecoms': 14,246 61.
Spolsky says 1 in 200 dev applicants can dev, leaving: 712 62.
TIOBE Index says Ruby is used by 1.484% of devs: 10
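The funnel above can be re-run as a quick sanity check, using only the figures quoted on the slide. One assumption is made loudly here: that "1 in 200" means dividing the previous figure by 200. Under that reading 14,246 / 200 comes to roughly 71 rather than 712, which would leave about one Ruby developer rather than ten. The slide's own figures are kept alongside for comparison, and `recruitment_funnel` is a hypothetical helper name.

```ruby
# Recompute the recruitment funnel from the slide's quoted inputs.
def recruitment_funnel
  melbourne     = 3_478_138
  professionals = (melbourne * 0.226).round   # 22.6% of Melbourne
  ict           = 14_246                      # quoted directly on the slide

  can_dev_slide    = 712                      # figure as shown on the slide
  can_dev_computed = ict / 200.0              # assumes "1 in 200" is a division by 200

  {
    professionals:      professionals,
    can_dev_slide:      can_dev_slide,
    can_dev_computed:   can_dev_computed,
    ruby_devs_slide:    (can_dev_slide * 0.01484).floor,
    ruby_devs_computed: (can_dev_computed * 0.01484).round(1)
  }
end
```

Either way the conclusion of the slide stands: the local pool is tiny.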
69. So...
- Before we look into
- Team fit 70. Seniority 71. Skills (Ubuntu, Databases, Business
intelligence...)
- I need a lie down :( 72. Congratulations to you in Melbourne
who do hire devops! 73. Do we need to think about
apprenticeships?
74. Lesson: You need good people, really good people 75.
Meanwhile, back at the point... 76. Looking To Get Smart
- We wanted to start deploying to numbers of machines (>
10) 77. We needed a way to start automating deployment 78. Have you
seen this chef thing? 79. So we started creating recipes
80. But we had issues
- I don't want to beat up on chef 81. The development of our
architecture was *much* slower through chef 82. We lost our chef
database 83. We tried to run chef server internally on two
instances 84. We spent a lot of time learning things like never use
the ui, only ever use data bags 85. chef changed too fast and we
also changed too fast
86. Lesson: The tools may not be mature enough and more
importantly you may not be mature enough to use them 87. So now
we...
- Take a stock Ubuntu VM 88. Customise via capistrano scripts 89.
Snapshot, distribute 90. Update via capistrano and
create_deployment.sh 91. Distribute SSH keys via chef
92. And the customers kept on ringing
- In particular there was the terrible case of the wild
performance swings 93. New Relic would give us 6x, 4x, 12x
performance swings dependent on the week. 94. We'd see CPU spikes
and terrible loads applied to the mongrels as users got
frustrated
Integrity's slow, what's the score?
95. And that got difficult
- We had to start asking for VMware metrics 96. Our working
assumption was that the same version does not pitch and roll like this
97. Let's be honest, what we're saying is we don't think you can
manage your own infrastructure 98. Explicitly :(
99. A lot of thinking... 100. Little by little we ground out
answers
- We found out there wasn't a lot of separation between VMs 101.
Then we found out the VMs were moving over different physical hosts
(vMotion) 102. And then we started to get a handle on
overcommitment
103. Lesson: Smart tools can play havoc with performance 104.
Lesson: VMware (or its competitors) is not a magic well 105.
Where we are now 107. There's plenty for us
still to do
- Retire create_deployment.sh 108. Automate deployment 109.
Refactor the architecture to give us scalability over numerous
machines 110. Deploy to only part of the architecture 111. Deploy
based on need
112. Wrapping Up
- Pushing your stuff into other people's environments is hard
113. Back yourself with the stats and share them 114. Make sure
your app has sufficient canaries 115. Find good people 116. Prepare
for tough conversations
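One way to read "canaries" is as small checks the app runs against the things it does not own, so that the team hears about a failure before the end users do. The sketch below is an assumption about what that could look like, not the checks C3 actually ran. `disk_writable?` and `run_canaries` are hypothetical names. The disk check is motivated by the read-only mounted disks described earlier, and anything else (a database ping, say) would be passed in as another callable.

```ruby
require "tempfile"
require "tmpdir"

# True if a file can be created and written in dir. A disk that has been
# remounted read-only, or a missing directory, raises and yields false.
def disk_writable?(dir)
  Tempfile.create("canary", dir) { |f| f.write("ok") }
  true
rescue SystemCallError
  false
end

# Runs each named check and reports :ok or :failed. A check that raises
# counts as failed, so one broken canary cannot hide the others.
def run_canaries(checks)
  checks.each_with_object({}) do |(name, check), results|
    results[name] = begin
      check.call ? :ok : :failed
    rescue StandardError
      :failed
    end
  end
end

# Example wiring: the temp directory stands in for the app's data directory.
status = run_canaries("disk" => -> { disk_writable?(Dir.tmpdir) })
```

The resulting hash can be logged or exposed on a status page, which also gives you stats to share when the tough conversations start.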
117. Questions?
- Photo credits (in order of appearance):
- http://www.flickr.com/photos/ricoslounge/38351363/ - ricoslounge
118. http://www.flickr.com/photos/jima/3435396513/ - jima
119. http://www.flickr.com/photos/34495711@N06/3613301938/ - Aaron Frutman
120. http://www.flickr.com/photos/dancoulter/21042744/ - Dan Coulter
121. http://www.flickr.com/photos/abennett96/2639105060/ - BenSpark
122. http://www.flickr.com/photos/bcymet/1923368669/ - bcymet