Choose Boring Technology

Hey

I’m Dan McKinley. That’s me in the hole. It’s a metaphor.

I work for a company called Stripe. Before that, I was an early employee at Etsy, where I worked for a lot of years. I acquired a great deal of practical experience at Etsy so I’m going to be referring to my time there a lot.

Etsy wasn’t mature as an engineering organization when I got there, but I was eventually spoiled rotten when it came to technology culture at Etsy.

As I’ve gone back into the wider tech world, I’ve had to confront some questions I hadn’t really considered in a few years.

And I’ve realized I have opinions about these things. That’s what this talk is about.

So how do you choose technology? This was eventually more or less handled for me at Etsy. Now I need to worry about it again.

You can achieve anything with software. And I definitely believe that companies don’t usually succeed or fail because of specific technical choices.

But technology choices are relevant. They affect how straight the path is between you and achieving your goals. They affect your efficiency.

Another question that I care about is: how do you make developers happy? This matters to me, as a developer. But it also matters to me as a leader in a software organization, because productivity and retention rely on this.

If you ask developers, a lot of them will tell you they’re happy when they’re working with technology that’s exciting. Or another thing they say is that they like to work on hard problems.

Those things may or may not be true. But what I’ve learned with experience is that it’s not really the case at the highest levels of fulfillment. They don’t talk about nodejs in heaven.

You can probably tell from the title of this talk that I don’t think that chasing shiny technology is right. But look, I’ve been there.

I, too, once chased shiny technology.

Etsy early on was a big ball of PHP, written by an overall brilliant person who was unfortunately learning PHP as he was writing it.

I spent years trying to avoid dealing with the results of that. At one point I tried building Scala services that talked to MongoDB.

I wrote blog posts about this that Etsy employees are still giving me shit about. And with good cause.

I think it’s fair to say that I’m a completely different kind of engineer now. I tend to be focused on things that are only vaguely engineering. I talk at design conferences, or in the “business” track. I care a lot more about product than your average engineer.

I view this less as the result of getting old and cranky and more as the result of climbing up Maslow’s hierarchy of needs. Maslow’s hierarchy, briefly, is the idea that you have to satisfy your more basic needs before higher levels of intellectual fulfillment are possible.

The same is basically true about software. You can’t ask intelligent questions about the direction of the product if you’re worried about which database to use or which alerting system to use.

In my career to date, I’ve been pretty lucky to have my most basic needs fulfilled. And I want to help get others to this state.

So, try to think of me as a time traveler from your future. I’ve been through the shiny technology wars you might be fighting today. It’s better over here. The air is fresher. Food tastes better.

So, on to the problem of choosing technology. A thing that I think is obviously true is that as human beings, we have limited attention. We can only worry about so much stuff at one time.

I personally model that like this. You could say that we all get a limited number of innovation tokens to spend. This is a construct I just made up, but I think it’s helpful. And since I created this currency I also decided to put Elon Musk on it.

These represent our limited capacity to do something creative, or hard. We really don’t have that many of these to allocate. Early on in a company’s life, we get like maybe three. Not too many more than that.

So what’s your company trying to do? Well, Etsy, where I used to work, is trying to reshape the world economy.

I dunno, that sounds like a big job. That probably requires at least one of your tokens.

The company where I work now is trying to increase the GDP of the internet.

Again, that sounds like a pretty complicated thing to be doing. We probably have to spend at least one of our tokens on that. Maybe two. Maybe all of them!

If you think about innovation as a scarce resource, it starts to make less sense to be on the front lines of innovating on databases. Or on programming paradigms.

The point isn’t really that these things can’t work. Of course they can work. But exciting new technology takes a great deal more attention to work than boring, proven technology does.

To get at the reason for that I want to talk about the philosophy of knowledge a little bit. What can we know about a piece of technology? This is not actually a frivolous question. It’s really important.

Now look, I don’t like Donald Rumsfeld. But he’s associated with the following, which is thoroughly relevant to our subject.

And that’s this. When we don’t know something, there are really two different categories that that lack of knowledge can be in.

There are known unknowns, that is, things that we know that we don’t know. And there are unknown unknowns, things that we don’t know and that we don’t know that we don’t know.

This applies in technology. This is an example of a known unknown. For a given database, we might not know what happens when a network partition occurs. But we know that a network partition is possible. Since we know that this is possible, we can test for this. Or we can just cross our fingers and hope that it doesn’t happen. Either way, we are informed about the possibility.

There are also unknown unknowns in technology. This is a good example I saw a few months ago. This person had a java process that was writing stats to a file, and that was causing GC pauses. It took him forever to figure this out because the possibility hadn’t occurred to him. That’s an unknown unknown.

Now, it’s important to realize that both categories are present in all software. There are always bugs that nobody knows about, even in software that’s been around forever.

But it’d be wrong to say that all technology is therefore equivalent. New technology has larger magnitudes for both of these sets.

New tech typically has more known unknowns, and many more unknown unknowns. And this is really important.

Boring technology in a nutshell is technology that’s well understood. We know what it’s capable of, and at least as importantly, we also know what it’s not capable of. We know how boring technology fails.

So, ok, all you have to do is pick proven technology, and you’re all set, right? Well, no. The combination of things that you choose also matters.

Let’s say that you’re already using this stack. You have python, memcached, mysql, and apache.

Let’s say you have a new problem to solve. Do you think it makes sense to add ruby to your existing stack?

I think most people’s intuition there is “probably not.” We know that the marginal utility of adding ruby isn’t going to outweigh the complexity hit we take by adding it. Python and ruby feel pretty equivalent.

And we’ve had formal proofs since the 1930s that all problems can in principle be solved with one or the other.

Ok, so how about adding redis? We already have mysql and memcached, but should we add redis?

About here is where people lose it and start beating the polyglot programmer drums. There’s something about the idea of adding a new database that has people storming the Bastille, saying “you can’t stop us from using the best tool for the job!”

People tend to think that what they're doing when they acquiesce to this is that they're giving developers freedom. And sure, it is freedom, but it's freedom very narrowly defined.

What’s going on there? Let’s try to tease this apart.

This is what we’re implicitly saying when we want to add a piece of technology.

Except in relatively rare cases where it’s not possible to solve a problem with our existing stack, we’re saying that the new tech is going to be so much better in the near term that this benefit outweighs the cost of having two pieces of technology around in perpetuity.

We can actually start to formalize this idea, and think about it in a structured way.

Well, sort of. I don’t expect to see this published in ACM. But here goes.

Your job is basically what my friend Coda says, here. You’re supposed to be solving business problems with technology.

We can model that as a bipartite graph. On the left side we have business problems, and on the right side we have technical solutions.

As practitioners we have to try to connect all of the nodes on the left side so that our problems are solved. Adding an edge here is making a technology choice.

Every choice has a maintenance cost, but we also get the benefit of the technology that we choose.

Every choice has maintenance costs, but every choice also helps us solve the problem. So we have a nonzero benefit, and a nonzero cost for every choice.

When we add more than one edge, we can make a choice. We can use the same technology that we’ve already paid for …

Or we could pick a different piece of technology. We have to pay for that new tech, too, but maybe we get so much development velocity that it’s worth it.

We can start to think about this mathematically. We’re trying to minimize this cost function. The total cost of our operations is all of the maintenance costs we take on from our choices, minus the development velocity we get from every choice.
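
One rough way to write that down, as a sketch only: the symbols below are illustrative assumptions, not notation from the slide. Take E to be the set of edges we chose (each edge pairs a problem p with a technology t), T to be the set of distinct technologies touched by at least one edge, m_t the ongoing maintenance cost of technology t, and v_{p,t} the development velocity we get from using t on problem p. This assumes maintenance is paid once per technology, which matches the idea of reusing technology we have already paid for.

```latex
C_{\text{total}} \;=\; \sum_{t \in T} m_t \;-\; \sum_{(p,\,t) \in E} v_{p,t}
```

Reusing a technology adds a new velocity term without adding a new maintenance term. Picking a new one adds both.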

The way we behave really depends on what you believe about which term dominates this equation in the real world.

If technology is really expensive to operate, the costs dominate. If technology really makes a huge difference in how easy your job is, the benefits dominate.

So, depending, you might decide to make an allocation like this. Here we’ve picked many different technologies to use to solve all of our problems.

And that makes complete sense if each additional technology choice is cheap.

If we think that we get more out of using each new technology than we’ll pay for operationalizing it, then doing it this way makes sense.

This is an alternative strategy. Here we’ve chosen just a few technologies.

And that’s what we should do if we think that each technology we add comes with a lot of baggage.

Here in reality, new technology choices come with a great deal of baggage.

This is reality. Costs to operate a technology in perpetuity tend to outstrip the convenience you get by using something different.

So this tends to be the right way to do it. We should generally pick the smallest set of tech that lets us get the job done.
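
To make the comparison between the two allocations concrete, here is a toy sketch of the cost model in Python. Everything in it is a made-up assumption for illustration: the problem names, the numbers, and the simplification that maintenance is paid once per distinct technology while velocity is earned once per edge.

```python
# Toy cost model. All names and numbers are hypothetical, chosen only to
# illustrate how the dominant term changes the best allocation.

def total_cost(assignment, maintenance, velocity):
    """Return maintenance paid minus velocity gained for one allocation.

    assignment maps each business problem to the technology chosen for it.
    Maintenance is charged once per distinct technology in use.
    Velocity is credited once per (problem, technology) edge.
    """
    techs_used = set(assignment.values())
    upkeep = sum(maintenance[tech] for tech in techs_used)
    benefit = sum(velocity[(problem, tech)] for problem, tech in assignment.items())
    return upkeep - benefit


# Velocity each technology would give us on each problem (assumed values).
velocity = {
    ("feeds", "mysql"): 3, ("feeds", "redis"): 5,
    ("search", "mysql"): 2, ("search", "mongodb"): 4,
    ("sessions", "memcached"): 3,
}

# "Best tool for the job": three problems, three technologies.
many = {"feeds": "redis", "search": "mongodb", "sessions": "memcached"}
# Small shared stack: three problems, two technologies.
few = {"feeds": "mysql", "search": "mysql", "sessions": "memcached"}

expensive = {"mysql": 10, "memcached": 10, "redis": 10, "mongodb": 10}
cheap = {"mysql": 1, "memcached": 1, "redis": 1, "mongodb": 1}

for label, maintenance in (("expensive upkeep", expensive), ("cheap upkeep", cheap)):
    print(label,
          "many:", total_cost(many, maintenance, velocity),
          "few:", total_cost(few, maintenance, velocity))
```

With these invented numbers the small stack wins when upkeep is expensive and loses when upkeep is cheap, which is the whole argument: the answer depends on which term you believe dominates.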

That’s the case because operating a piece of technology at a professional level turns out to be really hard. It’s easy to get started with a lot of technology, but harder to do a really good job with it.

This is why. Adding the technology is easy, living with it is hard. These are all the things you have to worry about.

Polyglot programming is not the kind of freedom we are looking for.

If you’re giving individual teams or individual engineers free rein to make local decisions about infrastructure, you’re hurting yourself globally.

It’s handing developers the chains so that they’re free to imprison themselves with operational toil, forever.

There’s more to this than just avoiding operational overhead. By embracing polyglot programming, you’re also discarding real benefits that only arise when everyone’s using a shared platform.

A good example of this from my experience is Etsy’s activity feeds. I built this with a small team back in 2010.

Here’s a totally reasonable way to build activity feeds, if that’s all you’re trying to do. You could write events to mysql, aggregate them into a feed offline, stuff the feed into redis, and then serve the feeds to end users from redis. This would totally work great.

But when we set out to build activity feeds, we didn’t have redis. We did have memcached. They’re sort of similar but they have very different guarantees. The most relevant difference to us here is that redis is persistent, and memcached isn’t.

We didn’t add redis to our stack to make activity feeds. We made do with what we had.

And that required a good bit of extra effort up front. Since memcached isn’t persistent, we had to write a bunch of extra code to possibly generate the feed fresh for frontend requests. We couldn’t just assume that the feed would exist when the user came to the site.
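
The shape of that extra code is roughly the following. This is a minimal sketch, not the code we actually ran at Etsy: the VolatileCache class is an in-memory stand-in for memcached, the EVENTS dictionary stands in for event rows in mysql, and every name here is hypothetical.

```python
# Sketch of serving a feed from a cache that may forget things at any time.
# VolatileCache, EVENTS, build_feed and get_feed are hypothetical stand-ins.

class VolatileCache:
    """In-memory stand-in for memcached: entries are not guaranteed to persist."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None means a cache miss

    def set(self, key, value):
        self._data[key] = value

    def flush(self):
        """Simulate eviction or a restart: everything is gone."""
        self._data.clear()


# Stand-in for raw events stored durably in mysql, oldest first.
EVENTS = {
    "alice": ["bob favorited an item", "carol opened a shop"],
}


def build_feed(user):
    """Aggregate a user's raw events into a feed, newest first."""
    return list(reversed(EVENTS.get(user, [])))


def get_feed(cache, user):
    """Serve the feed from cache, regenerating it on a miss."""
    key = f"feed:{user}"
    feed = cache.get(key)
    if feed is None:  # never assume the feed is still there
        feed = build_feed(user)
        cache.set(key, feed)
    return feed


cache = VolatileCache()
first = get_feed(cache, "alice")   # miss: built from the durable events
cache.flush()                      # the cache loses everything
second = get_feed(cache, "alice")  # miss again: rebuilt, same result
```

The point of the design is that the durable copy lives in the database the whole company already operates, and the cache is only ever an optimization.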

That was hard work we wouldn’t have had to do if we added redis, but we got through it.

Then we walked away. We didn’t do anything related to activity feeds for years after that.

But a funny thing happened. The usage of activity feeds exploded by 20 times. And it was totally fine.

This is the greatest purely technical achievement in my entire career.

The reason it was totally fine was because we used the shared stack. We had to plug in more mysql shards and memcached boxes, but people were doing that anyway.

If we’d done redis just for activity feeds, you can be sure that redis would have become distressed as the feature scaled up 20 times. And we would have had to go back and work on redis just to keep activity feeds working.

Or more likely, someone else would have had to do it. Our team didn’t exist at all a year later, we were all working on different things. Making a mess for others to clean up strikes me as even worse. That’s what you’re doing by adding a piece of technology that makes sense locally.

This is an example, but it’s not an absolute principle. Obviously sometimes it does make sense to add new technology to your stack.

So I wanted to finish by talking about how we should go about doing that.

First of all, it’s important to recognize that adding technology is a process. Technology has global effects on your company, it isn’t something that should be left to individual engineers.

I don’t care if you’re a flat organization, a holacracy, or if you have 500 middle managers. You have to figure out how to talk to each other before you add new technology.

When we were all using real hardware, it was usually the case that talking to at least one other person was necessary before adding something new. Now everybody’s on AWS, and this is no longer true. Engineers can sit in a corner and proliferate new systems all day.

I don’t think that real hardware is a good thing on balance, but I do think that talking to people is a positive thing. We just have to work harder to do this now, and have those conversations on purpose.

The first question you should talk about is how you’d solve the problem without adding anything new.

I think that you’ll notice that pretty often, this is enough to end the conversation. Because a high percentage of the time, the problem to be solved is that someone wants to use a new piece of tech for its own sake. You should not entertain this impulse as a serious person.

But anyway, assuming that you have a real problem, the answer is rarely that you can’t do it. If you have a functioning website of any kind and you think you can’t accomplish a specific new feature with what you already have, you’re probably just not thinking hard enough.

You may need to resort to unnatural acts, but you can get pretty far with a minimal stack.

Again, you might have to do really awkward things, and it’s possible that those are too costly. But you should talk about and write down what those things are.

And if you decide to try out a new piece of technology, you should figure out low-risk ways to get started. Your tactic should not be to rewrite your entire application with it in one step. You should be proving the technology in production with minimal risk, and then gradually gaining confidence in it.
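
One possible way to do that, offered as an assumption rather than a prescription, is to send a small, stable slice of traffic to the new technology and widen the slice as confidence grows. The sketch below is hypothetical throughout: in_rollout, serve, and the two backend functions are names invented for this example.

```python
# Hypothetical gradual rollout: a fixed percentage of users hits the new
# backend, and each user always lands on the same side for a given percentage.
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets and
    admit the first `percent` buckets to the new code path."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def serve(user_id, percent, old_backend, new_backend):
    """Route one request to the old or new backend based on the rollout."""
    backend = new_backend if in_rollout(user_id, percent) else old_backend
    return backend(user_id)


def old_backend(user_id):
    return f"old:{user_id}"


def new_backend(user_id):
    return f"new:{user_id}"


# At 0 percent nothing changes; at 100 percent everyone is on the new path.
print(serve("user-42", 0, old_backend, new_backend))
print(serve("user-42", 100, old_backend, new_backend))
```

Because the bucket depends only on the user id, raising the percentage only ever adds users to the new path, and setting it back to zero is the escape hatch.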

But ultimately, if you’re adding a redundant piece of technology, your goal is to replace something with it. Your goal shouldn’t be to operate two pieces of technology that are redundant with one another forever. Commit to replacing what you have, or don’t add the technology.

So, in closing

This is what you should do, most of the time. Choose technology that’s well understood, with failure modes that are known.

Use technology that lets you focus your attention on what really matters.

Don’t choose tech because of testimonials on Hacker News. Hacker News is kind of like Fox News, and not just because it’s dominated by libertarians.

Something terrible is happening somewhere in the world all the time, so cable news always has a story. Someone’s porting their site to a NoSQL database right now, and they’ll write an unreasonable blog post about it that will be on HN. It’s unreasonable to extrapolate in either of these scenarios.

Choose a few globally optimal technologies. Don’t make local decisions. Be kind to your future coworkers. Be kind to your future selves.

It’s important to master the tools that you do pick.

Every piece of software has this curve to some degree. When you start out you encounter a bunch of problems, but you expect to get them ironed out over time.

There’s a natural tendency to want to give up on something in its infancy. When you’ve got a lot of problems with a thing, people freak out and want to switch to something else.

If you encounter this and you’re naive, it can lead to a lot of wreckage. If you do one project with one database, encounter some of its quirks and then immediately give up, you can pretty rapidly wind up with ten different databases in production.

If you do that you miss out on the part of the curve that we call “mastery.” It’s possible that given enough time with something, you can reach a state of minimal problems. Probably not zero problems, but the situation will feel like it’s stabilized.

Now the given curve here, both the magnitude and the shape of it, varies across different kinds of technology. It’s true that you’re probably going to have a better time with mysql than with mongodb. But you’re not going to have zero problems with mysql, and you should not expect that on the path to mastery.

There’s an unfortunate dilemma inherent in mastering your tools: having done that, you know where the bodies are buried. Familiarity with tools can breed contempt.

There are tradeoffs with every tool. You always have things that are good,

and you have things that aren’t great. That’s just the reality of mapping technology solutions onto problems in imperfect ways.

Human nature is to obsess about the pain points. Or at least this is my nature. I think a lot of engineers suffer from the same thing, though, and technology doesn’t help. We don’t usually set up alerts reminding us about how well everything is going, if we all just step back and reflect. Although that’s a good idea for your next hack week.

So it’s also human nature to look at another piece of technology and notice that it solves a couple of those pain points. And this is the definition of naïveté in engineering.

Because as we’ve seen, we might not even think to ask a bunch of questions about a new piece of tech that we should be asking.

There can be a lot of pain points hidden in our own blind spots.

So we recognize that we have all of these cognitive issues: we’re susceptible to the green grass fallacy. We know we will tend to give up on our tools too quickly. We’re all people who got into this business because we like technology, and that will lead us to chase shiny new stuff.

Humans are amazing animals that have figured out a method for containing the damage created by our own psychology. It’s called “society.”

The way we protect ourselves from our own natures is to have a process. Don’t let technology choices happen without discussion. Have a process.

Real happiness comes from what you can do after conquering technical choices, not from what you get from making technical choices.

There’s a tendency among programmers to think that if they’re writing code, by definition they’re not wasting their time. This is a tar pit.

Real happiness comes from achieving your higher-level goals. Not from solving interesting technical riddles that you create for yourself.