Serendipity and strategy in rapid innovation
T. M. A. Fink∗†, M. Reeves‡, R. Palma‡ and R. S. Farr†
†London Institute for Mathematical Sciences, Mayfair, London W1K 2XF, UK
∗Centre National de la Recherche Scientifique, Paris, France
‡BCG Henderson Institute, The Boston Consulting Group, New York, USA
Innovation is to organizations what evolution is to organisms: it is how organizations adapt to changes in the environment and improve [1]. Governments, institutions and firms that innovate are more likely to prosper and stand the test of time; those that fail to do so fall behind their competitors and succumb to market and environmental change [2, 3]. Yet despite steady advances in our understanding of evolution, what drives innovation remains elusive [1, 4]. On the one hand, organizations invest heavily in systematic strategies to drive innovation [5–8]. On the other, historical analysis and individual experience suggest that serendipity plays a significant role in the discovery process [9–11]. To unify these two perspectives, we analyzed the mathematics of innovation as a search process for viable designs across a universe of building blocks. We then tested our insights using historical data from language, gastronomy and technology. By measuring the number of makeable designs as we acquire more components, we observed that the relative usefulness of different components is not fixed: the usefulness curves of different components cross each other over time. When these crossovers are unanticipated, they appear to be the result of serendipity. But when we can predict crossovers ahead of time, they offer an opportunity to strategically increase the growth of our product space. Thus we find that the serendipitous and strategic visions of innovation can be viewed as different manifestations of the same thing: the changing importance of component building blocks over time.
Lego game. Let’s illustrate the idea using Lego bricks. Think back to your childhood days. You’re in a room with two friends, Bob and Alice, playing with a big box of Lego bricks (say, a fire station set). All three of you have the same goal: to build as many new toys as possible. As you continue to play, each of you searches through the box and chooses those bricks that you believe will help you reach this goal. Let’s now suppose each player approaches this differently. Your approach is to follow your gut, arbitrarily selecting bricks that look intriguing. Alice uses what we call a short-sighted strategy, carefully picking Lego men and their firefighting hats to immediately make simple toys. Meanwhile, Bob chooses pieces such as axles, wheels, and small base plates that he has noticed are common in more complex models, even though he is not able to use them straightaway to produce new toys. We call this a far-sighted strategy.

[Figure 1 panels: Language (makeable words, i.e. usefulness, versus acquired letters, for the letters R, F and X); Gastronomy (makeable recipes versus acquired ingredients, for cayenne, cocoa and lime); Technology (makeable software versus acquired development tools, for Rails, jQuery UI and Sauce Labs). Below each panel, the rank (1st to 3rd) of the three components is shown at each stage.]

FIG. 1: Products, components and usefulness. (Top) We studied products and components from three sectors. In language, the products are 79,258 English words and the components are the 26 letters. In gastronomy, the products are 56,498 recipes from the databases allrecipes.com, epicurious.com, and menupan.com [12] and the components are 381 ingredients. In technology, the products are 1158 software products catalogued by stackshare.io and the components are 993 development tools used to make them. (Bottom) The usefulness of a component is the number of products we can make that contain it. We find that the relative usefulness of a component depends on how many other components have already been acquired. For each sector, we show the usefulness of three typical components: averaged at each stage over all possible choices of the other acquired components and, for gastronomy, for a particular random order of component acquisition (points).
Who wins. At the end of the day, who will have innovated the most? That is, who will have built the most new toys? We find that, in the beginning, Alice will lead the way, surging ahead with her impatient strategy. But as the game progresses, fate will appear to shift. Bob’s early moves will begin to look serendipitous when he is able to assemble a complex fire truck from his choice of initially useless axles and wheels. It will seem that he was lucky, but we will soon see that he effectively created his own serendipity. What about you? Picking components on a hunch, you will have built the fewest toys. Your friends had an information-enabled strategy, while you relied on chance.
Spectrum of strategies. What can we learn from this? If innovation is a search process, then your component choices today matter greatly in terms of the options they will open up to you tomorrow. Do you pick components that quickly form simple products and give you a return now, or do you choose those components that give you a higher future option value? By understanding innovation as a search for designs across a universe of components, we made a surprising discovery. Information about the unfolding process of innovation can be used to form an advantageous innovation strategy. But there is no one superior strategy. As we shall see, the optimal strategy depends on time (how far along the innovation process we have advanced) and on the sector (some sectors contain more opportunities for strategic advantage than others).
arXiv:1608.01900v4 [physics.soc-ph] 17 Mar 2017

Components and products. Just as the Lego toys are made up of distinct kinds of bricks, we take products to be made up of distinct components. A component can be an object, like a touch screen, but it can also be a skill, like using Python, or a routine, like customer registration. Only certain combinations of components form products, according to some predetermined universal recipe book of products. Examples of products and the components used to make them are shown in Fig. 1. Now suppose that we possess a basket of distinct components, which we can combine in different ways to make products. We have more than enough copies of each component for our needs, so we do not have to worry about running out. There are $N$ possible component types in total, but at any given stage $n$ we only have $n$ of these $N$ possible building blocks. At every stage, we pick a new type of component to add to our basket.
Usefulness. The usefulness of a component is the number of products we can make that contain it [13]. In other words, the usefulness $u_\alpha$ of some component $\alpha$ is how many more products we can make with $\alpha$ in our basket than without $\alpha$ in our basket. As we gather more components, $u_\alpha$ increases or stays the same; it cannot decrease. We write $u_\alpha(n)$ to indicate this dependence on $n$: $u_\alpha(n)$ is the usefulness of $\alpha$ given possession of $\alpha$ and $n-1$ other components, so that the combined set contains $n$ components. Averaging over all choices of the $n-1$ other components from the $N-1$ that are possible gives the mean usefulness, $\bar{u}_\alpha(n)$.

[Figure 2 panels: Language (all 26 letters, ranked E, A, I, R, N, T, O, S, L, C, U, D, M, P, H, G, Y, B, F, V, K, W, Z, X, J, Q when all letters are acquired); Gastronomy (40 ingredients, from egg to oregano, against 0 to 381 acquired ingredients); Technology (40 development tools, from Google Analytics to Grunt, against 0 to 993 acquired tools).]

FIG. 2: Crossovers. The relative usefulness of different components changes as the number of components we possess increases. For example, if you are only allowed six letters, the ones that show up in the most words are a, e, i, o, s, r. For gastronomy and technology, for clarity we only show the 40 components most useful when we have all $N$ components. A pure short-sighted strategy acquires components in the order that they intersect the diagonal, whereas a pure far-sighted strategy acquires them in the order that they intersect a vertical. If there are no crossovers, the strategies are the same.
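The definition of usefulness can be made concrete with a minimal sketch. The toy recipe book below (six short words, with letters as components) is hypothetical and is not taken from the paper's data; the function names are likewise illustrative.

```python
from typing import FrozenSet, Iterable, List, Set


def makeable(products: Iterable[FrozenSet[str]], basket: Set[str]) -> int:
    """Number of products whose components are all in the basket."""
    return sum(1 for p in products if p.issubset(basket))


def usefulness(alpha: str, products: List[FrozenSet[str]], basket: Set[str]) -> int:
    """u_alpha: products makeable with alpha in the basket minus those makeable without it."""
    with_alpha = set(basket) | {alpha}
    without_alpha = set(basket) - {alpha}
    return makeable(products, with_alpha) - makeable(products, without_alpha)


# Hypothetical toy data: words are products, their distinct letters are components.
WORDS = ["a", "at", "tea", "eat", "ate", "tee"]
PRODUCTS = [frozenset(w) for w in WORDS]

# The usefulness of "a" as the basket grows: it never decreases.
for basket in ({"a"}, {"a", "t"}, {"a", "t", "e"}):
    print(sorted(basket), usefulness("a", PRODUCTS, basket))
```

Note that "tea", "eat" and "ate" share one component set but count as three products, in line with the remark in SI B that one combination can form several products.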
Usefulness experiment. To measure the usefulness of different components as the innovation process unfolds and we acquire more components, we did the following experiments. Using data from each of our three sectors, we put a given component $\alpha$ into an empty basket, and then added, one component at a time, the remaining $N-1$ other components, measuring the usefulness of $\alpha$ at every step. We averaged $u_\alpha(n)$ over all possible orders in which to add the $N-1$ components to obtain $\bar{u}_\alpha(n)$. (We explain how in SI B.) We repeated this process for all of the components $\alpha$. Typical results from these experiments are shown in Fig. 1. We find that the mean usefulnesses of different components cross each other as the number of components in our basket increases. As Fig. 1 shows for gastronomy, this is true for both the average over all possible orderings of components (lines) and a specific random ordering (points).

[Figure 3 panels: histograms of the number of recipes against recipe complexity (1 to 21 ingredients). (A) Small kitchen, 127 ingredients (almond to fenugreek), 597 recipes in total; (B) big kitchen, 381 ingredients (almond to zucchini), 56,498 recipes in total; (C) 89 recipes contain cocoa (small kitchen); (D) 4801 recipes contain cocoa (big kitchen); (E) 43 recipes contain cayenne (small kitchen); (F) 7950 recipes contain cayenne (big kitchen). Cocoa is more useful than cayenne in the small kitchen; cayenne is more useful than cocoa in the big kitchen.]

FIG. 3: Why crossovers happen. On the right is a big kitchen with 381 ingredients. On the left is a small kitchen with one-third as many ingredients. In the big kitchen (B), we can make a total of 56,498 recipes. Each bar counts recipes with the same number of ingredients (complexity). When we move to the smaller kitchen (A), the number of makeable recipes shrinks dramatically to 597, or about 1%. But this reduction is far from uniform across different bars. Bars of higher complexity shrink more, on average by an extra factor of 3 with each bar. Thus the number of recipes of complexity one (first bar) shrinks about 3-fold; the number of complexity two (second bar) 9-fold, and so on. Of all the recipes in the big kitchen, 4801 contain cocoa (D) and 7950 contain cayenne (F). The cayenne recipes tend to be more complex, containing on average 10.6 ingredients, whereas the cocoa recipes are simpler, averaging 7.2 ingredients. Because bars of higher complexity suffer stronger reduction, overall fewer cayenne recipes (0.5%) survive in the smaller kitchen (E) than cocoa recipes (1.8%) (C). Thus cayenne is more useful in the big kitchen, but cocoa is more useful in the small kitchen.
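The averaging step of the experiment can be sketched by brute force for a tiny example. Averaging over all acquisition orders gives, at each stage, the same result as averaging over all choices of the $n-1$ other components, which is what the sketch below enumerates. The toy recipe book is hypothetical, and enumeration is only feasible for very small $N$; for real data the closed form derived in SI B would be needed.

```python
from itertools import combinations
from statistics import mean
from typing import FrozenSet, List, Sequence


def mean_usefulness(alpha: str,
                    products: List[FrozenSet[str]],
                    components: Sequence[str],
                    n: int) -> float:
    """Average u_alpha(n) over every choice of the n-1 other components."""
    others = [c for c in components if c != alpha]
    containing = [p for p in products if alpha in p]
    per_basket = []
    for rest in combinations(others, n - 1):
        basket = set(rest) | {alpha}
        # Usefulness of alpha = makeable products that contain alpha.
        per_basket.append(sum(1 for p in containing if p.issubset(basket)))
    return mean(per_basket)


# Hypothetical toy data: words are products, letters are components (N = 3).
PRODUCTS = [frozenset(w) for w in ("a", "at", "tea", "eat", "ate", "tee")]
COMPONENTS = ["a", "e", "t"]

for n in (1, 2, 3):
    print(n, mean_usefulness("a", PRODUCTS, COMPONENTS, n))
```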
Bumps charts. To visualise the relative usefulness of components over time, for each sector we created its “bumps chart” (Fig. 2). These show the rank order of mean usefulness at every stage of the innovation process. We see that the crossovers in Fig. 1 are commonplace, but that some sectors contain more crossovers than others. There are few crossings in language, some in gastronomy and many in technology. This means, for example, that the most useful letters for making words in Scrabble (a basket of seven letters) are nearly the same as the most useful letters for making words with a full basket (26 letters); the key ingredients in a small kitchen (20 ingredients) are moderately different from those in a big one (80 ingredients); the most-used development skills for a young software firm (experience with 40 tools) are significantly different from those for an advanced one (160 tools). We call components that do not cross in time isochronic, like the letters; and those that do anisochronic, like the tools.
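As a rough sketch of how a bumps chart could be tabulated, the code below ranks components by mean usefulness at each stage and counts the pairs whose order flips between the first and last stage. The three curves are made-up numbers for illustration only, and the flip count is one simple proxy for "number of crossovers", not necessarily the measure used for Fig. 2.

```python
from itertools import combinations
from typing import Dict, List


def rank_table(curves: Dict[str, List[float]]) -> List[List[str]]:
    """For each stage, list the components from most to least useful."""
    stages = len(next(iter(curves.values())))
    return [sorted(curves, key=lambda c: curves[c][i], reverse=True)
            for i in range(stages)]


def count_flips(curves: Dict[str, List[float]]) -> int:
    """Pairs of components whose relative rank differs between first and last stage."""
    table = rank_table(curves)
    first, last = table[0], table[-1]
    flips = 0
    for a, b in combinations(first, 2):  # a is ranked above b at the first stage
        if last.index(a) > last.index(b):  # ...but below b at the last stage
            flips += 1
    return flips


# Hypothetical mean-usefulness curves over three stages.
CURVES = {"x": [3, 4, 5], "y": [1, 4.5, 9], "z": [2, 2, 2]}

for stage, ranking in enumerate(rank_table(CURVES), start=1):
    print(stage, ranking)
print("flips:", count_flips(CURVES))
```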
Why crossovers happen. To understand why crossovers happen, let’s have a closer look at how the mean usefulness increases for a single component (Fig. 3). To make a product of complexity $s$, we must possess all $s$ of its distinct components. So making a complex product is harder than making a simple one, because there are more ways that we might be missing a necessary component. We therefore group together the products we can make containing $\alpha$ according to their complexity. That is, the usefulness $u_\alpha(n, s)$ of component $\alpha$ is how many more products of complexity $s$ we can make with $\alpha$ in our basket than without $\alpha$ in our basket. Summing $u_\alpha(n, s)$ over $s$ gives $u_\alpha(n)$. The advantage of this refined grouping is that, by understanding the behaviour of $\bar{u}_\alpha(n, s)$, we can understand the more difficult $\bar{u}_\alpha(n)$. Our key result, which we prove in SI B, is that $\bar{u}_\alpha(n, s)/n^{s-1}$ is approximately constant over all stages of the innovation process. In other words, for two stages $n$ and $n'$,

$$\bar{u}_\alpha(n', s) \simeq \bar{u}_\alpha(n, s)\,(n'/n)^{s-1}. \tag{1}$$
This tells us that the number of products containing $\alpha$ of complexity $s$ grows much faster for higher complexities than for lower complexities. Early on, $\bar{u}_\alpha(n, s)$ will tend to be small for higher complexities, but depending on how far ahead we look, the bigger growth rate can more than compensate for this, as we see in Fig. 3. Summing eq. (1) over size $s$, we find

$$\bar{u}_\alpha(n') \simeq \bar{u}_\alpha(n, 1) + \bar{u}_\alpha(n, 2)\,x + \bar{u}_\alpha(n, 3)\,x^2 + \dots, \tag{2}$$

where $x = n'/n$. The growth of the mean usefulness of $\alpha$ strongly depends on the complexity of products containing $\alpha$.

[Figure 4 panels: (A) Language, (B) Gastronomy, (C) Technology: usefulness (number of products a component is in) versus valence (average complexity of the products a component is in). (D) Language, (E) Gastronomy, (F) Technology: total makeable products versus acquired components, for the far-sighted strategy, the impatient (short-sighted) strategy and a pseudo-random (alphabetical) ordering.]

FIG. 4: (ABC) Scatter plots of component usefulness versus component valence for our three sectors. For gastronomy and technology, we only show the top 40 components; the complete set is in SI Fig. 5. (DEF) Both the short-sighted and far-sighted strategies beat a typical random component ordering (here alphabetical), but they diverge from each other only insofar as there are crossings in the bumps charts.
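To spell out the step from eq. (1) to eq. (2): the mean usefulness at a stage is the sum of its contributions from each complexity, so applying eq. (1) term by term gives a polynomial in $x$. The upper limit $s_{\max}$ (the largest complexity among products containing $\alpha$) is notation introduced here for illustration only.

```latex
\[
  \bar{u}_\alpha(n') \;=\; \sum_{s=1}^{s_{\max}} \bar{u}_\alpha(n', s)
  \;\simeq\; \sum_{s=1}^{s_{\max}} \bar{u}_\alpha(n, s) \left( \frac{n'}{n} \right)^{s-1}
\]
\[
  \phantom{\bar{u}_\alpha(n')} \;=\; \bar{u}_\alpha(n, 1) + \bar{u}_\alpha(n, 2)\, x
  + \bar{u}_\alpha(n, 3)\, x^{2} + \dots , \qquad x = n'/n .
\]
```

The coefficient of $x^{s-1}$ is the mean number of makeable products of complexity $s$ that contain $\alpha$ at the current stage, so a component whose products are mostly complex has its weight in the high powers of $x$ and grows fastest.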
Valence. So far we have only characterised a component by its usefulness: the number of products we can make that contain it. Now we introduce another way of describing a component: the average complexity of the products it appears in. We call this the valence. The valence $v_\alpha$ of component $\alpha$ is the average complexity of the products it appears in at stage $N$, when we have all $N$ components. Think of the valence as the typical number of co-stars a component performs with, plus one. We show the usefulness and valence for each of the components in our three sectors in Fig. 4ABC. More valent components are unlikely to be useful until we possess a lot of other components, so that we have a good chance of hitting upon the ones they need. These are the wheels and axles in our Lego set. On the other hand, less valent components are likely to boost our product space early on, when we have acquired fewer components. These are the Lego men and their firefighting hats. This insight suggests that more valent components will tend to rise in relative usefulness, and less valent components fall. This is verified in our experiments: components on the right of the plots in Fig. 4ABC tend to rise in the bumps charts in Fig. 2, such as onion, tomato, JavaScript and Git; whereas components on the left tend to fall, like cocoa, vanilla, Google Apps and SendGrid.
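A minimal sketch of the valence calculation follows. The Lego-style recipe book is hypothetical (four made-up toys), chosen so that axles come out more valent than Lego men, as in the text.

```python
from typing import FrozenSet, List


def valence(alpha: str, products: List[FrozenSet[str]]) -> float:
    """Average complexity (number of distinct components) of the products containing alpha."""
    containing = [p for p in products if alpha in p]
    if not containing:
        raise ValueError(f"no product contains {alpha!r}")
    return sum(len(p) for p in containing) / len(containing)


# Hypothetical recipe book of four toys, each a set of brick types.
PRODUCTS = [
    frozenset({"man"}),
    frozenset({"man", "hat"}),
    frozenset({"axle", "wheel", "baseplate"}),
    frozenset({"axle", "wheel", "baseplate", "man", "hat"}),
]

for brick in ("man", "hat", "axle"):
    print(brick, valence(brick, PRODUCTS))
```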
Interpreting crossovers. A crossover in the usefulness of components means that the things that matter most today are not the same as the things that will matter most tomorrow. How we interpret crossovers in practice depends on whether they are unanticipated, and take us by surprise, or anticipated, and can be planned for and exploited. When they are unanticipated, beneficial crossovers can seem to be serendipitous. But when they can be anticipated, crossovers provide an opportunity to strategically increase the growth of our product space. To harness this opportunity, we turn to forecasting component crossovers using the complexity of products containing them.

Short-sighted strategy. To maximise the size of our product space when crossovers are unanticipated, the optimal approach is to acquire, at each stage, the component that is most useful from the ones that are remaining. Think of this as a “greedy” approach. It has a geometric interpretation: it is equivalent to acquiring the components that intersect the diagonal in Fig. 2. At every stage we lock in to a specific component, unaware of the future implications of the choices we make. A component poorly picked is an opportunity lost.
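The greedy rule can be sketched as follows, under the assumption that the full recipe book is available for counting makeable products. The Lego-style products are hypothetical, and ties are broken alphabetically, which is a choice made here for determinism rather than anything specified in the text.

```python
from typing import FrozenSet, Iterable, List, Set


def makeable(products: Iterable[FrozenSet[str]], basket: Set[str]) -> int:
    """Number of products whose components are all in the basket."""
    return sum(1 for p in products if p.issubset(basket))


def short_sighted_order(products: List[FrozenSet[str]],
                        components: Iterable[str]) -> List[str]:
    """At each stage, acquire the remaining component that is most useful right now."""
    basket: Set[str] = set()
    order: List[str] = []
    remaining = sorted(components)  # alphabetical order breaks ties (an assumption)
    while remaining:
        # Maximising makeable(basket + c) is the same as maximising the usefulness of c.
        best = max(remaining, key=lambda c: makeable(products, basket | {c}))
        basket.add(best)
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical recipe book of four toys.
PRODUCTS = [
    frozenset({"man"}),
    frozenset({"man", "hat"}),
    frozenset({"axle", "wheel", "baseplate"}),
    frozenset({"axle", "wheel", "baseplate", "man", "hat"}),
]
COMPONENTS = {"man", "hat", "axle", "wheel", "baseplate"}

print(short_sighted_order(PRODUCTS, COMPONENTS))
```

In this toy example the greedy player picks the man and the hat first, as Alice does, and only later the axle, baseplate and wheel.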
Far-sighted strategy. Using only information about the products we can already make with our existing components, however, we can forecast the usefulness of our components into the future. Eq. (2) shows us how, and we give an example in SI C. Here the optimal approach is to acquire the component that will be most useful at some later stage $n'$. This also has a geometric interpretation: it is equivalent to acquiring the components that intersect a vertical at $n'$ in Fig. 2, and thus depends on how far into the future we forecast.
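One possible reading of this rule is sketched below: for each candidate, count the products it would complete with the current basket, grouped by complexity, then extrapolate those counts to stage $n'$ with eq. (2) and pick the largest forecast. This is an interpretation for illustration, not the authors' implementation; the function names and the letter-labelled recipe book are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set


def forecast_usefulness(candidate: str,
                        products: List[FrozenSet[str]],
                        basket: Set[str],
                        n_future: int) -> float:
    """Eq. (2) applied to the products the candidate completes with the current basket."""
    trial = set(basket) | {candidate}
    x = n_future / len(trial)
    by_complexity = Counter(len(p) for p in products
                            if candidate in p and p.issubset(trial))
    return sum(count * x ** (s - 1) for s, count in by_complexity.items())


def far_sighted_pick(products: List[FrozenSet[str]],
                     components: Iterable[str],
                     basket: Set[str],
                     n_future: int) -> str:
    """Choose the remaining component with the largest forecast usefulness at n_future."""
    remaining = sorted(set(components) - set(basket))
    return max(remaining,
               key=lambda c: forecast_usefulness(c, products, basket, n_future))


# Hypothetical data: "c" completes two simple products, "d" one complex product.
PRODUCTS = [frozenset({"c"}), frozenset({"c"}), frozenset({"a", "b", "d"})]
COMPONENTS = list("abcdef")
BASKET = {"a", "b"}

print(far_sighted_pick(PRODUCTS, COMPONENTS, BASKET, n_future=3))  # looks only at now
print(far_sighted_pick(PRODUCTS, COMPONENTS, BASKET, n_future=6))  # looks ahead to n' = N
```

Setting `n_future` to the size of the trial basket reduces the rule to the short-sighted choice, which matches the spectrum of strategies indexed by $n'$.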
Strategy comparison. A short-sighted strategy considers only the usefulness $u_\alpha$, whereas a far-sighted strategy considers both the usefulness $u_\alpha$ and the valence $v_\alpha$. Short-sighted maximises what a potential new component can do for us now, whereas far-sighted maximises what it could do for us later. Depending on our desire for short-term gain versus long-term growth, we have a spectrum of strategies dependent on $n'$. A pure short-sighted strategy ($n' = n$) and a pure far-sighted strategy ($n' = N$) are compared in Fig. 4DEF. Like the Lego approaches of Bob and Alice, both strategies beat acquiring components in a random order. As our theory predicts, the extent to which the two strategies differ from each other increases with the number of crossovers. For language, they are nearly identical, because there are hardly any crossovers. For gastronomy, short-sighted has a two-fold advantage at first, but later far-sighted wins by a factor of two. For technology, short-sighted surges ahead by an order of magnitude, but later far-sighted is dominant.
Serendipity and strategy. Our research helps resolve the tension between a strategic approach to innovation, which views innovation as a rational process which can be measured and prescribed [3, 4, 7, 8]; and a belief in serendipity and the intuition of extraordinary individuals [9–11]. A strategic approach is seen in firms like P&G and Unilever, which use process manuals and consumer research to maintain a reliable innovation factory [14], and Zara, which systematically scales new products up and down based on real-time sales data. In scientific discovery, “traditional scientific training and thinking favor logic and predictability over chance” [9]. If discoveries are actually made in the way that scientific publications suggest, the path to invention is a step-by-step, rational process. On the other hand, a serendipitous approach is seen in firms like Apple, which is notoriously opposed to making innovation choices based on incremental consumer demands, and Tesla, which has invested for years in its vision of long-distance electric cars [15]. In science, many of the most important discoveries have serendipitous origins, in contrast to their published step-by-step write-ups, such as penicillin, heparin, X-rays and nitrous oxide [9]. The role of vision and intuition tends to be under-reported: a study of 33 major discoveries in biochemistry “in which serendipity played a crucial role” concluded that “when it comes to ‘chance’ factors, few scientists ‘tell it like it was’” [16, 17].
Serendipity. Writing about The Three Princes of Serendip, Horace Walpole records that the princes “were always making discoveries, by accidents and sagacity, of things they were not in quest of”. Serendipity is the fortunate development of events, and many organizations and researchers stress its importance [9, 10]. Crossovers in component usefulness help us see why. Components which depend on the presence of many others can be of little benefit early on. But as the innovation process unfolds and the acquired components pay off, the results will seem serendipitous, because a number of previously low-value components become invaluable. Thus, what appears as serendipity is not happenstance but the delayed fruition of components reliant on the presence of others. After the acquisition of enough other components, these components flourish. For example, the initially useless axles and wheels were later found to be invaluable to building many new toys. In a similar way, the low value attributed to Fleming’s initial identification of lysozyme was later revised to high value in the years leading to the discovery of penicillin, when other needed components emerged, such as sulfa drugs, which showed that safe antiseptics are possible [9]. Interestingly, the word “serendipity” does not have an antonym. But as our bumps charts show, for every beneficial shift in a crossover, there is a detrimental one. Each opportunity for serendipity goes hand-in-hand with a chance for anti-serendipity: the acquisition of components useful now but less useful later. Avoiding these over-valued components is as important as acquiring under-valued ones to securing a large future product space.
Strategy. Our research shows that the most important components (materials, skills and routines) when an organization is less developed tend to be different from those when it is more developed. The relative usefulness of components is not fixed but changes over time, in a statistically repeatable way. Recognising how an organization’s priorities depend on its maturity enables it to balance short-term gain with long-term growth. For example, our insights provide a framework for understanding the poverty trap. When a less-developed country imitates a more-developed country by acquiring similar production capabilities [6], it is unable to quickly reap the rewards of its investment, because it does not have in place enough other needed capabilities. This in turn prevents it from further investment in those needed components. Our analysis gives quantitative backing to the “lean start-up” approach to building companies and launching products [18]. Start-ups are wise to employ a short-sighted strategy and release a minimum viable product. Without the resources to sustain a far-sighted approach, they need to quickly bring a simple product to market. On the other hand, firms that can weather an initial drought will see their sacrifice more than paid off when their far-sighted approach kicks in. By tracking how potential new components combine with existing ones, organizations can develop an information-advantaged strategy to adopt the right components at the right time. In this way they can create their own serendipity, rather than relying on intuition and chance.
[1] D. Erwin, D. Krakauer, ‘Insights into innovation’, Science 304, 1117 (2004).
[2] M. Reeves, K. Haanaes, J. Sinha, Your Strategy Needs a Strategy (Harvard Business Review Press, 2015).
[3] C. Weiss et al., ‘Adoption of a high-impact innovation in a homogeneous population’, Phys Rev X, 4, 041008 (2014).
[4] J. McNerney et al., ‘Role of design complexity in technology improvement’, Proc Natl Acad Sci, 108, 9008 (2011).
[5] R. Van Noorden, ‘Physicists make ‘weather forecasts’ for economies’, Nature, 1038, 16963 (2015).
[6] A. Tacchella et al., ‘A new metric for countries’ fitness and products’ complexity’, Sci Rep, 2, 723 (2012).
[7] P. Drucker, ‘The discipline of innovation’, Harvard Bus Rev 8, 1 (2002).
[8] V. Sood et al., ‘Interacting branching process as a simple model of innovation’, Phys Rev Lett, 105, 178701 (2010).
[9] M. Rosenman, ‘Serendipity and scientific discovery’, Res Urban Economics, 13, 187 (2001).
[10] F. Johansson, ‘When success is born out of serendipity’, Harvard Bus Rev 18, 22 (2012).
[11] W. Isaacson, The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution, (2014).
[12] Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási, ‘Flavor network and the principles of food pairing’, Sci Rep 1, 196 (2011).
[13] We make no assumptions about the values of different products, which will depend on the market environment and may change with time. But we can be sure that maximising the number of products is a proxy for maximising any reasonable property of them. A similar proxy is used in evolutionary models, where evolvability is defined as the number of new phenotypes in the adjacent possible (1-mutation boundary) of a given phenotype; see A. Wagner, ‘Robustness and evolvability: a paradox resolved’, Proc Roy Soc B 91, 275 (2008).
[14] B. Brown, S. Anthony, ‘How P&G tripled its innovation success rate’, Harvard Bus Rev 6 (2011).
[15] K. Bullis, ‘How Tesla is driving electric car innovation’, MIT Tech Rev, 8 (2013).
[16] J. Comroe, ‘Roast pig and scientific discovery: Part II’, Am Rev Respir Dis, 115, 853 (1977).
[17] F. Tria et al., ‘The dynamics of correlated novelties’, Sci Rep 4, 5890 (2014).
[18] E. Ries, The Lean Startup, (Portfolio Penguin, 2011).
Online supplementary information (SI)
A. Data
Our three data sets, described in Fig. 1, were obtained as follows. In language, our list of 79,258 common English words is from the built-in WordList library in Mathematica 10. Of the 84,923 KnownWords, we only considered those made from the 26 letters a–z, ignoring case: we excluded words containing a hyphen, space, etc. In gastronomy, the 56,498 recipes can be found in the supplementary material in [12]. In technology, the 1158 software products and the development tools used to make them can be found at the site stackshare.io.
B. Proof of components invariant
Let $\alpha$ be some component. Let $N_1$ be the set of $N-1$ other possible components not including $\alpha$, $n_1$ be a subset of $n-1$ components chosen from $N_1$, and $s_1$ be a subset of $s-1$ components chosen from $n_1$. The usefulness $u_\alpha(n, s)$ is how many more products of complexity $s$ we can make from the components $n_1$ together with $\alpha$ than from the components $n_1$ alone:

$$u_\alpha(n, s) = \sum_{s_1 \subseteq n_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right],$$

where $\mathrm{prod}(\alpha \cup s_1)$ takes the value 0 if the combination of components $\alpha \cup s_1$ (that is, $\alpha$ together with the components in $s_1$) forms no products of complexity $s$ and 1 if $\alpha \cup s_1$ forms one product of complexity $s$. (Occasionally, the same combination of components $\alpha \cup s_1$ forms multiple products: for example, beef, butter and onion together form two distinct recipes of length three. In such cases, $\mathrm{prod}(\alpha \cup s_1)$ takes the value 2 if $\alpha \cup s_1$ forms two products, and so on.) The expected usefulness of component $\alpha$, $\bar{u}_\alpha(n, s)$, is the average of $u_\alpha(n, s)$ over all subsets $n_1 \subseteq N_1$; there are $\binom{N-1}{n-1}$ such subsets. Therefore

$$\bar{u}_\alpha(n, s) = \frac{1}{\binom{N-1}{n-1}} \sum_{n_1 \subseteq N_1} u_\alpha(n, s) = \frac{1}{\binom{N-1}{n-1}} \sum_{n_1 \subseteq N_1} \sum_{s_1 \subseteq n_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right].$$

Consider some particular combination of components $s_1'$. The double sum above will count $s_1'$ once if $s = n$, but multiple times if $s$ is smaller than $n$, because $s_1'$ will belong to multiple sets $n_1$. How many? In any set $n_1$ that contains $s_1'$, there are $n-s$ free elements to choose, from $N-s$ other components. Therefore the double sum will count every combination $s_1$ a total of $\binom{N-s}{n-s}$ times, and

$$\bar{u}_\alpha(n, s) = \frac{\binom{N-s}{n-s}}{\binom{N-1}{n-1}} \sum_{s_1 \subseteq N_1} \left[ \mathrm{prod}(\alpha \cup s_1) - \mathrm{prod}(s_1) \right] = \frac{N}{n} \, \frac{\binom{n}{s}}{\binom{N}{s}} \, u_\alpha(N, s).$$

The same must be true when we replace $n$ by $n'$, and therefore

$$\bar{u}_\alpha(n, s) \, \frac{n}{\binom{n}{s}} = \bar{u}_\alpha(n', s) \, \frac{n'}{\binom{n'}{s}}. \tag{3}$$

When the number of components is big compared to the product size ($n, n' \gg s$), we can approximate $\binom{n}{s}$ and $\binom{n'}{s}$ by $n^s/s!$ and $n'^s/s!$ (the factors of $s!$ cancel), and thus

$$\bar{u}_\alpha(n, s)/n^{s-1} \simeq \bar{u}_\alpha(n', s)/n'^{\,s-1}.$$

For simplicity, we use this approximation in the main manuscript, but we could just as well have used the exact expression in eq. (3).
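The closed form behind eq. (3), $\bar{u}_\alpha(n, s) = (N/n)\binom{n}{s}\binom{N}{s}^{-1} u_\alpha(N, s)$, can be checked against direct enumeration on a small example. The sketch below does this for a hypothetical recipe book with $N = 6$ components; the data and function names are illustrative only.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Sequence


def exact_mean(u_full: int, N: int, n: int, s: int) -> float:
    """Closed form: mean usefulness at stage n from the usefulness u_full at stage N."""
    return u_full * (N / n) * comb(n, s) / comb(N, s)


def brute_force_mean(alpha: str,
                     products: List[FrozenSet[str]],
                     components: Sequence[str],
                     n: int,
                     s: int) -> float:
    """Average, over all baskets of size n containing alpha, of the number of
    makeable products of complexity s that contain alpha."""
    others = [c for c in components if c != alpha]
    targets = [p for p in products if alpha in p and len(p) == s]
    total, baskets = 0, 0
    for rest in combinations(others, n - 1):
        basket = set(rest) | {alpha}
        total += sum(1 for p in targets if p.issubset(basket))
        baskets += 1
    return total / baskets


# Hypothetical recipe book: two products of complexity 3 contain "a".
COMPONENTS = list("abcdef")
PRODUCTS = [frozenset("abc"), frozenset("ade"), frozenset("bf")]

for n in range(3, 7):
    print(n, brute_force_mean("a", PRODUCTS, COMPONENTS, n, 3),
          exact_mean(2, len(COMPONENTS), n, 3))
```

With so few components the approximation $\binom{n}{s} \approx n^s/s!$ is poor, so this toy example checks only the exact expression, not eq. (1).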
C. Forecasting crossovers in usefulness
Here we show how we can forecast the usefulness of components at stage $n'$ from information we have at some earlier stage $n$, where $n$ is the number of components we have acquired. As in Fig. 3, we have a set $k$ of 127 ingredients in a small kitchen (almond to fenugreek) and a set $K$ of 381 ingredients in a big kitchen (almond to zucchini).

In the small kitchen, we can make a total of 597 recipes. Of these 597 recipes, 43 contain cayenne, but they are not all equally complex. Two of the 43 recipes contain one ingredient (namely, cayenne itself) and have complexity one; one recipe contains two ingredients and has complexity two; 18 contain three ingredients and have complexity three; and so on. Similarly, 89 of the 597 recipes contain cocoa: six have complexity one; 22 have complexity two; and so on. Using eq. (2), we can write the mean usefulness of these two components as

$$\bar{u}_{\mathrm{ca}}(n' \mid k) \simeq 2 + x + 18x^2 + 12x^3 + 8x^4 + x^5 + x^7 \quad \text{and}$$
$$\bar{u}_{\mathrm{co}}(n' \mid k) \simeq 6 + 22x + 37x^2 + 16x^3 + 8x^4,$$

where $x = n'/127$. As expected,

$$\bar{u}_{\mathrm{ca}}(n' \mid k)\big|_{x=1} = 43 \quad \text{and} \quad \bar{u}_{\mathrm{co}}(n' \mid k)\big|_{x=1} = 89.$$

In the big kitchen, we can make a total of 56,498 recipes. Of these, 7950 contain cayenne and 4801 contain cocoa. Again using eq. (2),

$$\bar{u}_{\mathrm{ca}}(n' \mid K) \simeq 2 + 19x + 64x^2 + \dots + 2x^{28} + 2x^{30} \quad \text{and}$$
$$\bar{u}_{\mathrm{co}}(n' \mid K) \simeq 6 + 54x + 195x^2 + \dots + 2x^{20} + 3x^{21},$$

where $x = n'/381$. As expected,

$$\bar{u}_{\mathrm{ca}}(n' \mid K)\big|_{x=1} = 7950 \quad \text{and} \quad \bar{u}_{\mathrm{co}}(n' \mid K)\big|_{x=1} = 4801.$$

So far, none of this is surprising. The punchline is that we can estimate the usefulness of components in the big kitchen from what we know about our small kitchen. To do so, we simply evaluate the small-kitchen polynomials at the big-kitchen stage:

$$\bar{u}_{\mathrm{ca}}(n' \mid K)\big|_{n'=381} \simeq \bar{u}_{\mathrm{ca}}(n' \mid k)\big|_{x=3} \simeq 3569 \quad \text{and}$$
$$\bar{u}_{\mathrm{co}}(n' \mid K)\big|_{n'=381} \simeq \bar{u}_{\mathrm{co}}(n' \mid k)\big|_{x=3} \simeq 1485.$$

In log terms (log usefulness being the natural unit of measure), these are accurate to within 11% and 9% of the true values. In particular, this predicts the crossover of cayenne and cocoa in Figure 3.
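The small-kitchen forecast can be reproduced in a few lines. The coefficients below are the recipe counts by complexity quoted above for cayenne and cocoa; the function names and the scan for the first crossover stage are additions made here for illustration.

```python
from typing import Dict

# Recipes makeable in the small kitchen, keyed by complexity s (from the text above).
CAYENNE: Dict[int, int] = {1: 2, 2: 1, 3: 18, 4: 12, 5: 8, 6: 1, 8: 1}
COCOA: Dict[int, int] = {1: 6, 2: 22, 3: 37, 4: 16, 5: 8}
SMALL_KITCHEN = 127  # number of ingredients at the stage where the counts were taken


def forecast(counts: Dict[int, int], x: float) -> float:
    """Eq. (2): sum over complexities s of count(s) * x**(s - 1), with x = n'/n."""
    return sum(count * x ** (s - 1) for s, count in counts.items())


def first_crossover_stage(n_max: int) -> int:
    """Smallest stage n' at which the cayenne forecast exceeds the cocoa forecast."""
    for n_future in range(SMALL_KITCHEN, n_max + 1):
        x = n_future / SMALL_KITCHEN
        if forecast(CAYENNE, x) > forecast(COCOA, x):
            return n_future
    raise ValueError("no crossover up to n_max")


print(forecast(CAYENNE, 1), forecast(COCOA, 1))  # current usefulness: 43 and 89
print(forecast(CAYENNE, 3), forecast(COCOA, 3))  # forecast at 381 ingredients: 3569 and 1485
print(first_crossover_stage(381))                # forecast stage of the crossover
```

At $x = 1$ cocoa leads, while at $x = 3$ cayenne leads, so the two forecast curves must cross somewhere between the small and the big kitchen.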
[Figure 5 plot. Top panel: ingredient labels (egg, wheat, butter, onion, garlic, ...) scattered against the x-axis "Average complexity of recipes an ingredient appears in (valence)", with ticks from 4 to 14, and the logarithmic y-axis "Number of recipes an ingredient appears in (usefulness)", with ticks from 10 to 10^4. Bottom panel: ingredient labels from egg to celery_oil, with axis ticks at 95, 190 and 286.]
FIG. 5: (Top) The valence-usefulness scatter plot for all ingredients that are used in two or more recipes (365 of the 381 ingredients). (Bottom) The relative usefulness of different ingredients as the number of ingredients we possess increases, for the 100 ingredients most useful when we have all 381 ingredients.
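The two quantities on the axes of the top panel can be computed directly from a list of recipes. The following is a minimal sketch, assuming each recipe is given as a set of ingredient names; the four recipes are invented for illustration and are not taken from the data set, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy recipes, invented for illustration only (not the paper's data).
RECIPES = [
    {"egg", "butter", "wheat"},
    {"egg", "milk"},
    {"cocoa", "butter", "egg", "milk"},
    {"cayenne"},
]


def usefulness_and_valence(recipes):
    """Map each ingredient to (usefulness, valence).

    usefulness: number of recipes the ingredient appears in.
    valence: average complexity (ingredient count) of those recipes.
    """
    sizes = defaultdict(list)
    for recipe in recipes:
        for ingredient in recipe:
            sizes[ingredient].append(len(recipe))
    return {ing: (len(s), sum(s) / len(s)) for ing, s in sizes.items()}


if __name__ == "__main__":
    stats = usefulness_and_valence(RECIPES)
    for ingredient, (u, v) in sorted(stats.items()):
        print(ingredient, u, v)
```

In this toy example, egg appears in three recipes of sizes 3, 2 and 4, so it has usefulness 3 and valence 3.0.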
[Figure 6 plot. Top panel: tool labels (Google Analytics, GitHub, jQuery, nginx, Bootstrap, ...) scattered against the x-axis "Average complexity of software product a tool appears in (valence)", with ticks from 20 to 70, and the logarithmic y-axis "Number of software products a tool appears in (usefulness)", with ticks from 10 to 500. Bottom panel: tool labels from Google Analytics to TestFlight, with axis ticks at 248, 496 and 745.]
FIG. 6: (Top) The valence-usefulness scatter plot for the 365 technology tools most useful in making software products. (Bottom) The relative usefulness of different tools as the number of tools we possess increases, for the 100 tools most useful when we have all 993 tools.