Platform Leadership in Open Source Software

By

Ken Chi Ho Wong

Bachelor of Science, Computing Science, Simon Fraser University, 2005

SUBMITTED TO THE SYSTEM DESIGN AND MANAGEMENT PROGRAM

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE IN ENGINEERING AND MANAGEMENT

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2015

©2015 Ken Wong. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and

electronic copies of this thesis document in whole or in part in any medium now known or

hereafter created.

Signature of Author: ___________________________________________________________

Ken Wong

System Design and Management Program

February 2015

Advised by: ___________________________________________________________

Michael Cusumano

SMR Distinguished Professor of Management & Engineering Systems

MIT Sloan School of Management

Certified by: ___________________________________________________________

Patrick Hale

Director, System Design and Management Program

Massachusetts Institute of Technology

Platform Leadership in Open Source Software

By

Ken Chi Ho Wong

Submitted to the System Design and Management Program on February 2015,

in Partial Fulfillment of the Requirements for the degree of

Master of Science in Engineering and Management.

Abstract

Industry platforms in the software sector are increasingly being developed in open source. Firms

seeking to position themselves as platform leaders with such technologies must find ways of

operating within the unique constraints of open source development. This thesis aims to

understand those challenges by analyzing the Android and Hadoop ecosystems through an

augmented version of Porter’s Five Forces framework proposed by Intel’s Andrew Grove.

The analysis finds that platform contenders in open source behave differently depending on

whether they focus on competing against alternative platforms or alternative providers of the

same platform as rivals. This focus informs key decisions that the firm takes, including how it

interacts with complementors and its approach to innovation. Because open source

vendors tend to lack unilateral authority over technology decisions, they can only seek to

influence the behavior of the ecosystem by securing key relationships in the value network. In

particular, they must secure the right engineering talent, access to key complements and superior

paths to the customer.

The research highlights some of the factors and tactics platform contenders in Hadoop and

Android considered in acquiring these relationships. The open nature of FOSS (Free and Open

Source Software) also allows new technologies to emerge and change the definition of the

platform’s boundaries. This creates a further strategic challenge for open source platform

contenders.

Keywords: platform strategy, platform leadership, open source software, Hadoop, Android

Thesis Supervisor: Michael Cusumano

Title: SMR Distinguished Professor of Management & Engineering Systems

MIT Sloan School of Management

Acknowledgement

This thesis was made possible by a number of individuals who generously shared their

time and expertise with me. There are only a few names on the cover of this document, but the

content within contains the wisdom and contributions of so many more.

I would especially like to thank Professor Michael Cusumano for his guidance and advice

throughout the entire journey. The breadth of his knowledge and depth of his insights on all

things related to platform strategy are simultaneously humbling and inspiring.

My understanding of the Hadoop ecosystem was greatly informed by a number of

enlightening conversations I’ve had with the thought leaders of that market. I am tremendously

grateful to Rob Bearden (CEO of Hortonworks), Ron Kasabian (GM of Big Data at Intel) and

Mike Olson (Founder and Chief Strategy Officer of Cloudera) for taking time to indulge the

curiosity of a student. The case study that sits at the heart of this thesis would not have been

possible without their assistance.

The time I spent at MIT was also enabled by the fantastic support I received from my

colleagues in SAP’s Analytics Division. In particular, I would like to thank Jesse Calderon, Don

Wakefield and Michael Reh for their sponsorship and encouragement during the past two years.

Though I am no longer a part of SAP, I will take the many things I’ve learned from these leaders

forward with me. The same goes to Pat Hale and the fantastic staff of the SDM program.

Finally, I would like to thank my family for their unwavering support and many sacrifices

that made it possible for me to complete my studies. To my amazing wife Sharon and the active

bundle of joy that she is currently carrying in her tummy: Completion of this program is made

even sweeter by the knowledge that I now have more time to spend with you. Your love is a true

blessing from God.

Table of Contents

Abstract
Acknowledgement
Table of Contents
Introduction
Approach and Structure
Chapter 1 – Literature Review
Network Effects
Product vs. Industry Platforms
Two-Sided Markets
Topology of Platform Roles and Openness in a Platform-Mediated Network
Platform Leadership and the “Four Levers” Framework
Lever 1: The Scope of the Firm
Lever 2: Product Technology
Lever 3: External Relationships
Lever 4: Internal Organization
Platform Establishment and Displacement
Open Source Software
Commercial Interest in Community-driven Development
Related works on Commercial Open Source
Chapter 2 – Strategic Considerations for Open Source Leadership
IBM and Eclipse
The Definition of Open Source Leadership
Google and Android
Rivalry – Inter-network vs. Intra-network Competition
Suppliers – Securing the Upstream Value Chain
Complementors – Identifying and Securing Critical Complements
Buyers – Controlling the Path to the Customer
Substitutes and New Entrants – The Threat of Shifting Platform Boundaries
Chapter 3 – A Case Study on Hadoop
History and Origins
Hadoop and the Big Data Phenomenon
The Relational Database
Hadoop to the Rescue
Architectural Overview
Distributed Storage
Job Managers and Coordinators
Distributed Processing Frameworks
Scripting Engines, Libraries and SQL on Hadoop
Administration and Management
Market Overview
Strategic Factors affecting Platform Leadership within the Hadoop Ecosystem
Rivalry - Inter-network vs. Intra-network Competition
Suppliers - Securing the Upstream Value Chain
Complementors - Identifying and Securing Critical Complements
Buyers - Controlling the Path to the Customer
Substitutes and New Entrants - The Threat of Shifting Platform Boundaries
Chapter 4 - Conclusion
Areas of Further Research
Appendix
List of Figures
List of Tables
References

Introduction

For the first decade of its existence, the idea of publicly sharing source code appeared

to be fundamentally incompatible with the idea of building software for profit. Richard

Stallman, the founder of the GNU Project and a pioneer of the “free and open source software”

(“FOSS”) movement, framed the decision of developing proprietary versus open source software

as a “stark moral choice” between individual profit and the greater good [1].

Despite subsequent attempts by a multitude of individuals (including Stallman himself) to

clarify that the term ‘free software’ refers to the ability to use or modify a product freely and not

to its price, profit-seeking software firms in the late 1980s and the early 1990s largely opted for

proprietary development models in order to maximize appropriability. Firms such as Microsoft,

Oracle and SAP provided real-world evidence for the profitability of the proprietary model by

becoming some of the most valuable companies in the world. These vendors’ extraordinary

successes can be partially attributed to their ownership of proprietary industry platforms in

operating systems, database management systems and applications respectively. Auto-catalyzed

by powerful network effects, tremendously valuable business networks formed around the

technologies provided by these vendors, and these firms leveraged their exclusive ownership of

the core intellectual property to capture a disproportionally large amount of the value generated

in these ecosystems. In fact, some of these firms leveraged their dominant platform positions so

effectively that they were investigated for antitrust violations [2].

The success of these firms has captured the attention of academics and corporations

alike, and a substantial amount of effort has been put into understanding how aspiring platform

providers can replicate their successes. As a result, concepts such as ‘enveloping’, ‘coring’ and

‘tipping’ entered the business lexicon and the strategic management of ‘platform competition’

became a core concern of vendors competing in diverse technology markets ranging from mature

areas like application middleware to the nascent battlegrounds of mobile operating systems.

In many of these markets, open source technologies compete with the offerings of

commercial vendors, with a prominent example being Linux in the operating system space. The

success of these open source platforms has varied greatly, but as end customers and

complement creators become aware of the powerful bargaining position held by proprietary

platform vendors, they are seeking to increase the substitutability of the platform by embracing

open source technology. This behavior is especially common in markets where the cost of multi-

homing (concurrently adopting more than one platform) is high. In enterprise software, some

industry observers have pointed out that nearly all dominant ‘infrastructure’ technologies that

emerged in the last ten years have been open source [3]. Consequently, commercial platform

vendors are recognizing that the open source model is a powerful and occasionally necessary

means of substantially increasing the probability that a given platform gains widespread adoption.

Table 1 enumerates some recent examples of leading open source platforms in significant

markets created by corporate entities with the intent of commercial value extraction.

Given that platform technologies are increasingly being developed with the open source

model, a firm seeking to position itself as a platform leader in a given space must find

a way to operate within the unique constraints and operating context of open source

development. Pre-existing frameworks for platform leadership management, such as Cusumano

and Gawer’s “Four Levers”, were predicated on the assumption that key decisions such as the

degree of architectural openness were within the platform provider’s locus of control. These

assumptions are invalidated in the FOSS world, and firms seeking to orchestrate the trajectory of

a given platform-based ecosystem need to find other means of exerting their influence; this

research paper is motivated by that need.

Market                              Platform Technology (Commercial Founder)
Mobile Operating Systems            Android (Google), Sailfish OS (Jolla), Tizen (Samsung, Intel)
Cloud Platforms                     CloudStack (VMOps), OpenStack (Rackspace), Eucalyptus (Eucalyptus),
                                    SmartOS (Joyent)
Content Management                  Wordpress (Automattic), Drupal (Acquia), Alfresco (Alfresco)
Data Management                     MySQL (MySQL AB), MongoDB (MongoDB), BigCouch (Cloudant),
                                    Riak (Basho), Redis (VMWare, Pivotal), Impala (Cloudera),
                                    Talend (Talend)
Application Middleware / Framework  JBoss (JBoss), SpringSource (Springsource), Zend Framework (Zend)

Table 1 - Open source platforms by commercial firms

Approach and Structure

This thesis is divided into four chapters. In the first chapter, existing research on

platform strategy and open source business models is reviewed in order to establish the

vocabulary and concepts required for analyzing the topic. Those familiar with concepts around

network effects and existing literature on platform strategies are encouraged to skim through this

section.

The second chapter presents a short description of “open source platform leadership”

for the purpose of framing the discussion to follow, along with a composite framework

for understanding the strategies of open source platform vendors, inspired by Andrew Grove’s

Six Forces Model [4]. This framework is illustrated by a case study of Google’s Android

platform. This framework is also used to structure the case study on Apache Hadoop in the third

chapter. An introduction to the history of Hadoop, its relevance to the modern technology

marketplace and an overview of its architecture are then presented along with the profiles

of key ecosystem players. The ecosystem is then analyzed using the framework established in

Chapter 2.

The selection of Hadoop was motivated by the significant technological and economic

impact of this platform technology as well as the author’s personal interest in the subject matter.

The inputs into the case study analysis include secondary research data from existing works, as

well as original data drawn through direct discussions with key influencers within the industry.

The intent of the Hadoop case study is not to project the future of the marketplace, but rather to

understand the strategies of individual vendors in order to appreciate the logic behind their

behavior.

Chapter 1 – Literature Review

Network Effects

The concept of network effects originated in the telecommunication industry and was

formalized in economic models in the early 1970s by Bell Labs researcher Roland Artle,

Christian Averous and Jeffrey Rohlfs [5][6]. These economists identified a unique type of

consumption externality that occurs in the telecommunication industry known as network effects

or network externalities. They noted that when a customer chooses to ‘consume’ a networked

product by connecting to its network, that decision brings value not only to that

customer but also to all the other members of that network, who were external to that

consumption decision.

In a paper written approximately a decade later, Michael Katz and Carl Shapiro advanced

the concept to industries beyond telecommunication. The essence of the concept is that the value

of any given product is not always strictly a function of the product’s intrinsic quality but that

there are many markets beyond telecommunication where “the utility that a given user derives

from the good depends upon the number of other users who are in the same ‘network’ as is he or

she” [7]. The network referenced in the aforementioned quote does not refer only to

connections between end users, but also the connections between interdependent firms offering

compatible, complementary products and services for those end consumers. For many such

networks, existing consumers participating in the network do not directly benefit when new

consumers join the network, but they benefit indirectly as more complementary firms are

attracted to the network by these additional consumers. These new complementary firms offer

additional services or capabilities that increase the value of the network, benefiting the original

consumers. The illustrative example that the researchers used was the personal computer

market; the more users adopt a given computer platform, the more likely software producers will

develop for that platform, bringing more value to the existing users. This phenomenon is also

known as increasing returns to adoption. It is worth noting that the effect of network externality

is bidirectional in that it can catalyze both adoption as well as abandonment of a platform. Users

fleeing a platform reduce its value, increasing the relative attractiveness of alternative platforms

and thereby accelerating defection. Figure 1 provides a simple system dynamics model

illustrating this phenomenon.
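The self-reinforcing behavior described above can be sketched numerically. The following is a minimal illustration and not the model shown in Figure 1: it assumes a fixed population of users split between two rival platforms, and it assumes that the net flow of users toward a platform each period is proportional to its current size advantage. The function name, the functional form and the parameter values are hypothetical and chosen only for illustration.

```python
def simulate_share(initial_share, periods=30, sensitivity=0.5):
    """Track platform A's share of users over time.

    Assumption (illustrative only): the net flow of users into A each period
    is sensitivity * share * (1 - share) * advantage, where advantage is A's
    share minus the rival's share. The flow is positive when A is the larger
    network (adoption) and negative when it is the smaller (abandonment).
    """
    share = initial_share
    history = [share]
    for _ in range(periods):
        advantage = share - (1.0 - share)  # > 0 when A is the larger network
        share += sensitivity * share * (1.0 - share) * advantage
        share = min(1.0, max(0.0, share))  # keep the share a valid fraction
        history.append(share)
    return history


if __name__ == "__main__":
    for start in (0.55, 0.50, 0.45):
        final = simulate_share(start)[-1]
        print(f"initial share {start:.2f} -> share after 30 periods {final:.3f}")
```

Under these assumptions, a platform that starts slightly ahead drifts toward full adoption, one that starts slightly behind drifts toward abandonment, and an even split persists only as long as nothing disturbs it.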

Figure 1 – A simple system dynamics model illustrating the self-reinforcing behaviors of platform adoption and abandonment

due to network externalities (original creation)

Katz and Shapiro also produced a formal economic model of “network competition”,

which provided a basis for understanding the competitive dynamics of markets where multiple

alternative networks compete for the same customer. In the aforementioned personal computer

market, the Windows and Apple ecosystems compete effectively for the same market of personal

computer users. One major insight captured in Katz and Shapiro’s model was the fact that

consumers base their adoption or purchase decision on the expected size of a given network and

not just the current size of the network [7].

The above phenomena combine to create demand-side economies of scale, resulting

in natural market equilibriums where the dominant winner takes most, if not all, of industry

market share [8]–[10]. This autocatalytic nature of network effects is the reason that firms

compete for the position of being the provider of industry platforms.

[Figure 1 diagram. Labels: Platform Participants; Adoption; Abandonment; Network Effect (+), reinforcing loop (R); Network Effect (-), reinforcing loop (R).]

Product vs. Industry Platforms

The term “platform” is a heavily overloaded word in the context of product development.

The term is often used to reference common componentry that is used to build a portfolio of

related products (or “product family”). The motivation behind the creation of such platforms is

varied, but generally revolves around the idea that efficiencies can be gained by sharing the

common costs, risks and benefits of development and manufacturing across multiple products.

Examples of such platforms can be found abundantly in the automobile industry, where the vast

majority of vendors offer a large number of product variants based on a much smaller number of

base platforms. For the purpose of disambiguation, researchers refer to this concept as “product

platform”.

According to de Weck, Suh and Chang, the design of a product platform is a firm-internal

optimization problem; a firm must search through the space of platform design possibilities in

order to identify a design that maximizes the cost savings of component reuse, while

simultaneously minimizing the compromises associated with component sharing [11]. In

contrast, the search space for the design of an industry platform is far larger by definition, and

the analysis of such platforms is not bounded to a single firm. Industry platforms are the

technological infrastructure that allows independently evolving goods and services from different

firms to be connected together into an interdependent system that creates value [12]. This thesis

is directed at studying the strategies of firms attempting to develop such industry platforms and

consequently all subsequent references to “platforms” are made in this vein.

Two-Sided Markets

Platform-mediated technology ecosystems are often modeled as two-sided markets. On

one side of the platform sit customers (e.g. Personal Computer users), who are trying to

consume the combined solution that consists of the platform (e.g. Windows) and complementary

products (e.g. Application Software) offered by the suppliers residing on the other side. This

model of an ecosystem allows for more precise characterization of the different types of network

externalities that occur within a platform-based ecosystem, allowing scholars to differentiate between

“same-side” and “cross-side” network effects. “Cross-side” effects are generally

reinforcing. The value of the platform increases for a consumer when there are additional

complementors that join the network, and vice versa. In contrast, “same-side” effects are

typically reinforcing on the consumer side and balancing on the complementors’ side.

Additional consumers increase the viability of the platform and therefore its value to other

consumers. However, additional complement suppliers increase the level of competition within

the platform and diminish its value for other suppliers. Figure 2 provides a simple system

dynamics model illustrating these different forces.

Figure 2 - A simple system dynamics model illustrating the two different types of network effect at work in a two-sided platform

(original creation)
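The direction of each of these effects can be made concrete with a small sketch. The linear forms and coefficients below are hypothetical and serve only to show the signs of the same-side and cross-side effects described above; they are not taken from the model in Figure 2.

```python
def customer_value(customers, complementors, same_side=0.1, cross_side=0.5):
    """Value of the platform to one customer (hypothetical linear form).

    Both terms are positive: more customers (same-side) and more
    complementors (cross-side) each make the platform more valuable.
    """
    return same_side * customers + cross_side * complementors


def complementor_value(customers, complementors, cross_side=0.5, crowding=0.3):
    """Value of the platform to one complementor (hypothetical linear form).

    More customers raise the value (cross-side, reinforcing), while more
    rival complementors lower it (same-side, balancing).
    """
    return cross_side * customers - crowding * complementors


if __name__ == "__main__":
    base = complementor_value(customers=100, complementors=10)
    print("one more customer:     ", complementor_value(101, 10) - base)
    print("one more complementor: ", complementor_value(100, 11) - base)
```

In this sketch an extra customer raises a complementor's payoff and an extra rival complementor lowers it, which mirrors the reinforcing and balancing loops in the text.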

Due to cross-side network effects, vendors on a given platform may welcome the

entrance of additional competition in the form of other complementary vendors. This can occur

if the entrance of additional vendors increases the viability and attractiveness of the platform and

these gains sufficiently offset the effects of additional competition. This is especially true in

cases where there are barriers preventing complementors from “multi-homing” on multiple

platforms and the intensity of network-level competition between platforms exceeds that of

individual complementors. As an illustrative example, software vendors invested in developing

natively on Blackberry’s OS10 mobile operating system are likely to welcome additional

vendors to develop apps for that platform, as a more vibrant app ecosystem is expected to offset

additional competition that they would face within the Blackberry App Store. Consistent with

this observation, Kevin Boudreau found that an increase in the variety of software application

producers within a mobile application ecosystem “increases innovation incentives” due to

network effects [13]. This phenomenon echoes Katz and Shapiro’s earlier research, which

showed that under network competition, a monopolist complement supplier within a given

network would counter-intuitively benefit from the entry of additional complement suppliers.

However, Boudreau noted that an increase in similar types of application producers actually

diminishes the motivation of developers as they become “crowded out” of the market.

The two-sided model also illustrates that platform providers must find ways of attracting

participants to both sides of the platform in order for the ecosystem to become viable. The study

of how platform vendors manage this has been a subject of great interest. One strategy is cross-

side subsidies. A two-sided platform provider may focus its monetization strategy on a single

side and opt to offer “free” or heavily subsidized goods and services on the other. Van Alstyne

and Parker showed that by lowering prices on one side of the network, platform providers can

change the shape of the demand curve on the other side, resulting in a net increase in overall firm

profits. As each “side” of the platform represents a market in its own right, this results in an

interesting phenomenon where an effective monopolist in a given market may volunteer to lower

its price below its marginal cost in order to maximize profits. For example, video game platform

vendors like Sony or Microsoft often choose to offer their software development toolkits to video

game producers for free (or close to free), despite the fact that they are effectively the only

supplier for that essential ingredient to video game production [14]. Conversely, price increases

on one side of the network, even in a price-inelastic market, may have the counter-intuitive

effect of lowering organizational profit due to their negative impact on demand on the other side;

the cross-side implications of price changes in a two-sided network make pricing a complicated

matter.
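A small numerical sketch can illustrate why a below-cost price on one side may raise total profit. It is not the model from the cited work: it assumes zero marginal cost, linear demand on each side, and a one-way spillover in which every participating developer adds a fixed amount of consumer demand. All function names, parameters and values are hypothetical.

```python
def profit(p_con, p_dev, a=10.0, b=2.0, spillover=0.5):
    """Platform profit at zero marginal cost under hypothetical linear demand.

    p_con is the price charged to consumers and p_dev the price charged to
    developers; a negative p_dev means the platform pays (subsidizes) developers.
    """
    developers = max(0.0, b - p_dev)                  # developers who join
    consumers = max(0.0, a - p_con) + spillover * developers
    return p_con * consumers + p_dev * developers


def best_prices(allow_subsidy):
    """Grid-search both prices in steps of 0.1 and return the most profitable pair."""
    lowest_dev_step = -20 if allow_subsidy else 0
    candidates = [(con_step / 10, dev_step / 10)
                  for con_step in range(0, 101)
                  for dev_step in range(lowest_dev_step, 21)]
    return max(candidates, key=lambda prices: profit(*prices))


if __name__ == "__main__":
    for allow in (False, True):
        p_con, p_dev = best_prices(allow)
        print(f"subsidy allowed={allow}: consumer price {p_con}, "
              f"developer price {p_dev}, profit {profit(p_con, p_dev):.2f}")
```

With these illustrative parameters, the search settles on a developer price below zero when subsidies are allowed, and total profit is higher than when the developer price is constrained to be non-negative. The gain comes entirely from the extra consumer demand that the additional developers generate.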

Topology of Platform Roles and Openness in a Platform-Mediated Network

Eisenmann, Parker and Van Alstyne identified four distinct roles that network participants

can play in a platform-mediated network [15]. Beyond identifying “demand-side”

and “supply-side” platform users, which refer to consumers and complement providers

respectively, the trio further differentiated between “platform providers” and “platform sponsors”

(Figure 3). The platform provider acts as the “primary point of contact” for users on both sides

of the platform while the platform sponsor is responsible for determining which parties may

participate in the network. For example, banks such as Citi or Barclays act as platform providers

for credit payment networks, whereas Visa itself acts as the platform sponsor. The trio asserted

that platforms differ in their degree of openness to these different roles. Based on the

categorization provided by this group, the “sponsor” role of Linux is occupied by the open

source community and is therefore highly open. Table 2 enumerates a few select computing

platforms and the openness of their various platform roles as identified by Eisenmann, Parker

and Van Alstyne.

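Platform Role | Linux | Windows | Macintosh | iPhone
------------------------- | ----- | ------- | --------- | ------
Demand-side Platform User | Open | Open | Open | Open
Supply-side Platform User | Open | Open | Open | Closed
Platform Provider | Open | Open | Closed | Closed
Platform Sponsor | Open | Closed | Closed | Closed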

Table 2 - Comparison of openness by role in platform-mediated networks. Reproduced. [15]

Figure 3 – Roles and Relationships in a Platform-Mediated Network according to Parker, Eisenmann and Van Alstyne

(Reproduction of Figure 2) [15]. An open source platform market can be viewed as a market where the role of the platform

sponsor is played by an open source community.

Platform Leadership and the “Four Levers” Framework

During the explosive growth of the personal computer (PC) market in the nineties, Microsoft

and Intel jointly led the platform powering that market. Annabelle Gawer hypothesized in her

doctoral thesis that Intel’s continued success in a highly fragmented and vertically disintegrated

market stemmed from a highly evolved practice of fostering and managing the creation of

complementary products in the personal computer ecosystem. This sophisticated platform

management practice enabled Intel to establish itself as one of the primary beneficiaries of the

growth in the PC ecosystem. She observed that in a rapidly changing technological landscape

like that of the personal computer market, platform providers cannot simply seek to leverage

cross-side network effects by maximizing the supply of complementary products, but must rather

ensure that complementors are “innovating in ways that are favorable” to their platform. For this

reason, Gawer defined platform leadership as “a firm’s ability to influence the development of a

large number of complementary products by almost all other firms in their industry” [16]. This

definition of platform leadership is used throughout this paper. It is worth noting that this

definition is not restricted to a specific platform “role”; a platform leader can play any of the four

roles identified by Eisenmann, Van Alstyne and Parker.

In 2002, Gawer further validated and elaborated this work with her thesis supervisor

Michael Cusumano, by categorizing Intel’s activities into four aspects of platform leadership

management. The pair called this framework “the Four Levers of Platform Leadership” [10]. An

overview of the four levers identified is presented in the sections below.

Lever 1: The Scope of the Firm

A platform firm must continuously decide which portions of the overall system to deliver

itself and which to leave to complementary vendors in the ecosystem. This is a continuous

process, as the platform vendor must assess its own capabilities and the dynamics of the

marketplace (including the behavior of platform competitors) and adjust its approach. As an

example, Microsoft has always been willing to directly compete with its software application

partners due to its immense software development capability, but had left hardware largely to its

partners. However, the company made a drastic change to this approach in 2013 by agreeing to

acquire Nokia’s mobile phone business for US$7.2 billion. Understanding the motivations and

decision-making process of this strategic change is beyond the scope of this paper, but it

illustrates the dynamic nature of firm-scope management.

Lever 2: Product Technology

A platform leader’s decisions regarding the design of its technology’s architecture,

interfaces and intellectual property management significantly affect the nature of innovation that

participants of its ecosystem are able to contribute. A modular architecture enables contributions

from complementors and is generally preferred over “integrated” architectures with low

substitutability of components from the perspective of innovation enablement. However,

platform firms must determine the openness of their platform interfaces, balancing the

competitive advantages offered by exclusive proprietary access to ‘core’ platform functionality

with the need to encourage complementors. A similar balancing act occurs with regard to the

management of intellectual property. Generally speaking, the more open a platform leader is

with its intellectual property, the more vibrant its ecosystem. As with the previous lever, the

management of product technology is also a continuous process, though it is worth noting that it

is typically more difficult for a firm to restrict an open policy than the converse.

Lever 3: External Relationships

Beyond making internal decisions regarding the scope and nature of its technologies,

platform leaders must also orchestrate the actions of complementary vendors in a manner that is

favorable to the platform. Gawer and Cusumano found that Intel was especially mature in this

aspect of platform leadership, acquiring organizational capabilities and making substantial

investments to build consensus and control of platform decision making, as well as encouraging

the right balance of collaboration and competition between complementors. As an example,

Intel shared with Gawer that it employs a unique strategy when attempting to drive the definition

of new interfaces for the PC platform. It attempts to create “momentum” behind new interface

standards by initiating the design process with a small interest group of the most influential

players within the ecosystem before involving the larger ecosystem. This approach helps to

relieve the “design by committee” challenges of completely open and democratic processes

while maintaining the benefits of having external contribution and validation.

Lever 4: Internal Organization

The final lever Cusumano and Gawer identified pertains to the internal organizational

structure and processes that a platform leader puts in place to manage the inherent tension

between collaboration and competition. At Intel, this began with the identification and

differentiation of the company’s competing objectives. “Job 1” refers to the core

organizational objective of selling more microprocessors, “Job 2” to the desire to compete

directly in complementary businesses, and “Job 3” to the task of growing new lines of

business that may not be directly related to the core microprocessor business. By

acknowledging the inherent conflicts that these objectives create and by providing a vocabulary

for discussing them, Intel enabled its management team to manage this tension proactively and

consciously. Beyond this, Intel also created organizational groups dedicated to these

different objectives. This not only focused the internal groups but also created a level of

organizational separation that alleviated the conflicts of interest that external partners perceived.

Platform Establishment and Displacement

While Gawer and Cusumano focused their studies on the activities required to manage

and sustain the position of a platform leader, others focused their attention on the process of

establishing and displacing industry platforms. The early works of Rohlfs had established that

even potentially viable networks will naturally be attracted to a stable equilibrium of zero

participants unless a critical mass of participants is reached [5]. Evans and Schmalensee

extended the model to two-sided platform businesses and illustrated the challenge of reaching the

critical threshold on both sides of the platform. Depending on whether it is easier for potential

participants on a given side to join the platform or for existing participants to drop off, firms

aiming to launch two-sided platforms may find themselves needing to subsidize

participation on both sides of their network in order to reach critical mass. The pair also found

that design decisions that reduce the resistance to participation on both sides of the network not

only lower the critical mass required, but also increase the equilibrium adoption of the platform

once established.
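The critical-mass dynamic described by Rohlfs can be sketched with a stylized one-sided adoption model. The uniform distribution of user types, the values of `VALUE` and `PRICE`, and the function names are assumptions made purely for illustration; they are not taken from [5] or from Evans and Schmalensee's two-sided extension.

```python
# Stylized one-sided network adoption dynamic (illustrative assumptions only).
# Users have types theta uniformly distributed on [0, 1]. A user of type theta
# participates when theta * VALUE * f >= PRICE, where f is the current share
# of users on the network.

VALUE = 10.0   # assumed value of a fully adopted network to the keenest user
PRICE = 1.6    # assumed price (or effort cost) of participation


def next_share(f):
    """Share of users willing to participate given the current share f."""
    if f <= 0.0:
        return 0.0
    threshold = PRICE / (VALUE * f)   # lowest type still willing to join
    return max(0.0, 1.0 - threshold)


def simulate(initial_share, rounds=200):
    """Iterate the adoption dynamic from an initially seeded share."""
    f = initial_share
    for _ in range(rounds):
        f = next_share(f)
    return f


def critical_mass():
    """Smaller root of VALUE * f * (1 - f) = PRICE, the unstable tipping point.

    Returns None when the price is too high for any positive equilibrium.
    """
    disc = 1.0 - 4.0 * PRICE / VALUE
    if disc < 0:
        return None
    return (1.0 - disc ** 0.5) / 2.0


if __name__ == "__main__":
    print("tipping point:", critical_mass())
    print("seeded below it:", simulate(0.15))
    print("seeded above it:", simulate(0.25))
```

Under these assumptions the tipping point is a 20% share: a network seeded at 15% unravels to zero participants, while one seeded at 25% grows toward a stable share of about 80%. Lowering `PRICE`, for example through a subsidy or a design change that reduces the cost of joining, lowers the tipping point and raises the stable share, in line with the observation above.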

The topic of platform displacement closely relates to the research on architectural

innovation and its impact on incumbent leaders completed by Henderson and Clark [17]. Prior to

their research, the prevailing understanding was that while all innovations create opportunity for

new entrants to enter the market and displace the incumbent, only ‘radical’ innovations that

eradicate the technological competencies of incumbents create conditions favoring the new

entrant. In studying the then-nascent semiconductor industry, Henderson and Clark observed

that evaluating the disruptive nature of an innovation based on the degree of technological

change was inadequate, as there were numerous occasions where seemingly incremental

technological changes resulted in the displacement of industry leaders. Instead, Henderson and

Clark found that architectural innovations – innovations that impacted the manner in which

components of a product system connected together – tended to be significantly more disruptive

to incumbent firms than technological innovations at the component level, regardless of the

degree of ‘radicalness’. The pair hypothesized that architectural innovations tend to be more

disruptive to the established firm as architectural knowledge tends to be captured in a firm’s

“structure and information-processing processes”, which is difficult for a successful firm to

recognize and address. Similarly, platform displacement is also more likely when an innovation

allows for a reconfiguration of how the participants interact with one another.

An alternative path to platform displacement identified by Eisenmann, Parker and Van

Alstyne is the strategy of platform envelopment. Envelopment is interesting as it “provides a

mechanism for platform leadership change that does not require breakthrough innovation or

Schumpeterian creative destruction” [18]. The authors found that while network effects make it

difficult for a new platform entrant to displace an established platform, the incumbent may be

displaced through envelopment when the capabilities of its platform become bundled as a part of

an enveloping platform serving an adjacent market. In other words, the new entrant or attacker

can expand the scope of the competition and leverage economies of scope and scale on the

demand or supply side in order to create leverage for displacing an incumbent vendor.

Eisenmann, Parker and Van Alstyne cite the example of Microsoft’s successful “attack” on

RealNetworks’ dominant media streaming platform in the late 1990s by bundling streaming

services into its Windows NT server offering. The authors established a taxonomy for

envelopment attacks consisting of three major categories: conglomeration, intermodal, and

foreclosure (Table 3). In reaction to such attacks, platform leaders can seek to match the new

bundle, pursue legal protection, or exit the market if they cannot match the new entrant on the new

basis of competition.

Attack Type | Description | Example
----------- | ----------- | -------
Conglomeration | The attacker joins functionally unrelated platforms together to create a new bundle in order to leverage economies of scope and scale to its competitive advantage. | Cable television and telephone firms reciprocally bundling TV, phone and internet services to attack each other.
Intermodal | The attacker bundles two weak substitutes that deliver the same functionality through different modalities, offering a single composite platform that negates the user’s need to choose between modes. | Netflix bundling DVD-by-mail and streaming delivery as a single product.
Foreclosure | The attacker bundles two complementary capabilities together to create a synergistic offering that is superior to the individual products. | LinkedIn’s bundling of social network and job matching into a single platform.

Table 3 - Taxonomy of Envelopment Attacks

Open Source Software

Most scholars trace the origins of open source to the Free Software Movement started by

Richard Stallman when he launched the GNU Project [1]. Responding to a perceived increase in

the limitations imposed by proprietary software vendors, the movement sought to restore the four

“essential freedoms” of computer users through the creation of “free software”. According to

Stallman, these freedoms are:

0. The freedom to run the program as you wish, for any purpose.

1. The freedom to study how the program works, and change it so it does your

computing as you wish. Access to the source code is a precondition for this.

2. The freedom to redistribute copies so you can help your neighbor.

3. The freedom to distribute copies of your modified versions to others. By doing this

you can give the whole community a chance to benefit from your changes. Access to

the source code is a precondition for this.

In the beginning of the movement’s history, the development of such software was

primarily of interest to academics and sophisticated individuals whose freedoms were infringed.

Despite Stallman’s continual reminders to think of free software as “‘free speech,’ not ‘free

beer’”, few commercial enterprises engaged in the creation of “free” software. It was widely

perceived that the act of sharing intellectual property was counterproductive to the objective of

profit extraction. In his widely cited paper “Profiting from Innovation”, David Teece argues that

innovators are best positioned to capture the value of their inventions if their intellectual property

is legally protected or “the nature of the product is such that trade secrets effectively deny

imitators access to the relevant knowledge” [19]; this condition is known as a tight

appropriability regime. The act of sharing source code increases access to potentially

differentiating knowledge, and therefore reduces appropriability. Given that the majority of

profit-seeking software firms viewed themselves as innovators trying to capitalize on their

creations, such firms tended to view the “free software” world with skepticism.

Commercial Interest in Community-driven Development

Commercial interest in community-driven development began to shift in the mid-nineties

with the emergence of Linux. University of Helsinki student Linus Torvalds began developing

Linux in 1991 as a personal project to maximize the capabilities of his specific hardware at the

time. The popularity of the project exploded shortly after Torvalds shared his work and adopted

the GNU General Public License in 1992 (the original license that shipped with Linux explicitly

forbade commercialization). Today, Linux not only powers the personal computers of

dedicated hobbyists like Torvalds, but also competes successfully against proprietary

commercial systems in devices ranging from mobile handhelds to the most powerful

supercomputers in the world (Figure 4).

Figure 4 - Linux (in Orange) market share in various computing segments [20], [21]. Panels: Worldwide Server Operating Environments, 2012; Mobile Operating Environment Shipments, 2012; OS of the World's Top 500 Supercomputers, 2014.

Commercial interest in Linux and open source was initially motivated by the surprising quality

and effectiveness of the community-driven development model. As DiBona, Ockman and Stone

observed, the Linux volunteer community “produced a piece of software that would otherwise

require the might and resources of someone like Microsoft to create” [22]. Aided by the rapid

advance in internet collaboration technologies, the open source model managed to reconfigure

the requirements of production so substantially that it eliminated what had previously been

regarded as a natural monopoly.

In his popular 1999 essay “The Cathedral and the Bazaar”, Eric Raymond observed that

the style employed in the development of Linux was unique even within the open

source community and hypothesized that this development style was a major reason for the

project’s success. Chief among Raymond’s findings was that Linus Torvalds focused his

attention on keeping his contributors engaged and worked to ensure constant activity within the

community (i.e. operating a bazaar) rather than enforcing consistency and architectural elegance

(i.e. building a cathedral). Moreover, Raymond asserted that the project’s frequent release cycle

and large, active user base assure the project’s quality more effectively than traditional software

engineering practices.

Raymond attributed much of Linux’s success to the rapid progress enabled by this

“Bazaar” model and attempted to validate the effectiveness of this approach by consciously

developing his own project in the same manner. Raymond’s findings influenced Netscape to

release the code for its then-popular web browser Communicator under an open source license

with the hope of leveraging the development capabilities of the open source community as a

competitive advantage over Microsoft. Unfortunately, Netscape’s effort failed to garner

sufficient attention from the community, and the company was acquired by AOL before

eventually being disbanded. Raymond attributed Netscape’s failure to engage the community to

its lackluster efforts in removing the barriers to entry for contributors. As an example,

contributors to the product needed a license for a third-party UI library (Motif) just to work on

the product during its first year, which created a significant barrier to participation. While

Netscape failed to leverage the open source community as an engine for commercial success, the

challenges of their initial open source project led to the creation of the Mozilla project, which did

manage to attract significant community attention and resulted in the popular browser Mozilla

Firefox.

Despite the critical role of the community development model in progressing a number of

foundational technologies in the modern technology landscape, commercial interest in free and

open source development remained modest until the emergence of the Open Source Initiative

(OSI) and the highly publicized growth of Red Hat, Inc. in the late 1990s. Up until the formation of

the OSI, the commercial software development world did not delineate between the moral and

philosophical ideals of “Free Software” and the more pragmatic motivations behind community-

driven development. Founded by Eric Raymond and Bruce Perens in late February 1998, the

Open Source Initiative was created shortly after Netscape’s release of its proprietary code earlier

that month. The pair wanted to use the highly publicized event to advocate for “the superiority

of an open development model” [23]. The term “open source” was coined at that time in order to

avoid “the philosophically- and politically-focused label of ‘free software’” [23]. The OSI

created a pragmatic definition of “open source” software free of the constraints and judgmental

ideology of the Free Software Foundation, instead focusing on practically capturing the

requirements of licenses that should be considered “open” (Table 4). In particular, the definition

of open source software explicitly enables the possibility of deriving “paid” software from open

source software, which is forbidden for “Free Software”.

Criteria | Description
-------- | -----------
Free Redistribution | The license must allow for free or paid redistribution of the software without royalties or other fees.
Source Code | Source code for the software must be reasonably available and the license must allow for its redistribution.
Derived Works | “The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.”
Integrity of the Author’s Source Code | “The license must explicitly permit distribution of software built from modified source code.”
No Discrimination Against Persons or Groups | “The license must not discriminate against any person or group of persons.”
No Discrimination Against Fields of Endeavor | “The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.”
Distribution of License | “The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.”
License Must Not Be Specific to a Product | “The rights attached to the program must not depend on the program's being part of a particular software distribution.”
License Must Not Restrict Other Software | “The license must not place restrictions on other software that is distributed along with the licensed software.”
License Must Be Technology-Neutral | “No provision of the license may be predicated on any individual technology or style of interface.”

Table 4 – A description of the ten criteria of open source software as defined by the Open Source Initiative. Modified and

Adapted from http://opensource.org/osd

This “rebranding” effort and the pragmatic, business-case driven approach of the OSI can

partially be credited for the increased interest in commercial open source development over the

past decade, though the success of Linux (and Red Hat in particular) likely also played a pivotal

role. The study of open source business models within the research community has

correspondingly increased.

Related works on Commercial Open Source

In his 2005 paper, Sandeep Krishnamurthy attempted to categorize the business models

of open source firms into four distinct categories: (1) software distributors, (2) software

producers following the GPL model (firms leveraging open source components to create

derivative products that were also open source), (3) software producers not following the GPL

model (firms leveraging open source components to create derivative products that were

proprietary), and (4) third-party service providers [24]. Krishnamurthy further summarized that

the primary appeal of open source products to corporations stems from the perception of superior

performance, lowered adoption risk and lower total cost of ownership. Finally, he identified

community support, presence of proprietary or open source competition, relative competitiveness

and marketing as the key factors affecting the profitability of open source firms.

In a Research Policy article on the topic of “melding proprietary and open source

platform strategies”, Joel West analyzed three major software platform vendors’ explorations

with community-driven development in order to understand the strategies and motivations for

participation in open source [25]. West observed that the modern computer industry evolved

from a vertically integrated market dominated by vendors who held end-to-end control of the

“stack” to a market comprised of horizontally dominant platform firms, exemplified by

Microsoft and Intel. Focusing on the operating system as the platform of study, he observed that

industry interest in open source was motivated by a desire of the leading contenders in the

industry to challenge Microsoft’s dominant Windows platform. West chronicled the differing

efforts of IBM, Apple and Sun Microsystems as they engaged the open source movement and

identified that their participation in open source was motivated by different intentions. Table 5

summarizes West’s discussions on the motivations of these different vendors.

Vendor | Open Source Projects | Intentions
------ | -------------------- | ----------
Apple | FreeBSD, OpenDarwin | Apple’s primary intention for participating in open source was to leverage and reuse some of the market-leading components that were being built in the open source community (in particular the FreeBSD project).
IBM | Apache, Eclipse, Linux | As a part of shifting its business focus towards applications and application platforms, IBM tried to reduce the control that Microsoft held as the platform leader by pushing the computing industry towards open standards while positioning itself as the leading integrator of technology. Its involvement in projects like Apache and Eclipse was a means of accelerating the development of its proprietary products.
Sun Microsystems | Java, OpenOffice, Linux | Sun’s motivation for engaging in open source was primarily to leverage the “horsepower” of the open source community to accelerate the development of alternate platforms to challenge Microsoft’s leadership in the market of application frameworks (i.e. .NET vs. Java) and office productivity (i.e. Office vs. OpenOffice).

Table 5 - Apple, IBM and Sun Microsystems’ involvement in open source and their motivations for participating. Summarized

from the contents of [25].

West summarizes that there are two broad approaches to blending proprietary and open

source strategies. Firstly, a firm can choose to concede the more “commoditized” layers of the

platform to the open source community in order to focus their investments in differentiating

layers. Apple’s decision to adopt a variant of the open source FreeBSD kernel for its operating

system while continuing to develop a unique user interface shell exemplifies this approach. The

second approach is to disclose technologies and intellectual property to promote indirect network

effects. Both IBM’s and Sun’s experiments with open source reflect this latter approach.

West’s observations regarding IBM’s motivations behind its open source strategy are

supported by Capek, Frank et al. in their 2005 article in the firm’s own IBM Systems Journal

entitled “A history of IBM’s open source involvement and strategy” [26]. The paper, written by

IBM’s own employees who were involved in the various efforts, highlighted IBM’s recognition

that the open source movement was a business reality within its industry and recalled its efforts in

harnessing the movement for its own strategic intentions. The authors summarize IBM’s

strategic intentions for involvement in open source as: (1) encouraging the use of “open source

implementations of open standard”, (2) fostering greater variety and choice and (3) enhancing

IBM’s mindshare.

Nicholas Economides and Evangelos Katsamakas developed economic models of

the competition between proprietary and open source platforms in their 2006 article in

Management Science [27]. The pair observed that a proprietary platform vendor in a multi-sided

market could find it profitable to set a price below its marginal cost on one side in order to

maximize profits on all sides. The models found that markets supported by proprietary platforms

are more profitable than those supported by open source platforms, though the variety of

available complements is generally higher in open source platforms. However, this finding was

made with the assumption that open source platform profits (the profits derived from the selling

of the platform itself) are always zero.

Chapter 2 – Strategic Considerations for Open Source Leadership

What does it mean to be an “open source platform leader”? A naïve definition would simply

combine the literal meanings of the phrase’s two halves: a “platform leader”

that leverages “open source” technologies. However, the use of open source technology is so

prevalent today that there are scarcely any commercial software firms that do not leverage open

source in some form, and consequently this definition is too broad to be useful. Even Microsoft,

the canonical proprietary platform vendor, utilized open source libraries in the delivery of its

Windows NT operating system [28]. Similarly, Cisco Systems delivers a variety of hardware

devices that run embedded versions of Linux and is generally considered a “platform leader” by

scholars like Cusumano and Gawer. Clearly, the moniker of “open source platform leader”

is ill-fitting for firms that epitomize proprietary platform leadership.

A more restrictive definition of the term “open source platform leader” would be “a

leading provider of open source platforms”. In other words, candidacy for “open source platform

leadership” is restricted to platform providers who specifically create open source products. This

definition roughly aligns with Krishnamurthy’s second category of commercial software

vendors, which he called “software producers following the GPL model”, broadened to include

other types of open source licenses, but with the additional constraint of the software being a

platform product. At first glance, this appears to be a more appropriate definition, as it would

remove the above counterexamples such as Microsoft or Cisco from candidacy without

disqualifying obvious candidates like Red Hat. However, this definition is actually too narrow as

it excludes many firms that opt to utilize open source strategically in order to drive the adoption

of their platform products, even proprietary ones. One excellent example of such a firm is IBM

and its founding of the Eclipse open source project in the software application platform space.

IBM and Eclipse

In 2001, IBM open sourced its Eclipse technology and founded the Eclipse consortium in

order to drive the adoption of the Java-based application platform and integrated development

environment (IDE) as an alternative to Microsoft’s .NET and Visual Studio stack [29]. After

utilizing the resources it had obtained from its acquisition of Object Technology International

(OTI) to develop an internal product platform to improve the development efficiency and

consistency of its own application development, IBM decided to leverage OTI’s technology to

drive the adoption of its WebSphere suite of application platform technologies. Recognizing it

was a latecomer to the application platform market competing against a powerful incumbent with

an established ecosystem in Microsoft, IBM theorized that “in order to build momentum around

[Eclipse] and to get more vendors to build their products on top of it, [it] had to make it open

source”.

IBM’s hypothesis appeared to be correct, as Eclipse became one of the most popular

open source projects in the world and established itself as the dominant player within the market

of Java Integrated Development Environments. Only four years after IBM made Eclipse open

source, a survey conducted by the publishers of SD Times (a popular software development trade

magazine at the time) found that approximately two-thirds of all its readers utilized Eclipse in

their workplace (Figure 5), followed by IBM’s proprietary WebSphere Studio Application

Developer (WSAD) at 21%. Eclipse’s sizable user base and vibrant ecosystem of complementary

vendors provided a substantial competitive advantage to WSAD.

Figure 5 - Results from the "Java Use and Awareness Study" from BZ Research, 2005

[Chart omitted. Vertical axis: 0% to 70%; horizontal axis: Jan-02 to Jan-05. Series: Eclipse, IBM WebSphere Studio App. Developer*, Borland JBuilder, Sun NetBeans, Oracle JDeveloper, JetBrains IntelliJ IDEA*, BEA WebLogic Workshop, Microsoft Visual J++.]

Chapter 2 – Strategic Considerations for Open Source Leadership

The specific license that IBM put in place when establishing Eclipse was a derivative of

IBM’s Common Public License. The license ensured that IBM was able to commercialize

Eclipse technology without having to release the derivative product as open source. As a

consequence, IBM was able to leverage Eclipse’s success to bootstrap the ecosystem of its own

proprietary WebSphere Studio Application Developer (WSAD) platform product, which

provided additional functionality while being fully compatible with complements produced by

the Eclipse ecosystem.

Although IBM spun off the responsibility of managing Eclipse to an independent non-profit

organization, the Eclipse Foundation, it remains a major force driving the continued

evolution of the Eclipse platform. This is evident in the fact that IBM remains the single

largest contributor to the project (Figure 6). Given IBM’s ability to influence the

activities of others to enhance its own offering, it clearly exhibits the qualities of a “platform

leader” in this context. Moreover, IBM’s strategic and intentional use of the open source model

as a means of bootstrapping a proxy ecosystem for its own proprietary offering suggests that it

should be considered an open source platform leader.

Figure 6 - Eclipse Project Committers by Company (excluding committers with no or unknown corporate affiliation). Taken from http://dash.eclipse.org on August 4th, 2014

[Chart omitted: "Eclipse Project Committers by Company". Labeled segments: IBM (32%), Oracle, itemis AG.]

The Definition of Open Source Leadership

IBM’s utilization of the open source community to accelerate the adoption of its platform

technology against an incumbent platform rival is not a particularly unique or novel tactic. As

mentioned during the introduction to this paper, Google’s Android, Samsung’s Tizen and

Nokia’s Maemo are all open source mobile operating system efforts that were looking to displace

Apple’s dominant iOS platform. It is unlikely that these commercial firms opted to open source

their platform technology out of altruistic charity. Rather, they adopted the open source model as

a strategy intended to accelerate the development of a critical mass of users and complementors

in markets marked by network-level competition. However, IBM’s ability to use open source to

improve the competitiveness of its proprietary software product is illustrative in demonstrating

that open source platform leadership is not limited to a specific type of software license or even

business model. It is entirely possible for a proprietary software vendor or even a service

provider to be an “open source platform leader”.

As mentioned in the literature review, a “platform leader” as defined by Cusumano and

Gawer is a firm that is able to influence the activities of other industry participants in order to

create complementary products and solutions that enhance its offering. This definition has broad

applicability as the leader can play any number of roles within the ecosystem so long as it

“drive[s] industry wide innovation for an evolving system of separately developed pieces of

technology” [30]. Building upon this definition, “open source platform leadership” is therefore

best described as “a firm’s ability to influence the development of a large number of

complementary products” through engagement in open source. The unique characteristic of

open source platform leaders is that they participate in open source development with the

specific purpose and intention of gaining a platform advantage.

While the use of an open source model may help a platform vendor accelerate the

widespread adoption of its platform, the model also comes with its own unique set of challenges.

These challenges are systematically analyzed in the sections to follow in order to provide a

holistic framework for understanding the strategic considerations for open source leadership.

One could argue that it is more important for open source platform leaders to be forward-

looking in the management of their platforms when compared to their proprietary counterparts.

The additional involvement of community contributors and the irreversible nature of “open

sourcing” intellectual property mean that decisions that are relatively easy for proprietary firms

to make require more planning and lead time for an open source platform leader to effect. For

example, despite IBM’s significant investments and participation in Eclipse, changes to the core

platform of Eclipse are governed by the independent Eclipse Foundation (which IBM helped

establish) consisting of members of the Eclipse ecosystem, many of whom compete with IBM in

different markets. This governance structure significantly increases the friction and latency of

manipulating Lever 2 (product technology) and consequently IBM must be more proactive and

forward-looking if it wishes to deploy that lever effectively.

In order for a platform leader to proactively manage its platform strategy, it must have a

holistic understanding of the forces that shape the dynamics of competition in its given market

and establish a strategy to manage these forces. The formulation of such a strategy requires a

hypothesis on how these forces will shift as the platform evolves over time. A traditional

analysis framework used for understanding the forces that affect a given market is Porter’s Five

Forces model of industry analysis (Figure 7) [31].

Figure 7 - A reproduction of Porter's Five Forces Model. The horizontal forces represent the critical factors arising from the

value chain of the market, while the vertical forces and the center circle represent competitive forces.

While Porter’s model provides a useful outline for analysis, it was created with the

intention of analyzing the overall attractiveness of any given industry and does not specifically

consider the unique dynamics of platform-driven markets. In particular, Porter did not consider

the critical role that platform complementors play in affecting the competitive balance within a

platform market. To address this, industry practitioners such as Intel’s Andrew Grove have

augmented Porter’s framework with an additional factor capturing the influence of partners

(Figure 8).

Figure 8 - Six Forces Diagram, taken from Only the Paranoid Survive [4]

Grove’s variant serves as useful scaffolding for the discussions of different factors that

open source platform leaders need to understand and manage. In the sections to follow, candidate

factors are identified and discussed, organized according to the categories of

Grove’s Six Forces model. For the purpose of this discussion, the considerations brought on by

the emergence of new entrants and substitutes are evaluated together. Table 6 presents a

summary of these different considerations.

An overview of Google’s Android project is presented preceding this discussion to serve

as a lighthouse reference for describing these different factors. The relevance of these identified

factors to the actual behavior of aspiring open source platform leaders is further validated in the

case study on Hadoop in the chapter to follow.

| Considerations | Description |
|---|---|
| Rivalry | The relative intensity of inter-network and intra-network competition shapes and governs the behavior of the open source platform vendor. Vendors must continually adjust their behavior as this will change over time. |
| Suppliers | Qualified engineering talent is the primary constraining resource that an open source software vendor requires. A platform contender must understand the specific organizational structure of the open source community in order to access the right talent. |
| Complementors | Complementors in the open source world can come in the form of commercial allies as well as community contributors. Vendors must form a hypothesis on what the key complements are in order to secure superior or exclusive access. |
| Buyers | By establishing a clear understanding of the purchasing process of the platform, platform contenders can establish superior or exclusive access to key intermediaries to secure a competitive advantage in intra-platform competition. They can also develop a platform advantage by injecting themselves in the purchasing process of complements in order to exert greater influence in the operations of the ecosystem. |
| New Entrants and Substitutes | The fact that open source platform vendors do not possess exclusive authority to define how the technology is packaged and reused means that alternative modes of platform consumption can emerge from unexpected sources. Platform boundaries can shift without the vendor’s involvement. As a consequence, emergent threats in the form of new direct competition or substitutes are arguably more prevalent in open source businesses. |

Table 6 - Summary of Strategic Considerations for Open Source Platform Vendors

Google and Android

Android is a Linux-based open source mobile operating system created by Google using

the assets it acquired when the search giant purchased Android Inc. in 2005. Although the ‘core’

aspects of Android are open sourced to the community under the Android Open Source Project

(AOSP), Google has been and continues to be the primary engineering force behind the

continued evolution of the Android software platform. The firm deploys its considerable

engineering resources to work on the ‘next version’ in private before releasing the source to the

community. The software typically makes its way to the hands of customers through devices

created by hardware device partners such as HTC, LG and Samsung. Google collaborates with

these partners through an industry alliance known as the Open Handset Alliance (OHA).

Participation in the OHA provides partners with unique access to Google’s resources and is

generally perceived as a requirement for gaining the license to deliver Google Mobile Services

(GMS). Google Mobile Services are complementary (but proprietary) services and components

that greatly enhance the value of the system, including applications like Gmail, Google Now,

Google Calendar and the Google Play Store.

Although many hardware device partners opt to heavily alter or ‘enhance’ the versions

of Android that ship with their devices to create a unique experience that differentiates their

offerings to end users, the vast majority of these changes are cosmetic in nature and do not

fundamentally change the definition of the platform. Moreover, hardware partners who are

members of the OHA are contractually prevented from creating “forked” or “derived” versions

of the platform, and instead collaborate with Google and others on the continued evolution of

Android [32]. In other words, while Google cedes some control of Android’s interface from the

perspective of end users to its hardware partners, this arrangement allows Google to remain the

definitive authority over the platform’s evolution from the perspective of software

complementors (Figure 9). At a conceptual level, this structure is not vastly different from how

Microsoft operated its Windows franchise over the years. However, the fact that Android is open

source means that Google’s role in defining and providing the platform is displaceable. A firm

with sufficient engineering resources and ability to deliver complementary services can

theoretically displace Google entirely and propose a different design trajectory for the platform.

Figure 9 - The Android Platform and the roles of Google, hardware partners and other complementors

This theoretical scenario unfolded when online retailer Amazon released the Kindle Fire

in 2011. Amazon elected not to collaborate with Google as a participant in the Open Handset

Alliance, but rather to create its own variant of the system based on what was available through

the Android Open Source Project. The Kindle Fire ran a derived or “forked” version of Google’s

Android that was later rebranded “Fire OS” with subsequent releases. The Fire OS was largely

compatible with applications built for the version of Android from which it was derived, but

Amazon replaced all of Google’s cloud and content services with alternatives from Amazon and

its partners. Amazon even provided an alternative “App Market” to connect users to

applications, offering its own Digital Rights Management (DRM) and payment infrastructure for

software vendors in the Android ecosystem. By choosing not to participate in the Open Handset

Alliance and “forking” its own version of the platform, Amazon put itself in a position

where it can theoretically choose to evolve Fire OS independently of Google’s influence.

Since the release of the Kindle Fire, a number of companies have followed Amazon’s

path of creating Android-derived platforms without participating in the OHA. While the

majority of these firms do not have the engineering resources that would allow them to

realistically challenge Google’s dominion over the architectural trajectory of the Android

platform, a number have a sizable presence in specific geographic markets such as China and are

more than capable of displacing Google as the de facto provider of complementary services

within their markets. It is also interesting to note that even Microsoft has entered the game,

forking Android as a solution for developing markets by way of its Nokia acquisition [33].

While the actions of these vendors may have actually contributed to the Android platform’s

dominant market share in the mobile platform space, they have clearly been detrimental to

Google’s ability to benefit from that dominance.

Google’s management of Android is a useful reference for discussing the different factors

affecting open source platform strategy. Although the structure of the ecosystem bound together

by mobile platforms closely resembles that of the personal computing industry with which many

are already familiar, the outcome has been quite different. In particular, the battle for the mobile

industry is interesting in that a previously dominant incumbent platform leader (Apple) has been

successfully challenged by a new entrant (Google) that has opted to release its platform as open

source. Google’s changing behavior as this unfolds is a useful illustration of the dynamic nature

of open source platform strategy.

| Company | Platform | Description |
|---|---|---|
| AliCloud (China) | Yun OS | According to Alibaba (AliCloud’s parent company), the Yun OS is a Linux-based operating system that utilizes components and tools from the Android Open Source Project to deliver Android app compatibility. |
| Amazon | Fire OS | Fire OS features an optimized UI for consuming Amazon’s content and services. Application Programming Interfaces (APIs) have also been extended to promote the unique capabilities of Amazon’s hardware. |
| Baidu (China) | Baidu Yi | Yi OS displaces Google’s GMS services with Baidu’s implementations. |
| Microsoft | Nokia X | Nokia X re-skins Android with a look and feel approximating Microsoft’s Windows platform and replaces Google’s services with Microsoft’s own. It was originally conceived as a low-cost solution for developing markets. |

Table 7 – AOSP-derived products by Google competitors [34]–[36]

Rivalry – Inter-network vs. Intra-network Competition

Figure 10 - The competitive threat to a proprietary software platform vendor comes in the form of alternative platforms; open

source platform vendors must additionally contend with alternate providers, including the community, for their specific platform.

A platform leader typically possesses unique knowledge of the technologies that serve as the

technical foundation for its ecosystem; the extent to which it shares this knowledge is one of the

decisions that the firm can take (“Lever 3 – Relationship with Complementors”). For a proprietary

software platform vendor, this unique knowledge is often encapsulated in the proprietary

intellectual property that source code represents. Leveraging this unique asset, the firm is able to

act as an effective monopoly within the sub-markets that its network participants represent, as no

competitor is capable of displacing its dual roles as platform provider and sponsor.

Competition comes exclusively in the form of alternative platforms or “inter-network”

competition. For example, as the exclusive provider of the iOS, Apple Inc. does not need to

worry about another firm supplanting its role as the dominant distributor of iOS applications to

customers. It is also secure in its position as the dominant provider of development tools to iOS

application developers (complementors in this ecosystem). Apple’s competitive concerns stem

purely from alternative ecosystems and the possibility of customers or application vendors

abandoning the iOS platform for alternatives such as Microsoft’s Windows or Google’s Android

platforms. In other words, the dominant strategic concern for proprietary platform vendors is the

establishment and sustenance of the platform itself as an industry standard.

Compared to their proprietary counterparts, open source platform vendors face an

additional dimension of complexity affecting their competitive strategy. Beyond the challenges of

establishing their platform as the dominant industry standard, open source vendors must

additionally work to establish themselves as the primary provider of that standard. This challenge is

clearly evident in Google’s Android ecosystem. Like Apple, Google strives to establish Android

as the dominant platform against alternatives like iOS and Windows Phone in the mobile

computing space. However, due to its decision to open source the development of Android,

Google additionally faces competition in its role as the provider of platform technologies to users

and complementors within the Android ecosystem, as exemplified by its struggles with “platform

wannabes” such as Amazon and Alibaba. This struggle illustrates a fundamental tension that an

open source platform leader faces: balancing the occasionally conflicting needs of inter-network

competition with those of intra-network competition (Figure 10). The relative intensity of these

two different types of competition waxes and wanes over the course of platform evolution, and

the open source platform vendor is likely to find itself adjusting its position on the “Four Levers

of Platform Leadership” as a consequence.

In order to ensure that the Android ecosystem would attract the maximum number of

software and hardware complementors away from the incumbent leader, Google took pains in

the inception of the Android platform to ensure that it was architected in an open and modular

manner (Lever 2). It collaborated openly with hardware partners, software vendors and the open

source community (Lever 3). As Android establishes itself as the de facto leader within the

mobile space (nearly 85% of all smartphones shipped in Q2 2014 were Android-based [37]),

Google’s primary strategic concern has arguably shifted from winning against competitive

platforms to sustaining its position as the primary beneficiary of Android’s success. While

Google’s decisions to share its technology have clearly contributed to Android’s dramatic growth

in the marketplace, they have also lowered the unique competitive advantages of Google as the

Android platform provider. As Google’s focus shifts away from alternative platforms to

alternative providers of the Android platform, its behavior also correspondingly changes.

One shift that Google has been slowly making pertains to its decisions around the

functionality that is delivered as a part of the open source “core” as opposed to the proprietary

services and extensions that it exclusively offers (Lever 2). In October of 2013, Ron Amadeo of

Ars Technica outlined the functionality that Google now delivers through proprietary channels

that it had previously included as part of core Android [38]. Much of this functionality was

aided by Google’s unique proprietary cloud capabilities (such as Gmail or enhanced search) and

therefore Google’s decision to encapsulate it as proprietary extensions can be justified on a

technical basis. However, the decisions to deliver self-contained enhancements, such as the

enhancements to the basic keyboard, as proprietary extensions appear to be deliberate choices intended to further

differentiate the capabilities of Google’s Android versus alternatives offered by the community

or competing providers.

| Capability | Open Source Version | Proprietary (Date Introduced) |
|---|---|---|
| Search | AOSP Search | Google Search (August 2010) |
| Music Player | AOSP Music | Google Play Music (May 2010) |
| Calendar | Calendar | Google Calendar (October 2012) |
| Keyboard | AOSP Keyboard | Google Keyboard (June 2013) |
| Camera | AOSP Camera | Google Camera (April 2014) |
| Messaging | AOSP Messaging | Google Hangouts (May 2013) |

Table 8 - Google's shift of investment into proprietary capabilities. Content adapted from Ars Technica [38].

Google’s approach to interacting with external partners has also shifted as it seeks to

leverage its relationships in order to lock out other Android platform contenders (Lever 3).

Recognizing that its contributions to AOSP do not provide it with any legal means to minimize

the ‘forking’ of its core, Google created the Open Handset Alliance at the inception of Android

precisely to provide this means. As mentioned earlier, while ‘forking’ is a fairly normal and

desired phenomenon in open source projects, the OHA’s anti-forking restriction explicitly

prevents this from happening. There is a broad understanding in the industry that participation in

the OHA is a prerequisite for meaningful collaboration with Google on Android, and

consequently the majority of hardware device manufacturers are members of the Open Handset

Alliance. By putting in place this agreement, Google significantly limits the channels through

which alternative software platform vendors can create Android-derived products, as leading

hardware vendors participating in the OHA are restricted from collaborating with them. Amazon

experienced this when it searched for hardware partners to help build its Fire line of devices,

ultimately settling on an original equipment manufacturer with minimal prior exposure to the

mobile industry as a result. In the recent past, Google has been aggressive in the enforcement of

this agreement, going as far as threatening OHA member Acer Computers with the termination

of its Google Mobile Service license to prevent the hardware manufacturer from shipping a

device with Alibaba’s Yun OS despite some controversy about whether the Yun OS should

technically be considered a “fork” of Android [39].

Google’s changes in behavior illustrate the dynamic nature of managing the tension

between inter-network and intra-network competition. As mentioned earlier, open source

platform leaders must proactively hypothesize how the dynamics of competition will play out, in

order to put in place mechanisms that offer competitive leverage later. In the Android example

above, had Google failed to anticipate the emergence of “forks” such as those created by

Amazon, it would not have put in place the anti-forking clause that provided it with one of its

few means to deter a well-resourced and capable competitor from challenging it as Android’s

platform leader.

Suppliers – Securing the Upstream Value Chain

The primary constraining ‘supply’ of the software industry is engineering talent.

Depending on the specific domain of software, the talent required may be highly specialized.

For example, IBM’s inception of the Eclipse project was made possible by the unique and highly

specialized competencies that it received through its acquisition of Object Technologies

International. If the required engineering talent is scarce, the possession of such human resources

is a significant barrier of protection for open source vendors even if their software is highly open

from a licensing perspective. Moreover, even in fields where capable talent is not scarce, the

structure of many open source projects imposes limits on the supply of engineers who can

materially affect the design of a given project. In many open source projects of scale, access to

the main code-line is governed by a relatively small group of individuals who have demonstrated

competence with that project. Depending on the project structure, this group may be known as

“committers”, “reviewers” or “maintainers”. More importantly, there are often official or de

facto technical leaders in most FOSS projects of scale who are responsible for making the major

design decisions; the authority of granting “committer” status to individual contributors is

sometimes also held by this group. Typically, this leadership group is kept fairly small. Securing

access to this group is therefore a critical determinant in an open source platform contender’s

ability to influence its upstream value chain.

Figure 11 - Hierarchy of influence within an Apache Software Foundation project. Adapted from the ASF [40]

• PMC Chair: The Chair of a Project Management Committee (PMC) is appointed by the Board from the PMC Members. The PMC as a whole is the entity that controls and leads the project. The Chair is the interface between the Board and the Project.

• PMC Member: A PMC member is a developer or a committer that was elected due to merit for the evolution of the project and demonstration of commitment. They have write access to the code repository, an apache.org mail address, the right to vote for the community-related decisions and the right to propose an active user for committership. The PMC as a whole is the entity that controls the project, nobody else.

• Committer: A committer is a developer that was given write access to the code repository and has a signed Contributor License Agreement (CLA) on file. They have an apache.org mail address. Not needing to depend on other people for the patches, they are actually making short-term decisions for the project. The PMC can (even tacitly) agree and approve it into permanency, or they can reject it. Remember that the PMC makes the decisions, not the individual people.

• Developer: A developer is a user who contributes to a project in the form of code or documentation. They take extra steps to participate in a project, are active on the developer mailing list, participate in discussions, provide patches, documentation, suggestions, and criticism. Developers are also known as contributors.

There is tremendous variety with regards to the contribution model and distribution of

decision-making authority amongst open source projects (Table 9). Aspiring open source

platform leaders must understand the decision-making structure for their prospective community

in order to secure the resources required to affect the technological trajectory of their platform

(Lever 2). For example, for projects governed by the Apache Software Foundation, “committer”

status is relatively scarce and is granted by the Project Management Committee (PMC), which is

also responsible for resolving the major design decisions affecting the project (Figure 11). As a

consequence of this, aspiring platform firms that wish to affect the technological trajectory of the

project must secure some critical mass of individual committers as well as adequate

representation within the PMC. Given that PMC members “are participating as individuals…

affiliations do not cloud the personal contributions”, this means that Apache-based platform

firms must retain the services of the specific individuals who already reside on the PMC if the

firm wishes to influence the design of the technology. In contrast, the Linux development

process operates on a much more open “bazaar” style basis, with the majority of design decisions

being made via publicly accessible mailing lists and the only ‘governance’ process being the

actual mechanics by which a “maintainer” reviews and integrates individual submitted patches to

the mainstream code-line. It is entirely possible for a firm to hugely affect the design trajectory

of Linux without employing anyone who is an official “maintainer” of a Linux module. This

open decision-making process results in a significantly larger supply of engineering resources

who can make substantial contributions when compared to the more constrained pool of

committers in an Apache-governed project.

Regardless of their specific organizational structure, most open source communities

define themselves as being transparent meritocracies. This means that authority and influence

within the community arise as a consequence of demonstrated contributions within the

community rather than role or rank assigned by some “higher” authority. This leads to the

emergence of de facto technology leaders in most open source communities. Paradoxically, the

“openness” of this meritocratic philosophy actually greatly restricts the supply of talent that an

aspiring platform leader can acquire in order to lead the design trajectory of an open source

platform at any given time. While a proprietary platform leader can bestow any capable

candidate with the authority to lead the technical direction of a proprietary platform, an aspiring

open source platform leader must look to employ an established leader of the community if it wishes to

secure influence over the trajectory of the platform. Therefore, the attraction and retention of

highly-visible community leaders is a critical aspect of establishing and maintaining platform

leadership in the open source world.

| Community | Authority | Official Description |
| --- | --- | --- |
| Apache Software Foundation | Project Management Committee (PMC) | “The PMC is the vehicle through which decision making power and responsibility for oversight is devolved to developers.” [41] |
| Apache Software Foundation | PMC Chair | “[The] chair is a facilitator and their role within the PMC is to ensure that everyone has a chance to be heard and to enable meetings to flow smoothly.” [41] |
| Eclipse Software Foundation | Project Management Committee | “ensure that their Project is operating effectively by guiding the overall direction and by removing obstacles, solving problems, and resolving conflicts;” “ultimately responsible for ensuring that the Eclipse Development Process is understood and followed by their project” [42] |
| Eclipse Software Foundation | Project Lead | |
| Eclipse Software Foundation | Architectural Council | “responsible for… monitoring, guiding, and influencing the software architectures used by Projects” [42] |
| Eclipse Software Foundation | Planning Council | “The Planning Council is further responsible for cross-project planning, architectural issues, user interface conflicts, and all other coordination and integration issues.” |
| Linux Foundation | Maintainers | “determines whether the code should be accepted into the development tree, or returned for revision” [43] |
| Mozilla Foundation | Module Owners | “responsible for leading the development of a module of code or a community activity” [44] |
| Mozilla Foundation | Release Drivers | “provide guidance to developers as to which bug fixes are important for a given release and also make a range of tree management decisions.” [44] |
| Mozilla Foundation | Super-Reviewers | “approval of a super-reviewer is generally required to check in code” [44] |
| Mozilla Foundation | Ultimate Decision-Makers | “The ultimate decision-maker(s) are trusted members of the community who have the final say in the case of disputes. This is a model followed by many successful open source projects, although most of those communities only have one person in this role, and they are sometimes called the ‘benevolent dictator’” [44]. |

Table 9 - Decision Making Authorities in different Open Source communities

It is worth noting that Google sidestepped many of these issues in its establishment of the

Android Open Source Project. Although the source code of Android is publicly published and

contributions from the community are welcome, the fact that each new version of Android is

designed behind closed doors at Google means that some of the communal and meritocratic

nature of open source development is absent from Android’s development. As a consequence,

although Google does not benefit from the power of community development that motivates

most open source projects, it is also not hindered by the constraints that community development

imposes.

As most open source platform projects are complex efforts composed of a hierarchy of

dependent sub-projects, platform vendors must thoroughly understand the architecture of the

platform and formulate their position on which sub-projects are strategic in order to secure the

right talent for affecting the platform. For example, at the time of writing, the Eclipse Platform

consists of twelve top-level projects, which are in turn composed of 243 sub-projects [45]. A

firm wishing to become a platform leader based on Eclipse must decide which of these 243

projects materially affect the platform from the perspective that matters to it and invest its

engineering resources appropriately. Each platform provider within the same ecosystem might

hold a different perspective on which modules are most critical depending on its hypothesis of

which sides of the network it wishes to focus on. In other words, aspiring platform leaders are

most likely interested in the projects that represent external interfaces of platform complements

that they see as most strategic, or modules that represent user interfaces if they see driving user

adoption as most critical.

The extent to which a firm may find it necessary to invest in the internal ‘core’ of a

platform significantly depends on the maturity of the platform and the level of inter-platform

competition. If the platform is relatively immature and unstable, and the level of inter-network

competition is intense, a platform vendor would find it necessary to focus its efforts on acquiring the

talent needed to stabilize the core and make the platform more viable. As a platform reaches

maturity and the focus of competition shifts from inter-network to intra-network competition,

platform vendors may find it less important to invest in the ‘core’ technologies and may instead focus

their energies on affecting the peripheries that act as interfaces into the platform or on delivering

capabilities that differentiate their specific versions of the platform. Generalizing from all of

this, it becomes clear that how a firm chooses to collaborate with the open source community

significantly impacts whether (and how) it can secure the critical resources necessary for success.

Complementors – Identifying and Securing Critical Complements

Beyond engineering talent, platform builders require the supply of key complements in

order to make their platform viable. While aspiring platform leaders seek to attract a large

number of industry participants to their platforms to provide complementary products or

services, it is sometimes the case that providers of specific types of key complements are few or

even nonexistent. In such cases, the platform leader may choose to intervene by either providing

extraordinary support for those complementors or by directly participating in the complement

ecosystem itself to boost the supply of the required complements. For example, Cusumano and

Gawer documented Intel’s creation of the Content Group in order to help spur the creation of

scarce multimedia software at the time. The pair also documented Intel’s venture into the chipset

and motherboard business after finding that the existing vendors in the business were not keeping

up with the needs of the platform [10]. However, beyond reinforcing the ecosystem to enable

network effects, controlling key complements through intervention and involvement in

complement creation can also arm a platform leader with competitive leverage against alternative

platform vendors.

The management of key complements is a tactic that is well recognized and overtly

managed in certain markets such as video game consoles. In their survey of the various

strategies of video game console makers from 2005 through to 2007, Daidj and Isckia found that

Microsoft relied heavily on the advantage gained by exclusives such as the Halo franchise to

drive the adoption of their platform, the Xbox 360 [46]. Perhaps more interestingly, James

Prieger and Wei-Min Hu pointed out in a separate paper on the same industry that possessing

exclusive complements is only effective in driving platform adoption if the majority of

complements available are non-exclusive. Moreover, the pair found that a “small amount of

exclusivity… would be enough to foreclose competitors from all the important sources of supply

of the complementary good”[47].

The leverage afforded by the exclusive access to key complements is particularly relevant

in the intra-network competition for open source platforms, as there is generally a lower level of

differentiation amongst vendors of the same platform and also no technical barriers creating

differences in ecosystems. In other words, open source platform vendors often compete within

the conditions for effective complement-exclusivity advantages that Prieger and Hu identified.

This fact has also manifested itself in the Android case, where Google chose to withhold

its complementary mobile services (e.g. Gmail, Maps, Google Now etc.) from Android variants.

Given that Google makes the complementary services within its Google Mobile Services

portfolio available to even competing platforms such as iOS, it may initially appear odd that

Google would refuse to provide these capabilities to other Android systems. However, Google’s

decision reflects the different competitive dynamics of inter-network and intra-network

competition. Since iOS and Android are fundamentally different platforms with significant

differences in capabilities and ecosystems, the availability of Google Mobile Services is less

likely to materially affect a customer’s relative preference for Google’s platform in comparison

to Apple’s. In contrast, Amazon’s Fire OS and Google’s Android are very similar

technical platforms with a much smaller difference in capabilities and with application ecosystems

that overlap significantly (applications that are available on Android can run on Amazon’s Fire

devices if Google’s proprietary extensions are not used, or if the developers substitute Google’s

services with Amazon’s offering). Therefore, the availability or absence of Google’s

class-leading services may materially affect consumer preferences for one vendor’s Android variant

over another. In light of this understanding, Google’s decision to withhold its Google Mobile

Services (e.g. Gmail, Google Maps, Search) from users of alternative Android platforms appears

to be a sensible and strategic means of creating greater differentiation between its platform and

those of its intra-network rivals.

Unlike Google, most firms do not have the luxury of possessing exclusive ownership of

key platform complements and often have to invest in cultivating relationships with partners just

to secure access for their platforms. In order to do so in a cost-effective and timely manner,

aspiring open source platform leaders must form clear hypotheses on the critical types of

complements in order to secure superior access either through internal development or by

developing partner relationships.

Buyers – Controlling the Path to the Customer

As the right-hand side of Figure 10 illustrated, an open source platform facilitates the

technical connection between customers and complement creators without consideration of who

is providing the underlying platform. For example, Android application developers can largely

be assured that their product can technically be sold to users of Google’s Android as well as

Amazon’s Fire OS with only a modest amount of additional investment. In other words, the

technical platform is often undifferentiated from the perspective of the complement creator, even

if the provider manages to differentiate its platform variant to end consumers. In fact, the

complement creator prefers a greater level of commonality across different platform variants in

order to minimize the amount of customization for its products. As a result of this, a platform

provider needs to find other means for differentiating itself from the other providers for the same

platform. One way that a platform provider can differentiate itself from its rivals is by

controlling the complementor’s path to platform customers.

Depending on the market that the platform serves, the relationship between the

platform provider and the end customer may vary greatly in nature and intensity. For example,

in enterprise software, vendor-customer relationships tend to be highly intense as vendors tend to

have relatively few customers, each representing a non-trivial fraction of a vendor’s revenues. As a

result, each customer holds substantive bargaining power. While such a structure may appear to

weaken the bargaining position of platform vendors, such a structure actually provides important

leverage for platform vendors to affect the behaviors of complementors. If an open source

platform provider is able to forge strong and exclusive relationships with key customers, and

there are relatively few customers on the market, it can act as an effective monopoly on the

ecosystem from the perspective of the complementor even if it is not the exclusive provider of

the platform technology.

Figure 12 – The Purchase Process of Complements – While an open source platform enables a complement provider to provide

its products to a given customer, the onus is still on the complement provider and customer to discover each other. If a platform

provider can facilitate this connection in a superior manner, it can differentiate itself from alternative platform providers.

Beyond establishing exclusive relationships with key customers, a platform provider can

also exert influence over complementors by positioning itself as a facilitator in the purchase

process of complements (Figure 12). Although an open source platform provides a unified

technical infrastructure that binds an ecosystem together, the existence of a unified platform does

not imply the existence of a unified purchase process. In other words, while a complementor's

product can be delivered to the customer thanks to the technical infrastructure provided by the

platform, the business process of discovering, evaluating and purchasing the solution is not a

problem resolved by the existence of an open source platform. Consequently, if a platform

provider can facilitate this process better than other platform providers and better than network

participants can accomplish on their own, that platform provider is able to create a preference for

its platform variant over others.

As it turns out, this has also been an important tactic in the intra-network competition

between platform providers in the Android ecosystem. Each of the different Android platform

providers has invested in its own proprietary application marketplace in order to facilitate

the purchase of applications by customers. While application vendors are able to create and

distribute applications on their own, these application marketplaces provide customers with a

simpler and faster means of discovering and purchasing new complementary “apps”. Although

alternative third-party marketplaces exist for the same purpose, the marketplace offered by a

platform provider has the distinct advantage of being pre-installed on the devices that ship with

its platform. In order to secure superior access to these channels, the application vendors are

compelled to establish sometimes exclusive relationships with a specific platform provider, even

if there is no technical reason for it.

While some may argue that the primary motivation of creating an application store is to

monetize the activities within a platform ecosystem, Google has made decisions that appear to

contradict such an objective. For example, the company limits access to its electronic

application and content marketplace (“Google Play”) to devices produced by manufacturers that

it approves (effectively manufacturers that participate in the OHA) [48]. If the company’s

objective was to capitalize on the transactions that occur within the Android ecosystem, it would

likely have taken a similar approach to Amazon; Amazon makes its App Market

available not only on its Fire devices, but also on Google-approved Android devices as well as Blackberry 10

devices [49]. Despite Amazon’s efforts, Google’s “Play” store is the largest marketplace for

Android-compatible applications with an estimated 1.3 million applications compared to

Amazon’s 240,000 as of June 2014 [50]. Having exclusive access to such a vibrant marketplace

affords Google’s Android offering a substantial competitive advantage over its intra-network

competitors. Amazon’s response, its own App Market, was not purely a means of

matching Google’s channel for delivering complements to its platform consumers, but also a

means of giving the company leverage to influence the architecture and technical interfaces of a

platform it otherwise has relatively little influence over. Developers who choose to sell their

products on Amazon’s App Market need to ensure that their apps work with Amazon’s devices,

which in turn requires the substitution of Google’s proprietary services and APIs (e.g. Google

Maps) with Amazon’s version.

The aforementioned mechanisms for platform leverage rely upon the platform provider’s

involvement in the value chain between a complement producer and the customer beyond

supplying the pure technical infrastructure offered by the platform. However, a platform

provider can also deter competitors by studying the value chain between the platform itself and

the customer. It is often the case that the path between the platform provider and the customer is

actually an indirect route controlled by a few intermediaries. Consequently, an open source

platform provider can attempt to recreate the effective monopoly of a proprietary platform

vendor by establishing exclusive relationships with those intermediaries. This was clearly a

tactic utilized by Google in an attempt to limit the fragmentation of the platform through the

Open Handset Alliance. As mentioned earlier, a customer of the mobile operating system

typically adopts a given platform by purchasing a device from one of several major device

manufacturers. By securing effectively exclusive relationships with those hardware device

manufacturers through its Open Handset Alliance program, Google greatly restricts the extent to

which alternative platform providers can displace it as Android’s platform leader.

Google’s ability to enforce its platform leadership through the OHA program hinged

upon the company’s identification of hardware vendors as the critical nodes on the value chain to

customers in the mobile industry. In the highly intertwined marketplace that many open source

platform vendors compete in, the nodes and relationships connecting the platform vendors to the

customers can be complicated, often resembling a network rather than a chain. In the case

of Android, Google additionally identified network providers and implementation partners (such

as Accenture or Wipro) as nodes on the path to the customer and has consequently included

such firms in its Open Handset Alliance program. The preemptive identification of these critical

nodes allowed Google to establish superior relationships with them, and put in place legal

agreements that ensure exclusivity (i.e. the “anti-forking” clause of the OHA).

Much like how different firms may come to different perspectives on what modules

within the platform are most strategic, different firms may also hold different hypotheses on

which relationships and complements are most critical to establishing control over the

ecosystem. The hypothesis held by the firm on which types of relationships (or even which

specific relationship) are most critical to manage may even significantly impact the firm’s “scope

of the firm” (Lever 1) decisions as firms seek to avoid conflict with key members of the business

network. The relationships that affect the purchasing process of the platform and of complements

are amongst those most critical to establishing ecosystem influence. While the strategic

management of these external relationships is also important to proprietary platform vendors,

this dimension of platform management is especially critical to the open source platform vendor

in light of intra-platform competition with alternative vendors. Given that open source platform

vendors are technically replaceable, the correct identification and possession of those key

relationships serve as a critical means of asserting platform leadership for open source firms.

Substitutes and New Entrants – The Threat of Shifting Platform Boundaries

Beyond competing with alternative platforms and rival providers of the same platform,

platform vendors must consider the threat of substitute technologies that can be considered

alternatives to adopting a platform altogether. While all product firms – platform or otherwise,

proprietary or not – face the same threat of substitution, platform firms in general and open

source vendors in particular need to be specifically aware of alternatives that can emerge from

changes in the definition of platform boundaries.

As mentioned earlier in the literature review, scholars have found that “platform

envelopment” is one of the most effective strategies for displacing an entrenched platform. In

particular, the “foreclosure attack” can be viewed as a redefinition of the platform boundaries to

a vastly larger scope, substituting the need of a specific platform with capabilities integrated into

a broader platform with known demand. One can reason that open source platforms appear to be

more susceptible to this type of substitution as there are lowered technical barriers for an attacker

to integrate the capabilities of an open source platform into the context of a broader one.

While all platform vendors face the threat of envelopment, open source platforms are

perhaps uniquely susceptible to the threat posed by shifting platform boundaries. An open source

platform is typically a complex system of subsystems loosely connected through a network of

related projects. As a consequence, enterprising vendors or members of the community can

choose to re-interpret platform boundaries and create new offerings bundling different sub-

projects together. Although proprietary platforms are also complex compositions of smaller

subsystems, proprietary intellectual property holders possess the unique ability to determine how

these internal subsystems are bundled together. For example, a customer cannot choose to adopt

just one aspect of the Apple iOS operating system without purchasing the entire platform. The

entire definition of what it means to consume the platform is at the discretion of Apple,

motivated at least partially by the business objectives it faces at any time. No other participant

within the ecosystem holds the power to define platform boundaries.

Figure 13 - The ability for vendors to create distributions can have the undesirable effect of fragmenting the ecosystem if there is

variation between platform products that impacts platform users. In the above example, distributions 1 and 2 are variants of the

platform produced by different open source vendors.

In the open source world, members of the community have the ability not only to

substitute one implementation of a subsystem with another, but also to define the boundaries of

the platform differently. If a member of the community believes a specific subsystem is useful

and “core”, it can choose to bundle it as its own “distribution” of the platform. As mentioned in

the literature review of this paper, a license that enables distribution creation is a fundamental

criterion for software to be considered “open source”, and distribution creation was the original business model upon

which the largest open source business in history (Red Hat, Inc.) was based [24]. Distribution

creation creates variations in the definition of the platform, which can blur platform boundaries and can

fragment the ecosystem if the components being varied touch interface points with platform

consumers.

Figure 13 illustrates this with a hypothetical open source platform with two distributions.

Distribution 2 varies from distribution 1 in that subsystems 2 and 4 have been substituted with

subsystems A and B respectively. While the replacement of subsystem 4 with subsystem B is

simply a technical implementation decision that does not impact platform users, substituting

subsystem 2 with A can mean the Type A complements created for distribution 1 do not work

with distribution 2, fragmenting the ecosystem and compromising the strength of indirect

network effects. Similar fragmentation can occur if the distributions introduce or omit

interfaces differently, creating a fuzzy platform boundary. As an example, the two major variants

of the desktop operating environment (Gnome and KDE) distributed by major distribution

creators fragmented the interfaces that desktop application developers used to create desktop

applications for Linux until a common interface was established through Project Portland by the

Desktop Linux Working Group [51]. This type of platform fragmentation reduces network

effects and harms the platform’s ability to compete with alternative platforms.
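
The compatibility argument around Figure 13 can be made concrete with a small sketch. The Python below is illustrative only: it models each distribution (distributions 1 and 2, as named in the figure caption) as a set of subsystem names and assumes, following the description of the figure, that Type A complements are built against subsystem 2 while subsystem 4 is purely internal. The five-subsystem layout, the names `DISTRIBUTION_1`, `DISTRIBUTION_2` and `COMPLEMENT_INTERFACES`, and both helper functions are assumptions introduced for this example and do not describe any real platform.

```python
# Illustrative model of the two hypothetical distributions in Figure 13.
# Assumption: the platform has five subsystems; the text only names 2 and 4.
DISTRIBUTION_1 = {"subsystem 1", "subsystem 2", "subsystem 3",
                  "subsystem 4", "subsystem 5"}
# Distribution 2 swaps subsystem 2 for A and subsystem 4 for B.
DISTRIBUTION_2 = (DISTRIBUTION_1 - {"subsystem 2", "subsystem 4"}) | {
    "subsystem A", "subsystem B"}

# Assumption: Type A complements are built against subsystem 2 only.
COMPLEMENT_INTERFACES = {"Type A": {"subsystem 2"}}


def supports(distribution, complement_type):
    """True if the distribution ships every subsystem the complement targets."""
    return COMPLEMENT_INTERFACES[complement_type] <= distribution


def fragmenting_substitutions(base, variant):
    """Subsystems dropped by the variant that some complement type targets."""
    complement_facing = set().union(*COMPLEMENT_INTERFACES.values())
    return (base - variant) & complement_facing


print(supports(DISTRIBUTION_1, "Type A"))
print(supports(DISTRIBUTION_2, "Type A"))
print(fragmenting_substitutions(DISTRIBUTION_1, DISTRIBUTION_2))
```

Under these assumptions only the replacement of subsystem 2 is reported as fragmenting. The swap of subsystem 4 for subsystem B is not, because no complement type targets it.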

Beyond the challenge of fragmentation, open source platform leaders also face the

possibility for key subsystems of the platform to be reused in other contexts by other firms or

individuals as a means of “hijacking” the platform’s ecosystem. Amongst the large number of

subsystems and modules within a software platform, it is often the case that only a fraction of

those subsystems are materially involved in enabling interactions with a given type of

complements. If a firm is able to isolate these core subsystems, it can reuse these modules to

allow those same complements to interact with another product or even platform. While such an

approach is theoretically possible with proprietary platforms with open interfaces, the technical

barriers to executing such a tactic are extraordinarily high. Competing vendors who want to

leverage complements built for a specific proprietary platform must reverse engineer the

implementation underlying the platform based on the interface definition and replicate the

behaviors of the base platform. Depending on the implementation technology and the degree of

coupling between the complement and platform, this is a task that ranges from difficult to

effectively impossible. However, within the open source world, a firm seeking to ‘hijack’ a

platform’s complements does not need to perfectly replicate the behavior of some unknown

black box, but rather directly modify and integrate the core components required.

As an example, Figure 14 shows a hypothetical platform comprising subsystems one

through five. Suppose that Type B complements are desirable to an alternative platform product

competing in an adjacent platform market or an inter-network competitor. Given that subsystem

2 is the only interface that Type B complements interact with, a competitor can simply integrate

that subsystem into its own platform and offer compatibility and support with Type B

complements. Since subsystem 2 requires supporting capabilities from subsystems 3 and 4, the

competing vendor can choose to also integrate those components into its product, or replace

those subsystems with its own.
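
The selection a competitor faces in this example amounts to walking a dependency graph outward from the complement-facing interface. The following Python is a minimal sketch under stated assumptions: only the dependency of subsystem 2 on subsystems 3 and 4 comes from the text, while the empty dependency sets for the other subsystems, the names `DEPENDS_ON` and `COMPLEMENT_INTERFACE`, and the function `subsystems_to_reuse` are hypothetical and introduced purely for illustration.

```python
# Hypothetical dependency graph for the platform in Figure 14.
# Only "subsystem 2 needs subsystems 3 and 4" is taken from the text;
# the empty entries for the other subsystems are assumptions.
DEPENDS_ON = {
    "subsystem 1": set(),
    "subsystem 2": {"subsystem 3", "subsystem 4"},
    "subsystem 3": set(),
    "subsystem 4": set(),
    "subsystem 5": set(),
}

# Type B complements interact only with subsystem 2 (per the example).
COMPLEMENT_INTERFACE = {"Type B": "subsystem 2"}


def subsystems_to_reuse(complement_type, replaced=frozenset()):
    """Subsystems a competitor would integrate to host a complement type.

    Subsystems listed in `replaced` are assumed to be supplied by the
    competitor's own implementation, so they (and anything reachable only
    through them) are not taken from the open source platform.
    """
    needed = set()
    stack = [COMPLEMENT_INTERFACE[complement_type]]
    while stack:
        current = stack.pop()
        if current in needed or current in replaced:
            continue
        needed.add(current)
        stack.extend(DEPENDS_ON[current])
    return needed


# Integrate everything subsystem 2 needs.
print(sorted(subsystems_to_reuse("Type B")))
# Or keep only the interface and substitute in-house equivalents for 3 and 4.
print(sorted(subsystems_to_reuse("Type B", {"subsystem 3", "subsystem 4"})))
```

The two calls correspond to the two options described above: integrating the supporting subsystems alongside the interface, or replacing them with the competitor's own components.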

Figure 14 – Ecosystem hijacking - competing vendors can isolate the interfacing and enabling subsystems for a given complement

type and choose to redeploy it in another context to expedite their own objectives.

Google experienced this hijacking phenomenon with the Android platform. In the case of the

Android platform, the application framework represented by the Android Software Development

Kit (SDK) and the execution environment known as the Dalvik Runtime are the primary

components involved in supporting the consumption of Android applications on the platform

(left-hand side of Figure 15). While the application framework does depend on capabilities

provided by other core libraries within the Android platform, the Dalvik module represented the

bulk of the required complexity. Blackberry was able to integrate Dalvik into its own proprietary

Blackberry 10 OS based on the QNX kernel it had acquired. In doing so, Blackberry was able to

bootstrap its own ecosystem by supporting Android applications and vastly reducing the cost of

multi-homing for application vendors to support its platform [52].

Figure 15 – High-level System architecture of Android and Blackberry OS 10. Adapted from

http://developer.android.com/images/system-architecture.jpg

While numerous other factors have prevented Blackberry from meaningfully contending

for a platform leadership position, the fact Blackberry was able to leverage Android’s success to

substantially enhance the competitiveness of its own proprietary platform illustrates the viability

of this hijacking tactic for the aggressor, and the threat it poses to the platform incumbent. It

should be noted that Blackberry’s tactic would have been nearly impossible or illegal with a

proprietary platform such as iOS.

The lack of control over an open source platform’s design and architecture (Lever 2)

creates significant risks for a platform leader. These risks require constant monitoring and

management. The common open source business model of distribution creation has the potential

of fragmenting the platform and reducing network effects. Distribution creation can also shift

platform boundaries, rendering key assets and assumptions held by the firm invalid by including

or excluding modules. The availability of source code and the ease with which a platform can be

decomposed into individual parts also means that open source platforms can be “hijacked” as

competitors can repurpose key subsystems for competing purposes. The open source platform

leader needs to remain vigilant in identifying such threats and ensuring that it has

countermeasures to defend against them.

Chapter Summary

The mobile operating system space is a highly competitive market involving some of the

technology industry’s most powerful players. The fact that the leading platform (by market

share) is an open source late entrant is a testament to the increasing relevance of the open source

model in the modern computing industry. However, Google’s Android project is an

unconventional open source project in that the contributions of the open source community are

largely limited; Google explicitly chose not to take advantage of the talent within the open source

community in order to maintain greater control over the trajectory of the platform. This decision

clearly indicates that Google does not perceive the primary reason for participating in open

source to be the ability to leverage the resources of the open source community, but rather some

other attribute. With the reasonable assumption that the profit-seeking corporate entity known as

Google is not releasing the intellectual property behind Android for purely altruistic reasons, one

can reasonably infer that the decision to adopt the open source model stems from a desire to

accelerate adoption on both sides of the platform and to catalyze network effects.

While the use of an open source model can remove adoption barriers for platform users,

particularly on the side of complement producers, the forfeiture of intellectual property rights

and design authority significantly limits the means that a platform leader can use to direct the

trajectory of the ecosystem for its own benefit. As a result, platform contenders must find

alternative means of exerting their influence. In order to do so, contenders must first determine

whether inter-network competition (winning against alternative platforms) or intra-network

competition (winning against alternative providers) is the more immediate need and then form a

perspective on how this may shift over time. In addition, the firm must stay abreast of shifts in

the perceived platform boundary, which can expand or contract without its approval, and

ensure that it has the means to remain the primary beneficiary of the platform’s continued growth.

An understanding of the above two factors will shape the behavior of the vendor with regard to

how it interacts with the key suppliers (engineering talent and complement providers) and the

extent to which it intervenes in the purchasing process of complements and the platform itself.

Google’s experience with Android and its challenges in maintaining control of the

platform that it sponsored illustrate the many challenges of managing an open source platform.

Despite the fact that Google has chosen to adopt a relatively closed development model for


advancing Android, competing vendors are fracturing the platform in a manner that is

incompatible with Google’s business objectives. Given that these competing efforts are

completely legal from a licensing perspective, one of Google’s few means of influencing the

ecosystem comes from its control over the key complements within its Google Mobile

Services portfolio. By controlling that key asset and leveraging its initial exclusivity of Android

development expertise, Google is able to strike critical agreements with members of the value

chain in an effort to block out alternative platform providers. These agreements help Google

remain the de facto platform leader for Android, despite the fact that powerful rivals have

emerged.

Perhaps the most surprising lesson from the Google case study is that open source

platform leadership may require access to substantial complement assets and capabilities. Given

that Google is largely staffing the development of Android with its own employees with little

contribution from the community, it appears fair to assert that the decision to open source

Android has not reduced the amount of effort for Google to launch its own mobile platform

offering. However, it is unlikely that Android would have experienced its level of success had it

been launched as a proprietary offering; therein lies the double-edged sword of an open source

platform strategy.


Chapter 3 – A Case Study on Hadoop

History and Origins

Doug Cutting and Mike Cafarella were struggling to solve major scalability problems with

their open source web search engine project, Apache Nutch, when Google engineers Jeffrey

Dean and Sanjay Ghemawat published their paper on MapReduce in December 2004 [53]. Their

implementation of the MapReduce idea ultimately led to the creation of the ‘big data’ platform

now known as Apache Hadoop.

Search engines such as Nutch need to traverse billions of pages in order to generate a

lookup data structure known as a search index, and this is a computationally expensive endeavor

that requires the storage and processing of an enormous amount of data. Given modern hardware,

such a challenge could only be reasonably tackled if the work were massively parallelized

between hundreds or even thousands of computers (also known as nodes) working in a

coordinated manner. The complexity of managing this type of large-scale distributed computing

was beyond what Cutting and Cafarella were able to tackle as part-time open source software

developers. The MapReduce paper represented an elegant solution to this problem by offering a

simple programming model for describing parallelizable processing algorithms and a framework

for executing them. This paper, in combination with a previous paper on the Google File System

[54], described a robust and general-purpose distributed data processing platform for exactly the

type of batch processing that Nutch was doing. Recognizing this, Cutting and Cafarella

implemented the ideas described in the papers using the Java programming language and ported

the major algorithms in Nutch to this framework. This effort allowed Nutch to scale

significantly beyond what the pair had been able to achieve with their previous homegrown

efforts.

Around the same time, internet search provider Yahoo! was prototyping a redesign of its

own distributed processing infrastructure called “Dreadnaught” based on the same MapReduce

and GFS papers under the leadership of Eric Baldeschwieler. After discovering Cutting and

Cafarella’s effort with Nutch, the firm decided to abandon its internal development and adopt the

pair’s work. According to Owen O’Malley, a founding member of Hadoop-vendor Hortonworks

and a member of the original Yahoo! team, there were two main reasons that the team abandoned


its efforts in favor of what was being done in Nutch. Firstly, Cutting and Cafarella’s

implementation was already proven to scale out to dozens of machines in Nutch, while Yahoo!’s

efforts were less mature and unproven. Adopting Hadoop would allow the Yahoo! team to roll

out a cluster of machines for its research staff to experiment with immediately. Secondly, the

individual developers on the Yahoo! team had a preference for working in open source and they

had an easier time convincing the firm’s legal department to do that with Nutch than with

Dreadnaught, since Nutch was already available to the open source community [55]. Yahoo!’s

decision to embrace the Nutch framework was aided by a trio of supportive executive sponsors

in Qi Lu, Jan Pedersen and Raymie Stata, who were leading the search division at Yahoo! in

different capacities at the time. In particular, Stata was a director on the board of the Nutch

Foundation and was familiar with both Hadoop and the team behind it.

Yahoo! hired Doug Cutting in January 2006 and spun Nutch’s distributed processing

framework into its own Apache open source project a month later. Cutting arbitrarily named the

project after his young son’s toy elephant and Hadoop was born.


Hadoop and the Big Data Phenomenon

Today, Hadoop is associated with the phenomenon known as “Big Data”. The term “Big

Data” is attributed to John Mashey of Silicon Graphics and is used to refer to both the

opportunities of, and the challenges with, the rapid growth of available data [56][57]. Doug

Laney of the META Group published a research note in 2001 entitled “3D Data Management:

Controlling Data Volume, Variety and Velocity” which identifies three primary dimensions that

drive the complexities of managing “big data” [58]. In the note, Laney points out that

conventional approaches to data management have limits along each of these dimensions. Data

that exceeds these limits requires novel techniques to be employed. In 2012, Laney (then at the

Gartner Group) published an often-cited definition of Big Data: “Big data is high volume, high

velocity, and/or high variety information assets that require new forms of processing to enable

enhanced decision making, insight discovery and process optimization.” [59].

Characteristic | Description
Volume | The amount of data that needs to be stored, processed and analyzed. When this quantity is increased dramatically, conventional storage and processing techniques either fail disastrously or perform unacceptably.
Variety | The different types of data that need to be stored and processed. Conventional data management has been focused on the management of structured, tabular data generated by transactional systems.
Velocity | The speed at which data needs to be stored, processed or retrieved.

Table 10 - The Three V's of Big Data

With the advent of the internet and various connected, sensor-enabled machines, new data

sources have emerged that significantly increase the requirements on all three dimensions.

Moreover, the value captured within these new sources of data is unknown until they are

analyzed. While there is often general acceptance that there is value locked within these ‘big

data’ sources, the specific means through which that value is unlocked is often unknown at the

time of data collection. With conventional data management technologies, the cost of collecting

such data is often too high to justify the upfront investment. In their 2012

survey, Forrester Research found that 88% of data collected by enterprises were discarded

because the organization could not justify the costs of collecting it. Hadoop addresses this

problem by providing a cost-effective, flexible and scalable platform for collecting and analyzing

data with minimal upfront costs. Understanding how Hadoop resolves this problem requires a

rudimentary understanding of how a Hadoop-based data management approach fundamentally

differs from conventional data management technologies such as the relational database.

The Relational Database

Since the late 1970s, the dominant design for conventional database management systems

has been the relational model. Invented by IBM computer scientist Edgar Codd in the early

1970s, the relational model offers a flexible means of storing information by handling all data as

tuples (i.e. rows in a table) and relations [60]. This model succeeded earlier designs such as the

hierarchical model or the network model and offered a more flexible and efficient means of

representing a wide variety of different data structures. The industry standard for interfacing with

relational database management systems is the Structured Query Language (SQL). SQL is a

declarative programming language, which means that it allows applications or users connecting

to the database to describe what they would like to read from or write to the database without

having to tell the system exactly how to complete that operation. Systems built on the relational

model are known as Relational Database Management Systems or RDBMS.

In order to store data within an RDBMS, users must first define the structure of the

tables, specifying the types of data that will be stored and describing their relationships. This data

model is also known as a “schema”. The existence of a defined schema helps a relational system

enforce data integrity and enable the efficient storage and processing of data. This requirement

to have a defined schema prior to the storage of data is known as “Schema-on-Write”. In

general, defining a robust data model that meets the needs of the application and the business is a

time-consuming task for a sophisticated database designer. This process creates an upfront cost

for businesses that must be paid before the first bit of data is collected and processed by the

system.
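The schema-on-write requirement can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, columns and sample rows below are hypothetical and chosen only for illustration. The structure is declared before any data is stored, a row that violates it is rejected at write time, and the final query states what is wanted rather than how to compute it.

```python
import sqlite3

# Schema-on-write: the table structure must exist before any row is stored.
# (Table and column names are hypothetical, for illustration only.)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchases (
        id       INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
    """
)

# Rows that fit the declared schema are accepted.
conn.executemany(
    "INSERT INTO purchases (customer, amount) VALUES (?, ?)",
    [("alice", 12.5), ("bob", 3.0), ("alice", 7.5)],
)

# A row that violates the schema (missing customer) is rejected at write time.
try:
    conn.execute(
        "INSERT INTO purchases (customer, amount) VALUES (?, ?)", (None, 1.0)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Declarative query: it describes the desired result, not the execution plan.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM purchases GROUP BY customer"
    ).fetchall()
)
print(rejected, totals)
```

The integrity guarantee comes at the price described above: someone has to design the table before the first record can be collected.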


Beyond designing the schema, businesses implementing RDBMSs must also estimate the

quantity of data to be collected, along with the pace at which data will be read or written in order

to determine the combination of hardware and software that is needed to handle that load. This

is known as “system sizing”. While it is possible to “scale out” RDBMSs retroactively after a

system has been implemented and rolled out, this is generally a costly and difficult proposition

for conventional relational systems due to the fact that they are optimized for vertical and not

horizontal scalability.

According to Nikita Shamgunov, CTO of MemSQL, “enterprise-class database systems

run well on powerful hardware, and there are many forces within the industry aligned to make

this happen — not just software vendors, but also hardware manufacturers who want to show

how many more transactions per second they can push on new hardware.” [61]. As a result of

this, significant engineering efforts have been invested into ensuring that RDBMSs possess the

ability to leverage increased capacity on a single machine (“vertical scalability”). However,

increasing the storage or processing power of an individual machine is not always a feasible

option due to the limits in hardware. Expanding the capacity of the overall system by

introducing additional machines (i.e. “horizontal scale out”) is challenging for conventionally

designed database systems. Typically, such efforts require the repartitioning and redistribution of

data in order to account for the new machines. This can introduce lengthy and costly

interruptions to the operations of the system. As a consequence, conservative sizing practices are

often adopted for relational databases, which further increase their upfront costs. Moreover,

even conservative sizing is a difficult proposition for new and emergent “big data” sources

whose eventual volume and velocity are unknown.
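The cost of repartitioning can be made concrete with a small sketch. It assumes a naive scheme in which each record is assigned to a machine by taking its key modulo the number of machines; this is an illustrative assumption and not a description of how any particular relational product distributes data. Under that assumption, adding a fifth machine to a four-machine system changes the home of most records.

```python
def node_for(key: int, node_count: int) -> int:
    """Assign a record to a node by simple modulo partitioning.

    This scheme is assumed for illustration; real systems use many others.
    """
    return key % node_count


def moved_on_resize(keys, old_nodes: int, new_nodes: int) -> int:
    """Count the records whose home node changes when the cluster is resized."""
    return sum(
        1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)
    )


keys = range(1000)
moved = moved_on_resize(keys, old_nodes=4, new_nodes=5)
print(f"{moved} of {len(keys)} records must move when growing from 4 to 5 nodes")
```

Every record that moves has to be copied between machines while the system is in use, which is one way to see why such expansions are disruptive.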

Hadoop to the Rescue

Hadoop offers a fundamentally different approach to the data management problem than

conventional RDBMSs. While the Hadoop platform itself is a generic distributed storage and

computing framework, various database management systems have been built on top of this

framework (e.g. Apache HBase, Accumulo). These systems generally fall in the class of

“NoSQL” (“Not Only SQL”) designs and are not relational in nature. Hadoop usage for “Big

Data” also involves persistence and processing of data through raw files which would not be

classically considered as database systems by computer scientists. Neither of these usages of


Hadoop requires any significant pre-emptive modelling. Instead, the data can be persisted “raw”

in the native output format of the data producer, and the “schema” provided at the time that the

data is retrieved. This approach is known as “schema-on-read” or “late-binding”. As the

structure of the data is not provided at the time of persistence, a schema-on-read system is unable

to enforce the consistency or integrity of the data, nor can it optimize the storage for retrieval in

the manner that “schema-on-write” systems do. However, this approach allows for the deferral

of the significant upfront modelling costs associated with onboarding a new source of data with

relational systems.
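A minimal sketch of the schema-on-read idea follows. The record layout and the `read_with_schema` helper are hypothetical. Raw lines are stored exactly as produced, including inconsistent and malformed ones, and a structure is imposed only when the data is read. Problems that a schema-on-write system would have rejected at insertion surface only at this point.

```python
import json

# Raw records persisted exactly as the producers emitted them. No structure
# was declared up front, so inconsistent or malformed lines were accepted.
raw_lines = [
    '{"user": "alice", "amount": "12.5"}',
    '{"user": "bob", "amount": 3}',
    '{"user": "carol"}',
    "not json at all",
]


def read_with_schema(lines):
    """Apply a (user: str, amount: float) schema only at read time."""
    rows, skipped = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            rows.append((str(record["user"]), float(record["amount"])))
        except (ValueError, KeyError, TypeError):
            skipped += 1  # integrity problems surface only now
    return rows, skipped


rows, skipped = read_with_schema(raw_lines)
print(rows, skipped)
```

A different reader could apply a different schema to the same raw lines, which is the flexibility that late binding buys in exchange for weaker integrity guarantees.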

Due to the fact that the workload required by Google grew at a rapid and unpredictable

pace, MapReduce and the Google File System were designed to allow the company to add

capacity in a cost-effective and flexible manner. The systems were built to run on hundreds or

thousands of inexpensive ‘commodity’ machines, rather than a few expensive but powerful

‘server’ computers, to enable cost-effective and incremental capacity increases. Moreover, the

frameworks were designed so that additional machines could be introduced to the system with

little to no interruption to system operations and minimal human intervention. This property of

“easy scalability” was inherited by Hadoop.

Hadoop’s easy scalability, in combination with the low upfront onboarding costs enabled

by its “schema on read” approach, makes Hadoop an attractive option for the persistence of new

sources of “big data” whose value and magnitude are yet to be understood. Firms can cost-

effectively persist data inside Hadoop without being immediately concerned about how they

would use it and be reasonably assured that Hadoop would scale with their needs. Hadoop also

equips them with the flexible processing framework needed to extract the value from the data

when the time comes. This approach to the management of “Big Data” has become so popular

that the slang term “hadump” has recently been coined by some industry observers to mock

the fact that many Hadoop systems have become “dumping ground(s)” of unused data.

Despite these criticisms, the fact that Hadoop offers a cost-effective solution to the

problem of new emergent sources of data is genuinely valuable in light of the explosion of

available data. Though other distributed and NoSQL technologies exist, Hadoop has become the

leading platform in the big data management space, especially for analytical use cases. In their

2014 research report, Forrester Research characterized Hadoop as “a must-have data platform for


large enterprises, forming the cornerstone of any flexible future data management platform” [62].

By 2015, the Gartner Group estimates that roughly two-thirds of analytical applications will have

integrated Hadoop capabilities [63].

Incumbent enterprise software vendors such as Oracle, IBM and Teradata have taken

notice as Hadoop is increasingly challenging their products as the “center of data gravity” in

enterprise datacenters. According to a 2014 survey by Wikibon, Hadoop had displaced

traditional data warehouses for some workloads at 61% of the organizations surveyed; another 34% were

expecting to shift some workloads over to Hadoop within the next six months [64]. While it is

unclear if Wikibon’s sample is representative of the industry at large, the response does seem to

echo the increasing interest among corporations in utilizing technologies beyond conventional

relational data warehouses to manage the rising tide of new “big data” sources they face. Google

Search Trends data back up this sentiment as the popularity of the term “Data Warehouse”

declined while “Hadoop” and “Big Data” rose over the past decade (Figure 16).

Figure 16 – Google search popularity of “Hadoop” and “Big Data” vs. “Data Warehouse” [65]

[Figure 16 chart: Google Trends search interest (2004 - 2014) for “hadoop”, “big data” and “data warehouse”; vertical axis from 0 to 120.]


Architectural Overview

While Hadoop originally referred to the open source implementation of the Google File

System and MapReduce framework, the term “Hadoop” is now used to refer to the collection of

technologies that have coalesced around those two original technologies. A diagram of the various

components (along with some common open source or proprietary implementations) that were

commonly found in the prototypical Hadoop application stacks at the time of writing is presented

in Figure 17.

Figure 17 – Major building blocks within a Hadoop application stack (popular proprietary / open source projects fulfilling a

given role in parentheses).

In the sections below, some of these key components are introduced to provide an

understanding of how these individual components helped Hadoop become the de facto platform

for Big Data. This understanding is necessary as a foundation for discussing the strategies of

different platform competitors and their complementors.


Distributed Storage

The distributed storage layer within the Hadoop stack is responsible for managing the

reliable and efficient persistence of data managed by the system. As Hadoop was designed to run

on low-cost “commodity” hardware that can be prone to failure, the distributed storage layer is

responsible for providing resilience in the face of hardware failures. It does so by managing

redundant copies of the data across different machines transparently. Because the data

managed by Hadoop tends to be extremely large (i.e. measured in terabytes or petabytes),

Hadoop assumes that it is more efficient to move “computation to the data” rather than the

reverse and provides mechanisms to do this.
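The resilience provided by redundant copies can be sketched as follows. The round-robin placement rule, the node names and the replication factor of three are assumptions made for illustration; this is not the placement policy of HDFS or of any other real file system. Because each block lives on three distinct machines, any two simultaneous machine failures leave every block readable.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Place each block on `replication` distinct nodes, round-robin.

    This placement rule is a simplification for illustration; it is not the
    policy used by HDFS or any other real file system.
    """
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {
            nodes[(i + r) % len(nodes)] for r in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
blocks = [f"block-{i}" for i in range(10)]
placement = place_replicas(blocks, nodes)

# Two machines fail at once; every block still has a surviving copy.
survivors = readable_blocks(placement, failed_nodes=["node-b", "node-c"])
print(len(survivors), "of", len(blocks), "blocks still readable")
```

A computation scheduled on any of the machines holding a replica can read its block locally, which is one way to picture moving “computation to the data”.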

The Hadoop Distributed File System (HDFS) was the component that was originally built

to meet the needs of this layer and it remains the most popular component for storage within the

extended Hadoop stack today. However, other options exist, including MapR’s proprietary Distributed

File System, IBM’s General Parallel File System (GPFS), Amazon’s Simple Storage Service

(S3) and UC Berkeley’s Tachyon [66]. Moreover, many non-Hadoop distributed NoSQL systems

such as Apache Cassandra and MongoDB with their own storage subsystems have also been

adapted to interoperate with the rest of the Hadoop stack.

Job Managers and Coordinators

The role of the Job Manager or Coordinator is to orchestrate the execution of

computation across the many computing nodes within a Hadoop cluster. Originally, Hadoop was

designed to handle only batch-based MapReduce jobs used for “embarrassingly parallel”

(computing problems that are trivial to break apart and parallelize) tasks such as the page-

indexing operation required by Apache Nutch. As a result, the original component for managing

that computation was directly integrated into the MapReduce processing

framework itself. This component was known as the Job Tracker.

As more data is deposited within the Hadoop file system, the desire to run different types

of non-MapReduce programming models and interactive workloads has correspondingly

increased. This desire required an ability for the framework to manage computing resources for

these different use cases accordingly. As a consequence, a more sophisticated coordinator,

Apache YARN (“Yet Another Resource Negotiator”) was created to handle different types of

computing workloads [67].


While the job manager is arguably one of the most central and critical components within

the Hadoop stack, there is not much competition in this space. YARN’s only notable alternatives

at the time of writing are the open source Mesos framework created by Berkeley’s AMPLab, as

well as the framework-specific job manager originally integrated into MapReduce.

Distributed Processing Frameworks

The MapReduce distributed processing framework that Cutting replicated in Hadoop was

designed to offer a simple programming model for software developers writing highly

parallelizable programs. Structuring a computational problem so that it can be reliably processed

by a large number of computers in parallel had previously been a complex task. MapReduce

solved this problem by requiring developers to break their algorithms into two eponymous

steps: “Map” and “Reduce”. The “Map” step partitions the required data into groups and the

“Reduce” step processes the data within each group and summarizes it. For example, in order to

find how many books any given author wrote in a large unsorted library, a map

function can partition the library based on the author’s name and the reduce function can count

how many books are in each partition. MapReduce’s unique value is that it can execute this

process for a library that is spread over thousands of computers and efficiently deliver the result.

Figure 18 illustrates this process conceptually.

Figure 18 – Diagram of basic MapReduce execution, taken directly from Jeff Dean and Sanjay Ghemawat’s original article on

MapReduce [53].
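The books-per-author example can be sketched in a few lines of Python. This is a single-process illustration of the programming model only, not Hadoop's API: the function names, the `run_mapreduce` driver and the sample library are hypothetical, and a real framework would run the map and reduce calls on many machines and handle the grouping step across the network.

```python
from collections import defaultdict

# A tiny stand-in for a library; the titles and authors are made up.
library = [
    {"title": "Book A", "author": "Smith"},
    {"title": "Book B", "author": "Jones"},
    {"title": "Book C", "author": "Smith"},
    {"title": "Book D", "author": "Lee"},
    {"title": "Book E", "author": "Smith"},
]


def map_book(book):
    """Map step: emit one (key, value) pair per book, keyed by author."""
    yield book["author"], 1


def reduce_author(author, counts):
    """Reduce step: summarize all values that share a key."""
    return author, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process driver: map, group by key (the shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


books_per_author = run_mapreduce(library, map_book, reduce_author)
print(books_per_author)
```

The developer writes only `map_book` and `reduce_author`; because each call depends solely on its own input, the framework is free to distribute the calls across as many machines as are available.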


As long as a given computing algorithm can be structured this way, MapReduce is able

to ensure that it can be reliably distributed across massive clusters of computers. As it turns out,

many complex algorithms can be decomposed into a series of MapReduce steps, making Hadoop

a versatile tool for tackling all sorts of Big Data problems.

However, the single-framework approach in Hadoop is suboptimal for a number of

reasons. Firstly, as MapReduce was implemented as a batch-processing framework, it had

significant overhead and inefficiencies that make it unusable for interactive end user computing.

Secondly, MapReduce stores all intermediate results back into the Distributed File System

(partially as a means of ensuring failure resilience) which makes the framework inefficient for

algorithms that tend to iterate over the same dataset repeatedly. Many popular

machine learning algorithms useful for extracting insight out of “Big Data” fall into this

category of “iterative” algorithms. Finally, MapReduce was an inefficient programming model

for most software developers. The framework forced developers to formulate their problems in

an unintuitive way, which significantly diminished developer productivity [68].

The Hadoop community resolved this third issue of developer efficiency by developing

new abstractions that sat on top of MapReduce. This included engines such as Pig and Hive,

which enabled programmers to develop in PigLatin (a procedural programming language for data

transformation) and SQL respectively. The community also developed libraries such as Mahout,

which offered a repository of ready-made Machine Learning algorithms so that individual

developers did not have to wrestle with MapReduce directly themselves. However, these efforts

did not address the fundamental deficiencies that MapReduce had as a framework for interactive

computing or iterative processing.

Spark, a framework originating from Berkeley’s Algorithms, Machines and People

(AMP) Laboratory, attempts to address most of these problems. Originally created as a part of

the AMPLab’s Berkeley Data Analytics Stack (BDAS), Spark was primarily developed by

Berkeley researchers independent of the Hadoop community. However, they worked to integrate

their technology into Apache Hadoop and it has since been embraced by the Hadoop community.

The AMPLab submitted Spark as an Apache Incubator project in June of 2013 and it was

accepted as a top level Apache project in February 2014 [69].


In addition to Spark, Apache Tez was also created to address the limitations in

computational complexity of the original MapReduce framework. Both Tez and Spark were

influenced by a Microsoft Research paper on a system called Dryad [70]. Although the original

MapReduce was able to process complex algorithms by connecting one job with another, the

framework was designed to handle a relatively simple two-stage processing pipeline with a

single input and output at each stage. Dryad (and therefore Tez and Spark) offers an arbitrary

number of inputs and outputs at each stage, enabling the expression of a complex processing

graph. This fundamental improvement of the processing framework and YARN’s

management framework improvements discussed in the previous section are considered the core

parts of the Apache community’s “Hadoop 2” efforts, which seek to make Hadoop a general

purpose distributed processing framework rather than one used only for batch processing [71].
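The difference between a fixed two-stage pipeline and a general processing graph can be sketched as follows. This is not the API of Dryad, Tez or Spark; the stage names, the sample data and the `run_graph` helper are hypothetical, and the sketch runs in a single process using the standard library's graphlib module (Python 3.9 or later). The “join” stage consumes two inputs, which a single map-then-reduce pipeline cannot express directly.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage lists the stages whose outputs it consumes. "join" has two
# inputs. All stage names and data here are hypothetical.
stages = {
    "load_orders": [],
    "load_customers": [],
    "join": ["load_orders", "load_customers"],
    "total_by_region": ["join"],
}

functions = {
    "load_orders": lambda: [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)],
    "load_customers": lambda: {"c1": "west", "c2": "east"},
    "join": lambda orders, customers: [
        (customers[cust], amount) for cust, amount in orders
    ],
    "total_by_region": lambda joined: {
        region: sum(a for r, a in joined if r == region)
        for region in {r for r, _ in joined}
    },
}


def run_graph(stages, functions):
    """Run every stage once its inputs are ready, in dependency order."""
    results = {}
    for name in TopologicalSorter(stages).static_order():
        inputs = [results[dep] for dep in stages[name]]
        results[name] = functions[name](*inputs)
    return results


results = run_graph(stages, functions)
print(results["total_by_region"])
```

In this sketch the intermediate results stay in the `results` dictionary in memory; a chain of separate MapReduce jobs would instead write each intermediate result to the distributed file system before the next job could start.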

Beyond offering the ability to execute more complex processing graphs, Spark also

introduced “In-Memory” computing concepts to distributed processing through a novel

abstraction known as the “Resilient Distributed Dataset” (RDD). The abstraction allows Spark

to reliably avoid persisting intermediate results back to disk, enabling the framework to execute

iterative workloads orders of magnitude faster than MapReduce or Tez. As a result of this

advantage, Spark has attracted significant attention in both academia and industry. All major

distributions of Hadoop now include Spark. Additionally, a growing set of applications,

scripting engines and libraries have ported their MapReduce algorithms over to Spark. In July of

2014, Cloudera, Databricks, IBM, Intel and MapR announced a partnership to help the Hadoop

community standardize on Spark as the “framework of choice” by porting popular components

such as Hive and Pig to Spark [72]. At the time of writing, Spark appears to be positioned to

succeed MapReduce as the de facto processing framework for Hadoop systems.

Scripting Engines, Libraries and SQL on Hadoop

As mentioned in the previous section, one of MapReduce’s major drawbacks is the fact

that it forces software developers to rethink their algorithms using a framework structured for

efficient processing by computers. In response to this, the Hadoop community created

abstractions in the form of Domain-Specific Languages and libraries to enable developers to be

more effective. For example, the Pig Scripting Engine offers an imperative programming language

(“Pig Latin”) that provides developers with operations and data structures useful for


manipulating structured datasets. Developers are able to write their data transformation

programs using this developer-friendly language and the Pig Scripting engine internally

translates these operations into machine-friendly MapReduce jobs, thereby offering the massive

parallelism and efficiency of Hadoop without imposing the burden of understanding the

associated complexity on developers [73]. Similarly, the Apache Mahout project seeks to

accelerate the development of machine learning programs on Hadoop by offering a library of

common machine learning algorithms that developers can leverage.

Unlike the other layers in the stack mentioned in the previous sections, this layer is optional: it is

entirely possible for a fully functional Hadoop system to exist without it. Therefore, these

components should not be considered part of the core technical platform. However, from the

perspective of evaluating Hadoop as an industry platform, these components are absolutely

critical. Many of these libraries are so commonly used within their respective domains that they

have become the de facto interface into the Hadoop platform. A large number of key

complements that create value for the Hadoop ecosystem depend on the interfaces presented by

these components. Due to the fact that some complementary applications depend exclusively on

the interfaces presented by these components, these layers actually make the underlying core

platform-level components substitutable.

One notable class of such components is that enabling SQL (Structured Query

Language) connectivity to Hadoop data. As discussed in the previous section on the relation

between Hadoop and the Big Data phenomenon, SQL is the industry standard used to interface

with conventional relational databases and has been in heavy utilization since it was first

commercially implemented in Oracle V2 in 1979 [50]. SQL and the emergence of middleware

standards such as ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity)

have made possible a large ecosystem of analytical software. Analytical software vendors

are able to create tools that support a vast variety of databases from different vendors by simply

obeying the SQL standard. The vibrancy of the analytics software market has created a large

number of sophisticated tools for business users, analysts and data scientists to extract insights

out of the data deposited within relational data sources. SQL-on-Hadoop components allow these same tools

to be connected to data stored within Hadoop.


The ability to use SQL to connect to Hadoop also makes it substantially easier to combine

and integrate data between a traditional relational data store and data stored in Hadoop. This

appeals to both users who seek to gain insight with data split across these two different types of

systems as well as vendors of traditional data warehouses (e.g. IBM, Microsoft, Oracle,

Teradata). Traditional vendors can maintain their positions as the “centers of data gravity” by

allowing Hadoop data to be federated or “managed” through their relational data platforms.

Table 11 enumerates some of the SQL on Hadoop offerings that GigaOm Research evaluated in

2013.

Vendor / Community | Product / Project Name
Cloudera | Impala
Hadapt | Adaptive Analytical Platform
Teradata | SQL-H
EMC Greenplum | HAWQ
Citus Data | Citus DB
Splice Machine | Splice Machine
Apache | Drill, Hive, Stinger
JethroData | JethroData
Concurrent | Lingual

Table 11 - A selection of SQL on Hadoop offerings as identified by GigaOm Research in 2013 [63]

Administration and Management

Hadoop’s approach to scalability and resilience is fundamentally different from the

strategy typically employed in traditional enterprise data centers. In order to offer cost-effective

scalability, Hadoop was built to run on “commodity” hardware that is more prone to failure

than the types of dedicated servers traditionally found within data centers. Moreover, as the

number of computing nodes within a given cluster increases, the probability that there is a failure

within the system at some point also increases. Given that it was built to operate clusters

containing thousands of computing nodes, Hadoop (and its Google predecessor) was designed to

treat “failures as the norm rather than the exception” [54].
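
The relationship between cluster size and failure probability can be made concrete with a short sketch. It assumes, for illustration only, that nodes fail independently and that each node has the same hypothetical probability `p_node` of failing within some fixed period; neither the independence assumption nor the 0.1% figure comes from the sources cited in this thesis.

```python
def p_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails in a period,
    assuming independent failures with per-node probability p_node."""
    return 1.0 - (1.0 - p_node) ** n_nodes


# Hypothetical per-node failure probability of 0.1% per period.
for n in (1, 10, 100, 1000, 4000):
    print(f"{n:>5} nodes: {p_any_failure(0.001, n):.4f}")
```

Even with a small per-node probability, the chance of at least one failure approaches certainty as the node count reaches the thousands, which is consistent with the design stance of treating failure as the norm.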

This approach fundamentally affects the work of the datacenter operator, who must

become continuously involved by proactively maintaining the health of the cluster. While such a

mode of operation is familiar to operators in internet / cloud services companies such as Yahoo!

and Google, this represents a novel challenge for enterprise IT departments. This challenge is

exacerbated by the fact that the community of developers who contribute to Hadoop tends to be

employed by internet / cloud service companies. As a result, the administrative consoles in the

community-delivered versions of Hadoop were originally designed to suit their preferences. For

example, Hadoop originally offered only a rudimentary graphical user interface for managing the

basic operations of the cluster, leaving the majority of configuration and management tasks for

scripts, configuration files and API calls. This minimalistic approach was sufficient for the more

technical developer-operators employed by internet / cloud service companies, but enterprise IT

administrators tend to rely on graphical management consoles to simplify their work and have

come to expect this functionality in most software that resides in their datacenters.

Commercial vendors such as MapR and Cloudera have filled this gap with their own

proprietary solutions and use these solutions as a means of differentiating their offerings from that

of the free community. A free and open source management console was not created until

commercial vendor Hortonworks initiated the Apache Ambari project as a part of delivering its

own distribution of Hadoop in 2011 [74].

Market Overview

In its 2014 research report, the Forrester group characterized the Hadoop market as a

fragmented market where there were “lots of leaders, but none dominate” [62]. The researchers

divided the market into the six major types of players described in Table 12.

Name | Description
Apache Open Source | Users can directly deploy what is made available by the open source community without engaging commercial firms.
Pure play Hadoop Vendors | A number of start-ups have emerged to profit with a “focus on developing, supporting and marketing unique Hadoop distributions, add-on innovations and services”. The “Big 3” of this group are Cloudera, Hortonworks and MapR Technologies.
Enterprise Software Vendors | Enterprise software vendors such as IBM, Oracle, Pivotal, SAP and Teradata offer Hadoop as a part of their own data management solutions. They do so either by creating their own Hadoop distributions or by supporting an existing distribution through partnership.
Hadoop in the Cloud | Cloud computing vendors such as Amazon and Microsoft have begun to offer “on-demand” Hadoop services. This allows enterprises to purchase Hadoop as a service and scale their Hadoop clusters up or down at a moment’s notice.
Big Data Solution Providers | Solution providers are system integrators that design solutions using technologies from others within the Hadoop ecosystem.
Hadoop Accessories | Forrester uses this group to refer to the tools and services that complement the core Hadoop platform.

Table 12 – Breakdown of the Hadoop market according to Forrester Research [62]

Amongst these six categories, the group that receives the most media attention is the pure

play Hadoop Vendors. This group is led by three startups that have combined to raise over $1.6

billion USD in private equity and venture capital in the five years between March of 2009 and

July 2014 (Figure 19). It is worth noting that the $1.6 billion is only the amount of funding

that has been put into the three firms. The actual valuation of the three companies is significantly

higher. For example, Intel’s $740 million investment into Cloudera in March of 2014 was

exchanged for 18% of Cloudera’s equity, effectively valuing Cloudera at over $4 billion.
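
The implied valuation follows from dividing the amount invested by the equity share received. The sketch below reproduces that arithmetic using the two figures quoted above; the function name is hypothetical, and the calculation makes the simplifying assumption that the entire investment purchased the stated share of equity.

```python
def implied_valuation(investment_musd: float, equity_share: float) -> float:
    """Implied company valuation (in millions of USD), assuming the whole
    investment was exchanged for the stated fraction of equity."""
    return investment_musd / equity_share


# Figures quoted in the text: $740 million for an 18% stake.
valuation = implied_valuation(740, 0.18)
print(f"Implied valuation: ${valuation:,.0f} million")
```

The result is roughly $4.1 billion, in line with the "over $4 billion" figure stated above.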

Cloudera, Hortonworks and MapR were all founded with the intent of bringing the

Hadoop platform from the realm of specialized internet companies to the IT departments of

enterprises. As such, these firms should be considered platform providers in the topology of

Eisenmann, Parker and Van Alstyne.

Figure 19 – Cumulative Investments in Pureplay Hadoop Vendors according to CrunchBase in 2014 [75]

(Chart: Total Funding of Pureplay Hadoop Vendors, in millions of USD; series: Cloudera, Hortonworks, MapR Technologies, Total.)

Vendor | 2009 | 2010 | 2011 | 2012 | 2013 | 2014
Cloudera | $11 | $36 | $76 | $141 | $141 | $1,201
Hortonworks | $- | $- | $48 | $48 | $98 | $248
MapR Technologies | $9 | $9 | $29 | $29 | $64 | $174
Total | $20 | $45 | $153 | $218 | $303 | $1,623

Table 13 – Cumulative Investments in Pureplay Hadoop Vendors according to CrunchBase in 2014 (in Millions of USD) [75]
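
Because Table 13 reports cumulative figures, the new funding raised in each year is not shown directly. The following sketch derives it from the table values, treating the “$-” entries as zero (an assumption), and also recomputes the Total row as a consistency check. Variable names are illustrative only.

```python
years = [2009, 2010, 2011, 2012, 2013, 2014]

# Cumulative funding in millions of USD, transcribed from Table 13.
# Assumption: "$-" entries are read as 0.
cumulative = {
    "Cloudera": [11, 36, 76, 141, 141, 1201],
    "Hortonworks": [0, 0, 48, 48, 98, 248],
    "MapR Technologies": [9, 9, 29, 29, 64, 174],
}

# Column sums reproduce the Total row of the table.
totals = [sum(column) for column in zip(*cumulative.values())]

# Year-over-year differences give the new funding raised in each year.
new_funding = [totals[0]] + [b - a for a, b in zip(totals, totals[1:])]

for year, total, new in zip(years, totals, new_funding):
    print(f"{year}: cumulative ${total}M, new ${new}M")
```

Read this way, the majority of the cumulative total was raised in 2014 alone, driven largely by the increase in Cloudera's figure that year.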

While pure play Hadoop vendors tend to be the focus of analyst attention, they currently

trail significantly behind enterprise software vendors in capturing value from the Big Data

market. According to estimates provided by the research firm Wikibon, the combined revenue of

the three major Hadoop vendors totaled approximately $163 million USD in 2013. As all three

firms are privately held, Wikibon “triangulated” these numbers through discussions with various

industry observers, company insiders and other sources. Consequently, these numbers must be

used with caution. In fact, the November 2014 Form S-1 provided by Hortonworks as part of its

initial public offering (IPO) application reveals that the gross billings of the company were

substantially less than what Wikibon had estimated [76]. Nevertheless, these numbers are useful

for illustrating the relative scale of the “Big 3” Hadoop pure play vendors compared to the scale

of enterprise software vendors within the Big Data space (Figure 20).

Figure 20 - Big Data-related Software and Services Revenue of the Top 3 Enterprise Software Firms vs. Pureplay Hadoop

Vendors in millions of USD (from Wikibon, processed data in Table 18 of Appendix) [77]

(Chart: Big Data Related Software and Services Revenue 2013; series: Big Data Services Revenue, Big Data Software Revenue.)

While the bulk of big data revenue for enterprise software firms comes from their

traditional data warehousing products based on relational database technology, enterprise

vendors are also making sizable investments into the Hadoop world. This investment is

necessary as the market interest in Hadoop increases and as the attention of their customers shifts

towards solving the types of problems that Hadoop is well-equipped to solve (i.e. unstructured or

semi-structured data, large data of unknown value and usage). IBM created its own Hadoop

distribution in 2011 as a part of its IBM BigInsights analytical offering and has since built out a

number of proprietary tools and technologies to work on top of Hadoop. EMC-spinoff Pivotal

created a comparable offering in its Pivotal HD product line. Others like Microsoft, HP, Oracle

and Teradata have partnership arrangements with Cloudera, Hortonworks and MapR to bundle,

resell or redistribute their Hadoop distributions. This creates an interesting tension for enterprise

software vendors as they need to rationalize and position Hadoop alongside their existing

offerings. With the notable exception of Pivotal, the majority of these firms position Hadoop as

a complementary component within a larger big data platform, rather than a platform itself

(Table 14).

Vendor | Sample External Positioning of Hadoop
IBM | “New data management and analytic technologies are being implemented to complement rather than replace traditional approaches to data management and analytics. Thus Apache Hadoop does not replace the data warehouse and NoSQL databases do not replace transactional relational databases” [78]
SAP | “SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA and SAP BusinessObjects enabling a broad range of new analytic application” [79]
Oracle | “New big data technologies, such as Hadoop and Oracle NoSQL database, run alongside your Oracle data warehouse to deliver business value and address your big data requirements” [80]
Teradata | “Teradata Unified Data Architecture is the only truly integrated analytics solution that unifies multiple technologies into a cohesive and transparent architecture that leverages the best-of-breed complementary value of Teradata, Teradata Aster and open source Hadoop” [81]
Microsoft | “The Microsoft Analytics Platform System is a no-compromise modern data warehouse solution that seamlessly combines a best-in-class relational database management system, in-memory technologies, Hadoop, and cloud integration in a turnkey package built for Big Data Analytics” [82]

Table 14 – Sample Hadoop positioning statements by Enterprise Software vendors

In addition to being some of their most formidable inter-network and intra-network

platform competitors, enterprise software vendors are also some of the most valuable partners for

pure play Hadoop vendors. Mega vendors such as IBM, SAP and Oracle are frequently the

providers of the products and services that are critical complements for Hadoop. In fact, every

single enterprise software vendor listed above possesses partnership arrangements with one or more of

MapR, Cloudera and Hortonworks (Table 15).

Cloudera Hortonworks MapR

IBM X X X

SAP X X X

Oracle X X

Teradata X X X

Microsoft X X X

Table 15 - Partnership matrix between pure play vendors and enterprise software vendors [83]–[85]

Of course, enterprise software vendors are not the only providers of key complements for

the Hadoop platform. “Hadoop in the Cloud” providers as well as independent software vendors

(ISVs) creating “Hadoop Accessories” also play a critical role in completing the ecosystem. In

the former category, two vendors are especially worth highlighting.

According to the 2014 version of the Gartner group’s Magic Quadrant for Cloud

Infrastructure as a Service, Amazon is the leading provider of cloud Infrastructure-as-a-Service,

leading its only competitor (Microsoft) within the “leaders” quadrant by a significant margin in

both “Completeness of Vision” and “Ability to Execute” [86]. Amazon built upon this leadership

position to establish a significant presence in the Hadoop market with its Hadoop-as-a-Service

(HaaS) offering, Elastic MapReduce (EMR). From a technical perspective, EMR differs

substantially from the canonical Apache-based Hadoop stack in that key components (e.g. the

distributed storage layer) are substituted with Amazon’s proprietary web services (e.g. Amazon

S3). According to a study by Accenture in 2013, cloud-delivered Hadoop services are superior to

on-premise “bare-metal” deployments in price, performance as well as flexibility [87]. As a

result of these advantages, adoption of cloud-delivered Hadoop services, and consequently,

Amazon’s influence over the Hadoop market, is expected to grow.

The other “Hadoop in the Cloud” vendor worth mentioning is Berkeley start-up

Databricks. Databricks offers a cloud-delivered version of its Hadoop platform variant, featuring

the open source Apache Spark technology. As discussed in the architectural overview section on

distributed processing frameworks, key Hadoop players such as Cloudera, MapR and IBM have

embraced Apache Spark as a likely successor for MapReduce and the framework is rapidly

becoming the standard execution engine for Hadoop complements. At present, the firm’s only

product offering (Databricks Cloud) is a nascent and nominal entry in the emerging Hadoop-as-

a-Service market. However, beyond employing Spark creator Matei Zaharia as its CTO,

Databricks also employs 30% of all approved “committers” for Apache Spark, the most of any

organization (Figure 21). This unique competency affords Databricks a disproportionately large

reach and influence over the Hadoop ecosystem relative to its scale and size.

Figure 21 - Official Committers to Apache Spark by Organization [88]

Strategic Factors affecting Platform Leadership within the Hadoop Ecosystem

In the following section, an analysis of the critical factors impacting a firm’s ability to

direct the trajectory of the Hadoop ecosystem is presented using the framework outlined in

Chapter 2. This analysis will focus on the three “pure play” Hadoop vendors as their long term

success is most directly affected by their ability to harness the growth of the Hadoop platform.

It is worth reiterating at this point that the objective of this thesis is not to assess which

Hadoop firm is most likely to succeed. Instead, this analysis is intended to identify the firms’

assessments of the market forces and how their assessments affect their strategies. Given the

rapid changes that are occurring daily within the Hadoop ecosystem, it is entirely possible that a

firm’s perspective of the market will have changed by the time this thesis is published or

consumed, rendering the details of this analysis outdated. However, the observed behavior of

each of the firms is consistent with its perspective at the time of writing, so the analysis is

nevertheless useful for illustrating how a firm’s assessment of these market forces materially

affects its strategies.

(Figure 21 chart: Committers to Apache Spark, by organization. Legend: Databricks; UC Berkeley; Yahoo!; Quantifind; Mxit; ClearStory Data; Groupon; National University of Singapore; Webtrends; Bizo; Alibaba; Imaginea, Pramati, Databricks.)

Rivalry - Inter-network vs. Intra-network Competition

One of the fundamental questions that a pure play vendor must answer for itself is

whether its primary competitors are other Hadoop providers (intra-network competition) or

alternative platforms (inter-network competition) such as those offered by Teradata, Oracle or

IBM. This assessment affects all aspects of the business’s strategy, including how it positions

its products to the market, the technical areas within the platform that it chooses to invest in,

the partnerships that it chooses to pursue and its interactions with the open source

community.

Current market share leaders Cloudera and Hortonworks appear to differ in their

assessments of which competitive battle is most critical to long-term success. In October 2013,

Cloudera began marketing its commercial Hadoop distribution as an “Enterprise Data Hub” and

began articulating a vision that describes its Hadoop-based platform as a “unified data

management platform” capable of addressing all data management needs of the enterprise [89].

The firm posits that Hadoop’s superior cost-effectiveness and flexibility make it the natural

“center of data centers as the first place data goes when it enters the enterprise, rather than at the

side of the data center to solve a few, ancillary problems” [90].

While Cloudera has since been careful to clarify that it does not intend to position its

product as an immediate alternative to specialized solutions like the traditional Enterprise Data

Warehouses that its large partners offer, it has also been clear on its perspective that “workloads

that belong in high-end enterprise data warehousing systems today, won’t in the future – and

even high-performance, interactive analytic workloads will run in Hadoop” [91]. Cloudera

describes its “Enterprise Data Hub” distribution of Hadoop as a complete data management

platform for companies with a multitude of data management needs, and not as a point solution

used to fill the gaps left by traditional relational technologies. In early 2014, Cloudera’s director

of marketing Alan Saldich was quoted as saying that Cloudera has “many, many customers that

are substituting an enterprise data hub built on Hadoop for incremental purchases of a whole

range of data management infrastructure, including relational databases, enterprise data

warehouses, storage, and mainframes”. In the same article, he also asserted that Cloudera’s

customers are not comparing its product to alternatives from Hadoop-vendors like Hortonworks,

but rather to solutions from IBM or Teradata [92]. In other words, Cloudera’s ambitions are not

to become the leading provider of Hadoop-distributions but to become a leading provider of data

platforms, Hadoop or otherwise (inter-network competition).

Implications on Lever 3 (External Relationships)

Cloudera’s view of Hadoop as a platform that can eventually displace traditional data

platforms like the relational database or the enterprise data warehouse clearly differs from the

public position of its rival Hortonworks. Rob Bearden, CEO of Hortonworks, describes Hadoop

as a “rock solid” platform for processing unstructured data and expressed no desire to “reinvent

the wheel” and compete with relational database vendors such as IBM, Oracle or Teradata.

Bearden and Hortonworks espouse a “coexistence” view of Hadoop that more narrowly positions

the technology as a complement to traditional data management technologies. Hortonworks has

correspondingly invested in having its distribution “adopted and integrated seamlessly into

[these] environments”. Bearden argues that Hadoop can extend traditional data platforms such

as Teradata and “let it manage a much bigger data set, a 10 to 20 times bigger data set and have

Hadoop as an extension of its architecture” [93]. In other words, Hortonworks does not see a

need to position Hadoop as a viable alternative to traditional data management platforms but

rather focuses on competing effectively against intra-network Hadoop competitors by offering

superior integration into traditional data management environments.

The two firms’ differing emphasis on inter-network versus intra-network competition

affects the ease with which each is able to engage with other members of the

ecosystem. In some respects, Hortonworks’ positioning of Hadoop as a complement of existing

data platforms is more compatible with the perspectives of larger enterprise software vendors and

has likely assisted the firm in striking lucrative reseller arrangements with some of these giants.

HP, Microsoft, SAP and Teradata all have “strategic reseller” arrangements with Hortonworks

[91]. In 2013, Hortonworks’ arrangement with Microsoft was responsible for over 55% of

Hortonworks’ total revenue [76] and the Redmond giant actually embeds a variant of the

Hortonworks Data Platform in its HDInsight offering on its Azure cloud platform. Similarly, the

Teradata Portfolio of Hadoop redistributes a variant of Hortonworks’ HDP branded as “Teradata

Open Distribution for Hadoop” [94].

One could argue that Cloudera’s relationships with these enterprise vendors are

less intense. For example, while Cloudera recently announced a significant go-to-market

partnership with Teradata (as of 2014), Teradata still does not pay Cloudera for its technologies

in the way it pays Hortonworks; Teradata trails only Microsoft and Yahoo! in terms of

contribution to Hortonworks’ top line [76]. While Cloudera’s distribution is supported and has

been sold by HP as part of its AppSystem solution since 2012, HP made a $50 million USD

investment in Hortonworks in 2014 that appears to reflect its partnership preference [95].

However, some industry analysts have observed that enterprise software vendors are finding that

it is advantageous “to be polygamous in their relationships with Hadoop distro providers” [96].

Consequently, it is unclear if Hortonworks’ positional advantage in collaborating with enterprise

technology vendors is material or sustainable.

Implications on Lever 1 (Scope of the Firm)

While Cloudera’s intention of competing with platforms beyond Hadoop may have

impeded its collaboration with enterprise partners, this ambition appears to provide the firm with

a vision for the future of the Hadoop platform that has moved the platform forward. This

vision has also affected the firm’s “Scope of the Firm” (Lever 1) decision making.

One example of this is Cloudera’s decision to invest in the creation of a “Fast SQL”

engine for Hadoop called Impala. In a private interview completed for this thesis, the firm’s

chairman Mike Olson shared that it was obvious “that many [existing proprietary SQL engines]

would eventually be ported to Hadoop, because Hadoop matters” [97]. Given such a

perspective, a firm focused on competing effectively with other Hadoop vendors would likely

choose to partner with an incumbent SQL vendor, rather than build yet another competitor in the

crowded space and compete with its own SQL entry. However, as SQL was a crucial interface

that connects a significant number of pre-existing complementary applications, Cloudera

believed that it was crucial for a fast SQL engine to be part of Hadoop’s “open core”, rather than

be a proprietary component external to the core platform. As a result, the firm invested into

developing Impala as a Cloudera-governed open source project. The firm believed that the

inclusion of Impala as part of Cloudera’s distribution of Hadoop was necessary to bolster its

competitiveness against traditional data management solutions.

Cloudera’s decision reset expectations of what should be available “out-of-the-box”

within a Hadoop distribution and sparked the development of additional fast-SQL-in-Hadoop

projects like Apache Stinger, which have further bolstered the viability of the Hadoop platform.

Olson cites Cloudera’s integration of Apache Solr-based search and its embrace of Apache Spark

as other examples of the firm’s continued thought leadership in making Hadoop the leading big-

data platform. One can argue that Cloudera’s focus on competing effectively against alternative

platforms has caused it to push the boundaries of the Hadoop platform extent, benefiting the

growth of the ecosystem as a whole.

While Hortonworks and MapR have also developed a number of significant new

technologies that improve Hadoop’s viability, they have been less aggressive in changing the

platform extent of Hadoop by introducing new capabilities to the platform. Rather, the firms

have focused their efforts on offering improved implementations of capabilities that were already

available in the market. This differing investment philosophy reflects the focus of the pair on

competing effectively within the Hadoop ecosystem rather than competing beyond it. For

example, while Hortonworks has been responsible for the engineering horsepower behind

substantial projects such as Ambari and Stinger, these projects were not breaking new ground

in offering new functionality to the Hadoop market, but were rather Apache-community

implementations of capabilities that were already available to the ecosystem at large through

proprietary extensions developed by companies such as Pivotal and Cloudera. Similarly, MapR

has focused its proprietary engineering effort on offering superior implementations of core

components such as the distributed file system, as it strives to become the most enterprise-ready

distribution of Hadoop available. While this allows MapR to differentiate itself from the likes of

Cloudera and Hortonworks, its investments in this area also do not introduce new platform

functionality that equips Hadoop to compete against alternative platforms.

As the largest pure play vendor in the Hadoop market and the “incumbent leader” due to

its first-mover advantage, Cloudera’s growth is unlikely to come at the expense of its intra-

network rivals [77]. It is unsurprising that its search for growth has led it to look to the broader

data management market and engage in inter-network competition. Conversely, MapR and

Hortonworks recognize that maximizing the growth potential of the Hadoop market requires them

to capture a greater portion of the market. Consequently, it is also natural for these firms to focus

on intra-network competition. Generalizing from this case-study, one can infer that a platform

firm’s focus on inter-network vs. intra-network competition is heavily influenced by the extent to

which it is currently positioned to capture the growth in the platform. If a firm is already

positioned to capture the majority of growth in a given platform, it will focus on inter-network

competition and winning against alternative platforms. Firms that are not the primary

beneficiaries of a platform’s growth will instead focus on growing their market share and on

intra-network competition.

Suppliers - Securing the Upstream Value Chain

As briefly mentioned in the market overview section, Intel made a substantial investment

of $740 million USD in Cloudera at the beginning of 2014, acquiring 18% of the company [98].

This investment was not only noteworthy in terms of its magnitude, but also because it

represented a rapid and surprising shift in Intel’s approach to the Hadoop market. Intel had

entered the Hadoop market only a year earlier by developing and bringing to market its own Intel

Distribution of Hadoop, which was optimized for its microprocessors [99]. In a private

interview conducted for the purpose of this thesis, Intel’s Big Data GM Ron Kasabian explained

that Intel’s initial motivation for getting into the Hadoop market stemmed from a desire to accelerate

the adoption of Hadoop in the enterprise. It believed it could do so by introducing the

“enterprise-hardening” features to the platform that Intel believed were critical for mass adoption

amongst enterprises.

Intel understood the challenges and opportunities of Big Data itself as it faced an

explosion of data in its own operations; Kasabian shared that Intel’s own factories generate as

much as “five terabytes of data every hour”. With an internal estimate of 94% market share in

the datacenter microprocessor market, Intel believed that it would be one of the primary

beneficiaries of mass enterprise adoption of the computationally intensive Hadoop technology.

At the time of Intel’s investment into the Hadoop space in early 2011, there were no industry

leaders within the Hadoop ecosystem that Intel felt were equipped to drive adoption of Hadoop

within the enterprise. Intel decided to invest in the technology, initially believing that it could

not only accelerate the adoption of Hadoop, but also that it could become the market leader in the

space given the firm’s unique complementary assets. The firm believed that by optimizing

Hadoop for its own microprocessors, it could not only reinforce its dominant position in the

growing Hadoop market, but also compete effectively with other Hadoop vendors on the strength

of superior performance.

Despite some initial market success, particularly within China where Intel’s distribution

of Hadoop was number one in market share, Intel decided that its Hadoop objectives were better

met by investing into Cloudera rather than continuing with its own distribution. Beyond taking

an 18% stake in the pure play vendor, Intel also agreed to cease development of its Intel

Distribution and have its engineers bring its optimizations into Cloudera’s distribution.

Relation to Lever 2 (Product Technology)

According to Kasabian, one of the reasons that Intel decided to abandon its own

distribution in favor of partnering with a Hadoop pure play vendor is the fact that it wanted to

drive its optimizations back into the core of the Apache governed projects in order to affect the

ecosystem in the manner it desired. Although it had a number of Apache-approved committers

on staff, Intel had far fewer committers than either Hortonworks or Cloudera. By partnering with

a pure play Hadoop vendor, Intel was much more likely to get its patches contributed back into

the Apache core projects and adopted by the broader community, including other Hadoop

vendors and their customers.

Figure 22 – Hadoop Contributors by Organization - Hortonworks and Cloudera employ the most Hadoop committers of all Hadoop vendors - Yahoo and Facebook are Hadoop users and not vendors – Data extracted and analyzed from various project websites at www.apache.org.

(Chart: # of Committers to Hadoop-related Apache Projects by Company; series: Zookeeper, Tez, Spark, Pig, Hive, Hbase, Hadoop, Accumulo.)

Intel’s assessment reflects the criticality of securing unique access to the critical

resources (i.e. influential members of the open source community) in ensuring that a firm is able

to influence the architectural trajectory of an open source platform. While the majority of open

source projects operate as meritocracies with a distributed center of authority, the truth is that a

small subset of contributors (“committers” in the vernacular of the Apache Foundation) are

responsible for the majority of technical decisions within a project at any given time. This core

group also tends to remain relatively stable for a given project. If a firm is to be a leader of an

open source platform, it must have access to such individuals for key projects; this access is a

prerequisite for wielding “Lever 2” (product technology) of platform leadership.

Beyond gaining access to the committers that affect technical decision making, a software

firm must also staff itself with individuals who deeply understand the technology used by its

customers. Moreover, the firm must convince the market at large that it has done so. In a market

such as enterprise software, the perceived pedigree of a firm’s engineering staff can be a major

consideration in the deliberate and scrutinized purchasing process. As a consequence, one can

argue that the competitive advantage of employing technical contributors in the community

stems as much from its marketing value as from the actual engineering capability gained.

Being active and visible in the open source community is one way for open source firms

to convince customers of their competency in a particular open source technology. Cloudera and

MapR have engaged in very public debates regarding which firm has contributed more to the

open source development of Hadoop for this reason [100], [101]. In fact, when Hortonworks

was spun out of Yahoo! in 2011, one of its primary marketing messages was that it employed

more experienced Hadoop contributors by virtue of its Yahoo! lineage than any other company

and thus, was the best equipped to support Hadoop in the wild [102]. Open source platform

contenders in such markets also tend to employ highly visible community leaders as a part of

their management team for a similar reason; MapR, Cloudera and Hortonworks all employ

highly visible members of the Hadoop community as part of their senior leadership teams.

Collaboration with Open Source Community

Like all open source vendors, pure play Hadoop vendors face a number of options for

managing new intellectual property. There is an expectation for open source software vendors to

contribute some of their innovations back to the community. Such contributions can be

strategically advantageous to the firm, as they can be means of influencing the trajectory of the

technology in the firm’s favor. However, it may also make sense for a firm to withhold an

innovation for itself as a means of differentiating its platform variant. If a firm opts to develop in

open source, it also needs to decide consciously whether to do so under the governance of an

independent community such as the Apache Software Foundation, or to drive the process itself.

Within the Hadoop market, Hortonworks is the only vendor that has publicly committed

to maintaining a 100% open source development model. The firm not only does all of its

development in open source but also commits to developing “exclusively via the Apache

Software Foundation process” [103]. While this decision has implications for the company’s

business model (the company has no unique product to license and therefore, can only be a

strictly services company), it does help the firm win the minds and wallets of its customers in a

number of ways. The company’s contributions and commitment to open development create

considerable brand equity for Hortonworks within the Hadoop community itself, and this equity

can translate into influence within the community and credibility with customers in the

marketplace. The firm also heavily markets the danger of vendor lock-in that can occur with a

Hadoop distribution that is not fully open and capitalizes on this message by publicizing its

unique position as the only firm at scale to sell enterprise support for a truly open distribution

[104].

Although developing purely in the Apache model allows Hortonworks to differentiate

itself from the other vendors, it also means that the firm is limited with regard to how it can

influence the ecosystem to its exclusive benefit. Despite the fact that Hortonworks is the

plurality leader in terms of employed Apache committers in Hadoop-related projects, its

workforce still represents only a fraction of the community. Therefore, Hortonworks cannot

make technological decisions for the platform unilaterally. Moreover, while Hortonworks shares

all of its technology with its competitors, they do not necessarily reciprocate. Consequently,

Hortonworks may find itself occasionally trailing its competitors when it comes to the features

and functions that are included with its platform variant.

Unlike Hortonworks, MapR and Cloudera engage in both proprietary and open source

development. Both firms employ an “Open-Core” model, offering a for-profit commercial

product by extending open source technologies with proprietary extensions. Cloudera’s Mike

Olson and MapR’s John Schroeder have both written public articles explaining why the

development and possession of proprietary intellectual property are necessary for creating

sustainable businesses that their enterprise customers can rely on [3], [105].

Despite asserting that the possession of proprietary intellectual property is a necessity for

sustained profitability, both Cloudera and MapR still engage in open source development

for some of their new technology initiatives. This relates to the observation made at the

beginning of this thesis, which is that establishing new proprietary platform standards has

become tremendously difficult, and open source development is a tactic that can be deployed to

accelerate industry adoption. If having a standard implementation or if sharing engineering

resources across the entire ecosystem is in the interest of the individual vendors, then it is

typically best for that project to be governed by an independent authority such as the Apache

Software Foundation. In Hadoop, an example of such a type of technology would be the

distributed processing framework itself. Processing frameworks such as Tez or Spark are

simultaneously too complex for an individual firm to develop and too important to the ecosystem

to be fragmented by proprietary development. The firms are better off collectively contributing

to the improvement of this unifying platform component than to risk fracturing the ecosystem in

the hopes of differentiation.

If a vendor believes that a given innovation is likely to help differentiate its platform

variant, then it may prefer to keep the technology proprietary and its source closed. This allows

the company to differentiate its solution and maintain control over its technology. However, if

this differentiating technology occurs in an interface component that sits between the platform and

a type of complement, the firm may need to develop it in open source in order to encourage

broader adoption. In such a scenario, the firm may opt to do so without involving an

independent authority such as the Apache Software Foundation. This allows the firm to maintain

maximum control over the project while avoiding the stigma of being a proprietary or closed

technology. Of course, such a project structure is unlikely to benefit from the resource pooling

of an independently governed project as competing platform vendors are unlikely to contribute.

Cloudera’s Impala is a prime example of such a project.

Figure 23 attempts to summarize the considerations for development model selection

discussed above into a simple decision tree for a given area of innovation.
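The considerations discussed above can be sketched as a small function. This is only an illustrative reading of the text, not a reproduction of Figure 23: the names `Model` and `select_development_model`, the three yes/no inputs, and the order in which the questions are asked are all assumptions made for the example. The final fallback branch is also an assumption, since the discussion does not cover an innovation that is neither a shared standard nor a differentiator.

```python
from enum import Enum


class Model(Enum):
    """Development models discussed in the text (labels are illustrative)."""
    PROPRIETARY = "proprietary"
    SPONSORED_OPEN_SOURCE = "sponsored open source"
    COMMUNITY_OPEN_SOURCE = "community governed open source"


def select_development_model(
    benefits_from_shared_standard: bool,
    differentiates_platform_variant: bool,
    is_interface_to_complements: bool,
) -> Model:
    """Pick a development model for one area of innovation (hypothetical helper)."""
    # A component that the ecosystem needs as a common standard, or that is too
    # costly for one firm to build alone, is placed under independent governance.
    if benefits_from_shared_standard:
        return Model.COMMUNITY_OPEN_SOURCE
    if differentiates_platform_variant:
        # An interface to complements needs broad adoption, so the firm opens
        # the source but keeps governance of the project in-house.
        if is_interface_to_complements:
            return Model.SPONSORED_OPEN_SOURCE
        return Model.PROPRIETARY
    # Assumption: with no differentiation at stake, contributing back to the
    # community is treated as the default, reflecting the expectation that
    # open source vendors share some of their innovations.
    return Model.COMMUNITY_OPEN_SOURCE


# A shared processing framework (the text's Tez or Spark case).
print(select_development_model(True, False, False).value)
# A differentiating interface component (the text's Impala case).
print(select_development_model(False, True, True).value)
```

Under these assumptions, a differentiating innovation that is not an interface component would come out as proprietary.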

Figure 23 – Proposed decision-tree for selecting between a proprietary, sponsored open source and community governed open

source model for a given innovation (original creation).

Complementors - Identifying and Securing Critical Complements

As mentioned in Chapter 2, both proprietary and open source platform contenders need to

be proactive in managing critical platform complements. However, open source platform

vendors face a unique challenge in that a community organization such as the Apache Software

Foundation may act as a broker for connecting the platform to its complements. In the topology

of Eisenmann, Parker and Van Alstyne, the Apache organization effectively acts as the platform

sponsor. As a consequence of this, open source vendors cannot use some of the levers afforded to

proprietary firms (such as developing exclusive interfaces with Lever 1) for securing unique

complements to the open platform. Instead, the firms must rely on alternative techniques, such

as the development of partnership arrangements or the introduction of proprietary interface

components to regain that leverage.

Partnership Programs

All three pure play Hadoop vendors boast robust partnership programs for key

complements. For Hadoop, one primary type of complement that enhances the value of the

platform consists of applications created by independent software vendors (ISVs) that specialize Hadoop

to a specific market or function. As all three firms share a significant number of public

interfaces governed by the Apache Software Foundation process, applications that work on one

vendor’s Hadoop distribution tend to also work on another’s. However, in the Enterprise

Software market, “officially supported” software and “technically compatible” components are

hugely differentiated and most enterprise information technology departments are only willing to

adopt the former class of software. Consequently, all three firms offer partnership or

certification programs in order to assure potential customers that solutions created by

independent software vendors are fully supported on their platforms.

Given the common architectural foundation and interfaces for the three vendors, it is

difficult to curate a fully differentiated partner ecosystem on the basis of applications alone; a

software vendor that has built software for Hadoop faces very low barriers for multi-homing

across the different distributions. Table 20 of the appendix shows a composite matrix listing the

independent software vendors and technology partners that Cloudera, Hortonworks and MapR

claim on their respective company websites as of November 2014. The three firms not only

boast an extremely similar number of such partners (Cloudera: 164, Hortonworks: 156 and

MapR: 159), but they also have a substantial number of partners in common. The majority of

Hortonworks and Cloudera’s ISV partners have a relationship with at least one of the other pure

play Hadoop vendors.
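The partner-overlap claim can be checked with simple arithmetic on the counts plotted in Figure 24. The sketch below assumes that the first data series in the chart is the non-exclusive count and the second is the exclusive count; this reading is consistent with the totals quoted above (164, 156 and 159). The names `PARTNERS`, `SHARES` and `non_exclusive_share` are hypothetical and exist only for this illustration.

```python
# Partner counts as read from the Figure 24 chart data.
# Assumption: the first data series is "non-exclusive", the second "exclusive".
PARTNERS = {
    "Cloudera": {"non_exclusive": 88, "exclusive": 76},
    "Hortonworks": {"non_exclusive": 90, "exclusive": 66},
    "MapR": {"non_exclusive": 59, "exclusive": 100},
}


def non_exclusive_share(counts):
    """Fraction of a vendor's partners that also partner with a rival vendor."""
    total = counts["non_exclusive"] + counts["exclusive"]
    return counts["non_exclusive"] / total


SHARES = {vendor: non_exclusive_share(c) for vendor, c in PARTNERS.items()}

for vendor, share in SHARES.items():
    print(f"{vendor}: {share:.1%} of partners are shared with a rival")
```

On this reading, more than half of Cloudera’s and Hortonworks’ partners are shared with a rival, while fewer than half of MapR’s are.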

Figure 24 - Analysis of Exclusivity of Partnership Arrangements for Hadoop ISVs

Given that it is difficult to differentiate a given open source platform variant from its

intra-network rivals based on the primary types of platform complements (i.e. applications),

aspiring platform vendors may need to look to attract other types of complements for

differentiation. Firms may form different opinions about which types of complements are most

valuable beyond the primary complement types.

In a private interview, Mike Olson shared that Cloudera actively pursued the Intel

partnership because it recognized the unique competitive advantage offered by visibility into

Intel’s roadmap and access to Intel’s unique engineering talent [97]. Olson believed that

customers would greatly value the superior performance that an Intel-optimized Hadoop

distribution would hypothetically offer. Ron Kasabian of Intel later corroborated this by stating that

one of the primary reasons Intel chose Cloudera was that its leadership team appeared to appreciate the value

proposition that Intel was bringing to the table more than its competitors did. This anecdote

illustrates the different emphasis that each firm may place on different complement types.

[Figure 24 chart data (number of ISV partners): Cloudera: 88 non-exclusive, 76 exclusive; Hortonworks: 90 non-exclusive, 66 exclusive; MapR: 59 non-exclusive, 100 exclusive.]

Buyers - Controlling the Path to the Customer

Mediating the Purchasing Process of Complements

Enterprise software products are inherently complex systems. New products brought

onto a customer’s landscape must integrate with numerous systems that already reside there.

These existing systems are used, administered and developed by different individuals and

organizations using different technologies from different eras. Consequently, implementations of

enterprise software systems are often expensive and lengthy endeavors that may require

millions of dollars and hundreds of man-years to complete. As a result of the significant

investment that they represent, the purchasing process of enterprise software systems also tends to

be long and complicated. Enterprises often enlist the help of multiple parties, including

consultants like IBM or Accenture, to help them make the best possible decision when selecting

their vendors.

The complexity of this purchasing process is simultaneously an opportunity and a

challenge for Hadoop vendors. As mentioned in the corresponding section in Chapter 2, aspiring

open source platform contenders can differentiate their platform by mediating the purchase

process of complements for customers. Android platform providers attempt to do this by

providing electronic application marketplaces in order to simplify the user-driven acquisition

process of complementary applications for their platforms. The platform providers’ involvement

in complement delivery also provides them with a channel to influence and govern the behavior

of complement creators. Unfortunately, due to the more elaborate purchasing process of

enterprise software, Hadoop vendors cannot take complete ownership of the application

purchasing process in the manner that mobile platform vendors have attempted to. However,

Hadoop vendors still attempt to actively participate in that process to help expedite it and to exert

influence. One specific way they attempt to do that is through the creation of partnership or

certification programs for their platform.

As mentioned in the previous section on securing critical complements, Cloudera,

Hortonworks and MapR all offer programs to help assure the customer that a given complement

provider’s product is compatible with their platform variants. However, certification programs

are also intended to serve a few additional purposes. The programs are also intended to simplify

the application selection process by helping customers identify the complement vendors

available to them. Cloudera describes this intent on its website as follows: “The

Cloudera Certified Technology program is designed to make choosing the right technology

easier. When you see the Cloudera Certified Technology logo, you can trust that the product has

been tested and validated to work with CDH, our 100% open source and enterprise-ready

distribution of Apache Hadoop and related projects.” [84]. To this end, each of the firms

dedicates prominent sections of its company website to help customers find potential

complement vendors.

Beyond helping customers identify the right partners, the certification process also

provides an opportunity for aspiring platform contenders to exert influence over the behavior of

complement creators. For example, Hortonworks explains that technologies certified through its

partnership program “are reviewed for architectural best practices”, while Cloudera states that it

verifies that its partners “comply with Cloudera development guidelines for integration with

Hadoop”. These review processes give the firms an opportunity to guide a partner organization

towards integrating with platform components and interfaces in a manner favorable to them. For

example, Hortonworks and Cloudera offer very different administrative environments for their

platform variants, with Cloudera offering its proprietary Cloudera Enterprise Manager and

Hortonworks offering a similar environment in Apache Ambari. Though neither firm imposes

this today, it would be possible and reasonable for the firms to require that a complement

provider integrate into their specific administration consoles in order to achieve certification. As

complement producers are often smaller vendors that depend on the endorsement of the platform

providers to reach potential clients, the certification process acts as a powerful bargaining chip

for the platform contenders to influence their activities. Even enterprise software vendors which

exceed the pure play Hadoop vendors in scope and scale lack the expertise and credibility of the

pure play vendors within the Hadoop market and may need to look to these certification

programs to gain credibility with their customers. This offers smaller vendors additional

leverage to bargain against their powerful competitors.

Beyond participating in the purchasing process of complements, Hadoop vendors also

seek to influence the purchasing process for the platform itself by partnering with key

stakeholders of the purchasing process. In enterprise software, major influencers in the

purchasing process include system integrators and IT consultancies such as Accenture, Infosys

and IBM Global Services as well as full-stack mega vendors such as Oracle, IBM and SAP. In

enterprise software, it is not uncommon for some of these firms to become so embedded within

the operations of a large enterprise that the endorsement and approval of these firms can

determine whether or not a smaller vendor will be considered for a deal. All three of the existing

Hadoop pure play vendors recognize this and have partnered with these firms as a means of

ensuring that their platform variants are considered in the selection process. One may infer from

the prior discussion on Hortonworks’ close product and reseller partnerships with enterprise

mega vendors that it holds a distinct advantage over its competitors in this regard. However, due

to the organizational separation between the product and services organizations that exists in most

of these vendors, the impact of those partnerships on the vendor selection appears to be minimal.

One interesting type of partner for Hadoop platform vendors is the cloud

Infrastructure-as-a-Service vendor, such as Amazon, Google or Microsoft. These three software giants

each offer their own Big Data solutions as a service-based offering (Elastic MapReduce for Amazon,

BigQuery for Google and Azure HDInsight for Microsoft) that compete with the offerings of the

pure play Hadoop vendors. The three software giants possess a significant go-to-market

advantage over the pure play vendors as they are able to offer both the software and the

underlying hardware infrastructure in a single package, significantly simplifying the overall

acquisition process for a big data solution for customers. Pure play vendors have attempted to

nullify this advantage by integrating with some of these cloud vendors; Table 16 enumerates

some of the integrations that have been pursued. Of the three pure play vendors, Cloudera is

arguably the least integrated into the offerings of these leading cloud vendors, with its only

integration point being the availability of its solution via the solution marketplace offered by

Microsoft’s Azure. According to some industry observers, this is the reason that the company

has pursued and heavily marketed its partnerships with some of the smaller cloud operators

[106].

Table 16 - Partnerships between pure play Hadoop vendors and leading cloud IaaS vendors (sourced from company websites)

Vendor | Amazon | Google | Microsoft

Cloudera | N/A | N/A | CDH available via Azure Marketplace

Hortonworks | N/A | N/A | Directly integrated into HDInsight

MapR | Directly available as EMR option | Exclusive distribution on GCE | N/A

Substitutes and New Entrants - The Threat of Shifting Platform Boundaries

In an August 2014 article for the Association for Computing Machinery (ACM), noted MIT

adjunct professor of computer science and database luminary Michael Stonebraker observed

that what exactly makes a Hadoop solution Hadoop is fairly ephemeral [107]. Stonebraker

pointed out that the MapReduce-based distributed processing framework that had been

synonymous with Hadoop has been abandoned by newer projects like Cloudera Impala; Impala

uses its own optimized distributed processing engine which accesses the Hadoop Distributed File

System (HDFS) directly. Apache Spark, originally developed independently of the Hadoop

ecosystem, is now embraced by the Hadoop community to an extent that it joins the original

Hadoop MapReduce framework and its successor (Tez) as standard processing frameworks for

Hadoop. As a consequence, Stonebraker observed that the only thing that seems to be a

condition for a platform to be labelled as “Hadoop” is the use of HDFS as the storage and

persistence layer at the bottom of the technology stack.

Figure 25 - Hadoop in 2011 vs. 2014 – A “Hadoop” deployment in 2011 always contained the components that were considered

part of the ‘core’ Hadoop platform, and likely a component that was a part of what was considered the extended platform. A

deployment of Hadoop in 2014 may not include any components of either sort.

Though the focus of his article was on another topic, Stonebraker was pointing to the

continuous and substantial shifts in the platform boundaries of Hadoop. At Hadoop’s origin back

in 2007, the MapReduce framework and HDFS were clearly defined as “core” to the Hadoop

platform. Subsequent efforts like Apache Hive built upon this core and were so useful and

ubiquitous that they were effectively considered a part of the extended platform. Subsequent

requirements and new technologies have emerged to displace even components that were part of this

original platform. In fact, the shifts have been more substantial and the definition of “Hadoop”

murkier than even what Stonebraker posited. The usage of HDFS cannot be relied upon as a

condition for defining what constitutes a “Hadoop” distribution; the MapR distributions of

Hadoop do not use HDFS at all but rather the MapR Distributed File System mentioned earlier.

Consequently, a MapR customer that uses only Impala in their “Hadoop” implementation in

2014 will not use any major components that would be considered “core” to Hadoop only a few

years earlier (Figure 25). Of course, this raises the question: what exactly is the “Hadoop

Platform” if it is not defined by any specific technology or major component?

Figure 26 – Displacing Core Components – The fact that platform-internal APIs are well documented in open source systems

allows core components to be substituted (HDFS → MapR NFS). Dependent components (Hive → Shark) can also be forked and

adapted in the case where clean substitution is not possible (MapReduce → Spark).

Given that Hadoop is an industry platform mediating the Hadoop ecosystem, one answer

may be that “the Hadoop platform is a collection of technologies that binds together the Hadoop

ecosystem”. While this definition seems tautological on the surface, it actually addresses the

puzzle at hand. While the MapReduce engine and the HDFS encapsulated the original value-

generating intellectual property that motivated the genesis of the ecosystem, they are not the

technologies that bind the ecosystem together. That responsibility lies with the relatively simpler

interfaces and interfacing subsystems that allow these components to be connected to one

another. By this logic, any technology stack that provides these interfaces ought to be

considered a “Hadoop Platform” provider, even if the technology does not share the same lineage

or development leaders as the others. For example, MapR’s product is considered a Hadoop

distribution, because the company provided a proprietary alternative to HDFS that was both API

and wire protocol compatible with the HDFS components. It is worth noting that while this

would have been theoretically possible in a proprietary platform as well, this type of low-level

component substitution is extremely unlikely to happen in proprietary platforms as interfaces

between internal platform subsystems would not have been documented and easily substitutable.
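The idea that the interface, rather than the implementation, defines the platform can be sketched in code. The example below is a deliberately simplified analogy and does not model the real HDFS API or MapR’s file system: `FileSystem`, `ReferenceStore`, `VendorStore` and `word_count` are hypothetical names invented for illustration. It shows a complement that depends only on a documented interface and therefore runs unchanged on two stores with different internals.

```python
from typing import Protocol


class FileSystem(Protocol):
    """Hypothetical stand-in for the storage interface that clients code against."""

    def write(self, path: str, data: bytes) -> None: ...

    def read(self, path: str) -> bytes: ...


class ReferenceStore:
    """Stand-in for the community implementation (playing the role of HDFS)."""

    def __init__(self) -> None:
        self._blocks = {}  # maps path to the latest bytes written

    def write(self, path: str, data: bytes) -> None:
        self._blocks[path] = data

    def read(self, path: str) -> bytes:
        return self._blocks[path]


class VendorStore:
    """Stand-in for a vendor substitute with different internals, same interface."""

    def __init__(self) -> None:
        self._log = []  # append-only list of (path, data) records

    def write(self, path: str, data: bytes) -> None:
        self._log.append((path, data))

    def read(self, path: str) -> bytes:
        # Scan from the newest record back to find the latest write for the path.
        for stored_path, data in reversed(self._log):
            if stored_path == path:
                return data
        raise KeyError(path)


def word_count(fs: FileSystem, path: str) -> int:
    """A 'complement' that depends only on the interface, not the implementation."""
    return len(fs.read(path).split())


for store in (ReferenceStore(), VendorStore()):
    store.write("/data/a.txt", b"open source platform")
    print(type(store).__name__, word_count(store, "/data/a.txt"))
```

Because `word_count` only calls `read`, either store can be substituted without changing the complement, which is the property the paragraph above attributes to compatible platform implementations.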

Interestingly, even in the cases where a platform component cannot cleanly fulfil the

interfaces of a core platform component, the open source nature of the platform means that

dependent components can be forked and adapted by motivated parties. For example, although

Apache Spark could not provide exactly the same Mapper and Reducer APIs that the original

MapReduce engine provided for its clients, the Spark team was able to modify popular

dependent components such as Hive to run on top of its engine (Figure 26).

The ability to fork the Hive source code allowed the Spark team to create a component (Shark)

that offered the same client interface as Hive (HQL) and maintain compatibility with dependent

products and applications. The ability to fork and adapt existing components also allows inter-network

competitors to hijack key platform components and complements for their competing

platforms. The MapReduce engine, as well as Hive, Spark and Shark, have been forked and

ported to alternative data management platforms such as Apache Cassandra by inter-network

competitors such as DataStax [108].

The availability of internal interfaces and implementation source code has allowed

Hadoop to evolve rapidly. However, it also represents a challenge for commercial vendors

attempting to influence the platform’s trajectory. Neither example of technology substitution

presented above was made with the approval of a clear platform sponsor. The Apache Software

Foundation, typically viewed as the “platform sponsor” for Hadoop, did not make a conscious

decision that Spark or Shark were worth developing and adopting; Spark became an Apache

project after its first release and initial adoption. Ultimately, the entities that determined what

technologies would be considered part of the Hadoop platform were the market at large and the

ecosystem as a whole. Technologies that were sufficiently compatible with existing ecosystem

solutions and offered a sufficiently compelling value proposition were eventually adopted broadly

across the entire market as intra-network competitive forces drove platform vendors to adopt the

best-in-class technology.

This meritocratic nature of platform governance in Hadoop also creates tremendous

opportunities for new entrants to enter the market. As mentioned in the Hadoop market

overview, the immense popularity of Apache Spark has allowed commercial vendor Databricks

(founded and led by Spark’s creators) to gain tremendous influence over the Hadoop ecosystem

despite its late entry and limited scale of operation. In a closed-source ecosystem, it would have

been exceedingly difficult for such a small firm to penetrate what had already become a very

large ecosystem with leaders operating at scale. This example also suggests that open source

platform leadership is generally less stable than conventional platform leadership. The fact that

technology can be adopted and displaced at the whims of the market means that the core

technical competencies that a platform leader has built up can quickly be invalidated if a better

mousetrap emerges. If Apache Spark continues on its current trajectory as the de facto

distributed processing framework of the Hadoop ecosystem, then the value of Hortonworks’

technical expertise on the MapReduce and Apache Tez technologies could diminish drastically.

As a result of all of this, an aspiring open source platform leader cannot rely solely on the

gravitational pull of platform complements to maintain its leadership position, as the platform

and the ecosystem can be ‘hijacked’ in the manner described above. It must stay on top of new

technologies and be ready to embrace them in order to maintain its understanding of the

ecosystem. Platform vendors vying to build their businesses upon open source technologies

must remain open to technological change as the market will largely determine the platform

standards for the ecosystem independent of them.

Chapter 4 - Conclusion

In his book “Only the Paranoid Survive”, Andrew Grove wrote the following in

reference to technological changes:

“ARE SUCH DEVELOPMENTS A CONSTRUCTIVE OR A DESTRUCTIVE FORCE? IN MY VIEW, THEY

ARE BOTH. AND THEY ARE INEVITABLE. IN TECHNOLOGY, WHATEVER CAN BE DONE WILL BE DONE.

WE CAN’T STOP THESE CHANGES. WE CAN’T HIDE FROM THEM. INSTEAD, WE MUST FOCUS ON

GETTING READY FOR THEM.” [4]

This quote certainly seems to apply to the world of open source software. While open

source platforms benefit from the same network effects that proprietary platforms enjoy, the

ability of a single firm to harness and direct that growth is greatly hindered by the increased pace

of technological change afforded by the open intellectual property. Aspiring platform contenders

in the open source world cannot rely on their exclusive possession of key platform technology to

direct the behavior of complementors. Instead, they must assess and increase their influence

over the forces that affect the market in order to shape the movement of the ecosystem as a

whole.

Given all the variables that are beyond the control of an open source platform contender,

perhaps the image of a “platform leader” as a powerful orchestrator that directs an ecosystem by

manipulating the levers of its platform empire is an inappropriate one. With fewer levers of

power at its disposal, an open source platform leader is perhaps more like a politician in a modern

democracy, leading through influence and relationship building rather than power and

authority. Moreover, continued possession of a leadership position greatly depends on a firm’s

ability to survey the sentiments of its constituents and adjust accordingly. Open source platform

contenders must clearly assess whether or not their primary rivals sit within the same ecosystem

(i.e. network) and then look up, down and across the value network to ensure unique or superior

access to the suppliers, buyers and partners that make up the market. Beyond this, such vendors

must stay abreast of technological change or risk having all their efforts quickly invalidated by

the emergence of new technologies that reshape the market landscape.

While this image of an open source platform leader as a politician is arguably less awe-inspiring

than the previous image of a powerful empire leader, it is perhaps a more prevalent and

relevant depiction of platform leadership in the current software market landscape. With few

exceptions, the open source model is becoming the preferred approach for establishing new

platforms, in the same way that most modern modes of governance are democratic and not

authoritarian. Software vendors that seek to operate within platform markets must accept this

new reality and adapt appropriately.

Areas of Further Research

Much of this thesis drew upon the case studies of two open source platforms: Google’s

Android and Apache Hadoop. These two platforms were selected for their relevance to the

consumer and enterprise software markets, as well as the difference in structure and origins

between them. The Android case study illustrates how a firm may choose to establish an open

source model for its own technology under its own terms, and yet still be subject to the fluid

nature of open source platform dynamics. The Hadoop case study illustrates how firms can work

to establish leadership positions for an external platform technology made available through the

open source community and still extract enormous value. Despite the purposeful selection of the

two case studies, the fact that this thesis studied only two platforms is a limitation, and further

research on other platforms could identify additional factors, tactics and

strategies relevant to open source platform leadership.

While this thesis was written with an understanding of the different business models that

are available to open source ecosystems, the analysis did not systematically consider the impact

of these different business models on the firms’ behaviors with regard to platform strategy.

Moreover, the findings of this thesis were descriptive in nature and would be complemented by

further work to establish a prescriptive framework for managing open source platform

leadership. Systematic consideration of the business model would likely be required for such an

effort. Relatedly, both case studies of the thesis focused on platform providers even though

platform leadership is equally applicable to platform sponsors and users. Given that some of the

largest technology companies in the world are open source platform users (and not providers),

case studies focusing on strategies employed by companies acting in these other platform roles

would be beneficial.

Appendix

Table 17 - Committers to Apache Spark; Extracted from https://cwiki.apache.org/ on Oct 1st, 2014

Name Organization

Andrew Xia Alibaba

Stephen Haberman Bizo

Mark Hamstra ClearStory Data

Aaron Davidson Databricks

Andrew Or Databricks

Andy Konwinski Databricks

Josh Rosen Databricks

Matei Zaharia Databricks

Michael Armbrust Databricks

Patrick Wendell Databricks

Reynold Xin Databricks

Tathagata Das Databricks

Xiangrui Meng Databricks

Thomas Dudziak Groupon

Prashant Sharma Imaginea, Pramati, Databricks

Jason Dai Intel

Nick Pentreath Mxit

Shane Huang National University of Singapore

Imran Rashid Quantifind

Ryan LeCompte Quantifind

Ankur Dave UC Berkeley

Charles Reiss UC Berkeley

Haoyuan Li UC Berkeley

Joseph Gonzalez UC Berkeley

Kay Ousterhout UC Berkeley

Mosharaf Chowdhury UC Berkeley

Shivaram Venkataraman UC Berkeley

Sean McNamara Webtrends

Mridul Muralidharan Yahoo!

Ram Sriharsha Yahoo!

Robert Evans Yahoo!

Thomas Graves Yahoo!

Table 18 - Top 10 Big Data Vendors by Revenue according to Wikibon.org [77]

2013 Worldwide Big Data Revenue by Vendor ($US millions)

Vendor | Big Data Revenue | Total Revenue | Big Data Revenue as % of Total Revenue | % Big Data Hardware Revenue | % Big Data Software Revenue | % Big Data Services Revenue
IBM | $1,368 | $99,751 | 1% | 31% | 27% | 42%
SAP | $545 | $22,900 | 2% | 0% | 76% | 24%
HP | $869 | $114,100 | 1% | 42% | 14% | 44%
Oracle | $491 | $37,552 | 1% | 28% | 37% | 36%
Teradata | $518 | $2,665 | 19% | 36% | 30% | 34%
Microsoft | $280 | $83,200 | 0% | 0% | 63% | 37%
Pivotal | $300 | $300 | 100% | 15% | 50% | 35%
Cloudera | $73 | $73 | 100% | 0% | 53% | 47%
Hortonworks | $55 | $55 | 100% | 0% | 73% | 27%
MapR | $35 | $35 | 100% | 0% | 77% | 23%
Total | $18,607 | n/a | n/a | 38% | 22% | 40%

Table 19 – Hadoop-related Apache committers by project and organizations; extracted from www.apache.org on May 1st, 2014

Project PMC / Committer Name Organization

Accumulo Committer Applied Physics Laboratory

Accumulo Committer Pushpinder Heer Applied Technical Systems

Accumulo Committer Mike Fagan Arcus Research

Accumulo Committer Arshak Navruzyan Argyle Data

Accumulo Committer Ed Kohlwey Booz Allen Hamilton

Accumulo Committer Andrew George Wells ClearEdgeIT

Accumulo Committer Alex Moundalexis Cloudera

Accumulo Committer Hung Pham Cloudera

Accumulo Committer Jessica Seastrom Cloudera

Accumulo Committer Jeff Field Cloudera

Accumulo Committer Jonathan M. Hsieh Cloudera

Accumulo Committer Ryan Fishel Cloudera

Accumulo Committer Vikram Srivastava Cloudera

Accumulo Committer Aaron Glahe Data Tactics

Accumulo Committer Christian Rohling Endgame

Accumulo Committer Ravi Mutyala Hortonworks

Accumulo Committer Steve Loughran Hortonworks

Accumulo Committer Ted Yu Hortonworks

Accumulo Committer Jared Winick Koverse

Accumulo Committer Laura Peaslee Objective Solutions, Inc.

Accumulo Committer Jim Klucar Splyt

Accumulo Committer Chris McCubbin sqrrl

Accumulo Committer Jonathan Park sqrrl

Accumulo Committer Luke Brassard sqrrl

Accumulo Committer Michael Allen sqrrl

Accumulo Committer Michael Berman sqrrl

Accumulo Committer Oren Falkowitz sqrrl

Accumulo Committer Phil Eberhardt sqrrl

Accumulo Committer Miguel Pereira SRA International, Inc

Accumulo Committer Damon Brown Tetra Concepts LLC

Accumulo Committer Kevin Faro Tetra Concepts LLC

Accumulo Committer Dennis Patrone The Johns Hopkins University

Accumulo Committer Al Krinker

Accumulo Committer Chris Bennight

Accumulo Committer David M. Lyle

Accumulo Committer Ed Coleman

Accumulo Committer Edward Yoon

Accumulo Committer Jason Then

Accumulo Committer Jay Shipper

Accumulo Committer Jesse Yates

Accumulo Committer Joe Skora

Accumulo Committer John Stoneham

Accumulo Committer Matthew Kirkley

Accumulo Committer Michael Wall

Accumulo Committer Morgan Haskel

Accumulo Committer Nguessan Kouame

Accumulo Committer Philip Young

Accumulo Committer Ryan Leary

Accumulo Committer Sapah Shah

Accumulo Committer Scott Kuehn

Accumulo Committer Sean Hickey

Accumulo Committer Supun Kamburugamuva

Accumulo Committer Tim Halloran

Accumulo Committer Tim Reardon

Accumulo Committer Travis Pinney

Hadoop Committer Ravi Prakash Altiscale, Inc.

Hadoop Committer Aaron T. Myers Cloudera

Hadoop Committer Colin Patrick McCabe Cloudera

Hadoop Committer Doug Cutting Cloudera

Hadoop Committer Eli Collins Cloudera

Hadoop Committer Harsh J Cloudera

Hadoop Committer Karthik Kambatla Cloudera

Hadoop Committer Sandy Ryza Cloudera

Hadoop Committer Todd Lipcon Cloudera

Hadoop Committer Tom White Cloudera

Hadoop Committer Alejandro Abdelnur Cloudera

Hadoop Committer Andrew Wang Cloudera

Hadoop Committer Mayank Bansal eBay

Hadoop Committer Dhruba Borthakur Facebook

Hadoop Committer Hairong Kuang Facebook

Hadoop Committer Dmytro Molkov Facebook

Hadoop Committer Scott Chun-Yang Chen Facebook

Hadoop Committer Zheng Shao Facebook

Hadoop Committer Andrzej Bialecki Getopt

Hadoop Committer Arun C Murthy Hortonworks

Hadoop Committer Arpit Agarwal Hortonworks

Hadoop Committer Arpit Gupta Hortonworks

Hadoop Committer Bikas Saha Hortonworks

Hadoop Committer Brandon Li Hortonworks

Hadoop Committer Chris Nauroth Hortonworks

Hadoop Committer Devaraj Das Hortonworks

Hadoop Committer Enis Soztutar Hortonworks

Hadoop Committer Giridharan Kesavan Hortonworks

Hadoop Committer Hitesh Shah Hortonworks

Hadoop Committer Jian He Hortonworks

Hadoop Committer Jing Zhao Hortonworks

Hadoop Committer Jitendra Nath Pandey Hortonworks

Hadoop Committer Mahadev Konar Hortonworks

Hadoop Committer Matthew Foley Hortonworks

Hadoop Committer Owen O'Malley Hortonworks

Hadoop Committer Ramya Sunil Hortonworks

Hadoop Committer Sanjay Radia Hortonworks

Hadoop Committer Siddharth Seth Hortonworks

Hadoop Committer Steve Loughran Hortonworks

Hadoop Committer Suresh Srinivas Hortonworks

Hadoop Committer Tsz Wo (Nicholas) Sze Hortonworks

Hadoop Committer Vinod Kumar Vavilapalli Hortonworks

Hadoop Committer Haohui Mai Hortonworks

Hadoop Committer Xuan Gong Hortonworks

Hadoop Committer Zhijie Shen Hortonworks

Hadoop Committer Vinayakumar B Huawei

Hadoop Committer Eric Yang IBM

Hadoop Committer Kan Zhang IBM

Hadoop Committer Nigel Daley Individual

Hadoop Committer Amareshwari Sriramadasu InMobi

Hadoop Committer Sharad Agarwal InMobi

Hadoop Committer Sreekanth Ramakrishnan InMobi

Hadoop Committer Christophe Taton INRIA

Hadoop Committer Devaraj K Intel

Hadoop Committer Uma Maheswara Rao G Intel

Hadoop Committer Allen Wittenauer LinkedIn

Hadoop Committer Boris Shkolnik LinkedIn

Hadoop Committer Jakob Homan LinkedIn

Hadoop Committer Lohit Vijayarenu MapR

Hadoop Committer Chris Douglas Microsoft

Hadoop Committer Ivan Mitic Microsoft

Hadoop Committer Roman Shaposhnik Pivotal

Hadoop Committer Johan Oskarsson Twitter

Hadoop Committer Raghu Angadi Twitter

Hadoop Committer Matei Zaharia UC Berkeley

Hadoop Committer Junping Du VMware

Hadoop Committer Luke Lu VMware

Hadoop Committer Konstantin Boudnik WANdisco

Hadoop Committer Konstantin Shvachko WANdisco

Hadoop Committer Amar Ramesh Kamat Yahoo!

Hadoop Committer Robert(Bobby) Evans Yahoo!

Hadoop Committer Daryn Sharp Yahoo!

Hadoop Committer Jonathan Eagles Yahoo!

Hadoop Committer Jason Lowe Yahoo!

Hadoop Committer Kihwal Lee Yahoo!

Hadoop Committer Koji Noguchi Yahoo!

Hadoop Committer Mukund Madhugiri Yahoo!

Hadoop Committer Tanping Wang Yahoo!

Hadoop Committer Thomas Graves Yahoo!

Hbase Committer Gregory Chanan Cloudera

Hbase Committer Jean-Daniel Cryans Cloudera

Hbase Committer Jonathan Hsieh Cloudera

Hbase Committer Jimmy Xiang Cloudera

Hbase Committer Lars George Cloudera

Hbase Committer Michael Stack Cloudera

Hbase Committer Todd Lipcon Cloudera

Hbase Committer Elliott Clark Cloudera

Hbase Committer Matteo Bertozzi Cloudera

Hbase Committer Gary Helmling Continuuity

Hbase Committer Jonathan Gray Continuuity

Hbase Committer Ryan Rawson DrawnToScale

Hbase Committer Doug Meil Explorys

Hbase Committer Amitanand S. Aiyer Facebook

Hbase Committer Kannan Muthukkaruppan Facebook

Hbase Committer Karthik Ranganathan Facebook

Hbase Committer Mikhail Bautin Facebook

Hbase Committer Nicolas Spiegelberg Facebook

Hbase Committer Liyin Tang Facebook

Hbase Committer Devaraj Das Hortonworks

Hbase Committer Enis Soztutar Hortonworks

Hbase Committer Jeffrey Zhong Hortonworks

Hbase Committer Nick Dimiduk Hortonworks

Hbase Committer Sergey Shelukhin Hortonworks

Hbase Committer Ted Yu Hortonworks

Hbase Committer Rajeshbabu Chintaguntla Huawei

Hbase Committer Andrew Purtell Intel

Hbase Committer Anoop Sam John Intel

Hbase Committer Ramkrishna S Vasudevan Intel

Hbase Committer Jesse Yates Salesforce.com

Hbase Committer Lars Hofhansl Salesforce.com

Hbase Committer Nicolas Liochon Scaled Risk

Hbase Committer Chunhui Shen Taobao

Hbase Committer Honghua Feng Xiaomi

Hbase Committer Liang Xie Xiaomi

Hive Committer Prasad Mujumdar Cloudera

Hive Committer Gang Tim Liu Facebook

Hive Committer Kevin Wilfong Facebook

Hive Committer Siying Dong Facebook

Hive Committer Daniel Dai Hortonworks

Hive Committer Alan Gates Hortonworks

Hive Committer Jason Dere Hortonworks

Hive Committer Jitendra Pandey Hortonworks

Hive Committer Sushanth Sowmyan Hortonworks

Hive Committer Owen O'Malley Hortonworks

Hive Committer Prasanth Jayachandran Hortonworks

Hive Committer Sergey Shelukhin Hortonworks

Hive Committer Vaibhav Gumashta Hortonworks

Hive Committer Vikram Dixit Hortonworks

Hive Committer Amareshwari Sriramadasu InMobi

Hive Committer Eric Hanson Microsoft

Hive Committer Yin Huai The Ohio State University

Pig Committer Xuefu Zhang Inadco

Pig Committer Mark Wagner LinkedIn

Pig Committer Prashant Kommireddi Salesforce.com

Pig Committer Aniket Mokashi Twitter

Pig Committer Koji Noguchi Yahoo!

Pig Committer Gianmarco De Francisci Morales Yahoo!

Spark Committer Stephen Haberman Bizo

Spark Committer Mark Hamstra ClearStory Data

Spark Committer Aaron Davidson Databricks

Spark Committer Andy Konwinski Databricks

Spark Committer Matei Zaharia Databricks

Spark Committer Patrick Wendell Databricks

Spark Committer Reynold Xin Databricks

Spark Committer Tathagata Das Databricks

Spark Committer Prashant Sharma Databricks

Spark Committer Thomas Dudziak Groupon

Spark Committer Andrew Xia Intel

Spark Committer Jason Dai Intel

Spark Committer Shane Huang Intel

Spark Committer Nick Pentreath Mxit

Spark Committer Imran Rashid Quantifind

Spark Committer Ryan LeCompte Quantifind

Spark Committer Ankur Dave UC Berkeley

Spark Committer Charles Reiss UC Berkeley

Spark Committer Haoyuan Li UC Berkeley

Spark Committer Josh Rosen UC Berkeley

Spark Committer Kay Ousterhout UC Berkeley

Spark Committer Mosharaf Chowdhury UC Berkeley

Spark Committer Shivaram Venkataraman UC Berkeley

Spark Committer Sean McNamara Webtrends

Spark Committer Mridul Muralidharan Yahoo!

Spark Committer Ram Sriharsha Yahoo!

Spark Committer Robert Evans Yahoo!

Spark Committer Thomas Graves Yahoo!

Tez Committer Arun C Murthy Hortonworks

Tez Committer Bikas Saha Hortonworks

Tez Committer Gunther Hagleitner Hortonworks

Tez Committer Hitesh Shah Hortonworks

Tez Committer Siddharth Seth Hortonworks

Tez Committer Mike Liddell Microsoft

Zookeeper Committer Patrick Hunt Cloudera

Zookeeper Committer Henry Robinson Cloudera

Zookeeper Committer Benjamin Reed Facebook

Zookeeper Committer Thawan Kooburat Facebook

Zookeeper Committer Alex Shraer Google

Zookeeper Committer Mahadev Konar Hortonworks

Zookeeper Committer Andrew Kornev Individual

Zookeeper Committer Flavio Junqueira Microsoft

Zookeeper Committer Michi Mutsuzaki Nicira

Zookeeper Committer Camille Fournier RentTheRunway

Accumulo PMC member Benson Margulies Basis Technology Corp.

Accumulo PMC member Drew Farris Booz Allen Hamilton

Accumulo PMC member Bill Havanki Cloudera

Accumulo PMC member Mike Drob Cloudera

Accumulo PMC member Sean Busbey Cloudera

Accumulo PMC member Jason Trost Endgame

Accumulo PMC member Billie Rinaldi Hortonworks

Accumulo PMC member Josh Elser Hortonworks

Accumulo PMC member Aaron Cordova Koverse

Accumulo PMC member William Slacum Koverse

Accumulo PMC member Christopher Tubbs NSA

Accumulo PMC member Corey J. Nolet Objective Solutions, Inc.

Accumulo PMC member Dave Marion Objective Solutions, Inc.

Accumulo PMC member Keith Turner Peterson Technologies

Accumulo PMC member Brian Loss Praxis Engineering

Accumulo PMC member Adam Fuchs sqrrl

Accumulo PMC member John Vines sqrrl

Accumulo PMC member Eric Newton SW Complete Inc.

Accumulo PMC member Chris Waring

Accumulo PMC member David Medinets

Hadoop PMC member Aaron T. Myers Cloudera

Hadoop PMC member Doug Cutting Cloudera

Hadoop PMC member Eli Collins Cloudera

Hadoop PMC member Patrick Hunt Cloudera

Hadoop PMC member Michael Stack Cloudera

Hadoop PMC member Todd Lipcon Cloudera

Hadoop PMC member Tom White Cloudera

Hadoop PMC member Alejandro Abdelnur Cloudera

Hadoop PMC member Dhruba Borthakur Facebook

Hadoop PMC member Hairong Kuang Facebook

Hadoop PMC member Zheng Shao Facebook

Hadoop PMC member Arun C Murthy Hortonworks

Hadoop PMC member Devaraj Das Hortonworks

Hadoop PMC member Enis Soztutar Hortonworks

Hadoop PMC member Giridharan Kesavan Hortonworks

Hadoop PMC member Hitesh Shah Hortonworks

Hadoop PMC member Jitendra Nath Pandey Hortonworks

Hadoop PMC member Mahadev Konar Hortonworks

Hadoop PMC member Matt Foley Hortonworks

Hadoop PMC member Owen O'Malley Hortonworks

Hadoop PMC member Sanjay Radia Hortonworks

Hadoop PMC member Siddharth Seth Hortonworks

Hadoop PMC member Steve Loughran Hortonworks

Hadoop PMC member Suresh Srinivas Hortonworks

Hadoop PMC member Tsz Wo (Nicholas) Sze Hortonworks

Hadoop PMC member Vinod Kumar Vavilapalli Hortonworks

Hadoop PMC member Hemanth Yamijala Individual

Hadoop PMC member Amareshwari Sriramadasu InMobi

Hadoop PMC member Sharad Agarwal InMobi

Hadoop PMC member Uma Maheswara Rao G Intel

Hadoop PMC member Nigel Daley Jive

Hadoop PMC member Jakob Homan LinkedIn

Hadoop PMC member Chris Douglas Microsoft

Hadoop PMC member Raghu Angadi Twitter

Hadoop PMC member Luke Lu VMware

Hadoop PMC member Konstantin Shvachko WANdisco

Hadoop PMC member Robert(Bobby) Evans Yahoo!

Hadoop PMC member Daryn Sharp Yahoo!

Hadoop PMC member Jonathan Eagles Yahoo!

Hadoop PMC member Jason Lowe Yahoo!

Hadoop PMC member Kihwal Lee Yahoo!

Hadoop PMC member Thomas Graves Yahoo!

Hive PMC member Brock Noland Cloudera

Hive PMC member Xuefu Zhang Cloudera

Hive PMC member Lefty Leverenz Doc of the Bay

Hive PMC member Yongqiang He Dropbox

Hive PMC member Ning Zhang Facebook

Hive PMC member Raghotham Murthy Facebook

Hive PMC member Gunther Hagleitner Hortonworks

Hive PMC member Ashutosh Chauhan Hortonworks

Hive PMC member Thejas Nair Hortonworks

Hive PMC member Harish Butani Hortonworks

Hive PMC member Carl Steinbach LinkedIn

Hive PMC member Edward Capriolo m6d

Hive PMC member Navis Ryu NexR

Hive PMC member Namit Jain Nutanix

Hive PMC member Ashish Thusoo Qubole

Hive PMC member Joydeep Sensarma Qubole

Pig PMC member Santhosh Srinivasan Cloudera

Pig PMC member Daniel Dai Hortonworks

Pig PMC member Alan Gates Hortonworks

Pig PMC member Giridharan Kesavan Hortonworks

Pig PMC member Ashutosh Chauhan Hortonworks

Pig PMC member Thejas Nair Hortonworks

Pig PMC member Richard Ding IBM

Pig PMC member Cheolsoo Park Netflix

Pig PMC member Bill Graham Twitter

Pig PMC member Dmitriy Ryaboy Twitter

Pig PMC member Jonathan Coveney Twitter

Pig PMC member Julien Le Dem Twitter

Pig PMC member Olga Natkovich Yahoo!

Pig PMC member Rohini Palaniswamy Yahoo!

Zookeeper PMC member Patrick Hunt Cloudera

Zookeeper PMC member Henry Robinson Cloudera

Zookeeper PMC member Benjamin Reed Facebook

Zookeeper PMC member Mahadev Konar Hortonworks

Zookeeper PMC member Ted Dunning MapR

Zookeeper PMC member Flavio Junqueira Microsoft

Zookeeper PMC member Michi Mutsuzaki Nicira

Zookeeper PMC member Camille Fournier RentTheRunway

Zookeeper PMC member Ivan Kelly Yahoo!

Table 20 - ISVs and Technology Partners Matrix – 1 indicates that a partnership arrangement exists (a black cell in the original matrix), 0 that none exists; data extracted from www.hortonworks.com, www.cloudera.com and www.mapr.com on November 27th, 2014

Complement Vendor Cloudera Hortonworks MapR

0xdata 1 0 1

Abitech Software 0 1 0

Acentrix 0 0 1

Actian 1 1 0

Actuate 1 1 0

Adatao 1 0 0

Admatic 0 1 0

Aeronomy 0 0 1

Aeverie Inc. 0 0 1

Affini-Tech 0 0 1

Aha! Software 1 0 0

AllianceONE 0 0 1

AlphaSix Corporation 0 0 1

Alpine Data Labs 1 1 1

Alteryx 1 1 1

Amazon 0 0 1

Amdocs 0 1 0

Anchormen 0 0 1

Apara Solutions 0 0 1

Apervi 0 1 0

APEXCNS 0 0 1

Apigee 0 1 0

Appcara 1 0 0

Appfluent 1 1 1

AquaFold 1 0 0

Argil Data 0 1 0

Argyle Data 0 1 0

Arieso 0 1 0

Atigeo 1 0 0

AtScale 1 1 0

Attivio 1 0 0

Attunity 1 1 0

Ayasdi 1 1 0

Aziksa 0 0 1

Azul Systems 1 0 0

Basement Supercomputing 0 1 0

Basis Technology 1 0 0

BC Cloud 0 0 1

BDI Systems 0 0 1

BeagleData 0 0 1

Big Data Elephants, Inc. 0 0 1

Big Data Partnership 0 0 1

Big Switch Networks 0 1 0

BioDatomics 1 0 0

BIPD Ltd 0 0 1

BIPortal GmbH 0 0 1

Birst, Inc 1 0 0

Bit Stew Systems 0 1 0

Blue Canopy Group, LLC 0 0 1

BlueData 1 1 0

BMC Software 0 1 0

Booz Allen Hamilton 0 0 1

BPM-Conseil 1 0 0

BrainPad Inc. 0 0 1

Bright Computing 1 0 0

Brillio 0 0 1

Broadgate Inc 0 0 1

Calpont Corporation 1 0 0

Canonical 1 0 1

CAS 0 0 1

Caserta Concepts 0 0 1

Celer Technologies 1 0 0

Centerity Systems, Inc. 0 0 1

Centrify 0 1 0

Century Link 0 0 1

Ciber 0 0 1

cimt AG 0 0 1

Cirro 1 0 0

Cisco 0 1 1

ClearDATA 0 0 1

Cleo 0 1 0

Cloud A 0 1 0

Cloudian 0 1 0

Cloudsoft 1 0 0

Comma Soft 0 1 0

Composite Software 1 0 0

Compsesa 0 1 0

Computertekk 0 1 0

Compuware 0 1 0

comSysto 0 0 1

Concurrent 1 1 1

Contexti 0 0 1

Continuent 1 1 1

Continuuity 1 1 0

Couchbase 1 1 0

CSC 0 1 0

Cumulus Networks 0 1 0

Data Center Warehouse 0 1 1

Data Tactics Corporation 0 0 1

Databox 1 0 0

Databricks 0 1 1

Datagres 1 0 0

Dataguise 1 1 1

DataHub 0 0 1

Dataiku 0 0 1

Datalakes 1 1 0

Datameer 1 1 1

DataRPM 1 1 1

DataStax 1 1 0

DataTorrent 1 1 1

Datawatch 1 1 0

DBSync 1 0 0

Dell 1 1 0

Denodo 1 1 0

Digital Reasoning 1 0 1

DigitalRoute 0 1 0

Diyotta 1 1 1

Dragonfly Data Factory 0 0 1

eCapital Advisors 0 0 1

Edis Consulting 0 0 1

Elasticsearch 1 1 1

EngineRoom.io 1 0 0

Envision IT Group 0 0 1

Eruces 1 0 0

Esri 1 0 0

eTouch Systems 0 0 1

Eucalyptus Systems, Inc 1 0 0

Exar 0 1 0

Exasol 1 0 0

Excedis 0 1 0

Expert System 1 0 0

Feedzai 1 0 0

FICO 1 0 0

First Light Technologies 0 0 1

Formation Data Systems 0 1 0

FORMCEPT Technologies 1 0 0

Fortscale 1 0 0

Fusionex 0 1 0

Fusion-io 0 0 1

Fuzzy Logix, LLC. 1 0 0

Globalscape 0 1 0

Globant 0 0 1

GoGrid 1 1 0

Google 0 0 1

Grand Logic, Inc. 1 0 0

GraphLab 1 1 0

GrayMatter 0 0 1

Gruter 0 1 0

GTRI 0 0 1

H2O 0 1 0

Hadapt 1 0 0

HP 0 1 1

HStreaming 1 0 0

IBM 1 1 1

Ideation816 Corporation 0 0 1

IKANOW 0 1 0

Impetus Technologies 0 0 1

Indigo New Zealand Limited 0 0 1

Infobright 0 0 1

Infochimps, Inc 1 0 0

Informatica 0 1 1

Information Builders 1 1 1

InfoTrellis 0 1 0

Ingenious Qube 0 1 0

InsightsOne 0 1 0

IntegriChain 0 1 0

Interactive Algorithms Inc. 0 0 1

is-land Systems Inc. 0 1 1

iTalent Corporation 0 0 1

Jaspersoft 1 0 1

JethroData 1 0 1

Jinfonet Software 0 1 1

Joyent 1 1 0

Kapow Software 1 0 0

Karmasphere 1 0 1

Keylink Technology 0 0 1

Knime.com 1 0 0

Knowledgent 0 0 1

Kognitio 0 1 0

Koverse 1 1 1

KPI Partners Inc. 0 0 1

LG CNS 0 1 1

Likya Teknoloji 0 1 0

Logi Analytics 1 1 0

Looker 1 0 0

LSI 0 1 0

Lucidworks 1 1 1

ManTech 0 0 1

MarkLogic 0 1 0

MBI Solutions 1 0 0

Mellanox Technologies 0 0 1

MetiStream 0 0 1

Metric Insights 1 0 0

Micromata 0 0 1

Microsoft 1 1 0

MicroStrategy 1 1 0

Mikan Associates 0 0 1

MisOne Solution 0 0 1

MongoDB 1 1 1

MSR Cosmos, LLC 0 0 1

Narus 1 1 1

Nautilus Technologies 0 0 1

NetApp 0 0 1

New Relic 1 0 0

NFLabs 1 0 0

NGData 1 0 0

Nimbix 1 0 1

Nimble Storage 0 1 0

NorCom 0 0 1

Novetta Solutions 1 1 0

NS Solutions 0 0 1

NTC Vulkan 0 0 1

Nutanix 0 1 0

NxtGen 1 0 0

O2MC 0 1 0

OCTO 0 0 1

Onepoint IQ 0 0 1

Onramp Corporation 0 0 1

OnX Enterprise Solutions 0 0 1

Open V 0 1 0

OpenOsmium 0 0 1

Options I/O 0 0 1

Oracle 1 1 0

Orzota 0 0 1

OS Nexus 1 0 0

ParAccel 1 0 0

Paxata 1 0 0

Pentaho 1 1 1

Pepperdata 1 1 0

Persistent Systems 0 0 1

Pervasive 1 0 0

PetaSecure, Inc. 0 1 0

PHEMI 0 1 0

Platfora 1 1 1

Plivo 1 0 0

Podium Data 0 1 0

Polyform Labs 1 0 0

Pragmatix Services 0 1 0

Pragsis 0 0 1

Predixion Software 1 0 0

Prime Dimensions, LLC 0 0 1

Protegrity 1 1 1

PSSC Labs 0 1 1

Puppet Labs 1 0 0

Qlik 1 1 0

Quaero 1 0 0

QuantCell Research 1 0 0

Qubole 0 1 0

Quest 1 0 0

QuickLogix LLC 0 0 1

Rackspace 0 1 0

Radoop 1 0 0

RainStor 1 1 1

Red Gate 0 1 0

Red Hat 0 1 1

RedOwl Analytics 1 0 0

RedPoint Global 1 1 1

Reltio 1 0 0

Revelytix 1 1 1

Revolution Analytics 1 1 1

RTTS 0 1 0

SAP 1 1 1

SAS 1 1 0

Scaled Risk 0 1 0

ScaleOut Software 1 1 0

Search Technologies 1 0 0

Securonix 1 0 0

Semantic Research 1 0 0

Sematext 1 1 0

SequenceIQ 0 1 0

SequoiaDB 1 0 0

Serendio 0 1 0

Servient 1 1 0

SGI 0 1 0

SHS-Viveon 0 0 1

Simba 1 1 0

Sisense 1 0 0

Skytree 1 1 1

Smart Platform 1 0 0

SMP Management AG 0 0 1

SnapLogic 1 1 0

Softlayer 1 0 0

SoftNet Solutions 0 0 1

Solarflare 1 0 1

Solix Technologies 1 1 0

Sophias 0 0 1

Spectra Logic 0 1 0

Splice Machine 1 0 1

Splunk 1 1 1

Spring 0 1 0

SQLstream 1 0 0

Sqrrl 1 1 1

StackIQ 1 1 1

StreamBase 1 0 0

SUSE 1 1 0

Syncsort 1 1 1

SYNTASA 0 1 0

Tableau 1 1 1

Talend 1 1 1

Tamr 1 1 0

Targit 1 0 0

Tata Consultancy Services 0 0 1

Teradata 0 1 1

Tervela 1 0 0

Think Big Analytics 0 0 1

TIBCO Software 1 1 0

Tidemark 1 0 0

Trace3 0 0 1

Transcend Business Intelligence 0 1 0

TransLattice, Inc. 1 0 0

Trendwise Analytics 0 0 1

Tresata 1 1 0

Trifacta 1 1 0

Tri-IT Solutions 0 0 1

Tugbiz 1 0 0

Twingo 0 0 1

Typesafe 0 1 0

Ubeeko 1 1 1

Ubuntu 1 0 0

UL Environment 0 0 1

Unbelievable Machine 0 0 1

Univa 1 0 1

Vanilla 0 0 1

Veristorm 1 1 0

Vintech Solutions, Inc 0 0 1

Violin Memory 0 0 1

VMware 1 1 1

Voltage Security 1 1 1

VoltDB 1 0 1

Vormetric 1 1 0

WANdisco 1 1 0

Waterline Data Science 0 1 1

WE-Ankor 0 0 1

WHISHWORKS 0 0 1

WhiteKlay 0 0 1

Wibidata, Inc 1 0 0

World Wide Technology 0 0 1

X15 Software 1 1 0

Xenolytics 0 1 0

Xiilab 0 1 0

XOR Security 1 1 0

Xplenty 1 1 0

Yeswici LLC 0 1 0

Ysance 0 0 1

Z Data Inc. 0 0 1

Zaloni 0 0 1

Zementis 1 1 0

Zettaset 0 1 0

Zettics 0 1 0

Zoho WebNMS 1 0 0

Zoomdata 1 0 0

Zuhlke Engineering 0 1 0

Total Number of Vendors 164 156 159

List of Figures

Figure 1 – A system dynamics model of direct network effects
Figure 2 – A system dynamics model of a two-sided platform
Figure 3 – Roles and Relationships in a Platform-Mediated Network
Figure 4 – Linux market share in various computing segments
Figure 5 – Results from the "Java Use and Awareness Study" from BZ Research, 2005
Figure 6 – Eclipse Project Committer by Company
Figure 7 – Porter's Five Forces Model
Figure 8 – Grove’s Six Forces Diagram
Figure 9 – The Android Platform
Figure 10 – Inter-network and Intra-network Competition
Figure 11 – Hierarchy of influence within an Apache Software Foundation project
Figure 12 – The Purchase Process of Complements
Figure 13 – Example of Platform Fragmentation
Figure 14 – Ecosystem Hijacking
Figure 15 – High-level Architecture of Android and Blackberry OS 10
Figure 16 – Google search trends of “Hadoop” and “Big Data” vs. “Data Warehouse”
Figure 17 – Major Building Blocks of a Hadoop Application Stack
Figure 18 – Diagram of basic MapReduce execution
Figure 19 – Cumulative Investments in Pureplay Hadoop Vendors
Figure 20 – Big Data-related Software and Services Revenue
Figure 21 – Official Committers to Apache Spark by Organization
Figure 22 – Hadoop Contributors by Organization
Figure 23 – Decision-tree for Selecting Development Model
Figure 24 – Analysis of Exclusivity of Partnership Arrangements for Hadoop ISVs
Figure 25 – Hadoop in 2011 vs. 2014
Figure 26 – Displacing Core Components

List of Tables

Table 1 – Open source platforms by commercial firms
Table 2 – Comparison of Openness by Role in Platform-mediated Networks
Table 3 – Taxonomy of Envelopment Attacks
Table 4 – Ten criteria of open source software
Table 5 – Apple, IBM and Sun Microsystems' Involvement in Open source
Table 6 – Summary of Strategic Considerations for Open Source Platform Vendors
Table 7 – AOSP-derived Products by Google Competitors
Table 8 – Google's Shift of Investment into Proprietary Capabilities
Table 9 – Decision Making Authorities in Different Open Source Communities
Table 10 – The Three V's of Big Data
Table 11 – A Selection of SQL on Hadoop offerings
Table 12 – Breakdown of Hadoop-market according to Forrester Research
Table 13 – Cumulative Investments in Pureplay Hadoop Vendors
Table 14 – Sample Hadoop Positioning Statements by Enterprise Software vendors
Table 15 – Partnership Matrix Between Pure play Vendors and Enterprise Software Vendors
Table 16 – Partnerships between Pure play Hadoop vendors and cloud IaaS vendors
Table 17 – Committers to Apache Spark
Table 18 – Top 10 Big Data Vendors by Revenue
Table 19 – Hadoop-related Apache committers by project and organizations
Table 20 – ISVs and Technology Partners Matrix


References

[1] R. Stallman, “The GNU operating system and the free software movement,” Open Sources: Voices from the Open Source Revolution, 1999.

[2] R. Gilbert and M. Katz, “An economist’s guide to US v. Microsoft,” J. Econ. Perspect.,

2001.

[3] M. Olson, “The Cloudera Model,” LinkedIn, 2013. [Online]. Available: http://www.linkedin.com/today/post/article/20131003190011-29380071-the-cloudera-model. [Accessed: 31-Mar-2014].

[4] A. S. Grove, Only the Paranoid Survive. Doubleday, 1996.

[5] R. Schmalensee, “Jeffrey Rohlfs’ 1974 Model of Facebook,” vol. 7, no. 1, 2011.

[6] J. Rohlfs, “A theory of interdependent demand for a communications service,” Bell J.

Econ. Manag. …, 1974.

[7] M. Katz and C. Shapiro, “Network externalities, competition, and compatibility,” Am.

Econ. Rev., vol. 75, no. 3, pp. 424–440, 1985.

[8] A. Gawer and R. Henderson, “Platform owner entry and innovation in complementary

markets: Evidence from Intel,” J. Econ. Manag. …, vol. 16, no. 1, pp. 1–34, 2007.

[9] A. Gawer and M. Cusumano, “Industry platforms and ecosystem innovation,” J. Prod.

Innov. …, 2013.

[10] M. Cusumano and A. Gawer, Platform Leadership: How Intel, Microsoft, and Cisco Drive Industry Innovation, 1st ed. Harvard Business Press, 2002.

[11] O. de Weck, E. Suh, and D. Chang, “Product family strategy and platform design

optimization,” pp. 1–38, 2004.

[12] M. Cusumano, “Technology strategy and management: The evolution of platform thinking,” Commun. ACM, vol. 53, no. 1, p. 32, Jan. 2010.

[13] K. Boudreau, “Let a thousand flowers bloom? An early look at large numbers of software

app developers and patterns of innovation,” Organ. Sci., 2012.


[14] G. Parker and M. Van Alstyne, “Two-sided network effects: A theory of information

product design,” Manage. Sci., 2005.

[15] T. Eisenmann, “Opening platforms: how, when and why?,” This Pap. has been …, 2008.

[16] A. Gawer, “The organization of platform leadership: an empirical investigation of Intel’s

management processes aimed at fostering complementary innovation by third,” 2000.

[17] R. Henderson and K. Clark, “Architectural innovation: the reconfiguration of existing

product technologies and the failure of established firms,” Adm. Sci. Q., 1990.

[18] T. Eisenmann, G. Parker, and M. W. Van Alstyne, “Platform Envelopment,” 2007.

[19] D. Teece, “Profiting from technological innovation: Implications for integration, collaboration, licensing and public policy,” Res. Policy, vol. 15, no. February, pp. 285–305, 1986.

[20] A. Gillen, “Worldwide Client and Server Operating Environments 2013–2017 Forecast and 2012 Vendor Shares,” IDC Research, 2013. [Online]. Available: http://www.idc.com.libproxy.mit.edu/getdoc.jsp?containerId=243003. [Accessed: 17-Jul-2014].

[21] “Operating system Family / Linux | TOP500 Supercomputer Sites,” 2014. [Online].

Available: http://www.top500.org/statistics/details/osfam/1. [Accessed: 17-Jul-2014].

[22] C. DiBona and S. Ockman, Open sources: Voices from the open source revolution. 1999.

[23] Open Source Initiative, “History of the OSI,” About the OSI. [Online]. Available:

http://opensource.org/about. [Accessed: 15-Sep-2014].

[24] S. Krishnamurthy, “An analysis of open source business models,” Perspect. Free open

source Softw., 2005.

[25] J. West, “How open is open enough?: Melding proprietary and open source platform

strategies,” Res. Policy, 2003.

[26] P. G. Capek, S. P. Frank, S. Gerdt, and D. Shields, “A history of IBM’s open source

involvement and strategy,” IBM Syst. J., vol. 44, no. 2, pp. 249–257, 2005.


[27] N. Economides and E. Katsamakas, “Two-sided competition of proprietary vs. open

source technology platforms and the implications for the software industry,” Manage. Sci.,

2006.

[28] “Microsoft Uses Open source Code Despite Denying Use of Such Software - WSJ.”

[Online]. Available: http://online.wsj.com/news/articles/SB992819157437237260.

[Accessed: 04-Aug-2014].

[29] S. O’Mahony, F. Diaz, and E. Mamas, “IBM and Eclipse (A),” Harvard Bus. Sch. Case, pp.

1–20, 2005.

[30] M. Cusumano and A. Gawer, “The elements of platform leadership,” IEEE Eng. Manag.

Rev., vol. 43, no. 3, 2003.

[31] M. Porter, How competitive forces shape strategy. 1979.

[32] Open Handset Alliance, “Open Handset Alliance,” retrieved Aug. 2011.

[33] “Nokia X products - Nokia.” [Online]. Available:

http://www.microsoft.com/en/mobile/phones/nokia-x/. [Accessed: 23-Dec-2014].

[34] J. Osawa, Chinese Software to Challenge Android - WSJ.com. Online.wsj.com, 2012.

[35] Baidu prepares mobile operating system. Financial Times, 2011.

[36] R. Brandom, This is Nokia X: Android and Windows Phone collide. The Verge, 2013.

[37] IDC Research, “Worldwide Smartphone Shipments Edge Past 300 Million Units in the

Second Quarter,” IDC Research, 2014. [Online]. Available:

http://www.idc.com/getdoc.jsp?containerId=prUS25037214. [Accessed: 18-Aug-2014].

[38] R. Amadeo, “Google’s iron grip on Android: Controlling open source by any means necessary | Ars Technica,” Arstechnica, 2013. [Online]. Available: http://arstechnica.com/gadgets/2013/10/googles-iron-grip-on-android-controlling-open-source-by-any-means-necessary/. [Accessed: 12-Aug-2014].

[39] J. Brodkin, “Google blocked Acer’s rival phone to prevent Android ‘fragmentation’ | Ars

Technica,” Arstechnica, 2012. [Online]. Available:


http://arstechnica.com/gadgets/2012/09/google-blocked-acers-rival-phone-to-prevent-android-fragmentation/. [Accessed: 19-Aug-2014].

[40] “How the ASF works.” [Online]. Available: http://www.apache.org/foundation/how-it-works.html. [Accessed: 01-Sep-2014].

[41] Apache Software Foundation, “Project Management Committee Guide,” 2012. [Online]. Available:

http://www.apache.org/dev/pmc.html#what-is-a-pmc. [Accessed: 30-Aug-2014].

[42] E. S. Foundation, “Eclipse Development Process 2011,” Eclipse Development Process,

2011. [Online]. Available:

https://www.eclipse.org/projects/dev_process/development_process.php#4_6_1_PMC.

[Accessed: 30-Aug-2014].

[43] “Understanding the Open Source Development Model.” [Online]. Available:

file:///D:/Downloads/lf_os_devel_model.pdf. [Accessed: 30-Aug-2014].

[44] “Roles and Leadership — Mozilla.” [Online]. Available: https://www.mozilla.org/en-US/about/governance/roles/. [Accessed: 30-Aug-2014].

[45] “List of Projects | projects.eclipse.org.” [Online]. Available: https://projects.eclipse.org/.

[Accessed: 23-Dec-2014].

[46] N. Daidj and T. Isckia, “Entering the economic models of game console manufacturers,”

Commun. Strateg., 2009.

[47] J. Prieger and W. Hu, “Applications barrier to entry and exclusive vertical contracts in

platform markets,” Econ. Inq., 2012.

[48] “Frequently Asked Questions | Android Open Source.” [Online]. Available:

https://source.android.com/faqs.html#what-is-the-role-of-google-play-in-compatibility.

[Accessed: 30-Oct-2014].

[49] “BlackBerry, Amazon Licensing Agreement to Bring Thousands of New Apps | Inside BlackBerry.” [Online]. Available: http://blogs.blackberry.com/2014/06/amazon-appstore/?utm_medium=social&utm_source=TWITTER:BlackBerry&utm_campaign=Apps&linkId=8550417. [Accessed: 30-Oct-2014].


[50] “Ahead Of Smartphone Launch, Amazon Announces Its Appstore Has Tripled Year-Over-Year To 240,000 Apps | TechCrunch.” [Online]. Available: http://techcrunch.com/2014/06/16/ahead-of-smartphone-launch-amazon-announces-its-appstore-has-tripled-year-over-year-to-240000-apps/. [Accessed: 30-Oct-2014].

[51] “Portland Project hits 1.0 milestone | Ars Technica.” [Online]. Available:

http://arstechnica.com/uncategorized/2006/10/7977/. [Accessed: 10-Sep-2014].

[52] “Meet the BlackBerry wizardry that created its ‘better Android than Android’ • The Register.” [Online]. Available: http://www.theregister.co.uk/2013/11/25/revealed_how_blackberry_made_its_better_android_than_android/. [Accessed: 03-Sep-2014].

[53] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Communications of the ACM, 2005. [Online]. Available: http://research.google.com/archive/mapreduce-osdi04-slides/index.html. [Accessed: 02-Apr-2014].

[54] S. Ghemawat, H. Gobioff, and S. Leung, “The Google file system,” ACM SIGOPS Oper.

Syst. …, 2003.

[55] T. White, Hadoop: The Definitive Guide, 3rd ed. O’Reilly Media, 2012.

[56] S. Lohr, “The Origins of ‘Big Data’: An Etymological Detective Story,” New York Times, 2013. [Online]. Available: http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_php=true&_type=blogs&_r=0. [Accessed: 23-Sep-2014].

[57] F. X. Diebold, “A Personal Perspective on the Origin(s) and Development of ‘Big Data’: The Phenomenon, the Term, and the Discipline,” 2012.

[58] D. Laney, “3D data management: Controlling data volume, velocity and variety,” META

Gr. Res. Note, 2001.

[59] M. Beyer and D. Laney, “The Importance of ‘Big Data’: A Definition,” Stamford, CT: Gartner, 2012.


[60] E. F. Codd, “A relational model of data for large shared data banks,” Commun. ACM, vol.

13, no. 6, pp. 377–387, Jun. 1970.

[61] N. Shamgunov, “Scaling Up And Out,” Dr. Dobb’s, 2012. [Online]. Available: http://www.drdobbs.com/database/scaling-up-and-out/240142249. [Accessed: 11-Oct-2014].

[62] M. Gualtieri and N. Yuhanna, “The Forrester Wave™: Big Data Hadoop,” 2014.

[63] “Gartner Says Business Intelligence and Analytics Need to Scale Up to Support Explosive

Growth in Data Sources.” [Online]. Available:

http://www.gartner.com/newsroom/id/2313915. [Accessed: 02-Nov-2014].

[64] J. Kelly, “Data Warehouse Vendors Moving To Contain The Hadoop Threat,” Wikibon, 2014. [Online]. Available: http://wikibon.org/wiki/v/Data_Warehouse_Vendors_Moving_to_Contain_the_Hadoop_Threat. [Accessed: 03-Nov-2014].

[65] “Google Trends - Web Search interest: hadoop, big data, data warehouse - Worldwide, 2004 - present.” [Online]. Available: http://www.google.com/trends/explore#q=Hadoop%2C%20Big%20Data%2C%20Data%20warehouse&cmpt=q. [Accessed: 03-Nov-2014].

[66] “HDFS Alternatives - Hadoop Ecosystem.” [Online]. Available: http://hadoopecosystem.whatazoo.com/home/services/core-layers/persist/hdfs/hdfs-alternatives. [Accessed: 06-Dec-2014].

[67] “Apache Hadoop YARN – Background and an Overview - Hortonworks.” [Online]. Available: http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/. [Accessed: 16-Oct-2014].

[68] M. Zaharia and M. Chowdhury, “Spark: cluster computing with working sets,” … cloud

Comput., 2010.

[69] “Spark Incubation Status - Apache Incubator.” [Online]. Available:

http://incubator.apache.org/projects/spark.html. [Accessed: 10-Nov-2014].


[70] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel

programs from sequential building blocks,” ACM SIGOPS Oper. …, 2007.

[71] “Apache Hadoop 2 is now GA! - Hortonworks.” [Online]. Available:

http://hortonworks.com/blog/apache-hadoop-2-is-ga/. [Accessed: 25-Oct-2014].

[72] “Community Effort Driving Standardization of Apache Spark Through Expanded Role in Hadoop Projects.” [Online]. Available: http://www.cloudera.com/content/cloudera/en/about/press-center/press-releases/2014/07/01/community-effort-driving-standardization-of-apache-spark-through.html. [Accessed: 10-Nov-2014].

[73] C. Olston, B. Reed, and U. Srivastava, “Pig latin: a not-so-foreign language for data

processing,” Proc. 2008 …, 2008.

[74] “Ambari Incubation Status - Apache Incubator.” [Online]. Available:

http://incubator.apache.org/projects/ambari.html. [Accessed: 01-Nov-2014].

[75] CrunchBase, “CrunchBase Data Exports,” 2014. [Online]. Available:

http://info.crunchbase.com/about/crunchbase-data-exports/. [Accessed: 04-Nov-2014].

[76] Hortonworks, “Form S-1 Registration Statement under the Securities Act of 1933,” 2014. [Online]. Available: http://www.sec.gov/Archives/edgar/data/1610532/000119312514405390/d748349ds1.htm. [Accessed: 17-Nov-2014].

[77] “Big Data Vendor Revenue And Market Forecast 2013-2017 - Wikibon.” [Online]. Available: http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017. [Accessed: 03-Nov-2014].

[78] P. Zikopoulos, D. deRoos, K. Parasuraman, T. Deutsch, D. Corrigan, and J. Giles, Harness the Power of Big Data: The IBM Big Data Platform.

[79] H. Blog, “SAP + Hortonworks = Instant Access + Infinite Scale with HANA + Hadoop.”

[Online]. Available: http://hortonworks.com/partner/sap/. [Accessed: 08-Nov-2014].


[80] Oracle Corporation, “Introduction to Oracle Database,” 2013. [Online]. Available:

http://docs.oracle.com/cd/E11882_01/server.112/e25789/intro.htm#CNCPT88781.

[Accessed: 01-Nov-2014].

[81] “UDA Data Sheet: Exploit All Your Data with Teradata Unified Data Architecture™.” [Online]. Available: http://www.teradata.com/Resources/Brochures/UDA-Data-Sheet-Exploit-All-Your-Data-with-Teradata-Unified-Data-Architecture/?LangType=1033&LangSelect=true. [Accessed: 08-Nov-2014].

[82] “Microsoft Analytics Platform System Solution Brief.” [Online]. Available:

file:///C:/Users/kencw_000/Downloads/Analytics_Platform_System_Solution_Brief.pdf.

[Accessed: 08-Nov-2014].

[83] “Find Partners | MapR.” [Online]. Available: https://www.mapr.com/partners/find-partner.

[Accessed: 09-Nov-2014].

[84] “Partners.” [Online]. Available:

http://www.cloudera.com/content/cloudera/en/partners.html. [Accessed: 09-Nov-2014].

[85] “We do Hadoop. Together.” [Online]. Available: http://hortonworks.com/partners/.

[Accessed: 09-Nov-2014].

[86] “Magic Quadrant for Cloud Infrastructure as a Service,” Gartner Group. [Online]. Available: http://www.gartner.com/technology/reprints.do?id=1-1UKQQA6&ct=140528&st=sb. [Accessed: 06-Dec-2014].

[87] A. T. Labs., “Hadoop Deployment Comparison Study.”

[88] “Committers - Spark - Apache Software Foundation.” [Online]. Available: https://cwiki.apache.org/confluence/display/SPARK/Committers. [Accessed: 10-Nov-2014].

[89] “Cloudera Enterprise 5 Announced - insideBIGDATA.” [Online]. Available: http://insidebigdata.com/2013/10/29/cloudera-enterprise-5-announced/. [Accessed: 12-Nov-2014].


[90] “Cloudera Plans Data Hub Role For Hadoop - InformationWeek.” [Online]. Available: http://www.informationweek.com/big-data/software-platforms/cloudera-plans-data-hub-role-for-hadoop/d/d-id/1112099. [Accessed: 12-Nov-2014].

[91] J. Twentyman, “Cloudera vs. Hortonworks: Hadoop to complement or replace data warehouse.” [Online]. Available: http://www.computerweekly.com/feature/Cloudera-v-Hortonworks-Hadoop-to-complement-replace-data-warehouse.

[92] “Cloudera Trash Talks With Enterprise Data Hub Release - InformationWeek.” [Online]. Available: http://www.informationweek.com/big-data/software-platforms/cloudera-trash-talks-with-enterprise-data-hub-release/d/d-id/1113677. [Accessed: 14-Nov-2014].

[93] “Rob ‘Flipper’ Bearden plans to FLOAT his Hadoop heffalump • The Register.” [Online].

Available:

http://www.theregister.co.uk/2013/11/21/rob_bearden_hortonworks_playbook/?page=2.

[Accessed: 20-Nov-2014].

[94] “Teradata Portfolio for Hadoop.” [Online]. Available: http://www.teradata.com/Teradata-Portfolio-for-Hadoop/?LangType=1033&LangSelect=true. [Accessed: 31-Dec-2014].

[95] “Here’s why HP invested $50M in the Hortonworks approach to Hadoop — Tech News and Analysis.” [Online]. Available: https://gigaom.com/2014/08/02/heres-why-hp-invested-50m-in-the-hortonworks-approach-to-hadoop/. [Accessed: 31-Dec-2014].

[96] “MapR, Teradata Ink Deal, Bad Timing for Hortonworks?” [Online]. Available: http://www.cmswire.com/cms/big-data/mapr-teradata-ink-deal-bad-timing-for-hortonworks-027253.php. [Accessed: 31-Dec-2014].

[97] K. W. Mike Olson, “Private Interview.”

[98] “Intel and Cloudera: Why we’re better together for Hadoop - TechRepublic.” [Online]. Available: http://www.techrepublic.com/blog/data-center/intelcloudera/. [Accessed: 10-Apr-2014].

[99] “Intel Validates Hadoop Market - Wikibon.” [Online]. Available:

http://wikibon.org/wiki/v/Intel_Validates_Hadoop_Market. [Accessed: 20-Nov-2014].


[100] “The Community Effect | Cloudera Engineering Blog.” [Online]. Available:

http://blog.cloudera.com/blog/2011/10/the-community-effect/. [Accessed: 24-Nov-2014].

[101] “Reality Check: Contributions to Apache Hadoop - Hortonworks.” [Online]. Available: http://hortonworks.com/blog/reality-check-contributions-to-apache-hadoop/. [Accessed: 24-Nov-2014].

[102] “The Yahoo! Effect - Hortonworks.” [Online]. Available: http://hortonworks.com/blog/the-yahoo-effect/. [Accessed: 25-Nov-2014].

[103] “Enterprise Hadoop from Hortonworks.” [Online]. Available:

http://hortonworks.com/why-hortonworks/. [Accessed: 27-Nov-2014].

[104] “Hortonworks CEO Rob Bearden: Beware the Hadoop fragmentation | ZDNet.” [Online]. Available: http://www.zdnet.com/hortonworks-ceo-rob-bearden-beware-the-hadoop-fragmentation-7000013961/. [Accessed: 14-Nov-2014].

[105] “Built to Last: How MapR’s Business Model Supports That Goal | MapR.” [Online]. Available: https://www.mapr.com/blog/built-to-last-how-maprs-business-model-supports-that-goal#.VHavzovF_D8. [Accessed: 27-Nov-2014].

[106] “Cloudera whoops as its Hadoop loop-the-loops for cloud troupe • The Register.”

[Online]. Available:

http://www.theregister.co.uk/2013/10/28/cloudera_hadoop_cloud_partnerships/.

[Accessed: 01-Dec-2014].

[107] M. Stonebraker, “Hadoop at a Crossroads?,” Communications of the ACM, 2014. [Online]. Available: http://cacm.acm.org/blogs/blog-cacm/177467-hadoop-at-a-crossroads/fulltext#.U_-F6RqsWmc.twitter. [Accessed: 07-Oct-2014].

[108] “Analytics with Cassandra : DataStax.” [Online]. Available: http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-hadoop. [Accessed: 05-Dec-2014].