The BDAS Open Source Community


UC  BERKELEY  

Ion Stoica, UC Berkeley and Databricks

Growing Beyond AMPLab

As software matures and becomes successful, more and more contributors come from outside AMPLab

New startups have anchored development
» Databricks (Spark Stack)
» Mesosphere (Mesos)
» …

Enables AMPLab to focus more resources on future systems instead of software maintenance

Apache Spark

[Figure: BDAS stack diagram. Components shown: Velox Model Serving, Tachyon, SparkStreaming, SparkSQL, BlinkDB, GraphX, MLlib, MLBase, SparkR, Sample Clean, Apache Spark (core), Apache Mesos, Yarn; storage: HDFS, S3, …; applications: Cancer Genomics, Energy Debugging, Smart Buildings]

Apache Spark

Open Source: end of 2010

Apache Project: 2013

Over time has grown to include key libraries
» SparkStreaming, SparkSQL, MLlib, GraphX

Becoming a platform for Big Data apps

Apache Spark Today

[Figure: two bar charts of activity in the past 6 months, Commits and Lines of Code Changed, comparing MapReduce, YARN, HDFS, Storm, and Spark]

2-3x more activity than: Hadoop, Storm, MongoDB, NumPy, D3, Julia, …

Meetups Around the World

Monthly Contributors

[Figure: monthly contributors, 2011 to 2014 (axis 0 to 100), with an annotation marking "Databricks founded"]

370+ contributors for last 12 months

Spark Stack (2013)

[Figure: stack diagram. Components shown: Tachyon, SparkStreaming, Shark, BlinkDB, MLlib, MLBase, Sample Clean, Apache Spark (core), Apache Mesos, Yarn; storage: HDFS, S3, …; applications: Cancer Genomics, Energy Debugging, Smart Buildings]

Last Year Developments

[Figure: stack diagram. Components shown: Tachyon, SparkStreaming, Shark, SparkSQL, GraphX, MLlib, SparkR, BlinkDB, MLBase, Velox Model Serving, Sample Clean, Apache Spark (core), Apache Mesos, Yarn; storage: HDFS, S3, …; applications: Cancer Genomics, Energy Debugging, Smart Buildings]

Wide Adoption

All major Hadoop distributions include Spark

Beyond Hadoop

Databricks: spurred Spark’s enterprise growth

[Figure: partner logos]

Apache Mesos

[Figure: BDAS stack diagram. Components shown: Velox Model Serving, Tachyon, SparkStreaming, SparkSQL, BlinkDB, GraphX, MLlib, MLBase, SparkR, Sample Clean, Apache Spark, Apache Mesos, Yarn; storage: HDFS, S3, …; applications: Cancer Genomics, Energy Debugging, Smart Buildings]

Apache Mesos

Open Source: 2010

Apache Project: 2012

Used in production at Twitter for the past 2.5 years
» +10,000 machines
» +500 engineers using it

Most development moved outside Berkeley starting in 2012

Monthly Contributors

65 contributors for last 12 months

Mesosphere founded

BDAS Stack

[Figure: BDAS stack diagram. Components shown: Velox Model Serving, Tachyon, SparkStreaming, SparkSQL, BlinkDB, GraphX, MLlib, MLBase, SparkR, Sample Clean, Apache Spark, Apache Mesos, Yarn; storage: HDFS, S3, …; applications: Cancer Genomics, Energy Debugging, Smart Buildings]

Release Growth

Tachyon 0.1 (Dec ’12): 1 contributor

Tachyon 0.2 (Apr ’13): 3 contributors

Tachyon 0.3 (Oct ’13): 15 contributors

Tachyon 0.4 (Feb ’14): 30 contributors

Tachyon 0.5 (July ’14): 46 contributors

Fast Growing Community

[Figure: Berkeley contributors vs. non-Berkeley contributors (20+ companies)]

~80% of contributors are already outside AMPLab

Reaching Tipping Point


Research to Real-World Impact

[Figure: projects placed on an axis from Research to Real-world Impact: Apache Spark (core), MLlib, Spark Streaming, Spark SQL, Apache Mesos, GraphX, Tachyon, Velox, Succinct, ADAM, BlinkDB; committers / commits split into AMPLab/Berkeley and Non-Berkeley]

Impact on AMPLab

Created a blueprint and ecosystem for other BDAS components to succeed
» MLlib, GraphX, Tachyon, …

Enabled AMPLab to increase focus on new research projects
» Velox, ADAM, Succinct, …
