30
Lessons in Apache Software integration Roman Shaposhnik [email protected] Cloudera Inc.

Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik [email protected] Cloudera Inc

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Lessons in Apache Software integration

Roman Shaposhnik [email protected]

Cloudera Inc.

Page 2: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

$ whoami● An open source (UNIX) software developer

– Linux kernel, C/C++ compilers, FFmpeg, Plan9

● A Hadoop guy● Apache Software Foundation Incubator PMC

– [Bigtop], Hadoop Development Tools, Celix, Helix

● VP of Apache Bigtop

Page 3: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Apache Bigtop

“open-source software related to a system for integration, packaging, deployment and validation of a big data management software distribution based on Apache Hadoop”

Page 4: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

4

Remember what Debian did to Linux?

GNU Software Linux kernelLinux kernel

Page 5: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

5

Bigtop is trying to do it with Hadoop

Hadoop Ecosystem(Pig, Hive, Mahout)

Linux kernelHadoop(HDFS + MR)

CDH4 beta 1

Page 6: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

What is missing, really?● Bigdata management platform view● An generalist 'yin' for specialist 'yang'● Shared, community driven:

– Use cases

– Best practices

– Upcoming standards

– Integration with

Page 7: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc
Page 8: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc
Page 9: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc
Page 10: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc
Page 11: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

One way of using ASF software:$ wget http://apache.org/httpd.tar.gz

$ tar xzvf httpd.tar.gz

$ cd httpd

$ ./configure ; make

$ make install

ERROR: can't write to /usr/local/bin

$ sudo make install

Page 12: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

A different way:

$ sudo apt-get install httpd

Would you like to also upgrade your conf?

Page 13: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

An “ultimate” way:

$ bigtop launch-cluster –config ./hbase.ini

Page 14: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Aren't we already there?

$ whirr launch-cluster –config ./hbase.ini

Page 15: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Key challenges● A really diverse set of components● High churn APIs● Asynchronous development cycles● Combinatoric explosion of dependencies● Java based● Fundamentally distributed applications

Page 16: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

16

ZooKeeper (coordination)

HUE (web based UI)

HBase YARN/MR1HBase

HDFS (filesystem)

Pig (DQL) Hive (SQL) Impala (SQL)

Oozie

Page 17: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

17

ZooKeeper (coordination)

HUE (web based UI)

HBase YARN/MR1HBase

HDFS (filesystem)

Pig (DQL) Hive (SQL) Impala (SQL)

Oozie

Page 18: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

18

It is a jungle out there

Zookeeper

Hadoop

HDFS

YARN

MR1

HTTPFS

HBase

Pig

Hive

Impala

Sqoop

Oozie

Whirr

Mahout

Flume

Giraph

Hama

Hue

Solr

Crunch

JDK/JRE

Kerberos

Ganglia

Nagios

JSVC

Tomcat

Utils

Postgress

HTTPD

Page 19: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

19

HBaseHBase

Hadoop (1.0, 0.22, 0.23)

Dependencies Inferno:

Hive 0.8.1

HBaseHbase (0.92, 0.90)

A million dollar question:

$ tar xzvf hive-0.8.1.tar.gz$ ls hive-0.8.1/lib

Page 20: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

20

HBaseHBase

Hadoop (1.0, 0.22, 0.23)

Dependencies Inferno:

Hive 0.8.1

HBaseHbase (0.92, 0.90)

A million dollar question:

$ tar xzvf hive-0.8.1.tar.gz$ ls hive-0.8.1/lib

hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar

Page 21: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Lessons in cat herding● Admitting the problem● On origins of suffering● You can't make “them” do “it”● The real world is highly asynchronous ● The art of making friends● YLH

Page 22: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

The origin of suffering is attachment● Don't get attached to your code ● Don't waste your time on ill-maintained code● Don't second guess your users● Do provide capabilities, not polices● Do focus on specialization● Do allow customization

Page 23: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

You can't make “them” do “it”● Don't expect common dependencies● Don't expect agreement on use cases● Don't ask – offer:

<groupId>org.apache.hadoop</groupId>

<artifactId>hadoop-core</artifactId>

<version>${hadoop.version}</version>

<optional>true</optional> <classifier>hadoop-2.0.2</classifier>

Page 24: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Embrace asynchronous nature ● Don't expect flag days● Don't expect agreement on releases● Do practice Last Known Good Builds

Av1 Bv22

Cv3 Dv4

Av1 Bv2

Cv3 Dv2

........Av1 Bv2

Cv3 Dv4

Bv22

Dv44

Page 25: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Make yourself indispensable ● Be nice● Do provide glue code● Do provide tons of automation● Do provide missing testing● Do participate in upstream communities:

– RC votes

– Release Planning

Page 26: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

What does Bigtop offer:● Community focused on all of the above● Software for:

– Integration

– Build (make, Maven)

– Packaging (RPM, DEB)

– Deployment (Puppet)

– Testing (iTest)

● A continuous integration Jenkins server

Page 27: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

Who's on-board?● Cloudera

– CDH4 is 100% based on Bigtop (hadoop v2)

● WANdisco● TrendMicro● Hortonworks, EMC, EBay, Intel (partially)● Canonical

– Ubuntu Server: Hadoop and Bigdata blueprint

● Illumos (early stages of interest)

Page 28: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

What's happening● A special release: Bigtop 0.3.0-incubating

– Hadoop 1.0.1

● Last stable release: Bigtop 0.5.0– Hadoop 2.0.2-alpha

● Next stable release: Bigtop 0.6.0– End of Mar 2013 release

– Hadoop 2.0.3-beta (DANGER! DANGER!)

– Major focus on developers

Page 29: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

What does Bigtop need?● More of you!

– “Silicon Valley Hands-on Programming”http://www.meetup.com/HandsOnProgrammingEvents/

● More infrastructure for build/test– EC2, Supercell, EMC magic cluster,

CloudStack

● More integration tests– Convince your bosses to commit to Bigtop

● Validate upstream release using Bigtop

Page 30: Lessons in Apache Software integrationarchive.apachecon.com/na2013/presentations/28-Thursday...Lessons in Apache Software integration Roman Shaposhnik rvs@apache.org Cloudera Inc

How to get in touch● Bigtop home @Apache:

– http://bigtop.apache.org/

● Hangout places:– {dev,user}@bigtop.apache.org– #bigtop on Freenode

● Roman Shaposhnik– [email protected], [email protected]