My talk at LVEE 2016

Using Hadoop stack to build a cloud VAT declarations revising service

Alex Chistyakov, Git in Sky

Grodno, LVEE 2016

Who I am

● Hello, my name is Alex

● Principal Engineer @ Git in Sky

● Hadoop operations engineer

● Former Java developer (not only Java and not so “former” in fact)

Who are you?

● Linux and OSS enthusiasts?

● Software developers?

● DevOps engineers?

● Big data guys?

Well, what is this all about?

● Configuring a Hadoop/HBase cluster is easy

● 1) Buy a lot of hardware

● 2) Configure the bloody cluster!

● 3) ???

● 4) PROFIT!!!

Big Data is hard!

● A customer wants a number of environments for different purposes (dev, testing, staging & production)

● DevOps culture requires repeatability!

● (Observe a beautiful snowflake to the right)

● Business wants to reduce costs
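To make the repeatability point concrete, here is a minimal sketch of rendering one Hadoop-style `core-site.xml` per environment from a single source of truth. It is an illustration under stated assumptions, not the tooling used in the project: the host names and the helper names `render_site_xml` and `render_all` are hypothetical, and only the property names `fs.defaultFS` and `io.file.buffer.size` come from stock Hadoop.

```python
import xml.etree.ElementTree as ET

# Settings shared by every environment.
COMMON = {"io.file.buffer.size": "131072"}

# Hypothetical per-environment overrides; the host names are made up.
ENVIRONMENTS = {
    "dev": {"fs.defaultFS": "hdfs://namenode.dev.example.com:8020"},
    "testing": {"fs.defaultFS": "hdfs://namenode.testing.example.com:8020"},
    "staging": {"fs.defaultFS": "hdfs://namenode.staging.example.com:8020"},
    "production": {"fs.defaultFS": "hdfs://namenode.prod.example.com:8020"},
}


def render_site_xml(properties):
    """Render a Hadoop-style *-site.xml document from a dict of properties."""
    root = ET.Element("configuration")
    # Sort the keys so the same input always yields byte-identical output.
    for name in sorted(properties):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(properties[name])
    return ET.tostring(root, encoding="unicode")


def render_all(environments, common):
    """Return {environment name: rendered XML}, merging overrides onto common."""
    return {
        env: render_site_xml({**common, **overrides})
        for env, overrides in environments.items()
    }


if __name__ == "__main__":
    for env, xml_text in render_all(ENVIRONMENTS, COMMON).items():
        print(env, xml_text)
```

Because every environment is produced by the same function from the same data, a snowflake can only appear if somebody edits a server by hand.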

So, we need a detailed plan

● 1) Buy an enterprise subscription from Oracle

● ^ FAIL!

So, we need a detailed plan

● 1) Read the manual on the product site

● 2) Configure everything manually

● ^ FAIL!

So, we need a detailed plan

● 1) Take Cloudera distribution of Hadoop

● 2) Configure everything from a web interface

● 3) Don’t forget to buy an enterprise subscription

● 4) ^ MULTIPLE FAILS!!!

A word on proprietary software

● Proprietary software is full of nasty bugs, period

A word on open source software

● Open source software is awesome

Software market in 2016

● It’s not “proprietary vs open source”

● It’s “open source vs open source”

Open source vs open source

● Cloudera CDH vs vanilla Apache

So, we need a detailed plan

● 1) Hire a DevOps engineer

● 2) Use Chef or something

● 3) Automate all the things

● 4) ???

● 5) PROFIT!!!

100 reasons not to use Cloudera CDH

● Cloudera CDH obscures configuration

● Cloudera CDH generates textual configs from the DB

● Cloudera CDH is web-interface centric

● Cloudera CDH is a monolith with a vendor lock-in
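One practical consequence of keeping configuration as plain text files rather than in a database is that every change can be reviewed as a diff. The sketch below shows that idea using Python's standard `difflib`. The helper name `config_diff` and the sample file contents are hypothetical; `dfs.replication` is a stock HDFS property, used here only as an example.

```python
import difflib


def config_diff(old_text, new_text, name="hdfs-site.xml"):
    """Return a unified diff between two versions of a text config file."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
    )


# Hypothetical before/after versions of the same file.
OLD = (
    "<configuration>\n"
    "  <property>\n"
    "    <name>dfs.replication</name>\n"
    "    <value>2</value>\n"
    "  </property>\n"
    "</configuration>\n"
)
NEW = OLD.replace("<value>2</value>", "<value>3</value>")

if __name__ == "__main__":
    print(config_diff(OLD, NEW))
```

With configs generated from a database behind a web interface, the equivalent question ("what exactly changed, and who changed it?") is much harder to answer.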

Our own little open source product

● Based on Ansible (Ansible is like Chef but awesome)

● https://github.com/gitinsky/ansible-hadoop-stack-howto

● https://github.com/gitinsky/ansible-role-*

Problems

● Lack of documentation

● Lack of manpower

● Nobody uses our product (except us)

What about the VAT service thing?

● Forget it, it’s not that relevant

Conclusions

● Open source software is awesome

● But Cloudera CDH is not

● We can make open source software better

So long, and thanks for all the fish!

● Please ask your questions

● Alex Chistyakov, Principal Engineer @ Git in Sky

● http://gitinsky.com

● alex@gitinsky.com

● http://meetup.com/DevOps-40