
Hadoop Hands On: Successes and Failures to Drive Evolution

Benoit PERROUD

Software Engineer @Verisign & Apache Committer

GITI BigData, EPFL, November 6. 2012

2Verisign Public

Disclaimer

• I apologize for speaking “Frenglish”

• The views and statements expressed in this talk do not necessarily reflect the views of VeriSign, Inc. VeriSign, Inc. and any other person involved in the company do not warrant the accuracy, reliability, currency or completeness of those views or statements and do not accept any legal liability whatsoever arising from any reliance on the views, statements and subject matter of the talk.

• Apache, Apache Hadoop, Hadoop, Cassandra, Apache Cassandra, Solr, Apache Solr, HBase, Apache HBase, Tomcat, Apache Tomcat, ZooKeeper, Apache ZooKeeper, Lucene, Apache Lucene and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

• Java, GlassFish and the Java logo are registered trademarks of Oracle and/or its affiliates.

• Python and the Python logo are either registered trademarks or trademarks of the Python Software Foundation.

• MongoDB, Mongo and the leaf logo are registered trademarks of 10gen, Inc.

• All other marks are the property of their respective owners.

Let’s talk about Hadoop!

Hadoop 10,000-Foot View

1. MapReduce Processing Framework

• Map, Combine, Shuffle, Reduce

2. Hadoop Distributed File System (HDFS)

Credit: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
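To make the four MapReduce phases concrete, here is a minimal in-memory sketch of the classic word count in Python. It does not use Hadoop's API. The function names (`map_phase`, `combine`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and each input string stands in for one mapper's input split.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in split.split()]


def combine(pairs):
    """Combine: pre-aggregate one mapper's output locally, like a Combiner."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(mapper_outputs):
    """Shuffle/sort: group every value by key across all mappers."""
    groups = defaultdict(list)
    for word, n in chain.from_iterable(mapper_outputs):
        groups[word].append(n)
    return dict(sorted(groups.items()))


def reduce_phase(word, values):
    """Reduce: sum the partial counts for one key."""
    return word, sum(values)


def word_count(splits):
    mapper_outputs = [combine(map_phase(split)) for split in splits]
    grouped = shuffle(mapper_outputs)
    return dict(reduce_phase(word, values) for word, values in grouped.items())


print(word_count(["the cat sat", "the cat ran"]))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```

The combiner only reduces how much data crosses the shuffle. It yields the same final counts here because summing is associative and commutative.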

Your First Hadoop Deployment

• Pseudo-distributed mode on a single node: each Hadoop daemon runs as a separate Java process on one machine

Going Distributed

• The TaskTracker (TT) and DataNode (DN) are moved to a dedicated box

NameNode Single Point of Failure

• The NameNode crashes: configuring a Primary NameNode (PNN) and a Secondary NameNode (SNN)

The NFS HA setup is not detailed here.
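Some context on the SNN bullet: in Hadoop 1.x the Secondary NameNode is not a hot standby. It periodically merges the NameNode's namespace image (fsimage) with the edit log and sends back a fresh image, which keeps the edit log from growing without bound. The toy model below sketches only that merge. `NameNodeState`, `replay` and `checkpoint` are hypothetical names, and the namespace is reduced to a dictionary mapping each path to its replication factor.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeState:
    # Toy namespace image: path -> replication factor.
    fsimage: dict = field(default_factory=dict)
    # Mutations logged since the last checkpoint, as (op, path, value).
    edits: list = field(default_factory=list)

    def log(self, op, path, value=None):
        # Namespace changes are appended to the edit log, not written to the image.
        self.edits.append((op, path, value))


def replay(image, edits):
    """Apply logged edits on top of a copy of an fsimage."""
    namespace = dict(image)
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return namespace


def checkpoint(state):
    """Merge fsimage and edits into a new image with an empty edit log."""
    return NameNodeState(fsimage=replay(state.fsimage, state.edits), edits=[])


state = NameNodeState()
state.log("create", "/logs/a", 3)
state.log("create", "/logs/b", 2)
state.log("delete", "/logs/a")
merged = checkpoint(state)
print(merged.fsimage)  # {'/logs/b': 2}
```

Edits logged after the latest checkpoint exist only in the primary's edit log. Restoring from the SNN's image alone can therefore lose them.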

Bringing Data into the Cluster

• Data can be internal to the company, but also external.

Data retrieval and stream ingestion are oversimplified here.

Dealing with API Changes

• Integration/Validation cluster setup

The validation cluster is omitted from further slides for clarity.

Cluster Is Growing

Add Monitoring

Turn On Rack Awareness
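Rack awareness tells HDFS which rack each DataNode sits in. In Hadoop 1.x that mapping typically comes from an administrator-supplied topology script. With a replication factor of 3, HDFS's default policy places a block's replicas as follows: the first on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. A copy therefore survives the loss of a whole rack, and only one replica has to cross the inter-rack link. The sketch below reproduces only that rule. `place_replicas` and the `topology` dictionary are hypothetical. Real HDFS falls back to other nodes when the constraints cannot be met, whereas this version simply raises.

```python
import random


def place_replicas(topology, writer, rng=None):
    """Pick 3 DataNodes for one block, following HDFS's default rack-aware rule.

    topology: node name -> rack path (what a topology script would resolve)
    writer:   the DataNode writing the block (assumed to be in the topology)
    """
    rng = rng or random.Random()
    # 1st replica: the writer's own node.
    first = writer
    local_rack = topology[writer]
    # 2nd replica: any node on a different rack.
    remote = sorted(n for n, rack in topology.items() if rack != local_rack)
    if not remote:
        raise ValueError("need at least two racks")
    second = rng.choice(remote)
    # 3rd replica: a different node on the same rack as the 2nd.
    peers = sorted(n for n, rack in topology.items()
                   if rack == topology[second] and n != second)
    if not peers:
        raise ValueError("the remote rack needs at least two nodes")
    third = rng.choice(peers)
    return [first, second, third]


topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(place_replicas(topology, "dn1", random.Random(42)))
```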

Split the Cluster into Production and Research

Data Retrieval through a REST Endpoint

Data Retrieval with Search Features
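The slide does not say how search was added. The disclaimer's mention of Solr and Lucene suggests a Lucene-based index. The core structure behind such engines is an inverted index, which maps each term to the documents that contain it. A minimal sketch with boolean AND queries follows. `build_index` and `search` are hypothetical, and real engines add tokenization, scoring and on-disk segments on top of this.

```python
from collections import defaultdict


def build_index(docs):
    """docs: doc id -> text. Returns term -> sorted list of doc ids (postings)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


index = build_index({1: "Hadoop cluster", 2: "Hadoop cache", 3: "Solr cache"})
print(search(index, "hadoop cache"))  # [2]
```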

Data Retrieval: Add a Cache
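The slide does not specify a caching strategy. A common choice for a read-through cache in front of a slower retrieval path is a bounded least-recently-used (LRU) cache, and the sketch below assumes that policy. `LRUCache` and its `loader` callback are hypothetical stand-ins for whatever lookup actually hits the cluster.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache that evicts the least recently used entry."""

    def __init__(self, capacity, loader):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.loader = loader          # slow path, called only on a miss
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value


def slow_lookup(key):
    # Stand-in for an expensive query against the cluster.
    return key.upper()


cache = LRUCache(capacity=2, loader=slow_lookup)
cache.get("a")
cache.get("b")
cache.get("a")  # hit: "a" becomes the most recently used
cache.get("c")  # miss: evicts "b"
print(list(cache.entries), cache.misses)  # ['a', 'c'] 3
```

An `OrderedDict` keeps both lookup and eviction at constant time, which is why it is a convenient basis for a small LRU cache.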

Data Visualization Tools

Upstream Updates Channel

Realtime Updates

Future Evolutions

• Hadoop Next Gen
  • YARN (Hadoop 2.0)

• Graph processing
  • Neo4j
  • Google Pregel / Apache Hama

• Incremental updates

• Real-time ad hoc queries
  • Cloudera Impala / Google Dremel
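Pregel and Apache Hama follow the bulk synchronous parallel (BSP) model. Computation proceeds in supersteps: each vertex processes the messages sent to it during the previous superstep, may update its value, and may message its neighbours. The job ends when every vertex has voted to halt and no messages are in flight. The single-process sketch below simulates that loop on the maximum-value propagation example commonly used to introduce Pregel. It is not the API of Pregel, Hama or Neo4j, and `pregel_max` and its parameters are hypothetical.

```python
def pregel_max(values, edges, max_supersteps=30):
    """Propagate the largest vertex value along directed edges, Pregel style.

    values: vertex -> initial value
    edges:  vertex -> list of out-neighbours (each must also be a key of values)
    """
    state = dict(values)
    # Superstep 0: every vertex is active and sends its value to its out-neighbours.
    inbox = {v: [] for v in state}
    for v in state:
        for n in edges.get(v, []):
            inbox[n].append(state[v])
    for _ in range(max_supersteps):
        outbox = {v: [] for v in state}
        changed = False
        for v, messages in inbox.items():
            # A vertex only wakes up if a message carries a larger value.
            if messages and max(messages) > state[v]:
                state[v] = max(messages)
                changed = True
                for n in edges.get(v, []):
                    outbox[n].append(state[v])
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
        if not changed:  # all vertices halted and no messages are in flight
            break
    return state


values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "d"], "c": ["b", "d"], "d": ["c"]}
print(pregel_max(values, edges))  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

The `max_supersteps` cap is a safety net for this sketch. Max propagation always converges, because a vertex's value can only increase and values come from a finite set.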

Conclusion

• Hadoop has gained huge momentum

• The technologies around Hadoop are evolving very fast

• There is no “one size fits all” solution

• The design is heavily driven by customer needs

• Data quality is a hidden requirement

Conclusion #2

• Data scientists cost a lot

• Running on commodity hardware still costs a lot

• No one has a full understanding of the entire data flow

• And you need several FTEs (full-time equivalents) just to keep track of the architecture

• There is a high risk of misusing these tools

• Hiring engineers with deep knowledge (meaning hands-on experience) of some of these tools is already a challenge

Recommended Reading

Hadoop in Practice

by Alex Holmes

Senior Software Engineer @Verisign

Q & A

Benoit PERROUD

[email protected]

Thank You

© 2012 VeriSign, Inc. All rights reserved. VERISIGN and other trademarks, service marks, and designs are registered or unregistered trademarks of VeriSign, Inc. and its subsidiaries in the United States and in foreign countries. All other trademarks are property of their respective owners.