Is There Room For Another Elephant In Tucson


DESCRIPTION

Would you like to scale data-intensive tasks horizontally? Would you like an open source project that gives you that foundation? Well, there is one: Apache Hadoop. It's a Java software framework for supporting data-intensive distributed applications. The framework was inspired by Google's papers on their MapReduce framework and the Google File System.

Who uses Hadoop? Here's a short list: Yahoo!, A9.com, LinkedIn, Facebook, ImageShack, eHarmony, Hulu, Last.fm, and The New York Times. The highest profile user, Yahoo!, is also a major contributor to the project. They use it extensively in their web search and advertising divisions.

In this talk, titled "Is there room for another elephant in Tucson?", Andrew Lenards will tell us about Hadoop and describe how it could be applied to several practical problems, even if you aren't as big as Google.


Is There Room for Another Elephant in Tucson?

Tucson Java User Group

Andrew Lenards

andrew.lenards@gmail

UA grad, Dec 2001
former teaching assistant, UA CS
former instructor, UA CS
reformed .NET developer
10 years on/off coding Java
Co-founder, UA Student ACM

Active in:
• Tucson Java User Group
• Tucson Startup Drinks

Semi-active in:
• Tucson Free Unix Group
• Ubuntu Arizona Local Community

Why do I care?

Hadoop holds the TeraSort, MinuteSort, and GraySort benchmarks...

Captured TeraSort in 2008
Captured MinuteSort, GraySort in 2009

Metrics (http://sortbenchmark.org/)

GraySort: Sort rate (TBs / minute) achieved while sorting a very large amount of data (currently 100 TB minimum).

MinuteSort: Amount of data that can be sorted in 60.00 seconds or less.

TeraSort*: Elapsed time to sort 10^12 bytes of data.

* Now deprecated

Establishing our setting...

Differing World Views

Brendan's quote...

"When a computer fails, they don't bother trying to find it and fix it.  They just add more machines."

-- former UA student, former Google intern, circa 2000

End of Moore's Law?


Scaling Up: Endangered Practice?

Polar Bear: Endangered Species

DATA!


Zettabytes???

• The New York Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
• The Large Hadron Collider will produce about 15 petabytes of data per year.

The "digital universe" is estimated to be 1.8 zettabytes by 2011

Source: "Hadoop: The Definitive Guide, Tom White"

MapReduce: the Abstraction

MapReduce

Introduced in 2004 Google paper: "MapReduce: Simplified Data Processing on Large Clusters"

 "Structured as functional programming meets distributed processing" (Aaron Kimball, Cloudera)

Designed for batch processing, not for interactive use

MapReduce + RDBMS, not versus

            Traditional RDBMS            MapReduce
Data Size   Gigabytes                    Petabytes
Access      Interactive & batch          Batch
Updates     Read and write many times    Write once, read many times
Structure   Static schema                Dynamic schema
Integrity   High                         Low
Scaling     Nonlinear                    Linear

Source: "Hadoop: The Definitive Guide, Tom White"

Shared-state makes everything hard...

Sharing requires the use of communication mechanisms between processes (which we know complicates things).

The MapReduce abstraction limits communication to keep its benefits.

• Mappers do not need to communicate with each other
• Reducers do not need to communicate with each other

Shared Nothing Architecture

Introduced in a 1986 paper by Michael Stonebraker on distributed computing architectures, but it also applies to large-scale web applications.

Note: Stonebraker was co-author of "MapReduce: A Major Step Backwards" in January 2008.

Functional inspiration, but not dogmatic

Functions w/ no side-effects are pure functions
• Map is an n-to-n operation
• Fold is an n-to-1 operation (often called a "reduce")

With MapReduce, we define a problem in Mappers & Reducers

However, a Mapper can produce more than 1 key per element.  And a Reducer may produce many values.  So the abstraction is not married to the functional model.
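The difference between the pure functional model and the MapReduce model can be sketched in a few lines. This is plain Python, not the Hadoop API; the `mapper` and `reducer` functions and the word/line-number example are assumptions chosen only for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Functional map: n inputs -> n outputs, no side effects.
squares = list(map(lambda x: x * x, numbers))

# Functional fold ("reduce"): n inputs -> 1 output.
total = reduce(lambda acc, x: acc + x, squares, 0)


def mapper(line_number, line):
    """MapReduce-style mapper: one input record may emit many (key, value) pairs."""
    for word in line.split():
        yield (word, line_number)


def reducer(word, line_numbers):
    """MapReduce-style reducer: one key may emit many values, not just one."""
    for n in sorted(set(line_numbers)):
        yield (word, n)
```

Here a single input line produces one pair per word, and the reducer emits one pair per distinct line number, so neither step is strictly n-to-n or n-to-1.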

Partitioning work...

Scaling out horizontally with MapReduce is done by breaking large files into chunks (or blocks) and bringing computation to the data (data locality).

The "blocks" are the input to Mappers, so work partitioning is implicit to the system.
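As a rough illustration of that partitioning, the sketch below cuts a file of a given size into fixed-size blocks, each of which would become the input to one Mapper. It is plain Python, not Hadoop code; `split_into_blocks` is a hypothetical helper, and the 64 MB block size is an assumption used only for the arithmetic (block size is configurable).

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size of 64 MB, for illustration only


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


one_gigabyte = 1024 ** 3
blocks = split_into_blocks(one_gigabyte)
print(len(blocks))  # 16 blocks, so up to 16 Mappers could work in parallel
```

The job author never writes this loop; the point is only that the unit of parallel work falls out of how the file is stored.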

Raising the level of abstraction

MapReduce allows you to focus on the problem and lets the library deal w/ the messy details.

An understanding of the high-level domain and the low-level details no longer needs to exist within the same human-form.

MapReduce Usage

Example usage...

• Distributed Grep
• Word Count / Count URL Frequency
• Inverted Index
• Term-Vector per Website
• Reverse Web-Link Graph

Apache Web Server Logfiles

Suppose we want to do a simple analysis of visits per host.

An abstract view of the inputs and outputs would be:

<k1, v1> -> Mapper -> <k2, v2> -> Reducer -> <k3, v3>

or

(<line-number>, <line>) --> Mapper --> (<hostname>, 1)
(<hostname>, 1) --> Reducer --> (<hostname>, count)

crawl-66-249-71-34.googlebot.com - - [16/Aug/2009:04:40:36 -0700] "GET /tree/home.pages/searchTOL?taxon=Arna&Submit2=Find&startline=26 HTTP/1.1" 200 14693 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
astdenis-105-1-18-94.w81-248.abo.wanadoo.fr - - [16/Aug/2009:04:40:36 -0700] "GET /onlinecontributors/img/quicknav/RightArrow.png HTTP/1.1" 200 321 "http://www.tolweb.org/Echinodermata" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6)"
localhost.localdomain - - [16/Aug/2009:04:40:36 -0700] "GET /onlinecontributors/app?service=external&page=ViewBranchOrLeaf&sp=SSiphonophorida&sp=S8149&sp=S HTTP/1.1" 200 5894 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
llf531045.crawl.yahoo.net - - [16/Aug/2009:04:40:36 -0700] "GET /Siphonophorida/8149 HTTP/1.0" 200 5894 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
localhost.localdomain - - [16/Aug/2009:04:40:36 -0700] "GET /onlinecontributors/app?service=external&page=ViewBranchOrLeaf&sp=SCampanulotes&sp=S73605&sp=S HTTP/1.1" 200 8571 "-" "Mozilla/4.0"
65.55.108.238 - - [16/Aug/2009:04:40:36 -0700] "GET /Campanulotes/73605 HTTP/1.1" 200 8571 "-" "Mozilla/4.0"
astdenis-105-1-18-94.w81-248.abo.wanadoo.fr - - [16/Aug/2009:04:40:36 -0700] "GET /tree/img/magnify.gif HTTP/1.1" 200 124 "http://www.tolweb.org/Echinodermata" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6)"
localhost.localdomain - - [16/Aug/2009:04:40:37 -0700] "GET /onlinecontributors/app?service=external&page=ViewBranchOrLeaf&sp=SHomo&sp=S16418&sp=S HTTP/1.1" 200 10283 "-" "Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)"
crawler5108.ask.com - - [16/Aug/2009:04:40:37 -0700] "GET /Homo/16418 HTTP/1.0" 200 10283 "-" "Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)"
astdenis-105-1-18-94.w81-248.abo.wanadoo.fr - - [16/Aug/2009:04:40:37 -0700] "GET /tree/img/tinylink.png HTTP/1.1" 200 207 "http://www.tolweb.org/Echinodermata" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6)"
msnbot-65-55-105-180.search.msn.com - - [16/Aug/2009:04:40:37 -0700] "GET /onlinecontributors/app?page=ImageGallery&service=external&sp=l27570&state:ImageGallery=ZH4sIAAAAAAAAAFvzloG1nJeBgYGJgYEtLz8l1TOluIiBLyuxLFEvJzEvXc8nPy%2FduvvJhDP9yveZGBi9GFjLEnNKUyuKGAQQivxKc5NSi9rWTJXlnvKgG2hURQEDGGRfKhdgYODNTU3JTHTOSSwu9swrAZoviNAKFEhNTy0SerRgyffGdgugFZ4wKwoZ6hgYQaYAAKhZ4XSlAAAA HTTP/1.1" 200 7899 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"

Input to the Mappers...

(0, "crawl-66-249-71-34.googlebot.com - - [16/Aug...")
(1, "localhost.localdomain - - [16/Aug/2009:04:40...")
(2, "crawler5108.ask.com - - [16/Aug/2009:04:40:3...")
(3, "msnbot-65-55-105-180.search.msn.com - - [16/...")
(4, "astdenis-105-1-18-94.w81-248.abo.wanadoo.fr ...")
...

Output from Mappers, Input to the Reducers...

("com.googlebot.crawl-66-249-71-34", 1)
("com.googlebot.crawl-66-249-71-34", 1)
("com.ask.crawler5108", 1)
("com.ask.crawler5108", 1)
("com.msn.search.msnbot-65-55-220-136", 1)
("com.msn.search.msnbot-65-55-105-180", 1)
("com.msn.search.msnbot-65-55-220-136", 1)
("com.msn.search.msnbot-65-55-220-136", 1)
("fr.wanadoo.abo.w81-248.astdenis-105-1-18-94", 1)

Output from Reducers...

("com.ask.crawler5108", 2)
("com.googlebot.crawl-66-249-71-34", 2)
("com.msn.search.msnbot-65-55-220-136", 3)
("com.msn.search.msnbot-65-55-105-180", 1)
("fr.wanadoo.abo.w81-248.astdenis-105-1-18-94", 1)
...
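The whole visits-per-host flow can be simulated in a single process. The sketch below is plain Python rather than a real Hadoop job: `run_job` is a hypothetical stand-in for the framework's shuffle-and-sort step, the log lines are shortened, and reversing the hostname labels is an assumption inferred from the keys shown in the mapper output.

```python
from collections import defaultdict


def reverse_hostname(host):
    """Turn 'crawler5108.ask.com' into 'com.ask.crawler5108' (assumed key format)."""
    return ".".join(reversed(host.split(".")))


def mapper(line_number, line):
    """Emit (reversed hostname, 1) for one log line; the line number is ignored."""
    host = line.split(" ", 1)[0]
    yield (reverse_hostname(host), 1)


def reducer(host, counts):
    """Sum the 1s emitted for a single host."""
    yield (host, sum(counts))


def run_job(lines):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line_number, line in enumerate(lines):
        for key, value in mapper(line_number, line):
            grouped[key].append(value)
    results = []
    for key in sorted(grouped):  # keys reach reducers in sorted order
        results.extend(reducer(key, grouped[key]))
    return results


log_lines = [
    'crawler5108.ask.com - - [16/Aug/2009:04:40:37 -0700] "GET /Homo/16418 HTTP/1.0" 200 10283',
    'crawl-66-249-71-34.googlebot.com - - [16/Aug/2009:04:40:36 -0700] "GET /tree/ HTTP/1.1" 200 14693',
    'crawler5108.ask.com - - [16/Aug/2009:04:40:38 -0700] "GET /Homo/16418 HTTP/1.0" 200 10283',
]

for host, count in run_job(log_lines):
    print(host, count)
```

Note that this naive reversal would also reverse a bare IP address such as 65.55.108.238; a real job would need to decide how to key those records.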

Using the analysis...

We know that analysis of logfiles is a particularly well-suited problem for MapReduce. But what do companies use the resulting analysis for?

Rackspace's mail division, Mailtrust, used Hadoop for processing email logs. They used an ad hoc query to determine the geographic distribution of their users. Then they scheduled this MapReduce job to run monthly and used it to help decide where to place new mail servers in their data centers.

Source: "Hadoop: The Definitive Guide, by Tom White"

A Yellow Elephant Enters...

Apache Hadoop Project

3 years old... Grew out of the Lucene & Nutch projects.

"[I]n a nutshell... Hadoop provides: a reliable shared storage and analysis system."

-- "Hadoop: The Definitive Guide, Tom White"

Storage: Hadoop Distributed Filesystem (HDFS)
Analysis: MapReduce implementation

... and a small ecosystem of supporting sub-projects

Hadoop's assumptions

•   Hardware is going to fail
•   Access is going to be in batch processing, so high throughput trumps low latency data access
•   Data sets are large, files will be gigabytes to terabytes in size
•   Write-once-read-many is the file access needed by applications
•   Moving computation is cheaper than moving data
•   Must be portable from one platform to another (both software & hardware)

Coke/Pepsi, Google/Hadoop

There is nearly a one-to-one mapping between the Google architecture and Apache Hadoop

Google/Hadoop Decoder Ring

Google                     Hadoop
MapReduce                  Hadoop MapReduce
Google Filesystem (GFS)    HDFS
BigTable                   HBase
Chubby Lock System         ZooKeeper
Sawzall                    Pig
....                       ....

NameNode, DataNodes

Only one dedicated machine will run the NameNode software service for an entire cluster. Each machine in a cluster will run the DataNode software service.

The NameNode plays the role of arbitrator & metadata repository (for HDFS). User data never flows through the NameNode.

The NameNode maintains the file system namespace.

Any change to the file system namespace or its properties is recorded by the NameNode.

Yes, this means there is a Single Point of Failure.
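A toy model may make the division of labor concrete. The classes below are hypothetical and written in plain Python; they are not HDFS's real classes or protocol. The NameNode object holds only metadata (which blocks make up a file, and which DataNodes hold them), while the bytes are read directly from a DataNode.

```python
class DataNode:
    """Toy DataNode: stores block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy NameNode: metadata only, never the user data itself."""

    def __init__(self):
        self.namespace = {}  # path -> ordered list of block ids
        self.locations = {}  # block id -> names of DataNodes holding a copy

    def add_file(self, path, blocks):
        self.namespace[path] = [block_id for block_id, _ in blocks]
        for block_id, holders in blocks:
            self.locations[block_id] = list(holders)

    def get_block_locations(self, path):
        return [(b, list(self.locations[b])) for b in self.namespace[path]]


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then read from DataNodes directly."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        data += datanodes[holders[0]].read(block_id)
    return data


dn_a, dn_b = DataNode("dn-a"), DataNode("dn-b")
datanodes = {"dn-a": dn_a, "dn-b": dn_b}
dn_a.store("blk_1", b"hello ")
dn_b.store("blk_1", b"hello ")  # redundant copy of the same block
dn_b.store("blk_2", b"world")

namenode = NameNode()
namenode.add_file("/logs/access.log", [("blk_1", ["dn-a", "dn-b"]), ("blk_2", ["dn-b"])])
contents = read_file(namenode, datanodes, "/logs/access.log")
```

If the single `namenode` object is lost, the block-to-file mapping is lost with it, which is the Single Point of Failure in miniature.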

Big Files, Narrow Access Pattern

HDFS is optimized to store LARGE files, on the order of Gigabytes.  

Files are written to disk, start-to-finish, and are then immutable.

Files are read from disk, start-to-finish, by client applications (like MapReduce jobs).

Files are redundantly stored.

HDFS

Filesystem is an unfortunate name because it makes us think about files and directories.  We really should think about HDFS as a 'dataset system.'

JobTracker, TaskTracker

JobTracker runs on the NameNode
TaskTracker runs on each DataNode

JobTracker pushes out work to available TaskTrackers in the cluster. It attempts to keep the computation close to the data (again, data locality). But if it cannot find an available TaskTracker with the data block needed for the task, it will attempt to schedule with a machine on the same rack.

So, this means that JobTracker is "rack-aware" (or, that it understands the network topology of the cluster).
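The placement preference described above can be sketched as a small function. This is plain Python, not the JobTracker's actual scheduler; `pick_tracker` and its arguments are hypothetical, and the final "off-rack" fallback is an assumption about what happens when no rack-local machine is free.

```python
def pick_tracker(block_holders, available, rack_of):
    """Choose a TaskTracker for a task whose input block lives on block_holders.

    block_holders: names of machines that hold a copy of the block
    available:     names of machines with a free task slot, in preference order
    rack_of:       mapping of machine name -> rack name (the network topology)
    """
    # 1. Prefer a machine that already holds the block (data-local).
    for tracker in available:
        if tracker in block_holders:
            return tracker, "data-local"

    # 2. Otherwise prefer a machine on the same rack as a holder (rack-local).
    holder_racks = {rack_of[holder] for holder in block_holders}
    for tracker in available:
        if rack_of[tracker] in holder_racks:
            return tracker, "rack-local"

    # 3. Assumed fallback: any free machine, or nothing if none is free.
    if available:
        return available[0], "off-rack"
    return None, "none"


rack_of = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-b", "n4": "rack-b"}
choice = pick_tracker(block_holders=["n1"], available=["n3", "n2"], rack_of=rack_of)
```

In this example `n1` holds the block but is busy, so the task goes to `n2`, which shares a rack with it, rather than to `n3`.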

Re-execute slow running tasks

To avoid the "Convoy Effect", slow running tasks may be reassigned for execution by another DataNode holding the data for a block.

This means that failing or slow hardware will not hold up the rest of the computations for the job.

Re-execution of tasks can be done when "speculative-execution" is enabled.
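One way to picture the idea is a check that flags tasks whose progress lags well behind the rest of the job, so a second attempt can be started elsewhere. The sketch is plain Python; `find_stragglers` and the 0.2 gap are assumptions for illustration, not Hadoop's actual heuristic or configuration.

```python
def find_stragglers(progress, gap=0.2):
    """Return ids of tasks whose progress is more than `gap` below the average.

    progress: mapping of task id -> fraction complete (0.0 to 1.0)
    gap:      assumed threshold; chosen here only for illustration
    """
    if not progress:
        return []
    average = sum(progress.values()) / len(progress)
    return sorted(task for task, done in progress.items() if done < average - gap)


progress = {"task-1": 0.90, "task-2": 0.95, "task-3": 0.30}
stragglers = find_stragglers(progress)  # only task-3 lags far behind the others
```

Whichever attempt of a flagged task finishes first would supply the result, and the other attempt can be discarded.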

Hadoop Ecosystem

• HBase
A distributed column-oriented database (BigTable impl)

• Hive
A distributed data warehouse.

• Pig
A data flow language & execution environment for exploring very large datasets.

• ZooKeeper
A distributed, highly available coordination service.

• Chukwa
A distributed data collection & analysis system.

• Avro
A data serialization system for efficient, cross-language RPC, and persistent data storage.

Where to next?

• Cloudera Training Videos
http://cloudera.com/hadoop-training

• Hadoop: The Definitive Guide, Tom White, O’Reilly/Yahoo!

• Intro to Parallel Programming & MapReduce
http://code.google.com/edu/parallel/mapreduce-tutorial.html

• Google Papers: MapReduce, Google Filesystem, BigTable

• Trending Topics
http://www.trendingtopics.org/

• Tutorials everywhere!

Acknowledgments

iPlant Collaborative for a job (& allowing me to research Hadoop)

Cloudera and Aaron Kimball for training videos

Tom White for "Hadoop: The Definitive Guide"

Photo Acknowledgments

S#01: http://www.flickr.com/photos/kitkaphotogirl/3186255594/sizes/o/

S#02: taken by Alex Yelich

S#05: http://www.flickr.com/photos/toptechwriter/322770006/sizes/o/ http://bit.ly/29hzL1 & http://bit.ly/3G592M

S#07: http://jonasboner.com/talks/state_youre_doing_it_wrong/pictures/moores_law.jpg

S#08: http://milwaukee.indymedia.org/en/2006/05/205520.shtml

S#09: http://www.flickr.com/photos/ucumari/1203329752/sizes/l/

S#10: http://www.flickr.com/photos/t/236605/sizes/l/

S#11: Andrew Lenards

S#13: http://www.flickr.com/photos/bbaltimore/1412386/sizes/o/

S#44: http://www.flickr.com/photos/autumn_bliss/414160235/sizes/o/

http://creativecommons.org/licenses/by-nc-sa/3.0/

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.
