Transcript
Page 1: Analyzing Hadoop with Hadoop

Analyzing Hadoop with Hadoop

Monday, 4 June 2012

Page 2: Analyzing Hadoop with Hadoop

© [email protected], confidential - Do not distribute

Data Grows Faster Than Moore's Law!

Unstructured: 61.7% growth

Structured: 21.8% growth

http://www.emc.com/about/news/press/2011/20110628-01.htm

Page 3: Analyzing Hadoop with Hadoop

Data Warehouse

Static

ETL

Slow

Business Intelligence

Barrier

Hadoop

Dynamic

Raw Load

Fast

Analytics

Agile

30+ Years Workflow

Page 4: Analyzing Hadoop with Hadoop

SQL

Hadoop + Hive

NO-SQL Hadoop 10+MLOC

http://dearcomputer.nl/gir/?q=nerd+&s=4&b=Rip+Google!

http://thepage.time.com/2009/04/18/why-is-this-elephant-crying/

Page 5: Analyzing Hadoop with Hadoop

Evolution backward

http://chelseavose.wordpress.com/2012/01/26/is-evolution-real/

1970s: SEQUEL (Structured English Query Language)

ANSI SQL, ORM, JDO, NO-SQL, Hive

Page 6: Analyzing Hadoop with Hadoop

Unstructured + Structured

Page 7: Analyzing Hadoop with Hadoop

git log --numstat --pretty=format:%H,%ai,%cn,%ce%+B
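
The output of the command above mixes structured fields (hash, author date, committer name, committer email, per-file line counts) with a free-text commit message. A minimal Python sketch of how such output could be split into records follows. It is not the pipeline used in the talk; the names `Commit` and `parse_git_log` are hypothetical, and the sketch assumes that no line of a commit message happens to look like a header or numstat line.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# Header line produced by --pretty=format:%H,%ai,%cn,%ce
# The committer name may itself contain commas, so it is matched greedily
# and the email is taken as the text after the last comma.
HEADER = re.compile(
    r"^([0-9a-f]{40}),"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}),"
    r"(.*),([^,]*)$"
)
# --numstat line: added<TAB>deleted<TAB>path ("-" marks binary files)
NUMSTAT = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


@dataclass
class Commit:
    sha: str
    authored: datetime          # %ai: author date with UTC offset
    committer: str              # %cn
    email: str                  # %ce
    message: list = field(default_factory=list)  # non-empty lines of %B
    files: list = field(default_factory=list)    # (added, deleted, path)


def parse_git_log(lines):
    """Group git log lines into Commit records."""
    commits = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        header = HEADER.match(line)
        if header:
            current = Commit(
                sha=header.group(1),
                authored=datetime.strptime(header.group(2), "%Y-%m-%d %H:%M:%S %z"),
                committer=header.group(3),
                email=header.group(4),
            )
            commits.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        stat = NUMSTAT.match(line)
        if stat:
            added = 0 if stat.group(1) == "-" else int(stat.group(1))
            deleted = 0 if stat.group(2) == "-" else int(stat.group(2))
            current.files.append((added, deleted, stat.group(3)))
        elif line.strip():
            current.message.append(line)
    return commits
```

Usage could look like `parse_git_log(open("hadoop.log", encoding="utf-8", errors="replace"))`, where `hadoop.log` is a hypothetical file holding the command's output.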

Page 8: Analyzing Hadoop with Hadoop

Data Quality?

Page 9: Analyzing Hadoop with Hadoop

Results...

Page 10: Analyzing Hadoop with Hadoop

Commits per Year

200
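
A per-year commit count like the one charted here can be derived from the author dates alone. The sketch below is one possible way to do it in Python, assuming dates in the `%ai` format of `git log` (for example `2011-03-01 12:00:00 +0100`). The function name `commits_per_year` is hypothetical and the sample dates are made up for illustration; they are not data from the talk.

```python
from collections import Counter
from datetime import datetime


def commits_per_year(author_dates):
    """Count commits by the calendar year of the author date.

    The year is taken in the author's own UTC offset, as recorded by git.
    """
    years = (
        datetime.strptime(d, "%Y-%m-%d %H:%M:%S %z").year
        for d in author_dates
    )
    return Counter(years)


# Made-up dates for illustration only.
sample = [
    "2011-03-01 12:00:00 +0100",
    "2011-11-15 09:45:10 -0800",
    "2012-01-05 08:30:00 -0800",
]

counts = commits_per_year(sample)
for year in sorted(counts):
    print(year, counts[year])
```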

Page 11: Analyzing Hadoop with Hadoop

LOC Changes per Year

7,000,000

Page 12: Analyzing Hadoop with Hadoop

Most Lines Added

1,500,000

Page 13: Analyzing Hadoop with Hadoop

2006 eMails vs Commits

72

commits, emails

Page 14: Analyzing Hadoop with Hadoop

2011 eMails vs Commits

commits, emails

559

Page 15: Analyzing Hadoop with Hadoop

eMails per Month

800

Page 16: Analyzing Hadoop with Hadoop

Most Discussed, Least Changed

Page 17: Analyzing Hadoop with Hadoop

Most Active Emailers

900

Page 18: Analyzing Hadoop with Hadoop

We’re hiring!

Page 19: Analyzing Hadoop with Hadoop

Emails with Most Replies

Page 21: Analyzing Hadoop with Hadoop

Longest Comment

35,000

Page 22: Analyzing Hadoop with Hadoop

Email Activity per Timezone

Page 23: Analyzing Hadoop with Hadoop

Follow us: @datameer
