Key Big Data Terms You Should Know

Harish Kotadia, Ph.D.

Blog: http://HKotadia.com
Twitter: http://twitter.com/HKotadia
LinkedIn: http://www.linkedin.com/in/HKotadia

© 2013 Harish Kotadia, Ph.D.


A list of key Big Data terms you should know, each with a very brief explanation in simple language. Hope you find it useful.


1. Hadoop: System for storing and processing very large data sets across clusters of machines

2. HDFS (Hadoop Distributed File System): For storing large volumes of data (key elements: NameNode and DataNodes; the TaskTracker belongs to MapReduce, not HDFS)

3. MapReduce: Think of it as the assembly language of distributed computing. It is the low-level programming model used for computation in Hadoop

4. Pig: Developed at Yahoo. A higher-level data-flow language (Pig Latin) whose scripts compile down to MapReduce jobs

5. Hive: Higher-level language developed at Facebook, with SQL-like syntax (HiveQL)

6. Apache HBase: Column-oriented store for real-time, random read/write access to Hadoop data

7. Accumulo: Similar to HBase (both are modeled on Google's BigTable), with additional features like cell-level security

8. Avro: Data serialization format (an alternative to Protocol Buffers, Thrift, etc.)

9. Apache ZooKeeper: Distributed coordination service


10. HCatalog: Table and metadata management layer that makes the Hive metastore available to Pig and MapReduce

11. Oozie: Workflow scheduling system for Hadoop jobs, developed at Yahoo

12. Flume: Log aggregation system

13. Whirr: For automating the deployment of Hadoop clusters on cloud services

14. Sqoop: For transferring structured data between relational databases and Hadoop

15. Mahout: Machine learning on top of MapReduce

16. Bigtop: Packages and tests multiple Hadoop sub-systems so that they work together as a whole

17. Crunch: Java API that runs on top of MapReduce and simplifies tedious tasks like joining

18. Giraph: Used for large scale distributed graph processing
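
To make the MapReduce entry (item 3) a little more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps in plain Python, using the classic word-count example. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In a real framework this grouping happens between map and reduce;
    here it is simulated with an in-memory dictionary (an assumption).
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Higher-level tools such as Pig, Hive and Crunch exist largely so that you do not have to write these map and reduce steps by hand.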


For more, check out my blog: http://hkotadia.com/