A listing of key Big Data terms you should know, with a very brief explanation of each in simple language. I hope you find it useful.
© 2013 Harish Kotadia, Ph.D.
Key Big Data Terms You Should Know
Harish Kotadia, Ph.D.
Blog: http://HKotadia.com
Twitter: http://twitter.com/HKotadia
LinkedIn: http://www.linkedin.com/in/HKotadia
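Key Big Data Terms You Should Know
1. Hadoop: Open-source framework for storing and processing very large data sets across clusters of computers
2. HDFS or Hadoop Distributed File System: For storage of large volumes of data (key elements: the NameNode, which keeps the file-system metadata, and the DataNodes, which store the data blocks; the TaskTracker belongs to MapReduce, not HDFS)
3. MapReduce: Think of it as the assembly language of distributed computing. It is the programming model Hadoop uses for computation
4. Pig: Developed by Yahoo. A higher-level data-flow language (Pig Latin) whose scripts are compiled into MapReduce jobs
5. Hive: Developed by Facebook. A higher-level query language with SQL-like syntax (HiveQL) whose queries are compiled into MapReduce jobs
6. Apache HBase: Column-oriented database on top of HDFS for real-time, random read/write access to Hadoop data
7. Accumulo: Key-value store similar to HBase (both are modeled on Google's Bigtable) that adds features such as cell-level security
8. Avro: Data serialization system, comparable to Protocol Buffers and Thrift
9. Apache ZooKeeper: Distributed coordination service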
10. HCatalog: Table and storage management layer built on the Hive metastore, so that Pig and MapReduce can use the same table definitions as Hive
11. Oozie: Workflow scheduling system for Hadoop jobs, developed by Yahoo
12. Flume: Service for collecting and aggregating large volumes of log data into Hadoop
13. Whirr: For automating the setup of Hadoop clusters on cloud services
14. Sqoop: For transferring bulk structured data between relational databases and Hadoop
15. Mahout: Machine learning library on top of MapReduce
16. Bigtop: Packages and tests multiple Hadoop sub-systems so that they work together as a whole
17. Crunch: Java API that runs on top of MapReduce and simplifies tedious tasks such as joins
18. Giraph: Used for large-scale distributed graph processing
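This sketch makes the "assembly language" comparison for MapReduce (item 3) concrete. It applies the map, shuffle and reduce stages to word counting, the usual introductory example. It is a minimal single-process illustration in Python, not Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. Every stage runs sequentially here, whereas Hadoop runs many map and reduce tasks in parallel across the cluster and moves the shuffled data over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver: run map, shuffle and reduce one after another in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
    # prints {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Even this small task needs three hand-written stages. That is the sense in which MapReduce is low-level, and it is why higher-level tools such as Pig and Hive exist.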
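Giraph (item 18) is an open-source counterpart to Google's Pregel. Both use a vertex-centric, bulk synchronous model. Computation proceeds in supersteps. In each superstep, every active vertex reads the messages it received, updates its own value, and sends messages to its neighbors. A vertex with nothing new to do votes to halt.

The Python sketch below is a hypothetical `propagate_max` function, not Giraph's Java API. It uses this model on a single machine to spread the largest vertex value through a graph. It assumes every vertex, including every neighbor listed in the edge lists, appears as a key in `values`.

```python
from typing import Dict, List, Set


def propagate_max(edges: Dict[str, List[str]], values: Dict[str, int]) -> Dict[str, int]:
    """Spread the largest vertex value through a graph, Pregel style.

    Assumes every vertex, including every neighbor listed in `edges`,
    appears as a key in `values`.
    """
    current = dict(values)
    active: Set[str] = set(current)  # in superstep 0 every vertex is active
    while active:
        # Message passing: each active vertex sends its value to its neighbors.
        inbox: Dict[str, List[int]] = {vertex: [] for vertex in current}
        for vertex in active:
            for neighbor in edges.get(vertex, []):
                inbox[neighbor].append(current[vertex])
        # Barrier, then compute: a vertex that receives a larger value adopts it
        # and stays active for the next superstep; all others vote to halt.
        next_active: Set[str] = set()
        for vertex, messages in inbox.items():
            if messages and max(messages) > current[vertex]:
                current[vertex] = max(messages)
                next_active.add(vertex)
        active = next_active
    return current


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(propagate_max(graph, {"a": 3, "b": 6, "c": 2}))
    # prints {'a': 6, 'b': 6, 'c': 6}
```

Values only ever increase and are bounded by the largest starting value, so the loop always terminates. In a real Giraph job, the vertices would be partitioned across workers, and messages would cross the network between supersteps.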
For more, check out my blog:
http://hkotadia.com/