Key Big Data Terms You Should Know
Harish Kotadia, Ph.D.
Blog: http://HKotadia.com
Twitter: http://twitter.com/HKotadia
LinkedIn: http://www.linkedin.com/in/HKotadia
1 © 2013 Harish Kotadia, Ph.D.
1. Hadoop: System for storing and processing very large data sets across clusters of machines
2. HDFS (Hadoop Distributed File System): For storage of large volumes of data (key elements: NameNode and DataNodes)
3. MapReduce: Think of it as the assembly-level language of distributed computing. Used for computation in Hadoop
4. Pig: Developed by Yahoo. Its language, Pig Latin, is higher level than MapReduce
5. Hive: Higher-level language developed by Facebook with SQL-like syntax
6. Apache HBase: For real-time random read/write access to Hadoop data
7. Accumulo: Similar to HBase, with added features like cell-level security
8. Avro: Data serialization format (an alternative to Protocol Buffers etc.)
9. Apache ZooKeeper: Distributed co-ordination system
10. HCatalog: Table and metadata layer that exposes the Hive metastore to Pig and MapReduce
11. Oozie: Workflow scheduling system developed by Yahoo
12. Flume: Log aggregation system
13. Whirr: For automating Hadoop cluster provisioning on cloud services
14. Sqoop: For transferring structured data between relational databases and Hadoop
15. Mahout: Machine learning on top of MapReduce
16. Bigtop: Packages and tests multiple Hadoop sub-systems so they work together as a whole
17. Crunch: Java API that runs on top of MapReduce and simplifies tedious tasks like joining
18. Giraph: Used for large-scale distributed graph processing
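To make the MapReduce term (item 3) more concrete, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle and reduce steps on one machine and does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, the step a framework
    performs between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (single process, illustrative only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Higher-level tools in this list, such as Pig, Hive and Crunch, exist largely so that this kind of mapper/reducer plumbing does not have to be written by hand.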
For more, check out my blog: http://hkotadia.com/