tyrone-systems
Big Data Analytics
Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other real-time insights.
Uses of big data analytics: Google Search recommendations, Satyamev Jayate, reading genes (genome data).
Data mining vs. big data analytics
Data mining: data constraints apply; data must be neat and clean.
Big data analytics: big data cannot be neat, as it is unstructured.
Data mining: elaborate ETL is required, so insights must wait for the ETL cycle to complete.
Big data analytics: provides real-time insights.
Types of big data analytics:
Descriptive
Diagnostic
Predictive
Prescriptive
Relational databases struggle to store and process big data.
As a result, a new class of big data technology has emerged and is being used in many big data analytics environments.
The technologies associated with big data analytics include Hadoop, MapReduce and NoSQL.
Hadoop is an open-source, Java-based programming framework for processing and storing large data sets in a distributed computing environment.
Components of Hadoop:
HDFS (Hadoop Distributed File System)
MapReduce
HDFS stores data in a distributed, scalable and fault-tolerant way.
The NameNode holds the metadata about the data stored on the DataNodes.
The DataNodes hold the actual data in the form of blocks, and they can communicate with each other.
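The NameNode/DataNode split can be illustrated with a toy in-memory model. It is a minimal sketch, not the real HDFS API. The class names, the 4-byte block size (recent Hadoop versions default to 128 MB) and the round-robin placement (real HDFS uses a rack-aware policy) are all hypothetical simplifications. Only the replication factor of 3 matches the HDFS default. The sketch shows three things: the NameNode keeps only metadata, the DataNodes hold the blocks, and a file can still be read after one DataNode fails.

```python
# Toy, in-memory model of HDFS storage. All names are hypothetical; this is
# NOT the real HDFS API. Files are cut into fixed-size blocks, each block is
# copied to several DataNodes, and the NameNode keeps only metadata.
BLOCK_SIZE = 4    # bytes per block, tiny for illustration (HDFS: 128 MB by default)
REPLICATION = 3   # copies of each block (HDFS's default is also 3)


class DataNode:
    def __init__(self):
        self.blocks = {}  # block_id -> bytes: the actual data lives here


class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes  # DataNode name -> DataNode
        self.metadata = {}          # file name -> [(block_id, [DataNode names])]
        self._cursor = 0            # round-robin start position for placement

    def write(self, filename, data):
        names = sorted(self.datanodes)
        blocks = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = f"{filename}#blk{offset // BLOCK_SIZE}"
            chunk = data[offset:offset + BLOCK_SIZE]
            # Simplified placement: REPLICATION consecutive nodes, round-robin.
            targets = [names[(self._cursor + i) % len(names)]
                       for i in range(REPLICATION)]
            self._cursor += 1
            for name in targets:
                self.datanodes[name].blocks[block_id] = chunk
            blocks.append((block_id, targets))
        self.metadata[filename] = blocks

    def read(self, filename, failed=()):
        # Fault tolerance: read each block from any replica that is still alive.
        parts = []
        for block_id, targets in self.metadata[filename]:
            live = [name for name in targets if name not in failed]
            parts.append(self.datanodes[live[0]].blocks[block_id])
        return b"".join(parts)


nodes = {f"dn{i}": DataNode() for i in range(4)}
namenode = NameNode(nodes)
namenode.write("tweets.txt", b"hello big data world")
print(namenode.metadata["tweets.txt"][0])           # ('tweets.txt#blk0', ['dn0', 'dn1', 'dn2'])
print(namenode.read("tweets.txt", failed={"dn0"}))  # b'hello big data world'
```

Because every block lives on 3 of the 4 DataNodes, losing any single node still leaves a live replica of each block.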
MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
In the previous example, for instance, the Twitter data was processed on different servers on the basis of month.
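Dividing the work into independent tasks can be sketched in Python. The monthly tweets are made-up sample data. A thread pool on one machine stands in for the separate servers of a real cluster. Because of CPython's global interpreter lock, the threads show the structure of the computation rather than true CPU parallelism. Each task sees only its own month's partition, so the tasks need no coordination until their partial results are combined.

```python
# Sketch: split a data set into independent partitions (by month), process
# each partition as a separate task, then combine the partial results.
# The tweets are made-up sample data; threads stand in for cluster servers.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

tweets = [
    ("2015-01", "big data is big"),
    ("2015-01", "hadoop rocks"),
    ("2015-02", "nosql and hadoop"),
    ("2015-03", "mapreduce"),
]


def partition_by_month(records):
    parts = defaultdict(list)
    for month, text in records:
        parts[month].append(text)
    return parts


def count_words(texts):
    # A task only sees its own partition, so tasks are independent.
    return sum(len(text.split()) for text in texts)


parts = partition_by_month(tweets)
with ThreadPoolExecutor() as pool:
    counts = dict(zip(parts, pool.map(count_words, parts.values())))
print(counts)  # {'2015-01': 6, '2015-02': 3, '2015-03': 1}
```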
Hadoop provides a physical implementation of MapReduce.
A MapReduce program combines two Java functions: Mapper() and Reducer().
Example: checking the popularity of words in a text (word count).
The Mapper function processes each input split and provides input to the reducer:
Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)
The Reducer function groups the values the mapper emitted for each word and produces the output:
Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
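The Mapper/Reducer word-count pseudocode can be run locally as a small Python program. In this sketch, mapper and reducer mirror the pseudocode. The shuffle function is a stand-in for the grouping that the Hadoop framework performs between the map and reduce phases. Lowercasing words is an added assumption so that "Big" and "big" count as the same word. This is not Hadoop's Java API, only the same model on one machine.

```python
# Word count in the MapReduce style, run locally on one machine.
# mapper/reducer mirror the pseudocode; shuffle() stands in for the
# group-by-key step that the Hadoop framework performs between the phases.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(filename: str, contents: str) -> Iterator[Tuple[str, int]]:
    for word in contents.split():
        yield (word.lower(), 1)  # emit(word, 1); lowercasing is an assumption


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reducer(word: str, values: List[int]) -> Tuple[str, int]:
    total = 0
    for value in values:
        total += value
    return (word, total)  # emit(word, sum)


def run_job(files: Dict[str, str]) -> Dict[str, int]:
    mapped = (pair for name, text in files.items() for pair in mapper(name, text))
    return dict(reducer(word, values) for word, values in shuffle(mapped).items())


files = {"a.txt": "big data big insights", "b.txt": "Big data tools"}
print(run_job(files))  # {'big': 3, 'data': 2, 'insights': 1, 'tools': 1}
```

The mapper never sees more than one file, and the reducer never sees more than one word. That independence is what lets a framework spread both phases across many machines.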
NoSQL (Not only SQL)
NoSQL is a non-relational database management system, used where no fixed schema is required and data is scaled horizontally.
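Horizontal scaling and the absence of a fixed schema can both be illustrated with a small sketch. Records are spread across several nodes by hashing their key, so adding nodes adds capacity, and each value may carry different fields. The class and node names are hypothetical. Simple modulo hashing is a simplification: real NoSQL systems typically use more elaborate schemes such as consistent hashing, so that adding a node does not reshuffle most keys.

```python
# Sketch of horizontal scaling: records are spread across nodes by hashing
# their key. Names are hypothetical; modulo hashing is a simplification of
# what real NoSQL systems do (e.g. consistent hashing).
import hashlib


class ShardedStore:
    def __init__(self, node_names):
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}  # each node: its own dict

    def _node_for(self, key):
        # Stable hash; Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.order[int(digest, 16) % len(self.order)]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(9):
    store.put(f"user:{i}", {"id": i})
# No fixed schema: this record carries a field the others lack.
store.put("user:9", {"id": 9, "city": "Pune"})
print(store.get("user:4"))  # {'id': 4}
print({name: len(data) for name, data in store.nodes.items()})
```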
Four categories of NoSQL databases:
Key-value pair databases
Columnar databases
Graph databases
Document databases
KEY-VALUE PAIR
Keys are used to retrieve values from opaque data blocks.
Works like a hash map.
Very fast lookups by key.
Drawback: no provision for content-based queries.
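A key-value store and its main drawback can be shown with a Python dict. This sketch assumes values are opaque byte strings. The class, the helper `find_containing` and the sample sessions are all hypothetical. Lookup by key is a hash-map access, so it is fast regardless of store size. Finding records by their content has no index to use and falls back to scanning every value.

```python
# Sketch of a key-value store: the value is an opaque blob, so the store can
# only look things up by key. Class, helper and data are hypothetical.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # a hash map: key -> opaque bytes

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Hash-map lookup: average O(1), independent of the number of items.
        return self._data.get(key)

    def find_containing(self, needle: bytes):
        # No content-based queries: the only option is a full O(n) scan.
        return [k for k, v in self._data.items() if needle in v]


kv = KeyValueStore()
kv.put("session:42", b'{"user": "asha", "cart": 3}')
kv.put("session:43", b'{"user": "ravi", "cart": 0}')
print(kv.get("session:42"))          # b'{"user": "asha", "cart": 3}'
print(kv.find_containing(b"ravi"))   # ['session:43']
```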
Stay Tuned With Us for More Information
https://www.linkedin.com/company/tyronesystems
https://twitter.com/tyronesystems
https://www.facebook.com/tyronesystems