Big Data Analytics

Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations, and other real-time insights.

Uses of Big Data Analytics: Google Search recommendations, Satyamev Jayate, gene reading

Data Mining vs. Big Data Analytics

Data mining: imposes data constraints; the data must be neat and clean.

Big data analytics: big data cannot be neat, as it is unstructured.

Data mining: elaborate ETL is required, so insights have to wait for the ETL cycle to complete.

Big data analytics: provides real-time insights.

Types of big data analytics: descriptive, diagnostic, predictive, and prescriptive.

Traditional relational databases could not store and process big data efficiently.

As a result, a new class of big data technology has emerged and is being used in many big data analytics environments. 

The technologies associated with big data analytics include Hadoop, MapReduce, and NoSQL.

Hadoop is an open-source framework

Java-based programming framework

Processing and storing of large data sets

Distributed computing environment.

Components of Hadoop

HDFS (Hadoop Distributed File System)

MapReduce

HDFS stores data in a distributed, scalable, and fault-tolerant way.

The NameNode holds metadata about the data stored on the DataNodes.

DataNodes hold the actual data in the form of blocks and are capable of communicating with one another.
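The split between metadata and block storage can be sketched in a few lines of Python. This is a toy model, not the HDFS API: the class names `ToyNameNode` and `ToyDataNode`, the 4-byte block size, and the round-robin placement are all assumptions made for illustration, and replication (which is what makes real HDFS fault-tolerant) is left out.

```python
from itertools import cycle

BLOCK_SIZE = 4  # bytes; deliberately tiny (an assumption for the demo)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyDataNode:
    """Holds the actual block contents, keyed by block id."""

    def __init__(self):
        self.blocks = {}


class ToyNameNode:
    """Holds only metadata: which blocks form a file and where each one lives."""

    def __init__(self, datanode_ids):
        self.block_map = {}  # filename -> list of (block_id, datanode_id)
        self._next_node = cycle(datanode_ids)  # round-robin placement (assumed)

    def place(self, filename, block_count):
        placement = [
            (f"{filename}#{i}", next(self._next_node)) for i in range(block_count)
        ]
        self.block_map[filename] = placement
        return placement


def write_file(namenode, datanodes, filename, data):
    """Split the data, ask the NameNode where blocks go, store them on DataNodes."""
    blocks = split_into_blocks(data)
    placement = namenode.place(filename, len(blocks))
    for (block_id, node_id), block in zip(placement, blocks):
        datanodes[node_id].blocks[block_id] = block


def read_file(namenode, datanodes, filename):
    """Look up block locations in the metadata, then fetch and join the blocks."""
    return b"".join(
        datanodes[node_id].blocks[block_id]
        for block_id, node_id in namenode.block_map[filename]
    )
```

Writing `b"hello big data"` with two DataNodes spreads its four blocks evenly across both nodes, while the NameNode itself never stores any file bytes.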

MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

As in the previous example, Twitter data was processed on different servers on the basis of months.

Hadoop provides a concrete implementation of MapReduce.

It is a combination of two Java functions: Mapper() and Reducer()

Example: checking the popularity of words in a text.

The Mapper function maps the split files and provides input to the Reducer:

Mapper(filename, file-contents):
    for each word in file-contents:
        emit(word, 1)

The Reducer function combines the input provided by the Mapper and produces the output:

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)
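The word-count pseudocode can be tried out without a cluster. The sketch below runs the map, group-by-key, and reduce steps sequentially in one Python process; it only mimics the programming model and does not use Hadoop. The helper `run_word_count`, the sample file names, and the choice to lowercase words before counting are assumptions added for illustration.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit a (word, 1) pair for every word in one input split."""
    for word in file_contents.lower().split():  # lowercasing is an assumption
        yield word, 1


def reducer(word, values):
    """Sum all the counts emitted for one word."""
    return word, sum(values)


def run_word_count(splits):
    """Run map, shuffle (group by key), and reduce one after another."""
    grouped = defaultdict(list)
    for filename, contents in splits.items():
        for word, count in mapper(filename, contents):
            grouped[word].append(count)  # shuffle: collect values per key
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = {"part-1.txt": "big data big insights", "part-2.txt": "Data analytics"}
    print(run_word_count(sample))
```

Because each call to `mapper` depends only on its own split, the splits could be handed to different machines, which is the independence the MapReduce model relies on.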

NoSQL: Not only SQL

Non-relational database management system

Used where no fixed schema is required and data is scaled horizontally.

4 categories of NoSQL databases: key-value pair, columnar, graph, and document databases.

KEY-VALUE PAIR

Keys are used to get values from opaque data blocks

Hash map

Tremendously fast

Drawback: no provision for content-based queries.
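A minimal Python sketch of the trade-off, assuming an in-memory store backed by a plain `dict` (Python's hash map). The class `ToyKeyValueStore` and its method names are hypothetical and do not correspond to any particular NoSQL product.

```python
class ToyKeyValueStore:
    """In-memory key-value store; values are opaque bytes to the store."""

    def __init__(self):
        self._data = {}  # dict is a hash map: average constant-time get/put

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        """Fast path: a single hash lookup by key. Returns None if absent."""
        return self._data.get(key)

    def find_containing(self, fragment: bytes):
        """Content-based lookup: with no index on values, every entry is scanned."""
        return [key for key, value in self._data.items() if fragment in value]


if __name__ == "__main__":
    store = ToyKeyValueStore()
    store.put("user:1", b'{"name": "asha", "city": "pune"}')
    store.put("user:2", b'{"name": "ravi", "city": "delhi"}')
    print(store.get("user:1"))
    print(store.find_containing(b"delhi"))
```

`get` stays fast no matter how many entries exist, while `find_containing` grows linearly with the size of the store, which is why key-value systems are a poor fit for queries on content.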

Stay Tuned With Us for More Information

https://www.linkedin.com/company/tyronesystems

https://twitter.com/tyronesystems

https://www.facebook.com/tyronesystems
