Big Data Analytics


Page 1: Big Data Analytics

Big Data Analytics

Page 2: Big Data Analytics

Big data analytics is the process of examining large volumes of data of many types (big data) to uncover hidden patterns, unknown correlations, and other real-time insights.

Uses of big data analytics: Google Search recommendations, Satyamev Jayate, gene reading.

Data mining versus big data analytics:

Data mining: The data must be neat and clean. Elaborate ETL is required, so insights have to wait until the ETL cycle completes.

Big data analytics: Big data cannot be neat because much of it is unstructured. Big data analytics provides real-time insights.

Page 3: Big Data Analytics

Descriptive

Diagnostic

Predictive

Prescriptive

Page 4: Big Data Analytics

Traditional relational databases could not store and process big data at this volume and variety.

As a result, a new class of big data technology has emerged and is being used in many big data analytics environments. 

The technologies associated with big data analytics include Hadoop, MapReduce, and NoSQL.

Page 5: Big Data Analytics

Hadoop is an open-source framework

Java-based programming framework

Stores and processes large data sets

Runs in a distributed computing environment

Components of Hadoop:

HDFS (Hadoop Distributed File System)

MapReduce

Page 6: Big Data Analytics

HDFS stores data in a distributed, scalable, and fault-tolerant way.

The NameNode holds metadata about the data stored on the DataNodes.

The DataNodes hold the actual data in the form of blocks, and they can communicate with one another.
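The split between metadata and data can be illustrated with a small toy model. This is only a sketch, not the real HDFS API: the class names, the 8-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen to keep the example readable.

```python
# Toy model of the NameNode/DataNode roles; not the real HDFS API.
from collections import defaultdict

BLOCK_SIZE = 8    # bytes per block; tiny so the split is visible (assumption)
REPLICATION = 2   # copies kept of each block (assumption)


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Stores only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.file_blocks = defaultdict(list)      # filename -> [block_id, ...]
        self.block_locations = defaultdict(list)  # block_id -> [DataNode, ...]

    def write(self, filename, data):
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{filename}#{index}"
            self.file_blocks[filename].append(block_id)
            # Place each replica on a different DataNode, round-robin.
            for r in range(REPLICATION):
                node = self.datanodes[(index + r) % len(self.datanodes)]
                node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
                self.block_locations[block_id].append(node)

    def read(self, filename, failed=()):
        parts = []
        for block_id in self.file_blocks[filename]:
            # Use the first replica whose DataNode is still alive.
            node = next(n for n in self.block_locations[block_id]
                        if n.name not in failed)
            parts.append(node.blocks[block_id])
        return b"".join(parts)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
payload = b"big data analytics with hadoop"
namenode.write("report.txt", payload)

# The file is still readable when one DataNode is down (fault tolerance).
restored = namenode.read("report.txt", failed={"dn2"})
```

Because every block has a replica on a second node, the read succeeds even with "dn2" marked as failed, which is the fault-tolerance idea in miniature.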

Page 7: Big Data Analytics

MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

As in the previous example, the Twitter data was processed on different servers, split by month.

Hadoop provides an open-source implementation of MapReduce.

A MapReduce job combines two Java functions: Mapper() and Reducer().

Example: checking the popularity of text by counting how often each word occurs.

Page 8: Big Data Analytics

The Mapper function maps the split files and provides input to the Reducer:

    Mapper(filename, file-contents):
        for each word in file-contents:
            emit(word, 1)

The Reducer function combines the input provided by the Mapper and produces the output:

    Reducer(word, values):
        sum = 0
        for each value in values:
            sum = sum + value
        emit(word, sum)
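The word-count pseudocode can be tried out in a single process. The following is a minimal sketch in Python rather than Hadoop's Java API; the `run_job` helper and the sample file names are hypothetical, and the grouping step stands in for the shuffle that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(filename, file_contents):
    """Emit a (word, 1) pair for every word in one input split."""
    for word in file_contents.split():
        yield word, 1


def reducer(word, values):
    """Sum all counts emitted for a single word."""
    return word, sum(values)


def run_job(splits):
    """Run map, shuffle, and reduce locally (Hadoop distributes these steps)."""
    grouped = defaultdict(list)
    for filename, contents in splits.items():
        for word, count in mapper(filename, contents):
            grouped[word].append(count)  # shuffle: group values by key
    return dict(reducer(word, values) for word, values in grouped.items())


# Hypothetical input splits, for example one file per month.
splits = {
    "jan.txt": "big data is big",
    "feb.txt": "data analytics",
}
counts = run_job(splits)
```

Each call to `mapper` depends only on its own split, which is what allows the map tasks to run on different servers in parallel.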

Page 9: Big Data Analytics

NoSQL stands for "Not only SQL"

Non-relational database management system

Used where no fixed schema is required and data is scaled horizontally

Four categories of NoSQL databases: key-value pair, columnar, graph, and document databases.

Page 10: Big Data Analytics

KEY-VALUE PAIR

Keys are used to retrieve values, which are stored as opaque data blocks

Works like a hash map

Very fast lookups by key

Drawback: no provision for content-based queries.
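The trade-off can be sketched with an in-memory dictionary standing in for a key-value store. This is an illustration only, not any particular product's API: the `put`, `get`, and `scan_by_city` helpers and the sample records are hypothetical.

```python
import json

store = {}  # key -> opaque bytes; the store never looks inside the value


def put(key, record):
    """Serialize the record and store it as an opaque block under the key."""
    store[key] = json.dumps(record).encode("utf-8")


def get(key):
    """Hash lookup by key; returns None when the key is absent."""
    return store.get(key)


put("user:1", {"name": "Asha", "city": "Pune"})
put("user:2", {"name": "Ravi", "city": "Delhi"})

blob = get("user:2")  # fast: a single hash lookup


def scan_by_city(city):
    """Content-based query: the client must decode and check every value."""
    return [key for key, value in store.items()
            if json.loads(value)["city"] == city]


in_pune = scan_by_city("Pune")
```

Looking up "user:2" touches one entry, while finding users by city has to read the whole store, which is why content-based queries are listed as the drawback.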
