Big Data Analytics and Machine Intelligence


Giza At A Glance

• We are a systems integrator

• 43 years in the market

• Work in 25 countries

• 4 Regions of operation

• Enterprise Business

Solutions

• SCADA

• Transmission &

Distribution

• Transportation

Infrastructure

• Field Solutions

• Smart Buildings

Contents

• Introduction

• When Data is “Big”

• Big Data Information System Layers

Data Platform

Data Science & Advanced Analytics

Information Presentation

Actionable Insights

• Machine Intelligence

Introduction

• 2014, EMC & IDC digital universe report

• A study to analyze and forecast the amount of

data produced annually

• It is the universe of digital data

• Like the physical universe

It expands fast

Includes stars

Includes dark matter

About everything

The Digital Universe

Digital Universe Expands Fast

• Digital data doubles every two years

• Expected to reach 44 ZB by 2020 (44 trillion GB)

– 1 ZB = 10^3 EB = 10^6 PB = 10^9 TB

• Every second: 205,000 new GB

• During this presentation: ~550 million new GB

• Less than 25% of recorded data is tagged
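
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal (SI) prefixes, i.e. 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes. The talk length it derives is only what the slide's own numbers imply, and `projected_size` is a hypothetical helper for the "doubles every two years" rule, not a forecast from the report.

```python
# Unit check for the digital-universe figures, assuming decimal (SI) prefixes.
GB = 10**9   # bytes
TB = 10**12
PB = 10**15
EB = 10**18
ZB = 10**21

# 44 ZB expressed in GB (the slide says 44 trillion GB).
zb_2020_in_gb = 44 * ZB // GB

# 1 ZB expressed in EB, PB and TB.
zb_ladder = (ZB // EB, ZB // PB, ZB // TB)

# Rate and total quoted on the slide.
gb_per_second = 205_000
gb_during_talk = 550_000_000

# Talk length implied by those two numbers, in minutes.
implied_minutes = gb_during_talk / gb_per_second / 60


def projected_size(size_now, years, doubling_period=2):
    """Size after `years` if the volume doubles every `doubling_period` years."""
    return size_now * 2 ** (years / doubling_period)


print(zb_2020_in_gb == 44 * 10**12)   # True: 44 ZB is 44 trillion GB
print(zb_ladder)                      # (1000, 1000000, 1000000000)
print(round(implied_minutes, 1))      # about 44.7 minutes
print(projected_size(44, 4))          # two doublings of 44 ZB
```

Under these assumptions the numbers are consistent with each other for a talk of roughly 45 minutes.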

Telecommunication Revolution

• Smart phones full of

sensors

• Smart phone cameras

• High speed networks

• Mobile penetration

• Multiple devices per

customer

• Huge amount of data

transferred

• Communication

control data

Social Networks

• YouTube Statistics

1,300,000,000 users

300 hours of video uploaded per minute

30 million visitors per day

Internet of Things: Smart Cities

• Metering

• Smart homes

• Smart buildings

• Smart parking

• Street lighting

• Traffic monitoring

• And others

Internet of Things: Smart Farming

• Weather measuring

• Air sensors

• Water sensors

• Water leakage sensors

• Soil monitoring

• Irrigation monitoring and control

• Harvesting machines tracking and monitoring

• Farm animals tracking and monitoring

• And others

Internet of Things: Industrial

• Aircraft sensors gather ~1 TB per flight

• Jet engines produce ~25 MB per flight hour per engine

• Think about

– power plants,

– oil plants,

– water plants, etc.

When Big Data is “Big”

Definitions

• Gartner, the known source of the 3Vs of Big Data, defines Big Data as: high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

• IDC defines Big Data technologies as: a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.

3-Vs Gartner Model

Data Variety

• Structured, semi-structured and unstructured data

• Semi-structured

– Log files

– Manually edited Excel files

– Others

• Unstructured

– Chat conversations

– Emails

– Images & videos

– Others

• Most of this data already belongs to organizations, but it is sitting there unused, which is why Gartner calls it dark data

Data Velocity

• The speed at which data is:

– Created

– Stored

– Analyzed

• In Big Data systems, data is created in real time or near real time

Data Volume

• 90% of all data ever created was created in the past two years

• The estimated amount of data doubles every two years

• The era of a trillion sensors is upon us

Big Data Information

System Layers


Actionable Insights

Information Presentation

Data Science & Advanced Analytics

Data Platform

Data Platform


Hadoop Distributed File System

(HDFS)

• Open source project

• Java-based file system

• Scalable up to 200 PB

• Up to 4,500 servers in a single cluster

• Close to a billion files and blocks

• Concurrent access through

“YARN”

Map-Reduce Algorithm

• A framework for

processing problems in

parallel

• Uses multiple computing

cluster nodes
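
The map, shuffle and reduce steps can be sketched on a single machine as a word count. This is a minimal illustration, not Hadoop code: the function names are made up for the example, and a thread pool stands in for the cluster nodes that would each process one input split.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each split is mapped independently, so the map step can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data is big", "data platform", "Big insights"]
counts = word_count(splits)
print(counts["big"], counts["data"])  # 3 2
```

Because the map step never looks outside its own split, adding more nodes (here, more workers) does not change the result, only how the work is divided.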

Apache HBase

• Open source project

• Non-relational database

• Column-oriented key-value

data store

• Part of Hadoop project

• Can serve as input & output of

map-reduce jobs in Hadoop

• Data access through Java API
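
To make "column-oriented key-value store" more concrete, here is a toy in-memory model of the idea: a row key maps to "family:qualifier" columns, and each cell keeps its versions. `TinyColumnStore` and its methods are hypothetical and are not the HBase Java API; a logical counter stands in for cell timestamps.

```python
from collections import defaultdict


class TinyColumnStore:
    """Toy model: row key -> "family:qualifier" -> list of (version, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))
        self._clock = 0  # logical timestamp, stands in for wall-clock time

    def put(self, row_key, column, value):
        """Write a new version of one cell."""
        self._clock += 1
        self._rows[row_key][column].append((self._clock, value))

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def scan(self, start_key, stop_key):
        """Yield (row_key, latest cells) for start_key <= key < stop_key, in key order."""
        for key in sorted(self._rows):
            if start_key <= key < stop_key:
                yield key, {c: v[-1][1] for c, v in self._rows[key].items()}


store = TinyColumnStore()
store.put("meter#001", "reading:kwh", 12.5)
store.put("meter#001", "reading:kwh", 13.1)   # newer version of the same cell
store.put("meter#002", "info:city", "Cairo")

print(store.get("meter#001", "reading:kwh"))      # 13.1
print(list(store.scan("meter#001", "meter#002")))
```

Rows do not need to share the same columns, which is the main difference from a relational table.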

Apache Phoenix

• Open source

• Part of Apache Hadoop

Project

• Based on Apache HBase

• Provides JDBC and ODBC drivers for HBase

Hadoop Distributions

• Best known:

- Cloudera

- MapR

- Hortonworks

- IBM

- Pivotal HD

- Intel distribution

• Cloud-based:

- Azure HDInsight

- Amazon Elastic MapReduce

Hadoop Hortonworks Ecosystem

Massively Parallel Processing

(MPP) Data Warehouse

Architecture

• Shared-nothing architecture, no single point of failure

• Scales horizontally by adding nodes

• Splits large queries across nodes for parallel processing

• Higher data ingestion rates through parallelized data movement
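
How a shared-nothing system splits one query across nodes can be sketched as a distributed "sum by region". This is a simplified illustration, not how any particular MPP product is implemented: the function names are invented, the nodes are plain Python lists, and a thread pool stands in for the parallel hardware.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def node_for(key, n_nodes):
    """Pick the node that owns a key, using a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def distribute(rows, n_nodes):
    """Load phase: each row is stored on exactly one node (nothing is shared)."""
    nodes = [[] for _ in range(n_nodes)]
    for region, amount in rows:
        nodes[node_for(region, n_nodes)].append((region, amount))
    return nodes


def local_sum(node_rows):
    """Each node aggregates only the rows it owns."""
    totals = Counter()
    for region, amount in node_rows:
        totals[region] += amount
    return totals


def parallel_sum_by_region(rows, n_nodes=3):
    nodes = distribute(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = pool.map(local_sum, nodes)
    # Coordinator step: merge the small partial results from every node.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


rows = [("north", 10), ("south", 5), ("north", 7), ("east", 3)]
totals = parallel_sum_by_region(rows)
print(totals["north"], totals["south"], totals["east"])  # 17 5 3
```

The answer does not depend on the number of nodes, which is what lets such a system scale horizontally by adding nodes.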

MPP Database Examples

• Teradata

• Netezza

• Vertica

• Greenplum

• Microsoft PDW (Parallel

Data Warehouse)

• DB2 UDB with database

partitioning feature

(DPF)

Pivotal Greenplum Architecture


Data Science and Advanced

Analytics Layer

Types of Data Analytics

Analytics

Descriptive

Diagnostic

Predictive

Prescriptive

Descriptive Analytics

• What happened

- Which KPIs

- Which time frame

- Which filter

- What chart type

- How to remove noise

Diagnostic Analytics

• Why did it happen?

- Why is this KPI low?

- Which factors drive the KPI?

- Which factors to use for comparison

- How to compare by changing a single factor while holding the others fixed
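
Comparing a KPI across one factor while the other factors stay fixed can be sketched as a filtered group-by. The records, the field names (`region`, `channel`, `orders`) and the `compare_factor` helper are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def compare_factor(records, kpi, vary, fixed):
    """Average `kpi` per value of `vary`, using only records that match `fixed`."""
    buckets = defaultdict(list)
    for rec in records:
        if all(rec[name] == value for name, value in fixed.items()):
            buckets[rec[vary]].append(rec[kpi])
    return {level: mean(values) for level, values in buckets.items()}


sales = [
    {"region": "north", "channel": "web", "orders": 40},
    {"region": "north", "channel": "web", "orders": 60},
    {"region": "north", "channel": "store", "orders": 90},
    {"region": "south", "channel": "web", "orders": 20},
    {"region": "south", "channel": "store", "orders": 80},
]

# Vary the region, hold the channel fixed at "web".
by_region = compare_factor(sales, "orders", vary="region", fixed={"channel": "web"})
print(by_region)  # {'north': 50, 'south': 20}
```

In this toy data the gap between regions shows up on the web channel, which is the kind of lead a diagnostic drill-down is meant to surface.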

On-Line Analytical Processing

(OLAP)

Predictive Analytics

• Predict / Forecasting

• Segmentation

• Classification

• Anomaly detection

• Sentiment Prediction

Prescriptive Analytics

• What is the best

course of action?

• Simulation

• Optimization

• What-if analysis
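
A what-if analysis can be as small as evaluating a few candidate actions against a model and picking the best one. The sketch below assumes a hypothetical linear demand model and made-up cost figures; in practice the model would come from the predictive layer.

```python
def demand(price, base=1000, sensitivity=8):
    """Hypothetical demand model: units sold fall linearly as the price rises."""
    return max(0, base - sensitivity * price)


def profit(price, unit_cost=20):
    """Profit for one scenario under the assumed demand model."""
    return (price - unit_cost) * demand(price)


def best_action(candidate_prices):
    """Simulate every candidate price and return the most profitable one."""
    scenarios = {price: profit(price) for price in candidate_prices}
    best = max(scenarios, key=scenarios.get)
    return best, scenarios


best_price, table = best_action([40, 60, 80, 100])
print(table)       # {40: 13600, 60: 20800, 80: 21600, 100: 16000}
print(best_price)  # 80
```

The table is the what-if part (what happens at each price); choosing the maximum is the optimization part.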

Data Mining

• Data mining is the computing process of discovering patterns in large data sets.

• Cross Industry Standard Process for Data Mining (CRISP-DM):

- Business understanding

- Data understanding

- Data preparation

- Modeling

- Evaluation

- Deployment

Data Mining Techniques

• Regression

• Classification

• Cluster Analysis

• Correlation Analysis

• Outlier Analysis

• Anomaly Detection
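
As one concrete instance of outlier analysis, the sketch below flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The threshold of 2.0 and the sample readings are arbitrary choices for the example; real projects often prefer more robust methods.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score is above `threshold`."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]


readings = [10, 11, 9, 10, 12, 10, 50]
print(find_outliers(readings))  # [50]
```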

Proprietary Data Mining Tools

• SAS Analytics

• IBM SPSS

• SAP Predictive Analytics

• Angoss Predictive

Analytics

• KXEN Predictive Analytics

• Oracle Data Mining (ODM)

• Statistica

• TIBCO Analytics

• MATLAB

Open Source

• Python packages

• R Project

• RapidMiner

• KNIME

• Weka

• Octave

• GGobi

• Tanagra

• PredictionIO

Information Presentation


Reporting / Dashboards

• Reporting

– Richly formatted, interactive reports

– Reports with or without parameters

– Scheduling capabilities

• Dashboards

– Publishing web-based / mobile reports

– Interactive display for comparing KPIs with targets

– Integration with operational applications and/or event processing engines

Alerts

• Alerts on business intelligence and analytics content via:

– Emails

– SMS

– A customized receiver (e.g. a custom web service)

Geospatial and Location Intelligence

• Combining geographical and location-related data from data sources including:

- Aerial maps

- GISs

- Consumer

demographics

• Displaying relationships by

overlaying data on

interactive maps

Mobile Information Presentation

• Develop and deliver

content to mobile devices

• Publishing mode and/or

interactive mode

• Takes advantage of mobile devices’ native capabilities, e.g.:

- Touch screens

- Camera

- Location awareness

- Natural-Language

query


Actionable Insights

Linking Insights to Actions

• Forrester reports that

74% of firms want to be

“data driven”

• But only 29% are

actually successfully

connecting analytics to

action

• Actionable insights are

the missing link

Attributes of Actionable Insights

Aligned with your

business goals

Insight results have

context

Relevance: insights delivered to the right person, at the right time and in the right setting

Insights are Specific

Novel insights have an

advantage over familiar

ones

Clarity of the insight

Machine Intelligence

Machine Learning

“Machine Learning is giving

computers the ability to learn

without being explicitly

programmed.”

~ Arthur Samuel

Why Machine Learning for Big

Data Analytics

• Dark data makes up more than 90% of the digital

universe

• This is too much data, in too many formats and from too many sources, to be handled in a conventional way

• Analysis of unstructured data like images, videos, and sound files is usually done using Machine Learning algorithms

• More data gives better training results

Artificial Neural Networks (ANN)

• Computing systems inspired by biological neural networks

• Based on a collection of

artificial “neurons” connected

by “synaptic connections”

• Synaptic connections have

weights to control transmitted

signal strength

• Neurons may have thresholds

to control aggregated signal

transmission
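
A single artificial neuron with weighted connections and a threshold can be sketched in a few lines. The weights and thresholds below are set by hand to reproduce logic gates, purely for illustration; in a real network they would be learned from data.

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Both inputs must be active to reach the threshold of 2.
    return neuron([a, b], weights=[1, 1], threshold=2)


def or_gate(a, b):
    # A single active input is enough to reach the threshold of 1.
    return neuron([a, b], weights=[1, 1], threshold=1)


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([and_gate(a, b) for a, b in pairs])  # [0, 0, 0, 1]
print([or_gate(a, b) for a, b in pairs])   # [0, 1, 1, 1]
```

The same neuron computes different functions depending only on its weights and threshold.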

Deep Neural Networks (DNN)

• ANN with multiple hidden

layers between the input

and output layers

• The extra layers enable

composition of features from

lower layers

• Applied to tagging huge amounts of dark data: images, videos, speech, music, etc.
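
How extra layers compose features from lower layers can be shown with the classic XOR example: no single threshold neuron can compute XOR, but a hidden layer that computes OR and NAND, followed by an output neuron that combines them, can. The weights here are hand-picked rather than trained, and the toy network has only one hidden layer; deep networks stack many such layers in the same way.

```python
def layer_forward(inputs, weights, thresholds):
    """Compute one layer: every neuron applies a threshold to its weighted sum."""
    return [
        1 if sum(x * w for x, w in zip(inputs, row)) >= t else 0
        for row, t in zip(weights, thresholds)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, thresholds in layers:
        activations = layer_forward(activations, weights, thresholds)
    return activations


# Each layer is (weight rows, thresholds); one weight row per neuron.
XOR_NET = [
    ([[1, 1], [-1, -1]], [1, -1]),  # hidden layer: OR neuron, NAND neuron
    ([[1, 1]], [2]),                # output layer: AND of the two hidden features
]

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [forward([a, b], XOR_NET)[0] for a, b in pairs]
print(xor_outputs)  # [0, 1, 1, 0]
```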

Graphics Processing Units (GPU)

• Rapidly create images in frame buffers for output

to display device

• General Purpose GPU (GPGPU), stream

processor or vector processor running compute

kernels

• Well suited to training deep neural networks

• Parallel throughput several orders of magnitude higher than a CPU

• GPU clusters

• Cloud-based GPU (IaaS)

Combining HDFS with GPU

Conventional Large Scale Distributed Deep

Learning on Hadoop Clusters

©2017 Giza Systems. All rights reserved.

Giza Systems, a leading systems integrator in the MEA region, designs and deploys industry-specific technology solutions for asset-intensive industries such as the Telecoms, Utilities, Oil & Gas, Transportation and other market sectors. We help our clients streamline their operations and businesses through our portfolio of solutions, managed services, and consultancy practice. Our team of 800 professionals is spread throughout the region with anchor offices in Cairo, Riyadh, Dubai, Nairobi, Dar-es-Salaam and Abuja, allowing us to service an ever-increasing client base in over 40 countries.

Thank You!