amr-kamel-deklel
Giza At A Glance
• We are a systems integrator
• 43 years in the market
• Work in 25 countries
• 4 Regions of operation
• Enterprise Business Solutions
• SCADA
• Transmission & Distribution
• Transportation Infrastructure
• Field Solutions
• Smart Buildings
Contents
• Introduction
• When Data is “Big”
• Big Data Information System Layers
- Data Platform
- Data Science & Advanced Analytics
- Information Presentation
- Actionable Insights
• Machine Intelligence
The Digital Universe
• 2014 EMC & IDC Digital Universe report
• A study that analyzes and forecasts the amount of data produced annually
• The digital universe is the universe of digital data
• Like the physical universe, it:
- Expands fast
- Includes stars
- Includes dark matter
- Encompasses everything
Digital Universe Expands Fast
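• Digital data doubles every two years
• Expected to reach 44 ZB (44 trillion GB) by 2020
– $1\ \mathrm{ZB} = 10^{3}\ \mathrm{EB} = 10^{6}\ \mathrm{PB} = 10^{9}\ \mathrm{TB}$
• About 205,000 GB of new data are created every second
• About 550 million GB of new data will be created during this presentation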
• Less than 25% of recorded data is tagged
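The unit ladder and the growth figures above are simple arithmetic. A minimal Python sketch, assuming decimal (SI) units as on the slide, reproduces them; the function names are illustrative and do not come from any library.

```python
# Decimal (SI) storage units expressed in bytes, as used on the slide.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(amount, src, dst):
    """Convert an amount of data from one decimal storage unit to another."""
    return amount * UNITS[src] / UNITS[dst]


def growth_factor(years, doubling_period=2.0):
    """How many times larger the data volume becomes if it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


print(convert(1, "ZB", "TB"))   # 1 ZB = 10^9 TB
print(convert(44, "ZB", "GB"))  # 44 ZB = 44 trillion GB

# At ~205,000 GB per second, how long does it take to create ~550 million GB?
minutes = 550e6 / 205_000 / 60
print(round(minutes))           # about 45 minutes

print(growth_factor(6))         # three doublings in six years -> 8x
```

The same `convert` helper works for any pair of units in the table, which makes it easy to sanity-check figures quoted in different units.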
Telecommunication Revolution
• Smartphones full of sensors
• Smartphone cameras
• High-speed networks
• Mobile penetration
• Multiple devices per customer
• Huge amounts of data transferred
• Communication control data
Social Networks
• YouTube statistics
- 1,300,000,000 users
- 300 hours of video uploaded per minute
- 30 million visitors per day
Internet of Things: Smart Cities
• Metering
• Smart homes
• Smart buildings
• Smart parking
• Street lighting
• Traffic monitoring
• And others
Internet of Things: Smart Farming
• Weather measuring
• Air sensors
• Water sensors
• Water leakage sensors
• Soil monitoring
• Irrigation monitoring and control
• Harvesting machines tracking and monitoring
• Farm animals tracking and monitoring
• And others
Internet of Things: Industrial
• Aircraft sensors gather ~1 TB per flight
• Jet engines produce ~25 MB per flight hour per engine
• Think about
– power plants,
– oil plants,
– water plants, etc.
Definitions
• Gartner, the well-known source of the 3Vs of Big Data, defines Big Data as: “High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
• IDC defines Big Data technologies as: “A new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.”
Data Variety
• Structured, semi-structured, and non-structured data
• Semi-structured
- Log files
- Manually edited Excel files
- Others
• Non-structured
- Chat conversations
- Emails
- Images & videos
- Others
• Most of this data already belongs to organizations but sits unused, which is why Gartner calls it dark data
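Log files count as semi-structured because each line follows a loose pattern that can be recovered. As a hedged sketch, assuming a hypothetical log format of date, time, level, and message, Python's standard `re` module can turn such lines into structured records, while free text that does not match is left for other kinds of processing.

```python
import re

# Hypothetical log format, assumed for illustration: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<message>.*)"
)


def parse_log(lines):
    """Turn semi-structured log lines into structured records (dicts); skip lines that don't match."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


sample = [
    "2017-03-01 10:15:02 ERROR Pump 7 pressure out of range",
    "2017-03-01 10:15:09 INFO Pump 7 restarted",
    "free-text note typed by an operator",  # non-structured: no pattern to extract
]
for record in parse_log(sample):
    print(record["level"], record["message"])
```

Once parsed, the records can be loaded into any structured store and queried like ordinary rows.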
Data Velocity
• The speed at which data is:
- Created
- Stored
- Analyzed
• In Big Data systems, data is created in real time or near real time
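Handling data in near real time usually means computing results over a moving time window instead of over the whole history. A minimal Python sketch of that idea, assuming events arrive with non-decreasing timestamps in seconds; the class name is hypothetical and this is not any particular streaming engine's API.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events that arrived within the last `window` seconds of a stream."""

    def __init__(self, window):
        self.window = window
        self.timestamps = deque()

    def add(self, timestamp):
        # Assumes events arrive in non-decreasing timestamp order.
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Drop events that have fallen out of the window (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


counter = SlidingWindowCounter(window=10)
for t in [0, 2, 5, 11, 12]:
    counter.add(t)
print(counter.count(now=12))  # 3: the events at 5, 11 and 12
```

Memory stays bounded by the number of events inside one window, no matter how long the stream runs.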
Data Volume
• 90% of all data ever created was created in the past two years
• The estimated amount of data doubles every two years
• The era of a trillion sensors is upon us
Big Data Information System Layers
• Actionable Insights
• Information Presentation
• Data Science & Advanced Analytics
• Data Platform
Data Platform
Hadoop Distributed File System (HDFS)
• Open source project
• Java-based file system
• Scalable up to 200 PB
• Up to 4,500 servers in a single cluster
• Close to a billion files and blocks
• Concurrent data access through YARN
Map-Reduce Algorithm
• A framework for processing problems in parallel
• Uses multiple computing nodes in a cluster
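The classic illustration of the Map-Reduce model is counting words. The sketch below, in plain Python, simulates the three phases (map, shuffle, reduce) in a single process; it is not the Hadoop API, and on a real cluster each input split would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


splits = ["big data big insights", "data platform", "Big data"]
# On a cluster each split would be mapped on a different node; here they run one after another.
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)  # {'big': 3, 'data': 3, 'insights': 1, 'platform': 1}
```

Because map calls are independent and reduce calls only see one key each, both phases parallelize naturally; only the shuffle needs to move data between nodes.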
Apache HBase
• Open source project
• Non-relational database
• Column-oriented key-value data store
• Part of the Hadoop ecosystem
• Can serve as the input and output of Map-Reduce jobs in Hadoop
• Data access through a Java API
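HBase's data model is easiest to picture as nested maps: a row key points to columns named `family:qualifier`, and each cell keeps several timestamped versions. The following is a toy, in-memory Python sketch of that model for illustration only; it does not use the real HBase Java API, and the class and method names are hypothetical (chosen to echo HBase's put/get/scan operations).

```python
from collections import defaultdict


class ToyWideColumnTable:
    """Toy model of an HBase-style table: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Writing the same cell again adds a new version instead of overwriting it.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        # Return the newest version of a cell, or None if the cell does not exist.
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        # HBase keeps rows sorted by key so range scans are efficient; this toy sorts on demand.
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key, {col: cells[max(cells)] for col, cells in self.rows[key].items()}


table = ToyWideColumnTable()
table.put("meter-001", "reading:kwh", 12.5, timestamp=1)
table.put("meter-001", "reading:kwh", 13.1, timestamp=2)
table.put("meter-002", "reading:kwh", 7.0, timestamp=1)
table.put("meter-002", "info:site", "substation-A", timestamp=1)

print(table.get("meter-001", "reading:kwh"))      # 13.1, the latest version
print(list(table.scan("meter-001", "meter-002")))  # only meter-001 falls in the range
```

Note that rows need not share the same columns (`meter-002` has an `info:site` column that `meter-001` lacks), which is what "non-relational" means in practice here.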
Apache Phoenix
• Open source
• Part of the Apache Hadoop ecosystem
• Runs on top of Apache HBase
• Provides JDBC and ODBC drivers for HBase
Hadoop Distributions
• Best known:
- Cloudera
- MapR
- Hortonworks
- IBM
- Pivotal HD
- Intel Distribution
• Cloud-based:
- Azure HDInsight
- Amazon Elastic MapReduce
Massively Parallel Processing (MPP) Data Warehouse Architecture
• Shared-nothing architecture with no single point of failure
• Scales horizontally by adding nodes
• Breaks large queries into parts that are processed in parallel across nodes
• Higher data ingestion rates through parallelized data movement
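The way an MPP warehouse breaks a query across nodes can be sketched as scatter/gather: rows are distributed across nodes by hashing a distribution key, each node aggregates only its own slice, and a coordinator merges the partial results. The Python below simulates this with threads; the cluster size and the sales table are hypothetical, and real MPP engines plan queries and move data far more elaborately.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def partition(rows, num_nodes):
    """Shared-nothing placement: each row lives on exactly one node, chosen by hashing its key."""
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        nodes[hash(region) % num_nodes].append((region, amount))
    return nodes


def local_aggregate(node_rows):
    """Work done independently on one node: SUM(amount) GROUP BY region over its own slice."""
    totals = Counter()
    for region, amount in node_rows:
        totals[region] += amount
    return totals


def parallel_sum_by_region(rows):
    slices = partition(rows, NUM_NODES)
    # Scatter: every node aggregates its slice in parallel.
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(local_aggregate, slices))
    # Gather: a coordinator merges the partial results (Counter.update adds counts).
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)


sales = [("Cairo", 10), ("Riyadh", 5), ("Cairo", 7), ("Dubai", 3), ("Riyadh", 1)]
print(parallel_sum_by_region(sales))
```

Adding nodes shrinks each slice, which is the sense in which MPP systems scale horizontally.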
MPP Database Examples
• Teradata
• Netezza
• Vertica
• Greenplum
• Microsoft PDW (Parallel Data Warehouse)
• DB2 UDB with the Database Partitioning Feature (DPF)
Data Science and Advanced Analytics Layer
Descriptive Analytics
• What happened?
- Which KPIs
- Which time frame
- Which filters
- Which chart type
- How to remove noise
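Two of the descriptive questions above, choosing a time frame and removing noise, can be illustrated with a time filter followed by a trailing moving average. A minimal Python sketch, assuming a hypothetical daily KPI series; a BI tool would typically offer the same thing as a chart option.

```python
def select_time_frame(series, start_day, end_day):
    """Keep only the (day, value) points inside the chosen time frame, inclusive."""
    return [(day, value) for day, value in series if start_day <= day <= end_day]


def moving_average(values, window):
    """Trailing moving average: each point is the mean of the last `window` values."""
    return [sum(values[i - window + 1 : i + 1]) / window for i in range(window - 1, len(values))]


# Hypothetical KPI readings: (day number, value).
kpi = [(1, 100), (2, 120), (3, 90), (4, 110), (5, 130), (6, 95), (7, 105), (8, 140)]

frame = select_time_frame(kpi, start_day=3, end_day=7)
smoothed = moving_average([value for _, value in frame], window=3)
print(smoothed)
```

A larger window removes more noise but reacts more slowly to genuine changes, so the window size is itself one of the "what happened" choices.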
Diagnostic Analytics
• Why did it happen?
- Why is this KPI low?
- Which factors drive the KPI?
- Which factors to use for comparison
- How to compare by changing a single factor while keeping the others fixed
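The last diagnostic question, comparing by changing a single factor while holding the others fixed, can be sketched as grouping KPI records that agree on every factor except one. In the hypothetical Python example below, the KPI is an outage count and the factors are region and shift; the record layout is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical records: factors that might explain a KPI, plus the KPI value.
records = [
    {"region": "North", "shift": "day", "outages": 2},
    {"region": "North", "shift": "night", "outages": 7},
    {"region": "South", "shift": "day", "outages": 3},
    {"region": "South", "shift": "night", "outages": 4},
]


def compare_one_factor(records, vary, kpi):
    """For each combination of the *other* factors, show how the KPI changes as `vary` changes.

    Assumes one record per factor combination (a later duplicate would overwrite an earlier one).
    """
    factors = [k for k in records[0] if k not in (vary, kpi)]
    groups = defaultdict(dict)
    for r in records:
        fixed = tuple(r[f] for f in factors)  # the factors held constant
        groups[fixed][r[vary]] = r[kpi]
    return dict(groups)


for fixed, by_shift in compare_one_factor(records, vary="shift", kpi="outages").items():
    print(fixed, by_shift)
```

Here the night shift is worse in both regions, but much more so in the North, which points the diagnosis at the North night shift.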
Predictive Analytics
• Prediction / forecasting
• Segmentation
• Classification
• Anomaly detection
• Sentiment Prediction
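Forecasting, the first predictive task listed, can be as simple as fitting a straight-line trend and extrapolating it. A hedged sketch in Python using ordinary least squares on a hypothetical monthly series; practical forecasting would also account for seasonality and uncertainty.

```python
def fit_linear_trend(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0, 1, ..., n-1."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum((t - t_mean) ** 2 for t in ts)
    a = y_mean - b * t_mean
    return a, b


def forecast(ys, steps):
    """Extrapolate the fitted trend `steps` periods beyond the last observation."""
    a, b = fit_linear_trend(ys)
    n = len(ys)
    return [a + b * (n + k) for k in range(steps)]


monthly_demand = [10.0, 12.0, 14.0, 16.0]  # hypothetical data with a perfect linear trend
print(forecast(monthly_demand, steps=2))   # continues the trend: [18.0, 20.0]
```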
Prescriptive Analytics
• What is the best course of action?
• Simulation
• Optimization
• What-if analysis
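Prescriptive analytics combines a predictive model with simulation or optimization to recommend an action. As a minimal sketch, assume a hypothetical linear demand model in which demand falls as price rises; a what-if loop simulates revenue for each candidate price and picks the best one. Both the model and its coefficients are made up for illustration.

```python
def predicted_demand(price):
    """Hypothetical demand model: 1,000 units at price 0, minus 20 units per unit of price."""
    return max(0.0, 1000 - 20 * price)


def what_if(prices):
    """Simulate revenue for each candidate price (what-if analysis)."""
    return {price: price * predicted_demand(price) for price in prices}


def best_action(prices):
    """Optimization by exhaustive search: the candidate price with the highest simulated revenue."""
    scenarios = what_if(prices)
    return max(scenarios, key=scenarios.get)


candidates = [10, 20, 25, 30, 40]
print(what_if(candidates))
print(best_action(candidates))  # 25
```

Exhaustive search is fine for a handful of candidates; larger decision spaces call for proper optimization methods, but the structure (model, simulate, choose) stays the same.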
Data Mining
• Data mining is the computing process of discovering patterns in large data sets.
• Cross-Industry Standard Process for Data Mining (CRISP-DM):
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
Data Mining Techniques
• Regression
• Classification
• Cluster Analysis
• Correlation Analysis
• Outlier Analysis
• Anomaly Detection
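Two of these techniques, correlation analysis and outlier analysis, need only basic statistics. A hedged sketch with Python's standard library, assuming hypothetical sensor readings; the z-score threshold of 2 is an illustrative choice, not a standard.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def z_score_outliers(values, threshold=2.0):
    """Indices of values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


# Hypothetical data: temperature vs. electrical load, and a series with one suspicious reading.
temperature = [20, 22, 24, 26, 28]
load = [100, 110, 120, 130, 140]
readings = [10, 11, 9, 10, 12, 10, 45]

print(pearson(temperature, load))   # perfectly linear relationship
print(z_score_outliers(readings))   # flags the reading of 45
```

Correlation only shows that two series move together; deciding whether one causes the other is the diagnostic step discussed earlier.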
Proprietary Data Mining Tools
• SAS Analytics
• IBM SPSS
• SAP Predictive Analytics
• Angoss Predictive Analytics
• KXEN Predictive Analytics
• Oracle Data Mining (ODM)
• Statistica
• TIBCO Analytics
• MATLAB
Open Source Data Mining Tools
• Python packages
• R Project
• RapidMiner
• KNIME
• Weka
• Octave
• GGobi
• Tanagra
• PredictionIO
Information Presentation
Reporting / Dashboards
• Reporting
- Richly formatted, interactive reports
- Reports with or without parameters
- Scheduling capabilities
• Dashboards
- Publishing web-based and mobile reports
- Interactive display of KPIs compared with targets
- Integration with operational applications and/or event processing engines
Alerts
• Alerts on business intelligence and analytics content, delivered via:
- Emails
- SMS
- A custom receiver (e.g., a custom web service)
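Delivering alerts through several channels fits a small dispatcher that checks KPIs against targets and hands each alert to every registered receiver. The Python sketch below is hypothetical: the email, SMS, and web-service receivers are stand-in functions that only record messages, because real delivery depends on your mail gateway, SMS provider, or service endpoint.

```python
from typing import Callable, Dict, List

Receiver = Callable[[str], None]


class AlertDispatcher:
    """Checks KPIs against targets and sends an alert message to every registered receiver."""

    def __init__(self) -> None:
        self.receivers: Dict[str, Receiver] = {}

    def register(self, name: str, receiver: Receiver) -> None:
        self.receivers[name] = receiver

    def check(self, kpi: str, value: float, target: float) -> bool:
        """Alert when a KPI falls below its target; return True if an alert was sent."""
        if value >= target:
            return False
        message = f"{kpi} is {value}, below target {target}"
        for receiver in self.receivers.values():
            receiver(message)
        return True


# Stand-in receivers that just collect messages (hypothetical; no real email/SMS is sent).
outbox: List[str] = []
dispatcher = AlertDispatcher()
dispatcher.register("email", lambda msg: outbox.append("EMAIL: " + msg))
dispatcher.register("sms", lambda msg: outbox.append("SMS: " + msg))
dispatcher.register("web_service", lambda msg: outbox.append("WEBHOOK: " + msg))

dispatcher.check("Collection rate", 0.82, target=0.9)
print(outbox)
```

Keeping receivers as plain callables means a new channel is one `register` call, without touching the KPI-checking logic.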
Geospatial and Location Intelligence
• Combining geographical and location-related data from sources including:
- Aerial maps
- GISs
- Consumer demographics
• Displaying relationships by overlaying data on interactive maps
Mobile Information Presentation
• Develop and deliver content to mobile devices
• Publishing mode and/or interactive mode
• Takes advantage of mobile devices’ native capabilities, e.g.:
- Touch screens
- Camera
- Location awareness
- Natural-language queries
Actionable Insights
Linking Insights to Actions
• Forrester reports that 74% of firms want to be “data driven”
• But only 29% are successfully connecting analytics to action
• Actionable insights are the missing link
Attributes of Actionable Insights
• Alignment: aligned with your business goals
• Context: insight results have context
• Relevance: insights are delivered to the right person, at the right time, and in the right setting
• Specificity: insights are specific
• Novelty: novel insights have an advantage over familiar ones
• Clarity: the insight is clear
Machine Learning
“Machine Learning is giving computers the ability to learn without being explicitly programmed.”
~ Arthur Samuel
Why Machine Learning for Big Data Analytics
• Dark data makes up more than 90% of the digital universe
• This volume of data, in so many formats and from so many sources, is too large to be handled in a conventional way
• Analysis of non-structured data such as images, videos, and sound files is usually done with Machine Learning algorithms
• More data generally yields better training results
Artificial Neural Networks (ANN)
• Computing systems inspired by biological neural networks
• Based on a collection of artificial “neurons” connected by “synaptic connections”
• Synaptic connections have weights that control the strength of the transmitted signal
• Neurons may have thresholds, so that a signal is sent only if the aggregated signal crosses the threshold
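The weighted connections and threshold described above can be shown with a single artificial neuron. In this minimal Python sketch, the weights and threshold are hand-picked for illustration (not learned values) so that the neuron fires only when both binary inputs are on, i.e. it behaves like a logical AND.

```python
def neuron(inputs, weights, threshold):
    """Fire (output 1) only if the weighted sum of the inputs reaches the threshold."""
    aggregate = sum(x * w for x, w in zip(inputs, weights))  # synaptic weights scale each input
    return 1 if aggregate >= threshold else 0


# Hand-picked weights and threshold (illustrative, not learned): the neuron acts as a logical AND.
weights = [1.0, 1.0]
for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], weights, threshold=1.5))
```

Lowering the threshold to 0.5 would turn the same neuron into a logical OR; learning algorithms adjust weights and thresholds like these automatically from examples.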
Deep Neural Networks (DNN)
• ANN with multiple hidden layers between the input and output layers
• The extra layers enable composition of features from lower layers
• Applied to tagging huge amounts of dark data: images, videos, speech, music, etc.
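How extra hidden layers compose features can be seen in a forward pass, where each layer's output becomes the next layer's input. The plain-Python sketch below uses tiny, hand-picked (hypothetical) weights and ReLU activations; real deep networks learn their weights from data and have far more of them.

```python
def relu(values):
    """Rectified linear activation: negative signals are cut to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(x * w for x, w in zip(inputs, row)) + b for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the input through every hidden layer (with ReLU), then a linear output layer."""
    activation = inputs
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))  # features built from the layer below
    weights, biases = layers[-1]
    return dense(activation, weights, biases)


# Hypothetical hand-picked weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [0.0, -0.5]),
    ([[1.0, -1.0]], [0.0]),
]
print(forward([2.0, 1.0], layers))  # [1.5]
```

Each hidden layer is a matrix-vector product, which is exactly the kind of arithmetic GPUs accelerate.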
Graphics Processing Units (GPU)
• Designed to rapidly create images in a frame buffer for output to a display device
• General-purpose computing on GPUs (GPGPU): the GPU acts as a stream or vector processor running compute kernels
• Well suited to training deep neural networks
• Performance several orders of magnitude higher than CPUs for such workloads
• GPU clusters
• Cloud-based GPUs (IaaS)
©2017 Giza Systems. All rights reserved.
Giza Systems, a leading systems integrator in the MEA region, designs and deploys industry-specific technology solutions for asset-intensive industries
such as the Telecoms, Utilities, Oil & Gas, Transportation and other market sectors. We help our clients streamline their operations and businesses
through our portfolio of solutions, managed services, and consultancy practice. Our team of 800 professionals is spread throughout the region with
anchor offices in Cairo, Riyadh, Dubai, Nairobi, Dar-es-Salaam and Abuja, allowing us to service an ever-increasing client base in over 40 countries.
Thank You!