Regional Forum on Cybersecurity in the Era of Emerging Technologies &
the Second Meeting of the “Successful Administrative Practices”-2017 Cairo, Egypt 28-29 November 2017
Big Data, Data Science and Analytics Challenges for Digital Transformation Introduction Note
Hisham Arafat
What is Big Data?!!
Context is important… Digital Transformation
Can we answer this question in 15 minutes?
Key Motivations in a Connected World!
Indicative Forecasted Figures for Year 2020 - Image source: The Enterprise Project
Billions of Connected Users
Tens of Billions of Connected Devices
Tens of Zettabytes of Generated Data
Big Data…Practical Definition!
Huge Volumes Massive Streams Mixed Structures Complex Processing
1000s of sensors, ~1 TB/sensor/day
Images/Reports/Corsp.
Millions of parcels/day, ~100 TB/day
Labels/Cross App
Millions of cars, ~200+ OBDII/s
Realtime/Pred.
Big Data is not the Solution….It’s the Challenge
• Shared I/O
• Shared Processing
• Limited Scalability
• Service Bottlenecks
• High Cost Factor
[Diagram: Database Service, Shared Buffers, Data Files, Database Cluster; three I/O paths; Network]
Traditional Data Management Systems
How Many Oxen?
In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.
Bullock Team drawing II Ton Marshall Engine (Australia, early 20th century)
[Diagram: Data Nodes, Master Nodes, I/O, Network Interconnect]
• Parallel Processing
• Shared Nothing
• Linear Scalability
• Distributed Services
• Lower Cost Factor
[Diagram: Metadata, Metadata; nodes 1, 2, 3, … n; I/O per node; User data / Replicas per node]
Abstraction of Big Data Platforms
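The shared-nothing, parallel-processing idea on this slide can be illustrated with a small sketch. It is not taken from the talk: the function names (`partition`, `local_sum`, `merge`), the sample records and the node count are all hypothetical, and the "nodes" are plain Python lists rather than real machines. Each record is routed to exactly one node by hashing its key, every node aggregates only its own data, and the partial results are combined at the end.

```python
import zlib
from collections import defaultdict


def partition(records, n_nodes):
    """Route each (key, value) record to exactly one node by hashing its key."""
    nodes = [[] for _ in range(n_nodes)]
    for key, value in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        node_id = zlib.crc32(key.encode("utf-8")) % n_nodes
        nodes[node_id].append((key, value))
    return nodes


def local_sum(node_records):
    """Work done independently on one node: sum the values per key."""
    totals = defaultdict(int)
    for key, value in node_records:
        totals[key] += value
    return dict(totals)


def merge(partials):
    """Combine per-node results; a key lives on one node only, so no conflicts."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


# Hypothetical sample data, for illustration only
records = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4), ("sensor-c", 1)]
nodes = partition(records, n_nodes=3)
result = merge(local_sum(n) for n in nodes)
print(result)
```

Because no node reads another node's data, adding nodes spreads both storage and processing, which is the intuition behind the "Linear Scalability" bullet. Real platforms add replication and metadata management on top of this.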
Key Technologies and Patterns
Problem to Solve | Technique | Methods
Find Relations or Patterns of Occurrence Among Items, Actions or Events | Association Rules | Apriori, FP Growth
Analyze and Discover the Internal Structure, Behavior, or Similarity of Observations | Clustering | K-means, k-medoids, DBSCAN, LDA
Put Newly Arriving Observations Under Pre-defined Classes or Assign Labels | Classification | Naive Bayes, LR, RF, Decision Trees, SVM
Understand How a Specific Outcome is Driven by Input Variables | Regression | Linear, Logistic, Ridge, Multinomial, LOESS
Forecast in the Short Term and Understand the Temporal Behavior of Variables | Time Series Analysis | Box-Jenkins, ARIMA, Wavelet
Analyze Unstructured Text for Searching, Retrieval, Sentiment, Networking, NLP | Text Analytics | BoW, TFIDF, PoS, CR, Hidden Markov, TM
Respond to Future Events Prospectively | Simulation | Monte Carlo, GA
Provide a List of Appealing Recommendations on Events or Actions | Recommenders | Collaborative Filtering, Content-Based Filtering
Data Science Models…The Value
100s of Methods!
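As one concrete illustration of the techniques listed, the sketch below computes support and confidence for pairwise association rules. It is a simplified stand-in, not a full Apriori or FP Growth implementation: it only considers item pairs, and the function name `pair_rules`, the threshold and the basket data are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # de-duplicate and fix an order within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets, for illustration only
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing butter also contains bread, so the rule butter -> bread has confidence 1.0, while bread -> butter holds in only two of the three bread baskets.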
Emerging Technologies and Apps
Blockchain Applications
Internet of Things (IoT) Solutions
Recommender Systems
Personalized Services
Personalized Medication
Preventive Maintenance
Perspective Cybersecurity
Logistics & SC Optimization
Geolocation Services
Key Challenges
Data Governance
Privacy-Preserving Analytics
Securing in Real-time
Data Structure Complexities
Distributed Deployment
Identity Management
DDoS
Configuration Management
Algorithms Threats
Volume
Velocity
Processing
Variety
Insights
Growth
Streams
Real-time
Dynamicity
Responsive
Scale out Performance
Data Flow Engines
Event Pipelines
Smart Data Formats
Perspective Deep Models
Big Data by Year 2010
Big Data…Digital Transformation
Thank You