Is the Elephant in the room?
Regunath B
[email protected]
Twitter : @RegunathB
Quick read 1.8 million words?
The story is about a battle between great kings and sons, with the principal characters being Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.
Source : The Gramener blog for visualizations
Analysis of the entire text contained in the Mahabharatha
(http://blog.gramener.com/category/visualisations)
Insights from Social Media
Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
(http://ttwick.com/blog/bill-gates-twitter-social-media/)
Insights from Social Media
Source : Impact page of Satyamevjayate
(http://www.satyamevjayate.in/impact/impact.php/)
What is Big Data?
Big Data challenges and opportunities arise when information in an enterprise demonstrates the following characteristics:
Volume: Transaction data from enterprise systems. For example : Financial transactions, Orders
Variety: Structured and Unstructured data. For example : Customer contact, Social Media, Biometrics
Velocity: High information arrival rates. For example : Application events, Tagging, Rating of content
Big Data opportunities arise when the enterprise is able to derive Value from the data characteristics defined above
Food for thought.... on theorems and laws
Do hardware and technology trends affect your technology selection?
CPU, RAM and disk size double every 18-24 months [Moore's law]
Disk seek time remains nearly constant at around 5% speed-up per year
Data Seek vs. Data transfer
Software that leverages one of the above (or) a combination: B+ tree index, LSM tree index, Fractal tree
CAP theorem: a shared-data system can achieve only 2 of 3 properties : data Consistency, system Availability and tolerance to network Partitions
Bandwidth is the most scarce commodity in a Data Center
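A back-of-the-envelope sketch of why the seek vs. transfer trade-off widens over time. It uses only the two rates quoted above (capacity doubling every 18-24 months, seek time improving about 5% per year); the function names and the choice of the conservative 24-month doubling period are assumptions made for illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth of a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def seek_speedup(years: float, yearly_improvement: float = 0.05) -> float:
    """Cumulative seek speed-up when seeks get `yearly_improvement` faster each year."""
    return (1.0 + yearly_improvement) ** years


if __name__ == "__main__":
    for years in (2, 6, 10):
        # Assumption: the slower end of the quoted range (24-month doubling).
        capacity = growth_factor(years, doubling_period_years=2.0)
        seek = seek_speedup(years)
        print(f"{years:>2} years: capacity x{capacity:.1f}, "
              f"seek x{seek:.2f}, gap x{capacity / seek:.1f}")
```

Under these assumptions the amount of data per disk outgrows the ability to seek into it, which is one reason to weigh seek-heavy structures (B+ tree) against transfer-friendly ones (LSM tree) when selecting technology.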
Aadhaar Patterns & Technologies
Principles
POJO based application implementation
Light-weight, custom application container
Http gateway for APIs
Compute Patterns
Data Locality
Distribute compute (within an OS process and across)
Compute Architectures
SEDA Staged Event Driven Architecture
Master-Worker(s) Compute Grid
Data Access types
High throughput streaming : bio-dedupe, analytics
High volume, moderate latency : workflow, UID records
High volume, low latency : auth, demo-dedupe, search eAadhaar, KYC
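The SEDA idea named above can be sketched as a chain of stages, each with its own input queue and its own small pool of worker threads. This is a minimal illustration, not the Aadhaar implementation: the `Stage` class, the stage names (`parse`, `enrich`, `store`) and the event shape are all hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker thread to exit


class Stage:
    """One SEDA stage: an input queue drained by its own pool of workers."""

    def __init__(self, name, handler, workers=2, next_stage=None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        self.inbox = queue.Queue()
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(workers)]

    def start(self):
        for t in self.threads:
            t.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.next_stage is not None:
                self.next_stage.submit(result)  # hand off to the next stage's queue

    def stop(self):
        """Queue one sentinel per worker behind pending events, then wait."""
        for _ in self.threads:
            self.inbox.put(_STOP)
        for t in self.threads:
            t.join()


def run_pipeline(raw_events):
    """Push raw strings through parse -> enrich -> store and return stored ids."""
    results = queue.Queue()
    store = Stage("store", results.put, workers=1)
    enrich = Stage("enrich", lambda e: {**e, "valid": bool(e["id"])}, next_stage=store)
    parse = Stage("parse", lambda raw: {"id": raw.strip()}, next_stage=enrich)
    stages = [parse, enrich, store]
    for s in stages:
        s.start()
    for raw in raw_events:
        parse.submit(raw)
    for s in stages:  # stop upstream first so downstream sees every event
        s.stop()
    # Workers run concurrently, so arrival order is not guaranteed: sort for stability.
    return sorted(results.get()["id"] for _ in range(len(raw_events)))
```

Because each stage owns its queue and workers, a slow stage can be given more threads (or moved to another process behind a message broker) without touching the others, which is the "scale within an OS process and across" property listed above.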
Aadhaar Architecture
Work distribution using SEDA & Messaging
Ability to scale within JVM and across
Recovery through check-pointing
Sync Http based Auth gateway
Protocol Buffers & XML payloads
Sharded clusters
Near Real-time data delivery to warehouse
Nightly data-sets used to build dashboards, data marts and reports
Real-time monitoring using Events
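One common way to realize "Sharded clusters" is to route each record to a shard by hashing its key. The sketch below assumes hash-modulo routing and an in-memory dictionary per shard; the slides do not say which sharding scheme Aadhaar uses, and `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one shard of a cluster."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Hash-modulo routing is simple but remaps most keys when the shard count changes; schemes such as consistent hashing reduce that movement at the cost of extra bookkeeping.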
Putting data to work at Aadhaar
Deployment Monitoring
Big Data at Flipkart
Website traffic
Millions of page hits per day: product catalogs, item availability, promotions, search
Millions of active sessions and shopping carts
Latencies measured in low digit milliseconds
Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)
Electronic inventory: MP3, eBooks, movies
New business models, newer channels
Understanding users, user profiles, social media, experience
Terabytes of logs containing browsing behavior, data from multiple engagement channels
Recommendations based on millions of possible item matches and relevance algorithms
Is the Elephant in the room?
From Wikipedia:
"Elephant in the room" is an English metaphorical idiom for an
obvious truth that is being ignored or goes unaddressed.
Big Data opportunities and challenges are real and present
-
It is the Elephant in the room.
Some takeaways from experience
Make everything API based
Everything fails (hardware, software, network, storage)
System must recover, retry transactions, and sort of self-heal
Security and privacy should not be an afterthought
Scalability does not come from one product
Watch out for solution and technology stereotyping
Open scale out is the only way to go
Heterogeneous, multi-vendor, commodity compute, growing in linear fashion. Nothing else can adapt!
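The "recover, retry transactions" takeaway can be sketched as a retry wrapper with exponential backoff. This is one possible shape, not a prescribed one: `TransientError`, `retry` and the default delays are hypothetical, and a real system would also need idempotent operations so that a retried transaction is safe to repeat.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped links)."""


def retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The wait doubles after each failure: base_delay, 2*base_delay, 4*base_delay...
    The last failure is re-raised so the caller can escalate.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a fake can record the delays instead of actually waiting.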
14/08/12