Big Data Consulting Hadoop, big data Robert Gibbon - www.bigindustries.be

Big data, Hadoop - lunchtime talk 2015.02.26


Big Data Consulting

Hadoop, big data

Robert Gibbon - www.bigindustries.be


The information age

■ The “economic third wave” has badly hit many blue chip organisations

■ Manufacturing and retail are in rapid decline in Europe and the US
■ Tech, connectivity and information are restructuring our societies
■ Levels of political and social engagement have surged
■ Trading platforms are empowering small businesses


Innovation

■ Mass-production hates innovation
■ Innovation means change – a huge cost with little benefit for production-line economies
■ Continuous improvement mentality

■ Knowledge services need to innovate to differentiate
■ Change in a virtual world can be cheap and yield huge rewards
■ Continuous reinvention mentality


The rover bicycle, 1885


Big data viz. innovation

■ In a free market like the web, innovation can open up new opportunities
■ Consumer access to grid computing tech is a recent innovation
■ Grid computing opens up new opportunities that would otherwise not be viable
■ Ideal for ventures architected around the long-tail economic model


The future - thingternet

■ The internet of things is with us
■ Billions of connected devices, even digital tattoos


Big data viz. internet of things

■ Billions of connected devices create a huge amount of data

■ Until big data tech, Internet of Things was nearly impossible to monetize


The internet of things is a wild west

■ Many new, unsolved challenges

■ Privacy
■ Governance
■ Civil liberties

■ New challenges = new opportunities


let's get back to hadoop


Hadoop refresher

■ FOSS solution for processing terabytes to petabytes of data
■ Using arrays of regular servers

■ Hadoop core:
■ HDFS - a scale-out file system
■ YARN - a scale-out application resource manager

■ Runtimes:
■ Spark, Impala, Flink, MapReduce, Kafka, SolrCloud etc.

■ Components for data protection, access control and operational management

■ NoSQL databases
■ HBase, Accumulo, Cassandra etc.
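
To make the MapReduce runtime listed on this slide more concrete, here is a minimal single-process sketch of its map, shuffle and reduce steps applied to a word count. It does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration. On a real cluster the same three steps would be spread across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input, invented for this example.
    sample = ["big data big clusters", "big data small servers"]
    print(word_count(sample))
```

Because each map call only sees one line and each reduce call only sees one key, both steps can in principle run in parallel, which is what lets this style of processing scale out across regular servers.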


what can you do with hadoop?


Storage

■ Pure online data storage, with no other processing
■ Low cost per-GB for petascale online storage
■ Option to directly query and analyse the data is available if required.


Search

■ Example: huge, constantly changing catalogue of products – like eBay and Amazon

■ SolrCloud – an advanced search engine serving terabytes of content from Hadoop


Messaging

■ A distributed message queue backed by a Hadoop cluster - Apache Kafka
■ Elastically scalable
■ Messages are persisted and replicated for durability
■ TBs of messages per broker with predictable performance
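
The "persisted and replicated" bullet can be illustrated with a toy in-memory sketch of an append-only log that is copied to several replicas and read by offset. This is not Kafka's API: the `Replica` and `ReplicatedLog` classes are hypothetical names, and real brokers write to disk, elect leaders and handle failures, none of which is modelled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One copy of the log; stands in for a broker holding a replica."""
    log: List[bytes] = field(default_factory=list)


@dataclass
class ReplicatedLog:
    """Append-only log whose messages are copied to every replica."""
    replicas: List[Replica]

    def append(self, message: bytes) -> int:
        """Write the message to all replicas and return its offset."""
        for replica in self.replicas:
            replica.log.append(message)
        return len(self.replicas[0].log) - 1

    def read(self, offset: int, replica_index: int = 0) -> List[bytes]:
        """Return every message from `offset` onward; reading deletes nothing."""
        return self.replicas[replica_index].log[offset:]


if __name__ == "__main__":
    log = ReplicatedLog([Replica(), Replica(), Replica()])
    log.append(b"order-created")
    log.append(b"order-paid")
    # Any replica can serve the same messages from the same offset.
    print(log.read(0, replica_index=2))
```

Since consumers only track an offset and messages are never removed on read, several consumers can replay the same log independently, and losing one replica does not lose the messages.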


Targeting

■ Personalised content for users
■ Generates and consumes a huge amount of log data

■ for reporting
■ for predictive analysis

■ Predictive analysis is compute intensive
■ Can be TBs of data per day


Self-service Business Intelligence

■ Enterprise Data Hub paradigm
■ A very popular emerging use case

■ Business users directly access raw datasets using specialised discovery tools built on top of Hadoop - Datameer, Platfora and others


Data warehousing

■ Migration of Enterprise Data Warehouse to Hadoop
■ Big cost savings versus traditional vendors like Oracle and Teradata


Machine learning

■ Predictive analytics with Spark MLlib or Revolution R Enterprise

■ Automatically predict component failures for proactive intervention


Big Database

■ Low latency, high throughput, high concurrency, high volume
■ Algotrading
■ Realtime ad auctions

■ Volumes at 200BN transactions per day in realtime reliably served
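
To put the daily volume in per-second terms, the short calculation below converts 200 billion transactions per day into an average rate. It assumes the load is spread evenly across the day, which the slide does not state; real peak rates would be higher. The function name `average_rate_per_second` is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(transactions_per_day: int) -> float:
    """Average transactions per second, assuming a perfectly even load."""
    return transactions_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(200_000_000_000)
print(f"{rate:,.0f} transactions per second on average")
# prints "2,314,815 transactions per second on average"
```

In other words, the quoted volume corresponds to roughly 2.3 million transactions every second, sustained around the clock.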


Device management

■ Analysis and response to threats detected by SPI module on remote switch

■ Automated systems management – shut down heating when nobody is home to reduce heating bill and emissions

■ Monitor driver propensity to break the speed limit - offer lower insurance premiums to good drivers
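
As a rough sketch of the driver-monitoring example, the code below scores a batch of telemetry samples by the share of readings above the posted limit and maps that score to a premium discount. Everything specific here is an assumption made for illustration: the sample format, the 5% tolerance, the 10% discount and the function names do not come from the talk or from any insurer.

```python
from typing import List, Tuple

# Each sample is (measured speed, posted limit), both in km/h (assumed format).
Sample = Tuple[float, float]


def speeding_fraction(samples: List[Sample]) -> float:
    """Share of samples in which the measured speed exceeds the limit."""
    if not samples:
        return 0.0
    over = sum(1 for speed, limit in samples if speed > limit)
    return over / len(samples)


def premium_discount(fraction: float,
                     tolerance: float = 0.05,
                     discount: float = 0.10) -> float:
    """Return the discount rate if the driver stays within the tolerance."""
    return discount if fraction <= tolerance else 0.0


if __name__ == "__main__":
    # Invented telemetry: one of the four readings is over the limit.
    trip = [(48.0, 50.0), (52.0, 50.0), (90.0, 120.0), (119.0, 120.0)]
    share = speeding_fraction(trip)
    print(share, premium_discount(share))
```

At fleet scale the same per-driver aggregation would run over the device logs collected in the cluster rather than over an in-memory list.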


hadoop - mature?


Choice of vendors


Solid operational management


Impala v Teradata


Free grid computing


Free scale-out database


Growing commercial ecosystem


Secure and available

■ RPC authentication and encryption with PKI
■ Data encryption at rest and in transit
■ Kerberos resource access control - HDFS, YARN
■ Table cell level permissions - Accumulo
■ Online snapshot backups
■ No SPoF
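
The idea behind table cell level permissions can be sketched as follows: every cell carries the security labels needed to read it, and a scan returns only the cells whose labels are covered by the reader's authorizations. This is a simplified illustration, not Accumulo's API. The `Cell` class, the `visible_cells` function and the sample data are hypothetical, and the sketch only models "all labels required", whereas Accumulo's visibility expressions can also combine labels with OR.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Cell:
    """One table cell plus the labels a reader must hold to see it."""
    row: str
    column: str
    value: str
    required_labels: FrozenSet[str]


def visible_cells(cells: List[Cell], authorizations: FrozenSet[str]) -> List[Cell]:
    """Keep only cells whose required labels are all held by the reader."""
    return [cell for cell in cells if cell.required_labels <= authorizations]


if __name__ == "__main__":
    # Invented records for illustration only.
    table = [
        Cell("p1", "name", "Ann", frozenset()),
        Cell("p1", "diagnosis", "flu", frozenset({"medical"})),
        Cell("p1", "salary", "50000", frozenset({"hr", "finance"})),
    ]
    for cell in visible_cells(table, frozenset({"medical"})):
        print(cell.column, cell.value)
```

Filtering per cell rather than per table means two users can scan the same row and each see only the columns they are cleared for.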


thanks for listening

be.linkedin.com/in/robertgibbon