BigData, Why Care? Saturday 20 October 12

Page 1: Big data, why care

BigData, Why Care?

Saturday 20 October 12

Page 2: Big data, why care

Datacrunchers Consultancy Services

Speaker

Daan Gerits
- BigData Architect
- DataCrunchers.eu
§ Semantic Analysis, Data Harvesting, ...
§ Hadoop, Azure, BigInsights, ...
§ Storm

BigData.be co-organizer

Page 3: Big data, why care

BigData

A lot of technical fuss
- Hadoop, Storm, Pig, ...

Seems to be only for the big players
- Google, Facebook, LinkedIn, Twitter, ...

So why should ‘we’ care?
- we = Startups and Small and Medium Enterprises (SSME)

Page 4: Big data, why care

What BigData Promises

Ability to store and process large amounts of data
- Scalable in hardware and software
- Scalable in budget

Which means your budget can grow with your data
- start small with a small cluster
- the more data you want to manage, the more systems you add

Lower cost systems
- Several low to medium end systems
- instead of 1 big expensive one

Page 5: Big data, why care

But what can you do with it?

Analyze your data with higher precision
Analyze historical facts
Prevent data loss
- Infrastructure failure
- Human errors

Eliminate data silos

Page 6: Big data, why care

High Precision Analysis

Traditional Technologies
- Problems:
§ Unable to store all data
- Solutions:
§ Sharding
§ Aggregate data
- Problems with these solutions:
§ Sharding has a high maintenance cost
§ Sharding is complex for users and apps
§ Manual sharding adds a high risk
§ Data aggregation causes loss in data precision
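The loss of precision caused by aggregation can be illustrated with a minimal sketch. The events, amounts and function names below are made up for illustration and do not come from the talk. Once only daily totals are kept, a question about a specific hour can no longer be answered; with the raw events it can.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (timestamp, order amount).
RAW_EVENTS = [
    (datetime(2012, 10, 19, 9, 15), 20.0),
    (datetime(2012, 10, 19, 9, 45), 35.0),
    (datetime(2012, 10, 19, 17, 30), 5.0),
    (datetime(2012, 10, 20, 11, 0), 40.0),
]


def aggregate_per_day(events):
    """Keep only one total per day, as a space-saving aggregate would."""
    totals = defaultdict(float)
    for timestamp, amount in events:
        totals[timestamp.date()] += amount
    return dict(totals)


def total_for_hour(events, hour):
    """Answer an hourly question; this needs the raw timestamps."""
    return sum(amount for timestamp, amount in events if timestamp.hour == hour)


# The aggregate only knows days; the hour of each order is gone.
daily_totals = aggregate_per_day(RAW_EVENTS)

# The raw events can still answer "how much was ordered between 9 and 10?".
morning_total = total_for_hour(RAW_EVENTS, 9)
```

The daily totals are much smaller than the raw events, but nothing in them can reproduce `morning_total`.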

Page 7: Big data, why care

High Precision Analysis

BigData allows us to
- Store and process large amounts of data
§ So no need to aggregate
- ‘Forget’ about sharding
§ BigData technologies do this for you
§ Makes it predictable
§ And transparent

But
- You have to configure it correctly
- You don’t have ad-hoc querying (yet)

Page 8: Big data, why care

Analyze Historical Facts

Data Warehouse
- Built on top of parameters

What if we forget to add a parameter?
- Add the parameter
- Start gathering information for that parameter

Problem:
- We will only have information from the moment we add the parameter!

Page 9: Big data, why care

Analyze Historical Facts

Let’s store everything
Determine the parameters later
- by humans
- by machine learning algorithms

Analysis will process all data
What if we forget to add a parameter?
- add the parameter
- regenerate your reports
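As a rough sketch of the "add the parameter, regenerate your reports" idea: the event fields and the `build_report` helper below are hypothetical, not part of any specific BigData product. The point is only that when raw events are kept, a parameter added later still covers the full history.

```python
# Hypothetical raw page-view events, stored unmodified.
RAW_EVENTS = [
    {"user": "ann", "page": "/pricing", "browser": "Firefox"},
    {"user": "bob", "page": "/pricing", "browser": "Chrome"},
    {"user": "ann", "page": "/docs", "browser": "Firefox"},
]


def build_report(events, parameters):
    """Count events per combination of the requested parameters."""
    report = {}
    for event in events:
        key = tuple(event[name] for name in parameters)
        report[key] = report.get(key, 0) + 1
    return report


# The first report only asked about pages.
report_v1 = build_report(RAW_EVENTS, ["page"])

# Later we realise we also care about the browser: add the parameter
# and regenerate the report over all stored data, including history.
report_v2 = build_report(RAW_EVENTS, ["page", "browser"])
```

In a real cluster the regeneration would be a batch job over the stored data rather than a loop in memory, but the principle is the same.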

Page 10: Big data, why care

Analyze Historical Data

Conclusion
- Traditionally: ask first, store later
- BigData: store first, ask later

Page 11: Big data, why care

Prevent Data Loss

Traditional technologies
- Machine Failure
§ I hope you have a backup from yesterday?
- Human Error
§ Whoops I deleted those records
§ I hope you have a backup from yesterday?
- So in the worst case, you lose one day of data

Page 12: Big data, why care

Prevent Data Loss

BigData allows us to
- Survive machine failure without data loss
- Survive human error without data loss

But
- You need a data model which supports this
§ Incremental model
- You need to restrict operations
§ Only append data, no updates or deletes
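A minimal sketch of what an incremental, append-only model could look like. The `AppendOnlyStore` class, the key names and the convention of using `None` as a removal marker are assumptions made for this illustration; they are not taken from the talk or from any particular product. A "delete" is itself just an appended fact, so the state before a human error can be rebuilt.

```python
class AppendOnlyStore:
    """Hypothetical incremental store: facts are appended, never changed."""

    def __init__(self):
        self._log = []  # list of (key, value) facts in arrival order

    def append(self, key, value):
        """Record a new fact and return its position in the log."""
        self._log.append((key, value))
        return len(self._log) - 1

    def state(self, up_to=None):
        """Rebuild the key/value view from the first `up_to` facts."""
        facts = self._log if up_to is None else self._log[:up_to]
        view = {}
        for key, value in facts:
            if value is None:
                view.pop(key, None)  # a None value marks a removal
            else:
                view[key] = value
        return view


store = AppendOnlyStore()
store.append("customer:1", "Ann")
store.append("customer:2", "Bob")
mistake = store.append("customer:2", None)  # "Whoops I deleted those records"

# The current view reflects the mistake ...
current = store.state()

# ... but the view just before it can be rebuilt, because nothing was overwritten.
recovered = store.state(up_to=mistake)
```

Surviving machine failure is a separate concern (replication of the log across systems) and is not covered by this sketch.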

Page 13: Big data, why care

Prevent Data Loss

Conclusion
- Traditional technologies
§ require very advanced setups to handle machine failure
§ allow you to go back to yesterday’s state
- BigData
§ requires knowledge of how the failover algorithms work
§ expects failure most of the time
§ allows you to go back to the previous state

Page 14: Big data, why care

Eliminate Data Silos

Departments having their own data sources
- start to modify that data
- start to treat it as their master data
- not coupled to the master dataset

Causes a lot of overhead
- Silos miss master data updates
- Business decisions based on silo data, not the more accurate master data

No obvious way out

Page 15: Big data, why care

Eliminate Data Silos

Consolidate the silos
- Identify the silos
- Import the data from the silos into one store
- Reconstruct master data based on silo rules and priorities

[Figure: diagram labeled Sales, Marketing, Support and MasterData]
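The consolidation steps on this slide can be sketched as follows. The silo contents, the field names and the `FIELD_PRIORITY` rule table are invented for illustration; real rules and priorities would come from the departments involved.

```python
# Hypothetical silo extracts: customer id -> fields known to that silo.
SILOS = {
    "sales": {"c1": {"email": "ann@old.example", "phone": "555-0100"}},
    "marketing": {"c1": {"email": "ann@new.example"}},
    "support": {"c1": {"phone": "555-0199"}, "c2": {"email": "bob@example.org"}},
}

# Hypothetical rule: per field, which silo wins, most trusted first.
FIELD_PRIORITY = {
    "email": ["marketing", "sales", "support"],
    "phone": ["support", "sales", "marketing"],
}


def reconstruct_master(silos, field_priority):
    """Merge all silos into one record per id using the field priorities."""
    master = {}
    ids = {record_id for records in silos.values() for record_id in records}
    for record_id in sorted(ids):
        merged = {}
        for field, ranking in field_priority.items():
            for silo_name in ranking:
                value = silos.get(silo_name, {}).get(record_id, {}).get(field)
                if value is not None:
                    merged[field] = value
                    break  # the highest-priority silo that has a value wins
        master[record_id] = merged
    return master


master_data = reconstruct_master(SILOS, FIELD_PRIORITY)
```

Keeping the priorities in a table rather than in code makes it easier to adjust them when a department disagrees with the outcome.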

Page 16: Big data, why care

Eliminate Data Silos

Generate read-only data models per application
Data changes are sent to the master data
- using a specific API
- using database triggers

[Figure: diagram labeled MasterData, M1, M2, M3, DataWarehouse, Public API and ERP/CRM DB]
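One possible shape for the read-only models on this slide, as a sketch only. The `MasterData` class and its `apply_change` and `build_model` methods are hypothetical names that stand in for the "specific API"; database triggers are not modelled here. Python's standard `types.MappingProxyType` is used to make the generated models read-only.

```python
from types import MappingProxyType


class MasterData:
    """Hypothetical master store; all writes go through `apply_change`."""

    def __init__(self):
        self._records = {}

    def apply_change(self, record_id, field, value):
        """The only write path, standing in for the 'specific API'."""
        self._records.setdefault(record_id, {})[field] = value

    def build_model(self, fields):
        """Generate a read-only snapshot with only the fields one app needs."""
        snapshot = {
            record_id: MappingProxyType(
                {name: record[name] for name in fields if name in record}
            )
            for record_id, record in self._records.items()
        }
        return MappingProxyType(snapshot)


master = MasterData()
master.apply_change("c1", "email", "ann@example.org")
master.apply_change("c1", "phone", "555-0100")

# An application-specific model that only exposes e-mail addresses.
marketing_model = master.build_model(["email"])

# Writing to the model directly is rejected; changes must go to the master.
try:
    marketing_model["c1"]["email"] = "changed@example.org"
    write_rejected = False
except TypeError:
    write_rejected = True
```

Because each model is a generated snapshot, it has to be regenerated after changes reach the master data; how often that happens is a design choice this sketch leaves open.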

Page 17: Big data, why care

Eliminate Data Silos

Conclusion
- You will have to consolidate
- But you need a structural solution
- Which can be provided by BigData
- In a flexible and future-proof way

Page 18: Big data, why care

Conclusion

There is a lot to think about
But BigData can do a lot of things
- A lot more than I explained today

For a reasonable price
And you are not alone
- bigdata.be
- datacrunchers.eu

Page 19: Big data, why care

Questions?
