Big data, why care


BigData, Why Care?

Saturday 20 October 12

Datacrunchers Consultancy Services

Speaker

Daan Gerits
- BigData Architect
- DataCrunchers.eu
  - Semantic Analysis, Data Harvesting, ...
  - Hadoop, Azure, BigInsights, ...
  - Storm

BigData.be co-organizer


BigData

A lot of technical fuss
- Hadoop, Storm, Pig, ...

Seems to be only for the big players
- Google, Facebook, LinkedIn, Twitter, ...

So why should ‘we’ care?
- we = Startups and Small and Medium Enterprises (SSME)


What BigData Promises

Ability to store and process large amounts of data
- Scalable in hardware and software
- Scalable in budget

Which means your budget can grow with your data
- Start small with a small cluster
- The more data you want to manage, the more systems you add

Lower-cost systems
- Several low- to medium-end systems
- Instead of one big, expensive one


But what can you do with it?

Analyze your data with higher precision
Analyze historical facts
Prevent data loss
- Infrastructure failure
- Human errors
Eliminate data silos


High Precision Analysis

Traditional technologies
- Problem:
  - Unable to store all data
- Solutions:
  - Sharding
  - Aggregating data
- Problems with these solutions:
  - Sharding has a high maintenance cost
  - Sharding is complex for users and applications
  - Manual sharding carries a high risk
  - Data aggregation causes a loss in data precision
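
The loss of precision through aggregation can be illustrated with a small sketch. The response times and variable names below are made up for illustration and are not from the talk. Once only the daily average is kept, a later question about slow requests can no longer be answered.

```python
from statistics import mean

# Hypothetical raw measurements: response times in milliseconds for one day.
raw_response_times_ms = [120, 110, 130, 125, 2400, 115]

# Aggregated storage: only the daily average survives.
daily_average_ms = mean(raw_response_times_ms)

# With the raw data still available, a new question can be asked afterwards.
slow_requests = [t for t in raw_response_times_ms if t > 1000]

print(f"daily average: {daily_average_ms} ms")
print(f"requests slower than 1 s: {len(slow_requests)}")
```

The average alone hides the single very slow request; only the raw measurements reveal it.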


High Precision Analysis

BigData allows us to
- Store and process large amounts of data
  - So no need to aggregate
- ‘Forget’ about sharding
  - BigData technologies do this for you
  - Makes it predictable
  - And transparent

But
- You have to configure it correctly
- You don’t have ad-hoc querying (yet)
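
What it means for the platform to take care of sharding can be sketched roughly as follows. This is a simplified illustration, assuming hypothetical node names and a plain hash-modulo rule; actual BigData platforms use their own, more elaborate placement strategies. The point is only that the node is chosen by a deterministic rule, so applications never pick a shard by hand.

```python
import hashlib
from typing import Sequence

# Hypothetical cluster nodes; in a real platform the framework manages this list.
NODES = ("node-1", "node-2", "node-3")

def node_for(key: str, nodes: Sequence[str] = NODES) -> str:
    """Pick a node deterministically from the record key."""
    # A stable hash (unlike Python's built-in hash() for strings) gives the
    # same answer in every process and on every machine.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

placement = {key: node_for(key) for key in ["customer-17", "customer-42", "order-9001"]}
for key, node in placement.items():
    print(f"{key} -> {node}")
```

The modulo rule is the simplest possible choice. It moves most keys when a node is added, which is one reason production systems use more refined schemes.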


Analyze Historical Facts

Data Warehouse
- Built on top of parameters

What if we forget to add a parameter?
- Add the parameter
- Start gathering information for that parameter

Problem:
- We will only have information from the moment we add the parameter!


Analyze Historical Facts

Let’s store everything
Determine the parameters later
- By humans
- By machine learning algorithms

Analysis will process all data
What if we forget to add a parameter?
- Add the parameter
- Regenerate your reports
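
A minimal sketch of this idea, assuming a hypothetical list of raw purchase events and a hypothetical `revenue_by` helper (neither is from the talk): because every raw event is kept, a parameter that nobody thought of at first can still be reported on over the full history.

```python
# Hypothetical raw events, stored in full when they arrived.
raw_events = [
    {"day": "2012-10-01", "customer": "a", "amount": 20.0, "channel": "web"},
    {"day": "2012-10-01", "customer": "b", "amount": 35.0, "channel": "shop"},
    {"day": "2012-10-02", "customer": "a", "amount": 15.0, "channel": "web"},
]

def revenue_by(events, parameter):
    """Total the amounts per value of the chosen parameter."""
    totals = {}
    for event in events:
        key = event[parameter]
        totals[key] = totals.get(key, 0.0) + event["amount"]
    return totals

# The original report only looked at revenue per day.
print(revenue_by(raw_events, "day"))

# Later someone asks for revenue per channel: the report is simply
# regenerated over all stored events, including the old ones.
print(revenue_by(raw_events, "channel"))
```

In a warehouse that had only stored revenue per day, the per-channel question could only be answered for data gathered after the parameter was added.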


Analyze Historical Data

Conclusion
- Traditionally: ask first, store later
- BigData: store first, ask later


Prevent Data Loss

Traditional technologies
- Machine failure
  - I hope you have a backup from yesterday?
- Human error
  - Whoops, I deleted those records
  - I hope you have a backup from yesterday?
- So in the worst case, you lose one day of data


Prevent Data Loss

BigData allows us to
- Survive machine failure without data loss
- Survive human error without data loss

But
- You need a data model which supports this
  - Incremental model
- You need to restrict operations
  - Only append data; no updates or deletes
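
One way to picture an incremental, append-only model is sketched below. The `Fact` and `AppendOnlyStore` names are hypothetical and the store is kept in memory for brevity. A "delete" is recorded as one more appended fact, so replaying the log up to an earlier point gives back the previous state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Fact:
    """One immutable entry in the log."""
    sequence: int            # position in the log
    key: str
    value: Optional[str]     # None marks the key as removed

class AppendOnlyStore:
    """Hypothetical store that only ever appends facts."""

    def __init__(self) -> None:
        self._log: List[Fact] = []

    def append(self, key: str, value: Optional[str]) -> int:
        """Add a fact to the end of the log and return its sequence number."""
        fact = Fact(sequence=len(self._log), key=key, value=value)
        self._log.append(fact)
        return fact.sequence

    def state_at(self, sequence: Optional[int] = None) -> Dict[str, str]:
        """Replay the log up to and including `sequence` (default: everything)."""
        state: Dict[str, str] = {}
        for fact in self._log:
            if sequence is not None and fact.sequence > sequence:
                break
            if fact.value is None:
                state.pop(fact.key, None)
            else:
                state[fact.key] = fact.value
        return state

store = AppendOnlyStore()
store.append("customer-17", "Alice")
last_good = store.append("customer-42", "Bob")
store.append("customer-42", None)   # human error: the record is "deleted"

print(store.state_at())             # current state, without customer-42
print(store.state_at(last_good))    # previous state, customer-42 is back
```

Nothing is ever overwritten, so recovering from the mistake is a matter of reading an earlier point in the log rather than restoring a backup.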


Prevent Data Loss

Conclusion
- Traditional technologies
  - Require very advanced setups to handle machine failure
  - Allow you to go back to yesterday’s state
- BigData
  - Requires knowledge of how the failover algorithms work
  - Expects failure most of the time
  - Allows you to go back to the previous state


Eliminate Data Silos

Departments having their own data sources
- Start to modify that data
- Start to treat it as their master data
- Not coupled to the master dataset

Causes a lot of overhead
- Silos miss master data updates
- Business decisions are based on silo data, not the more accurate master data

No obvious way out


Eliminate Data Silos

Consolidate the silos
- Identify the silos
- Import the data from the silos into one store
- Reconstruct master data based on silo rules and priorities
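
The reconstruction step could look roughly like the sketch below. The silo names, the priority order and the customer record are all hypothetical, and the only rule applied is "take each field from the highest-priority silo that has a value"; real consolidation rules are usually richer.

```python
# Hypothetical priority: a lower number wins when silos disagree on a field.
SILO_PRIORITY = {"sales": 0, "support": 1, "marketing": 2}

# Hypothetical records for the same customer, imported from three silos.
silo_records = {
    "marketing": {"name": "ACME", "email": "info@acme.example", "phone": None},
    "sales": {"name": "ACME Corp.", "email": None, "phone": "+32 2 555 01 00"},
    "support": {"name": "Acme", "email": "help@acme.example", "phone": None},
}

def reconstruct_master(records, priority):
    """Take each field from the highest-priority silo that has a value."""
    master = {}
    for silo in sorted(records, key=lambda name: priority[name]):
        for field, value in records[silo].items():
            if value is not None and field not in master:
                master[field] = value
    return master

master_record = reconstruct_master(silo_records, SILO_PRIORITY)
print(master_record)
```

Here the name and phone number come from sales, while the e-mail address comes from support because sales has none.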

[Diagram: the Sales, Marketing and Support silos and the MasterData store]


Eliminate Data Silos

Generate read-only data models per application
Data changes are sent to the master data
- Using a specific API
- Using database triggers
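
As a rough sketch of this pattern, assume a hypothetical in-memory `MasterData` class with one update method standing in for the "specific API", and a generated read-only model for a marketing application. Applications can read their model but cannot change it; every change goes to the master data, after which the model is regenerated.

```python
from types import MappingProxyType

class MasterData:
    """Hypothetical master store; the only place where data is changed."""

    def __init__(self):
        self._customers = {}

    def update_customer(self, customer_id, **fields):
        """Stand-in for the specific API through which every change passes."""
        self._customers.setdefault(customer_id, {}).update(fields)

    def marketing_model(self):
        """Generate a read-only model with only the fields marketing needs."""
        view = {
            cid: MappingProxyType({"name": c.get("name"), "email": c.get("email")})
            for cid, c in self._customers.items()
        }
        return MappingProxyType(view)

master = MasterData()
master.update_customer(17, name="ACME Corp.", email="info@acme.example", credit_limit=5000)

model = master.marketing_model()
print(dict(model[17]))

try:
    model[17]["email"] = "other@acme.example"   # the application cannot edit its model
except TypeError as error:
    print("rejected:", error)

# The change goes to the master data instead, and the model is regenerated.
master.update_customer(17, email="other@acme.example")
print(dict(master.marketing_model()[17]))
```

Because the per-application models are derived and read-only, they cannot drift away from the master data the way an independently edited silo does.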

[Diagram: MasterData with generated models M1, M2 and M3, alongside the DataWarehouse, Public API and ERP/CRM DB]


Eliminate Data Silos

Conclusion
- You will have to consolidate
- But you need a structural solution
- Which can be provided by BigData
- In a flexible and future-proof way


Conclusion

There is a lot to think about
But BigData can do a lot of things
- A lot more than I explained today

For a reasonable price
And you are not alone
- bigdata.be
- datacrunchers.eu


Questions?
