© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine Harfouche


Page 1:

© 2012 IBM Corporation

IBM Security Systems

© 2013 IBM Corporation

Ecommerce

Antoine Harfouche

Page 2:

Big Data Analytics Lecture Series

Adapted from Kalapriya Kannan, IBM Research Labs, July 2013

Page 3:

What is the aim of the course?

Focus is on “Systems” and applications for cloud-based storage and processing of BIG DATA.

+ Big Data - Definition
+ Big Data - Analytics
+ Big Data - Storage (HDFS)
+ Big Data - Computing (Map/Reduce)
+ Big Data - Database (HBase)
+ Big Data - Graph DB (Titan)
+ Big Data - Streaming (Storm)

Page 4:

“Learning is not just restricted to listening, it is actively asking relevant questions”

Page 5:

Aim

+ Get convinced about “Big Data”
+ Understand why we need a different paradigm
+ Ascertain with confidence the need to look at data computing in a different way
+ Realize the potential of big data
  – All of you are skilled enough to get into it.

What we will not do
– Research why things have evolved into the current trends.
– Try to be hands-on, but this is not guaranteed.

Page 6:

What are we going to understand?

+ What is Big Data?
+ Why did we end up here?
+ To whom does it matter?
+ Where is the money?
+ Are we ready to handle it?
+ What are the concerns?
+ Tools and technologies
  – Is Big Data <=> Hadoop?

Page 7:

Simple to start

+ What is the maximum file size you have dealt with so far?
  – Movies, files, or streaming video that you have used?
  – What have you observed?
+ What is the maximum download speed you get?
+ Simple computation
  – How much time does it take just to transfer the file?
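
The transfer-time question can be worked through with a few lines of arithmetic. This is only a sketch: the file sizes and the 10 Mbit/s link speed are assumed for illustration and do not come from the lecture, and `transfer_time_seconds` is a hypothetical helper name.

```python
def transfer_time_seconds(size_bytes: float, speed_bits_per_second: float) -> float:
    """Time to move size_bytes over a link, ignoring protocol overhead and latency."""
    return size_bytes * 8 / speed_bits_per_second


GB = 10**9    # decimal gigabyte, in bytes
TB = 10**12   # decimal terabyte, in bytes
MBPS = 10**6  # one megabit per second, in bits per second

# Assumed examples: a 4 GB movie and a 1 TB data set over a 10 Mbit/s link.
movie = transfer_time_seconds(4 * GB, 10 * MBPS)
dataset = transfer_time_seconds(1 * TB, 10 * MBPS)

print(f"4 GB at 10 Mbit/s: {movie / 60:.1f} minutes")    # 53.3 minutes
print(f"1 TB at 10 Mbit/s: {dataset / 86400:.1f} days")  # 9.3 days
```

Even before any processing happens, moving a terabyte over an ordinary connection takes days under these assumptions, which is one reason to bring the computation to the data.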

Page 8:

What is big data?

“Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.

This data is “big data.”

Page 9:

Huge amount of data

There are huge volumes of data in the world:
+ From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
+ In 2011, the same amount was created every two days.
+ In 2013, the same amount of data is created every 10 minutes.
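
To get a feel for these numbers, the sketch below converts the slide's figures into data rates. It takes the slide's claims at face value and only does unit conversion, using decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the variable names are illustrative.

```python
GIGABYTE = 10**9
EXABYTE = 10**18

# 5 billion gigabytes is 5 exabytes.
total_bytes = 5 * 10**9 * GIGABYTE
assert total_bytes == 5 * EXABYTE

period_2011_s = 2 * 24 * 60 * 60  # the same amount every two days
period_2013_s = 10 * 60           # the same amount every ten minutes

rate_2011 = total_bytes / period_2011_s  # bytes per second
rate_2013 = total_bytes / period_2013_s  # bytes per second
speedup = period_2011_s / period_2013_s

print(f"2011: {rate_2011 / 10**12:.1f} TB/s")          # 28.9 TB/s
print(f"2013: {rate_2013 / 10**15:.2f} PB/s")          # 8.33 PB/s
print(f"Growth from 2011 to 2013: {speedup:.0f}x")     # 288x
```

The 2011 figure of 5 exabytes every two days also agrees with the "2.5 quintillion bytes" per day quoted earlier, since 2.5 quintillion bytes is 2.5 exabytes.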

Page 10:

Big data spans three dimensions: Volume, Velocity and Variety

Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.

– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
– Convert 350 billion annual meter readings to better predict power consumption

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

– Scrutinize 5 million trade events created each day to identify potential fraud
– Analyze 500 million daily call detail records in real-time to predict customer churn faster
– The latest I have heard is that even a 10-nanosecond delay is too much.

Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

– Monitor hundreds of live video feeds from surveillance cameras to target points of interest
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
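
The velocity point, that data must be used as it streams in, can be illustrated with a minimal sketch. Everything here is assumed for illustration: the `Trade` record, the `flag_bursts` function, the 120-second window (echoing "2 minutes is too late") and the threshold are hypothetical and do not describe any real fraud-detection system. Events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, NamedTuple


class Trade(NamedTuple):
    timestamp: float  # seconds since some reference point
    account: str
    amount: float


def flag_bursts(trades: Iterable[Trade], window_s: float = 120.0,
                max_trades: int = 3) -> Iterator[Trade]:
    """Yield each trade that pushes its account past max_trades within window_s seconds.

    Each event is examined once, as it arrives; only the timestamps that
    still fall inside the window are kept in memory.
    """
    recent = defaultdict(deque)  # account -> timestamps inside the window
    for trade in trades:
        times = recent[trade.account]
        times.append(trade.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and trade.timestamp - times[0] > window_s:
            times.popleft()
        if len(times) > max_trades:
            yield trade


trades = [
    Trade(0.0, "A", 100.0),
    Trade(5.0, "B", 50.0),
    Trade(10.0, "A", 120.0),
    Trade(20.0, "A", 90.0),
    Trade(30.0, "A", 110.0),   # fourth trade on account A within 120 s
    Trade(200.0, "A", 80.0),   # earlier trades have left the window
]
flagged = list(flag_bursts(trades))
print([t.timestamp for t in flagged])  # [30.0]
```

The design choice worth noting is that the decision is made per event, with bounded state, rather than by storing the whole day's data and analyzing it afterwards.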

Page 11:

Finally…

‘Big Data’ is similar to ‘small data’, but bigger.

But bigger data requires different approaches: techniques, tools, architecture…

…with an aim to solve new problems, or old problems in a better way.

Page 12:

To whom does it matter?

+ Research community
+ Business community: new tools, new capabilities, new infrastructure, new business models, etc.
+ Sectors: Financial Services..

Page 13:

What do revenues look like?

Page 14:

The Social Layer in an Instrumented Interconnected World

+ 2+ billion people on the Web by end of 2011
+ 30 billion RFID tags today (1.3B in 2005)
+ 4.6 billion camera phones worldwide
+ 100s of millions of GPS-enabled devices sold annually
+ 76 million smart meters in 2009… 200M by 2014
+ 12+ TBs of tweet data every day
+ 25+ TBs of log data every day
+ ? TBs of data every day

Page 15:

What does Big Data trigger?

From “Big Data and the Web: Algorithms for Data Intensive Scalable Computing”, Ph.D Thesis, Gianmarco

Page 16:

BIG DATA is not just HADOOP

+ Manage & store huge volume of any data: Hadoop File System, MapReduce
+ Manage streaming data: Stream Computing
+ Analyze unstructured data: Text Analytics Engine
+ Structure and control data: Data Warehousing
+ Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
+ Understand and navigate federated big data sources: Federated Discovery and Navigation

Page 17:

Types of tools typically used in Big Data Scenario

+ Where is the processing hosted?
  – Distributed servers / cloud
+ Where is the data stored?
  – Distributed storage (e.g., Amazon S3)
+ What is the programming model?
  – Distributed processing (MapReduce)
+ How is the data stored and indexed?
  – High-performance schema-free database
+ What operations are performed on the data?
  – Analytic/semantic processing (e.g., RDF/OWL)
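
To make the MapReduce programming model concrete, here is a minimal single-process sketch of a word count, the customary first example. It only imitates the three phases (map, shuffle, reduce) in plain Python; it is not Hadoop's API, and the function names and sample documents are assumptions made for illustration. In a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data is everywhere"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without the functions needing to know about each other.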

Page 18:

When is dealing with Big Data hard?

+ When the operations on data are complex:
  – E.g., simple counting is not a complex problem.
  – Modeling and reasoning with data of different kinds can get extremely complex.
+ Good news with big data:
  – Often, because of the vast amount of data, modeling techniques can get simpler (e.g., smart counting can replace complex model-based analytics)…
  – …as long as we deal with the scale.
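
One way to picture "smart counting" is next-word prediction by counting which word most often follows another, with no statistical model beyond the counts. This is an illustration chosen here, not an example from the lecture; the tiny corpus and the names `train_bigram_counts` and `predict_next` are hypothetical. With a handful of sentences the result is crude, and the slide's point is that the same counting gets much better as the data grows.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count, for every word, how often each other word directly follows it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of word, or None if the word was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = [
    "big data needs new tools",
    "big data needs new ideas",
    "big data is everywhere",
]
model = train_bigram_counts(corpus)

print(predict_next(model, "data"))     # needs
print(predict_next(model, "hadoop"))   # None
```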

Page 19:

Why Big-Data?

Key enablers for the appearance and growth of ‘Big-Data’ are:

+ Increase in storage capabilities
+ Increase in processing power
+ Availability of data

Page 20:

THINK