Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings


Using Data Mining and Machine Learning in Retail

Omeid Seide, Senior Manager, Big Data Solutions, Sears Holdings

Bharat Prasad, Big Data Solution Architect, Sears Holdings


The Challenge

Shortened processing windows

Escalating costs

Hitting scalability ceilings

Demanding business requirements

ETL complexity

Latency in data

Tight IT budgets

Growing data volumes

Over a Century of Innovation

A Fortune 100 company, nearly $40 billion in annual revenue.

The nation’s fourth-largest broadline retailer, with almost 2,500 full-line and specialty retail stores in the US and Canada.

A front runner in Big Data efforts, including driving personalized marketing and generating savings from legacy migration.

Runs one of the biggest rewards programs, which captures and quickly analyzes a very large number of customer transactions.


What is Big Data?

Big Data can no longer be defined by the amount of data, but by the type, speed, and storage capacity needed to compute and analyze that data.


Data, Data, and More Data

We are creating so much data, so quickly, that 90% of the data in the world today has been created in the last 2 years.


The Problem with Large Scale Data Processing

With traditional computer processing, it can be difficult to compute everything because of limits on storage space, processing time, and cost.

This typically leads to incomplete computations, data latency, and an overall lack of quality analysis.

Hadoop brings horizontal scalability, very large storage capacity, and fast parallel data processing.


Enter Hadoop

Apache Hadoop is a framework that:

Runs applications on a large cluster built of commodity hardware.

Provides reliability and data motion to applications.

Implements a computational paradigm named MapReduce.
  • Applications are divided into small fragments of work, each of which can be executed or re-executed on any node in the cluster.

Provides a distributed file system (HDFS) that stores data on the compute nodes, resulting in high aggregate bandwidth across the cluster.

Both MapReduce and HDFS handle node failures automatically.
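The MapReduce paradigm described on this slide can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample fragments are assumptions made for illustration, and a real cluster would run the map and reduce steps on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values seen for one key."""
    return key, sum(values)

def run_job(fragments):
    # Each fragment is mapped independently, so a failed fragment could be
    # re-executed on another node without touching the others.
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input, already split into small fragments of work.
fragments = ["beer diapers milk", "diapers beer", "milk bread"]
counts = run_job(fragments)
```

Because no map call depends on another, the fragments can be processed in any order and on any node, which is what makes re-execution after a node failure safe.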


Why Use Hadoop?

Scalability: Hadoop is “horizontally scalable.”
  • Stores and processes petabytes of data by adding hardware.

Economical: Uses commodity hardware.

Efficient: Processes data in parallel across the nodes of the cluster.

Reliability: Data is replicated three times by default, on different nodes; failed tasks are rerun.

Storage space & capacity: Central repository; keep everything forever.


Big Data Analytics in Retail

How can I better manage my inventory?

How can I better understand my customers’ buying habits?

How can I detect fraudulent activity?

How can I create better-targeted interactions with my customers?

How do I get customers to purchase more products?


The Evolution of Data Analysis


What is Mahout?

Top-level Apache Software Foundation project

Provides scalable machine learning algorithms

Collection of pre-built data-mining libraries

Primary focus on collaborative filtering, clustering & classification

Includes a Java-based math library for common math operations

Uses the MapReduce paradigm


Examples of Data Mining & Machine Learning


3 Primary Algorithms

Clustering

Recommendation Systems

Market Basket Analysis


Clustering

A process of grouping things so that like items end up in the same group as the items they most closely resemble.
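As one concrete instance of clustering, the sketch below implements k-means in plain Python. The slides do not say which clustering algorithm was used, so k-means is an assumption chosen for illustration, as are the customer features, the starting centroids, and the function names.

```python
import math

def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:  # keep an empty cluster's centroid in place
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate the assign and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (visits per month, average basket value in dollars)
customers = [(1, 20), (2, 25), (1, 30), (8, 200), (9, 220), (10, 210)]
labels, centroids = kmeans(customers, centroids=[(1, 20), (10, 210)])
```

With features on very different scales, as here, the larger one dominates the distance, so real data would normally be normalized before clustering.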


Motivation behind Clustering

Why use clustering?

To better understand a customer’s buying behavior

To develop targeted marketing campaigns

To understand interest, motivation, and lifestyle, in order to move merchandise in and out of stores more effectively


Recommendation Systems

An information filtering system that is used to predict a user’s rating or preference, typically using a collaborative, content-based, or hybrid approach to recommendations.


Collaborative Filtering

A framework that filters and recommends items based on user behavior, preferences, and activities, and on users’ similarities to others.

Recommenders
  • User based
  • Item based

Online and offline support

Can utilize Hadoop

Uses numerous similarity measures, such as cosine, LLR (log-likelihood ratio), Tanimoto, Pearson, and more.
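Three of the similarity measures named on this slide can be written in a few lines of Python. This is a sketch of the textbook definitions, not Mahout's implementation: the function names are assumptions, Pearson is computed here over complete rating vectors of equal length, and LLR is left out for brevity.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine(centered_a, centered_b)

def tanimoto(set_a, set_b):
    """Tanimoto (Jaccard) coefficient: overlap of two sets of items."""
    union = len(set_a | set_b)
    return len(set_a & set_b) / union if union else 0.0

# Hypothetical ratings by two users for the same three items.
alice, bob = [5, 3, 4], [4, 2, 3]
rating_similarity = pearson(alice, bob)

# Hypothetical purchase histories with no ratings attached.
purchase_similarity = tanimoto({"tv", "radio"}, {"tv", "fridge"})
```

Pearson and cosine suit explicit ratings, while Tanimoto needs only the sets of items each user touched. A user-based recommender compares users this way; an item-based one applies the same measures to items.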


Content-Based Filtering

Looks at the features of the items and at the user’s preferences in order to provide a recommendation.

Allows for highly precise recommendations.

Has difficulty making recommendations across different categories of products or services, which limits its use for cross-selling.

[Figure: content-based filtering. Users (A, B, C) give ratings; the feature values of content used in the past form a user profile; the feature values of each content item (X, Y, Z) form a content profile; matching the two profiles recommends content with similar feature values.]
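The matching step of content-based filtering can be sketched as follows. The products, the three feature dimensions, and the function names are hypothetical; averaging past items into a user profile and ranking by cosine similarity is one common choice, not necessarily the one the presenters used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_user_profile(used_items, content_profiles):
    """Average the feature values of the content a user used in the past."""
    vectors = [content_profiles[item] for item in used_items]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def recommend(user_profile, content_profiles, exclude, top_n=1):
    """Rank unseen content by how closely its features match the user profile."""
    scored = [(cosine(user_profile, features), item)
              for item, features in content_profiles.items()
              if item not in exclude]
    return [item for _, item in sorted(scored, reverse=True)[:top_n]]

# Hypothetical feature values: (power tool, outdoor, apparel)
content_profiles = {
    "drill":     [1.0, 0.0, 0.0],
    "sander":    [1.0, 0.1, 0.0],
    "lawnmower": [0.6, 1.0, 0.0],
    "jacket":    [0.0, 0.3, 1.0],
}
profile = build_user_profile(["drill"], content_profiles)
picks = recommend(profile, content_profiles, exclude={"drill"})
```

The jacket scores zero against a profile built only from power tools, which is the cross-selling difficulty noted on the slide: the method recommends more of what the user already bought.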


Market-Basket Model

A model used to describe a common form of many-to-many relationship between two kinds of objects: items and baskets.

Items: anything that is purchased

Basket: a set of items

The number of items in a basket is typically small, and the number of baskets is typically large
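A minimal sketch of this model as a data structure, assuming a hypothetical transaction log of (basket id, item) rows. Each basket holds a few items and each item can appear in many baskets, so the relationship can be read in both directions.

```python
from collections import defaultdict

# Hypothetical transaction log: one (basket_id, item) row per purchased item.
transactions = [
    (1, "diapers"), (1, "beer"), (1, "milk"),
    (2, "diapers"), (2, "beer"),
    (3, "milk"), (3, "bread"),
]

def build_baskets(rows):
    """Group item rows into baskets: basket_id -> set of items."""
    baskets = defaultdict(set)
    for basket_id, item in rows:
        baskets[basket_id].add(item)
    return dict(baskets)

def invert(baskets):
    """Read the relationship the other way: item -> ids of baskets holding it."""
    index = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            index[item].add(basket_id)
    return dict(index)

baskets = build_baskets(transactions)
item_index = invert(baskets)
```

Storing baskets as sets ignores quantity and order, which matches the definition of a basket as a set of items.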


Market Basket Models

A list of purchasers
  • Additional purchaser data can be useful (but is not needed)

A list of transactions

Seek to identify purchasing patterns
  • What items are normally purchased together?
  • What is the purchasing sequence?
  • Is there a seasonality effect to purchasing?

Categorize buying behavior

Translate buying behavior into actionable insight
  • Targeted promotions
  • Inventory placement
  • Store layout
  • Cross-selling


Frequent Itemsets

Any set of items that appears regularly within multiple baskets

Originally used to analyze a physical “supermarket basket”

Best used to link pairs of items that are commonly bought together but often have no obvious relationship to each other

Example: Diapers & Beer

A major store chain discovered that diapers and beer were regularly appearing in baskets together. The theory was that someone who buys diapers is likely to have a baby at home; with a baby at home, they are less likely to go to a bar to drink and more likely to have a beer at home.
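Finding frequent itemsets amounts to counting how many baskets contain each candidate set and keeping those above a threshold. The sketch below does this by brute force on hypothetical baskets. The `min_support` threshold and the `confidence` helper are standard market-basket terms that the slides do not name, and a production job would more likely use an algorithm such as Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, size, min_support):
    """Return itemsets of the given size that appear in at least
    min_support baskets, together with their basket counts."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every itemset one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
pairs = frequent_itemsets(baskets, size=2, min_support=2)
diapers_to_beer = confidence(baskets, "diapers", "beer")
```

Enumerating every combination is only practical because baskets are small; the number of baskets can grow large without changing the per-basket cost.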


Applying Market Basket Models

Retail Stores

Showroom floor planning

Catalog layout

Cross-selling

Fraud Analysis


Big Data Stack

[Figure: Big Data Stack. Layers shown: Integration, Security, Storage, Computation/Access, Semantic, Consumption. Components shown: Data Governance & Integration (ETL/ELT); Security; Storage (HDFS, on-premises and cloud); Metadata; NoSQL DB; Hive/Pig; Advanced Query; Data Analytics; Data Mining; Data Visualization & Reporting. Frequency: real-time streaming, time series, on demand.]


Open vs Closed Stack

[Figure: stack diagram with the layers Source, Integration, Security, Storage/NoSQL DB, Computation/Access, Semantic, and Consumption, plus the labels “Blog” and “Distribution”.]


Questions?