Yelper: A Collaborative Filtering Based Recommendation System. Team DeepBlue, Chuan Sun. Capstone Project @ NYC Data Science Academy, 9/20/2016

Capstone Project Slides- Yelper


Page 1: Capstone Project Slides- Yelper

Yelper: A Collaborative Filtering Based Recommendation System

Team DeepBlue
Chuan Sun

Capstone Project @ NYC Data Science Academy
9/20/2016

Page 2: Capstone Project Slides- Yelper

Agenda

Why? Why do we need recommendation?

What? What are the components of Yelper?

How? How is Yelper built?

Demo: User-business network; Yelper main page; simulation of handling users' recommendation requests

Summary & future work: What did we learn, and what's next?

Page 3: Capstone Project Slides- Yelper

Information overload is a real phenomenon that prevents us from making decisions or taking action


Page 4: Capstone Project Slides- Yelper

But life is short. We only have 24 hours per day

Page 5: Capstone Project Slides- Yelper

Recommendation has become extremely common in recent years

Page 6: Capstone Project Slides- Yelper

Real-time recommendation systems may become the new normal in the data era

● Gain real-time insight
● Enable rapid development
● Perform real-time analytics

Page 7: Capstone Project Slides- Yelper


Page 8: Capstone Project Slides- Yelper

The Yelp Challenge 2016 dataset is a good sandbox for building a recommendation engine

Page 9: Capstone Project Slides- Yelper

Yelper consists of 5 major components

User requests

Piper

Data

Recommendation Engine

Streaming Processor

Web server

Visualization

● Data source: Provide raw or processed data as the fuel for the recommendation engine

● Pipe user requests into streaming: Handle real-time data feeds as a message broker

● Process user requests: Digest continuous data from Kafka in a scalable and fault-tolerant way

● Predict ratings: Adopt collaborative filtering or content-based filtering

● Visual insight: Distill connected components or in/out degree for users and businesses

● User frontend: Allow customers to get customized recommendations

Page 10: Capstone Project Slides- Yelper


Page 11: Capstone Project Slides- Yelper

Yelper is built on top of the Spark platform

Business

User

Review

Tip

Checkin

Data source
(1) Use business and user data from the Yelp dataset
(2) Map raw IDs to integers to make later stages much easier
(3) Split business data by city to allow fine-tuned models
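The ID mapping and per-city split can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records are made up, the field names (`business_id`, `user_id`, `city`, `stars`) are assumed to follow the Yelp JSON files, and `build_index` is a hypothetical helper, not code from the project. Integer IDs matter because MLlib's rating records expect integer user and product IDs.

```python
from collections import defaultdict

def build_index(raw_ids):
    """Assign consecutive integers to raw string IDs in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index

# Made-up records; the real Yelp files carry many more fields.
businesses = [
    {"business_id": "b-xyz", "city": "Madison"},
    {"business_id": "b-abc", "city": "Las Vegas"},
    {"business_id": "b-def", "city": "Madison"},
]
reviews = [
    {"user_id": "u-9f", "business_id": "b-xyz", "stars": 5},
    {"user_id": "u-2a", "business_id": "b-abc", "stars": 3},
    {"user_id": "u-9f", "business_id": "b-def", "stars": 4},
]

user_index = build_index(r["user_id"] for r in reviews)
business_index = build_index(b["business_id"] for b in businesses)
city_of = {b["business_id"]: b["city"] for b in businesses}

# Group (user, business, rating) triples by city so each city can get its own model.
ratings_by_city = defaultdict(list)
for r in reviews:
    triple = (
        user_index[r["user_id"]],
        business_index[r["business_id"]],
        float(r["stars"]),
    )
    ratings_by_city[city_of[r["business_id"]]].append(triple)
```

Each list in `ratings_by_city` can then be handed to a separate training run, which is one way to read "fine-tuned models" per city.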

Pipe user requests into streaming
(1) Pipe user requests into the stream
(2) Create JSON-based topics to allow producer-consumer communication
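To make the JSON-based messaging concrete, here is a minimal sketch that uses an in-process `queue.Queue` as a stand-in for a Kafka topic. It is not the project's code and does not use the Kafka client API; the message fields (`user_id`, `city`, `keywords`) and the helper names are assumptions chosen for illustration.

```python
import json
import queue

# In-process stand-in for a Kafka topic (assumption: the real system uses a broker).
topic = queue.Queue()

def produce(user_id, city, keywords):
    """Serialize a recommendation request as JSON bytes and publish it."""
    message = json.dumps({"user_id": user_id, "city": city, "keywords": keywords})
    topic.put(message.encode("utf-8"))

def consume_all():
    """Drain the topic and decode each JSON message back into a dict."""
    requests = []
    while not topic.empty():
        requests.append(json.loads(topic.get().decode("utf-8")))
    return requests

produce(0, "Madison", ["pizza"])
produce(1, "Las Vegas", ["sushi", "bar"])
received = consume_all()
```

Keeping the payload as JSON means the producer (web frontend) and the consumer (stream processor) only need to agree on field names, not on a shared class or language.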

Process user requests in stream
(1) Use Spark SQL to query data frames
(2) Support parallelism
(3) Fault-tolerant

Predict ratings
(1) Use Spark MLlib to train collaborative filtering models per city
(2) Tune the rank to find the best model
(3) Filter on user-input keywords to narrow down the user's preferences
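Rank tuning can be illustrated without a Spark cluster. The sketch below swaps in a small pure-Python matrix factorization trained with stochastic gradient descent as a stand-in for MLlib's ALS, and picks the rank with the lowest validation RMSE. The ratings, hyperparameters, and function names are all assumptions for illustration, not the project's settings.

```python
import math
import random

def train_mf(ratings, n_users, n_items, rank, epochs=200, lr=0.02, reg=0.05, seed=0):
    """Factorize (user, item, rating) triples into latent vectors with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(model, ratings):
    """Root mean squared error of the model's predictions on the given triples."""
    P, Q = model
    sq = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))

# Made-up ratings for one city: 4 users, 3 businesses.
train = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
         (2, 1, 2.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 2.0)]
valid = [(0, 2, 1.0), (1, 1, 3.0), (2, 0, 2.0), (3, 1, 3.0)]

# Train one model per candidate rank and keep the one with the lowest validation error.
scores = {rank: rmse(train_mf(train, 4, 3, rank), valid) for rank in (1, 2, 4)}
best_rank = min(scores, key=scores.get)
```

With Spark, the same loop would train one ALS model per candidate rank on each city's ratings and compare them on a held-out split.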

Visual insight
(1) Use Spark GraphX to extract connected components
(2) Use the graph-tool library to draw large graphs
(3) Use D3.js to build dynamic graphs

Client frontend
(1) Use CherryPy to let Flask talk to the Spark context
(2) Use the Google Maps API to show recommended results

Page 12: Capstone Project Slides- Yelper


Page 13: Capstone Project Slides- Yelper

The city-wise user-business network in Yelp tells a lot about a city
● Motivation
○ User-business interaction may reflect the economic trend in a city
● Each city is distinct, and so is its user-business network
● Nodes: u1, u2, … b1, b2, ...
● Edges: u1 → b1, u1 → b2, ...
● The network is bipartite
● If Yelp provides timestamps for the rating data, we may see how a city's economy evolves
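The bipartite structure and the graph statistics mentioned earlier (connected components, in/out degree) can be sketched in plain Python. This uses a union-find as a stand-in for GraphX's connected-components routine; the edge list and the `u`/`b` prefixes that keep user and business nodes distinct are assumptions for illustration.

```python
from collections import Counter

def connected_components(edges):
    """Union-find over user -> business edges; returns a list of node sets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, b in edges:
        root_u, root_b = find(u), find(b)
        if root_u != root_b:
            parent[root_u] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Made-up rating edges: each edge points from a user to a business.
edges = [("u1", "b1"), ("u1", "b2"), ("u2", "b2"), ("u3", "b3")]

components = connected_components(edges)
out_degree = Counter(u for u, _ in edges)  # number of businesses each user rated
in_degree = Counter(b for _, b in edges)   # number of ratings each business received
```

Because every edge runs from a user to a business, users only have out-degree and businesses only have in-degree, which is what makes the network bipartite.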

Page 14: Capstone Project Slides- Yelper

Demo 1: User-business network visualization in the city of Madison, US

Page 15: Capstone Project Slides- Yelper

Charlotte, US
Randomly selected 3312 edges (1% of total ratings)

Page 16: Capstone Project Slides- Yelper

Las Vegas, US
Randomly selected 11547 edges (1% of total ratings)

Page 17: Capstone Project Slides- Yelper

Las Vegas, US
Randomly selected 23095 edges (2% of total ratings)

Page 18: Capstone Project Slides- Yelper

Las Vegas, US
Sequentially selected 23095 edges (2% of total ratings)
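The Las Vegas graphs contrast random and sequential selection of the same number of edges. A minimal sketch of the two strategies follows, assuming "sequential" means taking the first edges in file order (the slides do not say); the edge list and function names are made up for illustration.

```python
import random

def sample_random(edges, fraction, seed=0):
    """Pick a uniformly random subset covering the given fraction of edges."""
    k = max(1, round(len(edges) * fraction))
    return random.Random(seed).sample(edges, k)

def sample_sequential(edges, fraction):
    """Take the first edges in their stored order (assumed meaning of 'sequential')."""
    k = max(1, round(len(edges) * fraction))
    return edges[:k]

# Made-up user -> business edges standing in for one city's ratings.
edges = [(f"u{i % 50}", f"b{i % 7}") for i in range(1000)]

random_subset = sample_random(edges, 0.02)
sequential_subset = sample_sequential(edges, 0.02)
```

A sequential slice tends to keep edges that sit next to each other in the file, so it can draw a very different picture from a random sample of the same size.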

Page 19: Capstone Project Slides- Yelper

Phoenix, US
Randomly selected 20582 edges (2% of total ratings)

Page 20: Capstone Project Slides- Yelper

Pittsburgh, US
Randomly selected 2230 edges (2% of total ratings)

Page 21: Capstone Project Slides- Yelper

Demo 2: Yelper recommendation page

Page 22: Capstone Project Slides- Yelper
Page 23: Capstone Project Slides- Yelper

Demo 3: Simulating the handling of user recommendation requests with Spark Streaming and Kafka

Page 24: Capstone Project Slides- Yelper
Page 25: Capstone Project Slides- Yelper


Page 26: Capstone Project Slides- Yelper

Summary & future work
● Summary
○ Preprocessing: dividing business data by city allows fine-tuned, customized recommendations
○ Collaborative filtering based recommendation using Spark MLlib
○ User-business graph visualization using D3 and the graph-tool library
○ User-business graph analysis using Spark GraphX in Scala
○ Real-time user request handling simulation using Spark Streaming and Apache Kafka
○ Google Maps view to recommend highly rated businesses to users
● What we learned
○ Streaming data analysis and mining could be the new norm in the future, and we as data scientists should be prepared for it
● Future work
○ More graph analysis
■ Graph PageRank analysis using GraphX
■ Community discovery (similar to the Facebook social network)
○ Improve recommendation
■ Content-based recommendation
■ Clustering all businesses
■ Extract objects from business photos using a convolutional neural network

Page 27: Capstone Project Slides- Yelper

Thanks!