Personalize Expedia Hotel Searches: Optimize Hotel Ranks to Maximize Purchase. Team members: Ambuj Agarwal, Lanqiu Mei, Yunlu Gao, Yuqian Liu. 06/18/2022

DESCRIPTION

Our approach to Kaggle's Expedia competition, which involved position ranking for hotel searches. We took a pseudo-classification approach: estimate the probability of a click for each hotel, then rank the results of each search accordingly.

Page 1: Personalize Expedia Hotel Searches

04/11/2023

Personalize Expedia Hotel Searches: Optimize Hotel Ranks to Maximize Purchase

Team members: Ambuj Agarwal, Lanqiu Mei, Yunlu Gao, Yuqian Liu

Page 2: Personalize Expedia Hotel Searches

Introduction Preprocessing Models Results Improvement

Page 6: Personalize Expedia Hotel Searches

Background
• Expedia is the largest online travel agency
• Accurately matching customers with hotel inventory is important in the highly competitive market

Yearly Revenue of Expedia: $4,800,000,000

1% increase in conversion rate: $48,000,000

500 more Data Scientists, better models!

Page 7: Personalize Expedia Hotel Searches

Data
• Searching and purchase data
• Hotel characteristics, location attractiveness, competitive OTA information

Variable groups:
• User: Visitor_location, Visitor_history_rate
• Search: Srch_id, Site_id, Date_time, Promotion_flag, Adult_count, Children_count, Room_count, Saturday_night, query_affinity_score, orig_destination_distance
• Hotel: Position, Prop_id, Prop_starrating, Prop_review_score, Prop_brand_bool, Promotion_flag, Location_score1, Location_score2, Price_usd, prop_log_historical_price
• Competitors: Comp_rate, Comp_inv, Comp_rate_%_diff (1-8)
• Result: Click_bool, Booking_bool, Gross_book_usd

Page 12: Personalize Expedia Hotel Searches

Data
• 2.19 GB training data
• 54 variables
• 400,000 unique searches
• 10 Million Training Data Points
• 6 Million Test Data Points
• NA, and informative missing data
• Imbalanced classes problem
  • Click: 4.49%
  • Booking: 2.78%

Page 13: Personalize Expedia Hotel Searches

Solution
• Ranking Problem
• Converted to a pseudo classification problem
• Overall Ranking Problem by each search
• List-wise approach
• Use click_bool as target variable instead of booking_bool
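
The pseudo-classification idea above can be sketched as follows: a classifier produces a click probability for every hotel shown in a search, and the hotels are then sorted within each search by that probability. This is only an illustrative sketch; the function name `rank_by_click_probability` and the toy probabilities are assumptions, not taken from the slides.

```python
from collections import defaultdict

def rank_by_click_probability(rows):
    """Group scored rows by search and sort each group, best first.

    rows: iterable of (srch_id, prop_id, p_click) tuples, where p_click is
    the predicted click probability from any classifier (hypothetical here).
    Returns {srch_id: [prop_id, ...]} ordered by descending p_click.
    """
    by_search = defaultdict(list)
    for srch_id, prop_id, p_click in rows:
        by_search[srch_id].append((p_click, prop_id))
    return {
        srch_id: [prop_id for _, prop_id in
                  sorted(items, key=lambda item: item[0], reverse=True)]
        for srch_id, items in by_search.items()
    }

# Toy predictions for two searches (made-up numbers).
scored = [
    (1, 101, 0.02), (1, 102, 0.31), (1, 103, 0.07),
    (2, 201, 0.10), (2, 202, 0.05),
]
ranking = rank_by_click_probability(scored)
```

The ranking is computed per `srch_id`, so probabilities are only ever compared between hotels shown in the same search.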

Page 14: Personalize Expedia Hotel Searches

Preprocessing – Variable Not Considered

Variables with missing value (% Missing)
• comp1_rate: 98%
• comp1_inv: 98%
• comp1_rate_percent_diff: 98%
• visitor_hist_starrating: 95%
• visitor_hist_adr_usd: 95%
• srch_query_affinity_score: 93%
• gross_bookings_usd: 97%

Id variables
• prop_id
• srch_id

Imputation (for variables with less than 30% missing)
• prop_review_score: Random Imputation
• prop_location_score2: Uniform Distribution Imputation

54 variables -> 19 variables
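
The slide names the two imputation methods without defining them. The sketch below shows one plausible reading, stated as an assumption: "random imputation" fills a missing value with a randomly chosen observed value of the same column, and "uniform distribution imputation" draws from a uniform distribution between the observed minimum and maximum. The helper names and sample values are hypothetical.

```python
import random

def random_impute(values, rng):
    """Assumed reading of 'random imputation': replace each missing value
    (None) with a value drawn at random from the observed values."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

def uniform_impute(values, rng):
    """Assumed reading of 'uniform distribution imputation': replace each
    missing value with a draw from Uniform(min observed, max observed)."""
    observed = [v for v in values if v is not None]
    low, high = min(observed), max(observed)
    return [v if v is not None else rng.uniform(low, high) for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
review_scores = [4.0, None, 3.5, 5.0, None]        # e.g. prop_review_score
location_scores = [0.12, None, 0.40, None, 0.05]   # e.g. prop_location_score2

imputed_reviews = random_impute(review_scores, rng)
imputed_locations = uniform_impute(location_scores, rng)
```

Both helpers leave observed values untouched and only fill the gaps, so the observed part of each column is unchanged.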

Page 15: Personalize Expedia Hotel Searches

Preprocessing – Dummy coding

Variables Dummied (Dummy By)
• site_id: Majority
• visitor_location_country_id: Majority
• prop_country_id: Majority
• prop_starrating: By Star
• srch_length_of_stay: Dummy 1 and 2
• srch_adults_count: Dummy 1 to 4
• srch_room_count: Dummy 1 and 2

19 variables -> 30 variables
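
A minimal sketch of the two dummy-coding styles listed above. It assumes that "Majority" means a single indicator for whether a row takes the most frequent level of the variable, and that "Dummy 1 and 2" means one indicator per listed level with all other values coded as zero. The helper names and toy columns are illustrative only.

```python
from collections import Counter

def majority_dummy(values):
    """One indicator column: 1 if the value is the most frequent level
    (assumed meaning of 'Majority' on the slide), else 0."""
    majority_level = Counter(values).most_common(1)[0][0]
    return [1 if v == majority_level else 0 for v in values]

def level_dummies(values, levels, name):
    """One indicator column per listed level; unlisted levels get all zeros."""
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

site_id = [5, 5, 14, 5, 24]            # toy values
srch_room_count = [1, 2, 1, 3, 1]      # toy values

site_is_majority = majority_dummy(site_id)
room_dummies = level_dummies(srch_room_count, levels=(1, 2), name="srch_room_count")
```

Collapsing high-cardinality ID columns to a majority indicator keeps the number of new columns small, which fits the modest growth from 19 to 30 variables.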

Page 16: Personalize Expedia Hotel Searches

Preprocessing – Sampling

• Randomly take 10% of training dataset
  • Limited by computational power
  • Learning Curve
  • 40,000 unique search IDs
  • 1 million rows
• The philosophy of ensemble
  • Listwise ensemble method
  • Build different models to decrease variance without sacrificing bias

Page 17: Personalize Expedia Hotel Searches

Modeling Method

[Diagram: 40,000 unique IDs -> 39,000 unique IDs + 1,000 unique IDs; the 39,000 unique IDs -> 39 samples of 1,000 unique IDs each (1,000, 1,000, 1,000, …) -> click probabilities]
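
The split shown in the diagram can be sketched as below: hold out 1,000 search IDs, cut the remaining 39,000 into 39 chunks of 1,000, fit one model per chunk, and combine the resulting click probabilities. The slides do not say how the probabilities are combined, so simple averaging is used here purely as an assumption; `split_ids` and `average_probabilities` are hypothetical helper names.

```python
import random
from statistics import mean

def split_ids(ids, n_holdout, chunk_size, seed=0):
    """Shuffle the search IDs, set aside n_holdout of them, and cut the
    rest into consecutive chunks of chunk_size IDs each."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    holdout, rest = shuffled[:n_holdout], shuffled[n_holdout:]
    chunks = [rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)]
    return holdout, chunks

def average_probabilities(per_model_probs):
    """Combine click probabilities from several models by averaging them
    row by row (an assumed combination rule, not stated on the slide).

    per_model_probs: list with one list of probabilities per model, all
    of the same length and in the same row order.
    """
    return [mean(row) for row in zip(*per_model_probs)]

# 40,000 sampled search IDs -> 1,000 held out + 39 chunks of 1,000.
holdout, chunks = split_ids(range(40_000), n_holdout=1_000, chunk_size=1_000)

# Two toy models scoring the same two rows.
combined = average_probabilities([[0.25, 0.5], [0.75, 1.0]])
```

Splitting by search ID rather than by row keeps all hotels from one search in the same chunk, which matters when the final ranking is evaluated per search.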

Page 18: Personalize Expedia Hotel Searches

Model families

• Trees
  • C4.5 (rpart), Bagging (C4.5), Random Forest, Gradient Boosted Trees, C5.0
• Discriminant Analysis
  • Linear, Quadratic, Flexible
• Logistic Regression
  • Logistic, Ridge, LASSO
• Support Vector Machine
• Artificial Neural Network

Page 19: Personalize Expedia Hotel Searches

Normalized Discounted Cumulative Gain (NDCG)
• Measures the performance of a recommendation system based on the graded relevance of the recommended entities
• NDCG can range from 0 to 1
  • 1 -> ideal ranking
  • Current score: 0.300 (Expedia Algorithm), 0.54075 (Kaggle Winner)
• Scoring
  • 1 for Click
  • 5 for Booking
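
A minimal sketch of how NDCG could be computed for a single search with the scoring above (5 for a booking, 1 for a click, 0 otherwise). It assumes the common exponential-gain form, DCG = sum over positions of (2^rel - 1) / log2(position + 1); the slide does not give the exact formula or a cutoff, so the optional `k` parameter and the function names are illustrative.

```python
import math

def relevance(click_bool, booking_bool):
    """Graded relevance from the slide: 5 for a booking, 1 for a click."""
    if booking_bool:
        return 5
    return 1 if click_bool else 0

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list (position 1 is the top).
    Uses the exponential-gain form, which is an assumption here."""
    ranked = relevances if k is None else relevances[:k]
    return sum((2 ** rel - 1) / math.log2(position + 1)
               for position, rel in enumerate(ranked, start=1))

def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# One search with three hotels, in the order the model ranked them:
# not clicked, booked, clicked only.
ranked_relevances = [relevance(0, 0), relevance(1, 1), relevance(1, 0)]
score = ndcg(ranked_relevances)
```

Because the gain is discounted by position, placing the booked hotel second instead of first costs noticeably more than swapping two low-relevance hotels further down the list.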

Page 20: Personalize Expedia Hotel Searches

NDCG Scores of Models

Page 21: Personalize Expedia Hotel Searches

Limitation and Future Improvement

• Other imputation methods
  • KNN (time consuming and memory limitation)
  • Impute using the most highly correlated variable
• Utilize all the data
  • Computation and memory constraints
  • Use the same trick but based on the country level
• Storing objects and predicting (time and memory constraints)
• Compute some self-defined variables
• Ensemble across different families of models
• Performance on larger test dataset
• Models with booking_bool
• Possible to make different models for non-missing rows and use a gated ensemble

Page 22: Personalize Expedia Hotel Searches

Questions?