Yelp Presentation_Final


Yelp Data Analysis
Sugandha Goel, Nisha Nair, Liz Stapleton, Yiqun Xiang

How many people use Yelp?
How many read reviews?
How many write reviews?

We'll walk through which attributes increase the probability of a business receiving higher stars and which attributes increase a user's chance of gaining more fans.

Our Data

Data received from Yelp
All data includes four countries (US, UK, DE, CA)

Business: list of businesses; key variables included:
  • Business Category (multiple)
  • Review Count
  • # Stars
  • Location

Tip: comments given by users about businesses

User: list of users; key variables included:
  • Review Count
  • Average Stars Given
  • Yelping Since
  • # of Fans

Converted from JSON
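The conversion from JSON to a tabular format could look roughly like the sketch below. It assumes one JSON object per line, and the field names (`business_id`, `stars`, `review_count`, `state`) are illustrative assumptions, not a confirmed schema from the deck.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, fieldnames):
    """Convert an iterable of JSON-object strings to CSV text.

    Only the columns named in `fieldnames` are kept; keys missing
    from a record become empty cells.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return out.getvalue()


# Hypothetical sample records (not taken from the real dataset).
sample = [
    '{"business_id": "b1", "stars": 4.0, "review_count": 120, "state": "NV"}',
    '{"business_id": "b2", "stars": 3.5, "review_count": 15}',
]
csv_text = json_lines_to_csv(
    sample, ["business_id", "stars", "review_count", "state"]
)
print(csv_text)
```

For a real file, the same function could be fed an open file handle instead of the `sample` list.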

BUSINESS DATA

Business Data: Cleaning the Data

Initial dataset:
  • 61,184 initial records
  • 436 categories

Removed:
  • Non-food related categories, using category1 and category2
      19,981 rows remaining
      113 categories remaining
  • Columns that had fewer than 1,000 completed rows
      More complete dataset
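The two cleaning steps can be sketched as follows. This is a minimal illustration, not the team's actual procedure: it assumes a row is kept when either `category1` or `category2` is food-related, and the `food_categories` set and sample rows are hypothetical.

```python
def clean_businesses(rows, food_categories, min_filled=1000):
    """Keep food-related rows, then drop sparsely filled columns.

    rows            -- list of dicts, one per business
    food_categories -- set of category names treated as food-related (assumed)
    min_filled      -- minimum number of non-empty values a column needs to survive
    """
    # Step 1: keep rows whose category1 or category2 is food-related.
    kept = [
        row for row in rows
        if row.get("category1") in food_categories
        or row.get("category2") in food_categories
    ]

    # Step 2: count non-empty values per column among the kept rows.
    filled = {}
    for row in kept:
        for col, value in row.items():
            if value not in (None, ""):
                filled[col] = filled.get(col, 0) + 1

    # Step 3: keep only columns that meet the completeness threshold.
    dense_cols = {col for col, n in filled.items() if n >= min_filled}
    return [{c: v for c, v in row.items() if c in dense_cols} for row in kept]


# Tiny hypothetical example; the threshold is lowered to 2 so the effect is visible.
rows = [
    {"name": "A", "category1": "Restaurants", "category2": "Pizza", "wifi": "free"},
    {"name": "B", "category1": "Auto Repair", "category2": "", "wifi": ""},
    {"name": "C", "category1": "Nightlife", "category2": "Bars", "wifi": ""},
]
food = {"Restaurants", "Pizza", "Bars"}
cleaned = clean_businesses(rows, food, min_filled=2)
print(cleaned)
```

With the deck's threshold, `min_filled` would stay at its default of 1,000.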

Business Data: Decision Tree Analysis (CHAID)

[Figure: CHAID decision tree of star ratings; splits include Review Count, Drive Thru, and Street Parking]

With Drive Thru: 60% between 2.5 and 3.5 stars
No Drive Thru: 58% between 3.5 and 4.0 stars

No Street Parking: 69% > 3.5 stars
With Street Parking: 83% > 3.5 stars

Important factors:
  • Drive Thru
  • Review Count
  • Parking (Lot/Street)
  • Noise Level
  • Takes Reservations
  • Outdoor Seating

Businesses without a drive-thru are rated higher than those with one

The greater the review count, the better the star rating
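CHAID chooses each split by running chi-square tests of independence between candidate predictors and the target. The sketch below shows only that core step, on hypothetical counts that are not taken from the deck; a full CHAID implementation also merges similar categories and compares adjusted p-values rather than raw statistics.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = attribute yes/no, columns = rating <= 3.5 / > 3.5.
candidates = {
    "drive_thru": [[60, 40], [30, 70]],
    "street_parking": [[45, 55], [40, 60]],
}
scores = {name: chi_square(table) for name, table in candidates.items()}

# Both tables are 2x2 (same degrees of freedom), so the larger statistic
# corresponds to the smaller p-value and would be chosen as the split.
best = max(scores, key=scores.get)
print(scores, best)
```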


Business Data: Tableau Discovery

Location distribution of star ratings and price range:
(1) PA has the highest star rating and also the highest price range. In contrast, AZ and IL have lower star ratings and lower prices.
(2) For WI, NV, NC, and SC, star ratings are relatively low compared with their price level.
Conclusion: The top two maps show that the relationship between star rating and price range differs from state to state.

The average number of reviews per business also differs widely from state to state:
(1) The population of NV is 42% of that of AZ.
(2) The number of businesses in NV is 64% of that in AZ.
However, NV has the highest average number of reviews per business (83.31), almost twice that of AZ. This suggests that NV has more Yelp users, that users in NV write reviews more frequently than users in other states, or both. Using Yelp appears to be more of a cultural norm in NV than in AZ.


             Population   # of Businesses   Avg Reviews per Business
NV           2.8 M        4,626             83
AZ           6.7 M        7,255             47
NV/AZ (%)    42%          64%               179%

The average number of reviews per business in NV (83) is almost twice that of AZ (47) and about five times that of SC (16).
Potential reasons:
(1) NV has more Yelp users
(2) Yelp users in NV write reviews more frequently
Conclusion: Yelp is more of a cultural norm in NV
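The NV/AZ ratios can be rechecked from the figures on this slide with a few lines of arithmetic. The dictionary keys below are illustrative names. With the rounded averages (83 and 47), the last ratio comes out at 177% rather than 179%; the slide's figure was presumably computed from unrounded averages.

```python
# Figures as shown on the slide (population in millions).
states = {
    "NV": {"population_m": 2.8, "businesses": 4626, "avg_reviews": 83},
    "AZ": {"population_m": 6.7, "businesses": 7255, "avg_reviews": 47},
}


def ratio_pct(metric):
    """NV value as a percentage of the AZ value, rounded to a whole percent."""
    return round(100 * states["NV"][metric] / states["AZ"][metric])


ratios = {m: ratio_pct(m) for m in ("population_m", "businesses", "avg_reviews")}
print(ratios)
```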


TIP DATA

Tip Data: Sentiment Analysis

  • Completed sentiment analysis using RStudio
  • Randomly chose 50,000 comments from the 500,000 available

Conclusions:
  • People may be worried about writing negative reviews
  • People who are satisfied are more likely to spend the time giving the business a positive review
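The sampling and scoring steps might be sketched as below. The analysis in the deck was done in RStudio, so this Python version is only an illustration of the idea: the word lists are a hypothetical toy lexicon, and the scoring rule (positive words minus negative words) is an assumption, not the method the team used.

```python
import random

# Hypothetical toy lexicon; a real analysis would use a published one.
POSITIVE = {"great", "good", "love", "best", "friendly"}
NEGATIVE = {"bad", "worst", "rude", "slow", "terrible"}


def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sample_and_score(tips, k, seed=0):
    """Draw k tips at random without replacement and score each one."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [(tip, sentiment_score(tip)) for tip in rng.sample(tips, k)]


# Hypothetical tips; in the deck, k would be 50,000 out of 500,000.
tips = ["Great tacos, friendly staff!", "Slow service and rude host.", "Open late."] * 4
scored = sample_and_score(tips, k=5)
print(scored)
```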

Tip Data: Word Clouds
Most frequent words (1-star reviews)

Tip Data: Word Clouds
Most frequent words (5-star reviews)
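A word cloud is driven by word frequencies, which can be counted as in this minimal sketch. The stopword list and sample texts are hypothetical, and the deck does not say which tool produced its clouds.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the one used in the deck).
STOPWORDS = {"the", "a", "and", "is", "was", "to", "of", "in", "it"}


def top_words(texts, n=3):
    """Return the n most frequent non-stopword words across all texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


# Hypothetical 1-star comments.
one_star = ["The service was slow", "Slow and rude service", "Rude staff"]
print(top_words(one_star))
```

Running the same function separately on 1-star and 5-star text gives the two frequency tables that the clouds visualize.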

USER DATA

User Data: Cleaning the Data

Removed:
  • All users without a user ID

Added:
  • # of years since users started yelping
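Both user-cleaning steps can be sketched together. The field names `user_id` and `yelping_since`, and the `YYYY-MM` date format, are assumptions made for illustration; the reference date is passed in explicitly so the result is reproducible.

```python
from datetime import date


def add_years_yelping(users, today):
    """Drop users without a user ID and add years since they joined.

    Assumes `yelping_since` is a string starting with "YYYY-MM".
    """
    cleaned = []
    for user in users:
        if not user.get("user_id"):
            continue  # removed: users without a user ID
        year, month = (int(part) for part in user["yelping_since"].split("-")[:2])
        months = (today.year - year) * 12 + (today.month - month)
        # added: years since the user started yelping
        cleaned.append(dict(user, years_yelping=round(months / 12, 2)))
    return cleaned


# Hypothetical users.
users = [
    {"user_id": "u1", "yelping_since": "2010-06", "review_count": 50},
    {"user_id": "", "yelping_since": "2012-01", "review_count": 5},
]
result = add_years_yelping(users, today=date(2015, 6, 1))
print(result)
```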

User Data: Regression Analysis

Conclusion:
  • All three independent variables are significant in this model
  • The more frequently a user writes reviews, the fewer fans they will have
  • People care about quality rather than quantity of reviews

Dependent variable: fans
Independent variables: review_count, average_stars, frequency_of_review (the number of reviews per year)

All three independent variables are statistically significant predictors of the number of fans. frequency_of_review is negatively correlated with the number of fans, while review_count and average_stars are both positively correlated with it.

That is, holding everything else constant, the more reviews a user writes, the more fans the user will have, and the higher the user's average stars, the more fans the user will have. Because frequency_of_review is negatively related to fans, holding everything else constant, the more frequently a user writes reviews, the fewer fans the user will have.

Users behave differently. Someone who has been a user for a shorter period (2 years) could write the same number of reviews as someone who has been a user for a longer period (5 years). The former may be writing reviews too frequently, so their reviews may be of lower quality, which might explain why they have fewer fans than the user who writes reviews less frequently.
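Written out, the model described here would look roughly like the following, assuming an ordinary linear regression (the deck does not state the exact model form or report coefficient values). The symbol names and the `years_yelping` variable are illustrative.

```latex
\begin{aligned}
\text{fans}_i &= \beta_0
  + \beta_1\,\text{review\_count}_i
  + \beta_2\,\text{average\_stars}_i
  + \beta_3\,\text{frequency\_of\_review}_i
  + \varepsilon_i, \\
\text{frequency\_of\_review}_i &= \frac{\text{review\_count}_i}{\text{years\_yelping}_i},
\qquad \hat\beta_1 > 0,\quad \hat\beta_2 > 0,\quad \hat\beta_3 < 0 .
\end{aligned}
```

As a worked case, two users with 100 reviews each who joined 2 and 5 years ago would have frequencies of 50 and 20 reviews per year. With equal review counts and average stars, the negative sign on the frequency term predicts fewer fans for the 2-year user.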


SUMMARY

Advice to Improve your Yelp Rating

Do:
  • Take reservations
  • Offer a quieter atmosphere
  • Offer sufficient parking
  • Encourage customers to write reviews

Don't:
  • Have a drive-thru
  • Have a noisy environment
  • Be cash only


Software Used in Our Analysis
