Yelp Data Analysis
Sugandha Goel, Nisha Nair, Liz Stapleton, Yiqun Xiang
How many people use Yelp?
How many read reviews?
How many write reviews?
We'll walk through which attributes increase the probability of a business receiving higher stars and which attributes increase a user's chance of gaining more fans.
Our Data
Data received from Yelp
All data includes four countries (US, UK, DE, CA)
Business: list of businesses. Key variables included:
  Business Category (multiple)
  Review Count
  # Stars
  Location
Tip: comments given by users about businesses
User: list of users. Key variables included:
  Review Count
  Average Stars Given
  Yelping Since
  # of Fans
Converted from JSON
BUSINESS DATA
Initial dataset:
  61,184 initial records
  436 categories
Removed:
  Non-food related categories, using category1 and category2
    19,981 rows remaining
    113 categories remaining
  Columns that had fewer than 1,000 completed rows
    More complete dataset
Business Data Cleaning the Data
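The two cleaning steps can be sketched in Python. This is only an illustration under stated assumptions, not the workflow the team actually used: the field names `category1` and `category2` come from the slide, while the `FOOD_CATEGORIES` set, the sample records, and both function names are hypothetical.

```python
# Hypothetical set of food-related categories; the slide does not list them.
FOOD_CATEGORIES = {"Restaurants", "Food", "Bars"}

def keep_food_businesses(rows):
    """Keep rows whose category1 or category2 is food related."""
    return [
        r for r in rows
        if r.get("category1") in FOOD_CATEGORIES
        or r.get("category2") in FOOD_CATEGORIES
    ]

def drop_sparse_columns(rows, min_completed=1000):
    """Drop columns with fewer than min_completed non-empty values."""
    counts = {}
    for r in rows:
        for key, value in r.items():
            if value not in (None, ""):
                counts[key] = counts.get(key, 0) + 1
    keep = {k for k, n in counts.items() if n >= min_completed}
    return [{k: v for k, v in r.items() if k in keep} for r in rows]

# Tiny made-up sample, so the threshold is lowered from 1,000 to 2.
sample = [
    {"name": "A", "category1": "Restaurants", "category2": "Pizza", "wifi": "free"},
    {"name": "B", "category1": "Auto Repair", "category2": "Tires", "wifi": ""},
    {"name": "C", "category1": "Nightlife", "category2": "Bars", "wifi": None},
]
food = keep_food_businesses(sample)
cleaned = drop_sparse_columns(food, min_completed=2)
print(cleaned)
```

Filtering rows before counting completed values means a column is judged only on the businesses that remain, which matches the order of the steps on the slide.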
Business Data Decision Tree Analysis (CHAID)
[Figure: CHAID decision tree for star rating, with Review Count among the split variables]
With Drive Thru: 60% between 2.5 and 3.5 stars
No Drive Thru: 58% between 3.5 and 4.0 stars
No Street Parking: 69% > 3.5 stars
With Street Parking: 83% > 3.5 stars
Important factors:
  Drive Thru
  Review Count
  Parking (Lot/Street)
  Noise Level
  Takes Reservations
  Outdoor Seating
Non-Drive Thrus > Drive Thrus
The greater the review count, the better the star rating
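CHAID picks its splits with chi-square tests of independence between each candidate attribute and the target. The sketch below computes the Pearson chi-square statistic for one candidate split. The counts are invented for illustration and are not taken from the Yelp data, and a full CHAID run also merges categories and adjusts p-values, which is not shown here.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the attribute and the rating were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = drive-thru / no drive-thru,
# columns = rating at most 3.5 stars / rating above 3.5 stars.
counts = [[60, 40],
          [40, 60]]
print(round(chi_square(counts), 2))
```

A larger statistic means the attribute separates the star ratings more strongly, which is why an attribute such as drive-thru can end up near the top of the tree.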
Business Data Tableau Discovery
Location distribution of star ratings and price range:
(1) PA has the highest star rating and also the highest price. By contrast, AZ and IL have lower star ratings and low prices.
(2) For WI, NV, NC, and SC, star ratings are relatively low compared with their price levels.
Conclusion: The top two maps show that the relationship between star rating and price range differs from state to state.
The average number of reviews per business also differs considerably from state to state.
(1) The population of NV is 42% of that of AZ.
(2) The number of businesses in NV is 64% of that in AZ.
However, NV has the highest average number of reviews per business, 83.31, which is almost twice that of AZ. This suggests that NV has more Yelp users and/or that users in NV write reviews more frequently than users in other states. Using Yelp therefore appears to be more of a cultural norm in NV than in AZ.
            Population   # of Businesses   Avg Reviews per Business
NV          2.8 M        4,626             83
AZ          6.7 M        7,255             47
NV/AZ (%)   42%          64%               179%
The average number of reviews per business in NV (83) is nearly twice that of AZ (47) and about five times that of SC (16).
Potential reasons:
(1) NV has more Yelp users
(2) Yelp users in NV write reviews more frequently
Conclusion: Yelp is more of a cultural norm in NV
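The NV/AZ percentages can be rechecked from the figures in the table. This is a small sketch using the rounded values shown on the slide; the dictionary keys and the `pct` helper are illustrative names.

```python
# Figures from the NV/AZ comparison table (population in millions).
nv = {"population_m": 2.8, "businesses": 4626, "avg_reviews": 83}
az = {"population_m": 6.7, "businesses": 7255, "avg_reviews": 47}

def pct(a, b):
    """a as a percentage of b, rounded to the nearest whole percent."""
    return round(100 * a / b)

print(pct(nv["population_m"], az["population_m"]))
print(pct(nv["businesses"], az["businesses"]))
print(pct(nv["avg_reviews"], az["avg_reviews"]))
```

With the rounded averages the last ratio comes out near 177% rather than the 179% on the slide; the slide's figure was presumably computed from unrounded averages such as the 83.31 quoted in the notes.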
TIP DATA
Completed sentiment analysis using RStudio
Randomly chose 50,000 comments from the 500,000 available
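The sampling step can be illustrated as follows. The team worked in RStudio, so this Python sketch is only an equivalent illustration; the function name, the fixed seed, and the integer stand-ins for tip text are assumptions.

```python
import random

def sample_tips(tips, k=50_000, seed=0):
    """Return k tips drawn uniformly at random without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(tips, k)

# Stand-in for the 500,000 tip comments: integer ids instead of real text.
all_tips = list(range(500_000))
subset = sample_tips(all_tips)
print(len(subset), len(set(subset)))
```

Sampling without replacement guarantees that no comment is scored twice, and fixing the seed lets the same 10% subset be redrawn later.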
Conclusions:
  People may be worried about writing negative reviews
  People who are satisfied are more likely to spend the time giving the business a positive review
Tip Data Sentiment Analysis
Tip Data Word Clouds
Most frequent words (1-star reviews)
Tip Data Word Clouds
Most frequent words (5-star reviews)
USER DATA
Removed: All users without a user ID
Added: # of years since users started yelping
User Data Cleaning the Data
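The added variable, and the reviews-per-year frequency used later in the regression, can be derived roughly as below. This is a sketch under assumptions: the "YYYY-MM" format for the Yelping Since field, the reference date, and the sample user are all hypothetical.

```python
from datetime import date

def years_yelping(yelping_since, as_of):
    """Years between a 'YYYY-MM' start month and the as_of date."""
    year, month = (int(part) for part in yelping_since.split("-")[:2])
    months = (as_of.year - year) * 12 + (as_of.month - month)
    return months / 12

def review_frequency(review_count, years):
    """Reviews per year; None when no time has elapsed."""
    return review_count / years if years > 0 else None

# Hypothetical user record and reference date.
user = {"user_id": "u1", "review_count": 30, "yelping_since": "2010-01"}
as_of = date(2015, 1, 1)
years = years_yelping(user["yelping_since"], as_of)
print(years, review_frequency(user["review_count"], years))
```

Returning None for brand-new accounts avoids a division by zero and keeps those users out of any per-year comparison.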
User Data Regression Analysis
Conclusion
  All three independent variables are significant in this model
  The more frequently a user writes reviews, the fewer fans they have
  People care about the quality rather than the quantity of reviews
Dependent variable: fans
Independent variables: review_count, average_stars, frequency_of_review (the number of reviews per year)
All three independent variables are statistically significant predictors of the dependent variable.
frequency_of_review is negatively correlated with the number of fans, while review_count and average_stars are both positively correlated with it.
That is, holding everything else constant, the more reviews a user writes, the more fans the user has, and the higher the user's average stars, the more fans the user has. Because frequency_of_review is negatively related to fans, holding everything else constant, the more frequently a user writes reviews, the fewer fans the user has.
Users behave differently. Someone who has been a user for a shorter period (2 years) could write the same number of reviews as someone who has been a user for a longer period (5 years). The former may simply write reviews too frequently, so their reviews may be of lower quality, which might explain why they have fewer fans than a user who writes reviews less frequently.
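A multiple regression of this shape can be sketched with ordinary least squares. The code below is a minimal pure-Python illustration, not the team's model: the six users and the coefficients used to generate `fans` are made up, chosen only so that the signs mirror the ones reported (positive for review_count and average_stars, negative for reviews per year). It estimates coefficients only and does not compute significance tests.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

# Made-up users: (review_count, average_stars, reviews_per_year).
features = [(10, 3.0, 5.0), (50, 4.0, 10.0), (120, 3.5, 40.0),
            (200, 4.5, 25.0), (80, 2.5, 8.0), (30, 5.0, 15.0)]
# Fans generated from known, invented coefficients so the fit can be checked.
fans = [1 + 0.1 * rc + 2 * stars - 0.2 * freq for rc, stars, freq in features]
coefficients = ols(features, fans)
print([round(c, 3) for c in coefficients])
```

Each fitted coefficient is read as the change in fans for a one-unit change in that variable with the other two held constant, which is the "holding everything else constant" interpretation used in the notes.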
SUMMARY
Advice to Improve your Yelp Rating
Do:
  Take reservations
  Offer a quieter atmosphere
  Offer sufficient parking
  Encourage customers to write reviews
Don't:
  Have a drive-thru
  Have a noisy environment
  Be cash only
Software Used in Our Analysis