ryan-peyton
Sam’s Club Sales Review
Data Cleaning Process
In order to begin the data cleaning process, I took the top 1000 results from each table to better assess the data for errors.
select top 1000 * from storeinformation
select top 1000 * from memberindex
select top 1000 * from store_visits
When assessing the data, I first noticed incorrect negative values in the tender amount, total unit cost, and total visit amount columns. I then ran queries to find these rows and took the absolute value of each column to make all values positive.
--Find zero or negative values in tender amount
select * from store_visits where tender_amt <= 0
--Correct negative values in tender amount
update store_visits set tender_amt = ABS(tender_amt) where tender_amt < 0
--Find rows with a zero or negative total unit cost
select * from store_visits where tot_unit_cost <= 0
--Correct negative values in total unit cost
update store_visits set tot_unit_cost = ABS(tot_unit_cost) where tot_unit_cost < 0
--Find rows with a zero or negative total visit amount
select * from store_visits where total_visit_amt <= 0
--Correct negative values for total visit amount
update store_visits set total_visit_amt = ABS(total_visit_amt) where total_visit_amt < 0
--Find incorrect values in membership_nbr
select * from memberindex where (membership_nbr = 999)
--Find and correct district numbers recorded as 0
select district_nbr from storeinformation where (district_nbr = 0)
update storeinformation set district_nbr = 999 where (district_nbr = 0)
After fixing the incorrect values, I also noticed that many columns have missing values. The qualify organization, delivery type, and align sub division columns need to be updated so that the missing values show as null or 0.
--Missing data in the align sub division number column
select * from storeinformation where len(align_sub_division_nbr) = 0
--Update the store information table
update storeinformation set align_sub_division_nbr = 'X' where len(align_sub_division_nbr) = 0
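As a rough sketch of the same idea, the missing values could also be stored as NULL, which is one of the two options mentioned above. The temporary table and sample rows below are made up for illustration, and the column is assumed to be a character type; the real storeinformation table may differ.

```sql
-- Hypothetical sample data; only the column name comes from the report
create table #storeinformation (
    store_nbr int primary key,
    align_sub_division_nbr varchar(10) null
);
insert into #storeinformation (store_nbr, align_sub_division_nbr)
values (1, 'A1'), (2, ''), (3, null);

-- Count rows where the value is missing, either as NULL or as an empty string
select count(*) as [missing rows]
from #storeinformation
where align_sub_division_nbr is null
   or len(align_sub_division_nbr) = 0;

-- Store empty strings as NULL so that both kinds of missing value look the same
update #storeinformation
set align_sub_division_nbr = null
where len(align_sub_division_nbr) = 0;

select store_nbr, align_sub_division_nbr from #storeinformation;

drop table #storeinformation;
```

Checking for both NULL and empty strings matters because len() returns NULL for a NULL input, so a filter on len(...) = 0 alone would not return those rows.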
Data Quality Assessment Documented
Entity Integrity
For the first part of the data quality assessment, I chose to check the entity integrity of each table. For the member index table, I ran two queries to discover whether there were any null or duplicate key values.
Queries
--Check if the member index table has entity integrity
select * from member_index where membership_nbr is null
select membership_nbr, count(*) from member_index group by membership_nbr having count(*) > 1
Running both queries produced no results, meaning that the member index table does have entity integrity.
Next, I checked the store visits table for entity integrity. Again, I ran two queries to discover whether there were any null or duplicate key values.
Queries
--Check if the store visits table has entity integrity
select * from store_visits where visit_nbr is null
select visit_nbr, count(*) from store_visits group by visit_nbr having count(*) > 1
Running both queries produced no results, meaning that the store visits table has entity integrity.
Next, I checked the store information table for entity integrity. Again, I ran two queries to discover whether there were any null or duplicate key values.
Queries
--Check if the store information table has entity integrity
select * from store_information where store_nbr is null
select store_nbr, count(*) from store_information group by store_nbr having count(*) > 1
Running both queries produced no results, meaning that the store information table does have entity integrity.
Referential Integrity
Next, I checked the tables' relationships for referential integrity. I ran queries on the tables to make sure the values in the foreign key field match an existing value in the primary table.
Store Information and Store Visits
--Check if the store information table has referential integrity with the store visits table
select store_nbr from store_information where store_nbr not in (select store_nbr from store_visits);
Running the query produced results, meaning that the store information table does not have referential integrity with the store visits table.
select *from store_information
insert store_nbr values (9999,'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown','unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', )
To fix this referential integrity issue, I created a dummy record in the primary table and updated the unmatched foreign key values to the dummy value.
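The dummy-record fix can be sketched end to end as follows. This is only an illustration under stated assumptions: the temporary tables, the store_name column, and the sample rows are hypothetical, and the orphan check here runs from the foreign key side (visits whose store_nbr has no matching store), which is the direction the description above implies.

```sql
-- Hypothetical parent and child tables with a few sample rows
create table #store_information (store_nbr int primary key, store_name varchar(50));
create table #store_visits (visit_nbr int primary key, store_nbr int);

insert into #store_information values (1, 'Store 1'), (2, 'Store 2');
insert into #store_visits values (100, 1), (101, 2), (102, 7);

-- Orphaned foreign keys: visits whose store_nbr has no row in the parent table
select v.visit_nbr, v.store_nbr
from #store_visits v
where not exists (select 1 from #store_information s where s.store_nbr = v.store_nbr);

-- Create the dummy parent record
insert into #store_information values (9999, 'unknown');

-- Point every unmatched foreign key value at the dummy record
update #store_visits
set store_nbr = 9999
where store_nbr not in (select store_nbr from #store_information);

-- The orphan check should now return no rows
select v.visit_nbr, v.store_nbr
from #store_visits v
where not exists (select 1 from #store_information s where s.store_nbr = v.store_nbr);

drop table #store_visits;
drop table #store_information;
```

In the sample data, visit 102 refers to store 7, which does not exist, so it is the only row the first check returns and the only row the update changes.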
Member Index and Store Visits
--Check if the member index table has referential integrity with the store visits table
select membership_nbr from member_index where membership_nbr not in (select membership_nbr from store_visits);
Running the query produced results, meaning that the member index table does not have referential integrity with the store visits table.
insert into member_index values (9999, 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown')
As with the store information table, I fixed this by creating a dummy record in the primary table and updating the unmatched foreign key values to the dummy value.
Data Analysis Process
To start analyzing the data, I first wanted to see the information available in the store visits, store information and member index tables.
select * from storevisits
select * from storeinformation
select * from memberindex
Overall Assessment of Total Sales
To assess overall total sales, I first took a general approach by showing the total number of items and unique items, the total unit cost, and the total sales of all members combined. Taking a narrower approach, I then looked at total sales on each day of the week for each individual store.
--Overall Summary of Total Sales
select sum(total_visit_amt) as [total sales], sum(tot_unit_cost) as [total unit cost], sum(tot_unique_itm_cnt) as [total number of unique items], sum(tot_scan_cnt) as [total number of items purchased] from store_visits
--Summary of Total Sales listed by day of the week and store number
select sum(total_visit_amt) as [total sales], store_nbr, transaction_date as [dayweek] from storevisits group by store_nbr, transaction_date order by store_nbr, dayweek
To see on which days of the week Sam’s Club sells the most product, I took the overall total sales and used the transaction date to group the data by day.
--Summary of Total Sales each day of the week
select sum(total_visit_amt) as [total sales], transaction_date as [dayweek] from storevisits group by transaction_date
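The query above groups by calendar date. If the goal is one row per weekday name (Monday, Tuesday, and so on), one possible approach is DATENAME, sketched below. The temporary table and its rows are invented for illustration, and transaction_date is assumed to be a date column.

```sql
-- Hypothetical sample visits; column names follow the report
create table #store_visits (
    visit_nbr int primary key,
    transaction_date date,
    total_visit_amt decimal(12, 2)
);
insert into #store_visits values
    (1, '2000-01-28', 50.00),
    (2, '2000-01-29', 120.00),
    (3, '2000-01-29', 80.00),
    (4, '2000-02-05', 40.00);

-- Total sales per weekday name, highest first
select datename(weekday, transaction_date) as [dayweek],
       sum(total_visit_amt) as [total sales]
from #store_visits
group by datename(weekday, transaction_date)
order by [total sales] desc;

drop table #store_visits;
```

Grouping on the DATENAME expression collapses all dates that fall on the same weekday into one row, which the plain transaction_date grouping does not do.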
To look at the total sales of each member type, I joined the member index to the store visits and grouped the total sales by member type.
--Summary of Total sales by membership types
select distinct member_type, sum(total_visit_amt) as [total sales] from memberindex m join storevisits s on m.membership_nbr = s.membership_nbr group by member_type
I also thought it would be interesting to calculate each store's profit as its total sales minus its total unit cost.
--List of each store's profit
select store_nbr, sum(total_visit_amt) as [total sales], sum(tot_unit_cost) as [total unit cost], sum(total_visit_amt) - sum(tot_unit_cost) as [store profit] from store_visits group by store_nbr
Assessment of Member Buying Behavior
To assess member buying behavior, I first broke the data down by individual member, based on the average number of items bought and the average amount spent per visit. I also thought it would be important to include the average number of unique items bought to compare with the average of total items bought.
--Typical Purchase patterns of the members per visit
select distinct membership_nbr, avg(tot_scan_cnt) as [avgitemsbought], avg(total_visit_amt) as [avgamountspent], avg(tot_unique_itm_cnt) as [avguniqueitems] from store_visits group by membership_nbr
I then created a breakdown of member visits by day by counting the number of visits each member made on each transaction date.
--Member visits breakdown by the day of the week
select distinct membership_nbr, transaction_date, count(visit_nbr) as [number of visits] from store_visits group by transaction_date, membership_nbr order by transaction_date
To get a summary of member visits by hours during a day, I calculated the total transaction time (latest minus earliest transaction time) and grouped the result by transaction date, including the number of visitors who visited Sam’s Club stores each day. I also thought it would be interesting to order the data from greatest to least transaction time to discover any member visit patterns.
--Summary of member visits breakdown by hours during a day
select max(transaction_time) - min(transaction_time) as [total transaction time], transaction_date, count(visit_nbr) as [number of visitors] from store_visits group by transaction_date
--Summary of member visits breakdown by greatest to least transaction time
select max(transaction_time) - min(transaction_time) as [total transaction time], transaction_date, count(visit_nbr) as [number of visitors] from store_visits group by transaction_date order by max(transaction_time) desc
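For a breakdown by hour of the day rather than by date, a possible variation is to group on the hour part of the transaction time. This is a sketch only: the temporary table and rows are hypothetical, and transaction_time is assumed here to be a time column, which may not match the real table.

```sql
-- Hypothetical sample visits; transaction_time assumed to be a time column
create table #store_visits (
    visit_nbr int primary key,
    transaction_date date,
    transaction_time time
);
insert into #store_visits values
    (1, '2000-01-29', '09:15:00'),
    (2, '2000-01-29', '09:40:00'),
    (3, '2000-01-29', '14:05:00'),
    (4, '2000-01-30', '14:30:00');

-- Number of visits in each hour of the day, across all dates
select datepart(hour, transaction_time) as [hour of day],
       count(visit_nbr) as [number of visits]
from #store_visits
group by datepart(hour, transaction_time)
order by [hour of day];

drop table #store_visits;
```

Adding transaction_date to both the select list and the group by clause would give the same hourly counts split out per day.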
When looking for the characteristics of the most active members, I decided to filter the data to members who have accumulated total sales over 50000 and have visited Sam’s Club over 5000 times. I then looked at the average number of items bought by these members.
--Purchasing pattern of the customers who have visited and spent the most at Sam's Club
select distinct membership_nbr, avg(tot_scan_cnt) as [avgitemsbought], sum(total_visit_amt) as [totalamountspent], count(visit_nbr) as [number of visits] from store_visits group by membership_nbr having sum(total_visit_amt) > 50000 and count(visit_nbr) > 5000
Summary of Total Sales and Buying Behavior
Overall Assessment of Store Sales
1. --A. Summary of Total Sales
--Looking at the summary of total sales for Sam's Club, the results show total sales of around 84 million dollars at a total cost of around 75 million. Sam's Club sold 84,200 items to reach these sales numbers, with 61,000 of those items being unique. Overall, Sam's Club saw a total profit of $9,438,845.35.
--When analyzing the list of total sales by day of the week and store number, I noticed that all but 4 stores recorded their highest sales on January 29th, 2000.
--Looking at total sales for each store, we are able to see that store 18 had the highest total sales at $6,980,721.18 and store 3 had the lowest total sales at $2,961,961.83.
--C. Summary of Total Sales Breakdowns
--Total sales per day for all of Sam’s Club reached as high as $4,929,361.73, occurring on January 29th, 2000.
--When differentiating total sales by member type, it is apparent that members with member type V bought the most, with total sales of $31,187,297.64. Members with member type Z bought the least, with total sales of only $12,300.37. The order of total sales by member type from greatest to least is V, W, X, A, E, D, 3, Y, 1, Z.
--D. Useful Insights and Additional Analysis
--Store 18 has seen the most profit at $834,940.19, while store 3 has seen the least profit at $309,654.24. I also thought that it would be interesting to see which states had the highest total sales, to see where Sam’s Club is most popular. The state of Ohio had the most total sales by far, almost doubling the total sales for the state of Florida, which was second on the list.
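A state-level comparison like this can be produced by joining the visits to the store information and grouping by state. The sketch below uses made-up temporary tables and rows, and the [state] column name is an assumption, since the report does not show the store information columns.

```sql
-- Hypothetical tables; the [state] column name is assumed
create table #store_information (store_nbr int primary key, [state] varchar(2));
create table #store_visits (
    visit_nbr int primary key,
    store_nbr int,
    total_visit_amt decimal(12, 2)
);

insert into #store_information values (1, 'OH'), (2, 'OH'), (3, 'FL');
insert into #store_visits values
    (100, 1, 90.00), (101, 2, 60.00), (102, 3, 75.00), (103, 3, 10.00);

-- Total sales per state, highest first
select s.[state], sum(v.total_visit_amt) as [total sales]
from #store_visits v
join #store_information s on s.store_nbr = v.store_nbr
group by s.[state]
order by [total sales] desc;

drop table #store_visits;
drop table #store_information;
```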
--What is the most popular type of payment method?
I also looked at the number of refunds to estimate what percentage of purchases Sam’s Club can expect to be refunded. With 46,818 of the 1,007,961 member visits including purchase refunds, Sam’s Club should expect a refund rate of about 4.6%, or roughly 5%.
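The refund rate is simply the number of visits with refunds divided by the total number of visits. As a quick check of the arithmetic, using the two counts reported above:

```sql
-- 46818 visits with refunds out of 1007961 total visits, as a percentage
-- Multiplying by 100.0 first avoids integer division
select cast(100.0 * 46818 / 1007961 as decimal(5, 2)) as [refund rate pct];
```

This works out to a little over 4.6 percent, which rounds to the 5% figure.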
Assessment of Member Buying Behavior
2. --A. Summary of Typical Purchasing Patterns
--The customers who visited Sam's Club the most bought only one item per visit on average, with wide variation between customers in the average amount spent.
--Overall, across the 1,007,961 member visits, members bought an average of 8 items per visit at an average spend of $83.50. It seems that many members do not frequently purchase duplicates of items, as 6 out of the 8 items purchased are unique.
--B. Summary of Member Visits Breakdown
When looking at the member visits breakdown by day of the week, I selected the membership number and the transaction date, and used them to group the count of visits. I then ordered the data by transaction date to see the day-to-day breakdown.
When looking at the member visits breakdown, one would expect the total transaction time to correlate with the number of visitors, but this is not the case. This may suggest that other factors, such as the experience of the employees at Sam’s Club, have an effect on transaction time.
--C. Characteristics of Most Active Members
--When filtering the data to find the most active members at Sam's Club, I came up with 14 members who have visited over 5000 times and have spent over 50000 dollars.
--Looking at these members, we are able to see that they always pay cash and have either a V or W member code.
--D. Additional Analysis and Useful Insights
--It might also be important to retrieve information about each store to look at the effect that management or location might have on total sales. I also thought it would be interesting to look at the number of elite status members and their total sales. There are 1327 elite status members at Sam’s Club, who have accumulated total sales of $120,483.11.