Sam's Club Sales Review

Data Cleaning Process

In order to begin the data cleaning process, I took the top 1000 results from each table to better assess the data for errors.

select top 1000 *
from store_information;

select top 1000 *
from member_index;

select top 1000 *
from store_visits;

When assessing the data, I first noticed incorrect negative values in the tender amount, total unit cost, and total visit amount columns of the store visits table. I then ran queries that took the absolute value of each column to make all values positive.

--Find negative values in tender amount

select *
from store_visits
where tender_amt < 0;

--Correct negative values in tender amount

update store_visits
set tender_amt = ABS(tender_amt)
where tender_amt < 0;

--Find values that have a negative total unit cost

select *
from store_visits
where tot_unit_cost < 0;

--Correct negative values in total unit cost

update store_visits
set tot_unit_cost = ABS(tot_unit_cost)
where tot_unit_cost < 0;

--Find values with a negative total visit amount

select *
from store_visits
where total_visit_amt < 0;

Unknown Author, 03/04/15: store_information and member_index, no underscore
Unknown Author, 03/04/15: 0 is not negative, < 0
Unknown Author, 03/04/15: 0 is not negative; the where clause should be tender_amt < 0 on all preceding lines.
Unknown Author, 03/04/15: Can also use LIMIT 1000. Don't forget semicolons, and a space is needed on the third line after 1000.

--Correct negative values for total visit amount

update store_visits
set total_visit_amt = ABS(total_visit_amt)
where total_visit_amt < 0;
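The find-then-correct pattern above can be sketched end to end as follows. This is a minimal illustration, not the queries as run against the real data: it uses Python's built-in sqlite3 module as a stand-in for the SQL Server database, and the four sample rows are hypothetical. It also assumes the negative amounts are sign errors rather than refunds, which is worth confirming before overwriting them.

```python
import sqlite3

# In-memory SQLite database standing in for the real SQL Server instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits (visit_nbr integer primary key, tender_amt real)"
)
# Hypothetical sample rows: two sign errors, one zero, one valid amount.
conn.executemany(
    "insert into store_visits values (?, ?)",
    [(1, -25.0), (2, 0.0), (3, 40.5), (4, -3.75)],
)
conn.commit()

# Preview: strictly negative values only (zero is not negative).
negatives = conn.execute(
    "select visit_nbr, tender_amt from store_visits "
    "where tender_amt < 0 order by visit_nbr"
).fetchall()
print(negatives)

# Apply the correction inside a transaction; an exception rolls the update back.
with conn:
    cur = conn.execute(
        "update store_visits set tender_amt = abs(tender_amt) where tender_amt < 0"
    )
    if cur.rowcount != len(negatives):
        raise RuntimeError("updated row count differs from the preview")

remaining = conn.execute(
    "select count(*) from store_visits where tender_amt < 0"
).fetchone()[0]
print(remaining)
```

Comparing the number of updated rows against the preview is a cheap guard against an update touching more rows than were inspected.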

--Find incorrect values in membership_nbr

select *
from member_index
where membership_nbr = 999;

select district_nbr
from store_information
where district_nbr = 0;

update store_information
set district_nbr = 999
where district_nbr = 0;

After fixing the incorrect values, I also noticed that many fields have missing values. The qualify organization, delivery type, and align sub division fields need to be updated so that the missing values show as null or 0.

--Missing data in the align sub division number column

select *
from store_information
where len(align_sub_division_nbr) = 0;

--Update the store information table

update store_information
set align_sub_division_nbr = 'X'
where len(align_sub_division_nbr) = 0;

Unknown Author, 03/04/15: =0 within the len() function?
Unknown Author, 03/04/15: store_information table needs an underscore; check the other store_information table references.
Unknown Author, 03/04/15: Parentheses are not needed in the where clause. Be consistent and remember ; after each select statement.

Data Quality Assessment Documented

Entity Integrity

For the first part of the data quality assessment, I chose to check the entity integrity of each table. For the member index table, I ran two queries to discover whether any primary key values were null or duplicated.

Queries

--Check if the member index table has entity integrity

select *
from member_index
where membership_nbr is null;

select membership_nbr, count(*)
from member_index
group by membership_nbr
having count(*) > 1;

Running both queries produced no results, meaning that the member index table does have entity integrity.
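To show what the two entity integrity queries catch when a table does have problems, here is a small sketch. It assumes Python's built-in sqlite3 module in place of SQL Server, and the rows are hypothetical: one duplicated key and one null key are planted on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key constraint on purpose, so that bad keys can exist in the toy data.
conn.execute("create table member_index (membership_nbr integer, member_type text)")
conn.executemany(
    "insert into member_index values (?, ?)",
    [(101, "V"), (102, "W"), (102, "W"), (None, "X")],  # hypothetical rows
)

# Check 1: the key column must never be null.
null_keys = conn.execute(
    "select count(*) from member_index where membership_nbr is null"
).fetchone()[0]

# Check 2: each non-null key value must appear exactly once.
duplicate_keys = conn.execute(
    """
    select membership_nbr, count(*) as key_count
    from member_index
    where membership_nbr is not null
    group by membership_nbr
    having count(*) > 1
    """
).fetchall()

has_entity_integrity = null_keys == 0 and not duplicate_keys
print(null_keys, duplicate_keys, has_entity_integrity)
```

Both checks returning nothing, as reported for the real tables, is what makes `has_entity_integrity` true.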

Next, I chose to check the store visits table for entity integrity. Again, I ran two queries to discover whether any primary key values were null or duplicated.

Queries

--Check if the store visits table has entity integrity

select *
from store_visits
where visit_nbr is null;

select visit_nbr, count(*)
from store_visits
group by visit_nbr
having count(*) > 1;

Running both queries produced no results, meaning that the store visits table has entity integrity.

Next, I chose to check the store information table for entity integrity. Again, I ran two queries to discover whether any primary key values were null or duplicated.

Queries

--Check if the store information table has entity integrity

select *
from store_information
where store_nbr is null;

select store_nbr, count(*)
from store_information
group by store_nbr
having count(*) > 1;

Running both queries produced no results, meaning that the store information table does have entity integrity.

Referential Integrity

Next, I chose to check the tables' relationships for referential integrity. I ran queries on the tables to make sure the values in the foreign key field match an existing value in the primary table.

Store Information and Store Visits

--Check if the store information table has referential integrity with the store visits table

select store_nbr
from store_information
where store_nbr not in (select store_nbr from store_visits);

Running the query produced results, meaning that the store information table does not have referential integrity with the store visits table.

Unknown Author, 03/04/15: Checked, looks good to me. Used ; in this line; do so for all other statements. Be consistent.
Unknown Author, 03/04/15: Also could not delete a row in store information where foreign key matches in store_visits.
Unknown Author, 03/04/15: Count(*) as Counter? Give function row a name? Should be ok, just a suggestion.

select *
from store_information;

insert into store_information values (9999, 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown');

To fix this issue of referential integrity, I created a dummy record in the primary table and updated the unmatched foreign key values to the dummy value.

Member Index and Store Visits

--Check if the member index table has referential integrity with the store visits table

select membership_nbr
from member_index
where membership_nbr not in (select membership_nbr from store_visits);

Running the query produced results, meaning that the member index table does not have referential integrity with the store visits table.

insert into member_index values (9999, 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown');

To fix this issue of referential integrity, I created a dummy record in the primary table and updated the unmatched foreign key values to the dummy value.
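Referential integrity is normally tested from the child side: a foreign key value in store_visits with no matching primary key in the parent table is an orphan, whereas a store with no visits does not break any reference. The sketch below shows both directions and the dummy-record repair described above. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, single-column toy tables, and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table store_information (store_nbr integer primary key)")
conn.execute(
    "create table store_visits (visit_nbr integer primary key, store_nbr integer)"
)
# Hypothetical rows: store 3 has no visits, and store 7 is visited but not registered.
conn.executemany("insert into store_information values (?)", [(1,), (2,), (3,)])
conn.executemany(
    "insert into store_visits values (?, ?)", [(10, 1), (11, 2), (12, 7), (13, 7)]
)

ORPHAN_QUERY = """
    select distinct v.store_nbr
    from store_visits v
    left join store_information s on s.store_nbr = v.store_nbr
    where s.store_nbr is null
"""

# Child rows pointing at a missing parent: these violate referential integrity.
orphans_before = conn.execute(ORPHAN_QUERY).fetchall()

# Parent rows with no children: allowed, just stores without recorded visits.
unvisited = conn.execute(
    """
    select s.store_nbr
    from store_information s
    where not exists (select 1 from store_visits v where v.store_nbr = s.store_nbr)
    """
).fetchall()

# Repair: add a dummy parent record, then repoint the orphaned foreign keys to it.
with conn:
    conn.execute("insert into store_information values (9999)")
    conn.execute(
        "update store_visits set store_nbr = 9999 "
        "where store_nbr not in (select store_nbr from store_information)"
    )

orphans_after = conn.execute(ORPHAN_QUERY).fetchall()
print(orphans_before, unvisited, orphans_after)
```

The left join form is used for the orphan check because `not in` returns no rows at all when the subquery contains a null.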

Data Analysis Process

To start analyzing the data, I first wanted to see the information available in the store visits, store information and member index tables.

select *
from store_visits;

select *
from store_information;

select *
from member_index;

Overall Assessment of Total Sales

In order to get an assessment of overall total sales, I took a general approach by showing the total number of items and unique items, the total unit cost, and the total sales of all members combined. Taking a narrower approach, I then looked at total sales each day of the week for each individual store.

--Overall Summary of Total Sales

select sum(total_visit_amt) as [total sales],
       sum(tot_unit_cost) as [total unit cost],
       sum(tot_unique_itm_cnt) as [total number of unique items],
       sum(tot_scan_cnt) as [total number of items purchased]
from store_visits;

Unknown Author, 03/04/15: Good, [] or " " for aliases with spaces. Don't forget ; after statements.
Unknown Author, 03/04/15: Need underscore between words.
Unknown Author, 03/04/15: Wikipedia: "when a foreign key value is used it must reference a valid, existing primary key in the parent table. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity." Deleting a row in member_index where they had a visit isn't allowed. But yes, foreign keys in store_visits need to match primary keys in other tables for tables to have referential integrity.

--Summary of Total Sales listed by day of the week and store number

select sum(total_visit_amt) as [total sales], store_nbr, transaction_date as [dayweek]
from store_visits
group by store_nbr, transaction_date
order by store_nbr, dayweek;

In order to see on which days Sam's Club sells the most product, I took the overall total sales and used the transaction date to group the data by day.

--Summary of Total Sales each day of the week

select sum(total_visit_amt) as [total sales], transaction_date as [dayweek]
from store_visits
group by transaction_date;
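Grouping by transaction_date yields one row per calendar date. To total sales by weekday instead (all Saturdays together, and so on), the weekday has to be derived from the date first. The sketch below assumes Python's built-in sqlite3 module and SQLite's strftime('%w', ...) function, which returns 0 for Sunday through 6 for Saturday; on SQL Server the counterpart would be DATENAME(weekday, transaction_date). The three rows are hypothetical.

```python
import sqlite3

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, transaction_date text, total_visit_amt real)"
)
# Hypothetical rows: two Saturdays and one Sunday.
conn.executemany(
    "insert into store_visits values (?, ?, ?)",
    [(1, "2000-01-29", 100.0), (2, "2000-01-30", 50.0), (3, "2000-02-05", 25.0)],
)

rows = conn.execute(
    """
    select cast(strftime('%w', transaction_date) as integer) as weekday_nbr,
           sum(total_visit_amt) as total_sales
    from store_visits
    group by strftime('%w', transaction_date)
    order by weekday_nbr
    """
).fetchall()

# Translate the weekday number into a readable name.
sales_by_weekday = [(DAY_NAMES[nbr], total) for nbr, total in rows]
print(sales_by_weekday)
```

With this grouping the two Saturday rows collapse into a single total, which a group by on the raw date would keep separate.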

In order to look at the total sales of each member type, I took the overall total sales and grouped the results by member type.

--Summary of Total sales by membership types

select member_type, sum(total_visit_amt) as [total sales]
from member_index m
join store_visits s on m.membership_nbr = s.membership_nbr
group by member_type;

I also thought it would be interesting to calculate each store's profit as the store's total sales minus its total unit cost.

--List of each store's profit

select store_nbr,
       sum(total_visit_amt) as [total sales],
       sum(tot_unit_cost) as [total unit cost],
       sum(total_visit_amt) - sum(tot_unit_cost) as [store profit]
from store_visits
group by store_nbr;

Assessment of Member Buying Behavior

To assess member buying behavior, I first broke the data down by individual member, based on the average number of items bought and the average amount spent per visit. I also thought it would be important to include the average number of unique items bought, to compare with the average of total items bought.

--Typical Purchase patterns of the members per visit

select membership_nbr,
       avg(tot_scan_cnt) as [avgitemsbought],
       avg(total_visit_amt) as [avgamountspent],
       avg(tot_unique_itm_cnt) as [avg unique items]
from store_visits
group by membership_nbr;
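One detail worth illustrating for the per-member summary is the difference between count and avg on the unique item column: count(tot_unique_itm_cnt) returns the number of non-null rows (in effect the number of visits), while avg gives the average number of unique items per visit. The sketch below assumes Python's built-in sqlite3 module and three hypothetical visits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, membership_nbr integer, "
    "tot_scan_cnt integer, tot_unique_itm_cnt integer, total_visit_amt real)"
)
# Hypothetical visits: member 101 shops twice, member 102 once.
conn.executemany(
    "insert into store_visits values (?, ?, ?, ?, ?)",
    [(1, 101, 10, 8, 100.0), (2, 101, 6, 4, 50.0), (3, 102, 2, 2, 20.0)],
)

patterns = conn.execute(
    """
    select membership_nbr,
           avg(tot_scan_cnt)         as avg_items_bought,
           avg(tot_unique_itm_cnt)   as avg_unique_items,
           count(tot_unique_itm_cnt) as rows_counted,
           avg(total_visit_amt)      as avg_amount_spent
    from store_visits
    group by membership_nbr
    order by membership_nbr
    """
).fetchall()
print(patterns)
```

SQLite's avg always returns a floating point value; on SQL Server, avg over an integer column returns an integer, so a cast to a decimal type may be needed to keep the fractional part.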

I then created a breakdown of member visits by day of the week by counting the number of visits each member made on each transaction date.

Unknown Author, 03/04/15: Aliases meant to have spaces since they are in []?
Unknown Author, 03/04/15: Profit = Revenue - cost of unit - overhead?

--Member visits breakdown by the day of the week

select membership_nbr, transaction_date, count(visit_nbr) as [number of visits]
from store_visits
group by transaction_date, membership_nbr
order by transaction_date;

To get a summary of member visits by hours during a day, I calculated the total transaction time and grouped the result by transaction date, including the number of visitors to Sam's Club stores each day. I also thought it would be interesting to order the data from greatest to least transaction time to discover any member visit patterns.

--Summary of member visits breakdown by hours during a day

select max(transaction_time) - min(transaction_time) as [total transaction time],
       transaction_date,
       count(visit_nbr) as [number of visitors]
from store_visits
group by transaction_date;

--Summary of member visits breakdown by greatest to least transaction time

select max(transaction_time) - min(transaction_time) as [total transaction time],
       transaction_date,
       count(visit_nbr) as [number of visitors]
from store_visits
group by transaction_date
order by [total transaction time] desc;

When looking for the characteristics of the most active members, I decided to restrict the data to members who have accrued total sales over 50,000 and have visited Sam's Club over 5,000 times. I then looked at the average number of items bought for these members.

--Purchasing pattern of the customers who have visited and spent the most at Sam's Club

select membership_nbr,
       avg(tot_scan_cnt) as [avgitemsbought],
       sum(total_visit_amt) as [totalamountspent],
       count(visit_nbr) as [number of visits]
from store_visits
group by membership_nbr
having sum(total_visit_amt) > 50000 and count(visit_nbr) > 5000;
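The most-active-member filter works because having is applied after grouping, so it can test aggregates such as the total spend and the visit count. A small sketch of the same idea with the two thresholds passed as parameters follows. It assumes Python's built-in sqlite3 module; the helper name active_members, the toy rows, and the small thresholds are hypothetical, chosen so the result is easy to check by hand.

```python
import sqlite3

def active_members(conn, min_spend, min_visits):
    """Return members whose total spend and visit count both exceed the thresholds."""
    return conn.execute(
        """
        select membership_nbr,
               avg(tot_scan_cnt)    as avg_items_bought,
               sum(total_visit_amt) as total_amount_spent,
               count(visit_nbr)     as number_of_visits
        from store_visits
        group by membership_nbr
        having sum(total_visit_amt) > ? and count(visit_nbr) > ?
        order by membership_nbr
        """,
        (min_spend, min_visits),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table store_visits ("
    "visit_nbr integer primary key, membership_nbr integer, "
    "tot_scan_cnt integer, total_visit_amt real)"
)
# Hypothetical visits: 101 visits often and spends a lot, 102 spends a lot in one
# visit, 103 visits often but spends little.
conn.executemany(
    "insert into store_visits values (?, ?, ?, ?)",
    [
        (1, 101, 1, 30.0), (2, 101, 1, 40.0), (3, 101, 1, 50.0),
        (4, 102, 5, 500.0),
        (5, 103, 2, 10.0), (6, 103, 2, 10.0), (7, 103, 2, 10.0),
    ],
)

top = active_members(conn, min_spend=100, min_visits=2)
print(top)
```

Only the member who clears both thresholds is returned; a where clause could not express this, since it runs before the rows are grouped.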

Summary of Total Sales and Buying Behavior

1. Overall Assessment of Store Sales

A. Summary of Total Sales

Unknown Author, 03/04/15: If in [] do you mean to have spaces between words?

Looking at the summary of total sales for Sam's Club, the results show total sales of around 84 million dollars at a total cost of around 75 million. Sam's Club sold 84,200 items to reach these sales numbers, with 61,000 of those items being unique. Overall, Sam's Club saw a total profit of $9,438,845.35.

When analyzing the list of total sales by day of the week and store number, I noticed that all but 4 stores saw their highest sales on January 29th, 2000.

Looking at total sales for each store, we can see that store 18 had the highest total sales at $6,980,721.18 and store 3 had the lowest total sales at $2,961,961.83.

C. Summary of Total Sales Breakdowns

Total sales per day across all of Sam's Club reached as high as $4,929,361.73, on January 29th, 2000.

When differentiating total sales by member type, it is apparent that members with member type V bought the most, with total sales of $31,187,297.64. Members with member type Z bought the least, with total sales of only $12,300.37. From greatest to least, the order of total sales by member type is V, W, X, A, E, D, 3, Y, 1, Z.

D. Useful Insights and Additional Analysis

Store 18 saw the most profit at $834,940.19, while store 3 saw the least at $309,654.24. I also thought it would be interesting to see which states had the highest total sales, to see where Sam's Club is most popular. The state of Ohio had the most total sales by far, almost double the total sales of Florida, which was second on the list.

What is the most popular type of payment method?

I decided to look at the number of refunds to assess what percentage of purchases Sam's Club can expect to be refunded. With 46,818 of the 1,007,961 member visits including purchase refunds, Sam's Club should expect a product refund rate of about 5%.
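The arithmetic behind the refund estimate can be checked directly from the two counts quoted above. The variable names in this short sketch are my own.

```python
# Counts taken from the report: visits that included a refund, and all member visits.
refund_visits = 46_818
total_visits = 1_007_961

# Share of visits that included at least one refund, as a percentage.
refund_rate_pct = refund_visits / total_visits * 100
print(f"{refund_rate_pct:.2f}%")
```

The result falls a little under the 5% quoted, which is consistent with rounding up to the nearest whole percent.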

2. Assessment of Member Buying Behavior

A. Summary of Typical Purchasing Patterns

Customers who visited Sam's Club the most bought on average only one item per visit, with a wide variation per customer in the average amount spent.

Overall, across the 1,007,961 Sam's Club member visits, members bought on average 8 items per visit at an average spend of $83.50. It seems that many members do not frequently purchase duplicates of items, as 6 out of the 8 items purchased are unique.

B. Summary of Member Visits Breakdown


When looking at the member visits breakdown by day of the week, I selected the membership number and the transaction date, which I used to group the count of visits. I then ordered the data by transaction date to see the day-to-day breakdown.

When looking at the member visits breakdown, one would expect the amount of transaction time to correlate with the number of visitors, but this is not the case. This may suggest that other factors, such as the experience of the employees at Sam's Club, have an effect on transaction time.

C. Characteristics of Most Active Members

When restricting the data to find the most active members at Sam's Club, I came up with 14 members who have visited over 5,000 times and have spent over 50,000 dollars.

Looking at these members, we can see that they always pay cash and have either a V or W member code.

D. Additional Analysis and Useful Insights

It might also be important to retrieve information about each store, to look at the effect that management or location might have on total sales. I also thought it would be interesting to look at the number of elite status members and their total sales. There are 1,327 members at Sam's Club who have accumulated sales of 120,483.11.