sankar-ram
Store * Service satisfaction * Contact with employee Crosstabulation

Contact with employee: No
                             Service satisfaction
                             Strongly   Somewhat              Somewhat   Strongly
Store                        Negative   Negative   Neutral    Positive   Positive
Store 1   Count              16         9          18         17         19
          % within Store     20.3%      11.4%      22.8%      21.5%      24.1%
Store 2   Count              2          15         16         13         12
          % within Store     3.4%       25.9%      27.6%      22.4%      20.7%
Store 3   Count              9          14         23         22         14
          % within Store     11.0%      17.1%      28.0%      26.8%      17.1%
Store 4   Count              17         14         19         10         10
          % within Store     24.3%      20.0%      27.1%      14.3%      14.3%
Total     Count              44         52         76         62         55
          % within Store     15.2%      18.0%      26.3%      21.5%      19.0%

Contact with employee: Yes
                             Service satisfaction
                             Strongly   Somewhat              Somewhat   Strongly
Store                        Negative   Negative   Neutral    Positive   Positive
Store 1   Count              9          11         20         13         14
          % within Store     13.4%      16.4%      29.9%      19.4%      20.9%
Store 2   Count              24         15         18         14         7
          % within Store     30.8%      19.2%      23.1%      17.9%      9.0%
Store 3   Count              6          6          18         11         15
          % within Store     10.7%      10.7%      32.1%      19.6%      26.8%
Store 4   Count              10         21         25         12         24
          % within Store     10.9%      22.8%      27.2%      13.0%      26.1%
Total     Count              49         53         81         50         60
          % within Store     16.7%      18.1%      27.6%      17.1%      20.5%
How many people are buying across departments?

Case Processing Summary
                                          Cases
                                          Valid            Missing          Total
                                          N     Percent    N     Percent    N     Percent
Primary department * Made purchase        582   100.0%     0     .0%        582   100.0%
To find the relation by reading the table:
Add Strongly Negative and Somewhat Negative together; do the same for the two positive categories.
With contact (Yes):
Negative: Store 2 is highest at 50%
Positive: Store 3 is highest at 46%
To find the relation before and after contact (No and Yes):
                      Before contact (No)    After contact (Yes)
Negative, Store 2     29%                    50%
Positive, Store 3     44%                    46%
For Store 2, negative satisfaction is higher after contact, so contact should not be increased there.
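The combined shares above can be checked with a short sketch. It uses only the Store 2 and Store 3 counts from the crosstab; the function name `combined_shares` is made up for illustration, and the count of 7 for Store 2 (Yes, Strongly Positive) is the value implied by the 9.0% row share and the column total of 60.

```python
# Counts copied from the crosstab, in column order:
# [strongly negative, somewhat negative, neutral, somewhat positive, strongly positive]
counts = {
    ("Store 2", "No"): [2, 15, 16, 13, 12],
    ("Store 2", "Yes"): [24, 15, 18, 14, 7],  # 7 is implied by the 9.0% share
    ("Store 3", "No"): [9, 14, 23, 22, 14],
    ("Store 3", "Yes"): [6, 6, 18, 11, 15],
}


def combined_shares(row):
    """Return (negative %, positive %) for one row of five counts."""
    total = sum(row)
    negative = 100 * (row[0] + row[1]) / total
    positive = 100 * (row[3] + row[4]) / total
    return round(negative, 1), round(positive, 1)


for key, row in counts.items():
    print(key, combined_shares(row))
```

For Store 2 the negative share comes out at 29.3% without contact and 50.0% with contact; for Store 3 the positive share is 43.9% and 46.4%.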
How to isolate the 39 people from Store 2:
1) People who went to Store 2
2) Who had contact with employees
3) Who were negatively satisfied
4) In Data View, all rows are crossed out except those 39 rows
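The same three conditions can be written as a filter. This is only a sketch in Python, not the SPSS Select Cases dialog: the record layout and the helper `is_selected` are assumptions, and the records are rebuilt from the Store 2 counts in the crosstab.

```python
LEVELS = ["Strongly Negative", "Somewhat Negative", "Neutral",
          "Somewhat Positive", "Strongly Positive"]

# Store 2 counts from the crosstab, expanded into one record per customer.
store2_counts = {"No": [2, 15, 16, 13, 12], "Yes": [24, 15, 18, 14, 7]}
cases = [
    {"store": 2, "contact": contact, "satisfaction": level}
    for contact, row in store2_counts.items()
    for level, n in zip(LEVELS, row)
    for _ in range(n)
]


def is_selected(case):
    """All three conditions must hold for a case to stay selected."""
    return (
        case["store"] == 2
        and case["contact"] == "Yes"
        and case["satisfaction"] in ("Strongly Negative", "Somewhat Negative")
    )


selected = [case for case in cases if is_selected(case)]
print(len(selected))  # 39
```

Every case that fails the filter corresponds to a crossed-out row in Data View.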
Conclusion: train the employees in the clothing department, because 59% are not satisfied.
How many males and females went to each department?
This is data mining.
Method of payment vs. service satisfaction
As .595 > .05, we fail to reject the null hypothesis that there is no relation between method of payment and service satisfaction.
We go for the 99% confidence level (significance level .01) when we need to decide whether or not to close the store.
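A minimal sketch of the decision rule, plus the Pearson chi-square statistic computed by hand. The function names and the 2x2 example table are hypothetical; the significance value .595 is the one quoted in the notes, and in practice the statistics package reports it directly.

```python
def decide(p_value, alpha=0.05):
    """Compare the significance value with the chosen alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


print(decide(0.595, alpha=0.05))  # fail to reject H0
print(decide(0.595, alpha=0.01))  # fail to reject H0

# Hypothetical 2x2 table, only to show the arithmetic.
print(chi_square_statistic([[10, 20], [20, 10]]))
```

Moving from alpha = .05 to alpha = .01 makes the test stricter: a smaller significance value is needed before the null hypothesis is rejected.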
Cluster analysis: the way we group the data
Hierarchical clustering and k-means clustering methods
Hierarchical: divisive clustering, agglomerative clustering
Clustering process:
1) Selection of variables: ask what the purpose of making the groups is
2) Distance between objects: physical distance, number of people between them, etc.; try to be as innovative as possible
3) Clustering criteria: from where the distance is measured between a cluster and an object, and between clusters
Hierarchical is used when the number of objects you want to group is less than 50; for more than 50, use k-means
Similarity: a value near 1 means less distance, more proximity
Dissimilarity: a value near 0 means less distance, more proximity; a higher distance means a weaker relation
Euclidean distance properties:
1) It is a straight-line distance
2) It is symmetric: the distance from A to B is the same as from B to A
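Both properties can be seen in a tiny example. The two points are hypothetical; `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
import math

# Two hypothetical objects measured on two scale variables.
a = (1.0, 2.0)
b = (4.0, 6.0)

d_ab = math.dist(a, b)  # straight-line distance: sqrt(3**2 + 4**2)
d_ba = math.dist(b, a)  # same distance measured in the other direction

print(d_ab, d_ba)  # 5.0 5.0
```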
The dendrogram: the critical element. It is a graphical depiction of how clusters form; it tells how the combinations happened and how many clusters we should have.
Distance measurement
Interval variables are scale variables: we use Euclidean distance
Count: we use chi-square or phi-square
Binary: Jaccard or simple matching
Nearest neighbour: number of pairwise distances among n objects = n(n-1)/2
Farthest neighbour: find the distance between the neighbours that are farthest apart, combine at the least such distance, and then..
Centroid clustering:
We use between-groups linkage as the clustering criterion
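The clustering criteria differ only in which distance between two clusters they report. A rough sketch with four hypothetical points (the names `points`, `linkage_distances` and `centroid` are made up for illustration):

```python
import math
from itertools import combinations

# Hypothetical objects on two scale variables.
points = {"A": (0, 0), "B": (0, 1), "C": (4, 0), "D": (4, 3)}

# Number of pairwise distances among n objects is n(n-1)/2.
pairs = list(combinations(points, 2))
print(len(pairs))  # 6 for n = 4


def centroid(cluster):
    """Mean point of a cluster."""
    coords = [points[name] for name in cluster]
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def linkage_distances(cluster_1, cluster_2):
    """Distance between two clusters under four different criteria."""
    ds = [math.dist(points[p], points[q]) for p in cluster_1 for q in cluster_2]
    return {
        "nearest": min(ds),              # nearest neighbour (single linkage)
        "farthest": max(ds),             # farthest neighbour (complete linkage)
        "average": sum(ds) / len(ds),    # between-groups (average) linkage
        "centroid": math.dist(centroid(cluster_1), centroid(cluster_2)),
    }


result = linkage_distances(["A", "B"], ["C", "D"])
print(result)
```

At each stage the agglomerative procedure merges the two clusters whose chosen distance is smallest.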
File – cell inter
How consumers feel about service and cell phones
Q) Which features are consumers using in cell phones?
Proximities
We cluster variables and not cases: with cases we would check all the data, and it is very difficult to infer anything
To check the number of clusters:
Check the number of lines cutting the vertical line
In the above case, 5 lines
But only 1 line is coming from a group, therefore 1 cluster
Cut-off point: the point at which the next object to join the cluster does so at a longer distance
In the above example, 2 and 8 joined at a small distance and 4 joined at a higher distance; therefore the cut-off point is above, and the 4 objects that cluster together are 0, 5, 1 and 7 (SMS, games, alarm, time and date)
Next, crosstab between camera and scheduler
Jaccard = 35/(206 - 62) = 0.24
Jaccard = yes matches / (total - no matches)
For a fixed total, the Jaccard index increases if the yes matches or the no matches increase.
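The arithmetic for the camera and scheduler crosstab, as a small sketch. The counts 35, 62 and 206 are from the notes; the function names are made up, and simple matching is included only for comparison since it is the other binary measure mentioned earlier.

```python
def jaccard(yes_matches, no_matches, total):
    """Yes matches divided by all cases except the joint 'no' cases."""
    return yes_matches / (total - no_matches)


def simple_matching(yes_matches, no_matches, total):
    """Both kinds of match count as agreement."""
    return (yes_matches + no_matches) / total


print(round(jaccard(35, 62, 206), 2))          # 0.24
print(round(simple_matching(35, 62, 206), 2))  # 0.47

# With the total fixed, one more yes match or one more no match raises the index.
print(jaccard(36, 62, 206) > jaccard(35, 62, 206))  # True
print(jaccard(35, 63, 206) > jaccard(35, 62, 206))  # True
```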
The cluster above can help in designing bundle offers
E.g. buy a game and get 50 SMS
Or buy games, SMS, alarm, time and date and get internet free: give the buyers something that they are not using
Agglomeration schedule (between groups: the average is taken)
Stage 1: 1 and 6 combine and the distance is 862
Stage 2: SMS to alarm 835
Alarm to game 807, therefore average 821
(distance average of 1 to 6 and 1 to 2)
Stage 3 (distance average of 1 to 6, 1 to 2, and 1 to 8)
Cut-off point: the jump in coefficient between stage 3 and stage 4 is high, therefore we will have the cut-off point here, and we will have 4 objects: 1, 6, 2, 8
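A sketch of the two calculations in the schedule. The stage 2 average uses the figures from the notes (835 and 807). The coefficient list passed to `cut_off_stage` is hypothetical apart from its first two values, because the notes do not give the later coefficients; both function names are made up.

```python
def between_groups_average(values):
    """Between-groups linkage: average the pairwise values."""
    return sum(values) / len(values)


def cut_off_stage(coefficients):
    """Return the 1-based stage after which the coefficient changes the most."""
    jumps = [abs(b - a) for a, b in zip(coefficients, coefficients[1:])]
    return jumps.index(max(jumps)) + 1


print(between_groups_average([835, 807]))  # 821.0

# 862 and 821 are from the notes; 790, 520 and 480 are invented for illustration.
hypothetical_coefficients = [862, 821, 790, 520, 480]
print(cut_off_stage(hypothetical_coefficients))  # 3
```

A result of 3 means the biggest change lies between stage 3 and stage 4, so the objects merged up to stage 3 form the cluster.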
Russell and Rao: only yes matches count
Yes matches / total
Output is the same: 1, 3, 5, 7
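For comparison, the standard Russell and Rao coefficient divides the yes matches by all cases, so joint "no" answers stay in the denominator. A sketch that reuses the camera and scheduler counts quoted earlier in the notes (35 yes matches out of 206); the function name is made up.

```python
def russell_rao(yes_matches, total):
    """Yes matches as a share of all cases."""
    return yes_matches / total


print(round(russell_rao(35, 206), 2))  # 0.17
```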
K-means
3-cluster solution
Monthly expenditure of cluster 1 is 734.69, i.e. the average for the 13 people in cluster 1 is 734.69
4-cluster solution
Not evenly distributed
5-cluster solution
Number of cases not equally distributed
Now cluster 1 has only 1 case, with monthly expenditure 2000, so monthly expenditure has an outlier; now examine monthly expenditure
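Why a single extreme value ends up alone in a cluster can be seen in a bare-bones k-means on one variable. This is a sketch, not the SPSS procedure: the data and starting centres are hypothetical, and `k_means_1d` is a made-up helper.

```python
def k_means_1d(values, centroids, max_iter=100):
    """Plain k-means on a single variable, starting from the given centres."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centre.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centre moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly expenditures with one very high value.
monthly_exp = [300, 350, 400, 700, 750, 800, 2000]
centers, groups = k_means_1d(monthly_exp, centroids=[300, 700, 2000])
print(centers)                    # [350.0, 750.0, 2000.0]
print([len(g) for g in groups])   # [3, 3, 1]
```

The value 2000 keeps a cluster to itself, which is the uneven distribution of cases seen in the 4- and 5-cluster solutions.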
The black line is the median: 50% of cases lie above it and 50% below it
The green box holds 25% of cases above the line and 25% below it; about 25% lie along each T-line (whisker); the rest are outliers. Asterisks mark extreme cases and circles mark outliers
Outlier: more than 1.5 box lengths from the edge of the box; extreme case: more than 3 box lengths from the edge of the box
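The boxplot rule can be sketched as follows, taking the box length as the interquartile range. The data are hypothetical, `classify` is a made-up helper, and `statistics.quantiles` may place the quartiles slightly differently from the hinges SPSS uses.

```python
import statistics

# Hypothetical monthly expenditures.
monthly_exp = [400, 500, 550, 600, 650, 700, 750, 800, 850, 1500, 2000]

q1, median, q3 = statistics.quantiles(monthly_exp, n=4)
box_length = q3 - q1  # interquartile range


def classify(value):
    """Label a value by how far it lies beyond the edge of the box."""
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * box_length:
        return "extreme"   # drawn as an asterisk
    if distance > 1.5 * box_length:
        return "outlier"   # drawn as a circle
    return "normal"


print(q1, median, q3)   # 550.0 700.0 850.0
print(classify(1500))   # outlier
print(classify(2000))   # extreme
```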
Data > Select Cases
Now monthly expenditure is considerably better distributed compared with the previous cluster analysis
Save the k-means cluster membership
We get another variable
Qcl_1 = 1 selects all the people in cluster 1
Now all the frequencies in one
Compare groups is best, as we come to know the differences between groups
We use qcl_1
Now we do frequencies, all 3 together
We do frequencies for category variables
Scale variables are continuous; the rest are mostly categorical
Age comparison
Group 3 is the oldest, with 88.9% under 18; group 2 is the youngest, with 96.3% under 18
Gender of respondent
Group 1: more females
Group 3: more males
Level of education:
Group 3 is more educated, group 1 less educated
Name of current service provider
Look at the differences across all the groups
1: less Reliance
2: more BSNL, more Reliance
3: more Tata Indicom
Connection type:
1: more prepaid
2: more postpaid
(In the above examples, look at the differences between groups)
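Comparing groups amounts to a frequency table of each category variable within each value of qcl_1. A rough sketch with invented respondents (the data and the helper `frequency_by_group` are hypothetical):

```python
from collections import Counter, defaultdict

# Hypothetical respondents: (qcl_1 cluster number, connection type).
respondents = [
    (1, "prepaid"), (1, "prepaid"), (1, "postpaid"),
    (2, "postpaid"), (2, "postpaid"), (2, "prepaid"),
    (3, "prepaid"), (3, "postpaid"),
]


def frequency_by_group(rows):
    """Percentage of each category within each cluster."""
    counts = defaultdict(Counter)
    for cluster, category in rows:
        counts[cluster][category] += 1
    return {
        cluster: {
            category: round(100 * n / sum(counter.values()), 1)
            for category, n in counter.items()
        }
        for cluster, counter in counts.items()
    }


profile = frequency_by_group(respondents)
for cluster, shares in sorted(profile.items()):
    print(cluster, shares)
```

Reading the percentages side by side is what shows, for example, that one group is mostly prepaid and another mostly postpaid.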
So now the profile of each group
Summarize 1, 2 and 3
1: more females, less education, less Reliance, high prepaid
2: relatively younger, less BSNL, more Reliance, more postpaid, less frequent, less cash and more credit card
3: relatively older, more males, more education
So what strategy should the company adopt to target groups 1, 2 and 3?
Group 1: come up with cheaper phones, as they are less educated; females watch more TV, so advertise more on TV and less on radio, and advertise during drama serials
TRP of 8% means that, out of all viewers, 8% of people are watching that serial
Based on services
All groups are different
Cluster 3 uses only SMS: people are not tech savvy, relatively older
Cluster 2: tech savvy
Now save this
Now split the file to build the profile
permap