Telecom Dataset Analysis
AbdulMajedRaja
Executive Summary:
This report summarises the statistical findings from an analysis of a telecom
client’s customer data. The primary objective is to surface insights that can help
the company make sound business decisions, grow, and solve existing problems.
The report addresses three questions: which factors drive the number of customer
service calls, whether there is an upselling opportunity, and, most importantly,
which factors may drive customer churn.
The analysis concludes that the company needs to grow its customer base in
California, which is also the state with the highest churn rate, so action is needed
there to control churn. The report also finds that most customers hold both the
International plan and the Voice Mail plan, which gives the company a bundling
and upselling opportunity. The number of customer service calls rises with the
total number of other calls a customer makes, so a spike in total calls means the
company will need more customer service capacity.
Most importantly, the report finds that none of the tested factors (customer
service calls, Voice Mail plan, International plan) shows a statistically significant
relationship with churn. Some other factor, not present in the dataset, is likely
responsible, and the company should investigate it.
Table of Contents
1. Introduction
1.1 Business Problem Formulation
2. Data Analysis
2.1 Methodology
2.2 Data Cleaning
2.3 State-wise Customer Distribution
2.4 State-wise Churn Rate
2.5 Drivers influencing Customer Service Calls
2.6 Optional Offerings Associations
2.7 Factors driving Churn
3. Conclusion
List of Tables:
Table 1 - Variables Missing Value Count
Table 2 - Dataset Summary
Table 3 - States with most & least customers
Table 4 - Crosstab between VMail Plan & International Plan
List of Figures:
Figure 1 - State-wise Customer Distribution
Figure 2 - State-wise Churn Rate
Figure 3 - Customer Service Calls vs Total Day Calls
Figure 4 - Customer Service Calls vs Total Night Calls
Figure 5 - Association between International Plan and Voice Mail Plan
1. Introduction:
A telecom operator has provided us with its customer data to analyse for insights
that are meaningful in a business context and can help the company improve its
processes and its services to customers. This report summarises the statistical
findings from the analysis of the telecom operator’s dataset.
The dataset contains information about 3000 customers of the company. Each
observation has 18 variables, such as State, the total calls made during the day,
evening and night, and whether the customer has stayed with the company or
churned.
1.1 Business Problem Formulation:
- In which states does the company have the most and the fewest customers?
- In which states has the company seen the most and the least churn?
- Is there any factor that drives the number of customer service calls?
- Is there any upselling option?
- Is there any factor that drives customers to churn out of the company?
2. Data Analysis:
2.1 Methodology:
Before starting the analysis, it is important to check the health of the given
dataset. The dataset, a CSV file, looks clean at first glance but contains a number
of missing (NA) values. The first step is therefore to clean the data and prepare it
for analysis. Basic summary statistics are then computed, after which our
hypotheses are tested and conclusions drawn. All data analysis in this report was
performed with the statistical tool R.
2.2 Data Cleaning:
Once the dataset is read into the R environment as is, a basic summary reveals
the missing values in each variable.
Variable                        NA count
State                           0
VMail.Plan                      0
International.Plan              0
Total.Day.Minutes               271
Total.Day.Calls                 271
Total.Day.Charge                271
Total.Evening.Minutes           264
Total.Evening.Calls             264
Total.Evening.Charge            264
Total.Night.Minutes             264
Total.Night.Calls               264
Total.Night.Charge              264
Total.International.Minutes     263
Total.International.Calls       263
Total.International.Charge      263
Customer.Service.Calls          298
Number.VMail.Messages           261
Churn                           0
Table 1 - Variables Missing Value Count
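A per-variable NA count like the one in Table 1 can be produced in a single line of base R. The sketch below is illustrative only: the small `teleco` data frame is made-up stand-in data, not the client's file, and only the column names are taken from the report.

```r
# Toy stand-in for the report's `teleco` data frame (values are invented).
teleco <- data.frame(
  State                  = c("KS", "CA", "WV", "CA"),
  Total.Day.Calls        = c(110, NA, 95, 102),
  Customer.Service.Calls = c(1, NA, NA, 3),
  Churn                  = c("No", "Yes", "No", "No")
)

# is.na() marks every missing cell; colSums() adds them up per column.
na_count <- colSums(is.na(teleco))
print(na_count)
```

`summary(teleco)` also reports NA counts, but only for numeric columns, so `colSums(is.na(...))` is the more uniform check.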
The variables State, VMail.Plan, International.Plan and Churn contain no missing
values. Every other variable has between 261 and 298 missing values, which is
roughly 9-10% of the dataset. Instead of imputing replacement values, the
observations with missing values are omitted from the rest of the analysis, and a
new dataset is created without them.
Name                      Observations    Variables
Given Dataset (teleco)    3000            18
New Dataset (clean)       2668            18
Table 2 - Dataset Summary
Note that any analysis involving only the variables without missing values can be
performed on the original dataset rather than the newly created one. The new
dataset is now ready for further exploratory data analysis.
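The row-wise omission described above corresponds to `na.omit()` in base R. The following is a minimal sketch on invented data; the report does not show the exact call it used, so treat this as one plausible way to get from `teleco` to `clean`.

```r
# Toy stand-in for the original dataset (values are invented).
teleco <- data.frame(
  State                  = c("KS", "CA", "WV", "CA"),
  Total.Day.Calls        = c(110, NA, 95, 102),
  Customer.Service.Calls = c(1, NA, NA, 3),
  Churn                  = c("No", "Yes", "No", "No")
)

# Drop every row that has at least one NA in any column.
clean <- na.omit(teleco)

# Compare dimensions before and after, as in Table 2.
print(dim(teleco))
print(dim(clean))
```

In the report's data this step removes 3000 - 2668 = 332 observations, about 11% of the rows. That is more than any single variable's NA count because the missing values do not all fall in the same rows.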
2.3 State-wise Customer Distribution:
One of the most important insights for any business is how its customers are
distributed across the regions where it operates. A bar plot of the number of
observations per state gives this State-wise Customer Distribution chart.
Figure 1- State-wise Customer Distribution
From Fig. 1, we can infer the following:
Customer Distribution
Top 3 States      Bottom 3 States
West Virginia     California
Minnesota         Georgia
Wyoming           Iowa
Table 3 - States with most & least customers
The company’s Sales and Marketing team should focus on the states in the second
column of Table 3 (the bottom 3 states) to grow the customer base in those regions.
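A ranking such as Table 3 and the accompanying bar plot can be derived from a frequency table of the State column. The sketch below uses a short invented vector of state codes; the real analysis would use `teleco$State`, which has no missing values.

```r
# Invented state codes standing in for teleco$State.
state <- c("WV", "WV", "WV", "MN", "MN", "CA")

# Count customers per state and order from most to fewest.
counts <- sort(table(state), decreasing = TRUE)

top_state    <- names(counts)[1]               # state with the most customers
bottom_state <- names(counts)[length(counts)]  # state with the fewest customers
print(counts)

# Bar plot of customers per state; las = 2 rotates the axis labels.
barplot(counts, las = 2, ylab = "Customers",
        main = "State-wise Customer Distribution")
```

Using `head(counts, 3)` and `tail(counts, 3)` on the sorted table would give the top and bottom three states directly.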
2.4 State-wise Churn Rate:
Customer churning is a real problem for any company. It’s very important for the
company to track the regions where the most and least churn happens.
Figure 2 - State-wise Churn Rate
Fig. 2 shows that California has the highest churn rate, with more than 20% of its
customers churning out, followed by Missouri. Kansas has almost no churn, and
Connecticut has the next-best retention rate after Kansas.
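A state-wise churn rate is simply the share of churned customers within each state. The sketch below shows one way to compute it. It assumes Churn is coded as "Yes"/"No", which the report does not confirm, and the data are invented.

```r
# Invented data; the "Yes"/"No" coding of Churn is an assumption.
teleco <- data.frame(
  State = c("CA", "CA", "CA", "CA", "KS", "KS", "KS", "MO", "MO", "MO"),
  Churn = c("Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No", "No")
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the fraction of churned customers per state.
churn_rate <- tapply(teleco$Churn == "Yes", teleco$State, mean)

# Highest churn first.
print(sort(churn_rate, decreasing = TRUE))
```

Because State and Churn have no missing values, this calculation can run on the original dataset rather than the reduced one.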
2.5 Drivers influencing Customer Service Calls:
Whatever the type of company, customers always expect good service, and for a
telecom company a strong customer service team is especially important. To scale
the customer service team in line with demand, the company has to understand
what drives the calls customers make to customer service. We therefore build a
scatter plot of the number of day calls against the number of customer service
calls.
Figure 3 - Customer Service Calls vs Total Day Calls
Fig 3 suggests that the number of customer service calls a customer makes has a
positive relationship with the total number of day calls the same customer makes.
A similar relationship can be seen between the number of customer service calls
and the number of night calls.
Figure 4 - Customer Service Calls vs Total Night Calls
Taken together, Fig. 3 and Fig. 4 suggest that the total number of calls a
customer makes, regardless of the time of day (day, evening or night), is positively
associated with the number of customer service calls that customer makes.
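The report draws this inference visually from the scatter plots. As a complementary check that the report itself does not include, the strength of such a relationship could be quantified with a correlation test. The sketch below uses simulated counts, so the printed numbers say nothing about the client's data.

```r
# Simulated counts for illustration only; not the client's data.
set.seed(1)
n <- 200
day_calls     <- rpois(n, lambda = 100)
service_calls <- rpois(n, lambda = day_calls / 50)

# Pearson correlation with a test of H0: true correlation is zero.
ct <- cor.test(day_calls, service_calls)

print(ct$estimate)  # sample correlation coefficient
print(ct$p.value)   # p-value of the test
```

On the real data the equivalent call would use `clean$Total.Day.Calls` and `clean$Customer.Service.Calls`. A small p-value would support the association, although it would not by itself show that one quantity drives the other.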
2.6 Optional Offerings Associations:
According to the dataset, the company has two optional offerings: 1. Voice Mail
plan and 2. International plan. Here we explore whether there is any association
between these two variables, so that the company can plan to bundle or upsell
them.
To measure the association between the two variables, the following association
rule was built:
{International.Plan=Yes} => {VMail.Plan=Yes}
Support: 0.8053333 Confidence: 0.8981413
The support is approximately 0.80, which means that about 80% of all customers
hold both plans together. The confidence of approximately 0.90 means that about
90% of the customers with an International plan also have a Voice Mail plan. To
illustrate this, a mosaic plot of the two variables is drawn.
Figure 5 - Association between International Plan and Voice Mail Plan
A cross tabulation between those two variables can explain the customer
preference in terms of percentage.
                      Voice Mail Plan
International Plan    No      Yes
No                    0.01    0.09
Yes                   0.09    0.81
Table 4 - Crosstab between VMail Plan & International Plan
Almost 81% of the customers have taken both plans, about 18% hold just one of
the two (roughly 9% for each plan), and only about 1% have neither.
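The support and confidence of the rule {International.Plan=Yes} => {VMail.Plan=Yes} can be recovered by hand from the proportions in Table 4. The sketch below uses the rounded table values, and the row/column orientation is an assumption (the table is symmetric, so it does not affect the result). Working from rounded inputs gives 0.81 and 0.90, close to the reported 0.805 and 0.898.

```r
# Joint proportions from Table 4 (rounded values; orientation assumed:
# rows = International Plan, columns = Voice Mail Plan).
tab <- matrix(c(0.01, 0.09,
                0.09, 0.81),
              nrow = 2, byrow = TRUE,
              dimnames = list(International.Plan = c("No", "Yes"),
                              VMail.Plan         = c("No", "Yes")))

# Support: share of all customers holding both plans.
support <- tab["Yes", "Yes"]

# Confidence: among International-plan holders, the share with Voice Mail.
confidence <- support / sum(tab["Yes", ])

# Share of customers holding exactly one of the two plans.
one_plan <- tab["No", "Yes"] + tab["Yes", "No"]

print(c(support = support, confidence = confidence, one_plan = one_plan))
```

On the raw data, the same proportions table could be obtained with `prop.table(table(teleco$International.Plan, teleco$VMail.Plan))`.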
2.7 Factors driving Churn:
The most worrying situation for any company is when its customers leave. For a
telecom company in particular, customer churn can happen for many reasons.
Here we explore some of the factors that may drive it. For each factor we state a
Null hypothesis about churn and then either reject it statistically or conclude that
we failed to reject it.
A common assumption among experts is that customers who churn make
significantly more calls to customer service than customers who stay.
Hypothesis 1:
Null Hypothesis: There is no significant difference between the number of
customer service calls made by customers who churn out and customers who stay.
Alternative Hypothesis: There is a significant difference between the number of
customer service calls made by customers who churn out and customers who stay.
To test this hypothesis, we run a one-way (completely randomised) ANOVA of the
number of customer service calls against customer churn.
fit <- aov(Customer.Service.Calls ~ Churn, data = clean)
The summary of the model gives an F-value of 0.032 and a p-value of 0.857. At the
5% significance level (95% confidence), the p-value is not less than 0.05, so we fail
to reject the Null Hypothesis. In other words, the data show no significant
difference between the number of customer service calls made by customers who
churned and by customers who stayed.
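The F-value and p-value quoted above come from the ANOVA summary table. The sketch below shows how they can be read out programmatically. It runs on simulated data in which churn and service calls are unrelated by construction, so its numbers will differ from the report's 0.032 and 0.857.

```r
# Simulated stand-in for the cleaned dataset; values are invented.
set.seed(42)
clean <- data.frame(
  Churn = factor(rep(c("No", "Yes"), times = c(80, 20))),
  Customer.Service.Calls = rpois(100, lambda = 1.5)
)

# One-way ANOVA: does the mean number of service calls differ by Churn?
fit <- aov(Customer.Service.Calls ~ Churn, data = clean)

# summary() returns a list whose first element is the ANOVA table;
# the first row of that table belongs to the Churn term.
res     <- summary(fit)[[1]]
f_value <- res[["F value"]][1]
p_value <- res[["Pr(>F)"]][1]

# Decision at the 5% significance level.
reject_null <- p_value < 0.05
print(c(F = f_value, p = p_value))
```

With only two groups, a one-way ANOVA is equivalent to a pooled two-sample t-test, so `t.test(..., var.equal = TRUE)` would give the same p-value.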
Hypothesis 2:
Null Hypothesis: The factor variable Churn is independent of the factor Voice Mail
Plan
Alternative Hypothesis: The factor variable Churn is dependent on the factor Voice
Mail Plan
As we have to test the independence of two factor variables, we perform a
Chi-Square test of independence.
chisq.test(table(teleco$Churn,teleco$VMail.Plan))
The Chi-Square test gives a p-value of 0.9368. At the 5% significance level (95%
confidence), this p-value is not less than 0.05, so we fail to reject the Null
Hypothesis: the data provide no evidence that Churn depends on the Voice Mail
Plan.
Hypothesis 3:
Null Hypothesis: The factor variable Churn is independent of the factor
International Plan
Alternative Hypothesis: The factor variable Churn is dependent on the factor
International Plan
As we again have to test the independence of two factor variables, we perform a
Chi-Square test of independence.
chisq.test(table(teleco$Churn,teleco$International.Plan))
The Chi-Square test gives a p-value of 0.9807. At the 5% significance level (95%
confidence), this p-value is not less than 0.05, so we fail to reject the Null
Hypothesis: the data provide no evidence that Churn depends on the International
Plan.
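The two Chi-Square tests above follow the same pattern: build a contingency table of the two factors, run `chisq.test()`, and compare the p-value with 0.05. The sketch below demonstrates the pattern on simulated data in which the two factors are drawn independently, so the p-value it prints is unrelated to the reported 0.9368 and 0.9807.

```r
# Simulated, independently drawn factors; values are invented.
set.seed(7)
teleco <- data.frame(
  Churn      = sample(c("No", "Yes"), 300, replace = TRUE, prob = c(0.85, 0.15)),
  VMail.Plan = sample(c("No", "Yes"), 300, replace = TRUE, prob = c(0.30, 0.70))
)

# 2 x 2 contingency table of observed counts.
tab <- table(teleco$Churn, teleco$VMail.Plan)

# Chi-Square test of independence (Yates' continuity correction is
# applied by default for 2 x 2 tables).
test <- chisq.test(tab)

print(test$p.value)   # compare with 0.05
print(test$expected)  # expected counts under independence
```

Checking `test$expected` is worthwhile on real data, since the Chi-Square approximation is unreliable when expected cell counts are very small.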
3. Conclusion:
Key conclusions that can be made from the above data analysis:
- West Virginia has the largest number of customers, while California has the
smallest customer base, which the company needs to grow.
- The company has its best customer retention rate, close to 100%, in Kansas,
while it has its worst churn rate, over 20%, in California, where it has to improve
considerably to retain its customers.
- The number of customer service calls is positively associated with the total
number of calls (day, evening and night calls) made by a customer, so the company
should increase its customer service headcount when there is a spike in the
number of calls.
- There is a large upselling opportunity for the company, as there is a strong
association between the Voice Mail Plan and the International Plan. For any new
customer, the company can either offer a bundle package, since customers mostly
take both, or offer a discount on the second plan to upsell it quickly.
- There is no significant relationship between customer service calls and
customer churn.
- There is also no significant dependence between customer churn and the
International plan or the Voice Mail plan.
- Hence none of the prominent factors tested in this dataset appears to drive
customer churn, and some other factor is likely responsible. The company should
provide more customer data and additional variables, such as joining date and
leaving date, for us to try predicting churn.