AEM Group4 Project Report-1


Applied Econometrics for Managers Group - 04


Table of Contents

ACKNOWLEDGEMENT
INTRODUCTION
PROBLEM STATEMENT
DATASET
    Gathering Data
    Describing Data
    Data Verification
    Exploring Data
ATTRIBUTES
EXPLORATORY RESEARCH
    Descriptive Test
    Data Visualization
DATA PREPARATION
REGRESSION MODEL
    Model 1: Probit Model
    Model 2: Logit Model
    LR Test
    Calculation of APE (Average Partial Effect)
CONCLUSION
REFERENCES
APPENDIX


INTRODUCTION

Through voice and data communication services, the telecom sector plays a crucial role in connecting individuals and organisations around the globe. It includes a wide range of services, such as internet access, mobile and landline phone services, and, more recently, newer technologies like 5G.

In the telecom sector, analysing the customer turnover rate and its causes is essential for several reasons. First, it helps telecom firms identify and resolve problems with network quality, price, or customer service that lead to customer churn. Second, a better understanding of churn lets providers improve their marketing and retention tactics and keep important clients. Finally, churn is a crucial indicator for telecom organisations to track and manage, since it can greatly affect profitability and long-term viability in a competitive market.

The churn rate, often referred to as the attrition rate, is the rate at which consumers stop doing business with a company. It is usually expressed as the percentage of service subscribers who cancel their subscriptions within a specified time frame, and it reflects both the effectiveness of the company's customer service and its potential for overall growth.

The major segments of the telecom sector include the following:

● Wireless Telecommunications
● Fixed-Line Telecommunications
● Internet Service Providers (ISPs)
● Cable and Satellite TV Providers
● Data Centers and Cloud Services
● Equipment Manufacturers
● Value-Added Services (VAS)
● Telecom Infrastructure Providers



The industry is reportedly one of the top producers of new jobs and receives significant NRI investment. As government monopolies are privatised in several countries across the world, they face a torrent of new competitors. Mobile services are growing faster than fixed-line services, and voice is starting to lose ground to the Internet as the primary method of conducting business, upending established markets.


PROBLEM STATEMENT

The telecommunications industry's churn rate fluctuates for a variety of reasons. Using CHURN as the dependent variable, the objective is to identify the factors that contribute most to customer turnover.

The study addresses the following questions:

● The effect of the service provider on the rate of customer turnover
● The effect of the customer's choice of network type, such as 2G, 3G, or 4G
● The effect of monthly fees on customer turnover
● The effect of poor service on customer attrition

DATASET

Gathering Data

The dataset was obtained from data.gov.in and covers the telecommunication sector.

Describing Data

Each row in the dataset corresponds to a unique customer and each column to a separate attribute. The dataset describes 7038 distinct customers using 11 attributes.

The dataset includes information about:

● Customers who have recently left
● The network type and operator that each customer has subscribed to
● Information about the customer
● Customer support quality
● Call drop reason, etc.


Data Verification

To ensure that the data is consistent, the dataset is examined for any attributes with missing values. The dataset had no missing data.
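
A check of this kind can be sketched in base R as follows. The data frame `toy` and its values are made up for illustration and are not the project data.

```r
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  operator = c("Airtel", "BSNL", NA, "VI"),
  rating   = c(4, NA, 3, 5),
  Left     = c("No", "Yes", "No", "No")
)

# Count missing values per column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Share of rows that contain at least one missing value
share_incomplete <- mean(!complete.cases(toy))
print(share_incomplete)

# Keep only the complete rows
toy_clean <- na.omit(toy)
print(nrow(toy_clean))
```

If the share of incomplete rows is small, dropping them with `na.omit()` is a simple option; this is also the call used in the appendix.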

Exploring Data

The data contains a mix of character, binary, and numeric columns.

ATTRIBUTES

The attributes (columns) of the dataset are described in the table below.

Attribute          DataType  Description
-----------------  --------  -----------
operator           String    The service provider to which the customer has subscribed
inout_travelling   String    Whether the customer is indoor, outdoor or travelling
network_type       String    Whether the customer is using 2G, 3G or 4G
rating             Number    The rating that the customer has given for the service
calldrop_category  String    The reason for the call drop
gender             String    Whether the customer is male or female
SeniorCitizen      Binary    Whether the customer is a senior citizen or not (0, 1)
Support            String    Whether the customer has tech support or not (Yes, No, No internet service)
Lifetime           Number    Number of months the customer stayed with the company
Charges            Number    The amount charged to the customer monthly
Left               String    Whether the customer churned or not (Yes or No)

Table 1: Description of dataset attributes

EXPLORATORY RESEARCH

Statistical tools have been used to analyse the data and draw some preliminary conclusions from it. This section also describes the dataset's properties, such as the kinds of variables it contains and how they are used, and gives a brief overview of several factors that could be significant for the model.


Descriptive Test

Fig 1: Statistical Description of the dataset

Inference: The proportion of missing data is very small, so the rows with missing values are omitted.

Data Visualization

Fig 2: Proportion of customers who left the services

About 73% of the customers have not left the services, while the other 27% did leave the operator or the services. The churn rate is therefore 27%, and the remaining 73% of customers have stayed with the network operators or services that they chose initially.
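
The arithmetic behind such a churn rate can be sketched in base R. The vector `left` below is a made-up example with 27 churned customers out of 100, chosen only to mirror the proportions reported above; it is not the project data.

```r
# Hypothetical churn flags: 27 of 100 customers left
left <- c(rep("Yes", 27), rep("No", 73))

# Counts per class, then proportions
status <- table(left)
churn_share <- prop.table(status)
print(round(churn_share, 2))

# Churn rate in percent = customers who left / all customers * 100
churn_rate_pct <- 100 * mean(left == "Yes")
print(churn_rate_pct)
```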


Fig 3: Box plot of the monthly charges of customers, by churn status

Monthly charges for customers who left the services are somewhat higher than for customers who stayed. This suggests that some customers may have left because the prices were too high, while those who stayed used the services at a lower cost.

Fig 4: Proportion of Male and Female and their churn rate


The number of customers leaving the services is almost the same for males and females. This indicates that gender is not a major factor in whether a customer leaves or continues with the service.

Fig 5: Proportion of Senior Citizen customers and their churn rate

Senior citizens are more likely to leave the services than non-senior citizens. This indicates that senior citizen status has a significant influence on whether a customer leaves or continues with the service.

Fig 6: Proportion of customer churn rate of the various operators



The number of customers leaving Airtel and VI is slightly lower than for BSNL and RJio. Overall, most customers stay with their operators. Even so, all operators need to adapt their strategies to grow their loyal customer base relative to other operators.

Fig 7: Proportion of customer churn rate on the basis of travelling

Travelling status is not a major factor in the churn rate: all three categories show a similar and low number of customers leaving the operator. Travelling has a slightly higher value than Indoor and Outdoor, but the difference is very small.


Fig 8: Proportion of customer churn rate for different network types

Customers appear to be moving towards newer technology. The churn rate is highest for the 2G network type, with almost 35% of its users leaving. The 3G network type follows with a churn rate of around 32%; these customers are most likely moving to 4G or even newer technologies. The 4G network type also has a significant churn rate, with more than 25% of its customers leaving, but this is lower than for the other two network types analysed.

Fig 9: Proportion of customer churn rate for different tech support received

Customers who do not receive any tech support have the highest churn rate, with about 48% leaving the services, while customers who receive support churn at less than 25%, which is comparatively low. Surprisingly, customers with no internet service have the lowest churn rate. This indicates that a customer is likely to leave if the operator does not provide proper support services, as this leads to customer dissatisfaction.


DATA PREPARATION

The first and most important phase is eliminating outliers and uncommon data types. Before being processed and analysed, raw data must be transformed and cleansed. This phase frequently includes reformatting data, restoring data, and blending data sources to enrich the data. The processing phase follows this stage.

The following steps have been followed:

1. Cleaning the Categorical Data

2. Creating Dummy Variables
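
The second step can be illustrated with a short base R sketch. The data frame `toy` is hypothetical, and this uses `model.matrix()` on a formula, which is one common way to build dummy variables rather than necessarily the exact procedure in the appendix.

```r
# Hypothetical categorical data, not the project dataset
toy <- data.frame(
  network_type = factor(c("2G", "3G", "4G", "3G")),
  gender       = factor(c("Male", "Female", "Female", "Male"))
)

# model.matrix() expands factors into 0/1 columns; the first level of each
# factor (2G, Female) is dropped and serves as the reference category.
# [, -1] removes the intercept column.
dummies <- model.matrix(~ network_type + gender, data = toy)[, -1]
print(dummies)
```

Dropping one level per factor avoids perfect collinearity with the intercept, so each remaining dummy is read relative to the omitted category.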

REGRESSION MODEL

Regression analysis is a statistical technique for determining the relationship between a dependent variable and one or more independent variables, and it shows how the dependent variable changes when the independent variables change. In this instance, binary response models were used to analyse the dataset, since the dependent variable takes only two values.

We have run regression models in which the dependent variable is customer churn, i.e., whether the customer has left the service, and the independent variables are SeniorCitizen, operator, network_type, gender, Support, charges, and other variables. We have developed and compared three models: a Probit model, a Logit model, and a Linear Probability Model.
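
For reference, the models compared in this section differ only in how the churn probability is linked to the regressors. The notation below is an assumption introduced here for illustration: y_i = 1 if customer i churned, x_i is the vector of regressors, β the coefficients, and Φ the standard normal distribution function.

```latex
\begin{aligned}
\text{Probit:}\quad & P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) \\
\text{Logit:}\quad  & P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta)
                      = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} \\
\text{Linear Probability Model:}\quad & P(y_i = 1 \mid x_i) = x_i'\beta
\end{aligned}
```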


Model 1: Probit Model

Model 1 was estimated as a Probit model. The significant factors identified were SeniorCitizen1, TechSupportYes, and SeniorCitizen1:TechSupportYes.

The log-likelihood of the first model is –1384.887.


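McFadden's pseudo R-squared for the first model is 0.054, which means that the model's log-likelihood improves on that of the intercept-only model by about 5%. It is a measure of relative fit rather than a share of variance explained.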
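
McFadden's pseudo R-squared compares the log-likelihood of the fitted model with that of an intercept-only model. A minimal sketch on simulated data follows; the variables `x` and `y` and all parameter values are hypothetical and unrelated to the project dataset.

```r
set.seed(1)
# Simulated stand-in data (hypothetical): a binary outcome driven by one regressor
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = pnorm(-0.5 + 0.8 * x))

fit  <- glm(y ~ x, family = binomial(link = "probit"))
null <- glm(y ~ 1, family = binomial(link = "probit"))

# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(intercept-only model)
r2_loglik <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))

# For 0/1 outcomes the same number comes from the deviances,
# which is the form used in the appendix
r2_deviance <- 1 - fit$deviance / fit$null.deviance

print(c(r2_loglik, r2_deviance))
```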

Model 2: Logit Model

Model 2 was estimated as a Logit model. The significant factors identified were SeniorCitizen1, TechSupportYes, and SeniorCitizen1:TechSupportYes.

The log-likelihood of the second model is –1384.791.


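McFadden’s pseudo R-squared for the second model is 0.054, which means that the model's log-likelihood improves on that of the intercept-only model by about 5%. It is a measure of relative fit rather than a share of variance explained.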

Model 3: Linear Probability Model

Model 3 was estimated as a Linear Probability Model. The significant factors identified were SeniorCitizen1, TechSupportYes, and SeniorCitizen1:TechSupportYes.

The log-likelihood of the third model is –1385.253.

McFadden’s pseudo R-squared for the third model is 0.054, which means that the model's log-likelihood improves on that of the intercept-only model by about 5%. It is a measure of relative fit rather than a share of variance explained.

LR Test

The likelihood ratio test assesses the overall significance of the models. The pseudo R-squared values of all three models are also comparable, indicating that they fit the data about equally well.
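
A likelihood ratio test of overall significance can be computed by hand, as in the sketch below. It uses simulated data with hypothetical variables (`x1`, `x2`, `y`) and compares a fitted logit model with its intercept-only version.

```r
set.seed(2)
# Simulated stand-in data (hypothetical)
n  <- 400
x1 <- rnorm(n)
x2 <- rbinom(n, size = 1, prob = 0.3)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * x1 + 0.5 * x2))

full <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
null <- glm(y ~ 1,       family = binomial(link = "logit"))

# LR statistic: twice the gap in log-likelihoods between the two nested models
lr_stat <- 2 * (as.numeric(logLik(full)) - as.numeric(logLik(null)))

# Degrees of freedom = number of restrictions (here the two slope coefficients)
df <- attr(logLik(full), "df") - attr(logLik(null), "df")

# p-value from the chi-squared distribution
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
print(c(LR = lr_stat, df = df, p = p_value))
```

The chi-squared reference distribution applies when one model is nested in the other, as the intercept-only model is here.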

Calculation of APE (Average Partial Effect)

The coefficients of the Probit and Logit models cannot be interpreted directly as changes in probability, because the effect of each factor depends on the values of all the regressors. The coefficients of the Linear Probability Model, by contrast, are already partial effects. Thus, we calculate the Average Partial Effect (APE), which gives the average change in the churn probability associated with each factor.
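
The idea behind the APE can be sketched in base R for a logit model. The data are simulated and the variables `charges`, `senior`, and `y` are hypothetical stand-ins; the appendix relies on the `mfx` package instead, whereas this sketch computes the effects by hand.

```r
set.seed(3)
# Simulated stand-in data (hypothetical)
n <- 500
charges <- runif(n, 20, 120)
senior  <- rbinom(n, size = 1, prob = 0.2)
y <- rbinom(n, size = 1, prob = plogis(-2 + 0.02 * charges + 0.6 * senior))

fit <- glm(y ~ charges + senior, family = binomial(link = "logit"))
b   <- coef(fit)

# Continuous regressor: average over customers of the logistic density
# at each linear index, times the coefficient
xb <- predict(fit, type = "link")
ape_charges <- mean(dlogis(xb)) * b[["charges"]]

# Binary regressor: average difference in predicted probability
# when the dummy is switched from 0 to 1 for every customer
d1 <- data.frame(charges = charges, senior = 1)
d0 <- data.frame(charges = charges, senior = 0)
ape_senior <- mean(predict(fit, newdata = d1, type = "response") -
                   predict(fit, newdata = d0, type = "response"))

print(c(charges = ape_charges, senior = ape_senior))
```

Because the logistic density never exceeds 0.25, the APE of a continuous regressor in a logit model is at most a quarter of its coefficient in absolute value.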

CONCLUSION

Holding all other conditions constant (ceteris paribus):

1. All variables except gender were found to significantly affect the churn rate.

2. Being a Senior Citizen marginally increases the probability of attrition, by 0.063%. This suggests that operators should focus on the services provided to Senior Citizens.

3. As for the operators, in comparison to Airtel, BSNL customers have the largest increase in churn probability at close to 0.37, followed by VI with an increase of 0.187 and RJio with the smallest increase at 0.129.

4. A newer network type tends to decrease the likelihood of customer churn. Compared to 2G networks, 3G networks reduce the likelihood of customer churn by 0.3, while 4G networks reduce it by 0.45. Therefore, the better the network type, the lower the likelihood of customer attrition.

5. Monthly Charges increase the probability of customer churn. The probability of customer attrition increases by 0.001 per unit increase in Monthly Charges.

6. The length of a customer's relationship with a particular service provider also plays a role. With each unit increase in Lifetime, the probability of customer attrition decreases by 0.00046.

7. Using the Difference-in-Differences (DiD) method, we analysed the combined effect of senior citizen status and technical support. Even though the factor is not statistically significant, senior citizen customers who receive technical support have a lower probability of churning than senior citizens who do not. In this case, the probability decreases by 0.002.
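
The difference-in-differences comparison in the last point can be illustrated on the probability scale. The sketch below uses simulated data with hypothetical 0/1 variables `senior` and `support` and a logit model with their interaction; the numbers it produces have no connection to the results reported above.

```r
set.seed(4)
# Simulated stand-in data (hypothetical)
n <- 2000
senior  <- rbinom(n, size = 1, prob = 0.2)
support <- rbinom(n, size = 1, prob = 0.4)
y <- rbinom(n, size = 1,
            prob = plogis(-1 + 0.5 * senior - 0.8 * support - 0.3 * senior * support))

# senior * support expands to both main effects plus their interaction
fit <- glm(y ~ senior * support, family = binomial(link = "logit"))

# Predicted churn probability for each of the four groups
grid <- expand.grid(senior = c(0, 1), support = c(0, 1))
grid$p <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Helper: predicted probability for a given (senior, support) group
p <- function(s, t) grid$p[grid$senior == s & grid$support == t]

# Effect of support among seniors minus effect of support among non-seniors
did <- (p(1, 1) - p(1, 0)) - (p(0, 1) - p(0, 0))
print(grid)
print(did)
```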

REFERENCES

● Dataset Source: https://data.gov.in/catalog/voice-call-quality-customer-experience

● Regression Analysis. (n.d.). Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Regression_analysis


APPENDIX

# Clear computer memory of previous R sessions
rm(list=ls())

# Set a working directory
setwd("/Users/akilakundran/Downloads")

#load packages
library(psych)
library(haven)
library(foreign)
library(dummy)
library(mlogit)
library(ggplot2)
library(cowplot)
library(caTools)
library(lmtest)
library(mfx)
library(plm)
library(car)

#import the dataset


mydata <- read.csv("August_MyCall_2021.csv")
head(mydata)

#Descriptive Analysis
describe(mydata)

##The proportion of missing data is very small, so those rows are omitted.
mydata1 <- na.omit(mydata)
describe(mydata1)

##Data Visualization
table(mydata1$Churn)
status <- table(mydata1$Churn)
prop.table(status)

#Exploratory Data Analysis

#Proportion of Customers churned out


barplot(prop.table(status), legend = TRUE, main = "Churn", ylab = "proportion", col = blues9,
names.arg = 1:length(status))

#Box Plot for Customers churned against Total Charges


ggplot(mydata1, aes(x=Churn,y=TotalCharges))+ geom_boxplot(color="red", outlier.color =
"black")

#Proportion of churned out customers on basis of gender


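custom_colors <- c("green", "blue")
# The trailing + keeps the ggplot call going; a line that starts with + is parsed as a new expression
ggplot(mydata1, aes(x=gender,fill=Churn))+ geom_bar(position = 'dodge') +
  scale_fill_manual(values = custom_colors)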

# Create a barplot for proportion of Senior Citizens churned


plot <- ggplot(mydata1, aes(x = factor(SeniorCitizen), fill = Churn)) + geom_bar(position =
'fill') +labs(fill = "Churn")+scale_fill_manual(values = custom_colors)
# Set x-axis scale to only 0 and 1
plot + scale_x_discrete(limits = c("0", "1"))


#Proportion of churned out customers on basis of Operators


ggplot(mydata1, aes(x=operator,fill=Churn))+ geom_bar(position = 'fill') +
  scale_fill_manual(values = custom_colors)

#Proportion of churned out customers on basis of travelling
ggplot(mydata1, aes(x=inout_travelling,fill=Churn))+ geom_bar(position = 'fill') +
  scale_fill_manual(values = custom_colors)

#Proportion of churned out customers on basis of Network Type
ggplot(mydata1, aes(x=network_type,fill=Churn))+ geom_bar(position = 'fill') +
  scale_fill_manual(values = custom_colors)

#Proportion of churned out customers on basis of if Tech Support Provided
ggplot(mydata1, aes(x=TechSupport,fill=Churn))+ geom_bar(position = 'fill') +
  scale_fill_manual(values = custom_colors)

##Cleaning the categorical data


mydata2 <- data.frame(lapply(mydata1, function(x) {gsub("No internet service", "No", x)}))
View(mydata2)

##Creating dummy variables


mydata2_cat <- mydata2[,-c(9,10,11)]
View(mydata2_cat)
dummy<- data.frame(sapply(mydata2_cat,function(x) data.frame(model.matrix(~x-1,data
=mydata2_cat))[,-1]))
head(dummy)
mydata2$churndummy<-dummy$Churn
mydata2$genderdummy<-dummy$gender

View(mydata2)

##Building Model

##Not including Monthly Charges and Tenure

##Simple Probit Model


Model1 <- glm(formula = factor(Churn) ~ SeniorCitizen + operator + (network_type) +
(gender) + (TechSupport) +SeniorCitizen * (TechSupport),
family = binomial(link = probit), data = mydata2)

summary(Model1)

##Simple Logit Model


Model2 <- glm(formula = factor(Churn) ~ SeniorCitizen + operator + (network_type) +
(gender) + (TechSupport) +SeniorCitizen * (TechSupport),
family = binomial(link = logit), data = mydata2)

summary(Model2)

# Simple Linear Probability Model (LPM)


Model3 <- glm(
formula = factor(Churn) ~ SeniorCitizen + operator + (network_type) +
(gender) + (TechSupport) + SeniorCitizen * (TechSupport),
family = binomial(link = "identity"), # Specify identity link for LPM
data = mydata2
)
summary(Model3)

##Log likelihood and Degree of freedom



logLik(Model1)
logLik(Model2)
logLik(Model3)

#family = binomial(link=probit)
#McFadden's pseudo R-squared
##Computed as 1 - residual deviance / null deviance
1 - Model1$deviance/Model1$null.deviance
1 - Model2$deviance/Model2$null.deviance
1 - Model3$deviance/Model3$null.deviance
#LR test statistics for overall significance
lrtest(Model1,Model2,Model3)

#APE model
logitmfx (factor(Churn) ~ SeniorCitizen + (operator) + (network_type) + (gender) +

(TechSupport), data=mydata2, atmean=
