Europe Hotel Customer Satisfaction
1 OVERVIEW
This report covers data cleaning, demographic analysis, Likert scale analysis, predictive modelling, and the conclusions drawn from the findings.
Data Cleaning: The dataset was prepared for analysis by checking for missing values, duplicated entries, and inconsistencies.
Demographic Analysis: The characteristics of the hotel customers, such as age and gender, were examined.
Likert Analysis: The service ratings were analysed to assess how satisfied or dissatisfied customers are with each service.
Predictive Modelling: Ordinal logistic regression was the intended model for identifying the factors that influence customer satisfaction in European hotels. Because the response variable has only two levels, a binary-response model was fitted instead.
Conclusion: The analyses identify the services most strongly associated with customer satisfaction in European hotels.
Together, these steps aim to give a comprehensive picture of customer satisfaction in European hotels and to derive actionable insights from the findings.
2 ABOUT DATASET
The dataset, "Europe Hotel Customer Satisfaction", is from Kaggle.
Description of Columns
Gender - Female, Male.
Age - 7 to 85.
purpose_of_travel - aviation, academic, personal, business, tourism.
Type of Travel - Group Travel, Personal Travel.
Type Of Booking - Group bookings, Individual/Couple, Not defined.
Hotel wifi service - Rating from 0 to 5 (0 = not applicable).
Departure/Arrival convenience - Rating from 0 to 5.
Ease of Online booking - Rating from 0 to 5.
Hotel location - Rating from 0 to 5.
Food and drink - Rating from 0 to 5.
Stay comfort - Rating from 0 to 5.
Common Room entertainment - Rating from 0 to 5.
Check-in/Checkout service - Rating from 0 to 5.
Other service - Rating from 0 to 5.
Cleanliness - Rating from 0 to 5.
satisfaction - "satisfied" or "neutral or dissatisfied".
3 IMPORTING DATASET
First 10 rows of the dataset
id | Gender | Age | purpose_of_travel | Type.of.Travel | Type.Of.Booking | Hotel.wifi.service | Departure.Arrival..convenience | Ease.of.Online.booking | Hotel.location | Food.and.drink | Stay.comfort | Common.Room.entertainment | Checkin.Checkout.service | Other.service | Cleanliness | satisfaction |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
70172 | Male | 13 | aviation | Personal Travel | Not defined | 3 | 4 | 3 | 1 | 5 | 5 | 5 | 4 | 5 | 5 | neutral or dissatisfied |
5047 | Male | 25 | tourism | Group Travel | Group bookings | 3 | 2 | 3 | 3 | 1 | 1 | 1 | 1 | 4 | 1 | neutral or dissatisfied |
110028 | Female | 26 | tourism | Group Travel | Group bookings | 2 | 2 | 2 | 2 | 5 | 5 | 5 | 4 | 4 | 5 | satisfied |
24026 | Female | 25 | tourism | Group Travel | Group bookings | 2 | 5 | 5 | 5 | 2 | 2 | 2 | 1 | 4 | 2 | neutral or dissatisfied |
119299 | Male | 61 | aviation | Group Travel | Group bookings | 3 | 3 | 3 | 3 | 4 | 5 | 3 | 3 | 3 | 3 | satisfied |
111157 | Female | 26 | business | Personal Travel | Individual/Couple | 3 | 4 | 2 | 1 | 1 | 1 | 1 | 4 | 4 | 1 | neutral or dissatisfied |
82113 | Male | 47 | academic | Personal Travel | Individual/Couple | 2 | 4 | 2 | 3 | 2 | 2 | 2 | 3 | 5 | 2 | neutral or dissatisfied |
96462 | Female | 52 | aviation | Group Travel | Group bookings | 4 | 3 | 4 | 4 | 5 | 5 | 5 | 4 | 5 | 4 | satisfied |
79485 | Female | 41 | tourism | Group Travel | Group bookings | 1 | 2 | 2 | 2 | 4 | 3 | 1 | 4 | 1 | 2 | neutral or dissatisfied |
65725 | Male | 20 | academic | Group Travel | Individual/Couple | 3 | 3 | 3 | 4 | 2 | 3 | 2 | 4 | 3 | 2 | neutral or dissatisfied |
Cleaning column names
column names |
---|
id |
gender |
age |
purpose_of_travel |
type_of_travel |
type_of_booking |
hotel_wifi_service |
departure_arrival_convenience |
ease_of_online_booking |
hotel_location |
food_and_drink |
stay_comfort |
common_room_entertainment |
checkin_checkout_service |
other_service |
cleanliness |
satisfaction |
- cleaned column names
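As a rough illustration of the renaming shown above, the sketch below reproduces the mapping for a few of the raw names using base R only. It is an approximation written for this example, not the actual algorithm behind `janitor::clean_names()`, and the helper `to_snake()` is hypothetical.

```r
# Raw names as produced by read.csv() (taken from the table header above)
raw_names <- c("Type.of.Travel", "Departure.Arrival..convenience",
               "Checkin.Checkout.service", "Cleanliness")

# Hypothetical helper: collapse runs of dots into one underscore, then lower-case
to_snake <- function(x) tolower(gsub("\\.+", "_", x))

cleaned <- to_snake(raw_names)
print(cleaned)
```

For names with other special characters or leading digits this simple rule would not be enough, which is one reason to prefer the package function in practice.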
Dropping id column
Classes of dataset
'data.frame': 103904 obs. of 16 variables:
$ gender : chr "Male" "Male" "Female" "Female" ...
$ age : int 13 25 26 25 61 26 47 52 41 20 ...
$ purpose_of_travel : chr "aviation" "tourism" "tourism" "tourism" ...
$ type_of_travel : chr "Personal Travel" "Group Travel" "Group Travel" "Group Travel" ...
$ type_of_booking : chr "Not defined" "Group bookings" "Group bookings" "Group bookings" ...
$ hotel_wifi_service : int 3 3 2 2 3 3 2 4 1 3 ...
$ departure_arrival_convenience: int 4 2 2 5 3 4 4 3 2 3 ...
$ ease_of_online_booking : int 3 3 2 5 3 2 2 4 2 3 ...
$ hotel_location : int 1 3 2 5 3 1 3 4 2 4 ...
$ food_and_drink : int 5 1 5 2 4 1 2 5 4 2 ...
$ stay_comfort : int 5 1 5 2 5 1 2 5 3 3 ...
$ common_room_entertainment : int 5 1 5 2 3 1 2 5 1 2 ...
$ checkin_checkout_service : int 4 1 4 1 3 4 3 4 4 4 ...
$ other_service : int 5 4 4 4 3 4 5 5 1 3 ...
$ cleanliness : int 5 1 5 2 3 1 2 4 2 2 ...
$ satisfaction : chr "neutral or dissatisfied" "neutral or dissatisfied" "satisfied" "neutral or dissatisfied" ...
- 5 character variables and 11 numerical variables
4 Data Cleaning
Missing values
Variable | Missing values |
---|---|
gender | 0 |
age | 0 |
purpose_of_travel | 0 |
type_of_travel | 0 |
type_of_booking | 0 |
hotel_wifi_service | 0 |
departure_arrival_convenience | 0 |
ease_of_online_booking | 0 |
hotel_location | 0 |
food_and_drink | 0 |
stay_comfort | 0 |
common_room_entertainment | 0 |
checkin_checkout_service | 0 |
other_service | 0 |
cleanliness | 0 |
satisfaction | 0 |
- no missing values
Duplicated entries
- no duplicated entries
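The two checks in this section can be sketched on a small toy data frame (the values are invented for illustration). The sketch also shows a pitfall worth knowing about: `duplicated()` and `anyDuplicated()` compare whole rows only when the data-frame method is dispatched, whereas calling the default method directly treats the data frame as a list of columns.

```r
# Toy data frame (invented): row 3 repeats row 1, and one rating is missing
toy <- data.frame(
  gender = c("Male", "Female", "Male", "Female"),
  age = c(25L, 41L, 25L, 52L),
  cleanliness = c(5L, NA, 5L, 4L)
)

# Missing values per column
na_counts <- colSums(is.na(toy))

# Row-wise duplicates: duplicated() on a data frame compares whole rows
n_dup_rows <- sum(duplicated(toy))

# The default method sees a list of columns, so it looks for repeated
# columns instead of repeated rows
dup_cols <- anyDuplicated.default(toy)

print(na_counts)
print(n_dup_rows)
print(dup_cols)
```

Because the `id` column is dropped before the check, a row-wise test on the real data asks whether two customers gave identical answers, which is a different question from whether a record was entered twice.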
Subsetting data
5 Demographic Analysis
1. Age Distribution
- Most customers are between 40 and 60 years old
2. Gender Distribution
gender | Freq |
---|---|
Female | 52727 |
Male | 51177 |
- Female customers form a slight majority: 52,727 (50.7%) compared with 51,177 males (49.3%).
3. Purpose of Travel Distribution
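library(ggplot2)
# Grouped bar chart: purpose of travel, split by type of travel.
# theme_economist() is a complete theme, so a preceding theme_bw() is not needed.
td=ggplot(europe_dem,aes(purpose_of_travel,fill=type_of_travel))+
  geom_bar(position = "dodge",stat="count")+
  ggthemes::theme_economist(base_size = 5)+
  labs(fill="Type of Travel",title = "Purpose of Travel Distribution",
       y="Frequency",x="Purpose of Travel",
       caption = "@Data Insights 2024")
# td.png is assumed to have been saved separately; the save step is not shown
include_graphics("td.png")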
Most customers travel as a group rather than individually.
Customers travelling for tourism mostly travel as a group, and this category dominates the other purposes of travel.
4. Type of Booking Distribution
Type of booking | Frequency |
---|---|
Group bookings | 49665 |
Individual/Couple | 46745 |
Not defined | 7494 |
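- Group bookings are the most common type (49,665, or 47.8% of all bookings), followed by Individual/Couple bookings (46,745, or 45.0%); the remaining 7,494 bookings (7.2%) are not defined.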
5. Satisfaction Distribution
Satisfaction | Frequency |
---|---|
neutral or dissatisfied | 58879 |
satisfied | 45025 |
- Most customers (58,879, or 56.7%) are neutral or dissatisfied with the facilities of the hotel; 45,025 (43.3%) are satisfied.
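To put the frequency table on a percentage scale, the counts can be converted with `prop.table()`. The sketch below uses only the two counts reported above; the object names are chosen for this example.

```r
# Counts copied from the satisfaction table above
satisfaction_counts <- c("neutral or dissatisfied" = 58879, "satisfied" = 45025)

# Convert counts to percentages of all respondents
shares <- round(100 * prop.table(satisfaction_counts), 1)
print(shares)
```

The same two lines work for the gender and type-of-booking tables, and the resulting shares indicate how balanced the two classes are before any model is fitted.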
6 Further Analysis
1. Services influencing customer satisfaction
library(likert)
# Convert every rating column to a factor, as likert() expects
new_lik=as.data.frame(lapply(europe_num,as.factor))
lik=likert(new_lik)
# theme_bw() comes first so that it does not reset the legend position
invisible(likert.bar.plot(lik)+
  theme_bw(base_size = 10)+
  theme(legend.position = "bottom")+
  labs(title = "Respondents Distribution",
       subtitle="0=not applicable,1=very dissatisfied,2=dissatisfied,3=neutral,4=satisfied,5=very satisfied",
       caption = "@Data Insights 2024"))
# likert.png is assumed to have been saved separately; the save step is not shown
include_graphics("likert.png")
Customers are satisfied with the following services:
other service, check-in/check-out service, stay comfort, cleanliness and common room entertainment.
Customers are neither dissatisfied nor satisfied with the following services:
food and drink, hotel location, departure/arrival convenience, hotel wifi and ease of online booking.
Suggestion: To increase customer satisfaction and loyalty, the management of the hotel should improve the services in the second group: food and drink, hotel location, departure/arrival convenience, hotel wifi and ease of online booking.
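The grouping above rests on the share of low, neutral and high ratings for each service. A minimal base-R sketch of that summary is shown below on invented ratings. The helper `summarise_item()` is hypothetical, and dropping the 0 ("not applicable") answers before computing the shares is an assumption of this example, not something the report states.

```r
# Toy ratings invented for illustration; 0 means "not applicable"
ratings <- data.frame(
  stay_comfort = c(5, 4, 4, 3, 2, 5, 0, 4),
  hotel_wifi_service = c(3, 2, 3, 1, 4, 3, 3, 0)
)

# Percentage of low (1-2), neutral (3) and high (4-5) answers for one item,
# assuming 0 ("not applicable") is excluded from the base
summarise_item <- function(x) {
  x <- x[x != 0]
  c(low = mean(x <= 2), neutral = mean(x == 3), high = mean(x >= 4)) * 100
}

# One row per service, one column per response group
likert_summary <- t(sapply(ratings, summarise_item))
print(round(likert_summary, 1))
```

A service would fall in the "satisfied" group when its high share clearly exceeds the other two, and in the "neither" group when the neutral share is the largest.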
7 Predictive Modelling
1. Application of Ordinal Logistic Regression
NB. Ordinal logistic regression requires a response variable with at least three ordered levels. The satisfaction variable in this dataset has only two levels ("satisfied" and "neutral or dissatisfied"), so a binary-response model is used instead of an ordinal one.
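For comparison, this is roughly what an ordinal fit would look like if the response had three ordered levels. The sketch uses `MASS::polr()` on synthetic data. The three-level variable `satisfaction3`, the cut-points and the effect size are all invented for illustration and do not come from the dataset.

```r
library(MASS)  # provides polr() for proportional-odds models

set.seed(42)
n <- 600
# Synthetic 1-5 rating and a latent satisfaction score (invented)
stay_comfort <- sample(1:5, n, replace = TRUE)
latent <- 0.9 * stay_comfort + rlogis(n)

# Cut the latent score into three ordered levels
satisfaction3 <- cut(latent, breaks = c(-Inf, 2, 4, Inf),
                     labels = c("dissatisfied", "neutral", "satisfied"),
                     ordered_result = TRUE)
toy <- data.frame(stay_comfort, satisfaction3)

# Proportional-odds fit: one slope per predictor, two thresholds for three levels
fit <- polr(satisfaction3 ~ stay_comfort, data = toy, Hess = TRUE)
print(coef(fit))  # slope on the log-odds scale
print(fit$zeta)   # estimated thresholds between adjacent levels
```

With only two response levels there is a single threshold, and the proportional-odds model reduces to ordinary binary logistic regression.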
2. Relationship between predictor variables and dependent variable
Call:
glm(formula = satisfaction ~ ., data = europe_model)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.444257 0.006781 -65.519 < 2e-16 ***
hotel_wifi_service 0.079337 0.001450 54.709 < 2e-16 ***
departure_arrival_convenience -0.056248 0.001007 -55.868 < 2e-16 ***
ease_of_online_booking 0.032373 0.001448 22.358 < 2e-16 ***
hotel_location -0.012301 0.001206 -10.204 < 2e-16 ***
food_and_drink -0.038266 0.001410 -27.147 < 2e-16 ***
stay_comfort 0.072424 0.001432 50.583 < 2e-16 ***
common_room_entertainment 0.082391 0.001743 47.266 < 2e-16 ***
checkin_checkout_service 0.062746 0.001097 57.184 < 2e-16 ***
other_service 0.038369 0.001355 28.325 < 2e-16 ***
cleanliness 0.009213 0.001644 5.605 2.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.1751888)
Null deviance: 25514 on 103903 degrees of freedom
Residual deviance: 18201 on 103893 degrees of freedom
AIC: 113890
Number of Fisher Scoring iterations: 2
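All predictor variables are statistically significant (p < 0.001). Seven of the ten have a positive relationship with customer satisfaction; departure/arrival convenience, hotel location, and food and drink have negative coefficients.
Note that glm() was called without a family argument, so it fitted a Gaussian model with an identity link (a linear probability model) rather than a logistic regression. The "t value" column and the "Dispersion parameter for gaussian family" line in the output confirm this. The estimate of 0.079337 for hotel_wifi_service therefore means that a one-point increase in the wifi rating is associated with an increase of about 0.079 (7.9 percentage points) in the predicted probability of a customer being satisfied, holding all other variables constant. It is not a change in log odds.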
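How a `glm()` coefficient should be read depends on the `family` argument, which defaults to `gaussian` when omitted. The sketch below fits both variants to synthetic data (invented for illustration, not drawn from the hotel dataset) to show the difference in scale.

```r
set.seed(1)
n <- 1000
# Synthetic 1-5 wifi rating and a 0/1 outcome (invented for illustration)
hotel_wifi_service <- sample(1:5, n, replace = TRUE)
p <- plogis(-2 + 0.6 * hotel_wifi_service)
satisfied <- rbinom(n, size = 1, prob = p)
toy <- data.frame(hotel_wifi_service, satisfied)

# Default family (gaussian): the slope is a change in predicted probability
lpm_fit <- glm(satisfied ~ hotel_wifi_service, data = toy)

# Binomial family with logit link: the slope is a change in log odds
logit_fit <- glm(satisfied ~ hotel_wifi_service, data = toy, family = binomial)

print(coef(lpm_fit))
print(coef(logit_fit))
print(exp(coef(logit_fit)["hotel_wifi_service"]))  # odds ratio per rating point
```

Refitting the hotel model with `family = binomial` would be expected to give z values instead of t values and coefficients on the log-odds scale, so the numbers in the output above would change.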
3. Most important services influencing customer satisfaction
Term | Estimate |
---|---|
common_room_entertainment | 0.0823914 |
hotel_wifi_service | 0.0793373 |
stay_comfort | 0.0724241 |
checkin_checkout_service | 0.0627459 |
other_service | 0.0383686 |
ease_of_online_booking | 0.0323731 |
cleanliness | 0.0092125 |
hotel_location | -0.0123009 |
food_and_drink | -0.0382663 |
departure_arrival_convenience | -0.0562483 |
(Intercept) | -0.4442572 |
The services most strongly and positively associated with customer satisfaction are common room entertainment, hotel wifi service, stay comfort and check-in/check-out service, followed by other service and ease of online booking. Cleanliness has a positive but very small coefficient (0.009). The table is sorted by signed estimate rather than by absolute size; by absolute size, departure/arrival convenience (-0.056) ranks fifth.
Suggestion: The customer service department of the hotel should keep improving its delivery of the services listed above to raise customer satisfaction levels. It should also put more effort into the services with a negative relationship (departure/arrival convenience, food and drink, hotel location) so that they, too, contribute to customer satisfaction.
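If "importance" is meant as strength of association regardless of direction, the estimates can be ranked by absolute value instead. The sketch below does this for the ten service coefficients reported in the table above; the intercept is left out because it is not a service.

```r
# Estimates copied from the coefficient table above (intercept omitted)
estimates <- c(
  common_room_entertainment     =  0.0823914,
  hotel_wifi_service            =  0.0793373,
  stay_comfort                  =  0.0724241,
  checkin_checkout_service      =  0.0627459,
  other_service                 =  0.0383686,
  ease_of_online_booking        =  0.0323731,
  cleanliness                   =  0.0092125,
  hotel_location                = -0.0123009,
  food_and_drink                = -0.0382663,
  departure_arrival_convenience = -0.0562483
)

# Rank by absolute size so that strong negative effects are not pushed to the bottom
ranked <- estimates[order(abs(estimates), decreasing = TRUE)]
print(round(ranked, 4))
```

This comparison is reasonable here because all ten predictors share the same rating scale; with predictors on different scales, standardised coefficients would be a safer basis for ranking.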
8 Code Appendix
knitr::opts_chunk$set(echo = T, message = F, warning = F)
europe=read.csv(file.choose())
library(knitr)
library(dplyr)
europe %>% head(10) %>% kable(caption = "First 10 rows")
library(janitor)
europe=clean_names(europe)
europe %>% names %>% as.data.frame() %>%
rename("column names"=".") %>% kable()
europe=europe[2:17] #dropped id column
europe %>% str()
colSums(is.na(europe)) %>% kable()
anyDuplicated(europe) #row-wise check; anyDuplicated.default() would compare columns, not rows
europe_dem=europe %>%
dplyr::select(1:5,16)
europe_num=europe %>%
dplyr::select(6:15)
library(ggplot2)
library(ggthemes)
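# Bar chart of customer ages. theme_calc() is a complete theme, so a
# preceding theme_bw() is not needed; the whole plot sits inside invisible().
invisible(ggplot(europe_dem,aes(x=age,fill=factor(age)))+
  geom_bar(stat="count",width=0.5,show.legend = F)+
  ggthemes::theme_calc()+
  labs(title="Age Distribution",y="Frequency",
       caption="@Data Insights 2024"))
include_graphics("age.png")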
europe_dem %>% dplyr::select(gender) %>%
table() %>% kable()
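# Grouped bar chart: purpose of travel, split by type of travel.
# theme_economist() is a complete theme, so a preceding theme_bw() is not needed.
td=ggplot(europe_dem,aes(purpose_of_travel,fill=type_of_travel))+
  geom_bar(position = "dodge",stat="count")+
  ggthemes::theme_economist(base_size = 5)+
  labs(fill="Type of Travel",title = "Purpose of Travel Distribution",
       y="Frequency",x="Purpose of Travel",
       caption = "@Data Insights 2024")
# td.png is assumed to have been saved separately; the save step is not shown
include_graphics("td.png")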
europe_dem %>% dplyr::select(type_of_booking) %>%
table() %>% as.data.frame() %>% rename("Type of booking"="type_of_booking",
"Frequency"="Freq") %>%
kable()
europe_dem %>% dplyr::select(satisfaction) %>%
table() %>% as.data.frame() %>% rename("Satisfaction"="satisfaction",
"Frequency"="Freq") %>%
kable()
library(likert)
# Convert every rating column to a factor, as likert() expects
new_lik=as.data.frame(lapply(europe_num,as.factor))
lik=likert(new_lik)
# theme_bw() comes first so that it does not reset the legend position
invisible(likert.bar.plot(lik)+
  theme_bw(base_size = 10)+
  theme(legend.position = "bottom")+
  labs(title = "Respondents Distribution",
       subtitle="0=not applicable,1=very dissatisfied,2=dissatisfied,3=neutral,4=satisfied,5=very satisfied",
       caption = "@Data Insights 2024"))
# likert.png is assumed to have been saved separately; the save step is not shown
include_graphics("likert.png")
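library(MASS)
# Keep the ten service ratings plus the satisfaction label
europe_model=europe %>%
  dplyr::select(6:16)
# Recode the response: 1 = satisfied, 0 = neutral or dissatisfied
europe_model$satisfaction=ifelse(europe_model$satisfaction=="satisfied",
                                 1,0)
# No family is given, so glm() uses its default (gaussian, identity link);
# family = binomial would be needed for a logistic regression
model=glm(satisfaction ~ .,data=europe_model)
summary(model)
# Coefficient table sorted by signed estimate; term names stay as row names
model$coefficients %>% as.data.frame() %>%
  rename("Magnitude"=".") %>%
  arrange(desc(Magnitude)) %>% kable()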