Europe Hotel Customer Satisfaction

Predictive Modelling
Customer Satisfaction Insights
Author

Data Analyst - Pythias C

Published

March 15, 2024

1 OVERVIEW

This report covers data cleaning, demographic analysis, Likert scale analysis, predictive modelling, and the conclusions drawn from the findings.

Data Cleaning: The dataset was prepared for analysis by checking for issues such as missing values and duplicated entries.

Demographic Analysis: The characteristics of the hotel customers, such as age and gender, were examined.

Likert Analysis: Likert scale ratings were used to assess how satisfied customers are with each hotel service.

Predictive Modelling: The plan was to use ordinal logistic regression to identify the factors influencing customer satisfaction in European hotels. Because the response variable has only two levels, a binary-response model was fitted instead (see Section 7).

Conclusion: The analyses were used to identify the services that significantly influence customer satisfaction in European hotels.

Together, these steps aim to give a comprehensive picture of customer satisfaction in European hotels and to derive actionable insights from the findings.

2 ABOUT DATASET

The dataset is the "Europe Hotel Customer Satisfaction" dataset from Kaggle.

Description of Columns

Age - 7 to 85

purpose_of_travel - aviation, academic, personal, business, tourism.

Type of Travel - Group travel, Personal Travel.

Type Of Booking - Group bookings, Individual/Couple.

Hotel wifi service - Ratings out of 5.

Departure/Arrival convenience - Ratings out of 5.

Ease of On-line booking - Ratings out of 5.

Hotel location - Ratings out of 5.

Food and drink - Ratings out of 5.

Stay comfort - Ratings out of 5.

Common Room entertainment - Ratings out of 5.

Check-in/Checkout service - Ratings out of 5.

Other service - Ratings out of 5.

Cleanliness - Ratings out of 5.

satisfaction - two levels: "satisfied" and "neutral or dissatisfied".

3 IMPORTING DATASET

Code
europe=read.csv(file.choose())

First 10 rows of the dataset

Code
library(knitr)
library(dplyr)
europe %>% head(10) %>% kable(caption = "First 10 rows")
First 10 rows
id Gender Age purpose_of_travel Type.of.Travel Type.Of.Booking Hotel.wifi.service Departure.Arrival..convenience Ease.of.Online.booking Hotel.location Food.and.drink Stay.comfort Common.Room.entertainment Checkin.Checkout.service Other.service Cleanliness satisfaction
70172 Male 13 aviation Personal Travel Not defined 3 4 3 1 5 5 5 4 5 5 neutral or dissatisfied
5047 Male 25 tourism Group Travel Group bookings 3 2 3 3 1 1 1 1 4 1 neutral or dissatisfied
110028 Female 26 tourism Group Travel Group bookings 2 2 2 2 5 5 5 4 4 5 satisfied
24026 Female 25 tourism Group Travel Group bookings 2 5 5 5 2 2 2 1 4 2 neutral or dissatisfied
119299 Male 61 aviation Group Travel Group bookings 3 3 3 3 4 5 3 3 3 3 satisfied
111157 Female 26 business Personal Travel Individual/Couple 3 4 2 1 1 1 1 4 4 1 neutral or dissatisfied
82113 Male 47 academic Personal Travel Individual/Couple 2 4 2 3 2 2 2 3 5 2 neutral or dissatisfied
96462 Female 52 aviation Group Travel Group bookings 4 3 4 4 5 5 5 4 5 4 satisfied
79485 Female 41 tourism Group Travel Group bookings 1 2 2 2 4 3 1 4 1 2 neutral or dissatisfied
65725 Male 20 academic Group Travel Individual/Couple 3 3 3 4 2 3 2 4 3 2 neutral or dissatisfied

Cleaning column names

Code
library(janitor)
europe=clean_names(europe)
europe %>% names %>% as.data.frame() %>% 
  rename("column names"=".") %>% kable()
column names
id
gender
age
purpose_of_travel
type_of_travel
type_of_booking
hotel_wifi_service
departure_arrival_convenience
ease_of_online_booking
hotel_location
food_and_drink
stay_comfort
common_room_entertainment
checkin_checkout_service
other_service
cleanliness
satisfaction
  • cleaned column names

Dropping id column

Code
europe=europe[2:17] #dropped id column

Classes of dataset

Code
europe %>% str()
'data.frame':   103904 obs. of  16 variables:
 $ gender                       : chr  "Male" "Male" "Female" "Female" ...
 $ age                          : int  13 25 26 25 61 26 47 52 41 20 ...
 $ purpose_of_travel            : chr  "aviation" "tourism" "tourism" "tourism" ...
 $ type_of_travel               : chr  "Personal Travel" "Group Travel" "Group Travel" "Group Travel" ...
 $ type_of_booking              : chr  "Not defined" "Group bookings" "Group bookings" "Group bookings" ...
 $ hotel_wifi_service           : int  3 3 2 2 3 3 2 4 1 3 ...
 $ departure_arrival_convenience: int  4 2 2 5 3 4 4 3 2 3 ...
 $ ease_of_online_booking       : int  3 3 2 5 3 2 2 4 2 3 ...
 $ hotel_location               : int  1 3 2 5 3 1 3 4 2 4 ...
 $ food_and_drink               : int  5 1 5 2 4 1 2 5 4 2 ...
 $ stay_comfort                 : int  5 1 5 2 5 1 2 5 3 3 ...
 $ common_room_entertainment    : int  5 1 5 2 3 1 2 5 1 2 ...
 $ checkin_checkout_service     : int  4 1 4 1 3 4 3 4 4 4 ...
 $ other_service                : int  5 4 4 4 3 4 5 5 1 3 ...
 $ cleanliness                  : int  5 1 5 2 3 1 2 4 2 2 ...
 $ satisfaction                 : chr  "neutral or dissatisfied" "neutral or dissatisfied" "satisfied" "neutral or dissatisfied" ...
  • 5 character variables and 11 numerical variables

4 Data Cleaning

Missing values

Code
colSums(is.na(europe)) %>% kable()
x
gender 0
age 0
purpose_of_travel 0
type_of_travel 0
type_of_booking 0
hotel_wifi_service 0
departure_arrival_convenience 0
ease_of_online_booking 0
hotel_location 0
food_and_drink 0
stay_comfort 0
common_room_entertainment 0
checkin_checkout_service 0
other_service 0
cleanliness 0
satisfaction 0
  • no missing values

Duplicated entries

Code
anyDuplicated.default(europe)
[1] 0
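  • no duplicated columns. Note that anyDuplicated.default() treats a data frame as a list of columns, so this result does not rule out duplicated rows; a row-level check needs anyDuplicated(europe) or duplicated(europe).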
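
The difference between a column-wise and a row-wise duplicate check can be seen on a small example. This is only a sketch: `toy` is a hypothetical data frame, not part of the hotel dataset.

```r
# Hypothetical toy data: rows 1 and 3 are identical, the two columns differ
toy <- data.frame(gender = c("Male", "Female", "Male"),
                  age    = c(25L, 30L, 25L))

# The default method sees a data frame as a list of columns,
# so it only reports duplicated columns
dup_cols <- anyDuplicated.default(toy)

# The generic dispatches to the data.frame method and compares rows;
# it returns the index of the first duplicated row (0 if there is none)
dup_rows <- anyDuplicated(toy)

# Number of rows that repeat an earlier row
n_dup_rows <- sum(duplicated(toy))

print(c(columns = dup_cols, rows = dup_rows, repeated = n_dup_rows))
```

On the full data, `sum(duplicated(europe))` would give the number of repeated rows. After the id column is dropped, identical rows may simply be different customers who gave the same answers, so they are not necessarily errors.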

Subsetting data

Code
europe_dem=europe %>%
  dplyr::select(1:5,16)

europe_num=europe %>%
  dplyr::select(6:15)

5 Demographic Analysis

1. Age Distribution

Code
library(ggplot2)
library(ggthemes)

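#theme_calc() is applied last, so it sets the final look of the plot
age_plot=ggplot(europe_dem,aes(x=age,fill=factor(age)))+
  geom_bar(stat="count",width=0.5,show.legend = F)+
  theme_bw()+labs(title="Age Distribution",y="Frequency",
                  caption="@Data Insights 2024")+
  ggthemes::theme_calc()

#assumption: the image shown below was exported from this plot
ggsave("age.png",age_plot)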

Code
include_graphics("age.png")

  • Most customers are between 40 and 60 years old.
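
A reading taken from a bar chart can be checked numerically by grouping ages into bands. The sketch below uses only the ages of the first 10 rows shown earlier, so its counts say nothing about the full dataset; the 20-year bands are an assumption chosen for illustration.

```r
# Ages of the first 10 rows shown earlier in the report
age <- c(13, 25, 26, 25, 61, 26, 47, 52, 41, 20)

# Hypothetical 20-year bands; intervals are closed on the right, e.g. (40,60]
age_band <- cut(age, breaks = c(0, 20, 40, 60, Inf))

band_counts <- table(age_band)
band_share  <- round(100 * prop.table(band_counts), 1)

print(band_counts)
print(band_share)
```

Replacing `age` with `europe_dem$age` would apply the same check to all 103,904 customers.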

2. Gender Distribution

Code
europe_dem %>% dplyr::select(gender) %>%
  table() %>% kable()
gender Freq
Female 52727
Male 51177
  • Slightly more customers are female (52,727) than male (51,177).

3. Purpose of Travel Distribution

Code
library(plotly)

#theme_economist() is applied last, so it replaces theme_bw()
td=ggplot(europe_dem,aes(purpose_of_travel,fill=type_of_travel))+
  geom_bar(position = "dodge",stat="count")+theme_bw() +
  labs(fill="Type of Travel",title = "Purpose of Travel Distribution",
       y="Frequency",x="Purpose of Travel")+
  ggthemes::theme_economist(base_size = 5)+
  labs(caption = "@Data Insights 2024")

#assumption: the image shown below was exported from this plot
ggsave("td.png",td)

include_graphics("td.png")

  • Most customers are on group travel rather than personal travel.

  • Group travel for tourism is the largest category across all purposes of travel.

4. Type of Booking Distribution

Code
europe_dem %>% dplyr::select(type_of_booking) %>%
  table() %>% as.data.frame() %>% rename("Type of booking"="type_of_booking",
                                 "Frequency"="Freq") %>%
  kable() 
Type of booking Frequency
Group bookings 49665
Individual/Couple 46745
Not defined 7494
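  • Group bookings are the largest category (49,665), followed by individual/couple bookings (46,745). A further 7,494 bookings are recorded as "Not defined".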

5. Satisfaction Distribution

Code
europe_dem %>% dplyr::select(satisfaction) %>%
  table() %>% as.data.frame() %>% rename("Satisfaction"="satisfaction",
                                         "Frequency"="Freq") %>%
  kable() 
Satisfaction Frequency
neutral or dissatisfied 58879
satisfied 45025
  • A majority of customers (58,879 of 103,904, about 57%) are neutral or dissatisfied.
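
The shares behind this finding follow directly from the two counts in the table. A short check, using only the frequencies reported above:

```r
# Counts copied from the satisfaction table above
satisfaction_counts <- c("neutral or dissatisfied" = 58879, satisfied = 45025)

# Percentage share of each response level
share <- round(100 * satisfaction_counts / sum(satisfaction_counts), 1)

print(sum(satisfaction_counts))
print(share)
```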

6 Further Analysis

1. Services influencing customer satisfaction

Code
library(likert)

#convert every rating column to a factor so likert() treats it as an item;
#the item labels are taken from the cleaned column names
new_lik=as.data.frame(lapply(europe_num,as.factor))

lik=likert(new_lik)

#theme_bw() comes before theme() so the legend position is not reset
invisible(likert.bar.plot(lik)+theme_bw(base_size = 10)+
  theme(legend.position = "bottom")+
  labs(title = "Respondents Distribution",
       subtitle="0=not applicable,1=very dissatisfied,2=dissatisfied,3=neutral,4=satisfied,5=very satisfied",
       caption = "@Data Insights 2024"))

include_graphics("likert.png")

  • Customers are satisfied with the following services:

    other services, check-in/check-out service, stay comfort, cleanliness and common room entertainment

  • Customers are neither satisfied nor dissatisfied with the following services:

    food and drink, hotel location, departure/arrival convenience, hotel WiFi and ease of online booking

  • Suggestion: To increase customer satisfaction and loyalty, hotel management should improve food and drink, hotel location, departure/arrival convenience, hotel WiFi and ease of online booking.
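
The low, neutral and high shares that a Likert bar chart summarises can also be tabulated in base R. The sketch below uses only the first 10 ratings of two items from the str() output earlier, so the percentages are illustrative only. The banding (1-2 low, 3 neutral, 4-5 high, 0 dropped as not applicable) and the helper `band_share()` are assumptions made for this example, not necessarily how the likert package groups the levels.

```r
# First 10 ratings of two items, copied from the str() output earlier
ratings <- data.frame(
  hotel_wifi_service = c(3, 3, 2, 2, 3, 3, 2, 4, 1, 3),
  stay_comfort       = c(5, 1, 5, 2, 5, 1, 2, 5, 3, 3)
)

# Hypothetical helper: percentage of low / neutral / high ratings for one item
band_share <- function(x) {
  x <- x[x != 0]  # 0 means "not applicable" and is left out
  band <- cut(x, breaks = c(0, 2, 3, 5), labels = c("low", "neutral", "high"))
  counts <- table(band)
  round(100 * setNames(as.vector(counts), names(counts)) / length(x), 1)
}

# One row per item, one column per band
summary_table <- t(sapply(ratings, band_share))
print(summary_table)
```

Passing `europe_num` instead of `ratings` would produce the same summary for all ten services.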

7 Predictive Modelling

Application of Ordinal Logistic Regression

NB. Ordinal logistic regression requires a response variable with at least three ordered levels. The response in this dataset has only two levels ("satisfied" and "neutral or dissatisfied"), so a binary-response model is used instead.
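
The choice between an ordinal and a binary model can be made explicit by counting the levels of the response. This is a minimal sketch: `toy` is hypothetical simulated data, and the MASS::polr() call is guarded so the example still runs when MASS is not installed.

```r
# The two response labels found in this dataset
satisfaction <- factor(c("neutral or dissatisfied", "satisfied"))

# An ordinal model needs at least three ordered levels
n_levels <- nlevels(satisfaction)
model_type <- if (n_levels >= 3) {
  "ordinal logistic (MASS::polr)"
} else {
  "binary logistic (glm with family = binomial)"
}
print(model_type)

# Fitting polr() to a two-level response is expected to fail;
# the error message is captured and printed instead of stopping the script
if (requireNamespace("MASS", quietly = TRUE)) {
  set.seed(1)
  toy <- data.frame(y = factor(rep(levels(satisfaction), 10)),
                    x = rnorm(20))
  result <- tryCatch(MASS::polr(y ~ x, data = toy),
                     error = function(e) conditionMessage(e))
  print(result)
}
```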

2. Relationship between predictor variables and dependent variable

Code
library(MASS)

europe_model=europe %>%
  dplyr::select(6:16)

europe_model$satisfaction=ifelse(europe_model$satisfaction=="satisfied",
                                 1,0)

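#no family is given, so glm() uses its default (gaussian) and fits a linear
#probability model; family=binomial would give a binary logistic regression
model=glm(satisfaction ~ .,data=europe_model)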

summary(model)

Call:
glm(formula = satisfaction ~ ., data = europe_model)

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   -0.444257   0.006781 -65.519  < 2e-16 ***
hotel_wifi_service             0.079337   0.001450  54.709  < 2e-16 ***
departure_arrival_convenience -0.056248   0.001007 -55.868  < 2e-16 ***
ease_of_online_booking         0.032373   0.001448  22.358  < 2e-16 ***
hotel_location                -0.012301   0.001206 -10.204  < 2e-16 ***
food_and_drink                -0.038266   0.001410 -27.147  < 2e-16 ***
stay_comfort                   0.072424   0.001432  50.583  < 2e-16 ***
common_room_entertainment      0.082391   0.001743  47.266  < 2e-16 ***
checkin_checkout_service       0.062746   0.001097  57.184  < 2e-16 ***
other_service                  0.038369   0.001355  28.325  < 2e-16 ***
cleanliness                    0.009213   0.001644   5.605 2.09e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.1751888)

    Null deviance: 25514  on 103903  degrees of freedom
Residual deviance: 18201  on 103893  degrees of freedom
AIC: 113890

Number of Fisher Scoring iterations: 2
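  • All predictor variables are statistically significant. Seven have a positive relationship with customer satisfaction; departure/arrival convenience, hotel location, and food and drink have a negative one.

  • Because glm() was called without family = binomial, the output above is a Gaussian (linear probability) model rather than a binary logistic regression, so the estimates are changes in the probability of being satisfied, not in the log odds. The estimate of 0.079337 for hotel wifi service means that a one-point increase in the wifi rating is associated with an increase of about 0.079 (7.9 percentage points) in the probability of being satisfied, holding all other variables constant.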
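
How the family argument changes the meaning of glm() coefficients can be sketched on simulated data. Everything here is hypothetical: the variable names, the sample size and the assumed log-odds slope of 0.6 are illustration values, not estimates from the hotel dataset.

```r
set.seed(1)

# Hypothetical simulated data: one 1-5 rating and a 0/1 satisfaction outcome
n <- 2000
wifi <- sample(1:5, n, replace = TRUE)
p <- plogis(-2 + 0.6 * wifi)              # assumed true log-odds slope of 0.6
satisfied <- rbinom(n, size = 1, prob = p)
sim <- data.frame(satisfied, wifi)

# Without family, glm() defaults to gaussian:
# the slope is a change in the probability of being satisfied
lpm <- glm(satisfied ~ wifi, data = sim)

# With family = binomial, the slope is a change in the log odds
logit <- glm(satisfied ~ wifi, data = sim, family = binomial)

print(family(lpm)$family)
print(family(logit)$family)
print(coef(lpm)["wifi"])          # probability scale
print(coef(logit)["wifi"])        # log-odds scale
print(exp(coef(logit)["wifi"]))   # odds ratio per one-point rating increase
```

Applied to this report, the equivalent call would be `glm(satisfaction ~ ., data = europe_model, family = binomial)`; its coefficients would differ from the table above and would need to be read on the log-odds scale.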

3. Most important services influencing customer satisfaction

Code
#coefficients sorted from largest to smallest; row names hold the term names
model$coefficients %>% as.data.frame() %>%
  rename("Magnitude"=".") %>% 
  arrange(desc(Magnitude)) %>% kable()
Magnitude
common_room_entertainment 0.0823914
hotel_wifi_service 0.0793373
stay_comfort 0.0724241
checkin_checkout_service 0.0627459
other_service 0.0383686
ease_of_online_booking 0.0323731
cleanliness 0.0092125
hotel_location -0.0123009
food_and_drink -0.0382663
departure_arrival_convenience -0.0562483
(Intercept) -0.4442572
  • The coefficients are signed, so the table ranks the terms from the most positive to the most negative association with satisfaction (the intercept is not a service). The largest positive associations are for common room entertainment, hotel wifi service, stay comfort and check-in/check-out service, followed by other service, ease of online booking and cleanliness.

  • Suggestion: The customer service department should keep improving the services listed above to raise customer satisfaction. The services with a negative coefficient (departure/arrival convenience, food and drink, and hotel location) also deserve attention, since higher ratings for them are not currently associated with higher overall satisfaction.
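
Sorting by signed value places the negative coefficients at the bottom even when their size is large. One alternative, sketched below, ranks the services by the absolute size of the estimate and reports the direction separately. The estimates are copied from the model summary; treating absolute coefficient size as importance is an assumption that is only reasonable here because all ten ratings share the same 0 to 5 scale.

```r
# Estimates copied from the model summary (intercept left out)
estimates <- c(
  hotel_wifi_service            =  0.079337,
  departure_arrival_convenience = -0.056248,
  ease_of_online_booking        =  0.032373,
  hotel_location                = -0.012301,
  food_and_drink                = -0.038266,
  stay_comfort                  =  0.072424,
  common_room_entertainment     =  0.082391,
  checkin_checkout_service      =  0.062746,
  other_service                 =  0.038369,
  cleanliness                   =  0.009213
)

# Rank by absolute size, keeping the sign as a separate column
ranked <- estimates[order(abs(estimates), decreasing = TRUE)]
ranking <- data.frame(service   = names(ranked),
                      estimate  = unname(ranked),
                      direction = ifelse(ranked > 0, "positive", "negative"),
                      row.names = NULL)
print(ranking)
```

In this ranking departure/arrival convenience moves up to fifth place, ahead of other service, which the signed table does not show.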

