CDS505 Project LingLong-ZKer-PeiShan-PohLean Sheanlin
Minimizing hotel booking cancellations and maximizing the room occupancy rate are two
impactful ways to drive hotel operating efficiency, lower operating costs, and increase
hotel revenue and management performance. It is therefore common and important for the
hotel industry to analyze hotel booking behaviours. Traditionally, data collected through
hotel surveys were analyzed manually in a spreadsheet and presented as simple statistical
charts.
In the current big data era, traditional data analysis cannot uncover the insights hidden
within enormous amounts of data. The purpose of this research is therefore to demonstrate
how visual analytics using Tableau can improve on traditional statistical charts and present
better, clearer and more meaningful visualizations that outline the main factors behind
hotel cancellations and booking demand.
This paper discusses the principles and vis idiom choices made to address the problem
statement. It also covers how each task abstraction is carried out and how the vis idioms
are validated. The results are shown using Tableau with interactive controls, and the
authors highlight the insights derived from each analysis.
Keywords: visualization analysis, visual analytics, visual design, data science, Tableau, hotel
management, hospitality industry, hotel booking cancellations, hotel booking behaviors, hotel
Chapter 1
Introduction
Rapid technological advancement has brought the world into the Big Data era. This modern
era depends on data for decision making more than ever before. Data visualization is the
representation of data in graphs, charts, maps, or other visualization idioms to expose
meaningful patterns in raw information. An effective visualization system aims to augment
human capabilities rather than replace the human in the decision-making process. In this
study, a visual analysis is carried out on a large dataset from the hospitality domain.
In the travel industry, global hotels and resorts receive millions of travellers every day. The
hotel guests can be classified into different types such as business travel, leisure travel or
family trip. Each hotel guest has their own travel purpose and different hotel experience
expectations. Since hotels have a fixed inventory and sell a perishable “product”, as a way to
make the right room available to the right guest, at the right time, hotels accept bookings in
advance (Antonio et al., 2017). In order to improve booking strategy and optimize room
occupancy, an analytics solution that adopts visualization technology is applied to
discover the trends of hotel booking and cancellation.
Recently, much effort has gone into machine learning to predict hotel booking
cancellations. However, visual examination and analysis of the prediction results have not
been well presented. In this study, the analytical work in the hotel and hospitality
industry focuses on visual analysis of hotel bookings and the likelihood of a reservation
being cancelled. Hotel marketing strategies can be refined through the analysis of customer
feedback, transactional activity, demographic data, hotel amenities and so on. In addition,
data visualization is essential for understanding the distribution of booking cancellations
based on relevant data such as lead time and the customer's past cancellation record. The
visualization system used in this study is Tableau, an interactive data visualization
software that helps users see and understand data by creating visualizations with a
drag-and-drop interface.
Chapter 2
Domain Background
A reservation is a binding contract consisting of mutual promises: the hotel agrees to provide
the accommodation at the quoted rate, and the client agrees to pay (Pestronk, 2017). Advance
booking is an arrangement to book a hotel room before arrival, and usually comes with an
option to cancel prior to the service provision. From the viewpoint of hotel management,
advance booking is a forecast of performance. However, the option to cancel the reservation
puts the risk on the hotel, which bears the cost of vacant capacity when a customer cancels a
booking or does not show up. To manage the risk of losses from vacant capacity, some hotels
implement an overbooking policy in which the total number of rooms reserved exceeds the
total number of rooms available for the same period. Choosing the overbooking strategy is a
very important decision, because overbooking that leaves a customer unable to check in
damages the hotel's reputation.
To overcome this situation, visual analysis of hotel booking demand and cancellations from
historical transaction records could help to discover cancellation trends and produce
better forecasts that reduce uncertainty in management decisions. Multiple visualization
techniques and encodings will be applied based on the given dataset and its derived
attributes. Through the various visualization idioms, booking trends by period, the
cancellation rate and other underlying information can be presented to hotel management in
an intuitive way to support further analysis and better decision making. This information
is useful for understanding how many rooms a hotel should sell and which overbooking
techniques should be applied. Along with this, the insights from the visualization can be
used in marketing strategy planning to attract consumers' attention, improve guest
satisfaction, increase customer retention and reduce the chance of cancellation.
In a previous project in the hospitality industry domain, data mining was performed on a
dataset of hotel reviews from TripAdvisor to analyze the average ratings given by reviewers
for each hotel aspect. MapReduce techniques were applied to compute the total reviews per
hotel and the average rating across the aspects; no visualization analysis was performed on
that dataset. The goal of the previous project was to help travellers decide on the most
appropriate hotel according to their preferences. In contrast, the aim of this visualization
analysis is to increase hotel revenue by optimizing booking capacity. Nevertheless, the two
are related, because the cancellation policy is also one of the factors that influence
customers’ preferences.
Chapter 3
In this section, the dataset of hotel booking and cancellation transactions is introduced.
The data is then explored using data abstraction, and task abstraction is used to understand
it further in domain-specific terms.
3.1 Dataset
This hotel booking dataset was obtained from Kaggle (Kaggle, n.d.). The dataset contains
booking information for two types of hotel: a city hotel and a resort hotel. In addition, it
includes information about the market segment, such as whether a booking is a corporate
booking or a direct booking. For the categorical variables, the “NULL” value in Agent and
Company is not considered a missing value but rather “not applicable”. That is, if a
booking's “Agent” value is “NULL”, the booking was not made through a travel agent.
There are also other variables in the table, such as the number of adults and whether or not
the family brings along children and babies. The visualization analysis is useful for
discovering the peak season of hotel booking by month, the average duration of each stay,
and whether booking demand is higher on weekends or weekdays. Most importantly, the data
also provides the cancellation history of the customer and the booking channel of each
transaction.
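The treatment of “NULL” described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sample rows are invented, the column names are assumed to follow the Kaggle CSV header, and the helper `booked_via_agent` is hypothetical rather than part of the project's Tableau workbook.

```python
import csv
import io

# Hypothetical three-row excerpt; column names are assumed to match the CSV header.
SAMPLE_CSV = """hotel,market_segment,agent,company
City Hotel,Online TA,9,NULL
Resort Hotel,Direct,NULL,NULL
City Hotel,Corporate,NULL,223
"""


def booked_via_agent(row):
    """Return True when the booking has a travel-agent ID.

    "NULL" is read as "not applicable" (no agent involved), not as missing data.
    """
    return row["agent"] != "NULL"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
via_agent = [booked_via_agent(row) for row in rows]
print(via_agent)
```

Keeping “NULL” as an explicit category, instead of letting a loader convert it to a missing value, preserves the "not applicable" meaning for later grouping.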
The dataset type is a table with 32 attributes and 119,390 items, stored in a comma-separated
values file. The attribute types and descriptions are given in Table 1 below.
Attribute | Type | Description
is_canceled | Categorical | Value indicating if the booking was canceled (1) or not (0)
lead_time | Ordered (Quantitative) | Number of days that elapsed between the entering date of the booking into the PMS and the arrival date
stays_in_week_nights | Ordered (Quantitative) | Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
distribution_channel | Categorical | Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”
is_repeated_guest | Categorical | Value indicating if the booking name was from a repeated guest (1) or not (0)
assigned_room_type | Categorical | Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons.
days_in_waiting_list | Ordered (Quantitative) | Number of days the booking was in the waiting list before it was confirmed to the customer
customer_type | Categorical | Type of booking, assuming one of four categories: Contract - when the booking has an allotment or other type of contract associated to it; Group – when the booking is associated to a group; Transient – when the booking is not part of a group or contract, and is not associated to other transient booking; Transient-party – when the booking is transient, but is associated to at least other transient booking
adr | Ordered (Quantitative) | Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights
total_of_special_requests | Ordered (Quantitative) | Number of special requests made by the customer (e.g. twin bed or high floor)
reservation_status_date | Ordered (Ordinal) | Date at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotel
The following analysis tasks, expressed as domain problems, need to be explored in order
to discover and identify potential trends and insights from the hotel booking dataset.
1. What is the overall trend of hotel booking demand and cancellation rate from Aug
2015 to Aug 2017 and future?
2. What are the features that contributed to the booking cancellation?
3. What is the hotel pricing strategy and how to determine the sales price for each day?
4. What is the distribution of the market segment in the hotel booking demand?
5. Which countries are leading the race of hotel reservations?
These five domain problems on hotel booking demand can be addressed and presented using
visualization. The relevant attributes and items from the data will be visualized by
applying appropriate marks and channels. A series of visualizations can help end-users,
especially hotel management and hospitality marketers, to search and compare the data
distributions.
Ultimately, hospitality investors will be able to make well-informed decisions when
planning their marketing strategy, based on this visualization analysis, to achieve better
growth in the hospitality sector in future.
Chapter 4
Related Work
In a data article published by Antonio et al. (2018), the authors performed descriptive
analytics to further understand the patterns, trends and anomalies in the data (Antonio et
al., 2017). The visualization tool used by the authors is the Tabplot R package, a common
and powerful visualization method for large datasets.
Figure 1 shows the authors' partial visualization of all observations in the City Hotel
dataset. Manipulated views are applied in order to represent data with many attributes and
to compare trends across different periods. For instance, the number of adults was encoded
with blue bars and the hotel meal types were encoded with stacked bars distinguished by
hue. By analyzing such figures, users can verify that, for both hotels, the distribution of
variables like Adults, Children, StaysInWeekendNights, StaysInWeekNights, Meal, Country and
AssignedRoomType is clearly different between non-cancelled and cancelled bookings.
Figure 1: City Hotel Dataset Partial Visualization of All Observations (Antonio et al., 2018)
Besides that, the authors performed a similar visual analysis of a different dataset from
four resort hotels. That analysis focused on how, to circumvent the problems caused by
booking cancellations, hotels implement rigid cancellation policies and overbooking
strategies, which can also have a negative influence on revenue and reputation (Antonio et
al., 2017). The authors encoded the data with a stacked bar chart representing the booking
cancellation ratio per year among the four hotels, where the red colour represents the
cancellation rates, which ranged from 11.8% to 26.4%, as shown in Figure 2 below. This
diagram is a very good example for designers of how to show the relationship between one
categorical attribute (hotel) and one quantitative attribute (booking rate), and it is easy
to understand from a user perspective.
Chapter 5
To address the analysis tasks listed in the task abstraction of the previous section, three
marks are used as the core of the design space of visual encodings (Munzner, 2014), namely
line, point and area. They are described below through different idioms. Table 2 shows
the usage of idioms for each domain problem.
Bubbles Plot √
Box Plot √ √
Area Heatmap √
Maps √ √
Pie Chart √
Both the line chart and the bar chart commonly use line marks to show the overall patterns
of data. Both are suited to one categorical attribute and one quantitative attribute,
encoded with the horizontal spatial position channel and the vertical spatial position
channel respectively. In addition, they are useful for time-series data, comparing changes
over the same period of time for more than one group.
Point marks are mostly used to encode quantitative attributes. A popular vis idiom, the
scatter plot, encodes two quantitative attributes using both vertical and horizontal
spatial position. It clearly shows a positive or negative correlation between the two
attributes and helps to detect outliers. The bubble plot extends the scatter plot with a
third dimension, in which the size channel encodes an additional numeric variable, with
larger bubbles corresponding to higher rates (Yi, 2019). Another useful idiom, the box
plot, encodes a numeric attribute in order to show its statistics or behaviour. For
instance, ADR is a variable related to revenue and can be encoded to display the median,
upper/lower quartiles and maximum/minimum (ferdio, 2017).
Besides that, area marks are used in idioms such as the heatmap, map and pie chart. The
heatmap idiom shows many keys and many values, often recursively subdividing space into
many regions (Munzner, 2014). Hence, it is able to show a compact summary of a
quantitative attribute with the colour saturation channel, for example through a diverging
colour map. Next, maps are used to reveal location or geographic patterns for analysis
tasks involving location, such as country, while the pie chart applies a radial spatial
layout that relies on area judgements to show the relative contributions of parts to the
whole, which means it aggregates the population data for the selected attributes.
Visual channels control the appearance of marks. In this project, colour hue encoding is
frequently used to represent categorical and quantitative attributes, depending on the
scenario. It can be used as a colour map to show the sum of ADR for each country, or as a
categorical colour to represent the type of hotel.
Furthermore, there is a need to derive new attributes by aggregating items or applying a
formula. The purpose of this approach is to present focus and context information together
while avoiding information overload in a single view. For instance, the attribute “Lead
Time” is aggregated into meaningful bins to fulfil the visualization needs, and the
attribute “Is Cancelled” is used to derive a new attribute called “Cancellation Rate”
through a simple calculation.
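The two derived attributes can be illustrated with a short Python sketch. The bin edges in `LEAD_TIME_BINS` are hypothetical, since the bins used in the Tableau workbook are not stated here, and the sample bookings are invented for illustration.

```python
# Hypothetical bin edges in days; the actual Tableau bins are not given in the text.
LEAD_TIME_BINS = [(0, 7), (8, 30), (31, 90), (91, 180), (181, 365)]


def lead_time_bin(days):
    """Map a lead time in days to a categorical bin label."""
    for low, high in LEAD_TIME_BINS:
        if low <= days <= high:
            return f"{low}-{high}"
    return ">365"


def cancellation_rate(flags):
    """Share of bookings whose is_canceled flag is 1 (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


# Invented (lead_time, is_canceled) pairs.
bookings = [(3, 0), (45, 1), (200, 1), (400, 0)]
bins = [lead_time_bin(lead_time) for lead_time, _ in bookings]
rate = cancellation_rate([canceled for _, canceled in bookings])
print(bins, rate)
```

Binning turns a quantitative attribute into an ordered categorical one, which is what allows lead time to be used as a key in bar charts and heatmaps.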
To increase the effectiveness of the visualization, dynamic filtering should be provided so
that users can select and enter parameters to narrow the scope while viewing the
visualization. This is more flexible and allows users to restrict the vis idioms to what
they want to see. Meanwhile, multiple views are used to provide more relevant information
or details through different visual idioms and encodings, so that the user can attend to
other views without invoking motion-sensitive peripheral vision (Munzner, 2014). To make the
multiple views more effective, shared encoding is used to enable linked highlighting
between the views. For example, the items that are interactively selected in one view are
immediately highlighted in all other views with the same highlight colour (Munzner, 2014).
Last but not least, the data visualization tool used for analysis in this project is
Tableau, described as a powerful and the fastest-growing tool in the Business Intelligence
industry (Rungta, 2020). The main reason is that it does not require technical skills such
as programming to construct or manipulate the visual idioms, so it is easy for designers or
developers to use. It also has a rich feature set and allows designers to build and
customize reports ranging from a single visual diagram to an interactive dashboard with
multiple visuals.
Chapter 6
In this section, several visual diagrams are presented based on the task analysis and
visualization methods described above.
Domain Problem:
What is the overall trend of hotel booking demand and cancellation rate from Aug 2015 to
Aug 2017 and future?
Task:
Providing an overview of booking and cancellation.
The first thing a hotel manager wants to see is a general view of booking and cancellation
transactions across the timespan from Aug 2015 to Aug 2017. In Figure 3, the line chart
idiom connects the data points with a line, which is the most effective way to display
time-series data. With the given dataset, this line chart visually conveys the trend of
booking and cancellation transactions for the time frame. The line charts for City Hotel
and Resort Hotel are arranged one above the other for more intuitive comparison.
The visualization shows that resort hotel sales were stagnant from 2015 to 2017. From a
market perspective, resort hotels did not see much business growth over time and their
sales are comparatively lower than those of city hotels. However, the total cancellation
count of resort hotels is fairly consistent with the total booking count, so forecasting
sales and the cancellation rate will be more straightforward. For city hotels, there is
consistent business growth over the years, but the booking trend fluctuates more than for
resort hotels. The charts show that bookings for city hotels are lower at the beginning and
end of the year, with the peak season in the middle of the year. From a hotel marketing
planning perspective, the forecast for city hotels is more challenging because of the
less predictable demand. A summary of the line chart analysis is shown in Table 3 below.
What: Data Table: One quantitative value attribute, one time series key attribute
Strength Discover the trends of time series data clearly and enable the viewer
to make predictions.
6.1.2 Bar Chart
The bar chart is a common visualization tool for comparison. It is an alternative way to see
the comparison of the total booking records versus the cancellation count in the hospitality
industry. Figure 4 shows the total booking records over the period, categorized by month
and year. The height of each bar represents the sum of bookings for both City Hotel and
Resort Hotel in each month, while the horizontal line represents the cancellation
records.
The Tableau forecast function provides a fast and easy way to model the future based on past
history. The forecasts for hotel booking demand and cancellation count are generated with
just a few clicks. The blue bars represent the actual transaction records and the orange
bars represent the forecasted values of booking and cancellation records. The quality and
precision of the forecast have been added to the tooltip, which allows users to read the
values when they hover the mouse over a mark, as shown in Figure 5. The quality metric
ranges from 0 to 100; the higher the value, the better the forecast quality. The precision
metric is expressed as a value range with a 95 percent prediction interval. Some patterns
can be discovered from the chart, such as bookings being higher in spring and summer but
lower in winter. A summary of the bar chart analysis is shown in Table 4 below.
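To give a rough idea of what a forecast built from past history involves, the following Python sketch implements simple exponential smoothing. It is an assumption-laden illustration: Tableau's documentation describes its forecasts as exponential smoothing models, but this is only the simplest variant (no trend or seasonality), not Tableau's model selection, and it does not reproduce the quality or precision metrics. The monthly counts and the smoothing factor `alpha` are invented.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast for a non-empty series.

    Each new level is a weighted average of the latest observation and the
    previous level; alpha in (0, 1] controls how fast older data is discounted.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly booking counts, for illustration only.
monthly_bookings = [100, 120, 110, 130]
forecast = simple_exponential_smoothing(monthly_bookings, alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast follow recent months more closely, while a smaller one averages over a longer history.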
Figure 5: Quality and Precision of Forecast Metric
What: Data Table: One quantitative value attribute, one categorical attribute
Strength Compare booking demand and cancellation count for each month
Domain Problem:
What are the features that contributed to the booking cancellation?
Task:
Providing progressive information for booking cancellation.
In order to understand the reasons for cancellation, further analysis is performed to study
the relationship among cancellation rate, seasons, lead time, repeated guests and previous
cancellations. These data with multiple attributes are encoded into multiform views as shown
in Figure 8. Prior to visualizing the data, the seasons were derived from the original
attribute “Arrival Date Month”, and the range of lead time was grouped into multiple bins
as categorical data; samples are shown in Figure 6 and Figure 7 below.
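The season derivation can be sketched in Python as a simple lookup. The mapping assumes northern-hemisphere meteorological seasons, which is consistent with September to November being treated as Fall in this analysis; the exact Tableau calculated field is not reproduced, and the names `SEASON_BY_MONTH` and `season_of` are hypothetical.

```python
# Assumed month-to-season mapping (northern-hemisphere meteorological seasons).
SEASON_BY_MONTH = {
    "December": "Winter", "January": "Winter", "February": "Winter",
    "March": "Spring", "April": "Spring", "May": "Spring",
    "June": "Summer", "July": "Summer", "August": "Summer",
    "September": "Fall", "October": "Fall", "November": "Fall",
}


def season_of(arrival_month):
    """Derive the season from a full month name such as "July"."""
    return SEASON_BY_MONTH[arrival_month]


arrival_months = ["July", "October", "January", "April"]
seasons = [season_of(month) for month in arrival_months]
print(seasons)
```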
According to Figure 8, the bar chart at the top left shows the number of cancellations for
each season. It is the easiest way to present the distribution of cancellations for each
hotel. Many travellers changed their summer plans, which resulted in a spike in
cancellations for both hotels. In contrast, cancellations are lowest during the winter due
to the off-season. In addition, the line charts at the top right show the trend of booking
cancellations by “Is Repeated Guest” and “Previous Cancellation”, drawn as a blue line and
rainbow-coloured lines respectively.
Figure 8: Interactive Dashboard for Cancellation Rates Analysis
As a result, 119 records show repeated guests cancelling their bookings in July, while
there are almost 5000 previous cancellation records in total from September to November.
These 5000 records represent 50% of total cancellations for the Fall season. The vis at the
bottom shows the trend of cancellations by lead time range. The most interesting finding is
that the shorter the lead time, the higher the probability that the booking will be
cancelled. Furthermore, hotel management can simply change the filters to analyze different
outputs, as dynamic filtering is available on the right side of the dashboard. For
instance, if they only want to know the trend of cancellations in 2015, they just need to
select the year 2015 in the “Arrival Date Year” field. A summary of the multiple views of
cancellation analysis is shown in Table 5 below.
Table 5: Summary of What - Why - How Analysis for multiple views of cancellation analysis
How: Encode Bar Chart - Spatial Position, Size and Color Hue
Line Chart - Dot chart with connection marks between dots.
Strength Dynamic filtering is allowed, hence the user can simply change the
parameters and focus on selected items.
6.2.2 Multiple Views with Bar in Bar Chart, Packed Bubbles Plot and Box Plot
Other factors that could influence booking cancellation are the deposit type and the lead
time of the booking. In Figure 9, bubble charts offer an alternative way to present
one-to-many comparison using size and a two-colour hue encoding. The blue bubbles represent
city hotels and the orange bubbles represent resort hotels. Surprisingly, the cancellation
rate is highest for the Non-Refundable deposit type, in both resort hotels and city
hotels.
The idiom design choice of multiple views is also applied in this visualization. With the
support of the bar-in-bar chart, users can easily compare values. In this chart, colour and
size distinguish total bookings from cancellations. The length of each bar expresses the
value of each measure for a particular deposit type. The analysis shows that although No
Deposit has the highest number of cancellations, its cancellation rate is only 30.47% for
city hotels and 24.74% for resort hotels. From a hotel sales strategy perspective,
cancellation of a non-refundable booking does not cause loss of income, but in terms of
room occupancy optimization, there is an opportunity to adjust the overbooking rate so that
the vacated rooms can be resold.
Figure 9: Cancellation Rate by Deposit Type
In addition, the box plot shows an aggregate statistical summary of the lead time of hotel
bookings. Five derived values provide information about the lead time distribution: the
median (50% point), the lower and upper quartiles (25% and 75% points), and the upper and
lower fences. The plot shows that the Non Refund deposit type has a higher average lead
time. Plans are more likely to change when a booking is made very early, which may explain
why the cancellation rate is higher for this deposit type. A summary of the multiple views
of cancellation rate by deposit analysis is shown in Table 6 below.
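The five derived values of the box plot can be computed with a few lines of Python. This is a minimal sketch under assumptions: the fences use the common 1.5 × IQR convention, the quartiles use the inclusive interpolation method of the standard library, and the lead times are invented; the settings of the Tableau box plot may differ.

```python
import statistics


def box_plot_summary(values):
    """Return the median, quartiles and fences of a numeric sample.

    The 1.5 * IQR fence rule is an assumed convention, not taken from the report.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_fence": q1 - 1.5 * iqr,
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Hypothetical lead times in days.
lead_times = [0, 10, 20, 30, 40]
summary = box_plot_summary(lead_times)
print(summary)
```

Values that fall outside the two fences are the ones a box plot usually draws as individual outlier points.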
Table 6: Summary of What - Why - How Analysis for multiple views of cancellation rate by deposit analysis
How: Encode Bar in Bar Chart - Spatial Position, size and color hue
Packed Bubble Chart - Point marks with size and color hue
Boxplot - Line, size and opacity
Weakness Packed bubble charts have difficulty to get exact values from the
unordered bubble sizes.
Figure 10: Top Countries with Cancellation in Map View without Zoom-in
Figure 10 above shows the top 10 countries by cancellations. The map is sorted by
cancellation rank. The country with the most cancellations is Portugal, followed by the
United Kingdom, Spain, France, Italy, Germany, Ireland, Brazil, the United States and
lastly Belgium. Figure 11 shows the total cancellations for each country in table form.
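The ranking behind such a view can be sketched in Python by counting cancelled bookings per country and keeping the largest counts. The records below are invented and the three-letter country codes are an assumption about how the country attribute is stored; the real totals are those shown in Figure 11.

```python
from collections import Counter

# Invented (country, is_canceled) pairs, for illustration only.
records = [
    ("PRT", 1), ("PRT", 1), ("PRT", 1), ("PRT", 0),
    ("GBR", 1), ("GBR", 1), ("GBR", 0),
    ("ESP", 1), ("ESP", 0),
    ("FRA", 0),
]

# Count only the cancelled bookings, then rank countries by that count.
cancellations = Counter(country for country, canceled in records if canceled == 1)
top_countries = cancellations.most_common(3)
print(top_countries)
```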
Comparing the table view with the map view, the map view highlights something hotel
management needs to be aware of: 8 of the top 10 countries by cancellations are in
Europe. The map view allows the user to zoom in and focus on the highlighted area, as shown
in Figure 12. The ranking displayed in the map view is configured with a red-to-yellow
luminance colour scale, where red represents the most cancellations, followed by orange,
and yellow the fewest. A summary of the symbol maps analysis is shown in Table 7 below.
Table 7: Summary of What - Why - How Analysis for symbol maps
How: Encode Circle points map to the given country and sequential map by
ranking with luminance color
Strength Easily locate the countries in Map View and able to spot the
frequent cancellation happen in the European countries
Domain Problem:
What is the hotel pricing strategy and how to determine the sales price for each day?
Task:
Providing the behaviour of average daily rates from different aspects and their correlations.
The scatter plot is effective in detecting outliers. As shown in Figure 13, there are
pricing outliers in the average daily rate. A room rate of zero can be interpreted as a
free stay, such as a points redemption, a reward from a hotel loyalty program or a free stay
offered by the hotel to selected guests in exchange for recognition on social media.
However, the average daily rate of 5,400 USD is implausible and is removed from the
subsequent analysis. Likewise, lead time outliers are filtered out in the Tableau data
visualization worksheets.
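The filtering step can be sketched in Python as follows. The cutoff `ADR_CAP` is a hypothetical value chosen only so that the 5,400 USD rate is excluded while zero-priced stays are kept; the actual filter range set in the Tableau worksheets is not stated.

```python
# Hypothetical upper bound for a plausible average daily rate (USD).
ADR_CAP = 1000.0


def is_plausible_adr(adr):
    """Keep zero-priced (free) stays but drop rates above the assumed cap."""
    return adr <= ADR_CAP


# Invented sample of average daily rates, including the 5,400 USD outlier.
adr_values = [0.0, 75.5, 120.0, 5400.0]
kept = [adr for adr in adr_values if is_plausible_adr(adr)]
print(kept)
```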
Figure 13: Analysis of Average Daily Rate vs Lead Time
Figure 14 shows a scatter plot matrix (SPLOM), a technique used to examine the correlation
between a series of variables. This visualization idiom is suitable for this dataset because
the number of attributes shown does not exceed about a dozen. The drawback of this idiom is
the time needed to compute the visualization; it took about 10 seconds to generate the
view. In Figure 14 below, there is a strong relationship between stays in week nights and
stays in weekend nights. If demand on weekdays is high, the same is expected on the
weekend. The rest of the attributes do not show any strong correlation. Nevertheless, some
patterns have been discovered: a longer length of stay has a lower average daily rate, and
last-minute bookings have a chance of hitting the maximum price.
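The strength of a relationship that is read visually from a SPLOM cell can also be quantified. The sketch below computes the Pearson correlation coefficient in plain Python; the week-night and weekend-night counts are invented, and the coefficient is offered as a complementary check rather than as something the report itself computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical stays per booking, for illustration only.
week_nights = [1, 2, 3, 4, 5]
weekend_nights = [0, 1, 1, 2, 2]
r = pearson(week_nights, weekend_nights)
print(round(r, 3))
```

A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 matches the cells described as having no strong correlation.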
Along with interactive filtering, attribute selection can be done by the user. A checkbox
list covers the market segments of the hotel industry, and range sliders cover the average
daily rate and lead time. However, users need some cognitive effort to remember what has
changed from one selection to another. A summary of the SPLOM analysis is shown in
Table 8 below.
Figure 14: Analysis of Average Daily Rate with SPLOM
Strength Detect outliers from the dataset, compare attributes correlations and
find trends
6.3.2 Multiple Views with Boxplot
Hotel revenue optimization means setting the best possible price for hotel rooms. Deposit
type is also one of the factors that influence hotel pricing. In Figure 16, the rate for
city hotels rises consistently from 2015 to 2017 for all deposit types. For resort hotels,
the average daily rate remained stagnant. The visualization analysis also shows that the
average daily rate is lowest for the non-refundable deposit type, whereas no-deposit
bookings have a higher rate. This pattern is commonly seen on hotel booking travel
websites. Based on these findings, hotel sales and marketing departments can adjust the room
rate according to the pricing patterns. A summary of the Boxplot Chart analysis is
shown in Table 9 below.
Table 9: Summary of What - Why - How Analysis for Boxplot Chart analysis
How: Encode One glyph per original attribute expressing derived attribute values
using vertical spatial position, with 1D list alignment of glyphs
separated with horizontal spatial position (Munzner, 2014)
6.4 Analysis of Market Segment
Domain Problem:
What is the distribution of the market segment in the hotel booking demand?
Tasks:
Providing a comprehensive summary of how market segments affect reservations.
Pie charts provide a comprehensive summary of categorical data by showing its distribution.
In Figure 17, pie charts are used to show the distribution of market segments for different
types of customers. They show clearly that most customers whose bookings are associated
with contracts book city or resort hotels through offline or online travel agents. In
addition, transient-party customers prefer to book as a group rather than through a travel
agent, a trend visible from 2015 to 2017. These transient-party customers tend to stay at
resort hotels. In contrast, only a very small portion of travellers booked through other
segments, such as corporate, aviation and complementary, in 2017 compared to 2015, whereas
more and more guests make their reservations through online platforms. Hence, hotel
management, and especially the sales and marketing team, can take the trends shown here
into consideration in order to boost the booking rate in future. A summary of the Pie Chart
analysis is shown in Table 10 below.
Table 10: Summary of What - Why - How Analysis for Pie Chart analysis
How: Encode Area marks (wedges) with angle channel, radial layout (Munzner,
2014)
Strength Summarize and present the large data set in visual, especially for
multiple classes of data.
Weakness Only show the overall patterns in terms of size of the area, but do
not easily reveal the exact values.
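The angle channel summarized in Table 10 maps each category's share of the records to a wedge
angle. The following is a minimal sketch in Python, assuming a hypothetical list of market
segment labels with made-up counts. The names `bookings` and `wedge_angles` are illustrative
and do not come from the project's Tableau workbook.

```python
from collections import Counter

# Hypothetical market-segment labels, one per booking; values are illustrative only.
bookings = ["Online TA"] * 5 + ["Offline TA/TO"] * 3 + ["Groups"] * 2


def wedge_angles(labels):
    """Map each category to (share of records, wedge angle in degrees)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: (n / total, 360.0 * n / total) for k, n in counts.items()}


angles = wedge_angles(bookings)
for segment, (share, angle) in angles.items():
    print(f"{segment}: {share:.0%} of bookings, {angle:.0f} degrees")
```

The sketch also illustrates the weakness noted in Table 10: the wedges convey relative shares,
while the exact counts behind them are not visible from the angles alone.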
6.4.2 Heatmap
One option for showing multiple views is to juxtapose them side by side so that two views can
be compared simultaneously. In terms of human perception, aligned views support
higher-precision comparison than unaligned views. Figure 19 uses shared-encoding views over
subsets of the data: both views are heatmaps that visualize booking cancellations for each
month. The view on the left shows the cancellation rate for the resort hotel and the view on
the right shows it for the city hotel.
Instead of showing the number of cancelled booking records, a derived attribute was created in
Tableau to calculate the percentage of cancellations for each segment by month. The “Is
Canceled” attribute is encoded as 1 or 0, where “1” means Yes and “0” means No. The
cancellation rate is calculated as the sum of “Is Canceled” divided by the count of records,
as shown in Figure 18. For example, if a segment has one booking record in a month and that
booking was cancelled, the cancellation rate is 100%.
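The cancellation rate described above can be sketched outside Tableau as a grouped ratio. The
following is a minimal illustration in Python, assuming hypothetical booking records and
illustrative names (`records`, `cancellation_rates`). It is not the calculated field shown in
Figure 18, only the same arithmetic applied per arrival month and market segment.

```python
from collections import defaultdict

# Hypothetical booking records: (arrival_month, market_segment, is_canceled).
# is_canceled follows the coding described in the text: 1 = Yes, 0 = No.
records = [
    ("July", "Groups", 1),
    ("July", "Groups", 1),
    ("July", "Groups", 0),
    ("July", "Online TA", 1),
    ("July", "Online TA", 0),
    ("August", "Online TA", 0),
    ("August", "Direct", 1),
]


def cancellation_rates(rows):
    """Return {(month, segment): sum(is_canceled) / count(records)}."""
    canceled = defaultdict(int)
    total = defaultdict(int)
    for month, segment, is_canceled in rows:
        canceled[(month, segment)] += is_canceled
        total[(month, segment)] += 1
    return {key: canceled[key] / total[key] for key in total}


rates = cancellation_rates(records)
for (month, segment), rate in sorted(rates.items()):
    print(f"{month:<7} {segment:<10} {rate:.0%}")
```

Each (month, segment) key corresponds to one heatmap cell. The ("August", "Direct") cell
reproduces the example in the text: one booking record that was cancelled gives a rate of
100%.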
The heatmap idiom shows two key attributes (dimensions) and one measure. In this chart, the
arrival month and market segment are the dimensions and the measure is the cancellation
percentage. Figure 20 shows the tooltip that appears when the user hovers the mouse over a
cell. The analysis shows that the overall cancellation rate is higher in city hotels than in
resort hotels. In terms of market segment, group bookings have the highest cancellation rate,
followed by online TA. Offline TA has a fairly high cancellation rate in city hotels, but not
in resort hotels.
One weakness of the above chart is that it does not give an intuitive sense of the
cancellation counts for each segment. A user needs to hover the mouse over each cell to see
the count and must rely on memory to retain the values. As a solution, bar charts are added to
present the overall cancellation counts by segment, at the cost of additional display area. As
shown in Figure 21, the multiple views still fit on one screen. In Figure 22, as an
interactive display, a specific market segment can be selected from the right panel and that
segment is highlighted in all of the charts at the same time. A summary of the multiple-view
heatmap and bar chart analysis is shown in Table 11 below.
Figure 22: Highlighting market segment from multiple views
Table 11: Summary of What - Why - How Analysis for multiple views heatmap and barcharts analysis
What: Data Heatmap: Tables - Two categorical key attributes with one
quantitative value attribute
Barchart: Tables - One categorical attribute with one quantitative
value
Weakness No indication of axes, which may cause confusion or make the chart
meaningless if the user is not familiar with heatmaps.
6.5 Analysis of Top Countries
Domain Problem:
Which countries are leading in hotel reservations?
Task:
Provide an analysis of which top countries are the most active in travelling.
Figure 23 shows a multiform view of the top countries based on the historical hotel booking
data. The intent is to analyze which countries bring the most profit to the hospitality
industry. Based on this distribution, guests from European countries lead the reservations,
especially those from Southern and Central Europe. For instance, Portugal (PRT) accounts for
the highest total of average daily rates, at almost 4.5 million. An interesting finding is
that most of these guests are transient and transient-party customers, and that guests from
the United Kingdom (GBR) and Ireland (IRL) tend to stay at resort hotels.
Furthermore, the views in this visualization share the same data. When the viewer highlights
an item in any view, the other views automatically highlight the linked data at the same time.
An example is shown in Figure 24: when the viewer selects Spain (ESP) on the map, the
corresponding distributions are highlighted in the other views. A summary of the multiple-view
top countries analysis is shown in Table 12 below.
Figure 23: Analysis of Top Countries
Figure 24: Linked Highlighting Between Views for a Selected Country
Table 12: Summary of What - Why - How Analysis for multiple views of top countries analysis
How: Facet A shared data choice is created based on the subset of top countries,
then only using partition into multiform views.
Task Discover and identify the top countries that contribute the most
reservations.
Weakness Fisheye lens idioms are not applicable to the map in this scenario.
Chapter 7
This section discusses the strengths and weaknesses of the visualization approach, the lessons
learned from this project and, lastly, the future work that needs to be undertaken.
In this project, the interactive dashboard allows the user to select marks to focus on the
targeted data: the selected marks are automatically highlighted in the connected views, and
hovering over a mark shows the detailed data in a tooltip. In addition, small-multiple views
are a strength of the approach because different partitions of the dataset are visible
simultaneously, side by side. Examples are illustrated in Figure 19 and Figure 23. The
juxtaposed views allow users to glance quickly between views with minimal interaction. The
design choices are relatively easy for users to understand and interpret at a conceptual
level. Besides that, parameter filtering is available in Tableau, so either the designer or
the viewer can easily select and deselect parameter values to generate the visualization
according to their requirements or preferences. Derived data created from customized
calculations also help to provide additional insights to the end user.
One weakness of the current approach is that it cannot present insightful patterns on the map
idiom using the geographical data in the dataset. Figure 25 shows the initial attempt to
encode the distribution of hotel selection by country on a map. The distribution of the two
property types, city hotel and resort hotel, is difficult to interpret from a user's point of
view. This is because only country-level coordinates are available, so the geographic role
cannot be grouped by ZIP code, city or region. Therefore, when looking into the property type,
the attributes can only be aggregated across countries and cannot be quantitatively encoded
with monotonically decreasing luminance.
Data abstraction and task abstraction encourage considering tasks in abstract form rather than
in domain-specific terms. There are many similarities in what people want to achieve with
visualization, and the same vis tool may be useful in different domains when they share the
same task. After completing this project, we learned that Tableau is a powerful data
visualization tool that can be used in a variety of settings. Firstly, it is important to be
able to aggregate data and to create and derive new attributes from the available data in
order to visualize something meaningful. Apart from the examples stated in the analysis
section, we also created a ranking attribute to rank the countries by cancellation count.
Figure 26 shows the calculated field in Tableau.
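A ranking attribute of this kind can be sketched as a sort over per-country cancellation
counts. The following Python illustration uses hypothetical rows and illustrative names
(`rows`, `rank_by_cancellations`). The exact formula of the Tableau calculated field appears
only in Figure 26, so the tie handling here (competition ranking, where tied countries share a
rank) is an assumption.

```python
from collections import Counter

# Hypothetical (country_code, is_canceled) rows; values are illustrative only.
rows = [
    ("PRT", 1), ("PRT", 1), ("PRT", 1), ("PRT", 0),
    ("GBR", 1), ("GBR", 1), ("GBR", 0),
    ("ESP", 1), ("ESP", 1),
    ("IRL", 0),
]


def rank_by_cancellations(data):
    """Return [(rank, country, count)] using competition ranking (1, 2, 2, 4)."""
    counts = Counter()
    for country, is_canceled in data:
        counts[country] += is_canceled
    # Sort by descending count, then by country code for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    for position, (country, count) in enumerate(ordered, start=1):
        # Assumption: tied counts share the rank of the first country in the tie.
        rank = ranked[-1][0] if ranked and ranked[-1][2] == count else position
        ranked.append((rank, country, count))
    return ranked


ranking = rank_by_cancellations(rows)
for rank, country, count in ranking:
    print(rank, country, count)
```

A derived rank like this could then drive a top-N filter or the ordering of a bar chart
without changing the underlying records.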
From a designer's perspective, we learned to apply the right visual marks and channels for the
different analysis tasks, whether they use existing or new attributes. Applying the right
approach increases the effectiveness of a vis idiom and avoids misleading viewers. For
instance, a scatter plot is suitable for encoding two quantitative attributes to discover
correlations and detect outliers quickly. Multiple views are effective for comparison, but
there is always a trade-off with display area.
7.3 Future work
For the future work of this project, the visual stories can be enhanced into a more robust and
interactive storyline dashboard by looking into the granularity of the data. Capturing the
data at the transaction level allows actionable insights on various dimensions, including
length of stay and lead time. The trends identified help not only in predicting hotel
cancellations but also in forecasting revenue and room resale rate. These derived attributes
help to show embedded information about a selected set as well as overview information about
more of the data, that is, the context. Other than that, future researchers can also analyze
attributes that are not covered in this analysis. In addition, focus+context would be the next
desired vis approach, where the visualization focus changes dynamically as the user interacts
with the system. Constrained navigation could include geometric zooming across the regions of
a certain country to analyze aspects such as the cancellation ratio by market segment at the
region level.
Last but not least, validation is important but also difficult because many considerations
have to be weighed to meet the design goals. It would be a bonus to work with a real case
study and real end users to experience the end-to-end process, from user interviews and
problem definition to data and task abstraction. In future work, we should work closely with a
domain expert to iteratively refine the visualization design. An immediate form of validation
is to interview the target audience to ensure that we are not mischaracterizing the problem,
and a downstream form of validation is to observe how often the target audience uses the
report and to collect comments on areas for improvement.
Chapter 8
Conclusion
In conclusion, this research project has shown that analysis using Tableau can help hotel
management to develop additional visual analyses. This was demonstrated throughout the
analysis process, which moved from basic statistical charts to multiple interactive views of
heatmaps, scatterplots and other visual idioms. Tableau is well suited to non-technical
analysts, who can rely on their own intuition, dragging and dropping the attributes they want
and creating charts by trial and error. However, if the dataset needs advanced data
processing, the data cleaning should be performed in a different tool. Nevertheless, the
principles of effectiveness and expressiveness and the encoding choices have to be considered
in order to choose the best solution. In addition, basic statistical knowledge is still an
advantage for obtaining optimum results from Tableau's customized parameter function. As
mentioned in the discussion section, this project allows other analysts to take up the future
work and deploy an interactive storyboard to benefit the end user. Moreover, an advanced data
science analyst can also combine this analysis with machine learning to perform data modelling
and evaluation.
References
Antonio, N., de Almeida, A., & Nunes, L. (2018). Hotel booking demand datasets. Data in
Antonio, N., Almeida, A., & Nunes, L. (2017). Predicting hotel booking cancellations to
decrease uncertainty and increase revenue. Tourism & Management Studies, 13(2),
25–39. https://doi.org/10.18089/tms.2017.13203
https://www.travelweekly.com/Mark-Pestronk/Eclipse-or-no-hotel-reservation-is-binding-contract
https://datavizproject.com/data-type/box-plot/
https://www.kaggle.com/jessemostipak/hotel-booking-demand.
Rungta, K. (2020, December 14). What is Tableau? Uses and Applications. Guru99.
https://www.guru99.com/what-is-tableau.html
https://chartio.com/learn/charts/bubble-chart-complete-guide/
Yingcai Wu, Furu Wei, Shixia Liu, Au, N., Weiwei Cui, Hong Zhou, & Huamin Qu.
Transactions on Visualization and Computer Graphics, 16(6), 1109–1118.
https://doi.org/10.1109/tvcg.2010.18