Demand forecasting in hospitality using smoothed demand curves
Rik van Leeuwen • Ger Koole
Ireckonu, Olympisch Stadion 43, 1076DE, Amsterdam, The Netherlands
Department of Mathematics, Vrije Universiteit, De Boelelaan 1111, 1081HV Amsterdam, The Netherlands
[email protected] • [email protected]
February 9, 2021
Abstract
Forecasting demand is one of the fundamental components of a successful revenue management system in hospitality. The industry requires understandable models that a revenue management department can adopt to make data-driven decisions. Data analysis and forecasts of demand over the time until the check-in date play an essential role, as this demand differs per day of the week. This paper aims to provide a new model, inspired by cubic smoothing splines, that results in smooth demand curves per rate class over the time until the check-in date. This model regulates the error between data points and a smooth curve, and is therefore able to capture natural guest behavior. The forecast is obtained by solving a linear programming model, which enables the incorporation of industry knowledge in the form of constraints. Using data from a major hotel chain, a lower error and 13.3% more revenue are obtained.
Keywords: Revenue Management, Forecasting, Cubic Smoothing Splines
In hospitality, revenue management (RM) systems typically consist of a forecasting model and a decision support model, where the success of the system heavily depends on the performance of both models and the interaction between them (Rajopadhye et al., 1999). Based on historical reservations, a forecast is created that serves as input for decision optimization, which determines the rate given the capacity of a perishable product. Forecasting demand is therefore one of the key inputs for a profitable RM system (Weatherford & Kimes, 2003). In the closely related airline industry, a 10% increase in forecast accuracy increases revenue by 0.5-3.0% on high-demand flights (Lee, 1990).
To forecast demand, research has been conducted on different types of models, such as time series and machine learning techniques (Claveria et al., 2015). These models typically incorporate features such as seasonality or demand fluctuations to increase forecast accuracy. Whereas research has predominantly focused on forecast accuracy to compare models, there has been less focus on interpretability and explainability, which are seen as important in the hospitality industry (Andrew et al., 1990).
Medium-sized and larger hotel chains have dedicated teams of revenue managers who determine the rates for future dates. They often base their decisions on gut feeling and on information such as booking pace and occupancy. Booking pace is defined as the speed at which reservations are made for a check-in date. To move towards more data-driven decision-making in the hospitality industry, an RM system can support revenue managers. As the success of a system depends highly on the willingness of revenue managers to adopt it, it is important for a system to be transparent, also known as a 'white-box' system (Loyola-González, 2019). Transparency can be created by sharing demand curves with revenue managers.
A demand curve represents the demand over time until a given check-in date, hereafter referred to as a demand scenario. Demand requests are typically used to create the demand scenarios. As requests are constrained by the capacity of the property and the rate of a room at a specific point in time, unconstraining methods are typically used to forecast demand (Queenan, 2009). The benefit of unconstraining methods for RM systems is between 0.… and ….0% in terms of revenue growth (Weatherford & Polt, 2002).
The development of demand for a demand scenario is dynamic; e.g., demand usually increases over time towards a check-in date (Haensel & Koole, 2010). To capture these dynamics, splines are often applied as an unconstraining method to forecast demand because they are sufficiently flexible. A spline is a piecewise polynomial function that can extrapolate, which makes it a suitable technique. Splines are used in other applications as well, e.g., in water demand models (Santopietro et al., 2020) and transportation cost models (Rich, 2018). This paper applies the essence of the cubic spline model, which uses a third-degree polynomial as interpolant.
This paper proposes a white-box forecasting framework that takes into account the practical side of RM. The cubic smoothing spline model has not previously been used in the hospitality industry and can also be applied in other industries, such as the airline and car rental industries. The framework is based on an extensive data analysis in Section 3, after a description of the data in Section 2. The forecast model is presented in Section 4, followed by the decision support model in Section 5. The accuracy measures are given in Section 6. The model is tested in a controlled environment in Section 7. The empirical results are discussed in Section 8, and the conclusion and discussion are presented in Section 9.
The data is pulled from a Property Management System (PMS), a system used in the hospitality industry to register and manage reservations. Table 1 shows the attributes attached to a reservation. Person-related attributes are not included because of the General Data Protection Regulation (GDPR), and only relevant attributes are shown. The hotel used for this research has a single room type, and therefore room type is not included.
The rate of a reservation is the total amount; however, the rate per night may differ if the length of stay is longer than one night. An additional table, in which the rates for individual nights are stored, is joined to the reservation table. This one-to-many relationship between the reservations table and the rates table provides individual night rates. For this research, multi-night reservations are transformed into individual stay nights, and the individual night rates are used as demand requests to create the demand scenarios.
Reservations with a rate equal to 0 are excluded. These are part of a complimentary service where a guest stayed in the hotel without charge. These guests are mostly part of the hospitality chain itself or part of marketing campaigns (e.g., social media influencers).
Variable | Description
Arrival date | Date of arrival of the reservation
Departure date | Date of departure of the reservation
Length of stay | Number of days between arrival date and departure date
Booking date | Date on which the reservation was made
Status | Status of the reservation: 'stay', 'cancellation', or 'no-show'
Rate | Total amount in euro
Group | Boolean indicating whether the reservation is part of a group reservation
Source | Main channel of the reservation
Sub source | Specific channel of the reservation
Market code | Statistical information on booking and revenue development
Table 1: Reservation attributes and descriptions
Data of a single property, located in the Netherlands, is included from the 1st of January 2017 until the 31st of December 2019. With three years of data, yearly seasonality and trends over time can be captured.
The following key performance indicators (KPIs) are used in the hospitality industry to analyse the performance of a property, according to Pizam (2010) and Mauri (2012). Average Daily Rate (ADR) is the sum of room rates from all checked-out reservations divided by the number of occupied rooms. Revenue Per Available Room (RevPAR) is the sum of room rates from stayed reservations divided by the capacity. Occupancy (Occ) is the ratio between the number of occupied rooms and the capacity. Although some rooms are out of order or complimentary, the KPIs in this paper are based on the full capacity of the property.
Reservations can only have a single status: stay, canceled, or no-show. On average, 74.2% of the nights are marked as stay, 23.3% as canceled, and 2.5% as no-show. Table 2 shows the reservations by status per year in percentages. In 2019, the cancellation rate is slightly lower than in 2017 and 2018. The no-show rate is consistent over the years.
Table 2: Reservations by status per year, in % (columns: Stay, Cancellation, No-show; rows: per year and Overall)
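The KPI definitions given earlier in this section translate directly into code. The following is a minimal Python sketch, assuming a hypothetical data layout in which each stay night is a dict with a `status` ('stay', 'cancellation', or 'no-show') and a nightly `rate` in euro; the function name `kpis` is illustrative and not part of the original framework. Following the text, only nights with status stay count as occupied, and RevPAR and Occ use the full capacity.

```python
def kpis(nights, capacity):
    """Return (ADR, RevPAR, Occ) for one date, using the full property capacity.

    Hypothetical layout: each night is a dict with keys "status" and "rate".
    """
    stayed = [night["rate"] for night in nights if night["status"] == "stay"]
    occupied = len(stayed)
    revenue = sum(stayed)
    adr = revenue / occupied if occupied else 0.0  # revenue per occupied room
    revpar = revenue / capacity                     # revenue per available room
    occ = occupied / capacity                       # fraction of capacity occupied
    return adr, revpar, occ


nights = [
    {"status": "stay", "rate": 120.0},
    {"status": "stay", "rate": 80.0},
    {"status": "cancellation", "rate": 150.0},  # excluded from all three KPIs
]
print(kpis(nights, capacity=4))  # (100.0, 50.0, 0.5)
```

By construction, RevPAR equals ADR × Occ whenever at least one room is occupied, which is a useful consistency check when KPI tables are compiled.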
The ADR by status per year is shown in Table 3. Over the years, the ADR increased, indicating that the room rate increased over time. The ADR for canceled and no-show reservations also increased over the years. RevPAR for 2017, 2018, and 2019 is 71.9, 73.…, and ….6, respectively. RevPAR only includes reservations with status stay.
Table 3: ADR by status per year (columns: Stay, Cancellation, No-show, Overall; rows: per year and Overall)
As stated by Shields (2013), the hospitality industry is characterised by seasonality across months and days. When seasonality is poorly understood, it may lead to a slow pace of sales or even lost sales. In this section, the seasonality by month and by day is analysed.
Table 4 shows the three KPIs by month over the available dataset. Each KPI is consistent across the months of the year. Unlike resort properties in popular vacation destinations (Chu, 2009), this property does not experience any KPI fluctuations by month.
Table 4: ADR, RevPAR, and Occ (in %) by month, January to December
Table 5 shows the three KPIs by day of the week over the available dataset. The property typically attracts business guests during the week and leisure guests during the weekend. Occupancy is highest on Friday and Saturday. On Sunday, occupancy is lowest because leisure guests depart on Sunday and business guests arrive on Monday. Because of the lower demand on Sunday, the ADR is also lowest on that day.
Table 5: ADR, RevPAR, and Occ (in %) by day of the week, Monday to Sunday
A detailed view of ADR by day is displayed in Figure 1. ADR increases over the years for every day except Sunday, where it decreases. The highest variability is on Saturday in 2019.
Figure 1: Metric ADR per day of week by year
A detailed view of occupancy by day is displayed in Figure 2. The range of occupancy by day varies over the years; overall, occupancy in 2018 varied less than in 2017 and 2019.
For seasonality patterns by day, the relationship between the booking day of the reservation and the day of the stay is often analysed. This feature is engineered because it is not part of the original dataset. Table 6 shows the relationship between the day the reservation was made and the day the guest spent a night at the property. For example, guests who stay on a Monday book most often on Mondays (19.…).
Table 6: Day of booking versus day of stay, Monday to Sunday
Lead time is the number of days between the booking date and the arrival date. This variable is not present in the dataset and is therefore engineered. Lead time 0 means that the reservation is made on the check-in date itself, the last day a guest can make a reservation. The maximum lead time in the available dataset is 365 days, which implies that reservations can be made up to one year in advance. For analysis purposes, the booking horizon is used as the range in which a reservation can be made, where 0 is the first day and 365 the last day to make a reservation (the check-in date). By applying the booking horizon, time progresses along the x-axis. Analysing this feature provides more knowledge about when guests make reservations, which is important in demand forecasting.
Only 1% of the reservations are made in days 0 to 146 of the booking horizon, and around 5% in days 0 to 252. This implies that 95% of the reservations are made in the last 100 days before the check-in date. In the last 7 days before the check-in date, 27% of the reservations are made. Figure 3 shows the density over the last 100 days before the check-in date; the cumulative density shows a smooth gradient towards 100%.
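The engineered lead time and its mapping onto the booking horizon can be sketched in a few lines of Python. This is an illustrative sketch, assuming reservations are available as `datetime.date` pairs; the function names and the constant `MAX_LEAD_TIME` are hypothetical, with 365 taken from the maximum lead time in the dataset.

```python
from datetime import date

MAX_LEAD_TIME = 365  # longest lead time in the dataset described above


def lead_time(booking_date: date, arrival_date: date) -> int:
    """Days between booking and arrival; 0 means booked on the arrival day."""
    return (arrival_date - booking_date).days


def horizon_day(booking_date: date, arrival_date: date, max_lead: int = MAX_LEAD_TIME) -> int:
    """Position on the booking horizon: 0 is the first bookable day, max_lead the check-in date."""
    return max_lead - lead_time(booking_date, arrival_date)


def share_within(lead_times, days):
    """Fraction of reservations made fewer than `days` days before check-in."""
    return sum(1 for lt in lead_times if lt < days) / len(lead_times)


print(lead_time(date(2019, 6, 1), date(2019, 6, 6)))    # 5
print(horizon_day(date(2019, 6, 1), date(2019, 6, 6)))  # 360
print(share_within([0, 3, 10, 200], days=7))            # 0.5
```

Applied to all reservations, `share_within(lead_times, 7)` corresponds to the share of reservations made in the last 7 days before check-in reported above.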
Figure 3: Density for lead time 265 - 365, left individual and right cumulative
Because the reservations made at time t of the booking horizon depend on the day of the week of the stay (see Section 3.1.2), the density over the booking horizon is further analysed by day in Figure 4. The density does not drop at the same time t for each day, indicating that the underlying demand until the check-in day is influenced by the booking day. For example, for Tuesdays, fewer reservations are made during weekends, whereas for Saturdays, the density is more granular over the booking horizon.
Model
A new method is proposed to forecast demand in hospitality, which estimates a smooth unconstrained demand curve. A demand curve is defined as the number of guests who are willing to buy a product over a period of time. In this paper, a product is a room for a specific rate. The booking horizon runs from the first day a room can be reserved until the check-in date $T$, where $t = 1, \dots, T$. The advantage of this method is that the unconstrained demand curve is estimated without any assumptions on its shape, by applying ideas similar to cubic smoothing splines.
A spline is a piecewise polynomial function that only contains non-negative integer powers of $x$. The polynomials can be determined by spline interpolation, where the interpolant is a third-degree polynomial. Spline interpolation is preferred to polynomial interpolation because it avoids Runge's phenomenon, where oscillation occurs at the edges of the interval (Epperson, 1987). A spline function $S(x)$ is defined for a set of $n$ observations $\{(x_k, y_k) : k = 1, \dots, n\}$, where a single $y$ value exists for each $x$, such that:
1. $S : \mathbb{R} \to \mathbb{R}$;
2. $S(x)$ is a polynomial of degree 3 on each subinterval $[x_k, x_{k+1}]$, for $k = 1, \dots, n-1$;
3. $S(x_k) = y_k$ for all $k = 1, \dots, n$.
The spline $S(x)$ is a combination of $n-1$ polynomials $C_i$ of degree 3, $i = 1, \dots, n-1$, defined as:
$$S(x) = \begin{cases} C_1(x), & x_1 \le x < x_2, \\ \;\vdots & \\ C_i(x), & x_i \le x < x_{i+1}, \\ \;\vdots & \\ C_{n-1}(x), & x_{n-1} \le x \le x_n. \end{cases} \tag{1}$$
Subject to several conditions, the system can be solved, so that the spline function is defined for a given set of observations. These conditions are:
• $C_i(x_i) = y_i$ and $C_i(x_{i+1}) = y_{i+1}$ for all $i = 1, \dots, n-1$;
• $C'_i(x_{i+1}) = C'_{i+1}(x_{i+1})$ for all $i = 1, \dots, n-2$;
• $C''_i(x_{i+1}) = C''_{i+1}(x_{i+1})$ for all $i = 1, \dots, n-2$;
• $C''_1(x_1) = 0$ and $C''_{n-1}(x_n) = 0$.
The last condition is known as the natural boundary condition, which prevents the spline from oscillating at the ends. The number of coefficients to be estimated is $4(n-1)$, where $C_i(x) = a_i + b_i x + c_i x^2 + d_i x^3$ for $i = 1, \dots, n-1$; the conditions above provide exactly $2(n-1) + 2(n-2) + 2 = 4(n-1)$ equations. This form of interpolation is known as cubic spline interpolation because each piece is a cubic polynomial.
This method can be applied in hospitality, where $\hat{S}$ represents the demand curve for a specific demand scenario. The value $x$ represents the day of the booking horizon and the value $y$ represents the number of reservations. When there are multiple $y$ values per $x$, collected in the vector $Y$, the optimization problem changes, since the spline cannot pass through every $y$ value for a given $x$. Therefore, a smoothing spline is required. Consider a set of $n$ observations $\{(x_k, Y_k) : k = 1, \dots, n\}$, modelled by the relation $Y_k = S(x_k) + \varepsilon_k$, where the $\varepsilon_k$ are independent and identically distributed. The cubic smoothing spline estimate $\hat{S}$ of the function $S$, where $g$ is the smoothing parameter, is defined as
$$\min_{\hat{S}} \; (1-g) \sum_k \sum_{y \in Y_k} \big(\hat{S}(x_k) - y\big)^2 + g \int \hat{S}''(x)^2 \, dx. \tag{2}$$
In Equation (2), the first term measures the closeness to the data and the second term penalizes the curvature of the function. The smoothing parameter $g$ controls the balance between these terms and should be larger than 0 and smaller than 1. If $g = 0$, there is no smoothing and the resulting function fits the data as closely as possible. If $g = 1$, smoothing is infinite and the result is a straight line.
This model is limited because it cannot incorporate constraints that represent industry knowledge, e.g., that demand cannot drop below zero, or dependencies between multiple rates.
The essence of the cubic smoothing spline model is transformed into a linear function to avoid quadratic minimization. Therefore, both terms of the minimization function are transformed into linear terms. The first term is transformed by taking the absolute value of the difference between the data point and the fit; in doing so, the function becomes more robust against outliers. The second term is transformed into the second difference, $d^{(2)}$. The first difference is defined as $d_k = S(x_{k-1}) - S(x_k)$.
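Writing out the difference of two consecutive first differences shows what the curvature term measures. The expansion in the last line assumes equally spaced days with step $h$ (one day here) and a sufficiently smooth $S$:

$$
\begin{aligned}
d_{k-1} - d_k &= \big(S(x_{k-2}) - S(x_{k-1})\big) - \big(S(x_{k-1}) - S(x_k)\big) \\
&= S(x_{k-2}) - 2\,S(x_{k-1}) + S(x_k) \\
&= h^2\, S''(x_{k-1}) + O(h^4).
\end{aligned}
$$

Up to the constant $h^2$, the absolute second difference is therefore a linear proxy for the magnitude of $S''$, the quantity whose square is integrated in Equation (2).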
As long as the second derivative exists and is continuous, the second difference approximates it and can therefore serve as a linear measure of curvature. The smoothing parameter controls the balance between the two terms and satisfies $0 \le g \le 1$. The linear minimization function is defined as follows:
$$\min \; (1-g) \sum_k |e_k| \cdot w_k + g \sum_{k=3}^{n} \big|d^{(2)}_k\big| \tag{3}$$
The variable $w_k$ represents the weight of each distinct $y$ value, which reduces the number of constraints. The weights per $x$ value sum to 1: the weight is set to 1 divided by the number of $y$ values at $x_k$. The polynomial is defined as $C_i(x) = a_i + b_i x + c_i x^2 + d_i x^3$ for $i = 1, \dots, n-1$. A set of auxiliary decision variables, $e$ and $d^{(2)}$, is added to model the absolute values, subject to the following constraints:
• $e_k \ge S(x_k) - Y_k$ for all $k = 1, \dots, n$
• $e_k \ge -\big(S(x_k) - Y_k\big)$ for all $k = 1, \dots, n$
• $d^{(2)}_k \ge d_{k-1} - d_k$ for all $k = 3, \dots, n$
• $d^{(2)}_k \ge -(d_{k-1} - d_k)$ for all $k = 3, \dots, n$
• $C_k(x_{k+1}) = C_{k+1}(x_{k+1})$ for all $k = 1, \dots, n-2$
• $C'_k(x_{k+1}) = C'_{k+1}(x_{k+1})$ for all $k = 1, \dots, n-2$
• $C''_k(x_{k+1}) = C''_{k+1}(x_{k+1})$ for all $k = 1, \dots, n-2$
• $C''_1(x_1) = 0$ and $C''_{n-1}(x_n) = 0$
• $e_k \in \mathbb{R}_+$ for all $k = 1, \dots, n$
• $d^{(2)}_k \in \mathbb{R}_+$ for all $k = 3, \dots, n$
• $a_i, b_i, c_i, d_i \in \mathbb{R}$ for all $i = 1, \dots, n-1$
An additional constraint requires the demand to be non-negative over the booking horizon. If this restriction is not added, it implies that rooms are given away for free. This constraint is modelled as follows:
• $S(x_k) \ge 0$ for all $k = 1, \dots, n$
At least three distinct $x$ values are required to derive the second difference for the curvature penalty. The number of polynomials $C$ equals the number of distinct $x$ values minus 1, and each polynomial contains 4 coefficients. The error between the data and the fit is denoted by $e$; the number of $e$ decision variables equals the number of distinct $x$ values. The number of decision variables that model the curvature penalty $d^{(2)}$ is $n-2$. The total number of decision variables is $\big(4(n-1) + n + (n-2)\big) \times 2$. For example, when the input includes 4 weeks, or 28 days, the total number of decision variables is equal to 324.
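The single-rate linear program can be prototyped with an off-the-shelf LP solver. The sketch below is a simplification under stated assumptions, not the authors' implementation: it uses SciPy's `scipy.optimize.linprog` with the HiGHS method, indexes days from 0, uses one error variable per observation weighted by 1 divided by the number of observations on that day, and optimizes the curve values $s_k = S(x_k)$ directly instead of the cubic coefficients $a_i, b_i, c_i, d_i$. Because Equation (3) and the constraint $S(x_k) \ge 0$ only involve the curve at the knots, and a natural cubic spline can be drawn through any set of knot values, this reduced problem should attain the same optimal objective; the spline coefficients can be recovered afterwards by natural cubic interpolation of the fitted values. The function name `fit_smoothed_curve` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_smoothed_curve(days, values, n_days, g):
    """LP sketch of Equation (3): fit non-negative curve values s_0..s_{n_days-1}.

    days, values: one entry per observation; several observations per day are allowed.
    g: smoothing parameter in [0, 1].
    Variables are ordered as [s (n_days), e (one per observation), d2 (n_days - 2)].
    """
    if n_days < 3:
        raise ValueError("at least three days are needed for a second difference")
    days = np.asarray(days, dtype=int)
    values = np.asarray(values, dtype=float)
    n, m = n_days, len(values)
    counts = np.bincount(days, minlength=n)
    w = 1.0 / counts[days]                       # weights of one day sum to 1
    n_var = n + m + (n - 2)
    c = np.concatenate([np.zeros(n), (1 - g) * w, g * np.ones(n - 2)])

    rows, rhs = [], []
    for j, (k, y) in enumerate(zip(days, values)):
        for sign in (1.0, -1.0):                 # e_j >= +(s_k - y_j) and e_j >= -(s_k - y_j)
            row = np.zeros(n_var)
            row[k] = sign
            row[n + j] = -1.0
            rows.append(row)
            rhs.append(sign * y)
    for k in range(2, n):
        for sign in (1.0, -1.0):                 # d2_k >= +/-(s_{k-2} - 2 s_{k-1} + s_k)
            row = np.zeros(n_var)
            row[k - 2], row[k - 1], row[k] = sign, -2.0 * sign, sign
            row[n + m + k - 2] = -1.0
            rows.append(row)
            rhs.append(0.0)

    # All variables are non-negative; for s this is the constraint S(x_k) >= 0.
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n_var, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


# Hypothetical example: a linear trend over 10 days with one outlier on day 5.
values = [float(k) for k in range(10)]
values[5] = 20.0
curve = fit_smoothed_curve(list(range(10)), values, n_days=10, g=0.5)
print(np.round(curve, 2))  # stays on the line s_k = k; the outlier is not followed
```

The example illustrates the robustness argument above: with absolute errors, following the outlier would cost more curvature penalty than it saves in fit. For several rates, one such block of variables per rate would be stacked, and the ordering constraint on the rate curves becomes extra rows of the form $s_{r+1,k} - s_{r,k} \le 0$.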
Observations with multiple $y$ values for a single $x$ can indicate arrivals for different rates on a specific day. As multiple rates for a single room are common in the hospitality industry, the cubic spline model is adjusted. Rates are denoted by $r$, for $r = 1, \dots, R$. Consider $N$ observations $\{(x_{r,k}, Y_{r,k}) : k = 1, \dots, n;\ r = 1, \dots, R\}$, where $Y_{r,k}$ represents the number of sales for rate $r$ on day $k$ of the booking horizon. The spline $S_r$ now represents the demand curve for rate $r$:
$$\min \; \sum_r (1-g_r) \sum_k |e_{r,k}| \cdot w_{r,k} + \sum_r g_r \sum_{k=3}^{n} \big|d^{(2)}_{r,k}\big| \tag{4}$$
An additional constraint is added to the LP model to incorporate that people are always willing to book a room if its rate is lower than their willingness-to-pay (Haensel & Koole, 2010). For example, a person who books a room at a rate of 300 would also have booked that room at a rate of 200 or even 100. The rates of 100, 200, and 300 are denoted A, B, and C, respectively; rate A is the lowest rate for a room. These choice sets are defined as sets of alternatives in order of preference. In the example, choice set {A, B, C} contains guests who are willing to spend at least 100, choice set {B, C} guests who are willing to spend at least 200, and choice set {C} guests who are willing to spend 300 or more. Given that the choice sets are strictly ordered, an additional constraint is added to the linear model:
• $S_r \ge S_{r+1}$ for all $r = 1, \dots, R-1$
With $R$ rates, the total number of decision variables becomes $\big(4(n-1) + n + (n-2)\big) \times 2 \times R$.
Residuals
To assess the goodness of fit, the residuals are analysed. The residuals are the differences between the actual values and the estimated values. For each $t$ of the booking horizon, a transformation is applied to the input values to ensure the residuals are independent of demand. Arrivals follow an inhomogeneous Poisson process with rate $\lambda_t$ (Lee, 1990). Consequently, the standard deviation of the residuals is expected to increase (and thus the goodness of fit to decrease) when the rate increases. To stabilize the standard deviation, the square root of the original data points is taken as input to the model (Brown et al., 2001). This variance-stabilizing transformation is closely related to the Anscombe transform.
The performance of the forecasting model can be expressed in revenue by applying decision optimization. Revenue is used because it is the main driver in the hospitality industry. Dynamic programming (DP) is a widely studied and adopted technique for decision optimization (Talluri & Ryzin, 2004a, 2004b). DP uses demand estimates to maximize revenue given the capacity of a property. This section provides a brief description of the application of DP in hospitality.
Because only a single booking per period is considered, so that the rate can change after each sold room, time is divided into intervals small enough that at most one booking can be made per interval. The variable $r_j$ denotes a rate, for $j = 1, \dots, R$. The rate classes are ordered as follows: $r_1 > r_2 > \dots > r_j > \dots > r_R$. The probability of a booking for rate $r_j$ at time $t$ is denoted by $\lambda(r_j, t)$. This assumes an inhomogeneous Poisson process, where $\lambda(r_j, t) < 1$. The total expected revenue follows from the value function $V$, which is defined as:
$$V_t(x) = \max_{j \in \{1, \dots, R\}} \Big\{ \lambda(r_j, t)\,\big(r_j + V_{t+1}(x-1)\big) + \big(1 - \lambda(r_j, t)\big)\, V_{t+1}(x) \Big\} \tag{5}$$
Following Bellman's principle of optimality, the value $V_t(x)$ denotes the total expected revenue from time $t$ onward given capacity $x$, where $t = 1$ represents the first day a room is available for booking and $t = T$ the check-in date. The total expected revenue is obtained by calculating $V_1(x)$ for a given capacity $x$.
When solving for the value function $V$, two boundary conditions apply. First, the value function is 0 for any $t$ when no capacity is left, that is, $V_t(0) = 0$. Second, the value function is 0 for any $x$ when there is no time left to make a reservation, that is, $V_T(x) = 0$.
Accuracy is measured between the observed value $a$ and the forecast value $f$. Observations used for fitting a model are referred to as in-sample data points, whereas observations used to measure accuracy are known as out-of-sample data points. A distinction needs to be made between results based on in-sample and on out-of-sample data points.
To measure the accuracy of the model, the Weighted Absolute Percentage Error (WAPE) is applied:
$$\mathrm{WAPE} = \frac{\sum_{t=1}^{n} |a_t - f_t|}{\sum_{t=1}^{n} a_t} \tag{6}$$
Among the accuracy measures, WAPE is the most appropriate here because it weights the errors by the actual values, which places more importance on higher actual values. Unlike, for example, the Mean Percentage Error (MPE) and the Mean Absolute Percentage Error (MAPE), WAPE remains defined when actual demand is zero at some time $t$.
The result indicates the closeness between the observed values $a$ and the forecast values $f$. Demand is assumed to follow a Poisson distribution, which results in a baseline error due to the variability of the distribution. Given the demand curves over time for each choice set, the decision support model optimises the expected revenue given the capacity. The revenue of the actual result of a demand scenario is compared to the optimal expected revenue. To summarize, performance is expressed in two ways: WAPE and revenue.
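Equation (6) is straightforward to compute. A minimal Python sketch follows; multiplying by 100 is an assumption made so that the values read as percentages, matching how WAPE values are reported in this paper, and the function name `wape` is illustrative. The example also shows that a zero actual at a single time t is harmless for WAPE, while it would make MAPE undefined; WAPE only fails when all actuals are zero.

```python
def wape(actual, forecast):
    """Weighted Absolute Percentage Error (Equation (6)), scaled to percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(actual)
    if total == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / total


actual = [0, 10, 30]     # zero demand on the first day is allowed
forecast = [2, 8, 33]
print(wape(actual, forecast))  # 17.5
```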
The WAPE is used to express the fit of the demand curve to the data points, and revenue indicates the performance of the model from a business perspective.
To test the validity of the cubic smoothing spline model, simulations are performed using sets of three known 'individual rate' classes (see Figure 5). Following the definition of rates in Section 4.3.1, the 'customer choice set rates' are created. That is, choice set rate class 2 is the sum of individual rate classes 1 and 2, and choice set rate class 3 is the sum of individual rate classes 1, 2, and 3. Guests are always willing to spend less for the same type of room, and thus the demand is added. In the simulations, the booking horizon is set to 28 days, where $t = 28$ is the check-in date.
Figure 5: Simulation scenario with demand curves (L) and customer choice set demand curves (R)
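The construction of the customer choice set curves described above is a running sum over the individual rate classes. A minimal Python sketch, with the hypothetical function name `choice_set_curves` and three short illustrative curves rather than the curves of Figure 5:

```python
def choice_set_curves(individual):
    """Nested choice-set demand curves from individual rate-class curves.

    individual[r][t]: demand of individual rate class r + 1 on day t.
    Returns curves where curve r + 1 is the sum of individual classes 1..r + 1.
    """
    curves, running = [], [0.0] * len(individual[0])
    for curve in individual:
        running = [acc + d for acc, d in zip(running, curve)]
        curves.append(running)
    return curves


individual = [[1.0, 2.0, 0.5], [1.0, 1.0, 1.0], [0.0, 0.5, 3.0]]
print(choice_set_curves(individual))
# [[1.0, 2.0, 0.5], [2.0, 3.0, 1.5], [2.0, 3.5, 4.5]]
```

Because individual demand is non-negative, each choice-set curve is at least as large as the previous one on every day, which is the kind of nesting that the ordering constraint of the LP model enforces.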
Table 7 shows the functions used for the three 'individual rates'. Rate class 1 represents guests who are looking for the best deal; close to the end of the booking horizon, their demand decreases due to increasing pressure from competition. Rate class 3 represents business guests who book a room last minute and only make reservations near the check-in date. Rate class 2 is a combination of the two previously mentioned guest types and is simulated as constant demand.
Rate | Curve parameters
1 | … ∗ sin(t)
2 | … ∗ t
3 | … ∗ exp(… ∗ t)
Table 7: Parameters of rate classes used for simulation
Demand at time $t$, denoted $\lambda_t$, is simulated by setting a single rate for a given $t$ of the booking horizon. Which rate is open is sampled from a trial in which each rate class has a chance to be open. Finally, a Poisson distribution with rate $\lambda_t$ is applied to simulate arrivals. Cancellations and no-shows are not included in the simulations, which implies that no data is generated once maximum capacity is reached.
To calculate the expected revenue, a price is assigned to each rate class. The three rate classes and their corresponding prices are defined as $r = \{…, …, …\}$. The expected optimal revenue is 17,327 given a capacity of 100 rooms, which corresponds to an ADR of 173.… In Figure 6, the left-hand side is the result with smoothing parameters $\{…\}$ and the right-hand side is the result with smoothing parameters $\{…\}$. The optimal expected revenues with these two smoothing parameter settings are 16,819 and 17,…, respectively.
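The simulation procedure above can be sketched as follows. This is a hypothetical illustration: the curve shapes, opening probabilities, and function names (`simulate_scenario`, `poisson`) are assumptions, the individual curves are not the Table 7 parameters, and Poisson draws use Knuth's multiplication method on top of Python's standard `random` module rather than any package used by the authors. As described in the text, a single rate class is open per day, arrivals are Poisson with the demand of the open choice set, and no data is generated once capacity is reached.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_scenario(choice_curves, open_probs, capacity, rng):
    """Simulate one demand scenario as a list of (day, open_class, bookings).

    Each day a single rate class is opened at random with probabilities open_probs;
    bookings are Poisson with the open class's choice-set demand on that day.
    Without cancellations, nothing is generated once capacity is sold out.
    """
    sold, records = 0, []
    for t in range(len(choice_curves[0])):
        r = rng.choices(range(len(choice_curves)), weights=open_probs)[0]
        bookings = min(poisson(choice_curves[r][t], rng), capacity - sold)
        sold += bookings
        records.append((t, r, bookings))
        if sold >= capacity:
            break
    return records


# Hypothetical individual curves over a 28-day horizon (not the Table 7 parameters).
horizon = range(28)
individual = [
    [1.5 + math.sin(t / 4) for t in horizon],      # hypothetical class 1 shape
    [1.0 for t in horizon],                         # constant demand
    [0.05 * math.exp(0.15 * t) for t in horizon],   # hypothetical late-booking shape
]
choice = [[sum(c[t] for c in individual[: r + 1]) for t in horizon] for r in range(3)]
rng = random.Random(42)
scenario = simulate_scenario(choice, open_probs=[0.2, 0.3, 0.5], capacity=100, rng=rng)
print(sum(b for _, _, b in scenario), "rooms booked")
```

Repeating `simulate_scenario` with different seeds yields the set of demand scenarios to which the model is fitted.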
Figure 6: Cubic smoothing spline model based on 50 simulated demand scenarioswith two different smoothing parameter settings
The original and fitted demand curves are shown in Figure 6, where the black lines are the fitted curves and the grey lines the original curves. The WAPE per rate class is 7.…, ….5, and 81.… with smoothing parameter settings $\{…\}$, and 6.11, 12.26, and 44.07 with smoothing parameter settings $\{…\}$. These are in-sample results of the model. To analyse out-of-sample results, a new set of 100 demand scenarios is simulated from the original demand curves. The newly simulated demand scenarios are compared to the original demand curves and to the demand curves fitted by the model. Figure 7 shows two box plots per rate class, where the WAPE with the original demand curves as fitted values is compared to the WAPE with the demand curves from the model as fitted values. The mean as well as the variation are comparable, indicating that the curves estimated by the LP model perform nearly as well out of sample as the true curves.
Figure 7: Out-of-sample results based on individual demand scenarios expressed in WAPE for original demand curves and estimated demand curves with smoothing parameter settings {.7, .8, .9}
To test the robustness of the cubic smoothing spline model, the relationship between the input and output is analysed. These simulations use the demand curves presented in Figure 5. In the simulation setting, two input parameters can be changed: (1) the number of demand scenarios and (2) the smoothing parameter. The output of the model is expressed in WAPE and revenue (see Section 6).
The number of demand scenarios is one of the input parameters that can be controlled. As there is always a trade-off between accuracy and computational time, the simulation setup includes 10 to 50 demand scenarios to understand the effect of the number of demand scenarios. The setup is executed 10 times for each number of demand scenarios; in total, 410 runs (41 settings × 10 repetitions) are executed. The smoothing parameters are fixed at the two settings $\{…\}$ and $\{…\}$. Figure 8 shows the sensitivity to the number of demand scenarios, with the number of demand scenarios on the x-axis and the revenue per simulation set-up on the y-axis. The mean is steady over the range of demand scenarios for both smoothing parameter settings.
The confidence interval narrows as more demand scenarios are included, although the rate of narrowing slows down. Therefore, including more demand scenarios leads to a stable result when the demand curves remain the same.
In this section, the results of our model on the empirical data are discussed. The complete dataset, containing data from 2017 to 2019, is used as input. First, the results are presented. Second, the set-up of the model is explained. Finally, the performance of the model is assessed.
To provide additional insight into the model and its application, Thursdays are taken as a use case. To create the customer choice set, the rates from every Thursday in 2018 are included. The rates range between 70 and 170. A class is defined as 10 units, and thus 11 rate classes are present in the data. Table 8 shows the number of reservations made in the last 28 days before the check-in date. The customer choice set, which is the input for the model, is defined following the definition in Section 4.3.1.
Rate | Reservations made | Customer choice set logic
70 | 201 | 2508
80 | 316 | 2192
90 | 481 | 1711
100 | 449 | 1262
110 | 362 | 900
120 | 256 | 644
130 | 246 | 398
140 | 126 | 272
150 | 87 | 185
160 | 55 | 130
170 | 75 | 75
Table 8: Number of reservations for every Thursday in 2018
As 2018 has 52 Thursdays, 52 demand scenarios are included. Therefore, the smoothing parameters are set lower: when more data is available, the natural behavior of demand patterns is best captured by lowering the smoothing parameter. The range is between 0.… and 0.…
Figure 9 shows the demand curves from the cubic smoothing spline model for rate 70 (black line) and rate 110 (grey line). The results are in line with the analysis in Section 3: demand increases towards the check-in date and decreases during weekend days (i.e., the curves flatten at t = 16, 17 and t = 23, …).
Figure 9: Results of model - demand curves for Rate 70 and Rate 110
To set up the cubic smoothing spline model, dates from 2017 and 2018 are used to forecast a date in 2019. Dates are only included in the input if they fall on the same day of the week and precede the forecasted date: if a Tuesday is forecasted, every Tuesday before the forecasted date is used as input. The booking horizon is 100 days. The last 28 days before check-in are forecasted for each date in 2019, and the first 72 days of the booking horizon are used to fit the model. The optimal expected revenue over the first 72 days is compared with that of historical dates, and WAPE is used to measure the closeness over these 72 days. A pitfall of only considering the revenue at $t = 1, \dots, 72$ is that the booking behavior of two demand scenarios could still deviate strongly.
In total, 15 dates are selected as input for the cubic smoothing spline model. Figure 10 shows the revenue of Thursday the 6th of June 2019, a randomly chosen date. The left-hand side shows the revenue for all days in 2017 and 2018; the right-hand side shows the dates that are used as input for the model (only Thursdays). Note that the scales of the y-axes in Figure 10 differ. In both panels, the black line represents the revenue on the 6th of June 2019. The WAPE including all days ranges between 17.35 and 746.85, with a mean of 116.40. The WAPE including only the dates used as input for the model ranges between 17.35 and 27.07, with a mean of 22.57. Taking into account the 15 closest days, measured in WAPE, therefore contributes positively to the selection of demand scenarios as input for the model.
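The date-selection step can be sketched as ranking all earlier dates on the same day of the week by the WAPE of their revenue path over the first 72 days and keeping the 15 closest. This is a hypothetical illustration; the data layout (revenue paths as Python lists keyed by `datetime.date`) and the function name `select_input_dates` are assumptions, and the revenue paths in the example are synthetic.

```python
from datetime import date, timedelta


def wape(actual, forecast):
    """WAPE in percent, as in Equation (6)."""
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)


def select_input_dates(target_date, target_revenue, history, n_dates=15, fit_days=72):
    """Rank earlier dates on the same weekday by WAPE over the first fit_days days.

    target_revenue and every history[d] are revenue paths indexed by day of the
    booking horizon (t = 1, ..., fit_days maps to positions 0, ..., fit_days - 1).
    """
    observed = target_revenue[:fit_days]
    candidates = [d for d in history
                  if d < target_date and d.weekday() == target_date.weekday()]
    candidates.sort(key=lambda d: wape(observed, history[d][:fit_days]))
    return candidates[:n_dates]


target = date(2019, 6, 6)               # the Thursday used as example in the text
path = [1.0] * 100                      # synthetic revenue path of the target date
history = {target - timedelta(weeks=i): [1.0 + 0.1 * i] * 100 for i in range(1, 30)}
history[target - timedelta(days=1)] = path  # a Wednesday: excluded despite a perfect match
print(select_input_dates(target, path, history)[:3])
```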
Based on the 15 input dates, demand curves are generated per rate class. These demand curves are used as input for the decision optimization algorithm to obtain the optimal expected revenue. The number of reservations made in the last 28 days before the forecasted date is used as the available capacity. The actual revenue is compared with the optimal expected revenue, and the difference is expressed in percent. The WAPE between the actual bookings and the demand curves is computed to measure how closely the fit follows the actual data points. The smoothing parameters are kept the same across all results; they vary linearly over the 11 rates, starting from 0.…
The performance of the model is measured by the difference between the actual revenue and the optimal expected revenue, and by the WAPE of the fitted demand scenarios. The results are presented by day of the week.
The date selection used to fit the model is evaluated first. It is based on the 15 dates whose revenue over the booking horizon t = 1, ..., 72 is closest to that of the forecasted date. From Monday to Sunday, the mean WAPE is 23.10, 22.22, 22.84, 22.07, 19.27, 18.61, and 21.…, respectively, with standard deviations of ….56, 8.…, ….23, 11.30, 9.79, 11.86, and 7.40. Except for Saturday, the mean and standard deviation are consistent throughout the week. On Saturday, the WAPE is lower and the variability is higher.
Figure 11 shows the percentage difference between the actual revenue, generated by revenue managers, and the optimal expected revenue by day of the week. Given the capacity left at t = 73 and the output of the model, the optimal expected revenue is obtained via dynamic programming. The mean per day of the week, from Monday to Sunday, is 14.…, …, −…, …, ….47% (31.…). In 2019, ….26% more revenue could have been realised.
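The dynamic programming step is not specified further in this section. The sketch below shows one common single-resource formulation under stated assumptions, not necessarily the one used in the paper: on each remaining day exactly one rate is offered, the number of booking requests at a rate is Poisson with mean equal to the demand curve value for that rate and day, requests beyond the remaining capacity are lost, and unsold rooms are worth nothing at check-in. The Poisson assumption and the names `truncated_poisson` and `optimal_expected_revenue` are hypothetical.

```python
import math


def truncated_poisson(lam, c):
    """Return [P(min(D, c) = k) for k = 0..c] for D ~ Poisson(lam)."""
    probs, p, cum = [], math.exp(-lam), 0.0
    for k in range(c):
        probs.append(p)
        cum += p
        p *= lam / (k + 1)  # Poisson recursion: P(k+1) = P(k) * lam / (k+1)
    probs.append(max(0.0, 1.0 - cum))  # demand of c or more fills all remaining rooms
    return probs


def optimal_expected_revenue(rates, demand, capacity):
    """Backward induction over the remaining days of the booking horizon.

    rates:    list of prices, e.g. [70, 80, ..., 170]
    demand:   demand[t][j] = expected bookings on day t if rates[j] is offered
    capacity: rooms still available at the start of the period (e.g. at t = 73)
    Returns (V_0(capacity), policy), where policy[t][c] is the best rate on
    day t with c rooms left (None when no rooms are left).
    """
    T = len(demand)
    value = [0.0] * (capacity + 1)  # V_T(c) = 0: unsold rooms perish
    policy = [[None] * (capacity + 1) for _ in range(T)]
    for t in reversed(range(T)):
        new_value = [0.0] * (capacity + 1)
        for c in range(1, capacity + 1):
            best = -math.inf
            for j, r in enumerate(rates):
                probs = truncated_poisson(demand[t][j], c)
                # Revenue from k bookings today plus the value of the c - k rooms left.
                exp_val = sum(p * (r * k + value[c - k]) for k, p in enumerate(probs))
                if exp_val > best:
                    best, policy[t][c] = exp_val, r
            new_value[c] = best
        value = new_value
    return value[capacity], policy


if __name__ == "__main__":
    rates = [70, 100, 150]
    # Hypothetical demand curves for 3 remaining days: higher rates attract fewer bookings.
    demand = [[3.0, 1.5, 0.5], [2.0, 1.0, 0.3], [1.0, 0.5, 0.1]]
    v, policy = optimal_expected_revenue(rates, demand, capacity=4)
    print(round(v, 2), policy[0][4])
```

In this formulation the state is only the number of rooms left, so the table has (days × capacity) entries and each entry compares all rates; that is small enough for the 28-day forecast window and 11 rate classes.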
The mean WAPE per rate by day of the week is presented in Table 9. In general, the mean WAPE is lower for lower rates and higher for higher rates. This is because lower rates have historically been set more often, so more data points are available for lower rates than for higher rates. High WAPE values are observed for rate 170 on Wednesday and for rates 140 and 150 on Sunday. Low WAPE values are observed for rates 70 and 80 on Monday, Wednesday, and Thursday. In general, the WAPE increases towards higher rates, although this trend weakens at the highest rates. The variability among rates also increases from the lowest to the highest rates. These findings are in line with the results of the simulation, where the WAPE increases as less data is available.
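The WAPE values in Table 9 measure how closely a fitted demand curve follows the observed bookings of one rate class. The exact linear program of the cubic smoothing spline model is defined elsewhere in the paper; as a hedged stand-in, the sketch below solves a related but simpler LP, an l1 smoothing problem in the spirit of l1 trend filtering: minimise sum_t |y_t - f_t| + lam * sum_t |f_{t-1} - 2 f_t + f_{t+1}| subject to f_t >= 0. It produces piecewise-linear rather than cubic curves, so it only illustrates how a smoothing penalty and an industry constraint (no negative demand) can be written as a linear program. The names `l1_smooth`, `second_difference_matrix`, and `lam` are hypothetical; the solver is SciPy's `linprog` with the HiGHS method.

```python
import numpy as np
from scipy.optimize import linprog


def second_difference_matrix(n):
    """(n-2) x n matrix D with (D f)_i = f_i - 2 f_{i+1} + f_{i+2}."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def l1_smooth(y, lam):
    """Solve min sum|y - f| + lam * sum|D f|  s.t. f >= 0 as a linear program.

    Decision variables are stacked as x = [f (n), u (n), v (n-2)], where u
    bounds the absolute fitting errors and v bounds the absolute second
    differences, so the objective sum(u) + lam * sum(v) is linear.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    m = n - 2
    D = second_difference_matrix(n)
    I_n, I_m = np.eye(n), np.eye(m)
    Z_nm, Z_mn = np.zeros((n, m)), np.zeros((m, n))

    c = np.concatenate([np.zeros(n), np.ones(n), lam * np.ones(m)])
    A_ub = np.block([
        [-I_n, -I_n, Z_nm],  # y - f <= u
        [I_n, -I_n, Z_nm],   # f - y <= u
        [D, Z_mn, -I_m],     # D f <= v
        [-D, Z_mn, -I_m],    # -D f <= v
    ])
    b_ub = np.concatenate([-y, y, np.zeros(m), np.zeros(m)])
    # f >= 0 encodes "no negative demand"; u, v >= 0 follow from the constraints anyway.
    bounds = [(0, None)] * (2 * n + m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


if __name__ == "__main__":
    # Hypothetical cumulative bookings for one rate class over 10 days.
    bookings = [0, 1, 1, 3, 4, 4, 6, 8, 8, 9]
    fitted = l1_smooth(bookings, lam=2.0)
    print(np.round(fitted, 2))
```

The parameter `lam` plays the role of a smoothing parameter: lam = 0 reproduces the data exactly, while a large lam forces a straight line. With few observations for a rate class, the fitted curve is determined mostly by the penalty, which is one way to see why sparse high-rate data can lead to larger errors.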
Table 9: Mean WAPE per rate class (rates 70 to 170) by day of the week (Monday to Sunday), with the mean and standard deviation (SD) for each row and column.
Conclusion and Discussion
This paper proposes a novel method inspired by cubic smoothing splines that makes use of linear programming. As input, the model uses reservations for specific rates over time. As output, it provides demand curves per rate over a given booking horizon, which describe the booking behavior of guests.
The data analysis shows that demand does not differ significantly across the years 2017, 2018, and 2019. However, booking behavior differs by day of the week. These differences are expressed in standard hospitality KPIs such as ADR, RevPAR, and occupancy. The day of booking, relative to the check-in date, influences demand significantly, which makes the booking horizon an essential variable in the proposed forecasting framework. There is a clear distinction in when guests tend to make a reservation for a given day of the week.
The performance of the model is evaluated based on the fit to the data points as well as on the optimal expected revenue. Based on the forecasted demand curves of each rate, the model could generate 13.3% more revenue. The revenue difference of 13.… t of the booking horizon, minus 1 as the number of polynomials. To decrease the computational complexity, fewer polynomials could be fitted; the impact of doing so on the overall performance of the model, expressed in WAPE and revenue, needs to be researched.
The set-up used to obtain the results requires a 'warm-up period', for which t = 0 to t = 72 of the booking horizon is used. For this period, days similar to the forecasted date are selected and used. There is a potential revenue loss if RM optimisation is not applied over this period. A simple strategy can address this warm-up period: forecast whether a given date has low, medium, or high demand, and apply a yield policy for each of these labels to counter the potential revenue loss.
To further improve the cubic smoothing spline model, special events (e.g., concerts or conferences) can be included. These events tend to occur more often on weekend days than on weekdays. Adding the events can give more context around the check-in dates.
References
[1] Lee, A. Airline reservations forecasting: probabilistic and statistical models of the booking process. Flight Transportation Laboratory Report, 232-236, 1990.
[2] Andrew, W.P., Cranage, D.A., Lee, C.K. Forecasting hotel occupancy rates with time series models: An empirical analysis. Hospitality Research Journal, 14(2):173-182, 1990.
[3] Santopietro, S., Gargano, R., Granata, F., de Marinis, G. Generation of Water Demand Time Series through Spline Curves. Journal of Water Resources Planning and Management, 146(11):04020080, 2020.
[4] Claveria, O., Monte, E., Torra, S. A new forecasting approach for the hospitality industry. International Journal of Contemporary Hospitality Management, 27(7):1520-1538, 2015.
[5] Pizam, A. (ed.). International Encyclopedia of Hospitality Management, 2nd ed. Oxford: Butterworth-Heinemann, 2010.
[6] Chu, F.L. Forecasting tourism demand with ARMA-based methods. Tourism Management, 30(5):740-751, 2009.
[7] Haensel, S., Koole, G. Estimating unconstrained demand rate functions using customer choice sets. Journal of Revenue and Pricing Management, 4(3):75-87, 2010.
[8] Mauri, A.G. Hotel revenue management: Principles and practices. Milan: Pearson, 13:511-512, 2012.
[9] Talluri, K., van Ryzin, G. Revenue management under a general discrete choice model of customer behavior. Management Science, 50(1):15-33, 2004a.
[10] Talluri, K., van Ryzin, G. The Theory and Practice of Revenue Management. Springer, 2004b.
[11] Epperson, J. On the Runge Example. The American Mathematical Monthly, 94(4):329-341, 1987.
[12] Rich, J. A spline function class suitable for demand models. Econometrics and Statistics, 14, 2018.
[13] Shields, J., Shelleman, J. Small Business Seasonality: Characteristics and Management. Small Business Institute Journal, 9(1):37-50, 2013.
[14] Brown, L.D., Zhang, R., Zhao, L. Root-Unroot Methods for Nonparametric Density Estimation and Poisson Random-Effects Models. Technical report, The Wharton School, University of Pennsylvania, 2002.
[15] Rajopadhye, M., Ghalia, M.B., Wang, P.P. Forecasting uncertain hotel room demand. Proceedings of the American Control Conference, 132(1-4):1-11, 1999.
[16] Loyola-González, O. Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses From a Practical Point of View. IEEE Access, 7(1):154096-154113, 2019.
[17] Queenan, C.C., Ferguson, M., Higbie, J., Kapoor, R. A Comparison of Unconstraining Methods to Improve Revenue Management Systems. Production and Operations Management, 16(6), 2009.
[18] Weatherford, L.R., Kimes, S.E. A comparison of forecasting methods for hotel revenue management. International Journal of Forecasting, 19(3):401-415, 2003.
[19] Weatherford, L.R., Polt, S. Better unconstraining of airline demand data in revenue management systems for improved forecast accuracy and greater revenues. Journal of Revenue and Pricing Management, 1(3):234-254, 2002.