Energy
journal homepage: www.elsevier.com/locate/energy
Article info
Article history:
Received 12 January 2021
Received in revised form 12 May 2021
Accepted 28 May 2021
Available online 8 June 2021

Keywords:
Building energy
HVAC
Commercial buildings
Data analytics
Setpoint setback
Random forest
Time series

Abstract
Commercial buildings account for a significant share of the total energy consumed in the US, and Heating, Ventilation and Air Conditioning (HVAC) systems are among the largest components of their overall consumption. In this study, we propose a new data-driven approach to evaluate HVAC cooling systems in commercial buildings and identify savings opportunities. The focus is an investigation of the impact of thermostat setpoint setback, using only whole-building electricity data taken at 15-min intervals. We conducted a comparative study of setpoint setback characteristics on 432 commercial buildings of 5 building usage types across the United States. To accomplish this, both piecewise and Random Forest regression algorithms were applied to electricity and exterior temperature datasets. The goal was to identify the operational characteristics and effective setpoints of each building and to determine the corresponding savings opportunities. Both occupied and unoccupied time periods were studied across cooling degree days (CDD), when air conditioning is typically operational. The results show that in commercial buildings, on average, cooling systems account for 9.5% of total consumption. When a one-degree setback is applied during the cooling season, average savings of approximately 1.1% of annual consumption are achieved. Retail and office buildings demonstrate the highest potential for savings. Additionally, we identified the number of cooling degree days and the base to peak ratio (BPR) as the most important variables for predicting the magnitude of cooling-system consumption.
© 2021 Elsevier Ltd. All rights reserved.
A. Khalilnejad, R.H. French and A.R. Abramson Energy 233 (2021) 121117
* Corresponding author. Thayer School of Engineering at Dartmouth, Hanover, NH, USA.
E-mail addresses: axk846@case.edu (A. Khalilnejad), rxf131@case.edu (R.H. French), alexis.abramson@case.edu (A.R. Abramson).
https://doi.org/10.1016/j.energy.2021.121117
0360-5442/© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Heating, ventilation and air conditioning (HVAC) systems provide thermal comfort by adjusting the air temperature to a desired setpoint value. Additionally, in warm climate zones or on select days during the cooling season, HVAC systems are the predominant source of peak load [1]. Installation costs are high and the work is complex. As a result, sub-metering HVAC-related equipment with fine-grained sensors may not be feasible in all buildings [2].

Here, we reveal information about cooling system operations and potential savings using only easily accessible datasets. Whole-building electricity data is commonly available via the utility meter. It is provided in time series format, often at a 15-min resolution [3,4]. Given such a dataset, patterns in the time series and subsequent analytics can convey information about the operation of equipment in the building. To date, identifying equipment-level consumption from whole-building energy data (referred to as an unsupervised problem) has been challenging [5,6].

Data-driven approaches for diagnosing HVAC performance have shown that HVAC consumption correlates with two critical parameters: temperature and occupancy [7–12]. Exterior temperature drives HVAC operation. Therefore, analysis of the energy and temperature relationship can identify the HVAC characteristics of buildings. For example, Zhang et al. [13] developed a power-temperature model that relies on the correlation of electricity and outside temperature to disaggregate the energy consumption of multiple interconnected HVAC units from a single meter. Their analysis confirms that outside temperature is critical to predicting HVAC consumption in buildings.

In a study of the correlation between occupancy and energy consumption, Ahn and Park (2016) [14] employed a wavelet coherence method. They showed that in commercial buildings where occupants are present during set business hours (e.g., offices and factories), the correlation increases. They also demonstrated that occupancy patterns typically correlate with the building usage type. For example, in a hospital the occupancy pattern is more random, and therefore the correlation between occupancy and energy consumption is weak. In a later investigation, Deng and Chen (2019) [7] examined the effect of occupancy on HVAC energy consumption in 20 office buildings. They showed that ignoring the occupancy pattern causes significant inaccuracy in predicting energy usage. Their neural network method showed that in scheduled buildings, assuming a constant setpoint without considering occupancy reduced the accuracy of predicted energy consumption by 12% compared with a model that considered occupancy.

Beyond air temperature and occupancy, many other parameters affect HVAC consumption [15]. Wang et al. (2012) [16] found that other weather variables, the building envelope, the presence of vacant spaces, and unoccupied conditions can affect the uncertainty of prediction models. For example, HVAC setpoint setback during unoccupied times significantly influences overall consumption. They explained that ignoring this and other key parameters can change the prediction uncertainty by 15%–70%.

Quantifying the relative impact of the parameters with the most influence on HVAC consumption can help achieve actual savings. For example, a large office building in a hot climate zone is expected to have different HVAC usage than a small educational building in a mild climate. However, the degree of importance of each of these parameters is unknown. The problem becomes more complicated when parameters derived from the energy-temperature relationship are considered. For example, air conditioning in buildings with highly correlated energy and temperature operates differently from that in buildings with low correlation. Thus, determining how much each parameter matters to HVAC system performance in a building is critical.

With an understanding of HVAC operation and the parameters that affect it, various savings analyses can also be conducted. These assess how to reduce HVAC consumption and evaluate the effect on the building's overall energy pattern. HVAC setpoint setback, an approach for saving on HVAC consumption, has been discussed in several research studies [9,17].

For example, Papadopoulos et al. (2019) used an EnergyPlus [18] building energy model to fine-tune HVAC setpoints and assess the associated savings of office buildings in different climate zones across the U.S. They applied multi-objective optimization to their simulated model. The aim was to maximize the energy reduction from setpoint setback while maintaining thermal comfort, which they defined as a deviation not exceeding ±3 °C. They showed that in mild climate zones, up to 60% of annual HVAC-related energy can be saved without compromising occupant comfort [18]. In fact, they concluded that HVAC setpoint configuration standards in commercial buildings should be revisited, given the extensive savings opportunities.

Similarly, using EnergyPlus building modeling software [19], Hoyt et al. (2015) evaluated raising the HVAC setpoint in six office buildings in different climate zones [20]. They also evaluated widening the temperature range in which HVAC is not required to operate (referred to as the deadband zone). They demonstrated that increasing the cooling setpoint from 22.2 °C to 25 °C saves, on average, 27% of the cooling load. Their analysis shows that the benefit of setpoint setback is cumulative, and small incremental changes result in proportional savings.

Ghahermani et al. (2016) also analyzed the energy savings of setpoint setback in three office buildings using EnergyPlus modeling [21]. They concluded that the larger the deadband, the higher the energy efficiency, since a wider deadband relaxes the demands on the HVAC system. They also found that optimal setpoint selection makes a large difference in energy usage (up to 30% of total building savings) at some outside temperatures. In fact, the significance of the setpoint adjustment depends on the range of outside temperatures.

Additionally, Cai et al. [22] used the energy modeling software eQuest to model the impact of setpoint setback on energy savings and peak load reduction in an office building. They examined setbacks from 1 °F to 5 °F in 1 °F increments and observed three different usage and savings patterns:
- On days when the exterior temperature was lower than the base-case setpoint (below 70 °F), there is, as expected, no benefit in applying a setpoint control scheme.
- On days when the outdoor temperature is higher than the setpoint (above 78 °F), a predictable percentage of energy savings can be expected.
- On days with moderate temperatures (70–78 °F), the savings potential cannot be accurately predicted.
While valid at high temperatures, this study therefore does not instill high confidence in such predictions for moderate temperatures.

While not necessarily intuitive, there is a nonlinear relationship between temperature setpoint and energy consumption that deserves further attention in the literature. Here, we analyze this nonlinear relationship and present results on a population of 432 commercial buildings of five different building usage types. This enables an analysis of the impact of several parameters on the savings potential and on the buildings' operational characteristics.

We employ a new data-driven approach to identify the thermostat setpoint and the potential impact of setback during the cooling season. It uses only whole-building, 15-min interval electricity data and corresponding weather datasets. We used piecewise regression to identify the HVAC operation and setpoint setback savings. Through an analysis of energy and temperature correlations, we then investigated the energy reduction from setback increments. The method also enables disaggregation of the HVAC component of whole-building data, which has been a challenging problem for the industry. The influence of various parameters on thermostat setpoint and setback savings is also discussed.

1.1. Methods

A building's cooling system reduces or maintains a room's temperature at a desired value, its setpoint [23]. The energy required for this purpose depends on the outside weather conditions, especially air temperature. Fitting a linear regression model to the energy vs. temperature data reveals this correlation. An increase in consumption is expected to correspond to a rise in exterior temperature on a cooling degree day (CDD) [24].

Other weather variables, such as irradiance, may also correlate with energy consumption. Here, however, we focus on the relationship with exterior temperature to assess the impact of HVAC. Previous studies have found that energy consumption and exterior temperature are strongly correlated in commercial buildings, particularly for the climate zones studied here [25]. More specifically, when the slope of the energy vs. exterior temperature curve is zero, HVAC consumption is minimal or zero. When the same curve exhibits a high slope, one can easily recognize how HVAC consumption increases as temperature extremes increase. Nonetheless, exterior temperature, CDDs and the daily scheduled period analyzed here all depend on solar irradiance. Irradiance is therefore considered, albeit indirectly.

Differences in the behavior and response of the cooling system to exterior temperature may be the result of occupancy and setpoint changes and/or the presence of an auxiliary system used for
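$$
\hat{E}_a =
\begin{cases}
a_1 + b_1 T, & T < T_{sp1} \\
a_2 + b_2 T, & T_{sp1} \le T < T_{sp2} \\
a_3 + b_3 T, & T_{sp2} \le T
\end{cases}
\qquad (1)
$$

$$
spsb_{savings} = \frac{E_{a,CDD} - E_{sb,CDD}}{E_a} \times 100 \qquad (3)
$$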
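Equation (1) is a three-segment linear model, and its breakpoints $T_{sp1}$ and $T_{sp2}$ are the quantities of interest. This section does not spell out how the model is fitted. The sketch below therefore assumes one simple approach: an exhaustive grid search over candidate breakpoint pairs. Each segment is fitted by ordinary least squares with its own intercept and slope, as Equation (1) allows, and the pair with the lowest total squared error is kept. The function names, the candidate grid and the `min_pts` guard are illustrative choices, not the authors' implementation.

```python
import numpy as np


def fit_line(t, e):
    """OLS fit of e = a + b*t; returns intercept a, slope b and the sum of squared errors."""
    b, a = np.polyfit(t, e, 1)  # np.polyfit returns the highest-degree coefficient first
    resid = e - (a + b * t)
    return a, b, float(resid @ resid)


def fit_piecewise(t, e, candidates, min_pts=10):
    """Grid-search breakpoints T_sp1 < T_sp2 for a three-segment linear model.

    Each segment has its own intercept a_k and slope b_k, as in Eq. (1);
    the breakpoint pair with the lowest total SSE is returned.
    """
    best = None
    for i, tsp1 in enumerate(candidates):
        for tsp2 in candidates[i + 1:]:
            masks = [t < tsp1, (t >= tsp1) & (t < tsp2), t >= tsp2]
            if min(int(m.sum()) for m in masks) < min_pts:
                continue  # skip splits that leave a segment nearly empty
            fits = [fit_line(t[m], e[m]) for m in masks]
            sse = sum(f[2] for f in fits)
            if best is None or sse < best[0]:
                best = (sse, tsp1, tsp2, [(a, b) for a, b, _ in fits])
    if best is None:
        raise ValueError("no breakpoint pair leaves min_pts points in every segment")
    _, tsp1, tsp2, coefs = best
    return tsp1, tsp2, coefs


def predict_piecewise(t, tsp1, tsp2, coefs):
    """Evaluate the fitted piecewise model of Eq. (1)."""
    (a1, b1), (a2, b2), (a3, b3) = coefs
    return np.where(t < tsp1, a1 + b1 * t,
                    np.where(t < tsp2, a2 + b2 * t, a3 + b3 * t))


# Synthetic, noise-free example (illustrative values): flat base load below
# 15 °C, a mild slope up to 24 °C and a steep cooling slope above 24 °C.
temps = np.linspace(0.0, 40.0, 801)
energy = np.where(temps < 15, 5.0,
                  np.where(temps < 24, 5.0 + 0.1 * (temps - 15),
                           5.9 + 0.4 * (temps - 24)))
tsp1, tsp2, coefs = fit_piecewise(temps, energy, candidates=np.arange(5.0, 36.0, 1.0))
print(tsp1, tsp2)  # recovers the kinks at 15.0 and 24.0
```

An exhaustive search is cheap here because temperature is the only input. A production fit on noisy meter data would more likely constrain the segments to meet at the breakpoints or use a dedicated segmented-regression tool.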
Fig. 1. Piecewise regression on energy and temperature.

Once Equation (2) is implemented, the energy savings from a setpoint setback can be calculated for any given ΔT using Equation (3). To further study the relationship between the magnitude of the setpoint setback and the associated savings, we examine the correlation of energy and temperature as a function of the setpoint setback. At exterior temperatures below the effective thermostat
setpoint, the cooling system is not operational, which essentially equates to a zero correlation between energy and temperature. At high exterior temperatures, the correlation may reach values close to 1. Here, we calculate the correlation of energy vs. temperature for each setpoint increment using the Pearson correlation [30]:

$$
r_{ET} = \frac{\sum_{i=1}^{n} (T_i - \bar{T})(E_i - \bar{E})}{\sqrt{\sum_{i=1}^{n} (T_i - \bar{T})^2}\,\sqrt{\sum_{i=1}^{n} (E_i - \bar{E})^2}} \qquad (4)
$$

where $E_i$ and $T_i$ are the energy and temperature data points, and $\bar{E}$ and $\bar{T}$ are the averages of all energy and temperature data points, respectively. When the cooling system is not operational, $r_{ET} = 0$. In that case $spsb_{savings}$ (Equation (3)) at the corresponding ΔT is effectively equal to the cooling system's percentage share of total building consumption (i.e., the disaggregated portion of consumption due to cooling).

In commercial buildings, interior loads that are largely independent of exterior temperature (e.g., plug loads, lighting) consume energy and generate heat. Together these cause a constant offset above and beyond the HVAC consumption. Since our method identifies and disaggregates the HVAC consumption, the energy consumed during periods when heating or cooling is not operating can be attributed to these interior loads.

1.2. Random Forest regression

Once the HVAC share of total consumption has been calculated as described in the previous section, a predictive model can be developed to predict the same output. Instead of the energy and temperature time series, however, a set of metadata representing the characteristics of each building is used as input. If these new input parameters can predict the HVAC share of total consumption with high accuracy, the importance of each parameter in determining the target (HVAC cooling) can be quantified.

The input parameters of the predictive model are:
- CDD
- BPR
- the energy and temperature correlation for each occupancy period
- the building's HVAC on and off schedule
- the scheduled period
- climate zone
- building area
- number of floors
- annual consumption

A Random Forest regression is used to predict the air-conditioning share of total consumption, obtained from the setpoint setback savings, from these input parameters. This identifies the critical variables and their rank and degree of importance.

Random Forest is an ensemble of multiple classification and regression trees (CART). Each tree is trained on a bootstrap sample of the data using random subsets of the input variables [31–33]. The output of the Random Forest is obtained by averaging the outputs of the individual regression trees. The splitting criterion for each node is selected from a small set of input variables based on the output variance of each node [34]. For this purpose, we consider X as an input variable and Y as the response, and assume for the ith observation:

$$
Y_i = f(X_i) + \varepsilon \qquad (5)
$$

Therefore, the impurity of node t is defined as:

$$
D(t) = \frac{1}{N} \sum_{X_i \in t} \left(Y_i - \bar{Y}_t\right)^2 \qquad (6)
$$

where $\bar{Y}_t$ is the sample mean for node t and N is the sample size of t. Note that in a regression tree the impurity takes the form of a variance. Node t of a regression tree splits the samples into left and right daughter nodes, $t_L$ and $t_R$, at splitting point s. Thus, the decrease in impurity is calculated by

$$
\Delta D(s,t) = D(t) - \frac{N_L}{N} D(t_L) - \frac{N_R}{N} D(t_R) \qquad (7)
$$

where $N_L$ is the number of samples in the left daughter node and $N_R$ is the number of samples in the right daughter node. Also, $D(t_L)$ and $D(t_R)$ are the impurities of the left and right daughter nodes, respectively. When forming the regression trees, the variable that maximizes the decrease in impurity (Equation (7)) is chosen as the root node of each tree. The same criterion is then applied recursively at the subsequent nodes.

With the splitting criterion determined for each tree, two Random Forest parameters must be optimized:
- The number of regression trees (ntree), which must be large enough that the accuracy no longer changes beyond a margin of error.
- The number of input variables considered at each node (mtry), which must be optimized to obtain the highest accuracy. Its default value is 1/3 of the number of variables.

Both mtry and ntree are optimized by measuring the out-of-bag (OOB) error, which represents the accuracy of the whole Random Forest prediction model [35,36]. The OOB sample of a tree is the part of the dataset not included in the randomly drawn subsample used to train it (the bootstrap sample). The OOB error, defined as the root mean squared error of the OOB predictions ($RMSE_{OOB}$), is calculated by:

$$
RMSE_{OOB} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(y_j - \hat{y}_{j,OOB}\right)^2} \qquad (8)
$$

where $y_j$ is the actual output for the jth observation and $\hat{y}_{j,OOB}$ is the average prediction from all trees for which observation j is out of bag. The objective of tuning the Random Forest parameters (ntree and mtry) is to minimize $RMSE_{OOB}$. Detailed instructions for Random Forest tuning are given in Ref. [31]. After the predictive model is developed, its accuracy is tested on a testing set comprising 20% of the data. The result is compared with the training set, which comprises the remaining 80%.

The importance of each variable reflects how often and how effectively it is used to split the nodes of the regression trees in the Random Forest. It is measured as the total decrease in node impurity attributable to that variable (IncNodePurity) [37]. As mentioned previously, the variable at each node is selected based on the decrease in impurity. The more a variable is used to predict the output (in this case, the HVAC cooling share of total consumption), the more important that variable is.

2. Results and discussion

2.1. Case study: office building in Las Vegas, NV

The building studied is a 1524 m² office building located in Las Vegas, Nevada, in the United States, in Köppen–Geiger climate zone "Bwk" [38] (Tropical and Subtropical Desert Climate). The building's 15-min interval electricity consumption from March 2017 to March 2018 varied from 1 kWh to 20 kWh, with a median of 6.2 kWh. Over the same period, the exterior temperature ranged from 3 to 47 °C, with a median of 19.9 °C.

Fig. 3 is a snapshot of the building's daily electricity consumption from March 2017 to March 2018, broken out into summer and winter seasons. Fig. 3 shows the consumption pattern only for the summer months (June, July, and August) and winter months (December, January, and February) to illustrate the building's consumption on hot and cold days. The savings, however, are applied to all CDDs. The heating
Fig. 3. Building energy consumption pattern and snapshot. Time series representation of the characteristic daily electricity consumption of a building across one year from March
2017 to March 2018 and broken down into summer (June, July, and August) and winter (December, January, February) seasons. The blue vertical boxes show the distribution (middle
50% variability) of energy consumption for the given hour across each season. The whiskers indicate the minimum to maximum consumption, excluding outliers, and red lines
represent the average energy consumption of each whisker. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this
article.)
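The energy and temperature correlation $r_{ET}$ of Equation (4) underlies the occupied vs. unoccupied comparisons that follow. A minimal NumPy sketch of the formula is given below. The function name `pearson_r` is illustrative. Where every sample in a series has the same value, the denominator of Equation (4) is zero and the formula is undefined. The sketch returns 0 in that case, an assumed convention that matches reading a non-operating cooling system as $r_{ET} = 0$. The occupancy mask and readings in the usage lines are hypothetical.

```python
import numpy as np


def pearson_r(temp, energy):
    """Pearson correlation r_ET of Eq. (4) between temperature and energy samples."""
    t = np.asarray(temp, dtype=float)
    e = np.asarray(energy, dtype=float)
    dt, de = t - t.mean(), e - e.mean()
    denom = np.sqrt((dt @ dt) * (de @ de))
    if denom == 0.0:
        # Eq. (4) is undefined for a constant series; returning 0 (no
        # temperature response) is an assumption made for this sketch.
        return 0.0
    return float((dt @ de) / denom)


# Hypothetical readings for one day, split by an assumed occupancy mask.
temps = np.array([18.0, 20.0, 26.0, 30.0, 33.0, 29.0, 22.0, 19.0])
energy = np.array([3.0, 3.0, 7.5, 9.0, 9.9, 8.4, 3.0, 3.0])
occupied = np.array([False, False, True, True, True, True, False, False])

r_occ = pearson_r(temps[occupied], energy[occupied])      # close to 1: load tracks temperature
r_unocc = pearson_r(temps[~occupied], energy[~occupied])  # 0.0: flat load, no cooling response
print(round(r_occ, 3), r_unocc)
```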
Fig. 6. Population of the buildings studied: (a) distribution on a US map, (b) breakdown by building usage type, (c) breakdown by Köppen–Geiger climate zone.
Fig. 7. Distribution of the operational characteristics of the buildings by building usage type: (a) number of cooling degree days (CDD), (b) operational hours, and (c) BPR. Each point represents a building, colored by climate zone. The whiskers represent the range of the data without outliers, and the boxes represent the quartiles (first quartile, median, third quartile).
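Fig. 7(c) and the savings results later in this section rely on the base to peak ratio (BPR). The text describes BPR as the ratio of baseload to peak-load consumption, where a BPR of 1 means equal consumption in occupied and unoccupied times. The exact computation is not given in this excerpt. The sketch below therefore assumes a hypothetical definition: the mean 15-min load during unoccupied intervals divided by the mean load during occupied intervals, for a caller-supplied occupancy mask. It also evaluates the one-degree savings line reported in Section 2.3.1, $spsb_{savings} = 2.2 - 1.97\,BPR$. Both function names are illustrative.

```python
import numpy as np


def base_to_peak_ratio(load, occupied):
    """Hypothetical BPR: mean unoccupied (base) load / mean occupied (peak) load."""
    load = np.asarray(load, dtype=float)
    occupied = np.asarray(occupied, dtype=bool)
    return float(load[~occupied].mean() / load[occupied].mean())


def one_degree_savings(bpr):
    """Linear fit reported in Section 2.3.1: spsb_savings (%) = 2.2 - 1.97 * BPR."""
    return 2.2 - 1.97 * np.asarray(bpr, dtype=float)


# One synthetic day of 96 fifteen-minute readings (kWh): 3 kWh outside and
# 8 kWh inside an assumed 08:00-18:00 occupied schedule.
hours = np.arange(96) / 4.0
occupied = (hours >= 8) & (hours < 18)
load = np.where(occupied, 8.0, 3.0)

bpr = base_to_peak_ratio(load, occupied)   # 3 / 8 = 0.375
print(bpr, one_degree_savings(bpr))        # the fit gives roughly 1.46 % for this BPR

# The two points quoted in the text: BPR 0.25 -> about 1.7 %, BPR 0.9 -> about 0.4 %.
print(one_degree_savings([0.25, 0.9]))
```

The fitted line only captures the trend. As the discussion of Fig. 9 notes, actual savings at a given BPR vary widely around it.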
resulting in different energy and temperature correlations. Fig. 8 shows the distribution of the energy and temperature correlation ($r_{ET}$ from Equation (4)) during occupied and unoccupied times for the population of buildings shown in Fig. 6. In Fig. 8, the green density curves represent the distribution of the correlations. During occupied times, the distribution peaks at 0.7, indicating a high density of buildings with significant HVAC operation. The green density plots in Fig. 8 also show that correlations during occupied times are less variable than during unoccupied periods. This indicates more predictable HVAC operation during occupied times, as expected. The blue boxplots on the top and side of the figure indicate the first quartile, median and the
Fig. 8. Energy vs. temperature correlations, comparing occupied vs. unoccupied times. The green lines on the side and top represent the distribution of correlations, and the blue boxplots represent their quartiles (first quartile, median, and third quartile). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Fig. 9. Savings from a one-degree setpoint setback. The red line represents the linear regression model fitted to the data. The green lines on the side and top represent the distribution of correlations, and the blue boxplot indicates their quartiles (first quartile, median, and third quartile). (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
third quartiles of the distribution. The median correlation during occupied times is 0.6, roughly 2.5 times the median of 0.24 during unoccupied times. This reflects greater operation of cooling systems during occupied times.

Given the high variability in the distribution of correlations shown in Fig. 8, the savings potential can be expected to vary among buildings. For example, in a building with high correlation, the cooling system consumes relatively more energy than in one with low correlation. Recall that when HVAC cooling operates, energy consumption increases with rising temperature, producing a positive correlation. The negative correlation in a few buildings during unoccupied times therefore indicates that their cooling system is not operating in that period. For those buildings, the setpoint setback algorithm is applied only to data from occupied times.

2.3.1. One degree setback

With setpoint setback applied in one-degree steps, $t_{sb} = t_{sp} + 1$ is the minimum adjustment that achieves savings. Fig. 9 presents the building population's annual savings from a one-degree setpoint setback vs. BPR. A one-degree setpoint setback applied to these buildings yields a median annual energy savings of 1.1%. The distribution of the savings achieved is roughly normal, with an IQR of 1.13. One can therefore expect savings of 0.6% to 1.7% from a one-degree setback in the majority of commercial buildings.

A linear regression model fitted to the savings (the red line in Fig. 9) gives $spsb_{savings} = 2.2 - 1.97\,BPR$. Savings are therefore expected to be higher at low BPRs and lower at high BPRs. Recall that the BPR represents the ratio of baseload to peak-load energy consumption. A building with a BPR of 1 therefore has equal consumption in occupied and unoccupied times, implying an unscheduled or non-operating HVAC cooling system. As can be seen in Fig. 9, the fitted model captures the trend of savings vs. BPR, but for a given BPR the savings vary widely. This implies that other variables also affect the amount of savings. For example, at a BPR of 0.25 the fitted savings is 1.7%, while the actual data points range from 0 to 3%. At a BPR of 0.9, by contrast, the fitted savings is 0.4% and the actual data points range only from 0 to 0.7%. The influence of other parameters along with BPR will be discussed in later sections.

A breakdown of the one-degree setpoint setback savings by building type further illustrates differences in savings opportunities. Fig. 10 shows the savings breakdown by building usage type:
- Retail buildings have the highest savings opportunity, with a median of 1.5% annual savings.
- Office buildings exhibit a more dispersed savings distribution than food sales buildings, but their median savings are similar (1.1%).
- Industrial buildings have the lowest savings opportunity from a one-degree setback.

Detailed statistics on the savings breakdown are given in Table 1. As the table shows, buildings with longer scheduled hours generally have more savings opportunities. This is because HVAC cooling consumes relatively more energy during the scheduled period, so a longer scheduled period offers more opportunity for savings. Combined with the correlation between savings and BPR discussed above (Fig. 9), the table shows that both CDDs and BPR play an important role in determining the savings potential. The impact of these two parameters, along with other important variables, will be discussed in detail in later sections.

The skewness given in the table measures the asymmetry of the distribution. A high positive skewness means the distribution has a longer tail toward values above the median (i.e., right-skewed). A negative skewness corresponds to a tail toward values below the median. The statistics show that savings in all building types are right-skewed, indicating greater variability among the higher savings. The skewness can be due to the presence of outliers
in the distribution, caused by a few buildings whose savings opportunities are exceptionally high due to more CDDs or a greater share of HVAC cooling in total consumption. This will be investigated further in later sections.

Fig. 10. One-degree setpoint setback savings as a function of building usage type. The black distribution curves (with shaded gray areas) represent the distribution of the savings. The red lines indicate the first, second (median), and third quartiles, and the blue data points depict the actual savings of each building. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 1
Breakdown of the one-degree setpoint setback savings.

| # | Building type | BPR (mean) | CDD days (mean) | Scheduled period, median (h) | Scheduled period, IQR (h) | spsb savings, median (%) | spsb savings, IQR (%) | spsb savings, skewness |
|---|---|---|---|---|---|---|---|---|
| 1 | Retail | 0.43 | 134 | 14.50 | 1.31 | 1.52 | 0.95 | 1.32 |
| 2 | Office | 0.54 | 128 | 13.50 | 3.04 | 1.14 | 1.36 | 1.18 |
| 3 | Food Sales | 0.32 | 132 | 17.75 | 1.69 | 1.09 | 0.84 | 1.09 |
| 4 | Educational | 0.50 | 131 | 12.50 | 6.50 | 0.87 | 0.78 | 0.83 |
| 5 | Industrial | 0.61 | 130 | 11.62 | 8.68 | 0.33 | 0.61 | 2.56 |
| 6 | Overall | 0.45 | 131 | 14.25 | 5.00 | 1.14 | 1.13 | 1.39 |

Fig. 11. Energy vs. temperature correlations, comparing occupied vs. unoccupied times at various setpoint setback increments. Each arrow represents one degree of setback, while the colors indicate the magnitude of savings. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

2.3.2. HVAC turn off

As discussed earlier, a setpoint setback large enough to bring the energy and temperature correlation ($r_{ET}$) to zero represents the scenario in which the cooling system is effectively not operational. Fig. 11 shows the energy vs. temperature correlations across the population of buildings, comparing occupied vs. unoccupied times as a function of increasing setpoint setback. Each arrow corresponds to an additional one-degree setback. For most buildings, the correlations change along nonlinear curves. As discussed earlier, the energy and temperature correlation corresponds to the HVAC operational characteristics shown in Fig. 7. How much the correlation changes with each setpoint setback increment depends on various parameters. These include climate zone, the building's thermal resistance, and the setpoints in the occupied and unoccupied periods.

As can be seen in Fig. 11, the correlations eventually converge to zero during both occupied and unoccupied times. The exception is buildings with negative correlation during unoccupied times. In those buildings, the negative correlations indicate that the HVAC is already off during unoccupied times. The setpoint change therefore cannot be applied during unoccupied times, and their energy and temperature correlations remain at a constant negative value. Also, during unoccupied times the correlation converges to zero at lower setpoints. As discussed earlier, in scheduled buildings the unoccupied setpoint is already higher than the occupied setpoint, so fewer setbacks are needed to turn the HVAC off in that period. This will be assessed further in a later section.

In nearly all buildings, a setpoint setback reduces energy consumption and lowers $r_{ET}$. To demonstrate this point, Fig. 12 shows annual energy savings vs. $r_{ET}$ across the population of buildings as a function of increasing setpoint setback. Each arrow represents a one-degree setpoint setback. The initial arrows point vertically and the last ones are roughly horizontal, showing much higher savings in the first few setbacks. As mentioned earlier, this is because the initial HVAC setpoints cover most of the CDDs, whereas at higher setpoints fewer days require air conditioning.

The nonlinear behavior of savings across setpoint setback increments motivates a closer look at the significance of each increment. Fig. 13 illustrates this by plotting the distribution of savings at each setpoint setback increment. The three red lines in each distribution curve represent the first quartile, median, and third quartile of the distribution. The difference between the third and first quartile lines equals the interquartile range (IQR), which represents the variability of the distribution. As shown, the distributions are initially normal, with relatively lower variability than at higher setbacks. At high setbacks (i.e., > 10°), the
Table 2
Breakdown of annual savings with HVAC cooling off.
Fig. 13. Annual percentage savings at increasing setpoint setbacks for the population of buildings. The black lines (area colored in gray) represent the distribution of the savings.
The red lines depict the first, second (median), and third quartiles, and the blue data points indicate the actual total savings of each building.
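The per-setback summary plotted in Fig. 13 (first quartile, median, third quartile, and the IQR as their spread) can be reproduced with a short sketch. The function name `savings_quartiles` and the gamma-distributed savings below are hypothetical stand-ins; only the 432-building population size comes from the study, and the numbers are not the paper's results.

```python
import numpy as np

def savings_quartiles(savings_by_setback):
    """Summarize per-building percentage savings at each setback increment.

    `savings_by_setback` maps a setback in degrees to an array of per-building
    annual savings (%). Returns Q1, median, Q3 and IQR = Q3 - Q1 for each setback.
    """
    summary = {}
    for setback in sorted(savings_by_setback):
        values = np.asarray(savings_by_setback[setback], dtype=float)
        q1, median, q3 = np.percentile(values, [25, 50, 75])
        summary[setback] = {"q1": float(q1), "median": float(median),
                            "q3": float(q3), "iqr": float(q3 - q1)}
    return summary

# Synthetic (not the paper's) savings for 432 buildings at 1-4 degree setbacks.
rng = np.random.default_rng(1)
synthetic = {k: rng.gamma(shape=2.0, scale=0.5 * k, size=432) for k in range(1, 5)}
for setback, s in savings_quartiles(synthetic).items():
    print(f"{setback} deg: Q1={s['q1']:.2f} median={s['median']:.2f} "
          f"Q3={s['q3']:.2f} IQR={s['iqr']:.2f}")
```

Tracking the IQR per increment, rather than only the median, is what exposes the growing variability at larger setbacks.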
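The Random Forest tuning and ranking steps of the variable-importance analysis (OOB RMSE over mtry and ntree, comparison with a held-out test error, and counting how often each variable is used in tree nodes) can be sketched as follows. The hyper-parameter names mtry, ntree and IncNodePurity match R's randomForest package; this sketch swaps in scikit-learn's `RandomForestRegressor`, where `max_features` and `n_estimators` play those roles and `feature_importances_` is a normalized mean-decrease-in-impurity analogue of IncNodePurity. The helper names, predictor names and synthetic data are hypothetical, and the 100-repetition bootstrap of Fig. 15a is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def oob_rmse(model, y):
    """RMSE of out-of-bag predictions (requires oob_score=True at fit time)."""
    residuals = np.asarray(y, dtype=float) - model.oob_prediction_
    return float(np.sqrt(np.mean(residuals ** 2)))

def fit_rf(X, y, n_trees, mtry, seed=0):
    # max_features plays the role of R's mtry; n_estimators plays the role of ntree.
    rf = RandomForestRegressor(n_estimators=n_trees, max_features=mtry,
                               oob_score=True, random_state=seed)
    return rf.fit(X, y)

def split_counts(rf, n_features):
    """How many internal nodes, across all trees, split on each feature."""
    counts = np.zeros(n_features, dtype=np.int64)
    for tree in rf.estimators_:
        feat = tree.tree_.feature  # leaf nodes carry a negative placeholder value
        counts += np.bincount(feat[feat >= 0], minlength=n_features)
    return counts

# Synthetic data with hypothetical predictor names; not the paper's dataset.
rng = np.random.default_rng(42)
names = ["cdd", "bpr", "usage_type", "r_et_occ", "r_et_unocc", "noise"]
X = rng.normal(size=(500, len(names)))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 1) Tune mtry by OOB RMSE at a fixed forest size.
mtry_errors = {m: oob_rmse(fit_rf(X_train, y_train, 200, m), y_train)
               for m in range(1, len(names) + 1)}
best_mtry = min(mtry_errors, key=mtry_errors.get)

# 2) Check how OOB RMSE levels off as trees are added.
ntree_errors = {n: oob_rmse(fit_rf(X_train, y_train, n, best_mtry), y_train)
                for n in (50, 100, 200, 400)}

# 3) Compare OOB error with held-out test error.
rf = fit_rf(X_train, y_train, 200, best_mtry)
test_rmse = float(np.sqrt(np.mean((y_test - rf.predict(X_test)) ** 2)))
print(f"best mtry={best_mtry}, OOB RMSE={mtry_errors[best_mtry]:.3f}, "
      f"test RMSE={test_rmse:.3f}")

# 4) Rank predictors by split share and by impurity-based importance.
counts = split_counts(rf, len(names))
for i in np.argsort(-rf.feature_importances_):
    print(f"{names[i]:11s} split share={counts[i] / counts.sum():.3f} "
          f"impurity importance={rf.feature_importances_[i]:.3f}")
```

Split counts follow the description of ranking by how often a variable is used in tree nodes, while the impurity-based score weights each split by how much it reduces the error; the two usually agree on the strongest predictors but can diverge for weak or noisy ones.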
2.4. Variable importance

To identify the relative importance of different variables to the
share of cooling in overall consumption, Random Forest regression
is applied to the data as described above. First, the hyper-parameters
must be tuned to ensure the highest accuracy of the method. The
error metrics for tuning the hyper-parameters (mtry and ntree) and
the overall performance of the Random Forest are shown in Fig. 15.
The hyper-parameters are tuned with 100 bootstrap repetitions
(Fig. 15a), resulting in an average OOB RMSE (Equation (8)) of 3.4,
ranging from 3.35 to 3.45. Evaluating the OOB error at different
values of mtry (Fig. 15b) shows that 10 is the optimal value. The OOB
error with up to 500 trees is shown in Fig. 15c. Beyond approximately
200 trees, adding trees does not significantly reduce the OOB RMSE.
Finally, the error on the test set used to evaluate the Random Forest
model [31] is very close to the OOB error, indicating satisfactory
performance.

After tuning the Random Forest for the highest accuracy, the input
variables that potentially affect the energy consumption of the
cooling system can be assessed and ranked. Recall that the ranking is
determined from the number of times each variable is used in the
nodes of the regression trees of the Random Forest model
(IncNodePurity [37]). As illustrated in Fig. 16, the number of CDDs is
the most critical parameter, with a significantly higher IncNodePurity
compared to the other parameters. This is because buildings with
higher CDD have more days on which the cooling system operates,
resulting in more savings opportunity. The number of CDD also
corresponds directly to location and climate zone: hotter climates
have more CDDs than colder climates. Perhaps obvious, but if cooling
savings is a priority for a building portfolio owner, then replacing
such systems, as needed, in hotter climates will lead to greater
savings for the portfolio. BPR, used 25% less often than CDDs in
Random Forest regression nodes, is the second most important
variable, since it contains information about the operation of cooling
systems during occupied and unoccupied times. In buildings with a
high BPR, unoccupied-time consumption is typically relatively high,
and therefore there is a great opportunity for savings. Since BPR can
be an important indicator of savings, this easy-to-determine value
should be routinely calculated when assessing the potential for
savings in a building.

Building usage type is the third most important variable. However,
it is used in the regression nodes 50% less often than BPR, illustrating
the greater significance of the first two variables. This was
previously discussed when presenting the breakdown of the savings
by building usage type (Table 2). It was shown that the HVAC share of
the total consumption has a different distribution (median, IQR, and
skewness) by building type, illustrating the critical
Fig. 15. Evaluation of Random Forest performance based on accuracy metrics: (a) histogram of the OOB RMSE from 100 bootstrap repetitions, where the blue line represents the median RMSE; (b) OOB error for mtry from 1 to 17; (c) comparison of the RMSE of the testing dataset and the OOB dataset with up to 500 trees in the RF.
importance of this parameter. The next most important variable, the
energy and temperature correlation, represents the impact of the
cooling system. As discussed earlier, the energy and temperature
correlation (rET) represents the HVAC operational characteristics: in
buildings with a high correlation, HVAC consumption is high. The
occupied energy and temperature correlation is more important than
the unoccupied one. This may be because during occupied times the
HVAC cooling load is higher and consumes more electricity, giving
occupied times a larger share of overall HVAC cooling usage. Note
that Random Forest performs well even when variables are highly
correlated [42]. Therefore, despite the high collinearity of the energy
and temperature correlations for the total, occupied, and unoccupied
times, the algorithm is capable of choosing the more important ones
for splitting the regression trees. The other parameters presented in
Fig. 16, each used in 2%–3% of total decision tree nodes, are within
the error of each other, indicating that these parameters are less
important.

Credit author statement

Arash Khalilnejad: Conceptualization, Methodology, Formal
analysis, Investigation, Writing – original draft, Writing – review &
editing. Roger H. French: Supervision, Funding acquisition. Alexis R.
Abramson: Supervision, Funding acquisition, Writing – review &
editing.

Declaration of competing interest

The authors declare the following financial interests/personal
relationships which may be considered as potential competing
interests: The authors are founders and equity holders in a startup
company, Edifice Analytics, that has spun out from Case Western
Reserve University (CWRU) where the work presented herein was
conducted. Edifice Analytics has a license from CWRU to perform
virtual energy audits of buildings.
Acknowledgments
[13] Zhang X, Cai M, Pipattanasomporn M, Rahman S. A power disaggregation approach to identify power-temperature models of HVAC units. In: 2018 IEEE international smart cities conference (ISC2); 2018. p. 1–6. https://doi.org/10.1109/ISC2.2018.8656976.
[14] Ahn K-U, Park C-S. Correlation between occupants and energy consumption. Energy Build 2016;116:420–33. https://doi.org/10.1016/j.enbuild.2016.01.010.
[15] Liu Z, Yin H, Ma S, Wei B, Jensen B, Cao G. Effect of environmental parameters on culturability and viability of dust accumulated fungi in different HVAC segments. Sustainable Cities and Society 2019;48:101538. https://doi.org/10.1016/j.scs.2019.101538.
[16] Wang L, Mathew P, Pang X. Uncertainties in energy consumption introduced by building operations and weather for a medium-size office building. Energy Build 2012;53:152–8. https://doi.org/10.1016/j.enbuild.2012.06.017.
[17] Kusiak A, Li M, Tang F. Modeling and optimization of HVAC energy consumption. Appl Energy 2010;87(10):3092–102. https://doi.org/10.1016/j.apenergy.2010.04.008.
[18] Papadopoulos S, Kontokosta CE, Vlachokostas A, Azar E. Rethinking HVAC temperature setpoints in commercial buildings: the potential for zero-cost energy savings and comfort improvement in different climates. Build Environ 2019;155:350–9. https://doi.org/10.1016/j.buildenv.2019.03.062.
[19] Crawley DB, Lawrie LK, Winkelmann FC, Buhl WF, Huang YJ, Pedersen CO, Strand RK, Liesen RJ, Fisher DE, Witte MJ, Glazer J. EnergyPlus: creating a new-generation building energy simulation program. Energy Build 2001;33(4):319–31. https://doi.org/10.1016/S0378-7788(00)00114-6.
[20] Hoyt T, Arens E, Zhang H. Extending air temperature setpoints: simulated energy savings and design considerations for new and retrofit buildings. Build Environ 2015;88:89–96. https://doi.org/10.1016/j.buildenv.2014.09.010.
[21] Ghahramani A, Zhang K, Dutta K, Yang Z, Becerik-Gerber B. Energy savings from temperature setpoints and deadband: quantifying the influence of building and system properties on savings. Appl Energy 2016;165:930–42. https://doi.org/10.1016/j.apenergy.2015.12.115.
[22] Cai M, Ramdaspalli S, Pipattanasomporn M, Rahman S, Malekpour A, Kothandaraman SR. Impact of HVAC set point adjustment on energy savings and peak load reductions in buildings. In: 2018 IEEE international smart cities conference (ISC2); 2018. p. 1–6. https://doi.org/10.1109/ISC2.2018.8656738.
[23] Aghniaey S, Lawrence TM. The impact of increased cooling setpoint temperature during demand response events on occupant thermal comfort in commercial buildings: a review. Energy Build 2018;173:19–27. https://doi.org/10.1016/j.enbuild.2018.04.068.
[24] Erhardt RJ. Mid-twenty-first-century projected trends in North American heating and cooling degree days. Environmetrics 2015;26(2):133–44. https://doi.org/10.1002/env.2318.
[25] Hossain MA. Development of building markers and unsupervised non-intrusive disaggregation model for commercial buildings' energy usage. Ph.D. thesis. Case Western Reserve University; 2018.
[26] Yang L, Liu S, Tsoka S, Papageorgiou LG. Mathematical programming for piecewise linear regression analysis. Expert Syst Appl 2016;44:156–67. https://doi.org/10.1016/j.eswa.2015.08.034.
[27] 42540_19 - chapter 19 energy Estimating and modeling Methods General Considerations 19.1 models and approaches 19.1 characteristics of models 19.1. Course Hero.
[28] Hitchin R, Knight I. Daily energy consumption signatures and control charts for air-conditioned buildings. Energy Build 2016;112:101–9.
[29] Heidarinejad M, Mattise N, Dahlhausen M, Sharma K, Benne K, Macumber D, Brackney L, Srebric J. Demonstration of reduced-order urban scale building energy models. Energy Build 2017;156:17–28.
[30] Bakdash JZ, Marusich LR. Repeated measures correlation. Front Psychol 8. https://doi.org/10.3389/fpsyg.2017.00456.
[31] Segal MR. Machine learning benchmarks and random forest regression.
[32] Liu D, Sun K. Random forest solar power forecast based on classification optimization. Energy 2019;187:115940. https://doi.org/10.1016/j.energy.2019.115940.
[33] Pashaei V, Dehghanzadeh P, Enwia G, Bayat M, Majerus SJ, Mandal S. Flexible body-conformal ultrasound patches for image-guided neuromodulation. IEEE Transactions on Biomedical Circuits and Systems 2019;14(2):305–18. https://doi.org/10.1109/TBCAS.2019.2959439.
[34] Ishwaran H. The effect of splitting on random forests. Mach Learn 2015;99(1):75–118. https://doi.org/10.1007/s10994-014-5451-2.
[35] Wang Z, Wang Y, Zeng R, Srinivasan RS, Ahrentzen S. Random Forest based hourly building energy prediction. Energy Build 2018;171:11–25. https://doi.org/10.1016/j.enbuild.2018.04.008.
[36] Aggarwal P, Karri S, Pashaei V, Dehghanzadeh P, Mandal S, Subramanyam G. Towards automated positioning of ultrasonic probes. In: 2019 IEEE national aerospace and electronics conference (NAECON). IEEE; 2019. p. 477–80. https://doi.org/10.1109/NAECON46414.2019.9058185.
[37] Grömping U. Variable importance assessment in regression: linear regression versus random forest. Am Statistician 2009;63(4):308–19. https://doi.org/10.1198/tast.2009.08199.
[38] Peel MC, Finlayson BL, McMahon TA. Updated world map of the Köppen-Geiger climate classification. Hydrol Earth Syst Sci 2007;11(5):1633–44. https://doi.org/10.5194/hess-11-1633-2007.
[39] Zhang X, Pipattanasomporn M, Kuzlu M, Rahman S. Conceptual framework for a multi-building peak load management system. In: 2016 IEEE PES innovative smart grid technologies conference europe (ISGT-Europe); 2016. p. 1–5. https://doi.org/10.1109/ISGTEurope.2016.7856238.
[40] Ghahramani A, Jazizadeh F, Becerik-Gerber B. A knowledge based approach for selecting energy-aware and comfort-driven HVAC temperature set points. Energy Build 2014;85:536–48. https://doi.org/10.1016/j.enbuild.2014.09.055.
[41] CBECS. Energy usage summary. 2012. https://www.eia.gov/consumption/commercial/.
[42] Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am J Epidemiol 2014;179(6):764–74. https://doi.org/10.1093/aje/kwt312.