To evaluate the impact of forecast errors on the different HVAC controllers, we built a custom open-source thermal simulator called ThermalSim. ThermalSim is a lightweight C/C++-based simulation platform for studying the influence of prediction errors on HVAC operations. Figure 1 outlines the architecture of ThermalSim. It consists of four major modules:

1. Master: to handle data I/O and preprocessing,
2. Error Management: to inject unbiased errors in the occupancy streams,
3. Simulator: to simulate room temperature for a given thermal model and control logic, and
4. Analyser: to compute energy consumption, occupant comfort, and analyze simulated data streams.
In the current version, the Simulator module incorporates AMPL (2014), an algebraic modeling language for mathematical programming, to compute the control parameters.
Master module
The Master module takes as input historical weather and occupancy data in CSV (Comma Separated Values) format, a user-generated description of the building, and simulation control parameters (Fig. 2), including the start and stop times of the simulation, the parameters of the thermal model, and the control strategy. Before executing the simulations, the Master module preprocesses the data; after completion, it saves the simulation output in CSV format.
Modeling occupancy prediction errors
ThermalSim represents occupancy data for a day as a string of consecutive 0’s (for unoccupied workspaces) and 1’s (for occupied spaces). We consider only two states of occupancy because a majority of occupancy prediction algorithms use occupancy as a two-state variable. We call this string an occupancy string. The length of a single occupancy string depends upon the sampling rate of the occupancy data. Data sampled every ten minutes will generate an occupancy string of length 144 characters, and if the sampling rate is thirty seconds, the string will be 2880 characters long.
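The string lengths quoted above follow directly from the number of samples in a 24-hour day. The following is a minimal Python sketch of that arithmetic and of the 0/1 encoding; the function names are hypothetical and are not taken from ThermalSim.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def occupancy_string_length(sampling_interval_s: int) -> int:
    """Number of characters in one day's occupancy string."""
    if sampling_interval_s <= 0 or SECONDS_PER_DAY % sampling_interval_s != 0:
        raise ValueError("sampling interval must evenly divide one day")
    return SECONDS_PER_DAY // sampling_interval_s


def to_occupancy_string(samples) -> str:
    """Encode truthy/falsy occupancy samples as '1'/'0' characters."""
    return "".join("1" if s else "0" for s in samples)


# Ten-minute sampling gives 144 characters; thirty-second sampling gives 2880.
ten_minute_length = occupancy_string_length(600)
thirty_second_length = occupancy_string_length(30)
```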
Error matrix
It is important that occupancy forecast errors be realistic. For example, it does not make sense to randomly flip occupancy states, since this may result in forecasting occupancy during the middle of the night, which is very unlikely. Our key insight is that a likely outcome of an errored forecast is to forecast another valid occupancy string, with the observation that the higher the error rate, the larger the distance, in an appropriate metric space, between the true and the errored strings.
We use the following approach: for a dataset with n occupancy strings, each cell of an error matrix holds the Hamming distance between a pair of occupancy strings, that is, the number of mismatching characters (Hamming 1950). To normalize, we divide the value in each cell by the length of the occupancy string. The error matrix is a symmetric n × n matrix with a zero diagonal, which helps in systematically injecting unbiased errors into the occupancy data.
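As an illustration, the normalized error matrix can be sketched in a few lines of Python. This is not ThermalSim's implementation; the function names and the three toy eight-character strings are assumptions made for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("occupancy strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def build_error_matrix(strings):
    """Symmetric n x n matrix of Hamming distances normalized by string length."""
    if not strings:
        raise ValueError("need at least one occupancy string")
    n = len(strings)
    length = len(strings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(strings[i], strings[j]) / length
            matrix[i][j] = d  # fill both halves: the matrix is symmetric
            matrix[j][i] = d
    return matrix


# Toy data: three "days" of eight samples each (illustrative only).
days = ["00111100", "00011100", "01111110"]
error_matrix = build_error_matrix(days)
# error_matrix[0][1] is 1/8 = 0.125 and error_matrix[1][2] is 3/8 = 0.375.
```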
To illustrate, consider a scenario where we want to analyze different control strategies with 10% prediction error in the occupancy data. The error management module consults the error matrix for the occupancy string that is closest to the day of analysis. We term the selected occupancy string the reference string. The module then searches the error matrix for all strings that have 10% error relative to the reference string and randomly selects one. We call the selected one an erroneous string. If the day (reference string) was 30% occupied, then the occupancy in the erroneous string may fall anywhere between 20% and 40%.
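The selection step can be sketched as follows. The text does not say how close a candidate must be to the target error, so the `tolerance` parameter is an assumption, as are the function name and the small hand-written matrix.

```python
import random


def pick_erroneous_string(error_matrix, reference_idx, target_error,
                          tolerance=0.01, rng=None):
    """Return the index of a randomly chosen string whose normalized distance
    to the reference is within `tolerance` of `target_error`, or None.

    `tolerance` is an assumed parameter, not one described in the source.
    """
    rng = rng or random.Random()
    candidates = [
        j for j, d in enumerate(error_matrix[reference_idx])
        if j != reference_idx and abs(d - target_error) <= tolerance
    ]
    if not candidates:
        return None
    return rng.choice(candidates)


# Hand-written normalized error matrix for four days (illustrative values).
matrix = [
    [0.000, 0.100, 0.250, 0.105],
    [0.100, 0.000, 0.200, 0.300],
    [0.250, 0.200, 0.000, 0.400],
    [0.105, 0.300, 0.400, 0.000],
]
# Days 1 and 3 are both about 10% away from reference day 0.
chosen = pick_erroneous_string(matrix, reference_idx=0, target_error=0.10,
                               rng=random.Random(0))
```

Passing a seeded `random.Random` keeps a simulation run reproducible, which matters when the same error percentage is evaluated with several erroneous strings.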
Simulator
The simulator module takes input from the master and error management modules to simulate the room temperature. It comprises two major blocks:

1. thermal model: depicts various thermal interactions occurring within a room, and
2. control module: to compute the control parameters.
In the current version, we have implemented two thermal models:

1. single region: no partition exists within a room (Eq. 1), and
2. two regions: the occupied area is separated from the unoccupied portion by a thin layer of air (Eqs. 4–7).
As discussed in “HVAC control strategies” section, we have implemented four HVAC controllers in ThermalSim:

1. schedule-based,
2. reactive,
3. model predictive control (no SPOT device present), and
4. SPOT-aware model predictive control.
In the rest of the paper, we will use NS as an acronym for No-SPOT model predictive control and SA for SPOT-Aware MPC.
Simulator validation
To quantify the accuracy of ThermalSim in simulating room temperature, we collected temperature data for 17 days from a room in a residential apartment and carried out leave-p-out cross-validation with p = 5. In this approach, we validate the model on p observations and use the remaining observations for training. We used a nonlinear solver whose objective was to minimize the residual between the predicted and actual room temperatures. The simulator tunes the following model parameters:

1. thermal capacity of the room (C),
2. heat transfer coefficient between outside and room (α_{ex}),
3. coefficient of heating/cooling (ρσ),
4. heat load due to occupants (Q_{ac}), and
5. heat load due to heating/cooling appliances (Q_{ac}).
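For readers unfamiliar with leave-p-out cross-validation, the sketch below enumerates the train/validation splits and defines the RMSE used to score them. It assumes that each of the 17 days is one observation and that every split is evaluated; the text does not state either point, and the nonlinear parameter fit itself is omitted.

```python
from itertools import combinations
from math import comb, sqrt


def leave_p_out_splits(n_samples: int, p: int):
    """Yield (train_indices, validation_indices) for every choice of p held-out samples."""
    for held_out in combinations(range(n_samples), p):
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, list(held_out)


def rmse(predicted, actual) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(predicted))


# With 17 days and p = 5 there are C(17, 5) = 6188 possible splits.
n_splits = comb(17, 5)
first_train, first_val = next(leave_p_out_splits(17, 5))
```

In a full evaluation, the model parameters would be fitted on each `train` set and the RMSE computed on the corresponding validation days.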
Our analysis (in Fig. 3) indicates that ThermalSim can simulate the daily room temperature with an RMSE (Root Mean Square Error) of 1.52 °C (σ = 0.18 °C). Figure 4 depicts the average (solid line) and predicted (dashed line) room temperature. Note that though the predicted room temperature follows the pattern of actual room temperature, it fails to align perfectly. Though misalignment does increase the RMSE at some time instances, we found that it has little overall impact on total energy consumption and occupants’ comfort.
Metrics
Energy consumption
Equation 9 computes the total energy consumption of a building for a day. Here, Po(t) denotes the power consumption of the HVAC and other heating/cooling devices, τ is the sampling interval in seconds (the factor τ/3600 converts each sample to hours), and n_{t} is the number of daily samples.
$$ E = \sum\limits_{t=0}^{n_{t}} Po{(t)} \times \frac{\tau}{3600} $$
(9)
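Equation 9 amounts to a scaled sum over the day's power samples. A minimal Python sketch follows, assuming power is recorded in kW so that the result is in kWh; the function name and the constant-load example are illustrative.

```python
def daily_energy(power_samples, sampling_interval_s: float) -> float:
    """Energy for one day: sum of Po(t) * tau / 3600 over all samples.

    With power in kW and tau in seconds, the result is in kWh (assumed units).
    """
    return sum(power_samples) * sampling_interval_s / 3600.0


# A constant 1.5 kW load sampled every ten minutes for a full day (144 samples).
energy_kwh = daily_energy([1.5] * 144, sampling_interval_s=600)
# 1.5 kW for 24 h is 36 kWh.
```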
Occupant discomfort
ThermalSim leverages Predicted Mean Vote (PMV) (ASHRAE 2006) to estimate the comfort level of the occupants (Eq. 10). At a given time instant t, if PMV (P^{ij}(t)) lies within the comfort requirements ([P_{ll},P_{ul}]) of an individual then we mark the room as comfortable, else uncomfortable. \(D_{\%}^{ij}\) denotes the percentage of time instances in a day when the user was uncomfortable in the room.
$$ P^{ij}{(t)} = P1 \times T_{oc}^{ij}{(t)} - P2 \times v_{a}^{ij}{(t)} + P3 \times v_{a}^{ij}{(t)} \times v_{a}^{ij}{(t)} - P4 $$
(10)
$$ D^{ij}{(t)} = \max\left(0,\; P_{ll} - P^{ij}(t),\; P^{ij}(t) - P_{ul}\right) $$
(11)
$$ D_{\%}^{ij} = \frac{{\sum\nolimits}_{t=0}^{n_{t}}[{D^{ij}{(t)} \ne 0}]}{{\sum\nolimits}_{t=0}^{n_{t}}[{O^{ij}{(t)} = 1}]} $$
(12)
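Equations 10 to 12 can be read as the small Python sketch below. The coefficients P1 to P4 are passed in as parameters because their values are not given here. Eq. 12 is followed literally, so the numerator counts every time instant with non-zero discomfort and the result is a ratio (multiply by 100 for a percentage). All function names are hypothetical.

```python
def pmv(temp_c, air_speed, p1, p2, p3, p4):
    """Eq. 10: PMV from occupant-zone temperature and air speed."""
    return p1 * temp_c - p2 * air_speed + p3 * air_speed * air_speed - p4


def discomfort(pmv_value, lower, upper):
    """Eq. 11: distance of the PMV outside the comfort band [lower, upper]."""
    return max(0.0, lower - pmv_value, pmv_value - upper)


def discomfort_fraction(pmv_series, occupancy, lower, upper):
    """Eq. 12: uncomfortable instants divided by occupied instants."""
    occupied = sum(1 for o in occupancy if o == 1)
    if occupied == 0:
        return 0.0  # assumed convention: no occupancy means no discomfort
    uncomfortable = sum(1 for v in pmv_series if discomfort(v, lower, upper) != 0)
    return uncomfortable / occupied


# Illustrative PMV values for four occupied instants and a band of [-0.5, 0.5].
fraction = discomfort_fraction([0.0, 0.8, -1.0, 0.2], [1, 1, 1, 1], -0.5, 0.5)
# Two of the four instants fall outside the band, so fraction is 0.5.
```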
Robustness
Prediction errors are stochastic in nature and their impact on energy consumption and occupant comfort depends on two factors:
Nature of the error: If the prediction algorithm mispredicts occupancy for short time intervals (say for a minute or so), we term the prediction errors as point errors, otherwise we call them burst errors. For a particular error percentage, an erroneous occupancy string can have point errors, burst errors, or a mix of both; resulting in different values of energy consumption and occupants’ discomfort for the same error percentage.
Timing of the error: The occupancy prediction algorithm can make errors at any time of the day, such as during peak or non-peak hours. Consider the situation where the occupancy prediction has 15% error during the peak hours and the controller assumes one of the five rooms to be occupied though it was unoccupied. In this situation, there is a high chance that the HVAC is already running at that time. Given that the other four rooms are occupied, this particular prediction error will have an insignificant impact on the HVAC operations. However, during night time, the same error percentage might waste significant energy. This illustrates that the timing of the prediction errors has a significant impact on both comfort and energy consumption.
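To make the point/burst distinction concrete, the sketch below measures runs of consecutive mismatches between a reference string and an erroneous string and classifies each run by its duration. The 60-second threshold is an assumption based on the phrase "a minute or so", and the function names are hypothetical.

```python
def error_runs(reference: str, erroneous: str):
    """Lengths (in samples) of consecutive runs of mismatched characters."""
    if len(reference) != len(erroneous):
        raise ValueError("occupancy strings must have the same length")
    runs, current = [], 0
    for r, e in zip(reference, erroneous):
        if r != e:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)  # close a run that reaches the end of the day
    return runs


def classify_runs(runs, sampling_interval_s, point_threshold_s=60):
    """Count point errors (duration <= threshold) and burst errors (longer)."""
    point = sum(1 for n in runs if n * sampling_interval_s <= point_threshold_s)
    return {"point": point, "burst": len(runs) - point}


# Toy strings sampled once per minute (illustrative only).
runs = error_runs("0011110000", "0111000010")
summary = classify_runs(runs, sampling_interval_s=60)
# runs is [1, 2, 1]: two one-minute point errors and one two-minute burst.
```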
For a specific error percentage, depending on the nature and timing of the errors, the energy consumption and user discomfort may either increase or decrease, potentially destabilizing HVAC operations. For a specific example, consider the big circle and triangle in Fig. 5, which depict the energy consumption and user discomfort for NS and SA controllers respectively for perfect occupancy predictions in a particular simulation scenario. For a specific error percentage, the small circles (NS) and triangles (SA) depict the energy consumption and user discomfort for fifteen different erroneous occupancy strings. We noticed that as prediction error increases from 5% (left) to 20% (right), the points indicating erroneous strings start moving away from the results obtained from perfect prediction.
Note that the circles (NS) are more scattered than the triangles (SA). In the case of NS, the system decides the control parameters such that the desired room temperature (which is the same for each room) is achieved across all the rooms. In case of a sudden change in the occupancy, NS updates the control parameters, but it takes significant time to re-attain the energy-discomfort tradeoff setpoint. In contrast, in SA, the controller knows the current state of SPOT; thus, the controller chooses a set point such that HVAC provides a certain level of comfort to the occupants and SPOT provides the necessary additional offset. SPOT, being responsive in nature, keeps the comfort level of individuals within the desired range with insignificant increase in aggregate energy consumption. Therefore, even if the error percentage increases, the energy and discomfort stays close to the perfect prediction for SA whereas NS becomes highly unstable.
To capture this phenomenon, Eq. 13 defines a robustness metric, robust_{cs} with cs ∈ {NS, SA}, which quantifies the robustness of a particular control strategy cs towards the prediction errors. It computes the percentage of instances that stay within the limits set by the building manager.
$$ {robust}_{cs}~(\%) = \frac{\text{\# of instances within limits}}{\text{total \# of instances}} \times 100 $$
(13)
For concreteness, we use ± 20 kWh and ± 5% as the acceptable limits for energy consumption and occupants’ discomfort, respectively, as shown by the rectangles in the figure. For the given scenario (in Fig. 5), as the error percentage increases from 5 to 20%, NS becomes less robust towards the prediction error (60%→0%), whereas SA remains consistent (100%→93%). For a predictive control strategy, a PEC system (like SPOT) mitigates the effect of prediction errors to make the HVAC operations more reliable and robust. Whenever there is unexpected occupancy in the room, SPOT can react quickly compared to the central HVAC system, which operates on a slower timescale.
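Equation 13 can be sketched as below. The limits are interpreted here as deviations from the perfect-prediction result (the large marker in the figure), which is an assumption. The data points are invented for illustration and are not results from the paper.

```python
def robustness(baseline, results, energy_limit_kwh=20.0, discomfort_limit_pct=5.0):
    """Eq. 13: percentage of runs whose (energy, discomfort) stays within limits.

    `baseline` is the (energy, discomfort) pair for perfect prediction;
    limits are applied as absolute deviations from it (assumed interpretation).
    """
    if not results:
        raise ValueError("need at least one simulation result")
    base_energy, base_discomfort = baseline
    within = sum(
        1 for energy, disc in results
        if abs(energy - base_energy) <= energy_limit_kwh
        and abs(disc - base_discomfort) <= discomfort_limit_pct
    )
    return 100.0 * within / len(results)


# Fifteen illustrative runs: fourteen inside the limits, one 30 kWh above baseline.
baseline = (100.0, 10.0)
results = [(100.0 + k, 10.0) for k in range(14)] + [(130.0, 10.0)]
score = robustness(baseline, results)
```

With fifteen erroneous strings per error percentage, a value of 93% is consistent with fourteen of the fifteen runs staying within limits, and 60% with nine of fifteen.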