Non intrusive load monitoring for demand side management

*Correspondence: matteo.salani@supsi.ch 1Dalle Molle Institute for Artificial Intelligence, USI-SUPSI, Lugano, Switzerland Full list of author information is available at the end of the article Abstract In the context of a pilot project, the Lugaggia Innovation Community (LIC), we address the problem of non-intrusive load monitoring for the purpose of demand side management on low voltage grids in presence of distributed power generation (photovoltaic). From the power readings of smart meters, we estimate the photovoltaic production and detect the activation of major loads (heatpumps and domestic water heaters). Experiments, conducted with real data and in silico, show that exploiting meter readings only, we can estimate PV production with MAPE ranging from 4.6% (best case) to 41.9% (worst case). Even with non negligible photovoltaic production estimation errors, the proposed method is capable of detecting the activation of heatpumps and domestic water heaters.


Introduction
Distributed power generation is spreading considerably in many countries thanks to different political and economic actions. In Switzerland, the "Energy Strategy 2050" 1 is supporting the installation of new renewable sources which, at domestic level, are mainly represented by photovoltaic installations. While this growth is beneficial for reducing the CO2 emissions, it poses a series of issues from the viewpoint of grid operators as they are contractually bound to provide a certain power quality. One of the possible mitigation actions to overcome these issues is represented by active Demand Side Management (DSM), that is a set of actions to directly control a site's energy consumption while respecting end user needs. In the context of a pilot project, the Lugaggia Innovation Community (LIC) funded by the Swiss Federal Office of Energy SFOE project SI/501840-01, we study the application of active DSM on a set of electrical loads spread in a residential area with the presence of renewable sources as well as energy storage systems. The municipality of Capriasca (Ticino, Switzerland), where the LIC community is located, recently installed a PV plant on the roof of the local kindergarten and other PV installations are already present in the area. The controllable loads and photovoltaic installations are summarized in Table 1. Aerial pictures of the PV installations are shown in Fig. 1a and b. The focus of this paper is to detail the methodology we use for Non-Intrusive Load Monitoring (NILM). In particular we address the issue of estimating the photovoltaic production in absence of a direct monitoring infrastructure and in absence of accurate meteorological data.

State of the art
Energy disaggregation or non intrusive load monitoring has been studied since the 90s when Hart (Hart 1992) proposed his approach based on the detection of edge changes in the power signal, their clustering, and the subsequent optimal allocation to the possible appliances in order to get the best fit for the aggregate signal. This approach has influenced many authors which followed and some of its elements are still in use in the most effective algorithms. Extensive reviews of such algorithms are provided, for instance, by Zoha et al. (2012) and by Zeifman and Roth (2011).
NILM algorithms can be classified as proposed by Schirmer et al. (2020) who distinguish between algorithms using source separation, and those who does not. Source separation, or blind signal separation, is a known filtering problem where an aggregate signal composed by a mixture of a set of other signals is processed in order to obtain the individual contributions. This is exactly the problem that NILM attempts to solve and source separation has been solved in the past using a variety of approaches, from principal component analysis to singular value decomposition, to non-negative matrix factorisation. All these methods are characterised by the fact that they are unsupervised and they Fig. 1 Photovoltaic installations, source: https://map.geo.admin.ch try to optimise an objective function that represents the distance of the aggregate signal from the sum of the independent contributions.
The problem of source-separation algorithms is that their computational load is quite intense and that, in order to improve their performance, they also need to know with a certain detail the number and type of appliances into which the signal is to be decomposed, thus trading in the advantage of being unsupervised.
Algorithms that does not use the source separation approach are typically supervised algorithms that require substantial amounts of training data to take advantage of machine learning algorithms such as Support Vector Machines, Decision Trees and Deep Neural Networks such as the one by (Kong et al. 2019) .
The approach presented in this paper is based on a source separation method and it is described in the next section. It builds on previous work (Cominola et al. 2017;Piga et al. 2016;Rottondi et al. 2019) and it is also inspired by recent research from Bu et al. (Bu et al. 2020). The choice of this approach was favoured not only by our previous experience with it, but also by the consideration that we knew beforehand which devices we wanted to disaggregate: in our case, heat pumps and domestic water heaters, considering all the other devices as a residual noise after having removed the contribution of PV power generation to the aggregated signal.

Methodology
Over a given discretized time frame T, a given power profile is characterized by an aggregated effect of several loads. In particular we are given the power profile as series of P-Q (active and reactive) points p t and q t , ∀t ∈ T. We are given a set of controllable electrical loads L each characterized by a nominal power p l and a nominal reactive power q l , ∀l ∈ L. We are also given an invariant representative dataset of non-controllable profiles S that are used to represent the contribution of the uncontrollable component of the power profile, each characterized by a given P-Q power point at time t defined by p s,t and q s,t . The NILM problem consists in determining at which moment in time load l was absorbing power and which combination of power profile s is the most appropriate to represent non-controllable loads.

Mixed integer quadratic problem (MIQP) formulation
Let x l,t ∈ {0, 1} be the controllable load variables (i.e. x l,t takes value 1 if load l is estimated to absorb power at time t), y s ∈[ 0, 1] be the non-controllable profile selection variables (note y s is a continuous variable and can therefore represent the selection of any convenient fraction of the representative non-controllable load profile s). Let wp t and wq t be auxiliary variables used to represent the estimation error at time t.
Objective (1) is a convenient formulation to minimize the RMS error between the given power profile p t and q t and the aggregation of controllable and non controllable loads. Constraints (2) and (3) compute the absolute error as the difference between the meter reading p t and q t and the estimated contribution of loads l ∈ L and the reference power profiles s ∈ S at time t. Constraints (4) ensure that a convex combination of reference power profiles is selected.
Each controllable load l is connected to an actuation device and sometimes the metering infrastructure is able to provide, together with the regular power metering measurements, the status of the actuation device as an additional data series. Let u l,t ∈ {0, 1}, ∀l ∈ L, t ∈ T be the representation of the state of the actuation device: 1, if the load l is connected to the grid, i.e. it can absorb power, at time t, 0 otherwise. We can conveniently exploit this knowledge in the MIQP formulation by setting upper bounds to variables x l,t as follows: The MIQP formulation can be further improved by the analysis of the type of electrical loads to be monitored. Indeed, some of the controllable electrical loads l are used to heat materials characterized by a considerable inertia (e.g. water, air, concrete and furnitures). We can incorporate this knowledge in MIQP. First, we have to capture the state change (first derivative) by adding continuous auxiliary variables 0 ≤ f l,t ≤ 1, l ∈ L, t ∈ T \ t 0 that are linked with expected activation variables x l,t , then we limit the total number of state changes as follows: with the set of constraints (8) and (9) we ensure that the expected number of activations of the electrical load l is limited by a constant θ l .

Photovoltaic non intrusive monitoring
In presence of PV installations, their power production can be absorbed by local electrical loads in what is commonly referred as self-consumption. In such cases, the metering infrastructure records the net power consumption only. While PV installations may have a dedicated monitoring infrastructure and the load power profile can be easily computed, more and more installations are not monitored as the PV production is no more subsidized in many countries. This is the case of the PV installation in the LIC area. Figure 2 illustrates the case where some loads activate during the period when PV is producing power.
When the PV installations are not monitored, the methodologies for NILM must be enriched in order to account for PV production. One possibility is to enlarge the set of uncontrollable load profiles S with power profiles accounting for PV production, i.e. power profiles with negative values corresponding to the estimated PV production. This approach has been adopted in (Sossan et al. 2018;Nespoli and Medici 2017).
Another approach is to exploit global horizontal irradiance (GHI) data coming from weather services and estimate the PV power production based on each installation's nominal data (nominal ac and dc power). In order to do so, software libraries are normally used. One popular software library is PvLib, a community supported tool that provides a set of functions and classes for simulating the performance of photovoltaic energy systems (Holmgren et al. 2018). This latter approach is usually more accurate than the first one in case weather and nominal data are accurate. In our case, accurate weather data is not available. Therefore we reconstruct an approximation of the global irradiance data exploiting the metering data of all installations that are close to one another. The rationale behind this approach is that not all major loads are absorbing power simultaneously and the effect of PV installations is directly visible in the metering data.
Let H be the set of metering infrastructures reading neighbourhood households equipped with PV installations, p t,h identifies the power reading at time t ∈ T of household h ∈ H. Let GC t be the global irradiance in clear sky conditions at time t ∈ T, computed using PvLib. Letp t,h be an upper bound on the power production of the PV installation of household h computed with PvLib using the known nominal data of the installation and the global irradiance with clear sky conditions and r t,h be the ratio between the upper bound and the negative portion of the power reading limited by an upper boundr (equal to 1.1 in our tests to allow for some additional freedom) andr t be the maximum of such ratios for t ∈ T: Using r t we approximate the global irradianceĨ t as a fraction of the clear sky global irradiance GC t and estimate the PV installation power productionp t,h of household h with PvLib.

Numerical experiments
The present section is organized in two subsections. We first present the computational results related to the estimation of the global horizontal irradiance. Then, we present the computational results related to the NILM methodology applied to synthetic data. The rooftop of the Kindergarten, see Fig. 1b, is also equipped with a pyranometer for measuring solar irradiance gathering data with one minute resolution. We use this data for validation purposes only. Figure 3 shows the global irradiance for a period of two months.
We gather real data from the LIC area for the period comprised from 04.04.2020 to 03.06.2020. We collect one data point per minute. For space reasons we summarize results as weekly averages (daily results are available contacting the authors). Table 2 reports the test weeks and the average daily measured GHI in the first two columns. Then the remaining columns report on the Weekly Average Root Mean Square Error (RMSE) and the Weekly Average Mean Absolute Percentage Error (MAPE) of the estimated GHI with respect to the measured GHI. In our tests, we use time discretization of the set T of 5 minutes leading to |T| = 288 time steps. Columns marked with > 0 and > AVG report values when the time steps are filtered by considering those with measured GHI larger than 0 and larger than the daily average, respectively. Indeed, several time steps are characterized with a measured GHI = 0 (during night time) and the GHI is easily estimated. This is confirmed by results in Table 2 observing that the values in the column RMSE(>0) are always larger than those in column RMSE. Instead, by definition, MAPE is computed on time steps where the measured GHI is strictly positive. When observing time steps where

Fig. 4 Measured and estimated GHI
the measured GHI is larger than the daily average, that is on time steps which should correspond to large PV power production, we do not observe a clear trade off in the quality of the estimation. Considering daily results, we report that MAPE(>AVG) varies from 4.6% to 41.9%. The plot of these two cases are shown in pictures Fig. 4a and b. In Fig. 4a we can appreciate the correct estimation of a sudden drop of the GHI around 9AM. We see that the worst cases occur when the daily avg GHI is very low (cloudy days) and the PV production is indeed less relevant. If we limit the investigation to sunny or scattered days (e.g., with GHI >100 W /m 2 ) we observe that the worst MAPE is 21.2% (see Fig. 5).
While the estimation quality may seem particularly bad on some days, the outcome of PV production estimation is sufficient for the purposes of non intrusive load monitoring.

Non intrusive load monitoring
For non intrusive load monitoring, we tested the MIQP formulation approach with the help of a comprehensive low voltage simulation framework called OPTISIM and developed within SUPSI. The simulation framework assembles several state-of-the-art IT components and integrates a power flow solver (Rosato et al. 2018), in order to carry out the electrical calculations. For the simulation of the grid-interfaced agents driven by the control algorithms, OPTISIM simulates physical models of several classes of components, namely: Building thermal dynamics, Heat distribution systems, Heating logic, Heat pumps, Water tanks and boilers, PV plants, Batteries, EV chargers. Profiles of passive, uncontrolled loads and domestic hot water consumption were generated using an artificial load profile generator 2 . We remark that the uncontrolled load profiles used in OPTISIM are not used as a basis for set S.
The experimental scenario is modeled to simulate the LIC area. While the area is composed of 18 households, 13 of them are equipped with 12 Domestic Hot water (DHW) heaters and 12 Heatpumps (HP). PV installations are similar to those present in the LIC area. The OPTISIM simulator provides both the aggregated power reading as well as the separated power profiles that we use for validation. A time resolution of 1 min is used for simulation. Recorded meteo data (GHI, Temperature and Wind speed) over one calendar year in 2018 are used by OPTISIM to simulate heat needs and PV production.
The MIQP formulation is solved with Gurobi 9.0 on a standard desktop machine. We used two settings for the time-step discretization. The first setting uses a time resolution of one minute, that is |T| = 1440, therefore compatible with the time resolution of OPTISIM. In this case, the model is solved to optimality with an average computational time of 20s seconds. The second setting uses a time resolution of 5 min by averaging the power reading provided by OPTISIM. In this case |T| = 288 and the average computational time drops to 2s to reach optimality.
In Table 3 we compare the quality of NILM for the two time resolution over 20 consecutive days in January. The table reports the considered loads (the number in the load's id refers to the relative household), the total energy consumption over the period and the fraction of time the load was absorbing power during the period. Then we report the RMSE error and the RMSE error restricted to samples where the load was actually absorbing power. Then we report in column ERROR the percentage error, that is the fraction of times the status of the load was wrongly estimated.
In January power consumption is relatively high, in particular heat pumps contribute to the energy consumption by a large amount and they are up and running for up to 29% of the time.
We observe that head pumps are correctly monitored with very small errors. This is mostly due to the fact that heat pumps exhibit a clear footprint in reactive power, which is not the case for resistive loads. When aggregating to 5 min resolution, estimation errors start to rise up to 21% for one of the DHW loads. Even with lower resolution heat pumps are monitored with minor errors, up to 6.8%. We observe that for 1 min resolution, the RMSE is larger than RMSE> 0. That is, NILM correctly estimates activation periods and sometimes produces false positives. This behaviour is less critical for DSM applications as false positives constitute an overestimation of the power needs. Instead, for 5 min resolution, the RMSE is smaller than RMSE> 0. That means that sometimes load activation is underestimated and that is more critical for DSM applications. This effect is due to aggregation when, for example, loads are active for a short amount of time (about 5 min) and the activation occur across half of the time interval. In Table 4 we compare the quality of NILM for 20 days in two different periods of the year with time resolution set to 1 minute. For the sake of compactness, in this table we report the ERROR column only.
We observe that energy consumption decreases significantly as the weather gets warmer. Heat pumps are still monitored with very high accuracy, while some major errors are seen on DHW.
In Table 5 we compare the quality of NILM for 20 days in the same period of that reported in Table 3 comparing the effect of limiting the estimated activation of DHW loads, i.e. the introduction of constraints (8) and (9). In our tests we set θ l = 12 that is, we expect the activation of DHW load to occur up to every two hours in average.
We observe that quality of the estimation increases significantly for almost all DHW loads except load dhw_11. Anyway, setting the correct value of θ l is not trivial and further research should be devoted to that aspect.
For the sake of illustration, Fig. 6 shows the application of NILM on household 06 on 03.01.2018. We observe the presence of a PV installation which is correctly estimated by the method described in subsection Photovoltaic non intrusive monitoring and the correct detection of the activation of both the DHW and the heat pump.

Conclusions
We address the problem of non-intrusive load monitoring in presence of distributed power generation (photovoltaic). The proposed approach, based on mathematical programming, showed to be effective for source separation with a low frequency data sampling. We proved that the estimation of the global irradiance exploiting a set of non-monitored photovoltaic installations in the same area is sufficient to discount the contribution of PV power generation to the aggregated signal and therefore obtain accurate source separation.