Non-intrusive load monitoring techniques for the disaggregation of ON/OFF appliances


In addition to a drastic transition to more sustainable energy sources, we also expect a significant contribution from consumers through the reduction of their energy consumption.
To achieve these goals without excessively affecting the needs of end-users, we need advanced technological infrastructures that can dynamically adapt the energy distribution to future energy demand. Smart Grids are interconnected networks whose goal is to deliver energy from producers to consumers while ensuring a sustainable, economically efficient and secure supply system (Siano 2014). The idea is that the new system should be decentralized, with many small power producers, small-scale transmission and regional supply compensation, and with the final consumer as an active participant in the overall system. This is achieved through advanced technologies and services such as smart meters, smart distribution boards, fiber broadband and demand response (Pallonetto et al. 2020).
Non-Intrusive Load Monitoring (NILM) denotes a wide set of techniques aimed at extracting the power consumption of individual electrical devices from the aggregate signal of the main meter of the house (Hart 1992). This solution is much cheaper than intrusive monitoring because it requires the installation of a single monitoring device attached to the main electricity meter. The availability of appliance-level energy consumption can enable novel services for both consumers and utilities. First of all, this kind of information can significantly increase the awareness of end-users and help them reduce the power consumption of their electrical devices. Indeed, it has been demonstrated that end-users can save up to 12% of their energy consumption when provided with sufficiently detailed energy feedback (Yan et al. 2020). Those savings can be achieved by reducing the use of the most energy-intensive devices or by replacing obsolete and inefficient devices like fridges and freezers. Appliance-level information can be even more valuable for utility companies. For example, they can obtain a better profile of end-users and produce more accurate forecasts of their future energy demand. In particular, the participation of consumers in demand response programs can be increased by proposing less demanding schedules that are more aligned with the original habits of the end-users (Jordehi 2019).
At present, there are plenty of methods to accurately disaggregate commonly used appliances, provided that enough training data are available to generalize to diverse manufacturers and brands. The reason is that household appliances such as fridges, washing machines, dishwashers, ovens and microwaves present a richness of features that makes it possible to distinguish their operations from those of other devices. Conversely, we found that in real-world energy disaggregation there are many devices that are difficult to recognize because they offer few distinctive characteristics. In particular, many of the devices in this category have only two operational states (idle or active), i.e. they are ON/OFF devices such as kettles, air conditioners, heaters and hair dryers. Devices of this kind can be easily confused because they may present the same peak power as many other electrical devices in the house. Since these devices represent only a minor contribution to the total energy consumption, they have rarely been studied in the literature, leaving an important gap in the disaggregation results on the days in which they are used. The purpose of this work is to analyze the operations of ON/OFF devices by using different combinations of features, in order to find the set of features that best identifies those devices. For this reason, we propose a new methodology that tries to recognize significant patterns in the activation of these devices. Indeed, once we know that ON/OFF devices can be properly characterized in terms of their operational and external features, we can develop a model to identify their operations at inference time. The challenging part consists of selecting the set of features that makes it possible to distinguish the operations of ON/OFF devices from one another, assigning each repeating pattern to the device that is responsible for its occurrence.
The methodology presented in this work aims at solving the specific problem of finding consistent patterns in the operations of ON/OFF devices.
The paper is organised as follows. "Related works" section outlines some interesting works present in the literature, related to the presented one, highlighting similarities and differences. "Dataset" section presents the dataset used for our experiments. "Methodology" section describes comprehensively the system developed in this work, motivating the choices, outlining the challenges and the solutions to solve them. "Results" section shows the results of the analysis, explaining how they have been collected after the process. Finally, "Conclusion" section gives some concluding remarks about our work, describing the possible future directions.

Related works
An appliance signature consists of a set of features that make it possible to identify the operating state of the monitored device in the aggregate signal of the whole house. Hart (1992) proposed a first taxonomy of the possible features that can be employed to describe an appliance signature. In general, these features can be broadly divided into steady-state and transient features. The former represent all those variables characterizing the operational state of an appliance (e.g. current, voltage, active and reactive power), while the latter include those features describing the transition from one operational state to another (e.g. shape, duration, magnitude). Both steady-state and transient characteristics belong to the set of electrical features, which represent the most commonly used input for NILM algorithms (Angelis et al. 2022). In general, NILM can be described as a combinatorial optimization (CO) problem where the purpose is to find the set of operational states that best reconstructs the total power consumption of the house. Some recent studies tried to solve NILM as a CO problem (Ajani et al. 2022; Berrettoni et al. 2021). However, this approach is rarely applicable in the real world since it requires knowing beforehand the exact power consumption of every appliance present in the house. A more practical approach consists of identifying significant step changes in the aggregate power signal and trying to match positive and negative events with the same amplitude. Once positive and negative events are matched, it is possible to recognize the individual appliances thanks to the specific power consumption of their steady states. Several previous works implemented event-based disaggregation algorithms exploiting different methods for assigning the extracted events to the correct appliance (Majumdar 2022; Liu et al. 2022).
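The event-based approach described above can be illustrated with a minimal sketch. It is not the algorithm of any of the cited works: the function names, the 50 W step threshold and the 10% amplitude tolerance are assumptions chosen only for illustration.

```python
def detect_events(power, min_step=50.0):
    """Return (index, delta) for each sample-to-sample change of at least min_step watts."""
    events = []
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if abs(delta) >= min_step:
            events.append((i, delta))
    return events


def match_events(events, tolerance=0.1):
    """Pair each negative step with the latest unmatched positive step of similar amplitude."""
    open_on = []      # positive (switch-on) events not yet matched
    operations = []
    for idx, delta in events:
        if delta > 0:
            open_on.append((idx, delta))
            continue
        # Search backwards so that the most recent compatible ON event is used.
        for j in range(len(open_on) - 1, -1, -1):
            on_idx, on_delta = open_on[j]
            if abs(on_delta + delta) <= tolerance * on_delta:
                operations.append(
                    {"start": on_idx, "end": idx, "peak_power": on_delta}
                )
                del open_on[j]
                break
    return operations


# Hypothetical aggregate signal in watts: a 2000 W device overlapping a 150 W one.
signal = [100, 100, 2100, 2100, 2250, 2250, 250, 250, 100]
operations = match_events(detect_events(signal))
```

Each matched pair yields one operation with a start index, an end index and a peak power, which is the kind of record a downstream classifier can work on.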
In addition to the amplitude of the step changes, we can also leverage the temporal correlations between the events of a certain appliance, including the order of their occurrence and their time duration. To this aim, other studies started employing Markov Chains and Graph Signal Processing algorithms to improve the accuracy of event matching and classification (Mengistu et al. 2018; Zhao et al. 2018; Zheng et al. 2021; He et al. 2016). Recently, the research community followed a new promising trend in energy disaggregation exploiting various kinds of neural network techniques, including more sophisticated deep architectures. In this new framework, the task of NILM is modelled as a regression problem, where the input is the aggregate signal of the house and the output is the power consumption of the specific appliance we want to monitor. Deep learning techniques are preferred to other methods because of their straightforward disaggregation process, which can capture both electrical and temporal features in a single pass. Most importantly, deep learning algorithms demonstrated better accuracy than other methods for nearly every household appliance, placing them among the best solutions for energy disaggregation (Kaselimi et al. 2020; Cimen et al. 2022).
Although many type II devices (i.e. finite state machines) can be successfully recognized by most of the previous NILM algorithms, type I (ON/OFF) devices can be easily confused with one another because of their poor temporal features. For this reason, some researchers tried to exploit additional features other than electrical characteristics in the hope that those devices can be successfully recognized. Dinesh et al. (2017) proposed a NILM system based on partitioning the aggregate signal into a set of non-overlapping observation windows. Then, from each window, the first 5 principal components are extracted using the Karhunen-Loeve Expansion method, as described in their previous work (Dinesh et al. 2015). Each window is then treated separately by applying an iterative algorithm whose goal is to find, among the ensemble of all possible sets of appliances, the one that most likely matches that observation window. Specifically, the algorithm has 5 iterations (one for each principal component), and at every step i it tries to find the set S of appliances that maximizes, given the first i principal components of the considered window, the probability that the appliances contained in S are ON at the time of day of the central point of the window. They showed that their approach performs better than their previous implementation (without time of day) in terms of both F-score and accuracy. Moreover, this method performs even better when computing time-of-day probabilities for each season of the year separately. This work is particularly important because it shows that the time-of-day feature is very helpful in detecting those appliances whose usage is strongly correlated with the time of day. However, the proposed approach is clearly supervised, since it requires an entire year of labelled data, which are time-consuming and expensive to obtain.
Ponrak (2021) implemented a semi-supervised system where different clustering techniques are applied and compared, and where the end-user is involved to identify the names of the trained models. Each data point is characterized by electrical features only, and an optimization process is used to determine the optimal number of clusters (both the silhouette and the elbow methods are used, giving the same result). Then, a feature selection step picks the most important features, which are the ones used to build the machine learning model. Their results show that even simple clustering algorithms are able to clearly distinguish one device from another, with improved performance when limiting the number of features. This is probably because additional features that do not add significant information only increase data sparsity, thus reducing the performance of a distance-based classification algorithm. A similar approach has been developed by Azizi et al. (2021). In their work, an improved version of hierarchical clustering with Ward linkage is applied in combination with the elbow method. Once all appliances' states have been detected, the classification process takes place by leveraging prior information about the power transitions of appliance modes, specific observed behavioural patterns, and frequency of usage. Although it is a supervised approach, since it requires prior knowledge extracted from data, this implementation is able to effectively discriminate the various groups of device operation modes with just an enhanced version of a common classification technique. Salem and Sayed-Mouchaweh (2019) present an online semi-supervised system seeking to extract each appliance's load from the aggregate signal directly inside the smart meter.
This system is based on a Conditional HMM, whose hidden states are conditioned on the probability of each appliance being used during every hour of the day, and an online Expectation Maximization algorithm is used to estimate the model's parameters. This implementation outperforms both an HMM-based supervised implementation trained with sub-metered data and a simplified version of the proposed implementation that does not use the time-of-usage distribution of the appliances.
Previous studies demonstrate that it is possible to implement reliable disaggregation systems with both supervised and unsupervised learning techniques. However, unsupervised methods are better suited to real-world applications, given that annotations are rarely available in practice. In addition, previous studies highlight that operational features such as the time duration of the activation and external features like the time of usage can actually improve the disaggregation performance. In this work, we elaborate further on both the use of unsupervised clustering methods and the use of external features in order to disaggregate the power consumption of ON/OFF devices such as kettles, air conditioners, heaters and hair dryers. In more detail, the proposed approach differs from previous methods by using an online processing algorithm, which is capable of analyzing the device's operations in their sequential order of occurrence. In particular, we propose the use of an online clustering algorithm to leverage information about the frequency of certain temporal patterns and to dynamically adapt our models as soon as relevant clusters emerge from our analysis. Notice that we use more external features than previous works, including the time of usage, the day of week and the occurrence of weekdays/weekends, combining them with operational features (peak power and time duration) in different clustering plans. Thanks to our approach, we were able to identify more than 35% of the energy consumption due to previously unknown operations, increasing the total energy disaggregated by our system from 80% to 87%.

Dataset
The methodology presented in this work has been developed to integrate with an existing NILM system owned by a private company. The disaggregation algorithm was already able to disaggregate 80% of the total energy consumption of the monitored houses. Nevertheless, 20% of the energy consumption remained unclassified, even though part of it was clearly attributable to isolated operations of ON/OFF devices. Notice that the operation of an ON/OFF device can be characterized in terms of the peak power of its active state, which can be used to detect the presence of the device in the aggregate load. However, the peak power does not suffice to uniquely classify an operation, since other devices in the same house may present the same peak power level. Therefore, we were provided with a broader set of features for each operation extracted by their event matching algorithm, which was in charge of pairing nearby events with equivalent power changes in opposite directions. The five features characterizing a single operation of an unknown ON/OFF device are reported in Table 1. The peak power is the power consumption of the active state. The time duration is measured as the number of seconds elapsed from the start to the end of a single operation. The time of usage is measured as the number of seconds elapsed between the beginning of the day and the start of the operation. The day of week is an ordinal variable holding the day number from Monday (1) to Sunday (7). Finally, weekend is a binary variable that indicates whether the date is a weekday or a weekend day. In total, we were provided with the data of 53 houses monitored for a period of 10 months, from 1 September 2020 to 31 July 2021.
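As a minimal sketch of how the five features could be derived from one matched operation, the snippet below uses Python's standard `datetime` module. The function name, the dictionary keys and the encoding of weekend as 1 (weekday as 0) are assumptions made for illustration; the source only states that the variable is binary.

```python
from datetime import datetime


def operation_features(start, end, peak_power_w):
    """Build the five features of a single ON/OFF operation.

    start, end: datetime objects marking the ON and OFF events.
    peak_power_w: power of the active state in watts.
    """
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    day_of_week = start.isoweekday()  # Monday = 1 ... Sunday = 7
    return {
        "peak_power": peak_power_w,
        "duration": (end - start).total_seconds(),
        "time_of_usage": (start - midnight).total_seconds(),
        "day_of_week": day_of_week,
        # Assumed encoding: 1 for Saturday/Sunday, 0 otherwise.
        "weekend": 1 if day_of_week >= 6 else 0,
    }


# Hypothetical kettle operation on Saturday 5 September 2020, 07:30 to 07:33.
features = operation_features(
    datetime(2020, 9, 5, 7, 30, 0), datetime(2020, 9, 5, 7, 33, 0), 2000.0
)
```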

Methodology
In the following, we describe the different processing steps that constitute our methodology. As depicted in Fig. 1, the procedure for producing the final clusters from our dataset is very concise and simple. We just need to scale the various features characterizing the device's operations before feeding them to our online clustering algorithm. As soon as a relevant pattern is identified, a cluster for that device is extracted from the clustering plan and it will be used for classifying new data points starting from the next iteration of the algorithm.

Preprocessing
Feature scaling is crucial when we work with machine learning models that involve the computation of distances between samples. Indeed, features with greater scales may end up saturating the distance function, thus completely overlooking the distances between variables with lower scales. To avoid this issue, feature scaling brings all the features within almost the same interval of values. In this way, all variables are equally weighted during the computation of the distance function. In this work, we employed the min-max normalization to scale all the features in the interval between 0 and 1. The formula is reported in the following:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (1)$$

where $x_{max}$ and $x_{min}$ represent the maximum value and the minimum value of a certain feature $x$, respectively. The list of minimum and maximum values for the different input features is reported in Table 2. Please notice that the parameters of the weekend variable are reported only for completeness, since they do not affect the scale of this feature, which already lies in the interval [0, 1].
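The min-max normalization described above can be sketched in a few lines. The bounds in `BOUNDS` are hypothetical placeholders, not the values of Table 2, and the function names are chosen only for illustration.

```python
def min_max_scale(value, x_min, x_max):
    """Map value from [x_min, x_max] to [0, 1]."""
    if x_max == x_min:
        return 0.0  # degenerate range: avoid division by zero
    return (value - x_min) / (x_max - x_min)


# Hypothetical (min, max) pairs per feature; the real ones are listed in Table 2.
BOUNDS = {
    "peak_power": (0.0, 4000.0),      # watts
    "duration": (0.0, 7200.0),        # seconds
    "time_of_usage": (0.0, 86400.0),  # seconds since midnight
    "day_of_week": (1.0, 7.0),
    "weekend": (0.0, 1.0),
}


def scale_operation(features):
    """Scale every feature of one operation with its own bounds."""
    return {
        name: min_max_scale(value, *BOUNDS[name])
        for name, value in features.items()
    }


scaled = scale_operation({"peak_power": 2000.0, "day_of_week": 4})
```

After scaling, a 2000 W difference in peak power and a three-day difference in day of week both contribute on the same [0, 1] scale to the distance used by the clustering step.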

Online clustering
To group together operations belonging to the same device, we decided to use an online density-based clustering algorithm to incrementally process new operations as soon as they are generated by the event matching algorithm. For this purpose, we implemented a modified version of the online clustering algorithm introduced by Hyde et al. (2017), which is called "Clustering evolving data-streams into arbitrary shapes" (CEDAS). In detail, the algorithm processes a new data point in the following way: if the data point falls within the radius of a micro-cluster's centroid, then we assign it to that micro-cluster; alternatively, we check if the new data point lies within the radius of a certain number of unassigned data points, which triggers the creation of a new micro-cluster with these points; if neither of the previous conditions is satisfied, then the new point remains an outlier and it can be assigned to a new micro-cluster in future iterations. Unlike traditional clustering algorithms, the online implementation has a temporal component keeping track of the time-to-live of micro-clusters, removing those clusters that remain unchanged for a certain number of iterations. The reason is that we do not want to maintain old micro-clusters that are no longer relevant for the disaggregation process. In our implementation, unassigned data points also have a time-to-live variable that is decreased at every iteration, in order to prevent the excessive accumulation of outliers as the clustering plan evolves. As time goes by, it may also happen that two or more micro-clusters are merged together to form a larger macro-cluster. In particular, the online algorithm keeps track of the distance between the centroids of micro-clusters and merges together those clusters whose distance is less than a certain threshold. Notice that a micro-cluster that has not been merged with others is considered a macro-cluster on its own.
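The three-way decision described above (assign, create, or keep as outlier) together with the time-to-live mechanism can be sketched as follows. This is a simplified illustration, not the CEDAS algorithm nor the authors' modified version: the class name, the default parameters, the running-mean centroid update and the omission of macro-cluster merging are all assumptions.

```python
import math


class OnlineMicroClusterer:
    """Simplified online density-based clustering with time-to-live ageing."""

    def __init__(self, radius=0.1, min_points=3, ttl=100):
        self.radius = radius          # neighbourhood radius in scaled feature space
        self.min_points = min_points  # points needed to create a micro-cluster
        self.ttl = ttl                # iterations before unchanged items expire
        self.clusters = []            # dicts with "centroid", "points", "ttl"
        self.outliers = []            # dicts with "point", "ttl"

    def _age(self):
        """Decrease every time-to-live and drop expired items."""
        for item in self.clusters + self.outliers:
            item["ttl"] -= 1
        self.clusters = [c for c in self.clusters if c["ttl"] > 0]
        self.outliers = [o for o in self.outliers if o["ttl"] > 0]

    @staticmethod
    def _mean(points):
        return [sum(coords) / len(points) for coords in zip(*points)]

    def add(self, point):
        self._age()
        # 1) Assign to the closest micro-cluster whose centroid is within the radius.
        in_range = [
            c for c in self.clusters
            if math.dist(point, c["centroid"]) <= self.radius
        ]
        if in_range:
            cluster = min(in_range, key=lambda c: math.dist(point, c["centroid"]))
            cluster["points"].append(point)
            cluster["centroid"] = self._mean(cluster["points"])
            cluster["ttl"] = self.ttl  # the cluster changed, so refresh it
            return "assigned"
        # 2) Enough unassigned points nearby: create a new micro-cluster.
        near = [o for o in self.outliers
                if math.dist(point, o["point"]) <= self.radius]
        if len(near) + 1 >= self.min_points:
            points = [o["point"] for o in near] + [point]
            self.clusters.append(
                {"centroid": self._mean(points), "points": points, "ttl": self.ttl}
            )
            used = {id(o) for o in near}
            self.outliers = [o for o in self.outliers if id(o) not in used]
            return "created"
        # 3) Otherwise the point stays an outlier for future iterations.
        self.outliers.append({"point": point, "ttl": self.ttl})
        return "outlier"


clusterer = OnlineMicroClusterer(radius=0.1, min_points=3, ttl=5)
labels = [clusterer.add(p) for p in
          [(0.5, 0.5), (0.52, 0.5), (0.5, 0.53), (0.51, 0.51)]]
```

In a full implementation, a merging step would additionally compare centroid distances between micro-clusters to form macro-clusters, as the text describes.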
The online clustering algorithm proposed in this work runs in parallel with four different clustering plans corresponding to as many combinations of features. In detail, the four features' combinations are reported in Table 3. In summary, we have a single two-dimensional clustering plan formed by the peak power and the time duration of the operation, that is combined with the remaining features (time of usage, day of week and weekend) to form three additional clustering plans in three dimensions. Notice that a single device's operation belongs at the same time to all the four clustering plans. In this way, we can analyze the same operation with different features' combinations, searching for relevant patterns in both operational and external features.
Among the different groups found by the proposed online clustering algorithm, we want to select only those that actually correspond to the operations of a specific device. For this purpose, we defined a score to measure the quality of macro-clusters and a threshold to indicate whether a specific cluster represents the operations of a certain device or not. To avoid false positives, a cluster should exceed the specified threshold for more than 10 days, since many clusters can momentarily surpass the threshold for shorter periods of time. Once a significant cluster has been found, all the operations belonging to that cluster are removed from the clustering plans, in order to avoid unwanted merges with other clusters in the next iterations.
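The persistence rule above can be sketched as a simple streak check over daily scores. The function name is hypothetical, the default threshold of 0.9 is the value reported in the Results section, and interpreting "more than 10 days" as consecutive days is an assumption.

```python
def is_confirmed(daily_scores, threshold=0.9, min_days=10):
    """Return True once the score stays above threshold for more than
    min_days consecutive days (consecutiveness is an assumption)."""
    streak = 0
    for score in daily_scores:
        streak = streak + 1 if score > threshold else 0
        if streak > min_days:
            return True
    return False


# A cluster that briefly exceeds the threshold is not accepted ...
short_burst = is_confirmed([0.95] * 6 + [0.5] + [0.95] * 6)
# ... while one that stays above it for 11 days in a row is.
stable = is_confirmed([0.95] * 11)
```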
The metric used to evaluate the quality of a macro-cluster is given by the following formula:

$$Q_c = S_c \cdot D_c$$

where $S_c$ is the mean silhouette score (Rousseeuw 1987) of a macro-cluster and $D_c$ is a scaling factor that takes into account the cardinality of the macro-cluster. In particular, the factor $D_c$ is computed by dividing the total number of data points in the macro-cluster by the constant value 20. In this way, $D_c$ assumes values lower than one when the corresponding macro-cluster $c$ has fewer than 20 data points, thus penalizing clusters with very few operations observed. We recall that the silhouette score is computed for each sample through the following expression:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the mean squared deviation between a single sample and all other occurrences within its macro-cluster, and $b_i$ is the smallest mean squared distance computed between the same sample and the instances belonging to all other clusters.
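A minimal sketch of this quality score is given below, assuming that the metric is the product of the mean silhouette score and the cardinality factor, and that distances are squared Euclidean distances as the text suggests. The function names and the handling of edge cases (single-point clusters, no other clusters) are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def silhouette(sample, own, others):
    """Silhouette of one sample.

    own: the other points of the same macro-cluster.
    others: list of point lists, one per other macro-cluster.
    """
    if not own or not others:
        return 0.0  # assumed convention when the score is undefined
    a = sum(sq_dist(sample, p) for p in own) / len(own)
    b = min(sum(sq_dist(sample, p) for p in c) / len(c) for c in others)
    largest = max(a, b)
    return (b - a) / largest if largest > 0 else 0.0


def quality_score(clusters, index, ref_size=20):
    """Mean silhouette of macro-cluster `index` times its size divided by ref_size."""
    target = clusters[index]
    if not target:
        return 0.0
    others = [c for k, c in enumerate(clusters) if k != index and c]
    scores = [
        silhouette(s, target[:i] + target[i + 1:], others)
        for i, s in enumerate(target)
    ]
    mean_silhouette = sum(scores) / len(scores)
    size_factor = len(target) / ref_size
    return mean_silhouette * size_factor


# Hypothetical plan: a tight cluster of 21 operations and a distant one of 5.
tight = [(0.1, 0.1)] * 21
distant = [(0.9, 0.9)] * 5
score = quality_score([tight, distant], 0)
```

With 21 well-separated points the size factor is 21/20, so the score can exceed 1, which matches the behaviour reported for macro-cluster 1 in the Results section.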

Results
In this section, we discuss the effectiveness of the proposed methodology for the improvement of existing disaggregation algorithms. In particular, we focus on the time needed by our algorithm to find a relevant pattern and on the final percentage of unknown operations correctly identified thanks to our method. Figure 2 shows the evolution through time of the quality score for multiple macro-clusters found in one of our houses in the period from September 2020 to January 2021. In particular, on the left-hand side we can find the clusters formed with the combination of peak power, duration and time of the day, whereas on the right-hand side we can see the clusters on the plan formed by peak power, duration and day of week.
Similarly, Fig. 3 shows the evolution of the clusters with the other two combinations of features, i.e. peak power, duration and weekday/weekend (left-hand side) and peak power and duration (right-hand side). In both Figs. 2 and 3, we also reported the final number of data points in every macro-cluster (in black), together with their labels (in red). In this work, we used a very selective threshold for our quality metric in order to avoid false positives, which has been set to 0.9 for this purpose. Therefore, according to our threshold, the only relevant cluster that has been found during this period is the macro-cluster number 1 in the two-dimensional plan reported on the right-hand side of Figure 3. It is evident that this macro-cluster is well separated from the other clusters, resulting in a high silhouette score. In addition, the high number of data points of this macro-cluster (i.e. 21) contributes to the final quality metric, which is even higher than 1. Figure 4 shows the distribution of the number of clusters found per house. We can notice that 45 out of the 53 houses analyzed in our work contain at least one cluster. Figure 5 reports the number of clusters found with different features' combinations. According to the results, the most effective combination of features is the peak power and the time duration of the device's activation, accounting for more than 85% of the clusters found during the monitoring period. The second most significant features' combination employs the triplet peak power, time duration and time of use, representing 10% of the total number of clusters. The remaining 5% of clusters were found by combining the peak power and the time duration with either the day of week or the weekend occurrence. The results show that operational features such as the peak power and the time duration provide more information than external features such as the time of usage and the day of week. This fact also suggests that very few devices present regular patterns in their usage due to the habits of the end-users.
Fig. 3 Evolution of the quality metric during the online clustering process with the features combinations {peak power, duration, weekday/weekend} on the left and {peak power, duration} on the right
The effectiveness of the proposed methodology has also been evaluated in terms of the convergence time, i.e. the time needed to find a relevant pattern within the set of unknown operations. The convergence time can be measured either in terms of the number of operations or the number of days needed until the cluster's score exceeds our threshold. Figures 6 and 7 show the number of operations needed to find a significant cluster in the two-dimensional (2-D) clustering plan and the three-dimensional (3-D) clustering plans, respectively. Notice that we have only a single 2-D clustering plan formed by the peak power and the time duration, whereas the 3-D clustering plans are formed by a combination of these two features with the time of use, the day of week and the weekend variables. Analogously, Figs. 8 and 9 report the number of days needed until a significant cluster is found. In the majority of cases, we need a monitoring period between 1 month and 5 months to find a cluster in both the 2-D and 3-D plans, with some extreme cases where more than 8 months were needed. Clearly, the 3-D plans have too few data points to draw significant conclusions on their convergence time. On the contrary, the convergence time in the 2-D plan represents a more precise estimate of the number of operations and the number of days it will take before a cluster is found. The number of days can vary a lot depending on the frequency of the operations during the monitoring period. Indeed, a device that is used more frequently will require fewer days to be recognized than a device that is rarely used. Therefore, the number of operations remains the most reliable metric to measure the convergence time of our methodology.
At the end of the monitoring period, we found that more than 35% of the energy consumption due to unknown operations was correctly identified thanks to the proposed algorithm. The online clustering procedure introduced in this work has been integrated with our existing disaggregation system (Mengistu et al. 2018), which was responsible for the recognition of commonly used appliances (refrigerator, dishwasher, washing machine, oven, microwave). Considering that the existing algorithm already disaggregates more than 80% of the energy consumption in the monitored houses, thanks to our online clustering method we are now able to recognize an additional 7% of the total energy consumption of the end-users. This small increment in the disaggregated share of energy consumption assumes greater importance if we consider that ON/OFF devices represent a relevant portion of the unknown loads on the days in which they are used. Therefore, the ability to recognize those patterns can significantly reduce the percentage of energy consumption that is generally assigned to unknown loads.
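As a rough consistency check of the figures above, taking the approximate values reported in the text (about 80% of the energy already disaggregated, about 35% of the unknown energy recovered), the additional share works out as follows:

```latex
\begin{align*}
\text{unknown share} &\approx 100\% - 80\% = 20\% \\
\text{recovered share} &\approx 0.35 \times 20\% = 7\% \\
\text{total disaggregated} &\approx 80\% + 7\% = 87\%
\end{align*}
```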

Conclusion
In this work, we presented an online clustering method to identify the power consumption of some ON/OFF appliances such as electric heaters, air conditioners, kettles and hair dryers. Very little interest has been shown towards those devices in the NILM literature, mostly because they are simple devices whose behaviour is too similar to be accurately distinguished. Moreover, they are generally assumed not to be among the most energy-consuming devices in a building. However, a lot of appliances fall into this category, and being able to recognise them can greatly improve the user experience of a load monitoring system. Thanks to our approach, we found that the monitored houses can have from 1 to 6 devices belonging to that category. We also found that the most relevant features for appliance detection remain operational variables like the peak power and the duration of the single operation, accounting for about 85% of the clusters identified in the monitoring period. Nevertheless, external features such as the time of usage, the day of week and the weekend occurrence allowed us to identify the remaining 15% of our clusters. The proposed online clustering algorithm demonstrated a reasonable convergence time for a real-world application, requiring less than 5 months in the majority of cases to find those clusters. Clearly, the convergence time depends on the frequency of usage of the monitored devices, corresponding to an average of 40 operations until a relevant cluster is found. The proposed methodology is intended to support an existing disaggregation system and assumes that the system already disaggregates the majority of type II devices (i.e. finite state machines). If the existing disaggregation system is not sufficiently accurate (i.e. the percentage of disaggregated energy is lower than 80%), then the proposed algorithm risks generating false positives as a consequence of the overpopulation of the clustering plans.
In future work, we will study whether ON/OFF devices can be better characterized by employing higher sampling frequencies able to capture fine-grained transient features in the activation of those devices.