The following subsections briefly discuss the main characteristics of the three implemented algorithms. All algorithms were used to predict the on-off times of appliances with a resolution of 1 h.

### Histogram algorithm

Assuming that household activities follow a weekly pattern, one can build a histogram of the on-times of an appliance for each weekday based on the training data (Chrysopoulos et al. 2014; Holub and Sikora 2013). The approach used in this work is shown in Eq. (1). It conditions the relevant day-profiles with a Gaussian weighting around the time of interest, so that past on-events that are not precisely aligned with the time of interest can still influence the prediction. Based on the preceding *N* days, each subdivided into *T* time intervals, the probability that appliance *l* is running at time *t* on day *n* is calculated as

$$ p\left(x_{nlt}\right)\propto \sum_{m\in N}\sum_{\tau \le T} w_{nm}\, e^{-\frac{\left(t-\tau \right)^2}{2\sigma^2}}\, x_{ml\tau} $$

(1)

where *x*_{mlτ} = 1 if appliance *l* was running during interval *τ* on day *m* and *x*_{mlτ} = 0 otherwise. *w*_{nm} = 1 if day *m* falls on the same weekday as day *n*, and *w*_{nm} = 0 otherwise. The width *σ* of the Gaussian is a model parameter that was set experimentally; see the results section.
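To make the procedure concrete, the following Python sketch evaluates Eq. (1) for a single appliance. The array layout, the function name `histogram_predict` and the final normalisation are our illustrative choices and not part of the cited implementations.

```python
import numpy as np

def histogram_predict(x, weekdays, target_weekday, sigma=1.0):
    """Sketch of Eq. (1) for one appliance.

    x              -- binary array, shape (N, T): x[m, tau] = 1 if the appliance
                      was on during interval tau of past day m
    weekdays       -- length-N integer array with the weekday (0-6) of each past day
    target_weekday -- weekday of the day n to be predicted
    sigma          -- width of the Gaussian kernel (set experimentally)
    """
    N, T = x.shape
    w = (weekdays == target_weekday).astype(float)        # weights w_{nm}
    t = np.arange(T)
    # Gaussian kernel e^{-(t - tau)^2 / (2 sigma^2)} between all interval pairs
    kernel = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * sigma ** 2))
    p = (w[:, None] * (x @ kernel)).sum(axis=0)           # sum over m and tau
    # Eq. (1) is a proportionality; normalising to unit sum is our choice
    return p / p.sum() if p.sum() > 0 else p
```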

### Pattern search algorithm

Whereas the histogram-based approach assumes the weekdays to be the governing pattern defining the weights *w*_{nm}, see Eq. (1), the approach of Barbato et al. (2011) tries to identify these patterns itself. It does so by exploiting the redundancy of variably sized day-patterns. To this end, one maps the *N* days preceding the day to be predicted, *n*, to a binary array [*δ*_{n − N}, *δ*_{n − (N − 1)}, …, *δ*_{n − 1}] of length *N* with *δ*_{i} = 1 if the appliance was running on day *i*, and *δ*_{i} = 0 otherwise. The sub-array *S*_{n}(*i*) is then defined as *S*_{n}(*i*) = [*δ*_{n − i}, *δ*_{n − (i − 1)}, …, *δ*_{n − 1}] for a given length 1 ≤ *i* < *N*/2. The occurrences of both the sub-pattern *S*_{n}(*i*) and [*S*_{n}(*i*), 1] = [*δ*_{n − i}, *δ*_{n − (i − 1)}, …, *δ*_{n − 1}, 1] are counted in the original array, and the probability of a pattern of length *i* being followed by a conjectured on-day is calculated as

$$ s_n\left(i,1\right)=\begin{cases} 0 & \text{if } \#\left[S_n(i)\right]=1\\[4pt] \dfrac{\#\left[S_n(i),1\right]}{\#\left[S_n(i)\right]-1} & \text{otherwise} \end{cases} $$

(2)

and correspondingly *s*_{n}(*i*, 0) for a conjectured off-day (note that *s*_{n}(*i*, 1) + *s*_{n}(*i*, 0) = 1 by construction). Now *i* is increased until either *s*_{n}(*i*, 1) or *s*_{n}(*i*, 0) equals 1. In the latter case, a day without any appliance usage is predicted; in the former, the days following the occurrences of the pattern *S*_{n}(*i*) define the relevant days used for forecasting, replacing the same-weekday days used in the histogram algorithm. It turns out that for the investigated data, patterns are not as obvious as in Barbato et al. (2011), i.e. there is typically no pattern length *i* for which either *s*_{n}(*i*, 1) or *s*_{n}(*i*, 0) equals 1. We therefore extended the original approach, see Eq. (3): day *n* is predicted by the sum over the *K* most probable patterns, weighted with the probabilities *s*_{n}(*i*, *α*). A sketch of the counting behind Eq. (2) follows below.
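The counting in Eq. (2) can be sketched as follows; the brute-force scan over the history and the helper name `pattern_score` are our illustrative assumptions.

```python
def pattern_score(delta, i):
    """Sketch of Eq. (2): s_n(i, 1) for the binary usage history delta.

    delta -- sequence of 0/1 flags [delta_{n-N}, ..., delta_{n-1}]
    i     -- pattern length, 1 <= i < N/2
    """
    history = list(delta)
    pattern = history[-i:]                       # trailing sub-array S_n(i)
    occurrences = followed_by_on = 0
    for start in range(len(history) - i + 1):
        if history[start:start + i] == pattern:
            occurrences += 1                     # counts #[S_n(i)]
            if start + i < len(history) and history[start + i] == 1:
                followed_by_on += 1              # counts #[S_n(i), 1]
    if occurrences == 1:                         # only the trailing match itself
        return 0.0
    # the trailing occurrence has no successor day, hence the -1
    return followed_by_on / (occurrences - 1)
```

Note that *s*_{n}(*i*, 0) follows directly as 1 − `pattern_score(delta, i)`.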

$$ p\left(x_{nlt}\right)\propto \sum_{i,\alpha} s_n\left(i,\alpha\right)\,\delta_{\alpha 1} \sum_{m\in N_i}\sum_{s\le T} e^{-\frac{\left(t-s\right)^2}{2\sigma^2}}\, x_{mls} $$

(3)

where ∑_{i, α} runs over the *K* most relevant patterns. The Kronecker delta *δ*_{α1} removes the contribution of patterns that predict a day without appliance usage.
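A possible realisation of the extended prediction in Eq. (3) is sketched below. How the *K* top-scoring patterns and their follower days *N*_{i} are collected is left to the caller; the function name and argument layout are hypothetical.

```python
import numpy as np

def pattern_search_predict(x, scores, relevant_days, sigma=1.0):
    """Sketch of Eq. (3): blend the K most probable patterns.

    x             -- binary usage array, shape (N, T)
    scores        -- list of K tuples (s_n(i, alpha), alpha) for the top patterns
    relevant_days -- list of K index lists; entry j holds the days m in N_i that
                     follow an occurrence of pattern j
    """
    N, T = x.shape
    t = np.arange(T)
    kernel = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * sigma ** 2))
    p = np.zeros(T)
    for (s, alpha), days in zip(scores, relevant_days):
        if alpha == 1:                  # Kronecker delta: off-day patterns contribute 0
            p += s * (x[days] @ kernel).sum(axis=0)
    return p / p.sum() if p.sum() > 0 else p   # normalisation is our choice
```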

### Bayesian inference algorithm

The third investigated method (Truong et al. 2013) uses Bayesian inference and thus differs fundamentally from the previous approaches. It uses a Markov chain Monte Carlo (MCMC) approach to sample the posterior distribution of the model parameters. The key elements of the model are the latent day-types *k*, which define day profiles and capture correlations between the use of individual appliances. In summary, the probability *p*(*x*_{nlt}) of appliance *l* running at time *t* on day *n* is calculated as

$$ p\left(x_{nlt}\right)\propto \sum_{k=1}^{K} p\left(k\mid n\right)\,\mu_{kl}(t) $$

(4)

where *k* runs over all *K* day-types, *p*(*k* ∣ *n*) is the probability that day *n* is described by day-type *k*, and *μ*_{kl}(*t*) is the corresponding day profile, i.e. the probability that appliance *l* is on at time *t* under day-type *k*. One advantage of this approach is that it infers the parameters for each appliance *l* from the data of all appliances, resulting in an effective training set of *N* · *L* data points, where *L* is the total number of appliances.
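As an illustration, once posterior estimates of *p*(*k* ∣ *n*) and the profiles *μ*_{kl}(*t*) are available (e.g. averaged over MCMC samples), Eq. (4) reduces to a single mixture step. The array shapes below are our assumption, not those of the original implementation.

```python
import numpy as np

def bayesian_predict(p_k_given_n, mu, l):
    """Sketch of Eq. (4): mix day-type profiles for appliance l.

    p_k_given_n -- length-K array, posterior probability of each day-type
                   for day n (e.g. averaged over MCMC samples)
    mu          -- array of shape (K, L, T); mu[k, l, t] is the on-probability
                   of appliance l at time t under day-type k
    """
    # sum_k p(k|n) * mu_{kl}(t), returns a length-T profile for day n
    return p_k_given_n @ mu[:, l, :]
```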