Energy forecasting based on predictive data mining techniques in smart energy grids
 Ekanki Sharma^{1}
 Published: 10 October 2018
Abstract
Energy forecasting is a technique for predicting future energy needs in order to balance supply and demand. In this paper we aim to assess the performance of a weather-free forecasting model built from a database of past produced power data using data mining techniques. A weather-free, data-driven model serves two purposes: first, it alleviates the dependence on weather data, which in some scenarios is difficult to obtain; second, it reduces the computational effort. In this work, we first aim to evaluate the interplay between anomaly detection techniques and forecasting model accuracy. Second, we will determine which of the three defined performance metrics is best suited to this particular application.
Keywords
 Energy forecasting
 Weather-free data
 Data-driven model
 Anomaly detection
Motivation
The increasing penetration of renewable energy resources (RES) in today’s power system has made energy forecasting a popular topic. It is very important for grid operators and decision makers to know how much power RES will produce over the next hours and days (Dobschinski et al. 2017). In addition, predicting load demand and consumption plays a vital role in the operation and planning of a modern power system. Storage of electrical energy is necessary when power production from RES exceeds load demand. However, energy cannot be stored on a massive scale, because energy storage is expensive, requires high maintenance and has a limited lifespan. Because of this, utilities have to balance supply and demand at every moment. These limitations lead to several interesting characteristics of energy forecasting, including its data collection requirements and the need for high accuracy. Forecasting errors lead to an imbalance between supply and demand, which adversely affects operational cost, reliability and efficiency. Forecasts of energy production are usually based on meteorological data such as solar irradiation and temperature, and forecasts of consumption on the number of occupants and appliances. However, there are scenarios in which on-site measurements of solar irradiation and other meteorological variables such as temperature and humidity are unavailable and only past power measurements are available. In such cases, data-driven models that use the available past power production data can be applied. In this paper we aim to exploit the available past power data and to assess the accuracy of a data-driven forecasting model when data preprocessing techniques are applied. The idea is to choose an appropriate anomaly detection technique and data-driven methodology for energy production forecasting, and to develop a unified model for long-term forecasting in short-term (hourly) steps.
Background
In the past decades, different approaches for forecasting energy production, distribution and consumption have been implemented. In the domain of energy consumption forecasting, researchers use several techniques, including traditional methods such as regression, time series and statistical methods, along with soft computing techniques such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), fuzzy logic and Grey prediction. A good overview of these techniques can be found in (Suganthi and Samuel 2012).
Predictions based on larger datasets in connection with deep learning are becoming common. In (Marino et al. 2016), the authors implemented an energy load forecasting technique with long short-term memory (LSTM). Two variants of LSTM are presented: the standard LSTM and the LSTM-based Sequence-to-Sequence (S2S) architecture. Both variants are tested on data with one-hour and one-minute time step resolution; the results indicate that S2S worked well on both datasets.
To integrate RES into the power grid, forecasting photovoltaic (PV) yield is very important, as the output of PV systems is sensitive to weather conditions and to the varying strength of the solar irradiance striking the PV surface throughout the day. The input variables and the prediction horizon affect the accuracy of the prediction model. In general, the relevant variables available as inputs to a prediction model of solar power include historical measurements of PV generation and historical measurements of explanatory variables such as temperature, global irradiance, wind speed or cloud coverage (Wan et al. 2015). In the domain of energy production forecasting, several studies reveal the potential of Artificial Intelligence (AI). The authors of (Saberian et al. 2014) implemented a solar power modelling method using artificial neural networks (ANNs) with two neural network structures, namely a general regression neural network (GRNN) and a feedforward back propagation (FFBP) network, to model the output power of a PV panel. They used meteorological data and the estimated generated power to train the GRNN and FFBP. The results indicated higher accuracy when using FFBP. In (Khatib and Elmenreich 2015), the authors proposed a generalized regression artificial neural network for predicting hourly solar radiation. In (Alanazi et al. 2017), the authors implemented a nonlinear autoregressive neural network for the prediction of irradiance. The authors of (Gandelli et al. 2014) implemented a new hybrid method, PHANN (Physical hybrid artificial neural network), which combines a physical model with a statistical model (neural network), and concluded that the PHANN method is more accurate than an ANN alone. The authors of (Dolara et al. 2018) employed a similar method based on a theoretical clear-sky solar radiation model and reported improved accuracy for the hybrid method. It can be inferred that combining several techniques, such as statistical and physical ones, could improve the performance.
Based on the reviewed papers, we assume that it is possible to improve the accuracy by applying advanced approaches such as soft computing techniques, which outperform naive methods from statistical theory. However, complex models such as deep learning models are limited in terms of interpretability.
Nevertheless, it is possible to improve the accuracy by applying data preprocessing techniques (anomaly detection), i.e. feature selection and outlier rejection. These two important issues should be considered before applying any forecasting model (Saleh et al. 2016), as both have a direct impact on the performance of the forecasting model. Large datasets often contain many ineffective features, and a feature selection process can reduce the considered features to the effective ones. This process can improve the model performance and lead to faster decisions. The authors of (Saleh et al. 2016) implemented a data mining-based load forecasting strategy and divided the whole process into two parts: data preprocessing and load estimation. The data preprocessing step performed outlier rejection, which eliminates bad data using distance-based outlier rejection, and feature selection using a genetic algorithm. However, the authors clearly state that outliers are rejected based on a global view, in which extreme values are considered outliers. It is noteworthy that these values do represent real measured values, and in some circumstances extreme values may indicate sudden events. Moreover, they constructed their case study from a historical electricity load dataset and did not explore the validity of their model on a real-time dataset, which may pose additional challenges.
State of the art: anomaly detection or outlier rejection
Author | Problem targeted | Method applied | Contribution and perspective
--- | --- | --- | ---
Panapakidis (Panapakidis et al. 2018) | Missing data treatment | Data processing (clustering phase + completion phase); the clustering phase uses the unsupervised machine learning tool K-means; the completion phase applies a filling technique | Proposed a new methodology for data filling; applicable to both complete and partial absence of data; presents a novel methodology for missing and incomplete data completion; the methodology does not depend on data size, data resolution or the amount of missing data; incomplete data are artificially completed with data entries of high similarity
Daliento (Daliento et al. 2017) | Monitoring and diagnosis of faults in single and multiple PV array strings | Monitoring and diagnosis techniques based on data mining, i.e. decision tree method, K-nearest neighbour and SVM | Reviewed methods for fault detection; presented reliability issues
State of the art: data-driven modelling
Author | Forecast model | Forecast horizon | Performance metrics and forecast error measurement | Contribution and perspective
--- | --- | --- | --- | ---
Liu et al. (Liu et al. 2015) | Back propagation-based ANN | 24 h ahead | MAPE = 7.65% | Aerosol index parameter used as input, which resulted in slightly improved accuracy
Ramsami, Oree (Ramsami and Oree 2015) | Stepwise regression; GRNN; FFNN; MLR | 24 h ahead | RMSE = 2.74% | Stepwise regression selects input variables highly correlated with the PV power output; GRNN, FFNN, MLR and their hybrids were applied to these inputs; the hybrid model showed a slight improvement
Ordiano et al. (Ordiano et al. 2017) | ANN6; ANN10 (data-driven) | 24 h ahead | MAE = 6.64–7.25%; RMSE = 12.47–13.3%; \( r\left(\rho, \overline{\rho}\right) \) = 85.81–87.65% | Accuracy of the model is strictly related to the accuracy of the historical database (achieved reasonable accuracy)
Filik et al. (2011) proposed a novel unified model for short-, medium- and long-term hourly electric energy demand forecasting. The authors compared the accuracy of the analytically developed model with three different ANN architectures and achieved the highest accuracy with the time-delay back propagation ANN architecture.
Methods
Material
Energy forecasting algorithms are trained and tested on energy consumption and production datasets. These datasets contain energy readings from smart meters and the power output produced by PV systems. The forecasting approaches in the literature usually rely on proprietary data. Instead, we will use freely available benchmark data for testing future energy forecast models, which makes comparisons between approaches easier to understand. In this study we intend to use the Open Power System Data (OPSD) (openpowersystemdata.org) and the Australian Solar home electricity dataset provided by Ausgrid (ausgrid.com.au).
Model evaluation
We first split the data into training and testing datasets and then run the machine learning algorithm on the training dataset to generate the prediction model. We then use the test dataset to evaluate the model. To avoid underfitting and overfitting, cross validation will be performed. Various metrics for evaluating the performance of a forecasting algorithm are available in the literature. These standardized performance measures help in providing forecast evaluations and benchmarking (Pelland et al. 2013). They include the Pearson correlation (ρ), mean bias error (MBE, or bias), mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and standard deviation (SDE).
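The split and cross-validation step described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes that the data form a time-ordered hourly series, that the test set is the most recent part of the series, and that cross validation uses an expanding training window so that no future value leaks into training. The function names `chronological_split` and `rolling_origin_folds` and their parameters are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def chronological_split(series: Sequence[float], test_fraction: float = 0.2
                        ) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into a training head and a test tail."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, round(len(series) * test_fraction))
    cut = len(series) - n_test
    return list(series[:cut]), list(series[cut:])


def rolling_origin_folds(n_samples: int, n_folds: int, min_train: int
                         ) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, validation_indices) with an expanding window.

    Every validation block lies strictly after its training block, which
    respects the time order of the readings (an assumption of this sketch).
    """
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield range(0, train_end), range(train_end, train_end + fold_size)
```

A model would be fitted on each training range and scored on the matching validation range; large gaps between training and validation scores would hint at overfitting.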
3. The MAPE metric assesses uniform prediction errors and is given by Eq. (3),
where p_{meas} represents the actual solar power generation at the i^{th} time step, p_{pred} is the corresponding solar power generation estimated by the forecasting model, and N is the number of points estimated in the forecasting period. This metric is useful for evaluating the overall performance of the forecasts, especially when extreme events are a concern.
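For orientation, the metrics can be sketched in code using the symbols defined above. The sketch rests on two assumptions that the text does not confirm: that the other two metrics are RMSE and MAE (both appear in the literature tables), and that the textbook definitions apply. Because measured PV power is zero at night, the hypothetical `capacity` parameter offers the alternative of normalising MAPE by the installed capacity instead of by each measured value.

```python
import math
from typing import Optional, Sequence


def _check_lengths(p_meas: Sequence[float], p_pred: Sequence[float]) -> int:
    """Return N, the number of forecast points, after basic validation."""
    n = len(p_meas)
    if n == 0 or n != len(p_pred):
        raise ValueError("series must be non-empty and of equal length")
    return n


def rmse(p_meas: Sequence[float], p_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(1/N * sum((p_meas - p_pred)^2))."""
    n = _check_lengths(p_meas, p_pred)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(p_meas, p_pred)) / n)


def mae(p_meas: Sequence[float], p_pred: Sequence[float]) -> float:
    """Mean absolute error: 1/N * sum(|p_meas - p_pred|)."""
    n = _check_lengths(p_meas, p_pred)
    return sum(abs(m - p) for m, p in zip(p_meas, p_pred)) / n


def mape(p_meas: Sequence[float], p_pred: Sequence[float],
         capacity: Optional[float] = None) -> float:
    """Mean absolute percentage error, in percent.

    Without `capacity`, each error is divided by the measured value.
    With `capacity` (an assumption of this sketch), each error is divided
    by the installed capacity, which avoids division by zero at night.
    """
    n = _check_lengths(p_meas, p_pred)
    if capacity is not None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        return 100.0 / n * sum(abs(m - p) / capacity
                               for m, p in zip(p_meas, p_pred))
    if any(m == 0 for m in p_meas):
        raise ValueError("zero measured values require a capacity normaliser")
    return 100.0 / n * sum(abs((m - p) / m) for m, p in zip(p_meas, p_pred))
```

RMSE squares the errors and therefore weights large deviations more heavily than MAE and MAPE, which is one reason the choice of metric matters for this application.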
Methodology

In the first step, data preprocessing techniques are applied to perform anomaly detection and outlier rejection. Three machine learning-based approaches are considered:

Density-based anomaly detection
Clustering-based anomaly detection
Support vector machine-based anomaly detection
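To make the first of these approaches concrete, the following is a minimal sketch of a density-based detector. It is an illustrative stand-in rather than the technique selected in this work: each reading is scored by the distance to its k-th nearest neighbour, so readings in sparse regions receive high scores. The names `knn_distance_scores` and `flag_anomalies` and the parameters `k` and `factor` are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def knn_distance_scores(values: Sequence[float], k: int = 3) -> List[float]:
    """Score each reading by the distance to its k-th nearest neighbour.

    A large score means the reading lies in a low-density region.
    """
    if not 1 <= k < len(values):
        raise ValueError("k must be at least 1 and smaller than len(values)")
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_anomalies(values: Sequence[float], k: int = 3,
                   factor: float = 5.0) -> List[bool]:
    """Flag readings whose score exceeds `factor` times the median score.

    The threshold rule is an assumption of this sketch, not a standard.
    """
    scores = knn_distance_scores(values, k)
    threshold = factor * median(scores)
    return [s > threshold for s in scores]
```

For PV data it is probably sensible to apply such a detector to readings from the same hour of day, because night-time zeros would otherwise dominate the density estimate. A flagged reading may still be a real measurement of a sudden event, so rejection should be a deliberate choice.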


In the second step, the data preprocessed with the anomaly detection technique chosen in the first step are used to train the data-driven model based on predictive data mining techniques. The investigation in these two steps will reveal the interplay between the anomaly detection technique and forecasting model accuracy.
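As an illustration of what a weather-free, data-driven forecast can look like, the sketch below predicts each future hour as the mean of the readings at the same hour of day over the last few days. It is a simple baseline under stated assumptions (hourly resolution, no missing values), not the predictive data mining model proposed here. The function name `same_hour_average_forecast` and its parameters are hypothetical.

```python
from typing import List, Sequence

HOURS_PER_DAY = 24


def same_hour_average_forecast(history: Sequence[float], horizon: int = 24,
                               n_days: int = 3) -> List[float]:
    """Forecast the next `horizon` hours from past power readings only.

    `history` holds hourly readings, oldest first. Each forecast hour is
    the mean of the values at the same hour of day on the previous
    `n_days` days. For horizons beyond one day, earlier forecasts are
    reused as inputs (recursive forecasting).
    """
    if len(history) < n_days * HOURS_PER_DAY:
        raise ValueError("history must cover at least n_days full days")
    series = list(history)
    for _ in range(horizon):
        target = len(series)  # index of the hour being forecast
        past = [series[target - d * HOURS_PER_DAY]
                for d in range(1, n_days + 1)]
        series.append(sum(past) / n_days)
    return series[len(history):]
```

Running the same forecaster on raw and on preprocessed history and comparing the resulting errors is one way to observe how anomaly detection affects accuracy.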

The third step involves developing a unified model that forecasts accurately for different time horizons, i.e. short-term, medium-term and long-term forecasting.
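One way to examine a unified model is to score a single hourly forecast over nested horizons. The sketch below does this with the mean absolute error. The horizon boundaries in `HORIZONS` (one day, one week, thirty days) are assumptions made for illustration, since the text does not define them, and `mae_by_horizon` is a hypothetical helper.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical horizon lengths in hours; not defined in the text.
HORIZONS: Mapping[str, int] = {
    "short-term": 24,
    "medium-term": 24 * 7,
    "long-term": 24 * 30,
}


def mae_by_horizon(p_meas: Sequence[float], p_pred: Sequence[float],
                   horizons: Mapping[str, int] = HORIZONS
                   ) -> Dict[str, float]:
    """Mean absolute error over the first h hours, for each horizon h.

    Horizons longer than the available forecast are skipped.
    """
    if len(p_meas) != len(p_pred):
        raise ValueError("series must be of equal length")
    result: Dict[str, float] = {}
    for name, hours in horizons.items():
        if hours > len(p_meas):
            continue  # forecast does not cover this horizon
        errors = [abs(m - p)
                  for m, p in zip(p_meas[:hours], p_pred[:hours])]
        result[name] = sum(errors) / hours
    return result
```

An error that grows sharply from the short-term to the long-term entry would suggest that the model does not carry its hourly accuracy over to longer horizons.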
Conclusion
Intelligent decision making is important to provide an unprecedented flexibility in the energy management for the future power system. This requires accurate forecasts of future energy production and demand/consumption.
In this paper we first discussed the terminology of energy forecasting and its classification by time horizon, followed by a detailed state-of-the-art review, which revealed that advanced soft computing approaches are likely to outperform statistical methods. However, complex methods are limited in terms of interpretability.
We proposed applying data preprocessing techniques together with a data-driven forecasting model, which can possibly improve the accuracy even when only partial information, i.e. past power data, is used.
Declarations
Acknowledgements
The author would like to thank Prof. Dr. Clemens Van Dinther for his valuable feedback and comments during the shepherding process.
Funding
Publication of this article was sponsored by funds of the Smart Grids research group.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
About this supplement
This article has been published as part of Energy Informatics Volume 1 Supplement 1, 2018: Proceedings of the 7th DACH+ Conference on Energy Informatics. The full contents of the supplement are available online at https://energyinformatics.springeropen.com/articles/supplements/volume1supplement1.
Authors’ contributions
ES analysed related work, identified open issues, and developed a research proposal related to her PhD project. The author read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The author declares that she has no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Alanazi M, Alanazi A, Khodaei A (2017) Long-term solar generation forecasting. 2016 IEEE/PES Trans Distrib Conf Expo (T & D):1–13
 Chakhchoukh Y, Panciatici P, Mili L (2011) Electric load forecasting based on statistical robust methods. IEEE Trans Power Syst 26:982–991
 Chen X, Kang C, Tong X, Xia Q, Yang J (2014) Improving the accuracy of bus load forecasting by a two-stage bad data identification method. IEEE Trans Power Syst 29:1634–1641
 Daliento S, Chouder A, Guerriero P, Pavan AM, Mellit A, Moeini R, Tricoli P (2017) Monitoring, diagnosis and power forecasting for photovoltaic fields: a review. Int J Photoenergy:1–13
 Dobschinski J, Bessa R, Du P, Gleiser K, Haupt SE, Lange M, Möhrlen C, Nakafuji D, Rodriguez M (2017) Uncertainty forecasting in a nutshell: prediction models designed to prevent significant errors. IEEE Power Energ Mag 15:40–49
 Dolara A, Grimaccia F, Leva S, Mussetta M, Ogliari E (2018) A physical hybrid artificial neural network for short term forecasting of PV plant power output. Energies 8:1138–1153
 Filik UB, Gerek ON, Kurban M (2011) Hourly forecasting of long term electric energy demand using novel mathematical models and neural networks. Innovative computing. Inf Control 7:115–118
 Gandelli A, Grimaccia F, Leva S, Mussetta M, Ogliari E (2014) Hybrid model analysis and validation for PV energy production forecasting. 2014 Int Joint Conf Neural Netw (IJCNN):1957–1962
 Khatib T, Elmenreich W (2015) A model for hourly solar radiation data generation from daily solar radiation data using a generalized regression artificial neural network. Int J Photoenergy:1–13
 Liu J, Fang W, Zhang X, Yang C (2015) An improved photovoltaic power forecasting model with the assistance of aerosol index data. IEEE Trans Sustainable Energy 6:434–442
 Luo L, Hong T, Yue M (2018) Real-time anomaly detection for very short-term load forecasting. J Mod Power Syst Clean Energy 6:235–243
 Marino DL, Amarasinghe K, Manic M (2016) Building energy load forecasting using deep neural networks. IECON 2016 - 42nd Ann Conf IEEE Ind Electron Soc:7046–7051. abs/1610.09460:1–6
 Ordiano JAG, Waczowicz S, Reischl M, Mikut R, Hagenmeyer V (2017) Photovoltaic power forecasting using simple data-driven models without weather data. Comput Sci Res Dev 32:237–246
 Panapakidis IP, Bouhouras AS, Christoforidis GC (2018) A missing data treatment method for photovoltaic installations. 2018 IEEE Int Energy Conf (ENERGYCON):1–6
 Pelland S, Remund J, Kleissl J, Oozeki T, Brabandere KD (2013) Photovoltaic and solar forecasting: state of the art. Tech Rep IEA PVPS:T14–T01. 1–36
 Ramsami P, Oree V (2015) A hybrid method for forecasting the energy output of photovoltaic systems. Energy Convers Manag 95:406–413
 Saberian A, Hizam H, Razid MAM, Kadir MZAA, Mirzaei M (2014) Modelling and prediction of photovoltaic power output using artificial neural networks. Int J Photoenergy 14:1–10
 Saleh AI, Rabie AH, AboAlEz KM (2016) A data mining based load forecasting strategy for smart electrical grids. Adv Eng Inform 30:422–448
 Suganthi L, Samuel AA (2012) Energy models for demand forecasting: a review. Renew Sust Energ Rev 16:1223–1240
 Wan C, Zhao J, Song Y, Xu Z, Lin J, Hu Z (2015) Photovoltaic and solar power forecasting for smart grid energy management. CSEE J Power Energy Syst 1:38–46
 Zhang J, Florita A, Hodge B, Lu S, Hamann HF, Banunarayan V, Brockway AM (2015) A suite of metrics for assessing the performance of solar power forecasting. Sol Energy 111:157–175