Evaluation of neural networks for residential load forecasting and the impact of systematic feature identification

Energy systems face challenges due to climate change, distributed energy resources, and political agendas. These challenges particularly affect distribution system operators (DSOs), who are responsible for ensuring grid stability. Accurate predictions of the electricity load can help DSOs better plan and maintain their grids. The study aims to test a systematic data identification and selection process to forecast the electricity load of Danish residential areas. The five-ecosystem CSTEP framework maps relevant independent variables on the cultural, societal, technological, economic, and political dimensions. Based on the literature, a recurrent neural network (RNN), long short-term memory network (LSTM), gated recurrent unit (GRU), and feed-forward network (FFN) are evaluated and compared. The models are trained and tested using different data inputs and forecasting horizons to assess the impact of the systematic approach and the practical flexibility of the models. The findings show that the models achieve similar performance, with an adjusted R² score of around 0.96 and an absolute percentage error of 4–5% for the 1-h predictions. Forecasting 24 h gave an adjusted R² of around 0.91 and increased the absolute percentage error slightly to 6–7%. The impact of the systematic identification approach depended on the type of neural network, with the FFN showing the highest increase in error when removing the supporting variables. The GRU and LSTM did not rely on the identified variables, showing minimal changes in performance with or without them. The systematic approach to data identification can help researchers better understand the data inputs and their impact on the target variable. The results indicate that a focus on curating data inputs affects the performance more than choosing a specific type of neural network architecture.


Introduction
Energy systems face challenges due to climate change, distributed energy resources, and political agendas. For instance, in Denmark, carbon emissions should be reduced by 70% by 2030, with carbon neutrality as the goal for 2050 (Danish Energy Agency 2022a; Ma and Jørgensen 2018). To achieve this goal, the Danish government has introduced initiatives to accelerate the energy system transition to a total reliance on renewable energy sources. Among the initiatives are state-of-the-art energy islands, investments in technologies such as Power-to-X and Carbon Capture, and a green transition of the industry (Danish Energy Agency 2022b). However, the changes to the energy system will lead to an increasing number of distributed energy resources (DERs), introducing new challenges, such as grid balancing (Ma et al. 2017, 2019a; Billanes et al. 2017). In addition, the electrification of vehicles and the heating of households through heat pumps increase the overall electricity consumption (Ma et al. 2021; Fatras et al. 2021). These challenges are significant to distribution system operators (DSOs), who are responsible for the electricity grids (Ma et al. 2016; Christensen et al. 2021). Furthermore, DSOs face many other challenges, e.g., the resilience of the grid after natural disasters (Hu et al. 2021), an increasing number of DERs (Sauter et al. 2017), the security of supply (Ma et al. 2019b), and the cost of grid maintenance and upgrades (Gören et al. 2022).
There are three types of electricity consumers: residential, commercial, and industrial (Billanes et al. 2018), and in many cases, they are located separately. Households make up around 12% of the total energy accounts and close to 13% of the emission accounts of Denmark (Statistics Denmark 2022). During peak consumption hours, households account for 35% of the total electricity load (Andersen et al. 2017). Furthermore, the adoption of DERs such as photovoltaics, electric vehicles, and heat pumps influences households' electricity consumption patterns, which can potentially result in grid overloads (Christensen et al. 2019).
Thus, it is important for DSOs to understand the state of their grid in the short and long term to ensure operational quality, plan maintenance, and identify areas in the grid for renovations or investments. Some research has experimented with accurate forecasts on a short- to long-term horizon by applying machine learning (ML) and deep learning (DL) methods to the problem. Several types of neural networks, ML algorithms, and hybrids have been tested with excellent results. Furthermore, the electricity load forecasts have been tested with various independent variables and applications (Vanting et al. 2021).
However, in the literature, the independent variables are not systematically identified beforehand, often leading to the questions: why were the variables chosen in the first place, and how do they relate to the target variable? Moreover, the argument for specific supporting data does not appear until the features are analyzed against selection criteria such as correlation analysis (Friedrich and Afshari 2015; Pindoriya et al. 2010; Vonk et al. 2012). Additionally, the related literature does not explain the composition of the electricity load, i.e., the sources of electricity consumption in the aggregated load data, even though doing so may lead to a better understanding of the performance of the proposed models. Based on the challenges the DSOs face regarding the distribution grid, this study seeks to improve the prediction accuracy of load forecasts using a systematic data identification approach.
To fill the research gap, this paper aims to systematically identify variables related to the aggregated electricity load of residential areas. The identified variables will be used to forecast the aggregated electricity consumption of two residential areas in Denmark. The systematic identification and subsequent selection will be made using the CSTEP framework (Ma 2022), which maps data within an ecosystem in several dimensions. The identification ensures that all potentially relevant data are accounted for and that a strong foundation of supporting data is available, which was missing in related works. The impact of the systematic identification on the model performance will be assessed by testing and evaluating multiple types of neural networks based on related works. Moreover, the data is analyzed using the K-Means clustering algorithm to investigate the composition of the electricity load before it is aggregated.
Furthermore, to determine the impact of different electricity consumption sources, such as heat pumps and electric heating, the performance of the selected neural networks will be compared on subsets of the data set containing households with and without electric-based heating. The types of neural networks are based on the applications in the literature. The most popular models included in this paper are feed-forward networks (FFN), recurrent neural networks (RNN), and Long Short-Term Memory (LSTM) networks. Additionally, because the related publications have rarely applied Gated Recurrent Units (GRU), this architecture will also be used in this experiment. Finally, to test the flexibility of the neural networks, each tuned model will be used to predict a single step (1 h) and 24 steps (24 h) of the electricity load.
This paper is structured as follows. First, the literature related to electricity load forecasting is presented. Second, the data processing and analysis are described in the methodology section, including the systematic identification and selection using the CSTEP framework. Third, the forecasting results of the models are presented, compared, and analyzed. Finally, the impact of the systematic identification approach is discussed based on the results of the forecasts.

Related works
Electricity load forecasting using machine learning algorithms and deep neural networks has been a major area of research in the last decade. The increasing amount of data available and the rising interest in artificial intelligence research have led researchers to experiment with different types of networks, algorithms, and hybrids to achieve high accuracies or low errors for their forecasts (Vanting et al. 2021).
Based on the literature, electricity load forecasting can be placed into three horizons: short-, medium-, and long-term (Gebreyohans et al. 2018; Solyali 2020). Short-term forecasting covers horizons from minutes, sometimes referred to as very short-term forecasting, up to 1 week (Samuel et al. 2020; Houimli et al. 2020; Yong et al. 2020). Medium-term forecasts start from 1 week and go up to several months or a year (Shirzadi et al. 2021; Salama et al. 2009; Gungor et al. 2020). Finally, long-term horizons are forecasts focused on predicting more than a year, sometimes several decades, depending on the data (Parlos and Patton 1993; Ekonomou 2010; Ghods and Kalantar 2008). Other than the length, each forecasting horizon is characterized by several parameters, including the independent variables, applications of the forecast, and models used for the prediction.
Long-term forecasts leverage socioeconomic data as independent variables and are usually applied to problems concerning larger areas, such as states, provinces, and countries (Elkamel et al. 2020; Tanoto et al. 2011). Furthermore, weather data are used in long-term forecasts of the electricity load of states and countries (Gao et al. 2019). In the literature, weather data includes outdoor temperature, humidity, wind speed and direction, precipitation, and solar irradiation. Moreover, medium-term electricity load forecasting is applied to larger areas such as countries, states, and residential areas. Variables include weather, electricity prices, and socioeconomic data (Salama et al. 2009; Ilseven and Gol 2017). Short-term forecasts are applied to electricity grids and microgrids, power stations and substations, residential and office buildings, cities, provinces, and countries, using weather data and temporal features as independent variables (Li et al. 2021; Xu et al. 2019; Panapongpakorn and Banjerdpongchai 2019; Ahmad and Chen 2018; Ruiming 2008). Short-term forecasts are essential to determine if the load exceeds the capacity of a transformer, which can prevent power outages (Dung and Phuong 2019; Giamarelos et al. 2021; Al-Rashid and Paarmann 1996).
Additionally, the short-term forecast can indicate windows for flexibility to achieve sector coupling, leading to a more efficient energy system (Yan et al. 2012; Pramono et al. 2019; Xypolytou et al. 2017). The model selection varies within each forecasting horizon, meaning a single preferred type of model cannot be identified. Instead, researchers have tested several statistical methods, machine learning algorithms, and different types and combinations of neural networks to reach accurate predictions, leading to a highly diverse research field with a wide range of applications and independent variables.
In the literature, several types of neural networks have been applied. One network type is the recurrent neural network (RNN), designed to work with sequential data. The strength of an RNN is that it can take information from prior inputs together with the input at a given timestamp to better decide on the output. Furthermore, one of the more popular networks is the Long Short-Term Memory (LSTM) network, a type of RNN specifically designed to deal with long data sequences. It was first introduced in 1997 by Hochreiter and Schmidhuber and improved upon the regular RNN by dealing with the vanishing gradients problem (Hochreiter and Schmidhuber 1997). Gated Recurrent Units (GRUs) (Cho et al. 2014), which are another type of specialized RNN similar to the LSTM network, have also been applied to short-term load forecasting (Ribeiro et al. 2020; Zhu et al. 2019). Finally, a fully connected feed-forward network has also been a popular choice to forecast electricity load in the literature. Researchers have experimented with different configurations and combinations of networks and algorithms to improve forecast accuracy. While many apply regular neural networks, some combine several into hybrid ones, as seen in Panapongpakorn and Banjerdpongchai (2019) and Pramono et al. (2019). Others transform the forecast into an image recognition problem and use state-of-the-art convolutional neural networks to predict the load (Li et al. 2017; Sadaei et al. 2019).

Methodology
This paper systematically identifies and selects data relevant to forecasting the electricity load of residential areas to build a strong foundation of supporting data and thereby improve the performance metrics of the forecasting model. To identify the possible features, the CSTEP framework proposed in Ma (2022) is used to analyze and evaluate an ecosystem by mapping the features to the five influential dimensions: Cultural, Societal, Technology, Economy and Finance, and Policies and Regulation. For this paper, the CSTEP framework is extended with additional variable dimensions: supporting, embedded, and exogenous variables, and the impact of the variables on the electricity load. Supporting variables include sensor readings and statistical data, e.g., weather and climate measurements or electricity prices. Embedded variables are data that can be embedded in the target variable or other data sources, for example, temporal features or the sun's position. Exogenous variables are considered data that cannot be directly given as an input to a model but still impact the target or supporting variables. Finally, the impact on the target variable describes how each dimension and the different types of variables increase or decrease the electricity consumption of residential areas.
So far, no literature has systematically identified and selected the relevant data using the CSTEP framework. Researchers often rely on correlation analysis of features or tree-based methods for determining feature importance to decide on independent variables for multivariate forecasting. Before identifying the CSTEP variables, the electricity load is analyzed to examine the composition of the aggregated load. This step aims to better understand the performance of the model during inference.
Furthermore, this can help make the black box of neural networks more transparent by understanding the inputs better. The analysis of the electricity load will be done using descriptive statistics and by clustering the daily load profiles of each household in the area to investigate the different load patterns. The algorithm applied for the clustering is K-Means using dynamic time warping as the distance method. Afterward, the identified CSTEP variables are examined for data availability and sourced for the subsequent data analysis. Next, the electricity load is used to conduct feature engineering of temporal features and lagged electricity load. Finally, all selected features undergo a feature selection process using correlation coefficients and tree-based methods for feature importance.
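The distance measure behind this clustering step can be illustrated with a minimal sketch. It assumes each daily profile is a plain list of hourly values and uses an absolute-difference cost; the function names `dtw_distance` and `assign_clusters` are illustrative and not taken from the study. Only the assignment step of K-Means is shown; the centroid update, which needs a dedicated averaging scheme under dynamic time warping, is omitted.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Uses the absolute difference as the local cost (an assumption made
    for this sketch) and the classic O(len(a) * len(b)) recurrence.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again with b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def assign_clusters(profiles, centroids):
    """K-Means assignment step: index of the DTW-nearest centroid per profile."""
    return [
        min(range(len(centroids)), key=lambda k: dtw_distance(p, centroids[k]))
        for p in profiles
    ]
```

Because the warping path may stretch or compress the time axis, two profiles with the same peak shifted by an hour are treated as close, which is the usual motivation for preferring this distance over a pointwise Euclidean one for load profiles.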
After the data processing and analysis section, the evaluation and selection of neural networks are conducted based on related works and the research gap. This paper tests the performance of four separate neural networks on the aggregated electricity load. Baseline models of a feed-forward network (FFN), recurrent neural network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) are established and used as the starting point to tune hyperparameters and select the optimal architecture. Each tuned model is trained on the aggregated load data with and without including the selected CSTEP variables and used to forecast a single hour and 24 h. Then, each model is also trained on aggregated electricity consumption data containing households exclusively with heat pumps or electric heating.
To assess the performance of the models in this paper, four different metrics will be used, presented in the equations below.

• Mean Absolute Error (MAE)
• Mean Absolute Percentage Error (MAPE)
• Root Mean Squared Error (RMSE)
• Adjusted R² Score
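In their standard textbook form, the four metrics can be written as follows. The notation is an assumption chosen for illustration: y_i is the actual load, ŷ_i the predicted load, ȳ the mean of the actual load, n the number of observations, and p the number of independent variables.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \\
\mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}} \\
R^{2}_{\mathrm{adj}} &= 1-\left(1-R^{2}\right)\frac{n-1}{n-p-1},
\qquad
R^{2} = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}
\end{align}
```

MAE and RMSE are expressed in the unit of the load (kWh), whereas MAPE and the adjusted R² are scale-free, which matters when data sets with different average loads are compared.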

The CSTEP framework
The CSTEP framework consists of five critical business ecosystem dimensions: Climate, environment, and geographic situation; Societal culture and demographic environment; Technology (infrastructure, technological skills, technology readiness); Economy and finance; and Policies and regulation. Each dimension has several sub-dimensions with specific explanations as defined in Table 1 in Ma (2022). Additionally, the dimensions can be viewed on a macro and micro level based on the focus of the business ecosystem. For instance, the sub-dimensions of Climate, environment, and geographic situation can be divided into a macro level considering the general weather conditions and natural features of a place (climate and geographic situation). Meanwhile, the micro level considers the living, working, and production environment or conditions (environmental situation). The macro and micro levels of a dimension differ depending on the perspective of either the ecosystem or the individual stakeholder, focusing on either the general or specific levels of the business ecosystem (Ma 2022).

Analysis of electricity load
The electricity load data used in this paper is collected from two residential areas in Denmark in connection with a national project called Flexible Energy Denmark (Flexible Energy Denmark 2019). The data ranges from January 1st, 2019, to May 15th, 2022, and includes 211 households after processing and cleaning the data. From the residential areas, the data set includes only households without photovoltaic panels, electric heating, or heat pumps, and whose residents are not electric vehicle (EV) owners using home charging. Households with any of these characteristics are separated from those with pure electricity consumption and central or district heating. This information is sourced from the Danish building registry, which by law collects information about all buildings in Denmark (Bygnings-og Boligregistret 2022). For EV owners, a different method had to be used, as this information is not registered anywhere. Instead, each household's data was analyzed to detect possible EV owners by clustering the load to identify outliers using K-Means. Subsequently, the load was searched for minimum-maximum consumption ranges that exceed 7.2 kWh, which is a typical consumption pattern for EV charging. By separating the households that have adopted these DERs, the impact of their load on the ability to forecast accurately can be investigated.
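The range-based search for possible EV charging can be sketched as follows. This is only an illustration under stated assumptions: the load is an hourly series in kWh, the minimum-maximum range is evaluated per calendar day (the window length is not stated in the text), and the function name `flag_possible_ev_days` is hypothetical.

```python
def flag_possible_ev_days(hourly_load_kwh, threshold_kwh=7.2, hours_per_day=24):
    """Return the indices of days whose max-min hourly range exceeds the threshold.

    hourly_load_kwh is a flat list of hourly readings; incomplete trailing
    days are ignored. The per-day window is an assumption of this sketch.
    """
    flagged = []
    n_days = len(hourly_load_kwh) // hours_per_day
    for day in range(n_days):
        window = hourly_load_kwh[day * hours_per_day:(day + 1) * hours_per_day]
        if max(window) - min(window) > threshold_kwh:
            flagged.append(day)
    return flagged


# Usage: one ordinary day followed by a day with two high-consumption hours.
ordinary_day = [0.3] * 24
charging_day = [0.3] * 20 + [8.0, 8.0, 0.4, 0.3]
print(flag_possible_ev_days(ordinary_day + charging_day))
```

A household flagged on many days would be a candidate for exclusion; how many flagged days are required is a separate design choice that this sketch leaves open.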
Figure 1 shows each household's average daily consumption profiles, where the red line indicates the average load within each cluster. The most typical consumption profiles can be seen in Cluster 2 and Cluster 5. Clusters 0 and 3 can be considered outlier profiles, while Clusters 4 and 1 are somewhere in between with equally many households, as seen in the distribution of clusters in Fig. 2.
Fig. 1 Clusters of daily load profiles
The average consumption pattern over a year for the two residential areas can be seen in Fig. 3. In Denmark, household consumption usually increases during winter and decreases as summer approaches. Many factors can influence the consumption pattern, such as the sun, amount of light, temperature, rain, and wind. From the figure, a very distinct spike can also be seen towards the end of Christmas, a recurring pattern. These factors lead to several candidates for supporting data, for instance, the position of the sun, the weather, the length of days during the year, and special days, such as religious or national holidays.
Figure 4 shows the average aggregated daily load of the two residential areas. The pattern shows a slight increase during morning hours and a peak at 17:00. The period in the afternoon is essential to forecast correctly, as this is where the grid is challenged by high electricity loads that approach the grid's capacity. Each residential area is connected to a similar type of transformer with a capacity of 400 kWh.

Identification of CSTEP variables
As described earlier, any supporting data for the electricity load will be identified and mapped using the CSTEP framework. Table 1 shows the relevant variables identified for this research experiment. The variables are based on applications in related literature and on input from domain experts. The supporting variables include sensor readings or statistical data, such as weather and electricity prices. The embedded variables include data such as holidays, day lengths, demographics, and building information. The exogenous variables are data that cannot directly be used as an input for a model but add additional information about the other variables. The variables in this dimension can help explain irregularities or unexpected results. The final column describes how each CSTEP dimension's identified data impacts the target variable, which in this case is the electricity consumption of households.
After the systematic identification, each variable is investigated for availability and feasibility. Using openly available sources, the following CSTEP variables have been collected:
• Holidays (Denmark)
• Day lengths
• Sun azimuth
• Sun altitude
• Electricity prices
While many researchers insist on the importance of weather data to support the electricity load forecast (Vanting et al. 2021; Friedrich and Afshari 2015), it is not necessarily meaningful to include it in this experiment. The aggregated electricity load data is collected from two residential areas with some distance between them, meaning local weather data is unavailable. There may be a correlation between some weather data and the electricity load. However, causation cannot directly be determined in such an instance.

Feature selection and analysis
After selecting CSTEP variables, the data is analyzed with the aggregated electricity load using correlation coefficients and feature importance. The coefficients are calculated using Pearson's R, and the feature importance is the gain from gradient boosted trees using the Python library XGBoost. Figure 5 shows a correlation heatmap of the coefficients of each variable. No feature is strongly correlated with the electricity load, but there is a slight negative relationship with day lengths and a slight positive relationship with the sun's azimuth.
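As a reference for how the coefficients in the heatmap are defined, the following is a minimal pure-Python sketch of Pearson's R for two equally long series. The function name `pearson_r` is illustrative; the gain-based feature importance from XGBoost is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-deviation of the two series around their means.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariation / (spread_x * spread_y)
```

The coefficient only captures linear association, so a variable with a weak Pearson's R can still rank highly in a tree-based importance measure.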
Looking at the relative feature importance of each variable in relation to the electricity load, the sun's azimuth is calculated to be the most important feature, as seen in Fig. 6. The gain signifies the relative contribution of the feature over all decision trees in the gradient boosting model. At this point, each feature has also been analyzed individually for any irregularities. The analysis resulted in a decision to discard the electricity price variable due to a substantial increase in the price in 2022. This increase would only be visible in the test data set, potentially resulting in unexpected predictions, as the increase is not reflected in the electricity consumption. The variable is visualized in Fig. 7.
In summary, the target variable of the electricity load is analyzed using K-Means clustering to identify different load profiles. The load profiles will give a better understanding of the input data to make the black box of neural networks more transparent. Furthermore, supporting independent variables have been systematically identified, selected, and analyzed using correlation coefficients and feature importance of gradient-boosted trees. Finally, each independent variable was analyzed for missing or broken data, potential irregularities, and seasonal patterns and trends, resulting in discarding the electricity price as an independent variable.

Model selection
The model selection is based on neural networks from related works, which are a fully connected feed-forward neural network (FFN), a recurrent neural network (RNN), and a long short-term memory network (LSTM). Finally, to fill a gap in the literature, a gated recurrent unit (GRU) is also included in the experiments of this paper.

Baseline performance and models
First, a baseline performance for the forecasting problem is established using a simple multivariate linear regression model to predict the electricity load based on the CSTEP variables as input. The baseline performance resulted in the metrics seen in Table 2. These baseline metrics are considered the minimum to beat by the proposed models.
Secondly, each selected model is trained and evaluated on the data once without any hyperparameter tuning or feature engineering to assess the base performance of each neural network. From here, the baselines will be iteratively improved by tuning training, data, and model parameters. Table 3 presents the baseline metrics of each model using the CSTEP variables as independent variables for the electricity load. At this point, all models perform similarly without any feature engineering or hyperparameter tuning.
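For readers who want to recompute baseline metrics of this kind, a minimal sketch of the four measures is given below. It assumes the standard definitions, plain lists of actual and predicted values, and non-zero actual values for the percentage error; the function names are illustrative rather than taken from the study's code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the load."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the unit of the load."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def adjusted_r2(y_true, y_pred, n_features):
    """R squared penalized for the number of independent variables."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
```

The `n_features` argument is what separates the adjusted score from the plain R²: adding input variables that do not reduce the residual error lowers the adjusted score.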

Model tuning
Each model from Table 3 will undergo a tuning process, where several parameters are tested in different combinations. To do this, the experiment tracking tool Weights and Biases is leveraged to find the best size and combination of the tunable parameters (Biewald 2020). An iterative random search process can be conducted by setting up a training loop that tests all four models, ending with a greedy search. The tunable parameters are seen in Table 4 below. Each tunable parameter has several values that are chosen uniformly and randomly. The feature engineering includes lags from 1 to 168 h, and the temporal features have been encoded cyclically using sine and cosine transformations.
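The two feature-engineering steps mentioned above can be sketched in a few lines. The sketch assumes an hourly load series held in a plain list; `cyclical_encode` and `lag_features` are hypothetical helper names, and whether every lag between 1 and 168 h or only a subset was used as input is not specified in the text.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour 0-23 with period 24) to (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_features(series, lags):
    """Build one row of lagged values for every t with all lags available.

    Row i corresponds to time step t = max(lags) + i and contains
    series[t - lag] for each lag, in the order given.
    """
    start = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(start, len(series))]


# Usage: hour-of-day encoding and two short lags on a toy series.
hour_sin, hour_cos = cyclical_encode(6, 24)
rows = lag_features([1, 2, 3, 4, 5], lags=[1, 2])
```

The sine and cosine pair places hour 23 next to hour 0 on a circle, which a raw integer encoding would not do; for the setting described above, `lags` would be drawn from `range(1, 169)`.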
After running several tests and calculating metrics for each model, the best parameters could be found. Table 5 summarizes the tuned parameters for each model. These four tuned models are subsequently trained on data ranging from January 1st, 2019, to May 15th, 2021, and evaluated on the test data from May 15th, 2021, to May 15th, 2022. Each model will be trained four times, resulting in 16 different prediction results: a 1-h forecast using CSTEP variables, a 1-h forecast without CSTEP variables, a 24-h forecast using CSTEP variables, and a 24-h forecast without CSTEP variables.
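A chronological split of this kind, without shuffling, can be sketched as follows. The exact hour at which the test period begins is not stated, so the boundary at midnight on May 15th, 2021 is an assumption, and `chronological_split` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, values, split_at):
    """Split a time series into (train, test) without shuffling.

    Observations strictly before split_at go to the training set;
    the test set starts at split_at.
    """
    pairs = list(zip(timestamps, values))
    train = [(t, v) for t, v in pairs if t < split_at]
    test = [(t, v) for t, v in pairs if t >= split_at]
    return train, test


# Usage with four synthetic hourly readings around the assumed boundary.
start = datetime(2021, 5, 14, 22)
timestamps = [start + timedelta(hours=h) for h in range(4)]
train, test = chronological_split(timestamps, [1.0, 2.0, 3.0, 4.0], datetime(2021, 5, 15))
```

Keeping the split chronological ensures that lagged inputs in the test period never draw on observations the model could not have seen at prediction time.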

One-hour forecast
The prediction results of the 1-h forecasts with and without the identified CSTEP variables are presented in Table 6. Overall, the metrics look similar for each model. The lowest error was found using the feed-forward network with CSTEP variables, at a mean absolute error of 3.9064 kWh and the highest adjusted R² score of 0.9681. However, the same model without the CSTEP variables gives the highest error and lowest adjusted R² score, while no substantial difference is seen in the recurrent neural networks. This change in performance can indicate that the FFN is more dependent on the CSTEP variables than the recurrent networks.
Figure 8 visualizes each model's first week of hourly predictions with the actual load during the period. The models mostly capture the peaks and valleys with some larger errors, especially between the midday and afternoon peaks. Because these predictions look similar, it may be more interesting to investigate the performances on specifically challenging days to better assess the models, such as Christmas, which usually sees very high peaks in the afternoon to evening hours and different consumption patterns throughout the day. Figure 9 presents the forecast during Christmas 2021, where there is a greater difference in the models' ability to forecast hourly. The actual load is shaped differently than on a regular day. December 23rd and 25th have much flatter peaks, where the morning and afternoon are similar, whereas the 24th has a high afternoon-to-evening peak. The FFN, RNN, and LSTM models cannot capture these peaks as well as on a regular day. However, the GRU predicts the high increase of the afternoon peak surprisingly well. This factor could be another performance metric to consider when assessing the performance of different neural network architectures, as this cannot be seen from the error metrics and adjusted R² scores.
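One way such a peak-focused assessment could be quantified is sketched below. This measure is not part of the study; `peak_error` is a hypothetical helper that compares the height and timing of the highest value in a window of actual and predicted loads, such as the 24 hourly values of a single day.

```python
def peak_error(actual, predicted):
    """Compare the peak of a forecast window with the peak of the actual load.

    Returns (magnitude_error, timing_error), where magnitude_error is the
    predicted peak minus the actual peak and timing_error is the offset in
    time steps between the two peak positions (negative means too early).
    """
    actual_idx = max(range(len(actual)), key=actual.__getitem__)
    predicted_idx = max(range(len(predicted)), key=predicted.__getitem__)
    return predicted[predicted_idx] - actual[actual_idx], predicted_idx - actual_idx


# Usage on a toy four-step window.
magnitude, timing = peak_error([50, 60, 120, 80], [52, 58, 100, 85])
```

A negative magnitude error corresponds to the underestimated peaks visible in the plots, which averaged metrics such as MAE tend to hide because peak hours are only a small share of all hours.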

24-hour forecast
Table 7 presents the prediction results of the 24-h forecasts with and without the CSTEP variables. Generally, the errors are higher than for the 1-h forecasts, which is expected because multi-step predictions carry higher uncertainty at each timestep. However, the FFN is slightly more accurate than the other three models. Furthermore, the change in the FFN's performance when excluding the CSTEP variables is not as visible in the 24-h forecasts as in the 1-h forecasts.
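How the same series can serve both horizons is easiest to see from the way training samples are built. The sketch below assumes a direct multi-output setup, in which each sample pairs a block of past values with the next `horizon` values; whether the study used this or a recursive scheme is not stated, and `make_windows` is a hypothetical helper name.

```python
def make_windows(series, n_inputs, horizon):
    """Pair each block of n_inputs past values with the following horizon values."""
    inputs, targets = [], []
    last_start = len(series) - n_inputs - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_inputs])
        targets.append(series[start + n_inputs:start + n_inputs + horizon])
    return inputs, targets


# Usage: horizon=1 gives single-step targets, horizon=24 gives day-ahead targets.
toy_series = list(range(10))
x_single, y_single = make_windows(toy_series, n_inputs=3, horizon=1)
x_multi, y_multi = make_windows(toy_series, n_inputs=3, horizon=2)
```

Only the target length changes between the two cases, so the same architecture can be reused with a wider output layer, at the cost of having to predict values that lie further from the last observation.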
The forecasts for the first 24 h of the test data set are visualized in Fig. 10 below, where the multi-step forecast starts on May 15th at 23:00. There is no substantial difference between the four models in the first day of prediction. The ability to predict 24 h accurately using the same model architecture as for the 1-h forecasts means that the models are flexible in their application. To further assess the ability of the 24-h forecast models, they will also be investigated during Christmas 2021. Figure 11 presents the forecasts on Christmas day with the first prediction starting at midnight on the 24th of December 2021. The 24-h forecasts generally underestimate the actual load but follow the pattern correctly. The GRU neural network performs the best during this period, coming much closer to the peak load than the other models. Error metrics and R² scores are critical indicators to assess the performance of models. However, they are not the only factor on which to base performance assessments for electricity load forecasting. Looking solely at the error metrics, one would choose the FFN model as it shows the lowest overall error. However, DSOs might think it necessary to predict as accurately as possible on specific days when the grid is nearing capacity, such as Christmas. Because of this, the GRU model might be the better model to use.

Comparison with electric-based heating
All four models are tested on a data set of household electricity consumption containing heat pumps and electric heating to determine the importance of analyzing the composition of the aggregated electricity load and to investigate the prediction performance for electrically heated households. It must be noted that the sample size has decreased compared to the original data set, from 211 to 22. Because of the smaller sample size, the initial data set was sampled to have the same size, and all models were applied to the subset to compare them better. Table 8 presents the error metrics of the models applied to the electric-based heating household load and the sampled non-electric-based heating electricity consumption. The results give several insights. Firstly, the sample size of the aggregated load data affects the prediction ability of the models. For instance, the subset of the data set with a sample size of 22 has an adjusted R² of 0.8273 for the FFN model, while the same model on the full data set reaches a score of 0.9681. This change is seen across all models, indicating the sample size of the aggregated load to be an essential factor. Secondly, multiple metrics are crucial to correctly assess neural networks' performance. Due to the increased average hourly load for electric-based heating households, the absolute and squared errors change relative to the load. For the sampled data set, the average hourly load is around 0.37 kWh, whereas the electric-based heating households have an average load of around 0.96 kWh. Thirdly, while there is a difference in absolute and squared errors, the adjusted R² score does not substantially change when predicting electric-based heating and district heating households. Finally, the addition of CSTEP variables impacts the performance differently depending on the model.
The FFN model sees a slight performance increase when removing the CSTEP variables. The error of the RNN model increases without the CSTEP variables. The LSTM model has the worst performance, but the error slightly decreases when removing the supporting variables. Finally, the GRU model sees almost no change in performance with or without the CSTEP variables.
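The size-matched comparison described in this section relies on drawing a random subset of households and summing their hourly loads. A minimal sketch is shown below; it assumes the per-household loads are held in a dictionary of equally long hourly series, and `sample_and_aggregate`, the seed, and the toy data are illustrative only.

```python
import random


def sample_and_aggregate(household_loads, sample_size, seed=0):
    """Draw sample_size households without replacement and sum their hourly loads.

    household_loads maps a household id to a list of hourly kWh values;
    all lists are assumed to have the same length.
    """
    rng = random.Random(seed)
    # Sort the ids first so the draw is reproducible for a given seed.
    chosen = rng.sample(sorted(household_loads), sample_size)
    aggregated = [sum(hour) for hour in zip(*(household_loads[h] for h in chosen))]
    return chosen, aggregated


# Usage: 50 toy households with two hourly readings each, sampled down to 22.
toy_loads = {f"h{i}": [float(i), 1.0] for i in range(50)}
chosen_ids, aggregated_load = sample_and_aggregate(toy_loads, sample_size=22, seed=1)
```

Because an aggregate of fewer households smooths individual behavior less, repeating the draw with different seeds would show how much of the drop in adjusted R² is due to sample size alone.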

Discussion
This paper systematically identified and analyzed data to forecast the aggregated electricity load of residential areas using the CSTEP framework. The data were used as inputs, together with feature-engineered variables, to predict the next hour and the next 24 h. Four different neural networks were tuned, trained, and evaluated on the data sets with and without the CSTEP variables to assess the impact of the systematic identification process. The 1-h forecasts perform equally well in terms of the error metrics and the adjusted R² score; however, further investigation of the predictions shows that the GRU model captures the actual load better. Model performance assessment can be extended by examining the models on specific days such as Christmas, which usually sees very high consumption peaks. Finally, 24-h forecasts were conducted to examine the flexibility of the models. Overall, the metrics show minimal variation across the models, but comparing the predictions through visualizations indicates where the models differ.
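How a load series can be framed for 1-h and 24-h forecasts is sketched below. This section does not state the exact windowing used, so everything here is an assumption: the helper name `make_windows`, the 48-hour input window, and the choice to predict the full sequence of the next 24 values rather than a single value 24 h ahead. A toy ramp stands in for the load series.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Turn a 1-D series into (inputs, targets) for supervised learning.

    Each input row holds `n_lags` consecutive past values; each target row
    holds the `horizon` values that immediately follow them.
    """
    series = np.asarray(series, dtype=float)
    n_samples = series.size - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = np.stack([series[i + n_lags:i + n_lags + horizon]
                  for i in range(n_samples)])
    return X, y

# Toy hourly series 0, 1, 2, ... so the alignment is easy to check by eye.
load = np.arange(100, dtype=float)

X1, y1 = make_windows(load, n_lags=48, horizon=1)     # next-hour targets
X24, y24 = make_windows(load, n_lags=48, horizon=24)  # next-24-hour targets

print(X1.shape, y1.shape)    # (52, 48) (52, 1)
print(X24.shape, y24.shape)  # (29, 48) (29, 24)
```

Supporting variables, such as the CSTEP variables, would be added as extra input columns aligned on the same time index; that step is omitted here.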
Furthermore, to determine how the composition of the aggregated load data affects the forecast, the models were also applied to a separate data set containing households with heat pumps or electric heating. It was found that the number of households in the aggregated load affects the forecast: a smaller sample size increases the forecast error. To validate this, 22 households were sampled from the initial data set to match the number of electric heating households, and the two groups were compared. No substantial differences were found in the adjusted R² scores; however, the MAE, MAPE, and RMSE metrics differed due to the higher average load of households with heat pumps or electric heating. The systematically identified CSTEP variables did not improve the forecast accuracy significantly; however, they gave the authors a better understanding of the target variable and made it easier to explain. The complexity of the consumption pattern can be better understood by considering as many factors as possible. This understanding can help to explain increases or decreases in the electricity load during specific periods, to describe changing behavioral patterns of residents, or to identify peaks and valleys in the load pattern. Furthermore, the FFN model saw an increase in error after the CSTEP variables were removed, indicating that the recurrent neural networks rely less on the supporting variables. The most popular neural network architectures in the literature are LSTMs and FFNs; however, this paper has shown that GRUs perform very well, both on the performance metrics and when the predictions are visualized to understand where the model predicts well. Furthermore, this paper demonstrated that choosing the optimal neural network architecture is not as important as curating good data inputs, which was shown by testing the models on different load profiles with and without electric heating or heat pumps. Moreover, it was found that the sample size of the aggregated load affects the accuracy of the forecast, with smaller sample sizes giving more volatile consumption patterns.
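The link between sample size and volatility can be illustrated with a small simulation. It uses no data from the study: each synthetic household follows a shared daily profile multiplied by independent log-normal noise, which is an assumption made purely for illustration, and only the group sizes of 22 and 211 are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
hours = np.arange(24 * 60)  # 60 days of hourly values (synthetic)

# Shared daily profile around 0.37 kWh (illustrative shape, not measured).
profile = 0.37 * (1.0 + 0.4 * np.sin(2 * np.pi * hours / 24))
SIGMA = 0.8  # spread of the household-level noise (assumption)

def aggregate_volatility(n_households):
    """Std of the relative deviation of the averaged load from its expectation."""
    noise = rng.lognormal(mean=0.0, sigma=SIGMA,
                          size=(n_households, hours.size))
    avg_load = (profile * noise).mean(axis=0)
    expected = profile * np.exp(SIGMA ** 2 / 2)  # mean of the log-normal noise
    return np.std(avg_load / expected - 1.0)

vol_22 = aggregate_volatility(22)
vol_211 = aggregate_volatility(211)

print(f"relative volatility, 22 households:  {vol_22:.3f}")
print(f"relative volatility, 211 households: {vol_211:.3f}")
print(f"ratio: {vol_22 / vol_211:.2f}  (sqrt(211/22) = {np.sqrt(211 / 22):.2f})")
```

With independent household noise, the relative volatility of the average shrinks roughly with the square root of the number of households, so the 22-household aggregate is about three times noisier than the 211-household one. Real households are correlated through weather and daily routines, so the actual reduction would likely be smaller.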
The systematic identification and selection of supporting data were valuable for certain neural network types, such as the RNN and FFN. As described in the literature, LSTM and GRU networks are specialized for long data sequences due to their ability to remember patterns, which could explain why they rely less on the CSTEP variables. The results of this study were not very encouraging in this respect, because the systematic identification process did not affect the performance metrics as significantly as expected. However, the process gave a better understanding of the complex electricity load forecasting problem. The data sets used for this study were cleaned and filtered to consist of households without DERs and to contain only the households' pure electricity consumption, meaning no electricity-based heating installations. The results indicate that the sample size of the aggregation plays a large part in the forecasting accuracy.
Moreover, the expectation that different consumption patterns, such as those of households with heat pumps or electric heating, would be harder to predict was not supported, because the adjusted R² was close to equal for both load patterns. However, this study achieved excellent results in forecasting the electricity load for the next hour and the next 24 h, as underlined by the low errors and high adjusted R² scores. Furthermore, visualizing the predictions showed that the models could get very close to the actual load.

Conclusion
The purpose of the study was to test a systematic data identification and selection process to forecast the aggregated electricity load of two Danish residential areas. In the literature, the data selection process has often relied on correlation analysis of the supporting data. This paper added an initial step, using the CSTEP framework to build a robust data foundation for forecasting. Forecasting with neural networks is a major research field, and this paper tested and compared different types of neural networks from the literature. The research has shown that the systematic identification of variables has potential but does not substantially affect the models' performance metrics. However, the process did give a greater understanding of the target variable, which can help curate better data in the future. The results of testing multiple neural networks indicate that choosing the optimal architecture is not as impactful as having good data inputs. The findings of this study will be of interest to researchers who seek to make their data processing and analysis more systematic by applying the CSTEP framework.
Moreover, the findings underline the importance of curated data for researchers and the industry, e.g., DSOs. The main limitation of this study is the data availability for the target variable and some of the supporting data. The target variable had a small sample size for electric-based heating households, which meant that the original data set had to be sampled down to an equal size. Larger sample sizes would give a clearer answer regarding the differences between load patterns. Furthermore, the CSTEP variables were limited by the sources of external data, such as weather data. A majority of researchers use weather data in their forecasting models, but for this research, it was not feasible due to the location of weather stations. Finally, the results of this study are based on electricity consumption from Danish residential areas, meaning they are not directly generalizable to all parts of the world.
Despite these limitations, the study shows the models' flexibility across different consumption patterns, multiple types of independent variables, and forecasting horizons from 1 h to 24 h. Further research should be conducted using the CSTEP framework to systematically identify independent variables in order to better assess the method's impact on the forecasting problem. Furthermore, the findings suggest that better performance metrics are needed to compare the predictions of neural networks, as the intricacies could only be seen by visually inspecting the forecasts. Future work is planned to test the forecasting ability further with a more complex selection of models and more complicated data sets. Finally, to address the limitations of this study, a larger sample of residential houses should be used.

Fig. 4 Average daily load profile of the aggregated load

Table 1 Systematically identified CSTEP variables

Table 2 Baseline performance

Table 3 Baseline neural networks

Table 5 Results of the hyperparameter tuning

Table 6 One-hour forecast metrics

Table 8 Comparison metrics with electric heating load