Abstracts from the 9th DACH+ Conference on Energy Informatics

Sierre, Switzerland, 29-30 October 2020
Published: 30 October 2020

Introduction
Welcome message from the organizers
René Schumann 1*, Michael Brand 2 and Khoa Nguyen 1
1 HES-SO Valais Wallis, SiLab, Rue de Technopôle 3, 3960 Sierre, Switzerland; 2 OFFIS Institut für Informatik, Escherweg 2, 26121 Oldenburg, Germany.
Correspondence: René Schumann (rene.schumann@hevs.ch)
Energy Informatics 2020, 3(Suppl 2):Introduction

Dear readers,
In this supplement to the proceedings of the DACH+ Energy Informatics 2020, we present the poster abstracts, including 10 from the main conference submissions and 6 from the co-located Energy Informatics Doctoral Workshop. It is important to provide a space for debate about the most recent developments in the field and their transfer to the real world, so that they can make an impact on our community. We are therefore happy to share with you, the reader, the most recent developments that we could discuss at the conference. They give us confidence that in the coming years we will again have interesting and relevant insights to present in future editions of the DACH+ Energy Informatics. The innovative ideas come from a variety of topics within the field, such as network security, electric mobility and load forecasting. We hope you find the poster abstracts informative and inspiring for future collaboration.

Sincerely,
René Schumann, General Chair
Michael Brand, Poster Chair
Khoa Nguyen, Publication Chair


Introduction
Today's energy grids are on the verge of becoming smart. The main difference between the conventional grid widely used today and the smart grid aimed at for the near future is the high level of digitization, which helps to handle more complex tasks efficiently and effectively [1]. Gathering information from diverse components in the smart grid using well-defined communication protocols is imperative. Consequently, several protocols have been proposed, such as IEC 61850, IEC 60870-5-104 and Modbus. A long-lasting blackout can have a severe impact on our daily lives.
Therefore, reliable operation of an energy grid is crucial. However, the increasing digitization of smart grids introduces a new threat: attacks on the digital infrastructure. Thus, information technology security for smart grids is becoming an important factor. This work explores the security of the IEC 61850 protocol standard using a technique called fuzzing. So-called fuzzers provide unexpected (i.e. random) data to a program and monitor its behavior. Depending on the state of the input parser of that program, undesired events such as crashes, failed built-in assertions or memory leaks may be triggered. Such bugs are often used as a starting point for attacks and should be reported and fixed quickly.
There are several studies that deal with fuzzing of the IEC 61850 standard [2,3,4,5,6]. However, to the best of our knowledge, the fuzzing tools of these studies were not published. Therefore, neither we nor other researchers are able to reproduce or build upon the previous results.
In this work, we use fuzzing to reveal further unknown weaknesses in a frequently used implementation of the IEC 61850 protocol standard. The fuzzer we developed will be made publicly available to improve future security analysis of communication protocols used in the digital energy sector. In addition, we provide a description of our methodology to facilitate further experimentation.

Methodology
The generative fuzzing approach we propose is applied to an existing open-source library of the IEC 61850 standard [7]. This library offers implementations of a Manufacturing Message Specification (MMS) server as well as Generic Object Oriented Substation Events (GOOSE) and Sampled Values (SV) subscribers. Since the library is publicly accessible, regularly maintained, and offered with a commercial license including support, we assume that it is used in real hardware and commercial projects. As fuzzing only reveals errors that cause the program to crash or hang, it would not detect internal software errors that occur without resulting in a crash. To find such memory-specific bugs, we compile the library with AddressSanitizer (ASan) [8,9]. ASan is a memory error detector for C/C++ projects. With ASan enabled, the program under test crashes whenever a memory bug is triggered. However, there is a trade-off: ASan slows down the execution of the program, which in turn reduces the fuzzing performance.
Procedure for protocol fuzzing
A protocol fuzzing procedure can be divided into several steps as illustrated in Figure 1. These steps can be used as a simple guideline to lower the entry barrier for other researchers outside the security domain.
Obtain Message Specifications
When fuzzing protocol implementations, the fuzzer must generate packets that comply with the protocol specifications. There are two general ways to determine these specifications: study the official documentation of the protocol, or read and analyze packets using a network protocol analyzer like Wireshark [10]. Wireshark sniffs packets directly from a network interface and displays a detailed analysis of each packet using various built-in dissectors. We use Wireshark to analyze the structure, fields and contents of IEC 61850 packets and to check whether the packets generated by our fuzzer match this format.

Create own packets
Having captured and analyzed packets of the respective protocols, the next step is to create artificial packets. For this task we use Boofuzz, which is a well-known open-source network protocol fuzzing framework [11,12]. It simplifies the creation of a fuzzer by taking care of crash detection, target reset after failure and recording of test data, and it also simplifies the definition of packets. To create messages, we copy the message in hex representation from Wireshark as an escaped string and paste it into the fuzzing script. Afterwards, the message is divided into its individual fields at any desired level of detail. Dividing the message into blocks has the advantage that the code is more legible and that single blocks can be addressed directly by their corresponding name. This allows, for example, calculating a checksum for a certain block if the protocol expects one.
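As an illustration of dividing a captured message into named fields, the following minimal sketch splits an Ethernet frame into destination, source, EtherType and payload using only the Python standard library. It does not use Boofuzz; the frame bytes and the helper name split_ethernet_frame are hypothetical and chosen for illustration only. The EtherType 0x88B8 is the value registered for GOOSE.

```python
import struct

# Hypothetical frame, written as an escaped string in the way it could be
# pasted from a packet analyzer. It is not taken from a real capture.
RAW_FRAME = (
    b"\x01\x0c\xcd\x01\x00\x01"          # destination MAC (6 bytes)
    b"\x00\x11\x22\x33\x44\x55"          # source MAC (6 bytes)
    b"\x88\xb8"                          # EtherType (0x88B8: GOOSE)
    b"\x00\x01\x00\x0c\x00\x00\x00\x00"  # remaining bytes, kept as opaque payload
)


def split_ethernet_frame(frame: bytes) -> dict:
    """Split a raw Ethernet frame into named fields."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    destination, source, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "destination": destination,
        "source": source,
        "ethertype": ethertype,
        "payload": frame[14:],
    }


fields = split_ethernet_frame(RAW_FRAME)
print(hex(fields["ethertype"]), len(fields["payload"]))
```

Once the fields are named, a fuzzing script can leave the header fields untouched and mutate only the payload, or subdivide the payload further in the same way.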
Decide what should be mutated
To realize the mutation we use the s_random function of Boofuzz. This function creates a random data chunk using a byte-wise mutation while keeping a copy of the original data [12]. The function parameter num_mutations specifies the number of mutations. In addition, a random length range can be specified by setting min_length and max_length. The decision as to which fields of the message should be mutated is left to the researcher. In this work, we mutate almost every field, set the random length range to 0-100 and set the number of mutations to 100000 per data chunk. Only essential attributes like destination, source or ethertype remain unchanged.
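To make the effect of these parameters concrete, the sketch below imitates the described behavior in plain Python: it produces a given number of random byte strings whose length lies in a given range, while the original data is kept unchanged. This is a stand-in written for illustration, not Boofuzz's implementation; the class name RandomField and the seed parameter are assumptions, and only the parameter names min_length, max_length and num_mutations mirror those mentioned in the text.

```python
import random
from dataclasses import dataclass


@dataclass
class RandomField:
    """Illustrative stand-in for a randomly mutated message field."""
    original: bytes
    min_length: int = 0
    max_length: int = 100
    num_mutations: int = 100_000
    seed: int = 0  # assumption: fixed seed so that a run can be repeated

    def mutations(self):
        """Yield num_mutations random chunks; self.original is never modified."""
        rng = random.Random(self.seed)
        for _ in range(self.num_mutations):
            length = rng.randint(self.min_length, self.max_length)
            yield bytes(rng.randrange(256) for _ in range(length))


# Hypothetical field content; a smaller mutation count keeps the demo fast.
example_field = RandomField(original=b"\x00\x01\x00\x0c", num_mutations=1000)
chunks = list(example_field.mutations())
print(len(chunks), min(map(len, chunks)), max(map(len, chunks)))
```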
Start fuzzing
Before starting the actual fuzzing, it has to be ensured that the program being fuzzed is observed and checked for crashes. For this purpose, Boofuzz offers a process monitor which must be started before the fuzzing process. We analyzed all three implemented sub-protocols and fuzzed the MMS server as well as the GOOSE and SV subscribers. The fuzzing processes terminated when the number of defined mutations was reached.
Found Crash?
To determine whether a crash was found, the output of the fuzzer or process monitor can be examined and searched for messages that report a crash. Alternatively, the fuzzing logs stored in SQLite databases created by Boofuzz can be used. If the fuzzing does not lead to any crashes, it should be double-checked whether the packets sent by the fuzzer correspond to the required format. In addition, it is advisable to mutate other fields or to create additional types of messages.
Analyze Crash
Once a crash has been found, it has to be analyzed manually. Not every crash corresponds to a new software bug, because one bug is often triggered multiple times during fuzzing. To determine whether the crash is unique and what caused it, the source code must be analyzed. A first aspect that should be checked is whether the crash can be reproduced without fuzzing. If this is the case, the next check is whether it occurs without ASan. This would indicate that the program crashes with a standard compilation when it receives the input and that the fuzzer would have detected it without further help. To understand and possibly fix the bug, a deeper analysis of the involved code is necessary. A first indication of the source of the error is which part of the packet has been mutated. An additional starting point, if available, is the crash report of ASan. It traces the crash through all instrumented files and also gives the line number of the file where the error is located. At this point, it is advisable to step through the code with a debugger to understand the exact sequence that causes the error.

Report Bug
If a crash has been found and a security researcher is able to reproduce and explain the bug, it should be reported to the software developer or maintainer. This ensures that the bug can be fixed before attackers can find and exploit it.
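Because one bug is often triggered many times, a first pass over the crash reports can be automated before the manual analysis. The sketch below groups ASan-style reports by the function names in their top stack frames, so that each group is a candidate for one unique bug. The report texts, function names and file paths are hypothetical, and bucketing by the top two frames is an assumption made for illustration rather than the procedure used in this work.

```python
import re
from collections import defaultdict

# Matches stack frame lines such as "    #0 0x4f1a2b in func /path/file.c:120:9".
FRAME = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s+(\S+)", re.MULTILINE)

# Hypothetical reports for illustration only.
REPORT_A = """==101==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
    #0 0x4f1a2b in parse_field /src/decoder.c:120:9
    #1 0x4f2c3d in handle_message /src/server.c:88:5
"""
REPORT_B = """==202==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
    #0 0x7a1a2b in parse_field /src/decoder.c:120:9
    #1 0x7a2c3d in handle_message /src/server.c:88:5
"""
REPORT_C = """==303==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
    #0 0x4e0001 in read_length /src/subscriber.c:42:3
    #1 0x4e0102 in on_frame /src/subscriber.c:77:5
"""


def crash_signature(report: str, depth: int = 2) -> tuple:
    """Use the function names of the top `depth` frames as a crash signature."""
    frames = FRAME.findall(report)
    return tuple(function for function, _location in frames[:depth])


def bucket_crashes(reports):
    """Map each signature to the indices of the reports that share it."""
    buckets = defaultdict(list)
    for index, report in enumerate(reports):
        buckets[crash_signature(report)].append(index)
    return dict(buckets)


buckets = bucket_crashes([REPORT_A, REPORT_B, REPORT_C])
for signature, indices in buckets.items():
    print(signature, indices)
```

Reports that share a signature still need to be confirmed by hand, since different bugs can end in the same frames.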

Results and Future Work
Using the simple process illustrated in this work, we fuzzed an open-source project implementing the IEC 61850 standard. We found crashes that could be traced back to four different errors: one in the MMS server, one in the GOOSE subscriber and two in the SV subscriber. All four errors could be reproduced without ASan and led to segmentation faults in the form of illegal memory read accesses, where the program tries to read from the zero page. They could be exploited by an attacker to conduct a Denial-of-Service attack and should therefore be fixed urgently. All bugs found during our research were reported to the project maintainer via GitHub issues and will hopefully be fixed in the near future.

Future Work
In addition to the server/subscriber implementations of the protocols used in the IEC 61850 standard, we plan to analyze the client/publisher side in future work. Furthermore, we want to extend the spectrum of protocols and analyze, for example, the IEC 60870-5-101 and IEC 60870-5-104 protocols. Since there are only a few open-source implementations of such protocols, we plan to apply fuzzing to real hardware. This would ensure an analysis of software that is used in a real environment.
Another goal for the future is to work towards a smarter fuzzer, which could possibly take into account a feedback based on code coverage.

Summary
Our research investigates the use of multi-agent reinforcement learning (RL) based electric vehicle (EV) charging strategies to improve the photovoltaic (PV) energy self consumption share of a small energy community. Our RL agents were faced with the task of balancing local energy demand and supply through their minute-by-minute charging set point decisions. To test the two versions of our algorithm, we simulated a year's worth of energy community activity with high-fidelity stochastic models of residential energy consumption and EV usage habits. A local PV generation source was also modeled using real PV measurement data. The results suggest that RL methods can improve an energy community's self consumption share relative to a "business as usual" charging strategy. The version of our algorithm that was only permitted to perform charging actions improved the self consumption share by 6.4 percentage points, while our charging-discharging algorithm improved it by 16.7 percentage points.
Keywords: electric vehicle; charging; expected SARSA; reinforcement learning

Introduction and Related Work
A self sufficient local energy community (micro grid) benefits from elevated energy security [1], autonomy in deciding the generation source of its energy, and an economic alternative to purchasing energy from the local distribution company [2]. Nonetheless, balancing local generation and demand is challenging in an energy community that contains non-controllable energy generation, e.g. photovoltaic power sources. A promising way to improve the flexibility of local energy dispatch is to use the energy community's EVs as storage devices that consume intermittent surpluses of local renewable generation and re-inject this energy at a later time when the community needs it [3].
We are currently studying the application of RL-based methods to this distributed, EV-enabled load balancing problem by investigating the performance of RL-based controllers that decide EV charging/discharging power flow set-points on a minute time scale. In recent years, RL-based approaches to EV charging have begun to be investigated. However, most works either use a very coarse time resolution for the decision-making process, i.e. on the order of hours or days [4,5,6,7], or rely on tabular methods that have inherently low scalability [8,9,10,11]. Shin et al. [12] are perhaps the first to attempt using RL to control the charging/discharging actions of multiple battery storage system agents on a minute-resolution time scale. The problem Shin et al. are trying to solve is therefore similar to ours. However, our agents attempt to leverage the intermittently available energy storage potential of EVs while they are parked for a charging session. Also, our approach relies on less information exchange between the agents present in the system; the simplest version of our algorithm does not require any communication beyond the use of smart meters. Moreover, we use a simpler action value approximation approach, and leverage action preference functions in our policy to inject hard operation constraints into the decision-making process of the RL agent.

System Model and Problem Statement
The energy community model we use for our experiment consists of five apartments, four electric vehicles, and one communal PV plant. The consumption of both the non-controllable loads of the apartment dwellings and the controllable loads of the EVs is influenced by realistic behaviour models of each apartment's tenants (see [13,14] for model details). Moreover, a local PV generation source based on the 2013 generation output of a real PV plant in Freiburg is included in this energy community; it is assumed that the cost of consuming its energy is significantly lower than that of consuming energy from the power grid. The energy consumers also have access to an external power grid. Two different scenarios for our energy community are considered, which differ in the availability of smart grid technology. In the first scenario, the infrastructure prohibits EVs from discharging energy into the community. In the second scenario, we postulate infrastructure capable of facilitating EV energy discharging. The self consumption share of an energy community is the percentage of its consumed energy that is sourced from its own generation, relative to its overall consumption, which includes both local and grid-sourced energy. The only controllable parameters that can influence the self consumption share in the modeled energy community are the charging/discharging set point values of each EV chosen at each time step. Thus, we can formulate our problem as the maximization of the average self-consumption share experienced over some period of time via the optimization of EV charging/discharging set-points.
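The definition of the self consumption share given above can be written as a short calculation. The sketch below is a minimal illustration under the assumption that locally sourced and grid-sourced consumption are available as per-time-step energy values; the function name and the sample numbers are hypothetical and are not taken from the simulation.

```python
def self_consumption_share(local_kwh, grid_kwh):
    """Percentage of total consumption covered by local generation."""
    local = sum(local_kwh)
    grid = sum(grid_kwh)
    total = local + grid
    if total == 0:
        return 0.0  # assumption: report 0 % when nothing was consumed
    return 100.0 * local / total


# Hypothetical per-time-step energy values in kWh.
local_series = [1.0, 2.0]
grid_series = [3.0, 2.0]
share = self_consumption_share(local_series, grid_series)
print(f"{share:.1f} %")
```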

Reinforcement Learning Enabled EV (Dis)Charging
The EV charging agents must maximize the energy community's self consumption share while meeting the charging requirements of EVs before they depart. Every minute, the EV charger must decide a new rate of charge (i.e. the charging "set point") for the EV in a manner that provides a feasible solution to this problem. From a Markov Decision Process (MDP) perspective, the environment is the EV and power system, while the agent is the charging controller. The agent's action space consists of the finite set of charging set point decisions that can be made at each time step. The state of the environment is characterized by four state dimensions: the time remaining in the charging session, past PV consumption, current EV energy demand, and the time of day. The reward is the net amount of local energy that the agent consumes as a consequence of the past minute's charging set point decision. Thus, if the agent consumes more grid energy than local energy, it receives a negative reward, and vice versa.
The agent uses Expected SARSA as its RL method, which is an off-policy, model-free, temporal difference (TD) based method [15]. To handle the moderately large dimensionality and continuous state variables present in this MDP, the action values for a state are approximated by a four-dimensional tile coder. Tile coders are a computationally efficient means of representing continuous states as a binary vector with a size equal to the number of "tilings" times the number of "tiles" per tiling used to partition a continuous state space [15].
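The combination of tile coding and Expected SARSA can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the class and function names, the uniform tiling offsets, the state ranges, the step size and the discount factor are all assumptions. The update follows the usual Expected SARSA form, in which the target is the reward plus the discounted expectation of the next state's action values under the policy.

```python
import math


class TileCoder:
    """Uniformly offset tilings over a bounded continuous state space."""

    def __init__(self, lows, highs, tiles_per_dim, num_tilings):
        self.lows = list(lows)
        self.highs = list(highs)
        self.tiles_per_dim = tiles_per_dim
        self.num_tilings = num_tilings
        # One extra tile per dimension so that shifted tilings cover the range.
        self.tiles_per_tiling = (tiles_per_dim + 1) ** len(self.lows)
        # Length of the binary feature vector: tilings times tiles per tiling.
        self.size = num_tilings * self.tiles_per_tiling

    def active(self, state):
        """Return the index of the single active tile in each tiling."""
        indices = []
        for tiling in range(self.num_tilings):
            offset = tiling / self.num_tilings
            flat = 0
            for x, low, high in zip(state, self.lows, self.highs):
                scaled = (x - low) / (high - low) * self.tiles_per_dim
                coord = min(max(int(math.floor(scaled + offset)), 0), self.tiles_per_dim)
                flat = flat * (self.tiles_per_dim + 1) + coord
            indices.append(tiling * self.tiles_per_tiling + flat)
        return indices


def q_value(weights, active, action):
    """Linear action value: sum of the weights of the active tiles."""
    return sum(weights[action][i] for i in active)


def expected_sarsa_update(weights, coder, state, action, reward, next_state,
                          next_policy, alpha, gamma, terminal=False):
    """Apply one Expected SARSA update in place and return the TD error."""
    active = coder.active(state)
    q_sa = q_value(weights, active, action)
    target = reward
    if not terminal:
        next_active = coder.active(next_state)
        target += gamma * sum(p * q_value(weights, next_active, a)
                              for a, p in enumerate(next_policy))
    td_error = target - q_sa
    for i in active:
        weights[action][i] += (alpha / coder.num_tilings) * td_error
    return td_error


# Hypothetical ranges for: minutes remaining, past PV consumption,
# EV energy demand, and minute of the day.
coder = TileCoder(lows=[0, 0, 0, 0], highs=[600, 10, 50, 1440],
                  tiles_per_dim=4, num_tilings=8)
weights = [[0.0] * coder.size for _ in range(3)]  # three set point actions
state, next_state = [120, 2.5, 10, 600], [119, 2.6, 9.9, 601]
td = expected_sarsa_update(weights, coder, state, 1, 1.0, next_state,
                           [0.2, 0.5, 0.3], alpha=0.5, gamma=0.9)
print(td, q_value(weights, coder.active(state), 1))
```

Dividing the step size by the number of tilings keeps the effective learning rate independent of how many tilings are used.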
To avoid needing to design a reward that can incentivize the agent to meet the charging needs of the EV, an action preference function [15] was designed to limit actions that are guaranteed to result in a failed charging objective given the current state. Moreover, to enforce the operational constraints of the battery, the action preference function was also designed to restrict actions that would either saturate the battery when its state of charge (SOC) is high, or reduce the SOC to a critically low level. We have defined the policy of our agent (the probability distribution used by the agent to decide which action to take) using a soft-max function that transforms action preference values into a valid probability distribution.
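One way to combine hard constraints with a soft-max policy is sketched below. It is an illustration under stated assumptions, not the preference function used in this work: disallowed actions are simply given zero probability, which is equivalent to assigning them a preference of minus infinity, and the function name and example values are hypothetical.

```python
import math


def masked_softmax(preferences, allowed):
    """Soft-max over action preferences; disallowed actions get probability 0."""
    if not any(allowed):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed preference for numerical stability.
    top = max(p for p, ok in zip(preferences, allowed) if ok)
    exps = [math.exp(p - top) if ok else 0.0 for p, ok in zip(preferences, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical case: three charging set points, the highest one is ruled out
# because it would saturate the battery at the current SOC.
probabilities = masked_softmax([1.0, 1.0, 5.0], [True, True, False])
print(probabilities)
```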

Results
For both smart grid scenarios, we trained EV charging agents over one year's worth of simulated energy community activity, and then tested the agents' ability to operate over a year's worth of previously unseen activity. For the charge-only scenario, the self consumption share of the energy community increased from 28.4% to 34.8%. When considering the EV activity only, their collective self consumption share increased from 21.5% to 43.0%. For the charging-discharging experiment, the community self consumption share increased from 28.4% to 45.1% when the RL algorithm was used instead of the baseline "business as usual" approach. As with the charge-only experiment, this figure of merit was calculated over the periods of the simulated activity in which at least one EV agent was connected to the micro grid. Interestingly, the self consumption share of the EVs alone increased only from 21.5% to 37.1%, which is less than in the charge-only experiment despite the overall self consumption share of the charging-discharging experiment being higher. This is caused by EVs discharging energy into the non-controllable loads, which increases the self consumption share of those loads at the expense of the EVs having to charge more often to meet their own needs. It appears that the resulting increase in EV demand reduced the EVs' self consumption share, but the overall increase in PV consumption and decrease in grid consumption in the energy community made up for it. Currently, neither the reward function nor the action preference function of our algorithm enforces grid operation constraints, such as limits on rapid charging/discharging cycles and demand fluctuations. Nonetheless, now that promising self-consumption share metrics have been achieved, we can focus on incorporating these technical constraints into our algorithm in future work.

Summary
To ensure a safe energy supply with fluctuating renewable energies, large storage systems for all sectors (electricity, transportation, and heating) are essential. In this work, a possible sector-coupled long-term storage system for Germany as a whole is modeled on an abstract level. The energy demand of the transportation sector is calculated in three scenarios, considering propulsion systems that run mainly on batteries, hydrogen, or synthetic fuels. The need for storage systems is calculated by an optimized operation with scaled historical data of PV and wind power feed-in and energy demand for the same period (2015-2017). As a result of the optimization, the most efficient and economic scenario is the one with a focus on battery-electric transport, which also leads to a large capacity of second-life batteries. Hydrogen and methane are by far the largest storages in all scenarios.
Keywords: transportation; electric mobility; storage; hydrogen; sector coupling; power-to-X; second-life batteries; energy system analysis

Introduction
Currently, nearly the whole energy demand for transportation is met by fossil fuels. In order to shift the whole energy system towards renewable energies, all means of transport need to be powered by renewable energies as well. To achieve that, three propulsion technologies are mainly discussed: battery-electric vehicles (BEV), electric propulsion powered by fuel cells (FC), and conventional internal combustion engines (ICE) running on synthetic fuels. Each of those options has essentially two impacts on the energy system. Firstly, because of different efficiencies, the total energy demand for transportation changes significantly. Secondly, the storage medium (batteries, hydrogen or methane) can be used to balance fluctuating renewable feed-in. There are many scientific estimates of what a renewable energy system could look like in the future, based on assumptions about efficiency improvements and the cost development of storage technologies.
For example, [1] models a possible energy system in the year 2050 in detail, but the authors assume that 30 % of the transportation sector is run by BEV in 2050 and do not discuss it further. [2] discusses different propulsion technologies for 2040, including assumed efficiency improvements and energy imports, but does not examine the impacts on the storage system. In this work, we did not try to make a forecast, but examined the variation of propulsion technologies given today's end energy demand of Germany, which is completely met by renewables in our model.

Energy demand
Electricity demand today (excluding electricity used for residential heating and transportation) makes up only 19 % of the end energy demand of Germany, which had an average annual value of 2495.4 TWh in the years 2015-2017 [3]. Heating accounts for the largest share, 51 %, followed by the transportation sector with 30 % of the energy demand. We assume that most of the thermal energy demand can be stored in thermal storage systems, e.g. hot water tanks.
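The absolute demand per sector follows directly from these shares. The short calculation below is a worked example of that arithmetic; the dictionary keys are labels chosen for illustration and the rounding to one decimal place is an assumption.

```python
total_twh = 2495.4  # average annual end energy demand, 2015-2017
shares = {"electricity": 0.19, "heating": 0.51, "transportation": 0.30}

# Absolute demand per sector in TWh, rounded to one decimal place.
demand_twh = {sector: round(total_twh * share, 1) for sector, share in shares.items()}

for sector, value in demand_twh.items():
    print(f"{sector}: {value} TWh")
```

This gives roughly 474 TWh for electricity, 1273 TWh for heating and 749 TWh for transportation.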
In the model, we distinguish between low-temperature demand, which is warm water and space heating (except district heating) that can also be provided by heat pumps, and high-temperature demand, which is district heating and industrial process heat. The power demand for industrial heat is assumed to be constant, while residential low- and high-temperature heat demand depends on the daily temperature. Therefore, daily temperature data [4] for the examined years is used together with the standard load profile (P0) to generate the residential load profile. The electricity demand and the PV and wind supply are taken from [5], with the latter scaled to meet the total energy demand.
To calculate the end energy demand for transportation, current energy demand for transportation is changed by efficiencies of fuel production found in [6] and [7], including hydrogen compression or liquefaction where necessary. Based on [8] and [7], energy demand of different modes of transport is reduced by efficiency factors compared to ICEs today. We set up three scenarios with different shares of propulsion technologies, based on discussions about the potential of each technology. For each scenario, the total energy demand for the transportation sector can be calculated. Furthermore, the total capacity of all BEVs is calculated based on assumed capacities of each vehicle. According to [9], it is assumed that 80 % of that capacity can be used as second-life batteries.
In the model, the electrical power demand for charging battery-electric cars follows a standardized 15-minute load curve that takes the weekday into account, supplied by the institute KIT-IEH. Dynamic charging is not considered here. The time-dependent fuel demand and the charging power for all other BEVs are assumed to be distributed equally over the year.

Storage and conversion technologies
Storage and conversion technologies are modeled by their currently stored energy e_t and their current power p_t at a certain time-step t.
Second-life batteries, pumped storage hydro power, hydrogen (e_H2), methane, and low- and high-temperature heat storages are taken into account as storage technologies. To convert energy between those storages, battery periphery, pumps/turbines, electrolysis (p_Ely), methanation, fuel cells (p_FC), combined heat and power plants (CHPP), heat pumps and heating resistors for low-temperature (RLT) and high-temperature (RHT) applications are considered. To describe the relationship, the efficiencies of all conversion technologies (η_Ely, η_FC), the coefficient of performance of heat pumps, and the cooling losses of thermal storages are needed. Except for batteries and pumped hydro storage, all storages are also reduced by the current demand for fuels (d_H2trans) and heating applications. Equation 1 shows the modeling of all storages, using hydrogen as an example.

Optimization
The optimization problem minimizes the quadratic investment costs for the storage and conversion technologies in all time-steps. The quadratic form is chosen to avoid high peaks of power and capacities over time, since actually only the extreme values have to be minimized.
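Neither the objective function nor Equation 1 is reproduced in this text. As a hedged sketch only, a quadratic objective with a Hessian H and a linear cost vector f is conventionally written in the standard quadratic-programming form below, and a hydrogen storage balance consistent with the symbols introduced above might look like the second line. The time-step length \Delta t, the convention that p_FC denotes the fuel cell's output power, and the omission of other hydrogen consumers such as methanation are assumptions, not statements from the source.

```latex
% Assumed standard quadratic-programming form of the objective
\min_{x} \; \tfrac{1}{2}\, x^{\top} H x + f^{\top} x

% Assumed form of a hydrogen storage balance over one time-step
e_{H2,t+1} = e_{H2,t}
  + \left( \eta_{Ely}\, p_{Ely,t} - \frac{p_{FC,t}}{\eta_{FC}} - d_{H2trans,t} \right) \Delta t
```

In such a formulation, the storage balances would enter the problem as equality constraints linking consecutive time-steps.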
The vector x contains the stored energy and conversion power for all time-steps. The Hessian H is a sparse diagonal matrix that contains the investment prices per MWh of every storage technology and the investment prices per MW of every conversion technology. The vector f contains the loss of every conversion technology, calculated as (1 − η). The value of the objective function is not of interest and should not be interpreted as the cost of the storage system. The goal of the optimization is only to find a cost- and energy-efficient usage of the storage system in order to determine the demand for the investigated technologies. Energy in storages can become negative in the optimization, since we do not know their starting levels and set them to zero. After the optimization, the difference between the highest and the lowest storage state is the storage capacity that would be necessary in the examined years. Pumped storage is limited to 10 GW/60 GWh due to ecological limits, and heat pumps and RLT are limited to 44 GW to respect distribution grid limits; this limit is fully used in all scenarios.
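The way the required capacity is read off the optimized storage trajectory can be illustrated with a few numbers. The sketch below assumes a series of net inflows per time-step and a starting level of zero, as described above; the values are hypothetical and the function name is chosen for illustration.

```python
from itertools import accumulate


def used_capacity(levels):
    """Required capacity: spread between the highest and lowest storage state."""
    return max(levels) - min(levels)


# Hypothetical net inflow per time-step in MWh (positive: charging).
net_inflow_mwh = [4.0, -6.0, -1.0, 8.0, -2.0]

# The starting level is unknown and set to zero, so levels may become negative.
levels = [0.0] + list(accumulate(net_inflow_mwh))
capacity = used_capacity(levels)
print(levels, capacity)
```

Because only the spread matters, shifting the whole trajectory by an unknown starting level does not change the result.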

Results and Conclusion
It can be seen that the energy storage system becomes very large in the given approach, which assumes neither efficiency improvements nor energy imports. That distinguishes this work from the studies mentioned in the introduction. Regarding the transportation sector, energy demand in Scenario 1 is by far the lowest. Storage demand is not significantly higher than in the other scenarios, so the usage of chemical fuels does not seem to lower the demand for stationary storage. That can be explained by the higher energy demand. On the other hand, in Scenario 1, the availability of second-life batteries could be higher than the demand for stationary battery storages if the given assumptions turn out to be correct. Therefore, battery-electric transport is likely to have the most beneficial impact on a sector-coupled energy storage system.

Funding
Publication costs were covered by the DACH+ Energy Informatics Conference Organizers, supported by the Swiss Federal Office of Energy.

Availability of data and materials
Sources of data and further materials are available in [10].

Authors' contributions
This work is based on the master's thesis of Tobias Riedel, which was supervised by Martin Zimmerlin.

P4
Design of an
Summary
Activities of daily living (ADL) are activities that individuals perform on a daily basis and that are necessary for independent living at home. ADLs are often used as a reliable indicator of the health of a person, but manual assessment of ADLs is time-consuming and labor-intensive. For this reason, the field of automatic ADL detection has seen an increase in popularity in recent years. Here, we report on an ultra-low power sensor platform that we developed for ADL detection. We performed field trials in the residential setting to validate the sensor system and translated the knowledge to the domain of office buildings to enable user-centric building control.
To that end, we tested the capability of the sensor platform to estimate the number of people present during meetings. The results show that our sensor platform is able to estimate the number of people with a mean absolute error of 1.3.
Keywords: activities of daily living; ADL; sensor platform; user centric control

Introduction
Activities of daily living (ADL) refer to all activities related to self-care and independent living of an individual. Since the first publication of a standardized assessment protocol for ADLs by Katz et al. [1] in 1963, many health professionals have used ADLs to assess a person's ability to care for themselves. Since then, many more scales have emerged. They have found application in general geriatric assessments, dementia, stroke, developmental disorders and rehabilitation, and provide a reliable indicator of a person's health [2,3]. Celler et al. [4] introduced one of the first systems for tele-monitoring of ADLs in 1995. Since then, the field of automatic detection of ADLs has gained significant traction. One reason is simply that the manual assessment of ADLs is time-consuming and labor-intensive. Another reason for this trend lies in the advances in and miniaturization of sensors and the emergence of IoT [5,6]. A major benefit of automatic ADL detection systems is their ability to monitor constantly. Tracking a patient's behaviour patterns over long periods of time increases the chance of early detection of emergency situations [7]. A literature review by Peetoom et al. [8] showed that most systems for ADL detection use simple passive infrared (PIR) motion sensors to measure activity levels at different locations. The underlying assumption is that there is a simple mapping from room activity to ADL. For example, PIR motion activity in the kitchen is mapped to cooking, and activity in the bathroom to showering, bathing or personal hygiene.
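The room-to-activity mapping described above can be expressed as a simple lookup. The sketch below only restates the two examples from the text; the event format, the function name and the sample events are hypothetical and do not describe any particular system from the reviewed literature.

```python
from collections import Counter

# Mapping taken from the two examples in the text.
ROOM_TO_ADLS = {
    "kitchen": ["cooking"],
    "bathroom": ["showering", "bathing", "personal hygiene"],
}


def summarize_motion(events):
    """Count motion events per room and attach the candidate ADLs."""
    counts = Counter(room for room, motion in events if motion)
    return {
        room: {"motion_events": n, "candidate_adls": ROOM_TO_ADLS.get(room, [])}
        for room, n in counts.items()
    }


# Hypothetical PIR readings as (room, motion detected) pairs.
events = [("kitchen", True), ("kitchen", True), ("bathroom", True), ("hallway", False)]
summary = summarize_motion(events)
print(summary)
```

The bathroom entry shows the limitation of such a mapping: motion alone cannot distinguish between the three candidate activities.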
In our project we translate the knowledge in the field of ADL detection to the commercial sector. We make the following contributions: 1) We built an ultra-low power sensor platform to measure the most important physical variables related to the most important activities. 2) We tested the reliability of the system in residential and office environments. 3) We built a machine learning model to estimate the number of people present in meeting rooms.

Sensor platform
In order to decide which ADLs provide the most relevant information, we analyzed the frequency of occurrence of activities in the most used ADL assessment scales. In total, we analyzed 16 scales (Katz ADL Index, Barthel, DS, Lawton IADL, Lawton PSMS, RLT, FAQ, FIM, DAFS, NOSGER, CSADL, FAM, BADLS, ADCS-ADL, DAD and W-ADL). Based on this evaluation, we decided to focus on the activities cooking, eating, toileting and showering because those are among the most frequent. Additionally, we decided to include sleeping because recent research suggests that sleep is a good predictor for cognitive impairment [9,10,11]. For the office environment, we decided to focus on the activities meeting, opening / closing of windows, use of electrical appliances and desk work because they are most closely connected to Heating, Ventilation and Air Conditioning (HVAC) systems. We then analyzed which physical parameters we have to measure to detect those activities. Table 1 provides the resulting mapping between physical parameters and activities. The final sensor platform is shown in Figure 1. It includes 9 ambient sensors which measure temperature, humidity, light intensity, volatile organic compounds (VOC), sound pressure, 3D acceleration, magnetic field strength, motion and distance (via time of flight). Additionally, the sensor platform includes a multipurpose IO connector which provides standard communication interfaces such as I2C and SPI as well as GPIOs, ADCs and power. The connector makes the sensor platform expandable with additional, highly specialized daughterboards. One such daughterboard was built to estimate the power consumption of appliances by measuring the residual magnetic AC field at the surface of power cords.
As the main goal of our sensor platform is to provide an easy and reliable tool for detecting and tracking daily activities in the residential and office environment, we optimized the sensor platform according to the following constraints:
Power consumption: The sensor platform was optimized for ultra-low power. The mean current consumption was measured to be around 90 μA, which provides a battery life of about 2 years.
Unobtrusiveness: The sensor platform was designed to be small, lightweight and unobtrusive. The final dimensions are 80 x 40 x 25 mm.
Simplicity: The sensor platform uses Bluetooth 5.0 (Low Energy) to transmit sensor data to a base station. We use non-connectable advertisement packets to simplify the setup process. No pairing is required and multiple sensors can be installed in a short amount of time.
Reliability: Sensor data are transmitted via Bluetooth 5.0 (Low Energy) with a repetition rate of 4 every 30 s. Events from the PIR sensor and the accelerometer are transmitted immediately when they occur.
Security: Sensor data are encrypted via 128-bit AES before transmission.
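The stated battery life can be cross-checked with a short calculation. The sketch below is only a plausibility check: it assumes a constant average draw, ignores self-discharge and converter losses, and the function name is an illustrative choice. The abstract does not state which battery is used.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, leap years ignored


def implied_capacity_mah(mean_current_ua: float, years: float) -> float:
    """Battery capacity (mAh) needed to sustain a constant mean current.

    Assumption: the mean current is constant over the whole lifetime and
    the battery delivers its full nominal capacity.
    """
    mean_current_ma = mean_current_ua / 1000.0
    return mean_current_ma * HOURS_PER_YEAR * years


if __name__ == "__main__":
    # 90 uA over 2 years, the figures given for the sensor platform
    print(f"{implied_capacity_mah(90.0, 2.0):.1f} mAh")
```

Under these assumptions, a mean current of 90 μA over two years corresponds to roughly 1.6 Ah of usable capacity.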

Validation
To validate the ability to detect ADLs in a residential environment, we installed a set of 13 sensor units in two apartments of two healthy participants. During a period of 6 and 8 weeks, respectively, the participants were instructed to keep a journal of their daily activities. A simple random forest classifier was trained on the dataset using 7-fold cross-validation. The resulting classifier achieved a mean precision and recall over all tested activities of 0.97 and 0.96, respectively.
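The evaluation protocol (7-fold cross-validation, precision and recall per activity) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the helper names and the toy labels are assumptions, and the random forest itself is left out.

```python
def k_fold_indices(n_samples, k=7):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs. Fold sizes
    differ by at most one sample.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def precision_recall(y_true, y_pred, label):
    """Precision and recall for a single activity label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical journal labels and classifier outputs
    journal = ["cook", "sleep", "cook", "eat"]
    predicted = ["cook", "cook", "cook", "eat"]
    print(precision_recall(journal, predicted, "cook"))
    print(len(k_fold_indices(20, k=7)))
```

The mean over all activities reported above would then be the average of these per-label values across labels and folds.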
To validate the sensor platform in an office environment, we conducted another two field trials in two meeting rooms where we analyzed the reliability of the system and its performance in detecting the number of people. Room-1 had a floor space of 3.5 m x 6 m (21 m²) and a height of 3.5 m (73.5 m³) and was used to test the data logging system and to develop machine learning models. Room-2 had a floor space of 4.2 m x 6.4 m (26.9 m²) and a height of 3.5 m (94 m³) and was used to test the transferability of the machine learning models to unseen locations. We compared multiple neural network architectures such as GRU, LSTM and deep separable 1D ConvNets. For model selection, we monitored the mean absolute error (MAE) for the estimated number of people on the validation data. We found good agreement when few people were in the room and high deviation when several people occupied the room. The MAE for room-1 was 0.069 ± 0.045 over all data. The MAE calculated only for time frames with presence amounted to 1.31 ± 0.75. To test the transferability of the model, we used it to predict the number of people using data from room-2, where we obtained an MAE of 1.4 for time frames in which people were present.
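The large gap between the MAE over all data and the MAE over time frames with presence follows from how the metric is averaged: empty frames, which are easy to predict, dominate the first figure. The sketch below illustrates this with made-up counts; the function names and the data are assumptions, not the study's measurements.

```python
def mean_absolute_error(true_counts, predicted_counts):
    """MAE over all time frames."""
    if len(true_counts) != len(predicted_counts) or not true_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts))
    return total / len(true_counts)


def presence_only_mae(true_counts, predicted_counts):
    """MAE restricted to time frames in which at least one person is present."""
    pairs = [(t, p) for t, p in zip(true_counts, predicted_counts) if t > 0]
    if not pairs:
        raise ValueError("no time frames with presence")
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical data: six empty frames, two frames with a meeting
    true_counts = [0, 0, 0, 0, 0, 0, 2, 4]
    predicted = [0, 0, 0, 0, 0, 0, 1, 6]
    print(mean_absolute_error(true_counts, predicted))
    print(presence_only_mae(true_counts, predicted))
```

For building control, the presence-only figure is the more informative one, since it describes the error in exactly the situations where the estimate would be used.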

Conclusion and Outlook
In this poster abstract, we report on the design of an ultra-low power sensor platform for the detection of daily activities in the residential and commercial sectors. The sensor platform has proven to be a reliable tool for collecting sensor data in residential and commercial settings. The developed model for people count estimation suggests some ability to generalize to similar rooms. Nevertheless, the variability in the predictions is high, which limits the applicability of the predictions as an input to building control systems. Further improvements are required.

More and more prosumers will penetrate the power grid. But how do prosumers affect the accuracy of the day-ahead load forecast? In contrast to related research on prosumers and load forecast, this paper addresses the impact of different shares of prosumers on the load forecast for areas with several households. In order to answer this research question, the load forecast accuracy for a dataset without prosumers is compared to that of datasets with different shares of prosumers in an experimental setup using neural networks. A sliding window approach with lagged values of up to seven days is applied. Apart from electricity consumption data, weather and date data are considered. The conducted tests show that the mean absolute percentage error increases from about 8% for a dataset without prosumers to about 39% for a dataset with a share of prosumers of 80%. It can therefore be concluded that prosumers decrease the accuracy of the day-ahead load forecast with neural networks.

Keywords: load forecast; neural network; prosumer; sliding window approach

Prosumers are households which consume their self-produced electricity [1]. A normal household's load is driven by various factors, for example socioeconomic factors such as daily, weekly or yearly rhythms, or physical factors such as the temperature [2].
A prosumer's electricity requirement can in general be assumed to be the same as that of a normal household, given similar behaviour. In addition, however, prosumers produce electricity on their own.¹ A main problem of electricity produced from renewable sources is its intermittency [3]. In general, renewable energies have been regarded as non-controllable and unpredictable electricity sources [4]. This causes additional costs, as operating reserves need to be planned and backup capacity for short-term electricity production needs to be available. For prosumers, the electricity production from renewable sources and the electricity consumption from the grid are linked. They combine the uncertainty of electricity production from renewable energy sources with the uncertainty of household behaviour with respect to electricity consumption. This leads to the hypothesis that it is more difficult to forecast the load for areas with a higher share of prosumers.
¹ The produced energy may be used "internally"; the grid may not see the produced energy.
This paper aims to answer the research question of how much the day-ahead load forecast accuracy with neural networks deteriorates or improves with an increasing share of prosumers. For simplicity, the load is predicted for a period of 24 hours. This paper distinguishes two ways of analysis. On the one hand, load forecast for prosumers can be done with neural networks which have been trained on a dataset without prosumers. This assumes that current load forecasters based on neural networks will continue to be used in the future, when more and more prosumers may appear in the grid. On the other hand, neural networks can be trained specifically for the load forecast for prosumers. This simulation answers the question of whether grids with a higher share of prosumers are in general more difficult to forecast.

Datasets
The non-prosumer dataset was provided by a power utility of a city in Switzerland.² The dataset contains 15-minute measurements of 469 households for the whole year 2015 with an overall consumption of 1'325'267 kWh.³ The prosumer dataset was provided by another power utility of another city in Switzerland.² It contains 15-minute measurements in kWh of the net electricity consumption and production of 146 objects from the year 2017. After a data selection process, 100 objects were left. They have a yearly electricity consumption of 712'330 kWh. Further inputs are the timestamp, consisting of date and time, and the weather, including temperature, global radiation and precipitation, downloaded from IDAweb from MeteoSwiss. Additional inputs are the weekday, calendar week, month and the holidays. The paper follows a sliding window approach as proposed by [5,6,7]. The train label and further input variables are derived from the original dataset. The train label comprises the current kWh value of a point in time and the previous 95 15-minute kWh values in order to predict the kWh values of 24 hours. Further input variables are lagged. The best combinations of lagged variables are shown in Table 1.
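A sliding window over a 15-minute load series can be sketched as below. This is one possible reading of the approach, with 96 past values as input and the following 96 values (24 hours) as target; the function name, the step size and the omission of lagged weather and calendar inputs are simplifying assumptions.

```python
def make_windows(series, n_in=96, n_out=96, step=1):
    """Cut a load series into (input window, target window) pairs.

    n_in:  number of past 15-minute values used as input (96 = 24 h)
    n_out: number of future 15-minute values to predict (96 = 24 h)
    step:  shift between consecutive windows, in 15-minute intervals
    """
    samples = []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, step):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


if __name__ == "__main__":
    # Placeholder series; real data would be kWh readings per 15 minutes
    load = [float(i) for i in range(200)]
    windows = make_windows(load)
    print(len(windows), len(windows[0][0]), len(windows[0][1]))
```

Lagged values of up to seven days would extend each input window with readings taken 96, 192, ... intervals earlier.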
To obtain different shares of prosumers, the electricity consumption of prosumer and non-prosumer households is merged, resulting in five new datasets with shares of prosumers of 0%, 20%, 40%, 60% and 80%. The shares are calculated based on the annual electricity consumption in kWh of the households, not the number of households. Further, measurements from two different locations and years are combined. Because the weekday impacts the load [8,9], the data from the two datasets are not merged based on the date but based on the weekday. The weather data are always taken from the location of the prosumers, as their electricity production depends more strongly on weather data, especially the global radiation, than the electricity consumption does. The holidays are taken from the dataset with the higher share in the merged dataset (e.g. for the dataset with 60% prosumer load and 40% non-prosumer load, the holidays of the city of the prosumers were considered).
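Two ingredients of this merging step can be illustrated in a few lines: the share computed from annual energy, and the alignment of two different years by weekday. The sketch is an assumption about how such a merge could be set up, not the authors' procedure; the function names are hypothetical, and how households were selected to reach each target share is not described in the abstract.

```python
from datetime import date, timedelta


def prosumer_share(prosumer_kwh: float, non_prosumer_kwh: float) -> float:
    """Share of prosumers measured by annual energy, not by household count."""
    return prosumer_kwh / (prosumer_kwh + non_prosumer_kwh)


def first_matching_day(start_a: date, start_b: date) -> date:
    """First day on or after start_b with the same weekday as start_a."""
    offset = (start_a.weekday() - start_b.weekday()) % 7
    return start_b + timedelta(days=offset)


if __name__ == "__main__":
    # Annual totals of the two complete datasets as given in the text
    print(round(prosumer_share(712_330, 1_325_267), 3))
    # Align the 2015 (non-prosumer) and 2017 (prosumer) years by weekday
    print(first_matching_day(date(2015, 1, 1), date(2017, 1, 1)))
```

Merging day by day from these two aligned start dates keeps weekdays paired with weekdays and weekends with weekends.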

Neural network
Feedforward neural networks consist of an input layer, one or several hidden layers and one output layer [10]. The input shape, i.e. the number of neurons of the input layer, is given by the number of input variables. The number of input variables can vary according to the chosen size of the sliding window. The output shape of all the neural networks in this paper is 96, as this is the number of 15-minute kWh values within 24 hours (corresponding to the day-ahead load forecast). The number of neurons in the hidden layers was determined in the various tests performed to parametrize the neural networks. The number of hidden layers varied between three and ten, and the number of neurons per hidden layer varied between 300 and 1'000. In the network, the various layers of the model are fully connected [11]. Further, the two related optimizers RMSprop and Adam are used. The loss function is either mean squared error (MSE), mean absolute error (MAE) or mean absolute percentage error (MAPE). These three performance measures are also used to measure the accuracy of the load forecast, i.e. to have meaningful optimization goals (especially MSE and MAE) and results that are on a comparable scale (MAPE). The network parameterisation used is shown in Table 2.
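The remark that MAPE yields results on a comparable scale can be made concrete with a small sketch. The definitions below are the standard ones; the example values are made up, and MAPE as written assumes that no true value is zero, which may not hold for net load with prosumers.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (true values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical loads in kWh for a small and a ten times larger area
    small_true, small_pred = [2.0, 4.0], [1.0, 5.0]
    large_true = [10 * v for v in small_true]
    large_pred = [10 * v for v in small_pred]
    for name, fn in (("MSE", mse), ("MAE", mae), ("MAPE", mape)):
        print(name, fn(small_true, small_pred), fn(large_true, large_pred))
```

Scaling the load by a factor of ten multiplies MAE by ten and MSE by one hundred, while MAPE stays unchanged, which is why MAPE is suited to comparing datasets with different total consumption.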

Evaluation
To compare the load forecast accuracy for datasets with different shares of prosumers, the load forecast was performed on these datasets in two ways. In the first experimental setup, the datasets with different shares of prosumers are used to perform the load forecast with the neural network that has an optimal parameterization for non-prosumer datasets. Afterwards, the trained neural network was tested with the data of the whole year of the four datasets with different shares of prosumers.
In the second experimental setup, the neural networks have been trained and tested on the datasets with different shares of prosumers. Two parameterizations were used: the one with the best results for the prosumer dataset and the one with the best results for the non-prosumer dataset. It can be observed that the load forecast accuracy is better when the neural network is trained and tested on the datasets with different shares of prosumers (second and third row in the table of Figure 1) compared to the first setup, where the neural network was trained on non-prosumer data (first row in the table of Figure 1). Figure 1 illustrates the MAPE. The blue graph (A) represents the first setup. With this setup, MAPE increases disproportionately fast with an increasing share of prosumers compared to the other two tests, where the neural networks were trained and tested on datasets with different shares of prosumers (orange (B) and grey (C) graphs). The comparison of datasets with different shares of prosumers has shown that the load forecast accuracy decreases with an increased share of prosumers. Independent of the experiment, the load forecast accuracy for prosumer datasets is lower than for non-prosumer datasets. The lowest forecast accuracy was achieved when the datasets with different shares of prosumers were tested on the neural network trained on a dataset without prosumers. The result improved when the neural networks were trained and tested on the datasets with different shares of prosumers. For a share of prosumers of 60% or higher, it is recommended to use the parameterization of the neural network which achieved the best results for the prosumer dataset.
Funding Publication costs were covered by the DACH+ Energy Informatics Conference Organizers, supported by the Swiss Federal Office of Energy.

Availability of data and materials
Weather data from Switzerland were retrieved from MeteoSwiss on June 14, 2019, from https://gate.meteoswiss.ch/idaweb. The consumption and production data are confidential.

In recent years, the academic community intensified research on local energy markets. Implementations in pilot projects provide first insights into different hypotheses and approaches. This work presents a tested IT-architecture for local energy markets, which covers all necessary processes and basic functionality, namely the hardware, the market implementation, the database, and the application for the user. It consists of four modules and eight essential processes. The IT-architecture can serve as a blueprint for future local energy market projects as it covers the basic processes and is at the same time extendable.
Keywords: Local Energy Market; IT-Architecture; Energy Transition

Introduction
The expansion of small renewable generation capacities in the distribution grid changes the paradigm of top-down electricity grids and causes the emergence of new microgrid concepts that allow participants to trade their residential generation with their neighbours. Due to this changing situation, there has been increasing discussion in recent years about local energy markets (LEM) [1,2,3]. An LEM adds a market layer to a microgrid that is originally a mere technical concept. On these markets, small local producers and prosumers trade with local customers (e.g. private households) in the immediate vicinity [4]. Currently, there are several pilot projects, and a vital discussion about proper market designs and regulatory issues has emerged [5,4]. However, the discussion currently focuses on market designs and concepts rather than IT-architectures. Therefore, in this work, we present a developed and tested IT-architecture design for local energy markets in a microgrid. This architecture is implemented in the Landau Microgrid Project (LAMP), a real-world implementation of an LEM [6]. This pilot project is a cooperation of the Karlsruhe Institute of Technology, the software developer Selfbits and the local utility Energie-Südwest. Its objective is to investigate the requirements, challenges and opportunities of an implemented LEM. The project is set up in a selected microgrid in the German city of Landau. A local combined heat and power plant (50 kW electrical) and a photovoltaic system (23.56 kWp) provide local generation. The microgrid is connected to the public grid via a single link and consists of 118 connection points, mostly private households. This connection ensures a continuous supply and allows excess energy to be fed into the public distribution grid. Initially, eight private households decided to participate in the LEM.
Based on this case study, we describe the proposed IT-architecture and present an exemplary implementation, including specific technology choices.

IT-Architecture
The architecture consists of four modules. Each takes on functional tasks within the structure. First, the system has to record the load values of all participants (Smart Meter Hardware). Second, the customer application requires an interface to enable interactions with the user. The participants must have access to their individual load data and be able to submit bids into the system (User Application). Third, load and bidding data have to be matched by the market mechanism (Market). Fourth, the recorded and generated data of all former modules must be stored and accessible to all applications (Database). Furthermore, specific processes exchange information between the different modules to ensure the operation of the overall information system. A representation of the architecture with its modules and processes is shown in Figure 1. In the following, the functionality of each module and its processes are presented in detail.

Smart Meter Hardware: The task of the Smart Meter Hardware module is to record and communicate individual load data. Energy trading on an LEM requires the current load profiles of all participants. In the proposed architecture, a digital electricity meter records the load data. In Figure 1, process 1 displays the transfer of these load data from the Smart Meter Hardware to the Database module. In the case study, the Smart Meter Hardware module is implemented through a combination of the 'Long-Range Wide-Area Network' (LoRaWAN) and digital electricity meters with a LoRa sensor communication module. Each meter is connected to a LoRa sensor, which sends the recorded data to the network. Then, the LoRaWAN server processes and transmits the recorded data to the Database module via a WebSocket connection. The advantages of the LoRaWAN technology for this application are the easy installation, scalability due to the cost per sensor, and signal strength.
A disadvantage is the LoRaWAN gateway, which represents a possible single point of failure if it is not installed redundantly.

User Application: The information system of the LEM needs a human-system interface where each participant is able to place bids on the market. In the proposed IT-architecture, the User Application module addresses these requirements. The application must be accessible to all participating users, e.g. via mobile devices such as smartphones. Figure 1 shows that five different processes originate from this module. Processes 2.1 and 2.2 show the registration and authentication of a new user with her login data. Both are necessary to identify the user and to prevent other participants from viewing the individual load data or issuing bids in the user's name. After a successful registration, the system can authenticate the user by the user's login data. This is necessary for the login process (2.2). For security reasons, the login data are stored in a different database (account database) and separated from the market data (market database). The connection between the user authentication data and the individual market data is established by process 2.3. It links the ID of the smart meter hardware with the user login data. Based on this connection, process 2.4 is able to request individual consumption and market data. Similarly, the user initiates process 2.5 by entering a bid price in the application. In the case study project, the software partner provided a self-developed Android-based application for mobile devices. After a successful login by the user, the application receives a JSON Web Token from the account database to authenticate the user against the market database. It allows a stateless session between the application and the market database. Since end devices are often not optimized for data storage, the application sends live queries against the market database to receive the requested market data and visualize them.
The application illustrates the data in different forms such as charts and tables, and a graphical controller allows users to submit bids within specified limits.

Market: An LEM requires a market to match the local supply and demand. In the proposed architecture, the Market module consists of two components: the MarketWrapper and the market mechanism. The MarketWrapper is the first software component. Its task is to process the raw input data from the market database into bids for the market mechanism. Process 3.1 displays this procedure. The market mechanism, the second software component, receives the bids, allocates them and generates transactions and market prices as outputs. These are handed over to the MarketWrapper, which hands them over to the market database (process 3.2). The market database transmits requested data in a JSON file format via a GraphQL API. These files are processed by the MarketWrapper into bids and handed over to the market mechanism. The market is cleared ex post in 15-minute intervals. The implemented market mechanism is described by [7]. The market mechanism creates transactions for each trading period that include the market price, volume and buyer and seller ID. The MarketWrapper transfers each transaction back to the market database module over the same API.

Database: Each LEM requires the storage of the recorded and generated data. The Database module provides this functionality. The module is the central point in the architecture and consists of two databases: the account database and the market database. The account database, as mentioned above, manages the authentication data of the users. The task of the market database is to store all data associated with generation, consumption and trade and to make them available to other applications. While this module does not initiate processes itself, each of the other three modules communicates and transfers information exclusively over the database.
Therefore, consistency and assignability of the data are important and with it a proper database design. In the case study project, this challenge is addressed by an object-relational database built with the open-source database management system PostgreSQL. It organizes the data with different tables and each data type (e.g. smart meter readings) is stored in its own table. A server handles the management of the database and processes data requests in a specific programming language. Such a GraphQL server manages and monitors the writing and reading accesses of the other modules.
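The division of labour between the MarketWrapper and the market mechanism can be illustrated with a small sketch. Everything specific in it is an assumption: the JSON field names, the `Bid` structure and, in particular, the clearing rule. The mechanism actually implemented in the project is the one described in [7]; the simple price-ordered matching with midpoint prices below is only a stand-in so that the data flow (JSON in, bids to the mechanism, transactions out) can be followed end to end.

```python
import json
from dataclasses import dataclass


@dataclass
class Bid:
    meter_id: str      # ID of the smart meter (hypothetical field)
    side: str          # "buy" or "sell"
    volume_kwh: float  # energy in the 15-minute trading period
    price: float       # limit price, e.g. in ct/kWh


def wrap_bids(raw_json: str) -> list[Bid]:
    """MarketWrapper step (process 3.1): turn raw JSON records into bids."""
    return [
        Bid(r["meterId"], r["side"], float(r["volumeKwh"]), float(r["price"]))
        for r in json.loads(raw_json)
    ]


def clear_period(bids: list[Bid]) -> list[dict]:
    """Stand-in market mechanism: match highest buy with lowest sell price."""
    buys = sorted((b for b in bids if b.side == "buy"), key=lambda b: -b.price)
    sells = sorted((b for b in bids if b.side == "sell"), key=lambda b: b.price)
    left_buy = [b.volume_kwh for b in buys]
    left_sell = [s.volume_kwh for s in sells]
    transactions, i, j = [], 0, 0
    while i < len(buys) and j < len(sells) and buys[i].price >= sells[j].price:
        volume = min(left_buy[i], left_sell[j])
        transactions.append({
            "buyer": buys[i].meter_id,
            "seller": sells[j].meter_id,
            "volume_kwh": volume,
            "price": (buys[i].price + sells[j].price) / 2,  # assumed pricing rule
        })
        left_buy[i] -= volume
        left_sell[j] -= volume
        if left_buy[i] <= 1e-12:
            i += 1
        if left_sell[j] <= 1e-12:
            j += 1
    return transactions


if __name__ == "__main__":
    raw = json.dumps([
        {"meterId": "A", "side": "buy", "volumeKwh": 2.0, "price": 30.0},
        {"meterId": "P", "side": "sell", "volumeKwh": 1.5, "price": 20.0},
        {"meterId": "Q", "side": "sell", "volumeKwh": 1.0, "price": 35.0},
    ])
    # Process 3.2 would write these transactions back to the market database
    print(json.dumps(clear_period(wrap_bids(raw)), indent=2))
```

Keeping the wrapper separate from the mechanism means the clearing rule can be replaced without touching the database interface, which fits the extendability the architecture aims for.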

Conclusion
This paper is intended to be the starting point of a discussion on the IT-architecture of LEMs and thus contributes to the maturing of this concept. The design of an LEM's architecture has a central influence on subsequent functionality and performance and its scalability. In this work, we propose an IT-architecture design for LEMs which can serve as a blueprint for future projects. The architecture is divided into four modules. Each takes over different tasks within the LEM information system. The Smart Meter Hardware collects load data, the User Application serves as an interface between user and information system, the Market coordinates the matching, and the Database stores the data. Processes describe the data exchange between these modules. Each process performs a different task to ensure the functionality of the LEM. We provide an exemplary technology implementation of each module and its processes in a case study. The choice of the respective technology or additional modules and processes depends strongly on the particular project and its requirements.

Due to the joint impacts of both demographic changes and technological trends such as electric vehicles, the development of urban electric load is increasingly uncertain. While sophisticated machine learning methods promise to alleviate this issue, practical application of these methods is frequently limited by insufficient availability of data from distribution system operators. To overcome this challenge and provide a useful tool for network planning in urban areas, we propose a load decomposition model with minimal data requirement to model the joint impact of demographic and technological developments. The model is composed of a statistical and a deterministic part. The statistical part uses a constrained elastic net regression to decompose the annual energy consumption into residential and commercial sectors.
Following this, the deterministic part of the model uses the sector-specific energy consumption forecasts from the statistical model to scale their corresponding standard load profiles and conduct further modifications on those.

Keywords: Load forecast; Machine learning; Energy decomposition; Electrification; Network planning

Introduction
With the energy transition, new technologies such as electric vehicles (EV) or heat pumps (HP), as well as distributed generation, are altering the electricity consumption pattern and challenge the quality and reliability of electricity supply [1]. These developments are expected to unfold predominantly in urban areas because of urbanization [2], affordability of new technologies [3], cities' responsibility to compensate their greenhouse gas (GHG) emissions [4], and their role as thought leaders to promote new energy technologies [5,6]. At the same time, demographic developments are shaping future electricity consumption in urban areas as well [7,8], but with different characteristics. Demographic development affects mainly conventional electric loads (e.g. residential and commercial load), of which the patterns are relatively well-known and can be described using historical data for creating standard load profiles (SLP) [9]; while technological development brings new technologies with different electric load patterns than the conventional sectors [10]. The complexity of urban areas' development and the differences between the two types of trends call for a novel model to study the future electric load of urban areas. However, as summarized in [11], most of the previous studies on long-term load forecast are limited either by low time resolution [12,13], or by high data requirements [14,15].
Ideally, a load forecast model for urban network planning should have the following characteristics -(1) predicting the aggregated electricity consumption (E) as well as the peak loads (Pmax) as they are both important references to scale grid components [16]; (2) using commonly available data sets since high data requirements have always been a bottleneck for long-term load forecasting in practice [17,18], and the urban system operators are usually short of data collection [16]; (3) quantifying joint impacts of both demographic and technological trends that will simultaneously arise in urban areas. Requirements (1) and (3) naturally lead to profile-based long-term load forecasting since load profiles provide load data with a high time resolution and allow simple addition of loads with different characteristics. However, current profile-based forecast models heavily rely on extensive data input, leaving the point (2) unresolved. Therefore, two research questions that we tackle are: (a) What is an appropriate model to estimate the long-term impacts of both demographic and technological trends on urban electric load? (b) What is the minimal data requirement for a long-term load forecast model?

Methodology
In order to model both the demographic and technological trends, a hybrid model composed of statistical modelling and deterministic modelling parts is developed (Figure 1). The model is able to decompose the annual energy consumption into residential and commercial sectors (statistical modelling), which enables to localize the technological trends into different sectors and to model their impacts as a subsequent step (deterministic modelling). Load decomposition is the key of the proposed model. For example, the adoption of household energy efficient appliances will only reshape the load profile of the residential sector, not the commercial sector. For such cases, it is essential to differentiate the load profiles of different sectors as a first step and then model the impact of the new technologies on their associated sectors.
(1) Statistical model for demographic trends
The goal of the statistical modelling is to find out the relation (dashed arrows in Figure 1) between the independent variables such as population (POP) and the annual energy consumption of the residential and commercial sectors. Linear regression is selected as the basis for the modelling because of its interpretability [19] and transparency [20]. Since the energy consumption data for each sector are not commonly available for distribution system operators (DSO), they cannot be directly used as dependent variables. Instead, two other dependent variables, the measured annual energy consumption E and peak power Pmax (the yellow boxes in Figure 1), are used. E is used as the primary dependent variable assuming that it has a linear relation with the independent variables. Pmax is used as the secondary dependent variable which serves as a reference to decompose E into the two sectors. Together with regularization and bounds on the linear coefficients, the model is formulated as follows:

\[
\dots \;+\; \underbrace{\dots}_{\text{part of elastic net regularization}} \;+\; \underbrace{\lambda_{\mathrm{lasso}} \,\lVert \beta \rVert_{1}}_{\text{part of elastic net regularization}}
\]
\[
\text{s.t.}\quad \underbrace{c_{j,\mathrm{lb}} \le \beta_{j} \le c_{j,\mathrm{ub}} \quad (j = 1, \dots, p)}_{\text{bounds on the linear coefficients}}
\]

During model training, the hyperparameters (λpf, λc, λridge, and λlasso) are determined first with cross-validation (CV), and then the linear coefficients (β) are determined with bootstrap to overcome heteroscedasticity [21].
(2) Deterministic model for technological trends
After the statistical model has been trained, the annual energy consumption of the residential and commercial sectors can be estimated. As a result, their scaled SLP can be obtained and used as the basis for the following deterministic model. For example, if households adopt appliances with higher energy efficiency, an efficiency factor can be used to scale down the residential load profile; if PV-battery systems are installed in households, a PV-battery model can be applied on top of the residential load profile.

Fig. 1 (abstract P6). IT-Architecture

Evaluation
Data from a German city are used to conduct a first evaluation of the model's accuracy in energy decomposition. Results show that the model performs well in predicting the overall energy consumption. However, its energy decomposition performance is only good for the suburb network but not for the city center network. The bias in the city center network prediction can be explained by the heterogeneity of the commercial consumers in the city center. Unlike the residential sector, the commercial sector is more diverse and can be further decomposed into 6 subsectors whose standard load profiles have different patterns [9]. In order to improve the forecast accuracy in commercial areas, we will further decompose the commercial sector to capture its heterogeneity in future research. This is enabled by the flexible model structure, which allows decomposition into an arbitrary number of sectors.

Conclusion
To model the joint impacts of demographic and technological trends on electric loads in urban areas, an energy decomposition model using a constrained elastic net regression algorithm is established. The model has minimal data requirements from the distribution system operators: (1) the annual energy consumption measured at each consumption unit, and (2) the maximum current measured at MV/LV transformers.
These two datasets are both used as dependent variables in the constrained elastic net regression model. This model extends the standard elastic net regression model by adding two more constraints: a peak power constraint and bounds on the linear coefficients. Enabled by the established model structure, future research or practical applications can focus on including a wider variety of independent variables, further decomposing the sectors, and enriching the scenario setups.

Funding
This contribution is submitted within the boundaries of a research project funded by the German Federal Ministry for Economic Affairs and Energy following the decision of the German Bundestag. Publication costs were covered by the DACH+ Energy Informatics Conference Organizers, supported by the Swiss Federal Office of Energy.

Summary
The success of demand response programs, as one of the key applications of a smart grid architecture, essentially depends on the end consumers' decisions and interactions. Technical demand response models mostly require presumptions concerning these parameters. In this paper, an agent-based model of consumer participation in demand response programs based on the Consumat framework is developed. It will constitute the basis for an overall model to simulate consumer decisions in the context of demand response.

Fig. 1 (abstract P7). Overview of the model (blue: data inputs; yellow: dependent variables; green: scenarios for prediction; dashed arrows: unknown relationships)

Keywords: consumat; demand response; agent based simulation; consumer

Introduction
One of the most important measures to address climate change effects is the global establishment of smart grid architectures. As one of its essential technologies, demand response has to be enabled in the residential sector to meet the European targets for a reduction of greenhouse gas emissions by 2030 (40% compared to 1990) and a greater share of renewable energy of at least 27% [1]. Demand response in this context refers to "changes in electric usage by end-use customers from their normal consumption patterns in response to changes in the price of electricity over time, or to incentive payments designed to induce lower electricity use at times of high wholesale market prices or when system reliability is jeopardized" [2]. Based on data from the US energy market (2014), demand response in the residential sector contributes 20% of the total peak demand savings and 61% of the overall energy savings [3]. As shown in, e.g., [4] and [5], the success of a demand response program essentially depends on the end consumers' participation and their behavior when configuring and using a DR system.
A technical simulation model which integrates these sociological aspects would be very helpful to support the deployment of a new energy infrastructure. Analyzing such socio-technical systems is a major research field. A review of the literature has shown that agent-based models might be considered a preferred simulation tool (see, e.g., [6,7,8,9]). Several frameworks exist to model human decision making processes in agent-based systems. An overview and guidelines on which kind of agent decision making model to use are given in [10]. One of these frameworks is the so-called "Consumat" model of Jager and Janssen, first published in [11]. Several publications already exist which use this approach to model sustainable behaviors but also other types of decision making, such as farmer crop choices (see, e.g., [12,13]). The aim of this work is to develop an agent-based model of consumer participation in demand response programs based on Consumat and to prove its suitability for further implementation in an overall socio-technical demand response consumer model.

The Consumat approach
The socio-psychological framework of Consumat allows the agent-based simulation of human decision making in situations related to the consumption of goods or opportunities, such as doing a specific activity, deciding where to live, and others. Details of the model and its updates as well as the underlying theoretical background can be found in [11,14,15,16]. Within the Consumat design, the simulated consumers (consumats) have needs, and they are equipped with abilities to satisfy these needs with a certain behavior. The decision on which behavior to perform depends on the current uncertainty of the agent and its level of need satisfaction (LNS). In an update of their original framework, Jager and Janssen describe three main need forces [16]:
existence: availability of economical resources like food, income, housing, etc.
social: interactions with others, social affiliation
personality: individual tastes and characteristics
The individual needs influence the overall level of need satisfaction differently, and the outcome of a certain behavior may have contrary consequences on the corresponding level of satisfaction for each need. Which behavioral options (opportunities) a consumat has depends on the domain/scenario being modeled. The Consumat approach integrates the uncertainty of an agent as a relevant factor for decision making. In [14], uncertainty is described as the difference between expectations and the real outcome of an action. The updated version of the Consumat framework [16] directly couples it to the existence and social needs. With Consumat II, the uncertainties concerning the individual needs may have different weights within the overall uncertainty.
Depending on their uncertainty and LNS, agents select specific decision strategies based on the following key rules [16]:
with decreasing satisfaction, an agent accepts more effort to find the optimal behavioral option
with increasing uncertainty, the behavior of other agents becomes more relevant
This leads to the four main possible cognitive processes for decision making: repetition, imitation, inquiring and optimization. Details about the corresponding strategies can be found in section Demand response consumers as 'consumats': model description.

Demand response consumers as 'consumats': model description
The model developed within this project aims to represent the decision-making of consumers to generally participate in demand response programs based on the Consumat approach. The consumers (agents) are characterized by individual levels of need satisfaction concerning their financial abilities, perception of comfort and the environmental state. Fig. 1 illustrates the adaptation of the underlying Consumat model (see [14]) to the decision behavior of a demand response consumer. Based on our own results published in [4,17,18], the following driving forces on the micro level were identified and integrated into the model:
Needs: financial state, comfort, environmental state
Opportunity: participation in demand response program
Abilities: financial resources, general comfort requirements, affinity for technology, acceptance
Uncertainty
Based on its level of need satisfaction and uncertainty, an agent will select the underlying cognitive process:

Optimization: maximization of LNS based on own calculations
Inquiring: compare what similar others did with own calculations and decide for maximum
Repetition: repeat decision of last tick
Imitation: copy last behavior of similar others
Whether an agent feels satisfied or uncertain depends on the individual thresholds LNSmin and Umax. To check the general suitability and logical correctness of the model, first simulation runs based on an implementation in NetLogo (version 6.0.4) with simplified assumptions concerning the model parameters have already been realized. The preliminary results indicate that, in order for the model to be used for practical purposes, further investigation of realistic parameter settings will be necessary.
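The selection of a cognitive process from the two thresholds can be sketched in a few lines of Python. This is a minimal illustration rather than the NetLogo implementation: the names `Strategy` and `select_strategy` are hypothetical, and the quadrant mapping follows the two key rules stated above (low satisfaction means more effort, high uncertainty means more attention to others). How exact ties at the thresholds are handled is an assumption.

```python
from enum import Enum


class Strategy(Enum):
    REPETITION = "repetition"
    IMITATION = "imitation"
    INQUIRING = "inquiring"
    OPTIMIZATION = "optimization"


def select_strategy(lns, uncertainty, lns_min, u_max):
    """Pick a cognitive process from satisfaction and uncertainty.

    Assumption: an agent counts as satisfied when lns >= lns_min and as
    uncertain when uncertainty > u_max.
    """
    satisfied = lns >= lns_min
    uncertain = uncertainty > u_max
    if satisfied and not uncertain:
        return Strategy.REPETITION    # low effort, individual
    if satisfied and uncertain:
        return Strategy.IMITATION     # low effort, social
    if not satisfied and uncertain:
        return Strategy.INQUIRING     # high effort, social
    return Strategy.OPTIMIZATION      # high effort, individual
```

For example, `select_strategy(0.8, 0.1, lns_min=0.5, u_max=0.3)` yields `Strategy.REPETITION`, while lowering the satisfaction to 0.2 yields `Strategy.OPTIMIZATION`.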

Conclusion and Outlook
This work presents the development of an agent-based model of consumer participation in demand response programs based on the Consumat approach. At the current state of the project, the model provides a basic framework for further research on the variation of the agents' general behavior in time and the influence of varying input parameters on participation decision. It may be used to find optimal policy options and measures to motivate consumers to participate in demand response programs. The model is scalable and can be extended by an additional logic considering the short-term aspects of consumers' interactions in the context of demand response. The underlying NetLogo tool allows interaction with other simulation frameworks like, e.g., mosaik. Future work will focus on two aspects: (1) improve and refine the Consumat approach to model consumer participation in demand response programs and (2) integrate it in an overall model of consumer decisions in the demand response context.

Funding by the Federal State of Salzburg under the WISS2025 program is gratefully acknowledged.
Availability of data and materials
Not applicable.

Author's contributions
The idea for this paper was developed by JS (95%) and DE (5%

Summary
Research on the practical effects of control algorithms in smart grid systems is often dependent on simulation, since the full modeling of all the devices connected to the grid is usually not amenable to a purely theoretical analysis. Many open source simulation packages exist, but they typically cover only core network simulation and require custom scripting, without offering the advanced functionality needed for the full assessment of the solution tested. The OPTISIM package was born as an answer to the lack of a smart grid simulation framework offering a modular structure with a clear interface for the algorithmic part, management of input and output data flows, graphing and configurability through JSON files. OPTISIM integrates state-of-the-art, freely available software components for network simulation, inter-process communication and time series storage to offer a comprehensive tool.

Keywords: smart-grid; simulation; demand-response; demand-side-management

Introduction
A complete framework for the simulation of smart grids needs to provide several features: power flow resolution, graphing, management of inputs and outputs, an interface for plugging in decision algorithms for managed flexibilities, integration of physical and control models, et cetera. The OPTISIM framework was created as a tool capable of satisfying each of these needs, easing the core research task of writing realistic models and novel centralized and distributed control algorithms for demand side management and for steering grid-connected equipment.

Power Flow simulation
The electric simulation engine underlying OPTISIM is OpenDSS [1], a freely available, industry-grade distribution system simulator by EPRI. It is a fast, widely used and powerful simulator, but it needs third-party bindings to be used from languages such as Python and interfaced with modern scientific computing packages. Many Python projects exist offering thin wrappers around the core OpenDSS library [2]. We created the Krangpower package [3] [4] with the aim of providing several enhancements. Krangpower is built on top of the OpenDSSDirect.py thin wrapper and offers syntactic sugar such as operator-based insertion of elements in the circuit and retrieval of objects through simple indexing. Furthermore:
Items that OpenDSS returns as simple lists of floats, representing real and imaginary parts of flattened arrays of physical quantities, are returned as a numpy array [5] with the correct shape and format.
Items come, where appropriate, as Quantities (from the pint [8] package) including information on the measurement unit. This enables easy conversions and guards against miscalculations.
The OpenDSS text interface is checked for errors (normally just returned as strings without raising a Python error).
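The first of these enhancements can be illustrated with a short numpy sketch. It is not Krangpower's code: the helper name `flat_to_complex` is hypothetical, and the sketch assumes that the flat list interleaves real and imaginary parts (re, im, re, im, ...) in row-major order.

```python
import numpy as np


def flat_to_complex(values, shape):
    """Turn a flat [re0, im0, re1, im1, ...] list into a complex array.

    Assumption: real and imaginary parts are interleaved, and the
    elements are stored in row-major order.
    """
    flat = np.asarray(values, dtype=float)
    if flat.size % 2 != 0:
        raise ValueError("expected an even number of floats (re/im pairs)")
    # Even positions hold real parts, odd positions hold imaginary parts.
    return (flat[0::2] + 1j * flat[1::2]).reshape(shape)


# Eight floats become a 2x2 complex matrix.
matrix = flat_to_complex([1, 2, 3, 4, 5, 6, 7, 8], (2, 2))
```

Attaching a measurement unit to such an array would be a separate step, for instance by wrapping it in a pint Quantity.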
Another key additional feature is the ability to return a graph (Networkx package, [9]) representing the underlying grid and featuring the simulation results as node/edge attributes, enabling advanced graphing and analysis.

Interaction with models
Many of the most interesting smart grid studies involve cosimulation with physical models of batteries, heating, houses and other appliances connected to the grid. We will call them, generically, "agents". OPTISIM offers the possibility to insert software models of these kinds of objects directly into the simulation. These models are configured in a dedicated structured JSON file, whose parameters are fed to the constructors of these objects. During the main simulation cycle, OPTISIM computes the electrical consumption for every model, obtaining active and reactive powers that are then used to configure Krangpower for the following step. Often, these models need time-series data streams as input for computing the final power (such as consumption profiles for uncontrolled loads or irradiance profiles for the models of photovoltaic plants). In vanilla Krangpower, it is possible to use simple delimited files, leveraging the underlying capabilities of OpenDSS. In the OPTISIM framework, a more flexible solution was chosen, involving a time-series database, InfluxDB [10]. The single agents, such as house models, batteries, etc., are aggregated under a single "meter", representing a commercial point of delivery that owns the underlying appliances. Each meter with its agent models runs asynchronously, in a separate process, to achieve high concurrency.
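The configuration and aggregation steps described above might look roughly like the following Python sketch. Everything in it is hypothetical and chosen for illustration: the JSON schema, the class names `ConstantLoad` and `PVPlant`, the `step` method, and the crude triangular PV profile are assumptions and do not reflect the actual OPTISIM interfaces.

```python
import json
from dataclasses import dataclass


@dataclass
class ConstantLoad:
    p_kw: float
    q_kvar: float

    def step(self, hour):
        # Returns (active, reactive) power; consumption is positive.
        return self.p_kw, self.q_kvar


@dataclass
class PVPlant:
    peak_kw: float

    def step(self, hour):
        # Crude triangular daylight profile peaking at noon (assumption).
        fraction = max(0.0, 1.0 - abs(hour % 24 - 12) / 6)
        return -self.peak_kw * fraction, 0.0


MODEL_REGISTRY = {"ConstantLoad": ConstantLoad, "PVPlant": PVPlant}

CONFIG = """
{"meter_1": [
    {"type": "ConstantLoad", "params": {"p_kw": 1.5, "q_kvar": 0.3}},
    {"type": "PVPlant", "params": {"peak_kw": 4.0}}
]}
"""


def build_meters(config_text):
    """Feed the JSON parameters to the constructor of each agent model."""
    raw = json.loads(config_text)
    return {
        meter: [MODEL_REGISTRY[a["type"]](**a["params"]) for a in agents]
        for meter, agents in raw.items()
    }


def meter_power(agents, hour):
    """Aggregate the powers of all agents behind one meter."""
    powers = [agent.step(hour) for agent in agents]
    return sum(p for p, _ in powers), sum(q for _, q in powers)


meters = build_meters(CONFIG)
p_noon, q_noon = meter_power(meters["meter_1"], hour=12)
```

In a full simulation, the aggregated active and reactive powers of each meter would be handed to the power flow engine before the next step is solved.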

Interaction with algorithms
The central feature of OPTISIM is the ability to run algorithms for governing the behavior of the agents separately. This is to be distinguished from the built-in, parametrized models of control circuitry that are coded in the agents; we refer to computational, decisional processes that govern the agents in order to reach an individual or collective goal, such as demand side management [11]. The scope of these algorithms is vast [12,13] and they constitute the central item of research that OPTISIM aims to aid. Examples of algorithms that can be experimented with are individual self-consumption optimizers that use a forecast of local production and demand to optimally manage flexibility, or more complicated coordination schemes that involve communication between the agents and the iterative solution of an optimization problem for achieving a common goal under certain sets of rules.

Overview and message broker
As we have seen, the OPTISIM framework involves several processes running in parallel (the main script with the power flow simulation, the database, the externally interfaced algorithms, the agent models). Figure 1 gives an overview of the whole modular architecture, together with the data flows between the parts.

Execution time and test runs
The OPTISIM package is written in Python. The choice was natural, since OPTISIM is a framework connecting several existing software packages, bindings are widely available for all the tools involved, and the project has a scripting nature. Nevertheless, the tool is quite optimized. In Table 1 [14]. The agents include thermal building models with heat pumps and boiler, rooftop PV, and pure consumption profiles. LIC: a network modeled from real data from the Lugaggia Innovation Community [15] pilot project in Lugano (CH). The data was supplied by the local DSO, AEM.
In both cases, no algorithm is governing the devices connected to the grid, but the agent physical models are included. Running control algorithms, naturally, can increase the time for taking a simulation step according to their complexity. Examples of results are shown in Figures 2 and 3.

Conclusion
In this article, the OPTISIM framework was presented as a comprehensive tool for simulating electrical grids interfaced with physical models and management algorithms. The general architecture was outlined and each of the modular components was described. The Python code for the project is under review and will soon be made available to the community as open source.

Funding
Publication costs were covered by the DACH+ Energy Informatics Conference Organizers, supported by the Swiss Federal Office of Energy.

Availability of data and materials
The data for the Lugaggia Innovation Community were supplied by AEM SA, Switzerland in the context of a pilot project and are not publicly available. The data for the IEEE European Low Voltage test circuit are made available by IEEE at https://site.ieee.org/pes-testfeeders/resources/.

Author's contributions
All the authors were involved in the architectural design, implementation and test of the OPTISIM platform.

As the transition to cleaner and more efficient systems for cooling and heating speeds up, it becomes more and more relevant to manage their electrical consumption to avoid overloading the electrical grid. The focus of this work is on recognizing the operational state of heat pump (HP) systems on the basis of smart meter data. For this purpose, we illustrate the application of time series classification methods and deep learning models on a monitoring data set that includes ground truth information on the HP operating state. Potentially, information on the HP state of operation can facilitate the integration of HPs into the grid.
Keywords: Digital meter data; Heat pumps; Data science; Time series classification; Deep learning

Motivation
An important share of the heat demand in Switzerland is already provided by heat pumps. However, it is necessary to speed up this transition from inefficient and polluting heating systems to more sustainable ones. The electrification of these loads can cause thermal and voltage problems in low voltage networks [1]. Thus, besides planning and design aspects, an important aspect for the integration of these highly efficient heating systems into the grid is to control them such that bottlenecks in the flow of electricity are prevented [2]. Traditional demand side management (DSM) techniques still in use today consist of deactivating loads, such as heat pumps, during peak consumption times. Novel coordinated control of larger numbers of loads opens the possibility of new DSM business cases. However, this requires an understanding of the operation of the HP components to avoid disrupting their duty cycles and potentially damaging the equipment.

Methods for recognition of heat pump operating modes
As the roll-out of digital meters continues in practically every utility, numerous research and innovation projects have looked at utilizing these data to better manage loads (DSM). When it comes to HPs among those loads behind the meter, the aspects studied are related to their control, identification and characterization [3,4]. Our work concerns the identification of the HP state of operation. For this purpose, methods developed in previous work make use of classical machine learning (i.e. feature engineering, dimensionality reduction, clustering), Bayesian change point detection approaches, and deep learning. The classical approach consists, essentially, of two steps: (1) cycle recognition (i.e. cycles may be of different durations), and (2) classification of each cycle into one of the possible HP states.
The classification takes place in a feature space corresponding to various summary metrics of each cycle. On the other hand, when applying artificial neural networks (NN), the cycle duration is fixed and the NN is trained to learn the mapping between the HP cycle-power consumption time series and the labels indicating the operating state (e.g. off, space heat, or hot water in Figure 1). Other common approaches for time series classification, such as dynamic time warping, longest common subsequence, or clustering, have not yet been applied. Here, in order to derive a method that is applicable to systems from any manufacturer, we abstract the different HP operating states into the most relevant ones: off, hot water (HW), and space heating (SH). Since our focus is on recognizing the operational state of HP systems with algorithms that can potentially be applied in near real time to inform control decisions, we investigate one dimensional convolutional NNs. Compared to recurrent NNs, which have feedback loops between output and input, low complexity convolutional neural networks (CNNs) have shown better performance on sequence modelling tasks [5,6]. A simple NN model can serve as a good baseline for more advanced deep learning models. Recently, such 1D-CNN models have been applied to electricity load forecasting [7] and to the prediction of the energy efficiency of domestic cooling systems [8]. For the implementation of the models we use TensorFlow [9] (v2.2.0) through its high-level application interface Keras.

Data sets and models
In the context of this work, we explore several datasets; Table 1 describes three of them. The HSLU data is collected for load disaggregation research. However, it also includes power consumption and specifications of HPs in operation, along with relevant building information for thermal analyses. The NTB Buchs and the WP Monitor data stem from dedicated HP research activities aimed at evaluating the efficiency of HP systems.
Thus, besides the energy consumption data typically recorded by digital energy meters, temperature, volumetric flow, and power consumption are recorded by dedicated sensors. Moreover, binary variables indicating the operation of key components such as compressors, pumps, electrical backup heaters, cooling circuits, and storage tanks are provided. Here, we use those variables to label data with the corresponding state of operation.

Fig. 1 (abstract P9). Scheme of the OPTISIM architecture
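The fixed-duration framing used for the neural network approach can be illustrated with a small numpy sketch that cuts a labelled power series into equal-length windows. It is only an illustration: the function name `make_windows`, the integer encoding of the states (0 = off, 1 = SH, 2 = HW), and the majority vote used to assign one label per window are assumptions, not the authors' preprocessing.

```python
import numpy as np

STATES = ("off", "SH", "HW")  # assumed integer encoding: 0, 1, 2


def make_windows(power, labels, window, stride):
    """Cut a power series into fixed-length windows with one label each.

    Assumption: each window takes the most frequent state among its samples.
    """
    power = np.asarray(power, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if len(power) != len(labels):
        raise ValueError("power and labels must have the same length")
    if len(power) < window:
        raise ValueError("series is shorter than one window")
    starts = range(0, len(power) - window + 1, stride)
    x = np.stack([power[s:s + window] for s in starts])
    y = np.array([
        np.bincount(labels[s:s + window], minlength=len(STATES)).argmax()
        for s in starts
    ])
    # Trailing channel axis: (samples, time steps, channels).
    return x[..., np.newaxis], y


x, y = make_windows(
    power=np.arange(10),
    labels=[0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    window=4,
    stride=3,
)
```

The resulting array has the (samples, time steps, channels) layout that one dimensional convolutional layers in Keras expect as input.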

Final remarks and outlook
Several data sets are available to study the behaviour of heat pumps as seen from the grid, but dedicated monitoring campaigns are valuable to observe the behaviour behind the meter. We use these data to encode the heat pump state of operation, and evaluate the performance of one dimensional convolutional neural networks (1D-CNNs) with different configurations to predict the time evolution of the states. As expected, sudden changes of state are hard to predict correctly. However, training these networks with only a couple of hours of training data on a laptop takes less than a minute for the deepest network (4 convolutional layers) we tested. Thus, it seems feasible to run them in an online fashion. Training the deepest network on a full year of 1-minute data takes up to 20 minutes. Next steps in our research involve the evaluation of models to predict energy consumption at different levels of aggregation.

The current century is challenged by the growing rate of the ageing population and the potential lack of energy resources [1]. A new generation of intelligent home energy management systems based on non-intrusive load monitoring (NILM) techniques using the data gathered by these meters can help tackle both issues. Besides allowing for more efficient energy consumption in the residential sector, NILM approaches can also enable the detection of various health care features (e.g. inactivity, sleep disorders, memory issues, ...) [2] and offer more independence for the elderly population. In this paper, the author highlights the open research problems related to NILM approaches and their applications in smart homes in relation to these two challenges. It particularly analyses the problem related to the accuracy of NILM techniques, the social acceptance of these systems and their effect on the two aforementioned applications.

Related work
A major open research challenge in NILM is the comparability of the proposed algorithms [3,4], as scholars generally use different evaluation setups. A recent contribution in this vein was proposed in [5], where 12 NILM benchmarks have been implemented and made available as an open-source project including both traditional and recent deep approaches. The same study proposed an experimental evaluation of the implemented algorithms and demonstrated the superiority of sequence-to-sequence (Seq2Seq) and sequence-to-point (Seq2Point) models in different scenarios. These models were inspired by machine translation, where they achieved very good results [6,7]. Seq2Seq learning is about training models to convert sequences from one domain to another domain. The Transformer [7] model is so far the best model for Seq2Seq modeling. However, to the best of our knowledge, there has been no proposal to adapt this model for NILM. Scholars demonstrated that NILM can improve energy efficiency by increasing users' knowledge about their appliances [8,9], which was also pointed out as a preference by several social studies about SM acceptance [10,11]. Recent works also proposed activity monitoring and the detection of abnormal behaviour through electrical signatures [12,13,2]. The benefits of this approach are its flexibility, low cost and ubiquity [2]. However, the major challenge is to provide enough accuracy for these applications, as it was repeatedly pointed out that the disaggregation accuracy increases proportionally with the sampling frequency [2,13]. This monitoring approach has been well accepted by subjects, as well as professionals, during experiments due to its low intrusiveness [13]. However, concerns such as privacy are still present and can prevent such solutions from reaching their full potential [2].
Recent studies [14,15] suggest that the acceptance of systems based on Smart Meters (SM) is governed by many concerns that individuals have, including but not limited to effects on health, cost and installation visits. Therefore, the end users' beliefs and concerns should be taken into consideration before the design of any services enabled by this technology. Carefully designed services have the potential to change a user's attitude toward SM. As a matter of fact, several studies on acceptance showed that it usually follows a U-shaped curve, from higher acceptance at the beginning, to relatively low acceptance during real deployment, to higher acceptance again when the project is finished and the user can perceive its concrete benefits [16]. However, to the best of our knowledge, only a few studies have considered this aspect.
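The sequence-to-point formulation mentioned in the related work can be made concrete with a short numpy sketch that builds training pairs: each input is a window of the aggregate signal and each target is the appliance reading at the midpoint of that window. The function name `seq2point_pairs` is hypothetical, and the sketch keeps only fully covered windows instead of padding the series, which is a simplifying assumption.

```python
import numpy as np


def seq2point_pairs(mains, appliance, window):
    """Build (aggregate window, appliance midpoint) training pairs.

    Assumption: the window length is odd, so the midpoint is unambiguous,
    and windows that would run past the ends of the series are dropped.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    mains = np.asarray(mains, dtype=float)
    appliance = np.asarray(appliance, dtype=float)
    if len(mains) != len(appliance):
        raise ValueError("mains and appliance must have the same length")
    n_windows = len(mains) - window + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than one window")
    half = window // 2
    x = np.stack([mains[i:i + window] for i in range(n_windows)])
    y = appliance[half:half + n_windows]
    return x, y


x_agg, y_mid = seq2point_pairs(
    mains=np.arange(7),
    appliance=10 * np.arange(7),
    window=3,
)
```

A sequence-to-sequence variant would instead pair each aggregate window with the appliance window of the same span.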

Research questions
The research goals of the author's current and future work are to contribute to the improvement of existing NILM approaches, establish a set of user requirements for SM-based systems, and finally evaluate how such systems can influence decision making in smart homes.

RQ 01: How can deep models for NILM be improved? The author argues that adapting the Transformer model for NILM can enhance performance and provide a more understandable model through its attention mechanism. As a matter of fact, an understandable model for NILM will make it possible to establish a cause-effect relationship between the observed results and the errors made by the model. It will therefore allow for more accurate models and thus enhance the user's trust in the system, which will partially contribute to RQ 02.

RQ 02: Many services can be enabled with NILM techniques. What is the user attitude towards those services, and how can the acceptance of NILM-based services and the engagement of customers with them be improved? Understanding the users' concerns and the key factors influencing their engagement would help in establishing a set of requirements for the design of those services. It would also help utility companies to avoid the costly consequences of rejection by consumers after installation.

RQ 03: Taking into consideration the results of RQ 01 as well as RQ 02, how can NILM services support decision making in smart homes in the case of an elementary family and in the case of elderly people living alone? Advanced home energy management systems based on NILM can influence human behaviour towards more efficient consumption and help make appropriate interventions in the case of elderly people living alone (e.g. providing warnings for carers or family members). However, in the first case, more effort should be put into the design elements of those systems to assess their potential effects on consumer engagement. Besides, in the case of elderly monitoring, well-thought-out scenarios should be designed to evaluate and assess the applicability of these systems.

Though the three research questions seem to be divergent, they are highly linked to each other. As a matter of fact, improving the performance of NILM in RQ 01 would lead to higher accuracy and is thus a key factor in improving users' trust, contributing to RQ 02. The outcomes of RQ 01 and RQ 02 will theoretically help design better performing and more widely accepted services, whose real effect will be evaluated in RQ 03.

Table 1 (abstract P10). Three data sets and their descriptions
NTB Buchs [10,11]: High resolution (10-second) monitoring campaign to investigate HP performance aspects, such as start-up behaviour, defrosting, and influence of auxiliary equipment on efficiency. Up to 13 HP systems (air-water, brine-water, variable speed, systems with cooling, new and renovated systems) were monitored for up to 3 years.
WP Monitor [12]: Monitoring campaign for benchmarking the efficiency of different HP technologies. A total of 87 HPs (direct evaporation systems, ground source HPs, and variable speed compressor HPs) were monitored during three years (1-minute resolution). Partially anonymized data from three HPs that complies with German data privacy law is accessible.
HSLU [13]: Load monitoring open data from digital meters and other sensors in five houses. 1.5 up to 3.5 years at 5-minute resolution. Power consumption data from HPs (air-water and brine-water) in three of the houses, along with HP specifications and building envelope information.

Methodology
As a first step in evaluating the proposal, the author implemented an energy measurement system based on open-source solutions in a living lab environment [12] and conducted a set of experiments that showed great potential for NILM approaches in the domain of Ambient And Assisted Living (AAL). In the next step, the author intends to work towards answering the first research question by extending the previous system with an advanced NILM technique. This technique will consist of an adapted version of the Transformer for NILM. This model will be validated using established datasets (e.g. UK-DALE, REDD) and compared to already available benchmarks. For reproducibility purposes, the implementation of this model will be made available as part of the open-source project described in the related work. This implementation will later serve as a back-end for the systems built in RQ 03. The second research question will be addressed using a case study of consumers from an energy utility in Carinthia, Austria. First, the author intends to perform a literature review of recent institutional documents dealing with the acceptance, attitude and engagement of customers with energy services. This review will help select an appropriate model explaining the acceptance, which will be the core for designing a questionnaire study. The questionnaire will be disseminated to real customers of the energy utility, which will help to validate the hypotheses of the selected model. RQ 02 is expected to provide a set of user requirements and preferences for energy services. As for the third research question, an intelligent home energy management system will be developed based on the requirements from RQ 02. This system will use the NILM approach from RQ 01 to provide two components: (1) energy feedback about household consumption, and (2) activity monitoring and abnormal behaviour detection. Thus, the evaluation of the system will be made in two independent steps.
In the first step, focus groups of ordinary consumers of different ages will be recruited to assess the effect of different design elements on their attitude. In the second step, focus groups of health carers will be recruited to evaluate the real applicability of the monitoring system in the case of elderly people living alone.

Conclusion
In this paper, the author discussed her motivations, her research questions and the methodology she intends to use during her project. The overall goal is to evaluate the potential of NILM-based ICT systems to influence decision making in smart homes (RQ 03). Such systems rely on both accurate data (RQ 01: improving already established NILM approaches) and user requirements for such services (RQ 02: user concerns and requirements).

Summary
Optimal power flow algorithms can be used to optimally control power systems and thereby reduce the need for grid expansions. However, optimization of power systems is a complex problem and still hardly possible in real time, which would be necessary for grid control. In this doctoral project, a methodology is proposed to train artificial neural networks with the results from offline optimizations in order to speed up the calculation and to ensure feasibility of the optimization. This is expected to yield fast and near-optimal results and also allows for high modularity, which reduces engineering effort and makes the approach applicable to diverse use cases.
One approach to these problems is massive grid expansion, but this is expensive and undesired by society. To minimize the need for grid expansions, it is necessary to exploit the existing infrastructure as far as possible. Distribution grids in particular are still mostly operated passively, which means that the distribution system operators barely perform active interventions to optimize the state of the grid regarding efficiency or stability. The mentioned new actuators technically provide a lot of active and reactive power flexibility to make active control operations possible. Searching for the optimal state of an electrical grid is called the optimal power flow (OPF) problem [1]. However, the OPF is a highly complex optimization problem that is difficult to solve quickly and in real time, especially for large-scale systems with complex constraints [2]. Consequently, real-time capable OPF (RT-OPF) approaches are pursued to make it applicable to grid control. Such an RT-OPF would enable grid operators to keep their grids in an optimal state continuously.

State of the Art
OPF is an umbrella term for procedures that find the optimal steady state of a power system considering operational constraints and control limits [1]. Generally, the OPF is a large-scale non-convex nonlinear optimization problem that often contains discrete variables and uncertainty, which makes it difficult to solve [2]. OPF algorithms are mainly used by transmission system operators to plan future grid control operations. Because of the computational complexity, this is done in 15-minute intervals or day-ahead [3]. There is a trend towards corrective control, i.e. reacting to contingencies instead of anticipating them beforehand. That means fewer constraints and better exploitation of the power system [4]. Many RT-OPF approaches have emerged in recent years [2]. This work focuses on artificial neural network (ANN)-based approaches, because they allow for abstraction from the original OPF formulation, which makes them applicable to various OPF variants. The usual approach from literature is visualized in Figure 1. A conventional offline OPF is performed thousands of times for different grid states to create a training data set, which consists of a mapping of grid states to the respective optimal set-points of the actuators within the grid. The generated training data is then used to train a multi-layer perceptron (MLP)-ANN so that it can approximate the optimization for a given power system. This way, the optimization problem is transformed into an ANN inference, i.e. a series of simple matrix multiplications. As a result, calculation time can be reduced and convergence problems can no longer occur. MLPs are mainly used in literature because they are the standard ANN architecture and because they are proven general function approximators [5].
Pan et al. [6] use MLPs to map load values to results of the DC-OPF, a simplified OPF that neglects reactive power flows. They achieved a calculation speed-up of three orders of magnitude. To prevent overfitting (footnote 5), they used several tens of thousands of randomly sampled training data points. However, for training data generation, they varied loads only in the range of ±10%, which is too small a solution space to allow for good generalization (footnote 6). Zamzam and Baker [7] as well as Guha et al. [8] use MLPs to approximate the standard AC-OPF, which is more complex than the DC-OPF. They again mapped load values to the optimal set-points of actuators, achieving a precision of more than 99% in relation to the basic OPF.
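To illustrate why inference is cheap compared with an iterative optimization, the following minimal sketch evaluates an MLP forward pass with NumPy. The layer sizes, the random weights and the function name `mlp_forward` are illustrative assumptions and are not taken from the cited works; in practice the weights would result from training on offline OPF solutions.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of an MLP: ReLU hidden layers, linear output layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ w + b)   # hidden layer: matrix product + ReLU
    return a @ weights[-1] + biases[-1]  # linear output: actuator set-points

rng = np.random.default_rng(0)
n_loads, n_hidden, n_actuators = 6, 16, 3   # hypothetical sizes
# Random placeholder weights; a trained model would supply these.
weights = [rng.normal(size=(n_loads, n_hidden)),
           rng.normal(size=(n_hidden, n_actuators))]
biases = [np.zeros(n_hidden), np.zeros(n_actuators)]

grid_states = rng.uniform(0.5, 1.5, size=(4, n_loads))  # four load scenarios
set_points = mlp_forward(grid_states, weights, biases)
print(set_points.shape)
```

Each scenario costs two matrix products regardless of how hard the underlying OPF instance is, which is the source of the speed-up reported in the literature above.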

Approach
The aforementioned publications were mostly published in 2019 or 2020 and are in an early proof-of-concept stage. This results in some shortcomings, which this work aims to resolve. First, all respective works consider a specific variant of the OPF problem for a given power system and present a handcrafted ANN-OPF for that problem. However, countless variants of the OPF problem exist, and design by hand results in a lot of engineering effort. Instead, automation of this process is required. Figure 2 sketches the general idea of how a conventional OPF can be transformed into an ANN-OPF in an automatic or semi-automatic way. Each of the framed boxes is planned to be interchangeable to achieve high modularity. For example, exchanging the power system model enables the automatic design of control algorithms that are optimal for a given power system, instead of using generic concepts that are not optimized for specific grids. This general idea was proposed in a previous work [9]. The OPF itself is highly modular as well, because diverse objective functions and system constraints can be chosen. This also applies to the ANN training algorithms, which can be chosen from literature. The modularity results in high generality and little engineering effort, because parts of the total design flow can be exchanged easily to apply the methodology to various kinds of problems. Second, most publications use MLP-ANNs as the architecture to learn the OPF. Other architectures have not been tested yet. For example, recurrent neural networks can be expected to be useful for the multi-stage OPF problem over a time frame, as opposed to a single stationary grid state. Third, the presented approach from literature requires tens or hundreds of thousands of offline generated training data samples, which is computationally expensive if complex OPF variants have to be solved.
An alternate promising approach would be to map the OPF problem to the training process of the ANN, so that the training process implicitly solves the optimization problem. Hopfield networks can be used in such a way to solve optimization problems [10].
The listed shortcomings and their respective proposed solutions shall be pursued in the doctoral project to achieve an ANN-OPF that is usable for realistic use cases, considering engineering effort and quality of solutions. To test the quality of solutions, the resulting ANN-OPF will be applied to several use cases on benchmark grids like the SimBench grids [11]. Use cases are OPF applications in grid control, e.g. voltage control, re-dispatch, or clearing of ancillary service markets, which is often done using an OPF [12]. Metrics for evaluation will be 1) deviation from the conventionally generated OPF solution as ground truth, 2) number and magnitude of constraint violations, 3) robustness against missing or faulty data input and outages within the grid, and 4) computation time. The formal definition of the metrics is still to be done. The application of a trained ANN-OPF to multiple close-to-reality use cases is another step that has not been sufficiently covered in literature yet.
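Since the formal definition of the metrics is still open, the following is only one possible way to express metrics 1) and 2) for a vector of set-points. The function name `evaluate_set_points`, the choice of the mean absolute deviation and the example numbers are assumptions made for illustration.

```python
import numpy as np

def evaluate_set_points(predicted, reference, lower, upper):
    """Return mean absolute deviation and constraint-violation statistics."""
    deviation = float(np.mean(np.abs(predicted - reference)))
    # Positive entries mark how far a value lies outside [lower, upper].
    excess = np.maximum(predicted - upper, 0.0) + np.maximum(lower - predicted, 0.0)
    return {
        "mean_abs_deviation": deviation,
        "n_violations": int(np.count_nonzero(excess)),
        "max_violation": float(excess.max()),
    }

reference = np.array([0.50, 0.80, 1.00])   # conventional OPF result (ground truth)
predicted = np.array([0.50, 0.90, 1.25])   # hypothetical ANN-OPF output
result = evaluate_set_points(predicted, reference, lower=0.0, upper=1.0)
print(result)
```

In this toy case the third set-point exceeds its upper limit, so one violation with a magnitude of 0.25 is reported alongside the deviation from the reference.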

Conclusion and Outlook
The presented doctoral project attempts to make the solution of OPF problems faster and real-time capable by using ANNs to transform the optimization problem into simple matrix multiplications by training. This work aims to advance the methodology to applicability in real-world situations by lowering the engineering effort for design, by systematically searching for the best ANN architectures, by reducing the drastic computational effort of training data generation, and by evaluating the resulting ANN-OPF regarding solution quality and computation time in realistic use cases.
In further research it could be investigated how well ANNs are suited to achieve distributed control of power systems. For example, Sondermeijer et al. [13] trained ANNs actuator-wise to achieve decentralized control. However, no communication is considered yet, which would be required to achieve distributed optimal control.
5 Over-fitting: The ANN is not able to generalize to new data, but only "memorizes" its training data.
6 Generalization: The ability of ANNs to achieve good results for data points that were not used in training.

Summary
The distribution grid is becoming increasingly difficult to operate and monitor, leading to voltage band violations in many cases. Therefore, distribution system operators (DSOs) need to supervise the correct operation of grid connected devices, such as power converters on the low voltage level. The behaviour of these decentralized generation units and their grid support functions such as reactive power dispatch, used for example for voltage control, is crucial for grid operation. The architecture developed is intended to enable better supervision of grid connected devices. This is to be achieved by combining machine learning algorithms for anomaly detection, classification and load disaggregation. These mechanisms are then to be applied to the transformer data as well as to the device data to identify and classify unwanted behaviour.
Keywords: Power Distribution Systems; Malfunction Detection; Operational Data; Machine Learning; Misconfiguration

Introduction
Nowadays, electricity grid operators face many challenges connected to the fundamental changes the energy system is undergoing. In particular, a high density of photovoltaic (PV) power generation has a grave impact on a grid, as pointed out in [1]. Local violations of the admissible voltage magnitude, the so-called voltage band, are often the consequence, whereas the system frequency can be affected globally. To avoid such unfavourable effects without limiting renewable energy generation, control strategies are needed. Voltage regulation is regarded as the most important aspect in the integration of distributed generation in distribution networks [2]. This is implemented through grid supporting functionalities provided by the generation units or loads. Amongst others, these range from curtailing the active power dispatched or consumed to controlling the reactive power injection of generation units with inverters.
To ensure these functionalities are exercised correctly, the grid connected devices have to be supervised.
Fig. 1 (abstract PW2). Mapping of grid state to optimal set-points using offline generated training data

Scenarios
The cases targeted include the supervision of the correct scheduling of loads or the proper feed-in of energy by generation units. These are non-transient events, so their dynamics are not of interest and lower resolution data is satisfactory. Only the supervision of operational changes on a slower time scale is addressed here. Data with a resolution of one minute or coarser is sufficient for this task, since the effects of the phenomena mentioned above on the grid unfold over a similar time range. A load not being switched on will only have a measurable impact after a few minutes or hours, which also makes detection less time critical. The control function supervised in one of the scenarios is illustrated in Figure 1: a) shows how the power factor, and therefore also the reactive power, is controlled depending on the active power, whereas b) depicts a voltage control varying the reactive power. Distribution system operators (DSOs) need to supervise the operation of grid connected devices, such as inverters on the low voltage level, and their voltage control capabilities in order to ensure that the network is reliable and works within the specified limits. Deviations of control schemes from the specifications defined by grid codes can have two reasons. Firstly, a configuration different from the normative one can be purposely implemented, leading to misconfiguration. Secondly, the configuration can change due to malfunctions or faults. However, the operational data needed is not usable at all connection points because of legal restrictions regarding data privacy or the lack of measurements in general. Therefore, monitoring is required to be performed remotely using as little data as possible, making surveillance on the distribution transformer level preferable.

Architecture
An architecture is proposed that takes medium voltage transformer data as well as information about the underlying low voltage grid as input and applies various data-driven approaches to it. This should enable detection and identification of grid supporting devices on the underlying low voltage level whose behaviour does not correspond to the one laid down in the specification. This makes it possible to monitor, for instance, the execution of control schemes of distributed generation units on the low voltage level, as shown in Figure 2.
Various tasks have to be covered by this architecture, such as anomaly detection, classification of the anomalies, and data mining activities. The anomalies are limited to behavioural anomalies, such as wrong parameterisation of control curves, either as a result of frequent initial misconfiguration or of recurrent malfunctions during the execution, as elaborated before. For the detection of anomalies in the behaviour of grid connected devices, kernel principal component analysis (kPCA) [4] appears to be a promising solution, because it allows a statistical model of the nominal state of a system to be built. For classification purposes, a partially hidden structured support vector machine (pSVM) in combination with kPCA, as described in [5], can be employed. Yet another approach is explored in [6], using a one-class support vector machine for Heating, Ventilation and Air Conditioning (HVAC) anomaly detection. The results show that most variability in the data does not occur due to anomalies but during the usual functioning of the system, which also applies to PVs in regular operation or to households. This points towards using principal component analysis (PCA) for anomaly detection on the low voltage level, whereas some form of support vector machine could be applied on the medium voltage level. For data mining purposes, the transformer load profile can be disaggregated into the contributions of the devices and loads on the low voltage level. To perform this disaggregation, [7] proposes an application of an artificial neural network (ANN). Smart meter data of households and grid connected devices, such as generation units, could be used to establish a database of appliance signatures. Alternatively, a hybrid support vector machine/Gaussian mixture model (SVM/GMM) classifier could be employed, as explored in [8]. The approach discussed there has the advantage of building its own power feature model for appliances when these are turned on, without needing smart meter data.
First implementations of the concept presented are being developed in a coding environment using data synthesized by grid simulation software. The data generated in this manner should comprise distribution transformer data and data of low voltage grid participants, such as voltages, currents, and power flows at minute resolution. Data of regular operation and of abnormal behaviour, such as that of wrongly parameterised inverters, are needed and therefore generated. KPIs are to be defined, such as a misclassification rate or a confusion matrix representing false negatives and false positives of the anomaly detection. The latter are of particular interest since false alarms ought to be avoided. To enable the algorithms to learn to classify these cases, at least partly labeled data is going to be necessary [9].

Discussion and Conclusion
The integration and roll-out of decentralized renewable energy sources is both inevitable and necessary in order to reorganize the electric energy supply in a sustainable manner. These sources of energy show great volatility when providing energy, which can lead to problems in grid operation. Therefore, control measures have to be put in place and grid operators have to make sure of their correct functionality. An early framework has already been developed and implemented that allows operational grid data of malfunctioning devices to be synthesised. This makes it possible to examine the results achieved with certain data as well as to determine the necessary properties of that data. The first preliminary results of an approach applied to the synthesised data are shown in Figure 3. These show voltages at two terminals plotted against each other, each of which has a household as well as a PV system connected to it. Each data point's x component is the voltage at one terminal, and the corresponding y component is the voltage at the other terminal at the same point in time. Depicted here is the data over the course of 48 hours at a 5-minute resolution. In this case the function supervised is a reactive power dispatch curve controlled by the active power, which influences the voltage. Here, a first feature for anomaly detection is depicted: the explained variance of the second principal component varies greatly between the point clouds depicting two terminals without malfunctions (right) and two terminals where one of them experiences a malfunction resetting its control curve (left). This could be used as an indicator of abnormal and unintended behaviour. Results from these evaluations will be used to develop and improve the architecture and make it more robust. Finally, real-world data provided by DSOs could be used to verify the concept, and the approach could be tested on a grid serving as a test site.
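The feature described above, the share of variance explained by the second principal component of a two-terminal voltage cloud, can be sketched as follows. The voltage series are synthetic, and modelling the malfunction as additional uncorrelated variation at one terminal is purely an assumption for illustration; the function name `second_pc_share` is hypothetical.

```python
import numpy as np

def second_pc_share(v_a, v_b):
    """Share of total variance explained by the second principal component."""
    data = np.column_stack([v_a, v_b])
    cov = np.cov(data, rowvar=False)                   # 2x2 covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending order
    return float(eigvals[1] / eigvals.sum())

rng = np.random.default_rng(1)
n = 576  # 48 hours at 5-minute resolution
common = rng.normal(0.0, 2.0, n)                  # shared voltage variation (synthetic)
v_ok_a = 230 + common + rng.normal(0, 0.1, n)
v_ok_b = 230 + common + rng.normal(0, 0.1, n)
v_bad_b = 230 + common + rng.normal(0, 1.0, n)    # assumed effect of a reset control curve

share_ok = second_pc_share(v_ok_a, v_ok_b)
share_bad = second_pc_share(v_ok_a, v_bad_b)
print(share_ok, share_bad)
```

Under these assumptions the two healthy terminals move almost in lockstep, so nearly all variance falls on the first component, while the disturbed pair shows a clearly larger second-component share.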

Summary
This project aims to design a prediction model, optimize an operation plan and simulate system operation. Simulation is necessary due to mismatches between predicted and actual heat demand. The goal is to understand the influence of heat demand prediction errors on the flexible operation of combined heat and power systems (CHP systems). The simulation additionally reveals what actions could be taken either to react to prediction errors during operation or to avoid complications beforehand. Based on these results, the solution space of the optimization is going to be limited in order to avoid interventions while the operation plan is realised. Thereby the question is addressed whether accounting for prediction errors during operation planning benefits system operation.
Keywords: Prediction Errors; Heat Demand; Time Series Forecasting; Artificial Neural Networks; Operational Optimization; Flexibility

Problem Statement
The growing share of intermittent energy sources increases the likelihood of power overproduction at some point in time and power outages at others. One idea to deal with these fluctuations is the usage of existing mid-scale CHP devices in decentralized energy systems. With several hundreds of kilowatts in power generation and usually connected to a heat storage system, such CHP devices are able to shift considerable loads and react to market stimuli. Generally speaking, they incorporate the potential to generate power during high price hours and avoid generation during low price hours i.e., the potential to counterbalance power shortage in the market or shut down of renewable power plants respectively. The ability to shift loads to high price hours is limited by device constraints like storage capacity or feasible power generation. Furthermore, since the CHP device generates heat along with power, heat generation and therewith power generation is limited to the existing heat demand. Short term imbalances between generation and demand can however be compensated by storage systems. In order to generate heat and power when power prices are high and store excess heat for low price hours, system operation needs to be planned beforehand. Moreover, complying with power delivery contracts restricts spontaneous changes to plans. A reliable operating plan is crucial to unleash the full potential of existing CHP systems for demand side management. An operating plan, however, can only be as reliable as the inputs it is based upon. These inputs consist of the available energy generation and storage system, power price predictions, as well as power and heat demand predictions. All predictions are subject to uncertainties. If assumed power prices fail to materialize, the operating plan fails to achieve the promised goals. The impact of deviations between assumed and actual heat demand is different, because it affects system operation directly.
On the one hand, if heat demand exceeds the expected extent and thus remains unmet, generation needs to be boosted. A compensating heat source, like a district heating grid, is usually not available.
On the other hand, if heat demand under-runs the expected extent and generation exceeds the storage capacities, generation needs to be limited. Overproduction might cause overheating and damages to the energy system, if excess heat cannot simply be dumped.
In such situations, the operating plan needs to be modified accordingly, endangering not only the promised goals but possibly also the contracts entered into. Such situations are not unlikely. The operating plan is the product of an optimization algorithm that aims to achieve the greatest possible benefits. Consequently, it tends to push the limits of capacity. If no deviations are expected, using only 90 % of the storage capacity instead of all of it makes no sense. Therefore, optimized operating plans are particularly vulnerable to unexpected heat demand deviations.
Heat demand predictions and their errors have been studied frequently [1]. Even if predictions based on artificial neural networks (ANN) achieve good performance, a perfect prediction is simply unachievable.
If prediction errors are unavoidable and if such errors affect the promised goals when operating plans are put into practice, such operating plans might perform better if prediction errors are accounted for in advance. Consequently, the research hypothesis to be verified is: Limiting the solution space in operation planning grants tolerances for heat demand prediction errors and thus avoids interventions during realization. In this way, a limited operational optimization in reality taps the flexibility potential of decentralized energy systems to a greater extent than optimizing within the full range of operation.
RQ1: How do prediction errors decrease the real potential of flexible operation? RQ2: How can one limit the solution space of an optimization algorithm in a fruitful manner? RQ3: How do these limitations correlate with the performance of predictions?

Related Work
Three publications dedicated to the impact of imperfect energy demand forecasts on the flexible operation of CHP systems were found.
In the first publication, Bakker et al. [2] predict the heat demand of four individual households. An optimal schedule for the CHP plant is determined via Integer Linear Programming. The schedule is optimal with respect to the profit made by the CHP plant on the electricity market APX. The schedule is then simulated taking real electricity and heat demand as well as the limitations of a local heat buffer into account. As a reference, the same procedure is carried out with a perfect forecast. The achieved average sales price corresponds to 78 % of the theoretically achievable price. In other words, 22 % of the income is lost due to forecast errors.
In the second publication, Baltputnis et al. [3] investigate a CHP plant in Latvia with an electric power of 976 MW. A heat and power production plan is obtained with the help of a heat demand forecast. The generated power is traded at the Nord Pool day-ahead market. Since their prediction underestimates the heat demand on average, the power production is underestimated as well, which in turn means that more power is available than was traded on the market the day before. Costs of imperfect predictions are therefore estimated as the lost revenue of unexpectedly overproduced power not sold beforehand. For two different ANN-based forecasts (the prediction qualities of which are measured via the RMSE and given as 8.786 % and 7.819 % respectively, a difference of 11 %), the lost revenue is decreased by 26 %. The authors state that "the consequences of imprecise heating demand forecasts cannot be overstated."
In the third publication, Fang and Lahdelma [4] predict the heat demand of the city of Espoo in Finland, which has a yearly heat demand of 2.25 TWh. With that prediction they calculate the optimal operation plan for the existing generation system with regard to the net operating costs. The generation system consists of a combined heat and power plant and a heat storage. The power produced is sold at the Nord Pool spot market. In the presented case study, 90 % of the theoretically possible savings due to flexible operation are achieved when imperfect forecasts are taken into account.
None of the above publications account for prediction errors beforehand.

Methodology
The overall methodology can be subdivided into three main tasks, which are depicted in Figure 1.
Prediction models are designed to obtain predictions with different accuracies. This work focuses on neural network based prediction models. As an indication of the ability of the neural network to learn the underlying patterns, its performance is compared with that of other prediction models. On the one hand, a naive prediction model (today is the same as yesterday) will be used. On the other hand, an ARIMA [5] prediction model will be used for comparison. The main prediction model is based on a feedforward neural network with historic temperature and heat demand data and a temperature forecast as inputs, a single hidden layer and a single output. The target is the measured heat demand, e.g. 24 hours later.
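The naive reference model (today is the same as yesterday) is simple enough to sketch in a few lines. The synthetic hourly demand series and the use of the RMSE as error measure are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

def naive_forecast(demand, steps_per_day=24):
    """Predict each value as the value observed one day earlier."""
    return demand[:-steps_per_day]

def rmse(actual, predicted):
    """Root mean squared error between two equally long series."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

hours = np.arange(24 * 7)
# Synthetic hourly heat demand in kW: daily cycle plus a slow upward trend.
demand = 100 + 20 * np.sin(2 * np.pi * hours / 24) + 0.1 * hours

predicted = naive_forecast(demand)   # forecasts for day 2 onwards
actual = demand[24:]                 # matching observations
naive_error = rmse(actual, predicted)
print(naive_error)
```

Because the daily cycle repeats exactly in this synthetic series, the naive model only misses the trend, which amounts to 2.4 kW per day. Any more elaborate model would have to beat this baseline to be worth its complexity.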
Operational Optimization is realized with the help of a given mixed-integer linear programming (MILP) optimization model. The goal is to obtain optimized operation plans based either on historic heat demand data as an idealized forecast or on imperfect heat demand predictions. The operation plan obtained with the historic heat demand data will not encounter any difficulties when put into practice. Therefore, its benefits can be determined directly. The operation plan based on imperfect heat demand predictions, however, is faced with the problem that the heat demand assumed during optimization does not match the actual heat demand during realization. Hence, the latter operation plan has to be simulated taking actual heat demand into account.
Simulation of Operation is added to the workflow in order to reveal any issues occurring while realizing the operation plan. The mismatch between heat demand prediction and actual heat demand results in time periods of over- and underproduction. If the connected energy system is not able to cope with the excess heat or the heat deficit, the operation plan needs to be modified. Such modifications might be unexpected shut-downs or ramp-ups of the CHP plant, which lead to e.g. additional fuel consumption, higher energy losses, inefficient operation or increased wear.
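How a simulation of operation can expose such interventions is sketched below with a deliberately simple storage balance. This is not the project's simulation model: the function name `simulate_operation`, the lossless storage, the hourly energy values and the two intervention types are assumptions for illustration.

```python
def simulate_operation(planned_heat, actual_demand, capacity, initial_level=0.0):
    """Step through the plan and log hours that require an intervention."""
    level = initial_level
    interventions = []
    for hour, (gen, dem) in enumerate(zip(planned_heat, actual_demand)):
        level += gen - dem
        if level > capacity:          # excess heat cannot be stored
            interventions.append((hour, "curtail", level - capacity))
            level = capacity
        elif level < 0.0:             # storage empty, demand unmet
            interventions.append((hour, "boost", -level))
            level = 0.0
    return interventions

planned = [50, 50, 50, 0, 0, 0]        # kWh of heat per hour from the operation plan
predicted = [20, 20, 20, 30, 30, 30]   # demand assumed during optimization
actual = [10, 10, 20, 30, 40, 50]      # demand observed during realization

ideal = simulate_operation(planned, predicted, capacity=90)
real = simulate_operation(planned, actual, capacity=90)
print(ideal)
print(real)
```

With the predicted demand the plan uses the storage exactly up to its limit and needs no intervention. With the actual demand the same plan first overfills the storage and later runs it empty, which is the kind of complication that a restricted solution space is meant to avoid.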

PW5
Tracking CO2 emissions from power generation in high spatial and temporal resolution - Case study for the

Summary
Until today, it has not been possible to allocate the CO2 emissions in the German electricity system to a specific region and to the electricity demand that caused them. This paper presents a new energy system model and uses established methods for answering this question. A detailed bottom-up model of the German electricity system is built to represent the system in 2019 with high spatial and temporal resolution. In combination with a customized input-output analysis, the individual emissions can be traced from the producer to the consumer that caused them. The analysis demonstrates the importance of considering spatial and temporal effects as well as electricity exchanges between regions in estimating emissions footprints.
Keywords: Electricity System Simulation; Power Flow; Input-Output Model; Grid Usage; CO2 Emission; CO2 Tracking; Carbon Accounting

Introduction
Nowadays, tracking of CO2 emissions is of great interest [1]. This is not only caused by national regulations [2,3] but also driven by public awareness and the interest of industries and individuals in becoming more environmentally friendly [4,5]. Tracking of emissions refers to the idea of checking how much CO2 or other greenhouse gases are produced by a certain activity at a particular point in time. With around 35 %, the energy industry represents the largest CO2 emitter in Germany [6]. Therefore, the climate impact (CO2 and other greenhouse gas emissions) associated with the production and consumption of electricity should be carefully monitored. It is important to correctly allocate where, when and from what source emissions in the electricity sector are emitted. In the electricity sector, this is a challenging task, as generation and consumption are interconnected via transmission networks. These networks can connect regions located far from each other, so that production and consumption take place in different locations. Therefore, consumption-based emissions in a region are not only determined by local electricity generation, but also by possible imports from other regions. This will become even more difficult in the future, since our energy system will become more and more linked to different sectors and regions [7,8]. Due to the spatial and temporal distribution of generation and consumption, these imports and the associated electricity flows often represent a critical factor in the emissions associated with local electricity consumption.
Methodologically, these consumption-based emissions can be determined using multi-regional input-output models [9]. To obtain useful results at a local level, load, generation and flow data with high spatial and temporal resolution are required. Unfortunately, this data is only available to the public with limited resolution or not available at all. To compensate for this, available data must be extrapolated, and model-based analyses must be used. In this study, a model is developed that allows emission tracking with high spatial and temporal resolution for the German electricity system.

Research questions
The goal of this work is to develop a model that represents the German electricity sector for the year 2019. The spatial and temporal resolution of power production and consumption is a significant part of this work. In order to track emissions, production and consumption data with a high spatial and temporal resolution will be applied. The spatial resolution is determined by the number of network nodes (number of substations). The temporal resolution of the model is limited by the available load and generation time series data. As an example, we use power plant generation time series with an hourly resolution published by ENTSO-E [10] where available. Missing power plant production time series will be modeled using an optimization approach. The production and consumption data obtained will later be used to model the historical load flows in the network. Finally, the historical emissions in the system will be tracked using a flow-based tracing method. Consequently, the following research questions guide the model and the approaches presented in this work:
· Is it possible to represent the German electricity system in a model, including the high voltage (HV) grid, conventional power plants, renewable power plants, and production and consumption data at substation level, with a sufficient degree of accuracy using only public open data?
· Is it possible to backcast power flows for the year 2019 with the publicly available data?
· Based on the built model and the derived power flows, is it possible to track CO2 emissions and gain reasonable information about their causation in the German electricity system?

Methodology
To analyse the impact of a specific generator or consumer on the power system, Bialek et al. introduced the flow tracing method [11]. The method is based on solving linear equations that take the inflow and outflow pattern, as well as the electrical grid topology, into account. Tranberg et al. used the flow tracing approach to allocate carbon emissions in European electricity trade, but only at the national level. In order to allocate the emissions with higher geographical resolution, we need flow patterns within a single country. Power flow simulations can be used to calculate the flow patterns on a given network. Combining the economic dispatch with network constraints results in an optimal power flow (OPF) problem, which is usually solved because generation data are missing. Most models rely on simplified linear OPF (LOPF) simulations. This means that the generation time series is generated by the model (at lowest cost, subject to line constraints and a linearization of the full non-linear power flow). In order to generate more realistic results, we added the available unit-wise generation data published by ENTSO-E [10]. The remaining dispatch is then approximated by the model using an LOPF simulation. The power flows are modeled in the Python for Power System Analysis (PyPSA) environment [12]. The emission tracing method used (real-time carbon accounting) traces flow patterns from generators to consumers and takes the underlying network topology into account [13]. The method follows power flows in electrical networks and draws the path connecting the location of generation with the location of consumption. Mathematically, each technology at each node is assigned a color. For each hour, nodal production and imported flows to the node are assumed to mix evenly. The resulting color mix determines the mix of the power generation serving the demand of each node. The applied carbon emission intensities for each generation technology are derived from the ecoinvent database to provide an accurate average intensity per MWh produced, covering construction and operation [14].

Fig. 1 (abstract PW4). Methodology depicted in comparison with an idealized evaluation of operational optimization. Simulation of operation will reveal complications induced by prediction errors, indicated with red dots and highlighted by an arrow. The scope of this work is to analyse whether reacting on occurrence or avoiding conflicting situations beforehand has a smaller impact on operation.

Energy system model
The network model topology is mainly based on the ENTSO-E Grid Map available online [15]. The hourly demand time series published on the ENTSO-E transparency website [16] was processed using a heuristic allocation method based on gross domestic product and population data to assign the national demand data to the respective buses. Power plant data (conventional as well as renewable) was collected mainly from two sources (powerplantmatching [17] and the Marktstammdatenregister [18]) and added to the model. The corresponding production time series were obtained for some of the conventional power plants from the ENTSO-E website [10]. The renewable generation is modeled based on the ERA5 reanalysis weather data by the European Centre for Medium-Range Weather Forecasts [19]. The energy system model is built and simulated using the PyPSA environment [12]. In a first step, the model is validated using an LOPF simulation of the energy system model for the year 2019. It is checked whether the model with all elements (lines, buses, generation units and demand) can be solved without overloading the lines (line loading is kept below 70 % to approximate the N-1 criterion). In a second step, the simulation results are compared with reported historical data. For this purpose, the Cross-Border Physical Flows [20] and the Actual Generation per Production Type [21] can be used as reference values.
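The even-mixing assumption described under Methodology leads to one linear system per hour, which can be illustrated with a minimal sketch. This is not the authors' implementation and does not use PyPSA; the function name `trace_mix`, the three-node network and the emission intensities are hypothetical and chosen only for illustration (the intensities are not ecoinvent values). The sketch also assumes that every node has a positive total inflow, otherwise the system is singular.

```python
import numpy as np


def trace_mix(generation, flows):
    """Return the technology mix at every node for one hour.

    generation[n, a]: MW produced at node n by technology a.
    flows[m, n]:      MW flowing from node m to node n (non-negative).
    Assumes every node has a positive total inflow.
    """
    generation = np.asarray(generation, dtype=float)
    flows = np.asarray(flows, dtype=float)
    # Total power entering each node: local production plus imports.
    inflow = generation.sum(axis=1) + flows.sum(axis=0)
    # Even mixing at node n:
    #   inflow[n] * mix[n] - sum_m flows[m, n] * mix[m] = generation[n]
    system = np.diag(inflow) - flows.T
    return np.linalg.solve(system, generation)


# Hypothetical 3-node line: node 0 (coal) -> node 1 (wind) -> node 2 (load only).
generation = np.array([[100.0, 0.0],   # columns: coal, wind
                       [0.0, 40.0],
                       [0.0, 0.0]])
flows = np.zeros((3, 3))
flows[0, 1] = 60.0   # node 0 exports 60 MW to node 1
flows[1, 2] = 50.0   # node 1 exports 50 MW to node 2

mix = trace_mix(generation, flows)

# Illustrative intensities in kg CO2 per MWh (assumed, NOT ecoinvent values).
intensity = np.array([1000.0, 10.0])
nodal_intensity = mix @ intensity
```

In this toy case, nodes 1 and 2 both receive a mix of 60 % coal and 40 % wind, so the power consumed there carries 604 kg CO2 per MWh under the assumed intensities.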

Conclusion and outlook
An approach to create a model that can represent the German electricity system in its current state was presented. The integration of published power plant dispatch data into the model constitutes an innovation in the field of energy system modeling. The resulting model with 443 lines, 337 buses and a set of generators can serve as a basis for further investigations using power flow methods (PF, LPF and LOPF) and allocation methods for emissions tracking. In a next step, the developed model will be used in combination with the presented real-time emission allocation method to investigate the location and pathways of emissions in the German electricity system. Currently, the model represents only Germany and the neighboring countries. As an extension, the geographical coverage of the model could be enlarged to the European scale.
Keywords: NILM; Non-Intrusive Load Monitoring; disaggregation; sampling rate; data characteristics; comparability

Introduction / Motivation
Providing detailed information on loads in a home can enable energy savings [1]. Load monitoring provides such fine-grained information. Non-intrusive load monitoring (NILM) methods gain the information through the analysis of whole-building consumption data and its decomposition into the constituting device consumptions [2]. With the advancement of digitalisation and rising climate concerns, the field has seen many advances in recent years. Aggregated consumption traces have been confirmed to be highly heterogeneous in aspects such as noise, consumption amplitudes, sampling rates, or device classes (compare e.g. [3,4,5]). However, knowledge regarding the impact of differences in the input data, which will henceforth be called Data Characteristic Differences (DCDs), is currently shallow and scattered across multiple publications. Nonetheless, DCDs were found to impede the generalisability and comparability of current research [3,4,6,7]. A better understanding of DCDs and their impact on disaggregation results could alleviate these impediments. Therefore, this PhD project is set to answer one central question: Which consumption data characteristics can differ between data collections, and which impacts on the load disaggregation process arise from these differences? After exploring the current knowledge regarding DCDs in the State of the Art, this abstract will present the methodological approach as a Proposal to Overcome Current Impediments and conclude with the expected contributions in Conclusion and Outlook.

State of the Art
The high variety of differences in consumption data has been found to impede data set interoperability and the comparison of results across data sets in multiple evaluations [3,4,6,7]. The differences can be categorised into structural differences, concerning file and data organisation, and specifics of the data characteristics. While a number of works have considered methods to overcome structural differences (e.g. [5,8,9]), research concerning DCDs and their impact on the disaggregation process is sparse. Surveys have shown a relation between appliance type and disaggregation complexity as early as 2012 [2], but many DCDs are merely identified, without further consideration, when NILM algorithms are proposed or their usage in new environments is evaluated (compare e.g. [10,11,12]). The theoretical evaluation of disaggregation processes has provided broader sets of DCDs. In [13] the authors consider the impact of noise, consumption amplitude and the sampling rate. A complexity measure based on device and time sequence characteristics is derived in [14]. However, both works consider simplified ideal disaggregation processes, making their transfer to real-world effects difficult.
The only data characteristic that has, to the best of our knowledge, been further evaluated is the temporal resolution. While the authors in [15,16] evaluated very low sampling frequencies with regard to privacy, the first work in this project [17] evaluated data collected at frequencies on the order of kilohertz.
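To make the role of temporal resolution as a data characteristic concrete, the following minimal sketch lowers the sampling rate of a consumption trace by averaging non-overlapping blocks. Block averaging is an assumption made here for illustration and is not necessarily the resampling used in [15,16,17]; the function name `downsample_mean` and the example trace are hypothetical.

```python
import numpy as np


def downsample_mean(power, factor):
    """Reduce the sampling rate by an integer factor via block averaging."""
    power = np.asarray(power, dtype=float)
    n_blocks = len(power) // factor  # an incomplete trailing block is dropped
    return power[: n_blocks * factor].reshape(n_blocks, factor).mean(axis=1)


# Hypothetical 1 Hz trace: a 3-second, 2000 W event within 20 seconds of data.
trace_1hz = np.zeros(20)
trace_1hz[4:7] = 2000.0

# One averaged value per 10 seconds.
trace_low = downsample_mean(trace_1hz, 10)
```

In this example the 2000 W event appears as a single 600 W sample at the lower rate, which illustrates how short events can be smeared when the temporal resolution decreases.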

Proposal to Overcome Current Impediments
To complement the existing knowledge, this work proposes a process akin to a sensitivity analysis: an identification and controlled variation of input parameters, the DCDs, and evaluation of changes in the output, the disaggregation results. The methodology is outlined in the following paragraphs, reflecting four main steps:
1. Identification of input data characteristics and their value ranges
2. Determination of adaptation possibilities
3. Evaluation of DCD impacts on the disaggregation results
4. Verification of the findings

Identification of input data characteristics and their value ranges
An extensive literature review will be conducted to compile a comprehensive set of DCDs, as even the more detailed evaluations consider differing sets of characteristics (compare [6,7,13,14]). Additionally, the literature review should provide a metric to quantify the characteristic's value in a given set of data, and determine a value range for each DCD, both of which are mandatory for the further process. However, DCDs are not necessarily measured methodically in the works that present them. Thus, the compilation of metrics and value ranges will extend beyond the literature review to assess suitable metrics or evaluate value ranges based on data from public data sets when the information cannot be found in the related work.

Determination of adaptation possibilities
Based on the created compilation of DCDs and their value ranges, a strategy for the value variation of each characteristic can be developed. However, determining possibilities to vary DCD values will face the challenge of interdependencies. An ideal variation would influence only the characteristic under consideration. Interdependencies complicate the variation and are expected for device-specific characteristics, such as the consumption amplitude. The development of adaptation strategies will identify the interdependencies and acknowledge them in the developed adaptation processes.
Additionally, a sequence of evaluations will be created in order to evaluate the most independent characteristics first.

Evaluation of DCD impacts on the disaggregation results
In general, the impact of a DCD's value will be evaluated based on the incurred changes in the disaggregation results, measured through suitable metrics. However, the evaluations must enable fair and generalisable comparisons. While fixing evaluation conditions allows for a fair comparison, setting or identifying generalisable evaluation conditions is not trivial. Analysing impacts on highly simplified artificial data allows complete control of influences. However, the generalisability of such an evaluation remains questionable, as complexity and interdependencies are reduced or excluded. Additionally, choosing a suitable disaggregation method as the model under evaluation is non-trivial because, so far, no best disaggregation approach is known. To allow generalisation across models, the proposed evaluations will use multiple state-of-the-art disaggregation methods and evaluate the results for all of them. To allow for the generalisation to realistic data, evaluations will be conducted on multiple environments. An environment, in this context, is a set of training and testing data. To enable generalisable findings, each evaluation will include at least two environments: one with either highly simplified or artificially created data and one including excerpts from multiple publicly available data sets. These excerpts must be analysed beforehand regarding the values of the other DCDs they include and their possible interdependencies, as noted during the development of adaptation strategies. In combination, the evaluation will make it possible to generalise identified trends, specify the impact of the DCD under review, and confirm interdependencies between DCDs.
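The evaluation loop described above (one DCD varied in a controlled way, several disaggregation methods, several environments) can be sketched as follows. Everything in this sketch is a hypothetical stand-in: the varied DCD is additive Gaussian noise, both environments are artificial, and the two threshold-based "methods" are toy placeholders rather than state-of-the-art NILM algorithms. The names `make_environment`, `threshold_method`, `smoothed_threshold_method` and `sensitivity` are assumptions introduced for illustration only.

```python
import numpy as np

FRIDGE_W = 100.0  # assumed power level of the target appliance in watts


def make_environment(n_samples, kettle_w, seed):
    """Artificial environment: a cycling fridge plus optional kettle bursts."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    fridge = np.where((t // 30) % 2 == 0, FRIDGE_W, 0.0)           # target device
    kettle = np.where(rng.random(n_samples) < 0.05, kettle_w, 0.0)  # interference
    return fridge, fridge + kettle


def threshold_method(aggregate):
    """Toy disaggregator: the fridge is on whenever the aggregate is high."""
    return np.where(aggregate >= FRIDGE_W / 2, FRIDGE_W, 0.0)


def smoothed_threshold_method(aggregate, window=5):
    """Toy disaggregator: moving-average smoothing before thresholding."""
    smooth = np.convolve(aggregate, np.ones(window) / window, mode="same")
    return np.where(smooth >= FRIDGE_W / 2, FRIDGE_W, 0.0)


def sensitivity(noise_levels, environments, methods, seed=0):
    """Mean absolute error for every (environment, method, noise level)."""
    rng = np.random.default_rng(seed)
    results = {}
    for env_name, (target, aggregate) in environments.items():
        for method_name, method in methods.items():
            for sigma in noise_levels:
                # Controlled variation of the DCD under review (noise only).
                noisy = aggregate + rng.normal(0.0, sigma, size=aggregate.shape)
                error = np.abs(method(noisy) - target)
                results[(env_name, method_name, sigma)] = float(error.mean())
    return results


environments = {
    "simple": make_environment(600, kettle_w=0.0, seed=1),
    "mixed": make_environment(600, kettle_w=2000.0, seed=2),
}
methods = {
    "threshold": threshold_method,
    "smoothed": smoothed_threshold_method,
}
results = sensitivity([0.0, 20.0, 200.0], environments, methods)
```

Keeping the results keyed by environment, method and DCD value makes it straightforward to check whether a trend observed in the simplified environment also appears in the more complex one and across methods.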
Verification of the findings
To verify the identified impacts, this work proposes the analysis of unseen data from distinct sources regarding their specific data characteristics. Based on this analysis and the collected findings, predictions regarding the disaggregation process will be made and subsequently compared to the achieved disaggregation results. Confirmation of the predictions will serve as verification for the conclusions drawn during the evaluation and confirm their transferability to new consumption traces.

Conclusion and Outlook
In conclusion, we intend to extensively evaluate Data Characteristic Differences (DCDs) and their impact on the disaggregation process to create generalisable findings. The evaluation will provide three major contributions:
· Accumulation of current knowledge regarding DCDs
· Provision of metrics and value ranges to characterise each DCD
· Empirical evaluation of the DCDs' impact on disaggregation results
Additionally, we expect the conducted evaluation and improved understanding of DCDs to improve future research possibilities. The provision of a complete set of data characteristics and their impact is expected to allow for the creation of more meaningful benchmarks and a better model of the disaggregation process, to provide important insights for privacy considerations (e.g. providing information on which characteristics are most identifying for devices) and for the creation of artificial consumption data, and to allow the creation of systems better adapted to their specific environments or requirements. Furthermore, it will allow a more precise assessment of data sets and current challenges for disaggregation methods.