Open modeling of electricity and heat demand curves for all residential buildings in Germany

Energy system modeling has been following the energy transition to investigate challenges and opportunities of future energy systems on all grid levels. Residential electricity and heat demand curves are a necessary input for sector-coupled energy system models. The increasing importance of distribution grids and their modeling requires demand profiles in high spatial resolution. This paper presents a method to assign pre-generated electricity and heat demand curves to georeferenced residential buildings in Germany. We aim to overcome fundamental shortcomings of the Standard Load Profiles and to enable new possibilities for the modeling of distribution grids. Our approach provides a large variety of residential load profiles which spatially correspond to official socio-demographic data. All input data sets, the implemented methodology and the resulting profiles are publicly available under open source and open data licences to enable further use. Our results are validated on different aggregation levels, compared with the commonly used Standard Load Profiles, and discussed.

buildings are needed. Aggregated building demands need to be available to model superordinate distribution and transmission grids.
The present publication focuses on methods and input data to assign electricity and heat demand curves to all georeferenced residential buildings in Germany, which can be used, e.g., for sample grids in energy system modeling. Taking socio-demographic data into account, the method is based on a bottom-up random selection of individual demand profiles from a pre-generated pool of synthetic profiles. The electricity demand excludes sector-coupling consumers such as heat pumps or electric vehicles. The heat demand covers drinking hot water and space heating. All results presented in this work apply to a scenario for the year 2035 based on scenario C 2035 of the German network development plan (Übertragungsnetzbetreiber 2021).

State of the art
The majority of existing methods for modeling residential electricity and heat demands are either designed for large-scale energy systems with low spatial resolution (e.g. national level) or small systems with high spatial resolution (e.g. single house).
In grid planning and large-scale modeling of the German energy system, Standard Load Profiles (SLPs) for electricity and gas are widely used (e.g. in Gotzens et al. (2020); Ruhnau et al. (2020); Übertragungsnetzbetreiber (2021)). Due to their design, gas SLPs can be used as heat demand profiles (BDEW 2021). SLPs are representative profiles derived from historical data. The methods are described in detail in VDEW (1985) (electricity) and BDEW (2021) (gas/heat). The SLPs for electricity were derived from measurements in Germany in the 1980s. However, user behavior and electronic devices have changed over the years, which is why the SLPs differ from current electricity demand profiles (Bruckmeier et al. 2017). Another major issue associated with SLPs is their lack of variance, i.e. only one gas load profile is included for different weekdays, seasons and building types. In addition, SLPs are designed to represent multiple households (HHs). According to Kerber (2011), SLPs are suitable for modeling the electricity demand of more than 400 HHs, but not applicable to models with a high spatial resolution, as peak demands of electricity and heat are underestimated.
A representative electricity and heat demand profile for a single HH can be derived from VDI 4655 (Dubiezig et al. 2008). These profiles include higher peak loads needed for modeling single houses. However, applying these profiles to multiple HHs would result in an overestimation of simultaneity and thus peak loads on higher aggregation levels. In addition, the method and data are not publicly available.
An alternative to representative profiles are measured profiles. The amount of measured electricity and heat demand data for private HHs is growing due to the increasing penetration of smart meters. Nevertheless, publicly available data sets, e.g. (Tjaden et al. 2015; Schlemminger et al. 2022), include fewer than 100 HHs, which is still not sufficient for large-scale energy models.
To increase the number of available profiles for HHs, researchers have created tools for generating synthetic bottom-up profiles. Some load profile generators create either electricity (Von Appen et al. 2014) or heat demand profiles (Fischer et al. 2016), while others model the demands of both sectors (Drauz 2016; Pflugradt 2021; Lombardi et al. 2019). In addition to these conventional statistical models, AI-based approaches exist (Antonopoulos et al. 2020; Zhang et al. 2021; Li and Yao 2020; Cai et al. 2019). In theory, all of these tools could be used to create profiles for every HH in a large-scale energy system. However, load profile generators depend on various input parameters that are not available for every HH. Moreover, load patterns are determined by socio-demographic and socio-economic contexts. As the methods of Von Appen et al. (2014) and Drauz (2016) are based on HH types in line with the data given by the "Census" used in this paper, we make use of these load profile generators. The accuracy of the synthetic profiles of single HHs is demonstrated in the corresponding articles.
In summary, existing methods for residential electricity and heat demand profiles are not suitable for load flow modeling that is both country-wide and geographically highly resolved. They focus on either a high or a low spatial resolution and are not always publicly available. This study aims to close this gap by making use of open source tools and open data.

Input data sets
Census data on society and buildings for the year 2011 (Statistisches Bundesamt 2011a) is used for spatial disaggregation. The original data was statistically extrapolated from HH surveys and enriched with other demographic data. We use the data sets on population, HHs, family types and buildings, which provide data at a resolution of 100 m × 100 m (Census cells). For confidentiality reasons, cells may not hold all of these parameters, or data may be modified. The extent of the deviation from the original values is provided, which allows users to exclude biased data. However, we observe further inconsistencies in the filtered data, which we examine in "Census data preparation".
OpenStreetMap data (Geofabrik 2021) is used to assign electricity and heat demands to individual buildings. There have been various assessments of the quality and completeness of building data in Germany (e.g. Kunze 2012; Hecht et al. 2013; Fan et al. 2014), showing that the spatial and semantic accuracy varies among regions and subjects.
In the topology of OpenStreetMap (OSM), tags are used to store metadata on objects. In line with the focus of this paper, we extract buildings which we expect to have residential electrical and/or thermal demand. Accounting for 77.5%, the 'building' tag value 'yes' predominates among the 32.3 million mapped buildings in Germany. Although this value also covers various other uses, we include buildings holding it, since a visual comparison with aerial imagery showed a considerable degree of agreement with ground-truth residential structures. Besides this tag value, we include the tag values 'house' (7.1%), 'residential' (3.3%), 'apartments' (1.7%), 'detached' (0.8%), 'semidetached_house' (0.2%), and 'farm' (0.1%), resulting in a set of 29.3 million buildings in total.
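The tag-based selection can be illustrated with a short sketch. It assumes that the OSM building tags have already been parsed into one Python dict per building (the parsing of the Geofabrik extract is not shown), and the function names `filter_residential` and `tag_shares` are hypothetical. The final lines only cross-check the shares quoted above: they add up to 90.7%, i.e. about 29.3 million of the 32.3 million mapped buildings.

```python
from collections import Counter

# 'building' tag values kept as (potentially) residential, as listed in the text
RESIDENTIAL_TAGS = {
    "yes", "house", "residential", "apartments",
    "detached", "semidetached_house", "farm",
}


def filter_residential(buildings):
    """Keep buildings whose 'building' tag value is in RESIDENTIAL_TAGS.

    `buildings` is assumed to be an iterable of dicts holding the OSM tags of
    one building each, e.g. {"building": "house", "addr:city": "Flensburg"}.
    """
    return [b for b in buildings if b.get("building") in RESIDENTIAL_TAGS]


def tag_shares(buildings):
    """Share (0..1) of each 'building' tag value among all buildings."""
    counts = Counter(b.get("building") for b in buildings)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Cross-check of the shares reported in the text (percent of 32.3 M buildings)
REPORTED_SHARES = {"yes": 77.5, "house": 7.1, "residential": 3.3,
                   "apartments": 1.7, "detached": 0.8,
                   "semidetached_house": 0.2, "farm": 0.1}
kept_share = sum(REPORTED_SHARES.values()) / 100
print(f"kept share {kept_share:.3f} -> {kept_share * 32.3:.1f} M buildings")
```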
Weather data is required for the assignment of heat demand profiles. We use temperature data from ERA5 hourly data on single levels from 1979 to present (Hersbach et al. 2018) with a spatial resolution of 0.25° (about 15-20 km). Following Gerhardt et al. (2017), the year 2011 is selected as a representative weather year. To reduce the method's complexity and computational time, the climate zones described in Federal Office (2014) are used to assign weather data to different regions.
Individual electricity demand profiles were pre-generated using a statistical bottom-up approach developed in Von Appen et al. (2014) and Drauz (2016) and have been made available for this work. They comprise 12 HH types that differ in member count, type (adults, children and elderly people) and composition (singles, couples and multi-households) (Von Appen et al. 2014). Typical appliances, their electrical demands and their usage were assigned to each HH type. By varying the times of the appliances' (de)activation for each day of the week (randomly or normally distributed), individual HH profiles were generated for the year 2016. We use a set of 99,600 different pre-generated profiles published as open data, 8300 for each HH type.
Individual heat demand profiles were derived using the load profile generator developed in Drauz (2016), which can create combined electricity, space heating and drinking hot water load profiles for individual HHs based on various input parameters. Annual demand profiles were generated and published for selected locations in Germany for the year 2011 and subsequently matched to the temperature curves at each HH, as described in "Heat". A set of the generated demand profiles was published under open data licences. The published data includes 1259 heat demand profiles for Single Family Houses (SFHs) and Multi Family Houses (MFHs), considering different building classes and HH types.
Annual electricity demand is distributed at Nomenclature des unités territoriales statistiques (NUTS) 3-level using the disaggregator tool created in the DemandRegio project (Gotzens et al. 2020). This tool includes various options to distribute future electricity demands. For our application, the bottom-up method was selected, which derives future electricity demands from a forecast of the future population distribution and annual electricity demands per HH type. Since the final results should be comparable to scenario C 2035 from Übertragungsnetzbetreiber (2021), the distribution of electrical HH demands is scaled to meet the overall annual electricity demand of 119 TWh.
Annual heat demand is acquired from the Pan-European Thermal Atlas (Peta) version 5.0.1 (Europa-Universität Flensburg et al. 2021), which distributes space heating and drinking hot water demands for the year 2015 to Census cells. Census cells with a residential heat demand but no Census data on population are dropped, and the missing heat demand is distributed linearly over the remaining cells. Conversely, only 91.8% of populated Census cells have a residential heat demand in Peta. As the heat demand will decrease in the future due to retrofitting, the historic heat demand from Peta is scaled to the total heat demand of the scenario Zielszenario in Prognos (2014). The heat demand of 379 TWh for the year 2035 is calculated by linear interpolation between the values for 2030 and 2050.
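The two arithmetic steps in this paragraph, linear interpolation of the scenario total and proportional scaling of the per-cell demands, can be sketched as follows. The 2030 and 2050 support values below are placeholders, not the Prognos (2014) figures, and the cell identifiers and function names are hypothetical.

```python
def interpolate_linear(year, year0, value0, year1, value1):
    """Linear interpolation of a scenario value between two support years."""
    return value0 + (value1 - value0) * (year - year0) / (year1 - year0)


def scale_to_total(demands, target_total):
    """Scale per-cell annual demands proportionally so they sum to target_total."""
    factor = target_total / sum(demands.values())
    return {cell: value * factor for cell, value in demands.items()}


# Placeholder support values in TWh (NOT the Prognos (2014) figures)
target_2035 = interpolate_linear(2035, 2030, 400.0, 2050, 320.0)
scaled = scale_to_total({"cell_a": 2.0, "cell_b": 6.0}, target_2035)
print(target_2035, scaled)
```

Proportional scaling keeps the relative spatial distribution from Peta unchanged and only adjusts its level to the scenario total.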

Methods
Individual heat and electricity profiles are assigned to buildings. The criteria for the distribution of these profile types differ, as heat profiles exist at building level whereas electricity profiles exist at HH level. Different patterns of actual electricity and heat demands, e.g. the strong seasonal dependency of heat demand, and the availability of input data resulted in different requirements for the presented methodology. First, electricity profiles are assigned to HHs per Census cell and thereafter to specific buildings. Mismatches between Census and OSM building data required adding synthetic buildings, which are an additional output of this part of the methodology. Subsequently, the heat profiles are assigned to all OSM and synthetic buildings on the basis of aggregated HHs.
A flowchart of the general workflow split into the two sections electricity and heat is given in Fig. 1. The respective process steps are indexed and referenced in the following description.

Electricity
As introduced in "Individual electricity demand profiles", the pool of given electricity demand time series offers 12 different HH profile categories. The Census HH data per hectare (Statistisches Bundesamt 2011a), however, only allows a distinction between five categories (cf. 'hh_5type' in Table 1), as information on HH composition is not provided. To enable more diverse categories, the Census data is complemented by another Census data set with additional information on the age and number of children in the HHs (Statistisches Bundesamt 2011b). In this way, we can refine the distinction to ten categories (cf. 'hh_10type' in Table 1). The supplementary Census data exists at NUTS 1-level and is used to derive an individual relative distribution for every federal state. The Census HH data is then used to allocate the absolute number of HHs per Census cell. Beforehand, both data sets need to be processed separately, grouped, and completed using imputation methods. Table 1 also contains the assumed number of adult residents per HH category, which we use to align the data sets, as one describes HHs per Census cell and the other describes residents in HHs at NUTS 1-level. After preparation and conversion, the data sets are used to assign electricity profiles to HHs in Census cells. The annual demand of the profiles is scaled with the adjusted DemandRegio projections at NUTS 3-level. Subsequently, the profiles are randomly assigned to residential buildings obtained from OSM within the Census cells. In the case of missing buildings, synthetic ones are generated to enable the assignment of all generated profiles. The individual process steps are described in more detail in the following.

Census data preparation (e1-e4)
The supplementary Census data set used in (e1) of Fig. 1 contains information on people living in HHs by family type, size and age per NUTS 1-region (Statistisches Bundesamt 2011b). It has to be processed and regrouped to be compatible with the individual demand profiles and their HH types. As the supplementary Census data does not provide enough information to differentiate multi-households with different numbers of children, this cluster is represented by the adult and retiree categories only (cf. 'hh_10type' in Table 1). Age groups are merged into three groups: children (< 15 years), adults (15-64 years) and retirees (> 64 years). Children are excluded, as done in Von Appen et al. (2014), where Eurostat (2021) was used as the basis for the HH compositions. Finally, the number of people in each of the ten HH categories differentiated here (cf. 'hh_10type' in Table 1) is derived. The numbers obtained in this way represent the people living in HHs, not the number of HHs.
The actual number of HHs is obtained in step (e2) by dividing the number of people per HH type by the average number of residents in the respective HH type. As children are not included, the mapping of pair and single HHs is trivial (cf. Residents in Table 1). The average number of residents per multi-household is approximated by the O0-factor as weighted average of the people that do not live in single or pair HHs according to Statistisches Bundesamt (2011a).
After the conversion, we calculate the relative distribution of 'hh_10type' within a cluster in (e3) (cf. Table 1). In later steps the values are used as weighting factor in a sampling process within the HH type clusters.
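Steps (e2) and (e3) amount to a division by the residents per HH type followed by a normalization within each hh_5type cluster. The following is a minimal sketch under stated assumptions: the type labels, the cluster mapping and the size counts used for the multi-household factor are hypothetical placeholders, not the contents of Table 1 or the actual O0-factor.

```python
from collections import defaultdict


def mean_household_size(size_counts):
    """Weighted mean number of residents per HH from {size: number of HHs}."""
    return sum(size * n for size, n in size_counts.items()) / sum(size_counts.values())


# Hypothetical stand-in for the multi-household factor (placeholder counts)
MULTI_RESIDENTS = mean_household_size({3: 60, 4: 30, 5: 10})  # = 3.5

# Hypothetical excerpt of Table 1: hh_10type -> (hh_5type cluster, adult residents)
HH_TYPES = {
    "single_adult": ("single", 1.0),
    "single_retiree": ("single", 1.0),
    "couple_adult": ("couple", 2.0),
    "couple_retiree": ("couple", 2.0),
    "multi_adult": ("multi", MULTI_RESIDENTS),
}


def households_from_persons(persons):
    """(e2) People per hh_10type -> number of HHs per hh_10type."""
    return {t: n / HH_TYPES[t][1] for t, n in persons.items()}


def shares_within_clusters(households):
    """(e3) Relative distribution of the hh_10types within each hh_5type cluster."""
    cluster_totals = defaultdict(float)
    for t, n in households.items():
        cluster_totals[HH_TYPES[t][0]] += n
    return {t: n / cluster_totals[HH_TYPES[t][0]] for t, n in households.items()}
```

For example, 400 adults living in couple HHs without retirees correspond to 200 HHs. The resulting shares sum to 1 within every cluster and can therefore serve directly as sampling weights.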
The Census HH data at cell level (Statistisches Bundesamt 2011a) contains spatial information on HHs by family type or by HH size, depending on the filtered attribute. Table 2 quantifies the attribute coverage in the Census data set. For confidentiality reasons, the data exhibits gaps and discrepancies between the different attributes (Destatis 2018). Census cells with a confidential population value are excluded, resulting in about 3 million Census cells. In total, 71.9% of these cells hold data on people living in HHs by family type. Comparing the total number of HHs and HH types per cell, we encounter minor differences due to the confidentiality measures taken (not shown in  (3-7) which is reflected by the comparatively low population share shown in Table 2. We use the average distribution of HH types for cells with the same number of HHs. The average share is multiplied by the total number of HHs and rounded to the nearest integer. If the sum of the rounded HH type counts differs from the total number of HHs, the difference is added to or subtracted from a randomly chosen HH type to retain the total number of HHs per cell. For 16.3% of the populated cells, no information on HH number or distribution is available. In these cases, a HH type distribution is randomly drawn from cells with the same population value. If there are no cells with HH data for the respective population value, the subgroup with the next smaller population value is used.
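The two imputation rules of (e4) can be sketched with the standard library. This is an illustration under assumptions, not the published implementation: all function names are hypothetical, rounding is half-up, and the rounding difference is corrected one HH at a time on randomly chosen types. This unit-step variant of "added to or subtracted from a randomly chosen HH type" keeps every count non-negative.

```python
import random


def average_shares(reference):
    """Mean share of each HH type over a list of {hh_type: count} dicts."""
    types = sorted({t for counts in reference for t in counts})
    shares = dict.fromkeys(types, 0.0)
    for counts in reference:
        total = sum(counts.values())
        for t in types:
            shares[t] += counts.get(t, 0) / total / len(reference)
    return shares


def impute_from_total(n_hh, reference, rng):
    """Cell with a known HH total but missing HH types.

    `reference` holds the type counts of complete cells with the same n_hh.
    """
    counts = {t: int(s * n_hh + 0.5) for t, s in average_shares(reference).items()}
    diff = n_hh - sum(counts.values())
    # Correct the rounding error on randomly chosen types, one HH at a time,
    # so that no count becomes negative.
    while diff != 0:
        step = 1 if diff > 0 else -1
        candidates = [t for t, v in counts.items() if step > 0 or v > 0]
        counts[rng.choice(candidates)] += step
        diff -= step
    return counts


def draw_from_population(pop, donors_by_pop, rng):
    """Cell without HH data: copy the HH type counts of a random donor cell with
    the same population, else with the next smaller population that has donors."""
    smaller_or_equal = [p for p in donors_by_pop if p <= pop]
    if not smaller_or_equal:
        return None
    return dict(rng.choice(donors_by_pop[max(smaller_or_equal)]))
```

Passing an explicit `random.Random` instance keeps the imputation reproducible for a given seed.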

Demand profile assignment and scaling (e5-e8)
The two prepared Census data sets are merged in (e5) using proportionate sampling at NUTS 1-level within the HH type clusters (cf. Table 1), as illustrated in Fig. 2. To refine the 'hh_5type' at cell level while keeping the distribution at NUTS 1-level, HH types are drawn randomly, but with proportionate weights at NUTS 1-level and within HH type clusters. The distribution from (e3) is taken as the proportionate weighting factor. The pool size equals the total number of HHs of this cluster at NUTS 1-level, using the totals derived from the data in (e4). In step (e6), this pool is then divided into subgroups representing the Census cells. The sample size in these subgroups corresponds to the number of HHs of this type in the cells, taking the results from (e4). In this way, we obtain refined spatial information on the 'hh_10type' HH types per Census cell. Subsequently, electricity profiles are assigned to each HH in step (e7). Profiles are randomly sampled (without replacement) from the profile pool according to the number of HH types per cell. Therefore, no profile occurs more than once within a cell, but a profile can be reused in other cells, which keeps the required size of the electricity demand profile pool small.
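Steps (e5) to (e7) for one cluster in one NUTS 1 region might look like the sketch below. It makes several assumptions. The weighted draw uses `random.Random.choices` (independent draws with the (e3) shares as weights). The pool is split into cells in sorted cell order. The profile pool is keyed directly by hh_10type, simplifying the mapping to the 12 profile HH types. All names are hypothetical.

```python
import random


def refine_cluster(cell_counts, type_shares, rng):
    """(e5, e6) Refine one hh_5type cluster within one NUTS 1 region.

    cell_counts: {cell_id: number of HHs of this cluster in the cell} (from e4)
    type_shares: {hh_10type: share within the cluster} (from e3)
    Returns {cell_id: [hh_10type, ...]}.
    """
    types = sorted(type_shares)
    pool_size = sum(cell_counts.values())
    # (e5) weighted draw of hh_10types for the whole NUTS 1 pool
    pool = rng.choices(types, weights=[type_shares[t] for t in types], k=pool_size)
    # (e6) split the pool into subgroups, one per Census cell
    refined, start = {}, 0
    for cell in sorted(cell_counts):
        n = cell_counts[cell]
        refined[cell] = pool[start:start + n]
        start += n
    return refined


def assign_profiles(refined, profile_pool, rng):
    """(e7) Draw profile ids per cell without replacement.

    profile_pool: {hh_10type: [profile_id, ...]} with ids unique across types.
    A profile appears at most once per cell but may be reused in other cells.
    """
    assigned = {}
    for cell, hh_types in refined.items():
        picks = []
        for t in sorted(set(hh_types)):
            picks.extend(rng.sample(profile_pool[t], hh_types.count(t)))
        assigned[cell] = picks
    return assigned
```

Because `rng.sample` draws without replacement within each call, duplicates inside a cell are excluded. Each new cell starts from the full pool again.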
To scale the electricity profiles per HH in step (e8), the profiles are aggregated at NUTS 3-level via the Census cells' centroids. The resulting annual demand in each NUTS 3-region is multiplied by a scaling factor to meet the annual electricity demand described in "Annual electricity demand".
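The scaling in (e8) reduces to one factor per NUTS 3 region: the target demand divided by the summed annual demand of all HH profiles located in that region. A minimal sketch follows. The input is assumed to be one `(nuts3_id, annual_demand)` pair per HH, already located via its Census cell centroid, and the function name is hypothetical.

```python
from collections import defaultdict


def nuts3_scaling_factors(households, target):
    """(e8) One scaling factor per NUTS 3 region.

    households: iterable of (nuts3_id, annual demand of the assigned profile),
                located via the centroid of the HH's Census cell
    target:     {nuts3_id: annual demand from the scaled DemandRegio data}
    """
    totals = defaultdict(float)
    for region, annual in households:
        totals[region] += annual
    return {region: target[region] / total for region, total in totals.items()}
```

Each HH profile in a region is then multiplied by that region's factor. The totals are accumulated per HH rather than per profile id, because the same profile may be used in several cells.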

OSM building assignment (e9-e11)
To assign the individual profiles to specific buildings from OSM, the deviations between both data sets need to be resolved. They are attributable to different shortcomings, the most notable being the following. Census data is incomplete (cf. Table 2) and incorporates statistical bias from the methods applied in the original data preparation (Statistisches Bundesamt 2015). In the OSM data set, building data is missing or imprecise (cf. "Discussion").
For Census cells with assigned profiles but without OSM buildings, synthetic buildings (randomly placed squares with an edge length of 5 m) are generated in (e9). As an official data set, and despite the shortcomings mentioned above, Census can be assumed to be more complete and consistent than OSM. Therefore, its building count is used as the reference to determine the number of synthetic buildings to be created. If no data is available in a cell, the number of buildings is derived by applying the median profile-to-building rate of the entire building data set to the number of profiles assigned to the cell.
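Step (e9) can be sketched as follows, under assumptions not stated in the text. Coordinates are metre-based in the projected system of the 100 m Census grid. The fallback count is rounded up. Synthetic squares may overlap. All names are hypothetical.

```python
import math
import random
import statistics

EDGE = 5.0    # edge length of a synthetic building in m (from the text)
CELL = 100.0  # edge length of a Census cell in m


def n_synthetic(n_profiles, census_buildings, profiles_per_building):
    """Number of synthetic buildings for a cell without OSM buildings.

    census_buildings: Census building count of the cell, or None if missing
    profiles_per_building: profile-to-building rates of the whole data set
    """
    if census_buildings is not None:
        return census_buildings
    rate = statistics.median(profiles_per_building)
    return math.ceil(n_profiles / rate)  # assumption: round up


def place_squares(x0, y0, n, rng):
    """Random 5 m squares inside the cell with lower-left corner (x0, y0)."""
    squares = []
    for _ in range(n):
        x = x0 + rng.uniform(0, CELL - EDGE)
        y = y0 + rng.uniform(0, CELL - EDGE)
        squares.append(((x, y), (x + EDGE, y), (x + EDGE, y + EDGE), (x, y + EDGE)))
    return squares
```

Drawing the lower-left corner from `[0, CELL - EDGE]` keeps every square completely inside its Census cell.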
The profiles are randomly assigned to buildings within the respective cell in (e10). First, one profile is assigned to each building until either every building holds a profile or all profiles are distributed. If the number of buildings within a cell exceeds the number of profiles, some buildings have no profiles. If the number of profiles exceeds the number of buildings, the surplus profiles are randomly assigned to buildings within the cell. This results in buildings with no, one or multiple profiles. In (e11), the building type is distinguished: buildings with only one HH are classified as SFHs, all others as MFHs.
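The assignment rule of (e10) and the classification of (e11) can be written compactly. This is a sketch with hypothetical names. It assumes every cell holding profiles has at least one (OSM or synthetic) building, which (e9) is meant to ensure.

```python
import random


def assign_to_buildings(profiles, buildings, rng):
    """(e10) Distribute the profiles of one Census cell over its buildings.

    One profile per building first (in random order); surplus profiles go to
    randomly chosen buildings. Buildings may end up with 0, 1 or more profiles.
    """
    profiles = list(profiles)
    rng.shuffle(profiles)
    order = list(buildings)
    rng.shuffle(order)
    result = {b: [] for b in buildings}
    for b, p in zip(order, profiles):  # stops at the shorter of both lists
        result[b].append(p)
    for p in profiles[len(order):]:    # surplus profiles, if any
        result[rng.choice(order)].append(p)
    return result


def building_type(n_households):
    """(e11) One HH -> SFH, several HHs -> MFH; None for buildings without HH."""
    if n_households == 0:
        return None
    return "SFH" if n_households == 1 else "MFH"
```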

Heat
The methodology to create heat demand profiles for all residential buildings in Germany combines individual bottom-up load profiles with the top-down method commonly used to assign SLPs. Intra-day heat demand profiles are assigned based on the house type and the mean temperature of the day according to BDEW (2021). In contrast to SLPs, however, a pool of various profiles, rather than a single intra-day profile per building type and temperature class, is taken into account.

Intra-day profiles (h1-h3)
In the first step (h1), the 1259 annual heat demand profiles described in "Individual heat demand profiles" are cut into around 460,000 intra-day profiles. The information on the house type from the individual load profiles is retained. Using the average temperature per day, the temperature classes from BDEW (2021) (Table 3) are assigned to each intra-day profile. To be able to meet the annual heat demand values, the intra-day profiles are scaled to a normalized daily demand of 1 (h2). The intermediate results are intra-day profiles per temperature class and building type.
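Cutting an annual profile into normalized daily profiles, (h1) and (h2), is sketched below for a single source profile. Two parts are assumptions. The temperature classifier is a placeholder 5 K binning, not the BDEW (2021) class limits of Table 3. Days without any demand are skipped because they cannot be normalized. The function names are hypothetical. With 1259 profiles of 365 days each, this yields about 460,000 intra-day profiles.

```python
def split_and_normalize(hourly, daily_temps, house_type, classify):
    """(h1, h2) Cut one annual hourly heat profile into normalized daily profiles.

    hourly:      hourly heat demand of one building (24 values per day)
    daily_temps: mean outdoor temperature of each day
    house_type:  "SFH" or "MFH", retained from the source profile
    classify:    maps a daily mean temperature to a temperature class
    Returns a list of (house_type, temperature_class, 24 shares summing to 1).
    """
    result = []
    for day, temp in enumerate(daily_temps):
        values = hourly[24 * day: 24 * (day + 1)]
        total = sum(values)
        if total <= 0:
            continue  # assumption: days without any demand are skipped
        result.append((house_type, classify(temp), [v / total for v in values]))
    return result


def placeholder_class(temp, edges=(-15, -10, -5, 0, 5, 10, 15, 20, 25)):
    """Hypothetical 5 K binning; NOT the BDEW (2021) class limits of Table 3."""
    return sum(temp >= edge for edge in edges)
```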

Daily heat demands per building (h4-h8)
The heat demand depends on temperature curves. In order to reduce the complexity and computation time, the same temperature curve is assigned to all cells within the same climate zone in step (h5). Temperature classes are assigned to each climate zone and day using the average daily temperature of the representative weather measurement point inside the climate zone. The information on temperature classes per day is assigned to each Census cell in the climate zone (h6). Based on the defined temperature class per cell the ratio of the demand on a respective day to the annual demand from Peta is calculated using the sigmoid function described in BDEW (2021) with the parameters for the most recent building class (h4).
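The following sketch shows how an annual demand can be split into daily values with a sigmoid. It assumes the form commonly cited for the BDEW guideline, h(θ) = A / (1 + (B / (θ - θ0))^C) + D with θ0 = 40 °C, evaluated with the daily mean temperature. The parameters below are placeholders for illustration only. Further corrections that the full guideline may apply (e.g. weekday factors) are omitted, and the function names are hypothetical.

```python
THETA_0 = 40.0  # reference temperature of the sigmoid in degrees Celsius


def sigmoid(theta, a, b, c, d):
    """BDEW-style sigmoid h(theta) = A / (1 + (B / (theta - theta_0))**C) + D."""
    if theta >= THETA_0:
        raise ValueError("the sigmoid is only defined for theta < theta_0")
    return a / (1.0 + (b / (theta - THETA_0)) ** c) + d


def daily_demands(annual_demand, daily_temps, params):
    """Split an annual heat demand into daily values proportional to h(theta_d)."""
    h = [sigmoid(t, *params) for t in daily_temps]
    total = sum(h)
    return [annual_demand * value / total for value in h]


# Placeholder parameters for illustration only; the real values depend on the
# building type and class and are tabulated in BDEW (2021).
PLACEHOLDER_PARAMS = (1.6, -37.0, 5.7, 0.07)
```

With a negative B, the term B / (θ - θ0) is positive below 40 °C and grows with rising temperature. The daily share therefore decreases monotonically as the day gets warmer.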
In step (h7), the daily heat demand per Census cell is distributed proportionally to the number of houses per cell, which was allocated in step (e11).

Heat demand curves per building (h9)
The heat demand curves per building are created by combining the results from previous steps: first, intra-day profiles with the corresponding house type and temperature class are selected for each building from the pool of profiles (h8). The selected (normalized) profiles are then scaled by the daily demands per building (h9). These daily demand curves per building are composed to yearly profiles. The final heat demand curves per building can be aggregated to different levels, e.g. Census cells or district heating areas.
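Steps (h8) and (h9) for a single building can be sketched as a loop over the days of the year. It draws one normalized intra-day profile from the matching pool entry and scales it by the building's daily demand. The pool layout keyed by `(house_type, temperature_class)` and the function name are assumptions for illustration.

```python
import random


def building_profile(house_type, daily_classes, daily_demand, pool, rng):
    """(h8, h9) Build an hourly annual heat profile for one building.

    house_type:    "SFH" or "MFH"
    daily_classes: temperature class for each day of the year
    daily_demand:  daily heat demand of the building for each day
    pool:          {(house_type, temperature_class): [normalized 24-h profiles]}
    """
    hourly = []
    for cls, demand in zip(daily_classes, daily_demand):
        shape = rng.choice(pool[(house_type, cls)])  # (h8) random intra-day profile
        hourly.extend(demand * v for v in shape)     # (h9) scale by daily demand
    return hourly
```

Since every pool profile sums to 1, each day of the result reproduces the building's daily demand exactly. Aggregated curves, e.g. per Census cell, are hour-by-hour sums of the building profiles.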

Results
The methods to allocate electricity and heat demand curves to buildings are applied to Germany. Resulting profiles are analyzed along different spatial aggregation levels.
The first aggregation level is the Medium Voltage Grid District (MVGD) near Flensburg as a study region, visualized in the upper part of Fig. 3. It includes 40,300 HHs in the city Flensburg and six neighboring rural municipalities.
On the second aggregation level, 20 Census cells including 565 HHs in the city of Flensburg were selected. This number of HHs lies within the valid scope of the SLP and allows a comparison with our results. The selected cells are visualized in the middle part of Fig. 3. The number of profiles assigned to residential buildings is indicated by the colors. Orange and grey buildings are not considered here, as they are either not residential or not assigned to the selected cells (clipped by the centroid of the building footprint). The smallest aggregation level is a single building, as an example of data used for modeling low-voltage grids. The randomly selected building of interest is marked red in a north-west cell in the middle part of Fig. 3. Following our methods and data, this building is an MFH with two HHs (two aggregated profiles).

Table 4 provides information on the total number of buildings, as well as on the assigned electricity and heat profiles for different aggregation levels. Mapped residential buildings are extracted from OSM as described in "OpenStreetMap", resulting in 29.3 M buildings in Germany. In 246,498 Census cells, the extracted buildings are not sufficient for the number of HHs with electricity demand. Therefore, a total of 1.1 M synthetic buildings were created, as described in "OSM building assignment". Electricity demand profiles were assigned to 21.2 M (72.5%) and heat demand profiles to 20.5 M (70%) of mapped and synthetic residential buildings. Table 5 shows the result of the local differentiation of HH types, which influences the aggregated profile at different levels. Except for the building level, where profiles are randomly assigned within a Census cell, the differences in values primarily originate from the Census data.
In the following sections, the electricity and heat demand profiles are analyzed in detail for the described aggregation levels and compared to commonly used SLPs.

Electricity demand profiles
Exemplary electricity demand profiles are visualized on different aggregation levels in Fig. 4a for a randomly selected day (1st of February). The curves show the hourly electricity demand relative to the daily electricity demand in percent. The corresponding SLP H0 profile of the NUTS 3-region is included for comparison.
The normalized electricity demand curve of a single building (blue line) shows the highest fluctuations. A maximum of 16% of the daily electricity demand is reached at 12 pm. Compared to the individual building profile, the demand curves show significant smoothing at the higher aggregation levels of 20 Census cells (green) and MVGD (yellow). However, there is no visual difference in smoothness between those two aggregation levels, although the shares of HH types differ significantly (cf. Table 5). Apart from small temporal deviations, their curves follow a similar trend with disproportionately high demand peaks. Maximum hourly peaks reach about 10% (20 cells) and 9% (MVGD) of the daily electricity demand. The load never drops below a base load of 0.8% in any of the three profiles. In comparison, the SLP H0 (grey) follows a pattern similar to that of the aggregated profiles (20 cells, MVGD), but smoother, with a minimum load of 1.5% of the daily demand. The observed smoothing suggests balancing effects between the individual profiles. For a closer examination, several percentiles of the entire profile set at the MVGD aggregation level are calculated and presented in Fig. 4b. During the night, the range among the examined percentiles is very low, whereas during the day significant variance can be observed, with a temporal distribution following a similar pattern as the aggregated profiles in Fig. 4a. The spread of magnitudes is largest at peak load times: while at 4 am the range of hourly demand shares between the 10th and 90th percentile is [0.5%; 1.3%], it widens over the day, reaching a maximum of [1.8%; 24.1%] at 8 pm. Similarly, the upward deviation from the median (black) increases at peak load times. However, compared to the peaks in the morning and at noon, a strong emphasis on the upward deviation above the 60th percentile can be observed at 8 pm. The minimum load in the 10th percentile is 0.7% of the daily demand per hour.
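The percentile analysis in Fig. 4b can be reproduced in principle with the steps below. First, each daily profile is normalized to hourly shares in percent of its daily sum. Then the chosen percentiles are taken across all profiles for every hour. This is a standard-library sketch with hypothetical names. It uses linear interpolation between closest ranks, the convention NumPy uses by default; the text does not state which convention was used for the figure.

```python
def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default 'linear' convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hourly_shares(day):
    """Hourly demand in percent of the daily demand."""
    total = sum(day)
    return [100.0 * v / total for v in day]


def hourly_percentiles(days, qs=(10, 50, 90)):
    """Percentiles of the hourly shares across many daily profiles."""
    shares = [hourly_shares(d) for d in days]
    hours = range(len(shares[0]))
    return {q: [percentile([s[h] for s in shares], q) for h in hours] for q in qs}
```

The range between the 10th and 90th percentile at a given hour, as quoted above for 4 am and 8 pm, is then `(p[10][h], p[90][h])` for `p = hourly_percentiles(days)`.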
In Fig. 5, the aggregated time series at MVGD level is compared to the SLP, which was scaled to the same annual demand of 117 GWh. The general pattern is comparable: weekdays and weekends, as well as holidays, are recognizable. Although the demand peaks match in time, they differ in magnitude: our results show a strong emphasis on the peaks, whereas the SLP's demand is distributed rather uniformly. A striking difference occurs in the night hours (2-5 am), when the base load is significantly lower in our results than in the SLP. Correspondingly, the demand during daytime is higher, especially at the midday (12 pm) and evening peak hours (6-8 pm). Table 6 quantifies this observation. Our data shows a higher standard deviation, as well as a lower minimum (night) and a higher maximum (day) value. Moreover, the increased spread in load manifests in the 25% and 75% percentiles, which deviate more from the median value. Also, in the SLP the period of elevated load extends into the late evening hours (8 pm-12 am) during the transition to summer time and narrows towards winter time, showing a step-like character. This effect is not included in our method. Instead, seasonal changes are mainly reflected by a decrease in total load during the day.
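The comparison behind Fig. 5 and Table 6 involves two steps. The SLP is scaled to the same annual demand as the aggregated profile. The same descriptive statistics are then computed for both series. The sketch below uses Python's `statistics` module. The population standard deviation and the `"inclusive"` quantile method (linear interpolation) are assumptions, since the text does not specify them. The function names are hypothetical.

```python
import statistics


def scale_to_annual(series, annual):
    """Scale a time series so that it sums to the given annual demand."""
    factor = annual / sum(series)
    return [v * factor for v in series]


def describe(series):
    """Spread statistics of a load time series (std, min, quartiles, max)."""
    q1, median, q3 = statistics.quantiles(series, n=4, method="inclusive")
    return {
        "std": statistics.pstdev(series),
        "min": min(series),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": max(series),
    }
```

For the MVGD case, both the SLP and the aggregated hourly series would be passed through `scale_to_annual(..., 117.0)` (in GWh) before calling `describe`, so that differences in the statistics reflect only the shape of the profiles.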

Heat demand profiles
The resulting heat demand profiles on different aggregation levels are visualized in Fig. 6 for a selected summer and winter day to show seasonal differences. The curves show the hourly heat demand relative to the daily sum in percent. The characteristic of the corresponding gas SLP for private HHs, considering the share of SFHs and MFHs in the NUTS 3-region, is included for comparison. The normalized heat demand curve of a single building (blue) includes high peaks, especially during the day in summer, when up to 82% of the daily heat demand is consumed in a single hour. Heat is not consumed at every hour of the day; at night, there is nearly no heat demand. On the chosen winter day, there is a constant heat demand of about 2.5% in each hour of the day. The demand rises in the morning and includes a peak of about 10% in the afternoon. When the demand curves of all buildings in the 20 Census cells are combined, the normalized curve (green) is smoother, with a maximum peak of about 5% on the selected winter day and 8% on the summer day. Both curves rise in the morning and decrease at noon. The characteristic of all buildings in the MVGD (yellow) is similar to that of the 20 Census cells, but smoother. Compared with the gas SLP (grey), the profiles of single buildings differ strongly due to their high demand peaks. The more profiles are combined (20 cells, MVGD), the more closely they align with the gas SLP.
The fluctuation of the hourly heat demands of all buildings on the selected winter and summer days is visualized in Fig. 7 by percentiles with upper limits from 60 to 90% and the median (black). Figure 7a shows that the general characteristic is similar for most buildings in winter, as the rise in the morning and the decrease at noon are included in every building profile. During the day, the deviation is generally higher than at night. The fluctuation of heat demand profiles in summer (Fig. 7b) is higher than in winter, as every percentile has a wider range on the summer day. Figure 6b indicates that the higher peaks result in higher fluctuations. In addition, the percentile ranges in winter are symmetric around the median value, whereas the maximum upward deviation is higher in summer. In nearly every hour of the summer day, 20% of all buildings have no heat demand.
The distribution of heat demand over the year shows a strong seasonal dependency: it is significantly higher in winter than in summer. Figure 8 visualizes the heat demand over the year for the selected MVGD from the presented methodology in comparison to the gas SLP. The seasonal dependency is similar in both methods. In the presented methodology, the heat demand at night is much lower than during the day, and in summer there is nearly no heat demand at night. This effect is also visible in the SLP data, but not as pronounced as in our method. At 12 pm, the heat demand decreases in the presented method, which is not reflected in the SLPs.

Discussion
The results are first discussed with regard to both profile assignment methods. Subsequently, sector-specific characteristics are examined.
The demand allocations for both electricity and heat are mainly based on data sets from Census and OSM. As examined in "Census data preparation", the household-related attributes from Census cover most of the cells. Yet, in 28.1% of the cells (Table 2), the data is partially incomplete or inconsistent, which leads to systematic errors in our results. The induced error in the Germany-wide demand, however, is less significant, as those cells account for only 5.5% of the total population. Moreover, the Census data set is outdated, and the statistical methods applied for spatial and demographic extrapolation (Statistisches Bundesamt 2015) affect the quality of our results to a degree that cannot be quantified.
OSM also has several shortcomings: we extracted 29.3 million residential buildings in total (Table 4). This number deviates significantly from the latest official data of 19.3 million (+51.8%; Statistisches Bundesamt 2020) and from the 2011 Census data of 18.5 million (+58.4%; Statistisches Bundesamt 2011a). Apart from different definitions of residential buildings, this discrepancy is likely driven by inaccurate tagging and by OSM users' susceptibility to mapping errors. However, as we incorporate Census data, the number of mapped and synthetically created buildings that are assigned electricity and heat profiles by our methods is smaller (Table 4) and deviates by only +10.4% (electricity) and +6.7% (heat) from official data (Statistisches Bundesamt 2020). Assuming that all buildings in the official data are inhabited and therefore have a demand, our results are in reasonable accordance in terms of the total number of buildings. Nevertheless, this does not imply a similar level of agreement at higher spatial resolutions. As we did not compare building counts on other levels, we cannot quantify errors compared to official statistics or ground truth. Another shortcoming of OSM data is missing buildings, which leads to further deviations. These data gaps are therefore filled with synthetic buildings using Census data, as described in "OSM building assignment". This results in shares of 5.4% (electricity) and 4.8% (heat) synthetic buildings with demand. While their real locations cannot be determined, the population-induced building demand is retained by our methodology. The distribution of annual electricity demands from DemandRegio is based on a forecast of population per NUTS 3 region. Our distribution methods, on the other hand, use Census data from 2011. This results in a mismatch between the electricity demand and the Census population. However, the population forecast differs only slightly and only in a few regions (Gotzens et al. 2020).
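The relative deviations quoted above follow from the building counts as (count / reference - 1) x 100%. A minimal Python check using only the figures stated in the text:

```python
def relative_deviation(value: float, reference: float) -> float:
    """Relative deviation of value from reference in percent."""
    return (value / reference - 1.0) * 100.0


osm_buildings = 29.3e6  # residential buildings extracted from OSM (Table 4)
official_2020 = 19.3e6  # latest official data (Statistisches Bundesamt 2020)
census_2011 = 18.5e6  # Census 2011 (Statistisches Bundesamt 2011a)

print(round(relative_deviation(osm_buildings, official_2020), 1))  # 51.8
print(round(relative_deviation(osm_buildings, census_2011), 1))  # 58.4
```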
Peta's annual heat demands are based on yet another population data set, requiring additional assumptions. Overall, this leads to 3% of residential buildings having an electricity but no heat demand, which restricts the usability of the corresponding profiles in the affected cells.
The only link between the two allocation methods is the set of residential buildings derived in step (e11) shown in Fig. 1. A temporal linkage of electricity and heat consumption is not considered. It is thus possible that the heat demand curve of a specific building indicates that it is occupied and shows a peak load, whereas the electricity demand indicates otherwise. This affects profiles at high spatial resolution but levels out when multiple profiles are aggregated. In addition, the individual electricity demand profiles were created for the year 2016, whereas the heat demand profiles were generated using the weather year 2011. Since the electricity profiles are less weather-sensitive and the differentiation between weekdays and weekend days is not considered in the assignment of heat demand profiles, the error is considered to be small.
The resulting profiles are only analyzed for an exemplary region on selected aggregation levels and compared with SLPs on a higher aggregation level. Further validations, e.g. statistical analyses of all profiles on different aggregation levels, are not part of this study.
Demand profiles for single buildings were validated in Drauz (2016) and Von Appen et al. (2014). In order to validate the suitability of this methodology, aggregated measurement data on HH demand, e.g. from electricity or district heating grid operators, are necessary. To our knowledge, these data are currently either not freely accessible or not acquired. For example, while the regional electricity grid operator is obliged to publish an annual aggregated load curve according to §17 StromNZV (Die Bundesregierung 2021), the data on low voltage level (Stadtwerke Flensburg 2021) are not provided per grid and sector and most likely include demands of other sectors such as commercial, trade and services. Therefore, we cannot use them as a reference for a comparison with the SLP. The robustness of the resulting profiles is higher on aggregated levels, since profiles for a single building can be biased due to the random selection. To increase robustness, multiple samples could be taken. Further validations are easily possible due to the applied open source principles.
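The effect of taking multiple samples can be illustrated by randomly assigning profiles from a pool to buildings several times with independent draws. The following sketch is hypothetical: the function names, the synthetic pool and the coefficient of variation as a spread measure are assumptions and not part of the published method. It shows that the peak of a single building varies much more across samples than the peak of the aggregate.

```python
import numpy as np


def sample_assignments(pool: np.ndarray, n_buildings: int, n_samples: int, seed: int = 0) -> np.ndarray:
    """Draw n_samples independent random assignments of pool profiles to buildings.

    pool: array of shape (n_profiles, n_timesteps).
    Returns an array of shape (n_samples, n_buildings, n_timesteps).
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pool.shape[0], size=(n_samples, n_buildings))
    return pool[idx]


def coefficient_of_variation(x: np.ndarray) -> float:
    return float(x.std() / x.mean())


def peak_variability(samples: np.ndarray) -> tuple:
    """Variability of peak loads across samples: one building vs. the aggregate."""
    single_peaks = samples[:, 0, :].max(axis=1)  # peak of building 0 per sample
    aggregate_peaks = samples.sum(axis=1).max(axis=1)  # peak of the sum per sample
    return coefficient_of_variation(single_peaks), coefficient_of_variation(aggregate_peaks)


# Hypothetical pool: 200 synthetic profiles with 24 hourly values each
pool = np.random.default_rng(1).exponential(1.0, size=(200, 24))
samples = sample_assignments(pool, n_buildings=500, n_samples=50)
cv_single, cv_aggregate = peak_variability(samples)
```

In an application, the building-level results of several samples could be averaged or reported as a range instead of relying on a single random draw.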

Electricity demand profiles
Revisiting the quality of the Census data, there are further shortcomings specific to the electricity sector. Due to differences in HH categories and the corresponding methodology, the total HH distribution obtained from Census differs from the one used in Von Appen et al. (2014), which first verified the used demand profiles. This leads to a predominance of single-person HHs and a smaller share of multi-person HHs in the aggregated HH distribution from Census. On building level, this is of high impact, but it might also affect the smoothness of profiles at aggregated levels. Since no alternative data with sufficient spatial resolution exist, we can neither correct nor quantify this error.
The further imputation methods applied in "Census data preparation" lead to errors regarding the shares of HH types in 11.8% of the cells (Table 2). Moreover, 16.3% of the cells do not hold HH information at all. While we fill those data gaps, this results in profiles that are wrongly assigned with respect to HH types. This bias is again pronounced on building level, yet less significant on higher aggregation levels. As the imputation is based on the population density, mixing rural and urban cells is mostly avoided. Assuming the same HH distribution for cells with identical population density reduces the variation in HH types to some extent, which cannot be quantified as we lack reference data.
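The principle discussed here, i.e. assigning cells without HH information the mean HH type shares of cells with identical population density, can be sketched in pandas. This is a hypothetical illustration: the actual imputation is described in "Census data preparation", and the column names as well as the use of the population per cell as a density proxy are assumptions.

```python
import numpy as np
import pandas as pd


def impute_hh_shares(cells: pd.DataFrame, share_cols: list, key: str = "population") -> pd.DataFrame:
    """Fill missing HH type shares with the mean shares of cells with the same key value.

    Assumption: `key` holds the population per equal-area Census cell, so equal
    values correspond to identical population density.
    """
    filled = cells.copy()
    # Mean share per density value (NaN entries are ignored in the mean)
    group_means = cells.groupby(key)[share_cols].transform("mean")
    filled[share_cols] = cells[share_cols].fillna(group_means)
    return filled


# Hypothetical example: the third cell lacks HH information
cells = pd.DataFrame(
    {
        "population": [10, 10, 10, 50],
        "single": [0.6, 0.4, np.nan, 0.2],
        "couple": [0.4, 0.6, np.nan, 0.8],
    }
)
filled = impute_hh_shares(cells, ["single", "couple"])
```

The example also shows the limitation noted above: all cells with the same population density end up with the same HH distribution, which reduces the variation in HH types.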
With regard to the individual demand profile pool, it is worth mentioning that its HH type distribution differs from the national one obtained from Census. The pools of all HH types have equal size (8.3% each), which leads to disproportionately small pools for the predominant types (cf. Table 5). Thus, at high aggregation levels, profiles are assigned more than once, as mentioned in "Demand profile assignment and scaling". The profile pool could be enlarged and its shares adjusted to the national HH distribution, which may affect the smoothness of the aggregated profile. However, as profiles in the examined MVGD (40,300 HH) mainly occur once (50%), twice (31%) or three times (15%), the influence is considered to be small. An in-depth analysis of the supplied input profiles is not the focus of this work, as we aim at a spatial distribution of the profiles. It should be noted, however, that the profiles are based on today's user behavior, occupancy hours and device efficiencies, which are subject to change in the future and thus lead to forecast errors in our data.
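Occurrence shares like those reported for the examined MVGD can be obtained by counting how often each profile is assigned. The following Python sketch assumes that the shares refer to distinct profiles (the text does not state the reference quantity), and the function name is hypothetical.

```python
from collections import Counter


def occurrence_shares(assigned_profile_ids) -> dict:
    """Share of distinct profiles that are used once, twice, three times, ...

    assigned_profile_ids: one profile id per household in the grid district.
    """
    uses_per_profile = Counter(assigned_profile_ids)  # profile id -> number of uses
    profiles_per_use = Counter(uses_per_profile.values())  # number of uses -> number of profiles
    n_profiles = len(uses_per_profile)
    return {uses: count / n_profiles for uses, count in sorted(profiles_per_use.items())}


print(occurrence_shares(["a", "a", "b", "c", "c", "c", "d"]))
# {1: 0.5, 2: 0.25, 3: 0.25}
```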
The assumptions made in "Census data preparation", as well as the random assignment of profiles to cells in "Demand profile assignment and scaling" and subsequently to buildings in "OSM building assignment", inevitably lead to systematic errors. The data quality might be increased by using additional OSM data, such as the buildings' ground area or number of storeys, during the assignment process.
Analyzing the results in the time domain, significantly steeper gradients, higher peaks and smaller base loads are the key differences seen in Fig. 4a; they are also consistent with the technological development that has taken place since the SLPs were developed in the 1980s. Nowadays, more electrical appliances with lower stand-by consumption are in use. Hence, higher peaks during occupancy hours and lower consumption at night seem plausible. As the degree of aggregation increases, a clear smoothing behavior can be seen, which can be explained by the increasing variety of the profile types used (cf. Table 5). Although the MVGD contains many more HHs than the SLP's acceptable lower bound, significant deviations occur, especially during peak hours where the load variation is maximal (cf. Fig. 4b). Von Appen et al. (2014) state that aggregated profiles from different years would be necessary to achieve an even closer similarity to the SLP. Since we are not interested in aligning our data with the SLP but in profiles for specific years with great regional heterogeneity, the deviation is acceptable and even beneficial for the investigation of low voltage grids.
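The smoothing effect can be quantified by the peak-to-mean ratio of aggregated profiles. The following sketch uses independent, synthetic random profiles; this assumption ignores the common daily pattern of real HH profiles and therefore overstates the smoothing. It is meant only to illustrate the tendency, not to reproduce Fig. 4.

```python
import numpy as np


def peak_to_mean(profile: np.ndarray) -> float:
    """Ratio of the maximum to the mean value of a load profile."""
    return float(profile.max() / profile.mean())


rng = np.random.default_rng(42)
# Hypothetical spiky single-HH profiles: 1,000 HHs x 96 quarter-hours of one day
profiles = rng.exponential(scale=1.0, size=(1000, 96)) ** 2

for n_hh in (1, 10, 100, 1000):
    aggregate = profiles[:n_hh].sum(axis=0)
    print(n_hh, round(peak_to_mean(aggregate), 2))
```

With measured or generated HH profiles in place of the random array, the same ratio could be compared across aggregation levels (building, LV grid, MVGD) and with the SLP.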
Overall, the large regional heterogeneity of the resulting electricity demand profiles and observed smoothing effects for higher aggregation levels make the presented methodology better suited for the purpose of modeling load flows over all voltage levels than the usage of SLPs. Another advantage is the adaptability of the presented method to model electricity demand for different future scenarios.

Heat demand profiles
Input data as well as the method influence the quality of the resulting HH heat demand profiles. As described in "Discussion", there are mismatches between the main input data sets Census and Peta. However, since only a small number of cells is affected, profiles are available for most buildings. The effects on aggregated profiles are therefore considered to be small. In addition, the distribution of annual heat demands to buildings could be improved: the annual heat demand per Census cell is evenly distributed among its buildings. In cells with both SFHs and MFHs, this leads to an overestimation of SFH profiles.
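The even distribution of annual heat demand per Census cell can be written in a few lines of pandas. This is a hedged sketch with hypothetical column names, not the eGon-data code; it makes visible that an SFH and an MFH in the same cell receive the same annual demand.

```python
import pandas as pd


def distribute_evenly(buildings: pd.DataFrame, cell_demand: pd.Series) -> pd.Series:
    """Split each Census cell's annual heat demand equally among its buildings.

    buildings: one row per building with columns 'building_id' and 'cell_id'.
    cell_demand: annual heat demand (e.g. in MWh) indexed by 'cell_id'.
    """
    n_buildings_in_cell = buildings.groupby("cell_id")["building_id"].transform("count")
    return buildings["cell_id"].map(cell_demand) / n_buildings_in_cell


# Hypothetical example: cell A contains one SFH and one MFH
buildings = pd.DataFrame(
    {"building_id": [1, 2, 3], "cell_id": ["A", "A", "B"], "type": ["SFH", "MFH", "SFH"]}
)
cell_demand = pd.Series({"A": 30.0, "B": 12.0})
buildings["annual_heat_demand"] = distribute_evenly(buildings, cell_demand)
print(buildings)
```

Both buildings in cell A receive 15 MWh, although the MFH would typically require more; a weighting, e.g. by living area, would address this but is not part of the presented method.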
Besides, the number of individual heat demand profiles was substantially limited by computational resources. Therefore, neither HH data nor day types are considered when distributing the heat demand to buildings and over time, which reduces the consistency with the electricity profiles. Using intra-day profiles enlarged the pool of profiles and their variation. Composing the intra-day profiles could potentially lead to inappropriate steps at midnight. However, because of the lower heat demands at night, such steps are not observed in the resulting annual profiles.
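Composing an annual series from intra-day profiles and checking it for steps at midnight can be sketched as follows. The function names are hypothetical, and the way days are chosen from the pool is deliberately left open, since it is not described in this section.

```python
import numpy as np


def compose_annual_profile(day_pool: np.ndarray, day_choice: np.ndarray) -> np.ndarray:
    """Concatenate selected intra-day profiles into one hourly series.

    day_pool: array of shape (n_pool_days, 24) with intra-day profiles.
    day_choice: index into day_pool for every calendar day (the selection
    rule, e.g. by temperature class, is not modeled here).
    """
    return day_pool[day_choice].reshape(-1)


def midnight_steps(series: np.ndarray, hours_per_day: int = 24) -> np.ndarray:
    """Absolute change from 23:00 of each day to 00:00 of the following day."""
    last_hours = series[hours_per_day - 1 :: hours_per_day][:-1]
    first_hours = series[hours_per_day::hours_per_day]
    return np.abs(first_hours - last_hours)
```

Comparing `midnight_steps(series).max()` with the largest within-day change, `np.abs(np.diff(series)).max()`, indicates whether the composition introduces unusually large jumps at the day boundaries.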
The resulting heat demand profiles are plausible on different aggregation levels. The load curves of single buildings show high peak demands during occupancy hours. These peaks are caused by hot water demand, which is characterized by abrupt, random rises and falls (Drauz 2016). In summer, the heat demand profiles are dominated by hot water demand, which causes higher peak values than in winter, when the profile shape is an overlay of the abrupt hot water demand and the more constant space heating demand. These individual variations in the curve shapes of single buildings caused by occupancy are absent in the SLPs.
When building profiles are aggregated, the peak demands are lower and the similarity of the generated profiles with the SLP increases. This is caused by the fluctuation of profiles at building level: single-building profiles contain hot water peaks at different occupancy hours, so the peaks balance out in the aggregate. Hence, the implemented methodology provides a means of increasing the spatial resolution of the existing SLPs. The similarity of the aggregated profiles with the SLPs also supports the validity of the presented methodology. The aggregated annual demand curves show a seasonal dependency similar to that of the SLP, which indicates reasonable assumptions regarding temperature data and the selected climate zones. In the presented method, the heat demand in the morning rises slightly later and more abruptly than in the SLPs. Since the SLPs are derived from gas profiles, their smoother rise may be caused by the inertia of the gas system, which does not influence the bottom-up heat demand profiles of the presented methodology. In contrast to the SLPs, the presented methodology includes a decrease in heat demand at midday. This could be caused by the individual bottom-up profiles, as hot water demands at midday are unlikely according to the occupancy model used in the load profile generator.
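The increasing similarity with the SLP under aggregation could be tracked with simple shape metrics. The choice of metrics below (Pearson correlation and an RMSE normalized by the mean, after scaling both series to the same annual sum) is an assumption for illustration; the discussion above compares the curves qualitatively, and the function name is hypothetical.

```python
import numpy as np


def similarity_to_slp(profile: np.ndarray, slp: np.ndarray) -> tuple:
    """Compare the shape of an aggregated profile with an SLP of the same length.

    Both series are normalized to a sum of 1 so that only the shape is
    compared. Returns the Pearson correlation and the RMSE normalized by
    the mean of the normalized SLP.
    """
    p = profile / profile.sum()
    s = slp / slp.sum()
    correlation = float(np.corrcoef(p, s)[0, 1])
    nrmse = float(np.sqrt(np.mean((p - s) ** 2)) / s.mean())
    return correlation, nrmse
```

Evaluating these metrics for a single building, an LV grid and the whole MVGD would express the convergence towards the SLP as numbers instead of curves.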
All in all, the drawbacks associated with SLPs are addressed to a great extent by this methodology. Heat demand peaks are represented better, and the summer demand is represented correctly. The consideration of newer building classes enables forecasts of future demand, but upcoming changes in behavior and future building characteristics (e.g. low-energy houses) could not be considered. Finally, even with these additional features, the methodology produces an aggregated output in line with the commonly used SLP methodology, which justifies its possible future application.

Conclusion and outlook
This study presents a method to create high-resolution residential electricity and heat demand profiles for energy system analyses suitable for various aggregation levels. All input data sets as well as implemented process steps are publicly available following open data and open source principles. The method was used to model demand profiles for every residential building in Germany.
Exemplary resulting demand time series show reasonable results on different spatial and temporal aggregation levels. The large diversity of demand profiles for single buildings with individual peak loads is useful for modeling a wide variety of distribution grids. When aggregating multiple demand profiles for modeling the transmission grid, the profiles are smoothed and, to some extent, converge to the commonly used SLPs. A more thorough comparison of the resulting profiles with actual data will be feasible once sufficient open data are available.
Updates of the input data sets can improve the quality of the resulting profiles in the future: there will be a new Census data set in 2022 (Statistisches Bundesamt 2022), which will provide more recent socio-demographic data for Germany. The Census update will also reduce the currently existing temporal mismatch with OSM, the other central input data set. Besides, OSM's coverage and quality are likely to improve in the future (OpenStreetMap Contributors 2022), resulting in a better mapping of residential buildings.
The quality of the resulting profiles could also be enhanced by enlarging the pool size of individual demand time series and therefore increasing their variety. Moreover, the resulting data's consistency could be improved by closer linking of the electricity and heat demands. Future modifications like these are facilitated by the open character and can further improve the comprehensive load data created in this work.

About this supplement
This article has been published as part of Energy Informatics Volume 5 Supplement 1, 2022: Proceedings of the 11th DACH+ Conference on Energy Informatics. The full contents of the supplement are available online at https://energyinformatics.springeropen.com/articles/supplements/volume-5-supplement-1.

Author contributions
The methods concerning the electricity demand time series were essentially developed and analyzed by JA and JE. CB and AM developed and evaluated the methods for the heat demand time series. The software implementation was similarly distributed. BS and IC critically reviewed the paper. All authors read and approved the final manuscript.

Funding
The authors thank the Federal Ministry for Economic Affairs and Climate Action for funding the research project eGo^n (funding code: 03EI1002).

Availability of data and materials
Data and methods follow open source and open data guidelines. All methods are implemented in the Python tool eGon-data, which is published on GitHub. The resulting data presented in this paper is published on Zenodo. Upcoming changes in the data and methods will be published on GitHub and Zenodo.