Quantifying geospatial interdependencies of ICT and power system based on open data

In order to evaluate the effectiveness of innovative services and technologies spanning over the ICT-enabled power system, realistic models are required. Although nowadays there is a wide range of power system data models, these models do not include a lifelike representation of the ICT system architecture and do not consider the level of interconnectedness of the two systems. In this paper, we propose a methodology for quantification of the geospatial relations between ICT and power system based on openly available data. We describe a graph-theoretic approach, formulate a problem of assessing geospatial relations and discuss the methods required to process publicly available data, retrieve scenarios for different regions and quantify the level of interdependence between ICT and power system.


Introduction
With the growing penetration of distributed generators and renewable energy sources, the introduction of virtual power plants and the subsequent need for coordination of distributed smart power system components, the role of Information and Communication Technology (ICT) becomes indispensable. One of the most visible trends of grid digitization is a continuously growing number of distributed intelligent communication-based power grid services. In order to evaluate the reliability and effectiveness of these services, analytical and simulation-based techniques are usually applied to representative evaluation scenarios, which describe the composition of and dependencies between components of the system. Such evaluation scenarios have to take the spatial properties and interdependencies of the real infrastructures into account. However, while geospatial interdependencies have been identified as one of the key dependency types (Rinaldi et al. 2001; Hokstad et al. 2012; Tøndel et al. 2018), they are usually not considered. Over the last decade, multiple simulation approaches for the interconnected ICT and power systems have been developed for wide-area smart grid applications (Müller et al. 2012; Lin et al. 2012; Georg et al. 2013). Pietsch et al. (2020) have proposed an approach to design scenarios for the communication network between distributed generators. Oest et al. (2021) have described an evaluation model for the optimized scheduling of virtual power plants. These publications have different objectives, but they all rely on various assumptions regarding the composition of the interconnected ICT and power system in order to construct evaluation scenarios. The validity of these assumptions is even more crucial when the reliability of the interconnected system is to be evaluated, for example, in the case of distribution grid restoration (Stark et al. 2021).
Our key insight is that realistic assumptions regarding the structure of the interconnected ICT and power system can be obtained from openly available data. In contrast to the transmission grid, the distribution grid relies significantly on public communication infrastructures; therefore, both power grid and Internet service provider data are required. While official data is understandably not publicly available, open data can still highlight the regional penetration of ICT access points. In the domain of power systems, multiple projects (CIGRE Study Committee C6 2014; Meinecke et al. 2020) have achieved visible results in developing representative data models of actual power grids. For example, the SimBench project (Meinecke et al. 2020) has already used an open data approach in order to extract geospatial properties of distribution grids. Similarly, the methodology described by Kays et al. (2016) defines several vital methods for extracting geospatial properties of the power grid from open data, such as the application of Voronoi tessellation. Furthermore, a valid approach to determine the supply area and type of a distribution grid scenario is discussed in Kittl et al. (2018). However, for the interconnected ICT and power system, there is a lack of insight into how public communication systems are collocated with the power grid.
In this work we focus on cellular ICT systems for the following reasons. First, cellular networks are already present in all types of regions (rural, semi-urban, urban), and their role in enabling future wide-area power grid communications will only grow (Borenius et al. 2021) with the introduction of 450 MHz (Wissner et al. 2020) and 5G (Dragičević et al. 2019) technologies, especially in rural regions. Second, cellular network geodata for the area of Germany is significantly more complete, because cellular nodes are more visible at the surface than the usually hidden underground Digital Subscriber Line (DSL) or optical networks and are easier to capture using radio-based crowd-sourcing applications. However, retrieving assumptions regarding the placement of ICT nodes in relation to the distribution power grid from open data is a challenging task. First, interdependencies between power and ICT system should be formalized in a way that makes quantitative methods applicable. Second, open and crowd-sourced data are neither complete nor reliable, thus requiring extensive correction, enrichment and merging of different data sources for the various region types.
In this paper we address the described challenges and make the following contributions. We describe a graph-theoretic approach, which provides a model of an interconnected ICT and power system within a geographic scenario, and formulate the problem of finding geospatial relations. We apply the methodology to openly available datasets and perform multiple steps for extraction, correction and analysis of the crowd-sourced data as well as for scenario selection. We quantify geospatial properties of the power and ICT systems by performing a statistical analysis and discuss the dissimilarities in the system structure for different types of regions.

Methodology
This section describes the modelling approach, identifies required parameters for both systems and proposes a methodology to retrieve these parameters from openly available data.

Problem formulation
We define the power grid as a weighted graph $G^p = (V^p, E^p)$, where $V^p$ is a set of $n$ power system nodes and $E^p$ is a set of power lines. Each power line $e^p_{ij}$ has a weight $\omega_{ij} \in \mathbb{R}$, which represents the length of the power line between two power system nodes $V^p_i$ and $V^p_j$. The supply area $A_i$ is defined as the geographic area around the power system node $V^p_i$. In this work, we focus on the medium-voltage (MV) and medium-/low-voltage (MV/LV) grid level and assume that $V^p$ is a set of MV and MV/LV substations. We also assume that two power system nodes are always connected by a power line along the shortest path; therefore, $\omega_{ij}$ equals the distance between the two power system nodes.
The ICT system is defined as a weighted graph $G^c = (V^c, E^c)$, where $V^c$ is a set of $k$ ICT nodes and $E^c$ is a set of edges. Each edge $e^c_{lm}$ has a weight $\varphi_{lm} \in \mathbb{R}$, which represents the distance between two ICT nodes $V^c_l$ and $V^c_m$. This approach allows modelling multiple wireless and wired ICT technologies. In this work, we focus on cellular networks and define $V^c$ as the set of cellular base stations. We assume a cell to be a circle. This assumption does not always hold for modern cellular networks, which transmit in an arc or wedge; however, it is required to merge our base station model with the models used by the most popular ICT data sources. Thus, we define our base station model with two parameters: $r_l$ as the range of a cellular base station $V^c_l$ and $Cov_l = \pi r_l^2$ as the coverage of the base station.
Scenario $S = \{area, type\}$ is a plane in a two-dimensional space, which represents a particular geographical region and is defined by the following two parameters: the area size $S_{area}$ in km² and the type $S_{type} \in \{\text{rural}, \text{semi-urban}, \text{urban}\}$. Scenario types have been selected according to known grid data models, such as Kays et al. (2016). The problem of quantifying the geospatial relations of the interconnected system is defined as how to quantify the co-location of a power system graph $G^p$ and an ICT system graph $G^c$ in scenario $S$, so that both graphs keep a realistic geospatial relation to each other and to the scenario $S$. Figure 1 depicts an exemplary model of the interconnected ICT and power system and indicates different types of geospatial relations between both systems.
We quantify the geospatial relations between ICT and power system according to the following metrics. First, $\omega_{ij}$ and $\varphi_{lm}$ define the distances between nodes of the power and ICT system respectively. Second, the geospatial distance between power system node $V^p_i$ and ICT system node $V^c_l$ is defined as a distance $d_{il}$. Third, the number of nodes within the supply area and coverage quantifies how strongly one system depends on the other. We say $V^p_i$ is dependent on ICT node $V^c_l$ if $V^p_i$ belongs to $Cov_l$. This implies that a power system node uses communication services of an ICT node. However, a power system node can use services of multiple ICT nodes. ICT node $V^c_l$ is dependent on $V^p_i$ if $V^c_l$ belongs to $A_i$, which implies that the power system node supplies the ICT node. Determining the number of ICT nodes of $V^c$ within the supply area $A_i$ of the power system node $V^p_i$ is an instance of the point-in-polygon problem (Haines 1994) and is defined as follows. Let $V^c_l$ be a point and $A_i$ a polygon generated by $V^p_i$ on a plane $S$; then the number of nodes within the supply area $A_i$ is defined as:

$$\left|\{\, V^c_l \in V^c : V^c_l \in A_i \,\}\right| \tag{1}$$

The number of power system nodes $V^p_i \in V^p$ in coverage $Cov_l$ of the ICT node $V^c_l$ is defined as follows:

$$\left|\{\, V^p_i \in V^p : V^p_i \in Cov_l \,\}\right| \tag{2}$$

The geospatial relation of both systems to scenario $S$ is represented by the number of nodes of both systems, $|V^p|$ and $|V^c|$, with respect to the scenario area as a density of nodes per km²:

$$density_p = \frac{|V^p|}{S_{area}}, \qquad density_c = \frac{|V^c|}{S_{area}}.$$

The presented model also allows determining the different cases of dependency between nodes of both systems: from the complete absence of dependency to mutual dependency, when the ICT node belongs to the supply area of the power system node, which in turn uses communication services of this ICT node.
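The counting and density metrics above can be illustrated with a minimal Python sketch. It assumes planar coordinates in km and a supply area given as a simple polygon; the function names and the ray-casting test are illustrative choices, not the implementation used in this work.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_nodes_in_area(ict_nodes, supply_area):
    """Number of ICT nodes located inside one supply area polygon."""
    return sum(point_in_polygon(p, supply_area) for p in ict_nodes)


def node_density(nodes, area_km2):
    """Nodes per km^2 for a scenario of the given area."""
    return len(nodes) / area_km2
```

For a square supply area `[(0, 0), (2, 0), (2, 2), (0, 2)]` and ICT nodes at `(1, 1)`, `(3, 1)` and `(0.5, 1.5)`, `count_nodes_in_area` counts the two nodes that fall inside the square. Points lying exactly on a polygon edge are not treated specially in this sketch.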
We also consider that the ICT system structure can vary depending on the scenario type. Thus, shifting the view from an urban to a rural scenario will change not only the density of the ICT nodes but also the typical cell size $r_{avg}$. In order to create a realistic model of the interconnected ICT and power system for a particular scenario $S$, multiple metrics are required. While $\omega_{avg}$ and $r_{avg}$ can be estimated based on different data sources such as Prettico et al. (2016), real values of the remaining metrics, such as the distance $d$ between nodes of the two systems, are neither available in the related work nor published by power system or communication infrastructure operators. The resulting research objective is formulated as follows: given a geospatial dataset of power system nodes with locations and voltages, a geospatial dataset of ICT nodes with locations and ranges, and a set of observation scenarios $S_o$, compute the average values of these metrics, including $d_{avg}$. An observation scenario $S_o$ is an instance of scenario $S$ on a real geographical area. The location features of datapoints in the power and ICT system datasets define $V^p$ and $V^c$ for the given scenario respectively. The research objective can therefore be tackled as a set of problems of computational geometry.

Quantification of the geospatial relations
The estimation of the spatial distances $\omega$ and $\varphi$ for the power and ICT system respectively, and $d$ for the interconnected system, is based on the geographic location of power and ICT nodes in terms of coordinates. We assume that crowd-sourced datapoints follow the pattern of a realistic system and that, for both the power and the ICT system, the nearest neighbours are most likely connected. A distance matrix algorithm (Dokmanic et al. 2015) can be employed to estimate the distances between the nodes within a scenario $S$. This algorithm aims to find the datapoints whose spatial features are closest to those of a given datapoint.
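A nearest-neighbour estimate of this kind could look as follows. This is a sketch under stated assumptions: coordinates are planar (km), distances are Euclidean, and the average is taken over each node's single closest neighbour; the function names are hypothetical.

```python
import math


def distance_matrix(points_a, points_b):
    """Pairwise Euclidean distances: one row per point in points_a."""
    return [[math.dist(a, b) for b in points_b] for a in points_a]


def mean_nearest_neighbour_distance(points):
    """Average distance from each node to its closest other node in the same set."""
    matrix = distance_matrix(points, points)
    nearest = [
        min(d for j, d in enumerate(row) if j != i)  # skip the node itself
        for i, row in enumerate(matrix)
    ]
    return sum(nearest) / len(nearest)


def mean_nearest_distance_between(points_a, points_b):
    """Average distance from each node in points_a to its closest node in points_b."""
    matrix = distance_matrix(points_a, points_b)
    return sum(min(row) for row in matrix) / len(matrix)
```

The first function corresponds to an intra-system distance (power to power, or ICT to ICT), the second to the distance between the two systems. Both need at least two points (respectively one point per set) to be meaningful.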
In order to estimate the number of ICT nodes within a supply area, the supply area $A_i$ of a power system node $V^p_i$ has to be identified. We assume that loads are not evenly distributed over the scenario area $S$ and that the supply areas must therefore have different sizes. Since, to the best of our knowledge, no openly available data contains a realistic description of the supply areas of power system substations, we split scenario $S$ by assigning each location to the nearest substation. Thus, in order to identify $A_i$ for each power system node in $V^p$, Voronoi tessellation can be applied. According to Okabe et al. (2008), Voronoi tessellation can be performed for a bounded scenario $S \subset \mathbb{R}^2$ and a set of $n$ generator points $P = (p_1, \dots, p_n)$. In general form, the Voronoi region $Vor_i$, $i = 1, \dots, n$, is defined via the Euclidean distance $d(p, p_i)$ between an arbitrary point $p$ and the generator point $p_i$ as:

$$Vor_i = \{\, p \in S : d(p, p_i) \le d(p, p_j) \ \text{for all } j \ne i \,\}$$

A supply area $A_i$ of power system node $V^p_i$ corresponds to the Voronoi region of this node. The number of ICT nodes of $V^c$ within the supply area $A_i$ of the power system node $V^p_i$ is therefore calculated according to Equation (1).
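Because a point belongs to the Voronoi region of its nearest generator, the number of ICT nodes per supply area can be sketched without constructing the polygons explicitly. The snippet below is an illustrative shortcut under that observation, assuming planar coordinates and ignoring clipping to the scenario boundary; it is not the GIS routine used in this work, and all names are hypothetical.

```python
import math
from collections import Counter


def voronoi_owner(point, generators):
    """Index of the generator (substation) whose Voronoi region contains point.

    Ties between equidistant generators are resolved in favour of the lowest index.
    """
    return min(range(len(generators)), key=lambda i: math.dist(point, generators[i]))


def ict_nodes_per_supply_area(substations, ict_nodes):
    """Number of ICT nodes inside the Voronoi region of each substation."""
    counts = Counter(voronoi_owner(p, substations) for p in ict_nodes)
    return [counts.get(i, 0) for i in range(len(substations))]


def share_supplying_ict(counts):
    """Fraction of substations whose supply area contains at least one ICT node."""
    return sum(1 for c in counts if c > 0) / len(counts)
```

With substations at `(0, 0)`, `(10, 0)`, `(5, 8)` and `(30, 30)` and ICT nodes at `(1, 1)`, `(9, 1)`, `(8, 0)` and `(5, 7)`, the last substation receives no ICT node, so three of four substations supply at least one ICT node.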
In contrast to the supply area estimation problem, the coverage $Cov_l$ of ICT node $V^c_l$ is estimated based on the location and range $r_l$ values known from the data. In general form, for a given geospatial dataset $X = \{x_i \mid i = 1, \dots, N\}$, where $N$ is the total number of objects in $X$, the buffer algorithm generates a buffer area with a given radius $rad_i$ around each object. The buffer area $B_i$ is calculated as (Zhou et al. 2021):

$$B_i = \{\, p : d(p, x_i) \le rad_i \,\}$$

where $d(p, x_i)$ is the Euclidean distance between an arbitrary point $p$ and object $x_i$, and $rad_i$ is the buffer radius. By taking the set of ICT nodes $V^c$ as objects and the cell range value $r_l$ of ICT node $V^c_l$ as radius, $Cov_l$ can be estimated. The observed number of power system nodes from $V^p$ in coverage $Cov_l$ of $V^c_l$ is calculated according to Equation (2).
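As a rough illustration of the buffer step, the following sketch treats each coverage area as a circle in planar coordinates (km) and counts the power system nodes inside it. The helper names are hypothetical, and a GIS buffer on geographic coordinates would additionally require a suitable projection.

```python
import math


def in_buffer(point, centre, radius):
    """True if point lies within the circular buffer around centre."""
    return math.dist(point, centre) <= radius


def power_nodes_in_coverage(power_nodes, ict_nodes):
    """For each ICT node, given as ((x, y), range), count the covered power nodes."""
    return [
        sum(in_buffer(p, centre, rng) for p in power_nodes)
        for centre, rng in ict_nodes
    ]


def serving_ict_nodes(power_node, ict_nodes):
    """Indices of all ICT nodes whose coverage contains the given power node."""
    return [
        idx for idx, (centre, rng) in enumerate(ict_nodes)
        if in_buffer(power_node, centre, rng)
    ]
```

`serving_ict_nodes` reflects the remark in the problem formulation that one power system node may use the services of several ICT nodes.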

Data analysis
In this section, we describe the steps required to apply the proposed methodology for extraction of the geospatial relations from publicly available data for the region of Germany at MV and MV/LV grid level.

Data source selection
In this work we have identified several potential publicly available data sources for the power grid and cellular communication network and evaluated them based on four qualitative suitability criteria. First, the relevance of data has been estimated based on the presence of the required features, such as location and voltage tags for the power system data and location and range for ICT system data. The relevance is evaluated as high if both features are available, and limited if only one feature is available. However, the relevance of data can be increased by merging several data sources. Second, the availability of the dataset for download in a machine-readable data format under an open, permissive license has been examined. Next, completeness and accuracy have been estimated as the amount of data per scenario and the frequency of updates respectively. We have set up the following categories: high if the data is released by an official source; medium if the data is released by a crowd-sourced project for which dense data regions and recent updates have been identified during a preliminary analysis; and low otherwise. According to our observations, relevance and availability of the data appeared to be basic requirements.
For power system data, the OpenStreetMap (OSM) (OpenStreetMap Contributors) dataset has been selected as data source. The alternative projects we examined either use OSM data themselves or only cover higher voltage levels. The positional accuracy of the OSM geodata has been estimated as accurate to within a maximum of 18 meters (Fan et al. 2014). However, the quality of tagging at the distribution grid level in OSM is rather low, which can be explained by the limited motivation of the community to track power system infrastructures and by the large number and concealed placement of power system objects, especially in the bigger municipalities. We continue to use OSM data in this work in order to evaluate the developed methodology and add several additional steps to enhance the quality of results, such as the selection of accurate and complete scenarios.
Regarding ICT data, several potential sources, including the Federal Network Agency of Germany (BNA) and the Mozilla Location Service (MLS), have been evaluated (Table 1). The BNA and MLS datasets have been selected. The BNA dataset includes official information regarding the location of cellular base stations and has a license preventing publishing and visualization of derivatives. Information regarding the range of a base station is not available in it and has to be extracted from MLS data. The MLS uses radio environment information collected by users of the Ichnaea (Mozilla 2020a) geolocation tracking service to determine the location and range of cell towers, including radio technology, operator and country codes. If a cell tower includes several cellular equipment units, e.g. antennas with different Cell Global Identification (CGI), each device is observed as a distinct ICT node and therefore has its own dataset entry. This behaviour is corrected in a later step by merging the data with BNA data, which contains precise base station locations. Further data sources, such as OpenCellID, have been studied in detail; however, significant data anomalies have been found, such as the same default range value being shared by 58.2% of nodes. The positional accuracy of the BNA dataset is expected to be high, while the location and range values for MLS are estimations based on crowd-sourced data.

Data extraction and correction
In order to estimate the density and distance metrics for the distribution power system, only substations of certain voltage levels (in this work, MV and MV/LV) have been retrieved. We selected the classification approach based on IEC 60038 (International Electrotechnical Commission 2009), which defines V ≤ 1 kV as low voltage and 1 kV < V ≤ 35 kV as medium voltage. According to this classification, 8224 of the 19558 entries in the total dataset were considered to operate at MV and MV/LV voltage levels and were selected. The MLS and BNA data have been carefully extracted with respect to the licensing. For MLS data, reverse geocoding has been applied to each data entry by using Nominatim (OpenStreetMap Contributors), and entries with low sample counts have been detected. A low number of samples implies limited accuracy of the location and range of the datapoint. As discussed in Velasco et al. (2008), triangulation is a valid technique to locate an object based on three or more radio signals; it is thus assumed that the location of datapoints with at least three samples is more precise. Therefore, 8.48% of the datapoints (51124 entries) with fewer than three samples have been removed from the dataset to increase confidence in the location data.
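The two filters described above can be sketched in a few lines of Python. The thresholds (1 kV, 35 kV, three samples) are taken from the text; everything else is an assumption made for illustration: voltages are given in volts as a semicolon-separated string (as in OSM `voltage` tags), a substation counts as MV or MV/LV when its highest voltage falls into the medium-voltage band, and cells are dictionaries with a hypothetical `samples` key.

```python
LV_MAX_V = 1_000    # V <= 1 kV: low voltage
MV_MAX_V = 35_000   # 1 kV < V <= 35 kV: medium voltage


def parse_voltages(tag):
    """Parse a voltage tag such as '20000;400' into a list of volts."""
    return [float(v) for v in tag.split(";") if v.strip()]


def is_mv_substation(tag):
    """Assumed rule: the highest voltage of the substation lies in the MV band."""
    voltages = parse_voltages(tag)
    return bool(voltages) and LV_MAX_V < max(voltages) <= MV_MAX_V


def drop_low_sample_cells(cells, min_samples=3):
    """Keep only cell entries observed with at least min_samples measurements."""
    return [c for c in cells if c["samples"] >= min_samples]
```

Under this rule a `20000;400` substation is kept as MV/LV, whereas a `110000;20000` substation is excluded because its primary side lies above the medium-voltage band.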
Next, implausibly high and implausibly low range values have been identified and filtered out by determining lower and upper bounds. According to Kabalci et al. (2016), the maximum cell range for Long Term Evolution (LTE) can reach up to 100 km with reduced performance. For the Universal Mobile Telecommunications System (UMTS) the range limit is estimated as 70 km (Mozilla 2020b); for the Global System for Mobile Communications (GSM), a range of 120 km is possible with the extended timing advance feature (3GPP 2020). In total, 7138 datapoints have been removed from the dataset because their range was either lower than 10 meters, the range of the smallest radio access nodes deployed by mobile operators (European Telecommunications Standards Institute 2018), or exceeded the limit of their respective radio technology. Furthermore, the ICT data has been enriched by combining the MLS coverage data with official base station locations from the BNA dataset as follows. For every BNA datapoint the spatially closest MLS datapoints were determined. The coverage areas of the closest MLS datapoints were then merged and assigned to the BNA datapoint as a single, combined coverage area of this particular datapoint.
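The range filter and the enrichment step can be sketched as follows. The range limits come from the text above; the rest is one possible reading and is assumed for illustration: coordinates are planar (metres), each MLS cell is attached to its nearest BNA site, the combined coverage is kept as the union of the attached circles, and entries with an unknown radio technology are dropped. The dictionary keys and function names are hypothetical.

```python
import math

MIN_RANGE_M = 10
MAX_RANGE_M = {"LTE": 100_000, "UMTS": 70_000, "GSM": 120_000}


def has_plausible_range(cell):
    """True if the cell range lies between 10 m and the limit of its technology."""
    limit = MAX_RANGE_M.get(cell["radio"])
    return limit is not None and MIN_RANGE_M <= cell["range_m"] <= limit


def attach_cells_to_sites(sites, cells):
    """Group crowd-sourced cells by the index of their nearest official site."""
    merged = {i: [] for i in range(len(sites))}
    for cell in cells:
        nearest = min(range(len(sites)), key=lambda i: math.dist(sites[i], cell["xy"]))
        merged[nearest].append(cell)
    return merged


def covered(point, site_cells):
    """True if point lies in the union of the coverage circles of one site."""
    return any(math.dist(point, c["xy"]) <= c["range_m"] for c in site_cells)
```

Keeping the union of circles avoids committing to a single merged radius; a simpler alternative would be to assign the largest attached range to the site, at the cost of overestimating coverage.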

Scenario selection
Having the power and ICT system datasets, we are now able to capture the differences of the interconnected power and ICT system for different municipalities as scenarios. In contrast to the clustering-based classification methodology used by Kittl et al. (2018) and Kays et al. (2016), we focus only on the population density and do not consider land use type. In order to retrieve the settlement data and classification of Germany, we use a 2019 dataset from the Federal Statistical Office of Germany (Statistisches Bundesamt (Destatis) 2020). This dataset uses the Degree of Urbanisation Classification (DEGURBA) methodology and classifies the local administrative units by considering geographical contiguity and population density (Eurostat 2018). Thus, 11007 scenarios were retrieved and classified into three types (rural, semi-urban and urban).
However, the limited quantity and quality of power system data hinders the processing of all scenarios. Thus, an approach to evaluate the data completeness was developed, including completeness requirements and scenario evaluation. These steps can be omitted if reliable power system data, e.g. Distribution System Operator (DSO) geodata, is available. The basic requirement of scenario completeness is the presence of datapoints. Thus, the scenarios with zero ICT datapoints and zero MV and MV/LV datapoints have been filtered out. From the 11007 initially available scenarios, 9698 have been removed as zero-data scenarios. In order to evaluate the completeness of a scenario regarding the power system nodes, we have calculated a completeness-of-supply metric as the ratio of the number of nodes per km² to the population per km². We assume that the scenarios with greater completeness of supply have a higher density of power system nodes. The selected scenarios were later validated by applying a demand estimation. The demand per km² was estimated as the product of the number of households and an average demand per household, assuming an average household size of 2 people (Bundesinstitut für Bevölkerungsforschung 2021) and a demand of 3 kW per household. We estimated the number of transformer substations required to meet this demand based on the average transformer capacity for each region type (Prettico et al. 2016) and compared it to the number of substation nodes in the datasets. The scenarios which fit this estimation were considered conditionally complete. Of the 1309 remaining municipalities, only 53 satisfied the completeness requirement.
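The arithmetic behind the completeness check can be written down as a short sketch. The household size (2 people) and the demand (3 kW per household) are taken from the text; the transformer capacity passed in by the caller is a placeholder, since the per-region values from Prettico et al. (2016) are not reproduced here, and treating kVA as kW (unity power factor) is a simplifying assumption. The function names are hypothetical.

```python
import math


def completeness_of_supply(n_nodes, population, area_km2):
    """Ratio of power system node density to population density."""
    return (n_nodes / area_km2) / (population / area_km2)


def required_substations(population, transformer_kva,
                         household_size=2.0, demand_kw_per_household=3.0):
    """Substations needed to cover the estimated demand (kVA treated as kW)."""
    households = population / household_size
    demand_kw = households * demand_kw_per_household
    return math.ceil(demand_kw / transformer_kva)
```

For example, a municipality with 4000 inhabitants yields 2000 households and 6000 kW of estimated demand; with an assumed 400 kVA per transformer, 15 substations would be required, which can then be compared with the number of substation nodes found in the dataset.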

Results
In the previous section we performed multiple steps to increase the quality of the datasets. However, due to the limited completeness of the power system data, especially for urban zones, we obtained a limited number of conditionally complete scenarios and therefore merged MV and MV/LV nodes into a single set. At the same time, for the semi-urban and rural scenarios the power system data showed a higher level of completeness. The availability of the BNA dataset with reliable location data has significantly increased the quality of the ICT dataset and therefore the precision of the estimated communication-related values.
The geospatial relation values have been retrieved based on the prepared datasets and 53 observation scenarios $S_o$. The algorithms described in the Methodology section were applied in a Python Geographic Information System (GIS) application using PyQGIS, the Python API of QGIS (QGIS Development Team 2021), which provides implementations of the distance matrix, buffer area, Voronoi tessellation and count points in polygon algorithms. Overall, we investigated areas of 261.08 km² for 3 urban, 71.20 km² for 5 semi-urban and 828.04 km² for 45 rural scenarios. The smaller area size for the semi-urban scenarios can be explained by the nature of these municipalities, which are usually parts of larger agglomerations.
First, we estimated the values of node density for each scenario. Figure 2a presents the average density $density_c$ of ICT nodes observed over all scenarios. Figure 2b presents the average density $density_p$ of power system nodes. These values follow the expected correlation between the population density and the amount of required infrastructure. The node density of the power system in semi-urban areas, with a median of 10.05 nodes per 10 km², lies closer to rural values. For the ICT system, the density of nodes in semi-urban scenarios is also closer to rural values, with a median of 1.63 nodes per 10 km² for semi-urban, 1.03 nodes per 10 km² for rural and 13.22 nodes per 10 km² for urban scenarios.
Next, we have observed the average distance between nodes in the power and ICT systems. Figure 2c presents the average distance $\varphi_{avg}$ between the ICT nodes, while Fig. 2d presents the average distance $\omega_{avg}$ between power system nodes. Regarding the values of $\omega_{avg}$, the median distance between power system nodes grows by a factor of about 2.5, from 0.89 km for urban zones to 2.23 km for rural ones. These values fall in the same range as observed by Meinecke et al. (2020), who, in addition to other methods, surveyed several grid operators. For the ICT system, the average distances between the base stations have a mean of 2.25 km for rural, 1.56 km for semi-urban and 2.23 km for urban scenarios. Figure 2e shows the average distance $d_{avg}$ between power and ICT nodes. Interestingly, the proximity values of the power system nodes and communication nodes follow the same pattern, with a median value of 1.78 km for urban, 1.30 km for semi-urban and 1.56 km for rural areas. The observed minimum and maximum distances correlate in a similar manner.

Fig. 2 Results: Quantification of the geospatial relations between ICT and power system
The large number of MLS datapoints with confirmed observations from 18 April 2020 to 18 April 2021 allows studying the average cell range $r_{avg}$. Figure 2f shows $r_{avg}$ for the different region types and technology tags available in the MLS dataset. Thus, 45.24% of the rural base stations use the LTE standard, while 26.49% and 28.27% provide access according to the UMTS and GSM standards respectively. A similar pattern is observed for the urban zones, with a clear dominance of LTE at 56.62%, followed by UMTS at 23.44% and GSM at 19.94%. For the semi-urban scenarios, the dominance of LTE cells increases to 69.67%.
One of the core observations of this work is the level of interconnectedness of both systems. In order to estimate it, we have observed the average number of ICT nodes within the supply area of a power system node according to Equation (1). First, we have investigated the power system node supply area size, which is presented in Fig. 2g. The mean supply area size grows from 0.32 km² and 0.49 km² for the urban and semi-urban areas respectively to 1.15 km² for the rural areas. The rural areas include multiple outliers with an area size of up to 15.29 km², which can be explained by the Voronoi tessellation being applied to the whole region and not only to the used fraction of land. The observed average number of ICT nodes within the supply area is 0.11 nodes for rural, 0.10 nodes for semi-urban and 0.45 nodes for urban scenarios. The maximum observed values are 3 ICT nodes in one supply area for a rural scenario and 7 for an urban scenario. Regarding the share of the power system nodes supplying ICT nodes, 18.11% and 11.99% of the power system nodes in rural and semi-urban areas, respectively, supply at least one ICT node, whereas for the urban areas that value grows to 35.75%. Thus, roughly every sixth power system node in rural areas, every eighth in semi-urban areas and every third in urban areas supplies power to the ICT system.
Analogously, the average number of power system nodes within the coverage of ICT nodes was calculated according to Equation (2). The dependency of the power system on the cellular network grows from the rural regions (median of 8 nodes per ICT node) to the urban regions (median of 56 nodes per ICT node). The observed values are presented in Fig. 2h. Despite the growing dependency in urban zones, the importance of the cellular network is higher in rural areas, where this type of communication is most likely the only one available besides power line communication, while the urban environment usually provides a variety of available technologies.

Conclusion
In this paper, we presented a methodology for quantifying the geospatial relations between nodes of the interconnected ICT and power system. We applied our methodology to publicly available geospatial datasets for ICT and power systems. In the absence of official data from the grid and ICT operators, we have used multiple data selection, extraction, correction and enrichment steps to create datasets with all required features. In our future work, we will conduct further validation of the methodology with official power and ICT system datasets and real-world scenarios, once these are made available. Next, the supply area of a power system node will be modelled in detail by looking at lower voltage grid structures and land use types. Finally, we will study geospatial relations between ICT and power system, taking into account different grid voltages and the penetration of renewable energy sources, and create an approach to generate realistic models for a given municipality or region.

About this supplement
This article has been published as part of Energy Informatics Volume 4 Supplement 3, 2021: Proceedings of the 10th DACH+ Conference on Energy Informatics. The full contents of the supplement are available online at https://energyinformatics.springeropen.com/articles/supplements/volume-4-supplement-3.