
State description of cyber-physical energy systems

Abstract

The integration of ICT into power systems has increased the interdependencies between the two systems. The operation of the power system depends on several ICT-enabled grid services, which manifest these interdependencies. The ENTSO-E system state classification is a tool that is widely used by operators to determine the current operational state of the power system. However, it does not adequately describe the impact of ICT disturbances on the operation of the power system. Despite their interconnections, the operational states of both systems have been described separately so far. This paper bridges the well-established ENTSO-E system state classification with an ICT system state classification, forming a new model considering the state classification of the ICT-enabled grid services. The model is developed by first identifying the ICT-enabled services, remedial actions and the respective performance requirements that are required by the power system. Then the states of these services are specified based on the supporting ICT system. The resulting joint state description shows how performance degradation of ICT-enabled services (introduced by disturbances) can affect the operation of the interconnected power system. Two case studies of such ICT-enabled services, namely state estimation and on-load tap changer control, are investigated in terms of how their operational states affect the states of the power system. A third case study highlights the interdependencies that exist between the services. These case studies demonstrate the interdependencies that exist between power and ICT systems in modern cyber-physical energy systems, thus highlighting the usage of a unified system state description.

Introduction

The decarbonisation of Power Systems (PS) has paved the way for smart grids with high penetration of Distributed Energy Resources (DERs), which are geographically distributed across the system. These DERs typically behave in a probabilistic, weather-dependent manner which leads to an increase in the system complexity with power flows becoming bidirectional and stochastic (IEA 2019). Additionally, there is also an increasing demand for improving factors such as safety, reliability and cost effectiveness of the system.

The smart grid, as a safety-critical infrastructure, should continuously and reliably provide electricity to its customers. For this purpose, system operators use a set of grid services such as ancillary services. They include operational management, voltage and frequency control, and system restoration. In case of contingencies in the PS, these grid services provide remedial actions to mitigate the impact of the contingencies and restore vital PS parameters such as voltage, frequency and power reserves to their corresponding normal operational security limits. These grid services also assist the operators in economically optimising the operation of the system (e.g., optimising generation costs while meeting a certain demand). However, considering the complex and stochastic nature of the PS with an increasing number of DERs, the reliable provision of these grid services becomes challenging. In addition to the transmission system, the distribution system also requires more active management and coordination due to the increase in DERs such as wind and PV power plants, electric vehicles, fuel cells and combined heat and power, which are more prominent on the distribution side.

Information and Communication Technologies (ICT) are a vital part of smart grids and are becoming deeply integrated with PS, resulting in a highly intertwined system referred to as Cyber-Physical Energy Systems (CPES) (Yu and Xue 2016). In a CPES, the PS has a strong dependency on ICT as it brings in enhanced monitoring, optimisation, decision making and control. The ICT system itself is also dependent on the PS as it requires electric power supply. Specifically, ICT enables better provision of the aforementioned grid services, which are essential for the stable and reliable operation of the PS. The operation of these services, which are increasingly likely to be provided via digital markets (Doostizadeh et al. 2018), places certain requirements on the ICT system. Failures in the ICT system may cause a degradation in the performance of the grid services, which in turn might affect the PS operation (Tøndel et al. 2017; Rasmussen et al. 2017). The tight coupling between ICT and PS operation has been demonstrated by past events. The 2003 North American blackout was caused by a software failure in the state estimator combined with an alarm system failure, giving the operator inaccurate awareness about the system (NERC 2004). In Austria, congestion in the ICT network of a gas infrastructure almost caused a blackout in 2013. In this case, a large number of redundant messages was circulated among controllers due to a software bug in the communication protocol (Schossig and Schossig 2014). The 2015 Ukraine blackout has shown that PS are also vulnerable to cyber-attacks (Wu et al. 2017). Evidently, ICT has become more than just a support system for PS as ICT failures can impact the PS operation. In the context of this paper, the ICT system encompasses computation or communication components supported by relevant software or firmware. Components include field devices such as sensors and controllers, network devices such as routers, switches and links, as well as servers which can be located in the control room for decision making.

The ENTSO-E system state classification (referred to as ESSC) (European Union (EU) Commission Regulation 2017/1485 2017) is widely used by system operators to determine the current operational state of the PS. Based on certain PS parameters, the operation of the system can be classified into one of five states. The main drawback of the ESSC is that the impact of failures in ICT system (i.e., the degradation in ICT system) on the PS states is not adequately represented. Specifically, the connection between the performance degradation of ICT-enabled grid services and the state of PS, based on the ESSC, is yet to be investigated.

The main contribution of this paper is the creation of a novel state description for CPES focusing on the interdependencies between the power and ICT systems. It bridges the well-established states of the ESSC with a new model representing the state classification of the ICT-enabled grid services. First, the ESSC is presented in detail with a focus on its dependency on the ICT system as well as the potential bridges between the two domains. Next, the ICT system states are developed considering the performance of the ICT-enabled PS services. The state degradation of two exemplary grid services, namely State Estimation (SE) and On-Load Tap Changer (OLTC) control, is investigated independently as well as in combined case studies, considering various ICT system disturbances. The resulting impact on PS states is also analysed. These examples demonstrate how the proposed novel state description integrates states of ICT-enabled grid services into the ESSC, thus bridging the operational states of the power and ICT domains in CPES.

Related work

The ESSC provides essential information required for the system operation. Based on the monitored information and system operational security limits, the PS can be classified into five states according to (European Union (EU) Commission Regulation 2017/1485 2017; Fink and Carlsen 1978; Gagnon et al. 2007): Normal, Alert, Emergency, Blackout and Restoration (shown in Fig. 1). Failures or contingencies may cause a degradation in the system state, e.g., Normal to Alert. Depending on which state the PS is in, suitable grid services can be applied to bring the system back to Normal or take mitigating actions to prevent further system degradation. These states, however, are based only on PS parameters and do not adequately consider the performance of the interconnected ICT system. Other state classification approaches, like the traffic-lights used in Drayer et al. (2015) for agent-based distribution grid operation, do not include an explicit consideration of ICT aspects. The impact of failures in ICT systems on the PS states is yet to be fully understood, as underlined in the survey in Tøndel et al. (2017). That survey recommends focusing on static aspects like components and topology and considering relevant straightforward use cases from the transmission as well as distribution system operator’s (TSO and DSO) point of view. Existing research does not include the system operator’s point of view as it is difficult to map these failures on PS operation. This paper presents a new system state description for CPES that considers performance degradation of the ICT system and the services enabled by it as well as its impact on the PS. This enables PS operators to explicitly consider the necessary ICT services in the CPES state description.

Fig. 1. ENTSO-E power system state classification (Fink and Carlsen 1978)

The reliability-oriented model in Panteli (2013), using two states each, conjointly depicts PS and ICT. A computationally intensive sequential Monte Carlo simulation is used to investigate the impact of ICT components’ failure and repair rates for small scale test systems with the aim of quantifying the impact of ICT failures on the PS performance. The results do not allow operational decision making. A similar two-state model for CPES is presented in Schacht et al. (2016), where the focus is on time dependencies of power demand and injection as well as failure rates for PS reliability. The results underline the necessity to investigate the interdependencies between modern and future PS and ICT infrastructures. However, in the context of state description, the details regarding the states as well as the question of how to identify the system states at a given point of time are not described. A Markov model of a selection of smart grid technologies to analyse the impact of three ICT states on customers and distributed generation reliability is proposed in Kamps et al. (2018). Using exemplary medium voltage (MV) grids, ICT services such as a decentralised network automation system, voltage-regulated distribution transformer control and a line voltage regulator control are compared, but the resulting PS state is not considered. In Narayan et al. (2019), a set of ICT failure categories for analysing the dependency of ICT on PS services is introduced. These failure categories can be used in order to group all ICT problems that would have identical impact on the PS. Thus, the total number of relevant ICT failures that need to be considered for dependability analyses of CPES can be reduced drastically. However, the work does not provide any results of successful application of these failure categories. In this paper, the presented state description enables the operator to take decisions based on states of the interdependent ICT-enabled services and PS.

Going beyond the components’ perspective and considering the services provided, interdependencies of various infrastructures related to energy supply are identified in Rinaldi et al. (2001). Cascading and escalating of failures as well as their integration into state classification are theoretically outlined but are neither modelled nor applied to any use cases. The state machine approach in Laprie et al. (2007) highlights the impacts of information and electricity failures on their interdependent infrastructures through cascading, escalating or common cause failures. However, a joint infrastructure state description is not presented. The authors of (Wäfler and Heegaard 2013) propose a meta-model with PS and ICT having three states each that allow state classification. The ICT system provides services which are enabled by a set of ICT components. Considering the states of PS and ICT in a dependability analysis, the model covers the interdependencies but gives neither any indices nor service specifications for the systems under investigation. The resilience of ICT systems is introduced in Sterbenz et al. (2010). The strategy described there covers the state classification of the ICT system and can be extended to ICT services which are required by the PS.

In summary, the ESSC is well acknowledged and widely used in PS, yet no comparable state description is available for the ICT-domain in CPES. Existing research works are either strictly PS specific or simplify the details of the ICT system, thus limiting their usability in operation of CPES which has high ICT penetration.

ENTSO-E system state classification

The ESSC is a comprehensive health and risk indicator for the current European transmission system and is extensively described in (European Union (EU) Commission Regulation 2017/1485 2017; European Union (EU) Commission Regulation 2017/2196 2017). It is used among TSOs to exchange high-level information about the status of their network as well. Despite its abstraction level, the ESSC considers many complex aspects of PS risk assessment. The classification itself is based on the five aforementioned states. The Restoration state takes a special role as it is subsequent to the Emergency and Blackout States (European Union (EU) Commission Regulation 2017/1485 2017) but it is not a direct result of the ESSC state identification process itself. Therefore, this work focuses on the remaining four states, all of which can be exactly defined for each time of operation. Each abnormal state (i.e., Alert, Emergency and Blackout) has a range of predefined measures and actions, which the system operator can take to bring the system back to Normal state. Note that according to Article 2 of (European Union (EU) Commission Regulation 2017/1485 2017), these system states are also applicable to DSOs, however detailed investigations regarding DSOs are out of the scope of this work.

Elements of the ENTSO-E state identification process

In this section, the vast details from (European Union (EU) Commission Regulation 2017/1485 2017) are summarised. In order to perform the ESSC state identification process to identify the current state of the system, the system operator requires the following:

Operational Limits (OL) -These are an important criterion for the state identification process and describe explicit thresholds for various PS measurement parameters such as frequency, voltages at all nodes, currents on all lines and the thermal limits of various assets. An exhaustive description of each of these measurements can be found in 50hertz et al. (2013). That reference differentiates between OL and frequency, as it includes temporary tolerance of certain assets’ OL under certain special circumstances. In this paper, frequency is considered as a part of the OL and the temporary tolerance aspect is not considered.

State Estimation (SE) -The SE is a grid service that provides a consistent set of information about the current topology, voltages and currents of the observed PS. This is necessary because measurements are always prone to change with regard to their availability and timeliness across the system. Inconsistencies among the measurements or their timestamps make it challenging to continuously balance generation and consumption while keeping system parameters within the OL. Thus, several important functions and services of TSOs’ control centers rely on SE results, and so does the ESSC.

Contingency Lists (CL) -This list consists of all viable contingencies in the network and is compiled and maintained by the TSO. It comprises all possible disturbances in assets like generators, transformers, busses or lines. The list has to be updated and exchanged among neighbouring TSOs regularly, at least once a year. Additionally, a list of exceptional contingencies is also created and exchanged. These are described in (European Union (EU) Commission Regulation 2017/1485 2017) as scenarios with uncommon (or low probability) disturbances such as loss of double lines, loss of a busbar during increased risk of outages or common mode failures of generating units or DC links.

Remedial Actions (RA) -This list consists of all viable countermeasures for each of the contingencies identified in the CL. These countermeasures are called RA in the context of the ESSC contingency analysis, and can be categorised into preventive and curative actions. While preventive RA are to be activated before the occurrence of a contingency, curative RA are applied right after a contingency is detected. A non-exhaustive list of common RAs can be found in 50hertz et al. (2013).

Contingency Analysis (CA) -To identify critical contingencies, each contingency from the CL is simulated based on the most recent SE result. For all single contingencies that would lead to a violation of the OL, suitable RA are identified. These critical contingencies are simulated once again considering the utilisation of suitable RAs. The result of the CA is a list of contingencies that, even after utilisation of all available RA, would still lead to an OL violation.
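The two simulation steps of the CA can be sketched as follows. This is only a minimal illustration under stated assumptions, not the ENTSO-E procedure itself: `violates_ol` is a hypothetical stand-in for a power flow simulation followed by an OL check, and contingencies and RAs are represented by plain strings with invented toy outcomes.

```python
from typing import Callable, Iterable, Optional


def contingency_analysis(
    contingency_list: Iterable[str],
    remedial_actions: dict[str, list[str]],
    violates_ol: Callable[[str, Optional[str]], bool],
) -> list[str]:
    """Return the contingencies that still violate the OL after all viable RAs.

    violates_ol(contingency, ra) is a hypothetical simulation hook. It returns
    True if the OL would be violated with `ra` applied (no RA when ra is None).
    """
    unresolved = []
    for contingency in contingency_list:
        # Step 1: simulate the contingency on the latest SE result without RA.
        if not violates_ol(contingency, None):
            continue  # not a critical contingency
        # Step 2: re-simulate the critical contingency with each viable RA.
        ras = remedial_actions.get(contingency, [])
        if not any(not violates_ol(contingency, ra) for ra in ras):
            unresolved.append(contingency)
    return unresolved


# Toy outcomes (assumed): line_A is harmless, line_B is fixed by redispatch,
# trafo_C has no effective RA.
def toy_violates_ol(contingency: str, ra: Optional[str]) -> bool:
    outcome = {
        ("line_A", None): False,
        ("line_B", None): True,
        ("line_B", "redispatch"): False,
        ("trafo_C", None): True,
        ("trafo_C", "tap_change"): True,
    }
    return outcome[(contingency, ra)]


result = contingency_analysis(
    ["line_A", "line_B", "trafo_C"],
    {"line_B": ["redispatch"], "trafo_C": ["tap_change"]},
    toy_violates_ol,
)
print(result)  # ['trafo_C']
```

An empty result list corresponds to the case where every critical contingency can be handled by at least one available RA.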

ENTSO-E state identification process

Figure 2 summarises and illustrates the ENTSO-E state identification process. Note that there are more possible transitions from Normal to Alert or Emergency than illustrated. These are not shown for the sake of brevity. Additionally, to improve clarity, the CA is shown with two separate calculation steps. Based on the results from the CA and the check on OL violation, the current PS state can be identified as follows:

Fig. 2. Flowchart of the ENTSO-E state identification process focusing on contingency analysis

Normal -A PS is said to be in Normal state as long as the check on current OL violation is negative and the resulting list from the CA is empty. This implies that no potential contingencies have been identified which cause OL violation.

Alert -If the CA has identified at least one potential contingency that, based on the current SE results and the currently available RA, cannot be handled appropriately and would thus violate the OL, the PS is said to be in Alert State. In this state, there is no OL violation yet. Another condition that triggers the Alert State is a lack of more than 20 % of the required amount of active power reserves for more than 30 min.

Emergency -If any OL is violated, an immediate transition to the Emergency State is triggered. The second important condition for a transition to the Emergency State is the failure of any critical tool or facility defined in (European Union (EU) Commission Regulation 2017/1485 2017) for more than 30 min. They include monitoring tools (like State Estimation), switch control, communication between TSOs and the operational security analysis. However, the impact on the system state when the failure of these critical tools and facilities is less than 30 min is not defined in (European Union (EU) Commission Regulation 2017/1485 2017). Additionally, the system is also in Emergency State when at least one measure of the System Defence Plan, described in (European Union (EU) Commission Regulation 2017/2196 2017), is activated.

Blackout -If the loss of loads is above 50% of the total load in a TSO system or if there is no voltage for more than 3 min, the system is defined to be in Blackout State.
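The four state conditions listed above can be condensed into a small classifier. The sketch below is illustrative only: the field names are hypothetical, the thresholds are the ones quoted in the text, and evaluating the conditions from most to least severe is an assumption made here so that the worst matching state wins. The Restoration state is left out, in line with the scope stated earlier.

```python
from dataclasses import dataclass, replace


@dataclass
class PowerSystemSnapshot:
    ol_violated: bool                # any operational limit currently violated
    unresolved_contingencies: int    # length of the list returned by the CA
    reserve_shortfall_pct: float     # missing share of required active power reserves
    reserve_shortfall_min: float     # duration of that shortfall in minutes
    critical_tool_outage_min: float  # longest outage of a critical tool or facility
    defence_plan_active: bool        # any System Defence Plan measure activated
    load_loss_pct: float             # lost load as share of total TSO load
    no_voltage_min: float            # duration without voltage in minutes


def classify_ps_state(s: PowerSystemSnapshot) -> str:
    # Assumed ordering: check the most severe state first.
    if s.load_loss_pct > 50 or s.no_voltage_min > 3:
        return "Blackout"
    if s.ol_violated or s.critical_tool_outage_min > 30 or s.defence_plan_active:
        return "Emergency"
    if s.unresolved_contingencies > 0 or (
        s.reserve_shortfall_pct > 20 and s.reserve_shortfall_min > 30
    ):
        return "Alert"
    return "Normal"


healthy = PowerSystemSnapshot(False, 0, 0.0, 0.0, 0.0, False, 0.0, 0.0)
print(classify_ps_state(healthy))                                        # Normal
print(classify_ps_state(replace(healthy, unresolved_contingencies=1)))   # Alert
print(classify_ps_state(replace(healthy, critical_tool_outage_min=45)))  # Emergency
```

Note that a critical tool outage shorter than 30 min leaves the snapshot in Normal here, which mirrors the gap in the regulation pointed out above rather than resolving it.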

The bridge between PS and ICT state classifications

The main objective of this paper is to create a joint state description for CPES. In this regard, the term ’bridge’ depicts the correlation between the states of PS and ICT systems. Within the context of the ESSC, two bridging aspects with the ICT domain can be defined. These are based on the PS services that are enabled by ICT systems. The first bridge is the high-level services for the monitoring and control of a PS. The second bridge is the various RAs associated with the CA. Both these aspects are reliant on ICT in present systems, but this is expected to increase further in the future due to the increasing decentralisation and growing utilisation of automated and digital markets.

The high-level services correspond to – but are not limited to – the list of ’critical tools or facilities’. They aid in gathering PS measurements (e.g., voltage, current) as well as in performing operational security analysis (e.g., SE, power flow, preparing the list of contingencies and RA). While it is already defined that the PS immediately drops to the Emergency State as soon as even one high-level service fails for an extended period of time, the exact functional requirements that these services place on the ICT network are unclear. Specifically, the consequence of the ICT system being in an intermediate state between fully-functional and failed remains unknown. For example, a monitoring service that provides unreliable data or a remote switch control with a very high delay could potentially cause severe problems even if the ICT system would still be considered to be functional in these cases according to conventional two-state ICT models. Hence, a more detailed differentiation and consideration of states of the ICT system is also required for assessing the performance of these high-level services of the PS.

The RA is a more complex bridging aspect between the states of PS and ICT domains. As described in the previous section, the CA simulates the activation of all viable RAs for each critical contingency using the most recent SE results. For these simulations, it is generally assumed that each RA is either applicable in the current situation or not. This situational applicability can, for example, be based on currently available operational flexibility of DERs. Additionally, depending on the implementation, RAs may also depend on certain high-level services for necessary inputs. The fact that various disturbances in the ICT system can potentially have an impact on RAs – not only on their availability but also on their performance – is yet to be considered in current research.

One approach to consider the state of the ICT system would be to identify RAs that require communication between components and incorporate this dependency in the CA (Panteli 2013). This would denote the performance of the RAs. For each contingency that can only be averted by an ICT-reliant RA, the current state of the ICT system needs to be analysed as well. This way, an action that would be considered sufficient and available considering PS aspects, could potentially turn unavailable if the ICT system’s state is included in the state identification process. In worst case scenarios, this would mean the difference between Normal and Alert PS state. Furthermore, it is possible to incorporate more detailed metrics of the states of the ICT system. Simulations can, for example, potentially determine the tolerated maximum communication latency for an RA’s control system as demonstrated in Zwartscholten et al. (2020). During the CA, if the ICT latency exceeds its threshold, the performance of the corresponding RA would first degrade and then may eventually fail.
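As a rough sketch of this idea, the status of an RA inside the CA could combine its PS-side applicability with the currently observed ICT latency. The function name, the status labels and the second threshold `fail_ms` (beyond which the RA is treated as failed) are assumptions introduced here for illustration; the text only states that performance first degrades and may eventually fail.

```python
def ra_status(
    ps_applicable: bool,
    ict_reliant: bool,
    latency_ms: float,
    tolerated_ms: float,
    fail_ms: float,
) -> str:
    """Combine PS applicability and ICT latency into a single RA status."""
    if not ps_applicable:
        return "unavailable"
    if not ict_reliant or latency_ms <= tolerated_ms:
        return "available"
    # Beyond the tolerated latency the RA first degrades, then fails.
    return "degraded" if latency_ms <= fail_ms else "unavailable"


print(ra_status(True, True, 80, 100, 500))    # available
print(ra_status(True, True, 200, 100, 500))   # degraded
print(ra_status(True, True, 800, 100, 500))   # unavailable
print(ra_status(True, False, 800, 100, 500))  # available
```

If the only RA for a critical contingency turns "unavailable" in this way, the contingency stays on the CA result list, which is exactly the Normal-to-Alert difference mentioned above.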

Figure 3 gives an overview on the bridge between the states of the ICT system and the states of PS (i.e., ESSC). The ICT system in smart grids consists of a set of components to perform certain functions in order to support PS operation. Its main functions are to transfer data from one point to another, to process the available data and to manage the possibly large volumes of available data. The performance of these ICT functions can be denoted using three properties, namely availability, latency and accuracy (Narayan et al. 2019). These are explained further in the following section. Note that the performance of these ICT functions (i.e., data transfer, computation and data management) can also vary depending on the disturbances in the ICT network but this is not within the scope of this paper.

Fig. 3. Bridge between PS states and ICT states via ICT-enabled PS services

As mentioned earlier, ICT enables the provision of PS (or grid) services, which can be categorised into high-level services and RAs. Each of these services imposes certain requirements on the ICT system and its functions. In other words, each of these services has certain availability, latency and accuracy requirements, which should be ensured by the ICT system for the services to perform normally. Additionally, the services also require certain static information about the PS such as grid topology, bus and branch data, load and generation limits as well as available actuators (e.g., switches and tap changers). As a result, the states of the ICT system specific to each grid service can be determined based on availability, latency and accuracy. These states represent normal service performance (Normal State), degraded but acceptable service performance (Limited State) and failed service (Failed State), respectively. This emphasises the fact that, in CPES, the states of PS and ICT systems are bridged via the ICT-enabled PS services, i.e., the interdependency between PS and ICT systems is through these services. The details of the ICT states and the state classification are elaborated in the following section.

ICT system state classification

State definition and classification for communication networks has been studied in detail within the ResiliNets project (Sterbenz et al. 2010). In this approach, the communication network states are depicted on a 3x3 matrix using operational states and service parameters. The definition of ’operational state’ used in the ResiliNets project differs from the definition targeted in this paper, in the sense that the former considers only the communication network and not the service provided by it. In this paper, the ICT system includes field devices, the communication network as well as processing devices such as servers. The operational state in this paper considers both the ICT infrastructure as well as the performance of the grid services it enables.

In Kamps et al. (2018), the states of a decentralised network automation system are grouped into three categories – in service, unobservable and IED (Intelligent Electronic Device) outage. These states consider both the operational state as well as the services provided. However, the focus is only on availability of components, and latency requirements are assumed to be intact. This method is extended in this paper to also include PS requirements so as to develop an ICT state definition considering specific grid services. This provides a holistic view of power and ICT systems in the interconnected CPES. ICT plays a role in monitoring, market communication and economic dispatch. It also enables the aforementioned high-level services and RA for system operation as well as to defend against and mitigate contingencies. The proposed state description considers all these roles of the ICT system.

Elements of ICT system states

In order to determine the state of the ICT system in context of grid services, the operational state and service level aspects are considered, as shown in Sterbenz et al. (2010). The approach followed in this paper combines operational state and service level to determine the ICT states. This is done in order to align this state definition with that of the ESSC. The following properties are used to determine the state of the ICT system. Note that IT-Security related properties such as confidentiality and integrity are beyond the scope of this paper.

Availability -Availability is defined as the total time a system (or component) is fully functional over a time interval [0,t]. A high availability is measured to ensure that the ICT system is persistently providing its service. Instantaneous availability is the probability that a system (or component) will be operational at a specific time t. Availability, for instance, can be determined using heartbeat signals (Xu et al. 2018).

Accuracy -Accuracy is defined as the closeness of a measurement to its true value. Accuracy of measurements in turn helps determine the accuracy of a decision made by an ICT system based on these measurements. Measuring accuracy is cumbersome but can be done using anomaly detection techniques as shown in Brand et al. (2019).

Latency -Latency is defined as the total time lapse between transmission and reception of measurements and control signals. This also includes the processing times in components such as servers and routers. This element is measured to determine the timeliness of the communication network. Latency is a QoS aspect and can be measured as shown in Guo et al. (2015).
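Two of these properties lend themselves to a short numerical illustration. The sketch below is a simplification under assumptions made here: availability is estimated as the share of heartbeat periods in which at least one heartbeat arrived, and latency is the difference between reception and transmission timestamps. The function names and the estimator are hypothetical and are not taken from the cited works.

```python
from statistics import median


def availability_from_heartbeats(
    received: list[float], period: float, horizon: float
) -> float:
    """Estimate availability over [0, horizon] from heartbeat arrival times.

    Assumed estimator: the share of heartbeat periods that contain at least
    one received heartbeat.
    """
    slots = int(horizon // period)
    if slots == 0:
        raise ValueError("horizon must cover at least one heartbeat period")
    covered = {int(t // period) for t in received if 0 <= t < slots * period}
    return len(covered) / slots


def message_latencies(sent: list[float], received: list[float]) -> list[float]:
    """Per-message latency: reception time minus transmission time."""
    return [r - s for s, r in zip(sent, received)]


# Heartbeats expected every second over 5 s; the one in [3, 4) is missing.
print(availability_from_heartbeats([0.1, 1.1, 2.1, 4.1], period=1, horizon=5))  # 0.8

# Timestamps in milliseconds.
lat = message_latencies([0, 1000, 2000], [20, 1050, 2030])
print(lat, median(lat))  # [20, 50, 30] 30
```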

ICT system state identification process

Using the aforementioned properties, the state of the ICT system can be identified as follows:

Normal State -In this state, the ICT-enabled grid service is fully functional, i.e., there are no failures. Accurate measurements are recorded, transmitted, received and decisions are made accordingly. Measurements and control signals are communicated in a timely manner, i.e., latency is within desirable limits. In the case of a contingency, appropriate curative remedial actions can be triggered.

Limited State -This state results from non-critical ICT component failures that lead to a performance degradation in terms of availability, accuracy or latency, yet not to a complete failure of the service. Presence of redundant components in the ICT system helps maintain a suitable level of performance, as redundancy enhances availability. However, the ICT system is considered to be in Limited state, when redundant (or back-up) components are used. In this state, the ICT-service still has an acceptable performance, but there is an increased risk of further degradation, arising due to disturbances. In compliance with the Alert state of the ESSC, the Limited ICT state serves as a warning to the system operator regarding the ICT-enabled grid services.

Failed State -In the Failed state, critical ICT components required for normal operation have failed. Grid services enabled by the ICT system have also failed and are no longer available. For instance, failures in hardware and software vital to services can result in their low availability and accuracy. As a result, in this state, the ICT system may give severely delayed or inaccurate measurements and control actions, thereby affecting the grid services.
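The three ICT states can be illustrated with a small classifier. This is a sketch under assumptions made here, not the paper's implementation: each service is given two hypothetical sets of limits (`desired` for normal performance, `acceptable` for degraded but still usable performance), and the use of back-up components is passed in as a flag.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    min_availability: float  # lowest tolerated availability (0..1)
    max_latency: float       # highest tolerated latency in seconds
    max_sigma: float         # highest tolerated measurement standard deviation


@dataclass
class Observation:
    availability: float
    latency: float
    sigma: float


def within(obs: Observation, lim: Limits) -> bool:
    return (
        obs.availability >= lim.min_availability
        and obs.latency <= lim.max_latency
        and obs.sigma <= lim.max_sigma
    )


def classify_ict_state(
    obs: Observation, desired: Limits, acceptable: Limits, on_backup: bool
) -> str:
    if not within(obs, acceptable):
        return "Failed"   # service performance is no longer acceptable
    if within(obs, desired) and not on_backup:
        return "Normal"   # fully functional, no redundancy consumed
    return "Limited"      # degraded but acceptable, or running on back-up


desired = Limits(min_availability=0.999, max_latency=1.0, max_sigma=0.01)
acceptable = Limits(min_availability=0.99, max_latency=2.0, max_sigma=0.05)

print(classify_ict_state(Observation(0.9995, 0.4, 0.005), desired, acceptable, False))  # Normal
print(classify_ict_state(Observation(0.9995, 0.4, 0.005), desired, acceptable, True))   # Limited
print(classify_ict_state(Observation(0.995, 1.5, 0.02), desired, acceptable, False))    # Limited
print(classify_ict_state(Observation(0.90, 5.0, 0.02), desired, acceptable, False))     # Failed
```

Treating back-up operation as Limited even when all limits are met follows the state definition above, where the Limited state acts as a warning to the operator.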

Case studies: analysis of ICT-enabled grid services

This section provides three case studies to demonstrate the proposed novel state description as well as the interdependency between the states of PS and ICT domains in CPES. After outlining the CPES architecture considered, the ICT states of two grid services, namely State Estimation (SE) and OLTC control, are analysed individually in the first two case studies. These grid services are chosen to represent a high-level service and an RA respectively, which are the previously identified bridging aspects. The third case study analyses the ICT states of SE-based OLTC control to investigate the states of interdependent grid services sharing the same ICT infrastructure. In all three cases, the resulting impact of ICT state degradation on the interconnected PS is also investigated.

Figure 4 shows the considered CPES with the two grid services. A typical implementation of grid services consists of a set of Operational Technology (OT) devices, such as Remote Terminal Units (RTU) and IEDs, placed in the PS to measure required parameters. In addition to sensing, the OT devices located at the DERs also include control capabilities. A server, located in the control room, provides the computational resources required for running various processing algorithms, such as SE. In this context, the control room represents the system operator. Field measurements are transmitted to the control room using a communication network with devices such as routers, links and network switches. The communication network could either be wired or wireless and has a delay associated with it. However, in Fig. 4 the components of the communication network are abstracted and depicted as a link with a total latency. Each component of the ICT system has a certain functionality like measuring (e.g., OT devices), data transmission (e.g., router and links) and processing (e.g., server), which is associated with certain processing times. The server processes these measurements and provides suitable SE results. The transformer tap changing is done using an ICT-based OLTC controller. This controller can receive both local as well as remote measurements. Local measurements are received via a direct link to the OT device, whereas remote measurements are received via the communication network. The OLTC controller can alternatively also operate based on the results of the SE service, which it receives from the control room.

Fig. 4: Exemplary CPES showing State Estimation and OLTC Control services

Both services have several ICT requirements for their normal operation. Failures in any of the ICT components may violate these requirements, thereby affecting the performance of the services. The aforementioned requirements in terms of availability, latency and accuracy are quantified for the specific implementation of the underlying ICT system. As a result, these conditions depend on the performance of the components within the ICT system. In this paper, the impact of component failures on these conditions is considered, i.e., failures in the existing ICT system and their impact on the services are investigated. The states of the grid services are derived based on their performance.

Case study 1: analysis of a high-level service - state estimation

SE is one of the most important ICT-enabled services, which performs real-time monitoring of the PS (Abur and Exposito 2004). It involves estimating the state variables of the PS, namely voltage magnitudes and phase angles, from the measurements gathered from the OT devices (e.g., RTU, IED) at any given time. Typical field measurements include active and reactive power flows, current magnitudes, voltage magnitudes and active and reactive power injections. Additionally, the status of circuit breakers and switches is used to determine the current system topology. SE helps the system operators to identify the current operational state of the PS in accordance with the ESSC. Redundant measurements can be used to reduce the impact of measurement and telemetry errors, leading to a more accurate estimation. As shown earlier, SE is a high-level service and a failure of the SE service causes the PS state to degrade to Emergency State.

In order to model the performance of the SE service, the requirements for the service to perform normally must be investigated. These can be determined using two aspects, namely solvability and accuracy. Consider a PS with n state variables and m measurements, with \(m \geq n\). From Lukomski and Wilkosz (2008), the typical condition for the solvability of SE can be identified as \(\rho(H) \geq n\), where \(\rho(H)\) is the rank of the measurement Jacobian matrix H that relates the m measurements to the n state variables. Let \(M_{s}=\{M_{s1},\ldots,M_{sn}\}\) denote the set of field measurements received at the control room. It is evident that the accuracy of the SE results is influenced by the accuracy of \(M_{s}\), assuming that the SE algorithm runs ideally. The accuracy of \(M_{s}\), which is determined using its standard deviation \(\sigma\), must satisfy \(\sigma _{M_{sn}} < \sigma _{max_{Mn}}\), where \(\sigma _{max_{Mn}}\) represents the maximum allowed deviation of measurements set by the SE. Moreover, since SE is performed dynamically, there is a time constraint as well. A typical SE provides an estimation for a specific time period. To do so, data from the field must be available and processed in the control room within this time period. This is referred to as latency l and may vary depending on the implementation. Typical latency requirements for SE can be found in Kansal and Bose (2012); Kuzlu et al. (2014). In this paper, the median latency \(\eta(l)\) is used: if it is lower than the permitted latency, i.e., \(\eta(l) < l_{limit}\), then the majority of the measurements satisfy the latency requirement of the SE service. The solvability condition can be violated due to either unavailability of field measurements (e.g., due to OT device failure) or excessive latency in the communication network (e.g., due to congestion). The SE service may also have a set of pseudo measurements \(M_{p}=\{M_{p1},\ldots,M_{pn}\}\) available at the control room. Each element in \(M_{p}\) is related to the corresponding element in \(M_{s}\).
Pseudo measurements are typically derived from the historical measurements available in the control room. If a certain measurement is not received in the control room within the time interval l, the corresponding pseudo measurement may be used. Since the ICT system aims to ensure normal operation of the grid services, the state of the ICT system can be determined based on the performance of the SE service. These states are described as follows:

Normal State - SE is said to be in Normal State if both solvability and accuracy conditions are satisfied, i.e., \(\rho(H) \geq n\) from available OT devices, \(\eta(l) < l_{limit}\) from the communication network and \(\sigma _{M_{sn}} < \sigma _{max_{Mn}}, \forall M_{s}\). Note that \(M_{p}\) is not used in Normal State. In this state, the system operator can use this service to perform real-time monitoring and estimation of the required state variables.

Limited State - Disturbances such as failures in OT devices or congestion in communication networks may cause the solvability or accuracy conditions to be violated. In the case that \(\rho(H) < n\) or \(\sigma _{M_{sn}} > \sigma _{max_{Mn}}\), suitable \(M_{p}\) may be used, if available, to fulfil these conditions. Since \(M_{p}\) are derived from historical data, they represent the current status of the PS less accurately than \(M_{s}\). Therefore, the SE service is said to be in Limited State when \(M_{p}\) are used. However, it has to be ensured that \(\sigma _{M_{pn}} < \sigma _{max_{Mn}}\). In this state, the performance of SE is lower than in the Normal State, i.e., the system operator can determine the current state of the system but with decreased accuracy of the estimated variables. Suitable actions must then be taken to restore the SE service to the Normal State.

Failed State - If \(\rho(H) < n\) and \(M_{p} = \emptyset\), then the SE is said to be in the Failed State, because the estimation algorithm is no longer solvable with the available \(M_{s}\). Additionally, if the accuracy of the \(M_{s}\) received by the SE falls below a certain predefined limit set by the system operator, the service is said to have failed. This can also happen when sufficiently accurate \(M_{p}\) are lacking. Such situations can be caused by multiple failures in the ICT system, which can in turn decrease the accuracy of the data received at the control room. Failures in the control room server can also result in the failure of the SE service. This can, however, be mitigated by the presence of redundant (or backup) servers. In this state, the SE loses its monitoring and estimation capabilities and hence the PS moves to Emergency State (according to ESSC).
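
The three SE states can be read as a small decision procedure. The following Python sketch only illustrates the classification described above and is not the authors' implementation; the names `ServiceState` and `classify_se` and the boolean inputs are assumptions introduced here. In particular, `pseudo_restores` is assumed to be true only if suitable pseudo measurements are available, sufficiently accurate, and make the estimation solvable again.

```python
from enum import Enum


class ServiceState(Enum):
    NORMAL = "Normal"
    LIMITED = "Limited"
    FAILED = "Failed"


def classify_se(*, solvable, timely, accurate, pseudo_restores,
                server_available=True):
    """Classify the SE service state from pre-evaluated conditions.

    solvable         -- rho(H) >= n using field measurements M_s only
    timely           -- median latency eta(l) < l_limit
    accurate         -- sigma of every M_s is below its permitted maximum
    pseudo_restores  -- suitable, sufficiently accurate M_p are available
                        and restore the violated conditions (assumption)
    server_available -- a control room server (or a backup) is running
    """
    if not server_available:
        return ServiceState.FAILED
    if solvable and timely and accurate:
        return ServiceState.NORMAL   # M_p is not needed
    if pseudo_restores:
        return ServiceState.LIMITED  # operating with M_p, reduced accuracy
    return ServiceState.FAILED


# An OT device failure removes a measurement, but M_p can replace it.
state = classify_se(solvable=False, timely=True, accurate=True,
                    pseudo_restores=True)
print(state.value)  # Limited
```

Keeping the condition checks outside the classifier is a deliberate simplification: how rank, latency and accuracy are evaluated depends on the specific ICT implementation.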

The conditions for the ICT state classification of the SE service are summarised in Fig. 5, which can be viewed in the place of the dotted box of Fig. 3. Using the states of the SE service, which indicate its performance, the operator has better information regarding the operation of the CPES as a whole, i.e., both the PS and the ICT system. For example, when the PS is in Normal State and the ICT-enabled SE service is in Limited State, the operator knows that there is an increased risk of ICT disturbances causing the SE service to fail, thereby pushing the PS into Emergency State (according to ESSC). The CPES operator can then be prepared to handle impending disturbances by possibly dropping the PS to Alert State, while being aware that the accuracy of the SE results has decreased, as indicated by the service being in Limited State.
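
The solvability, latency and accuracy conditions of the SE service can be evaluated numerically. The sketch below is a minimal illustration using only the Python standard library; the function names, the toy Jacobian and all numeric values are hypothetical and not taken from the paper. The rank is computed by exact Gaussian elimination, which is adequate for a small example but not meant for realistic grid sizes.

```python
from fractions import Fraction
from statistics import median


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_solvable(H, n_states):
    """Solvability condition: rho(H) >= n."""
    return matrix_rank(H) >= n_states


def latency_ok(latencies_ms, limit_ms):
    """Latency condition: median latency eta(l) is below the permitted limit."""
    return median(latencies_ms) < limit_ms


def accuracy_ok(sigmas, sigma_max):
    """Accuracy condition: every standard deviation is below its maximum."""
    return all(s < s_max for s, s_max in zip(sigmas, sigma_max))


# Toy example: 3 measurements, 2 state variables (all values hypothetical).
H = [[1, 0], [0, 1], [1, 1]]
print(is_solvable(H, n_states=2))            # True: full column rank
print(is_solvable([[1, 0], [1, 0]], 2))      # False: redundant rows only
print(latency_ok([80, 95, 400], limit_ms=100))               # True
print(accuracy_ok([0.01, 0.02, 0.01], [0.03, 0.03, 0.03]))   # True
```

Note that a single late measurement (400 ms here) does not violate the median-based latency condition, which matches the intent of using \(\eta(l)\) rather than the maximum latency.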

Fig. 5: ICT State Classification of High-level Service - State Estimation

Case study 2: analysis of a remedial action - OLTC control

This section investigates the ICT states of OLTC control, which is a remedial action. Transformers with OLTC decouple grids with different voltage levels. By varying the tap position, the voltage of the entire secondary side can be adjusted and kept within the permissible limits, thereby serving as a remedial action for voltage problems on the secondary side. In contrast to conventional MV-LV transformers with fixed ratios, OLTC-equipped devices can vary their tap position during operation to dynamically adjust their ratio and thus provide better control capabilities (FNN 2016). As mentioned earlier, the OLTC controller in Fig. 4 operates by combining a local voltage measurement (e.g., directly at the LV side of the transformer) and a remote measurement from the most voltage-critical node of the grid (e.g., the voltage at the bus farthest from the transformer) (Kamps et al. 2018). Since the voltage profile of traditional unidirectional LV feeders strictly decreases from the transformer to the end of the feeder, the voltage profile of the whole grid can be sufficiently described using only these two measurements. In this case study, the state of the ICT system is determined by the performance of the OLTC control grid service.
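
To illustrate how the two measurements could be combined, the following Python sketch applies a simple deadband rule to the local and the remote voltage. The rule itself, the function name `tap_step`, the per-unit limits of 0.95 and 1.05 and the convention that a positive step raises the secondary voltage are assumptions made for illustration; they are not taken from the cited OLTC implementations.

```python
def tap_step(v_local, v_remote=None, v_min=0.95, v_max=1.05):
    """Return +1 (raise secondary voltage), -1 (lower it) or 0 (hold).

    Voltages are in per unit. If no remote measurement is passed,
    the decision is based on the local measurement alone.
    """
    voltages = [v_local] if v_remote is None else [v_local, v_remote]
    too_low = min(voltages) < v_min
    too_high = max(voltages) > v_max
    if too_low and not too_high:
        return 1
    if too_high and not too_low:
        return -1
    # Either everything is within limits, or the violations conflict
    # and a single tap change cannot resolve both.
    return 0


print(tap_step(1.00, 0.93))  # 1: under-voltage at the remote node
print(tap_step(1.06, 1.00))  # -1: over-voltage at the transformer
print(tap_step(1.00))        # 0: only the local voltage is visible
```

The last call shows why the remote measurement matters: with the local value alone, an under-voltage at the far end of the feeder would go unnoticed.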

Normal State - The OLTC service is said to be in Normal State when the controller receives both local and remote measurements accurately and in time, and is able to change the transformer taps suitably. This implies that the availability, accuracy and latency requirements are satisfied.

Limited State - In case of certain ICT failures, the remote voltage measurement may be unavailable or excessively delayed. The OLTC controller then acts solely based on the local voltage measurement. Compared to the Normal State, the OLTC controller in this state lacks knowledge of the grid’s overall voltage profile. Tap changing may therefore cause voltage problems in other parts of the grid, especially at the nodes furthest away from the transformer, as described in Palaniappan et al. (2019). This represents the Limited State of the OLTC control remedial action.

Failed State - Events such as outages in the OLTC controller, local sensor failure or failures in the local direct connection (depicted with green bold arrows in Fig. 4) may prevent the automatic adjustment of transformer taps. In this case, OLTC control cannot contribute to remedying voltage problems in the LV grid. This represents the Failed State of the OLTC control service, where the remedial action is no longer available. Depending on the implemented fall-back solution, the transformer tap may automatically reset to the mid position or remain at the last tap position, as shown in Kamps et al. (2018).
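
The three OLTC states described above reduce to a check of which inputs are still available to the controller. A minimal Python sketch follows; the names `ServiceState` and `classify_oltc` and the boolean flags are assumptions for illustration only.

```python
from enum import Enum


class ServiceState(Enum):
    NORMAL = "Normal"
    LIMITED = "Limited"
    FAILED = "Failed"


def classify_oltc(controller_ok, local_ok, remote_ok):
    """Classify the stand-alone OLTC control service.

    controller_ok -- the OLTC controller itself is operational
    local_ok      -- local sensor and direct link deliver the local voltage
    remote_ok     -- the remote voltage arrives accurately and in time
    """
    if not (controller_ok and local_ok):
        return ServiceState.FAILED   # taps can no longer be adjusted
    if not remote_ok:
        return ServiceState.LIMITED  # acting on the local voltage only
    return ServiceState.NORMAL


# Congestion delays the remote measurement beyond its latency limit.
print(classify_oltc(controller_ok=True, local_ok=True,
                    remote_ok=False).value)  # Limited
```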

The conditions for ICT state classification of OLTC control are summarised in Fig. 6, which can be viewed in the place of the dotted box of Fig. 3. Using the states of OLTC control service, the operator can be made aware of potential ICT contingencies that could cause this RA to fail. In the case where the OLTC service is required to remedy an impending contingency, the service being in Failed state would cause the PS (or ESSC) to drop from Normal to Alert State. This is due to the fact that if the contingency occurs, the system operator cannot mitigate its impact and remedy the system (since OLTC control is the only RA considered in this case study).

Fig. 6: ICT State Classification of Remedial Action - OLTC Control

Case study 3: analysis of SE-based OLTC - interdependent grid services

This case study considers a high-level service (SE) along with a remedial action (OLTC control). It investigates the implications of interdependency between the ICT-enabled grid services and shows how state degradation of one service affects the other. The resulting impact on the states of PS (ESSC) is also investigated.

Case study 2 shows an exemplary stand-alone OLTC controller. However, the increasing penetration of DERs leads to more complex voltage profiles, i.e., they can no longer be assumed to be strictly decreasing from the transformer. This implies that OLTC control requires measurements from several nodes across the grid in order to efficiently remedy voltage violations. In this case study, the OLTC control operates based on the results of the SE service, which it receives from the control room via the communication network (refer to Fig. 4). This replaces the remote measurement of case study 2 and gives the OLTC controller a system-wide perspective, as shown in Salih and Chen (2016). The OLTC service is said to be in Normal State when it receives accurate and timely estimates from the control room based on the SE service. In case of an SE failure or a communication failure between the controller and the control room, the OLTC service uses the same fall-back mechanism as in case study 2, i.e., operation based on the local measurement. This is referred to as the Limited State of the OLTC service. The Limited and the Failed States of the OLTC service are similar to those of case study 2 and are shown in Fig. 6.
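
The dependency of the SE-based OLTC service on the SE service and on the link to the control room can be sketched as follows. This is an illustrative reading of the paragraph above, not the authors' implementation; the names `ServiceState` and `classify_se_based_oltc` and the boolean flags are assumptions. Any SE state other than Normal is assumed to trigger the fall-back to the local measurement.

```python
from enum import Enum


class ServiceState(Enum):
    NORMAL = "Normal"
    LIMITED = "Limited"
    FAILED = "Failed"


def classify_se_based_oltc(se_state, comm_ok,
                           controller_ok=True, local_ok=True):
    """Classify the OLTC service when it relies on SE results.

    se_state -- state of the SE service in the control room
    comm_ok  -- SE results reach the controller accurately and in time
    """
    if not (controller_ok and local_ok):
        return ServiceState.FAILED   # same Failed State as in case study 2
    if se_state is ServiceState.NORMAL and comm_ok:
        return ServiceState.NORMAL   # system-wide view is available
    return ServiceState.LIMITED      # fall-back: local measurement only


print(classify_se_based_oltc(ServiceState.NORMAL, comm_ok=True).value)   # Normal
print(classify_se_based_oltc(ServiceState.FAILED, comm_ok=True).value)   # Limited
print(classify_se_based_oltc(ServiceState.NORMAL, comm_ok=False).value)  # Limited
```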

Figure 7 shows the impact of grid service state degradation on each other as well as on the states of PS (ESSC). The black arrows denote direct state transitions that have already been described in the previous two case studies. Note that, for the sake of brevity, not all possible transitions are shown in Fig. 7. The direct transitions between PS states (e.g., Normal to Alert or Alert to Emergency) are also not shown here and are assumed to be implicit. For instance, when the OLTC control is in Failed state, the PS drops to the Alert state. In this situation, the occurrence of a PS contingency can potentially lead to OL violations as the RA (i.e., OLTC control) is in Failed state. This would ultimately result in the PS dropping to the Emergency state. The blue arrows, on the other hand, represent examples of conditional state transitions that need to be added to the ESSC in order to obtain a state description for the CPES as a whole. They indicate transitions that occur when certain other conditions are satisfied. Two such exemplary conditional state transitions are shown in Fig. 7 and are described below.

Fig. 7: State Transitions considering Interdependency between SE and OLTC Control

Conditional State Transition 1: In this scenario, the consequence of the SE being in Limited state is that voltage estimates with decreased accuracy are sent to the OLTC controller. Even though the Limited state of SE does not affect the state of the PS, it can cause the OLTC service to drop to Limited state. The OLTC can then either use these insufficiently accurate SE results or rely on the local measurement only. This is important because it may lead to cascading failures as demonstrated by Conditional State Transition 2.

Conditional State Transition 2: OLTC control based on insufficiently accurate SE results or on the local measurement alone may lead to incorrect tap-changing decisions. Since OLTC control is the only available RA in this case study, a faulty tap change can lead to a voltage violation in case of a contingency, thereby causing the PS to drop to Emergency State. This state transition demonstrates the possible impact of an incorrectly operating RA on the interconnected PS.
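
The two conditional state transitions can be combined with the direct transitions of the earlier case studies into a small worst-case mapping from service states to PS states. The Python sketch below is one possible reading of the text, under the assumptions that OLTC control is the only RA and that a contingency is either absent, impending or has occurred; all names (`PSState`, `Contingency`, `oltc_given_se`, `ps_state`) are hypothetical.

```python
from enum import Enum


class ServiceState(Enum):
    NORMAL = "Normal"
    LIMITED = "Limited"
    FAILED = "Failed"


class PSState(Enum):
    NORMAL = "Normal"
    ALERT = "Alert"
    EMERGENCY = "Emergency"


class Contingency(Enum):
    NONE = "none"
    IMPENDING = "impending"
    OCCURRED = "occurred"


def oltc_given_se(se, oltc_standalone):
    """Conditional State Transition 1: a degraded SE limits the OLTC service."""
    if oltc_standalone is ServiceState.FAILED:
        return ServiceState.FAILED
    if se is not ServiceState.NORMAL:
        return ServiceState.LIMITED
    return oltc_standalone


def ps_state(se, oltc, contingency=Contingency.NONE):
    """Worst-case PS state for the given service states (illustrative)."""
    if se is ServiceState.FAILED:
        return PSState.EMERGENCY      # monitoring capability is lost
    if contingency is Contingency.NONE:
        return PSState.NORMAL         # the RA is currently not needed
    if oltc is ServiceState.FAILED:
        # The only RA is unavailable: Alert while the contingency is
        # impending, Emergency once it occurs.
        return (PSState.ALERT if contingency is Contingency.IMPENDING
                else PSState.EMERGENCY)
    if oltc is ServiceState.LIMITED and contingency is Contingency.OCCURRED:
        return PSState.EMERGENCY      # Conditional State Transition 2
    return PSState.NORMAL


# Cascade: SE Limited -> OLTC Limited -> contingency -> PS Emergency.
se = ServiceState.LIMITED
oltc = oltc_given_se(se, ServiceState.NORMAL)
print(oltc.value)                                       # Limited
print(ps_state(se, oltc).value)                         # Normal
print(ps_state(se, oltc, Contingency.OCCURRED).value)   # Emergency
```

The mapping is deliberately pessimistic: a Limited OLTC service may still act correctly in practice, so the Emergency outcome should be read as a possible, not a certain, consequence.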

In the presence of multiple grid services, this case study highlights the need to consider their interdependency as it can lead to additional means of state degradation in both PS and ICT systems.

Conclusion and future work

This paper provides a novel state description for CPES by bridging the PS and the ICT domains. The PS states are based on the widely used ENTSO-E system state classification (ESSC), whereas the proposed novel state classification considers the ICT system via the ICT-enabled grid services. Two bridging aspects between the two domains of CPES have been identified. The first bridge is formed by the PS high-level services such as state estimation, which provide the overall monitoring and control capabilities to system operators. A performance degradation of the underlying ICT system causes a state transition in the PS if the requirements of the high-level services are violated. The second bridge is formed by the remedial actions and was demonstrated with the help of the OLTC control service. These actions are a set of preventive and curative tools of the PS and are meant to counter potential contingencies in the PS.

The presented novel state description for CPES can be utilised as a tool to identify those critical ICT-enabled PS services whose failure could threaten the stable operation of the PS. If a service that is in the Limited or Failed State due to ICT disturbances causes a PS state degradation, then that service should be considered critical. The state description also directly links measurable ICT performance indicators (e.g., latency, availability) to their impact on the PS states. The presented case studies illustrate that, in addition to PS state elements, ICT state elements should also be considered for the operation of a CPES. They also show how the state of the ICT system can be explicitly classified and integrated into the CPES state description.

A noteworthy aspect of this contribution is that the states of the ICT-enabled grid services are not aggregated into unified states representing the whole ICT system. This is because aggregating different grid services loses certain details that are vital to PS operation. For instance, if there is currently no voltage problem, then the OLTC RA is irrelevant for the system operation. In other words, an RA is required only when there is a contingency that requires it. Moreover, different grid services may have different requirements based on their specific implementations, which implies that the state of a grid service depends heavily on its implementation. Although an aggregated ICT system state description might conceal the interdependencies of distinct ICT-enabled PS services, it is an interesting area of investigation for communication network operators considering the growing usage of public (or shared) communication networks in CPES. In this case, it is essential to analyse which PS services depend on a shared ICT infrastructure due to the increased risk of common cause failures.

Additional analyses aimed at identifying not only new high-level services and remedial actions in future PS, but also their exact requirements on the performance of ICT systems, are recommended. Using these requirements, mathematical formulations for the interdependencies in CPES can be derived. The novel state description can thus be used as an operational state model for CPES.

The proposed Limited State of the grid services introduces a stochastic component to the state description. Due to this stochasticity, the relevant service is at risk of insufficient performance. A detailed analysis of this stochastic behaviour is also a crucial task for future research in the field of CPES.

Availability of data and materials

No additional data or material is used for this article.

References

  1. 50hertz, Amprion, Tennet, Transnet BW (2013) Systemschutzplan der Vier Deutschen Übertragungsnetzbetreiber. 50hertz, Amprion, Tennet, Transnet BW. https://www.netztransparenz.de/portals/1/Content/EU-Network-Codes/ER-VErordnung/Systemschutzplan%20der%20%C3%9CNB%20-%20Hauptdokument.pdf. Accessed 30 Sept 2020.

  2. Abur, A, Exposito AG (2004) Power System State Estimation: Theory and Implementation. CRC press, London.

  3. Brand, M, Ansari S, Castro F, Chakra R, Hassan BH, Krüger C, Babazadeh D, Lehnhoff S (2019) A framework for the integration of ict-relevant data in power system applications. In: 2019 IEEE Milan PowerTech, 1–6. IEEE Publishing, New York.

  4. Doostizadeh, M, Khanabadi M, Ettehadi M (2018) Reactive power provision from distributed energy resources in market environment. In: Electrical Engineering (ICEE), Iranian Conference On, 1362–1367. IEEE. https://doi.org/10.1109/icee.2018.8472570. Accessed 30 Sept 2020.

  5. Drayer, E, Hegemann J, Lazarus M, Caire R, Braun M (2015) Agent-based distribution grid operation based on a traffic light concept. In: 23rd Int. Conf. on Electricity Distribution (CIRED). CIRED, Belgium, Lyon.

  6. European Union (EU) Commission Regulation 2017/1485 (2017) Establishing a Guideline on Electricity Transmission System Operation. http://data.europa.eu/eli/reg/2017/1485/oj. Accessed: 7 May 2020.

  7. European Union (EU) Commission Regulation 2017/2196 (2017) Establishing a Network Code on Electricity Emergency and Restoration. http://data.europa.eu/eli/reg/2017/2196/oj. Accessed: 7 May 2020.

  8. Fink, LH, Carlsen K (1978) Operating under stress and strain [electrical power systems control under emergency conditions]. IEEE Spectr 15(3):48–53.

  9. FNN (2016) Voltage regulating distribution transformer (vrdt) – use in grid planning and operation. Technical report, Forum Netztechnik / Netzbetrieb im VDE (FNN). https://www.vde.com/resource/blob/1570326/c4c73c2670f47f82071b81eab368b85e/download-englisch-data.pdf.

  10. Gagnon, JM, Madani V, Novosel D, et al. (2007) Defense plan against extreme contingencies. Report, CIGRE.

  11. Guo, C, Yuan L, Xiang D, Dang Y, Huang R, Maltz D, Liu Z, Wang V, Pang B, Chen H, et al (2015) Pingmesh: A large-scale system for data center network latency measurement and analysis. In: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, 139–152. ACM, New York, London.

  12. IEA (2019) World Energy Outlook 2019. International Energy Agency, Paris:690. https://doi.org/10.1787/caf32f3b-en.

  13. Kamps, K, Möhrke F, Zdrallek M, Awater P, Schwan M (2018) Modeling of smart grid technologies for reliability calculations of distribution grids. In: 2018 Power Systems Computation Conference (PSCC), 1–7. IEEE. https://doi.org/10.23919/pscc.2018.8442727.

  14. Kansal, P, Bose A (2012) Bandwidth and latency requirements for smart transmission grid applications. IEEE Trans Smart Grid 3(3):1344–1352.

  15. Kuzlu, M, Pipattanasomporn M, Rahman S (2014) Communication network requirements for major smart grid applications in HAN, NAN and WAN. Comput Netw 67:74–88.

  16. Laprie, J-C, Kanoun K, Kaâniche M (2007) Modelling interdependencies between the electricity and information infrastructures. In: International Conference on Computer Safety, Reliability, and Security, 54–67. Springer-Verlag Berlin Heidelberg, Heidelberg.

  17. Lukomski, R, Wilkosz K (2008) Methods of measurement placement design for power system state estimation. AT&P J PLUS2:75–79.

  18. Narayan, A, Klaes M, Babazadeh D, Lehnhoff S, Rehtanz C (2019) First approach for a multi-dimensional state classification for ict-reliant energy systems. In: International ETG-Congress 2019; ETG Symposium, 1–6. IEEE Publishing, New York.

  19. NERC (2004) Technical Analysis of the August 14, 2003 Blackout: What Happened, Why, and What Did We Learn? Report to the NERC Board of Trustees by the NERC Steering Group. System:1–119.

  20. Palaniappan, R, Hilbrich D, Bauernschmitt B, Rehtanz C (2019) Co-ordinated voltage regulation using distributed measurement acquisition devices with a real-time model of the cigré low-voltage benchmark grid. IET Gener Transm Distrib 13(5):710–716.

  21. Panteli, M (2013) Impact of ict reliability and situation awareness on power system blackouts. PhD thesis, The University of Manchester (United Kingdom).

  22. Rasmussen, TB, Yang G, Nielsen AH, Dong Z (2017) A review of cyber-physical energy system security assessment. 2017 IEEE Manchester PowerTech:1–6. https://doi.org/10.1109/ptc.2017.7980942.

  23. Rinaldi, SM, Peerenboom JP, Kelly TK (2001) Identifying, understanding, and analyzing critical infrastructure interdependencies. IEEE Control Syst Mag 21(6):11–25.

  24. Salih, SN, Chen P (2016) On Coordinated Control of OLTC and Reactive Power Compensation for Voltage Regulation in Distribution Systems With Wind Power. IEEE Trans Power Syst 31(5):4026–4035.

  25. Schacht, D, Lehmann D, Vennegeerts H, Krahl S, Moser A (2016) Modelling of interactions between power system and communication systems for the evaluation of reliability. In: 2016 Power Systems Computation Conference (PSCC), 1–7. IEEE. https://doi.org/10.1109/pscc.2016.7540949.

  26. Schossig, T, Schossig W (2014) Disturbances and blackouts-lessons learned to master the energy turnaround. In: 12th IET International Conference on Developments in Power System Protection. https://doi.org/10.1049/cp.2014.0003.

  27. Sterbenz, JP, Hutchison D, Çetinkaya EK, Jabbar A, Rohrer JP, Schöller M, Smith P (2010) Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines. Comput Netw 54(8):1245–1265.

  28. Tøndel, IA, Foros J, Kilskar SS, Hokstad P, Jaatun MG (2017) Interdependencies and reliability in the combined ICT and power system: An overview of current research. Appl Comput Inform. https://doi.org/10.1016/j.aci.2017.01.001.

  29. Wäfler, J, Heegaard PE (2013) Interdependency modeling in smart grid and the influence of ICT on dependability. Lect Notes Comput Sci (Incl subseries Lect Notes Artif Intell Lect Notes Bioinforma):185–196. https://doi.org/10.1007/978-3-642-40552-5_17.

  30. Wu, Y-K, Chang SM, Hu Y-L (2017) Literature review of power system blackouts. Energy Procedia 141:428–431.

  31. Xu, SS, Mak M-W, Cheung C-C (2018) Patient-specific Heartbeat Classification based on i-vector adapted Deep Neural Networks. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 784–787. IEEE Publishing, New York.

  32. Yu, X, Xue Y (2016) Smart Grids: A Cyber-Physical Systems Perspective. Proc IEEE 104(5):1058–1070.

  33. Zwartscholten, J, Klaes M, Mayorga Gonzalez D, Subhan F, Narayan A, Rehtanz C (2020) Impact of Increased ICT Latency on Active Distribution Network Control. ENERGYCON 2020 - EasyChair Preprint no. 3060. https://easychair.org/publications/preprint/grl6. Accessed 30 Sept 2020.

Funding

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project numbers 359778999 and 360352892 - as part of the priority program DFG SPP 1984 - Hybrid and Multimodal Energy Systems: System theory methods for the transformation and operation of complex networks. Publication costs were covered by the DACH+ Energy Informatics Conference Organizers, supported by the Swiss Federal Office of Energy.

Author information

Contributions

The authors M. Klaes (MK), A. Narayan (AN), A.D. Patil (ADP) and J. Haack (JH) contributed equally to this work from conceptualization to writing the paper. MK and JH focused mainly on the power system aspects. AN and ADP focused mainly on the ICT system aspects. M. Lindner contributed in conceptualization. The authors C. Rehtanz, M. Braun, S. Lehnhof, H. de Meer contributed with expert knowledge in the field and review. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Marcel Klaes.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Klaes, M., Narayan, A., Patil, A.D. et al. State description of cyber-physical energy systems. Energy Inform 3, 16 (2020). https://doi.org/10.1186/s42162-020-00119-3

Keywords

  • Operational state classification
  • ENTSO-E system states
  • ICT system states
  • ICT-enabled grid services
  • Remedial actions