Federated learning for 5G‑enabled infrastructure inspection with UAVs

to check for cracks and corrosions, missing insulators and detect thermography as well as wiring problems around

The inspection is carried out by electricity operation center personnel and operators, who examine the transmitted UAV footage and, upon detecting failures, instruct operators to resolve the problems on-site. Identifying the electrical infrastructure, the respective assets and the faults is, however, time-consuming, as it requires manual effort from the operators. To this end, Artificial Intelligence (AI) techniques are gradually being employed to automate asset detection and hence reduce the inspection time (Lekidis et al. 2022). Coupled with 5G technologies, they offer latency that meets the real-time requirements of UAV-based infrastructure inspection (3rd Generation Partnership Project 2019).
However, since the AI models are trained in the operation center with real-time video obtained from the UAVs, the identification of electrical assets depends on the use of 5G Core Network (5GCN) (3rd Generation Partnership Project 2019) resources for data exchange and service orchestration. As such resources are deployed in a central operation center, this approach does not offer networking scalability (Satyanarayanan 2017). If the models are instead deployed and executed in edge nodes, networking issues are avoided, as training is distributed and uses local UAV data. This also avoids a single point of failure for the automated UAV inspection in case a fault or a cyber-attack occurs in the operation center (Litchfield et al. 2016). Furthermore, as training of the AI models is performance and memory intensive, Cloud environments are usually employed. The use of such environments, though, raises privacy issues, as sensitive company data leave the infrastructure facility.
The recent emergence of Federated Learning (FL) (AbdulRahman et al. 2020) allows edge nodes to receive configurations and parameters from the operation center whilst performing AI network training locally. Specifically, the operation center Cloud environment first defines a global model with learning parameters. Each worker downloads the global model, computes the model update by using its local UAV data and then offloads the computed local update back to the operation center. Afterwards, the operation center combines all local model updates and constructs a new improved global model. Furthermore, FL ensures privacy as the data does not leave the electrical infrastructure facility (Li et al. 2020).
In this article we introduce an FL method for automating the UAV inspection. The method is based on the use of edge nodes running an edge platform that is used for interacting with the UAVs and offloading their computation. Moreover, data and control commands in the method are exchanged through the use of 5G Network Function Virtualization (NFV) technologies, such as network slicing and Multi-access (or Mobile, as termed earlier) Edge Computing (MEC) (ETSI: GR MEC 017 2018). Additionally, since UAV inspection applications have real-time requirements, the formed 5G network slice belongs to the Ultra-Reliable Low-Latency Communication (URLLC) category as defined by the 3rd Generation Partnership Project (3GPP) Release 15 (3rd Generation Partnership Project 2019). This category is identified by its reliability and low message latency requirements. Finally, the FL method is illustrated for the inspection of Public Power Corporation's (PPC) research center, called Innovation Hub. The experiments illustrate the method's benefits in comparison with the centralized UAV-based inspection (Lekidis et al. 2022). In terms of concrete contributions, the article provides the following:
• FL mechanisms for electricity infrastructure inspection by UAVs in urban areas.
• MEC platform for electricity asset identification and fault detection using FL models.
• Automated update of the FL models through an interaction between the MEC platform and PPC's operation center.
The rest of the article is organized as follows. Section Background provides an overview of the UAV-based inspection phases, the AI models that are used for the inspection as well as an introduction to FL. Section Methodology provides an overview of the UAV-based inspection approach using the FL method and the automation mechanisms for the interaction between the MEC platform of each edge node and PPC's operation center.
In section Autonomous UAV inspection using federated learning, the methodology is applied in PPC's Innovation Hub and experiments are conducted to demonstrate FL benefits in comparison with a centralized AI method for electrical infrastructure inspection. Finally, section Conclusion provides conclusions and perspectives for future work.

Background
In this section we provide an overview of the proposed electricity infrastructure inspection method, an introduction to Long Short Term Memory (LSTM) networks (Yu et al. 2019) that are used for infrastructure asset and fault detection as well as a brief description of the FL approach.

UAV-based infrastructure inspection
The time-critical requirements of UAV-based infrastructure inspection are met through the use of Network Function Virtualization (NFV) technologies and specifically the establishment of a URLLC network slice between the UAVs, the edge nodes and an operation center facility. To form the URLLC slice, the operation center also includes an NFV Management and Orchestration (MANO) (Mijumbi et al. 2016) Virtual Network Function (VNF) for lifecycle management and orchestration, 5GCN (3rd Generation Partnership Project 2019) and Radio Access Network (RAN) VNFs for data exchange, as well as a User Plane Function (UPF) VNF for processing the user traffic. Additionally, the UAVs are also included in the network slice. Fig. 1 illustrates the inspection method overview. Specifically, the UAVs are controlled by edge nodes, which are also used for interpreting and mapping in real time the UAV location and flight plans. In the chosen architecture, each edge node is a Commercial-Off-The-Shelf (COTS) hardware platform that is configured with a Mobile Edge Platform (MEP) in fixed ground locations (i.e. base stations). Moreover, forwarding is based on cellular 5G connectivity and proper routing mechanisms applied from one edge node to another.
The UAVs communicate with the edge nodes by using a dedicated antenna allowing cellular 5G connectivity (Lekidis et al. 2022). Edge nodes also include traffic routing functions to prevent potential collisions during take-off and landing. Furthermore, they aggregate data from multiple UAVs that are used for aerial inspections of electricity infrastructures. Edge nodes reduce the distance of the communication loop from the UAV to the operation center as well as allow faster data-based decision making. This is accomplished through the encapsulation of the UAV data into common Internet of Things (IoT) protocol formats, such as the Constrained Application Protocol (CoAP) (Lekidis and Katsaros 2018).
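As an illustration of this encapsulation step, the sketch below packs one UAV telemetry sample into a CoAP message using only the Python standard library. The header layout, the Uri-Path (11) and Content-Format (12) option numbers and the JSON content format (50) follow RFC 7252. The resource path `uav/telemetry`, the telemetry field names and the choice of a non-confirmable POST are assumptions made for the example and are not taken from the deployment described here.

```python
import json
import struct

COAP_VERSION = 1
TYPE_NON = 1             # non-confirmable message
CODE_POST = 0x02         # request code 0.02 (POST)
OPT_URI_PATH = 11
OPT_CONTENT_FORMAT = 12
FORMAT_JSON = 50         # application/json


def encode_option(prev_number, number, value):
    """Encode one option; only short deltas and lengths (< 13) are supported."""
    delta = number - prev_number
    if not (0 <= delta < 13 and len(value) < 13):
        raise ValueError("extended option encoding is not implemented in this sketch")
    return bytes([(delta << 4) | len(value)]) + value


def build_coap_post(message_id, path_segments, payload):
    # Header: version (2 bits), type (2 bits), token length (4 bits), code, message ID.
    first_byte = (COAP_VERSION << 6) | (TYPE_NON << 4) | 0  # token length 0
    message = struct.pack("!BBH", first_byte, CODE_POST, message_id)
    prev = 0
    for segment in path_segments:
        message += encode_option(prev, OPT_URI_PATH, segment.encode("utf-8"))
        prev = OPT_URI_PATH
    message += encode_option(prev, OPT_CONTENT_FORMAT, bytes([FORMAT_JSON]))
    return message + b"\xff" + payload  # 0xFF marks the start of the payload


# Hypothetical telemetry sample; the field names are illustrative only.
sample = {"uav_id": "uav-1", "lat": 37.98, "lon": 23.87, "alt_m": 42.0}
packet = build_coap_post(0x1234, ["uav", "telemetry"], json.dumps(sample).encode("utf-8"))
```

The resulting `packet` could be sent as a UDP datagram to a CoAP endpoint on the edge node. A production setup would more likely rely on an existing CoAP library, which also handles tokens, retransmission and extended option encoding.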

Long short term memory networks
Long Short Term Memory networks (LSTMs), first introduced by Hochreiter and Schmidhuber (1997), are a special kind of Recurrent Neural Network (RNN) (Medsker and Jain 2001), capable of learning long-term dependencies. All recurrent neural networks have the form of a chain of repeating neural network modules. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer. Moreover, traditional RNN models generally experience a vanishing gradient problem, which impedes learning of long data sequences. This is because when the gradient becomes smaller, the RNN parameter updates become negligible, which hinders the learning process.
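The vanishing gradient effect can be reproduced with a few lines of arithmetic. The sketch below assumes a scalar RNN of the form h_t = tanh(w * h_{t-1} + x_t), a toy model chosen for illustration and not the network used in this article. By the chain rule, the derivative of the last state with respect to the initial one is a product with one factor w * (1 - h_t^2) per time step, so for |w| < 1 it shrinks geometrically with the sequence length.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: one factor per time step
    return grad


# The weight and inputs are arbitrary illustrative values.
short_sequence = gradient_through_time(0.5, [0.1] * 5)
long_sequence = gradient_through_time(0.5, [0.1] * 50)
```

With w = 0.5 every factor is at most 0.5, so the gradient over 50 steps is bounded by 0.5^50, which is far too small to drive a meaningful parameter update.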
LSTMs also have a chain-based structure, with the main difference lying in the repeating module. Instead of having a single neural network layer, they consist of four layers interacting in a very special way. In Fig. 2 the pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.

Fig. 2 LSTM layers and primitive operators
Merging lines denote concatenation, while a forking line denotes its content being copied and the copies going to different locations.
An important part of LSTMs is the cell state, which is represented by the horizontal line running through the top of the diagram from h_{t−1} to h_t in Fig. 2. The cell state runs straight down the entire chain and allows information to flow along it unchanged. The LSTM has the ability to remove information from or add information to the cell state, carefully regulated by structures called gates. Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero lets nothing through, while a value of one lets everything through.
An LSTM has three gates, namely the forget gate, the input gate and the output gate, to protect and control the cell state. These three gates address the vanishing gradient problem of RNNs by collectively controlling which information in the cell state to forget, given the new information that entered the network, and which information is to be mapped to the network output. Furthermore, based on their architecture and layers, LSTM AI models are very effective in capturing dynamic temporal correlations. Such correlations are present in electricity infrastructures, allowing the identification of assets and the detection of faults with high accuracy.
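The gate mechanics described above can be sketched as a single LSTM time step in plain Python. The code follows the standard textbook LSTM formulation (three sigmoid gates plus one tanh candidate layer, i.e. the four interacting layers) and is not taken from the models trained in this article. The parameter layout, the function names and the hand-picked weights in the usage part are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'g' to (W, b) pairs."""
    z = h_prev + x  # concatenation of previous hidden state and current input
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate values
    # Pointwise update: forget part of the old state, add part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Hidden size 2, input size 1; biases chosen by hand to show the gating effect.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {
    "f": (zeros, [20.0, 20.0]),    # forget gate close to 1: keep the old cell state
    "i": (zeros, [-20.0, -20.0]),  # input gate close to 0: ignore the candidate
    "o": (zeros, [0.0, 0.0]),      # output gate at 0.5
    "g": (zeros, [0.0, 0.0]),
}
h, c = lstm_step([1.0], [0.0, 0.0], [0.5, -0.5], params)
```

With the forget gate saturated near one and the input gate near zero, the cell state passes through the step almost unchanged, which is the behaviour that lets information and gradients survive over long sequences.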

Federated learning
FL is a recently introduced AI approach, which aims at training a model across multiple local datasets contained in decentralized edge nodes or servers. Specifically, the edge nodes store data samples locally and do not exchange them with central external servers. This also aids in addressing critical issues such as data privacy, security and access rights to heterogeneous sources. The FL approach counters the challenges that are faced in (a) traditional centralized learning techniques, where all data are forwarded to a centralized server, and (b) classical distributed AI techniques, which assume that the local data are identically distributed and have the same size. The general FL design involves training AI models on data samples locally and exchanging parameters (e.g., weights in an RNN) among those local models to generate a global model. FL algorithms may either (1) employ a centralized server that orchestrates the various steps of the algorithm and serves as a main synchronization reference, or (2) be peer-to-peer, where no centralized server exists. Due to the presence of a synchronization reference, the former are usually preferable in large-scale deployments for controlling the asynchronous data exchange through edge nodes. The FL process with a centralized server is divided into multiple rounds, each consisting of four steps:
1. Local training: all local edge nodes compute training gradients or parameters and send locally trained model parameters to the central server.
2. Model aggregation: the central server performs secure aggregation of the uploaded parameters from all the local edge nodes without learning any local information.
3. Parameter broadcasting: the central server broadcasts the aggregated parameters to every local edge node.
4. Model update: all local edge nodes update their respective models with the received aggregated parameters and examine the updated models' performance.
After several local training and update exchanges between the central server and its associated local edge nodes, it is possible to achieve a globally optimal learning model.
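The four steps of a round can be sketched with a deliberately small model. The example below trains a two-parameter linear model on two hypothetical edge nodes and aggregates their parameters with an average weighted by local dataset size, in the style of federated averaging. The aggregation rule, the learning rate and the data are assumptions, since the text does not prescribe them, and the secure aggregation mentioned in step 2 is not implemented here.

```python
def local_training(weights, data, lr=0.1, epochs=5):
    """Step 1: gradient descent on y = w0 + w1 * x using only this node's data."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) / len(data)
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / len(data)
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return [w0, w1]


def aggregate(updates, sizes):
    """Step 2: average the local parameters, weighted by local dataset size."""
    total = sum(sizes)
    return [sum(u[k] * n for u, n in zip(updates, sizes)) / total
            for k in range(len(updates[0]))]


def run_rounds(node_datasets, rounds=100):
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only parameters leave the nodes; the raw samples stay local.
        updates = [local_training(global_weights, d) for d in node_datasets]
        # Steps 3 and 4: the aggregate is broadcast and becomes every node's model.
        global_weights = aggregate(updates, [len(d) for d in node_datasets])
    return global_weights


# Two hypothetical edge nodes holding different samples of y = 1 + 2x.
node_a = [(0.0, 1.0), (1.0, 3.0)]
node_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
weights = run_rounds([node_a, node_b])
```

Although neither node sees the other's samples, the global parameters approach the intercept 1 and slope 2 that generated both local datasets.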
The main considerations that are restraining FL usage in many applications are the following points: (1) heterogeneity of distributed devices that may cause security implications, (2) biased training dataset considerations from the individual devices (Kairouz et al. 2021) and (3) coordination of many devices during training, which is highly expensive in terms of communication resources. However, to address point (1), employed devices include a Network-based Intrusion Detection System for early-stage detection of cyber-attacks (Lekidis et al. 2022) and for point (2) the formed 5G network slices for UAV-based inspections and the multiple inspection datasets ensure that data training is fair. Finally for point (3), the introduction of 5G NFV technologies aids in meeting the real-time requirements of critical applications, such as electricity infrastructure inspection and fault detection.

Methodology
In this section we describe the techniques for automated UAV-based inspection with edge nodes relying on the FL method (section Background). Initially, we focus on describing the MEC platform running on the edge nodes and afterwards we illustrate the automated interaction for the UAV-based inspection using the FL method.

Mobile edge platform overview
UAVs have resource constraints at the processing and storage level. Nevertheless, the data that are gathered from the sensors often require processing before an actual verdict is reached that will lead to autonomous actuation actions. Additionally, storing the data locally at the device level may lead to an overflow of memory or storage resources. A common solution to these issues, followed until recently, was the presence of a Cloud environment deployed in a virtualized or physical server of the operation center (Lekidis et al. 2022). However, industrial applications are characterized by real-time and critical operation that requires low latency, which cannot be provided when communicating with Cloud platforms. Hence, a gradual shift is currently observed towards edge platforms, in order to provide a computational and storage layer to this architecture. The MEC initiative (ETSI: GR MEC 017 2018) allows the network slices to be extended with edge resources and services, such as the MEC platform and applications, the UPF, the RAN or even Cloud-native compute, network or storage functions. MEC ensures network scalability by distributing the processing from the centralized architecture of the Cloud platform to the edge, which is located closer to the user. This allows faster response to user requests, since computations, data aggregation and analytics are handled within user proximity. A scheme that is currently followed is the presence of a dedicated management entity on the edge for the resource lifecycle management, which includes instantiation, decommissioning and other functionalities. Such an entity is called Mobile Edge Platform (MEP) and provides distributed processing and storage capabilities that reduce the network management complexity.
The MEP architecture is illustrated in Fig. 3 and it is deployed in each edge node. The architecture follows the standardized interfaces and components that are defined by the ETSI MEC Industry Specification Group (ISG) (ETSI: GR MEC 017 2018). Additionally, it also includes a Virtualization Infrastructure Manager (VIM) (ETSI: GR MEC 017 2018), which interacts with the NFV MANO to receive instructions for the configuration of the VNFs and virtual links in each edge node. It allows the 5G network slice to be extended to provide latency and performance improvements in UAV-based inspection as well as collision avoidance capabilities in UAV missions.
Initially, a resource interface component is used for data exchange with the UAVs. The interface is based on an extension of the Linux Foundation Fledge framework, which is also offered as a VNF using the virtualization environment offered by Linux EVE. This environment offers isolation for the execution of applications in the mobile edge. The containers are managed by a lightweight version of Kubernetes, namely K3s, that is used both as a Mobile Edge Orchestrator (MEO) and as a VIM. Moreover, the entire processing and data exchange services are running on the MEP platform, which also contains the UPF. Additionally, the MEP platform also provides accurate geolocation and trajectory data for the UAVs, using constant communication with GPS satellites.
The communication with UAV-based protocols is facilitated through the Fledge resource interface that is included in each MEP (Fig. 3). Finally, every MEP is programmed to regulate the data exchange frequency, in order to maintain minimal edge resource utilization that leads to an extended battery lifetime for autonomous operation.

Federated learning incorporated in infrastructure inspection
The FL method provides a high level of automation since each MEP is able to interact with the NFV MANO to provide autonomous operation for the system, as depicted in Fig. 4. The MANO that is employed is the Open-Source NFV MANO (OSM), which orchestrates the 5G network slices. Moreover, FL ensures privacy, since the data remain at the edge level and are not stored in cloud platforms. In addition, in a centralized architecture a potential failure of the Cloud environment leads to a loss of data, processing and management capabilities and hence a degradation of User Equipment (UE) services and applications. Such a failure may be caused by an overload or even a targeted cyber-attack. With a decentralized architecture, this degradation can be avoided when the Cloud environment of the operation center fails, as UE services and applications may be served by the nearest edge entity. Through the integration of 5G networks and MEC, the FL models are updated and re-trained seamlessly with local data and footage from the infrastructure that is obtained from the UAVs.
The interaction between each MEP and the NFV MANO (Fig. 4) is enabled by the LSTM models that use the FL method, in order to be trained and executed at the edge level. The reasoning behind the choice of FL lies in the presence of multiple edge Points-of-Presence (PoP) in different distributed locations, each one including a MEP platform. The MEP platforms use FL to train the LSTM models locally and receive the parameters and configurations from the Cloud environment where the NFV MANO is deployed, in order to perform infrastructure inspection closer to the UAVs using their local data. Furthermore, the FL method improves the efficiency and provides a high level of network automation for the UAV-based infrastructure inspection method. This is accomplished by performing data processing and caching in each edge PoP.

Fig. 4 Automated interaction between Edge and NFV MANO through FL APIs
Moreover, the FL method provides automation in the formation and management of 5G network slices. In this case, the MEP receives configuration instructions from the NFV MANO for network slice instantiation or extension on the edge level. Then, supervised training techniques are used to translate high-level intents from NFV MANO into concrete instructions on how to deploy and instantiate FL LSTM models in each edge PoP. Overall, the procedure that is followed is divided into three individual steps.
Initially, intent-based policies are specified, in order to receive the parameters and configurations based on which the LSTM models will be deployed and executed in each edge PoP for infrastructure asset identification and fault detection using the UAV video data. The use of intents hides complexity as well as technology- and vendor-specific details. Intents are described in natural language and are translated into configurations through Natural Language Processing algorithms (Chowdhary 2020). Specifically, these algorithms are trained to receive input in the form of a textual description of the desired service characteristics and then produce a domain-specific encoding corresponding to the original intent. This process follows a step sequence: (1) the intents are pre-processed, (2) keywords are extracted and translated into meaningful actions and then (3) the actions are aggregated and validated for lack of conflicts.
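The three-stage step sequence can be sketched as a small pipeline. The example below is a rule-based stand-in: a fixed keyword table replaces the trained Natural Language Processing algorithms, and the keywords, configuration keys and function names are all hypothetical. It is meant only to show how pre-processing, keyword translation and conflict validation fit together.

```python
import re

# Hypothetical keyword-to-action table; a trained NLP model would replace it.
KEYWORD_ACTIONS = {
    "insulator": ("target_asset", "insulator"),
    "corrosion": ("target_fault", "corrosion"),
    "low-latency": ("slice_type", "URLLC"),
    "urllc": ("slice_type", "URLLC"),
    "embb": ("slice_type", "eMBB"),
}


def preprocess(intent):
    """Stage 1: lower-case the intent and split it into word tokens."""
    return re.findall(r"[a-z0-9\-]+", intent.lower())


def extract_actions(tokens):
    """Stage 2: keep known keywords and translate them into (key, value) actions."""
    return [KEYWORD_ACTIONS[t] for t in tokens if t in KEYWORD_ACTIONS]


def aggregate_and_validate(actions):
    """Stage 3: merge the actions into one configuration and reject conflicts."""
    config = {}
    for key, value in actions:
        if config.get(key, value) != value:
            raise ValueError(
                f"conflicting values for {key!r}: {config[key]!r} vs {value!r}")
        config[key] = value
    return config


def translate(intent):
    return aggregate_and_validate(extract_actions(preprocess(intent)))


config = translate("Detect corrosion on every insulator over a low-latency slice")
```

An intent that asks for two different slice types at once, for instance one mentioning both "urllc" and "embb", is rejected in the validation stage instead of producing an ambiguous configuration.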
As a second step, appropriate Application Programming Interfaces (APIs) on the MEP are used to receive the intents and the LSTM model configuration and parameters that are used for the inspection of the specific infrastructure (Fig. 4). To this end, the ETSI MEC ISG has provided an initial set of APIs to facilitate this interaction. The APIs are registered and discovered over the Mp1 reference point defined in the ETSI MEC architecture (ETSI: GR MEC 017 2018). Then, the associated LSTM models and VNFs are instantiated and connected using virtual links based on the intent translation of the previous step, in which the intents are translated into edge configurations for deploying and instantiating the LSTM models.
The third step concerns the training of the LSTM models. This is accomplished by each MEP using local data from the UAVs and deployed models based on the intents that originate from the NFV MANO. Then, the LSTM models are executed to identify electricity infrastructures and respective assets. During the inspection, if the envisaged inspection accuracy is not achieved, the FL update service synchronizes with the MEC FL API (Fig. 4) to re-calibrate the LSTM models with different parameters and configurations that will provide accuracy improvements.
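The re-calibration behaviour can be pictured as a simple control loop. The sketch below is an assumption about how such a loop might look and does not reproduce the FL update service or the MEC FL API: `evaluate` is a hypothetical callable that trains the local model with a given configuration and returns its accuracy, and the candidate configurations and scores are made up.

```python
def recalibrate(evaluate, candidates, target_accuracy):
    """Try candidate configurations in order until the target accuracy is met.

    `evaluate` is a hypothetical callable that trains the local model with the
    given configuration and returns its validation accuracy in [0, 1].
    """
    best_config, best_accuracy = None, float("-inf")
    for config in candidates:
        accuracy = evaluate(config)
        if accuracy > best_accuracy:
            best_config, best_accuracy = config, accuracy
        if accuracy >= target_accuracy:
            break  # envisaged accuracy reached, no further update needed
    return best_config, best_accuracy


# Stand-in evaluation: the accuracy is looked up instead of being trained.
fake_scores = {16: 0.81, 32: 0.93, 64: 0.95}
tried = []


def fake_evaluate(config):
    tried.append(config["hidden_units"])
    return fake_scores[config["hidden_units"]]


candidates = [{"hidden_units": n} for n in (16, 32, 64)]
config, accuracy = recalibrate(fake_evaluate, candidates, target_accuracy=0.9)
```

In this run the loop stops at the second candidate, because it is the first one to reach the target, and the third configuration is never trained.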

Autonomous UAV inspection using federated learning
We have deployed and tested the presented method on a fleet of hexacopter UAVs, which were used to inspect the PPC's Innovation Hub in Kantza, Greece. The aerial image of the infrastructure is illustrated in Fig. 5.
The UAVs are of the Foxtech RHEA 160 type and are illustrated in the left part of Fig. 6, whereas the right part shows the process of programming them for conducting the electricity infrastructure inspection missions. Furthermore, the edge nodes used a configuration of a dual-core CPU with 2.0 GHz frequency, 4 GB RAM, 28 GB disk, an antenna and a SIM card slot for 5G connectivity.
The URLLC slice is formed using the OSM NFV MANO entity, which afterwards provides instructions to the K3s platform, allowing the configuration and instantiation of the edge VNFs. For the edge nodes we have configured the MEC platform, the UPF and the application modules of the MEC platform as Docker containers, managed as pods by K3s, which acts as a VIM (section Methodology). Additionally, in the K3s deployment we have also included an nginx container to provide analytics and load balancing capabilities.
Two sets of experiments were performed with (1) a centralized inspection method where the LSTM models are deployed and executed in a Cloud environment of the operation center, using the approach presented in Lekidis et al. (2022), and (2) the FL method that is proposed in section Methodology. The main difference between the two methods is that in the FL method the models are deployed and executed at the edge nodes through the MEP platform. Moreover, in the centralized method the training phase was performed in the Cloud environment and took 3 h and 2 min on average, whereas with the FL method it lasted 2 h and 23 min on average. The results showed high accuracy for the detection of missing insulator stems for both methods, as illustrated in Fig. 7. Table 1 illustrates Key Performance Indicator (KPI) metrics for both the FL method and the centralized inspection method. The metrics include (1) average processing time for the discovery of assets and faults from the UAVs as a measure of performance, (2) reliability of the inspection method as a percentage for the trustworthiness of the inspection method in terms of the network slice and the involved resources and (3) fault discovery rate as the percentage of the actual faults that were discovered.
As shown in the table, FL provides several benefits in the UAV-based inspection method by (1) reducing the processing time for the identification of infrastructure assets and the detection of faults to 3 min and (2) improving the reliability by 14.6% in comparison with the centralized AI inspection. Finally, the fault discovery rate depends on the data gathered by the UAVs and, since the same data are used, the small difference originates from the faster detection time due to the execution of the models at the edge.
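The arithmetic behind two of these figures can be made explicit. The snippet below computes the training-time saving from the two average durations reported above and shows the fault discovery rate formula. The fault counts passed to `fault_discovery_rate` are hypothetical and serve only to demonstrate the calculation; they are not measurements from the experiments.

```python
def minutes(hours, mins):
    return hours * 60 + mins


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


def fault_discovery_rate(discovered, actual):
    """Percentage of the actual faults that the inspection discovered."""
    if actual <= 0:
        raise ValueError("actual fault count must be positive")
    return 100.0 * discovered / actual


centralized = minutes(3, 2)    # 182 min, average reported for the centralized method
federated = minutes(2, 23)     # 143 min, average reported for the FL method
saving = centralized - federated
reduction_pct = 100.0 * relative_reduction(centralized, federated)

# Hypothetical counts, used only to show the formula.
example_rate = fault_discovery_rate(discovered=19, actual=20)
```

The reported averages correspond to a saving of 39 min per training phase, or roughly 21% of the centralized training time.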

Conclusion
This article presents a UAV-based electricity infrastructure inspection method using FL-oriented AI models deployed in edge computing nodes. The models that are employed for infrastructure inspection and fault detection are based on LSTM networks. Moreover, they are executed through a newly introduced MEP platform that distributes the computation and storage closer to the UAVs. The MEP platform also interacts with the Cloud environment in PPC's operation center through dedicated APIs, ensuring a high level of automation. To this end, the NFV MANO of the Cloud platform provides high-level intents that are translated into concrete instructions on the model update, deployment and execution at the edge. The method is illustrated for the inspection of PPC's Innovation Hub, where experiments are conducted for both centralized inspection and the proposed FL method. The two methods are compared with KPIs focusing on processing time for the discovery of assets and faults from the UAVs, method reliability and fault discovery rate. As a part of our future work, we plan to develop a dedicated micro-service for each MEP, in order to enable further automation in the network slice, which will be extended to the UAV-based inspection as well. Such a service will also address interoperability concerns, by providing a more efficient MEP response to resource/service discovery queries about each edge PoP service-layer and/or resource-layer status (available VNF services/resources, AI models as well as active and historical service bindings). Moreover, we will investigate novel network slice isolation policies (Schneider et al. 2018) to enable the optimal sharing of the 5G infrastructure resources based on the directions of the ETSI Zero Touch Network and Service Management (ZSM) Working Group (ETSI 2020).

Funding
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016941 (5G-INDUCE).

Availability of data and materials
The datasets generated and/or analysed during the current work are available upon request.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.