
Application research of UAV infrared diagnosis technology in intelligent inspection of substations

Abstract

This study proposes an improved YOLOv4 algorithm based on a mixed-domain attention mechanism to design an intelligent substation inspection system. The proposed method combines several improvement strategies: a lightweight backbone, depthwise separable convolution, and a mixed attention mechanism. In the experiments, the identification accuracy of the proposed model decreased by only 0.2% for test samples at different positions, and the accuracy of intelligent inspection reached 97.5%. The mIoU, mAP, detection speed, and recognition accuracy of the constructed model were 78.34%, 95.12%, 62.05 frames per second, and 95.12%, respectively. The proposed model therefore comprehensively enhances the information expression and recognition accuracy of the system while enabling highly accurate intelligent inspection.

Introduction

With the advance of industrialization, the safe and reliable operation of substations plays a crucial role in today's society (Alhammad et al. 2021). However, existing visual monitoring systems struggle to meet the demands of accurate inspection in modern substations (Han et al. 2021). In this context, machine vision technology, as an emerging artificial intelligence technology, has brought new opportunities for the future development of intelligent and automated monitoring in substations (Wang 2022). Machine vision is a multidisciplinary integrated technology that acquires image signals (infrared, visible light, etc.) through visual products, extracts features on an image processing platform, and then performs feature judgment and diagnosis. However, traditional infrared diagnostic techniques cannot fully reflect the actual information of substations (Liu et al. 2020; Hu et al. 2023). Therefore, it is necessary to select image feature discrimination methods that can comprehensively express the information content of substations. Although existing image feature discrimination methods achieve excellent accuracy, their large number of parameters and high computational complexity make it difficult to meet real-time requirements. Moreover, the use of ordinary convolution operations in pooling networks wastes computing resources and underuses lightweight strategies. In addition, existing methods have shortcomings in feature extraction capability, making it difficult to cope effectively with the diverse scenarios of complex substation environments. In view of this, to achieve rapid detection, recognition, and positioning of power equipment in substations, this study proposes an improved You Only Look Once version 4 (YOLOv4) algorithm based on the Convolutional Block Attention Module (CBAM) to optimize the intelligent inspection system of substations.

The innovations of this research are as follows: (1) To address the large number of parameters in the YOLOv4 target detection algorithm, the MobileNet-v3 network is adopted to make the model lightweight; (2) The ordinary convolutions with 3 × 3 kernels in the pooling network are replaced by depthwise separable convolutions to further reduce the number of network parameters; (3) CBAM is introduced to address the possible decline in feature extraction ability. In short, the improved algorithm not only has high robustness and reliability, but also greatly improves the accuracy and stability of intelligent inspection of substation power equipment.

The objectives of this study are as follows: (1) To improve the detection accuracy and recognition accuracy of intelligent inspection of substation power equipment; (2) To reduce the computational complexity of the algorithm and achieve real-time detection through lightweight network design; (3) To improve the robustness and reliability of the model by introducing CBAM, so that it adapts to complex substation environments.

Related work

Since the early development of infrared diagnostic technology, domestic and foreign scholars have carried out extensive research on it. Vergura proposed improving the photovoltaic device of the diagnosis system in view of the unstable results of specific parameters in traditional unmanned aerial vehicle (UAV) infrared diagnosis technology. The proposed method enabled the UAV to better avoid radiation map errors during image acquisition, ensuring correct acquisition of image data and smooth reading of photovoltaic heat maps (Vergura 2020). Researchers such as Nie noted that the operation and maintenance of large-scale photovoltaic power stations had high labor costs, and therefore put forward an intelligent inspection method for large-scale photovoltaic power stations based on deep learning. The method combined UAV infrared diagnostic technology to highlight defects in the image. The proposed method could learn and train on datasets and effectively extracted hot spot features, with good recognition accuracy (Nie et al. 2020). To ensure the integrity diagnosis of transmission systems, Kim and other scholars proposed an intelligent non-destructive inspection system based on infrared diagnosis technology. The system included a camera system that effectively detected partial discharges on damaged surfaces of the transmission system. The system was tested on an ultra-high-voltage transmission system and showed relatively accurate damage detection ability, effectiveness, and convenience (Kim et al. 2020). Wang et al. proposed a processing strategy based on the U-net model in view of the insufficient efficiency and accuracy of infrared image defect analysis of composite insulators in UAV infrared diagnosis technology. Based on the infrared image conversion, the temperature matrix was derived and the target insulator was located. This method could effectively diagnose defects based on historical defects and automatically determine defect types based on temperature values (Wang et al. 2021). To make up for the limitations of micro-fault detection components of large-scale photovoltaic power plants, Hong and other researchers built a new fault detection framework composed of image acquisition, image segmentation, fault orientation, and defect warning. This framework integrated deep learning algorithms and drone infrared diagnostic technology to segment and locate faults in infrared images, adapting to various lighting conditions. The final detection accuracy of the proposed method could reach 95% (Hong et al. 2022).

For the intelligent inspection of substations, many scholars have contributed long-term research. Wang and other researchers designed a trackless robot with a robotic arm to realize intelligent inspection of substations. The robot could move autonomously using multiple sensor data streams and visual markers, and was capable of autonomous navigation in complex indoor substation environments. The designed robot met the requirements of substation inspection, demonstrating high working efficiency and good stability (Wang et al. 2020). Dong and other scholars summarized the technical status of indoor track-based electrical inspection robots and analyzed the existing problems. Their research put forward constructive suggestions on the overall structural design, functions, and future development directions of robots in the field of power inspection, laying a theoretical foundation for this research topic (Dong et al. 2023). Jiang et al. proposed an active posture relocation method to address the fact that intelligent robots are prone to navigation errors and mechanical wear during substation intelligent inspection. A correlation model was established to describe the relationship between image-plane pixel error and robot pose error. In addition, a PID control strategy was introduced to avoid degradation of the 2D attitude estimation algorithm. Experiments verified the effectiveness of the proposed method (Jiang et al. 2022). To improve the efficiency of substation intelligent inspection, Jiang and other researchers proposed applying power ubiquitous Internet of Things technology to the substation intelligent inspection process. The method calculated the GPS coordinates of the substation from its coordinates and realized inspection path planning through kinematic constraints and obstacle avoidance constraints. The inspection strategy combined with power ubiquitous Internet of Things technology could increase inspection efficiency to 95.3% (Jiang et al. 2023). Fan et al. proposed a new power inspection method that combined a UAV with an intelligent vehicle in a dual system, giving full play to the unique advantages of the UAV to realize intelligent power inspection. The drone was responsible for detecting high-voltage transmission lines, and the intelligent vehicle for detecting electricity meters. The proposed scheme could greatly reduce manpower and effectively improved detection efficiency (Fan et al. 2021).

Significant progress has been made in infrared diagnostic technology and intelligent inspection, but limitations remain. The above methods improve the stability of image acquisition, but their applicability is limited. Although deep learning methods improve recognition accuracy, they rely on large amounts of annotated data and are costly. Some systems perform well in high-voltage testing, but their application to low-voltage systems remains to be explored. The U-net model has high processing efficiency but lacks adaptability to complex backgrounds. Some detection frameworks achieve high accuracy but impose heavy hardware requirements. In terms of substation inspection, the robot designs are effective, but their durability needs to be verified, and some analyses do not provide specific improvement plans. In addition, some methods reduce navigation errors, but their performance in dynamic environments needs further study.

In summary, the above research results have promoted the application of infrared technology in substation inspection, but they often solve the problem only for a single type of equipment or similar equipment. In view of this, this study proposes an improved CBAM-YOLOv4 algorithm on the basis of UAV infrared diagnosis technology to optimize the substation intelligent inspection system.

Substation intelligent inspection system based on CBAM-YOLOv4

Construction of power equipment inspection model based on CBAM-YOLOv4

The network structure of the YOLOv4 target detection algorithm mainly includes the Backbone Feature Extraction Network (BFEN), the Spatial Pyramid Pooling Network (SPPN), and the Feature Aggregation Network (FAN) (Gai et al. 2023). The BFEN performs feature extraction and cross-stage fusion on input images, the SPPN enhances contextual features through pooling operations, and the FAN synthesizes these features to enhance feature extraction and positioning accuracy for accurate recognition. In the field of infrared image processing of power transmission and transformation equipment, deep learning frameworks such as YOLO enable target detection networks to efficiently detect and locate abnormal hot spots of equipment (Li et al. 2021). Traditional YOLOv4 has some defects, such as a large number of network parameters and slow detection speed, which this study improves. To address the large number of YOLOv4 parameters, the study makes the BFEN lightweight by replacing the original BFEN with MobileNet-v3. The MobileNet-v3 network introduces the new h-swish nonlinearity at the beginning and end of the network, which computes faster. The swish nonlinear activation function, as an alternative to the ReLU function, can significantly improve the accuracy of a neural network; its expression is shown in Eq. (1).

$$\operatorname{swish} \left( x \right)=x \cdot \sigma (\beta x)$$
(1)

In Eq. (1), \(x\) represents the input of the neural network layer; \(\sigma\) represents the Sigmoid function; \(\beta\) represents a tunable parameter of the function. Swish improves accuracy, but its computation is complex and costly in embedded environments, so it is simplified as shown in Eq. (2).

$${\text{h-swish}}\left( x \right)=x \cdot \frac{{\operatorname{ReLU} 6(x+3)}}{6}$$
(2)

In Eq. (2), \(\operatorname{ReLU} 6\) represents a variant of the ReLU function that outputs the input value unchanged between 0 and 6 and truncates values outside this interval to 0 (for inputs less than 0) or 6 (for inputs greater than 6). Compared with swish, the h-swish activation function improves efficiency by about 15% after quantization, especially in deep networks (Li et al. 2023). When MobileNet-v3 is used for feature extraction from infrared images of power equipment, three-dimensional feature vectors are generated and the number of parameters is reduced from 55 million to 45 million, but the computational load is still too large for edge intelligent terminal computing units (Saputra 2021). Based on this, the ordinary convolutions with 3 × 3 kernels in the SPPN are replaced by depthwise separable convolutions, further reducing the number of network parameters by changing the convolution mode. The structures of an ordinary convolution and a depthwise separable convolution are shown in Fig. 1.
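As an illustrative sketch (not code from the paper), the two activation functions in Eqs. (1) and (2) can be written in a few lines of numpy; the evaluation interval at the end is assumed for demonstration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # Eq. (1): swish(x) = x * sigmoid(beta * x)
    return x * sigmoid(beta * x)

def relu6(x):
    # ReLU6: clamp the input to the interval [0, 6]
    return np.clip(x, 0.0, 6.0)

def h_swish(x):
    # Eq. (2): a piecewise-linear approximation of swish that avoids
    # the exponential, which is cheaper on embedded hardware
    return x * relu6(x + 3.0) / 6.0

x = np.linspace(-6, 6, 121)
print(np.max(np.abs(swish(x) - h_swish(x))))  # largest gap between the two curves
```

For x ≥ 3, h-swish returns x exactly, and for x ≤ -3 it returns 0, closely tracking swish in both regimes, which is why it is a good drop-in replacement after quantization.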

Fig. 1
figure 1

Structure of ordinary convolution (a) and depthwise separable convolution (b)

In Fig. 1, a standard convolution convolves N 3 × 3 kernels, each spanning all M channels of the input feature map, and outputs a new feature map with N channels. The number of parameters required is shown in Eq. (3).

$${D_1}=3 \times 3 \times M \times N$$
(3)

In Eq. (3), \({D_1}\) represents the number of parameters of an ordinary convolution; \(M\) represents the number of channels of the input feature map; \(N\) represents the number of convolution kernels. A depthwise separable convolution first convolves each of the \(M\) input channels with its own 3 × 3 kernel, yielding a feature map whose channel count equals the input channel count, and then convolves the result with \(N\) 1 × 1 kernels to obtain a new feature map with \(N\) channels. The number of parameters required is shown in Eq. (4).

$${D_2}=3 \times 3 \times M+1 \times 1 \times M \times N$$
(4)

In Eq. (4), \({D_2}\) represents the number of parameters of a depthwise separable convolution. Since \({D_2}/{D_1}=1/N+1/9\) is much smaller than 1, the depthwise separable convolution achieves the same convolution effect while greatly reducing the number of parameters (Ding et al. 2022). After the BFEN is made lightweight and the depthwise separable convolution is introduced, the computation speed and operating efficiency of the model improve, but the feature extraction ability may decline. To remedy this, CBAM is introduced into the feature fusion stage of the model; its workflow is shown in Fig. 2.
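The parameter savings can be checked with a small arithmetic sketch. The counts below are the standard formulas for 3 × 3 kernels with M input channels and N output channels; the channel numbers in the example are assumptions chosen only for illustration:

```python
def conv_params(k, m, n):
    # ordinary convolution: n kernels of size k x k, each spanning all m input channels
    return k * k * m * n

def dsc_params(k, m, n):
    # depthwise separable convolution: m depthwise k x k kernels,
    # followed by n pointwise 1 x 1 kernels over m channels
    return k * k * m + m * n

# example: a 3 x 3 layer with 256 input and 256 output channels
m, n = 256, 256
d1 = conv_params(3, m, n)   # 589,824 parameters
d2 = dsc_params(3, m, n)    # 67,840 parameters
print(d2 / d1)              # equals 1/n + 1/9, i.e. roughly a 9x reduction
```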

Fig. 2
figure 2

CBAM workflow

As a lightweight, general-purpose attention module, CBAM integrates spatial and channel attention mechanisms at little extra computational cost. The channel and spatial attention mechanisms significantly improve feature extraction through weight adjustment, with weights computed from the relative importance of features. After the feature map passes through CBAM, the module successively generates attention maps along the channel and spatial dimensions and refines features through element-wise multiplication (Dewi et al. 2022). This process significantly mitigates the reduction in recognition ability caused by network lightweighting. The calculation of the channel attention feature is shown in Eq. (5).

$${x_{ji}}=\frac{{\exp \left( {{A_i} \cdot {A_j}} \right)}}{{\sum\limits_{{i=1}}^{C} {\exp } \left( {{A_i} \cdot {A_j}} \right)}}$$
(5)

In Eq. (5), \({A_i},{A_j}\) represent the features of channels \(i\) and \(j\); \({x_{ji}}\) represents the effect of channel \(i\) on channel \(j\); \(C\) represents the number of channels of the input feature map. The result is multiplied by a scaling parameter \(\alpha\) and added to the original feature to obtain the final output, as shown in Eq. (6).

$${{\text{E}}_j}=\alpha \sum\limits_{{i=1}}^{C} {\left( {{x_{ji}}{A_i}} \right)} +{A_j}$$
(6)

In Eq. (6), \(\alpha\) represents the scale parameter. The spatial attention mechanism generates weights based on the similarity of adjacent features, and the calculation formula for spatial attention features is shown in Eq. (7).

$${s_{ji}}=\frac{{\exp \left( {{B_i} \cdot {C_j}} \right)}}{{\sum\limits_{{i=1}}^{C} {\exp } \left( {{B_i} \cdot {C_j}} \right)}}$$
(7)

In Eq. (7), \({B_i},{C_j}\) represent the features at spatial positions \(i\) and \(j\), obtained from two projections of the input; \({s_{ji}}\) represents the effect of position \(i\) on position \(j\). The more similar the feature representations of two locations are, the stronger the correlation between them. To realize target recognition and temperature extraction for typical components of power equipment, the overall network framework of the improved target detection is shown in Fig. 3. In the figure, the improved overall network framework for object detection includes several key components. Firstly, MobileNet-v3 is used as the backbone network, which extracts multi-layer features P5, P4, P3, and P2, with sizes gradually decreasing. Next, the features are further processed hierarchically through a series of units combining convolution, CBAM, and concatenation operations, including three-layer and five-layer convolution blocks with up-sampling or down-sampling. Each feature layer outputs object detection results through a YOLO Head. Finally, the FAN module integrates multi-scale features to improve the accuracy of target recognition and temperature extraction.
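As an illustration of how the channel attention of Eqs. (5)-(6) can be computed, the numpy sketch below operates on a feature map flattened to one row per channel; the shapes and the flattening convention are assumptions made for demonstration, not details given in the paper:

```python
import numpy as np

def channel_attention(A, alpha=1.0):
    # A: feature map flattened to shape (C, L), one row per channel
    # Gram matrix of channel similarities: entry (i, j) = A_i . A_j
    energy = A @ A.T
    # Eq. (5): softmax over i gives the influence x_ji of channel i on channel j
    energy = energy - energy.max(axis=0, keepdims=True)  # for numerical stability
    x = np.exp(energy) / np.exp(energy).sum(axis=0, keepdims=True)
    # Eq. (6): weighted sum of channel features, scaled by alpha, plus a residual
    return alpha * (x.T @ A) + A

A = np.arange(8.0).reshape(2, 4)   # toy feature map: 2 channels, 4 positions
print(channel_attention(A).shape)  # (2, 4)
```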

Fig. 3
figure 3

Improved overall network framework for object detection

CBAM-YOLOv4 integrated power equipment key location identification and temperature extraction model

Due to the limited contrast of infrared thermal imaging, the edge characteristics of power equipment components are not significant, so sharpening is carried out to improve image details. In this study, a sharpening technique based on second-order differentiation is used to enhance the infrared image of power equipment and improve the feature capture capability of the BFEN. The Laplace transform of a function of two variables \(f\left( {x,y} \right)\) is defined as shown in Eq. (8).

$$F(s,t)=\int_{0}^{\infty } {\int_{0}^{\infty } {{e^{ - sx}}} } {e^{ - ty}}f(x,y)dxdy$$
(8)

In Eq. (8), \(F\left( {s,t} \right)\) represents the Laplace transform of the original function \(f\left( {x,y} \right)\); \(s,t\) represent the complex variables, i.e., the coordinates on the transform's complex plane associated with the variables \(x,y\); \({e^{ - sx}},{e^{ - ty}}\) represent the kernel functions of the transform, which attenuate the corresponding variables of the original function. The mathematical expression of the second-order Laplacian is shown in Eq. (9).

$${\nabla ^2}f=\frac{{{\partial ^2}f}}{{\partial {x^2}}}+\frac{{{\partial ^2}f}}{{\partial {y^2}}}$$
(9)

In Eq. (9), \({\nabla ^2}\) represents the Laplacian operator applied to the function \(f\); \(\frac{{{\partial ^2}}}{{\partial {x^2}}}\) and \(\frac{{{\partial ^2}}}{{\partial {y^2}}}\) represent the second partial derivatives of \(f\) with respect to \(x\) and \(y\). In digital image processing, the second-order Laplacian operator can be used to enhance details in images (Kumar et al. 2022). For the infrared image of power equipment, the operator highlights edges by relating each pixel to its four adjacent pixels: the gray value of the central pixel is multiplied by -4 and the gray values of its four nearest neighbors are summed and added. Subtracting this edge response from the original image sharpens the infrared image. This method effectively emphasizes high-frequency details in infrared images of power equipment, such as component edges, and improves visual contrast, providing more vivid information for subsequent feature extraction. After each component of the power equipment is identified in the infrared image, the anchor box coordinates of each component are output. Subsequently, the image is converted to 256 gray levels to quantify the infrared radiation intensity of each pixel (Peng et al. 2022). After conversion, the gray difference analysis method is used to extract the temperature of the hot zone of the components within the anchor box, allowing accurate evaluation of the thermal characteristics of the power equipment for monitoring and diagnosing equipment health. The grayscale equation is shown in Eq. (10).
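The 4-neighbour Laplacian described above can be sketched in numpy as follows (a minimal sketch with assumed zero padding at the border; the subtraction step forms the sharpened image g = f − ∇²f):

```python
import numpy as np

def laplacian(img):
    # centre pixel times -4 plus the sum of its four nearest neighbours,
    # i.e. the kernel [[0,1,0],[1,-4,1],[0,1,0]], with zero padding at the border
    p = np.pad(img.astype(float), 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def sharpen(img):
    # second-order sharpening: subtracting the Laplacian boosts edges
    out = img.astype(float) - laplacian(img)
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.zeros((5, 5)); img[2, 2] = 200.0
print(laplacian(img)[2, 2])  # -800.0: strong response at the isolated bright spot
```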

$${G_{{\text{gray }}}}(i,j)=0.39R(i,j)+0.5G(i,j)+0.11B(i,j)$$
(10)

In Eq. (10), \(R\left( {i,j} \right),G\left( {i,j} \right),B\left( {i,j} \right)\) represent the pixel values of the red, green, and blue channels at position \(\left( {i,j} \right)\), respectively (Zu et al. 2022; Yan et al. 2021). Based on the intensity information contained in the gray image, the temperature differences in the infrared image are translated into gray differences through 256-level gray conversion. In the grayscale image, the heating area of the power equipment presents a higher brightness value than the normal-temperature area because of its higher temperature. The linear transformation between gray value and temperature is shown in Eq. (11).

$$T=\frac{{{T_{\hbox{max} }} - {T_{\hbox{min} }}}}{{255}} \times g+{T_{\hbox{min} }}$$
(11)

In Eq. (11), \(T\) represents the temperature at a certain point in the grayscale image; \(g\) represents the gray value of that point; \({T_{\hbox{min} }},{T_{\hbox{max} }}\) represent the lowest and highest temperatures in the infrared image of the power equipment. On this basis, this study proposes a CBAM-YOLOv4 power equipment component recognition and temperature extraction model, whose process is shown in Fig. 4. The overall model first processes raw infrared thermal images of power equipment with data enhancement techniques such as rotation, scaling, flipping, and sharpening; this step increases the diversity of the dataset and improves the robustness and stability of the target detection model against changes in the input data. Then, the CBAM-YOLOv4 algorithm is adopted, whose attention module strengthens the model's ability to capture the features of key components of power equipment, thereby improving the accuracy of target identification and positioning. After target detection is completed, the infrared thermal image of the detected power equipment parts is converted to 256 gray levels. This step converts the color information in the image into gray information and lays the foundation for temperature information extraction. By analyzing the gray differences between pixels, a functional mapping between gray value and temperature is established. Through this mapping, the temperature of key components of power equipment can be accurately converted, achieving accurate temperature extraction.
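Eqs. (10) and (11) translate directly into code; the sketch below uses the channel weights and the linear gray-to-temperature mapping as given in the text (the sample pixel values and temperature range are assumptions for illustration):

```python
import numpy as np

def to_gray(rgb):
    # Eq. (10): weighted sum of the colour channels, using the weights from the text
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.39 * r + 0.5 * g + 0.11 * b

def gray_to_temp(gray, t_min, t_max):
    # Eq. (11): linear mapping of the 0-255 gray scale onto [t_min, t_max]
    return (t_max - t_min) / 255.0 * gray + t_min

# assumed example: one pixel, with the scene's temperature range set to 20-120 deg C
gray = to_gray(np.array([120.0, 80.0, 60.0]))
print(gray_to_temp(gray, 20.0, 120.0))
```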

Fig. 4
figure 4

Power equipment component recognition and temperature extraction model based on CBAM-YOLOv4

Experiment on infrared recognition of substation intelligent inspection algorithm

Analysis of experimental results of algorithm performance recognition

This study selects 500 infrared image samples as the dataset, of which 250 samples are used for testing and the other 250 for training. This dataset is chosen because it comprehensively covers the infrared abnormal-heating faults that may be encountered in practical applications, ensuring the universality and representativeness of the algorithm validation. All infrared image samples come from multiple industrial sites and laboratory environments, covering a variety of equipment and operating conditions. All images are captured under standardized conditions to ensure controllability of the environmental temperature and equipment operating status. In addition, to improve data quality and algorithm robustness, the captured images undergo normalization and noise filtering. The experiments verify the accuracy of five recognition algorithms: the CBAM-YOLOv4 proposed in this study and the comparison models YOLOv4, You Only Look Once version 3 (YOLOv3), lightweight YOLOv4, and Mask Region-based Convolutional Neural Network (MASK-RCNN). The test metrics include weight file size (M), mIoU (%), mAP (%), and the number of frames processed per second. The configuration of the experimental computers is shown in Table 1.

Table 1 Computer hardware configuration

Table 2 shows the comparative results of the performance indicators of the five models. From Table 2, compared with the YOLOv3 algorithm, the mAP value of CBAM-YOLOv4 increased by 7.51%, the mIoU value increased by 10.34%, and the weight file size decreased by 5 M. CBAM-YOLOv4 used the lightweight BFEN and CBAM, and its detection speed increased by 18.05 frames per second and its mAP value by 1.61% compared with YOLOv4. This is because CBAM-YOLOv4 further optimizes the BFEN, significantly improving the detection speed of the model while maintaining recognition accuracy. For lightweight YOLOv4, the lightweight BFEN improved detection speed, but recognition accuracy dropped greatly. As a two-stage detection algorithm, MASK-RCNN performed well in mIoU and mAP, but due to its complex network structure and multi-stage processing, its detection speed was significantly slower (only 9.49 frames per second). Compared with the two-stage recognition algorithm MASK-RCNN, the recognition accuracy and mIoU value of CBAM-YOLOv4 both increased by more than 10%, and the recognition speed also improved significantly. CBAM-YOLOv4 thus has superior recognition performance and meets the security and stability requirements of substation intelligent inspection.

Table 2 Comparison of performance indexes of different detection models

An anti-interference test of the designed CBAM-YOLOv4 model was carried out, and the experimental results are shown in Fig. 5. After adding noise with a density of 0.02 to the sample images, the recognition results were basically unaffected. A set of images from different angles and positions was added to the test samples. For test samples from different angles, the identification accuracy was basically unchanged; for test samples at different positions, the identification accuracy decreased by only 0.2%. The model makes full use of the channel and spatial attention mechanisms, which improve its feature extraction ability, so it shows good robustness.

Fig. 5
figure 5

Performance of CBAM-YOLOv4 model under noise interference, different angles and positions testing

To further enhance the interpretability of the model and verify the effectiveness of the CBAM introduced in CBAM-YOLOv4, the attention hot zones for different components of substation power equipment were analyzed. Figure 6 shows the attention heat zone diagrams for different components of substation power equipment. From the figure, compared with the detection algorithm without CBAM, the attention of CBAM-YOLOv4 was focused on the power equipment component areas in the infrared images, allowing it to better capture the relevant features of the power equipment. Without CBAM, the attention hotspots were dispersed and failed to focus on the target components; with CBAM, the attention mechanism effectively enhanced the focus on key components. Therefore, the CBAM in the CBAM-YOLOv4 model is demonstrably effective.

Fig. 6
figure 6

Attention heat zone diagram of different components of substation power equipment

To study the effectiveness of the introduced depthwise separable convolution, this study conducted an experiment on the influence of the model's convolution mode on the recognition rate, with results shown in Fig. 7. From Fig. 7 (a), when the dataset was small, ordinary convolution had a better recognition rate. However, as the dataset grew, the recognition of the proposed CBAM-YOLOv4 model based on depthwise separable convolution improved significantly compared with ordinary convolution, and the final recognition rate of depthwise separable convolution was significantly higher. In Fig. 7 (b), the curve trends of the two methods were basically consistent with those in Fig. 7 (a). The results on both the training set and the test set showed that the depthwise separable convolution model has clear advantages for identification on large datasets and can better meet the requirements of substation power inspection in practice.

Fig. 7
figure 7

Influence of model convolution mode on model recognition rate

Figure 8 shows the test results for the running time and recognition accuracy of the five models. In Fig. 8, lightweight YOLOv4 required the least training time and test time (0.989 s and 1.325 s, respectively), but its recognition accuracy was only 83.48%. The main reason is that the lightweight design of lightweight YOLOv4 reduced the overall running time of the model but also reduced its feature extraction capability, so its recognition accuracy was low. For the proposed CBAM-YOLOv4 model, the training time and test time were 1.101 s and 1.411 s respectively, slightly higher than those of lightweight YOLOv4, but the recognition accuracy reached 95.12%. The results show that although the CBAM-YOLOv4 model is lightweight, the introduction of CBAM enhances its feature extraction capability, giving it high recognition accuracy. Compared with the other three models, CBAM-YOLOv4 had significant advantages in both running time and recognition accuracy. It can be concluded that the CBAM-YOLOv4 model is well suited to the inspection of substation power equipment.

Fig. 8
figure 8

Test results of running time and recognition accuracy of 5 models

Analysis of the impact of parameter settings on the results

The CBAM-YOLOv4 model not only retained the recognition performance advantages of YOLOv4, but also carried the feature extraction advantages of CBAM. While comprehensively enhancing the information expression and recognition accuracy of the system, the accuracy of intelligent inspection reached 97.5%. In addition, this study also verified the impact of image resolution on the recognition accuracy of CBAM-YOLOv4.

Fig. 9
figure 9

Analysis of the influence of image resolution on the results

In Fig. 9 (a), the classification accuracy of the proposed CBAM-YOLOv4 differed across power equipment components. The main reason is that different components have different characteristics, and the extraction accuracy of CBAM-YOLOv4 for complex parts is lower than for simple parts. According to Fig. 9 (b), the accuracy of the CBAM-YOLOv4 algorithm increased with image resolution. Although the required time also gradually increased with image resolution, the increase was small, so the algorithm retained a clear advantage. In conclusion, the CBAM-YOLOv4 model proposed in this study effectively reduces the number of parameters required for model operation, improves the overall identification accuracy, and is well suited to power inspection of substations.

Fig. 10

Detection of infrared abnormal heating results in 24 test images using CBAM-YOLOv4 algorithm

Figure 10 shows 24 test images of infrared abnormal heating faults detected by the CBAM-YOLOv4 algorithm. In these data, a total of 21 infrared hot spots were found and marked with red squares. The results showed that the CBAM-YOLOv4 algorithm missed no hot spots, and the confidence scores of the recognition results were generally high. These results show that the method can suppress the various kinds of interference present in infrared images and automatically diagnose and recognize abnormal hot spots.
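Marking hot spots with high confidence scores implies a post-processing step that keeps only detections above a confidence threshold. The sketch below illustrates that step under stated assumptions: the box tuples, the confidence values, and the 0.5 threshold are hypothetical examples, not the paper's actual outputs or settings.

```python
# Hypothetical detection post-processing: filter raw boxes by confidence
# before drawing hot-spot markers. Each detection is (x, y, w, h, conf).

def filter_detections(detections, conf_thresh=0.5):
    """Keep detections whose confidence meets the threshold."""
    return [d for d in detections if d[4] >= conf_thresh]

raw = [
    (12, 40, 8, 8, 0.97),   # strong hot-spot candidate
    (55, 21, 6, 9, 0.88),   # strong hot-spot candidate
    (70, 70, 5, 5, 0.31),   # likely background noise; dropped
]
kept = filter_detections(raw)
print(len(kept))   # 2
```

In a full pipeline this filtering would typically be followed by non-maximum suppression to merge overlapping boxes around the same hot spot.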

To further highlight the performance of the CBAM-YOLOv4 algorithm, the study compared it with three recent state-of-the-art infrared recognition methods on complex datasets, namely EfficientDet, CenterNet, and YOLOv5. The experimental results are shown in Table 3. In the table, the weight file size of CBAM-YOLOv4 was 55 MB, smaller than those of EfficientDet and CenterNet and slightly larger than that of YOLOv5, but within a reasonable range. The mIoU, mAP, detection speed, and recognition accuracy of CBAM-YOLOv4 were 78.34%, 95.12%, 62.05 frames per second, and 95.12%, respectively, all superior to the other models. Therefore, CBAM-YOLOv4 showed significant advantages in weight file size, mIoU, mAP, detection speed, and recognition accuracy, verifying its applicability and superior performance in the intelligent inspection of substations.
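The reported mIoU figures rest on the standard intersection-over-union metric for axis-aligned boxes. The following sketch shows that computation; the example boxes are hypothetical and the (x1, y1, x2, y2) corner format is an assumption for illustration.

```python
# Intersection-over-union for axis-aligned boxes in (x1, y1, x2, y2) form,
# and a mean over matched prediction/ground-truth pairs (mIoU).

def iou(a, b):
    """IoU of two boxes; returns 0 when they do not overlap."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mean_iou(pairs):
    """mIoU over matched (prediction, ground truth) box pairs."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)

pred = (10, 10, 50, 50)
gt = (20, 20, 60, 60)
print(round(iou(pred, gt), 4))   # 0.3913
```

A prediction counts as correct for mAP at a given IoU threshold (commonly 0.5) only when its IoU with a ground-truth box exceeds that threshold, which is why mIoU and mAP move together in the comparisons above.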

Table 3 Performance comparison between other advanced algorithms and CBAM-YOLOv4

Conclusion

Conventional UAV recognition technology struggles to accurately inspect unattended substations, which poses great security risks. Based on UAV infrared diagnosis technology, this study constructed a CBAM-YOLOv4 model to design an intelligent substation inspection system. The experimental results showed that, compared with YOLOv3, the mAP value and IoU value of CBAM-YOLOv4 increased by 7.51% and 10.34% respectively, and the weight file size decreased by 5 MB. Compared with YOLOv4, the detection speed of CBAM-YOLOv4 increased by 18.05 frames per second and the mAP value increased by 1.61%. Compared with Mask R-CNN, the recognition accuracy and mIoU value of CBAM-YOLOv4 both increased by more than 10%. In terms of training time and test time, Lightweight YOLOv4 required 0.989 s and 1.325 s respectively, but its recognition accuracy was only 83.48%. The training time and test time of the CBAM-YOLOv4 model were 1.101 s and 1.411 s respectively; although slightly higher than those of Lightweight YOLOv4, its identification accuracy reached 95.12%.

The above results indicate that the proposed CBAM-YOLOv4 model performs excellently in detection accuracy, recognition accuracy, and operational efficiency, which is of great significance for the maintenance and safety of substations. The model significantly improves detection and recognition accuracy: the system can more accurately identify and locate equipment faults in substations, reduce the probability of false alarms and missed detections, improve inspection efficiency, and reduce the risk and cost of manual inspection. Secondly, the lightweight design and efficient operation of the model enable real-time monitoring in resource-limited hardware environments, enhancing the practicality and adaptability of the system and ensuring the safe and reliable operation of the substation in an unattended state. In addition, the high robustness and reliability of CBAM-YOLOv4 allow it to maintain stable performance in complex environments, further safeguarding substation safety. Therefore, the CBAM-YOLOv4 model can improve the detection accuracy and efficiency for faulty substation equipment and has important application value for maintaining the safe and stable operation of equipment. To a certain extent, it can ensure the stable transmission of electricity, which is of great significance for people's livelihood and economic development.

However, the research still has some limitations. The CBAM-YOLOv4 model depends strongly on large-scale data annotation, which increases the cost and complexity of data preparation. In addition, the model still needs to be optimized in terms of resource consumption in order to run more efficiently in resource-constrained environments. Future research could integrate multimodal data (such as vision and sound) into inspection systems to improve the accuracy and comprehensiveness of fault detection. Further optimization could focus on reducing the training time and resource consumption of the model, achieving more efficient deployment to meet the real-time and economic requirements of practical applications. Meanwhile, introducing self-supervised learning and transfer learning techniques could reduce the reliance on large-scale annotated data, further enhancing the practicality and generalizability of the model.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

  • Alhammad M, Avdelidis N, Deane S, Ibarra-Castanedo C, Pant S, Nooralishahi P, Ahmadi M, Genest M, Zolotas A, Zanotti-Fragonara L, Valdes J, Maldague X (2021) Diagnosis of composite materials in aircraft applications: towards a UAV-based active thermography inspection approach. Thermosense: Therm Infrared Appl XLIII SPIE 11743:35–41

  • Andhy Panca Saputra K (2021) Waste object detection and classification using deep learning algorithm: YOLOv4 and YOLOv4-tiny. Turkish J Comput Math Educ 12(14):1666–1677

  • Dewi C, Chen RC, Jiang X, Yu H (2022) Deep convolutional neural network for enhancing traffic sign recognition developed on YOLO v4. Multimedia Tools Appl 81(26):37821–37845

  • Ding P, Qian H, Chu S (2022) SlimYOLOv4: lightweight object detector based on YOLOv4. J Real-Time Image Proc 19(3):487–498

  • Dong L, Chen N, Liang J, Li T, Yan Z, Zhang B (2023) A review of indoor-orbital electrical inspection robots in substations. Industrial Robot: Int J Rob Res Application 50(2):337–352

  • Fan Y, Wang Z, Li Y, Liu W, Chen H, Qin K, Chen J, Zhang C (2021) Design of intelligent inspection scheme combining UAV and intelligent vehicle. Int Core J Eng 7(8):223–228

  • Gai R, Chen N, Yuan H (2023) A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput Appl 35(19):13895–13906

  • Han S, Yang F, Jiang H, Yang G, Zhang N, Wang D (2021) A smart thermography camera and application in the diagnosis of electrical equipment. IEEE Trans Instrum Meas 70:1–8

  • Hong F, Song J, Meng H, Wang R, Fang F, Zhang G (2022) A novel framework on intelligent detection for module defects of PV plant combining the visible and infrared images. Sol Energy 236:406–416

  • Hu J, Xia R, Xu L, Hu X (2023) Application of 3D model in intelligent analysis and management of substation operation and inspection. Eighth International Conference on Energy Materials and Electrical Engineering (ICEMEE 2022). SPIE 12598:568–574

  • Jiang Q, Liu Y, Yan Y, Xu P, Pei L, Jiang X (2022) Active pose relocalization for intelligent substation inspection robot. IEEE Trans Industr Electron 70(5):4972–4982

  • Jiang F, Song Q, Li C, Wang Y, Bao Z, Shao S (2023) Application of power ubiquitous internet of things technology in intelligent inspection of unattended substation. Second International Conference on Electronic Information Engineering and Computer Communication (EIECC 2022). SPIE 12594:37–42

  • Kim S, Kim D, Jeong S, Ham J, Lee J, Oh K (2020) Fault diagnosis of power transmission lines using a UAV-mounted smart inspection system. IEEE Access 8:149999–150009

  • Kumar S, Gupta H, Yadav D, Ansari I, Verma P (2022) YOLOv4 algorithm for the real-time detection of fire and personal protective equipments at construction sites. Multimedia Tools Appl 81(16):22163–22183

  • Li Q, Ding X, Wang X, Chen L, Son J, Song JY (2021) Detection and identification of moving objects at busy traffic road based on YOLO v4. J Inst Internet Broadcast Communication 21(1):141–148

  • Li F, Gao D, Yang Y, Zhu J (2023) Small target deep convolution recognition algorithm based on improved YOLOv4. Int J Mach Learn Cybernet 14(2):387–394

  • Liu Y, Ji X, Pei S, Ma Z, Zhang G, Lin Y, Chen Y (2020) Research on automatic location and recognition of insulators in substation based on YOLOv3. High Voltage 5(1):62–68

  • Nie J, Luo T, Li H (2020) Automatic hotspots detection based on UAV infrared images for large-scale PV plant. Electron Lett 56(19):993–995

  • Peng G, Du B, Cao C, He D (2022) Pointer-type instrument positioning method of intelligent inspection system for substation. J Electron Imaging 31(1):13001.1–13001.13

  • Vergura S (2020) Correct settings of a joint unmanned aerial vehicle and infrared camera system for the detection of faulty photovoltaic modules. IEEE J Photovolt 11(1):124–130

  • Wang Z (2022) Researches advanced in transmission lines fault recognition based on line patrol UAV. 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022). SPIE 12348:200–206

  • Wang C, Yin L, Zhao Q, Wang W, Li C, Luo B (2020) An intelligent robot for indoor substation inspection. Industrial Robot: Int J Rob Res Application 47(5):705–712

  • Wang R, Chen J, Wang X, Xu J, Chen B, Wu W, Li C (2021) Research on infrared image extraction and defect analysis of composite insulator based on U-net. IEEE 5th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). IEEE 5:859–863

  • Yan Y, Jiang W, Luo Z, Zhang J, Liu W (2021) System optimization and robustness stability control for GIS inspection robot in complex microgrid networks. Complexity 2021(2):1–12

  • Zu W, Li Z, Nie L (2022) Research on the core algorithm of wireless charging technology for substation patrol robot based on electromagnetic resonance. Nonlinear Opt Quantum Opt 55(3/4):343–353


Funding

This study has no funding.

Author information

Authors and Affiliations

Authors

Contributions

Daqi Tian: writing—original draft, formal analysis, writing—review and editing. Jinlin Chen: formal analysis, software. Xin Wang: software, visualization.

Corresponding author

Correspondence to Daqi Tian.

Ethics declarations

Ethical approval

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Tian, D., Chen, J. & Wang, X. Application research of UAV infrared diagnosis technology in intelligent inspection of substations. Energy Inform 7, 56 (2024). https://doi.org/10.1186/s42162-024-00364-w


Keywords