An obstacle avoidance safety detection algorithm for power lines combining binocular vision technology and improved object detection

Abstract

In this paper, a framework for an obstacle avoidance algorithm applied to safety distance detection for power line damage is constructed, and its overall architecture and key processes are described in detail. The system design covers three core modules: visual data acquisition and preliminary processing, accurate target recognition and distance measurement, and system error analysis and correction. In the visual data processing chain, we analyze every step from image acquisition through preprocessing to feature extraction, aiming to enhance the adaptability of applications to complex scenes. The target recognition and distance estimation part integrates advanced deep learning techniques to improve recognition accuracy and the reliability of distance estimation. In addition, common error sources, such as systematic bias, parallax discontinuity, and fluctuating illumination conditions, are discussed in depth, and corresponding correction strategies are proposed to ensure the accuracy and stability of the system, providing strong technical support for efficient and accurate safety monitoring. Specifically, by carefully tuning the learning rate, convolution kernel size, batch size, pooling layer type, and number of hidden-layer nodes, we improved the overall accuracy from an initial average of 92.4% to 95%, with a corresponding decrease in the error rate.

Introduction

With the continuous expansion and intelligent upgrading of the power system, the safe operation of power lines, the arteries of energy transmission, is critically important. However, in complex and changeable outdoor environments, such as mountainous areas, forest areas, and dense urban areas, power lines often face potential safety hazards from obstacles such as tree intrusion and foreign bodies mounted on the lines, which may not only cause power supply interruption but also lead to serious accidents such as fire, posing a threat to public safety. Traditional power line inspection relies mainly on periodic manual inspection, which is inefficient, cannot realize real-time monitoring, and is difficult to reconcile with the modern power grid's requirements for high efficiency and high security.

In recent years, with the rapid development of computer vision and machine learning technology, binocular vision technology has become an effective means to solve this problem because it can achieve three-dimensional information acquisition under non-contact conditions. By constructing a target detection and obstacle avoidance system based on binocular vision, real-time and accurate monitoring of the surrounding environment of power lines can be realized. However, the existing methods still have challenges in complex background, light variation and small target detection, which affect the detection accuracy and reliability.

Therefore, this research aims to propose a new obstacle avoidance detection algorithm for power lines by combining binocular vision technology with improved target detection algorithm. By optimizing the depth estimation of binocular vision, introducing advanced target recognition network, and combining scene understanding and intelligent analysis technology, it aims to improve the detection accuracy and robustness in complex environments, provide strong technical support for the safe operation and maintenance of power lines, and promote the intelligent transformation of the power industry (Kryukov et al. 2022).

In the field of industrial automation, non-contact detection technology is gradually replacing traditional contact detection methods, such as the use of sensors, cameras and radars to capture environmental information, and analyze and process it through advanced algorithms. These technologies enable real-time monitoring of environmental changes, prediction of potential safety risks, and timely action to achieve precise control of operational processes. The application path of external force safety distance detection is shown in Fig. 1.

Fig. 1

Application of external force safety distance detection

In the logistics industry, the application of driverless vehicles and automated systems has led to significant improvements in transportation efficiency. Through contactless detection technology, these systems are able to accurately sense road conditions, vehicle positions and traffic signals to ensure the safety and accuracy of the transportation process. In addition, the automated warehouse management system uses non-contact detection technology to achieve rapid identification and accurate positioning of goods, greatly improving the efficiency of logistics operations (Choi et al. 2021).

In this context, binocular vision technology stands out as a milestone in perception technology. By coordinating two cameras to simulate the stereo vision of human eyes, binocular vision technology can accurately determine the distance, depth, and position of objects and realize accurate perception of three-dimensional space. This depth perception capability shows unparalleled advantages in safe distance measurement and obstacle avoidance, especially in dynamic and complex environments, where the three-dimensional visual information it provides is particularly critical for accurate navigation and collision avoidance. However, existing target detection technology still faces many challenges in practical application. Environmental factors, such as light intensity, reflective target materials, complex shapes, and dynamic occluders, may affect detection accuracy, resulting in high false-detection and missed-detection rates. In applications with extremely high requirements for safety and accuracy, such as precision component assembly and dangerous goods handling, any minor detection error may cause serious consequences. Therefore, how to maintain stable and accurate detection capability across varied complex environments and dynamic changes is a bottleneck that urgently needs to be solved.

Based on the above requirements, this study was born. We plan to deeply integrate binocular vision technology with innovative target detection algorithms, break the limitations of traditional technologies, and build a set of universal and high-precision external force safety detection and obstacle avoidance automation systems. This research deliberately bypassed the framework of specific applications derived from power lines and turned to a broader field of external force control, demonstrating the wide applicability and flexibility of binocular vision technology. Through algorithm improvement, we strive not only to improve the robustness of the system in various complex environments, but also to ensure the efficient implementation of safe distance control, providing solid technical support for intelligent automation operations. In short, this research is intended to lead the way in non-contact inspection technology innovation and drive a safer and smarter future of industrial automation.

The objectives of this research focus on the following key contributions: (1) propose an algorithm for efficiently integrating binocular vision and advanced target detection, optimize the distance estimation model, and improve the measurement accuracy and response speed; (2) design and implement dynamic obstacle avoidance strategies to enhance the adaptive ability of the system in dealing with unexpected situations in real time; and (3) demonstrate the universal applicability and effectiveness of the technology in external force control scenarios through empirical experiments and case analysis, and verify its technological leadership in the field of non-contact detection.

This research integrates binocular vision and deep learning to revolutionize power line safety monitoring. A robust obstacle avoidance system is developed to detect and assess safety distances in real-time. Advanced algorithms optimize depth perception and overcome environmental challenges, ensuring accurate measurements. Extensive experiments in diverse environments validate the algorithm’s precision and adaptability, highlighting its superiority over conventional methods. This innovation elevates safety standards, streamlines operations, and fosters innovation in smart grid infrastructures.

Literature review

Principles of binocular vision technology

While binocular vision technology fundamentally draws inspiration from the human visual system, employing dual cameras to perceive depth through parallax, this section will focus more on its advanced applications in obstacle avoidance and safety distance detection, rather than foundational principles. The literature highlights several studies that have significantly contributed to the enhancement of binocular vision systems for these purposes.

A notable contribution by Gimadiev (Gimadiev 2019) showcases the integration of binocular vision in autonomous vehicles for lane keeping and obstacle recognition. The system utilizes sophisticated algorithms to interpret the disparity maps generated by binocular vision, enabling real-time decision-making for safe navigation. Similarly, drones equipped with binocular vision have demonstrated autonomous obstacle avoidance and precision landing capabilities, highlighting the technology’s versatility in diverse environments (Gimadiev 2019). The evolution of binocular vision algorithms from manual feature matching to deep learning models represents a significant leap in accuracy and robustness (Jiang et al. 2022). Early approaches relied on techniques like Scale-Invariant Feature Transform (SIFT) for matching features between images. However, contemporary systems have adopted Convolutional Neural Networks (CNNs) for feature extraction, significantly improving the precision of distance prediction. This transition has been crucial in adapting binocular vision systems to handle complex and dynamic environments, making them indispensable for intelligent systems (Kong et al. 2021). Recent studies have focused on enhancing binocular vision systems to perform reliably under challenging conditions. For instance, Ahmed et al. (Ahmed et al. 2024) propose a method to mitigate the effects of varying illumination and camera motion, common issues in outdoor settings. Their approach involves preprocessing steps that normalize the images before disparity calculation, leading to more accurate depth estimations. This advancement is particularly important for safety distance detection in variable outdoor environments. Another trend in the literature is the integration of binocular vision with other sensing modalities to create a multi-sensor fusion system (Lee et al. 2023).
By combining data from binocular vision with LiDAR or radar, researchers have developed systems that can provide more comprehensive situational awareness, enhancing the reliability of obstacle avoidance and safety distance detection. This multi-modal approach compensates for the limitations of individual sensors, creating a more robust and versatile sensing suite.

Target detection technology

As a core branch of computer vision, object detection technology has witnessed a revolutionary transformation from traditional image processing to deep learning, and then to the current trend of hybrid algorithm integration, aiming at higher detection efficiency and robustness. Early target detection techniques focused on hand-engineered features, such as SIFT, HOG (histogram of oriented gradients), Harris interest points, and Haar-like features, combined with sliding-window scanning and classifiers such as SVMs. These methods are effective in certain scenarios but generally suffer from difficult feature selection, heavy computation, and insufficient environmental adaptability (Ahmed et al. 2024).

Currently, research trends focus on how to reduce model complexity while maintaining high accuracy in deep learning. Lightweight architectures, such as SSD and MobileNet, greatly reduce the amount of computation through network structure design and quantization, and are suitable for resource-limited environments (Lee et al. 2023). Meanwhile, hybrid methods combine traditional and deep learning approaches, such as Haar-CNN and traditional feature fusion, seeking to balance accuracy and efficiency (Li et al. 2019). Driven by deep learning, the accuracy and application range of target detection technology have improved rapidly, but the high cost of computing resources and limited adaptability remain challenges. Future research will focus on further optimizing algorithm efficiency, exploring new models such as transformers, self-attention mechanisms, and adaptive learning, while enhancing robustness to meet the needs of a wider range of scenarios (Alhassan et al. 2020).

For example, Zhou et al. (Zhou et al. 2022) describe in detail the feature pyramid and anchor box optimization strategies in the YOLOv5 algorithm, and how these strategies improve detection efficiency and accuracy. Li et al. (Li et al. 2021) discuss the application of lightweight techniques in target detection, reducing computation through network structure design and quantization to suit resource-limited environments. Alhassan et al. (Alhassan et al. 2020) study how hybrid methods combine traditional and deep learning to balance accuracy and efficiency. Li et al. (Li et al. 2023) focus on adaptive learning and how to strengthen model robustness to meet the needs of a wider range of scenarios.

Studying this literature clarifies the development and future trends of target detection technology and provides a useful reference for related research. Advances in adjacent fields, such as image processing and computer vision more broadly, will also promote the further development of target detection technology.

A safety detection algorithm for power line obstacle avoidance

As the key infrastructure of energy transmission, the safe and stable operation of power lines is very important. However, under the influence of natural environment, human intervention and aging of equipment, power lines often face various obstacles and failure risks, such as tree invasion, wire falling, external force damage, etc., which may cause power supply interruption and even safety accidents. Therefore, the development of efficient obstacle avoidance safety detection algorithms for power lines becomes an important research direction to ensure the security of power grids.

In recent years, with the rapid development of Internet of Things, big data, artificial intelligence and other technologies, the safety monitoring means of power lines have become increasingly intelligent. The power line obstacle detection algorithms reported in the literature mainly include the following categories: (1) image-based recognition technology: using unmanned aerial vehicle inspection, satellite remote sensing or high-definition cameras installed on towers to collect line images, and automatically identifying potential obstacles (such as trees, buildings) and line defects (such as insulator damage, wire loosening) around the line through deep learning, machine vision and other technologies. This kind of method can realize non-contact and high-precision detection, but it requires high image definition and algorithm model accuracy (Zhao et al. 2022). (2) Sensor fusion technology: combining temperature, vibration, stress and other sensor data, through data analysis and pattern recognition technology, real-time monitoring of line operation status, timely detection of abnormalities. This multi-source information fusion method can evaluate the health status of transmission lines more comprehensively, but the implementation cost is relatively high and the data processing is complex (Lin et al. 2020). (3) Intelligent prediction model: use time series analysis and machine learning algorithms to predict possible problems of power lines, such as predicting tree growth speed according to historical data and environmental factors, and assessing its potential threat to the line. Such methods focus on preventive maintenance and reduce sudden failures, but modeling requires a large amount of accurate historical data to support it (Ma et al. 2024). 
(4) UAV autonomous inspection system: UAV system integrating GPS navigation, obstacle avoidance radar, HD camera and other technologies, which can inspect autonomously according to preset paths and feed back the line status in real time. UAV inspection flexibility is high, can quickly respond to emergencies, but limited by weather conditions, and need to solve the problem of accurate positioning and obstacle avoidance in complex environments.

The research scope of this paper involves the obstacle avoidance safety detection algorithm for power lines, which combines binocular vision technology with improved target detection technology. At the heart of the research is the development of a system capable of detecting and assessing the safe distance between power lines and intrusion threats in real time. Through breakthroughs in deep learning and computer vision, algorithms optimize depth perception, overcoming environmental challenges such as parallax errors and lighting variations to provide more accurate and responsive measurements. The main content of the study includes the refinement of the binocular vision module, which is equipped with adaptive illumination compensation and careful calibration to ensure stable operation under various conditions. In addition, an enhanced target detection framework is introduced, which consists of a neural network trained on a comprehensive dataset of typical obstacles in the powerline environment. This specialized network excels at recognizing small targets and navigating complex backgrounds, reducing false positives and false negatives even in low-light or partially occluded views (Malecki and Narkiewicz 2022).

In order to meet these challenges, safe distance detection technology needs to be continuously innovated and optimized. First, in dynamic environments, accurate distance detection and real-time response can be achieved by using advanced sensor technologies such as lidar, ultrasonic sensors, etc., as well as efficient data processing algorithms such as deep learning algorithms. Secondly, under complex conditions, the robustness of the system can be improved by optimizing the algorithm design to ensure that the system can operate stably in various complex environments. Thirdly, in resource-limited embedded systems, low-power operation can be achieved by adopting efficient algorithms and hardware design to meet the resource requirements of embedded systems. Finally, in terms of intelligent decision-making, self-learning and optimization of the system can be realized by introducing adaptive strategies and machine learning techniques to improve the intelligent level of the system (Zhu et al. 2023).

Application of binocular vision in obstacle avoidance of power lines

When designing the system architecture for external force safety detection with binocular vision technology, we need to carefully consider how to efficiently integrate vision modules, optimize data processing processes, ensure accuracy of target recognition and distance estimation, and adopt effective strategies to reduce errors to achieve reliable safety control.

System architecture design

The core of constructing an efficient and accurate binocular vision external force safety detection system architecture lies in integrating key technologies such as vision perception, target recognition, distance estimation and intelligent decision-making to form a seamless processing chain. The architecture is divided into three core modules, each of which carries specific algorithms and optimization strategies to ensure efficient and accurate data processing. The names of the three core modules include vision data acquisition and preprocessing module, target recognition and distance calculation engine module, and system error analysis and correction mechanism. Figure 2 is a detailed algorithm architecture diagram.

Fig. 2

Algorithm architecture diagram

The system architecture framework is a highly modular and scalable system designed to achieve real-time monitoring and precise control of complex industrial environments through the collaborative work of various core modules. By finely dividing and optimizing the function of each module, the system can effectively process and analyze a large amount of visual data, thus providing reliable decision support.

Visual data acquisition and preprocessing module, as the perception front end of the whole automation and intelligent system, plays a vital role. Through binocular vision sensor array, it efficiently and synchronously collects image data in the environment, providing rich and accurate information basis for subsequent image analysis and processing (Lin et al. 2020).

We use the following formula to ensure time synchronization and image alignment, as shown in Eq. (1).

$${t_{sync}}={\text{min}}({t_{cam1}},{t_{cam2}})$$
(1)

where \({t_{cam1}}\) and \({t_{cam2}}\) are the image acquisition timestamps of the two cameras. Subsequently, image enhancement algorithms such as grayscale conversion and Gaussian noise filtering are applied in the preprocessing stage, as shown in Eq. (2).

$${I_{denoised}}={I_{noisy}} * {G_{kernel}}$$
(2)

Here, \({G_{kernel}}\) is the Gaussian kernel function used to remove image noise, \({I_{noisy}}\) is the original image, and \({I_{denoised}}\) is the denoised image.
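The Gaussian denoising of Eq. (2) can be sketched in plain NumPy; the function names `gaussian_kernel` and `gaussian_denoise` are ours for illustration, and the sketch exploits the separability of the 2-D Gaussian into two 1-D convolutions.

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """1-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (ax / sigma) ** 2)
    return k / k.sum()

def gaussian_denoise(image: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Eq. (2): I_denoised = I_noisy * G_kernel, via separable 1-D convolutions."""
    k = gaussian_kernel(size, sigma)
    # Convolve each row, then each column (the 2-D Gaussian is separable).
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)
```

Because the kernel is normalized, a constant region of the image is left unchanged in its interior, which is a quick sanity check for any smoothing filter.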

Target recognition and distance calculation engine core uses deep learning algorithms, such as YOLO-net, for target recognition, while combining binocular parallax method to accurately calculate depth. Distance estimation is based on triangulation principles as shown in Eq. (3).

$$D=\frac{{f \times B}}{{tan(\theta ) \times disparity}}$$
(3)

Where D is the target distance, f is the focal length, B is the binocular baseline distance, \(disparity\) is the pixel disparity between the two views, and \(\theta\) is the viewing angle. Combined with depth information, accurate target localization is achieved.
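A minimal sketch of the triangulation step, using the standard fronto-parallel form D = fB/disparity (the special case of Eq. (3) in which the viewing-angle term equals one); the function name `depth_from_disparity` is ours, and zero disparity is mapped to infinite depth to avoid division by zero.

```python
import numpy as np

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px) -> np.ndarray:
    """Stereo triangulation D = f * B / disparity (disparity in pixels).

    Pixels with zero or negative disparity are assigned infinite depth.
    """
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(d, np.inf)
    valid = d > 0
    depth[valid] = f_px * baseline_m / d[valid]
    return depth
```

For example, with a 700 px focal length, a 0.12 m baseline, and a 70 px disparity, the target lies at 1.2 m.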

In order to ensure the accuracy and stability of detection, error analysis and correction mechanism is designed. The random error is reduced by multi-frame averaging, as shown in Eq. (4) (Ma et al. 2024).

$${D_{avg}}=\frac{{\sum\limits_{{i=1}}^{n} {{D_i}} }}{n}$$
(4)
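The multi-frame averaging of Eq. (4) reduces zero-mean random error by a factor of roughly the square root of the number of frames; a one-line sketch (the function name `average_depth` is ours):

```python
import numpy as np

def average_depth(depth_frames) -> np.ndarray:
    """Eq. (4): D_avg = (sum of D_i) / n, averaged pixel-wise over n frames.

    Zero-mean random measurement noise shrinks roughly as 1/sqrt(n).
    """
    return np.mean(np.stack(depth_frames), axis=0)
```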

In summary, the system architecture design integrates accurate acquisition of visual data, deep learning recognition and intelligent decision-making, combined with mathematical principles and formulas, to ensure the efficient and accurate application of binocular vision in external force safety detection, providing strong support for industrial automation safety.

Vision module and data synchronization

In the design of binocular vision system, the synchronization of vision module and data processing is the key link to ensure the quality of data and the real-time and accuracy of the system. It not only involves accurate image acquisition and time synchronization, but also includes image preprocessing, feature extraction and other steps to ensure the accuracy of subsequent target recognition and distance estimation. The visual module and data synchronization processing flow is shown in Fig. 3.

Image acquisition is the basis of the vision module, and a binocular camera pair is used to capture scene images synchronously. To ensure accurate time synchronization, hardware trigger synchronization or timestamp synchronization is usually used. Hardware triggering uses an external signal to activate both cameras simultaneously, ensuring strict timing alignment; this is particularly useful in industrial automation pipeline inspection, where fast-moving objects must be captured accurately (Malecki and Narkiewicz 2022).

For software timestamp synchronization, each camera image has an acquisition timestamp, and the system aligns the images by comparing these timestamps. This method is widely used in visual navigation of driverless vehicles to align the front and rear camera images in real time to avoid positioning errors caused by delays.
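Software timestamp alignment can be sketched as a two-pointer sweep over the two cameras' (sorted) timestamp streams, pairing frames whose stamps differ by at most a tolerance; the function name `align_by_timestamp` and the 5 ms tolerance are illustrative assumptions, not from the paper.

```python
def align_by_timestamp(ts_left, ts_right, max_skew=0.005):
    """Pair left/right frame indices whose timestamps differ by <= max_skew seconds.

    Both lists must be sorted ascending; unmatched frames are skipped.
    """
    pairs, i, j = [], 0, 0
    while i < len(ts_left) and j < len(ts_right):
        dt = ts_left[i] - ts_right[j]
        if abs(dt) <= max_skew:
            pairs.append((i, j))   # close enough: treat as a synchronized stereo pair
            i += 1
            j += 1
        elif dt > 0:
            j += 1                 # right frame is too old; advance right stream
        else:
            i += 1                 # left frame is too old; advance left stream
    return pairs
```

The linear sweep keeps the alignment cost proportional to the number of frames, which matters for real-time navigation pipelines.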

Pre-processing is an important step to improve image quality, including grayscale conversion, noise removal, geometric correction, and contrast enhancement. For example, the grayscale conversion formula is shown in Eq. (5). This is especially effective in resource-constrained embedded systems, reducing the computational burden.

$${I_{gray}}(x,y)=0.21R(x,y)+0.72G(x,y)+0.07B(x,y)$$
(5)
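Eq. (5) is a luminance-weighted channel mix; as a minimal sketch (the function name `to_gray` is ours), assuming the last axis of the input holds the R, G, B channels:

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Eq. (5): I_gray = 0.21*R + 0.72*G + 0.07*B, applied per pixel."""
    return 0.21 * rgb[..., 0] + 0.72 * rgb[..., 1] + 0.07 * rgb[..., 2]
```

Because the weights sum to 1, a neutral gray pixel (equal R, G, B) keeps its intensity.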

Feature extraction, such as SIFT,\(D(x,y)=(x,y) \cdot {\text{gradient}}(x,y)\)enhances the stability of target recognition in environments with large illumination changes by extracting invariant features of images, such as outdoor unmanned aerial vehicle logistics distribution.

Image alignment is the core of binocular vision: it finds corresponding points based on pixel matching and calculates parallax. Using normalized cross-correlation matching, the matching score is given by Eq. (6).

$${C_{match}}(x,y)=\sum\limits_{{x^{\prime},y^{\prime}}} {{I_1}(x+x^{\prime},y+y^{\prime})\,{I_2}(x^{\prime},y^{\prime})}$$
(6)

This method is still valid in the scene with less complex texture such as warehouse shelf recognition. Disparity map, which displays depth information intuitively, is the direct basis for obstacle avoidance judgment and path planning of robot.
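A small illustrative sketch of correlation-based stereo matching along a scanline: a zero-mean normalized cross-correlation score is computed for each candidate horizontal shift, and the shift with the highest score is taken as the disparity. The function names `ncc` and `best_disparity`, the patch size, and the disparity range are our assumptions, not the paper's implementation.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_disparity(left, right, row, col, patch=3, max_disp=16):
    """Scan the same row of the right image for the patch best matching left[row, col]."""
    h = patch // 2
    ref = left[row - h:row + h + 1, col - h:col + h + 1]
    scores = []
    for d in range(max_disp):
        c = col - d                       # candidate column in the right image
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1]
        scores.append(ncc(ref, cand))
    return int(np.argmax(scores))         # shift with the highest correlation
```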

Synchronization errors, such as time drift and inconsistent frame rates, need to be compensated. A time-drift compensation term, with its coefficient k determined from system measurements, ensures consistency over long-term operation. When frame rates are inconsistent, frame dropping or interpolation is adopted, as shown in Eq. (7). This improves adaptability in dynamic environments, such as accurately tracking vehicles in traffic monitoring with rapidly and slowly changing scenes.

Equation (7) is a linear interpolation technique for estimating missing or damaged frame data within a time series. \({I_{{\text{interpolated}}}}(x,y)\) is the interpolated image intensity at coordinates (x, y); \(I({x_1},y)\) and \(I({x_2},y)\) are the image intensities at the known timestamps \({x_1}\) and \({x_2}\), with \({x_1} < {x_2}\); x is the timestamp of the frame being interpolated, situated between \({x_1}\) and \({x_2}\); \(x - {x_1}\) and \({x_2} - {x_1}\) are the time offsets that determine the interpolation weight; and \(I({x_2},y) - I({x_1},y)\) is the change in image intensity over the interval, capturing the temporal trend.

These elements collectively enable the interpolation of intermediate frame data by leveraging the intensity values at known frames and considering the temporal dynamics between them.

$${I_{interpolated}}(x,y)=I({x_1},y)+\frac{{x - {x_1}}}{{{x_2} - {x_1}}}\left( {I({x_2},y) - I({x_1},y)} \right)$$
(7)
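Eq. (7) is ordinary linear interpolation applied pixel-wise between two known frames; a direct sketch (the function name `interpolate_frame` is ours):

```python
import numpy as np

def interpolate_frame(I1: np.ndarray, I2: np.ndarray,
                      t1: float, t2: float, t: float) -> np.ndarray:
    """Eq. (7): linearly interpolate a missing frame at time t, t1 < t < t2."""
    w = (t - t1) / (t2 - t1)      # interpolation weight from the time offsets
    return I1 + w * (I2 - I1)
```

A quarter of the way between an all-zeros and an all-ones frame, every pixel is 0.25.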
Fig. 3

Flow chart of synchronization processing between vision module and data

Target recognition and range estimation algorithms

In the design and implementation of binocular vision systems, target recognition and distance estimation algorithms are the cornerstones of precise detection and safety control strategies. The precision and complexity of this part of technology is directly related to whether the system can accurately identify other workpieces, objects or obstacles in a complex and changeable environment, and reliably evaluate their relative distances, providing a solid basis for subsequent decision-making and execution. The algorithm flow of target recognition and distance estimation is shown in Fig. 4.

Based on deep learning, and especially the evolution of convolutional neural networks (CNNs), target recognition technology has made significant progress. The YOLO-net series, including YOLOv3 and YOLOv4, not only achieves fast and accurate positioning of the target through an innovative anchor box mechanism, but also completes classification. The design of the loss function combines classification error and localization error: the total loss sums the classification loss and the localization loss, with an adjustment factor weighting the two terms to balance and jointly optimize classification and positioning (Zhu et al. 2023).

In binocular vision, disparity map generation is an intuitive method to measure depth, obtained by calculating the lateral displacement of corresponding pixels in the left and right images. The depth estimation formula (Eq. (3)) relates the depth D to the baseline distance B, the focal length f, the viewing angle, and the parallax, directly reflecting how a binocular system infers distance from parallax.

To further improve the accuracy of distance estimation, multi-view fusion and depth refinement strategies are employed, such as semi-global matching (SGM) and bundle-adjustment-based hierarchical estimation. The SGM energy function is shown in Eq. (8).

$$E(D)=\sum\limits_{{i,j}} {C(i,j)\,{{({D_i} - {d_{ij}})}^2}} +\alpha \sum\limits_{i} {{{(\nabla {D_i})}^2}}$$
(8)

Here, E is the energy function, \(C\left( {i,j} \right)\) the matching cost, \({d_{ij}}\) the observed disparity, D the depth field, and \(\alpha\) the regularization weight. These methods ensure high accuracy of distance estimation through deeper optimization, thus enhancing the reliability and practicability of the system.
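The energy of Eq. (8) can be evaluated directly for a candidate depth field; as a minimal 1-D sketch (one scanline), where the function name `sgm_energy` is ours and the gradient term is approximated by finite differences along the line:

```python
import numpy as np

def sgm_energy(D: np.ndarray, C: np.ndarray, d_obs: np.ndarray,
               alpha: float = 0.1) -> float:
    """Eq. (8) on one scanline: cost-weighted data term plus smoothness penalty.

    D: candidate depth/disparity per pixel, d_obs: observed disparity,
    C: per-pixel matching cost weight, alpha: regularization weight.
    """
    data = np.sum(C * (D - d_obs) ** 2)        # fidelity to the observed disparity
    smooth = np.sum(np.diff(D) ** 2)           # (nabla D)^2 via finite differences
    return float(data + alpha * smooth)
```

A depth field that matches the observations exactly and is constant has zero energy; any jump between neighboring pixels is charged through the alpha term.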

The target detection algorithms used in this paper belong to the YOLO-net family, including YOLOv3 and YOLOv4, whose anchor box mechanism and combined classification–localization loss are described above.

Fig. 4

Algorithm flow chart of target recognition and distance estimation

Error analysis and correction strategy

In the design and deployment of binocular vision system, although the technology of target recognition and distance estimation has become mature, there are many error sources in practice, such as illumination fluctuation, insufficient camera calibration accuracy, single environment texture, etc. In order to ensure high accuracy, the system must adopt a set of careful error analysis and correction strategy. Several typical error types and their correction methods will be discussed in depth below, and explained with formulas.

Systematic errors arise from hardware deviations from theoretical values or from biases introduced during calibration. Non-ideal values of the baseline distance B and focal length f lead to systematic deviations in depth measurements. Accurate camera calibration, such as Zhang's method, recovers the internal and external parameters by optimizing the camera matrix and distortion coefficients; the formula reflects a multi-parameter fitting process that minimizes the reprojection error, requiring an accurate solution of the camera parameters, as shown in Eq. (9).

$${\hbox{min} _{K,D}}\sum\limits_{i} | |{u_i} - K[R|t]{P_i} - K[D({u_i},{v_i})]|{|^2}$$
(9)

Here, K represents the camera intrinsic matrix, (R, t) the extrinsic rotation and translation, D the distortion coefficients, \({P_i}\) the world coordinate point, and \({u_i}\) its corresponding image coordinate.
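The core of the Eq. (9) objective, the sum of squared pixel residuals between observed and projected points, can be sketched as follows; for brevity the sketch omits the distortion model, and the function names `reproject` and `reprojection_error` are ours.

```python
import numpy as np

def reproject(K: np.ndarray, R: np.ndarray, t: np.ndarray,
              P: np.ndarray) -> np.ndarray:
    """Project world points P (N, 3) to pixels with intrinsics K and pose [R|t]."""
    cam = R @ P.T + t.reshape(3, 1)    # camera-frame coordinates
    uvw = K @ cam                      # homogeneous pixel coordinates
    return (uvw[:2] / uvw[2]).T        # perspective division -> (N, 2) pixels

def reprojection_error(K, R, t, P, uv) -> float:
    """Eq. (9) objective (distortion omitted): sum of squared pixel residuals."""
    r = reproject(K, R, t, P) - uv
    return float(np.sum(r ** 2))
```

Calibration then amounts to minimizing this scalar over K, (R, t), and the distortion coefficients, typically with a nonlinear least-squares solver.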

Disparity discontinuity is a common problem at edges or on objects with strong texture variation, which degrades matching accuracy. The semi-global matching (SGM) algorithm reduces this error by optimizing a cost function whose smoothness-term parameters ensure boundary consistency and reduce matching error.

Changes in illumination conditions strongly affect image gray levels and, in turn, the parallax calculation. Illumination correction, such as histogram equalization or gamma correction, is therefore critical. The gamma correction formula is I' = I^γ, where I' is the corrected image, I is the original image with intensities normalized to [0, 1], and the value of γ is determined through experimental optimization to ensure illumination consistency (Nguyen et al. 2020).
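A minimal sketch of gamma correction on 8-bit gray values (the γ value of 0.5 is illustrative, not a value reported in this paper):

```python
def gamma_correct(pixels, gamma=0.5):
    """Gamma-correct 8-bit gray values: normalize to [0, 1], raise to
    the power gamma, and rescale.  gamma < 1 brightens dark regions."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]

corrected = gamma_correct([0, 64, 128, 255], gamma=0.5)
```

Note how mid-range values are lifted while black and white are fixed points, which compensates for underexposed regions without clipping.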

To improve accuracy, a multi-view fusion strategy combines information from different views; a weighted average reduces the uncertainty of any single view. Let \({D_i}\) be the ith depth estimate; the fused depth \({D_{fuse}}\) is calculated by Eq. (10), where the weights \({w_i}\) are set according to the quality and consistency of each view, improving the comprehensive performance of the system.

$${D_{fuse}}=\frac{{\sum\limits_{i} {{w_i}} {D_i}}}{{\sum\limits_{i} {{w_i}} }}$$
(10)
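Eq. (10) can be sketched directly (the depth values and weights below are illustrative):

```python
def fuse_depths(depths, weights):
    """Weighted average of per-view depth estimates, Eq. (10)."""
    total = sum(weights)
    assert len(depths) == len(weights) and total > 0
    return sum(w * d for w, d in zip(weights, depths)) / total

# Three views of the same point, weighted by assumed match quality.
d_fuse = fuse_depths([2.00, 2.10, 1.95], [0.5, 0.3, 0.2])
```

In practice the weights would be derived from matching confidence or left-right consistency checks rather than fixed constants.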

To sum up, error analysis and correction strategy is the core of binocular vision system. Through system calibration, algorithm optimization and information fusion, various errors are reduced and corrected to ensure high accuracy and stability of the system and meet the requirements of precision measurement and navigation. These strategies need to be iteratively optimized in design, taking into account practical application feedback, to adapt to changing environments and requirements, and to ensure long-term effectiveness and reliability of the system. The framework is shown in Fig. 5 (Vyas et al. 2019; Zhao et al. 2023).

Fig. 5 Error analysis and correction strategy algorithm framework

Design and optimization of improved target detection algorithm

In the field of external force safety detection, continuous optimization and innovation of target detection algorithm of binocular vision technology is the core to improve system performance and accuracy. This chapter will explore the evolution strategy of the algorithm in depth, starting from key technical points, covering deep optimization of feature extraction, deepening of deep learning models, and real-time dynamic adjustment (Zhao et al. 2023; Feng et al. 2022).

Compared with traditional feature-based Haar cascade classifiers, modern deep learning models such as YOLOv5 and RetinaNet achieve significant improvements in accuracy while also running faster. The key lies in the customized design of the deep network, integrating attention mechanisms, multi-scale feature fusion, and deep optimization strategies, which lays the foundation for the accuracy of the algorithm.

To further deepen our understanding, we introduce the following formula to describe the key components in the deep learning model:

In deep learning models, attention mechanisms are usually implemented through an attention layer or attention module. These layers or modules can be integrated at multiple locations in the model, such as after convolutional layers, at the top of a feature pyramid network (FPN), or embedded in a sequence model. During training, the model learns through backpropagation how to dynamically adjust the attention weights according to task requirements, so it automatically learns which regions or features matter most for the current task. The attention mechanism increases the focus of the model by weighting the importance of different regions. The formula can be expressed as Eq. (11), where Q, K, and V are the query, key, and value vectors, respectively, and d_k is the dimension of the key vector (Liu et al. 2023; Han et al. 2020).

$${\text{Attention}}(Q,K,V)={\text{softmax}}\left( {\frac{{Q{K^T}}}{{\sqrt {{d_k}} }}} \right)V$$
(11)
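A minimal NumPy sketch of Eq. (11); the tiny Q, K, V matrices are illustrative:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (11)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

Q = np.array([[1.0, 0.0]])              # one query
K = np.array([[1.0, 0.0], [0.0, 1.0]])  # two keys
V = np.array([[10.0], [20.0]])          # their values
out = attention(Q, K, V)                # pulled toward the first value
```

Because the query aligns with the first key, the softmax assigns it the larger weight and the output lies closer to 10 than to 20.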

Multi-scale feature fusion can be realized by designing multi-scale feature extraction modules, which can generate feature maps of different scales simultaneously. For example, in YOLOv5, multi-scale feature fusion is achieved through the FPN module, which extracts features from feature maps at different scales and then fuses them together by upsampling and stitching. During training, the model learns how to adjust the weights of different scale features according to task requirements and input image characteristics. In this way, the model can make better use of feature information at different scales.

Multi-scale feature fusion improves the expressive ability of the model by combining feature information at different scales. The formula can be expressed as Eq. (12), where F_i is the feature at the ith scale and α_i is the corresponding weight (Xie et al. 2019; Rabah et al. 2018).

$${\text{Feature Fusion}}({F_1},{F_2}, \ldots ,{F_n})=\sum\limits_{{i=1}}^{n} {{\alpha _i}} {F_i}$$
(12)
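Once the feature maps share a common resolution, Eq. (12) reduces to a weighted sum; a minimal sketch (the maps and weights are illustrative, and the upsampling step of a real FPN is omitted):

```python
import numpy as np

def fuse_features(features, alphas):
    """Weighted sum of pre-aligned feature maps, Eq. (12)."""
    fused = np.zeros_like(features[0], dtype=float)
    for a, F in zip(alphas, features):
        fused += a * F
    return fused

F1 = np.ones((2, 2))        # a fine-scale map (already upsampled)
F2 = 2.0 * np.ones((2, 2))  # a coarse-scale map
fused = fuse_features([F1, F2], [0.25, 0.75])
```

In a trained model the weights α_i are learned rather than fixed, letting the network emphasize whichever scale is most informative for the input.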

Deep optimization strategy can be realized by adjusting network structure, weight initialization, learning rate scheduling, etc. These policies can be embedded in the design of the model, for example by considering regularization policies when designing the neural network architecture, or by dynamically adjusting the learning rate during training. During training, the model learns how to adjust the network structure to minimize the loss function. In this way, the model can automatically find an optimal network structure to improve the performance of the model.

The deep optimization strategy improves model performance by adjusting the network structure. The formula can be expressed as Eq. (13), where W denotes the network weights, λ is the regularization coefficient, ℓ is the loss function, and Reg(W) is the weight regularization term.

$${\text{Optimization}}(W,\lambda )={\hbox{min} _W}\sum\limits_{{i=1}}^{m} \ell ({y_i},f({x_i};W))+\lambda \cdot {\text{Reg}}(W)$$
(13)
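A one-parameter instance of Eq. (13) with an L2 regularization term, solved by plain gradient descent; the data, λ, and learning rate below are illustrative:

```python
def fit_ridge_1d(xs, ys, lam=0.1, lr=0.01, steps=2000):
    """Gradient descent on sum_i (y_i - w*x_i)^2 + lam*w^2,
    a one-parameter instance of the objective in Eq. (13)."""
    w, n = 0.0, len(xs)
    for _ in range(steps):
        grad = sum(-2.0 * x * (y - w * x) for x, y in zip(xs, ys))
        grad += 2.0 * lam * w          # gradient of the L2 penalty
        w -= lr * grad / n
    return w

# y = 3x exactly; the penalty shrinks w slightly below 3.
w = fit_ridge_1d([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

The closed-form optimum here is Σxy / (Σx² + λ) = 42/14.1 ≈ 2.979, illustrating how the regularization term trades a little fit for smaller weights.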

Experimental verification and case analysis

In order to ensure the reliability of the experiment and the validity of the algorithm, we employed a series of advanced hardware devices.

In the experiment, in order to ensure the accuracy and reliability of binocular vision system, we choose high-precision measurement equipment. Specifically, each binocular camera is equipped with the Sony IMX253 sensor, a CMOS image sensor designed for high resolution and low noise that delivers excellent image quality and detail capture. The sensor has a pixel size of 4.65 microns, which helps to obtain sharp images even in low light conditions. To ensure the accuracy of binocular parallax measurements, we used the Pentax KAF-2000E lens, which is known for its high resolution and low distortion characteristics, providing sharp image quality and consistent performance even over a wide range of focal length changes. The focal length of the lens is 16 mm and the aperture range is f/2.8-f/22, which allows us to flexibly adjust the aperture size according to different lighting conditions to obtain the best image contrast and depth information.

Specifically, we utilized two high-resolution binocular cameras, each with a resolution of 3840 × 2160 pixels and a baseline separation of 5 cm, to capture high-quality stereoscopic images. To achieve accurate timestamp synchronization of images, we deployed a hardware-triggered synchronization mechanism to ensure that the data captured from both perspectives were strictly aligned in time. All image processing and algorithmic calculations are done in real time on a high-performance GPU server. Equipped with NVIDIA Tesla V100 GPUs and 32 GB of HBM2 memory, the server provides excellent parallel computing power to accelerate the training and inference of deep learning models. In addition, the server is equipped with Intel Xeon Scalable processors, providing powerful central processing capability to ensure efficient operation of the entire system. During different assembly-line simulation periods, we collected video data under normal light, low light, complex background, dynamic occlusion and other conditions to build a comprehensive dataset covering various loads and abnormal situations. The dataset includes not only video clips recorded under normal ambient light, but also scenes with severe light changes, single-texture targets, multiple targets and dynamic disturbances, ensuring that the algorithm can adapt to all possible power line damage scenarios, thus verifying its robustness and generalization ability. The hardware resource utilization evaluation shows that the algorithm manages hardware resources efficiently during execution. CPU utilization remained around 12.5%, with average occupancy ranging from 5 to 15%, indicating that the algorithm consumes few CPU resources and is suitable for running on medium-performance hardware. The memory footprint ranges from 20 MB to 250 MB, suitable for medium to large tasks, showing that the algorithm can handle large amounts of data while maintaining a low memory footprint.
GPU usage ranges from 25 to 35%, which means that the algorithm makes full use of GPU computing power while avoiding waste of resources. Disk I/O read rates of 20 MB/s with average occupancy between 15 and 25 MB/s, and network bandwidth requirements of 2.5 Mbps with average occupancy in the range of 2–3 Mbps, show that the algorithm is equally efficient in data transfer, ensuring smooth operation of the entire system.
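For reference, with the rig parameters stated above (baseline 5 cm, focal length 16 mm, pixel pitch 4.65 µm), the standard stereo triangulation relation Z = f·B/d gives the depth for a measured disparity; a minimal sketch (the 100-pixel disparity is illustrative):

```python
def depth_from_disparity(disparity_px, baseline_m=0.05,
                         focal_mm=16.0, pixel_um=4.65):
    """Stereo triangulation Z = f * B / d, with the focal length
    converted from millimetres to pixel units via the pixel pitch."""
    focal_px = focal_mm * 1000.0 / pixel_um   # about 3441 px
    return focal_px * baseline_m / disparity_px

z = depth_from_disparity(100.0)   # a 100-pixel disparity -> ~1.72 m
```

The inverse relation between disparity and depth also explains why depth error grows with range: at large Z a one-pixel disparity error shifts the estimate much further.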

Table 1 details the characteristics of the dataset used to construct the experiment, covering video clips under a variety of conditions, in order to fully test and verify the robustness and generalization ability of the algorithm. For example, sequence 1 describes a light-stable environment with high environmental complexity, meaning that the video contains rich texture and detail, the number of objects varies between 10 and 150, and no dynamic interference, meaning that the accuracy of static object recognition is tested (Huo et al. 2018; Wang et al. 2023; Han et al. 2019).

Table 1 Data set details

Table 2 defines the criteria for algorithm evaluation, including the accuracy rate TP/(TP + FP), i.e., the proportion of correct predictions among all predictions; the recall rate TP/(TP + FN), the proportion of true targets actually detected, emphasizing the completeness of predictions; the F1 score, which combines the accuracy and recall rates and is a common indicator for evaluating the comprehensive performance of classification models; and the processing time, reflecting algorithm efficiency as the average time required to process one frame of images (Ling et al. 2019; Gragnaniello et al. 2023).

Table 2 Definition of evaluation indicators
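The indicators of Table 2 can be sketched as follows; the TP/FP/FN counts are illustrative values chosen to roughly reproduce the condition-1 results reported in Table 3:

```python
def precision_recall_f1(tp, fp, fn):
    """Evaluation indicators of Table 2: accuracy (precision)
    TP/(TP+FP), recall TP/(TP+FN), and their harmonic mean F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: ~97.3% accuracy, 95.6% recall, F1 ~96.4%.
p, r, f1 = precision_recall_f1(tp=956, fp=27, fn=44)
```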

Table 3 summarizes the performance of the algorithm under different conditions. For example, in condition 1, the accuracy rate reaches 97.3%, the recall rate 95.6%, the F1 score 96.45, and the average processing time 18 milliseconds, showing good recognition effect and efficiency under normal stable light.

Table 3 Experimental results

Table 4 verifies the practicability of the algorithm through specific cases, such as in production line monitoring (condition 2), the accuracy rate is 93.8%, and the F1 score is 92.65, which proves that the algorithm can still accurately warn the robot arm abnormality under dynamic illumination changes, reflecting the ability to solve practical problems.

Table 4 Summary of Case studies

Table 5 clarifies the room for improvement: for example, the accuracy rate is targeted to increase from 92.4 to 95%, based on the gap between current performance and the ideal state; the processing time is to be reduced from 25 ms to 20 ms to improve real-time performance; and illumination adaptability is to be enhanced, reflecting the direction of the algorithm in variable environments.

Table 5 Quantitative indicators of performance improvement suggestions
Table 6 Comparison of algorithm parameter adjustment

In Table 6, we can see that the accuracy and processing time of the algorithm can be significantly affected by adjusting the learning rate, convolution kernel size, batch size, pooling layer type, and hidden layer node number. For example, adjusting the learning rate from 0.001 to 0.0005 improves accuracy by 1.2% while reducing processing time by 2 ms. This is because a smaller learning rate helps the model learn more stably, which improves accuracy, and reduces unnecessary calculations, which reduces processing time. Increasing the convolution kernel size and the number of hidden layer nodes, while increasing the computational effort, can improve accuracy by providing more feature learning and more complex decision-making capabilities.

In Table 7, we can see that the algorithm uses hardware resources efficiently. CPU utilization remained between 12.5% and 15%, indicating that the algorithm consumes relatively little CPU resources and is suitable for running on medium performance hardware. The memory footprint is between 20 and 250 MB, suitable for medium to large tasks, indicating that the algorithm is capable of handling large amounts of data while maintaining a low memory footprint. GPU utilization ranged from 25 to 35%, indicating that the algorithm was able to fully utilize GPU computing power without causing waste of resources. Consideration of disk I/O and network requirements shows that the algorithm is also efficient in data transmission. Overall, these data show that the algorithm is efficient in resource management and can make rational use of hardware resources while maintaining high accuracy.

Table 7 Resource Consumption Assessment

Finally, experiments show that the algorithm performs well in binocular vision power line external force damage safety detection. Through dataset diversity tests, strict evaluation indicators and actual case analysis, the accuracy and real-time performance of the algorithm are verified. Meanwhile, the potential space and optimization path for performance improvement are pointed out, which provides a solid foundation and direction for the further development of the algorithm.

To further validate the superiority of our proposed binocular vision-based power line external force damage safety distance detection system compared to existing technologies, we compare the performance of our algorithm against several advanced detection algorithms recently published in top-tier journals and international conferences. Below are two critical tables outlining the comparison results:

Table 8 Performance Metrics comparison

From Table 8, it is evident that our algorithm outperforms or at least matches the most recent algorithms published in terms of all performance metrics. Especially in terms of average processing time and F1 score, our algorithm demonstrates a significant advantage, implying that it can respond quickly while ensuring high precision, making it particularly suitable for real-time monitoring applications such as power line external force damage safety distance detection.

Table 9 Algorithm characteristics and Applicability Comparison

Table 9 highlights the characteristics and applicable scenarios of each algorithm. Although other algorithms may perform well in certain specific environments, the stability of our algorithm under complex lighting conditions and dynamic interferences, along with its processing speed, make it an ideal choice for power line safety monitoring.

Through these comparisons, we can confidently assert that our combination of binocular vision technology and improved target detection algorithm holds a leading edge in the field of power line external force damage safety distance detection, capable of meeting the high demands for precision, real-time response, and robustness in practical applications.

Conclusion

Through detailed experimental design, implementation, dataset preparation, performance testing and evaluation indicators, result analysis, and case analysis, this study comprehensively demonstrates the practicability and effectiveness of binocular vision technology in power line external force damage safety distance detection. Experimental results show that the algorithm performs well under various conditions, with high accuracy and recall and short processing time, which proves the accuracy and real-time performance of the algorithm. At the same time, through case analysis, we verify the practicability of the algorithm in the actual production line environment, such as successful warning of robot arm abnormality and effective identification of unstable goods stacking. The application framework of binocular vision technology in external force damage safety distance detection proposed in this study realizes efficient and accurate detection through carefully designed system architecture, algorithm optimization and error correction strategies. This framework brings technical innovation to the field of industrial automation safety control and has wide application prospects.

Data availability

The data supporting the findings of this study are available within the article.

References

  • Ahmed F, Mohanta JC, Keshari A (2024) Power transmission line inspections: methods, challenges, current status and usage of unmanned Aerial systems. J Intell Robotic Syst 110(2):54

  • Alhassan AB, Zhang XD, Shen HM, Xu HB (2020) Power transmission line inspection robots: a review, trends and challenges for future research. Int J Electr Power Energy Syst 118:105862

  • Choi H, Koo G, Kim BJ, Kim SW (2021) Weakly supervised power line detection algorithm using a recursive noisy label update with refined broken line segments. Expert Syst Appl 165:113895

  • Feng WW, Liang ZR, Mei J, Yang SJ, Liang B, Zhong X, Xu J (2022) Petroleum Pipeline Interface Recognition and pose detection based on Binocular Stereo Vision. Processes 10(9):20

  • Gimadiev RS (2019) Power Line Deformation dynamics. Mech Solids 54(6):903–914

  • Gragnaniello D, Greco A, Saggese A, Vento M, Vicinanza A (2023) Benchmarking 2D multi-object detection and Tracking algorithms in Autonomous Vehicle driving scenarios. Sensors 23(8):24

  • Han Y, Zhao K, Chu ZN, Zhou Y (2019) Grasping control method of Manipulator based on Binocular Vision Combining Target Detection and Trajectory Planning. IEEE Access 7:167973–167981

  • Han Y, Chu ZN, Zhao K (2020) Target positioning method in binocular vision manipulator control based on improved canny operator. Multimedia Tools Appl 79(13–14):9599–9614

  • Huo GY, Wu ZY, Li JB, Li SJ (2018) Underwater target detection and 3D Reconstruction System based on Binocular Vision. Sensors 18(10):21

  • Jiang W, Shi YT, Zou DH, Zhang HW, Li HJ (2022) Research on mechanism configuration and dynamic characteristics for multi-split transmission line mobile robot. Industrial Robot-the Int J Rob Res Application 49(2):200–211

  • Kong FZ, Xu W, Cai YX, Zhang F (2021) Avoiding dynamic small obstacles with Onboard Sensing and Computation on Aerial Robots. IEEE Robot Autom Lett 6(4):7869–7876

  • Kryukov A, Suslov K, Thao LV, Hung TD, Akhmetshin A (2022) Power Flow Modeling of Multi-circuit Transmission Lines. Energies 15(21):8249

  • Lee C, Chung D, Kim J, Kim J (2023) Nonlinear Model Predictive Control with Obstacle Avoidance Constraints for Autonomous Navigation in a Canal Environment. IEEE-ASME Trans Mechatronics. https://arxiv.org/abs/2307.09845

  • Li B, Li SL, Cao M, Zhang LS, Liu QC, Yang M et al (2019) A novel fast extraction algorithm of Power Line in Complex background. J Nanoelectronics Optoelectron 14(4):532–542

  • Li YC, Zhang WB, Li P, Ning YH, Suo CG (2021) A Method for Autonomous Navigation and Positioning of UAV based on electric field array detection. Sensors 21(4):1146

  • Li XP, Fan X, Shang DY, Peng JW (2023) Dynamic performance analysis based on the mechatronic system of power transmission line inspection robot with dual-arm. Proc Institution Mech Eng Part C-Journal Mech Eng Sci 237(22):5391–5408

  • Lin W, Yang ZF, Yu J, Li WY, Lei Y (2020) Improving security and economy of interconnected power network through explicit feasible region of tie-line power transfer. Int J Electr Power Energy Syst 123:106262

  • Ling X, Zhao YS, Gong L, Liu CL, Wang T (2019) Dual-arm cooperation and implementing for robotic harvesting tomato using binocular vision. Robot Auton Syst 114:134–143

  • Liu TH, Nie XN, Wu JM, Zhang D, Liu W, Cheng YF et al (2023) Pineapple (Ananas comosus) fruit detection and localization in natural environment based on binocular stereo vision and improved YOLOv3 model. Precision Agric 24(1):139–160

  • Ma WJ, Xiao J, Zhu GY, Wang J, Zhang DC, Fang X, Miao Q (2024) Transmission Tower and Power Line detection based on improved Solov2. IEEE Trans Instrum Meas 73:5015711

  • Malecki T, Narkiewicz J (2022) A collision avoidance algorithm in the simultaneous localization and mapping problem for mobile platforms. J Theoretical Appl Mech 60(2):317–328

  • Nguyen V, Jenssen R, Roverso D (2020) LS-Net: fast single-shot line-segment detector. Mach Vis Appl 32(1). https://doi.org/10.1007/s00138-020-01138-6

  • Rabah M, Rohan A, Talha M, Nam KH, Kim SH (2018) Autonomous Vision-based Target Detection and Safe Landing for UAV. Int J Control Autom Syst 16(6):3013–3025

  • Vyas A, Vachhani L, Sridharan K (2019) Hardware-efficient interval analysis based collision detection and avoidance for mobile robots. Mechatronics 62:102258

  • Wang LX, Wang B, Zhang JX, Ma HR, Luo P, Yin TR (2023) An Intelligent Detection Method for Approach Distances of Large Construction Equipment in substations. Electronics 12(16):22

  • Xie YG, Xing JY, Liu GJ, Lan JY, Dong YY (2019) Real-time Reconstruction of unstructured scenes based on binocular vision depth. J Internet Technol 20(5):1611–1623

  • Zhao HN, Wang CQ, Guo R, Rong XW, Guo JM, Yang QX et al (2022) Autonomous live working robot navigation with real-time detection and motion planning system on distribution line. High Voltage 7(6):1204–1216

  • Zhao L, Yao HT, Fan YJ, Ma HH, Li ZH, Tian M (2023) SceneNet: a multifeature joint Embedding Network with Complexity Assessment for Power Line scene classification. IEEE Trans Aerosp Electron Syst 59(6):9094–9116

  • Zhou ZX, Miao NP, Chen XZ, Li Y, Ding L, Shuang F (2022) PLENet: efficient power line extraction network based on UAV aerial imagery. J Appl Remote Sens 16(3). https://doi.org/10.1117/1.jrs.16.034512

  • Zhu GY, Zhang WX, Wang M, Wang J, Fang X (2023) Corner guided instance segmentation network for power lines and transmission towers detection. Expert Syst Appl 234:121087

Funding

No funding supports.

Author information

Authors and Affiliations

Authors

Contributions

G.L. methodology; D.L. investigation; W.S. data curation; Z.X. supervision; R.L. methodology; J.F. methodology.

Corresponding author

Correspondence to Ruchao Liao.

Ethics declarations

Ethical approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, G., Li, D., Sun, W. et al. An obstacle avoidance safety detection algorithm for power lines combining binocular vision technology and improved object detection. Energy Inform 7, 72 (2024). https://doi.org/10.1186/s42162-024-00378-4
