
# Fault traceability of power grid dispatching system based on DPHS-MDS and LambdaMART algorithm

*Energy Informatics*
**volume 7**, Article number: 85 (2024)

## Abstract

As the volume of system service data grows, tracing faults in power dispatching systems becomes increasingly difficult. Current fault tracing methods suffer from low precision, low efficiency, and difficult troubleshooting. To address these problems, a fault tracing method based on a data partition hybrid sampling method and the multiple additive regression tree (LambdaMART) algorithm is proposed. The method first uses data partition hybrid sampling combined with dynamic model selection to detect business anomalies, and then applies a clustering algorithm and an information difference graph model to trace faults to system components. The experimental results showed that the F-measure and geometric mean (G-mean) of the proposed method were 0.964 and 0.685, respectively. In addition, the normalized discounted cumulative gain and the mean average precision over the top 10 results (NDCG@10 and MAP@10) were 0.752 and 0.186, respectively. The proposed method can effectively improve the fault tracing efficiency of power grid operation and maintenance personnel and provide strong data support for the safe maintenance of power grid dispatching systems.

## Introduction

As the amount of data in power grid systems continues to increase, and at an accelerating rate, the efficiency requirements placed on power grid management keep rising (Tang et al. 2022; Cui et al. 2022). As the decision-making brain of the power management system, the dispatching system largely determines the operating efficiency of the entire power grid (Huang et al. 2022; Shirzadi et al. 2022). The D5000 system, a product of combining artificial intelligence with power management technology, provides management functions such as synchronous monitoring, power stability testing, and service scheduling control in the power network (Wang et al. 2022a, 2022b). Although existing research has made remarkable progress in improving system management and scheduling, some limitations cannot be ignored. First, existing methods often suffer from high computational complexity and long model training times when handling large-scale, high-dimensional, nonlinear power dispatching data. Second, traditional fault tracing methods often rely on prior knowledge or specific assumptions and therefore struggle to handle unknown or rare fault types. Given these limitations, model dynamic selection driven by data partition hybrid sampling (DPHS-MDS) is combined with Lambda Multiple Additive Regression Trees (LambdaMART) initialized by an ensemble learning forest algorithm to construct a new fault tracing method for power grid dispatching systems. The method targets data imbalance, inefficient model training, and inaccurate fault detection and ranking in power dispatching systems.

The novel contribution of this study is more accurate fault detection and ranking under different grid operating states, achieved by dynamically adjusting the sampling ratio and model selection. In addition, a fault tracing method for system components based on a clustering algorithm and an information difference graph model is designed. The clustering algorithm groups the system components, and the information difference graph model analyzes the correlations between them, so that the fault source can be located quickly and operation and maintenance personnel are given an intuitive fault tracing path.

## Related works

Many experts have studied the data partition hybrid sampling (DPHS) technique, and scholars have also studied the LambdaMART algorithm based on ensemble learning forest algorithms (Zhu et al. 2022a, 2022b). Le et al. proposed a comprehensive sampling parameter optimization method for DPHS platform development. It formulated a sampling parameter model that ensures the performance of a hybrid sampling simulation platform based on similarity indicators. The experimental results showed that the method improved the technical performance of the hybrid simulation platform (Le et al. 2022). Ibrahim and Younis proposed a variable neighborhood search algorithm based on four random probability distributions. It learned adaptively from the objective function of the learning-to-rank problem and determined the next optimal solution while verifying the global optimal solution. The experimental results indicated that the algorithm improved the optimization of learning-to-rank methods (Ibrahim and Younis 2023). He et al. proposed a ranking prediction model based on learning to rank. An enterprise data set with four major features was constructed from information collected on the Internet through data analysis and related operations, and a learning-to-rank method was then used for model training (He et al. 2023). Zhou et al. proposed a familiarity-based algorithm. A series of significance measures quantified each indoor landmark feature, and the learning-to-rank method LambdaMART was used to classify the quantified combinations into familiar and unfamiliar models. The experimental results showed that the method supported the further development of indoor navigation systems (Zhou et al. 2022).

Many scholars have also studied fault tracing methods. Tian et al. proposed a coordinated dispatching model that jointly optimizes the physical distribution network and the computer network, coordinating cyber-physical operation between the two networks. It also modeled the backup power supply system, topology reconfiguration, and distributed generators. The experimental results showed that the model improved the efficiency of repair crew scheduling in cyber-physical distribution systems (Tian et al. 2022). Siu et al. proposed a command authentication method. It simplified the problem using market background information, developed an invisible vector for control signals through risk assessment, and then detected and verified control commands with a multi-agent system. The experimental results showed that the method achieved high-precision detection (Siu et al. 2022). Bebars et al. proposed a classification technique. Building on traditional fault localization, detection technology, and modeling, it optimized the relevant technical parameters and constructed a model combined with the detection of fault parameters. The experimental results indicated that the technique provides a technical reference for fault classification research (Bebars et al. 2022). Mansoor et al. proposed a reverse fault propagation method. Given the abstract system configuration, the system functional model, and the system behavior model, it derives component modes and other functional states from known state functions. The experimental results showed that the method could analyze the sources of functional faults in conceptual system design (Mansoor et al. 2023).

In summary, many researchers combine data processing with algorithms to achieve fault tracing. However, existing methods for sampling anomaly (minority-class) samples do not consider the spatial distribution of the data, and model combination relies on a simple static strategy that merely superimposes models. These methods therefore have limited applicability, and their fault tracing accuracy is unsatisfactory. In view of this, this paper proposes a fault tracing method for power grid dispatching systems based on DPHS-MDS and LambdaMART, aiming to further improve fault tracing efficiency and keep the power system operating normally.

## Methods and materials

### Imbalanced ensemble classification method for DPHS-MDS

Before faults in the grid dispatching control system can be traced, the data imbalance problem must be solved, because imbalanced data sets often degrade model performance on the minority fault classes. Therefore, an imbalanced ensemble classification method based on DPHS-MDS is proposed to optimize data processing and model training for fault tracing in power grid dispatching systems. The power grid dispatching system consists of communication technology, the physical hardware environment, and system algorithms (Dutta et al. 2022; Wellendorf et al. 2023). Because the D5000 system is the command center of the power grid, a system service failure prevents it from operating normally (Dutta et al. 2022; Ghasemi et al. 2022). Detecting D5000 system failures can be treated as an imbalanced binary classification problem in machine learning. Three sampling operations, oversampling, undersampling, and hybrid sampling, are used to preprocess the original data and obtain a balanced data set. To achieve imbalanced ensemble classification, the study uses a region partitioning and sampling strategy (DPHS) to balance the majority-class and minority-class samples in the data set. Model dynamic selection (MDS) is then used to further optimize the classification results by dynamically selecting suitable models. The number of nearest neighbors used to build the balanced data set was set to 6, as shown in Fig. 1.

Fig. 1 shows the regional distribution of the majority and minority classes in the original data set. In the region diagram, the neighborhood regions formed by the 6 nearest neighbors of the minority-class samples are identified. After the regions are divided, a nearest neighbor algorithm finds the nearest neighbors of each minority-class sample, and the majority-class samples within these neighborhoods are assigned to the boundary region. Three conditional judgments are then made. The first is shown in formula (1).

In formula (1), \({N_{i+}}\) and *I* denote the numbers of minority-class and majority-class samples within the neighborhood, respectively. The second conditional judgment is shown in formula (2).

In formula (2), *k* denotes the number of nearest neighbors. The value of \({N_{i+}}\) was determined experimentally on specific data sets: the *k* value was adjusted repeatedly, its influence on model performance was observed, and the best \({N_{i+}}\) value was chosen. The third conditional judgment is shown in formula (3).

Once the judgments are passed, all minority-class samples are assigned to their corresponding regions, which yields the majority-class safe region and the filtered set, and the data region parameters are then determined. Because existing sampling methods do not account for the region in which each sample lies, this study proposes the DPHS method. First, the boundary-region sample set is obtained by undersampling. The minority-class safe sample set is retained, and the three sample sets are combined to form the sampled data set. A DPHS-MDS method is then constructed by combining this region-partition-based hybrid sampling with MDS. The process is shown in Fig. 2.
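A minimal sketch of the region partition and hybrid sampling follows. Because formulas (1) to (3) are not reproduced here, the partition uses an assumed Borderline-SMOTE-style rule (noise if all *k* neighbors belong to the majority class, boundary if at least half do, safe otherwise), with k = 6 as the default to match the setting above. The balancing step is also an assumption: it drops minority noise, then combines random undersampling of the majority class with random oversampling by duplication so both classes meet at the midpoint of their sizes. `partition_minority` and `hybrid_sample` are hypothetical names, not the paper's implementation.

```python
import math
import random

def partition_minority(X, y, k=6, minority=1):
    """Label each minority sample 'safe', 'boundary' or 'noise'.

    Assumed Borderline-SMOTE-style rule standing in for formulas (1)-(3):
    count the majority-class samples among the k nearest neighbours.
    """
    regions = {}
    for i, label in enumerate(y):
        if label != minority:
            continue
        # k nearest neighbours of X[i] by Euclidean distance, excluding i itself
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        n_major = sum(1 for j in others[:k] if y[j] != minority)
        if n_major == k:
            regions[i] = "noise"       # surrounded only by majority samples
        elif n_major >= k / 2:
            regions[i] = "boundary"    # majority neighbours dominate
        else:
            regions[i] = "safe"        # mostly minority neighbours
    return regions

def hybrid_sample(y, regions, minority=1, seed=0):
    """Indices of a roughly balanced training set (hypothetical DPHS stand-in)."""
    rng = random.Random(seed)
    # Filter minority noise points; keep safe and boundary minority samples.
    kept = sorted(i for i, r in regions.items() if r != "noise")
    majority = [i for i, label in enumerate(y) if label != minority]
    if not kept:
        return majority
    # Meet in the middle between the two class sizes.
    target = (len(kept) + len(majority)) // 2
    # Random undersampling of the majority class (without replacement) ...
    sampled_major = rng.sample(majority, min(target, len(majority)))
    # ... and random oversampling of retained minority samples by duplication.
    extra = [rng.choice(kept) for _ in range(max(0, target - len(kept)))]
    return sampled_major + kept + extra
```

Midpoint balancing is only one possible choice; a real DPHS configuration would set the sampling ratio from the region statistics rather than a fixed rule.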

In Fig. 2, noise points are first filtered: the minority-class noise regions are deleted, and the remaining regions are retained through sampling. The filtered data set and the sampled balanced data set are then used to train the original and biased random forest (RF) models, and the classifiers of the two models are integrated into the hybrid DPHS-MDS model. Finally, the type of each test point is determined, a suitable model is selected for each type of test point following the idea of dynamic selection, and the final classification result is obtained through hard voting. DPHS-MDS uses two parameters to set the size of the generated RFs and hence the size ratio of RF1 and RF2. The parameter that determines the RF size is given in formula (4).

In formula (4), *q* denotes the parameter that determines the RF size. The RF parameter *S* controls the complexity and generalization ability of the model; an unsuitable value leads to overfitting or underfitting. To determine *S* and *q*, the relationship between the two parameters is defined in formula (5).

In formula (5), *S* denotes the RF size. Finally, MDS determines the model used for each type of test point, and the final classification result is selected through hard voting. The MDS process is shown in Fig. 3.

Fig. 3 takes an imbalanced two-class training set and a test set as input. For each test point, the K-nearest neighbor algorithm counts the minority-class samples among its neighbors in the training set, and the test points are divided into three regions according to this count, ranging from points surrounded by many majority-class samples to points with few majority-class neighbors. The original model \(R{F_1}\), the hybrid model \(RF\), or the local-region model \(R{F_2}\) is then applied according to the region, and the final label is determined by hard voting.
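The per-test-point selection and hard voting described for Fig. 3 can be sketched as follows. The three models are stand-in callables rather than trained random forests, and the mapping from region to model subset (RF1 alone in majority-dominated regions, RF2 alone in minority-dominated regions, all three voting in mixed regions) is an assumption for illustration, since the text does not spell out the exact assignment. `hard_vote` and `mds_predict` are hypothetical names.

```python
import math
from collections import Counter

def hard_vote(models, x):
    """Hard (majority) vote over predicted labels; ties go to the smaller label."""
    votes = Counter(model(x) for model in models)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)

def mds_predict(x, X_train, y_train, models, k=6, minority=1):
    """Choose models from the class mix of x's k nearest training points, then vote.

    The region-to-model mapping below is hypothetical, for illustration only.
    """
    nearest = sorted(range(len(X_train)),
                     key=lambda j: math.dist(x, X_train[j]))[:k]
    n_min = sum(1 for j in nearest if y_train[j] == minority)
    if n_min == 0:                      # majority-dominated region
        chosen = [models["RF1"]]
    elif n_min < k / 2:                 # mixed region: all three models vote
        chosen = [models["RF1"], models["RF"], models["RF2"]]
    else:                               # minority-dominated region
        chosen = [models["RF2"]]
    return hard_vote(chosen, x)
```

Letting all three models vote only in the mixed region reflects the intuition that disagreement is most useful where the classes overlap; elsewhere a single specialized model is assumed to suffice.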

### A fault tracing method for system components based on information difference graph model

After the imbalanced data have been classified by the ensemble, the abnormal or faulty components of the system must be identified by analyzing the classified operating data, which realizes fault tracing. Because analyzing information transfer can mine fault information in a data-driven way, the information difference graph model is used to trace the fault source in the power dispatching system. The information difference graph model is a graph-theoretic fault tracing method: it constructs a graph of the information transfer relationships between system components and identifies faulty components by analyzing the information differences between them. The study uses the alarm interaction information in the power grid dispatch control system and sets thresholds based on a bidirectional graph and the nodes' own information. The resulting threshold range is shown in formula (6).

In formula (6), \(\Theta\) represents the set threshold. The range of parameter values in the information difference matrix is shown in formula (7).

In formula (7), *m* denotes the row index and *n* the column index of an entry in the information difference matrix. The range of values in the information difference matrix is shown in formula (8).

In formula (8), \({c_{m,n}}\) denotes the entry in row *m* and column *n* of the information difference matrix *C*. The matrix is searched, and entries satisfying the labeling condition in formula (9) are labeled.

All unlabeled entries of the information difference matrix *C* are then set to zero to obtain the matrix \(C'\). Combined with the selected fault feature links, an information difference graph model is constructed, with parameters given in formula (10).

In formula (10), \(\left\{ {{v_1},{v_2}, \cdots ,{v_N}} \right\}\) denotes the set of nodes with their own attributes, \(\left\{ {{e_1},{e_2}, \cdots ,{e_M}} \right\}\) the set of links between nodes, and \(\left\{ {{w_{{e_1}}},{w_{{e_2}}}, \cdots ,{w_{{e_M}}}} \right\}\) the set of directed edge weights. The information difference matrix in formulas (6) to (10) is constructed by analyzing the warning interaction information in the power grid dispatching control system; each element represents the degree of information difference between two components. The threshold in formula (6) is set from historical data and expert experience and filters out component pairs with small information differences. Formula (9) gives the labeling condition: a matrix value greater than the threshold is labeled. Formula (10) gives the parameters of the graph model. Once the information difference matrix is determined, the fault degree of each component is evaluated with a fault severity index, and the components are sorted by this index. The flowchart of the system component fault tracing method based on the information difference graph model is shown in Fig. 4.
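The labeling, zeroing, and graph construction steps can be sketched as follows, assuming (as the text describes for formula (9)) that an entry is labeled when it exceeds the threshold \(\Theta\). The exact forms of formulas (6) to (10) are not reproduced, and ranking nodes by weighted out-degree is a hypothetical stand-in for the paper's fault severity index. All function names are illustrative.

```python
def filter_matrix(C, theta):
    """Keep entries above theta (the labeling condition); zero the rest to get C'."""
    return [[c if c > theta else 0.0 for c in row] for row in C]

def build_graph(C_prime, names):
    """G = (V, E, W): a directed edge m -> n weighted by c'_{m,n} for each non-zero entry."""
    edges = {}
    for m, row in enumerate(C_prime):
        for n, w in enumerate(row):
            if m != n and w != 0:
                edges[(names[m], names[n])] = w
    return list(names), edges

def rank_components(nodes, edges):
    """Sort nodes by weighted out-degree, descending (a stand-in severity index)."""
    score = {v: 0.0 for v in nodes}
    for (src, _dst), w in edges.items():
        score[src] += w
    return sorted(nodes, key=lambda v: (-score[v], v))
```

For example, with three components A, B, C and \(\Theta = 0.25\), only the entries 0.4 (A to B), 0.7 (B to C), and 0.3 (C to A) survive filtering, and B is ranked first.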

In Fig. 4, the time series is first divided into normal segments and segments in which normal operation and alarms coexist. The fault degree of each component is then evaluated by constructing the information difference matrix. The information correlation matrices *A* (normal period) and *B* (normal-alarm period) are computed with an information entropy calculator or a mutual information transfer entropy calculator, respectively. Comparing the matrices of the two periods identifies the component pairs whose information differs. These components are sorted by the fault degree index, and potential faulty components are screened out. Finally, the information difference graph model constructs the information transfer relationship graph between system components and analyzes the information differences between them to identify the faulty components. The off-diagonal elements of matrix *B* are normalized with respect to matrix *A*, the diagonal elements of matrix *C* are normalized with respect to matrix *B*, and the information difference matrix *C* is thereby established. Its calculation is shown in formula (11).
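To make the comparison of the two periods concrete, the sketch below estimates a pairwise mutual information matrix for each period from discretized component series (the diagonal reduces to the entropy \(H(X_i)\), since \(I(X;X) = H(X)\)) and takes the element-wise absolute change. This is a simplification under stated assumptions: it uses plug-in mutual information only (no transfer entropy), and \(|B - A|\) is a hypothetical stand-in for formula (11), whose normalization is not reproduced here.

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def info_matrix(series):
    """Pairwise MI matrix over components; the diagonal equals each H(X_i)."""
    k = len(series)
    return [[mutual_information(series[i], series[j]) for j in range(k)]
            for i in range(k)]

def difference_matrix(normal, alarm):
    """Hypothetical stand-in for formula (11): element-wise |B - A|."""
    A, B = info_matrix(normal), info_matrix(alarm)
    return [[abs(b - a) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]
```

In the toy case where two components move in lockstep during the normal period but become independent during the alarm period, the off-diagonal entries of the difference matrix jump from 0 to 1 bit, flagging that component pair.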

Next, each matrix element is checked against the set threshold. If the threshold is not exceeded, the self-information change rate is determined from the node values in the information difference graph model. The components are then ranked by fault severity in descending order, and the components with the highest fault severity are extracted for backtracking of component interactions, which yields the fault tracing result.

For parameter selection, the study adopts an optimization strategy based on actual application scenarios. Experimental analysis shows that the chosen parameter values reflect the information differences among system components. Compared with other models in the literature, the proposed method can dynamically adjust the threshold according to the actual data and thus adapt to different systems and fault types. In addition, the information difference graph model captures the information differences between components and visually presents their interactions. Finally, the fault degree is ranked using both the self-information change rate and the mutual information transfer entropy, which further improves diagnostic efficiency.

### Fault ranking learning method based on ensemble learning forest algorithm and LambdaMART Model

After the fault tracing method based on the information difference graph model and DPHS-MDS was established, the study found that once suspicious fault components are identified, they must be ranked and the fault location further investigated to improve tracing accuracy. Given the complex features of the power grid dispatch system network, fault path inference can effectively improve the efficiency of fault diagnosis (Abd-El Wahab et al. 2024). For fault ranking problems in practical applications, the LambdaMART algorithm based on an ensemble learning forest algorithm is effective (Jiao et al. 2022). The process of the LambdaMART model initialized by the ensemble learning forest algorithm, namely the ERFLambdaMART model, is shown in Fig. 5.

In Fig. 5, the dynamic process recorded in the D5000 system log is first decomposed into normal, abnormal, and recovery segments. Features are selected from the abnormal and recovery segment sequences, and the sequence set queue is obtained in time order. The ensemble learning forest algorithm then trains the base model, and the number of trees is compared with *m*, the number of training samples. If the number of trees is greater than *m*, the process ends; otherwise, \({\lambda _i}\) and \({w_i}\) are calculated, followed by the output values of the corresponding regression tree leaf nodes. Each tree is added to the ensemble, the model and its metric scores are updated on the training set sample points, and the update is checked on the validation set. The algorithm then checks whether the metric has improved within the last *k* consecutive rounds. If it has not, the process ends; if it has, the process returns to the first condition. In the training stage, the ensemble learning forest algorithm serves as the base model builder: it constructs multiple decision trees from randomly selected features and samples, and ensembling improves the stability and generalization ability of the model. The preliminary base model is then refined with LambdaMART, which computes the gradient of the ranking loss for each sample and adjusts the decision trees accordingly, so that the model performs better on the ranking task. The overall process of the fault ranking learning method based on the ERFLambdaMART model is shown in Fig. 6.
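The \({\lambda _i}\) and \({w_i}\) computed in each boosting round can be illustrated with the widely used LambdaMART formulation (NDCG-weighted pairwise gradients and their second-order weights), shown below for a single query. This is a generic sketch, not the ERFLambdaMART code: the forest-based initialization, tree fitting, and early-stopping loop are omitted, and `sigma`, `lambdas_and_weights`, and `leaf_output` are illustrative names. A leaf's output is the Newton step \(\sum \lambda_i / \sum w_i\) over the documents it contains.

```python
import math

def lambdas_and_weights(scores, rels, sigma=1.0):
    """Per-document lambda_i and w_i for one query under an NDCG-weighted pairwise loss."""
    n = len(scores)
    # Current ranking positions (0-based) induced by the model scores.
    order = sorted(range(n), key=lambda d: -scores[d])
    rank = {d: r for r, d in enumerate(order)}
    # Ideal DCG, used to normalize the NDCG change of each swap.
    ideal = sorted(rels, reverse=True)
    idcg = sum((2 ** r - 1) / math.log2(p + 2) for p, r in enumerate(ideal)) or 1.0
    lam, w = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            if rels[i] <= rels[j]:
                continue  # only pairs in which i should be ranked above j
            # |Delta NDCG| if documents i and j swapped their current positions
            delta = abs((2 ** rels[i] - 2 ** rels[j])
                        * (1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2))) / idcg
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * delta * rho   # push i up
            lam[j] -= sigma * delta * rho   # push j down
            hess = sigma * sigma * delta * rho * (1.0 - rho)
            w[i] += hess
            w[j] += hess
    return lam, w

def leaf_output(lam, w, idx):
    """Newton-step value of a regression-tree leaf containing documents idx."""
    den = sum(w[d] for d in idx)
    return sum(lam[d] for d in idx) / den if den > 0 else 0.0
```

Weighting each pair by \(|\Delta NDCG|\) is what makes the gradients focus on swaps that change the top of the list, which is the property the fault association recommendation list relies on.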

In Fig. 6, the D5000 system logs are dynamically decomposed into three time series: normal, recovery, and abnormal. The normal time series is retained, while the analysis concentrates on the abnormal and recovery time series (Rosati et al. 2023). Feature selection then determines the time-varying sequence set queue. The ERFLambdaMART model is built on this queue, and the relevance function is optimized through repeated iterations. The model then fits and ranks according to the relevant information in the system logs (Zhu et al. 2022a, 2022b). The ERFLambdaMART model ranks the test set, yielding a fault association recommendation list that is unaffected by subjective factors; redundant components are removed during backtracking in the information difference graph model, which completes fault path inference. Once the fault ranking learning model is built, suitable evaluation metrics must be chosen. Normalized discounted cumulative gain (NDCG) and mean average precision (MAP) are two important metrics for evaluating ranking models. NDCG normalizes the discounted cumulative gain according to both the relevance and the position of the items in the ranking, whereas MAP measures the average precision at the positions of the relevant items. A larger MAP value indicates higher overall precision of the ranking model at the positions of the relevant items. Considering NDCG and MAP together gives a comprehensive evaluation of the ERFLambdaMART model's fault association recommendation list. The discounted cumulative gain (DCG) is calculated as in formula (12).

In formula (12), \(DCG\) denotes the discounted cumulative gain, *i* the position of an element, and *n* the length of the recommendation list considered. \({r_i}\) is an intermediate variable whose value is given in formula (13).

In formula (13), \({r_i}\) equals 1 if the element at position *i* is a hit, and 0 otherwise. With items ranked from high to low by score, the NDCG is the DCG normalized by its ideal value and averaged over all users, as in formula (14).

In formula (14), \(NDCG\) denotes the normalized DCG value, *n* the number of users, and \(IDCG\) the DCG of the ideal ranking, in which items are sorted by true relevance. Next, the average precision is calculated as in formula (15).

In formula (15), \({y_{i,j}}\) denotes the label of the corresponding time series, *i* and *j* denote node positions, and \(P(j)\) denotes the precision of the ranking list up to position *j* with respect to the time series labels. The mean average precision is given in formula (16).

In formula (16), *u* denotes a user, \(\left| U \right|\) the total number of users, and \(MAP\) the mean average precision. A larger MAP value indicates better overall ranking performance of the ERFLambdaMART model.
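Formulas (12) to (16) are not reproduced in this text, so the sketch below uses the common definitions they describe: binary gains \({r_i}\), a \(\log_2(i+1)\) position discount, IDCG from an ideal ranking, AP as the mean of \(P(j)\) over hit positions, and averaging over users. Details such as the discount base and whether AP is divided by the number of hits retrieved or by all relevant items vary between implementations and are assumptions here; the function names are illustrative.

```python
import math

def dcg_at_n(hits, n):
    """DCG@n with binary gains r_i (1 = hit) and discount log2(i + 1), i = 1..n."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(hits[:n], start=1))

def ndcg_at_n(hits, n, n_relevant):
    """DCG@n divided by IDCG@n, the DCG when every relevant item is ranked first."""
    idcg = dcg_at_n([1] * min(n, n_relevant), n)
    return dcg_at_n(hits, n) / idcg if idcg > 0 else 0.0

def mean_ndcg(users, n):
    """Average NDCG@n over users; each user is (hit list, number of relevant items)."""
    return sum(ndcg_at_n(hits, n, rel) for hits, rel in users) / len(users)

def average_precision(hits):
    """Mean of P(j) over hit positions j, where P(j) is the precision of the top j."""
    n_hit, total = 0, 0.0
    for j, r in enumerate(hits, start=1):
        if r:
            n_hit += 1
            total += n_hit / j
    return total / n_hit if n_hit else 0.0

def mean_average_precision(lists):
    """MAP: average AP over all users (|U| lists)."""
    return sum(average_precision(h) for h in lists) / len(lists)
```

For instance, the hit list `[1, 0, 1]` gives DCG@3 = 1 + 0.5 = 1.5 and AP = (1 + 2/3) / 2.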

In addition to the ERFLambdaMART model, several alternative methods were considered for building the fault association recommendation list. RF-based ranking algorithms, for example, perform well on large-scale data sets but are less stable than the ERFLambdaMART model on time series data. Gradient boosting decision tree (GBDT) ranking models have advantages in feature combination, but they are less efficient and generalize less well than the ERFLambdaMART model on system log data with complex associations.

## Results

### Analysis of simulation comparison results of imbalanced ensemble classification methods

To test the performance of the proposed imbalanced ensemble classification method, it was compared with widely used combination methods that perform well on imbalanced data sets: hard voting, stacking, and MDS. The comparison results are shown in Fig. 7.

In Fig. 7 (a), five databases were used: wisconsin, vehicle1, ecoli2, yeast3, and glass4. The F-measure values of the DPHS-MDS model on these data sets were 0.964, 0.835, 0.789, 0.676, and 0.589, respectively. In Fig. 7 (b), the corresponding G-mean values of the DPHS-MDS model were 0.685, 0.885, 0.775, 0.921, and 0.774. These results indicate that the DPHS-MDS model has clear advantages in processing imbalanced data sets, especially in selecting the initial classifiers. Through MDS, the DPHS-MDS model effectively identifies and balances the importance of the different classes, which improves classification performance. Because of its randomness, hard voting could not guarantee the stability and precision of the classifier. Although stacking could combine the strengths of different classifiers to some extent, its performance was still limited by the choice and combination strategy of the base classifiers.
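For reference, the two imbalance-aware metrics reported above can be computed from a binary confusion matrix using their usual definitions: the F-measure is the harmonic mean of minority-class precision and recall, and the G-mean is the geometric mean of recall (sensitivity) and specificity. The function name is illustrative, and treating label 1 as the minority class is an assumption.

```python
import math

def f_measure_g_mean(y_true, y_pred, minority=1):
    """Return (F-measure, G-mean) with the minority class as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g = math.sqrt(recall * specificity)
    return f, g
```

Unlike plain accuracy, both metrics drop sharply when the minority class is ignored, which is why they are the usual choice for imbalanced benchmarks such as these.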

Given the stable and strong results of the DPHS-MDS model on the 5 data sets, it outperformed hard voting and stacking in average performance. Based on the actual working conditions of the D5000 system business, three common business anomalies, namely data jumps, application network disconnections, and non-refreshing telemetry tables, were selected to assess the feasibility and effectiveness of the DPHS-MDS model. Normal data outnumbered abnormal data, so the data could be divided into majority and minority classes. RUSBoost and BRAF were chosen for comparison because both have advantages in handling imbalanced data sets. The experimental results are shown in Fig. 8.

In Fig. 8 (a), the F-measure values of the DPHS-MDS model on the three anomaly types, application network disconnection, data jump, and telemetry failure to refresh, were 0.981, 0.984, and 0.986, respectively, clearly higher than those of the other methods. In Fig. 8 (b), the corresponding G-mean values were 0.959, 0.972, and 0.986, also clearly higher than those of the other two methods. In conclusion, the DPHS-MDS model, as an imbalanced ensemble classification method, performed excellently in detecting system service anomalies.

The DPHS-MDS model may incur high computational complexity and resource consumption, so more efficient matrix decomposition techniques or parallel computing could be considered. Its performance also depends heavily on appropriate parameter settings; future work will consider Bayesian optimization to find the best parameter combination automatically.

### Analysis of simulation comparison results of fault tracing methods for system components

For convenience, the system component fault tracing method based on the information difference graph model is referred to as MBIRank. To verify its effectiveness, MBIRank was compared on a simulated data set with three common fault tracing methods: mRank, gRank, and RwERank. mRank traces faults based on module importance, gRank is based on a graph model, and RwERank is based on random walks. The experimental results are shown in Fig. 9.

K ranged from 0 to 50. As shown in Fig. 9 (a), the recall of MBIRank increased with K, remained the highest of all methods throughout, and reached 0.27 at K = 50. As shown in Fig. 9 (b), the NDCG of MBIRank decreased as K increased but was always the highest, reaching 0.19 at K = 50. As shown in Fig. 9 (c), the false alarm rate of MBIRank increased with K but remained the lowest, reaching 0.83 at K = 50. MBIRank therefore showed clear advantages in fault tracing of system components on the three key indicators of recall, NDCG, and false alarm rate. However, MBIRank relies heavily on high-quality simulation data sets in practical applications. Future research could improve the computational efficiency of the algorithm and develop more efficient data acquisition and preprocessing techniques.
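The recall and false alarm rate curves in Fig. 9 can be computed from a ranked component list once the true faulty components are known. The definitions below are assumptions, since the text gives no formulas: recall@K is the fraction of true faulty components found in the top K, and the false alarm rate@K is the fraction of the top K that are not faulty. Under these definitions recall cannot decrease as K grows, consistent with the trend described. The function names are illustrative.

```python
def recall_at_k(ranked, faulty, k):
    """Fraction of the true faulty components that appear in the top-k list."""
    if not faulty:
        return 0.0
    return sum(1 for c in ranked[:k] if c in faulty) / len(faulty)

def false_alarm_rate_at_k(ranked, faulty, k):
    """Fraction of the top-k recommended components that are not actually faulty."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c not in faulty) / len(top)
```

Sweeping `k` from 1 to 50 over a method's ranked list yields one point per K on each curve.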

Next, to assess the effectiveness of MBIRank for fault tracing on the power grid dispatch system data set, 50 component-level fault ranking results were extracted as the reference, and MBIRank was compared with the three common fault tracing methods, as shown in Fig. 10.

In Fig. 10 (a), MBIRank led throughout the range K ∈ [0, 50], and its recall reached 0.59 at K = 50. In Fig. 10 (b), the NDCG of MBIRank was 0.77 at K = 10 and 0.80 at K = 50. In Fig. 10 (c), MBIRank had the lowest false alarm rate for K below 10, and its false alarm rate reached 0.05 at K = 50. Overall, the performance of MBIRank on the key metrics of the power grid dispatch system data set demonstrates its strength in actual fault tracing.

By optimizing the ranking algorithm, the MBIRank method can improve the accuracy and efficiency of fault detection, providing more accurate fault localization and faster response for the power grid dispatching control system. Future studies may extend MBIRank to fault tracing across the wider power grid, for example to fault prediction and prevention through real-time data stream analysis, to further improve the stability and reliability of the grid.

In practical applications, the interpretability of the MBIRank method is also an important consideration. To increase users' trust in the fault tracing results, visualization tools could be developed to present the tracing process and its results intuitively. Fault tracing also has important application value in other fields such as intelligent transportation systems, industrial automation and medical diagnosis, and the MBIRank method could be adjusted and optimized to meet the specific needs of these fields.

### Simulation comparison results analysis of fault ranking learning methods

To assess the effectiveness of the ERFLambdaMART model, simulation experiments were conducted on three models: ERFLambdaMART, the RF-based LambdaMART model (RFLambdaMART), and the traditional LambdaMART model. To prevent the number of iterations from biasing the comparison, the models were first trained on the Microsoft Learning to Rank dataset and the Learning to Rank for Information Retrieval (LETOR) dataset, and the most appropriate number of iterations was determined from its effect on NDCG@10 and MAP@10, as shown in Fig. 11.
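MAP@10 is used here as an iteration-selection criterion but is not defined in this part of the text. A minimal sketch of one common definition follows, assuming binary relevance (for graded datasets such as Microsoft Learning to Rank or LETOR, a label above 0 would be treated as relevant) and assuming AP@k is normalised by min(number of relevant items, k); other normalisations exist, so the paper's exact variant may differ. The function names are illustrative.

```python
from typing import Dict, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@k with binary relevance, normalised by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i  # precision at this cut-off
    return precision_sum / min(len(relevant), k)


def map_at_k(runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean of AP@k over all queries that have a ranked list."""
    if not runs:
        return 0.0
    total = sum(average_precision_at_k(runs[q], qrels.get(q, set()), k) for q in runs)
    return total / len(runs)


# Hypothetical two-query example
runs = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y"]}
qrels = {"q1": {"a", "c"}, "q2": {"y"}}
print(map_at_k(runs, qrels, k=10))
```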

On the Microsoft Learning to Rank dataset, NDCG@10 stabilized after 250 iterations (Fig. 11 (a)) and MAP@10 was essentially stable after 150 iterations (Fig. 11 (b)). On the LETOR dataset, NDCG@10 remained essentially unchanged after 200 iterations (Fig. 11 (c)) and MAP@10 generally stabilized after 100 iterations (Fig. 11 (d)). The number of iterations was therefore set to 250, the largest of these four stabilization points, and the models were compared by simulation, as shown in Fig. 12.
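The text does not say how the stabilization points were read from Fig. 11. As an illustration, the sketch below takes the first iteration count after which a metric stays within a tolerance band `tol` (an assumed parameter), and then picks the largest such point across all curves so that every metric has settled. Applied to the four points reported above (250, 150, 200 and 100), the second step gives 250. The curve values in the usage line are hypothetical.

```python
from typing import Sequence


def stabilisation_point(iterations: Sequence[int], scores: Sequence[float],
                        tol: float = 1e-3) -> int:
    """Smallest iteration count after which the metric varies by at most tol."""
    if not iterations or len(iterations) != len(scores):
        raise ValueError("iterations and scores must be non-empty and of equal length")
    for start in range(len(scores)):
        tail = scores[start:]
        if max(tail) - min(tail) <= tol:
            return iterations[start]
    return iterations[-1]  # fallback, only reached when tol is negative


def common_iteration_budget(points: Sequence[int]) -> int:
    """Use the latest stabilisation point so that every curve has settled."""
    return max(points)


# Hypothetical NDCG@10 curve sampled every 50 iterations
print(stabilisation_point([50, 100, 150, 200, 250, 300],
                          [0.60, 0.70, 0.745, 0.7495, 0.7500, 0.7502]))
print(common_iteration_budget([250, 150, 200, 100]))
```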

In Fig. 12 (a), the five NDCG@10 values of the ERFLambdaMART model were 0.756, 0.750, 0.752, 0.754 and 0.751, and the corresponding MAP@10 values were 0.186, 0.187, 0.1865, 0.185 and 0.1865. In Fig. 12 (b), the training iteration time was mostly within 3 s for ERFLambdaMART, between 3 s and 10 s for RFLambdaMART, and between 14 s and 23 s for LambdaMART. Across these evaluation indicators, the ERFLambdaMART model speeds up decision tree generation while performing the ranking task well, and it has high application value in real scenarios.
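A quick check of run-to-run stability can be made from the five values listed for Fig. 12 (a). The snippet below computes their mean and spread (max minus min) with the standard `statistics` module; it uses only the numbers given in the text, and the helper name is illustrative.

```python
from statistics import mean

# Per-run values as listed in the text for Fig. 12 (a)
ndcg_at_10 = [0.756, 0.750, 0.752, 0.754, 0.751]
map_at_10 = [0.186, 0.187, 0.1865, 0.185, 0.1865]


def summarise(values):
    """Return (mean, spread), where spread is max - min."""
    return mean(values), max(values) - min(values)


ndcg_mean, ndcg_spread = summarise(ndcg_at_10)
map_mean, map_spread = summarise(map_at_10)
print(f"NDCG@10 mean={ndcg_mean:.4f}, spread={ndcg_spread:.4f}")  # mean=0.7526, spread=0.0060
print(f"MAP@10  mean={map_mean:.4f}, spread={map_spread:.4f}")    # mean=0.1862, spread=0.0020
```

The spreads of 0.006 (NDCG@10) and 0.002 (MAP@10) indicate that the model's ranking quality varied little across the five runs.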

To further test the performance of the proposed fault tracing method for the power grid dispatching control system (method 1), it was compared with the fault tracing methods of Mahmood et al. (2022) (method 2), Chen et al. (2022) (method 3) and Yang et al. (2022) (method 4). All four methods were applied to fault tracing of the power grid dispatching control systems in four different areas of region A. The comparison indicators were fault detection rate, average fault location time and system recovery time. The experimental results are shown in Table 1.
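The text does not specify how the three Table 1 indicators were aggregated. A minimal sketch follows, assuming each test fault is logged as one record with a detection flag, a location time and a recovery time; the record layout, field names and function name are hypothetical, and the toy numbers in the test are not the paper's data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class FaultIncident:
    """One recorded fault (hypothetical record layout)."""
    detected: bool
    locate_minutes: Optional[float]   # alarm to located root cause; None if not detected
    recover_minutes: Optional[float]  # time until the whole system is restored


def table1_indicators(incidents: List[FaultIncident]) -> dict:
    """Fault detection rate (%), mean fault location time and mean recovery time (min)."""
    if not incidents:
        raise ValueError("no incidents")
    detected = [x for x in incidents if x.detected]
    locate = [x.locate_minutes for x in detected if x.locate_minutes is not None]
    recover = [x.recover_minutes for x in incidents if x.recover_minutes is not None]
    return {
        "detection_rate_pct": 100.0 * len(detected) / len(incidents),
        "mean_locate_min": mean(locate) if locate else float("nan"),
        "mean_recover_min": mean(recover) if recover else float("nan"),
    }
```

Averaging location time only over detected faults is a design choice; an undetected fault has no meaningful location time, so it lowers the detection rate instead.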

As shown in Table 1, the fault detection rate of method 1 reached 98.52%, much higher than those of the other three methods, indicating that method 1 identifies faults in the power grid dispatching control system with high accuracy. Its average fault location time was only 12.50 min, showing that it can quickly locate the fault source and support rapid repair. Method 1 also restored the entire system in 35.04 min, significantly faster than the other methods.

As an advanced learning-to-rank algorithm, the ERFLambdaMART model has great application potential in power grid dispatching systems. It could be used to analyze grid operation data in real time, quickly identifying potential faults and anomalies to enable early warning and fault prevention. It could also be used for grid load forecasting, providing accurate load predictions that help optimize dispatching strategies.

Training the ERFLambdaMART model requires a large amount of high-quality data from the grid dispatching system, so data preprocessing mechanisms should be applied to ensure data accuracy and completeness. Distributed computing and parallel processing techniques can be used to speed up model computation. Integrating the ERFLambdaMART model into existing power grid dispatching systems may raise technical and system compatibility issues, so a detailed system requirements analysis and technical evaluation are needed to select appropriate interfaces and protocols.

## Conclusion

To improve the fault tracing efficiency of intelligent power grid dispatching control systems, a fault tracing method for power grid dispatching systems based on the DPHS-MDS and LambdaMART algorithms was proposed. Based on DPHS and the MDS method, an imbalanced ensemble classification method was proposed for anomaly detection. To address the complex component structure and diverse operational relationships of power dispatching systems, a component-level fault tracing method based on an information difference graph model was proposed.

The experimental results showed that the F-measure and G-mean of the DPHS-MDS model were 0.964 and 0.685 on the Wisconsin dataset, 0.835 and 0.685 on the vehicle1 dataset, and 0.789 and 0.775 on the ecoli2 dataset. The research team also conducted field tests in a regional power grid dispatching control system, which showed that the DPHS-MDS model could effectively identify and locate grid faults in practice, shortening fault handling time and improving the efficiency and reliability of grid dispatching. The NDCG, recall and false alarm rate of the proposed MBIRank algorithm were 0.92, 0.88 and 0.05, respectively, showing good ranking accuracy and fault detection. The ERFLambdaMART model performed excellently on both NDCG@10 and MAP@10, with values of 0.752 and 0.186, respectively. The fault detection rate and fault location time of method 1 were 98.52% and 12.50 min, respectively, significantly better than those of the other methods. The proposed method can effectively address limitations of existing research, including long model training times, the difficulty of organically integrating multiple techniques, and dependence on prior knowledge or specific assumptions.
Overall, the proposed method can effectively improve the fault tracing efficiency and overall performance of power grid dispatching systems. Although the DPHS-MDS model performed well in the experiments, the complexity and diversity of power grid dispatching systems still need to be considered in practical applications. Because grid structures and operating environments differ across regions, the model needs further adjustment and optimization to meet the needs of different dispatching systems.
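The F-measure and G-mean reported for DPHS-MDS are standard metrics for imbalanced classification. As a reminder of how they are typically computed, the sketch below derives both from a binary confusion matrix with the minority (anomalous) class treated as positive. The function name and the counts in the test are illustrative, and the paper may use a different F-measure variant (for example a weighted one).

```python
import math
from typing import Tuple


def f_measure_and_g_mean(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """F-measure (F1 on the positive class) and G-mean = sqrt(TPR * TNR)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean
```

Because G-mean multiplies the true positive and true negative rates, it drops sharply when a classifier ignores the minority class, even if overall accuracy stays high.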

## Data availability

The data are within the text.

## References

Abd-El Wahab M, Kamel A, Hassan SH, Domínguez-García M, Nasrat JL (2024) Jaya-AEO: an innovative hybrid optimizer for reactive power dispatch optimization in power systems. Electr Power Compon Syst 52(4):509–531

Bebars AD, Eladl AA, Abdulsalam GM, Badran EA (2022) Internal electrical fault detection techniques in DFIG-based wind turbines: a review. Prot Control Mod Power Syst 7(2):1–22

Chen H, Li L, Shang C, Huang B (2022) Fault detection for nonlinear dynamic systems with consideration of modeling errors: a data-driven approach. IEEE Trans Cybern 53(7):4259–4269

Cui D, Ge W, Zhao W, Jiang F, Zhang Y (2022) Economic low-carbon clean dispatching of power system containing P2G considering the comprehensive influence of multi-price factor. J Electr Eng Technol 17(1):155–166

Dutta A, McKay ME, Kopsaftopoulos F, Gandhi F (2022) Multicopter fault detection and identification via data-driven statistical learning methods. AIAA J 60(1):160–175

Ghasemi M, Akbari E, Faraji Davoudkhani I, Rahimnejad A, Asadpoor MB, Gadsden SA (2022) Application of Coulomb’s and Franklin’s laws algorithm to solve large-scale optimal reactive power dispatch problems. Soft Comput 26(24):13899–13923

He Q, Li X, Sun Y (2023) Company ranking prediction based on network big data. IETE J Res 69(9):6176–6187

Huang G, Wu F, Guo C (2022) Smart grid dispatch powered by deep learning: a survey. Front Inform Technol Electron Eng 23(5):763–776

Ibrahim OAS, Younis EMG (2023) Combining variable neighborhood with gradient ascent for learning to rank problem. Neural Comput Appl 35(17):12599–12610

Jiao Z, Yin Y, Ran L, Gao Z (2022) Integrating vehicle-to-grid contract design with power dispatching optimisation: managerial insights, and carbon footprints mitigation. Int J Prod Res 60(17):5354–5379

Le J, Zhao L, Zhou Q, Liao X (2022) Comprehensive performance optimisation method of the hybrid simulation platform of an MMC-HVDC system. Int J Electron 109(11):1915–1934

Mahmood T, Li J, Pei Y, Akhtar F, Butt SA, Ditta A, Qureshi S (2022) An intelligent fault detection approach based on reinforcement learning system in wireless sensor network. J Supercomput 78(3):3646–3675

Mansoor A, Diao X, Smidts C (2023) A method for backward failure propagation in conceptual system design. Nucl Sci Eng 197(11):2751–2777

Rosati R, Romeo L, Cecchini G, Tonetto F, Viti P, Mancini A, Frontoni E (2023) From knowledge-based to big data analytic model: a novel IoT and machine learning based decision support system for predictive maintenance in industry 4.0. J Intell Manuf 34(1):107–121

Shirzadi N, Nasiri F, El-Bayeh C, Eicker U (2022) Optimal dispatching of renewable energy-based urban microgrids using a deep learning approach for electrical load and wind power forecasting. Int J Energy Res 46(3):3173–3188

Siu JY, Kumar N, Panda SK (2022) Command authentication using multiagent system for attacks on the economic dispatch problem. IEEE Trans Ind Appl 58(4):4381–4393

Tang H, Lv K, Bak-Jensen B, Pillai JR, Wang Z (2022) Deep neural network-based hierarchical learning method for dispatch control of multi-regional power grid. Neural Comput Appl 34(7):5063–5079

Tian M, Dong Z, Gong L, Wang X (2022) Coordinated repair crew dispatch problem for Cyber–Physical distribution system. IEEE Trans Smart Grid 14(3):2288–2300

Wang D, Peng D, Huang D, Ren L, Yang M, Zhao H (2022a) Research on short-term and mid-long term optimal dispatch of multi‐energy complementary power generation system. IET Renew Power Gener 16(7):1354–1367

Wang B, Zhang P, He Y, Wang X, Zhang X (2022b) Scenario-oriented hybrid particle swarm optimization algorithm for robust economic dispatch of power system with wind power. J Syst Eng Electron 33(5):1143–1150

Wellendorf A, Tichelmann P, Uhl J (2023) Performance analysis of a dynamic test bench based on a linear direct drive. Archives Adv Eng Sci 1(1):55–62

Yang Z, Baraldi P, Zio E (2022) A method for fault detection in multi-component systems based on sparse autoencoder-based deep neural networks. Reliab Eng Syst Saf 220(3):108278–108285

Zhou Z, Weibel R, Huang H (2022) Familiarity-dependent computational modelling of indoor landmark selection for route communication: a ranking approach. Int J Geogr Inf Sci 36(3):514–546

Zhu Q, Tang X, Elahi A (2022a) Automatic clustering based on dynamic parameters harmony search optimization algorithm. Pattern Anal Appl 25(4):693–709

Zhu Y, Zhou Y, Wei W, Zhang L (2022b) Real-time cascading failure risk evaluation with high penetration of renewable energy based on a graph convolutional network. IEEE Trans Power Syst 38(5):4122–4133

## Acknowledgements

Not applicable.

## Funding

No funding was received.

## Author information

### Contributions

Sheng Yang wrote the first draft; Yuan Fu reviewed and edited the final draft; Shengyuan Li provided the resources and analyzed the data. All authors support this submission.

## Ethics declarations

### Ethical approval

Not applicable.

### Consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare no competing interests.

## Additional information

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

## About this article

### Cite this article

Yang, S., Fu, Y. & Li, S. Fault traceability of power grid dispatching system based on DPHS-MDS and LambdaMART algorithm.
*Energy Inform* **7**, 85 (2024). https://doi.org/10.1186/s42162-024-00391-7
