
Integrated model construction for state of charge estimation in electric vehicle lithium batteries


This research addresses the issue of State of Charge (SOC) prediction for electric vehicle batteries by employing a dynamic Kalman neural network model. The model is optimized using a Genetic algorithm to adjust the neural network weights. Additionally, a strategy involving support vector machines for model optimization is proposed. This strategy involves preprocessing the data, selecting appropriate kernel functions for training, and merging prediction results to enhance the stability of the model. Results indicated that the Dynamic Genetic Kalman Neural Network (DGKNN) model achieved the minimum prediction error percentage of only 0.1529% when the correction coefficient was set to 0.7. The DGKNN model consistently exhibited the lowest error percentage, average absolute error, mean square error, and root mean square error when handling small, medium, and large datasets. For instance, in the small dataset, the error percentage was only 0.1518, and the root mean square error was only 0.0604. The research findings demonstrated that the proposed model exhibited high real-time accuracy in predicting battery SOC, enabling real-time monitoring of battery operating parameters. The method proposed in this study can accurately predict the state of battery charge, extend the life of battery packs, and improve the performance of electric vehicles. It has important significance for promoting the development of the electric vehicle industry.


Electric vehicles play a crucial role in the global eco-friendly market, emerging as the preferred mode of sustainable and environmentally friendly transportation. During operation, the remaining electric charge in the lithium batteries of electric vehicles, known as the State of Charge (SOC), significantly influences the performance and user experience (Venkitaraman and Kosuru 2022; Mastoi et al. 2022; Anwar et al. 2022). Accurately and rapidly estimating the SOC of electric vehicle lithium batteries is pivotal for promoting the use and adoption of electric vehicles. However, due to complex nonlinear factors such as discharge properties, processes, and lifespan under different environmental and load conditions, existing efficient models face limitations (Cui et al. 2022a, b; Xu et al. 2022a). Current SOC estimation methods include voltage-based, resistance-based, model-based, and machine learning-based approaches. Yet, these methods have their respective constraints and inadequacies due to the nonlinearity, complexity, and environmental impact of batteries. Therefore, addressing how to comprehensively consider internal battery states, temperature, charge–discharge status, and other influencing factors by integrating various estimation methods is a crucial topic in the field (Dou et al. 2022; Wu et al. 2023a; Mokayed et al. 2023). This study aims to research and develop such an integrated model, exploring a novel approach to constructing a SOC estimation model. The first section introduces the research objectives, the second section constructs an intelligent SOC prediction model, the third section validates the model’s effectiveness, and the fourth section presents the research conclusions.

In this paper, a Dynamic Genetic Kalman Neural Network (DGKNN) algorithm based on genetic optimization is proposed by combining the Kalman filter method and neural network method in traditional SOC prediction methods. Compared with the existing methods, the contribution of this study is to combine the advantages of the neural network method and Kalman filter method, and use the neural network method to fit the battery parameters. It simplifies the parameter identification process of the Kalman filter. The model is updated with Kalman filtering to provide noise removal capability and robustness to the model. At the same time, the Genetic Algorithm (GA) optimization also greatly reduces the initial error, helps the battery management system monitor the remaining power of the battery, extends the life of the battery, and improves the safety and stability of the battery.

Research backgrounds

In recent years, research in the field of SOC has gradually deepened. To prevent the battery from overcharging and over-discharging, the unscented transform and multiple innovations have been applied to the particle filter, optimizing the particle distribution and updating the state values according to historical information. The resulting unscented particle filter was used to estimate the charging state of the battery, and the impact of parameter changes on SOC estimation was considered and verified. The algorithm accurately estimated real-time SOC changes, with an average SOC error of less than 0.5% (Chen et al. 2024). To better learn and estimate the battery state of charge, Chen et al. used a multi-kernel correlation vector machine and the whale optimization algorithm to estimate the SOC of lithium-ion batteries under different working conditions, and established a corresponding SOC estimation model. The weights and kernel parameters of the multi-kernel correlation vector machine were automatically adjusted and optimized by the whale optimization algorithm, which improved the estimation accuracy. Compared with other optimization algorithms, this algorithm had a better optimization effect and could estimate SOC more accurately (Chen et al. 2023). Liang and the team presented a rule-based approach that achieved rapid SOC balance while respecting power constraints. The results showed that compared to existing methods, the proposed algorithm exhibited superior performance (Liang et al. 2022). The team led by Xu proposed a novel SOC estimation method, emphasizing the importance of accurate SOC not only for user experience but also for preventing overcharging, over-discharging, and ensuring safe usage. Due to numerous issues with neural networks in SOC estimation, they introduced an individual migration dynamic step-size Drosophila algorithm, combining it with neural networks for SOC estimation to enhance accuracy. Experimental results demonstrated that, compared to other algorithms, the proposed algorithm exhibited excellent estimation accuracy, with an average absolute error below 0.8% and a Root Mean Square Error (RMSE) below 1.4% (Xu et al. 2022b).

Deep learning algorithms have also been widely applied. Feng and colleagues developed a novel bandwidth selection method by extracting features from multiple domains to comprehensively describe target harmonic responses. This method was employed for Vold-Kalman filtering to choose reasonable bandwidth. Experimental results demonstrated the effectiveness and superiority of this adaptive Vold-Kalman filtering in the diagnosis of wind turbine planetary gearbox (Feng et al. 2023). Revach G and team designed a Kalman filter that integrated a structural state-space model and a dedicated recurrent neural network module, capable of learning complex dynamic behaviors from data. The results showed that KalmanNet efficiently addressed nonlinear and model matching problems (Revach et al. 2022). Esenogho and collaborators proposed a neural network ensemble algorithm based on feature extraction to enhance efficiency in credit card fraud detection. The findings revealed that their proposed Long Short-Term Memory (LSTM) network ensemble outperformed all other algorithms in classifier performance, achieving high levels of sensitivity and specificity (Esenogho et al. 2022). As'ad and his team introduced a mechanistic artificial neural network approach, preserving the advantages of form invariance learned from data-driven regression, while incorporating the physical rationality of mechanistic models. By reinforcing certain good mathematical properties in the network architecture, the authors ensured that the learning process adhered to physical constraints, enhancing the success rate of numerical simulations. The advantages of this learning method were prominently demonstrated in multiple finite element analysis instances, and the new approach was validated for ensuring the computational feasibility of multiscale applications (As'ad et al. 2022).

In the development of societal intelligence, embedded SOC technology played a crucial role and underwent extensive research. Within this domain, a series of studies focused on state prediction to enhance efficiency and safety. The outcomes of these studies reflected the in-depth development and diversification trends in the field. Building upon previous research, the current study incorporated new insights and improvements. The innovation of the research primarily manifested in the construction of an integrated model that fully considered the physical characteristics of lithium batteries and the challenges encountered in practical applications. By combining deep learning methods and integrating various outstanding algorithms, the model aimed to enhance prediction accuracy and stability, positively contributing to the development of electric vehicles.

SOC prediction model design

Research has focused on the design of a dynamic Kalman neural network that combines the advantages of Kalman filtering and neural networks for predicting the SOC of batteries. However, the model faces challenges related to the extensive data requirements and hardware expenses. To address these issues, a GA is introduced for optimizing the weights of the neural network.

DGKNN model design

The predictive SOC estimation process in this study is based on a Reserve Capacity (RC) battery and system. The model treats a set of batteries, combined through parallel and series connections, as equivalent to a single battery to enhance efficiency, as illustrated in Fig. 1.

Fig. 1 Schematic diagram of parallel connection before series connection

To tackle SOC prediction challenges, the study introduces the DGKNN. Kalman filtering demands precise parameter identification and high accuracy, while neural networks require a large amount of sample data, leading to increased hardware costs. To overcome these issues, a novel structure is proposed, which is the Dynamic Neural Network. This structure integrates Kalman filtering and neural networks. Kalman filtering excels in handling dynamic changes but requires an accurate model for useful predictions. Traditional Kalman filters often handle problems of linear dynamic systems only. To overcome this limitation, neural networks are introduced, leveraging their powerful nonlinear approximation capabilities in Kalman filtering to enhance prediction accuracy, as depicted in Fig. 2.

Fig. 2 Artificial neural network structure

A critical concern in the structure of the dynamic neural network is determining the initial weight values, which directly impact stability and prediction effectiveness. Traditional weight selection methods involve random allocation based on training data or initial assignment according to certain rules. However, these methods do not guarantee model accuracy. To address this, the research uses a GA to adjust the initial weights of the neural network. GA is a random search algorithm based on natural selection and genetic mechanisms that searches globally for good solutions. Its core idea is to pass excellent individuals on to the next generation through heredity, variation, and other operations. Its advantage is that the search requires no problem-specific knowledge: it relies only on an evaluation function, and the process is simple. It is also scalable and easy to combine with other algorithms. Applying GA to weight selection enhances the predictive performance of the model. The nonlinear SOC model is represented by Formula (1).

$$\dot{x} = f\left( x \right) + \sum\limits_{i = 1}^{n} {g\left( {u_{i} } \right)} .$$

In Formula (1), \(x\) represents the SOC, and \(u\) represents the input. Firstly, it is necessary to construct a dynamic neural network model, which consists of an input layer, hidden layers, and an output layer. In the input parameters, multiple factors influencing SOC are considered, including current, temperature, and battery aging. In the hidden layers, specific transformations (such as nonlinear activation functions) are applied to convert the input parameters into new state parameters. Subsequently, the Kalman filtering theory is employed to train the neural network model. \(S\left( {x_{i} } \right)\) is the activation function whose expression is shown in Formula (2).

$$S\left( {x_{i} } \right) = \frac{1}{{1 + e^{{ - x_{i} }} }}.$$

When constructing the error model, the weight set is first established as shown in Formula (3).

$$V = \left[ {V_{1} ,V_{2} , \ldots ,V_{n} } \right].$$

And the input value set is defined by Formula (4).

$$u = \left[ {u_{1} ,u_{2} , \ldots ,u_{n} } \right]^{T} .$$

The error model can then be described by Formula (5).

$$\dot{x} = Ax + W \cdot S\left( {x_{i} } \right) + V \cdot u + ME.$$

In Formula (5), \(ME\) represents the error model, and \(A\) represents the correction constant. After discretization, it becomes Formula (6).

$$x\left( {k + 1} \right) = Ax\left( k \right) + W \cdot S\left( {x_{i} \left( k \right)} \right) + V \cdot u + ME\left( k \right).$$
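As a rough illustration of the discretized model in Formula (6), the following sketch evaluates one state update for a scalar SOC state. The function names `sigmoid` and `next_state`, the scalar treatment of \(W\), and all numeric values are assumptions made for the example; they are not taken from the study.

```python
import math


def sigmoid(x):
    """Activation function S(x) from Formula (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def next_state(x, u, A, W, V, model_error=0.0):
    """One step of Formula (6): x(k+1) = A x(k) + W S(x(k)) + V u + ME(k).

    x is the scalar SOC state, u the list of inputs and V the list of
    input weights (same length as u). W is treated as a scalar here,
    which is a simplification for illustration.
    """
    if len(u) != len(V):
        raise ValueError("u and V must have the same length")
    input_term = sum(v_i * u_i for v_i, u_i in zip(V, u))
    return A * x + W * sigmoid(x) + input_term + model_error


# Illustrative values only (hypothetical, not from the paper).
x0 = 0.44                      # starting SOC
u = [0.5, -0.2, 0.1]           # hypothetical normalized inputs
x1 = next_state(x0, u, A=0.7, W=0.1, V=[0.05, 0.02, 0.01])
```

The input term is the dot product \(V \cdot u\), which is why the weight set in Formula (3) and the input set in Formula (4) must have the same number of entries.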

Finally, the error model is obtained as shown in Formula (7).

$$ME = f\left( x \right) + g\left( u \right) - \left( {Ax + W \cdot S\left( x \right) + V \cdot u} \right).$$

In Formula (7), \(g\left( u \right)\) represents the mapping relationship. The basic assumption of the model is that the system’s state is determined by a known state transition model and an observation model. In the research scenario, the state transition model is the studied neural network model, and the observation model is the actual measured value of SOC. During the model training process, the study needs to update the weights in the neural network model based on the observed SOC values to obtain the best prediction results. The observation model is given by Formula (8).

$$y\left( k \right) = x\left( {k + 1} \right) - Ax\left( k \right) = H\left( k \right)w\left( k \right) + ME\left( k \right).$$

In Formula (8), \(w\left( k \right)\) represents the weight vector, and \(H\left( k \right)\) represents the mapping of \(w\left( k \right)\) to \(y\left( k \right)\). The system update Formula is given by Formula (9).

$$\hat{w}\left( {k + 1} \right) = w\left( k \right) + \zeta \left( k \right).$$

In Formula (9), \(\zeta \left( k \right)\) represents the process noise. The weight error covariance prediction matrix is given by Formula (10).

$$P^{ - } \left( k \right) = \Phi \left( k \right)P^{ - } \left( {k - 1} \right)\Phi^{T} \left( k \right) + Q_{w} .$$

In Formula (10), \(\Phi \left( k \right)\) represents the Jacobian matrix, \(Q_{w}\) represents the variance of the error. The Kalman filtering gain of the model is given by Formula (11).

$$K\left( k \right) = P^{ - } \left( k \right)H^{T} \left( k \right)\left( {H\left( k \right)P^{ - } \left( k \right)H^{T} \left( k \right) + R_{k} } \right)^{ - 1} .$$

In Formula (11), \(P^{ - } \left( k \right)\) represents the error covariance prediction matrix, \(H\left( k \right)\) represents the transpose of the Jacobian matrix, and \(R_{k}\) represents the noise covariance matrix. However, since Kalman filtering requires an accurate model, it is often challenging to find the optimal solution quickly during training. Therefore, the study introduces GA for improvement. The GA is responsible for finding initial weights in the training of the neural network model, effectively enhancing the model’s predictive performance. By designing appropriate fitness functions and selection strategies, the efficiency and convergence of the GA are ensured. This leads to a dynamic Kalman neural network model based on genetic optimization for SOC prediction. The genetic optimization process is illustrated in Fig. 3.
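The Kalman weight update in Formulas (8) to (11) can be sketched as follows for a scalar observation \(y(k)\), in which case the matrix inverse in Formula (11) reduces to a division. Several points are assumptions made for the sketch and are not stated in the text: \(\Phi(k)\) is taken as the identity matrix, \(Q_{w}\) is a scalar added to the diagonal, and the final correction step \(w \leftarrow w + K(y - Hw)\), \(P \leftarrow (I - KH)P^{-}\) is the standard Kalman correction. The function name `kalman_weight_update` is hypothetical.

```python
def kalman_weight_update(w, P, H, y, Q, R):
    """One predict/correct step for the weight vector w (scalar observation).

    w: list of n weights; P: n x n covariance as a list of lists;
    H: list of n observation coefficients (the row vector H(k));
    y: scalar observation y(k); Q: scalar process-noise variance (Q_w);
    R: scalar measurement-noise variance (R_k).
    """
    n = len(w)
    # Prediction, Formula (10), assuming Phi(k) = I.
    P_pred = [[P[i][j] + (Q if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Gain, Formula (11): K = P^- H^T (H P^- H^T + R)^-1.
    PHt = [sum(P_pred[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PHt[i] for i in range(n)) + R
    K = [PHt[i] / S for i in range(n)]
    # Correction (standard Kalman step, assumed; not written out in the text).
    innovation = y - sum(H[i] * w[i] for i in range(n))
    w_new = [w[i] + K[i] * innovation for i in range(n)]
    HP = [sum(H[l] * P_pred[l][j] for l in range(n)) for j in range(n)]
    P_new = [[P_pred[i][j] - K[i] * HP[j] for j in range(n)]
             for i in range(n)]
    return w_new, P_new


# Illustrative call with hypothetical values.
w, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
w, P = kalman_weight_update(w, P, H=[1.0, 0.0], y=2.0, Q=0.0, R=1.0)
```

With equal prior and measurement variances, the gain on the observed weight is 0.5, so the estimate moves halfway toward the observation and its variance is halved.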

Fig. 3 Genetic optimization process

For the optimization of initial weights in this paper, it is assumed that \(N\) DGKNN weight vectors \(w\left( 0 \right) = \{ w_{1} \left( 0 \right),w_{2} \left( 0 \right),... \, ,w_{N} (0)\}\) form the initial population. The algorithm sets the number of terminating iteration steps \(s\). The initial value \(x(0)\) of \(x\) and the initial population \(w\left( 0 \right)\) are substituted into Formula (5) to obtain the next state values \(x\left( 1 \right) = \{ x_{1} \left( 1 \right),x_{2} \left( 1 \right),... \, x_{N} (1)\}\). Let the true SOC value at this time be \(r\left( 1 \right)\); the fitness function \(f_{i} \left( k \right)\) is then given by Formula (12).

$$f_{i} \left( k \right) = 1/\left| {x_{i} \left( k \right) - r\left( k \right)} \right|,i = 1,2, \ldots ,N.$$

In Formula (12), \(r\left( k \right)\) represents the true value, and \(x_{i} \left( k \right)\) represents the state value. The selection operation is performed on the population, and the new generation population will be selected from the previous generation population according to probability. The selection probability of each individual is calculated according to the fitness function. The probability of being selected is expressed as Formula (13).

$$p_{i} \left( k \right) = f_{i} \left( k \right)/\sum\limits_{j = 1}^{N} {f_{j} } \left( k \right),i = 1,2, \ldots ,N.$$

In Formula (13), \(i\) and \(j\) index different individuals. A number \(R\) is then randomly selected from [0,1]. If \(R < p_{i} \left( k \right),i = 1,2, \cdots ,N\), the \(i\)th individual is selected; if \(R < p_{i + 1} \left( k \right)\), the \(\left( {i + 1} \right)\)th individual is selected, and so on, until all individuals have been traversed. Next, the crossover operation is applied to the population. Let the crossover probability be \(\omega_{c}\), and let a real number \(R\) be randomly generated in the interval [0,1]. If \(R < \omega_{c}\), two individuals \(\left\{ {w_{m} \left( k \right),w_{n} \left( k \right)} \right\}\) are randomly selected from the population and crossed according to Formula (14).

$$\left\{ {\begin{array}{*{20}c} {\tilde{w}_{m} \left( k \right) = \alpha w_{m} \left( k \right) + \left( {1 - \alpha } \right)w_{n} \left( k \right)} \\ {\tilde{w}_{n} \left( k \right) = \alpha w_{n} \left( k \right) + \left( {1 - \alpha } \right)w_{m} \left( k \right)} \\ \end{array} } \right..$$

In Formula (14), \(\alpha\) represents a random number in the range [0,1], and \(w_{m} \left( k \right)\) and \(w_{n} \left( k \right)\) represent different individuals. To mutate the population, let the mutation probability be \(\omega_{m}\). A real number \(R\) is randomly generated in the interval [0,1]. If \(R < \omega_{m}\), an individual \(w_{i} \left( k \right)\) is randomly selected from the population and mutated according to Formula (15).

$$\tilde{w}_{i} \left( k \right) = w_{i} \left( k \right) + \eta ,\quad \eta \sim N\left( {0,\Sigma } \right).$$

In Formula (15), \(\eta\) follows a normal distribution, and \(w_{i} \left( k \right)\) represents a randomly selected individual. The specific configuration of the genetic neural network layer is as follows: the total number of iterations is 100, the crossover probability is 0.3, the mutation probability is 0.1, and the population size is 150. In this paper, the structure of the neural network is set as 14-10-1, with a total of 150 weights and 11 thresholds, so the individual coding length of the GA is 161, which is the sum of the number of weights and thresholds. Repeating these steps lets the population evolve continuously, so that the chromosomes of each iteration are better adapted to the environment than those of the previous one. When the preset number of evolutionary steps is reached, the chromosome with the highest fitness among the remaining chromosomes is the sought-after optimal solution, representing the best network weight configuration.
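The selection, crossover, and mutation steps in Formulas (12) to (15) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict` is a hypothetical stand-in for evaluating Formula (5) with a given weight vector, the `eps` term is an added safeguard against division by zero, selection compares \(R\) with cumulative probabilities (the usual roulette-wheel form), the initial weights are drawn from [-1, 1], and \(\Sigma\) is taken as \(\sigma^{2} I\). The defaults mirror the configuration stated above (population 150, 100 iterations, crossover probability 0.3, mutation probability 0.1).

```python
import random

# Coding length for the 14-10-1 network: weights plus thresholds.
CODING_LENGTH = (14 * 10 + 10 * 1) + (10 + 1)  # 150 + 11 = 161


def fitness(x_i, r, eps=1e-12):
    """Formula (12): inverse absolute error (eps is an added safeguard)."""
    return 1.0 / (abs(x_i - r) + eps)


def selection_probabilities(fits):
    """Formula (13): fitness-proportional selection probabilities."""
    total = sum(fits)
    return [f / total for f in fits]


def roulette_select(population, probs, rng):
    """Pick one individual by comparing R with cumulative probabilities."""
    R = rng.random()
    cumulative = 0.0
    for individual, p in zip(population, probs):
        cumulative += p
        if R < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(w_m, w_n, alpha):
    """Formula (14): arithmetic crossover of two weight vectors."""
    child_m = [alpha * a + (1 - alpha) * b for a, b in zip(w_m, w_n)]
    child_n = [alpha * b + (1 - alpha) * a for a, b in zip(w_m, w_n)]
    return child_m, child_n


def mutate(w_i, sigma, rng):
    """Formula (15): add zero-mean Gaussian noise (Sigma = sigma^2 I assumed)."""
    return [w + rng.gauss(0.0, sigma) for w in w_i]


def evolve(predict, r, n_weights, pop_size=150, steps=100,
           p_cross=0.3, p_mut=0.1, sigma=0.1, seed=0):
    """Return the fittest weight vector after a fixed number of steps."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(steps):
        fits = [fitness(predict(w), r) for w in population]
        probs = selection_probabilities(fits)
        population = [list(roulette_select(population, probs, rng))
                      for _ in range(pop_size)]
        if rng.random() < p_cross:
            m, n = rng.sample(range(pop_size), 2)
            population[m], population[n] = crossover(
                population[m], population[n], rng.random())
        if rng.random() < p_mut:
            i = rng.randrange(pop_size)
            population[i] = mutate(population[i], sigma, rng)
    return max(population, key=lambda w: fitness(predict(w), r))
```

A call such as `evolve(lambda w: sum(w), r=0.5, n_weights=3)` uses a toy predictor; in practice `predict` would run the network with the candidate weights and return the predicted state.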

Improvement design for SVM stability in DGKNN model

After optimizing the initial weights of the DGKNN model through GA, significant progress has indeed been achieved. However, this method still has certain limitations, particularly in terms of prediction stability. Therefore, incorporating the principles of Support Vector Machine (SVM), the research plan involves a series of enhancements to the studied model to improve prediction stability. Firstly, the data needs preprocessing before proceeding with SVM training. During this stage, an appropriate kernel function must be selected. If the data can be completely separated by a straight line in a two-dimensional plane, a linear kernel is the best choice. On the other hand, if the data exhibits a circular distribution in a two-dimensional plane, the radial basis function is a better choice. The research employs cross-validation and grid search to select the optimal model and parameters, as depicted in the model fusion architecture shown in Fig. 4.
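For reference, the two kernels mentioned above have simple closed forms: the linear kernel is the dot product of two samples, and the radial basis function (RBF) kernel decays with their squared distance. The sketch below writes both out; the function names and the `gamma` default are illustrative assumptions, and a library implementation would normally be used in practice.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: k(x, y) = x . y."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The width parameter `gamma` is the kind of setting that cross-validation and grid search would tune.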

Fig. 4 Model fusion architecture

After obtaining preprocessed data and training the SVM model, the next step involves merging the prediction results. Specifically, the SVM prediction results and DGKNN model prediction results are integrated to form the final prediction. Different weights can be assigned to the predictions of the two models, and the two results are then combined to obtain the final result. The prediction result fusion is illustrated in Formula (16).

$$SOC\_final = \alpha * SVM\_predict + (1 - \alpha )*DGKNN\_predict.$$

In Formula (16), \(\alpha\) represents the weight in the range [0,1], \(SVM\_predict\) represents the SVM prediction result, and \(DGKNN\_predict\) represents the DGKNN model prediction result. This approach maximizes the predictive capabilities of both SVM and DGKNN models while improving prediction stability through rational weight allocation. The SVM optimization strategy diagram is shown in Fig. 5.
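The weighted fusion in Formula (16) amounts to a convex combination of the two prediction series. The sketch below implements it and adds a hypothetical helper, `pick_alpha`, that chooses \(\alpha\) on a coarse grid by minimizing the mean square error on held-out data. The text does not state how \(\alpha\) is chosen, so that helper is only one plausible approach.

```python
def fuse_predictions(svm_predict, dgknn_predict, alpha):
    """Formula (16): alpha * SVM + (1 - alpha) * DGKNN, element by element."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * d
            for s, d in zip(svm_predict, dgknn_predict)]


def pick_alpha(svm_predict, dgknn_predict, soc_true, steps=10):
    """Grid-search alpha in [0, 1] by validation MSE (assumed criterion)."""
    def mse(alpha):
        fused = fuse_predictions(svm_predict, dgknn_predict, alpha)
        return sum((f - t) ** 2 for f, t in zip(fused, soc_true)) / len(soc_true)

    return min((i / steps for i in range(steps + 1)), key=mse)
```

Setting \(\alpha = 0\) recovers the DGKNN prediction alone and \(\alpha = 1\) the SVM prediction alone, so the fused result always lies between the two.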

Fig. 5 SVM optimization strategy

By incorporating SVM into the research model, the study aims to achieve more robust prediction results. Throughout this process, the research first preprocesses the data, then trains the SVM, generates prediction results, and finally merges these results with the predictions of the DGKNN model. Subsequently, the study continuously optimizes and adjusts the research model in practical use, making it better suited for real-world environments and providing more accurate and robust SOC predictions.

Battery life is limited: prolonged use reduces battery capacity and shortens battery life. In real electric vehicle operation, the discharge rate of the battery changes as the driving speed increases. Therefore, the connection between the discharge rate and the capacity of the battery must be studied in order to better meet the needs of the car. The relationship between capacity and different discharge rates was studied through discharge experiments on high-performance battery testing equipment. The terminal voltage of the battery drops rapidly at the beginning, and the voltage change differs between discharge rates. The rate of voltage drop is positively correlated with the rate of charge and discharge: the smaller the discharge rate, the slower the voltage drop. After the rapid initial drop, the battery enters the voltage plateau, where the terminal voltage changes slowly. The smaller the discharge rate, the longer this stage lasts, and the battery is suitable for working in this stage. Finally, when the battery is about to run out of power, the discharge rate becomes lower, and the performance of the power battery improves. When the terminal voltage of the battery drops sharply to its set minimum value, the battery is not suitable for work at this stage.

Performance evaluation of SOC prediction models

The study conducted an in-depth examination of the performance of SOC prediction models. It evaluated the DGKNN model based on genetic optimization and four other predictive models under the U.S. Environmental Protection Agency (EPA) scenario. EPA working conditions are relatively complex, including urban, high-speed, high-temperature, low-temperature, and intense driving conditions. The final EPA results are obtained by weighting the results of the five working conditions. The performance of these models in handling small, medium, and large datasets was also compared.

DGKNN model evaluation

With the increase of electric vehicles, users’ driving habits have changed, and the acceleration section of actual user conditions was slightly stronger than in the EPA cycle. The reason was that electric vehicles had stronger performance and greater acceleration. The EPA cycle was defined many years ago, and the vehicle data it was mainly collected from also differed from the latest vehicles, so it may differ somewhat from actual tests. However, EPA was still used by automobile companies, indicating that the error was within the acceptable range. In designing the experimental validation for the model, the experiments utilized multiple cycles of the NEDC structure. The vehicle’s maximum operating speed was set at 120 km/h, with an average speed of 36.01 km/h. The entire experimental process took 3074 s, with sampling occurring every 0.1 s, as detailed in Table 1.

Table 1 Experimental settings

As shown in Table 1, the initial SOC value of the battery was set to 0.44. The experiment used a genetic optimization based DGKNN for prediction. The input variables of the model included battery voltage (Vn − 1), battery current, and battery temperature. The output variable of the model was x(it). For the genetic optimization algorithm, the population size was set to 50 and the number of termination iteration steps was 50. A correction constant MI, with a value less than 1, was also set.

The training environment of this experiment was as follows: the CPU was an Intel(R) Core(TM) i9-10920X with 16 GB of memory, and the GPU was a GeForce RTX 2080 Ti with 11 GB of video memory. The operating system was 64-bit Windows with the CUDA 11.2 library installed. The development language was Python with the PyTorch 1.7.1 framework. Figure 6 showed the CPU usage of the proposed model. During the operation of the DGKNN model, the CPU usage did not exceed 50% and averaged 38.21%, indicating low requirements for system hardware configuration. The optimization effect analysis was shown in Fig. 7.

Fig. 6 Computational complexity analysis

Fig. 7 Genetic optimization effect analysis

Figure 7 illustrated that without the application of GA optimization, DGKNN’s initial prediction error was significant. Conversely, when the GA was applied, this error significantly diminished. This observation strongly suggested that GA optimization was crucial in the application of DGKNN. Furthermore, with the optimization of weights through the GA, a substantial improvement was observed. The algorithm significantly reduced the error in the initial predictions, alleviating the stiffness observed in the DGKNN model without optimization. This enhancement boosted model performance, further enhancing the precision and practicality of prediction results. DGKNN prediction effectiveness was detailed in Fig. 8.

Fig. 8 Analysis of DGKNN prediction effect

Figure 8 showed a significant reduction in prediction error when DGKNN was used based on genetic optimization compared to Deep Neural Network (DNN) and Genetic models. This finding provided crucial reference information for research and design. Detailed analysis of prediction errors revealed the superiority of DGKNN after genetic optimization. Genetic optimization not only improved prediction accuracy within stages but also optimized accuracy in multi-stage predictions, reducing error fluctuations. In contrast, DNN and Genetic models exhibited lower fitting to the true value curve, with larger prediction error fluctuations. Genetic optimization-based DGKNN outperformed DNN and Genetic models in prediction errors, emphasizing the importance of genetic optimization in controlling prediction system errors. Prediction errors under different correction coefficients were outlined in Table 2.

Table 2 Prediction error under different correction coefficients

In Table 2, the percentage error ranged from 0.1529 to 0.3307. At a correction coefficient of 0.9, the percentage error reached its maximum value of 0.3307, indicating the maximum deviation of predicted results from actual values. Conversely, when the correction coefficient was 0.7, the percentage error was minimal, only 0.1529, indicating the highest accuracy of the model’s predictions at this correction coefficient. The Mean Absolute Error (MAE) fluctuated between 0.0324 and 0.068. At a correction coefficient of 0.9, the MAE was maximum at 0.068, signifying the largest average absolute error between predicted and actual values. However, at a correction coefficient of 0.7, the MAE was minimal at 0.0324, indicating the smallest average gap between predicted and true values. Regarding the Mean Square Error (MSE), a similar trend was observed, with the range fluctuating from 0.0154 to 0.0607. The maximum value occurred at a correction coefficient of 0.9, while the minimum value was attained at a correction coefficient of 0.7, consistent with the patterns observed in the previous two metrics. Finally, considering the RMSE, a standard measure of prediction error, its values varied from 0.1243 to 0.2464. The maximum RMSE was observed at a correction coefficient of 0.9, indicating larger fluctuations in predicted values. On the other hand, the minimum RMSE occurred at a correction coefficient of 0.7, suggesting relatively accurate prediction results. It is evident that, under a correction coefficient of 0.7, all four error metrics were minimized, indicating the highest accuracy of the model. The reason for the high prediction accuracy of DGKNN was that it took into account the influence of measurement noise and time. In addition, because the internal resistance, capacitance and other parameters of the battery were not easy to measure, it was very difficult to identify the parameters of other models. However, DGKNN adopted a neural network method to fit the parameter relationship inside the battery, which greatly simplified the process of parameter identification and thus reduced the difficulty of modeling. Table 3 showed the difference between the MAE, MSE and RMSE of the DGKNN algorithm, AGA-UKF-EKF algorithm (Wu et al. 2023b) and ACHF algorithm (Serat et al. 2023).

Table 3 Comparison of estimated errors

From the RMSE data in Table 3, the DGKNN algorithm could effectively filter noise and better fit the actual working conditions of the battery. For all three metrics, smaller values indicated that the model had better accuracy and could better predict the data. The average absolute error of the DGKNN algorithm was 0.32%, and the maximum error was 0.84%. Compared with the errors of the other two algorithms, the error of the DGKNN algorithm in estimating the SOC of lithium battery was smaller, indicating that the improved algorithm can estimate the battery capacity in real time and reduce the influence of capacity on the estimated SOC of the battery. Compared with other algorithms, the improved algorithm had higher precision and its results were more reliable.
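The three error metrics compared above follow their standard definitions and can be computed as in the sketch below. The function name `error_metrics` and the sample values are illustrative only. Because RMSE is the square root of MSE, the reported ranges can be cross-checked: for example, \(\sqrt{0.0607} \approx 0.2464\), which matches the maximum RMSE reported for a correction coefficient of 0.9.

```python
import math


def error_metrics(soc_true, soc_pred):
    """Return MAE, MSE and RMSE between true and predicted SOC series."""
    if len(soc_true) != len(soc_pred) or not soc_true:
        raise ValueError("series must be non-empty and of equal length")
    errors = [p - t for t, p in zip(soc_true, soc_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical values for illustration.
metrics = error_metrics([0.44, 0.43, 0.42], [0.45, 0.43, 0.40])
```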

Comprehensive model validation

The study compared the predictive effectiveness of four models for estimating the remaining capacity of lithium-ion batteries. These models included the research design model, Support Vector Regression Model (SVR), Genetic Algorithm-Backpropagation Model (GA-BP), and Particle Swarm Optimization-Adaboost Model (PSO-adaboost). The predictive results were illustrated in Fig. 9.

Fig. 9 Comparison of prediction effect of different models

Fig. 9 compares four models in SOC prediction: the Research Design model, the GA-BP model, the PSO-adaboost model, and the SVR model. The Research Design model performed best, with high accuracy, a well-fitting prediction curve, minimal errors, and stable estimates of the battery state. The GA-BP model ranked second, closely following the Research Design model; despite its slightly higher complexity, it was robust and accurate. The PSO-adaboost model ranked third, and the SVR model performed the least effectively. The main reason for this phenomenon was randomness: the initial weights in DGKNN were random numbers, and although they were optimized by the GA, the results of the GA were themselves random, whereas the parameters of SVR involved no random numbers. In addition, FCDE-SVR required less data than neural networks. Therefore, although the FCDE-SVR algorithm required a period of training and was more complex, it was stable and highly precise. Table 4 presents the error analysis.

Table 4 Comparison of prediction errors of different models

As shown in Table 4, DNN exhibited the highest error percentage, MSE, and RMSE at all three data set levels. In the large data set, for example, its error percentage was 0.271, its MSE was 0.111, and its RMSE was 0.3334, which suggests that DNN predicted these data sets less well than the other models. In contrast, the Research model consistently showed the lowest error percentage, MAE, MSE, and RMSE at all three data set levels. For instance, its error percentage for the small data set was only 0.1518 and its RMSE was only 0.0604, highlighting its superior predictive accuracy. The error metrics of the other three models (SVR, GA-BP, and PSO-adaboost) followed similar trends across the small, medium, and large data sets. In general, the error metrics increased with the data level, possibly because predictive accuracy decreased as data complexity increased. For instance, in the small data set, GA-BP and PSO-adaboost had similar RMSEs, both around 0.07, whereas in the large data set GA-BP’s MSE increased to 0.0768 and PSO-adaboost’s to 0.0793. Overall, the Research model outperformed the other models, indicating that its structure or parameter settings were better suited to predicting this type of data. Table 5 compares the errors of the different models at six temperatures.

Table 5 Comparison of errors of different models at 6 different temperatures

As Table 5 shows, the errors of all algorithms increased significantly at 0 °C: at lower temperatures the battery activity decreased and the internal resistance of the battery increased, so errors accumulated during the estimation process. Compared with the other methods, the proposed model had clear advantages. In particular, its accuracy at low temperatures fluctuated the least relative to normal temperatures, which indicated that the proposed algorithm was more robust across temperatures. Fig. 10 depicts the battery state analysis.

Fig. 10 Real-time change curve of each battery parameter

In Fig. 10, the Research Design model demonstrated real-time capabilities along with precise analysis and processing of battery operating parameters. These parameters include current, terminal voltage, temperature, vehicle speed, and battery power. Real-time monitoring of these parameters not only aids in anticipating and preventing potential faults but also allows users to understand the battery’s usage conditions and take measures to improve battery performance and extend its lifespan. This helps users gain better insights into the status of electric vehicles and their batteries, ensuring the safe and efficient use of electric vehicles.

Results and discussion

This paper proposes a DGKNN based on genetic optimization, which combines the advantages of the neural network method and the Kalman filter method. A neural network fits the battery parameters, which simplifies the parameter identification step of the Kalman filter, and the Kalman filter updates the model, which provides noise removal and robustness. GA optimization greatly reduces the initial error. The results show that DGKNN predicts significantly better than the initial neural network and, unlike that network, does not require long training on large amounts of data. If the battery is replaced, the initial neural network must be retrained at length, which severely limits its practicality. DGKNN therefore has an advantage over the initial neural network in real-time SOC prediction, and DGKNN based on genetic optimization is a very suitable algorithm for SOC prediction.
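The way a Kalman filter can blend a model prediction with a noisy estimate may be easier to see in a small sketch. The code below is not the authors' DGKNN. It is a generic scalar Kalman filter in which the prediction step is assumed to be coulomb counting and the measurement `soc_meas` stands in for an SOC estimate such as a neural network output. The function name, the sign convention (positive current means discharge), and the noise variances are all illustrative assumptions.

```python
def kalman_soc_step(soc, var, current_a, dt_s, capacity_ah, soc_meas,
                    process_var=1e-6, meas_var=1e-3):
    """One predict/update cycle of a scalar Kalman filter on SOC.

    soc, var    : previous SOC estimate (0..1) and its variance
    current_a   : battery current in amperes (assumed positive on discharge)
    dt_s        : time step in seconds
    capacity_ah : nominal capacity in ampere-hours (assumed constant)
    soc_meas    : noisy SOC measurement, e.g. a network-based estimate
    """
    # Predict: coulomb counting moves SOC by the charge drawn in this step.
    soc_pred = soc - current_a * dt_s / (3600.0 * capacity_ah)
    var_pred = var + process_var
    # Update: the gain weighs the prediction against the measurement.
    gain = var_pred / (var_pred + meas_var)
    soc_new = soc_pred + gain * (soc_meas - soc_pred)
    var_new = (1.0 - gain) * var_pred
    return soc_new, var_new, gain

# Hypothetical numbers: 10 A discharge for 36 s on a 50 Ah pack.
soc, var, gain = kalman_soc_step(0.80, 1e-2, 10.0, 36.0, 50.0, soc_meas=0.79)
```

With a large initial variance the gain is close to 1, so the updated SOC moves most of the way toward the measurement. As the variance shrinks over successive steps, the filter relies more on the coulomb-counting prediction and measurement noise is damped.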

The correction coefficient is selected by comparing known standard values with the measured results. When the deviation or error of a measured result is relatively large, the correction coefficient is used to adjust the measured data toward the true value, which reduces the error and improves the accuracy and reliability of the measurement. However, the initial weights of DGKNN are generated randomly before genetic optimization, and both steps introduce randomness. The resulting weights are therefore not fixed and each prediction run gives a different result, so the algorithm is unstable and its prediction quality varies from run to run. In addition, the prediction effect on both charge and discharge data is not ideal.

To address these problems of the DGKNN model, the FCDE-SVR algorithm is proposed. Through feature clustering and data screening, the algorithm forms clusters with different features, which effectively reduces the amount of data and makes the extracted data features more distinct and therefore better suited to modeling. SVR serves as the sub-learner of the ensemble: each cluster is used to train one SVR, and the sub-learners are then integrated to obtain the final result. The method derives the feature relationships between data by combining density and distance, uses them to update the weights during ensemble learning, and thereby selects the sub-data sets and integrates the final prediction model.
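The cluster-then-integrate flow described above can be sketched in a few lines. This is only an illustration of the ensemble structure under strong assumptions, not the FCDE-SVR algorithm itself: the clusters are given rather than learned, a least-squares line stands in for each SVR sub-learner so that the example needs no external library, and the weight (cluster share of samples divided by distance to the cluster centroid) is a hypothetical way of combining density and distance. All function names and data are made up.

```python
def fit_line(points):
    """Least-squares line y = a*x + b; a stand-in for an SVR sub-learner."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def train_ensemble(clusters):
    """Train one sub-learner per cluster; keep its centroid and density."""
    total = sum(len(points) for points in clusters)
    models = []
    for points in clusters:
        centroid = sum(x for x, _ in points) / len(points)
        density = len(points) / total  # share of samples, a crude density proxy
        models.append((fit_line(points), centroid, density))
    return models

def predict(models, x, eps=1e-6):
    """Weighted average of sub-learner outputs for a query feature x."""
    # Assumed weighting: denser and closer clusters count for more.
    weights = [density / (abs(x - centroid) + eps)
               for _, centroid, density in models]
    outputs = [a * x + b for (a, b), _, _ in models]
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)

# Hypothetical (terminal voltage, SOC) samples, already split into two clusters.
low = [(3.4, 0.10), (3.5, 0.20), (3.6, 0.30)]
high = [(3.9, 0.70), (4.0, 0.80), (4.1, 0.90)]
models = train_ensemble([low, high])
near_low = predict(models, 3.5)    # dominated by the low-voltage sub-learner
midpoint = predict(models, 3.75)   # both sub-learners contribute equally
```

A query that sits on one cluster's centroid is answered almost entirely by that cluster's sub-learner, while a query halfway between two equally sized clusters receives the plain average of the two outputs.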


Conclusion

To predict the remaining battery capacity of electric vehicles in real time, this study set out to design an SOC prediction model with high accuracy and stability. Combining the advantages of the Dynamic Kalman Neural Network with GA and SVM, the research employed a fusion strategy to tackle the substantial demand for sample data and device hardware overhead. With a correction coefficient of 0.7, the prediction error percentage of the designed model was minimized to only 0.1529%, the MAE fell to 0.0324, and the RMSE fell to 0.1243. The designed model also maintained its advantage in predictive accuracy across datasets of different scales; on the small dataset, its error percentage was 0.1518 and its RMSE was 0.0604. Other models such as DNN, SVR, GA-BP, and PSO-adaboost performed relatively poorly at all dataset levels, especially on large-scale datasets. In summary, the SOC prediction model proposed in this study, based on DGKNN combined with GA and SVM, demonstrates a significant advantage in accuracy and real-time performance. The accuracy of SOC estimation is affected not only by temperature but also by the current remaining capacity. This study treated the capacity as constant during SOC estimation, whereas in practice the battery capacity changes continuously with temperature and the number of cycles; future work can calibrate the relative capacity of the battery.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.


References

  • Anwar MB, Muratori M, Jadun P, Hale E, Bush B, Denholm P, Ma O, Podkaminer K (2022) Assessing the value of electric vehicle managed charging: a review of methodologies and results. Energy Environ Sci 15(2):466–498

  • As’ad F, Avery P, Farhat C (2022) A mechanics-informed artificial neural network approach in data-driven constitutive modeling. Int J Numer Methods Eng 123(12):2738–2759

  • Cao H, Wu Y, Bao Y, Feng X, Wan S, Qian C (2023) UTrans-Net: a model for short-term precipitation prediction. Artif Intell Appl 1(2):106–113

  • Chen K, Zhou S, Liu K, Gao G, Wu G (2023) State of charge estimation for lithium-ion battery based on whale optimization algorithm and multi-kernel relevance vector machine. J Chem Phys 158(10):104110–104120

  • Chen P, Jin X, Han XF (2024) Joint estimation of state of charge and state of health of lithium ion battery. J Electrochem Energy Convers Storage 21(01):11008–11020

  • Cui Z, Wang L, Li Q, Wang K (2022a) A comprehensive review on the state of charge estimation for lithium-ion battery based on neural network. Int J Energy Res 46(5):5423–5440

  • Cui Z, Kang L, Li L, Wang L, Wang K (2022b) A hybrid neural network model with improved input for state of charge estimation of lithium-ion battery at low temperatures. Renew Energy 198:1328–1340

  • Dou Y, Kan D, Su Y, Zhang Y, Wei Y, Zhang Z, Zhou Z (2022) Critical factors affecting the catalytic activity of redox mediators on Li–O2 battery discharge. J Phys Chem Lett 13(30):7081–7086

  • Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10:16400–16407

  • Feng K, Ji JC, Ni Q (2023) A novel adaptive bandwidth selection method for Vold-Kalman filtering and its application in wind turbine planetary gearbox diagnostics. Struct Health Monit 22(2):1027–1048

  • Liang G, Rodriguez E, Farivar GG, Ceballos S, Townsend CD, Gorla NB, Pou J (2022) A constrained intersubmodule state-of-charge balancing method for battery energy storage systems based on the cascaded H-bridge converter. IEEE Trans Power Electron 37(10):12669–12678

  • Liu K, Sun Y, Yang D (2023) The administrative center or economic center: which dominates the regional green development pattern. A case study of Shandong Peninsula urban agglomeration, China. Green Low-Carbon Econ 1(3):110–120

  • Ly A, El-Sayegh Z (2023) Tire wear and pollutants: an overview of research. Arch Adv Eng Sci 1(1):2–10

  • Mastoi MS, Zhuang S, Munir HM, Haris M, Hassan M, Usman M, Bukhari SS, Ro JS (2022) An in-depth analysis of electric vehicle charging station infrastructure, policy implications, and future trends. Energy Rep 8:11504–11529

  • Mokayed H, Quan TZ, Alkhaled L, Sivakumar V (2023) Real-time human detection and counting system using deep learning computer vision techniques. Artif Intell Appl 1(4):221–229

  • Revach G, Shlezinger N, Ni X, Escoriza AL, Van Sloun RJ, Eldar YC (2022) KalmanNet: neural network aided Kalman filtering for partially known dynamics. IEEE Trans Signal Process 70:1532–1547

  • Serat Z, Fatemi SAZ, Shirzad S (2023) Design and economic analysis of on-grid solar rooftop PV system using PVsyst software. Arch Adv Eng Sci 1(1):63–76

  • Venkitaraman AK, Kosuru VSR (2022) A review on autonomous electric vehicle communication networks-progress, methods and challenges. World J Adv Res Rev 16(3):13–24

  • Wu Y, Zhao Z, Hao X, Xu R, Li L, Lv D, Huang X, Zhao Q, Xu Y, Wu Y (2023a) Cathode materials for calcium-ion batteries: current status and prospects. Carbon Neutraliz 2(5):551–573

  • Wu Z, Zhao Y, Zhang N (2023b) A literature survey of green and low-carbon economics using natural experiment approaches in top field journal. Green Low-Carbon Econ 1(1):2–14

  • Xu Y, Chen X, Zhang H, Yang F, Tong L, Yang Y, Yan D, Yang A, Yu M, Liu Z, Wang Y (2022a) Online identification of battery model parameters and joint state of charge and state of health estimation using dual particle filter algorithms. Int J Energy Res 46(14):19615–19652

  • Xu H, Wang S, Fan Y, Qiao J, Xu W (2022b) A novel Drosophila-back propagation method for the lithium-ion battery state of charge estimation adaptive to complex working conditions. Int J Energy Res 46(11):15864–15880



Author information


Yuanyuan Liu wrote the main manuscript text; Wenxin Dun reviewed and revised the manuscript. Both authors approved this submission.

Corresponding author

Correspondence to Yuanyuan Liu.

Ethics declarations

Ethics approval and consent to participate

No human or animal subjects or materials were used.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


Cite this article

Liu, Y., Dun, W. Integrated model construction for state of charge estimation in electric vehicle lithium batteries. Energy Inform 7, 19 (2024).
