Grey-box modelling of lithium-ion batteries using neural ordinary differential equations

Grey-box modelling combines physical and data-driven models to benefit from their respective advantages. Neural ordinary differential equations (NODEs) offer new possibilities for grey-box modelling, as differential equations given by physical laws and neural networks can be combined in a single modelling framework. This simplifies simulation and optimization and allows irregularly-sampled data to be used during training and evaluation of the model. We demonstrate this approach using two levels of model complexity: first, a simple parallel resistor-capacitor circuit; and second, an equivalent circuit model of a lithium-ion battery cell, where the change of the voltage drop over the resistor-capacitor circuit, including its dependence on current and State-of-Charge, is implemented as a NODE. After training, both models show good agreement with analytical solutions and with experimental data, respectively.


Introduction
Lithium-ion batteries have become an integral part of our everyday lives: they supply smartphones and laptops with electrical energy, they are used as a mobile power source in electric vehicles, and they help to secure the energy supply as stationary storage plants. To operate lithium-ion batteries safely and efficiently, we need a comprehensive understanding of how the battery works and which internal processes take place. Battery modelling is complex and parameterization is a demanding task. We show a novel way of equivalent circuit modelling of lithium-ion batteries using neural ordinary differential equations (NODEs).
With increasing digitization and the associated larger amount of available data, artificial intelligence and especially neural networks gain importance. Neural networks belong to the class of black-box (BB) models. They use measured data to learn relations between inputs and outputs of systems (Döbel et al. 2018; Estrada-Flores et al. 2006; Oussar and Dreyfus 2001; Duarte et al. 2004; Hamilton et al. 2017). As Ljung (1999) stated, known aspects should never be estimated. Therefore, it is reasonable to consider other modelling techniques. Physical modelling based on prior knowledge is a contrary approach. The resulting white-box (WB) models describe the underlying system dynamics in the form of mathematical equations. Grey-box (GB) models combine WB and BB modelling techniques and thus their respective advantages (Döbel et al. 2018; Estrada-Flores et al. 2006; Oussar and Dreyfus 2001; Duarte et al. 2004; Hamilton et al. 2017).
Missing values and irregularly-sampled time-domain data are still demanding when using neural networks. NODEs are a promising approach to deal with these problems (Chen et al. 2018). In the context of modelling dynamic systems, NODEs are used to solve homogeneous differential equations (Chen et al. 2018). However, external variables that often are important to describe the behaviour of a dynamic system have not yet been taken into account.
The focus of the present study is on GB modelling of lithium-ion batteries using NODEs. In the 'Neural ordinary differential equations' section we introduce NODEs and show how they can solve inhomogeneous differential equations. The 'Grey-box modelling' section deals with GB modelling in general and GB modelling using NODEs in particular. The 'Modelling of lithium-ion batteries' section applies the main results to equivalent circuit modelling of lithium-ion batteries. In the 'Discussion and conclusion' section we discuss the results and draw conclusions.

Neural ordinary differential equations
In this section, we give a general introduction to NODEs. We extend NODEs for BB modelling of dynamic systems including external variables. A resistor-capacitor (RC) circuit serves as an application example.

Background neural ordinary differential equations
Modelling dynamic systems using neural networks has been addressed in many publications. For instance, Che et al. (2018) used recurrent neural networks (RNNs) to model multivariate time series with missing values and Bailer-Jones et al. (1998) modelled dynamic systems including external variables through RNNs. Liao and Poggio (2016) interpreted a residual neural network (ResNet) with shared weights as RNN.
A ResNet transforms the hidden states from layer t to layer t + 1 according to the recursive equation

$$z_{t+1} = z_t + f(z_t, \theta_t) \qquad (1)$$

where $z_t \in \mathbb{R}^d$ is the vector of the hidden states at layer t, $\theta_t$ represents the learned parameters of layer t and $f: \mathbb{R}^d \to \mathbb{R}^d$ is a function preserving the dimension of the hidden states (He et al. 2016). Sharing the parameters across the layers ($\theta_t = \theta$ for t = 0, ..., T) leads to the explicit Euler discretization of the initial value problem

$$\frac{dz(t)}{dt} = f(z(t), t, \theta), \quad z(0) = z_0 \qquad (2)$$

(Chen et al. 2018; Haber and Ruthotto 2017; Ruthotto and Haber 2020; Dupont et al. 2019; Zhang et al. 2019; Haber et al. 2018; Gholami et al. 2019). NODEs specify the continuous change of the hidden states according to Eq. 2 using a neural network. Starting from the input layer z(0), a differential equation solver calculates the output layer z(T) (Chen et al. 2018; Dupont et al. 2019; Zhang et al. 2019).
The implementation of NODEs is challenging due to storage requirements for backpropagation during training. To deal with this storage problem, Chen et al. (2018) proposed an adjoint sensitivity method for backpropagation. As discussed by Gholami et al. (2019), this method may lead to numerical instability and inaccurate gradients. Therefore, Gholami et al. (2019) introduced an Adjoint based Neural ordinary differential equation (ODE) framework using a checkpointing method. Chen et al. (2018) used NODEs for supervised learning tasks. They applied NODEs to normalizing flows, even considering time-dependent dynamics, and they modelled time series including irregularly-sampled data with NODEs. One can find many extensions to this approach in current research.
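The correspondence between a weight-sharing ResNet and the explicit Euler method can be checked numerically. The following is a minimal sketch, not the implementation used in this work: the layer function `f`, the weight matrix `theta` and the dimension 3 are hypothetical choices made only for illustration, and NumPy is used instead of a deep learning framework.

```python
import numpy as np

def f(z, theta):
    """Toy dimension-preserving layer function (hypothetical): tanh of a linear map."""
    return np.tanh(theta @ z)

def resnet_forward(z0, theta, num_layers):
    """Residual network with shared weights: z_{t+1} = z_t + f(z_t, theta)."""
    z = z0.copy()
    for _ in range(num_layers):
        z = z + f(z, theta)
    return z

def euler_solve(z0, theta, t_end, num_steps):
    """Explicit Euler method for dz/dt = f(z, theta) on [0, t_end]."""
    h = t_end / num_steps
    z = z0.copy()
    for _ in range(num_steps):
        z = z + h * f(z, theta)
    return z

rng = np.random.default_rng(0)
theta = 0.5 * rng.standard_normal((3, 3))
z0 = rng.standard_normal(3)

# Four residual layers correspond to four Euler steps of size h = 1.
z_resnet = resnet_forward(z0, theta, num_layers=4)
z_euler = euler_solve(z0, theta, t_end=4.0, num_steps=4)
print(np.allclose(z_resnet, z_euler))
```

Increasing `num_steps` for a fixed `t_end` moves the Euler result towards the continuous solution of the initial value problem, which is the viewpoint that NODEs adopt.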

Black-box modelling of dynamic systems including external variables with NODEs
Usually, ODEs are used to describe dynamic systems. When external variables influence the behaviour of the system, the corresponding differential equations become inhomogeneous. Typical dynamic systems follow the equations

$$\frac{dz(t)}{dt} = f(z(t), u(t)), \qquad y(t) = g(z(t), u(t)) \qquad (3)$$

where z are the state variables, u are the external variables, y are the outputs of the system, and f and g are (non-linear) continuous functions. The functions f and g could also depend on the time t explicitly. Then the dynamic system would be time-variant. Bailer-Jones et al. (1998) proposed a special form of RNN to model time-invariant systems. This RNN encourages the usage of NODEs for modelling dynamic systems including external variables. We generalize the initial value problem according to Eq. 2:

$$\frac{dz(t)}{dt} = f(z(t), u(t), t, \theta), \quad z(0) = z_0 \qquad (4)$$

As the neural network f in Eq. 4 depends on t explicitly, the considered systems may be time-variant. We can use one of the frameworks proposed by Chen et al. (2018) or Gholami et al. (2019) to implement the NODE. However, we have to consider the external variables as inputs to the neural network. Therefore, we provide a function describing the course of the external variables with time. The interpolation of measured data is a possible method. It is also conceivable to consider time dependencies (cf. time-dependent dynamics in Chen et al. (2018)). If it is known that the external variables follow specific functions of time, for example, sine functions, we provide the corresponding function parameters as inputs to the neural network. We then calculate the values at the considered time points during the forward pass. Figure 1 illustrates the suggested approach to include external variables when using NODEs schematically. A differential equation solver, for example, Euler, solves the NODE.

Fig. 1 Including external variables in NODEs; $z_t$ represents the state variables at layer t, and $u_t$ represents the respective external variables
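The idea of supplying the external variable as a function of time can be sketched as follows. This is an illustrative example under stated assumptions: the right-hand side `f` is a simple hand-written stand-in for the neural network, the measured samples are invented, and the helper names `make_input` and `euler_with_input` are hypothetical.

```python
import numpy as np

def make_input(t_meas, u_meas):
    """Return u(t) obtained by linear interpolation of measured samples."""
    return lambda t: np.interp(t, t_meas, u_meas)

def euler_with_input(f, z0, u, t_grid):
    """Explicit Euler for dz/dt = f(z, u(t), t) on a possibly irregular time grid."""
    z = np.empty(len(t_grid))
    z[0] = z0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        z[k + 1] = z[k] + dt * f(z[k], u(t_grid[k]), t_grid[k])
    return z

def f(z, u, t):
    """Hypothetical stand-in for the neural network right-hand side."""
    return -z + u

# Invented measurements of the external variable.
t_meas = np.array([0.0, 1.0, 2.0])
u_meas = np.array([0.0, 2.0, 2.0])
u = make_input(t_meas, u_meas)

# The solver grid does not have to coincide with the measurement grid.
t_grid = np.array([0.0, 0.3, 0.5, 1.2, 2.0])
z = euler_with_input(f, 1.0, u, t_grid)
print(z)
```

Because `u` can be evaluated at any time, the solver grid may be irregular and independent of the sampling times of the measurements.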
Minimization of the defined loss function optimizes the learnable parameters of the neural network. In the case that we cannot measure one or more of the state variables directly, a correct approximation of the course of these state variables with time cannot be guaranteed. If the corresponding trajectories are important, we have to provide additional information during training.

Application of black-box modelling with NODEs to RC circuit modelling
A parallel RC circuit fed by a current source that is part of a standard equivalent circuit model (ECM) serves as an application example. The RC circuit is shown in Fig. 2. The output voltage $v_a$ is the voltage drop across the parallel connection of the resistor and the capacitor.
It follows the differential equation

$$\frac{dv_a(t)}{dt} = -\frac{1}{RC}\, v_a(t) + \frac{1}{C}\, i(t) \qquad (5)$$

where R = 100 Ω denotes the ohmic resistance and C = 10 mF denotes the capacitance.
The voltage $v_a$ is the state variable and the output of the system. The current i is the external variable. The initial output voltage is set to $v_a(0)$ = 1 V. We defined a NODE to approximate the derivative of the output voltage with time:

$$\frac{dv_a(t)}{dt} = f(v_a(t), i(t), \theta) \qquad (6)$$

where f is a linear feedforward network with two hidden layers with ten neurons each. We did not include biases. We initialized the weights from the uniform distribution $U(-\sqrt{k}, \sqrt{k})$, where $k = 1/l$ with $l \in \mathbb{N}$ the number of inputs to the respective layer. We used two sinusoidal current signals with different amplitudes, frequencies and phase shifts for training. The validation signal was also a sinusoidal signal, but with a time-dependent amplitude. In order to check the generalization ability of the investigated network, the product of a sine and a cosine function with different frequencies and phase shifts served as test signal. The different current signals were chosen as follows:

We implemented our example in Python (version 3.7.6). The regarded time interval of 0.05 s duration was divided into 200 time steps of random size between $9.9998 \times 10^{-10}$ s and 0.0013 s. The true output voltages were calculated through integration of Eq. 5 using the Python library SciPy (version 1.4.1) (Virtanen et al. 2020). No noise was included.

Fig. 2 RC circuit
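The generation of reference data on an irregular time grid can be sketched as follows. This is only an illustration under stated assumptions: the current parameters (amplitude, frequency, phase) are hypothetical because the actual signals are not reproduced here, the resistance is assumed to be given in ohm, and a hand-written fourth-order Runge-Kutta scheme stands in for the SciPy integrator used in this work.

```python
import numpy as np

R = 100.0   # ohmic resistance (value from the text, unit ohm assumed)
C = 10e-3   # capacitance in F

def current(t, amplitude=1.0, frequency=50.0, phase=0.0):
    """Hypothetical sinusoidal current in A."""
    return amplitude * np.sin(2.0 * np.pi * frequency * t + phase)

def rhs(t, v, i_func):
    """RC circuit dynamics: dv_a/dt = -v_a/(R*C) + i/C."""
    return -v / (R * C) + i_func(t) / C

def rk4_solve(i_func, v0, t_grid):
    """Classical Runge-Kutta integration on a possibly irregular grid."""
    v = np.empty(len(t_grid))
    v[0] = v0
    for k in range(len(t_grid) - 1):
        t, h = t_grid[k], t_grid[k + 1] - t_grid[k]
        k1 = rhs(t, v[k], i_func)
        k2 = rhs(t + h / 2, v[k] + h / 2 * k1, i_func)
        k3 = rhs(t + h / 2, v[k] + h / 2 * k2, i_func)
        k4 = rhs(t + h, v[k] + h * k3, i_func)
        v[k + 1] = v[k] + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return v

rng = np.random.default_rng(0)
# 200 time steps of random size correspond to 201 grid points on [0, 0.05] s.
inner = np.sort(rng.uniform(0.0, 0.05, size=199))
t_grid = np.concatenate(([0.0], inner, [0.05]))

v_true = rk4_solve(current, 1.0, t_grid)         # driven circuit
v_free = rk4_solve(lambda t: 0.0, 1.0, t_grid)   # no current: pure exponential decay
```

Without a current, the result can be compared with the analytical solution $v_a(t) = v_a(0)\, e^{-t/(RC)}$, which is a convenient sanity check for the integrator.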
The standard odeint solver from torchdiffeq (version 0.1.1) (Chen et al. 2018) integrated the NODE using Euler's method. In each of the 3000 optimization steps, we chose one training sample randomly for gradient descent; in other words, we performed stochastic gradient descent. For each sample, we provided the initial value of the voltage $v_a$ and the parameters describing the course of the current. The current values were calculated during the forward pass.
An Adam (Kingma and Ba 2014) optimizer with learning rate l = 0.01 minimized the MSE loss of the learned and the calculated output voltages. The chain rule was used for backpropagation. We do not address memory limits in this paper. Therefore, we did not use any of the adjoint methods proposed by Chen et al. (2018) or Gholami et al. (2019) for backpropagation. However, the principal approach would be the same. The validation set was used to avoid overfitting. We used the model parameters which led to the minimal validation loss for the final test. The training procedure was repeated three times.
The resulting loss values are summarized in Table 1. The differences between optimization runs 1 and 3 are small. Run 2 led to worse results regarding the training losses. However, its validation and test losses are better than in runs 1 and 3. In total, the test losses are around one order of magnitude higher than the validation and training losses. Figure 3 shows the results after training for run 2 in comparison to the results for GB modelling (next section). On the left, the courses of the true and learned output voltages are shown. On the right, the absolute approximation error displays the difference between learned and true output voltage. As obtaining optimal results was not the focus, we performed neither hyperparameter tuning nor regularization.

Grey-box modelling
This section deals with GB modelling. We introduce a framework for GB modelling using NODEs and apply it to RC circuit modelling.

Background grey-box modelling
GB modelling combines the advantages of both WB and BB modelling. Prior knowledge is included in the modelling process. Therefore, reliable parameter estimation requires less data in comparison to BB modelling (Döbel et al. 2018; Estrada-Flores et al. 2006; Oussar and Dreyfus 2001; Duarte et al. 2004; Hamilton et al. 2017). Sohlberg (2003) differentiated two GB modelling procedures. One is to constrain the model parameters or variables of a BB model using prior knowledge. The resulting dark GB models use specific neuro-fuzzy network structures (Lindskog and Ljung 2000). The second procedure takes a WB model as basis for GB modelling. Sohlberg (2003) developed a GB model of a heating process based on a WB model. Hamilton et al. (2017) used Takens' method to build a GB model. Oussar and Dreyfus (2001) discretized a WB model and estimated unknown parameters.

Grey-box modelling with NODEs
Similar to the approaches of Oussar and Dreyfus (2001), Hamilton et al. (2017), and Sohlberg (2003), in which a WB model consisting of a system of differential equations forms the basis for GB modelling, we develop a GB modelling technique using NODEs. The GB model forms the forward pass of a neural network module. Therefore, we describe the derivatives of the state variables resulting from WB modelling inside the module. Single dependencies or entire equations in this WB model are then replaced with parametric parts. Additional assumptions going beyond the WB model can be added. The resulting combination of WB and BB parts forms the GB model. As described in section 'Black-box modelling of dynamic systems including external variables with NODEs' , we can include external variables. A differential equation solver evaluates the evolution of the state variables. Just as for BB modelling, we choose a loss function and an optimizer depending on the modelling task.

Application of grey-box modelling with NODEs to RC circuit modelling
The RC circuit from the 'Application of black-box modelling with NODEs to RC circuit modelling' section serves as an application example for GB modelling as well. The change of the output voltage with time depends on the current and the voltage itself. For this example, we have assumed that the proportionality factor 1/C of the current is known, but that we are unsure about the proportionality factor −1/(RC) of the voltage in Eq. 5. We used this prior knowledge of the dynamic system to derive a GB model using NODEs. The following relationship applies to the calculation of the derivative of the output voltage:

$$\frac{dv_a(t)}{dt} = \omega\, v_a(t) + \frac{1}{C}\, i(t) \qquad (8)$$

where ω is the only learnable parameter. Equation 8 was implemented inside the forward pass of a neural network module. The general setting was the same as before during BB modelling. However, only 1000 optimization steps were carried out because the loss converged more quickly. To achieve lower loss values, we would have had to tune the hyperparameters. Again, we repeated the training three times. The resulting MSE losses are shown in Table 1. The differences between the three runs are insignificant. The results of the first run outperform the others marginally regarding the test losses. Figure 3 shows the results in comparison to the results for BB modelling. The results will be discussed in the 'Discussion and conclusion' section.
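How a single unknown factor can be identified from voltage data while the known 1/C term is kept fixed may be sketched as follows. This is not the training procedure used in this work: for brevity, a Gauss-Newton update with a forward sensitivity replaces the Adam optimizer and automatic differentiation, the reference data are generated with the same Euler scheme as the model, and the current signal is a hypothetical sine.

```python
import numpy as np

C = 10e-3  # known capacitance in F

def simulate(omega, v0, i_vals, t_grid):
    """Euler solution of dv/dt = omega*v + i/C and its sensitivity s = dv/d(omega)."""
    v = np.empty(len(t_grid))
    s = np.zeros(len(t_grid))
    v[0] = v0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        v[k + 1] = v[k] + dt * (omega * v[k] + i_vals[k] / C)
        # Derivative of the update above with respect to omega.
        s[k + 1] = s[k] + dt * (v[k] + omega * s[k])
    return v, s

def fit_omega(v_target, v0, i_vals, t_grid, omega0=0.0, iterations=10):
    """Gauss-Newton iterations for the single learnable parameter omega."""
    omega = omega0
    for _ in range(iterations):
        v, s = simulate(omega, v0, i_vals, t_grid)
        residual = v - v_target
        omega -= np.dot(residual, s) / np.dot(s, s)
    return omega

t_grid = np.linspace(0.0, 0.05, 201)
i_vals = np.sin(2.0 * np.pi * 50.0 * t_grid)   # hypothetical current in A
omega_true = -1.0 / (100.0 * C)                # -1/(RC) with R = 100 (ohm assumed)

v_target, _ = simulate(omega_true, 1.0, i_vals, t_grid)
omega_hat = fit_omega(v_target, 1.0, i_vals, t_grid)
print(omega_hat)
```

Since only one parameter has to be learned, the problem is much better conditioned than the BB case, which is consistent with the faster convergence reported above.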

Modelling of lithium-ion batteries
In this section, we describe how to model a lithium-ion battery in the form of an ECM. We give an overview of the usage of neural networks in the field of battery modelling. Finally, we apply GB modelling using NODEs to an equivalent circuit of a battery.

Background modelling lithium-ion batteries
Equivalent circuit modelling is a common approach for battery modelling. ECMs consist of electrical elements that describe the dynamic behaviour of batteries in a simple way and with only a few parameters and states. Therefore, they are often used for State-of-Charge (SOC) and State-of-Health prediction (He et al. 2011; Wang et al. 2017). Standard ECMs consist of an SOC-dependent voltage source, a series resistor and one or more RC circuits (He et al. 2011; Fleischer et al. 2014; Chen and Rincon-Mora 2006; Haifeng et al. 2009; Hu et al. 2009; Tong et al. 2015; Krewer et al. 2018). We can extend the standard ECM by taking into account that the circuit parameters depend on SOC, temperature, the applied current or the cycle number (Chen and Rincon-Mora 2006; Krewer et al. 2018).
Neural networks are increasingly used to model lithium-ion batteries. For example, Zhang et al. (2019), Jiménez-Bermejo et al. (2018), Charkhgard and Farrokhi (2010), and Almeida et al. (2020) estimated the SOC of batteries with neural networks. A neural network has also been used to estimate the State-of-Health of a battery with the parameters of an ECM as inputs. Krewer et al. (2018) summarized BB and GB modelling approaches to estimate the SOC and State-of-Health of lithium-ion batteries. In contrast to that, Wu et al. (2018) simplified battery design through neural networks. Turetskyy et al. (2019) combined a physical battery model and a feedforward network for end-of-line battery cell characterization.

Equivalent circuit model of a lithium-ion battery
We used a simple ECM consisting of an SOC-dependent voltage source, a series resistor and one RC circuit for modelling the dynamic behaviour of a lithium-ion battery. The equation system describing the chosen ECM can be found in He et al. (2011) and Tong et al. (2015). We directly included parameter dependencies on current and SOC in this equation system (Eq. 9), where $C_N$ is the nominal battery capacity, $R_S(\mathrm{SOC}, i_{bat})$ and $R_1(\mathrm{SOC}, i_{bat})$ are the ohmic resistances depending on SOC and battery current, $C_1$ is the capacitance, and $v_{OC}(\mathrm{SOC})$ is the SOC-dependent open-circuit voltage (OCV). The battery voltage $v_{bat}$ is the output of the dynamic system, and the current $i_{bat}$ is the external variable. Figure 4 shows the corresponding ECM.
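A standard ECM of this structure can be simulated with a few lines of code. The sketch below rests on several assumptions that are not taken from this work: charging currents are counted as positive, the resistances and the capacitance are constant hypothetical values (here they do not depend on SOC or current), the OCV curve is a hypothetical straight line, and the explicit Euler method is used. Only the capacity of 202 Ah corresponds to the measured capacity of the investigated cell.

```python
import numpy as np

C_N = 202.0 * 3600.0              # battery capacity in As (202 Ah)
R_S, R_1, C_1 = 1e-3, 1e-3, 1e4   # hypothetical constants in ohm, ohm and F

def v_oc(soc):
    """Hypothetical linear open-circuit voltage curve in V."""
    return 3.0 + 0.4 * soc

def simulate_ecm(i_bat, t_grid, soc0=0.0, v_rc0=0.0):
    """Euler simulation of an ECM with one RC element (assumed sign convention).

    States:  dSOC/dt   = i_bat / C_N
             dv_RC1/dt = -v_RC1 / (R_1 * C_1) + i_bat / C_1
    Output:  v_bat     = v_OC(SOC) + v_RC1 + R_S * i_bat
    """
    soc, v_rc = soc0, v_rc0
    soc_out = np.empty(len(t_grid))
    v_bat = np.empty(len(t_grid))
    for k, t in enumerate(t_grid):
        i = i_bat(t)
        soc_out[k] = soc
        v_bat[k] = v_oc(soc) + v_rc + R_S * i
        if k < len(t_grid) - 1:
            dt = t_grid[k + 1] - t
            soc, v_rc = (soc + dt * i / C_N,
                         v_rc + dt * (-v_rc / (R_1 * C_1) + i / C_1))
    return soc_out, v_bat

# Half an hour of constant-current charging at 1 C (202 A).
t_grid = np.arange(0.0, 1801.0, 1.0)
soc, v_bat = simulate_ecm(lambda t: 202.0, t_grid)
print(soc[-1], v_bat[-1])
```

In a GB version of such a model, the terms containing the unknown resistances and the capacitance are the natural candidates to be replaced by learnable functions, while the SOC balance stays as it is.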

Grey-box modelling of a lithium-ion battery
As a demonstration of the methodology, we described the charging and discharging characteristics of a lithium-ion battery used for stationary energy storage with a GB model using NODEs. The considered battery has been characterized experimentally in detail before by Yagci et al. (2021). It is a prismatic single cell from the Chinese manufacturer CALB with a rated (data-sheet) capacity of 180 Ah and a real (measured) capacity of 202 Ah. The chemistry is lithium iron phosphate (LFP) at the positive electrode and graphite at the negative electrode. This type of cell is typically applied in home storage systems. The electrical and thermal behaviour of the cell was investigated in a controlled environment (CTS climate chamber) using a battery cycler (Biologic). For the present investigations, only a part of the available experimental data was used, in particular charge and discharge curves at T = 20°C obtained with a constant current, constant voltage (CCCV) discharging/charging protocol at different C-rates (CC phase) of 0.02 C, 0.1 C, 0.28 C, and 1 C, and a cut-off current (CV phase) of C/20. The data was available and used as voltage-versus-time and current-versus-time series, for which we reduced the number of data points to 100 for each discharge and charge process. For the model, the experimental current was used as input. The presentation of the results below is in the form of voltage versus SOC, which allows a better comparison of the different C-rates. Details of the experimental approach can be found in Yagci et al. (2021). We used the ECM according to Eq. 9 as basis for GB modelling. The state Eqs. 9a and 9b were implemented inside a neural network module. Linear interpolation of the measured current values led to a function describing the current's temporal progress. We provided this function as external input to the GB module. The derivatives of the states were the outputs of the module.
As the nominal capacity $C_N$ of the battery is known, we were able to implement Eq. 9a directly. According to the results in Yagci et al. (2021), we chose $C_N$ = 202 Ah. The ohmic resistances $R_S$ and $R_1$ and the capacitance $C_1$ are not known. In addition, we wanted to include parameter dependencies on current and SOC. Therefore, we used neural networks to approximate Eq. 9b. In detail, one feedforward network approximated the change of the voltage drop across the RC circuit dependent on SOC and the voltage $v_{RC1}$ itself, and a second neural network included the current dependency. Overall, we obtained an equation system (Eq. 10) in which f and g represent feedforward networks. We chose a linear network with six hidden layers with ten neurons each for f and a network with eight hidden layers with 50 neurons each, sigmoid activation, and one output neuron for g. We did not include biases and we initialized the weights as in the 'Application of black-box modelling with NODEs to RC circuit modelling' section. Again, we used the standard odeint solver from torchdiffeq (Chen et al. 2018) to integrate the equation system using Euler's method. The solutions of Eq. 10 were used to calculate the battery output voltage (cf. Eq. 9c). We implemented the nonlinear $v_{OC}(\mathrm{SOC})$ curve according to Yagci et al. (2021) and Mayur et al. (2019). We then used the calculated SOC to obtain the OCV $v_{OC}$. To approximate the voltage drop over the serial resistor $R_S$, we used an additional feedforward network. With this, the battery output voltage was calculated according to Eq. 11, with the linear feedforward network h consisting of five hidden layers with 50 neurons each. An Adam (Kingma and Ba 2014) optimizer minimized the L1 loss of the approximated and the measured battery voltage. Here we chose a decaying learning rate between $1 \times 10^{-1}$ and $1 \times 10^{-5}$. The 0.02 C charge curve, the 0.1 C discharge curve, and the 1 C discharge and charge curves were used for training.
In total, 2000 optimization steps were carried out with stochastic gradient descent. The 0.02 C discharge curve, the 0.1 C charge curve, and the 0.28 C discharge and charge curves were used for testing. To plot the measured voltages against the SOC, we calculated the SOC according to $\mathrm{SOC} = Q/C_N$ with Q the measured charge throughput. The results are shown in Fig. 5. The left panel shows a comparison of experimental and simulated CCCV charge and discharge curves for different C-rates at T = 20°C. The absolute approximation error is shown in the right panel.
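The conversion from a measured current series to SOC via the charge throughput can be sketched as follows. The example is an assumption-laden illustration: the helper name `soc_from_current`, the optional start value `soc0`, the trapezoidal integration and the positive sign for charging currents are choices made here, and the sample data are invented.

```python
import numpy as np

def soc_from_current(t_s, i_a, c_n_ah, soc0=0.0):
    """SOC = soc0 + Q / C_N, with the charge throughput Q from trapezoidal integration.

    t_s: time stamps in s, i_a: current in A (charging assumed positive),
    c_n_ah: capacity in Ah.
    """
    dq = 0.5 * (i_a[1:] + i_a[:-1]) * np.diff(t_s)          # charge increments in As
    q_ah = np.concatenate(([0.0], np.cumsum(dq))) / 3600.0  # cumulative charge in Ah
    return soc0 + q_ah / c_n_ah

# Invented, irregularly sampled constant-current charge at 0.1 C of a 202 Ah cell.
t = np.array([0.0, 600.0, 1500.0, 3600.0])
i = np.full(4, 20.2)
soc = soc_from_current(t, i, 202.0)
print(soc)
```

The irregular time stamps do not need any special treatment, because each charge increment uses its own step size.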

Discussion and conclusion
In the 'Application of black-box modelling with NODEs to RC circuit modelling' section we used a BB model with NODEs to approximate the output voltage of an RC circuit. We used different current signals as external variables. The results show that the BB model was able to fit the data and to include the external variable.
We approximated the output voltage of the same RC circuit using a GB model with NODEs in the 'Application of grey-box modelling with NODEs to RC circuit modelling' section. Using the provided current signal, the model was able to fit the course of the output voltage with time. Although the training and validation losses are higher than for BB modelling, the test losses are smaller by a factor of around 2.5. Therefore, we can conclude that the generalization ability of the GB model is better. The usage of prior knowledge leads to simpler structures and dependencies to be learned by the BB part of the model.

Fig. 5 Simulation results using NODEs for grey-box modelling of a lithium-ion battery in comparison to experimental data; left: CCCV charge and discharge curves for different C-rates at T = 20°C. The lower branches represent discharge (time progresses from right to left), while the upper branches represent charge (time progresses from left to right); right: absolute approximation error
Finally, we applied the proposed GB modelling framework to an equivalent circuit of a lithium-ion battery. In the 'Grey-box modelling of a lithium-ion battery' section we showed that NODEs can be used for modelling highly nonlinear functions including external variables. We demonstrated how to combine these with ODEs. The simulations show a reasonable agreement with experimental data for low C-rates (0.02 C, 0.1 C and 0.28 C). It is worthwhile noting that the OCV hysteresis typical for LFP cells (Dreyer et al. 2010) can be reproduced with the NODE without requiring an additional physical model equation. This demonstrates the flexibility of the methodology with respect to complex cell behaviour.
For 1 C, the end of charging and discharging including the CV phase cannot be approximated properly. Here, the training was difficult because at the beginning and the end of CCCV charging and discharging the voltage curves are very steep and the difference between OCV and output voltage of the battery is small. Some of the measurement curves even crossed the curve of the OCV in these areas. Possible reasons could be measurement inaccuracies, the impact of the capacity-rate effect or temporal changes in battery performance. As an outlook, the use of more training data could improve the results. In particular, it would be interesting to use additional data from pulse tests for training and to simulate realistic load profiles. Additionally, more sophisticated ECMs could improve the simulation results. For example, we did not include a Warburg impedance in our ECM.
We have shown that using NODEs can be a powerful strategy for modelling dynamic systems including external variables. NODEs allow the usage of irregularly-sampled data for training and evaluation. Furthermore, NODEs can be used for GB modelling. We have introduced a framework to combine ODEs and NODEs. This offers new possibilities in GB modelling of dynamic systems.
The approximation capabilities of the GB model using NODEs could be improved further by applying regularization and hyperparameter tuning. Beyond this, a direct comparison of simulation results obtained from a WB model and from a GB model using NODEs for a complex dynamic system would be interesting.