
Enhanced fault detection in polymer electrolyte fuel cells via integral analysis and machine learning


The growing energy demand and population growth call for alternative, clean, and sustainable energy systems. Over the last few years, hydrogen energy has proven to be a crucial factor under the current conditions. Although the energy conversion process in polymer electrolyte fuel cells (PEFCs) is clean and noiseless, since the only by-products are heat and water, the internal phenomena are not simple. As a result, correct monitoring of the device's state of health is required for efficient operation. This paper aims to explore and evaluate machine learning (ML) and deep learning (DL) models for fault detection classification in PEFCs, supporting decision-making by the fuel cell operator or user. Seven ML and DL classifiers are considered, applied to a database comprising 182,156 records and 20 variables arising from the fuel cell's energy conversion process and operating conditions. This dataset is unbalanced; therefore, balancing techniques are applied and analyzed during the training and testing of the models. The results show that the logistic regression (LR), k-nearest neighbor (KNN), decision tree (DT), random forest (RF), and naive Bayes (NB) models present similar and favorable trends in terms of performance indicators and computational cost, unlike the support vector machine (SVM) and multi-layer perceptron (MLP), whose performance degrades when the data is balanced and which incur a higher computational cost. The study therefore offers a novel approach to fault detection analysis in PEFCs that combines the interpretability of different ML and DL algorithms while addressing the data imbalance so common in the real world through resampling techniques. This methodology provides clear information about the model decision-making process, improving confidence and facilitating further optimization, in contrast to traditional physics-based models, and paving the way for data-driven control strategies.


With global energy demand expected to increase by 25 to 58% by 2050, finding clean, sustainable alternatives to fossil fuels is more critical than ever (Ruijven et al. 2019). This situation worsened with the COVID-19 pandemic, because mitigation practices and measures had gradual impacts on energy demand and consumption (Jiang et al. 2021). For example, in Saudi Arabia, the 2020 lockdown caused a 16% increase in electricity consumption compared to the 2019 level (Aldubyan and Krarti 2022). However, the academic literature indicates that spatial and temporal variations make each region face its own challenges in the energy sector (Jiang et al. 2021).

In recent years, hydrogen energy has proven to be a crucial factor under the current conditions (Yue et al. 2021). Hydrogen is a suitable energy carrier that can be used at different stages of the energy cycle, and the fuel cell (FC) is an electrochemical device that converts chemical energy into electrical energy in a noiseless and efficient manner (Saikia et al. 2018). In general, hydrogen has managed to establish itself in some specific niches. A notable example is forklifts, where hydrogen fuel cells have gained popularity due to their efficiency and performance. Hydrogen vehicles are already commercially available in several countries, indicating significant progress in the adoption of this technology. In the field of domestic use, fuel cells have also experienced notable growth, with 225,000 heating systems sold. This shows that FC technology is starting to gain traction in the market, a crucial step towards widespread adoption (Staffell et al. 2019).

Although fuel cells and their various types may seem like a recent development, the technology originated in the nineteenth century with the first studies by the Welsh physicist and jurist Sir William Grove on gaseous batteries, whose results were published in 1843 (Grove 1839). In addition, related work was presented by Christian Friedrich Schönbein in Switzerland (Klell et al. 2023). Within the group of fuel cells, a special place is held by polymer electrolyte fuel cells (PEFCs), which have demonstrated promising energy conversion due to their high energy efficiency, low operating temperature, low emissions, and quick start-up (Calili-Cankir et al. 2022). This technology can power zero-emission vehicles and provide clean, reliable electricity in remote areas, giving it a potentially revolutionary global impact (Ogawa et al. 2018).

Although the energy conversion process in a PEFC is clean, since the only by-products are heat and water, the internal transport phenomena are complex. As a result, correct monitoring of the device's state of health is required for efficient operation (Carrette et al. 2000). Inside a PEFC, energy conversion occurs through the electrochemical reactions in the anode and cathode reaction zones. Inlet temperature, relative humidity, and inlet pressure are some of the variables that must be considered to evaluate the device's behavior. Thermal and water management also directly affect PEFC health and performance (Melo et al. 2020; Nguyen and White 1993).

The main constitutive element affected by operating hours is the electrolyte, i.e., a Nafion membrane, which allows only protons (positive ions) to pass through, while the electrical energy is used in the external circuit. The membrane can be subjected to excess water (flooding) or low humidity (drying), either of which can cause the PEFC to fail. Moreover, PEFCs are exposed to various defects that can reduce performance and even result in total failure (Rama et al. 2008). PEFC technology is still immature and needs to become more reliable and robust than it is at present (Lee et al. 2018, 2017; Mekhilef et al. 2012). Device lifetime is the primary challenge to the successful deployment of PEFCs. Thus, fault detection and diagnosis (FDD) systems are fundamental to guarantee the reliable operation of PEFCs (Li et al. 2019).

This research aims to explore and evaluate the capabilities of machine learning and deep learning models for fault detection classification in PEFCs, supporting decision-making by the fuel cell operator or user. For this purpose, seven machine and deep learning classifiers are considered. Experiments are performed using a database of 182,156 records and 20 variables arising from the fuel cell's energy conversion process and operating conditions.

The rest of the paper is organized as follows. "Literature review" section summarizes works related to this study. "Methodology" section presents dataset features, resampling techniques, and a brief explanation of the ML and DL techniques to be applied. "Experimental results" section shows the experimental evaluations and discussion of the results. Finally, "Discussion" section includes some final comments and directions for future work.

Literature review

Within fuel cell technology, and in the specific area of PEFCs, advances in machine learning (ML) and deep learning (DL) have facilitated progress in fuel cell design, material choice, correlations, FC system control, power management, and monitoring of FC operating health (Wang et al. 2020).

ML is the most broadly utilized subclass of AI, while DL uses multi-layered neural networks to learn from enormous amounts of data. According to a US Department of Energy study, ML, DL, and artificial intelligence (AI) have attracted increasing interest from energy developers and material designers, and the number of patents in these areas in the energy field increased considerably between 2000 and 2017 (US Department of Energy 2023).

Many ML and DL methods can be used to analyze PEFCs. Throughout this study, the most widely used ones are analyzed in detail.

The first ML method analyzed in this study is logistic regression (LR). In several studies, LR is used together with other ML methods in the analysis of PEFCs. For instance, Eslamibidgoli et al. (2021) presented a convolutional neural network (ConvNet) method for the high-throughput screening of electron microscopy images at the ink stage. The first steps are data sampling and data augmentation at multiple levels, followed by the use of a network model pre-trained on a large generic natural-image dataset (ImageNet) for transfer learning. The final part of the methodology is training an LR model on attributes obtained via transfer learning. According to the results, preparing the training set with Selective Search (SS), followed by a pre-trained model for feature extraction and an LR model for classification, was a fast and accurate approach to the presented problem. The authors concluded that convolutional neural networks can quickly distinguish poor samples from optimal ones. Moreover, such algorithms can be trained for fabrication processes, e.g., recognition of the slurry during the fabrication of Li-ion electrodes.

Another case in which LR is employed with other ML methods is presented in Xing et al. (2022). In that study, a fault detection and isolation (FDI) algorithm was developed on a matrix describing the connection between faults and residuals, called the fault signature matrix. The efficiency of the proposed method was proven by comparing its diagnostic results with those obtained by a support vector machine (SVM) and LR. In addition, the authors presented a novel data-driven approach based on sensor pre-selection and artificial neural networks (ANNs). Several steps were followed: first, a sensitivity analysis was applied to obtain the features of the sensor data in the time and frequency domains. Then, a filter procedure was used to exclude sensors with poor response to changes in system states. Finally, the experimental data from the remaining sensors were used to train and verify the ANN model, with the Levenberg–Marquardt (LM), resilient propagation (RP), and scaled conjugate gradient (SCG) algorithms used in the neural network training.

In recent years, other uses of LR have been reported in the literature. For instance, in Chen et al. (2024), researchers used logistic regression models to evaluate the effects of continuing environmental air pollution and solid-fuel use on postmenopausal women with a high 10-year cardiovascular disease (CVD) risk. Another recent study is described in Modanloo et al. (2024), where several approaches are used for manufacturing microchannels in the bipolar plates (BPPs) of a proton exchange membrane fuel cell (PEMFC) using the stamping process. In that work, a regression model (RM) derived from the response surface method (RSM) is used to predict the filling rate of the microchannel.

SVM is another ML analysis method. An example of its use in PEFC analysis is presented in Li et al. (2014a), where online fault diagnosis of twenty PEFC stacks was implemented. The proposed approach used SVM and a fluidic model to diagnose faults in water management. The health states of the stack can be determined using the fluidic model, which is then dedicated to identifying the training data used to train the dimension-reduction model, Fisher discriminant analysis (FDA), and the SVM classifier.

Another application of SVM is presented in Han and Chung (2017). In this work, researchers proposed a hybrid model blending an SVM model with an empirical equation for the polarization curves of a PEMFC under different operating conditions. Training, validation, and testing processes were applied to the hybrid model using operating data. The training stage determined the model coefficients and hyper-parameters. The output predicted by the trained hybrid model, i.e., the polarization curves, was similar to the measurements for both the validation and testing data sets. The authors demonstrated that the hybrid model compensates for the large prediction errors in the high-voltage range of the polarization curves produced by the SVM model alone. The study concluded that this hybrid model can predict the cell voltages of PEM fuel cells with high accuracy even when the main operating variables (the oxygen relative humidity, the hydrogen and oxygen inlet temperatures, the stack temperature, and the current density) vary.

Recently, scholars in Quan et al. (2024) used SVM as a baseline to test the diagnostic accuracy of an enhanced fault diagnosis method for fuel cell systems based on a kernel extreme learning machine optimized with an improved sparrow search algorithm. The reported diagnostic accuracy was 10.4% higher than that of the SVM, while the operation time was only slightly longer than the SVM model's.

In Ma et al. (2022), scientists designed a driving pattern recognition model predictive control (DPR-MPC) scheme for a fuel cell hybrid electric vehicle (FCHEV) to tackle a key technological challenge in FCHEVs: the power allocation between the proton exchange membrane fuel cell (PEMFC) and the lithium-ion battery. Among the methods, three SVMs were used to construct the recognizer, and the fuel cell efficiency was introduced into the cost function. The results showed that the proposal avoided adverse PEMFC operation and reduced fuel consumption by 6.67%.


A third ML approach is k-nearest neighbor (KNN). Its use in PEFC analysis is presented in Onanena et al. (2012). Here, the authors assessed an algorithm based on KNN procedures and a multiclass linear discriminant analysis (LDA) classifier to identify drying and flooding in a PEMFC. The paper reports a pattern-recognition-based diagnosis approach for fault-diagnosing 20 fuel cell stacks using electrochemical impedance spectroscopy (EIS). The methodology followed several steps: first, measurable features were extracted; then a correlation-based feature selection retained only the significant features for fault categorization; finally, each observation was assigned to one of the predefined fault classes. The results obtained with the two proposed feature extraction methods (FE1, FE2) and the two classifiers (LDA, KNN) are reported against the true class (false positive rate, true positive rate). The study concludes that the KNN classifier with the FE1 features achieves the highest classification rate (99.6%).

A different case of using a KNN classifier together with other methods is explained in Detti et al. (2017). In this study, the authors approach PEMFC drying-out and flooding diagnosis through pattern recognition. The KNN method is used with k-means to find faults in the PEMFC. The total harmonic distortion (THD), computed using the fast Fourier transform of the voltage signal, is used as a feature for an overall pattern recognition approach, i.e., fault detection and classification. In summary, faults are first detected through variations in the THD level and the frequency spectrum; then, fault identification is performed using supervised and unsupervised classification, with correct classification rates of 84% and 98.5%, respectively.

In recent years, other studies using the k-nearest neighbor (KNN) approach have been reported in the fuel cell literature. For example, in Saxena et al. (2024), scholars propose a hybrid KNN-SVM machine learning approach for solar power forecasting to increase the accuracy of forecasts from solar farms; based on the results of this study, the approach achieved a prediction accuracy of 98%. One positive consequence of this method combination is increased reliability for power system operators. On the other hand, in Awasthi et al. (2024), the researchers found that KNN enhanced the classification of faults in distribution networks with several generators. The KNN approach was based on a database of fault events derived through steady-state and short-circuit analysis. Besides its demonstrated efficacy, the KNN-based approach accounted for source impedance variation during faults, easing fault-type classification.

An alternative ML method is the decision tree (DT). A case study is examined in Santamaria et al. (2020), where complex associations between pressure drops in PEMFC reactant channels and liquid-phase behavior were disclosed. This study uses a supervised DT algorithm to develop new diagnostic tools for validating flow-field designs. The trained model predicts the pressure drop along a reactant channel using a projected image of the liquid-phase distribution in the channel as input. The results show significant improvement over prior efforts that used wetted-area values to characterize fuel cell performance, which can yield high prediction error since the same wetted area can produce numerous pressure signatures depending on its distribution.

Another example of DT usage is explained in Santamaria et al. (2021), where the method is part of the overall methodology. This work presents a novel technique for two-phase data collection and processing and its use in an ML algorithm. DT regression correlates the liquid distributions in reactant channels with the two-phase flow pressure drop along a 2.4 mm × 3.00 mm transparent channel, in which liquid was injected through a gas diffusion layer (GDL) while air streamed through the channel. The DT models predicted the pressure drop with 90% accuracy, using the liquid distributions as inputs and the related pressure drop data as outputs. The techniques were applied to fluid saturation estimation based on pressure and to flow-field design.

Recently, other scholars have reported interesting results with DT. One case is shown in Lu et al. (2024). As part of a study on counter-flow mass transfer characteristics and performance optimization of commercial large-scale proton exchange membrane fuel cells (PEMFCs), the authors modeled the PEMFC flow field with a decision tree. In their view, DT was chosen for its ability to handle complex datasets, such as those with unknown distributions, and to adapt automatically to dataset-specific characteristics. The model was trained on a dataset split into 80% training and 20% testing.

Another case of DT use is presented in Hai et al. (2024). In this study, researchers modeled a solid oxide fuel cell power plant combined with an absorption-ejection refrigeration cycle. The DT is used in the optimization stage, together with other machine learning methods including support vector machines and neural networks, to reduce cost and computational time. With this optimization, the ratio between the hydrogen output and the feedstock is enhanced by up to 68%, and the cost rate is reduced by up to 9.7 dollars per hour.

A complementary ML method is the random forest (RF). A case using RF and other ML methods in the context of PEMFC technology is Vaz et al. (2023), where scholars searched for new and alternative classes of membranes for the cells. Sulfonated polyimide (SPI)-based hydrocarbon membranes were investigated. Both supervised and unsupervised ML approaches were developed to predict the proton conductivity of SPIs; for instance, a random forest regression (RFR) model identified an additional set of features that can predict proton conductivity with acceptable error. With this knowledge, the correspondence between these features and the proton conductivity class labels was explored, enabling the design of novel SPI polymer electrolyte membranes by relating proton transport at the ionomer stage with factors such as inter-chain interactions and the morphology of the microstructure.

Another case of RF application is presented in Huo et al. (2021), where the RF algorithm, together with convolutional neural networks (CNNs), forms the basis of a performance forecasting approach intended to reduce the number of unnecessary experiments on the membrane electrode assembly (MEA) in PEMFCs. In this research, the RF algorithm is used to choose the important components as model inputs, which, as shown in previous studies, enhances the quality of the training dataset. The CNN is used to create the performance forecast model, whose output is the I-V polarization curve. These steps are taken because obtaining the I-V polarization curve is a complex process involving thermodynamic, electrochemical, and hydrodynamic fields.

Another ML approach is proposed in Zheng et al. (2017), where Bayesian methods are used to detect flooding and drying in a PEMFC from EIS records in an offline manner. In this study, a naive Bayesian classifier was selected. Six operating modes (normal mode, moderate flooding, minor flooding, light flooding, moderate drying, and minor drying) were considered, with twelve input variables. A real PEMFC experimental dataset was used to validate the approach. The results show that the suggested model can effectively recognize fuel cell flooding faults with over 99% accuracy under load-varying conditions.

On the other hand, an interesting DL method is the multi-layer perceptron (MLP), commonly used in the analysis of PEFC technology. In Vaz et al. (2023), the study used an MLP with another surrogate model, response surface analysis (RSA), together with optimization algorithms such as NSGA-II and particle swarm optimization (PSO), to find the optimal cathode catalyst layer (CL) parameters, i.e., platinum loading, the weight ratio of ionomer to carbon, the weight ratio of Pt to carbon, and the porosity of the cathode CL. First, MLP, RSA, and PSO were combined in a single-objective optimization where cell performance was maximized. Subsequently, a multi-objective optimization was run with the MLP integrated with the NSGA-II algorithm, where the objective function was maximized while the overall PEMFC stack price was minimized.

Scholars in Quan et al. (2024) present another MLP application. In this study, the authors seek new and alternative ways to model PEMFC systems, which are nonlinear systems with multiple inputs (drawn current, the gas pressures at the anode and cathode sides, and the humidity of these gases) and a single output, i.e., the cell voltage. The experimental data available for identification is limited; thus, the study investigates an ANN in the form of an MLP network with different numbers of hidden neurons. In the other part of the study, a black-box model based on semi-empirical models available in the literature is developed. Six experimental campaigns were carried out for parameter identification and model validation to optimize the PEMFC behavior.

More recently, other scientists have applied MLP techniques to fuel cells. One case is shown in Ghorbanzade Zaferani et al. (2024), where the power density and voltage behavior of 13 glucose fuel cells were optimized through a multi-layer perceptron (MLP) together with other techniques such as response surface methodology (RSM) and an artificial neural network (ANN). Many tests were conducted using RSM followed by a combined MLP-ANN. Finally, the researchers presented a multi-objective optimization of the power density and voltage of the fuel cells, obtaining the optimum mode of each type of FC. Based on the MLP performance results, the authors state that the proposed models accurately forecast the fuel cells' power density and voltage behavior.

As part of a neural network (NN) approach for different hybrid electric cars, scholars in Liu et al. (2024) used multi-layer perceptron (MLP) controllers with numerous hidden layers. In this study, an adjustable look-up variable, the battery SoC, was introduced as an input to the NN-based approximation of the ending state of the equivalent consumption minimization strategy (ECMS). Based on the results, the scholars reported high adaptability to different charging/discharging modes of the battery in conventional networks under state constraints. Moreover, the proposed NN method reduced computation cost and time by about 95% with a very low loss of optimality compared with dynamic programming approaches. Additionally, the NN method demonstrated average fuel savings of more than 3%.

In summary, different methods excel in various aspects of PEFC analysis, including fault detection, material selection, design optimization, and performance prediction. For example, LR and DT are well-suited for tasks such as fault diagnosis and pressure drop prediction, while KNN and RF excel in feature extraction and classification, and MLP works well for modeling and optimization. However, combining multiple methods often improves performance and accuracy. Therefore, exploring and evaluating machine and deep learning models to determine and compare their performance on the tasks involved in PEFC analysis represents a significant contribution, facilitating understanding of the behavior of this type of cell as well as the development of new analysis methods.


Description of the features and data set split

The original database comprises 182,156 records and 20 variables arising from the fuel cell's energy conversion process and operating conditions (Mao et al. 2017). However, for this study, and considering the literature and the results of the preprocessing stage (Melo et al. 2022; Mao et al. 2021), only 12 variables or features were used. In detail, preprocessing included the following stages: cleaning, integration and exploratory data analysis (EDA), transformation, and reduction. These steps eliminated inconsistent, incomplete, and erroneous data and reduced the size of the dataset without losing important information, using a filter method evaluated with the ANOVA test. Consequently, the data were prepared in a format suitable for analysis with ML or DL models (Melo et al. 2022). Additionally, a categorical variable named "state" was created, which contains the state of health of the PEFC: normal or faulty. The description and unit of measurement of each variable are given in Table 1.
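The ANOVA-based filter reduction described above can be sketched with scikit-learn's `f_classif` scorer. This is an illustrative example on synthetic stand-in data (the real 20-variable PEFC records are not reproduced here); the sizes and random seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 20 features, only some of which are informative
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=12, random_state=0)

# Filter method: keep the 12 features with the highest ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=12)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (1000, 12)
```

On the real dataset, the same call reduces the 20 recorded variables to the 12 retained features.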

Table 1 Recorded measurements from the PEFC preprocessed database

According to the PEFC health states, 94% of the data corresponds to normal operating conditions and the remaining 6% to the failure state, so the database is imbalanced in favor of the normal state. Tyagi and Mittal (2019) explain that real-world datasets are often unbalanced and that this imbalance negatively affects the accuracy of class predictions in classification problems. The bias can be managed by either oversampling the minority classes or undersampling the majority class.

Therefore, the unbalanced database was first divided into training and test data before applying the ML and DL classification algorithms. Then, balancing techniques were applied and the training and testing procedure was repeated, allowing a comparison of the classification algorithms' results on both the unbalanced and balanced databases. For a better understanding, Fig. 1 summarizes the applied methodology.
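The ordering in this workflow matters: splitting before resampling keeps the test partition at the real-world imbalance. A minimal scikit-learn sketch on synthetic stand-in data (sizes and seed are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in mimicking the 94% normal / 6% faulty imbalance
X, y = make_classification(n_samples=2000, weights=[0.94, 0.06],
                           random_state=0)

# Split first; stratify so both partitions keep the class proportions
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Resampling (SMOTE / Near-Miss) would be applied to X_tr, y_tr only
print(len(X_tr), len(X_te))  # 1600 400
```

Balancing only the training partition avoids information leakage and keeps the evaluation representative of field conditions.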

Fig. 1
figure 1

Flow chart of fault classification procedures

Resampling techniques

Synthetic minority over-sampling technique (SMOTE)

SMOTE is a sampling method in which the minority classes of a feature set are oversampled with synthetic examples to rebalance the original training data set. Following the KNN logic, new minority-class samples are created by interpolating between minority-class instances that lie close together in feature space (Tyagi and Mittal 2019; Fernandez et al. 2018). Due to its simplicity and robustness across different types of problems (Azad et al. 2022; Demidova and Klyueva 2017; Rupapara et al. 2021), SMOTE has become a standard in learning from unbalanced data (Fernandez et al. 2018). In recent years, its application has been extended to fuel cell fault detection problems, improving the accuracy of learning model evaluation (Wang et al. 2021; Zhang et al. 2020; Xiao et al. 2022).

In the present study, this method was applied using the imbalanced-learn library, which is compatible with scikit-learn in Python. The technique was applied to the training data set only; afterward, the different classification models were trained on the balanced training set.
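The interpolation idea behind SMOTE can be illustrated in a few lines of NumPy. This is a toy sketch of the technique, not the imbalanced-learn implementation used in the study; the function name, sample counts, and seeds are illustrative assumptions.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """Toy SMOTE: create n_new synthetic minority samples by interpolating
    between a random seed sample and one of its k nearest minority
    neighbours in feature space."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    k = min(k, n - 1)
    neigh = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)       # random seed samples
    mate = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factors in [0, 1)
    return X_min[base] + gap * (X_min[mate] - X_min[base])

# E.g. 6 faulty-state records expanded with 94 synthetic ones
X_min = np.random.default_rng(0).normal(size=(6, 12))
X_syn = smote_sample(X_min, n_new=94, rng=1)
print(X_syn.shape)  # (94, 12)
```

Because each synthetic point is a convex combination of two minority samples, the oversampled set stays inside the region already occupied by the minority class.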

Near-miss under-sampling technique

Near-Miss is a subsampling method that removes records corresponding to the majority class from the training data set to reduce the distribution bias. Like SMOTE, this method follows the KNN logic for subsampling (Peng et al. 2019). Likewise, it has proven to be a robust technique in different applications on classification problems with large-scale unbalanced datasets (Bao et al. 2016; Mqadi et al. 2021). Despite its contribution to improving the performance of classification learning models, its scope is still limited in the field of renewable energies (Kulkarni et al. 2021). There is no evidence of its application to fuel cell fault detection problems.

As in the previous method, the imbalanced-learn Python library was used to apply Near-Miss. In this library, the Near-Miss technique has three versions: (a) the first version selects the majority-class samples with the smallest mean distance to the nearest samples of the minority class; (b) the second version selects the majority-class samples with the smallest mean distance to the farthest samples of the minority class; (c) the third version keeps the nearest neighbors of each minority-class sample and then selects the majority-class samples based on the mean distance between them and their nearest neighbors (Lemaître et al. 2016). The third version is applied to the training data set in this work.
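As a toy illustration of the Near-Miss selection rule, version 1 (the simplest of the three; the study itself applies version 3 via imbalanced-learn) can be written directly in NumPy. Names, sizes, and seeds here are illustrative assumptions.

```python
import numpy as np

def near_miss_v1(X_maj, X_min, n_keep, k=3):
    """Toy Near-Miss version 1: keep the n_keep majority samples whose
    mean distance to their k nearest minority samples is smallest."""
    # Distances from every majority sample to every minority sample
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=-1)
    mean_k = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_k)[:n_keep]          # closest-to-minority majority points
    return X_maj[keep]

rng = np.random.default_rng(0)
X_maj = rng.normal(size=(100, 12))   # majority class ("normal" state)
X_min = rng.normal(size=(6, 12))     # minority class ("faulty" state)
X_down = near_miss_v1(X_maj, X_min, n_keep=6)
print(X_down.shape)  # (6, 12)
```

Versions 2 and 3 differ only in which distances drive the ranking, as described above.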

Machine learning approach

Based on the review of the state of the art shown in the previous section, the following algorithms have been selected for this study:

Logistic regression (LR)

The logic of LR is based on two possible outcomes, usually represented as "success" or "failure", or in mathematical language "1" or "0". The sigmoid function, an S-shaped curve, is used to model the connection between the independent variables and the probability of the dependent variable. This relationship becomes linear after applying the natural logarithm of the odds, i.e., the ratio of the probability of success to the probability of failure (Madushani et al. 2023).

The LR is a technique used for statistical modeling in which the probability, \({P}_{1}\), of the dichotomous outcome event is related to a set of explanatory variables in the form (Eslamibidgoli et al. 2021; Schumacher et al. 1996; Vach et al. 1996):

$$logit\left({P}_{1}\right)={\text{ln}}\left(\frac{{P}_{1}}{1-{P}_{1}}\right)= {\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}+{\beta }_{3}{x}_{3}+\dots +{\beta }_{n}{x}_{n}={\beta }_{0}+\sum_{i=1}^{n}{\beta }_{i}{x}_{i}$$

In Eq. (1), \({\beta }_{0}\) is the intercept and \({\beta }_{1}, {\beta }_{2},\)…, \({\beta }_{n}\) are the coefficients associated with the independent variables \({x}_{1}, {x}_{2},\)…, \({x}_{n}\). The coefficients are most commonly estimated by maximum likelihood estimation (MLE).

The LR models the variations in the logarithm of the odds of the response variable rather than the changes in the response variable itself. The independent (explanatory) variables can be of many kinds: dichotomous, discrete, continuous, or a combination. LR does not assume linearity of the relationship between the explanatory variables and the response, and Gaussian-distributed independent variables are not required. Because it is the logarithm of the odds, not the probability itself, that is linearly related to the explanatory variables, the probability of an event as a function of the explanatory variables is nonlinear, as derived from Eq. (1):

$${P}_{1}\left(x\right)=\frac{1}{1+{e}^{-logit\left({P}_{1}\left(x\right)\right)}}=\frac{1}{1+{e}^{-({\beta }_{0}+ \sum_{i=1}^{n}{\beta }_{i}{x}_{i})}}$$

For the specific case of Eq. (2), \({\beta }_{0}=-\frac{\mu }{s}\) is known as the intercept, where \(\mu\) is the location parameter, i.e., the midpoint of the curve, at which \({P}_{1}\left(\mu \right)=0.5\), and \(s\) is a scale parameter. Moreover, \({\beta }_{1}=\frac{1}{s}\) is a rate parameter.

The LR will impose the probability values of P1 (x) to lie between 0 and 1 (P1 → 0 as the right-hand side of Eq. (2) approaches − ∞, and P1 → 1 as it approaches + ∞).

For example, to predict whether a PEFC is working correctly, LR analyzes the PEFC data and calculates the probability that it is functioning correctly. It is like a calculator whose answer ranges from 0 (probable failure) to 1 (correct operation).
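A minimal sketch of this idea with scikit-learn's LogisticRegression on synthetic data. The three features and the labeling rule below are hypothetical stand-ins, not the study's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# hypothetical standardized features (e.g., voltage, current, temperature)
X = rng.normal(size=(200, 3))
# hypothetical rule: health depends on a noisy linear combination of features
y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(scale=0.3, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
# Eq. (2): the model outputs a probability between 0 (faulty) and 1 (healthy)
proba_healthy = model.predict_proba(X[:1])[0, 1]
print(model.score(X, y), proba_healthy)
```

The fitted `model.coef_` and `model.intercept_` correspond to the \(\beta_i\) and \(\beta_0\) of Eq. (1).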

Support vector machine (SVM)

SVM is a classification method proposed by Platt (Platt 1998), which has been accepted in diagnosing fuel cell failures from a binary perspective. In other words, as Li et al. (Li et al. 2014b) indicate, the SVM is based on a hyperplane that looks for an optimal point to separate the data into two classes and, in turn, maximizes the margin between the hyperplane and the training records closest to each category (support vectors).

The binary SVM consists of training and testing processes based on \(({N}_{1}+{N}_{2})\) collected labeled samples \({z}_{1}, {z}_{2}, \dots , {z}_{{N}_{1}+{N}_{2}}\) from classes 1 and 2, where \({g}_{n}\in \left\{-1, 1\right\}\) is the class label of the sample \({z}_{n}\) (−1 for class 1, 1 for class 2). Training then solves the following quadratic problem:

$$minJ\left(x\right)=\frac{1}{2}\sum_{n=1}^{{N}_{1}+{N}_{2}}\sum_{m=1}^{{N}_{1}+{N}_{2}}{x}_{n}{x}_{m}{g}_{n}{g}_{m}k\left({z}_{n},{z}_{m}\right) -\sum_{n=1}^{{N}_{1}+{N}_{2}}{x}_{n}$$
$$s.t. {\sum }_{n=1}^{{N}_{1}+{N}_{2}}{x}_{n}{g}_{n}=0, 0\le {x}_{n}\le D \mathrm{\ for\ n }= 1, 2, \dots , {N}_{1}+{N}_{2}.$$

The constraints of \(minJ\left(x\right)\) form a set of linear restrictions typical of optimization problems, where \({x}_{n}\) and \({g}_{n}\) are variables or parameters. Here, \({N}_{1}+{N}_{2}\) is the number of variables to be optimized, and D is the upper bound for each \({x}_{n}\). The term \({\sum }_{n=1}^{{N}_{1}+{N}_{2}}{x}_{n}{g}_{n}\) is a linear combination of the two sets of variables. Together, these constraints guarantee that the required conditions are met.

where \(x={\left[{x}_{1}, {x}_{2}, \dots , {x}_{{N}_{1}+{N}_{2}}\right]}^{T}\) and \(k({z}_{n}, {z}_{m})\) is a kernel function. The support vectors \({z}_{1}^{s}, {z}_{2}^{s}, \dots , {z}_{S}^{s}\) and the corresponding \({g}_{n}\) and \({x}_{n}\), denoted \(\left\{{g}_{n}^{s}\right\}\) and \(\left\{{x}_{n}^{s}\right\}\), are saved. Support vectors are those samples whose corresponding \({x}_{n}>0.\)

SVM can map the input data into a higher-dimensional feature space in which a separating hyperplane is easier to find; this approach is called the kernel trick. Linear, polynomial, radial basis function (RBF), and sigmoid kernels are common kernel functions.

For a new sample z, its class label is determined by:

$$g=sign\left\{{\sum }_{n=1}^{S} {x}_{n}^{s}{g}_{n}^{s}k\left({z}_{n}^{s}, z\right)+b\right\}$$

where b \(=\frac{1}{2}{\sum }_{j=1}^{S} \left({g}_{j}^{s}- \sum_{n=1}^{S}{x}_{n}^{s}{g}_{n}^{s}k\left({z}_{n}^{s}, {z}_{j}^{s}\right)\right)\)

From Eq. (4), \({g}_{j}^{s}\) is the output label associated with the j-th support vector, while \({x}_{n}^{s}\) and \({g}_{n}^{s}\) are the coefficient and label of the n-th support vector, respectively. The kernel function \(k\) measures the relationship between a sample and a support vector, e.g., \(k\left({z}_{n}^{s}, z\right)\) between the new sample z and the support vectors.

Once trained, the SVM can be used for categorization by determining on which side of the hyperplane a test sample lies.

In simple terms, if you want to classify different types of faults in PEFC, SVM draws an imaginary line between the different types of faults to separate them. It is like a judge who decides what type of failure a PEFC has based on its data.

In the case of fuel cells, the classes correspond to the state of health (SoH) of the cell and can be healthy or faulty. Specifically, in this study, the SVM was configured in a supervised way from the variable created for the classification. The performance and efficiency analysis of the algorithm focuses on the database for testing.
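The separating-hyperplane idea can be sketched with scikit-learn's SVC. The two synthetic clusters and the RBF/C settings below are illustrative assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# two well-separated clusters standing in for "healthy" (-1) and "faulty" (+1)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(+2, 0.5, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
# the support vectors are the training samples with nonzero dual coefficients x_n
print(len(clf.support_vectors_), clf.predict([[-2, -2], [2, 2]]).tolist())
```

New samples are labeled by the sign of the decision function, as in Eq. (4); `clf.decision_function` exposes the value inside the sign.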

K-nearest neighbor (KNN)

According to Demidova and Klyueva (2017), KNN should be the method of choice among classification approaches when there is no information about the data distribution. The method originated as a non-parametric approach to pattern classification that developed into the KNN rule. Cover and Hart (1967) established some of the formal properties of the KNN rule, Fukunaga and Hostetler (1975) related it to the Bayes error rate, and Dudani (1976) introduced distance-weighted variants.

This method generally tries to find the k closest points in the data to a set of query points. The KNN classifier usually uses the Euclidean distance between specified training samples and a test sample (Eslamibidgoli et al. 2021).

$$d({x}_{i},{x}_{l}) =\sqrt{{({x}_{i1}-{x}_{l1})}^{2}+{({x}_{i2}-{x}_{l2})}^{2}+\dots {+({x}_{ip}-{x}_{lp})}^{2}}$$

In Eq. (5), \({x}_{i}\) is an input sample with p features, which are \({x}_{i1}\), \({x}_{i2}\),…, \({x}_{ip}\). Moreover, n is the total number of input samples (i = 1, 2…, n). Figure 2 depicts an example of KNN.

Fig. 2
figure 2

K-nearest neighbor example: Voronoi tessellation showing the Voronoi cells of 19 samples marked with a "+". Figure taken from Azad et al. (2022)

For example, if you want to know what type of failure an FC has, KNN looks for the FCs with faults most similar to the one you are observing and, based on its data, predicts what type of fault it has. It's like asking your neighbors what type of fault their FCs have to find out what type of fault yours has.
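A minimal sketch of the KNN rule using the Euclidean distance of Eq. (5). The two-feature samples and Healthy/Faulty labels are toy values, not the study's variables:

```python
from sklearn.neighbors import KNeighborsClassifier

# toy training set: two "healthy" points near the origin, two "faulty" points far away
X = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.9]]
y = ["Healthy", "Healthy", "Faulty", "Faulty"]

# classify each query by majority vote among its k=3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.2, 0.1], [0.8, 0.8]]).tolist())  # ['Healthy', 'Faulty']
```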

Decision tree (DT)

The DT is an approach to multifaceted decision situations that is broadly used in many industrial applications. A complex function can be represented by DT algorithms that classify new, unseen data, which endows the method with a generalization capability. One of the benefits of DTs is therefore their ability to break a complicated decision into a set of simpler decisions based on the desired objectives (Han and Chung 2017). Another advantage of DT approaches is that they are robust, non-parametric, and computationally efficient classifiers (Detti et al. 2017).

From a diagram point of view, a DT is a flowchart-like arrangement in which each internal node represents an "assessment" of an attribute, each branch represents the outcome of that test, and each leaf node represents a class label, i.e., the decision taken after evaluating all features; the paths from root to leaf form the classification rules.

A DT consists of three types of nodes (Santamaria et al. 2021): end nodes, classically characterized by triangles; decision nodes, classically represented by squares; and chance nodes, typically denoted by circles. Figure 3 shows a schematic of decision trees.

Fig. 3
figure 3

Decision Tree Based Approaches in Data Mining. Image taken from Zhang et al. (2020)

From a mathematical point of view, based on training vectors \({x}_{i}\in {R}^{l}, i=1, \dots , l\) and a label vector \(y\in {R}^{l}\), DT partitions the feature space recursively so that samples with the same label or target value are grouped. Then, \({Q}_{m}\) with \({n}_{m}\) samples could represent the data at node \(m\). For each split \(\theta =(j,{t}_{m})\) consisting of a feature j and threshold \({t}_{m}\), the data partition will be into \({Q}_{m}^{left}(\theta )\) and \({Q}_{m}^{right}(\theta )\) subsets, where:

$${Q}_{m}^{left}\left(\theta \right)=\left\{\left(x,y\right)|{x}_{j}\le {t}_{m}\right\}$$
$${Q}_{m}^{right}\left(\theta \right)={Q}_{m}\backslash {Q}_{m}^{left}\left(\theta \right)$$

The split of node \(m\) is determined by an impurity function \(H()\), which depends on the problem being solved (classification or regression). In this case, the objective is a classification outcome taking values 0, 1, …, K−1; for node \(m\), let \({P}_{mk}\) be the proportion of class k observations. If \(m\) is a terminal node, the predicted probability for the region is \({P}_{mk}\):

$${P}_{mk}= \frac{1}{{n}_{m}}\sum_{y\in {Q}_{m}}I(y=k)$$

Common impurity functions are Gini Eq. (8) and Log loss or Entropy Eq. (9):

$$H\left({Q}_{m}\right)= \sum_{k}{P}_{mk}(1-{P}_{mk})$$
$$H\left({Q}_{m}\right)= -\sum_{k}{P}_{mk}log({P}_{mk})$$

A simple example is that if you want to decide whether an FC needs maintenance or not, a DT helps you decide by asking questions about the FC data, such as voltage, current, and temperature. It's like a flow chart that guides you toward the best decision.
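The impurity functions of Eqs. (8) and (9) can be computed directly from class counts; a small self-contained sketch:

```python
import math

def gini(counts):
    """Gini impurity, Eq. (8): sum_k p_k * (1 - p_k)."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def entropy(counts):
    """Entropy, Eq. (9): -sum_k p_k * log(p_k), natural log; empty classes skipped."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c)

# a node holding 8 "Healthy" and 2 "Faulty" samples
print(round(gini([8, 2]), 3))  # 0.32
```

A pure node (all samples in one class) has zero impurity under both measures, which is the condition that stops further splitting.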

In practice, Decision Trees are often used as building blocks for more complex methods such as Random Forests or Gradient Boosting Machines, which combine multiple Decision Trees to enhance their performance.

Random forest (RF)

The basis of an RF is the base learner, a binary tree built using recursive partitioning. The classification and regression tree (CART) methodology is the approach used to construct the base learner. This procedure makes many binary splits, recursively separating the data into near-homogeneous or homogeneous terminal nodes (Vaz et al. 2023).

An RF is usually an ensemble of many trees (often thousands), where each single tree is grown using a bootstrap sample of the initial data. In Breiman's RF model, every tree is built from a training sample set and a random variable; the random variable corresponding to the k-th tree is denoted Θk, with the random variables independent between any two trees, resulting in a classifier h(X, Θk), where X is the input vector. After a certain number of runs (say, k), a classification sequence h1(x), h2(x), …, hk(x) is obtained, forming a multi-classifier system. The final decision function is (Meiler et al. 2012):

$$H(x) =\underset{Y}{{\text{argmax}}}\sum_{i=1}^{k}I({h}_{i}\left(x\right)=Y)$$

where H(x) is the combined classification model, \({h}_{i}\) is a single DT model, Y is the output (class) variable, and I() is the indicator function. Each tree can choose the best classification path depending on the input, as shown in Fig. 4.

Fig. 4
figure 4

Random Forest. Figure taken from Peng et al. (2019)

In RF, the margin function quantifies the extent to which the average number of votes at (X, Y) for the right class exceeds that for any wrong class. The margin function is defined as (Meiler et al. 2012):

$$mg\left(X,Y\right)={av}_{k}I\left({h}_{k}\left(X\right)=Y\right)-{max}_{j\ne Y}{av}_{k}I\left({h}_{k}\left(X\right)=j\right)$$

Then from Eq. (11), it can be stated that the larger the margin value, the higher the accuracy of the classification prediction and the more confidence in classification.

To illustrate, if you want to predict when an FC will fail, RF combines many decision trees to get a more accurate prediction. It's like having a group of experts giving you different opinions on when the FC will fail.
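The majority vote of Eq. (10) reduces to counting the trees' predicted labels; a minimal sketch with hypothetical per-tree predictions:

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Eq. (10): the ensemble returns the class with the most votes
    among the individual tree predictions h_1(x), ..., h_k(x)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# three hypothetical trees vote on one sample
print(forest_vote(["Faulty", "Healthy", "Faulty"]))  # Faulty
```

The difference between the winning vote share and the best losing share is exactly the (empirical) margin of Eq. (11) for that sample.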

Gaussian Naive Bayes (NB)

Gaussian NB is a variant of the Naive Bayes method for continuous data based on the Gaussian (normal) distribution. To understand this method, it is important to remember that Naive Bayes is a family of supervised ML classification algorithms based on Bayes' theorem. It is a simple classification technique but offers remarkable functionality, mainly when the dimensionality of the input data is high. Complex classification problems can also be addressed with the Naive Bayes classifier.

The logic of Gaussian Naive Bayes (NB) starts with Bayes' theorem, which provides a way to compute the probability of a hypothesis based on observed evidence. Mathematically, Bayes' theorem is expressed as:

$$P\left(A|B\right)=\frac{P\left(B|A\right) . P(A)}{P(B)}$$

where \(P\left(A|B\right)\) is the probability of event A given event B; \(P\left(B|A\right)\) is the probability of event B given A; \(P(A)\) and \(P(B)\) are the probabilities of events A and B, respectively. When working with continuous data, a common assumption is that the continuous values associated with each class are distributed according to a normal (Gaussian) distribution. The class-conditional likelihood is then assumed to be (Wang et al. 2020):

$$P\left({x}_{i}|y\right)=\frac{1}{\sqrt{2\pi {\sigma }_{y}^{2}}}{e}^{-\frac{{({x}_{i}-{\mu }_{y})}^{2}}{2{\sigma }_{y}^{2}}}$$

In Eq. (13), \({x}_{i}\) is the variable, y is the class, \({\mu }_{y}\) is the mean, and \({\sigma }_{y}^{2}\) is the variance. This variance can be assumed to be independent of y (i.e., \({\sigma }_{i}\)), independent of \({x}_{i}\) (i.e., \({\sigma }_{k}\)), or both.

For example, if you want to know if an FC has a specific problem, NB analyzes the FC's data and calculates the probability that it has the problem. It's like a detective putting together clues to solve a case.
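A minimal sketch of the Gaussian NB decision for a single feature, evaluating Eq. (13) per class and picking the most likely one. The per-class voltage statistics are illustrative numbers, not values from the study's dataset, and equal priors are assumed:

```python
import math

def gaussian_likelihood(x, mu, var):
    """Class-conditional likelihood P(x_i | y), Eq. (13)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# hypothetical (mean, variance) of stack voltage per health class
classes = {"Healthy": (60.0, 4.0), "Faulty": (45.0, 25.0)}
x = 58.0  # observed voltage
# with equal priors, the posterior-maximizing class maximizes the likelihood
scores = {c: gaussian_likelihood(x, mu, var) for c, (mu, var) in classes.items()}
print(max(scores, key=scores.get))  # Healthy
```

With several features, the "naive" independence assumption multiplies the per-feature likelihoods before comparing classes.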

Deep learning approach

Multi-layer perceptron (MLP)

In recent years, MLP has also been used in the diagnosis of the health status of fuel cells. This is due to their ability to approximate nonlinear input/output relationships. This behavior is characteristic of fuel cells, so it is considered an appropriate method for diagnosing their condition and estimating the main characteristics influencing their performance (Napoli et al. 2013; Priya et al. 2018).

In this case, it is known that the behavior of the data is nonlinear (Melo et al. 2022). Therefore, the MLP was applied as a binary classification algorithm where the input corresponded to the sixteen previously mentioned variables, and the output was the variable constructed to know the state of fuel cell health. As in previous algorithms, the analysis focuses on the test data.

The logic of MLP starts with a feature vector X as input, followed by the computation of the network outputs by feedforward propagation. The activation \({h}_{i}\) of the i-th unit in each layer can then be expressed as:

$${h}_{i}^{(1)}={\varphi }^{\left(1\right)}\left(\sum_{j}{\omega }_{ij}^{(1)}{x}_{j}+{b}_{i}^{(1)}\right)$$
$${h}_{i}^{(2)}={\varphi }^{\left(2\right)}\left(\sum_{j}{\omega }_{ij}^{(2)}{h}_{j}^{(1)}+{b}_{i}^{(2)}\right)$$
$${y}_{i}={\varphi }^{\left(3\right)}\left(\sum_{j}{\omega }_{ij}^{(3)}{h}_{j}^{(2)}+{b}_{i}^{(3)}\right)$$

From Eq. (14), \({h}_{i}^{(1)}\) and \({h}_{i}^{(2)}\) represent the activations of the first and second hidden layers. The terms \({\varphi }^{\left(1\right)}\) and \({\varphi }^{\left(2\right)}\) represent the activation functions applied to the weighted sums of the inputs. In addition, \({\omega }_{ij}^{(1)}\) and \({\omega }_{ij}^{(2)}\) embody the weight of the connection between the j-th unit in the previous layer and the i-th unit in the current layer. On the other hand, \({x}_{j}\) is the j-th input feature. Finally, \({b}_{i}^{(\mathrm{1,2},3)}\) are the biases associated with the i-th unit of each layer.

MLP distinguishes \({\varphi }^{\left(1\right)}\) and \({\varphi }^{\left(2\right)}\) because different layers may have different activation functions. Each layer can also contain multiple units, so all its units' activations can be collected in an activation vector h(); similarly, each layer's weights form a weight matrix W(), and each layer has a bias vector b():

$${h}^{(1)}={\varphi }^{\left(1\right)}\left({W}^{(1)}x+{b}^{(1)}\right)$$
$${h}^{(2)}={\varphi }^{\left(2\right)}\left({W}^{(2)}{h}^{(1)}+{b}^{(2)}\right)$$
$$y={\varphi }^{\left(3\right)}\left({W}^{(3)}{h}^{(2)}+{b}^{(3)}\right)$$

Finally, all the training examples can be combined into a single matrix X, and each layer's hidden units for all the training examples can be stored as a matrix H(), where each row contains the hidden units for one example. Using transposes, this is written as follows:

$${H}^{(1)}={\varphi }^{\left(1\right)}\left({XW}^{(1)T}+{1b}^{(1)T}\right)$$
$${H}^{(2)}={\varphi }^{\left(2\right)}\left({H}^{(1)}{W}^{(2)T}+{1b}^{(2)T}\right)$$
$$Y={\varphi }^{\left(3\right)}\left({H}^{(2)}{W}^{(3)T}+{1b}^{(3)T}\right)$$

The equations in (16) have the same form as those in Eqs. (14) and (15), but in matrix form for all training examples at once.

In other words, if you want to create an automatic system to detect FC faults, MLP is an artificial network that learns to detect faults by analyzing many examples of FC data with different types of faults. It's like a “teacher” who teaches you how to detect different types of FC failures.
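The batched forward pass of Eq. (16) can be sketched in a few lines of NumPy. The layer sizes, random weights, and sigmoid activation below are assumptions for illustration; the study's actual architecture is not specified here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, W2, b2, W3, b3):
    """Batched feedforward pass of Eq. (16): rows of X are training examples."""
    H1 = sigmoid(X @ W1.T + b1)      # first hidden layer, H^(1)
    H2 = sigmoid(H1 @ W2.T + b2)     # second hidden layer, H^(2)
    return sigmoid(H2 @ W3.T + b3)   # output layer, Y

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                        # 4 samples, 16 input features
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(1, 8)), np.zeros(1)
print(mlp_forward(X, W1, b1, W2, b2, W3, b3).shape)  # (4, 1)
```

For binary health classification, the single sigmoid output is read as the probability of the "healthy" class; training would adjust W and b by backpropagation.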

Model testing (MT)

Kuhn & Johnson (Bao et al. 2016) recommend using sensitivity and specificity to evaluate binary classification models, since these measures characterize the accuracy of the models. Likewise, they use the receiver operating characteristic (ROC) curve to represent the relationship between these measures. The sensitivity, or recall, or "true positive rate", is the proportion of samples with the event of interest that are correctly predicted. The specificity, or "true negative rate", is the proportion of samples without the event that are correctly predicted. Likewise, the precision can be calculated from the true positives: it is the proportion of true positives among all predicted positives.

$$Sensitivity\ (Recall)=\frac{TP}{TP+FN}$$
$$Specificity=\frac{TN}{TN+FP}$$
$$Precision=\frac{TP}{TP+FP}$$

where TP and TN represent true positives and true negatives, respectively, and FP and FN represent false positives and false negatives, respectively.

Another useful measure for evaluating this type of algorithm is the F-score. According to Sokolova et al. (2006), this measure favors algorithms with higher sensitivity and challenges those with higher specificity. It is calculated from the precision and recall of the test, applying additional weights. The F-score ranges from 0 to 1, with 1 corresponding to perfect precision and recall and 0 being the lowest and least favorable value.

$$F-Score=\frac{({\beta }^{2} + 1) * precision * recall}{{\beta }^{2} * precision + recall}$$

From Eq. (20), recall is the ratio of true positive predictions to the total number of positive instances in the dataset; it is sometimes called sensitivity. The value of \({\beta }^{2}\) adjusts the trade-off between precision and recall: \({\beta }^{2}=1\) makes the F-score the harmonic mean of precision and recall, i.e., equal weight for both, while other values place more emphasis on one of the two.
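Eq. (20) can be computed directly; a minimal sketch (the 0.9/0.6 precision and recall values are illustrative):

```python
def f_score(precision, recall, beta=1.0):
    """Eq. (20): weighted harmonic mean of precision and recall.
    beta = 1 weights both equally (the usual F1)."""
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

print(round(f_score(0.9, 0.6), 2))  # 0.72
```

Setting `beta > 1` emphasizes recall; `beta < 1` emphasizes precision.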

Finally, the computational time is another important factor to consider when comparing and evaluating the performance of classification models (Wang et al. 2020; Li et al. 2014b; Priya et al. 2018; Mao and Jackson 2016). For this reason, in this study, the training phase's computational time was considered another performance measure.

Experimental results

This section details the performance evaluation of the applied classification algorithms based on some measures common to the methods. Based on these, the most appropriate method for detecting faults in the state of health of the fuel cell is analyzed.

Performance evaluation measures

According to the performance of the metrics, most of the algorithms achieve high precision (greater than 98%) and F1-Score (greater than 99%) in both cases (Healthy and Faulty). In detail, LR, SVM, KNN, DT, RF, and MLP have similar performance. However, NB has slightly lower performance compared to the other algorithms.

On the other hand, it is observed that the precision and F1-Score decrease in all algorithms, especially for the "Faulty" class. Specifically, the reduction in performance is more significant for NB, KNN, and MLP. In contrast, DT and RF still achieve reasonable accuracy and F1-Score (greater than 90%) for the “Faulty” class.

Overall, the performance evaluation measurements indicate that all the algorithms can correctly evaluate the cell's health under normal operating and fault conditions. However, it is necessary to highlight that KNN, DT, RF, and MLP perform better than the other models when determining the state of the cell, which suggests analyzing other metrics to guarantee their reliability. Table 2 shows the results of the measurements by model.

Table 2 Performance evaluation measures of binary classification algorithms

Receiver operating characteristic (ROC) curve

In this sense, the accuracy of the binary classification models was compared on the originally unbalanced data and on the data balanced with the different sampling techniques. According to the results shown in Table 3, the accuracy of LR, KNN, and NB follows a similar trend regardless of the data, which makes them models with stable performance and possible generalization. On the other hand, DT and RF stand out for their perfect accuracy on the unbalanced data; they keep the same value on the data balanced with the ROS and SMOTE-OS techniques and do not vary significantly with the other sampling techniques. Finally, the accuracy of the SVM is affected when the model is applied to data balanced with the NearMiss-U technique, dropping considerably from 0.997 to 0.059, unlike the other techniques, where the accuracy differs by only 0.001. In the same way, the accuracy (0.999) of MLP is reduced to 0.059 when the data is balanced with the NearMiss-U technique.

Table 3 Models' accuracy comparison with sampling for imbalanced data

Overall, the high accuracy of most models on balanced data indicates their potential for accurate fault detection in PEFC. Likewise, comparing performance on balanced and unbalanced data sets highlights the importance of data balance for analysis accuracy. However, it is worth noting that in the real world, obtaining high-quality, balanced data sets for PEFC can be challenging.

In addition to the exposed indicators, the ROC curve that relates the sensitivity and specificity was constructed by applying the models to the unbalanced data. Once the ROC curve of the different ML models was obtained, the AUC was obtained, the results of which are illustrated in Fig. 5. The ROC curve for all algorithms is above the diagonal line, which indicates that all models have a performance better than chance. The highest AUC is obtained by the LR, KNN, DT, and RF models, with a value close to 1. The SVM and NB models have a slightly lower AUC, but still good performance. Specifically, again the SVM model presents the lowest value compared to the other models that exceed 0.976 in the AUC.

Fig. 5
figure 5

ROC curves comparison with the imbalanced database for ML models. The figure shows the ROC Curve and AUC for different binary classification algorithms applied to a PEFC data set. The algorithms evaluated include LR, SVM, KNN, DT, RF, and NB. The AUC is a measure of a model's ability to distinguish between the "Healthy" and "Faulty" classes. An AUC of 1 indicates perfect performance, while an AUC of 0.5 indicates random performance

In practice, a high AUC indicates that the model is capable of detecting PEFC faults with high accuracy. Therefore, comparing the AUC of different algorithms helps to identify the best-performing model for fault detection. Likewise, the AUC can be used to select an optimal classification threshold that balances the sensitivity and specificity of the model.

The same procedure was followed for the DL model, MLP. As presented in Fig. 6, the relationship between the true positive and false positive rates results in a near-perfect AUC of 0.998. Consequently, MLP, together with the ML models KNN, DT, and RF, presents the best AUC results. Specifically, the MLP model seems suitable for fault detection in PEFC based on the AUC. However, it is important to consider the limitations of the MLP model, such as its low interpretability and the possibility that its performance may not generalize to other data sets.

Fig. 6
figure 6

ROC curve with the imbalanced database for the DL model. The figure shows the ROC Curve and AUC for the MLP algorithm applied to a PEFC data set

Time score

According to the results shown in Table 4, the algorithm that generates the lowest computational cost and can predict in a more timely manner the failures in the health status of the cell is the NB, which is consistent with that stated by other authors (Wang et al. 2020; Li et al. 2014b; Hatti et al. 2006; Kamal and Yu 2014; Lin et al. 2019). In contrast, SVM and MLP demand higher computational costs, so they would not be suitable for online fault detection.

Table 4 Training time of binary classification algorithms

In general, training time is an important factor to consider when choosing a machine learning algorithm for fault detection in PEFC. However, it is necessary to consider that this time may vary with the size and complexity of the data set used. The latter leads to exploring techniques to reduce training time, such as hyperparameter optimization or the use of specialized hardware.


Discussion

The results show that the LR, KNN, DT, RF, and NB models present similar and optimal trends in their ability to correctly determine the healthy and faulty states of the PEMFC. The precision, recall, F1-score, and accuracy values exceeded 0.98 when applied to both the balanced and unbalanced data. In practical terms, this stability becomes an important factor when considering a possible extension to other fuel cell data sources of different functionality, for example, the solid oxide fuel cell (SOFC). In addition to their optimal performance, these models presented a lower computational cost, with training times less than or equal to 11.46 s, contributing to their effectiveness and efficiency.

On the other hand, although the SVM and MLP algorithms present optimal results in terms of their ability to classify healthy conditions of the fuel cell, they do not remain stable when determining cell failures. Likewise, when applied to unbalanced data, they present better precision than when applied to data balanced by techniques such as NearMiss-U, which reflects their weak generalization. Lastly, their application requires a much higher computational cost than the other models, with training times exceeding 100 s, which makes them unfeasible for PEFC fault diagnosis.

Table 5 summarizes the results of the models according to the performance of the indicators and the computational cost.

Table 5 Comparison of performance evaluation measures of classification algorithms

These results show that, while some ML methods like LR and DT offer better interpretability, many DL models, such as MLPs, can be "black boxes," making it difficult to understand how they arrive at their predictions. This can hinder trust and acceptance in practical applications. Also, in the future, integrating ML and DL models into existing PEFC systems and infrastructure requires careful consideration of factors like computational requirements, real-time operation, and potential safety implications.


Conclusions

This article presents an approach to PEMFC stack fault diagnosis that relies on different ML and DL techniques to provide a comprehensive view of the performance and computational cost that each technique demands. The procedure starts from a previously processed data set whose scope includes feature extraction and the categorization of the state of health of the cell. Then, considering that the data provided more information about the healthy state than about failure conditions, the algorithms were applied to both the unbalanced data and data balanced using different sampling techniques. The results showed that the LR, KNN, DT, RF, and NB models present similar and optimal trends in terms of performance indicators and computational cost, unlike SVM and MLP, whose performance is affected when the data is balanced and which even present a higher computational cost.

From the theoretical aspect, the advantage of the present methodology is that it is scalable and can be adapted to other classification problems with fuel cells. These results also have practical implications in the energy sector. On the one hand, the application of ML and DL methods for fault detection and prediction in PEFCs has the potential to significantly improve their diagnostics and prognostics, leading to more effective preventive maintenance, increased cell availability, and reduced operational costs. On the other hand, ML and DL-based approaches can be valuable tools for optimizing PEFC design and selecting materials with improved properties. Consequently, more efficient, durable, and cost-effective cells can be achieved.

However, it is necessary to mention that training and validating ML and DL models require large amounts of high-quality data. Collecting and organizing such data can be challenging, especially for complex tasks like PEFC analysis. Future research should address data availability by developing efficient techniques for data acquisition and processing, and should explore the configuration of parameters, optimization algorithms, and model-specific settings, since this study considered models with default configurations. Future studies are also expected to validate the results experimentally through case studies and to develop robust, scalable methodologies for integrating ML and DL models into industrial environments.

Therefore, this is a promising proposition for diagnosing the faults associated with PEMFCs. In the future, it is recommended to extend the proposed approach to other types of health conditions, increasing the number of classes and covering different kinds of fuel cells.

Data availability

Data are available from the corresponding author upon request.



Abbreviations

AI: Artificial intelligence
ANN: Artificial neural networks
BPNN: Back propagation neural network
CNN: Convolutional neural network
DL: Deep learning
DT: Decision tree
ECMS: Equivalent consumption minimization strategy
EIS: Electrochemical impedance spectroscopy
FC: Fuel cell
FCHEV: Fuel cell hybrid electric vehicle
FDA: Fisher discriminant analysis
FDI: Fault detection and isolation
GA-KELM: Genetic algorithm-based KELM
GDL: Gas-diffusion layer
KELM: Kernel extreme learning machine
KNN: K-nearest neighbors
LDA: Linear discriminant analysis
LR: Logistic regression
ML: Machine learning
MLP: Multi-layer perceptron
NB: Naive Bayes
NN: Neural network
PEFC: Polymer electrolyte fuel cell
PSO-KELM: Particle swarm optimization-based KELM
RF: Random forest
RPROP: Resilient propagation
RSM: Response surface methodology
SMOTE: Synthetic minority over-sampling technique
SoH: State of health
SVM: Support vector machine



Funding

Financial support was obtained internally from ESPOL.

Author information

Contributions
Conceptualization: EM., JB-M. and ME-A.; methodology: EM., JB-M. and ME-A.; software: EM. and JB-M.; validation: HN. and ME-A.; formal analysis: JB-M. and HN.; investigation: EM.; resources: JB-M. and ME-A.; data curation: EM. and JB-M.; writing, original draft preparation: EM. and HN.; writing, review and editing: EM., JB-M. and HN.; visualization: EM. and HN.; supervision: JB-M. and ME-A. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Mayken Espinoza-Andaluz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Melo, E., Barzola-Monteses, J., Noriega, H.H. et al. Enhanced fault detection in polymer electrolyte fuel cells via integral analysis and machine learning. Energy Inform 7, 10 (2024).
