
Recently emerging trends in big data analytic methods for modeling and combating climate change effects


Big climate change data have become a pressing issue for organizations, which must find methods to analyze data generated from various data types. Moreover, the storage, processing, and analysis of data generated from climate change activities are becoming massive and are challenging for current algorithms to handle. Big data analytics methods are therefore designed to handle the large amounts of data required to enhance seasonal change monitoring and to understand and ascertain the health risks of climate change. In addition, analysis of climate change data would improve the allocation and utilisation of natural resources. This paper provides an extensive discussion of big data analytic methods for climate data analysis and investigates how climate change and sustainability issues can be analyzed through these approaches. We further present the big data analytic methods, their strengths and weaknesses, and the essence of analyzing big climate change data using these methods. The common datasets, implementation frameworks for climate change modeling, and future research directions are also presented to clarify these compelling climate change analysis challenges. These big data analytics methods are well-timed to address the inherent issues of data analysis and to ease the realization of the sustainable development goals.


Climate change and sustainable development have become important research areas due to their relevance to various facets of economic life. At present, climate change (CC) is a pressing issue, and governments, organizations, and researchers are working seriously on ways to analyze the data generated from vast climate change and sustainability-related activities. Climate change is considered a global threat, just like the coronavirus disease (COVID-19), a pandemic that had caused over 6.9 million deaths as of 12 December 2023. Both phenomena constitute a major threat to global existence. The contagious novel coronavirus (Tao et al. 2020; Sharma and Gupta 2021) has also had a deeply negative impact on national economic development in recent times. Big data analytics methods have successfully assisted in analyzing large climatic data and contagious disease data, and have recorded impressive results. Climate change is best described as a change in the earth’s climate at local, regional, or global scales, attributed largely to the increased levels of atmospheric carbon dioxide (CO2) produced by the use of fossil fuels (Sebestyén et al. 2021). It is further explained as the changes in the earth’s climate driven primarily by human activity since the pre-industrial period, particularly the burning of fossil fuels and the removal of forests, resulting in a relatively rapid increase in CO2 concentration in the earth’s atmosphere.

Combating climate change, one of the sustainable development goals (SDGs) (Huan et al. 2021), has been transformed and enhanced through big data analytics (BDA). The high volumes of data generated from climate change sources such as satellites, fossil fuel use, the earth’s orbit, and climate simulations can be regarded collectively as big data. The process of examining large climate change data to reveal hidden patterns, extract useful information, and uncover correlations that support decision-making is termed big data analytics (Abdullah et al. 2020; Ikegwu et al. 2021). BDA methods offer storage, processing, and decision support, as well as sound strategies against climate change effects, and they improve people’s resilience in the face of the adverse effects of climate change through trend, pattern, prediction, and technical analysis of the generated data.

However, data analysis, uncertainty in data storage, and processing related to climate change have become serious concerns for human and wildlife habitation. Several research studies have been carried out on big data analytics for big climate change data processing. Nonetheless, to the best of our knowledge, none of these studies has comprehensively and specifically addressed the particular BDA methods for analysing large climate change data. For instance, an earlier study by Hassani and Huang (2019) reviewed big data and climate change and pinpointed the value of big data and its applications in climate change-related studies in recent years. However, useful tools and BDA methods for tackling diverse, large climate change data were not discussed. Another recent study (Ornella 2020) presented a survey on “why nature won’t save us from climate change but technology will”. This study highlighted the need to approach climate change issues with technologies, by which the author means carbon capture, storage mechanisms, etc. Nonetheless, the technological methods specifically dedicated to addressing storage, data uncertainty, and processing were left out. A more recent study by Sebestyén et al. (2021) reviewed the applications of data science and big data, and the importance of big data tools, in climate change. It does not, however, cover the big data and big data analytics methods needed to solve the particular issues inherent in climate change in order to attain the sustainable development goals (SDGs) (Huan et al. 2021; Türkeli 2020).

Consequently, this review is a well-timed investigation of big data analytics methods for combating climate change for sustainable development. Based on the available studies in the literature, there are no specific reviews that provide important discussion on these all-encompassing methods-based BDA approaches. As such, the contributions of this investigation to the current body of knowledge in big data, big data analytics, and climate change are presented in Fig. 1. The specific contributions of this paper to the body of knowledge are as follows:

  • To provide extensive discussion of big data analytics methods for climate change effects, and describe how climate change issues can be analysed and resolved;

  • To present the data type and source of big climatic data for climate change analysis;

  • To comprehensively explore the prospective analytics methods to handle vast climate change data including their strengths and weaknesses;

  • To highlight the purpose of big data analytics methods in climate change and sustainable development.

Fig. 1 Taxonomy of big data analytics methods for climate change

The remainder of this paper is organized as follows: “The impact of climate changes” section presents the impact of climate change. “The conventional methods” section discusses the methods for modeling and combating climate change. “The big data analytic methods” section presents the big data analytics methods. “Purpose for analysing big climate data using big data analytic methods” section discusses the purpose of analysing big climatic data using big data analytic methods. The common datasets for modeling and combating climate change effects are presented in “The common datasets for modeling and combating climate change effects” section. Implementation frameworks for climate change modeling using big data analytics are discussed in “Implementation frameworks for climate change modeling using big data analytics” section. Finally, “Open research directions” section provides the conclusions and future research directions.

The impact of climate changes

Climate change has impacted various areas of human endeavor and the ecosystem. The atmospheric carbon dioxide (CO2) level is rising rapidly due to industrialization, which has touched every aspect of life. The areas most widely affected include health, the economy, and agriculture. The impact of climate change is represented in Fig. 2.

Fig. 2 The impact of climate change


Climate change has far-reaching effects on human health, and it is no longer in dispute that it is affecting healthy living. Global warming is a result of a consistent increase in the levels of atmospheric carbon dioxide (CO2) generated from fossil fuels (Sebestyén et al. 2021), and it continues to threaten and wreak havoc on human life and its surroundings. For instance, the recent study by Habibullah et al. (2022) uses global data on species such as fishes, birds, amphibians, mammals, plants, and reptiles from 115 countries to investigate the impact of climate change on biodiversity loss. The results show that three variables (temperature, precipitation, and natural disaster occurrence) increase the loss of biodiversity. Another related study by Vicedo-Cabrera et al. (2021) examined heat-related human health impacts using empirical data from 732 locations in 43 countries to estimate mortality burdens. It shows how climate change has negatively affected human health: 37.0% (range 20.5–76.3%) of heat-related deaths are attributable to anthropogenic climate change. Nevertheless, other factors, such as government and public attitudes toward climate change, were not considered. Other effects of climate change on human health include infectious diseases, air pollution, and heat waves, as seen in the case of Karachi, Pakistan; however, weather instability and environmental policies were not addressed (Babar et al. 2021). Furthermore, climate change causes anxiety and affects mental health, as evidenced by functional and cognitive-emotional impairment, including symptoms of Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD), in a rare empirical study (Schwartz et al. 2023). However, this research is based on associations between constructs and leaves the direction of those relationships unresolved.
A related study showed that climate change causes both emotional and psychological trauma even in young people aged 16 to 25 years across the 10 countries surveyed (Clayton et al. 2023). The implications of these climate change perceptions have prompted a clarion call for collaborative action by educators, stakeholders, and international communities (Ogunbode 2025).


The burden of climate change on the economy and general business activities is a serious concern. Climate change has progressively affected the small open economy, as assessed using the Keynesian dynamic stochastic general equilibrium model (Economides and Xepapadeas 2019). It also affects the macroeconomy, as reviewed in central banks’ monetary policy assessments of the inflation outlook (Andersson et al. 2020). This calls for technological innovation and fiscal revenue generation as energy efficiency increases and the price of renewable energy falls. Countries such as Zambia, Pakistan, Nigeria, Bangladesh, and Ethiopia face increasingly intense weather and anthropogenic climate change, which is becoming a global threat. The environment, energy usage, economic growth, and gross domestic product (GDP) are being negatively impacted. Many researchers and global bodies are working to provide strategies, policies, and tools to ameliorate these impacts. For instance, global warming of 1.5 °C presents unresolved policy issues; Semieniuk et al. (2021) therefore proposed limiting climate change by reducing energy demand, although this remains practically impossible because of the large energy savings it would require in a growing global economy and industrial production. The adverse effects of climate change have also been causing considerable damage to economic viability. In this regard, Bhopal et al. (2021) examined the climate-resilient green economy (CRGE) approach, adopted in 2011 as a low-carbon development strategy by the Ethiopian Government. The approach also aims to reduce vulnerabilities in both health and the economy; however, the persistent nature of climate change still poses threats. In addition, a recent study by Hossain et al. (2022) used a dynamic computable general equilibrium model to analyse the effect of climate change on Bangladesh’s economy.
The model identifies a 6.17% decrease in the average gross domestic product (GDP) of Bangladesh and a 7.7% decline in investment as a result of climate change. Additionally, climate change has caused considerable economic and biophysical uncertainty. The study by Deangelo et al. (2023) used seaweed growth and techno-economic models to assess the costs of global seaweed production in the context of net-zero greenhouse gas emissions. Farming seaweed at a scale relevant to the global carbon budget is quite challenging, and it requires more digital models such as Artificial Intelligence (AI) and other sophisticated tools to reduce economic and biophysical problems. Similarly, global warming is eroding economic growth. For instance, Tol (2024) reports a meta-analysis of the overall impact of climate change on the economy, which is close to the growth impact estimated as a function of weather shocks. The social cost of carbon follows a pattern close to that of the overall impact estimates. The social cost depends on emissions-related issues such as parameterization, the rate and extent of warming, and the aggregation of impacts across people and over time. Nonetheless, the impacts of moderate warming in the short and medium term deserve further attention. More recent studies on the impact of climate change on the economy include Anh et al. (2023) and Javadi et al. (2023).


Food insecurity and scarcity are worsening as climate change intensifies. The world’s climate system has gradually changed over time as a result of human activities, and these changes have affected natural ecosystems and general living standards. Despite strategies and planning to address this rapid change, agricultural food production is still negatively affected. Agriculture plays a vital role in human sustainability and in the gross domestic product (GDP) of every nation, as no nation can survive without food. The survival of agriculture, the natural ecosystem, and a nation’s capacity building depends on policies, strategies, and campaigns against harmful greenhouse gas emissions. Climate change will further aggravate the inherent problem of meeting the continuously increasing food demand of a growing world population. Nonetheless, researchers are working around the clock to find optimal solutions. For instance, climate change has disrupted agriculture and reduced agricultural total factor productivity (TFP) by 34% as a result of high temperatures, drought, heat waves, floods, etc. (Fuglie 2021). These uncertainties continue to trail agriculture, as farmers in Iran have reported in the face of changes in precipitation, temperature, and CO2 fertilization (Karimi et al. 2018). In addition, climate change is causing agricultural food vulnerability, as reported by farmers in Norte de Santander, Colombia (Núñez et al. 2018) and in ASEAN agricultural sectors (Handayani and Abubakar 2020). Similarly, climate change adversely affects crops (e.g. maize, rice, cotton, sugarcane, bajra, and gram) due to the high CO2 emissions observed in Pakistan, Nigeria, and globally (Rehman et al. 2022). Additionally, Ngoma et al. (2021) analysed the impact of climate change on household welfare and agriculture in Zambia.
This was achieved using a chain of modeling tools called the systematic assessment of climate-resilient development framework (SACRED) to address the uncertainty arising from climate challenges. Adaptation interventions are therefore being sought to sustain the future of smallholder agriculture across the globe. The SACRED method was also used to provide a consistent framework for assessing the temperature, precipitation, and evaporation impacts of different climate scenarios, based on dynamic computable general equilibrium models, in Ethiopia and Mozambique (Manuel et al. 2021; Solomon et al. 2021). Further investigations have been carried out to ascertain the impact of climate change on agriculture. Anh et al. (2023) investigated the short- and long-term effects of climate change on Vietnam’s agriculture at the macro level with reference to production and values, using an Autoregressive Distributed Lag (ARDL) model. The results show that climate change unfavorably affects food security in Vietnam and beyond, thereby undermining poverty alleviation and sustainable development. Also, an assessment carried out in Central India, particularly in the Vidarbha region, shows that 70 farmers from the study areas who engaged in cotton cultivation were marginalized due to climate change. According to responses to the designed questionnaire, climate change has an adverse impact on crop sowing, growth, post-harvest activities, livestock, etc. Therefore, in all the cases and scenarios reviewed, it is clear that climate change has a strongly negative effect on agricultural practice and on sustainable food security. This echoes predictions of global warming and the resulting climatic change, at different degrees and magnitudes, over Sri Lanka and other nations by the year 2025 (Nanda and Perera 2019).

The methods for modeling and combating climate change

In this section, the conventional methods and big data analytic methods are discussed. Figure 3 gives a pictorial view of the methods for modeling and combating climate change. The strengths and weaknesses of these methods are presented in Table 1.

Fig. 3 The methods for modelling and combating climate change

Table 1 Strengths and weaknesses of the methods for modeling climate change

The conventional methods

There are different conventional or traditional methods for modeling and combating climate change. The most effective modeling approaches include the computable general equilibrium (CGE) model, the statistical model, the gateway belief model (GBM), and the soil and water assessment tool (SWAT) model. These are succinctly discussed in the following subsections.

Computable general equilibrium (CGE)

The computable general equilibrium model (Hossain et al. 2022) is a standard tool for empirical analysis and is used for economic analysis worldwide. It captures the supply and demand sides of the economy and allows for adjustments (e.g. in quantities and prices) in response to a policy shock. A CGE model can be static or dynamic (DCGE). The static model is best used when the specific economic period is known, whereas the DCGE model allows the features of short- and long-term policy shocks to be determined. CGE is used globally to capture direct and indirect effects, represent the economy, and explicitly maintain consistency at the macro and micro levels (Shahpari et al. 2021). The CGE model also traces how a change in any exogenous variable alters quantities in the economy and gives rise to consequences throughout the system. Its data are numerically calibrated to an algebraic framework and solved by mathematical programming, which captures the basic features of the dynamic model. In addition, the essence of the CGE model is to explain the function of prices in an economy. Its strength lies in its flexibility, since it can adapt to a wide range of policies and shocks. Furthermore, the CGE model has three major components: data, theory, and shocks. These are combined to produce results, which depend on numerical predictions of the changes in the economic system. For instance, a recent study by Hossain et al. (2022) utilised a dynamic computable general equilibrium model to analyse the impact of climate change on Bangladesh’s economy. The result shows that Bangladesh’s GDP changes by − 6.17% and investment by − 7.76%. Further projections show reductions of − 3.08% and − 3.7% for 2030 and 2050, especially in the rice sectors. Also, Jensen et al. (2021) applied the CGE model to analyse seven current and future climate change challenges for Myanmar, using state- and region-specific paddy yield changes at 30-year intervals (the 2020s, 2050s, and 2080s). This application revealed strong differences in the effects of climate change on agricultural production across the nation’s states and regions. Furthermore, DCGE was deployed in the analysis of Mozambique’s agricultural sector and household welfare (Manuel et al. 2021). It shows acute negative impacts on the agricultural sector, especially for maize and cassava, as crop yields decrease. In Ethiopia’s agricultural production, DCGE identifies declines in the production of teff, maize, and sorghum of 25.4, 21.8, and 25.2%, respectively, by 2050 compared to the base period of 2025 (Solomon et al. 2021). In addition, Ethiopia is projected to lose 31.1% of its agricultural GDP at factor cost by 2050 as a result of climate change. However, CGE is not generally used for all-purpose climate change modeling because it focuses on economic policies and shocks, and it lacks computational libraries for dynamic variable data analysis.
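To make the idea of prices and quantities adjusting to a shock concrete, the sketch below solves a single market with linear supply and demand before and after a supply shock. It is only a toy partial-equilibrium illustration, not a CGE model and not the method of any study cited above; the function name, the linear functional forms, and all parameter values are assumptions chosen for illustration.

```python
def equilibrium(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Solve one linear market (hypothetical toy, not a full CGE model).

    Demand: Qd = demand_intercept - demand_slope * price
    Supply: Qs = supply_intercept + supply_slope * price
    Setting Qd = Qs gives the market-clearing price and quantity.
    """
    price = (demand_intercept - supply_intercept) / (demand_slope + supply_slope)
    quantity = demand_intercept - demand_slope * price
    return price, quantity


# Baseline market with invented parameters.
baseline = equilibrium(100.0, 2.0, 10.0, 1.0)   # (30.0, 40.0)

# A climate-related yield loss is represented here as a leftward supply shift
# (supply intercept falls from 10 to -5); this mapping is an assumption.
shocked = equilibrium(100.0, 2.0, -5.0, 1.0)    # (35.0, 30.0)

print("baseline price, quantity:", baseline)
print("shocked price, quantity:", shocked)
```

In this toy case the shock raises the price and lowers the traded quantity. A real CGE model solves many such markets simultaneously, with households, producers, and government linked through calibrated data.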

Statistical model

The statistical model is one of the key models used in analysing and predicting climatic conditions. Statistical modeling methods range from simple regression to non-parametric spatiotemporal Bayesian models deployed for effective data modeling and analysis to study past climate change (Sweeney et al. 2018). The statistical model is known for its structural simplicity. The statistical models deployed for the assessment and evaluation of climate change data include Regression, Bivariate, Frequency Ratio (FR), Evidential Belief Function (EBF), and Ordered Weight Average (OWA) models, which have been widely used (Yariyan et al. 2020; Siddiqua et al. 2021). For instance, Yariyan et al. (2020) combined different statistical models (e.g. FR, EBF, and OWA) to map flood susceptibility in Kurdistan Province, Iran, in order to reduce the harmful effects of flooding driven largely by the impact of climate change on rainfall. The hybrid of the models achieved 95.1% efficiency. Also, the Flood Potential Index (FPI) was computed with 2 bivariate statistical models (Costache 2019). This used 10 flood conditioning factors together with 158 flood pixels and 158 non-flood pixels, and the results were validated through the ROC curve. The results indicate that 25% of the upper and middle basin of the Prahova river is affected as a result of climate change impact. A recent study by Yang et al. (2022) used a geographically and temporally weighted regression (GTWR) model to show the effect of climate change on corn yield in the U.S. Corn Belt, which makes it possible to retain the spatiotemporal heterogeneity of the original data. The study showed that, over the 40 years from 1981 to 2020, climate change had a positive effect on corn yield, with temperature having a larger effect than precipitation.
Moreover, statistical models have proved useful for evaluating and projecting the best approaches, simulations, strategies, measures, and policies in different regions and states such as Indiana (Hamlet et al. 2020), the Jemma sub-basin in the upper Blue Nile Basin of Ethiopia (Worku et al. 2020), Scotland’s Atlantic salmon rivers (Jackson et al. 2018), and regions of Pakistan (Siddiqua et al. 2021). However, data are now generated in generic and spontaneous ways and require advanced data analysis libraries or techniques to handle.
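As an illustration of one of the bivariate statistics named above, the following sketch computes a frequency ratio (FR) per class of a conditioning factor. It uses the commonly cited definition (share of flood pixels in a class divided by the share of all pixels in that class); whether the cited studies used exactly this variant is an assumption, and the slope classes and pixel counts are invented.

```python
def frequency_ratio(class_pixels, flood_pixels):
    """Frequency ratio per class of one conditioning factor.

    class_pixels: total number of pixels in each class.
    flood_pixels: number of flood pixels observed in each class.
    FR = (flood pixels in class / all flood pixels)
         / (pixels in class / all pixels)
    """
    total_pixels = sum(class_pixels.values())
    total_floods = sum(flood_pixels.values())
    ratios = {}
    for name, n_pixels in class_pixels.items():
        flood_share = flood_pixels.get(name, 0) / total_floods
        area_share = n_pixels / total_pixels
        ratios[name] = flood_share / area_share
    return ratios


# Hypothetical slope classes and counts, for illustration only.
slope_pixels = {"0-5 deg": 500, "5-15 deg": 300, ">15 deg": 200}
slope_floods = {"0-5 deg": 60, "5-15 deg": 30, ">15 deg": 10}

for name, fr in frequency_ratio(slope_pixels, slope_floods).items():
    print(f"{name}: FR = {fr:.2f}")
```

A ratio above 1 means a class holds more flood pixels than its share of the area would suggest, so it is read as more flood-prone; a ratio below 1 indicates the opposite.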

Gateway belief model (GBM)

The gateway belief model (Van Der Linden 2021) is a dual-process theory of attitude change that captures what political strategists have recognized for decades: perceived scientific agreement plays a key role in people’s attitudes about contested scientific issues. The model is relevant because some people oppose the scientific judgment that climate change is caused by humans. It therefore concerns the degree of people’s belief in, and perception of, climate change: how real people think climate change is and how much they worry about the issue. These perceptions depend on the level of perceived consensus and the extent to which people endorse it. Kerr and Wilson (2018) empirically evaluated whether messages about the scientific consensus on the reality of anthropogenic climate change and the safety of genetically modified food shift perceptions of scientific consensus. Perceived consensus was found to inform personal beliefs about climate change; nonetheless, the results indicated limitations in the impact of single, one-off messages. How people view climate change remains debated, as some authors argue that environmentalism is not the main cause of thoughts or behaviors about climate change. Rather, the evolved social needs for belongingness, understanding, control, self-enhancement, and trust are more practical intervention targets than attempts to create environmentalist beliefs or identities (Brick et al. 2021). A study by Van Der Linden et al. (2019) presented a large-scale confirmatory replication of the GBM on a national quota sample of the US population (N = 6301). The results support the hypotheses of the GBM: a change in perceived scientific consensus causes subsequent changes in cognitive (belief) and affective (worry) judgments about climate change, which in turn are associated with changes in support for public action.

Soil and water assessment tool model (SWAT)

The soil and water assessment tool model (Akoko et al. 2021) is a hydrological modeling tool dedicated to hydrologic and environmental simulations. It is a physically based, semi-distributed, continuous-time hydrological model. The SWAT model was developed to assess water resources and to predict the impacts of land use/cover changes, including land management practices, on soil erosion, sedimentation, and non-point source pollution in large river basins/watersheds. The SWAT model has been widely applied in the assessment of climate change impact globally. For example, Li and Fang (2021) applied the SWAT model to assess the impact of climate change on streamflow in the Mun River in the Mekong Basin, Southeast Asia. The result shows that climate change has a great impact on the river’s streamflow: monthly flow changes were negatively related to temperature (p < 0.05) in the dry season and positively linked to precipitation (p < 0.01) in the wet season. It also projected streamflow increases of 10.5%, 20.1%, and 23.2% during 2020–2093 under three climate scenarios. This provides a scientific basis for adaptive management, although the model is suited only to hydrologic and environmental simulation practices. Also, Aznarez et al. (2014) employed the SWAT model and remote sensing data to analyse the climate change impact on hydrological ecosystem services in Laguna del Sauce, Uruguay. The result shows that water resources were negatively affected in the Laguna del Sauce catchment, particularly under the representative concentration pathway (RCP) 8.5 scenario. However, a comparative analysis of other locations is not covered. Furthermore, water erosion monitoring and prediction were analysed to ascertain the effect of climate change in the R’ Dom watershed in Morocco using the revised universal soil loss equation (RUSLE) and SWAT (Alitane et al. 2022).
The SWAT model was also employed to model surface water availability under climate change in a semi-arid basin, the El Kalb River, Lebanon (Kalb et al. 2021), and to assess the future impact of climate change on the hydrology of the Mangoky River, Madagascar (Finaritra et al. 2021). All these studies are geared towards providing adaptation strategies, projections, and evaluations for tackling climate change issues, managing water resources, and water engineering.
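For reference, the revised universal soil loss equation mentioned above is usually written as a product of five factors. The form below is the standard textbook statement of RUSLE rather than a detail taken from Alitane et al. (2022), and the units given afterwards follow the common SI convention, which is an assumption about how the cited study expressed them.

```latex
A = R \cdot K \cdot LS \cdot C \cdot P
```

Here A is the mean annual soil loss (typically t ha^-1 yr^-1), R is the rainfall-runoff erosivity factor, K is the soil erodibility factor, LS is the combined slope length and steepness factor, C is the cover-management factor, and P is the support practice factor. LS, C, and P are dimensionless, so climate change enters mainly through changes in R.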

The big data analytic methods

This subsection discusses the big data analytic (BDA) methods or approaches employed to analyse climatic big data (CBD).

Machine learning

Machine learning (Rolnick et al. 2019) is an analytics method that can analyze large climate change data, infer meaning from it, and predict outcomes. Machine learning (ML) helps tackle climate change issues by offering fast data analysis and prediction, helping to reduce greenhouse gas emissions, and assisting society in adapting to a changing climate. It further supports preventing methane leakage from natural gas pipelines and compressor stations, modeling emissions, and improving clean energy access. ML methods such as support vector machine (SVM), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), and naïve Bayes are also used to assess health risks across subpopulations due to climate change effects. For example, DT, SVM, and k-NN were used to detect the daily number of COVID-19 infected and death cases during the pandemic lockdown (Saba et al. 2020). ML was also applied to evaluate anthropogenic and natural climate change by optimizing the spectral features of component sine waves (Abbot and Marohasy 2017). Many other recent works have used ML methods to address climate change effects, including Zia (2021), Manley and Egoh (2022), Davenport and Diffenbaugh (2021), and Ikegwu et al. (2023).
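As a small illustration of one of the classifiers listed above, the sketch below implements k-nearest neighbors in plain Python and applies it to a health-risk labelling task. The features (maximum temperature in °C and relative humidity in %), the labels, and all data points are hypothetical and are not drawn from any cited study; a production analysis would normally use an established ML library and scaled features.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented observations: (max temperature in deg C, relative humidity in %).
points = [(41, 20), (39, 25), (40, 30), (28, 60), (30, 55), (27, 65)]
labels = ["high", "high", "high", "low", "low", "low"]

print(knn_predict(points, labels, (38, 28)))
print(knn_predict(points, labels, (29, 58)))
```

The choice of k = 3 is arbitrary here; in practice k is tuned on held-out data.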

Deep learning

Deep learning (DL) is one of the analytics methods that has played a crucial role in big data analysis. Deep learning (Zhang and Li 2020) is a nonlinear method for simulation and prediction in both the mining and diagnosis of large climate data. It discovers climate patterns, supports climate change prediction, and is capable of uncovering unknown hidden information in big climate data. Well-known DL methods are neural networks (NNs) such as artificial neural networks (ANNs), convolutional neural networks (CNNs), and deep neural networks (DNNs), which are used for climate pattern analysis. NNs are interconnections of simple computing cells known as processing units or neurons. In addition, algorithms associated with these methods, such as Metropolis, Gibbs sampling, simulated annealing, and the variational approach, are deployed for the simulation of large climate data. DL is an important analytics method for large-scale climate data analysis in pursuit of the sustainable development goals’ (Huan et al. 2021) vision on climate change. For instance, ANNs were used for the simulation and forecasting of climate and meteorological variables such as temperature, rainfall, solar radiation, and wind speed (Abbot and Marohasy 2017). DL was also applied in the analysis of remote sensing data on climate change and urbanization (Zhu et al. 2017). Recently, Kurth et al. (2019) used deep learning methods to extract pixel-level masks of harsh weather patterns, achieving a parallel efficiency of 79.0%.
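The description of a neural network as interconnected processing units can be made concrete with a minimal forward pass. The sketch below wires two hidden neurons into one output neuron using a sigmoid activation; the layer sizes, weights, and inputs are arbitrary assumptions, and no training step is shown.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One processing unit: weighted sum of inputs plus bias, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, hidden_layer, output_neuron):
    """Feed inputs through one hidden layer and a single output neuron.

    hidden_layer: list of (weights, bias) pairs, one per hidden neuron.
    output_neuron: a (weights, bias) pair whose weights match the hidden size.
    """
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias)


# Hypothetical scaled inputs (e.g. temperature and rainfall anomalies).
features = [0.3, 0.7]
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.2, -0.7], 0.05)

print(forward(features, hidden_layer, output_neuron))
```

Deep networks stack many such layers and learn the weights from data, typically with gradient-based optimisation in a dedicated framework.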

Artificial intelligence

Artificial intelligence (Sebestyén et al. 2021) is an analytics method for big data analysis that supports simulation and decision-making based on earth observation data and simulated climate data. These methods produce better results when combined with numerical climate model data. For example, Kadow et al. (2020) reconstructed missing climate data in a global climate dataset (HadCRUT4) using an artificial intelligence (AI) image inpainting approach, thereby reducing uncertainties and biases in climate records. Furthermore, AI methods have discovered climate connections that enhance earth system model (ESM) simulation and weather features (Huntingford et al. 2019). Other works that have used the AI approach in combating climate change effects include those of Benzidia et al. (2021), Lozo and Onishchenko (2021), Narayan (2021), Cheong et al. (2022), Avand et al. (2021), Hwang et al. (2021), and Cowls et al. (2021).

Purpose for analysing big climate data using big data analytic methods

The analysis of climate change (CC) and sustainable development (SD) has been transformed and enhanced through big data analytics (BDA) methods. Climate action is goal 13 of the 17 sustainable development goals (SDGs) (Huan et al. 2021), and the purposes of BDA methods in CC and SD include:

  1. (i)

    Climate change detection: Big data analytics methods such as the Hadoop MapReduce framework combined with spatial cumulative sum (SCUMSUM) algorithms are used to reduce large climate data in order to monitor seasonal changes (Manogaran and Lopez 2018). The cumulative sum (CUSUM) algorithm, bootstrap analysis, and spatial autocorrelation methods were utilized to determine slow and drastic changes in the mean value of vast climate data. For example, Manogaran and Lopez (2018) applied the SCUMSUM algorithms and BDA methods to monitor changes in rainfall, precipitation, maximum and minimum temperature, humidity, wind speed, and solar radiation. The approach achieved a performance of up to 81.48%.

  2. (ii)

    Disease identification: Disease identification (Lopez and Sekaran 2016) involves identifying and predicting diseases that occur due to climate change, a task in which analytics methods have played a vital role. These diseases could be illnesses and abnormalities in the human body system. For example, analytics methods such as the CUSUM algorithm and the bootstrap analysis method, combined with big data analytics, help to identify and predict malaria, coronavirus, Ebola virus, dementia, and Parkinson’s (Tao et al. 2020; Sharma and Gupta 2021; Manogaran and Lopez 2018). These methods further accelerate the monitoring of physiological and psychological changes in the human body, such as high temperature, dengue fever, and emotional stress, that occur as a result of climate change. Big data analytics methods have also successfully assisted in analysing contagious disease data and have recorded impressive results in recent studies. For instance, Saba et al. (2020) utilized analytic techniques such as decision trees, support vector machines, and k-Nearest Neighbors to detect and forecast the daily overall number of COVID-19 infections and deaths during the pandemic lockdown. A more recent study by Rasheed et al. (2021) deployed logistic regression (LR) and convolutional neural network (CNN) techniques for post-COVID-19 diagnosis from chest X-ray images. The LR and CNN models achieved accuracies of 95.2–97.6% without principal component analysis (PCA) and 97.6–100% with PCA.

  3. (iii)

    Emission reduction: A combination of technological, natural, and economic approaches is an essential means of combating climate change and will help reduce carbon emissions. Nature alone cannot protect us from climate change effects; a technological or hybrid approach is also needed (Ornella 2020). BDA tools such as Apache Hadoop, MongoDB, and Lambdoop are utilized to reduce carbon emissions in oil exploitation, smart buildings, and smart city development (Zhang and Li 2020; Gomede et al. 2018). In addition, the implementation of a low fossil-carbon energy system and improvements in energy use also reduce carbon emissions. Moreover, remote sensing data generated through earth observation from satellites, aircraft, and ground-based structures are analysed to support decisions, predictions, and forecasts at global, regional, and field scales towards carbon emission reduction.

  4. (iv)

    Monitoring and evaluation: BDA methods facilitate monitoring, evaluation, and adaptation through the utilization of digital devices and data sources such as smartphones, sensors, social media, earth observation data, and climate simulation data. For example, they can be used to examine how an adaptation program designed to enable storm warnings affects the standard of living before, during, and after a storm event, and to monitor whether and how these effects change over time (Ford et al. 2016).

  5. (v)

    Cloud-based ecosystem: BDA methods help to create a cloud-based ecosystem from vast data sources, combined with appropriate techniques and software-as-a-service (SaaS), to revolutionize the agricultural sector, marine ecosystems, and green industries, thereby leveraging the cost-effectiveness of storing voluminous amounts of data in the cloud (Schnase et al. 2017; Jiao et al. 2015).

  6. (vi)

    Decision support: BDA offers decision support and sound strategies against climate change effects, and improves people’s resilience in the face of the adverse effects of climate change through trend, pattern, prediction, and technical analysis using machine learning and artificial intelligence (Knüsel et al. 2019; Heckman et al. 2018). It further offers early warning surveillance, thereby enhancing the capacity to respond to climate change (Ford et al. 2016).

  7. (vii)

    Assessment of disaster damage: Humanitarian operations and crisis management are yielding excellent results through the application of big data analytics techniques (Akter and Wamba 2017). BDA helps to visualize, analyse, and predict disasters, thereby making them easier to manage. Quick disaster response is most compelling because disasters threaten lives and create urgent tasks, including meeting humanitarian needs (food, shelter, clothing, public health, and safety), clean-up, damage assessment, task assignment, and resource allocation. The application of big data such as satellite imagery, Global Positioning System (GPS) traces, mobile Call Detail Records (CDRs), and social media posts, in conjunction with advances in data analytic techniques (e.g., data mining and big data processing, machine learning, and deep learning), can facilitate the extraction of geospatial information that is highly needed for rapid and effective disaster response (Cumbane 2019). For instance, Guo et al. (2020) implemented a data mining approach, an econometric regression model, and an input–output model in a system based on multi-source data including hourly rainfall, geographical conditions, historical and real-time disaster information, socioeconomic data, and defense countermeasures.

  8. (viii)

    Rainfall-runoff modeling: Rainfall-runoff modeling (Xiang et al. 2020) is viewed as a complex nonlinear time series problem. It has been widely used for water resources management, hydro-power development, urban planning, irrigation, and the planning of other agro-hydrological and meteorological activities. Over time, it has attracted much attention from researchers seeking effective time series predictions in hydrology, and the application of big data analytics methods has made prediction more efficient. For instance, Xiang et al. (2020) utilised machine learning and deep learning models (e.g. long short‐term memory and sequence‐to‐sequence) to predict runoff from rainfall data sets. However, the spatial inequality of rainfall within each sub-watershed is still ignored. Also, machine learning methods were used to investigate rainfall-runoff modeling at an hourly timescale, which achieved better accuracy (Muhammad et al. 2020).

  9. (ix)

    Crop recognition: Big data from remote sensors make it possible to identify and classify crop species automatically, quickly, and cost-effectively without relying on human experts (Tantalaki et al. 2019). Remote sensing supports crop map building by pixel classification, which is important for the development of precision agriculture. Crop recognition involves two methods: a joint likelihood decision fusion multi-temporal classifier and a Markov model-based technique (Zhang and Li 2020). The first method selects the class with the highest likelihood of producing the observed single-date/multidate classifications for a given remote sensing image. The second method, the Markov model-based technique, relates the varying spectral response along the crop cycle to plant phenology for different crop classes and recognizes different crops by analysing their spectral temporal profiles over a sequence of remote sensing images. For example, Shelestov et al. (2017) effectively carried out the classification of multi-temporal satellite imagery for crop mapping using big data processing tools such as SVM, decision tree, and random forest classifiers on the Google Earth Engine (GEE) platform.

  10. (x)

    Rainfall estimation: Seasonal (e.g. monthly or yearly) rainfall estimation and prediction are made possible with the help of BDA. The estimates depend directly or indirectly on the input variables. Examples of these input variables include maximum, minimum, and average temperature (°C), vapor pressure (hPa), wind speed (km/h), humidity (%), and cloud cover (%) (Pundru et al. 2022). In addition, a variety of radar data processing algorithms are involved in generating rainfall data products. These include data quality control, rain rate estimation, rainfall accumulation, and conversion of spherical (polar) to geographic coordinates (Seo et al. 2019; Khan and Bhuiyan 2021). These data are analysed with BDA methods for effective rainfall estimation and decision-making. For example, Pundru et al. (2022) utilised machine learning models, namely singular-spectrum analysis with least-squares support vector regression (LS-SVR) and random forest (RF), for rainfall prediction. Root Mean Square Error (RMSE) and Nash–Sutcliffe Efficiency (NSE) were used to assess the performance of the models, with reported values of 71.6% and 90.2% respectively. The models predicted rainfall effectively.
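
The change detection step described under item (i) above can be illustrated with a minimal sketch of CUSUM with bootstrap analysis. This is a generic single-series version written for illustration, not the spatial algorithm of Manogaran and Lopez (2018); the temperature values, the function names, and the number of bootstrap samples are all assumptions.

```python
import random

def cusum(series):
    """Cumulative sums of deviations from the series mean (S_0 = 0)."""
    mean = sum(series) / len(series)
    sums = [0.0]
    for value in series:
        sums.append(sums[-1] + (value - mean))
    return sums

def cusum_range(series):
    """Spread of the CUSUM curve; large values suggest a shift in the mean."""
    sums = cusum(series)
    return max(sums) - min(sums)

def estimate_change_point(series):
    """0-based index of the last observation before the estimated shift."""
    sums = cusum(series)
    # The S_i with the largest magnitude marks the end of the first regime
    peak = max(range(1, len(sums)), key=lambda i: abs(sums[i]))
    return peak - 1

def bootstrap_confidence(series, n_boot=1000, seed=0):
    """Share of shuffled series whose CUSUM range is below the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(series)
    shuffled = list(series)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reordering destroys any real change point
        if cusum_range(shuffled) < observed:
            below += 1
    return below / n_boot

# Hypothetical annual mean temperatures (deg C) with a shift after the 6th value
temps = [14.1, 14.0, 14.2, 13.9, 14.1, 14.0,
         15.0, 15.2, 14.9, 15.1, 15.0, 15.1]
change_index = estimate_change_point(temps)
confidence = bootstrap_confidence(temps)
```

A confidence close to 1 indicates that random reorderings of the same values rarely produce a CUSUM curve as wide as the observed one, which supports the presence of a genuine change in the mean.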

The common datasets for modeling and combating climate change effects

The extensive review showed that various methods or models have been developed for analysing large volumes of climate change data (Zhang and Li 2020). Government agencies and philanthropic bodies are adopting these methods to study trends and patterns and to extract useful data on the changing climate, identifying new opportunities to combat climate change issues. In this section, we therefore focus on the data types and sources and the common datasets utilised by various authors in modeling climate change. Datasets at terabyte, petabyte, and even exabyte scales from diverse sources form a basis for the growth in global warming research.

Data types and sources

Recently, the era of big data has created a variety of datasets from vast sources in climate change domains. These datasets include several modalities, each of which has a different representation, distribution, scale, and density. Hence, data mining and big data analytics methods are applied to discover hidden knowledge in these datasets and to aid quick decision-making (Zhang and Li 2020; Al-shiakhli 2019). This, in essence, helps humans better understand and analyse climate change. Large climate change data types and sources are classified into two categories: earth observation and climate simulation big data. These are briefly discussed below:

  1. (i)

    Earth observation big data (EOBD) type

    The earth observation big data type involves remote sensing observations utilized for monitoring the large-scale variability of global climate and environmental changes (Zhang and Li 2020). EOBD sources include satellite, weather, atmosphere, hydrosphere, biosphere, lithosphere, station, and gridded data. The data generated from these sources help to establish current climate conditions and predict future climate changes. Such datasets are made available by domains or agencies such as the National Aeronautics and Space Administration (NASA), US Government Open Data Initiative (USGODI), SeaDataNet (SDN), National Centre for Environmental Information (NCEI), Climate Research Unit (CRU), and Data Distribution Centre (DDC) (Grotjahn and Huynh 2018; Hartter et al. 2018; Tariq et al. 2019). For instance, Blume et al. (2023) utilized EOBD in the form of a Sentinel-2 mosaic of Bottom of Atmosphere imagery comprising 18,881 single 100 × 100-km tiles, and used random forest machine learning algorithms to assess and analyze the data. The study’s results support spatially explicit seagrass and ocean ecosystem accounting, and further assist policy-making, blue carbon crediting, and related financial investments. Additionally, Béjar et al. (2023) implemented discrete global grid systems (DGGS) using earth observation data cubes with rHEALPix to enable the efficient integration of diverse spatial data. However, a lot more is anticipated from the design framework, including fully functional operations and the integration of rHEALPix-safe and rHEALPix-aware features in the Python application programming interface for the Open Data Cube.

  2. (ii)

    Climate simulation big data (CSBD) type

    The data in this category are generally used to predict future climate change trends and assess their impacts. CSBD sources are validated against a foundational element of climate science, the Coupled Model Intercomparison Project (CMIP) standards (Colorado-Ruiz et al. 2018). The objective of CMIP is to elucidate past, present, and future climate change arising from natural, unforced variability or in response to changes in radiative forcing. Examples of CSBD include higher-resolution simulations of complex physical, chemical, and biological processes. Further data sources include DECK [Diagnostic, Evaluation and Characterization of Klima (climate)], coordination, climate projection data, infrastructure, and documentation data (Zhang and Li 2020). Hence, the CMIP and DECK historical simulations form the major source of CSBD. The datasets are useful in climate change prediction using analytics methods. Datasets of this nature can be found in National Oceanic and Atmospheric Administration (NOAA) reports, NASA Earth Observatory (NASAEO), Global Carbon Project (GCP), and Data: Climate Change (World Bank) (Tariq et al. 2019; Pinkerton and Rom 2014). For example, Nikolaev et al. (2020) trained deep learning methods on general circulation model simulations of an 800-year time series from CanESM2 and tested them on historical data. Nevertheless, depending on the available historical data, models trained on the simulated data need further fine-tuning to lessen overfitting. Additionally, a recent study by Wang et al. (2023) implemented machine learning techniques to model snowmelt runoff in a high-altitude mountainous area in the Xiying River Basin without the need for observation data. These techniques used meteorological data at the watershed level, and snow remote sensing data were used to improve the machine learning model's prediction of snowmelt runoff in alpine mountains.
    This was also supported by the numerical simulation analysis of surface air temperature between 1980 and 2019 carried out for Krasnodar city, Russia (Volvach et al. 2023), and a related comparison with hemispheric and regional sea ice extent derived from NOAA and NASA passive microwave data (Meier et al. 2022). Based on data from the World Data Center (ground based) and the POWER project (space based), the comparison determined that the space-based observations reflected more local circumstances than the ground-based observations. A more recent study by Hayasaka (2024) examined fire weather conditions in Mexico using hourly weather data, simulated climate data, and 20 years’ worth of satellite hotspot and rainfall data, with the aim of improving local fire forecasts and the understanding of future fire trends across the globe.

The datasets, differences, and similarities for modeling and combating climate change effects

Based on the critical review, we identified 42 works that used different or similar datasets to address climate change-related problems, that is, to identify opportunities and to predict and project effective measures, policies, strategies, and technology-driven solutions to combat the effects of global warming. Various datasets have been generated by different authors to study climate change impacts. These datasets are structured and analysed using climate change methods (e.g. CGE, statistical, GBM, SWAT, ML, DL, AI, etc.), which were discussed in “The methods for modeling and combating climate change” section. The data fall into three groups: earth observation big data, climate simulation big data, and social media climate change big data. Of the 42 primary papers reviewed, seventeen (17) used earth observation climate change big data, e.g. (Habibullah et al. 2022; Davenport and Diffenbaugh 2021; Kadow et al. 2020; Avand et al. 2021), twenty-one (21) utilised climate simulation big data, e.g. (Hossain et al. 2022; Handayani and Abubakar 2020; Yang et al. 2022; Zia 2021; Seo et al. 2019), and four (4) made use of social media data, e.g. (Manley and Egoh 2022; Hwang et al. 2021; Liu 2021).

For example, some of the climate change earth observation data that have been utilised by various authors include HadCRUT4, the IUCN Red List, multi-country multi-city (MCC) data covering about 732 locations in 43 countries, social accounting matrix (SAM), DDEM, Oxford station data, 923 PRISM station data, household data covering about 395 households from 8 villages, and a sample of 6301 from the US population obtained through Qualtrics LLC (Habibullah et al. 2022; Vicedo-Cabrera et al. 2021; Davenport and Diffenbaugh 2021; Kadow et al. 2020; Xiang et al. 2020; Atube et al. 2021). Examples of climate simulation datasets include 14 General Circulation Models (GCMs) from CMIPs, 20 GCMs from CMIP6, National Centre for Hydrological Meteorological Forecasting (NCHMF), Resnet 50 dataset, NEXRAD from Amazon Web Service (AWS), Coordination of Information on Environment (CORINE), Regional Climate Model (RCM), and the General Inspectorate for Emergency Situation (GIES) dataset (Costache 2019; Yang et al. 2022; Van Der Linden et al. 2019; Alitane et al. 2022; Zia 2021; Narayan 2021; Muhammad et al. 2020; Seo et al. 2019; Colorado-Ruiz et al. 2018; Wang and Tian 2022).

Similarly, authors who utilised datasets from Coupled Model Intercomparison Project Phases 5 and 6 (CMIP5 and CMIP6) include (Hossain et al. 2022; Hamlet et al. 2020; Colorado-Ruiz et al. 2018; Wang and Tian 2022), while those who used social media datasets include (Manley and Egoh 2022; Hwang et al. 2021; Liu 2021). Authors who used national or global datasets such as GIES, CORINE, NCHMF, HadGEM, SRTMN, RCM, NASS, USDA, and APHRODITE include (Costache 2019; Jackson et al. 2018; Li and Fang 2021; Kalb et al. 2021; Zia 2021; Muhammad et al. 2020; Yang et al. 2020; Nowack et al. 2018). Table 2 presents the datasets, differences, similarities, and methods for modeling and combating climate change effects.

Table 2 The datasets, differences, similarities, and methods for modeling and combating climate change effects

Implementation frameworks for climate change modeling using big data analytics

Processing and analysing the high volumes of climate change data generated or sourced from public sites or databases requires efficient frameworks. The big data analytics modeling platforms or tools include HDFS, Cross MapReduce, YARN, Google BigTable, Spark, Mahout, and Flume. These frameworks are briefly explained below, and their strengths and weaknesses are presented in Table 3.

  1. (i)

    Hadoop distributed file system (HDFS)

    The Hadoop distributed file system (Shih 2018) is a storage framework from the Apache family that is responsible for storing large data. In this context, HDFS stores the large data generated from earth observation and climate simulation datasets. HDFS handles large data while remaining simple to use. It can read data into distributed arrays without introducing a single point of data conversion. For instance, More et al. (2019) utilized HDFS to load historical weather datasets crawled from the National Climatic Data Centres. This dataset was analysed to detect climate change with the help of MapReduce, which was applied to overcome scalability issues. HDFS and MapReduce work together: HDFS stores the data, and MapReduce processes the large datasets and writes the results back to HDFS. For example, a recent study by Greca et al. (2023) deployed big data frameworks such as Hive and Hadoop to store, manage, and process earth surface temperature data from NOAA’s MLOST, NASA’s GISTEMP, and the UK’s HadCRUT. The results show that in the city of Durres, the temperature has increased by 1.1 °C since the pre-industrial era.

  2. (ii)

    Cross MapReduce

    Cross MapReduce is a framework within the Apache Hadoop ecosystem that splits data into a distributed format and performs data mapping, shuffling, and classification to reduce document search (Mirpour et al. 2021). This is essential in the processing of large climate change data because it is capable of processing geo-distributed data. Cross MapReduce (CMR) merges the records that have the same keys in the cluster using the reduce function. CMR contains three sequential components: MapCombine, Gshuffle, and GlobalReducer. These components are jointly utilized to minimize the volume of data transferred globally. For example, Mirpour et al. (2021) proposed a Cross MapReduce framework to minimize the transferred data volume and determine the number and locations of global reducers. This achieved a 40% reduction in the amount of data transferred over the Internet. In addition, MapReduce helps to detect climate change from a large volume of weather data (More et al. 2019). After climate change detection, prediction is needed to turn the data into decision support, hence the use of machine learning.

  3. (iii)

    Yet another resource negotiator (YARN)

    YARN (Cumbane 2019) is a resource management framework that coordinates the components of the Hadoop framework. YARN is deployed on top of HDFS and helps multiple distributed applications process climate big data in parallel. YARN is an important framework because it dynamically handles multiple processing workloads, including real-time interactive processing of climate big data. Furthermore, it improves multi-tenancy, cluster utilization, scalability, and compatibility in climate big data processing and management. For example, Kanwar (2018) implemented a YARN-based method using climate weather datasets. By comparison with the baseline, this improved Hadoop execution, tightened its limits, and fixed scheduling and resource-handling issues.

  4. (iv)

    Google BigTable

    Climate big data storage is important and requires an efficient implementation framework that can handle distributed, column-oriented data storage and a large amount of unstructured climate data (Ikegwu et al. 2022). Google BigTable, which supports Google’s internet search and web service operations, can efficiently provide such storage for climate change data. For example, the choice of database was taken into consideration when implementing geospatial-temporal data processing and storage, and Google BigTable, Cassandra, and similar systems were thought to have a significant impact (Shykhmat and Verses 2023). That analysis includes predictive maintenance for agricultural vehicles. Meteorological stations, weather satellites, and environmental organizations can provide information about the weather, climate, and air quality, and vehicle performance is significantly affected by weather conditions. For instance, excessive heat can cause an engine to overheat; rain and snow can muddy fields, making traction more difficult; and dust and sand are quite harsh on air filters and radiators. However, managing geospatial-temporal data in a distributed fashion involves specific challenges and concerns.

  5. (v)

    Apache Spark

    Apache Spark (More et al. 2019; Ikegwu et al. 2022) is a fast engine for large-scale climate data processing. Spark supports major libraries such as GraphX, the machine learning library (MLlib), the Spark Streaming API, and Spark SQL. It runs programs up to 100× faster in memory, or up to 10× faster on disk, than some other big data frameworks (e.g. Hadoop MapReduce). Spark works well with Hadoop, the YARN cluster manager, the Java VM, and other architectures. Furthermore, Apache Spark can process high volumes of climate big data in memory with a fast response time, and it provides an alternative platform for stream data processing and analysis. For instance, Xu et al. (2020) utilized Spark for accurate multi-step forecasting of wind speed big data. The study shows that the Spark distributed computing framework computes faster when processing climate big data than other baseline processing frameworks.

  6. (vi)

    Apache Mahout

    Mahout (Zhang and Li 2020) is a machine learning library and framework that provides scalable, easy-to-use, and extensible libraries for big data analytics. Mahout-Samsara is a newer version of the project that helps users build their own distributed algorithms instead of using the existing library. However, it is not user-friendly because its configuration is not compatible with existing Hadoop clusters. Nonetheless, some companies, such as LinkedIn and Mendeley, have utilised Mahout in big data analytics. For example, Bhavanandam (2022) utilized Mahout to expedite unsupervised machine learning algorithms for crop yield prediction based on weather. This paradigm facilitates the identification and forecasting of various agricultural production conditions. In addition, Mahout was used with machine learning models and Internet of Things data to determine the climate parameters that affect the study, evaluation, and forecasting of Acer mono sap liquid (Lee 2020). Despite the study's 98.25% accuracy rate, issues with daily yields persist because the study relies on the specific data provided by the Korea Forest Service and the Meteorological office.

  7. (vii)

    Apache Flume

    Apache Flume is a streaming framework platform that enables distributed, reliable, and accessible web services from different sources to efficiently collect, aggregate, and move large amounts of climate data to a centralized defined datastore (Ikegwu et al. 2022; Doreswamy and Manjunatha 2017). Apache Flume provides simple and flexible architecture based on streaming data flows with robust, fault-tolerant, reliable, failover and recovery mechanisms to ensure efficient big data processing and management. This architecture has been widely utilized in climate change-related data processing and streaming. For example, Bouziane et al. (2021) suggested using a cloud-based architecture like Apache Flume, IoT, Hadoop, etc. to speed up the intelligent management of water resources because climate change raises energy and water consumption globally.
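
The division of labour between the map, shuffle, and reduce phases described under the HDFS and Cross MapReduce items above can be sketched in plain Python. This is a single-process illustration of the programming model, not Hadoop code; the station identifiers, readings, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical weather records: (station_id, year, temperature in deg C)
RECORDS = [
    ("ST01", 2019, 14.0), ("ST01", 2019, 15.0),
    ("ST01", 2020, 15.5), ("ST02", 2019, 22.0),
    ("ST02", 2020, 23.0), ("ST02", 2020, 24.0),
]

def map_phase(record):
    """Emit one (key, value) pair per record; value carries (sum, count)."""
    station, year, temp = record
    yield (station, year), (temp, 1)

def shuffle_phase(pairs):
    """Group all values that share a key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Merge the values of one key into a mean temperature."""
    total = sum(temp for temp, _ in values)
    count = sum(n for _, n in values)
    return key, total / count

def run_job(records):
    """Chain map, shuffle, and reduce over an in-memory list of records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

yearly_means = run_job(RECORDS)
```

In a real cluster the records would be read from HDFS in blocks, the map and reduce functions would run on different nodes, and the shuffle would move data over the network, which is the transfer volume that Cross MapReduce aims to minimize.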

Table 3 Strengths and weaknesses of the analytics methods

Open research directions

The identified areas of further research are highlighted below:

  1. (i)

    Data understanding challenges

    Understanding the data is essential because data form the basis of climate change prediction and forecasting, which aid strategic planning, policy formulation, modeling, and the assessment of climate change impacts. The data challenge encompasses data observation, structure, format, pre-processing, feature extraction, and modeling, and is still inherent in climate change studies (Avand et al. 2021; Dueben and Bauer 2018). Take, for instance, the voluminous geographical, spatial–temporal, and meteorological datasets extracted from public climatic databases such as CMIP5, HadGEM3, NCHMF, social media sites, CORINE, IFC-Cloud-NEXRAD, GIES, NASS, and USDA (Hossain et al. 2022; Costache 2019; Yang et al. 2022; Zia 2021; Manley and Egoh 2022; Seo et al. 2019; Colorado-Ruiz et al. 2018). These require full training of big data analytic models before knowledge can be derived from them for effective data analysis, prediction, and forecasting. Further research is also needed on how to obtain updated data automatically and on demand from public climatic databases, government records, and high-precision real-time map big data (Guo et al. 2020). Furthermore, familiarity with data storage, processing, analysis, modeling, and visualization is necessary to aid organizational policy-making, and this area requires urgent research to ensure comprehensive knowledge of the digital trends in the data collected.

  2. (ii)

    Problems with climate change selection methods

    Selecting climate change analytics methods (Huan et al. 2021; Hossain et al. 2022; Siddiqua et al. 2021; Van Der Linden 2021; Kalb et al. 2021; Kadow et al. 2020; Rolnick et al. 2022) is difficult due to the nature of the models involved. Each method requires extensive and in-depth training, has its own strengths and weaknesses, and demands special skills to judge whether it is best suited to a given climate data modeling and analysis task. Further comparison is needed to determine a more efficient approach to selecting methods based on the processing and analysis tasks. Moreover, more dynamic updates are required from the developers of General Circulation Models (GCMs) to improve climate modeling under high emissions (Finaritra et al. 2021). More focus is required on choosing the appropriate methodologies and theoretical lenses to address the particularities of the selected methods.

  3. (iii)

    Issues with climate big data management

    The reliability of data cleaning and the aggregation of large volumes of diverse climate big data require improvement. There are also open issues in encoding data for security and privacy, which are essential to a data-driven environment and to organizational success. Finding elastic measures to handle large datasets is challenging due to the continuous generation of climate big data. Some architectures such as Spark, Hadoop libraries, and MapReduce have been considered by different authors (Mirpour et al. 2021; Xu et al. 2020; Manogaran and Lopez 2018), but much improvement of these frameworks and libraries is still needed for effective data processing and evaluation.

  4. (iv)

    Technological trends challenges

    Although big data analytics improves the modeling and analysis of climate big data, improving the representation of climate processes still requires further study. The continual emergence and convergence of different technologies and embedded systems has made the implementation of climate big data storage and processing quite difficult. For instance, big data has its own peculiar challenges, including the sheer volume of data generated, data integration, real-time data streaming, network speed and accessibility, data diversity and security, and cloud data storage (Ikegwu et al. 2022). AI and intelligent systems (Kadow et al. 2020; Narayan 2021; Irrgang et al. 2021) that sense and learn from the environment help to solve complex earth process problems, but they still require further research. Enhanced sensors, better satellite imagery, faster data storage, better software and hardware, and smarter intelligent systems are required to solve the inherent embedded system challenges (Preteek et al. 2020), and these require in-depth research.


Climate science and big climate data-intensive areas have been drastically affected by the emergence of big data analytics and essential technological revolutions. Big data analytic methods have had an unprecedented impact on large climate data analysis, and researchers have realized the need to increase research on climate change. Climate change, as an effect of global warming, has negatively impacted economic growth and the human standard of living across the globe. These issues can be tackled with a technological approach. In this paper, we presented the impact of climate change, the methods for modeling and combating climate change effects, the purpose of deploying big data analytics to analyse climate big data, and big data analytics methods. In addition, the sources of data and the types of datasets utilized for modeling climate change, and the implementation frameworks to combat climate change effects, were also discussed. Furthermore, open research directions were highlighted to give insight for future studies. The practical implications of this study are multi-faceted. First, it provides a vast array of data analytics methods for climate scientists to model climate change effects for appropriate policy formulation by governmental organizations. Second, the review unveils a wide variety of climate data recently collected for modelling climate change effects. Third, the study acts as an avenue to promote the need for the global community to adopt and speed up the campaign against climate change. Finally, the research revealed important areas where big data analytics-based climate change modelling could be deployed. Various strategies have been outlined in the literature to mitigate the impacts of climate change. These include reducing CO2 emissions by industries, adopting efficient implementation of renewable energy, and afforestation.

Availability of data and materials

Not applicable.

Code availability

Not applicable.




The authors received no external funding.

Author information

Authors and Affiliations



ACI coined the title, searched the internet for materials, and participated in manuscript preparation, especially the “Introduction” and “Implementation frameworks for climate change modeling using big data analytics” sections, and in formatting the manuscript. HFN drafted the structure, wrote “The methods for modeling and combating climate change” section, and participated in proofreading the entire manuscript. ME performed the grammar checks, corrections, and editing. CVA screened the materials and participated in writing “The impact of climate changes” section. EM filtered the selected journals and participated in manuscript editing. SAI worked on “The common datasets for modeling and combating climate change effects” section. In addition, URA participated in drafting, proofreading, and reviewing the entire manuscript.

Corresponding author

Correspondence to Anayo Chukwu Ikegwu.

Ethics declarations

Ethics approval and consent to participate

The authors certify that the study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The full APC waiver was approved by the management of Springer Nature after written informed consent was obtained from all the participants/authors. The waiver approval was contained via

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Ikegwu, A.C., Nweke, H.F., Mkpojiogu, E. et al. Recently emerging trends in big data analytic methods for modeling and combating climate change effects. Energy Inform 7, 6 (2024).


  • Received:

  • Accepted:

  • Published:

  • DOI: