Location and solar system parameter extraction from power measurement time series

Photovoltaic (PV) systems are considered an important pillar in the energy transition because they are usually located near the consumers. In order to provide accurate PV system models, e.g. for microgrid simulation or hybrid-physical forecast models, it is of high importance to know the underlying PV system parameters, such as location, panel orientation and peak power. In most open PV generation databases, these parameters are missing or inaccurate. In this paper, we present a framework based on particle swarm optimisation and the PVWatts model to estimate PV system parameters using only power feed-in measurements and satellite-based ERA5 climate reanalysis data. Our sensitivity analysis identifies the most relevant PV system parameters, which are panel and inverter peak power, panel orientation and system location; ambient temperature and albedo have a small but not negligible influence. The detailed evaluation on one exemplary PV system shows an acceptable accuracy in panel azimuth and tilt for use in microgrid PV system simulation. The extracted location has a positioning error of less than 25 km in the best case, which is more than satisfactory with respect to the underlying data resolution of the ERA5 dataset. Similar results are observed for 10 systems in Europe and the USA.


Introduction
Due to the transition of the power grid towards clean energy, an increasing penetration of distributed renewable energy sources, mainly rooftop Photovoltaic (PV) systems, has been observed. On the one hand, such distributed power generation puts additional pressure on the power grid when it comes to grid management (grid limits, voltage stability or other ancillary services). On the other hand, the reduced distance between power generation and consumption provides huge potential for regional consumption optimisation.
The trend of emerging energy communities and microgrids, as well as the forming of virtual power plants for participating in ancillary service markets, such as primary frequency response, requires accurate power generation models for both the simulation of different scenarios and improved forecasting with physical-hybrid models. More specifically, PV generation models typically require essential system parameters, such as the panel azimuth, panel tilt and installation location, as well as the electrical behaviour of PV cells and inverter (e.g., influence of ambient temperature, efficiency and rated power limits).
Some of those PV system parameters are available in power plant databases such as PVOutput.org or national registration databases. However, the data quality is mixed because these values are sometimes user generated and thus suffer from incorrect reporting. Haghdadi et al. have shown that the panel tilt for 10% out of 5000 samples from pvoutput.org is missing and demonstrated to be wrong for 32% of the cases (Haghdadi et al. 2017). In addition, these databases often suffer from a coarse value resolution (e.g., 45° intervals for the panel azimuth when only the compass direction is given) or contain entries that have been identified as default values (e.g. 0° or 1° tilt angles) (Killinger et al. 2018). However, the generated power output of a PV system is usually measured for remuneration purposes using smart meters or is monitored via values from the inverters. This data could be used to automatically determine or validate given PV system parameters and gain a more accurate PV model for microgrid simulation or forecasting. Locating energy data might also cause privacy issues, as shown by Chen et al. (2016) and Chen and Irwin (2017a).
In order to analyse the feasibility of such an automatic PV system parameter estimation tool, we consider the following research question: What are the most relevant parameters of an entire PV system and how accurate can these parameters be estimated by only considering historical time series feed-in power measurements and globally available satellite weather data?
In order to answer this question, we contribute with the following points:
• Performing a variance-based global sensitivity analysis on the National Renewable Energy Laboratory (NREL) PVWatts model in order to reduce the search space dimension and extract the most relevant model parameters, which are: peak power, panel azimuth and panel tilt, as well as location.
• Describing and implementing a Particle Swarm Optimization (PSO)-based framework to estimate PV system parameters based on the NREL PVWatts model (Dobos 2014) and satellite-based ERA5 climate reanalysis data (Hersbach et al. 2020), which is available for most parts of the globe.
• Evaluating the applicability of the presented approach with a detailed numerical experiment on one exemplary PV system showing reasonable accuracy in peak power, location, as well as panel azimuth and tilt. Tests are repeated on 10 systems in Europe and the USA.
In the following "Related work" section, an overview of related literature on PV parameter estimation is provided. In the "Methodology" section, an optimisation problem of finding relevant PV system parameters based on the PVWatts model using ERA5 reanalysis data and different error metrics is defined. In addition, PSO as a meta-heuristic solver is discussed. In the "Experiments and discussion" section, the proposed method is tested on measured data and the accuracy of the estimated parameters is presented.
Finally, the work is concluded and an outlook to potential future work is provided in the "Conclusion" section.

Related work
In literature, PV parameter estimation has been performed on cell/panel level and on system level. A structured summary of related work is listed in Table 1.
On cell level, most authors focus on equivalent electric circuit models such as the Single Diode Model (SDM) (da Costa et al. 2010; Soon and Low 2012; Ma et al. 2013; Silva et al.; Kang et al. 2018; Jadli et al. 2018) or the Double Diode Model (DDM) (Dali et al. 2015), or they compare both (Mughal et al. 2017; Chen et al. 2019). The SDM and the more complex DDM are used to model the current-voltage (IV) output characteristic of single PV cells or panels in relation to the external influence of irradiation and temperature. This is achieved by stating an equivalent electric circuit of the cell with at least a series and a shunt resistance, a diode reverse saturation current and a diode ideality factor (Gray 2011).
Ruelle et al. (2016) propose to estimate PV system parameters (panel orientation and peak power) using a direct search method to minimise the normalised MAE between PV system simulation data and measured data. In order to avoid a local minimum, the initial estimate is set to the best variable set out of 100 initial samples. Their simulation model is based on an SDM as part of the Sandia Array Performance Model (SAPM) (thus including Plane of Array (POA) irradiance calculations), which is parameterised with assumed electrical variables. The impact of these variables has not been further analysed and the location of the system is known. Performance improvements are achieved by filtering out cloudy days and shaded hours.
Saint-Drenan et al. (2015) developed an algorithm to estimate the panel orientation by using PV power and meteorological data measurements from a known location. Their PV model has three variables: panel tilt, azimuth and angular loss coefficient. With these, the effective irradiance is calculated and the simulated power output is fetched from a Look-up Table (LUT) that takes irradiance and air temperature as inputs. The PV system variables with the maximum likelihood that simulated power matches measured power are accepted as the best estimate.
Mason et al. (2020) present a Deep Neural Network (DNN) approach, which extracts PV panel tilt and azimuth from net load metering data. For that, they identified relevant features that are related to PV panel tilt and azimuth. Their model is trained on simulated PV data at one specific location that has been combined with customer load profiles. Their approach has been evaluated at one known location on synthetic data.
Meng et al. (2020) propose a data-driven method to parameterise panel azimuth and tilt based on the normalised shape of one clear sky day per month. The best fitting parameter sets (lowest RMSE between the POA irradiance and normalised measurements) are overlapped in order to infer the final estimate. They validated their method using simulated and real PV power measurements. Their curve fitting method requires Global Horizontal Irradiation (GHI) data, which has been taken from nearby ground measurement stations or, with increased estimation error, from satellites. The method has not been designed to find the location of the PV system, but worked comparably well at 15 min, 30 min and 60 min data resolution.
Haghdadi et al. (2017) presented a two-step estimation of location and panel orientation. First, the longitude, which is stated to be independent of the other variables, is extracted from the position of the solar noon. A similar approach is also described in Williams et al. (2012). Second, latitude, panel azimuth and tilt are estimated using the least squares method to fit the variables of a simulation model (using the NREL PVWatts model) to the measured power. This has been performed on clear sky days only, which have been identified by power output fluctuations and by fitting a 3-dimensional surface to the output data. Three extensive case studies provided good results for panel tilt and azimuth (MAE of 2.75° and 5.85°); however, the location mismatch is quite high in latitude (4.08° ∼ 225 km), which might be improved using weather data.
Chen et al. (2016) describe a method to infer latitude and longitude independently. First, the peak power production per day is fitted to an equation of time to compensate for the difference between apparent solar time (actual sun movement) and mean solar time (solar noon 24 hours apart). Longitude is then inferred from the extracted solar noon using binary search on a non-reversible sunset/sunrise calculation algorithm. Second, latitude, as a function of day length, is determined by extracting the average day length within a year. Their approach requires high-resolution solar power measurement data (the smallest possible area for minute resolution data is a 28 km radius) and their prototypical evaluation focuses on almost south-facing systems in the Northern Hemisphere. Panel tilt and orientation have been shown to strongly influence the location estimate, but their treatment has been deferred to future work.
In a further work, Chen and Irwin (2017b) iteratively apply a multi-step binary search in order to fit panel sizing (first step), orientation (second step) and tilt (third step) of a clear sky generation model to the daily maximum power generation of pre-processed, hourly smart meter net load measurements. The starting parameters are set to the optimal panel orientation. The focus of that work is on disaggregating net load measurements into consumption and solar generation at a known location only.
For locating different types of energy data (consumption, wind and solar), Chen and Irwin (2017a) use a weather signature based on temperature, wind speed and cloud cover from ground weather stations at different locations. In order to reduce the search space, daily correlation is used for initial filtering (k-means clustering) of the large weather database, before the correlation-weighted midpoint of the locations in the cluster is extracted based on an hourly analysis. To interpret and compare the weather signature with the solar generation data, a physical model with roughly estimated parameters using a combined approach from Chen et al. (2016) and Chen and Irwin (2017b) is used. Unfortunately, the accuracy of the system parameters (panel peak power, tilt and azimuth) has not been reported, and the granularity as well as the distribution of the underlying ground weather stations remain unclear. Satellite data might provide a more uniform coverage despite a potentially lower spatial resolution.
Comparing the different approaches in literature, it can be seen that most related work does not consider PV system location, panel orientation (azimuth and tilt) and PV system component sizing (panel and inverter peak power) as unknown parameters simultaneously (exceptions are Haghdadi et al. (2017) and Chen and Irwin (2017a)) and is thus limited to its specific use case. For search space reduction and avoidance of local optima, physically ideal starting parameters (Chen and Irwin 2017b), the best of a set of initial samples (Ruelle et al. 2016) and filtering with lower resolution data (Chen and Irwin 2017a) have been used. As all parameters affect the power generation collectively, we consider estimating all parameters at once. We thus propose a simulation-based PSO that uses ERA5 reanalysis data in order to improve location estimation. The usage of PSO is motivated further in the "Particle swarm optimisation" section.

Methodology
For the proposed PV system parameter estimation framework, first the PV model that calculates the power output from relevant input variables is detailed. Afterwards, the objective function using different error metrics for comparison and the solving method is explained.

PV model
Unlike the equivalent electric circuit models, the PVWatts model directly estimates the power output of the PV panel. For evaluation, the more commonly available PV panel peak power can be compared to the estimated parameter from the PVWatts DC model; measured IV curves under various environmental conditions are not required. The PVWatts model still encompasses the basic physical relation between input and output by relying on meaningful parameters, whereas other models such as the SAPM are mainly fitted with empirical measurements (King et al. 2004). We thus consider a PV system model chain that is mainly based on the PVWatts model and detailed in the following.
The model in this work is limited to the commonly used monofacial PV panels; PV panels mounted on axis trackers are excluded as well. The focus is on a single panel orientation; however, east-west panel combinations work as well, as shown later.

Inverter model
The PVWatts model includes multiple sub-models. One of these is the inverter model, which integrates the inverter efficiency η by defining the conversion from DC power P dc to AC power P ac and limiting the output to the inverter nominal power rating P ac0, as shown in Eq. (1).

$$P_{ac} = \begin{cases} \eta \, P_{dc} & 0 < P_{dc} < P_{dc\_limit} \\ P_{ac0} & P_{dc} \ge P_{dc\_limit} \\ 0 & P_{dc} = 0 \end{cases} \qquad (1)$$

$$\eta = \frac{\eta_{nom}}{\eta_{ref}} \left( -0.0162\,\zeta - \frac{0.0059}{\zeta} + 0.9858 \right) \qquad (2)$$

where $\zeta = P_{dc} / P_{dc\_limit}$ and $P_{dc\_limit} = P_{ac0} / \eta_{nom}$.

The constant values in Eq. (2) have been extracted from an analysis of the California Energy Commission (CEC) inverter performance database and are part of the PVWatts model. The reference inverter efficiency η ref of the currently most typical inverter is 0.9637; the default nominal efficiency η nom is set to the proposed value of 0.96 (Dobos 2014). These assumptions represent a typical inverter efficiency. However, the overall power output is mainly influenced by panel and inverter power (depending on sizing and irradiance) and panel orientation, as shown in the sensitivity analysis later.
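The inverter curve described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pvwatts_inverter` and its argument names are hypothetical, the constants are the PVWatts values quoted in the text, and the guard against a negative efficiency at very low load is an addition of this sketch.

```python
def pvwatts_inverter(p_dc, p_ac0, eta_nom=0.96, eta_ref=0.9637):
    """Return AC power (W) for a given DC power (W), PVWatts-style.

    p_ac0 is the inverter nominal AC rating; eta_nom and eta_ref are the
    nominal and reference efficiencies quoted in the text.
    """
    if p_dc <= 0.0:
        return 0.0
    # DC input at which the inverter reaches its AC rating and starts clipping.
    p_dc_limit = p_ac0 / eta_nom
    if p_dc >= p_dc_limit:
        return p_ac0
    zeta = p_dc / p_dc_limit
    eta = (eta_nom / eta_ref) * (-0.0162 * zeta - 0.0059 / zeta + 0.9858)
    # Assumption of this sketch: clamp to [0, p_ac0], since the fitted
    # efficiency curve turns negative for extremely small zeta.
    return max(0.0, min(eta * p_dc, p_ac0))


# Example: two 4.6 kW inverters treated as one 9.2 kW unit at half load.
half_load = pvwatts_inverter(0.5 * 9200 / 0.96, 9200)
```

At full load the efficiency evaluates to η nom, so the curve joins the clipping limit without a jump.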

Cell model
The DC power P dc of a PV panel is calculated with the PVWatts DC model as shown in Eq. (3). In this model, the panel efficiency is assumed to decrease at a linear rate with increasing temperature. This is governed by the temperature coefficient τ, which depends on the module type.

$$P_{dc} = \frac{I_{tr}}{I_{tr0}} \, P_{dc0} \left( 1 + \tau \, (T_{cell} - T_{ref}) \right) \qquad (3)$$

The parameters are defined as follows:
• I tr represents the effectively transmitted plane of array irradiance on the PV cell in units of W/m². The angle of incidence losses need to be applied beforehand (detailed in the "Irradiance" section).
• I tr0 is the reference irradiation, which is 1000 W/m².
• T cell is the calculated PV cell temperature in °C.
• P dc0 is the nominal DC power of the PV module at reference irradiation I tr0 and cell reference temperature T ref.
• τ represents the temperature coefficient in units of 1/°C. This value is typically between -0.002 and -0.005 per °C.
• T ref is the cell reference temperature, which is defined to be 25 °C.
From the DC model parameters, the temperature coefficient τ and the nominal DC power P dc0 remain as variables in the optimisation problem. The other parameters are calculated as defined in the following.
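As a worked illustration of the DC model, the following sketch evaluates the linear temperature derating for a single operating point. The function name `pvwatts_dc` is hypothetical; the default τ = -0.003 per °C is the value chosen later in the sensitivity analysis.

```python
def pvwatts_dc(i_tr, t_cell, p_dc0, tau=-0.003, i_tr0=1000.0, t_ref=25.0):
    """DC power (W) from transmitted POA irradiance (W/m^2) and cell temperature (deg C)."""
    # Power scales linearly with irradiance and is derated linearly
    # with the cell temperature above the 25 deg C reference.
    return (i_tr / i_tr0) * p_dc0 * (1.0 + tau * (t_cell - t_ref))


# An 11.55 kW array at 800 W/m^2 and 45 deg C cell temperature:
# 0.8 * 11550 * (1 - 0.003 * 20) = 8685.6 W
p_example = pvwatts_dc(800.0, 45.0, 11550.0)
```

The example shows that a 20 °C rise above the reference costs about 6% of the output at this coefficient, which is small compared with the effect of irradiance.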

Temperature model
Instead of the temperature model from Fuentes (1987), which was developed in the 1980s and is used in PVWatts, we calculate the cell temperature with the SAPM (King et al. 2004). This is because the early model has proven to be unnecessarily complex, which leads to issues when integrating new module technologies. In addition, the SAPM uses fewer parameters while providing a temperature accuracy of ±5 °C, resulting in an uncertainty of less than 3% of the power output, according to King et al. (2004).
The cell temperature is calculated in the SAPM by using the ambient dry bulb temperature T amb in °C, the plane of array effective irradiance I tr in W/m² and the wind speed WS in m/s at a height of 10 meters, in order to include heating effects from the sun and cooling effects from the wind. The cell temperature is calculated in Eq. (4) and the back-surface module temperature T m is defined in Eq. (5).

$$T_{cell} = T_m + \frac{I_{tr}}{I_{tr0}} \, \Delta T \qquad (4)$$

$$T_m = I_{tr} \, e^{a + b \cdot WS} + T_{amb} \qquad (5)$$

The ambient temperature T amb, as well as the wind speed at 10 m height, are extracted from the ERA5 reanalysis data in this work and depend on the PV system location. The parameters a (coefficient for the module temperature upper limit at low wind speeds and high solar irradiance), b (coefficient for the rate at which the module temperature drops as wind speed increases) and ΔT represent the thermodynamics of the panels and their installation. Some empirically determined examples are shown in Table 2.
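The two-step temperature calculation can be sketched as follows. The function name is hypothetical, and the default coefficients (a = -2.98, b = -0.0471, ΔT = 1 °C) are the values commonly tabulated for a glass/glass module in a close roof mount; they are quoted here as an assumption and should be checked against Table 2.

```python
import math


def sapm_cell_temperature(i_tr, t_amb, wind_speed,
                          a=-2.98, b=-0.0471, delta_t=1.0, i_tr0=1000.0):
    """Cell temperature (deg C) from irradiance (W/m^2), ambient
    temperature (deg C) and wind speed at 10 m height (m/s)."""
    # Back-surface module temperature: solar heating damped by wind cooling.
    t_module = i_tr * math.exp(a + b * wind_speed) + t_amb
    # The cell sits delta_t above the module back at reference irradiance.
    return t_module + (i_tr / i_tr0) * delta_t


# Full sun, 20 deg C ambient: calm air versus a 5 m/s breeze.
t_calm = sapm_cell_temperature(1000.0, 20.0, 0.0)
t_windy = sapm_cell_temperature(1000.0, 20.0, 5.0)
```

Because b is negative, a higher wind speed always lowers the computed cell temperature, and without irradiance the cell temperature equals the ambient temperature.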

Irradiance
The effectively transmitted POA irradiance I tr is a linear combination of the direct POA irradiance I beam, the sky diffuse irradiance I diffuse and the ground reflected irradiance I reflected, defined in (6). The calculation of these parts, Eqs. (7) and (10), is based on GHI and the related irradiance inputs. Panel surface tilt β and panel surface azimuth γ (the panel orientation) remain as problem variables, whereas the solar azimuth γ sun and solar zenith θ sun are calculated from the position of the sun at a certain time using the NREL Solar Position Algorithm (SPA) (Reda and Andreas 2004). The additional parameters of the SPA are location (latitude and longitude), elevation (i.e. altitude; can be derived from latitude and longitude with an elevation map), as well as the yearly average air temperature (assumed to be 12 °C) and pressure (calculated from altitude) for atmospheric refraction correction.
The angle of incidence correction within PVWatts V5, which adjusts the direct beam irradiance in order to account for reflection losses at the glass surface of the PV panel, is not used in this work. This is because the difference in power output for standard glass modules is negligible according to Dobos (2014). The additional parameters would complicate the model while having only a minor impact.
For calculating the diffuse irradiance, multiple methods have been proposed. Loutzenhiser et al. evaluated seven models with experimental data on vertical building facades and found that the Perez (1990) formulation provides the most accurate results for their building heating energy scenario (Loutzenhiser et al. 2007). For this work, however, we use the Hay-Davies model (Hay and Davies 1980) for the following reasons: On the one hand, the Perez model is more complex and is based on empirically derived coefficients (Perez et al. 1990). On the other hand, the Hay-Davies model still provides acceptable accuracy for the irradiance on a vertical plane (1.1% mean error compared to 0.5% mean error of the Perez model at peak times (Loutzenhiser et al. 2007)). In addition, the impact of diffuse irradiance on the overall POA irradiance is even lower in non-vertical scenarios, as is the case with rooftop PV systems, which are usually oriented towards the sun and are thus dominated by the direct beam irradiation. The Hay-Davies model is composed of an isotropic and a circumsolar component; horizon brightening is neglected.
$$I_{diffuse} = DHI \left[ A \, R_b + (1 - A) \, \frac{1 + \cos\beta}{2} \right] \qquad (10)$$

where the anisotropy index $A = DNI / I_{ET}$ and $R_b = \cos(\alpha) / \cos(\theta_{sun})$.

The radiation arriving at the top of the earth's atmosphere varies slightly over the year. This extraterrestrial radiation I ET is therefore calculated with a yearly varying term in order to account for the eccentricity of the Earth's orbit around the sun. We use the Spencer model, which is defined through a Fourier series (Spencer 1971), with x as the day angle of the earth's orbit around the sun in Eq. (11).

$$I_{ET} = I_{SC} \left[ 1.00011 + 0.034221 \cos(x) + 0.00128 \sin(x) + 0.000719 \cos(2x) + 7.7 \cdot 10^{-5} \sin(2x) \right] \qquad (11)$$

with the solar constant $I_{SC} = 1366.1\ \mathrm{W/m^2}$.

The ground reflected irradiance I reflected represents the irradiance reflected from the ground, which usually distinguishes between different types of ground by using the albedo factor. Although the albedo depends on the location and changes with seasonal effects such as snow or rain, an albedo of 0.25 is assumed in this paper. This value, which roughly represents the reflection of grass, is a compromise between the typical reflection of onshore surfaces (0.1 - 0.4) and the average albedo of the Earth (0.34) (McEvoy et al. 2012). The effect of different albedo factors on the total irradiance with an economically optimised PV system in central Europe (panel tilt of 36°) is less than 2% (excluding snow conditions) and can thus be neglected for the purpose of this work. The reflected irradiance is calculated as defined in Eq. (12), taken from Loutzenhiser et al. (2007).
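The irradiance chain of this section can be sketched as below. Several details are assumptions of the sketch rather than statements from the text: the day angle is taken as x = 2π(n − 1)/365 for day of year n, the beam part is taken as DNI·cos(α) with α the angle of incidence, the ground reflected part uses the common isotropic form GHI · albedo · (1 − cos β)/2, the three parts are simply summed, and the cosine of the zenith angle is floored to avoid division by values near zero at sunrise and sunset. All function names are hypothetical.

```python
import math

I_SC = 1366.1  # solar constant in W/m^2, as stated in the text


def extraterrestrial_irradiance(day_of_year):
    """Spencer (1971) Fourier series for the extraterrestrial irradiance."""
    # Assumption: day angle x = 2*pi*(n - 1)/365 with n the day of the year.
    x = 2.0 * math.pi * (day_of_year - 1) / 365.0
    return I_SC * (1.00011 + 0.034221 * math.cos(x) + 0.00128 * math.sin(x)
                   + 0.000719 * math.cos(2 * x) + 7.7e-05 * math.sin(2 * x))


def poa_irradiance(ghi, dhi, dni, i_et, cos_aoi, cos_zenith, tilt_deg, albedo=0.25):
    """Beam + Hay-Davies sky diffuse + ground reflected irradiance (W/m^2)."""
    cos_tilt = math.cos(math.radians(tilt_deg))
    cos_aoi = max(cos_aoi, 0.0)           # sun behind the panel: no beam
    beam = dni * cos_aoi
    a_index = dni / i_et                  # anisotropy index A
    # Assumption: floor cos(zenith) at about cos(89 deg) for numerical safety.
    r_b = cos_aoi / max(cos_zenith, 0.01745)
    diffuse = dhi * (a_index * r_b + (1.0 - a_index) * (1.0 + cos_tilt) / 2.0)
    reflected = ghi * albedo * (1.0 - cos_tilt) / 2.0
    return beam + diffuse + reflected


# Sanity check: for a horizontal panel the result collapses to GHI.
flat = poa_irradiance(ghi=660.0, dhi=100.0, dni=700.0, i_et=1400.0,
                      cos_aoi=0.8, cos_zenith=0.8, tilt_deg=0.0)
```

The horizontal case is a useful consistency check: with β = 0 the panel sees the whole sky and no ground, so the sum must equal DNI·cos(θ sun) + DHI.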

Model parameter discussion
The overall PV system model is derived by combining the individual models defined in Eqs. (1) to (12) and the NREL SPA for calculating the position of the sun (sun azimuth and sun zenith). Irradiation (GHI, DHI) is obtained from the ERA5 reanalysis data, which requires the location of the PV system and the considered time as input. Environmental conditions, such as the ambient temperature and the wind speed at 10 m height, can be extracted from the ERA5 data as well. In order to further reduce the number of parameters, altitude is calculated from the SRTM 90m Digital Elevation Database v4.1 (Reuter et al. 2007; Jarvis et al. 2008), which is based on satellite data (by NASA). Thus, the remaining PV system model parameters that are considered as decision variables are listed in Table 3.

Sensitivity analysis
In order to reduce the number of model parameters, a variance-based sensitivity analysis according to Sobol (2001), using the sampling improvements by Saltelli (2002) and Saltelli et al. (2010), is performed. Instead of the measured values from the ERA5 data set, GHI and DHI are calculated using the Ineichen clear sky model and thus depend on the PV system location. Wind speed (WS) at 10 m height and ambient temperature (T amb), as well as the module temperature parameters a, b and ΔT (see Table 2), albedo and inverter efficiency η nom are added as parameters in the sensitivity analysis to measure their influence on the power output. For the model sensitivity analysis only, the time has been fixed to 21st June, 12:00 UTC (day with most hours of daylight in the Northern Hemisphere) in the year 2020 (relevant for the extraterrestrial radiation). Changing the date or time (excluding night times) does not change the sensitivity of the considered parameters significantly. The bounds of the considered parameters are listed in Table 4 (latitude and longitude roughly covering Europe) and a sample size of 10000 is used. From the analysis of the first-order and total-order indices (compare Fig. 1), it can be concluded that the parameters with the highest influence on the output power are, as expected, the inverter and panel peak power, the panel orientation (β and γ) as well as the location of the PV system. Ambient air temperature (T amb) and the albedo factor still have a small but measurable impact on the output power.
The temperature coefficient τ, as well as the parameters of the PV module heating model a, b and ΔT and the inverter efficiency η nom, have negligible impact. These findings are in line with Hansen et al. (2013), who performed a detailed sensitivity analysis on the individual models. They observed a dominating contribution to the uncertainty in daily energy by the POA irradiance and the effective irradiance models, which depend on location and panel orientation. Thus, we use the glass/glass close roof configuration from Table 2 and τ = −0.003 in order to reduce the number of dimensions in the search space.
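To make the variance-based analysis concrete, the following sketch estimates first-order and total-order Sobol indices with the estimators from Saltelli et al. (2010) and Jansen, using plain pseudo-random sampling. It is a simplified stand-in, not the exact sampling scheme used in this work: the function name, the default sample size and the toy model in the usage line are all hypothetical.

```python
import random


def sobol_indices(model, bounds, n_samples=8192, seed=0):
    """Estimate first-order and total-order Sobol indices.

    model takes a list of parameter values; bounds is a list of (low, high).
    """
    rng = random.Random(seed)

    def draw():
        return [lo + (hi - lo) * rng.random() for lo, hi in bounds]

    a = [draw() for _ in range(n_samples)]
    b = [draw() for _ in range(n_samples)]
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]
    all_f = f_a + f_b
    mean = sum(all_f) / len(all_f)
    var = sum((y - mean) ** 2 for y in all_f) / len(all_f)

    first, total = [], []
    for i in range(len(bounds)):
        # Matrix A with column i replaced by the column from B.
        f_ab = [model(xa[:i] + [xb[i]] + xa[i + 1:]) for xa, xb in zip(a, b)]
        # First-order estimator; f(B) is centred to reduce estimator noise,
        # which does not change the expected value.
        s1 = sum((fb - mean) * (fab - fa)
                 for fa, fb, fab in zip(f_a, f_b, f_ab)) / (n_samples * var)
        # Total-order (Jansen) estimator.
        st = sum((fa - fab) ** 2
                 for fa, fab in zip(f_a, f_ab)) / (2 * n_samples * var)
        first.append(s1)
        total.append(st)
    return first, total


# Toy model y = x0 + 2*x1 on the unit square: the analytic indices are 0.2 and 0.8.
s_first, s_total = sobol_indices(lambda x: x[0] + 2.0 * x[1], [(0.0, 1.0), (0.0, 1.0)])
```

For an additive model such as the toy example, first-order and total-order indices coincide; a gap between the two would indicate interaction between parameters, which is what Fig. 1 allows the reader to inspect for the PV model.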

Problem formulation
The discussed model parameters are subject to an optimisation problem to minimise the error between the calculated P t ac , according to the equations defined above, and the measured AC power P t measured at each timestamp t. The error metric thus defines the objective function (sometimes called fitness function in the context of PSO).
We consider the commonly used RMSE in Eq. (13), which tends to emphasise the effect of outliers, and the MAE in Eq. (14), which is more robust to outliers and thus better represents the average characteristics of a potential solution. In addition, these metrics are compared to MAD in Eq. (15) and to IQR filtered RMSE and MAE metrics. The latter three completely avoid the effect of outliers, as only the better half of the error series is considered. According to Stein et al. (2010), satellite irradiance data provide an accuracy similar to ground measurements when considering the mean error. However, the standard deviation is larger, and thus filtering out outliers in these metrics seems to be a suitable option.
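The error metrics can be sketched as follows. RMSE and MAE are standard; the other two definitions are assumptions of this sketch, because the exact formulas are not reproduced in this section: MAD is taken as the median of the absolute errors, and the IQR filter keeps the residuals between the first and third quartile before RMSE or MAE is applied. All function names are hypothetical.

```python
import math
import statistics


def residuals(simulated, measured):
    """Signed error series between simulated and measured AC power."""
    return [s - m for s, m in zip(simulated, measured)]


def rmse(res):
    return math.sqrt(sum(r * r for r in res) / len(res))


def mae(res):
    return sum(abs(r) for r in res) / len(res)


def mad(res):
    # Assumption: median of the absolute errors.
    return statistics.median(abs(r) for r in res)


def iqr_filter(res):
    # Assumption: keep residuals between the first and third quartile.
    q1, _, q3 = statistics.quantiles(res, n=4)
    return [r for r in res if q1 <= r <= q3]


# One gross outlier (e.g. a cloud missed by the reanalysis data):
res = residuals([1.0, -1.0, 2.0, -2.0, 100.0], [0.0] * 5)
robust_rmse = rmse(iqr_filter(res))
```

In the example, the single outlier dominates the plain RMSE, moves the MAE noticeably, and leaves the MAD and the filtered metrics untouched, which mirrors the ranking of outlier sensitivity described above.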

Particle swarm optimisation
Particle Swarm Optimization (PSO) is an optimisation technique that emulates the social behaviour of biological organisms, such as bird or fish swarms. First, a set of particles, referred to as the swarm, is randomly initialised in the n-dimensional search space (evenly distributed). Each particle represents one candidate solution. In order to find the optimum, the particles then move around the search space using the historical position and velocity of themselves and their neighbours. The original PSO algorithm is attributed to Kennedy and Eberhart (1995) and Shi and Eberhart (1998) and has been developed to solve non-linear equations. Over time, many variations (e.g., different topology, search space characteristics or constraints) have been used in research in order to solve a variety of problems. For this work, we use the classical star topology, in which each particle is attracted by the best performing particle of the whole swarm, which is assumed to be near the global optimum. The position of the particle x i at the current step s is updated with the computed velocity v i at s + 1, as in Eq. (16). The velocity of a particle is calculated as a linear combination of: (1) its own damped previous velocity (parameter w for inertia), (2) its deviation from its personal best position p i (parameter c 1 for cognitive behaviour), and (3) its deviation from the best particle of the swarm p g (parameter c 2 for social behaviour), as defined in Eq. (17).

$$x_i^{s+1} = x_i^{s} + v_i^{s+1} \qquad (16)$$

$$v_i^{s+1} = w \, v_i^{s} + c_1 r_1 \, (p_i - x_i^{s}) + c_2 r_2 \, (p_g - x_i^{s}) \qquad (17)$$

The two parameters c 1 and c 2 define whether the swarm is more explorative (following the personal best) or exploitative (following the swarm's global best). The independent random numbers r 1 and r 2 in the range [0, 1] introduce a certain randomness into the velocity of the next iteration; more details can be found in Shi and Eberhart (1998).
One of the main advantages of the PSO algorithm is that it does not use the gradient of the function. Thus, it is not required to have an objective function that is differentiable. As we obtain irradiance from the ERA5 dataset based on the location parameters, our problem cannot be differentiated. More generally, PSO can be classified as a metaheuristic, as it makes few (in our case decision variable boundaries) or no assumptions about the underlying problem to be optimised. Compared to variants of the population-based Genetic Algorithm, PSO provides the same quality of solution while reducing the computational effort (Hassan et al. 2005). As panel tilt β, cell peak power P dc and efficiency parameters (roughly scaling the output power) have similar effects on the overall generation (Chen and Irwin 2017b), it appears that searching the entire parameter search space is required in order to avoid local optima. In addition, similar weather conditions in different areas of the ERA5 dataset could lead to local optima of the parameter set. PSO is a suitable method to avoid local optima, as many sample solutions, spread over the search space, are compared at each step and the overall solution is steadily directed towards the best known optimum.
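A minimal global-best (star topology) PSO with the inertia, cognitive and social terms described above could look like the following sketch. It is not the implementation used in this work: the function name, the zero initial velocities, the clipping of positions to the search bounds and the small default swarm are assumptions made for illustration; only the defaults w = 0.9, c 1 = 0.7 and c 2 = 0.3 follow the values reported in the experiments.

```python
import random


def pso(objective, bounds, swarm_size=30, iterations=100,
        w=0.9, c1=0.7, c2=0.3, seed=0):
    """Minimise objective over box bounds with a global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Uniform random initialisation inside the search space.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]  # assumption: start at rest
    p_best = [p[:] for p in pos]
    p_cost = [objective(p) for p in pos]
    g = min(range(swarm_size), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g][:], p_cost[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Assumption: keep particles inside the box by clipping.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            cost = objective(pos[i])
            if cost < p_cost[i]:
                p_best[i], p_cost[i] = pos[i][:], cost
                if cost < g_cost:
                    g_best, g_cost = pos[i][:], cost
    return g_best, g_cost


# Usage on a smooth toy objective with its minimum at (1, -2).
best, cost = pso(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                 [(-5.0, 5.0), (-5.0, 5.0)], iterations=300)
```

In the PV setting, `objective` would wrap the full model chain and return one of the error metrics for a candidate parameter vector; no gradient of that chain is ever required.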

Experiments and discussion
In the following, the proposed method is evaluated with an exemplary PV system, for which all relevant parameters are known. First, the used data is described and second the results are presented and discussed.

PV system data and pre-processing
The exemplary PV system has a rated DC peak power of 11.55 kW and the panels are connected to two inverters with 4.6 kW nominal AC output each (5.06 kW maximum). The inverter efficiency (η nom = 95.9%, extracted from the curve in the datasheet according to the weighted CEC definition) closely matches the typical value extracted from the CEC inverter database (η nom = 96%). The PV system is installed on a rooftop with a roof pitch (panel tilt) of 23°; the panel azimuth is 195.45° (slightly south-west). The PV system is 13 years old, thus a degraded panel peak power is expected. There is a minor shadow effect in the morning, which makes the PV system a good candidate for a detailed analysis, as perfect systems are rare. The power output of the tested PV system has been collected at the digital, calibrated energy meter and is averaged to 1 hour mean values for the year 2020 to match the temporal resolution of the ERA5 reanalysis data. Due to some measurement issues, 374 hours are missing and have been excluded from the optimisation. The measured power output and data gaps are visualised with a quarter-hourly resolution in Fig. 2.
In order to focus the error metric on productive time and improve calculation performance, night conditions (measured power smaller than 100 W) have been filtered out. This is especially important for the metrics that focus on the better half of the error series (MAD, IQR filtered RMSE and MAE), because the error is almost zero at night and these samples would otherwise dominate the better half.
The considered period has been limited to the beginning of April until the end of October in order to avoid the influence of snow and tree shadows (due to low sun elevation) as far as possible. This assumption is backed by observations but can also be seen from the monthly Pearson correlation between GHI (from ERA5 weather data at the location of the system) and the power measurements (see Fig. 3). The winter and late autumn seasons seem to have a higher mismatch, induced by snow-covered PV panels and a higher impact of shadows due to the lower sun elevation. Solar irradiance, wind speed and ambient temperature are extracted from the ERA5 reanalysis data based on the considered location. ERA5 data (reanalysis-era5-single-levels) has been prepared with a resolution of 0.25° for both latitude and longitude, which corresponds to around 28 km in latitude and 16 to 20 km in longitude in the tested part of Europe (same area as in the sensitivity analysis, Table 4). For a more precise calculation, the ERA5 data is linearly interpolated to the given location at each step.
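The linear interpolation of a gridded ERA5 field to an arbitrary candidate location can be sketched as a bilinear interpolation between the four surrounding grid points. The function name and the data layout (a nested list indexed by latitude row and longitude column on a regular grid) are assumptions of this sketch; the text does not specify how the interpolation was implemented.

```python
import math


def interpolate_grid(field, lat0, lon0, step, lat, lon):
    """Bilinear interpolation on a regular grid.

    field[i][j] holds the value at (lat0 + i*step, lon0 + j*step).
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    # Index of the lower-left grid point, kept inside the grid.
    i = min(max(int(math.floor(fi)), 0), len(field) - 2)
    j = min(max(int(math.floor(fj)), 0), len(field[0]) - 2)
    u, v = fi - i, fj - j  # fractional position inside the cell
    return ((1 - u) * (1 - v) * field[i][j]
            + (1 - u) * v * field[i][j + 1]
            + u * (1 - v) * field[i + 1][j]
            + u * v * field[i + 1][j + 1])


# A single 0.25 deg cell with made-up GHI values at its four corners.
cell = [[0.0, 10.0], [20.0, 30.0]]
centre = interpolate_grid(cell, 47.0, 10.0, 0.25, 47.125, 10.125)
```

Interpolating in this way makes the simulated power a continuous function of latitude and longitude, so that particles moving within one grid cell still see a changing objective value.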
When running the simulation model with ERA5 data using the actual system parameters for the whole year, an MAE of 567.42 W is obtained. The deviation between the measured and the simulated PV system output can be explained by the observed minor shadow effect in the morning, the degraded panel peak power, model inaccuracy and mainly by the inaccuracy of the satellite weather data (compare the sample period in Fig. 4).
Time series data (measured power and ERA5 data) is shifted by 30 minutes in order to calculate the sun position at the midpoint of the corresponding averaging period.

Results
For the experiment, a swarm size of 200 with 400 iterations and c_1 = 0.7, c_2 = 0.3 and w = 0.9 results in a stable solution. With these settings, the swarm acts more exploratively than exploitatively and thus finds the global optimum in most runs. After around 150-200 iterations, the particle velocity of all six parameters converges. This is visualised as a velocity history graph, normalised to each parameter's search range, for MAE in Fig. 5. The velocity of the inverter nominal power P_ac0 increases after a while and stabilises again at the boundary of the search space. Similar convergence behaviour is observed for all tested metrics. The calculations have been repeated 15 times in order to measure the impact of the random swarm initialisation. For the exemplary system, RMSE and MAE find similar optimum parameter sets in all 15 repetitions, as visualised in Fig. 6, whereas the MAD and IQR filtered RMSE and MAE metrics converge to slightly different solutions. This is caused by only considering the better half of the error series, which changes in each iteration.
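For readers unfamiliar with PSO, the role of the swarm size, the acceleration coefficients c_1 and c_2 and the inertia weight w can be illustrated with a minimal global-best PSO. This is a generic textbook variant written for illustration, not the implementation used in this work; zero initial velocities and clipping positions to the search bounds are assumptions, and in the framework the cost function would be the chosen error metric between measured and simulated power.

```python
import random


def pso(cost, bounds, swarm_size=200, iterations=400,
        c1=0.7, c2=0.3, w=0.9, seed=None):
    """Minimise `cost` over the box `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]  # assumed zero start velocity
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(swarm_size), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iterations):
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull to own best (c1) + pull to swarm best (c2).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search space.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

With w close to 1 and small c_1 and c_2, particles keep most of their momentum and are only weakly pulled towards known good positions, which matches the explorative behaviour described above. Clipping at the bounds also means that a parameter with little influence on the cost can come to rest at a search space boundary.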
The inverter power P_ac0 is overestimated independently of the metric used (mostly at the upper search space boundary of 20 kW). This is due to the minor impact of a higher inverter power. The initial under-sizing of the inverter (10.12 kW maximum inverter power versus 11.55 kW panel peak power) has not been detected. This might stem from panel degradation and soiling, which have reduced the panel peak power below the maximum inverter power.
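The interaction between panel peak power and inverter limit can be illustrated with the published PVWatts (version 5) DC and inverter equations. This is a sketch only: whether the simulation model of this work matches these equations in every detail is an assumption, the temperature coefficient of -0.004 per °C is an illustrative value, and the function names are hypothetical.

```python
ETA_REF = 0.9637  # reference efficiency of the PVWatts inverter curve


def pvwatts_dc(g_poa, t_cell, p_dc0, gamma=-0.004, t_ref=25.0):
    """DC power in W from plane-of-array irradiance (W/m^2) and cell temperature (°C)."""
    return g_poa / 1000.0 * p_dc0 * (1.0 + gamma * (t_cell - t_ref))


def pvwatts_ac(p_dc, p_ac0, eta_nom=0.96):
    """AC power in W, clipped at the inverter rating p_ac0."""
    if p_dc <= 0.0:
        return 0.0
    p_dc0_inv = p_ac0 / eta_nom      # DC input at which the rating is reached
    zeta = p_dc / p_dc0_inv          # inverter loading
    eta = eta_nom / ETA_REF * (-0.0162 * zeta - 0.0059 / zeta + 0.9858)
    return min(max(eta * p_dc, 0.0), p_ac0)


# Rated panels (11.55 kW) at standard conditions: output is clipped at 10.12 kW.
ac_rated = pvwatts_ac(pvwatts_dc(1000.0, 25.0, 11550.0), 10120.0)

# Hypothetical 10% degradation: output stays below the limit, so no clipping.
ac_degraded = pvwatts_ac(pvwatts_dc(1000.0, 25.0, 0.9 * 11550.0), 10120.0)
```

In this sketch, once the DC power no longer reaches the inverter limit, the limit leaves no clipping signature in the AC output, which is consistent with the under-sizing not being detected.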
The nominal peak power of the panels is underestimated on average by between 0.8 and 1.55 kW, depending on the metric. This can be explained by the initial under-sizing of the inverter (1.43 kW), in addition to panel degradation and soiling effects. Nevertheless, these fitted parameters might better represent the current state of the system than the rated power at installation time.

Fig. 6 Mismatch between the estimated and actual parameters (15 repetitions). Parameter search space according to Table 3. Inverter peak power P_ac0 is mostly optimised to the search space boundary; the remaining parameters are within the search space.

Regarding the location parameters latitude and longitude, RMSE and MAE provide a stable solution with a distance of 30 km from the actual location (mainly to the west, slightly to the south). This deviation could be explained by the regular minor shadow in the morning, which shifts the longitude generally to the west (ignoring different weather). MAD and especially the IQR filtered metrics halve the longitude error but double the latitude error, resulting in a distance error of around 25 km (south-east). Using these metrics, the impact of the minor shadow effect in the morning is reduced, but the results are not stable across runs.
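A distance error between an estimated and an actual location can be computed from the two coordinate pairs, for example with the haversine formula on a spherical Earth. Whether the distances in this work were computed this way is an assumption; the function name and the coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One 0.25° grid step at a hypothetical location at 48° N:
step_north_south = haversine_km(48.0, 11.0, 48.25, 11.0)  # about 27.8 km
step_east_west = haversine_km(48.0, 11.0, 48.0, 11.25)    # about 18.6 km
```

The example shows that a distance error of 25 to 30 km is of the same order as a single ERA5 grid step.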
Panel azimuth γ estimation is comparable for all metrics except RMSE, with errors ranging between +1.68° and -0.59°. The over-estimation with RMSE (+8.8° error) is assumed to originate from the observed minor shading in the morning and the fact that the RMSE metric emphasises the impact of outliers. The panel tilt β error is quite high for all metrics, which might stem from its smaller influence on the system output, as shown in the sensitivity analysis.

Discussion
The estimated PV system parameters should be considered as the best-fitting parameters for the simulation model. The ERA5 data is useful for locating the system because it incorporates a broad variety of weather situations; however, it does not provide an accurate representation of the local situation due to its low spatial resolution.
Regarding the isolated impact of panel tilt and azimuth error in more detail, the calculated annual energy generation differs on average by 0.35% per degree of tilt and 0.08% per degree of azimuth in the range of ±15° around the actual orientation (compare Fig. 7). The absolute estimation error in azimuth is 1.68° for the MAE metric, which corresponds to approximately 0.13% error in annual energy generation and could be considered negligible. However, the estimation error in tilt (6.24° for the MAE metric) results in an annual energy generation error of 2.184%. Comparing this with the isolated annual energy generation error for the location mismatch, which accounts for around 0.2% of annual energy, it becomes clear that a better tilt estimation is required.
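The arithmetic behind these percentages is a simple linear scaling of the angle error with the average sensitivity. The short sketch below reproduces it; the function and constant names are hypothetical, and the linear relation is only assumed to hold within the ±15° range stated above.

```python
TILT_SENSITIVITY = 0.35     # % annual energy per degree of tilt error (from the text)
AZIMUTH_SENSITIVITY = 0.08  # % annual energy per degree of azimuth error (from the text)
VALID_RANGE_DEG = 15.0      # range around the actual orientation (from the text)


def energy_error_percent(angle_error_deg, sensitivity):
    """Annual energy error in percent for an isolated angle error."""
    if abs(angle_error_deg) > VALID_RANGE_DEG:
        raise ValueError("average sensitivity only stated for +/-15 degrees")
    return abs(angle_error_deg) * sensitivity


tilt_effect = energy_error_percent(6.24, TILT_SENSITIVITY)        # 2.184 %
azimuth_effect = energy_error_percent(1.68, AZIMUTH_SENSITIVITY)  # about 0.13 %
```

The tilt error thus contributes roughly 16 times more energy error than the azimuth error for the MAE result.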
The method using MAE has also been tested on an east-west sided PV installation. Its location is estimated with a similar accuracy (23.79 km distance error) compared to the single-sided system in the same area. As the weather signature in that region mainly influences the location estimation of the system, it was even possible to identify a two-sided setup. The angles of the panels (azimuth error of up to 22°) and their peak power (deviation of around 25%), however, are not very accurate.

Fig. 7 Impact of azimuth and tilt error on yearly energy generation compared to actual orientation
In addition, we tested 5 PV system installations in California/USA for which the given ZIP code covers the smallest area; these are mainly located in cities. Using the MAE metric, it was possible to locate the hourly resolution time series of the year 2016 with an error of between 35.74 km and 62.4 km relative to the centre point of the ZIP code area. The panel angles are not very accurate, which can be attributed to the shading from nearby buildings in the urban area. This can also be observed when comparing the GHI of clear sky days at the estimated location with the power measurements, where the power is heavily reduced in the morning and in the evening. The tilt angle for one system with flat panels (0° tilt), however, could be identified reliably. Location estimation for 5 further PV systems at different locations in Bavaria/Germany works with a mixed accuracy ranging from 17 to 90 km. Panel orientation is not documented for these five systems.
When comparing the parameter estimation accuracy with related work, keeping in mind that a comparison on the exact same dataset is not possible, Saint-Drenan et al. (2015) performed better on panel azimuth and tilt error (less than 2° in optimal cases) using satellite irradiance and temperature values. No accuracy for their location estimation was given. The data-driven approach by Meng et al. (2020) achieved an MAE of 4.5° in azimuth and 4.3° in tilt. When applying our approach with known location and the RMSE metric, a tilt error of 0.55° and an azimuth error of 7.22° are achieved. The azimuth error presumably originates from the minor morning shading of the observed PV system. Williams et al. (2012) state a longitudinal error of less than 50 miles (around 80 km) using their astronomical approach and one month of data. The panel orientation deviation is found to be ±7°. Haghdadi et al. (2017) achieved a mean absolute deviation of 0.2° longitude, 4.08° latitude, 2.75° panel tilt and 5.85° azimuth working with clear sky data and a temporal resolution of 5 minutes. Our PSO approach with IQR filtered error metrics outperforms this location estimation (less than 0.3° error for both longitude and latitude) at hourly temporal resolution (Haghdadi et al. (2017) report a longitude MAE of 1.4° for hourly resolution). Even the other 5 tested PV systems have been located to better than 1° for both longitude and latitude. However, our approach lacks accuracy in panel orientation.
Chen et al. (2017a) achieved a better accuracy in locating their PV systems; however, the granularity and distribution of the underlying ground weather stations are not clear. Satellite data, as used in this work, provides uniform coverage, making our approach more generic and equally applicable almost all over the globe.

Conclusion
For advanced modelling of PV power generation in microgrid simulation scenarios or for improved forecasting with physical-hybrid models, PV system parameters such as location, orientation and nominal power are required but not available in all cases.
This paper presents a framework to estimate the most relevant PV system parameters by using power measurements and ERA5 reanalysis weather data in combination with a PVWatts-based PV simulation model. The relevant parameters, more specifically longitude, latitude, panel tilt and azimuth, as well as inverter and panel peak power, have been identified with a global sensitivity analysis of the simulation model. As most of the parameters depend on each other, all parameters are optimised at once. This is achieved by minimising the error between the measured and the simulated power output time series using PSO and different error metrics. We compared the commonly used MAE and RMSE with the median error and IQR filtered metrics, which only consider the lower half of absolute errors and thus ignore outliers. The latter perform slightly better for location estimation but lack accuracy in panel tilt.
We demonstrated with one exemplary PV system and measurements over one year that the location can be estimated with an error of less than 25 km using hourly measurement resolution. This estimation error roughly matches the spatial resolution of the underlying ERA5 data. Regarding the panel orientation, azimuth estimation is acceptable, while the tilt angle, which also has a lower sensitivity, remains a point for improvement. The estimated panel peak power is within the range expected from the panel degradation of the analysed system. Similar observations are made for the 10 further PV systems in Europe and the USA.
In contrast to related work, we can also apply our approach to dual-sided PV installations. As a result, it is possible to distinguish between single and dual-sided systems; however, the panel orientation mismatch is higher than for single-sided systems. The location accuracy is comparable to single-sided systems.
The presented framework is limited, on the one hand, by the accuracy of the weather data (temporal and spatial) and, on the other hand, by the accuracy of the PV model. Localisation using ERA5 weather data has been demonstrated to work quite well with regard to the available resolution of 0.25° in latitude and longitude. However, the panel orientation extraction might be improved by removing the uncertainty of irradiance, e.g., by focusing more on clear sky days. In addition, shadow and snow detection could be integrated to avoid or correct the measurements of shaded periods. The parameter estimation might be improved by finding the best trade-off between filtering inaccurate data (e.g., panel shading) and a suitable metric that devalues outliers, both without losing the general correlation to the ERA5 data for location estimation.