
The role of clouds in improving the regression model for hourly values of diffuse solar radiation


Applied Energy 92 (2012) 240–254

Contents lists available at SciVerse ScienceDirect

Applied Energy

journal homepage: www.elsevier.com/locate/apenergy

The role of clouds in improving the regression model for hourly values of diffuse solar radiation

Claudia Furlan a,⇑, Amauri Pereira de Oliveira b, Jacyra Soares b, Georgia Codato b, João Francisco Escobedo c

a Department of Statistical Sciences, University of Padua, Padua, Italy
b Group of Micrometeorology, Department of Atmospheric Sciences, Institute of Astronomy, Geophysics and Atmospheric Sciences, University of São Paulo, São Paulo, SP, Brazil
c Department of Natural Sciences, School of Agronomic Sciences, State University of São Paulo, Botucatu, SP, Brazil

Article info

Article history:
Received 8 April 2010
Received in revised form 13 October 2011
Accepted 13 October 2011
Available online 29 November 2011

Keywords:
Regression
Diffuse solar radiation model
Cloud
Correlation models
Liu–Jordan

0306-2619/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apenergy.2011.10.032

⇑ Corresponding author. Fax: +39 049 827 4170.
E-mail address: [email protected] (C. Furlan).

Abstract

The study introduces a new regression model developed to estimate the hourly values of diffuse solar radiation at the surface. The model is based on the clearness index and diffuse fraction relationship, and includes the effects of cloud (cloudiness and cloud type), traditional meteorological variables (air temperature, relative humidity and atmospheric pressure observed at the surface) and air pollution (concentration of particulate matter observed at the surface). The new model is capable of predicting hourly values of diffuse solar radiation better than the previously developed ones (R² = 0.93 and RMSE = 0.085). A simple version with a large applicability is proposed that takes into consideration cloud effects only (cloudiness and cloud height) and shows an R² = 0.92.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Few energy production technologies have less impact on the environment than solar energy technologies. The solar energy source is free and abundant, and the energy generated by the sun does not produce any air pollution or hazardous waste. The most widespread solar energy technologies currently in use are solar photovoltaic and solar thermal energy [1–5]. The efficiency of these two technologies depends on the accuracy of the knowledge about the solar radiation field at the surface [2,3].

Besides energy production, knowledge about short-term variations in the solar radiation field is critical information for agriculture, urban planning, and atmospheric pollution analysis [6–8]. To establish energy-planning strategies in any of these fields, it is important to compile available scientific information about solar radiation, and to develop and test models to predict solar energy in different scales of time that can be representative of large areas [1,9]. This is particularly relevant in the tropical and subtropical areas of Brazil and other South American countries, where solar radiation is considered one of the most important sources of renewable energy [10–12].

The solar radiation field at the surface is composed of direct beam and diffuse components; it can be estimated directly by in situ measurements [10,11,13–15] and indirectly by modeling techniques [16,17], or by a combination of both in satellite estimates [18–21].

In situ measurements of the solar radiation field constitute the most precise way to estimate global, direct, and diffuse solar radiation at the surface. However, observations require special care with respect to calibration and spatial 'representativeness', mainly because most of the sensors used operationally do not appropriately account for the errors associated with temperature, cosine, and ventilation effects [15] and, in the case of diffuse solar radiation, for the blocking effects caused by shadowing devices [22]. Remote sensing technology is not as precise and reliable as in situ measurements; however, it provides the best spatial description of the solar field at the surface [14,19,20]. The problem with satellites is their high cost.

Modeling techniques can be categorized as physical and empirical. Physical modeling consists of the numerical solution for the radiative transfer equation [23–27], and empirical modeling consists of fitting a set of expressions (regression models [28–31] or procedures in the neural network technique [32,33]) to previously selected data sets in a probabilistic framework. Although physical models perform better than empirical ones, estimating the solar radiation field at the surface, on a regular basis, using the radiative transfer equation is rather difficult, and it cannot be done numerically without simplifying the role of clouds, moisture, and other minor atmospheric gases and aerosols, casting doubts on its degree of realism. In several applications, the separation between physical and empirical modeling is not very well defined. For instance, Janjai et al. [34] developed a model to predict hourly values of global solar radiation at the surface in Thailand, taking into consideration cloud, aerosol, ozone, and moisture effects using satellite estimates of incoming solar radiation at the top of the atmosphere as input. This model can be classified as physical because it is based on the radiative transfer equation; however, it takes into consideration cloud, aerosol, and other effects using a high degree of empirical representation of absorption and scattering processes.

Nomenclature

Kt: clearness index, quantitative variable in models (4) and (5)
Kd: diffuse fraction
PM10: concentration of PM10 (µg m⁻³); factor of general model (4) assuming values 0 and 1 to indicate when PM10 is lower and higher than 26 µg m⁻³
Qp: p% quantile
c: break-point
R²: coefficient of determination
α: level of significance of a test
H0: null hypothesis of a test
H1: alternative hypothesis of a test
I(Kt > c): indicator function (assumes value 1 if Kt > c)
(β0, β1, ..., βq): coefficients of the regression models
(X1, X2, ..., Xq): regressors of the regression models corresponding to quantitative variables and factors (represent the information)
RH: air relative humidity (%), quantitative variable in model (4)
T: air temperature (°C), quantitative variable in model (4)
P: surface atmospheric pressure (hPa), quantitative variable in model (4)
Season: factor of general model (4) assuming 0 and 1 to indicate, respectively, the coldest and driest months (March–August) and the warmest and wettest months (September–February)
CI: cloudiness, quantitative variable in models (4) and (5)
n: number of observations
Stratocumulus, Stratus, Cumulus, Altocumulus, Altostratus, Cirrus, Cirrocumulus, Cirrostratus, Cumulus Nimbus: factors of general model (4) assuming 1 and 0 to indicate, respectively, the presence and absence of stratocumulus, stratus, etc.
ClearSkyLow, ClearSkyMiddle, ClearSkyHigh, ClearSky, CloudyLow, CloudyMiddle, CloudyHigh: cloudiness pattern; factors of general model (4) assuming 1 and 0 to indicate, respectively, the presence and absence of clouds in the three levels: low, middle and high
PM10 × ClearSkyLow, PM10 × ClearSkyMiddle, PM10 × ClearSkyHigh, PM10 × ClearSky: factors indicative of interactions between air pollution and cloudiness pattern in model (4)
Season × ClearSkyLow, Season × ClearSkyMiddle, Season × ClearSkyHigh, Season × ClearSky: factors indicative of interactions between local climate and cloudiness pattern in model (4)
Season × PM10: factor indicative of interaction between local climate and air pollution in model (4)
LowC, MiddleC, HighC: factors of the simplified model (5) assuming 1 and 0 to indicate, respectively, the presence and absence of low, middle and high clouds
AI: aerosol index

The limitations of in situ measurements, satellite, and physical modeling techniques described above have made the use of the empirical modeling technique very popular in the simulation of solar radiation fields at the surface [35]. A brief survey of the literature indicates, from a statistical point of view, that there are still some inconsistencies and misplaced conceptions about empirical models, which makes it difficult to choose the most appropriate model, to improve existing models, or even to develop new ones. One example is Wong and Chow [36], who present parametric and decomposition models. They present parametric models as those models that require detailed information of atmospheric conditions to predict diffuse radiation, and decomposition models as those models that require only measured global irradiance data (regardless of the mathematical function used in either case). It seems that part of the inconsistency in [36] is due to the different meanings of the word "parameter". In the multidisciplinary fields that comprise empirical models of solar radiation, the use of the word 'parameter' to refer to atmospheric conditions (such as temperature and pressure) is quite common, whereas in statistics these are referred to as 'variables', 'predictors', or 'regressors'¹. Moreover, in statistics, a parameter is an unknown coefficient, which relates, through a function, each regressor to the object of the study and is estimated by fitting this function to data. Therefore, given these inconsistencies, a classification criterion may help to choose the most appropriate model or to define the strategy of improving a particular class of models. In this work, a statistical classification is adopted where empirical models are categorized as parametric and non-parametric models. Parametric models include Generalized Linear Models (GLM) such as the logistic regression model [45]; linear² regression models³ (mostly polynomials [37–44]); non-linear regression models (such as segmented regression with unknown break points, sigmoid functions [31,46], and sine waves [47]); and autoregressive models (such as ARIMA [48] and ARMA [49]). Non-parametric models comprise neural network [32,33] and fuzzy logic [50] techniques.

¹ In the framework of regression models.
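The statistical meaning of "parameter" and of "linear with respect to parameters" can be illustrated with a small sketch. It fits a cubic polynomial in Kt by ordinary least squares: the function is non-linear in the regressor Kt but linear in the unknown coefficients, so the estimates follow from a linear system. The data, the coefficient values, and the helper names (`solve_linear_system`, `fit_polynomial`) are hypothetical and chosen only for illustration; this is not the model fitted in the paper.

```python
# Sketch: a polynomial in Kt is linear in its parameters (b0..b3),
# so ordinary least squares reduces to solving the normal equations.
# All numbers below are synthetic and purely illustrative.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x_values, y_values, degree):
    """Least-squares estimates of the parameters b0..b_degree."""
    # Regressors are the powers of the variable: 1, x, x^2, ...
    design = [[x ** p for p in range(degree + 1)] for x in x_values]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, y_values))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from known parameters.
kt = [0.05 * i for i in range(1, 19)]          # variable (regressor)
true_b = [0.95, 0.3, -2.6, 1.5]                # parameters (coefficients)
kd = [sum(b * x ** p for p, b in enumerate(true_b)) for x in kt]

estimates = fit_polynomial(kt, kd, degree=3)
print([round(b, 6) for b in estimates])
```

Here Kt plays the role of the variable (regressor), while the entries of `estimates` are the parameters in the statistical sense: unknown coefficients recovered by fitting the function to data.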

Diffuse solar radiation is an important component of the surface radiation budget; it depends on surface albedo and topography through the sky-view-fraction, and on the composition of the atmosphere, mainly clouds, particulate matter, and water vapor. Therefore, this work focuses on the development of an empirical model to predict diffuse solar radiation at the surface, based on the Liu–Jordan model [37]. Known also as the "correlation model", it corresponds to a linear regression model and is based on the correlation between the diffuse fraction (Kd) and the clearness index (Kt), which varies according to geographic position and time of year. There is a large body of knowledge with respect to modeling diffuse solar radiation at the surface using a regression model based on the correlation between Kd and Kt [28–30,37–40,43,44,51–61]. These "correlation models" perform well for monthly and daily values of diffuse solar radiation, but they fail for hourly values and short-term variations. One reason is that most of the short-term variations in diffuse solar radiation at the surface caused by variations in cloud properties are present in Kd but undetectable by Kt; they end up spreading the points on the Kd–Kt dispersion diagram, so that for a given Kt value there is a wide range of Kd [39,40]. It is apparent that Kt is not enough to predict hourly values of Kd; more information about clouds is necessary. However, even though cloud is one of the most frequent and important atmospheric phenomena, acting as a critical factor in the climate system and in climate change processes [62], cloud type information is not available as easily as other traditional meteorological variables; this fact makes it difficult to incorporate cloud type information explicitly in empirical models. A review of the literature revealed no empirical parametric model for diffuse solar radiation at the surface that incorporates cloud type information explicitly.

² Linear with respect to parameters (and not to variables).
³ Linear models are simple if only one variable is used in the model, and multiple if more than one variable is included.

The importance of incorporating more environmental variables besides Kt, in particular the cloud effects, was assessed by Soares et al. [32], applying the pattern recognition ability of the neural network technique. In this case, the neural network technique considered clouds implicitly, using downward atmospheric long-wave radiation at the surface as a surrogate to estimate the hourly values of diffuse solar radiation. Even though the neural network model performed better than the correlation model developed previously by Oliveira et al. [41], this technique did not allow the effects of the environmental variables on Kd to be quantified explicitly. Moreover, from a practical point of view, neural networks are rather cumbersome to use by people other than those who developed the algorithm. This drawback seems to be characteristic of most non-parametric models [33,50], making them difficult to apply to places and conditions different from the ones used to develop them.

In fact, one important problem concerning empirical modeling is the need for wide-range applicability. An empirical model has a wide applicability if it is built on a few easily available variables, and if it captures the main dynamics between Kd and the environmental variables through a simple function. However, that function should be not only simple but also accurate in its ability to predict Kd. In this way, the model can be applied to other regions taking the structure (that is, the function) as fixed: only the coefficients of the variables will change by region, because the model can be fitted to different datasets. If the relationship found is universal, the coefficients will not change significantly across regions. Since the word "universal" is not very common in empirical modeling, it is most likely that variations of the coefficients will help, eventually, in grouping those regions with similar characteristics.

Another factor that makes developing an empirical model for diffuse solar radiation at the surface difficult is that most solar radiation information is gathered in surface stations located in urban areas, where solar radiation fields are likely to be affected by the pollution present in the atmosphere and by the radiometric properties of the urban canopy. Air pollution affects diffuse solar radiation directly, through scattering by aerosols [63,64], and indirectly, by increasing the number of water droplets in polluted clouds [65]. In general, urban surfaces reflect less solar radiation than rural or naturally vegetated surfaces because their geometry favors absorption of radiation by increasing the interaction between radiation and the surface through multiple reflections (trapping effect). In addition, large portions of urban surfaces are made of materials such as concrete or asphalt, which are characterized by a smaller albedo than naturally or artificially vegetated surfaces [66]. The combination of these factors causes the spread of points in the Kd–Kt diagram [39,40].

The objective of this work is twofold:

• To build a more complete model for hourly values of diffuse solar radiation at the surface by identifying and isolating explicitly, through a regression model, the effects of cloud, particulate matter, and meteorological variables on Kd that are missing in diffuse fraction studies.

• To rank the variables of the more complete regression model based on their effect on Kd, in order to use only the most influential ones in a simple model for widespread applicability and good performance.

In this study, the regression model was the statistical tool for investigating the relationship between Kd and a large set of environmental variables measured in an urban area of São Paulo, the largest conurbation in South America, where pollution and other urban effects could affect the solar radiation field at the surface. The idea of using a large set of information in one single model was already used by Soares et al. [32], but the current regression model provides an explicit expression for the effects of the environmental variables on Kd.

In this work, the data set description is presented in Section 2. The climate representativeness and the functionality of the Kd–Kt relationship for São Paulo are demonstrated in Section 3. The model construction is described in detail in Section 4. There, the dynamics among Kd and the regressors are investigated and a new, simplified version of the model is presented. Section 5 summarizes the major findings and concludes the paper.

2. Site, instruments, and data set

The data used here correspond to hourly values of global and diffuse solar radiation at the surface, air temperature, relative humidity, and atmospheric pressure measured continuously during the entire year of 2002, in the city of São Paulo, Brazil (23°33′34″S, 46°44′01″W). The analysis also included cloudiness and cloud type, estimated hourly at the meteorological surface station located in the 'Parque Estadual Fontes do Ipiranga', as well as hourly values of particulate matter observed at the surface in the urban area of São Paulo.

Global and diffuse solar radiation, air temperature, air relative humidity, and air pressure measurements were taken on a micrometeorological platform located at the top of the building housing the Institute of Astronomy, Geophysics and Atmospheric Sciences of the University of São Paulo, at the University Campus on the west side of the city of São Paulo (IAG, at 744 m above mean sea level; 23°33′35″S, 46°43′55″W). All measurements were taken with a sampling frequency of 0.2 Hz and stored as 5-min averages.

A pyranometer, model 8-48, built by Eppley Lab. Inc. measured global solar irradiance. A pyranometer, Eppley model PSP, coupled to a shadow-ring device [67] measured diffuse solar irradiance. The diffuse solar irradiance was corrected by multiplying the observed value by a correction factor that considered the fraction of the diffuse field that was blocked by the shadow-ring. This factor was calculated on a daily basis because variations within a period of 24 h were not significant and could be discarded to simplify data processing. The sensors had been periodically calibrated, at least once a year, using a spectral precision pyranometer model PSP, from Eppley Lab. Inc., as a secondary standard; the calibration consisted of running both pyranometers continuously, side-by-side, over 2–7 days [13].

The clearness index (Kt) was defined as the ratio of global solar radiation at the surface to extraterrestrial solar radiation. Extraterrestrial radiation was estimated analytically [68] considering the solar constant equal to 1366.1 W m⁻² [69]. The diffuse fraction (Kd) was defined as the ratio of diffuse solar radiation to global solar radiation at the surface.
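The two ratios defined above can be sketched in a few lines. The analytical expression from [68] is not reproduced in the text, so the extraterrestrial term below uses a common textbook approximation (solar constant times an eccentricity correction times the cosine of the solar zenith angle); that formula, the function names, and the sample values are assumptions made for illustration only.

```python
import math

SOLAR_CONSTANT = 1366.1  # W m^-2, the value quoted in the text [69]


def extraterrestrial_horizontal(day_of_year, cos_zenith):
    """Extraterrestrial irradiance on a horizontal plane (W m^-2).

    Assumption: the simple eccentricity correction
    1 + 0.033 * cos(2*pi*n/365); the paper's own expression [68]
    may differ.
    """
    eccentricity = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    return SOLAR_CONSTANT * eccentricity * max(cos_zenith, 0.0)


def clearness_index(global_rad, extraterrestrial_rad):
    """Kt = global / extraterrestrial; None when the sun is below the horizon."""
    if extraterrestrial_rad <= 0.0:
        return None
    return global_rad / extraterrestrial_rad


def diffuse_fraction(diffuse_rad, global_rad):
    """Kd = diffuse / global; None when there is no global radiation."""
    if global_rad <= 0.0:
        return None
    return diffuse_rad / global_rad


# Hypothetical values (W m^-2), not taken from the paper's data set.
i0 = extraterrestrial_horizontal(day_of_year=172, cos_zenith=0.6)
kt = clearness_index(global_rad=500.0, extraterrestrial_rad=i0)
kd = diffuse_fraction(diffuse_rad=200.0, global_rad=500.0)
print(round(kt, 3), round(kd, 3))
```

Returning None for night-time or missing values keeps undefined ratios out of later regression steps; how the authors handled such hours is not stated in this section.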

The air temperature and relative humidity were measured using a pair of thermistor and capacitive sensors from Vaisala. According to the manufacturer, the air temperature and relative humidity could be measured with an accuracy of 0.1 °C and 2%, respectively, for a range of temperatures from 0 to 40 °C and for relative humidity from 10% to 90%. A capacitive transducer manufactured by Setra Inc., with an accuracy of 0.1 mb, measured air pressure.

This study included hourly values of particulate matter concentration (PM10) measured at the surface in the C. Cesar station using the Beta Attenuator Method. C. Cesar station is part of the air-quality monitoring network of the São Paulo State Environmental Agency [70].

Finally, this work used observations of cloudiness and cloud type carried out every hour from 07:00 local time (LT) to 24:00 LT at the meteorological station located in the south of São Paulo City. Cloudiness corresponded to the fraction of sky covered by clouds, varying from 0 (i.e., 0%, clear sky) to 10 (i.e., 100%, cloudy). The type of cloud included traditional low (stratocumulus, stratus, and cumulus), middle (altocumulus and altostratus), and high level clouds (cirrus, cirrocumulus and cirrostratus). Clouds of the type 'cumulus nimbus' were considered to be located in all three levels.
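The grouping of cloud types into levels described above can be sketched as a small lookup that turns an hourly cloud-type observation into 0/1 level factors. The factor names LowC, MiddleC and HighC come from the Nomenclature; the dictionary, the function names, and the way observations are encoded as lists of strings are assumptions for illustration, not the authors' data format.

```python
# Level assignment as described in the text; 'cumulus nimbus' spans all levels.
CLOUD_LEVELS = {
    "stratocumulus": {"low"},
    "stratus": {"low"},
    "cumulus": {"low"},
    "altocumulus": {"middle"},
    "altostratus": {"middle"},
    "cirrus": {"high"},
    "cirrocumulus": {"high"},
    "cirrostratus": {"high"},
    "cumulus nimbus": {"low", "middle", "high"},
}


def level_factors(observed_types):
    """Return 0/1 factors for the presence of low, middle and high clouds."""
    levels = set()
    for cloud_type in observed_types:
        levels |= CLOUD_LEVELS[cloud_type.lower()]
    return {
        "LowC": int("low" in levels),
        "MiddleC": int("middle" in levels),
        "HighC": int("high" in levels),
    }


def cloudiness_fraction(tenths):
    """Convert cloudiness in tenths (0 = clear sky, 10 = overcast) to 0..1."""
    if not 0 <= tenths <= 10:
        raise ValueError("cloudiness must be between 0 and 10 tenths")
    return tenths / 10.0


# Hypothetical hourly observation.
print(level_factors(["Stratocumulus", "Cirrus"]), cloudiness_fraction(7))
```

An empty list of cloud types yields all-zero factors, which corresponds to a clear sky at every level.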

⁴ The p% quantile, Qp, of a variable is a value with an approximate percentage p of the data less than or equal to Qp.

⁵ Top and bottom of boxes show the 75% and 25% quantiles of the diffuse fraction Kd, while the top and bottom of the whiskers depict the maximum and minimum values of Kd, for each level of meteorological variables. The median value is the horizontal line inside boxes. The width of boxes and whiskers illustrates the spread of observations.

3. Representativeness and functionality of Kt–Kd relationship in São Paulo

The city of São Paulo, with about 11 million inhabitants, together with 38 other smaller cities, forms the Metropolitan Region of São Paulo. This region, located about 60 km away from the Atlantic Ocean, has 20.5 million inhabitants and more than 7 million vehicles distributed over an area of 8051 km². It is the largest urban area in South America, and one of the 10 largest in the world [71].

The city of São Paulo is located in the State of São Paulo, Brazil, at approximately 770 m above mean sea level and 60 km westward from the Atlantic Ocean. Its climate, typical of subtropical regions of Brazil, is characterized by a dry winter from June to August and a wet summer from December to February. The minimum monthly averages of daily air temperature values and air relative humidity occur in July and August (16 °C and 74%, respectively), and the minimum monthly accumulated precipitation occurs in August (35 mm). The maximum monthly average of daily temperature values occurs in February (22.5 °C), and the maximum monthly averages of daily relative humidity values occur from December through January and from March through April (80%).

As mentioned in Section 2, this work is based on observations carried out during 2002. This time restriction was determined by the fact that only in 2002 was there a complete set of hourly observations of cloud type and cloud cover available in São Paulo.

To verify the temporal representativeness of 2002, the monthly averages of daily values of the variables considered in the study (global and diffuse solar radiation, air temperature, relative humidity, rain, and concentration of particulate matter) were compared with those over a 10-year period (from 1997 to 2006). The monthly average profile of the variables investigated during 2002 was similar to that of the 10 years of observations from 1997 to 2006 (Fig. 1).

To assess the representativeness objectively, a set of two-sample t-tests for unpaired data was performed for each variable and for each month, and the hypothesis of equality in mean between 2002 and the 1997–2006 period was tested at a significance level of 0.05. As shown in Table 1, except for temperature, the number of months in which the monthly averages of daily values for the two periods could be considered equal varied from 9 to 12 (from 75% to 100%). Therefore, there was good agreement between 2002 and 1997–2006, indicating that 2002 could be considered representative of the dominant climate conditions in São Paulo.
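The monthly comparison described above can be sketched with a two-sample t-test for unpaired data. The text does not say whether equal variances were assumed, so the sketch uses Welch's unequal-variance form as an assumption; the p-value comes from a plain Simpson integration of the Student t density so that only the standard library is needed. The function names and sample data are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), integrating the density with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, df, two_sided_p_value(t_stat, df)


def means_equal(sample_a, sample_b, alpha=0.05):
    """True when H0 (equal means) is not rejected at level alpha."""
    return welch_t_test(sample_a, sample_b)[2] >= alpha


# Hypothetical daily values for one month: year 2002 versus the long period.
year_2002 = [18.1, 19.4, 17.8, 20.2, 19.0, 18.6, 19.9]
long_term = [18.5, 19.1, 18.0, 19.8, 19.3, 18.2, 20.1, 18.9, 19.5]
print("Yes" if means_equal(year_2002, long_term) else "No")
```

Applied once per variable and per month, a helper such as `means_equal` would produce the kind of Yes/No entries summarized in Table 1.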

It should be emphasized that a larger dataset would be preferable in studying the dynamics (trends and variability) of diffuse solar radiation. However, this study focused on identifying the most appropriate relationship between diffuse solar radiation and global solar radiation, taking into consideration explicitly conventional meteorological variables (temperature, relative humidity, and atmospheric pressure), local patterns of clouds, and air pollution. Therefore, it seemed reasonable to assume that if the climate of São Paulo during the sampled period did not indicate behavior too different from the local climate, 1 year would suffice to represent the most significant interactions between diffuse and global solar radiation and all the other meteorological variables.

Considering the available information sufficient, the wide spreading of points on the dispersion diagram between the diffuse fraction Kd and the clearness index Kt in Fig. 2a indicated that it was necessary to gather more information about how the local climate features affected the diffuse fraction, since the clearness index in itself was not enough, regardless of the model specification [39,40]. One way to address the local climate effects on the Kd–Kt relationship was to consider the 2.5%, 5%, 25%, 50% (median), 75%, 95% and 97.5% quantiles⁴ of air relative humidity, air temperature, pressure, cloudiness, and PM10 (Table 2).

The functionality of the Kd–Kt relationship with respect to relative humidity, air temperature, pressure, and cloudiness could be found by considering the joint information between the diffuse fractions and quantiles of these variables shown in box-plot diagrams⁵ (Fig. 3a–d).

In the case of relative humidity (Fig. 3a), Kd presented a large amount of variability when relative humidity assumed values between 61% and 88%, since the middle 50% of the diffuse fraction values lay approximately between 0.25 and 0.8. In contrast, for the other levels of relative humidity, the middle 50% of the Kd observations was much more concentrated, even though the width of the whiskers outlined a wide range of data. Regarding cloudiness, it is interesting to see that during 50% of the observation time (from median to maximum value), more than 90% of the sky was covered by clouds, and that the sky was completely clear in São Paulo only for a small percentage of the time (evaluated separately as 12%). Overall, the distribution of boxes suggested the use of polynomial functions to specify the relationship between Kd and those variables.

On the other hand, quantiles were used in a different way in the case of particulate matter. Table 2 indicates that in São Paulo, for approximately 25% of the hours, the particulate matter concentration values remained less than or equal to 26 µg m⁻³. Moreover, for approximately 75% of the hours, the particulate matter concentration values remained less than or equal to 54 µg m⁻³. Consequently, 50% of the time, the particulate matter concentration in São Paulo remained between 26 µg m⁻³ and 54 µg m⁻³. Since the polynomial class was not suitable for this variable, it was decided to use the first quartile (the 25% quantile), 26 µg m⁻³, as a threshold to indicate low or high values of particulate matter in the diffuse fraction modeling for São Paulo. The idea in this case was to express the effect of particulate matter on the behavior of diffuse solar radiation in São Paulo in terms of high and low values, indicated by the patterns associated with concentrations above and below the first quartile of 26 µg m⁻³. In this case, the particulate matter concentration was included in the model by a factor (dichotomous regressor).
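The construction of the dichotomous PM10 regressor can be sketched as follows: take the first quartile of the hourly concentrations as the threshold and code each hour as 0 (low) or 1 (high). The sample concentrations are hypothetical, the quantile interpolation method is an assumption (the paper does not state which estimator was used), and assigning values equal to the threshold to the low class is likewise an assumption.

```python
from statistics import quantiles


def first_quartile(values):
    """25% quantile; quantiles(..., n=4) returns the three quartile cut points."""
    return quantiles(values, n=4, method="inclusive")[0]


def pm10_factor(concentration, threshold):
    """Dichotomous regressor: 0 for low PM10, 1 for high PM10.

    Assumption: values equal to the threshold are coded as low (0).
    """
    return 1 if concentration > threshold else 0


# Hypothetical hourly PM10 concentrations (µg m^-3).
pm10 = [12, 20, 26, 31, 40, 54, 60, 75, 90]
threshold = first_quartile(pm10)
factors = [pm10_factor(value, threshold) for value in pm10]
print(threshold, factors)
```

With the paper's data the threshold would be the reported 26 µg m⁻³; in a regression fit, the resulting 0/1 column enters the design matrix like any other regressor.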

It should be pointed out that approximately 25% of the time, the particulate matter concentration in São Paulo was above the secondary standard value of 60 µg m⁻³ [70], indicating that the atmosphere of São Paulo was very polluted in 2002. The origin of


Fig. 1. Annual evolution of (a) global solar radiation, (b) diffuse solar radiation, (c) temperature, (d) relative humidity, (e) rain and (f) particulate matter observed in the city of São Paulo.

Table 1. Two-sample t-test for unpaired data. Significance level is 5%. Yes/No indicates whether the mean of 2002 is equal to the mean of 1997–2006.

| Month | Global | Diffuse | PM10 | Temp | RH | Rain |
|---|---|---|---|---|---|---|
| January | Yes | No | Yes | Yes | Yes | Yes |
| February | Yes | No | Yes | No | Yes | Yes |
| March | No | No | Yes | No | No | Yes |
| April | Yes | Yes | Yes | No | No | Yes |
| May | Yes | Yes | Yes | No | Yes | Yes |
| June | Yes | Yes | Yes | No | Yes | Yes |
| July | Yes | Yes | Yes | Yes | Yes | Yes |
| August | Yes | Yes | Yes | No | Yes | Yes |
| September | Yes | Yes | Yes | Yes | Yes | Yes |
| October | Yes | Yes | No | No | No | Yes |
| November | Yes | Yes | Yes | No | Yes | Yes |
| December | Yes | Yes | No | No | Yes | Yes |
| % of Yes | 92 | 75 | 83 | 25 | 75 | 100 |

244 C. Furlan et al. / Applied Energy 92 (2012) 240–254

the particulate matter in São Paulo is predominantly anthropogenic, associated in most cases with fossil fuel combustion (vehicles and industries), as indicated by the high levels of C and sulfate in its solute fraction according to [72]. The marine contribution to sulfate in the particulate matter present in São Paulo was negligible in the summer and below 10% in the winter according to [72]. As will be seen in the following sections, not only the concentration but also the nature of particulate matter in São Paulo would play an important role in the definition of the most appropriate model for diffuse solar radiation at the surface, because the particulates acted as nuclei of haze droplets, participating in cloud formation and contributing indirectly to how clouds interacted with solar radiation in São Paulo.

Similar to particulate matter, the functionality of the cloud type effects on diffuse solar radiation at the surface in São Paulo was considered in the model by factors. Two sets of factors were introduced: nine factors for cloud types and seven for cloudiness patterns. The justification for cloud type was straightforward, since the first set of factors represented the effect of each of the nine types of clouds on diffuse solar radiation (the types of clouds included were routinely observed at the surface station). On the other hand, the second set of factors indicated the effects of the vertical distribution of clouds on diffuse solar radiation observed at the surface; vertical distribution was based on cloud type. Table 3 shows that in São Paulo the absence of cloud at the low level occurred in 30% of the hours of observation in 2002, at the middle level in 77% of the hours, and at the high level in 73% of the hours. By using this information, it was possible to identify three predominant cloudiness patterns, in terms of altitude, and one clear sky pattern. In fact, the observations in São Paulo indicated the presence of cloud only at low altitude (i.e., clear sky at middle and high altitudes) in 47% of the cases, at the middle altitude only in 4%, and


Fig. 2. Diffuse radiation fraction (Kd) versus clearness index (Kt) for (a) 100% of observations (n = 3887), (b) 75% of observations (n = 2915). Line in panel (b) represents model (2).

Table 2. Minimum, maximum, median, and 2.5%, 5%, 25%, 75%, 95% and 97.5% quantiles of traditional meteorological variables and particulate matter concentration at the surface in São Paulo in 2002.

| Variables | Min. | 2.5% | 5% | 25% | Median | 75% | 95% | 97.5% | Max. |
|---|---|---|---|---|---|---|---|---|---|
| Relative humidity (%) | 18 | 24 | 44 | 61 | 77 | 88 | 98 | 100 | 100 |
| Temperature (°C) | 8 | 11 | 15 | 20 | 23 | 26 | 31 | 31 | 35 |
| Pressure (mb) | 910 | 918 | 923 | 926 | 929 | 932 | 936 | 937 | 950 |
| PM10 (µg m⁻³) | 0 | 2 | 11 | 26 | 38 | 54 | 95 | 113 | 281 |
| Cloudiness index | 0 | 0 | 0 | 4 | 9 | 10 | 10 | 10 | 10 |

Fig. 3. Box-plot diagrams of diffuse fraction Kd for (a) relative humidity, (b) temperature, (c) pressure and (d) cloudiness. For each variable, solid lines represent fitted third degree polynomial functions.



Table 3. Percentages of cloud type and of cloudiness conditions in terms of altitude in São Paulo in 2002.

| Low | % | Middle | % | High | % | Mixed | % |
|---|---|---|---|---|---|---|---|
| Cloud type | | | | | | | |
| Stratocumulus | 54 | Altocumulus | 17 | Cirrus | 27 | Cumulus nimbus | 1 |
| Stratus | 15 | Altostratus | 14 | Cirrocumulus | 6 | | |
| Cumulus | 30 | | | Cirrostratus | 1 | | |
| Clear sky | 30 | Clear sky | 77 | Clear sky | 73 | | |
| Cloudy pattern | | | | | | | |
| Only at the low altitude | 47 | Only at the middle altitude | 4 | Only at the high altitude | 9 | Clear sky (at the three altitudes) | 12 |


at the high altitude only in 9% of the cases. It also indicated that clear sky (no clouds at the low, middle and high levels simultaneously) occurred in 12% of the hours of observation.

More details about the implementation of the factors in the general regression model are described in Section 4.3.

⁶ Since, in the segmented regression, the endpoints are considered parameters, the model itself is non-linear in its parameters; consequently, the widespread least squares technique is not appropriate and is replaced by the non-linear least squares technique.

4. Model description

As part of the model description, this section shows how the model construction can be used to investigate the existing relationship between Kd and Kt and how it changes in terms of temperature, relative humidity, and atmospheric pressure, hereafter referred to as regressors. The section includes an explanation of how the model's ability to predict hourly values of diffuse solar radiation is improved by the use of new regressors such as particulate matter concentration at the surface, cloudiness, cloud type, cloudiness patterns, and local climate seasonal variation, in the mathematical form of factors.

Even though the introduction of factors increased the complexity of the model, it allowed the model to convey all the information relevant to describing diffuse solar radiation in just one expression that could be validated in a wide range of places and climate conditions.

It should be emphasized that the regressors correspond to the predictor variables used in all empirical models [31,61]. The current model was further simplified by the objective selection of the most relevant regressors with a variable ranking analysis. As a result, a new simple regression model that takes into consideration only cloud effects, but still offers widespread applicability and good performance for predicting hourly diffuse solar radiation, is proposed in this section.

4.1. Selecting training and test dataset

The study conducted model fitting and selection by splitting the dataset into training and test sets. The training set was used to build a satisfactory model for Kd, while the test set measured the performance of this model. Here, the test set is used in a comparison of the proposed model to the models developed by Oliveira et al. [41] and Soares et al. [32].

There is no general rule when it comes to choosing the size of the training and test sets, especially with a large amount of data. In this study, 25% of the dataset was adopted for the test set (as suggested by Hastie et al. [73]) and, consequently, 75% was used for the training set. The elements of both subsets were chosen randomly to guarantee the representativeness of all climate patterns present in 1 year.
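The random 75%/25% split described above can be sketched as follows. This is a minimal illustration only: the function name `split_train_test`, the fixed seed, and the use of integer stand-ins for the hourly records are assumptions, not the authors' procedure.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Randomly split records into (train, test) lists."""
    indices = list(range(len(records)))
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test

# Stand-in for the 3887 hourly observations reported in Fig. 2a.
hours = list(range(3887))
train, test = split_train_test(hours)
print(len(train), len(test))
```

With 3887 records this gives a training set of 2915 elements, which agrees with the sample size quoted for the 75% subset in Fig. 2b.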

4.2. Segmented regression

To build the model, it was first necessary to identify a genuine relationship between Kd and Kt, using as criteria the best compromise between empirical evidence (see Fig. 2a) and certain pre-existing information. In most previous studies, based only on the relationship between Kd and Kt, regression models consisted of polynomial functions fitted into a given interval of Kt variation, joined with horizontal lines for values of Kt outside that interval [41,42].

It should be emphasized that regression models are a good statistical tool, with both the simplicity and the flexibility to capture the Kd–Kt relationship. The main problem in previous studies was the subjectivity in the choice of the endpoints of the Kt variation interval: in fact, it was common to determine them by eye.

Recently, sigmoid functions have been introduced in the solar radiation modeling field to avoid the use of break points [31,61]. In these cases, the use of sigmoid functions, implemented through non-linear relationships between Kd and the regressors, improved the performance of the diffuse solar radiation models; however, such a procedure seemed to go in the opposite direction to implementing a model whose parameters (the weight that the corresponding regressors have in the Kd expression) are easily connected to their fundamental physical meaning.

In this work, to keep simplicity and to reduce subjectivity, the endpoints were estimated objectively. In particular, use was made of a segmented regression [74], which allowed different regression models of Kd (with respect to Kt) to be fitted to different Kt variation windows. The endpoints of the Kt variation windows were also estimated. The authors have not found any previous work where the endpoints (also called "break points") were estimated objectively. Thus, this section provides a detailed description of the segmented regression, because the objective estimation of break points is a novelty.

Visual inspection of the Kt–Kd diagram (Fig. 2a) indicates the existence of two distinct behaviors: the sky totally covered by cloud (cloudy), where Kd is of the order of 1 for small values of Kt; and the transition from cloudy to clear sky, where Kd varies strongly with Kt (approximately for Kt > 0.3). Based on these behaviors, it was decided to fit a different straight line to each interval, implementing a so-called broken line. In particular, the broken-line segmented regression was defined by (i) the break point that split the Kt variation window into two intervals, characterized by (ii) two intercepts and (iii) two slopes. Parameters (i)–(iii) were simultaneously estimated by the non-linear least squares technique.⁶

Therefore, analytically, the broken-line model was defined as follows, with the interval of Kt variation split into two intervals (0, c) and (c, 1) by the unknown break point c:


⁷ A factor of level 1 and 0 (if a condition happens or does not happen) is a qualitative variable that expresses two categories (or levels). Qualitative variables can be coded to appear numeric but their numbers are meaningless. The values of a qualitative variable do not imply a numerical ordering; they are simply categories. On the other hand, the corresponding coefficient has a physical meaning, and represents the effect of the manifestation of that condition on Kd, controlling for the other regressors. In general, factors can have more than two levels.


$$
K_d =
\begin{cases}
(b_0 - b_1 c) + b_1 K_t, & K_t < c \\
(b_0 - b_2 c) + b_2 K_t, & K_t > c
\end{cases}
\tag{1}
$$

For Kt < c, the straight line was characterized by intercept (b0 − b1c) and slope b1, while for Kt > c, it was characterized by intercept (b0 − b2c) and slope b2. The estimate of the break point c was 0.228, of b0 was 0.97, of b1 was −0.07 and of b2 was −1.64, while the corresponding standard errors were 0.006, 0.01, 0.08 and 0.08, respectively.
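As an illustration of expression (1), the broken line can be evaluated with the estimates quoted above. The sketch below is only a direct transcription of the formula; the function name `broken_line` is hypothetical, and the choice of assigning Kt = c to the right-hand branch is an assumption (both branches give b0 at that point, so the choice does not affect the value).

```python
def broken_line(kt, b0=0.97, b1=-0.07, b2=-1.64, c=0.228):
    """Model (1): two straight lines joined at the break point c."""
    slope = b1 if kt < c else b2
    # Each branch has intercept (b0 - slope * c), so both equal b0 at kt = c.
    return (b0 - slope * c) + slope * kt

for kt in (0.10, 0.228, 0.50):
    print(kt, round(broken_line(kt), 5))
```

Because both intercepts are written relative to the same b0, the two segments meet at Kt = c, which is what makes the fitted curve a continuous broken line rather than two unrelated regressions.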

Considering the model given by expression (1) as a starting point, a test was run to verify whether the data supported the assumption of no dependency of Kd on Kt, for Kt < c; this can be expressed through the following statistical hypotheses:

H0: b1 = 0, Kd does not depend on Kt, for Kt < c;
H1: b1 ≠ 0, Kd varies linearly in terms of Kt, for Kt < c.

A t-test was performed with a level of significance α = 0.05, and since hypothesis H0 was not rejected, with a p-value of 0.23, model (1) could then be reduced as follows:

$$
K_d =
\begin{cases}
b_0, & K_t < 0.228 \\
(b_0 - b_1 c) + b_1 K_t, & K_t > 0.228
\end{cases}
\tag{2}
$$

where b2 of model (1) has been replaced by b1 for simplicity.

It should be emphasized that since c was no longer a parameter (it was estimated previously in model (1)), regression model (2) was linear and could be fitted with the ordinary least squares technique. In summary, model (2) indicated that Kd was constant for Kt < 0.228 and followed a straight line for Kt > 0.228 (Fig. 2b). For the year 2002, the estimate of b0 was 0.961 and of b1 was −1.65, while the corresponding standard errors were 0.003 and 0.01, respectively.

The large non-homogeneous variability of observations with respect to the red line in Fig. 2b indicates the lack of fit of model (2) (coefficient of determination R² = 0.86). It should be highlighted here that a similar lack of fit would occur for any regression model of Kd in terms of Kt (such as polynomial functions) without any extra information about the local climate features.

4.3. General multiple regression model

To build a general multiple regression model, the segmented structure obtained from the relationship between Kd and Kt of model (2), with c = 0.228, was kept. Furthermore, information about (a) traditional meteorological variables (air relative humidity, %; air temperature, °C; and surface pressure, mb), (b) atmospheric pollution (particulate matter concentration at the surface, µg m⁻³), (c) cloudiness and cloud type and (d) seasonality was incorporated into model (2) using the following linear regression model:

$$
K_d = b_0 + b_1 (K_t - c)\, I_{K_t > 0.228} + b_2 X_2 + \cdots + b_q X_q
\tag{3}
$$

where I_{Kt>0.228} was an indicator function that assumed value 1 if Kt > 0.228 and 0 otherwise, (b0, b1, ..., bq) were the parameters, and (X2, ..., Xq) were the regressors representing the information outlined above in (a), (b), (c) and (d).
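The indicator term in expression (3) is simply a compact way of writing the two branches of model (2). The following sketch checks this numerically using the 2002 estimates of model (2) (b0 = 0.961, b1 = −1.65); the function names are hypothetical and no additional regressors X2, ..., Xq are included.

```python
C = 0.228  # break point estimated from model (1)

def model2_piecewise(kt, b0=0.961, b1=-1.65, c=C):
    """Model (2) written as two branches."""
    if kt < c:
        return b0
    return (b0 - b1 * c) + b1 * kt

def model2_indicator(kt, b0=0.961, b1=-1.65, c=C):
    """Same model in the single-expression form used by model (3)."""
    indicator = 1 if kt > c else 0
    return b0 + b1 * (kt - c) * indicator

for kt in (0.10, 0.228, 0.50, 0.75):
    print(kt, round(model2_piecewise(kt), 4), round(model2_indicator(kt), 4))
```

Writing the segmented structure as one term is what allows the remaining regressors to be appended linearly, so the whole model can still be fitted by ordinary least squares once c is fixed.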

All three traditional meteorological variables (temperature, relative humidity, and pressure) and the cloudiness (fraction of sky cover from 0 to 10) were included in model (3) as regressors by using third degree polynomials. The family of polynomials was obtained by analyzing the box-plot diagrams indicated in Fig. 3a–d. The degree of the polynomial was chosen by separately fitting the polynomial regressions between Kd and each single variable, and by a selection procedure based on the analysis of variance [75]. The fitted third degree polynomial functions, the best choice in the case of temperature, relative humidity, and atmospheric pressure, are illustrated in Fig. 3a–d by the solid lines.

The air pollution variable, PM10, was included in model (3) as a factor⁷ of level 0 if PM10 was smaller than 26 µg m⁻³ (25% quantile), and of level 1 if PM10 was greater than 26 µg m⁻³. Its coefficient measured the effect that values of PM10 greater than 26 µg m⁻³ had on Kd, controlling for the other regressors.
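A dichotomous regressor of this kind can be coded in one line. In the sketch below the function name `pm10_factor` is hypothetical, and assigning a concentration of exactly 26 µg m⁻³ to level 0 is an assumption, since the text only specifies the levels for values smaller and greater than the threshold.

```python
PM10_THRESHOLD = 26.0  # µg m^-3, the 25% quantile in Table 2

def pm10_factor(pm10):
    """Level 1 for concentrations above the threshold, level 0 otherwise.

    Assumption: a value exactly equal to the threshold is coded as 0.
    """
    return 1 if pm10 > PM10_THRESHOLD else 0

# Quantile values of PM10 taken from Table 2 (2.5%, 5%, 25%, median, 75%).
table2_values = [2, 11, 26, 38, 54]
print([pm10_factor(v) for v in table2_values])
```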

Cloud information, besides the cloudiness, was included in the model using cloud type and cloudiness pattern. Cloud type consisted of a set of 9 factors that assumed level 1 when a given cloud was present and level 0 otherwise. Factors were labeled by the type of cloud as Stratocumulus, Stratus, Cumulus, Altocumulus, Altostratus, Cirrus, Cirrocumulus, Cirrostratus, or Cumulus Nimbus. The coefficient of each factor measured the effect that each type of cloud had on Kd, controlling for the other regressors. Cloudiness pattern consisted of seven factors characterizing the vertical distribution of both clouds and clear sky. Factors were labeled by the observed cloudiness pattern as ClearSkyLow, ClearSkyMiddle, ClearSkyHigh, ClearSky, CloudyLow, CloudyMiddle, or CloudyHigh. These factors were specified in terms of altitudes and assumed level 1 when clouds or clear skies were present at a given altitude and level 0 otherwise. For example, if ClearSkyLow was 1, it meant that the sky was clear at the low altitude and was cloudy at the middle and high levels. Similarly, if CloudyLow was 1, it meant that there were clouds at the low altitude but that it was clear at the middle and high levels. Their coefficients measured the effect of the presence (cloudy) or absence (clear sky) of cloud at a particular level (low, middle, or high) on Kd, controlling for the other regressors.
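One way to read the seven cloudiness-pattern factors is as a coding of the eight possible combinations of cloud presence at the three levels, with "cloud at all three levels" acting as the reference case in which every factor is 0. The sketch below follows the two examples given in the text (ClearSkyLow and CloudyLow); the definitions of the remaining factors are inferred by analogy and, like the function name, should be treated as assumptions.

```python
# Keys are (cloud at low, cloud at middle, cloud at high).
PATTERN_LABELS = {
    (False, True, True): "ClearSkyLow",      # clear only at the low level
    (True, False, True): "ClearSkyMiddle",   # clear only at the middle level
    (True, True, False): "ClearSkyHigh",     # clear only at the high level
    (False, False, False): "ClearSky",       # clear at the three levels
    (True, False, False): "CloudyLow",       # cloud only at the low level
    (False, True, False): "CloudyMiddle",    # cloud only at the middle level
    (False, False, True): "CloudyHigh",      # cloud only at the high level
}

def cloudiness_pattern(low, middle, high):
    """Return the seven 0/1 pattern factors for one hourly observation."""
    factors = {name: 0 for name in PATTERN_LABELS.values()}
    key = (bool(low), bool(middle), bool(high))
    if key in PATTERN_LABELS:  # (True, True, True) leaves all factors at 0
        factors[PATTERN_LABELS[key]] = 1
    return factors

print(cloudiness_pattern(low=True, middle=False, high=False))
```

Under this reading at most one pattern factor is 1 for any hour, so the coefficients can be interpreted as shifts in Kd relative to the fully cloudy reference case.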

The seasonal variation of the local climate was included in the model by introducing a new variable Season, which in the case of São Paulo was a factor that assumed level 1 during the warmest and wettest months (September–February) and level 0 during the coldest and driest months (March–August). This choice was based on the fact that the solar radiation field in the region of São Paulo was characterized by two distinct behaviors during the driest (coldest) and wettest (warmest) months (see Figs. 10 and 11 on p. 242 in [13]). The idea, here, was to replace different equations to take into account seasonal variation effects on diffuse radiation (one for the winter and one for the summer) with only one equation. This was possible by introducing the seasonal effect through a factor in the model. Its coefficient measured the effect that the warmest and wettest months had on Kd, controlling for the other regressors. In particular, this factor allowed capturing part of the variation of Kd existing between the two analyzed seasons that was not covered by the other variables (such as temperature, relative humidity and surface pressure). If the coefficient of Season is significant, it means that the information caught by this factor will provide useful information to improve the prediction of Kd.
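The Season factor depends only on the calendar month, so it can be coded directly from the month number. A minimal sketch, with a hypothetical function name:

```python
WARM_WET_MONTHS = {9, 10, 11, 12, 1, 2}  # September–February

def season_factor(month):
    """Level 1 for the warmest and wettest months, level 0 otherwise."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return 1 if month in WARM_WET_MONTHS else 0

print([season_factor(m) for m in range(1, 13)])
```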

4.4. Interactions

Even though the general model (3) included all the available relevant variables, to model diffuse radiation properly it was necessary to capture some of the complexity of the atmospheric dynamics and the manner in which they were modulated by local features of the climate of São Paulo City.

The structure of model (3) allowed one to answer only simple questions such as ‘Does the diffuse fraction grow with PM10, clear sky or in the warm season?’ However, in order to deal with the next level of complexity, the model had to be able to answer questions



such as ‘What happens to the diffuse fraction on a day with a clear sky and high levels of PM10?’ The model had to be able to identify how PM10 concentration at the surface contributed to the growth (or reduction) of the diffuse fraction in relation to whether the sky was clear or cloudy. From a statistical point of view, this could be accomplished by allowing an interaction between PM10 concentration and clear sky in the model.

Therefore, the complexity of model (3) would be increased by including interactions between variables. Here, with the available information, only a few joint relationships were worthy of interest and the model would take into consideration the following interactions:

• PM10 concentration at the surface and clear sky (i.e., what happened to Kd when PM10 concentration at the surface was high [or low] and the sky was clear [or cloudy]).
• Seasonal variation and clear sky (i.e., what happened to Kd when it was warmer and wetter [or colder and drier] and the sky was clear [or cloudy]).
• Seasonal variation and PM10 concentration at the surface (i.e., what happened to Kd when it was warmer and wetter [or colder and drier] and PM10 concentration at the surface was high [or low]).

Table 4. Coefficients and corresponding level of significance for all variables of model (4).

To evaluate the three interactions presented above, it was necessary to include 9 new regressors in model (3), resulting from the product of the following factors:

• Air pollution with cloudiness pattern: PM10×ClearSkyLow, PM10×ClearSkyMiddle, PM10×ClearSkyHigh and PM10×ClearSky.
• Seasons with cloudiness pattern: Season×ClearSkyLow, Season×ClearSkyMiddle, Season×ClearSkyHigh and Season×ClearSky.
• Seasons with air pollution: Season×PM10.

Since each interaction was the product of two factors, it was itself a factor and also assumed levels 0 and 1. Interactions assumed level 1 when both factors were 1; otherwise, they assumed level 0. For example, PM10×ClearSkyLow assumed level 1


Fig. 4. Frequency distribution of daily values of the AI index for the city of São Paulo during 2002.


if high values of PM10 were associated with a clear sky at a lowlevel.
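Since both factors take only the values 0 and 1, the interaction is just their product. The short sketch below lists the four possible combinations; the function name `interaction` is hypothetical.

```python
def interaction(factor_a, factor_b):
    """Interaction of two 0/1 factors: 1 only when both are 1."""
    return factor_a * factor_b

# (PM10, ClearSkyLow) -> PM10 x ClearSkyLow
table = {(a, b): interaction(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

The interaction coefficient therefore only enters the prediction in hours when both conditions hold, on top of the two main effects.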

It should be emphasized that with these interactions, model (3) totaled 40 variables, representing a general version in which all the regressors described above were included. However, to identify which terms of model (3) were significant, a selection procedure based on the analysis of variance [75] was implemented.
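The total of 40 variables is not itemized in the text. One accounting that reproduces it, assuming that each third degree polynomial contributes three terms and that the segmented Kt term counts as one regressor, is sketched below; the grouping is an assumption made for illustration.

```python
# Hypothetical itemization of the regressors of the general model (3).
regressor_counts = {
    "segmented Kt term": 1,
    "RH, T and P third degree polynomials": 3 * 3,
    "cloudiness third degree polynomial": 3,
    "PM10 factor": 1,
    "cloud type factors": 9,
    "cloudiness pattern factors": 7,
    "Season factor": 1,
    "interaction factors": 9,
}
total = sum(regressor_counts.values())
print(total)
```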

4.5. Choosing the best model and assessing the physical meaning of the coefficients of the most relevant variables

The general expression (3) of a regression model, segmented in Kt, was applied to the training set, using as regressors all the collected data presented in Sections 4.3 and 4.4. A selection procedure, based on the analysis of variance, was then applied to reduce the regressors to only those explanatory for Kd. Table 4 shows the estimates and the significance of the parameters after the selection procedure. The parameters significant at the 5% level (maximum) are highlighted in gray. The formal expression of the model, which hereafter will be referred to as model (4), was:

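$$
\begin{aligned}
K_d ={}& -0.281 - 1.244\,(K_t - 0.228)\,I_{K_t > 0.228} + 0.000003\,RH^2 \\
& + 0.001\,T - 0.000005\,T^3 + 0.000000001\,P^3 \\
& - 0.021\,\mathrm{PM10} + 0.027\,\mathrm{Season} + 0.03\,CI - 0.005\,CI^2 \\
& + 0.0004\,CI^3 + 0.016\,\mathrm{Stratocumulus} + 0.013\,\mathrm{Stratus} \\
& + 0.018\,\mathrm{Altostratus} - 0.021\,\mathrm{Cirrus} + 0.015\,\mathrm{Cirrocumulus} \\
& - 0.074\,\mathrm{CumulusNimbus} - 0.041\,\mathrm{ClearSkyLow} \\
& - 0.028\,\mathrm{ClearSkyMiddle} + 0.032\,\mathrm{CloudyMiddle} \\
& - 0.028\,\mathrm{CloudyHigh} + 0.024\,\mathrm{PM10} \times \mathrm{ClearSkyLow} \\
& + 0.017\,\mathrm{Season} \times \mathrm{ClearSkyMiddle} + 0.02\,\mathrm{Season} \times \mathrm{PM10}
\end{aligned}
\tag{4}
$$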
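To see how the factor and interaction terms of model (4) combine, the sketch below sums only those terms for a given set of active conditions. The continuous terms (Kt, RH, T, P, CI) are deliberately left out: their published coefficients are rounded to one significant figure, so absolute values of Kd computed from them would not be reliable. The names `FACTOR_COEF`, `INTERACTION_COEF` and `factor_contribution` are hypothetical.

```python
# Coefficients of the 0/1 factors, copied from expression (4).
FACTOR_COEF = {
    "PM10": -0.021, "Season": 0.027,
    "Stratocumulus": 0.016, "Stratus": 0.013, "Altostratus": 0.018,
    "Cirrus": -0.021, "Cirrocumulus": 0.015, "CumulusNimbus": -0.074,
    "ClearSkyLow": -0.041, "ClearSkyMiddle": -0.028,
    "CloudyMiddle": 0.032, "CloudyHigh": -0.028,
}
# Coefficients of the three selected interactions.
INTERACTION_COEF = {
    ("PM10", "ClearSkyLow"): 0.024,
    ("Season", "ClearSkyMiddle"): 0.017,
    ("Season", "PM10"): 0.02,
}

def factor_contribution(active):
    """Shift in Kd from the factors at level 1, other regressors held fixed."""
    active = set(active)
    total = sum(c for name, c in FACTOR_COEF.items() if name in active)
    total += sum(c for (a, b), c in INTERACTION_COEF.items()
                 if a in active and b in active)
    return total

print(round(factor_contribution({"PM10"}), 3))
print(round(factor_contribution({"PM10", "ClearSkyLow"}), 3))
print(round(factor_contribution({"Season", "PM10"}), 3))
```

For instance, high PM10 together with a clear sky at the low level combines the two main effects with the positive interaction term, so the net shift is less negative than the sum of the two main effects alone.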

From the signs of the coefficients, it could be concluded that Stratocumulus, Stratus, Altostratus, Cirrocumulus and the presence of cloud at middle altitude were positively related to Kd. On the other hand, clear sky at low and middle altitude, cirrus, cumulus nimbus, and cloudy only at high altitude had a negative impact on Kd; besides, based on the magnitude of the coefficients, clear sky at low altitude was the second variable after cumulus nimbus to have a negative impact on Kd.

The explanations for the positive impact on Kd caused by the presence of clouds and the negative impact on Kd caused by clear skies were straightforward; on the other hand, the negative impacts caused in the other situations were not expected. First, the negative effect caused by cirrus on the diffuse fraction was comparable to the effect caused by clear skies. This might indicate that the negative effect was caused by the inability to detect the difference between clear skies and the presence of cirrus visually; in many cases, it is possible to have cirrus even though the sky condition is classified as clear. The negative effect caused by cumulus nimbus on Kd was probably because the cumulus nimbus was a very thick and deep cloud that generally produced rain. The combination of all of these factors worked in the opposite direction to what was expected, reducing the intensity of diffuse solar radiation at the surface. It should be emphasized that, in the presence of rain, the pyranometers might not perform as expected.

Clear sky at low altitude reduced the intensity of Kd. This was another unexpected result, since it indicated that there was cloud at the middle and high levels, but it did not contribute to increased diffuse solar radiation with respect to global solar radiation as expected.

Air pollution, in this study, had a negative impact on Kd: that is, Kd decreased, on average, in those hours in which PM10 assumed values larger than 26 µg m⁻³. The main reason for this rather contradictory effect (one would expect the diffuse fraction to increase as a larger load of particulate matter in the atmosphere scatters more solar radiation) was that the particulate matter in São Paulo during the year 2002 had a dominant absorbing character. This can be seen in Fig. 4, where the histogram of the aerosol index for São Paulo is displayed for the year 2002. The aerosol index (AI) was estimated from satellite measurements of UV radiation [76]. AI indicates how much UV radiation backscattered from an atmosphere containing aerosols differs from UV radiation backscattered from a pristine atmosphere. AI is positive for scattering aerosol and negative for absorbing aerosol. Therefore, during most of 2002, the AI indicated the presence of absorbing aerosols in São Paulo. This effect was particularly significant during the winter months (Fig. 4a) (June–August), when air pollution events in São Paulo were more frequent and when there was a significant increase in the contribution to the aerosol load in the local atmosphere from biomass burning associated with the sugar cane harvest [71]. During the summer months in São Paulo, the AI indicated that aerosol either absorbed or scattered solar radiation (Fig. 4b) with a slightly larger absorbing character.

Seasonal variation captured those effects due to the change of the season that were not captured by the meteorological and cloud cover variables (a), (b) and (c) of Section 4.3. Seasonal variation was positively related to Kd; that is, Kd increased in the warmer and wetter season. This effect was expected since, during the warmer season, cloud activity was more frequent in São Paulo.

Considering the significance of the parameters in the interactions, after the selection procedure indicated in Table 4, only three interactions were selected: between air pollution and clear sky at a low altitude (PM10×ClearSkyLow), between seasonal variation and clear sky at the middle altitude (Season×ClearSkyMiddle), and between seasonal variation and air pollution (Season×PM10). The


Fig. 5. Kd versus Kt diagram comparing observed and modeled Kd values using (a) model (4), (b) model (4) considering only the effects of meteorological variables and (c) model (4) considering the effects of meteorological variables and PM10. Dataset corresponds to 25% of the entire 2002 dataset.

Table 5. Performance evaluation measures: R², AIC, RMSE, MBE%, and correlation between observed and predicted values of Kd.

| Model | R² | AIC | RMSE | MBE% | Correlation |
|---|---|---|---|---|---|
| Model (2) | 0.861 | −1376 | 0.119 | 8.585 | 0.928 |
| Model (2)+meteo | 0.866 | −1407 | 0.116 | 8.076 | 0.931 |
| Model (2)+meteo+PM10 | 0.868 | −1417 | 0.116 | 8.110 | 0.932 |
| Model (4) | 0.930* | −1993* | 0.085* | 3.281 | 0.964* |
| Oliveira et al. [41]ᵃ | 0.847 | | 0.124 | 2.410* | 0.929 |
| Soares et al. [32]ᵃ | 0.859 | | 0.121 | 17.540 | 0.927 |

* indicates optimal values among models.
ᵃ The performance measures are evaluated in the test set of this work.


contribution of pollution to the growth of Kd was amplified when the sky was clear at low altitude. This could also be explained in terms of the increase in the scattering of solar radiation caused by clouds in the middle altitudes. Air pollution, mainly urban aerosols, increased the number of cloud droplets, decreasing their average size and increasing the scattering of solar radiation [65]. The interaction between seasonal variation and both clear sky and air pollution indicated that, when it was warmer and wetter in São Paulo, the growth of Kd was amplified when the sky was clear at the middle altitude (Season×ClearSkyMiddle) and pollution was high (Season×PM10).

4.6. Validation

Model (4) is plotted in Fig. 5a (stars) in the test set of observations (25%). The fitting was quite good and the coefficient of determination R² was increased to 0.93 from the 0.861 of model (2).

Comparisons with Oliveira et al.'s [41] and Soares et al.'s [32] models are summarized in the performance evaluation measures in Table 5: model (4) had the best performance in terms of R², RMSE, and correlation between observed and predicted values of Kd. Model (4) followed Oliveira et al. [41] closely for MBE%.
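The performance measures of Table 5 can be computed from paired observed and predicted values. The definitions below are the usual textbook ones and are assumptions here, since this section does not state the formulas; in particular, MBE% is taken as the mean bias expressed as a percentage of the mean observed value. The four-point sample is invented purely to exercise the functions.

```python
import math

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mbe_percent(obs, pred):
    """Mean bias error as a percentage of the mean observation (assumed form)."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)

def correlation(obs, pred):
    """Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

# Invented toy values, not data from the study.
kd_obs = [0.2, 0.4, 0.6, 0.8]
kd_pred = [0.25, 0.35, 0.65, 0.75]
print(r_squared(kd_obs, kd_pred), rmse(kd_obs, kd_pred),
      mbe_percent(kd_obs, kd_pred), correlation(kd_obs, kd_pred))
```

RMSE and R² penalize scatter, whereas MBE% only measures systematic over- or underestimation, which is why a model can rank differently on the two kinds of measure.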

Fig. 6 indicates the performance of model (4) using the test dataset. For comparison, the performances of the models proposed by Oliveira et al. [41] and Soares et al. [32] are also indicated for the same dataset. Fig. 6a–c show predicted versus observed Kd values, while Fig. 6d–f show an equivalent comparison for hourly values of diffuse solar radiation (Kd times hourly value of global solar radiation). Considering the dispersion diagrams of Fig. 6a–c, one can see that the points are less scattered and more evenly distributed around the 1:1 line in the case of Fig. 6a. This indicated that model (4) performed much better than Oliveira et al. [41] and Soares et al. [32]. Similar conclusions can be drawn from the dispersion diagrams of hourly values of diffuse solar radiation (Fig. 6d–f). In this case, model (4) only underestimates the large values of diffuse solar radiation, while the other models show more pronounced discrepancies both in overestimating the smallest values of diffuse solar radiation and in underestimating the large values of diffuse solar radiation.

4.7. Variable ranking analysis for proposing a simple regression model

Model (4) objectively quantified the relationships between Kd and all the meteorological information available, reducing significantly the scattering of points in the Kd–Kt diagram (Fig. 5). However, it could be further simplified without substantially losing the capacity of predicting diffuse solar radiation at the surface, by performing a variable ranking analysis.

To understand the individual contribution of meteorological variables, air pollution and cloud information, the following three submodels of model (4) were compared with model (4) itself:

• submodel (a): the segmented regression with Kt as regressor. It corresponded to model (2);
• submodel (b): model (2) plus only meteorological variables;
• submodel (c): model (2) plus meteorological variables and PM10, that is, submodel (b) plus PM10.

Table 5 and Fig. 5 show the comparisons among the submodels (a)–(c). Model (4) was the best model in terms of R², AIC, RMSE, and MBE%.

In particular, Fig. 5b shows the different predictive capacities between model (4) and submodel (b): the fitting highlights that submodel (b) was much more similar to the genuine model (2) (Fig. 2b) than to model (4). The visual impression was confirmed by Table 5: the inclusion of the meteorological variables only slightly improved the modeling performances of submodel (b), when compared with model (2).


Fig. 6. Performance comparisons of fitted versus observed values of Kd and diffuse solar radiation among (a, d) model (4), (b, e) Oliveira et al.'s [41] model and (c, f) Soares et al.'s [32] model, on 25% of observations. Fitted diffuse solar radiations were evaluated as estimated Kd times the observed global solar radiation and are expressed in MJ m⁻² h⁻¹.


Model (4) was then compared with submodel (c). Fig. 5c shows that the fitting of submodel (c) was very similar to that of submodel (b) in Fig. 5b. This was confirmed by the analogous performance measures of the two submodels in Table 5: that is, the effect of air pollution did not seem to be so important. Consequently, the improvement in the fit of the final model (4) with respect to model (2) was due to the inclusion of cloud information, and it could be evaluated, for instance, through the coefficient of determination R², which increased from 0.868 for submodel (c) to 0.93 for model (4). With respect to the visual impression, the cloud effect is shown in Fig. 5c as the different dispersion of the predicted values when cloud information was included (stars) or not (triangles). In summary, it can be concluded that cloud information was much more relevant than traditional meteorological variables and, above all, than pollution.

Therefore, considering the variable ranking analysis presented above, the possibility of obtaining a simplified model with a much wider applicability was investigated. This model had to contain essential cloud information, and this information had to be easily obtainable in order to promote its applicability. For these reasons, the information was expressed as regressors. Besides the classical clearness index, the following cloud information was added:

• cloudiness: quantitative variable assuming values from 0 to 10;
• cloud height: qualitative variable corresponding to factors indicated by LowC, MiddleC and HighC, each assuming the value 1 if low, middle or high clouds, respectively, are present and 0 if not.

From a practical point of view, LowC, MiddleC and HighC were a simplified version of both cloud type and cloudy pattern; besides, they were easily obtainable by exclusively using the code of the clear sky, at different altitudes, from the traditional information on the type of clouds. For instance, if at a certain hour there was clear sky at the low altitude, then LowC would assume the value 0; if not, then LowC would be 1.
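The coding of the three cloud-height factors can be sketched as follows. The function name and the boolean input format are illustrative assumptions; the text only specifies that each factor is 0 when the sky is clear at that level and 1 otherwise.

```python
def cloud_height_factors(low_clear, middle_clear, high_clear):
    """Return (LowC, MiddleC, HighC) as 0/1 indicators.

    Each argument is True when the sky is clear at that altitude.
    This input format is hypothetical, chosen only for illustration.
    """
    return (
        0 if low_clear else 1,
        0 if middle_clear else 1,
        0 if high_clear else 1,
    )


# Clear sky at the low level, clouds present at the middle and high levels.
factors = cloud_height_factors(True, False, False)
print(factors)
```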

In summary, the simplified version of model (4) was

$$K_d = b_0 + b_1 (K_t - c)\, I_{K_t > 0.228} + b_2\, \mathrm{CI} + b_3\, \mathrm{LowC} + b_4\, \mathrm{MiddleC} + b_5\, \mathrm{HighC} \qquad (5)$$

Table 6. Parameters of model (5) fitted for the whole of 2002, for winter and summer, and for three ranges of air pollution. R2 is evaluated in the test set.

| Data set | Intercept | Kt | CI | LowC | MidC | HighC | R2 |
|---|---|---|---|---|---|---|---|
| 2002 | 0.731 | −1.316 | 0.022 | 0.019 | 0.036 | −0.037 | 0.92 |
| Summer | 0.731 | −1.302 | 0.022 | 0.018 | 0.029 | −0.044 | 0.91 |
| Winter | 0.730 | −1.315 | 0.023 | 0.020 | 0.044 | −0.033 | 0.92 |
| PM10 < 26 | 0.741 | −1.352 | 0.022 | 0.011 | 0.027 | −0.036 | 0.92 |
| 26 < PM10 < 54 | 0.708 | −1.285 | 0.023 | 0.018 | 0.040 | −0.041 | 0.92 |
| PM10 > 54 | 0.714 | −1.336 | 0.022 | 0.019 | 0.030 | −0.031 | 0.92 |

It should be emphasized that the factors in model (5) allow all relevant scenarios of cloud height to be included in a single expression, so that it can be objectively tested whether or not each scenario influences Kd. From a statistical point of view, the test verifies the significance of the corresponding coefficient (as presented for model (4)). All parameters of model (5) indicated in Table 6 are significant at the 5% level. In the case of model (5) fitted to the entire 2002 (first line in Table 6), the parameters indicated that the slope of Kt (for Kt greater than 0.228) was −1.316; as the cloudiness (CI) increased by 1 unit, Kd increased by 0.0222; Kd increased on average by 0.019 with the presence of low clouds and by 0.036 with middle clouds, while it decreased on average by 0.037 with the presence of high clouds.
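As an illustration, model (5) can be evaluated with the coefficients reported for the whole of 2002 in Table 6. This is a minimal sketch, not the authors' implementation: the function and variable names are hypothetical, and it assumes that the constant c in model (5) equals the break point 0.228. The conversion to diffuse radiation follows the Fig. 6 caption (estimated Kd times observed global solar radiation).

```python
# Coefficients of model (5) for the whole of 2002 (first row of Table 6).
COEF_2002 = {
    "b0": 0.731,   # intercept
    "b1": -1.316,  # slope of Kt beyond the break point
    "b2": 0.022,   # cloudiness (CI)
    "b3": 0.019,   # LowC
    "b4": 0.036,   # MiddleC
    "b5": -0.037,  # HighC
}
BREAK_POINT = 0.228  # threshold in the indicator I(Kt > 0.228)


def diffuse_fraction(kt, ci, low_c, middle_c, high_c,
                     coef=COEF_2002, c=BREAK_POINT):
    """Evaluate model (5). ASSUMPTION: c equals the break point 0.228."""
    hinge = (kt - c) if kt > BREAK_POINT else 0.0
    return (coef["b0"] + coef["b1"] * hinge + coef["b2"] * ci
            + coef["b3"] * low_c + coef["b4"] * middle_c
            + coef["b5"] * high_c)


def diffuse_radiation(kd, global_radiation):
    """Diffuse radiation as Kd times global radiation (same units)."""
    return kd * global_radiation


# Illustrative inputs (not measurements): Kt = 0.5, cloudiness 5, low clouds only.
kd = diffuse_fraction(kt=0.5, ci=5, low_c=1, middle_c=0, high_c=0)
e_diffuse = diffuse_radiation(kd, 2.0)  # global radiation of 2.0 MJ m-2 h-1
print(round(kd, 3), round(e_diffuse, 3))
```

Passing a different row of Table 6 as `coef` gives the seasonal or PM10-specific versions of the model.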

In this field (applied energy), most empirical solar radiation models are traditionally based on a simple expression fitted to data representing different scenarios, producing as many expressions as the number of scenarios. Given this fact, it was decided to adapt model (5) to the scenarios defined by situations of typical interest: winter and summer, and different levels of PM10 (three ranges delimited by the 25% and the 75% quantiles of Table 2). In this way, model (5) can be more easily implemented by a wider group of users, not necessarily expert in statistics, because it combines the correct statistical approach (building a good model with the most important factors tested objectively) and the traditional method (a single expression for each scenario of importance). The problem with the traditional method is that the effect of each scenario cannot be objectively tested. Table 6 displays the coefficients of model (5) and the coefficient of determination R2 evaluated in the test set. Even though the effect of Season cannot be objectively estimated, the effect of low clouds on Kd did not seem to be susceptible to the variations in the local atmosphere associated with the contrast between winter and summer, while the effect of middle and high clouds changed considerably between these two scenarios.

The large R2 indicated that Kd was well predicted by this simplified model with only cloud information, and it was greater than that of any model in Table 5 that did not explicitly include any extra cloud information (besides Kt) in the modeling.

5. Conclusion

A new multiple regression model was developed to estimate hourly values of diffuse solar radiation at the surface in terms of global solar radiation at the surface, which took into account explicitly the effects of cloud, air pollution and seasonal variation of the climate. According to the statistical classification adopted here, this model belongs to the general class of parametric empirical models, and in particular, it is a linear regression model. The model was developed and applied to the city of São Paulo, Brazil, considering as meteorological data hourly values of global and diffuse solar radiation at the surface, air temperature, relative humidity, and atmospheric pressure measured continuously at screen level. Cloud information corresponded to the cloudiness and cloud type observed visually from 07:00 to 24:00 LT. Air pollution was characterized by hourly values of particulate matter at the surface. These observations were carried out during 2002. A representativeness test applied to all meteorological variables and particulate matter concentrations indicated that seasonal variations of these variables in 2002 were not statistically different from those based on 10 years of observation (1997–2006). Therefore, the model developed here can be assumed to be valid for the city of São Paulo even though it is based on only 1 year of observation, owing to the restrictions in the cloud information data set. Model construction was performed using 75% of the entire data set, while model validation used the remaining 25%. The functional form of the Kd relationship with respect to relative humidity, air temperature, pressure, and cloudiness was identified by considering the joint information between diffuse fractions and quantiles of these variables. This analysis indicated a polynomial function as the most appropriate to specify the relationship between Kd and these variables. The effects of particulate matter, cloud type, and cloudy pattern were better described by factors.
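The 75%/25% partition into construction and validation sets can be sketched as below. The random shuffling, the seed and the function name are assumptions made for illustration; the text states only the proportions, not how the observations were assigned.

```python
import random


def split_train_test(records, train_fraction=0.75, seed=0):
    """Split records into (construction, validation) subsets.

    ASSUMPTION: observations are assigned at random; the paper does not
    describe the assignment rule, only the 75%/25% proportions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for hourly observations.
train, test = split_train_test(range(100))
print(len(train), len(test))
```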

Initially, an analysis was performed to find the best representation of the Kt–Kd relationship observed in São Paulo. Since a broken line was preferred, the break point was determined objectively in the framework of non-linear regression. It should be pointed out that the authors could not find any work where the break points were estimated objectively. This new approach reduces the subjectivity inherent in segmented regression interpolation, making it more appropriate for reproducing the behavior of the Kt–Kd relationship in comparison to other techniques. Once the break point was estimated, this value was considered fixed in the linear regression model used to study the influence of all the regressors on Kd. A second-degree polynomial function was identified as influential for air relative humidity, while a third-degree polynomial was more influential for temperature, surface pressure, and cloudiness. Clouds, in general, were positively correlated with Kd, while clear sky conditions were negatively correlated. The negative correlation found between cirrus clouds and Kd may have been caused by misrepresentation of this type of cloud, since cloud types were identified visually. The negative correlation between particulate matter concentration and Kd could be explained in terms of the absorbing nature of aerosol in São Paulo. During 2002, the aerosol in São Paulo showed a high frequency of days with negative AI. A positive effect was also found for the warmer and wetter season.
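The idea of estimating the break point of the Kt–Kd broken line from the data, rather than choosing it by eye, can be illustrated with a simple grid search. This is a stand-in technique chosen for brevity and is not the estimation procedure used in the paper: for each candidate break point, a line is fitted to the hinge term max(Kt − ψ, 0) by ordinary least squares, and the candidate with the smallest residual sum of squares is kept. All names and the synthetic data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


def estimate_break_point(kt, kd, candidates):
    """Return (break_point, intercept, slope, sse) with the smallest SSE."""
    best = None
    for psi in candidates:
        hinge = [max(k - psi, 0.0) for k in kt]
        if max(hinge) == min(hinge):
            continue  # no variation in the regressor, a line cannot be fitted
        intercept, slope, sse = fit_line(hinge, kd)
        if best is None or sse < best[3]:
            best = (psi, intercept, slope, sse)
    return best


# Synthetic, noise-free data generated from a known broken line (not measurements).
kt_values = [i / 100 for i in range(5, 96)]
kd_values = [0.95 - 1.3 * max(k - 0.25, 0.0) for k in kt_values]
grid = [i / 100 for i in range(10, 51)]
break_point, intercept, slope, sse = estimate_break_point(kt_values, kd_values, grid)
print(break_point, round(intercept, 3), round(slope, 3))
```

Once the break point is selected, it can be held fixed and the remaining regressors added to an ordinary linear regression, which mirrors the two-stage procedure described above.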

This new modeling technique allows assessment of the effect on Kd of interactions between air pollution, local climate, and patterns of cloudiness. There was evidence of a reduction in Kd (a) in the presence of high values of particulate matter with a clear sky at a low level, (b) with a clear sky in the middle level in the warmer and wetter season, and finally, (c) with high values of particulate matter in the warmer and wetter season. The interaction between air pollution and clouds could be explained in terms of the increase of solar radiation scattering caused by a larger number of droplets, in turn caused by a larger number of urban droplet nuclei. The interactions between air pollution and season were related to the dry season in São Paulo increasing the frequency of temperature inversions and the concentration of air pollution near the surface.

The performance of the new model was very good, with a coefficient of determination R2 of 0.93 for the fitting of observed data. Compared to the models developed by Oliveira et al. [41] and Soares et al. [32] for São Paulo using a larger data set, the new model had the best performance in terms of R2, RMSE, and correlation coefficient between observed and predicted values of Kd. With respect to the latter two approaches, the new regression model allowed both the inclusion of more information about the local climate features and the explicit estimation of the basic functions of the relationship between Kd and each variable.
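Performance measures of this kind can be computed from observed and predicted Kd values as sketched below. The formulas are the commonly used definitions and are an assumption here, since this section does not restate them; in particular, MBE% is taken as the mean bias relative to the mean observed value. The sample values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mbe_percent(observed, predicted):
    """Mean bias error as a percentage of the mean observed value (assumed definition)."""
    n = len(observed)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n
    return 100.0 * bias / (sum(observed) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Made-up values, for illustration only.
obs = [0.2, 0.4, 0.6, 0.8]
pred = [0.25, 0.35, 0.65, 0.75]
print(round(rmse(obs, pred), 3), round(mbe_percent(obs, pred), 3),
      round(r_squared(obs, pred), 3))
```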

One important outcome of the ranking analysis was that cloud information was revealed as the most critical factor in the Kd–Kt relationship. A simplified version, with only cloud information, was then proposed to promote a much wider applicability. This version of the model considered only Kt, the cloudiness, and the cloud height by introducing three factors that identified the presence of clouds at low, middle and high altitudes (LowC, MiddleC and HighC). The performance of this simplified version was very good and was significantly better than both of the models developed by Oliveira et al. [41] and Soares et al. [32].

“Correlation” models have been useful to predict Kd in terms of Kt, working well to estimate daily and monthly values of diffuse solar radiation. This work produced evidence that, by including additional cloud information such as the cloudiness and cloud height, it is possible to develop a simple regression model that also performs well for hourly values.

Acknowledgements

The authors acknowledge the financial support provided by the Brazilian Research Agencies CNPq and FAPESP, and by ‘Fondazione Cassa di Risparmio di Padova e Rovigo’ (‘Innovation Diffusion Processes: Differential Models, Agent-Based Frameworks and Forecasting Methods’ project).

References

[1] Mellit A, Kalogirou SA. Artificial intelligence techniques for photovoltaic applications: a review. Prog Energy Combust 2008;34:574–632.

[2] Joshi AS, Dincer I, Reddy BV. Performance analysis of photovoltaic systems: a review. Renew Sust Energy Rev 2009;13:1884–97.

[3] Thirugnanasambandam M, Iniyan S, Goic R. A review of solar thermal technologies. Renew Sust Energy Rev 2010;14:312–22.

[4] Kalogirou SA. Solar thermal collectors and applications. Prog Energy Combust 2004;30:231–95.

[5] Desideri U, Proietti S, Sdringola P. Solar-powered cooling systems: technical and economic analysis on industrial refrigeration and air-conditioning applications. Appl Energy 2009;86:1376–86.

[6] Préndez MM, Egido M, Tomas C, Seco J, Calvo A, Romero H. Correlation between solar radiation and total suspended particulate matter in Santiago, Chile-Preliminary results. Atmos Environ 1995;29(13):1543–51.

[7] Arboit M, Diblasi A, Fernandez Llano JC, De Rosa C. Assessing the solar potential of low-density urban environments in Andean cities with desert climates: the case of the city of Mendoza, in Argentina. Renew Energy 2008;33:1733–48.

[8] Stanhill G, Cohen S. Global dimming: a review of the evidence for a widespread and significant reduction in global radiation with discussion of its probable causes and possible agricultural consequences. Agric Forest Meteorol 2001;107:255–78.

[9] Jebaraj S, Iniyan S. A review of energy models. Renew Sust Energy Rev 2006;10:281–311.

[10] Tiba C, Fraidenraich N, Grossi Gallegos H, Lyra FJM. Solar energy resource assessment – Brazil. Renew Energy 2002;27:383–400.

[11] Ortega A, Escobar R, Colle S, Abreu SL. The state of solar energy resource assessment in Chile. Renew Energy 2010;35:2514–24.

[12] Varella FKOM, Cavaliero CKN, Silva EP. A survey of the current photovoltaic equipment industry in Brazil. Renew Energy 2009;34:1801–5.

[13] Oliveira AP, Escobedo JF, Machado AJ, Soares J. Diurnal evolution of solar radiation at the surface in the city of São Paulo: seasonal variation and modeling. Theor Appl Climatol 2002;71(3–4):231–49.

[14] Ceballos JC, Souza JM, Dantas ACT. Correlation fields for solar radiation in northeast Brazil. Int J Climatol 2001;21:887–902.

[15] Gueymard CA, Myers DR. Evaluation of conventional and high-performance routine solar radiation measurements for improved solar resource, climatological trends, and radiative modeling. Solar Energy 2009;83:171–85.

[16] Emde C, Mayer B. Simulation of solar radiation during a total eclipse: a challenge for radiative transfer. Atmos Chem Phys 2007;7:2259–70.

[17] Gueymard CA. REST2: high-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation – validation with a benchmark dataset. Solar Energy 2008;82:272–85.

[18] Pereira EB, Abreu SL, Stuhlmann R, Rieland M, Colle S. Survey of the incident solar radiation in Brazil by use of Meteosat satellite data. Solar Energy 1996;57(2):125–32.

[19] Gupta SK, Ritchey NA, Wilber AC, Whitlock CH, Gibson GG, Stackhouse Jr PW. A climatology of surface radiation budget derived from satellite data. J Clim 1999;12:2692–710.

[20] Deneke H, Feijt A, Van Lammeren A, Simmer C. Validation of a physical retrieval scheme of solar surface irradiances from narrowband satellite radiances. J Appl Meteorol 2005;44:1453–66.

[21] Ceballos JC, Bottino MJ, Souza JM. A simplified physical model for assessing solar radiation over Brazil using GOES 8 visible imagery. J Geophys Res 2004;109:D02211. doi:10.1029/2003JD003531.

[22] Michalsky JJ, Dolce R, Dutton EG, Haeffelin M, Jeffries W, Stoffel T, et al. Toward the development of a diffuse horizontal shortwave irradiance working standard. J Geophys Res 2005;110:D06107. doi:10.1029/2004JD005265.

[23] Ricchiazzi P, Yang S, Gautier C, Sowle D. SBDART: a research and teaching software tool for plane-parallel radiative transfer in the Earth’s atmosphere. Bull Am Meteorol Soc 1998;79:2101–14.

[24] Liou KN. An introduction to atmospheric radiation. 2nd ed. San Diego: Academic Press; 2002.

[25] Wild M. Short-wave and long-wave surface radiation budgets in GCMs: a review based on the IPCC-AR4/CMIP3 models. Tellus 2008;60A:932–45.

[26] Wang P, Knap WH, Munneke PK, Stammes P. Clear-sky shortwave radiative closure for the Cabauw Baseline Surface Radiation Network site, Netherlands. J Geophys Res 2009;114:D14206. doi:10.1029/2009JD011978.

[27] Oreopoulos L, Mlawer E. The Continual Intercomparison of Radiation Codes (CIRC): assessing anew the quality of GCM radiation algorithms. Bull Am Meteorol Soc 2010;91:305–10.

[28] Jacovides CP, Tymvios FS, Assimakopoulos VD, Kaltsounides NA. The dependence of global and diffuse PAR radiation components on sky conditions at Athens, Greece. Agric Forest Meteorol 2007;143:277–87.

[29] Escobedo JF, Gomes EN, Oliveira AP, Soares J. Modeling hourly and daily fractions of UV, PAR and NIR to global solar radiation under various sky conditions at Botucatu, Brazil. Appl Energy 2009;86:299–309.

[30] Ulgen K, Hepbasli A. Diffuse solar radiation estimation models for Turkey’s big cities. Energy Convers Manage 2009;50:149–56.

[31] Boland J, Ridley B, Brown B. Models of diffuse solar radiation. Renew Energy 2008;33:575–84.

[32] Soares J, Oliveira AP, Boznar MZ, Mlakar P, Escobedo JF, Machado AJ. Modeling hourly diffuse solar radiation in the city of São Paulo using neural network technique. Appl Energy 2004;79:201–14.

[33] Senkal O, Kuleli T. Estimation of solar radiation over Turkey using artificial neural network and satellite data. Appl Energy 2009;86(7–8):1222–8. doi:10.1016/j.apenergy.2008.06.003.

[34] Janjai S, Pankaew P, Laksanaboonsong J. A model for calculating hourly global solar radiation from satellite data in the tropics. Appl Energy 2009;86(9):1450–7. doi:10.1016/j.apenergy.2009.02.005.

[35] Tovar-Pescador J. Modelling the statistical properties of solar radiation and proposal of a technique based on Boltzmann statistics. In: Badescu Viorel, editor. Modeling solar radiation at the Earth’s surface recent advances. Berlin, Heidelberg: Springer; 2008. p. 55–91 [chapter 3, 515 pp].

[36] Wong LT, Chow WK. Solar radiation model. Appl Energy 2001;69:191–224.

[37] Liu BYH, Jordan RC. The interrelationship and characteristics distribution of direct, diffuse and total solar radiation. Solar Energy 1960;4:1–19.

[38] Orgill JF, Hollands KGT. Correlation equation for hourly diffuse radiation on a horizontal surface. Solar Energy 1977;19(4):357–9.

[39] Erbs DG, Klein SA, Duffie JA. Estimation of the diffuse radiation fraction for hourly, daily and monthly-average global radiation. Solar Energy 1982;28:293–302.

[40] Reindl DT, Beckman WA, Duffie JA. Diffuse fraction correlations. Solar Energy 1990;45(1):1–7.

[41] Oliveira AP, Escobedo JF, Machado AJ, Soares J. Correlation models of diffuse solar radiation applied to the city of São Paulo (Brazil). Appl Energy 2002;71(1):59–73.

[42] Jacovides CP, Tymvios FS, Assimakopoulos VD, Kaltsounides NA. Comparative study of various correlations in estimating hourly diffuse fraction of global solar radiation. Renew Energy 2006;31:2492–504.

[43] Jiang Y. Estimation of monthly mean daily diffuse radiation in China. Appl Energy 2009;86(9):1458–64. doi:10.1016/j.apenergy.2009.01.002.

[44] El-Sebaii AA, Al-Hazmi FS, Al-Ghamdi AA, Yaghmour SJ. Global, direct and diffuse solar radiation on horizontal and tilted surfaces in Jeddah, Saudi Arabia. Appl Energy 2010;87(2):568–76. doi:10.1016/j.apenergy.2009.06.032.

[45] Sansigolo CA. Non-stationary Markov chains for modelling daily sunshine at São Paulo, Brazil. Theor Appl Climatol 1997;56:225–30.

[46] Boland JW, Scott L, Luther M. Modelling the diffuse fraction of global solar radiation on a horizontal surface. Environmetrics 2001;12:103–16.

[47] Lin H, Ma W, Lian Y, Wang X. Estimating daily solar global radiation by day of year in China. Appl Energy 2010;87:3011–7.


[48] Jain PK, Lungu EM. Stochastic models for sunshine duration and solar irradiation. Renew Energy 2002;27:197–209.

[49] Zaharim A, Razali AM, Gim TP, Sopian K. Time series analysis of solar radiation data in the tropics. Eur J Sci Res 2009;25(4):672–8.

[50] Sen Z. Fuzzy algorithm for estimation of solar irradiation from sunshine duration. Solar Energy 1998;63(1):39–49.

[51] Wenxian L, Enrong L, Wenfeng G, Shaoxuan P, Tao L. Distribution patterns of diffuse solar radiation in Yunnan province, China. Energy Convers Manage 1996;37(5):553–60.

[52] Skartveit A, Olseth JA, Tuft M. An hourly diffuse fraction model with correction for variability and surface albedo. Solar Energy 1998;63(3):173–83.

[53] Pinty B, Lattanzio A, Martonchik JV, Verstraete MM, Gobron N, Taberner M, et al. Coupling diffuse sky radiation and surface albedo. J Atmos Sci 2005;62:2580–91.

[54] Muneer T, Munawwar S. Potential for improvement in estimation of solar diffuse irradiance. Energy Convers Manage 2006;47:68–86.

[55] Aras H, Balli O, Hepbasli A. Estimating the horizontal diffuse solar radiation over the Central Anatolia Region of Turkey. Energy Convers Manage 2006;47:2240–9.

[56] Munawwar S, Muneer T. Statistical approach to the proposition and validation of daily diffuse irradiation models. Appl Energy 2007;84:455–75.

[57] Mubiru J, Banda EJKB. Performance of empirical correlations for predicting monthly mean daily diffuse solar radiation values at Kampala, Uganda. Theor Appl Climatol 2007;88:127–31. doi:10.1007/s00704-006-0249-1.

[58] Alam S, Kaushik SC, Garg SN. Assessment of diffuse solar energy under general sky condition using artificial neural network. Appl Energy 2009;86(4):554–64. doi:10.1016/j.apenergy.2008.09.004.

[59] Okogbue EC, Adedokun JA, Holmgren B. Hourly and daily clearness index and diffuse fraction at a tropical station, Ile-Ife, Nigeria. Int J Climatol 2009;29(8):1035–47. doi:10.1002/joc.1849.

[60] Butt N, New M, Malhi Y, Costa ACL, Oliveira P, Silva-Espejo JE. Diffuse radiation and cloud fraction relationships in two contrasting Amazonian rainforest sites. Agric Forest Meteorol 2010;150:361–8.

[61] Ruiz-Arias JA, Alsamamra H, Tovar-Pescador J, Pozo-Vázquez D. Proposal of a regressive model for the hourly diffuse solar radiation under all sky conditions. Energy Convers Manage 2010;51:881–93.

[62] Stephens GL. Cloud feedbacks in the climate system: a critical review. J Clim 2005;18:237–73.

[63] Pereira EB, Martins FR, Abreu SL, Couto P, Stuhlmann R, Colle S. Effects of burning of biomass on satellite estimations of solar irradiation in Brazil. Solar Energy 2000;68(1):91–107.

[64] Jacovides CP, Steven MD, Asimakopoulos DN. Solar spectral irradiance under clear skies around a major metropolitan area. J Appl Meteorol 2000;39:917–30.

[65] Rosenfeld D, Woodley W. Pollution and clouds. Phys World 2001:33–7.

[66] Ferreira MJ, Oliveira AP, Soares J, Codato G, Bárbaro EW, Escobedo JF. Radiation balance at the surface in the City of São Paulo, Brazil. Diurnal and seasonal variations. Theor Appl Climatol, in press. doi:10.1007/s00704-011-0480-2.

[67] Oliveira AP, Escobedo JF, Machado AJ. A new shadow-ring device for measuring diffuse solar radiation at surface. J Atmos Ocean Technol 2002;19(5):698–708.

[68] Iqbal M. An introduction to solar radiation. New York: Academic Press; 1983.

[69] Gueymard CA. The sun’s total and spectral irradiance for solar energy applications and solar radiation models. Solar Energy 2004;76:423–53.

[70] CETESB. Relatório da qualidade do ar no Estado de São Paulo 2009, CETESB, São Paulo, Brazil; 2010. 290 pp. <http://www.cetesb.sp.gov.br/ar/qualidade-do-ar/31-publicacoes-e-relatorios> [Portuguese].

[71] Codato G, Oliveira AP, Soares J, Escobedo JF, Gomes EN, Pai AD. Global and diffuse solar irradiances in urban and rural areas in southeast of Brazil. Theor Appl Climatol 2008;93:57–73.

[72] Bourotte C, Forti MC, Melfi AJ, Lucas Y. Morphology and solutes content of atmospheric particles in an urban and a natural area of São Paulo state, Brazil. Water Air Soil Pollut 2006;170:301–16.

[73] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. Data mining, inference, and prediction. New York: Springer Verlag; 2001.

[74] Muggeo VMR. Estimating regression models with unknown break-points. Stat Med 2003;22:3055–71.

[75] Chambers JM, Hastie TJ. Statistical models in S. Pacific Grove (CA, USA): Wadsworth & Brooks/Cole; 1992.

[76] Torres O, Bhartia PK, Herman JR, Ahmad Z, Gleason J. Derivation of aerosol properties from satellite measurements of backscattered ultraviolet radiation: theoretical basis. J Geophys Res 1998;103:17099–110.