
Solar Energy 74 (2003) 235–244

Using probabilistic finite automata to simulate hourly series of global radiation

L. Mora-López a,*, M. Sidrach-de-Cardona b

a Dpto. Lenguajes y C. Computación, E.T.S.I. Informática, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain

b Dpto. Física Aplicada II, E.T.S.I. Informática, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain

*Corresponding author. Tel.: +34-95-2132-802; fax: +34-95-2131-397. E-mail address: [email protected] (L. Mora-López).

Abstract

A model to generate synthetic series of hourly exposure of global radiation is proposed. This model has been constructed using a machine learning approach. It is based on the use of a subclass of probabilistic finite automata which can be used for variable-order Markov processes. This model allows us to represent the different relationships and the representative information observed in the hourly series of global radiation; the variable-order Markov process can be used as a natural way to represent different types of days, and to take into account the "variable memory" of cloudiness. A method to generate new series of hourly global radiation, which incorporates the randomness observed in recorded series, is also proposed. As input data this method only uses the mean monthly value of the daily solar global radiation. We examine whether the recorded and simulated series are similar. It can be concluded that both series have the same statistical properties. © 2003 Elsevier Ltd. All rights reserved.

0038-092X/03/\$ – see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0038-092X(03)00149-X

1. Introduction

Different approaches have been followed to characterize the hourly series of solar global radiation. Taking into account the nature of these series, we propose the use of a new model for their characterization and simulation. This new model is easy to use once it has been constructed, and it allows us to represent the relationships observed in the hourly series of global radiation. Moreover, it can be embedded in engineering software by including the estimated probabilistic finite automata and the algorithm presented in Section 5. Before explaining the model, we briefly review the existing models, paying special attention to their simplicity, requirements and limitations.

Several studies have been carried out to obtain models which allow us to simulate the hourly series of solar global radiation. Traditionally, the analysis of time series has been carried out using stochastic process theory. One of the most detailed analyses of statistical methods for time series research was performed by Box and Jenkins (1970). The goal of time series analysis is to find models which are able to reproduce the statistical characteristics of the series. Moreover, these models allow us to predict the next values of the series from their predecessors. The approach is as follows: first, the model must be identified; to do this, the recorded series are statistically analyzed in order to select the best model for the series. Then the parameters of the model must be estimated. After this, a new series of values can be generated using the estimated model. For example, this approach has been followed in Brinkworth (1977), Bendt et al. (1981), Aguiar et al. (1988), Aguiar and Collares-Pereira (1992), and Mora-López and Sidrach-de-Cardona (1997).

One of the problems with most of these methods is that the probability distribution functions of the generated series are normal when stochastic models are used. This problem can be solved for daily series using first-order Markov models (see Aguiar et al., 1988). For hourly series, a differenced series and ARMA models can be used to circumvent this problem (e.g. Mora-López and Sidrach-de-Cardona, 1997). In this case, however, the simulation of a new series requires a complex iterative process: the differences operator makes it difficult to generate new series of global radiation because the negative values which appear in the series must be eliminated.

The aim of this work was to study the use of a mathematical model called probabilistic finite automata (PFA) as a means of representing the relationships observed in hourly global solar radiation series. PFAs are mathematical models developed within the fields of Artificial Intelligence and Machine Learning. Machine learning models are very useful for studying systems in which the goal concept presents probabilistic behavior, and the prediction of climatic variables is an example of this type of concept. Recently, some authors have used different types of neural networks and finite automata to model the values of global solar radiation on horizontal surfaces (e.g. Mohandes et al., 1998; Kemmoku et al., 1999; Mohandes et al., 2000; Sfetsos and Coonick, 2000; Mora-López et al., 2002). When neural network models have been used, only mean values of daily or hourly global radiation have been analyzed. In the paper of Sfetsos and Coonick (2000), the developed models can be used to predict the hourly solar radiation time series, but these models are obtained using only data from summer months (63 days). In all cases, the obtained models are "black boxes", and no significant information can be extracted from them.

There are several programs that allow us to generate daily and hourly series of global radiation based on published models (see, for instance, Scharmer and Greif, 2000). Different international projects also provide methodologies for synthetic data generation, for instance, the European JOULE III Climed Project.

After this brief review of the models used to characterize hourly solar radiation series, we present the outline of the paper. First, we describe the mathematical model used: Probabilistic Finite Automata. We then explain how to use this model for the analysis and prediction of solar radiation series. Then we propose a generalization of the model, based on the use of general cumulative probability distribution functions and mean values of the daily clearness index. Finally, we test the model using data from several Spanish locations.

2. Data set

The data of hourly exposure series of global radiation, $\{G_h(t)\}$, used in this work were recorded over several years at nine Spanish meteorological stations. The number of available observations is reported in Table 1, together with the years of record and the latitude (North) of each location. This table also includes the annual average value of daily global solar radiation, $G_{dy}$, for each location.

Table 1
Data set

Location    Years       Months   Lat. (°N)   $G_{dy}$ (MJ m$^{-2}$)
Badajoz     1976–1983   84       38.89       16.6
Castellón   1979–1984   55       39.95       15.8
Madrid      1979–1986   96       40.45       16.6
Málaga      1977–1984   94       36.66       16.9
Murcia      1977–1984   96       38.00       16.9
Oviedo      1977–1984   96       43.35       11.1
Mallorca    1977–1984   94       39.33       15.5
Sevilla     1977–1984   78       37.42       17.6
Tortosa     1980–1984   52       40.81       15.1

The weather characteristics of the locations are very different. One location has a moderate Atlantic climate (Oviedo). There are interior locations which have a continental climate, such as Madrid, Tortosa, etc. Finally, the coastal locations have a Mediterranean climate, with milder temperatures both in winter and in summer (Málaga, Mallorca, etc.).

3. Probabilistic finite automata

We propose using a mathematical model called probabilistic finite automata (PFA). One of the first applications of this model was proposed by Rissanen (1983) for universal data compression. Various other practical tasks have been approached using this mathematical model, such as the analysis of biological sequences, for DNA and proteins (Krog et al., 1993), and the analysis of natural language, for handwriting and speech (Nadas, 1984; Rabiner, 1994; Ron et al., 1998).

Different classes of automata have been developed. For instance, acyclic probabilistic finite automata have been used for modeling distributions on short sequences (Ron et al., 1998). Probabilistic suffix automata, based on variable-order Markov models, have been used to construct a model of the English language (Ron et al., 1994). All these automata allow us to take into account the temporal relationships in a series.

We propose the use of this mathematical model, probabilistic finite automata, to represent a univariate time series.

Formally, a PFA is a 5-tuple $(Q, \Sigma, \tau, \gamma, q_0)$, where

• $\Sigma$ is a finite alphabet; that is, a set of discrete symbols corresponding to the different continuous values of the analyzed parameter. The different symbols of $\Sigma$ will be represented by $x_i$. In one series, the values observed can be $x_5 x_3 \ldots x_3$. To represent the different observable series for a period $t_1$ to $t_m$ we will use the symbols $y_1 y_2 \ldots y_m$. Therefore, in the series $x_5 x_3 \ldots x_3$, the symbol $y_1$ corresponds to the value $x_5$, the symbol $y_2$ to $x_3$, etc.
• $Q$ is a finite collection of states. Each state corresponds to a subsequence of the discretized time series. The maximum size of a state (its number of symbols) is bounded by a value $N$ fixed in advance. This value is related to the number of previous values which will be considered to determine the next value in the series, and depends on the "memory" of the series.
• $\tau: Q \times \Sigma \to Q$ is the transition function.
• $\gamma: Q \times \Sigma \to [0,1]$ is the next symbol probability function.

• $q_0 \in Q$ is the initial state.

The function $\gamma$ satisfies the following requirement: for every $q \in Q$, $\sum_{x_i \in \Sigma} \gamma(q, x_i) = 1$.

Moreover, the following conditions are required:
• the transition function $\tau$ can be undefined only on states $q \in Q$ and symbols $x \in \Sigma$ for which $\gamma(q, x) = 0$;
• the function $\tau$ can be extended to be defined on $Q \times \Sigma^*$ in the following recursive manner:

$$\tau(q, y_1, y_2, \ldots, y_t) = \tau\big(\tau(q, y_1, y_2, \ldots, y_{t-1}), y_t\big), \tag{1}$$

where $y_i \in \Sigma$.

Fig. 2. Simplified probabilistic finite automata.
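The definition above can be made concrete with a small Python sketch. It is an illustration under our own assumptions, not the authors' implementation. States are represented as strings of symbols, and $\tau$ and $\gamma$ are stored as explicit dictionaries. The class `PFA` and its methods `check` and `run` are hypothetical names. `check` verifies that each next-symbol vector sums to 1 and that $\tau$ is defined wherever $\gamma > 0$. `run` applies the recursive extension of Eq. (1), unrolled left to right. The example is a hypothetical full order-2 automaton over {0, 1}. All of its vectors are placeholders except that of state 10, which uses the value (0.25, 0.75) that the text gives for Fig. 1.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

State = str   # a state is labelled by a (possibly empty) string of symbols
Symbol = str  # one symbol of the alphabet Sigma


@dataclass
class PFA:
    """Explicit-table sketch of a PFA (Q, Sigma, tau, gamma, q0)."""
    states: Set[State]
    alphabet: Tuple[Symbol, ...]
    tau: Dict[Tuple[State, Symbol], State]   # transition function tau
    gamma: Dict[State, Dict[Symbol, float]]  # next-symbol probabilities gamma
    q0: State

    def check(self, tol: float = 1e-9) -> None:
        """Raise ValueError if the conditions of the definition are violated."""
        for q in self.states:
            probs = self.gamma.get(q, {})
            total = sum(probs.get(x, 0.0) for x in self.alphabet)
            if abs(total - 1.0) > tol:
                raise ValueError(f"gamma({q!r}, .) sums to {total}, not 1")
            for x in self.alphabet:
                # tau may be undefined only where gamma(q, x) = 0.
                if (q, x) not in self.tau and probs.get(x, 0.0) > 0.0:
                    raise ValueError(f"tau({q!r}, {x!r}) is undefined but gamma > 0")

    def run(self, q: State, ys: str) -> Optional[State]:
        """Extended transition function of Eq. (1), applied symbol by symbol."""
        for y in ys:
            if (q, y) not in self.tau:
                return None  # the sequence leaves the automaton
            q = self.tau[(q, y)]
        return q


# Hypothetical full order-2 automaton over {0, 1}: the next state keeps the
# last N = 2 symbols. Only the vector of state "10" comes from the Fig. 1
# example; every other vector is a placeholder.
N = 2
alphabet = ("0", "1")
states = {""} | set(alphabet) | {a + b for a in alphabet for b in alphabet}
tau = {(q, x): (q + x)[-N:] for q in states for x in alphabet}
gamma = {q: {"0": 0.5, "1": 0.5} for q in states}
gamma["10"] = {"0": 0.25, "1": 0.75}

pfa = PFA(states, alphabet, tau, gamma, q0="")
pfa.check()
print(pfa.run(pfa.q0, "0110"))  # 10
```

Because every state here stores its full length-2 history, this automaton corresponds to the "full" automaton of Fig. 1 rather than the simplified one that the paper proposes.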

The maximum size of a state (its number of symbols) is bounded by a value $N$ fixed in advance. This value is related to the number of previous values which will be considered to determine the next value, and depends on the "memory" of the series.

Graphically, each state is represented by a node, and the edges going out of each state are labelled by symbols drawn from the alphabet. Moreover, each state has an associated probability vector which includes the probability of the next symbol for each of the symbols of the alphabet. Fig. 1 shows a simple PFA as an example.

In this PFA, the alphabet is composed of the symbols 0 and 1. The states of the system are described in each node of the automaton: initial (i), 0, 1, 00, 01, 10 and 11. For instance, the state labeled 01 corresponds to the following sequence of values in the series: 1 as the last value and 0 as the previous value. The vector associated with each state (node) contains the probability that each symbol of the alphabet appears at the next time step, after the symbol sequence that labels the node has appeared. For instance, the node labeled 10 has the associated vector (0.25, 0.75). This means that if the current state is 10, then the next symbol can be 0, with a probability of 0.25, or 1, with a probability of 0.75. The continuous and discontinuous arrows represent the transition function between states (discontinuous for 0, continuous for 1). For instance, if the current state is 10 and the next symbol is 0, then the following state will be labeled 00; but if the next symbol is 1, then the following state will be labeled 01.

In the PFA shown in Fig. 1, the states 01 and 11 have the same probability vector as state 1. That is, when the symbol 1 appears, it is not necessary to know the preceding value to determine the probabilities of the next symbol, since, in both cases (0 or 1), the probability vector of the next symbol is (0.5, 0.5). Therefore, the PFA of Fig. 1 can be converted into the PFA shown in Fig. 2. This class of PFA is used to represent variable-order Markov models. These simplified automata are the automata proposed in this paper. They capture the same information with fewer states than the original automata. Moreover, they allow us to take into account, for each state, a different number of previous values in the series.

Let us define some concepts that we will use to build the PFAs for hourly global radiation series. Let $\Sigma = \{x_1, x_2, \ldots, x_n\}$ be the set of discrete values of the analyzed variable, and let $\Sigma^*$ denote the set of all possible sequences which can be obtained with these values. For any integer $N$, $\Sigma^N$ denotes the set of all possible sequences of length $N$, and $\Sigma^{\le N}$ is the set of all possible sequences with length less than or equal to $N$. For any subsequence $Y$, represented by $y_1 \ldots y_m$, where $y_i \in \Sigma$, the following notation will be used:
• the longest final subsequence of $Y$, different from $Y$, will be $\mathrm{final}(Y) = y_2 \ldots y_m$;
• the set of all final subsequences of $Y$ will be $\mathrm{last}(Y) = \{y_i \ldots y_m \mid 1 \le i \le m\}$.

In the next section we explain how to build a PFA for hourly global radiation series.
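The notation $\mathrm{final}(Y)$ and $\mathrm{last}(Y)$, and the simplification from Fig. 1 to Fig. 2, can be illustrated with the Python sketch below. The assumptions are ours, not the paper's:

- State labels are written oldest to newest, so the last character is the most recent symbol.
- The state used at each step is the longest element of $\mathrm{last}(\text{history})$ that exists in the automaton. This is the usual convention for variable-order Markov models.
- The functions `simplify` and `current_state` are hypothetical.
- The vectors of the initial state, 0 and 00 are placeholders. Only the vectors (0.5, 0.5) for states 1, 01 and 11 and (0.25, 0.75) for state 10 come from the text.

The final loop checks that the reduced automaton predicts the same next-symbol vector as the full one for every binary history of up to four symbols.

```python
from itertools import product
from typing import Dict, Set, Tuple

Vector = Tuple[float, float]  # P(next symbol = 0), P(next symbol = 1)


def final(Y: str) -> str:
    """final(Y) = y2 ... ym, the longest final subsequence of Y other than Y."""
    return Y[1:]


def last(Y: str) -> Set[str]:
    """last(Y) = {y_i ... y_m | 1 <= i <= m}, every final subsequence of Y."""
    return {Y[i:] for i in range(len(Y))}


def current_state(history: str, states: Set[str]) -> str:
    """Longest element of last(history) that is a state ('' = initial state)."""
    return max((s for s in last(history) | {""} if s in states), key=len)


def simplify(pfa: Dict[str, Vector]) -> Dict[str, Vector]:
    """Drop each non-initial state whose vector equals that of final(state)."""
    return {s: v for s, v in pfa.items() if s == "" or v != pfa[final(s)]}


# Full order-2 automaton in the spirit of Fig. 1 (labels written oldest -> newest).
# From the text: states 1, 01, 11 -> (0.5, 0.5); state 10 -> (0.25, 0.75).
# Hypothetical placeholders: the initial state '', 0 and 00.
full: Dict[str, Vector] = {
    "": (0.55, 0.45), "0": (0.6, 0.4), "1": (0.5, 0.5),
    "00": (0.8, 0.2), "01": (0.5, 0.5), "10": (0.25, 0.75), "11": (0.5, 0.5),
}
reduced = simplify(full)
print(sorted(reduced))  # ['', '0', '00', '1', '10']  (01 and 11 merged into 1)

# Both automata give the same next-symbol vector for every history of length <= 4.
for n in range(5):
    for hist in ("".join(p) for p in product("01", repeat=n)):
        assert full[current_state(hist, set(full))] == reduced[current_state(hist, set(reduced))]
```

A state is removed only when its vector equals that of $\mathrm{final}(\text{state})$. For an automaton in which every suffix of a state is also a state (as in Fig. 1), this guarantees that the longest-suffix lookup falls back to a state with an identical vector. The sketch shows only this merging step; it is not the construction algorithm of Section 4.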

4 . Building PFA for hourly global radiation series

The parameter used to build the PFA is the hourlyclearness index, defined as

K 5G /G , (2)h h h,0

where G is the hourly global radiation andG is theh h,0

extraterrestrial hourly global radiation.The hourly clearness index series have been constructed

in an ‘‘artificial’’ way because data from different dayshave been linked together: the last observation of each dayis followed by the first observation of the following day.Fig. 1. Example of probabilistic finite automata: (- - -) transition

with 0; (———) transitions with 1. This assumption has already been used in previous papers

Page 4: Using probabilistic finite automata to simulate hourly series of global radiation

´238 L. Mora-Lopez, M. Sidrach-de-Cardona / Solar Energy 74 (2003) 235–244

´(see, for instance,Mora-Lopez and Sidrach-de-Cardona, series, compute the frequency ofY. If 4a and 4b are1997), and the results obtained confirm to us the validity of true, then go to 5, else go to 6:this hypothesis. On the other hand, the number of hours 4a. The frequency of this sequence is greater than theconsidered for each series (month) is constant and equal threshold frequency.for all locations considered. The number of hours consid- 4b. For somex [S, the probability of occurrence ofp

ered for each month is 10 for January, February, the subsequenceYx is not equal to the probabilityp

November and December, 12 for March, April, September of the subsequencefinal(Y)x , that isp

and October and 14 for May, June, July and August.P(x u Y)±P(x u final(Y)) (5)p pTo discretize the continuous values of the clearness

index we have used eight different discrete values. The (not equal: when the ratio between the probabilitiessymbolic discrete values used to construct the PFA are is significantly greater than one; for instance,

greater than 1.2).S 5 h0,1,. . . ,7j. (3)

5. DoThese values form the alphabet of the PFA. 5a. Add to the PFA a node, labeledY, and compute its

The relationship between the values of the clearness corresponding probabilities vector.index and the symbols of the alphabet is as follows: 5b. For each amplified sequence,Yx : if the probabilityp

of this amplified sequence is greater than the0, 0#K ,0.35,h threshold probability, then include it inPSS.K 20.35h 6. Remove the analyzed subsequence,Y, from PSS.Y 5 (4)]]] 1 1, 0.35#K ,0.65,h h0.05 7. If there are no elements of ordero in PSS, add 1 to the5

7, K $ 0.65,h value ofo. If o #N and there are elements of lengthoin PSS, then go to 4, else Stop.

where A is the integer value ofA. We have not useduniform intervals to discretize the series because, in thelower and upper intervals, the frequency of values in the 5 . Generating new series of hourly global radiationseries is less than in the other intervals.

Using these expressions and the hourly clearness index To generate new series we need an initial state. Theseries, the discrete serieshY j are obtained. For instance, ifh initial state we have used is the discrete value corre-the maximum order of PFA is 5, the set of all possible sponding to the mean value of the clearness index for eachstates will be series. Letq be the current state. The next symbol,y, ist

generated as follows: first, a random numberr [ [0,1] is*Q 5S 5 h0,1,2,. . . ,00,01,02, . . . ,77,000,001, . . . ,generated. Then, we choose the only component of the

777, . . . ,77777j. probabilities vector—for the current state,q —whicht

satisfiesIn this set, the state 65543 can correspond to the followingj j21sequence of values for the clearness index: 0.62, 0.58,

0.56, 0.53, 0.48. y 5 y u O g(q ,y )$ r andy u O g(q ,y ), r. (6)j t i j21 t ii51 i51From all possible subsequences observed in the series,

only those with a sufficient probability will be used to The process continues until the length of the requiredbuild the PFA. This threshold of probability must be sequence is reached.defined when the PFA is built. Thus, we can generate different series with the same

The monthly series of the hourly clearness index have initial day. Once we generate a synthetic series, we test ifbeen grouped using the monthly mean value of the hourly it is possible to accept the null hypothesis that this seriesclearness index. The ranges for each group are the same asand the recorded one have the same mean and variance,those defined for the discretization of this parameter. For with significance level 0.05. If this is the case, thisevery interval, one PFA has been built. synthetic series is selected as a proxy for the recorded one;

The following algorithm is used to construct each PFA: otherwise, we generate another synthetic series, and the1. Compute the series of discrete values. process continues until we find a synthetic series for which2. Initialize the PFA with a node, with label null sequence. the null hypothesis is not rejected—in all cases, less than3. The setPSS—Possible Subsequence Set—is initialized 10 synthetic series had to be generated.

with all sequences of order 1. Each element in this set Then, for each selected synthetic series, we compare itscorresponds to a sequence of discrete values. Take cumulative probability distribution function (cpdf) with theo 5 1 as the initial value of the order—that is, the size cpdf of the recorded series. This comparison has beenof subsequences to consider. made using the Kolmogorov–Smirnov two-sample test-

4. If there are elements of ordero in PSS, pick any of statistic, which focuses on the absolute value of thethese elements,Y. Using all discrete sequences in the maximum difference between the two empirical distribu-

Page 5: Using probabilistic finite automata to simulate hourly series of global radiation

´L. Mora-Lopez, M. Sidrach-de-Cardona / Solar Energy 74 (2003) 235–244 239

tion functions (a detailed description of the Kolmogorov– inFig. 3. To analyze the relationship between these twoSmirnov statistic can be found, for instance, inRohatgi, parameters, we have computed the correlation coefficient1976, Section 13.3). However, to perform the statistical between them. This correlation coefficient proves to betest of the null hypothesis ‘‘the two series have the same 0.992, which indicates a very strong relationship betweencpdf’’, the standard critical values of the Kolmogorov– them. Therefore, we can conclude that the mean monthlySmirnov two-sample tests cannot be used with time series. daily clearness index—which is usually available—can beInstead, we have computed for each statistic a bootstrap used in the proposed model instead of the mean monthlyP-value, following the block resampling method which is hourly clearness index.usually employed with time series data (a detailed descrip- On the other hand, in the previous section we comparedtion of this procedure can be found, for instance, in the empirical distribution functions of the recorded seriesDavidson and Hinkley, 1997,Section 8.2.3). A large and the selected synthetic series to test the hypothesis thatbootstrapP-value means that the two series which are their cpdfs are the same. However, for most locations thebeing compared have a very similar cpdf—specifically, if cpdfs are not available, and hence this comparison cannotthe bootstrapP-value is greater than 0.05, the null hypoth- be performed. Several authors have shown that the cpdf ofesis that ‘‘both series have the same cpdf’’ is not rejected the daily clearness index is related to its monthly meanwith the usual significance level 0.05. This procedure can value (see, e.g.,Bendt et al., 1981; Hollands and Huget,be used to compare each recorded series with the synthetic1983; Saunier et al., 1987). Therefore, for locations whereseries selected for it. the cpdf is not available, we can examine if the generated

synthetic series is similar to the unobserved real series bycomparing the cpdfs of the generated synthetic series and

6 . Generalization of the simulation model the standard curves of cpdf described in the aforemen-tioned articles.

The model which has been proposed to generate a new To sum up, the general model to simulate new series ofseries of the hourly clearness index uses, as input data, the hourly global radiation which we propose only uses, asmean monthly value of the daily clearness index and the input data, the mean monthly value of the daily solarcumulative probability distribution function of the recorded global radiation. The procedure is as follows: first, usingmonth. For most meteorological stations, these values are this value, the mean monthly value of the clearness indexnot available and only the mean monthly values of the is calculated. Second, using this other value, a series of thedaily global radiation are usually recorded. Thus, one of clearness index is generated, and then it is tested if thethe aims of this paper is to characterize the observed mean value of this generated series is the same as the inputrelationship between the recorded data and the parameters data used. Third, to determine if the generated series canused for the proposed model. be accepted as a true series, the cpdf of this generated

On the one hand, the relation between the mean monthly series is compared with the standard cpdf proposed in thedaily clearness index and the mean monthly hourly clear- literature for that mean value—this comparison is per-ness index has been analyzed. The former has been formed by computing a bootstrapP-value for the Kol-obtained from the mean monthly value of daily radiation, mogorov–Smirnov two-sample statistic.and the latter from the data series. Both values are shown In the following section, we compare the results ob-

Fig. 3. Mean hourly clearness index vs. mean daily clearness index.

Page 6: Using probabilistic finite automata to simulate hourly series of global radiation

´240 L. Mora-Lopez, M. Sidrach-de-Cardona / Solar Energy 74 (2003) 235–244

T able 2tained when the input data are obtained from the recordedResults obtained for each interval of the clearness index, withdata for each month with the results obtained when theN 5 4 and threshold 2. The third column reports the number ofinput data are the mean values of global radiation and themonths in which it is accepted that the cpdf of the generated seriesestimated mean cpdf.and the cpdf of the real series are the same

Interval Number Number of Percentageof months simulated series

7 . Results with cpdf similarto real one

To select the best values of the parameters to build the [0–0.35) 17 17 100PFAs, we have checked the results obtained with different [0.35–0.4) 55 54 98.2values of these parameters. The values we have used are:[0.4–0.45) 79 78 98.7• Order of the PFA (N): from 2 to 14.N is the maximum [0.45–0.5) 107 106 99.1

[0.5–0.55) 137 136 99.3length of the subsequences or number of symbols which[0.55–0.6) 198 192 97.0label each state. This value can be called ‘‘the mem-[0.6–0.65) 120 116 96.7ory’’ of the series.[0.65–1.0) 32 30 93.8• Threshold—minimum number of appearances of a

sequence—from 1 to 5.For each month, synthetic series of the hourly clearness global clearness index and the standard cpdf proposed in

index have been generated, using, as input data: the literature for the corresponding mean value, using the• The mean monthly value of the hourly clearness index aforementioned bootstrap procedure. The results are re-

and the cumulative probability distribution function, ported inTable 3. It can be observed that, for almost allboth obtained from recorded data. months, the generated synthetic series are statistically

• The corresponding PFA for the interval to which the equal to the corresponding standard series. This means thatclearness index belongs by using the mean value of the daily clearness index only,In order to compare the simulated series with the real a new hourly series of the clearness index can be gener-

series, several statistical tests have been used. First, the hypotheses that both series have the same mean and variance have been tested. Second, the cpdfs of the recorded and simulated series have been compared using the Kolmogorov–Smirnov two-sample test statistic with a bootstrap P-value, as described above.

For most of the intervals of the mean clearness index, the results obtained with a PFA of order 2 are similar to those obtained with order 4. However, for intervals 5 and 6, the PFA of order 4 captures the relationships observed in the series better than the PFA of order 2. Thus, the maximum order selected for the PFA is 4. The minimum number of occurrences that a sequence must have in order to be used to build a PFA is set to 2.

With the built PFAs, new sequences of the hourly clearness index have been generated. The original and generated series have again been compared using the statistical tests described above: first, it is tested whether the mean and variance are the same, and, when these hypotheses are not rejected (at the 0.05 significance level), the cpdfs are compared using the Kolmogorov–Smirnov two-sample statistic with a bootstrap P-value. The results obtained for each interval of the clearness index are shown in Table 2. In 97.8% of the months, it is accepted that the recorded series has the same cpdf as the series generated with the proposed model; only for interval 7 is this percentage below 96%.

On the other hand, to validate the generalization of the proposed model, we have also performed a comparison between the cpdf of the generated series of the hourly

ated, and, therefore, a series of global radiation. For example, Figs. 4 and 5 show the recorded and generated series for the data from Murcia (January 1977). Fig. 6 shows the cpdfs of both series together with the standard cpdf for the corresponding interval.

We have also computed the hourly series of global irradiation from the hourly clearness index series. Using the same methodology as before, we have tested whether the statistical characteristics (mean, variance and cpdf) of the recorded and simulated series are the same. In almost all tests, the null hypothesis of equality is not rejected at the 0.05 significance level. For instance, Figs. 7–10 show the recorded and simulated values of

Table 3
Results obtained using the mean values of the clearness index and estimated cpdfs. The third column reports the number of months in which it is accepted that the cpdf of the generated series and the standard cpdf are the same.

| Interval | Number of months | Number of simulated series with cpdf similar to the proposed standard cpdf |
|---|---|---|
| [0–0.35) | 17 | 17 |
| [0.35–0.4) | 55 | 55 |
| [0.4–0.45) | 79 | 79 |
| [0.45–0.5) | 107 | 107 |
| [0.5–0.55) | 137 | 137 |
| [0.55–0.6) | 198 | 198 |
| [0.6–0.65) | 120 | 120 |
| [0.65–1.0) | 32 | 32 |
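The cpdf comparison used throughout this section (a two-sample Kolmogorov–Smirnov statistic with a bootstrap P-value) can be illustrated with a short sketch. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation: the function names `ks_statistic` and `ks_bootstrap_pvalue` are hypothetical, and the null hypothesis is imposed by resampling both series from the pooled sample, which is one common bootstrap scheme (Davidson and Hinkley, 1997) and may differ in detail from the procedure described earlier in the paper. The example data are synthetic, not the Spanish records.

```python
import numpy as np


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Both empirical cdfs are step functions that only jump at observed
    # values, so evaluating them on the pooled sample finds the supremum.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def ks_bootstrap_pvalue(x, y, n_boot=2000, seed=0):
    """Return (D, bootstrap P-value) for H0: x and y share the same cdf.

    Assumption (hypothetical scheme): H0 is imposed by drawing both
    resampled series, with replacement, from the pooled sample. Hourly
    values are treated as exchangeable, ignoring serial correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    d_obs = ks_statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_boot):
        xb = rng.choice(pooled, size=x.size, replace=True)
        yb = rng.choice(pooled, size=y.size, replace=True)
        # Small tolerance so ties with d_obs are not lost to rounding.
        if ks_statistic(xb, yb) >= d_obs - 1e-12:
            exceed += 1
    # The "+1" terms count the observed statistic as one replicate,
    # so the P-value is never exactly zero.
    p_value = (exceed + 1) / (n_boot + 1)
    return d_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for one month of hourly clearness index values.
    recorded = rng.beta(4.0, 3.0, size=300)
    simulated = rng.beta(4.0, 3.0, size=300)
    d, p = ks_bootstrap_pvalue(recorded, simulated)
    print(f"D = {d:.3f}, bootstrap P = {p:.3f}, reject H0 at 0.05: {p < 0.05}")
```

A month would be counted in a table such as Table 3 when the returned P-value is at least 0.05. Treating hourly values as exchangeable is a simplification made here for brevity; the series themselves are serially correlated.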


Fig. 4. $K_h$ values calculated from recorded $G_h$ data. Location: Murcia, January 1977 ($K_{hm} = 0.374$).

Fig. 5. $K_h$ values simulated from the PFA of the interval 0.35–0.4.

hourly global radiation for two Spanish locations (Malaga, January 1977, and Murcia, March 1977). Finally, we have also compared, with the same statistical methodology, the daily series of global irradiation obtained from recorded and simulated hourly data. Again, the null hypothesis is never rejected at the 0.05 significance level.

8. Conclusions

In this paper, a new model to simulate hourly global radiation series is proposed. This model is based on the use of Probabilistic Finite Automata and has been developed within the machine learning field. We have verified that this model allows us to keep all the relevant information

Fig. 6. Cpdfs: standard, recorded data and simulated data. Interval 0.35–0.40. Murcia, January 1977.


Fig. 7. Recorded values of $G_h$. Malaga, January 1977. Mean monthly value of $K_h = 0.41$.

Fig. 8. $G_h$ values simulated from the PFA of interval [0.40–0.45).

contained in the univariate time series in a simple way. Moreover, with this mathematical model, the different relationships observed between different subsequences can be left out; each subsequence only uses the memory length that it requires. Using this mathematical model, a set of PFAs was built,

Fig. 9. Recorded values of $G_h$. Murcia, March 1977. Mean monthly value of $K_h = 0.61$.


Fig. 10. $G_h$ values simulated from the PFA of interval [0.60–0.65).

one for each interval of the analyzed parameter (the hourly clearness index). A method to generate new series of hourly global radiation has been proposed. This method only uses the monthly mean value of the daily solar global radiation. Using this value, the constructed PFAs and the proposed standard cpdf, new series of hourly global radiation that are statistically similar to the real series can be generated.

The model has been checked using several tests. The results show that the generated and recorded series are statistically equivalent: the hypotheses that they have the same mean, variance and cpdf are not rejected. We can therefore conclude that Probabilistic Finite Automata can be used to characterize and predict new series of hourly global solar radiation.

The data used to estimate the collection of PFAs, i.e. the data used in the generation of new series of global radiation, correspond only to locations in Spain. In order to obtain a more general PFA, it would be desirable to estimate the PFA using data for other climates. Generalization of the PFA is easy to do: it is only necessary to recalculate the PFAs using the proposed algorithm.

Readers interested in the collection of estimated PFAs and the program to generate new series of hourly global radiation can request them from the authors by e-mail.

References

Aguiar, R.J., Collares-Pereira, M., Conde, J.P., 1988. Simple procedure for generating sequences of daily radiation values using a library of Markov Transition Matrix. Solar Energy 40, 269–279.

Aguiar, R., Collares-Pereira, M., 1992. Tag: a time-dependent, autoregressive, gaussian model for generating synthetic hourly radiation. Solar Energy 49 (3), 167–174.

Bendt, P., Collares-Pereira, M., Rabl, A., 1981. The frequency distribution of daily insolation values. Solar Energy 27, 1–5.

Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis: Forecasting and Control. Prentice-Hall, Englewood Cliffs, NJ.

Brinkworth, B.J., 1977. Autocorrelation and stochastic modelling of insolation series. Solar Energy 19, 343–347.

Davidson, A.C., Hinkley, D.V., 1997. Bootstrap Methods and Their Application. Cambridge University Press.

Hollands, K.G.T., Huget, R.G., 1983. A probability density function for the clearness index, with applications. Solar Energy 30, 195–209.

Kemmoku, Y., Orita, S., Nakagawa, S., Sakakibara, T., 1999. Daily insolation forecasting using a multi-stage neural network. Solar Energy 66 (3), 193–199.

Krog, A., Mian, S.I., Haussler, D., 1993. A hidden Markov model that finds genes in E. coli DNA. Technical report UCSC-CRL-93-16, University of California at Santa-Cruz.

Mohandes, M., Rehman, S., Halawani, T.O., 1998. Estimation of global solar radiation using artificial neural networks. Renewable Energy 14 (1–4), 179–184.

Mohandes, M., Balghonaim, M., Kassas, M., Rehman, S., Halawani, T.O., 2000. Use of radial basis functions for estimating monthly mean daily solar radiation. Solar Energy 68 (2), 161.

Mora-López, L., Sidrach-de-Cardona, M., 1997. Characterization and simulation of hourly exposure series of global radiation. Solar Energy 60 (5), 257–270.

Mora-López, L., Morales-Bueno, R., Sidrach-de-Cardona, M., Triguero, F., 2002. Probabilistic Finite Automata and randomness in nature: a new approach in the modelling and prediction of climatic parameters. In: Proceedings of the International Environmental Modelling and Software Society Congress, Lugano, Switzerland, June.

Nadas, A., 1984. Estimation of probabilities in the language model of the IBM speech recognition system. IEEE Trans. ASSP 32 (4), 859–861.

Rabiner, L.R., 1994. A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the Seventh Annual Workshop on Computational Learning Theory.

Rissanen, J., 1983. A universal data compression system. IEEE Trans. Inf. Theory 29 (5), 656–664.

Rohatgi, V.K., 1976. An Introduction to Probability Theory and Mathematical Statistics. Wiley, New York.

Ron, D., Singer, Y., Tishby, N., 1994. Learning probabilistic automata with variable memory length. In: Proceedings of the Seventh Annual Workshop on Computational Learning Theory.


Ron, D., Singer, Y., Tishby, N., 1998. On the learnability and usage of acyclic probabilistic finite automata. J. Comput. System Sci. 56, 133–152.

Saunier, G.Y., Reddy, T.A., Kuman, S.A., 1987. A monthly probability distribution function of daily global irradiation values appropriate for both tropical and temperate locations. Solar Energy 38, 169–177.

Scharmer, K., Greif, J., 2000. Database and Exploitation Software. The European Solar Radiation Atlas, Vol. 2, pp. 290.

Sfetsos, A., Coonick, A.H., 2000. Univariate and multivariate forecasting of hourly solar radiation with artificial intelligence techniques. Solar Energy 68 (2), 169–178.