Chiang Mai J. Sci. 2010; 37(3): 369-376
www.science.cmu.ac.th/journal-science/josci.html
Contributed Paper

On Development of Spoke Plot for Circular Variables
Fakhrulrozi Hussain, Abdul G. Hussin*, and Yong Z. Zubairi
Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Malaysia.
*Author for correspondence; e-mail: [email protected]

Received: 5 January 2010
Accepted: 12 August 2010

ABSTRACT
The relationship between any two circular variables, and its simple analysis, is the main focus of this paper. Correlation measures, linear relationships and estimation of concentration parameters are some examples of simple approaches to analyzing two circular variables. These analyses can be extended to graphical plots to give a better understanding. A plot called the "Spoke plot", developed in the MATLAB environment, is proposed to achieve this objective. As an illustration, the Spoke plot is applied to Malaysian wind direction data to investigate the correlation and linear relationship between two circular variables and to estimate their concentration parameters.

Keywords: concentration parameters, correlation coefficient, wind direction.

1. INTRODUCTION
Circular variables are common in data such as clock time, compass bearing, moon circle, wind direction, turbine and others. Wind direction data is one type of circular data. Such data are measured on a scale that repeats itself, or as angular directions. Unlike common linear data, the features of circular data warrant appropriate techniques and special inferential tools in the analysis. The analysis of circular data has been studied for many years [1-4], and the application of the simple circular regression model has also been discussed [5]. Another area of circular statistics, the problem of outliers, has been studied by Abu Zaid et al. [6]. Further, a number of symmetric probability models have been developed for directional data, including the von Mises, wrapped Cauchy, Sen Gupta-Rattihalli and many others.

However, the analysis of directional data is limited because few user-friendly statistical software packages on the market deal with both exploratory data analysis and statistical inference for such data. Although theoretical formulations for modeling circular variables have been developed, their evaluation is somewhat complicated due to the nature of the data. Therefore, any development in the analysis of circular data, as well as the incorporation of further analysis into the available software, is indispensable. This paper focuses on a graphical method for analyzing the relationship between any two circular variables via correlation and linear relationship.



2. CIRCULAR VARIABLES
A circular random variable is one which takes values on the circumference of a circle, i.e. its values are angles in the range (0, 2π] radians or (0°, 360°]. Such a variable must be analyzed by techniques different from those appropriate for the usual Euclidean-type variables, because the circumference is a bounded closed space for which the concept of origin is arbitrary or undefined. A continuous linear variable, by contrast, is a random variable with realizations on the real line, which may be analyzed straightforwardly by the usual techniques. For example, if one wants to compare two circular observations with values 10° and 350°, a linear technique gives a difference of 340°, whereas in reality the two observations are only 20° apart. This suggests the need for a special approach in dealing with such variables.
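The 10° versus 350° example can be checked with a short calculation. The sketch below is written in Python rather than the MATLAB environment used for the Spoke plot program, and the function name `circular_difference_deg` is a hypothetical helper introduced only for this illustration.

```python
def circular_difference_deg(a, b):
    """Smallest angular separation between two directions, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0          # raw difference reduced to [0, 360)
    return min(d, 360.0 - d)        # go the short way around the circle


# Linear arithmetic treats 10 and 350 as far apart ...
linear_difference = abs(350.0 - 10.0)
# ... while on the circle they are close neighbours.
circular_difference = circular_difference_deg(10.0, 350.0)

print(linear_difference, circular_difference)
```

The reduction modulo 360 followed by taking the shorter arc is what makes the result independent of the arbitrary choice of origin.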

2.1 Correlation coefficient of circular variables

The correlation measure will be used in the development of the analysis of two circular variables. Correlation, measured by a correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation (or co-relation) refers to the departure of two variables from independence. When the data are linear, there are several coefficients measuring the degree of correlation, each adapted to the nature of the data.

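Given n pairs of circular data (θ1, ϕ1), ..., (θn, ϕn), where 0 < θi, ϕi ≤ 2π, the circular correlation coefficient [3] is defined by

$$\hat{\rho}_T = \frac{\sum_{1 \le i < j \le n} \sin(\theta_i - \theta_j)\,\sin(\phi_i - \phi_j)}{\sqrt{\sum_{1 \le i < j \le n} \sin^2(\theta_i - \theta_j)\;\sum_{1 \le i < j \le n} \sin^2(\phi_i - \phi_j)}},$$

where $-1 \le \hat{\rho}_T \le 1$.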
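As a minimal sketch of how such a coefficient can be computed, the Python function below assumes the pairwise-sine form of the circular correlation coefficient associated with [3], namely the sum over all pairs i < j of sin(θi − θj) sin(ϕi − ϕj), divided by the square root of the product of the corresponding sums of squared sines. This is an assumption about the formula, and the code is not the authors' MATLAB subprogram; the name `circular_correlation` is hypothetical.

```python
import math


def circular_correlation(theta, phi):
    """Pairwise-sine circular correlation for angles given in radians.

    Assumed form: sum over i < j of sin(dtheta) * sin(dphi), normalised by
    the square root of the product of the sums of squared sines.
    """
    n = len(theta)
    if n != len(phi) or n < 2:
        raise ValueError("theta and phi must have equal length of at least 2")
    num = den_theta = den_phi = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s_t = math.sin(theta[i] - theta[j])
            s_p = math.sin(phi[i] - phi[j])
            num += s_t * s_p
            den_theta += s_t * s_t
            den_phi += s_p * s_p
    # Note: the ratio is undefined if all angles of one variable coincide.
    return num / math.sqrt(den_theta * den_phi)


two_pi = 2.0 * math.pi
theta = [0.3, 1.1, 2.0, 4.5]
shifted = [(t + 0.7) % two_pi for t in theta]    # phi = theta + constant
mirrored = [(-t) % two_pi for t in theta]        # phi = -theta
print(circular_correlation(theta, shifted), circular_correlation(theta, mirrored))
```

Because only sines of differences enter the formula, wrapping the angles modulo 2π or shifting either variable by a constant leaves the value unchanged.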

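2.2 Linear relationship between two circular variables

The regression model when both variables are circular takes a very interesting form. The model is given by ϕ = α + βθ + ε (mod 2π), where ε is a circular random error having a von Mises distribution with circular mean 0 and concentration parameter κ, written ε ∼ VM(0, κ). This model has been discussed in detail for β = 1 [7] and also for β close to unity [8]. The estimates of α, β and κ, namely α̂, β̂ and κ̂, are obtained by maximizing the likelihood. The estimate of α is given by

$$\hat{\alpha} = \begin{cases} \tan^{-1}(S/C), & S > 0,\ C > 0, \\ \tan^{-1}(S/C) + \pi, & C < 0, \\ \tan^{-1}(S/C) + 2\pi, & S < 0,\ C > 0, \end{cases}$$

where $S = \sum \sin(\phi_i - \hat{\beta}\theta_i)$ and $C = \sum \cos(\phi_i - \hat{\beta}\theta_i)$. Due to the nonlinear nature of the first partial derivative of the log-likelihood function, β̂ is obtained by an iterative procedure according to the formula

$$\hat{\beta}_1 = \hat{\beta}_0 + \frac{\sum \theta_i \sin(\phi_i - \hat{\alpha} - \hat{\beta}_0\theta_i)}{\sum \theta_i^2 \cos(\phi_i - \hat{\alpha} - \hat{\beta}_0\theta_i)},$$

where $\hat{\beta}_0$ is the current estimate and $\hat{\beta}_1$ the updated one. The estimate of κ is given by

$$\hat{\kappa} = A^{-1}\!\left(\frac{1}{n}\sum \cos(\phi_i - \hat{\alpha} - \hat{\beta}\theta_i)\right),$$

where $A(\kappa) = I_1(\kappa)/I_0(\kappa)$ is the ratio of modified Bessel functions of the first kind, which can be approximated by

$$A^{-1}(w) \approx \frac{9 - 8w + 3w^2}{8(1 - w)}.$$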
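The estimation procedure can be sketched as follows, in Python rather than MATLAB. The sketch assumes the standard maximum likelihood equations for the model ϕ = α + βθ + ε (mod 2π) with von Mises errors: α̂ from the arctangent of S/C, a Newton-type update for β̂, and κ̂ from the inverse of A(κ) = I1(κ)/I0(κ) using an approximation intended for errors that are concentrated (w close to 1). The function names, the starting value `beta0` and the stopping rule are assumptions of this illustration, not details taken from the authors' program.

```python
import math

TWO_PI = 2.0 * math.pi


def a_inverse(w):
    """Approximate inverse of A(kappa) = I1(kappa)/I0(kappa), for w near 1."""
    if w >= 1.0:
        return math.inf                      # residuals perfectly concentrated
    return (9.0 - 8.0 * w + 3.0 * w * w) / (8.0 * (1.0 - w))


def estimate_alpha(theta, phi, beta):
    """alpha-hat for a given beta; atan2 covers the three quadrant cases."""
    s = sum(math.sin(p - beta * t) for t, p in zip(theta, phi))
    c = sum(math.cos(p - beta * t) for t, p in zip(theta, phi))
    return math.atan2(s, c) % TWO_PI


def fit_circular_regression(theta, phi, beta0=1.0, tol=1e-10, max_iter=100):
    """Iterate between alpha-hat and a Newton-type update of beta-hat."""
    beta = beta0
    for _ in range(max_iter):
        alpha = estimate_alpha(theta, phi, beta)
        num = sum(t * math.sin(p - alpha - beta * t) for t, p in zip(theta, phi))
        den = sum(t * t * math.cos(p - alpha - beta * t) for t, p in zip(theta, phi))
        new_beta = beta + num / den
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    alpha = estimate_alpha(theta, phi, beta)
    w = sum(math.cos(p - alpha - beta * t) for t, p in zip(theta, phi)) / len(theta)
    return alpha, beta, a_inverse(w)


# Noise-free check: phi = 0.5 + theta (mod 2*pi) should give alpha = 0.5, beta = 1.
theta = [0.2, 0.9, 1.7, 2.8, 4.1, 5.5]
phi = [(0.5 + t) % TWO_PI for t in theta]
alpha_hat, beta_hat, kappa_hat = fit_circular_regression(theta, phi)
print(alpha_hat, beta_hat, kappa_hat)
```

Using `atan2` followed by a reduction modulo 2π is a compact way to reproduce the quadrant-by-quadrant definition of α̂. As with any Newton-type scheme, the iteration for β̂ can fail to converge if the starting value is far from the solution, which is consistent with the model being studied for β equal or close to unity.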

3. THE DEVELOPMENT OF SPOKE PLOT
Two circular variables (θ, ϕ) may be represented in a Spoke plot consisting of two concentric, non-intersecting circles of any specific radii. To understand this idea, suppose we have a pair of observations (45°, 90°) for the variables θ and ϕ respectively. In the Spoke plot this pair is represented by a line that connects the point θ on the inner circle (45°) to the point ϕ on the outer circle (90°), as shown in Figure 1. The choice of variable for the inner and outer circles is arbitrary; that is, θ may very well be plotted on the inner circle and ϕ on the outer circle, or vice versa.

Figure 1. Plot of a line connecting two points (θ, ϕ) from the inner circle (45°) to the outer circle (90°).

Based on this idea, a Spoke plot is developed that allows us to look at the pattern of any two circular variables. Given a set of data of size n in which the coordinates (θi, ϕi), i = 1, 2, …, n, are approximately equal (i.e. θi ≈ ϕi (mod 2π)) or linearly related, the set of n lines formed by connecting points on the inner circle to points on the outer circle results in a spoke-like image. This pattern suggests a linear association between the two circular variables. As an illustration, we generated values of θ and ϕ from the von Mises distribution with mean π and concentration parameter 2 for both θ and ϕ. A sample of the simulated data, in degrees, is given in Table 1.

Table 1. Sample of circular data (in degrees).

Θ      3.56   104.69    72.28   104.51   321.22    93.11   348.57    63.62   152.67    13.88
Φ    110.15    85.40    86.24    51.55   223.60    19.96   100.48    16.98    44.51     0.90

The Spoke plot may be used to represent the pairs of points graphically. A program called "Spoke(spoke_data)" has been developed, with output as shown in Figure 2. The Spoke plot in Figure 2 offers an alternative, diagrammatic way to look at the relationship between the two circular variables θ and ϕ. For comparison, a scatter plot of the kind normally used in linear analysis has been plotted, as shown in Figure 3. The scatter plot, however, could be misleading due to the wrap-around property of circular data, and the Spoke plot seems to be the better alternative.
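The geometric construction behind the plot can be sketched in a few lines: each observation (θ, ϕ) becomes a line segment from a point on the inner circle to a point on the outer circle. The Python sketch below only computes the segment end points and leaves the drawing to any plotting tool. The radii of 1 and 2, the convention that angles are measured anticlockwise from the positive x-axis, and the name `spoke_segments` are assumptions of this illustration; the source does not state the conventions used in the MATLAB program.

```python
import math


def spoke_segments(theta_deg, phi_deg, r_inner=1.0, r_outer=2.0):
    """Return one ((x0, y0), (x1, y1)) segment per pair of angles in degrees.

    The start point lies on the inner circle at angle theta, and the end
    point lies on the outer circle at angle phi.
    """
    segments = []
    for t, p in zip(theta_deg, phi_deg):
        t_rad, p_rad = math.radians(t), math.radians(p)
        start = (r_inner * math.cos(t_rad), r_inner * math.sin(t_rad))
        end = (r_outer * math.cos(p_rad), r_outer * math.sin(p_rad))
        segments.append((start, end))
    return segments


# The single pair (45, 90) used in Figure 1, followed by the first
# three pairs of Table 1.
theta_deg = [45.0, 3.56, 104.69, 72.28]
phi_deg = [90.0, 110.15, 85.40, 86.24]
segments = spoke_segments(theta_deg, phi_deg)
for start, end in segments:
    print(start, end)
```

Under these assumptions, pairs with θ ≈ ϕ give short, nearly radial segments, which is what produces the spoke-like image described above.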


Figure 2. A sample of Spoke plot for variables θ and ϕ.

3.1 Spoke plot: Correlation coefficient
Another application of the Spoke plot is the graphical representation of the correlation coefficient and linear association. For illustration purposes, a dataset called "spoke_data" of size n = 30 is generated based on a simple circular regression model, ϕ = α + βθ + ε (mod 2π). Without loss of generality, α = 0 and β = 1 have been chosen. The variable θ has been generated from VM(π/4, 1.5) and ε from VM(0, 30). The correlation coefficient ρ̂T is used to measure the linear association between the two circular variables, using a subprogram called "corr(spoke_data)".

The subprogram is designed in such a way that the first output gives a graphical view of the relationship between the two circular variables. This is followed by a numerical measure that describes the strength of linearity. As an illustration, the function call "spoke(spoke_data)" has been executed, followed by the function call "corr(spoke_data)". In the output shown in Figure 4, the Spoke plot strongly suggests the presence of a linear association between the two circular variables. This finding is further supported by the numerical correlation value of 0.9555, shown at the bottom of the plot.
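The simulation design described above can be imitated with Python's standard library, which provides a von Mises generator. The sketch below draws θ from VM(π/4, 1.5) and ε from VM(0, 30), forms ϕ = α + βθ + ε (mod 2π) with α = 0 and β = 1, and then computes a circular correlation. The pairwise-sine form of the coefficient, the seed, and the function names are assumptions of this illustration; the values produced will differ from the 0.9555 reported for the authors' own "spoke_data".

```python
import math
import random

TWO_PI = 2.0 * math.pi


def simulate_spoke_data(n=30, alpha=0.0, beta=1.0, mu_theta=math.pi / 4,
                        kappa_theta=1.5, kappa_eps=30.0, seed=1):
    """Simulate (theta, phi) from phi = alpha + beta*theta + eps (mod 2*pi)."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    theta = [rng.vonmisesvariate(mu_theta, kappa_theta) for _ in range(n)]
    eps = [rng.vonmisesvariate(0.0, kappa_eps) for _ in range(n)]
    phi = [(alpha + beta * t + e) % TWO_PI for t, e in zip(theta, eps)]
    return theta, phi


def circular_correlation(theta, phi):
    """Pairwise-sine circular correlation (assumed form of the coefficient)."""
    num = den_theta = den_phi = 0.0
    n = len(theta)
    for i in range(n - 1):
        for j in range(i + 1, n):
            s_t = math.sin(theta[i] - theta[j])
            s_p = math.sin(phi[i] - phi[j])
            num += s_t * s_p
            den_theta += s_t * s_t
            den_phi += s_p * s_p
    return num / math.sqrt(den_theta * den_phi)


theta, phi = simulate_spoke_data()
rho = circular_correlation(theta, phi)
print(round(rho, 4))
```

With errors as concentrated as VM(0, 30), the simulated ϕ stays close to θ, so a correlation close to 1 is to be expected for most seeds.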

Figure 3. A sample of Scatter plot for variables θ and ϕ.


Figure 4. The Spoke plot and correlation coefficient output of “spoke_data”.

3.2 Spoke plot: linear relationship using simple circular regression

Once a linear association has been established from the graphical Spoke plot and supported by the numerical measure ρ̂T, the next step of the analysis is to estimate the parameters α and β of the linear relationship ϕ = α + βθ + ε (mod 2π). Another subprogram called "alpha_regress(spoke_data)" is written to give the estimated values of α and β using maximum likelihood estimation. Using the same dataset, we found the parameter estimates to be α̂ = 6.2792 radians and β̂ = 0.9981, as shown in Figure 5. These estimates are very close to the true values used in the simulation, since 6.2792 radians is equivalent to about −0.0040 radians (mod 2π), which is close to α = 0.

Figure 5. The Spoke plot shows linear association of α and β.

Another use of the Spoke plot is to visually assess the concentration parameter κ of the data. A subprogram called "kappa_spoke(spoke_data)" has been written to estimate and display the concentration parameters of θ and ϕ, namely κθ and κϕ respectively. In summary, the four commands given below produce a comprehensive Spoke plot output, as shown in Figure 6.

>> spoke(spoke_data)
>> corr(spoke_data)
>> alpha_regress(spoke_data)
>> kappa_spoke(spoke_data)

The program and subprograms to obtain the Spoke plot for circular variables are available at URL: http://asasi.um.edu.my/download/Spoke_plot_program.doc.

Figure 6. A sample of comprehensive Spoke plot.
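The source does not say how "kappa_spoke" estimates the concentration parameter of each variable. One common approach, shown here purely as an assumed illustration in Python, is to compute the mean resultant length R̄ of a sample and convert it to κ̂ with the piecewise approximation given in standard texts on circular statistics. The function names are hypothetical.

```python
import math


def mean_resultant_length(angles):
    """Length of the mean vector of unit vectors at the given angles (radians)."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.hypot(c, s)


def estimate_kappa(angles):
    """Approximate von Mises concentration from the mean resultant length."""
    r = mean_resultant_length(angles)
    if r >= 1.0:
        return math.inf                      # all angles identical
    if r < 0.53:
        return 2.0 * r + r ** 3 + 5.0 * r ** 5 / 6.0
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1.0 - r)
    return 1.0 / (r ** 3 - 4.0 * r ** 2 + 3.0 * r)


spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]   # evenly spread angles
tight = [0.99, 1.00, 1.01]                               # tightly clustered angles
print(estimate_kappa(spread), estimate_kappa(tight))
```

Evenly spread angles give R̄ near 0 and hence κ̂ near 0, while tightly clustered angles give R̄ near 1 and a large κ̂, which matches the visual impression of scattered versus bunched points on a circle.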

4. APPLICATION OF SPOKE PLOT
As an illustration, the Spoke plot is applied to two different real datasets, as described below:

4.1 Wind direction data recorded at 850 hectopascals (hPa) and 1,000 hectopascals (hPa) at 12.00 A.M. at Bayan Lepas Airport, Malaysia, in July and August 2005, with the objective of comparing two sets of wind direction data at two different pressures.

4.2 Wind direction data from the Holderness coastline, on the Humberside coast of the North Sea, United Kingdom, measured in October 1994 by HF radar and an anchored wave buoy. The dataset has 49 measurements recorded over a period of 22.7 days.

The results are shown in Figures 7 and 8 respectively. In the Spoke plot of data set 4.1 in Figure 7, a number of lines cross the inner ring, which implies that there is no correlation between the variables. In support of this finding, the calculated correlation value is 0.1316, which indicates a very weak correlation. The estimated linear relationship, however, shows a strong one-to-one linear relationship between the two circular variables. The concentration parameter estimates show a small concentration for both θ and ϕ.


Figure 7. Spoke plot of wind direction data recorded at 850 hPa and 1,000 hPa, from Bayan Lepas Airport, Malaysia, in July and August 2005.

Figure 8. Spoke plot of wind direction measured by HF radar and anchored wave buoy,Holderness Coastline, Humberside coast of the North Sea, U.K., in October 1994.

In the Spoke plot of data set 4.2 in Figure 8, none of the lines crosses the inner ring, which indicates a strong correlation between the variables. In support of this finding, the calculated correlation value is 0.8680. The linear association measure shows a strong one-to-one linear relationship between the two circular variables. Further, the concentration parameter estimates show a small concentration for both θ and ϕ.
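The visual criterion of lines crossing the inner ring can be made explicit with a little geometry. For a spoke running from the inner circle (radius r) at angle θ to the outer circle (radius R) at angle ϕ, the segment enters the interior of the inner circle exactly when cos(ϕ − θ) < r/R. The Python sketch below counts such crossings. The radii, the angle values and the function names are assumptions of this illustration, since the source does not state the radii used in the plots.

```python
import math


def crosses_inner_circle(theta, phi, r_inner=1.0, r_outer=2.0):
    """True if the spoke from (r_inner, theta) to (r_outer, phi) cuts the inner disc.

    The segment starts on the inner circle, so it dips inside the disc
    exactly when it initially heads inward, i.e. cos(phi - theta) < r_inner / r_outer.
    """
    return math.cos(phi - theta) < r_inner / r_outer


def count_crossings(theta, phi, r_inner=1.0, r_outer=2.0):
    """Number of spokes that cross the inner ring (angles in radians)."""
    return sum(crosses_inner_circle(t, p, r_inner, r_outer)
               for t, p in zip(theta, phi))


# Hypothetical angles in degrees: the first two pairs are close together,
# the last two are far apart on the circle.
theta = [math.radians(a) for a in (45.0, 10.0, 30.0, 200.0)]
phi = [math.radians(a) for a in (90.0, 350.0, 180.0, 40.0)]
print(count_crossings(theta, phi))
```

With the assumed radius ratio of 1/2, a spoke crosses the inner ring whenever the two directions differ by more than 60°, so the number of crossings is a rough visual indicator of how often θ and ϕ disagree strongly. A different ratio of radii would move this threshold.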


5. DISCUSSION
In this paper, a pragmatic approach to the exploratory data analysis of circular variables is developed. The novelty of the methodology is that it combines a visual method with a numerical approach in the analysis. Under the name Spoke plot, the analysis focuses on identifying the relationship between two circular variables. Together, the Spoke plot, the correlation measure, the linear relationship and the concentration parameter values provide a comprehensive exploratory data analysis. In summary, the method of analysis developed in this study has great potential to be developed into comprehensive statistical software dedicated to circular variables. With a proper interface, options such as a graphical user interface (GUI), point-and-click windows and multiple windows can make the software as sophisticated as those designed for linear data.

REFERENCES

[1] Mardia K.V., Statistics of Directional Data, London: Academic Press Inc, 1972.

[2] Batschelet E., Circular Statistics in Biology, London: Academic Press Inc, 1981.

[3] Fisher N.I., Statistical Analysis of Circular Data, Cambridge University Press, 1993.

[4] Mardia K.V. and Jupp P.E., Directional Statistics, London: Academic Press Inc, 2000.

[5] Jammalamadaka S.R., Topics in Circular Statistics, World Scientific Publishing Co. Pte. Ltd, 2001.

[6] Abu Zaid A.H.M., Mohamed I. and Hussin A.G., A new test of discordancy in circular data. Communications in Statistics – Simulation and Computation, 2009; 38: 682-691.

[7] Caires S. and Wyatt L.R., A linear functional relationship model for circular data with an application to the assessment of ocean wave measurements. Journal of Agricultural, Biological, and Environmental Statistics, 2003; 8: 153-169.

[8] Hussin A.G., Fieller N.R.J. and Stillman E.C., Linear regression for circular variables with application to directional data. Journal of Applied Science and Technology, 2004; 9: 1-6.