Chiang Mai J. Sci. 2010; 37(3) 369
Chiang Mai J. Sci. 2010; 37(3): 369-376
www.science.cmu.ac.th/journal-science/josci.html
Contributed Paper

On Development of Spoke Plot for Circular Variables
Fakhrulrozi Hussain, Abdul G. Hussin*, and Yong Z. Zubairi
Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Malaysia.
*Author for correspondence; e-mail: [email protected]

Received: 5 January 2010
Accepted: 12 August 2010
ABSTRACT
The relationship between any two circular variables, and simple ways of analyzing it, are the main focus of this paper. Correlation measures, linear relationships and the estimation of concentration parameters are examples of simple approaches to analyzing two circular variables. These analyses can be extended to graphical plots to give a better understanding. A plot called the "Spoke plot", developed in the MATLAB environment, is proposed to achieve this objective. As an illustration, the Spoke plot is applied to Malaysian wind direction data to investigate the correlation and linear relationship between two circular variables and to estimate their concentration parameters graphically.
Keywords: concentration parameters, correlation coefficient, wind direction.
1. INTRODUCTION
Circular variables are common in data such as clock times, compass bearings, lunar cycles, wind directions and turbine orientations. Wind direction data is one type of circular data: it is measured on a scale that repeats itself, as angular directions do. Unlike ordinary linear data, the features of circular data warrant appropriate techniques and special inferential tools in the analysis. The analysis of circular data has been studied for many years [1 - 4], and the application of the simple circular regression model has been discussed [5]. Another area of circular statistics, the problem of outliers, has been studied by Abu Zaid et al. [6]. Further, a number of symmetric probability models have been developed for directional data, including the von Mises, wrapped Cauchy, Sen Gupta-Rattihalli and many others.

However, the analysis of directional data is limited because few user-friendly statistical software packages on the market deal with both exploratory data analysis and statistical inference for such data. Although theoretical formulations for modeling circular variables have been developed, their evaluation is somewhat complicated due to the nature of the data. Therefore, any development in the analysis of circular data, and any incorporation of further analysis into the available software, is valuable. This paper focuses on a graphical method for analyzing the relationship between any two circular variables via correlation and linear relationship.
2. CIRCULAR VARIABLES
A circular random variable is one that takes values on the circumference of a circle, i.e. angles in the range (0, 2π] radians or (0°, 360°]. Such a variable must be analyzed by techniques different from those appropriate for the usual Euclidean type of variable, because the circumference is a bounded closed space for which the concept of origin is arbitrary or undefined. A continuous linear variable, by contrast, is a random variable with realizations on the real line, which may be analyzed straightforwardly by the usual techniques. For example, if one wants to compare two circular observations with values 10° and 350°, a linear technique gives a difference of 340°, whereas in reality the two directions are only 20° apart. This shows the need for a special approach in dealing with such variables.
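The 10° versus 350° example can be checked numerically. The short Python sketch below is only an illustration; the function name `circular_difference_deg` is hypothetical and is not part of the authors' MATLAB programs.

```python
def circular_difference_deg(a, b):
    """Smallest angle, in degrees, between two directions a and b."""
    d = abs(a - b) % 360.0
    # Going the other way round the circle may be shorter.
    return min(d, 360.0 - d)

print(abs(350.0 - 10.0))                     # linear difference: 340.0
print(circular_difference_deg(350.0, 10.0))  # circular difference: 20.0
```

The wrap-around in the last line of the function is exactly what a linear technique lacks.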
2.1 Correlation coefficient of circular variables
A correlation measure will be used in the development of the analysis of two circular variables. Correlation, as measured by a correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation (or co-relation) refers to the departure of two variables from independence. When the data are linear, several coefficients measuring the degree of correlation are available, each adapted to the nature of the data.
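Given n pairs of circular data (θ1, ϕ1), ..., (θn, ϕn), where 0 < θi, ϕi ≤ 2π, the circular correlation coefficient [3] is defined by

$$\hat{\rho}_T = \frac{\sum_{1 \le i < j \le n} \sin(\theta_i - \theta_j)\,\sin(\phi_i - \phi_j)}{\sqrt{\sum_{1 \le i < j \le n} \sin^2(\theta_i - \theta_j)\,\sum_{1 \le i < j \le n} \sin^2(\phi_i - \phi_j)}},$$

where −1 ≤ ρ̂Τ ≤ +1.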
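As a minimal sketch of how such a coefficient can be computed, the Python function below assumes that the coefficient cited from [3] is the Fisher-Lee form: the sum over all pairs i < j of sin(θi − θj) sin(ϕi − ϕj), divided by the square root of the product of the corresponding sums of squared sines. The function name and the small test data are hypothetical and are not taken from the paper.

```python
import math

def circular_correlation(theta, phi):
    """Pairwise (O(n^2)) circular correlation for paired angles in radians."""
    n = len(theta)
    num = den_theta = den_phi = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s_theta = math.sin(theta[i] - theta[j])
            s_phi = math.sin(phi[i] - phi[j])
            num += s_theta * s_phi
            den_theta += s_theta * s_theta
            den_phi += s_phi * s_phi
    return num / math.sqrt(den_theta * den_phi)

theta = [0.1, 0.8, 1.5, 2.4, 3.0]
rotated = [(t + 0.5) % (2 * math.pi) for t in theta]    # phi = theta + constant
reflected = [(-t) % (2 * math.pi) for t in theta]       # phi = -theta
print(circular_correlation(theta, rotated))    # close to +1
print(circular_correlation(theta, reflected))  # close to -1
```

Because only differences of angles enter the formula, the result does not depend on where the origin of either circle is placed.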
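2.2 Linear relationship between two circular variables
The regression model when both variables are circular takes a particularly simple form. The model is given by ϕ = α + βθ + ε (mod 2π), where ε is a circular random error having a von Mises distribution with circular mean 0 and concentration parameter κ, written ε ∼ VM(0, κ). This model has been discussed in detail for β = 1 [7] and also for β close to unity [8]. The estimates of α, β and κ, namely α̂, β̂ and κ̂, are obtained by maximum likelihood. The estimate of α is given by

$$\hat{\alpha} = \begin{cases} \tan^{-1}(S/C), & S > 0,\ C > 0, \\ \tan^{-1}(S/C) + \pi, & C < 0, \\ \tan^{-1}(S/C) + 2\pi, & S < 0,\ C > 0, \end{cases}$$

where S = Σ sin(ϕi − β̂θi) and C = Σ cos(ϕi − β̂θi). Because the first partial derivative of the log-likelihood function with respect to β is nonlinear, β̂ is obtained by an iterative procedure according to the formula

$$\hat{\beta}_1 \approx \hat{\beta}_0 + \frac{\sum_{i=1}^{n} \theta_i \sin(\phi_i - \hat{\alpha} - \hat{\beta}_0 \theta_i)}{\sum_{i=1}^{n} \theta_i^2 \cos(\phi_i - \hat{\alpha} - \hat{\beta}_0 \theta_i)},$$

where β̂0 is an initial estimate.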
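The estimate of κ is given by

$$\hat{\kappa} = A^{-1}\!\left(\frac{1}{n}\sum_{i=1}^{n} \cos(\phi_i - \hat{\alpha} - \hat{\beta}\theta_i)\right),$$

where A(κ) = I1(κ)/I0(κ) is the ratio of modified Bessel functions of the first kind, which can be approximated by

$$A^{-1}(w) \approx \frac{9 - 8w + 3w^2}{8(1 - w)}.$$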
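The estimation procedure can be sketched in Python as follows. This is not the authors' "alpha_regress" subprogram. It rests on several stated assumptions: α̂ is taken as the four-quadrant arctangent of S and C reduced to [0, 2π); β̂ is updated by a Newton step on the log-likelihood, β̂ ← β̂ + Σ θi sin(ri) / Σ θi² cos(ri) with ri = ϕi − α̂ − β̂θi; and κ̂ is obtained from the mean cosine of the residuals w through the large-κ approximation (9 − 8w + 3w²) / (8(1 − w)). The function name, the starting value, the tolerance and the random seed are all hypothetical.

```python
import math
import random

def fit_circular_regression(theta, phi, beta0=1.0, tol=1e-10, max_iter=500):
    """Alternate the closed-form alpha update with a Newton step for beta."""
    n = len(theta)
    beta = beta0
    alpha = 0.0
    for _ in range(max_iter):
        s = sum(math.sin(p - beta * t) for t, p in zip(theta, phi))
        c = sum(math.cos(p - beta * t) for t, p in zip(theta, phi))
        alpha = math.atan2(s, c) % (2 * math.pi)  # quadrant-aware, in [0, 2*pi)
        num = sum(t * math.sin(p - alpha - beta * t) for t, p in zip(theta, phi))
        den = sum(t * t * math.cos(p - alpha - beta * t) for t, p in zip(theta, phi))
        step = num / den
        beta += step
        if abs(step) < tol:
            break
    # Mean cosine of the residuals, then the assumed approximation of A^{-1}.
    w = sum(math.cos(p - alpha - beta * t) for t, p in zip(theta, phi)) / n
    kappa = float("inf") if w >= 1.0 else (9 - 8 * w + 3 * w * w) / (8 * (1 - w))
    return alpha, beta, kappa

# Simulated data with alpha = 0, beta = 1 and errors from VM(0, 30).
rng = random.Random(1)
theta = [rng.vonmisesvariate(math.pi / 4, 1.5) for _ in range(30)]
phi = [(t + rng.vonmisesvariate(0.0, 30.0)) % (2 * math.pi) for t in theta]
alpha_hat, beta_hat, kappa_hat = fit_circular_regression(theta, phi)
print(alpha_hat, beta_hat, kappa_hat)
```

An estimate of α just below 2π is equivalent to a small negative angle, so values near 0 and near 2π both indicate α close to zero. The iteration is only a local search, so a starting value β̂0 near the true slope is assumed.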
3. THE DEVELOPMENT OF SPOKE PLOT
Two circular variables (θ, ϕ) may be represented in a Spoke plot consisting of two concentric, non-intersecting circles of any chosen radii. To illustrate the idea, suppose we have a pair of observations (45°, 90°) for the variables θ and ϕ respectively. In the Spoke plot, this pair is represented by a line connecting the point θ = 45° on the inner circle to the point ϕ = 90° on the outer circle, as shown in Figure 1. The choice of which variable is placed on which circle is arbitrary; here θ is plotted on the inner circle and ϕ on the outer circle.

Figure 1. Plot of a line connecting two points (θ, ϕ) from the inner circle (45°) to the outer circle (90°).

Based on this idea, a Spoke plot is developed that allows us to look at the pattern of any two circular variables. Given a set of data of size n in which the coordinates (θi, ϕi), i = 1, 2, …, n, are approximately equal (i.e. θi ≈ ϕi (mod 2π)) or linearly related, the n lines formed by connecting points on the inner circle to points on the outer circle result in a spoke-like image. This pattern suggests a linear association between the two circular variables. As an illustration, we generated values of θ and ϕ from the von Mises distribution with mean π and concentration parameter 2 for both θ and ϕ. A sample of the simulated data, measured in degrees, is given in Table 1.

Table 1. Sample of circular data (in degrees).
θ     3.56  104.69   72.28  104.51  321.22   93.11  348.57   63.62  152.67   13.88
ϕ   110.15   85.40   86.24   51.55  223.60   19.96  100.48   16.98   44.51    0.90

The Spoke plot may be used to represent the pairs of points graphically. A program called "spoke(spoke_data)" is developed, with output as shown in Figure 2. The Spoke plot in Figure 2 offers an alternative, diagrammatic way to look at the relationship between two circular variables θ and ϕ. For comparison, a scatter plot of the kind normally used in linear analysis is shown in Figure 3. The scatter plot, however, can be misleading because of the wrap-around property of circular data, and the Spoke plot appears to be the better alternative.
Figure 2. A sample of Spoke plot for variables θ and ϕ.
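The geometric construction of the Spoke plot can be sketched in a few lines of Python. The sketch only computes the end points of each spoke; drawing them is left to any plotting library. The radii (1 and 2), the counter-clockwise angle convention measured from the positive x-axis, and the function name `spoke_segments` are assumptions made for illustration and are not taken from the authors' MATLAB program.

```python
import math

def spoke_segments(theta_deg, phi_deg, r_inner=1.0, r_outer=2.0):
    """Return one ((x1, y1), (x2, y2)) segment per observation pair."""
    segments = []
    for t, p in zip(theta_deg, phi_deg):
        t_rad, p_rad = math.radians(t), math.radians(p)
        inner = (r_inner * math.cos(t_rad), r_inner * math.sin(t_rad))  # theta on inner circle
        outer = (r_outer * math.cos(p_rad), r_outer * math.sin(p_rad))  # phi on outer circle
        segments.append((inner, outer))
    return segments

# The single pair used in Figure 1: theta = 45 degrees, phi = 90 degrees.
(inner, outer), = spoke_segments([45.0], [90.0])
print(inner)  # point on the inner circle
print(outer)  # point on the outer circle
```

When θi and ϕi are nearly equal, each segment points almost radially outward, which is what produces the spoke-like image.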
3.1 Spoke plot: Correlation coefficient
Another application of the Spoke plot is the graphical representation of correlation and linear association. For illustration, a dataset called "spoke_data" of size n = 30 is generated from the simple circular regression model ϕ = α + βθ + ε (mod 2π). Without loss of generality, α = 0 and β = 1 are chosen. The variable θ is generated from VM(π/4, 1.5) and ε from VM(0, 30). The correlation coefficient ρ̂Τ is used to measure the linear association between the two circular variables, using a subprogram called "corr(spoke_data)".

The subprogram is designed so that the first output gives a graphical view of the relationship between the two circular variables. This is followed by a numerical measure that describes the strength of linearity. As an illustration, the function call "spoke(spoke_data)" is executed, followed by the function call "corr(spoke_data)". In the output shown in Figure 4, the Spoke plot strongly suggests the presence of a linear association between the two circular variables. This finding is further supported by the numerical correlation value of 0.9555, shown at the bottom of the plot.
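A simulation of this kind can be imitated with the Python standard library, whose `random.vonmisesvariate(mu, kappa)` draws von Mises angles. The sketch below is not the authors' "corr" subprogram, and its seed and function name are hypothetical, so the value it prints will differ from the 0.9555 reported for "spoke_data". It assumes the Fisher-Lee form of the coefficient and evaluates it through an algebraically equivalent expression that needs only eight sums rather than a loop over all pairs.

```python
import math
import random

def circular_correlation_sums(theta, phi):
    """Circular correlation via 4(AB - CD) / sqrt((n^2-E^2-F^2)(n^2-G^2-H^2))."""
    n = len(theta)
    A = sum(math.cos(t) * math.cos(p) for t, p in zip(theta, phi))
    B = sum(math.sin(t) * math.sin(p) for t, p in zip(theta, phi))
    C = sum(math.cos(t) * math.sin(p) for t, p in zip(theta, phi))
    D = sum(math.sin(t) * math.cos(p) for t, p in zip(theta, phi))
    E = sum(math.cos(2 * t) for t in theta)
    F = sum(math.sin(2 * t) for t in theta)
    G = sum(math.cos(2 * p) for p in phi)
    H = sum(math.sin(2 * p) for p in phi)
    return 4 * (A * B - C * D) / math.sqrt((n * n - E * E - F * F) * (n * n - G * G - H * H))

rng = random.Random(2010)   # arbitrary seed, for reproducibility only
n = 30
theta = [rng.vonmisesvariate(math.pi / 4, 1.5) for _ in range(n)]
# phi = alpha + beta * theta + epsilon (mod 2*pi) with alpha = 0 and beta = 1.
phi = [(t + rng.vonmisesvariate(0.0, 30.0)) % (2 * math.pi) for t in theta]
rho = circular_correlation_sums(theta, phi)
print(round(rho, 4))
```

With an error concentration as high as 30, the simulated ϕ stays close to θ, so a coefficient near 1 is to be expected.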
Figure 3. A sample of Scatter plot for variables θ and ϕ.
Figure 4. The Spoke plot and correlation coefficient output of “spoke_data”.
3.2 Spoke plot: linear relationship using simple circular regression
Once a linear association is established from the graphical Spoke plot and supported by the numerical measure ρ̂Τ, the next step of the analysis is to estimate the parameters α and β of the linear relationship ϕ = α + βθ + ε (mod 2π). Another subprogram called "alpha_regress(spoke_data)" is written to give the estimated values of α and β using maximum likelihood estimation. Using the same dataset, we found the estimates to be α = 6.2792 radians and β = 0.9981, as shown in Figure 5. These estimated values are very close to the true values used in the simulation, since 6.2792 is equivalent to −0.0040 (mod 2π), that is, an angle close to α = 0.
Figure 5. The Spoke plot shows linear association of α and β.

Another application of the Spoke plot is to visually assess the concentration parameter κ of the data. A subprogram called "kappa_spoke(spoke_data)" is written to estimate and display the concentration parameters of θ and ϕ, denoted κθ and κϕ respectively. In summary, the four commands given below produce a comprehensive Spoke plot output, as shown in Figure 6.
>> spoke(spoke_data)
>> corr(spoke_data)
>> alpha_regress(spoke_data)
>> kappa_spoke(spoke_data)
The program and subprograms to obtainSpoke plot for circular variables are availableat URL: http://asasi.um.edu.my/download/Spoke_plot_program.doc.
Figure 6. A sample of comprehensive Spoke plot.
4. APPLICATION OF SPOKE PLOT
As an illustration, the Spoke plot is applied to two different real datasets, described below:
(i) wind direction data recorded at 850 hectopascals (hPa) and 1,000 hectopascals (hPa) at 12:00 a.m. at Bayan Lepas Airport, Malaysia, in July and August 2005, with the objective of comparing two sets of wind direction data at two different pressures.
(ii) wind direction data from the Holderness Coastline, on the Humberside coast of the North Sea, United Kingdom, measured in October 1994 by HF radar and an anchored wave buoy. The dataset has 49 measurements recorded over a period of 22.7 days.

The results are shown in Figures 7 and 8 respectively. In the Spoke plot of data set (i) in Figure 7, a number of lines cross the inner ring, which suggests that there is no correlation between the variables. In support of this finding, the calculated correlation value is 0.1316, which indicates a very weak correlation. The estimated linear relationship, however, suggests a strong one-to-one linear relationship between the two circular variables. The concentration parameter estimates show a small concentration for both θ and ϕ.
Figure 7. Spoke plot of wind direction data recorded at 850 hPa and 1,000 hPa, from Bayan Lepas Airport, Malaysia, in July and August 2005.
Figure 8. Spoke plot of wind direction measured by HF radar and anchored wave buoy, Holderness Coastline, Humberside coast of the North Sea, U.K., in October 1994.
In the Spoke plot of data set (ii) in Figure 8, none of the lines crosses the inner ring, which indicates a strong correlation between the variables. In support of this finding, the calculated correlation value is 0.8680. The linear association measure shows a strong one-to-one linear relationship between the two circular variables. Further, the concentration parameter estimates show a small concentration for both θ and ϕ.
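The visual rule used above, whether a line crosses the inner ring, can be made quantitative under one geometric assumption: that each spoke is a straight segment from the point at angle θ on a circle of radius r_inner to the point at angle ϕ on a circle of radius r_outer. Such a segment enters the inner disc exactly when it starts out pointing inward, which happens when cos(ϕ − θ) < r_inner / r_outer. The Python sketch below applies this criterion; the radii and function names are hypothetical, and the threshold depends on the radii chosen for the plot.

```python
import math

def crosses_inner_circle(theta, phi, r_inner=1.0, r_outer=2.0):
    """True if the spoke from (r_inner, theta) to (r_outer, phi) cuts the inner disc."""
    return math.cos(phi - theta) < r_inner / r_outer

def crossing_fraction(theta, phi, r_inner=1.0, r_outer=2.0):
    """Share of spokes that cross the inner circle (angles in radians)."""
    hits = sum(crosses_inner_circle(t, p, r_inner, r_outer) for t, p in zip(theta, phi))
    return hits / len(theta)

# With radii 1 and 2, a spoke crosses when the two directions differ by more than 60 degrees.
print(crosses_inner_circle(math.radians(45), math.radians(90)))   # False (45 degrees apart)
print(crosses_inner_circle(math.radians(10), math.radians(200)))  # True (170 degrees apart)
```

A crossing fraction near zero corresponds to the pattern described for Figure 8, and a large fraction to the pattern described for Figure 7.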
5. DISCUSSION
In this paper, a pragmatic approach to the exploratory data analysis of circular variables is developed. The novelty of the methodology is that it combines a visual method with a numerical approach. Under the name Spoke plot, the analysis focuses on identifying the relationship between two circular variables. Together, the Spoke plot, the correlation measure, the linear relationship and the concentration parameter values provide a comprehensive exploratory data analysis. In summary, the method of analysis developed in this study has great potential to be developed into comprehensive statistical software dedicated to circular variables. With a proper interface, options such as a graphical user interface (GUI), point-and-click windows and multiple windows could make the software as sophisticated as those designed for linear data.
REFERENCES
[1] Mardia K.V., Statistics of Directional Data, London: Academic Press Inc, 1972.
[2] Batschelet E., Circular Statistics in Biology, London: Academic Press Inc, 1981.
[3] Fisher N.I., Statistical Analysis of Circular Data, Cambridge University Press, 1993.
[4] Mardia K.V. and Jupp P.E., Directional Statistics, London: Academic Press Inc, 2000.
[5] Jammalamadaka S.R., Topics in Circular Statistics, World Scientific Publishing Co. Pte. Ltd, 2001.
[6] Abu Zaid A.H.M., Mohamed I. and Hussin A.G., A new test of discordancy in circular data. Communications in Statistics – Simulation and Computation, 2009; 38: 682-691.
[7] Caires S. and Wyatt L.R., A linear functional relationship model for circular data with an application to the assessment of ocean wave measurements. Journal of Agricultural, Biological, and Environmental Statistics, 2003; 8: 153-169.
[8] Hussin A.G., Fieller N.R.J. and Stillman E.C., Linear regression for circular variables with application to directional data. Journal of Applied Science and Technology, 2004; 9: 1-6.