perhitungan manual getis ord.PDF

Embed Size (px)

Citation preview

  • 8/13/2019 perhitungan manual getis ord.PDF

    1/18

    Arthur GetisJ . K . Ord

    The Analysis of Spatial Association by Use ofDistance Statistics

    Introduced in th is paper is a fam ily of statistics G, that can be used as a measureof spatial association in a n um be r of circumstances. T he basic statistic i s derivedits properties are identijied and its advantages explained. Several of the G statis-tics make it possible to evaluate the spatial association of a variable within aspecijied distance of a single point. A comparison is made between a general Gstatistic and Morans I f o r similar hypothetical and empirical conditions. Th eempirical work includes studies of sudden infant death syndrome b y county inNorth Carolina and dwelling unit prices in metropolitan San D iego by zip-codedistricts. Results indicate that G statistics should be used in conjunction with I inord er to identijiy characteristics of patterns no t revealed by the I statistic aloneand specijically the Gi and GT statistics enable us to detect local pockets ofdependence t ha t ma y not show u p wh en using global statistics.INTRODUCTION

    The importance of examining spatial series for spatial correlation and autocor-relation is undeniable. Both Anselin and Griffith (1988) and Arbia (1989) haveshown that failure to take necessary steps to account for or avoid spatial autocor-relation can lead to serious errors in model interpretation. In spatial modeling,researchers must not only account for dependence structure and spatial hetero-skedasticity, they must also assess the effects of spatial scale. In the last twentyyears a number of instruments for testing for and measuring spatial autocorrelationhave appeared. To geographers, the best-known statistics are Morans1 and, to alesser extent, Gearys c (Cliff and Ord 1973). To geologists and remote sensinganalysts, the semi-variance is most popular (Davis 1986). To spatial econometri-cians, estimating spatial autocorrelation coefficients of regression equations is theusual approach (Anselin 1988).

    The authors wish to thank the referees for their perceptive comments on an earlier draft, which

    Arthur Getis is professor of geography at Sun Diego State University.1 K . Ord isthe David H . McKinley Professor of Business Administration in the department ofmanagement science and information systems at The Pennsylvania State University.Geographical Analysis, Vol. 24, No. 3 (July 1992) 1992 Ohio State University PressSubmit ted 9/90. Revised version accepted 4/16/91.

    led to considerable improvements in the paper.

  • 8/13/2019 perhitungan manual getis ord.PDF

    2/18

    190 I Geographical AnalysisA common feature of these procedures is that they are applied globally, that is,

    to the complete region under study. However, it is often desirable to examinepattern at a more local scale, particularly if the process is spatially nonstationary.Foster and Gorr (1986)provide an adaptive filtering method for smoothing param-eter estimates, and Cressie and Read (1989)present a modeling procedure. Theideas presented in this paper are complementary to these approaches in that wealso focus upon local effects, but from the viewpoint of testing rather thansmoothing.

    This paper introduces a family of measures of spatial association called G statis-tics. These statistics have a number of attributes that make them attractive formeasuring association in a spatially distributed variable. When used in conjunc-tion with a statistic such as Morans I, they deepen the knowledge of the processesthat give rise to spatial association, in that they enable us to detect local pocketsof dependence that may not show up when using global statistics. In this paper,we first derive the statistics Gi(d) nd G(d) , hen outline their attributes. Next,the G ( d ) statistic is compared with Morans I. Finally, there is a discussion ofempirical examples. The examples are taken from two different geographic scalesof analysis and two different sets of data. They include sudden infant death syn-drome by county in North Carolina, and house prices by zip-code district in theSan Diego metropolitan area.

    THE Gi(d)STATISTICThis statistic measures the degree of association that results from the concentra-

    tion of weighted points (or area represented by a weighted point) and all otherweighted points included within a radius of distance d from the original weightedpoint. We are given an area subdivided into n regions, i = 1 , 2 , . . . , n, whereeach region is identified with a point whose Cartesian coordinates are known.Each i has associated with it a value x (a weight) taken from a variable X. Thevariable has a natural origin and is positive. The Gi(d) tatistic developed belowallows for tests of hypotheses about the spatial concentration of the sum of x valuesassociated with thej points within d of the ith point.The statistic is

    nz W,(d)Xjx xjGi(d)= = , j not equal to i ,j = 1

    where {w,} is a symmetric one/zero spatial weight matrix with ones for all linksdefined as being within distance d of a given i ; all other links are zero includingthe link of point i to itself. The numerator is the sum of all xj within d of i but notincluding x i . The denominator is the sum of all xj not including x i .Adopting standard arguments (cf. Cliff and Ord 1973, pp. 32-33), we may fixthe value xi or the i th point and consider the set of (n - I ) random permutationsof the remaining x values at the j points. Under the null hypothesis of spatialindependence , these permutations are equally likely. That is, let X j be the randomvariable describing the value assigned to point j then

    P(Xj = x, = l / (n- 1) r z i ,and E (X j ) = 2 x,/(n - 1 )r i

  • 8/13/2019 perhitungan manual getis ord.PDF

    3/18

    Arthur Getis and J . K . Ord I 191Thus E(Gi)= Zwij(d) ( X j ) / C X jj # i j i

    = Wi/(n- 1) , (2)where Wi = Zj w,(d)

    Similarly,

    Since E(Xj)= 8 $/(n - 1)r i

    = { G xr)2 - c x;}/(n - l)(n - 2)r+i r iRecalling that t he weights a re binary

    Z8WijWik = - wij k

    and so

    wicjg Wi(Wi- 1 )2 - - ~ [ Zj x j)2 - Zj I }E(Gi)- Z j X j ) 2{ n - 1) ( n - l)(n - 2)Thus Var(Gi) = E(Gf) - E2(Gi)

    w~i(Wi- 1 )I n - l)(n - 2) (n - 1)2 .W,(n- 1 - Wi)Zj x ( n - l)(n - 2)Z j xj cj$If we set = Yi, and - YfI = Y,, ,

    (n - 1) (n - 1)W, n- 1 - Wi)(n - 1)n - 2)then Var(G,) = (3)

    As expected, Var(Gi) = 0 when W i = 0 (no neighbors within d ) , or whenWi = n - 1 (all n - 1 observations are within d ) , or when Yi2 = 0 (all n - 1observations are equal).Note that Wi, YiI, and Yi2 depend on i . Since Gi is a weighted sum of thevariable X j , and th e denominator of Gi is invariant under random permutations of

    { x j , j i } , it follows, provided WJ(n- 1) is hounded away from 0 and from 1,that the permutations distribution of Giunder H , approaches normality as n 0;cf. Hoeffding (1951) and Cliff and Ord (1973, p. 36). When d, and thus Wi, issmall, normality is lost, and when d is large enough to encompass the whole study

  • 8/13/2019 perhitungan manual getis ord.PDF

    4/18

    192 I Geographical AnalysisTABLE 1Characteristics of Gi Statistics

    j not equal to j may equal iStatistic Gt 4 G: 4Expression

    Definitions xj jY', =-(n - 1 z j xjYif =- nExpectation W J ( n- 1) Wf InVariance W, n - 1 - WJ Y ,(n - 1) (n - 2 ) Y i

    W f ( n - W f )YEn'(n - 1) (Yif)'

    area, and thus (n - 1 - Wi) s small, normality is also lost. I t is important to notethat the conditions must be satisfied separately for each point if its Gi is to beassessed via the normal approximation.Table 1 shows the characteristic equations for Gi(d)and the related statistic,Gf(d ) ,which measures association in cases where t h e j equal to i term is includedin the statistic. This implies that any concentration of the x values includes the xat i . Note that the distribution of GT(d)is evaluated under the null hypothesisthat all n random permutations are equally likely.ATTRIBUTES OF Gi STATISTICSIt is important to note that Gi is scale-invariant Y i = bXi yields the same scoresas Xi> but not location-invariant (Yi = a + X i gives different results than X i ) . Thestatistic is intended for use only for those variables that possess a natural origin.Like all other such statistics, transformations like Y i = log X i will change theresults.Gi(d)measures the concentration or lack of concentration of the sum of valuesassociated with variable X in the region under study. Gi(d) s a proportion of thesum of all xj values that are within d of i. If, for example, high-value x.s are withind of point i then Gi(d)s high. Whether the Gi(d) alue is statisticallfy significantdepends on the statistic's distribution.

    Earlier work on a form of the Gi(d) tatistic is in Getis (1984), Getis and Franklin(1987), and Getis (1991). Their work is based on the second-order approach tomap pattern analysis developed by Ripley (1977).In typical circumstances, the null hypothesis is that the set of x values within dof location i is a random sample drawn without replacement from the set of allx values. The estimated Gi(d) s computed from equation 1)using the observedxj values. Assuming that Gi(d)s approximately normally distributed, when

    Zi = {Gi(d)- E[Gi(d)]}/- (4)is positively or negatively greater than some specified level of significance, thenwe say that positive or negative spatial association obtains. A large positive Ziimplies that large values of xj (values above the mean xj) are within d of point i. Alarge negative Zi means that small values of xj are within d of point i.

  • 8/13/2019 perhitungan manual getis ord.PDF

    5/18

    Arthur Getis an d] . K.Ord I 193A special feature of this statistic is that the pattern of data points is neutralizedwhen the expectation is that all x values are the same. This is illustrated for thecase when data point densities are high in the vicinity of point i , and d is just largeenough to contain the area of the clustered points. Theoretical G,(d)values arehigh because W is high. However, only if the observed x j values in the vicinity of

    point i differ systematically from the mean is there the opportunity to identifysignificant spatial concentration of the sum of x j s . That is, as data points becomemore clustered in the vicinity of point i , the expectation of G,(d) ises, neutralizingthe effect of the dense cluster o f j values.In addition to its above meaning, the value of d can be interpreted as a distancethat incorporates specified cells in a lattice. I t is to be expected that neighboringGi will be correlated if d includes neighbors. To examine this issue, consider aregular lattice. When n is large, the denominator of each G, is almost constant soit follows that corr (Gi,Gj>= proportion of neighbors that i andj have in common.EXAMPLE

    Consider the rooks case. Cell i has no common neighbors with its four imme-diate neighbors, but two with its immediate diagonal neighbors. The numbers ofcommon neighbors are as illustrated below:0 1 0

    0 2 0 2 0l O i O l0 2 0 2 0

    0 1 0All the other cells have no common neighbors with i . Thus, the G-indices for thefour diagonal neighbors have correlations of about 0.5 with G,, four others havecorrelations of about 0.25 nd the rest are virtually uncorrelated.For more highly connected lattices (such as the queen's case) the array ofnonzero correlations stretches further, but the maximum correlation between anypair of G-indices remains about 0.5. AEXAMPLE

    m m m m m m m m m mm A A A m m B B B mm A A A m m B B B mm A A A m m B B B mm m m m m m m m m m

    Set A B = 2m, herefore? = m; = 50;A 0;B 0;put A = m l c),B = m l - c),0 5 1Using this example, the G, and Gf statistics are compared in the following table.Gi and Gf Values (queen's case; non-edge cells)

    cell Gi Z(G) G: ZG:8 8c 9 + 9c

    509 - c 9 4c49 - c 50

    5.472.43

    A, urrounded by As 5.30A, adjacent to ms + 3c 2.06

  • 8/13/2019 perhitungan manual getis ord.PDF

    6/18

    194 I Geographical Analysis. ..

    8 3c 9 3ccentral m, adjacent to As 1.89' 1.8249 50other m, adjacent to As

    8 + 2c49 1.26'

    9 + 3c50 1.21Values for Bs are the same, with negative signs attached.These values are lower bounds as c 1; they vary only slightly with c.

    We note that G i and GT are similar in this case; if the central A was replaced bya B, Z Gi)would be unchanged, whereas Z(GT)drops to 4.25. Thus, G, and GTtypically convey much the same information. AE X A M P L E

    Consider a large regular lattice for which we seek the distribution under H o forGT with W ineighbors. Let p = proportion of As = proportion of Bs and 1 - 2p= proportion of ms.Let kl, 2, k3) denote the number of As, Bs, and ms, respectively so thatkl k, + k3 = n. For large lattices, in this case, the joint distribution isapproximately tri(mu1ti-)nomialwith index Wand parameters ( p , p , 1 - 2p).

    wi + ( k , - k,)cSince GT = nclearly E(G?) = Wi/n as expectedand V(G?) = 2pWi/n,reflecting the large sample approximation. The distribution is symmetric and thestandardized fourth moment is

    This is close to 3 provided pWi is not too small.Since we are using Gi and GT primarily in a diagnostic mode, we suggest thatWi 8 at least (that is, the queen's case), although further work is clearly neces-sary to establish cut-off values for the statistics. AA G E N E R A L G STATISTIC

    Following from these arguments, a general statistic, G(d), can be developed.The statistic is general in the sense that it is based on all pairs of values xi, i )such that i and j are within distance d of each other. No particular location i isfixed in this case. The statistic is

  • 8/13/2019 perhitungan manual getis ord.PDF

    7/18

    Arthu r Getis and J . K. Ord I 195The G-statistic is a member of the class of linear permutation statistics, first

    introduced by Pitman (1937). Such statistics were first considered in a spatialcontext by Mantel (1967) and Cliff and Ord (1973), and developed as a generalcross-product statistic by Hubert (1977 and 1979), and Hubert, Golledge, andCostanzo (1981).

    For equation (5),W = Z C wij(d) j not equal to ii = l j - 1

    so thatE[G(d)] = W/[n n l)] .

    The variance of G follows from Cliff and Ord (1973, pp. 70-71):

    where mj = Z x < , j = 1, 2, 3, 4and

    i = l

    dr)= n(n - l)(n - 2) . (n - r 1)The coefficients, B, are

    Bo = n2 - 3n + 3)S, - ns, 3w2 ;B l = - [ (n2 - n)S1 - 2nSz + 3WI ;B2 = - [ 2 n S 1 - (n + 3 ) s ~ W] ;B3 = 4(n - 1)S, - 2 (n 1)SZ + sw2 ;

    and B, = S, - S2 + W 2where S , = /z C 2 (wij + wji)' j not equal to i ,and S2 = Z (wi. + w.~) ;

    i j

    wi, = Zjw, j not equal to i ;ithus

    Var(G) = E(G2) - {W/[n(n l)]}, (7)T H E G (d) T A T IS T IC A N D M O R A N S I C O M P A R E D

    The G(d) statistic measures overall concentration or lack of concentration of allpairs of (xi, xj) such that i a n d j are within d of each other. Following equation (5 ) ,one finds G(d) by taking the sum of the multiples of each xi with all xjs within d ofall i as a proportion of the sum of all xixj. Moran's I , on the other hand, is oftenused to measure the correlation of each xi with all xjs within d of i and, therefore,

  • 8/13/2019 perhitungan manual getis ord.PDF

    8/18

    196 I Geographical Analysisis based on the degree of covariance within d of all x i . Consider K1, K 2 as constantsinvariant under random permutations. Then using summation shorthand we have

    G(d) = K , ZZ wij x i x jand 1(d)= K2 Z.C.w~ x i - X ) ( x j - 3)

    = (K 2 l K l ) G(d) - K2X Z (wi. w , ~ ) x ~K2T2WZ wv and w , ~ Z wji .J jwhere wi, =

    Since both G(d) and Z(d) can measure the association among the same set ofweighted points or areas represented by points, they may be compared. They willdiffer when the weighted sums Zwi .x iand Z W , ~ X ~iffer from ME, hat is, when the.patterns of weights are unequal. The basic hypothesis is of a random pattern ineach case. We may compare the performance of the two measures by using theirequivalent Z values of the approximate normal distribution.EXAMPLESet A + B = 2m, thereforex = m; n = 50;Let us use the lattice of Example 2. As before,A 0;B 0;put A = m(l + c), B = m(l - c),In addition, put 0 5 c 5 1.a = A - m ;B = 2m - A = m - a;

    B - m = a ;m a;j not equal to i.

    For the rooks case, W = 22 wij = 170.n wij(xi- X)(xj - Z) - 50 * 24a2 2 = o.784- 170 * 18a2Z w Z ( X i - w)2

    for all choices of a, m.Var 1)= 0.010897

    Z Z) = 7.7088 whenever A > BZZ wv x ix j - 24A2 + 24B2 + 24Am + 24Bm 74m2G = -ZZ xix j

    - 170 48c22500m2- 9A2 - 9B2- 32m2

    - 2450 - 18c2When c = 0, A = B = m, and G is a minimum.

    G = 17012450 = 0.0694 andVar(G,J = 0.0000 from equation (7)

  • 8/13/2019 perhitungan manual getis ord.PDF

    9/18

    Arthur Getis andJ . K. Ord I 197When c = 1, = 2m, = 0, and G is a maximum.

    G = 21812432 = 0.0896Var(GmX)= 0.000011855

    Z G,x) = 5.87 or any mG depends on the relative absolute magnitudes of the sample values. Note thatI is positive for any A and B, hile G values approach a maximum when the ratioof A o B or B o A becomes large. A

    EXAMPLEm m m m m m m m m mm m m m m m m m m mm m A m m m m B m mm m m m m m m m m mm m m m m m m m m m

    A, , n, Was in Examples 2 and 4.= 0, for any possibleA r m

    Z(Z) = 0.1920 ince E(Z) = - l/ n l , whenever A > BG = G = 0.0694,or any possibleA r m

    Var(Gmi,) = 0, but Var(G,,) = 0.00000059Z(Gmx) = 0.0739

    Neither statistic can differentiate between a random pattern and one with littlespatial variation. Contributions to G(d) are large only when the product x i x j islarge, whereas contributions to Z(d)are large when xi- m) xj m) s large. Itshould be noted that the distribution is nowhere near normal in this case. AEXAMPLE

    m m m m m m m m m mm A B A m m B A B mm B A B m m A B A mm A B A m m B A B mm m m m m m m m m m

    A , n, W as in the above examples.= -0.7843

    Var(Z) = 0.010897Z(Z) = -7.3177

    When A = 2m and B = 0 ,G = 0.0502

    Var(G) = 0.00001189Z(G) = -5.5760

  • 8/13/2019 perhitungan manual getis ord.PDF

    10/18

    198 I Geographical Analysis

    Standard Normal Variates for G ( d )and Z(d)under Varying Circumstances for a Specified d ValueSituation ZG) z(J)

    HHHMMM 0 0

    Random 0 0HL _ML - -LL

    - __

    Key:HH = pattern of high values of xs within d of other high r valuesM = moderate valuesL = low valuesRandom = no discernible pattern of xs+ = strong positive association (high positive Z scores)

    = moderate positive association0 = no associationThis combination tends to be more negative than HL.

    The juxtaposition of high values next to lows provides the high negative covari-ance needed for the strong negative spatial autocorrelation Z Z), but it is themultiplicative effect of high values near lows that has the negative effect on Z(G).A

    Table 2 gives some idea of the values of Z(G)and Z(Z) under various circum-stances. The differences result from each statistics structure. As shown in theexamples above, if high values within d of other high values dominate the pattern,then the summation of the products of neighboring values is high, with resultinghigh positive Z(G)values. If low values within d of low values dominate, then thesum of the product of the xs is low resulting in strong negative Z(G)values. In theMorans case, both when high values are within d of other high values and lowvalues are within d of other low values, positive covariance is high, with resultinghigh Z(Z) values.GENERAL DISCUSSION

    Any test for spatial association should use both types of statistics. Sums ofproducts and covariances are two different aspects of pattern. Both reflect th edependence structure in spatial patterns. The Z(d) statistic has its peculiar weak-ness in not being able to discriminate between patterns that have high valuesdominant within d or low values dominant. Both statistics have difficulty discern-ing a random pattern from one in which there is little deviation from the mean.If a study requires that Z(d) or G(d)values be traced over time, there areadvantages to using both statistics to explore the processes thought to be respon-sible for changes in association among regions. If data values increase or decreaseat the same rate, that is, if they increase or decrease in proportion to their alreadyexisting size, Morans changes while G(d) emains the same. On the other hand,if all x values increase or decrease by the same amount, G ( d )changes but Z(d)remains the same.

    It must be remembered that G(d) s based on a variable that is positive and hasa natural origin. Thus, for example, it is inappropriate to use G ( d ) o study resid-uals from regression. Also, for both Z d) and G(d)one must recognize that trans-formations of the variable X result in different values for the test statistic. As has

  • 8/13/2019 perhitungan manual getis ord.PDF

    11/18

    Arthur Geti s a n d J . K . O r d I 199been mentioned above, conditions may arise when d is so small or large that testsbased on the normal approximation are inappropriate.

    E M P I R I C A L E X A M P L E SThe following examples of the use of G statistics were selected based on sizeand type of spatial units, size of the x values, and subject matter. The first is a

    problem concerning the rate of sudden infant death syndrome by county in NorthCarolina, and th e second is a study of the mean price of housing units sold by zip-code district in the San Diego metropolitan region. In both cases the data areexplained, hypotheses made clear, and G ( d )and Z(d) values calculated for com-parable circumstances.1 . Sudden Infant Death Syndrome (SIDS)by Cou nty in North Carolina

    SIDS is the sudden death of an infant one year old or less that is unexpectedand inexplicable after a postmortem examination (Cressie and Chan 1989). Thedata presented by Cressie and Chan were collected from a variety of sources citedin the article. Among other data, the authors give the number of SIDs by countyfo,r the period 197S1984, the number of births for the same period, and thecoordinates of the counties. We use as our data the number of SIDs as a proportionof births multiplied by 1000 (see Figure 1). Since no viral or other causes havebeen given for SIDS, one should not expect any spatial association in the data. Tosome extent, high or low rates may be dependent on the health care infantsreceive. The rates may correlate with variables such as income or the availabilityof physicians services. In this study we shall not expect any spatial association.

    Table 3 gives the values for the standard normal variate of I and G for variousdistances.Results using the G statistic verify the hypothesis that there is no discernibleassociation among counties with regard to SIDS rates. The values of Z(G) are lessthan one. In addition, there seems to be no smooth pattern of Z values as dincreases. The Z(Z) results are somewhat contradictory, however. Although noneare statistically significant at the .05 level, Z(Z) values from 30 to 50 miles, aboutthe distance from the center of each county to the cente r of its contiguous neigh-boring counties, are well over one. This represents a tendency toward positivespatial autocorrelation at those distances. Taking the two results together, oneshould be cautious before concluding that a spatial association exists for SIDS

    hs per 1000Births

    FIG.1. Sudden Infant Death Rates for Counties of North Carolina, 197S1984

  • 8/13/2019 perhitungan manual getis ord.PDF

    12/18

    200 I Geogra phical Ana lysisTABLE 3Spatial Association among Counties: SIDS Rates by County in North Carolina, 1979-1984

    d i n miles Z G Z 010 0.82 - .5520 0.29 0.9930 -0.12 1.6833 0.40 1.8440 -0.04 1.3250 0.60 1.2060 -0.36 0.4870 -0.28 -0.4580 -0.19 -0.1390 0.11 -0.19100 0.30 0.18

    +Atall distances of this length or longer each county is linked to at least one other county.

    TABLE 4Highest Positive and Negative Standard Normal Variates by County forZGf(d) and Z G , ( d ) :SIDS Rates in North Carolina, 197 3-1984 (d = 33miles)

    Countv ZGf(d) Countv ZGdd)Highest Positive

    Richmond 3.34 Richmond +3.62Scotland +2.78 Hoke +1.78Cleveland 1.78 Moore 1.39

    Highest Negative

    Robeson 3.12 Robeson +3.09Hoke +2.12 Northampton +1.44Washington -2.63 Washington -2.18Dare -1.84 Davie -1.92Davie -1.76 Dare -1.70Cherokee -1.55 Bertie -1.64Tyrrell -1.53 Stokes -1.58

    among counties in North Carolina. Perhaps more light can be shed on the issueby using the G , ( d )and G T ( d )statistics.Table 4 and Figure 2 give the results of an analysis based on the G i ( d )andG T ( d )statistics for a d of thirty-three miles. This represents the distance to thefurthest first-nearest neighbor county of any county.The G T ( d )statistic identifies five of the one hundred counties of North Carolinaas significantly positively or negatively associated with their neighboring counties(at the .05 level). Four of these, clustered in the central south portion of the state,display values greater than 1.96, while one county, Washington near AlbemarleSound, has a Z value of less than - .96(see Figure 2). Taking into account valuesgreater than +1.15 (the 87.5 percentile), it is clear that several small clusters inaddition to the main cluster are widely dispersed in the southern part of the state.The main cluster of valves less than - 1.15 (the 12.5 percentile) is in the easternpart of the state. It is interesting to note that many of the counties in this clusterare in the sparsely populated swamp lands surrounding the Albemarle and Pam-lico Sounds. If overall error is fixed at 0.05and a Bonferroni correction is applied,the cutoff value for each county is raised to about 3.50. However, such a figure isunduly conservative given the small numbers of neighbors.

  • 8/13/2019 perhitungan manual getis ord.PDF

    13/18

    Arthur Getis andJ . K . Ord I 201

    Em

  • 8/13/2019 perhitungan manual getis ord.PDF

    14/18

    202 I Geographical Analysis

    FIG.3. San Diego House Prices, Septem ber 1989.

    and Figure 4 provide evidence that two coastal districts are positively associatedat the .05 level of significance while eight central and south central districts arenegatively associated a t the .05 level. There is a strong tendency for the negativevalues to be higher. It is for this reason that the Z(G) values given above are sodecidedly negative. The districts with high values along the coast have fewer nearneighbors with similar values than do the central city lower value districts. The

  • 8/13/2019 perhitungan manual getis ord.PDF

    15/18

    Arthur Getis and J . K . Ord I 203TABLE 5Spatial Association among Zip Code Dis tricts: Dwelling Unit Prices in San Diego County,September 1989

    di n miles Z ( G ) ZU2 -0.67 0.334 -2.36 2.365 -2.32 4.136 -2.47 4.168 -2.80 3.51

    10 -2.66 3.5712 - .20 3.5314 - .34 3.9216 -2.54 4.2718 -2 .30 3.5720 -2.25 2.92

    At all distances of this length or longer each district is connected to at least one other district.

    TABLE 6Highest Positive and Negative Standard Normal Variates by Zip Code District forGT(d) and Gi d ) :Dwelling Unit Prices in San Diego County, September 1989 d = 5 miles)

    Neighborhood Z G t ( d ) Neighborhood ZCiW

    CardigSolana BeachPoint LomaLa JollaDel Mar

    East San DiegoEast San DiegoEast San DiegoNorth ParkMission Valley

    Highest Positive+2.27 Cardiff+2.02 Solana Beach+ 1.93 Mira Mesa

    1.89 Ocean Beach1.55 R. Penasquitos- .22 East San Diego-2.74 East San Diego-2.64 North Park

    .56 East San Diego-2.38 College

    Highest Negative

    +2.081.811.561.37+1.33

    -2.99-2.54-2.48-2.48

    .19

    cluster of districts with negative Z(GT) values dominates the pattern. The adjustedBonferroni cutoff is about 3.27, but again is overly conservative.CONCLUSIONS

    The G statistics provide researchers with a straightforward way to assess thedegree of spatial association at various levels of spatial refinement in an entiresample or in relation to a single observation. When used in conjunction withMorans I or some other measure of spatial autocorrelation, they enable us todeepen our understanding of spatial series. One of the G statistics useful features,that of neutralizing the spatial distribution of the data points, allows for the devel-opment of hypotheses where the pattern of data points will not bias results.

    When G statistics are contrasted with Morans I, it becomes clear that t he twostatistics measure different things. Fortunately, both statistics are evaluated usingnormal theory so that a set of standard normal variates taken from tests using eachtype of statistic are easily compared and evaluated.

  • 8/13/2019 perhitungan manual getis ord.PDF

    16/18

  • 8/13/2019 perhitungan manual getis ord.PDF

    17/18

    Arthur Getis a n d J . K . Ord I 205APPENDIXSan Diego County Average House Prices for September 1989by Zip-Code District

    Principal CoordinatesNeighborhood (miles) Priceie Name x Y (in thousands)010203040506070809101112131415161718192021

    920249200792075920149212792129921289206492131921269203792122921179210992110921119212392124921209211992071

    EncinitasCardiffSolana BeachDel MarLake HodgesR. PenasquitosR. BemardoPowayScripps RanchMira MesaLa JollaUniversity CityClairemontBeachesBay ParkKe amy MesaMission VillageTierrasan aDel CerroSan CarlosSantee22 92040 Lakeside23 92021 El Cajon24 92020 El Cajon25 92041 La Mesa26 92115 College27 92116 Kensington28 92108 Mission Valley29 92103 Hillcrest30 92104 North Park31 92105 East San Diego32 92045 Lemon Grove

    33 92077 Spring Valley34 92035 Jamul35 92002 Bonita36 92139 Paradise Hills37 92050 National City38 92113 Logan Heights39 92102 East San Diego40 92101 Downtown41 92107 Ocean Beach42 92106 Point Loma43 92118 Coronado44 92010 Chula Vista45 92011 Chula Vista46 92032 Imperial Beach47 92154 Otay Mesa48 92114 East San DiegoSource of Data: Los Angeles Times, October 29, 1989, page K15.

    1235101215171383664681013141720

    39363432.143235322928222320181519192018192223 2424 1922 1718 1614 1611 169 168 14111317

    2024171613111283371517111515

    1414131312898101212141210641211

    26426026130926.5..194191236270162398201192249152138131221187182124147~15115016913819289225152111137150291297117998488175229338374165184164126126

  • 8/13/2019 perhitungan manual getis ord.PDF

    18/18