4
314 Network Data Structures Nrrwonr Dnrn Srnucrunrs Network data structures for geographic information science(GISci) are methods for storing network data setsin a computer in order to support a range of net- work analysis procedures. Network data sets are among the most common in GISci and include trans- portation networks (e.g., road or railroads), utility networks (e.g., electricity, water, and cable networks), and commodity networks (e.g., oil and gas pipelines), among many others. Network data structures must store the edge and vertex featuresthat populate these network data sets, the attributesof those features, and, most important, the topological relationships among the features. The choice of a network data structure can significantly influence one's ability to analyzethe processes that take place acrossnetworks. This entry describes the mathematical basis for network data structures and reviews several major types of network data structures as they havebeen implementedin geo- graphic information systems (GIS). Graph Theoretic Basis of NetworkDataStructures The mathematical subdiscipline that underliesnetwork data structures is termed graph theory. Any graph or network (the terms are used interchangeablyin this context) consistsof connectedsets of edges and ver- tices. Edgesmay also be referred to as lines or arcs, and vertices may be termed junctions, points, or nodes. Within graph theory there are methods for measuring and comparing graphs and principles for proving the properties of individual graphsor classes of graphs. Graph theory is not concerned with the shape of the featuresthat constitute a network, but rather with the topological properties of those networks. The topo- logical invariants of a graph are those properties that are not altered by elastic deformations (such as a stretching or twisting). Therefore, properties such as connectivity, adjacency, and incidence are topological invariants of networks, since they will not change even if the network is deformed by a cartographic process. The permanenceof these properties allows them to serveas a basisfor describing,measuring, and analyzing networks. Graph theoretic descriptions of networks can include statements of the number of featuresin the net- work, the degreeof the vertices of the graph (where the degree of a vertexis the number of edges incident to it), or the number of cyclesin a graph.Descriptions of net- works can also be based on structuralcharacteristics of graphs, which allow them to be groupedinto idealized types. Perhaps the most familiar type is tree networks, which have edge "branches" incident to nodes,but no cycles are created by the connections among those nodes. River networks are nearly always modeled as tree networks. Another common idealizedgraphtype is the Manhanan network, which is made up of edges intersecting at right angles. This creates a series of rec- tangular "blocks" that approximate the street networks common in many U.S. cities. Other idealized types include bipartite graphsand hub-and-spoke networks. If one wishes to quantitatively measureproperties of graphsrather than simply describethem, there is a set of network indices for that purpose. The simplest of these is the Beta index, which measures the con- nectivity of a graph by comparing the number of edges to the number of vertices. A more connected graph will have a larger Beta index ratio, since rela- tively more edges are connecting the vertices. The Alpha and Gamma indices of connectivity compare proven properties of graphs with observedproperties. The Alpha index compares the maximum possible number of fundamental cycles in the graph to the actual number of fundamental cycles in the graph. Similarly, the Gamma index compares the maximum possible number of edges in a graph to the actual number of edgesin a graph. In each case,as the latter measure approachesthe former, the graph is more completely connected. Other measures exist for applied instances of networks and consequently depend on nontopological properties of the network. The reader is directed to textbooks on the topic of graph theory for a more comprehensive review of theseand other more advanced techniques. lmplementations of Network Data Structures in GtS No nto po I og I ca I D ata Structu res While the graph theoretic definition of a network remains constant, the ways in which networks are structuredin computer systems have changed dramat- ically over the history of GISci. The earliest com- puter-based systems for automated cartographystored network edges as independentrecords in a database. Each record contained a starting and ending point for

locationscience.gmu.edulocationscience.gmu.edu/People/Curtin/NetworkDataStructures2.pdf · Created Date: 8/23/2008 7:56:51 PM

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: locationscience.gmu.edulocationscience.gmu.edu/People/Curtin/NetworkDataStructures2.pdf · Created Date: 8/23/2008 7:56:51 PM

314 Network Data Structures

Nrrwonr Dnrn Srnucrunrs

Network data structures for geographic informationscience (GISci) are methods for storing network datasets in a computer in order to support a range of net-work analysis procedures. Network data sets areamong the most common in GISci and include trans-portation networks (e.g., road or railroads), utilitynetworks (e.g., electricity, water, and cable networks),and commodity networks (e.g., oil and gas pipelines),among many others. Network data structures muststore the edge and vertex features that populate thesenetwork data sets, the attributes of those features, and,most important, the topological relationships amongthe features. The choice of a network data structurecan significantly influence one's ability to analyze theprocesses that take place across networks. This entrydescribes the mathematical basis for network datastructures and reviews several major types of networkdata structures as they have been implemented in geo-graphic information systems (GIS).

Graph Theoretic Basisof Network Data Structures

The mathematical subdiscipline that underlies networkdata structures is termed graph theory. Any graph ornetwork (the terms are used interchangeably in thiscontext) consists of connected sets of edges and ver-tices. Edges may also be referred to as lines or arcs, andvertices may be termed junctions, points, or nodes.Within graph theory there are methods for measuringand comparing graphs and principles for proving theproperties of individual graphs or classes of graphs.

Graph theory is not concerned with the shape of thefeatures that constitute a network, but rather with thetopological properties of those networks. The topo-logical invariants of a graph are those properties thatare not altered by elastic deformations (such as astretching or twisting). Therefore, properties such asconnectivity, adjacency, and incidence are topologicalinvariants of networks, since they will not changeeven if the network is deformed by a cartographicprocess. The permanence of these properties allowsthem to serve as a basis for describing, measuring, andanalyzing networks.

Graph theoretic descriptions of networks caninclude statements of the number of features in the net-work, the degree of the vertices of the graph (where the

degree of a vertex is the number of edges incident to it),or the number of cycles in a graph. Descriptions of net-works can also be based on structural characteristics ofgraphs, which allow them to be grouped into idealizedtypes. Perhaps the most familiar type is tree networks,which have edge "branches" incident to nodes, but nocycles are created by the connections among thosenodes. River networks are nearly always modeled astree networks. Another common idealized graph type isthe Manhanan network, which is made up of edgesintersecting at right angles. This creates a series of rec-tangular "blocks" that approximate the street networkscommon in many U.S. cities. Other idealized typesinclude bipartite graphs and hub-and-spoke networks.

If one wishes to quantitatively measure propertiesof graphs rather than simply describe them, there is aset of network indices for that purpose. The simplestof these is the Beta index, which measures the con-nectivity of a graph by comparing the number ofedges to the number of vertices. A more connectedgraph will have a larger Beta index ratio, since rela-tively more edges are connecting the vertices. TheAlpha and Gamma indices of connectivity compareproven properties of graphs with observed properties.The Alpha index compares the maximum possiblenumber of fundamental cycles in the graph to theactual number of fundamental cycles in the graph.Similarly, the Gamma index compares the maximumpossible number of edges in a graph to the actualnumber of edges in a graph. In each case, as the lattermeasure approaches the former, the graph is morecompletely connected. Other measures exist forapplied instances of networks and consequentlydepend on nontopological properties of the network.The reader is directed to textbooks on the topicof graph theory for a more comprehensive review ofthese and other more advanced techniques.

lmplementations ofNetwork Data Structures in GtS

No nto po I og I ca I D ata Structu resWhile the graph theoretic definition of a network

remains constant, the ways in which networks arestructured in computer systems have changed dramat-ically over the history of GISci. The earliest com-puter-based systems for automated cartography storednetwork edges as independent records in a database.Each record contained a starting and ending point for

Page 2: locationscience.gmu.edulocationscience.gmu.edu/People/Curtin/NetworkDataStructures2.pdf · Created Date: 8/23/2008 7:56:51 PM

eJrucruls slecl ecueprcul-lBnc I eJn8ldrllrA\ aJnpnJls srgl dursn srsaFrre {JoA\Fueuos elelduoc ot elqrssod sr lg q8noqqy'suorlcunJ ,fuenb pu? Suqrpe eruosuoddns ol JepJo ul .,fg eqJ uo,, sdlqsuoq-elar qcns elnduoc pq1 pedolerrep ueeqe^rr'eq slool pezqerceds euos'em1cn-4s elepsH] ul peJols ,tprcldxe lou eJe sdlqsuorl-u1er 1ec6o1odo1 q8noqlly 1cedser 1erpur ,{lqerrupe suuo;red emlcn-ls eqt pue'sernpe; clqdurSoeS;o sles e8rq;o Suuep-uer clqderEoguc prder roJ &\oIIe o1 ,(1peur-ud peu8rsep s?^r elgedeqs eql '(ruSg)

elnlBsul qcJBeseu sruels,(5 T4ueuruoJr^ugeW ,(q pedolenep emlcnqs ulup pcr3o1-odoluou e st a1{ador4s eql lueserd eqlot os peureuer s?q puB s066I plul ew ulrepdod ,(1eue4xe eueceq emlcnqs eppluulJBA e lnq 'slc ruselsuleru uI peuop-uuqe ,(lequesse eJeln slepolu epp pcr3o1-odoluou 'seEeluelpeslp eserp ol en61

'secDJeA pue seEpe go ,{lvr4cauuoceql go e8pel,trou4 errnbar seJnseeru crleJo-eqt qder8 crseq tsotu eqt ue^g 'srs,(1euu

{Jo&ueu JoJ sselesn ,{11er1uesse tueqlse)l?ru seJnlcruls €l?p eseql w uoD?ru-ro;ul 1ec6o1odo1 go {cBI eql 'ereq uors-sncsrp eql JoJ luuuodun 1so141 'fu,t. euuseqt ,{lesrcerd ur pezrlrElp tou eru seEpee1ec11dnp eJer{A\ 'sJoile JeArIs ol sp?elorunl uI 'srq; 'sernleeg puo8,(1od Jo sarlu

Jo ̂\erl J?Inqel pue crqder8 u seprlord 1 ern8rg '(reqto

qile ot luecelpe ere suoE,{1od qcqm) seql Suoppuu (reqlo r;Jee ot luecefpe eJe sepou qcq,tr) sepouueeA$eq uorlerruo.Jur 1ecr8o1odo1;o erqdec eql ol sJEeJaruappu! f)n(I

'ertrtrn4s ?tep (SWIC) Surpocugxrrl?I tr ecueprcul-lenc eq] go lueudole^ep eqlol enp sempnDs ?lep SID ur slcrulsuoc yecr8olodol

Jo uorsnlJul erD JoJ elqrsuodser ,tpreurud sr neeJngsnsueJ 'S'O eql 'ereq,resle pelueuncop ilel\ ueeqs?q sv 'srs,(pue

{Jolrueu Surpnlcul 'suorlcuqt 519

,(ueur ro3 lueruele lueuodur uu sr sergedord 1ecr3o1-odot;o e8pelinouq leql uorlruSoceJ p?oJq sr eJeql

sppow qteo Mttaotodot

'srs,(1uuu {Jolrleu JoJ eJrucn4s ?]?p lueIJIJJeuI

ue eq o1 peJeprsuoc ,(1ereue8 sr 1r 's1oo1 pezruolsnc

-punoq lueprcuroc,{Fepc4red'parnldeceq o1 se8pe e1ec11dnp ro; ,(cuepuel eq1epnlcut seJnlcruls elep pcrEolodoluou

go seEeluelp?srp eql 'seEelced eJ?aqJos Sungurppepre-relndruoc Suorue eJnlcruls elup sFIl Jo ecuul-decce epvn eql o1 pe1 eSeluBApB JoDBI srq;'sesodmdcrqderSouec rog ,(uldsrp Jo suuel ul luercgJe eru feqlpue 'EurzqrSrp

{8norq1e1ep pqeds;o emlduc eql roJuro;re1d pre,rropqSrer1s B eprloJd ,(eq1 'lueweldrur

pue puelsJepun ol ,(see ere ,(eqt rql lJ?J eql epnlJurslapotu elep pcrSolodoluou ;o se8eluunpu eql

'(lepour epp ..q1eq8eds,, eq1 s? u1(otnl,(lprnboloc) atnqcnt4s loctSo1odo;uou eql peuueteJoJeJeq] sBA\ pu€ seSpe eqt go sergadord pcr8ol-odot eqt SurpreSer uoq?urJoJur due uruluoc lou plpspJoceJ eseql'se?pe eql ur selJnc peugep 1eq1 ..s1uroded?qs,, Jo ls1 e ol pJoceJ qcee rrro4 {ur1 ? pepnlcursuoq?lueueldur euros pue 'procer qcee qlra\ pep-rcosse eq plnoc spleu elnql$y's1urod esoql ueeilqequorlJeuuoJ eql su peugep se,lr eSpe eql pue 'e8pa eql

ilnNzpI

zXcpL

zcaI

llnNqeI

XqcnilnNX

IqeMXpezMllnNepI

xz

e

9V2seJnpnJls elec )Jo/nlaN

Page 3: locationscience.gmu.edulocationscience.gmu.edu/People/Curtin/NetworkDataStructures2.pdf · Created Date: 8/23/2008 7:56:51 PM

--F

316 Network Data Structures

s *

Figure 2 Matrix-Based Network Data Structure

Table 1 Star Data Structure Vertex List

Vertex List

Vertices

Pointers

how lines and polygons are stored in this topologicaldata structure.

The DIME data structure evolved into the structureemployed for the Topologically Integrated GeographicEncoding and Referencing (TIGER) files that are stillused by the Census Bureau to delineate population tab-ulation areas. There are many advantages of the dual-incidence data structure, and the wide acceptance of thedata sfructure combined with the comprehensive natureof the TIGER files led to its status as the de facto stan-dard for vector representations in GIS. Two elements ofthis advance profoundly influenced the ability to con-duct network analysis in GIS. First, the DIME structuecaptures incidence, which is one of the primary topo-logical properties defining the structure of networks. Ascan be seen in Figure 1, all edges that are incident to agiven point can be determined with a simple databasequery. Second, many of the features captured by theCensus Bureau were streets or other transportation fea-tures. Since the Census Bureau has a mandate that cov-ers the entire United States, this meant that a nationaltransportation database was available for use in GIS,and this database was captured in a structure that couldsupport high-level network analysis.

However, the dual-incidence topological datamodel also imposes some difficult constraints on net-work analysts. The Census Bureau designed the datastructure in order to well define polygons with whichpopulations could be associated. To do so, the datamodel had to enforce planarity. Planar graphs arethose that can be drawn in such a way that no twoedges cross without a vertex at that location. Thus, atevery location where network features cross, a pointmust exist in the database. This is true regardless ofwhether or not a true intersection exists between thenetwork features, and it is most problematic whenmodeling bridges or tunnels. While network featurescertainly cross each other at bridges, there is no inci-dence between the features, and network analysisshould not permit flow between features at that point.Moreover, since planar enforcement demands thatnetwork features (such as roads) be divided at every

V

8

X

5

T

J

Null

1 1

1 2 3 4 5

s 1 0 0 1 0

T 0 0 1 0 1

x 1 1 0 0

v 0 0 1 1

Table 2 Star Data Structure Adjacency List

Adjacency List

Pointer

Adjacency

I

X

9

T

4

V

J

X

2

V

5

S

6

T

10

X11

0

Page 4: locationscience.gmu.edulocationscience.gmu.edu/People/Curtin/NetworkDataStructures2.pdf · Created Date: 8/23/2008 7:56:51 PM

lndur Jo suoneurquoc suel,uoc pue lndlno euo puuslndur;o Jeqrunu ? suq uoJneu rlJeg'suotnau lercrJrqJeJo les pelceuuocJelul ue Jo slsrsuoc {Jo^ueu IBJneu v

uolllulJao pue punoJolce8

'suorlecqdde pcld,$ ur esn Jreql ol lrreleleJ senssrpue 'Surureq

{Joa$eu s? qcns 'slcedsu pcqcerd erotuSuueprsuoc eJoJeq semlcn.r1s {Jolrr1eu Jreql Jo spadses^\erleJ prre $IJo1$eu IeJneu go u8rsep eql puqeqldacuoc crseq eql seuqlno ,{-r1ue srq; '$lJo1$eu

l?Jneu1ec€olors,tqd pruce qtra\ uorsryuoc,(ue eq o1 f1a41unsr eJeql eJeqa 'slc s€ qcns slxeluoJ ur pesn s TtowaulDrnzu urJel eqil 'sur-BJq

IBtuluE Jo rrBrunq ur suomeu

Jo $lJolvqeu uo pelepou,tlesooy slool uoqecursselc pueuoqcelep uregud en (NNil syrowpu lornau prc{rtty

s)uoMrlN lVUnrN

'o8ucq3 go ,(lsre,rrun:11 'o8ecrq3 '(tg 'oN reded qcreesey) scltsyatcototlc

Touot8at puo (tldotSoa7 4toryau uaatJaq sfuqsuor1o1ay:$lroiliau uoltoltodsuo4 {o atrucn"4g'(SSO1) '; 'f4sue;

'Aelse I

-uosrppv :y6 'Surpeeg 'tuoaqt qdotg '(ZSe} g ',{rereg'reTleQ

IecrBW :{ro1 ^\eN '('po puZ) s4dot7 puo s4toqau rc{stutltttoS1o uottozlwltdo '(ZOe)'g 6€{erurhtr

ry ''g '1 'sueirg

'llpH acrluerd :fN te^rg e1ppe5 reddn'(SS-w'dd) swalsKs uotlouttotut nqdotSoa? {o ttolsgt1

aqJ'(pg) ueuserod ' \ ; uI 'uonngr4uoc s.nperng

snsueJ eql :dgDII puu ,Go1odo1 '(SeOt) 'd 'q 'a{ooJ

sOu!psad Jaqilnl

fSolodo;:UgCttisrs,(puy pqudg luo4ulueserdeg isrs,{puy {ro^4,leN

:(919) sruelsfs uoqeuuoJul crqderSoeg :(lcSlC) ecuercsuoq€uuoJul cqderEoeg :serntcruts upq l8urlepo6

elzq ip4edg 'eseqepq iu8rseq es€qele6l :snsueJ osry, aas

uqtnJ'w ultax

'|oSID pu? SIC WI/v\peler8elur flSurseercur secuelp? eseql ees o1 lcedxeueJ euo 'seugdrcsrp

Jo los esJelrp e o1 elqecqddueJ? seJnlJnJls {Jol\lau xeldruoc ,(tqBU leql uoq-ruSocer eql pue 's>po,lleu cnueu,(p Jo uo4cnpolureql 'sernlcruls ?lep pelueuo-lcefqo lo lueudolelap

eql ueas seq lsud ]uecer eql '8u!.rncco .{llunuquocerc lcslc JoJ seJnlcruls ulup ryo^ueu ur seJu?Apv

soJnlcnJls Pleo )Jon leNJo aJnlnl eql

'epou uenr8 e uro4 scJB JoJ Surqcruesuo puedep leql suqlrJoSle lro,lleu ,(ueur ro3 eJnlcn-usluerolJJe lsou eql eq ol uelord osp seq eJnlJnqssrql'uoq?uuoJur snoeue4xe 8uuo1s lnoqtrl\ punoJ aqu?c uorlBrruo;ur ,{cuecelpe 'sr(eue oA.] eseql ruoJg

'(Z elqel ees) lsil [.cuacofpo eq] pue (1 e1qel ees)$n xauat eql Jo slsrsuoc 7 ern8rg ur qder8 eql JoJ ernl-onos elep JBls eql 'secruel eql Jo qc?a ro; sercuacefpe

;o ?ur4s snonuquoc e sploq ls{ puoces eql'ls1 puo-ces e ol relurod e qlyv\ secruel eql Jo ls1 B sr lsJU erf,L'slsu 01$ uo paseq sr eJnlcruls El?p Jsls eql'elqere;erdeq feru 'em1cn-4s BlBp JBls eql se qcns 'seJnlcnrls

B1ep peseq-lsrl 'sesuc aseql ul 'uorlerruoJur pcrSoyodol

Jo lunotue lptus e ernldec o1 eceds eSerots Jo pepteet? e e4nber ,(eur xupru eq1 '(secrgel eql SuqJeu-uoc se8pe me; ,{1enrp1er) esreds sr ryolueu eql ueq^\'rerre,troll 'sergedord pcrSolodol {Jo^\leu ;o ,{"renbpgder pue elqrnlul JoJ ̂\olle seJnloruls ulep xulel J

'pepnord eJ€ xuluru ecueprcure8pe xegan pue xrJl€ru,(cuecelpe xeual eql'7 ern8rgur ulloqs {Jo&ueu eql JoC 'suor}elueserder xuleurqtl^\ >Iroaueu eql Jo sergedord 1ec6o1odo1 eqt erolsol elqeJeJeJd sr 11 'suorleredo

{Jolyueu fueru rog'sernpecord srs,{puu

{Jo1y\leu 3o uo4eredo luercrJJe eJor.u 1t\oll? 13ql seJnl-Jruls BlBp luugodrur lsotu sdeqred pue 'sernleeJ

{Jol\-1eu Suop puu uee,{\leq Sur.n,oru ueq^\ peJatunoruesecwpedur eql Iapou o] JepJo ur se8pa uo slurBIS-uoJ IsuorlseJrp pu? surnt uoddns 13111 seJnlcrulsulep 'sryomleu plJol\-leer Iepou fllecqsquer erourol JepJo ur sluerueJrnber ,{1ueueld xeleJ leql seJnlcrulselep reuelduou epnlcur eseql 'sppoa Drlp 4roilJauatnd 1o lueudolelep eqr pelslrsseceu eleq seJnt-cruls €lep SIC uoururoc Sursn ueq,l pesodur srs,(pue

{Jotqeu tuJoJ:ed o1 ,ttpqu eql uo suorlelrul aql

sppow sleo ryoMleN end

'senlen elnqlr1l€ peu8lsse ere sernpag e1dr1-Fru eseql ueql\ eseqelep eql ur sJoJJe e8urnocue uecpue JeAo seurr] dueu ezrs eseqelep eql eseeJour rrBJuoqrleder srgl 'eJnlcruls el?p eql ur spJoceJ Jo saues? se pelueseJder eq tsruu eJnleeJ e18urs

" su pesn pue

pe,\Iecred ,(luoruuroc eq ,(eru ]?q] peoJ u 'uoqceslelul

L'2sIJOrnleN leJneN

Kevin M. Curtin
Cross-Out