Path Length

1

Path Length 1) Introduction In calculating the average path length, we must find the shortest path from a source node to all other nodes contained within the graph. Previously, we found that by using an inefficient algorithm, experimentally calculating the path length of a graph can be time consuming. Since then, I have looked further into the necessary algorithms for solving this problem. 2) Verification of the Algorithm and Code

Algorithm The algorithm I used to solve the single source shortest path problem within the polymeric gel was a breadth first search. A breadth first search begins at a starting node and explores all of the neighboring nodes. Then for each of those nearest nodes, it explores their unexplored neighbors until it finds the goal.

For every node within the graph do {

Initialize the distances to all other vertices as -1 (not computed), Initialize the queue to null Store s (start node) in a queue Set the distance to s to be 0 in the Distance Table.

While there are vertices in the queue {

Read a vertex v from the queue For all adjacent vertices w {

If distance to w is -1 (not computed) do { Make distance to w equal to (distance to v) + 1 Add w to the queue

The algorithm can be designed to generate a distribution of path lengths between nodes. This is done by tracking the following process within a given timestep: Starting from a given node the algorithm will keep count how many nodes are immediate neighbors to this starting node. The path length between these immediate neighbors and the starting node is one. This is then done for second nearest neighbors. The path length between the second nearest neighbors and the starting node is two. This process continues until all reachable neighbors are visited.

2

With the polymeric gel, it was necessary to repeat this algorithm N (the number of aggregates within a timestep) times, starting from each node within a given timestep. It is well-known that within our gel network, all aggregates are not necessarily connected. Rattlers and a disconnected graph resulting in a giant component are two examples of this situation. To account for the shortest paths between all of the aggregates it is necessary to repeat this algorithm, starting from each node within the network. By repeating this algorithm, starting from each aggregate and continually updating shorter path lengths for a given timestep, the discontinuous nature of the polymeric gel network can be accounted for. From here a distribution is created by counting the number of each of the specific path lengths within each timestep. ErdsRnyi Random Graph In an effort to validate the accuracy of the FORTRAN code, I compared experimental results of an ErdsRnyi random graph to the calculated values of well-known formula. Starting with N disconnected nodes, ErdsRnyi Random graphs are generated by connecting couples of randomly selected nodes, prohibiting multiple connections, until the number of edges equals K (S. Boccaletti et al./ Physics Reports 424 (2006) 175-308). Connections between randomly chosen nodes were made with the exception of loops. Any connection resulting in a loop was not allowed. Using this definition of an E.R random graph, I created networks that contain the same number of nodes and links as our gel at that given temperature. I looked at two experimental values and three calculated values for each temperature. The experimental values were gathered using the FORTRAN code of the breadth first search algorithm. The first was the average path length as calculated from the probability distribution of path lengths. I calculated the probability distribution by dividing the path length distribution by the sum of all the path lengths (this includes any disconnections),

=i

i

ii L

LkP )(.

From here the average path length is defined as =

NkkP )(l . [1]

In the following tables of data, this value is labeled Experimental 1.

3

The second method of experimentally calculating the average path length is defined as follows

= ji jiDNN , ,)1(1l . [2]

In an unweighted graph, jiD , is the shortest distance between node i and node j . This definition assumes that 0, =jiD if node i cannot be reached by node j or if ji = . N in this definition is the total number of nodes who have connections. In the following tables of data, this value is labeled Experimental 2. I compared these two methods of experimentally gathering average path lengths to calculated values for a random E.R. graph using formula found in M. Newman et al. / Phys. Rev. E 64 026118 (2001). In each case mz is the average number of neighbors at distance m . The first is as follows ( )( )[ ]

( )1221

2112

lnln1ln

zzzzzzN +=l , [3]

and will be labeled Calculated 1 in the following table of data (Table 1). In the special circumstance where the following two conditions hold,

1zN >> 12 zz >> Eq.[3] reduces to

( )( ) 1ln

ln

12

1 +=zzzNl . [4]

This value will be labeled Calculated 2. In the special case of an E.R random graph, for which kz =1 and 22 kz = , Eq.[4] reduces to the following

kN

ln)ln(=l

[5]

(S. Boccaletti et al./ Physics Reports 424 (2006) 175-308). In the following tables of data, this value will be labeled Calculated 3.

4

There are a couple of considerations to take into account. In the creation of the E.R. random networks we started with the same number of nodes and links as the gel at each given temperature. Importantly, when we randomly choose to make connections between N nodes, not all of the N nodes will be selected. There will be some nodes that do not have connections to others. This results in a network with k links, but the total number of connected nodes is less than the number of desired nodes. This fact is important when comparing calculated values to experimental results. I am assuming that based upon the definition of an E.R. random graph, according to S. Boccaletti et al., the value for N must include those nodes that do not have connections. A second consideration is in the fact that the created E.R random graph might be disconnected. The formula used to calculate average path lengths assumes that all nodes are reachable from any randomly chosen starting node. As stated in M. Newman et al. / Phys. Rev. E 64 026118 (2001), in general this will not be true and Eq. [4] is meaningless. A better approximation to l may therefore be given by replacing N in Eq.[4] by NS , where S is the fraction of the graph occupied by the giant component. Therefore, I made this approximation. I averaged the largest component of the random graph per timestep and included this factor in each of the three calculated values.

Figures 1 and 2 contain plots of average path length verses temperature for the gel and random graphs.

0.5 1 1.5 2Temperature

0

5

10

15

20

Ave

rage

Pat

h Le

ngth

Gel - Exp 1Random - Exp 1Random - Calc 1Random - Calc 2Random - Calc 3

Path Length

Figure 1 contains a graph of the average path length data verses temperature for the polymeric gel and E.R.

random graphs.

5

0.3 0.4 0.5 0.6 0.7 0.8 0.9Temperature

2

4

6

8

10A

vera

ge P

ath

Leng

thGel - Exp 1Random - Exp 1Random - Calc 1Random - Calc 2Random - Calc 3

Path Length

Figure 2 contains a graph of the average path length data verses temperature for the polymeric gel and E.R.

random graphs. This is the same data as in figure 1, just a closer view.

Tables 1 and 2 contain the average path length data (experimental and calculated) for the random graph and the polymeric gel.

6

Table 1

E.R. RANDOM MATRIX PATH LENGTHN = 2000

Temperature 2.00 1.80 1.50 1.20 1.00 0.90 0.80 0.70 0.60 0.50 0.45 0.40 0.35 0.30

Cluster Count (approx) 1413 1335 1250 1117 977 883 762 606 427 261 206 176 161 142links 971 966 960 948 934 921 904 877 836 790 772 747 724 734

# of Nodes 1053.1 1020 981.8 910.7 833.4 773 691.2 573.4 418.5 260.5 205.8 175.9 161 142

Ratio - Giant Componentto Total Cluster Count 0.65511 0.71167 0.77297 0.84781 0.90557 0.93105 0.9634 0.98884 0.99689 1 1 1 1 1

Average Z1 1.84598 1.89608 1.95763 2.08411 2.24382 2.38551 2.61863 3.06243 4 6.07294 7.51215 8.504832 9.00621 10.3521Average Z2 2.5348 2.75686 3.00387 3.52454 4.27094 4.96455 6.19878 8.76317 15.2793 33.557 47.1817 56.30699 59.6199 69.2113

1.84598 1.89608 1.95763 2.08411 2.24382 2.38551 2.61863 3.06243 4 6.07294 7.51215 8.504832 9.00621 10.3521

^2 3.40764 3.59511 3.83231 4.34352 5.03473 5.69066 6.85725 9.37851 16 36.8806 56.4324 72.33217 81.1118 107.166

Experimental valuesExperimental 1

1 Average Path Length 6.63186 7.10304 7.71975 7.95787 7.7241 7.28176 6.72078 5.79176 4.49197 3.26783 2.83802 2.62216 2.53295 2.35146

Experimental 22 Average Path Length 6.63839 7.11042 7.72812 7.96762 7.73413 7.29168 6.7307 5.80216 4.50281 3.28044 2.85189 2.63716 2.54878 2.36814

Calculated valuesCalculated 1

1 Average Path Length 16.5118 14.5079 13.0273 10.9471 9.13104 8.08003 6.90495 5.61081 4.25684 3.08397 2.70885 2.517848 2.44093 2.29601

Calculated 22 Average Path Length 20.607 17.6088 15.4842 12.6461 10.2862 8.9716 7.54056 6.01875 4.48258 3.19999 2.80211 2.602939 2.5256 2.37825

Calculated 33 Average Path Length 11.1426 10.7169 10.2323 9.33257 8.39577 7.72001 6.85466 5.71444 4.3668 3.08482 2.64211 2.415398 2.31192 2.12042

7

Table 2

POLYMERIC GEL PATH LENGTHN = 2000

Temperature 2.00 1.80 1.50 1.20 1.00 0.90 0.80 0.70 0.60 0.55 0.50 0.45 0.40 0.35 0.30

Cluster Count (approx) 1413 1335 1250 1117 977 883 762 609 429 338 261 206 176 161 142links 971 966 960 948 934 921 904 877 836 815 790 772 747 724 734

# of Nodes 1413.39 1335.94 1250.81 1117.64 977.685 883.087 762.985 609.691 429.014 338.657 261.948 206.97 176.1839 161.984 144.111

Ratio - Giant Componentto Total Cluster Count 0.0345 0.0677 0.1652 0.46671 0.66513 0.74723 0.82145 0.88617 0.93782 0.95747 0.97467 0.98735 0.995831 0.99874 0.99259

Average Z1 1.36991 1.43859 1.52291 1.67481 1.86827 2.02433 2.26189 2.66008 3.36427 3.91926 4.62495 5.35033 5.770649 5.96497 6.19191Average Z2 1.17664 1.47748 1.88046 2.70186 3.89262 4.94847 6.69323 9.75338 14.8364 17.9091 20.9416 23.409 24.95407 24.7829 25.5831

1.37713 1.44645 1.53145 1.6843 1.87857 2.03531 2.27322 2.67195 3.37635 3.93092 4.63552 5.35828 5.775521 5.96762 6.33488^2 1.89648 2.09223 2.34535 2.83688 3.52903 4.14248 5.16751 7.13932 11.3998 15.4521 21.488 28.7112 33.35664 35.6125 40.1307

Experimental valuesExperimental 1

1 Average Path Length 0.02007 0.07875 0.47734 2.95585 4.43444 4.77851 4.8884 4.76399 4.45435 4.26854 4.08551 3.91669 3.774252 3.73232 3.47408

Experimental 22 Average Path Length 0.02009 0.07883 0.48037 2.95965 4.44104 4.78668 4.89813 4.77602 4.46983 4.28665 4.10622 3.93957 3.797657 3.75663 3.50048

3) Discussion For the random graph, as seen in table 1, the experimentally gathered path lengths ([1] and [2]) are in close agreement. Yet the difference in the calculated values increases with temperature. It was expected that the calculation of the path length using formula [3], [4], and [5] would be more consistent with the experimental results. But, we can see in figure 1 that this is not the case. Calculated 3 Eq.[5] seems to be the closest to the experimental results for a random graph at all temperatures. However, the two conditions of 1zN >> and 12 zz >> are not met for all temperatures. I feel that the second condition is not met for temperatures greater than 0.6. Due to this, I should expect that the calculated value (using Eq.[5]) should start to deviate from experimental starting at T = 0.6. As seen in figure 2, Random - calc 3, Eq.[5] seems to hold consistent with experimental results up to T = 0.8. Since Eq.[4] and [5] have been stated as the result of reducing Eq.[3], while imposing special conditions, I have assumed that Eq.[3] should hold true for the random graph at all temperatures and without these two special conditions. Yet, this calculation is only second best to the experimental results. If I were to change the value of N by using only the number of connected nodes, this would result in a lower calculated value in all three cases. However, the calculated values would still deviate at higher temperatures.

8

To check if the FORTRAN code was functioning as desired, I have built two small networks of 20 nodes. Twice, I manually drew the connections between nodes and verified that the resulting shortest path lengths and distributions are correct. In this work, if a starting node does not have a path to another, its shortest path length (zero) is not counted. Earlier, which at this point I dont remember the details, you informed me of how to deal with these disconnections while taking the inverse. Could you refresh my memory on those details?

Documents

Path Length