
Shortest Path Efficiency Analysis - Logic Programming

Suraj Nair

September 6, 2015

Abstract

Generally, this paper aims to study different implementations of shortest path algorithms and determine the various benefits of each implementation. Specifically, in this report we examine implementations of Dijkstra's algorithm for undirected and directed weighted graphs through logic programming as well as through standard graph theory. We aim to show that the logic programming implementation, while using less memory, is slower and has fewer capabilities than the standard graph theory implementation. Due to the availability of memory on modern systems, the memory benefits of the logic programming implementation are not nearly as valuable as the speed and range of capabilities of the graph theory implementation, and we conclude that for most applications the graph theory implementation is superior.

Method

To find these results, we generated random graphs of a user-specified size. For each pair of nodes in the graph there is a 50% chance that an edge exists between those nodes, and if such an edge does exist, it is given a random weight between 0 and 50. Then, using implementations of Dijkstra's algorithm in Java and in Prolog, we solve for the shortest paths from the first node to every other node, validate that the answers are correct, and collect data on the time and space usage of each implementation.
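For concreteness, the following is a minimal sketch of how such a random graph could be generated in Java. The RandomGraph class name, the symmetric weight-matrix representation, and the use of -1 as a "no edge" marker are illustrative assumptions, not the exact code used in the benchmark.

import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the random graph generation described above (assumed
// representation: a symmetric weight matrix where -1 means "no edge").
public class RandomGraph {
    public static int[][] generate(int numNodes, long seed) {
        Random rng = new Random(seed);
        int[][] weights = new int[numNodes][numNodes];
        for (int[] row : weights) {
            Arrays.fill(row, -1);                    // -1 marks "no edge"
        }
        for (int i = 0; i < numNodes; i++) {
            for (int j = i + 1; j < numNodes; j++) {
                if (rng.nextBoolean()) {             // 50% chance of an edge
                    int w = rng.nextInt(51);         // random weight between 0 and 50
                    weights[i][j] = w;
                    weights[j][i] = w;               // undirected case
                }
            }
        }
        return weights;
    }
}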

Dataset

We begin with a dataset with the following structure:

## 'data.frame': 51 obs. of 9 variables:
##  $ Number.of.Nodes : int 10 100 100 100 100 100 500 500 500 500 ...
##  $ logic_cpu_start : int 208 235 319 400 440 482 722 2474 4172 5873 ...
##  $ logic_cpu_end   : int 209 255 341 419 463 507 2277 3960 5661 7370 ...
##  $ logic_wall_start: int 703557 826379 1070444 1363390 1435559 1508727 1761770 2001670 2239261 2521066 ...
##  $ logic_wall_end  : int 703568 826435 1070516 1363456 1435633 1508812 1763513 2003329 2240907 2522705 ...
##  $ graph_cpu       : int 20 105 104 103 104 107 579 460 546 545 ...
##  $ graph_wall      : int 26 114 112 112 112 116 594 477 562 560 ...
##  $ logic_mem       : int 10560 725192 842376 691888 699448 679016 16565232 14614016 80080 16664696 ...
##  $ graph_mem       : int 309344 23051368 23222264 22769592 23289224 23091200 1009160792 437351400 900070896 727160256 ...

After processing the data and making some of the columns more concise, we end up with a table with the following fields, representing memory usage, wall time, and CPU time for each implementation at each graph size:

## [1] "Number.of.Nodes" "logic_mem"       "graph_mem"       "logic_wall"
## [5] "graph_wall"      "logic_cpu"       "graph_cpu"


Analysis

Now that we have clean data, we can begin our analysis, starting with how memory usage scales for each implementation.

Memory

[Figure: Comparing Memory Usage. Memory usage in bytes versus number of nodes (10 to 1000) for the Graph Theory and Logic Programming implementations.]

Since the graphs are random and have varying numbers of edges, the memory usage is not perfectly aligned with the number of nodes; however, we can clearly see the difference between the two implementations. For the graph theory implementation, we see roughly linear growth with the number of nodes, which is to be expected since for each node we need to create a node object, as well as an edge object for each edge.

On the other hand, the Prolog implementation stays roughly constant, because the entire graph is represented as a set of rules and no objects are created. Thus, the logic programming implementation consistently uses less memory.
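As a rough illustration of why the object-based representation grows with the graph, the sketch below shows the kind of per-node and per-edge objects an adjacency-list implementation allocates. The Node and Edge class names are assumptions for illustration, not the exact classes in our Java code; the Prolog version presumably encodes the same information as facts rather than objects.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: an object-based graph allocates one Node object per
// vertex and one Edge object per edge, so memory grows with graph size.
// (Class names are assumptions, not the exact benchmark code.)
class Edge {
    final Node to;
    final int weight;
    Edge(Node to, int weight) { this.to = to; this.weight = weight; }
}

class Node {
    final int id;
    final List<Edge> adjacent = new ArrayList<>();
    Node(int id) { this.id = id; }
}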


[Figure: Memory usage scatterplots. Memory in bytes versus number of nodes, one panel for the Logic Programming implementation and one for the Graph Theory implementation, each with a line of best fit.]

Here we can see scatterplots of the memory usage for each graph size. This gives us a clearer picture of how the memory usage of each implementation scales. Comparing the slopes of the lines of best fit, we can see that the graph theory implementation uses approximately 23 times more memory than the logic programming implementation.


Timing

[Figure: Comparing Runtime (Wall-Time) and Comparing Runtime (CPU-Time). Time in milliseconds versus number of nodes (10 to 1000) for the Graph Theory and Logic Programming implementations.]


The above two graphs illustrate how the running time of each implementation scales with the size of the graph. Since the time spent reading in the graph is insignificant compared to the time required to compute the shortest paths, we find that the CPU time and wall time are almost identical. Furthermore, we see that as the number of nodes increases, the logic programming implementation becomes increasingly slower than the graph theory implementation.

[Figure: Timing scatterplots. Time in milliseconds versus number of nodes, with panels for Logic Programming wall time, Graph Theory wall time, Logic Programming CPU time, and Graph Theory CPU time, each with a line of best fit.]

Here we can see scatterplots of the timing for each graph size. This allows us to compare the speed difference between the two implementations more precisely. Comparing the slopes of the lines of best fit, we can see that the logic programming implementation uses approximately 8.27 times more wall time than the graph theory implementation, and 8.6 times more CPU time.

Upon closer inspection of each method, it becomes clear that the reason for the speed difference is that the graph theory implementation utilizes a binary heap, while the Prolog implementation finds the new closest node to the start by doing a breadth-first search from the start node to find the closest unassigned node, then assigns it as found. Therefore, if we have a graph of N nodes, then in the worst case we have to explore approximately (N-1) + (N-2) + ... + 1 = N(N-1)/2 paths. However, this situation can only occur if every node is connected to every other node, and one direct path through all the nodes is weighted substantially less than all of the other paths. In practice this method of finding the closest unassigned node is generally fast, and since the number of paths which need to be explored is the sum of the number of adjacent unassigned nodes over each assigned node, most graphs in real-world applications will not require searching too many edges to find the closest node.
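To make the cost of this selection step concrete, here is a minimal Java sketch of what the search effectively amounts to, assuming an array dist[] of tentative distances and a boolean assigned[] flag per node. It is an approximation of the idea, not a transcription of the Prolog code.

// Illustrative sketch (not the actual Prolog code): choosing the closest
// unassigned node by scanning every node's tentative distance. Each call
// costs O(N), so the selection work alone is O(N^2) over N iterations.
static int closestUnassigned(long[] dist, boolean[] assigned) {
    int best = -1;
    long bestDist = Long.MAX_VALUE;
    for (int v = 0; v < dist.length; v++) {
        if (!assigned[v] && dist[v] < bestDist) {
            bestDist = dist[v];
            best = v;
        }
    }
    return best;  // -1 if every remaining node is unreachable
}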

Unlike the Logic Programming implementation, the Graph Theory implementation uses a binary heap which is represented as an array. Thus all heap operations take either logarithmic or constant time. Furthermore, it is a stable heap, so it supports changing the priority of a key within the heap directly. Ultimately, this makes the Graph Theory implementation faster, especially for larger graphs, as is evident from the previously displayed timing data. Additionally, it explains the scaling difference we see in the graphs, where the Graph Theory implementation scales at a linearithmic rate, while the Logic Programming implementation scales at an approximately quadratic rate.
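A sketch of how this fits into Dijkstra's algorithm is shown below. It uses the IndexMinPQ indexed binary heap from the algs4 library that accompanies the Sedgewick and Wayne text cited in the references; the weight-matrix representation (with -1 meaning "no edge") is an assumption for illustration, and the code is a reconstruction of the approach described, not the exact benchmark code.

import java.util.Arrays;
import edu.princeton.cs.algs4.IndexMinPQ;

// Sketch of Dijkstra's algorithm with an indexed binary heap supporting
// decrease-key. Assumes a weight matrix w where w[u][v] < 0 means "no edge".
public class DijkstraSketch {
    public static long[] shortestPaths(int[][] w, int source) {
        int n = w.length;
        long[] dist = new long[n];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;

        IndexMinPQ<Long> pq = new IndexMinPQ<>(n);
        pq.insert(source, 0L);
        while (!pq.isEmpty()) {
            int u = pq.delMin();                       // closest unassigned node, O(log n)
            for (int v = 0; v < n; v++) {
                if (w[u][v] < 0) continue;             // no edge from u to v
                long candidate = dist[u] + w[u][v];
                if (candidate < dist[v]) {             // relax edge (u, v)
                    dist[v] = candidate;
                    if (pq.contains(v)) {
                        pq.decreaseKey(v, candidate);  // adjust priority in place
                    } else {
                        pq.insert(v, candidate);
                    }
                }
            }
        }
        return dist;                                   // Long.MAX_VALUE marks unreachable nodes
    }
}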


Implementing A Binary Heap in Prolog

To determine whether the Logic Programming implementation can be optimized to operate at a speed close to that of the Graph Theory implementation, we attempted to implement a binary heap in Prolog. However, since Prolog stores a heap through a linked list rather than an array, the predicates for modifying the heap are less efficient. In fact, the documentation specifically states that the delete-from-heap rule is extremely inefficient. Below we can see the average amount of time it takes to retrieve the closest node in Prolog with and without the heap.

[Figure: Average time in seconds to retrieve the closest node in Prolog, with and without the heap, for graphs of 100, 300, and 500 nodes.]

Additionally, the heap is unstable, so to change the shortest path to a node, one needs to delete the priority-key pair and add it back with the new priority. Using a heap with Prolog therefore not only has a less efficient call to get the smallest value, it also makes that call more often. Therefore, we use the implementation without the heap.

Real World Applications

Now let us examine the difference between these implementations when applied to a real-world example. We will be using the Origin and Destination Survey data for airlines from the United States Department of Transportation database. Based on this data, we will construct a directed graph with a node for each of the 402 airports in the data and with edges corresponding to real-world flight routes. The weight of each edge will be the distance in miles between the two airports. We begin with data in the following format:

##  data.ORIGIN_AIRPORT_ID data.DEST_AIRPORT_ID data.NONSTOP_MILES
##  Min.   :10135          Min.   :10135         Min.   :  39
##  1st Qu.:11278          1st Qu.:11274         1st Qu.: 691
##  Median :12451          Median :12451         Median :1110
##  Mean   :12705          Mean   :12693         Mean   :1311
##  3rd Qu.:14122          3rd Qu.:14113         3rd Qu.:1741
##  Max.   :16218          Max.   :16218         Max.   :8061
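As an illustration, the following sketch shows one way these flight records could be loaded into a directed, weighted adjacency map in Java. The assumed CSV column order (origin ID, destination ID, nonstop miles) and the FlightGraphLoader class name are hypothetical; the actual experiment may have read the data differently.

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: load flight records into a directed, weighted adjacency
// map keyed by airport ID. The column order and class name are assumptions
// for illustration, not the exact code used in the experiment.
public class FlightGraphLoader {
    // origin airport ID -> (destination airport ID -> distance in miles)
    public static Map<Integer, Map<Integer, Integer>> load(String csvPath) throws IOException {
        Map<Integer, Map<Integer, Integer>> adjacency = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(csvPath))) {
            String line = reader.readLine();                 // skip header row
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                int origin = Integer.parseInt(fields[0].trim());
                int dest = Integer.parseInt(fields[1].trim());
                int miles = (int) Double.parseDouble(fields[2].trim());
                adjacency.computeIfAbsent(origin, k -> new HashMap<>())
                         .put(dest, miles);                  // directed edge weighted by distance
            }
        }
        return adjacency;
    }
}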

After creating a directed graph of flight routes from this data, we used both the logic programming and graph theory implementations of Dijkstra's algorithm to find the shortest path from a single airport to all other airports. Below one can see the time used by each implementation.


[Figure: Time in milliseconds used by the Graph Theory and Logic Programming implementations on the flight-route graph.]

Conclusions

From the data, we can see that in general the Graph Theory implementation, when implemented with a binary heap, is several times faster than the logic programming approach, and thus is the preferred choice for shortest path applications in which speed is of the greatest importance. Additionally, the graph theory implementation uses objects, which, while they do require more memory, have the extended capability of associating as many features as needed with edges and vertices. In practice this is especially important, such as in the case where an edge has multiple criteria contributing to its total cost.

While these conclusions seem straightforward enough, it is worth noting that there are certainly some situations in which it would be easier and faster to utilize the logic programming implementation. Specifically, when dealing with a knowledge base stored as an ontology or in a similar format, there is a distinct advantage to utilizing the logic programming implementation. This is because the user-defined properties and hierarchical object structure of a knowledge base stored as an ontology translate directly into a set of facts and rules for a logic program. Specifically, the Data Property assertions within an ontology relate individuals to literals and can be used as facts, while the Object Property assertions define relationships between individuals and other individuals and can be used as rules. As a result, logic programming works seamlessly with these sorts of data structures, while standard graph theory and other methods would require parsing the data, likely from an XML/RDF format, and creating a new data structure, which for large knowledge bases would take quite a bit of time. Thus, we can see that there are applications, such as information clustering applications in which we determine the similarity of concepts based on how many properties connect them, both directly and indirectly, where we may want to use the logic programming implementation of Dijkstra's algorithm.


References

Sedgewick, Robert, and Kevin Wayne. Algorithms, 4th edition. Addison-Wesley Professional, 2011. ISBN 0-321-57351-X. http://algs4.cs.princeton.edu

United States Department of Transportation. Airline Origin and Destination Survey (DB1B). http://www.transtats.bts.gov/Fields.asp?Table_ID=247
