
Introduction to Graph Theory, Kruskal algorithm

László Papp

BME

2020. 02. 17.

Introduction to Graphs

Definition: A graph is an ordered pair G = (V, E) of sets, where V is a nonempty set and E is a set of pairs made from the elements of V. The elements of V are called vertices or nodes. We say that an element of E is an edge. The numbers of vertices and edges are denoted by v(G) and e(G), respectively.

Example: V(G) = {1, 2, 3, 4}, E(G) = {{1,2}, {1,3}, {1,4}, {3,4}}

Drawing of this graph: [Figure: vertices 1, 2, 3, 4 joined by the edges {1,2}, {1,3}, {1,4}, {3,4}]
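To make the definition concrete, here is a minimal Python sketch (our own illustration, not from the slides) that stores the example graph as a pair (V, E). It assumes a simple graph, so each edge is modelled as a two-element `frozenset`; a Python set cannot hold multiple edges. The helper names `v` and `e` are chosen to mirror the notation v(G) and e(G).

```python
# A graph G = (V, E): V is a set of vertices, E a set of unordered pairs.
# frozenset({1, 2}) == frozenset({2, 1}), so edges have no orientation.
V = {1, 2, 3, 4}
E = {frozenset(pair) for pair in [(1, 2), (1, 3), (1, 4), (3, 4)]}
G = (V, E)


def v(graph):
    """Number of vertices, v(G)."""
    return len(graph[0])


def e(graph):
    """Number of edges, e(G)."""
    return len(graph[1])


# Every edge must be a pair made from elements of V.
assert all(len(edge) == 2 and edge <= V for edge in E)
print(v(G), e(G))  # prints: 4 4
```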

Drawing of a graph

We can draw a graph in the plane. In a drawing, each vertex is represented by a disc and each edge is a curve ending at its end vertices. Note that a drawing of a graph is not the same as the graph itself! A graph has many different drawings.

Example: Two different drawings of the same graph:

Loops, multiple edges

Definition: If edge e is the pair {v, w}, then we say that vertices v and w are the end vertices or end points of e. If v = w, then e is called a loop. If two different edges have the same end vertices, then they are called multiple or parallel edges. A simple graph has neither loops nor multiple edges.

Examples: V(G) = {1, 2, 3}, E(G) = {h, g, e, f}, where h = {1,1}, g = {1,2}, e = {2,3} and f = {2,3}. 1 and 2 are the end vertices of edge g = {1,2}. h = {1,1} is a loop. e and f are multiple edges. This graph is not simple!

[Figure: vertex 1 with the loop h = {1,1}; edge g = {1,2} joins 1 and 2; the parallel edges e = {2,3} and f = {2,3} join 2 and 3]

Adjacency and incidence

Definition: A vertex v and an edge e are incident if v is an end vertex of e. Edges e and f are adjacent if they have a common end vertex. Vertices u and v are adjacent if {u, v} is an edge of the graph. An isolated vertex is not incident to any edge. The number of edges incident to v is called the degree of v and is denoted by d(v).

Examples: Vertex 1 is incident to edge {1,2}. Vertices 1 and 2 are adjacent. Vertices 2 and 3 are not adjacent. Edges {1,3} and {3,4} are adjacent. Vertex 5 is an isolated vertex. d(1) = 3, d(2) = 1.

[Figure: vertices 1, 2, 3, 4, 5 with edges {1,2}, {1,3}, {1,4}, {3,4}; vertex 5 is isolated]
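The definitions of incidence, adjacency and degree translate directly into small predicates. The following is a minimal sketch under the same assumption as before (a simple graph with edges stored as frozensets); the function names are our own and hypothetical. It reproduces the examples: d(1) = 3, d(2) = 1 and vertex 5 is isolated.

```python
# The example graph: vertex 5 has no incident edge.
V = {1, 2, 3, 4, 5}
E = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (3, 4)]}


def incident(vertex, edge):
    """A vertex and an edge are incident if the vertex is an end vertex."""
    return vertex in edge


def adjacent_vertices(u, w, edges):
    """u and w are adjacent if {u, w} is an edge."""
    return frozenset((u, w)) in edges


def adjacent_edges(f, g):
    """Two different edges are adjacent if they share an end vertex."""
    return f != g and bool(f & g)


def degree(vertex, edges):
    """d(v): the number of edges incident to v (simple graphs only)."""
    return sum(1 for edge in edges if incident(vertex, edge))


def isolated_vertices(vertices, edges):
    """Vertices that are not incident to any edge."""
    return {x for x in vertices if degree(x, edges) == 0}


print(degree(1, E), degree(2, E), isolated_vertices(V, E))  # prints: 3 1 {5}
```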

Subgraphs

Definition: H is a subgraph of graph G if V(H) ⊆ V(G), E(H) ⊆ E(G) and H is a graph. We denote this relation by H ⊆ G.

Example: H is a subgraph of G, but G is not a subgraph of H.

[Figure: graph G on vertices 1, 2, 3, 4, 5 and its subgraph H on vertices 2, 3, 4, 5]

Induced subgraphs

Definition: H is an induced subgraph of graph G if H ⊆ G and E(H) contains all of the edges of G that have both endpoints in V(H).

Example: H is an induced subgraph of G.

[Figure: graph G on vertices 1, 2, 3, 4, 5 and its induced subgraph H]

Remark: Each induced subgraph of G can be obtained from G by deleting some of its vertices and the edges which are incident to those vertices.
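The remark suggests a direct construction: keep a vertex set S and every edge of G with both ends in S. Below is a minimal sketch, assuming simple graphs with frozenset edges; `is_subgraph` and `induced_subgraph` are hypothetical helper names. The last example shows a subgraph that is not induced, because it omits edges of G between its vertices.

```python
V = {1, 2, 3, 4, 5}
E = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (3, 4)]}


def is_subgraph(VH, EH, VG, EG):
    """H ⊆ G: V(H) ⊆ V(G), E(H) ⊆ E(G), and H is itself a graph."""
    return VH <= VG and EH <= EG and all(edge <= VH for edge in EH)


def induced_subgraph(VG, EG, S):
    """Keep the vertices in S and every edge of G with both ends in S."""
    S = set(S) & VG
    return S, {edge for edge in EG if edge <= S}


VH, EH = induced_subgraph(V, E, {1, 3, 4})
print(sorted(sorted(edge) for edge in EH))  # prints: [[1, 3], [1, 4], [3, 4]]

# A subgraph on the same vertices that is NOT induced: it drops {1,4} and {3,4}.
print(is_subgraph({1, 3, 4}, {frozenset({1, 3})}, V, E))  # prints: True
```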

Definition: A walk in a graph is an alternating sequence of vertices and edges $(v_0, e_1, v_1, e_2, v_2, \dots, v_{k-1}, e_k, v_k)$ such that $e_i$ is incident to the vertices $v_{i-1}$ and $v_i$ for all $i$. If $v_0 = v_k$, then we say that the walk is closed.

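[Figure: a closed walk $(v_0, e_1, v_1, \dots, e_7, v_7)$ with $v_0 = v_7$, in which $v_1$ and $v_4$ are the same vertex, $v_2$ and $v_5$ are the same vertex, and $e_2$ and $e_5$ are the same edge]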

A walk is called a trail if all of its edges are different. A path is a trail where all of the vertices are different. A cycle is a closed trail where all the vertices are different except the first and the last.

[Figure: a path $(v_0, e_1, v_1, e_2, v_2, e_3, v_3)$, and a cycle with edges $e_1, e_2, e_3, e_4$ in which $v_0 = v_4$]
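The four notions differ only in which repetitions they allow, which is easy to check mechanically. This sketch assumes a simple graph, where a walk is determined by its vertex sequence $v_0, \dots, v_k$, so only vertices are stored; the predicate names are our own.

```python
E = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (3, 4)]}


def steps(seq):
    """Edges e_1, ..., e_k traversed by the vertex sequence v_0, ..., v_k."""
    return [frozenset((a, b)) for a, b in zip(seq, seq[1:])]


def is_walk(seq, edges):
    # Consecutive vertices must be joined by an edge of the graph.
    return len(seq) >= 1 and all(step in edges for step in steps(seq))


def is_closed(seq, edges):
    return is_walk(seq, edges) and seq[0] == seq[-1]


def is_trail(seq, edges):
    # A walk whose edges are all different.
    used = steps(seq)
    return is_walk(seq, edges) and len(set(used)) == len(used)


def is_path(seq, edges):
    # A trail whose vertices are all different.
    return is_trail(seq, edges) and len(set(seq)) == len(seq)


def is_cycle(seq, edges):
    # A closed trail whose vertices are distinct except first == last.
    return (len(seq) >= 2 and is_trail(seq, edges) and seq[0] == seq[-1]
            and len(set(seq[:-1])) == len(seq) - 1)


print(is_path([2, 1, 3], E))          # prints: True
print(is_trail([2, 1, 3, 4, 1], E))   # prints: True (vertex 1 repeats)
print(is_path([2, 1, 3, 4, 1], E))    # prints: False
print(is_cycle([1, 3, 4, 1], E))      # prints: True
```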

Connectivity

Definition: A graph is connected if there is a path between any two of its vertices.

Example: This graph is not connected; it contains 3 connected components, which are circled.

[Figure: graph G on vertices 1 to 10 with its three connected components circled]

Definition: We say that K is a connected component of G if K is a connected induced subgraph of G and there is no path in G between a vertex contained in K and a vertex not contained in K.
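Connected components can be found by a breadth-first search from each not-yet-visited vertex. The sketch below is our own illustration using Python's `collections.deque`. Because a component is an induced subgraph, it returns only the vertex set of each component; the edges are then determined. We assume simple graphs with frozenset edges.

```python
from collections import deque


def components(vertices, edges):
    """Split the vertex set into connected components using BFS."""
    adj = {x: set() for x in vertices}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:  # first time we reach w
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def is_connected(vertices, edges):
    return len(components(vertices, edges)) == 1


V = {1, 2, 3, 4, 5}
E = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (3, 4)]}
print(len(components(V, E)), is_connected(V, E))  # prints: 2 False
```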

Trees and their properties

Definition: A graph is a tree if it is connected and does not contain a cycle as a subgraph.

Claim: Every tree with more than one vertex has at least two vertices of degree one.
Remark: A vertex of degree one is usually called a leaf, so the previous claim can be restated as follows: a tree with more than one vertex has at least two leaves.

Claim: The number of edges in a tree having n vertices is n − 1.
Proof: A tree having exactly one vertex does not have an edge. A tree having two vertices has one edge. We use induction on the number of vertices. Assume that the statement is true for all k < n, and now we prove it for n. Let F be an arbitrary tree having n > 2 vertices. It has a leaf. Delete this leaf together with the edge incident to it. The number of edges and the number of vertices both decrease by one, and the obtained graph is still a tree: deleting a leaf cannot create a cycle, and no path between two other vertices passes through a vertex of degree one. So it has n − 1 vertices, and by the induction hypothesis it has n − 2 edges. So F has n − 1 edges.
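Both parts of the tree definition, connected and without a cycle, can be checked in a single pass with a union-find (disjoint-set) structure. This technique is our choice and does not appear in the slides. The sketch then checks the two claims above on two small trees: a path and a star. Helper names are hypothetical.

```python
def is_tree(vertices, edges):
    """Connected and acyclic, checked with a union-find structure."""
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        a, b = tuple(edge)
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # a and b already joined: this edge closes a cycle
        parent[ra] = rb
    # Connected iff all vertices ended up in a single set.
    return len({find(x) for x in vertices}) == 1


def leaves(vertices, edges):
    """Vertices of degree one."""
    return {x for x in vertices if sum(x in edge for edge in edges) == 1}


V = {1, 2, 3, 4, 5}
path = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 5)]}
star = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (1, 5)]}
for T in (path, star):
    # Both are trees with n - 1 = 4 edges and at least two leaves.
    print(is_tree(V, T), len(T), len(leaves(V, T)))
```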

Spanning subgraphs

Definition: H is a spanning subgraph of graph G if H ⊆ G and V(H) = V(G). So H is a subgraph of G and it contains all vertices of G.
Remark: Each spanning subgraph of G can be obtained from G by deleting edges.

Example: H is a spanning subgraph of G.

[Figure: graph G on vertices 1, 2, 3, 4, 5 and its spanning subgraph H on the same vertex set]

Spanning trees

Definition: T is a spanning tree of G if it is a spanning subgraph of G and a tree.

Theorem: Every connected graph has a spanning tree.
Proof: If G is a connected graph but not a tree, then it contains a cycle. Delete an edge {u, w} of a cycle. The obtained graph is still connected, because wherever the deleted edge was used to get from u to w, we can go around the rest of the cycle instead. If the new graph is not a tree, then we repeat the previous step until no cycle remains. Since the number of edges decreases at each step, this process terminates, and the final graph is connected, contains no cycle and still has every vertex of G. So we end up with a spanning tree.
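The proof is constructive. It can be followed literally: repeatedly find an edge that lies on a cycle and delete it. An edge {u, w} lies on a cycle exactly when u can still reach w after the edge is removed. The sketch below is a direct, deliberately slow transcription of the proof, not an efficient algorithm. It assumes the input graph is connected and simple. Function names are hypothetical.

```python
def reachable(start, edges):
    """Vertices reachable from start along the given edges (DFS)."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def spanning_tree_by_deletion(edges):
    """Follow the proof: delete an edge of some cycle while one exists."""
    T = set(edges)
    while True:
        for edge in list(T):
            a, b = tuple(edge)
            # The edge lies on a cycle iff its ends stay connected without it.
            if b in reachable(a, T - {edge}):
                T.remove(edge)
                break
        else:
            return T  # no edge lies on a cycle, so T is acyclic


V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (3, 4)]}
T = spanning_tree_by_deletion(E)
print(len(T), reachable(1, T) == V)  # prints: 3 True
```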

Definition: A graph without a cycle is called a forest.

So every tree is a forest, but not every forest is a tree.

A combinatorial optimization problem:

We have six towns, and we want to build a telecommunication network such that a message can be sent between any two towns. For some reason, we do not want to join cables anywhere outside the towns. We know in advance which towns can be connected by a direct wire and how much each such wire costs.
Task: Find the cheapest connected network!

[Figure: towns A, B, C, D, E, F and the possible direct wires, each labeled with its cost]

A non-optimal solution

An optimal solution does not contain a cycle: if we delete an edge contained in a cycle, the network remains connected, and since every cost here is positive, it becomes cheaper.

[Figure: the same towns with a connected but non-optimal network highlighted]

So we are looking for a spanning tree, but not all spanning trees are good enough. The price of this network is 1 + 8 + 3 + 3 + 1 = 16.

Finding an optimal solution

[Figure: the same towns with the edges chosen by the greedy rule highlighted]

In each step we choose the cheapest edge that does not form a cycle with the edges chosen earlier. This is a greedy algorithm, since at each step we choose the locally best option. So the price of a cheapest network is 8, and we have also obtained such a network.

Minimum weight spanning tree problem

Let G be a graph and let $s : E(G) \to \mathbb{R}_{\ge 0}$ be a non-negative function on the edge set of G. This function s gives the weight (or cost) of the edges. If we take a subgraph H of G, then the weight (cost) of H is

$$\sum_{e \in E(H)} s(e).$$


Task: Find a spanning tree of G whose weight is the smallest possible!

In our example problem, we solved this task in a greedy way.

Kruskal’s algorithm

1. We sort the edges of the graph in ascending order according to their weight: $e_1, e_2, \dots, e_m$.

2. Let F be the graph containing all the vertices of G but none of its edges, and let i := 1.

3. If $E(F) \cup \{e_i\}$ does not contain a cycle, then we add $e_i$ to E(F).

4. If i < m, then we increase i by one and apply step 3 again. Otherwise we stop, and F is the output.
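The steps above can be sketched in Python as follows. The slides do not say how step 3 detects a cycle. This sketch assumes a union-find (disjoint-set) structure, which is a common choice: $e_i$ closes a cycle exactly when both of its ends already lie in the same tree of F. The six-town data is hypothetical, because the weights in the slides' figure did not survive.

```python
def kruskal(vertices, weighted_edges):
    """Return the edges (weight, u, v) chosen by Kruskal's algorithm."""
    parent = {x: x for x in vertices}  # step 2: F has no edges yet

    def find(x):
        # Representative of the tree of F containing x.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    # Step 1: ascending order of weight (stable for equal weights).
    for w, u, v in sorted(weighted_edges, key=lambda edge: edge[0]):
        ru, rv = find(u), find(v)
        # Step 3: e_i closes a cycle iff u and v are already in one tree of F.
        if ru != rv:
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen  # step 4: all m edges have been examined


# Hypothetical six-town network (not the weights from the slides' figure).
towns = "ABCDEF"
roads = [(1, "A", "B"), (3, "B", "C"), (2, "A", "C"), (4, "C", "D"),
         (5, "B", "D"), (1, "D", "E"), (6, "E", "F"), (2, "D", "F"),
         (7, "A", "F")]
tree = kruskal(towns, roads)
print(sum(w for w, _, _ in tree))  # prints: 10
```

On this data the algorithm takes A-B, D-E, A-C and D-F, skips B-C because it would close the cycle A-B-C, and then joins the two trees with C-D.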

Claim: If a graph G is connected, then Kruskal’s algorithm gives a minimum weight spanning tree of G.
Remark: We have already run this algorithm on our example.

Greedy algorithms

Definition: An algorithm is called greedy if at each choice it chooses the locally best option.

Kruskal’s algorithm is a greedy algorithm, because at each step it adds the lightest (cheapest) edge that does not create a cycle.

Remark: In general, greedy steps and greedy algorithms do not necessarily lead to optimal solutions. We will see examples of this phenomenon later.

How fast is Kruskal’s algorithm?

The time complexity of Kruskal’s algorithm is O(e log e), where e is the number of edges in the input graph. Sorting the edges according to their weight requires Θ(e log e) operations in the worst case (with comparison-based sorting). This is the dominant term, but we will not justify that here.

Questions regarding the efficiency of Kruskal’s algorithm:

- How can we encode a graph?
- Is it much better than the brute-force method?

The brute-force method: Consider each spanning tree of the graph, calculate its weight, then choose the smallest one.

Note: This is not yet an algorithm, because we have not specified how to find all the spanning trees.
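One way to make the brute-force method into an algorithm is to try every set of n − 1 edges. Such a set is a spanning tree exactly when it contains no cycle, because an acyclic graph with n vertices and n − 1 edges has a single component. The sketch below uses this approach with `itertools.combinations`. It reuses the hypothetical six-town data from the Kruskal sketch, and its helper names are our own.

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Union-find check that the (weight, u, v) edges contain no cycle."""
    parent = {x: x for x in vertices}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for _, u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def brute_force_mst(vertices, weighted_edges):
    """Try every (n-1)-edge subset; an acyclic one is a spanning tree."""
    n = len(vertices)
    best_weight, best_tree = None, None
    for subset in combinations(weighted_edges, n - 1):
        if is_acyclic(vertices, subset):
            weight = sum(w for w, _, _ in subset)
            if best_weight is None or weight < best_weight:
                best_weight, best_tree = weight, subset
    return best_weight, best_tree


# Hypothetical six-town network (not the weights from the slides' figure).
towns = "ABCDEF"
roads = [(1, "A", "B"), (3, "B", "C"), (2, "A", "C"), (4, "C", "D"),
         (5, "B", "D"), (1, "D", "E"), (6, "E", "F"), (2, "D", "F"),
         (7, "A", "F")]
print(brute_force_mst(towns, roads)[0])  # prints: 10
```

It agrees with the greedy result, but it examines every one of the $\binom{9}{5} = 126$ edge subsets, and this number explodes as the graph grows.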

How to encode (simple) graphs

Reminder: A graph G = (V, E) is an ordered pair of sets, where V is the set of vertices and E is the set of edges, containing pairs of vertices.
There are two major methods to encode a graph:
Adjacency list: For each vertex we write down the set of vertices adjacent to it.

Example: For the given graph it is:
1: 2, 3, 4;
2: 1;
3: 1, 4;
4: 1, 3;
5:

[Figure: vertices 1, 2, 3, 4, 5 with edges {1,2}, {1,3}, {1,4}, {3,4}; vertex 5 is isolated]
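Building the adjacency list from an edge list takes one pass over the edges. Each undirected edge is recorded at both of its ends. The following minimal sketch reproduces the list in the example. The function name is our own.

```python
def adjacency_list(vertices, edges):
    """Map each vertex to the sorted list of its neighbours."""
    adj = {x: [] for x in vertices}
    for u, w in edges:
        adj[u].append(w)  # each edge is stored at both of its end vertices
        adj[w].append(u)
    return {x: sorted(nbrs) for x, nbrs in adj.items()}


adj = adjacency_list([1, 2, 3, 4, 5], [(1, 2), (1, 3), (1, 4), (3, 4)])
for x, nbrs in adj.items():
    print(f"{x}: {', '.join(map(str, nbrs))}")
```

Because every edge appears twice and every vertex has its own entry, the total length is proportional to e + n.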

If the alphabet contains at least as many symbols as there are vertices (so each vertex name is a single symbol), then the size of an adjacency list is Θ(e + n), where e and n denote the number of edges and the number of vertices, respectively.


Adjacency matrix: Each vertex has a corresponding column and a row. $A_{i,j}$ equals 1 if vertices i and j are adjacent and 0 otherwise.

Example

$$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$


The size of the adjacency matrix is $n^2$.
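The matrix is built by setting two symmetric entries for each edge. This sketch assumes the vertices are numbered 1..n, as in the example, and shifts them to Python's 0-based indices. It reproduces the matrix A above and shows that its size is $n^2$ no matter how few edges there are. The function name is our own.

```python
def adjacency_matrix(n, edges):
    """n x n 0/1 matrix for vertices 1..n; entry (i, j) is 1 iff {i, j} is an edge."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # undirected: the matrix is symmetric
    return A


A = adjacency_matrix(5, [(1, 2), (1, 3), (1, 4), (3, 4)])
for row in A:
    print(*row)
print(sum(len(row) for row in A))  # prints: 25 (= n^2 entries for 4 edges)
```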

Question: Which encoding requires less space?
Answer: Usually the adjacency list.

Complete graphs

Definition: A complete graph is a simple graph where any two vertices are adjacent. The complete graph having n vertices is denoted by $K_n$.

Question: How many edges does $K_n$ have?

Answer: $\frac{n(n-1)}{2}$, because from each vertex n − 1 edges go to the other vertices. If we sum this over all vertices, then we obtain n(n − 1). We have counted each edge twice, once at each of its endpoints. To compensate for this, we divide this number by 2.
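The double-counting argument can be checked numerically for small n. The sketch lists the edges of $K_n$ with `itertools.combinations`, sums the degrees (which counts every edge once at each endpoint), and compares both quantities with the formula. The helper name `complete_graph` is our own.

```python
from itertools import combinations


def complete_graph(n):
    """Vertices 1..n and all pairs of distinct vertices as edges."""
    vertices = list(range(1, n + 1))
    return vertices, list(combinations(vertices, 2))


for n in range(1, 9):
    vertices, edges = complete_graph(n)
    # Sum of degrees: every edge is counted once at each of its endpoints.
    degree_sum = sum(sum(x in edge for edge in edges) for x in vertices)
    assert degree_sum == n * (n - 1)
    assert len(edges) == n * (n - 1) // 2
```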

Remark: Any simple graph having n vertices is a subgraph of $K_n$. Therefore it has at most $\frac{n(n-1)}{2}$ edges.

Corollary: The size of a graph’s adjacency list is in $O(n^2)$.

Time complexity of the brute-force algorithm

The brute-force algorithm checks each spanning tree. In $K_n$ there are $n^{n-2}$ spanning trees. (If you are interested in the proof, search for Cayley’s formula.)

That is a lot. The function $f(n) = n^{n-2}$ is not in $O(2^n)$; in fact, it grows faster than $c^n$ for any constant base c, that is, faster than any exponential function.
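For small n, Cayley’s formula can be confirmed by brute force. The sketch counts the acyclic (n − 1)-edge subsets of $K_n$, and each such subset is a spanning tree. This is itself an illustration of the brute-force cost. Already for n = 6 it examines $\binom{15}{5} = 3003$ subsets. Function names are our own.

```python
from itertools import combinations


def find(parent, x):
    """Root of x in a simple union-find forest."""
    while parent[x] != x:
        x = parent[x]
    return x


def count_spanning_trees(n):
    """Count spanning trees of K_n by checking every (n-1)-edge subset."""
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        acyclic = True
        for u, v in subset:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                acyclic = False  # this edge would close a cycle
                break
            parent[ru] = rv
        count += acyclic  # n-1 acyclic edges on n vertices form a spanning tree
    return count


for n in range(2, 7):
    print(n, count_spanning_trees(n), n ** (n - 2))  # e.g. "4 16 16"
```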

So the brute-force algorithm is very slow if the input graph is a complete graph, but what if it is something else?

Answer: It is too slow for most possible input graphs, so avoid it!

Remark: Checking all possible solutions, evaluating the objective function for each and picking the best one is a bad idea. It can be done in finite time, but in most cases it takes far too much time. This is the reason why we are looking for smart algorithms!

Summary for Kruskal’s algorithm

- It finds a minimum weight spanning tree.
- It runs in O(e log e) time, which is only a little more than the O(e) steps required to read the adjacency list.
- It is much faster than the brute-force algorithm.
- It is a greedy algorithm.

An application: Normal trees

Assume that we have an electric circuit with three types of components: resistors, voltage sources and current sources.

[Figure: an example circuit containing current sources (I), voltage sources (V) and resistors]

We create a graph from the electric circuit: the vertices are the equipotential surfaces and the edges are the components. A normal tree of the circuit is a spanning tree which contains all the voltage sources but none of the current sources.

The use of normal trees

We know the parameters of the electronic components: the resistance of each resistor, the voltage of each voltage source and the current of each current source. We want to determine the voltage across and the current through each component by using Kirchhoff’s circuit laws. Sometimes this cannot be done, because there are infinitely many solutions.


Claim: If the circuit does not have a normal tree, then Kirchhoff’s laws do not give a unique solution.

Finding a normal tree

We assign weights to the components by the following rule:
- Voltage source: 1
- Resistor: 3
- Current source: 5

[Figure: the example circuit with each component labeled by its weight]

We search for a minimum weight spanning tree using Kruskal’s algorithm. If it contains all the voltage sources and none of the current sources, then it is a normal tree. Otherwise, the circuit does not have a normal tree.
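The test above can be sketched as Kruskal’s algorithm with the weights 1, 3 and 5 by component type, followed by a count of the voltage and current sources in the result. This sketch assumes the circuit graph is connected and uses union-find for the cycle check. Parallel components between the same two nodes are allowed, since they are just multiple edges. The three circuits are hypothetical examples, and `has_normal_tree` is our own name.

```python
WEIGHT = {"V": 1, "R": 3, "I": 5}  # weights from the rule above


def has_normal_tree(nodes, components):
    """components: list of (kind, u, v) with kind in {"V", "R", "I"}."""
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for kind, u, v in sorted(components, key=lambda c: WEIGHT[c[0]]):
        ru, rv = find(u), find(v)
        if ru != rv:  # component does not close a cycle
            parent[ru] = rv
            tree.append((kind, u, v))
    # Count by kind so that parallel voltage sources are handled correctly.
    all_voltage = (sum(k == "V" for k, _, _ in tree)
                   == sum(k == "V" for k, _, _ in components))
    no_current = all(k != "I" for k, _, _ in tree)
    return all_voltage and no_current, tree


# Hypothetical circuits on the nodes a, b, c, d.
ok = [("V", "a", "b"), ("V", "c", "d"), ("R", "b", "c"),
      ("R", "a", "d"), ("I", "a", "c")]
voltage_loop = [("V", "a", "b"), ("V", "b", "c"), ("V", "a", "c"),
                ("R", "c", "d")]
current_cut = [("V", "a", "b"), ("R", "b", "c"), ("I", "c", "d")]
for circuit in (ok, voltage_loop, current_cut):
    print(has_normal_tree("abcd", circuit)[0])  # True, False, False
```

In `voltage_loop` the three voltage sources form a cycle, so no spanning tree can contain all of them. In `current_cut` node d is reachable only through a current source, so every spanning tree must contain it.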
