Introduction to Graphs, 15-111 Advanced Programming Concepts/Data Structures, Ananda Gunawardena


Introduction to Graphs

15-111 Advanced Programming Concepts/Data Structures

Ananda Gunawardena


An Airline route Map


Introduction
• Many real-world problems can be modeled using graphs
  – Airline route map
    • What is the fastest way to get from Pittsburgh to St. Louis?
    • What is the cheapest way to get from Pittsburgh to St. Louis?
  – Electric circuits
    • Circuit elements: transistors, resistors, capacitors
    • Is everything connected together?
      – Depends on the interconnections (wires)
    • If this circuit is built, will it work?
      – Depends on the wires and the objects they connect


Graphs
• More applications
  – Job scheduling
    • Interconnections indicate which jobs must be performed before others
    • When should each task be performed?
• All these questions can be answered using a mathematical structure called a “graph”. We will answer the questions:
  – What are graphs?
  – What are their basic properties?


Graph Definitions
• Graph
  – A set of vertices (nodes) V = {v1, v2, …, vn}
  – A set of edges (arcs) that connect the vertices, E = {e1, e2, …, em}
  – Each edge ei is a pair (v, w) where v, w ∈ V
  – |V| = number of vertices (cardinality)
  – |E| = number of edges
• Graphs can be
  – directed (the order of (v, w) matters)
  – undirected (the order of (v, w) does not matter)
• Edges can be
  – weighted (a cost is associated with each edge)
  – e.g., neural networks, an airline route map (Vanguard Airlines)
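As a minimal sketch (assuming Python, string vertex names, and a hypothetical helper make_graph), V and E can be kept as plain sets. Storing an undirected edge as an unordered pair is what makes the order of (v, w) irrelevant:

```python
def make_graph(vertices, edges, directed=True):
    """Return (V, E). Hypothetical helper for illustration only.

    For a directed graph, (v, w) and (w, v) are different edges.
    For an undirected graph they collapse into one unordered pair.
    """
    V = set(vertices)
    for v, w in edges:
        if v not in V or w not in V:
            raise ValueError(f"edge ({v}, {w}) uses a vertex not in V")
    if directed:
        E = {(v, w) for v, w in edges}               # order matters
    else:
        E = {frozenset((v, w)) for v, w in edges}    # order does not matter
    return V, E

edges = [("PIT", "CHI"), ("CHI", "PIT"), ("CHI", "STL")]

V, E = make_graph(["PIT", "STL", "CHI"], edges, directed=True)
print(len(V), len(E))   # |V| = 3, |E| = 3

V, E = make_graph(["PIT", "STL", "CHI"], edges, directed=False)
print(len(V), len(E))   # |V| = 3, |E| = 2
```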


Graph Representation
• How do we represent a graph internally? Two ways:
  – Adjacency matrix
  – Adjacency list
• Adjacency matrix
  – Use matrix entries to represent edges in the graph
• Adjacency list
  – Use an array of lists to represent edges in the graph (we will discuss this later)


Adjacency Matrix
• Adjacency matrix
  – For each edge (v, w) in E, set A[v][w] = edge_cost
  – Mark non-existent edges with logical infinity
• Cost of implementation
  – O(|V|²) time for initialization
  – O(|V|²) space
• OK for dense graphs
• Unacceptable for sparse graphs
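A minimal Python sketch of the matrix representation, assuming vertices are numbered 0..n-1 and using math.inf as the "logical infinity"; adjacency_matrix is a hypothetical helper name. Filling the n × n grid is where the O(|V|²) time and space go, no matter how few edges there are:

```python
import math

INF = math.inf  # "logical infinity": marks a non-existent edge

def adjacency_matrix(n, weighted_edges, directed=True):
    """Build an n x n cost matrix for vertices 0..n-1.

    weighted_edges is an iterable of (v, w, cost) triples.
    Initialization touches all n*n cells: O(|V|^2) time and space.
    """
    A = [[INF] * n for _ in range(n)]
    for v, w, cost in weighted_edges:
        A[v][w] = cost
        if not directed:
            A[w][v] = cost          # undirected: store both directions
    return A

A = adjacency_matrix(3, [(0, 1, 5), (1, 2, 2)])
print(A[0][1], A[1][2], A[2][0])    # 5 2 inf
```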


Adjacency List
• Adjacency list
  – Ideal solution for sparse graphs
  – For each vertex, keep a list of all adjacent vertices
  – Adjacent vertices are the vertices that are connected to the vertex directly by an edge
  – Example

(Figure: adjacency lists List 0, List 1 and List 2 for a three-vertex graph with vertices 0, 1, 2)


Adjacency List
• The number of list nodes equals the number of edges (in an undirected graph each edge appears in two lists, giving 2|E| nodes)
  – O(|E|) space
• Space is also required to store the lists
  – O(|V|) for |V| lists
• Note that the number of edges is at least ⌈|V|/2⌉
  – assuming each vertex is in some edge
  – Therefore disregard any O(|V|) term when O(|E|) is present
• An adjacency list can be constructed in linear time (with respect to the number of edges)
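A sketch of building the array of lists in one pass, assuming vertices 0..n-1 and a hypothetical adjacency_list helper. Counting the list nodes shows the |E| versus 2|E| difference between directed and undirected graphs:

```python
def adjacency_list(n, edges, directed=True):
    """Array of lists for vertices 0..n-1 (hypothetical helper).

    O(|V|) to create the empty lists plus one append per edge endpoint,
    so construction is linear. Directed: |E| list nodes; undirected: 2|E|.
    """
    adj = [[] for _ in range(n)]
    for v, w in edges:
        adj[v].append(w)
        if not directed:
            adj[w].append(v)        # undirected edge goes in both lists
    return adj

adj = adjacency_list(3, [(0, 1), (0, 2), (2, 1)])
print(adj)                              # [[1, 2], [], [1]]
print(sum(len(lst) for lst in adj))     # 3 list nodes, equal to |E|
```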


Breadth First Traversal

• Algorithm
  – Start from any node in the graph
  – Traverse its neighbors (nodes that are directly connected to it) using some heuristic
  – Next traverse the neighbors of the neighbors, and so on, until some limit is reached or all nodes reachable from the start have been visited
  – Use a queue to perform the breadth-first traversal
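A minimal breadth-first traversal in Python, assuming the graph is a dict from vertex to neighbor list. The "heuristic" for choosing among neighbors is taken to be plain list order, which is an assumption:

```python
from collections import deque

def bfs(adj, start):
    """Return the vertices reachable from start, in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()          # oldest discovered vertex first
        order.append(v)
        for w in adj[v]:             # neighbors in list order
            if w not in visited:
                visited.add(w)       # mark on enqueue so w is queued once
                queue.append(w)
    return order

adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0]}
print(bfs(adj, 0))   # [0, 1, 2, 3, 4]
```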


Depth First Traversal

• Algorithm
  – Start from any node in the graph
  – Traverse deeper and deeper until a dead end is reached
  – Backtrack and traverse other nodes that have not been visited
  – Use a stack (or recursion) to perform the depth-first traversal
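The same idea with an explicit stack in place of the queue, under the same dict-of-lists assumption. Neighbors are pushed in reverse so the first-listed one is explored first, which gives the same order a recursive version would:

```python
def dfs(adj, start):
    """Return the vertices reachable from start, in depth-first order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()              # most recently pushed vertex first
        if v in visited:
            continue                 # already reached along another path
        visited.add(v)
        order.append(v)
        # Push in reverse so the first-listed neighbor is popped next.
        for w in reversed(adj[v]):
            if w not in visited:
                stack.append(w)
    return order

adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [], 4: [0]}
print(dfs(adj, 0))   # [0, 1, 3, 2, 4]
```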


Web as a Graph

(Figure: web pages URL 1 through URL 7 drawn as the nodes of a graph)


Web Algorithms


Web Algorithms
• Search
  – Google, MSN, AltaVista
• Image search
  – games
• Routing
• Distributed computing
• Shortest path algorithms
  – Google Maps, MapQuest
• Semantic Web
  – XML metadata
• Etc.


Web Search Engines: A Cool Application of Graphs


Building a Search Engine
• Crawl the web
• Build a web index
• Then when we build/search, we may have to sort the index
  – Google sorts more than 100 billion index items
• Novel algorithms, novel data structures, distributed computing


A basic Search Engine Architecture


Google Architecture


Google’s server farm


Web Crawlers
• Start with an initial page P0. Find the URLs on P0 and add them to a queue
• When done with P0, pass it to an indexing program, get a page P1 from the queue, and repeat
• Can be specialized (e.g., only look for email addresses)
• Issues
  – Which page to look at next? (Special subjects, recency)
  – How deep within a site do you go (depth search)?
  – How frequently should pages be visited?
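The crawl loop described above is essentially a breadth-first traversal of the web graph. The sketch below assumes a hypothetical in-memory WEB dict in place of real downloading and link extraction, and adds a seen set and a max_pages limit, neither of which the slide specifies:

```python
from collections import deque

# Hypothetical stand-in for the web: page URL -> URLs found on that page.
# A real crawler would download and parse each page instead.
WEB = {
    "P0": ["P1", "P2"],
    "P1": ["P3"],
    "P2": ["P1", "P4"],
    "P3": ["P0"],
    "P4": [],
}

def crawl(start, get_links, max_pages=100):
    """Breadth-first crawl: take a page from the queue, hand it to the
    indexer, then enqueue the URLs on it that have not been seen yet."""
    seen = {start}
    queue = deque([start])
    indexed = []                     # stands in for "pass it to an indexing program"
    while queue and len(indexed) < max_pages:
        page = queue.popleft()
        indexed.append(page)
        for url in get_links(page):
            if url not in seen:      # avoid re-queuing pages already found
                seen.add(url)
                queue.append(url)
    return indexed

print(crawl("P0", lambda url: WEB.get(url, [])))   # ['P0', 'P1', 'P2', 'P3', 'P4']
```

Swapping the deque for a priority queue is one way to address the "which page to look at next?" issue, at the cost of no longer being a plain breadth-first crawl.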


So, why Spider the Web?
• Refresh the collection by deleting dead links
  – OK if the index is slightly smaller
  – Done every 1-2 weeks in the best engines
• Find new sites
  – Respider the entire web
  – Done every 2-4 weeks in the best engines


Cost of Spidering
• Spiders can (and do) run in parallel on hundreds of servers
• Very high network connectivity (e.g., a T3 line)
• Servers can migrate from spidering to query processing depending on time-of-day load
• Running a full web spider takes days, even with hundreds of dedicated servers


Indexing
• Arrangement of data (a data structure) to permit fast searching
• Which list is easier to search?
  – sow fox pig eel yak hen ant cat dog hog
  – ant cat dog eel fox hen hog pig sow yak
• Sorting helps. Why?
  – Permits binary search: about log₂ n probes into the list; log₂(1 billion) ≈ 30
  – Permits interpolation search: about log₂(log₂ n) probes on average for uniformly distributed keys; log₂(log₂(1 billion)) ≈ 5
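A sketch of both searches that also counts probes, assuming Python lists; interpolation search is shown on integer keys, since it has to do arithmetic on key values. binary_search and interpolation_search are hypothetical helper names:

```python
import math

def binary_search(a, key):
    """Return (index or -1, number of probes) for a sorted list a."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] == key:
            return mid, probes
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

def interpolation_search(a, key):
    """Return (index or -1, number of probes) for a sorted list of integers.

    The probe position is estimated by linear interpolation between a[lo] and a[hi].
    """
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi and a[lo] <= key <= a[hi]:
        probes += 1
        if a[hi] == a[lo]:
            pos = lo                          # all remaining keys are equal
        else:
            pos = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == key:
            return pos, probes
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes

words = sorted("sow fox pig eel yak hen ant cat dog hog".split())
print(words)
print(binary_search(words, "pig"))            # (7, 2)

nums = list(range(0, 1_000_000, 7))           # evenly spaced keys
print(interpolation_search(nums, 86_415))     # (12345, 1)

print(round(math.log2(1e9)), round(math.log2(math.log2(1e9))))  # 30 5
```

On the evenly spaced list the interpolated guess lands on the target in a single probe. On skewed data, interpolation search can degrade toward a linear number of probes, which is why the log₂(log₂ n) figure is an average for uniformly distributed keys.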


Inverted Files
• A file is a list of words by position
  - First entry is the word in position 1 (first word)
  - Entry 4562 is the word in position 4562 (4562nd word)
  - Last entry is the last word
• An inverted file is a list of positions by word!

(Figure: FILE, the words of the text above numbered by position, with markers at positions 1, 10, 20, 30 and 36)

INVERTED FILE
a (1, 4, 40)
entry (11, 20, 31)
file (2, 38)
list (5, 41)
position (9, 16, 26)
positions (43)
word (14, 19, 24, 29, 35, 45)
words (7)
4562 (21, 27)
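A small sketch of inverting a file, assuming the FILE is the definition text on this slide and that words are lowercased runs of letters and digits (both assumptions). The resulting positions can be checked against the INVERTED FILE entries:

```python
import re
from collections import defaultdict

def invert(text):
    """Map each word to the 1-based positions where it occurs in text.

    Assumed tokenization: lowercase, then keep runs of letters and digits.
    """
    index = defaultdict(list)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for pos, word in enumerate(tokens, start=1):
        index[word].append(pos)
    return dict(index)

# The FILE, assumed to be this slide's own definition text.
TEXT = ("A file is a list of words by position - First entry is the word in "
        "position 1 (first word) - Entry 4562 is the word in position 4562 "
        "(4562nd word) - Last entry is the last word An inverted file is a "
        "list of positions by word!")

inv = invert(TEXT)
for w in ["a", "entry", "file", "list", "position", "positions", "word", "words", "4562"]:
    print(w, inv[w])
```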


Inverted Files for Multiple Documents

LEXICON
WORD          NDOCS   PTR
jezebel       20
jezer         3
jezerit       1
jeziah        1
jeziel        1
jezliah       1
jezoar        1
jezrahliah    1
jezreel       39

WORD INDEX (start of the postings for “jezebel”)
DOCID   OCCUR   POS 1   POS 2   . . .
34      6       1       118     2087    3922    3981    5002
44      3       215     2291    3010
56      4       5       22      134     992
. . .

“jezebel” occurs 6 times in document 34, 3 times in document 44, 4 times in document 56 . . .

(Figure: each lexicon entry's PTR points into the word index; other postings rows shown include 107 4 322 354 381 405; 232 6 15 195 248 1897 1951 2192; 566 3 203 245 287; 677 1 481; 713 3 42 312 802; and 67 1 132, drawn near jezoar)
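A sketch of a multi-document version, assuming a hypothetical toy collection (the document IDs and text below are made up) and the same tokenization as before. Each postings entry mirrors the DOCID, OCCUR, POS 1, POS 2, ... layout, and a word's NDOCS is simply the length of its postings list:

```python
import re
from collections import defaultdict

def build_index(docs):
    """docs: {docid: text}. Return word -> postings list, where each posting is
    (docid, occur, [positions]) and the list is sorted by docid."""
    per_word = defaultdict(dict)                 # word -> {docid: [positions]}
    for docid in sorted(docs):
        tokens = re.findall(r"[a-z0-9]+", docs[docid].lower())
        for pos, word in enumerate(tokens, start=1):
            per_word[word].setdefault(docid, []).append(pos)
    return {
        word: [(d, len(p), p) for d, p in sorted(by_doc.items())]
        for word, by_doc in per_word.items()
    }

# Hypothetical toy collection.
docs = {
    34: "jezebel said jezebel",
    44: "the story of jezebel",
    56: "jezreel and jezebel",
}

index = build_index(docs)
lexicon = {w: len(p) for w, p in index.items()}   # WORD -> NDOCS
print(lexicon["jezebel"], index["jezebel"])
# 3 [(34, 2, [1, 3]), (44, 1, [4]), (56, 1, [3])]
```

Keeping OCCUR alongside the positions lets a ranker read term frequency without scanning the position list.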


Ranking (Scoring) Hits
• Hits must be presented in some order
• What order?
  – Relevance, recency, popularity, reliability, alphabetic?
• Some ranking methods
  – Presence of keywords in the title of the document
  – Closeness of keywords to the start of the document
  – Frequency of the keyword in the document
  – Link popularity (how many pages point to this one)
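A toy scorer that combines the four ranking methods listed above, assuming single-word queries, pages given as (title, body) pairs, and a link map for counting inbound links. The weights are arbitrary placeholders, not values from any real engine:

```python
import re

def tokens(s):
    return re.findall(r"[a-z0-9]+", s.lower())

def rank(query, pages, links):
    """Order page IDs by a toy score (hypothetical weights).

    pages: {page_id: (title, body)}; links: {page_id: [page_ids it links to]}.
    """
    q = query.lower()
    inlinks = {p: 0 for p in pages}
    for src, targets in links.items():
        for t in targets:
            if t in inlinks and t != src:
                inlinks[t] += 1                     # link popularity

    def score(pid):
        title, body = pages[pid]
        words = tokens(body)
        in_title = q in tokens(title)               # keyword in title
        freq = words.count(q)                       # frequency in document
        if q in words:
            closeness = 1 / (1 + words.index(q))    # earlier first hit -> larger
        else:
            closeness = 0.0
        return 2.0 * in_title + 1.0 * freq + 1.0 * closeness + 0.5 * inlinks[pid]

    return sorted(pages, key=score, reverse=True)

pages = {
    "A": ("Graphs", "a graph has vertices and edges"),
    "B": ("Trees", "trees are graphs so a graph search works on a graph"),
    "C": ("Graph basics", "graph basics: every graph has vertices; a graph also has edges"),
}
links = {"A": ["C"], "B": ["A", "C"], "C": []}
print(rank("graph", pages, links))   # ['C', 'B', 'A']
```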