10
Algorithms for Web Indexing and Searching Gerth Stølting Brodal and Rolf Fagerberg Fall 2002 1

Search Algorithms

Embed Size (px)

DESCRIPTION

search engine algorithms

Citation preview

Page 1: Search Algorithms

Algorithms for

Web Indexing and Searching

Gerth Stølting Brodal and Rolf Fagerberg

Fall 2002

1

Page 2: Search Algorithms

Course Motivation

How does Google work?

How do search engines work?

Algorithms for web indexing and searching

2

Page 3: Search Algorithms

Course Motivation

How does Google work?

How do search engines work?

Algorithms for web indexing and searching

2

Page 4: Search Algorithms

Course Motivation

How does Google work?

How do search engines work?

Algorithms for web indexing and searching

2

Page 5: Search Algorithms

Course Outline

1. Introduction to Course

2. General Anatomy of Web Search Engines

3. Building blocks of Search Engines(a) Web Crawlers

• Anatomy of crawlers• Crawling strategy

(b) Index• Inverted files• Suffix trees• Signature files• Compression• Issues of efficient construction• Duplicate removal

3

Page 6: Search Algorithms

Course Outline

(c) Types of Queries(d) Ranking

• Textbased methods– Vector based methods– Latent semantic indexing

• Link based methods– PageRank– HITS– SALSA– Others

4

Page 7: Search Algorithms

Course Outline

4. Further topics

(a) Clustering(b) Automatic Categorization/Hierarchy Building(c) Evaluation of search engines(d) Structure of and Models for the Web Graph(e) Data Mining

5

Page 8: Search Algorithms

Formal Course Description

Prerequisites: dADS

Literature: Handouts

Course language: Danish or English

Credits: 2 points/10 ECTS

Evaluation: Programming project

Course page:

http://www.daimi.au.dk/~gerth/webalg02/index.html

6

Page 9: Search Algorithms

Programming Project

Implement a Web Search Engine

Distributed project

Groups (2–4 persons) doing:

Web crawlingIndex building

RankingQuery interface

Start: index Aarhus University websiteGoal: index domain .dk

7

Page 10: Search Algorithms

Programming Project

Implement a Web Search Engine

Distributed project

Groups (2–4 persons) doing:

Web crawlingIndex building

RankingQuery interface

Start: index Aarhus University websiteGoal: index domain .dk

7