Gregor Gisler-Merz, 23.07.2003

How to hit in Google

The anatomy of a modern web search engine

Content:

• Why do we need search engines
• Design goals of a search engine
• What are the benefits of a basic Web Search Engine knowledge?
• System Anatomy: Google Architecture Overview, Searching
• How do I practically benefit from the new insights: search tips, how do I get listed in Google
• References

Why do we need search engines:

• The amount of information is growing rapidly:
  - over 3 billion indexed documents to date
  - over 150 million queries per day
• Human-maintained indices do not cover every topic and are expensive to build and maintain.
• Automated search engines that rely on keyword matching usually return too many low-quality matches.
• Many advertisers take measures to mislead automated search engines.

Design goals of a search engine:

• Improve search quality
• Easy usage
• Novel research activities on large-scale web data

What are the benefits of basic Web search engine knowledge?

• Know what you can expect from your searches.
• Get your own web site listed.
• Build a reasonable intranet search engine.
• Improve the search infrastructure in your own applications.

Google Architecture Overview:

• Most of Google is implemented in C/C++.
• Web pages are downloaded by several distributed web crawlers.
• Every stored web page has an associated ID (docID).
• The indexer reads the repository, uncompresses the documents, and parses them.
• Parsing/scanning is done by a lexical analyzer (generated with flex).
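The indexing step described above can be illustrated with a minimal sketch. It is not Google's implementation: the in-memory `repository`, the regular-expression tokenizer (a crude stand-in for a flex-generated lexer), and the names `parse` and `build_index` are all assumptions made for this example.

```python
import re
import zlib
from collections import defaultdict

# Hypothetical repository: docID -> compressed page, standing in for the crawler output.
repository = {
    1: zlib.compress(b"<html><title>Search engines</title>"
                     b"<body>How search engines rank pages</body></html>"),
    2: zlib.compress(b"<html><body>Web crawlers download pages</body></html>"),
}

TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z0-9]+")


def parse(html):
    """Strip tags and split the text into lowercase words."""
    return WORD.findall(TAG.sub(" ", html).lower())


def build_index(repo):
    """Read, uncompress, and parse every page; return word -> {docID: [positions]}."""
    index = defaultdict(dict)
    for doc_id, blob in repo.items():
        words = parse(zlib.decompress(blob).decode("utf-8"))
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, []).append(position)
    return index


index = build_index(repository)
print(sorted(index["pages"]))   # docIDs of the pages that contain "pages"
```

Storing word positions per docID is one simple way to keep the information that a ranking function could later use.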

Searching:

• The Google query evaluation:
  1. Parse the query.
  2. Convert words into wordIDs.
  3. Seek to the start of the doclist in the short barrel for every word.
  4. Scan through the doclists until there is a document that matches all the search terms.
  5. Compute the rank of that document for the query.
  6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
  7. If we are not at the end of any doclist, go to step 4.
  8. Sort the documents that have matched by rank and return the top k.
• The ranking system includes hit lists, anchor text, and PageRank. Google always tries to balance these factors.
• PageRank is backed by a lot of mathematics (graph theory, linear algebra, and so on).
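The query evaluation steps above can be sketched as follows. This is a simplified illustration: the lexicon, the barrel contents, the PageRank values, and the scoring formula are made up, and the short barrels are scanned completely before the full barrels instead of switching exactly at the end of a doclist.

```python
# All data below is hypothetical and only illustrates the control flow.
lexicon = {"web": 0, "search": 1, "engine": 2}

# Barrels map wordID -> doclist, a docID-sorted list of (docID, hit_count).
short_barrels = {0: [(1, 1), (4, 2)], 1: [(1, 1), (3, 1)], 2: [(1, 1)]}
full_barrels = {
    0: [(1, 3), (2, 1), (4, 5)],
    1: [(1, 2), (2, 4), (3, 1), (4, 1)],
    2: [(2, 2), (4, 1)],
}
pagerank = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}


def matching_docs(doclists):
    """Step 4: walk the sorted doclists in parallel, yielding docs found in all of them."""
    cursors = [0] * len(doclists)
    # Steps 6/7: stop as soon as any doclist is exhausted.
    while all(c < len(d) for c, d in zip(cursors, doclists)):
        current = [d[c][0] for c, d in zip(cursors, doclists)]
        target = max(current)
        if all(doc == target for doc in current):
            yield target, sum(d[c][1] for c, d in zip(cursors, doclists))
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the lists that are still behind the largest docID.
            cursors = [c + (d[c][0] < target) for c, d in zip(cursors, doclists)]


def evaluate(query, k=10):
    words = query.lower().split()                        # step 1
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]               # step 2
    ranked = {}
    # Steps 3 and 6: short barrels first, then full barrels.
    for weight, barrels in ((2.0, short_barrels), (1.0, full_barrels)):
        doclists = [barrels.get(w, []) for w in word_ids]
        for doc_id, hits in matching_docs(doclists):
            score = weight * hits + pagerank[doc_id]     # step 5 (made-up formula)
            ranked[doc_id] = max(score, ranked.get(doc_id, 0.0))
    return sorted(ranked, key=ranked.get, reverse=True)[:k]   # step 8


print(evaluate("web search"))
```

Because every doclist is sorted by docID, the parallel scan never has to move backwards, which is what makes step 4 cheap even for long lists.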

Search tips:

• Make your search as specific as you can.
• Use exact phrases: “Säuliämtler Seifenkistenrennen”
• Force the inclusion of a word, such as a stop word, with the + operator: +Zürich
• Exclude unwanted words with the - operator.
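The search tips above can be combined into one query string. The helper `build_query`, the excluded word "hotel", and the search URL are assumptions for illustration only; they are not part of the original slides.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrases=(), required=(), excluded=()):
    """Assemble a query string from plain terms, exact phrases, + words, and - words."""
    parts = list(terms)
    parts += [f'"{phrase}"' for phrase in phrases]   # exact phrases in quotes
    parts += [f"+{word}" for word in required]       # force inclusion
    parts += [f"-{word}" for word in excluded]       # exclude unwanted words
    return " ".join(parts)


query = build_query(
    phrases=["Säuliämtler Seifenkistenrennen"],
    required=["Zürich"],
    excluded=["hotel"],   # hypothetical unwanted word
)
# Assumed search URL; urlencode percent-encodes the quotes and umlauts as UTF-8.
url = "http://www.google.com/search?" + urlencode({"q": query})
print(query)
print(url)
```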

How do I get listed in Google?

• Choose the correct keywords for your site and raise the keyword density.
• Place your most important keyword phrase toward the beginning of the title tag.
• Use description and keyword meta tags.
• Use header tags.
• Incorporate keywords in the alt attribute of your images and place keywords in page links.
• Create a site map and a contact page.
• Put only quality content on your site (250-300 words per page).
• Create only one doorway page per keyword.
• Do not use hidden text; repair broken links.
• Take care with FRAMES: add a lot of keyword-rich text to the NOFRAMES tag.
• Get reciprocal links and cross-link your site (if possible).

Now get your web site listed in the major search engines and get a good ranking!
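Keyword density, mentioned in the first tip, can be checked with a small script. The definition used here (words belonging to occurrences of the phrase, divided by all words on the page) is one common convention and an assumption of this sketch; the function name `keyword_density` and the sample text are hypothetical.

```python
import re


def keyword_density(text, phrase):
    """Share of the words in `text` that belong to occurrences of `phrase`."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase starts.
    occurrences = sum(words[i:i + n] == target for i in range(len(words) - n + 1))
    return occurrences * n / len(words)


page_text = "Soap box race in Zürich. The soap box race starts at noon."
print(keyword_density(page_text, "soap box race"))
```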

References:

• Google: http://www.google.com/addurl.html
• AltaVista: http://www.altavista.com/addurl.html
• AlltheWeb: http://www.alltheweb.com/add_url.php
• Tips for getting listed: http://www.totalsubmission.co.uk, http://www.amigos.org
• PageRank Uncovered: http://www.supportforums.org/PageRank.pdf
• PageRank Computation and the Structure of the Web: Experiments and Algorithms: http://www2002.org/CDROM/poster/173.pdf
• The Anatomy of a Large-Scale Hypertextual Web Search Engine: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
• flex scanner generator: http://www.gnu.org/software/flex/flex.html