Gregor Gisler-Merz, 23.07.2003
How to get hits in Google
The anatomy of a modern web search engine
Contents:
• Why do we need search engines
• Design goals of a search engine
• What are the benefits of basic web search engine knowledge?
• System anatomy:
  - Google Architecture Overview
  - Searching
• How do I benefit in practice from the new insights?
  - Search tips
  - How do I get listed in Google
• References
Why do we need search engines:
• The amount of information is growing rapidly:
  - over 3 billion documents indexed so far
  - over 150 million queries per day
• Human-maintained indices do not cover every topic and are expensive to build and maintain.
• Automated search engines that rely on keyword matching usually return too many low-quality matches.
• Many advertisers take measures to mislead automated search engines.

Design goals of a search engine:
• Improve search quality
• Ease of use
• Novel research activities on large-scale web data
What are the benefits of basic web search engine knowledge?
• Know what you can expect from your searches.
• Get your own web site listed.
• Build a reasonable intranet search engine.
• Improve the search infrastructure in your own applications.
Google Architecture Overview:
• Most of Google is implemented in C/C++.
• Web pages are downloaded by several distributed web crawlers.
• Every stored web page has an associated ID (docID).
• The indexer reads the repository, uncompresses the documents, and parses them.
• Parsing/scanning is done by a lexical analyzer (generated with flex).
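The repository and indexer steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation (which, per the slide, is written in C/C++ with a flex-generated scanner): a regular expression stands in for the lexical analyzer, zlib stands in for the repository compression, and the names Repository, index_repository, lexicon, and postings are hypothetical.

```python
import re
import zlib
from collections import defaultdict

# Assumption: a simple regex replaces the flex-generated scanner.
TOKEN = re.compile(r"[a-z0-9]+")


class Repository:
    """Stores compressed pages; the list position serves as the docID."""

    def __init__(self):
        self._pages = []

    def add(self, html: str) -> int:
        """Compress and store a page, returning its docID."""
        self._pages.append(zlib.compress(html.encode("utf-8")))
        return len(self._pages) - 1

    def documents(self):
        """Yield (docID, uncompressed text) for every stored page."""
        for doc_id, blob in enumerate(self._pages):
            yield doc_id, zlib.decompress(blob).decode("utf-8")


def index_repository(repo):
    """Read, uncompress, and parse every document into an inverted index."""
    lexicon = {}                  # word -> wordID
    postings = defaultdict(list)  # wordID -> [(docID, word position), ...]
    for doc_id, text in repo.documents():
        for position, match in enumerate(TOKEN.finditer(text.lower())):
            # New words receive the next free wordID.
            word_id = lexicon.setdefault(match.group(), len(lexicon))
            postings[word_id].append((doc_id, position))
    return lexicon, postings
```

Calling repo.add(...) for each crawled page and then index_repository(repo) yields the word-to-wordID mapping and, for each wordID, the documents and positions where the word occurs.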
Searching:
• The Google query evaluation:
  1. Parse the query.
  2. Convert words into wordIDs.
  3. Seek to the start of the doclist in the short barrel for every word.
  4. Scan through the doclists until there is a document that matches all the search terms.
  5. Compute the rank of that document for the query.
  6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
  7. If we are not at the end of any doclist, go to step 4.
  8. Sort the documents that have matched by rank and return the top k.
• The ranking system combines hit lists, anchor text, and PageRank. Google always tries to balance these factors.
• PageRank is backed by a lot of mathematics (graph theory, linear algebra, and so on).
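The query evaluation loop above can be sketched as follows. This is a simplified illustration, not Google's code: barrels are assumed to be plain dictionaries mapping a wordID to a sorted list of docIDs, and lexicon, short_barrel, full_barrel, and the rank callback are hypothetical stand-ins (a real ranking function would combine hit lists, anchor text, and PageRank).

```python
import heapq


def matching_docs(doclists):
    """Yield docIDs that occur in every sorted doclist (the scan step)."""
    if not doclists or any(not dl for dl in doclists):
        return
    cursors = [0] * len(doclists)
    while True:
        # The largest docID under any cursor is the next possible match.
        candidate = max(dl[c] for dl, c in zip(doclists, cursors))
        matched = True
        for i, dl in enumerate(doclists):
            while cursors[i] < len(dl) and dl[cursors[i]] < candidate:
                cursors[i] += 1
            if cursors[i] == len(dl):
                return  # reached the end of a doclist
            if dl[cursors[i]] != candidate:
                matched = False
        if matched:
            yield candidate
            cursors = [c + 1 for c in cursors]
            if any(c == len(dl) for dl, c in zip(doclists, cursors)):
                return


def evaluate(query, lexicon, short_barrel, full_barrel, rank, k=10):
    """Return the top-k docIDs that contain every query word."""
    words = query.lower().split()                 # parse the query
    if not words or any(w not in lexicon for w in words):
        return []
    word_ids = [lexicon[w] for w in words]        # words -> wordIDs
    scored = {}
    # Short barrels first; when a doclist is exhausted, move to the full barrels.
    for barrel in (short_barrel, full_barrel):
        doclists = [barrel.get(w, []) for w in word_ids]
        for doc_id in matching_docs(doclists):
            if doc_id not in scored:
                scored[doc_id] = rank(doc_id, word_ids)
    # Final step: sort the matches by rank and return the top k.
    return heapq.nlargest(k, scored, key=scored.get)
```

Documents found in the short barrel are ranked once and not re-ranked when they reappear in the full barrel.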
Search tips:
• Make your search as specific as you can.
• Use exact phrases: "Säuliämtler Seifenkistenrennen"
• Force a stop word or term to be included with the + operator: +Zürich
• Exclude unwanted words with the - operator.
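As a small illustration of how these operators combine into one query string, here is a sketch in Python. The helper build_query and its parameter names are hypothetical; it only assembles the text typed into the search box and does not contact any search engine.

```python
def build_query(terms=(), phrases=(), required=(), excluded=()):
    """Assemble a query string from the operators listed in the tips."""
    parts = list(terms)
    parts += [f'"{p}"' for p in phrases]   # exact phrases in quotes
    parts += [f"+{w}" for w in required]   # + forces a word to be included
    parts += [f"-{w}" for w in excluded]   # - excludes a word
    return " ".join(parts)


query = build_query(
    phrases=["Säuliämtler Seifenkistenrennen"],
    required=["Zürich"],
    excluded=["Hotel"],
)
```

The excluded word "Hotel" is only an example value chosen for this sketch.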
How do I get listed in Google?
• Choose the correct keywords for your site and raise the keyword density.
• Place your most important keyword phrase toward the beginning of the title tag.
• Use description and keyword meta tags.
• Use header tags.
• Incorporate keywords in the alt attribute of your images and in the text of page links.
• Create a site map and a contact page.
• Put only quality content on your site (250-300 words per page).
• Create only one doorway page per keyword.
• Do not use hidden text; repair broken links.
• Be careful with FRAMES: add plenty of keyword-rich text to the NOFRAMES tag.
• Get reciprocal links and cross-link your site (if possible).
Now get your web site listed in the major search engines and get a good ranking!
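The keyword density mentioned in the tips can be estimated with a short script. The definition used here is an assumption made for illustration (the share of a page's words that belong to occurrences of the keyword phrase); the slides do not define the term, and the function name keyword_density is hypothetical.

```python
import re


def keyword_density(text, keyword):
    """Return the fraction of words in text that belong to keyword occurrences."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    phrase = keyword.lower().split()
    n = len(phrase)
    # Count every position where the whole phrase occurs word for word.
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase
    )
    return hits * n / len(words)


page_text = "Soap box race in Zürich: the soap box race"
density = keyword_density(page_text, "soap box race")
```

Running the same function over the text of each page makes it easy to compare pages before and after editing their copy.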
References:
• Google: http://www.google.com/addurl.html
• AltaVista: http://www.altavista.com/addurl.html
• AlltheWeb: http://www.alltheweb.com/add_url.php
• Tips for getting listed: http://www.totalsubmission.co.uk, http://www.amigos.org
• PageRank Uncovered: http://www.supportforums.org/PageRank.pdf
• PageRank Computation and the Structure of the Web: Experiments and Algorithms: http://www2002.org/CDROM/poster/173.pdf
• The Anatomy of a Large-Scale Hypertextual Web Search Engine: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
• flex scanner generator: http://www.gnu.org/software/flex/flex.html