Search Engines By: Faruq Hasan


Today's Coverage

● Introduction
● Types of Search Engines
● Components of a Search Engine
● Semantics and Relevancy
● Search Engine Algorithm
● Search Engine Optimization
● Others

Introduction

• A web search engine is a software system designed to search for information on the World Wide Web. The results are generally presented as a list, often referred to as search engine results pages.

• Search engines look through their own databases of information to find what you are looking for.

Search engine

• Definition: An internet-based tool that searches an index of documents for a particular term, phrase or text specified by the user.

• Common Characteristics:
– Spider, Indexer, Database, Algorithm
– Find matching documents and display them according to relevance
– Frequent updates to documents searched and ranking algorithm


Types of Search Engine

● Crawler Powered Indexes
– Guruji.com, Google.com
● Human Powered Indexes
– www.dmoz.org
● Hybrid Models
– Submitted URLs to a search engine?
● Semantic Indexes
– Hakia.com


Copyleft (ɔ) 2009 Sudarsun Santhiappan 8

How does a Search Engine work?


[Figure: How Search Engines Work (Sherman 2003). Diagram labels: Your Browser; The Web (URL1, URL2, URL3, URL4); Crawler; Indexer; Database; Search Engine. Example query "Eggs?" with scored matches: Eggs 90%, Eggo 81%, Ego 40%, Huh? 10%. Sample document: "All About Eggs" by S. I. Am.]


Search Engine Internals


● Crawlers
● Indexers
● Searching
● Semantics
● Ranking

Crawlers

• A crawler is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, which is also known as a "spider" or a "bot."
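
The crawler described above can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine does it: the breadth-first strategy, the names `LinkExtractor`, `crawl` and `TOY_WEB`, the `max_pages` limit, and the example URLs are all assumptions made for the example. A real crawler would fetch pages over HTTP and respect robots.txt; here a small in-memory "web" stands in for the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting from seed_url.

    `fetch` is a caller-supplied function mapping a URL to its HTML,
    or None if the page cannot be retrieved (an assumption of this sketch).
    Returns a dict of URL -> HTML for every page visited.
    """
    pages = {}
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        # Follow hyperlinks to discover pages not seen before.
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# A tiny in-memory "web" standing in for real HTTP requests (hypothetical URLs).
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": "No links here.",
}

crawled = crawl("http://a.example/", TOY_WEB.get)
```

Starting from one seed page, the sketch reaches all three pages by following links, and the `seen` set keeps it from revisiting the link back to the seed.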

Indexers

• An indexer builds the search engine's index. Like a database index, this is a data structure that speeds up data retrieval at the cost of additional writes and more storage space to maintain the extra copy of the data.
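
One common form of index for text search is an inverted index, which maps each term to the documents containing it. The slides do not say which structure is meant, so the sketch below is an assumption for illustration; `tokenize`, `build_inverted_index` and the sample documents are hypothetical names and data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


# Hypothetical sample documents.
DOCS = {
    "doc1": "All about eggs",
    "doc2": "Eggs and bacon recipes",
    "doc3": "A history of the web",
}

index = build_inverted_index(DOCS)
matches = index["eggs"]  # documents containing the term "eggs"
```

The trade-off is the one stated above: the index is an extra copy of the data that must be updated whenever documents change, but a lookup no longer needs to scan every document.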

Semantics

• Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs, and symbols, and what they stand for (their denotation). It is used to understand human expression through language.

How Does a Search Engine Work?

• A spider “crawls” the web to find new documents (web pages, other documents), typically by following hyperlinks from websites already in its database

• The search engine indexes the content (text, code) in these documents by adding it to its database, and periodically updates this content

• When a user enters a query, the search engine searches its own database to find related documents (it does not search web pages in real time)

• The search engine ranks the resulting documents using an algorithm (mathematical formula) that assigns weights to various ranking factors
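
The ranking step can be pictured as a weighted sum over ranking factors. The sketch below only illustrates that idea: the factor names (`keyword_match`, `freshness`), their weights, and the sample pages are invented for the example, since real ranking formulas are not public.

```python
# Hypothetical ranking factors and weights, chosen only for illustration.
WEIGHTS = {"keyword_match": 0.7, "freshness": 0.3}


def score(factors, weights=WEIGHTS):
    """Weighted sum of a document's factor values (each between 0 and 1)."""
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


def rank(candidates, weights=WEIGHTS):
    """Return document ids ordered from highest to lowest score."""
    return sorted(candidates, key=lambda doc: score(candidates[doc], weights), reverse=True)


# Candidate documents already retrieved from the engine's own database.
candidates = {
    "eggs.html": {"keyword_match": 0.9, "freshness": 0.5},
    "eggo.html": {"keyword_match": 0.8, "freshness": 0.9},
    "ego.html": {"keyword_match": 0.4, "freshness": 0.2},
}

ranking = rank(candidates)
```

With these made-up weights, the fresher page outranks the page with the slightly better keyword match, which shows how the choice of weights changes the order of results.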

How it works

Anatomy of a Search Engine

Search Engine Algorithm

• Search engine algorithms are unique to every search engine and just as important as keywords: they are the why and the how of search engine rankings. A search engine algorithm is a set of rules, or a unique formula, that the search engine uses to determine the significance of a web page. These rules determine whether a web page is genuine or just spam, whether it has significant data that people would be interested in, and many other features used to rank and list results for every search query, producing an organized and informative search engine results page. The algorithms differ between search engines and are closely guarded secrets, but there are certain things that all search engine algorithms have in common.

• 1. Relevancy – One of the first things a search engine algorithm checks is the relevancy of the page. Whether it simply scans for keywords or looks at how those keywords are used, the algorithm determines whether the web page is relevant to a particular keyword. Where the keywords are located also matters: web pages that have the keyword in the title, the headline, or the first few lines of text will rank better for that keyword than pages that do not. Keyword frequency is important too. If the keywords appear frequently, but not as a result of keyword stuffing, the page will rank better.

• 2. Individual Factors – A second part of search engine algorithms is the set of individual factors that make a particular search engine different from every other. Each search engine has unique algorithms, and their individual factors are why a search query turns up different results on Google than on MSN or Yahoo!. One of the most common individual factors is the number of pages a search engine indexes. One engine may have more pages indexed, or index them more frequently, and either can produce different results. Some search engines also penalize spamming, while others do not.

• 3. Off-Page Factors – Another part of the algorithm that is individual to each search engine is off-page factors, such as click-through measurement and linking. Click-through rates and how often a page is linked to can indicate how relevant a web page is to actual users and visitors, and this can cause an algorithm to rank the page higher. Off-page factors are harder for webmasters to craft, but can have an enormous effect on page rank, depending on the search engine algorithm.

• Search engine algorithms are the mystery behind search engines, sometimes amusingly called the search engine’s “Secret Sauce”. Beyond the basic functions of a search engine, the relevancy of a web page, the off-page factors, and the unique factors of each search engine make the algorithms an important consideration in search engine optimization.
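
The relevancy and off-page factors described above can be sketched in a few lines. Everything specific here is an assumption for illustration, since real algorithms are secret: the bonus values, the 20% density threshold used to flag keyword stuffing, the function names, and the sample pages and links are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical weights and thresholds, chosen only for illustration.
TITLE_BONUS = 3.0   # keyword appears in the title
LEAD_BONUS = 2.0    # keyword appears in the first few lines
LEAD_WORDS = 20     # how many words count as "the first few lines"
MAX_DENSITY = 0.2   # above this share of words, treat as keyword stuffing


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_page_relevancy(keyword, title, body):
    """Score a page for one keyword by location and frequency."""
    keyword = keyword.lower()
    words = tokenize(body)
    score = 0.0
    if keyword in tokenize(title):
        score += TITLE_BONUS
    if keyword in words[:LEAD_WORDS]:
        score += LEAD_BONUS
    if words:
        density = words.count(keyword) / len(words)
        # Frequency only earns credit when it does not look like stuffing.
        if density <= MAX_DENSITY:
            score += 10 * density
    return score


def inbound_link_counts(link_graph):
    """Count distinct other pages linking to each page (an off-page factor)."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def click_through_rate(clicks, impressions):
    """Share of times a result was clicked when it was shown."""
    return clicks / impressions if impressions else 0.0


honest = on_page_relevancy(
    "eggs", "All About Eggs",
    "Eggs are a versatile food. This guide covers how to boil, fry and poach eggs at home.",
)
stuffed = on_page_relevancy("eggs", "Eggs eggs eggs", "eggs eggs eggs eggs cheap eggs buy eggs")
unrelated = on_page_relevancy("eggs", "A history of the web", "The web began as a way to share documents.")

LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}
link_counts = inbound_link_counts(LINKS)
```

In this sketch the honest page scores higher than the stuffed one because the stuffed page earns no frequency credit, and repeated links from the same source are counted once, so a single site cannot inflate another page's link count on its own.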

What is SEO?

• SEO is an abbreviation for search engine optimization.

• SEO is the process of improving the volume or quality of traffic to a web site from search engines via search results.

• SEO aims to improve rankings for relevant keywords in search results.