Search Engine Interfaces search engine modus operandi

Search Engine Interfaces

search engine modus operandi

The basics: what’s a search engine?

Search engines are special websites that are designed to find information stored on other sites

Most have the following capabilities:

Search the Internet based on important words

Keep an index of the words they find and where they were found

Allow users to look for words or combinations of words in that index

There are a lot of sites out there…

Indeed (thousands upon thousands nowadays)

The first search engine was Archie (“archive” without the “v”), which indexed FTP archives

Later, after the rise of Gopher, came:

Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives)

Jughead (Jonzy’s Universal Gopher Hierarchy Excavation And Display)

There are a lot of sites out there…

Wandex - 1993 (first search engine for the Web)

WebCrawler - 1994 (let users search for any word in any page; revolutionary then, now standard)

Lycos - 1994 (Carnegie Mellon University)

Many others came after: Excite, Infoseek, Inktomi, Northern Light, AltaVista, Yahoo!…

Google launched in 1998 and rose to popularity around 2000 because of its innovative PageRank system

How does it work?

The pieces of a search engine:

A ‘spider’ or ‘crawler’: software “robots” that go out and visit pages on the web and build lists of the words that they find on each page

An index: the data (words) that are gathered are indexed (by a method determined by the particular search engine)

A search: usually accompanied by Boolean logic (AND, OR, NOT)
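The three pieces can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular engine works: the `PAGES` dictionary stands in for pages a spider has already fetched, and the example URLs and the names `build_index`, `search_and`, `search_or`, and `search_not` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a spider has already fetched (URL -> text).
PAGES = {
    "http://example.com/a": "Gopher menus and FTP archives",
    "http://example.com/b": "Web crawlers index pages on the Web",
    "http://example.com/c": "FTP archives predate the Web",
}


def words(text):
    """Split page text into lowercase words, like a spider's word list."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words(text):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """Boolean AND: pages that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Boolean OR: pages that contain at least one term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()


def search_not(index, all_urls, term):
    """Boolean NOT: pages that do not contain the term."""
    return set(all_urls) - index.get(term.lower(), set())


INDEX = build_index(PAGES)
print(sorted(search_and(INDEX, "ftp", "archives")))
print(sorted(search_not(INDEX, PAGES, "web")))
```

Storing a set of URLs per word makes each Boolean operator a plain set operation, which is why this shape of index is a common starting point.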

Example: Google

Claim to fame: the PageRank system

Uses multiple spiders (initially 3 at once)

Spiders take note of the words on the page and where they were found

The index consists of every “significant” word on each page; Google excludes the articles ‘a’, ‘an’, and ‘the’

Each indexed page is weighted according to the PageRank system (a link analysis algorithm that assigns each page a numerical weight)

Searching: when a user performs a search, Google retrieves from its index all of the pages that contain those keywords and sorts them according to their assigned ‘PageRank’

Ideally the first several sites listed will match your search criteria
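To make the weighting and sorting steps concrete, here is a small sketch using the textbook power-iteration form of PageRank. It is not Google's production system: the damping factor of 0.85, the iteration count, the four-page link graph, and the tiny `INDEX` are all assumptions chosen for illustration.

```python
STOP_WORDS = {"a", "an", "the"}  # the articles the slide says are excluded


def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank. links: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's weight evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def search(index, ranks, query):
    """Return pages containing every keyword, highest weight first."""
    terms = [w for w in query.lower().split() if w not in STOP_WORDS]
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches, key=lambda page: ranks[page], reverse=True)


# Hypothetical link graph: A, B and D all link to C; C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
# Hypothetical index: word -> pages containing it.
INDEX = {"search": {"A", "B", "C"}, "engine": {"A", "C", "D"}}

RANKS = pagerank(LINKS)
RESULTS = search(INDEX, RANKS, "the search engine")
print(RESULTS)
```

In this graph C collects links from three pages, so it outranks A even though both pages contain both keywords.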

Example: Ask (formerly AskJeeves)

Claim to fame: the ExpertRank algorithm (formerly Teoma)

Uses multiple spiders

Spiders take note of the words on the page and where they were found (same as Google)

The index consists of every “significant” word on each page

Uses link analysis like Google

Each page is then also analyzed to determine its popularity among pages that are considered “experts” on the topic of the search. This is called subject-specific popularity.

Searching - natural language search (or subject-specific search)

When a user performs a search, Ask finds the keywords in its index, figures out the topics (known as ‘clusters’) and the experts on those topics, and then finds the most popular results among those experts

This leads to a unique “editorial flavor” to searching (www.ask.com)
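The idea of subject-specific popularity can be illustrated with a toy example. This is not Ask's actual ExpertRank algorithm, whose details the slides do not give. It assumes a hypothetical rule (a page is an "expert" if it links to at least `min_links` pages in the topic cluster), and every page name below is made up.

```python
def subject_specific_popularity(links, cluster, min_links=2):
    """Rank cluster pages by how many "expert" pages link to them.

    links: page -> set of pages it links to.
    cluster: set of pages about the topic of the search.
    The min_links rule for spotting experts is an assumption for this sketch.
    """
    experts = {
        page for page, targets in links.items()
        if len(targets & cluster) >= min_links
    }
    score = {
        page: sum(1 for expert in experts if page in links[expert])
        for page in cluster
    }
    # Highest expert score first; ties broken by name for a stable order.
    ranked = sorted(cluster, key=lambda page: (-score[page], page))
    return experts, ranked


# Hypothetical link data: three topic guides plus four unrelated pages.
LINKS = {
    "guide1": {"jazz1", "jazz2"},
    "guide2": {"jazz1", "jazz3"},
    "guide3": {"jazz1", "jazz2", "cooking1"},
    "random1": {"jazz3"},
    "random2": {"jazz3"},
    "random3": {"jazz3"},
    "random4": {"jazz3"},
}
CLUSTER = {"jazz1", "jazz2", "jazz3"}

EXPERTS, RANKED = subject_specific_popularity(LINKS, CLUSTER)
print(sorted(EXPERTS))
print(RANKED)
```

Here `jazz3` has the most incoming links overall (five), but most of them come from pages that are not experts on the topic, so it ranks last among the experts' picks.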

Notable others: AltaVista and Lycos

The AltaVista search engine indexes every word on the page - even insignificant articles such as ‘a’, ‘an’, and ‘the.’

The Lycos search engine “is said to” index around 100 of the most frequently used words on the page as well as each word in the first 20 lines of text.
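The difference between the two indexing policies can be sketched as two functions that decide which words of a page get indexed. The function names and the tokenizing rule are assumptions for illustration. The defaults of 100 words and 20 lines come from the description above, which is itself hedged ("is said to").

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_terms_all(text):
    """Index every word on the page, articles included."""
    return set(tokenize(text))


def index_terms_selective(text, top_n=100, first_lines=20):
    """Index the top_n most frequent words plus every word in the first lines."""
    counts = Counter(tokenize(text))
    frequent = {word for word, _ in counts.most_common(top_n)}
    head = "\n".join(text.splitlines()[:first_lines])
    return frequent | set(tokenize(head))


page = "the cat sat\nthe cat ran\nthe dog\nzebra"
print(sorted(index_terms_all(page)))
# Small limits so the effect is visible on a four-line page.
print(sorted(index_terms_selective(page, top_n=2, first_lines=1)))
```

The selective policy keeps the index smaller at the cost of missing rare words that appear only late in a page, such as `zebra` in this example.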

So many options…

Google is the most used search engine on the Internet today. (Around 50% of queries go through it)

However, there are more efficient ways to search…

Ask.com’s subject-specific searching much better reflects the way the Web is set up (in subject-specific clusters). However, because of the complexity of its algorithm, the search results it produced were inferior to those of competitors such as Google with its PageRank system

Only recently has Ask begun to cut into the search engine market share (it is still way behind Google, Yahoo, and MSN) by reducing how well the keywords must match the results (from 100% to about 95%). This yields more search results and puts Ask in a better position to compete for market share.

By the numbers….

Below: Popularity (as of 12/07)

Right: Timeline of major launches

Search engines of the future….

Two types of searching: navigational search and research search

Navigational search - the user uses the search engine as a tool to navigate to a particular intended document

Research search - the user provides the search engine with a phrase intended to denote an object about which the user is trying to gather information

Rather than use ranking algorithms such as Google's PageRank to predict relevancy, Semantic Search uses semantics, or the science of meaning in language, to produce highly relevant search results. The goal is to deliver the information queried by a user rather than have a user sort through a list of loosely related keyword results.

Semantic Searching

Contingent upon correct semantic markup and on searching over richly structured data (e.g., XML and RDF)

The goal is to deliver the information queried by the user rather than have a user sort through a list of loosely related keyword results.

Examples: www.hakia.com and www.PowerSet.com
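Searching over structured data can be pictured with a toy store of subject-predicate-object triples, the shape of data that RDF describes. This sketch does not show how hakia or Powerset work. The `TRIPLES` list, the predicate names, and the `match` helper are assumptions, with facts taken from the earlier slides.

```python
# Hypothetical triple store: (subject, predicate, object).
TRIPLES = [
    ("Wandex", "launched", "1993"),
    ("WebCrawler", "launched", "1994"),
    ("Lycos", "launched", "1994"),
    ("Lycos", "origin", "Carnegie Mellon University"),
    ("Google", "uses", "PageRank"),
    ("Ask", "uses", "ExpertRank"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which search engines launched in 1994?" answered directly from the data.
LAUNCHED_1994 = [s for (s, _, _) in match(TRIPLES, predicate="launched", obj="1994")]
print(LAUNCHED_1994)

# "What does Google use?"
print([o for (_, _, o) in match(TRIPLES, subject="Google", predicate="uses")])
```

Because the data says what each value means, the query returns the answer itself rather than a list of pages that happen to contain the keywords.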