National Institute Of Technology, Kurukshetra

Meta Search Engine using Distributed Information Retrieval

Submitted to: Mrs. Navneet Kaur, Asst. Professor, Computer Department

Submitted by: Romil, 108012, CO 1

13-Apr-12


Outline

Web Search.
Limitations of Simple Web Search and Search Engine.
Meta Search.
Meta Search Engine Architecture and Advantages.
Examples of Meta Search Engine.
IR and Distributed IR.
Current Research Work in Meta Search Engine.


Requirement of Search Engine

Content is created by diverse organizations and individuals.
Information on the Web is inherently heterogeneous.
Content is distributed on multiple servers in multiple locations, in multiple formats and languages, aimed at diverse audiences and purposes.
The “Open Web” of billions of static Web pages is indexed and searched via multiple search engines and directories.


Web Search Engines...

Fastsearch (alltheweb.com)
Altavista (www.altavista.com)
Google (www.google.com)
Northernlight (www.northernlight.com)
HotBot (www.hotbot.com)
Excite (www.excite.com)

New search engines:
  Teoma (http://www.teoma.com)
  Wisenut (http://www.wisenut.com)


Web Search Engines...

Specialty search engines:

Country-specific search engines:
  www.khoj.com
  www.123india.com

Subject-specific search engines:
  Chemfinder (www.chemfinder.com)
  Engineering Resources Online (www.er-online.co.uk)
  MathSearch (www.maths.usyd.edu.au:8000/MathSearch.html)
  Netpart: Company site locator (www.websense.com/locator.cfm)
  World Trade Locator (www.intl-tradenet.com)

Resource-specific search engines:
  Patents (www.uspto.gov)
  Journal articles (www.findarticles.com)


Problems in Web Search

Even the largest of the current search engines indexes only a fraction of all Web pages.
Search engines vary in terms of search techniques and syntax.
Different search engines return different search results due to variation in their indexing and search processes (40% non-overlap).
None of the search engines comes close to indexing the entire Web, much less the entire Internet.


Overlap Among 3 Major Search Engines 


Why Are Meta Search Engines Useful?

Meta Search improves the search quality in many ways:
  Comprehensive,
  Efficient,
  One query queries all {one-click paradigm}.


Why Meta Search?

Individual search engines don’t cover the whole Web by themselves.
Individual search engines are prone to spamming: people try to raise their ranking profile in non-legitimate ways or to promote commerce, so paying sites can get a higher ranking.
It is difficult to decide on and obtain results from combined searches on different search engines.


Differences {Search Vs. Meta-search}

A meta-search engine, compared with a search engine:
  Doesn’t generally have a database of its own.
  Does not search {crawl} the Web.
  Essentially is a hub of search engines/databases accessible through a common interface, providing the user with results which may or may not be ranked independently of the original search engine/source ranking.


Algorithm used in Meta Search Engine

A meta search engine may simultaneously search multiple open web and hidden web sites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness.
It integrates best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool.


Meta Search Engine Architecture

[Figure: architecture diagram with the labelled components User, User Interface, Query, Feedback, Personalize, Knowledge, Dispatcher, Display, search engines SE 1, SE 2 and SE 3, and the Web.]


Meta Search Engine Architecture

User Interface
  Normally resembles search engine interfaces, with options for:
    Types of search [Media]
    Search engines to use

Dispatcher
  Generates the actual queries to the search engines from the user query.
  May involve choosing/expanding the set of search engines to use.


Meta Search Engine Architecture

Display
  Generates the results page from the replies received.
  May involve ranking, parsing, clustering of the search results or just plain stitching.

Personalization/Knowledge
  May contain either or both. Personalization may involve weighting of search results/query/engine for each user.
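
The dispatcher and display roles described in the architecture slides can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `engine_a` and `engine_b` are hypothetical stand-ins for real search engines, and the display step only does the "plain stitching" case, dropping duplicate URLs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines: each takes a query string
# and returns an ordered list of (url, title) results.
def engine_a(query):
    return [("http://a.example/1", "A1 " + query),
            ("http://shared.example/x", "Shared")]

def engine_b(query):
    return [("http://shared.example/x", "Shared"),
            ("http://b.example/2", "B2 " + query)]

ENGINES = {"engine_a": engine_a, "engine_b": engine_b}

def dispatch(query, engine_names):
    """Dispatcher: send the user query to every selected engine in parallel."""
    with ThreadPoolExecutor(max_workers=len(engine_names)) as pool:
        futures = {name: pool.submit(ENGINES[name], query) for name in engine_names}
        return {name: fut.result() for name, fut in futures.items()}

def display(replies):
    """Display: plain stitching of the replies, dropping duplicate URLs."""
    seen, page = set(), []
    for name, results in replies.items():
        for url, title in results:
            if url not in seen:
                seen.add(url)
                page.append((url, title, name))
    return page

replies = dispatch("meta search", ["engine_a", "engine_b"])
page = display(replies)
for url, title, source in page:
    print(source, url, title)
```

A real dispatcher would also translate the query into each engine's own syntax and handle timeouts; both are left out of this sketch.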


Clustering in Meta Search Engine

Organize search results into categories or folders to build a clear, concise picture for users. With a clustering service in a meta search engine, users can comfortably explore much more information in an organized way, rather than being bombarded with disorganized information dumps.

www-math.mit.edu/cluster/
www.Vivisimo.com


Screen Output of Vivisimo


Introduction to IR

What is Information Retrieval (IR)?
  Indexing text and searching for useful documents in a collection.
  Searching documents on the web.
  Given a query, retrieving relevant documents efficiently.
  Commercially successful (Google, Yahoo, MSN, Ask Jeeves, etc).


Information Retrieval

Information Retrieval is the science, study and practice of how humans seek information.
Information seeking is complex human behavior, in which some sort of cognitive change is sought.
The nature of information is similarly complex. Does it exist apart from a human observer? Why is one person’s “data” another person’s “information”? Can we measure the information content of a message, or is that only for the telephone engineers (like Shannon & Weaver)?


Why IR is not reliable

For all its performance, modern search is often unsatisfying: finding the information you want is difficult.
IR systems use queries as expressions of information need. But such expressions are necessarily inexact:
  human language is imprecise
  queries are usually short, but might represent complex needs
  a person’s history and background will affect which information is useful
A document != information


Ranking Method in Conventional Search Engine

A query has t terms (i.e., words). To get a relevance score for an entire document j, we treat the query and the document as vectors of term weights, normalize each (divide by its vector norm) and compute the cosine (which is the dot product of the normalized vectors).
For non-negative term weights, cosine ranges from 0 to 1; 0 is orthogonal (“unrelated”), 1 is a perfect match. Rank documents in descending order of cosine and present the results.


Numerically…

Imagine a two-word query.
Document d1 has a weighted score of 1 for term 1, and 2 for term 2: vector[1,2].
Query terms are weighted vector[1,3].
We first normalize the document to vector[.45,.89], and the query to vector[.32,.95].
Then get cosine = 7/√50 ≈ .9899.
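
The cosine ranking described on the last two slides can be sketched in a few lines of Python. The query vector [1,3] and document d1 = [1,2] come from the slide; documents `d2` and `d3` are hypothetical additions so that there is something to rank.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)

query = [1, 3]
docs = {"d1": [1, 2], "d2": [3, 0], "d3": [0, 1]}  # d2 and d3 are hypothetical

# Score every document against the query, then rank in descending order.
scores = {name: cosine(vec, query) for name, vec in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 4))
```

For d1 the dot product is 1·1 + 2·3 = 7 and the norms are √5 and √10, so the score is 7/√50, roughly 0.99.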


Ranking in Meta Search Engine

In an election, each of a large number k of voters ranks a small number n of candidates. In web meta-search, in response to a given query, each of a small number k of search engines (voters) ranks a (subset of a) large number n of pages (candidates). The results are then combined in some fashion to produce a ranking that is in some sense "better" than the results produced by any single search engine.
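
The slide does not say which voting rule is used to combine the rankings. As one possible illustration, the sketch below applies a Borda count, a classic election rule; the choice of Borda, the engine lists and the page names are all assumptions.

```python
from collections import defaultdict

def borda_merge(rankings):
    """Combine ranked lists (best first) with a Borda count.

    A page in position i of a list of length m earns m - i points from that
    engine; pages an engine did not return earn nothing from it.
    """
    points = defaultdict(int)
    for ranked_pages in rankings:
        m = len(ranked_pages)
        for i, page in enumerate(ranked_pages):
            points[page] += m - i
    # Sort by points (descending), breaking ties alphabetically.
    return sorted(points, key=lambda page: (-points[page], page))

engine_rankings = [
    ["p1", "p2", "p3"],  # hypothetical engine 1
    ["p2", "p1", "p4"],  # hypothetical engine 2
    ["p2", "p3", "p1"],  # hypothetical engine 3
]
merged = borda_merge(engine_rankings)
print(merged)
```

Here p2 collects 2 + 3 + 3 = 8 points and p1 collects 3 + 2 + 1 = 6, so p2 is ranked first even though engine 1 preferred p1.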


Distributed Information Retrieval

[Figure: an information need (“?”) posed against Engine 1, Engine 2, Engine 3, Engine 4, ..., Engine n.]


Distributed Information Retrieval

Site description: contents, search engine, services, etc.
Resource ranking: ranking resources by how likely they are to contain the desired content.
Resource selection: selecting the best subset from a ranked list.
Result merging: merging a set of document rankings based on:
  different underlying corpus statistics
  different search engines
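
The steps listed above (resource ranking, resource selection, result merging) can be sketched end to end. This is only an illustration under stated assumptions: the term-overlap score used for resource ranking and the min-max normalization used for merging are simple stand-ins chosen here, not methods named in the slides, and all resource names, descriptions and scores are hypothetical.

```python
def rank_resources(query_terms, descriptions):
    """Resource ranking: score each resource by how many query terms its
    description contains (a deliberately crude, hypothetical measure)."""
    scores = {
        name: sum(1 for term in query_terms if term in words)
        for name, words in descriptions.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))

def select_resources(ranked, k):
    """Resource selection: keep the best k resources from the ranked list."""
    return ranked[:k]

def merge_results(results_by_resource):
    """Result merging: min-max normalize each resource's scores to [0, 1]
    so that scores from different engines become comparable, then sort."""
    merged = []
    for results in results_by_resource.values():
        if not results:
            continue
        values = [score for _, score in results]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero when all scores tie
        merged += [(doc, (score - lo) / span) for doc, score in results]
    return sorted(merged, key=lambda item: (-item[1], item[0]))

descriptions = {
    "chem": {"chemistry", "compound", "molecule"},
    "patents": {"patent", "invention", "compound"},
    "news": {"politics", "sports"},
}
ranked = rank_resources(["compound", "patent"], descriptions)
chosen = select_resources(ranked, k=2)

# Hypothetical raw results; each engine scores on its own scale.
raw = {
    "patents": [("pat-7", 12.0), ("pat-2", 4.0)],
    "chem": [("chem-1", 0.9), ("chem-5", 0.6), ("chem-3", 0.3)],
}
merged = merge_results({name: raw[name] for name in chosen})
print(chosen, [doc for doc, _ in merged])
```

Normalizing per resource is the design choice that matters here: raw scores of 12.0 and 0.9 come from different scales and cannot be compared directly.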


Distributed IR Process


Distributed IR benefits

Primary motivations for Distributed IR:

Partition large collections across processors
  To increase speed
  Because of political or administrative requirements
Ever-increasing amounts of data
Networks, with hundreds or thousands of collections
  Consider the number of collections indexed on the Web
Heterogeneous environments, many IR systems
Economic costs of searching everything at a site
Economic costs of searching everything on a network


Current Research & Innovations in Meta Search Engine

In the research paper “Using Relevance Feedback In Content Based Image Metasearch” by Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang (Columbia University), published in the IEEE Internet Computing magazine, the authors propose a meta search engine named MetaSeek.
MetaSeek is an image meta search engine developed to explore the querying of large, distributed, online visual information systems. It automatically links the user to multiple image search engines for online resources.


Future Work

Customized Search: The MetaSeek engine can be further improved by adding capabilities such as support for customized search. QBIC and VisualSeek allow the user to customize the search by manually specifying visual sketches as query input. The customized search on these two systems is supported for colour percentages and colour layout, which allow the user to specify different colour amounts or different colour locations, respectively.


Future Work

Learning Algorithm: MetaSeek is a system designed to learn to recommend the most suitable search engines for incoming user queries. It has improved its performance, as measured by the retrieval of better search results in less time, through experience obtained from user feedback. Clearly, MetaSeek can be classified as a machine-learning problem, and its authors are reviewing approaches from related machine-learning research areas.
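
The idea of learning which engines to recommend from user feedback can be illustrated with a very small sketch. This is not MetaSeek's actual algorithm: the class `EngineRecommender`, the +1/-1 update rule, the notion of a "query class" and the engine names are all hypothetical choices made for the example.

```python
from collections import defaultdict

class EngineRecommender:
    """Keeps a running score per (query class, engine) pair and recommends
    the engines with the highest scores for a class of queries."""

    def __init__(self, engines):
        self.engines = list(engines)
        self.scores = defaultdict(float)  # (query_class, engine) -> score

    def feedback(self, query_class, engine, liked):
        # Positive feedback raises the engine's score, negative lowers it.
        self.scores[(query_class, engine)] += 1.0 if liked else -1.0

    def recommend(self, query_class, k=1):
        ranked = sorted(
            self.engines,
            key=lambda e: self.scores[(query_class, e)],
            reverse=True,
        )
        return ranked[:k]

rec = EngineRecommender(["engine_a", "engine_b", "engine_c"])
rec.feedback("sunsets", "engine_b", liked=True)
rec.feedback("sunsets", "engine_b", liked=True)
rec.feedback("sunsets", "engine_a", liked=False)
best = rec.recommend("sunsets", k=2)
print(best)
```

After this feedback, engine_b (score 2) and the untouched engine_c (score 0) are recommended ahead of engine_a (score -1).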


Future Work

Sentiment-based searching is a challenging area that needs further research to meet the needs of users. Using the current prototype system, various aspects and applications of sentiment-based searching on the World Wide Web can be explored. The following are some features that can be improved or added in future work:
  More sophisticated review filtering: another classifier to classify a document as a review or non-review can be developed.


References

(1) Chen, H., & Dumais, S.T. (2000). Bringing Order to the Web: Automatically Categorizing Search Results. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'00) (pp. 145-152).
(2) J.R. Smith and S. Chang, “VisualSeek: A Fully Automated Content-Based Image Query System,” Proc. ACM Conf. Multimedia, ACM Press, New York, 1996; also available at http://www.ctr.columbia.edu/VisualSeek/.
(3) A sentiment-based meta search engine. In C. Khoo, D. Singh & A.S. Chaudhry (Eds.), Proceedings of the Asia-Pacific Conference on Library & Information Education & Practice 2006 (A-LIEP 2006), Singapore, 3-6 April 2006 (pp. 83-89).
(4) “Using Relevance Feedback In Content Based Image Metasearch” by Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang, Columbia University, in IEEE Internet Computing magazine.
(5) http://tamas.nlm.nih.gov/metasearch/ and http://toxseek.nlm.nih.gov, Tamas Doszkocs, Ph.D., Computer Scientist, National Library of Medicine.
(6) “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Sergey Brin and Lawrence Page. http://www-db.stanford.edu/~backrub/google.html


Queries???