National Institute of Technology, Kurukshetra
Meta Search Engine using Distributed Information Retrieval
Submitted to: Mrs. Navneet Kaur, Asst. Professor, Computer Department
Submitted by: Romil, 108012, CO 1
13-Apr-12
Outline
  Web Search.
  Limitations of Simple Web Search and Search Engine.
  Meta Search.
  Meta Search Engine Architecture and Advantages.
  Examples of Meta Search Engine.
  IR and Distributed IR.
  Current Research Work in Meta Search Engine.
Requirement of Search Engine
  Content is created by diverse organizations and individuals.
  Information on the Web is inherently heterogeneous.
  Content is distributed on multiple servers in multiple locations, in multiple formats and languages, aimed at diverse audiences and purposes.
  The “Open Web” of billions of static Web pages is indexed and searched via multiple search engines and directories.
Web Search Engines...
  Fastsearch (alltheweb.com)
  Altavista (www.altavista.com)
  Google (www.google.com)
  Northernlight (www.northernlight.com)
  HotBot (www.hotbot.com)
  Excite (www.excite.com)
  New search engines:
    Teoma (http://www.teoma.com)
    Wisenut (http://www.wisenut.com)
Web Search Engines...
  Specialty search engines:
    Country-specific search engines:
      www.khoj.com
      www.123india.com
    Subject-specific search engines:
      Chemfinder (www.chemfinder.com)
      Engineering Resources Online (www.er-online.co.uk)
      MathSearch (www.maths.usyd.edu.au:8000/MathSearch.html)
      Netpart: Company site locator (www.websense.com/locator.cfm)
      World Trade Locator (www.intl-tradenet.com)
    Resource-specific search engines:
      Patents (www.uspto.gov)
      Journal articles (www.findarticles.com)
Problems in Web Search
  Even the largest of the current search engines index only a fraction of all Web pages.
  Search engines vary in terms of search techniques and syntax.
  Different search engines return different search results due to variation in their indexing and search processes (40% non-overlap).
  None of the search engines comes close to indexing the entire Web, much less the entire Internet.
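The slide does not say how the non-overlap figure was measured. As a minimal sketch, assuming non-overlap means the share of distinct results that only one of two engines returned, it could be computed as follows. The function name and the sample URLs are illustrative assumptions, not taken from the slides.

```python
def non_overlap(results_a, results_b):
    """Fraction of distinct results that appear in only one of the two lists.

    Assumed definition: 1 - |A intersect B| / |A union B|.
    """
    a, b = set(results_a), set(results_b)
    union = a | b
    if not union:
        return 0.0  # two empty result lists are treated as fully overlapping
    return 1 - len(a & b) / len(union)


# Hypothetical result lists from two engines for the same query.
engine_1 = ["u1", "u2", "u3", "u4", "u5"]
engine_2 = ["u3", "u4", "u5", "u6", "u7"]
print(round(non_overlap(engine_1, engine_2), 2))
```

Under this definition, a higher value means the engines complement each other more, which is the case a meta search engine is meant to exploit.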
Overlap Among 3 Major Search Engines
Why Are Meta Search Engines Useful?
  Meta search improves search quality in many ways:
    Comprehensive,
    Efficient,
    One query queries all (one-click paradigm).
Why Meta Search?
  Individual search engines don’t cover all of the Web by themselves.
  Individual search engines are prone to spamming: people try to raise their ranking in a non-legitimate manner or to promote commerce. So, paying sites can get a higher ranking.
  It is difficult to decide on, and obtain results from, combined searches on different search engines.
Differences (Search vs. Meta-search)
  A meta-search engine generally doesn’t have a database of its own.
  It does not search (crawl) the Web.
  In terms of search engines, a meta-search engine is essentially a hub of search engines/databases accessible through a common interface, providing the user with results which may or may not be ranked independently of the original search engine/source ranking.
Algorithm used in Meta Search Engine
  A meta search engine may simultaneously search multiple open-Web and hidden websites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness.
  It integrates best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool.
Meta Search Engine Architecture
[Diagram; labels: User, User Interface, Query, Feedback, Dispatcher, SE 1, SE 2, SE 3, Web, Display, Personalize, Knowledge]
Meta Search Engine Architecture
  User Interface
    Normally resembles a search engine interface, with options for:
      Types of search [Media]
      Search engines to use
  Dispatcher
    Generates the actual queries to the search engines from the user query.
    May involve choosing/expanding the search engines to use.
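The dispatcher role described above can be illustrated with a small sketch. It assumes each back-end engine is a plain function that takes a query string and returns a list of result URLs. The engine functions, the `ENGINES` registry and the `example` hostnames are hypothetical stand-ins, not part of any real search engine API.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search engine back ends.
def engine_a(query):
    return [f"a.example/{query}/1", f"a.example/{query}/2"]


def engine_b(query):
    return [f"b.example/{query}/1"]


ENGINES = {"A": engine_a, "B": engine_b}


def dispatch(query, selected=None):
    """Send the query to each selected engine in parallel.

    Returns a dict mapping engine name to that engine's result list.
    If no engines are selected, all registered engines are used.
    """
    names = list(selected) if selected else list(ENGINES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {name: pool.submit(ENGINES[name], query) for name in names}
        return {name: future.result() for name, future in futures.items()}


print(dispatch("meta search"))
print(dispatch("meta search", selected=["B"]))
```

A real dispatcher would also translate the user query into each engine's own syntax and handle timeouts. Both are omitted here to keep the sketch short.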
Meta Search Engine Architecture
  Display
    Generates the results page from the replies received.
    May involve ranking, parsing and clustering of the search results, or just plain stitching.
  Personalization/Knowledge
    May contain either or both. Personalization may involve weighting of search results, queries or engines for each user.
Clustering in Meta Search Engine
  Clustering organizes search results into categories or folders to build a clear, concise picture for users. By using a clustering service in a meta search engine, users can comfortably explore much more information in an organized way, rather than being bombarded with disorganized information dumps.
  www-math.mit.edu/cluster/
  www.Vivisimo.com
Screen Output of Vivisimo
Introduction to IR
  What is Information Retrieval (IR)?
    Indexing text and searching for useful documents in a collection.
    Searching documents on the Web.
    Given a query, retrieving relevant documents efficiently.
    Commercially successful (Google, Yahoo, MSN, Ask Jeeves, etc.).
Information Retrieval
  Information Retrieval is the science, study and practice of how humans seek information.
  Information seeking is complex human behavior, in which some sort of cognitive change is sought.
  The nature of information is similarly complex. Does it exist apart from a human observer? Why is one person’s “data” another person’s “information”? Can we measure the information content of a message, or is that only for the telephone engineers (like Shannon & Weaver)?
Why IR is not reliable
  For all its performance, modern search is often unsatisfying: finding the information you want is difficult.
  IR systems use queries as expressions of information need. But such expressions are necessarily inexact:
    human language is imprecise
    queries are usually short, but might represent complex needs
    a person’s history and background will affect which information is useful
  A document != information
Ranking Method in Conventional Search Engine
  A query has t terms (i.e., words). To get a relevance score for an entire document j, we treat the query and the document as vectors of term weights, normalize each (divide by its vector norm) and compute the cosine (which is the dot product of the normalized vectors).
  With non-negative term weights, the cosine ranges from 0 to 1; 0 is orthogonal (“unrelated”), 1 is a perfect match. Rank in descending order and present results.
Numerically…
  Imagine a two-word query.
  Document d1 has a weighted score of 1 for term 1, and 2 for term 2: vector [1, 2].
  Query terms are weighted as vector [1, 3].
  We first normalize the document to vector [.45, .89], and the query to vector [.32, .95].
  Then get cosine = .99.
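The cosine ranking can be checked with a few lines of code. This is a minimal sketch: the document vector [1, 2] and query vector [1, 3] come from the slide, while the second document `d2` and the function names are assumptions added only so there is something to rank. For the slide's vectors the cosine works out to 7 / sqrt(50), about 0.99.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term-weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no terms with anything
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (document name, score) pairs in descending order of cosine score."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = [1, 3]
docs = {
    "d1": [1, 2],  # the document from the slide
    "d2": [3, 1],  # hypothetical second document, for comparison
}
ranking = rank(query, docs)
for name, score in ranking:
    print(name, round(score, 4))
```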
Ranking in Meta Search Engine
  In an election, each of a large number k of voters ranks a small number n of candidates. In web meta-search, in response to a given query, each of a small number k of search engines (voters) ranks a (subset of a) large number n of pages (candidates). The results are then combined in some fashion to produce a ranking that is in some sense "better" than the results produced by any single search engine.
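The slide leaves the combination rule open ("in some fashion"). One classic voting rule that fits the election analogy is the Borda count, used here purely as an illustrative assumption: each engine gives a page more points the higher it ranks it, and a page an engine did not return gets no points from that engine. The page names are hypothetical.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge several ranked lists of pages with a Borda-style count.

    rankings: one list per engine, best page first.
    With n distinct pages overall, the page at position p (0-based) in a
    list earns n - p points; pages missing from a list earn 0 from it.
    Ties are broken alphabetically so the output is deterministic.
    """
    pages = {page for ranking in rankings for page in ranking}
    n = len(pages)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, page in enumerate(ranking):
            scores[page] += n - position
    return sorted(pages, key=lambda page: (-scores[page], page))


# Three hypothetical engines, each ranking a subset of four pages.
engine_rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(borda_merge(engine_rankings))
```

Other rules (for example, weighting engines differently per user) would slot into the same structure by changing how points are assigned.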
Distributed Information Retrieval
[Diagram; labels: Information Need, ?, Engine 1, Engine 2, Engine 3, Engine 4, ..., Engine n]
Distributed Information Retrieval
  Site description: contents, search engine, services, etc.
  Resource ranking: ranking resources by how likely they are to contain the desired content.
  Resource selection: selecting the best subset from a ranked list.
  Result merging: merging a set of document rankings based on
    different underlying corpus statistics
    different search engines
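The resource selection and result merging steps above can be sketched as follows. The slides do not specify a merging method. This sketch assumes that each resource already has a relevance estimate, that each engine returns (document, score) pairs on its own scale, and that min-max normalization is an acceptable way to make those scores comparable. All names and numbers are hypothetical.

```python
def select_resources(resource_scores, k):
    """Resource ranking + selection: keep the k resources with the highest estimate."""
    ranked = sorted(resource_scores, key=resource_scores.get, reverse=True)
    return ranked[:k]


def normalize(results):
    """Min-max normalize one engine's (doc, score) list to the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in results]


def merge(results_by_engine):
    """Result merging: keep each document's best normalized score, then sort."""
    best = {}
    for results in results_by_engine.values():
        for doc, score in normalize(results):
            best[doc] = max(score, best.get(doc, 0.0))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical estimates of how likely each resource is to hold relevant content.
resource_scores = {"news": 0.2, "papers": 0.9, "patents": 0.6}
chosen = select_resources(resource_scores, k=2)

# Hypothetical raw results; the two engines score on different scales.
raw_results = {
    "papers": [("p1", 12.0), ("p2", 6.0), ("p3", 0.0)],
    "patents": [("t1", 0.75), ("p2", 0.5), ("t2", 0.25)],
}
merged = merge({name: raw_results[name] for name in chosen})
print(chosen)
print(merged)
```

Normalizing per engine is the simplest answer to the "different corpus statistics" problem. It ignores the fact that one engine's best document may be much weaker than another's, which more careful merging schemes try to account for.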
Distributed IR Process
Distributed IR benefits
  Primary Motivations for Distributed IR
    Partition large collections across processors
      To increase speed
      Because of political or administrative requirements
    Ever-increasing amounts of data
    Networks, with hundreds or thousands of collections
      Consider the number of collections indexed on the Web
    Heterogeneous environments, many IR systems
    Economic costs of searching everything at a site
    Economic costs of searching everything on a network
Current Research & Innovations in Meta Search Engine
  In the research paper “Using Relevance Feedback In Content Based Image Metasearch” by Ana B. Benitez, Mandis Beigi and Shih-Fu Chang (Columbia University), in IEEE Internet magazine, the authors propose a meta search engine named MetaSeek.
  MetaSeek is an image meta search engine developed to explore the querying of large, distributed, online visual information systems. It automatically links the user to multiple image search engines for online resources.
Future Work
  Customized Search: The MetaSeek engine can be further improved by adding capabilities such as support for customized search. QBIC and VisualSeek allow the user to customize the search by manually specifying visual sketches as query input. The customized search on these two systems is supported for colour percentages and colour layout, which allow the user to specify different colour amounts or different colour locations, respectively.
Future Work
  Learning Algorithm: MetaSeek is a system designed to learn to recommend the most suitable search engines for incoming user queries. It has improved its performance, as measured by the retrieval of better search results in less time, through experience obtained from user feedback. Clearly, MetaSeek can be classified as a machine-learning problem, and the authors are reviewing approaches from machine learning and related research areas.
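To make the idea of learning engine recommendations from user feedback concrete, here is a deliberately simplified sketch. It is not MetaSeek's actual algorithm: it assumes queries are already grouped into classes and keeps one running feedback score per (query class, engine) pair. The class name, engine names and query classes are all hypothetical.

```python
from collections import defaultdict


class EngineRecommender:
    """Recommend engines per query class from accumulated user feedback."""

    def __init__(self, engines):
        self.engines = list(engines)
        # (query_class, engine) -> cumulative feedback score, starting at 0
        self.scores = defaultdict(int)

    def feedback(self, query_class, engine, liked):
        """Record that the user liked (+1) or disliked (-1) an engine's results."""
        self.scores[(query_class, engine)] += 1 if liked else -1

    def recommend(self, query_class, k=2):
        """Return the k engines with the best feedback for this query class.

        sorted() is stable, so engines with equal scores keep their
        original registration order.
        """
        ranked = sorted(self.engines, key=lambda e: -self.scores[(query_class, e)])
        return ranked[:k]


recommender = EngineRecommender(["engine_a", "engine_b", "engine_c"])
recommender.feedback("sunset", "engine_c", liked=True)
recommender.feedback("sunset", "engine_c", liked=True)
recommender.feedback("sunset", "engine_a", liked=False)
print(recommender.recommend("sunset"))
print(recommender.recommend("unseen"))
```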
Future Work
  Sentiment-based searching is a challenging area that needs further research to meet the needs of users. Using the current prototype system, various aspects and applications of sentiment-based searching on the World Wide Web can be explored. The following are some features that can be improved or added in future work:
    More sophisticated review filtering: another classifier can be developed to classify a document as a review or non-review.
References
(1) Chen, H., & Dumais, S.T. (2000). Bringing Order to the Web: Automatically Categorizing Search Results. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'00) (pp. 145-152).
(2) J.R. Smith and S. Chang, “VisualSeek: A Fully Automated Content-Based Image Query System,” Proc. ACM Conf. Multimedia, ACM Press, New York, 1996; also available at http://www.ctr.columbia.edu/VisualSeek/.
(3) A sentiment-based meta search engine. In C. Khoo, D. Singh & A.S. Chaudhry (Eds.), Proceedings of the Asia-Pacific Conference on Library & Information Education & Practice 2006 (A-LIEP 2006), Singapore, 3-6 April 2006 (pp. 83-89).
(4) “Using Relevance Feedback In Content Based Image Metasearch” by Ana B. Benitez, Mandis Beigi and Shih-Fu Chang, Columbia University, in IEEE Internet magazine.
(5) http://tamas.nlm.nih.gov/metasearch/ and http://toxseek.nlm.nih.gov, Tamas Doszkocs, Ph.D., Computer Scientist, National Library of Medicine.
(6) “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Sergey Brin and Lawrence Page. http://www-db.stanford.edu/~backrub/google.html
Queries?