Indexing and Search Engines Cse(n2) By Jasleen Kaur R. no-115340 Cse(n2)

Page 1: Indexing and search engines

Indexing and Search Engines

Cse(n2)

By

Jasleen Kaur

R. no-115340

Cse(n2)

Page 2: Indexing and search engines

OVERVIEW

• Introduction
• Types of Searching
• Parts of a Local Search Engine
• Working of a Local Search Engine
• Choosing a Search Engine
• Conclusion
• References

Page 3: Indexing and search engines

Introduction – Searching and Search Engines

A good site is one in which ‘content is king’

A lot of information makes a site large and complex, and makes navigation difficult

Search is the user's lifeline for mastering complex websites

A search feature is essential for users who revisit a site looking for specific information

Page 4: Indexing and search engines

Introduction – Searching and Search Engines

Search is also users' escape hatch when they are stuck in navigation. When they can't find a reasonable place to go next, they often turn to the site's search function.

This is why site search is an important feature of any site of reasonable size

Page 5: Indexing and search engines

Types Of Searching

• A search can be of various types:

Internet search: Search engines such as Yahoo and Infoseek crawl the web, gathering web pages or information about them, index them, and retrieve them when a search term is found in them.

Database search: Databases store their information neatly organized into fields. A search interface is provided for querying them.

Page 6: Indexing and search engines

Types Of Searching

With databases one can set up complex queries to find the search words in all applicable fields.

But this makes them slower to respond, requires more memory, and requires programming.

Database search is not oriented towards text search and relevance ranking; it is best suited to listings such as an inventory or an institute's directory

Page 7: Indexing and search engines

Types Of Searching

Intranet search: Search is restricted to a site or a group of sites.

Text search engines store the text of every field in a single index, so they can find words in any field of a record.

Many high-end search engines can also store field information, so searches can be limited to a specific field as well.
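
A minimal Python sketch of the idea above: one index holds words from every field, and the stored field name lets a search be limited to a single field. The record layout, field names and function names are assumptions made for illustration only.

```python
from collections import defaultdict

def build_field_index(records):
    """Map each word to the set of (record_id, field) pairs where it occurs."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word].add((record_id, field))
    return index

def search(index, word, field=None):
    """Return sorted record ids containing word, optionally in one field only."""
    hits = index.get(word.lower(), set())
    return sorted({rid for rid, f in hits if field is None or f == field})

# Hypothetical records with two fields each.
records = {
    "doc1": {"title": "Library hours", "body": "The library opens at nine"},
    "doc2": {"title": "Campus map", "body": "Directions to the library"},
}
index = build_field_index(records)
any_field = search(index, "library")                   # matches in any field
title_only = search(index, "library", field="title")   # matches in titles only
```

Storing the field name alongside each posting is what costs the extra space mentioned later under field level searching.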

Page 8: Indexing and search engines

Parts of a Local Site Search Tool

Search Indexer: The program that reads all the documents on the site and creates an index of them. The index is stored in a file called the index file, where the search engine will find it.

Search Index File: Created by the Search Indexer program, this file stores the data from the site in a special index or database designed for very quick access.
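
The indexer and index file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents are given as an in-memory dictionary, the index is a simple word-to-pages map, and JSON is used as the file format; real site search tools use their own formats.

```python
import json
import re
import tempfile
from pathlib import Path

def build_index(documents):
    """Return {word: sorted list of document paths that contain the word}."""
    index = {}
    for path, text in documents.items():
        # Each distinct word in a document is recorded once for that document.
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, []).append(path)
    return {word: sorted(paths) for word, paths in index.items()}

def save_index(index, index_file):
    """Write the index to the index file as JSON (format is an assumption)."""
    Path(index_file).write_text(json.dumps(index), encoding="utf-8")

def load_index(index_file):
    """Read the index back, as the search engine would at query time."""
    return json.loads(Path(index_file).read_text(encoding="utf-8"))

# Hypothetical site documents.
documents = {
    "/about.html": "About the department of computer science",
    "/courses.html": "Computer science courses and timetable",
}
index_file = Path(tempfile.mkdtemp()) / "site_index.json"
save_index(build_index(documents), index_file)
loaded = load_index(index_file)
```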

Page 9: Indexing and search engines

Parts of a Local Site Search Tool

Search Form: The HTML interface to the site search tool, provided for visitors to enter their search terms and specify their preferences for the search

Search Engine: The program (CGI, server module or separate server) that accepts the request from the form or URL, searches the index, and returns the results page to the web server

Page 10: Indexing and search engines

Parts of a Local Site Search Tool

Results Listing: The HTML page listing the pages which contain text matching the search term(s). These are sorted in some kind of relevance order, with the closest match at the top. The format is often defined by the site search tool, but may be modified in some ways.
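
A rough Python sketch of the last two parts: a search function that scores pages and a results listing rendered as HTML with the closest match first. Counting term occurrences is an assumed, deliberately simple relevance measure; the page contents and function names are made up for the example.

```python
from html import escape

def rank(pages, query):
    """Score each page by how many times the query terms occur in it."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scored, key=lambda item: (-item[0], item[1]))

def results_page(pages, query):
    """Render the ranked matches as a simple HTML results listing."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a> (score {score})</li>'
        for score, url in rank(pages, query)
    )
    return f"<h1>Results for {escape(query)}</h1><ol>{items}</ol>"

# Hypothetical pages of a small site.
pages = {
    "/a.html": "search engines index pages",
    "/b.html": "search search tips for site search",
    "/c.html": "contact details",
}
ranking = rank(pages, "search")
html_page = results_page(pages, "search")
```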

Page 11: Indexing and search engines

Working of a Local Search Engine

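[Diagram: flow of a local site search]

Indexing: the Indexer gets words from the Web Site Documents and stores them in the Index.

Searching: the Search Form sends the query to the Search Engine, which looks in the Index, gets the matches, and sends formatted results to the Results Page.

Retrieval: the user selects the required page from the Results Page and views the retrieved page.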

Page 12: Indexing and search engines

Types of Search Engines

CGI Programs: The Common Gateway Interface (CGI) standard allows a web server to communicate with external programs. A search engine can run as one of these CGI programs.

Server Plug-Ins: For better data interchange, less overhead and more flexibility, web server companies have defined APIs (Application Programming Interfaces) to their servers. These allow third-party developers to create modules which run inside the server process
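
To make the CGI case concrete, here is a minimal Python sketch of a CGI-style program. Under CGI the web server passes the query string in the `QUERY_STRING` environment variable and reads the response from the program's standard output. The form parameter name `q` and the function name are assumptions, and the actual index lookup is left out.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

def handle_request(environ, out):
    """Read the query from the CGI variables and write the response to out."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]  # "q" is an assumed form field name
    # A CGI response is a header block, a blank line, then the body.
    out.write("Content-Type: text/html\r\n\r\n")
    out.write(f"<p>You searched for: {escape(query)}</p>\n")

if __name__ == "__main__":
    # When run by a web server, the environment carries the request details.
    handle_request(os.environ, sys.stdout)
```

Because a CGI program is started afresh for each request, a server plug-in that stays loaded inside the server process avoids that start-up overhead.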

Page 13: Indexing and search engines

Types of Search Engines

Search Servers: Some search engines run as separate servers. The form data is passed as part of the URL, but the search engine application runs as a separate HTTP server on a different machine. This reduces the load on the main web server.

Remote Searching: It is also possible to outsource search to a remote site search service. The indexer and search engine run on the remote server. Using a web indexing robot, or spider, the service follows links on the site, reads the pages, and stores every word in the index file on that server. At search time, the form on the site's web page sends a message to the remote search engine, which sends results back to the site.
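
The spider's link-following step can be sketched in Python with the standard library only. To keep the example self-contained, pages are fetched from an in-memory dictionary rather than over the network; the URLs, the `fetch` callable and the function names are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def same_site_links(base_url, html_text):
    """Return the links on a page that stay on the same host."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]

def crawl(start_url, fetch):
    """Visit every same-site page reachable from start_url; fetch(url) returns HTML."""
    seen, queue, pages = {start_url}, [start_url], {}
    while queue:
        url = queue.pop(0)
        pages[url] = fetch(url)
        for link in same_site_links(url, pages[url]):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Hypothetical three-page site with one off-site link.
site = {
    "http://site.example/": '<a href="a.html">A</a> <a href="http://other.example/">X</a>',
    "http://site.example/a.html": '<a href="/">Home</a> <a href="b.html">B</a>',
    "http://site.example/b.html": "No links here",
}
pages = crawl("http://site.example/", lambda url: site.get(url, ""))
```

The text of each crawled page would then be passed to the indexer on the remote server.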

Page 14: Indexing and search engines

Choosing a Site Search Tool

• Technical considerations
• Indexing features
• Searching capabilities
• Results display
• Costs, licensing and registration requirements
• Unique features (if any)

Page 15: Indexing and search engines

Features of search engines: Technical Considerations

• Server platforms supported: Unix, NT, Win'95/98/NT
• Web servers supported: NCSA HTTPD, CERN HTTPD, OMNI HTTPD, XITAMI, APACHE, PWS, IIS
• Scalability: Indexing support for multiple web servers within an intranet
• Technical support: E-mail, mailing list, documentation on web site
• Main program modules
• Source code availability
• Ease of installation and maintenance: Often related to the technical expertise available

Page 16: Indexing and search engines

Features of search engines: Indexing features

• File/document formats supported: HTML, ASCII, PDF, SQL, spreadsheets, WYSIWYG (MS-Word, WP, etc.)
• Indexing level support: File/directory level, multi-record files
• Standard formats recognised: MARC, Medline, etc.
• Customisation of document formats
• Stemming: If yes, is this an optional or mandatory feature?
• Stop words support: If yes, is this an optional or mandatory feature?
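
Stemming and stop words, both listed above as features that may be optional, can be illustrated with a short Python sketch. The stop word list and the suffix-stripping rule are crude assumptions chosen for the example; production indexers use proper stemming algorithms and larger, configurable word lists.

```python
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for"}  # assumed list
SUFFIXES = ("ing", "s")  # checked in this order

def stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text, use_stop_words=True, use_stemming=True):
    """Return the terms that would be indexed, with both features optional."""
    words = text.lower().split()
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [stem(w) for w in words]
    return words

terms = index_terms("Indexing the pages of search engines")
raw_terms = index_terms(
    "Indexing the pages of search engines",
    use_stop_words=False,
    use_stemming=False,
)
```

The two flags mirror the checklist question: a tool where these features are optional lets the site owner switch them off when exact word forms matter.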

Page 17: Indexing and search engines

Features of search engines: Indexing features

• Meta tags indexing: If meta tag indexing is allowed, what kind of meta tags can be used?
• Support for compression: Does the indexer support file compression?
• Field level searching: Requires more space and time
• Indexing ALT text/comment text: Does the search engine index ALT text associated with images or text in comment tags?
• Database updating: Does the indexer support incremental updates?

Page 18: Indexing and search engines

Features of search engines: Search Capabilities

• Boolean searching: Use of Boolean operators AND, OR and NOT as search term connectors
• Natural language: Allows users to enter the query in natural language
• Phrase: Users can search for an exact phrase
• Truncation/wild card: Variations of search terms and plural forms can be searched
• Exact match: Allows users to search for terms exactly as they are entered
• Duplicate detection: Removes duplicate records from the retrieved records
• Proximity: With connectors such as WITH, NEAR and ADJacent, one can specify the position of a search term with respect to others
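
Three of the capabilities above (Boolean, truncation and proximity searching) map directly onto simple operations, sketched here in Python. The tiny index of document numbers, the sample sentence and the function names are assumptions for illustration; real engines parse a query syntax rather than exposing functions like these.

```python
# Hypothetical index: word -> set of document numbers containing it.
index = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "engines": {2},
    "index": {2, 3},
}

# Boolean searching corresponds to set operations on the document sets.
and_hits = index["search"] & index["index"]    # search AND index
or_hits = index["engine"] | index["index"]     # engine OR index
not_hits = index["search"] - index["engine"]   # search NOT engine

def truncation(index, prefix):
    """Truncation search: match every indexed word that starts with prefix."""
    hits = set()
    for word, docs in index.items():
        if word.startswith(prefix):
            hits |= docs
    return hits

def near(text, first, second, distance):
    """Proximity search: True if the two words occur within `distance` words."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance for i in pos_first for j in pos_second)

plural_hits = truncation(index, "engin")
close = near("site search engine tools", "site", "engine", 2)
far = near("site search engine tools", "site", "tools", 2)
```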

Page 19: Indexing and search engines

Features of search engines: Searching features

• Field searching: Query for a specific field value in the database
• Thesaurus searching: Search for broader, narrower or related terms, or related concepts
• Query by example: Enables users to search for similar documents
• Soundex searching: Search for records that sound like the search term, even when they are spelled differently
• Relevance ranking: Ranking the retrieved records in some order
• Customization (CGI programs)
• Search set manipulation: Saving the search results as sets and allowing users to view search history
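
Soundex searching relies on reducing each name to a short phonetic code so that similar-sounding names compare equal. Below is a Python sketch of the commonly described American Soundex rules (first letter kept, consonants mapped to digits, repeated codes collapsed, result padded to four characters); the function name is an assumption, and individual search tools may implement variants.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for letter in letters:
        CODES[letter] = digit

def soundex(name):
    """Return the four-character Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        # Skip vowels (no code) and repeats of the previous code.
        if code and code != previous:
            result += code
        # "h" and "w" do not separate two letters with the same code.
        if c not in "hw":
            previous = code
    return (result + "000")[:4]

same_sound = soundex("Smith") == soundex("Smyth")
```

A Soundex search then compares the code of the search term with the stored codes of the indexed terms.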

Page 20: Indexing and search engines

Features of search engines: Results Display

• Formats supported: Can it display in native format or just HTML? Display in different formats; display of the number of records retrieved
• Relevancy ranking: If the retrieved records are ranked, how is the relevance score indicated?
• Keyword-in-context: KWIC or highlighting of matching search terms
• Customization of results display: Allows users to select different display formats
• Saving options: Saving in different formats; number of records that can be saved at a time
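
Keyword-in-context display can be sketched as follows in Python: each match is shown with a few words on either side and the matching term highlighted. The context width, the use of `<b>` tags for highlighting and the function name are assumptions for this example.

```python
def kwic(text, term, width=3):
    """Return snippets with `width` words of context around each match."""
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Ignore case and trailing punctuation when comparing.
        if word.lower().strip(".,;:!?") == term.lower():
            left = words[max(0, i - width): i]
            right = words[i + 1: i + 1 + width]
            snippets.append(" ".join(left + [f"<b>{word}</b>"] + right))
    return snippets

text = "The indexer reads every page. The search engine then reads the index file."
snippets = kwic(text, "index", width=2)
```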

Page 21: Indexing and search engines

Choosing the right search engine

• Checklist of factors to be considered while selecting the search engine:

– Size of the website

– Technical expertise available (local and/or from the supplier / developer)

– System platforms available

– Information sources and services to be supported

– Document collection: type, volume (now and in future)

– Indexing, search and display requirements

Page 22: Indexing and search engines

Choosing the right search engine

• Checklist of factors to be considered while selecting the search engine:

– User community to be served

– Differentiate between the need for indexing the web site pages and the need for indexing databases / document collections (text, bibliographic, DBMS, etc.)

– Support for the concept of a "record" by the search engine

– Support for structured fields and metadata

– Cost

Page 23: Indexing and search engines

Choosing the right search engine

• Steps in the selection and procurement of search engines:

– Conduct a needs analysis.

– Talk to other libraries.

– Attend trade shows and talk to vendors.

– Read the literature that reviews search engines.

– Compile a list of possible products.

Page 24: Indexing and search engines

Choosing the right search engine

• Steps in the selection and procurement of search engines:

– Compare the functionality of each product to the criteria you developed through needs analysis

– Narrow your list down to three possible products.

– Spend additional time learning about each product.

– Invite the vendors in for demonstrations.

– Ask for references and follow up with each reference

– Select product and implement.

– Follow up with end users.

– Continue an ongoing review with end users.

Page 25: Indexing and search engines

Choosing the right search engine

• Some Suggestions

– The search system development or selection should be based primarily on the local needs

– Consider using freeware search engines, if your requirements are met by these.

– For large, highly developed intranet sites, you may like to consider commercial search engines

– Consider if the webserver you are using supports indexing and search, and if this is adequate for you.

Page 26: Indexing and search engines

Choosing the right search engine

– IT professionals should make an effort to keep abreast of current web technologies

– The features available within a tool should be used properly to get the maximum benefit

– Carefully consider interrelations between the three major components: document resources, users and the search engines.

Page 27: Indexing and search engines

Conclusion

Since search is such a common activity, the search box should appear on every page of your web site.

The initial target of the basic search should be the contents of the entire web site.

The basic search should allow for Boolean commands ("and," "or"), although this does not need to be explained.

Page 28: Indexing and search engines

Conclusion

A quality search process begins with quality metadata. It's that old principle: garbage in, garbage out. Metadata is about giving a structure to the content. For example, if every document is assigned keywords or classified by geography, the reader will get a much more accurate return from his or her search.

Search engines are the mortar of the Intranet. As important as they are, their implementation must be given high priority with the necessary time allotted for research and development

Page 29: Indexing and search engines

List of some Popular Search Engines

Google: http://www.google.com

Yahoo: http://www.yahoo.in

Bing: http://www.bing.in

Amazon: http://www.amazon.com

Page 30: Indexing and search engines

Free and commercial search engines

• Commercial search engines

– AltaVista (www.altavista.digital.com/)
– Fulcrum (www.fulcrum.com/)
– Infoseek (software.infoseek.com)
– Open Text (www.opentext.com/)
– Oracle (www.oracle.com/)
– PLS (www.pls.com/)
– Verity (www.verity.com/)

Page 31: Indexing and search engines

THANK YOU!

~Jasleen Kaur