Intelligent Meta-Search and Clustering Technology
http://tamas.nlm.nih.gov/metasearch/ http://toxseek.nlm.nih.gov
Tamas Doszkocs, Ph.D., Computer Scientist
National Library of Medicine doszkocs@nlm.nih.gov
Characteristics of Web Searching
• Content is created by diverse organizations and individuals
• Information on the Web is inherently heterogeneous
• Content is distributed across multiple servers in multiple locations, formats and languages, aimed at diverse audiences and purposes
(In its April 2005 survey, Netcraft received responses from 62,286,451 web sites)
• The “Open Web” of billions of static Web pages is indexed and searched via multiple search engines and directories
Problems in Web Searching
• Even the largest of the current search engines index only a fraction of all Web pages (as of August 2005, the Wayback Machine of the Internet Archive has indexed 40 billion pages, Google about 8.1 billion, Yahoo about 20.8 billion)
• The not so “Hidden Web” of content databases (e.g. PubMed, Web of Science) is estimated to be thousands of times larger than the Open Web.
• Both the Open Web and the Hidden Web are characterized by problems of information coverage, quality, overload, relevancy, currency and completeness, as well as inherent language ambiguity and incompatible user interfaces
Meta-Searching
• Meta-Search Engines may simultaneously search multiple Open Web and Hidden Web sites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness.
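The simultaneous search described above can be sketched in a few lines of Python. This is only an illustration: the two source functions and their example.org URLs are hypothetical stand-ins for real Open Web and Hidden Web back ends, and nothing here is taken from the ToxSeek implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search back ends.
# Each returns a ranked list of result URLs for the query.
def search_open_web(query):
    return [f"http://example.org/web/{query}/1", f"http://example.org/web/{query}/2"]

def search_hidden_web(query):
    return [f"http://example.org/db/{query}/1"]

SOURCES = {"open_web": search_open_web, "hidden_web": search_hidden_web}

def broadcast_search(query, sources=SOURCES):
    """Send the same query to every source in parallel and collect one result list per source."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}

results = broadcast_search("lead")
```

Running the sources in parallel means total latency is close to that of the slowest source rather than the sum of all of them.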
Overlap Among 3 Major Search Engines
http://missingpieces.dogpile.com/whitepaper.pdf
http://comparesearchengines.dogpile.com/OverlapAnalysis.pdf
Overlap Among AskJeeves, Google, MSN and Yahoo
Google Isn’t Everything!
http://www.forbes.com/business/free_forbes/2005/0815/056.html?partner=yahoomag
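One simple way to quantify overlap of this kind is to count how many engines return each distinct URL for the same query. The sketch below assumes each engine's first-page results are already available as a list of URLs; the engine names and URLs are made up, and this is not the methodology of the studies linked above.

```python
def overlap_report(results_by_engine):
    """Return the share of distinct URLs that were returned by exactly 1, 2, 3, ... engines."""
    engines_per_url = {}
    for urls in results_by_engine.values():
        for url in set(urls):  # count each URL once per engine
            engines_per_url[url] = engines_per_url.get(url, 0) + 1
    total = len(engines_per_url)
    urls_per_count = {}
    for n in engines_per_url.values():
        urls_per_count[n] = urls_per_count.get(n, 0) + 1
    return {n: count / total for n, count in sorted(urls_per_count.items())}

# Hypothetical first-page results from three engines.
report = overlap_report({
    "engine_a": ["a", "b", "c"],
    "engine_b": ["b", "c", "d"],
    "engine_c": ["c", "e"],
})
```

A large share under key 1 means most results are unique to a single engine, which is the argument for searching several engines at once.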
Generations of Meta-Search Engines
• First Generation
– “Broadcast” or “Federated” search
– List of results
• Second Generation
– Merging and Ranking
– Increased coverage
• Third Generation
– Result Clustering
– Focused drill-down
– Dynamic Query Mods
• Next Generation
– Semantic and Pragmatic Intelligence
– tamas.nlm.nih.gov/metasearch/
– toxseek.nlm.nih.gov
– http://bestmeta.com
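The second-generation "Merging and Ranking" step can be illustrated with reciprocal rank fusion, a common rank-merging technique. It is swapped in here purely as an example; the slides do not say which merging method any of the listed systems use, and the constant k=60 is a conventional default rather than a value from the source.

```python
from collections import defaultdict

def merge_and_rank(results_by_engine, k=60):
    """Merge ranked lists with reciprocal rank fusion: each appearance adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked lists from two engines.
merged = merge_and_rank({"engine_a": ["x", "y"], "engine_b": ["y", "z"]})
```

A result returned by several engines accumulates score from each list, so it rises above results that only one engine ranked highly.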
Moving Targets: Nine Search Engines Compared
By Ben Patterson (May 9, 2005)
http://reviews.cnet.com/4520-10572_7-6219242-2.html?tag=txt
Moving Targets and the need for
Automatic Change Detection and Monitoring and
Integrating New Capabilities
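One minimal way to detect that a monitored search source has changed is to fingerprint a page it serves and compare the fingerprint with the one stored on the previous check. The sketch below assumes the page text has already been fetched; the source names and page snippets are hypothetical, and the slides do not describe how change detection is actually done.

```python
import hashlib

def fingerprint(page_text):
    """Hash the page after collapsing whitespace, so pure layout noise is ignored."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def detect_changes(previous_fingerprints, current_pages):
    """Return the names of sources whose page is new or differs from the stored fingerprint."""
    return sorted(
        name
        for name, text in current_pages.items()
        if previous_fingerprints.get(name) != fingerprint(text)
    )

# Hypothetical stored state and freshly fetched pages.
stored = {"engine_a": fingerprint("<form action='/search'>")}
changed = detect_changes(stored, {
    "engine_a": "<form   action='/search'>",  # whitespace-only difference
    "engine_b": "<form action='/find'>",      # source not seen before
})
```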
The ToxSeek Meta-Search and Clustering Project
• Goals:
– Integrate best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool
– Apply ToxSeek to efficiently access diverse biomedical and environmental health information resources
– Create specialized applications for accessing quality information sources on HIV/AIDS, consumer health, homeland security, public health law, library research and other areas
ToxSeek Features
• Integrates multiple spellcheckers and sophisticated lexical, morphologic, syntactic and semantic resources
• Merges and ranks the results from heterogeneous information sources
• Employs an efficient Natural Language Phrase Parser and AI heuristics to automatically identify Key Concepts and their Associations in queries and retrieved documents
• Uses the automatically identified Key Concepts and Associations to create topical Result Clusters
• Supports focused multi-concept drill-down, dynamic query refinement, multi-media and limited question answering
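The idea of grouping results under shared key concepts can be shown with a deliberately crude sketch. It treats every non-stopword in a result title as a "concept" and keeps concepts shared by at least two results. The stopword list, the sample titles and the single-word notion of a concept are all assumptions; ToxSeek's phrase parser and heuristics are not described in enough detail here to reproduce.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def key_terms(text):
    """Lowercased alphabetic words minus stopwords; a crude stand-in for key-concept extraction."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cluster_results(query, titles, min_size=2):
    """Group titles under each non-query term that at least min_size titles share."""
    query_terms = key_terms(query)
    clusters = defaultdict(list)
    for title in titles:
        for term in key_terms(title) - query_terms:
            clusters[term].append(title)
    kept = {term: ts for term, ts in clusters.items() if len(ts) >= min_size}
    # Largest clusters first, then alphabetical.
    return dict(sorted(kept.items(), key=lambda kv: (-len(kv[1]), kv[0])))

# Hypothetical result titles for the query "lead".
titles = [
    "Lead poisoning in children",
    "Lead exposure and children",
    "Lead in drinking water",
]
clusters = cluster_results("lead", titles)
```

Choosing a cluster label and adding it to the query is one way the drill-down and dynamic query refinement mentioned above could be driven.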
ToxSeek Implementation
• Production applications and research prototypes have been implemented for meta-searching diverse content on:
– Toxicology and Environmental Health
– Consumer Health
– Library Catalogs and Proprietary Databases
– HIV/AIDS
– BioDefense
– Homeland Security
• “Shift Happens…”
– http://library.nps.navy.mil/home/staff/gmarlatt/HSDL%20ALI%20April%202005%20%20final%20rev%207%20april.ppt
ToxSeek Web Search Query: “terrorism”
ToxSeek Query: “police state”
Win the Search Engine Wars with Intelligent Meta-Search and Clustering Technology
http://tamas.nlm.nih.gov/metasearch/ http://toxseek.nlm.nih.gov
Tamas Doszkocs, Ph.D., Computer Scientist
National Library of Medicine doszkocs@nlm.nih.gov