Open Search David Wolber



Overview

• Proliferation of Digital Libraries
• Metasearch and Fixed Lists of Sources
• Open Search Architecture
• PublishMe for P2P Knowledge Sharing
• Webtop Metasearch Clients

Contributors

Michael Kepe, Igor Ranitovic, Iman Sadreddin

Senior Team ’03: Ken Chong, Rudd Stevens, Colin Bean, Tim Chan, Julian Chan, Pooja Garg

Information Source Explosion

Google and Amazon APIs

Internet Archive

Technorati
– The World Live Web

Domain-specific:
– ACM Digital Library for CS
– Lexis-Nexis for law
– MLA for literature

End-User Created Digital Libraries

Personal Web (a shared Google Desktop)

Personal Web Neighborhood

Topic-Specific Personal Crawlers

Ordinary people creating search engines as easily as web pages

Subsets of the Web

(Diagram: PersonalWeb, with its 1st-degree, 2nd-degree, and Nth-degree neighborhoods)

Motivation for Small, Independent Subsets of the Web

Avoid having all information channeled through a single portal: the “Googleopoly”

Google does no evil, but…
– Censorship in China
– Creeping level of commercialization
– Unregulated manipulation of secret ranking algorithms (see the SearchKing case)

Other media are lost; this is the last frontier

Little support for using multiple search engines


Metasearch

Help users discover and use digital libraries

Send queries to multiple, selected search engines

Filter, process, and unify the results

A9.com – Amazon’s metasearch
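The metasearch steps above (send the query to several selected engines, then filter, process, and unify the results) can be sketched in Java. This is a minimal sketch, not the Webtop or A9 implementation: the hypothetical `Engine` interface stands in for real search APIs, and round-robin interleaving with URL de-duplication is just one assumed merge policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MetasearchSketch {
    // Hypothetical engine abstraction: a query in, a ranked list of result URLs out.
    interface Engine {
        List<String> search(String query);
    }

    // Send the query to every selected engine, then unify the ranked lists by
    // round-robin interleaving (each engine's rank 1, then each rank 2, ...).
    // URLs already seen are skipped, which filters out duplicates.
    static List<String> metasearch(String query, List<Engine> engines) {
        List<List<String>> ranked = new ArrayList<>();
        int maxLen = 0;
        for (Engine e : engines) {
            List<String> results = e.search(query);
            ranked.add(results);
            maxLen = Math.max(maxLen, results.size());
        }
        Set<String> merged = new LinkedHashSet<>(); // preserves first-seen order
        for (int rank = 0; rank < maxLen; rank++) {
            for (List<String> results : ranked) {
                if (rank < results.size()) {
                    merged.add(results.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        Engine a = q -> List.of("http://a.example/1", "http://shared.example/x");
        Engine b = q -> List.of("http://shared.example/x", "http://b.example/2");
        // [http://a.example/1, http://shared.example/x, http://b.example/2]
        System.out.println(metasearch("open search", List.of(a, b)));
    }
}
```

Interleaving keeps every engine's top hits near the top without needing comparable scores across engines; score normalization or per-source reputation would be alternative policies.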

Web Services Basis

(Diagram: Web Page Model, server → HTML; Web Service Model, server → XML → client software → HTML)


How does metasearch evolve?

New digital library

Metasearch clients discover it

Programmers write an adaptor/scraper

Users can access it within the metasearch client

SLOWLY…


Goal: Automate the Process

Metasearch engines should provide users with up-to-date lists of existing digital libraries

Digital libraries should be able to register and be made immediately available to all Metasearch clients.

Metasearch development and library development are independent.

What is Necessary?

Standard Search API
– So metasearch clients can use polymorphism to access sources:

    for each source s in sourceList {
        searchEngine.endPointUrl = s.endPointUrl;
        resultList += searchEngine.keywordSearch(keywords);
    }

Search API Registry
– So metasearch clients can get a dynamic list of sources
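The pseudocode above, combined with a registry that supplies the source list at run time, can be made runnable in Java. This is a sketch under stated assumptions: `Registry`, `Source`, and `OspClient` are hypothetical stand-ins for a UDDI lookup and a generated SOAP client stub, and the stub answers from an in-memory map where a real one would make a network call. As in the pseudocode, one stub is re-pointed at each endpoint, which works only because every source speaks the same protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PolymorphicMetasearch {
    // Hypothetical registry entry: one OSP-conforming library and its endpoint.
    record Source(String name, String endPointUrl) {}

    // Hypothetical registry: returns the current, dynamic list of sources.
    interface Registry {
        List<Source> listSources();
    }

    // Hypothetical stand-in for a generated OSP client stub.
    static class OspClient {
        String endPointUrl;
        private final Map<String, List<String>> fakeNetwork; // endpoint -> canned results

        OspClient(Map<String, List<String>> fakeNetwork) {
            this.fakeNetwork = fakeNetwork;
        }

        List<String> keywordSearch(String keywords) {
            // A real stub would send a SOAP request to endPointUrl here.
            return fakeNetwork.getOrDefault(endPointUrl, List.of());
        }
    }

    // The loop from the slide: same stub, same call, different endpoint per source.
    static List<String> searchAll(Registry registry, OspClient searchEngine, String keywords) {
        List<String> resultList = new ArrayList<>();
        for (Source s : registry.listSources()) {
            searchEngine.endPointUrl = s.endPointUrl();
            resultList.addAll(searchEngine.keywordSearch(keywords));
        }
        return resultList;
    }

    public static void main(String[] args) {
        Registry registry = () -> List.of(
            new Source("LibA", "http://liba.example/osp"),
            new Source("LibB", "http://libb.example/osp"));
        OspClient stub = new OspClient(Map.of(
            "http://liba.example/osp", List.of("a-result"),
            "http://libb.example/osp", List.of("b-result")));
        System.out.println(searchAll(registry, stub, "metasearch")); // [a-result, b-result]
    }
}
```

Because the source list comes from the registry on every call, a newly registered library shows up in the next search without any client change.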

Web Service Standards

WSDL – Web Service Description Language

SOAP – Simple Object Access Protocol

UDDI – Universal Description, Discovery, and Integration
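To make the acronyms concrete: WSDL describes a service's operations and messages, and a SOAP call is an XML envelope sent to the service endpoint. The Java sketch below only builds a SOAP 1.1 request envelope as a string; the `keywordSearch` operation and the `urn:example:osp` namespace are hypothetical placeholders, not the actual OSP WSDL.

```java
class SoapEnvelopeSketch {
    // Standard SOAP 1.1 envelope namespace.
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Escape the five XML special characters so keywords can't break the markup.
    // '&' must be replaced first so later entities aren't double-escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Build a SOAP 1.1 request for a hypothetical keywordSearch operation.
    static String keywordSearchRequest(String keywords) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">\n"
            + "  <soap:Body>\n"
            + "    <osp:keywordSearch xmlns:osp=\"urn:example:osp\">\n"
            + "      <osp:keywords>" + escapeXml(keywords) + "</osp:keywords>\n"
            + "    </osp:keywordSearch>\n"
            + "  </soap:Body>\n"
            + "</soap:Envelope>\n";
    }

    public static void main(String[] args) {
        System.out.print(keywordSearchRequest("digital libraries & P2P"));
    }
}
```

In practice a tool such as wsdl2java generates stubs that build and send these envelopes, so client code never assembles XML by hand.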

Standards on top of Web Services

WSDL, SOAP, and UDDI are the basis for standards in many domains
– e.g., a Microsoft-initiated standard for securities information providers

Businesses agree on a standard; then client applications can use polymorphism, and new businesses can register services.

In this case, we want a cross-domain standard.

Open Search Architecture

Open Search Protocol (OSP)
– Cross-domain: search-related services
– Not just keyword search, but citations, authorOf, etc.

Open Search Registry
– Based on UDDI
– Can add customization, e.g., parsing to find out which search operations are implemented
– Web and web service access
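The registry customization mentioned above, parsing a service description to find out which search operations it implements, might look like the following Java sketch. It uses the JDK's namespace-aware DOM parser to collect the `operation` names declared under WSDL 1.1 `portType` elements; the sample WSDL and its operation names are hypothetical, and how the actual Open Search Registry did this is not specified here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class WsdlOperationScanner {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Names of the operations declared in WSDL 1.1 portType elements,
    // i.e., the operations this service claims to implement.
    static Set<String> implementedOperations(String wsdlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));

            Set<String> ops = new TreeSet<>();
            NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
            for (int i = 0; i < portTypes.getLength(); i++) {
                Element portType = (Element) portTypes.item(i);
                NodeList opNodes = portType.getElementsByTagNameNS(WSDL_NS, "operation");
                for (int j = 0; j < opNodes.getLength(); j++) {
                    ops.add(((Element) opNodes.item(j)).getAttribute("name"));
                }
            }
            return ops;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse WSDL", e);
        }
    }

    public static void main(String[] args) {
        String wsdl = """
            <definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="HypotheticalLibrary">
              <portType name="OpenSearchPort">
                <operation name="keywordSearch"/>
                <operation name="citations"/>
              </portType>
            </definitions>
            """;
        System.out.println(implementedOperations(wsdl)); // [citations, keywordSearch]
    }
}
```

Only `portType` operations are counted; `binding` elements repeat operation names for transport details and are deliberately skipped.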

Open Search Architecture

(Diagram: OSP-conforming libraries register their services with the OS Registry; OSP metasearch clients get their source list from the registry.)

User Can Choose Sources

Open Search Protocol

Keyword search

Citations (inward links, outward links)

AuthorOf and other associative operations…

Results returned as metadata objects based on Dublin Core

Restriction object for “advanced search” options
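The operations listed above could be expressed as a Java interface. Everything here is an illustrative assumption: the method names, the four Dublin Core elements chosen for `Metadata` (title, creator, date, identifier), and the fields of `Restriction`; the actual OSP WSDL defines its own types. A toy in-memory library shows how an implementation plugs in, with inward links computed by reading the citation map backwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OspSketch {
    // Result object using a few Dublin Core element names (an assumed subset).
    // date is assumed to start with a four-digit year, e.g. "2004-05-01".
    record Metadata(String title, String creator, String date, String identifier) {}

    // Hypothetical "advanced search" restriction: optional creator, inclusive year range.
    record Restriction(String creator, int fromYear, int toYear) {}

    // Hypothetical Java shape of the Open Search Protocol operations.
    interface OpenSearchProtocol {
        List<Metadata> keywordSearch(String keywords, Restriction restriction); // restriction may be null
        List<Metadata> inwardLinks(String identifier);  // items that cite this one
        List<Metadata> outwardLinks(String identifier); // items this one cites
        List<Metadata> authorOf(String creator);
    }

    // Toy library: records plus a citation map (identifier -> cited identifiers).
    static class InMemoryLibrary implements OpenSearchProtocol {
        private final List<Metadata> records;
        private final Map<String, List<String>> cites;

        InMemoryLibrary(List<Metadata> records, Map<String, List<String>> cites) {
            this.records = records;
            this.cites = cites;
        }

        @Override
        public List<Metadata> keywordSearch(String keywords, Restriction restriction) {
            String k = keywords.toLowerCase();
            List<Metadata> out = new ArrayList<>();
            for (Metadata m : records) {
                if (!m.title().toLowerCase().contains(k)) continue;
                if (restriction != null) {
                    if (restriction.creator() != null && !restriction.creator().equals(m.creator())) continue;
                    int year = Integer.parseInt(m.date().substring(0, 4));
                    if (year < restriction.fromYear() || year > restriction.toYear()) continue;
                }
                out.add(m);
            }
            return out;
        }

        @Override
        public List<Metadata> inwardLinks(String identifier) {
            List<Metadata> out = new ArrayList<>();
            for (Metadata m : records) {
                if (cites.getOrDefault(m.identifier(), List.of()).contains(identifier)) out.add(m);
            }
            return out;
        }

        @Override
        public List<Metadata> outwardLinks(String identifier) {
            List<Metadata> out = new ArrayList<>();
            for (String target : cites.getOrDefault(identifier, List.of())) {
                for (Metadata m : records) {
                    if (m.identifier().equals(target)) out.add(m);
                }
            }
            return out;
        }

        @Override
        public List<Metadata> authorOf(String creator) {
            List<Metadata> out = new ArrayList<>();
            for (Metadata m : records) {
                if (m.creator().equals(creator)) out.add(m);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Metadata a = new Metadata("Notes on open search", "Author A", "2004-05-01", "id:a");
        Metadata b = new Metadata("Associative trails", "Author B", "1999-01-01", "id:b");
        OpenSearchProtocol lib = new InMemoryLibrary(List.of(a, b), Map.of("id:a", List.of("id:b")));
        System.out.println(lib.inwardLinks("id:b")); // a cites b, so a is listed
    }
}
```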

Publishing a Library

• Access OSP WSDL Specification from webtop.cs.usfca.edu

• Generate code in language of choice

• Implement the search operations for the digital library

• Deploy the service

• Register with Open Search registry

Deploying an Open Search Library

(Diagram: 1. the programmer gets the OS WSDL; 2. the WSDL is fed to wsdl2java; 3. wsdl2java produces skeleton code; 4. the service is deployed on the library server; 5. registration information is sent to the Open Search Registry.)

Wrapping a Library

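1. The metasearch client sends an OSP query to the Open Search wrapper, located on a third-party server

2. The wrapper sends a custom query to the library’s custom search API (e.g., the Google API)

3. The library returns a custom result to the wrapper

4. The wrapper returns an OSP result to the metasearch client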
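The four wrapping steps above are an instance of the adapter pattern. A minimal Java sketch, assuming a hypothetical `CustomSearchApi` whose query and result types differ from OSP's; the real Google and Amazon APIs have their own signatures, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

class WrapperSketch {
    // Hypothetical OSP-side types.
    record OspQuery(String keywords, int maxResults) {}
    record OspResult(String title, String url) {}

    // Hypothetical library-specific API with its own query and result formats.
    record CustomQuery(String q, int start, int num) {}
    record CustomHit(String url, String heading, double score) {}

    interface CustomSearchApi {
        List<CustomHit> search(CustomQuery query);
    }

    // The Open Search wrapper: speaks OSP to clients and the custom API to the library.
    static class OpenSearchWrapper {
        private final CustomSearchApi backend;

        OpenSearchWrapper(CustomSearchApi backend) {
            this.backend = backend;
        }

        List<OspResult> keywordSearch(OspQuery query) {
            // Steps 1 -> 2: translate the OSP query into the library's custom query.
            CustomQuery custom = new CustomQuery(query.keywords(), 0, query.maxResults());
            // Steps 2 -> 3: call the library's own API.
            List<CustomHit> hits = backend.search(custom);
            // Steps 3 -> 4: convert each custom hit into an OSP result.
            List<OspResult> results = new ArrayList<>();
            for (CustomHit h : hits) {
                results.add(new OspResult(h.heading(), h.url()));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        CustomSearchApi fake = cq -> List.of(new CustomHit("http://x.example", "X page", 0.9));
        System.out.println(new OpenSearchWrapper(fake).keywordSearch(new OspQuery("x", 10)));
    }
}
```

Because only the wrapper knows the custom API, a library can be added to every OSP client without the library's owners changing anything.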

Wrappers Developed at USF

Google, Amazon (sort of), Internet Archive, Technorati, Feedster


PublishMe

Like Google Desktop, but shared.

Periodically updates an inverted index and linkbase on the user’s PC

Deploys Web Service on User’s PC

Auto-Registers with Open Search Registry
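The periodically updated inverted index is what lets a PublishMe peer answer keyword searches over its owner's files. The Java sketch below is illustrative only: the tokenizer (lowercase, split on anything that is not a letter or digit) and AND semantics for multi-word queries are assumptions, and a real indexer would also track positions, detect changed files, and persist the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // (Re)index one document; a periodic update would call this for each changed file.
    void addDocument(String docId, String text) {
        removeDocument(docId);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    void removeDocument(String docId) {
        postings.values().forEach(ids -> ids.remove(docId));
        postings.values().removeIf(Set::isEmpty);
    }

    // Documents containing every query term (AND semantics).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> ids = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument("notes.txt", "Open Search protocol notes");
        idx.addDocument("bush.html", "As We May Think: associative trails");
        System.out.println(idx.search("open protocol")); // [notes.txt]
    }
}
```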

Metasearch with P2P Knowledge Sharing

WEBTOP

Integrating Global and Personal Libraries

Motivation for Sharing Personal Webs

People create knowledge every day when they bookmark, annotate, link, organize, and synthesize.

Communication is a separate step that often doesn’t happen

Experts Collaborative Work

Motivation for Sharing Personal Webs

Computers are designed using our brains as a model:
– knowledge creation and dissemination are separate
– explicit effort is required to communicate

Just as we model our word processors on paper.

Additions to OSP for P2P

GetFile

OnLine(ip)
– Handles user starting up
– Dynamic IPs

OffLine
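OnLine(ip) and OffLine let the registry track peers whose PCs come and go and whose IP addresses change. A minimal Java sketch of the registry side, assuming hypothetical string peer ids and an in-memory table; how OSP actually identifies and authenticates peers is not specified here, and GetFile is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PeerRegistrySketch {
    // peer id -> current IP address, for peers that are currently online
    private final Map<String, String> online = new HashMap<>();

    // Called when a user's PC starts up; also replaces a stale address
    // when the peer returns with a new dynamic IP.
    void onLine(String peerId, String ip) {
        online.put(peerId, ip);
    }

    // Called when the user's PC shuts down, so clients stop querying it.
    void offLine(String peerId) {
        online.remove(peerId);
    }

    // Where a metasearch client should send this peer's queries, if anywhere.
    Optional<String> currentAddress(String peerId) {
        return Optional.ofNullable(online.get(peerId));
    }

    // Peers a client could query right now, sorted for stable output.
    List<String> onlinePeers() {
        List<String> ids = new ArrayList<>(online.keySet());
        Collections.sort(ids);
        return ids;
    }
}
```

Keying on a stable peer id rather than the IP address is what makes dynamic IPs harmless: each OnLine call simply overwrites the old address.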

But What About PRIVACY?

The Big Question: How much of the information hidden within your personal web is hidden due to privacy concerns?

I Want you to be a Search Engine!


Goal: Implement Vannevar Bush’s Associative Trails

View a document/thing in context

History of an idea

Thinkmap-like Interface

Association Types

Outward links

Inward links

Similar-content links

People links
– author, people referenced in paper

Domain-specific links
– law citations
– movie-actor

Associations specified by annotators

Webtop Tree View webtop.cs.usfca.edu

Expanding a Tree

• Bird’s Eye View

• Local/Web files integrated

• Follow different Associative Trails

• Ins of Outs of Ins, etc.

• Siblings

• Weird, though, as ins and outs both expand to the right
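Following trails such as “ins of outs of ins” amounts to composing link lookups, where inward links are outward links read backwards. A minimal Java sketch, assuming a hypothetical linkbase built from individual links; the trail is written as a string of `I` (follow inward links) and `O` (follow outward links) steps, applied left to right starting from one document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AssociativeTrailSketch {
    private final Map<String, Set<String>> outLinks = new HashMap<>();
    private final Map<String, Set<String>> inLinks = new HashMap<>();

    // Record a link from -> to in both directions, so "ins" are as cheap as "outs".
    void addLink(String from, String to) {
        outLinks.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        inLinks.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    // Apply each step of the trail to the current set of documents.
    Set<String> follow(String start, String trail) {
        Set<String> frontier = new TreeSet<>(Set.of(start));
        for (char step : trail.toCharArray()) {
            Map<String, Set<String>> links = switch (step) {
                case 'O' -> outLinks;
                case 'I' -> inLinks;
                default -> throw new IllegalArgumentException("unknown step: " + step);
            };
            Set<String> next = new TreeSet<>();
            for (String doc : frontier) {
                next.addAll(links.getOrDefault(doc, Set.of()));
            }
            frontier = next;
        }
        return frontier;
    }
}
```

For example, with links a→b and c→b, `follow("a", "OI")` returns {a, c}: the documents that link to whatever a links to.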

Webtop Side Panel View

Project Status

Too many bugs, Dad

Future Work

Open Search Protocol
– In-depth study of existing search APIs
– Provide a REST alternative to SOAP

Metasearch development
– Complete and refine existing clients
– Dream up new ones

Thinkmap graph

Automated source selection and reputation system

Page ranking

Initiate grass-roots involvement

Future Work: Documents and Things

(Diagram: a resource with associations and annotations; resource types include document (HTML, Word, PDF), person, and creative work (film, book).)

Stop talking about Webtop, Daddy!

webtop.cs.usfca.edu