A Comprehensive Information Retrieval Portal for Canadian Scientific Researchers

Research Proposal for CISTI

Andre Vellino

August 2006



Overview

- Context: CISTI Strategic Plan
- Proposal Statement
- System Architecture
- Proposal Components
- Partnerships
- Outcomes and Draft Workplan
- Andre’s Relevant Experience

Holy Grail

“It’s easy to say what would be the ideal online resource for scholars and scientists: all papers in all fields, systematically interconnected, effortlessly accessible and rationally navigable from any researcher’s desk, worldwide, for free.”

Stevan Harnad, 1999

Professor of Cognitive Science

University of Southampton

Excerpts from CISTI Strategic Plan

“Goal 1: Provide universal, seamless, and permanent access to information for Canadian research and innovation.”

“Canadians look to CISTI to deliver distilled, aggregated, and validated information that is relevant to their research and innovation activities.”

“Available at the client’s desktop, these services are provided through a technologically sophisticated infrastructure.”

“[All users] will have electronic access at their desktop to a wealth of national and international STM information resources, supported by intelligent search and analysis tools and expert advice.”

Proposal Vision

To develop a web-based information portal that offers universal, seamless access to highly relevant, distilled and aggregated STM information, using intelligent search and analysis tools that support scientific innovation.

High Level Functional Architecture

Figure: Personalized Scientific Literature Research Portal. Components: Web Application Server; User Agents; Content Aggregator / OpenURL Resolver; Personalization Engine; Collaborative Filtering (Taste, open source); LitMiner Content Analysis. Content sources: Commercial Science Publishers; CISTI & University Libraries.
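The Content Aggregator / OpenURL Resolver component implies that each aggregated citation can be turned into a link that a resolver checks against library holdings. A minimal sketch, assuming OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value syntax for journal articles; the resolver base URL, the `build_openurl` helper and the citation field names are hypothetical, not part of the proposal:

```python
from urllib.parse import urlencode

# Hypothetical resolver address; the proposal does not name one.
RESOLVER_BASE = "https://resolver.example.org/openurl"

# Hypothetical citation field names mapped to OpenURL 1.0 journal KEV keys.
FIELD_TO_KEY = {
    "title": "rft.atitle",
    "journal": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "year": "rft.date",
}


def build_openurl(citation: dict, base: str = RESOLVER_BASE) -> str:
    """Encode a journal-article citation as an OpenURL 1.0 KEV link."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy only the fields that are present and non-empty.
    for field, key in FIELD_TO_KEY.items():
        if citation.get(field):
            params[key] = str(citation[field])
    # A DOI, when known, travels as an info URI in rft_id.
    if citation.get("doi"):
        params["rft_id"] = "info:doi/" + citation["doi"]
    # urlencode percent-encodes values (":" -> %3A, "/" -> %2F, space -> "+").
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_openurl({"title": "A Study", "journal": "J. Test",
                         "year": 2006, "doi": "10.1000/xyz"}))
```

Keeping the resolver behind a single link format means the aggregator does not need to know which copy (publisher, CISTI or a university library) a given user is entitled to; that decision stays with the resolver.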

Proposal Components

- User Needs
- Content Aggregation
- Collaborative Filtering
- Content Mining
- Results Visualization
- Partnerships

User Needs

Customers of CISTI services and content are elite – highly educated and exacting in their requirements;

Compared to mass-market or intranet commercial search-portals, the number of CISTI end-users is small (30,000 – 100,000);

User needs are (likely) varied but focused: e.g. bibliographic literature searches / peer reviews / competitive analysis / historical research;

Contribution to “innovation” can be measured (in the short term) by asking the user directly.

User Profiling

Enables customized services:
- Alerts / notifications
- Higher-precision search results
- Greater user satisfaction
- Item- and user-based recommender system

Broadens the scope of search to semantically cognate but otherwise disparate domains
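One way a profile could yield higher-precision search results is to re-rank a result list by its overlap with terms the user has already shown interest in. A minimal sketch, assuming a deliberately simple profile (term counts over titles the user viewed); `build_profile` and `rerank` are hypothetical names:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; a real system would also drop stop words and stem.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profile(viewed_titles: list[str]) -> Counter:
    """Interest profile: how often each term appears in titles the user viewed."""
    profile = Counter()
    for title in viewed_titles:
        profile.update(tokenize(title))
    return profile


def rerank(results: list[str], profile: Counter) -> list[str]:
    """Order results by summed profile weight of their distinct terms.

    Python's sort is stable, so results with equal scores keep the
    search engine's original order.
    """
    def score(title: str) -> int:
        return sum(profile[t] for t in set(tokenize(title)))

    return sorted(results, key=score, reverse=True)
```

The same profile vectors, compared across users, are also the raw material for the user-based recommender mentioned above.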

Content Aggregation

Most end users will (likely) not care where the information they seek resides;

Results for a search should show that many sources are available and provide links to these sources (Open Access / Commercial / Academic / Government);

Requires partnerships with content providers and search engines.
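Showing that "many sources are available" for one item means collapsing duplicate hits returned by different providers into a single result that lists every source. A minimal sketch, assuming each hit carries an optional DOI and that a normalized title is an acceptable fallback identity; the `Hit` and `AggregatedResult` types are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hit:
    source: str    # provider label (illustrative)
    access: str    # "Open Access", "Commercial", "Academic" or "Government"
    title: str
    url: str
    doi: Optional[str] = None


@dataclass
class AggregatedResult:
    title: str
    # One (source, access, url) triple per provider that returned this item.
    links: list[tuple[str, str, str]] = field(default_factory=list)


def merge_key(hit: Hit) -> str:
    # Prefer the DOI (case-insensitive); otherwise use a whitespace- and case-normalized title.
    if hit.doi:
        return "doi:" + hit.doi.lower()
    return "title:" + " ".join(hit.title.lower().split())


def aggregate(hits: list[Hit]) -> list[AggregatedResult]:
    """Collapse duplicate hits from different sources into one result per item."""
    merged: dict[str, AggregatedResult] = {}
    for hit in hits:
        key = merge_key(hit)
        if key not in merged:
            merged[key] = AggregatedResult(title=hit.title)
        merged[key].links.append((hit.source, hit.access, hit.url))
    # Dicts preserve insertion order, so items appear in first-seen order.
    return list(merged.values())
```

Title matching is only a fallback: two different papers can share a title, so a production aggregator would likely want stronger identifiers than this sketch uses.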

Collaborative Filtering

Monitors the user’s browsing behaviour (and / or explicit feedback) to build a profile of the user’s choices;

Other users with “similar” profiles can share their opinions anonymously (e.g. on the value or usefulness of an article or book), as in “People who ordered article X also ordered article Y”;

Enables serendipitous recommendations (options that the “active user” might not have considered otherwise);

May stimulate “innovation”;

May complement citation indexing as a relevance criterion;

Untested technology in the scientific information retrieval community;
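The "people who ordered article X also ordered article Y" behaviour is the simplest form of item-based collaborative filtering: count how often two items appear in the same user's history. A minimal sketch, assuming order histories keyed by anonymized user IDs; the function names are hypothetical and this is not the Taste library's API:

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders: dict[str, set[str]]) -> dict[str, Counter]:
    """For every pair of articles, count how many (anonymized) users ordered both."""
    co: dict[str, Counter] = defaultdict(Counter)
    for items in orders.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_ordered(co: dict[str, Counter], article: str, k: int = 3) -> list[str]:
    """Top-k articles most often co-ordered with `article` (ties broken alphabetically)."""
    counts = co.get(article, Counter())  # .get avoids creating an empty entry
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

Raw co-occurrence counts favour popular articles; similarity measures that normalize by item popularity (cosine, log-likelihood) are common refinements.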

Content Mining

Concept discovery using:
- Automatic classification (categorization)
- Named entity tagging
- Document meta-tagging with concepts

Value:
- Improved precision in search results
- May add dimensions to metadata about content
- “Related Articles” feature in Google Scholar
- Enables novel visualization of results
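Of the techniques listed, dictionary-based concept tagging over a concept hierarchy is the easiest to illustrate: a document that matches a concept is also tagged with that concept's ancestors, which adds hierarchical meta-data. A minimal sketch; the toy concept tree and trigger terms are hypothetical, and this is not how LitMiner or the Entrust toolkit work internally:

```python
import re

# Hypothetical toy concept tree (child -> parent).
PARENT = {
    "Cardiology": "Medicine",
    "Oncology": "Medicine",
    "Medicine": "Health Sciences",
}

# Hypothetical trigger terms for each concept.
TRIGGERS = {
    "Cardiology": {"cardiac", "arrhythmia", "myocardial"},
    "Oncology": {"tumour", "tumor", "carcinoma"},
}


def ancestors(concept: str) -> list[str]:
    """Walk up the tree from a concept to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def tag_concepts(text: str) -> set[str]:
    """Return matched concepts plus all their ancestors, for use as meta-tags."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags: set[str] = set()
    for concept, terms in TRIGGERS.items():
        if words & terms:
            tags.add(concept)
            tags.update(ancestors(concept))
    return tags
```

Statistical classifiers would replace the hand-written trigger lists in practice, but the ancestor expansion step is the same.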

Entrust Toolkit

Figure: Entrust Content Analysis Toolkit. Document Conversion feeds four components: Categorizer (output: Categories), Concepts (Concepts, Meta-Data), Summarizer (Summaries, Ranked Phrases) and Search (Hits, Locations). Storage: File System, DB.

Example: Healthcare Concept Tree

Results Visualization

Content analysis and personalization may allow different display paradigms for “more documents like this” or “similar articles”.

Feedback on relevance of the query terms to the selected item.

Interactive Visualization of Multiple Query Results – Battelle

Using Visualisation to Interpret Search Engine Results – Wolverhampton
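"Feedback on relevance of the query terms to the selected item" can be as simple as reporting how often each query term occurs in the item and what share of the terms it covers, which a display could then highlight. A minimal sketch; `query_term_feedback` and its output format are hypothetical:

```python
import re
from collections import Counter


def query_term_feedback(query: str, document: str) -> dict:
    """Per-term occurrence counts in the selected item, plus the share of query terms matched."""
    doc_counts = Counter(re.findall(r"\w+", document.lower()))
    # Unique query terms, kept in the order the user typed them.
    terms = list(dict.fromkeys(re.findall(r"\w+", query.lower())))
    counts = {t: doc_counts[t] for t in terms}
    matched = sum(1 for c in counts.values() if c > 0)
    coverage = matched / len(terms) if terms else 0.0
    return {"counts": counts, "coverage": coverage}
```

A term with zero count is itself useful feedback: it tells the user which part of the query the selected item does not address.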

Partners

Google (Books / Scholar) http://scholar.google.com/

Online Computer Library Center - WorldCat http://www.worldcat.org/

Public Library of Science http://www.plos.org

Science.gov http://www.science.gov/

International Association of STM Publishers http://www.stm-assoc.org/

Annual Reviews http://www.annualreviews.org/

BioMed Central (UK) http://www.biomedcentral.com/

Related Areas of Research

Digital Archiving
- Mechanisms for preserving digital objects (multimedia)

Valuation and payment models for digital objects
- To decide what to preserve, for how long, and how much to charge

Application of Metadata Standards
- Dublin Core / Semantic Web Ontologies (OWL)

Digital Rights Management & Security
- Access control / Intellectual Property protection
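Applying a metadata standard such as Dublin Core amounts to describing each object with a fixed vocabulary of elements. A minimal sketch that serializes simple (unqualified) Dublin Core elements in the `http://purl.org/dc/elements/1.1/` namespace; the `record` wrapper element and the `dublin_core_record` helper are hypothetical, since Dublin Core itself defines only the elements:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize as dc:title rather than ns0:title

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize repeatable Dublin Core elements inside a hypothetical <record> wrapper."""
    root = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": ["A Comprehensive Information Retrieval Portal for Canadian Scientific Researchers"],
        "creator": ["Vellino, Andre"],
        "date": ["2006-08"],
    }))
```

OWL ontologies go further than this flat element set by stating relationships between concepts, which is closer to the concept trees used in content mining.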

Project Phases & Outcomes

Project Phases
- Requirements / Research Phase
- Analysis / Design Phase
- Development / Test Phase

Outcomes
- Develop a prototype content-aggregation search portal with a collaborative filtering and content analysis engine
- Establish partnerships with content providers and search engine organizations
- Test user satisfaction and "return use" improvements on a sample population
- Publish results

Requirements /Research Phase

User Requirements
- Find out what classes of users there are and what features users want in an information portal that would help them innovate;

Technology Literature Review
- Content Aggregation
- Visualization
- Categorization
- Personalization / Collaborative Filtering

Analysis / Design Phase

Use-Cases
- For each category of user, enumerate the use-cases (behavioural scenarios).

User Interface Design
- Design the interface for query, query-refinement, results visualization and recommendations.

Software Evaluation
- Portal web-application components
- Collaborative Filtering packages
- Categorization / LitMiner interfaces

Development / Test Phase

Prototype Information Portal
- Develop:
  - Content Aggregator
  - Personalization / Recommendation agents
- Integrate Content Analysis:
  - LitMiner or Categorization / Concept Tagging toolkits

Test and Evaluate in a Pilot program
- Experiments with a test group to determine:
  - Measure of user acceptance
  - Rates of return usage
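"Rates of return usage" needs an operational definition before the pilot starts. A minimal sketch, assuming a hypothetical visit log of (user, day) pairs and assuming that a user counts as "returning" if they visited on at least two distinct days; both the log format and that definition are assumptions, not part of the proposal:

```python
from collections import defaultdict
from datetime import date


def return_usage_rate(visits: list[tuple[str, date]]) -> float:
    """Share of users who visited the portal on at least two distinct days.

    Assumed definition of "return usage"; repeated visits on the same day
    count only once.
    """
    days_by_user: dict[str, set[date]] = defaultdict(set)
    for user, day in visits:
        days_by_user[user].add(day)
    if not days_by_user:
        return 0.0
    returning = sum(1 for days in days_by_user.values() if len(days) >= 2)
    return returning / len(days_by_user)
```

Computing the same rate for the pilot group and for a comparison group over the same period is one way to express the "return use" improvement named in the project outcomes.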

Draft Work Plan

Andre Vellino – Relevant Experience

- Entrust: Content Analysis Policy Architect - concept extraction and automatic categorization.
- imGenie (startup): Systems architect for a wireless, bi-modal (voice / text), personalized information retrieval and groupware application.
- National Research Council: Research Scientist, IIT - information retrieval on small-format displays.
- Nortel Networks: Senior Systems Architect, Disruptive Network Solutions - personal identity management for intelligent mediation of content delivery in the network.
- Carleton University: Adjunct Research Professor, Cognitive Science Ph.D. program.
- NCF: Internet server-side web architect for the new NCF web portal - registration, payment, single sign-on to integrated applications.
- University of Georgia / Environmental Protection Agency: Research Associate, Advanced Computational Methods Center - development of an expert system for predicting chemical reactivity from chemical structure.