A Comprehensive Information Retrieval Portal for Canadian Scientific Researchers
Research Proposal for CISTI
Andre Vellino
August 2006
Overview
Context: CISTI Strategic Plan
Proposal Statement
System Architecture
Proposal Components
Partnerships
Outcomes and Draft Workplan
Andre’s Relevant Experience
Holy Grail
“It’s easy to say what would be the ideal online resource for scholars and scientists: all papers in all fields, systematically interconnected, effortlessly accessible and rationally navigable from any researcher’s desk, worldwide, for free”
Stevan Harnad, 1999
Professor of Cognitive Science
University of Southampton
Excerpts from CISTI Strategic Plan
“Goal 1: Provide universal, seamless, and permanent access to information for Canadian research and innovation.”
“Canadians look to CISTI to deliver distilled, aggregated, and validated information that is relevant to their research and innovation activities.”
“Available at the client’s desktop, these services are provided through a technologically sophisticated infrastructure.”
“[All users] will have electronic access at their desktop to a wealth of national and international STM information resources, supported by intelligent search and analysis tools and expert advice.”
Proposal Vision
To develop a web-based information portal that offers universal, seamless access to highly relevant, distilled and aggregated STM information using intelligent search and analysis tools that support scientific innovation.
High Level Functional Architecture
Components shown in the architecture diagram:
Content Aggregator / OpenURL Resolver
Web Application Server
User Agents
Collaborative Filtering
Personalized Scientific Literature Research Portal
Commercial Science Publishers
LitMiner Content Analysis
Taste (open source)
Personalization Engine
CISTI & University Libraries
Proposal Components
User Needs
Content Aggregation
Collaborative Filtering
Content Mining
Results Visualization
Partnerships
User Needs
Customers of CISTI services and content are elite – highly educated and exacting in their requirements;
Compared to mass-market or intranet commercial search-portals, the number of CISTI end-users is small (30,000 – 100,000);
User needs are (likely) varied but focused: e.g. bibliographic literature searches / peer reviews / competitive analysis / historical research;
Contribution to “innovation” can be measured (in the short term) by asking the user directly.
User Profiling
Enables:
  Customized services
  Alerts / Notifications
  Higher precision search results
  Greater user satisfaction
Item- and user-based recommender system:
  Broadens scope of search to semantically cognate but otherwise disparate domains
Content Aggregation
Most end users will (likely) not care where the information they seek resides;
Results for a search should show that many sources are available and provide links to these sources (Open Access / Commercial / Academic / Government);
Requires partnerships with content providers and search engines.
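The aggregation idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, the use of a DOI as the shared identifier, and the example.org URLs are all hypothetical and are not part of the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit from one content provider (hypothetical record layout)."""
    doi: str      # assumed shared identifier across providers
    title: str
    source: str   # provider name, illustrative only
    access: str   # "Open Access", "Commercial", "Academic" or "Government"
    url: str


def aggregate(hits):
    """Group hits by DOI so each article is listed once with all of its sources."""
    merged = defaultdict(list)
    for hit in hits:
        merged[hit.doi.lower()].append(hit)  # DOIs are compared case-insensitively
    return [
        {
            "doi": doi,
            "title": group[0].title,
            "sources": sorted((h.access, h.source, h.url) for h in group),
        }
        for doi, group in merged.items()
    ]


hits = [
    Hit("10.1000/ABC", "Sample article", "Publisher A", "Commercial", "https://example.org/a"),
    Hit("10.1000/abc", "Sample article", "Repository B", "Open Access", "https://example.org/b"),
    Hit("10.1000/xyz", "Another article", "Agency C", "Government", "https://example.org/c"),
]
results = aggregate(hits)
```

The user sees one entry per article, with every available source listed beside it, regardless of where the content resides.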
Collaborative Filtering
Monitors the user’s browsing behaviour (and / or explicit feedback) to build a profile of the user’s choices;
Users with “similar” profiles can share (anonymously) their opinions, e.g. on the value or usefulness of an article or book (“People who ordered article X also ordered article Y”);
Enables serendipitous recommendations (options that the “active user” might not have considered otherwise):
  May stimulate “innovation”;
  May complement citation indexing as a relevance criterion;
Untested technology in the scientific information retrieval community.
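The “people who ordered article X also ordered article Y” idea can be sketched as a simple co-occurrence count over anonymous order histories. This is a minimal Python illustration, not the API of the Taste package named in the architecture; the function name `also_ordered` and the layout of the `orders` mapping are assumptions made for the example.

```python
from collections import Counter


def also_ordered(orders, article, top_n=3):
    """Return the articles most often ordered together with `article`.

    orders: dict mapping an anonymous user id to the set of article ids
            that user ordered (hypothetical data layout).
    """
    counts = Counter()
    for items in orders.values():
        if article in items:
            counts.update(items - {article})  # count every co-ordered article once per user
    # Sort by descending count, then by id, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


orders = {
    "u1": {"X", "Y"},
    "u2": {"X", "Y", "Z"},
    "u3": {"Y", "Z"},
}
recommendations = also_ordered(orders, "X")
```

A production recommender would normally weight these counts (for example by a similarity measure between user profiles) rather than use raw co-occurrence.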
Content Mining
Concept discovery using:
  Automatic Classification (Categorization)
  Named Entity Tagging
  Document meta-tagging with Concepts
Value:
  Improved Precision in Search Results
  May add dimensions to meta-data about content
  “Related Articles” feature in Google Scholar
  Enables novel visualization of results
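As a rough illustration of meta-tagging documents with concepts, the Python sketch below uses a rule-based keyword match. It is a deliberately simple stand-in, not the method used by LitMiner or the Entrust toolkit; the category names and term lists in `CATEGORY_TERMS` are invented for the example.

```python
import re

# Hypothetical category vocabularies, for illustration only.
CATEGORY_TERMS = {
    "genomics": {"gene", "genome", "dna", "sequencing"},
    "materials": {"alloy", "polymer", "ceramic", "nanotube"},
}


def tag_document(text, category_terms=CATEGORY_TERMS, min_hits=1):
    """Return {category: matched terms} for each category with enough keyword hits."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    tags = {}
    for category, terms in category_terms.items():
        matched = sorted(tokens & terms)
        if len(matched) >= min_hits:
            tags[category] = matched
    return tags


tags = tag_document("Sequencing the genome of a polymer-degrading bacterium")
```

The resulting tags could be stored as extra meta-data on the document and used to filter or group search results.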
Entrust Toolkit
Diagram: Entrust Content Analysis Toolkit
Document Conversion
Categorizer; Concepts; Summarizer; Search
Categories; Concepts, Meta-Data; Summaries, Ranked Phrases; Hits, Locations
File System; DB
Results Visualization
Content Analysis and Personalization may allow different display paradigms for “more documents like this” or “similar articles”
Feedback on relevance of the query terms to the selected item.
Interactive Visualization of Multiple Query Results – Battelle
Using Visualisation to Interpret Search Engine Results – Wolverhampton
Partners
Google (Books / Scholar) http://scholar.google.com/
Online Computer Library Center - WorldCat http://www.worldcat.org/
Public Library of Science http://www.plos.org
Science.gov http://www.science.gov/
International Association of STM Publishers http://www.stm-assoc.org/
Annual Reviews http://www.annualreviews.org/
BioMed Central (UK) http://www.biomedcentral.com/
Related Areas of Research
Digital Archiving
  Mechanisms for preserving digital objects (multi-media)
Valuation and payment models for Digital Objects
  To decide what to preserve / for how long / how much to charge
Application of Metadata Standards
  Dublin Core / Semantic Web Ontologies (OWL)
Digital Rights Management & Security
  Access control / Intellectual Property protection
Project Phases & Outcomes
Project Phases
  Requirements / Research Phase
  Analysis / Design Phase
  Development / Test Phase
Outcomes
  Develop prototype of content-aggregation search portal with collaborative filtering and content analysis engine
  Establish partnerships with content providers and search engine organizations
  Test user satisfaction and "return use" improvements on a sample population
  Publish results
Requirements / Research Phase
User Requirements
  Find out what classes of users there are and what features users want in an information portal that would help them innovate;
Technology Literature Review
  Content Aggregation
  Visualization
  Categorization
  Personalization / Collaborative Filtering
Analysis / Design Phase
Use-Cases
  For each category of user, enumerate the use-cases (behavioural scenarios).
User Interface Design
  Design the interface for query, query-refinement, results visualization and recommendations.
Software Evaluation
  Portal web-application components
  Collaborative Filtering packages
  Categorization / LitMiner interfaces
Development / Test Phase
Prototype Information Portal
  Develop Content Aggregator
  Personalization / Recommendation agents
Integrate Content Analysis
  LitMiner or Categorization / Concept Tagging toolkits
Test and Evaluate in a Pilot program.
  Experiments with test group to determine
    Measure of user acceptance
    Rates of Return Usage
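One way the rate of return usage could be computed from pilot session logs is sketched below in Python. The definition used here (the fraction of users who visited the portal on more than one distinct day) is an assumption made for illustration; the proposal does not define the measure, and the log format is hypothetical.

```python
from collections import defaultdict
from datetime import date


def return_usage_rate(sessions):
    """Fraction of users with sessions on at least two distinct days.

    sessions: iterable of (user_id, day) pairs (hypothetical log format).
    """
    days_by_user = defaultdict(set)
    for user_id, day in sessions:
        days_by_user[user_id].add(day)
    if not days_by_user:
        return 0.0  # no users observed in the pilot
    returning = sum(1 for days in days_by_user.values() if len(days) > 1)
    return returning / len(days_by_user)


sessions = [
    ("a", date(2006, 9, 1)),
    ("a", date(2006, 9, 8)),   # user "a" returned a week later
    ("b", date(2006, 9, 1)),
    ("b", date(2006, 9, 1)),   # two sessions on the same day do not count as a return
]
rate = return_usage_rate(sessions)
```

Comparing this rate between a test group with recommendations enabled and one without would give a simple measure of the "return use" improvement named in the outcomes.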
Andre Vellino – Relevant Experience
Entrust Content Analysis Policy Architect - Concept extraction and automatic categorization.
imGenie – startup
  Systems architect for a wireless, bi-modal (voice / text), personalized information retrieval and groupware application.
National Research Council
  Research Scientist, IIT – Information Retrieval on small-format displays.
Nortel Networks
  Senior Systems Architect, Disruptive Network Solutions - Personal Identity Management for intelligent mediation of content-delivery in the network.
Carleton University Cognitive Science Ph.D. program, Adjunct Research Professor
NCF Internet Server-side Web architect for new NCF web-portal – registration, payment, single sign-on to integrated applications.
University of Georgia / Environmental Protection Agency
  Research Associate, Advanced Computational Methods Center - development of expert system for predicting chemical reactivity from chemical structure.