
Tools for Repositories: Microsoft Research & the Scholarly Information Ecosystem

Lee Dirks
Director, Education & Scholarly Communications
Microsoft External Research
Microsoft Corporation

• Organization within Microsoft Research that engages in strong partnerships with academia, industry and government to advance computer science, education, and research in fields that rely heavily upon advanced computing
• Initiatives that focus on the research process and its role in the innovation ecosystem, including support for open access, open tools, open technology, and interoperability
• Developers of advanced technologies and services to support every stage of the research process

Microsoft External Research

Mission: Optimize and extend Microsoft software to meet the specific needs of the academic community

Our approach:

Conduct applied projects to enhance academic productivity by evolving Microsoft’s scholarly communication offerings

Microsoft External Research is uniquely positioned to drive this initiative across Microsoft

Transforming Scholarly Communication

• Interoperability is essential
– Actively lobby and drive for consensus around technical standards and standardized protocols proactively adopted by the community; enable broad community engagement
• Customers have told Microsoft that interoperability is OUR responsibility
• Leverage existing community protocols, practices, guidelines, etc.
– Example: metadata conventions / taxonomies / ontologies, a traditional strength for libraries and a critical component in enabling Web 2.0
• Optimize for data-driven research
– To both data (scientific) and to information (scholarly publications)
– Reproducible research + computational science
– Properly document / annotate scholarly output
• Data preservation (and provenance) should be baseline
– Documentation of the data’s provenance
– Preservation needs to be like “accessibility” features, i.e., assumed as required
• Semantic knowledge discovery & social networking
– Harnessing collective intelligence must be a consideration, since accessing research is a core step in the life-cycle. Enable knowledge discovery
– Optimize for Web 2.0 scenarios and allow end-users/experts to find things more easily

The Scholarly Communication Lifecycle

[Figure: lifecycle diagram. Stages: Data Collection, Research & Analysis; Authoring; Publication & Dissemination; Storage, Archiving & Preservation.]

Labels shown in the figure:
• Collaboration; SharePoint; LiveMeeting; Office Live
• Discoverability; Libra 2.0; “Bookweb”; SharePoint
• Office OpenXML; XPS Format; SQL Server & Entity Framework; Rights Management; Data Protection Manager
• Office 2007 (Word, PowerPoint, Excel, OneNote); Tablet PC/UMPC
• Word 2007 + PowerPoint 2007; WPF & Silverlight
• “Sea Dragon” / “PhotoSynth” / “Deep Zoom”
• Excel 2007; Windows Server HPC; “Astoria” / “Pop Fly”

Scholarly Communications: Project Overview

• Current or Completed Projects
o Cornell – arXiv.org + Word 2007 (and repository interoperability via SWORD)
o MIT / Broad Institute – Authoring (Word 2007) + data for research reproducibility
o MSR – CMT++ interoperability with data + metadata transfer/exchange (conference management tool enhancements)
o LiveLabs – eJournal publishing online service (community publishing tool)
o UC San Diego / PLoS – Semantic mark-up of scholarly articles (+ submission)
o Chem4Word with Office & Cambridge University – Create add-in to Word 2007 to facilitate drawing of chemical compounds and equations
o Johns Hopkins University – Digital Archive for Astronomy/Astrophysics data (storage, preservation and access)
o Planets Project / EU (with MSR – Cambridge) – OpenXML and file format preservation + interoperability
o eChemistry Project (Cornell, Penn State, Indiana, Cambridge, Southampton) – ORE exemplar: access to compound chemical info objects (cross-repository access to open chemistry data)
o British Library – Researcher Information Centre (RIC) online workflow tool for scientists and researchers
o Creative Commons Add-in for Office 2007 – evolving the Word 2003 effort
o University of Southampton (UK) – Port ePrints Repository Software for installation on the Windows platform
o University of Manchester / “MyExperiment” Project – social networking for scientists
o ORE Acceleration Project (OAI – Object Reuse & Exchange) – Alpha spec development
o UK National Archives – Virtual PC / Emulation of legacy systems to facilitate preservation
o National Library of Medicine / NCBI – “PubMed Int’l” UK version of PubMed + NLM DTD

• Pipeline
o DRIVER 2 (EU) – Infrastructure integration across a network of European research repositories

• For Microsoft end-users, making it easier to use our software for all aspects of their research process

• For Microsoft developers, demonstrating the toolset and showing how our platform can be extended

• For non-Microsoft end-users, working to ensure the ability to interoperate with our software across all phases of the research process, as necessary

• For non-Microsoft developers, enabling transparency to our efforts in this space and encouraging a dialogue

Our goals for working in this community

Who’s here & why

Goals / Intentions

Approach

• 12:30 p.m. Welcome & Overview
Lee Dirks – Director, Education & Scholarly Communication, Microsoft Research

• 1:00 p.m. Zentity - Repository Platform
Alex Wade – Director, Scholarly Communication, Microsoft

• 2:00 p.m. Services for Repositories (RIC, Electronic Journals Service, Live Translator, Document Conversion Service)
Pablo Fernicola, Group Manager, Microsoft & Alex Wade

• 3:00 p.m. Break

• 3:15 p.m. Programming with Zentity
Savas Parastatidis, Software Philosopher, Microsoft

• 4:30 p.m. Tools for Authors (AA, Ontology, Creative Commons, ORE, Submission Wizard, etc.)
Pablo Fernicola & Alex Wade

• 5:30 p.m. Wrap-up & Futures Discussion

AGENDA

Lee Dirks
Director, Education & Scholarly Communication
Microsoft External Research
ldirks@microsoft.com

URL – http://www.microsoft.com/scholarlycomm/

Questions?

Zentity 1.0
Open Repositories ‘09 Workshop

Alex Wade
Director, Scholarly Communication
Microsoft External Research

Microsoft Corporation

Agenda

Ecosystem of Tool/Services

Repositories

User Environment
• Search
• Desktop Tools
• ELNs
• etc.

Translation; Conversion; Peer-Review

Authoring; Collaboration/VREs

• Visualization
• Discovery
• Entity Extraction
• etc.

• Goals
• System Requirements
• Architectural Stack
• Installation
• Repository Demo
– UI
– Services
• Extensibility

Agenda

Zentity – Goals

Quick
• Easy to install
• ‘Scholarly Works’ data model: Authors, Papers, Data, Videos, Code, Lectures, Books, etc.
• Default Web UI

Extensible
• UI Toolkit
• Intuitive programming experience
• Extensible Data Model (entities, relationships)
• RDFs for new data models

Interoperable
• BibTeX Import
• RSS/Atom Syndication
• METS support
• OAI-PMH Provider
• OAI-ORE
• Simple Search API
• Atom Publishing Protocol
• SWORD

Free & Open
• Freely available
• Based on open standards
• SQL Server and Developer tools available via Dreamspark
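
The "OAI-PMH Provider" goal means that a harvester can pull metadata from the repository with plain HTTP requests. A minimal sketch of how a client might build such requests is shown here in Python. The verb names and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 specification; the base URL `http://myserver/oaipmh` and the helper `oai_pmh_url` are assumptions made for illustration, not Zentity's documented endpoint or API.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def oai_pmh_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH request URL (hypothetical helper, no network access)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb, **args}
    return f"{base_url}?{urlencode(params)}"

# Assumed endpoint; replace with the address of a real provider.
BASE = "http://myserver/oaipmh"

identify_url = oai_pmh_url(BASE, "Identify")
records_url = oai_pmh_url(BASE, "ListRecords", metadataPrefix="oai_dc")
print(identify_url)
print(records_url)
```

A harvester would fetch these URLs and parse the XML responses; that part is omitted because it depends on the provider being available.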

• Supported Processor Architectures
– x86 and x64
• Supported Operating Systems
– Microsoft Windows Server 2008 (x86 and x64)
– Microsoft Windows Vista SP1 (x86 and x64)
• Installation Requirements
– Microsoft .NET Framework 3.5
– Supported Microsoft SQL Server
   • Microsoft SQL Server 2008 Enterprise Edition
   • Microsoft SQL Express 2008 with Advanced Services
• User and Configuration Requirements
– Site Admin privileges are granted to the user installing Zentity
– The selected Microsoft SQL Server instance must have “Windows Authentication” enabled
– User running the installer must have ‘database creation’ permissions on the Microsoft SQL Server instance

System Requirements

Application Stack

SQL Server 2008 (including Express edition)

ADO.NET 3.5 Entity Framework

Zentity.Core

Services

Web UI

Zentity.Security

Zentity.Search

UI.Toolkit

ScholarlyWorks Application

• A Semantic Computing platform
• A hybrid between a relational database and a triple store

Zentity - Store

Triple stores
- Evolution friendly
- Poor performance
- No need to model everything in advance
- Semantic interpretation at the application level

Relational schema
- Evolution not so easy
- Great opportunities for optimization
- Model everything in advance

Zentity Store
- Maintain a balance
- Try to model the frequently used entities in our app domain
- Try to capture the frequently used relationships
- Allow for extensibility (Relationships, Properties)
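
To make the relational/triple-store trade-off concrete, the following Python sketch models one frequently used entity as a typed table and keeps relationships in a generic subject-predicate-object table, so new relationship types need no schema change. It is only an illustration of the hybrid idea under stated assumptions: the table names, columns, helper functions, and sample data are all hypothetical and do not reflect the actual Zentity schema, which runs on SQL Server rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Relational part: a frequently used entity, modeled in advance.
CREATE TABLE resource (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,
    title TEXT NOT NULL
);
-- Triple-like part: relationships stay open-ended.
CREATE TABLE relationship (
    subject_id INTEGER NOT NULL REFERENCES resource(id),
    predicate  TEXT    NOT NULL,
    object_id  INTEGER NOT NULL REFERENCES resource(id)
);
CREATE INDEX idx_relationship ON relationship (predicate, subject_id);
""")

def add_resource(kind: str, title: str) -> int:
    """Insert a resource row and return its id."""
    cur = conn.execute(
        "INSERT INTO resource (kind, title) VALUES (?, ?)", (kind, title))
    return cur.lastrowid

def relate(subject_id: int, predicate: str, object_id: int) -> None:
    """Record a relationship; any predicate string is accepted."""
    conn.execute("INSERT INTO relationship VALUES (?, ?, ?)",
                 (subject_id, predicate, object_id))

def related(subject_id: int, predicate: str) -> list[str]:
    """Titles of resources linked from subject_id by the given predicate."""
    rows = conn.execute(
        """SELECT r.title FROM relationship rel
           JOIN resource r ON r.id = rel.object_id
           WHERE rel.subject_id = ? AND rel.predicate = ?
           ORDER BY r.title""",
        (subject_id, predicate))
    return [title for (title,) in rows]

# Hypothetical sample data.
slides = add_resource("Presentation", "Example slides")
author = add_resource("Person", "Example Author")
relate(slides, "authored by", author)
relate(slides, "reviewed by", author)  # new predicate, no schema change

print(related(slides, "authored by"))
```

The typed `resource` table leaves room for indexing and optimization, while the `relationship` table provides the "evolution friendly" behaviour attributed to triple stores.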

Research Output Repository Platform

[Figure: example resource graph. Nodes: a PowerPoint presentation; a PDF file; a lecture on 2/19/2008; people (tony; Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy). Relationship labels: authored by, presented by, organized by, is representation of, contains.]

Installation

[Installer screenshots. Screens shown: EULA; SQL Server instance (localhost\SQLExpress); FILESTREAM File Location; OAI-PMH database; Configure IIS; IIS App Pool.]

ZENTITY DEMO

• Basic Search
• Search Filters
• Advanced Query Syntax (AQS)
– Field Support
• Advanced Search

Search

– http://<myserver>/Syndication/Syndication.ashx?resourcetype: book author:(tony hey)

Syndication
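
The syndication feed above takes a search query after the `?`, and that query contains spaces and parentheses. A client building the URL programmatically would normally percent-encode it. The Python sketch below shows one way to do that. It assumes the whole query string is passed verbatim after `?` as on the slide, and that the server accepts standard percent-encoding; the helper name `syndication_url` and the server name `myserver` are placeholders, not part of Zentity.

```python
from urllib.parse import quote

def syndication_url(server: str, query: str) -> str:
    """Build a syndication feed URL for a search query (hypothetical helper)."""
    base = f"http://{server}/Syndication/Syndication.ashx"
    # Keep ':' and parentheses readable; encode spaces and other characters.
    return f"{base}?{quote(query, safe=':()')}"

url = syndication_url("myserver", "resourcetype:book author:(tony hey)")
print(url)
```

The resulting URL can be given to any RSS/Atom reader to subscribe to the results of that search.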

• Web UI & UI Toolkit
– CSS
– ASP.NET Controls
• Services
• Search
• Security
• Data Model

Extensibility


• The site provides access to, and downloads of, relevant tools and resources for the worldwide academic research community. A small set of examples:

– Research Output Repository: building blocks, tools, and services for developers who are tasked with creating and maintaining an organization’s repository ecosystem. http://research.microsoft.com/zentity

– Tools and Services for Research Collaboration: http://research.microsoft.com/en-us/collaboration/tools/default.aspx

Further Information and Resources
http://research.microsoft.com
