8/8/2019 Sem a Tic Microsoft
http://slidepdf.com/reader/full/sem-a-tic-microsoft 1/31
Semantic Application for Digital Repositories
Fabrizio Gagliardi, EMEA & LATAM Director
Technical Computing, MSR External Research, Microsoft Corporation
Microsoft Research’s Commitment to Science
• Advancement of Science
• Global Collaboration
• Technology Excellence
• Interoperability
Putting computing into science… Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process
Putting science into computing… Ensuring that research community requirements are factored into future versions of Microsoft software
myGrid
• Semantic relationships between different data
• Semantic descriptions of services
• Annotations
• Provenance
• Repositories
• Ontologies
Research Output Repository Platform
Goals
• A platform for building services and tools for research output repositories
– Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
– Relationships between stored entities
• Enable a tools and services ecosystem for “research output” repositories on MS technologies
Execution
• Utilizing OAI-ORE, SWORD, and other community protocols
• In development; deployment within MSR in early Q4
• Beta release to the community in late Q4
• Built on SQL Server 2008 + Entity Framework
• Using WPF and Silverlight for UI
[Diagram: the research output repository platform at the center, connected to UIs, desktop tools, syndication/interop, and search]
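The Execution bullets list OAI-ORE among the community protocols. As a rough, hypothetical illustration (not the platform’s actual serializer), the C# sketch below emits a minimal OAI-ORE Resource Map as N-Triples for a lecture that aggregates a slide deck and its PDF. The `ore:` and `rdf:` namespace URIs are the published ones; the example.org URIs and the `ResourceMap` helper are invented. The ORE specification also calls for metadata about the Resource Map itself (for example its creator and modification date), which this sketch omits.

```csharp
using System;
using System.Collections.Generic;

// Term namespaces from the OAI-ORE and RDF vocabularies.
const string Ore = "http://www.openarchives.org/ore/terms/";
const string Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Hypothetical URIs: one Resource Map (ReM) describing one Aggregation (the lecture).
var rem = "http://example.org/rem/lecture-2008-02-19";
var lecture = "http://example.org/aggregation/lecture-2008-02-19";
var files = new[] { "http://example.org/files/lecture.ppt", "http://example.org/files/lecture.pdf" };

// Hypothetical helper: builds the ReM as N-Triples lines (<subject> <predicate> <object> .).
List<string> ResourceMap(string remUri, string aggregationUri, IEnumerable<string> aggregatedResources)
{
    var triples = new List<string>
    {
        $"<{remUri}> <{Rdf}type> <{Ore}ResourceMap> .",
        $"<{remUri}> <{Ore}describes> <{aggregationUri}> .",
        $"<{aggregationUri}> <{Rdf}type> <{Ore}Aggregation> .",
    };
    // One ore:aggregates triple per aggregated resource.
    foreach (var resource in aggregatedResources)
    {
        triples.Add($"<{aggregationUri}> <{Ore}aggregates> <{resource}> .");
    }
    return triples;
}

foreach (var line in ResourceMap(rem, lecture, files))
{
    Console.WriteLine(line);
}
```

Running it prints five triples: two `rdf:type` statements, the `ore:describes` link, and one `ore:aggregates` line per file.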
Research Output Repository Platform
Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Become the “research output” repository for MSR (RMCr project)
– Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open-source route)
• Build an easy-to-install collection of basic services and tools
Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions
[Architecture stack, top to bottom: Services/tools; Microsoft.Famulus.Framework; Microsoft.Famulus.Core (based on the Entity Framework model + extensions); SQL Server 2008, MS data storage technologies, Entity Framework runtime]
Research Output Repository Platform
• A Semantic Computing platform
• A hybrid between a relational database and a triple store
Triple stores
– Evolution friendly
– Poor performance
– No need to model everything in advance
– Semantic interpretation at the application level
Relational schema
– Evolution not so easy
– Great opportunities for optimization
– Model everything in advance
Research Output Repository Platform: maintain a balance
– Try to model the frequently used entities in our app domain
– Try to capture the frequently used relationships
– Allow for extensibility (relationships, attributes)
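The “maintain a balance” idea can be sketched in a few lines: frequently used entities get typed, fixed columns (the relational side), and anything not modeled in advance goes into an open subject/predicate/object table (the triple-store side). This is a hypothetical in-memory illustration, not the Famulus schema; `publications`, `triples`, `TitlesFrom`, and `Objects` are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Relational side (hypothetical): frequently used entities get typed, fixed columns,
// so queries on them can be indexed and optimized.
var publications = new Dictionary<int, (string Title, int Year)>
{
    [1] = ("Title1", 2008),
    [2] = ("Title2", 2007),
};

// Triple-store side (hypothetical): relationships and attributes that were not
// modeled in advance are stored as open (subject, predicate, object) rows.
var triples = new List<(string Subject, string Predicate, string Object)>
{
    ("pub:1", "cites", "pub:2"),
    ("pub:1", "hasTag", "keyword"),
    ("pub:1", "reviewedBy", "person:tony"), // new relationship type, no schema change
};

// Typed query over the fixed schema.
IEnumerable<string> TitlesFrom(int year) =>
    publications.Values.Where(p => p.Year == year).Select(p => p.Title);

// Schema-free query: any predicate works, but its meaning is left to the application.
IEnumerable<string> Objects(string subject, string predicate) =>
    triples.Where(t => t.Subject == subject && t.Predicate == predicate).Select(t => t.Object);

Console.WriteLine(string.Join(", ", TitlesFrom(2008)));          // Title1
Console.WriteLine(string.Join(", ", Objects("pub:1", "cites")));  // pub:2
```

The trade-off on the slide shows up directly: `TitlesFrom` can only ask about columns that were designed up front, while `Objects` accepts a predicate such as `reviewedBy` that nobody modeled, leaving its interpretation to the application.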
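An intuitive programming experience
// Entities are plain objects (Person, Publication, Tag)...
Person tony = new Person();
Publication pub1 = new Publication();
pub1.Title = "Title1";
Publication pub2 = new Publication();
pub2.Title = "Title2";
// ...and relationships are collection properties on the entities
pub1.Cites.Add(pub2);     // pub1 cites pub2
pub1.Authors.Add(tony);   // tony is an author of pub1
Tag tag = new Tag();
tag.Name = "keyword";
pub1.Tags.Add(tag);       // attach a keyword tag to pub1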
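Research Output Repository Platform
[Diagram: an example entity graph linking a lecture on 2/19/2008, a PowerPoint presentation, and a PDF file to the people involved (tony; Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy) through relationships labeled “authored by”, “presented by”, “organized by”, “is representation of”, and “contains”]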
An Ecosystem of Research Repositories
Researchers manage their personal research entities (data, citations, documents, workflows, etc.)
Entities + relationships can be synced to cloud storage so that they are:
- Always available
- Sharable
- Mixable
- Harvestable
Support for harvesting & federation to/from institutional repositories:
- arXiv.org
- DSpace
- ePrints
- Fedora
- etc.
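The slide does not name a harvesting protocol. Assuming OAI-PMH, which repositories such as arXiv, DSpace, and ePrints support, an incremental harvest starts with a `ListRecords` request and then follows `resumptionToken`s page by page. The sketch below only builds the request URLs; `ListRecordsUrl` and the example.org endpoint are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that builds an OAI-PMH ListRecords request URL.
// The first request names a metadata format (oai_dc, Dublin Core, which every
// OAI-PMH repository must support) and optionally a "from" date for incremental
// harvesting; follow-up requests send only the resumptionToken from the previous
// response, because resumptionToken is an exclusive argument.
string ListRecordsUrl(string baseUrl, string? fromDate = null, string? resumptionToken = null)
{
    var parameters = new List<(string Key, string Value)> { ("verb", "ListRecords") };
    if (resumptionToken != null)
    {
        parameters.Add(("resumptionToken", resumptionToken));
    }
    else
    {
        parameters.Add(("metadataPrefix", "oai_dc"));
        if (fromDate != null)
        {
            parameters.Add(("from", fromDate));
        }
    }
    var query = string.Join("&", parameters.Select(p => $"{p.Key}={Uri.EscapeDataString(p.Value)}"));
    return $"{baseUrl}?{query}";
}

// Example endpoint (hypothetical).
Console.WriteLine(ListRecordsUrl("http://repository.example.org/oai", fromDate: "2008-06-01"));
// http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2008-06-01
```

A harvester would call `ListRecordsUrl` once with a `fromDate`, then repeatedly with the `resumptionToken` from each response until the repository returns an empty token.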
Current Project Status
• Limited Tech Preview release due June 2008
• Public Beta targeted for Aug/Sept 2008
For more details, contact:
• Alex Wade (Program Manager) / [email protected]
• Community Forum: http://community.research.microsoft.com/forums/90.aspx
eScience and Semantic Computing meet the Cloud
The cyberinfrastructure for the next generation of researchers
The Future: Software plus Services for Science?
• Expect scientific research environments to follow similar trends to the commercial sector
– Leverage computing and data storage in the cloud
– Scientists are already experimenting with Amazon S3 and EC2 services, with mixed results
• For many of the same reasons
– Siloed research teams, no resource sharing across labs
– High storage costs
– Low resource utilization
– Excess capacity
– High costs of reliably keeping machines up-to-date
– Little support for developers, system operators
A smart cyberinfrastructure
• Collective intelligence
– If last.fm can recommend what song to broadcast to me based on what my friends are listening to, why can’t the cyberinfrastructure of the future recommend articles of potential interest based on what the experts in the field that I respect are reading?
– Examples are already emerging, but the process is manual (Connotea, BioMed Central Faculty of 1000, ...)
• Automatic correlation of scientific data
• Smart composition of services and functionality
• Cloud computing to aggregate, process, analyze, and visualize data
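The collective-intelligence bullet can be made concrete with the simplest possible scheme: rank papers I have not read by how many of the experts I respect are reading them. The reading lists, user names, and the `Recommend` helper are invented for illustration; this is a sketch of co-readership counting, not a description of any existing service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reading lists: which papers each researcher is currently reading.
var readingLists = new Dictionary<string, string[]>
{
    ["me"] = new[] { "paperA" },
    ["expert1"] = new[] { "paperA", "paperB", "paperC" },
    ["expert2"] = new[] { "paperB", "paperD" },
    ["stranger"] = new[] { "paperE" },
};

// The experts "I respect" (hypothetical).
var respected = new[] { "expert1", "expert2" };

// Co-readership ranking: count how many respected experts read each paper the user
// has not read yet, highest count first, ties broken alphabetically.
List<string> Recommend(string user, IEnumerable<string> experts) =>
    experts.SelectMany(expert => readingLists[expert])
           .Where(paper => !readingLists[user].Contains(paper))
           .GroupBy(paper => paper)
           .OrderByDescending(g => g.Count())
           .ThenBy(g => g.Key, StringComparer.Ordinal)
           .Select(g => g.Key)
           .ToList();

Console.WriteLine(string.Join(", ", Recommend("me", respected))); // paperB, paperC, paperD
```

With the sample data, `paperB` ranks first because both respected experts are reading it; `paperA` is excluded because "me" has already read it, and the unrelated `stranger` list is never consulted.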
A world where all data is linked…
• Important/key considerations
– Formats or “well-known” representations of data/information
– Pervasive access protocols are key (e.g. HTTP)
– Data/information is uniquely identified (e.g. URIs)
– Links/associations between data/information
• Data/information is interconnected through machine-interpretable information (e.g. paper X is about star Y)
• Social networks are a special case of ‘data networks’
Attribution: Richard Cyganiak
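The “paper X is about star Y” example maps directly onto an RDF statement whose parts are all HTTP URIs. The sketch below writes it, plus a social-network link, as N-Triples. The paper, star, and person URIs under example.org are hypothetical; the predicates are the Dublin Core `subject` term and the FOAF `knows` property, chosen here as plausible vocabularies rather than taken from the slide.

```csharp
using System;

// Formats one RDF statement as an N-Triples line: <subject> <predicate> <object> .
string NTriple(Uri subject, Uri predicate, Uri obj) =>
    $"<{subject.AbsoluteUri}> <{predicate.AbsoluteUri}> <{obj.AbsoluteUri}> .";

// Hypothetical resource URIs: every piece of data gets its own HTTP URI.
var paperX = new Uri("http://example.org/paper/X");
var starY = new Uri("http://example.org/star/Y");
var alice = new Uri("http://example.org/person/alice");
var bob = new Uri("http://example.org/person/bob");

// Vocabulary terms: Dublin Core "subject" and FOAF "knows".
var dctermsSubject = new Uri("http://purl.org/dc/terms/subject");
var foafKnows = new Uri("http://xmlns.com/foaf/0.1/knows");

// "Paper X is about star Y": a machine-interpretable link between two data items.
Console.WriteLine(NTriple(paperX, dctermsSubject, starY));
// <http://example.org/paper/X> <http://purl.org/dc/terms/subject> <http://example.org/star/Y> .

// A social network is just another set of links in the same data network.
Console.WriteLine(NTriple(alice, foafKnows, bob));
```

Because both statements share the same URI-plus-link form, the topical link and the social link can be stored, merged, and queried together.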
Vision of Future Research Environment with both Software + Services
…and stored/processed/analyzed in the cloud
[Diagram: services in the cloud: scholarly communications, domain-specific services, instant messaging, identity, document store, blogs & social networking, notification, search, books, citations, visualization and analysis services, storage/data services, compute services, virtualization, project management, reference management, knowledge management, knowledge discovery]
Emergence of a New Research Paradigm?
• Thousand years ago: Experimental Science
– Description of natural phenomena
• Last few hundred years: Theoretical Science
– Newton’s Laws, Maxwell’s Equations…
• Last few decades: Computational Science
– Simulation of complex phenomena
• Today: eScience or Data-centric Science
– Unify theory, experiment, and simulation
– Using data exploration and data mining
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
– Scientists overwhelmed with data
– Computer Science and IT companies have technologies that will help
(With thanks to Jim Gray)
Today
Web users...
• Generate content on the Web
– Blogs, wikis, podcasts, videocasts, etc.
• Form communities
– Social networks, virtual worlds
• Interact, collaborate, share
– Instant messaging, web forums, content sites
• Consume information and services
– Search, annotate, syndicate
Scientists...
• Annotate, share, discover data
– Custom, standalone tools
• Conferences, Journals
– Long publication process, subscriptions, discoverability issues
• Collaborate on projects, exchange ideas
– Email, F2F meetings, video conferences
• Use workflow tools to compose services
– Domain-specific services/tools
Data can be easily produced
http://ecrystals.chem.soton.ac.uk
Thanks to Jeremy Frey
Data and services can be easily composed
SensorMap
Functionality: Map navigation
Data: sensor-generated temperature, video camera feed, traffic feeds, etc.
Taverna Workflow
Compose services from the Web
Data is easily accessible
With thanks to Catharine van Ingen
Data is easily shareable
Sloan Digital Sky Server/SkyServer
http://cas.sdss.org/dr5/en/
Today…
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
For example, Google and Microsoft both have copies of the Web for indexing purposes
Tomorrow…
Computers will still be great tools for storing, computing, managing, and indexing huge amounts of data
We would like computers to also help with the automatic acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, and inference of the world’s information
Semantic Computing
What is Semantic Computing?
• Set of concepts and technologies
– Data modeling
– Relationships
– Ontologies
– Machine learning (entity extraction)
– Inference, reasoning
– Data, information, knowledge…
[Diagram: a progression Data → Information → Knowledge → Intelligence → Wisdom, annotated “Current technologies” and “Possibilities for innovation”]
Semantics
• Term used to refer to the concept of “meaning”
• The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning”- and “knowledge”-related technologies for decades
• Pragmatic approach to Semantic Computing
– Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.)
– Combine them with pervasive Web community technologies such as folksonomies…
A word about the “Semantic Web”
• The term is used to describe a set of technologies used to represent data, concepts, and their relationships
– It has become a buzzword, like Web 2.0
• We prefer the term “Semantic Computing”, which is about modeling data in ways that can be automatically processed by computers
Semantic Computing
• Some efforts are driven by the traditional “knowledge engineering” community
– Engaged in building well-controlled ontologies
– Important for domain-specific vocabularies with data formats and relationships specific to a community
– Model does not easily scale to the Internet
• Some efforts are driven by the Web 2.0 community
– Focus on the pervasiveness of Web protocols/standards
– Emphasis on microformats (small, flexible, embeddable structures)
– Exploit evolving and ever-expanding vocabularies such as folksonomies and tag clouds
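Folksonomies and tag clouds, mentioned in the last bullet, rest on a simple computation: count how often users applied each free-form tag, then map counts to display sizes. The sketch below uses linear scaling between two font sizes; the tag data, the size bounds, and the `TagCloud` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical folksonomy: free-form tags that users attached to resources.
var tagAssignments = new[] { "astronomy", "SDSS", "astronomy", "galaxies", "Astronomy", "sdss" };

// Tag cloud: count each tag (case-insensitively) and scale its frequency
// linearly into a font size between minSize and maxSize.
Dictionary<string, int> TagCloud(IEnumerable<string> tags, int minSize = 10, int maxSize = 30)
{
    var counts = tags.GroupBy(t => t.ToLowerInvariant())
                     .ToDictionary(g => g.Key, g => g.Count());
    if (counts.Count == 0)
    {
        return counts; // nothing to scale
    }
    int lo = counts.Values.Min();
    int hi = counts.Values.Max();
    return counts.ToDictionary(
        kv => kv.Key,
        kv => hi == lo ? minSize : minSize + (kv.Value - lo) * (maxSize - minSize) / (hi - lo));
}

foreach (var (tag, size) in TagCloud(tagAssignments).OrderByDescending(kv => kv.Value))
{
    Console.WriteLine($"{tag}: {size}pt");
}
// astronomy: 30pt
// sdss: 20pt
// galaxies: 10pt
```

Linear scaling is only one choice; logarithmic scaling is also common when a few tags dominate the counts.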