Microsoft Semantic Engine Naveen Garg, Duncan Davenport Microsoft Corporation SVR32

MICROSOFT SEMANTIC ENGINE Unified Search, Discovery and Insight

Microsoft Semantic Engine

Naveen Garg, Duncan Davenport

Microsoft Corporation

SVR32

Session Code: SVR32

FUTURE

MICROSOFT SEMANTIC ENGINE

Unified Search, Discovery and Insight

THE SITUATION TODAY

Significant Content is Outside Structured Storage (RDBMS, OLAP, BI)

Integration of this Content is Prohibitively Expensive (Time, Money, Resources)

Extracting Insight, Analytics, and Recommendations is even harder

Situation is a Confluence of Search | Predictive Analytics | Large-Scale Collaborative Filtering

THE SOLUTION

Having all forms of digital information on a single platform allows people to blend unstructured and structured content and to drive insight and decision making

Microsoft Semantic Engine provides a combination of technologies to form a contextual understanding of all digital content

SCENARIO|MEANING DRIVEN INSIGHT

Critical Business Need | Analysts gather documents, media and web content about “Business Analytics”, “Data Integration” and “Search and Discovery”

Core Machine Learning | Unsupervised learning infers “Unified Information Access” concept cluster based on automated analysis of content

Efficient Data Aggregation | Cluster gains in relevance from mining across unstructured and structured sources added from ERP and BI systems

User Relevance Boost | Users (BDM) re-label cluster as “Unified Search, Discovery and Insight” and engine adopts it, further boosting that cluster's relevance

Collaborative Boost | Analysts collate this content requiring multi-resolution super-clusters with embedded sub-clusters

Business Decision Making | The CxO explores super-cluster and drafts business plan for her new division

SCENARIOS|UNIVERSAL APPEAL

Search and Collaboration | Personalized search, discovery and organization

Legal | Precedent and subject based search over large scale textual corpuses

Life Sciences | Systems biology with large volume data correlation and search

Government Services | Intelligence, real-time analytics, visualization, clustering

Social Networking | Social graph relevance mining, ranking criteria auto tuning

FEATURES|UNIFY YOUR CONTENT

Unified Search, Discovery and Insight

Automatic Clustering and Organization

Meaning-Driven Indexing, Classification and Storage

Scalable Content Processing over all Content Types

Instant On Experience for Out of Box Value

DEMO|VIEWS GALLERY

Search, Discover and Organize features exposed via sample UX gallery

Seamless installation and indexing of desktop, email and web content

Fully documented Managed APIs used in UX gallery and JavaScript / C# samples

DESIGN|MEANING-DRIVEN PROCESSING

Streams | Descriptors (Properties) | Kinds (Concepts)

Streams processed into contextualized and indexed concepts for search | discovery | organization

KR_CLIENT_225.docx | STREAM
LEGAL DOCUMENT | CONCEPT
BILLABLE WORK | CONCEPT
EVIDENCE | CONCEPT
DEPOSITION | CONCEPT
EXTRACTED PROPERTIES | PROPERTY
LEGAL CASE [xxx] | CONCEPT CLUSTER
SEARCH AND SHARE | MDP

DESIGN|ARCHITECTURE

Engine consists of self-contained set of pluggable services

Text Processing | Image Processing | Video Processing | Audio Processing

Supervised Machine Learning | Clustering | MDI (RBV) | Conceptual Search

Inference | Sequence Store (Suffix Tree) | Distributed Content Store | Ontology and Taxonomy Management

Semantic Engine

Search and Markup | Trend and Predictive Analysis | Automatic Organization | Recommendation and Discovery
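
One way to picture "a self-contained set of pluggable services" is a registry that maps a content type to the processor that handles it. The sketch below is only an illustration under assumptions: the `ProcessorRegistry` class, its `register` and `process` methods, and the toy text and image processors are hypothetical names, not part of the Semantic Engine API.

```python
from typing import Callable, Dict, List

# Hypothetical illustration: none of these names come from the Semantic Engine.
Processor = Callable[[bytes], List[str]]


class ProcessorRegistry:
    """Maps a content type (e.g. "text") to a pluggable processor."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def register(self, content_type: str, processor: Processor) -> None:
        # Plugging in a service is just adding an entry to the registry.
        self._processors[content_type] = processor

    def supported(self) -> List[str]:
        # Content types this registry can currently handle, sorted by name.
        return sorted(self._processors)

    def process(self, content_type: str, payload: bytes) -> List[str]:
        # Content without a registered processor cannot be handled here.
        if content_type not in self._processors:
            raise KeyError(f"no processor registered for {content_type!r}")
        return self._processors[content_type](payload)


def text_processor(payload: bytes) -> List[str]:
    # Toy stand-in for text processing: lower-cased, whitespace-split tokens.
    return payload.decode("utf-8").lower().split()


def image_processor(payload: bytes) -> List[str]:
    # Toy stand-in for image processing: report only the payload size.
    return [f"image:{len(payload)}-bytes"]


registry = ProcessorRegistry()
registry.register("text", text_processor)
registry.register("image", image_processor)

tokens = registry.process("text", b"Unified Search Discovery and Insight")
print(tokens)
```

Under this assumed design, adding audio or video support would mean registering one more function rather than changing the registry itself.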

DESIGN|SCALABLE ARCHITECTURE

The logical architecture partitions analysis, indexing and storage

API1 | API2 | API3

Scale out by adding boxes; standard “web farm” (VIP) configuration

Analysis1 | Analysis2 | Analysis3

Scale out by adding boxes; each box can run all processors or specific processors

Text | Image | Audio | Video

Staging | Core | Index | Stream

Single Logical Partitionable

Store(<content>) | Annotate(<kind>) | Index(<content>) | Organize(<kinds>) | Search(<query>) | …
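
The operation names on this slide (Store, Annotate, Index, Organize, Search) can be illustrated with a toy in-memory stand-in. This is a minimal sketch under assumptions, not the engine's API: the `ToyEngine` class, the method signatures, the integer identifiers, and the whitespace-token inverted index are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ToyEngine:
    """Hypothetical in-memory stand-in for Store/Annotate/Index/Organize/Search."""

    def __init__(self) -> None:
        self._content: Dict[int, str] = {}                     # content id -> text
        self._index: Dict[str, Set[int]] = defaultdict(set)    # term -> content ids
        self._kinds: Dict[int, Set[str]] = defaultdict(set)    # content id -> kinds

    def store(self, content: str) -> int:
        # Store(<content>): keep the content and hand back an identifier.
        content_id = len(self._content) + 1
        self._content[content_id] = content
        self.index(content_id)  # in this toy, storing triggers indexing
        return content_id

    def index(self, content_id: int) -> None:
        # Index(<content>): simple inverted index over lower-cased terms.
        for term in self._content[content_id].lower().split():
            self._index[term].add(content_id)

    def annotate(self, content_id: int, kind: str) -> None:
        # Annotate(<kind>): attach a kind (concept) label to stored content.
        self._kinds[content_id].add(kind)

    def organize(self, kinds: List[str]) -> Dict[str, List[int]]:
        # Organize(<kinds>): group content ids under each requested kind.
        return {
            kind: sorted(cid for cid, labels in self._kinds.items() if kind in labels)
            for kind in kinds
        }

    def search(self, query: str) -> List[int]:
        # Search(<query>): ids of content containing every query term.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set(self._index.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._index.get(term, set())
        return sorted(hits)


engine = ToyEngine()
deposition = engine.store("deposition transcript for legal case")
invoice = engine.store("billable work invoice for legal case")
engine.annotate(deposition, "DEPOSITION")
engine.annotate(invoice, "BILLABLE WORK")

print(engine.search("legal case"))
print(engine.organize(["DEPOSITION", "BILLABLE WORK"]))
```

A real deployment would place these operations behind the scaled-out API boxes; the toy keeps everything in one process purely to show how the operations relate.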

DESIGN|PROGRAMMING

Designed to be hassle free out of the box

Several programming languages and frameworks supported

CLR/.NET, JavaScript, T-SQL, C++

DESIGN|PROGRAMMING

Sample of storing a stream in the system

Initiates the content processing, classification, and indexing

DESIGN|PROGRAMMING

Sample of search and recommendations

Returns contextual results from the store and the web

DEMO|WINDOWS 7 SHELL EXTENSION

Seamless Integration in Windows Desktop Federated Search

Expose Meaning-Driven Indexing and Semantic Actions

Zero Learning Curve

DESIGN|ARCHITECTURE DETAILS

System Integration Fabric (SIF)

Importers

Files

API Layer

Plug-Ins

Semantic Engine

Database

Kind | Descriptor | Stream | KindLink | ListKind

DESIGN|ANATOMY OF A KIND

KindID | SourceUri
00000000-1111 | C:\My Documents\Saint Germain Des Pres Cafe (Finest electro-jazz compilation)\05 Track 5.wma

StreamID | KindID | StreamUri | Format | Stream
11111111-2222 | 00000000-1111 | | audio/x-ms-wma | 0xFFD8FFE000104A4649460001…

DescriptorID | KindID | Type | Attribute | Value
10000000-0000 | 00000000-1111 | Classification | Audio | 1.0
20000000-0000 | 00000000-1111 | Metadata | Name | 05 Track 5.wma
30000000-0000 | 00000000-1111 | Metadata | Item Type | Windows Media Audio File
40000000-0000 | 00000000-1111 | Metadata | Length | 00:05:22
50000000-0000 | 00000000-1111 | Metadata | WM/ProviderStyle | Electronica
60000000-0000 | 00000000-1111 | Audio | Tonality/Major | 0.78
70000000-0000 | 00000000-1111 | Audio | Tempo/Moderato | 0.79
80000000-0000 | 00000000-1111 | Classification | Music | .8
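
The three tables on this slide (Kind, Stream, Descriptor) can be read as a small data model in which streams and descriptors both point back to one KindID. The sketch below mirrors the slide's rows using Python dataclasses. The class layout, the `describe` and `values` helpers, and the field name `type_` are assumptions made for illustration; the stream bytes are only the prefix visible on the slide, which truncates the rest.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Descriptor:
    descriptor_id: str
    kind_id: str
    type_: str        # e.g. "Classification", "Metadata", "Audio"
    attribute: str
    value: str


@dataclass
class Stream:
    stream_id: str
    kind_id: str
    format: str
    data: bytes
    stream_uri: Optional[str] = None  # blank on the slide


@dataclass
class Kind:
    kind_id: str
    source_uri: str
    streams: List[Stream] = field(default_factory=list)
    descriptors: List[Descriptor] = field(default_factory=list)

    def describe(self, descriptor_id: str, type_: str, attribute: str, value: str) -> Descriptor:
        # Hypothetical helper: every descriptor carries the owning KindID.
        descriptor = Descriptor(descriptor_id, self.kind_id, type_, attribute, value)
        self.descriptors.append(descriptor)
        return descriptor

    def values(self, type_: str) -> Dict[str, str]:
        # Attribute -> value for all descriptors of one type.
        return {d.attribute: d.value for d in self.descriptors if d.type_ == type_}


kind = Kind(
    kind_id="00000000-1111",
    source_uri=r"C:\My Documents\Saint Germain Des Pres Cafe (Finest electro-jazz compilation)\05 Track 5.wma",
)
kind.streams.append(
    Stream(
        stream_id="11111111-2222",
        kind_id=kind.kind_id,
        format="audio/x-ms-wma",
        data=bytes.fromhex("FFD8FFE000104A4649460001"),  # visible prefix only
    )
)

rows = [
    ("10000000-0000", "Classification", "Audio", "1.0"),
    ("20000000-0000", "Metadata", "Name", "05 Track 5.wma"),
    ("30000000-0000", "Metadata", "Item Type", "Windows Media Audio File"),
    ("40000000-0000", "Metadata", "Length", "00:05:22"),
    ("50000000-0000", "Metadata", "WM/ProviderStyle", "Electronica"),
    ("60000000-0000", "Audio", "Tonality/Major", "0.78"),
    ("70000000-0000", "Audio", "Tempo/Moderato", "0.79"),
    ("80000000-0000", "Classification", "Music", ".8"),
]
for row in rows:
    kind.describe(*row)

print(kind.values("Classification"))
```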

DESIGN| MODELSPACE

DESIGN| PROPERTYSPACE

Periodically, MSE checks the User database for Changes

All Change data is returned to MSE as one XML block

MSE creates Kinds and Descriptors as needed, and Commits the activity

MSE data is exposed through custom views keyed to the Users’ Primary Keys
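
The slide does not show the format of the XML change block, so the following is a sketch under stated assumptions rather than the actual mechanism. The `<changes>`, `<row>` and `<column>` elements, the `table` and `pk` attributes, the `Tracks` table, and the `apply_changes` function are all hypothetical. It shows one plausible way a single XML block of changes could be turned into kinds (one per primary key) with descriptors (one per changed column); the commit step is not modeled.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical change block: element and attribute names are assumptions.
CHANGE_BLOCK = """
<changes table="Tracks">
  <row pk="1">
    <column name="Name">05 Track 5.wma</column>
    <column name="Length">00:05:22</column>
  </row>
  <row pk="2">
    <column name="Name">Example.docx</column>
  </row>
</changes>
"""

# Kind key ("<table>:<primary key>") -> {attribute: value}
Kinds = Dict[str, Dict[str, str]]


def apply_changes(xml_text: str, kinds: Kinds) -> int:
    """Create or update kinds and descriptors from one XML change block."""
    root = ET.fromstring(xml_text.strip())
    table = root.get("table", "")
    applied = 0
    for row in root.findall("row"):
        # Keying each kind to the user's primary key keeps it joinable later.
        key = f"{table}:{row.get('pk')}"
        descriptors = kinds.setdefault(key, {})  # create the kind if needed
        for column in row.findall("column"):
            descriptors[column.get("name", "")] = column.text or ""
        applied += 1
    return applied


kinds: Kinds = {}
applied = apply_changes(CHANGE_BLOCK, kinds)
print(applied, kinds)
```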

DEMO|SQL PROPERTY PROMOTION

Seamless Integration of Meaning-Driven Indexing in ALL SQL Tables

Expose Meaning-Driven Indexing via T-SQL
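
To make the idea of exposing engine data through views keyed to a user table's primary key concrete, here is a minimal sketch that uses SQLite from Python in place of SQL Server and T-SQL. Everything in it is assumed for illustration: the `Tracks` user table, the simplified `Descriptor` table with a `UserKey` column, and the `TrackDescriptors` view are hypothetical and are not the schema the demo used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical user table with its own primary key.
    CREATE TABLE Tracks (TrackID INTEGER PRIMARY KEY, FileName TEXT);

    -- Simplified descriptor table; UserKey points at the user's primary key.
    CREATE TABLE Descriptor (
        DescriptorID INTEGER PRIMARY KEY,
        UserKey INTEGER,
        Type TEXT,
        Attribute TEXT,
        Value TEXT
    );

    -- View that exposes descriptors next to the user's own rows.
    CREATE VIEW TrackDescriptors AS
        SELECT t.TrackID, t.FileName, d.Type, d.Attribute, d.Value
        FROM Tracks AS t
        JOIN Descriptor AS d ON d.UserKey = t.TrackID;
    """
)

conn.execute("INSERT INTO Tracks (TrackID, FileName) VALUES (1, '05 Track 5.wma')")
conn.executemany(
    "INSERT INTO Descriptor (UserKey, Type, Attribute, Value) VALUES (?, ?, ?, ?)",
    [
        (1, "Classification", "Audio", "1.0"),
        (1, "Metadata", "WM/ProviderStyle", "Electronica"),
        (1, "Classification", "Music", ".8"),
    ],
)

# Query the promoted properties with plain SQL through the view.
rows = conn.execute(
    "SELECT FileName, Attribute, Value FROM TrackDescriptors "
    "WHERE Type = 'Classification' ORDER BY Attribute"
).fetchall()
print(rows)
```

The point of the design is that an application keeps querying its own table and key, and the descriptors appear as extra columns reachable by an ordinary join.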

PARTING THOUGHTS

Unified Search, Discovery and Insight over Every Digital Artifact

Extensible and Scalable Semantic Platform

Zero Learning Curve

YOUR FEEDBACK IS IMPORTANT TO US!

Please fill out session evaluation forms online at MicrosoftPDC.com

Learn More On Channel 9

> Expand your PDC experience through Channel 9.

> Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses.

channel9.msdn.com/learn

Built by Developers for Developers….

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.