©2012 Microsoft Corporation. All rights reserved. Machine Learning in SharePoint 2013 Naomi Moneypenny CTO, Synxi ManyWorlds #SPSKC

SPSKC Machine Learning in SharePoint

DESCRIPTION

Overview of Machine Learning for SharePoint, predictive analytics, proactively anticipating the content that users need in context.

Page 1: SPSKC Machine Learning in SharePoint

Machine Learning in SharePoint 2013

Naomi Moneypenny

CTO, Synxi

ManyWorlds

#SPSKC

Page 2: SPSKC Machine Learning in SharePoint

Thanks to our Sponsors!!!

Page 3: SPSKC Machine Learning in SharePoint

@nmoneypenny www.Synxi.com

About Me…

Chief Technology Officer at ManyWorlds, Inc.

Internally responsible for our knowledge infrastructure – de facto CIO

Run the engineering team for Synxi, a machine learning engine for personalized knowledge and expertise discovery in SharePoint and other collaborative environments such as the Yammer & tibbr platforms.

Passionate about user adoption and enterprise collaboration & innovation

geek

Technology forecasting and strategy manager at Shell, consulted at many Fortune 100 companies since

20+ patents in adaptive systems

Naomi Moneypenny [email protected]

3000+ followers on Twitter

3+3 dogs Astrophysicist

Page 4: SPSKC Machine Learning in SharePoint

Personalized Discovery that Anticipates is an Expected Feature in the Consumer IT World . . .

Page 5: SPSKC Machine Learning in SharePoint

User Experiences of the Future…

search

navigation

Recommendations

Page 6: SPSKC Machine Learning in SharePoint

The Imperative for the Enterprise is Recognized . . .

“Machine learning is the most significant technology trend. Computers have to get smarter and anticipate.”

Kevin Turner, Microsoft COO, July 2012

Page 7: SPSKC Machine Learning in SharePoint

Big Data

• The information required to deliver a richly anticipatory and personalized user experience in the enterprise is now available.

• While there’s a lot of talk about Big Data, the human capacity for understanding has not changed, which is why machine learning is necessary.

Page 8: SPSKC Machine Learning in SharePoint

SharePoint 2013: What Can We Do Out of the Box?

- Search Analytics
- Web Parts

Page 9: SPSKC Machine Learning in SharePoint

New Replacement for Web Analytics Service

The Analytics Platform replaces the Web Analytics service application.

Some of the reasons for that included:

• There was no concept of item-to-item recommendations based on user behavior, i.e. people who viewed this also viewed that
• Couldn’t promote search results based on an item’s popularity (as determined by # of times an item was viewed)
• It required a very powerful SQL box and significant storage and IO
• Lists don’t have explicit view counts
• The architecture had problems scaling to large numbers
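The "people who viewed this also viewed that" idea can be illustrated with a simple co-view count. This is only a minimal sketch: the view log, user names, and file names are made up, and it is not how the SharePoint Analytics Platform computes its recommendations.

```python
from collections import defaultdict
from itertools import combinations


def co_view_counts(view_log):
    """Count, for every pair of items, how many users viewed both."""
    items_by_user = defaultdict(set)
    for user, item in view_log:
        items_by_user[user].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_viewed(counts, item, top_n=3):
    """Items most often co-viewed with `item`, most frequent first."""
    related = counts.get(item, {})
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical (user, item) view events, invented for illustration.
VIEW_LOG = [
    ("ann", "budget.xlsx"), ("ann", "plan.docx"), ("ann", "roadmap.pptx"),
    ("bob", "budget.xlsx"), ("bob", "plan.docx"),
    ("cat", "plan.docx"), ("cat", "roadmap.pptx"),
    ("dan", "budget.xlsx"), ("dan", "plan.docx"),
]

print(also_viewed(co_view_counts(VIEW_LOG), "budget.xlsx"))
```

Counting each user once per item pair keeps a single heavy user from dominating the ranking.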

Page 10: SPSKC Machine Learning in SharePoint

How the New Platform Improves on Analytics

The new Analytics Processing engine aims to solve these issues:

• Find relevant information (improve search relevance) – based on views, click-through, etc.
• See what others are looking at (“hot” indicators and usage numbers, i.e. what’s popular based on # of views as well as # of unique users to view)
• Understand how much content is being used (i.e. viewed) and how it compares to other documents
• See discussion thread usage and find the hot topics
• Use this popularity info to populate views through the Content by Search (CBS) WebPart
• The model is extensible for 3rd parties to build into the platform
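Popularity "based on # of views as well as # of unique users" can be sketched as two counters over a usage log. The events, page names, and the tie-breaking order below are assumptions for illustration, not the platform's actual scoring.

```python
from collections import Counter, defaultdict


def popularity(events):
    """Return {item: (total_views, unique_viewers)} from (user, item) events."""
    views = Counter()
    viewers = defaultdict(set)
    for user, item in events:
        views[item] += 1
        viewers[item].add(user)
    return {item: (views[item], len(viewers[item])) for item in views}


def hottest(stats, top_n=2):
    """Rank by unique viewers first, then total views, then name (assumed order)."""
    return sorted(stats, key=lambda i: (-stats[i][1], -stats[i][0], i))[:top_n]


# Hypothetical usage events, invented for illustration.
EVENTS = [
    ("ann", "faq.aspx"), ("ann", "faq.aspx"), ("ann", "faq.aspx"),
    ("bob", "news.aspx"), ("cat", "news.aspx"),
    ("bob", "faq.aspx"),
    ("dan", "policy.docx"),
]

stats = popularity(EVENTS)
print(stats["faq.aspx"], hottest(stats))
```

Tracking unique viewers separately from raw views distinguishes an item many people open from one that a single person reopens repeatedly.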

Page 11: SPSKC Machine Learning in SharePoint

Processing and Storing Analytics Data

• Data goes through an analysis and reporting process that is contained within the search service application
• Things like views and counts are combined with click-through and other search metrics and pushed into the reporting database
• Some data, like view counts, is also pushed into the index so it can be included in search results, sorted on (i.e. what’s most viewed), etc.
• An analytics processing job examines data for clicks, links, tags, etc., as well as the usage data to create the data points used for reporting

Page 12: SPSKC Machine Learning in SharePoint

Search Analytics

Analysis: Description

Anchor text processing: Anchor text processing analyzes how items in the content corpus are interlinked. It also includes the anchor texts associated with the links in the analysis. The Analytics Processing Component uses the results of the analysis to add rank points to the items in the search index.

Click Distance: The Click Distance analysis calculates the number of clicks between an authoritative page and the items in the search index. An authoritative page can be a top level site, for example http://www.contoso.com, or other pages that are viewed as important. You can define authoritative pages in Central Administration. The Analytics Processing Component uses the results of the analysis to add rank points to the items in the search index.

Search Clicks: The Search Clicks analysis uses information about which items users click in search results to boost or demote items in the search index. The analysis calculates a new ranking of items compared to the base relevance. The clicks data is stored in the Link database.

Social Tags: The Social Tags analysis analyzes social tags, which are words or phrases that users can apply to content to categorize information in ways that are meaningful to them. In SharePoint Server 2013, social tags are not used for refinement, ranking, or recall by default. However, you can create custom search experiences that use social tags and the information from this analysis.

Social Distance: The Social Distance analysis calculates the relationship between users who use the Follow person feature. The analysis calculates first and second level Followings: first level Followings first, and then Followings of Followings. The information is used to sort People Search results by social distance.

Search Reports: The Search Reports analysis aggregates data and stores the data in the Analytics reporting database, where it's used to generate these search reports:
• Number of queries
• Top queries
• Abandoned queries
• No result queries
• Query rule usage
The report information is saved in the Search service application, and not with the items in the search index. If you delete the Search service application, the report information is also deleted.

Deep Links: The Deep Links analysis uses information about what people actually click in the search results to calculate what the most important sub-pages on a site are. These pages are displayed in the search results as important shortcuts for the site, and users can access the relevant sub-pages directly from the search results.
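To make the Click Distance analysis above concrete, the number of clicks from an authoritative page can be computed with a breadth-first search over the link graph. This is a hedged sketch: the link graph and page names are hypothetical, and the way SharePoint converts distances into rank points is not described here and is not modeled.

```python
from collections import deque


def click_distance(links, authoritative_pages):
    """Breadth-first search: fewest clicks from any authoritative page."""
    distance = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:       # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["hr", "it"],
    "hr": ["hr/benefits"],
    "it": ["it/helpdesk", "hr"],
    "hr/benefits": ["hr/benefits/dental"],
}

distances = click_distance(LINKS, ["home"])
print(distances)
```

Pages that cannot be reached from any authoritative page simply do not appear in the result.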

Page 13: SPSKC Machine Learning in SharePoint

Results

- Easy for site managers to add this functionality into their sites where appropriate

Page 14: SPSKC Machine Learning in SharePoint

But…

- Not enterprise appropriate
- ‘Paves the cowpaths…’ not personalized
- Does not learn from usage, strictly pattern identification
- Construction of a solution requires a number of algorithmic techniques

Page 15: SPSKC Machine Learning in SharePoint

Personalized Discovery: Use Case

Follow: when you know sources of information that are generally relevant

Search: when you know what information you need now but don’t know where it is

Discovery: when you don’t know what you need now or even know that it exists

Page 16: SPSKC Machine Learning in SharePoint

One Approach: Support Vector Machines

• SVM was first introduced in 1992 and is related to statistical learning theory
• SVM became popular because of its success in handwritten digit recognition
  - 1.1% test error rate for SVM. This is the same as the error rate of a carefully constructed neural network, LeNet 4.
• SVM is now regarded as an important example of “kernel methods”, one of the key areas in machine learning
  - Note: the meaning of “kernel” is different from the “kernel” function for Parzen windows

Page 17: SPSKC Machine Learning in SharePoint

What’s a Good Boundary?

• Consider a two-class, linearly separable classification problem
• Many decision boundaries!
  - The Perceptron algorithm can be used to find such a boundary
  - Different algorithms have been proposed
• Are all decision boundaries equally good?

[Figure: data points from Class 1 and Class 2]
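As a rough illustration of how the Perceptron algorithm finds one such boundary, here is a minimal sketch on a made-up, linearly separable data set. The points, labels, and epoch limit are assumptions; the boundary it returns is just one of the many valid ones, not necessarily a good one.

```python
def train_perceptron(points, labels, epochs=100):
    """Classic perceptron updates; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:          # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:                    # every point is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical two-class data, invented for illustration.
POINTS = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0),
          (-1.0, -2.0), (-2.0, -1.5), (-1.5, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(POINTS, LABELS)
print(w, b, [predict(w, b, x) for x in POINTS])
```

The perceptron stops at the first boundary that separates the training points, which is why the question of which boundary is best still needs a separate answer.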

Page 18: SPSKC Machine Learning in SharePoint

Bad Boundary Setting

[Figure: two panels, each showing Class 1 and Class 2]

Page 19: SPSKC Machine Learning in SharePoint

Large-margin Decision Boundary

• The decision boundary should be as far away from the data of both classes as possible
  - We should maximize the margin, m
  - Distance between the origin and the line $w^T x = k$ is $|k|/\|w\|$

[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]
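The margin can be written in terms of $w$ with a short derivation. It assumes the common scaling convention, not stated on the slide, that the points closest to the boundary satisfy $w^T x + b = \pm 1$; $x_+$ and $x_-$ are names introduced here for one closest point from each class.

```latex
\begin{aligned}
w^{T}x_{+} + b &= +1, \qquad w^{T}x_{-} + b = -1 \\
w^{T}(x_{+} - x_{-}) &= 2 \\
m &= \frac{w^{T}(x_{+} - x_{-})}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}
\end{aligned}
```

Under that assumption, the margin is the projection of $x_+ - x_-$ onto the unit normal $w/\lVert w \rVert$, so maximizing $m$ is the same as minimizing $\lVert w \rVert$.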

Page 20: SPSKC Machine Learning in SharePoint

Finding the Decision Boundary

• Let $\{x_1, \ldots, x_n\}$ be our data set and let $y_i \in \{1, -1\}$ be the class label of $x_i$

• The decision boundary should classify all points correctly $\Rightarrow$ $y_i (w^T x_i + b) \ge 1$ for all $i$

• The decision boundary can be found by solving the following constrained optimization problem: minimize $\frac{1}{2}\|w\|^2$ subject to $y_i (w^T x_i + b) \ge 1$ for all $i$

• Solving this constrained optimization problem requires some new tools

Page 21: SPSKC Machine Learning in SharePoint

Extension to Non-linear Decision Boundary

• So far, we have only considered large-margin classifiers with a linear decision boundary
• How to generalize it to become nonlinear?
• Key idea: transform $x_i$ to a higher dimensional space to “make life easier”
  - Input space: the space where the points $x_i$ are located
  - Feature space: the space of $\phi(x_i)$ after transformation
• Why transform?
  - A linear operation in the feature space is equivalent to a non-linear operation in the input space
  - Classification can become easier with a proper transformation. In the XOR problem, for example, adding a new feature $x_1 x_2$ makes the problem linearly separable
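The XOR remark can be checked with a small experiment. In this sketch the inputs are encoded as -1/+1 (an assumption; the slide does not fix an encoding), and a plain perceptron is used as the linear classifier: it never reaches zero mistakes on the raw inputs, but it does once the product feature is added.

```python
def add_product_feature(x):
    """Map (x1, x2) to (x1, x2, x1 * x2)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def perceptron_converges(points, labels, epochs=50):
    """Run perceptron updates; report whether an epoch finished with no mistakes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True, w, b
    return False, w, b


# XOR with -1/+1 encoding: the label is +1 when exactly one input is +1.
XOR_POINTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
XOR_LABELS = [-1, 1, 1, -1]

converged_2d, _, _ = perceptron_converges(XOR_POINTS, XOR_LABELS)
converged_3d, w3, b3 = perceptron_converges(
    [add_product_feature(x) for x in XOR_POINTS], XOR_LABELS
)
print(converged_2d, converged_3d, w3, b3)
```

In the transformed space the learned weights put all of the weight on the product feature, which is exactly the coordinate that separates the two classes.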

Page 22: SPSKC Machine Learning in SharePoint

Transforming the Data

• Computation in the feature space can be costly because it is high dimensional
  - The feature space is typically infinite-dimensional!

[Figure: the mapping $\phi(\cdot)$ takes each point in the input space to a point in the feature space]

Note: feature space is of higher dimension than the input space in practice
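One standard way around the cost of working in the feature space, assumed here and not spelled out on the slide, is to use a kernel function that returns the feature-space inner product directly from the input vectors. The sketch below checks this for a degree-2 polynomial kernel on 2-D inputs; the feature map `phi` and the sample vectors are chosen for illustration.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel: (x . z) ** 2, computed in the input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # inner product after mapping to feature space
implicit = poly_kernel(x, z)     # same value without forming phi(x) or phi(z)
print(explicit, implicit)
```

The kernel needs only a 2-D dot product, while the explicit route needs the full feature vectors, and that gap grows quickly as the feature space gets larger.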

Page 23: SPSKC Machine Learning in SharePoint

Solution: Apply many different techniques

- Fuzzy Network structure
- Algorithms that adapt depending on data
- Understand that big data is only meaningful with machine learning interpolation

Page 24: SPSKC Machine Learning in SharePoint

Adaptive Discovery Overview

Contextualization

Personalization

Relevancy & Quality

Topic 1, Topic 2, Topic 3, Topic 4, ..., Topic N

Inferred Interests

Topic 1, Topic 2, Topic 3, Topic 4, ..., Topic N

Recency Popularity

Ratings People like you

New to you?

Inferred Relative Expertise

Recommendations of Content, People, and Topics

Page 25: SPSKC Machine Learning in SharePoint

Leading Practice Enterprise Stack

Learning Layer

Social Layer

Process Layer

Content & Applications Layer

Cloud (Internal or External)

The Adaptive IT Stack

Synxi® / Personalization Apps

Social Platforms

SharePoint

Page 26: SPSKC Machine Learning in SharePoint

Recommendations that learn and adapt to every user. Also contextually relevant, e.g. context is current document or last document user interacted with.

Tuning Control so each user can personalize algorithms. Example: engineers may prefer My Interests at highest level.

Page 27: SPSKC Machine Learning in SharePoint

Recommendations of other people are based on relative inferred expertise with respect to specific context.

Content type can be specified based on site owner’s preferences.

Page 28: SPSKC Machine Learning in SharePoint

Yammer Discovery

Page 29: SPSKC Machine Learning in SharePoint

Cross-Platform Personalized Discovery

Recommended cross-contextualized and personalized SharePoint documents

Content, subject and people recommendations sourced from tibbr

Page 30: SPSKC Machine Learning in SharePoint

Enterprise Discovery

Personalization

Machine learning-based inferences of interests and expertise

Context Aware

Personalizing and delivering what’s most relevant to the user’s current activities

Recommendations

Recommend knowledge and expertise (i.e., content and other users). Engineered serendipity!

Page 31: SPSKC Machine Learning in SharePoint

General Business Benefit Case

• Enhance knowledge worker productivity by delivering the right information to the right person at the right time

• Amplify the value of expertise by matching expertise at a granular level and in real-time with users who can leverage the expertise

• Enhance innovation by increasing beneficial serendipity through delivery of valuable knowledge and expertise that users may not otherwise encounter

• Amplify your investment in SharePoint, social platforms, and/or enterprise search by leveraging the social-based big data that is automatically generated but that is otherwise not utilized, as well as by enabling learning synergy across these environments

• Future proofing by implementing a persistent learning layer that is extensible to alternative social platforms, search engines, learning management systems, HR-based solutions, and more!

• User training is minimal

Page 32: SPSKC Machine Learning in SharePoint

Wrap Up

Opportunities to add consumer expected features to SharePoint

The future of interfaces is not just responsive to hardware, but adaptive to your needs

Machine learning is key to being able to actually use Big Data

Page 33: SPSKC Machine Learning in SharePoint

Thank you!

• Naomi Moneypenny

• Please email me any comments or questions, always happy to help!

[email protected]

• www.Synxi.com

• @nmoneypenny