MAKING BUSINESS INTELLIGENT www.pragmaticworks.com

Into the Wild...Taming Unstructured Data with Semantic Search

Chris Price, Senior BI Consultant (@BluewaterSQL)



DESCRIPTION

There is runaway growth in the data volumes many organizations face today. The bad news is that much of this data is unstructured, which means your traditional RDBMS just isn't capable of helping you deal with it. As a result, significant emphasis has been put on technologies like Hadoop, NoSQL and other distributed databases, which are better suited to handling unstructured data. With its latest release, SQL Server 2012, however, Microsoft has provided new features that help tame some of this unstructured data. This session will dive into the new FileTable and Statistical Semantic Search features. We will show you how they work and highlight real-world examples for integrating these exciting new features into your organization.



Outline
- Data Gone Wild
- FileStream -> FileTable
- Full-Text
- FileTable/Full-Text Integration
- SQL Server 2012 Enhancements
- Semantic Search
- Search Scenarios


Data Gone Wild!

Data by any other name...
- Structured: Tabular, CSV & Fixed Width
- Semi-Structured: HTML, XML & JSON
- Unstructured: Images, Videos, PDF & Email

- 80% of this stuff is not found in a DB
- Difficult to integrate
- Hard to manage


Key Objective

SQL Server 2012 is a great choice for integrating and managing structured, semi-structured & unstructured data


FileStream
- Introduced in SQL Server 2008
- Integrates the Database Engine with the NTFS file system
- VARBINARY(MAX) column data stored on the file system
- Dual programming model:
  - Transact-SQL (No write)
  - Win32 streaming (ODBC or OLE DB/ADO.NET)
- Non-trivial (requires a transactional context)


FileTable
- Introduced in SQL Server 2012
- Built on top of FileStream
- Win32 API access
- Implemented as a table with a fixed format:
  - FileStream storage/container
  - File system properties (columns)
  - Hierarchy ID (synthesized hierarchical file system share)


FileTable
- Accessed through a file system share or the table
- SMB protocol for remote access: open docs in MS Word, Excel, etc.
- The share allows non-transactional access
- No memory-mapped files (so Notepad and Paint cannot open files on the share)
- File names/properties preserved
- Supports directory structures


FileTable Format
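As a sketch of this fixed format, the Python dataclass below lists the FileTable columns in order, with the SQL Server data type of each in a comment. It is only a model for reading the schema, not a way to create or query a FileTable. The `FileTableRow` name is hypothetical. The `path_locator` value in the example is a simplified hierarchyid string, whereas real locators are generated by SQL Server.

```python
import uuid
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileTableRow:
    """Python model of the fixed FileTable schema (SQL types in comments)."""
    stream_id: uuid.UUID                  # uniqueidentifier ROWGUIDCOL
    file_stream: Optional[bytes]          # varbinary(max) FILESTREAM; no data for directories
    name: str                             # nvarchar(255): file or directory name
    path_locator: str                     # hierarchyid: position in the synthesized tree
    parent_path_locator: Optional[str]    # computed hierarchyid of the parent directory
    file_type: Optional[str]              # computed: file extension
    cached_file_size: Optional[int]       # computed bigint: size in bytes
    creation_time: datetime               # datetimeoffset(7)
    last_write_time: datetime             # datetimeoffset(7)
    last_access_time: datetime            # datetimeoffset(7)
    is_directory: bool                    # bit
    is_offline: bool = False              # bit flags mirroring file attributes
    is_hidden: bool = False
    is_readonly: bool = False
    is_archive: bool = False
    is_system: bool = False
    is_temporary: bool = False


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    row = FileTableRow(
        stream_id=uuid.uuid4(),
        file_stream=b"hello",
        name="notes.txt",
        path_locator="/1/",           # simplified; real hierarchyid values are generated
        parent_path_locator=None,     # this model uses None for an item in the root
        file_type="txt",
        cached_file_size=5,
        creation_time=now,
        last_write_time=now,
        last_access_time=now,
        is_directory=False,
    )
    print([f.name for f in fields(FileTableRow)])
```

The computed columns (`parent_path_locator`, `file_type`, `cached_file_size`) are maintained by SQL Server in a real FileTable. Here they are plain fields that the caller fills in.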


FileTable Set-Up
- Enable FileStream
- DATABASE
- TABLE


FileTable Access
- Share: \\<server>\<instance share>\<database directory>\<FileTable directory>
- T-SQL:
  - Insert/Update/Delete
  - Can update a stream without affecting the timestamp
  - Cannot delete directories that contain files
- Functions: GetFileNamespacePath(), FileTableRootPath(), GetPathLocator()
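The share path and the delete restriction on this slide can be modeled in a few lines of Python. This is a sketch under stated assumptions. The helpers `unc_root`, `namespace_path` and `can_delete` are hypothetical. They only mimic the idea behind FileTableRootPath(), GetFileNamespacePath() and the rule that a non-empty directory cannot be deleted. They are not SQL Server APIs, and the server, share, database and table names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    name: str
    is_directory: bool
    parent: Optional[str]  # key of the parent directory; None = FileTable root


def unc_root(server: str, instance_share: str, db_dir: str, table_dir: str) -> str:
    """Build the UNC root of a FileTable share from its four components."""
    return "\\\\" + "\\".join([server, instance_share, db_dir, table_dir])


def namespace_path(key: str, entries: Dict[str, Entry]) -> str:
    """Walk parent links to build the path relative to the FileTable root."""
    parts: List[str] = []
    current: Optional[str] = key
    while current is not None:
        entry = entries[current]
        parts.append(entry.name)
        current = entry.parent
    return "\\" + "\\".join(reversed(parts))


def can_delete(key: str, entries: Dict[str, Entry]) -> bool:
    """Files can be deleted; a directory only when no entry names it as parent."""
    if not entries[key].is_directory:
        return True
    return not any(e.parent == key for e in entries.values())


if __name__ == "__main__":
    entries = {
        "d1": Entry("Reports", True, None),
        "f1": Entry("q1.docx", False, "d1"),
    }
    root = unc_root("SQLBOX", "MSSQLSERVER", "DocsDB", "Documents")
    print(root + namespace_path("f1", entries))  # \\SQLBOX\MSSQLSERVER\DocsDB\Documents\Reports\q1.docx
    print(can_delete("d1", entries))             # False: Reports still contains q1.docx
```

In the real feature, the parent links come from the hierarchyid `path_locator` column rather than from string keys. The shape of the walk from an item up to the root is the same.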


FileTable Demo


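Full-Text Enhanced in 2012
- 7-10x faster than the prior version
- Scales up to >350 million documents
- NEW: property search
  - Filter on document properties (e.g. Author, Title)
  - The iFilter must support extracting the property
- Customizable NEAR
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, FALSE)')
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, TRUE)')
  - 5 = maximum number of non-search terms between the first and last search terms; TRUE = terms must appear in the order given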
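The following Python sketch models how the two NEAR arguments behave: the maximum distance, counted as non-search terms between the first and last search term, and the match-order flag. It is a simplified model, not SQL Server code. The word breaker here is an assumption that lowercases and splits on non-alphanumerics, whereas SQL Server uses language-specific word breakers and stemmers, so real matches can differ. The `near` and `tokenize` names are hypothetical.

```python
import re
from itertools import product
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplified word breaker (an assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text: str, terms: List[str], max_distance: int, match_order: bool) -> bool:
    """Model of NEAR((t1, t2, ...), max_distance, match_order) against one document."""
    tokens = tokenize(text)
    # Every position at which each search term occurs.
    positions = [[i for i, tok in enumerate(tokens) if tok == term.lower()] for term in terms]
    if any(not p for p in positions):
        return False  # a search term is missing entirely
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token cannot satisfy two search terms
        if match_order and list(combo) != sorted(combo):
            continue  # terms must appear in the order they were listed
        # Tokens between the first and last chosen search term, excluding the chosen terms.
        gap = (max(combo) - min(combo) + 1) - len(combo)
        if gap <= max_distance:
            return True
    return False


if __name__ == "__main__":
    doc = "Saturday means SQL"
    print(near(doc, ["SQL", "SATURDAY"], 5, False))  # True: order ignored, 1 term between
    print(near(doc, ["SQL", "SATURDAY"], 5, True))   # False: SATURDAY precedes SQL
```

Trying every combination of positions is exhaustive and easy to follow, but it grows quickly with repeated terms. It is meant to show the semantics, not to be efficient.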


Full-Text Demo


Semantic Search
- Built on top of Full-Text
- What is semantic search?
  - Full-Text finds words... Semantic Search finds meaning
- Extracts & indexes statistically significant keywords
  - Tag clouds, etc.
- Identifies related/similar documents (based on keywords)
- Explains how/why two documents are related
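To illustrate what "statistically significant keywords" can mean, the Python sketch below scores words per document with TF-IDF. TF-IDF is a swapped-in, well-known technique chosen for illustration. It is not the model SQL Server uses with its semantic language statistics database. The stop-word list, the tokenizer, and the `key_phrases` name are hypothetical, and the sketch scores single words only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stop-word list for the sketch.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def words(text: str) -> List[str]:
    """Lowercase words with stop words removed (simplified tokenizer)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def key_phrases(corpus: Dict[str, str], top_n: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Score each word per document with TF-IDF and keep the top_n positive scores."""
    tokenized = {doc: words(text) for doc, text in corpus.items()}
    n_docs = len(tokenized)
    doc_freq: Counter = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each word once per document
    result: Dict[str, List[Tuple[str, float]]] = {}
    for doc, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {w: (c / total) * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        # Words found in every document score 0 and are dropped as not significant.
        result[doc] = [(w, round(s, 3)) for w, s in ranked if s > 0][:top_n]
    return result


if __name__ == "__main__":
    corpus = {
        "doc1": "FileTable stores files in SQL Server tables",
        "doc2": "Semantic search finds key phrases in SQL Server documents",
        "doc3": "Full-text search finds words in SQL Server documents",
    }
    for doc, phrases in key_phrases(corpus).items():
        print(doc, phrases)
```

Words that appear in every document ("sql", "server") score zero, which is the intuition behind keeping only keywords that distinguish one document from the rest of the corpus.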


Semantic Set-Up
- Install the Office Filter Pack & Filter Pack SP1
- Install, attach & register the Semantic Language Statistics database


Verify Filters


Semantic Results
- SemanticKeyPhraseTable: extracts key phrases for the entire corpus or a single document
- SemanticSimilarityTable: finds similar documents
- SemanticSimilarityDetailsTable: returns the key phrases that two matched documents have in common
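To make the similarity and similarity-details results concrete, here is a Python sketch that works on per-document key-phrase weights. Cosine similarity is a swapped-in technique chosen for illustration. SQL Server's own scoring for these rowset functions is not described on this slide. The function names and the sample weights are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Phrases = Dict[str, float]  # key phrase -> weight for one document


def cosine(a: Phrases, b: Phrases) -> float:
    """Cosine similarity of two sparse key-phrase weight vectors."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_documents(source: str, index: Dict[str, Phrases]) -> List[Tuple[str, float]]:
    """Rank the other documents by similarity to source, dropping zero scores."""
    scores = [(doc, cosine(index[source], p)) for doc, p in index.items() if doc != source]
    return sorted([s for s in scores if s[1] > 0], key=lambda s: -s[1])


def similarity_details(a: str, b: str, index: Dict[str, Phrases]) -> List[Tuple[str, float]]:
    """Key phrases shared by two documents, scored by the product of their weights."""
    shared = set(index[a]) & set(index[b])
    return sorted(((p, index[a][p] * index[b][p]) for p in shared), key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    index = {
        "doc1": {"filetable": 0.5, "share": 0.3},
        "doc2": {"filetable": 0.4, "semantic": 0.6},
        "doc3": {"semantic": 0.7, "phrases": 0.2},
    }
    print(similar_documents("doc2", index))          # doc3 ranks above doc1
    print(similarity_details("doc2", "doc3", index))  # the shared phrase "semantic"
```

Separating "which documents are similar" from "which shared phrases explain it" mirrors the split between the similarity and similarity-details results listed above.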


Semantic Search Demo