The Microsoft BigData Story


Deck from my talk at Big Data Tech Con in Boston, April 2013

Microsoft’s BigData Story

@LynnLangit

April 2013 – Big Data Tech Con

Data Expertise / Lynn Langit

• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB

• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server Series
– 2 books on SQL Server BI
• Former MSFT FTE – 4 years

In a Relationship?

BigData

NoSQL

BigData, NoSQL… => No Microsoft?

• Cheap Storage
• Cloud Storage
• Open Source data projects (Hadoop)

Big Data => keeping / getting more data

• NoSQL data projects
• Mostly open source
• Sharded replicas

NoSQL => schema-lite, scalable storage

In a (Open Source) Relationship?

NoSQL

Hadoop

MongoDB

Neo4j

Riak

Cassandra

Cloud

AWS

Heroku

RackSpace

OpenStack

DEMO: HDINSIGHT (HADOOP)

Data Services

The Reality

BigData

Small BigData

BigData Lifecycle Management

Locate

Quantify

Qualify

Replicate

Process

Present

Locating the data

Your source

• in SQL Server
• on desktops

Public source

• you find it

Private source

• you buy it

Finding Data in Data Markets

• Windows Azure Data Market
• DataMarket.com
• Factual.com
• InfoChimps

DEMO: AZURE DATAMARKET

Data Services

Database Lifecycle Management

• Evaluating current processes
• Improving processes
• Adding new tools
– SSDT

• Data synchronization processes

Storing the data

• SQL Server – can use partitioning for scalability

Relational

• Specialized data types
– XML, Hierarchy, Filestream/Filetable, Geospatial
• Columnstore index

Beyond relational via relational

• OLAP cubes / Mining Models
• Tabular models

Multi-dimensional / in-memory

DEMO: COLUMNSTORE, XML, FILETABLE

Big Data in SQL Server 2012 – Relational Enhancements

Data Processing

Raw data

Pre-processed data

Detail data

Aggregate data

Views
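The detail-to-aggregate step on this slide can be sketched with a view. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name, view name, and values (sales_detail, sales_by_region) are hypothetical.

```python
import sqlite3

# In-memory database; sqlite3 is a stand-in here, not SQL Server.
conn = sqlite3.connect(":memory:")

# Hypothetical detail table with made-up rows.
conn.execute("CREATE TABLE sales_detail (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?)",
    [("East", 10.0), ("East", 15.0), ("West", 7.5)],
)

# A view exposes aggregate data without copying the detail rows.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
    "FROM sales_detail GROUP BY region"
)

summary = conn.execute(
    "SELECT region, total, n FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)
```

Clients that only need totals query the view, while the detail rows stay available for drill-through.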

Valuing the data

• De-duplicating
• Validating
• Correcting errors
• Aggregating
• Ranking / rating
– Social rating, e.g. Yelp-like
– Social scoring, e.g. Freebase-like
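A minimal sketch of the steps listed above (correct, de-duplicate, validate, aggregate, rank), in plain Python. The records, the 1-5 rating scale, and the helper names normalize and is_valid are assumptions made for illustration; they are not from the talk and do not represent how Data Quality Services works internally.

```python
from collections import defaultdict

# Hypothetical raw records: (name, city, rating). All values are made up.
raw = [
    ("Joe's Diner", "Boston", 4),
    ("joe's diner ", "boston", 5),  # duplicate, differs by case/whitespace
    ("Cafe Uno", "Boston", 9),      # invalid: outside the assumed 1-5 scale
    ("Cafe Uno", "Boston", 3),
]

def normalize(name, city):
    """Correct simple formatting errors so duplicates compare equal."""
    return name.strip().lower(), city.strip().lower()

def is_valid(rating):
    """Validate against an assumed 1-5 rating scale."""
    return 1 <= rating <= 5

# De-duplicate by grouping valid ratings under the normalized key.
ratings = defaultdict(list)
for name, city, rating in raw:
    if is_valid(rating):
        ratings[normalize(name, city)].append(rating)

# Aggregate per entity, then rank by mean rating (highest first).
averages = {key: sum(vals) / len(vals) for key, vals in ratings.items()}
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Real cleansing usually needs fuzzier matching than strip-and-lowercase; the normalization here is kept deliberately simple.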

DEMO: DATA QUALITY SERVICES

Data Services

Types of Data Quality Projects

T-SQL scripts (boolean match):
• Exact matches: WHERE =, WHERE <>, WHERE IN
• LIKE % -- string matching

Full-text matching (semantic word match):
• CONTAINS

Semantic Search (semantic phrase match):
• SEMANTICSIMILARITYTABLE

SSIS tasks (transactional, multi-valued matching):
• List below

DQS (KB matching):
• Knowledge Base - rules/matches
• Data Quality project - clean / correct data

MDS (One view of truth):
• Versioned Entities, Attributes and Rules
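The first tier above (boolean matching in SQL) can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in, so only =, IN, and LIKE are shown; CONTAINS and the semantic search functions are SQL Server features that SQLite does not provide. The customer table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with near-duplicate names.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("Contoso Ltd",), ("Contoso Limited",), ("Fabrikam Inc",)],
)

def names(sql, params=()):
    """Run a query and return the first column of each row."""
    return [row[0] for row in conn.execute(sql, params)]

# Exact match: finds only the identical string.
exact = names(
    "SELECT name FROM customer WHERE name = ? ORDER BY id",
    ("Contoso Ltd",),
)

# Set membership: still an exact comparison, against several values.
in_list = names(
    "SELECT name FROM customer WHERE name IN (?, ?) ORDER BY id",
    ("Contoso Ltd", "Fabrikam Inc"),
)

# Pattern match: the % wildcard also catches the variant spelling.
like = names(
    "SELECT name FROM customer WHERE name LIKE ? ORDER BY id",
    ("Contoso%",),
)

print(exact, in_list, like)
```

The exact match misses "Contoso Limited" while the LIKE pattern finds it, which is the kind of gap that pushes matching toward the fuzzier tiers in the list.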

Data Presentation

• View-only client
• View & manipulate (hide-only) client
• View & query (aggregate) client
• View & query (drill through) client
• View & mash-up (add new data) client
• View & update client
• Timeliness of data (latency)
• Beauty of data

But, does it work in Excel?

Import Data

Mash-up data with PowerPivot – including Hadoop via ODBC

Clean up data with Data Quality Services

Extract-Transform-Load with Data Explorer

Authorize with Master Data Services

3rd party – Mine with Predixion

DEMO: THE POWER OF EXCEL

From Pivot tables to Visualized Data Mash-ups with Mining

What about the UDM?

• UDM / Data Mining is fully supported in SSAS
• Must be installed in this mode
– Mutually exclusive to Tabular mode

• But, should you use it anymore?

DEMO: TABULAR MODELS, DATA MINING

Big Data in SQL Server 2012 – Non-Relational Features

Data Consumability

Appropriate

Recognizable

Beautiful

Valid

Enjoyable

(Accurate)

(Meaningful)

(Useful)

(Appealing)

(Satisfying)

DEMO: POWERVIEW

PowerView for Tabular Models

Data Fluency and Job Roles

Consumer
• View and understand

Analyzer
• View, manipulate and decide

Cleaner
• Validate and update

Artist
• Visualize and present

BigData in SQL Server 2012

• Scaling via
– Partitioning for tables, indexes
– PDW
– Columnstore indexes
• Special Data Types
– XML, Hierarchy, Filetable

Relational engine

• OLAP Cubes
• Tabular Models
• Data Mining Models

Analysis service engines

• Data Quality Services
• Master Data Services
• StreamInsight

Other services

Other Data Services from Microsoft

Windows Azure Marketplace

SQL Azure

Data Explorer

Power Pivot

NoSQL – New Products / Betas

SSRS on Azure

HDInsight (Hadoop on Azure)

Cloud-based Data Explorer

PowerView

Semantic Search

Announced Futures

Hekaton

Polybase

Cloud Numerics

Infer.NET Fun (F#)

Many MSR Data Mining Projects

The Changing Data Landscape

NoSQL

RDBMS

Other Services

www.TeachingKidsProgramming.org
• Free Courseware
• Do a Recipe Teach a Kid (Ages 10+)
• Java or Microsoft SmallBasic
• C# on Pluralsight

• recipes)

Toward Data Craftsmanship…

Follow me
• @LynnLangit
• www.LynnLangit.com
• YouTube - SoCalDevGal

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions
