Microsoft’s BigData Story @LynnLangit April 2013 – Big Data Tech Con

The Microsoft BigData Story


Deck from my talk at Big Data Tech Con in Boston, April 2013.


Page 1: The Microsoft BigData Story


Page 2: The Microsoft BigData Story

Data Expertise / Lynn Langit

• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10gen – Master for MongoDB
• Practicing architect
• Technical author / trainer
– Pluralsight – Google Cloud series
– DevelopMentor – SQL Server series
– 2 books on SQL Server BI
• Former MSFT FTE – 4 years

Page 3: The Microsoft BigData Story

In a Relationship?

BigData

NoSQL

Page 4: The Microsoft BigData Story

BigData, NoSQL… => No Microsoft?

Big Data => keeping / getting more data
• Cheap storage
• Cloud storage
• Open source data projects (Hadoop)

NoSQL => schema-lite, scalable storage
• NoSQL data projects
• Mostly open source
• Sharded replicas
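"Sharded" means that records are spread across several nodes by key, so storage and load grow by adding nodes. The Python below is a minimal sketch of the simplest placement rule, which hashes the key and takes it modulo the shard count. `shard_for`, `place` and `NUM_SHARDS` are hypothetical names. Real NoSQL stores use their own schemes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index (hash of the key, mod N)."""
    # md5 gives an even spread that is stable across processes; Python's
    # built-in hash() of str is randomized per process, so it is avoided.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place(records, num_shards: int = NUM_SHARDS):
    """Group (key, value) pairs into one bucket per shard."""
    shards = {i: [] for i in range(num_shards)}
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


if __name__ == "__main__":
    rows = [(f"user:{i}", {"id": i}) for i in range(10)]
    for shard, items in place(rows).items():
        print(shard, [key for key, _ in items])
```

Plain hash-mod-N reassigns most keys whenever N changes. That is one reason production stores tend to prefer range-based or consistent-hashing schemes. In a real cluster each shard would also be copied to replica nodes, which this sketch omits.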

Page 5: The Microsoft BigData Story

In an (Open Source) Relationship?

NoSQL
• MongoDB
• Neo4j
• Riak
• Cassandra

Hadoop

Cloud
• AWS
• Heroku
• Rackspace
• OpenStack

Page 6: The Microsoft BigData Story

DEMO: HDINSIGHT (HADOOP)

Data Services

Page 7: The Microsoft BigData Story

The Reality

BigData

Small BigData

Page 8: The Microsoft BigData Story

BigData Lifecycle Management

Locate

Quantify

Qualify

Replicate

Process

Present

Page 9: The Microsoft BigData Story

Locating the data

Your source

• in SQL Server
• on desktops

Public source

• you find it

Private source

• you buy it

Page 10: The Microsoft BigData Story

Finding Data in Data Markets

• Windows Azure Data Market
• DataMarket.com
• Factual.com
• InfoChimps

Page 11: The Microsoft BigData Story

DEMO: AZURE DATAMARKET

Data Services

Page 12: The Microsoft BigData Story

Database Lifecycle Management

• Evaluating current processes
• Improving processes
• Adding new tools
– SSDT

• Data synchronization processes

Page 13: The Microsoft BigData Story

Storing the data

Relational
• SQL Server – can use partitioning for scalability

Beyond relational via relational
• Specialized data types
– XML, Hierarchy, Filestream/Filetable, Geospatial
• Columnstore index

Multi-dimensional / in-memory
• OLAP cubes / Mining models
• Tabular models
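Partitioning splits a large table into partitions by a column such as a date. In SQL Server this is declared with `CREATE PARTITION FUNCTION ... AS RANGE LEFT | RIGHT FOR VALUES (...)` plus a partition scheme. The Python below is only a model of how such a function maps a value to a partition number. It is not SQL Server code. `partition_number` and the yearly boundaries are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition that `value` falls into.

    Models a partition function: n sorted boundary values define n + 1
    partitions. With RANGE RIGHT a boundary value starts the partition to
    its right; with RANGE LEFT it ends the partition to its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1  # count of boundaries <= value
    return bisect_left(boundaries, value) + 1  # count of boundaries < value


# Hypothetical yearly boundaries for an orders table partitioned by order date.
YEAR_BOUNDARIES = [date(2011, 1, 1), date(2012, 1, 1), date(2013, 1, 1)]

print(partition_number(date(2010, 6, 30), YEAR_BOUNDARIES))  # 1
print(partition_number(date(2012, 1, 1), YEAR_BOUNDARIES))  # 3 (RANGE RIGHT)
print(partition_number(date(2012, 1, 1), YEAR_BOUNDARIES, range_right=False))  # 2
```

Each row lands in exactly one partition. A query filtered on the partitioning column can therefore skip partitions that cannot match, which is where much of the scalability benefit comes from.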

Page 14: The Microsoft BigData Story

DEMO: COLUMNSTORE, XML, FILETABLE

Big Data in SQL Server 2012 – Relational Enhancements
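A columnstore index keeps each column's values together instead of each row's. Repeated values then compress well, and a query reads only the columns it references. The Python below is a simplified model of that idea using run-length encoding. It is not how SQL Server actually encodes column segments, and the helper names and sample rows are made up for illustration.

```python
from itertools import groupby


def to_columns(rows, names):
    """Pivot row tuples into one list per column (column-wise layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def sum_rle(encoded):
    """Aggregate directly over the compressed runs without expanding them."""
    return sum(v * n for v, n in encoded)


rows = [
    ("2013-04-01", "Boston", 10),
    ("2013-04-01", "Boston", 5),
    ("2013-04-01", "Seattle", 7),
    ("2013-04-02", "Seattle", 3),
]
cols = to_columns(rows, ["day", "city", "qty"])
print(rle_encode(cols["city"]))  # [('Boston', 2), ('Seattle', 2)]
print(sum_rle(rle_encode(cols["qty"])))  # 25
```

Computing the total touches only the `qty` column. A row store would have to read every full row to get the same answer.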

Page 15: The Microsoft BigData Story

Data Processing

Raw data

Pre-processed data

Detail data

Aggregate data

Views
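The detail-to-aggregate step can be sketched as a simple roll-up. Detail rows are summed once per key, so reports can read the small aggregate instead of rescanning the detail. The sketch assumes rows are plain dictionaries, and `aggregate` is a hypothetical helper. In SQL Server the same idea would typically be a summary table or an indexed view.

```python
from collections import defaultdict


def aggregate(detail_rows, key_fields, measure):
    """Roll detail rows up to one total per key (a pre-computed aggregate)."""
    totals = defaultdict(int)
    for row in detail_rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical detail rows, e.g. sales already cleaned during pre-processing.
detail = [
    {"region": "East", "product": "A", "qty": 3},
    {"region": "East", "product": "A", "qty": 2},
    {"region": "East", "product": "B", "qty": 1},
    {"region": "West", "product": "A", "qty": 4},
]

by_region = aggregate(detail, ["region"], "qty")
print(by_region)  # {('East',): 6, ('West',): 4}
```

A plain view would rerun this roll-up on every query, while a stored aggregate pays the cost once and must be refreshed when the detail changes.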

Page 16: The Microsoft BigData Story

Valuing the data

• De-duplicating
• Validating
• Correcting errors
• Aggregating
• Ranking / rating
– Social rating, e.g. Yelp-like
– Social scoring, e.g. Freebase-like
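Several of these value-adding steps can be chained in a few lines. The Python below is a minimal sketch that assumes records with hypothetical `name`, `email` and `rating` fields. It applies four steps in order:

1. Correct trivial formatting errors.
2. De-duplicate on the normalized email.
3. Validate with a deliberately loose email pattern and a 1 to 5 rating range.
4. Rank by rating as a stand-in for social rating.

```python
import re

# Deliberately loose email shape check; real validation rules would differ.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize(rec):
    """Correct simple errors: trim whitespace and standardize case."""
    return {
        "name": rec["name"].strip().title(),
        "email": rec["email"].strip().lower(),
        "rating": rec["rating"],
    }


def clean(records):
    """De-duplicate on normalized email, then split into valid / rejected."""
    seen, valid, rejected = set(), [], []
    for rec in map(normalize, records):
        if rec["email"] in seen:
            continue  # duplicate of a record already kept or rejected
        seen.add(rec["email"])
        if EMAIL.fullmatch(rec["email"]) and 1 <= rec["rating"] <= 5:
            valid.append(rec)
        else:
            rejected.append(rec)
    return valid, rejected


def rank(records):
    """Order by rating, highest first (stand-in for a social rating)."""
    return sorted(records, key=lambda r: r["rating"], reverse=True)


raw = [
    {"name": " ann lee ", "email": "Ann@Example.com ", "rating": 4},
    {"name": "Ann Lee", "email": "ann@example.com", "rating": 4},
    {"name": "bob", "email": "bob-at-example", "rating": 5},
    {"name": "cy", "email": "cy@example.org", "rating": 5},
]
valid, rejected = clean(raw)
print([r["name"] for r in rank(valid)])  # ['Cy', 'Ann Lee']
print([r["name"] for r in rejected])  # ['Bob']
```

The two "Ann Lee" rows only become duplicates after correction, which is why normalization runs before de-duplication here.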

Page 17: The Microsoft BigData Story

DEMO: DATA QUALITY SERVICES

Data Services

Page 18: The Microsoft BigData Story

Types of Data Quality Projects

• T-SQL scripts (boolean match)
– Exact matches: WHERE =, WHERE <>, WHERE IN
– LIKE % (string matching)
• Full-text matching (semantic word match)
– CONTAINS
• Semantic Search (semantic phrase match)
– SEMANTICSIMILARITYTABLE
• SSIS tasks (transactional, multi-valued matching)
– List below
• DQS (KB matching)
– Knowledge Base – rules/matches
– Data Quality project – clean / correct data
• MDS (one view of truth)
– Versioned entities, attributes and rules
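Boolean matching and similarity-based matching behave differently, and the difference can be shown outside SQL Server. The Python below contrasts three approaches: an exact test (like `WHERE =`), a translation of a simple `LIKE` pattern supporting only `%` and `_`, and a similarity score from the standard library's `difflib.SequenceMatcher`. Both boolean tests ignore case, on the assumption of a case-insensitive collation. The scoring function is an assumption chosen for illustration. It is not the algorithm used by DQS or by the SSIS fuzzy transformations.

```python
import re
from difflib import SequenceMatcher


def exact_match(a, b):
    """Boolean match, like WHERE a = b under a case-insensitive collation."""
    return a.casefold() == b.casefold()


def like_match(value, pattern):
    """Boolean match for a LIKE-style pattern supporting only % and _.

    Character classes ([...]) and ESCAPE are not handled. Matching ignores
    case, assuming a case-insensitive collation.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def fuzzy_score(a, b):
    """Similarity ratio in [0, 1]; a stand-in for fuzzy or KB-based matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


names = ["Microsoft Corp", "Microsoft Corporation", "Micro Soft", "Oracle"]
target = "Microsoft Corp"
print([n for n in names if exact_match(n, target)])  # ['Microsoft Corp']
print([n for n in names if like_match(n, "Microsoft%")])  # ['Microsoft Corp', 'Microsoft Corporation']
for n in names:
    print(n, round(fuzzy_score(n, target), 2))
```

"Micro Soft" fails both boolean tests but still scores much higher than "Oracle". That kind of near-miss is what knowledge-base or fuzzy matching is meant to catch.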

Page 19: The Microsoft BigData Story

Data Presentation

• View-only client
• View & manipulate (hide-only) client
• View & query (aggregate) client
• View & query (drill through) client
• View & mash-up (add new data) client
• View & update client
• Timeliness of data (latency)
• Beauty of data

Page 20: The Microsoft BigData Story

But does it work in Excel?

• Import data
• Mash up data with PowerPivot – including Hadoop via ODBC
• Clean up data with Data Quality Services
• Extract-Transform-Load with Data Explorer
• Authorize with Master Data Services
• 3rd party – mine with Predixion

Page 21: The Microsoft BigData Story

DEMO: THE POWER OF EXCEL

From Pivot tables to Visualized Data Mash-ups with Mining

Page 22: The Microsoft BigData Story

What about the UDM?

• UDM / Data Mining is fully supported in SSAS
• Must be installed in this mode
– Mutually exclusive to Tabular mode
• But should you still use it?

Page 23: The Microsoft BigData Story

DEMO: TABULAR MODELS, DATA MINING

Big Data in SQL Server 2012 – Non-Relational Features

Page 24: The Microsoft BigData Story
Page 25: The Microsoft BigData Story

Data Consumability

• Appropriate
• Recognizable
• Beautiful
• Valid
• Enjoyable

(Accurate) (Meaningful) (Useful) (Appealing) (Satisfying)

Page 26: The Microsoft BigData Story

DEMO: POWERVIEW

PowerView for Tabular Models

Page 27: The Microsoft BigData Story

Data Fluency and Job Roles

Consumer
• View and understand

Analyzer
• View, manipulate and decide

Cleaner
• Validate and update

Artist
• Visualize and present

Page 28: The Microsoft BigData Story

BigData in SQL Server 2012

Relational engine
• Scaling via
– Partitioning for tables, indexes
– PDW
– Columnstore indexes
• Special data types
– XML, Hierarchy, Filetable

Analysis Services engines
• OLAP cubes
• Tabular models
• Data mining models

Other services
• Data Quality Services
• Master Data Services
• StreamInsight

Page 29: The Microsoft BigData Story

Other Data Services from Microsoft

• Windows Azure Marketplace
• SQL Azure
• Data Explorer
• PowerPivot

Page 30: The Microsoft BigData Story

NoSQL – New Products / Betas

SSRS on Azure

HDInsight (Hadoop on Azure)

Cloud-based Data Explorer

PowerView

Semantic Search

Page 31: The Microsoft BigData Story

Announced Futures

• Hekaton
• PolyBase
• Cloud Numerics
• Infer.NET Fun (F#)
• Many MSR data mining projects

Page 32: The Microsoft BigData Story

The Changing Data Landscape

NoSQL

RDBMS

Other services

Page 33: The Microsoft BigData Story

www.TeachingKidsProgramming.org
• Free courseware
• Do a Recipe, Teach a Kid (ages 10++)
• Java or Microsoft Small Basic
• C# on Pluralsight

• recipes)

Page 34: The Microsoft BigData Story

Toward Data Craftsmanship…

Follow me
• @LynnLangit
• www.LynnLangit.com
• YouTube - SoCalDevGal

Hire me
• To help build your BI/Big Data solution
• To teach your team next-gen BI
• To learn more about using NoSQL solutions