Chapter 2 - Video # 4
What's the Difference Between a Data Warehouse and a Data Mart?

Chapter 2: Business Intelligence & Data Warehousing with SSAS
Course: SQL Server 2008/R2 Analysis Services
Course Id: 165
Presented by Scott Whigham




Data Warehouse

• A data warehouse is a layer of abstraction over a company’s data that provides a single point of access for queries and reports
  – Most organizations have data spread out across many systems
  – The data warehouse unifies these underlying systems into a single interface

[Diagram labels: Data entry, Data warehouse, Reporting, Training]


• Without a data warehouse, reporting would be more challenging:
  – The Oracle accounting system stores country as “USA”
  – The SQL Server Help Desk system stores country as “U.S.”
  – The MySQL web store stores country as “United States”


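• The data warehouse simplifies reporting
  – The Oracle ERP system stores country as “USA”
  – The SQL Server Help Desk system stores country as “U.S.”
  – The MySQL web store stores country as “United States”
  – The data warehouse can load all three as one consistent value, so a report filters on a single spelling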
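To make the country example concrete, here is a minimal sketch of the kind of mapping a load step might apply. The names `COUNTRY_MAP` and `normalize_country`, and the choice of “United States” as the agreed-upon value, are assumptions for illustration; the slides do not say which spelling the warehouse keeps.

```python
# Hypothetical mapping from source-specific spellings to one warehouse value.
COUNTRY_MAP = {
    "USA": "United States",
    "U.S.": "United States",
    "United States": "United States",
}


def normalize_country(raw: str) -> str:
    """Return the agreed-upon country name for a value from a source system."""
    key = raw.strip()
    if key not in COUNTRY_MAP:
        # Surface unknown spellings instead of silently loading them.
        raise ValueError(f"unmapped country value: {raw!r}")
    return COUNTRY_MAP[key]


# Rows as they might arrive from the three source systems named on the slide.
source_rows = [
    ("Oracle ERP", "USA"),
    ("SQL Server Help Desk", "U.S."),
    ("MySQL web store", "United States"),
]

warehouse_rows = [(system, normalize_country(country)) for system, country in source_rows]
distinct_countries = {country for _, country in warehouse_rows}
print(distinct_countries)
```

After the mapping, every row carries the same country value, so a report needs only one filter condition instead of three.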


• There are other advantages to creating a unified view of an organization’s data:
  – Agreed-upon definition of terms and algorithms
  – Can apply data quality and consistency procedures during loading of the data warehouse (DW)


• There are at least two types of data warehouses
  – Data warehouses that store detail information
  – Data warehouses that store pre-calculated aggregates
• Let’s compare!


• Data warehouses that store detail information:
  – Typically copy records from underlying data sources on a 1:1 basis
  – Often used for both analysis and archival purposes
  – When a user wants to aggregate, he/she must write aggregate expressions


• When a user wants to aggregate, he/she must write aggregate expressions:

SELECT p.ProductName
     , COUNT(*) AS Orders               -- counts one row per order line for the product
     , SUM(sd.UnitsOrdered) AS UnitsSold
FROM Sales.InternetSales s
JOIN Sales.InternetSalesDetail sd
  ON s.OrderId = sd.OrderId
JOIN Products.Product p
  ON sd.ProductId = p.ProductId
WHERE YEAR(s.OrderDate) = 2011
GROUP BY p.ProductName
ORDER BY p.ProductName


• Data warehouses that store pre-calculated aggregates:
  – During the loading of the DW, various levels of aggregations are calculated and stored in the data warehouse
    • “Sales” would be pre-aggregated by month, quarter, year
    • “Customer growth” by city, state/province, country
  – Used only for analysis
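As a rough illustration of pre-calculating aggregates at load time, the sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server. The tables `SalesDetail`, `SalesByMonth`, and `SalesByYear` and the sample rows are hypothetical, and a real SSAS solution would build its aggregations differently; the point is only that the grouping work happens once, during the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail rows as they might be copied 1:1 from a source system (sample data).
CREATE TABLE SalesDetail (OrderDate TEXT, ProductName TEXT, UnitsOrdered INTEGER);
INSERT INTO SalesDetail VALUES
  ('2011-01-15', 'Widget', 3),
  ('2011-02-20', 'Widget', 2),
  ('2011-10-05', 'Widget', 4),
  ('2011-10-09', 'Gadget', 1);

-- Load step: pre-calculate the monthly level (with its quarter) once.
CREATE TABLE SalesByMonth AS
SELECT CAST(strftime('%Y', OrderDate) AS INTEGER) AS OrderYear,
       (CAST(strftime('%m', OrderDate) AS INTEGER) + 2) / 3 AS Quarter,
       CAST(strftime('%m', OrderDate) AS INTEGER) AS Month,
       ProductName,
       COUNT(*) AS Orders,
       SUM(UnitsOrdered) AS UnitsSold
FROM SalesDetail
GROUP BY strftime('%Y', OrderDate), strftime('%m', OrderDate), ProductName;

-- Load step: pre-calculate the yearly level from the monthly level.
CREATE TABLE SalesByYear AS
SELECT OrderYear, ProductName, SUM(Orders) AS Orders, SUM(UnitsSold) AS UnitsSold
FROM SalesByMonth
GROUP BY OrderYear, ProductName;
""")

# Report queries now only filter; no aggregate expressions are needed.
october = conn.execute(
    "SELECT ProductName, Orders, UnitsSold FROM SalesByMonth "
    "WHERE OrderYear = 2011 AND Month = 10 ORDER BY ProductName"
).fetchall()
year_2011 = conn.execute(
    "SELECT ProductName, Orders, UnitsSold FROM SalesByYear "
    "WHERE OrderYear = 2011 ORDER BY ProductName"
).fetchall()
print(october)
print(year_2011)
```

The trade-off is the one the slide names: these tables answer analysis questions quickly, but they no longer hold the individual records, so they are not a substitute for the detail data.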


• When a user wants to view aggregations, he/she simply asks for the data:

SELECT ProductName, Orders, UnitsSold
FROM DataWarehouseTable
WHERE OrderYear = 2011

-- or, for a single quarter:
WHERE OrderYear = 2011
  AND Quarter = 1

-- or, for a single month:
WHERE OrderYear = 2011
  AND Month = 10


• Regardless of the type of data warehouse, data quality is important
  – Garbage In, Garbage Out


• Typically a data warehouse is not changed directly by the users
  – The DW is loaded from the underlying data sources
  – Users change the underlying data sources


• The data warehouse may not be in sync with the data entry systems
  – If the data warehouse is loaded each Sunday, reports run on Saturday will not include the week’s activity since the last load
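The Sunday-load example can be worked through with a little date arithmetic. This is a sketch only: the helper `last_sunday_load` is hypothetical and assumes the load completes on the Sunday it runs.

```python
from datetime import date, timedelta


def last_sunday_load(report_date: date) -> date:
    """Most recent Sunday on or before report_date (assumed load day)."""
    # date.weekday() returns Monday=0 ... Sunday=6.
    days_since_sunday = (report_date.weekday() + 1) % 7
    return report_date - timedelta(days=days_since_sunday)


report_day = date(2011, 10, 8)  # a Saturday
loaded_through = last_sunday_load(report_day)
days_stale = (report_day - loaded_through).days
print(loaded_through, days_stale)
```

For a report run on Saturday, the warehouse reflects the source systems as of the previous Sunday, so nearly a full week of data entry is missing from the report.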


• Typically the BI architects define update schedules
  – Updating/loading the data warehouse requires connecting to each data source and uploading changes into the data warehouse
  – If the update can be done in less than 6-8 hours, most shops opt for nightly updates
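One common way to upload only the changes is to remember a "watermark" (the latest modification time already loaded) and ask each source for rows newer than that. The slides do not prescribe this technique; the sketch below is an assumption, with in-memory `sqlite3` databases standing in for a source system and the warehouse, and with a hypothetical `ModifiedAt` column and `load_changes` helper.

```python
import sqlite3

# In-memory stand-ins for one source system and the warehouse (assumptions).
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Amount REAL, ModifiedAt TEXT);
INSERT INTO Orders VALUES
  (1, 10.0, '2011-10-01T09:00:00'),
  (2, 25.0, '2011-10-02T14:30:00');
""")
warehouse.execute(
    "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Amount REAL, ModifiedAt TEXT)"
)


def load_changes(src, dw, since):
    """Copy rows modified after `since` into the warehouse; return the new watermark."""
    rows = src.execute(
        "SELECT OrderId, Amount, ModifiedAt FROM Orders "
        "WHERE ModifiedAt > ? ORDER BY ModifiedAt",
        (since,),
    ).fetchall()
    # INSERT OR REPLACE overwrites the warehouse copy of an order that changed.
    dw.executemany("INSERT OR REPLACE INTO Orders VALUES (?, ?, ?)", rows)
    dw.commit()
    return rows[-1][2] if rows else since


# First load: an empty watermark sorts before every timestamp, so all rows load.
watermark = load_changes(source, warehouse, "")

# Users change the source system during the day.
source.execute(
    "UPDATE Orders SET Amount = 30.0, ModifiedAt = '2011-10-03T08:00:00' WHERE OrderId = 2"
)
source.execute("INSERT INTO Orders VALUES (3, 5.0, '2011-10-03T10:15:00')")

# Nightly load: only the changed and new rows are transferred.
watermark = load_changes(source, warehouse, watermark)
loaded = warehouse.execute("SELECT OrderId, Amount FROM Orders ORDER BY OrderId").fetchall()
print(loaded, watermark)
```

Transferring only changed rows is one reason a load can fit inside a nightly window; a full reload of every source would take longer as the data grows.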


• The term data mart can be confusing
  – Typically a data mart is a subset of the data warehouse that holds data related to a specific area
    • Sales
    • Products
    • Billing
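The "subset of the data warehouse" idea can be pictured as a filter over warehouse data. In the sketch below a view plays the role of a Billing data mart; the table `WarehouseFacts`, the view `BillingMart`, and the sample rows are hypothetical, and in practice a data mart is often a separately stored database rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A toy warehouse table covering several subject areas (sample data).
CREATE TABLE WarehouseFacts (SubjectArea TEXT, Measure TEXT, Value REAL);
INSERT INTO WarehouseFacts VALUES
  ('Sales',    'UnitsSold',    120),
  ('Products', 'ActiveSkus',    42),
  ('Billing',  'InvoicesSent',  75),
  ('Billing',  'InvoicesPaid',  60);

-- The "data mart": only the rows for one subject area.
CREATE VIEW BillingMart AS
SELECT Measure, Value
FROM WarehouseFacts
WHERE SubjectArea = 'Billing';
""")

billing_rows = conn.execute(
    "SELECT Measure, Value FROM BillingMart ORDER BY Measure"
).fetchall()
print(billing_rows)
```

Users of the Billing mart see only billing measures, while the warehouse continues to hold data for every department.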


• However, it is possible to have only a data mart
  – A data warehouse holds data across many departments
  – Your organization may wish to start with only a “Billing” data mart
    • If it is successful, you can add others and build up to a data warehouse


Next up

• OLTP vs. OLAP

“Always end the name of your child with a vowel, so that when you yell the name will carry.”
Bill Cosby