MISM 621_Microsoft Data Warehouse Tool_Group 8

  • 8/7/2019 MISM 621_Microsoft Data Warehouse Tool_Group 8

    Microsoft Data Warehouse Tool

    International School of Information Management,

    University of Mysore, Mysore

    Submitted to,

    Prof. Chandrashekar

    Submitted By,

    Saurabh Dey

    Shrestha Rath

    Sowmya S.

    Sukritha S.

    ~ Content ~

    Summary
    1. Introduction
    2. Microsoft Data Warehouse Framework
    2.1. Building Data Warehouse
    2.2. Managing Data Warehouse
    2.3. Using Data Warehouse
    3. Features of SQL Server 2008
    4. Functions of SQL Server 2008
    4.1. Aggregate Functions in SQL Server 2008
    4.2. Default Databases in SQL Server 2008
    4.3. Types of Joins in SQL Server
    4.4. Indexes in Microsoft SQL Server
    5. BI Overview and ETL
    6. Comparison of Microsoft Data Warehouse with IBM DB2
    7. A Case Study of Carl Zeiss Vision, North America
    8. Conclusion
    References

    Summary

    This report introduces the basic concept of a data warehouse and focuses on the Microsoft
    Data Warehouse Tool. The tool is a combination of several components: the SQL Server
    Database, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and
    SQL Server Reporting Services (SSRS). SQL Server is developed by Microsoft, and the current
    released version is Microsoft SQL Server 2008 R2.

    The report describes the Microsoft Data Warehouse Framework, which has a building section,
    a managing section and a using section. The building section captures data from various
    sources, such as legacy systems and flat files, and performs basic transformations to make
    the data suitable for loading into the data warehouse. The managing section maintains the
    metadata repository, which allows tools from multiple vendors to work together without
    disrupting data warehouse operation. The using section gives users access to the stored data
    for analysis and decision making through application programs and directory access, which
    helps in searching and securing the stored data.

    SQL Server has many features. Examples include automatic recovery of data pages, which lets
    the principal and mirror machines recover transparently from 823/824 data page errors, and
    predictable query performance, which provides greater query performance stability and
    predictability. Further features are discussed in Section 3. The discussion of SQL Server
    2008 functions covers aggregate functions, default databases, supported join types and
    indexes.

    The overview of Business Intelligence (BI) presents the basic BI structure as a diagram and
    briefly describes the extraction, transformation and loading of data around four themes.
    Finally, Microsoft SQL Server is compared with IBM DB2.

    2. Microsoft Data Warehouse Framework

    The Microsoft Data Warehouse Framework has been designed to provide an open architecture

    which can be extended easily by Microsoft customers and business partners using industry

    standard technology.

    The Microsoft Data Warehouse Framework describes the relationship between the various
    components used in building, using and managing a data warehouse. The core of the framework
    is a set of enabling technologies: the data transport layer and the integrated metadata
    repository. The framework has three sections, described below.

    Figure 1: Microsoft Data Warehouse Framework

    2.1. Building Data Warehouse

    This section requires a set of tools for describing the logical and physical design of the
    data sources and their destinations in the data warehouse or data marts. Operational data
    must pass through a cleansing and transformation stage before being loaded into the data
    warehouse or data marts, so that it meets the definitions set up during the design stage.
    The depth of the data staging process varies with the enterprise data warehouse architecture.

    2.2. Managing Data Warehouse

    The managing section of the Microsoft Data Warehouse Framework manages a multi-server
    network and schedules recurring tasks. A repository in this section provides an integration
    point for the metadata shared by the various tools used in the data warehousing process.
    Shared metadata helps integrate tools from different vendors, so no specialized interface is
    needed between each pair of tools.

    2.3. Using Data Warehouse

    In the using section of the Microsoft Data Warehouse Framework, desktop productivity
    products, specialized analysis products and custom programs are used to access information in
    the data warehouse. Users reach the data warehouse through an information directory, which
    provides optimized search and secured access for end users.

    3. Features of SQL Server 2008

    Transparent Data Encryption - Enable encryption of an entire database, data files, or
    log files, without the need for application changes. Benefits include searching encrypted
    data using range and fuzzy searches, securing data from unauthorized users, and encrypting
    data without any changes to existing applications.
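    As a minimal sketch of how Transparent Data Encryption is typically switched on, the
    statements below create a master key and certificate and then encrypt a database. The
    database name SalesDW, the certificate name TdeCert and the password placeholder are
    assumptions made for illustration only.

```sql
-- Hypothetical names: SalesDW (database), TdeCert (certificate).
USE master;
GO
-- The master key and certificate live in master and protect the encryption key.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseAStrongPasswordHere>';
GO
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate for SalesDW';
GO
USE SalesDW;
GO
-- The database encryption key is protected by the server certificate.
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;
GO
-- Data and log files are encrypted in the background from here on.
ALTER DATABASE SalesDW SET ENCRYPTION ON;
GO
```

    The certificate should be backed up together with its private key, because an encrypted
    database cannot be restored on another server without it.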

    Extensible Key Management - SQL Server 2005 provides a comprehensive solution for

    encryption and key management. SQL Server 2008 delivers an excellent solution to this

    growing need by supporting third-party key management and HSM products.

    Auditing - Create and manage auditing via DDL, while simplifying compliance by

    providing more comprehensive data auditing. This enables organizations to answer

    common questions, such as "What data was retrieved?"

    Enhanced Database Mirroring - SQL Server 2008 builds on SQL Server 2005 by

    providing a more reliable platform that has enhanced database mirroring, including

    automatic page repair, improved performance, and enhanced supportability.

    Automatic Recovery of Data Pages - SQL Server 2008 enables the principal and mirror

    machines to transparently recover from 823/824 types of data page errors by requesting a

    fresh copy of the suspect page from the mirroring partner transparently to end users and

    applications.

    Log Stream Compression - Database mirroring requires data transmissions between the

    participants of the mirroring implementations. With SQL Server 2008, compression of

    the outgoing log stream between the participants delivers optimal performance and

    minimizes the network bandwidth used by database mirroring.

    Resource Governor - Provide a consistent and predictable response to end users with the
    introduction of Resource Governor, which allows organizations to define resource limits and
    priorities for different workloads, so that concurrent workloads deliver consistent
    performance to their end users.
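    The sketch below shows one way such limits can be defined: a resource pool caps CPU use, a
    workload group is bound to the pool, and a classifier function routes sessions to the group.
    The pool, group, function and login names (ReportingPool, ReportingGroup, fnClassifier,
    report_user) are hypothetical.

```sql
USE master;
GO
-- Pool that may use at most 30 percent of CPU when there is contention.
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
GO
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
GO
-- Classifier: runs for every new session and returns the workload group name.
CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname;
    SET @grp = N'default';
    IF SUSER_SNAME() = N'report_user'   -- hypothetical reporting login
        SET @grp = N'ReportingGroup';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
GO
-- Changes take effect only after RECONFIGURE.
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
```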

    Predictable Query Performance - Enable greater query performance stability and

    predictability by providing functionality to lock down query plans, enabling

    organizations to promote stable query plans across hardware server replacements, server

    upgrades, and production deployments.

    Data Compression - Enable data to be stored more effectively, and reduce the storage

    requirements for our data. Data compression also provides significant performance

    improvements for large I/O bound workloads, like data warehousing.
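    A short sketch of how compression might be applied to a large fact table follows. The table
    name dbo.FactSales is an assumption; the first statement only estimates the savings, the
    second rebuilds the table with page compression.

```sql
-- Estimate the space saved by PAGE compression (hypothetical table dbo.FactSales).
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'FactSales',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';
GO
-- Rebuild the table so that its pages are stored compressed.
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
```

    ROW compression is the lighter alternative; PAGE compression usually saves more space at a
    higher CPU cost.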

    Hot Add CPU - Dynamically scale a database on demand by allowing CPU resources to

    be added to SQL Server 2008 on supported hardware platforms without forcing any

    downtime on applications. Note that SQL Server already supports the ability to add

    memory resources online.

    Policy-based Management - Policy-based Management is a policy-based system for
    managing one or more instances of SQL Server 2008. Use it with SQL Server Management
    Studio to create policies that manage entities on the server, such as the instance of SQL
    Server, databases, and other SQL Server objects.

    Streamlined Installation - SQL Server 2008 introduces significant improvements to the

    service life cycle for SQL Server through the re-engineering of the installation, setup, and

    configuration architecture. These improvements separate the installation of the physical

    bits on the hardware from the configuration of the SQL Server software, enabling

    organizations and software partners to provide recommended installation configurations.

    Performance Data Collection - Performance tuning and troubleshooting are time-

    consuming tasks for the administrator. To provide actionable performance insights to

    administrators, SQL Server 2008 includes more extensive performance data collection, a

    new centralized data repository for storing performance data, and new tools for reporting

    and monitoring.

    Language Integrated Query (LINQ) - Enable developers to issue queries against data,

    using a managed programming language, such as C# or VB.NET, instead of SQL

    statements.

    ADO.NET Data Services - Developers using the ADO.NET framework can program
    against a database, using CLR objects that are managed by ADO.NET. SQL Server 2008
    introduces more efficient, optimized support that improves performance and simplifies
    development.

    DATE/TIME - SQL Server 2008 introduces new date and time data types:

    DATE - A date-only type

    TIME - A time-only type

    DATETIMEOFFSET - A time-zone-aware datetime type

    DATETIME2 - A datetime type with larger fractional seconds and year range than the
    existing DATETIME type

    The new data types enable applications to have separate date and time types while
    providing large data ranges or user-defined precision for time values.
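    The four types can be tried out with a few variable declarations. This is only an
    illustrative sketch; the variable names and sample values are made up.

```sql
DECLARE @d   DATE           = '2008-08-06';                  -- date only
DECLARE @t   TIME(3)        = '13:45:30.123';                -- time only, 3 fractional digits
DECLARE @dt2 DATETIME2(7)   = '0001-01-01 00:00:00.0000000'; -- earlier than DATETIME allows
DECLARE @dto DATETIMEOFFSET = '2008-08-06 13:45:30 +05:30';  -- carries a time-zone offset

SELECT @d   AS DateOnly,
       @t   AS TimeOnly,
       @dt2 AS WideRange,
       SWITCHOFFSET(@dto, '+00:00') AS SameInstantInUtc;
```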

    HIERARCHYID - Enable database applications to model tree structures in a more

    efficient way than currently possible. New system type HIERARCHYID can store values

    that can represent nodes in a hierarchy tree. This new type will be implemented as a CLR

    UDT, and will expose several efficient and useful built-in methods for creating and

    operating on hierarchy nodes with a flexible programming model.
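    A small sketch of the built-in methods, using a hypothetical dbo.OrgChart table: the root
    node is created with GetRoot, a child with GetDescendant, and the tree is then queried with
    ToString, GetLevel and IsDescendantOf.

```sql
-- Hypothetical table holding an organization tree.
CREATE TABLE dbo.OrgChart (
    Node     hierarchyid  PRIMARY KEY,
    NodeName NVARCHAR(50) NOT NULL
);

DECLARE @root hierarchyid, @child hierarchyid;
SET @root  = hierarchyid::GetRoot();            -- path '/'
SET @child = @root.GetDescendant(NULL, NULL);   -- first child of the root, path '/1/'

INSERT INTO dbo.OrgChart (Node, NodeName) VALUES (@root,  N'CEO');
INSERT INTO dbo.OrgChart (Node, NodeName) VALUES (@child, N'Sales Director');

-- List every node at or below the root with its path and depth.
SELECT Node.ToString() AS NodePath,
       Node.GetLevel() AS Depth,
       NodeName
FROM dbo.OrgChart
WHERE Node.IsDescendantOf(@root) = 1;
```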

    FILESTREAM Data - Allow large binary data to be stored directly in an NTFS file
    system while remaining an integral part of the database and maintaining transactional
    consistency. This enables large binary data traditionally managed by the database to be
    scaled out and stored outside the database on more cost-effective storage without
    compromise.

    Integrated Full Text Search - Integrated Full Text Search makes the transition between

    Text Search and relational data seamless, while enabling users to use the Text Indexes to

    perform high-speed text searches on large text columns.

    Sparse Columns - NULL data consumes no physical space, providing a highly efficient
    way of managing empty data in a database. For example, sparse columns allow object
    models that typically have numerous null values to be stored in a SQL Server 2008
    database without incurring large space costs.
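    Declaring a sparse column only needs the SPARSE keyword. In this sketch the table and column
    names are hypothetical; the two optional attributes are expected to be NULL for most rows.

```sql
CREATE TABLE dbo.ProductAttributes (
    ProductID   INT           PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    LensCoating NVARCHAR(50)  SPARSE NULL,  -- stored only when a value is present
    FrameColor  NVARCHAR(30)  SPARSE NULL
);
```

    Sparse storage pays off only when most values are NULL, because non-NULL values in a sparse
    column take slightly more space than in a regular column.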

    Large User-Defined Types - SQL Server 2008 eliminates the 8-KB limit for User-

    Defined Types (UDTs), allowing users to dramatically expand the size of their UDTs.

    Spatial Data Types - SQL Server 2008 supports spatial data. It implements Round Earth
    solutions with the geography data type, which uses latitude and longitude coordinates to
    define areas on the Earth's surface. It implements Flat Earth solutions with the geometry
    data type, which stores polygons, points, and lines that are associated with projected
    planar surfaces and naturally planar data, such as interior spaces.
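    As an illustration of the geography type, the sketch below builds two points from
    latitude/longitude pairs and measures the distance between them. The coordinates are
    approximate values chosen for the example, and SRID 4326 (WGS 84) is assumed.

```sql
-- geography::Point takes latitude, longitude, SRID (approximate city coordinates).
DECLARE @mysore    geography = geography::Point(12.2958, 76.6394, 4326);
DECLARE @bangalore geography = geography::Point(12.9716, 77.5946, 4326);

-- STDistance returns metres for SRID 4326; convert to kilometres.
SELECT @mysore.STDistance(@bangalore) / 1000.0 AS DistanceKm;
```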

    Backup Compression - Keeping disk-based backups online is expensive and time-

    consuming. With SQL Server 2008 backup compression, less storage is required to keep

    backups online, and backups run significantly faster since less disk I/O is required.
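    A compressed backup is requested with a single option on the BACKUP statement. The database
    name and file path below are placeholders.

```sql
BACKUP DATABASE SalesDW                 -- hypothetical database
TO DISK = N'D:\Backups\SalesDW.bak'     -- hypothetical path
WITH COMPRESSION,                       -- compress the backup stream
     INIT,                              -- overwrite existing backup sets in the file
     STATS = 10;                        -- report progress every 10 percent
```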

    Partitioned Table Parallelism - Partitions enable organizations to manage large growing

    tables more effectively by transparently breaking them into manageable blocks of data.

    Star Join Query Optimizations - SQL Server 2008 provides improved query
    performance for common data warehouse scenarios. Star Join Query optimizations
    reduce query response time by recognizing data warehouse join patterns.

    Grouping Sets - Grouping sets are an extension to the GROUP BY clause that lets users
    define multiple groupings in the same query. Grouping sets produce a single result set
    that is equivalent to a UNION ALL of differently grouped rows, making aggregation
    querying and reporting easier and faster.
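    The equivalence with UNION ALL can be seen by writing the same report both ways. The table
    dbo.Sales and its columns Region, ProductLine and Amount are assumed for this sketch.

```sql
-- One pass: totals per region, per product line, and a grand total.
SELECT Region, ProductLine, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY GROUPING SETS ((Region), (ProductLine), ());

-- The same result written as a UNION ALL of three separately grouped queries.
SELECT Region, NULL AS ProductLine, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY Region
UNION ALL
SELECT NULL, ProductLine, SUM(Amount)
FROM dbo.Sales
GROUP BY ProductLine
UNION ALL
SELECT NULL, NULL, SUM(Amount)
FROM dbo.Sales;
```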

    Change Data Capture - With Change Data Capture, changes are captured and placed
    in change tables. It captures the complete content of changes, maintains cross-table
    consistency, and even works across schema changes. It enables organizations to integrate
    the latest information into the data warehouse.
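    A minimal sketch of enabling change capture and reading the captured rows is shown below. It
    assumes a database named SalesDW with a table dbo.Customer, and that SQL Server Agent is
    running so that the capture job can populate the change table.

```sql
USE SalesDW;   -- hypothetical database
GO
EXEC sys.sp_cdc_enable_db;               -- enable CDC for the database
GO
EXEC sys.sp_cdc_enable_table             -- start tracking dbo.Customer
    @source_schema = N'dbo',
    @source_name   = N'Customer',
    @role_name     = NULL;               -- no gating role in this sketch
GO
-- Read all changes captured so far; dbo_Customer is the default capture instance name.
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');
```

    An ETL process would normally remember the last LSN it processed and pass it as the lower
    bound on the next run, so that only new changes are loaded into the warehouse.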

    MERGE SQL Statement - With the inclusion of this statement, developers can more
    effectively handle common data warehousing scenarios, like checking whether a row
    exists and then executing an insert or update.
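    The "check, then insert or update" scenario might look as follows. The target table
    dbo.DimCustomer, the source table staging.Customer and their columns are hypothetical.

```sql
MERGE dbo.DimCustomer AS tgt
USING staging.Customer AS src
    ON tgt.CustomerID = src.CustomerID          -- match on the business key
WHEN MATCHED AND tgt.City <> src.City THEN      -- row exists and has changed: update it
    UPDATE SET tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN                 -- row does not exist yet: insert it
    INSERT (CustomerID, CustomerName, City)
    VALUES (src.CustomerID, src.CustomerName, src.City);
```

    A single statement replaces the separate existence check, UPDATE and INSERT that earlier
    versions required.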

    SQL Server Integration Services (SSIS) Pipeline - Data Integration packages can scale

    more effectively, making use of available resources and managing the largest enterprise-

    scale workloads.

    SQL Server Integration Services (SSIS) Persistent Lookups - The need to perform

    lookups is one of the most common ETL operations. This is especially prevalent in data

    warehousing, where fact records need to use lookups to transform business keys to their
    corresponding surrogates. SSIS increases the performance of lookups to support the
    largest tables.

    Analysis Scale and Performance - SQL Server 2008 drives broader analysis with
    enhanced analytical capabilities and with more complex computations and aggregations.
    Cube design tools help users streamline the development of the analysis infrastructure,
    enabling them to build solutions for optimized performance.

    Block Computations - Block computations provide a significant improvement in
    processing performance, enabling users to increase the depth of their hierarchies and
    the complexity of their computations.

    Writeback - MOLAP-enabled writeback capabilities in SQL Server 2008 Analysis
    Services remove the need to query ROLAP partitions. This provides users with
    enhanced writeback scenarios from within analytical applications without sacrificing
    traditional OLAP performance.

    Enterprise Reporting Engine - Reports can easily be delivered throughout the

    organization, both internally and externally, with simplified deployment and

    configuration. This enables users to easily create and share reports of any size and

    complexity.

    Internet Report Deployment - Customers and suppliers can effortlessly be reached by
    deploying reports over the Internet.

    Manage Reporting Infrastructure - Increase supportability and the ability to control

    server behavior with memory management, infrastructure consolidation, and easier

    configuration through a centralized store and API for all configuration settings.

    Report Builder Enhancements - Easily build ad-hoc and author reports with any

    structure through Report Designer.

    Forms Authentication Support - Support for Forms authentication enables users to
    choose between Windows and Forms authentication.

    Report Server Application Embedding - Report Server application embedding enables the
    URLs in reports and subscriptions to point back to front-end applications.

    Microsoft Office Integration - SQL Server 2008 provides Word rendering that enables
    users to consume reports directly from within Microsoft Office Word. In addition, the
    existing Excel renderer has been greatly enhanced to support features such as nested data
    regions and sub-reports, as well as merged cell improvements. This lets users maintain
    layout fidelity and improves the overall consumption of reports from Microsoft Office
    applications.

    Predictive Analysis - SQL Server Analysis Services continues to deliver advanced data

    mining technologies. Better Time Series support extends forecasting capabilities.

    Enhanced Mining Structures deliver more flexibility to perform focused analysis through

    filtering as well as to deliver complete information in reports beyond the scope of the

    mining model. Cross-validation enables confirmation of both accuracy and stability for

    results that you can trust. Furthermore, the new features delivered with SQL Server 2008

    Data Mining Add-ins for Office 2007 empower every user in the organization with even

    more actionable insight at the desktop.

    4. Functions of SQL Server 2008

    The different functions performed by Microsoft SQL Server 2008 are described below.

    4.1. Aggregate Functions in SQL Server 2008

    Aggregate functions are applied to a group of data values from a column. Aggregate functions
    always return a single value. SQL Server 2008/Transact-SQL supports the following aggregate
    functions:

    AVG - Calculates the arithmetic mean (average) of the data values contained within a

    column. The column must contain numeric values.

    MAX and MIN - Calculate the maximum and minimum data value of the column,

    respectively. The column can contain numeric, string, and date/time values.

    SUM - Calculates the total of all data values in a column. The column must contain

    numeric values.

    COUNT - Calculates the number of (non-null) data values in a column. The only
    aggregate function that is not applied to a column is COUNT(*), which returns the
    number of rows (whether or not particular columns have NULL values).

    COUNT_BIG - Analogous to COUNT, the only difference being that COUNT_BIG returns a
    value of the BIGINT data type.

    Aggregate function example:

    SELECT ProjectName, SUM(budget) AS TotalBudget
    FROM Project_Tbl
    GROUP BY ProjectName;
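    The difference between COUNT(*) and COUNT on a column, and the way the other aggregates
    skip NULL values, can be checked with a small self-contained sketch. The table variable and
    its three rows are made up for the example.

```sql
-- Hypothetical sample data: one of the three budgets is NULL.
DECLARE @Project TABLE (ProjectName VARCHAR(30), Budget MONEY NULL);
INSERT INTO @Project VALUES ('Alpha', 100), ('Beta', NULL), ('Gamma', 300);

SELECT COUNT(*)      AS AllRows,         -- 3: counts rows, NULLs included
       COUNT(Budget) AS NonNullBudgets,  -- 2: the NULL budget is skipped
       SUM(Budget)   AS TotalBudget,     -- 400
       AVG(Budget)   AS AverageBudget,   -- 200: average over the 2 non-null values
       MIN(Budget)   AS SmallestBudget,  -- 100
       MAX(Budget)   AS LargestBudget    -- 300
FROM @Project;
```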

    4.2. Default Databases in SQL Server 2008

    Microsoft SQL Server provides four default system databases.

    The master database holds information for all the databases located on the SQL Server
    instance and is the glue that holds the engine together. Because SQL Server cannot start
    without a functioning master database, this database must be administered with care.

    The tempdb database holds temporary objects such as global and local temporary tables and
    stored procedures.

    The model database is essentially a template used in the creation of any new user database
    in the instance.

    The msdb database stores information regarding database backups, SQL Agent
    information, DTS packages, SQL Server jobs, and some replication information such as
    for log shipping.

    4.3. Types of Joins in SQL Server

    Joins are used in SQL queries to express how different tables are related. A join retrieves
    data from two or more tables based on a common field, so rows can be selected from one
    table depending on data in another table.

    Types of joins: INNER JOIN, OUTER JOIN and CROSS JOIN. OUTER JOIN is further classified
    as LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.
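    The join types can be compared on two tiny tables. The table variables, names and values in
    this sketch are invented: one customer has no order and one order has no customer.

```sql
DECLARE @Customer TABLE (CustomerID INT, CustomerName VARCHAR(30));
DECLARE @Orders   TABLE (OrderID INT, CustomerID INT);
INSERT INTO @Customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO @Orders   VALUES (10, 1), (11, 3);

-- INNER JOIN: only matching pairs (Asha with order 10).
SELECT c.CustomerName, o.OrderID
FROM @Customer AS c
INNER JOIN @Orders AS o ON o.CustomerID = c.CustomerID;

-- LEFT OUTER JOIN: every customer; OrderID is NULL for Ravi.
SELECT c.CustomerName, o.OrderID
FROM @Customer AS c
LEFT OUTER JOIN @Orders AS o ON o.CustomerID = c.CustomerID;

-- FULL OUTER JOIN: unmatched rows from both sides are kept (Ravi and order 11).
SELECT c.CustomerName, o.OrderID
FROM @Customer AS c
FULL OUTER JOIN @Orders AS o ON o.CustomerID = c.CustomerID;

-- CROSS JOIN: every combination of rows, 2 x 2 = 4 rows.
SELECT c.CustomerName, o.OrderID
FROM @Customer AS c
CROSS JOIN @Orders AS o;
```

    A RIGHT OUTER JOIN is the mirror image of the LEFT OUTER JOIN: it keeps every order and
    returns NULL for the missing customer.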

    4.4. Indexes in Microsoft SQL Server

    An index is a physical structure containing pointers to the data. An index is created on one
    or more columns of an existing table to locate rows more quickly and efficiently, and each
    index is given a name. Users do not see the indexes; they are used only to speed up queries.
    Effective indexes are one of the best ways to improve performance in a database application.

    A table scan happens when there is no index available to help a query. In a table scan, SQL
    Server examines every row in the table to satisfy the query. Table scans are sometimes
    unavoidable, but on large tables they have a severe impact on performance. Two types of
    indexes are available in SQL Server.

    Clustered indexes define the physical ordering of a table's rows on the storage media: a
    clustered index reorders the way records in the table are physically stored. For this
    reason, each table can have only one clustered index. The leaf nodes of a clustered index
    contain the data pages.

    Non-clustered indexes are created outside of the database table and contain a sorted list of
    references to the table itself. A table can have multiple non-clustered indexes. In a
    non-clustered index, the logical order of the index does not match the physical stored order
    of the rows on disk. The leaf nodes of a non-clustered index do not consist of the data
    pages; instead, they contain index rows with references to the table.
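    Both index types are created with CREATE INDEX. The table, column and index names below are
    hypothetical and serve only to illustrate the syntax.

```sql
CREATE TABLE dbo.Orders (
    OrderID    INT  NOT NULL,
    CustomerID INT  NOT NULL,
    OrderDate  DATE NOT NULL
);

-- Clustered index: the table rows themselves are kept in OrderID order (one per table).
CREATE CLUSTERED INDEX CIX_Orders_OrderID
    ON dbo.Orders (OrderID);

-- Non-clustered index: a separate structure ordered by CustomerID that points back to the rows.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate);   -- extra column stored at the leaf level to avoid a lookup
```

    With these indexes, a query that filters on CustomerID and returns OrderDate can be answered
    from the non-clustered index alone instead of scanning the table.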

    5. BI Overview and ETL

    Microsoft has built an end-to-end BI platform in the form of Data Transformation Services
    (DTS), Analysis Services and Reporting Services. DTS was rewritten as SQL Server Integration
    Services (SSIS), an enterprise-class Extract, Transform, Load (ETL) tool. SQL Server
    Analysis Services offers the Unified Dimensional Model (UDM), cube partitioning, data
    mining and predictive analysis, what-if modeling, key performance indicators, etc. Business
    Intelligence Development Studio (BIDS) was introduced with SQL Server 2005 to give a single,
    integrated tool for the development of SSIS, SQL Server Analysis Services (SSAS) and SQL
    Server Reporting Services (SSRS) objects.

    Figure 2: Microsoft BI End-to-End Offerings
    (Diagram labels: Heterogeneous Data Sources (Oracle, DB2, CRM, File Sources etc.); SQL Server
    Integration Services (SSIS) - ETL; SQL 2008 Database; SQL Server Analysis Services (SSAS);
    SQL Server Reporting Services (SSRS); SQL BI Platform; SharePoint Server; PerformancePoint
    Server; Scoreboards; Dashboards; Reports in Excel, PDF and MS Word; Charts and graphs; Excel
    Calculation; Analytics; Business Performance monitoring using performance application)

    SQL Server 2008 makes data available to its customers at any place, at any time, and on any
    device. It provides a more trusted, scalable and available platform. With significant
    advancements in key areas, SQL Server 2008 becomes a more productive, intelligent enterprise
    data platform. The new features in SQL Server 2008 are organized around the following four
    major themes.

    Enterprise Data Platform - SQL Server 2008 provides a more reliable, secure, trusted
    and scalable platform. It improves availability through enhanced database mirroring,
    predictable query performance, data and backup compression, and many development
    features. Another innovative feature is Policy-based Management, which enables
    easier SQL Server administration.

    Beyond Relational - The features in this category can handle any type of data, even
    data that is unstructured and non-relational. New data types are introduced to handle
    geospatial data, documents and image files as well. They enable developers to design
    location-intelligent applications and enhance the ability to handle document
    management.

    Dynamic Development - Using the new .NET Framework 3.5 reduces the complexity of

    the development with ADO.NET Entity Framework and Language Integrated Query

    (LINQ) to SQL. ADO.NET Entity Framework enables the developers to become more

    productive by directly interacting with the business entities. Developers can write

    synchronizing applications by using features like change tracking.

    Pervasive Insight - These features further enhance SSIS (SQL Server Integration
    Services), SSAS (SQL Server Analysis Services) and SSRS (SQL Server Reporting
    Services), making them more scalable. They enable enterprises to integrate all their
    data into the data warehouse more efficiently and enable real-time data analysis. They also
    empower every user with actionable insights.

    6. Comparison of Microsoft Data Warehouse with IBM DB2

    Microsoft Data Warehouse Tool and IBM DB2 are compared below on several aspects.

    Total Cost of Ownership

    SQL Server 2008 offers lower cost than IBM DB2 in each of the major areas that

    contribute to total cost of ownership.

    SQL Server has a better total cost of ownership than IBM DB2 because of its lower

    costs of administration and services.

    SQL Server 2008 provides features that customers of IBM DB2 must buy as add-ins.

    Flexible Licensing Model

    Microsoft licensing requirements are determined by the number of processors, not the
    number of cores. IBM has more complex licensing policies, such as licensing per core or by
    processor value unit, and customers can end up paying substantially more for multi-core
    systems.

    Simplified Product Line

    SQL Server has a single, well-defined and easily understood database product line that
    combines SQL Server with integrated BI, whereas IBM lacks a tightly integrated, packaged BI
    tools suite for DB2.

    Price-Performance and Scalability

    SQL Server 2008 is designed to scale reliably to meet the needs of the largest

    organizations in the most demanding database environments.

    SQL Server has a lead over IBM DB2 in price/performance.

    SQL Server 2008 enables enhanced analytical capabilities and supports more
    complex computations and aggregations.

    High Availability

    SQL Server 2008 offers high-availability features that are not offered by IBM DB2.

    Enhanced Manageability

    SQL Server 2008 is easier to manage and more productive than IBM DB2. For SMEs it is
    feasible to deploy and manage many database applications without requiring external
    specialist resources.

    7. A Case Study of Carl Zeiss Vision, North America

    The Journey

    Determined Need for Enterprise Data Warehouse
    Worked with Business Users to Understand Business Requirements
    Determined Software Requirements
        MS SQL Server 2005 & 2008
        MS SSIS (ETL Tool)
        MS SSAS (Analytic Cube Tool)
        MS SSRS & Excel (Reporting Tools)
        SharePoint for Deploying Reports over Company Intranet
    Designed and Developed zBis Data Warehouse

    4 + 1 Steps Dimensional Design Process

    Ralph Kimball's Process for Developing Star Schemas

    Determine Business Process
        Model Business Processes
        Each Process will determine 1 or more Facts
        Design Data Warehouse by Business Process, Not Business Unit
    Identify the Grain of the Fact
        What does 1 row in Fact table represent
        Transactional or Summary
    Design the Data Warehouse Dimensions
    Design the Data Warehouse Facts
    Determine Hierarchies
        Customer Hierarchies
            Sales Channels
            Distribution Channels
            Business Channels
            Customer Channels
            Product Divisions
            Sales Organizations
            Sales Office
            Buy Groups/Directly Purchase
        Product Hierarchy
            Manufacturer
            Brand
            Product Type (each product type had its own hierarchy)
            Design
            Make/Model
        Geo Hierarchy
            Sales Division
            Sales Region
            Sales Territory

    Conformed Dimensions

    Standardized dimensions across data warehouse
    Dimensions are associated with multiple business processes
    Determined by using Bus Matrix & enforced in ETL
    Conformed Dimensions are shared and consistent across fact tables

    Use Data Warehouse BUS Matrix for
        Understanding & mapping of Business Processes and Dimensions
        Ongoing DW/BI planning efforts
        Team & Management Communications
        Understand Business Process unions across the enterprise

    Dimensions: Date, Company, Customer, Product, Geo, Dist Ctr, Promo

    Company Sales: X X X X X X
    Customer Discounts: X X X X X X
    Product Cost: X X X X X X X
    Company Inventory: X X X
    Dist Ctr Inventory: X X X

    Table 1: BUS Matrix

    Figure 3: Dimensional Schema

    Slowly Changing Dimensions

    Type 1 - Overwrite existing Dimension Row
        Use when there is no need to keep the history of the data row
        Can be used to correct bad data
    Type 2 - Create a new Dimension Row
        Use date and/or active/non-active fields to identify current and inactive data rows
    Type 3 - Keep old and add new attributes in Dimension Row
        Allow alternate realities to exist simultaneously in one Dimension Row

    Slowly Changing Dimensions are handled in the ETL
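    A Type 2 change could be handled in the ETL roughly as sketched below. This is not the case
    study's actual implementation: the tables dbo.DimCustomer and staging.Customer, the tracked
    attribute City (assumed NOT NULL) and the StartDate, EndDate and IsCurrent columns are
    assumptions, and the dimension's surrogate key is assumed to be an IDENTITY column.

```sql
DECLARE @today DATE = CAST(GETDATE() AS DATE);

-- Step 1: close the current row of every customer whose tracked attribute changed.
UPDATE d
SET    d.IsCurrent = 0,
       d.EndDate   = @today
FROM   dbo.DimCustomer  AS d
JOIN   staging.Customer AS s ON s.CustomerID = d.CustomerID
WHERE  d.IsCurrent = 1
  AND  d.City <> s.City;

-- Step 2: insert a new current row for changed customers (closed in step 1)
-- and for customers that are not in the dimension yet.
INSERT INTO dbo.DimCustomer
       (CustomerID, CustomerName, City, StartDate, EndDate, IsCurrent)
SELECT s.CustomerID, s.CustomerName, s.City, @today, NULL, 1
FROM   staging.Customer AS s
LEFT JOIN dbo.DimCustomer AS d
       ON d.CustomerID = s.CustomerID
      AND d.IsCurrent  = 1
WHERE  d.CustomerID IS NULL;
```

    A Type 1 change would instead be a plain UPDATE of the existing row, which is why it keeps
    no history.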

    Types of Dimensions
        Mini-Dimension
        Junk Dimensions
        Outrigger Dimensions
        Small Static Dimensions
        Lookup tables

    Types of Facts
        Transaction Fact Tables
        Snapshot Fact Tables
        Accumulating Snapshot Fact Tables
        Consolidated or Aggregated Fact Tables

    Figure 4: Bridge Tables

    Figure 5: Bridge Tables

    8. Conclusion

    Microsoft data warehouse meets the needs of the most demanding data warehouse and business
    intelligence applications in terms of performance, scalability, and security. Microsoft's SQL
    Server product has consistently maintained a leadership position in terms of growth in the
    database segment.

    Tight integration with the Microsoft Visual Studio development environment improves

    developer productivity and reduces database application development cycles. It provides rich

    security architecture to help protect data and network resources.

    References

    A CIOview White Paper on CIOview: Should You Migrate from Sybase to SQL Server?

    Alex Payne, Microsoft, "Business Intelligence and Data Warehousing in SQL Server 2005",
    published on 15 July 2005, available at http://technet.microsoft.com/hi-in/library/bb545450

    Bryan Thomas, "Solutions for Highly Scalable Database Applications: An analysis of
    architectures and technologies", Performance Tuning Corporation

    Dr. Abhijit Chattaraj, Mr. Philip Cookson, "Comparing SQL Server 2008 to IBM DB2 9.5",
    SQL Server Technical Article, published in May 2008

    Elizabeth Diamond, "Architecting A Data Warehouse, A case study", September 9, 2009

    Paulraj Ponniah, "Data Warehousing Fundamentals: A Comprehensive Guide for IT
    Professionals"

    Microsoft SQL Server, available at http://en.wikipedia.org/wiki/Microsoft_SQL_Server

    Mitch Kramer, Green Hill Analysis, "A Comparison of Business Intelligence Strategies and
    Platforms: Comparing Microsoft, Oracle, IBM, and Hyperion"

    Mitch Ruebush, "Comparing SQL Server 2005 and Oracle 10g as a Database Platform for
    Microsoft .NET Developers - A Comparison of Developer Productivity", published in April 2005

    Sam Anahory, Dennis Murray, "Data Warehousing In The Real World: A Practical Guide for
    Building Decision Support Systems", Pearson Education

    SQL Server Fast Track Data Warehouse at http://www.microsoft.com/sqlserver/2008/en/

    Virendra Wadkar, Phaneendra babu Sunivis, Nitesh Rai, "SQL Server 2008 BI Features"

    Whitepaper on "Microsoft's SQL Server product has consistently maintained a leadership
    position in terms of growth in the database segment", Revision 10, October 2004