Background of OLAP, DW, Data Mining. Introductory Concepts Related Terms –OLAP –Data warehouse –Data mart –OLAP cube, multidimensional cube –Star schema

  • Slide 1
  • Background of OLAP, DW, Data Mining
  • Slide 2
  • Introductory Concepts: Related Terms
    – OLAP
    – Data warehouse
    – Data mart
    – OLAP cube, multidimensional cube
    – Star schema
    – Fact, dimension
    – Dimensional modeling
    – Data mining
  • Slide 3
  • Introductory Concepts: Relevance of Terms (OLAP, Data Warehouse, Data Mining, Multidimensional Cube, Star Schema)
  • Slide 4
  • Introductory Concepts
    – OLAP (On-Line Analytical Processing)
      – Complex query processing for decision making
      – Report generation for summaries, aggregates, and statistics over source data
      – Advanced data analysis techniques
    – OLTP (On-Line Transaction Processing)
      – Simple transaction processing for regular (daily) business operations
      – Manipulating databases maintained for the organization's activities
      – Optimized for simple operations (select, insert, ...)
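
The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical and serve only as an illustration: OLTP-style work inserts or looks up one row at a time, while an OLAP-style query summarizes many rows at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount REAL)"
)

def place_order(product, region, amount):
    """OLTP-style operation: one short transaction touching a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (product, region, amount) VALUES (?, ?, ?)",
            (product, region, amount),
        )
    return cur.lastrowid

# Hypothetical sample data.
for row in [("shoes", "Midwest", 50.0),
            ("shoes", "Northeast", 70.0),
            ("socks", "Midwest", 5.0)]:
    place_order(*row)

# OLTP-style lookup: fetch one record by its key.
one_order = conn.execute(
    "SELECT product, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style query: aggregate over all rows for a summary report.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(sales_by_region)
```

In a real deployment the aggregate query would typically run against a separate warehouse rather than the live transactional database.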
  • Slide 5
  • Introductory Concepts: OLTP DB Transaction
  • Slide 6
  • Introductory Concepts: OLTP DB Transaction; Solution (1) /
  • Slide 7
  • Introductory Concepts: OLTP DB Transaction; Solution (2) OLAP
  • Slide 8
  • Introductory Concepts: OLTP DB Transaction; Solution (2) OLAP / Data Warehouse
  • Slide 9
  • Introductory Concepts
    – OLAP systems/tools
      – Present summary/aggregate data from large OLTP databases to decision makers in several ways
      – Manipulate the DW dynamically over several dimensions to find useful information
    – Data warehouse
      – A collection of data used for OLAP
      – Summary/aggregate data extracted from source OLTP databases that can support every possible OLAP request
      – Represented by a star schema
  • Slide 10
  • Introductory Concepts: Relevance of Terms (OLAP, Data Warehouse, Data Mining, Multidimensional Cube, Star Schema)
  • Slide 12
  • Introductory Concepts
    – OLTP (Online Transaction Processing)
      – Typical daily business query and update processing
    – OLAP (Online Analytical Processing)
      – Complex query processing or report generation
      – Advanced data analysis techniques
      – ROLAP: Relational OLAP
      – MOLAP: Multi-dimensional OLAP
    – Data Warehouse: an enterprise-wide data repository for decision support
    – Data Mart: a smaller, targeted DW for a business process
    – Star Schema: a DB structure for a data warehouse
  • Slide 13
  • Data Warehouse
    – A DW is an integrated repository of data put into a form that can be easily understood, interpreted, and analyzed by the people who need it to make decisions
    – Data are extracted from operational systems, then cleansed, integrated, transformed, and aggregated into a read-only database that is optimized for decision making
    – A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process (W.H. Inmon)
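
The extract, cleanse, integrate, and aggregate steps named above can be sketched in plain Python. The two source systems, their field names, and the sample rows are hypothetical; a production pipeline would be considerably more involved.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems with inconsistent layouts.
store_rows = [
    {"sku": "A1", "date": "2024-01-05", "amount": "20.00"},
    {"sku": " a1", "date": "2024-01-05", "amount": "5.00"},
    {"sku": "B2", "date": "2024-01-06", "amount": None},  # unusable: no amount
]
web_rows = [
    {"product_code": "B2", "day": "2024-01-06", "total": 10.0},
]

def integrate(store, web):
    """Map both source layouts onto one common layout."""
    unified = list(store)
    for r in web:
        unified.append(
            {"sku": r["product_code"], "date": r["day"], "amount": r["total"]}
        )
    return unified

def cleanse(row):
    """Return a normalized (sku, date, amount) tuple, or None if unusable."""
    if row["amount"] is None:
        return None
    return (row["sku"].strip().upper(), row["date"], float(row["amount"]))

def load_warehouse(store, web):
    """Integrate, cleanse, and aggregate into totals keyed by (sku, date)."""
    totals = defaultdict(float)
    for row in integrate(store, web):
        cleaned = cleanse(row)
        if cleaned is not None:
            sku, date, amount = cleaned
            totals[(sku, date)] += amount
    return dict(totals)

daily_sales = load_warehouse(store_rows, web_rows)
print(daily_sales)
```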
  • Slide 14
  • Data Warehouse: Motivation for DW
    – An enterprise-wide repository of data, information, knowledge, and metadata
      – Gather all the information into a single place for in-depth analysis
      – Decouple such analysis from OLTP systems
    – Transform the data into information
      – Provide the right information in the right format at the right time
    – Perform sophisticated analysis of data
      – Perform trend analysis, time series analysis, risk analysis, etc.
      – Perform DSS exploration such as alternative formation, alternatives testing, decision-making, etc.
      – Discover/visualize hidden facts, patterns, correlations, rules, and exceptions using data mining techniques
  • Slide 15
  • Data Warehouse: OLTP queries
    – How many shoes did we sell last month?
    – What are the age, address, and phone number of a certain student (e.g., Hong Gildong)?
  • Slide 16
  • Data Warehouse: OLAP queries
    – How many size 10 shoes in red did we sell last month in the Midwest and the Northeast, compared with the same month last year, actual vs. budget?
    – What are the top 25 brands, by products, styles, and regions, for this period for the total US, based on sales dollars?
    – How much did we spend on promotions for customers who purchased less than $100 worth of products?
    – How much discount should we offer to boost the sales volume significantly?
    – Find the correlation between buying patterns of products of type A and those of type B.
    – What are the sales trends?
    – What percent of the market do we own?
    – How are our defect rates improving?
    – Are our profits increasing or decreasing?
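
As a rough illustration of the first OLAP question, the sketch below compares red size 10 shoe sales per region for one month against the same month a year earlier. It uses Python's built-in `sqlite3` module; the flat `sales` table, its columns, and the quantities are hypothetical, and the actual-vs-budget part of the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "region TEXT, month TEXT, product TEXT, size INTEGER, color TEXT, qty INTEGER)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
    ("Midwest", "2024-03", "shoes", 10, "red", 12),
    ("Midwest", "2023-03", "shoes", 10, "red", 9),
    ("Northeast", "2024-03", "shoes", 10, "red", 7),
    ("Northeast", "2023-03", "shoes", 10, "red", 11),
    ("Midwest", "2024-03", "shoes", 9, "red", 30),   # filtered out: wrong size
    ("South", "2024-03", "shoes", 10, "red", 5),     # filtered out: wrong region
    ("Midwest", "2024-03", "shoes", 10, "blue", 4),  # filtered out: wrong color
])

def red_size10_comparison(this_month, same_month_last_year):
    """Return (region, units this year, units last year) per region."""
    query = """
        SELECT region,
               SUM(CASE WHEN month = ? THEN qty ELSE 0 END) AS this_year,
               SUM(CASE WHEN month = ? THEN qty ELSE 0 END) AS last_year
        FROM sales
        WHERE product = 'shoes' AND size = 10 AND color = 'red'
          AND region IN ('Midwest', 'Northeast')
          AND month IN (?, ?)
        GROUP BY region
        ORDER BY region
    """
    params = (this_month, same_month_last_year,
              this_month, same_month_last_year)
    return conn.execute(query, params).fetchall()

comparison = red_size10_comparison("2024-03", "2023-03")
print(comparison)
```

Even this single question filters on four attributes and aggregates across two time periods, which is why such queries are usually kept off the OLTP system.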
  • Slide 17
  • Comparison of OLTP and Data Warehouse
    Aspect | OLTP | Data Warehouses & OLAP
    Purpose | Daily business support, transaction processing | Decision support, analytic processing
    User | Data entry clerk, administrator, developer | Decision maker, executives
    DB design | Application-oriented | Subject-oriented
    DB design model | ER model | Star, snowflake, multidimensional model
    Data structures | Normalized, complex | Denormalized, simple
    Data | Current, up-to-date operational data; atomic; isolated | Historical; atomic and summarized; integrated
    Usage | Repetitive, routine | Ad hoc
    Update | Transactions constantly generate new data; read/write | Data is relatively static, often refreshed weekly; read mostly
    Response time | Subsecond to second | Seconds, minutes, worse
    Index types | B+ trees | B+ trees, bitmap index, join index
    Systems requirements | Transaction throughput, data consistency | Query throughput, data accuracy
    – DWs require a new query-centric view of the data
  • Slide 18
  • Data Warehouse Example: Benefits of Data Warehousing & Data Mining
    – Fast, sophisticated report generation: 4-15 times faster delivery
    – Young men buy beer on Friday nights when they buy diapers
    – More athletic shoes are sold on Friday evenings and Saturdays than during the rest of the week combined
    – More athletic shoes are sold when white tube socks are prominently displayed as part of a 2-for-1 sale
    – In a retail chain, potato chip purchases were accompanied by a soda purchase in half the cases; that figure increases to 75% when there is a marketing promotion
    – Blue Cross found some providers had superior treatment success rates for some fatal diseases
    – Victoria's Secret found that a particular incentive was ineffective, saving $300K per week
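
A statement such as "chip purchases were accompanied by a soda purchase in half the cases" corresponds to what association-rule mining calls confidence. The sketch below computes support and confidence over a handful of made-up baskets; the data is hypothetical and is not the retail chain's.

```python
# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"chips", "soda"},
    {"chips"},
    {"chips", "soda", "beer"},
    {"chips", "bread"},
    {"soda"},
    {"beer", "diapers"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return (count(antecedent | consequent, transactions)
            / count(antecedent, transactions))

chips_to_soda = confidence({"chips"}, {"soda"}, baskets)
print(chips_to_soda)
```

With these baskets, four contain chips and two of those also contain soda, so the confidence of "chips implies soda" is 0.5.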
  • Slide 19
  • Data Warehouse: Database Design for DW
    – Objective of a DW
      – Creating a database optimized for decision support
    – Limitations of the ER model for DW applications
      – Normalized data models support large numbers of transactions, each touching very few records
      – ER models tend to be very complex and difficult to navigate
      – The ER model identifies entities first, then relationships
    – Four basic requirements of a warehouse design
      – The schema must be simple
      – The data must be clean, consistent, and accurate
      – Query processing must be fast
      – Data must load into the warehouse quickly
    – Two types of data representation in a DW
      – Star schema
      – Multi-dimensional array
  • Slide 20
  • Star Schema: Dimensional Model (Star Schema)
    – A database schema for data warehousing
    – Initially developed to simplify SQL queries (by Ralph Kimball)
    – Consists of a few central fact tables and many dimension tables
      – Dimension = analysis criteria
      – Fact = measurements aggregated over dimensions to be analyzed
    – Simplifies end-user query processing and gives high query performance
      – Used to reduce joins by OLAP/relational engines
      – Relatively few tables and well-defined join paths
  • Slide 21
  • Star Schema
  • Slide 22
  • Dimensional Model (Star Schema) consists of:
    – Fact table
      – Stores all transactions or factual data that are analyzed
      – Typically numeric measures
      – From millions to more than a billion rows
      – Example: Revenue, Actuals, Budgets, Sales, Orders, Bookings, Claims
    – Dimension table
      – Attributes about facts
      – Supports grouping, browsing, constraining
      – Provides the entry points into the DW
      – Example: Time, Customer, Promotions, Demographics, LifeStyles, Products, Stores, Markets
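
A minimal star schema can be sketched with Python's built-in `sqlite3` module: one fact table holding numeric measures, surrounded by dimension tables that supply the grouping attributes. All table names, column names, and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to group and filter.
    CREATE TABLE dim_time (
        time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY, city TEXT, state TEXT);

    -- Fact table: one row per sale, with foreign keys and numeric measures.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        units INTEGER,
        revenue REAL);

    INSERT INTO dim_time VALUES (1, 2024, 1, 15), (2, 2024, 2, 3);
    INSERT INTO dim_product VALUES (1, 'Runner', 'Acme'), (2, 'Trail', 'Zenith');
    INSERT INTO dim_store VALUES (1, 'Chicago', 'IL'), (2, 'Boston', 'MA');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 3, 150.0),
        (1, 2, 2, 1, 80.0),
        (2, 1, 1, 2, 100.0),
        (2, 1, 2, 4, 200.0);
""")

# Each dimension is reached from the fact table by a single join.
revenue_by_brand_state = conn.execute("""
    SELECT p.brand, s.state, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY p.brand, s.state
    ORDER BY p.brand, s.state
""").fetchall()

print(revenue_by_brand_state)
```

Any combination of dimensions can be used as an entry point in the same way, by joining the fact table to the dimensions of interest and grouping on their attributes.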
  • Slide 23
  • Star Schema: Dimension hierarchy
    – Each dimension table has several attributes
    – These attributes may form a hierarchy
      – Different levels of aggregation for fact data
      – Users may want to aggregate data at some level of the dimension hierarchy
    – Examples: Year > Month > Day; State > County > City > Street
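
Aggregating at different levels of a hierarchy (often called roll-up) can be sketched in plain Python for the Year > Month > Day example. The fact rows and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, day, units_sold).
facts = [
    (2023, 12, 31, 4),
    (2024, 1, 1, 2),
    (2024, 1, 2, 3),
    (2024, 2, 1, 5),
]

# How many leading date parts identify a group at each hierarchy level.
LEVELS = {"year": 1, "month": 2, "day": 3}

def roll_up(rows, level):
    """Sum units_sold at the requested level of the time hierarchy."""
    depth = LEVELS[level]
    totals = defaultdict(int)
    for *date_parts, units in rows:
        totals[tuple(date_parts[:depth])] += units
    return dict(totals)

by_year = roll_up(facts, "year")
by_month = roll_up(facts, "month")
by_day = roll_up(facts, "day")

print(by_year)
print(by_month)
```

The same fact rows produce coarser or finer summaries depending only on how many levels of the hierarchy are kept in the grouping key.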
  • Slide 24
  • Star Schema: The Strengths of Dimensional Modeling (Kimball 98)
    – Provides a predictable, standard framework
      – Report writers, query tools, and UIs can take advantage of the DM structure
      – Supports presentation and performance
      – Simplifies the understanding and navigation of metadata
    – Is robust against unexpected changes in user behavior
      – Logical design is independent of the use of the schema
      – All dimensions are equally entry points to the system
    – Can be changed gracefully
      – New unanticipated facts of the same grain can be added
      – New dimensions can be added
      – New unanticipated dimensional attributes can be added
      – Existing dimension records can be broken down to a lower level of granularity
    – Availability of common modeling situations
      – Slowly changing dimensions
      – Heterogeneous products
      – Pay-in-advance databases
      – Factless facts, as in event handling
    – Availability of administrative utilities and SW processes that manage and use aggregates
  • Slide 25
  • Star Schema: Variations of Star Schema
    – Star Schema
      – All dimensions are denormalized
      – Wide dimensions and deep facts
    – Snowflake Schema
      – All dimensions are normalized into 3NF
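
The difference between the two variations can be sketched with Python's built-in `sqlite3` module by modeling the same product dimension both ways. The tables and rows are hypothetical. The star version repeats the category text in every product row; the snowflake version moves it into its own table, at the cost of one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: a single denormalized product dimension.
    CREATE TABLE star_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Snowflake: the category attribute is normalized into its own table.
    CREATE TABLE snow_category (
        category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE snow_product (
        product_id INTEGER PRIMARY KEY, name TEXT,
        category_id INTEGER REFERENCES snow_category(category_id));

    CREATE TABLE fact_sales (product_id INTEGER, units INTEGER);

    INSERT INTO star_product VALUES
        (1, 'Runner', 'Shoes'), (2, 'Trail', 'Shoes'), (3, 'Crew', 'Socks');
    INSERT INTO snow_category VALUES (10, 'Shoes'), (20, 'Socks');
    INSERT INTO snow_product VALUES
        (1, 'Runner', 10), (2, 'Trail', 10), (3, 'Crew', 20);
    INSERT INTO fact_sales VALUES (1, 5), (2, 3), (3, 8), (1, 2);
""")

# Star schema: one join from the fact table to the dimension.
star_result = conn.execute("""
    SELECT p.category, SUM(f.units)
    FROM fact_sales AS f
    JOIN star_product AS p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake schema: the same question needs a second join.
snow_result = conn.execute("""
    SELECT c.category, SUM(f.units)
    FROM fact_sales AS f
    JOIN snow_product AS p ON p.product_id = f.product_id
    JOIN snow_category AS c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star_result)
print(snow_result)
```

Both queries return the same totals; the trade-off is between redundancy in the dimension table and the number of joins per query.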
  • Slide 26
  • Star Schema: Summary of DW and Star Schema
    – A DW is a database specially maintained for analytical processing
    – A DW is organized by facts and dimensions
    – The star schema is a way of representing the DW schema
    – Cube, Data Mining??