Data Warehouse Concepts

1. Star schema: Characterized by one or more fact tables and corresponding dimension tables
2. Fact table: Central table in a star schema that contains the metrics analyzed by dimensions
3. Minidimension table: Contains subsets of heavily queried attributes of parent dimensions
4. Hierarchy table: Enables users to drill in reports
5. Internal table: Stores important information that is used during the ETL process, is rebuilt during each ETL process, and is not used by end-user query tools
6. Conforming dimension table: Table shared by fact tables allowing consistent analysis across multiple star schemas
7. Dimension table: Stores descriptions of the characteristics of a business and descriptive information that qualifies a fact
8. Helper table: Used by the Oracle Business Analytics Warehouse (OBAW) to solve complex problems that cannot be resolved by simple dimensional schemas
9. Staging table: Intermediate storage table within the OBAW that holds data for transformation before loading the data into the dimension and fact tables
10. Aggregate table: Improves performance by summing fact data with respect to a given dimension
11. Oracle Business Analytics Warehouse: A modular enterprise-wide data warehouse data model with conformed dimensions for data integrated from multiple sources
12. DATASOURCE_NUM_ID: Unique identifier of the source system from which data was extracted
13. ETL_PROC_WID: Unique identifier for the specific ETL process used to create or update data
14. INTEGRATION_ID: Unique identifier of a dimension or fact entity in its source system

Different Types of Dimensions and Facts in Data Warehouse

Dimension - A dimension table typically has two types of columns: a primary key that fact tables reference, and textual or descriptive data.

Fact - A fact table typically has two types of columns: foreign keys to dimension tables, and measures that contain numeric facts. A fact table can contain fact data at a detailed or an aggregated level.

Types of Dimensions -

Slowly Changing Dimensions: Some attributes of a dimension undergo changes over time. Whether the history of changes to a particular attribute should be preserved in the data warehouse depends on the business requirement. Such an attribute is called a Slowly Changing Attribute, and a dimension containing such an attribute is called a Slowly Changing Dimension.

Type 1 Slowly Changing Dimension: In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

Advantages: - This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages: - All history is lost. By applying this methodology, it is not possible to trace back in history. For example, if a customer named Christina moves from Illinois to California, the company would not be able to know that Christina lived in Illinois before.

Type 2 Slowly Changing Dimension: In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

Advantages: - This allows us to accurately keep all historical information.

Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.

- This necessarily complicates the ETL process.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

Type 3 Slowly Changing Dimension: In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

Advantages:

- This does not increase the size of the table, since new information is updated.

- This allows us to keep some part of history.

Disadvantages:

- Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

When to use Type 3:

Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
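The three techniques can be compared side by side in a small sketch. This is only an illustration using Python's built-in sqlite3 module, not a production ETL routine. The table customer_dim, its columns, the customer id 1001, the initial date and the date of the move to California are all hypothetical; only the Texas date comes from the example above.

```python
import sqlite3

def fresh_customer_dim():
    """Create an in-memory customer dimension holding one row for Christina."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer_dim (
            customer_key   INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key
            customer_id    INTEGER, -- identifier from the source system
            name           TEXT,
            state          TEXT,    -- current value
            original_state TEXT,    -- only meaningful for Type 3
            effective_date TEXT     -- when the current value became active
        )""")
    conn.execute(
        "INSERT INTO customer_dim "
        "(customer_id, name, state, original_state, effective_date) "
        "VALUES (1001, 'Christina', 'Illinois', 'Illinois', '2000-01-01')")
    return conn

def scd_type1(conn, customer_id, new_state):
    """Type 1: overwrite the value in place; no history is kept."""
    conn.execute("UPDATE customer_dim SET state = ? WHERE customer_id = ?",
                 (new_state, customer_id))

def scd_type2(conn, customer_id, new_state, effective_date):
    """Type 2: add a new row (with its own surrogate key) for the new value."""
    conn.execute(
        "INSERT INTO customer_dim "
        "(customer_id, name, state, original_state, effective_date) "
        "SELECT customer_id, name, ?, original_state, ? FROM customer_dim "
        "WHERE customer_id = ? ORDER BY customer_key DESC LIMIT 1",
        (new_state, effective_date, customer_id))

def scd_type3(conn, customer_id, new_state, effective_date):
    """Type 3: keep original_state, overwrite the current value and its date."""
    conn.execute(
        "UPDATE customer_dim SET state = ?, effective_date = ? "
        "WHERE customer_id = ?",
        (new_state, effective_date, customer_id))

t1 = fresh_customer_dim()
scd_type1(t1, 1001, "California")
type1_rows = t1.execute("SELECT state FROM customer_dim").fetchall()

t2 = fresh_customer_dim()
scd_type2(t2, 1001, "California", "2003-01-15")
type2_rows = t2.execute(
    "SELECT customer_key, state FROM customer_dim ORDER BY customer_key"
).fetchall()

t3 = fresh_customer_dim()
scd_type3(t3, 1001, "California", "2003-01-15")
scd_type3(t3, 1001, "Texas", "2003-12-15")  # the California value is lost here
type3_rows = t3.execute(
    "SELECT original_state, state, effective_date FROM customer_dim"
).fetchall()

print(type1_rows, type2_rows, type3_rows)
```

After the Type 1 update only California remains; Type 2 leaves two rows for the same customer; Type 3 ends with Illinois as the original value and Texas as the current one, with no trace of California.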

Type 1: The new record replaces the original record. No trace of the old record exists

Type 2: A new record is added into the customer dimension table. Thereby, the customer is treated essentially as two people.

Type 3: The original record is modified to reflect the change.

Star Schema

In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
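A minimal sketch of such a layout, using Python's built-in sqlite3 module. The tables product_dim, store_dim and sales_fact and all sample values are hypothetical; they only illustrate one fact table whose foreign keys point at the primary keys of the surrounding dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one primary key plus descriptive attributes
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE store_dim (
        store_key  INTEGER PRIMARY KEY,
        store_name TEXT,
        region     TEXT
    );
    -- Fact table: foreign keys to the dimensions plus a numeric measure
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        store_key    INTEGER REFERENCES store_dim(store_key),
        sales_amount INTEGER
    );
    INSERT INTO product_dim VALUES (1, 'Laptop', 'Electronics'),
                                   (2, 'Desk', 'Furniture');
    INSERT INTO store_dim VALUES (1, 'Downtown', 'East'),
                                 (2, 'Airport', 'West');
    INSERT INTO sales_fact VALUES (1, 1, 1000), (1, 2, 500), (2, 1, 200);
""")

# Analyze the measure by an attribute of one of the dimensions
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

The same fact rows could be grouped by store region instead, simply by joining store_dim rather than product_dim.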

Snowflake Schema

The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

Data Warehouse: A collection of data marts representing historical data from different operations in the company.

- Holds multiple subject areas
- Holds very detailed information
- Works to integrate all data sources
- Does not necessarily use a dimensional model but feeds dimensional models


Data Mart: A segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company.

- Often holds only one subject area, for example Finance or Sales
- May hold more summarised data (although many hold full detail)
- Concentrates on integrating information from a given subject area or set of source systems
- Is built focused on a dimensional model using a star schema

Factless Fact Table: A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions. On the surface, a factless fact table does not make sense, since a fact table is, after all, about facts. However, there are situations where having this kind of relationship makes sense in data warehousing.

For example, think about a record of student attendance in classes. In this case, the fact table would consist of 3 dimensions: the student dimension, the time dimension, and the class dimension.
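The attendance example can be sketched as follows, using Python's built-in sqlite3 module. The table name attendance_fact, its key columns and the sample keys are hypothetical. The point is that the table holds only dimension keys, yet counting its rows still answers a useful question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only dimension keys, no measure columns
    CREATE TABLE attendance_fact (
        student_key INTEGER,
        class_key   INTEGER,
        date_key    INTEGER
    );
    INSERT INTO attendance_fact VALUES
        (1, 10, 20240101),
        (2, 10, 20240101),
        (1, 10, 20240102),
        (1, 20, 20240101);
""")

# Each row records that a student attended a class on a day,
# so COUNT(*) yields the number of attendances per class.
attendance_per_class = conn.execute("""
    SELECT class_key, COUNT(*)
    FROM attendance_fact
    GROUP BY class_key
    ORDER BY class_key
""").fetchall()
print(attendance_per_class)
```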

junk dimensions - a collection of miscellaneous transactional codes, flags and text attributes that are unrelated to any particular dimension. Grouping them in one table avoids having a large number of foreign keys in the fact table. The junk dimension is simply a structure that provides a convenient place to store these junk attributes.

degenerate dimensions - data that is dimensional in nature but stored in a fact table.

role playing dimensions - a dimension that can play different roles in a fact table depending on the context.

Rapidly Changing Dimensions: A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions: A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.

Inferred Dimensions: While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with Null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.

Conformed Dimensions: A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Degenerate Dimensions: A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

Role Playing Dimensions: A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.
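As a rough illustration of the ship date and delivery date case, the sketch below joins one date dimension to the fact table twice under two aliases. It uses Python's built-in sqlite3 module; the tables date_dim and shipment_fact and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE shipment_fact (
        order_id          INTEGER,
        ship_date_key     INTEGER REFERENCES date_dim(date_key),
        delivery_date_key INTEGER REFERENCES date_dim(date_key)
    );
    INSERT INTO date_dim VALUES (20240105, '2024-01-05'),
                                (20240108, '2024-01-08');
    INSERT INTO shipment_fact VALUES (1, 20240105, 20240108);
""")

# The single date_dim table plays two roles: "ship" and "delivery"
shipments = conn.execute("""
    SELECT f.order_id, ship.full_date, delivery.full_date
    FROM shipment_fact f
    JOIN date_dim ship     ON ship.date_key = f.ship_date_key
    JOIN date_dim delivery ON delivery.date_key = f.delivery_date_key
""").fetchall()
print(shipments)
```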

Shrunken Dimensions: A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions: Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.
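A Date dimension generated by a procedure might look like the sketch below. The function name build_date_dimension, the column names and the YYYYMMDD key format are assumptions chosen for illustration; real warehouses usually carry many more calendar attributes.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240101
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 1, 7))
for row in date_dim:
    print(row)
```

Because the rows are computed rather than extracted, the dimension can be regenerated at any time for whatever date range the warehouse needs.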

Types of Facts -

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension.

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Eg: Facts which have percentages, ratios calculated.
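The daily balances case can be sketched with Python's built-in sqlite3 module. The table balance_fact and its values are hypothetical. Summing across customers for one day gives a meaningful total, whereas across time the sketch reads the balance on the latest day instead of summing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance_fact (
        customer_key INTEGER,
        date_key     INTEGER,
        balance      INTEGER
    );
    INSERT INTO balance_fact VALUES
        (1, 20240101, 100), (1, 20240102, 150),
        (2, 20240101, 200), (2, 20240102, 250);
""")

# Valid: add balances across the customer dimension, one total per day
total_by_day = conn.execute("""
    SELECT date_key, SUM(balance)
    FROM balance_fact
    GROUP BY date_key
    ORDER BY date_key
""").fetchall()

# Across time, adding a customer's daily balances has no business meaning
# (100 + 150 is not a balance), so take the balance on the latest day.
closing_balance = conn.execute("""
    SELECT customer_key, balance
    FROM balance_fact
    WHERE date_key = (SELECT MAX(date_key) FROM balance_fact)
    ORDER BY customer_key
""").fetchall()
print(total_by_day, closing_balance)
```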

Factless Fact Table: In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”. Eg: A fact table which has only product key and date key is a factless fact. There are no measures in this table, but you can still get the number of products sold over a period of time.

Based on the above classifications, fact tables are categorized into two types:

Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.

Snapshot: This type of fact table describes the state of things at a particular instant in time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

Delete duplicate records from a table

1) Write a query to delete the repeated rows from the emp table

First find the repeated values:

select <col_name>, count(*) from table_name group by <col_name> having count(*) > 1;

Then keep the row with the lowest rowid for each ename and delete the rest:

SQL>Delete from emp where rowid not in (select min(rowid) from emp group by ename);

2) TO DISPLAY ROWS 5 TO 7 FROM A TABLE

SQL>select ename from emp where rowid in (select rowid from emp where rownum<=7 minus select rowid from emp where rownum<5);

3) DISPLAY TOP N ROWS FROM TABLE (HERE THE FIRST 9 ROWS BY ENAME, DESCENDING)

SQL>SELECT * FROM (SELECT * FROM EMP ORDER BY ENAME DESC) WHERE ROWNUM <10;

4) DISPLAY TOP 3 SALARIES FROM EMP

SQL>SELECT SAL FROM ( SELECT * FROM EMP ORDER BY SAL DESC ) WHERE ROWNUM <4;

5) DISPLAY THE 9th ROW FROM THE EMP TABLE

SQL>SELECT ENAME FROM EMP WHERE ROWID=(SELECT ROWID FROM EMP WHERE ROWNUM<=9 MINUS SELECT ROWID FROM EMP WHERE ROWNUM <9);

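Select the second maximum salary from emp:

select max(sal) from emp where sal < (select max(sal) from emp);

To retrieve the rows with the (n+1)th highest salary (replace n with a number), count the distinct salaries that are greater than or equal to the current one:

SELECT * FROM emp a WHERE ( n+1 ) = ( SELECT COUNT( DISTINCT b.sal ) FROM emp b WHERE b.sal >= a.sal );

The same idea with a strict comparison. Five distinct salaries are greater than the one returned, so this gives the rows with the 6th highest salary:

SELECT * FROM emp Emp1 WHERE (5) = ( SELECT COUNT(DISTINCT(Emp2.sal)) FROM emp Emp2 WHERE Emp2.sal > Emp1.sal );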
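The correlated-count technique can be tried out with Python's built-in sqlite3 module. This is only a sketch: the helper nth_highest_salary and the sample salaries are hypothetical, and it uses the "greater than or equal" form, where the count equals the rank of the salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ename TEXT, sal INTEGER);
    INSERT INTO emp VALUES
        ('KING', 5000), ('FORD', 3000), ('SCOTT', 3000),
        ('CLARK', 2000), ('SMITH', 1000);
""")

def nth_highest_salary(conn, n):
    """Return the nth highest distinct salary, or None if there is none."""
    row = conn.execute("""
        SELECT DISTINCT a.sal
        FROM emp a
        WHERE ? = (SELECT COUNT(DISTINCT b.sal)
                   FROM emp b
                   WHERE b.sal >= a.sal)
    """, (n,)).fetchone()
    return row[0] if row else None

second = nth_highest_salary(conn, 2)
third = nth_highest_salary(conn, 3)
print(second, third)
```

Because the count uses DISTINCT, the two employees who share a salary of 3000 occupy a single rank rather than two.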

To retrieve the nth row:

SELECT * FROM t1 a WHERE n = (SELECT COUNT(rowid) FROM t1 b WHERE a.rowid >= b.rowid);

select * from <table> where rownum < N+1 minus select * from <table> where rownum < N;

Self Join

SELECT e1.ename||' works for '||e2.ename "Employees and their Managers" FROM emp e1, emp e2 WHERE e1.mgr = e2.empno;
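The self join above can be reproduced on a small sample with Python's built-in sqlite3 module. The three employee rows are made up for illustration; the query is the same join of emp to itself, written with explicit JOIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, mgr INTEGER);
    INSERT INTO emp VALUES
        (7839, 'KING', NULL),   -- top of the hierarchy, no manager
        (7698, 'BLAKE', 7839),
        (7499, 'ALLEN', 7698);
""")

# e1 is the employee, e2 is the same table acting as the manager
rows = conn.execute("""
    SELECT e1.ename || ' works for ' || e2.ename
    FROM emp e1
    JOIN emp e2 ON e1.mgr = e2.empno
    ORDER BY e1.ename
""").fetchall()
works_for = [r[0] for r in rows]
print(works_for)
```

KING does not appear in the result, because an inner join drops rows whose mgr is NULL.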

Delete Duplicate records.

1. Using rowid

DELETE FROM tbl_test WHERE ROWID NOT IN (SELECT MIN (ROWID) FROM tbl_test GROUP BY ser_no, fst_nm, deptid, cmnt);

2. Using self-join

delete from tbl_test a where rowid<(select max(rowid) from tbl_test b where a.ser_no = b.ser_no);

DELETE FROM table_name A WHERE ROWID > (SELECT min(rowid) FROM table_name B WHERE A.key_values = B.key_values);

delete from tbl_test e1 where rowid not in (select max(rowid) from tbl_test e2 where e1.ser_no = e2.ser_no );

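3. Using group by

delete from emp where (empno,empname,salary) in (select max(empno),empname,salary from emp group by empname,salary having count(*) > 1);

The having clause restricts the delete to groups that actually contain duplicates; without it, the only row of every non-duplicated group would be deleted as well. Each run removes one row per duplicated group, so repeat it until no rows are deleted.

How to find duplicate records?

select id, count(*) from t1 group by id having count(*) > 1;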
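The find-then-delete approach can be sketched with Python's built-in sqlite3 module. Note the assumption: SQLite's rowid is an integer row identifier rather than Oracle's physical ROWID, but the "keep MIN(rowid) per group" idea carries over unchanged. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_test (ser_no INTEGER, fst_nm TEXT, deptid INTEGER, cmnt TEXT);
    INSERT INTO tbl_test VALUES
        (1, 'aaa', 10, 'x'),
        (1, 'aaa', 10, 'x'),
        (2, 'bbb', 20, 'y'),
        (1, 'aaa', 10, 'x');
""")

# Step 1: find the values that occur more than once
duplicates = conn.execute("""
    SELECT ser_no, COUNT(*)
    FROM tbl_test
    GROUP BY ser_no
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep the first row of every group and delete the others
conn.execute("""
    DELETE FROM tbl_test
    WHERE rowid NOT IN (SELECT MIN(rowid)
                        FROM tbl_test
                        GROUP BY ser_no, fst_nm, deptid, cmnt)
""")
remaining = conn.execute(
    "SELECT ser_no, fst_nm, deptid, cmnt FROM tbl_test ORDER BY ser_no"
).fetchall()
print(duplicates, remaining)
```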

Difference between UNION and UNION ALL clause – Oracle

UNION and UNION ALL are set operations used to combine the results of two or more queries. UNION eliminates duplicate rows, whereas UNION ALL returns all rows, including duplicates.
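A quick way to see the difference is the sketch below, written with Python's built-in sqlite3 module. The tables t1 and t2 and their values are hypothetical; the value 2 appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION removes the duplicate 2
union_ids = [r[0] for r in conn.execute(
    "SELECT id FROM t1 UNION SELECT id FROM t2 ORDER BY id")]

# UNION ALL keeps every row from both queries
union_all_ids = [r[0] for r in conn.execute(
    "SELECT id FROM t1 UNION ALL SELECT id FROM t2 ORDER BY id")]

print(union_ids, union_all_ids)
```

Since UNION has to detect duplicates, UNION ALL is generally the cheaper choice when duplicates are impossible or acceptable.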

3. What is the difference between DELETE and TRUNCATE ?

a) DELETE is a DML command and TRUNCATE is a DDL command.


b) TRUNCATE resets the storage blocks allocated to the table after execution and is much faster than DELETE in most circumstances.

What is the difference between PRIMARY KEY and UNIQUE KEY constraints?

1. UNIQUE KEY columns can have null values, but PRIMARY KEY columns cannot accept null values.

2. A table can have only one PRIMARY KEY constraint (which may span several columns), but many UNIQUE KEY constraints are allowed.

1. What is a CORRELATED SUBQUERY?

A correlated subquery is one that has a correlation name as the table or view designator in the FROM clause of the outer query, and uses the same correlation name as a qualifier in a search condition in the WHERE clause of the subquery. For example:

SELECT field1 FROM table1 X WHERE field2 > (SELECT AVG(field2) FROM table1 Y WHERE field1 = X.field1);
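The pattern in the example can be exercised with Python's built-in sqlite3 module. In this sketch the hypothetical columns deptno and sal stand in for field1 and field2, so the query returns employees paid more than the average of their own department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER);
    INSERT INTO emp VALUES
        ('ADAMS', 10, 1000), ('BLAKE', 10, 3000),
        ('CLARK', 20, 2000), ('DAVIS', 20, 2000);
""")

# The subquery refers to x.deptno from the outer query, so it is
# re-evaluated for each outer row: that is what makes it correlated.
above_dept_average = [r[0] for r in conn.execute("""
    SELECT x.ename
    FROM emp x
    WHERE x.sal > (SELECT AVG(y.sal)
                   FROM emp y
                   WHERE y.deptno = x.deptno)
    ORDER BY x.ename
""")]
print(above_dept_average)
```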

What is difference between UNIQUE and PRIMARY KEY constraints

A table can have only one PRIMARY KEY, whereas there can be any number of UNIQUE keys. The columns that compose a PRIMARY KEY are automatically defined as NOT NULL, whereas a column that is part of a UNIQUE key is not automatically mandatory; NOT NULL must be specified for it explicitly.


Why use a materialized view instead of a table?

A materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary based on aggregations of a table's data. Materialized views, which store data based on remote tables, are also known as snapshots. A snapshot can be redefined as a materialized view.

Materialized views are mainly used to improve query performance, since they contain the stored results of a query. For reporting, querying a materialized view can be faster than running the underlying query against the base tables.

What is the difference between a view and a synonym?

A synonym is simply an alternative name for a table or other database object, and can be used to refer to objects reached through database links. A view can be created over many tables, with virtual columns and with conditions. A synonym can also be created on a view.

When do you use the WHERE clause and when do you use the HAVING clause?

The WHERE condition lets you restrict the rows selected to those that satisfy one or more conditions. Use the HAVING clause to restrict the groups of returned rows to those groups for which the specified condition is TRUE.
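The difference between WHERE and HAVING shows up when both appear in one query. The sketch below uses Python's built-in sqlite3 module with a hypothetical emp table: WHERE filters individual rows before grouping, and HAVING filters the resulting groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (deptno INTEGER, sal INTEGER);
    INSERT INTO emp VALUES
        (10, 1000), (10, 3000),
        (20, 500),  (20, 600),
        (30, 4000);
""")

# WHERE drops the row (20, 500) before grouping.
# HAVING then keeps only departments that still have at least two rows.
dept_totals = conn.execute("""
    SELECT deptno, SUM(sal)
    FROM emp
    WHERE sal > 550
    GROUP BY deptno
    HAVING COUNT(*) >= 2
    ORDER BY deptno
""").fetchall()
print(dept_totals)
```

Department 20 is excluded by HAVING only because WHERE had already removed one of its two rows, which shows the order in which the two clauses apply.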

