DWH Concepts FAQs1

Embed Size (px)

Citation preview

  • 7/30/2019 DWH Concepts FAQs1

    1/20

    Data warehousing concepts (FAQs)

    1) What is Data warehouse?

    Data warehouse is relational database used for query analysis and

    reporting. By definition data warehouse is Subject-oriented, Integrated,

    Non-volatile, Time variant.

    Subject oriented : Data warehouse is maintained particular subject.

    Integrated : Data collected from multiple sources integrated into a

    user readable unique format.

    Non volatile : Maintain Historical date.

    Time variant : data display the weekly, monthly, yearly.

    2) What is Data mart?

    A subset of data warehouse is called Data mart.

    3) Difference between Data warehouse and Data mart?

    Data warehouse is maintaining the total organization of data. Multiple

    data marts used in data warehouse. where as data mart is maintained only

    particular subject.

    4) Difference between OLTP and OLAP?

    OLTP is Online Transaction Processing. This is maintained current

    transactional data. That means insert, update and delete must be fast.

    5) Explain ODS?

    Operational data store is a part of data warehouse. This is maintained only

    current transactional data. ODS is subject oriented, integrated, volatile,

    current data.

    1

  • 7/30/2019 DWH Concepts FAQs1

    2/20

    6) Difference between Power Center and Power Mart?

    Power center receive all product functionality including ability to multiple

    register servers and metadata across the repository and partition data.

    One repository multiple informatica servers. Power mart received allfeatures except multiple register servers and partition data.

    7) What is a staging area?

    Staging area is a temporary storage area used for transaction, integrated

    and rather than transaction processing.

    When ever your data put in data warehouse you need to clean and

    process your data.

    8) Explain Additive, Semi-additive, Non-additive facts?

    Additive fact: Additive Fact can be aggregated by simple

    arithmetical additions.

    Semi-Additive fact: semi additive fact can be aggregated simple

    arithmetical

    additions along with some other dimensions.

    Non-additive fact: Non-additive fact cant be added at all.

    9) What is a Fact less Fact and example?

    Fact table which has no measures.

    10) Explain Surrogate Key?

    Surrogate Key is a series of sequential numbers assigned to be a primary

    key for the table.

    11) How many types of approaches in DHW?

    Two approaches: Top-down(Inmol approach), Bottom-up(Ralph

    Kimball)

    2

  • 7/30/2019 DWH Concepts FAQs1

    3/20

    12) Explain Star Schema?

    Star Schema consists of one or more fact table and one or more

    dimension tables that are related to foreign keys.

    Dimension tables are De-normalized, Fact table-normalized

    Advantages: Less database space & Simplify queries.

    13) Explain Snowflake schema?

    Snow flake schema is a normalize dimensions to eliminate the

    redundancy.The dimension data has been grouped into one large table.

    Both dimension and fact tables normalized.

    14) What is confirm dimension?

    If both data marts use same type of dimension that is called confirm

    dimension.If you have same type of dimension can be used in multiple

    fact that is called confirm dimension.

    15) Explain the DWH architecture?

    16) What is a slowly growing dimension?

    Slowly growing dimensions are dimensional data,there dimensions

    increasing dimension data with out update existing dimensions.That

    means appending new data to existing dimensions.

    17) What is a slowly changing dimension?

    Slowly changing dimension are dimension data,these dimensions

    increasing dimensions data with update existing dimensions.

    Type1: Rows containing changes to existing dimensional are update in the

    target by overwriting the existing dimension.In the Type1 Dimension

    mapping, all rows contain current dimension data.

    Use the type1 dimension mapping to update a slowly changing

    dimension table when you do not need to keep any previous versions of

    dimensions in the table.

    3

  • 7/30/2019 DWH Concepts FAQs1

    4/20

    Type2: The Type2 Dimension data mapping inserts both new and changed

    dimensions into the target.Changes are tracked in the target table by

    versioning the primary key and creating a version number for each

    dimension in the table.

    Use the Type2 Dimension/version data mapping to update a slowly

    changing dimension when you want to keep a full history of dimension

    data in the table.version numbers and versioned primary keys track the

    order of changes to each dimension.

    Type3: The type 3 dimension mapping filters source rows based on user-

    defined comparisions and inserts only those found to be new dimensions

    to the target.Rows containing changes to existing dimensions are updated

    in the target. When updating an existing dimension the informatica server

    saves existing data in different columns of the same row and replaces theexisting data with the updates.

    18) When you use for dynamic cache?

    Your target table is also look up table then you go for dynamic cache .In

    dynamic cache multiple matches return an error.use only = operator.

    19) what is lookup override?

    Override the default SQL statement.You can join multiple sources uselookup override.By default informatica server add the order by clause.

    20) Can we pass the null value in lookup transformation?

    Lookup transformation returns the null value or equal to null value.

    21) what is the target load order?

    You specify the target load order based on source qualifiers in a

    mapping.if u have the multiple source qualifiers connected to the multipletargets you can designate the order in which informatica server loads data

    into the targets.

    22) what is default join that source qualifier provides?

    Inner equi join.

    4

  • 7/30/2019 DWH Concepts FAQs1

    5/20

    23) what are the difference between joiner transformation and source

    qualifier transformation?

    You can join heterogeneous data sources in joiner transformation, which

    we cannot achive in source qualifier transformation.

    You need matching keys to join two relational sources in source qualifier

    transformation.where you doesnt need matching keys to join two sources.

    Two relational sources should come from same data source in source

    qualifier.You can join relational sources, which are coming from different

    sources in source qualifier.You can join relational sources which are

    coming from different sources also.

    24) what is update strategy transformation?

    Whenever you create the target table whether you are store the historical

    data or current transaction data in to target table.

    25) Describe two levels in which update strategy transformation sets?

    26) what is default source option for update strategy transformation?

    Data driven.

    27) What is data driven?

    The information server follows instructions coded into update strategy

    transformations with in the session mapping determine how to flag

    records for insert,update,delete or reject if u do not choose data driven

    option setting , the informatica server ignores all update strategy

    transformations in the mapping.

    28) what are the options in the trarget session of update strategy

    transformation?

    Insert

    Delete

    Update

    Update as update

    5

  • 7/30/2019 DWH Concepts FAQs1

    6/20

    Update as insert

    Update else insert

    Truncate table.

    29) Difference between the source filter and filter?

    Source filter is filtering the data only relational sources. Where as filter

    transformation filter the data any type of source.

    30) what is a tracing level?

    Amount of information sent to log file.

    -- What are the types of tracing levels?

    Normal,Terse,verbose data,verbose intitialization.

    --Expalin sequence generator transformation?

    --can you connect multiple ports from one group to multiple

    transformations?

    Yes

    31) can you connect more than one group to the same target or

    transformation?

    NO

    32) what is a reusable transformation?

    Reusable transformation can be a single transformation.This

    transformation can be used in multiple mappings.when you need to

    incorporate this transformation into mapping you add an instance of it tomapping.Later if you change the definition of the transformation, all

    instances of it inherit the changes.Since the instance of reusable

    transformation is a pointer to that transformation.U can change the

    transformation in the transformation developer, its instance automatically

    reflect these changes. This feature can save U great deal of work.

    6

  • 7/30/2019 DWH Concepts FAQs1

    7/20

    -- what are the methods for creating reusable transformation?

    Two methods

    1) Design it in the transformation developer.

    2) Promote a standard transformation from the mappingdesigner.After you add a transformation to the mapping, you can promote

    it to status of reusable transformation.

    Once you promote a standard transformation to reusable status, you can

    demote it to a standard transformation at any time.

    If u change the properties of a reusable transformation in mapping , you

    can revert it to the original reusable transformation properties by clicking

    the revert.

    33) what are mapping parameters and mapping variables?

    Mapping parameter represents a constant value that you can define before

    running a session.A mapping parameter retains the same value throughout

    the entire session.

    When you use the mapping parameter , you declare and use the parameter

    in a mapping or mapplet.Then define the value of parameter in a

    parameter file for the session.

    Unlike a mapping parameter, a mapping variable represents a value that

    can change through out the session. The informatica server save the value

    of mapping variable to the repository at the end of session run and uses

    that value next time you run the session.

    34) can you use the mapping parameters or variables created in one

    mapping into another mapping?

    NO, we can use mapping parameters or variables in any transformation ofthe same mapping or mapplet in which have crated mapping parameters

    or variables.

    7

  • 7/30/2019 DWH Concepts FAQs1

    8/20

    35) Can you are the mapping parameters or variables created in one

    mapping into any other result transformation?

    Yes because the reusable transformation is not contained with any

    mapplet or mapping.

    36) How the informatica server sorts the string values in rank

    transformation?

    When the informatica server runs in the ASCII data movement mode it

    sorts session data using binary sort order.If you configures the session to

    use a binary sort order, the informatica server calculates the binary value

    of each string and returns the specified number of rows with the highest

    binary values for the string.

    37) What is the rank index in rank transformation?

    The designer automatically creates a RANKINDEX port for each Rank

    transformation. The informatica server uses the Rank Index port to store

    the ranking position for each record in a group.For example, if you create

    a Rank transformation that ranks the top 5 sales persons for each quarter,

    the rank index number the salespeople from 1 to 5.

    38) what is the mapplet?

    Mapplet is a set of transformation that you build in the mapplet designer

    and you can use in multiple mappings.

    39) Difference between mapplet and reusable transformation?

    Reusable transformation can be a single transformation.Where as mapplet

    use multiple transformations.

    40) what is a parameter a file?

    Paramater file defines the values for parameter and variables.

    8

  • 7/30/2019 DWH Concepts FAQs1

    9/20

    WORKFLOW MANAGER

    41) what is a server?

    The power center server moves data from source to targets based on a

    workflow and mapping metadata stored in a repository.

    42) what is a work flow?

    A workflow is a set of instructions that describe how and when to run

    tasks related to extracting,transformation and loading data.

    -- what is session?

    A session is a set of instructions that describes how to move data from

    source to target using a mapping.

    -- what is workflow monitor?

    Use the work flow monitor work flows and stop the power center server.

    43) explain a work flow process?

    The power center server uses both process memory and system shared

    memory to perform these tasks.

    Load manager process: stores and locks the workflow tasks and start the

    DTM run the sessions.

    Data Transformation Process DTM: Perform session validations,create

    threads to initialize the session,read,write and transform data, and handle

    pre and post session operations.

    The default memory allocation is 12,000,000 bytes.

    44) What are types of threads in DTM?

    The main dtm thread is called the master thread.

    Mapping thread.

    Transformation thread.

    9

  • 7/30/2019 DWH Concepts FAQs1

    10/20

    Reader thread.

    Writer thread.

    Pre-and-post session thread.

    45) Explain work flow manager tools?

    Task developer.

    Work flow designer.

    Worklet designer.

    46) Explain work flow schedule.

    You can sehedule a work flow to run continuously, repeat at given time or

    interval or you manually start a work flow.By default the workflow runs

    on demand.

    47) Explain stopping or aborting a session task?

    If the power center is executing a session task when you issue the stop the

    command the power center stop reading data. If continuous processing

    and writing data and committing data to targets.

    If the power center cant finish processing and committing data you issue

    the abort command.

    You can also abort a session by using the Abort() function in the mapping

    logic.

    48) What is a worklet?

    A worklet is an object that represents a set of tasks.It can contain any task

    available in the work flow manager. You can run worklets inside a

    workflow. You can also nest a worklet in another worklet.The worklet

    manager does not provide a parameter file for worklets.

    The power center server writes information about worklet execution in the

    workflow log.

    10

  • 7/30/2019 DWH Concepts FAQs1

    11/20

    49) what is a commit interval and explain the types?

    A commit interval is the interval at which power center server commits

    data to targets during a session. The commit interval the number of rows

    you want to use as a basis for the commit point.

    Target Based commit: The power center server commits data based on the

    number of target rows and the key constraints on the target table. The

    commit point also depends on the buffer block size and the commit

    interval.

    Source-based commit:---------------------------------------------

    User-defined commit:----------------------------------------------

    50) Explain bulk loading?

    You can use bulk loading to improve performance of a session that inserts

    a large amount of data to a db2,sysbase,oracle or MS SQL server

    database.

    When bulk loading the power center server by passes the database

    log,which speeds performance.

    With out writing to the database log, however the target database cant

    perform rollback.As a result you may not be perform recovery.

    51) What is a constraint based loading?

    When you select this option the power center server orders the target load

    on a row-by-row basis only.

    Edit tasks->properties->select treat source rows as insert.

    Edit tasks->config object tab->select constraint based

    If session is configured constraint absed loading when target table receive

    rows from different sources.The power center server revert the normal

    loading for those tables but loads all other targets in the session using

    constraint based loading when possible loading the primary key table first

    then the foreign key table.

    11

  • 7/30/2019 DWH Concepts FAQs1

    12/20

    Use the constraint based loading only when the session option treat rows

    as set to insert.

    Constraint based load ordering functionality which allows developers to

    read the source once and populate parent and child tables in a single

    process.

    52) Explain incremental aggregation?

    When using incremental aggregation you apply captured changes in the

    source to aggregate calculations in a session.If the source changes only

    incrementally and you can capture changes you can configure the session

    to process only those changes. This allows the power center server to

    update your target incrementally rather than forcing it to process the entire

    source and recalculate the same data each time you run the session.

    You can capture new source data.use incremental aggregation when you

    can capture new source data much time you run the session.Use a stored

    procedure on filter transformation only new data.

    Incremental changes do not significantly change the target.Use

    incremental aggregation when the changes do not significantly change the

    target.If processing the incrementally changed source alters more than

    half the existing target, the session may not benefit from using

    incremental aggregation. In this case drop the table and recreate the targetwith complete source data.

    53) Processing of incremental aggregation

    The first time u run an incremental aggregation session the power center

    server process the entire source.At the end of the session the power center

    server stores aggregate data from the session runs in two files, the index

    file and the data file .The power center server creates the files in a local

    directory.

    12

  • 7/30/2019 DWH Concepts FAQs1

    13/20

    Transformations.

    --- what is transformation?

    Transformation is repository object that generates modifies or passes data.

    54) what are the type of transformations?

    2 types:

    1) active

    2) passive.

    -- explain active and passive transformation?

    Active transformation can change the number of rows that pass

    through it.No of output rows less than or equal to no of input rows.

    Passive transformation does not change the number of rows.Always

    no of output rows equal to no of input rows.

    55) Difference b/n filter and router transformation.

    Filter transformation to filter the data only one condition and drop the

    rows dont meet the condition.

    Drop rows does not store any ware like session log file..

    Router transformation to filter the data based on multiple conditions and

    give you the option to route rows that dont match to a default group.

    56) what r the types of groups in router transformation?

    Router transformation 2 groups 1. Input group 2. output groups.

    Output groups in 2 types. 1. user defined group 2. default group.

    57) difference between expression and aggregator transformation?

    Expression transformation calculate the single row values before writes

    the target.Expression transformation executed by row-by-row basis only.

    13

  • 7/30/2019 DWH Concepts FAQs1

    14/20

    Aggregator transformation allows you to perform aggregate calculations

    like max, min,avg

    Aggregate transformation perform calculation on groups.

    58) How can u improve the session performance in aggregatetransformation?

    Use stored input.

    59) what is aggregate cache in aggregate transformation?

    The aggregate stores data in the aggregate cache until it completes

    aggregate calculations.When u run a session that uses an aggregate

    transformation , the informatica server creates index and data caches in

    memory is process the transformation. If the informatica server requires

    more space it stores overview values in cache files.

    60) explain joiner transformation?

    Joiner transformation joins two related heterogeneous sources residing in

    different locations or files.

    --What are the types of joins in joiner in the joiner transformation?

    Normal

    Master outer

    Detail outer

    Full outer

    61) Difference between connected and unconnected transformations?

    Connected transformation is connected to another transformation with in amapping.

    Unconnected transformation is not connected to any transformation with

    in a mapping.

    14

  • 7/30/2019 DWH Concepts FAQs1

    15/20

    62) In which conditions we cannot use joiner transformation(limitations

    of joiner transformation)?

    Both pipelines begin with the same original data source.

    Both input pipelines originate from the same source qualifiertransformation.

    Both input pipelines originate from the same normalizer

    transformation

    Both input pipelines originate from the same joiner transformation.

    Either input pipelines contains an update strategy transformation

    Either input pipelines contains sequence generator transformation.

    63) what are the settings that u use to configure the joiner

    transformation?

    Master and detail source.

    Type of join

    Condition of the join

    64) what is look up transformation

    look up transformation can be used in a table view based on condition by

    default lookup is left outer join

    65) why use the lookup transformation?

    To perform the following tasks.

    Get a related value.For example if your table includes employee ID,but

    you want to include such as gross sales per invoice or sales tax but not the

    calculated value(such as net sales)

    Update slowly changing dimension tables. You can use a lookup

    transformation to determine whether records already exist in the target.

    66) what are the types of lookup?

    Connected and unconnected

    15

  • 7/30/2019 DWH Concepts FAQs1

    16/20

    67) difference between connected and unconnected lookup?

    Connected lookup Unconnected lookup

    Receives input values

    directly from the pipe

    line.

    Receives input values from the result of a clkp

    expression in a another transformation.

    U can use a dynamic

    or static

    Cache

    U can use a static cache

    Cache includes all

    lokkup columns used

    in the mapping(that is

    lookup table columns

    included in the lookupcondition and lookup

    table columns linked

    as output ports to

    other transformations)

    Cache includes all lookup/output ports in the

    lookup condition and the lookup/return port.

    Can return multiple

    columns from the

    same row or insert

    into the dynamic

    lookup cache.

    Designate one return port(R).Returns one

    column from each row.

    If there is no match

    for the lookup

    condition, the

    informatica server

    returns the default

    value for all output

    ports.If u configure

    dynamic caching the

    informatica server

    inserts rows into thecache.

    If there is no matching for the lookup

    condition the informatica server returns NULL

    Pass multiple output

    values to another

    transformatnion.Link

    lookup/output ports to

    another

    Pass one output value to another

    transformation.The lookup/output/return port

    passes the same value to the

    ---------------------------------------------------------

    16

  • 7/30/2019 DWH Concepts FAQs1

    17/20

    transformation

    Supports user-defined

    default values.

    Does not support user-defined default values.

    68) explain index cache and data cache?

    The informatica server stores conditions values in the index cache and

    output values in the data cache.

    69) What are the types of lookup cache?

    Persistent cache: U can save the look up cache files and reuse them

    the next time the informatica server processes a lookup

    transformation to use the cache.

    Static cache: U can configure a static or read-only lookup table.By

    default informatica server creates a static cache.It caches the lookup

    table and lookup values in the cache for each row that comes into

    the transformation.When the lookup condition is true the

    inforamtica server does not update the cache while it processes the

    lookup transformation.

    Dynamic cache: If you want to cache the target table and insert new

    rows into cache and the target you can create a look uptransformation to use dynamic cache.The informatica server

    dynamically inserts data into the target table.

    Shared cache: You can share the lookup cache between multiple

    transformations.You can share unnamed cache between

    transformation in the same mapping.

    70) Difference between static cache and dynamic cache?

    17

  • 7/30/2019 DWH Concepts FAQs1

    18/20

    Static cache Dynamic cache

    You cannot insert or

    update the cache

    You can insert rows into the

    cache as you pass rows to the

    target

    The informatica serverreturns a value from

    the lookup table or

    cache when the

    condition is true,.When

    the condition is true the

    informatica server

    returns the default

    value for connected

    transformation

    The informatica server insertsrows into the cache when the

    condition is false.This indicates

    that the row in the cache or

    target table.You can pass these

    rows to the target table.

    ORACLE:

    71) Difference between primary key and unique key?

    Primary key is Not null unique

    Unique accept the null values.

    72) Difference between inserting and sub string?

    73) What is referential integrity?74) Difference between view and materialized view?

    75) What is Redolog file?

    The set of redo log files for a database is collectively know as the

    databases redo log.

    76) What is RollBack statement?

    A database contains one or more rollback segments to temporarily store

    undo information.Roll back segment are used to generate read consistantdata base information during database recovery to rooback uncommitted

    transactions for users.

    -- what is table space?

    18

  • 7/30/2019 DWH Concepts FAQs1

    19/20

    A data base is divided into logical storage unit called table space.A table

    space is used to grouped related logical structures together.

    -- How to delete the duplicate records.

    -- What are the difference types of joins in Oracle?

    Self-join,equi-join,outer join.

    77) What is outer join?

    One of which rows that dont match those in the commen column of

    another table.

    78) write query Max 5 salaries?

    Select * from emp e where 5>(select count(*) from emp where sal>e.sal)

    79) what is synonym?

    80) --------------------------------

    81)82) What is bit map index and example?

    83) What is stored procedure and advantages?

    84) Explain cursor and how many types of triggers in oracle?

    Trigger is stored procedure.Trigger is automatically executed.

    85) Difference between function and stored procedure?

    Function returns a value.Procedure does not return a value(but returns a

    value tru IN OUT parameters!!!!!!)

    86) Difference between replace and translate?

    87) Write the query nth max sal

    19

  • 7/30/2019 DWH Concepts FAQs1

    20/20

    Select distinct (a.sal) from emp a where &n=select count(distinct(b.sal)

    from emp b where a.sal