8/9/2019 Loading Complex Excel Spreadsheets Into SAP NetWeaver BI
http://slidepdf.com/reader/full/loading-complex-excel-spreadsheets-into-sap-netweaver-bi 1/18
Loading complex Excel spreadsheets into SAP NetWeaver BI
Published by Vassil Kovatchev
at July 9, 2007 in business intelligence.
Summary
This article should help you get a quick start on using DataDefractor Solo to load complex
spreadsheets into SAP NetWeaver 2004s BI (formerly known as SAP BW). It illustrates the
process of converting a financial Excel spreadsheet into a complete structured data model and
then loading it into an SAP BI InfoCube.
Applies to:
• DataDefractor Solo 1.1.
• SAP NetWeaver 2004s BI SP12.
Introduction
It might be useful in certain customer scenarios to extract data from complex business
spreadsheet reports and load it into SAP BI InfoCubes for special data analysis.
This sample is based on a financial report provided by Freddie Mac, which is freely available
at Freddie Mac’s website. It is also included in the DataDefractor Solo Samples folder, so every
DataDefractor and SAP NetWeaver 2004s BI (SAP BI) customer should be able to set up the same
environment that was used here.
For the purposes of this example, SAP BI and DataDefractor Solo were deployed to a single
Windows Server 2003 machine. The connection point between DataDefractor Solo and SAP BI
is a set of flat files. Therefore, it is also possible to extract Excel data with DataDefractor on a
Windows machine and then load it into SAP BI deployed to a UNIX machine.
Source spreadsheet
The source spreadsheet is a financial report provided by Freddie Mac. It contains average U.S.
national mortgage rates for 17 years, broken down by month.
The data model captured in the spreadsheet contains numeric mortgage rate facts recorded
in the context of a time period. Let’s look at the contents and the layout of the spreadsheet.
Data Facts
The facts are concentrated in several groups of numbers - one group per year. There are two
columns of numbers for each month of each year - one column for the mortgage rates and
one for the discount points that guarantee the rate.
Time Period
The time period of the recorded facts consists of a year and a month of the year. The years
of the facts are located in the header rows above each group of facts, while the months are
located in the header column to the left of each group of years.
Product
And finally, the product that is being sold in the particular time period for the particular rate
and discount points is located in the global header of the report - “CONVENTIONAL
CONFORMING 15-YEAR FIXED-RATE MORTGAGE”.
Defining the data model
In this section we will use DataDefractor Solo Wizard to map out the data model captured in
the mortgage rate report. We will draw the layout of the spreadsheet, define the location of
the facts and define the dimensionality of the model. Once we’re done, we will use this
mapping to convert the spreadsheet to a set of CSV files organized as a standard Kimball-style
star schema ready to be loaded into SAP BI.
Step 1: Choosing a data source
The first step is to launch the DataDefractor Solo Wizard. You can start it either by
double-clicking the DataDefractor Solo Wizard icon on your desktop or by navigating the
following path on your “Start” menu:
Start\All Programs\Interactive Edge\DataDefractor Solo\DataDefractor
Solo Wizard
The first page of the wizard asks you to specify the location of the data source.
Follow these steps:
1. Select Excel radio button - this will launch an “Open File” dialog box.
2. Navigate to the following Excel file:
\Program Files\Interactive Edge\DataDefractor
Solo\Samples\Freddie Mac\Freddie Mac.xls
and select “Open”.
3. Select Use all check box. This will instruct DataDefractor to use all the worksheets
available in the workbook.
4. Click Next to move to the next page of the wizard.
Step 2: Mapping out the data layout
In this step we will define the data layout of the spreadsheet. We will start by isolating the
header of the worksheet, then we will move on to slice the worksheet into vertical sub-pages
and then slice those pages further into a collection of low-level horizontal sub-pages, which
contain clear fact data surrounded but unbroken by context information.
Follow these steps:
1. Select Header/Footer radio button - this will isolate the header above the data area.
A “Drill Down” reminder tool-tip balloon may pop-up - close it.
2. Select the drill-down button to drill down into the data area. We can now slice this
data area into vertical sub-pages.
3. Select Vertical Sub-Pages. This will add a pattern row above the data area:
We will use this pattern row to define a border-recognition pattern, which will split the
data area into separate vertical pages.
4. The spreadsheet is logically partitioned into several vertically stacked sub-pages. Each
page contains a group of seven years. Looking at the report, we can deduce that every
page begins at a row that contains a year in column B. Click on the pattern entry
below column B, type “^\d{4}” (without the quotes) and press Enter.
5. Select the drill-down button to drill down again, this time into the vertical sub-
pages. We will slice them further into horizontal sub-pages - one horizontal sub-page
per year.
6. Select Horizontal Sub-Pages. This will add a pattern column to the left of the data
area. We will use this pattern column to define a border-recognition pattern, which will
split the data area into separate horizontally positioned pages.
7. We see that each horizontal page begins with a year in row 1. Click on the pattern
entry to the left of row 1, type “^\d{4}” (without the quotes) and press Enter.
8. Each horizontal sub-page begins with the year, but if we drill down, we will lose the
month column. We need to carry it over into the new horizontal pages so that we can
build a correct data model, which will consist of atomic, low-level virtual spreadsheets
that contain facts and are surrounded by complete context information.
Select property CarryoverColumns in the property grid below the layout radio
buttons, type in 1 and press Enter.
9. Select the drill-down button again to drill down into the newly defined horizontal
sub-pages.
10. The new horizontal pages contain the lowest level of facts, the year and the month.
We will move on to eliminating the extra blank columns to the right of the facts.
We need to eliminate all the columns to the right of the last column that has a
non-empty cell in row 2:
* Select Horizontal Sub-Pages
* Set MatchSingleSubPage property to True
* Set PatternDirection property to RightToLeft
* In the pattern for row 2 type in the following regular expression (without the quotes): “.+”
(This regular expression matches any cell that contains at least one character.)
11. Drill down into the new pages.
12. The only thing left to do is eliminate the footer of the virtual spreadsheets. This footer
includes pre-calculated Annual Averages, which we don’t need in the data
warehouse. Follow these steps to eliminate all the rows past the last row which
contains the text “December” in column A:
* Select Vertical Sub-Pages
* Set MatchSingleSubPage property to True
* Set PatternDirection property to BottomToTop
* In the pattern for column A type in “December” (without the quotes)
13. We are done with defining the spreadsheet layout. Drill down into the new pages and
review them by clicking on the tabs below the grid.
You should see a set of virtual spreadsheets with a uniform layout featuring twelve
months in column A, mortgage rate facts in column B, discount points in
column C and a year in cell B1.
14. We dissected the spreadsheet into smaller pieces and continued drilling into them until
we reached a repetitive, uniform data layout, which contains a core of fact data
surrounded by context information. Now we are ready to move on and specify the
precise location of that fact area and its context as it appears on all these virtual
spreadsheets.
Click Next to move to the next page of the wizard.
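To make the effect of these layout rules concrete, the Python sketch below replays them on a toy grid. It is an approximation under stated assumptions, not DataDefractor's implementation: a worksheet is modeled as a list of rows of strings, each pattern is treated as a Python regular expression matched at the start of a cell, and the helper names (`split_vertical`, `split_horizontal`, `trim_right`, `trim_bottom`) and the grid values are hypothetical placeholders, not Freddie Mac data.

```python
import re
from typing import List

Grid = List[List[str]]  # hypothetical model: a worksheet as rows of cell strings


def split_vertical(grid: Grid, col: int, pattern: str) -> List[Grid]:
    """Open a new sub-page at every row whose cell in `col` matches `pattern`."""
    pages: List[Grid] = []
    for row in grid:
        if re.match(pattern, row[col]):
            pages.append([])  # a border row starts a new page
        if pages:  # rows above the first border are ignored
            pages[-1].append(row)
    return pages


def split_horizontal(grid: Grid, row: int, pattern: str, carryover: int = 0) -> List[Grid]:
    """Open a new sub-page at every column whose cell in `row` matches `pattern`,
    copying the first `carryover` columns into each page (cf. CarryoverColumns)."""
    starts = [c for c, cell in enumerate(grid[row]) if re.match(pattern, cell)]
    bounds = starts + [len(grid[row])]
    return [[r[:carryover] + r[s:e] for r in grid] for s, e in zip(bounds, bounds[1:])]


def trim_right(grid: Grid, row: int, pattern: str) -> Grid:
    """Single match, right to left: drop the columns after the last matching cell in `row`."""
    last = max(c for c, cell in enumerate(grid[row]) if re.match(pattern, cell))
    return [r[: last + 1] for r in grid]


def trim_bottom(grid: Grid, col: int, pattern: str) -> Grid:
    """Single match, bottom to top: drop the rows after the last matching cell in `col`."""
    last = max(i for i, r in enumerate(grid) if re.match(pattern, r[col]))
    return grid[: last + 1]


# Toy data area: two vertical pages, two years each (placeholder values).
data_area = [
    ["", "1991", "", "", "1992", "", ""],
    ["", "Rate", "Pts", "", "Rate", "Pts", ""],
    ["", "", "", "", "", "", ""],
    ["January", "1.10", "0.10", "", "2.10", "0.20", ""],
    ["December", "1.20", "0.11", "", "2.20", "0.21", ""],
    ["Annual Average", "1.15", "0.10", "", "2.15", "0.20", ""],
    ["", "1993", "", "", "1994", "", ""],
    ["", "Rate", "Pts", "", "Rate", "Pts", ""],
    ["", "", "", "", "", "", ""],
    ["January", "3.10", "0.30", "", "4.10", "0.40", ""],
    ["December", "3.20", "0.31", "", "4.20", "0.41", ""],
    ["Annual Average", "3.15", "0.30", "", "4.15", "0.40", ""],
]

virtual_sheets: List[Grid] = []
for vpage in split_vertical(data_area, col=1, pattern=r"^\d{4}"):  # pattern below column B
    for hpage in split_horizontal(vpage, row=0, pattern=r"^\d{4}", carryover=1):  # row 1
        sheet = trim_right(hpage, row=1, pattern=r".+")  # row 2
        sheet = trim_bottom(sheet, col=0, pattern=r"December")  # column A
        virtual_sheets.append(sheet)

for sheet in virtual_sheets:
    print(sheet[0][1], [r[0] for r in sheet[3:]])
```

Each resulting sheet has the year in cell B1 (index `[0][1]`), measure names in row 2 and months in column A, with the trailing blank column and the Annual Average row removed.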
Step 3: Specifying the fact area
The fact area is indicated by picking the first row that contains facts and specifying the
collection of columns that contain fact data.
Follow these steps:
1. Click somewhere on row 4 in the grid to select it as the first row that contains facts.
2. Click Next to move to the next page of the wizard.
3. On page Fact Source Columns, select columns B and C by clicking on the header of
column C and dragging the mouse pointer over to the header of column B.
4. Click button “Add Selected Columns” to add the selected column range to the list
of column ranges that contain facts.
5. Select Flexible Range when prompted to select the type of column range to add to
the list. This instructs DataDefractor to consider all the columns that appear to the
right of column A as columns containing facts. This will accommodate spreadsheets
that may contain more than two measures.
Click OK to confirm the column range type.
6. Click Next to move to the next page of the wizard.
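The Flexible Range choice can be pictured as slicing every fact row from column B to the right edge, wherever that edge happens to be. Below is a minimal Python sketch, assuming the uniform virtual-sheet layout produced in Step 2 (measure names in row 2, facts from row 4 down, with row 3 assumed blank). The `fact_area` helper and the sample values, including the third `Extra` measure, are hypothetical.

```python
from typing import List, Tuple

Sheet = List[List[str]]


def fact_area(sheet: Sheet, first_fact_row: int, measure_row: int = 1,
              first_fact_col: int = 1) -> Tuple[List[str], List[List[float]]]:
    """Return (measure names, fact values) using a flexible column range:
    every column from `first_fact_col` to the right edge is treated as a measure.
    Indexes are 0-based, so wizard row 4 is index 3 and column B is index 1."""
    names = sheet[measure_row][first_fact_col:]
    values = [[float(cell) for cell in row[first_fact_col:]]
              for row in sheet[first_fact_row:]]
    return names, values


two_measures = [
    ["", "1991", ""],
    ["", "Rate", "Pts"],
    ["", "", ""],
    ["January", "1.10", "0.10"],
    ["February", "1.20", "0.11"],
]
# The same sheet with a hypothetical third measure column appended.
three_measures = [row + [extra] for row, extra in zip(two_measures, ["", "Extra", "", "5", "6"])]

print(fact_area(two_measures, first_fact_row=3))
print(fact_area(three_measures, first_fact_row=3))
```

A fixed B:C range would silently drop the `Extra` column; the flexible range picks it up without any change to the rule.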
Step 4: Defining the dimensional model
In this step we will define the dimensional model in terms of dimensions and attributes. We
will also specify the location of each dimension and attribute as well as the location of measure
names.
Follow these steps:
1. Select item Measures in the axis tree on the left-hand side of the Axes
Definition page.
2. Select row 2 in the grid by clicking on the header of row 2.
3. Click button “Add selected grid cells” to add this row as the source of measure
names.
4. Select item New Dimension in the axis tree, click button Rename below the grid and
type in Time; press Enter.
5. While item Time is selected, select column A by clicking on its header.
6. Click button to make the months represent the set of members for the time
dimension.
7. Click on cell B1 in the grid and click button to append the year to the names of
the months. This will make the members of the time dimension unique across all
virtual spreadsheets.
8. Select Position when prompted to select the type of column reference.
9. While dimension Time is selected, click button New Attribute below the grid and
type Year; press Enter.
10. While the new attribute Year is selected, click on cell B1 in the grid and click
button to make the contents of cell B1 appear as the Year attribute of the
members of the Time dimension; select Position when prompted for the type of
column reference.
11. Click button New Attribute again and type Month; press Enter.
12. While attribute Month is selected, select column A by clicking on its header and click
button to make the respective cell of column A appear as the Month attribute of
each member of the Time dimension.
13. Click button New Dimension below the grid and type Product; press Enter.
14. Click the drill-up button five times until you zoom out to Level 1 of the
spreadsheet.
15. Click on cell B2 in the grid and click button to make the contents of
cell B2 appear as the Product dimension; select Position when prompted for the
type of column reference.
16. The actual name of the product is a substring of the contents of cell B2: the text
that appears before the word “SINCE”.
To extract this substring, select source Level 1: Cell:B2 in the list of Sources and click
button Properties below that list.
17. In the newly invoked Define Extraction dialog box, select extraction Custom regular
expression and type in the following expression (without the quotes):
“(.*)\sSINCE”
18. Click OK to confirm the custom extraction rule.
19. We’ve completed the dimensional model of the spreadsheet.
Click Next to move to the next page of the wizard.
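The rules defined in this step can be approximated in Python to show what each member ends up containing. This is a sketch under assumptions: the `product_name` and `time_member` helpers are hypothetical, Python's `re` module stands in for DataDefractor's regular-expression engine, the first capture group is assumed to be the extracted value, the single space between month and year is assumed, and the trailing “SINCE 1991” in the sample header is invented for illustration.

```python
import re
from typing import Dict, List

Sheet = List[List[str]]

# The custom extraction rule from step 17: keep everything before " SINCE".
PRODUCT_RULE = re.compile(r"(.*)\sSINCE")


def product_name(cell_b2: str) -> str:
    """Apply the extraction rule; fall back to the whole cell if it does not match."""
    match = PRODUCT_RULE.search(cell_b2)
    return match.group(1) if match else cell_b2


def time_member(sheet: Sheet, row: int) -> Dict[str, str]:
    """Build a Time member for one fact row: month from column A, year from cell B1.
    Appending the year makes member names unique across virtual sheets."""
    month = sheet[row][0]
    year = sheet[0][1]
    return {"name": f"{month} {year}", "year": year, "month": month}


sheet = [
    ["", "1991", ""],
    ["", "Rate", "Pts"],
    ["", "", ""],
    ["January", "1.10", "0.10"],
    ["September", "1.20", "0.11"],
]
header = "CONVENTIONAL CONFORMING 15-YEAR FIXED-RATE MORTGAGE SINCE 1991"  # invented tail

print(product_name(header))
print([time_member(sheet, r)["name"] for r in range(3, len(sheet))])
```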
Step 5: Defining measure data types
In this step we will define the data types of our measures. Both the interest rate and the
discount points must be of type Float. Since this is the default type for all the measures
detected by DataDefractor, we will simply verify the data type.
1. Click on the name of each measure and make sure that property FactsType below the
list of measures is set to Float.
2. Click Next to move to the next page of the wizard.
Step 6: Setting output options and processing spreadsheet
This is the last page of the wizard. Here we will specify the type of output we need and where
to generate this output. We will also select some options which will make the output easier to
load into SAP BI.
Follow these steps:
1. Select radio button Star.
2. Select field Files Prefix and type FreddieMacOutput.
3. Select field Files Location and type output. This will create a new sub-
folder output in the folder where the Excel file is located.
4. Check check-box Include member names in fact table.
5. The data mapping schema for converting Freddie Mac spreadsheets into a dimensional
model is now complete. Click Save schema… to save the mapping schema for later
modification or to be able to schedule batch processing of Freddie Mac Excel
workbooks.
6. Click Finish to process the current spreadsheet.
At this point, DataDefractor Solo processes the input spreadsheet applying the dimensional
model and layout rules defined in the DataDefractor data mapping schema. When it is done
processing, DataDefractor displays the results:
Click on Open Destination Folder… to navigate to the folder where the output was
generated.
The output is a classic Kimball-style star schema, which consists of three CSV files - one for
the facts and one for each dimension:
• FreddieMacOutput.Facts.csv - contains a normalized view of the facts extracted
from the input spreadsheet. The measures are stored in separate columns. Each row
in the file also contains dimension names and surrogate keys which can be used as
foreign keys linked to the Time and Product dimension tables discussed below.
• FreddieMacOutput.Time.csv - contains the keys and attributes of the Time
dimension.
• FreddieMacOutput.Product.csv - contains the keys and attributes of the Product
dimension.
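Any downstream consumer can rebuild the star-schema joins from these files with nothing more than a CSV reader. The Python sketch below assumes that the column headers match the field lists used for the DataSources later in this article (TIMEID, TIME, PRODUCTID, PRODUCT, RATE, POINTS for the facts; MEMBERID, NAME, YEAR, MONTH for Time). The file contents are small made-up samples held in strings rather than the real output files, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def output_files(prefix: str, dimensions: List[str]) -> List[str]:
    """File names of a star schema: one fact file plus one file per dimension."""
    return [f"{prefix}.Facts.csv"] + [f"{prefix}.{dim}.csv" for dim in dimensions]


def read_table(text: str) -> List[Dict[str, str]]:
    """Parse CSV text that starts with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample contents; the real files are written by DataDefractor Solo.
facts_csv = """TIMEID,TIME,PRODUCTID,PRODUCT,RATE,POINTS
1,January 1991,1,CONVENTIONAL CONFORMING 15-YEAR FIXED-RATE MORTGAGE,1.10,0.10
2,February 1991,1,CONVENTIONAL CONFORMING 15-YEAR FIXED-RATE MORTGAGE,1.20,0.11
"""
time_csv = """MEMBERID,NAME,YEAR,MONTH
1,January 1991,1991,January
2,February 1991,1991,February
"""

facts = read_table(facts_csv)
time_dim = {row["MEMBERID"]: row for row in read_table(time_csv)}  # index by surrogate key

# Join each fact row to its Time member through the TIMEID foreign key.
joined = [
    {"year": time_dim[f["TIMEID"]]["YEAR"],
     "month": time_dim[f["TIMEID"]]["MONTH"],
     "rate": float(f["RATE"]),
     "points": float(f["POINTS"])}
    for f in facts
]

print(output_files("FreddieMacOutput", ["Time", "Product"]))
for row in joined:
    print(row)
```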
Loading the data into SAP BI
In this section we will load the output generated by DataDefractor Solo into SAP BI. We will
use SAP Data Warehousing Workbench to build a data warehouse and to populate it with the
Freddie Mac data.
Here is the list of procedures we will go through to accomplish this task:
1. Build a dimensional model (InfoCube).
2. Load the DataDefractor output into a Persistent Staging Area (PSA).
3. Transfer/transform the data from PSA into the InfoCube.
Note: The following sections contain steps that create many SAP objects, some of which may
need to be “activated” before they can be used. The steps to “activate” these objects have
been omitted from this tutorial for brevity.
Step 1: Building the dimensional data model
Dimensional data modeling defines business entities such as cubes, conformed dimensions,
attributes and measures. It gives business users a semantically meaningful view of diverse
source data.
In SAP BI, the terms used to describe a dimensional model are:
• InfoCube - a multidimensional data model, which contains facts that can be queried,
pivoted and aggregated according to criteria defined by means of selecting dimension
members on the various axes of the cube.
• Characteristic - equivalent to a conformed dimension. Used to query one or more
InfoCubes.
• Navigation Attribute - equivalent to a conformed dimension’s attribute. Used to
navigate and filter the members of a characteristic.
• Key Figure - equivalent to a measure.
In this example, we will define the data model as one InfoCube FREDDIEMAC which consists
of two characteristics - PRODUCT and TIME. The InfoCube also contains two key figures
- POINTS and RATE. The TIME characteristic has two navigation attributes
- YEAR and MONTH.
Follow these steps to build the model:
1. Log into SAP BI using SAP GUI.
2. Launch SAP Data Warehousing Workbench.
3. Create a new InfoArea called IAFRED.
4. In this area, create a new InfoCatalog called ICFRED.
5. In this catalog create a new characteristic InfoObject and call it YEAR. Make its data
type CHAR(8).
6. Create another characteristic InfoObject in catalog ICFRED - name this
characteristic MONTH. Make its data type CHAR(16).
7. Create another characteristic InfoObject and name it TIME. Make its data
type CHAR(16). Navigate to tab Attribute and add the YEAR and MONTH attributes
to the list of attributes of the TIME InfoObject. Turn on the navigation option for
both attributes.
8. Create a PRODUCT InfoObject characteristic in catalog ICFRED. Make its data
type CHAR(60).
9. Create a new InfoCatalog - ICKFFRED.
10. In this catalog, create a new key figure InfoObject - RATE. Choose base
aggregation SUM. Select exception aggregation AVERAGE on characteristic TIME.
Make this key figure’s data type FLOAT.
11. Create another key figure InfoObject in the same catalog - call it POINTS. Choose
base aggregation SUM. Select data type FLOAT.
12. Navigate to InfoProviders and create a new InfoCube FREDDIEMAC in
InfoArea IAFRED. Follow these steps to build the InfoCube:
* Define dimension PRODUCT and assign InfoObject PRODUCT to this dimension
using Direct Input.
* Define dimension TIME and assign InfoObject TIME to this dimension using Direct
Input.
* Using Direct Input, assign key figure InfoObject RATE to the list of InfoCube key
figures.
* Using Direct Input, assign key figure InfoObject POINTS to the list of InfoCube
key figures.
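Roughly speaking, the RATE settings (base aggregation SUM, exception aggregation AVERAGE on TIME) mean that values are summed within each TIME value and then averaged across TIME values whenever a query rolls TIME up, for example to YEAR; POINTS, which has only base aggregation SUM, is simply added up. The Python sketch below is a simplified model of that behavior, not SAP BI's implementation; it uses one product and made-up values, and the `aggregate_by_year` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (product, time member, year, rate, points); one product, made-up values.
records: List[Tuple[str, str, str, float, float]] = [
    ("P1", "JANUARY 1991", "1991", 1.0, 0.25),
    ("P1", "FEBRUARY 1991", "1991", 3.0, 0.5),
    ("P1", "JANUARY 1992", "1992", 5.0, 1.0),
]


def aggregate_by_year(rows) -> Dict[str, Tuple[float, float]]:
    """Return {year: (rate, points)} for a query drilled down by YEAR only."""
    rate_per_time: Dict[Tuple[str, str], float] = defaultdict(float)
    points_per_year: Dict[str, float] = defaultdict(float)
    for _product, time, year, rate, points in rows:
        rate_per_time[(year, time)] += rate  # base aggregation SUM within each TIME value
        points_per_year[year] += points  # POINTS: plain SUM
    rates_by_year: Dict[str, List[float]] = defaultdict(list)
    for (year, _time), total in rate_per_time.items():
        rates_by_year[year].append(total)
    # Exception aggregation AVERAGE across the TIME values of each year.
    return {year: (mean(totals), points_per_year[year]) for year, totals in rates_by_year.items()}


print(aggregate_by_year(records))
```

A plain SUM of RATE would report 4.0 for 1991 in this example; with the exception aggregation the year shows the average monthly rate, 2.0.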
Step 2: Loading Freddie Mac data into Persistent Staging Area
In this step we will load the input data into a Persistent Staging Area (PSA). PSA is an
intermediate data storage area where data is collected before it is used to populate the
dimensional data model.
Follow these steps:
1. Navigate to DataSources.
2. Create a new Application Component called ACFRED.
3. Create a new DataSource FREDFACT of type Transaction Data and map it to
file FreddieMacOutput.Facts.csv located in the folder used by DataDefractor Solo to
generate the Freddie Mac star schema.
Use the following data type mapping for the fields in this file:
* TIMEID - INT2
* TIME - CHAR(14)
* PRODUCTID - INT2
* PRODUCT - CHAR(45)
* RATE - CHAR(10)
* POINTS - CHAR(5)
4. Create InfoPackage FREDFACTPKG under DataSource FREDFACT.
5. Navigate to tab Schedule and execute the package to load the data into PSA.
6. Create a new DataSource FREDTIME of type Master Data Attributes and map it to
file FreddieMacOutput.Time.csv located in the Freddie Mac star schema folder.
Use the following data type mapping for the fields in this file:
* MEMBERID - INT2
* NAME - CHAR(14)
* YEAR - CHAR(5)
* MONTH - CHAR(9)
7. Create InfoPackage FREDTIMEPKG under DataSource FREDTIME.
8. Navigate to tab Schedule and execute the package to load the data into PSA.
9. Create a new DataSource FREDPROD of type Master Data Attributes and map it to
file FreddieMacOutput.Product.csv located in the Freddie Mac star schema folder.
Use the following data type mapping for the fields in this file:
* MEMBERID - INT2
* NAME - CHAR(45)
10. Create InfoPackage FREDPRODPKG under DataSource FREDPROD.
11. Navigate to tab Schedule and execute the package to load the data into PSA.
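Because the DataSource fields have fixed types and lengths, it can help to check the CSV files against them before running the InfoPackages. Below is a minimal Python sketch, assuming the FREDTIME field list above and treating INT2 as a signed 2-byte integer (-32768 to 32767). The `check_row` helper and the sample rows are hypothetical, and SAP BI itself may truncate or reject out-of-range values differently.

```python
from typing import Dict, List, Optional, Tuple

FieldSpec = Dict[str, Tuple[str, Optional[int]]]

# Field definitions copied from the FREDTIME mapping above: (type, CHAR length).
FREDTIME_FIELDS: FieldSpec = {
    "MEMBERID": ("INT2", None),
    "NAME": ("CHAR", 14),
    "YEAR": ("CHAR", 5),
    "MONTH": ("CHAR", 9),
}


def check_row(row: Dict[str, str], fields: FieldSpec) -> List[str]:
    """Return a list of problems for one CSV row; an empty list means the row fits."""
    problems: List[str] = []
    for name, (kind, length) in fields.items():
        value = row[name]
        if kind == "CHAR" and length is not None and len(value) > length:
            problems.append(f"{name}: {value!r} is longer than CHAR({length})")
        elif kind == "INT2":
            try:
                number = int(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not an integer")
                continue
            if not -32768 <= number <= 32767:
                problems.append(f"{name}: {number} is outside the INT2 range")
    return problems


good = {"MEMBERID": "9", "NAME": "September 1991", "YEAR": "1991", "MONTH": "September"}
bad = {"MEMBERID": "40000", "NAME": "September 1991", "YEAR": "1991", "MONTH": "September"}
print(check_row(good, FREDTIME_FIELDS))
print(check_row(bad, FREDTIME_FIELDS))
```

The same dictionary shape can hold the FREDFACT and FREDPROD field lists.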
Step 3: Populating the dimensional data model
In this step we will populate the FREDDIEMAC InfoCube fact and master data with the data
stored in PSA. To do this, we will define some transformations to transform, transfer and clean
the data as it flows from PSA to the data targets.
Follow these steps:
1. Under DataSource FREDFACT create a new InfoCube transformation targeting
InfoCube FREDDIEMAC.
2. Map the data source fields to the InfoObjects in the InfoCube by dragging lines from
the data source to the InfoCube, following this mapping:
* TIME to TIME
* PRODUCT to PRODUCT
* RATE to RATE
* PTS to POINTS
Replace the Rule Type of the TIME transformation with the following formula
(without the quotes): “TOUPPER(TIME)”.
3. Under DataSource FREDFACT create a new Data Transfer Process.
4. Navigate to tab Execute and click Execute to populate the key figures of the
InfoCube.
5. Under DataSource FREDTIME create a new InfoObject transformation targeting
InfoObject TIME.
6. Map the data source fields to the attributes in the InfoObject by dragging lines from
the data source to the InfoObject, following this mapping:
* MEMBERNAME to TIME
Rule Type: Formula “TOUPPER(MEMBERNAME)”
* YEAR to YEAR
Rule Type: Formula “TOUPPER(YEAR)”
* MONTH to MONTH
Rule Type: Formula “TOUPPER(MONTH)”
7. Under DataSource FREDTIME create a new Data Transfer Process.
8. Navigate to tab Execute and click Execute to populate the TIME master data.
9. Under DataSource FREDPROD create a new InfoObject transformation targeting
InfoObject PRODUCT.
10. Map the data source fields to the attributes in the InfoObject by dragging lines from
the data source to the InfoObject, following this mapping:
* MEMBERNAME to PRODUCT
Rule Type: Formula “TOUPPER(MEMBERNAME)”
11. Under DataSource FREDPROD create a new Data Transfer Process.
12. Navigate to tab Execute and click Execute to populate the PRODUCT master data.
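The TOUPPER formulas apply the same normalization to the TIME values on the fact path and on the master-data path, so the keys written into the InfoCube match the master-data keys. Presumably this is needed because the characteristics are not set to allow lowercase letters; that is an assumption, as is modeling TOUPPER as Python's `str.upper`. The record contents below are made up.

```python
from typing import Dict, List, Set


def toupper(value: str) -> str:
    """Stand-in for the TOUPPER formula (assumed to be plain upper-casing)."""
    return value.upper()


fact_rows: List[Dict[str, str]] = [
    {"TIME": "January 1991", "RATE": "1.10"},
    {"TIME": "February 1991", "RATE": "1.20"},
]
time_master: List[Dict[str, str]] = [
    {"MEMBERNAME": "January 1991", "YEAR": "1991", "MONTH": "January"},
    {"MEMBERNAME": "February 1991", "YEAR": "1991", "MONTH": "February"},
]

# Master-data path: MEMBERNAME -> TIME, YEAR -> YEAR, MONTH -> MONTH, all upper-cased.
master = [
    {"TIME": toupper(m["MEMBERNAME"]), "YEAR": toupper(m["YEAR"]), "MONTH": toupper(m["MONTH"])}
    for m in time_master
]
master_keys: Set[str] = {m["TIME"] for m in master}

# Fact path with and without the TOUPPER rule on TIME.
normalized = {toupper(f["TIME"]) for f in fact_rows}
raw = {f["TIME"] for f in fact_rows}

print(sorted(normalized - master_keys))  # fact keys with no matching master record
print(sorted(raw - master_keys))
```

With the rule applied on both paths every fact key finds its master record; skipping it on the fact path leaves every key unmatched.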
Step 4: Verifying the dimensional data model
Now that we’ve populated the InfoCube with facts and master data extracted from the Freddie
Mac financial report, it is time to verify the results.
Follow these steps to verify the InfoCube contents:
1. Navigate to InfoProviders, right-click on InfoCube FREDDIEMAC and select Display
Data.
2. Uncheck check-box Do not use any conversion.
3. Choose Field selection for output and select the following fields:
* PRODUCT
* TIME
* MONTH [TIME]
* YEAR [TIME]
* POINTS
* RATE
4. Click Execute twice.
You should see the following query result:
The InfoCube is ready and populated with data, which can be analyzed using various front-end
applications including NetWeaver Enterprise Portal, BEx Analyzer, BEx Web Analyzer, etc.
Additionally, with the help of DataDefractor Solo Batch Command, the extraction and
transformation of future Freddie Mac reports can be automated and scheduled to run
periodically. Furthermore, the loading of DataDefractor output into SAP BI can be
automated with an SAP BI process chain.