1. On-Line Analytical Processing (OLAP) OLAP is a query based
methodology that supports data analysis in a multidimensional
environment. OLAP is a valuable tool for verifying or refuting
human generated hypotheses and for performing manual data mining.
An OLAP engine logically structures multidimensional data in the
form of cube like the one shown in Figure 1. The cube displays
three dimensions - purchase category, time in month and region. As
an OLAP cube is designed for a specific purpose, it is not unusual
to have several cubes structures from the data in a single data
warehouse. The design of a data cube includes decisions about which
attributes to include in the cube as well as the granularity of
each attribute. A well designed cube is configured so as to avoid
situations where data cells do not contains useful information. For
example, a cube with two time dimensions, one for month and second
for fiscal quarter(Q1,Q2,Q3,Q4) is a poor choice because cell
combinations such as (January,Q4) or (December, Q1) will always be
empty. A useful OLAP systems interface allows the user to display
the data from different perspective, perform statistical
computations and tests, query the data into successively higher and
/or lower levels details, cross-tabulate the data, and view the
data with graphs and charts. I 1
2. Figure 1. A multidimensional cube for credit card purchases.
Category = Retails Month = January Region = Four Amount = 67090
Count = 120 Jan Feb Mar Apr May Month June July Aug Sep Oct Nov Dec
Four Three Two Retails Travel Supermarket Entertainment One Region
Category OLAP Example In Figure 1, the cube contains 12 X 4 X 4 =
192 cells. Stored within each cell is the total amount spend within
a given category by all credit card customers for a specific month
and region. If an average purchase amount is to be computed, the
cube will also contain a count representing the total number of
purchases for each month, category and region. The arrow in Figure
1 points to a cube holding the total amount and the total number of
retails purchase in region four for the month of January. I 2
3. Each attribute of an OLAP cube may have one or more
associated concept hierarchy. A concept hierarchy defines a mapping
that allows the attributes to be viewed from varying levels of
detail. Figure 1.2 displays a concept hierarchy for the attribute
location. As you can see, region holds the highest level of
generality within the hierarchy. The second level of hierarchy
tells us that one or more states make up a region. The third and
fourth levels show us that one or more cities are contained in a
state and one or more addresses are found within a city. Lets
create a scenario where our OLAP cube together with the concept
hierarchy of Figure 1.2, will be assistance in a decision making
process. Figure 1.2 A concept hierarchy for location Region States
City Street Address Suppose we wish to determine a best situation
for offering a luggage and a hand bag promotion for travel. Our
goal is to determine when and where the promotional offering will
have its greatest impact on customer response. We can do this by
finding those times and locations where relatively large amounts
have been previously spent on travel. Once determined, we then
designate the best regions and times for the promotional offering
so as to take advantage of the likelihood of ensuing travel
purchases. Here is the list of common OLAP operations together with
a few examples for our travel promotion problem: 1. The SLICE
operation select data on a single dimension of OLAP cube. For the
cube in Figure 1, a slice operation leaves two of the three
dimensions intact, while a selection on the remaining dimension
creates a sub cube from the original cube. The two queries for the
slice operator are: a. Provide a spreadsheet of month and region
information for cells pertaining to travel b. Select all cells
where purchase category = retails or supermarket 2. The DICE
operation extracts a sub cube from the original cube by performing
a select operation on two or more dimensions. Here are three
queries requiring one or more dice operations: a. Identify the
month of peak travel expenditure for each region. b. Is there a
significant variation in total dollars spent for travel and
entertainment by customers in each region? c. Which month shows the
greatest amount of total dollars spent on travel and entertainment
for all regions? 3. ROLL-UP or aggregation, is combining of cells
for one of the dimensions defined within a cube. One form of
roll-up uses the concept hierarchy associated with a dimension to
achieve a higher level of generalization. For this example, this is
illustrated in Figure 1.3 where the roll-up is on the time
dimension. The cell pointed I 3
4. to in the figure contains region one supermarket data for
the month October, November and December. A second type of roll-up
operator actually eliminates an entire dimension. For our example,
suppose we choose to eliminate the location dimension. The end
result is a spreadsheet of total purchases delineated by month and
category type. Category = Supermarket Month = Jan,Feb,March Region
= One Q1 Q2 Time Q3 Q4 Four Three Retails Travel One Two
Supermarket Entertainment Regions Category 4. DRILL-DOWN is the
reverse of a roll-up and involves examining data at some level of
greater details. Drilling down on region in Figure 1, results in a
new cube where each cell highlights a specific category, month and
state. 5. ROTATION or pivoting, allows us to view the data from a
new perspective. For our example, we may find it easier to view the
OLAP cube in Figure 1 by having months displayed on the horizontal
axis and purchase category on the vertical axis. General
Considerations a. Most useful strategies for analyzing a cube
require a sequence of two or more operations b. MS Excel provides
an interface that allows us to view OLAP cubes created from data
stored in a relational database c. The information contained in the
cube can be displayed and manipulated in Excel as a pivot table. I
4
5. Excel Pivot Table For Data Analysis. Creating a Simple Pivot
Table We start with a simple example using credit card promotion
database to show how pivot tables summarize data for the attribute
income range. 1. To begin, load the CreditCardPromotion.xls file
into an Excel spreadsheet. 2. Delete the second and first rows of
the spreadsheet data as they are not relevant to our analysis. 3.
Make sure the cursor is positioned in one of the cell containing
data. Proceed to the Data dropdown menu and select PivotTable and
PivotChart Report. 4. Select the Microsoft Excel list or database
radio button. This indicates that the data to be analyzed is housed
within an Excel spreadsheet. Select the PivotTable option and click
next to continue. 5. In step 2 we are asked for the data range
parameters to be used for creating the pivot table. As we initially
placed the cursor in a cell containing the data, the data range
should be correct. Click next to continue step 3. 6. In step 3 we
specify the location of the pivot table. Select New worksheet radio
button and click finished to continue. Figure 2 : A Pivot Table
Template I 5
6. Let us use the toolbars together with the data drop area to
generate a summary report for attribute income range. 1. Use your
mouse to drag income range from the toolbar into the area specified
by Drop Field Here. Next, return to toolbar and drag income range
into the area specified by Drop Data Item here. Figure 3: A summary
report for income range The report tells us, among others thing,
that the majority of credit card customers have an income ranging
between $30000 and $40000 dollars. Now lets change the output
format for the total column (currently a count) to a percent: 1.
Single click on count of income range 2. Single click on the field
settings square located in the top right portion of the pivot table
toolbar. A Pivot Table Field box will appear. 3. Single click on
option >> and examine the options in the Show data as:
dropdown menu. 4. Select % of column and click Ok. The data in the
total column will now appear as a percent. Finally lets make a pie
chart to complement the table output: 1. Begin by highlighting the
percentage score for the four income range values. I 6
7. 2. Next, single click on the ChartWizard located in the top
left portion of the pivot table toolbar. A bar chart representing
the four income range values will appear. However we wish to have a
pie chart showing the value. To accomplish this, single click on
the Chart Wizard a second time. 3. Choose one of the pie chart
types and click on Finish Figure 4: A pie chart for income range
Next we use the pivot table drill down feature to display the
records of those individuals in a particular salary range: 1. Click
on Sheet4 in the bottom tray to display the pivot table. 2. Double
click on the cell containing the percent for the desired salary
range (20-30K). All instances within the chosen salary range will
appear in a new spreadsheet. 3. To return to the pivot table, click
on sheet4. This completes our first example. I 7
8. Pivot Table for Hypotheses Testing The ACME Credit Card
Company has decided to solicit by telephone select cardholders who
received their credit cards within last year and who did not
purchase credit cards insurance with their initial mail-in
application. Their data analyst believes that there is a
relationship between a cardholders age and whether the cardholder
has credit card insurance. Specifically, the analyst wishes to test
the hypotheses that younger cardholders purchase credit card
insurance whereas more senior cardholders do not. If the hypothesis
is true, only those cardholders under certain age will be selected
for the telephone solicitation. To test the hypothesis we will use
a pivot table and our imagination and assume that the credit card
promotion database contains a much larger sampling of cardholders.
The following steps test the hypothesis claiming a relationship
between age and credit card insurance status: 1. To begin, make
sure the cursor is positioned in one of the cells of sheet1 that
contains data. Proceed to the Data Dropdown Menu and select Pivot
Table and Pivot Chart report and select finish. 2. Move age to the
area labeled Drop Row Fields Here. Move credit card insurance to
the area labeled Drop Column Fields Here. 3. Move credit card
insurance to the area labeled Drop Data Items Here. The resultant
pivot table is given in Figure 5 Figure 5: A pivot showing age and
credit card insurance choice The pivot table is informative in that
it tells us that very few individuals currently have credit card
insurance. However the distribution of ages is such that it is
difficult to make I 8
9. any conclusions about a relationship between age and credit
card insurance. We can use the group function to develop a clearer
picture about any possible relationship between the two attributes.
The method is as follows: 1. Single click on the age attribute
within the pivot table 2. Single click on the Data dropdown menu.
3. Mouse to Group and Outline and then to Group. Single click on
Group. A grouping box that allows you to select a Starting at,
Ending at, and By value will appear. 4. Click OK to select the
default values. Figure 6: Grouping the credit card promotion data
by age The new pivot table is displayed in Figure 6. Although our
data set is too small to draw valid conclusions, grouping the data
by age allows us to obtain a clearer picture of the relationship
between the two attributes. A second method for determining if a
relationship between age and credit card insurance exists. This
method computes the average ages for those individuals with and
without credit card insurance. Instead of starting with the
original credit card promotion database, well modify the current
pivot table by invoking the Pivot Table Wizards from the toolbar as
follows. 1. Locate the Pivot Table Wizards in the top row of the
toolbar. 2. Single click on the wizard. This action invokes the
step of 3 display of the Pivot Table Wizard. 3. Locate and left
click on the layout. The current pivot table layout is displayed
within the Pivot Table Wizard. Figure 7, shows the current layout.
4. Use your mouse to remove attribute age from the Row area and
drag it to the age button located on the far right of the layout
display window. Next, drag credit card insurance from the Column
area to the Row area. 5. Remove Count of Credit Card Insurance from
the data area and place age in the data area. I 9
10. 6. Double click on Sum of Age within the data area. A
PivotTable Field box will appear. 7. Single click on Average within
the Summarize by: box. Click on OK. This returns you to the
PivotTable Layout Wizards 8. Click on OK from within the PivotTable
Layout Wizard. Finally, click on Finish within the step 3 display
of the PivotTable Wizard. Figure7: PivotTable Layout Wizard The
resultant pivot table shows the average age for credit card
insurance = no is approximately 41.42, whereas the average age for
credit card insurance = yes is approximately 32.33. Figure 8: Age
Summary I 10
11. Creating a Multidimensional Pivot Table For this example,
we will use a pivot table to investigate relationships between the
magazine, watch and life insurance promotions relative to customer
gender and income range. We will do this by creating a
three-dimensional cube like the one shown in Figure 9. Each cell of
the cube contains a count of the number of customers who either did
or did not take part in the promotional offerings. Figure 9: A
credit card promotion cube Watch Promo = No Life Insurance Promo =
Yes Magazine Promo = Yes Promo Watch No Yes Yes No Magazine Promo
Yes No The arrow in Figure 9 points to the cell holding the total
number of customers who took advantage of life insurance promotion
and the magazine promotion, but who did not take advantage of the
watch promotion. We include sex and income range in our analysis by
designating these attributes as page variables. Heres the
procedure. 1. To begin, make sure the cursor is positioned in one
of the cells of sheet1 that contains data. Proceed to the Data
dropdown menu and select PivotTable and PivotChart Report and then
Finish. 2. Use the mouse to drag watch promotion and life insurance
promotion to the area labeled Drop Row Field Here. Drag magazine
promotion to the area labeled Drop Column Fields Here. 3. Drag life
insurance promotion, watch promotion and magazine promotion to the
area labeled Drop Data Items Here 4. Finally. Drag sex and income
range to the area labeled Drop Page Fields Here. I 11
12. The resultant pivot table appears in Figure 10. The 24
highlighted cells correspond to the cells of the cube shown in
Figure 9. In addition to the 24 cells representing the cube, the
pivot table also shows total yes and no counts for each promotion
together with summary total. Lets use the pivot table to help us
determine relationships among the three promotions. Figure 10: A
pivot table with page variables for credit card promotions First
well use the table to find the customer count for the cell
designated in Figure 10: 1. Find the area to the far left within
the pivot table that shows life insurance promotion = yes. This is
given in Figure 10 by rows 15 through 20. 2. Within this same area,
locate the sub region that has watch promotion=no 3. Finally.
Follow this sub region to the right until you reach the column for
magazine promotion = yes The contents of the cell show a 2 for all
three promotions. This tells us that a total of two customers took
advantage of the life insurance and magazine promotions but did not
purchase the watch promotion. We can drill down to examine the
individual records represented by the cell. Simply double click on
any of the cells containing the value 2. By default, the records
will be displayed in sheet5. Next lets look at the paging feature.
In the upper left corner of the pivot table, you will see the
paging variables sex and income range specified with the table
definition. We can I 12
13. use the page feature to answer questions about the
relationship between the attributes given as page variables and the
promotional offerings. For example, lets say we wish to examine the
relationship between income range and promotional offerings for
female customers. The procedure as follows: 1. Single-click on the
dropdown menu for sex, highlight female, and click OK 2.
Single-click on the drop menu for income range, highlight
$20-$30000 and click OK The pivot table displays the promotional
summary data for females making between $20000-$30000 dollars. The
table shows two female customers within the specified income range.
Neither female took advantage of the watch or magazine promotions,
but one female did purchase the life insurance promotional
offering. By examining the remaining income range data, you will
see that females with annual salary between $30000 and $40000
dollars have traditionally been the best candidates for promotional
offerings. It is obvious that the paging feature adds more
dimension to the analysis capabilities of Excel pivot table. I
13