Marketing Analytics
“FGV Italy Cross-product Sales Analysis: a B2B case”
Student: Lorenzo Nicolò Formenti
ID Number: 12-216-354
Course name: Marketing analytics and data mining with inductive fuzzy
classification (Seminar)
Home institution: Università di Pavia
Faculty of Economics
via San Felice 7
27100 Pavia, Italy
Host institution: Université de Fribourg
Faculté des sciences économiques et sociales
Bd de Pérolles 90
1700 Fribourg, CH
Examiner: Prof. Andreas Meier
Supervisor: Dr. Michael Kaufmann
Date of submission: 6th December 2012
TABLE OF CONTENTS
1. Partner
1.1 History
1.2 Mission
1.3 Relevant Market
1.4 Network
1.5 Sales Strategy
2. Research path
2.1 Raw data
2.2 Data improving and selection
2.3 Research purposes
2.4 Methodology
2.4.1 Data mining
2.4.2 Correlation analysis
2.4.3 Cluster Analysis
3. Data Analysis
3.1 Cross-product Correlation analysis
3.1.1 Preliminary test
3.1.2 Pearson correlation matrix
3.2 Cluster analysis
LITERATURE
INDEX OF FIGURES
ACKNOWLEDGMENTS
1. Partner
1.1 History
The business of Formenti & Giovenzana SpA was started in 1949 in Veduggio con Colzano, Italy, 20
kilometers south of Lake Como and 30 kilometers north of Milan, by an opportunity-seeking team of young
workers who tried to move forward after being directly involved in the catastrophic events of the Second
World War.
The company's primary aim was to produce budget-friendly furniture hardware: shiny but scarce raw
materials like gold and silver were in high demand among European consumers, as were solid-brass
drawer knobs, which represented high fashion in the upper-class houses of the time.
To meet the growing mass demand for such fittings with a valuable but more affordable product, FGV
launched a cleverly made hollow-brass alternative. This enabled the company to exploit cost savings on
one side while offering a consumer-oriented, competitive product on the other, by means of pragmatism
and cleverness.
Constantly researching new opportunities to offer clever solutions, the company completed its product range
step by step, achieving a portfolio that today covers a wide range of furniture-system solutions: hinges,
slides, drawer systems, wall hanging brackets and wireworks, all meant to make quality products affordable
to the majority. Over the years, the company has consolidated its image and become synonymous with cost
innovation, at both the product and process level.
In the late 1970s, the company began a gradual penetration of foreign markets: other European countries
(France, Spain, Germany, UK), North and South America, the Middle East and eventually Asia. In a growing
market characterized by strong import tariffs, a manufacturing unit was opened abroad for the first time, in
Portugal; afterwards, two sales branches were started in Germany and in the USA (South Carolina), in 1986
and 1993 respectively.
In the 1990s, the increasing production costs of the national context compelled the company to look even
further afield, towards countries with low labour costs: this led to an Equity Joint Venture (E.J.V.) in Slovakia
for the production of drawer slides and to the acquisition of a Brazilian slides producer.
The first years of the 21st century saw the emergence of Chinese competitors in the furniture fitting
business. These Asian businesses were also cost leaders; low prices (often accused of being dumping) and
acceptable quality levels were the weapons of such manufacturers, who began to make their way throughout
Europe by relying upon local dealers.
Even large distributors, the most relevant customers for FGV, could not ignore the temptation to buy from
China, often setting up purchasing departments on site. Thus, FGV decided to create its own representative
office in Hong Kong to trade directly in the area. Customers pushed for more, and the next step was the
acquisition of a Chinese company which was, as in the cases of Brazil and Slovakia, a drawer slides
producer.
In 2004, after only two years, the company had become, by means of significant investments in technology, a
reference point for the local industry. Products other than drawer slides are now manufactured there as well,
such as steel drawers, part of which are exported. In the international context, FGV Asia, in
synergy with the Hong Kong rep-office, is today able to supply customers with a wide range of
competitive products, guaranteed by a well-known and now established brand.
1.2 Mission
The approach behind the first products has persisted over time and has turned out to be the core of the
current company mission: "Value for many", as a result of a portfolio based on what the company calls
Good-Better-Best solutions. “Value” is what customers are looking for today, from the entry level to the top
of the product range, each with a different willingness to pay for such value.
Matching this variety of customer needs by providing added value at a lower price, by means of
constant research and innovation, has also become a standard part of the company’s mission, relying upon
mass-volume manufacturing and scale-cost advantages.
1.3 Relevant Market
Until the 1990s, the market was dominated by large-scale German and Austrian players (€1,000 million
turnover), with a relevant share taken by smaller but efficient Italian enterprises (€100 million).
The recent entry of the Asian cost leaders discussed above has split this scenario in two:
today German and Austrian companies find themselves dominating only the furniture manufacturers’
side of the market.
Italian firms such as FGV face harsh competition from Chinese manufacturers to capture the remaining
distribution side: within this scenario, large-volume British and French distributors, representing a
huge share of overall demand, seem to be European-oriented but sometimes deal at untenable
conditions, leaving fertile ground for this competition to intensify.
Innovation and investment choices are mostly related to the company strategy, which is evaluated and revised
periodically, taking into consideration the evolving market conditions and the company's tradition of
flexibility. Hence innovation takes place at both the product and process level: incrementally, by improving
existing products and manufacturing lines, and more radically, by means of new investments along the
market’s dominant technological trajectories and the introduction of new product ranges.
Exit barriers of a modest size are present at the manufacturing level since, in the event of line failures,
machinery can be converted to new uses with more or less expensive alterations and improvements.
In other cases it is transferred to the company’s delocalised production units abroad or even to
contractors.
On the entry side, barriers consist of huge start-up capital requirements and the economies of scale exploited
by the market leaders. These are out of reach for new players, at least in the first business phase, thus
strongly limiting competition.
Looking at margins, the product mark-up changes across products and according to the customer’s business
type and traded volumes. In general, for premium products (top range) the average mark-up is around
30-35%, while for entry-level and more widespread items (commodities) it drops to 20-25%, due to the harsh
Asian price competition. Often the strong bargaining power of large distributors obliges the company to keep
very low margins, but this is counterbalanced by regular, large demand flows backed by on-time
payments.
On the product side, the company’s traditional flexibility results in a broad range of products (see section
1.5), both standardized and custom. In general, big DIY (Do It Yourself) groups like Ikea, Brico, B&Q and
Conforama are supplied mostly with customized solutions, such as tailor-made packaged kits ready for reselling.
Distributors, on the other hand, are more likely to buy the more standardized items of the range, which come
in several variants. Finally, smaller enterprises and furniture manufacturers behave more specifically,
according to their product ranges and final targets.
1.4 Network
FGV today is an Italian-based company active in the furniture hardware industry, facing modern market
challenges through an established worldwide network. The company trades as a B2B player in several
countries, and its operations are carried out through exclusive outsourced distribution and a broad range of
wholly owned activities. Among the latter, the most relevant today are:
- Italy: Headquarters, manufacturing, R&D and sales
- Germany: wholly owned subsidiary, distribution
- Poland: wholly owned subsidiary, distribution
- Spain: branch office, sales
- China Dongguan: wholly owned subsidiary, manufacturing
- China Canton: branch office, sales
- Hong Kong: branch office, sales
- Brazil: wholly owned subsidiary, manufacturing and sales
- Slovakia: wholly owned subsidiary, manufacturing
For my purposes, the focus will be on FGV Italy, the Italian division and headquarters of the group, which
provided me with data on the 2011 sales period as well as some operational support.
1.5 Sales Strategy
The company strategy takes most of its relevant features from the company mission presented above: cost
advantages and budget-friendly solutions under a “value for many” perspective, the mission’s key features,
are the basic elements taken into account to address sales operations.
The products are all designed at the R&D labs of the Italian headquarters, manufactured in Italy, China,
Slovakia and Brazil, and sold all over the world in a standardized way, as the needs and uses of final
customers are assumed to be similar across countries.
The difference lies instead in the customers’ willingness to pay for such products, since purchasers operate
in different economic contexts, manufacture or supply a wide range of products and sell them in markets
with different purchasing power.
For these reasons, the attempt to offer price-competitive products while guaranteeing a good level of quality
is pursued through what a micro-economist would call price discrimination: following
the company’s cost structure, the same product is sold at different prices depending on its origin, so that
customers are charged different prices when buying the same product from China or from Italy.
In this way, a quality-price trade-off becomes the main point in customers’ purchasing choices and therefore
the key input of this analysis.
Even if products are sold in a standardized way, elements of diversification are present within the product
range and must be taken into account. As specified in section 2.2, our focus will be on three main product
categories: hinges, slides and drawers, manufactured and supplied in different variants. With respect to
them, the sales strategy considered takes the following pattern:
1. HINGES
- Hinges – The section is composed of various types of equipment, depending on the fastening
system and the box size. Among them, Genios, a damped-closing hinge, is meant to represent
the premium product of the line. It is sold at a higher price and designed around innovative concepts. It
represents the most profitable opportunity for future market developments.
- China-manufactured hinges – Chinese contractors today are able to offer the whole product range at
competitive prices and reasonable quality levels. The main differences from “made in EU” items are
only related to small product components and less efficient assembly methods. Outsourced items
manufactured in China are sold to customers by FGV Asia, after the product quality has been
assessed by a quality control team within the firm.
2. DRAWERS
- “Ten” drawers – considered the premium product of this category, it is 100% manufactured on the
Italian premises of the group through a “lean” manufacturing process. Able to compete even with the
top-level European drawers, it is the product on which the company is betting to increase revenues
in 2013.
- “Prime” drawers – manufactured in-house in Slovakia and mainly sold in huge volumes to large
French and British distributors. They offer a good price-to-quality ratio.
- “Prime” China manufactured drawers – same variant, manufactured in China (outsourcing).
- China drawers – manufactured in-house in China by FGV Dongguan. This is an entry-level product,
at a highly competitive price. It is either imported into Italy to serve the European market or sold locally
by FGV Asia. It accounts for a large proportion of the company’s overall sales.
- Drawers – entry-level drawer sold at a competitive price. Same as “China drawers” but manufactured
in Italy.
3. SLIDES
- “Excel” slides – manufactured in-house in Italy, top level, high quality standards combined with
damper closing technology.
- Slides – made in-house in China (FGV Dongguan). This is an entry-level product (commodity). It is
either imported into Italy to be sold in Europe or supplied locally by FGV Asia. It is known as a
budget-friendly product.
- China slides – outsourced to Chinese contractors, even more basic than the previous one.
In general, the company places itself at a middle-to-low market level, where profitability mainly comes from
large business volumes instead of huge unitary margins. The recent global economic downturn has further
expanded this target area and it is for this reason that the company, offering competitive but valuable
products, holds up well today serving its traditional targets while getting the attention of other new segments.
In addition, by means of premium-level products (Ten and Genios), FGV today is able to compete with the
top market players, attempting to win part of their market share.
2. Research path
2.1 Raw data
The data-set provided by the market and sales management of the company is related to 2011 sales for
individual customers (rows, 255 observations), divided by product (columns, 48 variables) and expressed as
percentage shares over the global sales generated by each product. Business type (manufacturing,
distribution, agent) and nationality for each customer are also included.
2.2 Data improving and selection
For my purposes, I found it useful to enlarge the provided data set with additional information and to
select among the variables provided, so as to focus on the relevant dimensions only.
Considering distributors, it is reasonable to think that they show similar behaviours around the world. By
definition, a distributor serves a broad range of customers by providing a more or less widespread array of
products, so it would be of little use to add specifications about its final target and look for
significant patterns there. A further investigation was made for furniture manufacturers: by searching the
internet, six new attributes were defined under the label “final”: “bedroom”, “living”, “kitchen”, “bathroom”,
“office” and “other”. Each variable takes the value 1 when the manufacturer considered is active in that
particular business and 0 when it is not, where “other” stands for any other business area not referable to the
main ones, particularly: wood or rustic items, hotel furniture, hardware, constructions, or aluminium.
With respect to the raw data selection, as the provided data set was very large and related to the whole of
the company’s activities, I decided to undertake a first manual “mining” phase, selecting the information
and variables relevant to my purpose, in order to address the research rationally. In particular, I focused on
the three main product categories described above: hinges, slides and drawers, accounting respectively for
30%, 20% and 30% of the overall turnover.
Moreover, I took into account sideburns, a product whose use is strictly complementary to
hinges, as an operative benchmark: knowing that their sales are “naturally” correlated, it was possible to use
this relation to test the methodology and the consistency of the data.
2.3 Research purposes
The main objective of this research is to make an in-depth analysis of such data, “mining” out hidden
relevant information and looking for similarities between different customers’ behaviours and
willingness to buy specific products. The main purpose is to look at how these change across different
business types and categories, in order to identify different customer classes (“clusters”) with
similar behaviours, with the initial support of cross-product correlations. The research findings will then
help the Head of market intelligence to double-check his current knowledge of the market and target
future sales more consciously.
2.4 Methodology
2.4.1 Data mining
The methods used in this research are mainly based on data mining techniques in marketing analytics. A
huge amount of data about any kind of phenomenon, whether structured as panel, time series or
cross-sectional data, provides a large volume of information. Reasonably, people understand only a small
part of it. Behind this data lie potentially useful information and
patterns, rarely evident or taken advantage of at first glance. The extraordinary growth of databases in recent
years, concerning basic everyday activities such as customer choices and behaviours, has brought data
mining to the forefront of new business technologies.
As stated in [Witten & Frank 2005, p.5], as the world grows in complexity, overwhelming us with the data it
generates, data mining becomes our only hope for elucidating the patterns that underlie it. Intelligently
analyzed data is a valuable resource. It can lead to new insights and work as a brilliant link between
statistics and computer science.
Data mining is a term used to describe knowledge discovery in databases. Formerly “the term was used to
describe the process through which undiscovered patterns in data were identified. However, over time, the
original definition has been enlarged to include most types of (automated) data analysis…the process of
finding mathematical patterns from usually large sets of data. These patterns can be rules, affinities,
correlations, trends or prediction models” [Turban, Aronson, Liang & Sharda 2007, p.305].
For our purposes such techniques will be exploited by a combination of correlation and cluster analysis, in
order to identify cross-product sales similarities (and differences), customer behaviours and market
peculiarities over a set of selected relevant variables.
2.4.2 Correlation analysis
In statistics, a correlation coefficient is a single number that describes the degree of relationship between
two variables. To give a general overview of customers' cross-product purchasing behaviours and the
resulting sales patterns, the relations between the sales of n = 10 products will be evaluated by computing
a Pearson correlation coefficient for each possible bivariate relation. Intuitive and easy to evaluate, this is the
correlation measure that best fits our goals.
Given two variables, sales of product X and the sales of product Y, the Pearson’s product-moment
correlation coefficient (ρ) is defined as:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y}$$
where:
$\operatorname{cov}(X,Y)$: covariance of X and Y,
$\sigma_X$, $\sigma_Y$: standard deviations of X and Y
For the evaluation of cross-product sales relations, notice that the coefficient takes values in the interval
[-1, +1], where:
$\rho_{X,Y} > 0$: positive correlation
$\rho_{X,Y} = 0$: no linear correlation between X and Y
$\rho_{X,Y} < 0$: negative correlation
and
$0 < |\rho_{X,Y}| < 0.3$: weak correlation
$0.3 < |\rho_{X,Y}| < 0.7$: moderate correlation
$|\rho_{X,Y}| > 0.7$: strong correlation
Looking at the coefficients, the reader will be able to identify the degree of relation existing between different
product sales for both manufacturer and distributor customers, getting a first immediate picture of such
dynamics. Afterwards, the correlations will be used to refine the findings of the cluster analysis.
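The definition and the interpretation thresholds above can be sketched in a few lines of Python. This is only an illustration: the function names `pearson` and `strength` and the sample figures are hypothetical, and the treatment of the exact boundary values 0.3 and 0.7, which the text leaves open, is an assumption.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def strength(rho):
    """Label a coefficient with the 0.3 / 0.7 thresholds used in the text.
    Assumption: a value of exactly 0.3 counts as moderate, exactly 0.7 as strong."""
    size = abs(rho)
    if size == 0:
        return "none"
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

# Hypothetical percentage shares of five customers for two products.
hinges = [0.5, 1.2, 0.1, 2.4, 0.8]
sideburns = [0.4, 1.0, 0.3, 2.0, 1.1]
rho = pearson(hinges, sideburns)
print(round(rho, 3), strength(rho))
```

Applying `pearson` to every pair of product columns would give the full matrix of bivariate coefficients discussed in section 3.1.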
2.4.3 Cluster Analysis
As one of the best-known data mining techniques, clustering aims to partition a (more or less) large
database into smaller subsets (“clusters”) whose members share similar features. Cluster algorithms
classify objects based only on information found in the data that describes the objects and their relationships.
Since such methods aim to “mine out” relevant hidden patterns and information in data, as stated in [Tan,
Steinbach & Kumar 2005, p.490], the final goal is that the objects within a group be similar (or related) to
one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or
homogeneity) within a group and the greater the difference between groups, the better or more distinct the
clustering, defined as a collection of clusters. This suggests that there is no a priori best way to
cluster a set of data; rather, clustering is an iterative process of trial and error that leads the researcher to
the best solution according to the matter at hand and the type of data available.
In this analysis, the focus will be on exclusive, or non-overlapping, cluster algorithms.
Within this group, the main choice in how to undertake the analysis is the one between
hierarchical and non-hierarchical (partitional) techniques. A partitional clustering is simply a division of
the set of data objects into non-overlapping subsets (clusters) such that each data object belongs to exactly
one subset. If instead we permit clusters to have sub-clusters, then we obtain a hierarchical clustering, which
is a set of nested clusters that are organized as a tree [Tan, Steinbach & Kumar 2005, p.492].
In both cases, the goal is to minimize the distance within the clusters and maximize the distance between
them.
Figure 1: Different ways of clustering the same set of points (Tan, Steinbach & Kumar 2005)
Hierarchical algorithms, in their agglomerative mode, work in the following way:
1. First, each unit, with its multidimensional attributes, is assigned to its own cluster, so that for “n” items,
“n” clusters are created. Let the distances (similarities) between the clusters be the same as the
distances (similarities) between the items they contain.
2. The closest (most similar) pair of clusters (items) is merged into a single cluster, so that there is now
one cluster fewer.
3. Distances (similarities) between the new cluster and each of the old clusters are computed.
4. Steps 2 and 3 are repeated iteratively until all items are clustered into a single cluster of size n.
Figure 2: Hierarchical algorithm, the dendrogram
What is important to notice is that when using such algorithms, the number of final clusters is not defined a
priori. Instead, after choosing a proper distance measure, given a sample of “n” observations, an n×n distance
matrix is generated containing the pairwise distances between the points. Then, the closest objects are
merged according to the 4 steps above.
Afterwards, a particular kind of hierarchical tree, called a dendrogram, is generated to illustrate the iterative
merging steps at a glance (figure 2). Each node of the tree is the union of its “children” (sub-clusters),
and the root of the tree, lying at the “x” axis level, is the cluster containing all of the objects.
The researcher will then decide where to “cut” the pattern, looking for a good compromise between the
number of groups and the homogeneity within them, depending on how the data is distributed and on the
problem at hand.
For our purposes, it is enough to know that the number of clusters at the cut-off level, corresponding to the
red straight line in the graph, is the one adopted to define the final clusters, composed accordingly by the
associated observations.
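The four steps and the cut-off described above can be sketched as follows. This is a minimal illustration, not the procedure used by the statistical software: the cluster-to-cluster distance is assumed here to be the single-linkage one (distance between the closest members), which the text does not specify, and the function names and sample points are hypothetical. Stopping when `n_clusters` groups remain plays the role of cutting the dendrogram at that level.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(points, c1, c2):
    """Assumed linkage: distance between the closest members of two clusters."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # step 1: one cluster per item
    while len(clusters) > n_clusters:
        # step 2: look for the closest pair of clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_link(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # merge the pair; step 3 (new distances) is recomputed on the next pass
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical two-dimensional customers, cut at two clusters.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
print(agglomerate(points, 2))
```

Recomputing all distances at every pass keeps the sketch short but is slow for large samples; real implementations update the distance matrix incrementally.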
By contrast, when using non-hierarchical algorithms, the main difference is that the number of final
clusters does not come out as a result, but has to be chosen arbitrarily at the start. The set of
multidimensional units is grouped into a defined number of clusters, built around a pre-defined number of
centers called centroids. After setting, as before, a proper distance measure, an iterative algorithm that
optimizes a chosen criterion assigns each unit to its closest centroid.
Starting from an initial classification, units are transferred from one group to another or swapped with units
from other groups until no further improvement can be made; the resulting partition is then taken to be
optimal, at least locally.
Advantages and disadvantages of the two approaches are summarized in figure 3:
Hierarchical:
- more informative hierarchical structure of output
- no decision about the number of clusters
- complex algorithms (computations may take long)
- great influence of initial decisions
- great distortions due to error components
- easy to read and interpret
Non-hierarchical:
- more reliable but less informative
- number of clusters to be specified ex ante
- iterative adjustments give more flexibility
- not easy to read and interpret
- simple and intuitive algorithms
- more defined final clusters
Figure 3: Hierarchical vs. Non hierarchical clustering.
Within non-hierarchical criteria, the present analysis will be carried out by means of the so-called K-means
algorithm, which is broadly used, simple and reliable.
Figure 4: Cluster building process by K-means algorithm [Tan, Steinbach & Kumar 2005]
Basically, k initial centroids are chosen, where k is a user-specified parameter, namely, the number of
clusters desired. Each point is then assigned to the closest centroid, and each collection of points assigned
to a centroid is a cluster. The centroid of each cluster is then iteratively updated based on the points
assigned to the cluster. The assignment and update steps are repeated until no point changes clusters, or
equivalently, until the centroids remain the same [Tan, Steinbach & Kumar 2005, p.498]. Disjoint subsets
are thereby formed.
In practice, some steps have to be followed to carry out a K-means cluster analysis:
1. Choose a valid proximity (distance) measure. For the sake of simplicity, we choose the Euclidean
distance, defined as:
$$D_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ki} - x_{kj})^2}$$
where:
$D_{ij}$: distance between cases i and j
$x_{ki}$: value of variable $X_k$ for case i
$x_{kj}$: value of variable $X_k$ for case j
Intuitively, it corresponds to the length of the straight segment connecting two arbitrary points “i” and “j” in
the plane, and is commonly used for data points in Euclidean space.
2. Select the relevant variables (data matrix columns) to be included in the analysis, over which the
clusters will be defined.
3. Set the number of desired clusters, approximated as:
$$k \approx \sqrt{n/2}$$
4. Run the analysis setting different values of “k”. Check which one maximizes the distance between
the centroids (mean values) and minimizes the distance between the points and the centroids (SSE:
sum of the squared error, i.e. squared Euclidean). Choose accordingly.
5. Interpret the results of the final output, where a final cluster is represented by the mean value that
each selected column variable takes within that particular cluster. Such values have to be compared
with the mean value that the variable takes over the whole distribution, to describe the elements
belonging to the cluster.
6. Describe the output to form homogeneous classes of elements.
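The assignment and update steps, the rule of thumb for k and the SSE comparison across different values of k can be sketched as follows. This is a bare-bones illustration under stated assumptions, not the implementation of the statistical package used for the analysis: the initial centroids are drawn at random from the data, and the names `k_means`, `suggested_k` and the sample points are hypothetical.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def suggested_k(n):
    """Rule of thumb from step 3: k is roughly sqrt(n / 2)."""
    return math.sqrt(n / 2)

def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, labels, sse) for a basic K-means run."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its closest centroid
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: stop
            break
        labels = new_labels
        # update step: each centroid becomes the mean of its points
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)
    sse = sum(euclidean(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return centroids, labels, sse

# Hypothetical customers described by two variables; compare SSE across k (step 4).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (4.0, 4.1), (4.2, 3.9), (3.9, 4.0)]
for k in (1, 2, 3):
    _, _, sse = k_means(points, k)
    print(k, round(sse, 3))
print(round(suggested_k(255), 1))
```

Because the result depends on the initial centroids, it is common to repeat the run with several seeds and keep the partition with the lowest SSE; the SSE alone always favours larger k, so it has to be weighed against the interpretability of the clusters.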
The most common marketing applications of cluster analysis look at a defined set of customer attributes
(age, gender, religion, level of education etc.), grouping customers into similar classes to build, according
to their preferences, a coherent market segmentation analysis. Others are devoted to the search for new
product opportunities, clustering similar brands to describe the competitive environment and identify
possible unserved segments.
As a business-to-business case, FGV has different peculiarities, and this is what makes the matter at
hand attractive. Since attributes of the type described above do not exist for enterprises, we identify as
customer “attributes” their purchasing behaviours, considering 2011 percentage sales over 10 products as a
proxy. From this perspective, customers with similar behaviours and characteristics are clustered to identify
common purchasing patterns and, finally, homogeneous customer classes. This will be carried out
distinguishing between two different categories, furniture manufacturers and distributors, whose purchasing
baskets are different by definition.
3. Data Analysis
3.1 Cross-product Correlation analysis
Following the methodology explained above, the cross-product correlation coefficients are computed and
summarized in a matrix, to give the reader a first general picture of cross-product sales patterns.
The coefficients will then be compared with the final cluster results as a benchmark, to check the results
for consistency and define the picture more narrowly.
3.1.1 Preliminary test
Before starting, we perform a simple check to test the coherence of this method and the quality of our data:
we compute the Pearson coefficient for Hinges and Sideburns. It is necessary to point out here that sideburns
are a fundamental component of hinges, whose sales are recorded separately by the company for accounting
purposes. For this reason, we expect a strong positive correlation between the two.
Hinges/Sideburns test
51 52
51 Hinges 1 0,824**
52 Sideburns 0,824** 1
**. Correlation is significant at the 0.01 level (2-tailed).
Table 1: Correlation analysis. A preliminary test.
As expected, the coefficient takes value 0,824 and shows that a strong positive sales correlation exists.
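As a minimal sketch of how such a check could be reproduced outside a statistics package, the Pearson coefficient can be computed directly from its definition. The share values below are made up for illustration (they are not FGV data), and the function name `pearson` is our own.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from the respective means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical percentage sales of five customers (illustrative only)
hinges = [0.10, 0.40, 0.35, 1.20, 0.05]
sideburns = [0.12, 0.30, 0.45, 1.00, 0.02]

r = pearson(hinges, sideburns)
print(f"r = {r:.3f}")
```

With real data, each list would hold one value per customer, in the same customer order for both products.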
3.1.2 Pearson correlation matrix
Applying the same procedure as in the test, the percentage sales per customer of the ten products are
analysed here with respect to their cross relations. A 10x10 Pearson correlation matrix reports the
coefficient for each possible pair of products. Here, no distinction is made between manufacturers and
distributors.
Pearson correlation matrix: global (n=255)
Products 51 51C 54 54B 54C 58 58C 58E 58F 58X
51 Hinges 1 0,128* 0,236** 0,629** 0,452** 0,426** 0,284** 0,097 0,017 0,324**
51C China Hinges 0,128* 1 0,003 0,003 0,022 0,006 -0,003 -0,015 -0,011 -0,024
54 Slides 0,236** 0,003 1 0,292** 0,412** 0,161** 0,093 0,046 0,009 0,013
54B Excel Slides 0,629** 0,003 0,292** 1 0,593** 0,164** 0,211** 0,338** 0,043 0,537**
54C China Slides 0,452** 0,022 0,412** 0,593** 1 0,107 0,112 0,054 0,077 0,157**
58 Drawers 0,426** 0,006 0,161** 0,164** 0,107 1 0,547** 0,348** -0,003 0,014
58C China Drawers 0,284** -0,003 0,093 0,211** 0,112 0,547** 1 0,588** 0,013 0,088
58E Prime Drawers 0,097 -0,015 0,046 0,338** 0,054 0,348** 0,588** 1 0,544** 0,099
58F China Prime Drawers 0,017 -0,011 0,009 0,043 0,077 -0,003 0,013 0,544** 1 0,052
58X Ten Drawers 0,324** -0,024 0,013 0,537** 0,157** 0,014 0,088 0,099 0,052 1
*. Correlation is significant at the 0.05 level (2-tailed). **. Correlation is significant at the 0.01 level (2-tailed).
Table 2: Pearson's cross-product correlation coefficients: global.
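The sketch below shows, under stated assumptions, how a matrix of this kind and its significance flags could be obtained. The data are hypothetical (three products, six customers), the function names are our own, and the significance check assumes the usual t test for a zero correlation with n - 2 degrees of freedom; the text does not state which procedure the statistical package applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(shares):
    """Nested dict {product_a: {product_b: r}} over all product pairs."""
    names = list(shares)
    return {a: {b: pearson(shares[a], shares[b]) for b in names} for a in names}


def t_statistic(r, n):
    """t statistic for H0: rho = 0, to be compared with a t table (n - 2 df)."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Hypothetical percentage sales per customer, one list per product code
shares = {
    "51": [0.10, 0.40, 0.35, 1.20, 0.05, 0.60],
    "54B": [0.05, 0.30, 0.20, 0.90, 0.10, 0.70],
    "51C": [0.50, 0.00, 0.10, 0.05, 0.30, 0.00],
}

matrix = correlation_matrix(shares)
for a in shares:
    print(a, " ".join(f"{matrix[a][b]:+.3f}" for b in shares))

# Coefficient and sample size taken from Table 2 (Hinges vs. China Hinges)
t = t_statistic(0.128, 255)
print(f"t = {t:.2f}")
```

For the Hinges/China Hinges pair in Table 2 (r = 0,128, n = 255) the statistic is about 2,05, just above the two-tailed 5% critical value of roughly 1,97 for 253 degrees of freedom, which is consistent with the single asterisk reported for that coefficient.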
Globally, what is evident is a consistent level of intra-category correlation: for instance, customers who buy
Prime drawers, a product with a good quality-to-price ratio, are also likely to diversify their choices within
the same product category and buy other drawers with different features. The same holds for slides, and is
particularly strong for China slides, the most basic and price-competitive model.
Moreover, some inter-category correlation is also visible, although weaker than the intra-category one:
customers who purchase Excel slides also seem likely to buy drawers, especially the “Ten” model (0,537),
both being premium products. This may be because slides and drawers are, to some extent, complementary
products; surprisingly, however, this does not seem to hold for lower-level products. Finally, Hinges follow
a very peculiar pattern: they show consistent correlations with almost all the other items of the range, and
a strong one with the top-range slide model (0,629 with Excel slides). Surprisingly, sales of the Chinese
variant behave almost independently.
For distributors, who account for the majority of FGV customers (171 of 255), the same patterns are
expected to recur, with some differences. The data do confirm the dynamics described at the general
level, with intra-category correlations that are stronger than before.
Pearson correlation matrix: distributors (n=171)
products 51 51C 54 54B 54C 58 58C 58E 58F 58X
51 Hinges 1 0,137 0,348** 0,774** 0,493** 0,392** 0,288** 0,126 0,135 0,338**
51C China Hinges 0,137 1 0,006 0,002 0,018 -0,002 -0,007 -0,013 -0,026 -0,034
54 Slides 0,348** 0,006 1 0,535** 0,614** 0,269** 0,154* 0,105 0,136 0,006
54B Excel Slides 0,774** 0,002 0,535** 1 0,717** 0,131 0,179* 0,078 0,198** 0,573**
54C China Slides 0,493** 0,018 0,614** 0,717** 1 0,111 0,119 0,069 0,187* 0,154*
58 Drawers 0,392** -0,002 0,269** 0,131 0,111 1 0,590** 0,520** -0,002 -0,003
58C China Drawers 0,288** -0,007 0,154* 0,179* 0,119 0,590** 1 0,953** 0,078 0,070
58E Prime Drawers 0,126 -0,013 0,105 0,078 0,069 0,520** 0,953** 1 0,155* 0,059
58F China Prime Drawers 0,135 -0,026 0,136 0,198** 0,187* -0,002 0,078 0,155* 1 0,241**
58X Ten Drawers 0,338** -0,034 0,006 0,573** 0,154* -0,003 0,070 0,059 0,241** 1
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
Table 3: Pearson's cross-product correlation coefficients: distributors
Inter-category correlations are still present, but with new features. Looking at drawers and slides, for
instance, product pairs that were weakly correlated before are now more strongly correlated. The correlation
of China Prime drawers with the whole slides category is emblematic. As before, Hinges appear to be
purchased along with all other products, now with stronger correlations.
Generally, this increase in correlations, especially within families, may reflect the particular purchasing
needs of distributors: they face a broadly defined target market with multiple needs, which must be served
with a broader range of products at different price points.
On the manufacturers’ side, weaker sales correlations are expected, describing different patterns due to
different approaches to the market. This is exactly what the data show: the sales correlations found before
generally become lower and in some cases disappear. This is mostly true for the intra-family patterns
identified above. The slides family is highly representative, since the consistent positive correlations become
weak or slightly negative (-0,024 between Slides and Excel slides, -0,031 between Excel and China slides).
The effect is less evident for inter-category correlations, such as drawers/slides: positive correlations are
lower in some cases, as part of the broad trend, but many peculiarities exist at the product level.
Pearson correlation matrix: manufacturers (n=84)
products 51 51C 54 54B 54C 58 58C 58E 58F 58X
51 Hinges 1 -0,026 0,093 0,116 -0,046 0,732** 0,221* 0,086 -0,028 0,145
51C China Hinges -0,026 1 -0,017 -0,018 0,012 0,117 -0,010 -0,022 -0,014 -0,014
54 Slides 0,093 -0,017 1 -0,024 0,219* -0,023 -0,024 -0,024 -0,015 -0,020
54B Excel Slides 0,116 -0,018 -0,024 1 -0,031 0,397** 0,322** 0,636** 0,010 0,866**
54C China Slides -0,046 0,012 0,219* -0,031 1 -0,020 -0,033 0,150 0,264** -0,040
58 Drawers 0,732** 0,117 -0,023 0,397** -0,020 1 0,155 0,311** -0,005 0,416**
58C China Drawers 0,221* -0,010 -0,024 0,322** -0,033 0,155 1 0,255** -0,008 0,337**
58E Prime Drawers 0,086 -0,022 -0,024 0,636** 0,150 0,311** 0,255** 1 0,682** 0,654**
58F China Prime Drawers -0,028 -0,014 -0,015 0,010 0,264** -0,005 -0,008 0,682** 1 0,012
58X Ten Drawers 0,145 -0,014 -0,020 0,866** -0,040 0,416** 0,337** 0,654** 0,012 1
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed)
Table 4: Pearson's cross-product correlation coefficients: manufacturers
This is the most relevant difference we notice when comparing the behaviour of distributors and
manufacturers: for manufacturers, strong correlations show up at the product level and, in particular, higher
values emerge for products holding a similar line position (entry with entry, premium with premium). The
strongest correlations in the table (ρ > 0,7) are between Drawers and Hinges, both basic line products, and
between Excel slides and Ten drawers, both premium products.
This suggests a lower level of diversification in the purchasing behaviour of furniture manufacturers,
since they serve narrowly defined segments of final customers, with defined needs and purchasing power
that must be met by more specific, specialized products. This is reflected in their buying decisions:
products of a similar level and perception are likely to be bought together, and the intra-family trends
identified for distributors, a symptom of their intent to offer a broad array of items, almost disappear.
Finally, and not surprisingly, the behaviour of Hinges is typical: previously correlated positively with almost
all items, they are now correlated only with products of a similar level. This suggests that the Chinese
variant is more appreciated by manufacturers or small distributors, since its behaviour is always in line with
such trends.
Behavioural differences among manufacturers of different furniture, and among distributors, are considered
in the following section, devoted to clustering.
3.2 Cluster analysis
In this section, building on the findings of the correlation analysis, homogeneous groups of customers are
identified using the K-means clustering technique, with the aim of giving a coherent picture of the
behaviour of the company’s customers.
For distributors, who represent the majority of the company’s global sales, a 9-cluster setting was
arbitrarily chosen. Values for clusters 2 to 7 are not reported, because each of them contains only one
observation: since the goal is to identify similar classes and patterns, these single-customer clusters are
statistically negligible. Mean values computed over the distributors’ data subset are reported in the last
column (column 6) and are used, from now on, as the main benchmark for comparison.
The output shows that distributors’ behaviours across countries are very homogeneous: aside from
clusters 8 and 9, almost all of the observations in the 10-dimensional space (159 of 171) belong to a single
common group, cluster 1, characterized by below-average sales spread fairly evenly across products.
Within this cluster, basic Slides, Ten drawers and China Prime drawers seem to be more appreciated by the
market. This is coherent with the high level of cross-product correlation identified in the first part of the
analysis, which reflects the widely distributed purchasing schemes typical of distributors.
Nevertheless, more defined preferences, even if for only 6 operators, also appear among distributors:
clusters 8 and 9 represent more specific behaviours, apparently those of large distribution groups. In
cluster 8, sales shares for some items increase dramatically and the firms’ orientation emerges accordingly,
with above-average sales for products of different line levels, particularly for Hinges (basic), “Excel” slides
(premium) and China drawers (entry). Cluster 9, instead, is characterized by sales close to zero for almost
every item, with all the “efforts” devoted to a budget-friendly product, China Hinges.
Cluster centers (final): distributors
Columns: product; cluster 1; clusters 2 to 7 (not reported); cluster 8; cluster 9; mean
51 Hinges 0.1326% … 9.7516% 0.3534% 0.4244%
51C China Hinges 0.0782% … 1.1486% 7.6863% 0.5324%
54 Slides 0.2741% … 2.3604% e-10 0.3823%
54B Excel Slides 0.1494% … 5.8286% 0.0059% 0.4137%
54C China Slides 0.1484% … 1.1132% e-10 0.4834%
58 Drawers 0.1043% … 1.5428% e-10 0.4530%
58C China Drawers 0.1246% … 4.0050% e-10 0.4409%
58E Prime Drawers 0.1076% … 0.6010% e-10 0.2574%
58F China Prime Drawers 0.2566% … 0.5554% e-10 0.2745%
58X Ten Drawers 0.3340% … 0.0449% e-10 0.5410%
Cluster membership (n=171) 159 6 2 4 -
Table 5: K-means clustering: distributors.
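To illustrate how a K-means run can isolate a few unusual customers in very small clusters, as happens with clusters 2 to 9 above, here is a minimal sketch of the standard assignment/update iteration (Lloyd's algorithm). The two-dimensional data, the initial centroids and the function names are hypothetical; the actual analysis used ten product shares per customer, and its initialization is not documented in the text.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iter=100):
    """Plain K-means; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical shares (Hinges %, China Hinges %) for seven distributors
points = [
    (0.10, 0.05), (0.15, 0.10), (0.20, 0.05), (0.12, 0.08),  # typical buyers
    (9.50, 1.00), (10.00, 1.20),                             # Hinges-heavy
    (0.30, 7.50),                                            # China Hinges only
]
labels, centroids = k_means(points, [points[0], points[4], points[6]])

for j, centre in enumerate(centroids):
    size = labels.count(j)
    print(f"cluster {j}: size {size}, centre {[round(v, 4) for v in centre]}")
```

Because the centroids are means, a handful of customers with very large shares pull a centroid towards themselves and end up in clusters of their own, while the bulk of customers stays in one large group.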
On the furniture manufacturers’ side, a binary (0,1) variable is introduced for each business area in order to
identify the preferences of manufacturers producing different kinds of furniture. There are six classes:
“bedroom”, “kitchen”, “living-room”, “bathroom”, “office” and “other”, where “other” includes every business
area that cannot be related to the five main categories.
As detected by the correlation analysis, purchasing patterns are less uniform here, as shown by both sales
shares and the distribution of cluster memberships. They signal the well-defined preferences of
manufacturers, which are specialized in their own products and focused on narrowly defined targets. Not
surprisingly, the mean values reported in the last column show that sales volumes decrease as well.
Cluster centers (final): manufacturers
Columns: business area or product; clusters 1 to 6; mean (product rows only)
Bedroom* 1 1 0 0 1 0
Kitchen* 1 0 0 0 1 1
Living* 0 1 0 1 1 0
Bathroom* 1 0 0 0 1 0
Office* 0 0 0 1 1 0
Other* 0 0 1 0 0 0
51 Hinges 0.3696% 0.2855% 0.0427% 0.0415% 0.4826% 0.2191% 0.2554%
51C China Hinges e-10 e-10 0.4601% e-10 e-10 e-10 0.0870%
54 Slides e-10 1.1339% 0.0191% e-10 0.4438% 0.2336% 0.3062%
54B Excel Slides 1.4351% 0.0706% 0.3097% 0.1684% e-10 0.0221% 0.2378%
54C China Slides 0.0059% 0.2138% 0.1837% 0.6941% 0.0041% 0.0536% 0.1626%
58 Drawers 0.6751% e-10 0.1629% 0.0287% 0.0029% 0.0678% 0.2172%
58C China Drawers 0.5494% 0.0907% 0.0822% 0.0058% 0.0157% 0.6187% 0.2123%
58E Prime Drawers 2.1013% 0.0078% 0.9894% e-10 0.0392% 0.3359% 0.4582%
58F China Prime Drawers 0.1536% e-10 2.5971% e-10 0.0503% 0.1207% 0.5060%
58X Ten Drawers 0.3176% 0.0269% 0.0433% e-10 e-10 0.0102% 0.0449%
Cluster membership (n=84) 9 20 18 8 8 21
* Binary “business area” variable (0,1): 1 = manufactures this type of furniture; 0 = does not.
Table 6: K-means clustering: manufacturers.
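The following sketch shows one way the business-area variables and the product shares could be combined into a single vector per manufacturer before clustering. It assumes (this is not stated in the text) that the 0/1 variables and the percentage shares enter the Euclidean distance unscaled; all customers and values are hypothetical, and only three product shares are used for brevity.

```python
AREAS = ["bedroom", "kitchen", "living", "bathroom", "office", "other"]


def feature_vector(areas, shares):
    """Six 0/1 business-area dummies followed by the product shares (in %)."""
    dummies = [1.0 if area in areas else 0.0 for area in AREAS]
    return dummies + list(shares)


def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Hypothetical manufacturers: (business areas, shares for three products)
kitchen_a = feature_vector({"kitchen"}, [0.22, 0.23, 0.62])
kitchen_b = feature_vector({"kitchen"}, [0.60, 0.05, 0.10])
bed_living = feature_vector({"bedroom", "living"}, [0.22, 0.23, 0.62])

same_area = squared_distance(kitchen_a, kitchen_b)    # baskets differ
same_basket = squared_distance(kitchen_a, bed_living) # areas differ

print(f"same area, different basket: {same_area:.4f}")
print(f"same basket, different area: {same_basket:.4f}")
```

Under this assumption, a difference in business area weighs more than a sizeable difference in purchasing shares, which would be one possible reason why the clusters in Table 6 line up so neatly with business areas. Rescaling the variables before clustering would change this balance.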
The output of the arbitrarily chosen 6-cluster setting can be presented as follows. Six homogeneous
classes of customers can be identified, listed here by size:
- Kitchen manufacturers (cluster 6): with 21 members, this is the largest group of customers,
specialized in kitchen furniture manufacturing. They purchase items across the whole product range,
mostly in below-average quantities, with the notable exception of China Drawers, an in-sourced
entry-level product that seems to fit their purposes well.
- Bedroom/Living manufacturers (cluster 2): this class, almost as large as the first (20 members), is
composed of manufacturers who combine the production of living-room and bedroom furniture. Their
preferences are uneven: very small purchasing volumes, clearly below average for most products,
go together with sharply defined preferences for mid-line solutions, with noteworthy purchases of
Hinges and Slides.
- Outside-oriented (cluster 3): surprisingly, a large share of manufacturers (18 of 84) are active in
business areas outside the company’s usual target: wood or rustic furniture, hotel apparel, general
hardware, construction, aluminium. Even more interestingly, the dominant purchasing strategy
consists of above-average purchases of a variety of items at different range positions: China Hinges,
Excel slides and Prime drawers, in both the European and Chinese variants, are the most evident.
- Broadly oriented (cluster 1): a smaller class of nine customers is devoted to a wider set of
activities, namely bedroom, kitchen and bathroom furniture manufacturing. Although smaller, this
group looks relevant in terms of profitability: it purchases almost all products in consistently
above-average quantities and is mostly oriented towards the medium to higher positions of the
range. The shares for Excel slides, Prime drawers and Ten drawers are especially relevant. By
contrast, the “Chinese” line is less appreciated, with the important exception of China Drawers.
- Day-light lovers (cluster 4): in this cluster of eight customers, the leitmotif is the pairing of
living-room and office furniture. Very small purchasing shares are typical of this class, which looks,
at least in theory, less profitable than all the previous ones. However, the exception of China Slides,
purchased in large volumes, should be noted as a signal of a low-budget orientation.
- All-rounders (cluster 5): active in all of the relevant target businesses, this group of eight customers
shows, not surprisingly, a strong orientation towards basic products. In particular, their
above-average purchases are concentrated on Hinges and Slides; purchases of other items are only
marginal.
As part of a coherent framework for addressing sales strategy, the results of both the correlation and
cluster analyses are summarized in the following section.
3.3 Findings
First, the correlation analysis carried out at the global level shows a consistent amount of intra-family
correlation. This is particularly true for drawers and slides, where customers are likely to diversify their
choices within the same product family. Moreover, some inter-family correlation is also visible, although
weaker than the intra-family one. Again for drawers and slides, it is particularly intense for premium
products. This may be because customers perceive slides and drawers as complementary in use, although
this does not seem to hold for lower-level products.
Hinges, instead, follow a very peculiar pattern: the European variant is consistently correlated with almost
all of the other items of the range, while, surprisingly, sales of the Chinese variant behave almost
independently.
For distributors, the dynamics described at the general level are confirmed and reinforced by stronger
intra-family correlations. Generally, this increase in correlations, especially within families, may reflect the
particular purchasing needs of distributors: they face a broadly defined target market with multiple needs,
which must be served with a broader range of products at different price points.
Inter-family correlations are also present, but with new features. For instance, product pairs that were
weakly correlated before are now more strongly correlated. The correlation of China Prime drawers with
the whole slides family is emblematic. As before, Hinges appear to be purchased along with all other
products.
On the manufacturers’ side, as expected, the sales correlations found before generally decrease and in
some cases even disappear. This is mostly true for the intra-family patterns identified above. The slides
family is highly representative, since the consistent positive correlations become weak or slightly negative.
The effect is less evident for inter-family dynamics, such as drawers/slides: positive correlations are lower
in some cases, as part of a broad trend, but many peculiarities exist at the product level. These new
dynamics describe heterogeneous purchasing patterns, which stem from the different approaches of
manufacturers to the market, given their narrowly defined target segments and their features.
The cluster analysis, in turn, shows that distributors’ behaviours across countries are fairly
homogeneous: aside from clusters 8 and 9, almost all of the observations in the 10-dimensional space
belong to a single large group, cluster 1, characterized by below-average sales spread fairly evenly across
product categories. Within this cluster, basic Slides, Ten drawers and China Prime drawers seem to be more
appreciated. This is coherent with the high level of cross-product correlation identified in the first part of the
analysis, which reflects the widely distributed purchasing schemes typical of distributors.
Nevertheless, some degree of specialization appears for a small number of players: clusters 8 and 9
represent more specific behaviours, apparently those of the largest distribution groups.
On the furniture manufacturers’ side, as previously detected by the correlation analysis, purchasing
patterns are more specific, as shown by both sales shares and the distribution of cluster memberships.
They signal the well-defined preferences of manufacturers, which are specialized in their own products and
focused on narrowly defined final targets. Six customer groups were identified, with different levels of
specialization and different preferences according to the business areas in which they are active: Kitchen
manufacturers, Bedroom/Living manufacturers, Outside-oriented, Broadly oriented, Day-light lovers and
All-rounders. Specific purchasing behaviours, volumes and levels of profitability for the company emerge
accordingly.
LITERATURE
[Witten & Frank 2005] Witten, I. H.; Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques,
2nd edition, Elsevier/Morgan Kaufmann, San Francisco, USA, 2005.
[Turban, Aronson, Liang & Sharda 2007] Turban, E.; Aronson, J. E.; Liang, T. P.; Sharda, R.:
Decision Support and Business Intelligence Systems, Pearson Education, New Jersey, USA, 2007.
[Tan, Steinbach & Kumar 2005] Tan, P. N.; Steinbach, M.; Kumar, V.: Introduction to Data Mining, US edition,
Addison Wesley, Boston, USA, 2005.
INDEX OF FIGURES
Figure 1: Different ways of clustering the same set of points [Tan, Steinbach & Kumar 2005]
Figure 2: Hierarchical algorithm, the Dendrogram
Figure 3: Hierarchical vs. Non hierarchical clustering
Figure 4: Cluster building process by K-means algorithm [Tan, Steinbach & Kumar 2005]
Table 1: Correlation analysis. A preliminary test
Table 2: Pearson's cross-product correlation coefficients: global
Table 3: Pearson's cross-product correlation coefficients: distributors
Table 4: Pearson's cross-product correlation coefficients: manufacturers
Table 5: K-means clustering: distributors
Table 6: K-means clustering: manufacturers
ACKNOWLEDGMENTS
I would like to give special thanks to Ms. Silvana Riboldi, Head of the FGV market intelligence department,
for her constant support during the research, and to Jules Kingery, my new American friend, for the final
reviews and suggestions.