GHENT UNIVERSITY
FACULTY OF ECONOMICS AND BUSINESS
ADMINISTRATION
ACADEMIC YEAR 2015 – 2016
Web Crawling in R: Predicting Leads
Master thesis presented to obtain the degree of
Master in Applied Economic Sciences: Business Engineer
Arno Liseune
under the guidance of
Prof. Dirk Van den Poel, Jeroen D’Haen & Tijl Carpels
PERMISSION

Ondergetekende verklaart dat de inhoud van deze masterproef mag geraadpleegd
en/of gereproduceerd worden, mits bronvermelding.
Undersigned declares that the content of this master thesis may be consulted, and/or
reproduced on condition that the source is quoted.
Arno Liseune
PREFACE

Writing this master thesis has been made possible thanks to many different people.
First of all, I would like to thank my promoter, Professor Dirk Van den Poel, who
facilitated this research.
I would also like to express my gratitude to Jeroen D’Haen, who provided me with
guidance, suggestions and valuable insights throughout this research project.
Further, I would like to thank Tijl Carpels for the help he offered me during the final
stage of this master thesis.
In addition, I would also like to thank the anonymous Belgian energy supplier for
providing me with the data that made this research possible.
Finally, I would like to address a special thanks to my family and friends who helped
me and gave me moral support throughout the entire duration of this study.
TABLE OF CONTENTS
PREFACE
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
SAMENVATTING
ABSTRACT
1 INTRODUCTION
2 METHODOLOGY
2.1 Web mining
2.1.1 Identification of corporate websites
2.1.2 Data collection
2.2 Text mining
2.2.1 Text preparation
2.2.2 Text representation
2.2.3 Dimensionality reduction
2.3 Incorporating expert knowledge
2.4 Predictive modeling
2.4.1 Regularized logistic regression
2.4.2 Random forest
2.4.3 Rotation forest
2.4.4 AdaBoost
2.4.5 Support Vector Machine
2.5 Model evaluation criteria
3 EMPIRICAL VERIFICATION
3.1 Research data
3.2 Optimal dimensionality and model selection
3.3 Results
4 CONCLUSION
5 LIMITATIONS AND FURTHER RESEARCH
ACKNOWLEDGEMENTS
BIBLIOGRAPHY
LIST OF ABBREVIATIONS

B2B Business-to-Business
CRM Customer Relationship Management
XML Extensible Markup Language
HTML HyperText Markup Language
URL Uniform Resource Locator
PCA Principal Component Analysis
PC Principal Component
SVM Support Vector Machine
ROC Receiver Operating Characteristic
AUC Area Under the receiver operating characteristic Curve
TP True Positives
TN True Negatives
FP False Positives
FN False Negatives
LIST OF TABLES

Table 1: Variables used in research
Table 2: Characteristics of the marketing data
Table 3: AUC and top-decile lift
LIST OF FIGURES

Figure 1: Methodology
Figure 2: Corporate website identification
Figure 3: Text mining stages
Figure 4: Term filtering
Figure 5: Principal Component Analysis
Figure 6: Hybrid ensemble
Figure 7: Random forest
Figure 8: AdaBoost
Figure 9: Support Vector Machine
Figure 10: Model performance as a function of dimensionality
Figure 11: Cumulative lift curves
Figure 12: ROC curves
SAMENVATTING

In deze studie onderzochten we of tekstuele informatie, gevonden op websites van
ondernemingen, gebruikt kan worden om veelbelovende leads te identificeren in een
Business-to-Business (B2B) context. In het bijzonder hebben we aangetoond hoe
verschillende web- en text mining technieken toegepast kunnen worden om deze
ongestructureerde tekstuele gegevens te verzamelen en te organiseren. Ook
onderzochten we hoe Principal Component Analysis kan helpen deze inhoud te
transformeren naar een verzameling van karakteristieken die verbonden zijn aan deze
websites. Aan de hand van een hybrid ensemble konden we vaststellen dat deze
kenmerken de identificatie van veelbelovende leads kunnen bevorderen. Bovendien
toonden we aan dat karakteristieken aangevuld met variabelen die voortkwamen uit
domeinexpertise tot nog betere resultaten leidden. Bijgevolg kan het raamwerk zoals
voorgesteld in dit onderzoek gebruikt worden door B2B verkoopvertegenwoordigers,
aangezien dit hen in staat stelt om leads te rangschikken volgens hun probabiliteit van
conversie. Het resultaat is een meer gerichte marketingaanpak die vooral gunstig is
voor bedrijven die geconfronteerd worden met lage conversieratio’s of die gelimiteerd
zijn door beperkte marketingbudgetten.
ABSTRACT

In this research, we investigated whether textual information extracted from corporate
websites could be used to identify promising leads in a Business-to-Business (B2B)
environment. Particularly, we showed how several web- and text mining techniques
can be applied to extract and organize this unstructured information and how principal
component analysis (PCA) may help to transform this content into a set of corporate
website characteristics. By means of a hybrid ensemble, we found that these
characteristics can facilitate the identification of promising leads. Additionally, we
showed that a data augmentation with variables that were constructed through expert
knowledge rendered even more promising results. Hence, the framework presented in
this research can be used by B2B marketers as it allows them to rank leads according
to their predicted conversion probabilities. The result is a more targeted marketing
approach which can be especially beneficial for businesses confronted with low
conversion ratios or constrained by limited marketing budgets.
Keywords: Acquisition, B2B, Web Mining, Text Mining, PCA, Machine Learning
1 INTRODUCTION

In the beginning of the 20th century, mass production dominated the business
landscape. For many decades following the industrial revolution, producers
manufactured large amounts of standardized products, longing for economies of scale.
As a result, marketing activities were solely focused on covering a large market share,
ignoring different customer segments (Bauer, Grether, & Leach, 2002). Traditional
advertising media such as radio and television enabled firms to spread their message,
trying to reach as many people as possible. As time passed, advancements in
technology changed marketing focus to a more targeted approach, as direct mailing
and telemarketing facilitated direct communication with the customer (Ling & Yen,
2001). Instead of providing the whole market with a single offer, marketers were now
able to present products that were relevant to specific customer segments (Petrison,
Blattberg, & Wang, 1997). Still, little was known about the true individual customer
needs, as firms employing this one-way communication strategy lacked personal
interactions with their customers.
In recent years however, improvements in information systems and technologies
introduced a paradigm shift in marketing. Throughout the 1980s and 1990s, major
innovations in database technology allowed firms to store and analyze data (Petrison
et al., 1997). The emergence of the internet offered the opportunity to collect vast
amounts of customer information through various interactions over time. As computer
power rapidly increased, sophisticated analysis of this data became more attractive.
Moreover, as customers became aware of competitive offers with the advent of the
World Wide Web, many firms began to recognize the value of this information to cope
with the increasingly competitive environment (Rygielski, Wang, & Yen, 2002; Shaw,
Subramaniam, Tan, & Welge, 2001). In order to remain successful, customer
information was leveraged to produce customized products and services in an effort to
create superior value and to build long-term relationships (Kothandaraman & Wilson,
2000; Ulaga & Chacour, 2001). Nowadays, this relational marketing approach is known
as Customer Relationship Management (CRM). According to Ling and Yen (2001),
“CRM is a concept whereby an organization takes a comprehensive view of its
customers to maximize customer’s relationship with an organization and the
customer’s profitability for the company” (pp. 82-83). Whereas operational CRM
provides support to business processes, analytical CRM focuses on the analysis of
customer characteristics as well as behavior in order to maximize marketing
effectiveness (Ngai, Xiu, & Chau, 2009). Data mining tools are often used to analyze
this customer data as they offer advanced algorithms to extract hidden knowledge from
corporate data warehouses (Bose & Mahapatra, 2001; Ngai et al., 2009; Rygielski et
al., 2002). With the help of these techniques, firms are able to improve customer
acquisition, retention and development, hence achieving higher marketing success
rates (Baecke & Van den Poel, 2010). Generally, CRM focuses on keeping and
satisfying existing customers, since this strategy is considerably more profitable than
acquiring new ones on a regular basis (Isaac & Tooker, 2001; Reinartz & Kumar, 2003;
Wilson, 2006). At some point in time, however, customer relationships will dissolve
(Dwyer, Schurr, & Oh, 1987). Therefore, identifying new customers remains critical to
the viability of today’s organizations (Thorleuchter, Van den Poel, & Prinzie, 2012;
Wilson, 2006). Whereas customer retention is a relatively easy process, relying on
existing customer data, companies are much less familiar with the acquisition of new
customers, which requires the search for unknown information. Traditionally, firms use
external data purchased from commercial vendors as input for acquisition models (Hill,
1999; Wilson, 2006). Even so, these lists tend to be very expensive and of poor quality
as they often contain many missing values (D’Haen, Van den Poel, & Thorleuchter,
2013; Shankaranarayanan & Cai, 2005).
Today, web data could be a valuable alternative for acquisition purposes in a Business-
to-Business (B2B) context as modern organizations use their websites to keep
customers informed (Thorleuchter et al., 2012). However, this does not come without
challenges. Firstly, web data is so unstructured that only humans can readily interpret
it. Secondly, its sheer volume makes manual processing infeasible, so machines are
needed (Stumme, Hotho, & Berendt, 2006). As a result, firms seldom use this valuable source of information for
marketing activities (Coussement & Van den Poel, 2009). Nonetheless, new data
mining techniques are emerging to take on these challenges. According to Kosala and
Blockeel (2000), “Web mining is the use of data mining techniques to automatically
discover and extract information from Web documents and services” (p. 2). Hence,
web mining applications deliver tools that could help marketers extract information from
the web (Thorleuchter et al., 2012). Even so, the result can be highly unstructured.
Luckily, text mining algorithms exist to extract hidden knowledge from unstructured
information (Hotho, Nürnberger, & Paaß, 2005). Since approximately 80% of all
information is stored in textual form (Gentsch & Hänlein, 1999), the need to master
these techniques increases.
In this study, web-, text- and data mining techniques will be applied in order to identify
promising leads in a B2B context. Based upon the data of a firm’s previous marketing
campaign, we attempt to determine the relationship between a lead’s website
characteristics and the success rate of customer conversion for this particular firm.
Information from corporate websites will be crawled in order to uncover these hidden
characteristics. Nowadays, vector space models are often used to represent
unstructured text in a way machines can process it (Silva & Ribeiro, 2003).
Nevertheless, the feature space can still be of high-order as text collections often
comprise thousands of terms (Yang & Pedersen, 1997). This high dimensionality may
be problematic for many classifiers when the number of terms is much higher than the
number of documents included in the text corpus. Therefore, dimensionality reduction
techniques need to be applied in order to reduce the feature set to a more manageable
form (Sebastiani, 2002; Silva et al., 2003; Yang et al., 1997). Several methods exist to
achieve this purpose. Feature selection techniques aim at the retrieval of the most
informative terms in a text collection, resulting in a subset of the original corpus. An
alternative to this approach is to reduce dimensionality by some linear or nonlinear
projection of the high dimensional space onto a lower one, also known as feature
extraction (Tang, Shepherd, Milios, & Heywood, 2005). After the dimensionality
reduction phase, a prediction model is built upon the new feature set by means of the
combination of several machine learning algorithms. The discovery of a pattern
between a lead’s characteristics and the success rate of customer conversion assists
sales representatives in two ways. On the one hand, characteristics of converted leads
could be used for the search of new ones. On the other hand, the model comprising
the uncovered relationship can be applied on websites of potential customers, resulting
in conversion probabilities. Consequently, sales representatives are able to better
identify interesting leads as well as to allocate marketing means towards those leads
with high conversion probabilities.
This study contributes to the literature in several ways. Firstly, a multilayer web mining
approach is presented to extract the right corporate websites (see Sect. 2.1). Secondly,
we demonstrated the ability of principal component analysis (PCA) to construct
corporate website characteristics that relate to the success rate of customer
conversion (see Sect. 2.2.3). Thirdly, we added expert knowledge to the reduced
feature space in order to cope with the information loss induced by dimensionality
reduction (see Sect 2.3). Finally, a hybrid ensemble is presented in an effort to
optimally approach the underlying relationship (see Sect. 2.4).
2 METHODOLOGY

Corporate websites are identified by means of a multilayer web crawling algorithm.
Additionally, unstructured textual information from these websites is extracted and
represented as a vector space model after a text preparation phase. This high-order
structure is projected onto a lower dimensional space through the application of PCA.
Next, expert knowledge is added through the construction of new predictors in order
to compensate for information loss. Finally, a hybrid ensemble is built upon this new
feature set in order to uncover a pattern between a lead’s website characteristics and
the probability of customer conversion. Fig. 1 shows the methodology of this approach.
Figure 1: Methodology
2.1 Web mining

In order to retrieve companies’ websites, a web mining approach is applied, allowing
automated data collection. Based on an actual data set containing companies’ names
and locations, the internet is crawled in search of the corresponding websites. The web
crawling technique comprises the parsing of HTML files found on the web, resulting in
structural HTML trees. These hierarchical representations of web documents allow
web miners to query the content by means of XPath expressions, which enable them
to find the desired pieces of information.
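To make this concrete, the following minimal R sketch parses a single page into an HTML tree and queries it with XPath, using the RCurl and XML packages named in the acknowledgements; the URL is purely illustrative.

    library(RCurl)
    library(XML)

    # Illustrative sketch: download one page, parse it into an HTML tree and
    # query the tree with XPath (the URL is a placeholder).
    html <- getURL("http://www.example.com", followlocation = TRUE)
    doc  <- htmlParse(html, asText = TRUE)

    links     <- xpathSApply(doc, "//a/@href")                 # targets of all hyperlinks
    body_text <- xpathSApply(doc, "//body//text()", xmlValue)  # visible text nodes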
2.1.1 Identification of corporate websites

The identification of a company’s website depends upon two consecutive stages.
Firstly, a pool of plausible websites is generated by means of the company’s business
name. Secondly, the city of the firm’s establishment is used to identify and collect the
right corporate website. This principle is applied in three successive search
approaches and is illustrated in Fig. 2.
As a first attempt to identify the website of a company, alternative URLs are generated
based on its name since a firm’s business name usually corresponds with the firm’s
domain name (e.g. www.company-name.com). Therefore, the company’s business
name and several meaningful variations of it are converted into a set of candidate
URLs. Existing websites are crawled and collected if the city or corresponding postal
code of the firm’s establishment is identified in webpages that potentially contain this
information (e.g. ‘contact’, ‘sitemap’, ‘location’ etc.).
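As an illustration of this first search approach, the sketch below generates a few candidate URLs from a business name and keeps those whose content mentions the firm's city or postal code. All input values are illustrative, and the verification is simplified to a plain text search over the downloaded start page rather than over specific subpages such as 'contact' or 'sitemap'.

    library(RCurl)

    company     <- "Company Name BVBA"   # illustrative input record
    city        <- "Gent"
    postal_code <- "9000"

    # Candidate domains built from variations of the business name
    name_variants <- unique(c(gsub("[^a-z]", "", tolower(company)),
                              gsub(" ", "-", gsub("[^a-z ]", "", tolower(company)))))
    candidates <- as.vector(outer(paste0("http://www.", name_variants),
                                  c(".be", ".com"), paste0))

    # Keep candidates that resolve and whose content mentions the city or postal code
    live     <- Filter(url.exists, candidates)
    verified <- Filter(function(u) {
      page <- tryCatch(getURL(u, followlocation = TRUE), error = function(e) "")
      grepl(city, page, ignore.case = TRUE) || grepl(postal_code, page, fixed = TRUE)
    }, live)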
Companies whose websites were not identified are now passed to a Google query
generator. The first two corresponding Google search results pages are collected and
parsed, followed by the extraction of the hyperlinks by means of an XPath expression.
Links to information pages such as Gouden Gids, Kompass, Trendstop Knack and
Infobel are gathered and crawled since these pages could contain the URL of the
requested corporate website. XPath queries can be constructed and repeatedly used
to extract relevant data that is presented in these information pages because each of
these websites has its own specific and constant HTML structure. As a result, the
retrieved information webpages can be queried for the company’s name, city, postal
codes and URL. Finally, this URL is extracted when the firm’s name and city or postal
code correspond to the name and city or postal code found on the information page.
A final approach consists of crawling the highest ranked Google search results as
these are likely to contain the link to the correct company’s website. Whereas the
Google query in the previous search approach was constructed by a string
concatenation of the company’s name and city, search results are now generated by
a query solely comprising the business name. This eliminates the presence of websites
that only show a relationship with the firm’s city but not the firm itself. Next, the results
are filtered to remove several predefined irrelevant websites (e.g. ‘facebook’, ‘youtube’,
‘jobat’ etc.) as an extra measure to avoid the collection of faulty websites. Lastly, a
search result is extracted if the firm’s city or postal code is included in its crawled
content.
Figure 2: Corporate website identification
2.1.2 Data collection

Once the correct corporate website is identified, all its subdirectories are downloaded
and saved as HTML files. Next, each file is parsed, allowing the extraction of textual
content which does not contain markup tags and other irrelevant HTML objects. The
collected information from all subdirectories is then bundled into one plain text
document, which represents the entire textual content of a firm’s website. After this
process is repeated for all websites, a text corpus is created, containing all the firms’
text documents. This structured collection of textual data facilitates further text mining
operations.
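A minimal sketch of this collection step is given below: the pages of a firm's website are parsed, their visible text is concatenated into one plain text document per firm, and the documents are bundled into a tm corpus. The firm_pages list is an illustrative stand-in for the crawled subdirectories.

    library(RCurl)
    library(XML)
    library(tm)

    # 'firm_pages' is an illustrative stand-in: one character vector of crawled
    # subdirectory URLs per firm.
    firm_pages <- list(firm_a = c("http://www.example.com/",
                                  "http://www.example.com/contact"))

    firm_texts <- sapply(firm_pages, function(urls) {
      texts <- sapply(urls, function(u) {
        doc <- htmlParse(getURL(u, followlocation = TRUE), asText = TRUE)
        # keep only the visible text, dropping markup, scripts and styles
        paste(xpathSApply(doc, "//body//text()[not(ancestor::script) and not(ancestor::style)]",
                          xmlValue), collapse = " ")
      })
      paste(texts, collapse = " ")   # one plain-text document per firm
    })

    corpus <- Corpus(VectorSource(firm_texts))   # text corpus for further mining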
2.2 Text mining

Despite having a structured set of text documents, further operations need to be done
in order to deal with the unstructured nature of the documents themselves. Text mining
applications provide the techniques to automatically extract relevant information from
unstructured written resources (Gupta & Lehal, 2009). In particular, they allow the
transformation of a text corpus into a more meaningful text representation, suitable for
statistical analysis. The text mining techniques used in this study are discussed in the
next sections and are illustrated in Fig. 3.
2.2.1 Text preparation

In a first stage, several text cleansing procedures are conducted to prepare the text for
the subsequent text representation stage. At first, raw text cleansing is done and
encompasses the removal of numbers, punctuation, whitespace and special
characters as these bear no content information. Additionally, text is converted into
lower case, which avoids the occurrence of duplicate terms in the final text collection.
The last step consists of removing extremely common words, also known as stop
words, as they have little or no discriminative power with respect to the response
variable (Thorleuchter, Van den Poel, & Prinzie, 2010). Several predefined lists exist
that contain language-specific stop words and are used in this study for the stop word
elimination process.
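Using the tm package named in the acknowledgements, this cleansing pipeline can be sketched as follows; the choice of Dutch and English stop word lists is an assumption based on the Belgian context of the data.

    library(tm)

    # Cleansing pipeline applied to the corpus built from the crawled websites.
    corpus <- tm_map(corpus, content_transformer(tolower))      # avoid duplicate terms
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("dutch"))   # language is an assumption
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stripWhitespace)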
Figure 3: Text mining stages
2.2.2 Text representation

After the text preparation stage, the remaining text has to be transformed to a
representation that can be processed by computers. This is generally accomplished
by employing the bag-of-words approach. This process converts a text collection into
a vector space model where each document is represented as a vector with entries for
each term that occurs in the whole text collection. The values of the entries are
determined by the number of times the terms appear in the specific document (Silva
et al., 2003). Despite its simplicity, several experiments found that this approach does
not perform worse than more sophisticated representation techniques (Apté, Damerau,
& Weiss, 1994; Dumais, Platt, Heckerman, & Sahami, 1998; Lewis, 1992). Therefore,
this study uses this method for the text representation process, though with the addition
of a more advanced weighting scheme regarding the term values, as this significantly
improves classification performance (Sparck Jones, 1972). The idea is that terms
should be weighted according to their importance in the whole text collection, rather
than to their occurrence in a single document. The term frequency is hence multiplied
by the inverse document frequency (i.e. a coefficient that expresses the uniqueness of
the term in the document collection). Finally, a normalization factor is added to ensure
that each document has an equal chance to be retrieved regardless of its length. The
term weighting formula is:

w = tf × idf × N    (1)

where tf is the term frequency, idf is the inverse document frequency, and N is the
normalization factor (Salton & Buckley, 1988). The result is a weighted document-by-
term matrix wherein each row represents a firm’s website and each column a term
occurring in the cleansed textual content.
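With tm, the weighted document-by-term matrix can be obtained directly, as in the minimal sketch below; weightTfIdf combines the term frequency, inverse document frequency and length normalization described above.

    library(tm)

    # Weighted document-by-term matrix: tf-idf with length normalization.
    dtm <- DocumentTermMatrix(corpus,
             control = list(weighting = function(x) weightTfIdf(x, normalize = TRUE)))
    dim(dtm)   # one row per corporate website, one column per term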
2.2.3 Dimensionality reduction

The weighted document-by-term matrix is a structured representation of the
unstructured textual content, allowing machines to process it. However, in this case,
the feature set first needs to be reduced since the number of terms is too large to derive
a pattern from the data. In order to solve this problem automatically, dimensionality
reduction techniques can be applied on the original data set. The new feature space
facilitates the construction of a much simpler model, improving the classifier’s
performance and reducing the learning process time (Eyheramendy & Madigan, 2005;
Silva et al., 2003; Yang et al., 1997; Tang et al., 2005). In practice, two major types of
dimensionality reduction techniques are commonly used.
Feature selection techniques aim at finding a subset of the most descriptive terms
(Eyheramendy et al., 2005). For this research, a term filter is used to remove sparse
terms. These are words exceeding a specific sparse percentage, i.e. the percentage
of documents where the word does not occur. Because these terms bear little to no
information with respect to the entire document collection, they are removed from the
data, resulting in a subset of the most relevant features (see Fig. 4).
Figure 4: Term filtering
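The term filter just described can be applied with tm's removeSparseTerms, as in the sketch below; the 99% sparsity threshold is illustrative, since the thesis does not report the exact value used.

    library(tm)

    # Drop terms that are absent from more than 99% of the documents
    # (the threshold is illustrative).
    dtm_filtered <- removeSparseTerms(dtm, sparse = 0.99)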
A more sophisticated dimensionality reduction technique exists that transforms the
high dimensional space into a subspace by means of a linear or nonlinear combination
of the original features. This approach, also known as feature extraction, thus results
in a set of newly created features (Tang et al., 2005). The dimensionality reduction
technique used in this study is PCA, which orthogonally transforms the native feature
space into a set of new variables that are the closest fit to the observations, hence
maximizing the variance in the data (Wold, Esbensen, & Geladi, 1987). These new
features are called principal components (PC) and are linear combinations of the
original variables weighted by their loadings, which describe the directions along which the data
varies the most (see Fig. 5). The first principal component explains more variance in
the data than the second principal component and so forth, under the constraint that
they are all orthogonal, thus uncorrelated. The values in the new feature space are
obtained by the orthogonal projection of the original observations onto the principal
components (Abdi & Williams, 2010). So the score z for an observation i on the
principal component k is then:

z_ik = Σ_j θ_jk · x_ij    (2)

where θ_jk is the loading of PC_k on variable j and x_ij is the value of observation i for
variable j.
Figure 5: Principal Component Analysis
The combination of term filtering and PCA results in the reduced feature set that is
used to build the prediction model. This subspace is composed of a specific number
of principal components together with the corresponding scores. Since the principal
components group together related terms, they are actually describing different
concepts. The decision regarding the number of principal components is therefore very
important as these concepts represent the corporate website characteristics in this
particular study. Too many principal components would result in the incorporation of
irrelevant characteristics while important characteristics are possibly not considered
with too few principal components. The optimal number of principal components is
determined by building and evaluating prediction models for several dimensions. This
procedure is explained in Sect. 3.2.
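The following sketch applies PCA to the filtered matrix with base R's prcomp and retains the 19 components selected in Sect. 3.2; centering without scaling is an assumption, as the thesis does not state the exact settings.

    # PCA on the filtered document-by-term matrix with base R; centering without
    # scaling is an assumption. k = 19 follows the selection procedure of Sect. 3.2.
    x   <- as.matrix(dtm_filtered)
    pca <- prcomp(x, center = TRUE, scale. = FALSE)

    k        <- 19
    scores   <- pca$x[, 1:k]          # component scores per website
    loadings <- pca$rotation[, 1:k]   # term loadings defining each characteristic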
2.3 Incorporating expert knowledge

The disadvantage that accompanies dimensionality reduction is the potential loss of
information (Sebastiani, 2002). This can be countered by the incorporation of domain-
specific expert knowledge (Baesens, Mues, Martens, & Vanthienen, 2009). In
particular, business expertise can be translated into the construction of new variables
that are expected to provide predictive power to the response variable. The knowledge
fusion of reduced data and domain expertise thus results in increased model accuracy
(Martens et al., 2006).
In this research, the reduced feature space is augmented with predictors that are
expected to relate to the success rate of customer conversion. Firstly, a dummy
variable is created, indicating whether or not contact information such as a telephone
number, an email address or a contact form can be found on the firm’s website. This
information is very valuable for sales representatives as it allows them to contact their
leads regarding the marketing offer. Secondly, a social media activity variable is
constructed to indicate the extent to which companies are open to being approached by
external parties. Firms whose websites contain hyperlinks to social media pages such
as Facebook, Twitter or LinkedIn are marked active on social media. Lastly, the region
where the firms are established is extracted as some firms prefer to do business within
their own region. Table 1 gives an overview of all the variables used in this research.
Table 1: Variables used in research
Variable name         Description

Dependent variable
Target                Binary variable indicating whether the company has successfully converted into a customer

Independent variables
PC 1 … PC k           Principal components representing corporate website characteristics
Contact               Binary variable indicating whether the company is contactable through its website
Social media          Binary variable indicating whether the company is active on social media
Region                Binary variable indicating whether the company is located in Flanders
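The three expert-knowledge variables of Table 1 could be derived along the lines of the sketch below, where firm_html and postal_code are hypothetical names for the raw website content and the postal code of each firm, and the regular expressions and postal-code ranges are simplifying assumptions.

    # Illustrative derivation of the expert-knowledge variables; 'firm_html' and
    # 'postal_code' are hypothetical vectors (raw website content and postal code
    # per firm), and the patterns and ranges below are simplifying assumptions.
    contact <- as.integer(grepl("mailto:|tel:|contact", firm_html, ignore.case = TRUE))

    social_media <- as.integer(grepl("facebook\\.com|twitter\\.com|linkedin\\.com",
                                     firm_html, ignore.case = TRUE))

    pc <- as.integer(postal_code)
    region_flanders <- as.integer((pc >= 1500 & pc <= 3999) | (pc >= 8000 & pc <= 9999))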
2.4 Predictive modeling

The final stage consists of approaching the underlying relationship between a
company’s website features and the probability of its conversion. This pattern could be
used to classify new leads, improving marketing effectiveness. In this study, several
prediction models are fit onto the final data set and are combined into a hybrid
ensemble. This technique uses a set of different learners and combines their
predictions in order to classify new instances. Ensembles generally yield better
performance than single classifiers for several reasons. First of all, predictions of
classifiers used in the ensemble are aggregated, which reduces the risk of
misclassification. Secondly, different representable functions approximating reality
are combined, which improves the approximation of the true function. Finally, several
fitting procedures are applied, reducing the chance of getting stuck in local optima
(Dietterich, 2000). In practice, there are two main strategies used for creating
ensembles. The first strategy is data-induced and consists of fitting a model on several
manipulations of the original data. The obtained learners are combined in an
aggregated model which classifies unseen observations based on the average
predictions. This technique reduces the variance of a model, resulting in more reliable
results. The algorithm-induced strategy focuses on the variation of the learning
algorithms instead, and hence increases the diversity of the ensemble. The idea is that
classifiers producing wrong votes are compensated by those that do make the right
decisions (Banfield, Hall, Bowyer, & Kegelmeyer, 2005). This study combines both
strategies into a hybrid ensemble in an effort to optimize the model fit (see Fig. 6). The
different machine learning algorithms used to construct the hybrid ensemble are briefly
discussed in the next sections.
Figure 6: Hybrid ensemble

2.4.1 Regularized logistic regression

Logistic regression is a parametric supervised learning technique especially used for
binary classification. The model makes use of the logit function to obtain predicted
probabilities with respect to the response variable and is given by:

P(y = 1 | x) = 1 / (1 + e^-(β0 + β1x1 + … + βpxp))    (3)
The regression coefficients are estimated by maximum likelihood (Allison, 1999). In
practice, logistic regression is very popular as it is a fairly
easy, quick and robust modelling technique (DeLong, DeLong, & Clarke-Pearson,
1988; Greiff, 1998). In case of a large number of predictors however, logistic
regression tends to overfit the data, hence describing the sample’s random noise
instead of approaching the underlying relationship. In order to avoid this, the model
can be regularized by shrinking coefficients towards zero by means of the incorporation
of a complexity penalty parameter (Le Cessie & Van Houwelingen, 1992). The result
is a model comprising fewer predictors, which reduces complexity and avoids overfitting.
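A minimal sketch with the glmnet package (listed in the acknowledgements) is given below; x and y stand for the predictors of Table 1 and the binary conversion outcome, and the choice of a pure ridge penalty (alpha = 0) is an assumption.

    library(glmnet)

    # Ridge-penalized logistic regression; lambda is tuned by cross-validation and
    # shrinks the coefficients towards zero. 'x' (a numeric matrix of predictors)
    # and 'y' (the binary conversion outcome) are illustrative names.
    fit <- cv.glmnet(x, y, family = "binomial", alpha = 0)
    coef(fit, s = "lambda.min")                              # shrunken coefficients
    p   <- predict(fit, newx = x, type = "response", s = "lambda.min")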
2.4.2 Random forest

Random forest is a non-parametric modelling technique in which several decision trees
are built upon multiple bootstrap samples (Breiman, 2001). This process produces a
collection of similar training sets obtained by taking random samples with replacement
of the original data. Each bootstrap is then used for fitting a decision tree, which is
grown by only considering a random subset of the available features at each splitting
node. The result is a large set of decorrelated decision trees which are combined to
obtain average predictions. These characteristics make random forest a relatively
stable and diverse ensemble (Breiman, 1996). Fig. 7 illustrates this modelling
technique.
Figure 7: Random forest
2.4.3 Rotation forest

Like random forest, rotation forest also creates an ensemble based on bootstrap
aggregation and random feature selection. The algorithm generates a set of random
feature subspaces and draws bootstrap samples from each of these subspaces. Next,
PCA is applied on each bootstrap and decision trees are trained on these
reconstructed feature sets. The resulting classifiers are combined into a final ensemble
which is both diverse and accurate (Rodriguez, Kuncheva, & Alonso, 2006).
2.4.4 AdaBoost

Like the two previously explained methods, AdaBoost relies on data manipulation to
generate predictions. The algorithm builds a weak learner (e.g. decision tree) on the
original data and iteratively improves its performance by fitting the learner on the same
data multiple times, but with increased weights for incorrectly classified examples. This
enforces subsequent learners to focus on getting the more difficult cases right. The
final model is a weighted sum of all the constructed classifiers with their performances
as weighting criteria (Freund & Schapire, 1999). Despite the fact that this technique is
quite slow in comparison with other classifiers, it manages to achieve significant
performance improvements. In fact, Breiman (1996) called AdaBoost one of the best
performing classifiers in the world. The procedure is shown in Fig. 8.
Figure 8: AdaBoost
2.4.5 Support Vector Machine

The last model included in the hybrid ensemble is the Support Vector Machine (SVM)
(Vapnik, 1995; Vapnik, 1998a; Vapnik, 1998b). This algorithm searches for a hyperplane
that optimizes the separation of the data according to their class labels. Often however,
observations are not linearly separable. In this case, observations can be mapped onto
a higher dimensional space which does allow linear separation. The dimensionality
transformation is accomplished by applying a kernel function to the original data and
is succeeded by the training of an SVM in the newly obtained feature space (Lodhi,
Saunders, Shawe-Taylor, Cristianini, & Watkins, 2002). The result is an optimal
hyperplane that maximizes the distances between the different groups of observations
and can be reused for classifying new data that underwent the same transformation
rule. Fig. 9 demonstrates this approach.
Figure 9: Support Vector Machine
Finally, the different models are joined into a hybrid ensemble and combined with a
specific combination rule. Predictions are often weighted according to the
performances of the corresponding classifiers on a validation set. The higher a
classifier’s validation performance, the higher its weight in future predictions. This
performance-based voting can lead to a significant improvement of the ensemble’s
accuracy as it increases the influence of the models that best approximate the
underlying relationship. With a small data set however, one needs to be careful with
applying these heuristics. Some models may perform extremely well on the validation
set by chance, yet generalize poorly. In order to avoid this,
a simpler combination rule can be applied where each classifier is assigned an equal
weight. Given the limited size of the extracted data set in this research, the
single classifiers’ predictions are combined by a simple average.
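The equal-weight combination rule can be illustrated with the toy sketch below, which averages the predicted conversion probabilities of two of the five base learners on simulated data; in the thesis the AdaBoost, rotation forest and SVM probabilities are averaged in exactly the same way.

    library(glmnet)
    library(randomForest)

    # Toy data standing in for the principal component scores and the
    # expert-knowledge dummies (roughly matching the 12% conversion rate).
    set.seed(1)
    x <- matrix(rnorm(500 * 22), ncol = 22)
    y <- factor(rbinom(500, 1, 0.12))

    fit_lr <- cv.glmnet(x, y, family = "binomial")
    p_lr   <- as.numeric(predict(fit_lr, newx = x, type = "response", s = "lambda.min"))

    fit_rf <- randomForest(x, y, ntree = 500)
    p_rf   <- predict(fit_rf, newdata = x, type = "prob")[, "1"]

    # Equal-weight combination rule: simple average of the predicted probabilities;
    # the remaining base learners' probabilities would be added in the same way.
    p_ensemble <- rowMeans(cbind(p_lr, p_rf))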
2.5 Model evaluation criteria

Once a hybrid ensemble is constructed on training examples, it needs to be evaluated
in order to determine whether company characteristics, extracted from corporate
websites and uncovered by PCA, can be used to predict customer conversion
probabilities. In this study, the performance of the prediction model is assessed by the
area under the receiver operating characteristic curve (AUC) and the lift, two commonly
used performance measures in practice.
The AUC is a measure that reduces the receiver operating characteristic (ROC) curve
into a single figure (Hanley & McNeil, 1982). This curve visualizes the performance of
a model by means of plotting the sensitivity versus (1-specificity) for the entire range
of decision thresholds. These performance measures can be derived from the
confusion matrix, which contains the model’s TP (true positives), TN (true negatives),
FP (false positives) and FN (false negatives). The sensitivity (TP/(TP+FN)), also called
the true positive rate, is the proportion of positive examples that are correctly classified
by the model. The specificity (TN/(TN+FP)), also called the true negative rate, is the
proportion of negative examples that are correctly classified by the model. Since a
model’s output comprises a list of examples ranked according to their predicted class
probabilities, decision makers need to decide which group to target for further
investigation. However, as this decision boundary varies, the previously mentioned
performance measures vary as well. This means that a model’s performance cannot
be determined for a single threshold since it is unknown how these measures will
evolve as this threshold is changed (Bradley, 1997). For this reason, evaluation
measures need to be aggregated for all possible operating points in order to obtain a
model’s overall accuracy. The AUC does this by representing the surface underneath
the ROC curve and ranges from 0.5 to 1, with 0.5 being a random model and 1 being
a perfect model (Hanley et al., 1982). The actual meaning of this number is the
probability that a randomly chosen positive example is ranked higher than a randomly
chosen negative example. As the AUC reduces a model’s overall performance to a
single figure, it can be used for model comparison and selection (Bradley, 1997). This
research used the AUC for selecting the optimal dimension as well as constructing a
fine-tuned hybrid ensemble. These procedures are explained in detail in Sect. 3.2.
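With the pROC package, the AUC of a set of predicted probabilities can be computed as sketched below; y_test and p_ensemble are illustrative names for the observed conversions and the ensemble's predictions.

    library(pROC)

    # AUC of the predicted conversion probabilities against the observed outcomes;
    # 'y_test' and 'p_ensemble' are illustrative names.
    roc_obj <- roc(response = y_test, predictor = p_ensemble)
    auc(roc_obj)    # area under the ROC curve
    plot(roc_obj)   # sensitivity versus 1 - specificity
    # roc.test() can compare two correlated ROC curves (DeLong et al., 1988)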
Another criterion, commonly used in practice, is the lift, which is a measure of model
effectiveness. In particular, the lift measures the ratio between a model’s response rate
for a specific target and the average response rate for the entire population. Suppose
for example that there are 10% churners in an entire customer base and that a certain
model is able to identify 50% of the churners in a particular customer segment. This
model then yields a lift of 5 (50%/10%) for this segment. If this exercise is performed
for every percentage of the population targeted, the cumulative lift curve is obtained,
which indicates how well a model performs compared to the baseline. This can be
particularly useful for marketing purposes, as it allows decision makers to segment
their market and target those groups with a high density of positive responders. In this
research, the cumulative lift curve represents the hybrid ensemble’s effectiveness in
identifying the right leads according to several decision thresholds (see Sect. 3.3).
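The top-decile lift can be computed directly from the predictions, as in the small sketch below, where p holds the predicted probabilities and y the observed 0/1 conversions (illustrative names).

    # Top-decile lift: the response rate in the targeted decile divided by the
    # overall response rate, computed from predicted probabilities 'p' and
    # observed 0/1 outcomes 'y'.
    top_decile_lift <- function(p, y, decile = 0.1) {
      cutoff   <- quantile(p, probs = 1 - decile)
      targeted <- y[p >= cutoff]
      mean(targeted) / mean(y)
    }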
3 EMPIRICAL VERIFICATION
3.1 Research data

In this paper, we used data obtained from an anonymous Belgian energy supplier. The
data set was created based upon the results of a marketing campaign, containing
information about the targeted leads’ business names and the location of their
establishments, as well as their binary customer conversion outcomes. In order
to discover a relationship between these leads’ characteristics and their customer
conversion rates, textual content from corporate websites was extracted. From the
3507 targeted leads, 1284 corresponding corporate websites were identified of which
1272 could be used for further research. The collection of unstructured web content
was cleansed and transformed to a structured website-by-term matrix and augmented
with predictors constructed through expert knowledge. This data was then randomly
split into a training, validation and test set. The optimal dimension and fine-tuned model
parameters were derived using the training and validation set (see Sect. 3.2). The test
set was used to estimate the performance of the final ensemble (see Sect. 3.3). The
characteristics of the different data sets are summarized in Table 2.
                          Number of leads    Relative percentage
Training set
  Converted leads                62                12.18%
  Not-converted leads           447                87.82%
  Total                         509               100%
Validation set
  Converted leads                49                12.86%
  Not-converted leads           332                87.14%
  Total                         381               100%
Test set
  Converted leads                47                12.30%
  Not-converted leads           335                87.70%
  Total                         382               100%

Table 2: Characteristics of the marketing data
3.2 Optimal dimensionality and model selection

Once the websites were collected, they were transformed into a high-dimensional
website-by-term matrix by means of a text preparation and representation phase.
Subsequently, PCA was applied in an effort to construct a set of principal components,
describing corporate website characteristics. The final data set was eventually
obtained by augmenting the PCA subspace with variables indicating whether
companies were approachable by external parties through their websites or social
media channels, along with the region of their establishments. Per dimension, a
validation process was applied in which each model, discussed in Sect. 2.4, was
trained on the training set and evaluated on the validation set for several parameter
configurations. The parameters that yielded the highest model performance on the
validation set were extracted together with the corresponding AUC. The result was a
selection of optimized models per dimension, along with their validation AUCs. The
averages of these AUCs were used to compare and select the optimal number of
principal components.
As illustrated in Fig. 10, the model performance strongly increased up to 19 principal
components. At that point, the performance yielded an average validation AUC of 0.63.
In the range of 20-24 principal components, the AUC dropped to approximately 0.59.
From 25 principal components on, the performance recovered and started to fluctuate
around an AUC of 0.63 for the remaining dimensions. A maximal AUC of 0.64 was
reached at 34 principal components. However, as dimensionality increased, model
complexity increased as well. Compared to the model built on a subspace containing
19 principal components, this model was much more complex while it hardly achieved
a higher AUC. In general, a simpler model that yields approximately the same performance
as a more complex one will generalize better to future data. Furthermore, model complexity
reduces the readability and the interpretability of the model (Baesens et al., 2009).
Therefore, 19 principal components were used for representing the customer
conversion related characteristics.
Figure 10: Model performance as a function of dimensionality
(Line chart: average validation AUC (model performance) plotted against the number of principal components, i.e. company characteristics.)
3.3 Results

The optimized parameters and most discriminative company characteristics, obtained
in the dimensionality selection stage, were used to build a final hybrid ensemble. A
basic model was built upon the feature space solely containing the corporate website
characteristics. An extended model was built upon the same feature space, but
augmented with predictors that resulted from expert knowledge. Both ensembles were
fit on a fusion of the training and validation set and were evaluated on test examples
by comparing the companies’ predicted conversion probabilities against their actual
success rates of customer conversion. Finally, both models were compared with each
other in order to assess the predictive leverage of the expert knowledge variables.
Fig. 11 and Fig. 12 illustrate that the two models perform better than the random model,
as both figures delineate model performance curves that are situated above the
baseline. These improvements were significant (Z = 4.4085 and p < 0.001 for the basic
model, Z = 4.5953 and p < 0.001 for the extended model). This means that both models
succeeded in uncovering a relationship between a company’s website characteristics
and whether it positively responded to the marketing campaign or not. Both figures
also show that the model built on the augmented feature space outperformed the
model that was exclusively fit on corporate website characteristics. The addition of the
expert knowledge predictors to the PCA subspace increased the AUC from 0.656 to
0.694, which translates into an ROC curve located further from that of the random model.
The extended model’s overall ability to distinguish promising leads from low-potential
leads is thus higher than that of the model without the self-constructed features. This
is especially the case for the first five deciles where the extended model’s cumulative
lift curve is located far above the basic model’s curve. In the first decile, the lift
increased from 1.283 to 2.139, meaning that the hybrid ensemble, built on a feature
space comprising corporate website characteristics and expert knowledge predictors,
is able to identify approximately 21% of the positive responders in the top 10 percentile
of the entire population. Targeting the top 30 percentile would result in the identification
of approximately 53% of all positive responders as the extended ensemble’s lift yields
a value of 1.77 in the third decile.
Figure 11: Cumulative lift curves
Figure 12: ROC curves
(Figure 11 plots the cumulative lift against the targeted bucket; Figure 12 plots sensitivity against 1 − specificity. Both figures compare the model with expert knowledge and the model without expert knowledge, with the random model as baseline.)
                   Model with expert knowledge   Model without expert knowledge
AUC                          0.694                         0.656
Top-decile lift              2.139                         1.283

Table 3: AUC and top-decile lift
4 CONCLUSION

Customer acquisition is a time-consuming and cost-intensive process as only certain
leads will actually convert (Cooper & Budd, 2007; Patterson, 2007; Yu & Cai, 2007).
Consequently, companies often focus on doing business with existing customers
rather than searching for new ones on a regular basis (Rygielski et al., 2002). Today,
this marketing strategy may no longer suffice. On the one hand, the increasingly
competitive environment provides customers with valuable alternatives to fulfil their
sophisticated needs (Shaw et al., 2001). On the other hand, customers become more
informed about these competitive offerings due to the flourishing of the World Wide
Web (Rygielski et al., 2002). Attracting new customers thus becomes a critical success
factor for modern organizations (Thorleuchter et al., 2012; Wilson, 2006). In a B2B
context, web crawling activities could facilitate the acquisition process as modern
companies often provide information concerning their specific businesses on their
websites (Thorleuchter et al., 2012). This information could be valuable to determine
whether a company would be a suitable target for further acquisition purposes.
In this study, we set out to (1) determine whether textual content extracted from
corporate websites could be used to identify promising leads and (2) whether the
incorporation of expert knowledge would improve the acquisition model. The data we
used contained information about a Belgian company’s marketing campaign situated
in a B2B context. This enabled us to crawl companies’ websites and to analyze
whether companies’ features extracted from web content related with their actual
marketing campaign responses. The websites were collected by means of a multilayer
web crawling algorithm. Text mining applications such as text preparation and
representation were used to transform the unstructured web content into a structured
and more manageable form. PCA was subsequently applied in a dimension reduction
phase and resulted in a feature subspace representing groups of frequently related
terms. Lastly, we incorporated expert knowledge by investigating whether companies
were approachable through their website or social media as this may facilitate
acquisition activities. In addition, we determined the region of the leads’ establishments
as this could influence their preferences towards certain business relations.
This research showed that a well-chosen set of company characteristics, derived from
textual information extracted from corporate websites, does provide discriminative
power concerning the success rate of customer conversion. The underlying
relationship was uncovered by a hybrid ensemble constructed through the combination
of several diverse machine learning algorithms. The ensemble’s ability to identify
promising leads was even greater when the companies’ characteristics were augmented
with predictors derived from domain expertise. Moreover, the
extended ensemble succeeded in detecting 53% of all positive responders by targeting
30% of the entire population. As a result, the framework presented in this research
could assist B2B sales representatives in identifying promising leads. It enables
marketers to manage a targeted marketing approach that is more effective and
efficient. This could be especially beneficial for businesses confronted with low
conversion ratios or constrained by limited marketing budgets.
5 LIMITATIONS AND FURTHER RESEARCH

We would like to mention that this research was conducted in a specific B2B context
based on the data of a Belgian energy supplier. Similar analyses should be done in
different market settings in order to generalize the findings of this study. Additionally,
we augmented the PCA subspace with three variables that were expected to relate
with a company’s conversion probability. Further research could investigate whether
more discriminative variables exist to predict a lead’s marketing response.
Furthermore, the machine learning algorithms are not restricted to the ones used in
this research for constructing the hybrid ensemble. Other experiments may give more
insight into which combination of machine learners and parameter settings is able to
render the best performance. We would also like to stress that we used principal
component analysis for the dimension reduction process. Other research could
investigate whether other feature extraction techniques, such as latent semantic
indexing, yield better results.
ACKNOWLEDGEMENTS

We would like to thank the anonymous Belgian energy supplier for providing us with a
data set that made this research possible. In addition, we would like to thank Prof. Dirk
Van den Poel, Jeroen D’Haen and Tijl Carpels for their support and suggestions during
this study. For this research, we used R as a programming language and software
environment, as it is one of today’s leading tools for data analysis. Furthermore, R is freely
available, platform-independent, open-source and has a large community of users,
resulting in an extensive amount of available packages submitted by experts in their
respective fields. The RCurl package was used in combination with the XML package
for web crawling activities. The former allows one to extract information resources from
the web, whereas the latter provides functionalities to interpret these retrieved web
documents. The tm package was used for several text mining purposes such as text
cleansing, representation and term filtering. Principal component analysis was
performed with functions available in base R. The hybrid ensemble was
built with different models that were implemented by means of the packages ada,
glmnet, randomForest, rotationForest and e1071. Finally, the model performance was
evaluated by means of the AUC, lift and pROC packages.
BIBLIOGRAPHY Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433-459. Allison, P. D. (1999). Logistic Regression using the SAS System: Theory and Application. Cary: SAS Institute Inc. Apté, C., Damerau, F., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS), 12(3), 233-251. Baecke, P., & Van den Poel, D. (2010). Improving purchasing behavior predictions by data augmentation with situational variables. International Journal of Information Technology & Decision Making, 9(6), 853-872. Baesens, B., Mues, C., Martens, D., & Vanthienen, J. (2009). 50 years of data mining and OR: upcoming trends and challenges. Journal of the Operational Research Society, S16-S23. Banfield, R. E., Hall, L. O., Bowyer, K. W., & Kegelmeyer, W. P. (2005). Ensemble diversity measures and their application to thinning. Information Fusion, 6(1), 49-62. Bauer, H. H., Grether, M., & Leach, M. (2002). Building customer relations over the Internet. Industrial Marketing Management, 31(2), 155-163. Bose, I., & Mahapatra, R. K. (2001). Business data mining—a machine learning perspective. Information & management, 39(3), 211-225. Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition, 30(7), 1145-1159. Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-140. Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32. Cooper, M. J., & Budd, C. S. (2007). Tying the pieces together: A normative framework for integrating sales and project operations. Industrial Marketing Management, 36(2), 173-182. Coussement, K., & Van den Poel, D. (2009). Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers. Expert Systems with Applications, 36(3), 6127-6134. DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(1), 837-845.
30
D’Haen, J., Van den Poel, D., & Thorleuchter, D. (2013). Predicting customer profitability during acquisition: Finding the optimal combination of data source and data mining technique. Expert systems with applications, 40(6), 2007-2012. Dietterich, T. G. (2000). Ensemble methods in machine learning. In Kittler, J., & Roli, F. (Eds.), Proceedings of the First International Workshop on Multiple Classifier Systems (pp. 1–15). Berlin: Springer. Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Makki, K., & Bouganim, L. (Eds.), Proceedings of the Seventh International Conference on Information and Knowledge Management (pp. 148-155). New York: ACM Press. Dwyer, F. R., Schurr, P. H., & Oh, S. (1987). Developing buyer-seller relationships. The Journal of marketing, 51(1), 11-27. Eyheramendy, S., & Madigan, D. (2005). A novel feature selection score for text categorization. In Proceedings of the Workshop on Feature Selection for Data Mining, in conjunction with the 2005 SIAM International Conference on Data Mining (pp. 1-8). Freund, Y., & Schapire, R. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780. Gentsch, P., & Hänlein, M. (1999). Text mining. Das Wirtschaftsstudium (WiSu), 28(1), 1646-1653. Greiff, W. R. (1998). A theory of term weighting based on exploratory data analysis. In Croft, W. B., Moffat, A., van Rijsbergen, C. J., Wilkinson, R., & Zobel, J. (Eds.), Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 11-19). New York: ACM Press. Gupta, V., & Lehal, G. S. (2009). A survey of text mining techniques and applications. Journal of emerging technologies in web intelligence, 1(1), 60-76. Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29-36. Hill, L. (1999). CRM: easier said than done. Intelligent Enterprise, 2(18), 53-55. Hotho, A., Nürnberger, A., & Paaß, G. (2005). A Brief Survey of Text Mining. Journal for Computational Linguistics and Language Technology, 20(1), 19-62. Isaac, S & Tooker, R. N. (2001). The many faces of CRM. LIMRA’s marketFacts Quarterly, 20(1), 84-88. Kosala, R., & Blockeel, H. (2000). Web mining research: A survey. SIGKDD Explorations, 2(1), 1-15.