
Accelerated Learning of User Profiles

Pelin Atahan and Sumit Sarkar

School of Management, The University of Texas at Dallas

[email protected], [email protected]

March 12, 2009

Websites typically provide several links on each page visited by a user. While some of these links help

users easily navigate the site, others are typically used to provide targeted recommendations based on the

available user profile. When the user profile is not available (or is inadequate), the site cannot effectively

target products, promotions and advertisements. In those situations, the site can learn the profile of a user

as the user traverses the site. Naturally, the faster the site can learn a user’s profile, the sooner the site can

benefit from personalization. We develop a technique that sites can use to learn the profile as quickly as

possible. The technique identifies links to make available that will lead to a more informative profile

when the user chooses one of the offered links. Experiments conducted using our approach demonstrate

that it enables learning the profiles markedly better after very few user interactions as compared to

benchmark approaches.

Key words: personalization; Bayesian learning; information theory

1. Introduction

Technological advances in recent years have enabled firms to provide personalized services and products

to customers over the Internet. As a result, personalization systems are becoming an essential part of

online businesses (Yang and Padmanabhan 2005). These systems enable firms to customize the content,

links, product recommendations, and advertisements on their website based on information available

about the users. Content providers such as Yahoo.com customize links to content based on inferred user

preferences. E-commerce sites such as Amazon.com and Overstock.com use personalization technologies

to make specific product recommendations to their users. Advertising networks such as RightMedia and

Specific Media help firms target the right product, promotion and message to customers. Firms like

YesMail help their client firms deliver personalized emails to their clients’ customers.


Personalized offerings add value to users by providing them with relevant content. Delivering

relevant content improves the user experience and can help impact business outcomes. The added value

from personalization can result in increased sales and cash flows, as well as increased customer loyalty

(Ansari and Mela 2003). A 2007 survey reported by the market intelligence company Aberdeen finds that

companies that actively personalize the online experience benefit from increased revenues (http://software.tekrati.com/research/9395).

In order to provide effective personalization in the form of relevant content, a firm needs to collect

appropriate profiling information about each user and then exploit this information to recommend content,

products or services that are best matched to the users’ specific needs. Naturally, a firm’s ability to make

good recommendations depends not only on the technologies employed, but also on the user profiles

available. The Aberdeen report points out that well-designed personalization strategies should utilize

appropriate segmentation and user profiles.

The profile information needed about the users depends on the type of personalization application.

For many target marketing purposes, profiles that contain information on the user’s membership in

various market segments are appropriate. Markets are typically segmented based on demographic,

psychographic and behavioral attributes of customers (Kotler 2003). As a firm’s marketing mix decisions

are based on its target markets, profiles containing such attributes simplify the process of matching the

right product or advertisement to the right customer. While all the aforementioned types of profile

attributes (profiles for short) are valuable for target marketing applications, we concentrate on

demographic and psychographic profile attributes. These profiles are commonly used in media planning

and selection for targeting advertisements (Cannon 2001, Gal-Or and Gal-Or 2005, Iyer et al. 2005). Such

profiles are particularly valuable to firms in online environments, since firms can target customers

individually. For instance, online advertisers are willing to pay a premium for demographic and

psychographic targeting capabilities. At wsj.com and its affiliated sites, an advertisement that is


demographically targeted is valued twice as much as the same advertisement when it is not targeted.1

Search engines also charge a premium for clicks of sponsored advertisements that are targeted towards

advertiser-specified segments based on such attributes (MSN AdCenter calls it incremental bidding while

Google calls it demographic targeting). Retailers can also benefit from knowing demographic and

psychographic profile attributes of their prospective customers. These attributes are extensively used in

making recommendations (Pazzani 1999, Ansari et al. 2000, Vozalis and Margaritis 2007), as well as in

predicting purchase behavior (Li et al. 2002; Padmanabhan et al. 2006, Van den Poel and Buckinx 2004).

While such profiles are very useful, they are difficult to obtain at the individual level. Obtaining such

information explicitly from each user through registration or surveys has proved to be challenging, for several reasons. First, it requires user effort. Additionally, not all users are willing to share

their personal information with a site. Surveys have shown that two in three active Web users typically

abandon a site that requests personal information (Culnan and Milne 2001, Statistical Research Inc.

2001). The surveys also show that one in five Web users has entered false information to gain access to a

site. Therefore, explicit learning is not always possible or even desirable. Similar problems exist when

data is purchased from offline sources, since matching records in such data to current users requires the site to obtain accurate identifying information.

Given the limitations in learning the desired profiles explicitly, sites are attempting to learn the

profiles implicitly. Researchers have proposed approaches to learn such profiles by observing a user’s

navigational history (Montgomery 2001, Baglioni et al. 2003, Atahan and Sarkar 2007). While these

approaches are potentially useful, learning profiles implicitly can take time. Firms would benefit from

learning the profiles as quickly as possible, since the firm could start providing personalized offerings

sooner and realize the returns from these services earlier. The problem of how to learn such profiles

quickly has not been studied – that is the focus of this work.

1 This information was obtained based on personal communication with the WSJ site owners.


The ability to learn profiles based on a user’s navigational history depends on the pages visited (or

links clicked) by the user. A site can impact the learning rate by judiciously choosing the links to offer at

each page. Of course, sites may determine the links to make available to a user with a variety of

considerations. A site that offers personalized content may customize some or all of the links based on a

user’s profile. When the profile is either not available or inadequate (which is the context of our work) the

site cannot effectively personalize the links. In such situations, the firm can replace the links that are used

to provide personalized offerings with links that will help the site learn the profiles quickly.

We study how the profile learning process can be accelerated by carefully selecting the links to make

available to the user at each interaction – the set of links made available to the user is termed the offer set.

We present a technique to identify the links that will lead to a more informative profile when the user

chooses one of the offered links – we refer to this as active profile learning. We use a Bayesian approach

to learn and update user profiles based on page (link) level statistics. We then develop a technique for

determining the optimal offer set to display at each page visited by the user. Here, the optimal offer set

refers to the set of links that maximizes the expected information obtained regarding the profile when the

user selects an offered link. This problem is difficult to solve optimally in real time for large problem

instances for the following reasons. First, the informative power of a link in an offer set depends not only

on the properties of the link itself, but also on the other links included in an offer set. This prevents greedy

approaches from providing optimal solutions. Second, the number of offer sets to compare increases

exponentially with the number of links considered for inclusion in the offer set. For such difficult cases,

we develop an efficient heuristic for determining the offer sets.

We conduct simulated experiments to evaluate the performance of the proposed approach. The

parameter values used in the simulation come from real data on the web traversals of a large panel of users

that have been collected by comScore Networks. These data are used to estimate the necessary probability

parameters for learning the user profiles. Our experiments consider several scenarios where the active

learning approach is applicable. In the first (base) scenario, the site considers for inclusion in the offer set

all links available to the site, and the only consideration for including a link is to help towards learning


profiles. The second scenario is one where a site selects links to offer from a set of candidate links that it

considers relevant to the user’s request. For example, if a user is browsing reviews of movies of a certain

genre, say comedy, the site may wish to include in the offer set available reviews of comedies only. In the

third scenario, a site may customize only a subset of the offer set towards learning the profile. For

instance, at a content delivery site, only a proportion of the links may be allocated towards learning the

profile, while other links may be determined based on considerations such as maintaining a certain site

structure or ensuring easy navigation. The performance of the proposed approach is examined for all these

scenarios, and compared against two benchmark approaches that a site may use when links are

determined without the consideration of learning user profiles. Our experiments demonstrate that the

proposed approach vastly improves the rate at which the profiles are learned in all scenarios considered.

The remainder of the paper is organized as follows. In the next section, we provide a brief overview

of related literature. The active profile learning approach is presented in Section 3. Section 4 describes the

difficulty in determining the optimal offer set for large problem instances, and presents the heuristic

approach for the base scenario. Section 5 shows how the heuristic is adapted for the other scenarios. The

simulated experiments are described and their results discussed in Section 6. Section 7 discusses the

implications of this work for firms.

2. Literature Review

Learning profiles has attracted considerable attention from researchers in recent years. Most of this work

has focused on applications such as information retrieval and collaborative filtering, where the types of

profiles considered are very different from the ones we consider in this research. In the context of

information retrieval, user profiles are represented as feature (term) vectors with weights, where features

typically correspond to terms or keywords appearing in documents (Pazzani and Billsus 1997; Billsus and

Pazzani 2000; Wong and Butz 2000; Widyantoro et al. 2001; Liu et al. 2004; Middleton et al. 2004).

These works show how to learn such profiles in order to help users cope with the information overload on


the internet. Profiles are updated using algorithms that update the weight of these vectors based on user

feedback on documents viewed. In these studies, no attempt is made to accelerate the learning process.

For collaborative filtering applications, user profiles are typically represented as a vector of users’

ratings for items (e.g., products, documents, etc.). These ratings are used to identify other users who have

similar tastes in terms of their ratings, and recommendations are provided based on the ratings of the

similar users. For such applications, researchers have viewed the profile learning issue as one of asking

users to explicitly rate items (called query items) so that these ratings can serve as the initial profile of

users (Rashid et al. 2002; Yu et al. 2004). Several alternative criteria have been proposed to identify

which items users should be asked to rate. Rashid et al. (2002) rank these query items based on metrics

such as item popularity and item entropy, and ask the users to rate a pre-determined number of items from

the top of the ranked list. Yu et al. (2004) present an approach that defines a profile space as the rating

vectors of a subset of users that are considered to be representative (i.e., prototypical) of the rating vectors

of all the users. Their goal is to find that profile in the profile space that is closest to the active user. They

assume ratings for items follow a Gaussian distribution, and rank query items based on the reduction in

entropy of the profile that would result from that item being rated. The user is then asked to rate a pre-specified number of such items.

underlying problem turns out to be very different. In their work, users are explicitly asked to rate items

one at a time whereas in our problem users implicitly compare alternative links and reveal their

preferences based on the set of links offered. Thus, we need to consider the impact a link has on the

remaining links in the offer set, whereas that is not an issue in their problem. Thus, not only is their

application context distinct, the problem formulation and solution methodology are completely different.

There are a few studies that are more closely related to ours in terms of the types of profiles studied.

Montgomery (2001) and Atahan and Sarkar (2007) study how to learn demographic and psychographic

profiles based on users’ web traversals using probability calculus. Baglioni et al. (2003) attempt to learn

the gender of visitors to a website based on the visitors’ navigational history using approaches such as


decision tree induction techniques and association rules. None of these studies consider how the site can

manipulate the offer set to learn profiles quickly.

3. Active Profile Learning

As discussed before, a site benefits from learning a user’s profile as it could use the profile to offer

personalized services. A profile corresponds to a pre-determined attribute (or set of attributes), such as

gender, age, income, political affiliation, education level, etc. Each profile attribute is represented by the

set of possible values that attribute can take, accompanied by the probabilities (beliefs) associated with

each value (i.e., the probability distribution associated with the profile attribute). Thus, each attribute

value represents a consumer segment, and the profile stores the likelihood of a user’s membership in the

possible segments.

The site wishes to infer the value of a user’s profile attribute from the user’s navigational history. The

site makes available a set of links (the offer set) along with the desired content each time the user makes a

page request. The site’s beliefs regarding the values of the profile attributes are revised each time the user

makes a selection, i.e., clicks on an available link. To learn profiles faster, the site must judiciously decide

what links to make available to a user on each page. In this context, the site’s goal can be viewed as that

of reducing the uncertainty associated with a profile attribute as quickly as possible. We use entropy to

quantify the uncertainty in the profile attributes because of the desirable properties associated with that

measure (Shannon 1948).

The information value of an offer set is measured by the expected information gain that results from

the user’s choice when faced with that offer set. The information gain is the difference in entropy for an

attribute in the light of new evidence (i.e., a selected link in this case) as compared to before the new

evidence was obtained (Mitchell, 1997, pp. 57-58). To learn the profile as quickly as possible, the site

should select the set of links that will maximize the expected information gain at each interaction with the

user (each time the user makes a page request). The site could stop manipulating the links offered to learn

a user’s profile when the profile is known with a high level of certainty, e.g., when the probability


associated with one of the attribute values is beyond a threshold the site considers adequate. Alternatively,

the site may stop manipulating the links offered when the expected additional information is not

statistically significant. Of course, the site can and should continue revising the profile based on the user’s

navigation as long as the user stays at the site.

Calculating the expected information gain for an offer set involves first determining the anticipated

revised belief about the user’s profile when the user clicks on a link in the offer set. Therefore, the first

step is to develop a method to revise the beliefs when the user makes a selection. Next, the information

gain for each link in the offer set is calculated based on the anticipated beliefs that would result from the

user clicking on that link in the offer set. In order to determine the information gain for the offer set as a

whole, the probability that the user clicks on a link in the offer set must be calculated for each available

link. The expected information gain for the offer set is obtained by multiplying the anticipated

information gain from each link in the offer set with the probability that link is selected, and summing

over all the links.

We present in the next subsection a Bayesian belief revision process employed in our study and

discuss how the probabilities can be estimated. In the following subsection, we show how the expected

information gain for an offer set Of is calculated. We then discuss how to determine the optimal offer set,

and illustrate this with an example.

3.1. Bayesian Belief Revision

A site offers a set of links to a user at each page visited by the user. Each time the user clicks on an

offered link, the site is able to revise the beliefs regarding the user’s profile (i.e., the user’s membership in

a class or segment). Formally, let ai , i=1,…,m, denote the values the relevant attribute A can take. At any

point in the interaction process, the profile captures the user’s likelihood of belonging to class ai for each

i. We use LH to denote the link history of the user, where the link history is the set of links that the user has clicked on so far (LH = ∅ when a user arrives at a site and has not yet begun navigation).


The site has to revise the user’s profile when the user clicks on a new link (lj) from the current offer set

(Of). Therefore, the problem is to determine P(ai|LH,lj) for each value the profile can take.

It is difficult for a site to pre-compute this probability for all possible link histories. If a site has even

a moderately large number of pages, then the number of possible link histories can be extremely large

(e.g., a site with 100 pages has 100C10 possible link histories consisting of ten clicks, which is in the order

of trillions of combinations). In addition, the probability calculation must take into consideration the

specific offer set available at each interaction, which increases the number of feasible combinations even

more.2 One way to overcome this difficulty is by assuming that the probability of a user clicking on a link

is independent of the probability of clicking on other links conditioned on the user profile. The

conditional independence assumption implies that the pages visited by a user are driven by the user’s

profile. While this assumption may appear limiting at first sight, it has been found to be robust in many

applications. Atahan and Sarkar (2007) show that models that make this assumption perform better in

general compared to several other models that could be used to learn user profiles from website traversals.

By using this assumption, the data requirements for the belief revision process are drastically reduced.

Two sets of probability parameters are needed for each link. The first is the profile information of visitors

to each page, P(ai|lj), that we refer to as the link profile. Link profiles can be estimated based on the

distribution of visitor profiles to the associated pages. For example, to learn a user’s gender, the firm

requires statistics on the proportion of male and female visitors to each page. If 60% of the visitors to a

page corresponding to link l1 are male and 40% are female, the site can infer the profile of link l1 to be

P(m|l1)=0.6 and P(f|l1)=0.4. The second parameter is the probability of a user clicking on a link (or

visiting the associated page), P(lj), that we refer to as the link popularity. The estimates for the link

popularities need not be exact. The relative likelihood of a link being clicked as compared to the other

links in the offer set is sufficient for our purposes. Statistics on the relative popularity of pages associated with the links, such as the number of visits each page gets per week or per month, may be used for this analysis. For example, in a given time frame, if 10 percent of all visitors to a site have visited a page corresponding to link l1, the popularity of link l1 would be stored as P(l1) = 0.1.3

2 Strictly speaking, in the conditioning part, the probability P(ai|LH,lj) should explicitly include the offer sets that were available during each interaction. To keep the notation simple, we recognize this implicitly and do not include it in the expression.

3 If the relative behavior of users from different profile classes is observed to be different at different time frames (e.g., weekday versus weekend), then the profile and popularity statistics can be separately computed for the different time frames, and the profile learning mechanism can use the parameters relevant at the time the user visits the site.

These link statistics can be obtained directly if there are a sufficient number of users who have

provided this information when registering on the site and they can be identified when they visit the site

(e.g., through cookies or log-in). Alternatively, the statistics can be obtained by explicitly asking a subset

of users to provide this information for sampling purposes. They can also be obtained from professional

market research agencies such as comScore Networks or Nielsen Media Research. These agencies collect

personal information from a large number of users and track their online activities, and are able to provide

the desired statistics to their client firms.

We now show how a site can revise the beliefs about a user with link history LH who clicks on a link

lj, given the link level probabilities (i.e., link profile and link popularity). Using Bayes rule, the profile can

be expressed as

P(a_i \mid LH, l_j) = (1/K) \, P(LH, l_j \mid a_i) \, P(a_i), \quad \text{where } K = \sum_{a_i \in A} P(LH, l_j \mid a_i) \, P(a_i).

Here, P(ai) is the relevant prior distribution of the profile for the site, and K is a normalization factor.

Assuming conditional independence as discussed above, we can write:

P(a_i \mid LH, l_j) = (1/K) \, P(LH \mid a_i) \, P(l_j \mid a_i) \, P(a_i), \quad \text{where } K = \sum_{a_i \in A} P(LH \mid a_i) \, P(l_j \mid a_i) \, P(a_i).

Applying Bayesian inversion to the term P(LH|ai) in the above expression, we have

P(a_i \mid LH, l_j) = (1/K) \, \frac{P(a_i \mid LH) \, P(LH)}{P(a_i)} \, P(l_j \mid a_i) \, P(a_i), \quad \text{where } K = \sum_{a_i \in A} \frac{P(a_i \mid LH) \, P(LH)}{P(a_i)} \, P(l_j \mid a_i) \, P(a_i).

After algebraic simplifications, we obtain the following:

P(a_i \mid LH, l_j) = (1/K) \, P(l_j \mid a_i) \, P(a_i \mid LH), \quad \text{where } K = \sum_{a_i \in A} P(l_j \mid a_i) \, P(a_i \mid LH). \quad (1)

Here, P(ai|LH) is the prior belief regarding the user’s profile at the time the user makes the current page

request. P(lj|ai) is the probability that a member of class ai will click on a link lj in an offer set Of. This

probability can be calculated using Bayes rule from the individual link level probabilities associated with

the links in a given offer set Of as follows:

P(l_j \mid a_i) = \frac{P(a_i \mid l_j) \, P(l_j)}{\sum_{l_k \in O_f} P(a_i \mid l_k) \, P(l_k)}. \quad (2)

We should point out that the probability a specific link is selected when it is part of one offer set will

usually be different from the probability the same link is selected when it is offered as part of a different

offer set. This is because for different offer sets the denominator in equation 2 includes terms

corresponding to different links with different probability parameters. Consequently, from equation 1, the

revised belief about a user’s profile when the user clicks on a specific link is also a function of the offer

set. When determining the probability P(lj|ai), an assumption we make is that the user will follow one of

the links being offered. However, this is not a restrictive assumption. While we restrict the probability

space based on the offer set available, the probabilities need not sum to 1. For instance, the user may click

on the back button, or submit a query if there is a search engine available at the site. In such situations, the

learning process will continue from where it left off once the user continues to navigate through the site.
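The belief revision in equations (1) and (2) is simple to implement once the link profiles and link popularities are available. The following sketch (Python, with illustrative dictionary-based data structures and a function name of our own, not from the paper) revises the profile after a single click from a given offer set.

```python
# Minimal sketch of the Bayesian belief revision in equations (1) and (2).
# link_profile[l][a] holds P(a|l) and link_pop[l] holds P(l); both are assumed inputs.

def class_given_click(link_profile, link_pop, offer_set, clicked, prior):
    """Revise the profile P(a_i|LH) into P(a_i|LH, l_j) after a click on `clicked`."""
    # Equation (2): P(l_j|a_i) for the clicked link, restricted to this offer set.
    p_click_given_class = {}
    for a in prior:
        denom = sum(link_profile[l][a] * link_pop[l] for l in offer_set)
        p_click_given_class[a] = link_profile[clicked][a] * link_pop[clicked] / denom
    # Equation (1): Bayes update of the profile; K is the normalization factor.
    unnormalized = {a: p_click_given_class[a] * prior[a] for a in prior}
    K = sum(unnormalized.values())
    return {a: value / K for a, value in unnormalized.items()}
```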

3.2. Expected Information Gain for an Offer Set

We next evaluate the information content of an offer set for the profile learning approach. We first

evaluate the information value for each link in the offer set by computing the anticipated information gain

if that link was selected by the user (recall that the information gain is the difference in entropy for the

profiling attribute in the light of new evidence). Therefore, the anticipated information gain I(A|LH,lj)

given a link lj is selected from an offer set Of by a user with link history LH is

I(A \mid LH, l_j) = H(A \mid LH) - H(A \mid LH, l_j).


Here, H(A|LH) is the entropy of the profile distribution prior to the user clicking on one of the links in the

available offer set (i.e., the prior entropy). H(A|LH,lj) is the remaining entropy in the profile distribution

after link lj in the offer set is selected (i.e., the posterior entropy). The prior entropy is calculated as

H(A \mid LH) = -\sum_{a_i \in A} P(a_i \mid LH) \log_2 P(a_i \mid LH).

The posterior entropy if the user clicks on a link lj is obtained based on the revised beliefs as

calculated in equation 1, and is

H(A \mid LH, l_j) = -\sum_{a_i \in A} P(a_i \mid LH, l_j) \log_2 P(a_i \mid LH, l_j).

Next, we need to evaluate the probability that a user with link history LH will click on link lj in the

given offer set Of. This probability is obtained by taking the expectation of the probability that a user of

each class will click on that link.

P(l_j \mid LH) = \sum_{a_i \in A} P(l_j \mid a_i, LH) \, P(a_i \mid LH).

Based on the conditional independence assumption, this expression simplifies to:

P(l_j \mid LH) = \sum_{a_i \in A} P(l_j \mid a_i) \, P(a_i \mid LH). \quad (3)

P(lj|ai) is obtained as discussed in Section 3.1. Finally, the expected information gain for the offer set

Of is calculated by taking the expectation of the information gain over all the links in the offer set, i.e.,

EI(A \mid LH) = \sum_{l_j \in O_f} P(l_j \mid LH) \, I(A \mid LH, l_j). \quad (4)
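A corresponding sketch for the expected information gain of a candidate offer set (equations 3 and 4) might look as follows; it is kept self-contained, so the equation (2) computation from the previous sketch is repeated inline.

```python
import math

def entropy(dist):
    """Shannon entropy (base 2) of a distribution stored as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_information_gain(link_profile, link_pop, offer_set, prior):
    """Equations (3) and (4): expected entropy reduction for one candidate offer set."""
    h_prior = entropy(prior)
    gain = 0.0
    for l in offer_set:
        # Equation (2): P(l|a_i) restricted to this offer set.
        p_l_given_a = {
            a: link_profile[l][a] * link_pop[l]
               / sum(link_profile[k][a] * link_pop[k] for k in offer_set)
            for a in prior
        }
        # Equation (3): probability that the user selects link l.
        p_l = sum(p_l_given_a[a] * prior[a] for a in prior)
        # Equation (1): anticipated posterior profile if l is clicked.
        posterior = {a: p_l_given_a[a] * prior[a] / p_l for a in prior}
        # Equation (4): weight the gain from l by the chance that l is selected.
        gain += p_l * (h_prior - entropy(posterior))
    return gain
```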

3.3. Determining the Optimal Offer Set of Cardinality n

To learn a profile quickly, the site should make available to the user at each interaction the offer set that

maximizes the expected information gain. In our formulation, we assume that the cardinality (size) of the

offer set, n, is pre-determined based on site specific design considerations. The links in the offer set are

chosen from a consideration set denoted as C, which is the set of candidate links the site takes into

consideration. This problem can be formally expressed as:


\arg\max_{O_f \subset C} EI(A \mid LH, O_f) = \arg\max_{O_f \subset C} \sum_{l_j \in O_f} P(l_j \mid LH, O_f) \, I(A \mid LH, l_j, O_f), \quad \text{subject to } |O_f| = n.

The conditioning event Of in the above expression is made explicit to recognize the fact that the

values of these expressions are based on the offer set Of. The expected information gain will of course be

different for different offer sets. Furthermore, as discussed in Section 3.1, the probability of a link being

selected will be different when the same link is offered in different offer sets, as will be the revised

beliefs. Therefore, the corresponding information gain given that link is selected will also differ across

different offer sets. To recognize such differences, we include Of in the conditioning part of the

probability and information gain expressions when we evaluate these values for different offer sets.
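For small consideration sets, this optimization can be solved directly by enumerating every subset of cardinality n and scoring it with the expected_information_gain sketch above. The following hypothetical helper illustrates the idea (Section 4 discusses why this brute-force search does not scale).

```python
from itertools import combinations

def optimal_offer_set(link_profile, link_pop, consideration_set, prior, n):
    """Exhaustively score every offer set of cardinality n and keep the best one."""
    return max(
        (frozenset(combo) for combo in combinations(consideration_set, n)),
        key=lambda offer: expected_information_gain(link_profile, link_pop, offer, prior),
    )
```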

3.4. Illustrative Example

We illustrate the active learning approach using a small example. Consider a firm that wants to learn the

gender of a user who is traversing its website. The profile captures the user’s likelihood of being male (m)

or female (f) given the link history (LH) of the user. When a new user arrives at the site, the first page the

user visits determines the user’s initial profile. For instance, if the user initially visits a finance page (say

l1) to which 60% of visitors are male, the site can infer the user is male with probability 0.6. The current

profile would then be represented as P(m|LH)=0.6 and P(f|LH)=0.4, where LH={l1}.

We assume that the site considers offer sets of cardinality two, and has available three links to offer as

shown in Table 1. The probabilities associated with the links are also shown in the table. In this example,

the consideration set is C={l2,l3,l4}, and the site has three possible offer sets to choose from: O1={l2,l3},

O2={l2,l4}, and O3={l3,l4}.

Table 1.

Link                      Link Profiles          Link Popularities
                          P(m|lj)    P(f|lj)     P(lj)
a banking link (l2)       0.7        0.3         0.4
an insurance link (l3)    0.5        0.5         0.3
a family link (l4)        0.2        0.8         0.3


As discussed in Section 3.1, to evaluate the expected information gain for an offer set Of, we need to

calculate (i) for each link in the offer set the probability it will be selected, and (ii) the anticipated user

profile when each link is selected. To calculate these terms, we first need to calculate, for users of each

class, the likelihood of each link in the offer set being selected. Thus, for offer set O1={l2,l3}, we would

first need to determine the following four probabilities: P(l2|m,O1), P(l2|f,O1), P(l3|m,O1), and P(l3|f,O1).

Next, we will calculate the anticipated profile when each link is selected, i.e., P(m|LH,l2,O1),

P(f|LH,l2,O1), P(m|LH,l3,O1), and P(f|LH,l3,O1). Using the link probabilities shown in Table 1, and using

equation 2 we obtain:

P(l_2 \mid m, O_1) = \frac{P(m \mid l_2) P(l_2)}{P(m \mid l_2) P(l_2) + P(m \mid l_3) P(l_3)} = \frac{0.7 \times 0.4}{0.7 \times 0.4 + 0.5 \times 0.3} = 0.65, \quad \text{and}

P(l_2 \mid f, O_1) = \frac{0.3 \times 0.4}{0.3 \times 0.4 + 0.5 \times 0.3} = 0.44.

It then follows that P(l3|m,O1) = 1- P(l2|m,O1) = 0.35 and P(l3|f,O1) = 1- P(l2|f,O1) = 0.56.

If the user clicks on link l2, following equation 1 the profile would be revised as:

P(m \mid LH, l_2, O_1) = \frac{P(l_2 \mid m, O_1) P(m \mid LH)}{P(l_2 \mid m, O_1) P(m \mid LH) + P(l_2 \mid f, O_1) P(f \mid LH)} = \frac{0.65 \times 0.6}{0.65 \times 0.6 + 0.44 \times 0.4} = 0.69,

P(f \mid LH, l_2, O_1) = 1 - P(m \mid LH, l_2, O_1) = 0.31.

Therefore, the information gain if link l2 is clicked from offer set O1 is

I(A \mid LH, l_2, O_1) = \big[ -0.6 \log_2 0.6 - 0.4 \log_2 0.4 \big] - \big[ -0.69 \log_2 0.69 - 0.31 \log_2 0.31 \big] = 0.07.

Similarly, the information gain when link l3 is clicked is found to be (-0.03). Next, for each link in the

offer set the probability the user will click on it is (using equation 3)

P(l_2 \mid LH, O_1) = 0.65 \times 0.6 + 0.44 \times 0.4 = 0.57, \quad \text{which implies } P(l_3 \mid LH, O_1) = 0.43.

Finally, from equation 4 the expected information gain for offer set O1 is found to be

EI(A \mid LH, O_1) = 0.07 \times 0.57 + (-0.03) \times 0.43 = 0.03.


The expected information gains for offer sets O2 and O3 are calculated in a similar manner and found

to be 0.18 and 0.08, respectively. Therefore, the offer set O2 that includes the banking link and the family

link yields the highest expected information gain, and is the optimal offer set.
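The example can be replayed with the hypothetical sketches introduced earlier; the dictionaries below hold the Table 1 parameters and the current profile, and the enumeration should single out the banking/family pair.

```python
# Hypothetical replay of the Section 3.4 example using the sketches above.
link_profile = {"l2": {"m": 0.7, "f": 0.3},
                "l3": {"m": 0.5, "f": 0.5},
                "l4": {"m": 0.2, "f": 0.8}}
link_pop = {"l2": 0.4, "l3": 0.3, "l4": 0.3}
prior = {"m": 0.6, "f": 0.4}        # profile after the initial visit to the finance page l1

best = optimal_offer_set(link_profile, link_pop, ["l2", "l3", "l4"], prior, n=2)
print(best)                          # should be the banking and family links {l2, l4}
```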

4. A Fast Heuristic Approach to Determine Offer Sets

We have shown how alternative offer sets can be compared in order to learn a profile attribute as quickly

as possible. However, for some sites, the number of potential offer sets to evaluate could be very large.

For example, if a site has 100 candidate links and is considering offer sets of cardinality five, then there

are 5100C (~75,000) possible offer sets to evaluate. If the cardinality of offer sets is ten, then the number

of possible offer sets to evaluate would be more than 17 trillion. In such situations, it is impractical for a

site to evaluate each possible offer set in real time.

We show that the problem of identifying the optimal offer set is difficult in general, and greedy

approaches do not guarantee optimality. We then identify some properties of collections of links that

make them desirable to include in an offer set. These properties are used to develop a heuristic approach.

4.1. Non-Monotonic Information Gain

One way to solve the offer set construction problem is by first determining the optimal offer set of

cardinality two, and then adding those links to this set that lead to the highest increase in the expected

information gain. However, as discussed in Proposition 1 below, we find that this procedure cannot

ensure that the optimal offer set of the desired cardinality will be found.

Proposition 1: A link that is part of the optimal offer set of cardinality r is not guaranteed to be part of

the optimal offer set of cardinality r+1.

Proof. The proof is by a counter example. Assume a site is interested in learning a binary profile

A={a1,a2} using an offer set of cardinality three. We assume that initially the site has diffuse priors about

the user’s profile, i.e., P(a1)=P(a2)=0.5. Four links with the probability parameters shown in Table 2 are

assumed to be available to the site.


Table 2.

      P(a1|lj)   P(a2|lj)   P(lj)
l1    0.90       0.10       0.1
l2    0.15       0.85       0.1
l3    0.10       0.90       0.3
l4    0.85       0.15       0.2

The most informative offer set of cardinality two is computed to be Of={l1,l2}, with an expected

information gain of 0.46. The most informative offer set of cardinality three is Of’={l1,l3,l4}, with an

expected information gain of 0.48. Although link l2 is part of the optimal offer set of cardinality two, it is

not part of the optimal offer set of cardinality three.

This finding is primarily due to the fact that the change in the expected information gain by adding a

link to an offer set depends not only on the probability parameters associated with the link itself, but also

on the probability parameters associated with other links in the offer set. The information gain

I(A|LH,lj,Of) attributed to a link lj in an offer set Of depends on the posterior entropy of the profile

H(A|LH,lj,Of) (the prior entropy is fixed for each interaction). The posterior entropy depends on the

posterior belief distribution P(ai|LH,lj,Of) over all i. This posterior belief distribution depends, in turn, on

the probability P(lj|ai,Of) that users from different classes (e.g., class ai) will click on link lj when faced

with the offer set Of. This probability depends not only on probability parameters associated with the link

lj but also on the probability parameters associated with every other link in the offer set. Thus, when the

offer set changes, the contribution to the expected information gain from a specific link also changes. Due

to this characteristic of the information gain a locally optimal solution will not necessarily be part of the

globally optimal solution. Consequently, greedy approaches do not guaranty optimality.

Interestingly, not only does the contribution to the expected information gain from including a link

depend on the other links in an offer set, this contribution is not always guaranteed to be positive.

Proposition 2 shows that there are no regularities related to the change in the information content of an

offer set when a link is added to an existing optimal offer set.


Proposition 2: Adding a link to an offer set that is optimal for cardinality r does not guarantee that the

expected information gain will increase nor does it guarantee it will decrease.

Proof. Once again, the proof is by counter example. As before, a site is interested in learning a binary

profile A={a1,a2} using an offer set of cardinality three, and initially the priors about the profile are

diffuse. Four links with the probability parameters shown in Table 3 are assumed to be available to the

site. The most informative offer set of cardinality two is Of={l1,l2}, with an expected information gain of

0.35. If link l3 is added to this offer set, the expected information gain for the new offer set Of’ = {l1,l2,l3}

increases to 0.37. If link l4 is added to Of, the expected information gain for the new offer set Of’’ =

{l1,l2,l4} decreases to 0.26.

Table 3.

      P(a1|lj)   P(a2|lj)   P(lj)
l1    0.90       0.10       0.1
l2    0.20       0.80       0.2
l3    0.85       0.15       0.1
l4    0.25       0.75       0.15

A corollary to the above proposition is that the expected information gain for any offer set (not just an

optimal one) could either go up or down depending on the probability parameters associated with all the

concerned links. This aspect of offer sets makes the problem even harder to solve in an optimal manner.
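Both counterexamples are easy to replay with the enumeration sketch from Section 3.3. For instance, the following hypothetical snippet sets up the Table 2 parameters under a diffuse prior and compares the best offer sets of cardinality two and three.

```python
# Hypothetical replay of the Proposition 1 counterexample (Table 2 parameters).
link_profile = {"l1": {"a1": 0.90, "a2": 0.10}, "l2": {"a1": 0.15, "a2": 0.85},
                "l3": {"a1": 0.10, "a2": 0.90}, "l4": {"a1": 0.85, "a2": 0.15}}
link_pop = {"l1": 0.1, "l2": 0.1, "l3": 0.3, "l4": 0.2}
prior = {"a1": 0.5, "a2": 0.5}       # diffuse prior

best_two = optimal_offer_set(link_profile, link_pop, list(link_pop), prior, n=2)
best_three = optimal_offer_set(link_profile, link_pop, list(link_pop), prior, n=3)
print(best_two, best_three)          # the paper reports {l1, l2} and {l1, l3, l4}
```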

4.2. Impact of Link Properties on the Information Value of an Offer Set

Our findings in Section 4.1 indicate that efficient techniques based on greedy algorithms cannot guarantee

the optimal offer set. Since the offer set must be determined in a very short amount of time, we focus on

developing an efficient heuristic that exploits the link properties. To do this, we analyze further the impact

of the link profile and link popularity on the offer set construction problem.

As discussed in Section 3, ideally we would like to identify the offer set that maximizes the expected

information gain at each interaction. Since the prior entropy depends on the current profile distribution of


a user, it is a constant. Therefore, the expected information gain is maximized by identifying the offer set

that minimizes the expected posterior entropy. Thus,

\arg\max_{O_f \subset C} EI(A \mid LH, O_f) = \arg\min_{O_f \subset C} \sum_{l_j \in O_f} P(l_j \mid LH) \, H(A \mid LH, l_j)

= \arg\min_{O_f \subset C} \left\{ -\sum_{l_j \in O_f} P(l_j \mid LH) \sum_{a_i \in A} P(a_i \mid LH, l_j) \log_2 P(a_i \mid LH, l_j) \right\}

= \arg\min_{O_f \subset C} \left\{ -\sum_{l_j \in O_f} P(l_j \mid LH) \sum_{a_i \in A} \frac{P(l_j \mid a_i) P(a_i \mid LH)}{P(l_j \mid LH)} \log_2 \frac{P(l_j \mid a_i) P(a_i \mid LH)}{P(l_j \mid LH)} \right\}

= \arg\min_{O_f \subset C} \left\{ -\sum_{l_j \in O_f} \sum_{a_i \in A} P(l_j \mid a_i) P(a_i \mid LH) \log_2 \big( P(l_j \mid a_i) P(a_i \mid LH) \big) + \sum_{l_j \in O_f} P(l_j \mid LH) \log_2 P(l_j \mid LH) \right\}.

The above expression is the difference between two entropy expressions, i.e.,

-\sum_{l_j \in O_f} \sum_{a_i \in A} P(l_j \mid a_i) P(a_i \mid LH) \log_2 \big( P(l_j \mid a_i) P(a_i \mid LH) \big), \quad \text{and} \quad -\sum_{l_j \in O_f} P(l_j \mid LH) \log_2 P(l_j \mid LH).

The first term is the entropy associated with the probability distribution of observing a link being selected

by members of a specific class, i.e., the distribution P(lj,ai|LH) = P(lj|ai)P(ai|LH). We refer to this as the

link-class entropy. The value of the objective function decreases when this entropy decreases, i.e., when

the product P(lj|ai)P(ai|LH) has extreme values (more precisely, the product P(lj|ai)P(ai|LH) for one of the

combinations of (lj,ai) should be close to one and all other combinations should have a product close to

zero). Since P(ai|LH) is known, a site could select links that make P(lj|ai) extreme. The second term is the

entropy associated with the probability distribution for links in the offer set being selected by the user

(that we call the link entropy). The value of the objective function decreases when the link entropy

increases. This entropy is maximized when each link in the offer set is equally likely to be selected by the

user.

The challenge is to identify links to include in the offer set that minimize the link-class entropy and

simultaneously maximize the link entropy. To provide further insights into this problem, we consider an


example where a site wishes to learn a binary profile A={a1,a2} by offering two links Of={lx,ly}. The

probability that link lx is selected by a member of class a1 is (from equation 2):

P(l_x \mid a_1) = \frac{P(a_1 \mid l_x) \, P(l_x)}{P(a_1 \mid l_x) \, P(l_x) + P(a_1 \mid l_y) \, P(l_y)}.

We have seen that if P(lj|ai) has extreme values, it helps minimize the link-class entropy. For a given

class P(ly|ai) =1-P(lx|ai). Therefore, if we select links that make P(lx|a1) large, P(ly|a1) will be small. If the

link popularities are about equal for the two links, then P(lx|a1) can be made as large as possible by

selecting lx and ly such that P(a1|lx) is as large as possible and P(a1|ly) is as small as possible. This idea is

formalized by Proposition 3a (the proof is provided in the appendix).

Proposition 3a: When learning a binary profile using an offer set of cardinality two, offering the two

links with the most extreme link profiles reinforcing each class respectively is optimal when the links have

equal popularity.

Unfortunately, as shown in Proposition 3b, this result does not extend to situations where the link popularities are not equal for links in the offer set.

Proposition 3b: When learning a binary profile using an offer set of cardinality two, offering the two

links with the most extreme link profiles reinforcing each class is not guaranteed to be optimal when the

link popularities are not equal.

Proof. The proof is by counter example. We assume that the current belief about the profile is uniform,

i.e., P(a1|LH)= P(a2|LH)=0.5, and the site has the following three links available for consideration.

Table 4.

      P(a1|lj)   P(a2|lj)   P(lj)
l1    0.9        0.1        0.5
l2    0.1        0.9        0.1
l3    0.2        0.8        0.4


In this example, l1 and l2 have the most extreme link profiles. The expected information gain from

offering links l1 and l2 together is 0.37. However, the expected information gain is 0.41 when link l3 is

offered along with link l1.

The above finding highlights the fact that determining the offer set based solely on the link profiles does not guarantee optimality. As one may expect, the link popularities also play an important role in the

expected information gain calculations.

We have discussed that having an extreme distribution for the terms P(lj|ai) is desirable in general.

When a link lx has a large probability of being selected by members of a specific class, say a1, a user’s selection of lx strongly reinforces the belief that the user belongs to class a1. Therefore, it is desirable

that link lx is clicked on by users from class a1. If the second link ly has a smaller link popularity compared

to lx, it would increase the likelihood of link lx being selected by users from class a1 even more. However,

this is possible only at the cost of increasing the likelihood of members of class a2 clicking on link lx as

well. For a link to be truly discriminative, the likelihood of selecting a link in an offer set should be

driven by the link profiles and not link popularities. Therefore, it is desirable to include in the offer set

links that reinforce different classes and have similar popularities. This also helps increase the link

entropy, further improving the expected information gain.

4.3. An Efficient Algorithm to Determine Offer Sets

Based on the findings discussed in Section 4.2, we have developed an efficient algorithm that exploits the

properties of link profiles and link popularities in constructing the offer set. The algorithm selects links to

include in the offer set in an iterative manner. It includes links that have extreme profiles for the different

profile classes, while also ensuring that the sum of the link popularities associated with links that

corroborate a class remains as close as possible to the sum of the link popularities associated with links


that corroborate other classes. We assume that the cardinality of the offer set n is larger than the number

of classes to be learnt m, i.e., n ≥ m.4

Algorithm to Determine Offer Sets

Input:

The link profiles P(ai|lj) and link popularities P(lj) for each link in the consideration set.

The prior P(ai) for each profile class.

The cardinality of the offer set n.

Algorithm:

1) For each profile class, identify the subset of links that have a profile P(ai|lj) greater than the class prior P(ai). Sort the links in a list in descending order of link profile for that class.

2) If there are no links in the offer set:

a) Add the link that is at the top of each sorted list to the offer set.

Else:

a) For each profile class, sum up the link popularities P(lj) for links in the offer set reinforcing that class.

b) Determine the class for which the sum of link popularities is the smallest.

c) Add to the offer set the link at the top of the list for that class.

3) Remove the link(s) added to the offer set from the appropriate list(s). If a link appears on multiple lists, it is removed from all of them.

4) Repeat steps 2 and 3 until n links have been selected.

The links at the top of the sorted lists are the links with the highest link profiles for each class. At

each iteration, this algorithm adds to the offer set the link with the most extreme profiles, while trying to

keep the total popularity of the links reinforcing each class as balanced as possible. We illustrate

the algorithm using an example in Section 5.3.

4 While the site may have some flexibility regarding increasing the cardinality of the offer set, it would usually not be desirable to increase the offer set size beyond a point. Instead, if the number of profile classes to be learnt is very large, then it would be desirable to merge some of the classes so that the number of merged classes is less than or equal to the cardinality considered reasonable for the offer set. Once a user has been identified to belong to one of the merged classes, offer sets constructed for subsequent interactions can be used to learn the profile with finer granularity.
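A compact sketch of this heuristic is shown below (Python, using the same hypothetical data structures as the earlier sketches). The seed argument, which pre-loads links into the offer set, is not part of the algorithm above; it anticipates the pre-determined links of Scenario III in Section 5.3.

```python
def heuristic_offer_set(link_profile, link_pop, consideration_set, prior, n, seed=()):
    """Sketch of the Section 4.3 heuristic for constructing an offer set of size n."""
    classes = list(prior)
    # Step 1: per-class candidate lists, restricted to links whose profile exceeds
    # the class prior and sorted by link profile in descending order.
    lists = {a: sorted((l for l in consideration_set if link_profile[l][a] > prior[a]),
                       key=lambda l: link_profile[l][a], reverse=True)
             for a in classes}
    offer = list(seed)

    def prune():
        # Step 3: a link added to the offer set is removed from every list.
        for a in classes:
            lists[a] = [l for l in lists[a] if l not in offer]

    def reinforced_mass(a):
        # Sum of popularities of offer-set links corroborating class a.
        return sum(link_pop[l] for l in offer if link_profile[l][a] > prior[a])

    if not offer:
        # Step 2, first iteration: take the top link of each class list.
        for a in classes:
            if lists[a] and len(offer) < n and lists[a][0] not in offer:
                offer.append(lists[a][0])
        prune()
    # Step 4: keep adding links until the offer set has n links.
    while len(offer) < n and any(lists.values()):
        # Step 2, later iterations: help the class whose corroborating links
        # have the smallest total popularity so far.
        a = min((a for a in classes if lists[a]), key=reinforced_mass)
        offer.append(lists[a][0])
        prune()
    return offer
```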


5. Three Scenarios

As discussed in Section 1, the active learning approach is viable for several different scenarios. The

scenarios differ in the candidate links that are available for consideration (i.e., the consideration set) and

the proportion of links in the offer set that are allocated towards learning profiles. The consideration set in

each scenario is denoted with a subscript for that scenario (e.g., CI denotes the consideration set in

Scenario I). We show how the active learning approach is adapted to each scenario.

5.1. Scenario I – Base Model

This is the most general scenario, and the one that we have implicitly discussed so far. In this scenario, all

links in the offer set are targeted towards learning the user’s profile and links to all pages that have not yet

been visited by the user are considered as candidate links. As discussed in Section 3.3, the optimal offer

set is characterized by:

\arg\max_{O_f \subset C_I} EI(A \mid LH, O_f) = \arg\max_{O_f \subset C_I} \sum_{l_j \in O_f} P(l_j \mid LH, O_f) \, I(A \mid LH, l_j, O_f), \quad \text{subject to } |O_f| = n.

The algorithm can be used directly in this model by identifying the appropriate consideration set and

the size of the offer set. After each interaction (i.e., link clicked) by the user, the corresponding link is

removed from the consideration set for the next interaction.

5.2. Scenario II – Constrained Consideration Set

The second scenario is where the site restricts the consideration set based on the link clicked (page

requested) by the user. This assumes that the site has identified using some site specific criterion the

universe of links that are considered relevant to the page being requested. Therefore, the consideration set

CII at each interaction consists of only those links relevant to the requested page. The site targets all links

in the offer set towards learning the profile as in Scenario I. The optimal offer set is characterized by:

\arg\max_{O_f \subset C_{II}} EI(A \mid LH, O_f) = \arg\max_{O_f \subset C_{II}} \sum_{l_j \in O_f} P(l_j \mid LH, O_f) \, I(A \mid LH, l_j, O_f), \quad \text{subject to } |O_f| = n.


The only difference from Scenario I is the reduced consideration set. Therefore, the search space is

smaller in Scenario II compared to that for Scenario I. The algorithm works exactly as in Scenario I,

except that the only links input to the algorithm are those in CII. The consideration set could be

substantially different at each interaction with the user, since it consists of the links relevant to the current

page being requested.

5.3. Scenario III – Predetermined Links in the Offer Set

This is the scenario where a site has identified for a requested page a set of links that must be included in

the offer set. These links would be identified by the site using some other consideration that the site may

have (e.g., ease of navigation, personalization based on contextual consideration such as day of the week,

etc.). This set of pre-determined links is denoted by Opd. The remaining links in the offer set are used

towards learning the profile, and denoted Ol (thus, Ol = Of − Opd). The consideration set consists of all

available links except those that are already included in the offer set as pre-determined links, i.e., CIII = CI

− Opd. The optimal offer set is characterized by:

\arg\max_{O_l \subset C_{III}} EI(A \mid LH, O_f) = \arg\max_{O_l \subset C_{III}} \sum_{l_j \in O_f} P(l_j \mid LH, O_f) \, I(A \mid LH, l_j, O_f), \quad \text{where } O_f = O_l \cup O_{pd}.

A point to note here is that the site should select those links in Ol that maximize the expected

information gain for the whole offer set Of (which is Ol ∪ Opd). In this scenario, the set of pre-determined

links also needs to be provided as input to the algorithm. First, the pre-determined links are included in the

offer set, and the algorithm then determines the remaining links.

We use an example to illustrate how the algorithm works for Scenario III. Consider a site that wishes

to learn the gender of a new visitor. We assume 49% of the visitors to the site are male (i.e., P(m)=0.49

and P(f)=0.51). The universe of links to consider is shown in Table 5, along with the link probabilities.

Links l3 and l6 are assumed to be pre-determined for inclusion by the site. The site wants to offer two

more links to the user (i.e., n=4).


Table 5. Available Links

Links   P(m|lj)   P(f|lj)   P(lj)
l1      0.79      0.21      0.08
l2      0.70      0.30      0.12
l3*     0.63      0.37      0.11
l4      0.55      0.45      0.12
l5      0.40      0.60      0.11
l6*     0.35      0.65      0.29
l7      0.23      0.77      0.07
l8      0.19      0.81      0.10
* pre-determined links

The consideration set consists of all available links except the pre-determined ones, i.e.,

CIII={l1,l2,l4,l5,l7,l8}. The algorithm first creates two lists of links, one for each class. Each list is sorted

based on the link profile for that class. As shown in Table 6, links l1, l2, and l4 corroborate male, and links

l5, l7, and l8 corroborate female (since P(ai|lj)>P(ai) for these links).

The algorithm checks to see if the sum of link popularities of the pre-determined links that

corroborate male in the offer set is greater than the sum of link popularities of those that corroborate

female. Of the pre-determined links, l3 corroborates male, since P(m|l3)>P(m), and l6 corroborates female

since P(f|l6)>P(f). Because P(l3)=0.11 is less than P(l6)=0.29, the algorithm picks the link from the top of

the sorted list for male (link l1), and adds it to the offer set. In the next iteration, the algorithm compares

P(l1)+P(l3)=0.19 with P(l6)=0.29. Once again, the links corroborating male have a lower sum of link

popularities, and the algorithm picks the next link from the sorted list that corroborates male, which is

link l2. Since all four links have been identified, the algorithm stops with the final offer set Of={l1,l2,l3,l6}.

Table 6. Sorted lists

Links corroborating male          Links corroborating female
Links   P(m|lj)   P(lj)           Links   P(f|lj)   P(lj)
l1      0.79      0.08            l8      0.81      0.10
l2      0.70      0.12            l7      0.77      0.07
l4      0.55      0.12            l5      0.60      0.11
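The Scenario III example can be replayed with the heuristic sketch from Section 4.3, roughly as follows (same hypothetical names as before; l3 and l6 are passed in as the pre-determined links).

```python
# Hypothetical replay of the Scenario III example (Table 5 parameters).
link_profile = {"l1": {"m": 0.79, "f": 0.21}, "l2": {"m": 0.70, "f": 0.30},
                "l3": {"m": 0.63, "f": 0.37}, "l4": {"m": 0.55, "f": 0.45},
                "l5": {"m": 0.40, "f": 0.60}, "l6": {"m": 0.35, "f": 0.65},
                "l7": {"m": 0.23, "f": 0.77}, "l8": {"m": 0.19, "f": 0.81}}
link_pop = {"l1": 0.08, "l2": 0.12, "l3": 0.11, "l4": 0.12,
            "l5": 0.11, "l6": 0.29, "l7": 0.07, "l8": 0.10}
prior = {"m": 0.49, "f": 0.51}
consideration = ["l1", "l2", "l4", "l5", "l7", "l8"]      # C_III excludes l3 and l6

offer = heuristic_offer_set(link_profile, link_pop, consideration, prior,
                            n=4, seed=("l3", "l6"))        # l3, l6 are pre-determined
print(offer)                                               # expected: ['l3', 'l6', 'l1', 'l2']
```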


6. Experiments

The objective of the proposed approach is to learn the profiles quickly (i.e., after as few clicks as

possible). Therefore, to validate this approach we need to show that it helps infer the profile of a user

(i.e., the true class of a user) more accurately after only a few clicks as compared to some benchmark

approaches that do not consider active learning. We performed simulated experiments to validate the

proposed approach.5 The experiments implement the proposed and benchmark approaches to determine

the offer sets given data on a number of links. We compare the performance of the proposed approach

with the benchmark approaches by evaluating the prediction accuracy after different numbers of links

have been clicked.

The data needed for the experiments are the link properties (link profiles and the link popularities) for

pages at a site. We used click-stream data collected by comScore Networks in our experiments to

generate link properties. The dataset contained web traversals of 7240 panelists (from single person

households) during August 2002 along with their ages (in different ranges). The available data were

aggregated at the domain level; therefore we used information on the domains visited by the panelists in

the given month. To obtain reliable probability estimates for the link properties, we used data from 1141

domains which had at least 100 unique visitors. We experimented with learning the age of the users. The

age variable was transformed into a binary variable using a cutoff value of 35. In our data set, 44% of the

users are younger than 35 and the remaining 56% of the users are 35 or older. Link profiles are estimated

based on the percentage of unique visitors from each class to a domain during that month. The number of

unique visitors to each domain in a month was used as a measure for link popularity.

5 Conducting experiments on a live website is difficult since the site owner would need to collect true profile information of users and then let us manipulate the links offered in live sessions. We do not have access to a site that has the profile information of its visitors and that would allow us to manipulate links offered.

The benchmark approaches we consider assume that the site is not trying to learn the profiles actively. Sites can have various objectives when identifying links to make available at each page, such as to make recommendations that a user would enjoy (target pages/products), to enable access to certain content, to

facilitate easy navigation, etc. If the site’s purpose is to target certain products or pages, in the absence of

an adequate profile, a site may present links that are more popular as popularity is a broad measure of

consumer interest (Schafer et al. 2001). In that case, the links with high popularities would be more likely

to be included in an offer set. We consider one benchmark approach that imitates such a targeting

strategy. This benchmark approach (referred to as BM-POP) selects links randomly based on the

distribution of their popularities. On the other hand, if the site uses some consideration that is unrelated to

the link profiles and link popularities, then we can view each offer set to consist of links that are picked

randomly with uniform probability from the available consideration sets. We consider a second

benchmark that implements such a strategy. This benchmark approach (referred to as BM-UF) selects

links with uniform probability over the set of candidate links.
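For concreteness, the two benchmarks could be implemented along the following lines; this is a minimal sketch with our own function names, and BM-POP is realized as weighted sampling without replacement.

import random

def bm_uf(candidates, k):
    """BM-UF: choose k links uniformly at random from the candidate links."""
    return random.sample(candidates, k)

def bm_pop(candidates, popularity, k):
    """BM-POP: choose k distinct links with probability proportional to their popularity."""
    chosen, remaining = [], list(candidates)
    for _ in range(k):
        weights = [popularity[l] for l in remaining]
        pick = random.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen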

We conduct experiments for all three scenarios discussed in Section 5. The implementations of the

approaches in the three scenarios differ in the way the consideration sets and the offer sets are determined.

The proposed approach is implemented based on the algorithm presented in Section 4. We also implement

the two benchmark approaches BM-UF and BM-POP.

In the experiments, each user’s click-stream is simulated as follows. A specific class is first assigned

to a user; this is the true class of the user. For testing purposes, each user is initially assumed to be a new

user and therefore the user’s profile is equivalent to the prior class distribution. Next, the consideration set

is identified according to the scenario being simulated. Then, the offer set is determined based on the

approach being evaluated in a given scenario (specific details are provided later). Offer sets of size 10 are

considered in our experiments. The user’s traversal is simulated based on the user’s probability of

clicking on a link in the offer set given the user’s true class, P(lj|ai,Of), and the profile is subsequently

revised based on the selected link. The Bayesian belief revision technique presented in Section 3.1 is used

to revise the profile of the users. Each user click-stream is simulated for up to 15 links clicked. The whole

process is repeated for 500 users from each class.
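The per-user simulation can be pictured as the sketch below. It is our own simplified rendition for a binary profile: the click model takes the form P(lj|ai,Of) proportional to P(ai|lj)P(lj) over the offered links (the form used in the Appendix), the offer-set construction is abstracted behind a callable, and the Bayes update shown is the revision implied by that click model rather than a verbatim copy of Section 3.1.

import random

def simulate_user(true_class, prior, choose_offer, link_profiles, link_pops,
                  n_clicks=15, offer_size=10):
    """Simulate one user's click-stream and revise the profile after each click.

    true_class: 1 or 0, the class assigned to the simulated user
    prior: initial belief P(class=1) for a new, anonymous user
    choose_offer: callable(profile, visited) -> ordered candidate links to offer
    link_profiles: dict link -> P(class=1 | link); link_pops: dict link -> P(link)
    """
    profile, visited, history = prior, set(), [prior]
    for _ in range(n_clicks):
        offer = choose_offer(profile, visited)[:offer_size]
        # Click model: P(l | class, Of) proportional to P(class|l) P(l) within the offer set.
        w1 = {l: link_profiles[l] * link_pops[l] for l in offer}          # class 1 weights
        w0 = {l: (1 - link_profiles[l]) * link_pops[l] for l in offer}    # class 0 weights
        w_true = w1 if true_class == 1 else w0
        clicked = random.choices(offer, weights=[w_true[l] for l in offer], k=1)[0]
        visited.add(clicked)
        # Bayesian belief revision given the clicked link and the offered set.
        like1 = w1[clicked] / sum(w1.values())
        like0 = w0[clicked] / sum(w0.values())
        profile = like1 * profile / (like1 * profile + like0 * (1 - profile))
        history.append(profile)
    return history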


6.1. Experimental Results

The performances of the proposed approach and the benchmark approaches are evaluated in terms of their

ability to learn the profile, i.e., to identify the true class of the users. The performance after only a small

number of clicks is especially important as a site would like to learn the profile earlier in a session, so that

the profile can be utilized for making targeted recommendations later in the session. Furthermore, a site

may have a threshold probability beyond which a user’s profile is considered to be adequately known and

the site starts exploiting the profile for making recommendations. Thus, we also evaluate the percentage

of users whose true class was inferred with a probability greater than a threshold value after each link

clicked. We found that the performance of each approach did not differ significantly across the two classes of users; therefore, we report the results aggregated over the two classes. For each scenario, we

first describe how the approaches are implemented and then present and discuss the experimental results.

6.1.1. Results for Scenario I

In Scenario I, the consideration set consists of all the links not visited by a user. The performances of our

approach (denoted as AL) and the two benchmark approaches after three clicks are reported in Table 7.

Table 7a lists, for the proposed and benchmark approaches, the proportions of test instances that resulted

in a probability prediction for the true class in different ranges of values (i.e., the probability density of

correct predictions). For instance, the proposed approach is able to identify the true class of 30% of users

with a probability greater than 0.99 and 50% of users with a probability between 0.9 and 0.99 after only

three clicks. Neither of the benchmark approaches can identify the true class of any user with a

probability greater than 0.99 after three clicks. BM-UF identified the user's true class with a probability between 0.9 and 0.99 in only 4% of the test cases, while BM-POP did so in only 0.2% of the cases. Neither benchmark approach is able to identify the true class with a high level of confidence: the majority of predictions fall in the 0.3-0.7 range for BM-POP, and in the 0.3-0.9 range for BM-UF.


Table 7 – Performance in Scenario I after 3 clicks

7a. Probability Density of Correct Prediction

Probability Ranges    AL        BM-UF     BM-POP
>0.99                 30.00%    0.00%     0.00%
0.9-0.99              50.00%    4.00%     0.20%
0.7-0.9               9.20%     21.80%    9.10%
0.5-0.7               4.50%     38.90%    48.40%
0.3-0.5               1.50%     27.30%    39.80%
0.1-0.3               2.00%     7.60%     2.50%
0-0.1                 2.80%     0.40%     0.00%

7b. Cumulative Distribution of Correct Prediction

Probability Thresholds    AL        BM-UF     BM-POP
>0.99                     30.00%    0.00%     0.00%
>0.9                      80.00%    4.00%     0.20%
>0.7                      89.20%    25.80%    9.30%
>0.5                      93.70%    64.70%    57.70%
>0.3                      95.20%    92.00%    97.50%
>0.1                      97.20%    99.60%    100.00%
>0                        100.00%   100.00%   100.00%

Table 7b presents the cumulative distribution of the predicted probabilities of the true class of users.

Each row reports the percentage of users whose true class was identified with a probability larger than the

given threshold. This table helps us evaluate the percentage of users that would be correctly classified

after three link clicks given the corresponding probability thresholds for classification. For example, if a

site considers a threshold of 0.9 to be adequate, then after only three clicks 80% of all users would be

correctly classified by the proposed approach. If the classification threshold is 0.5, then the proposed

approach would be able to identify the correct class of 93.7% of all users after three clicks.
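Reading Table 7b amounts to computing, for each threshold, the share of simulated users whose predicted true-class probability exceeds it. A trivial sketch follows (the input name is ours):

def cumulative_correct(true_class_probs, thresholds=(0.99, 0.9, 0.7, 0.5, 0.3, 0.1, 0.0)):
    """Fraction of users whose true class was predicted with probability above each threshold."""
    n = len(true_class_probs)
    return {t: sum(p > t for p in true_class_probs) / n for t in thresholds}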

The benchmark approaches perform far worse. Further, the benchmark approach that selects links based on popularity (BM-POP) performs worse than the benchmark approach that selects links with uniform probability (BM-UF). For example, for a probability threshold of 0.5, BM-POP is able to identify the true class of 57.7% of users, while BM-UF is able to identify the true class of 64.7% of users after three clicks. The more popular links are often those links that are liked by members of both

classes; hence they are less informative in terms of learning the class of a user. Furthermore, these links

are more likely to be selected in an offer set compared to less popular links. Therefore, the performance of

BM-POP is not as good overall.

Figure 1 presents the performance of the three approaches in terms of the percentage of users

correctly classified beyond the probability threshold 0.9 after each link clicked. The performances of all approaches generally improve with the number of clicks. The marked performance difference between the proposed approach and the benchmark approaches is clearly visible for all numbers of clicks considered. After 15 clicks, the proposed approach is able to correctly classify 99.7% of all users, whereas BM-UF and BM-POP correctly classify only 41% and 10.2% of users, respectively.

Figure 1. Classification performance in Scenario I for threshold 0.9

As the performance of BM-POP is relatively poor for all numbers of links clicked, we report only the

performance of BM-UF as the benchmark approach of interest for Scenarios II and III.

6.1.2. Results for Scenario II

In Scenario II, the consideration set consists of links relevant to a page. We consider a pre-determined

number of other pages to be relevant to each page. We do not presume any specific relationship between

the properties (i.e., link profile and link popularity) of the page requested and the properties of the links

relevant to that page. Therefore, to implement this scenario we identify the consideration set by randomly

selecting a fixed number of links not visited by the user (each link has equal probability of being included

in the consideration set). A new consideration set is identified for each page requested (link clicked).

There are 1141 links available initially. We experiment with consideration sets of size 25, 50, 100, and

200, respectively. The links to offer are identified from the particular consideration set determined for the

requested page.
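Under these assumptions, forming a consideration set for a requested page reduces to a uniform draw of unvisited links; a minimal sketch with our own names follows, where the fixed size mirrors the 25/50/100/200 settings.

import random

def consideration_set(all_links, visited, size):
    """Scenario II sketch: draw a fixed number of unvisited links uniformly at random."""
    unvisited = [l for l in all_links if l not in visited]
    return random.sample(unvisited, min(size, len(unvisited)))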


Figure 2. Classification performance in Scenario II for threshold 0.9

Figure 2 presents the percentage of users correctly classified beyond the probability threshold 0.9

after each link clicked. For the proposed approach, the results of the experiments with the different consideration set sizes are presented. The performance of the benchmark approach was not significantly

different for different sizes of consideration sets considered. This is because both the links in the

consideration sets and in the offer sets in the benchmark approach are determined randomly with uniform

probability over the universe of links. Hence, we present only the performance of the benchmark

approach for the consideration set of size 200. Each experiment is denoted with the approach being

evaluated and the size of the consideration set. For example, AL_CS25 refers to the experiment with the

proposed approach where the size of the consideration set is 25.

As expected, the performance of the proposed approach improves with the consideration set size. This

is because as the size of the consideration set increases the likelihood of having more informative links in

the consideration set increases as well. For the consideration set of size 200 the proposed approach

correctly classifies 65% of the users beyond a threshold value of 0.9 after only three clicks. Even with a

consideration set of size 25 the performance of the proposed approach is remarkably better than the

benchmark approach. In this case, the proposed approach correctly classifies 19.9% of the users beyond a

threshold value of 0.9 after only three clicks. This number increases to 38.7% after five clicks, 66% after


10 clicks, and 81.5% after 15 clicks. As evident from Figure 2, the benchmark approach with a

consideration set of size 200 performs considerably worse.

6.1.3. Results for Scenario III

In Scenario III, a number of links in the offer set are pre-determined based on considerations other than

profile learning. The site identifies the remaining links in the offer set for learning the profiles. All links

not visited by the user are considered for inclusion in the offer set. Similar to the benchmark approach, we

assume the site’s considerations when identifying the pre-determined links are unrelated to the link

profiles and popularities. Therefore, as in the benchmark approach, the pre-determined links are identified

randomly with uniform probability from the links not visited by the user. The remaining links are selected

from the consideration set based on the proposed algorithm. We vary the numbers of links that are

targeted towards learning the profiles by considering 2, 4, 6, and 8 targeted links out of the 10 links in the

offer set (denoted as AL_2, AL_4, AL_6, and AL_8, respectively). If none of the links are targeted

towards learning the profile then it is equivalent to the benchmark approach (BM-UF). If all 10 links are

targeted towards learning the profile then this approach becomes equivalent to the proposed approach in

Scenario I.
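An offer set in this scenario can be assembled as sketched below (function names are ours): the pre-determined slots are filled uniformly at random from the unvisited links, and the remaining slots are filled by the proposed selection, which is assumed to take the pre-determined links into account when balancing.

import random

def scenario3_offer_set(all_links, visited, n_active, offer_size, select_active):
    """Scenario III sketch: mix randomly pre-determined links with actively selected ones.

    n_active: number of slots allocated to profile learning (e.g., 2, 4, 6, or 8)
    select_active: callable(pre_determined, candidates, k) implementing the proposed selection
    """
    unvisited = [l for l in all_links if l not in visited]
    pre_determined = random.sample(unvisited, offer_size - n_active)
    candidates = [l for l in unvisited if l not in pre_determined]
    active = select_active(pre_determined, candidates, n_active)
    return pre_determined + active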

Figure 3 presents the percentage of correctly classified users (using the probability threshold 0.9) for

the benchmark approach and the proposed approach with the different numbers of links targeted towards

learning. Not surprisingly, the proposed approach consistently outperforms the benchmark approach for

all numbers of clicks and for all proportions of pre-determined links. As expected, the performance

improves with the proportion of links allocated towards actively learning the profile. These results

highlight the fact that the ability of a website to learn profiles quickly depends on how flexible the site is

in terms of the proportion of links it can allocate for active learning. Using even a small proportion of the

links towards active learning makes a big difference in terms of the ability to learn quickly. When only

twenty percent of the links are used to actively learn the profile (AL_2), 28.6% of the users are correctly

classified after five clicks as compared to 8.5% correctly classified when none of the links are targeted


towards learning (BM-UF). After 10 clicks, AL_2 correctly classifies 56.5% of users, whereas BM-UF correctly classifies only 26.9% of users. After 15 clicks, 72.6% of users are correctly classified by AL_2 as compared to 39.8% of users correctly classified by BM-UF.

Figure 3. Classification performance in Scenario III for threshold 0.9

When larger numbers of links are allocated for active learning, the proposed approach is able to

correctly classify the majority of users after only a few clicks. For example, the percentage of users

correctly classified after only three clicks is 62.7% when 8 links are used for active learning and 43% when 6 links are so used.

7. Conclusion and Discussion

Personalization systems are used in online environments to enhance the quality of service to customers by

providing them relevant content. To do this effectively, firms require the necessary profile information about their site users. By obtaining the relevant profile information quickly, firms can utilize these systems more effectively. We have shown how an implicit profile learning process can be

accelerated by judiciously selecting the links to offer at each page. Our experiments demonstrate that the

proposed active learning approach vastly speeds up the learning ability as compared to when the site

learns user profiles passively.


The proposed approach can reduce or eliminate the need to ask users to register at a site before

allowing them to start using the site. The ability to learn a user’s profile in a very small number of clicks

will enable a site to provide personalized services quickly within a session, even if the users are

anonymous to begin with. This has important implications for both online retailers and content delivery

sites. Online retailers will be able to provide more relevant recommendations to their new or anonymous users sooner, increasing their immediate sales as well as overall customer satisfaction. The ability to learn

profiles quickly in an implicit manner is especially beneficial for content delivery sites, which typically

have fewer means of collecting customer information due to lack of transactions on their web sites. The

proposed approach will allow content delivery sites to quickly provide targeted advertisements and

services without having to collect the profile information explicitly from users. Providing targeted

advertisements quickly can substantially increase the advertising revenues for such sites. Our work also

has important implications for advertising networks, as offering superior targeting capabilities is of

critical importance to such firms. Advertising networks could find it beneficial to control certain links as

well as the advertisements at their partner websites in order to learn the profiles quickly.

Our work can be extended in several ways. In Scenario III, we assumed that a site could control a pre-determined proportion of the offer set for learning a user's profile.

This proportion could itself be endogenous in the model, where it could depend on the trade-offs faced by

the firm when offering links to learn the profile and when offering links identified based on other

considerations. In another scenario, a site may decide to dynamically change this proportion in the course

of a session, based on how successful it is in learning the desired profile. For instance, a site may consider

allocating a large proportion of links targeted towards learning the profile initially, and then gradually

reduce this proportion. Another research direction to consider is how to combine market segmentation

profiles with other kinds of profiles (e.g., term vectors) in order to best exploit all the information that

may be available to a site.


References

1. Ansari, A., S. Essegaier, R. Kohli. 2000. Internet Recommendation Systems. J. Marketing Res. 37(3) 363-375.

2. Ansari, A., C. F. Mela. 2003. E-Customization. J. Marketing Res. 40(2) 131-145.

3. Atahan, P., S. Sarkar. 2007. A Probabilistic Approach to Learning Profiles from Website Traversals. Proc. Twelfth INFORMS Conf. on Inform. Systems and Technology (CIST), Seattle.

4. Baglioni, M., U. Ferrara, A. Romei, S. Ruggieri, F. Turini. 2003. Preprocessing and Mining Web Log Data for Web Personalization. Proc. Eighth Congress of the Italian Association for Artificial Intelligence, Pisa, Italy, 237-249.

5. Billsus, D., M. J. Pazzani. 2000. User Modeling for Adaptive News Access. User Modeling and User-Adapted Interaction 10(2-3) 147-180.

6. Cannon, H. M. 2001. Addressing new media with conventional media planning. J. Interactive Advertising 1(2) http://jiad.org/vol1/no2/cannon/index.html.

7. Culnan, M. J., G. R. Milne. 2001. The Culnan-Milne Survey on Consumers and Online Privacy Notices: Summary of Responses. http://www.ftc.gov/bcp/workshops/glb/supporting/culnan-milne.pdf.

8. Gal-Or, E., M. Gal-Or. 2005. Customized advertising via a common media distributor. Marketing Sci. 24(2) 241-253.

9. Iyer, G., D. Soberman, J. M. Villas-Boas. 2005. The Targeting of Advertising. Marketing Sci. 24(3) 461-476.

10. Kotler, P. 2003. A Framework for Marketing Management. Pearson Education, Upper Saddle River, NJ.

11. Li, S., J. C. Liechty, A. L. Montgomery. 2002. Modeling Category Viewership of Web Users with Multivariate Count Models. Working Paper 2003-E25, Carnegie Mellon Graduate School of Industrial Administration, Pittsburgh, PA.

12. Liu, F., C. Yu, W. Meng. 2004. Personalized Web Search for Improving Retrieval Effectiveness. IEEE Trans. Knowledge Data Engrg. 16(1) 28-40.

13. Middleton, S. E., N. R. Shadbolt, D. C. De Roure. 2004. Ontological User Profiling in Recommender Systems. ACM Trans. Inform. Systems 22(1) 54-88.

14. Montgomery, A. L. 2001. Applying Quantitative Marketing Techniques to the Internet. Interfaces 30(2) 90-108.

15. Mitchell, T. M. 1997. Machine Learning. McGraw Hill.

16. Padmanabhan, B., Z. Zheng, S. Kimbrough. 2006. An Empirical Analysis of the Value of Complete Information for eCRM Models. MIS Quarterly 30(2) 247-267.

17. Pazzani, M. 1999. A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Rev. 13(5-6) 393-408.

18. Pazzani, M. J., D. Billsus. 1997. Learning and Revising User Profiles: The Identification of Interesting Web Sites. Machine Learning 27(3) 313-331.

19. Rashid, A. M., I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, J. Riedl. 2002. Getting to Know You: Learning New User Preferences in Recommender Systems. Proc. Int'l Conf. Intelligent User Interfaces, San Francisco, 127-134.

20. Schafer, J. B., J. A. Konstan, J. Riedl. 2001. E-Commerce Recommendation Applications. Data Mining and Knowledge Discovery 5(1-2) 115-153.

21. Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27(7, 10) 379-423, 623-656.

22. Statistical Research Inc. 2001. Even veteran Web users remain skittish about sites that get personal. Press Releases (June 7) http://www.statisticalresearch.com/press/pr060701.htm.

23. Van den Poel, D., W. Buckinx. 2005. Predicting online-purchasing behavior. European J. Operational Res. 166(2) 557-575.

24. Vozalis, M. G., K. G. Margaritis. 2007. Using SVD and Demographic Data for the Enhancement of Generalized Collaborative Filtering. Information Sciences 177(15) 3017-3037.

25. Widyantoro, D. H., T. R. Ioerger, J. Yen. 2001. Learning User Interest Dynamics with a Three-Descriptor Representation. J. American Society for Information Science and Technology (JASIST) 52(3) 212-225.

26. Wong, S. K. M., C. J. Butz. 2000. A Bayesian Approach to User Profiling in Information Retrieval. Technology Letters 4(1) 50-56.

27. Yang, Y., B. Padmanabhan. 2005. Evaluation of Online Personalization Systems: A Survey of Evaluation Schemes and a Knowledge-Based Approach. J. Electronic Commerce Res. 6(2) 112-122.

28. Yu, K., A. Schwaighofer, V. Tresp, X. Xu, H.-P. Kriegel. 2004. Probabilistic Memory-Based Collaborative Filtering. IEEE Trans. Knowledge Data Engrg. 16(1) 56-69.


Appendix: Proof for Proposition 3a

We first prove the following two lemmas that are used to prove the proposition. In all the proofs, we have, by assumption, P(li) = P(lj) for all i, j (i.e., equal link popularities).

Lemma 1: If P(a1| li) > P(a1| lj) for an offer set O = {li, lj}, then P(li | a1,O) > P(li | a2,O).

Proof. The lemma shows that users belonging to a given class are more likely to click on the link in the offer set which has a relatively higher probability associated with that class, as compared to users from the other class. We have

P(li|a1,O) = P(a1|li)P(li) / [P(a1|li)P(li) + P(a1|lj)P(lj)] = P(a1|li) / [P(a1|li) + P(a1|lj)] > 0.5, since P(a1|li) > P(a1|lj).

Similarly,

P(li|a2,O) = P(a2|li) / [P(a2|li) + P(a2|lj)] < 0.5, since P(a2|li) < P(a2|lj).
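As a quick numerical illustration with hypothetical link profiles: if P(a1|li) = 0.8 and P(a1|lj) = 0.3, then P(li|a1,O) = 0.8/(0.8 + 0.3) ≈ 0.73 > 0.5, whereas P(li|a2,O) = 0.2/(0.2 + 0.7) ≈ 0.22 < 0.5, so users from class a1 are indeed more likely than users from class a2 to click li.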

Lemma 2: If P(a1|li) > P(a1|lj) for an offer set O = {li, lj}, then

(i) P(a1|LH,li,O) > P(a1|LH,lj,O), and (ii) P(a2|LH,lj,O) > P(a2|LH,li,O).

Proof. (i) For P(a1|LH,li,O) > P(a1|LH,lj,O) to be true, we must have

P(li|a1,O)P(a1|LH) / [P(li|a1,O)P(a1|LH) + P(li|a2,O)P(a2|LH)] > P(lj|a1,O)P(a1|LH) / [P(lj|a1,O)P(a1|LH) + P(lj|a2,O)P(a2|LH)]

⇔ P(li|a1,O)P(a1|LH)[P(lj|a1,O)P(a1|LH) + P(lj|a2,O)P(a2|LH)] > P(lj|a1,O)P(a1|LH)[P(li|a1,O)P(a1|LH) + P(li|a2,O)P(a2|LH)]

⇔ P(li|a1,O)P(lj|a2,O) > P(lj|a1,O)P(li|a2,O).

We know from Lemma 1 that P(li|a1,O) > P(li|a2,O) and P(lj|a2,O) > P(lj|a1,O). Therefore, it must be true that P(a1|LH,li,O) > P(a1|LH,lj,O).

(ii) The proof follows from part (i).

Proposition 3a: When learning a binary profile using an offer set of cardinality two, offering the two

links with the most extreme link profiles reinforcing each class respectively is optimal when the links have

equal popularity.

Proof:

To prove Proposition 3a, we assume offer set O1 = {l1, l2}, where links l1 and l2 are the links most extremely reinforcing classes a1 and a2, respectively (i.e., P(a1|l1) > P(a1|li) ∀ i ≠ 1, and P(a2|l2) > P(a2|lj) ∀ j ≠ 2). We show that when one of the links is replaced with some other link (say, link la), it leads to a decrease in the expected information gain. Without loss of generality, we assume that link l2 is being replaced by link la. We denote Oa = {l1, la}. Since P(a2|l1) = 1 − P(a1|l1) and P(a1|l2) = 1 − P(a2|l2), it follows that P(a1|l1) > P(a1|la) > P(a1|l2) and P(a2|l1) < P(a2|la) < P(a2|l2).

Given the offer sets O1 and Oa as defined, we need to show EI(A|LH,O1) > EI(A|LH,Oa). To prove this, we must show the following condition holds:

P(l1|LH,O1) I(A|LH,l1,O1) + P(l2|LH,O1) I(A|LH,l2,O1) > P(l1|LH,Oa) I(A|LH,l1,Oa) + P(la|LH,Oa) I(A|LH,la,Oa)

⇔ P(l1|LH,O1)[H(A|LH) − H(A|LH,l1,O1)] + P(l2|LH,O1)[H(A|LH) − H(A|LH,l2,O1)] > P(l1|LH,Oa)[H(A|LH) − H(A|LH,l1,Oa)] + P(la|LH,Oa)[H(A|LH) − H(A|LH,la,Oa)]

⇔ H(A|LH) − P(l1|LH,O1) H(A|LH,l1,O1) − P(l2|LH,O1) H(A|LH,l2,O1) > H(A|LH) − P(l1|LH,Oa) H(A|LH,l1,Oa) − P(la|LH,Oa) H(A|LH,la,Oa)

⇔ P(l1|LH,O1) H(A|LH,l1,O1) + P(l2|LH,O1) H(A|LH,l2,O1) < P(l1|LH,Oa) H(A|LH,l1,Oa) + P(la|LH,Oa) H(A|LH,la,Oa)

Let x = P(a2|l2) − P(a2|la). Then we have P(a2|la) = P(a2|l2) − x, where x > 0. It follows that P(a1|la) = P(a1|l2) + x. As link l1 is the most discriminating link towards class a1, P(a1|la) cannot be greater than P(a1|l1). Hence we have P(a1|la) < P(a1|l1). Substituting P(a1|l2) + x for P(a1|la) in this inequality, we find that x < P(a1|l1) − P(a1|l2).

We define f(x) = P(l1|LH,Oa) H(A|LH,l1,Oa) + P(la|LH,Oa) H(A|LH,la,Oa).

When x = 0, f(x) = P(l1|LH,O1) H(A|LH,l1,O1) + P(l2|LH,O1) H(A|LH,l2,O1). We differentiate f(x) with respect to x. We show that the derivative is positive within the valid range of x, which implies that f(x) increases with x. Therefore, for 0 < x < P(a1|l1) − P(a1|l2), the above inequality will hold.

We first algebraically manipulate f(x) to simplify the analysis. Expanding the conditional entropies,

f(x) = P(l1|LH,Oa)[−P(a1|LH,l1,Oa) log P(a1|LH,l1,Oa) − P(a2|LH,l1,Oa) log P(a2|LH,l1,Oa)] + P(la|LH,Oa)[−P(a1|LH,la,Oa) log P(a1|LH,la,Oa) − P(a2|LH,la,Oa) log P(a2|LH,la,Oa)]

and, since P(l|LH,Oa) P(ai|LH,l,Oa) = P(l|ai,Oa) P(ai|LH), this can be written as

f(x) = −P(l1|a1,Oa) P(a1|LH) log P(a1|LH,l1,Oa) − P(l1|a2,Oa) P(a2|LH) log P(a2|LH,l1,Oa) − P(la|a1,Oa) P(a1|LH) log P(a1|LH,la,Oa) − P(la|a2,Oa) P(a2|LH) log P(a2|LH,la,Oa)

To simplify the differentiation, we define the following four functions, and then express the function f(x) in terms of these four functions.

f1(x) = P(l1|a1,Oa) P(a1|LH) = [P(a1|l1) / (P(a1|l1) + P(a1|la))] P(a1|LH) = [P(a1|l1) / (P(a1|l1) + P(a1|l2) + x)] P(a1|LH)

f2(x) = P(l1|a2,Oa) P(a2|LH) = [P(a2|l1) / (P(a2|l1) + P(a2|la))] P(a2|LH) = [P(a2|l1) / (P(a2|l1) + P(a2|l2) − x)] P(a2|LH)

f3(x) = P(la|a1,Oa) P(a1|LH) = [P(a1|la) / (P(a1|l1) + P(a1|la))] P(a1|LH) = [(P(a1|l2) + x) / (P(a1|l1) + P(a1|l2) + x)] P(a1|LH)

f4(x) = P(la|a2,Oa) P(a2|LH) = [P(a2|la) / (P(a2|l1) + P(a2|la))] P(a2|LH) = [(P(a2|l2) − x) / (P(a2|l1) + P(a2|l2) − x)] P(a2|LH)

Then we have

P(a1|LH,l1,Oa) = P(l1|a1,Oa) P(a1|LH) / [P(l1|a1,Oa) P(a1|LH) + P(l1|a2,Oa) P(a2|LH)] = f1(x) / [f1(x) + f2(x)]

P(a2|LH,l1,Oa) = f2(x) / [f1(x) + f2(x)]

P(a1|LH,la,Oa) = P(la|a1,Oa) P(a1|LH) / [P(la|a1,Oa) P(a1|LH) + P(la|a2,Oa) P(a2|LH)] = f3(x) / [f3(x) + f4(x)]

P(a2|LH,la,Oa) = f4(x) / [f3(x) + f4(x)]

This implies

f(x) = −{ f1(x) log[f1(x)/(f1(x)+f2(x))] + f2(x) log[f2(x)/(f1(x)+f2(x))] + f3(x) log[f3(x)/(f3(x)+f4(x))] + f4(x) log[f4(x)/(f3(x)+f4(x))] }

Differentiating f(x) with respect to x, the terms that arise from differentiating the log arguments cancel out (within each pair of terms they sum to zero), and we obtain

f'(x) = −{ f1'(x) log[f1(x)/(f1(x)+f2(x))] + f2'(x) log[f2(x)/(f1(x)+f2(x))] + f3'(x) log[f3(x)/(f3(x)+f4(x))] + f4'(x) log[f4(x)/(f3(x)+f4(x))] }

We evaluate the derivatives of the functions f1(x), f2(x), f3(x), and f4(x):

f1'(x) = −P(a1|l1) P(a1|LH) / (P(a1|l1) + P(a1|l2) + x)^2

f2'(x) = P(a2|l1) P(a2|LH) / (P(a2|l1) + P(a2|l2) − x)^2

f3'(x) = P(a1|l1) P(a1|LH) / (P(a1|l1) + P(a1|l2) + x)^2 = −f1'(x)

f4'(x) = −P(a2|l1) P(a2|LH) / (P(a2|l1) + P(a2|l2) − x)^2 = −f2'(x)

Inserting them into f’(x), we obtain

( ) ( )

( ) ( ) ⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜

−+−

+++

−++

++−

−=

),,(log)()(

)()(),,(log

)()(

)()(

),,(log)()(

)()(),,(log

)()(

)()(

)('

2222212

212122

2111

111

12222212

2121122

2111

111

aaaa

aa

OlLHaPxlaPlaP

LHaPlaPOlLHaP

xlaPlaP

LHaPlaP

OlLHaPxlaPlaP

LHaPlaPOlLHaP

xlaPlaP

LHaPlaP

xf

( )( )

( )( )),,(log),,(log

)()(

)()(

),,(log),,(log)()(

)()()('

1222222212

212

1211222111

111

aaa

aaa

OlLHaPOlLHaPxlaPlaP

LHaPlaP

OlLHaPOlLHaPxlaPlaP

LHaPlaPxf

−−+

+

−++

=

( ) ( ) ),,(),,(

log)()(

)()(),,(),,(

log)()(

)()()('

12

222

2212

212

1

1122

2111

111

a

aa

aa

a

OlLHaPOlLHaP

xlaPlaP

LHaPlaPOlLHaPOlLHaP

xlaPlaP

LHaPlaPxf

−++

++=

In the above expression, the following terms will always be positive:

P(a1|l1) P(a1|LH) / (P(a1|l1) + P(a1|l2) + x)^2 and P(a2|l1) P(a2|LH) / (P(a2|l1) + P(a2|l2) − x)^2.

We know from Lemma 2 (i) and (ii) that P(a1|LH,l1,Oa) > P(a1|LH,la,Oa) and P(a2|LH,la,Oa) > P(a2|LH,l1,Oa), implying log[P(a1|LH,l1,Oa)/P(a1|LH,la,Oa)] > 0 and log[P(a2|LH,la,Oa)/P(a2|LH,l1,Oa)] > 0, respectively. Since the product and sum of positive terms will be positive, f'(x) will be positive in the valid range of x, indicating that f(x) is increasing in x. Because f(x) at x = 0 equals the corresponding weighted entropy sum for offer set O1, the required inequality holds for 0 < x < P(a1|l1) − P(a1|l2), and hence EI(A|LH,O1) > EI(A|LH,Oa). This establishes the proposition.