Recsys virtual-profiles

Generating Supplemental Content Information Using Virtual Profiles

Haishan Liu
LinkedIn Corporation
2029 Stierlin Court
Mountain View, CA

Mohammad Amin
LinkedIn Corporation
2029 Stierlin Court

Baoshi Yan
LinkedIn Corporation
2029 Stierlin Court

Anmol Bhasin
LinkedIn Corporation
2029 Stierlin Court


ABSTRACT

We describe a hybrid recommendation platform/technique at LinkedIn that seeks to optimally extract relevant information pertaining to items to be recommended. By extending the notion of an item profile, we propose the concept of a “virtual profile” that augments the content of the item with a rich set of features inherited from members who have already shown explicit interest in it. Unlike item-based collaborative filtering, we focus on discovering the characteristic descriptors that underlie the item-user association. Such information is used as supplemental features in a content-based filtering system. The main objective of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the features extracted from it to other affiliated entities that may suffer from relative data scarcity. We empirically evaluate the proposed method on a real-world community recommendation problem at LinkedIn. The results show that virtual profiles outperform a collaborative filtering based approach (“users who like this also like that”). In particular, the improvement is more significant for new users with only limited connections, demonstrating the capability of the method to address the cold-start problem in pure collaborative filtering systems.

Categories and Subject Descriptors

H.2.8 [Database Management]: Data Mining

General Terms

Theory

Keywords

hybrid recommender systems, feature generation and extraction, model-based recommendation, virtual profiles

1. INTRODUCTION

Large-scale recommender systems, in the era of internet-scale data deluge, contribute significantly to mitigating the information overload problem by unveiling relevant and interesting objects to users. Rather than hoping for serendipitous encounters, recommender systems bring forth the notion of personalized information discovery by presenting to the user a smaller pool of relevant objects. Collaborative filtering, the de facto mechanism for recommendation, fails to address “cold start” problems, which has led to the exploration of hybrid recommenders. Hybrid recommenders combine information obtained from different sources and techniques to achieve a better outcome. Typically, a hybrid recommender system incorporates information from a myriad of sources, e.g., content meta data, interaction data, global popularity, social network and social interaction information, and so on. Each of these information sources offers a different level of relevance guarantee at a different computational overhead. Hence, how these information sources are computed and how they are combined play a vital role in the final outcome.

As of today, LinkedIn has more than 220 million users. As the largest and most popular professional networking site, LinkedIn presents some unique opportunities and challenges for content discovery and recommendation. It is imperative for the members to be able to discover and subscribe to companies and groups (referred to as communities henceforth) that might be relevant to them from a professional context.

In this paper, we describe a hybrid community recommendation platform/technique at LinkedIn that optimally combines information from multiple sources. In order to extract more relevant information pertaining to the community to be recommended, i.e., to further extend the notion of content meta data, we propose the concept of a “virtual profile” that augments the content meta data with a rich set of features inherited from the set of members who have already shown explicit interest in it. In general, the notion of a virtual profile answers the question: “What are the most dominant features pertaining to the members who have shown interest in a particular community?” This question essentially maps an object into the same feature space as that of its subscribers. Content meta data extended with this inferred information provides additional protection against the cold start problem. LinkedIn data presents a unique opportunity to extend the content features with extracted features, since the data set contains a rich set of information about the subscribers, which renders the synergy immensely valuable.

The contributions of this paper are as follows:

1. A generic content meta data extension method, i.e., virtual profile generation.

2. A scalable and generic recommendation computation platform that powers multiple real-time recommendation products at LinkedIn.

3. Seamless integration of multiple, heterogeneous data sources to compute an optimal outcome.

2. RELATED WORK

There has been a flurry of research in the domain of recommender systems with the objective of improving personalization [1]. Most traditional recommenders are powered by collaborative filtering [9, 17], content-based predictors [8, 14], and knowledge-based filtering techniques [11]. Each individual technique has its own strengths and weaknesses: e.g., while collaborative filtering techniques suffer from data sparsity and cold start problems [15], content-based techniques are prone to skewed recommendations [14]. Hybrid recommenders combine the best of both worlds, making the recommenders more robust in practice. Much work has been done to combine multiple recommenders in an effective way to outperform any single one. In [5], Burke describes a taxonomy of recommender systems, where multiple recommenders are arranged to allow execution in a parallel or cascaded topology. A system described in [4] combines multiple collaborative filtering approaches using a linear combination of static weights learned via linear regression. STREAM [2], which combines multi-tier predictors, uses dynamically generated metrics to learn the next level of predictors. In [12], a hybrid movie recommender system is proposed that uses content-based predictors to boost user data, which drives the ensuing collaborative filtering based recommendation. The content information is obtained from IMDB and a Naive Bayes classifier is used for building user item profiles. Finally, user-based collaborative filtering is employed to obtain the final recommendation. However, this approach suffers from scalability issues. Pazzani [13] proposed a hybrid recommender system where content-based user profiles are used to group similar users, and the grouping is subsequently used to predict user preferences. In many of these user-item recommendation frameworks, items to be recommended can be augmented with meta data corresponding to the members who have already shown explicit interest in them. In other words, these items can be represented as objects in the same feature space as that of the users. These representations can be thought of as “virtual user profiles” or “virtual profiles”. This can potentially add another layer of information to guide the recommendation process. In our approach, we describe a large-scale recommender system that combines data from multiple heterogeneous sources, including virtual profiles and the social network, to serve real-time traffic in a large professional social networking site.

3. METHOD

3.1 System Overview

We adhere to building our recommender system based on content filtering since we have abundant access to rich-content entities, such as user profiles, which enables a straightforward means for feature extraction, indexing, and matching. Target entities (those the client wants recommendations of) are feature extracted and put into a reverse index, and source entities (those the client wants recommendations for) are converted into complex queries against the index. This provides a form of content-based recommendation score where the match is determined by the degree of similarity between the source and target entity features, with different fields weighted by a set of parameters determined by an offline learning-to-rank process. Figure 1 illustrates a brief workflow of the system. It also shows how we can augment the system by including more information, such as virtual profiles, as new features in the content filtering recommendation, as detailed below.

Figure 1: A brief workflow for the recommender system with virtual profiles.
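The matching step described above can be sketched roughly as follows. This is a minimal illustration and not the production system: the field names, the toy entities, the fixed weights, and the use of simple term overlap as the similarity are all assumptions made here, whereas the paper states only that field weights come from an offline learning-to-rank process.

```python
from collections import defaultdict

def build_index(targets):
    """Toy reverse index: (field, term) -> set of target ids."""
    index = defaultdict(set)
    for target_id, fields in targets.items():
        for field, terms in fields.items():
            for term in terms:
                index[(field, term)].add(target_id)
    return index

def score(query, index, field_weights):
    """Rank targets by the weighted number of matching (field, term) pairs."""
    scores = defaultdict(float)
    for field, terms in query.items():
        weight = field_weights.get(field, 0.0)
        for term in set(terms):
            for target_id in index.get((field, term), ()):
                scores[target_id] += weight
    # Highest score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical target entities (communities) with two indexed fields.
targets = {
    "group_ml": {"description": ["machine", "learning"],
                 "virtual_profile": ["python", "statistics"]},
    "group_sales": {"description": ["sales", "leads"],
                    "virtual_profile": ["crm"]},
}
# Hypothetical weights; in the paper these are learned offline.
field_weights = {"description": 1.0, "virtual_profile": 0.5}
# A source entity (user) already converted into a per-field query.
query = {"description": ["learning"], "virtual_profile": ["python", "crm"]}

ranking = score(query, build_index(targets), field_weights)
```

Adding a new information source, such as a virtual profile, amounts to indexing one more field and learning one more weight, which is the extension point the section refers to.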

We view every entity as being characterized by two sets of content features: one extracted from explicit information associated with the entity, which we name the “primary profile”, and the other inferred from the entity’s behavior and association with other entities, which we name the “virtual profile.” The main objective of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the features extracted from it to other affiliated entities that may suffer from relative data scarcity. Essentially, a virtual profile of an entity is an aggregation of statistically relevant features from the primary profiles of affiliated entities, which introduces a collaborative filtering aspect into our content filtering system. For example, a virtual profile of a LinkedIn group consists of distinctive features from its participants, so that the group can be most effectively distinguished from others.

To first extract features from entities to generate primary profiles, we utilize a feature extractor layer, a standalone service that accumulates underlying entity database change events and identifies various fields in the document. The types of fields from which features can be extracted include rich text fields, such as job summary, member position summary, etc., and specialized fields, such as geo entities including region, country, city, coordinates, etc.

The presented content filtering system can be extended to consider other collaborative filtering aspects, for example, by including network proximity as a feature while computing relevance scores. We describe a browsemap-based method along this line as a comparison in Section 4. As a general platform, every application consuming recommendations from this system can easily build its own logic for reranking/reordering of results based on custom filtering criteria. the concept of network proximity, e.g., recommending jobs to discussion groups.

3.2 Generating Virtual Profiles

The virtual profile generation process for an entity aims at selecting, from a total of $n$ features of its affiliated entities, a subset of $k < n$ features that is “maximally informative” about the entity. From a classification point of view, the entity that we generate the virtual profile for represents a target class for a set of documents (primary profiles). We need a measure to evaluate the “information content” of each individual feature with regard to the target class. We propose to use mutual information for this purpose. Mutual information measures arbitrary dependencies between random variables. Moreover, the fact that mutual information is independent of the coordinates chosen permits robust estimation, which makes it suitable for assessing the “information content” of features in complex classification tasks.

In accordance with Shannon’s information theory, the uncertainty of a document class $C$ as a random variable can be measured as:

$$H(C) = -\sum_{c \in C} P(c) \log P(c).$$

After knowing the feature vector $F$, the conditional entropy $H(C|F)$ measures the remaining uncertainty about $C$:

$$H(C|F) = -\sum_{f \in F} P(f) \sum_{c \in C} P(c|f) \log P(c|f).$$

After having observed the feature vector $F$, the mutual information, i.e., the amount by which the class uncertainty is decreased, is defined as:

$$I(C;F) = H(C) - H(C|F) = \sum_{c,f} P(c,f) \log \frac{P(c,f)}{P(c)P(f)},$$

where $P(c,f)$ is the joint probability of class $c$ and feature $f$.
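As a numerical check on these three definitions, the sketch below computes the entropy, the conditional entropy, and the mutual information from a small joint distribution and confirms that the entropy difference and the sum over the joint distribution agree. The joint table and the function names are hypothetical, and natural logarithms are used, so the values are in nats.

```python
import math

def marginal(joint, axis):
    """Sum a joint table {(c, f): P(c, f)} down to P(c) (axis=0) or P(f) (axis=1)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def entropy(dist):
    """H = -sum p log p, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(C|F) = -sum_{c,f} P(c, f) log P(c|f), using P(c|f) = P(c, f) / P(f)."""
    p_f = marginal(joint, 1)
    return -sum(p * math.log(p / p_f[f]) for (c, f), p in joint.items() if p > 0)

def mutual_information(joint):
    """I(C;F) = sum_{c,f} P(c, f) log( P(c, f) / (P(c) P(f)) )."""
    p_c, p_f = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log(p / (p_c[c] * p_f[f]))
               for (c, f), p in joint.items() if p > 0)

# Hypothetical joint distribution over class c in {a, b} and feature f in {0, 1}.
joint = {("a", 1): 0.4, ("a", 0): 0.1, ("b", 1): 0.1, ("b", 0): 0.4}
h_c = entropy(marginal(joint, 0))
# Difference between the two expressions for I(C;F); should be ~0.
gap = abs(mutual_information(joint) - (h_c - conditional_entropy(joint)))

# When class and feature are independent, the mutual information vanishes.
independent = {("a", 1): 0.25, ("a", 0): 0.25, ("b", 1): 0.25, ("b", 0): 0.25}
```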

Therefore, to generate virtual profiles, the goal is to find the optimal feature subset, $S \subseteq F$, so that $I(C;S)$ is maximized. From an information theoretic perspective, selecting features that maximize $I(C;F)$ translates into selecting those features that contain the maximum information about class $C$. However, locating the optimal subset requires an exhaustive combinatorial search over the feature space, requiring a number of runs equal to $\binom{n}{k}$, where $n$ is the size of the original feature set and $k$ is that of the desired subset. Besides, an exact solution also demands large training sample sizes to estimate the higher-order joint probability distribution in $I(F;C)$. For example, Fraser’s method [6], a computationally efficient algorithm for calculating the optimal $I(C;S)$, requires for its convergence a number of samples “in the millions” when the number of features in the input vector is larger than 3 or 4.
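To give a feel for how quickly the number of candidate subsets grows, the short sketch below evaluates the binomial coefficient for a few feature-set sizes. The values of n and the choice k = 50 are picked only for illustration.

```python
import math

def subset_count(n, k):
    """Number of distinct k-feature subsets of an n-feature set."""
    return math.comb(n, k)

# Hypothetical feature-set sizes; k = 50 is an illustrative profile size.
counts = {n: subset_count(n, 50) for n in (100, 1000, 10000)}
digits = {n: len(str(c)) for n, c in counts.items()}  # decimal digits of each count
```

Even for n = 100 the count exceeds 10^28, which is why an exhaustive search is not an option for bag-of-words feature spaces.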

Given these difficulties, most of the existing approaches approximate $I(F;C)$ based on the assumption of lower-order dependencies between features. For example, a second-order feature dependence assumption is proposed by Battiti [3] to approximate $I(F;C)$ by a greedy incremental selection scheme with a heuristic to account for correlations between features: given a set of already selected features, the algorithm chooses the next feature as the one that maximizes the information about the class, corrected by subtracting a quantity proportional to the average mutual information with the selected features.
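The greedy scheme just described can be sketched as below. This follows the verbal description in this section rather than any reference implementation: the proportionality constant `beta`, the precomputed mutual information tables, and the toy values are assumptions introduced for illustration.

```python
def greedy_select(relevance, redundancy, k, beta=1.0):
    """Greedy selection with a redundancy correction.

    relevance:  dict feature -> I(C; f), assumed precomputed
    redundancy: dict frozenset({f, s}) -> I(f; s), assumed precomputed
    beta:       hypothetical proportionality constant for the correction
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def corrected(f):
            if not selected:
                return relevance[f]
            avg = sum(redundancy.get(frozenset((f, s)), 0.0)
                      for s in selected) / len(selected)
            return relevance[f] - beta * avg
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=corrected)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy values: "jvm" is nearly as relevant as "java" but highly redundant with it.
relevance = {"java": 0.9, "jvm": 0.85, "sales": 0.5}
redundancy = {frozenset(("java", "jvm")): 0.8}

with_correction = greedy_select(relevance, redundancy, k=2, beta=1.0)
without_correction = greedy_select(relevance, redundancy, k=2, beta=0.0)
```

With the correction switched off (`beta=0.0`) the selection reduces to ranking features by their individual relevance. Note that the redundancy table grows quadratically with the number of features.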

Unfortunately, the calculation of the pairwise feature correlation $I(f; f')$ is impractical in our case because the feature dimension is extremely high, given the bag-of-words extracted from textual content. Therefore, we make a first-order class dependence assumption that each feature independently influences the class variable, which means that the $m$-th selected feature, $f_m$, is independent of the $(m-1)$ already selected features given the class, i.e., $P(f_m \mid f_1, \ldots, f_{m-1}, C) = P(f_m \mid C)$. This results in a straightforward greedy algorithm to generate the virtual profile for an entity $c$, which consists of the following steps: 1) gather features from all primary profiles associated with entities that have an affiliation with $c$, 2) calculate the mutual information, $I(f; c)$, between each feature and $c$, and 3) select the top $k$ features with the highest $I(f; c)$ into the virtual profile. More specifically, $I(f; c)$ can be calculated as follows.

$$I(f; c) = \sum_{e_f \in \{1,0\}} \sum_{e_c \in \{1,0\}} P(f = e_f, c = e_c) \log \frac{P(f = e_f, c = e_c)}{P(f = e_f)\, P(c = e_c)}, \qquad (1)$$

where $f$ is a random variable that takes values $e_f = 1$ (the entity primary profile contains feature $f$) and $e_f = 0$ (the entity primary profile does not contain feature $f$), and $c$ is a random variable that takes values $e_c = 1$ (the entity is affiliated with $c$) and $e_c = 0$ (the entity is not affiliated with $c$). The probabilities in Equation 1 can be calculated using maximum likelihood estimation.
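The three steps and Equation 1 can be put together in a short sketch. It is a minimal illustration under stated assumptions: primary profiles are represented as sets of features, the maximum likelihood estimates are plain count ratios over a 2x2 contingency table, and the member ids, feature names, and function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information_binary(n11, n10, n01, n00):
    """Equation 1 with maximum likelihood (count ratio) estimates.

    n11: profiles that contain f and are affiliated with c
    n10: contain f, not affiliated
    n01: do not contain f, affiliated
    n00: neither
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for n_fc, n_f, n_c in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if n_fc > 0:  # a zero cell contributes nothing to the sum
            mi += (n_fc / n) * math.log(n * n_fc / (n_f * n_c))
    return mi

def virtual_profile(profiles, affiliated, k):
    """profiles: member id -> set of features; affiliated: ids affiliated with c."""
    n = len(profiles)
    n_c = sum(1 for m in profiles if m in affiliated)
    df_all = Counter(f for feats in profiles.values() for f in feats)
    # Step 1: gather features from the primary profiles of affiliated members.
    df_aff = Counter(f for m, feats in profiles.items()
                     if m in affiliated for f in feats)
    # Step 2: score each gathered feature by I(f; c).
    scores = {}
    for f, n11 in df_aff.items():
        n10 = df_all[f] - n11
        n01 = n_c - n11
        n00 = n - n_c - n10
        scores[f] = mutual_information_binary(n11, n10, n01, n00)
    # Step 3: keep the top k features.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

# Hypothetical members: m1 and m2 follow the community, m3 and m4 do not.
profiles = {
    "m1": {"python", "ml"},
    "m2": {"python", "ml"},
    "m3": {"python", "sales"},
    "m4": {"sales"},
}
top = virtual_profile(profiles, affiliated={"m1", "m2"}, k=2)
```

In this toy example "ml" ranks above "python" because "python" also appears in a non-follower's profile and is therefore less distinctive of the community.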

4. EXPERIMENTS

Our goal is to test whether virtual profiles are a valuable source of features for improving recommendation performance. In designing the experiments, we want to verify the heuristic assumption that virtual profiles can be built from features greedily selected by mutual information. We also want to compare the performance of virtual profiles with other classic collaborative filtering methods and study their tradeoffs. Furthermore, by experimenting with different parameter settings to generate virtual profiles, we want to provide general guidance on how virtual profiles can be best implemented in practice.

4.1 Methodologies

We choose a community recommendation problem at LinkedIn as the test application. Successful recommendations result in users following certain communities, while users are also given the choice to opt out of communities at any later point.

We extract three kinds of features from entities (users and communities) in this application domain as follows.

1. content features: features from users’ and communities’ textual information extracted into predefined standardized fields (e.g., name, industry, description, etc.).

2. virtual profile: as described in Section 3, a set of features selected from a community’s followers as supplements to the community’s primary profile.

3. browsemap: a collaborative feature representing the co-affiliation relationship, or “users who follow X also follow Y.”

Browsemaps capture a notion of similarity between communities that is driven by users’ preferences. To generate a browsemap for a community, from all other communities that it shares followers with, we choose the top 50 ranked by TF/IDF. Then, for each user, we take the closure of the communities she has already followed with respect to browsemaps, and select the top 50 weighted by their TF/IDF scores normalized over the number of communities followed. Communities selected in this way can essentially be seen as recommendations by collaborative filtering. We instead treat them as part of a standalone feature: when combined with users’ content features to generate a search query, it leads to extra field matches, with hits against communities that appear in the feature. The weight of this match, just like matches in other features, can be determined in an offline learning process.
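A rough sketch of the browsemap construction is given below. The text does not spell out the exact TF/IDF formulation, so the one used here is an assumption: the term frequency is the number of shared followers and the inverse document frequency is the log of the total number of users divided by the other community's follower count. Excluding communities the user already follows is likewise an assumption, and the pairwise loop is only suitable for toy data.

```python
import math
from collections import defaultdict

def build_browsemaps(follows, top_n=50):
    """follows: user -> set of followed communities.
    Returns community -> {co-followed community: score}."""
    followers = defaultdict(set)
    for user, communities in follows.items():
        for c in communities:
            followers[c].add(user)
    n_users = len(follows)
    browsemaps = {}
    for x, fx in followers.items():
        scores = {}
        for y, fy in followers.items():
            shared = len(fx & fy)
            if y != x and shared > 0:
                # ASSUMPTION: tf = shared followers, idf = log(N / |followers(y)|).
                scores[y] = shared * math.log(n_users / len(fy))
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        browsemaps[x] = dict(top)
    return browsemaps

def user_browsemap_feature(user_communities, browsemaps, top_n=50):
    """Aggregate browsemaps of followed communities, normalized by their number."""
    if not user_communities:
        return []  # nothing followed yet: the feature is empty
    totals = defaultdict(float)
    for x in user_communities:
        for y, s in browsemaps.get(x, {}).items():
            if y not in user_communities:  # ASSUMPTION: skip already followed
                totals[y] += s
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [(y, s / len(user_communities)) for y, s in ranked]

# Hypothetical follow data.
follows = {"u1": {"A", "B"}, "u2": {"A", "B"}, "u3": {"A", "C"},
           "u4": {"D"}, "u5": set()}
browsemaps = build_browsemaps(follows)
feature_for_follower = user_browsemap_feature({"A"}, browsemaps)
feature_for_new_user = user_browsemap_feature(set(), browsemaps)
```

The empty result for a user who follows nothing is the property that matters for the cold-start comparison with virtual profiles.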

The content features extracted for communities contain only three fields (i.e., name, description, and tags). They represent nearly the minimum amount of information that is required for a content filtering recommender system to function, and are therefore considered as the baseline in the experiment. Browsemaps, on the other hand, are designed as an alternative to virtual profiles for comparison, given that they both take into account the interaction among entities.

As for model fitting, we use a training set including 3.4 million positive and 2.2 million negative examples gathered from both explicit and implicit user feedback (e.g., follow/unfollow or lack of action on recommendations). We apply L2-regularized logistic regression with various combinations of the above-mentioned features. The best model under each configuration is selected by optimizing the area under the ROC curve (AUC-ROC). The performance of the different models is evaluated both offline and online. The results are presented in the next sections.
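The selection criterion can be illustrated with a small AUC-ROC computation. This sketch covers only the metric, not the logistic regression training, and uses the pairwise definition (the probability that a random positive example is scored above a random negative one), which is quadratic in the number of examples and therefore only meant for small inputs. The labels and scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties between a positive and a negative count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels and scores from two candidate models.
labels = [1, 1, 0, 0]
candidates = {
    "model_x": [0.9, 0.4, 0.5, 0.1],
    "model_y": [0.9, 0.8, 0.5, 0.1],
}
aucs = {name: auc_roc(labels, s) for name, s in candidates.items()}
best = max(sorted(aucs), key=aucs.get)  # keep the candidate with the highest AUC
```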

4.2 Results

4.2.1 Offline evaluation

We compare the AUC for models obtained by training with four different feature configurations, namely, (A) content features only, (B) content features plus virtual profiles, (C) content features plus browsemaps, and (D) content features plus both virtual profiles and browsemaps. It can be seen from Figure 2 that the ROC curve of model A completely dominates that of model B (with AUCs 0.72 vs. 0.60), and both of them dominate that of model C (AUC 0.44). The same performance pattern is also exhibited in the precision-recall curves, as shown in Figure 3.

Figure 2: ROC curves for different models. (Axes: false positive rate vs. true positive rate. Curves: content features + vp, content features + bm, content features + vp + bm, content features only.)

Figure 3: Precision-recall curves for different models. (Axes: recall vs. precision. Curves: content features + vp, content features + bm, content features + vp + bm, content features only.)

Besides classification performance, another important measure that can be evaluated offline is coverage, which refers to the degree to which recommendations cover the set of available items (item space coverage) and the degree to which recommendations can be generated for all potential users (user space coverage) [7, 10]. Owing to a distributed algorithm developed at LinkedIn, we are able to calculate recommendations offline for all our 220 million users. Using each of the trained models described above, we calculate a different set of recommendations for each user, with the size of each set capped at 50. We count the number of times each unique community appears in recommendations (its frequency) under the different models. Figure 4 shows the frequencies on a logarithmic scale, sorted in descending order and plotted against their ranks.
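The frequency counts and the two coverage notions can be sketched as follows. The function names and the toy recommendation lists are hypothetical; the definitions follow the description above, with coverage expressed as a fraction of the catalog or of the user base.

```python
from collections import Counter

def recommendation_frequencies(recs_by_user):
    """Times each unique item appears in recommendations, in descending order."""
    counts = Counter(item for recs in recs_by_user.values() for item in recs)
    return sorted(counts.values(), reverse=True)

def item_space_coverage(recs_by_user, catalog):
    """Fraction of available items that appear in at least one recommendation."""
    recommended = {item for recs in recs_by_user.values() for item in recs}
    return len(recommended & set(catalog)) / len(catalog)

def user_space_coverage(recs_by_user, all_users):
    """Fraction of users who receive at least one recommendation."""
    served = {u for u, recs in recs_by_user.items() if recs}
    return len(served & set(all_users)) / len(all_users)

# Hypothetical output of one model for three users and a four-item catalog.
recs = {"u1": ["A", "B"], "u2": ["A"], "u3": []}
catalog = ["A", "B", "C", "D"]

frequencies = recommendation_frequencies(recs)
item_cov = item_space_coverage(recs, catalog)
user_cov = user_space_coverage(recs, list(recs))
```

Plotting `frequencies` against rank on a logarithmic vertical axis gives the kind of curve shown in Figure 4.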

It is not surprising to see that the baseline curve from the content-features-only model is the lowest, since the features extracted for communities in this case contain the least amount of information. The distribution of the recommendation frequency simply reflects the distribution of the amount of textual content of each community, which follows a power law. On the other hand, the curve from the model with the addition of browsemaps visibly bulges outwards from the baseline for about two thirds of the points, indicating that those points show up in recommendations with higher frequencies, hence more coverage. Most remarkably, the model with the addition of virtual profiles significantly increases the frequencies for almost all points on the curve, except for cases where the original baseline frequencies are extremely high or low.

The reason why browsemaps slightly boost the coverage for some communities is that those communities bear little content information yet already have followers. Having followers makes them eligible to be included in other communities’ browsemaps, and thus leads to a higher chance of matching with users. However, for users who have not followed any communities at all, the browsemap becomes an empty feature, which is the reason why, for about a third of communities, there is no increase in coverage from browsemaps compared with the baseline. This phenomenon is also illustrated in Figure 5, in which the recommendation frequencies of unique companies are only counted for new users (i.e., users who have not started following communities yet). We observe that the model with browsemaps produces an identical curve to the baseline, while the model with virtual profiles exerts a consistent boost. This shows that browsemaps, as a feature with a collaborative filtering aspect, fail to address cold start, while virtual profiles provide a well-rounded improvement in terms of both coverage and predictive power.

Figure 4: Number of recommendations per unique company. (Vertical axis: number of recommendations, logarithmic scale; horizontal axis: rank. Curves: content features + vp, content features + bm, content features only.)

4.2.2 Online evaluation

To further evaluate models with various feature configurations (i.e., content features with vp, content features with bm, content features with both vp and bm, and content features only), we deployed them to serve real-time online recommendation requests and compared their performance through a bucket test. We assign a unique bucket of 2.5% randomly selected users to each model. The bucket with the model based only on content features is the control, while the others are variants.

The duration of the test is determined according to Wheeler [18], where a conservative estimate of the sample size needed to achieve 80% power (the probability of correctly rejecting the null hypothesis when it is indeed false) is given by Equation 2.

$$n = \left(\frac{4 r \sigma}{\Delta}\right)^2, \qquad (2)$$

where $n$ is the minimum number of samples (impressions to be delivered) for each equal-sized variant, $r$ is the number of variants, $\sigma^2$ is the variance of the OEC (Overall Evaluation Criterion [16], a quantitative measure of the experiment’s objective), and $\Delta$ is the sensitivity, or the desired amount of change. The OEC in this test is the click-through rate (CTR) of recommendations. If we assume each click-through event is a Bernoulli trial with probability $p = \mathrm{ctr}_0$ (the control CTR, which is estimated from historical data), then $\sigma^2 = p(1 - p)$. Applying Equation 2 and knowing the approximate number of recommendation impressions per day, we derive the length of the test to be 7 days.

Figure 5: Number of recommendations for new users per unique company. (Vertical axis: number of recommendations, logarithmic scale; horizontal axis: rank. Curves: content features + vp, content features + bm, content features only.)
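The sample size calculation in Equation 2, together with the Bernoulli variance, can be worked through in a few lines. All input values below (control CTR, sensitivity, daily impressions) are hypothetical and are not the figures used in the experiment; the sensitivity is assumed here to be an absolute CTR difference.

```python
import math

def min_samples_per_variant(r, p, delta):
    """Equation 2 with sigma^2 = p * (1 - p) for a Bernoulli click event.

    r:     number of variants
    p:     control CTR
    delta: desired detectable change, assumed to be an absolute CTR difference
    """
    sigma = math.sqrt(p * (1.0 - p))
    return (4.0 * r * sigma / delta) ** 2

def duration_days(n, impressions_per_variant_per_day):
    """Whole days needed to deliver n impressions to each variant."""
    return math.ceil(n / impressions_per_variant_per_day)

# Hypothetical inputs: 3 variants, 2% control CTR, 0.2 percentage point sensitivity.
n = min_samples_per_variant(r=3, p=0.02, delta=0.002)
days = duration_days(n, impressions_per_variant_per_day=100_000)
```

Because the required sample size scales with the inverse square of the sensitivity, halving the detectable change quadruples the number of impressions needed.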

Figure 7 presents the results of the test by showing the percentage change in CTR of the variant models relative to the control, on each individual day of the test. Overall, the model with virtual profiles outperforms the control by 91.2%. Surprisingly, however, we do not observe any improvement from the model with browsemaps. The model with both virtual profiles and browsemaps increases the CTR by 84.4%. The difference between the two best performing models is not significant (p value 0.062), which is similar to the offline evaluation result. The reason why browsemaps fail to increase overall CTR may be attributed to the fact that only one third of users have followed communities in this particular application, meaning the cold start effect is very pronounced. Virtual profiles, on the other hand, are not vulnerable to this problem, since they are content-based and do not rely on pre-existing user-item affiliations, as is demonstrated in this experiment.

Figure 6: ROC curves for virtual profiles with different numbers of terms. (Axes: false positive rate vs. true positive rate. Curves: vp-top50, vp-top100, vp-top200.)

5. CONCLUSION AND FUTURE WORK

We presented virtual profiles, a generic content meta data extension method. We also described how it is utilized in a scalable and generic content-based hybrid recommender system that powers multiple real-time recommendation products at LinkedIn. The goal of virtual profiles is to provide a means to tap into rich-content information from one type of entity and propagate the features extracted from it to other affiliated entities that may suffer from relative data scarcity. It brings in a collaborative filtering aspect in the form of a supplement to content features in the recommender system. It is shown to outperform a method that directly incorporates network proximity from collaborative filtering.

The experiments support the view that our first-order class dependence assumption and the greedy algorithm for calculating the mutual information are a reasonable approximation. In future work, we will investigate scalable ways to account for dependencies among features. We also plan to explore more term weighting methods besides mutual information, including other classic information theoretic quantities such as the Kullback-Leibler divergence, or TF/IDF.

Figure 7: Model CTRs. (Axes: day of the test, 1 to 7, vs. CTR %. Panels: content features + vp, content features + bm, content features + vp + bm.)

6. REFERENCES

[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.

[2] X. Bao, L. Bergman, and R. Thompson. Stacking recommendation engines with additional meta-features. In Proceedings of the third ACM conference on Recommender systems, RecSys ’09, pages 109–116, 2009.

[3] R. Battiti. Using mutual information for selecting features in supervised neural net learning. Trans. Neur. Netw., 5(4):537–550, July 1994.

[4] R. M. Bell, Y. Koren, and C. Volinsky. The BellKor solution to the Netflix Prize.

[5] R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, Nov. 2002.

[6] A. M. Fraser and H. L. Swinney. Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2):1134–1140, Feb. 1986.

[7] M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. In Proceedings of the fourth ACM conference on Recommender systems, RecSys ’10, pages 257–260, New York, NY, USA, 2010. ACM.

[8] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Inf. Retr., 4(2):133–151, July 2001.

[9] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative work, CSCW ’00, pages 241–250, New York, NY, USA, 2000. ACM.

[10] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst., 22(1):5–53, Jan. 2004.

[11] P. B. Kantor. Recommender systems handbook. Springer, 2009.

[12] P. Melville, R. J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. pages 187–192, 2002.

[13] M. J. Pazzani. A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev., 13(5-6):393–408, Dec. 1999.

[14] M. J. Pazzani and D. Billsus. The adaptive web. chapter Content-based recommendation systems, pages 325–341. Springer-Verlag, Berlin, Heidelberg, 2007.

[15] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM conference on Computer supported cooperative work, CSCW ’94, pages 175–186, New York, NY, USA, 1994. ACM.

[16] R. K. Roy. Design of experiments using the Taguchi approach: 16 steps to product and process improvement. Wiley, 2001.

[17] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, WWW ’01, pages 285–295, 2001.

[18] R. E. Wheeler. Portable power. Technometrics, 16(2):177–179, 1974.
