On the Security of Trustee-based Social Authentications · fact, security of users are correlated in trustee-based social authentications, in contrast to traditional authenticators

1

On the Security of Trustee-based SocialAuthentications

Neil Zhenqiang Gong, Student Member, IEEE and Di Wang

Abstract—Recently, authenticating users with the help of theirfriends (i.e., trustee-based social authentication) has been shownto be a promising backup authentication mechanism [4, 7, 8,24]. A user in this system is associated with a few trustees thatwere selected from the user’s friends. When the user wants toregain access to the account, the service provider sends differentverification codes to the user’s trustees. The user must obtainat least k (i.e., recovery threshold) verification codes from thetrustees before being directed to reset his or her password.

In this paper, we provide the first systematic study aboutthe security of trustee-based social authentications. Specifically,we first introduce a novel framework of attacks, which we callforest fire attacks. In these attacks, an attacker initially obtainsa small number of compromised users, and then the attackeriteratively attacks the rest of users by exploiting trustee-basedsocial authentications. Then, we construct a probabilistic modelto formalize the threats of forest fire attacks and their costsfor attackers. Moreover, we introduce various defense strategies.Finally, we apply our framework to extensively evaluate variousconcrete attack and defense strategies using three real-worldsocial network datasets. Our results have strong implications forthe design of more secure trustee-based social authentications.

Index Terms—Social authentication, security model, backupauthentication.

I. INTRODUCTION

Web services (e.g., Gmail, Facebook, and online Bankings)today most commonly rely on passwords to authenticate users.Unfortunately, two serious issues in this paradigm are: userswill inevitably forget their passwords, and their passwordscould be compromised and changed by attackers, which resultin the failures to access their own accounts.

Therefore, web services often provide users with backupauthentication mechanisms to help users regain access totheir accounts. Unfortunately, current widely used backupauthentication mechanisms such as security questions andalternate email addresses are insecure or unreliable or both.Previous works [5, 23, 27] have shown that security questionsare easily guessable and phished, and that users might forgettheir answers to the security questions. A previously registeredalternate email address might expire upon the user’s change ofschool or job. For the above reasons, it is important to designa secure and reliable backup authentication mechanism.

N. Z. Gong and D. Wang are with the Electrical Engineering and ComputerScience Department, University of California at Berkeley, Berkeley, CA,94720 USA, e-mail: ([email protected]; [email protected]).

We would like to thank Dawn Song for insightful discussions. This workis supported by the NSF under Grants No. CCF-0424422, 0842695, 0831501CT-L, by the AFOSR under MURI Award No. FA9550-09-1-0539, by theOffice of Naval Research under MURI Grant No. N000140911081, and byIntel through the ISTC for Secure Computing. Any opinions, findings, andconclusions or recommendations expressed in this material are those of theauthor(s) and do not necessarily reflect the views of the funding agencies.

Recently, trustee-based social authentication has attractedincreasing attentions and has been shown to be a promisingbackup authentication mechanism [4, 7, 8, 24]. Brainardet al. [4] first proposed trustee-based social authenticationand combined it with other authenticators (e.g., password,security token) as a two-factor authentication mechanism.Later, trustee-based social authentication was adapted to bea backup authenticator [7, 8, 24]. In particular, Schechter etal. [24] designed and built a prototype of trusted-based socialauthentication system which was integrated into Microsoft’sWindows Live ID. Schechter et al. found that trustee-basedsocial authentication is highly reliable. Moreover, Facebookannounced its trustee-based social authentication system calledTrusted Friends in October, 2011 [8], and it was redesignedand improved to be Trusted Contacts [7] in May, 2013.

However, these previous work either focus on security atindividual levels [4, 24] or totally ignore security [7, 8]. Infact, security of users are correlated in trustee-based socialauthentications, in contrast to traditional authenticators (e.g.,passwords, security questions, and fingerprint) where securityof users are independent. Specifically, a user’s security intrustee-based social authentications relies on the security ofhis or her trustees; if all trustees of a user are alreadycompromised, then the attacker can also compromise him orher because the attacker can easily obtain the verificationcodes from the compromised trustees. The impact of thiskey difference has not been touched. Moreover, none of theexisting work has studied the fundamental design problemssuch as how to select trustees for users so that the systemis more secure and how to set the system parameters (e.g.,recovery threshold) to balance between security and usability.

Our work: In this paper, we aim to provide the firstsystematic study about the security of trustee-based socialauthentications. To this end, we first propose a novel frame-work of attacks that are based on the observation that users’security are correlated in trustee-based social authentications.In these attacks, an attacker initially obtains a small numberof compromised users which we call seed users. The attackerthen iteratively attacks other users according to some priorityordering of them. In an attack trial to a user Alice, if at leastk trustees of Alice are already compromised, then the attackercan easily compromise Alice; otherwise the attacker can (op-tionally) send spoofing messages to Alice’s uncompromisedtrustees to request verification codes, and such spoofing attackscan succeed with some probability [24]. Our attacks are similarto forest fires which start from a few points and spread amongthe forests. Thus, we call them forest fire attacks.

arX

iv:1

402.

2699

v4 [

cs.C

R]

8 J

un 2

014

2

Second, we construct a probabilistic model to formalize thethreats of forest fire attacks and their costs for attackers. Foreach user, our model computes the compromise probabilitythat the user is compromised after a given number of attackiterations. With those compromise probabilities, our modelcalculates the expected number of compromised users andtreats it as the threat. Moreover, our model quantifies the costsof sending spoofing messages for attackers.

Third, we explore various scenarios where seed users havedifferent properties and introduce strategies to construct pri-ority orderings. For instance, one scenario could be that seedusers happen to be appointed as trustees of a large number ofusers. Furthermore, we discuss a few defense strategies. Forexample, one strategy is to guarantee that no user is appointedas a trustee of a large number of users.

Results and impact of our work: We apply our framework toextensively evaluate various concrete attack scenarios, defensestrategies, and the impact of system parameters using threereal-world social networks. First, we find that forest fire attackis a potential big threat. In particular, when all the users withat least 10 friends in these social networks adopt trustee-based social authentications, an attacker can compromise tensof thousands of users in some cases even if the numberof seed users is 0; using a small number (e.g., 1,000) ofseed users, the attacker can further compromise two to threeorders of magnitude more users with low (or even no) costsof sending spoofing messages. Second, our defense strategy,which guarantees that no users are selected as trustees bytoo many other users, can decrease the expected number ofcompromised users by one to two orders of magnitude andincrease the costs for attackers by a few times in some cases.Third, we find that, in contrast to existing work [7, 8, 24]where the recover threshold is set to be three, it could be setto be four to better balance between security and usability.

In summary, our key contributions are as follows:• We propose a novel framework of attacks, which we call

forest fire attacks.• We construct a model to formalize the threats of forest

fire attacks and their costs for attackers. Moreover, weexplore various attack scenarios and defense strategies.• We apply our framework to extensively evaluate these

attack scenarios, defense strategies, and the impact ofsystem parameters using three real-world social networks.Our results have strong implications for designing moresecure trustee-based social authentications.

II. BACKGROUND

First, we overview how a trustee-based social authenticationsystem works. Then, we introduce two basic concepts, i.e.,social networks and trustee networks.

A. Trustee-based social authentications

A trustee-based social authentication includes two phases:• Registration Phase. The system prepares trustees for

a user Alice in this phase. Specifically, Alice is firstauthenticated with her main authenticator (i.e., password),

and then a few (e.g., 5) friends, who also have accountsin the system, are selected by either Alice herself or theservice provider from Alice’s friend list and are appointedas Alice’s trustees.• Recovery Phase. When Alice forgets her password or

her password was compromised and changed by anattacker, she recovers her account with the help of hertrustees in this phase. Specifically, Alice first sends anaccount recovery request with her username to the ser-vice provider which then shows Alice an URL. Aliceis required to share this URL with her trustees. Then,her trustees authenticate themselves into the system andretrieve verification codes using the given URL. Alicethen obtains the verification codes from her trusteesvia emailing them, calling them, or meeting them inperson. If Alice obtains a sufficient number (e.g., 3)of verification codes and presents them to the serviceprovider, then Alice is authenticated and is directed toreset her password. We call the number of verificationcodes required to be authenticated the recovery threshold.

Note that it is important for Alice to know who her trusteesare in the Recovery Phase. Schechter et al. [24] showedthat users cannot remember their trustees via performing userstudies. Thus, a usable trustee-based social authenticationsystem should remind Alice of her trustees.

Next, we provide details about two representative trustee-based social authentication systems which were implementedby Microsoft [24] and Facebook [7, 8], respectively.

Microsoft’s trustee-based social authentication: Schechteret al. [24] designed and built a trustee-based social authen-tication system and integrated it into Microsoft’s WindowsLive ID service. In the Registration Phase, users provide fourtrustees. The recovery threshold is three. Moreover, users willbe reminded of their trustees.

Facebook’s trustee-based social authentication: Facebook’strustee-based social authentication system is called TrustedFriends [8], whose improved version is Trusted Contacts [7].In the Registration Phase of Trusted Contacts, a user selectsthree to five friends from his or her friend list as trustees. Therecovery threshold is also set to be three. Facebook does notremind a user of his or her trustees, but it asks the user to typein the names of his or her trustees instead. However, once theuser gets one trustee correctly, Facebook will remind him orher of the remaining trustees.

Both trustee-based social authentication systems ask usersto select their own trustees without any constraint. In ourexperiments (i.e., Section VII), we show that the serviceprovider can constrain trustee selections via imposing that nousers are selected as trustees by too many other users, whichcan achieve better security guarantees. Moreover, none of thesework performed rigorous studies to support the choice of threeas the recovery threshold. In fact, our experimental resultsshow that setting the recovery threshold to be four could betterbalance between security and usability.

3

u2 u1

u3

u4

u5

u6

u2 u1

u3

u4

u5

u6

u2 u1

u3

u4

u5

u6

Spoo�ng

(a) (b) (c)

u2 u1

u3

u4

u5

u6

(d)

(a) Ignition Phase (b)-(d) Propagation Phase

Fig. 1: Illustration of a forest fire attack to a service with 6 users. The shown graph is the trustee network. Recovery threshold is three.Users u5 and u6 have adopted the trustee-based social authentication. The attack ordering is u6, u5, u4. (a) u1, u2, and u3 are compromisedseed users. (b) u6 is compromised because three of his or her trustees are already compromised. (c) u5 is compromised because the attackeralready compromises his or her trustees u3 and u6 and obtains a verification code from u4 via spoofing attacks. (d) u4 is not compromisedbecause he or she hasn’t adopted the service.

B. Social networks and trustee networks

We denote a social network as G = (V,E), where each nodein V corresponds to a user in the service and an undirectededge (u, v) represents that users u and v are friends. Moreover,in a trustee-based social authentication system, users and theirtrustees form a directed network. We call this directed networka trustee network and denote it as GT = (VT , ET ), where anode in VT is a user in the service and a directed edge (v, u)in ET means v is a trustee of u.

One fundamental challenge in trustee-based social authen-tication is how to construct the trustee network from a socialnetwork so that the system is more secure.

III. THREAT MODEL

We first introduce attackers’ background knowledge andthen a novel family of attacks which we call forest fire attacks.

A. Background knowledge

We assume that attackers know the trustee network in thetarget service. The reasonableness of this threat model issupported by two evidences. First, attackers can obtain users’usernames. A username is usually a string of letters, digits,and special characters. Moreover, Bonneau et al. [3] showedthat a majority (e.g., 96% in their studies) of websites enableattackers to probe if a string is a legitimate username. Thus,strong attackers, who have enough resources (e.g., a botnet)to perform username probings, can obtain all usernames inthe target service. Second, Schechter et al. [24] found, viaperforming user studies, that users cannot remember their owntrustees. Therefore, a usable trustee-based social authenticationsystem must remind users of their trustees. Recall that anaccount recovery request only requires a username. As a result,an attacker could send account recovery requests with thecollected usernames to the service provider which reminds theattacker of the trustees of each user.

Next, we take Facebook as an example to show how anattacker obtains the trustee network. First, Facebook providesan interface1 to test if a user (represented by a username, real

1https://www.facebook.com/login/identify?ctx=recover

name, or email address) is in Facebook. Thus, the attacker canperform username probings to collect Facebook users. Second,the attacker sends account recovery requests to Facebook usingthe collected names. Recall that Facebook shows all trustees toa user once the user correctly types in one trustee. Moreover,Bilge et al. [4] showed that an attacker can obtain friend listsof around 90% of Facebook users. Thus, the attacker canrepeatedly guess the trustees of a user until success. We notethat Facebook only allows a user to try around 10 times fortyping in the trustees within a short period of time. However,such rate limit cannot prevent a strong attacker from obtainingthe trustee network eventually, though it can increase theattacker’s cost.

B. Forest fire attacks

Our forest fire attacks consist of Ignition Phase and Prop-agation Phase.Ignition Phase: In this phase, an attacker obtains a smallnumber of compromised users which we call seed users. Theycould be obtained from phishing attacks, statistical guessings,and password database leaks, or they could be a coalition ofusers who collude each other. Indeed, a large number of socialnetwork accounts were reported to be compromised,2 showingthe feasibility of obtaining compromised seed users.Propagation Phase: Given the seed users, the attacker itera-tively attacks other users. In each attack iteration, the attackerperforms one attack trial to each of the uncompromised usersaccording to some attack ordering of them. In an attack trial toa user u, the attacker sends an account recovery request withu’s username to the service provider, which issues differentverification codes to u’s trustees. The goal of the attacker isto obtain verification codes from at least k trustees. If at least ktrustees of u are already compromised, the attacker can easilycompromise u; otherwise, the attacker can impersonate u andsend a spoofing message to each uncompromised trustee of uto request the verification code. Schechter et al. [24] found thatsuch spoofing attacks can successfully retrieve a verificationcode with an average probability around 0.05.

2For instance, Gao et al. [9] showed that around 57,000 out of 3,500,000(1.6%) Facebook accounts were compromised.

https://www.facebook.com/login/identify?ctx=recover

4

TABLE I: Representative notations used in our model.

Notations DefinitionsG = (V,E) The trust social network among the users in the serviceGT = (VT , ET ) The trustee network among the users in the serviceVa The set of users who adopt the trustee-based social authentication serviceu A user in the serviceΓ(u) The set of friends of uΓT (u) The set of trustees of uΓT,o(u) The set of users who select u as a trusteedo(u) The number of users who select u as a trusteemu The number of trustees of uk Recovery threshold, i.e., u is authenticated if u obtains verification codes from k of mu trusteesns The number of seed users compromised in the Ignition PhaseS The set of seed users that are compromised in the Ignition PhaseS The strategy to select the seed usersn The number of attack iterations in the Propagation PhaseO(t) The attack ordering according to which the attacker performs attack trials to the users in the tth attack iterationO The ordering construction strategyp(t)s (v, u) The probability of obtaining a verification code from u’s trustee v via spoofing attacks in the tth attack iteration

ps Average spoofing probabilityp(t)c (u) The probability that u is compromised in the tth attack iteration

p(t)c (VT ) The vector of compromise probabilities of all users in the tth attack iteration

p(t)a (u) The probability that u is compromised in at least one attack iteration after t attack iterations

p(t)a (VT ) The vector of aggregate compromise probabilities of all users after t attack iterations

nc(GT , k, ns, n,S,O) The expected number of compromised userscI The cost of obtaining the set of compromised seed users in the Ignition Phasec(t)(u) The expected number of spoofing messages that are sent in the attack trial to u in the tth attack iterationce The average cost per spoofing messagec(GT , k, ns, n,S,O) The expected costp(t)r (u) The recovery probability of u in the tth attack iteration

pr Average recovery probability

Although the spoofing attacks can help attackers compro-mise more users, we want to stress that they are optional.We will show in our experiments that an attacker can stillcompromise a large number of users even if he does not usespoofing attacks to retrieve verification codes in some cases.

Example: Figure 1 shows a forest fire attack to a servicewith 6 users. Note that a good attack ordering can increasethe probability that users are compromised and decrease thenumber of required spoofing messages (see our experimentalresults in Section VII). In our example, if the attacker performsattack trials with an attack ordering of u5, u6, u4, the attackerneeds to spoof both u4 and u6 to compromise u5, whichrequires two spoofing messages. However, with the attackordering of u6, u5, u4, the attacker only needs to spoof u4

to compromise u5, which only requires one spoofing messageand could succeed with a higher probability.

Compromised users could be recovered: Users couldrecover their compromised accounts to be uncompromisedafter they or the service provider detect suspicious activitiesof the accounts. For instance, a trustee of u receiving aspoofing message might report to u, who then changes hisor her password; the phenomenon that a trustee requests lotsof verification codes for different users within a short periodof time is a possible indicator of forest fire attacks, and the ser-vice provider could then notify the users, whose trustees haverequested verification codes, to change passwords. Moreover,a recovered account could be compromised again in futureattack iterations, e.g., when the trustees of the recovered userare still compromised. The process of being compromised andbeing recovered could repeat for many attack iterations.

IV. SECURITY MODEL

In this section, we introduce our security model to formalizethe threats of forest fire attacks and their costs for attackers.Table I summarizes our important notations.

A. Formalizing threats

We use ΓT (u) and mu = |ΓT (u)| to denote the setof trustees and the number of trustees of u, respectively.We denote by ΓT,o(u) the set of users who select u as atrustee. We model a set of seed users (denoted as S) isobtained by a seed users selection strategy, and we denoteit as S. In the tth attack iteration, the attacker performs attacktrials to uncompromised users according to an attack orderingO(t). The attack orderings are constructed by an orderingconstruction strategy which is denoted as O.

We call the probability that u is compromised in the tthattack iteration compromise probability, and we denote it asp

(t)c (u). u is eventually compromised if it is compromised

in at least one attack iteration. Thus, we denote by p(t)a (u)

the probability that u is compromised after t attack iterations,and p

(t)a (u) is called aggregate compromise probability. The

compromise probabilities in the tth attack iteration depend onthe aggregate compromise probabilities after (t − 1) attackiterations. Moreover, we use p

(t)c (VT ) and p

(t)a (VT ) to rep-

resent the vectors of compromise probabilities and aggregatecompromise probabilities of all users in VT , respectively.

Algorithm 1 shows our model of forest fire attacks. Next,we elaborate the iterative computations of compromise prob-abilities and aggregate compromise probabilities.

5

1) Ignition Phase: If u is a seed user, then u’s initialcompromise probability is 1, otherwise we model u’s initialcompromise probability as 0. Formally, we have the initialcompromise probability of u as follows:

p(0)a (u) = p(0)

c (u) =

{1 if u ∈ S0 otherwise

(1)

2) Propagation Phase: The key component is to update theaggregate compromise probability of u when the aggregatecompromise probabilities of u’s trustees are given.Obtaining one verification code: We denote by A the eventthat the attacker obtains a verification code from a trustee vof u and by p(t)(v, u) the probability that A happens in thetth attack iteration. Moreover, we denote the event that v isalready compromised when the attacker attacks u in the tthattack iteration as B. Then we can represent p(t)(v, u) as:

p(t)(v, u) = Pr(A)

= Pr(A|B)Pr(B) + Pr(A|¬B)Pr(¬B) (2)

, where ¬B represents that B does not happen. Next, we modelPr(A|B), Pr(B), Pr(A|¬B), and Pr(¬B), respectively.

When B happens, the attacker can obtain a verificationcode from v with a probability 1, i.e., Pr(A|B) = 1. Pr(B)depends on whether the attacker attacks v before u or not. Ifv is attacked before u, then the probability that B happens isp

(t)a (v), otherwise it is p(t−1)

a (v). Formally, we have:

Pr(B) =

{p

(t)a (v) if v is ordered before up

(t−1)a (v) otherwise

(3)

When B does not happen, the attacker can impersonate uand send a spoofing message to v to request a verification code.We call the probability that such spoofing attacks succeedspoofing probability. Spoofing probability might be differentfor different trustees. A trustee might behave differently tospoofing messages impersonating different users because heor she might have different levels of trusts with the usersthat are impersonated. Moreover, spoofing probability mightbe different in different attack iterations because trustees mightgradually become aware of the spoofing attacks. Thus, wemodel the spoofing probability that the attacker obtains averification code from v in an attack trial to u in the tth attackiteration as p(t)

s (v, u), i.e., Pr(A|¬B) = p(t)s (v, u).

In summary, we have:

p(t)(v, u) ={p

(t)a (v) + p

(t)s (v, u)(1− p(t)

a (v)) if v is ordered before up

(t−1)a (v) + p

(t)s (v, u)(1− p(t−1)

a (v)) otherwise

Computing compromise probabilities: Recall that u iscompromised if the attacker can obtain verification codes fromat least k trustees of u. Thus, given the natural assumption thatu’s trustees are independent, the compromise probability of uin the tth attack iteration is calculated using the following localupdate rule:

p(t)c (u) =

∑φ

∏v∈φ

p(t)(v, u)∏

v∈ΓT (u)−φ

(1− p(t)(v, u)) (4)

Algorithm 1: Our Model of Forest Fire Attacks

Input: GT , k, p(t)s (v, u), ns, n, S , O, ce, cI , and p(t)

r (u).Output: nc(GT , k, ns, n,S,O), c(GT , k, ns, n,S,O).

1 begin2 //Selecting seed users in the Ignition Phase.3 S ←− S(GT , ns)4 //Calculating the compromise probabilities.5 //Ignition Phase.6 for u ∈ VT do7 if u ∈ S then8 p

(0)c (u)←− 1

9 else10 p

(0)c (u)←− 0

11 end12 p

(0)a (u)←− p(0)

c (u)13 end14 //Propagation Phase.15 t←− 116 C ←− 017 while t ≤ n do18 //Constructing an attack ordering.19 O(t) ←− O(GT , p

(t−1)a (VT ))

20 for i = 0 to O(t).size()− 1 do21 u←− O(t)[i]22 Apply Equation 4 to u.

p(t)a (u)←− 1− (1− p(t−1)

a (u))(1− p(t)c (u))

23 p(t)a (u)←− (1− p(t)

r (u))p(t)a (u)

24 c(t)(u)←− Apply Equation 1025 C ←− C + c(t)(u)26 end27 t←− t+ 128 end29 //The expected number of compromised users.30 nc(GT , k, ns, n,S,O)←−∑

u∈VTp

(n)a (u)

31 //The expected cost.32 c(GT , k, ns, n,S,O)←− cI + ceC33 return nc(GT , k, ns, n,S,O), c(GT , k, ns, n,S,O)34 end

, where φ ⊆ ΓT (u) and |φ| ≥ k.

Aggregating compromise probabilities: Assuming thatwhether u is compromised in one attack iteration is indepen-dent with whether u is compromised in another attack iter-ation, we can iteratively compute the aggregate compromiseprobability of u as follows:

p(t)a (u) = 1− (1− p(t−1)

a (u))(1− p(t)c (u)) (5)

Compromised users can recover to be uncompromised: Aswe have discussed in Section III-B, compromised users canrecover to be uncompromised and be compromised again dueto various factors. We call the probability that a compromiseduser u recovers to be uncompromised in the tth attack iterationrecovery probability, and we denote it as p(t)

r (u). Considering

6

the recovery probability, we reformulate the aggregate com-promise probability of u (i.e., Equation 5) as follows:

p(t)a (u) = (1− p(t)

r (u))(1− (1− p(t−1)a (u))(1− p(t)

c (u)))

Expected number of compromised users: Given a forestfire attack specified by the seed users selection strategy S, thenumber of seed users ns, the ordering construction strategyO, and the number of attack iteratons n, we define the threatof the attack as the expected number of users it compromises.Moreover, we denote the expected number of compromisedusers as nc(GT , k, ns, n,S,O), and it is formalized as follows:

nc(GT , k, ns, n,S,O) =∑u∈VT

p(n)a (u) (6)

B. Formalizing costs

We consider the attacker’s costs of obtaining seed users andsending spoofing messages. Suppose the cost to obtain seedusers is cI . In the following, we quantify the cost of sendingspoofing messages.

In an attack trial to u, the attacker sends a spoofing messageto u’s trustee v when the following three independent eventshappen: u is uncompromised, v is uncompromised, and amongthe rest of u’s trustees, the number of compromised ones isless than k.

In the tth attack iteration, the first event happens with aprobability (1− p(t−1)

a (u)). We denote by q(t)(v, u) the prob-ability that the second event happens, and q(t)(v, u) dependson whether v is attacked before u or not. Formally, we have:

q(t)(v, u) =

{1− p(t)

a (v) if v is ordered before u1− p(t−1)

a (v) otherwise(7)

, for any v ∈ ΓT (u). Moreover, the third event happens withthe following probability:

r(t)(v, u) =∑φ

∏w∈φ

(1− q(t)(w, u))∏

w∈ΓT (u)−φ−{v}

q(t)(w, u) (8)

, where φ ⊆ ΓT (u)− {v} and |φ| < k.Therefore, the expected number of spoofing messages (de-

noted as c(t)(v, u)) that the attacker sends to v in the attacktrial to u in the tth attack iteration is formalized as follows:

c(t)(v, u) = (1− p(t−1)a (u))q(t)(v, u)r(t)(v, u) (9)

So the total expected number of spoofing messages (denotedas c(t)(u)) that the attacker sends in the attack trial to u in thetth attack iteration is:

c(t)(u) = (1− p(t−1)a (u))

∑v∈ΓT (u)

q(t)(v, u)r(t)(v, u) (10)

Thus, we obtain the total expected cost of performing aforest fire attack as follows:

c(GT , k, ns, n,S,O) = cI + ce

n∑t=1

∑u∈VT

c(t)(u) (11)

, where ce is the average cost for the attacker to send onespoofing message.

V. ATTACK STRATEGIES

The attacker could design the seed users selection strategyand the attack ordering construction strategy to maximize theexpected number of compromised users. First, we show thatfinding the optimal set of seed users and the optimal orderingconstruction strategy is NP-Complete. Then, we explore var-ious scenarios where seed users have different properties andintroduce two ordering construction strategies.

A. NP-Completeness

Given GT , k, ns, and n, the attacker essentially aims tosolve the following attack maximization problem:

Attack Maximization Problem:nc(GT , k, ns, n) = max

S,Onc(GT , k, ns, n,S,O)

We prove that this problem is NP-Complete. Please refer toAppendix A for the proof.

B. Strategies for selecting seed users

A seed users selection strategy S is essentially to assign ascore which represents some metric of importance to each userand to select ns users with the highest scores as seed users.This is closely related to the node centrality problem [18] inthe network science community. In the following, we modify afew node centrality heuristics as seed users selection strategies.These strategies work on a trustee network, and we name themwith a prefix ‘S-’ to indicate they are used to select seed users.S-Random: As a baseline, this strategy assigns a randomscore ranging from 0 to 1 to each user in the trustee network.S-Degree: Intuitively, if a user u is selected as a trustee bymore users, then compromising u will increase the compro-mise probabilities of more users and thus the attacker has anopportunity to compromise more users. Therefore, S-Degreetreats the number of users that select u as a trustee (i.e.,outdegree of u in the trustee network) as u’s score.S-BadRank: S-BadRank, adapted from BadRank [2], per-forms a random walk on the trustee network. Specifically, S-BadRank starts a random walk from u that is picked uniformlyat random. Then S-BadRank iteratively performs one of thetwo operations: choosing a trustee v of u uniformly at randomand walks to v with a probability 1− α, and selecting a userw uniformly at random from the entire trustee network andwalks to w with a probability α. Traditionally, α is called therestart probability.

The random walk will converge to a stationary probabilitydistribution over all users in the trustee network. The stationaryprobability of u is roughly the frequency with which therandom walk visits u and is used as u’s score. Intuitively, S-BadRank tends to assign a high score to a user that is selectedby many users as a trustee because he or she has a large chanceto be visited by the random walk.S-Closeness: The attacker could also select users that areclose to all other users in the trustee network as seed users be-cause compromising them could help the attacker compromiseother users more quickly. This intuition is formally captured by

7

Algorithm 2: O-Gradient

Input: GT = (VT , ET ) and p(t)a (VT ).

Output: An ordering O of all users in VT .1 begin2 for u ∈ VT do3 for v ∈ ΓT (u) do4 //Probability of getting one verification code

pv ←− p(t)a (v) + p

(t+1)s (v, u)(1− p(t)

a (v))5 end6 pu ←−

∑φ

∏v∈φ pv

∏v∈ΓT (u)−φ(1− pv), where

φ ⊆ ΓT (u) and |φ| ≥ k.7 q

(t)a (u)←− 1− (1− p(t)

a (u))(1− pu)8 end9 for u ∈ VT do

10 d(t)a (u)←− q(t)

a (u)− p(t)a (u)

11 end12 O ←− Sort users decreasingly using d(t)

a (u)13 return O14 end

the closeness metric [22]. Specifically, we define the distancebetween v and u as the length of the shortest directed pathwhose tail is v and whose head is u. Then the closeness ofu is defined as the inverse of the sum of its distances to allother users, and it is treated as u’s score. In our experiments,we adopt the approximate algorithm developed by Okamotoet al. [19] to find the top-ns users with the highest closenessscores since it is scalable to large networks.

C. Strategies for constructing attack orderingsWe name these strategies with a prefix ‘O-’ to indicate that

they are used to construct attack orderings. Essentially, thesestrategies assign a score to each user in the trustee networkand rank them in a decreasing order according to their scores.O-Random: As a baseline, this strategy assigns a randomscore ranging from 0 to 1 to each user in the trustee network.O-Gradient: Algorithm 2 shows O-Gradient. Given thecurrent aggregate compromise probabilities p

(t)a (VT ) of all

users after t attack iterations, the attacker calculates a predictedaggregate compromise probability q(t)

a (u) for each user u bysimulating an attack trial to u. Note that q(t)

a (u) might bedifferent from p

(t+1)a (u) because q(t)

a (u) is calculated with thefixed aggregate compromise probabilities of u’s trustees whilep

(t+1)a (u) is updated with the newest aggregate compromise

probabilities of u’s trustees. Intuitively, a user u with a largerdifference of q(t)

a (u)−p(t)a (u) could be attacked with a higher

priority because such an attack trial could bring more increaseto the aggregate compromise probabilities of users who selectu as a trustee. Thus, O-Gradient calculates the differences ofq

(t)a (u) − p

(t)a (u) for all users and ranks them decreasingly

according to these differences.

VI. DEFENSE STRATEGIES

We discuss potential defense strategies from three aspects,i.e., hiding trustee networks from attackers, mitigating spoof-

ing attacks, and constraining the selection of trustees.

A. Hiding trustee networks

Preventing attackers from obtaining a trustee network is anessential step towards the defense of forest fire attacks. Inthe currently deployed social authentication systems, attackerscan obtain a trustee network because users need to know theirtrustees to retrieve verification codes from them in the Re-covery Phase. An alternative implementation of the RecoveryPhase is that the service provider directly sends verificationcodes to the trustees of the user when receiving an accountrecovery request, and the trustees are required to actively sharethe verification codes to the user. This implementation doesnot require users to know their trustees, and thus it is hard forattackers to obtain the trustee network.

However, this implementation is unreliable and could annoyusers and their trustees. Specifically, u’s trustees might alreadyforget they are trustees of u, and thus they might simply treatthose verification codes as spams and not share them withu, which results in low reliability. Moreover, users do forgetwho their trustees are [24], and thus it is highly impossiblefor u to actively request verification codes from its trustees. Ifthe trustees do actively share the codes with u, then attackerscan frequently send account recovery requests to the serviceprovider which immediately sends verification codes to thetrustees, and the trustees will (possibly) frequently share thecodes with u, which could be annoying to both u and itstrustees. More seriously, after u’s trustees realize that theverification codes are just spams, they might not actively sharethe verification codes with u even if she is really trying torecover her account, which again results in low reliability.

Therefore, hiding trustee networks from attackers sacrificesreliability, which possibly explains why existing trustee-basedsocial authentication systems didn’t adopt this alternativeimplementation of the Recovery Phase.

B. Mitigating spoofing attacks

Another way to defend against forest fire attacks is toremind trustees of not sharing verification codes via messages.This strategy is not novel, and we include it for completeness.Indeed, existing social authentication systems [7, 24] alreadytry to mitigate spoofing attacks. For instance, Microsoft’s sys-tem [24] asks a trustee why she is requesting the verificationcode and encourages her to share the code with the user viaphone or meeting in person.

However, it is hard to control how trustees share theverification codes with others in practice. Indeed, an attackercan still obtain a verification code from a trustee with anaverage probability around 0.05 via message-based spoofingattacks in the Microsoft’s system [24]. It is an interestingfuture work to design better user interfaces to further reducethis spoofing probability.

C. Constraining trustee selections

Finally, we introduce strategies to contrain trustee selec-tions, which are easy to implement and effective at defending

8

against forest fire attacks. We consider both local trusteeselection strategies and global trustee selection strategies. Alocal trustee selection strategy is based on a user’s local socialnetwork structure while a global one is based on the entiresocial network structure. We name these strategies with aprefix ‘T-’ to indicate that they are used to select trustees.

We note that how users select their trustees in a realtrustee-based social authentication system such as Facebook’sTrusteed Contacts is not clear and thus might not be one ofour strategies. However, our work focuses on a comparativestudy about different trustee selection strategies and can shedlight on which strategy is more secure.

1) Local trustee selection strategies: For a user u, a localtrustee selection strategy essentially computes a score s(v, u)for each friend v of u and then selects mu friends with thehighest scores as u’s trustees.T-Random: As a baseline, T-Random assigns a randomnumber ranging from 0 to 1 as the score s(v, u).T-CF: As was shown by Gilbert and Karahalios [10], thenumber of common friends of two users is an informativeindicator about the level of trust between them. Thus, onespeculation is that a user might select friends with which heor she shares many common friends as trustees. To quantifythe security of this speculation, we design the strategy T-CF(i.e.,T-CommonFriends), which uses the number of commonfriends shared by u and his or her friend v as the score s(v, u),i.e., s(v, u) = |Γ(v) ∩ Γ(u)|.

However, there are two drawbacks of T-CF. First, the factthat u shares many friends with a popular user v doesn’tnecessarily mean that u and v have a high level of trust becauseit is normal for many friends of u to be in v’s friend list.Second, if a common friend of u and v is a popular user, thensharing him or her doesn’t necessarily indicate a high level oftrust between u and v. Next, we introduce two strategies toovercome the two drawbacks, respectively.T-JC: To overcome the first drawback of T-CF, we designthe trustee selection strategy T-JC (i.e., T-JaccardCoefficient),which uses the Jaccard Coefficient [12] of the two sets Γ(u)

and Γ(v) as the score s(v, u), i.e., s(v, u) = |Γ(v)∩Γ(u)||Γ(v)∪Γ(u)| .

T-AA: To overcome the second drawback, we design the T-AA (i.e., T-AdamicAda) strategy, which uses Adamic-Ada [1]similarity between u and v as the score s(v, u). Adamic-Ada similarity penalizes each common friend by its popularity(i.e., the number of friends). Formally, we have s(v, u) =∑w∈Γ(v)∩Γ(u)

1log|Γ(w)| .

2) Global trustee selection strategies: Global strategiesleverage the entire social network structure and thus arepotentially better than local strategies.T-Degree: As we discussed in Section V-B, seed users couldbe those having large outdegrees in the trustee network, andthey could enable an attacker to compromise many otherusers. Thus, we propose the T-Degree strategy to minimizethe maximum outdegree in the trustee network. Intuitively, T-Degree constrains that no users are selected as trustees by toomany other users.

Algorithm 3 shows our T-Degree strategy. T-Degree selectstrustees for users one by one. For each user u that has

Algorithm 3: T-DegreeInput: G = (V,E), Va, and mu of all u ∈ V .Output: A trustee network GT = (VT , ET ).

1 begin2 VT ←− V3 for u ∈ VT do4 do(u)←− 05 end6 for u ∈ Va do7 φ←− ∅8 while |φ| < mu do9 v ←− argminv∈Γ(u)−φ do(v)

10 φ←− φ ∪ {v}11 do(v)←− do(v) + 112 ET ←− ET ∪ (v, u)13 end14 end15 return GT = (VT , ET )16 end

TABLE II: The number of users, the number of users who have atleast 10 friends, the fraction of such users, and the average numberof friends of such users in the three social networks.

Flickr Google+ Twitter# Nodes 1,551,824 10,230,332 21,297,771

# Nodes (degree ≥ 10) 233,067 3,144,370 4,540,483Fractions 15% 31% 21%

Average degrees 75 27 107

adopted the trustee-based social authentication service, T-Degree selects his or her mu friends whose current outdegreesin the trustee network are the smallest as u’s trustees; ties arebroken uniform at random.

VII. EXPERIMENTS

A. Experimental setups

Parameter settings: Unless otherwise mentioned, we set n =10, mu = m = 5, k = 3,3 ns = 1, 000, p(t)

r (u) = pr = 0for every u, and α = 0.9;4 we use the O-Gradient strategyto construct attack orderings, i.e., O=O-Gradient; accordingto Schechter et al. [24], the average message-based spoofingprobability is around 0.05, thus we set p(t)

s (v, u) = ps = 0.05for every edge (v, u) ∈ ET .

Datasets: We use three social network datasets. They areFlickr, Google+, and Twitter. We obtained the Flickr datasetfrom Mislove et al. [17]. In this social network, we take eachuser as a node and connect two users if they are in each other’sfriend lists. We obtained a Google+ snapshot from Gong etal. [11]. In this social network, each node is a Google+ userand two users are connected if they are in each other’s circles.The Twitter dataset was obtained from Kwak et al. [15]. In

3This is the recovery threshold chosen by the Microsoft’s and Facebook’strustee-based social authentication systems.

4We tried α from 0 to 0.9 with a step size 0.1, and we found that S-BadRankwith α = 0.9 achieves the largest expected number of compromised users.

9

T-RandomT-CF

T-JCT-AA

T-Degree0.0

0.5

1.0

1.5

2.0

Exp

ecte

dnu

mbe

rof

com

prom

ised

user

s ×105

S-RandomS-DegreeS-BadRankS-Closeness

(a) FlickrT-Random

T-CFT-JC

T-AA

T-Degree0

1

2

3

4

5

6

7

Exp

ecte

dnu

mbe

rof

com

prom

ised

user

s ×105


(b) Google+T-Random

T-CFT-JC

T-AA

T-Degree0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Exp

ecte

dnu

mbe

rof

com

prom

ised

user

s ×106


(c) Twitter

Fig. 3: The expected number of compromised users for different seed users selection strategies and trustee selection strategies. We find that,with 1,000 seed users, an attacker can compromise two to three orders of magnitude more users with low costs of sending spoofing messagesin some cases. However, our strategy T-Degree can decrease the expected number of compromised users by one to two orders of magnitude.

1 2 3 4 5 6 7 8 9 10Number of attack iterations

1.0

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

Exp

ecte

dn

um

ber

ofco

mp

rom

ised

use

rs ×105

O=O-RandomO=O-Gradient

(a) Number of compromised users

1 2 3 4 5 6 7 8 9 10Number of attack iterations

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Exp

ecte

dn

um

ber

ofsp

oofi

ng

mes

sage

s ×106

O=O-RandomO=O-Gradient

(b) Number of spoofing messages

Fig. 2: Impact of attackers’ resources and attack orderings on Flickr.First, we find that an attacker can perform forest fire attacks with lowcosts. Second, we find that O-Gradient compromises more users andrequires fewer spoofing messages than O-Random when the attackerperforms a given number of attack iterations.

this social network, each node is a Twitter user and two usersare connected if they follow each other.

Since a trustee-based social authentication requires a userto have enough number of friends from which trustees can beselected, we assume that users who have at least 10 friendsare appropriate to adopt the trustee-based social authenticationservice. In our experiments, we consider the worst scenario inwhich every such user has adopted the trustee-based socialauthentication service.5 Table II shows the statistics of ourinterest in the three social networks.

For simplicity, we only show results on the Flickr datasetin some of our experiments, but similar conclusions areapplicable to the other two datasets.

B. Experimental results

Impact of attackers’ resources and attack orderings: Thenumber of attack iterations is closely related to the costsof sending spoofing messages. Figure 2 illustrates the ex-pected number of compromised users (Figure 2a) and theexpected number of required spoofing messages (Figure 2b)as a function of the number of attack iterations on Flickr

5The fewer users adopt the trustee-based social authentication, the moresecure it is. An extreme case is that if no user adopts the system, then theforest fire attacks cannot be spreaded from the seed users at all.

for the ordering construction strategies O-Random and O-Gradient. The seed selection strategy is S-Degree and thetrustee selection strategy is T-CF.

First, we find that a strong attacker (e.g., an attackercontrolling a botnet) can perform firest fire attacks with lowcosts. This is because such an attacker can send out billionsof messages per day with a low cost [25], which is far morethan that needed to spoof trustees in our experiments.

Second, we find that the ordering construction strategy O-Gradient compromises more users and requires fewer spoofingmessages than O-Random when the attacker performs a givennumber of attack iterations. In other words, given the numberof spoofing messages the attacker can send, the attacker shouldadopt O-Gradient to construct the attack orderings.

Similar conclusions are applicable to other combinations ofseed selection strategies and trustee selection strategies. Thus,we don’t show the corresponding results for conciseness.

T-RandomT-CF

T-JCT-AA

T-Degree0.0

0.2

0.4

0.6

0.8

1.0

1.2

Expe

cted

num

bero

fspo

ofing

mes

sage

s ×107


Fig. 4: The expected number of required spoofing messages fordifferent seed users selection strategies and trustee selection strategieson Flickr. We observe that our strategy T-Degree can increase theattacker’s costs by a few times in some cases.

Impact of seed users selection strategies and trusteeselection strategies: Figure 3 and Figure 4 show the expectednumber of compromised users and the expected number ofrequired spoofing messages respectively for different seedselection strategies and trustee selection strategies. We candraw a few conclusions.

First, we find that forest fire attack is a potential big threat.For instance, when the seed users are appointed as trustees of

10

T-RandomT-CF

T-JCT-AA

T-Degree0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Ratio

s S-RandomS-DegreeS-BadRankS-Closeness

Fig. 5: The ratios between the expected number of users thatare compromised without spoofing attacks and those with spoofingattacks for different seed selection strategies and different trusteeselection strategies on Flickr. We find that the expected number ofcompromised users only decreases by 20% to 25% when the attackerdoes not use spoofing attacks in some cases, which implies thatspoofing attacks are optional.

many users (i.e., S-Degree) and the trustees are selected byT-CF, the attacker can compromise around 190,000, 660,000,and 3,760,000 users in the three social networks, respectively.This represents a growth of two to three orders of magnitudefrom the 1,000 seed users. However, our strategy T-Degree candecrease the expected number of compromised users by oneto two orders of magnitude. For instance, the expected numberof compromised users of T-Degree is 53 times smaller thanthat of T-CF on Twitter when the seed users selection strategyis S-Degree. Moreover, our strategy T-Degree can increase thecosts for attackers by a few times in some cases. For instance,the cost of sending spoofing messages of T-Degree is 3 timesbigger than that of T-CF and that of T-AA on Flickr whenthe seed users are appointed as trustees of many users. Thereason is that the trustee networks constructed by T-Degreeare more loosely connected, which makes it harder for forestfire attacks to propagate among them.

Second, even if the seed users are distributed among a socialnetwork uniformly at random (i.e., S-Random), the attackercan still compromise dozens of times more users. For instance,the attacker can still compromise 65 to 80 times more usersin Twitter depending on how trustees are selected.

Third, T-JC works better than T-AA which performs betterthan T-CF for all seed users selection strategies except S-Random. We find that the outdegree distributions of thetrustee networks constructed by T-CF are skewed towards highdegrees the most while those constructed by T-JC are skewedtowards low degrees the most. Thus, the seed users in thetrustee networks constructed by T-JC have lower outdegreesthan those constructed by T-AA, which have lower outdegreesthan those constructed by T-CF. As a result, T-JC performsbetter than T-AA and T-AA performs better than T-CF.

Impact of spoofing attacks: The attacker can choose notto send any spoofing message, which requires no costs ofsending spoofing messages. Moreover, spoofing probabilitiesmight decrease as the attacker performs more attack iterationsbecause trustees become aware of the spoofing attacks, and weconsider an extreme case where spoofing probabilities are zero

3 4 5Number of trustees

0

1

2

3

4

5

6

7

8

Exp

ecte

dn

um

ber

ofco

mp

rom

ised

use

rs ×104

FlickrGoogle+Twitter

(a) Impact of m

1 2 3 4 5k

103

104

105

106

107

Exp

ecte

dn

um

ber

ofco

mp

rom

ised

use

rs


(b) Impact of k

0.0 0.2 0.4 0.6 0.8 1.0ps

103

104

105

106

107

Exp

ecte

dn

um

ber

ofco

mp

rom

ised

use

rs


(c) Impact of ps

0.0 0.2 0.4 0.6 0.8 1.0Percentage of social authentication users

103

104

105

106

107

Exp

ecte

dn

um

ber

ofco

mp

rom

ised

use

rs


(d) Impact of ns

0.0 0.2 0.4 0.6 0.8 1.0pr

100

101

102

103

104

105Ex

pect

ednu

mbe

rofc

ompr

omis

edus

ers


(e) Impact of pr

Fig. 6: Impact of m, k, ps, ns, and pr . We find that k = 4 is a goodtradeoff between security and usability.

in all attack iterations. That spoofing probabilities are zero isequivalent to that the attacker does not use spoofing attacks interms of the number of compromised users.

Figure 5 shows the ratios between the expected numberof users that are compromised without spoofing attacks andthose with spoofing attacks for different seed users selectionstrategies and trustee selection strategies on Flickr.

We find that, without spoofing attacks, the expected numberof compromised users only decreases by 20% to 25% whenthe trustee selection strategy is T-CF or T-AA and the seedusers selection strategy is not S-Random, which means thatthe attacker can still compromise orders of magnitude moreusers and that spoofing attacks are optional in some cases.

Impact of mu, k, p(t)s (v, u), ns, and p

(t)r (u): Towards this

end, we assume that the attacker and the defender are makingtheir best choices, i.e., the seed selection strategy is S-Degreeand the trustee selection strategy is T-Degree. For simplicity,we assume mu = m for all users u who adopt the trustee-based social authentication service and p

(t)s (v, u) = ps for

11

all edges (v, u) in the trustee network and attack iterations.Moreover, we consider zeroth-order approximation of recoveryprobabilities, i.e., we use the average recover probability to ap-proximate recover probabilities of all users in different attackiterations. In other words, p(t)

r (u) = pr for all users and attackiterations. pr represents the average fraction of compromisedusers that realize their accounts are compromised and takeactions to recover them in each attack iteration.

Figure 6 shows the impact of m, k, ps, ns, and pr. Notethat the x-axis in Figure 6d is the percentage of users thatadopt the trustee-based social authentication service. We havea few observations.

Naturally, we observe that the expected number of compro-mised users decreases with k and pr, and it increases withm, ps, and ns. Moreover, the trends are qualitatively similaron the three social networks for each of the five parameters,which indicates the generality of our results.

The parameters m and k balance between security andusability. Specifically, a smaller m or a bigger k makes thesystem more secure but less usable. Figure 6b shows that theexpected number of compromised users decreases dramaticallywhen k increases from 1 to 4. For instance, the expectednumber of compromised users decreases by around one orderwhen k changes from 3 to 4. However, the decrease is notsignificant when k increases from 4 to 5. Existing trustee-based social authentication systems [7, 24] set k to be 3; ourobservations indicate that they could set k to be 4 to betterbalance between security and usability.

We find that the expected number of compromised userschanges dramatically when ps is around 0.2 on all the three so-cial networks. Moreover, almost all users that have adopted thetrustee-based social authentication service are compromisedwhen ps is bigger than around 0.3. Our results imply that it isurgent to mitigate the spoofing attacks via designing better userinterfaces to remind users of not easily sharing the verificationcodes to others via messages.

The expected number of compromised users increases dra-matically with ns in the range where ns is small. This impliesthat obtaining a few more seed users enables the attacker tocompromise significantly more users when the number of seedusers is small. Moreover, even if the number of seed users is 0,the attacker can still compromise thousands of nodes in Flickrand tens of thousands of nodes in Google+ and Twitter.

The expected number of compromised users significantlydecreases with pr. For instance, the number of compromisedusers decreases by one order in all the three social networkswhen around 40% of compromised users recover to be un-compromised in each attack iteration (i.e., pr = 0.4). Ourresults imply that it is important for users to take actions torecover their accounts quickly when suspicious activities oftheir accounts are detected.

VIII. DISCUSSION

A greedy seed users selection strategy: An attacker coulduse a greedy strategy to select seed users. Specifically, theattacker selects seed users one by one. To select a seed user,the attacker iterates over each user that is not a seed user yet;

for each such user u, the attacker pretends that u is a seed userand simulates our security model to predict the correspondingexpected number of compromised users; and the user u whichincreases the expected number of compromised users by themost is added as a new seed user.

We compared this greedy strategy with other seed usersselection strategies using a small subnetwork sampled fromthe Google+ dataset and found that it can compromise moreusers. However, it is not scalable to large social networks. Itis an interesting future work to make the strategy scalable.

Community-based trustee selection strategy: We can alsodesign a trustee selection strategy based on some notion ofcommunity. Specifically, we could select trustees for userssuch that the trustee network consists of isolated communities,which could constrain the propagation of forest fire attacks. Itis an interesting future work to explore these community-basedtrustee selection strategies.

Limiting the issuance of verification codes: A compromiseduser u might request a verification code from the serviceprovider when the attacker performs an attack trial to a userwho selects u as a trustee. Thus, a compromised user who isa trustee of many other users might request many verificationcodes in each attack iteration. Therefore, limiting the numberof verification codes that each user can request within a givenperiod of time (e.g., one hour) can slow down forest fireattacks. It is an interesting future work to explore the impactof such rate limitings on forest fire attacks and what strategiesattackers can adopt to maximize the number of compromisedusers given a time constraint.

IX. RELATED WORK

A. Social authentications

Depending on how friends are involved in the authenticationprocess, social authentications can be classified into trustee-based and knowledge-based social authentications. In trustee-based social authentications [4], the selected friends aid theuser in the authentication process. Knowledge-based socialauthentication, however, asks the user questions about his orher selected friends, and thus friends are not directly involved.

Trustee-based social authentication systems: Authentica-tion is traditionally based on three factors: something you know(e.g., a password), something you have (e.g., a RSA SecurID),and something you are (e.g., fingerprint).

Brainard et al. [4] proposed to use the fourth factor, i.e.,somebody you know, to authenticate users. We call the fourthfactor trustee-based social authentication. Originally, Brainardet al. combined trustee-based social authentication with someother factor as a two-factor authentication mechanism. It waslater adapted to be a backup authenticator [7, 8, 24]. Forinstance, Schechter et al. [2] designed and built a prototypeof trustee-based social authentication system which was inte-grated into Microsofts Windows Live ID system. Moreover,Facebook designed Trusted Friends in October, 2011 [8], andit was improved to be Trusted Contacts [7] in May, 2013.

Knowledge-based social authentication systems: Such so-cial authentications are still based on something you know.

12

Yardi et al. [26] proposed a knowledge-based authenticationsystem based on photos to test if a user belongs to thegroup (e.g., interest groups in Facebook) that he or she triesto access. Facebook recently launched a similar photo-basedsocial authentication system [21], in which Facebook showsa few photos of a friend of a user and asks the user to namethe friend. Such system essentially relies on the knowledgethat the user knows the person in the shown photos. However,recent work has shown, via theoretical modeling [14] andempirical evaluations [20], that photo-based social authenti-cations are not resilient to various attacks such as automaticface recognition techniques, questioning their use as a backupauthentication mechanism.

B. Diffusion models

Our forest fire attacks essentially describe diffusion pro-cesses in a trustee network. We review a few representativediffusion models from different research areas and discuss thedifferences between them and our work.

Updates propagation models: Malkhi et al. [16] proposedthe l-Tree propagation model to diffuse updates among alarge distributed system of data replicas, some of which mightexhibit Byzantine failures. Their model assumes a point-to-point communication for each pair of nodes. A node thatalready receives the update is called active, otherwise it iscalled inactive. Initially, a small set of nodes are active. Eachactive node is associated with a candidate set of nodes. Ineach iteration, each active node is allowed to send the updateto at most F nodes which are selected from the correspondingcandidate set uniformly at random. An inactive node becomesactive if it receives the update from at least k other nodes.

There are two key differences between our forest fire attacksand the l-Tree propagation model. First, an uncompromised(i.e., inactive) node can receive verification codes (i.e., up-dates) from uncompromised trustees via spoofing attacks inforest fire attacks while an inactive node can only receiveupdates from active nodes in the l-Tree model. Second, in eachiteration, each compromised node sends verification codes toall nodes that select it as a trustee in forest fire attacks whilean active node can only send the update to at most F nodesin the l-Tree model.

Information propagation models: Models for how productsand innovations propagate on a social network have beenstudied in various domains such as viral marketing and thespread of technological innovations. These models can bedivided into two categories [13]: linear threshold model andindependent cascade model. Again, a node is said to be activeif it already adopts the corresponding product or innovation,otherwise it is inactive. In both models, a small set of nodesare active initially.

Linear threshold model: In this model, each node u isassociated with a threshold, which indicates the fraction of u’sfriends that must become active before u becomes active. Thepropagation proceeds deterministically in discrete iterations: inthe tth iteration, an inactive node becomes active if its fractionof active friends is no less than u’ activation threshold.

Independent cascade model: Different from the linearthreshold model, the independent cascade model proceeds indiscrete iterations according to the following randomized rule:when a node u first becomes active in the tth iteration, ithas a single chance to activate each of its currently inactivefriend v and succeed with some probability which encodes theinfluence of u to v.

The key difference is that verification codes can be propa-gated from an uncompromised (i.e., inactive) node to anotheruncompromised node via spoofing attacks in forest fire attackswhile an inactive node can only be activated by active nodes inthe linear threshold model and the independent cascade model.Moreover, an active node only has a single chance to activateits friends in the independent cascade model while a compro-mised node can send verification codes to the uncompromisednodes that select it as a trustee as many times as it wants.

Epidemic propagation models: Epidemic propagation mod-els describe the propagations of various contagious diseasessuch as sexually transmitted diseases, inuenza, and measles.

One popular epidemic model is the so-called SIR model(Chapter 21 of [6]). Different from the above reviewed modelsin which each node can be either active or inactive, SIR modelassumes that each node can be in one of the three states,i.e., susceptible, infectious, and removed. Initially, a set ofnodes are infectious and all other nodes are susceptible. Eachinfectious node u remains infectious for a fixed number ofiterations I . In each of the I iterations, u has some probabilityof passing the disease to each of its susceptible neighbors.After I iterations, u becomes removed, which means that ucannot catch nor propagate the disease any more.

Again, a susceptible (i.e., uncompromised) node can onlybe influenced by infectious (i.e., compromised) nodes in theSIR model, which is different from the forest fire attacks.Moreover, the state removed is not meaningful in the contextof forest fire attacks since a node could be compromised againeven if it recovers the account and resets the password.

Summary: The key difference between forest fire attacksand previous diffusion models lies in the spoofing attacks.Specifically, an uncompromised node can obtain verificationcodes from its uncompromised trustees via spoofing attacksin the forest fire attacks while an inactive or susceptible nodecan only be influenced by its active or infectious neighbors inprevious diffusion models.

X. CONCLUSION AND FUTURE WORK

In this paper, we provide the first systematic study aboutthe security of trustee-based social authentications. First, weintroduce forest fire attacks. In these attacks, an attacker firstobtains a small number of compromised seed users and theniteratively attacks the rest of users according to a priorityordering of them. Second, we construct a probabilistic modelto formalize the threats of forest fire attacks and their costs forattackers. Third, we introduce a few strategies to select seedusers and construct priority orderings, and we discuss variousdefense strategies. Fourth, via extensive evaluations using threereal-world social network datasets, we find that forest fireattack is a potential big threat. For instance, with a small

13

number (e.g., 1,000) of seed users, an attacker can furthercompromise two to three orders of magnitude more users insome scenarios with low (or even no) costs of sending spoofingmessages. However, our defense strategy, which guaranteesthat no users are trustees of too many other users, can decreasethe number of compromised users by one to two orders ofmagnitude and increase the costs for attackers by a few timesin some cases. Moreover, the recovery threshold should be setto be 4 to better balance between security and usability.

A few future directions include evaluating forest firest at-tacks on real social authentication systems such as Facebook’sTrusted Contacts, designing new attack and defense strategies,and optimizing forest fire attacks given a time constraint.

REFERENCES

[1] L. A. Adamic and E. Adar. Friends and neighbors on theweb. Social Network, 25(3):211–230, 2003.

[2] BadRank. http://pr.efactory.de/e-pr0.shtml.[3] J. Bonneau and S. Preibusch. The password thicket:

technical and market failures in human authentication onthe web. In WEIS, 2010.

[4] J. Brainard, A. Juels, R. Rivest, M. Szydlo, and M. Yung.Fourth-factor authentication: Somebody you know. InCCS, 2006.

[5] J. P. J. Bunnell and R. Henderson. Cost-effective com-puter security: Cognitive and associative passwords. InOZCHI, 1996.

[6] D. Easley and J. Kleinberg. Networks, Crowds, andMarkets: Reasoning About a Highly Connected World.Cambridge University Press, 2010.

[7] Facebook’s Trusted Contacts. goo.gl/xHmVHA.[8] Facebook’s Trusted Friends. goo.gl/KdyYXJ.[9] H. Gao, J. Hu, C. Wilson, Z. Li, Y. Chen, and B. Zhao.

Detecting and characterizing social spam campaigns. InIMC, 2010.

[10] E. Gilbert and K. Karahalios. Predicting tie strength withsocial media. In CHI, 2009.

[11] N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov,V. Sekar, and D. Song. Evolution of social-attribute net-works: Measurements, modeling, and implications usinggoogle+. In IMC, 2012.

[12] P. Jaccard. Etude comparative de la distribution floraledans une portion des alpes et des jura. Bulletin de laSociete Vaudoise des Sciences Naturelles, 1901.

[13] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing thespread of influence through a social network. In KDD,2003.

[14] H. Kim, J. Tang, and R. Anderson. Social authentication:Harder than it looks. In FC, 2012.

[15] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter,a social network or a news media? In WWW, 2010.

[16] D. Malkhi, Y. Mansour, and M. K. Reiter. Di usion with-out false rumors: on propagating updates in a byzantineenvironment. Theoretical Computer Science, 2003.

[17] A. Mislove, H. S. Koppula, K. P. Gummadi, P. Druschel,and B. Bhattacharjee. Growth of the flickr social net-work. In WOSN, 2008.

[18] Node Centrality. https://en.wikipedia.org/wiki/Centrality.[19] K. Okamoto, W. Chen, and X.-Y. Li. Ranking of

closeness centrality for large-scale social networks. InFAW, 2008.

[20] I. Polakis, M. Lancini, G. Kontaxis, F. Maggi, S. Ioanni-dis, A. D. Keromytis, and S. Zanero. All your face arebelong to us: Breaking facebook’s social authentication.In ACSAC, 2012.

[21] A. Rice. Facebook’s knowledge-based social authentication.http://blog.facebook.com/blog.php?post=486790652130.

[22] G. Sabidussi. The centrality index of a graph. Psychome-trika, 1966.

[23] S. Schechter, A. J. B. Brush, and S. Egelman. It’s nosecret: Measuring the security and reliability of authen-tication via ‘secret’ questions. In IEEE S & P, 2009.

[24] S. Schechter, S. Egelman, and R. W. Reeder. It’s notwhat you know, but who you know. In CHI, 2009.

[25] Spam messages. http://en.wikipedia.org/wiki/Botnet.[26] S. Yardi, N. Feamster, and A. Bruckman. Photo-based

authentication using social networks. In WOSN, 2008.[27] M. Zviran and W. J. Haga. User authentication by

cognitive passwords: an empirical assessment. In JCIT,1990.

APPENDIX ATHE ATTACK MAXIMIZATION PROBLEM IS NP-COMPLETE

To prove the hardness of the attack maximization problem,we only need to show the hardness of a correspondingdecision problem, i.e., given GT , k, ns, n, and an integer l,it is NP-Complete to decide whether nc(GT , k, ns, n) ≥ l.This follows from a reduction from the NP-Complete SetCover problem: given a ground set X = {x1, . . . , xa},and a collection of its subsets S1, S2, · · · , Sm, we want todetermine if there exists t subsets Si1 , Si2 , · · · , Sit such thatSi1 ∪ Si2 ∪ · · · ∪ Sit = X . We show that this can be viewedas a special case of the attack maximization problem, wherethe spoofing probabilities and recovery probabilities are all 0.

Given any instance of a set cover problem, we construct acorresponding trustee network as follows: there are k nodes foreach Sj , and one node for each xi; each xi has all the k copiesof Sj as trustees iff xi ∈ Sj ; in addition, for each Sj we havea2k2 dummy nodes each of which designates the k copies ofSj as trustees. Then the Set Cover problem is satisfiable iffthe attacker can compromise at least l = ta2k2 + tk+a nodeswith a seed set whose size is tk, i.e., nc(GT , k, tk, n) ≥ l.

http://pr.efactory.de/e-pr0.shtml

goo.gl/xHmVHA

goo.gl/KdyYXJ

https://en.wikipedia.org/wiki/Centrality

http://blog.facebook.com/blog.php?post=486790652130

http://en.wikipedia.org/wiki/Botnet

Documents

On the Security of Trustee-based Social Authentications · fact, security of users are correlated in trustee-based social authentications, in contrast to traditional authenticators