Risk-Based Access Control Decisions Under Uncertainty
Ian Molloy, Pau-Chen Cheng, Jorge Lobo
IBM Research
{molloyim,pau,jlobo}@us.ibm.com

Luke Dickens, Alessandra Russo
Imperial College
{lwd03,ar3}@doc.ic.ac.uk

Charles Morisset
Royal Holloway, University of London
Abstract—This paper addresses making access control decisions under uncertainty, when the benefit of doing so outweighs the need to guarantee the correctness of the decisions, for instance when there are limited, costly, or failed communication channels to a policy decision point (PDP). Previously, local caching of decisions has been proposed, but when a correct decision is not available, either a PDP must be contacted, or a default decision used. We improve upon this model by using learned classifiers of access control decisions. These classifiers, trained on known decisions, infer decisions when an exact match has not been cached, and use intuitive notions of utility, damage and uncertainty to determine when an inferred decision is preferred over contacting a remote PDP. There is uncertainty in the inferred decisions, introducing risk. We propose a mechanism to quantify such uncertainty and risk. The learning component continuously refines its models based on inputs from a central PDP in cases where the risk is too high or there is too much uncertainty. We validated our models by building a prototype system and evaluating it against real-world access control data. Our experiments show that, over a range of system parameters, it is feasible to use machine learning methods to infer access control decisions. Thus our system yields several benefits, including reduced calls to the PDP, communication cost and latency, and increased net utility and system survivability.
I. INTRODUCTION
Modern access control systems rely on the PEP/PDP ar-
chitecture: a Policy Enforcement Point (PEP) forwards each access request made within an application to a Policy Decision
Point (PDP), which analyses this request and returns an allow/-
deny authorization decision. In some solutions, such as [1]–[3],
a PDP is commonly implemented as a dedicated authorization
server and is located on a different node than the PEPs.
To enforce a consistent policy throughout the system, this
architecture relies on the PEP being able to contact the PDP
to query decisions, and therefore suffers from a single point
of failure. In particular, key factors that affect the performance
of the PEP are latency of the communication with the PDP,
reliability and survivability of this connection (and the PDP
itself), as well as the aggregated impact of communication
costs (which can include contacting a human to perform the authorization). In several contexts such as mobile applications,
these costs may be prohibitive.
A number of approaches have been proposed to address
these issues; one common theme is to cache access control
decisions [4], [5] at the PEP, such that the PEP does not have
to forward the same access request more than once to the PDP:
either the PEP finds the request-decision pair in the history, or
it forwards the request. Others explore the tradeoff between
efficiency and accuracy in an access control context, e.g., Ni
et al. [6] use a machine-learning algorithm to build a classifier
of policy decisions from past request-decision pairs. Once the
classifier is accurate enough, it is used as a local decision
point. As with any such approximate classifier, there is a
degree of uncertainty in every decision; thus, every decision
made is inherently associated with a risk of being wrong, either
incorrectly allowing or denying some requests.
This paper generalises the machine-learning approach
from [6], by taking account of the uncertainty in each locally
inferred decision. We propose a model where this uncertainty
can be used to determine the risk of making this decision. This allows the policy to trade off the utility of making the local
decision against the risk associated with getting it wrong: if
favourable it will enact the local decision, otherwise it defers
to the central PDP. Commonly used access control models [7]–
[11] define access control policies in terms of model-specific
relationships between subjects and objects. Our approach is
not tied to a particular model, but it uses attributes of access
requests, e.g., subjects and objects, and assumes the policy can
be defined using these attributes.
We describe a model for quantifying uncertainty and risk,
and show how these can inform access control decisions. Our
work focuses on estimating the uncertainty about the correct-
ness of the inferred access control decisions, and assumes that estimates of (or distributions over) the gain, loss and damage
associated with correct and incorrect decisions are available
(or can be obtained). There are many situations in which the
model can be applied. For instance, in the scenario described
in [6], expensive calls are made to a call center to resolve access
control decisions. There is also a large body of
research in the estimation of the value of resources and the
cost of security breaches and misuse. [12] describes a proof
of concept method to estimate the value of resources in an
enterprise, such as jewel code, acquisition plans, trade secrets,
and personal information. The estimates of these values can
be continuously refined and verified by the current operations
of the company. For example, IBM was granted over 5,000 patents in 2010 alone, and licenses many of these to other
organizations, providing an accurate distribution of their value
over time. Other examples include cases where fine grained
service level agreements are defined a priori such as the
credit card example briefly discussed in Section IV-D (see
also experiments in Section V-C).
We present a distributed access control system to illustrate
and test this principle: local decision points determine whether
the tradeoff between utility and uncertainty associated with a
local decision is favorable, and defer to the central PDP for a
binding decision if not. This general framework allows us to
explore the balance between avoiding potential errors in local
decisions, and the desire to limit reliance on the central PDP.
We present three methodologies for making decisions:
• Expected Utility A risk neutral approach that weights
expected gains and damage equally.
• Risk Adjusted Utility A pessimistic view of utility, where uncertainty about damage reduces desirability.
• Independent Risk Constraints Augments expected utility
with independent risk thresholds, where high utility de-
cisions can be rejected if outweighed by the risk.
This system is validated with data from a large corporation,
used to grant entitlements and permissions to system adminis-
trators. Classifiers are initially untrained and begin by always
consulting the central PDP. Over time, the classifiers are
trained using the central PDP’s decisions, improving accuracy
and reducing the uncertainty, yielding fewer queries. Due to
space constraints, experiments focus on the risk adjusted utility
measure. Compared with the caching algorithm, our results
show that our learning based approach reduces the queries to the
PDP by as much as 75% and increases the system utility up to
eightfold when the cache hit ratio is low. When the hit ratio
is high, performance is comparable to the caching algorithm.
Our results from this dataset show that our machine-learning
based approach is robust and accurate and can be viably used
for informing distributed access control decisions; and more
conservative measures of risk can be applied if needed. We
believe these techniques can generalize to other use cases
and domains, including information sharing in military and
coalition environments; financial applications like credit re-
port processing; and load balancing services like Netflix or
document repositories. Compared to pure machine-learning
solutions, our methods estimate the uncertainty of each clas-
sification, allowing one to arbitrarily minimize the pitfalls of
using approximate decisions, while ameliorating the higher
communication costs of naive caching solutions.
This paper is organized as follows. Section II briefly dis-
cusses related work. Section III discusses the architecture of
access control systems based on our approach. We present
in Section IV different techniques to assess the risk and
uncertainty of local access control decisions. An experiment
and evaluation of our approach, applied on different scenarios,
is presented in Section V. Section VI concludes this paper.
II. RELATED WORK
The caching of access control decisions has been explored in [1]–[3], and the policy evaluation framework in [13] uses
a caching mechanism, which can dramatically decrease the
time to evaluate access control requests. Clearly, caching
approaches are only valid when the cache is sure to return the
same decision as the central PDP, and techniques to ensure
strong cache consistency are proposed in [14]. In distributed
systems, each node can build its own cache in collaboration
with other nodes, improving the accuracy of each cache [15].
These caching methods return a local decision on a request
only if this exact request has been previously submitted and
answered. In the context of the Bell-LaPadula model [7],
Crampton et al. [4] present a method to calculate previously
unseen decisions, based on the history of submitted requests
and responses, and the formal rules governing Bell-LaPadula.
A similar approach for the RBAC model [8] is presented
in [5] and an extension to Bloom filters is used in [16]. These
inference techniques require knowledge of the underlying
access control model, and are most efficient for hierarchical
and structured access control models. Inference algorithms
have been proposed based on the relationship of the subjects
and objects in a database [17], [18]. However, the structure of
the subject/object space still needs to be known.
The use of machine-learning to make access control deci-
sions is proposed in [6], where the authors note the classifier
can, at times, return a decision that conflicts with the central
PDP. A notion of risk in access control is introduced in [19]–
[21], where they blur the distinction between allowing and
denying an access; instead, each access is associated with a
potential utility and damage. We reuse these notions here, in
order to estimate the potential impact of each access.

III. UNCERTAINTY IN ACCESS CONTROL
Figure 1 depicts the architecture of our system for access
control under uncertainty. An access request req is a triple
(s,o,a), where subject s accesses object o with access mode a.
We assume at least one oracle node with the full policy, called
the central PDP; and a correct local decision is one that agrees
with the central policy. Each node has a local PDP with three
parts. The decision-proposer (proposer) returns a guess of the
correct decision and a measure of the uncertainty. The risk-
assessor (assessor) determines whether the decision is taken
locally or deferred to the central PDP. The decision-resolver
(resolver) returns an enforceable decision. While any discrete set of enforceable decisions is supported, here we restrict the
set to be Datm = {allow, deny}. A PEP passes requests to
the proposer, and enforces decisions from the resolver.
Fig. 1. Architecture of a local node. The PEP passes a request req to the decision-proposer, which returns a proposal (dec, υ) ∈ Dprp to the risk-assessor; the risk-assessor passes a deferrable decision dec ∈ Ddef to the decision-resolver, which returns an enforceable decision in Datm, querying the central PDP when the decision is deferred. Datm: atomic decisions, e.g., {allow, deny}; Dprp: proposals, e.g., Datm × [0, 1]; Ddef: deferrable decisions, e.g., Datm ∪ {defer}.
a) Decision-Proposer: A decision-proposer receives an
access request and returns a proposal (dec, υ), where dec ∈ Datm is a guess of the correct decision for this request, and υ is
an uncertainty measure of the correctness of dec. Most simply, υ is a probability p, and a proposal (allow, p) is logically
equivalent to (deny, 1 − p) if Datm = {allow, deny}. When p is estimated, we may want to quantify the uncertainty in p.
In our learning based approach, the proposer is mainly a
classifier produced by a machine-learning algorithm, which
treats a decision in Datm as a class and assigns a request to
a class based on the request’s attributes; and the uncertainty
υ is a pair (α, β), where α, β ∈ [1, ∞] are parameters of a
beta distribution [22], Beta(µ|α, β). This distribution captures
our uncertainty about p, and µ is a variable approximating
p. The α and β values are computed from the number of
correct and incorrect decisions made by the classifier when
it is evaluated against the training and testing sets of the
learning process. Elements of these sets are past requests
tagged with the corresponding correct decisions. The beta
distribution fits well with our learning based approach and
gives finer control over risk estimation—see Section IV.
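A minimal sketch of this bookkeeping (the counts, the Beta(1, 1) prior and the use of SciPy are illustrative assumptions, not the prototype's code):

```python
# Sketch: deriving the uncertainty parameters (alpha, beta) of a proposer
# from counts of correct/incorrect decisions on held-out requests.
from scipy.stats import beta

def proposal_uncertainty(n_correct, n_incorrect, prior=(1.0, 1.0)):
    """Return (alpha, beta) for the Beta distribution over p, the
    probability that the proposed decision is correct."""
    a = prior[0] + n_correct
    b = prior[1] + n_incorrect
    return a, b

a, b = proposal_uncertainty(n_correct=47, n_incorrect=3)   # illustrative counts
p_hat = a / (a + b)                  # expected probability of correctness
dist = beta(a, b)                    # Beta(mu | alpha, beta) over p
print(p_hat, dist.interval(0.95))    # point estimate and 95% credible interval
```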
The learning approach makes the assumption that two
requests with similar attributes lead to similar decisions.
This assumption is supported by underlying principles of many
access control models, a large body of work on learning access
control policies, such as role mining, and best practices exer-
cised by large organizations. Many models, such as attribute-
based access control, assign rights based on attributes; one
only needs to find an appropriate distance measure, such as the Hamming distance between the subjects’ attributes, to
capture similarity. Though it is possible to come up with
a configuration that violates it, this assumption has been
empirically shown to hold for concrete systems [6] and is supported
by the large body of research on role mining [23], [24]. In some
models such as the Chinese Wall [9], the decision to grant an
access can also depend on the context. Such models would
need to be restructured to encode context into the attributes
of the subjects and objects; the notion of distance between
requests will be influenced by this encoding.
b) Risk Assessor: The risk-assessor takes a request and
a proposal and determines whether a decision can be made locally, or should be deferred to the central PDP. The risk
assessor can return any decision in the set of deferrable
decisions, Ddef = Datm ∪ {defer}.
A trivial assessor might have an uncertainty threshold above
which all decisions are deferred to the central PDP – all other
decisions being made locally. However, different requests have
different potential impacts on the system. Some requests have
high potential utility for the system, and wrongly denying
them would not realise this. Some requests have high potential
damage, and wrongly allowing them is a threat to the integrity
and survival of the system. Other factors to consider are the
cost of contacting the central PDP, and the system’s risk
appetite, which controls how risk-averse decisions should be.
A risk-averse decision trades off expected utility for greater
certainty in the outcome. More details on the risk assessor are
given in Section IV.
c) Decision Resolver: A decision-resolver takes a re-
quest, r, and a deferrable decision, decd ∈ Ddef, and returns
an enforceable decision, deca ∈ Datm. Typically, deca = decd if decd ∈ Datm; otherwise, r is deferred to the central PDP.
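A minimal sketch of the resolver, assuming the central PDP is exposed as a callable oracle:

```python
# Sketch of the decision-resolver: enact atomic decisions locally, otherwise
# query the (assumed) central PDP oracle for a binding answer.
def resolve(request, deferrable_decision, central_pdp):
    if deferrable_decision in ("allow", "deny"):
        return deferrable_decision
    return central_pdp(request)           # defer: ask the oracle
```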
IV. ASSESSING LOCAL DECISIONS
This section considers how the assessor takes a proposal
from the proposer, and returns a firm decision. As discussed,
the proposal may conflict with the decision that would have
been made if passed to the central policy. If a request that
would be allowed by the central PDP is denoted as valid ,
then there are two kinds of errors that the local PDP could
make: false-allows (allowing an invalid request) and false-
denies (denying a valid request). These errors can have neg-
ative consequences: a false-allow can result in leakage of
information, corruption of information, or privilege escalation;
a false-deny can result in disruption of service, breach of
service level agreements (SLA), and possibly reputational
damage. Making a correct decision leads to gains and benefits.
Ideally, an access control system should enable as many
appropriate accesses as possible, and so our primary concern is
to maximize the number and value of these accesses—subject
to limiting costs and damages. This trades off the potential
gains of each access control decision, against the costs and
damages that may result from poor decisions. Consider an assessor deciding whether to locally allow or deny a request,
(s,o,a), or to pass this decision on to a central PDP. The
assessor has a ternary choice to make: whether to allow the
request locally, allow; to deny locally, deny; or to defer the
decision to the central PDP, defer. We assume that the central
PDP is an oracle, and always gives the correct decision. If
the assessor decides allow, then either the request is valid
(a true-allow) and the system makes a gain, g, because an
appropriate access has been made; or the request is invalid and
the decision has caused some amount of damage, dA, due to
an inappropriate access being made. Likewise, if the assessor
decides deny, then either the request was invalid, deny is the
correct decision and nothing additional happens (i.e. a gain of 0), or the request was valid and the deny decision is incorrect,
and a damage, dD, is incurred for a false-deny. If the assessor
decides to defer and the request is valid, then the central PDP
will grant the request and the system will make a gain, g , but
incur a cost, c (the contact cost). If the decision is defer and
request is not valid then the central PDP will deny the request
and there will be no gain and again a contact cost, c. As the
central PDP is an oracle, it never makes an incorrect decision,
so there is no damage associated with this.
These gains, costs and damages are shown in Table I (a).
In general, the gains, costs and damages can be probability
distributions over potential values. For instance, in some cases,
wrongly allowing an access only implies the probability of an attack. For simplicity, in our examples we assume these to be
fixed, known values unless otherwise stated. Standard gains
and costs (negative gains) are kept separate from damage,
which is assumed to be associated with rare and costly
negative impacts; this distinction becomes important, when we
consider risk measures in Sections IV-B and IV-C and/or when
considering probability distributions over these values. The
impacts presented in Table I (a) are not the most general that
could be conceived, but are sufficiently rich to be of interest.
(a) Outcome gains and damage

              Request Valid          Request Invalid
              gain      damage       gain      damage
  allow       g         0            0         dA
  deny        0         dD           0         0
  defer       g − c     0            −c        0

(b) Expected utilities

              U(·|allow, p)              U(·|deny, p)
  allow       E(pg − (1 − p)dA)          E((1 − p)g − p dA)
  deny        E(−p dD)                   E(−(1 − p)dD)
  defer       E(pg − c)                  E((1 − p)g − c)

TABLE I
WHEN GAINS, COSTS AND DAMAGES ARE INCURRED AND THEIR INFLUENCE ON UTILITY.
For example, there may be a gain associated with denying
an invalid request, e.g., an increase in system reputation, or a
damage in allowing a valid request, e.g., increasing the number
of copies of a document.
All of the approaches described in this paper make a trade-
off between the uncertainties in classification and in estimation
of these gains and damages. To apply our methodology to a real example, we must be able to (approximately) compute
these gains and damages or find probability distributions over
them. We note here that there are many applications and
domains where these gains and damages can be estimated
either empirically or analytically. For example, in financial
service transactions like credit cards, these costs may be
estimated easily or provided a priori by some agreement.
Other examples include cases where fine grained service level
agreements are defined a priori. We discuss further details in
Section IV-D.
A. Expected Utility
To aid this decision, the proposer attempts to classify the
request, (s, o, a), to determine whether it should be allowed
or denied, and includes some measure of its uncertainty in this
classification. In the simplest case, the proposer can return one
of two kinds of answers, e.g. (allow, p) or (deny, p). The first
response, (allow, p) means that the proposer has classified the
request as an allow with probability p that this is correct, i.e.
that the request is valid. Likewise, (deny, p) means that the
proposer has classified the request as a deny with probability
p that the request is invalid.
The expected utility for a decision, dec ∈ Ddef , is the
expected gain of each outcome (request valid or invalid)
minus the expected damage, and these utilities are shown in
Table I(b). In this approach, we assume a risk neutral posture, meaning deferrable decisions are based purely on the expected
utility, U (·), for each decision. The assessor simply returns the
decision corresponding to the highest utility.
There are cases where the assessor may not defer but
return a decision contrary to the proposer’s. For example,
when the proposer returns (allow, p) given a request where
dA ≫ dD, dD < c and g ≈ c, then it may occur that U(defer) < U(deny). Since our goal is to execute the central
policy as faithfully as possible, our system always defers to
the central PDP in these cases. Hence, given a request req and
a decision proposer δ returning a decision and a probability,
the expected utility assessor, ρeu, returns, given a request
req: allow if δ(req) = (allow, p) and U(allow) ≥ U(defer);
deny if δ(req) = (deny, p) and U(deny) ≥ U(defer); defer
otherwise.
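A minimal sketch of this expected-utility rule, assuming fixed, known values of g, dA, dD and c for the request at hand:

```python
# Sketch of the expected-utility assessor (Table I(b)).
def expected_utility_assessor(dec, p, g, d_A, d_D, c):
    """dec is the proposer's guess ('allow' or 'deny'); p is the probability
    that this guess is correct. Returns 'allow', 'deny' or 'defer'."""
    # Probability the request is valid (would be allowed by the central PDP).
    p_valid = p if dec == "allow" else 1.0 - p
    u_allow = p_valid * g - (1.0 - p_valid) * d_A
    u_deny = -p_valid * d_D
    u_defer = p_valid * g - c
    # Never contradict the proposer locally: only enact its own guess,
    # otherwise defer to the central PDP.
    if dec == "allow" and u_allow >= u_defer:
        return "allow"
    if dec == "deny" and u_deny >= u_defer:
        return "deny"
    return "defer"
```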
Thus in this first approach, we directly use the certainty of
the decision to compute the expected utility and choose the
decision with the highest utility. However, this approach can
lead to rare but significant damages due to incorrect decisions:
while the probability of an incorrect decision may be small
enough to produce a small expected damage, the magnitude of
that damage can be significant. In the following approaches we
will try to explicitly account for the risk of such significant
damage.
B. Risk Adjusted Utility
Now let’s assume that we are making the same decision,
but that we want to use a risk based measure. In other
words, we want to explicitly account for uncertainty in the
proposal. For this, the proposer returns a beta distribution,
describing uncertain knowledge about the probability of a
correct classification. If the proposer classifies the request as
valid, then it returns (allow, α, β), where α, β ∈ [1, ∞] are
the parameters of the beta distribution [22]. Likewise, if it
classifies the request as invalid, then it returns (deny, α, β).
Again we focus just on the allow case, and assume that
the assessor can return either allow or defer. If the proposer
returns (allow, α, β), we denote by p = α/(α + β) (the expectation
of the beta distribution) the expected probability that this
decision is correct. The risk adjusted utility assumes that any
uncertainty in the utility is dominated by the damage term; this
can be the case when the damage is fixed, known and very large (the
case we examine here), or if there is a probability distribution
over damage with high variance. Taking advantage of this
assumption, we use the expected values of gain and cost,
but are pessimistic when considering the impact of damage
and use the expected shortfall at some significance value of
n. For example, a 5% expected shortfall (n = 0.05) uses
the average damage in the worst 5% of universes. The risk
adjusted utility for various decisions and proposer outputs are
shown in Table II (a).
For our example of a fixed, known value of damage dA
and given proposal (allow, α, β), we can simplify further by
using the pessimistic probability, p̃↓n, which is essentially the
expected shortfall of the probability, p, given α and β: compute
the value x such that the incomplete beta function of x equals n;
p̃↓n is then the expected value of p conditioned on p < x.
This corresponds to taking a pessimistic view of the evidence,
imagining there is a higher probability of incurring damage
than would normally be expected. Therefore the probability
p̃↓n < p = α/(α + β) takes a pessimistic view of how certain the
proposer is to have classified correctly when considering the
impact of damage, and U(allow) ≈ pg − (1 − p̃↓n)dA.
Given a request req and a proposer δ returning a pro-
posal δ(req), the risk adjusted utility assessor, ρau, returns,
(a) Risk adjusted utilities

              U(·|allow, α, β)               U(·|deny, α, β)
  allow       E(pg) + ESn(−(1 − p)dA)        –
  deny        –                              ESn(−(1 − p)dD)
  defer       E(pg − c)                      E((1 − p)g − c)

(b) Risks

              R(·|allow, α, β)               R(·|deny, α, β)
  allow       −ESn(pg − (1 − p)dA)           –
  deny        –                              −ESn(−(1 − p)dD)
  defer       −ESn(pg − c)                   −ESn((1 − p)g − c)

TABLE II
RISK ADJUSTED UTILITIES AND RISKS FOR VARIOUS DECISIONS GIVEN PROPOSER OUTPUTS.
given a request req: allow if δ(req) = (allow, α, β) and
U(allow) ≥ U(defer); deny if δ(req) = (deny, α, β) and
U(deny) ≥ U(defer); defer otherwise. This approach differs
from the risk-neutral approach of the previous section in that
it is risk averse, which mitigates against very large damages.
C. Independent Risk Constraints
Our second risk method, called independent risk constraints, uses the standard expected utility to rank the decisions in
order of risk neutral preference, and then reject any which do
not satisfy our risk constraint(s). The proposer classifies the
requests and returns either (allow, α, β) or (deny, α, β), and
we calculate the expected utilities as they appear in Table I (b)
using p = α/(α + β) for the probability of allow, and p = α/(α + β)
for the probability of deny. The risk, R(·), for each deferrable
decision is calculated independently. The risk for a decision,
dec, takes a pessimistic view of the utility, and is the expected
shortfall, at significance n, of the inverse of the utility. The
risks for each decision and proposer output are shown in Table
II (b).
To simplify, as we did for the risk adjusted utility, we could
assume large fixed, known damages, and approximate the risk
using the pessimistic probability. For example, given proposal
(allow, α, β), we calculate the risk as R(allow) ≈ (1 − p̃↓n)dA.
The decision making process involves either a fixed thresh-
old t > 0, or more generally for each deferrable decision,
dec ∈ Ddef , with utility, udec, a threshold that depends on the
utility, i.e., t(udec). We then take the following steps:
1) Find expected utilities and risks as shown in Tables I (b)
and II (b).
2) Rank the decisions according to their utilities.
3) Check each decision, dec ∈ Ddef in turn (highest utility
first), and choose the first decision whose risk is below t(udec).
For example, assume that probability p = α/(α + β) is suffi-
ciently close to 1 that allow is preferable to defer. Notice
also that defer carries zero risk and will always be risk
acceptable. Given request req and decision proposer δ re-
turning proposal δ(req), the independent risk assessor, ρic,
returns, given a request req: allow if δ(req) = (allow, α, β),
U(allow) ≥ U(defer) and R(allow) ≤ t(U(allow)); deny
if δ(req) = (deny, α, β), U(deny) ≥ U(defer) and
R(deny) ≤ t(U(deny)); defer otherwise.
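A minimal sketch of this ranking-and-filtering procedure for an allow proposal with fixed, known damage dA (the threshold function and the numbers are illustrative):

```python
# Sketch of the independent-risk-constraints assessor for an (allow, alpha, beta)
# proposal: rank decisions by expected utility and pick the first whose risk is
# below the threshold t(u); defer carries zero risk here.
def independent_risk_assessor(a, b, g, d_A, c, t, n=0.05):
    p = a / (a + b)
    p_down = pessimistic_probability(a, b, n)    # from the sketch above
    candidates = {
        "allow": (p * g - (1.0 - p) * d_A,       # expected utility (Table I(b))
                  (1.0 - p_down) * d_A),         # approximate risk (Table II(b))
        "defer": (p * g - c, 0.0),               # deferring is risk free
    }
    # Highest-utility decision first; the first one whose risk passes wins.
    for dec, (u, r) in sorted(candidates.items(), key=lambda kv: -kv[1][0]):
        if r <= t(u):
            return dec
    return "defer"

decision = independent_risk_assessor(a=40, b=2, g=10.0, d_A=100.0, c=1.0,
                                     t=lambda u: 5.0)   # hypothetical threshold
```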
More generally still we may not always have a decision
with zero risk, and hence we may need some way to resolve a
decision when all decisions exceed the risk threshold. Further,
one may wish to be able to override a decision whose risk
is greater than t(udec) by compensating for the additional
aggregate risk. For example, additional risk mitigating mea-
sures, risk escrow services, or bounding the aggregate risk
and limiting its distribution [21] have been proposed. This
is an orthogonal research problem and the most appropriate
technique is a subject of future research.
Both risk methods have their advantages and disadvantages.
For instance, risk adjusted utility behaves more like standard
expected utilities, and there is always a clearly defined decision
for the assessor. Conversely, the independent risk assessor
keeps the two measures, utility and risk, separate; it is more
flexible, and we could define multiple risk thresholds at different
significance levels. However, we do not necessarily know how
to deal with situations where all decisions violate the risk
constraint(s).
D. Estimating Gain and Damage with Uncertainty
The instantiation of the parameters in order to address a
real-world situation is a complex task, as it requires one to
evaluate the damage and utility associated with access control
entities. In some instances, all values are easily derivable and
in a common set of units. For example, when processing credit
card transactions, all values are in currencies, and rates are
agreed upon a priori through formal contracts. Consider the
Visa Debit Processing Service [25], [26]: the damage from a
false allow is the transaction amount, plus a Chargeback Fee,
e.g., a fixed amount of $15 or a percentage, e.g., 5%; the gain
is the profit margin for the items sold, minus any processing
fees, e.g., 20% the value of the transaction; the contact cost is
an Authorisation Fee, a fixed value to process the transaction or a percentage (or both), e.g., $1 or 3%; and the false deny can
be estimated as a fixed fraction of the gains from customers
that do not have alternative forms of payment.
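As an illustration, the parameters for a single $100 transaction might be derived from the example rates above (all figures, including the false-deny fraction, are indicative assumptions rather than real fee schedules):

```python
# Illustrative parameter estimates for a $100 credit-card transaction.
amount = 100.00
gain = 0.20 * amount                       # g: profit margin minus processing fees
damage_false_allow = amount + 15.00        # d_A: transaction amount plus chargeback fee
contact_cost = max(1.00, 0.03 * amount)    # c: authorisation fee (fixed or percentage)
damage_false_deny = 0.10 * gain            # d_D: assumed fraction of lost customer gain
```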
In many scenarios, the actual value of gains, losses, and
damages may be given in different units, e.g., currency,
reputation, and physical assets, and may themselves be uncer-
tain. Our discussion above assumes these measures are point
estimates in a common set of units, for example currency;
converting to such a common unit may itself introduce further uncertainty.
We suggest the same techniques used to calculate the risk
of making an incorrect decision described in Section IV-B can
be applied to selecting conservative, risk-adjusted conversion
rates and handling the uncertainty in value distributions. If one obtains a probability density function (PDF) over values, say
human life to dollars, the expected value can be used for gains,
while a pessimistic value can be used for damages, resulting in
conservative estimates. For example, if we take US household
income as an estimator, then the median income, around $34–
55K, serves as the expected value, and the top 5%, $157K, might be used
for the risk.
Clearly, our method will work best when these values are
known, such as in the credit card processing use case. When these
values have distributions over some ranges, our approach is
sound as long as such distributions are conservative, over-
estimating damages and losses while under-estimating gains.
Further, note that we are being conservative here by taking the
pessimistic probability and the pessimistic damage (the worst n% of the worst n%).
V. EXPERIMENTS
In this section, we describe our prototype system and experimental results. To evaluate risk-based decision mak-
ing, we devise a simple decentralized access control system
consisting of a single local risk-based policy enforcement
and decision point, and a central policy decision point that
we call the oracle. Users submit requests, which are locally
evaluated, and an estimate of the uncertainty calculated as
described in Section IV-B. The system evaluates potential
risks and gains, and determines when decisions can be made
locally, and when the oracle must be queried at a cost. We
simulate a large number of requests, e.g., 100,000, logging the
request, decisions, and uncertainty. Afterwards, we compare
each decision with the oracle, and aggregate the net utility of
the system, including any gains, costs and damages incurred during operation. We first describe our experimental system
in more detail. For clarity, we distinguish between standard
damage (of a false-allow) and loss (the damage of a false-deny).
A. Experiments
The oracle is considered infallible, i.e., it always returns
correct results, and contains a real access control policy from
a system that provisions administrative access to a commercial
system obtained from a corporation. The policy consists of
881 permissions and 852 users, with 25 attributes describing
each user; there are 6691 allowed requests. As we do not
have access to actual permission usage logs, e.g., requests or
frequency of permission use, we simulate incoming requests by sampling from the full policy, generating both valid (allow)
and invalid (deny) requests. Users are selected uniformly
randomly, and for permissions we first determine whether
the request is valid with input probability, pv, then if valid
(invalid) choose from the pool of valid (invalid) requests in
an unbiased way. This allows us to generate sample requests
with an arbitrary frequency of valid transactions, e.g. 30%,
50% or 80%, and hence to model different scenarios, such
as a system under attack (mostly invalid requests), or normal
usage (mostly valid requests).
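A minimal sketch of this request generator (the pool data structures are assumptions about how the policy might be indexed):

```python
# Sketch of the request generator described above: users are drawn uniformly and
# each request is valid with probability p_v. valid_perms / invalid_perms are
# assumed dicts mapping each user to the permissions they do / do not hold.
import random

def sample_request(users, valid_perms, invalid_perms, p_v):
    user = random.choice(users)
    if random.random() < p_v:
        return (user, random.choice(valid_perms[user])), True     # allowed request
    return (user, random.choice(invalid_perms[user])), False      # denied request
```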
B. Classifiers
Our experimental prototype is implemented in Python, and
uses PyML (a wrapper for libSVM) for the support vector machine [27] that serves as
our classifier. A support vector machine (SVM) is a super-
vised learning method for classification and regression. Given
labeled points in a k-dimensional space, an SVM finds a
hyperplane (normal vector w and offset b from the origin) that
separates the two classes and maximizes the minimal distance
between any point and the hyperplane. The class of an unseen
point x is determined by which side of the hyperplane it is on,
i.e., sign(w · x − b) [27]. See Figure 2. An SVM is a linear
classifier; however, a kernel can be applied first to support non-
linear classification. Kernels may result in overfitting when
the number of input datapoints is too small, e.g., less than or
equal to the degree of a polynomial kernel. To prevent this, we
do not train the SVM until a sufficient number of data points
are available. We use both polynomial and Gaussian kernels.
Fig. 2. A Support Vector Machine classifier and (α, β) calculation for uncertainty.

Intuitively, there is less uncertainty in a prediction the farther
away a point is from the hyperplane, a distance given by
|w · x − b|, and the uncertainty is highest between the
support vectors (the points closest to the hyperplane). We
define an (αi, β i) value for each band (hyper-volume)
corresponding to the point i in our cached training set.
Any point correctly classified closer to the hyperplane
increments αi, while each point farther away incorrectly
classified increments β i. For example, in Figure 2, the
incorrectly predicted point k will increment β j , while
correctly classifying a new point x would increment αi (and
points farther away). The α and β values are only valid
within their respective bands, inclusively. We use the points
in the training set as our initial sample, and further update these quantities when we defer to the oracle if the uncertainty
is too high. When we retrain, all α, β values are reset. The
hyperplane itself has α = 1, β = 1.
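A minimal sketch of this per-band bookkeeping, under one reading of the update rule described above (band keys and the update directions are our interpretation):

```python
# Sketch of the per-band (alpha, beta) bookkeeping: bands are keyed by their
# distance from the hyperplane; a correct classification at distance d is
# taken to support all bands at d or farther, while a misclassification at
# distance d penalises all bands at d or closer.
def update_bands(bands, distance, correct):
    """bands: dict mapping band distance -> [alpha, beta]."""
    for band_distance, ab in bands.items():
        if correct and band_distance >= distance:
            ab[0] += 1          # alpha: evidence the band's predictions are right
        elif not correct and band_distance <= distance:
            ab[1] += 1          # beta: evidence the band's predictions are wrong

bands = {0.0: [1, 1]}           # the hyperplane itself starts at alpha = beta = 1
```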
In our setting, the data points represent users. We map a
user to a k-dimensional space by converting their attributes
into binary vectors. For example, consider a Department
attribute of a company that has three values: R&D, Sales,
and Marketing. These are represented by the binary vectors
(0, 0, 1), (0, 1, 0), and (1, 0, 0), respectively. The final k-bit
vector for a user is the concatenation of all attributes. In the
remainder of this paper when we refer to a subject s ∈ S , we
are referring to a point in some k-dimensional space.
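A minimal sketch of this attribute encoding (attribute names and value ordering are illustrative):

```python
# Sketch of the attribute-to-binary-vector mapping: each categorical attribute
# is one-hot encoded and the per-attribute vectors are concatenated.
def encode_user(user, attribute_values):
    """attribute_values: dict attribute -> ordered list of possible values."""
    vector = []
    for attribute, values in attribute_values.items():
        vector.extend(1 if user.get(attribute) == v else 0 for v in values)
    return vector

attrs = {"Department": ["R&D", "Sales", "Marketing"]}
print(encode_user({"Department": "Sales"}, attrs))   # -> [0, 1, 0]
```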
Finally, there are |O × A| SVM classifiers, one per permis-
sion, trained on subjects s ∈ S. An alternative implementation
is to define a single classifier whose input is S × O × A; how-
ever, based on the results from [6], some object-right requests
ever, based on the results from [6], some object-right requests
have more uncertainty than others. To simplify calculating the
uncertainty of the classified requests, we choose to define a
classifier per permission.
Each PDP based on an SVM will store a small number
of data points (subjects and decisions) in a buffer for later
retraining. Before predicting the decision using an SVM, we
first check to see if the point is in the cached buffer. When it is,
we return the known decision with zero uncertainty (assuming
a static policy). When a new input and decision is specified
by querying the oracle, it is provided to the SVM so that it
may be retrained. When the number of samples exceeds the
size of the buffer, a new SVM is trained and the correctly
classified point projected farthest from the hyperplane is discarded.
Thus the buffer contains the points that most closely define the
support vectors. We use polynomial (degree four) and radial
basis function (RBF) kernels in our experiments.
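A minimal sketch of the local prediction path, using a fitted scikit-learn SVC as a stand-in for the PyML classifier in the prototype:

```python
# Sketch: return a cached decision for a point already answered by the oracle,
# otherwise fall back to the trained SVM and report its margin.
def local_decision(x, cache, svm):
    """Return (decision, distance), where distance is |w . x - b|; cached
    points are treated as exact (previously answered by the oracle)."""
    key = tuple(x)
    if key in cache:
        return cache[key], float("inf")              # exact, zero uncertainty
    dec = svm.predict([x])[0]
    margin = abs(svm.decision_function([x])[0])      # distance from the hyperplane
    return dec, margin                               # larger distance -> lower uncertainty
```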
We implement several baseline PDPs to compare against
the risk-based PDP: a local PDP that will always defer to
the central PDP (Default PDP), and a first-in-first-out cache
(FIFO Cache), such that if the request is not in the cache, the
oracle is queried. We use two FIFO cache sizes: (E) the size
of the FIFO cache is equal to the total memory of the SVMs;
(Inf) the FIFO cache is unbounded.
We also implement three learning-based PDPs: the first one
(SACMAT’09) is a PDP consistent with the approach taken
by Ni et al. [6], where we train an SVM for each permission and
accept any decision it returns, i.e., we assume there is zero
uncertainty. The second one (Seeded SVM) is a PDP where
an SVM is constructed for each permission and seeded with n
random users granted the permission, and m random users not
granted the permission. We train and test on the n + m users
and we use n = m = 10. Finally, the third one (Unseeded
SVM) uses an untrained SVM created for each permission.
Before the SVMs can be trained a minimum number of allow
and deny decisions must be specified by querying the oracle.
The buffer of an unseeded SVM is the same as a seeded SVM.
We note that our approach is not tied to any specific
classifier, and we introduce a “black box” classifier as an
example. This classifier randomly selects a level of uncertainty,
and then returns a correct or incorrect decision sampled from that accuracy. The performance of our experiments further
reinforces the validity of our assumption that similar requests
receive similar decisions.
C. Evaluation Results
In our experiments, we first compare our general approach
with a default policy that always contacts the central PDP, and
the two FIFO caches: the bounded-size cache (the same size as all
SVMs combined) and the unbounded one (see the previous section
for details). For these experiments we restrict ourselves to
the seeded SVM settings and evaluate the results against the
scenarios described in Figure 3 (cc = the cost of contacting
the central PDP; g = the gain for a true-allow; d = the damage
of a false-deny; l = the loss of a false-allow. Due to space
limitations we only show tables where true-denies have 0 gain
and 0 loss). The figure shows that, regardless of the parameters,
our approach behaves well, i.e., calls to the central PDP are
reduced (in some cases as much as 75% – see Figure 3(a))
and the utility is better than in any of the other standard
approaches. This confirms the independence of our approach
from the setting of the parameters, making it applicable to
many situations.
Fig. 3. Evaluation under different system parameters (utility vs. transaction number): (a) small cc, g = 2cc, d = l = 2g; (b) small cc, g = 4cc, d = 10g, l = 0; (c) small cc, g = 10cc, d = 2cc, l = 10g.
Fig. 4. Risk-based classifiers outperform traditional caching with high cache-miss rates (central queries and utility vs. transaction number, for the Default PDP, Seed(1,1) SVM, Seed(n,m) SVM, Simple FIFO (E) and Simple FIFO (Inf)): (a) 30% valid requests; (b) 50% valid requests.
Next, we consider the performance impact of the ratio of
invalid versus valid requests on the classifiers. We evaluate
this for two reasons. First, the ratio in a typical deployment
is unknown. Second, [6] shows that SVMs result in more
false-denies than false-allows, so varying the ratio may reveal
useful insights. We found that the sampling rate of valid versus
invalid requests impacts the rate of cache hits for the FIFO
implementations greatly affecting their performance. When the
number of cache hits was low, for example, by sampling more
invalid requests, we explore a larger amount of the possible
request space, resulting in more cache misses before a hit is
returned. Conversely, at a high rate of valid requests, a larger
fraction of the valid sampled requests reside in the cache (in
part due to the smaller sampling size because the fraction of
allowed requests is so low). The higher number of false-denies
also results in increased uncertainty, causing the SVMs to be
retrained more frequently, and thus deferring more requests.
The results are shown in Figure 4.

Next, we consider the potential impact of trusting the SVM
classifiers without first evaluating their uncertainty, as is done
in [6]. Here, the false-allows and false-denies can result in
devastating losses. To illustrate this, we use the decision
resolver (SACMAT’09) described earlier which accepts any
predicted decision, regardless of the risk. Figure 5 illustrates
the potential decrease in utility from a small number of false-
positives in the classifier.
Finally, we evaluate how well the systems perform when
Fig. 5. The impact of not evaluating the classifier risk.
the communication costs become prohibitive. Here, we set
the cost to contact the oracle to be five times the gains
from allowing a valid request, and set the damage to be
four times the communication costs. This type of scenario
may correspond to a military setting where communication is
expensive, e.g., requiring a satellite link, or where radio usage
should be minimized to avoid triangulation by the enemy. The
results, shown in Figure 6, illustrate that while the risk-based
approach outperforms the FIFO caches and reduces queries by 65%, it
cannot keep the utility positive. However, it should be noted
that the FIFO caches lose over three times more utility due to
the communication costs.
Fig. 6. Communication cost is 5 times the gains, and damages are 10 times the gains.
VI . CONCLUSION
This paper presents a new learning and risk-based archi-
tecture for distributed policy enforcement under uncertainty.
The trade-off between the uncertainty and utility of decisions
determines whether they can be taken locally or the central
PDP queried. We present three approaches, Expected Utility,
Risk Adjusted Utility, and Independent Risk Constraints, each
representing a different treatment of risk.
Our approach is validated using data from a large corpo-
ration, and consistently performs better than a naive caching
mechanism, in a number of different scenarios. Under ideal conditions, the rate of central PDP queries is reduced by up to
75% and an eight-fold increase in system utility is seen. Such
improvements occur when the cache hit rate is low.
We plan to compare our method with the inference tech-
niques cited in Section II. Dynamic access control policies
could also be considered, where the correct decision for each
request can change over time. In this context, the uncertainty
returned by the proposer must combine uncertainty due to
classifier error with uncertainty due to possible policy updates.
Finally, additional research is needed to better determine when
a classifier should be retrained and when a given sample
should be discarded, which is particularly challenging when dealing
with dynamic policy changes.
ACKNOWLEDGMENT
This research was sponsored by the U.S. Army Research Laboratory
and the U.K. Ministry of Defence and was accomplished under Agreement
Number W911NF-06-3-0001. The views and conclusions contained in this
document are those of the author(s) and should not be interpreted as repre-
senting the official policies, either expressed or implied, of the U.S. Army
Research Laboratory, the U.S. Government, the U.K. Ministry of Defence
or the U.K. Government. The U.S. and U.K. Governments are authorized to
reproduce and distribute reprints for Government purposes notwithstanding
any copyright notation hereon.
REFERENCES
[1] Entrust, “Getaccess design and administration guide,” September 1999.[2] G. Karjoth, “Access control with IBM Tivoli Access Manager,” ACM
Trans. Inf. Syst. Secur., 2003.[3] Netegrity, “Siteminder concepts guide,” 2000.[4] J. Crampton, W. Leung, and K. Beznosov, “The secondary and approxi-
mate authorization model and its application to Bell-LaPadula policies,”in Proceedings of 11th ACM Symposium in Access Control Models and
Technologies, 2006.[5] Q. Wei, J. Crampton, K. Beznosov, and M. Ripeanu, “Authorization
recycling in rbac systems,” in Proceedings of the 13th ACM symposiumon Access control models and technologies, 2008.
[6] Q. Ni, J. Lobo, S. B. Calo, P. Rohatgi, and E. Bertino, “Automatingrole-based provisioning by learning from examples,” in SACMAT , 2009.
[7] L. LaPadula and D. Bell, “Secure Computer Systems: A MathematicalModel,” Journal of Computer Security, 1996.
[8] D. F. Ferraiolo and D. R. Kuhn, “Role-based access control,” inProceedings of the 15th National Computer Security Conference , 1992.
[9] D. F. C. Brewer and M. J. Nash, “The Chinese Wall Security Policy,”in Proceedings of the IEEE Symposium on Security and Privacy, 1989.
[10] P. Bonatti and P. Samarati, “Regulating service access and informationrelease on the web,” in Proceedings of the 7th ACM conference onComputer and communications security, 2000.
[11] L. Wang, D. Wijesekera, and S. Jajodia, “A logic-based framework for attribute based access control,” in Proceedings of the 2004 ACM workshop on Formal methods in security engineering, 2004.
[12] Y. Park, S. C. Gates, W. Teiken, and S. N. Chari, “System for automaticestimation of data sensitivity with applications to access control andother applications,” in Demo at SACMAT’11, 2011.
[13] K. Borders, X. Zhao, and A. Prakash, “Cpol: high-performance policyevaluation,” in Proceedings of the 12th ACM conference on Computer and communications security, 2005.
[14] M. Wimmer and A. Kemper, “An authorization framework for sharingdata in web service federations,” in Secure Data Management , 2005.
[15] Q. Wei, M. Ripeanu, and K. Beznosov, “Cooperative secondary autho-rization recycling,” in Proceedings of the 16th international symposiumon High performance distributed computing, 2007.
[16] M. V. Tripunitara and B. Carbunar, “Efficient access enforcement indistributed role-based access control (RBAC) deployments,” in Proc. of the 14th ACM Symp. on Access control models and technologies, 2009.
[17] A. Rosenthal and E. Sciore, “Administering permissions for distributeddata: factoring and automated inference,” in Proceedings of the fifteenthannual working conference on Database and application security, 2002.
[18] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy, “Extending queryrewriting techniques for fine-grained access control,” in Proc. of the2004 ACM SIGMOD international Conf. on Management of data , 2004.
[19] L. Zhang, A. Brodsky, and S. Jajodia, “Toward Information Sharing:Benefit And Risk Access Control (BARAC),” 7th IEEE Intl. Workshop
on Policies for Distributed Systems and Networks (POLICY’06) , 2006.[20] P.-C. Cheng, P. Rohatgi, C. Keser, P. A. Karger, G. M. Wagner, and A. S.
Reninger, “Fuzzy multi-level security: An experiment on quantified risk-adaptive access control,” in IEEE Symp. on Security and Privacy, 2007.
[21] I. Molloy, P.-C. Cheng, and P. Rohatgi, “Trading in risk: Using marketsto improve access control,” in NSPW , 2008.
[22] C. M. Bishop, Pattern Recognition and Machine Learning (InformationScience and Statistics). Springer, 2007.
[23] I. Molloy, N. Li, J. Lobo, Y. A. Qi, and L. Dickens, “Mining roles withnoisy data,” in SACMAT’10, 2010.
[24] M. Frank, A. Streich, D. Basin, and J. Buhmann, “A probabilisticapproach to hybrid role mining,” in CCS ’09: Proceedings of the 16th
ACM conference on Computer and communications security, 2009.[25] “Visa Debit Processing Service, Transac-
tion Processing, Authorization Processing,”http://www.visadps.com/services/authorization processing.html.
[26] “Card-Present, Merchants, Visa USA,”http://usa.visa.com/merchants/risk management/card present.html.
[27] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, Septem-ber 1998, ISBN 9780471030034.