An Automatic Attribute Based Access Control Policy Extraction From Access Logs

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

1

An Automatic Attribute Based Access Control


Policy Extraction from Access Logs
Leila Karimi , Student Member, IEEE, Maryam Aldairi , Student Member, IEEE,
James Joshi , Senior Member, IEEE, and Mai Abdelhakim , Member, IEEE

Abstract—With the rapid advances in computing and information technologies, traditional access control models have become
inadequate in terms of capturing fine-grained, and expressive security requirements of newly emerging applications. An attribute-based
access control (ABAC) model provides a more flexible approach to addressing the authorization needs of complex and dynamic
systems. While organizations are interested in employing newer authorization models, migrating to such models pose as a significant
arXiv:2003.07270v4 [cs.CR] 30 Jan 2021

challenge. Many large-scale businesses need to grant authorizations to their user populations that are potentially distributed across
disparate and heterogeneous computing environments. Each of these computing environments may have its own access control
model. The manual development of a single policy framework for an entire organization is tedious, costly, and error-prone.
In this paper, we present a methodology for automatically learning ABAC policy rules from access logs of a system to simplify the
policy development process. The proposed approach employs an unsupervised learning-based algorithm for detecting patterns in
access logs and extracting ABAC authorization rules from these patterns. In addition, we present two policy improvement algorithms,
including rule pruning and policy refinement algorithms to generate a higher quality mined policy. Finally, we implement a prototype of
the proposed approach to demonstrate its feasibility.

Index Terms—Access Control, Attribute Based Access Control, Policy Mining, Policy Engineering, Machine Learning, Clustering.

1 I NTRODUCTION Although organizations and developers are interested in


employing the next generation AC models, adopting such

A CCESS control systems are critical components of in-


formation systems that help protect information re-
sources from unauthorized accesses. Various access con-
policy frameworks poses a significant challenge. Many large
organizations need to grant authorization to their vast user
populations distributed across disparate computing envi-
trol models and approaches have been proposed in the ronments, including legacy systems. Each of these comput-
literature including Discretionary Access Control (DAC) [1] ing environments may have its own AC model. The manual
[2], Mandatory Access Control (MAC) [3] [4], and Role- development of a single policy for the entire organization
Based Access Control (RBAC) [5]. However, with the rapid is tedious and error-prone. Policy Mining techniques have
advances in newer computing and information technologies been proposed in the literature to address such challenges
(e.g., social networks, Internet of Things (IoT), cloud/edge to help organizations cut the cost, time, and error of policy
computing, etc.), existing access control (AC) approaches development/management. Policy mining algorithms ease
have become inadequate in providing flexible and expres- the migration to more recent/appropriate authorization
sive authorization services [6]. For example, a health care models by completely (or partially) automating the process
environment requires a more expressive AC model that of constructing AC policies.
meets the needs of patients, health care providers as well Policy mining techniques were first introduced for devel-
as other stakeholders in the health care ecosystem [7], oping RBAC policies. Kuhlmann et al. coined the term “role
[8]. Attribute Based Access Control (ABAC) models present mining" to refer to a data mining approach that constructs
a promising approach that addresses newer challenges in roles from a given permission assignment dataset [10]; this
emerging applications [9]. An ABAC approach grants access work was followed by various role mining techniques, such
rights to users based on attributes of entities in the system as [11], [12], [13]. Although the proposed approaches are
(i.e., user attributes, object attributes, and environmental beneficial in developing optimal sets of roles, they are not
conditions) and a set of authorization rules. applicable in extracting ABAC policies.
Xu and Stoller were the first to study the problem of
L. Karimi, M. Aldairi, and J. Joshi are with the School of Computing and mining ABAC policies from given access control matrices
Information, University of Pittsburgh.
M. Abdelhakim is with Electrical and Computer Engineering, Swanson School
or logs [14], [15]. Following that, several researchers have
of Engineering, University of Pittsburgh. investigated various ABAC policy mining techniques [16],
Email addresses: {leila.karimi, ma.aldairi, jjoshi, and maia}@pitt.edu [17], [18]. However, these studies suffer from several limita-
© 2021 IEEE. Personal use of this material is permitted. Permission from tions, as follows:
IEEE must be obtained for all other uses, including reprinting/republishing
this material for advertising or promotional purposes, collecting new collected
works for resale or redistribution to servers or lists, or reuse of any copyrighted • First, the existing approaches do not support mining
component of this work in other works. authorization rules with negative filters. An ABAC
2

policy rule can be comprised of a set of positive and well as the unsupervised learning algorithm. In Section 3,
negative filters. Negative filters are useful in scenar- we define the ABAC policy extraction problem, discuss the
ios when an exception needs to be expressed. For related challenges, and introduce the metrics for evaluating
example, a healthcare provider can express the fol- the extracted policy. In Section 4, we present the proposed
lowing rule using a negative attribute filter: “A nurse ABAC policy extraction approach. In Section 5, we present
can read a patient’s record except for payment purposes." the evaluation of the proposed approach on various sets of
Using negative filters in rule expressions results in a policies. We present the related work in Section 6 and the
more concise authorization policy (Section 5). conclusions and future work in Section 8.
• Second, some proposed approaches such as in [14],
[15], [17] are unable to mine a high-quality policy 2 P RELIMINARIES
when the given access log is not complete in the sense
that every possible combination of attribute values is In this section, we overview ABAC, the ABAC policy lan-
not included in the access log (Section 3). guage, and the unsupervised learning algorithm.
• Third, the proposed approaches are unable to mine
a policy from noisy access logs containing over- 2.1 ABAC Model
assignments and under-assignments [16], [18]. Hav- In 2013, NIST published a “Guide to ABAC Definition and
ing noisy access records is a common problem in Consideration" [9], according to which, “the ABAC engine can
evolving domains such as IoT or social networks [19]. make an access control decision based on the assigned attributes
It is essential that an ABAC policy miner should be of the requester, the assigned attributes of the object, environment
capable of handling a reasonable amount of noise to conditions, and a set of policies that are specified in terms of those
be applicable to real-world applications. attributes and conditions.” Throughout the paper, we use user
• Last but not the least, the existing approaches do not attributes, object attributes, and session attributes to refer to the
include techniques for improving the mined policy attributes of the requester, attributes of the object, and the
after the first round of policy extraction. In addition, environmental attributes/conditions, respectively.
in scenarios where the authorization policies may Accordingly, 𝑈, 𝑂, 𝑆, 𝑂𝑃 are sets of users, objects,
change over time (such as in social networks with sessions, and operations in a system and user attributes
addition and removal of various applications), these (𝐴𝑢 ), object attributes (𝐴𝑜 ), and session attributes (𝐴𝑠 ) are
approaches do not provide any guidelines for adjust- mappings of subject attributes, object attributes, and en-
ing the policy. This makes practical deployment of vironmental attributes as defined in the NIST Guide [9].
these approaches very difficult. 𝐸 = 𝑈 ∪ 𝑂 ∪ 𝑆 and 𝐴 = 𝐴𝑢 ∪ 𝐴𝑜 ∪ 𝐴𝑠 are the sets of all
entities and all attributes in the system, respectively.
Furthermore, none of the existing work addresses these
issues in an integrated way. In this paper, we propose Definition 1. (Attribute Range). Given an attribute 𝑎 ∈ 𝐴,
a machine learning based ABAC policy mining approach the attribute range 𝑉𝑎 is the set of all valid values for 𝑎 in
to address these challenges. To summarize, the primary the system.
contributions of this paper are as follows: Definition 2. (Attribute Function). Given an entity 𝑒 ∈ 𝐸,
an attribute function 𝑓 𝑎_𝑒 is a function that maps an entity
1) We propose an unsupervised learning based ap-
to a specific value from the attribute range. Specifically,
proach to extract ABAC policy rules that contain
𝑓 𝑎_𝑒 (𝑒, 𝑎) returns the value of attribute 𝑎 for entity 𝑒.
both positive and negative attribute filters as well
as positive and negative relation conditions. Example 1. 𝑓 𝑎_𝑒 (𝐽𝑜ℎ𝑛, 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛) = faculty indicates that the
2) The proposed policy mining approach is effective value of attribute position for user John is faculty.
even with an incomplete set of access logs and in Example 2. 𝑓 𝑎_𝑒 (𝑑𝑒 𝑝1, 𝑐𝑟 𝑠) = {𝑐𝑠101, 𝑐𝑠601, 𝑐𝑠602} indicates
presence of noise. that the value of attribute crs for object dep1 is a set
3) As part of the unsupervised learning based ap- {𝑐𝑠101, 𝑐𝑠601, 𝑐𝑠602}.
proach, we propose the rule pruning and policy
refinement algorithms to enhance the quality of the Each attribute in the system can be a single-valued
mined policy and to ease its maintenance. (atomic) or multi-valued (set). In Example 1 position is a
4) We propose a policy quality metric based on policy single-valued attribute while crs is a multi-valued attribute
correctness and conciseness to be able to compare in Example 2. For simplicity, we consider only atomic at-
different sets of mined policy rules and to select the tributes in this work. Actually, the process of extracting
best one based on some given criteria. ABAC policy with multi-valued attributes is exactly the
5) We implement a prototype of the proposed model same as that with atomic attributes, however, we need to
and evaluate it using various ABAC policies to pre-process data to convert each multi-valued attribute to
show its efficiency and effectiveness. a set of atomic attributes. This can be done using various
techniques such as defining dummy variables [20], 1-of-𝐾
To the best of our knowledge, our proposed approach is scheme [21], etc. At the end of the process and when policy
the first unsupervised learning based ABAC policy mining rules are extracted, we need one more step to convert back
method that can be used to extract ABAC policies with both atomic attribute filters to the corresponding multi-valued
positive and negative attribute and relationship filters. attribute filters.
The rest of the paper is organized as follows. In Section Attribute filters are used to denote the sets of users,
2, we overview the ABAC model and its policy language as objects, and sessions to which an authorization rule applies.
3

Definition 3. (Attribute Filter). An attribute filter is defined Example 5. Consider rule 𝜌1 = h{h𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑠𝑡𝑢𝑑𝑒𝑛𝑡i,
as a set of tuples F = {h𝑎, 𝑣|!𝑣i| 𝑎 ∈ 𝐴 and 𝑣 ∈ 𝑉𝑎 }. Here h𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛, 𝑐𝑎𝑚 𝑝𝑢𝑠i, h𝑡𝑦 𝑝𝑒, 𝑎𝑟𝑡𝑖𝑐𝑙𝑒i}, {h𝑑𝑒 𝑝𝑡𝑢 , 𝑑𝑒 𝑝𝑡 𝑜 i},
h𝑎, 𝑣i is a positive attribute filter tuple that indicates 𝑎 𝑟𝑒𝑎𝑑i. It can be interpreted as “A student can read an
has value 𝑣, and h𝑎, !𝑣i is a negative attribute filter tuple article if he/she is on campus and his/her department matches
that indicates 𝑎 has any value in its range except 𝑣. the department of the article".
Example 3. Tuple h𝑙𝑎𝑏𝑒𝑙, !𝑡𝑜 𝑝-𝑠𝑒𝑐𝑟𝑒𝑡i points to all entities in Definition 10. (Rule Satisfaction) An access request 𝑞 =
the system that do not have “top-secret" as their security h𝑢, 𝑜, 𝑠, 𝑜 𝑝i is said to satisfy a rule 𝜌, denoted as 𝑞 |= 𝜌,
label “label". iff
h𝑢, 𝑜, 𝑠i |= F ∧ h𝑢, 𝑜, 𝑠i |= R ∧ 𝑜 𝑝 𝑞 = 𝑜 𝑝 𝜌 .
Definition 4. (Attribute Filter Satisfaction). An entity 𝑒 ∈ 𝐸
satisfies an attribute filter F , denoted as 𝑒 |= F , iff Definition 11. (ABAC Policy). An ABAC policy is a tuple
∀h𝑎 𝑖 , 𝑣 𝑖 i ∈ F : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) = 𝑣 𝑖 ∧ 𝜋 = h𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , Pi where 𝐸, 𝑂𝑃, 𝐴, and P are sets
of entities, operations, attributes, and ABAC rules in the
∀h𝑎 𝑖 , !𝑣 𝑖 i ∈ F : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) ≠ 𝑣 𝑖 .
system and 𝑓 𝑎_𝑒 is the attribute function.
Example 4. Suppose 𝐴𝑢 = {𝑑𝑒 𝑝𝑡, 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑐𝑜𝑢𝑟 𝑠𝑒𝑠}. The set Definition 12. (ABAC Policy Decision). The decision of an
of tuples FU = {h𝑑𝑒 𝑝𝑡, 𝐶𝑆i, h𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑔𝑟𝑎𝑑i} denotes a ABAC policy 𝜋 for an access request 𝑞 denoted as 𝑑 𝜋 (𝑞)
user attribute filter. Here, the graduate students in the is permit iff:
CS department satisfy FU . ∃𝜌 ∈ 𝜋 : 𝑞 |= 𝜌
Definition 5. (Relation Condition). A relation condition is otherwise, the decision is deny.
defined as a set of tuples R = {h𝑎, 𝑏|!𝑏i| 𝑎, 𝑏 ∈ 𝐴 ∧ 𝑎 ≠
𝑏}. Here h𝑎, 𝑏i is a positive relation condition tuple that If an access request satisfies a rule of the access control
indicates 𝑎 and 𝑏 have the same values, and h𝑎, !𝑏i is a policy, then the decision of the system for such access
negative relation condition tuple that indicates 𝑎 and 𝑏 request is permit. If the access request does not satisfy any
do not have the same values. rule in the access control policy then the decision of the
system for such access request is deny.
A relation is used in a rule to denote the equality con- TABLE 1 summarizes the notations used in this paper.
dition between two attributes of users, objects, or sessions.
Note that the two attributes in the relation condition must
2.2 Unsupervised Learning Algorithm
have the same range.
Definition 6. (Relation Condition Satisfaction). An entity Unsupervised learning algorithms try to infer a function
𝑒 ∈ 𝐸 satisfies a relation condition R, denoted as 𝑒 |= R, that describes the structure of unlabeled data. They are
iff useful when no or very few labeled data is available. We
∀h𝑎 𝑖 , 𝑏 𝑖 i ∈ R : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) = 𝑓 𝑎_𝑒 (𝑒, 𝑏 𝑖 ) leverage such methods for extracting ABAC policies from
∀h𝑎 𝑖 , !𝑏 𝑖 i ∈ R : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) ≠ 𝑓 𝑎_𝑒 (𝑒, 𝑏 𝑖 ). access logs.
In particular, given a set of authorization tuples, we
Definition 7. (Access Request). An access request is a tuple employ an unsupervised learning approach to mine and
𝑞 = h𝑢, 𝑜, 𝑠, 𝑜 𝑝i where user 𝑢 ∈ 𝑈 sends a request to the extract an ABAC policy that has high quality. An unsu-
system to perform operation 𝑜 𝑝 ∈ 𝑂𝑃 on object 𝑜 ∈ 𝑂 in pervised learning approach is suitable because there is no
session 𝑠 ∈ 𝑆. labeled data available for desired ABAC rules. ABAC policy
Definition 8. (Authorization Tuple/Access Log). An autho- extraction, in this case, can be considered as a mapping
rization tuple is a tuple 𝑡 = h𝑞, 𝑑i containing decision between authorization tuples to a set of clusters that are
𝑑 made by the access control system for request 𝑞. An representative of the desired ABAC rules. Such a mapping
Access Log L is a set of such tuples. can be expressed as a function, ℎ : X → Y, where:

The decision 𝑑 of an authorization tuple can be permit 1) X is a set of authorization tuples (i.e., access log).
or deny. The tuple with permit decision means that user 𝑢 2) Y is a set of numbered labels (i.e., cluster labels,
can perform an operation 𝑜 𝑝 on an object 𝑜 in session 𝑠. each cluster corresponding to a rule of the ABAC
The authorization tuple with deny decision means that user policy 𝜋).
𝑢 cannot perform operation 𝑜 𝑝 on object 𝑜 in session 𝑠. The goal is then to learn the function ℎ with low cluster-
Access log is a union of Positive Access Log, L + , and ing error and mine the desired policy that is high quality.
Negative Access Log, L − , where:

L + = {h𝑞, 𝑑i|h𝑞, 𝑑i ∈ L ∧ 𝑑 = 𝑝𝑒𝑟𝑚𝑖𝑡},


3 P ROBLEM D EFINITION
3.1 ABAC Policy Extraction Problem
and
Although organizations are interested in employing an
L − = {h𝑞, 𝑑i|h𝑞, 𝑑i ∈ L ∧ 𝑑 = 𝑑𝑒𝑛𝑦}.
ABAC model, adopting it is a big challenge for them.
Definition 9. (ABAC Rule). An access rule 𝜌 is a tuple The manual development of such a policy is tedious and
hF , R, 𝑜 𝑝|!𝑜 𝑝i, where F is an attribute filter, R is a re- error-prone. Policy Mining techniques have been proposed
lation condition, and 𝑜 𝑝 is an operation. !𝑜 𝑝 is a negated to address such challenges in order to reduce the cost,
operation that indicates the operation can have any value time, and error of policy development/maintenance. ABAC
except 𝑜 𝑝. policy mining algorithms ease the migration to the ABAC
4

TABLE 1: Notations

Notation Definition
𝑈 ,𝑂, 𝑆, 𝑂𝑃 Sets of users, objects, sessions, and operations
𝐴𝑢 , 𝐴𝑜 , and 𝐴𝑠 Sets of user attributes, object attributes, and session attributes
𝐸 =𝑈 ∪𝑂∪𝑆 Set of all entities
𝐴 = 𝐴𝑢 ∪ 𝐴𝑜 ∪ 𝐴𝑠 Set of all attributes
𝑉𝑎 Attribute Range: set of all valid values for 𝑎 ∈ 𝐴
𝑓𝑎_𝑒 (𝑒, 𝑎) Attribute Function: a function that maps an entity 𝑒 ∈ 𝐸 to a value from 𝑉𝑎
F = { h𝑎, 𝑣 |!𝑣 i | 𝑎 ∈ 𝐴 ∧ 𝑣 ∈ 𝑉𝑎 } Attribute Filter
R = { h𝑎, 𝑏i | 𝑎, 𝑏 ∈ 𝐴 ∧ 𝑎 ≠ 𝑏 ∧ 𝑉𝑎 = 𝑉𝑏 } Relation Condition
𝑞 = h𝑢, 𝑜, 𝑠, 𝑜 𝑝i Access Request
𝑡 = h𝑞, 𝑑 i Authorization Tuple, showing decision 𝑑 made by the system for request 𝑞
L Access Log, set of authorization tuples
L + = { h𝑞, 𝑑 i | h𝑞, 𝑑 i ∈ L ∧ 𝑑 = 𝑝𝑒𝑟 𝑚𝑖𝑡 } Positive Access Log
L − = { h𝑞, 𝑑 i | h𝑞, 𝑑 i ∈ L ∧ 𝑑 = 𝑑𝑒𝑛𝑦 } Negative Access Log
𝜌 = hF, R, 𝑜 𝑝 |!𝑜 𝑝i ABAC Rule
P Set of all policy rules
𝜋 = h𝐸 , 𝑂𝑃, 𝐴, 𝑓𝑎_𝑒 , P i ABAC Policy
𝑑 𝜋 (𝑞) The decision of an ABAC policy 𝜋 for an access request 𝑞
𝑇 𝑃 𝜋 |L , 𝐹 𝑃 𝜋 |L , 𝑇 𝑁 𝜋 |L , and 𝐹 𝑁 𝜋 |L Relative True Positive, False Positive, True Negative, and False Negative Rates
𝐴𝐶𝐶 𝜋 |L Relative Accuracy Rate
𝐹 -𝑠𝑐𝑜𝑟 𝑒 𝜋 |L Relative F-score
𝑊 𝑆𝐶 ( 𝜋) Weighted Structural Complexity of policy 𝜋
Q𝜋 Policy Quality Metric

framework by completely (or partially) automating the de- interpret they would be. In addition, succinct rules
velopment of ABAC policy rules. are desirable as they are easier to audit and manage.
The primary input to a policy mining algorithm is the log 3) Negative Attribute Filters: The ABAC policy mining
of authorization decisions in the system. The log indicates solution should support both positive and negative
authorization decision (i.e., permit or deny) for any given attribute filters which will result in more concise
access request by a user of the system. For ABAC policy and manageable mined policy.
mining, such a log is accompanied by attributes of entities 4) Relation Conditions: The solution should support the
involved in the log entries. The goal of a policy mining extraction of relation conditions for policy mining
algorithm is to extract ABAC policy rules from access logs in order to generate more concise and manageable
that have high quality with respect to some quality metrics mined policy.
(e.g., policy size and correctness). 5) Sparse Logs: In real-world, the access log that is
We define the ABAC policy extraction problem formally input to the policy mining algorithm may be sparse,
as follows: representing only a small fraction of all possible
Definition 13. (ABAC Policy Extraction Problem). Let access requests. The policy mining algorithm must
𝐼 =< 𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , L >, where the components are as be able to extract useful rules even from a sparse
defined earlier, then the ABAC policy extraction problem log.
is to find a set of rules R such that the ABAC policy 6) Mining Negative Authorization Rules: An ABAC pol-
𝜋 =< 𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , R > has high quality with respect to icy can contain both positive and negative rules
L. which permit or deny access requests, respectively.
The use of negative rules is helpful in situations
where specifying exceptions to more general rules
3.2 Challenges and Requirements is important. Including negative policy rules would
For an ABAC policy extraction approach to be applicable help in generating a more concise ABAC policy.
to a wide range of real-world scenarios, we identify the Thus, the policy mining algorithm should be able
following challenges and requirements: to extract both positive and negative authorization
rules.
1) Correctness of Mined Policy: The mined policy must 7) Noisy Authorization Log: In the real world and with
be consistent with original authorization log in that complex and dynamic information systems, it is
the access decision of the mined policy must result possible to have a noisy authorization log consisting
in the same access decision of the log entry. An of over-assignments and under-assignments. These
inconsistent extracted policy may result in situations issues occur either due to a wrong configuration
in which an originally authorized access is denied of the original authorization system or improper
(more restrictive) or originally unauthorized access is policy updates by administrators. The policy mining
permitted (less restrictive) by the system. algorithm should be capable of extracting meaning-
2) Complexity of Mined Policy: The policy mining algo- ful rules even in presence of an acceptable amount
rithm should endeavor to extracting a policy that is of noise in the input access log.
as concise as possible. Since the policy rules need to 8) Dynamic and Evolving Policies: Modern information
be manipulated by human administrators, the more systems are often dynamic. The authorization needs
concise they are, the more manageable and easier to
5

of these systems and the attributes of the entities in The relative accuracy metric, 𝐴𝐶𝐶 𝜋 | L , measures the ac-
the environment evolve rapidly. These changes will curacy of mined policy 𝜋 with regards to the decisions made
result in over-assignments or under-assignments. by the original policy indicated by L and is defined formally
The proposed method should employ a mechanism as follows:
to support the dynamicity of the information sys- Definition 16. (Relative Accuracy). Given the relative true
tems and their authorization policies and ease the positive and negative rates, the relative accuracy of 𝜋
maintenance of evolving systems. regarding L denoted as 𝐴𝐶𝐶 𝜋 | L is calculated as follows:
Our proposed approach addresses all the requirements 𝑇 𝑃𝜋 |L + 𝑇 𝑁𝜋 |L
except the sixth one. Table 2 shows the challenges that are 𝐴𝐶𝐶 𝜋 | L =
𝑇 𝑃𝜋 |L + 𝑇 𝑁 𝜋 | L + 𝐹𝑃 𝜋 | L + 𝐹 𝑁 𝜋 | L
addressed by our proposed approach and how it improves
upon the state-of-the-art policy mining techniques. In Sec- As accuracy may be misleading in unbalanced data sets
tion 6, we discuss the existing solutions in details. [22] (which is very probable in case of access logs), we use
relative F-score to better evaluate the mined policy:
3.3 Evaluation Metrics
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L · 𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L
One of the main metrics for evaluating the quality of an 𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L = 2 ·
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L + 𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L
extracted policy is how accurately it matches the original
policy. That means the authorization decisions made by the Policies with higher relative F-score are better as they are
extracted policy for a set of access requests should be similar more consistent with the original access log.
to the decisions made by the original policy for that set of On the other hand, as the number of filters in each rule
requests. As an example, if the decision of the original policy and the number of rules in an access control policy increases,
for an access request 𝑞 is permit, then the decision of the policy intelligibility would decrease and maintenance of the
mined policy for the same access request must be permit policy would become harder. Hence, complexity is another
as well. If the mined policy denies the same access request, key metric for evaluating the quality of a policy.
then we record this authorization tuple as a False Negative. Weighted Structural Complexity (WSC) is a general-
We define Relative True Positive, Relative False Positive, Relative ization of policy size and was first introduced for RBAC
True Negative, and Relative False Negative rates, respectively, policies [23] and later extended for ABAC policies [15]. WSC
as follows: is consistent with usability studies of access control rules,
Definition 14. (Relative True Positive Rate). Given an access which indicates that the more concise the policies are the
log L and an ABAC policy 𝜋, the relative true positive more manageable they become [24]. Informally, for a given
rate of 𝜋 regarding L denoted as 𝑇 𝑃 𝜋 | L is the portion of ABAC policy, its WSC is a weighted sum of its elements.
positive access logs for which the decision of 𝜋 is permit: Formally, for an ABAC policy 𝜋 with rules P, its WSC is
defined as follows:
|{h𝑞, 𝑑i ∈ L + |𝑑 𝜋 (𝑞) = 𝑝𝑒𝑟𝑚𝑖𝑡}|
𝑇 𝑃𝜋 |L = 𝑊 𝑆𝐶 (𝜋) = 𝑊 𝑆𝐶 (P)
|L + |
Here, |𝑠| is the cardinality of set 𝑠. ∑︁
𝑊 𝑆𝐶 (P) = 𝑊 𝑆𝐶 (𝜌)
Definition 15. (Relative False Positive Rate). The relative 𝜌∈ P
false positive rate of 𝜋 regarding L denoted as 𝐹𝑃 𝜋 | L is
the portion of negative access logs for which the decision
of 𝜋 is permit: 𝑊 𝑆𝐶 (𝜌 = hFU , FO , FS , R, 𝑜 𝑝, 𝑑i) = 𝑤 1𝑊 𝑆𝐶 (FU )+
𝑤 2𝑊 𝑆𝐶 (FO ) + 𝑤 3𝑊 𝑆𝐶 (FS ) + 𝑤 4𝑊 𝑆𝐶 (R)
|{h𝑞, 𝑑i ∈ L − |𝑑 𝜋 (𝑞) = 𝑝𝑒𝑟𝑚𝑖𝑡}|
𝐹𝑃 𝜋 | L = ∑︁
|L − | ∀𝑠 ∈ {FU , FO , FS , R} : 𝑊 𝑆𝐶 (𝑠) = |𝑠|
Similarly, we calculate the relative true negative rate and where |𝑠| is the cardinality of set 𝑠 and each 𝑤 𝑖 is a user-
false negative rate of 𝜋 regarding L, denoted as 𝑇 𝑁 𝜋 | L and specified weight.
𝐹 𝑁 𝜋 | L , respectively, as follows: Van Rijsbergen proposes an effectiveness measure for
|{h𝑞, 𝑑i ∈ L − |𝑑 𝜋 (𝑞) = 𝑑𝑒𝑛𝑦}| combining two different metrics 𝑃 and 𝑅 in [25] as follows :
𝑇 𝑁𝜋 |L =
|L − | 1
𝐸 =1−
𝛼 1−𝛼
|{h𝑞, 𝑑i ∈ L + |𝑑 𝜋 (𝑞) = 𝑑𝑒𝑛𝑦}| +
𝐹𝑁𝜋 |L = 𝑃 𝑅
|L + |
Given relative F-score and WSC measures for various
The relative precision and relative recall are calculated as mined policies resulting from running different mining al-
follows: gorithms over access log, it may not be straightforward to
𝑇 𝑃𝜋 |L select the best algorithm and, hence, the mined policy with
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L = the highest quality. So, to be able to compare the quality of
𝑇 𝑃 𝜋 | L + 𝐹𝑃 𝜋 | L
different mined ABAC policies, we combine the two metrics
𝑇 𝑃𝜋 |L based on Van Rijsbergen’s effectiveness measure [25] and
𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L = define the Policy Quality Metric as follows:
𝑇 𝑃𝜋 |L + 𝐹𝑁𝜋 |L
6

TABLE 2: State-of-the-art ABAC Rule Mining Techniques

Xu et al. [15] Medvet et al. [16] Iyer et. al [17] Cotrini et al. [18] Our Proposed Approach
Policy Correctness X X X X X
Policy Complexity X X X X X
Negative Attribute Filters 7 7 7 7 X
Relation Conditions X X X 7 X
Sparse Logs 7 X 7 X X
Negative Authorization Rules 7 7 X 7 7
Noisy Authorization Log X 7 7 7 X
System Dynamicity 7 7 7 7 X

𝛼 1 − 𝛼 −1 1. Data Pre-processing
Q𝜋 = ( + )
𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L Δ𝑊 𝑆𝐶 𝜋 Handling missing
values, Converting to
2. Parameter Tuning
1 categorical values
Here 𝛼 = where 𝛽 determines the importance of Finding best number of clusters,
1 + 𝛽2 best cluster initialization, and
relative F-score over policy complexity and Δ𝑊 𝑆𝐶 𝜋 shows appropriate thresholds
the relative reduction in the complexity with regards to the
6. Policy Refinement
complexity of the most complex mined policy. Δ𝑊 𝑆𝐶 𝜋 is 3. Clustering
calculated as follows: Refining policy rules
based on FP and FN Clustering data using
records k-mean/k-mode
𝑊 𝑆𝐶𝑚𝑎𝑥 − 𝑊 𝑆𝐶 (𝜋) + 1 algorithm
Δ𝑊 𝑆𝐶 𝜋 =
𝑊 𝑆𝐶𝑚𝑎𝑥
𝑊 𝑆𝐶𝑚𝑎𝑥 is the weighted structural complexity of the most 4. Rule Extraction
5. Rule Pruning
complex mined policy. Finding effective
Removing duplicate
Definition 17. (Most Complex Mined Policy). The most attributes and
rules, Finding similar
relations, Building
rules and eliminating
complex mined policy is the mined policy with the them
Rules

highest weighted structural complexity. It is extracted by


iterating through positive access log L + and adding an Fig. 1: Overview of the Proposed Approach.
access control rule for each authorization tuple if it’s not
already included in the mined policy. The corresponding
rule for each authorization tuple includes all attributes 4.1 Data Pre-processing
of user, object, and subject of that authorization tuple. As features of our learning algorithm are categorical vari-
Considering the equal importance of relative F-score and ables, the first step in pre-processing the access log is
relative loss of complexity of the policy, we calculate the to convert all numerical variables to their corresponding
quality measure as follows: categorical values. For example, in ABAC, environmental
attributes deal with time, location or dynamic aspects of the
2 · 𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L · Δ𝑊 𝑆𝐶 𝜋 access control scenario. Hence, we need to pre-process and
Q𝜋 = discretize such continuous variables to categorical ones (e.g.
𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L + Δ𝑊 𝑆𝐶 𝜋
time of access to working hours and non working hours) so
A mined policy with a higher F-score would have a our proposed algorithm is applicable to them.
higher policy quality. On the other hand, as the complexity We also need to handle missing values in this step. As
of a policy increases, its quality will decrease. The intuition the frequency of each attribute value is an important factor
here is that once an extracted policy reaches a high F-score, in our rule extraction algorithm (Section 4.4) for deciding
adding additional rules will lead to a decrease in Q 𝜋 . if an attribute is effective or not, it is important to replace
For the most complex mined policy 𝜋 𝑤 , Δ𝑊 𝑆𝐶 𝜋𝑤 ≈ 0, missing values in a way that it doesn’t mess up with the
so its policy quality Q 𝜋𝑤 is very close to zero. For an original frequency of each attribute value. For this purpose,
empty mined policy 𝜋𝑒 (a policy without any rule), while we replace each missing value by UNK (i.e., unknown).
Δ𝑊 𝑆𝐶 𝜋𝑒 ≈ 1, as it denies all the access requests, its false
negative rate is one and its true positive rate is zero. So its
precision is zero and as a result, its F-score is zero as well. 4.2 Selection of Learning Algorithm
So the quality of the empty policy Q 𝜋𝑒 is zero, too. We use the K-modes algorithm [26], which is a well known
The most complex mined policy and the empty mined unsupervised learning algorithm used for clustering cate-
policy are the two extreme cases with policy quality equal gorical data. K-modes has been proved effective in mining
to zero. Other mined policies between these two cases have ABAC policies [27]; this algorithm uses an initialization
higher policy quality than zero. method based on both the distance between data points
and the density of data points. Using both density and
distance when initializing clusters help avoid two problems:
4 T HE P ROPOSED L EARNING - BASED A PPROACH (i) clustering outliers as new clusters are based only on the
Our proposed learning-based ABAC policy extraction pro- distances; and (ii) creating new clusters surrounding one
cedure consists of the steps summarized in Figure 1. center based only on the density. Compared to a random
7

initialization method, this method provides more robustness effective positive (negative) attribute pair of 𝜌𝑖 correspond-
and better accuracy in the clustering process [26]. ing to cluster 𝐶𝑖 , where the frequency of occurrence of 𝑣 𝑗
in the set of all the records of cluster 𝐶𝑖 is much higher
4.3 Parameter Tuning (lower) than its frequency of occurrence in the original
data; this is determined based on a threshold T𝑃 (T𝑁 ).
In the next step, we tune the learning parameters. There are The attribute expression h𝑎 𝑗 , 𝑣 𝑗 i (h𝑎 𝑗 , !𝑣 𝑗 i) is added to
several challenges that need to be addressed in this step, the attribute filters of the extracted rule 𝜌𝑖 for 𝐶𝑖 .
which include the following:
In the final step, we extract the relation conditions for
4.3.1 Number of Clusters (k) AC rules for each cluster. This will be done based on the
One of the main challenges in an unsupervised learning frequency of equality between pairs of attributes in the
is determining the number of clusters, 𝑘. In our sample records of each cluster. We define effective positive relation
policies, as we know the number of rules in each policy, and effective negative relation as follows:
we can set the number of clusters beforehand but in a
real situation as we do not know the size of the rules in Definition 19. (Effective Positive (Negative) Relation). Let
advance, making the correct choice of 𝑘 is difficult. One of 𝑅 = {h𝑎, 𝑏i} be the set of all possible relations between
the popular methods for determining the number of clusters pairs of attributes in the system; we define h𝑎 𝑗 , 𝑏 𝑗 i as
in an unsupervised learning model is the Elbow Method [28], an effective positive (negative) relation pairs of 𝜌𝑖 cor-
[29]. This method is based on total within group sum of responding to cluster 𝐶𝑖 , where the frequency of 𝑎 𝑗
squares. 𝑘 will be chosen as the number of clusters if adding equals 𝑏 𝑗 in all the records of cluster 𝐶𝑖 is much higher
another cluster doesn’t give much better modeling of the (lower) than their frequency in the original data; this is
data (i.e., the elbow point of the graph). determined based on a threshold 𝜃 𝑃 (𝜃 𝑁 ). The relation
As a second approach, we choose a number of clusters h𝑎 𝑗 , 𝑏 𝑗 i (h𝑎 𝑗 , !𝑏 𝑗 i) is added to the relation conditions of
(𝑘) which gives the best modeling of the data in terms of the the extracted rule 𝜌𝑖 for this cluster.
policy quality metric. For this purpose, we run our clustering
algorithm for different values of 𝑘 and calculate the accuracy We note that the values of the thresholds T𝑃 , T𝑁 , 𝜃 𝑃 ,
of the corresponding model using 10-fold cross-validation. and 𝜃 𝑁 will be different for each data set. To find the
The value of 𝑘 that maximizes the accuracy of the model is best threshold values for each data set, we run the rule
selected as the final number of clusters. extraction algorithm for different values of thresholds, and
Note that increasing 𝑘 will ultimately reduce the amount the values which result in the maximum accuracy over the
of clustering error or it will increase the accuracy of the cross-validation data set will be selected.
model, but by increasing the number of clusters, the num- Algorithms 1 and 2 show effective attribute and effective
ber of extracted rules will also increase resulting in more relation extraction procedures, respectively.
complexity (i.e., higher WSC). So it is important to find an
optimal 𝑘 that balances between policy accuracy and WSC. Algorithm 1 Effective attribute extraction algorithm
1: procedure EXTRACTATTRIBUTE F ILTERS
4.3.2 Cluster Initialization & Local Optima Input: 𝐶𝑖 , 𝐴, 𝑉, L, T𝑃 , T𝑁
Different cluster initializations can lead to a different set Output: F
of clusters as k-means/k-modes may converge to a local 2: F ←∅
optima. To overcome this issue, for a given number of 3: for all 𝑎 ∈ 𝐴 do
clusters, 𝑘, we train multiple models with different cluster 4: for all 𝑣 𝑗 ∈ 𝑉𝑎 do
initializations and then select the partition with the smallest 5: if 𝐹𝑟𝑒𝑞(𝑣 𝑗 , 𝐶𝑖 ) − 𝐹𝑟𝑒𝑞(𝑣 𝑗 , L) > T𝑃 then
clustering error. 6: F 𝑖 ← F ∪ h𝑎, 𝑣 𝑗 i
7: end if
4.4 Policy Rules Extraction 8: if 𝐹𝑟𝑒𝑞(𝑣 𝑗 , L) − 𝐹𝑟𝑒𝑞(𝑣 𝑗 , 𝐶𝑖 ) > T𝑁 then
9: F 𝑖 ← F ∪ h𝑎, !𝑣 𝑗 i
The main phase in our proposed approach is the extraction 10: end if
of ABAC policy rules. In the first step, we need to collect all 11: end for
the authorization tuples related to each rule of the policy. We 12: end for
use data clustering for this purpose. We divide the access log return 𝜌𝑖
into clusters where the records in each cluster correspond to 13: end procedure
one AC rule in the system. This is done based on finding
similar patterns between features (i.e., attribute values) of
the records (i.e., access control tuples). In the second step,
we extract the attribute filters of such a rule. We adapt the
rule extraction algorithm in [27] and extend it to extract both 4.5 Policy Enhancement
positive and negative attribute filters. We define effective After the first phase of policy rule extraction, we get a policy
positive attribute and effective negative attribute as follows: which may not be as accurate and concise as we desire. We
Definition 18. (Effective Positive (Negative) Attribute). Let enhance the quality of the mined policy through iterations
𝑆 = {h𝑎, 𝑣i} be the set of all possible attribute-value pairs of policy improvement steps that include: rule pruning and
in a system; we define h𝑎 𝑗 , 𝑣 𝑗 i ∈ 𝑆 (h𝑎 𝑗 , !𝑣 𝑗 i ∈ 𝑆) as an policy refinement.
8

Algorithm 2 Effective relation extraction algorithm Algorithm 3 Rule Pruning algorithm


1: procedure EXTRACT R ELATIONS 1: procedure RULE P RUNING
Input: 𝐶𝑖 , 𝐴, L, 𝜃 𝑃 , 𝜃 𝑁 Input: 𝜋
Output: R Output: 𝜋
2: R←∅ 2: P ← 𝜋.P
3: for all 𝑎 ∈ 𝐴 do 3: 𝑞 ← CALC Q UALITY (P)
4: for all 𝑏 ∈ 𝐴 and 𝑏 ≠ 𝑎 do 4: for all 𝜌𝑖 ∈ P do
5: if 𝐹𝑟𝑒𝑞(𝑎 = 𝑏, 𝐶𝑖 ) - 𝐹𝑟𝑒𝑞(𝑎 = 𝑏, L)>𝜃 𝑃 then 5: for all 𝜌 𝑗 ∈ P and 𝜌𝑖 ≠ 𝜌 𝑗 do
6: R ← R ∪ h𝑎, 𝑏i 6: if S IMILARITY (𝜌𝑖 , 𝜌 𝑗 ) > 0.5 then
7: end if 7: P𝑖 ← P/𝜌𝑖
8: if 𝐹𝑟𝑒𝑞(𝑎 = 𝑏, L) - 𝐹𝑟𝑒𝑞(𝑎 = 𝑏, 𝐶𝑖 )>𝜃 𝑁 then 8: P 𝑗 ← P/𝜌 𝑗
9: R ← R ∪ h𝑎, !𝑏i 9: 𝑞 𝑖 ← CALC Q UALITY (P𝑖 )
10: end if 10: 𝑞 𝑗 ← CALC Q UALITY (P 𝑗 )
11: end for 11: if 𝑞 𝑖 >= 𝑞 and 𝑞 𝑖 >= 𝑞 𝑗 then
12: end for 12: P ← P𝑖
return R 13: end if
13: end procedure 14: if 𝑞 𝑗 >= 𝑞 and 𝑞 𝑗 >= 𝑞 𝑖 then
15: P ← P𝑗
16: end if
4.5.1 Rule Pruning 17: end if
During the rule extraction phase, it’s possible to have two 18: end for
clusters that correspond to the same rule. As a result, the 19: end for
extracted rules of these clusters are very similar to each return P
other. Having two similar rules in the final policy increases 20: end procedure
the complexity of the mined policy while it may not help
the accuracy of the policy and as a result, it hurts the policy
quality. To address such an issue, in the rule pruning step, Here 𝜌2 is more restricted than 𝜌1 as it imposes more
we identify similar rules and eliminate the ones whose conditions on the user attributes.
removal improves the policy quality more. If eliminating Having such a restricted rule in the mined policy would
neither of the two rules improves the policy quality, we result in a larger number of FNs as an access request that
keep both the rules. This may happen when we have two would be permitted by the original rule will be denied by
very similar AC rules in the original policy. We measure the the restricted rule.
similarity between two rules using Jaccard similarity [30] as On the other hand, an extracted rule is more relaxed
follows: compared to the original rule if it misses some of the filters.
In Example 6, 𝜌1 is more relaxed than 𝜌2 . Such a relaxed rule
𝐽 (𝑆1 , 𝑆2 ) = |𝑆1 ∩ 𝑆2 |/|𝑆1 ∪ 𝑆2 | would result in more FPs as it permits access requests that
Based on this, we calculate the similarity between two should be denied as per the original policies.
rules 𝜌1 and 𝜌2 as follows: To address these issues, we propose a policy refinement
procedure which is shown in Algorithm 4. Here, we try
𝐽 (𝜌1 , 𝜌2 ) =
 Í  to refine the mined policy (𝜋 𝑚 ) based on the patterns dis-
|F𝜌1 ∩ F𝜌2 | + |R 𝜌1 ∩ R 𝜌2 | + |𝑜 𝑝 𝜌1 ∩ 𝑜 𝑝 𝜌2 | covered in the FN or FP records. These patterns are used
F ∈ { FU , FO , FS }
 Í  to eliminate extra filters from restricted rules or append
|F𝜌1 ∪ F𝜌2 | + |R 𝜌1 ∪ R 𝜌2 | + |𝑜 𝑝 𝜌1 ∪ 𝑜 𝑝 𝜌2 | missing filters to relax the rules.
F ∈ { FU , FO , FS }
To extract patterns from the FN or FP records, we apply
We consider two rules to be similar if their Jaccard our rule extraction procedure on these records to get the cor-
similarity score is more than 0.5, which means that the size responding policies 𝜋 𝐹 𝑁 and 𝜋 𝐹 𝑃 . Here our training data are
of their common elements is more than half of the size of the FN and FP records, respectively. We compare the extracted
union of their elements. Algorithm 3 shows the rule pruning FN or FP rules with the mined policy and remove the extra
procedure. filters or append the missed ones to the corresponding rules.
4.5.2 Policy Refinement As an example, consider the FP records. Here, our goal
is to extract the patterns that are common between access
During the rule extraction phase, it is possible to extract
requests that were permitted based on the mined policy
rules that are either too restricted or too relaxed compared
while they should have been denied based on the original
to the original policy rules. A rule is restricted if it employs
policy.
more filters than the original rule.
In each step of refinement, a rule from 𝜋 𝑚 that is similar
Example 6. Consider the following two rules:
to a rule from 𝜋 𝐹 𝑁 or 𝜋 𝐹 𝑃 based on the Jaccard similarity
𝜌1 = h{( 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑓 𝑎𝑐𝑢𝑙𝑡𝑦)},{(𝑡𝑦 𝑝𝑒, 𝑔𝑟𝑎𝑑𝑒𝑏𝑜𝑜𝑘)}, (Section 4.5.1) is selected and then refined in two ways as
{𝑠𝑒𝑡𝑆𝑐𝑜𝑟𝑒}, 𝑝𝑒𝑟𝑚𝑖𝑡i discussed below.
𝜌2 = h{( 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑓 𝑎𝑐𝑢𝑙𝑡𝑦),(𝑢𝐷𝑒 𝑝𝑡, 𝐸 𝐸)}, Policy refinement based on 𝜋 𝐹 𝑁 : In the case of FN records,
{(𝑡𝑦 𝑝𝑒, 𝑔𝑟𝑎𝑑𝑒𝑏𝑜𝑜𝑘)},{𝑠𝑒𝑡𝑆𝑐𝑜𝑟𝑒}, 𝑝𝑒𝑟𝑚𝑖𝑡i two situations are possible: a rule is missing from the mined
9

policy (𝜋 𝑚 ) or one of the rules in 𝜋 𝑚 is more restrictive. To 5.1 Datasets


resolve this issue, for each rule 𝜌𝑖 ∈ 𝜋 𝐹 𝑁 :
We perform our experiments on multiple datasets including
• if there is a similar rule 𝜌 𝑗 ∈ 𝜋 𝑚 then we refine 𝜌 𝑗 as synthesized and real ones. The synthesized access logs are
follows: generated from two sets of ABAC policies. The first one is
a manually written set of policies that is adapted from [15]
∀ 𝑓 ∈ F : F𝜌 𝑗 = F𝜌 𝑗 /(F𝜌 𝑗 /F𝜌 𝑖 ) to be compatible with our policy language. The second one
where F = FU ∪ FO ∪ FS ∪ R. So, the extra filters are includes a completely randomly generated set of policies. To
removed from the restricted rule (𝜌 𝑗 ). synthesize our input data, for each ABAC policy (i.e., Uni-
• if there is no such rule, then 𝜌𝑖 is the missing rule and versity Policy, Healthcare Policy, etc.), a set of authorization
we add it to 𝜋 𝑚 . tuples is generated and the outcome of the ABAC policy for
each access right is evaluated. The authorization tuples with
Policy refinement based on 𝜋 𝐹 𝑃 : In the case of FP records,
permit as their outcomes are the inputs to our unsupervised
some filters might be missing in an extracted rule in the
learning model.
mined policy (𝜋 𝑚 ); so for each rule 𝜌𝑖 ∈ 𝜋 𝐹 𝑃 , we refine the
mined policy as follows: Our real datasets are built from access logs provided by
Amazon in Kaggle competition [31] and available in the UCI
∀ 𝑓 ∈ F : F𝜌 𝑗 = F𝜌 𝑗 ∪ (F𝜌 𝑖 /F𝜌 𝑗 ) machine learning repository [32].
where F = FU ∪FO ∪FS ∪R includes all the filters in the rule. Manual Policy - University: This policy is adapted from
So, the missing filters are added to the relaxed rule (𝜌 𝑗 ). [15] and it controls access of different users including stu-
These refinements can be done in multiple iterations dents, instructors, teaching assistants, etc., to various objects
until further refinement does not give a better model in (applications, gradebooks, etc.).
terms of policy quality Q 𝜋 . Manual Policy - Healthcare: This policy is adapted from
[15] and is used to control access by different users (e.g.
Algorithm 4 Policy refinement algorithm nurses, doctors, etc.) to electronic health records (EHRs) and
1: procedure REFINE P OLICY EHR items.
Input: 𝐴, L Manual Policy - Project Management: This policy is
Output: 𝜋 𝑚 adapted from [15] and it controls access by different users
2: F N ← GET FN S (𝜋 𝑚 , L) (e.g. department managers, project leaders, employees, etc.)
3: 𝜋 𝐹 𝑁 ← EXTRACT P OLICY (F N ) to various objects (e.g. budgets,schedules and tasks).
4: for all 𝜌𝑖 ∈ 𝜋 𝐹 𝑁 .P do Random Policies: The authorization rules for this policy
5: 𝑅𝑠 ← GET S IMILAR R ULES (𝜋 𝐹 𝑁 .P, 𝜋 𝑚 .P) is generated completely randomly from random sets of
6: if |𝑅𝑠 | = 0 then attributes and attribute values. These randomly generated
7: 𝜋 𝑚 .P ← 𝜋 𝑚 .P ∪ 𝜌𝑖 policies provide an opportunity to evaluate our proposed
8: else algorithm on access logs with various sizes and with vary-
9: for all 𝜌 𝑗 ∈ 𝑅𝑠 do ing structural characteristics. However, we note that, the
10: for all F ∈ FU ∪ FO ∪ FS ∪ R do performance of our algorithm on random policies might not
11: F𝜌 𝑗 ← F𝜌 𝑗 \(F𝜌 𝑗 \F𝜌𝑖 ) be representative of its performance in real scenarios and
12: end for over real policies.
13: end for Real Dataset - Amazon Kaggle: The Kaggle competition
14: end if dataset [31] includes access requests made by Amazon’s
15: end for employees over two years. Each record in this dataset de-
16: F P ← GET FP S (𝜋 𝑚 , L) scribes an employee’s request to a resource and whether
17: 𝜋 𝐹 𝑃 ← EXTRACT P OLICY (F P) the request was authorized or not. A record consists of the
18: for all 𝜌𝑖 ∈ 𝜋 𝐹 𝑃 .P do employee’s attribute values and the resource identifier. The
19: 𝑅𝑠 ← GET S IMILAR R ULES (𝜋 𝐹 𝑃 .P, 𝜋 𝑚 .P) dataset includes more than 12,000 users and 7,000 resources.
20: if |𝑅𝑠 | ! = 0 then Real Dataset - Amazon UCI: This dataset is provided
21: for all 𝜌 𝑗 ∈ 𝑅𝑠 do by Amazon in the UCI machine learning repository [32].
22: for all F ∈ FU ∪ FO ∪ FS ∪ R do It includes more than 36,000 users and 27,000 permissions.
23: F𝜌 𝑗 ← F𝜌 𝑗 ∪ (F𝜌𝑖 \F𝜌 𝑗 ) Since the dataset contains over 33,000 attributes, our focus
24: end for in this experiment is narrowed only to the most requested 8
25: end for permissions in the dataset.
26: end if Partial Datasets: To check the efficiency of the proposed
27: end for algorithm over sparse datasets, we generate sparse datasets
return 𝜋 𝑚 (partial datasets) by randomly selecting authorization tu-
28: end procedure ples from the complete dataset. For example, a 10% sparse
(partial) dataset is generated by randomly selecting 10% of
tuples from the complete access logs.
5 E XPERIMENTAL E VALUATION Noisy Datasets: To check the efficiency of the proposed
We have implemented a prototype of our proposed ap- algorithm over noisy datasets, we generate noisy datasets
proach presented in Section 4. Here, we present our experi- by randomly reversing the decision of authorization tuples.
mental evaluation. For instance, a 10% noisy dataset is generated by randomly
10

TABLE 3: Details of the Synthesized and Real Policies

# 𝜋 |P | | 𝐴| |𝑉 | |L| | L+ | | L− |
𝜋1 UniversityP 10 11 45 2,700K 231K 2,468K
𝜋2 HealthcareP 9 13 40 982K 229K 753K
𝜋3 ProjectManagementP 11 14 44 5,900K 505K 5,373K
𝜋4 UniversityPN 10 11 45 2,700K 735K 1,964K
𝜋5 HealthcarePN 9 13 40 982K 269K 713K
𝜋6 ProjectManagementPN 11 14 44 5,900K 960K 4,918K
𝜋7 Random Policy 1 10 8 27 17K 2,742 14K
𝜋8 Random Policy 2 10 10 48 5,250K 245K 5,004K
𝜋9 Random Policy 3 10 12 38 560K 100K 459K
𝜋10 Amazon Kaggle - 10 15K 32K 30K 1897
𝜋11 Amazon UCI - 14 7,153 70K 36K 34K

reversing the decision of 10% of authorization tuples in the Our second set of experiments is on partial datasets. The
complete access logs. algorithm proposed by Xu and Stoller [14] and the approach
For each of the manual policies, we consider two differ- presented by Cotrini et al. [18] are not able to handle
ent sets of policy rules; the first one only contains positive complete datasets as these datasets are huge. To be able to
attribute filters and relations while the second one includes compare the performance of our proposed algorithm with
both positive and negative attribute filters and relations. We their work, we generated 0.1% sparse (partial) datasets and
have included these policies in Appendix A. run all algorithms over these partial datasets. The results of
Table 3 shows the details of the manual and random these experiments are shown in Table 5 and Figures 2, 3, and
access log datasets. In this table, |P | shows the number of 4.
rules in the original policy, | 𝐴| and |𝑉 | show the number The algorithm proposed by Xu and Stoller [14] and the
of attributes and attribute values and |L|, |L + |, |L − | show approach presented by Cotrini et al. [18] do not generate
the number of access control tuples, the number of positive policy rules with negative attribute filters and relations,
access logs, and the number of negative access logs in the however we report the results of their algorithms over
given dataset, respectively. datasets related to policy rules including negations (policies
𝜋4 , 𝜋5 , 𝜋6 ) to show how the quality of mined policies would
5.2 Experimental Setup be impacted if the mining algorithm does not extract rules
To evaluate our proposed method, we use a computer with that include negation.
2.6 GHz Intel Core i7 and 16 GB of RAM. We use Python 3 in
the mining and the evaluation process. The algorithms were 5.3.1 The F-Score of the Mined Policies
highly time-efficient (e.g., maximum time consumption is
Table 4 shows the final 𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L of our proposed ap-
less than half an hour).
proach after several rounds of refinement over all complete
We use kmodes library [33] for clustering our data. The
datasets. As we can see in Table 4, the proposed approach
initialization based on density (CAO) [26] is chosen for
achieves high F-score across all experiments except for 𝜋6 .
cluster initialization in kmodes algorithm.
𝜋6 is a very complex dataset with both positive and negative
To find optimal 𝑘, we apply the Silhouette method to
attributes and relation filters including 14 attributes, 44
test different values of 𝑘. We examine each value of 𝑘 in
attribute values, and around six million access records. The
pre-defined set [10, 20]. Then the 𝑘 value that results in the
final policy quality for this dataset is around 0.63, which is
highest Silhouette score is used in the final model.
acceptable considering the complexity of the policy.
To generate the synthesized access log L, we brute force
through all attributes 𝐴 and their values 𝑉𝑎 to produce Table 5 and Figure 2 show the comparison of the F-
all possible combinations for the tuples. This method was Scores of policies mined by our proposed approach with
used to generate a complete access log for the random and that of previous work over partial datasets (with 0.1% of
manual policy datasets. We generate two sets of partial the complete datasets). The F-Score of policies mined by our
datasets; the 10% partial datasets are used to check the algorithm is very close to the one done by the approach
efficiency of the proposed approach over sparse datasets proposed by Cotrini et al. [18]. As we can see, our proposed
(Table 4) and the 0.1% partial datasets are used to compare approach outperforms theirs in half of the experiments.
the proposed approach with previous work (Table 5). We
also generate a set of noisy datasets to check the efficiency 5.3.2 The Complexity of the Mined Policies
of the proposed algorithm over noisy access log. The results In Table 4, we can see the final 𝑊 𝑆𝐶 of the policies mined by
of such experiments are reported in Table 4. our proposed approach. All extracted policies have the com-
For all experiments, the optimal thresholds for selecting plexity lower than 100 which is much lower than those of the
effective attributes and relations are between 0.2 and 0.3. most complex policies for individual datasets. According to
Definition 17, the most complex policy for each dataset has
5.3 Results the same complexity as the original positive access log (L + ).
We first evaluate the performance of our policy mining Given numbers in Tables 3 and 4, the most complex policies
algorithm on complete datasets. Table 4 shows the results for these scenarios are thousands of times more complex
of these experiments. than the extracted policies by our approach.
11

We compare the complexity of the policies mined by


different ABAC mining algorithms in Figure 3. Among three
1.0 Proposed Approach
different approaches, the Cotrini et al. algorithms extracts Xu and Stoller [14]
Cotrini et al. [18]
the most complex policies with WSC greater than 1000 for
some cases. The complexity of the policies mined by our 0.8

algorithm is very close to the one extracted by the approach


proposed by Xu and Stroller [14]. 0.6

Policy Quality
0.4
5.3.3 The Policy Quality of the Mined Policies
0.2
Finally, Table 4 shows the quality of the extracted policies
through our proposed approach. We can see that out of
all datasets that our proposed algorithm was applied on, 0.0
Partial _1 Partial _2 Partial _3 Partial _4 Partial _5 Partial _6 _10 _11
around 75% of the cases reached the policy quality of more
than 0.8, which is significant, considering the huge size of
original access logs (each more than 30K records).
Fig. 4: The Quality of the Policies Mined by ABAC Mining
According to Figure 4, in most cases the policy quality Algorithms
of the policies mined by our proposed approach is higher
than those of the policies extracted by other ABAC mining
algorithms. 6 R ELATED W ORK
As RBAC approach became popular, many organization
decided to equip their information systems with more re-
cent access control model, however migrating to RBAC
100 Proposed Approach
Xu and Stoller [14] from legacy access control systems was a huge obstacle for
Cotrini et al. [18]
such environments. As a result, several researchers have
80
addressed such a challenge by introducing automated role
extraction algorithms [10], [11], [12], [13], [23], [34], [35], [36],
60
[37], [38], [39]. Role engineering or role mining are the terms
F-score

that have been used to refer to procedures to extract an


40
optimal set of roles given user-permission assignments.
In [10], Kuhlmann and Schimpf try to discover a set
20 of roles from user-permission assignments using clustering
techniques, however, they do not show the feasibility of
0 their proposed approach through experiments. In addition,
Partial _1 Partial _2 Partial _3 Partial _4 Partial _5 Partial _6 _10 _11
their proposed approach lacks a metric to choose the best
model based on their clustering method.
The ORCA role mining tool is proposed by
Fig. 2: The F-Score of the Policies Mined by ABAC Mining
Schlegelmilch and Steffens and tries to perform a
Algorithms
hierarchical clustering on user-permission assignments
[11]. Their proposed method limits the hierarchical
structure to a tree so that each permission/user is assigned
to one role in the hierarchy. This feature limits the feasibility
of their proposed approach as, in real environments, roles
2500 Proposed Approach
Xu and Stoller [14] do not necessarily form a tree.
Cotrini et al. [18]
Ni et al. propose a supervised learning approach for
2000
role mining which maps each user-permission assignment
to a role using a supervised classifier (i.e., a support vector
1500
machine (SVM)) [39]. The main limitation of their proposed
WSC

approach is that the roles and some parts of the role-


1000 permission assignments are needed beforehand; and hence,
it is not applicable in many organizations.
500 Vaidya et al. are the first to define the Role Mining
Problem (RMP) formally and analyze its theoretical bounds
0 [40]. They also propose a heuristic approach for finding
Partial _1 Partial _2 Partial _3 Partial _4 Partial _5 Partial _6 _10 _11 a minimal set of roles for a given set of user-permission
assignments.
Xu and Stoller are the first to propose an algorithm for
Fig. 3: The Complexity of the Policies Mined by ABAC mining ABAC policies from RBAC [41], logs [14], and access
Mining Algorithms control list [15] plus attribute information. Their policy min-
ing algorithms iterate over access control tuples (generated
12

TABLE 4: Results of Our Proposed Approach on Various Synthesized and Real Policy Datasets

𝜋 Total Running Time (s) Optimal 𝑘 P𝑚𝑖𝑛𝑒𝑑 𝐴𝐶𝐶 𝜋 |L 𝐹 -𝑠𝑐𝑜𝑟 𝑒 𝜋 |L 𝑊 𝑆𝐶𝑜𝑟𝑖𝑔 𝑊 𝑆𝐶𝑚𝑖𝑛𝑒𝑑 Q𝜋
𝜋1 9376.556 15 20 97.5% 83.6% 33 91 0.91
Partial 𝜋1 (10%) 1994.769 15 13 97.29% 82.21% 33 54 0.90
Noisy 𝜋1 (10%) 4979.56 10 8 96.94% 80% 33 28 0.90
𝜋2 2180.745 18 18 85.49% 75.93% 33 71 0.86
Partial 𝜋2 (10%) 4787.98 10 8 96.94% 85.33% 33 28 0.92
Noisy 𝜋2 (10%) 7339.91 8 15 72.22% 82.13% 33 27 0.90
𝜋3 7795.44 15 17 95.6% 65.63% 44 55 0.80
Partial 𝜋3 (10%) 1347.29 6 10 95.2% 62.24% 44 56 0.77
Noisy 𝜋3 (10%) 1912.72 15 15 94.47% 62.66% 44 81 0.77
𝜋4 13662.62 7 16 86.7% 71.58% 33 40 0.83
𝜋5 8681.64 15 15 78.11% 62% 33 67 0.76
𝜋6 12905.78 20 17 88.05% 46.28% 44 80 0.63
𝜋7 24.63 8 20 93% 78.33% 33 65 0.88
𝜋8 13081.20 10 14 99.12% 91.28% 33 51 0.95
𝜋9 2266.68 8 16 92.17% 79.66% 33 46 0.89
𝜋10 265.3 15 20 94% 97% - 44 0.98
𝜋11 1010.43 24 25 98.49% 99% - 92 0.82

TABLE 5: Comparison of Our Proposed Approach with Previous Work on Various Synthesized
and Real Policy Datasets

Mining Alg. 𝜋 Time (s) 𝐴𝐶𝐶 𝜋 |L 𝐹 -𝑠𝑐𝑜𝑟 𝑒 𝜋 |L P 𝜋𝑚𝑖𝑛𝑒𝑑 𝑊 𝑆𝐶 ( 𝜋) Q𝜋


Xu and Stoller [14] 227 94.74% 65.87% 10 34 0.79
Partial 𝜋1 (0.1%)
Cotrini et al. [18] 126 80.74% 45.3% 132 508 0.58
Proposed Approch 7.3 96% 74.2% 7 29 0.85
Xu and Stoller [14] 32645 64.43 63.61 3 6 0.78
Partial 𝜋2 (0.1%)
Cotrini et al. [18] 529 72.72% 64% 65 272 0.75
Proposed Approch 7.9 79.78% 68.23% 13 49 0.81
Xu and Stoller [14] −∗ −∗ −∗ −∗ −∗ −∗
Partial 𝜋3 (0.1%)
Cotrini et al. [18] 3587 91.57% 54.124% 24 77 0.70
Proposed Approch 11.44 94.96% 51.31% 12 55 0.78
Xu and Stoller [14] 4230 73.37% 16.1% 10 34 0.28
Partial 𝜋4 (0.1%)
Cotrini et al. [18] 204 93.55% 88.5% 385 1389 0.86
Proposed Approch 15 89.3% 80% 10 40 0.89
Xu and Stoller [14] 45348 79.25 73.09 3 6 0.84
Partial 𝜋5 (0.1%)
Cotrini et al. [18] 3587 86.46% 79.2% 123 462 0.83
Proposed Approch 8.8 87.2% 76.3% 15 66 0.86
Xu and Stoller [14] −∗ −∗ −∗ −∗ −∗ −∗
Partial 𝜋6 (0.1%)
Cotrini et al. [18] 2848 82.75% 62.66% 31 100 0.77
Proposed Approch 22.67 81.2% 49.4% 12 44 0.66
Xu and Stoller [14] −∗ −∗ −∗ −∗ −∗ −∗
𝜋10
Cotrini et al. [18] 237 84.25% 91.39% 1055 2431 0.92
Proposed Approch 265.3 94% 97% 20 44 0.98
Xu and Stoller [14] −∗ −∗ −∗ −∗ −∗ −∗
𝜋11
Cotrini et al. [18] 1345 70.93% 75.64% 466 1247 0.85
Proposed Approch 1010.43 98.49% 99% 24 92 0.99
∗ Xu and Stoller [14] did not terminate nor produced any output for the these datasets even after running for
more than 24 hours.

from available information, e.g., user permission relations their approach is a multi-objective optimization framework
and attributes) and construct candidates rules. They then which incorporates requirements on both correctness and
generalize the candidate rules by replacing conjuncts in expressiveness, it suffers from the same issue as [15].
attribute expressions with constraints. The main limitation Iyer and Masoumzadeh [17] propose a more systematic,
of these algorithms is that as they are based on heuristic yet heuristic ABAC policy mining approach which is based
approaches, the proposed techniques work very well for on the rule mining algorithm called PRISM. It inherits
simple and small scale AC policies, however, as the number shortcomings associated with PRISM that includes dealing
of rules in the policy and the number of elements in each with a large dimensionality of the search space of attribute
rule increases, they do not perform well. values and generation of a huge number of rules.
Following Xu and Stroller’s proposed method, Medvet Cotrini et al. propose an algorithm called Rhapsody for
et al. [16] propose a multi-objective evolutionary algorithm mining ABAC rules from sparse logs [18]. Their proposed
for extracting ABAC policies. The proposed approach is a approach is built upon subgroup discovery algorithms.
separate and conquer algorithm, in each iteration of which, a They define a novel metric, reliability which measures how
new rule is learned and the set of access log tuples becomes overly permissive an extracted rule is. In addition, they
smaller. Their algorithm employs several search-optimizing propose a universal cross-validation metric for evaluating
features to improve the quality of the mined rules. Although the mined policy when the input log is sparse. However,
13

their algorithm is not capable of mining policies from logs We have evaluated our policy extraction algorithm on ac-
with many attributes as the number of extracted rules grows cess logs generated for various sample policies and demon-
exponentially in the number of attributes of the system. strated its feasibility. Furthermore, we have shown that our
approach outperforms previous works in terms of policy
quality.
7 D ISCUSSION AND L IMITATIONS As future work, we plan to extend our method to
As mentioned in section 5.3, our proposed approach is support numerical data and extract negative authorization
able to achieves a practical level of performance when rules as well while studying the effects of various conflict
applied to both synthesized and real datasets. In the case resolution strategies on the quality of the mined policy.
of synthesized datasets, the proposed approach is capable
of mining policies containing both positive and negative
attribute filters from complete datasets. On the other hand, R EFERENCES
our proposed approach shows potential for use in sparse [1] R. S. Sandhu and P. Samarati, “Access control: principle and
datasets. In addition, the real datasets contain a large num- practice,” IEEE communications magazine, vol. 32, no. 9, pp. 40–48,
ber of attributes and attribute values as shown in Table 1994.
3. The ability of our proposed approach in mining high- [2] M. A. Harrison, W. L. Ruzzo, and J. D. Ullman, “Protection in
operating systems,” Communications of the ACM, vol. 19, no. 8,
quality policies for these datasets shows that the size of pp. 461–471, 1976.
attributes and attribute values have minimal impact on the [3] D. E. Bell and L. J. LaPadula, “Secure computer systems: Math-
effectiveness of our approach. ematical foundations,” tech. rep., MITRE CORP BEDFORD MA,
1973.
The proposed approach is based on an unsupervised [4] R. S. Sandhu, “Lattice-based access control models,” Computer,
clustering algorithm. Since finding the proper number of vol. 26, no. 11, pp. 9–19, 1993.
clusters is a challenge related to clustering algorithms, our [5] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman, “Role-
approach is affected by this issue as well. The same issue based access control models,” Computer, vol. 29, no. 2, pp. 38–47,
1996.
will also be valid in finding the best thresholds to extract [6] P. W. Fong and I. Siahaan, “Relationship-based access control
effective attributes and relations. policies and their policy languages,” in Proceedings of the 16th
We note that, as the proposed algorithm is based on ACM symposium on Access control models and technologies, pp. 51–60,
ACM, 2011.
tuning multiple parameters, it is possible that it gets stuck
[7] J. Jin, G.-J. Ahn, H. Hu, M. J. Covington, and X. Zhang, “Patient-
in minimum optima. For this reason, we do not claim that centric authorization framework for sharing electronic health
it will extract the policy with the highest quality in every records,” in Proceedings of the 14th ACM symposium on Access control
scenario, nor we claim that extracting rules with negative models and technologies, pp. 125–134, ACM, 2009.
[8] L. Karimi and J. Joshi, “Multi-owner multi-stakeholder access
attribute filters and relations would always result in policy control model for a healthcare environment,” in Collaboration and
with higher quality (as we can see in Section 5.3); however, Internet Computing (CIC), 2017 IEEE 3rd International Conference on,
by trying more randomization in cluster initialization and a pp. 359–368, IEEE, 2017.
wider range of parameters, we can get one that is closer to [9] V. C. Hu, D. Ferraiolo, R. Kuhn, A. R. Friedman, A. J. Lang,
M. M. Cogdell, A. Schnitzer, K. Sandlin, R. Miller, K. Scarfone,
global optima. et al., “Guide to attribute based access control (abac) definition and
In our evaluation, we used random selection to create considerations (draft),” NIST special publication, vol. 800, no. 162,
noisy and sparse datasets from complete datasets. Although 2013.
[10] M. Kuhlmann, D. Shohat, and G. Schimpf, “Role mining-revealing
we ensured the same percentage of randomly selected business roles for security administration using data mining tech-
tuples from permitted and denied logs, guaranteeing the nology,” in Proceedings of the eighth ACM symposium on Access
quality of the sampling is difficult. control models and technologies, pp. 179–186, ACM, 2003.
[11] J. Schlegelmilch and U. Steffens, “Role mining with orca,” in
Proceedings of the tenth ACM symposium on Access control models
and technologies, pp. 168–176, ACM, 2005.
8 C ONCLUSION [12] I. Molloy, H. Chen, T. Li, Q. Wang, N. Li, E. Bertino, S. Calo, and
J. Lobo, “Mining roles with semantic meanings,” in Proceedings of
In this paper, we have proposed an unsupervised learning the 13th ACM symposium on Access control models and technologies,
based approach to automating an ABAC policy extraction pp. 21–30, ACM, 2008.
process. The proposed approach is capable of discovering [13] Z. Xu and S. D. Stoller, “Algorithms for mining meaningful roles,”
in Proceedings of the 17th ACM symposium on Access Control Models
both positive and negative attribute expressions as well as
and Technologies, pp. 57–66, ACM, 2012.
positive and negative relation conditions while previous [14] Z. Xu and S. D. Stoller, “Mining attribute-based access control poli-
approaches in access control policy extraction had only cies from logs,” in IFIP Annual Conference on Data and Applications
focused on positive expressions. Furthermore, our work is Security and Privacy, pp. 276–291, Springer, 2014.
[15] Z. Xu and S. D. Stoller, “Mining attribute-based access control
capable of improving the extracted policy through iterations policies,” IEEE Transactions on Dependable and Secure Computing,
of proposed rule pruning and policy refinement algorithms. vol. 12, no. 5, pp. 533–545, 2015.
Such refinement algorithms are based on the false positive [16] E. Medvet, A. Bartoli, B. Carminati, and E. Ferrari, “Evolutionary
and false negative records and they help in increasing the inference of attribute-based access control policies.,” in EMO (1),
pp. 351–365, 2015.
quality of the mined policy. [17] P. Iyer and A. Masoumzadeh, “Mining positive and negative
Most importantly, we have proposed the policy quality attribute-based access control policy rules,” in Proceedings of the
metric which considers both the conciseness and correctness 23nd ACM on Symposium on Access Control Models and Technologies,
pp. 161–172, ACM, 2018.
of the mined policy and is important for comparing the
[18] C. Cotrini, T. Weghorn, and D. Basin, “Mining abac rules from
extracted policy with the original one and for improving sparse logs,” in 2018 IEEE European Symposium on Security and
it as needed. Privacy (EuroS&P), pp. 31–46, IEEE, 2018.
14

[19] P. Marinescu, C. Parry, M. Pomarole, Y. Tian, P. Tague, and I. Papa- Leila Karimi received an undergraduate degree
giannis, “Ivd: Automatic learning and enforcement of authoriza- and the MS degree in information technology en-
tion rules in online social networks,” in 2017 IEEE Symposium on gineering from the Sharif University of Technol-
Security and Privacy (SP), pp. 1094–1109, IEEE, 2017. ogy, Tehran, Iran. She is a Ph.D. candidate at the
[20] D. B. Suits, “Use of dummy variables in regression equations,” School of Computing and Information (SCI), at
Journal of the American Statistical Association, vol. 52, no. 280, the University of Pittsburgh. Her research inter-
pp. 548–551, 1957. ests lie at the intersection of information security,
[21] C. M. Bishop, Pattern recognition and machine learning. springer, data privacy, and machine learning. Currently,
2006. she is working on applying machine learning
[22] Wikipedia contributors, “Accuracy paradox-wikipedia, the free techniques to solve challenging problems in the
encyclopedia,” 2018. [Online; accessed 30-September-2019]. security domain.
[23] I. Molloy, H. Chen, T. Li, Q. Wang, N. Li, E. Bertino, S. Calo, and
J. Lobo, “Mining roles with multiple objectives,” ACM Transactions
on Information and System Security (TISSEC), vol. 13, no. 4, p. 36,
2010.
[24] M. Beckerle and L. A. Martucci, “Formal definitions for usable
access control rule sets from goals to metrics,” in Proceedings of the
Ninth Symposium on Usable Privacy and Security, p. 2, ACM, 2013.
[25] C. J. v. Rijsbergen, Information retrieval. 2.ed. Butterworths, 1979.
[26] F. Cao, J. Liang, and L. Bai, “A new initialization method for cat-
egorical data clustering,” Expert Systems with Applications, vol. 36,
Maryam Aldairi received an undergraduate degree management infor-
no. 7, pp. 10223–10228, 2009.
mation systems From King Faisal University, Alhasa, KSA., and the MS
[27] L. Karimi and J. Joshi, “An unsupervised learning based approach
degree in information science from the University of Pittsburgh. She is
for mining attribute basedaccess control policies,” in Big Data (Big
a Ph.D. student at the School of Computing and Information (SCI), at
Data), 2018 IEEE International Conference on, IEEE, 2018.
the University of Pittsburgh. Her research interests lie at the intersec-
[28] R. L. Thorndike, “Who belongs in the family?,” Psychometrika,
tion of information security, adversarial learning, and machine learning.
vol. 18, no. 4, pp. 267–276, 1953.
Currently, her focus is on applying machine learning techniques to solve
[29] C. Goutte, P. Toft, E. Rostrup, F. Å. Nielsen, and L. K. Hansen, “On challenging problems in the security domain.
clustering fmri time series,” NeuroImage, vol. 9, no. 3, pp. 298–310,
1999.
[30] P. Jaccard, “The distribution of the flora in the alpine zone. 1,” New
phytologist, vol. 11, no. 2, pp. 37–50, 1912.
[31] Amazon.com, “Amazon employee access challenge.” Kaggle.
[32] Montanez, Ken, “Amazon access samples.” UCI Machine Learn-
ing Repository: Amazon Access Samples Data Set.
[33] Devos, Nico and Hes, Robin, “Kmodes implementation.”
[34] J. Vaidya, V. Atluri, and Q. Guo, “The role mining problem: finding
a minimal descriptive set of roles,” in Proceedings of the 12th ACM
symposium on Access control models and technologies, pp. 175–184, James Joshi received the MS degree in com-
ACM, 2007. puter science and the Ph.D. degree in computer
[35] J. Vaidya, V. Atluri, and J. Warner, “Roleminer: mining roles using engineering from Purdue University. He is a pro-
subset enumeration,” in Proceedings of the 13th ACM conference on fessor of School of Computing and Information
Computer and communications security, pp. 144–153, ACM, 2006. (SCI), at the University of Pittsburgh. His re-
[36] D. Zhang, K. Ramamohanarao, and T. Ebringer, “Role engineering search interests include Access Control Mod-
using graph optimisation,” in Proceedings of the 12th ACM sympo- els, Security and Privacy of Distributed Systems,
sium on Access control models and technologies, pp. 139–144, ACM, Trust Management and Information Survivability.
2007. He is the director of LERSAIS at the University of
[37] Q. Guo, J. Vaidya, and V. Atluri, “The role hierarchy mining prob- Pittsburgh. He is an elected fellow of the Society
lem: Discovery of optimal role hierarchies,” in Computer Security of Information Reuse and Integration (SIRI) and
Applications Conference, 2008. ACSAC 2008. Annual, pp. 237–246, is a senior member of the IEEE and the ACM. He currently serves as
IEEE, 2008. a Program Director of the Secure and Trustworthy Cyberspace program
[38] H. Takabi and J. B. Joshi, “Stateminer: an efficient similarity-based at the National Science Foundation.
approach for optimal mining of role hierarchy,” in Proceedings of
the 15th ACM symposium on Access control models and technologies,
pp. 55–64, ACM, 2010.
[39] Q. Ni, J. Lobo, S. Calo, P. Rohatgi, and E. Bertino, “Automating
role-based provisioning by learning from examples,” in Proceed-
ings of the 14th ACM symposium on Access control models and
technologies, pp. 75–84, ACM, 2009.
[40] J. Vaidya, V. Atluri, and Q. Guo, “The role mining problem: A
formal perspective,” ACM Transactions on Information and System
Security (TISSEC), vol. 13, no. 3, p. 27, 2010. Mai Abdelhakim is an assistant professor in the
[41] Z. Xu and S. D. Stoller, “Mining attribute-based access control department of electrical and computer engineer-
policies from rbac policies,” in Emerging Technologies for a Smarter ing at the University of Pittsburgh’s Swanson
World (CEWIT), 2013 10th International Conference and Expo on, school of engineering. She received her Ph.D.
pp. 1–6, IEEE, 2013. degree in Electrical Engineering from Michigan
State University, and Bachelor’s and Master’s
degrees in Electronics and Communications En-
gineering from Cairo University. Her research
interests include cyber-physical systems, cyber-
security, machine learning, stochastic systems
modeling, and information theory.

You might also like