An Automatic Attribute Based Access Control Policy Extraction From Access Logs
An Automatic Attribute Based Access Control Policy Extraction From Access Logs
An Automatic Attribute Based Access Control Policy Extraction From Access Logs
Abstract—With the rapid advances in computing and information technologies, traditional access control models have become
inadequate in terms of capturing fine-grained, and expressive security requirements of newly emerging applications. An attribute-based
access control (ABAC) model provides a more flexible approach to addressing the authorization needs of complex and dynamic
systems. While organizations are interested in employing newer authorization models, migrating to such models pose as a significant
arXiv:2003.07270v4 [cs.CR] 30 Jan 2021
challenge. Many large-scale businesses need to grant authorizations to their user populations that are potentially distributed across
disparate and heterogeneous computing environments. Each of these computing environments may have its own access control
model. The manual development of a single policy framework for an entire organization is tedious, costly, and error-prone.
In this paper, we present a methodology for automatically learning ABAC policy rules from access logs of a system to simplify the
policy development process. The proposed approach employs an unsupervised learning-based algorithm for detecting patterns in
access logs and extracting ABAC authorization rules from these patterns. In addition, we present two policy improvement algorithms,
including rule pruning and policy refinement algorithms to generate a higher quality mined policy. Finally, we implement a prototype of
the proposed approach to demonstrate its feasibility.
Index Terms—Access Control, Attribute Based Access Control, Policy Mining, Policy Engineering, Machine Learning, Clustering.
policy rule can be comprised of a set of positive and well as the unsupervised learning algorithm. In Section 3,
negative filters. Negative filters are useful in scenar- we define the ABAC policy extraction problem, discuss the
ios when an exception needs to be expressed. For related challenges, and introduce the metrics for evaluating
example, a healthcare provider can express the fol- the extracted policy. In Section 4, we present the proposed
lowing rule using a negative attribute filter: “A nurse ABAC policy extraction approach. In Section 5, we present
can read a patient’s record except for payment purposes." the evaluation of the proposed approach on various sets of
Using negative filters in rule expressions results in a policies. We present the related work in Section 6 and the
more concise authorization policy (Section 5). conclusions and future work in Section 8.
• Second, some proposed approaches such as in [14],
[15], [17] are unable to mine a high-quality policy 2 P RELIMINARIES
when the given access log is not complete in the sense
that every possible combination of attribute values is In this section, we overview ABAC, the ABAC policy lan-
not included in the access log (Section 3). guage, and the unsupervised learning algorithm.
• Third, the proposed approaches are unable to mine
a policy from noisy access logs containing over- 2.1 ABAC Model
assignments and under-assignments [16], [18]. Hav- In 2013, NIST published a “Guide to ABAC Definition and
ing noisy access records is a common problem in Consideration" [9], according to which, “the ABAC engine can
evolving domains such as IoT or social networks [19]. make an access control decision based on the assigned attributes
It is essential that an ABAC policy miner should be of the requester, the assigned attributes of the object, environment
capable of handling a reasonable amount of noise to conditions, and a set of policies that are specified in terms of those
be applicable to real-world applications. attributes and conditions.” Throughout the paper, we use user
• Last but not the least, the existing approaches do not attributes, object attributes, and session attributes to refer to the
include techniques for improving the mined policy attributes of the requester, attributes of the object, and the
after the first round of policy extraction. In addition, environmental attributes/conditions, respectively.
in scenarios where the authorization policies may Accordingly, 𝑈, 𝑂, 𝑆, 𝑂𝑃 are sets of users, objects,
change over time (such as in social networks with sessions, and operations in a system and user attributes
addition and removal of various applications), these (𝐴𝑢 ), object attributes (𝐴𝑜 ), and session attributes (𝐴𝑠 ) are
approaches do not provide any guidelines for adjust- mappings of subject attributes, object attributes, and en-
ing the policy. This makes practical deployment of vironmental attributes as defined in the NIST Guide [9].
these approaches very difficult. 𝐸 = 𝑈 ∪ 𝑂 ∪ 𝑆 and 𝐴 = 𝐴𝑢 ∪ 𝐴𝑜 ∪ 𝐴𝑠 are the sets of all
entities and all attributes in the system, respectively.
Furthermore, none of the existing work addresses these
issues in an integrated way. In this paper, we propose Definition 1. (Attribute Range). Given an attribute 𝑎 ∈ 𝐴,
a machine learning based ABAC policy mining approach the attribute range 𝑉𝑎 is the set of all valid values for 𝑎 in
to address these challenges. To summarize, the primary the system.
contributions of this paper are as follows: Definition 2. (Attribute Function). Given an entity 𝑒 ∈ 𝐸,
an attribute function 𝑓 𝑎_𝑒 is a function that maps an entity
1) We propose an unsupervised learning based ap-
to a specific value from the attribute range. Specifically,
proach to extract ABAC policy rules that contain
𝑓 𝑎_𝑒 (𝑒, 𝑎) returns the value of attribute 𝑎 for entity 𝑒.
both positive and negative attribute filters as well
as positive and negative relation conditions. Example 1. 𝑓 𝑎_𝑒 (𝐽𝑜ℎ𝑛, 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛) = faculty indicates that the
2) The proposed policy mining approach is effective value of attribute position for user John is faculty.
even with an incomplete set of access logs and in Example 2. 𝑓 𝑎_𝑒 (𝑑𝑒 𝑝1, 𝑐𝑟 𝑠) = {𝑐𝑠101, 𝑐𝑠601, 𝑐𝑠602} indicates
presence of noise. that the value of attribute crs for object dep1 is a set
3) As part of the unsupervised learning based ap- {𝑐𝑠101, 𝑐𝑠601, 𝑐𝑠602}.
proach, we propose the rule pruning and policy
refinement algorithms to enhance the quality of the Each attribute in the system can be a single-valued
mined policy and to ease its maintenance. (atomic) or multi-valued (set). In Example 1 position is a
4) We propose a policy quality metric based on policy single-valued attribute while crs is a multi-valued attribute
correctness and conciseness to be able to compare in Example 2. For simplicity, we consider only atomic at-
different sets of mined policy rules and to select the tributes in this work. Actually, the process of extracting
best one based on some given criteria. ABAC policy with multi-valued attributes is exactly the
5) We implement a prototype of the proposed model same as that with atomic attributes, however, we need to
and evaluate it using various ABAC policies to pre-process data to convert each multi-valued attribute to
show its efficiency and effectiveness. a set of atomic attributes. This can be done using various
techniques such as defining dummy variables [20], 1-of-𝐾
To the best of our knowledge, our proposed approach is scheme [21], etc. At the end of the process and when policy
the first unsupervised learning based ABAC policy mining rules are extracted, we need one more step to convert back
method that can be used to extract ABAC policies with both atomic attribute filters to the corresponding multi-valued
positive and negative attribute and relationship filters. attribute filters.
The rest of the paper is organized as follows. In Section Attribute filters are used to denote the sets of users,
2, we overview the ABAC model and its policy language as objects, and sessions to which an authorization rule applies.
3
Definition 3. (Attribute Filter). An attribute filter is defined Example 5. Consider rule 𝜌1 = h{h𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑠𝑡𝑢𝑑𝑒𝑛𝑡i,
as a set of tuples F = {h𝑎, 𝑣|!𝑣i| 𝑎 ∈ 𝐴 and 𝑣 ∈ 𝑉𝑎 }. Here h𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛, 𝑐𝑎𝑚 𝑝𝑢𝑠i, h𝑡𝑦 𝑝𝑒, 𝑎𝑟𝑡𝑖𝑐𝑙𝑒i}, {h𝑑𝑒 𝑝𝑡𝑢 , 𝑑𝑒 𝑝𝑡 𝑜 i},
h𝑎, 𝑣i is a positive attribute filter tuple that indicates 𝑎 𝑟𝑒𝑎𝑑i. It can be interpreted as “A student can read an
has value 𝑣, and h𝑎, !𝑣i is a negative attribute filter tuple article if he/she is on campus and his/her department matches
that indicates 𝑎 has any value in its range except 𝑣. the department of the article".
Example 3. Tuple h𝑙𝑎𝑏𝑒𝑙, !𝑡𝑜 𝑝-𝑠𝑒𝑐𝑟𝑒𝑡i points to all entities in Definition 10. (Rule Satisfaction) An access request 𝑞 =
the system that do not have “top-secret" as their security h𝑢, 𝑜, 𝑠, 𝑜 𝑝i is said to satisfy a rule 𝜌, denoted as 𝑞 |= 𝜌,
label “label". iff
h𝑢, 𝑜, 𝑠i |= F ∧ h𝑢, 𝑜, 𝑠i |= R ∧ 𝑜 𝑝 𝑞 = 𝑜 𝑝 𝜌 .
Definition 4. (Attribute Filter Satisfaction). An entity 𝑒 ∈ 𝐸
satisfies an attribute filter F , denoted as 𝑒 |= F , iff Definition 11. (ABAC Policy). An ABAC policy is a tuple
∀h𝑎 𝑖 , 𝑣 𝑖 i ∈ F : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) = 𝑣 𝑖 ∧ 𝜋 = h𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , Pi where 𝐸, 𝑂𝑃, 𝐴, and P are sets
of entities, operations, attributes, and ABAC rules in the
∀h𝑎 𝑖 , !𝑣 𝑖 i ∈ F : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) ≠ 𝑣 𝑖 .
system and 𝑓 𝑎_𝑒 is the attribute function.
Example 4. Suppose 𝐴𝑢 = {𝑑𝑒 𝑝𝑡, 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑐𝑜𝑢𝑟 𝑠𝑒𝑠}. The set Definition 12. (ABAC Policy Decision). The decision of an
of tuples FU = {h𝑑𝑒 𝑝𝑡, 𝐶𝑆i, h𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛, 𝑔𝑟𝑎𝑑i} denotes a ABAC policy 𝜋 for an access request 𝑞 denoted as 𝑑 𝜋 (𝑞)
user attribute filter. Here, the graduate students in the is permit iff:
CS department satisfy FU . ∃𝜌 ∈ 𝜋 : 𝑞 |= 𝜌
Definition 5. (Relation Condition). A relation condition is otherwise, the decision is deny.
defined as a set of tuples R = {h𝑎, 𝑏|!𝑏i| 𝑎, 𝑏 ∈ 𝐴 ∧ 𝑎 ≠
𝑏}. Here h𝑎, 𝑏i is a positive relation condition tuple that If an access request satisfies a rule of the access control
indicates 𝑎 and 𝑏 have the same values, and h𝑎, !𝑏i is a policy, then the decision of the system for such access
negative relation condition tuple that indicates 𝑎 and 𝑏 request is permit. If the access request does not satisfy any
do not have the same values. rule in the access control policy then the decision of the
system for such access request is deny.
A relation is used in a rule to denote the equality con- TABLE 1 summarizes the notations used in this paper.
dition between two attributes of users, objects, or sessions.
Note that the two attributes in the relation condition must
2.2 Unsupervised Learning Algorithm
have the same range.
Definition 6. (Relation Condition Satisfaction). An entity Unsupervised learning algorithms try to infer a function
𝑒 ∈ 𝐸 satisfies a relation condition R, denoted as 𝑒 |= R, that describes the structure of unlabeled data. They are
iff useful when no or very few labeled data is available. We
∀h𝑎 𝑖 , 𝑏 𝑖 i ∈ R : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) = 𝑓 𝑎_𝑒 (𝑒, 𝑏 𝑖 ) leverage such methods for extracting ABAC policies from
∀h𝑎 𝑖 , !𝑏 𝑖 i ∈ R : 𝑓 𝑎_𝑒 (𝑒, 𝑎 𝑖 ) ≠ 𝑓 𝑎_𝑒 (𝑒, 𝑏 𝑖 ). access logs.
In particular, given a set of authorization tuples, we
Definition 7. (Access Request). An access request is a tuple employ an unsupervised learning approach to mine and
𝑞 = h𝑢, 𝑜, 𝑠, 𝑜 𝑝i where user 𝑢 ∈ 𝑈 sends a request to the extract an ABAC policy that has high quality. An unsu-
system to perform operation 𝑜 𝑝 ∈ 𝑂𝑃 on object 𝑜 ∈ 𝑂 in pervised learning approach is suitable because there is no
session 𝑠 ∈ 𝑆. labeled data available for desired ABAC rules. ABAC policy
Definition 8. (Authorization Tuple/Access Log). An autho- extraction, in this case, can be considered as a mapping
rization tuple is a tuple 𝑡 = h𝑞, 𝑑i containing decision between authorization tuples to a set of clusters that are
𝑑 made by the access control system for request 𝑞. An representative of the desired ABAC rules. Such a mapping
Access Log L is a set of such tuples. can be expressed as a function, ℎ : X → Y, where:
The decision 𝑑 of an authorization tuple can be permit 1) X is a set of authorization tuples (i.e., access log).
or deny. The tuple with permit decision means that user 𝑢 2) Y is a set of numbered labels (i.e., cluster labels,
can perform an operation 𝑜 𝑝 on an object 𝑜 in session 𝑠. each cluster corresponding to a rule of the ABAC
The authorization tuple with deny decision means that user policy 𝜋).
𝑢 cannot perform operation 𝑜 𝑝 on object 𝑜 in session 𝑠. The goal is then to learn the function ℎ with low cluster-
Access log is a union of Positive Access Log, L + , and ing error and mine the desired policy that is high quality.
Negative Access Log, L − , where:
TABLE 1: Notations
Notation Definition
𝑈 ,𝑂, 𝑆, 𝑂𝑃 Sets of users, objects, sessions, and operations
𝐴𝑢 , 𝐴𝑜 , and 𝐴𝑠 Sets of user attributes, object attributes, and session attributes
𝐸 =𝑈 ∪𝑂∪𝑆 Set of all entities
𝐴 = 𝐴𝑢 ∪ 𝐴𝑜 ∪ 𝐴𝑠 Set of all attributes
𝑉𝑎 Attribute Range: set of all valid values for 𝑎 ∈ 𝐴
𝑓𝑎_𝑒 (𝑒, 𝑎) Attribute Function: a function that maps an entity 𝑒 ∈ 𝐸 to a value from 𝑉𝑎
F = { h𝑎, 𝑣 |!𝑣 i | 𝑎 ∈ 𝐴 ∧ 𝑣 ∈ 𝑉𝑎 } Attribute Filter
R = { h𝑎, 𝑏i | 𝑎, 𝑏 ∈ 𝐴 ∧ 𝑎 ≠ 𝑏 ∧ 𝑉𝑎 = 𝑉𝑏 } Relation Condition
𝑞 = h𝑢, 𝑜, 𝑠, 𝑜 𝑝i Access Request
𝑡 = h𝑞, 𝑑 i Authorization Tuple, showing decision 𝑑 made by the system for request 𝑞
L Access Log, set of authorization tuples
L + = { h𝑞, 𝑑 i | h𝑞, 𝑑 i ∈ L ∧ 𝑑 = 𝑝𝑒𝑟 𝑚𝑖𝑡 } Positive Access Log
L − = { h𝑞, 𝑑 i | h𝑞, 𝑑 i ∈ L ∧ 𝑑 = 𝑑𝑒𝑛𝑦 } Negative Access Log
𝜌 = hF, R, 𝑜 𝑝 |!𝑜 𝑝i ABAC Rule
P Set of all policy rules
𝜋 = h𝐸 , 𝑂𝑃, 𝐴, 𝑓𝑎_𝑒 , P i ABAC Policy
𝑑 𝜋 (𝑞) The decision of an ABAC policy 𝜋 for an access request 𝑞
𝑇 𝑃 𝜋 |L , 𝐹 𝑃 𝜋 |L , 𝑇 𝑁 𝜋 |L , and 𝐹 𝑁 𝜋 |L Relative True Positive, False Positive, True Negative, and False Negative Rates
𝐴𝐶𝐶 𝜋 |L Relative Accuracy Rate
𝐹 -𝑠𝑐𝑜𝑟 𝑒 𝜋 |L Relative F-score
𝑊 𝑆𝐶 ( 𝜋) Weighted Structural Complexity of policy 𝜋
Q𝜋 Policy Quality Metric
framework by completely (or partially) automating the de- interpret they would be. In addition, succinct rules
velopment of ABAC policy rules. are desirable as they are easier to audit and manage.
The primary input to a policy mining algorithm is the log 3) Negative Attribute Filters: The ABAC policy mining
of authorization decisions in the system. The log indicates solution should support both positive and negative
authorization decision (i.e., permit or deny) for any given attribute filters which will result in more concise
access request by a user of the system. For ABAC policy and manageable mined policy.
mining, such a log is accompanied by attributes of entities 4) Relation Conditions: The solution should support the
involved in the log entries. The goal of a policy mining extraction of relation conditions for policy mining
algorithm is to extract ABAC policy rules from access logs in order to generate more concise and manageable
that have high quality with respect to some quality metrics mined policy.
(e.g., policy size and correctness). 5) Sparse Logs: In real-world, the access log that is
We define the ABAC policy extraction problem formally input to the policy mining algorithm may be sparse,
as follows: representing only a small fraction of all possible
Definition 13. (ABAC Policy Extraction Problem). Let access requests. The policy mining algorithm must
𝐼 =< 𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , L >, where the components are as be able to extract useful rules even from a sparse
defined earlier, then the ABAC policy extraction problem log.
is to find a set of rules R such that the ABAC policy 6) Mining Negative Authorization Rules: An ABAC pol-
𝜋 =< 𝐸, 𝑂𝑃, 𝐴, 𝑓 𝑎_𝑒 , R > has high quality with respect to icy can contain both positive and negative rules
L. which permit or deny access requests, respectively.
The use of negative rules is helpful in situations
where specifying exceptions to more general rules
3.2 Challenges and Requirements is important. Including negative policy rules would
For an ABAC policy extraction approach to be applicable help in generating a more concise ABAC policy.
to a wide range of real-world scenarios, we identify the Thus, the policy mining algorithm should be able
following challenges and requirements: to extract both positive and negative authorization
rules.
1) Correctness of Mined Policy: The mined policy must 7) Noisy Authorization Log: In the real world and with
be consistent with original authorization log in that complex and dynamic information systems, it is
the access decision of the mined policy must result possible to have a noisy authorization log consisting
in the same access decision of the log entry. An of over-assignments and under-assignments. These
inconsistent extracted policy may result in situations issues occur either due to a wrong configuration
in which an originally authorized access is denied of the original authorization system or improper
(more restrictive) or originally unauthorized access is policy updates by administrators. The policy mining
permitted (less restrictive) by the system. algorithm should be capable of extracting meaning-
2) Complexity of Mined Policy: The policy mining algo- ful rules even in presence of an acceptable amount
rithm should endeavor to extracting a policy that is of noise in the input access log.
as concise as possible. Since the policy rules need to 8) Dynamic and Evolving Policies: Modern information
be manipulated by human administrators, the more systems are often dynamic. The authorization needs
concise they are, the more manageable and easier to
5
of these systems and the attributes of the entities in The relative accuracy metric, 𝐴𝐶𝐶 𝜋 | L , measures the ac-
the environment evolve rapidly. These changes will curacy of mined policy 𝜋 with regards to the decisions made
result in over-assignments or under-assignments. by the original policy indicated by L and is defined formally
The proposed method should employ a mechanism as follows:
to support the dynamicity of the information sys- Definition 16. (Relative Accuracy). Given the relative true
tems and their authorization policies and ease the positive and negative rates, the relative accuracy of 𝜋
maintenance of evolving systems. regarding L denoted as 𝐴𝐶𝐶 𝜋 | L is calculated as follows:
Our proposed approach addresses all the requirements 𝑇 𝑃𝜋 |L + 𝑇 𝑁𝜋 |L
except the sixth one. Table 2 shows the challenges that are 𝐴𝐶𝐶 𝜋 | L =
𝑇 𝑃𝜋 |L + 𝑇 𝑁 𝜋 | L + 𝐹𝑃 𝜋 | L + 𝐹 𝑁 𝜋 | L
addressed by our proposed approach and how it improves
upon the state-of-the-art policy mining techniques. In Sec- As accuracy may be misleading in unbalanced data sets
tion 6, we discuss the existing solutions in details. [22] (which is very probable in case of access logs), we use
relative F-score to better evaluate the mined policy:
3.3 Evaluation Metrics
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L · 𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L
One of the main metrics for evaluating the quality of an 𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L = 2 ·
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L + 𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L
extracted policy is how accurately it matches the original
policy. That means the authorization decisions made by the Policies with higher relative F-score are better as they are
extracted policy for a set of access requests should be similar more consistent with the original access log.
to the decisions made by the original policy for that set of On the other hand, as the number of filters in each rule
requests. As an example, if the decision of the original policy and the number of rules in an access control policy increases,
for an access request 𝑞 is permit, then the decision of the policy intelligibility would decrease and maintenance of the
mined policy for the same access request must be permit policy would become harder. Hence, complexity is another
as well. If the mined policy denies the same access request, key metric for evaluating the quality of a policy.
then we record this authorization tuple as a False Negative. Weighted Structural Complexity (WSC) is a general-
We define Relative True Positive, Relative False Positive, Relative ization of policy size and was first introduced for RBAC
True Negative, and Relative False Negative rates, respectively, policies [23] and later extended for ABAC policies [15]. WSC
as follows: is consistent with usability studies of access control rules,
Definition 14. (Relative True Positive Rate). Given an access which indicates that the more concise the policies are the
log L and an ABAC policy 𝜋, the relative true positive more manageable they become [24]. Informally, for a given
rate of 𝜋 regarding L denoted as 𝑇 𝑃 𝜋 | L is the portion of ABAC policy, its WSC is a weighted sum of its elements.
positive access logs for which the decision of 𝜋 is permit: Formally, for an ABAC policy 𝜋 with rules P, its WSC is
defined as follows:
|{h𝑞, 𝑑i ∈ L + |𝑑 𝜋 (𝑞) = 𝑝𝑒𝑟𝑚𝑖𝑡}|
𝑇 𝑃𝜋 |L = 𝑊 𝑆𝐶 (𝜋) = 𝑊 𝑆𝐶 (P)
|L + |
Here, |𝑠| is the cardinality of set 𝑠. ∑︁
𝑊 𝑆𝐶 (P) = 𝑊 𝑆𝐶 (𝜌)
Definition 15. (Relative False Positive Rate). The relative 𝜌∈ P
false positive rate of 𝜋 regarding L denoted as 𝐹𝑃 𝜋 | L is
the portion of negative access logs for which the decision
of 𝜋 is permit: 𝑊 𝑆𝐶 (𝜌 = hFU , FO , FS , R, 𝑜 𝑝, 𝑑i) = 𝑤 1𝑊 𝑆𝐶 (FU )+
𝑤 2𝑊 𝑆𝐶 (FO ) + 𝑤 3𝑊 𝑆𝐶 (FS ) + 𝑤 4𝑊 𝑆𝐶 (R)
|{h𝑞, 𝑑i ∈ L − |𝑑 𝜋 (𝑞) = 𝑝𝑒𝑟𝑚𝑖𝑡}|
𝐹𝑃 𝜋 | L = ∑︁
|L − | ∀𝑠 ∈ {FU , FO , FS , R} : 𝑊 𝑆𝐶 (𝑠) = |𝑠|
Similarly, we calculate the relative true negative rate and where |𝑠| is the cardinality of set 𝑠 and each 𝑤 𝑖 is a user-
false negative rate of 𝜋 regarding L, denoted as 𝑇 𝑁 𝜋 | L and specified weight.
𝐹 𝑁 𝜋 | L , respectively, as follows: Van Rijsbergen proposes an effectiveness measure for
|{h𝑞, 𝑑i ∈ L − |𝑑 𝜋 (𝑞) = 𝑑𝑒𝑛𝑦}| combining two different metrics 𝑃 and 𝑅 in [25] as follows :
𝑇 𝑁𝜋 |L =
|L − | 1
𝐸 =1−
𝛼 1−𝛼
|{h𝑞, 𝑑i ∈ L + |𝑑 𝜋 (𝑞) = 𝑑𝑒𝑛𝑦}| +
𝐹𝑁𝜋 |L = 𝑃 𝑅
|L + |
Given relative F-score and WSC measures for various
The relative precision and relative recall are calculated as mined policies resulting from running different mining al-
follows: gorithms over access log, it may not be straightforward to
𝑇 𝑃𝜋 |L select the best algorithm and, hence, the mined policy with
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝜋 | L = the highest quality. So, to be able to compare the quality of
𝑇 𝑃 𝜋 | L + 𝐹𝑃 𝜋 | L
different mined ABAC policies, we combine the two metrics
𝑇 𝑃𝜋 |L based on Van Rijsbergen’s effectiveness measure [25] and
𝑅𝑒𝑐𝑎𝑙𝑙 𝜋 | L = define the Policy Quality Metric as follows:
𝑇 𝑃𝜋 |L + 𝐹𝑁𝜋 |L
6
Xu et al. [15] Medvet et al. [16] Iyer et. al [17] Cotrini et al. [18] Our Proposed Approach
Policy Correctness X X X X X
Policy Complexity X X X X X
Negative Attribute Filters 7 7 7 7 X
Relation Conditions X X X 7 X
Sparse Logs 7 X 7 X X
Negative Authorization Rules 7 7 X 7 7
Noisy Authorization Log X 7 7 7 X
System Dynamicity 7 7 7 7 X
𝛼 1 − 𝛼 −1 1. Data Pre-processing
Q𝜋 = ( + )
𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L Δ𝑊 𝑆𝐶 𝜋 Handling missing
values, Converting to
2. Parameter Tuning
1 categorical values
Here 𝛼 = where 𝛽 determines the importance of Finding best number of clusters,
1 + 𝛽2 best cluster initialization, and
relative F-score over policy complexity and Δ𝑊 𝑆𝐶 𝜋 shows appropriate thresholds
the relative reduction in the complexity with regards to the
6. Policy Refinement
complexity of the most complex mined policy. Δ𝑊 𝑆𝐶 𝜋 is 3. Clustering
calculated as follows: Refining policy rules
based on FP and FN Clustering data using
records k-mean/k-mode
𝑊 𝑆𝐶𝑚𝑎𝑥 − 𝑊 𝑆𝐶 (𝜋) + 1 algorithm
Δ𝑊 𝑆𝐶 𝜋 =
𝑊 𝑆𝐶𝑚𝑎𝑥
𝑊 𝑆𝐶𝑚𝑎𝑥 is the weighted structural complexity of the most 4. Rule Extraction
5. Rule Pruning
complex mined policy. Finding effective
Removing duplicate
Definition 17. (Most Complex Mined Policy). The most attributes and
rules, Finding similar
relations, Building
rules and eliminating
complex mined policy is the mined policy with the them
Rules
initialization method, this method provides more robustness effective positive (negative) attribute pair of 𝜌𝑖 correspond-
and better accuracy in the clustering process [26]. ing to cluster 𝐶𝑖 , where the frequency of occurrence of 𝑣 𝑗
in the set of all the records of cluster 𝐶𝑖 is much higher
4.3 Parameter Tuning (lower) than its frequency of occurrence in the original
data; this is determined based on a threshold T𝑃 (T𝑁 ).
In the next step, we tune the learning parameters. There are The attribute expression h𝑎 𝑗 , 𝑣 𝑗 i (h𝑎 𝑗 , !𝑣 𝑗 i) is added to
several challenges that need to be addressed in this step, the attribute filters of the extracted rule 𝜌𝑖 for 𝐶𝑖 .
which include the following:
In the final step, we extract the relation conditions for
4.3.1 Number of Clusters (k) AC rules for each cluster. This will be done based on the
One of the main challenges in an unsupervised learning frequency of equality between pairs of attributes in the
is determining the number of clusters, 𝑘. In our sample records of each cluster. We define effective positive relation
policies, as we know the number of rules in each policy, and effective negative relation as follows:
we can set the number of clusters beforehand but in a
real situation as we do not know the size of the rules in Definition 19. (Effective Positive (Negative) Relation). Let
advance, making the correct choice of 𝑘 is difficult. One of 𝑅 = {h𝑎, 𝑏i} be the set of all possible relations between
the popular methods for determining the number of clusters pairs of attributes in the system; we define h𝑎 𝑗 , 𝑏 𝑗 i as
in an unsupervised learning model is the Elbow Method [28], an effective positive (negative) relation pairs of 𝜌𝑖 cor-
[29]. This method is based on total within group sum of responding to cluster 𝐶𝑖 , where the frequency of 𝑎 𝑗
squares. 𝑘 will be chosen as the number of clusters if adding equals 𝑏 𝑗 in all the records of cluster 𝐶𝑖 is much higher
another cluster doesn’t give much better modeling of the (lower) than their frequency in the original data; this is
data (i.e., the elbow point of the graph). determined based on a threshold 𝜃 𝑃 (𝜃 𝑁 ). The relation
As a second approach, we choose a number of clusters h𝑎 𝑗 , 𝑏 𝑗 i (h𝑎 𝑗 , !𝑏 𝑗 i) is added to the relation conditions of
(𝑘) which gives the best modeling of the data in terms of the the extracted rule 𝜌𝑖 for this cluster.
policy quality metric. For this purpose, we run our clustering
algorithm for different values of 𝑘 and calculate the accuracy We note that the values of the thresholds T𝑃 , T𝑁 , 𝜃 𝑃 ,
of the corresponding model using 10-fold cross-validation. and 𝜃 𝑁 will be different for each data set. To find the
The value of 𝑘 that maximizes the accuracy of the model is best threshold values for each data set, we run the rule
selected as the final number of clusters. extraction algorithm for different values of thresholds, and
Note that increasing 𝑘 will ultimately reduce the amount the values which result in the maximum accuracy over the
of clustering error or it will increase the accuracy of the cross-validation data set will be selected.
model, but by increasing the number of clusters, the num- Algorithms 1 and 2 show effective attribute and effective
ber of extracted rules will also increase resulting in more relation extraction procedures, respectively.
complexity (i.e., higher WSC). So it is important to find an
optimal 𝑘 that balances between policy accuracy and WSC. Algorithm 1 Effective attribute extraction algorithm
1: procedure EXTRACTATTRIBUTE F ILTERS
4.3.2 Cluster Initialization & Local Optima Input: 𝐶𝑖 , 𝐴, 𝑉, L, T𝑃 , T𝑁
Different cluster initializations can lead to a different set Output: F
of clusters as k-means/k-modes may converge to a local 2: F ←∅
optima. To overcome this issue, for a given number of 3: for all 𝑎 ∈ 𝐴 do
clusters, 𝑘, we train multiple models with different cluster 4: for all 𝑣 𝑗 ∈ 𝑉𝑎 do
initializations and then select the partition with the smallest 5: if 𝐹𝑟𝑒𝑞(𝑣 𝑗 , 𝐶𝑖 ) − 𝐹𝑟𝑒𝑞(𝑣 𝑗 , L) > T𝑃 then
clustering error. 6: F 𝑖 ← F ∪ h𝑎, 𝑣 𝑗 i
7: end if
4.4 Policy Rules Extraction 8: if 𝐹𝑟𝑒𝑞(𝑣 𝑗 , L) − 𝐹𝑟𝑒𝑞(𝑣 𝑗 , 𝐶𝑖 ) > T𝑁 then
9: F 𝑖 ← F ∪ h𝑎, !𝑣 𝑗 i
The main phase in our proposed approach is the extraction 10: end if
of ABAC policy rules. In the first step, we need to collect all 11: end for
the authorization tuples related to each rule of the policy. We 12: end for
use data clustering for this purpose. We divide the access log return 𝜌𝑖
into clusters where the records in each cluster correspond to 13: end procedure
one AC rule in the system. This is done based on finding
similar patterns between features (i.e., attribute values) of
the records (i.e., access control tuples). In the second step,
we extract the attribute filters of such a rule. We adapt the
rule extraction algorithm in [27] and extend it to extract both 4.5 Policy Enhancement
positive and negative attribute filters. We define effective After the first phase of policy rule extraction, we get a policy
positive attribute and effective negative attribute as follows: which may not be as accurate and concise as we desire. We
Definition 18. (Effective Positive (Negative) Attribute). Let enhance the quality of the mined policy through iterations
𝑆 = {h𝑎, 𝑣i} be the set of all possible attribute-value pairs of policy improvement steps that include: rule pruning and
in a system; we define h𝑎 𝑗 , 𝑣 𝑗 i ∈ 𝑆 (h𝑎 𝑗 , !𝑣 𝑗 i ∈ 𝑆) as an policy refinement.
8
# 𝜋 |P | | 𝐴| |𝑉 | |L| | L+ | | L− |
𝜋1 UniversityP 10 11 45 2,700K 231K 2,468K
𝜋2 HealthcareP 9 13 40 982K 229K 753K
𝜋3 ProjectManagementP 11 14 44 5,900K 505K 5,373K
𝜋4 UniversityPN 10 11 45 2,700K 735K 1,964K
𝜋5 HealthcarePN 9 13 40 982K 269K 713K
𝜋6 ProjectManagementPN 11 14 44 5,900K 960K 4,918K
𝜋7 Random Policy 1 10 8 27 17K 2,742 14K
𝜋8 Random Policy 2 10 10 48 5,250K 245K 5,004K
𝜋9 Random Policy 3 10 12 38 560K 100K 459K
𝜋10 Amazon Kaggle - 10 15K 32K 30K 1897
𝜋11 Amazon UCI - 14 7,153 70K 36K 34K
reversing the decision of 10% of authorization tuples in the Our second set of experiments is on partial datasets. The
complete access logs. algorithm proposed by Xu and Stoller [14] and the approach
For each of the manual policies, we consider two differ- presented by Cotrini et al. [18] are not able to handle
ent sets of policy rules; the first one only contains positive complete datasets as these datasets are huge. To be able to
attribute filters and relations while the second one includes compare the performance of our proposed algorithm with
both positive and negative attribute filters and relations. We their work, we generated 0.1% sparse (partial) datasets and
have included these policies in Appendix A. run all algorithms over these partial datasets. The results of
Table 3 shows the details of the manual and random these experiments are shown in Table 5 and Figures 2, 3, and
access log datasets. In this table, |P | shows the number of 4.
rules in the original policy, | 𝐴| and |𝑉 | show the number The algorithm proposed by Xu and Stoller [14] and the
of attributes and attribute values and |L|, |L + |, |L − | show approach presented by Cotrini et al. [18] do not generate
the number of access control tuples, the number of positive policy rules with negative attribute filters and relations,
access logs, and the number of negative access logs in the however we report the results of their algorithms over
given dataset, respectively. datasets related to policy rules including negations (policies
𝜋4 , 𝜋5 , 𝜋6 ) to show how the quality of mined policies would
5.2 Experimental Setup be impacted if the mining algorithm does not extract rules
To evaluate our proposed method, we use a computer with that include negation.
2.6 GHz Intel Core i7 and 16 GB of RAM. We use Python 3 in
the mining and the evaluation process. The algorithms were 5.3.1 The F-Score of the Mined Policies
highly time-efficient (e.g., maximum time consumption is
Table 4 shows the final 𝐹-𝑠𝑐𝑜𝑟𝑒 𝜋 | L of our proposed ap-
less than half an hour).
proach after several rounds of refinement over all complete
We use kmodes library [33] for clustering our data. The
datasets. As we can see in Table 4, the proposed approach
initialization based on density (CAO) [26] is chosen for
achieves high F-score across all experiments except for 𝜋6 .
cluster initialization in kmodes algorithm.
𝜋6 is a very complex dataset with both positive and negative
To find optimal 𝑘, we apply the Silhouette method to
attributes and relation filters including 14 attributes, 44
test different values of 𝑘. We examine each value of 𝑘 in
attribute values, and around six million access records. The
pre-defined set [10, 20]. Then the 𝑘 value that results in the
final policy quality for this dataset is around 0.63, which is
highest Silhouette score is used in the final model.
acceptable considering the complexity of the policy.
To generate the synthesized access log L, we brute force
through all attributes 𝐴 and their values 𝑉𝑎 to produce Table 5 and Figure 2 show the comparison of the F-
all possible combinations for the tuples. This method was Scores of policies mined by our proposed approach with
used to generate a complete access log for the random and that of previous work over partial datasets (with 0.1% of
manual policy datasets. We generate two sets of partial the complete datasets). The F-Score of policies mined by our
datasets; the 10% partial datasets are used to check the algorithm is very close to the one done by the approach
efficiency of the proposed approach over sparse datasets proposed by Cotrini et al. [18]. As we can see, our proposed
(Table 4) and the 0.1% partial datasets are used to compare approach outperforms theirs in half of the experiments.
the proposed approach with previous work (Table 5). We
also generate a set of noisy datasets to check the efficiency 5.3.2 The Complexity of the Mined Policies
of the proposed algorithm over noisy access log. The results In Table 4, we can see the final 𝑊 𝑆𝐶 of the policies mined by
of such experiments are reported in Table 4. our proposed approach. All extracted policies have the com-
For all experiments, the optimal thresholds for selecting plexity lower than 100 which is much lower than those of the
effective attributes and relations are between 0.2 and 0.3. most complex policies for individual datasets. According to
Definition 17, the most complex policy for each dataset has
5.3 Results the same complexity as the original positive access log (L + ).
We first evaluate the performance of our policy mining Given numbers in Tables 3 and 4, the most complex policies
algorithm on complete datasets. Table 4 shows the results for these scenarios are thousands of times more complex
of these experiments. than the extracted policies by our approach.
11
Policy Quality
0.4
5.3.3 The Policy Quality of the Mined Policies
0.2
Finally, Table 4 shows the quality of the extracted policies
through our proposed approach. We can see that out of
all datasets that our proposed algorithm was applied on, 0.0
Partial _1 Partial _2 Partial _3 Partial _4 Partial _5 Partial _6 _10 _11
around 75% of the cases reached the policy quality of more
than 0.8, which is significant, considering the huge size of
original access logs (each more than 30K records).
Fig. 4: The Quality of the Policies Mined by ABAC Mining
According to Figure 4, in most cases the policy quality Algorithms
of the policies mined by our proposed approach is higher
than those of the policies extracted by other ABAC mining
algorithms. 6 R ELATED W ORK
As RBAC approach became popular, many organization
decided to equip their information systems with more re-
cent access control model, however migrating to RBAC
100 Proposed Approach
Xu and Stoller [14] from legacy access control systems was a huge obstacle for
Cotrini et al. [18]
such environments. As a result, several researchers have
80
addressed such a challenge by introducing automated role
extraction algorithms [10], [11], [12], [13], [23], [34], [35], [36],
60
[37], [38], [39]. Role engineering or role mining are the terms
F-score
TABLE 4: Results of Our Proposed Approach on Various Synthesized and Real Policy Datasets
𝜋 Total Running Time (s) Optimal 𝑘 P𝑚𝑖𝑛𝑒𝑑 𝐴𝐶𝐶 𝜋 |L 𝐹 -𝑠𝑐𝑜𝑟 𝑒 𝜋 |L 𝑊 𝑆𝐶𝑜𝑟𝑖𝑔 𝑊 𝑆𝐶𝑚𝑖𝑛𝑒𝑑 Q𝜋
𝜋1 9376.556 15 20 97.5% 83.6% 33 91 0.91
Partial 𝜋1 (10%) 1994.769 15 13 97.29% 82.21% 33 54 0.90
Noisy 𝜋1 (10%) 4979.56 10 8 96.94% 80% 33 28 0.90
𝜋2 2180.745 18 18 85.49% 75.93% 33 71 0.86
Partial 𝜋2 (10%) 4787.98 10 8 96.94% 85.33% 33 28 0.92
Noisy 𝜋2 (10%) 7339.91 8 15 72.22% 82.13% 33 27 0.90
𝜋3 7795.44 15 17 95.6% 65.63% 44 55 0.80
Partial 𝜋3 (10%) 1347.29 6 10 95.2% 62.24% 44 56 0.77
Noisy 𝜋3 (10%) 1912.72 15 15 94.47% 62.66% 44 81 0.77
𝜋4 13662.62 7 16 86.7% 71.58% 33 40 0.83
𝜋5 8681.64 15 15 78.11% 62% 33 67 0.76
𝜋6 12905.78 20 17 88.05% 46.28% 44 80 0.63
𝜋7 24.63 8 20 93% 78.33% 33 65 0.88
𝜋8 13081.20 10 14 99.12% 91.28% 33 51 0.95
𝜋9 2266.68 8 16 92.17% 79.66% 33 46 0.89
𝜋10 265.3 15 20 94% 97% - 44 0.98
𝜋11 1010.43 24 25 98.49% 99% - 92 0.82
TABLE 5: Comparison of Our Proposed Approach with Previous Work on Various Synthesized
and Real Policy Datasets
from available information, e.g., user permission relations their approach is a multi-objective optimization framework
and attributes) and construct candidates rules. They then which incorporates requirements on both correctness and
generalize the candidate rules by replacing conjuncts in expressiveness, it suffers from the same issue as [15].
attribute expressions with constraints. The main limitation Iyer and Masoumzadeh [17] propose a more systematic,
of these algorithms is that as they are based on heuristic yet heuristic ABAC policy mining approach which is based
approaches, the proposed techniques work very well for on the rule mining algorithm called PRISM. It inherits
simple and small scale AC policies, however, as the number shortcomings associated with PRISM that includes dealing
of rules in the policy and the number of elements in each with a large dimensionality of the search space of attribute
rule increases, they do not perform well. values and generation of a huge number of rules.
Following Xu and Stroller’s proposed method, Medvet Cotrini et al. propose an algorithm called Rhapsody for
et al. [16] propose a multi-objective evolutionary algorithm mining ABAC rules from sparse logs [18]. Their proposed
for extracting ABAC policies. The proposed approach is a approach is built upon subgroup discovery algorithms.
separate and conquer algorithm, in each iteration of which, a They define a novel metric, reliability which measures how
new rule is learned and the set of access log tuples becomes overly permissive an extracted rule is. In addition, they
smaller. Their algorithm employs several search-optimizing propose a universal cross-validation metric for evaluating
features to improve the quality of the mined rules. Although the mined policy when the input log is sparse. However,
13
their algorithm is not capable of mining policies from logs We have evaluated our policy extraction algorithm on ac-
with many attributes as the number of extracted rules grows cess logs generated for various sample policies and demon-
exponentially in the number of attributes of the system. strated its feasibility. Furthermore, we have shown that our
approach outperforms previous works in terms of policy
quality.
7 D ISCUSSION AND L IMITATIONS As future work, we plan to extend our method to
As mentioned in section 5.3, our proposed approach is support numerical data and extract negative authorization
able to achieves a practical level of performance when rules as well while studying the effects of various conflict
applied to both synthesized and real datasets. In the case resolution strategies on the quality of the mined policy.
of synthesized datasets, the proposed approach is capable
of mining policies containing both positive and negative
attribute filters from complete datasets. On the other hand, R EFERENCES
our proposed approach shows potential for use in sparse [1] R. S. Sandhu and P. Samarati, “Access control: principle and
datasets. In addition, the real datasets contain a large num- practice,” IEEE communications magazine, vol. 32, no. 9, pp. 40–48,
ber of attributes and attribute values as shown in Table 1994.
3. The ability of our proposed approach in mining high- [2] M. A. Harrison, W. L. Ruzzo, and J. D. Ullman, “Protection in
operating systems,” Communications of the ACM, vol. 19, no. 8,
quality policies for these datasets shows that the size of pp. 461–471, 1976.
attributes and attribute values have minimal impact on the [3] D. E. Bell and L. J. LaPadula, “Secure computer systems: Math-
effectiveness of our approach. ematical foundations,” tech. rep., MITRE CORP BEDFORD MA,
1973.
The proposed approach is based on an unsupervised [4] R. S. Sandhu, “Lattice-based access control models,” Computer,
clustering algorithm. Since finding the proper number of vol. 26, no. 11, pp. 9–19, 1993.
clusters is a challenge related to clustering algorithms, our [5] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman, “Role-
approach is affected by this issue as well. The same issue based access control models,” Computer, vol. 29, no. 2, pp. 38–47,
1996.
will also be valid in finding the best thresholds to extract [6] P. W. Fong and I. Siahaan, “Relationship-based access control
effective attributes and relations. policies and their policy languages,” in Proceedings of the 16th
We note that, as the proposed algorithm is based on ACM symposium on Access control models and technologies, pp. 51–60,
ACM, 2011.
tuning multiple parameters, it is possible that it gets stuck
[7] J. Jin, G.-J. Ahn, H. Hu, M. J. Covington, and X. Zhang, “Patient-
in minimum optima. For this reason, we do not claim that centric authorization framework for sharing electronic health
it will extract the policy with the highest quality in every records,” in Proceedings of the 14th ACM symposium on Access control
scenario, nor we claim that extracting rules with negative models and technologies, pp. 125–134, ACM, 2009.
[8] L. Karimi and J. Joshi, “Multi-owner multi-stakeholder access
attribute filters and relations would always result in policy control model for a healthcare environment,” in Collaboration and
with higher quality (as we can see in Section 5.3); however, Internet Computing (CIC), 2017 IEEE 3rd International Conference on,
by trying more randomization in cluster initialization and a pp. 359–368, IEEE, 2017.
wider range of parameters, we can get one that is closer to [9] V. C. Hu, D. Ferraiolo, R. Kuhn, A. R. Friedman, A. J. Lang,
M. M. Cogdell, A. Schnitzer, K. Sandlin, R. Miller, K. Scarfone,
global optima. et al., “Guide to attribute based access control (abac) definition and
In our evaluation, we used random selection to create considerations (draft),” NIST special publication, vol. 800, no. 162,
noisy and sparse datasets from complete datasets. Although 2013.
[10] M. Kuhlmann, D. Shohat, and G. Schimpf, “Role mining-revealing
we ensured the same percentage of randomly selected business roles for security administration using data mining tech-
tuples from permitted and denied logs, guaranteeing the nology,” in Proceedings of the eighth ACM symposium on Access
quality of the sampling is difficult. control models and technologies, pp. 179–186, ACM, 2003.
[11] J. Schlegelmilch and U. Steffens, “Role mining with orca,” in
Proceedings of the tenth ACM symposium on Access control models
and technologies, pp. 168–176, ACM, 2005.
8 C ONCLUSION [12] I. Molloy, H. Chen, T. Li, Q. Wang, N. Li, E. Bertino, S. Calo, and
J. Lobo, “Mining roles with semantic meanings,” in Proceedings of
In this paper, we have proposed an unsupervised learning the 13th ACM symposium on Access control models and technologies,
based approach to automating an ABAC policy extraction pp. 21–30, ACM, 2008.
process. The proposed approach is capable of discovering [13] Z. Xu and S. D. Stoller, “Algorithms for mining meaningful roles,”
in Proceedings of the 17th ACM symposium on Access Control Models
both positive and negative attribute expressions as well as
and Technologies, pp. 57–66, ACM, 2012.
positive and negative relation conditions while previous [14] Z. Xu and S. D. Stoller, “Mining attribute-based access control poli-
approaches in access control policy extraction had only cies from logs,” in IFIP Annual Conference on Data and Applications
focused on positive expressions. Furthermore, our work is Security and Privacy, pp. 276–291, Springer, 2014.
[15] Z. Xu and S. D. Stoller, “Mining attribute-based access control
capable of improving the extracted policy through iterations policies,” IEEE Transactions on Dependable and Secure Computing,
of proposed rule pruning and policy refinement algorithms. vol. 12, no. 5, pp. 533–545, 2015.
Such refinement algorithms are based on the false positive [16] E. Medvet, A. Bartoli, B. Carminati, and E. Ferrari, “Evolutionary
and false negative records and they help in increasing the inference of attribute-based access control policies.,” in EMO (1),
pp. 351–365, 2015.
quality of the mined policy. [17] P. Iyer and A. Masoumzadeh, “Mining positive and negative
Most importantly, we have proposed the policy quality attribute-based access control policy rules,” in Proceedings of the
metric which considers both the conciseness and correctness 23nd ACM on Symposium on Access Control Models and Technologies,
pp. 161–172, ACM, 2018.
of the mined policy and is important for comparing the
[18] C. Cotrini, T. Weghorn, and D. Basin, “Mining abac rules from
extracted policy with the original one and for improving sparse logs,” in 2018 IEEE European Symposium on Security and
it as needed. Privacy (EuroS&P), pp. 31–46, IEEE, 2018.
14
[19] P. Marinescu, C. Parry, M. Pomarole, Y. Tian, P. Tague, and I. Papa- Leila Karimi received an undergraduate degree
giannis, “Ivd: Automatic learning and enforcement of authoriza- and the MS degree in information technology en-
tion rules in online social networks,” in 2017 IEEE Symposium on gineering from the Sharif University of Technol-
Security and Privacy (SP), pp. 1094–1109, IEEE, 2017. ogy, Tehran, Iran. She is a Ph.D. candidate at the
[20] D. B. Suits, “Use of dummy variables in regression equations,” School of Computing and Information (SCI), at
Journal of the American Statistical Association, vol. 52, no. 280, the University of Pittsburgh. Her research inter-
pp. 548–551, 1957. ests lie at the intersection of information security,
[21] C. M. Bishop, Pattern recognition and machine learning. springer, data privacy, and machine learning. Currently,
2006. she is working on applying machine learning
[22] Wikipedia contributors, “Accuracy paradox-wikipedia, the free techniques to solve challenging problems in the
encyclopedia,” 2018. [Online; accessed 30-September-2019]. security domain.
[23] I. Molloy, H. Chen, T. Li, Q. Wang, N. Li, E. Bertino, S. Calo, and
J. Lobo, “Mining roles with multiple objectives,” ACM Transactions
on Information and System Security (TISSEC), vol. 13, no. 4, p. 36,
2010.
[24] M. Beckerle and L. A. Martucci, “Formal definitions for usable
access control rule sets from goals to metrics,” in Proceedings of the
Ninth Symposium on Usable Privacy and Security, p. 2, ACM, 2013.
[25] C. J. v. Rijsbergen, Information retrieval. 2.ed. Butterworths, 1979.
[26] F. Cao, J. Liang, and L. Bai, “A new initialization method for cat-
egorical data clustering,” Expert Systems with Applications, vol. 36,
Maryam Aldairi received an undergraduate degree management infor-
no. 7, pp. 10223–10228, 2009.
mation systems From King Faisal University, Alhasa, KSA., and the MS
[27] L. Karimi and J. Joshi, “An unsupervised learning based approach
degree in information science from the University of Pittsburgh. She is
for mining attribute basedaccess control policies,” in Big Data (Big
a Ph.D. student at the School of Computing and Information (SCI), at
Data), 2018 IEEE International Conference on, IEEE, 2018.
the University of Pittsburgh. Her research interests lie at the intersec-
[28] R. L. Thorndike, “Who belongs in the family?,” Psychometrika,
tion of information security, adversarial learning, and machine learning.
vol. 18, no. 4, pp. 267–276, 1953.
Currently, her focus is on applying machine learning techniques to solve
[29] C. Goutte, P. Toft, E. Rostrup, F. Å. Nielsen, and L. K. Hansen, “On challenging problems in the security domain.
clustering fmri time series,” NeuroImage, vol. 9, no. 3, pp. 298–310,
1999.
[30] P. Jaccard, “The distribution of the flora in the alpine zone. 1,” New
phytologist, vol. 11, no. 2, pp. 37–50, 1912.
[31] Amazon.com, “Amazon employee access challenge.” Kaggle.
[32] Montanez, Ken, “Amazon access samples.” UCI Machine Learn-
ing Repository: Amazon Access Samples Data Set.
[33] Devos, Nico and Hes, Robin, “Kmodes implementation.”
[34] J. Vaidya, V. Atluri, and Q. Guo, “The role mining problem: finding
a minimal descriptive set of roles,” in Proceedings of the 12th ACM
symposium on Access control models and technologies, pp. 175–184, James Joshi received the MS degree in com-
ACM, 2007. puter science and the Ph.D. degree in computer
[35] J. Vaidya, V. Atluri, and J. Warner, “Roleminer: mining roles using engineering from Purdue University. He is a pro-
subset enumeration,” in Proceedings of the 13th ACM conference on fessor of School of Computing and Information
Computer and communications security, pp. 144–153, ACM, 2006. (SCI), at the University of Pittsburgh. His re-
[36] D. Zhang, K. Ramamohanarao, and T. Ebringer, “Role engineering search interests include Access Control Mod-
using graph optimisation,” in Proceedings of the 12th ACM sympo- els, Security and Privacy of Distributed Systems,
sium on Access control models and technologies, pp. 139–144, ACM, Trust Management and Information Survivability.
2007. He is the director of LERSAIS at the University of
[37] Q. Guo, J. Vaidya, and V. Atluri, “The role hierarchy mining prob- Pittsburgh. He is an elected fellow of the Society
lem: Discovery of optimal role hierarchies,” in Computer Security of Information Reuse and Integration (SIRI) and
Applications Conference, 2008. ACSAC 2008. Annual, pp. 237–246, is a senior member of the IEEE and the ACM. He currently serves as
IEEE, 2008. a Program Director of the Secure and Trustworthy Cyberspace program
[38] H. Takabi and J. B. Joshi, “Stateminer: an efficient similarity-based at the National Science Foundation.
approach for optimal mining of role hierarchy,” in Proceedings of
the 15th ACM symposium on Access control models and technologies,
pp. 55–64, ACM, 2010.
[39] Q. Ni, J. Lobo, S. Calo, P. Rohatgi, and E. Bertino, “Automating
role-based provisioning by learning from examples,” in Proceed-
ings of the 14th ACM symposium on Access control models and
technologies, pp. 75–84, ACM, 2009.
[40] J. Vaidya, V. Atluri, and Q. Guo, “The role mining problem: A
formal perspective,” ACM Transactions on Information and System
Security (TISSEC), vol. 13, no. 3, p. 27, 2010. Mai Abdelhakim is an assistant professor in the
[41] Z. Xu and S. D. Stoller, “Mining attribute-based access control department of electrical and computer engineer-
policies from rbac policies,” in Emerging Technologies for a Smarter ing at the University of Pittsburgh’s Swanson
World (CEWIT), 2013 10th International Conference and Expo on, school of engineering. She received her Ph.D.
pp. 1–6, IEEE, 2013. degree in Electrical Engineering from Michigan
State University, and Bachelor’s and Master’s
degrees in Electronics and Communications En-
gineering from Cairo University. Her research
interests include cyber-physical systems, cyber-
security, machine learning, stochastic systems
modeling, and information theory.