This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
1
Achieving Flatness: Selecting the Honeywords
from Existing User Passwords
Imran Erguler
Abstract—Recently, Juels and Rivest proposed honeywords (decoy passwords) to detect attacks against hashed password databases.
For each user account, the legitimate password is stored with several honeywords in order to sense impersonation. If honeywords are
selected properly, a cyber-attacker who steals a file of hashed passwords cannot be sure if it is the real password or a honeyword for
any account. Moreover, entering with a honeyword to login will trigger an alarm notifying the administrator about a password file breach.
At the expense of increasing the storage requirement by 20 times, the authors introduce a simple and effective solution to the detection
of password file disclosure events. In this study, we scrutinize the honeyword system and present some remarks to highlight possible
weak points. Also, we suggest an alternative approach that selects the honeywords from existing user passwords in the system in order
to provide realistic honeywords – a perfectly flat honeyword generation method – and also to reduce storage cost of the honeyword
scheme.
Index Terms—Authentication, honeypot, honeywords, login, passwords, password cracking
✦
1
I NTRODUCTION
D
ISCLOSURE of password files is a severe security
problem that has affected millions of users and
companies like Yahoo, RockYou, LinkedIn, eHarmony
and Adobe [1], [2], since leaked passwords make the
users target of many possible cyber-attacks. These recent events have demonstrated that the weak password
storage methods are currently in place on many web
sites. For example, the LinkedIn passwords were using
the SHA-1 algorithm without a salt and similarly the
passwords in the eHarmony system were also stored
using unsalted MD5 hashes [3]. Indeed, once a password
file is stolen, by using the password cracking techniques
like the algorithm of Weir et al. [4] it is easy to capture
most of the plaintext passwords.
In this respect, there are two issues that should be
considered to overcome these security problems: First,
passwords must be protected by taking appropriate precautions and storing with their hash values computed
through salting or some other complex mechanisms.
Hence, for an adversary it must be hard to invert hashes
to acquire plaintext passwords. The second point is that a
secure system should detect whether a password file disclosure incident happened or not to take appropriate actions. In this study, we focus on the latter issue and deal
with fake passwords or accounts as a simple and cost
effective solution to detect compromise of passwords.
Honeypot is one of the methods to identify occurrence
of a password database breach. In this approach, the
administrator purposely creates deceit user accounts to
lure adversaries and detects a password disclosure, if
any one of the honeypot passwords get used [5], [6].
I. Erguler is with National Research Institute of Electronics & Cryptology
TUBITAK-BILGEM, 41470 Gebze, Kocaeli, Turkey.
E-mail: imran.erguler@tubitak.gov.tr
This idea has been modified by Herley and Florencio
[7] to protect online banking accounts from password
brute-force attacks. According to the study, for each user
incorrect login attempts with some passwords lead to
honeypot accounts, i.e. malicious behavior is recognized.
For instance, there are 108 possibilities for a 8-digit
password and let system links 10000 wrong password
to honeypot accounts, so the adversary performing the
brute-force attack 10000 times more likely to hit a honeypot account than the genuine account. Use of decoys
for building theft-resistant was introduced by Bojinov et
al. in [8] called as Kamouflage. In this model, the fake
password sets are stored with the real user password
set to conceal the real passwords, thereby forcing an
adversary to carry out a considerable amount of online
work before getting the correct information. Recently,
Juels and Rivest have presented the honeyword mechanism to detect an adversary who attempts to login with
cracked passwords [9]. Basically, for each username a set
of sweetwords is constructed such that only one element
is the correct password and the others are honeywords
(decoy passwords). Hence, when an adversary tries to
enter into the system with a honeyword, an alarm is
triggered to notify the administrator about a password
leakage. The details of the method will be given in the
next section.
In this study, we analyze the honeyword approach
and give some remarks about the security of the system.
Furthermore, we point out that the key item for this
method is the generation algorithm of the honeywords
such that they shall be indistinguishable from the correct passwords. Therefore, we propose a new approach
that uses passwords of other users in the system for
honeyword sets, i.e. realistic honeywords are provided.
Moreover, this technique also reduces the storage cost
compared with the honeyword method in [9]. The rest
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
2
of this paper is organized as follows. In Section 2, we
review the honeyword approach and discuss the honeyword generation procedures. Section 3 examines security
of these procedures and Section 4 gives the description of
our proposed model. In Section 5, we analyze its security
properties and demonstrate a comparison between our
approach and the original methods in Section 6. Finally,
in Section 7 we conclude this paper.
2
H ONEYWORDS
In this section, we first briefly summarize the honeyword password model proposed by Juels and Rivest in
[9]. Then, we overview the methods on generation of
honeywords given in the study and discuss some points
that can cause some security problems.
2.1 Review of Honeywords
Basically, a simple but clever idea behind the study is the
insertion of false passwords – called as honeywords –
associated with each user’s account. When an adversary
gets the password list, she recovers many password
candidates for each account and she cannot be sure about
which word is genuine. Hence, the cracked password
files can be detected by the system administrator if a login attempt is done with a honeyword by the adversary.
We use the notations and definitions depicted in Table 1
to simplify the description of the honeyword scheme.
The honeyword mechanism works simply as follows: For each user ui , the sweetword list Wi is
generated using the honeyword generation algorithm
Gen(k). This procedure takes input k as the number of
sweetwords and outputs both the password list Wi =
(wi,1 , wi,2 , . . . , wi,k ) and ci , where ci is the index of the
correct password (sugarword). The username and the
hashes of the sweetwords as < ui , (vi,1 , vi,2 , . . . , vi,k ) >
tuple is kept in the database of the main server, whereas
ci is stored in another server called as honeychecker.
By diversifying the secret information in the system –
storing password hashes in one server and ci in the honeychecker – makes it harder to compromise the system
as a whole, i.e. providing a basic form of distributed
security [9]. Notice that in a traditional password technique < ui , H(pi ) > pair is stored for each account, while
for this system < ui , Vi > tuple is kept in the database,
where Vi = (vi,1 , vi,2 , . . . , vi,k ). The login procedure of
the scheme is summarized below:
• User ui enters a password g to login to the system.
• Server firstly checks whether or not H(g) is in list
Vi . If not, then login is denied.
• Otherwise system checks to verify if it is a honeyword or the correct password.
• Let v(i, j) = H(g). Then j value is delivered to the
honeychecker in an authenticated secure communication.
• The honeychecker checks whether j = ci or not. If
the equality holds, it returns a TRUE value, otherwise it responses FALSE and may raise an alarm
depending on security policy of the system.
Before discussing the honeyword generation methods,
we want to talk about the honeyword generator algorithm Gen(). Note that strength and effectiveness of the
method indeed is directly related to how the Gen() is
constructed. Therefore, the authors introduce a definition
as the flatness of Gen() such that it measures the chance
of an adversary in picking the correct password from the
sweetwords. In other words, if a honeyword generation
method is ǫ-flat, then she has at least a 1 − ǫ chance
of picking a honeyword. For example the attacker has a
chance of at most 25% of picking the correct password pi
from Wi for ǫ = 1/4. In short, if the algorithm is not flat
enough, the real password stands out from the remaining
fake passwords and an adversary can easily reveal the
original one.
2.2 Honeyword Generation Methods and Discussions
The authors in [9] categorize the honeyword generation methods into two groups. The first category
consists of the legacy-UI (user interface) procedures
and the second one includes modified-UI procedures
whose password-change UI is modified to allow better
password/honeyword generation. Take-a-tail method is
given as an example of the second category. According
to this approach a randomly selected tail is produced for
the user to append this suffix to her entered password
and the result becomes her new password. For instance,
let a user enter password games01, and then system let
propose ’413’ as a tail. So the password of the user now
becomes games01413. Although this method strengthens
the password, to our point of view, it is impractical –
some users even forget the passwords that they determined. Therefore in the remaining parts, the analysis that
we conducted is limited with the legacy-UI procedures.
Note that some discussed points are indeed mentioned
in [9], but we emphasize those to address the paramount
importance of the selected generator algorithm in terms
of security.
2.2.1 Chaffing-by-tweaking
In this method, the user password seeds the generator algorithm which tweaks selected character positions of the
real password to produce the honeywords. For instance,
each character of a user password in predetermined
positions is replaced by a randomly chosen character
of the same type: digits are replaced by digits, letters
by letters, and special characters by special characters.
Number of positions to be tweaked, denoted as t should
depend on system policy. As an example t = 3 and
tweaking last t characters may be a method for the
generator algorithm Gen(k, t). Another approach named
in the study as ”chaffing-by-tweaking-digits” is executed
by tweaking the last t positions that contain digits. For
example, by using the last technique for the password
42hungry and t = 2, the honeywords 12hungry and
58hungry may be generated.
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
3
TABLE 1
Notations
H()
ui
pi
Wi
vi,j
Vi
k
ci
Gen(k)
sweetword:
sugarword:
honeyword:
Cryptographic hash function used to compute hash of the passwords
Username for the ith user.
Password of the ith user
List of potential passwords for ui
Hash value of jth element of Wi
List of hash values for ui , Vi = (vi,1 , vi,2 , . . . , vi,k )
Number of elements in Wi
Index of the correct password in list Wi
Procedure used to generate Wi of length k of sweetwords
Each element of Wi
Correct password in Wi
Each fake password of Wi
Remark 1. Many users have the propensity to choose
the numbers included in passwords related to a special
date, e.g. birthday, anniversary or an important historical
event. For example 3.6% of the hacked Adobe password
hints are related to a date [10]. In the light of this fact,
it is highly possible that such a password involves a
digit sequence like 19xx, 20xx or xx where xx represents
the last two digits of the date. For those passwords by
applying the chaffing-by-tweaking-digits method, the date
digits will be replaced with the randomly selected digits.
Hence an adversary who has Wi of a user ui may easily
identify honeywords and recover the correct password.
When we examine publicly available leaked passwords
hacked from RockYou website (approximately 32 million
entries) [11], [12], we observe that the passwords of the
numerous users include such a pattern, e.g. junexxxx
pattern is selected as a password by 1244 users, where
xxxx is a date and starts with 19 or 20. Another example
should be the password alex1992 which is seen 47 times
in the RockYou password list: Suppose the following
honeywords are generated with t = 4 and k = 9 for
this password. Note that the digits in the honeywords
seem not relevant, but the correct password alex1992
makes sense for an adversary.
alex6323
alex9058
alex1992
alex1270
alex5469
alex0976
alex8147
alex2785
alex9705
Apart from the use of a date in passwords, many users
prefer to append consecutive numbers to their password
heads, like ’123’, ’1234’, due to the tendency of users to
choose rememberable number patterns. By considering
the RockYou leaked password database, we realize that
about 0.8% of all user passwords – excluding the ones in
the top 1000– ends with ’123’ or ’1234’ and begins with
letters at least one length. The vulnerability issued with
the date patterns described above is also valid in those
passwords, i.e. an adversary may distinguish the correct
password from the sweetwords by just investigating
the end digit patterns. Indeed, the chaffing-by-tweaking
method suffers from these types of passwords, because
replacing characters of the same type randomly will
give the same hint to an adversary in extracting the
correct password. Similar patterns and examples of user
habits in digit selections can be extended. From a broader
perspective, these examples show that users mostly do
not choose the digits or letters in passwords randomly,
so a randomly replacing technique like this model leads
an adversary to make a natural selection. In particular,
we believe that by deploying the ”chaffing-by-tweaking”
model, it is hard to fulfill aims of the honeyword scheme,
i.e. all the adversary needs to do is to have a human
sense.
2.2.2
Chaffing-with-a-password-model
In this approach, the generator algorithm takes the password from the user and relying on a probabilistic model
of real passwords it produces the honeywords [9]. The
authors give the model of [8] as an example for this
method named as the modeling syntax. In this model,
the password is splitted into character sets. For instance,
mice3blind is decomposed as 4-letters + 1-digit + 5-letters
⇒ L4 + D1 + L5 and replaced with the same composition
like gold5rings.
Another example named as the simple model described
in the study generates honeywords through a password
list: Firstly a password list L is built by combining
numerous real passwords and random passwords of
varying lengths. Then a random word is picked from
the list with a length of d. Moreover, with a probability
of 0.8 some honeywords are generated as ”tough nuts”
which will be explained in the next part. As depicted
in Algorithm 1, given below, honeyword characters are
created by replacing characters of randomly selected
words of L in a probabilistic manner:
Remark 2. Leaked password databases have shown us
that some passwords have a well-known pattern. For
example all of the following passwords are involved in
the list of 10000 most common passwords [13].
bond007
007bond
james007
007007
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
4
Algorithm 1 SimpleModel algorithm
1: procedure S IMPLE M ODEL (L)
2:
w ← random(L)
⊲ randomly returns a word from L
3:
d ← length(w)
⊲ returns length of word w
4:
honeyword(1) ← w(1)
⊲ The first character is the just first character of w
5:
for j ← 2 to d do
⊲ Probabilities of mod1, mod2 and else are 0.1, 0.4 and 0.5
6:
if mod1 then
7:
w ← random(L), honeyword(j) ← w(j)
⊲ Add character in same position of new random word
8:
else if mod2 then
9:
w ← random(L), honeyword(j) ← w(j) ⊲ Select a random word s.t. w(j − 1) = honeyword(j − 1)
10:
else
11:
honeyword(j) ← w(j)
⊲ Proceed with the same word
12:
end if
13:
end for
14: end procedure
Considering the modeling syntax method, one
can conclude that the honeyword system loses its
effectiveness against such passwords, i.e. the correct
password has become noticeably recognized by an
adversary. In fact, this problem seems an inherent
weakness of randomly replacement based honeyword
methods. Since character groups or individual characters
are replaced by a picked character/characters, the
content integrity of such passwords would be broken
and the correct password becomes quite salient.
Remark 3. Besides the previous point, we want to
discuss another issue: If there is a correlation between the
username and the password, then the password can be
easily distinguished from the honeywords. For example,
the password johndoe123 with a username johndoe can
be easily distinguished from the corresponding honeywords. The password policy and guidelines should
dictate users not to create passwords that are correlated
with the username. Unfortunately, some correlations are
inevitable like username peterparker and the password
spiderman1992.
2.2.3
Chaffing with ”Tough Nuts”
In this method, the system intentionally injects
some special honeywords, named as tough nuts,
such that inverting hash values of those words is
computationally infeasible, e.g. fixed length random bit
strings should be set as the hash value of a honeyword.
An illustrative example for a tough nut is given in
[9] as ’9,50PEe[KV.0?RIOtcL-:IJ”b+Wol¡*]!NWT/pb’. It is
stated that the number and positions of tough nuts
are selected randomly. By means of this, it is expected
that the adversary cannot seize whole sweetword set
and some sweetwords will be blank for her, thereby
deterring the adversary to realize her attack. In [9], it
is discussed that in such a situation the adversary may
pause before attempting login with cracked passwords.
Remark 4. Tough nuts are recommended to be used
together with other methods to render the adversary’s
work more challenging and exhaust the attacker. Nevertheless, it has remained an open question in [9] what is
the optimal strategy for an adversary when tough nuts
are experienced. We believe that ”tough nuts” method
is a double-edged-sword: Numerous unknowns in the
password list may discourage an adversary to proceed
mounting her attack. On the other hand, an adversary
may suppose that most of the passwords made up of
simple words and digit combinations, not a tough nut.
Hence, it is reasonable for this adversary to conduct
her classic attack with skipping tough nuts contrarily to
authors’ expectations. Note that for this attack strategy,
entropy contributed by the honeywords is decreased,
because the tough nuts are ignored by the adversary. For
example, if in average 2% of all honeywords are tough
nuts, apparently this rate will be redundant according to
this approach.
2.2.4 Hybrid Method
Another method discussed in [9] is combining the
strength of different honeyword generation methods, e.g.
chaffing-with-a-password-model and chaffing-by-tweakingdigits. By using this technique, random password model
will yield seeds for tweaking-digits to generate honeywords. For example let the correct password be apple1903. Then the honeywords angel2562 and happy9137
should be produced as seeds to chaffing-by-tweakingdigits. For t = 3 and k = 4 for each seed, the sweetword
table given below may be attained:
happy9679
apple1422
angel2656
happy9757
happy9743
apple1903
apple1172
angel2036
angel2849
happy9392
apple1792
angel2562
Remark 5. Feeding on the strength of chaffing-with-apassword-model, this method cuts down chance of an
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
5
adversary in guessing the correct password from the
sugarwords. Nevertheless, previous remarks are also
valid for this case, e.g. in the above example an adversary may make plausible guesses.
3
S ECURITY A NALYSIS
OF
H ONEYWORDS
In this part, we investigate the security of the honeyword
system against some possible scenarios.
3.1 Denial-of-service Attack
In [9], a denial-of-service (DoS) attack is discussed for
the following scenario: Adversary knows the used Gen()
procedure and can produce all possible honeywords
for a given a password. For example, if the chaffingby-tweaking-digits is employed in the system and with
a small t adversary may generate whole possible honeywords from a known password. Consider the case,
let password of a user be test42, then for t = 2 she
can generate 100 possible honeywords and k of these
honeywords are stored in the system password list. Let
Pr(g = wi |pi ) denote the probability of correctly guessing
a valid honeyword of Wi , where correct password pi is
available to the adversary. Hence if this probability is
a non negligible value, the adversary may attempt to
login with the guessed honeyword to trigger an alarm
condition. In fact, this may be serious, if a strong policy
is set by the administrator e.g. a global password reset
in response to a single honeyword hit. In the above
example for k = 20 and t = 2, Pr(g = wi |pi ) = (k −
1)/99 = 0.19. In order to mitigate this risk, the authors
suggest choosing a relatively small set of honeywords
randomly from a larger class of possible sweetwords. For
the previous example, success probability of the attacker
is about 19% for k = 20, while this chance is decreased
to 2% by only changing t = 3.
Nevertheless, we want to consider the case that an
adversary knows m username–password pairs. Perhaps,
she previously created these accounts in the system to
make a DoS attack. Also suppose that there exists a
limit for unsuccessful login attempts as n and success
probability of guessing a valid honeyword for a known
password is Pr(g = wi |pi ) = α1 . Then it is more likely that
the adversary can carry out the DoS attack successfully,
if she makes about α trials. Notice that the adversary
can make at most m · n attempts. For the above example
Pr(g = wi |pi ) = 0.02, so it is highly possible to raise an
alarm condition if an adversary makes about 50 trials.
That is to say if the false attempt limit n is (say) five,
10 known account/passwords pairs will be enough to
launch the mentioned attack.
Remark 6. In fact, a user should deploy the described
attack even she possesses a single account by following
the Algorithm 2. In this case, an adversary solely knows
a single username and password ui and pi respectively.
Also, we suppose that the system limits for unsuccessful
login attempts as n, i.e. after n consecutive wrong password trials the account will be blocked. Nonetheless, if
the correct password is entered before n is reached, then
system resets the wrong password counter. Hence, as illustrated in the procedure, the adversary logins with the
correct password at each nth attempt to avoid blocking
of the account. For example, if the used technique for
the honeyword generation is the chaffing-by-tail-tweaking
and the honeywords are produced by tweaking the
characters in the selected last t positions, e.g. t = 3, then
the adversary should select a password such that last
t positions only involve digits to reduce entropy about
possible characters. For this example |T (p)| = 1000,
where T (p) stands for the set of sweetwords producible
by tweaking p for the selected character positions. Also,
we assume that system uses CAPTCHA or a similar
mechanism [14], [15] to prevent automated login attempts and the adversary is patient to try all guesses
manually each of which needs about 5 seconds. Then,
she hits a honeyword in about 1.5 hours and raises a
false alarm.
3.2 Brute-force Attack
In the previous attack, we point out that if a strict
policy is executed in a honeyword detection, system
may be vulnerable to DoS attacks affecting the whole
system. On the other hand, a soft policy weakens the
influence of honeywords. In this regard, we describe
the following attack to demonstrate an adversary can
capture an amount of accounts in case of a light policy.
We suppose an adversary has obtained a password file
F and cracked numerous user passwords. Then, she tries
to login with any accounts in the list instead of compromising a specific account. Furthermore, we assume that
the adversary has no advantage in guessing the correct
password by analyzing corresponding honeywords, i.e.
Pr(g = pi ) = 1/k. Last, if one of the user’s honeywords
is entered, the system takes the appropriate action according to one of the example policies as follows:
•
•
Login proceeds as usual,
User’s account is shut down until the user establishes a new password.
The common point of the above policies is that even
a honeyword entrance is detected, the system gives a
local or no response. As a result of this, an adversary can
carry out a brute-force search until a successful login is
obtained. For example, even a user’s account is locked
due to a honeyword attempt, she continues to search
with another user’s account, i.e. single guess for each
user. She likely makes a correct guess after k trials, since
P r(g = pi ) = 1/k. As an illustrative example for k = 20,
it is highly possible that the adversary finds a correct
password after 20 attempts. It is equivalent to say that
if there exists N users in the system, the adversary may
recover genuine passwords of N/k users by using bruteforce search.
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
6
Algorithm 2 The DoS Attack
1: procedure D OS ATTACK (pi , T (pi ), n)
2:
for j ← 1 to |T (pi )| do
3:
if mod(j, n) = 0 then
4:
Login(pi )
5:
else
6:
Login(Guessj )
7:
end if
8:
end for
9: end procedure
3.3 Choosing Policy
By considering the described attacks and discussions,
one can infer that there are two major issues about
honeywords. The first issue is flatness of the generator
algorithm such that it is directly related to the chance of
distinguishing the correct password out of the respective
sweetwords. Thus, if the method is not flat enough, it
undermines the main task of the honeywords and an adversary can easily perceive the correct password. Second
issue is that what is the chance of an adversary in hitting
a honeyword intentionally and triggering a false alarm
to render the system in a DoS state. Significance of this
issue depends on the adapted policy, e.g. what would
be done in case of a false alarm. Under these points,
one can see that selection of the Gen() procedure and
appropriate policy are critically important. Indeed, these
security issues are mentioned in [9]. However, the authors propose to adapt factors that decrease the potency
of DoS attacks, e.g. increasing t value for chaffing-bytweaking method instead of insisting on strong policies.
Since the main purpose behind the introduction of the
honeywords is to overcome password-crack detection
problem, we believe that security policies should not
be loosened to mitigate the DoS attacks. In order to
hinder those attacks, the Gen() is chosen such that
Pr(g = wi |pi ) = ε must be satisfied, where ε is a
negligible value. Also a limit, as λ, for the maximum
number of honeyword attempts in a period should be
set to prevent the brute-force attack. When the limit is
exceeded a major appropriate action should be taken,
e.g. forcing users to refresh their passwords.
4
A N EW A PPROACH
Our proposed model is still based on use of honeywords
to detect password-cracking. However, instead of generating the honeywords and storing them in the password
file, we suggest to benefit from existing passwords to
simulate honeywords. In order to achieve this, for each
account k − 1 existing password indexes, which we call
honeyindexes, are randomly assigned to a newly created
account of ui , where k ≥ 2. Moreover, a random index
number is given to this account and hash of the correct
password is kept with the correct index in a list. On
the other hand, in another list ui is stored with an
integer set which is consisted of the honeyindexes and the
⊲ To reset unsuccessful login attempts
⊲ Make j th guess; Guessj ∈ T (pi )
correct index. So, when an adversary analyzes the two
lists, she recognizes that each username is paired with
k numbers as sweetindexes and each of which points to
real passwords in the system. The tentative password
indexes hamper an adversary to make a correct guess
and she cannot be easily sure about which index is
the correct one. It is equivalent to say that to create
uncertainty about the correct password, we propose to
use indexes that map to valid passwords in the system.
The contribution of our approach is twofold. First, this
method requires less storage compared to the original
study. Second, in the previous sections we argue that
effectiveness of the honeyword system directly depends
on how Gen() flatness is provided and how it is close
to human behavior in choosing passwords. Within our
approach passwords of other users are used as the
fake passwords, so guess of which password is fake
and which is correct becomes more complicated for an
adversary.
4.1 Initialization
Firstly, T fake user accounts (honeypots) are created
with their passwords (see Appendix A for details). Also
an index value between [1, N ], but not used previously
is assigned to each honeypot randomly. Then k − 1
numbers are randomly selected from the index list and
for each account a honeyindex set is built like Xi =
(xi,1 , xi,2 , . . . , xi,k ); one of the elements in Xi is the correct index (sugarindex) as ci . Now, we use two password
files as F1 and F2 in the main server: F1 stores username
and honeyindex set, < hui , Xi > pairs as shown in Table 2,
where hui denotes a honeypot account. Note that each
entry has two elements. The first one is the username
of the account and the second element is honeyindex
set for the respective account. Also, the table is sorted
alphabetically by the username field. On the other hand,
F2 keeps the index number and the corresponding hash
of the password, < ci , H(pi ) >, as depicted in Table 3.
In this case, each entry in the table has two elements.
The first element is the sugarindex of the account and the
second one is the hash of the corresponding password.
Notice that the table is sorted according to the index
values. Let SI denote the index column and SH represent
the corresponding password hash column of F2 . Then
the function f (ci ) that gives password hash value in SH
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
7
Username
Honeyindex Set
agent-lisa
alexius
baba13
.
.
.
zack tayland
zoom42
(93, 16626, . . . , 94931)
(15476, 51443, . . . , 88429)
(3, 62107, . . . , 91233)
.
.
.
(1009, 23471, . . . , 47623)
(63, 51234, . . . , 72382)
TABLE 2
Example Password File F1 for the Proposed Model
SI
SH
3
7
85
..
.
100000
100004
H(p3 )
H(p7 )
H(p85 )
..
.
H(p100000 )
H(p100004 )
TABLE 3
Example Password File F2 for the Proposed Model
for the index value ci can be defined as: f (ci ) = {H(pi ) ∈
SH :< ci , H(pi ) > stored pair of ui and ci ∈ SI }. In
order to make points clear, the initialization process is
shown within the following example.
Example 1. Suppose that a honeypot username/password
pair is generated like < macbeth, master2014 > by the
system. Then an index number is randomly selected, for
instance 1008, and assigned as the correct index of this
account. Now F2 file is updated according to this information
as shown below:
Index No
..
.
1008
..
.
Hash of Password
..
.
H(master2014)
..
.
Then, k − 1 numbers are randomly chosen from SI of F2 and
combined with correct index 1008 in a random manner to
produce the index group. For instance, if k = 5, such a group
(42, 96104, 1008, 7201, 23008) may be generated. In this case
F1 file is seen as below:
Username
..
.
macbeth
..
.
Honeyindex Set
..
.
(42, 96104, 1008, 7201, 23008)
..
.
generator algorithm Gen(k, SI ) → ci , Xi , which outputs ci as the correct index for ui and the honeyindexes
Xi = (xi,1 , xi,2 , . . . , xi,k ). Note that Gen(k, SI ) produces
Xi by randomly selecting k−1 numbers from SI and also
randomly picking a number ci ∈
/ SI . So ci becomes one
of the elements of Xi . One can see that the generator
algorithm Gen(k, SI ) is different from the procedure
described in [9], since it outputs an array of integers
rather than a group of honeywords. Note, however, that
the index array Xi indeed represents which honeywords
are assigned to ui . In other words, the corresponding
honeyword will be the real password whose hash value
is f (xi,j ). After ci , Xi are obtained, ui , ci pair is delivered
to the honeychecker and F1 , F2 files are updated as
shown in Table 4.
Last, periodically honeyindexes of each account should
be regenerated. As the number of users in the system
increases to provide uniform distribution of honeyindexes
across SI , fresh honeyindex set must involve numbers
from this new larger list. Otherwise, passwords of newly
created accounts would not be used as honeywords in
the system and it may give a clue to the adversary to
in guessing the correct password of these new accounts.
Note that within a uniform distribution each password
is assigned as a honeyword about k times, because there
are N passwords but N k honeywords are needed.
4.3 Honeychecker
In our approach, the auxiliary service honeychecker is
employed to store correct indexes for each account and
we assume that it communicates with the main server
through a secure channel in an authenticated manner.
Indeed, it can be assumed that security enhancements
for honeychecker and the main server presented in [16]
are applied, but it is out scope of this study.
The role and primary processes of the honeychecker
are the same as described in the original study [9], except
that < i, ci > pair is replaced with < ui , ci > pair in our
case. The honeychecker executes two commands sent by
the main server:
Set: ci ,ui
Sets correct password index ci for the user ui .
Check: ui , j
Checks whether ci for ui is equal to given j.
Returns the result and if equality does not
hold, notifies system a honeyword situation.
Thus, the honeychecker only knows the correct index
for a username, but not the password or hash of the
password. In the following part, the functions of the
honeycheker are described.
4.2 Registration
After the initialization process, system is ready for user
registration. In this phase, a legacy-UI is preferred, i.e.
a username and password are required from the user
as ui , pi to register the system. We use the honeyindex
4.4 Login Process
System firstly checks whether entered password, g, is
correct for the corresponding username ui . To accomplish this, firstly the Xi of the corresponding ui is
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
8
SI
3
.
.
.
ci
.
.
.
..
.
100000
100004
SH
Username
Honeyindex Set
H(p3 )
.
.
.
H(pi )
.
.
.
..
.
H(p100000 )
H(p100004 )
agent-lisa
alexius
baba13
.
.
.
ui
.
.
.
zack tayland
zoom42
(93, 16626, . . . , 94931)
(15476, 51443, . . . , 88429)
(3, 62107, . . . , 91233)
.
.
.
Xi
.
.
.
(1009, 23471, . . . , 47623)
(63, 51234, . . . , 72382)
TABLE 4
After the Registration Process, Change of F2 is Illustrated on the Left, while Update of F1 is Shown on the Right
attained from the F1 file. Then, the hash values stored in
F2 file for the respective indices in Xi are compared with
H(g) to find a match. If a match is not obtained, then
it means that g is neither the correct password, nor one
of the honeywords, i.e. login fails. On the other hand,
if H(g) is found in the list, then the main server checks
whether the account is a honeypot. If it is a honeypot,
then it follows a predefined security policy against the
password disclosure scenario. Notice that for a honeypot
account there is no importance of the entered password
is genuine or a honeyword, so it directly manages the
event without communicating with the honeychecker. If,
however, H(g) is in the list and it is not a honeypot,
the corresponding j ∈ Xi is delivered to honeychecker
with username as < ui , j > to verify it is the correct
index. Honeychecker controls whether j = ci and returns
the result to the main server. At the same time, if it is
not equal, then it assured that the proffered password
is a honeyword and adequate actions should be taken
depending on the policy.
5 S ECURITY A NALYSIS
M ODEL
OF THE
•
•
P ROPOSED
In this section, we investigate the security of the proposed model against some possible attack scenarios.
Before, however, we elaborate on the attack strategies,
we will first state a set of reasonable assumptions about
our approach and the related security policies. We suppose that the adversary can invert most or many of the
password hashes in file F2 . Notice that the introduction
of this scheme comes with a DoS attack sensitivity in
which an adversary deliberately tries to login with honeywords to trigger a false alarm. Hence, the suggested
policies given below mostly focuses on minimizing the
DoS vulnerabilities.
• As described in Section 4.4 when a user logins
with a wrong password, but not a honeyword, the
login fails. If this wrong password is the password
of another account in the system and the same
user hits this situation more than once (trying with
other passwords in F2 ), the system should turn on
additional logging of the user’s activities to detect a
•
•
possible DoS attack and to attribute the adversary,
besides the incorrect login attempt case proceeds as
usual.
If a password, whose hash value is in the SH of
the F2 , is entered in wrong login attempts for more
than once, the system should take actions against
a possible DoS alarm. In this case the system suspects about the respective password such that it is
known by the adversary (possibly she created an
account with this password) and she aims to raise
a honeyword situation. Resultantly, the consecutive
wrong login attempts with this password gives rise
to a DoS warning and further activities of the user
are investigated by the admin as a precaution to
prevent a false honeyword alarm. Note that these
attempts may be done with a single username or
with different usernames.
In order to increase the number of unique passwords in the system, i.e. reduce common passwords,
users should be forced to adhere to a passwordcomposition policy like basic8 (8 or more characters), comprehensive8 (at least 8 characters, including an uppercase and lowercase letter, a symbol,
and a digit and not contain a dictionary word),
basic16 (16 or more characters) in the password
creation [17]. The main reason behind this item is
to minimize the number of common passwords in
the system: As detailed in Section 5.1, if the number
of common password increases, the chance of an
adversary realizing a DoS attack also increases.
A username should not be correlated with its password, Remark 3 should be considered. Otherwise,
the contribution of the honeywords for an account,
that has a correlated username password pair, will
be weakened. Although fulfilling this item is not
easy, some obvious vulnerable cases can be automatically rejected by the system by developing a
custom policy, e.g. the password string involves the
username as a suffix or prefix are not accepted as a
pasword.
To avoid occurrence of a high number of common
passwords in the system, the user should be driven
to choose another password when the created pass-
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
9
word is in the list of 1000 most common passwords.
Hence, chance of a possible DoS attack described
below will be reduced.
5.1 DoS Attack
Under this attack scenario as described in Section 3.1, the
adversary does not have the password files and their
contents. Her main purpose is to trigger a false alarm
and to raise a honeyword alarm situation, i.e. depending
on the policy some or all parts of the system may be out
of service or disabled unnecessarily. We suppose that the
adversary has knowledge m + 1 username and respective passwords in the system as (ua , pa , . . . , ua+m , pa+m );
maybe she intentionally created all of these accounts. In
this case, a plausible method for attacking the system
is creating m accounts with the same password as pz ,
while a single account, uy , has different password like
py and entering the system with the username uy and
the password pz . If pz is assigned by the system as a
honeyword, then the adversary mounts a DoS attack by
entering with the system < uy , pz > pair. Let Pr(pz ∈ Wy )
denote the probability that pz is assigned as one of the
honeywords for uy ; it is also the success probability of
the adversary for this attack. Since there are N − m
passwords different from pz 1 and k honeywords are
assigned to each account:
N − m k
.
(1)
Pr(pz ∈ Wy ) = 1 −
N
As an illustrative example for N = 1000000, k = 20 and
m = 100, from Eq. 1 an adversary succeeds in realizing
the described attack with a probability of 0.002. Note
that, the success of the adversary directly depends on
(m/N ), so for large values the chance of the adversary
will be increased. For instance if N = 1000, m = 10 and
k = 20 (as an extreme example, one out of 100 accounts is
created by the adversary), the success probability of the
adversary will be 0.18. Apart from this, the adversary
would like to perform the attack with more accounts
like py such that in each trial pz is tested. However,
from our assumptions we know that when a password in
the system is entered incorrectly by the same username
or different usernames for more than once, the system
suspects about a DoS attack and a DoS alarm should
be triggered depending on the next activities of the
user. Hence, the adversary cannot increase her chance
by making more trials for the same known password pz
without being noticed by the system.
One can question about what is the chance of a DoS attacker in hitting a honeyword in her next trials and what
happens in this situation, while her activities are logging
by the system due to a potential DoS activity. The answer
may depend on the logs of the attacker and the admin
1. In fact, an adversary may select a common password pz such
that it is already selected by another user, i.e. more than m passwords
would become same with pz . Nevertheless, it seems unlikely to find
a match with a common password, if a strong password-composition
policy is used in the system.
policy. First of all, in case of a DoS alarm, the system
may block all future requests/queries from the same
user/IP address and prevent a possible false honeyword
entrance. Another case should be, if the admin is not
sure about a DoS activity (not a high score is observed
for DoS), then it may ignore the honeyword situation
even the attacker hits a honeyword. In other words, it
permits the attacker to make more trials to gather more
information and ensure the situation. Also, the system
may drive the potential attacker to a fake system and
proceeds to investigate further. Another choice should be
showing a local reaction, i.e. password renewal process,
or freezing should be considered for the only suspected
accounts. As can be seen, many solutions to discriminate
a DoS activity from a real honeyword condition can be
illustrated.
5.2 Password Guessing
In this attack, we assume that the adversary has plundered password files F1 and F2 from the main server and
also obtained plaintext passwords by inverting the hash
values. Extracted F2 file (after inverting hashes) gives
< indexnumber, password > pairs to the adversary, but
they are not directly connected to a specific username. By
just analyzing this, she cannot exactly determine which
password belongs to which user. On the other hand,
F1 gives username, indexset pairs such that for each
username k possible passwords exist. Also, we suppose
that the adversary has no advantage in guessing the
correct password by using specific information about
the user, such as age, gender and nationality. If the
adversary randomly picks an account from the list in F1
and then tries to login with a guessed password, then
her success will depend on: First, the selected account
is not a honeypot (decoy) account. Second guessing the
correct password pi out of k sweetwords. Otherwise,
the adversary will be caught by the system due to a
honeyword or a honeyspot. Let Pr(success) represent
the probability that the adversary makes a correct guess
for a randomly picked username. Below, we express the
probability that the adversary, who makes random trials,
is not detected by the system, where we suppose the
number of honeypots in the system is T :
N −T 1
· .
(2)
N
k
√
A convenient choice for T should be N . For k = 20
and N = 1000000, she picks the correct password pi with
5% probability. Conversely, the adversary will caught by
the system in password guessing attack with a chance
of 95%, as long as the password does not carry any
information about the username. In contrast to the guess
probability in [9] which depends on number of honeywords, the chance depends on two factors – number
of honeywords and honeypots. Thus,√one can create
a higher number of honeypots than N , to increase
detection probability of the adversary.
Pr(success) =
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
10
5.3 Brute-force Attack
In this part, we consider the attack described in Section
3.2. We suppose that if a honeypot entrance is detected
by the system, it responds with a strong reaction, while
a light policy (not suggested) is executed in case of
a honeyword detection. So, we assume that even in
a honeyword detection the adversary may proceed to
make her trials due to light local policies. If, however,
a honeypot account is attempted then system follows a
strong policy e.g. demanding all users to renew their
passwords. From binomial distribution the probability
that the adversary hits at least one honeyspot in her
−T α
) . Even in this
α trials is P r(hit ≥ 1) = 1 − ( NN
case our approach provides resistance against such an
attack, because for α = 700, T = 1000, N = 1000000
values this probability tends to 0.5. It is equivalent to
say that in brute-force guess attack, it is likely that
the adversary hits a honeypot and system detects the
password disclosure situation.
5.4 Same User in Multiple Systems
In [9], the attack scenarios such that a user reuse passwords on two different systems as A and B are investigated. For example suppose A uses honeywords
and B has prevalent password storage techniques and
a target user ui shares her password across these two
systems. In this case, if an adversary compromises B, it
is apparent that honeywords assigned for this user in A
contributes nothing at all. Conversely, if the adversary
pilfers passwords from A, she can try all sweetwords
of the common user ui in A to verify which is the
correct password by submitting to B. If a honeyword is
entered to B, it results in an incorrect password screen,
while the adversary successfully logins in case of the
correct password. Notice that our proposed model is also
vulnerable these scenarios: Indeed, if the password is not
same but correlated for a user in two distinct domains,
then first scenario may be still valid. For example a
user has password bond007 in B which does not use
honeywords. On the other side same user has password
james007 in domain A which assigns honeywords to
these user. Then it is highly possible that an adversary
extracts the correct password from the sweetwords, if
she has knowledge of bond007. So, both of the original
method and our approach can not provide resistance
against such conditions, as long as users select same or
highly correlated passwords in different domains.
6
C OMPARISON
OF HONEYWORD GENERA -
TION MODELS
In this section, we give a comparison of the generation
methods including our proposed model with respect to
storage cost, DoS resistance and flatness of each algorithm. Before discussing these issues in detail, we would
like to talk about how the proposed model changes total
hash inversion effort of an adversary who has a leaked
password file (F1 and F2 files for our case). In fact, as
mentioned in Section 1, defending and detecting are two
different issues from the point of password security. For
example, by realizing the salted-high iteration password
storage techniques, inverting a hash from a captured
password file becomes time consuming, i.e. it makes
extracting plaintext passwords harder. On the other
hand, the main purpose behind the use of honeywords is
providing a detection mechanism in case of a password
disclosure occurrence. Although throughout this paper
in security analysis we assume that an adversary has
capability of inverting most or many of the password
hashes in the password file, in the following, we address
this subject.
In a traditional password based system with a number of N users, an adversary may have at most N
user/password pairs through a cracking process. Suppose that all cryptographic efforts in recovering a plaintext word from the respective hash string stored in
password hash file is represented as a single hash inversion operation. Then, by considering the traditional
system, for each user the attacker has to perform one
hash inversion operation, while for the whole system
N operations must be executed. On the other hand,
according to the Juels and Rivest’s method she needs to
launch k operations for a specific user and kN operations
for the whole space, since for each user k sweetwords
are assigned. Thus, one can see that the adversary has to
spend k times more effort for each case in this method.
Now, for our proposed model, if the attacker focuses
on a specific user, she must still try k hash inversions
to reveal all possible passwords for this target user.
However, since each honeyword is indeed password
of another user, revealing all passwords in the system
requires N operations as in the case of the traditional
system. Therefore, taking the total password-cracking
cost of the adversary for the whole system into account,
we can say that Juels and Rivest’s model requires higher
effort for an attacker in retrieving plaintext passwords.
Notice that, this feature of our model at worst reduces
total hash inversion work to it was at the traditional
model. Besides, we want to stress again that our paper
discusses what can be done in terms of detecting the
password disclosure, when the whole plaintext forms
of the passwords are available to an adversary, rather
than dealing with making the adversary’s work harder
in getting plaintext passwords.
6.1 Storage Cost
In this part, we compute storage requirement of our
method and compare it with that in [9]. A typical password file system requires hN plus storage for usernames,
where N stands for the number of users in the system
and h denotes length of password hash in bytes. On
the other hand this is khN for [9], where k denotes
the number of the sweetwords assigned to each account.
Notice that we ignored the storage cost stemmed from
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
11
usernames, since it is not changed after adaptation of
the honeywords. The authors also propose a storage
optimization technique for the chaffing-by-tweaking model
such that keeping only hash of a single sweetword,
vi,r in database would be enough, because the main
server can compute all possible honeywords from an
entered proffered password g, e.g. T (g) then check hash
of each element in T (g) with stored value vi,r in run
time. The authors claim that for a small value of t, |T (g)|
will be reasonable. For example if t = 2 is selected in
case of ”chaffing-by-tweaking-digits”, |T (g)| becomes 100.
Although the solution works and it is an affordable
computation cost for the main server, we argue about its
applicability, e.g. for each login attempt the server makes
100 more hash computation just to save some storage
space.
For our approach we assume that each index requires
4 bytes and the storage cost becomes2 :
4kN + hN + 4N.
(3)
To measure the gain in storage compared to original
method, we give the ratio as:
4k + h + 4
4kN + hN + 4N
=
.
khN
kh
Notice that this ratio is independent of the number of
users and it is less than one for realistic values of k and h.
For example let used hash function be SHA-1, i.e. h = 20
bytes and k = 20 as mentioned in [9], then this ratio will
be about 0.25. In other words, for this case our approach
needs 1/4 of storage of the original method. Also note
that, as k increases storage cost of our scheme is affected
by the term 4kN , while this is hkN for the methods of
[9]. So for practical values of h, such as 16 for MD5, 20
for SHA-1 and 32 for SHA-256, growth in storage cost of
our method will be less than those of the original ones.
6.2 DoS Resistance
In Section 3.1, we show that the chaffing-with-tweakingmodel may suffer from a DoS attack, due to predictability
of the honeywords. Unlikely, the chaffing-with-a-passwordmodel provides resistance against such an attack, because
honeywords are generated by using a list of passwords
such that they may be independent from the correct
password. In this context, a detailed security analysis
of our proposed model is presented in Section 5.1 and
we claim that our scheme also thwarts a realizable DoS
attack as long as the password policies in Section 5 are
adapted and the users obey these tenets in the password
creation. Note that the authors in [9] avoids direct use of
a password list to eliminate a DoS attack threat in case of
very common passwords exist in the list. As opposed to
this idea, our proposed scheme uses password list in the
system as honeywords of a user. However as stated in
2. In order to make comparable results, we discarded the storage
cost for honeypots– it needs (4kT + hT + 4T ) bytes of storage for T
honeypots.
Section 5, adaptation of a strong password composition
policy likely prevents occurrence of common passwords
in high numbers, i.e. probability of a common password
is assigned as a honeyword for a specific user will be
negligibly low. Although, an adversary may hit a real
password using a common password in the system, it
is not necessarily a honeyword for the corresponding
account. Thus, use of real passwords as honeywords
does not cause a DoS weakness. Last but not least issue is
that in our proposed model in addition to honeywords,
honeypots are employed to detect a password disclosure.
This facilitates showing a strong response to actions of
an adversary, because entering a honeypot account with
one of its sweetwords ensures occurrence of a password
leakage. In other words, in our approach administrator
should take stronger actions in case of a honeypot attempt compared to entering with honeywords in order
to diminish DoS vulnerability.
6.3 Flatness
Remark 1 demonstrates that the chaffing-with-tweakingmodel may leave traces to an adversary in distinguishing
the genuine password from the honeywords. As can be
inferred from this analysis, the superior method of [9] is
the chaffing-with-a-password-model, because the produced
honeywords may seem like user passwords from the
perspective of the adversary. Success of the method in
flatness depends on how password-model is constructed,
for instance the modeling syntax yields honeywords depending composition of the user password, thereby a
perfect user like behaviour cannot be provided. On the
other hand, the simple model described in the study
may satisfy the distribution of honeywords like user
passwords by using a list of real passwords. For our
proposed model as described previously passwords of
other users become honeywords for a user. Hence, our
model satisfies perfect flatness as long as the correct
password is not correlated with username as pointed in
Remark 3 and investigation of a target user profile (age,
gender, religion etc.) gives no advantage to an adversary
in password guessing. Comparing our method with the
simple model, one can see that our method is better than
the latter in terms of flatness: The honeywords in the
former carry all characteristics of the real passwords
in the same system, while the simple model generates
honeywords artificially despite using real passwords of
different list. For example, it is well known that users
choose segregate their passwords for more-secure and
low-secure sites [18], [19]. In [20], it is presented that
reuse rate of weaker passwords is higher than those of
stronger passwords, since the stronger ones are usually
created for higher-security sites e.g. banking accounts.
Consequently, a password list from a lower-security site
password list which is used in the simple model for
a higher-security site may not be natural. Also, just
consider the user passwords for football or movie fan
websites. Intuitively, it is likely that many passwords will
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
12
Method
Tweaking
Password-model
Our model
DoS Resistance
weak
strong
strong
Flatness
weak
strong †,‡
strong ‡
Storage Cost
hN ∗
khN
4kN + hN + 4N
TABLE 5
Comparison of the Honeyword Generator Models
be related to the context, e.g. passwords include names
of heroes, actors, football players or team clubs for movie
and football fan sites respectively. Hence, honeywords
generated by relying on a general real password list may
not exactly match the context of such a specific website,
i.e. an unequivocal pattern incompatibility may exist.
This eventually may lead to advantage of an adversary
in distinguishing the honeywords.
6.4 Usability
In this part, we compare our approach with the simple
model in terms of practicality and ease of use. By
considering the simple model whose password list
is constructed with composition of numerous real
passwords and randomly generated passwords, one can
argue about how the real password source is provided.
If the same resource of real passwords is used in
different sites, similar inherited weaknesses related to
honeyword generation may be observed. Nonetheless, if
use of publicly available password lists is forbidden (as
suggested by the authors), then it will not be easy to get
required large number of real passwords. Conversely,
our approach does not need to use an external real
password resource in honeyword generation, rather it
just feeds itself. Therefore, we claim that our approach
is simpler and more practical for implementation.
The comparison results are summarized in Table 5. Note
that the same expressions of [9] are used for this table
entries. By weak DoS resistance we mean an adversary
who knows the password can hit the one of corresponding honeywords with a non-negligible chance; while by
strong we mean that this chance is ignorably small. The
† is used for condition that its strength depends on how
the real password list is used, e.g. the modeling syntax
may fail as noted in Remark 2. The ‡ is used to mean
that condition is satisfied except the case of Remark 3.
Also ∗ indicates optimization technique is considered in
storage cost calculation.
7
C ONCLUSION
In this study, we have analyzed the security of the
honeyword system and addressed a number of flaws
that need to be handled before successful realization of
the scheme. In this respect, we have pointed out that the
strength of the honeyword system directly depends on
the generation algorithm, i.e. flatness of the generator
algorithm determines the chance of distinguishing the
correct password out of respective sweetwords. Another
point that we would like to stress is that defined reaction policies in case of a honeyword entrance can be
exploited by an adversary to realize a DoS attack. This
will be a serious threat if the chance of an adversary in
hitting a honeyword given the respective password is
not negligible. To combat such a problem, also known
as DoS resistance, low probability of such an event
must be guaranteed. This can be achieved by employing
unpredictable honeywords or altering system policy to
minimize this risk. Hence, we have noted that the security policy should strike a balance between DoS vulnerability and effectiveness of honeywords. Furthermore, we
have demonstrated the weak and strong points of each
method introduced in the original study. It has been
shown that DoS resistance of the chaffing-by-tweaking
method is weak and also its flatness can be questioned
by regarding Remark 1. Although some weaknesses of
the chaffing-by-tweaking techniques are accepted by their
creators, we believe that it should not be considered
as alternative method due to its predictable nature and
a potential DoS weakness. Moreover, the chaffing-withtough nuts model has been investigated, and we have
doubted about its favour as opposed to ideas of Juels and
Rivest. On the other hand, the chaffing-with-a-passwordmodel can fulfill its claims provided that the generator
algorithm is flat. Nevertheless, how the source of the
real passwords is attained for this model should be
answered before judging its applicability. Finally, we
have presented a new approach to make the generation
algorithm as close as to human nature by generating
honeywords with randomly picking passwords that belong to other users in the system. We have compared
the proposed model with other methods with respect
to DoS resistance, flatness, storage cost and usability
properties. The comparisons have indicated that our
scheme has advantages over the chaffing-with-a-passwordmodel in terms of storage, flatness and usability.
In the future, we would like to refine our model by
involving hybrid generation algorithms to also make the
total hash inversion process harder for an adversary in
getting the passwords in plaintext form from a leaked
password hash file. Hence, by developing such methods
both of two security objectives – increasing the total
effort in recovering plaintext passwords from the hashed
lists and detecting the password disclosure – can be
provided at the same time.
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
13
A PPENDIX A
H ONEYPOT G ENERATION
ACKNOWLEDGMENTS
Note that there are two primary reasons of adopting honeypots in our model. 1) For the first users
of the system, there should be a previously prepared
password-pool from where passwords are assigned
as honeywords for these users. So, honeypot passwords indeed make up this initial password-pool. 2)
By means of the honeypots, the proposed model reduces the success probability of the brute-force attack
as explained in Section 5.3. In this appendix, we discuss how honeypot accounts and their respective passwords should be generated. The usernames may be
fictional and can be produced by automated software
programs and scripts, e.g. spam trap, spammer address
and online fake account generators [21], [22], [23] (see
also http://www.fakenamegenerator.com). On the other
hand, to generate passwords for these accounts we adapt
a similar approach of [8]: The method uses a fixed
dictionary3 that includes different length of words, to
pick up random dictionary words. Firstly, the length
of the password ℓ is randomly determined such that it
conforms to the password policy of the system. Next,
the composition of the password is randomly chosen
like La + Db + Lc or Da + Lb + Dc , where a, b, c ≥ 0
and a + b + c = ℓ. For example, L3 + D2 + L4 means
3-letter word followed by 2-digit number and then a
4-letter word. The specified length words are chosen
from the dictionary randomly. Concatenation of each
term results in the candidate password for the honeypot
account. Note that the password may consist of all digits
or a single word, since each of a, b, c can be 0. As an
illustrative example we have chosen the basic8 password
policy as the basis and used an English dictionary of
69903 words. Here is a list of some honeypot passwords
generated by this model with the La + Db + Lc pattern:
anesthesia6
90770807
4930dresden
orcinus51
extort484con
turkistan2by
yearnstud
rubberized
kicking0
builttsar
overturned5
expert506
endo3mom
03803bays
claro331
family21java
silhoutte
titan9285
mosquito
crabbedness
rundown09
One criticize that the produced words from this algorithm may still be distinguished with a human logic.
Nevertheless, as the number of users in the system
increases, the contribution of honeypot passwords to
the honeywords diminishes. That is, as an example for
T = 1000, N = 1000000 values, only one out of 1000
honeywords will be a honeypot password. Thus, the
effect of honeypot passwords in flatness of the system
should be ignored.
The author would like to thank the anonymous reviewers for their valuable comments and suggestions that
greatly improved the quality of this work.
R EFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
3. The dictionary can accommodate different languages depending
on context of the user group.
D. Mirante and C. Justin, “Understanding Password Database
Compromises,” Dept. of Computer Science and Engineering Polytechnic Inst. of NYU, Tech. Rep. TR-CSE-2013-02, 2013.
A. Vance, “If Your Password is 123456, Just Make It Hackme,”
The New York Times, vol. 20, 2010.
K. Brown, “The Dangers of Weak Hashes,” SANS Institute InfoSec
Reading Room, Tech. Rep., 2013.
M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek, “Password
Cracking Using Probabilistic Context-Free Grammars,” in Security
and Privacy, 30th IEEE Symposium on. IEEE, 2009, pp. 391–405.
F. Cohen, “The Use of Deception Techniques: Honeypots and
Decoys,” Handbook of Information Security, vol. 3, pp. 646–655, 2006.
M. H. Almeshekah, E. H. Spafford, and M. J. Atallah, “Improving
Security using Deception,” Center for Education and Research
Information Assurance and Security, Purdue University, Tech.
Rep. CERIAS Tech Report 2013-13, 2013.
C. Herley and D. Florencio, “Protecting financial institutions from
brute-force attacks,” in SEC’08, 2008, pp. 681–685.
H. Bojinov, E. Bursztein, X. Boyen, and D. Boneh, “Kamouflage:
Loss-resistant Password Management,” in Computer Security–
ESORICS 2010. Springer, 2010, pp. 286–302.
A. Juels and R. L. Rivest, “Honeywords: Making Passwordcracking Detectable,” in Proceedings of the 2013 ACM SIGSAC
Conference on Computer & Communications Security, ser. CCS
’13. New York, NY, USA: ACM, 2013, pp. 145–160. [Online].
Available: http://doi.acm.org/10.1145/2508859.2516671
M. Burnett, “The Pathetic Reality of Adobe Password Hints,”
https://xato.net/windows-security/adobe-password-hints.
J. Bonneau, “The science of guessing: Analyzing an anonymized
corpus of 70 million passwords,” in Security and Privacy (SP), 2012
IEEE Symposium on. IEEE, 2012, pp. 538–552.
D. Malone and K. Maher, “Investigating the Distribution of
Password Choices,” in Proceedings of the 21st International
Conference on World Wide Web, ser. WWW ’12. New York,
NY, USA: ACM, 2012, pp. 301–310. [Online]. Available:
http://doi.acm.org/10.1145/2187836.2187878
M.
Burnett,
“10000
Top
Passwords,”
https://xato.net/passwords/more-top-worst-passwords/.
L. V. Ahn, M. Blum, N. J. Hopper, and J. Langford, “CAPTCHA:
Using Hard AI Problems for Security,” in Proceedings of the 22nd
International Conference on Theory and Applications of Cryptographic
Techniques–EUROCRYPT’03, ser. Lecture Notes in Computer Science, vol. 2656. Berlin, Heidelberg: Springer-Verlag, 2003, pp.
294–311.
L. Zhao and M. Mannan, “Explicit Authentication Response
Considered Harmful,” in Proceedings of the 2013 Workshop
on New Security Paradigms Workshop–NSPW ’13. New York,
NY, USA: ACM, 2013, pp. 77–86. [Online]. Available:
http://doi.acm.org/10.1145/2535813.2535822
Z. A. Genc, S. Kardas, and K. M. Sabir, “Examination of a New
Defense Mechanism: Honeywords,” Cryptology ePrint Archive,
Report 2013/696, 2013.
P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas,
L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and
gain and again): Measuring Password Strength by Simulating
Password-cracking Algorithms,” in Security and Privacy (SP), 2012
IEEE Symposium on. IEEE, 2012, pp. 523–537.
J. Bonneau and S. Preibusch, “The Password Thicket: Technical
and Market Failures in Human Authentication on the Web,” in
WEIS, 2010.
G. Notoatmodjo and C. Thomborson, “Passwords and Perceptions,” in Proceedings of the Seventh Australasian Conference on
Information Security–AISC 2009. Australian Computer Society,
Inc., 2009, pp. 71–78.
D. Florencio and C. Herley, “A Large-scale Study of Web Password Habits,” in Proceedings of the 16th international conference on
World Wide Web. ACM Press, 2007, pp. 657–666.
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TDSC.2015.2406707, IEEE Transactions on Dependable and Secure Computing
14
[21] A. Pathak, “An Analysis of Various Tools, Methods and Systems
to Generate Fake Accounts for Social Media,” Ph.D. dissertation,
Northeastern University Boston, 2014.
[22] D. Nagamalai, B. C. Dhinakaran, and J. K. Lee, “An In-depth
Analysis of Spam and Spammers,” arXiv preprint arXiv:1012.1665,
2010.
[23] C. Biever, “Project Honeypot to Trap Spammers,” New scientist,
no. 2485, p. 26, 2005.
Imran Erguler received his B.Sc., M.Sc. and
Ph.D. degrees in Electrical and Electronics
Engineering from Bogazici University-Istanbul,
Turkey in 2003, 2005 and 2011 respectively.
He has been a chief researcher at National
Research Institute of Electronics & Cryptology
TUBITAK BILGEM, in Kocaeli, Turkey since
2005. His primary research interests include
cryptography, security for signal processing, privacy and network security.
1545-5971 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.