Data Leakage Detection
ABSTRACT
This paper proposes a system that detects data leakage patterns using a convolutional
neural network (CNN), based on a definition of data-leaking behaviors. Each leakage
detection scenario is composed of the occurrence patterns of administrative security logs
and the relationships between those logs, which are identified by association
relationship analysis. The proposed system then determines whether data has been leaked
by feeding a graph of insider malicious behavior to the CNN. Because each graph is drawn
according to a data leakage detection scenario, the system can identify the malicious
insider, as well as the source of the malicious behavior, from the output of the CNN.
Performance experiments on a virtual scenario show that the system can determine whether
data has been leaked even when a new malicious pattern that has not been previously
defined is input. In addition, compared with other data leakage detection systems, the
proposed system detects data leakage more flexibly.
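The abstract does not say how the association relationship analysis between security logs is carried out. As a minimal sketch, assuming it means pairwise association rules (support and confidence) between event types that co-occur in the same user session, the relationships could be mined as follows; the function name, event names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[str, str, float, float]

def association_rules(sessions: List[Set[str]],
                      min_support: float = 0.5,
                      min_confidence: float = 0.8) -> List[Rule]:
    """Mine pairwise rules "x -> y" between security-log event types.

    support    = fraction of sessions containing both x and y
    confidence = sessions containing both / sessions containing x
    """
    n = len(sessions)
    single: Counter = Counter()
    pair: Counter = Counter()
    for events in sessions:
        single.update(events)
        # Count each unordered event pair at most once per session.
        pair.update(combinations(sorted(events), 2))
    rules: List[Rule] = []
    for (a, b), both in pair.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair yields up to two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = both / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

For example, if "usb_copy" appears in three of four sessions and always alongside "db_query", the sketch reports the rule usb_copy → db_query with support 0.75 and confidence 1.0, the kind of related log pattern a detection scenario could be built from.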
EXISTING SYSTEM
We consider applications where the original sensitive data cannot be perturbed. Perturbation
is a very useful technique in which the data are modified and made “less sensitive” before
being handed to agents. In some cases, however, it is important not to alter the distributor’s
original data. For example, a hospital may give patient records to researchers who will
devise new treatments. Similarly, a company may have partnerships with other companies that
require sharing customer data, and another enterprise may outsource its data processing, so
data must be given to various other companies. We call the owner of the data the distributor
and the supposedly trusted third parties the agents.
Traditionally, leakage detection is handled by watermarking: a unique code is embedded in
each distributed copy. If that copy is later discovered in the hands of an unauthorized party,
the leaker can be identified. Watermarks can be very useful in some cases, but again, they
involve some modification of the original data. Furthermore, watermarks can sometimes be
destroyed if the data recipient is malicious.
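To make the watermarking baseline concrete, here is a minimal sketch, assuming a per-agent tag derived with an HMAC over the agent name and the record; the key, the `#` separator, and the function names are hypothetical and not part of any real watermarking scheme. It shows both points made above: the distributed copy is modified, and stripping the mark defeats tracing.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret known only to the distributor.
SECRET_KEY = b"distributor-secret"

def watermark(record: str, agent: str) -> str:
    """Return the agent's copy of a record with an agent-specific tag appended.

    Note that this modifies the data, which is what the leakage-detection
    approach in this paper tries to avoid.
    """
    digest = hmac.new(SECRET_KEY, f"{agent}|{record}".encode(), hashlib.sha256)
    return f"{record}#{digest.hexdigest()[:8]}"

def identify_leaker(leaked_copy: str, agents: List[str]) -> Optional[str]:
    """Return the agent whose watermarked copy matches the leaked copy, if any."""
    record, sep, _tag = leaked_copy.rpartition("#")
    if not sep:
        return None  # The mark was stripped, so the leaker cannot be traced.
    for agent in agents:
        expected = watermark(record, agent)
        if hmac.compare_digest(expected.encode(), leaked_copy.encode()):
            return agent
    return None
```

A copy produced by `watermark("patient-17,diabetes", "partner_B")` is traced back to `partner_B`, while the same record with its tag removed returns `None`.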
PROPOSED SYSTEM
Our goal is to detect when the distributor’s sensitive data has been leaked by agents, and if
possible to identify the agent that leaked the data. We develop unobtrusive techniques for
detecting leakage of a set of objects or records. In this section we develop a model for
assessing the “guilt” of agents. We also present algorithms for distributing objects to agents,
in a way that improves our chances of identifying a leaker. Finally, we also consider the
option of adding “fake” objects to the distributed set. Such objects do not correspond to real
entities but appear realistic to the agents. In a sense, the fake objects act as a type of
watermark for the entire set, without modifying any individual members. If it turns out that
an agent was given one or more fake objects that were leaked, then the distributor can be
more confident that the agent was guilty.
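The role of fake objects can be illustrated with a minimal sketch, assuming the distributor records which objects (real and fake) each agent received; the function name and object identifiers are hypothetical. A real object held by several agents cannot single out a leaker, but a leaked fake object points directly to the agent(s) that received it.

```python
from typing import Dict, List, Set

def trace_fake_objects(allocations: Dict[str, Set[str]],
                       fake_objects: Set[str],
                       leaked: Set[str]) -> Dict[str, List[str]]:
    """Map each agent to the fake objects it received that appear in the leak.

    Fake objects do not correspond to real entities, so (by assumption) the
    target could not have gathered them independently; finding one in the
    leaked set points back to the agent(s) that received it.
    """
    evidence: Dict[str, List[str]] = {}
    for agent, objects in allocations.items():
        leaked_fakes = sorted(objects & fake_objects & leaked)
        if leaked_fakes:
            evidence[agent] = leaked_fakes
    return evidence
```

If U1 receives {t1, t2, f1} and U2 receives {t1, t3, f2}, a leak of {t1, f2} yields evidence only against U2, even though the real object t1 was held by both agents.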
C. OPTIMIZATION MODULE
The Optimization Module handles the distributor’s allocation of data to agents, which has
one constraint and one objective. The distributor’s constraint is to satisfy the agents’
requests, by providing them with the number of objects they request or with all available
objects that satisfy their conditions. The distributor’s objective is to be able to detect an
agent who leaks any portion of the distributor’s data. Users can also lock and unlock files
for security.
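The constraint and objective above can be illustrated with a simple greedy heuristic. This is a minimal sketch under stated assumptions, not the allocation algorithm of this paper: it handles only requests for a given number of objects (not condition-based requests), and it assumes that spreading objects so each is held by as few agents as possible makes a leaked object point to fewer candidate leakers. The function name is hypothetical.

```python
from typing import Dict, List, Set

def allocate_min_overlap(objects: List[str],
                         requests: Dict[str, int]) -> Dict[str, Set[str]]:
    """Greedy sketch: give each agent exactly the number of objects it requests
    (the constraint), always choosing the object currently held by the fewest
    agents, so that agents' sets overlap as little as possible (the objective)."""
    holders = {obj: 0 for obj in objects}  # how many agents hold each object
    allocation: Dict[str, Set[str]] = {agent: set() for agent in requests}
    for agent, m in requests.items():
        if m > len(objects):
            raise ValueError(f"{agent} requests more objects than exist")
        for _ in range(m):
            # Least-shared object this agent does not hold yet (ties broken by name).
            obj = min((o for o in objects if o not in allocation[agent]),
                      key=lambda o: (holders[o], o))
            allocation[agent].add(obj)
            holders[obj] += 1
    return allocation
```

With four objects and two agents requesting two each, the sketch returns disjoint sets; when the requests total more than the available objects, some overlap is unavoidable and the heuristic only keeps it small.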
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place. The distributor
must assess the likelihood that the leaked data came from one or more agents, as opposed
to having been independently gathered by other means. The administrator can view which
file is leaking, along with the details of fake users.
C. AGENT GUILT MODULE
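The probability of guilt Pr{Gi|S}, where Gi is the event that agent Ui is guilty and S is
the set of leaked objects, can be computed by estimating the probability that the target
could have guessed (or otherwise obtained) the objects in S. The proposed guilt model makes
two assumptions. The first assumption is that the source of a leaked object can be any of
the agents holding it (or the target itself), independently of the sources of the other
leaked objects. The second assumption is that an object t in S can only be obtained in one
of two ways: a single agent Ui leaked t from its own set of objects, or the target obtained
t by other means without the help of any agent. Under these assumptions, the probability
that agent Ui leaked object t to S is

$$
\Pr\{U_i \text{ leaked } t \text{ to } S\} =
\begin{cases}
\dfrac{1-p}{|V_t|}, & \text{if } U_i \in V_t,\\
0, & \text{otherwise,}
\end{cases}
$$

where p is the probability that the target obtained t by other means and Vt is the set of
agents that received t.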
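The guilt computation can be sketched as follows. Here p is taken to be the probability that the target obtained an object by other means, and V_t the set of agents that received object t. This section stops at the per-object probability; combining it into Pr{Gi|S} = 1 − ∏ (1 − Pr{Ui leaked t to S}), over the leaked objects that Ui holds, is an assumption of this sketch that follows from treating the sources of different leaked objects as independent. The function names are hypothetical.

```python
from typing import Dict, Set

def prob_leaked_by(agent: str, t: str,
                   allocations: Dict[str, Set[str]], p: float) -> float:
    """Pr{U_i leaked t to S} = (1 - p) / |V_t| if U_i is in V_t, else 0."""
    v_t = [a for a, objects in allocations.items() if t in objects]  # agents holding t
    if agent not in v_t:
        return 0.0
    return (1 - p) / len(v_t)

def guilt_probability(agent: str, leaked: Set[str],
                      allocations: Dict[str, Set[str]], p: float) -> float:
    """Pr{G_i | S}: the agent is innocent only if it leaked none of the leaked
    objects it holds; per-object leak events are treated as independent."""
    prob_innocent = 1.0
    for t in leaked & allocations[agent]:
        prob_innocent *= 1 - prob_leaked_by(agent, t, allocations, p)
    return 1 - prob_innocent
```

For example, with p = 0.2, U1 holding {t1, t2}, U2 holding {t1, t3}, and S = {t1, t2}, each agent is assigned 0.4 for the shared object t1, but only U1 holds t2 (probability 0.8), so U1’s guilt is 1 − 0.6 × 0.2 = 0.88 while U2’s is 0.4.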
CONCLUSION
From this study we conclude that the data leakage detection model is very useful compared
with the existing watermarking model. It can protect our data during distribution or
transmission and can even detect whether the data has been leaked. Thus, this model
provides both security and a tracking system. Watermarking protects data only by embedding
a code in each distributed copy, which modifies the data, whereas this model provides
security plus a detection technique. Our model is relatively simple, but we believe
that it captures the essential trade-offs. The algorithms we have presented implement a
variety of data distribution strategies that can improve the distributor’s chances of
identifying a leaker. We have shown that distributing objects judiciously can make a
significant difference in identifying guilty agents, especially in cases where there is a
large overlap in the data that agents must receive. Our future work includes investigating
agent guilt models that capture other leakage scenarios.