DATA LEAKAGE DETECTION-document
Chapter-1
INTRODUCTION
1.1 Project Description
In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a company may have partnerships with other companies that require sharing customer data, or an enterprise may outsource its data processing, so data must be given to various other companies. Our goal is to detect when the distributor's sensitive data have been leaked by agents and, if possible, to identify the agent that leaked the data.
Perturbation is a very useful technique in which the data are modified and made "less sensitive" before being handed to agents. For example, one can replace exact values by ranges, or add random noise to certain attributes. Traditionally, leakage detection is handled by watermarking. We articulate the need for watermarking database relations to deter their piracy, identify the unique characteristics of relational data that pose new challenges for watermarking, and describe the desirable properties of a watermarking system for relational data. A watermark can be applied to any database relation whose attributes are such that changes in a few of their values do not affect the applications. In watermarking, a unique code is embedded in each distributed copy; if that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. However, watermarks can sometimes be destroyed if the data recipient is malicious.
In this project, we study unobtrusive techniques for detecting leakage of a set of objects or records. Specifically, we study the following scenario: after giving a set of objects to agents, the distributor discovers some of those same objects in an unauthorized place. At this point, the distributor can assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. Using an analogy with cookies stolen from a cookie jar: if we catch Freddie with a single cookie, he can argue that a friend gave him the cookie; but if we catch Freddie with five cookies, it will be much harder for him to argue that his hands were not in the cookie jar. If the distributor sees "enough evidence" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings.
We develop a model for assessing the "guilt" of agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker, and we consider the option of adding "fake" objects to the distributed set to further improve the probability of identifying leakages.
1.3 Objectives:
• A data distributor has given sensitive data to a set of supposedly trusted agents (third parties).
• Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop).
• The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
• We propose data allocation strategies (across the agents) that improve the probability of identifying leakages.
• These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.
• Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.
Chapter-2
2.1 Literature Survey
2.1.1 Existing System:
In the existing system, we consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique in which the data is modified and made "less sensitive" before being handed to agents. However, in some cases it is important not to alter the original distributor's data. Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases but, again, involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or
somebody’s laptop). The distributor must assess the likelihood that the leaked data came from
one or more agents, as opposed to having been independently gathered by other means. We
propose data allocation strategies (across the agents) that improve the probability of
identifying leakages. These methods do not rely on alterations of the released data (e.g.,
watermarks). In some cases we can also inject realistic but fake data records to further
improve our chances of detecting leakage and identifying the guilty party.
2.1.2 Inference:
In a perfect world there would be no need to hand over sensitive data to agents that may
unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a
perfect world we could watermark each object so that we could trace its origins with absolute
certainty. However, in many cases we must indeed work with agents that may not be 100%
trusted, and we may not be certain if a leaked object came from an agent or from some other
source, since certain data cannot admit watermarks. In spite of these difficulties, we have
shown it is possible to assess the likelihood that an agent is responsible for a leak, based on
the overlap of his data with the leaked data and the data of other agents, and based on the
probability that objects can be guessed by other means. Our model is relatively simple, but
we believe it captures the essential trade-offs. The algorithms we have presented implement a
variety of data distribution strategies that can improve the distributor’s chances of identifying
a leaker. We have shown that distributing objects judiciously can make a significant
difference in identifying guilty agents, especially in cases where there is large overlap in the
data that agents must receive.
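The likelihood assessment described above can be illustrated in code. The following is a minimal sketch, not this report's exact model: it assumes each leaked object was either guessed from other sources (with probability p) or leaked by one of the agents holding it, with blame shared equally among the holders; all class and method names are ours.

```java
import java.util.*;

// Sketch of a guilt estimate for one agent. An agent looks guilty unless every
// leaked object it holds can be explained away: guessed from other sources
// (probability p) or leaked by one of the other agents holding the same object.
public class GuiltEstimate {
    // leaked: objects found in the unauthorized place
    // holders: for each leaked object, how many agents were given it
    // agentObjects: the objects this agent received
    static double guiltProbability(List<String> leaked, Map<String, Integer> holders,
                                   Set<String> agentObjects, double p) {
        double notGuilty = 1.0;
        for (String t : leaked) {
            if (!agentObjects.contains(t)) continue; // this agent never had t
            int v = holders.get(t);
            // t was NOT leaked by this agent if it was guessed (p) or, otherwise,
            // leaked by one of the other v-1 holders (blame shared equally).
            notGuilty *= p + (1 - p) * (v - 1) / (double) v;
        }
        return 1.0 - notGuilty;
    }

    public static void main(String[] args) {
        List<String> leaked = Arrays.asList("t1", "t2", "t3");
        Map<String, Integer> holders = new HashMap<>();
        holders.put("t1", 1); holders.put("t2", 2); holders.put("t3", 2);
        Set<String> agent = new HashSet<>(Arrays.asList("t1", "t2"));
        // t1 is held only by this agent, so guilt is high:
        System.out.printf("%.3f%n", guiltProbability(leaked, holders, agent, 0.1)); // prints 0.945
    }
}
```

Note how the estimate sharpens with overlap: an object held by a single agent is damning evidence, while an object shared by many agents contributes little, which is exactly why the allocation strategies below try to minimize overlap.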
Chapter-3
METHODOLOGY
Problem Setup and Notation:
A distributor owns a set T = {t1, …, tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size; e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
1. Sample request: the agent asks for some number mi of objects, and the distributor chooses which objects to send.
2. Explicit request: the agent asks for all objects in T that satisfy a given condition.
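The two request types can be sketched as follows; the class and method names are our illustrative assumptions, not the report's notation.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of the two agent request types. An explicit request returns every
// object in T that satisfies the agent's condition; a sample request returns
// a random subset of the requested size m_i.
public class Requests {
    static List<String> explicitRequest(List<String> T, Predicate<String> cond) {
        List<String> out = new ArrayList<>();
        for (String t : T) if (cond.test(t)) out.add(t);
        return out;
    }

    static List<String> sampleRequest(List<String> T, int mi, Random rnd) {
        List<String> copy = new ArrayList<>(T);
        Collections.shuffle(copy, rnd); // random order, then take the first m_i
        return copy.subList(0, Math.min(mi, copy.size()));
    }

    public static void main(String[] args) {
        List<String> T = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(explicitRequest(T, t -> t.compareTo("t3") >= 0)); // prints [t3, t4]
        System.out.println(sampleRequest(T, 2, new Random()).size());        // prints 2
    }
}
```

The key difference for leakage detection is that a sample request leaves the distributor free to choose which objects to send, while an explicit request pins the contents down completely.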
Algorithms:
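The allocation algorithms themselves are not reproduced in this section. As a hedged sketch of one possible strategy for sample requests: hand each agent, one object at a time, the object currently held by the fewest other agents, which keeps pairwise overlap low and makes later guilt evidence sharper. The class and method names below are ours.

```java
import java.util.*;

// Greedy allocation sketch for sample requests: each agent repeatedly receives
// the least-shared object it does not yet hold. Minimizing overlap between
// agents' sets improves the chances of identifying a guilty agent.
public class Allocation {
    static Map<String, List<String>> allocate(List<String> T, int[] requestSizes) {
        Map<String, Integer> holders = new HashMap<>();
        for (String t : T) holders.put(t, 0);
        Map<String, List<String>> given = new LinkedHashMap<>();
        for (int i = 0; i < requestSizes.length; i++) {
            List<String> ri = new ArrayList<>();
            for (int k = 0; k < requestSizes[i]; k++) {
                String best = null;
                // pick the object with the fewest current holders, not yet given to this agent
                for (String t : T) {
                    if (ri.contains(t)) continue;
                    if (best == null || holders.get(t) < holders.get(best)) best = t;
                }
                ri.add(best);
                holders.put(best, holders.get(best) + 1);
            }
            given.put("U" + (i + 1), ri);
        }
        return given;
    }

    public static void main(String[] args) {
        List<String> T = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(allocate(T, new int[]{2, 2})); // prints {U1=[t1, t2], U2=[t3, t4]}
    }
}
```

With four objects and two agents requesting two objects each, the sketch produces disjoint sets, so any leaked object would point to exactly one agent.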
Chapter-4
IMPLEMENTATION
4.1 SYSTEM ANALYSIS
1) Problem/Requirement Analysis:
This process, the harder and more nebulous of the two, deals with understanding the problem, the goals, and the constraints.
2) Requirement Specification:
Here, the focus is on specifying what has been found during analysis; issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity. The requirements phase terminates with the production of the validated SRS document; producing the SRS document is the basic goal of this phase.
ROLE OF SRS:
The purpose of the software requirements specification (SRS) is to reduce the communication gap between the clients and the developers. The SRS is the medium through which the client and user needs are accurately specified; it forms the basis of software development. A good SRS should satisfy all parties involved in the system.
SCOPE
This document is the only one that describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final delivered system. Any changes made to the requirements in the future will have to go through a formal change-approval process. The developer is responsible for asking for clarifications where necessary, and will not make any alterations without the permission of the client.
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Monitor : 15 VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.
Software Requirements:
• Operating system : Windows XP.
• Coding Language : Java (Servlets/JSP)
• Data Base : MySQL
1. Modularity and partitioning: software is designed such that each system consists of a hierarchy of modules, which serve to partition the work into separate functions.
4. Shared use: avoid duplication by allowing a single module to be called by others that need the function it provides.
Proposed Modules:
1. Data Allocation Module:
The main focus of our project is the data allocation problem: how can the distributor "intelligently" give data to agents in order to improve the chances of detecting a guilty agent?
2. Fake Object Module:
Fake objects are objects generated by the distributor in order to increase the chances of detecting agents that leak data. The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. Our use of fake objects is inspired by the use of "trace" records in mailing lists.
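One way the fake-object idea can be realized is sketched below; the record fields, names, and helper methods are illustrative assumptions, not the project's actual schema.

```java
import java.util.*;

// Sketch of per-agent fake ("trace") records: each agent's copy is salted with
// a unique, realistic-looking record; if that record later surfaces in a leak,
// it points directly at the copy it came from.
public class FakeObjects {
    // Build one fake record tied to an agent (fields are illustrative only).
    static String makeFake(String agentId, Random rnd) {
        String[] firstNames = {"Alice", "Ravi", "Mei", "Omar"};
        String name = firstNames[rnd.nextInt(firstNames.length)] + " Doe";
        return name + ",trace-" + agentId.toLowerCase() + "@example.com";
    }

    // Append one unique fake record to every agent's allocated set.
    static Map<String, List<String>> distribute(Map<String, List<String>> given, Random rnd) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : given.entrySet()) {
            List<String> copy = new ArrayList<>(e.getValue());
            copy.add(makeFake(e.getKey(), rnd));
            out.put(e.getKey(), copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> given = new LinkedHashMap<>();
        given.put("U1", Arrays.asList("t1", "t2"));
        given.put("U2", Arrays.asList("t1", "t3"));
        Map<String, List<String>> out = distribute(given, new Random());
        System.out.println(out.get("U1").size()); // prints 3: two real objects plus one fake
    }
}
```

A real system would make the fake records mimic the live schema closely enough that a malicious agent cannot filter them out before leaking.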
3. Optimization Module:
In the Optimization Module, the distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.
4. Data Distributor:
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or
somebody’s laptop). The distributor must assess the likelihood that the leaked data came from
one or more agents, as opposed to having been independently gathered by other means.
Chapter-5
SYSTEM DESIGN
5.1 UML Diagrams
Data Flow Diagram / Use Case Diagram / Flow Diagram
The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
[Figure: Data Flow Diagram (Admin) — login with existence check, account creation, selecting an agent, viewing and updating agent details, uploading a file to an agent along with its file details and secret key, and locking/unlocking the file when a data leaker is detected.]
[Figure: Use Case Diagram — the Admin creates an account, logs in, uploads files to an agent, and generates the secret key; the Agent downloads files; files are locked/unlocked when a data leaker is found.]
[Figure: Class Diagram — a file class (FileID, FilePassword, SecretKey; Lock(), UnLock()) and an account class (AgentName, EmailID, OldPassword, NewPassword, ReTypeNewPassword; Update()), both backed by the database.]
[Figure: Use Case Diagram (Agent) — create an account, lock/unlock files, view files, download files; data leaker detection.]
[Figure: Flow Diagram (Agent) — login with existence check, account creation, file upload and download, and a secret-key check that flags a data leaker when the key does not exist.]
Login
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
@WebServlet("/Login")
public class Login extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String ss1=request.getParameter("Name");
        String ss2=request.getParameter("Password");
        HttpSession hs=request.getSession(true);
        // SQL query (assumed; the original omits the definition of s)
        String s="select * from register where Name=? and Password=?";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1,ss1);
            ps.setString(2,ss2);
            ResultSet rs=ps.executeQuery();
            if(rs.next()) {
                hs.setAttribute("ss2",ss1);
                hs.setAttribute("s0",ss2);
                System.out.println("Login successful");
                response.sendRedirect("welcome.jsp");
            } else {
                response.sendRedirect("login.html"); // invalid credentials (assumed handling)
            }
        } catch(Exception e) {
            System.out.println("database error");
            e.printStackTrace();
        }
    }
}
Register
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Register")
public class Register extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String ss1=request.getParameter("ID");
        String ss2=request.getParameter("Name");
        String ss3=request.getParameter("Password");
        String ss4=request.getParameter("Email");
        PrintWriter pw=response.getWriter();
        // SQL insert (assumed; the original omits the definition of s)
        String s="insert into register values(?,?,?,?)";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1, ss1);
            ps.setString(2, ss2);
            ps.setString(3, ss3);
            ps.setString(4, ss4);
            int i=ps.executeUpdate();
            if(i>0){
                response.sendRedirect("login.html");
            } else {
                pw.println("Registration failed"); // assumed failure handling
            }
        } catch(Exception e) {
            System.out.println("Database error");
            e.printStackTrace();
        }
    }
}
Change
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
@WebServlet("/Change")
public class Change extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession hs=request.getSession(true);
        String s1=request.getParameter("o1"); // old password
        String s2=request.getParameter("n1"); // new password
        String s3=request.getParameter("c1"); // retyped new password
        // Session values stored at login (assumed; the original omits s4 and s5)
        String s4=(String)hs.getAttribute("ss2"); // user name
        String s5=(String)hs.getAttribute("s0");  // current password
        if(s1.equals(s5)&&s2.equals(s3)){
            try{
                Class.forName("com.mysql.jdbc.Driver");
                Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
                // SQL update (assumed; the original omits the definition of q)
                String q="update register set Password=? where Name=?";
                PreparedStatement ps=conn.prepareStatement(q);
                ps.setString(1, s2);
                ps.setString(2, s4);
                int i=ps.executeUpdate();
                if(i>0){
                    System.out.println("password changed");
                }else{
                    System.out.println("password not changed");
                }
            }catch(Exception e){
                e.printStackTrace();
            }
        }
    }
}
Delete
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Delete")
public class Delete extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String b=request.getParameter("abc");
        // SQL delete (assumed; the original omits the definition of s)
        String s="delete from register where ID=?";
        try{
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1, b);
            int i=ps.executeUpdate();
            if(i>0) {
                System.out.println("delete successful");
            } else {
                System.out.println("delete failed"); // assumed failure handling
            }
        } catch(Exception e) {
            System.out.println("Database error");
            e.printStackTrace();
        }
    }
}
Update
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Update")
public class Update extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String v1=request.getParameter("id");
        String v2=request.getParameter("eid");
        String v3=request.getParameter("name");
        // SQL update (assumed; the original omits the definition of q)
        String q="update register set Email=?, Name=? where ID=?";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(q);
            ps.setString(1, v2);
            ps.setString(2, v3);
            ps.setString(3, v1);
            int i=ps.executeUpdate();
            if(i>0) {
                System.out.println("update success");
            } else {
                System.out.println("not updated");
            }
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
5.3 Testing:
Sl No. | Test Scenario | Test Steps | Test Data | Expected Results | Status
Chapter-6
OUTPUT SCREENSHOTS
Distributor Login
Agent Home
View Key
User Registration
Chapter-7
CONCLUSION
In a perfect world there would be no need to hand over sensitive data to agents that
may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a
perfect world we could watermark each object so that we could trace its origins with absolute
certainty. However, in many cases we must indeed work with agents that may not be 100%
trusted, and we may not be certain if a leaked object came from an agent or from some other
source, since certain data cannot admit watermarks. In spite of these difficulties, we have
shown it is possible to assess the likelihood that an agent is responsible for a leak, based on
the overlap of his data with the leaked data and the data of other agents, and based on the
probability that objects can be “guessed” by other means. Our model is relatively simple, but
we believe it captures the essential trade-offs. The algorithms we have presented implement a
variety of data distribution strategies that can improve the distributor’s chances of identifying
a leaker. We have shown that distributing objects judiciously can make a significant
difference in identifying guilty agents, especially in cases where there is large overlap in the
data that agents must receive. Our future work includes the investigation of agent guilt
models that capture leakage scenarios that are not studied in this paper. For example, what is
the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in the literature. Another open problem is the extension
of our allocation strategies so that they can handle agent requests in an online fashion (the
presented strategies assume that there is a fixed set of agents with requests known in
advance).
Chapter-8
BIBLIOGRAPHY
Good teachers are worth more than a thousand books; we have them in our Department.
References:
1. User Interfaces in C#: Windows Forms and Custom Controls by Matthew MacDonald.
2. Applied Microsoft® .NET Framework Programming (Pro-Developer) by Jeffrey Richter.
3. Practical .Net2 and C#2: Harness the Platform, the Language, and the Framework by
Patrick Smacchia.
4. Data Communications and Networking, by Behrouz A Forouzan.
5. Computer Networking: A Top-Down Approach, by James F. Kurose.
6. Operating System Concepts, by Abraham Silberschatz.
7. R. Agrawal and J. Kiernan. Watermarking relational databases. In VLDB ’02: Proceedings
of the 28th international conference on Very Large Data Bases, pages 155–166. VLDB
Endowment, 2002.
8. P. Bonatti, S. D. C. di Vimercati, and P. Samarati. An algebra for composing access control
policies. ACM Trans. Inf. Syst. Secur., 5(1):1–35, 2002.
9. P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data
provenance. In J. V. den Bussche and V. Vianu, editors, Database Theory - ICDT 2001, 8th
International Conference, London, UK, January 4-6, 2001, Proceedings, volume 1973 of
Lecture Notes in Computer Science, pages 316–330. Springer, 2001.
10. P. Buneman and W.-C. Tan. Provenance in databases. In SIGMOD ’07: Proceedings of the
2007 ACM SIGMOD international conference on Management of data, pages 1171–1173,
New York, NY, USA, 2007. ACM.
11. Y. Cui and J. Widom. Lineage tracing for general data warehouse transformations. In The
VLDB Journal, pages 471–480, 2001.
12. S. Czerwinski, R. Fromm, and T. Hodes. Digital music distribution and audio
watermarking.
13. F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li. An improved algorithm to watermark numeric relational data. In Information Security Applications, pages 138–149. Springer, Berlin/Heidelberg, 2006.
Sites Referred:
http://www.sourcefordgde.com
http://www.networkcomputing.com/
http://www.ieee.org
http://www.almaden.ibm.com/software/quest/Resources/
http://www.computer.org/publications/dlib
http://www.ceur-ws.org/Vol-90/
http://www.microsoft.com/isapi/redir.dll?prd=ie&pver=6&ar=msnhome