DATA LEAKAGE DETECTION-document
Chapter-1
INTRODUCTION
1.1 Project Description
In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a company may have partnerships with other companies that require sharing customer data, or an enterprise may outsource its data processing, so data must be given to various other companies. Our goal is to detect when the distributor's sensitive data have been leaked by agents and, if possible, to identify the agent that leaked the data.
Perturbation is a very useful technique in which the data are modified and made "less sensitive" before being handed to agents. For example, one can replace exact values by ranges, or add random noise to certain attributes. Traditionally, leakage detection is handled by watermarking. We articulate the need for watermarking database relations to deter their piracy, identify the unique characteristics of relational data that pose new challenges for watermarking, and describe the desirable properties of a watermarking system for relational data. A watermark can be applied to any database relation whose attributes are such that changes in a few of their values do not affect the applications. In watermarking, a unique code is embedded in each distributed copy; if that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. However, watermarks can sometimes be destroyed if the data recipient is malicious.
In this project, we study unobtrusive techniques for detecting leakage of a set of objects or records. Specifically, we study the following scenario: after giving a set of objects to agents, the distributor discovers some of those same objects in an unauthorized place. At this point, the distributor can assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. Using an analogy with cookies stolen from a cookie jar: if we catch Freddie with a single cookie, he can argue that a friend gave him the cookie; but if we catch Freddie with five cookies, it will be much harder for him to argue that his hands were not in the cookie jar. If the distributor sees "enough evidence" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings.
We develop a model for assessing the "guilt" of agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker, and we consider the option of adding "fake" objects to the distributed set to further improve the probability of identifying leakages.
1.3 Objectives:
• A data distributor has given sensitive data to a set of supposedly trusted agents (third parties).
• Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop).
• The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
• We propose data allocation strategies (across the agents) that improve the probability of identifying leakages.
• These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.
• Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.
Chapter-2
2.1 Literature Survey
2.1.1 Existing System:
In the existing system, we consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique in which the data is modified and made "less sensitive" before being handed to agents. However, in some cases it is important not to alter the original distributor's data. Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases but, again, involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or
somebody’s laptop). The distributor must assess the likelihood that the leaked data came from
one or more agents, as opposed to having been independently gathered by other means. We
propose data allocation strategies (across the agents) that improve the probability of
identifying leakages. These methods do not rely on alterations of the released data (e.g.,
watermarks). In some cases we can also inject realistic but fake data records to further
improve our chances of detecting leakage and identifying the guilty party.
2.1.2 Inference:
In a perfect world there would be no need to hand over sensitive data to agents that may
unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a
perfect world we could watermark each object so that we could trace its origins with absolute
certainty. However, in many cases we must indeed work with agents that may not be 100%
trusted, and we may not be certain if a leaked object came from an agent or from some other
source, since certain data cannot admit watermarks. In spite of these difficulties, we have
shown it is possible to assess the likelihood that an agent is responsible for a leak, based on
the overlap of his data with the leaked data and the data of other agents, and based on the
probability that objects can be guessed by other means. Our model is relatively simple, but
we believe it captures the essential trade-offs. The algorithms we have presented implement a
variety of data distribution strategies that can improve the distributor’s chances of identifying
a leaker. We have shown that distributing objects judiciously can make a significant
difference in identifying guilty agents, especially in cases where there is large overlap in the
data that agents must receive.
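The likelihood assessment described above can be illustrated in code. The following is a minimal sketch, not this report's exact model: it assumes each leaked object was either guessed from other sources (with probability p) or leaked by one of the agents holding it, with blame shared equally among the holders; all class and method names are ours.

```java
import java.util.*;

// Sketch of a guilt estimate for one agent. An agent looks guilty unless every
// leaked object it holds can be explained away: guessed from other sources
// (probability p) or leaked by one of the other agents holding the same object.
public class GuiltEstimate {
    // leaked: objects found in the unauthorized place
    // holders: for each leaked object, how many agents were given it
    // agentObjects: the objects this agent received
    static double guiltProbability(List<String> leaked, Map<String, Integer> holders,
                                   Set<String> agentObjects, double p) {
        double notGuilty = 1.0;
        for (String t : leaked) {
            if (!agentObjects.contains(t)) continue; // this agent never had t
            int v = holders.get(t);
            // t was NOT leaked by this agent if it was guessed (p) or, otherwise,
            // leaked by one of the other v-1 holders (blame shared equally).
            notGuilty *= p + (1 - p) * (v - 1) / (double) v;
        }
        return 1.0 - notGuilty;
    }

    public static void main(String[] args) {
        List<String> leaked = Arrays.asList("t1", "t2", "t3");
        Map<String, Integer> holders = new HashMap<>();
        holders.put("t1", 1); holders.put("t2", 2); holders.put("t3", 2);
        Set<String> agent = new HashSet<>(Arrays.asList("t1", "t2"));
        // t1 is held only by this agent, so guilt is high:
        System.out.printf("%.3f%n", guiltProbability(leaked, holders, agent, 0.1)); // prints 0.945
    }
}
```

Note how the estimate sharpens with overlap: an object held by a single agent is damning evidence, while an object shared by many agents contributes little, which is exactly why the allocation strategies below try to minimize overlap.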
Chapter-3
METHODOLOGY
Problem Setup and Notation:
A distributor owns a set T = {t1, …, tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, …, Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size; e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
1. Sample request: the agent asks for some number mi of objects, and the distributor chooses which objects to send.
2. Explicit request: the agent asks for all objects in T that satisfy a given condition.
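The two request types can be sketched as follows; the class and method names are our illustrative assumptions, not the report's notation.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of the two agent request types. An explicit request returns every
// object in T that satisfies the agent's condition; a sample request returns
// a random subset of the requested size m_i.
public class Requests {
    static List<String> explicitRequest(List<String> T, Predicate<String> cond) {
        List<String> out = new ArrayList<>();
        for (String t : T) if (cond.test(t)) out.add(t);
        return out;
    }

    static List<String> sampleRequest(List<String> T, int mi, Random rnd) {
        List<String> copy = new ArrayList<>(T);
        Collections.shuffle(copy, rnd); // random order, then take the first m_i
        return copy.subList(0, Math.min(mi, copy.size()));
    }

    public static void main(String[] args) {
        List<String> T = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(explicitRequest(T, t -> t.compareTo("t3") >= 0)); // prints [t3, t4]
        System.out.println(sampleRequest(T, 2, new Random()).size());        // prints 2
    }
}
```

The key difference for leakage detection is that a sample request leaves the distributor free to choose which objects to send, while an explicit request pins the contents down completely.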
Algorithms:
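The allocation algorithms themselves are not reproduced in this section. As a hedged sketch of one possible strategy for sample requests: hand each agent, one object at a time, the object currently held by the fewest other agents, which keeps pairwise overlap low and makes later guilt evidence sharper. The class and method names below are ours.

```java
import java.util.*;

// Greedy allocation sketch for sample requests: each agent repeatedly receives
// the least-shared object it does not yet hold. Minimizing overlap between
// agents' sets improves the chances of identifying a guilty agent.
public class Allocation {
    static Map<String, List<String>> allocate(List<String> T, int[] requestSizes) {
        Map<String, Integer> holders = new HashMap<>();
        for (String t : T) holders.put(t, 0);
        Map<String, List<String>> given = new LinkedHashMap<>();
        for (int i = 0; i < requestSizes.length; i++) {
            List<String> ri = new ArrayList<>();
            for (int k = 0; k < requestSizes[i]; k++) {
                String best = null;
                // pick the object with the fewest current holders, not yet given to this agent
                for (String t : T) {
                    if (ri.contains(t)) continue;
                    if (best == null || holders.get(t) < holders.get(best)) best = t;
                }
                ri.add(best);
                holders.put(best, holders.get(best) + 1);
            }
            given.put("U" + (i + 1), ri);
        }
        return given;
    }

    public static void main(String[] args) {
        List<String> T = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(allocate(T, new int[]{2, 2})); // prints {U1=[t1, t2], U2=[t3, t4]}
    }
}
```

With four objects and two agents requesting two objects each, the sketch produces disjoint sets, so any leaked object would point to exactly one agent.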
Chapter-4
IMPLEMENTATION
4.1 SYSTEM ANALYSIS
1) Problem/Requirement Analysis:
This process, the harder and more nebulous of the two, deals with understanding the problem, the goals, and the constraints.
2) Requirement Specification:
Here, the focus is on specifying what has been found during analysis; issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity. The requirements phase terminates with the production of the validated SRS document; producing the SRS document is the basic goal of this phase.
ROLE OF SRS:
The purpose of the software requirements specification (SRS) is to reduce the communication gap between the clients and the developers. The SRS is the medium through which the client and user needs are accurately specified; it forms the basis of software development. A good SRS should satisfy all parties involved in the system.
SCOPE
This document is the only one that describes the requirements of the system. It is meant for use by the developers, and will also be the basis for validating the final delivered system. Any changes made to the requirements in the future will have to go through a formal change-approval process. The developer is responsible for asking for clarifications where necessary, and will not make any alterations without the permission of the client.
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Monitor : 15 VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.
Software Requirements:
• Operating system : Windows XP.
• Coding Language : Java (Servlets/JSP)
• Data Base : MySQL
1. Modularity and partitioning: software is designed such that each system consists of a hierarchy of modules, which serve to partition the work into separate functions.
4. Shared use: avoid duplication by allowing a single module to be called by others that need the function it provides.
Proposed Modules:
1. Data Allocation Module:
The main focus of our project is the data allocation problem: how can the distributor "intelligently" give data to agents in order to improve the chances of detecting a guilty agent?
2. Fake Object Module:
Fake objects are objects generated by the distributor in order to increase the chances of detecting agents that leak data. The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. Our use of fake objects is inspired by the use of "trace" records in mailing lists.
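One way the fake-object idea can be realized is sketched below; the record fields, names, and helper methods are illustrative assumptions, not the project's actual schema.

```java
import java.util.*;

// Sketch of per-agent fake ("trace") records: each agent's copy is salted with
// a unique, realistic-looking record; if that record later surfaces in a leak,
// it points directly at the copy it came from.
public class FakeObjects {
    // Build one fake record tied to an agent (fields are illustrative only).
    static String makeFake(String agentId, Random rnd) {
        String[] firstNames = {"Alice", "Ravi", "Mei", "Omar"};
        String name = firstNames[rnd.nextInt(firstNames.length)] + " Doe";
        return name + ",trace-" + agentId.toLowerCase() + "@example.com";
    }

    // Append one unique fake record to every agent's allocated set.
    static Map<String, List<String>> distribute(Map<String, List<String>> given, Random rnd) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : given.entrySet()) {
            List<String> copy = new ArrayList<>(e.getValue());
            copy.add(makeFake(e.getKey(), rnd));
            out.put(e.getKey(), copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> given = new LinkedHashMap<>();
        given.put("U1", Arrays.asList("t1", "t2"));
        given.put("U2", Arrays.asList("t1", "t3"));
        Map<String, List<String>> out = distribute(given, new Random());
        System.out.println(out.get("U1").size()); // prints 3: two real objects plus one fake
    }
}
```

A real system would make the fake records mimic the live schema closely enough that a malicious agent cannot filter them out before leaking.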
3. Optimization Module:
In the Optimization Module, the distributor's data allocation to agents has one constraint and one objective. The distributor's constraint is to satisfy agents' requests, by providing them with the number of objects they request or with all available objects that satisfy their conditions. His objective is to be able to detect an agent who leaks any portion of his data.
4. Data Distributor:
A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or
somebody’s laptop). The distributor must assess the likelihood that the leaked data came from
one or more agents, as opposed to having been independently gathered by other means.
Chapter-5
SYSTEM DESIGN
5.1 UML Diagrams
Data Flow Diagram / Use Case Diagram / Flow Diagram
The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
[Figure: Data Flow Diagram (Admin) — login with existence check, account creation, selecting an agent, viewing and updating agent details, uploading a file to an agent along with its file details and secret key, and locking/unlocking the file when a data leaker is detected.]
[Figure: Use Case Diagram — the Admin creates an account, logs in, uploads files to an agent, and generates the secret key; the Agent downloads files; files are locked/unlocked when a data leaker is found.]
[Figure: Class Diagram — a file class (FileID, FilePassword, SecretKey; Lock(), UnLock()) and an account class (AgentName, EmailID, OldPassword, NewPassword, ReTypeNewPassword; Update()), both backed by the database.]
[Figure: Use Case Diagram (Agent) — create an account, lock/unlock files, view files, download files; data leaker detection.]
[Figure: Flow Diagram (Agent) — login with existence check, account creation, file upload and download, and a secret-key check that flags a data leaker when the key does not exist.]
Login
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
@WebServlet("/Login")
public class Login extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String ss1=request.getParameter("Name");
        String ss2=request.getParameter("Password");
        HttpSession hs=request.getSession(true);
        // SQL query (assumed; the original omits the definition of s)
        String s="select * from register where Name=? and Password=?";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1,ss1);
            ps.setString(2,ss2);
            ResultSet rs=ps.executeQuery();
            if(rs.next()) {
                hs.setAttribute("ss2",ss1);
                hs.setAttribute("s0",ss2);
                System.out.println("Login successful");
                response.sendRedirect("welcome.jsp");
            } else {
                response.sendRedirect("login.html"); // invalid credentials (assumed handling)
            }
        } catch(Exception e) {
            System.out.println("database error");
            e.printStackTrace();
        }
    }
}
Register
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Register")
public class Register extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String ss1=request.getParameter("ID");
        String ss2=request.getParameter("Name");
        String ss3=request.getParameter("Password");
        String ss4=request.getParameter("Email");
        PrintWriter pw=response.getWriter();
        // SQL insert (assumed; the original omits the definition of s)
        String s="insert into register values(?,?,?,?)";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1, ss1);
            ps.setString(2, ss2);
            ps.setString(3, ss3);
            ps.setString(4, ss4);
            int i=ps.executeUpdate();
            if(i>0){
                response.sendRedirect("login.html");
            } else {
                pw.println("Registration failed"); // assumed failure handling
            }
        } catch(Exception e) {
            System.out.println("Database error");
            e.printStackTrace();
        }
    }
}
Change
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
@WebServlet("/Change")
public class Change extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession hs=request.getSession(true);
        String s1=request.getParameter("o1"); // old password
        String s2=request.getParameter("n1"); // new password
        String s3=request.getParameter("c1"); // retyped new password
        // Session values stored at login (assumed; the original omits s4 and s5)
        String s4=(String)hs.getAttribute("ss2"); // user name
        String s5=(String)hs.getAttribute("s0");  // current password
        if(s1.equals(s5)&&s2.equals(s3)){
            try{
                Class.forName("com.mysql.jdbc.Driver");
                Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
                // SQL update (assumed; the original omits the definition of q)
                String q="update register set Password=? where Name=?";
                PreparedStatement ps=conn.prepareStatement(q);
                ps.setString(1, s2);
                ps.setString(2, s4);
                int i=ps.executeUpdate();
                if(i>0){
                    System.out.println("password changed");
                }else{
                    System.out.println("password not changed");
                }
            }catch(Exception e){
                e.printStackTrace();
            }
        }
    }
}
Delete
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Delete")
public class Delete extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String b=request.getParameter("abc");
        // SQL delete (assumed; the original omits the definition of s)
        String s="delete from register where ID=?";
        try{
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(s);
            ps.setString(1, b);
            int i=ps.executeUpdate();
            if(i>0) {
                System.out.println("delete successful");
            } else {
                System.out.println("delete failed"); // assumed failure handling
            }
        } catch(Exception e) {
            System.out.println("Database error");
            e.printStackTrace();
        }
    }
}
Update
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@WebServlet("/Update")
public class Update extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String v1=request.getParameter("id");
        String v2=request.getParameter("eid");
        String v3=request.getParameter("name");
        // SQL update (assumed; the original omits the definition of q)
        String q="update register set Email=?, Name=? where ID=?";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn=DriverManager.getConnection("jdbc:mysql://localhost:3306/data_base","root","sony2603");
            PreparedStatement ps=conn.prepareStatement(q);
            ps.setString(1, v2);
            ps.setString(2, v3);
            ps.setString(3, v1);
            int i=ps.executeUpdate();
            if(i>0) {
                System.out.println("update success");
            } else {
                System.out.println("not updated");
            }
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
5.3 Testing:
Sl No. | Test Scenario | Test Steps | Test Data | Expected Results | Status
Chapter-6
OUTPUT SCREENSHOTS
Distributor Login
Agent Home
View Key
User Registration
Chapter-7
CONCLUSION
In a perfect world there would be no need to hand over sensitive data to agents that
may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a
perfect world we could watermark each object so that we could trace its origins with absolute
certainty. However, in many cases we must indeed work with agents that may not be 100%
trusted, and we may not be certain if a leaked object came from an agent or from some other
source, since certain data cannot admit watermarks. In spite of these difficulties, we have
shown it is possible to assess the likelihood that an agent is responsible for a leak, based on
the overlap of his data with the leaked data and the data of other agents, and based on the
probability that objects can be “guessed” by other means. Our model is relatively simple, but
we believe it captures the essential trade-offs. The algorithms we have presented implement a
variety of data distribution strategies that can improve the distributor’s chances of identifying
a leaker. We have shown that distributing objects judiciously can make a significant
difference in identifying guilty agents, especially in cases where there is large overlap in the
data that agents must receive. Our future work includes the investigation of agent guilt
models that capture leakage scenarios that are not studied in this paper. For example, what is
the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in the literature. Another open problem is the extension
of our allocation strategies so that they can handle agent requests in an online fashion (the
presented strategies assume that there is a fixed set of agents with requests known in
advance).
Chapter-8
BIBLIOGRAPHY
Good teachers are worth more than a thousand books; we have them in our Department.
References:
1. User Interfaces in C#: Windows Forms and Custom Controls by Matthew MacDonald.
2. Applied Microsoft® .NET Framework Programming (Pro-Developer) by Jeffrey Richter.
3. Practical .Net2 and C#2: Harness the Platform, the Language, and the Framework by
Patrick Smacchia.
4. Data Communications and Networking, by Behrouz A Forouzan.
5. Computer Networking: A Top-Down Approach, by James F. Kurose.
6. Operating System Concepts, by Abraham Silberschatz.
7. R. Agrawal and J. Kiernan. Watermarking relational databases. In VLDB ’02: Proceedings
of the 28th international conference on Very Large Data Bases, pages 155–166. VLDB
Endowment, 2002.
8. P. Bonatti, S. D. C. di Vimercati, and P. Samarati. An algebra for composing access control
policies. ACM Trans. Inf. Syst. Secur., 5(1):1–35, 2002.
9. P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data
provenance. In J. V. den Bussche and V. Vianu, editors, Database Theory - ICDT 2001, 8th
International Conference, London, UK, January 4-6, 2001, Proceedings, volume 1973 of
Lecture Notes in Computer Science, pages 316–330. Springer, 2001.
10. P. Buneman and W.-C. Tan. Provenance in databases. In SIGMOD ’07: Proceedings of the
2007 ACM SIGMOD international conference on Management of data, pages 1171–1173,
New York, NY, USA, 2007. ACM.
11. Y. Cui and J. Widom. Lineage tracing for general data warehouse transformations. In The
VLDB Journal, pages 471–480, 2001.
12. S. Czerwinski, R. Fromm, and T. Hodes. Digital music distribution and audio
watermarking.
13. F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li. An improved algorithm to watermark numeric relational data. In Information Security Applications, pages 138–149. Springer, Berlin/Heidelberg, 2006.
Sites Referred:
http://www.sourcefordgde.com
http://www.networkcomputing.com/
http://www.ieee.org
http://www.almaden.ibm.com/software/quest/Resources/
http://www.computer.org/publications/dlib
http://www.ceur-ws.org/Vol-90/
http://www.microsoft.com/isapi/redir.dll?prd=ie&pver=6&ar=msnhome