A.V.C. College of Engineering, Department of Computer Applications
PROJECT TITLE
Date : 23-3-2009
15,T.B.Road,Mappalayam,
Madurai-625 010.
INTRODUCTION
Many organizations collect large amounts of data about their clients and customers. Such
data was used for record keeping. Data mining can extract valuable knowledge from this data.
Organizations can obtain better results by pooling their data together. However, the collected data
may contain sensitive or private information about the organizations or their customers, and
privacy concerns are exacerbated if data is shared between multiple organizations. Distributed
data mining is concerned with the computation of models from data that is distributed among
multiple participants. Privacy-preserving distributed data mining seeks to allow for the
cooperative computation of such models without the cooperating parties revealing any of their
individual data items. Our project makes two contributions in privacy-preserving data mining.
First, we introduce the concept of arbitrarily partitioned data, which is a generalization of both
horizontally and vertically partitioned data. Second, we provide an efficient privacy-preserving
protocol for k-means clustering in the setting of arbitrarily partitioned data. Privacy-preserving
distributed data mining allows the cooperative computation of data mining algorithms without
requiring the participating organizations to reveal their individual data items to each other.
We present a simple I/O-efficient k-clustering algorithm that was designed with the goal of
enabling a privacy-preserving version of the algorithm. Our experiments show that this algorithm
produces cluster centers that are, on average, more accurate than the ones produced by the well
known iterative k-means algorithm. We use our new algorithm as the basis for a communication-
efficient privacy-preserving k-clustering protocol for databases that are horizontally partitioned
between two parties. Unlike existing privacy-preserving protocols based on the k-means
algorithm, this protocol does not reveal intermediate candidate cluster centers. In this work, we
propose methods for constructing the dissimilarity matrix of objects.
Existing System:
Earlier techniques include data perturbation techniques for privacy-preserving classification
model construction on centralized data and techniques for privacy-preserving association rule mining
in distributed environments. Techniques from secure multiparty computation form one approach
to privacy-preserving data mining. Yao’s general protocol for secure circuit evaluation can be
used to solve any two-party privacy-preserving distributed data mining problem. However, since
data mining usually involves millions or billions of data items, the communication cost of this
protocol renders it impractical for these purposes.
Proposed System:
Future Enhancement:
Most of the previous studies investigated the problem and proposed solutions based on the
assumption that all parties are honest or semi-honest. While it is sometimes useful, this
assumption substantially underestimates the capability of adversaries and thus does not always
hold in practical situations. We considered a space of more powerful adversaries which include
not only honest and semi-honest adversaries but also those who are weakly malicious and strongly
malicious. Possible enhancements include:
1) extending the information sharing function from intersection to other operations, and
2) dealing with multiple parties in the system, including dealing with correlated attacks from
multiple adversaries.
[Figure: system flow diagram with the labels Start, Choose File, Cluster Checking, Encrypt, FTP Server, Receive, Decrypt, IF, EXIT, System-1 and System-2]
USECASE DIAGRAM
[Figure: use case diagram for System1 with the use cases File Choosing, Clustering the Selected File, Encryption, Decryption, and Reclustering the Clustered File]
MODULE DESCRIPTIONS
Modules:
1. Data collection and mining
2. Formation of Cluster
3. Privacy preservation
4. Comparison evaluation of algorithms
1. Data collection and mining:
Data collection is an important part of the project. The dataset is prepared by
integrating the organizations' details. Such data was initially used only for record keeping. These
large collections of data can be “mined” for knowledge that could improve the performance of
the organization. While much data mining occurs on data within an organization, it is quite
common to use data from multiple sources in order to yield more precise or useful knowledge.
However, privacy and secrecy considerations can prohibit organizations from being willing or
able to share their data with each other. Therefore, we adopt a privacy-preserving data mining
technique.
2. Formation of Cluster:
The k-means clustering algorithm is used for cluster formation. K-means clustering and
fuzzy clustering differ from hierarchical clustering and diversity selection in that the
number of clusters, K, needs to be determined at the outset. The goal is to divide the objects into K
clusters such that some metric relative to the centroids of the clusters is minimized. If the
clustering is over binary objects, medoids need to be used instead of centroids. A medoid is a
bit string that minimizes the sum of distances to all objects in the cluster.
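The medoid idea can be illustrated with a short sketch. It is not taken from the project code: the class name MedoidSketch, the use of Hamming distance, and the restriction of the medoid to members of the cluster are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: picks a medoid for a cluster of equal-length bit strings. */
final class MedoidSketch {

    /** Hamming distance: number of positions at which two bit strings differ. */
    static int hamming(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bit strings must have equal length");
        }
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                d++;
            }
        }
        return d;
    }

    /** Index of the cluster member whose summed distance to all members is smallest. */
    static int medoidIndex(List<boolean[]> cluster) {
        if (cluster.isEmpty()) {
            throw new IllegalArgumentException("cluster must not be empty");
        }
        int best = 0;
        long bestSum = Long.MAX_VALUE;
        for (int i = 0; i < cluster.size(); i++) {
            long sum = 0;
            for (boolean[] other : cluster) {
                sum += hamming(cluster.get(i), other);
            }
            if (sum < bestSum) { // ties keep the earliest member
                bestSum = sum;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<boolean[]> cluster = Arrays.asList(
                new boolean[] {true, true, false},
                new boolean[] {true, false, false},
                new boolean[] {false, false, false});
        System.out.println("medoid index = " + medoidIndex(cluster));
    }
}
```

In the three-member example the middle bit string differs from each neighbour in one position only, so it has the smallest summed distance and is chosen as the medoid.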
3. Privacy preservation:
Privacy-preserving data mining solutions have been presented for both
horizontally and vertically partitioned databases, in which either different data objects with the
same attributes are owned by each party, or different attributes for the same data objects are
owned by each party, respectively. We introduce the notion of arbitrarily partitioned data,
which generalizes both horizontally and vertically partitioned data. In arbitrarily partitioned
data, different attributes for different items can be owned by either party. Although extremely
“patchworked” data is unlikely in practice, one advantage of considering arbitrarily partitioned
data is that protocols in this model apply both to horizontally and vertically partitioned data, as
well as to hybrid partitionings that are mostly, but not completely, vertically or horizontally partitioned.
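As a rough illustration of arbitrary partitioning, the sketch below splits one record between two hypothetical parties ("Alice" and "Bob") and shows that the squared distance to a cluster centre is the sum of two parts, each computable from the attributes one party owns. This is plain arithmetic with no cryptography: it assumes the centre is known to both parties, which a real privacy-preserving protocol would avoid, and all names are invented for the example.

```java
/** Illustrative only: one record split arbitrarily between two parties. */
final class ArbitraryPartitionSketch {

    /**
     * Sum of squared differences over the attributes this party owns.
     * owned[j] is true when this party holds attribute j of the record;
     * entries of values[] at unowned positions are ignored.
     */
    static double localSquaredDistance(double[] values, boolean[] owned, double[] centre) {
        double sum = 0.0;
        for (int j = 0; j < values.length; j++) {
            if (owned[j]) {
                double diff = values[j] - centre[j];
                sum += diff * diff;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] centre = {0.0, 0.0, 0.0};

        // The full record is (1, 4, 2): Alice owns attributes 0 and 2, Bob owns attribute 1.
        double[] aliceValues = {1.0, 0.0, 2.0};
        boolean[] aliceOwns = {true, false, true};
        double[] bobValues = {0.0, 4.0, 0.0};
        boolean[] bobOwns = {false, true, false};

        double alicePart = localSquaredDistance(aliceValues, aliceOwns, centre);
        double bobPart = localSquaredDistance(bobValues, bobOwns, centre);
        System.out.println("squared distance = " + (alicePart + bobPart));
    }
}
```

Horizontal partitioning is the special case in which one party owns every attribute of a given record, and vertical partitioning is the case in which the ownership pattern is the same for every record.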
SCREEN SHOTS
DATABASE DESCRIPTION
PRIVACY PRESERVATION:
FIELD         TYPE
ID            Integer
IP ADDRESS    String

LOGIN 1:
FIELD         TYPE
ID            Integer
USERNAME      String
PASSWORD      String

LOGIN 2:
FIELD         TYPE
ID            Integer
USERNAME      String
PASSWORD      String
SAMPLE CODING
/*---------------Centroid.java-----------------*/
package distance;

/** Centre of a cluster in two dimensions (x, y). */
class Centroid {
    private double mCx, mCy;
    private Cluster mCluster;

    public Centroid(double cx, double cy) {
        this.mCx = cx;
        this.mCy = cy;
    }

    // Recomputes the centroid as the mean of the cluster's data points.
    public void calcCentroid() { // only called by CAInstance
        int numDP = mCluster.getNumDataPoints();
        if (numDP == 0) {
            return; // empty cluster: keep the current centre and avoid dividing by zero
        }
        double tempX = 0, tempY = 0;
        int i;
        // calculating the new Centroid
        for (i = 0; i < numDP; i++) {
            tempX = tempX + mCluster.getDataPoint(i).getX(); // total for x
            tempY = tempY + mCluster.getDataPoint(i).getY(); // total for y
        }
        this.mCx = tempX / numDP;
        this.mCy = tempY / numDP;
        // calculating the new Euclidean Distance for each Data Point
        for (i = 0; i < numDP; i++) {
            mCluster.getDataPoint(i).calcEuclideanDistance();
        }
        // calculate the new Sum of Squares for the Cluster
        mCluster.calcSumOfSquares();
    }

    public void setCluster(Cluster c) {
        this.mCluster = c;
    }

    public double getCx() {
        return mCx;
    }

    public double getCy() {
        return mCy;
    }

    public Cluster getCluster() {
        return mCluster;
    }
}
/*-----------------Cluster.java----------------*/
package distance;

import java.util.Vector;

/** A named cluster: its centroid, its data points and their summed distances. */
class Cluster
{
    private String mName;
    private Centroid mCentroid;
    private double mSumSqr;
    private Vector<DataPoint> mDataPoints;

    public Cluster(String name)
    {
        this.mName = name;
        this.mCentroid = null; // will be set by calling setCentroid()
        mDataPoints = new Vector<DataPoint>();
    }

    public void setCentroid(Centroid c)
    {
        mCentroid = c;
    }

    public Centroid getCentroid()
    {
        return mCentroid;
    }

    public void addDataPoint(DataPoint dp)
    { // called from CAInstance
        dp.setCluster(this); // initiates an inner call to calcEuclideanDistance() in DP
        this.mDataPoints.addElement(dp);
        calcSumOfSquares();
    }

    public void removeDataPoint(DataPoint dp)
    {
        this.mDataPoints.removeElement(dp);
        calcSumOfSquares();
    }

    public int getNumDataPoints()
    {
        return this.mDataPoints.size();
    }

    public DataPoint getDataPoint(int pos)
    {
        return this.mDataPoints.elementAt(pos);
    }

    // Sums the current distance value reported by each data point.
    public void calcSumOfSquares()
    { // called from Centroid
        int size = this.mDataPoints.size();
        double temp = 0;
        for (int i = 0; i < size; i++)
        {
            temp = temp + this.mDataPoints.elementAt(i).getCurrentEuDt();
        }
        this.mSumSqr = temp;
    }

    public double getSumSqr()
    {
        return this.mSumSqr;
    }

    public String getName()
    {
        return this.mName;
    }

    public Vector<DataPoint> getDataPoints()
    {
        return this.mDataPoints;
    }
}
// GRAPH.java//
package Graph;

import java.awt.Point;

public class Cluster {
    public Point point = null;
    public int finished = 0;

    // Converts data coordinates (x, y) into a position on the graph.
    public Cluster(int x, int y) {
        x = setX(x);
        y = setY(y);
        this.point = new Point(x, y);
    }

    // Scales i along the x axis, measured from the point Graph.p2.
    public int setX(int i) {
        i = (int) (Graph.p2.getX() + (i * (Graph.Xlength / Graph.Xedge)));
        //i=(int) (Graph.p2.getX()+(i*Graph.Xp));
        return i;
    }

    // Scales j along the y axis, measured from the point Graph.p2.
    public int setY(int j) {
        j = (int) (Graph.p2.getY() - (j * (Graph.Ylength / Graph.Yedge)));
        //j=(int) (Graph.p2.getY()-(j*Graph.Yp));
        return j;
    }
}
/*----------------DataPoint.java----------------*/
package distance;

public class DataPoint {
    private double mX, mY;
    private String mObjName;
    private Cluster mCluster;
    private double mEuDt;

    public DataPoint(double x, double y, String name) {
        this.mX = x;
        this.mY = y;
        this.mObjName = name;
        this.mCluster = null;
    }

    public void setCluster(Cluster cluster) {
        this.mCluster = cluster;
        calcEuclideanDistance();
    }

    public void calcEuclideanDistance() {
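
For the distance step named by calcEuclideanDistance(), a minimal stand-alone sketch follows. It is not the project's own method body: it assumes the value wanted is the straight-line distance between a point (x, y) and its cluster's centroid (cx, cy), and the class and method names are hypothetical.

```java
/** Illustrative only: straight-line distance from a point to a centroid. */
final class EuclideanSketch {

    /** Distance between (x, y) and (cx, cy) in the plane. */
    static double distance(double x, double y, double cx, double cy) {
        double dx = x - cx;
        double dy = y - cy;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        // A point at (3, 4) lies 5 units from a centroid at the origin.
        System.out.println(distance(3.0, 4.0, 0.0, 0.0));
    }
}
```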
REPORTS DETAILS
After browsing for the ARFF file, click the Submit button. If the corresponding ARFF file is present,
a success page opens and reports the ARFF file.
Enter the IP address in the text box to connect to the FTP server (192.168.1.60).
After the success page for the ARFF file has opened, the report of the clustered objects is shown
in a graph.
On the Login page, enter the corresponding username and password; the ARFF file is then
opened.
In the next stage, the ARFF file is encrypted in SYSTEM1 and can be viewed.
After encryption and decryption, reclustering takes place: the
clustered file is reclustered.
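The encrypt-then-decrypt step can be sketched as follows. The report does not say which cipher the project uses, so AES in GCM mode from the standard javax.crypto API is assumed here purely for illustration; the class name FileCryptoSketch, the key handling, and the sample ARFF text are likewise hypothetical, and the FTP transfer between the two systems is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Illustrative only: encrypts file contents on one system and decrypts them on another. */
final class FileCryptoSketch {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Generates a fresh 128-bit AES key (assumed to be shared by both systems). */
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a random IV followed by the ciphertext and its tag. */
    static byte[] encrypt(byte[] plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain);
            byte[] out = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(): reads the IV from the front, then decrypts the rest. */
    static byte[] decrypt(byte[] message, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] arff = "@relation demo".getBytes(StandardCharsets.UTF_8);
        byte[] sent = encrypt(arff, key);       // done on the sending system
        byte[] received = decrypt(sent, key);   // done on the receiving system
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

Because GCM authenticates the ciphertext, decrypt() fails with an exception if the bytes were altered in transit, which is one reason an authenticated mode is a reasonable choice for a file that travels over FTP.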
TESTING DETAILS
SYSTEM TESTING:
Testing is performed to identify errors and is used for quality assurance. Testing is an
integral part of the entire development and maintenance process. The goal of testing during the
design phase is to verify that the specification has been accurately and completely incorporated into the
design, as well as to ensure the correctness of the design itself. For example, any logic faults in the
design must be detected before coding commences; otherwise the cost of
fixing the faults will be considerably higher. Detection of design faults can be achieved
by means of inspection as well as walkthrough.
Testing is one of the important steps in the software development phase. Testing checks for
errors; for the project as a whole, testing involves the following:
Static analysis is used to investigate the structural properties of the source code.
Dynamic testing is used to investigate the behavior of the source code by executing the
program on the test data.
UNIT TESTING:
Unit testing is conducted to verify the functional performance of each modular
component of the software. Unit testing focuses on the smallest unit of the software design, i.e.,
the module. White-box testing techniques were heavily employed for unit testing.
FUNCTIONAL TESTS:
Functional test cases involved exercising the code with nominal input values for which
the expected results are known, as well as boundary values and special values, such as logically
related inputs, files of identical elements, and empty files.
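The kinds of input listed above can be shown with a small sketch. The unit under test here, a mean() helper, is hypothetical and stands in for any module of the project; the choice to return NaN for empty input is an assumption made for the example.

```java
/** Illustrative only: functional test cases for a small, hypothetical unit. */
final class FunctionalTestSketch {

    /** Unit under test: arithmetic mean, defined here as NaN for empty input. */
    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Fails loudly, naming the test case that did not hold. */
    static void check(boolean condition, String caseName) {
        if (!condition) {
            throw new AssertionError("failed: " + caseName);
        }
    }

    public static void main(String[] args) {
        check(mean(new double[] {1.0, 2.0, 3.0}) == 2.0, "nominal values");
        check(mean(new double[] {5.0, 5.0, 5.0}) == 5.0, "identical elements");
        check(mean(new double[] {7.0}) == 7.0, "boundary: single element");
        check(Double.isNaN(mean(new double[0])), "special: empty input");
        System.out.println("all functional cases passed");
    }
}
```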
Functional testing comprises three types of tests:
Performance Test
Stress Test
Structure Test
PERFORMANCE TEST:
It determines the amount of execution time spent in various parts of the unit, the program
throughput, the response time and the device utilization of the program unit.
STRESS TEST:
Stress tests are tests designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the manner in which a
program unit breaks.
STRUCTURE TEST:
Structure tests are concerned with exercising the internal logic of a program and traversing
particular execution paths. A white-box test strategy was employed to ensure that
the test cases guarantee that all independent paths within a module have been
exercised at least once.
INTEGRATION TESTING:
Integration testing is a systematic technique for constructing the program structure while at
the same time conducting tests to uncover errors associated with interfacing; i.e., integration testing
is the complete testing of the set of modules which make up the product. The objective is to take
unit-tested modules and build a program structure; the tester should identify critical modules. Critical
modules should be tested as early as possible. One approach is to wait until all the units have passed
testing, and then combine and test them together. This approach evolved from unstructured testing
of small programs. Another strategy is to construct the product in increments of tested units. A
small set of modules is integrated and tested, to which another module is added and tested
in combination, and so on. The advantage of this approach is that interface discrepancies can be
easily found and corrected.
The major error faced during the project was a linking error. When all the modules
were combined, the links to the support files were not set properly. We then checked the
interconnections and the links. Errors are localized to the new module and its intercommunications.
The product development can be staged, and modules integrated as they complete unit testing.
Testing is completed when the last module is integrated and tested.
VALIDATION TESTING:
Software validation is achieved through a series of tests that demonstrate
conformity with requirements. A test plan outlines the classes of tests to be conducted, and a test
procedure defines specific test cases that will be used to demonstrate conformity with requirements.
The proposed system has been tested by validation and found to be
working satisfactorily.
PROGRAM TESTING:
Logical and syntax errors are identified by program testing. A syntax error
is an error in a program statement that violates one or more rules of the language in which it is
written. An improperly defined field dimension or omitted keywords are common syntax errors.
These errors are shown through error messages generated by the computer. A logic error, on the
other hand, deals with incorrect data fields, out-of-range items and invalid combinations. Since
compilers will not detect logical errors, the programmer must examine the output. Condition
testing exercises the logical conditions contained in a module. The possible types of elements in a
condition include a Boolean operator, a Boolean variable, a pair of Boolean parentheses, a relational
operator or an arithmetic expression. The condition testing method focuses on testing each condition in
the program. The purpose of condition testing is to detect not only errors in the conditions of a program
but also other errors in the program.
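Condition testing can be sketched on a small compound condition. The inRange() method below is hypothetical and not part of the project; the cases are chosen so that each relational sub-condition is seen both true and false, including the two boundary values.

```java
/** Illustrative only: condition testing of a compound Boolean expression. */
final class ConditionTestSketch {

    /** Hypothetical unit: true when value lies within [low, high]. */
    static boolean inRange(int value, int low, int high) {
        return value >= low && value <= high;
    }

    /** Fails loudly, naming the condition case that did not hold. */
    static void check(boolean condition, String caseName) {
        if (!condition) {
            throw new AssertionError("failed: " + caseName);
        }
    }

    public static void main(String[] args) {
        check(inRange(5, 1, 10), "both conditions true");
        check(!inRange(0, 1, 10), "first condition false");
        check(!inRange(11, 1, 10), "second condition false");
        check(inRange(1, 1, 10), "lower boundary");
        check(inRange(10, 1, 10), "upper boundary");
        System.out.println("all condition cases passed");
    }
}
```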
SECURITY TESTING:
Security testing attempts to verify that the protection mechanisms built into a system will, in
fact, protect it from improper penetration. The system's security must be tested for invulnerability
from frontal attack and must also be tested for invulnerability from rear attack. During security testing, the
tester plays the role of an individual who desires to penetrate the system.
SOFTWARE TESTING STRATEGIES:
A software testing strategy provides a road map for the software developer. Testing is a set of activities
that can be planned in advance and conducted systematically. For this reason, a template for
software testing (a set of steps into which we can place specific test case design methods) should be
defined. Any testing strategy should have the following characteristics:
Testing begins at the module level and works “outward” toward the integration of the entire computer-based system.
Different testing techniques are appropriate at different points in time.
The developer of the software and an independent test group conduct the testing.
Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.
S.No    Activity Description    Planned Date    Actual Date    Status    Remark
CLIENT 1 :
CLIENT 2 :
D.S.Arun Francis
Sign of Director