Volume 3, Issue 2, February-2016, pp. 57-61
ISSN (O): 2349-7084
International Journal of Computer Engineering In Research Trends

Available online at: www.ijcert.org
Detection and Avoidance of Sensitive Data in Host-assisted Mechanism using Fuzzy Fingerprint Technique
Ms. Patil Deepali E.¹, Prof. Takmare Sachin B.²
¹ Pursuing M.E., CSE Branch, Dept. of CSE
² Assistant Professor, Department of Computer Science and Engineering
Bharati Vidyapeeth's College of Engineering, Kolhapur, Maharashtra, India.
Abstract: Human mistakes are one of the causes of data loss in data-leak cases. Deliberately planned attacks, inadvertent leaks, and human mistakes lead to most data-leak incidents. Detection solutions identify inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach requires the detection operation to be conducted in secrecy. The privacy-preserving data-leak detection (DLD) solution addresses this by using a special set of sensitive data digests in detection. Its advantage is that the data owner can safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. Evaluation results support accurate detection with a very small number of false alarms under various data-leak scenarios. This paper designs a host-assisted mechanism for complete data-leak detection in large-scale organizations, using data signatures and fuzzy fingerprints.
Keywords: Data leak, network security, privacy, fuzzy fingerprint, data-leak detection.

1. INTRODUCTION
According to Risk Based Security (RBS), the number of leaked sensitive data records has increased dramatically during the last few years. Detecting and preventing data leaks requires a set of complementary solutions. These may include data-leak detection, data confinement, stealthy malware detection, and policy enforcement. Network data-leak detection (DLD) performs deep packet inspection (DPI) and searches for any occurrences of sensitive data patterns. DPI is a technique that analyzes the payloads of IP/TCP packets to inspect application-layer data. Alerts are triggered when the amount of sensitive data found in traffic passes a threshold.

Straightforward realizations of data-leak detection require the plaintext sensitive data. This requirement is undesirable because it may threaten the confidentiality of the sensitive information. A data owner who outsources data-leak detection to a provider would otherwise have to reveal the plaintext sensitive data to that provider. New data-leak detection solutions are therefore needed that allow providers to scan content for leaks without learning the sensitive information. The fuzzy fingerprint technique is designed, implemented, and evaluated to enhance data privacy during data-leak detection operations. It is based on a fast and practical one-way computation over the sensitive data. With this detection method, the DLD provider is modeled as an honest-but-curious adversary and can gain only limited knowledge about the sensitive data, whether from the released digests or from the content being inspected.

Using these techniques, an Internet service provider (ISP) can securely perform detection on its customers' traffic and offer data-leak detection as an add-on service. In another scenario, individuals can mark their own sensitive data and ask the administrator of their local network to detect data leaks for them. The DLD provider computes fingerprints from network traffic and identifies potential leaks in them. To prevent the DLD provider from gathering exact knowledge about the sensitive data, the collection of potential leaks is composed of
2016, IJCERT All Rights Reserved Page | 57

both noise and real leaks. It is the data owner who post-processes the potential leaks sent back by the DLD provider and determines whether there is any real data leak. This model supports delegation of the detection operation, so ISPs can use it to provide data-leak detection to their customers as an add-on service. An efficient technique, the fuzzy fingerprint, is designed, implemented, and evaluated for privacy-preserving data-leak detection.

Fuzzy fingerprints are special sensitive-data digests prepared by the data owner for release to the DLD provider. The reported results indicate that the underlying scheme achieves high accuracy with a very low false positive rate. The data preparation and filtering steps can take a considerable amount of processing time. Once preprocessing is done, however, the data become more reliable and robust results are achieved. Extensive experiments have been conducted to validate the accuracy, efficiency, and privacy of these solutions. The results provided by host logs are used to detect sensitive data leaks. This work designs a host-assisted mechanism for complete data-leak detection in large-scale organizations, using data signatures and fuzzy fingerprints.

2. LITERATURE REVIEW
Xiaokui Shu, Danfeng Yao, and Elisa Bertino (2015) [1] observed that, among multiple data-leak cases, human mistakes are one of the main causes of data loss. Their work detects inadvertent sensitive data leaks caused by human mistakes and provides alerts for organizations. They present a privacy-preserving data-leak detection (DLD) solution in which a special set of sensitive data digests is used in detection. The advantage of the method is that it enables the data owner to safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that the method supports accurate detection with a very small number of false alarms under various data-leak scenarios.

X. Shu and D. Yao (2012) [2] focus on the latter kind of services, where location information is essentially used to determine membership in one or more geographic sets. They address this problem using Bloom filters (BF), a compact data structure for representing sets. In particular, they present an extension of the original Bloom filter idea called the Spatial Bloom Filter (SBF). SBFs are designed to manage spatial and geographical information in a space-efficient way. They are well suited for enabling privacy in location-aware applications. The authors show this by providing two multi-party protocols for the privacy-preserving computation of location information, based on the known homomorphic properties of public-key encryption schemes.

According to K. Borders and A. Prakash (2009) [3], information can leak through various routes, for example humans, paper, the Internet, and USB flash memory. Finding information leakage by counting the characters of HTTP requests is difficult when the number of leaked characters is not large. If the approximate entropy is calculated instead, its value is small overall because a lot of repeated information is ignored.

According to H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda (2007) [4], malware has become a significant, complex, and widespread problem within the computer industry. The classification model is based on an examination of eight malware samples. It identifies four malware commonalities and classifies them along the dimensions of persistence and stealth. The goal of the article is to provide a better understanding of when cyber conflict will happen and to help defenders better mitigate the potential damage.

K. Borders, E. V. Weele, B. Lau, and A. Prakash (2009) [5] present a practical and powerful device-based isolation approach for information security. They demonstrate its application in preserving the confidentiality of cryptographic keys. Device-based isolation separates the storage and operations of data with different security requirements across multiple computing devices. The isolation should not hinder the use of, or access to, the data in practical applications.

According to A. Nadkarni and W. Enck (2013) [6], the exposure of sensitive data in storage and transmission poses a serious threat to organizational and personal security. Auto-FBI guarantees secure access to sensitive data on the web by automatically generating a new browser instance for sensitive content. Aquifer is a policy framework and system that helps prevent accidental information disclosure in the OS.

According to G. Karjoth and M. Schunter (2002) [7], privacy policy specification and enforcement has become a hotbed of research activity over the past few years as Internet use has risen around the globe. As the number of consumers participating in online activities grows, it
becomes increasingly imperative for organizations to express their privacy practices in an accurate, accessible, and useful way. The quality criteria used in software requirements specification can be used to evaluate privacy policies specified in P3P and EPAL.

Y. Jang, S. P. Chung, B. D. Payne, and W. Lee (2014) [8] have proposed a way to capture richer semantics of the user's intent. The method is based on an observation about most text-based applications: the user's intent is displayed entirely on screen as text, and the user makes modifications to that text. Based on this idea, they implemented a prototype called Gyrus. Gyrus enforces correct application behavior by capturing user intent. Because the approach is attack-agnostic, it will scale better than traditional security systems.

A. Broder and M. Mitzenmacher (2004) [9] have described the mathematics behind Bloom filters, their history, and some important variations. A Bloom filter is a simple, space-efficient, randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives, but the space savings often outweigh this drawback when the probability of an error is made sufficiently low. The survey covers the ways in which Bloom filters have been used and modified for a variety of network problems. Its aim is to provide a unified mathematical and practical framework for them and to stimulate their use in future applications.

According to R. Chen, B. C. M. Fung, N. Mohammed, B. C. Desai, and K. Wang (2013) [10], data in its original form typically contains sensitive information about individuals, and publishing such data would violate individual privacy. In current practice, data publishing relies mainly on policies and guidelines about what types of data can be published, and on agreements about the use of the published data. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. The work systematically summarizes and evaluates different approaches to PPDP and studies the challenges in practical data publishing. It also clarifies the differences and requirements that distinguish PPDP from other related problems and proposes future research directions.

3. ARCHITECTURE DESIGN
1. Host-assisted mechanism: The host-assisted mechanism provides complete data-leak detection for large-scale organizations. The data owner computes a special set of digests or fingerprints from the sensitive data and then discloses only a small number of them to the DLD provider. The detection system has been implemented and evaluated extensively to measure its privacy guarantee, efficiency, and detection rate. The evaluation used the 2.6 GB Enron dataset, the Internet surfing traffic of 20 users, and 5 simulated real-world data-leak scenarios.

2. Data Preprocessing: Sentiment or emotion analysis of social networking data involves a lot of data preprocessing. The data preparation and filtering steps can take a considerable amount of processing time. Once preprocessing is done, however, the data become reliable and robust results are achieved. Data preprocessing is done to eliminate incomplete, noisy, and inconsistent data.

Fig. 1. Architecture of the Host-assisted Mechanism for Data-leak Detection model

3. Fuzzy Fingerprint: The fuzzy fingerprint technique enhances data privacy during data-leak detection operations. Fuzzy fingerprints are special sensitive-data digests prepared by the data owner for release to the DLD provider. The fuzzy fingerprint mechanism improves data protection against a semi-honest DLD provider.

4. Provider: This module describes how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. Using fuzzy fingerprint techniques, an Internet service provider (ISP) can securely perform detection on its customers' traffic and provide data-leak detection as an add-on service. The gateway dumps the network traffic and sends it to a DLD server/provider (Linux).

5. Detect: A common approach is to screen content in storage and transmission for exposed sensitive information. The detection operation is conducted in secrecy. The detection system can be deployed on a router or integrated into existing network intrusion detection systems (NIDS).
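
The Host-assisted mechanism, Fuzzy Fingerprint, Provider and Detect modules above can be illustrated with a minimal Python sketch of one detection round. In this round the data owner releases fuzzy digests, the DLD provider flags candidate matches in traffic, and the owner post-processes them against a threshold. This is a simplified illustration under stated assumptions, not the authors' implementation:

- Truncated SHA-256 stands in for Rabin fingerprints.
- "Fuzzifying" is assumed to mean clearing the low `fuzzy_bits` bits of each fingerprint. Unrelated traffic fingerprints can then collide with released values and add noise to the provider's view.
- The released fuzzy set is stored in a Bloom filter, in the spirit of [9].
- The owner releases the fuzzy form of every fingerprint rather than a subset.

All function names, the shingle length `k`, the filter size and the 0.3 alert threshold are hypothetical.

```python
import hashlib

FP_BYTES = 4  # hypothetical: 32-bit shingle fingerprints


def shingles(text: str, k: int):
    """Yield every k-character substring (shingle) of text, left to right."""
    for i in range(len(text) - k + 1):
        yield text[i:i + k]


def fingerprint(shingle: str) -> int:
    """32-bit digest of one shingle; truncated SHA-256 stands in for a Rabin fingerprint (assumption)."""
    digest = hashlib.sha256(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:FP_BYTES], "big")


def fuzzify(fp: int, fuzzy_bits: int) -> int:
    """Assumed fuzzification: clear the low fuzzy_bits bits, so 2**fuzzy_bits fingerprints share one value."""
    return (fp >> fuzzy_bits) << fuzzy_bits


class BloomFilter:
    """Bit vector with k hashed positions per element: false positives possible, no false negatives."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, value: int):
        # Derive k bit positions by salting SHA-256 with the hash index i.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, value: int) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, value: int) -> bool:
        # A value is reported as a member only if every mapped bit is set.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(value))


def owner_prepare(sensitive: str, k: int, fuzzy_bits: int):
    """Data owner: keep exact fingerprints private, release only a filter of fuzzy ones."""
    exact = {fingerprint(s) for s in shingles(sensitive, k)}
    released = BloomFilter()
    for fp in exact:
        released.add(fuzzify(fp, fuzzy_bits))
    return exact, released


def provider_detect(traffic: str, released: BloomFilter, k: int, fuzzy_bits: int):
    """DLD provider: flag every traffic fingerprint whose fuzzy form hits the released filter."""
    fps = [fingerprint(s) for s in shingles(traffic, k)]
    candidates = [fp for fp in fps if fuzzify(fp, fuzzy_bits) in released]
    return candidates, len(fps)


def owner_postprocess(candidates, total: int, exact: set, threshold: float) -> bool:
    """Data owner: discard noise (non-exact hits) and alert if the leaked share passes the threshold."""
    real = sum(1 for fp in candidates if fp in exact)
    return total > 0 and real / total >= threshold


if __name__ == "__main__":
    K, FUZZY_BITS, THRESHOLD = 8, 8, 0.3  # hypothetical parameters
    exact, released = owner_prepare("ACCOUNT 4412-9087-3321 PIN 0042", K, FUZZY_BITS)
    for packet in ("POST /upload body=ACCOUNT 4412-9087-3321 PIN 0042 end",
                   "GET /index.html HTTP/1.1 Host: example.org"):
        candidates, total = provider_detect(packet, released, K, FUZZY_BITS)
        alert = owner_postprocess(candidates, total, exact, THRESHOLD)
        print(packet[:20], "-> alert" if alert else "-> no alert")
```

In this sketch only `released` leaves the data owner. The provider therefore sees fuzzy values and candidate matches, but never the owner's exact fingerprints, and the exact comparison and the alert decision stay with the owner. Increasing `fuzzy_bits` makes each released value cover more fingerprints. That means more noise for the provider and more post-processing for the owner. Making this change only requires re-fuzzifying the stored exact fingerprints.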

6. Host Logs: The server checks the host logs and decides which model in the system to send them to.

7. DLD Evaluation: The evaluation results show that the system supports accurate detection with a very small number of false alarms under various data-leak scenarios. Rabin fingerprints with a variety of moduli are used as the hash functions in the fingerprint filter. Extensive experimental evaluation is performed on the fingerprint filter and on a Bloom filter with MD5/SHA.

Privacy-preserving data-leak detection supports practical data-leak detection as a service and minimizes the knowledge that a DLD provider may gain during the process. The data owner sends the digests to the DLD provider. In the MONITOR and DETECT operations, the DLD provider collects the outgoing traffic of the organization, computes digests of the traffic content, and identifies potential leaks. In the REPORT operation, the DLD provider returns data-leak alerts, which may include false positives, to the data owner. The fingerprint filter extends the Bloom filter for use in the DETECT operation as an efficient set-intersection test. A Bloom filter is a well-known space-saving data structure for performing set-membership tests. It applies multiple hash functions to each set element and stores the resulting values in a bit vector. To test whether a value v belongs to the set, the filter checks the bit mapped by each hash function and reports v as a member only if all of those bits are set.

The security and privacy guarantees provided by this data-leak detection system are analyzed. The analysis also covers the sources of possible false negatives (data-leak cases being overlooked) and false positives (legitimate traffic misclassified as data leaks) in the detection. The fuzzy fingerprint approach is more flexible from a deployment perspective, because the data owner can adjust and fine-tune the privacy and accuracy of the detection without recomputing the fingerprints. Extensive experiments have been conducted to validate the accuracy, efficiency, and privacy of these solutions.

4. CONCLUSION
The fuzzy fingerprint is a privacy-preserving data-leak detection model, and its realization has been presented. Using special digests, the exposure of sensitive data is kept to a minimum during detection. Extensive experiments have been conducted to validate the accuracy, privacy, and efficiency of these solutions. This work designs a host-assisted mechanism for complete data-leak detection in large-scale organizations.

ACKNOWLEDGEMENT
The authors express their sincere thanks and gratitude to the Computer Department of Engineering, BVCOEK, for the encouragement and facilities offered to them for carrying out this project. The authors would like to thank Prof. Chougule A. B. (Head of Computer Science and Engineering, BVCOEK) and Prof. S. B. Takmare.

REFERENCES
[1] X. Shu, D. Yao, and E. Bertino, "Privacy-preserving detection of sensitive data exposure," IEEE Trans. Inf. Forensics Security, vol. 10, no. 5, pp. 1092-1103, May 2015.
[2] X. Shu and D. Yao, "Data leak detection as a service," in Proc. 8th Int. Conf. Secur. Privacy Commun. Netw., 2012, pp. 222-240.
[3] K. Borders and A. Prakash, "Quantifying information leaks in outbound web traffic," in Proc. 30th IEEE Symp. Secur. Privacy, May 2009, pp. 129-140.
[4] H. Yin, D. Song, M. Egele, C. Kruegel, and E. Kirda, "Panorama: Capturing system-wide information flow for malware detection and analysis," in Proc. 14th ACM Conf. Comput. Commun. Secur., 2007, pp. 116-127.
[5] K. Borders, E. V. Weele, B. Lau, and A. Prakash, "Protecting confidential data on personal computers with storage capsules," in Proc. 18th USENIX Secur. Symp., 2009, pp. 367-382.
[6] A. Nadkarni and W. Enck, "Preventing accidental data disclosure in modern operating systems," in Proc. 20th ACM Conf. Comput. Commun. Secur., 2013, pp. 1029-1042.
[7] G. Karjoth and M. Schunter, "A privacy policy model for enterprises," in Proc. 15th IEEE Comput. Secur. Found. Workshop, Jun. 2002, pp. 271-281.
[8] Y. Jang, S. P. Chung, B. D. Payne, and W. Lee, "Gyrus: A framework for user-intent monitoring of text-based networked applications," in Proc. 23rd USENIX Secur. Symp., 2014, pp. 79-93.
[9] A. Broder and M. Mitzenmacher, "Network applications of Bloom filters: A survey," Internet Math., vol. 1, no. 4, pp. 485-509, 2004.
[10] R. Chen, B. C. M. Fung, N. Mohammed, B. C. Desai, and K. Wang, "Privacy-preserving trajectory data publishing by local suppression," Inf. Sci., vol. 231, pp. 83-97, May 2013.

AUTHOR PROFILE
Ms. Patil Deepali E. is an M.E. student at Bharati Vidyapeeth's College of Engineering, Kolhapur, Maharashtra, India. Her research interests lie in networking and network security. She has published a paper in a national-level conference.

Mr. Sachin Balawant Takmare is working as an assistant professor in the Computer Science and Engineering department of Bharati Vidyapeeth's College of Engineering, Kolhapur. He has about 10 years of teaching experience. He has published about 3 international papers and 5 national papers.
