Proceedings NCACT - 2011
PROCEEDINGS OF THE 4th NATIONAL CONFERENCE ON ADVANCED COMPUTING TECHNOLOGIES (NCACT'11), FEBRUARY 2, 2011 @ S.A. ENGINEERING COLLEGE
Organized by
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
S.A. ENGINEERING COLLEGE
NBA ACCREDITED & ISO 9001:2008 CERTIFIED INSTITUTION
Poonamallee – Avadi Road, Veeraraghavapuram,
Thiruverkadu, Chennai – 600 077.
E-Mail: ncact2011@saec.ac.in Website: www.saec.ac.in
Phone: 044 - 26801999, 26801499
Fax: 044 - 26801899
Sponsored by
DHARMA NAIDU EDUCATIONAL AND
CHARITABLE TRUST
BOARD OF TRUSTEES
(Late) D. SUDHARSSANAM
Founder
Shri. D. DURAISWAMY
Chairman
Shri. D. PARANTHAMAN
Vice Chairman
Shri. D. DASARATHAN
Secretary
Shri. S. AMARNAATH
Treasurer
Shri. S. GOPINATH
Joint Secretary
PREFACE
The Department of Computer Science & Engineering, S.A. Engineering College, Chennai, organized the 4th National Conference on Advanced Computing Technologies (NCACT-2011) on 2nd February 2011. NCACT-2011 covered the following areas:

Wireless Networks
3G/4G Networks
E-learning Methodologies
Information Retrieval
Computational Intelligence
A total of about 172 technical papers were received from postgraduate students, faculty members and research scholars from R&D organizations, covering a wide spectrum of areas, viz. Cloud, Grid and Quantum Computing; Data Mining and Data Warehousing; Network Security; Wireless Technologies; Operating Systems; Web Mining; etc. These papers were peer reviewed by technical experts, and 78 papers were selected for presentation. This volume is a record of current research on recent trends in advanced computing technologies. We would like to express our sincere thanks to Shri. D. Duraiswamy, Chairman; Shri. D. Dasarathan, Secretary; Shri. D. Paranthaman, Vice Chairman; Shri. S. Amarnaath, Treasurer; Thiru P. Venkatesh Raja, Director; and Dr. S. Suyambhazhahan, Principal, for providing us all the support needed to conduct this 4th National Conference. We thank the various organizations that deputed delegates to participate in the conference. We wish to express our sincere thanks to all the advisory committee members for their cordiality and for sharing their expertise during the various stages of the conference. We also thank the faculty members and students of the Department of Computer Science & Engineering for their co-operation in making this conference a grand success.
EDITORIAL BOARD
N. PARTHEEBAN, (Ph.D.)
ASSISTANT PROFESSOR
COMPUTER SCIENCE AND ENGINEERING
STEERING COMMITTEE
CHIEF PATRON
ADVISORY COMMITTEE
CONFERENCE CHAIRMAN
• To create an excellent teaching and learning environment for our staff and students to realize their full potential, thus enabling them to contribute positively to the community.

world-class academic facilities, employing highly qualified and experienced faculty
strong over the years in terms of infrastructure facilities, an experienced and dedicated team of faculty, technical expertise, modern teaching aids, tutorial rooms, and well-equipped, spacious laboratories. The department has a separate library and a seminar hall with all the latest equipment. The department currently has an intake of 120 students. The department also has 120 kVA power backup and a 2 Mbps leased line
S. AMARNAATH, M.Com.,
CORRESPONDENT
MESSAGE
We at this institution constantly strive to provide an excellent academic environment
for the benefit of students and faculty so that they will acquire a technological competence
synonymous with human dignity and values.
I am happy to know that our institution is maintaining the tradition it has set in Engineering & Technology, cultural and other activities of the organization, extending to another milestone with this Conference in the academic year 2010-2011, organized by the Department of Computer Science and Engineering.

I congratulate and offer my best wishes to the Principal and the committee members who have involved themselves in this conference, working towards academic development for the benefit of the student community.
S.AMARNAATH
MESSAGE
This institution is a tribute to the great organizing genius of its Founder. Without his initiative and inspiration it would have been impossible to found an institution of this character.

This institution is a memorable experiment in the moral and technological regeneration of India. It stands for nothing less.

With this, we are proud to conduct the "4th National Conference on ADVANCED COMPUTING TECHNOLOGIES - NCACT'11" on 2nd February 2011. We wish and thank the Principal and the faculty members who have involved themselves in this Conference, and the participants who have come forward to benefit themselves and develop their academic knowledge and confidence through this conference.
P. VENKATESH RAJA
MESSAGE
I appreciate the initiative taken by the head of the department and the faculty members of Computer Science and Engineering in conducting the "4th National Conference on ADVANCED COMPUTING TECHNOLOGIES - NCACT'11" on 2nd February 2011 at our college campus.

I sincerely appreciate the efforts made by the Principal, HOD, staff and students, with their great sense of belongingness and ownership, and wish them great success in the coming times as well.
Dr.S.SUYAMBAZHAHAN
S.No | Name of the Author(s) | Title of the Paper | Name of the College | Mail ID / Contact No
8. ANUSHA.S, B.BHUVANESWARAN | CRYPTANALYSIS OF AN EDGE CRYPT ALGORITHM | RAJALAKSHMI ENGINEERING COLLEGE | s_anusha25@yahoo.com, bhuvangates@yahoo.com
9. GAYATHRI.U | A CONCURRENCY CONTROL PROTOCOL USING ZR+-TREES FOR SPATIAL JOIN AND KNN QUERIES | RAJALAKSHMI ENGINEERING COLLEGE | gaayathriu@gmail.com
10. LEKSHMI PRIYA.R | SECURE ENERGY EFFICIENT DATA AGGREGATION PROTOCOL FOR DATA REPORTING IN WIRELESS SENSOR NETWORKS | RAJALAKSHMI ENGINEERING COLLEGE | yaraj09@gmail.com
11. EVANGELIN HEMA MARIYA.R | EFFICIENT ENERGY SAVING USING DISTRIBUTED CLUSTER HEADS IN WIRELESS SENSOR NETWORKS | RAJALAKSHMI ENGINEERING COLLEGE | Hema.mariya@Gmail.com
12. VADHANI.R | A NOVEL FRAMEWORK FOR DENIAL OF PHISHING BY COMBINING HEURISTIC & CONTENT BASED SEARCH ALGORITHM | RAJALAKSHMI ENGINEERING COLLEGE | Vadhanitamilarasi@gmail.com
13. SURESHBABU.D, C.PRABHAKARAN | RISK ESTIMATION USING OBJECT-ORIENTED METRICS | VEL TECH MULTI TECH DR.RR & DR.SR ENGINEERING COLLEGE | sureshbabu.me@gmail.com
14. PIRAMANAYAGAM.M, M.YUVARAJU | SECURE ENCRYPTION AND KEYING BASED ON VIRTUAL ENERGY FOR WIRELESS SENSOR NETWORKS | DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, ANNA UNIVERSITY OF TECHNOLOGY, COIMBATORE | rajaucbe@gmail.com
15. SIVARANJANI.P, P.NEELAVENI | HYBRID INFRASTRUCTURE SYSTEM FOR EXECUTING SERVICE WORKFLOWS | G.K.M COLLEGE OF ENGINEERING AND TECHNOLOGY, PERINGALATHUR, CHENNAI | ranjusiva22@gmail.com
16. NANDHINI.T.J | IMPROVISED SOLUTION THROUGH MERKLE TREE | RAJALAKSHMI ENGINEERING COLLEGE | nandhinii.km@gmail.com
24. MAHAALAKSHMI K, NEELAKANDAN S | AUTOMATIC DATA EXTRACTION FROM WEBPAGES BY WEBNLP | VEL TECH MULTI TECH DR.RR & DR.SR ENGG COLLEGE | kmahaalakshmi@yahoo.com, snksnk07@gmail.com, anandhime@ymail.com
25. MATHEW P C, M. ARUNKUMAR | IMPLEMENTATION OF ENCRYPTED IMAGE COMPRESSION USING RESOLUTION PROGRESSIVE COMPRESSION SCHEME | PSNA COLLEGE OF ENGINEERING AND TECHNOLOGY, DINDIGUL, TAMIL NADU | pcmathew_bmc@yahoo.com
DR.KARTHIKEYANI.V, TAJUDIN.K, PARVIN BEGAM.I | | COLLEGE |
32. SIVAKAMY.N, B.MURUGESWARI, DR.C.JAYAKUMAR | MODIFIED DELAY STRATEGY IN GRID ENVIRONMENT | S.A. ENGINEERING COLLEGE, CHENNAI | sivakamyn@yahoo.com
33. INDUMATHY.C | AN EFFECTIVE WEB-BASED E-LEARNING BY MANAGING RESOURCES USING ONTOLOGY | EASWARI ENGINEERING COLLEGE | indumathyc@hotmail.com
34. ESWARI.R | EFFECTIVE AND EFFICIENT QUERY PROCESSING FOR IDENTIFYING VIDEO SUBSEQUENCE | EASWARI ENGINEERING COLLEGE | eswariram_88@yahoo.co.in
35. VEENA.K | COMPUTER AND INFORMATION SECURITY | ANNA UNIVERSITY | dce.veena@gmail.com
36. ALEN JEFFIE PENELOPE.J, L.BHAGYALAKSHMI | ENHANCING THE LIFETIME OF DATA GATHERING WIRELESS SENSOR NETWORK BY BALANCING ENERGY CONSUMPTION | EASWARI ENGINEERING COLLEGE | msg2allen@yahoo.co.in
AMUDHA.S | | ST PETER'S UNIVERSITY |
GRID COMPUTING
40. AARTHI.S | NOVEL METHOD FOR SEQUENCE NUMBER COLLECTOR PROBLEM IN BLACK HOLE ATTACK DETECTION - AODV BASED MANET | RAJALAKSHMI ENGINEERING COLLEGE, CHENNAI | cse.aarthi@gmail.com
41. JESHIFA.G.IMMANUEL, A.FIDAL CASTRO, PROF.E.BABU RAJ | ADVANCED CONGESTION CONTROL TECHNIQUE FOR HEALTH CARE MONITORING IN WIRELESS BIOMEDICAL SENSOR NETWORKS | JAYA ENGINEERING COLLEGE, THIRUNINDRAVUR - 602024 | jeshifa@gmail.com
42. NITHYA KUMARI.K, BHAGYALAKSHMI.L | A GAME THEORETIC FRAMEWORK FOR POWER CONTROL IN WIRELESS AD HOC NETWORKS | EASWARI ENGINEERING COLLEGE | kumari.nithu@yahoo.co.in
43. UMA.R, L. PAUL JASMINE RANI | SMABS: SECURE MULTICAST AUTHENTICATION BASED ON BATCH SIGNATURE | S.A. ENGINEERING COLLEGE | uma_devi1985@yahoo.com
44. SANJAIKUMAR.K, G.UMARANI SRIKANTH, M.E., (PhD) | AFFINE SYMMETRIC IMAGE MODEL | S.A. ENGINEERING COLLEGE | k.sanjai31@rocketmail.com
45. MADHAVI.S, S. KALPANA DEVI | COMBINING TPE SCHEME AND SDEC FOR SECURE DISTRIBUTED NETWORKED STORAGE | S.A. ENGINEERING COLLEGE | Madhavi11lakshmi@gmail.com
46. LAVANYA.R, E.SUJATHA | PERFORMANCE EVALUATION OF FLOOD SEQUENCING PROTOCOLS IN SENSOR NETWORKS | S.A. ENGINEERING COLLEGE | laaviraj@gmail.com
47. PARVIN BEGUM.I, DR.KARTHIKEYANI.V, TAJUDIN.K, SHAHINA BEGAM.I | KNOWLEDGE DISCOVERY PROCESS THROUGH TEXT MINING | SOKA IKEDA COLLEGE OF ARTS AND SCIENCE | parvinnadiya@gmail.com
48. SUNIL.P | DATA LEAKAGE DETECTION USING ROBUST AUDIO HIDING TECHNIQUES | PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT, ARANVAYALKUPPAM | psunilcad@gmail.com
55. HANSON THAYA.S | WIRELESS SENSOR NETWORK SECURITY USING VIRTUAL ENERGY BASED ENCRYPTION | S.A. ENGINEERING COLLEGE | Hanson2001@gmail.com
56. SANGEETHA.J | QOS-AWARE CHECKPOINTING ARRANGEMENT IN MOBILE GRID ENVIRONMENT | S.A. ENGINEERING COLLEGE | Jsnageetha22@gmail.com
57. SIREESHA.P | | S.A. ENGINEERING COLLEGE | Siri.it.05@gmail.com
58. SIVASAKTHI.K | EXTENDED QUERY ORIENTED, CONCEPT-BASED USER PROFILES FROM SEARCH ENGINE LOGS | S.A. ENGINEERING COLLEGE | 1k.s.shakthi@gmail.com
59. ALGUMANI.S | ENSEMBLE REGISTRATION OF MULTI SENSOR IMAGES | S.A. ENGINEERING COLLEGE | r.alagumani@gmail.com
60. UMA.S, G.SHOBA | EMBEDDING CRYPTOGRAPHY IN VIDEO STEGANOGRAPHY | DR.PAULS ENGINEERING COLLEGE | dewuma@gmail.com
61. KRISHNA KUMAR.N, MADHU SUDHANAN.S | RULE CLASSIFICATION FOR MEDICAL DATASET | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Krishnakumarme2603@gmail.com
62. NIROSHA.N | ENHANCED VEHICLE DETECTION BY EARLY OBJECT IDENTIFICATION | PITAM | niroshait@gmail.com
63. BHARATHIRAJA.S, S.SUMATHI | EFFICIENT ROUTING BASED ON LOAD BALANCING IN WIRELESS MESH NETWORKS | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Bharathiraja.88s@gmail.com
64. SWAPNA.P, K.SAILAKSHMI | | S.V.V.S.N. ENGINEERING COLLEGE | Priyaswapna245@gmail.com
65. USHA.M | ROUTING BASED ON LOAD | VELLAMMAL | umahalingam@gmail.com
66. DURAI MURUGAN.J | FLEXIBLE LOAD BALANCING IN MULTI-SERVER GRID ENVIRONMENT | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | Dmurugan02@gmail.com
67. JEGADEESAN.R | EFFICIENT LOAD BALANCING IN VIDEO SERVERS FOR VOD SYSTEM | VEL HIGH TECH | ramjaganjagan@gmail.com
68. RAMALINGAM.D | A TOOL FOR FINDING BUGS IN WEB APPLICATIONS | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | Ramscse_2006@yahoo.co.in
70. SURENDRAN.M, M.SARANYA, S.SUBRAMANIAN | VIRTUAL MOUSE USING HCI | SRIRAM ENGINEERING COLLEGE | Suren.csc@gmail.com
71. PANNER SELVI.R | EFFICIENTLY IDENTIFYING DDOS ATTACKS BY GROUP BASED THEORY | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Rpanner_selvi@yahoo.co.in
72. TAMILARASI.P | DEDUCING THE SCHEMA FOR WEBSITES USING PAGE-LEVEL WEB DATA EXTRACTION | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | tamuluparaman@gmail.com
75. VICTORIYA | MINIMIZATION OF HANDOFF FAILURE PROBABILITY IN NGWS USING CHMP | S.K.P. ENGINEERING COLLEGE, TIRUVANNAMALAI | Victoriya.isai@gmail.com
76. SUDHA RAJESH | LEARNING DISCRIMINATIVE CANONICAL CORRELATIONS FOR OBJECT RECOGNITION WITH IMAGE | SRR ENGINEERING COLLEGE | --
77. BHUVANESWARI | DISTRIBUTED DATA BACKUP AND RELIABLE RECOVERY FROM MOBILE GRID ENVIRONMENT | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | bhuvanabharathy@gmail.com
Û / U* ≥ γ_min / γ_max
B. Fairness
Theorem 2: Given an equilibrium (x̂, p̂), let ĉ_j := R_j x̂_j be the total bandwidth consumed by flows using protocol j at each link. The corresponding flow rates x̂_j are the unique solution of:

If Π_{l=1..L} m_l[k(j)]^t + Π_{l=1..L} m_l[n(j)]^t ≥ Π_{l=1..L} m_l[ζ(j)]^t, then the equilibrium of a regular network is locally stable.
μ_ji(t+T) = μ_ji(t) + k_ji [ ( Σ_{l∈L(j,i)} m_l(p_l(t+T)) / Σ_{l∈L(j,i)} p_l(t+T) ) − μ_ji(t) ], where k_ji is the stepsize of flow (j,i), and T is large enough so that the fast-timescale dynamics among x and p can reach steady state.

VII. SIMULATION RESULTS: RENO AND FAST
In this section, we apply Algorithm 1 to the
case of Reno and FAST coexisting in the same
network to resolve the issues illustrated in
Section II. It demonstrates how the algorithm
can be deployed incrementally where the
existing protocol (Reno in this case) needs no
change and only the new protocols (FAST in
this case) need to adopt slow timescale
adaptation for the whole network to converge
to the unique equilibrium that maximizes
(weighted) aggregate utility. Experiments in
this section were conducted in ns-2.
Fig. 3. FAST versus Reno with buffer size 400 packets: (a) a sample and (b) an average behaviour.

We take Reno's loss probability as the link price, i.e., m_1^i(p_i) = p_i for Reno. Algorithm 1 then reduces to an α adaptation scheme for FAST
that uses only end-to-end local information
that is available to each flow. This algorithm,
displayed as Algorithm 2, tunes the value of α
according to the signals of queue delay and
loss on a large timescale. The basic idea is
that FAST should adjust its aggressiveness
(parameter α) to the proper level by looking at
the ratio of end-to-end queueing delay and
end-to-end loss. Therefore FAST also reacts to
loss in a slow timescale.
α* = (q / (l·w)) α₀

α ← min{(1+δ)α, α*} if α < α*;  α ← max{(1−δ)α, α*} if α > α*,

where δ determines the responsiveness and is 0.1 by default.
2. Every window update interval (20 ms by default), run.
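As an illustration of the slow-timescale tuning just described, here is a minimal Python sketch (our own formulation; the function and parameter names, and how q, l and w are sampled, are illustrative assumptions rather than code from the paper):

DELTA = 0.1        # responsiveness parameter; 0.1 by default, per the text
ALPHA_0 = 200.0    # illustrative base value for alpha (an assumption)

def target_alpha(q, l, w):
    # alpha* = (q / (l * w)) * alpha_0: the ratio of end-to-end queueing
    # delay q to end-to-end loss l, scaled by the window w, sets the
    # aggressiveness that FAST should converge to.
    return (q / (l * w)) * ALPHA_0

def update_alpha(alpha, q, l, w):
    # Multiplicative move toward alpha*, clipped so it never overshoots;
    # run once per window update interval (20 ms by default).
    a_star = target_alpha(q, l, w)
    if alpha < a_star:
        return min((1 + DELTA) * alpha, a_star)
    if alpha > a_star:
        return max((1 - DELTA) * alpha, a_star)
    return alpha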
ms with buffer size of 400 and 3.2 packets per ms with buffer size of 80, while Reno gets 4.2 and 4.1 packets per ms, respectively. The fairness is greatly improved and is now essentially independent of buffer size. This is summarized in Table I by listing the ratio of Reno's bandwidth to FAST's. We also note that the utilization of the link increases significantly, from 53.6% to 97.7%. The trajectories of α with different buffer sizes are presented in Fig. 4. It is clear that although FAST starts with the same α in both cases, it finally ends up with a much larger α in the scenario with the larger buffer, as it experiences much higher equilibrium queueing delay with the large buffer.

Fig. 5. FAST starts first: (a) a sample and (b) an average behaviour.
Fig. 5. Reno starts first: (a) a sample and (b) an average behaviour.
clarify the global dynamics of the two timescale system.

The main technical difficulty here is that the fast timescale system may have multiple equilibria, and therefore the usual two timescale argument (e.g., singular perturbation) is not applicable. Our current model assumes each protocol only reacts to one particular price on the fast timescale, even when it has access to multiple types of prices. Finally, the current results should be extended from the static to the dynamic setting where flows come and go.

REFERENCES

[1] T. Bonald and L. Massoulié, "Impact of fairness on Internet performance," in Proc. ACM SIGMETRICS, Jun. 2001, pp. 82–91.
[2] L. Brakmo and L. Peterson, "TCP Vegas: End-to-end congestion avoidance on a global Internet," IEEE J. Sel. Areas Commun., vol. 13, no. 6, pp. 1465–1480, Oct. 1995.
[3] S. Deb and R. Srikant, "Rate-based versus queue-based models of congestion control," IEEE Trans. Autom. Control, vol. 51, no. 4, pp. 606–618, Apr. 2006.
[4] S. Floyd and V. Jacobson, "Random early detection gateways for congestion avoidance," IEEE/ACM Trans. Netw., vol. 1, no. 4, pp. 397–413, Aug. 1993.
[5] R. Jain, "A delay-based approach for congestion avoidance in interconnected heterogeneous computer networks," ACM Comput. Commun. Rev., vol. 19, no. 5, pp. 56–71, Oct. 1989.
[6] S. Kunniyur and R. Srikant, "End-to-end congestion control: Utility functions, random losses and ECN marks," IEEE/ACM Trans. Netw., vol. 11, no. 5, pp. 689–702, Oct. 2003.
[7] S. Low, "A duality model of TCP and queue management algorithms," IEEE/ACM Trans. Netw., vol. 11, no. 4, pp. 525–536, Aug. 2003.
[8] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Trans. Netw., vol. 8, no. 5, pp. 556–567, Oct. 2000.
[9] K. Ramakrishnan, S. Floyd, and D. Black, "The addition of explicit congestion notification (ECN) to IP," Internet Engineering Task Force, RFC 3168, 2001.
[10] A. Tang, J. Wang, S. Hegde, and S. Low, "Equilibrium and fairness of networks shared by TCP Reno and Vegas/FAST," Telecommun. Syst., vol. 30, no. 4, pp. 417–439, Dec. 2005.
[11] A. Tang, J. Wang, S. Low, and M. Chiang, "Equilibrium of heterogeneous congestion control: Existence and uniqueness," IEEE/ACM Trans. Netw., vol. 15, no. 4, pp. 824–837, Aug. 2007.
[12] V. Jacobson, "Congestion avoidance and control," in Proc. ACM SIGCOMM, 1988, pp. 314–329.
[13] "WAN-in-Lab," [Online]. Available: http://wil.cs.caltech.edu
[14] Z. Wang and J. Crowcroft, "Eliminating periodic packet losses in the 4.3-Tahoe BSD TCP congestion control algorithm," ACM Comput. Commun. Rev., vol. 22, no. 2, pp. 9–16, Apr. 1992.
[15] D. Wei, C. Jin, S. Low, and S. Hegde, "FAST TCP: Motivation, architecture, algorithms, performance," IEEE/ACM Trans. Netw., vol. 14, no. 6, pp. 1246–1259, Dec. 2006.
[16] B. Wydrowski, L. H. Andrew, and M. Zukerman, "MaxNet: A congestion control architecture for scalable networks," IEEE Commun. Lett., vol. 7, no. 10, pp. 511–513, 2003.
[17] L. Xu, K. Harfoush, and I. Rhee, "Binary increase congestion control for fast long-distance networks," in Proc. IEEE INFOCOM, 2004, vol. 4, pp. 2514–2524.
[18] W. Stallings, High-Speed Networks, 4th ed., Pearson.
[22] S. Liu, T. Basar, and R. Srikant, "TCP-Illinois: A loss and delay-based congestion control algorithm for high-speed networks," in Proc. 1st VALUETOOLS, 2006, Article no. 55.
*V. Gomathi
PG Scholar, M.E. (CSE), Prathyusha Institute of Technology and Management
Email: gomathi.gowrisankar@gmail.com, 9941446176
unified and global framework that consists of a robust set of applications for capturing, recording, indexing and retrieval, combined with browsing and various other audiovisual and semantic capabilities.

For this purpose, our research work targets bringing the MUVIS framework beyond the desktop environment into the realm of wireless devices such as mobile phones, Personal Digital Assistants (PDAs), communicators etc., where the user can perform query operations in large multimedia databases and query results can be retrieved within a reasonable time. Therefore, our main goal is to design and develop a CBIR system that enables any (mobile) client supporting the Java platform to retrieve images similar to the query image from an image database, which is accompanied by a dedicated server application.

In general, the purpose of CBIR is to present an image conceptually, with a set of low-level visual features such as colour, texture and shape. These conventional approaches for image retrieval are based on the computation of the similarity between the user's query and images via a query-by-example (QBE) system. In the QBE system, the user can pick up some preferred images to refine the image explorations iteratively. The feedback procedure, called Relevance Feedback (RF), repeats until the user is satisfied with the retrieval results. Although a number of RF studies have been made on interactive CBIR, they still incur some common problems, namely redundant browsing and exploration convergence. First, in terms of redundant browsing, most existing RF methods focus on how to earn the user's satisfaction in one query process. That is, existing methods refine the query again and again by analysing the specific relevant images picked up by the users. Especially for compound and complex images, the users might go through a long series of feedbacks to obtain the desired images using current RF approaches. The proposed approach NPRF integrates the discovered navigation patterns and three RF techniques to achieve efficient and effective image retrieval. The major difference between our proposed approach and other contemporary approaches is that it approximates an optimal solution to resolve the problems existing in current RF, such as redundant browsing and exploration convergence.

This paper is organized as follows: Section 2 gives a brief overview of CBIR; Section 3 describes the basic architecture and several functionalities of M-MUVIS; Section 4 describes query techniques; Section 5 describes the RF method based on the NPRF search; Section 6 describes the experimental results. Finally, the conclusion is Section 7.

II. CONTENT BASED IMAGE RETRIEVAL (CBIR)

Content-based image retrieval, known as CBIR, extracts several features that describe the content of the image, mapping the visual content of the images into a new space called the feature space. The feature space values for a given image are stored in a descriptor that can be used for retrieving similar images. To achieve these goals, CBIR systems use three basic types of features: colour features, texture features and shape features. High retrieval scores in content-based image retrieval systems can be attained by adopting relevance feedback mechanisms. These mechanisms require the user to grade the quality of the query results by marking the retrieved images as being either relevant or not. Then, the search engine uses this grading information in subsequent queries to better satisfy users' needs. It is noted that while relevance feedback mechanisms were first introduced in the information retrieval field, they currently receive considerable attention in the CBIR field. This project mainly focuses on efficient content-based image retrieval on a mobile device using navigation-pattern-based relevance feedback. This section contains basic information on CBIR and discusses the techniques used.

Fig 1: Content based image retrieval

For the design of content-based retrieval systems, a designer needs to consider four
used to find the most similar objects. Multi-dimensional indexing is used to accelerate the query performance in the search process.

F. SIMILARITY MEASUREMENT

To measure the similarity, the general approach is to represent the data features as multi-dimensional points and then to calculate the distances between the corresponding multi-dimensional points. Selection of metrics has a direct impact on the performance of a retrieval system. Euclidean distance is the most common metric used to measure the distance between two points in multi-dimensional space. However, for some applications, Euclidean distance is not compatible with human-perceived similarity. A number of metrics (e.g., Minkowski-form distance, Earth Mover's Distance, and Proportional Transportation Distance) have been proposed for specific purposes.

G. MULTI-DIMENSIONAL INDEXING

Retrieval of the media is usually based not only on the value of certain attributes, but also on the location of a feature vector in the feature space. In addition, a retrieval query on a database of multimedia with multi-dimensional feature vectors usually requires fast execution of search operations. To support such search operations, an appropriate multi-dimensional access method has to be used for indexing the reduced but still high dimensional feature vectors. Popular multi-dimensional indexing methods include the R-tree and the R*-tree. These multi-dimensional indexing methods perform well with a limit of up to 20 dimensions. One approach transforms music into numeric forms and develops an index structure based on the R-tree for effective retrieval.

H. QUERY SPECIFICATIONS

Querying is used to search for a set of results with similar content to the specified examples. Based on the type of media, queries in content-based retrieval systems can be designed for several modes (e.g., query by sketch, query by painting [for video and image], query by singing [for audio], and query by example). Queries in multimedia retrieval systems are traditionally performed by using an example or a series of examples. The task of the system is to determine which candidates are the most similar to the given example. This design is generally termed query-by-example (QBE) mode. The success of the query in this approach heavily depends on the initial set of candidates.

I. Relevance Feedback

High retrieval scores in content-based image retrieval systems can be attained by adopting relevance feedback mechanisms. These mechanisms require the user to grade the quality of the query results by marking the retrieved images as being either relevant or not. Then, the search engine uses this grading information in subsequent queries to better satisfy users' needs. It is noted that while relevance feedback mechanisms were first introduced in the information retrieval field, they currently receive considerable attention in the CBIR field.

III. M-MUVIS FRAMEWORK

Our research work targets bringing the MUVIS framework beyond the desktop environment into the realm of wireless devices such as mobile phones, Personal Digital Assistants (PDAs), communicators etc., where the user can perform query operations in large multimedia databases and query results can be retrieved within a reasonable time. Therefore, our main goal is to design and develop a CBIR system that enables any (mobile) client supporting the Java platform to retrieve images similar to the query image from an image database, which is accompanied by a dedicated server application. The developed system, the so-called Mobile MUVIS (M-MUVIS), shown in Figure 2, is structured upon the contemporary MUVIS framework and has a client-server architecture. The M-MUVIS server basically comprises two Java servlets [5] running inside a Tomcat [8] web server, which in effect transforms standalone MUVIS into a web application. The MUVIS Query Server (MQS) has native libraries for efficient image-query-related operations. The second servlet, the so-called MUVIS Media Retrieval Server (MMRS), is used for media retrieval. In order to take advantage of the flexibility and portability of Java, an M-MUVIS
client application has been developed by using Java 2, Micro Edition (J2ME) [5]. Such a system can find its application in the sharing or reuse of digital media, content management, networked photo albums, shopping and travel.

Fig 2: M-MUVIS framework

IV. QUERY TECHNIQUES

With growing image content, an efficient image retrieval technique is deemed required. Especially for a mobile device user, performing a query can be an annoying experience due to the large query processing time [2], [6]. It is therefore vital to devise a method which not only reduces the query processing time but also performs the query operation without requiring a system equipped with high performance hardware such as fast processors and large memory. In this paper we present an Interactive Query (IQ) [6] for a mobile device, which achieves retrieval performance that may not require a superior performing system on the server side, and reduces network bandwidth and processing power on the client side. Before IQ, M-MUVIS supported Normal Query (NQ) and Progressive Query (PQ) [2]. In NQ the query results were based on comparing similarity distances of all the image primitives present in the entire database and performing a ranking operation afterwards. NQ is costly in terms of processing power, and in case of abrupt stopping during the query process the retrieved query information is lost. PQ generates the query results after a fixed time interval. In a large image database with a small time interval, PQ generates many results that consume a lot of memory and server processing power. The server sends the desired intermediate result (as selected by the client) to the client. Sending the intermediate results to the client consumes extra network bandwidth, RAM, processing power and battery power of the device. In contrast, IQ provides efficient retrieval without generating many intermediate query results in a larger image database.

V. NPRF SEARCH

Despite the power of the search strategies, it is very difficult to optimize the retrieval quality of CBIR within only one query process. The hidden problem is that the extracted visual features are too diverse to capture the concept of the user's query. To solve such problems, in the QBE system, the user can pick up some preferred images to refine the image explorations iteratively. The feedback procedure, called Relevance Feedback (RF), repeats until the user is satisfied with the retrieval results. Although a number of RF studies have been made on interactive CBIR, they still incur some common problems, namely redundant browsing and exploration convergence. To resolve the aforementioned problems, we propose a novel method named NPRF (Navigation Pattern-Based Relevance Feedback) to achieve high retrieval quality of CBIR with RF by using the discovered navigation patterns. The proposed approach NPRF integrates the discovered navigation patterns and three RF techniques to achieve efficient and effective image retrieval.

Query-Reweighting (QR): Some previous work keeps an eye on investigating what visual features are important for those images (positive examples) picked up by the users at each feedback (also called iteration). For this kind of approach, no matter how the weighted or generalized distance function is adapted, the diverse visual features extremely limit the effort of image retrieval. Figure 3 illustrates this limitation: although the search area is continuously updated by re-weighting the features, some targets could be lost.
Query-Point-Movement (QPM): Another solution for enhancing the accuracy of image retrieval is moving the query point towards the contour of the user's preference in feature space. QPM regards multiple positive examples as a new query point at each feedback. After several forceful changes of location and contour, the query point should be close to a convex region of the user's interest.

Query Expansion (QEX): If QR and QPM cannot completely cover the user's interest spreading in the broad feature space, diverse results for the same concept are difficult to obtain. For this reason, the modified version of MARS groups the similar relevant points into several clusters, and selects good representative points from these clusters to construct the multipoint query.

Overview of NPRF (Navigation Pattern based Relevance Feedback): The proposed approach involves several operations. As depicted in Figure 4, each operational phase contains some critical components for completing the specific process. The first query process is called the initial feedback. Next, the good examples picked up by the user deliver valuable information to the image search phase, including new feature weights, the new query-point and the user's intention. Then, by using the navigation patterns, three search strategies, with respect to QPM, QR and QEX, are hybridized to find the desired images. Overall, at each feedback, the results are presented to the user and the related browsing information is stored in the log database. After accumulating long-term users' browsing behaviours, an off-line operation for knowledge discovery is triggered to perform navigation pattern mining and pattern indexing. The framework of the proposed approach is briefly described as follows:

Fig 4: Workflow of NPRF Search

Initial query processing phase: Without considering the feature-weight, this phase extracts the visual features from the original query image to find the similar images. Afterward, the good examples picked up by the user are further analyzed at the first feedback.

Image search phase: Behind the search phase, our intent is to extend the one search point to multiple search points by integrating the navigation patterns and the proposed algorithm NPRF search. In this phase, a new query point is generated at each feedback from the preceding positive examples, and then the K nearest images to the new query point can be found by expanding the weighted query. The search procedure does not stop unless the user is satisfied with the retrieval results.

Knowledge discovery phase: Learning from users' behaviours in image retrieval can be viewed as one type of knowledge discovery. The navigation patterns from users' behaviour support the prediction of optimal image browsing paths.

Data storage phase: The databases in this phase can be regarded as the knowledge marts of a knowledge warehouse, which store an integrated, time variant and non-volatile collection of useful data including images, navigation patterns, log files, and image features. The knowledge warehouse is very helpful for improving the quality of image retrieval.
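As a rough illustration of the QPM and QR moves described above, consider the following Python sketch (our own minimal formulations of the two ideas, not the paper's exact equations; the inverse-variance weighting is one common relevance feedback heuristic):

def qpm_new_query(positives):
    # Query-Point-Movement: fuse the positive examples of the last
    # feedback round into a new query point (their centroid).
    n, dim = len(positives), len(positives[0])
    return [sum(p[i] for p in positives) / n for i in range(dim)]

def qr_new_weights(positives, eps=1e-6):
    # Query-Reweighting: emphasize features on which the positive
    # examples agree (low variance -> high weight).
    center = qpm_new_query(positives)
    n, dim = len(positives), len(positives[0])
    var = [sum((p[i] - center[i]) ** 2 for p in positives) / n
           for i in range(dim)]
    return [1.0 / (v + eps) for v in var]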
K-means Algorithm

Step 1: Enter how many clusters (call it k).
Step 2: Randomly guess k cluster center locations.
Step 3: Each data point finds out which center it is closest to.
Step 4: Thus each center "owns" a set of points.
Step 5: Each center finds the centroid of its own points.

Fig 5: The query result shown on the Nokia 5800
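A minimal Python sketch of these five steps (our own illustration; the random initialization and the fixed iteration count are assumptions):

import random

def kmeans(points, k, iters=20):
    # Steps 1-2: choose k and randomly guess the k cluster centers.
    centers = random.sample(points, k)
    for _ in range(iters):
        # Step 3: each data point finds the center it is closest to.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Steps 4-5: each center "owns" its points and moves to their centroid.
        for i, owned in enumerate(clusters):
            if owned:
                centers[i] = tuple(sum(c) / len(owned) for c in zip(*owned))
    return centers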
work has been done to develop efficient algorithms for continuous reverse nearest neighbor queries.

Various algorithms have been proposed for snapshot RNNs in different environments, e.g., in Euclidean space, metric space, high-dimensional space, ad-hoc space and large graphs. In this paper, we mainly focus on Euclidean space, in which it is proved that there are at most six reverse nearest neighbors for the monochromatic case. Utilizing this property, an approach has been introduced by dividing the spatial space into six pie regions. Then, six nearest neighbor objects (one object in each pie) are used as filters to limit the search space.

The RdNN-Tree [12] extends the RNN-Tree by combining the two index structures (NN-Tree and RNN-Tree) into one common index. It is also designed for reverse 1-nearest neighbor search. For each object p, the distance to p's 1-nearest neighbor, i.e. nndist1(p), is precomputed. In general, the RdNN-Tree is an R-Tree-like structure containing data objects in the data nodes and MBRs in the directory nodes. In addition, for each data node N, the maximum of the 1-nearest neighbor distances of the objects in N is aggregated. An inner node of the RdNN-Tree aggregates the maximum 1-nearest neighbor distance of all its child nodes. A reverse 1-nearest neighbor query is processed top down by pruning those nodes N where the maximum 1-nearest neighbor distance of N is greater than the distance between query object q and N, because in this case, N cannot contain true hits anymore. Due to the materialization of the 1-nearest neighbor distance of all data objects, the RdNN-Tree need not compute 1-nearest neighbor queries for each object.

A recent technique for finding monochromatic reverse nearest neighbors for moving objects [8] is similar to our problem except that the velocity of each object is given as part of the input and each object is assumed to move on a plane, which can then be indexed using a TPR-tree. However, in our proposed methods, we do not assume a specific velocity, and objects can move in any direction, not constrained to a single direction. To our knowledge, there is only one algorithm that does not assume that objects move on a single plane and that a velocity is given, termed CRNN [6], for continuous evaluation of reverse nearest neighbor queries. CRNN extends the idea of dividing the space into six pies, originally developed for snapshot queries [7], to dynamic environments. As a result, CRNN monitors each pie region along with six moving objects at every time interval. However, CRNN has two main disadvantages: 1) CRNN is limited to monochromatic RNN queries, and 2) CRNN always assumes a constant worst case scenario at every time interval, where it is assumed that there are always six RNNs. These drawbacks arise from the fact that CRNN ignores the relationship between the neighboring pies.

The reverse nearest neighbor query is intimately related to nearest neighbor queries. In this section, we first overview the existing proposals for answering nearest neighbor queries, for both stationary and moving points. Then, we discuss the proposals related to reverse nearest neighbor queries.

III REVERSE NEAREST NEIGHBOR ALGORITHM

Reverse Nearest Neighbor Queries: To our knowledge, three solutions exist for answering RNN queries for non-moving points in two and higher dimensional spaces. Stanoi et al. present a solution for answering RNN queries in two-dimensional space. Their algorithm is based on the following observations [20]. Let the space around the query point q be divided into six equal regions Si (1 ≤ i ≤ 6) by straight lines intersecting at q, as shown in Figure 3. Then, there exist at most six RNN points for q, and they are distributed as follows:
1. There exist at most two RNN points in each region Si.
2. If there exist exactly two RNN points in a region Si, then each point must be on
one of the space-dividing lines through q delimiting Si.

Figure 1: Division of the Space around Query Point q

The same kind of observation leads to the following property. Let p be a NN point of q in Si. If p is not on one of the space-dividing lines, either q is the NN point of p (and then p is the RNN point of q), or q has no RNN point in Si. Stanoi et al. prove this property. These observations enable a reduction of the RNN problem to the NN problem. For each region Si, a candidate set of one or two NN points of q in that region is found. (A set with more than two NN points is not a candidate set.) Then for each of those points, it is checked whether q is the nearest neighbor of that point. The answer to the RNN(q) query consists of those candidate points that have q as their nearest neighbor. In another solution for answering RNN queries, Korn and Muthukrishnan use two R-trees for the querying, insertion, and deletion of points. In the RNN-tree, the minimum bounding rectangles of circles having a point as their center and the distance to the nearest neighbor of that point as their radius are stored. The NN-tree is simply an R*-tree where the data points are stored. Yang and Lin improve the solution of Korn and Muthukrishnan by introducing the Rdnn-tree, which makes it possible to answer both RNN queries and NN queries using a single tree. None of the above-mentioned methods handles continuously moving points. In the next section, before presenting our method, we discuss the extendibility of these methods to support continuously moving points.

IV ALGORITHM FOR FINDING REVERSE NEAREST NEIGHBORS

In this section, we describe the algorithm FindRNN, which computes the reverse nearest neighbors for a continuously moving point in the plane. The notation is the same as in the previous section. The algorithm, shown in Figure 2, produces a list LRNN = {<pj, Tj>}, where pj is the reverse nearest neighbor of q during time interval Tj. Note that the format of LRNN differs from the format of the answer to the RNN query, as defined, where intervals Tj do not overlap and have sets of points associated with them. To simplify the description of algorithms we use this format in the rest of the paper. Having LRNN, it is quite straightforward to transform it into the format described, by sorting the end points of the time intervals in LRNN and performing a "time sweep" to collect points for each of the formed time intervals.

Figure 2: Algorithm Computing Reverse Nearest Neighbors for Moving Objects in the Plane

To reduce the disk I/O incurred by the algorithm, all the six sets Bi are found in a single traversal of the index. Note that if, at some time, there is more than one nearest neighbor in some Si, those nearest neighbors are nearer to each other than to the query point, meaning that Si will hold no RNN points for that time. We thus assume in the following that, in sets Bi, each interval Tij consists of a single nearest neighbor point, nnij. All the RNN candidates nnij are also verified in one traversal. To make this possible, we use Σi,j M(R, nnij) as the metric for ordering the search in step 2.1 of FindRNN. In
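The six-region reduction described above can be sketched as follows (a simplified Python illustration for static points that keeps a single nearest-neighbor candidate per region rather than the up-to-two boundary cases; all names are our own):

import math

def rnn_query(q, points):
    # Divide the plane around q into six 60-degree regions S1..S6 and
    # keep the nearest neighbor of q in each region as a candidate.
    candidates = {}
    for p in points:
        if p == q:
            continue
        angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
        region = int(angle // (math.pi / 3))
        d = math.dist(p, q)
        if region not in candidates or d < candidates[region][0]:
            candidates[region] = (d, p)
    # Verification step: p is a reverse nearest neighbor of q iff
    # no other point is closer to p than q is.
    result = []
    for d, p in candidates.values():
        if all(o == p or math.dist(o, p) >= d for o in points if o != q):
            result.append(p)
    return result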
during the expansion we deheap some of these nodes, we directly insert their kNN objects into the temporary result (denoted by W in Section 2.1) and do not en-heap their adjacent nodes.

VIII EXPERIMENTAL EVALUATION

The algorithms presented in this paper were implemented in C++, using a TPR-tree implementation based on GiST. Specifically, the TPR-tree implementation with self-tuning time horizon was used. We investigate the performance of the algorithms in terms of the number of I/O operations they perform. The disk page size (and the size of a TPR-tree node) is set to 4k bytes, which results in 204 entries per leaf node in trees. An LRU page buffer of 50 pages is used, with the root of a tree always pinned in the buffer. The nodes changed during an index operation are marked as "dirty" in the buffer and are written to disk at the end of the operation or when they otherwise have to be removed from the buffer.

The performance studies are based on synthetically generated workloads that intermix update operations and queries. To generate the workloads, we simulate objects moving in a region of space with dimensions 1000 x 1000 kilometers. Whenever an object reports its movement, the old information pertaining to the object is deleted from the index (assuming this is not the first reported movement from this object), and the new information is inserted into the index. Two types of workloads were used in the experiments. In most of the experiments, we use uniform workloads, where positions of points and their velocities are distributed uniformly. The speeds of objects vary from 0 to 3 kilometers per time unit (minute). In other experiments, more realistic workloads are used, where objects move in a network of two-way routes, interconnecting a number of destinations uniformly distributed in the plane. Points start at random positions on routes and are assigned with equal probability to one of three groups of points with maximum speeds of 0.75, 1.5, and 3 km/min. Then, queries are introduced, intermixed with additional updates. Each query corresponds to a randomly selected point from the currently active data set. Our performance graphs report average numbers of I/O operations per query.

In this section, we evaluate the robustness and scalability of our proposed methods on a real road network. Our algorithms were implemented in C++ and experiments were executed on a Pentium D 2.8 GHz PC. We measured the average of the following performance values over a query workload of 100 queries: 1) anonymization time and refinement time at the anonymizer AZ, 2) I/O time and CPU time for query processing at the location server LS, and 3) the communication cost (in terms of transmitted points) for the anonymizing edge list AEL and the candidate set CS. Note that each edge in AEL is counted as two points.

CONCLUSION

In this paper we proposed a framework for answering Reverse Nearest Neighbor queries using k-anonymity, and showed experimental results for the robustness and scalability of the proposed system using road networks.

ACKNOWLEDGMENT

The authors gratefully acknowledge the following individuals for their support: Mr. R. Nakeeran, Professor, Dr. Pauls Engineering College, and my family and friends, for their valuable guidance, for devoting their precious time, and for sharing their knowledge and co-operation.

REFERENCES

[1] B. Gedik and L. Liu, "MobiEyes: Distributed processing of continuously moving queries on moving objects in a mobile system," in Proc. Int'l Conf. Extending Database Technology (EDBT), 2004.
[2] H. Hu, J. Xu, and D. L. Lee, "A generic framework for monitoring continuous spatial queries over moving objects," in Proc. ACM SIGMOD, 2005.
SYSTEM DESIGN

In this chapter let us discuss the context diagram, the data flow diagrams and the modules present in our paper, with a description of each module.

MODULES
1. Get image
2. Equalized Image Using GHE
3. Applying DWT
4. Applying SVD
5. Applying IDWT
6. Get the Equalized Image.

Figure no. 3.2: Architecture Diagram of DWT

DATA FLOW DIAGRAM

The Data Flow strategy shows the use of data in a system pictorially. The tools used in this strategy show all the essential features of a system and how they fit together. Data Flow tools help by illustrating the essential components of a system and their interactions. Data Flow Diagrams are one of the most important tools in the Data Flow strategy. A Data Flow Diagram is a means of representing a system at any level of detail with a graphic network of symbols showing data flows, data stores, data processes and data sources/destinations. The purpose of a Data Flow Diagram is to provide a semantic bridge between users and system developers.

LEVEL 0 DFD

Figure no. 3.3: Level 0 DFD (Low Contrast Satellite Image -> Equalization Enhancement based on DWT and SVD -> Equalized Satellite Image)

CONCLUSION

The singular-value-based image equalization (SVE) technique is based on equalizing the singular value matrix obtained by singular value decomposition (SVD). A new satellite image contrast enhancement technique based on DWT and SVD is implemented. The technique decomposes the input image into the DWT subbands and, after updating the singular value matrix of the LL subband, reconstructs the image by using the IDWT. The Discrete Wavelet Transform (DWT) is any wavelet transform for which the wavelets are discretely sampled; the Haar DWT illustrates the desirable properties of wavelets in general. The singular value matrix represents the intensity information of the given image, and any change in the singular values changes the intensity of the input image. The technique converts the image into the SVD domain and, after normalizing the singular value matrix, reconstructs the image in the spatial domain by using the updated singular value matrix. The technique was compared with the GHE, LHE, BPDHE and SVE techniques. The visual results on the final image quality show its superiority over the conventional and the state-of-the-art techniques.

REFERENCES

[1] H. Demirel, G. Anbarjafari, and M. N. S. Jahromi, "Image equalization based on singular value decomposition," in Proc. 23rd IEEE Int. Symp. Comput. Inf. Sci., Istanbul, Turkey, Oct. 2008, pp. 1–5.
[2] H. Ibrahim and N. S. P. Kong, "Brightness preserving dynamic histogram equalization
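As a concrete illustration of the DWT+SVD enhancement summarized in the modules and conclusion above, here is a minimal Python sketch (our own illustration, assuming NumPy and the PyWavelets package; using a histogram-equalized version of the input as the reference image and a maximum-singular-value correction factor follows the SVE idea described in the text, and this is not the authors' code):

import numpy as np
import pywt

def dwt_svd_equalize(image, reference):
    # Module 3: decompose both images with the Haar DWT (same shape assumed).
    LL, (LH, HL, HH) = pywt.dwt2(image.astype(float), 'haar')
    LL_ref, _ = pywt.dwt2(reference.astype(float), 'haar')
    # Module 4: SVD of the LL subbands; scale the input's singular values.
    U, s, Vt = np.linalg.svd(LL, full_matrices=False)
    s_ref = np.linalg.svd(LL_ref, compute_uv=False)
    xi = s_ref.max() / s.max()          # correction factor for singular values
    LL_new = U @ np.diag(xi * s) @ Vt   # updated singular value matrix
    # Modules 5-6: inverse DWT gives the equalized image.
    return pywt.idwt2((LL_new, (LH, HL, HH)), 'haar')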
2. RELATED WORKS
C. Estimation of the group size from the rekeying-message-size

In some tree-based key management schemes, the key tree is fully loaded and maintained as balanced as possible by putting the joining users on the shortest branches. In this case, the group size N(t) can be estimated directly from the rekeying-message-size. Here, we derive a Maximum Likelihood (ML) estimator and then demonstrate the effectiveness of this estimator through simulations. This ML estimator is first applied in simulated group communications.

D. Estimation of group size based on key IDs

Each key contains the secret material that is the content of the key and a key selector that is used to distinguish the key. The key selector consists of: 1) a unique ID that stays the same even if the key content changes and 2) a version and revision field, reflecting updates of the key. The basic format of the rekeying messages is {Ky}Kx, representing Ky encrypted by Kx. This message has two parts. The first part is the key selector of Kx, which is not encrypted because otherwise a user would not be able to understand this message. The second part is Ky and the key selector of Ky, encrypted by Kx. Thus, in the current implementation, everyone who can overhear the rekeying messages can see the IDs of Kx.

E. GDI vulnerability in other key management schemes

Tree-based key management schemes have been known for their efficiency in terms of communication, computation and storage overhead. Besides the tree-based scheme, the VersaKey framework also includes a centralized flat scheme. When a user joins or leaves the group, the rekeying-message-size equals the length of the binary representation of user IDs, which can be independent of N(t).

In Iolus, a large group is decomposed into a number of subgroups, and the trusted local security agents perform admission control and key updating for the subgroups. This architecture reduces the number of users affected by key updating resulting from membership changes. Since the key updating is localized within each subgroup, the insiders or outsiders can only obtain the dynamic membership information of the subgroups that they belong to or can monitor.

The idea of clustering was introduced in to achieve efficiency by localizing key updating. The group members are organized into a hierarchical clustering structure. The cluster leaders are selected from group members and perform partial key management. Since the cluster leaders establish keys for the cluster members through pair-wise key exchange, the cluster members cannot obtain GDI of their clusters. However, the cluster leaders naturally obtain the dynamic membership information of their clusters and all clusters below. Therefore, this key management scheme can be applied only when a large portion of group members are trusted to perform key management and obtain GDI.

In, a topology-matching key management (TMKM) scheme was presented to reduce the communication overhead by matching the key tree with the network topology and localizing the transmission of the rekeying messages. In this scheme, group members receive only the rekeying messages that are useful for
themselves and their neighbors.

users and the real users lead to a new rekeying process, called the observed rekeying process.
techniques, the fundamental tradeoff between the communication overhead and the leakage of GDI was studied. In addition, this paper provided a brief discussion of the GDI problem in contributory key management schemes. It was argued that contributory schemes are not suitable for applications in which GDI should be protected.
ABSTRACT

Compromised node and denial of service are two key attacks in wireless sensor networks (WSNs). In this paper, data delivery mechanisms are developed that can, with high probability, circumvent the black holes formed by these attacks. Classic multipath routing approaches are vulnerable to such attacks, mainly due to their deterministic nature: once the adversary acquires the routing algorithm, it can compute the same routes known to the source, making all information sent over these routes vulnerable to its attacks. Under our designs, the routes taken by the "shares" of different packets change over time. So even if the routing algorithm becomes known to the adversary, the adversary still cannot pinpoint the routes traversed by each packet. Besides randomness, the generated M routes are also highly dispersive and energy efficient, making them quite capable of circumventing black holes. An optimization problem is formulated to minimize the end-to-end energy consumption under given security constraints.

Key Words: Randomized multipath routing, wireless sensor network, secure data delivery.

1. INTRODUCTION

Of the various possible security threats encountered in a wireless sensor network (WSN), in this paper we are specifically interested in combating two types of attacks: compromised node (CN) and denial of service (DOS). In the CN attack, an adversary physically compromises a subset of nodes to eavesdrop on information, whereas in the DOS attack, the adversary interferes with the normal operation of the network by actively disrupting, changing, or even paralyzing the functionality of a subset of nodes. These two attacks are similar in the sense that they both generate black holes: areas within which the adversary can either passively intercept or actively block information delivery.

Due to the unattended nature of WSNs, adversaries can easily produce such black holes. Severe CN and DOS attacks can disrupt normal data delivery between sensor nodes and the sink, or even partition the topology. A conventional cryptography-based security method cannot alone provide satisfactory solutions to these problems. This is because, by definition, once a node is compromised, the adversary can always acquire the encryption/decryption keys of that node, and thus can intercept any information passed through it.
side of the link. The adversary has the ability to compromise multiple nodes. However, we assume that the adversary cannot compromise the sink and its immediate surrounding nodes. This assumption is reasonable because the sink's neighbourhood is usually a small area, and can be easily physically secured by the network operator, e.g., by deploying guards or installing video surveillance/monitoring equipment.

3.2 Encryption

Encryption is the conversion of data into a form called a ciphertext. There are two basic techniques for encrypting information:
symmetric encryption or secret key encryption
asymmetric encryption or public key encryption

3.3 Symmetric Encryption

Symmetric encryption (also known as symmetric-key encryption, single-key encryption, one-key encryption and private key encryption) is a type of encryption where the same secret key is used to encrypt and decrypt information, or there is a simple transform between the two keys. A secret key can be a number, a word, or just a string of random letters. The secret key is applied to the information to change the content in a particular way. This might be as simple as shifting each letter by a number of places in the alphabet. Symmetric algorithms require that both the sender and the receiver know the secret key, so they can encrypt and decrypt all information.
There are two types of symmetric algorithms:
Stream algorithms (stream ciphers)
Block algorithms (block ciphers)
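As a toy illustration of the shared-key idea above (not a cipher anyone should deploy), the following Python sketch implements the letter-shifting example, with the shift amount acting as the secret key:

# Toy symmetric encryption: a Caesar-style shift cipher where the shared
# secret key is the number of places each letter is shifted.
# (Illustration only -- real systems use ciphers such as AES.)
def shift_encrypt(plaintext: str, key: int) -> str:
    return "".join(
        chr((ord(c) - ord('a') + key) % 26 + ord('a')) if c.islower() else c
        for c in plaintext
    )

def shift_decrypt(ciphertext: str, key: int) -> str:
    # Decryption applies the same secret key in reverse.
    return shift_encrypt(ciphertext, -key)

assert shift_decrypt(shift_encrypt("sensor", 3), 3) == "sensor"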
3.4 Asymmetric Encryption (Public Key Encryption)

Asymmetric encryption uses different keys for encryption and decryption. The decryption key is very hard to derive from the encryption key. The encryption key is public so that anyone can encrypt a message; however, the decryption key is private, so that only the receiver is able to decrypt the message. It is common to set up "key pairs" within a network so that each user has a public and a private key. The public key is made available to everyone so that they can send messages, but the private key is only made available to the person it belongs to.

4. MULTIPATH CONCEPT

4.1 DSR Protocol Description

The DSR protocol is composed of two mechanisms that work together to allow the discovery and maintenance of source routes in the ad hoc network [6]: Route Discovery and Route Maintenance. Route Discovery is the mechanism by which a node S wishing to send a packet to a destination node D obtains a source route to D. Route Discovery is used only when S attempts to send a packet to D and does not already know a route to D [3]. Route Maintenance is the mechanism by which node S is able to detect, while using a source route to D, whether the network topology has changed such that it can no longer use its route to D because a link along the route no longer works. When Route Maintenance indicates a source route is broken, S can attempt to use any other route it happens to know to D, or it can invoke Route Discovery again to find a new route. Route Maintenance is used only when S is actually sending packets to D.

4.2 Route Discovery and Route Maintenance

When a traffic source needs a route to a destination, it initiates a route discovery process. Route discovery typically involves a network-wide flood of route request (RREQ) packets targeting the destination and waiting for a route reply (RREP) [7]. An intermediate node receiving a RREQ packet first sets up a reverse path to the source, using the previous hop of the RREQ as the next hop on the reverse path.
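The following Python sketch illustrates the reverse-path idea just described. It is our own simplified illustration of the flooding mechanism, not the DSR implementation; the topology and node names are hypothetical:

# Minimal sketch of RREQ flooding with reverse-path setup: an intermediate
# node records the previous hop of the RREQ as its next hop back to the source.
from collections import deque

def discover_route(graph, source, dest):
    """graph: dict node -> list of neighbour nodes (bidirectional links)."""
    reverse_path = {source: None}          # node -> previous hop of the RREQ
    queue = deque([source])                # network-wide flood, BFS order
    while queue:
        node = queue.popleft()
        if node == dest:                   # destination answers with an RREP
            route, hop = [], dest
            while hop is not None:
                route.append(hop)
                hop = reverse_path[hop]
            return list(reversed(route))   # source route from S to D
        for neighbour in graph[node]:
            if neighbour not in reverse_path:
                reverse_path[neighbour] = node   # reverse path toward source
                queue.append(neighbour)
    return None                            # no route found

# Hypothetical S-a-b-D topology.
topology = {"S": ["a"], "a": ["S", "b"], "b": ["a", "D"], "D": ["b"]}
print(discover_route(topology, "S", "D"))  # ['S', 'a', 'b', 'D']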
at the medium access control (MAC) layer, together with playout and content-awareness at the video application layer. Video content is taken into account both in playout as well as in rate-distortion optimized packet scheduling. We briefly note the following intuitive tradeoffs faced by the individual controls in the attempt to maximize video quality. At the Tx side, the dilemma is the following: on one hand, we want to transmit all media units; on the other hand, during periods when bandwidth is scarce, we may choose to transmit the most important units and skip some others, depending on their rate-distortion values.

At the Rx side, the dilemma is the following: on one hand, we want to display the sequence at the natural frame rate; on the other hand, during bad periods of the channel, we may choose to slow down the playout in order to extend the playout deadlines of packets in transmission and avoid late packet arrivals (leading to buffer underflow and frame losses), but at the expense of a potentially annoying slower playout. A novel aspect of this work is that we perform content-aware playout variation; that is, we take into account the characteristics of a video scene when we adapt the playout speed. The contributions of this work are the following.

1. We formulate the problem of joint playout and scheduling within the framework of Markov decision processes, and we obtain the optimal control using dynamic programming.

2. We introduce the idea of content-aware playout and demonstrate that it significantly improves the user experience. The idea is to vary the playout speed of scenes based on the scene content; e.g., scenes with low or no motion typically may be less affected by playout variation than scenes with high motion.
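To make contribution 1 concrete, the following Python sketch shows how value iteration (dynamic programming) can compute a playout-speed policy over buffer and channel states. All numbers here (state space, transition probabilities, rewards) are invented for illustration and are not the paper's model:

# Value iteration over a controlled Markov chain: states are (buffer level,
# channel state); actions are playout speeds; rewards penalize underflow
# and, mildly, slowed playout.
import itertools

BUFFERS = range(5)                 # coarse buffer occupancy levels
CHANNELS = ["good", "bad"]
ACTIONS = ["normal", "slow"]       # natural-rate playout vs. slowdown
GAMMA = 0.9

def reward(buf, act):
    # Illustrative numbers: heavy penalty for underflow, small one for slowdown.
    return (-10.0 if buf == 0 else 0.0) - (1.0 if act == "slow" else 0.0)

def next_buf(buf, chan, act):
    # Slow playout drains the buffer more slowly; a bad channel fills it less.
    drain = 1 if act == "normal" else 0
    fill = 1 if chan == "good" else 0
    return min(max(buf + fill - drain, 0), max(BUFFERS))

P_CHAN = {"good": [("good", 0.8), ("bad", 0.2)],
          "bad": [("good", 0.3), ("bad", 0.7)]}

V = {s: 0.0 for s in itertools.product(BUFFERS, CHANNELS)}
for _ in range(200):               # value iteration until (near) convergence
    V = {(b, c): max(reward(b, a) + GAMMA * sum(
            p * V[(next_buf(b, c, a), c2)] for c2, p in P_CHAN[c])
            for a in ACTIONS)
         for (b, c) in V}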
The rest of the paper is structured as follows. Section II discusses related work. Section III introduces the system model and problem formulation. Section IV provides simulation results. Section V concludes the paper.

II. RELATED WORK

Streaming media over an unreliable and/or time-varying network, whether this is the Internet or a wireless network, is a large problem space with various aspects and control parameters. Several network-adaptive techniques have been proposed [5], including rate-distortion optimized packet scheduling [6], power [12] and/or rate control at the transmitter, and playout speed adaptation at the receiver [13], [9]. Wireless video, in particular, is a very challenging problem, due to the limited, time-varying resources of the wireless channel; a survey can be found in [4]. There is a large body of work on cross-layer design for video streaming over wireless, including [15], [2], [18] to mention a few representative examples. Our work also falls within the scope of cross-layer optimization. In the rest of this section, we clarify our contribution and its comparison to prior work in this problem space.

A. Prior Work on Adaptive Playout

Playout control at the receiver can mitigate packet delay variation and provide smoother playout. Adaptive playout has been used in the past for media streaming over the Internet, for both audio [17], [14] and video [13], [9], [10]. Our work proposes for the first time to make playout control content-aware. That is, given a certain amount of playout slowdown and variation caused by bad channel periods, we are interested in applying this slowdown to those parts of the video sequence that are less sensitive from a perceptual point of view. Within the adaptive playout literature, the closest to our work are [13] and [9]. However, there are two differences. The first
difference is that we propose, for the first time, a content-aware playout; we build on and extend the metrics in [9] to include the notion of motion-intensity of a scene. A second and more subtle difference lies in the formulation. [13] models the system as a Markov chain and analyzes the performance of adaptive algorithms that slow down or speed up the playout rate based on the buffer occupancy. However, the parameters of these algorithms, such as buffer occupancy thresholds and speedup and slowdown factors, are fixed and must be chosen offline. In contrast, we model the system as a controlled Markov chain, which allows for more fine-tuned control: the control policy itself is optimized for the parameters of the system, including the channel characteristics. For example, when the channel is good, the playout policy can be optimistic and use low levels of buffer occupancy under which it starts to slow down; when the channel is bad, the optimal policy should be more conservative and start slowing down even when the buffer is relatively full. Finally, another difference lies in the system design: [13] performs slowdown and speedup at the Rx, while we perform slowdown at the Rx and drop late packets at the Tx, thus saving system resources from unnecessary transmissions.

B. Prior Work on Packet Scheduling

In this paper, we use packet scheduling at the Tx to complement the playout functionality at the Rx. The main purpose of the scheduler is to discard late packets and catch up with the accumulated delay caused by playout slowdown during bad channel periods; these late packets would be dropped anyway at the Rx, but dropping them at the Tx saves system resources. In addition, we enhanced the scheduler to transmit a subset of the video packets to meet the channel rate constraint with minimum distortion. This paper does not aim at improving the state of the art in rate-distortion optimized scheduling; instead, its contribution lies in the playout control. The scheduling control enhances the playout and is optimized for that purpose. The state of the art in rate-distortion optimized packet scheduling is currently the RaDiO family of techniques [3]: in every transmission opportunity, a decision is made as to which media units to transmit and which to discard, so as to maximize the expected quality of received video subject to a constraint on the transmission rate, taking into consideration transmission errors, delays and decoding dependencies. Similar to RaDiO, our scheduler efficiently allocates bandwidth among packets, so as to minimize distortion and meet playout deadlines. Both works propose analytical frameworks to study video transmitted over networks. However, there are two differences. First, the two modeling approaches are different: we formulate the problem as a controlled Markov chain, thus being able to exploit the channel variations, while RaDiO formulates the problem using Lagrange multipliers, thus optimizing for the average case. Second, different simplifications are used to efficiently search for the optimal solution: RaDiO optimizes the transmission policy for one packet at a time, while we constrain our policies to in-order transmission. Our approach could also be extended to include out-of-order transmissions. Another framework for optimizing packet scheduling is CoDiO, congestion-distortion optimized streaming [16], which takes into account congestion, which is detrimental to other flows but also to the stream itself; in a somewhat similar spirit, our scheme may purposely drop late packets at the transmitter in order to avoid a self-inflicted increase in the stream's end-to-end delay. Finally, we would like to note that, in this paper, we focus on nonscalable video encoding, which is the great majority of pre-encoded video today as well as in the foreseeable future. If the original video is encoded with scalable video coding, then we have more flexibility in terms of what to drop to fit the available bandwidth and delay constraints; however, some of the techniques proposed in this paper for
the absolute difference between the original and processed signal. The traditional quality metrics are the Root Mean Square Error (RMSE), the Signal-to-Noise Ratio (SNR), and the Peak Signal-to-Noise Ratio (PSNR) in dB. In this work we employ a full-reference method and use the PSNR as the objective quality metric.
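For reference, the following short Python sketch computes the PSNR of an 8-bit processed frame against the original; this is our illustration of the standard formula, assuming NumPy arrays as input:

# PSNR in dB for 8-bit frames: 10 * log10(MAX^2 / MSE).
import numpy as np

def psnr(original: np.ndarray, processed: np.ndarray) -> float:
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")            # identical frames
    max_value = 255.0                  # peak signal value for 8-bit samples
    return 10 * np.log10(max_value ** 2 / mse)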
There are numerous metrics used to express the objective quality of an image or video, which cannot, however, fully characterize the response and end satisfaction of the viewer. A perceived measure of the quality of a video is obtained through human "grading" of streams, which helps collect and utilize the general user view. There are a number of perceived quality of service measurement techniques; most of them are explained in [19]. The following are the most popular: a) DSIS (Double Stimulus Impairment Scale); b) DSCQS (Double Stimulus Continuous Quality Scale); c) SCAJ (Stimulus Comparison Adjectival Categorical Judgement); d) SAMVIQ (Subjective Assessment Method for Video Quality evaluation). In this work we have used the SAMVIQ [8] method. SAMVIQ is based on random playout: the individual viewer can start and stop the evaluation process as he wishes and is allowed to determine his own pace for performing grading, modifying grades, repeating playout when needed, etc. For the SAMVIQ method, quality evaluation is carried out scene after scene, including an explicit reference, a hidden reference and various algorithms (codecs). As a result, SAMVIQ offers higher reliability, i.e., smaller standard deviations. A major advantage of this subjective evaluation scheme is the way video sequences are presented to the viewer. In SAMVIQ, video sequences are shown in multi-stimulus form, so that the user can choose the order of tests and correct their votes, as appropriate. As the viewers can directly compare the impaired sequences among themselves and against the reference, they can grade them accordingly. Thus, viewers are generally able to discriminate the different quality levels better with SAMVIQ than with the other methods. In addition, in this method there is only one viewer at a time, which alleviates a "group effect".

Evaluation Setup and Scenarios

Topology

The evaluation topology consists of one Video Streaming Server, two backbone routers and video clients of variable types and connectivity methods (fixed, mobile, wired, wireless), as shown in Fig. 1. The video streaming server is attached to the first backbone router with a link which has 10 Mbps bandwidth and 10 ms propagation delay; these values remain constant during all scenarios. This router is connected to a second router using a link with unspecified and variable bandwidth, propagation delay, and packet loss. The different parameter values used to characterize this variable link are shown in Table 1. Using this topology, we conducted several experiments for two different sample sequences and with fixed-wired clients, fixed-wireless clients and mobile-wireless clients.

Variable Test Parameters

The choice of the parameters used in the video quality evaluations (Table 1) was based on the typical characteristics of mobile and wireless networks, as described in Section 2. For example, the Link Bandwidth can be considered as either the last-hop access link bandwidth or the available bandwidth to the user. The values chosen can represent typical wired home access rates (modem, ISDN, xDSL) or different bearer rates for UMTS.

Video Stream Bit Rate | Link Bandwidth | Propagation Delay | Packet Loss
64 Kbps               | 64 Kbps        | 10 ms             | 10^-5
128 Kbps              | 100 Kbps       | 20 ms             |
256 Kbps              | 256 Kbps       | 100 ms            |

Table 1: Parameter values for the variable link
Test sequences

The test sequences used in this work were the sample sequences Foreman and Claire. The sequences were chosen because of their different characteristics: the first is a stream with a fair amount of movement and change of background, whereas the second is a more static sequence. The characteristics of these sequences are shown in Table 2. The sample sequences were encoded in MPEG4 format with a free software tool, the FFMPEG encoder [20]. The two sequences have a temporal resolution of 30 frames per second and GoP (Group of Pictures) pattern IBBPBBPBBPBB. Each sequence was encoded at the rates shown in Table 1. The video stream bit rate(1) varies from 64 Kbps to 768 Kbps; this rate is the average produced by the encoder. Since the encoding of the sample video sequences is based on MPEG4, individual frames have variable sizes.

Trace   | Resolution | Total Frames | I  | P   | B
Foreman | 176x144    | 400          | 34 | 100 | 266
Claire  | 176x144    | 494          | 42 | 124 | 328

Table 2: Characteristics of the different video sequences

Data Collection

All the aforementioned experiments were conducted with the open source network simulator tool NS2 [21]. Based on the open source framework called EvalVid [7], we were able to collect all the necessary information needed for the objective

(1) The terms video stream bit rate and video

IV SIMULATION RESULTS

Fig. 2 represents the performance of the system in terms of PSNR gained with content-aware playout and content-unaware playout. The content-aware playout provides better quality video.

[Figure 2: PSNR evaluation]

V CONCLUSION

In this work, we formulated the problem of media streaming over a time-varying wireless channel as a stochastic control problem, and we analyzed the joint control of packet scheduling and content-aware playout. We showed that a small increase in playout duration can result in a significant increase in video quality. Furthermore, we proposed to take into account the characteristics of a video sequence in order to adapt the playout control based on the characteristics of each scene in the video sequence; this reduces the perceived effect of playout speed variation. Our proposed method can improve the quality of the video stream over the network.
order of bit planes. It simply reverses the order of the bit planes. A random bit sequence generated from a logistic chaotic map is used to encrypt the edge map: an XOR operation is again performed between each bit of the random bit sequence and each pixel of the edge map to obtain the encrypted edge map. The logistic chaotic map is defined as follows:

x(n+1) = r * x(n) * (1 - x(n))

where the parameter r is a rational number with 3.5699456 < r <= 4. If the size of the edge map is MxN, the random bit sequence can be generated by this definition, where n = 0, 1, 2, ..., MN-1.

the edge map, the type of the edge detector and its threshold value. The users have the flexibility to choose any existing approach for edge detection and to select any threshold value for the edge detector. The edge map can also be interleaved between any two bit planes. In the decryption process, the authorized users do not have to know the type of the edge detector and its threshold value to reconstruct the original image, because the edge map has been sent to users along with the encrypted image.

The edge map can be completely recovered only by using the correct security keys: the location where the edge map is interleaved, as well as the initial condition x0 and parameter r of the logistic chaotic map. The decryption process first decomposes the encrypted image into binary bit planes. It then reverses the order of all bit planes and extracts the edge map from the bit planes. The edge map is reconstructed using the security keys. The algorithm performs an XOR operation between the edge map and each bit plane and combines the XORed bit planes to obtain the reconstructed medical image.
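The following Python sketch draws a bit sequence from the logistic map defined above. The paper's exact bit-extraction rule is not reproduced in the source, so thresholding x(n) at 0.5 is our assumption:

# Key-stream generation from the logistic map x(n+1) = r*x(n)*(1 - x(n)).
def logistic_bits(x0: float, r: float, count: int):
    assert 3.5699456 < r <= 4.0, "r must lie in the chaotic regime"
    x, bits = x0, []
    for _ in range(count):
        x = r * x * (1.0 - x)          # iterate the map
        bits.append(1 if x >= 0.5 else 0)   # assumed thresholding rule
    return bits

# For an M x N edge map, count = M * N bits (n = 0 .. MN-1) are generated and
# XORed bit-by-bit with the edge map, e.g. with x0 = 0.6 and r = 3.65.
key_stream = logistic_bits(x0=0.6, r=3.65, count=16)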
chaotic map with the initial condition x0 = 0.6 and the parameter r = 3.65.

III. CRYPTANALYSIS

- The chosen-ciphertext attack: attackers can choose some ciphertexts and get the corresponding plaintexts.
[Fig. 4. Original image reconstruction by unauthorized users: a hacker or cryptanalyst who obtains the encrypted medical image and the edge map exploits the weakness and security flaws of the edge crypt algorithm to recover the decrypted (original) medical image.]

[Fig. 5. Example of the overall process during transmission of medical images.]

Fig. 5 shows that authorized users use the security key and apply the edge crypt algorithm to decrypt the medical image. If a cryptanalyst or a hacker gets the edge map and the encrypted medical image, then the weakness of the algorithm can be spotted, leading to image reconstruction.

IV. EXPERIMENTAL RESULT

Less Susceptibility to the Change of Plain-Image

It is well known that the ciphertext of a secure encryption scheme should be very sensitive to changes in the plaintext, but the encryption scheme under test fails to comply with this requirement. If a plain-image "I" (say) has only one pixel difference at position (i, j), the difference will be permuted to a new position (i*, j*) according to the shuffling matrix P*. Because all plain-pixels before (i*, j*) are identical for the plain-image, the cipher image will also be identical. This shows the low sensitivity of the image encryption scheme to changes in the plain image.
Slow Encryption
Sample Outcome
[Fig. 7. Edge detected image and edge map value]

The bit sequence generated by iterating the logistic chaotic function is found to be not sufficiently random for secure encryption. Thus, it has been identified that security flaws are involved in the edge crypt algorithm. It has also been shown that a ciphertext attack and a design weakness exist in the edge crypt algorithm. Therefore, it is not recommended for applications requiring a high level of security.
U. Gayathri, P.G. Student, CSE, Rajalakshmi Engineering College, gaayathriu@gmail.com, Contact: 9444931673
remodel the terrain data to significantly reduce the CPU and I/O costs by accessing and processing surface data in a just-enough manner. Experiments using large-scale, real terrain data have shown that MR3 outperforms the benchmark algorithm in all cases by nearly one order of magnitude. In reference [6], a maintenance-free, itinerary-based approach called Density-aware Itinerary KNN query processing (DIKNN) is proposed. Current approaches to K Nearest Neighbor (KNN) search in mobile sensor networks require a certain kind of indexing support. This index could be either a centralized spatial index or an in-network data structure that is distributed over the sensor nodes. Creation and maintenance of these index structures, to reflect the network dynamics due to sensor node mobility, may result in long query response time and low battery efficiency, thus limiting their practical use. DIKNN divides the search area into multiple cone-shaped areas centered at the query point. It then performs a query dissemination and response collection itinerary in each of the cone-shaped areas in parallel. The design of the DIKNN scheme also takes into account challenging issues such as the dynamic adjustment of the search radius (in terms of number of hops) according to spatial irregularity or mobility of sensor nodes. DIKNN is a cost-effective solution for handling KNN queries in mobile sensor networks: it integrates query propagation with data collection along a well-designed itinerary traversal, which requires no infrastructure and is able to sustain rapid changes of the network topology. A simple and effective KNNB algorithm has been proposed to estimate the KNN boundary under the trade-off between query accuracy and energy efficiency. Dynamic adjustment of the KNN boundary has also been addressed to cope with spatial irregularity and mobility of sensor nodes. From extensive simulation results, DIKNN exhibits a superior performance in terms of energy efficiency, query latency, and accuracy in various network conditions. Reference [7] proposes an index structure on land surfaces that enables exact and fast responses to skNN queries. Two complementary indexing schemes, namely the Tight Surface Index (TSI) and the Loose Surface Index (LSI), are constructed and stored collectively on a single novel data structure called the Surface Index R-tree (SIR-tree). With those indexes, an skNN query can be efficiently processed by localizing the search and minimizing the invocation of the costly surface distance computation, hence incurring low I/O and computation costs. The authors have introduced an efficient skNN processing method that provides: 1) exact answers to the queries, 2) the actual shortest surface paths, and 3) incremental results. This approach is compared in accuracy with the range ranking method and in response time with the Chen-Han algorithm. While the results are 100% accurate (vs. lower than 50% accuracy for the most accurate variation when k > 5), its response time is 4 to 5 times better than an efficient variation for most cases. The authors in reference [8] study the problem of processing rank-based KNN queries against uncertain data. Besides applying the expected rank semantic to compute KNN, the median rank, which is less sensitive to outliers, is also introduced. Both ranking methods satisfy nice top-k properties such as exact-k, containment, unique ranking, value invariance, stability and fairfulness. For a given query q, IO- and CPU-efficient algorithms are proposed in the paper to compute KNN based on the expected (median) ranks of the uncertain objects. To tackle the correlations of the uncertain objects and the high IO cost caused by the large number of instances of the uncertain objects, randomized algorithms are proposed to approximately compute KNN with theoretical guarantees. Here, rank-based KNN queries on uncertain data are studied, where expected (median) ranks satisfying important top-k properties are adopted as the ranking criteria. Exact and randomized algorithms integrating efficient object pruning and IO accessing techniques are developed to process queries modeled by either query points or uncertain regions.

II. BACKGROUND

A. Overview of the R-Tree Family

1. R-tree

R-trees are tree data structures that are similar to B-trees, but are used for spatial access methods, i.e., for indexing multidimensional information; for example, the (X, Y) coordinates of geographical data. A common real-world usage for an R-tree might be: "Find all museums within 2 kilometers (1.2 mi) of my current location". The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles (MBRs, otherwise known as bounding boxes, i.e. "rectangle", what the "R" in R-tree stands for).
[Fig. 2. R-tree: internal nodes A, B, C with leaf entries d, e, f, g, h, i, j]

2. R+-Tree

An R+ tree is a method for looking up data using a location, often (x, y) coordinates, and often for locations on the surface of the earth. Searching on one number is a solved problem; searching on two or more, and asking for locations that are nearby in both the x and y directions, requires craftier algorithms. Fundamentally, an R+ tree is a tree data structure, a variant of the R tree, used for indexing spatial information. R+ trees are a compromise between R-trees and kd-trees: they avoid overlapping of internal nodes by inserting an object into multiple leaves if necessary.

R+ trees differ from R trees in that:

[Fig. 3. R+-tree: internal nodes with leaf entries d, e, f, g, h, i, j]

3. ZR+-Tree

The ZR+-tree resolves the limitations of the original R+-tree by eliminating the overlaps of leaf nodes. The essential idea behind the ZR+-tree is to logically clip the data objects to fit them into the exclusive leaf nodes.
[Fig. 4. ZR+-tree: leaf nodes A, B, C with clipped object fragments f1, f2, f3 among entries d, e, g, h, i, j]

B. Concurrency Control

If the regions do not cover the same space, as is likely when joining index nodes, then the search space can be reduced. Only objects within the

Step3. Sort the distances of all the training samples and determine the nearest neighbours based on the k-th minimum distance.
Step4. Use the majority of the nearest neighbours as the prediction value.
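A minimal Python sketch of Step3 and Step4 follows. Steps 1-2 (computing the distances from the training samples to the query point) are lost in the source, so Euclidean distance is assumed here, and the sample data are hypothetical:

# kNN classification: sort by distance, keep the k nearest, majority vote.
import math
from collections import Counter

def knn_predict(training, query, k):
    """training: list of (point, label); point is an (x, y) tuple."""
    # Step3: sort training samples by distance and keep the k nearest.
    distances = sorted(
        (math.dist(point, query), label) for point, label in training)
    nearest = [label for _, label in distances[:k]]
    # Step4: majority vote among the k nearest neighbours.
    return Counter(nearest).most_common(1)[0][0]

samples = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(samples, query=(1, 1), k=3))  # -> "A"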
V. CONCLUSION

The objects are arranged in the ZR+-tree format, where the search operation can be performed more efficiently than in R+-trees. The ZR+-tree segments the object to ensure every fragment is fully covered by a leaf node. This clipping-object design provides a better indexing structure. Furthermore, several structural limitations of the R+-tree are overcome in the ZR+-tree by the use of non-overlap clipping and a clustering-based reinsert procedure. Spatial join is performed to identify the intersected regions without duplicates, and the kNN queries help in identifying the nearest neighbours.

REFERENCES

[1] Xiaopeng Xiong, Mohamed F. Mokbel and Walid G. Aref, "SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases," Proc. 21st International Conference on Data Engineering (ICDE 2005), 2005.
[2] Min Soo Kim, Ju Wan Kim and Myoung Ho Kim, "Semijoin-Based Spatial Join Processing in Multiple Sensor Networks," ETRI Journal, vol. 30, no. 6, December 2008.
[3] Shubin Zhang, Jizhong Han, Zhiyong Liu and Kai Wang, "SJMR: Parallelizing Spatial Join with MapReduce on Clusters," Proc. IEEE International Conference on Cluster Computing and Workshops (CLUSTER '09), September 2009.
[4] Xiaohui Yu, Ken Q. Pu and Nick Koudas, "Monitoring k-Nearest Neighbor Queries over Moving Objects," Proc. 21st International Conference on Data Engineering (ICDE '05), 2005.
[5] Ke Deng, Xiaofang Zhou, Heng Tao Shen, Kai Xu and Xuemin Lin, "Surface k-NN Query Processing," Proc. 22nd International Conference on Data Engineering (ICDE '06), 2006.
[6] Shan-Hung Wu, Kun-Ta Chuang, Chung-Min Chen and Ming-Syan Chen, "DIKNN: An Itinerary-Based KNN Query Processing Algorithm for Mobile Sensor Networks," Proc. 23rd IEEE International Conference on Data Engineering (ICDE 2007), April 2007.
[7] Cyrus Shahabi, Lu-An Tang and Songhua Xing, "Indexing Land Surface for Efficient KNN Query," Proceedings of the VLDB Endowment, vol. 1, no. 1, August 2008.
[8] Ying Zhang, Xuemin Lin, Gaoping Zhu, Wenjie Zhang and Qianlu Lin, "Efficient Rank Based KNN Query Processing over Uncertain Data," Proc. 26th IEEE International Conference on Data Engineering (ICDE 2010), March 2010.
The commutative cipher based en-route filtering (CCEF) scheme [3] has several drawbacks. First, it relies on fixed paths, as IHA does. Second, it needs expensive public-key operations to implement commutative ciphers. Third, it can only filter the false reports generated by a malicious node without the session key, not those generated by a compromised cluster-head or other sensing nodes.

In the location-based resilient security (LBRS) solution [4], the adversaries can intentionally attach invalid MACs to legitimate reports to make them be dropped by other nodes. In addition, LBRS suffers a severe drawback: it assumes that all the nodes can determine their locations and generate location-based keys within a short secure time slot.

The location-aware end-to-end data security (LEDS) scheme [5] assumes that sensor nodes can generate the location-based keys bound to cells within a short secure time slot, like LBRS. However, this cannot prevent the adversaries from sending false reports with fewer than the valid number of shares. In addition, LEDS addresses selective forwarding attacks by letting the whole cell of nodes forward reports, which incurs high communication overhead.

The dynamic en-route filtering scheme [6] makes use of a hill climbing approach for the early detection of false reports. Here, no schemes are used for the reduction of redundant data transmission from the sensor nodes to the cluster head. The period for re-disseminating the authentication keys is not taken into account, the metrics to choose the forwarding nodes are not considered, and the extra control messages increase operation complexity and also incur extra overhead.

The Energy-Efficient Secure Pattern based Data Aggregation (ESPDA) protocol [7] prevents the redundant data transmission from sensor nodes to cluster-heads. In ESPDA, no mechanisms are given to reduce the number of hops travelled by the false data reports.

In this paper, a dynamic en-route filtering scheme is proposed along with the ESPDA protocol to address both false report injection attacks and DoS attacks in wireless sensor networks. In the proposed scheme, sensor nodes are organized into clusters. Each legitimate report should be validated by multiple message authentication codes (MACs). Before sending reports, nodes disseminate their keys to forwarding nodes using the Hill Climbing approach. Then, they send reports in rounds. Each node can monitor its neighbours by overhearing their broadcast, which prevents the compromised nodes from changing the reports. The ESPDA protocol used in this paper prevents the redundant data transmission from sensor nodes to cluster-heads. ESPDA is energy- and bandwidth-efficient because cluster-heads prevent the transmission of redundant data from sensor nodes. This scheme can deal efficiently with the topology changes of the sensor networks.

II PROPOSED SCHEME

A. System model

The communication region of a wireless sensor node is modelled as a circular area of radius r, which is called the transmission range. Only the bidirectional links between neighbour nodes are considered. Based on these assumptions, two nodes are neighbours of each other and can always communicate with each other if the distance between them is no more than r. The nodes detecting an event are called sensing nodes. They generate and broadcast the sensing reports to the cluster-head. The cluster-head is responsible for aggregating these sensing reports into aggregated reports and forwarding the aggregated reports to the base station through some forwarding nodes.
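The following Python sketch expresses the neighbour relation of this system model: two nodes share a bidirectional link exactly when their distance is at most the transmission range r. The node positions are hypothetical:

# Neighbour sets under the disk model: u and v are neighbours iff dist <= r.
import math

def neighbours(positions, r):
    """positions: dict node_id -> (x, y); returns node_id -> set of neighbours."""
    result = {u: set() for u in positions}
    for u, pu in positions.items():
        for v, pv in positions.items():
            if u != v and math.dist(pu, pv) <= r:
                result[u].add(v)       # symmetric, so links are bidirectional
    return result

nodes = {"s1": (0, 0), "s2": (3, 4), "ch": (0, 5)}   # hypothetical layout
print(neighbours(nodes, r=5.0))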
from K(n) before. Then, it verifies the integrity and validity of the reports by checking the MACs of the reports using the disclosed keys.
Step5: If the reports are valid, uj sends an OK message to uj+1. Otherwise, it informs uj+1 to drop the invalid reports.
Step6: Similar to Step2, uj+1 forwards the reports to the next hop.
Step7: Similar to Step3, after overhearing the broadcast from uj+1, uj discloses K(t) to uj+1.

Every forwarding node repeats Step4 to Step7 until the reports are dropped or delivered to the base station. The broadcast nature of wireless communication is taken into account: in this scheme, each node monitors its next-hop node to assure that no message is forged or changed intentionally.
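The following Python sketch gives a rough, simplified rendering of the Step4-Step7 forwarding loop; the MAC construction and key-disclosure details are stand-ins of our own, not the paper's exact protocol:

# En-route verification: each hop checks the report's MACs with the key
# disclosed by the previous hop, then forwards or drops the report.
import hmac, hashlib

def mac(key: bytes, report: bytes) -> bytes:
    return hmac.new(key, report, hashlib.sha256).digest()

def forward_report(report, macs, path, disclosed_keys):
    """path: list of forwarding nodes; disclosed_keys: node -> disclosed key."""
    for u_j, u_next in zip(path, path[1:]):
        key = disclosed_keys[u_j]                     # key disclosed by u_j
        # Step4: verify integrity/validity using the disclosed key's MAC.
        if not any(hmac.compare_digest(mac(key, report), m) for m in macs):
            return f"{u_next}: drop invalid report"   # Step5, invalid case
        # Step5/Step6: OK message sent and report forwarded to the next hop;
        # Step7: u_j disclosed its key after overhearing u_next's broadcast.
    return "delivered to base station"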
IV CONCLUSION

In wireless sensor networks, adversaries can inject false data reports via compromised nodes and launch denial of service attacks. In this paper, a dynamic en-route filtering scheme that addresses both false report injection and DoS attacks in wireless sensor networks is proposed. The Hill Climbing key dissemination approach is used to ensure that the nodes closer to data sources have stronger filtering capacity. The redundant transmission of data reports by the sensor nodes to the cluster head is reduced by using the Energy-Efficient Secure Pattern based Data Aggregation (ESPDA) protocol. It reduces the energy consumption of the sensor nodes and improves the filtering capacity of the scheme.

REFERENCES

[1] F. Ye, H. Luo, S. Lu, and L. Zhang, "Statistical en-route detection and filtering of injected false data in sensor networks," in Proc. IEEE INFOCOM, 2004, vol. 4, pp. 2446-2457.
[2] S. Zhu, S. Setia, S. Jajodia, and P. Ning, "An interleaved hop-by-hop authentication scheme for filtering of injected false data in sensor networks," in Proc. IEEE Symp. Security and Privacy, 2004, pp. 259-271.
[3] H. Yang and S. Lu, "Commutative cipher based en-route filtering in wireless sensor networks," in Proc. IEEE VTC, 2004, vol. 2, pp. 1223-1227.
[4] H. Yang, F. Ye, Y. Yuan, S. Lu, and W. Arbaugh, "Toward resilient security in wireless sensor networks," in Proc. ACM MobiHoc, 2005, pp. 34-45.
[5] K. Ren, W. Lou, and Y. Zhang, "LEDS: Providing location-aware end-to-end data security in wireless sensor networks," in Proc. IEEE INFOCOM, 2006, pp. 1-12.
[6] Z. Yu and Y. Guan, "A dynamic en-route filtering scheme for data reporting in wireless sensor networks," IEEE/ACM Transactions on Networking, vol. 18, no. 1, February 2010.
[7] H. Cam, S. Ozdemir, P. Nair, D. Muthuavinashiappan, and H. O. Sanli, "Energy-efficient secure pattern based data aggregation for wireless sensor networks," Computer Communications, vol. 29, no. 4, February 2006.
[8] B. Karp and H. T. Kung, "GPSR: Greedy perimeter stateless routing for wireless networks," in Proc. ACM MobiCom, 2000, pp. 243-254.
[9] Y. Yu, R. Govindan, and D. Estrin, "Geographical and energy aware routing: A recursive data dissemination protocol for wireless sensor networks," Comput. Sci. Dept., Univ. of California, Los Angeles, Tech. Rep. UCLA-CSD TR-01-0023, 2001.
[10] Z. Yu and Y. Guan, "A dynamic en-route scheme for filtering false data injection in wireless sensor networks," in Proc. IEEE INFOCOM, 2006, pp. 1-12.
clusters is useful in reducing energy consumption. Many energy-efficient routing protocols are designed based on the clustering structure. The clustering technique can also be used to perform data aggregation, which combines the data from source nodes into a small set of meaningful information. Under the condition of achieving the sufficient data rate specified by applications, the fewer messages are transmitted, the more energy is saved. Localized algorithms can efficiently operate within clusters and need not wait for control messages to propagate across the whole network. Therefore, localized algorithms bring better scalability to large networks than centralized algorithms, which are executed on a global structure. The clustering technique can also be extremely effective in broadcast and data query: cluster-heads help to broadcast messages and collect the data of interest within their own clusters.

During data collection, two mechanisms are used to reduce energy consumption: message aggregation and filtering of redundant data. These mechanisms generally use clustering methods in order to coordinate aggregation and filtering. Clustering is particularly useful for applications that require scalability to hundreds or thousands of nodes. Scalability in this context implies the need for load balancing and efficient resource utilization. Applications requiring efficient data aggregation are natural candidates for clustering. Routing protocols can also employ clustering, and clustering has been proposed as a useful tool for efficiently pinpointing object locations. Clustering can be extremely effective in one-to-many, many-to-one, one-to-any, or one-to-all (broadcast) communication. For example, in many-to-one communication, clustering can support data fusion and reduce communication interference. The essential operation in sensor node clustering is to select a set of cluster heads among the nodes in the network and cluster the rest of the nodes with these heads. Cluster heads are responsible for coordination among the nodes within their clusters (intra-cluster coordination) and for communication with each other and/or with external observers on behalf of their clusters (inter-cluster communication). Energy-efficient operations are essential in extending the lifetime of Wireless Sensor Networks. Among the energy-saving solutions, clustering sensor nodes is an interesting alternative that features a reduction in energy consumption through: (i) aggregating data; (ii) controlling transmission power levels; (iii) balancing load; (iv) putting redundant sensor nodes to sleep.

This paper proposes a distributed clustering mechanism equipped with energy maps and constrained by Quality-of-Service (QoS) requirements. Such a clustering mechanism is used to collect data in sensor networks. The first original aspect of this investigation consists of adding these constraints to the clustering mechanism, which helps the data collection algorithm reduce energy consumption and provide applications with the information required without burdening them with unnecessary data. The existing centralized clustering methods cannot be used to solve this issue, because our approach to modeling the problem assumes that the numbers of clusters and cluster heads are unknown before the clusters are created, which constitutes another major original facet of this paper.

2. Problem Statement

The centralized approach is less efficient than the distributed approach in the cluster building phase. The nodes in the centralized approach have to send their information to a central node that collects all of the information and runs the algorithm to build the clusters. The energy consumed by building clusters and the energy consumed during the data collection phase are higher in the centralized approach.

3. Related work
and media access control (MAC) and routing protocols enable low-energy networking. The advantage of rotating the cluster-head position among all the nodes enables LEACH to achieve a longer lifetime than static clustering. LEACH is not as efficient as LEACH-C.

Lee et al. [7] define an energy consumption model. It shows the impact of the coverage aging process of a sensor network, i.e., how it degrades over time as some nodes become energy-depleted. To evaluate sensing coverage with heterogeneous deployments, total sensing coverage is used, which represents the total information that can be extracted from all functioning sensors in a network area. The energy consumption model determines a device's lifetime by considering application-specific event characteristics and a network-specific data extraction model and communication method. High-cost devices can function as a cluster-head or sink to collect and process the data from low-cost sensors, which can enhance the duration of the network sensing operation.

Liang et al. [8] propose an energy-efficient method for data gathering to prolong network lifetime. The objective is to maximize the network lifetime without any knowledge of future query arrivals and generation rates; in other words, the objective is to maximize the number of data gathering queries answered until the first node in the network fails. The algorithm MNL significantly outperforms all the other algorithms in terms of the network lifetime delivered.

Basu et al. [9] discuss data dissemination and gathering. A majority of sensor networking applications involve data gathering and dissemination; hence, energy-efficient mechanisms for providing these services become critical. However, due to the broadcast nature of the wireless channel, many nodes in the vicinity of a sender node overhear its packet transmissions even if they are not the intended recipients of these transmissions. This redundant reception results in unnecessary expenditure of the recipients' battery energy. Turning off neighboring radios during a certain point-to-point wireless transmission can mitigate this cost. To overcome this, an Energy-Efficient Data Gathering and Dissemination algorithm is used.

3 Preliminaries of Proposed Algorithm

3.1 Energy consumption model

The energy consumption model determines the sensor lifetime. This model is affected by the application type, the data extraction model and the network communication model. The energy consumption for a single cycle is calculated as follows:

Ecycle = ED + ES + ET + ER

where ED, ES, ET and ER represent the energy required for data processing, sensing, transmitting and receiving per cycle time, respectively. The quantity of energy spent for each operation depends on the network and the event model.
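The per-cycle energy model above translates directly into code. The following Python sketch (with hypothetical millijoule values) simply evaluates Ecycle = ED + ES + ET + ER:

# Per-cycle energy as the sum of processing, sensing, transmit and receive costs.
def cycle_energy(e_data: float, e_sense: float, e_tx: float, e_rx: float) -> float:
    # Ecycle = ED + ES + ET + ER
    return e_data + e_sense + e_tx + e_rx

# Hypothetical per-cycle costs in millijoules.
print(cycle_energy(e_data=0.5, e_sense=0.2, e_tx=1.5, e_rx=0.8))  # 3.0 mJ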
3.2 Energy maps

The energy map, the component that contains information concerning the remaining available energy in all network areas, can be used to prolong the network lifetime.

3.3 Data Collection Mechanism

Generally, sensor networks contain a large quantity of nodes that collect measurements before sending them to the applications. If all nodes forwarded their measurements, the volume of data received by the applications would increase exponentially, rendering data processing a tedious task. A sensor system should thus contain mechanisms that allow the applications to express their requirements in terms of the required quality of data. Data aggregation and data filtering are two methods that reduce the quantity of data received by applications. The aim of those two methods is not only to minimize the energy consumption by decreasing the number of messages exchanged in the network but also to
provide the applications with the needed data without needlessly overloading them with exorbitant quantities of messages. The data aggregation mechanism allows for the gathering of several measures into one record whose size is less than the extent of the initial records. However, the result semantics must not contradict the initial record semantics; moreover, it must not lose the meanings of the initial records. The data filtering mechanism makes it possible to ignore measurements considered redundant or irrelevant to the application needs. A sensor system provides the applications with the means to express the criteria used to determine measurement relevancy; e.g., an application could be concerned with temperatures which are 1) lower than a given value and 2) recorded within a delimited zone. The sensor system filters the network messages and forwards only those that respect the filter conditions.

3.5 A Tabu Search Approach

In order to facilitate the usage of tabu search for CBP, a new graph called Gr is defined. It is capable of determining feasible clusters. A feasible cluster consists of a set of nodes that fulfil the cluster building constraints. Nodes that satisfy the constraint, i.e., ensure zone coverage, are called active nodes. The vertices of Gr represent the network nodes. An edge is defined in graph Gr between nodes i and j if they satisfy the constraints. Consequently, it is clear that a clique in Gr embodies a feasible cluster. A clique consists of a set of nodes that are adjacent to one another.

Five steps should be conducted in order to adapt tabu search heuristics to solve a particular problem:
1. Design an algorithm that returns an initial solution,
2. Define moves that determine the neighbourhood N of a solution s,
3. Determine the content and size of tabu lists,
4. Define the aspiration criteria,
5. Design intensification and diversification mechanisms.

Initial solution:
The goal is to find an appropriate initial solution for the problem, in order to get the best solution from the tabu search iterations within a reasonable delay.

The neighborhood definition:
It involves a move involving a regular node, a move involving an active node, and a move involving a cluster head.

Tabu lists:
Our adaptation proposes two tabu lists: a reassignment list and a re-election list.

Reassignment list:
The first tabu list prevents cycles that can be generated by reassigning a node to the same cluster. After each move, which consists of reassigning a node to a cluster, the pair is added to this tabu list.

Re-election list:
The second tabu list prevents the re-election of an active node in the same cluster. After a move consisting of electing node a in a cluster, two pairs of nodes are added to the re-election list: the first pair prohibits the move, the second pair prevents the reverse move.
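The two tabu lists can be kept as bounded FIFO queues. The following Python sketch is our own illustration of the bookkeeping described above; the list sizes and move encodings are assumptions, not the paper's parameters:

# Two tabu lists as bounded FIFO queues; old entries expire automatically.
from collections import deque

reassignment_tabu = deque(maxlen=20)   # (node, cluster) pairs recently moved
reelection_tabu = deque(maxlen=20)     # pairs added by recent elections

def reassign(node, cluster):
    # Forbid repeating this reassignment while the pair stays in the list.
    reassignment_tabu.append((node, cluster))

def elect(node_a, node_b, cluster):
    # Electing node_a (in place of node_b) in a cluster adds two pairs: one
    # prohibiting the move itself, one preventing the reverse move.
    reelection_tabu.append((node_a, cluster))
    reelection_tabu.append((node_b, cluster))

def is_tabu(move) -> bool:
    return move in reassignment_tabu or move in reelection_tabu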
[Figure 1 – Flow Diagram for Tabu Search for Clustering]

Aspiration criteria:
The aspiration criterion consists of considering a move inventoried in the tabu list which, in turn, engenders a solution
that is superior to the best solution found in the first place.

The search stops when one of the following criteria is met:
2. The maximal number of iterations allowed has been reached;
3. The maximal number of iterations where the best solution is not enhanced successively has been reached.

5. Conclusions
Client-based techniques are implemented on the user end point through the browser.

D. ANTI-PHISHING TOOLS

Existing anti-phishing tools are categorized into blacklist-based, heuristic-based, and content-based tools; the approach considered here is a kind of heuristic-based tool. Google Safe Browsing for Firefox is one of the blacklist-based tools: a web browser extension that alerts users if a visited web page appears to be asking for personal or financial information under false pretenses, by combining advanced algorithms with reports about misleading pages from a number of sources. Because Google Safe Browsing for Firefox uses a blacklist, it is vulnerable to new phishing sites.

E. EFFECTIVENESS OF ANTI-PHISHING TOOLS

may learn to distrust the toolbar. Fig. 3 describes how the security toolbar prevents a phishing attack.

F. DRAWBACK OF THE SECURITY TOOLBAR APPROACH

A toolbar is a small display in the peripheral area of the browser, compared to the large main window that displays the web content. Users may not pay enough attention to the toolbar at the right times to notice an attack.
A security toolbar shows security-related information, but security is rarely the user's primary goal in web browsing. Users may not care about the toolbar's display even if they do notice it.
If a toolbar sometimes makes mistakes and identifies legitimate sites as phishing sites, users may learn to distrust the toolbar.
false positive rate is high. To solve this problem, the system compares not only the image displayed on the web page but also the content and language in which the web page is designed, so the false positive rate is reduced.

H. RELATED WORK
A growing number of user studies are investigating why phishing attacks are so effective against computer users.

TITLE: "An Evaluation of Anti-Phishing Toolbars".
REFERENCE: In November 2006, a study at Carnegie Mellon University found that the anti-phishing toolbars examined left a lot to be desired [1]. The researchers found that three of the 10 toolbars, SpoofGuard, EarthLink, and Netcraft, were able to identify over 75% of the phishing sites tested.
LIMITATION: Four of the toolbars were not able to identify even half the phishing sites tested. At the same time, SpoofGuard incorrectly identified 38% of the legitimate URLs as phishing URLs.

TITLE: "Do Security Toolbars Actually Prevent Phishing Attacks?".
REFERENCE: In an ACM conference study of April 2006, researchers tested three types of security toolbars, as well as browser address and status bars, for their effectiveness at preventing phishing attacks [2].
LIMITATION: All the toolbars failed to prevent users from being spoofed by high-quality phishing attacks.

TITLE: "Phishing Activity Trends Report, Q1 2008".
REFERENCE: The APWG, founded as the Anti-Phishing Working Group, serves as a public and industry resource for information about the problem of phishing and email fraud, including the identification and promotion of pragmatic technical solutions that can provide immediate protection and benefits against phishing attacks [3].
LIMITATION: Sometimes the APWG group is unable to identify a phishing website.

TITLE: "A Layout-Similarity-Based Approach for Detecting Phishing Pages".
REFERENCE: Angelo P. E. Rosiello [4] explained a client-side solution for phishing attacks. This approach makes DOM-based layout comparisons of legitimate sites with potential phishing sites to detect phishing pages.
LIMITATION: Only applicable as a client-based solution; not possible as a server-based solution.

TITLE: "A Content-Based Approach to Detecting Phishing Web Sites".
REFERENCE: Yue Zhang [5] proposed the CANTINA algorithm, which takes Robust Hyperlinks, an idea for overcoming page-not-found problems using the well-known Term Frequency / Inverse Document Frequency (TF-IDF) algorithm, and applies it to anti-phishing.
LIMITATION: The TF-IDF approach can identify 97% of phishing sites, but with about a 6% false positive rate.

2. DESIGN

The UML diagram describes how the phishing detection is carried out in the system.
3. ALGORITHM

STEP 1: Start.
STEP 2: Input the URL of the website.
STEP 3: Capture the screenshot automatically and store it in the database.
STEP 4: Compare it with the available image db using the K-means clustering algorithm.
STEP 5: If the threshold value of the image is not exceeded, go to step 9.
STEP 6: If the images are dissimilar, compare the content and language of the web page stored in the database.
STEP 7: If the comparison matches, go to step 9.
STEP 8: Store the website as a phishing site, then go to step 10.
STEP 9: Store the website as legitimate in the database.
STEP 10: Display the result.
STEP 11: Stop the process.

Figure: Flowchart of the detection process – Start; input the URL of the website (1); capture the screen shot (2); search within the image database (3); threshold value exceeds? (4); compare the content and language (5); legitimate site (6) or phishing site (7); register in the image db; display the result, whether the site is phishing or legitimate (8); End.

The system first determines whether input URLs are in the right format. The image database consists of pairs of URLs and the image they display in the web browser. First, the system accesses the targeted URL in the web browser and takes the image displayed in the browser. Next, the system compares this image with those in the image database. Each entry in the database has one of three labels: legitimate, phishing, or unknown. If an image is registered in the database, the method regards the site as an imitating site. The system distinguishes malicious from legitimate web pages by comparing the domain and the images. If the image of the model site is not in the image database, the method can still detect malicious web pages through the similarity between victim sites imitating the same site.
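The image-comparison step (steps 4-6) can be sketched as follows. This is a simplified stand-in assuming NumPy and Pillow: each screenshot is reduced to a small dominant-colour palette via a tiny k-means, and two pages are called similar when their palettes are closer than the threshold (the threshold value here is illustrative, not the paper's).

import numpy as np
from PIL import Image

def palette(path, k=4, iters=10, seed=0):
    # Dominant-colour signature of a screenshot via k-means over its pixels.
    px = np.asarray(Image.open(path).convert("RGB").resize((64, 64)),
                    dtype=float).reshape(-1, 3)
    rng = np.random.default_rng(seed)
    centers = px[rng.choice(len(px), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((px[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = px[labels == j].mean(axis=0)
    return centers[np.lexsort(centers.T)]        # sort centres for comparability

def is_similar(shot, stored, threshold=40.0):
    # Step 5: 'similar' when the palettes differ by less than the threshold.
    return np.linalg.norm(palette(shot) - palette(stored), axis=1).mean() < threshold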
4. IMPLEMENTATION
IMAGE CAPTURE
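The implementation details are not preserved in this excerpt; a minimal sketch of the image-capture step (steps 2-3), assuming Selenium with Firefox (the paper does not name the capture tool), could look like this:

from selenium import webdriver

def capture(url, out_path):
    driver = webdriver.Firefox()
    try:
        driver.get(url)                     # step 2: input the URL
        driver.save_screenshot(out_path)    # step 3: capture and store the screenshot
    finally:
        driver.quit()

capture("http://example.com", "screenshots/example.png")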
5. CONCLUSION

Phishing has become a significant threat to Internet users. Phishing attacks typically use legitimate-looking but fake emails and websites to deceive users into disclosing personal or financial information to the attacker. Phishing is a form of criminal conduct that poses increasing threats to consumers, financial institutions, and commercial enterprises in Canada, the United States, India, and other countries. Because phishing shows no sign of abating, and indeed is likely to continue in newer and more sophisticated forms, it demands an ongoing response from law enforcement, other government agencies, and the private sector. In this paper, we propose a phishing detection mechanism based on a novel framework for denial of phishing that combines heuristic and content-based search algorithms. Here the false positive rate should be reduced.

REFERENCES

[1] Lorrie Cranor, Serge Egelman, Jason Hong, and Yue Zhang, "Phinding Phish: An Evaluation of Anti-Phishing Toolbars," 14th Annual Network and Distributed System Security Symposium (NDSS 2007).
[2] Min Wu, Robert C. Miller, and Simson L. Garfinkel, "Do Security Toolbars Actually Prevent Phishing Attacks?," Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2006).
[3] Anti-Phishing Working Group, "Phishing Activity Trends Report, Q1 2008," Aug. 2008. http://www.antiphishing.org/reports/apwg_report_Q1_2008.pdf
[4] Angelo Rosiello, Christopher Kruegel, Engin Kirda, and Fabrizio Ferrandi, "A Layout-Similarity-Based Approach for Detecting Phishing Pages," 3rd International Conference on Security and Privacy in Communication Networks (SecureComm 2007).
[5] Yue Zhang, Jason Hong, and Lorrie Cranor, "CANTINA: A Content-Based Approach to Detecting Phishing Web Sites," 16th International World Wide Web Conference (WWW 2007).
structures that reflect the complexity of each individual entity, such as methods and classes, and on external complexity that measures the interactions among entities, such as coupling and inheritance. Metrics measure computational complexity, which affects the efficiency of an algorithm and the use of machine resources, as well as psychological complexity factors that affect the ability of a programmer to create, modify, and maintain software. The OO metrics have been used to assess the quality of software design, such as the fault-proneness and the maintainability of classes.

Lorenz proposed eleven OO design metrics. They are listed below:
Average Method Size (LOC)
Average Number of Methods per Class
Average Number of Instance Variables per Class
Class Hierarchy Nesting Level (DIT)
Number of Subsystem/Subsystem Relationships
Number of Class/Class Relationships in Each Subsystem
Instance Variable Usage
Average Number of Comment Lines (per Method)
Number of Problem Reports per Class
Number of Times Class is Reused
Number of Classes and Methods Thrown Away

The CK OO Metric Suite

Chidamber and Kemerer (CK) proposed six OO design and complexity metrics. They include the following.

Weighted Methods Per Class (WMC)
The WMC metric is the sum of the complexities of all methods of a class, i.e., the sum of the cyclomatic complexity of all the methods in the class. Therefore, high values of the WMC metric mean high complexities as well.

Lack of Cohesion in Methods (LCOM)
The cohesion of a class is indicated by how closely the local methods are related to the local instance variables in the class. High cohesion indicates good class subdivision. The LCOM metric measures the dissimilarity of methods in a class by their usage of instance variables. LCOM is measured as the number of disjoint sets of local methods. Lack of cohesion increases complexity and the opportunities for error during the development process.

Response For a Class (RFC)
This is the number of methods that can be executed in response to a message received by an object of that class. The larger the number of methods that can be invoked from a class through messages, the greater the complexity
of the class. It captures the size of the response set of a class. The response set of a class is all the methods called by local methods; RFC is the number of local methods plus the number of methods called by local methods.

Number of Children (NOC)
The NOC metric counts the number of descendents of a class. The number of children represents the number of specializations and uses of a class; therefore, understanding all children classes is important to understand the parent. A high number of children increases the burden on developers and testers in comprehending, maintaining, and uncovering pre- and post-release faults.
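The four metrics above can be computed from a per-class summary. The toy sketch below assumes unit complexity per method for WMC and uses an illustrative hand-written class summary; production tools would extract this information from source code.

class_summary = {
    "methods": {
        "open":  {"calls": {"validate"}, "attrs": {"path", "mode"}},
        "read":  {"calls": set(),        "attrs": {"path", "buffer"}},
        "close": {"calls": set(),        "attrs": {"buffer"}},
    },
    "children": ["LogFile", "ConfigFile"],       # known subclasses, for NOC
}

methods = class_summary["methods"]
wmc = len(methods)                                # WMC with unit complexity per method
noc = len(class_summary["children"])              # NOC: number of children
rfc = len(methods) + len(set().union(*(m["calls"] for m in methods.values())))

# LCOM: number of disjoint sets of methods, where methods sharing at least
# one instance variable fall into the same set.
groups = []
for name, m in methods.items():
    merged_names, merged_attrs = {name}, set(m["attrs"])
    for g in groups[:]:
        if g[1] & merged_attrs:
            merged_names |= g[0]
            merged_attrs |= g[1]
            groups.remove(g)
    groups.append((merged_names, merged_attrs))
lcom = len(groups)

print(wmc, noc, rfc, lcom)   # -> 3 2 4 1 (all methods linked via shared variables)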
3 Approach

Approach Overview
The objective of this work is to estimate risk levels in software development using OO metrics. This is accomplished by means of a statistical model, derived from logistic regression (LR), to identify threshold values for the Chidamber and Kemerer (CK) metrics. This logistic regression model yields a probability value for each metric individually, from which we derive the threshold values for each of the metrics. By using those identified threshold values, the classes under examination can be clustered into low and high risk levels. Predicting the probability of faulty classes provides the information needed to guide developers in their endeavor to improve software quality and to reduce the costs of testing and maintenance. The probability of faults in classes can be used to rank classes by risk level; the classes within the high risk level need more investigation than the classes within the low risk level. Software metrics thresholds can be used for the purpose of flagging the classes that fall within a given risk level. With the help of the threshold values, developers and testers can scrutinize the classes during the project's progress and prepare design resolutions for these classes. The developers and testers may use these thresholds to identify refactoring candidates such as bad code classes. Therefore, software developers and testers need convenient and intuitive techniques for identifying classes that exceed an empirically specified risk level. Hence, s/w metrics have been validated theoretically and empirically as good predictors of quality factors. But in previous approaches, metrics have not been validated as measures of design complexity: the lack of empirical validation of the acceptable risk levels and of quality assurance tools, together with the absence of quantitative models that can easily derive metric threshold values without repeating the tedious process of data collection, makes OO design complex to assess. Studies have found significant correlations between bad code and software faults, and associations between metrics and the fault-proneness of classes, but these associations have not been exploited effectively to identify threshold effects.

Univariate Logistic Regression Analysis
Logistic regression is used to validate the metrics and to construct the threshold model. In this section, we discuss the use of logistic regression to identify threshold values, which results in the identification of faulty classes. The general logistic regression model is as follows:

P(X) = e^(g(X)) / (1 + e^(g(X)))    (1)

where g(X) = α + β·X is the logit function, P is the probability of a class being faulty, X is an OO metric, β is the coefficient estimated by maximizing the log-likelihood, and α is the estimated constant.

Table 1 – Significance levels (P-values), univariate logistic analysis

The CBO, RFC, and WMC metrics are significant predictors of the fault-proneness of classes.

Threshold Effects Analysis
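A threshold can be read off the fitted model by solving Eq. (1) for the metric value at a chosen risk level p0, i.e., X = (ln(p0/(1 - p0)) - α)/β. A sketch assuming scikit-learn and illustrative data (not the paper's data set):

import numpy as np
from sklearn.linear_model import LogisticRegression

wmc = np.array([[3], [5], [8], [12], [15], [20], [25], [30]])   # metric values X
faulty = np.array([0, 0, 0, 1, 0, 1, 1, 1])                     # class was faulty?

model = LogisticRegression().fit(wmc, faulty)
alpha, beta = model.intercept_[0], model.coef_[0][0]

# P(X) = e^(g(X)) / (1 + e^(g(X))) with g(X) = alpha + beta * X; solving
# P(X) = p0 for X gives the threshold at the acceptable risk level p0.
p0 = 0.5
threshold = (np.log(p0 / (1 - p0)) - alpha) / beta
print(f"classes with WMC above {threshold:.1f} fall in the high-risk group")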
The above figure shows that the number of bugs in a project can increase with the probability of a class being faulty. Whenever this happens, refinement work should be done to minimise the occurrence of cumulative numbers of bugs.

4 Conclusion

The threshold values provide a meaningful interpretation for metrics and a surrogate to identify classes at risk. The classes that exceed a threshold value can be selected for more testing to improve their internal quality, which increases the testing efficiency. This approach can be applied on an open-source system. It is clear that OO metrics serve the project manager, the developer, and the tester in assuring the quality of the software product and in mitigating potential problems in software complexity.

References

[1] L. Briand, J. Wust, and H. Lounis, "Replicated Case Studies for Investigating Quality Factors in Object-Oriented Designs," Empirical Software Eng., vol. 6, no. 1, pp. 11-58, 2001.
[2] L. Rosenberg, "Metrics for Object-Oriented Environment," Proc. EFAITP/AIE Third Ann. Software Metrics Conf., 1997.
[3] S. Chidamber, D. Darcy, and C. Kemerer, "Managerial Use of Metrics for Object-Oriented Software: An Exploratory Analysis," IEEE Trans. Software Eng., vol. 24, no. 8, pp. 629-639, Aug. 1998.
[4] L. Briand, J. Daly, and J. Wust, "A Unified Framework for Coupling Measurement in Object-Oriented Systems," IEEE Trans. Software Eng., vol. 25, no. 1, pp. 91-121, Jan./Feb. 1999.
[5] M.H. Tang, M.H. Kao, and M.H. Chen, "An Empirical Study on Object-Oriented Metrics," Proc. Sixth Int'l Symp. Software Metrics, pp. 242-249, 1999.
cover data can be any multimedia data like text, image, audio, video, etc.

2. Cryptography
Data that can be read and understood without any special measures is called plaintext or cleartext. The method of disguising plaintext in such a way as to hide its substance is called encryption. Encrypting plaintext results in unreadable gibberish called ciphertext. You use encryption to ensure that information is hidden from anyone for whom it is not intended, even those who can see the encrypted data. The process of reverting ciphertext to its original plaintext is called decryption. A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and decryption process. A cryptographic algorithm works in combination with a key (a word, number, or phrase) to encrypt the plaintext. The same plaintext encrypts to different ciphertext with different keys. The security of encrypted data is entirely dependent on two things: the strength of the cryptographic algorithm and the secrecy of the key.

Fig 1 – Process of encryption and decryption (plain text, encryption, cipher text, decryption, plain text)

2.1 TYPES OF CRYPTOGRAPHIC ALGORITHMS
There are several ways of classifying cryptographic algorithms. For the purposes of this report they will be categorized based on the number of keys that are employed for encryption and decryption, and further defined by their application and use. The following are the three types of algorithm that are discussed.

Symmetric Key Cryptography
The most widely used symmetric key cryptographic method is the Data Encryption Standard (DES). It is still the most widely used symmetric-key approach. It uses a fixed-length, 56-bit key and an efficient algorithm to quickly encrypt and decrypt messages, and it can be easily implemented in hardware, making the encryption and decryption process even faster. In general, increasing the key size makes the system more secure. A variation of DES, called Triple-DES or DES-EDE (encrypt-decrypt-encrypt), uses three applications of DES with either two or three independent DES keys, producing an effective key length of 112 or 168 bits. IDEA uses a fixed-length, 128-bit key (larger than DES but smaller than Triple-DES) and is also faster than Triple-DES. Other symmetric ciphers use variable-length keys and are claimed to be even faster than IDEA.
Despite the efficiency of symmetric key cryptography, it has a fundamental weak spot: key management. Since the same key is used for encryption and decryption, it must be kept secure. If an adversary knows the key, then the message can be decrypted. At the same time, the key must be available to the sender and the receiver, and these two parties may be physically separated. Symmetric key cryptography transforms the problem of transmitting messages securely into that of transmitting keys securely. This is an improvement, because keys are much smaller than messages and can be generated beforehand. Nevertheless, ensuring that the sender and receiver are using the same key, and that potential adversaries do not know this key, remains a major stumbling block. This is referred to as the key management problem.
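For illustration, a minimal Triple-DES round trip, assuming the PyCryptodome library; note that both parties must already possess the shared key, which is precisely the key management problem just described.

from Crypto.Cipher import DES3
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

key = DES3.adjust_key_parity(get_random_bytes(24))    # three independent DES keys

cipher = DES3.new(key, DES3.MODE_CBC)
ciphertext = cipher.encrypt(pad(b"attack at dawn", DES3.block_size))

decipher = DES3.new(key, DES3.MODE_CBC, iv=cipher.iv)
assert unpad(decipher.decrypt(ciphertext), DES3.block_size) == b"attack at dawn"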
Public/Private Key Cryptography
Asymmetric key cryptography overcomes the key management problem by using a pair of related keys: a public key, which can be distributed openly, and a private key, which is kept secret.
Figure: Embedding – the cipher text is embedded into the original image/frame to produce the stego image.
*M.Piramanayagam **M.Yuvaraju
there are two fundamental key management schemes for WSNs: static and dynamic. In static key management schemes, key management functions (i.e., key generation and distribution) are handled statically. That is, the sensors have a fixed number of keys loaded either prior to or shortly after network deployment. On the other hand, dynamic key management schemes perform keying functions (rekeying) either periodically or on demand as needed by the network; the sensors dynamically exchange keys to communicate. Although dynamic schemes are more attack-resilient than static ones, one significant disadvantage is that they increase communication overhead due to keys being refreshed or redistributed from time to time in the network. There are many reasons for key refreshment, including updating keys after a key revocation has occurred, refreshing keys so that they do not become stale, or changing keys due to dynamic changes in the topology. In this paper, we seek to minimize the overhead associated with refreshing keys to avoid them becoming stale. Because the communication cost is the most dominant factor in a sensor's energy consumption, the message transmission cost for rekeying is an important issue in a WSN deployment (as analyzed in the next section). Furthermore, for certain WSN applications (e.g., military applications), it may be very important to minimize the number of messages to decrease the probability of detection if deployed in enemy territory. That is, being less "chatty" intuitively decreases the number of opportunities for malicious entities to eavesdrop or intercept packets.

II. RELATED WORKS
Dynamic keying schemes go through the phase of rekeying either periodically or on demand as needed by the network to refresh the security of the system. With rekeying, the sensors dynamically exchange keys that are used for securing the communication. A drawback of DEEF [2] is that in reality battery levels may fluctuate, and the differences in battery levels across nodes may spur synchronization problems, which can cause packet drops. Ma's work [3] applies the same filtering concept at the sink and utilizes packets with multiple MACs appended. A work [4] proposed by Hyun and Kim uses relative location information to make the compromised data meaningless and to protect the data without cryptographic methods. In [5], using static pairwise keys and two MACs appended to the sensor reports, "an interleaved hop-by-hop authentication scheme for filtering of injected false data" was proposed by Zhu et al. to address both the insider and outsider threats.
Another crucial idea of this paper is the notion of sharing a dynamic cryptic credential (i.e., virtual energy) among the sensors. A similar approach was suggested in the SPINS study [6] via the SNEP protocol. In particular, nodes share a secret counter when generating keys and it is updated for every new key. However, the SNEP protocol does not consider packets dropped in the network due to communication errors. Although another study, MiniSec [7], recognizes this issue, the solution it suggests still increases the packet size by including parts of a counter value in the packet structure. The following sections address the related works briefly.

A. Dynamic energy-based encoding and filtering
H. Hou, C. Corbett, Y. Li, and R. Beyah proposed DEEF. In critical sensor deployments it is important to ensure the authenticity and integrity of sensed data. Further, one must ensure that false data injected into the network by malicious nodes is not perceived as accurate data. They present the Dynamic Energy-based Encoding and Filtering (DEEF) [2] framework to detect the injection of false data into a sensor network. DEEF requires that each sensed event report be encoded using a simple encoding scheme based on a keyed hash. The key to the hashing function dynamically changes as a function of the transient energy of the sensor, thus requiring no re-keying. Depending on the cost of transmission versus the computational cost of encoding, it may be important to remove false data as quickly as possible; accordingly, DEEF can provide authentication at the edge of the network or inside the sensor network. Depending on the configuration, as a report is forwarded, each node along the way verifies the correctness of the encoding probabilistically and drops those reports that are invalid. They have evaluated DEEF's feasibility and performance through analysis. Their results show that DEEF, without incurring transmission overhead (increasing packet size), is able to eliminate 90-99% of false data injected
from an outsider within 9 hops before it reaches the sink.

B. Statistical en-route filtering of injected false data in sensor networks
Fan Ye, Haiyun Luo, and Songwu Lu proposed "Statistical En-Route Filtering of Injected False Data in Sensor Networks" to detect and drop false reports during the forwarding process. Assuming that the same event can be detected by multiple sensors, in SEF each of the detecting sensors generates a keyed message authentication code (MAC) and multiple MACs are attached to the event report. As the report is forwarded, each node along the way verifies the correctness of the MACs probabilistically and drops those with invalid MACs. SEF exploits the network scale to filter out false reports through collective decision-making by multiple detecting nodes and collective false detection by multiple forwarding nodes. The authors have evaluated SEF's feasibility and performance through analysis, simulation, and implementation. Their results show that SEF can be implemented efficiently in sensor nodes as small as the Mica2. It can drop up to 70% of bogus reports injected by a compromised node within five hops, and reduce energy consumption by 65% or more in many cases.

C. SPINS: Security Protocols for Sensor Networks
A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar proposed SPINS. As sensor networks edge closer towards widespread deployment, security issues become a central concern. So far, much research has focused on making sensor networks feasible and useful, and has not concentrated on security. They present a suite of security building blocks optimized for resource-constrained environments and wireless communication. SPINS has two secure building blocks: SNEP and μTESLA. SNEP provides the following important baseline security primitives: data confidentiality, two-party data authentication, and data freshness. A particularly hard problem is to provide efficient broadcast authentication, which is an important mechanism for sensor networks; μTESLA is a new protocol which provides authenticated broadcast for severely resource-constrained environments. They implemented the above protocols and show that they are practical even on minimal hardware: the performance of the protocol suite easily matches the data rate of the network. Additionally, they demonstrate that the suite can be used for building higher-level protocols.

D. Dynamic en-route scheme for filtering false data injection
Zhen Yu and Yong Guan proposed a dynamic en-route filtering scheme for false data injection attacks in wireless sensor networks. In sensor networks, adversaries can inject false data reports containing bogus sensor readings or nonexistent events from compromised nodes. Such attacks may not only cause false alarms, but also drain the limited energy of sensor nodes. Several existing schemes for filtering false reports either cannot deal with the dynamic topology of sensor networks or have limited filtering capacity. In their scheme, a legitimate report is endorsed by multiple sensing nodes using their own authentication keys generated from one-way hash chains. The cluster head uses a hill-climbing approach to disseminate the authentication keys of the sensing nodes to the forwarding nodes along multiple paths toward the base station.
The proposed system addresses all the issues discussed in the previous works and provides security in an efficient manner.

III. OVERVIEW OF THE SYSTEM
This paper provides a secure communication framework with a technique to verify data in line and drop false packets from malicious nodes, thus maintaining the health of the sensor network. It dynamically updates keys without exchanging messages for key renewals, and embeds integrity into packets as opposed to enlarging the packet by appending message authentication codes (MACs). Specifically, each sensed datum is protected using a simple encoding scheme based on a permutation code generated with the RC4 encryption scheme and sent towards the sink. The key to the encoding scheme dynamically changes as a function of the residual virtual energy of the sensor, thus requiring no rekeying. The nodes forwarding the data along the path to the sink are able to verify the authenticity and integrity of the data and to provide non-repudiation.
The contributions of this paper are as follows. First, a dynamic en-route filtering mechanism that does not exchange explicit control messages for rekeying.
Second, provision of one-time keys for each packet transmitted, to avoid stale keys. Third, a modular and flexible security architecture with a simple technique for ensuring the authenticity, integrity, and non-repudiation of data without enlarging packets with MACs. Fourth, a robust secure communication framework that is operational in dire communication situations and over unreliable medium access control layers. The random distribution of data is done using DES techniques, which provides security in an efficient way. The energy of the sensor is saved by doing all the encryption and decryption with the residual energy of the sensor.

IV. MODULES
The virtual energy-based keying process involves the creation of dynamic keys. Contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on its residual virtual energy. The key is then fed into the crypto module. The crypto module employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. The encoding is a simple encryption mechanism; however, the architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. Last, the forwarding module handles the sending and receiving of encoded packets along the path to the sink.

Fig 1. Modular diagram

A. Virtual Energy-Based Keying module
The virtual energy-based keying process involves the creation of dynamic keys. Contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on its residual virtual energy. The energy-based keying module ensures that each detected packet is associated with a new unique key generated based on the transient value of the virtual energy. After the dynamic key is generated, it is passed to the crypto module, where the desired security services are implemented. The process of key generation is initiated when data is sensed; thus, no explicit mechanism is needed to refresh or update keys. Moreover, the dynamic nature of the keys makes it difficult for attackers to intercept enough packets to break the encoding algorithm.

B. Crypto module
The crypto module employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. Due to the resource constraints of WSNs, traditional digital signatures or encryption mechanisms requiring expensive cryptography are not viable; the scheme must be simple, yet effective. Thus, in this section, we introduce a simple encoding operation similar to that used in [2]. The encoding operation is essentially a permutation of the bits in the packet, according to the permutation code dynamically created via the RC4 encryption mechanism. The key to RC4 is created by the previous module (the virtual energy-based keying module). The purpose of the crypto module is to provide simple confidentiality of the packet header and payload while ensuring the authenticity and integrity of sensed data without incurring the transmission overhead of traditional schemes. Since key generation and handling are done in another module, this flexible architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. In this module the DES technique is used to provide random packet transmission from source to sink in order to provide security.
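A simplified sketch of the encoding idea: the RC4 keystream, keyed by a digest of the node's residual virtual energy, determines a permutation of the packet (bytes here, rather than the paper's bits, to keep the sketch short). A receiver that tracks the same virtual energy derives the same key and can invert the permutation; the key-derivation function below is an assumption for illustration.

import hashlib

def rc4_stream(key):
    S = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    while True:                                # pseudo-random generation (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]

def encode(packet, virtual_energy):
    # Derive the RC4 key from the transient virtual energy (illustrative KDF),
    # then permute the packet bytes in keystream-determined order.
    key = hashlib.sha1(str(virtual_energy).encode()).digest()
    ks = rc4_stream(key)
    order = sorted(range(len(packet)), key=lambda _: next(ks))
    return bytes(packet[k] for k in order)

encoded = encode(b"sensor report", virtual_energy=97.25)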
Abstract - Cloud computing systems provide on-demand access to computational resources for dedicated use. Grid computing allows users to share heterogeneous resources from multiple administrative domains applied to common tasks. In this paper, we discuss the characteristics and requirements of a hybrid infrastructure composed of both grid and cloud technologies. The infrastructure is used to manage the execution of service workflows in the system through dynamic service composition. The dynamic service composition is achieved by the autonomic computing characteristics of the cloud computing technologies. The infrastructure can be expanded by acquiring computational resources on demand from the cloud during the workflow execution, and it manages these resources and the workflow execution without user interference. An optimized scheduling algorithm was used. The hybrid infrastructure enables the execution of service workflows of grid jobs using cloud technology.

Keywords – Grid Process Orchestration, Dynamic Deployment Virtual Resource, Cloud System Interface

I. INTRODUCTION
Grid computing refers to the combination of computer resources from multiple administrative domains to reach a common goal. Cloud computing brings supercomputing to the users, letting them transparently achieve virtually unbounded processing and storage accessible from their laptops or personal computers.
In the cloud computing paradigm, details are abstracted from the users. They do not need knowledge of, expertise in, or control over the technology infrastructure of the cloud they are using. It typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet [1]. The cloud computing characteristics are on-demand self-service, ubiquitous network access, independent resource location (reliability), rapid elasticity (scalability), and pay-per-use. Cloud computing allows the use of Service Oriented Computing (SOC) standards, permitting users to establish links between services, organizing them as workflows instead of building traditional applications using programming languages. The on-demand computing offered by the cloud allows users to keep using their particular systems (computers, clusters, and grids), aggregating the cloud resources as they need. However, this technology union results in a hybrid computing system, with new demands, notably in resource management. Besides that, even though it uses the SOC paradigm, the cloud does not offer support for dynamic service workflow composition and coordination.
In this paper we discuss the characteristics of a hybrid system, composed of the union of a grid with a cloud, and we propose an infrastructure able to manage the execution of service workflows in such a system.
This paper is organized as follows. Some basic concepts and related works are presented in Section II, while Section III shows the infrastructure to execute service workflows in the hybrid system. Section IV presents the system architecture, and application scenarios
are discussed in Section V. Conclusions and future works are shown in Section VI.

II. CONCEPTS AND RELATED WORKS

Our infrastructure is directed to systems that use the service-oriented computing paradigm. In our work we combine a service-oriented grid and the Nimbus cloud [2], an option based on Amazon's Elastic Compute Cloud (EC2) [3].
Grids are environments where shared heterogeneous computing resources are connected through a network, local or remote [4]. Grids allow institutions and people to share resources and objectives through security rules and use policies, comprising the so-called Virtual Organizations (VOs) [4]. The Open Grid Services Architecture (OGSA) standard [4] proposes that the interoperability among heterogeneous grid resources be achieved through Internet protocols, allowing grids to use standards and paradigms from service-oriented computing (SOC) [5]. In our work we used the Globus Toolkit version 4 (GT4) [19], an OGSA implementation from the Globus Alliance [6].
The grid and virtual organization dynamics intensify the need for on-demand provisioning, where organizational requirements must guide the system configuration. It is the environment's duty to dynamically provide the services related to each application when they are needed. It is not recommendable to make all services available in all resources in the grid, since this can overload resources and use processing power, memory, and bandwidth without need. To allow on-demand provisioning, it is necessary to have support for the dynamic instantiation of services, i.e., to send the service to the resource, publish it so it can be handled by a container, and activate the container so it can start replying to service requisitions. Our infrastructure aggregates functionalities for on-demand service provisioning during workflow execution, since GT4 does not offer such functionality.
The cloud computing paradigm abstracts details from users, who no longer need knowledge about the technology infrastructure that supports the cloud. It typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet [1]. Cloud computing delivers three defined models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). In SaaS the consumer uses an application but does not control the host environment; Google Apps [7] and Salesforce.com [8] are examples of this model. In PaaS the consumers use a hosting environment for their applications; the Google App Engine [9] and Amazon Web Services [10] are PaaS examples. In this model the platform is typically an application framework. In IaaS the consumer uses computing resources such as processing power and storage; in this model the consumer can control the environment, including the deployment of applications. Amazon Elastic Compute Cloud [3], Globus Nimbus [2], and Eucalyptus [11] are good examples of this model.
The most popular model is IaaS. In a simplified manner we can understand this cloud model as a set of virtual servers accessible through the Internet. These servers can be managed, monitored, and maintained dynamically and remotely. It is easy to see that the virtualization concept is fundamental in the IaaS model. Virtualization [12] is the process of presenting a logical grouping or subset of computing resources so that they can be accessed in abstract ways with benefits over the original configuration. The virtualization software abstracts the hardware by creating an interface to virtual machines (VMs), which represent virtualized resources such as CPUs, physical memory, network connections, and peripherals. Each virtual machine alone is an isolated execution environment independent from the others. With this, each VM can have its own operating system, applications, and network services. This isolation allows users to have control over the resource without interference from other participants in the cloud.
Clouds and grids are distinct. Clouds provide a full private cluster, where individual users can access resources from the pool, and their resources are "opaque", being accessible through the user interface without knowledge of hardware details. Grids permit individual
users to select resources and get most, if not all, of the resources in a single request. Their middleware approach takes federation as a first principle, and it exposes resources without any preparation or isolation. These differences call for different architectures for each one, and require personalized solutions. Our infrastructure supplies service support, offering automatic service deployment in the resources provided dynamically and controlling the service workflow execution through a workflow manager, interacting with the cloud and the grid in a transparent manner without user interference.
Some works propose solutions for the execution of workflows in grids or clouds, but only a few consider a hybrid environment. In [13] the authors explore the use of cloud computing for scientific workflows. The approach is to evaluate the tradeoffs between running tasks in a local environment, if such is available, and running in a virtual environment via remote, wide-area network resource access. The work in [14] describes a scalable and lightweight computational workflow system for clouds which can run workflow jobs composed of multiple Hadoop MapReduce or legacy programs. But both works do not offer support for services. In [15] the authors show issues that limit the use of clouds for highly distributed applications in a hybrid system; however, it lacks interoperability between different cloud platforms, and it does not offer support for service workflows. The authors propose a hybrid system formed by the DIET Grid and Eucalyptus in [16]. It shows possible ways of connecting these two architectures as well as the requirements to achieve this, but it does not support services or service workflows. In [17], the authors show a solution that automatically schedules workflow steps to underutilized hosts and provides new hosts using cloud computing infrastructures. This interesting work extends a BPEL implementation to dynamically schedule the service calls of a BPEL process based on the target hosts' load [20]. To handle peak loads, it integrates a provisioning component that dynamically launches virtual machines in Amazon's EC2 infrastructure and deploys the required middleware components (web/Grid service stack) on the fly. However, it does not support hybrid systems.
The work proposed in this paper aggregates the support for service workflow execution in both grids and clouds, offering support for on-demand dynamic instantiation and service publication. Its functionalities allow the user to execute abstract workflows, without indicating the resources where each part of the workflow will execute. Besides that, the hybrid system management allows the use of different clouds with several architectures.

III. THE HYBRID INFRASTRUCTURE

On-demand computing requires service scalability, flexibility, and availability. To supply these requirements it is important for the infrastructure to offer reconfiguration, with the possibility of deploying new resources or updating the existing ones without stopping processes in execution. In a hybrid system composed of a grid with the possibility of accessing a cloud computing infrastructure, the workflow management must supply the requirements at several levels. First, it must provide facilities for the user to make submissions without the need to choose or indicate the localization of the computational resources to be used. Inside the grid boundary, the workflow manager must find the best resources available and, when necessary, make the dynamic deployment of services in these resources. On the other hand, inside the cloud boundary, the infrastructure must be able to interact with the cloud interfaces to obtain computational resources. After that, it must be able to prepare these resources according to the workflow necessities, making the dynamic deployment of services in the resources inside the cloud. This deployment can be made when local resources are not enough for the workflow necessities. This configuration increases the computational power of the grid without new infrastructure investment, using the on-demand computing advantages provided by the cloud.
In this paper we show an infrastructure for the execution of service workflows in hybrid systems composed of grids and clouds. The infrastructure provides dynamic instantiation of services when necessary, and it is formed
by a set of services which offer the following functionalities [21]:
• Simple workflow description language: Users describe workflows through the Grid Process Orchestration Language (GPOL) [18]. GPOL allows users to build abstract workflows, where the computational resources do not need to be indicated (see the sketch after this section).
• Dynamic service instantiation: During the workflow execution, the infrastructure searches, in the grid and in the cloud, for the best computational resources available to execute each service.
• Automatic reference coordination (endpoint reference service): When offering dynamic instantiation, some activities must be transparent to the users. For example, consider a service that, when executed, generates a file that is used by other services in the workflow. Because service localization is made on demand, the infrastructure resolves the references between services at execution time, without user interference.
• Dynamic service deployment: When the best resource option is identified, the infrastructure can deploy the new service if necessary. The dynamic deployment is executed regardless of whether the resource is in the grid or in the cloud.
• Robust workflow execution: If a service fails during the execution, the infrastructure can search for an alternative resource, automatically redirecting the execution and making the necessary adjustments in the service references. If there is no resource with the service available, the infrastructure tries to publish the service in a resource.

IV. THE HYBRID INFRASTRUCTURE ARCHITECTURE

The architectural diagram of the hybrid system shows that the first step in the process is the creation of the workflow. The workflow application made to run in the hybrid infrastructure system consists of both a grid boundary and a cloud boundary. An imaging application is used as the workflow. The workflow is created and submitted to the Hybrid Workflow Manager without indicating the resources needed for its execution; there is no user interference in the allocation of the resources. Grid Process Orchestration (GPO) is the workflow manager. It is a middleware to support the interoperability of distributed applications which require service composition in the computational grid. The Hybrid Workflow Manager is responsible for managing the tasks of the workflow. The GPO allows the creation and management of application flows, tasks, and services in grids. The workflow task level is managed by the GPO. The workflow is given to the grid boundary as well as to the cloud software.

The Grid Workflow Scheduling Engine can efficiently schedule the grid services through the grid workflow monitor and grid services information. An optimized scheduling algorithm was used for better performance [22].
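As an illustration of an abstract workflow and the manager's placement rule, consider the hypothetical Python sketch below; the real GPOL syntax and the manager's API are not shown in the paper, so all names here are assumptions.

# A hypothetical abstract-workflow description: services are named,
# resources are not, mirroring the GPOL idea.
abstract_workflow = {
    "name": "mosaic-generation",
    "steps": [
        {"service": "project-images", "after": []},
        {"service": "fit-overlaps",   "after": ["project-images"]},
        {"service": "combine-mosaic", "after": ["fit-overlaps"]},
    ],
}

def submit(workflow, grid, cloud):
    # Toy version of the placement rule: use the best grid resource;
    # fall back to acquiring a cloud resource on demand.
    for step in workflow["steps"]:
        resource = grid.best_available(step["service"]) or cloud.acquire()
        resource.deploy(step["service"])     # dynamic service deployment
        resource.run(step, workflow["name"])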
Figure 1 – Architecture (workflow, hybrid workflow manager, data repository, cloud service, resource monitor, cloud software, computational resources)

The grid workflow monitor observes the execution and level of the workflow as well as the number of resource instances allocated to each service in the workflow. The grid service information indicates the services performed at each level. Based on this information, the grid workflow engine uses the scheduler service to schedule the tasks. The scheduler service provides the function of distributing the workflow services to be executed on the grid resources.
Resource monitors observe the level of workflow execution. They gather information about the computational resources in the hybrid system, grid or cloud. They operate in a distributed manner, maintaining one instance in each computational resource. Such instances are used on demand by the other services when information about resources is needed. The workflow manager uses the resource monitor to know which resources are in the grid at a given time, and the scheduler uses it to obtain
information about the current state of the resources. Based on the information stored in the resource monitor, the scheduler can simply schedule the unscheduled services. If any of the workflows exceeds the given limit, an alert level is created and accordingly the workflow service shifts to the cloud boundary. The resource monitor then monitors the workflow service in the cloud boundary. The cloud boundary provides dynamic service. All the information about the resources and their history is stored in the data repositories. The resources repository has information such as the characteristics of each computational resource, its performance history, and its load. This information can also be used by the scheduler in its decision process. Besides that, the services repository has information about the services available in the grid, and it stores the files necessary for dynamic publication. When the service workflow exceeds the limit, the dynamic service is provided by the cloud software. It provides the feature of autonomic computing that automatically allocates the resources to the workflow service without any user interference.
The Dynamic Deployment Virtual Resource (DDVM) is used by the infrastructure when resources from the cloud are needed. It is composed of two groups of services. The first group is the DDVM itself, which communicates with the infrastructure, taking care of the functionalities. The second group, called the Cloud Interface Service (CIS), makes the interface with the cloud. This layout gives flexibility and scalability to the infrastructure. For each resource, one instance of the DDVM/CIS couple is responsible for the binding between the workflow manager and the cloud resource. To use the cloud resources, the GPO communicates with the DDVM, which communicates with the cloud through the CIS and requests a resource. The resource monitor maintains information about the cloud load. The computational resources comprise all the resources used in both the grid and the cloud.
The hybrid system thus enables the execution of service workflows through dynamic service composition. The infrastructure supplies service support, offering automatic service deployment in the resources provided dynamically, and controlling the service workflow execution through a workflow manager interacting with the cloud and the grid in a transparent manner without user interference.

V. APPLICATION SCENARIOS

The proposed infrastructure can be useful in many scenarios that appear in grid application executions nowadays. If we consider the minimization of the makespan as the objective to be achieved, the infrastructure always has the option of requesting cloud resources if the currently available resources cannot give a satisfactory makespan. For instance, this can occur if the grid is overloaded with many submissions in a load peak. In this scenario, a new workflow may have its speedup heavily prejudiced because it may need to wait for many other workflows to finish their execution.
Another scenario where we envision that our infrastructure can be applied is when a workflow has more parallel tasks than the number of resources available. An example of an application that is represented by a workflow and can have different sizes is Montage [23]. Montage is an image application that makes mosaics of the sky for astronomy research. Its workflow size depends on the square-degree size of the sky to be generated. For example, for 1 square degree of the sky, a workflow with 232 jobs is executed. For 10 square degrees of the sky, a 20,652-job workflow is executed, dealing with an amount of data near 100 GB. The full sky is around 400,000 square degrees [24]. For such an application, an elastic infrastructure is desired, where the number of resources can be adapted according to the size of the application to be run. In our hybrid system, the infrastructure can avoid the cloud use when the grid resources are sufficient for the execution of the workflow. On the other hand, when the workflow is too large, the infrastructure can gather resources from clouds to afford its execution.
Our infrastructure can also be applied in cases where deadlines for the completion of the workflow execution exist. If the scheduler finds that the grid itself is not able to provide resources in the quantity and quality needed
to execute a workflow within a given deadline, it may ask the cloud for resources to compose the infrastructure and therefore be capable of finishing the workflow before the deadline is reached.
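A toy version of that decision rule, with made-up rate and deadline parameters (the paper's scheduler model is not detailed here):

import math

def plan(jobs, grid_rate, deadline, cloud_rate):
    # Estimate grid-only completion; acquire cloud nodes only if the deadline
    # would be missed. Rates are hypothetical jobs-per-time-unit estimates.
    if jobs / grid_rate <= deadline:
        return {"cloud_nodes": 0, "eta": jobs / grid_rate}
    nodes = math.ceil((jobs / deadline - grid_rate) / cloud_rate)
    return {"cloud_nodes": nodes, "eta": jobs / (grid_rate + nodes * cloud_rate)}

print(plan(jobs=20652, grid_rate=50.0, deadline=180.0, cloud_rate=10.0))
# -> 7 cloud nodes, finishing in about 172 time units, inside the deadline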
VI. CONCLUSION

Cloud computing provides computational resources on demand for dedicated use. On the other hand, the computational grid proposes interoperability among heterogeneous resources through Internet protocols. In this paper we discuss the characteristics and requirements of a hybrid system formed by these two technologies, and we propose an infrastructure for the management of service workflows in this system. Our motivation comes from the fact that both technologies do not offer adequate support for the execution of service workflows, giving the user the responsibility of preparing the environment for such execution. We propose an infrastructure that covers this aspect, offering support to automatically install services in the resources dynamically provided by the grid or by the cloud, while providing the service workflow execution control. Our workflow management system interacts with the cloud and with the grid in a transparent manner, without user interference, and its functionalities allow the execution of abstract workflows without indicating which resource must be used. Additionally, the proposed hybrid system management gives flexibility and scalability to our infrastructure, permitting the use of several clouds with different architectures in an independent and simultaneous way.
Future work is to study how to identify cloud load peaks and how to choose the best cloud.
priori, then data can be delivered over paths that circumvent (bypass) these holes, whenever possible. In practice, due to the difficulty of acquiring such location information, the above idea is implemented in a probabilistic manner, typically through a two-step process. First, the packet is broken into M shares (i.e., components of a packet that carry partial information) using a (T, M)-threshold secret sharing mechanism such as Shamir's algorithm. The original information can be recovered from a combination of at least T shares, but no information can be guessed from fewer than T shares. For T = 3, the secret polynomial is recovered by Lagrange interpolation:

F(x) = Σ_{j=0}^{2} y_j · l_j(x)    (1)

where the l_j(x) are the Lagrange basis polynomials.
Second,multiple routes from the source to possible.
the destination are computed according to Because routes are now randomly
some multipath routing algorithm.These generated, they mayno longer be node-
routes are node-disjoint ormaximally disjoint. However, the algorithm
node-disjoint subject to certain ensuresthat the randomly generated
constraints. We argue that four security routes are as dispersive aspossible, i.e.,
problems exist in the above approach. the routes are geographically separated as
First, this approach is no longer valid if the faras possible such that they have high
adversary can selectively compromise or likelihood of notsimultaneously passing
jamnodes. This is because the route through a black hole. The main challenge
computation in the abovemultipath routing in our design is to generate
algorithms is deterministic in the sensethat highlydispersive random routes at low
for a given topology and given source and energy cost. And for secure set of packets
destinationnodes, the same set of routes we propose merkle tree algorithm that
are always computed by therouting authenticate the set of packets. It
algorithm. As a result, once the routing generate tree for the set and attaches a
algorithmbecomes known to the adversary mark to each packet and saves
the adversary can compute the set of computation overhead at each receiver.
routes for any givensource and
destination. Then, the adversary can
pinpoint toone particular node in each
route and compromise (or jam)these
nodes. Such an attack can intercept all
shares of theinformation Second, actually
very fewnode-disjoint routes can be found
when the node density ismoderate and the
source and destination nodes are
severalhops apart. Third it assign single
secret key for whole packet, (i.e.,
assigning single key for whole set of
packets. so if adversary pinpoint the key
they can easily retrieved all packets in the Fig.1. Randomized dispersive routing in a
set. Last, because theset of routes is WSN.
computed under certain constraints,
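Equation (1) is the Lagrange reconstruction of Shamir's scheme for T = 3. As a concrete illustration, here is a minimal sketch of (T, M)-threshold splitting and recovery, assuming a fixed prime field and T = 3; the names are ours, not the paper's.

import java.math.BigInteger;
import java.security.SecureRandom;

public class ShamirSketch {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // prime modulus

    // Split 'secret' into m shares (x = 1..m); any T = 3 of them recover it.
    static BigInteger[][] split(BigInteger secret, int m) {
        SecureRandom rnd = new SecureRandom();
        BigInteger a1 = new BigInteger(30, rnd), a2 = new BigInteger(30, rnd);
        BigInteger[][] shares = new BigInteger[m][];
        for (int i = 1; i <= m; i++) {
            BigInteger x = BigInteger.valueOf(i);
            // f(x) = secret + a1*x + a2*x^2 (mod P); degree T-1 = 2
            BigInteger y = secret.add(a1.multiply(x)).add(a2.multiply(x.pow(2))).mod(P);
            shares[i - 1] = new BigInteger[]{x, y};
        }
        return shares;
    }

    // Lagrange interpolation at x = 0 from exactly T = 3 shares, as in Eq. (1).
    static BigInteger recover(BigInteger[][] s) {
        BigInteger acc = BigInteger.ZERO;
        for (int j = 0; j < 3; j++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int k = 0; k < 3; k++) {
                if (k == j) continue;
                num = num.multiply(s[k][0].negate()).mod(P);          // (0 - x_k)
                den = den.multiply(s[j][0].subtract(s[k][0])).mod(P); // (x_j - x_k)
            }
            BigInteger lj = num.multiply(den.modInverse(P)).mod(P);   // l_j(0)
            acc = acc.add(s[j][1].multiply(lj)).mod(P);               // + y_j * l_j(0)
        }
        return acc;
    }
}

With fewer than T shares the interpolation is underdetermined, which is exactly the "no information from less than T shares" guarantee stated above.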
2.1 Overview
1 ≤ M ≤ Mmax;  1 ≤ N ≤ Nmax;
* P.G. Scholar, priya_be_cs@yahoo.co.in, 9715837210
**Assistant Professor II, tamil1806@yahoo.co.in
Department of Computer Science and Engineering
Velammal Engineering College, Chennai,
Tamil Nadu,
India
methods are proposed by which a node can verify whether other identities are Sybil identities, including radio resource testing, key validation for random key pre-distribution, position verification, and registration.

In direct validation, a node directly tests whether another node's identity is valid. The most promising of these methods is random key pre-distribution, which associates a node's keys with its identity. Random key pre-distribution is used in many scenarios for secure communication, and because it relies on well-understood cryptographic principles, it is easier to analyze than other methods. These methods are robust to compromised nodes. In indirect validation, nodes that have already been verified are allowed to vouch for or refute other nodes. This paper [5] leaves secure methods of indirect validation as future work.

Sybil Attack in Recommendation Systems
Recommendation systems [8] can be attacked in various ways, and the ultimate attack form is reached with a Sybil attack, where the attacker creates a potentially unlimited number of Sybil identities to vote. Defending against Sybil attacks is often quite challenging, and the nature of recommendation systems makes it even harder. By exploiting the heavy-tail distribution of the typical voting behavior of honest identities, and by carefully identifying whether the system is already getting "enough help" from the (weighted) voters already taken into account or whether more "help" is needed, DSybil [8] can defend against an unlimited number of Sybil identities over time. DSybil provides a growing defense: if the user has used DSybil for some time when the attack starts, the loss will be significantly smaller than the loss under the worst-case attack. The authors also integrate DSybil into real-world recommendation systems and study the system's robustness against DDoS attacks.

Sybil Attack in Peer-to-Peer Systems
Networked applications [4] often assume or require that identities over a network have a one-to-one relationship with individual entities in the external world. A single individual who controls many identities can disrupt, manipulate, or corrupt peer-to-peer applications and other applications that rely on redundancy; this is commonly called the Sybil attack, and its detection is the problem addressed. To solve this problem, [4] introduces a trust game that makes false claims financially risky for the claimant. The informant [4] will accept the game if and only if she is a Sybil with a low opportunity cost, and the target will cooperate if and only if she is identical to the informant. The Sybil Game is a more sophisticated game that includes the economic benefit to the detective of learning of Sybils and the economic cost to the informant and target of revealing that Sybils are present. The paper [4] proves the optimal strategies for each participant. The detective will offer the game if and only if it will determine her choice about using the application in which these identities participate. As future work, [4] intends to develop a protocol to detect Sybil attacks.

The methodology applied in [1] comprises inferring honest sets, approximating EXX, representing the "gap" between the cases when the full graph is fast mixing, sampling honest configurations, experimental evaluation using synthetic data, and a final experimental evaluation using real-world data. Through analytical results as well as experiments on simulated and real-world network topologies, [1] shows that, given standard constraints on the adversary, SybilInfer is secure, in that it successfully distinguishes between honest and dishonest nodes and is not susceptible to manipulation by the adversary. Results show that SybilInfer outperforms state-of-the-art algorithms, both in being more widely applicable and in providing vastly more accurate results. Modifying the simple-minded protocol into a fully fledged one-hop distributed hash table is an interesting challenge for future work. SybilInfer can also be applied to specific on-line communities. In such cases a set of nodes
considered as a Sybil node and not as an honest node.

Every node is simultaneously a suspect and a verifier. As in Sybil Guard, we assume that each suspect S has a locally generated public/private key pair, which serves to prevent the adversary from "stealing" S's identity after S is accepted. When a verifier V accepts a suspect S, V actually accepts S's public key, which can be used later to authenticate S.

IV. Sybil Limit Protocol
Sybil Limit has two component protocols: a secure random route protocol and a verification protocol. The first protocol runs in the background and maintains information used by the second protocol.

A. Random Walk and Random Routes
Sybil Guard uses a special kind of random walk, called random routes, in the social network. In a random walk, at each hop the current node flips a coin on the fly to select a uniformly random edge to direct the walk (the walk is allowed to turn back). For random routes, each node uses a precomputed random permutation x1 x2 ... xd, where d is the degree of the node, as a one-to-one mapping from incoming edges to outgoing edges. A random route entering via edge i will always exit via edge xi. This precomputed permutation, or routing table, serves to introduce external correlation across multiple random routes. Namely, once two random routes traverse the same directed edge, they will merge and stay merged (i.e., they converge). Furthermore, the outgoing edge uniquely determines the incoming edge as well; thus the random routes can be back-traced. These two properties are key to Sybil Guard's guarantees. As a side effect, such routing tables also introduce internal correlation within a single random route. Namely, if a random route visits the same node more than once, the exiting edges will be correlated. In Sybil Guard, a random walk starting from an honest node in the social network is called escaping if it ever crosses any attack edge.

B. Secure Random Route Protocol
We first focus on all the suspects in Sybil Limit, i.e., nodes seeking to be accepted. Figure 2 presents the pseudocode for how they perform random routes. In the protocol, each node has a public/private key pair and communicates only with its neighbors in the social network. Every pair of neighbors shares a unique symmetric secret key (the edge key, established out of band) for authenticating each other. A Sybil node M1 may disclose its edge key with some honest node A to another Sybil node M2. However, because all neighbors are authenticated via the edge key, when M2 sends a message to A, A will still route the message as if it comes from M1. In the protocol, every node has a precomputed random permutation x1 x2 ... xd (d being the node's degree) as its routing table. The routing table never changes unless the node adds new neighbors or deletes old neighbors. A suspect S starts a random route along a uniformly random edge (of S) and propagates along the route its public key Ks together with a counter initialized to 1.
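The routing table just described is mechanical enough to sketch. The following is a minimal illustration (a hypothetical class, not taken from the paper) of the precomputed permutation and its two key properties, convergence and back-traceability:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomRouteTable {
    // perm.get(i) = outgoing edge index for incoming edge index i
    private final List<Integer> perm;

    public RandomRouteTable(int degree, long seed) {
        perm = new ArrayList<>();
        for (int i = 0; i < degree; i++) perm.add(i);
        Collections.shuffle(perm, new Random(seed)); // precomputed once, then fixed
    }

    // Deterministic: the same incoming edge always exits via the same edge,
    // so two routes entering through the same directed edge merge and stay merged.
    public int nextEdge(int incomingEdge) {
        return perm.get(incomingEdge);
    }

    // Because the permutation is one-to-one, the outgoing edge uniquely
    // determines the incoming edge, so routes can be back-traced.
    public int previousEdge(int outgoingEdge) {
        return perm.indexOf(outgoingEdge);
    }
}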
1. S sends to V its public key Ks and S's set of tails {(j, KA, KB) | S's tail in the jth s-instance is the edge "A→B" and KA (KB) is A's (B's) public key};
2. V computes the set of intersecting tails X = {(i, KA, KB) | (i, KA, KB) is V's tail and (j, KA, KB) is S's tail};
3. For every (i, KA, KB) ∈ X, V authenticates B using KB and asks B whether S is registered under "KA → KB"; if not, remove (i, KA, KB) from X;
4. If X is empty, then reject S and return;
5. Let a = (1 + Σ_{i=1}^{r} ci) / r and b = h · max(log r, a);
6. Let cmin be the smallest counter among those ci's corresponding to (i, KA, KB) that still remain in X;
7. If (cmin + 1) > b, then reject S; otherwise, increment cmin and accept S.
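Read as code, the verification steps above look roughly as follows. This is a sketch under the stated definitions of a, b, and cmin; the Tail and Registry types are our own glue, not part of the protocol specification.

import java.util.List;

public class SybilLimitVerifier {
    static class Tail { int instance; String ka, kb; int counter; }

    interface Registry { boolean isRegistered(String suspectKey, String ka, String kb); }

    // Returns true iff suspect S (identified by its public key) is accepted.
    static boolean verify(String suspectKey, List<Tail> intersecting,
                          Registry reg, int r, double h, int[] counters) {
        // Steps 3-4: keep only intersecting tails where S is actually registered.
        intersecting.removeIf(t -> !reg.isRegistered(suspectKey, t.ka, t.kb));
        if (intersecting.isEmpty()) return false;

        // Step 5: a = (1 + sum of counters) / r,  b = h * max(log r, a).
        double sum = 1.0;
        for (int c : counters) sum += c;
        double a = sum / r;
        double b = h * Math.max(Math.log(r), a); // base of the log is a convention

        // Steps 6-7: find the smallest counter among the remaining tails.
        Tail min = intersecting.get(0);
        for (Tail t : intersecting) if (t.counter < min.counter) min = t;
        if (min.counter + 1 > b) return false; // balance condition violated: reject
        min.counter++;                          // charge the least-loaded tail
        return true;                            // accept S
    }
}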
estimated r, it will be the adversary that helps it to accept most of the honest nodes. Second, the benchmark set is itself a set with a fraction of Sybil nodes, so an application can just use the nodes in it directly and avoid the full Sybil Limit protocol.
V. Evaluation

Our experiments thus mainly serve to validate such an assumption, based on real-world social networks. Such validation has a more general implication beyond Sybil Limit: these results will tell us whether the approach of leveraging social networks to combat Sybil attacks is valid. A second goal of our experiments is to gain a better understanding of the hidden constant in Sybil Limit's O(log n) guarantee.

Figure 4. Result of Sybil attack

To bound the number of Sybil nodes we use Java and JavaFX to prove our Sybil guarantees. JavaFX is used for the complete graphical user interface design. Sybil Guard uses registry tables and witness tables. Registry tables ensure that each node registers with the nodes on its random routes. The witness table is propagated and updated in a similar fashion to the registry table, except that it propagates "backward". This process is used to verify the receiver node and needs to perform an intersection between each of its random routes.
It reduces communication overhead. When a node interacts with another node, it always authenticates that node by requiring it to sign every message sent, using its private key. In Sybil Guard,
a node communicates with other nodes
only when (i) it tries to verify another
node, and hence needs to contact the
intersection nodes of the random routes,
and (ii) it propagates its registry and
witness tables to its neighbors. It also has
a mechanism that allows a node to bypass
offline nodes when propagating registry
and witness tables. In the process of
propagating/updating registry and witness
tables, the social network may change again. Thus, it is helpful to consider it as a decentralized, background stabilization process.

Figure 3. Probability of routes remaining entirely within the honest region.

Figure 4. Probability of an honest node accepting another honest node.

Figure 5 shows the probability that the majority of an honest node's routes remain entirely in the honest region. As we can see from Figure 5, the probability [2] is almost always 100% before g = 2000, and only drops to 99.8% when g = 2500. This means that even with 2500 attack edges, only 0.2% of the nodes are not protected by Sybil Guard. These are mostly nodes adjacent to multiple attack edges; in some sense, these nodes are "paying the price" for being friends of Sybil attackers. For the 10000-node topology and the 100-node topology, g = 204 and g = 11 will result in 0.4% and 5.1% of nodes unprotected, respectively. For better understanding, Figure 5 also includes a second curve showing the probability of a single route remaining entirely in the honest region. Figure 6 presents the probability of V accepting S as a function of the number of attack edges g. This probability [2] is still 99.8% with 2500 attack edges, which is quite satisfactory. The case without using redundancy is much worse (even if we seek only a single intersection), demonstrating that exploiting redundancy is necessary. For our 10000-node topology and 100-node topology, g = 204 and g = 11 give probabilities of 99.6% and 87.7%, respectively. Notice that an 87.7% probability does not mean that 12.3% of the nodes will not be accepted by the system. It only means that given a verifier, 12.3% of the nodes will not be accepted by that verifier. Each honest node, on average, should still be accepted by 87.7% of the honest nodes (verifiers).

VI. Concluding Remarks

This paper presented Sybil Guard, a near-optimal defense against Sybil attacks using social networks. Sybil Guard's improvement derives from the combination of multiple novel techniques: 1) leveraging multiple independent instances of the random route protocol to perform many short random routes; 2) exploiting intersections on edges instead of nodes; 3) using the novel balance condition to deal with escaping tails of the verifier; and 4) using the novel benchmarking technique to safely estimate r. Finally, our results on real-world social networks confirmed their fast-mixing property and, thus, validated the fundamental assumption behind Sybil Limit's (and Sybil Guard's) approach. As future work, we intend to implement Sybil Limit within the context of some real-world applications and demonstrate its utility.

References
[1] G. Danezis and P. Mittal, "SybilInfer: Detecting Sybil nodes using social networks," presented at NDSS, 2009.
[2] M. Mitzenmacher and E. Upfal, Probability and Computing. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[3] J. Douceur, "The Sybil attack," in Proc. IPTPS, 2002, pp. 251–260.
[4] N. B. Margolin and B. N. Levine, "Informant: Detecting Sybils using incentives," in Proc. Financial Cryptography, 2007, pp. 192–207.
All odd-bit errors, as long as the generator polynomial contains (x + 1) as a factor.
Any burst error for which the length of the burst is less than the length of the CRC.
Larger burst errors, with high probability.
Data misordering detection.

Hardware implementation on VLSI is preferred for several reasons:
Bit-wise software implementation suffers from slow processing, is limited to lower encoding rates, and introduces a large delay before delivering the data.
Byte-wise software implementation takes considerable CPU time and requires large memory for the processing.
Hardware implementations, by contrast, are simple, fast, and easy to realize.

The proposed design starts from the LFSR used in serial CRC. An unfolding algorithm [2] is used to realize parallel processing. Direct application of this algorithm may yield a parallel circuit with a large iteration bound [2], so delay elements are added (pipelining) so as to achieve the minimum critical path. The critical path (CP) of a Data Flow Graph (DFG) is the path with the longest computation time among all paths that contain zero delays. To achieve high speed, the length of the CP must be reduced by pipelining and parallel processing. Finally, a retiming algorithm is applied to obtain the lowest achievable CP.

The article is structured as follows. Section II illustrates the key factors in CRC. Section III briefs the related works on parallel CRC. In Section IV, the methods used to reduce the critical path are discussed. Finally, in Section V, the results are analyzed.

of m bits, {b0, b1, ..., bm-1}, to allow the receiver to detect possible errors. The sequence S2 is commonly known as a Frame Check Sequence (FCS). It is generated by taking into account the fact that the complete sequence S = S1 ∪ S2, obtained by concatenating S1 and S2, has the property that it is divisible (following a particular arithmetic) by some predetermined sequence P, {p0, p1, ..., pm}, of m+1 bits. After Tx sends S to Rx, Rx divides S (i.e., the message and the FCS) by P, using the same particular arithmetic, after it receives the message. If there is no remainder, Rx assumes there was no error.

The product operator is accomplished by a bitwise AND, whereas both sum and subtraction are accomplished by bitwise XOR operators. A CRC circuit can easily be realized as a special shift register, called an LFSR, which is used by both transmitter and receiver. On the transmitter side, the dividend is the sequence S1 concatenated with a sequence of m zeros to the right, and the divisor is P. In the receiver, the dividend is the received sequence and the divisor is the same P. One possible LFSR [4] is shown in Figure 1. In it, the m FFs have a common clock and clear signal. The input x'i of the ith FF is obtained by taking an XOR of the (i-1)th FF output and a term given by the logical AND between pi and xm-1. The signal x'0 is obtained by taking an XOR of the input d and xm-1. If pi is zero, only a shift operation is performed (i.e., the XOR related to x'i is not required); otherwise, the feedback xm-1 is XOR-ed with xi-1. We point out that the AND gates in Fig. 2 are unnecessary if the divisor P is time-invariant.
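The serial LFSR just described can also be modeled in software. The sketch below mirrors the stated update rules x'0 = d XOR xm-1 and x'i = xi-1 XOR (pi AND xm-1); it illustrates the bit-serial divider only, not the proposed parallel design, and the class name is ours.

public class SerialCrc {
    // 'data' is the message S1 fed MSB-first; 'poly' holds p0..pm-1
    // (pm, the leading 1 of the divisor, is implicit).
    public static boolean[] crc(boolean[] data, boolean[] poly) {
        int m = poly.length;
        boolean[] x = new boolean[m]; // flip-flop states, cleared at start
        // Transmitter-side dividend: the message followed by m zeros.
        for (int t = 0; t < data.length + m; t++) {
            boolean d = t < data.length && data[t];
            boolean feedback = x[m - 1];
            for (int i = m - 1; i >= 1; i--) {
                // shift, with a conditional XOR where the divisor has a 1
                x[i] = x[i - 1] ^ (poly[i] && feedback);
            }
            x[0] = d ^ feedback;
        }
        return x; // the FCS (remainder) left in the register
    }
}

Note how a zero tap pi turns the stage into a pure shift, exactly as the text observes; with a fixed (time-invariant) divisor those AND gates disappear entirely.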
A. Pipelining Process
bit CRC is taken for high-performance operations and faster applications. Here we have chosen a serial implementation of the 32-bit CRC. The frequency of operation and the area usage of the serial architecture are analyzed. The 32-bit serial CRC uses 18 slices, and the frequency of operation is found to be 370.508 MHz. Hence the proposed method provides an increase in clock rate and better performance within less area. Depending on the required reduction in iteration bound, we perform three- or four-level pipelining. Care has to be taken not to increase the area of the architecture along with pipelining, as pipelining will reduce the critical path (CP) by adding delay elements, which will increase the size of the architecture.

REFERENCES
[1] T. V. Ramabadran and S. S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, vol. 8, no. 4, pp. 62-75, Aug. 1988.
[2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. Hoboken, NJ: Wiley, 1999.
[3] T.-B. Pei and C. Zukowski, "High-speed parallel CRC circuits in VLSI," IEEE Trans. Commun., vol. 40, no. 4, pp. 653-657, Apr. 1992.
[4] G. Campobello, G. Patané, and M. Russo, "Parallel CRC realization," IEEE Trans. Comput., vol. 52, no. 10, pp. 1312-1319, Oct. 2003.
[5] G. Albertengo and R. Sisto, "Parallel CRC generation," IEEE Micro, vol. 10, no. 5, pp. 63-71, Oct. 1990.
[6] M. Braun et al., "Parallel CRC computation in FPGAs," in Proc. Workshop on Field Programmable Logic and Applications, 1996.
[7] M. Spachmann, "Automatic generation of parallel CRC circuits," IEEE Design and Test of Computers, May 2001.
[8] Youngju Do, Sung-Rok Yoon, Taekyu Kim, Kwang Eui Pyun, and Sin-Chong, "High-speed parallel architecture for software-based CRC," in Proc. IEEE CCNC, 2008.
[9] M. E. Kounavis and F. L. Berry, "Novel table lookup-based algorithms for high-performance CRC generation," IEEE Trans. Comput., vol. 57, no. 11, pp. 1550-1560, Nov. 2008.
[10] C. Cheng and K. K. Parhi, "High-speed VLSI architecture for general linear feedback shift register (LFSR) structures," IEEE, 2009.
[11] K. K. Parhi, "Eliminating the fanout bottleneck in parallel long BCH encoders," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 51, no. 3, pp. 512-516, 2004.
[12] A. Tanenbaum, Computer Networks, 4th ed. Prentice Hall, 2003.
[13] P. Koopman, "32-bit cyclic redundancy codes for Internet applications," in Proc. DSN '02, Washington, DC, USA: IEEE Computer Society, 2002, pp. 459-468.
[14] P. Koopman and T. Chakravarty, "Cyclic redundancy code (CRC) polynomial selection for embedded networks," in Proc. DSN '04, Washington, DC, USA: IEEE Computer Society, 2004, pp. 145-154.
lingesan.j@gmail.com, kannamma.sridharan@gmail.com
Abstract— A "botnet" consists of a network of compromised computers controlled by an attacker ("botmaster"). Recently, botnets have developed into the root cause of many Internet attacks. To be well prepared for future attacks, it is not sufficient to study how to detect and defend against the botnets that have appeared in the past. More importantly, we should study advanced botnet designs that could be developed by botmasters in the near future. In this paper, we present the design of an advanced hybrid peer-to-peer botnet. Compared with current botnets, the proposed botnet is harder to shut down, observe, and hijack. It provides robust network connectivity, individualized encryption and control traffic dispersion, limited botnet exposure by each bot, and easy monitoring and improvement by its botmaster. Our enhancement is to defend against such an advanced botnet. We secure the data in every bot by requiring a session key for viewing that data. This key changes at every transaction, and the key-changing mechanism is controlled and issued by the authorization person. For every transaction the receiver must register with the authorization person; after registration, the authorization person provides the session key to the requesting receiver. At the time of receiver node registration, the authorization person can read the connected nodes' information. In this way we can find the availability of transactions, which is used to find botmasters.

Index Terms — Botnet, Botmaster, Bot, Honeypot

1. INTRODUCTION

Internet malware attacks have evolved into better-organized and more profit-centered endeavors. E-mail spam, extortion through denial-of-service attacks, and click fraud represent a few examples of this emerging trend. "Botnets" are a root cause of these problems. A "botnet" consists of a network of compromised computers ("bots") connected to the Internet that is controlled by a remote attacker ("botmaster"). Since a botmaster could scatter attack tasks over hundreds or even tens of thousands of computers distributed across the Internet, the enormous cumulative bandwidth and large number of attack sources make botnet-based attacks extremely dangerous and hard to defend against. Compared to other Internet malware, the unique feature of a botnet lies in its control communication network. Most botnets that have appeared until now have had a common centralized architecture: bots in the botnet connect directly to some special hosts (called "command-and-control" servers, or "C&C" servers). These C&C servers receive commands from their botmaster and forward them to the other bots in the network. From now on, we will call a botnet with such a control communication architecture a "C&C botnet." Fig. 1 shows the basic control communication architecture for a typical C&C botnet (in reality, a C&C botnet usually has more than two C&C servers); arrows represent the directions of network connections. As botnet-based attacks become popular and dangerous, security researchers have studied how to detect, monitor, and defend against them. Most of the current research has focused upon
the C&C botnets that have appeared in the past, especially Internet Relay Chat (IRC)-based botnets. It is necessary to conduct such research in order to deal with the threat we are facing today. However, it is equally important to conduct research on advanced botnet designs that could be developed by attackers in the near future. Otherwise, we will remain susceptible to the next generation of Internet malware attacks. From a botmaster's perspective, the C&C servers are the fundamental weak points in current botnet architectures. First, a botmaster will lose control of her botnet once the limited number of C&C servers are shut down by defenders. Second, defenders could easily obtain the identities (e.g., IP addresses) of all C&C servers based on their service traffic to a large number of bots [7], or simply from one single captured bot (which contains the list of C&C servers). Third, an entire botnet may be exposed once a C&C server in the botnet is hijacked or captured by defenders [4]. As network security practitioners put more resources and effort into defending against botnet attacks, hackers will develop and deploy the next generation of botnets with a different control architecture.

Fig. 1. C&C architecture of a C&C botnet.

CURRENT BOTNETS AND THEIR WEAKNESSES

The C&C servers are the fundamental weak points in current botnet architectures.
A botmaster will lose control of her botnet once the limited number of C&C servers are shut down by defenders.
Defenders could easily obtain the identities (e.g., IP addresses) of all C&C servers based on their service traffic to a large number of bots, or simply from one single captured bot (which contains the list of C&C servers).
An entire botnet may be exposed once a C&C server in the botnet is hijacked or captured by defenders.
As network security practitioners put more resources and effort into defending against botnet attacks, hackers will develop and deploy the next generation of botnets with a different control architecture.

PROPOSED BOTNETS AND THEIR ADVANTAGES

It provides robust network connectivity, individualized encryption and control traffic dispersion, and limited botnet exposure by each bot.
Easy monitoring and recovery by its botmaster.
No bootstrap procedure.
Each bot has a peer list to communicate.
Report command to communicate.
Update command: contact a sensor host to update the bot's peer list.
Bots with static IPs are candidates for being in peer lists.
A servent bot listens on a self-defined port and uses an appropriate key for incoming traffic.

2. RELATED WORKS
[1] Global Internet threats are undergoing a profound transformation from attacks designed solely to disable infrastructure to those that also target people and organizations. Behind these new attacks is a large pool of compromised hosts sitting in homes, schools, businesses, and governments around the world. These systems are infected with a bot that communicates with a bot controller and other bots to form what is commonly referred to as a zombie army or botnet. Botnets are a very real and quickly evolving problem that is still not well understood or studied.

[2] Time zones play an important and unexplored role in malware epidemics. To understand how time and location affect malware spread dynamics, we studied botnets, or large coordinated collections of victim machines (zombies) controlled by attackers. Over a six-month period we observed dozens of botnets representing millions of victims. We noted diurnal properties in botnet activity, which we suspect occur because victims turn their computers off at night. Through binary analysis, we also confirmed that some botnets demonstrated a bias in infecting regional populations. Clearly, computers that are offline are not infectious, and any regional bias in infections will affect the overall growth of the botnet. We therefore created a diurnal propagation model. The model uses diurnal shaping functions to capture regional variations in online vulnerable populations.

[3] Denial-of-Service (DoS) attacks pose a significant threat to the Internet today, especially if they are distributed, i.e., launched simultaneously at a large number of systems. Reactive techniques that try to detect such an attack and throttle down malicious traffic prevail today but usually require an additional infrastructure to be really effective. In this paper we show that preventive mechanisms can be as effective with much less effort. DoS attack prevention is based on the observation that coordinated automated activity by many hosts needs a mechanism to remotely control them.

[4] Recent denial-of-service attacks are mounted by professionals using botnets of tens of thousands of compromised machines. To circumvent detection, attackers are increasingly moving away from bandwidth floods to attacks that mimic the Web browsing behaviour of a large number of clients and target expensive higher-layer resources such as CPU, database and disk bandwidth. The resulting attacks are hard to defend against using standard techniques, as the malicious requests differ from the legitimate ones in intent but not in content.

[5] Botnets, networks of (typically compromised) machines, are often used for nefarious activities (e.g., spam, click fraud, denial-of-service attacks, etc.). Identifying members of botnets could help stem these attacks, but passively detecting botnet membership (i.e., without disrupting the operation of the botnet) proves to be difficult. This paper studies the effectiveness of monitoring lookups to a DNS-based black hole list (DNSBL) to expose botnet membership.

3. BOTNET ARCHITECTURE

3.1 Two Classes of Bots
The bots in the proposed P2P botnet are classified into two groups. The first group contains bots that have static, nonprivate IP addresses and are accessible from the global Internet. Bots in the first group are called servent bots since they behave as both clients and servers. The second group contains the remaining bots, including 1) bots with dynamically allocated IP addresses, 2) bots with private IP addresses, and 3) bots behind firewalls such that they cannot be connected to from the global Internet. The second group of bots is called client bots since they will not accept incoming connections.

Fig. 2. C&C architecture of the proposed botnet.

3.2 Botnet Command and Control Architecture

Fig. 2 illustrates the C&C architecture of the proposed botnet. The illustrative botnet shown in this figure has five servent bots and three client bots. The peer list size is two (i.e., each bot's peer list contains the IP
addresses of two servent bots). An arrow from bot A to bot B represents bot A initiating a connection to bot B. This figure shows that a big cloud of servent bots interconnect with each other; they form the backbone of the control communication network of the botnet. A botmaster injects her commands through any bot(s) in the botnet. Both client and servent bots periodically connect to the servent bots in their peer lists in order to retrieve commands issued by their botmaster. When a bot receives a new command that it has never seen before (e.g., each command has a unique ID), it immediately forwards the command to all servent bots in its peer list. In addition, if it is itself a servent bot, it will also forward the command to any bots connecting to it.

3.3 Relationship between Traditional C&C Botnets and the Proposed Botnet

Compared to a C&C botnet (see Fig. 1), it is easy to see that the proposed hybrid P2P botnet shown in Fig. 2 is actually an extension of a C&C botnet. The hybrid P2P botnet is equivalent to a C&C botnet where servent bots take the role of C&C servers: the number of C&C servers (servent bots) is greatly enlarged, and they interconnect with each other. Indeed, the large number of servent bots is the primary reason why the proposed hybrid P2P botnet is very hard to shut down.

4. DETECTING BOTMASTERS

At the time of receiver node registration, the authorization person can read the connected nodes' information. In this way we can find the availability of transactions, which is used to find botmasters.

4.2 Session Key Generation
Every node's data has a session key embedded, and that session key is generated by the authentication server. This session key is used to open the data. It is a changeable key: at every transformation the session key is changed, and the changing mechanism is controlled by the authentication server.

4.3 Data Transformation and Botmaster Detection
At registration, the authentication server retrieves the details (own IP and ID, and the connected node/botserver ID and IP) of the botserver. If the requesting botserver/node is an authorized botserver/node, the authentication server provides the session key, and the botserver/node can read the data by the use of the session key. When data is transmitted to the requesting botserver/node, the session key is changed at the transaction, so the receiving botserver/node needs the modified session key; the botserver/node can then register with the authentication server for the modified session key. If the requesting botserver/node is not an authorized botserver/node, that botserver/node is a botmaster.
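The registration and key-rotation mechanism of Sections 4.1-4.3 can be sketched as follows. This is a minimal sketch of the defensive logic only, with hypothetical names (the paper describes the mechanism purely in prose): the server rotates the session key on every transaction and flags any unregistered requester as a suspected botmaster.

import java.security.SecureRandom;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AuthServer {
    private final Set<String> registered = new HashSet<>();
    private final Map<String, Long> sessionKeys = new HashMap<>();
    private final SecureRandom rnd = new SecureRandom();

    // At registration the server records the node; peerInfo stands in for the
    // connected-node details (ID, IP) the server is said to read at this point.
    public void register(String nodeId, String nodeIp, String peerInfo) {
        registered.add(nodeId);
    }

    // Returns a fresh session key, rotated on every transaction;
    // an unregistered requester is flagged instead of served.
    public Long requestKey(String nodeId) {
        if (!registered.contains(nodeId)) {
            System.out.println("Unregistered requester " + nodeId
                               + ": suspected botmaster");
            return null;
        }
        long key = rnd.nextLong();     // new key each transaction
        sessionKeys.put(nodeId, key);
        return key;
    }
}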
security. One drawback, however, is that encrypting and decrypting in elliptic curve cryptosystems may take longer than in other cryptosystems. There are several slightly different versions of elliptic curve cryptography, all of which rely on the widely believed difficulty of solving the discrete logarithm problem for the group of an elliptic curve over some finite field. The most popular finite fields for this are the integers modulo a prime number, GF(p) (see modular arithmetic), or a Galois field of characteristic two, GF(2^m). Galois fields whose size is a power of some other prime have also been proposed, but are considered a bit dubious among cryptanalysts.

Given an elliptic curve E and a field GF(q), we consider the abelian group of rational points E(q) of the form (x, y), where both x and y are in GF(q) and where the group operation "+" is defined on this curve as described in the article on elliptic curves. We then define a second operation "*": Z × E(q) → E(q): if P is some point in E(q), then we define 2*P = P + P, 3*P = 2*P + P = P + P + P, and so on. Note that given integers j and k, j*(k*P) = (j*k)*P = k*(j*P). The elliptic curve discrete logarithm problem (ECDLP) is then to determine the integer k, given points P and Q, and given that k*P = Q. It is believed that the usual discrete logarithm problem over the multiplicative group of a finite field (DLP) and ECDLP are not equivalent problems, and that ECDLP is significantly more difficult than DLP.

In cryptographic use, a specific base point G is selected and published for use with the curve E(q). A private key k is selected as a random integer, and the value P = k*G is published as the public key (note that the purported difficulty of ECDLP implies that k is hard to determine from P). If Alice and Bob have private keys kA and kB and public keys PA and PB, then Alice can calculate kA*PB = (kA*kB)*G, and Bob can compute the same value as kB*PA = (kB*kA)*G. This allows the establishment of a "secret" value that both Alice and Bob can easily compute, but which is difficult for any third party to derive. In addition, Bob does not gain any new knowledge about kA during this transaction, so Alice's private key remains private. The actual methods used to then encrypt messages between Alice and Bob based on this secret value are adaptations of older discrete logarithm cryptosystems originally described for use on other groups; these include Diffie-Hellman, the ElGamal discrete log cryptosystem, and DSA.

Doing the group operations needed to run the system is slower for an ECC system than for a factorization system or a modulo-integer discrete log system of the same size. However, proponents of ECC systems believe that the ECDLP problem is significantly harder than the DLP or factorization problems, and so equal security can be provided by much smaller key lengths using ECC, to the extent that it can actually be faster than, for instance, RSA. Published results to date tend to support this belief, but some experts are skeptical. ECC is widely regarded as the strongest asymmetric algorithm at a given key length, so it may become useful over links that have very tight bandwidth requirements.

5. PERFORMANCE MEASURE

Two factors affect the connectivity of a botnet: 1) some bots are removed by defenders, and 2) some bots are offline. These two factors, even though completely different, have the same impact on botnet connectivity when the botnet is used by its botmaster at a specific time. Let C(p) denote the connected ratio and D(p) denote the degree ratio after removing the top p fraction of the most connected bots among the peer-list updating servent bots; this is the most efficient and aggressive defense that could be mounted when defenders have complete knowledge (topology, bot IP addresses, etc.) of the botnet. C(p) and D(p) are defined as

C(p) = (number of bots in the largest connected graph) / (number of remaining bots)

D(p) = (average degree of the largest connected graph) / (average degree of the original botnet)

These two metric functions have clear physical meanings. The metric C(p) shows how well a botnet survives a defense action by keeping the remaining members connected together. The metric D(p) shows how densely the
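Given a concrete botnet graph, both metrics can be computed directly from the definitions above. A minimal sketch (assuming an adjacency-list representation in which removed bots simply have empty lists; the class and parameter names are ours):

import java.util.ArrayDeque;
import java.util.List;

public class BotnetMetrics {
    // Returns {C(p), D(p)} for the residual graph after the top-p removal.
    public static double[] metrics(List<List<Integer>> adj, int removedCount,
                                   double originalAvgDegree) {
        int n = adj.size();
        boolean[] seen = new boolean[n];
        int bestSize = 0;
        long bestDegreeSum = 0;
        for (int s = 0; s < n; s++) {
            if (seen[s] || adj.get(s).isEmpty()) continue;
            // BFS one connected component, tracking its size and degree sum.
            int size = 0;
            long degreeSum = 0;
            ArrayDeque<Integer> q = new ArrayDeque<>();
            q.add(s);
            seen[s] = true;
            while (!q.isEmpty()) {
                int u = q.poll();
                size++;
                degreeSum += adj.get(u).size();
                for (int v : adj.get(u)) if (!seen[v]) { seen[v] = true; q.add(v); }
            }
            if (size > bestSize) { bestSize = size; bestDegreeSum = degreeSum; }
        }
        int remaining = n - removedCount;
        double c = remaining == 0 ? 0 : (double) bestSize / remaining;          // C(p)
        double avgDeg = bestSize == 0 ? 0 : bestDegreeSum / (double) bestSize;
        double d = avgDeg / originalAvgDegree;                                   // D(p)
        return new double[]{c, d};
    }
}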
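To make the commutativity claim kA*(kB*G) = kB*(kA*G) from the elliptic-curve discussion concrete, here is a toy sketch over the textbook curve y^2 = x^3 + 2x + 2 on GF(17) with base point G = (5, 1). This is an illustration only; the parameters are far too small for real security, and corner cases such as doubling a point with y = 0 are ignored.

import java.math.BigInteger;

public class ToyEcdh {
    static final BigInteger P = BigInteger.valueOf(17), A = BigInteger.valueOf(2);
    static final BigInteger[] G = { BigInteger.valueOf(5), BigInteger.ONE };

    // Group law on the curve; null encodes the point at infinity O.
    static BigInteger[] add(BigInteger[] p1, BigInteger[] p2) {
        if (p1 == null) return p2;
        if (p2 == null) return p1;
        BigInteger s;
        if (p1[0].equals(p2[0])) {
            if (p1[1].add(p2[1]).mod(P).signum() == 0) return null; // P + (-P) = O
            s = p1[0].pow(2).multiply(BigInteger.valueOf(3)).add(A)
                 .multiply(p1[1].shiftLeft(1).modInverse(P));       // (3x^2 + a) / (2y)
        } else {
            s = p2[1].subtract(p1[1])
                 .multiply(p2[0].subtract(p1[0]).mod(P).modInverse(P)); // chord slope
        }
        s = s.mod(P);
        BigInteger x3 = s.pow(2).subtract(p1[0]).subtract(p2[0]).mod(P);
        BigInteger y3 = s.multiply(p1[0].subtract(x3)).subtract(p1[1]).mod(P);
        return new BigInteger[]{x3, y3};
    }

    // Scalar multiplication k*P via double-and-add.
    static BigInteger[] mul(BigInteger k, BigInteger[] pt) {
        BigInteger[] r = null;
        for (int i = k.bitLength() - 1; i >= 0; i--) {
            r = add(r, r);
            if (k.testBit(i)) r = add(r, pt);
        }
        return r;
    }

    public static void main(String[] args) {
        BigInteger kA = BigInteger.valueOf(3), kB = BigInteger.valueOf(7);
        BigInteger[] shared1 = mul(kA, mul(kB, G)); // Alice: kA * PB
        BigInteger[] shared2 = mul(kB, mul(kA, G)); // Bob:   kB * PA
        System.out.println(shared1[0] + "," + shared1[1] + " == "
                           + shared2[0] + "," + shared2[1]);
    }
}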
*Research Scholar, Vel Tech DR.RR & DR.SR Technical University, Chennai-62
senthilmuruganme@gmail.com
Mobile No.: 9176031383
** B.E. (CSE), Vel Tech Engineering College, Avadi, Chennai-62
SSSENTHIL.P@gmail.com
Mobile No.: 9600685018
***B.TECH (IT), Tagore Engineering College, Rathinamangalam, Chennai-48. manikandan.tk@gmail.com
network to any of the nodes that have been identified as malicious, thus discouraging them from dropping packets.

A different approach uses a 2ACK scheme, which is a network-layer technique to detect misbehaving links and to mitigate their effects. It can be implemented as an add-on to existing routing protocols for MANETs, such as DSR (Dynamic Source Routing). The 2ACK scheme detects misbehavior through the use of a new type of acknowledgment packet, termed 2ACK. A 2ACK packet is assigned a fixed route of two hops (three nodes), in the opposite direction of the data traffic route. The proposed algorithm, in contrast, just uses a simple acknowledgement approach instead of a 2ACK scheme, which increases the overhead.

When a RREQ is received, a route to the source is created. If the receiving node has not received this RREQ before, is not the destination, and does not have a current route to the destination, it rebroadcasts the RREQ. If the receiving node is the destination or has a current route to the destination, it generates a Route Reply (RREP). The RREP is unicast in a hop-by-hop fashion to the source. As the RREP propagates, each intermediate node creates a route to the destination. When the source receives the RREP, it records the route to the destination and can begin sending data.

2. Sender connects to the nearest intermediate node: The snippet below describes the procedure of connecting to the intermediate node:
…}
else
{
    split = end + 48;
    extract(end, split);
…}
}

The variable st will point to the start of the message, end will be initialized to 48, and the split variable keeps adding 48 to end, to point to the next position where the message has to be split.

4. Creates data frame with destination address, sender name, hash code and message: The data frame contains the following fields:

Destination address: taken from the text field, entered by the user.

Host name: obtained using the following snippet of code.

InetAddress inta = InetAddress.getLocalHost();
String sendername = inta.getHostName(); // sender's host name

Hash code: the function hashing() is called with msg as the parameter, which calculates and returns the hash code.

Message: msg is taken from the text area and is either manually entered by the user or browsed and copied from a text file.

5. Sends the packet: Once the connection is established, BufferedReader and BufferedOutputStream are used to create the input and output streams that send and receive packets in bytes. The functions write() and read() are used for sending the packet and receiving the acknowledgement.

6. Waits for acknowledgement: The sender keeps waiting till the acknowledgement is received from the intermediate node. The function read() reads the acknowledgement written by the intermediate node into the string object chstr, and returns the number of bytes read. The following snippet shows the loop that is used for waiting.

while (true) { // read ACK
    readcnt = in1.read(chstr);
    if (readcnt <= 0)
        continue;
    else
        break;
}

7. Calculates time taken and the number of packets lost: The moment the message is sent, the time is saved in start, which is a long variable; once the acknowledgement has arrived, the time is again noted in the long variable end.

start = System.currentTimeMillis(); // before sending
end = System.currentTimeMillis();   // after the acknowledgement arrives

The total time taken for the message to be sent and the acknowledgement to reach back is calculated as end - start. Every time a packet is sent, a counter cpkt is incremented. If the total time taken exceeds the wait time limit, which is 20 msec, a counter cmiss that keeps count of packets lost is incremented. This uses the principle of flow conservation for calculating the (cmiss/cpkt) ratio, which is explained in the following step.

8. Chooses the intermediate node: Once the whole message is sent, a packet called "done" is sent by the sender to mark the end of the message. If the ratio (cmiss/cpkt) exceeds 20%, the link is said to be misbehaving. If the acknowledgement field that is extracted from the ack packet sent by the destination matches "CONFIDENTIALITY LOST", then we consider that the message has been modified. If the ratio (cmiss/cpkt) is less than 20% and the acknowledgement field extracted is "ACK", then the link is considered to be working properly. The sender thus displays an appropriate information message indicating the behaviour of the link.

If the link is misbehaving or the confidentiality of the message is lost, there has to be a switch in the intermediate node used. This is done so that in the next session, a faithful communication is carried out. In case the link is found to be working properly, the same link is used for further sessions of sending messages.
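Steps 6-8 can be condensed into one small monitor. The sketch below is ours (the 20 msec wait limit and 20% threshold are taken from the text; the class itself is hypothetical) and shows the bookkeeping and the final decision:

public class LinkMonitor {
    private int cpkt = 0;   // packets sent
    private int cmiss = 0;  // packets whose ACK exceeded the wait limit

    public void recordPacket(long startMillis, long endMillis) {
        cpkt++;
        if (endMillis - startMillis > 20) cmiss++; // 20 msec wait time limit
    }

    // Called after the "done" packet; ackField comes from the final ACK packet.
    public String judge(String ackField) {
        double ratio = cpkt == 0 ? 0 : (double) cmiss / cpkt;
        if (ratio > 0.20) return "LINK MISBEHAVING: switch intermediate node";
        if ("CONFIDENTIALITY LOST".equals(ackField))
            return "MESSAGE MODIFIED: switch intermediate node";
        if ("ACK".equals(ackField)) return "LINK OK: reuse for next session";
        return "UNKNOWN ACK";
    }
}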
choices is that we might end up having, say, the word "dog" and the word "wolf" in the list, and this may cause ambiguity in labeling. To avoid this problem, we propose a WordNet-based [5] algorithm to generate a semantically non-overlapping set of word choices while preventing odd-one-out attacks using the choices themselves. Because the number of choices is limited, the location of the mouse-click on the composite image acts as additional user input, and together with the annotation, it forms the two-step mechanism to reduce the rate of random attacks.

II. HELPFUL HINTS

A. Algorithmic Recognizability
Algorithms that attempt to perform image recognition under distortion can be viewed from two different angles here. First, they can be thought of as methods that potential adversaries may employ in order to break image CAPTCHAs. Second, they can be considered as intelligent vision systems. Because the images in question can be widely varying and be part of a large image repository, content-based image retrieval (CBIR) systems [30] seem apt. Essentially a memory-based method of attack, the assumption is that the adversary has access to the original (undistorted) images (which happens to be a requirement [3] of CAPTCHAs) for matching with the distorted image presented. While our experiments focus on image matching algorithms, other types of algorithms also seem plausible attack strategies. Near-duplicate detection [15], which focuses on finding marginally modified/distorted copyrighted images, seems to be a potential choice as well. This is part of our future work. Automatic image annotation and scene recognition techniques [7] have potential, but given the current state-of-the-art, these methods are unlikely to do better than direct image-to-image matching.

B. Human Recognizability
We measure human recognizability under distortion using a controlled user study. An image I is sampled from χ, subjected to distortion δy(·), and then presented to a user, along with a set of 15 word choices, one of which is unambiguously an appropriate label. While more than 15 choices would make the test harder to solve automatically, too many choices also make it more challenging for humans and hence affect usability. The user choice, made from the word list, is recorded alongside the particular image category and distortion type. Since it is difficult to get user responses for each distortion type over all images χ, we measure the average recognizability for a given distortion using the following. If χ(δy) is the set of all images presented to users subjected to δy(·), then

r̄(δy) = (1 / |χ(δy)|) Σ_{x ∈ χ(δy)} I(x is correctly recognized)    (4)

where I is the indicator function. The implicit assumptions made here, under which these average measures are comparable across distortions, are that (a) all users independently assess the recognizability of a distorted image (since they are presented privately, one at a time), and (b) with a sufficient, but not necessarily identical, number of responses, the average recognizability measures converge to their true values.

Assessing Recognizability with User Study: The user study we use in order to measure what we term the average human recognizability under distortion δy is only one of many ways to assess the ability of humans to recognize images in clutter. This metric is designed specifically to assess the usability of CAPTCHAs, and may not reflect on general human vision. Furthermore, the study simply asks users to choose one appropriate image label from a list of 15 words, and recognizability is measured as the fraction of times the various users made the correct choice. While correct selection may mean that the user recognized the object in the image correctly, it could also mean that it was the only choice perceived to be correct, by elimination of choices (i.e., best among many poor matches), or even a random draw from a reduced set of potential matches. Furthermore, using the averaged responses over multiple users could mean that the CAPTCHA may still be unusable by some fraction of the population. While it is very difficult to assess true recognizability, our metric serves the purpose it is used for: the ability of users to pick one correct label from a list of choices, given a distorted image, and hence we use these averaged values in the CAPTCHA design. Furthermore, the user study consists of roughly the same number of responses from over 250 random users, making the average recognizability metric fairly representative. Later in Sec. V, we will see that there is sufficient room for relaxing the intensity of distortions so as to ensure high recognizability for most users, without compromising on security.
We look at image distortion candidates that are relevant in designing image CAPTCHAs. With the exception of the requirement that the distortion should obfuscate machine vision more than human vision, the space of possible distortions is unlimited. Any choice of distortion gets further support if simple filtering or other pre-processing steps are ineffective in undoing the distortion. Furthermore, we avoid non-linear transformations on the images so as to retain basic shape information, since losing it can severely affect human recognizability. For the same reason we do not use other images or templates to distort an image. Pseudo-randomly generated distortions are particularly useful here, as with text CAPTCHAs. To make it harder for machine recognition to undo the effect of distortion, we also need to consider the approaches taken in computer vision for this task. In the literature, the fundamental step in generic recognition tasks has been low-level feature extraction from the images [30], [7]. In fact, this is the only part of the recognition process that we have the power to affect. A brief sketch of two of these distortions follows the list below.

1 Color Quantization
Instead of allowing the full color range, we quantize the color space for image representation. For each image, we transform pixels from RGB to the CIE-LUV color space. The resultant color points, represented in R^3 space, are subject to k-means clustering with k-center initialization [4]. A parameter controls the number of color clusters generated by the k-means algorithm. All colors are then mapped to this reduced set of colors. A lower number of color clusters translates to loss of information and hence lower recognizability.

2 Dithering
Dithering is applied over image partitions generated in one of two ways: multiple random orthogonal partitions, or image segments generated using k-means clustering with k-center initialization on color, followed by connected component labeling. In either case, for each such partition, randomly select y colors (y being the parameter for this distortion) and use them to dither that region. This leaves a segment-wise dithering effect on the image, which is difficult to undo. Automatic image segmentation is expected to be particularly affected. Distortion tends to have a more severe effect on recognizability at lower values of y.

3 Cutting and Re-scaling
For machine recognition methods that rely on pixel-to-pixel correspondence based matching, scaling and translation help make them ineffective. Take a portion of one of the four sides of the image, cut out between 10-20% from the edge (chosen at random), and re-scale the remainder to bring it back to the original image dimensions. This is rarely disruptive to human recognition, since items of interest occupy the central region in our image set. On the other hand, it breaks the pixel correspondence. Which side to cut is also selected at random.

4 Line and Curve Noise
Addition of pixel-wide noise to images is typically reversible by median filtering, unless very large quantities are added, in which case human recognizability also drops. Instead, stronger noise elements can be added to the image at random; in particular, thick lines, sinusoids, and higher-order curves are added.
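The following is a minimal sketch of two of the distortions above (color quantization and cutting/re-scaling), assuming scikit-image and scikit-learn are available; the parameter names are illustrative, and k-means++ initialization stands in for the k-center initialization mentioned in the text.

```python
import numpy as np
from skimage.color import rgb2luv, luv2rgb
from skimage.transform import resize
from sklearn.cluster import KMeans

def color_quantize(img_rgb, n_colors=8):
    """Distortion 1: cluster pixel colors in CIE-LUV and map every pixel
    to its cluster centre; fewer clusters means stronger distortion."""
    luv = rgb2luv(img_rgb)
    points = luv.reshape(-1, 3)
    km = KMeans(n_clusters=n_colors, n_init=4).fit(points)
    quantized = km.cluster_centers_[km.labels_].reshape(luv.shape)
    return luv2rgb(quantized)

def cut_and_rescale(img, rng=np.random):
    """Distortion 3: crop 10-20% off one randomly chosen side and resize
    back to the original dimensions, breaking pixel correspondence."""
    h, w = img.shape[:2]
    cut = rng.uniform(0.10, 0.20)
    side = rng.randint(4)
    if side == 0:
        img = img[int(h * cut):, :]        # cut from the top
    elif side == 1:
        img = img[:h - int(h * cut), :]    # cut from the bottom
    elif side == 2:
        img = img[:, int(w * cut):]        # cut from the left
    else:
        img = img[:, :w - int(w * cut)]    # cut from the right
    return resize(img, (h, w))
```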
5 Word Choice Generator
satisfy the requirements (we find this out experimentally, as described in Sec. V), applied to the image and presented to the user along with an unambiguous choice of 15 words (generated automatically). A sample screenshot is presented in Fig. 2. If the user fails in image recognition, authentication is immediately considered failed and a re-start from step 1 is necessary.

The main aim is to build image-based CAPTCHAs secure against such attacks. Certain assumptions about possible attack strategies are needed in order to design attack-resistant distortions. Here, the only feasible way is to use CBIR to perform inexact matches between the distorted image and the set of images in the database, and use the label associated with an appropriately matched one for attack. This assumption is reasonable since an attack strategy needs to work on the entire image database in real time in order to be effective, and image retrieval usually scales better than other techniques.

Determining the Word Choice Set
For word choice generation, two factors related to image-based CAPTCHAs have not been previously addressed: it may be possible to remove ambiguity in labeling images (hence making annotation easier for humans) through the choices themselves, since the images might seem to have multiple valid labels (e.g. a tiger in a lake can be seen as "tiger" and "lake" as separate entities), and this may cause ambiguity; and the choices themselves may result in odd-one-out attacks if the correct choice is semantically different from all others. An algorithm is proposed to generate the word choice set W containing unambiguous choices for the ease of users, while ensuring that word-based attacks are ineffective. For this, a WordNet-based [5] semantic word similarity measure [4] is used, denoted by d(w1, w2), where w1 and w2 are English words. Given the correct annotation wk (e.g. "tiger") of image ik, and optionally, other words Wo (e.g. {"lake"}), with the requirement of Nw choices, the algorithm for determining W is as follows:
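The algorithm itself did not survive extraction, so the following is only a hedged sketch of the selection loop just described, using NLTK's WordNet as a stand-in for the semantic measure d(w1, w2); the threshold theta and the candidate vocabulary are assumptions, not the paper's settings.

```python
import random
from nltk.corpus import wordnet as wn

def d(w1, w2):
    """Stand-in similarity for d(w1, w2): best WordNet path similarity
    over all synset pairs (0.0 when the words are unrelated)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def word_choices(w_k, vocabulary, n_w=15, theta=0.2, w_o=()):
    """Grow the choice set W around the true label w_k so every added word
    is semantically distant from w_k, from the optional labels in w_o,
    and from each other, avoiding both ambiguity and odd-one-out attacks."""
    choices = [w_k]
    anchors = set([w_k, *w_o])
    pool = list(vocabulary)
    random.shuffle(pool)
    for cand in pool:
        if len(choices) == n_w:
            break
        if all(d(cand, w) < theta for w in anchors | set(choices)):
            choices.append(cand)
    random.shuffle(choices)  # hide the position of the correct answer
    return choices
```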
V. CONCLUSION AND FUTURE WORK

We have presented a novel way to distinguish humans from machines by an image recognition test, one that has far-reaching implications in computer and information security. The key point is that image recognition, especially under missing or pseudo information, is still largely unsolved, and this fact can be exploited for the purpose of building better CAPTCHA systems than the vulnerable text-based CAPTCHAs that are in use today. We have explored the space of systematic distortions as a means of making automated image matching and recognition a very hard AI problem. Without on-the-fly distortion, and with the original images publicly available, image recognition by matching is a trivial task. We have learned that atomic distortions are largely ineffective in reducing machine-based attacks, but when multiple atomic distortions combine, their effect significantly reduces machine recognizability.
Our study, while in no way encompassing the entire space of distortions (or algorithms that can recognize under distortion), presents one way to understand the effects of distortion on the recognizability of images in general, and more specifically to help design image CAPTCHA systems. Furthermore, it attempts to expose the weaknesses of low-level feature extraction to very simple artificial distortions as a by-product. In the future, large-scale user studies can be carried out on the ease of use, a Web interface to the IMAGINATION system can be built, and greater attack resistance can be achieved by considering other possible attack strategies such as interest points, scale invariants, and other object recognition techniques.

REFERENCES
[1] L. von Ahn, M. Blum, and J. Langford, "Telling Humans and Computers Apart (Automatically) or How Lazy Cryptographers do AI," Communications of the ACM, 47(2):57-60, 2004.
[2] A.L. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning," Artificial Intelligence, 97(1-2):245-271, 1997.
[3] "The CAPTCHA Project," http://www.captcha.net.
[4] K. Chellapilla and P.Y. Simard, "Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)," Proc. NIPS, 2004.
[5] M. Chew and J.D. Tygar, "Image Recognition CAPTCHAs," Proc. ISC, 2004.
[6] Computerworld, "Building a better spam-blocking CAPTCHA," http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9126378. Retrieved on 01/23/2009.
[7] R. Datta, D. Joshi, J. Li, and J.Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, 40(2):1-60, 2008.
[8] R. Datta, J. Li, and J.Z. Wang, "IMAGINATION: A Robust Image-based CAPTCHA Generation System," Proc. ACM Multimedia, 2005.
[9] J. Elson, J.R. Douceur, J. Howell, and J. Saul, "Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization," Proc. ACM CCS, 2007.
[10] F. Fleuret and D. Geman, "Stationary Features and Cat Detection," J. Machine Learning Research, 9:2549-2578, 2008.
[11] R.W. Floyd and L. Steinberg, "An Adaptive Algorithm for Spatial Grey Scale," Proc. Society of Information Display, 17:75-77, 1976.
[12] P. Golle, "Machine Learning Attacks Against the Asirra CAPTCHA," Proc. ACM CCS, 2008.
[13] Guardian, "How Captcha was foiled: Are you a man or a mouse?" http://www.guardian.co.uk/technology/2008/internet.captcha. Retrieved on 08/28/2008.
[14] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
[15] Y. Ke, R. Sukthankar, and L. Huston, "Efficient Near-duplicate Detection and Subimage Retrieval," Proc. ACM Multimedia, 2004.
[16] R.E. Korf, "Optimal Rectangle Packing: New Results," Proc. ICAPS, 2004.
[17] C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Identification," Fellbaum, 1998.
[18] D.B. Lenat, "Cyc: A Large-Scale Investment in Knowledge Infrastructure," Comm. of the ACM, 38(11):33-38, 1995.
[19] J. Li and J.Z. Wang, "Real-time Computerized Annotation of Pictures," IEEE Trans. Pattern Analysis and Machine Intelligence, 30(6):985-1002, 2008.
[20] D.G. Lowe, "Object Recognition from Local Scale-invariant Features," Proc. ICCV, 1999.
[21] C.L. Mallows, "A Note on Asymptotic Joint Normality," Annals of Mathematical Statistics, 43(2):508-515, 1972.
[22] G. Miller, "WordNet: A Lexical Database for English," Communications of the ACM, 38(11):39-41, 1995.
[23] W.G. Morein, A. Stavrou, D.L. Cook, A.D. Keromytis, V. Mishra, and D. …
*ASSISTANT PROFESSOR, DEPARTMENT OF IT, Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College, Avadi, Chennai. andersaruna@gmail.com
**ASSISTANT PROFESSOR, DEPARTMENT OF IT, Anjalai Ammal-Mahalingam Engineering College, Kovilvenni, Tiruvarur. ppradhu07@gmail.com
***PG STUDENT (M.E/NETWORK ENGINEERING), DEPARTMENT OF IT, Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College, Avadi, Chennai. ramnath25@gmail.com
Fig. 1: Wireless Mesh Backbone Network

In this section, we propose an MA-based handoff architecture, which offers seamless and fast handoff
to support VoIP and other real-time applications. In our approach, all the handoff logic is handled by the MA, and only the standard medium-access control protocol and IP are used. Therefore, it is compatible with any 802.11 mobile device, regardless of the vendor or architecture.

A. MA-BASED HANDOFF ARCHITECTURE IN WMNs

An MA is an executing program that can migrate during execution from machine to machine in a heterogeneous network. In other words, the agent can suspend its execution, migrate to another machine, and then resume execution on the new machine from the point at which it left off. On each machine, …

To provide seamless handoff, we apply MA technology to WMNs. As shown in Fig. 2, in our solution, each mesh client is assigned a "client MA." The mesh client places its client MA in the mesh router that it registers with. If the mesh client moves from the coverage of one mesh router to that of another mesh router, the client MA also migrates. We study the scenario in which a mesh client moves from the coverage of one mesh router to that of another mesh router during a call. To eliminate the overall handoff latency, we can employ a proactive scan scheme to counteract channel scan delay and our MA approach to counteract connection reestablishment delay. In particular, when the scan trigger of the proactive scan scheme is fired, the mesh client will actively probe channels and choose the appropriate neighboring mesh router for handoff. Then, the client MA will move from the current mesh router to the chosen mesh router and complete the processes of reassociation, CAC, rerouting, resource reservation, etc. Once the handoff trigger is fired, the mesh client will register to the new mesh router and resume all the connections using the facilities that have been prepared by the client MA earlier.
client MA to the new mesh router in the neighborhood, and the client MA will pre-set up backup connections on the new mesh router to prepare for seamless handoff. The pre-setup of backup connections usually involves reassociation for context switching between the old access point (AP) and the new AP by the inter-access point protocol, interaction with the CAC module for resource reservation, and negotiation with the routing protocol for network-layer path reestablishment. Fourth, once the backup connection is built up, the client MA will notify the mesh client that it is ready for handoff. Finally, the mesh client receives the notification and waits for the handoff trigger to fire in order to register to the new mesh router and complete the handoff. The foregoing illustration shows that before the actual handoff occurs in the fifth step, the client MA has already constructed a backup connection on the new mesh router in the third step. As a result, the overall handoff delay only involves registration delay, which is spent on the authentication information exchange between the mesh client and the new mesh router. In addition to reducing handoff delay, MA-based handoff can also achieve high computing efficiency. The client MA executes handoff logic on the mesh router, where the network computing resource is affluent, and thus relieves the burden on the mesh client, which is dedicated to running user applications.

IV. PROPOSED PMG CROSS-LAYER HANDOFF DESIGN

We propose a PMG-based approach to position and configure mesh routers in order to form a scalable wireless mesh backbone for mobility assistance. The benefit of this approach is that the protocols used for address management and handoffs can be streamlined to take advantage of the resulting network architecture. Under the PMG approach, mesh routers are grouped into connected multicast groups rooted at gateway mesh routers. Special mesh routers, namely PMGs, are equipped with multiple IP addresses, with each address corresponding to a different subnet. Note that a mesh router can use the Address Resolution Protocol (ARP) to map different IP addresses to the MAC address of the router. PMGs are the bridging nodes connecting different groups. They can facilitate information exchange between different groups during inter-gateway handoffs.

Our PMG-based WMN architecture has the following major advantages:

By planning multicast groups during the deployment, each mesh router knows which subnet it belongs to in advance. This design makes it straightforward for address management and L3 handoff detection.

Both multicast groups and PMG multicast messages can be easily implemented in IPv6, as multicast addressing is a required part of the IPv6 protocol. Therefore, we believe that this solution is feasible and practical to be implemented in future IPv6-based WMNs.

Information sharing for network management for intra-subnet roaming is restrained to within a multicast group, instead of broadcasting to the whole mesh backbone, which saves signaling overhead. Information sharing between groups can be implemented using PMGs.

Since the PMG-WMN architecture can facilitate cross-layer protocol design and PMGs are able to exchange handoff information between different subnets, both intra- and inter-gateway mobility can be improved.

The basic idea of the proposed cross-layer handoff design is to take advantage of the PMG-based architecture and utilize the information obtained from L2, such as the link quality of the new channel and the IP address of the new AP after a handoff, to predict the L3 and L5 handoffs in advance so that part of the handoff procedures can be carried out in parallel before an MN completes an L2 handoff.

Fig. 4: Handoff delay using the PMG-based cross-layer handoff design
Fig. 5 shows the sequence of the handoff delays of the proposed PMG-based cross-layer handoff scheme, and the complete handoff procedure is shown in Algorithm 1.

Algorithm 1: PMG-based algorithm for cross-layer handoffs
1. While (true) {
2.   If (RL3T < RCUR ≤ RL2T)
3.     MN sends an HOM to its oAP to retrieve the nAP list;
4.     oAP informs nAPs to activate an additional channel;
5.     MN sends Probe Request & waits for Probe Response;
6.     MN sorts nAPs & obtains the cAP;
7.     MN sends an HOM which contains the preferred cAP's network ID to its oAP;
8.     oAP sends a multicast PMGM to locate the PMGMR;
9.   If (L2HT < RCUR ≤ RL3T)
10.    oAP unicasts to the PMGMR for handoff preparations;
11.    If (cAP belongs to another subnet)
12.      PMGMR formulates an address for the MN;
13.      PMGMR prepares the UHRD to the new gateway & the LHRD to the cAP;
14.      SIP message exchanges;
15.    else
16.      PMGMR prepares the UHRD to the old gateway & the LHRD to the cAP;
17.  If (RCUR ≤ L2HT)
18.    If (subnet changes)
19.      MN associates to the cAP;
20.      MN obtains a new IP address & uses the obtained routing path for address binding with the HA;
21.      MN resumes the multimedia session on layer-5;
22.    else
23.      MN associates to the cAP;
24.      MN uses the obtained routing path for resuming the multimedia session on layer-5;
25. }
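The following is a minimal Python sketch of our reading of Algorithm 1's threshold dispatch, not the authors' implementation; the three phase callbacks are hypothetical names standing in for the grouped steps of the listing.

```python
def handoff_step(r_cur, r_l2t, r_l3t, l2ht, scan_phase, prepare_phase, execute_phase):
    """Dispatch one pass of the Algorithm 1 main loop based on the MN's
    current signal level RCUR relative to the three thresholds."""
    if r_l3t < r_cur <= r_l2t:
        scan_phase()      # steps 2-8: probe channels, pick the cAP, locate the PMGMR
    elif l2ht < r_cur <= r_l3t:
        prepare_phase()   # steps 9-16: PMGMR pre-builds the address and UHRD/LHRD paths
    elif r_cur <= l2ht:
        execute_phase()   # steps 17-24: associate with the cAP and resume the session
```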
A. L3 Handoff Preparation to Eliminate the L3 Handoff Detection Delay & Routing Path Discovery Delay

When the RCUR of the MN reaches the RL3T, it triggers the oAP to notify the PMG to prepare for the L3 handoff. The PMG first checks whether the cAP is located in the current subnet of the MN or not. For the intra-gateway case, the IP address of the MN does not need to change. The cAP will be notified by the PMG to prepare the LHRD path between the cAP and the PMGMR for the MN in advance, while the PMG takes care of the UHRD to the old gateway. For the inter-gateway case, the corresponding PMG, which belongs to both the old and new subnets, first formulates an IP address for the MN by using the cAP's network ID and the MN's interface ID via the PMGM it receives. This IP address is stored in the PMGMR's and the cAP's routing tables before the MN's L3 handoff starts. Furthermore, the PMG prepares the LHRD to the cAP and the UHRD to the new gateway. By doing so, the routing paths for both the binding update to the HA and the binding acknowledgement from the HA are prepared for the MN in advance. Nevertheless, since the PMGMR could be multiple hops away from the cAP and the gateway, the preparation time for both the LHRD and the UHRD increases with the number of hops. After the MN finishes the L2 handoff, the cAP sends the MN its IP address. Our objective in this stage is to eliminate both the L3 handoff detection delay and the routing path discovery delay, which are significant delays in the L3 handoff.

V. PERFORMANCE EVALUATION

In this section, we conduct simulations to evaluate the performance of the proposed PMG-based cross-layer handoff scheme. Since the current modeler does not provide WMN handoff support, we implement new models for mesh routers with both Mobile IPv6 and AODV routing functionalities activated so as to realize handoff support in IP-based infrastructure WMNs.

A. Simulation Setup

We developed two default handoff scenarios in WMNs in order to compare with our proposed PMG-based WMN architecture. One is the default-based handoff scheme, which depends on RA messages to trigger an MN's L3 handoff, as explained in Section III-A. The other is the gateway-based handoff scheme, under which an MN detects an L3 handoff by receiving a reply message from the gateway. In our handoff simulation, the WMN is composed of two gateways, a few regular mesh routers, and one PMG. All mesh routers' and gateways' wireless interfaces use both the AODV and IPv6 routing protocols for delivering multihop IPv6 traffic. Only the PMG has multiple IP addresses (two IP addresses in our simulation), with each IPv6 address belonging to a different subnet. The PMG message interval on the PMGMR is uniformly distributed from 0.5 s to 1 s.
The Internet backbone network has a constant latency of 0.1 second.

B. Results Analysis

… cross-layer design and eliminates L3 handoff detection and route discovery delay.
*D. Anandhi, Asst. Prof., Department of Information Technology. anandhime@yahoo.com
**K.R. ArjunAdhityaa, PG Student, Department of Information Technology. kr.arjunadhityaa@gmail.com
stegos should be visually and statistically similar to the covers while keeping the embedding rate as high as possible. In this paper, we consider digital images as covers and investigate an adaptive and secure data hiding scheme in the spatial least-significant-bit (LSB) domain.

LSB replacement is a well-known steganographic method. In this embedding scheme, only the LSB plane of the cover image is overwritten with the secret bit stream according to a pseudorandom number generator (PRNG). As a result, some structural asymmetry (never decreasing even pixels and increasing odd pixels when hiding the data) is introduced, and thus it is very easy to detect the existence of a hidden message even at a low embedding rate using some reported steganalytic algorithms, such as the Chi-squared attack, regular/singular groups (RS) analysis, sample pair analysis, and the general framework for structural steganalysis.

LSB matching (LSBM) employs a minor modification to LSB replacement. If the secret bit does not match the LSB of the cover image, then ±1 is randomly added to the corresponding pixel value. Statistically, the probability of increasing or decreasing each modified pixel value is the same, and so the obvious asymmetry artifacts introduced by LSB replacement can be easily avoided. Therefore, the common approaches used to detect LSB replacement are totally ineffective at detecting LSBM. Up to now, several steganalytic algorithms have been proposed to analyze the LSBM scheme.

Unlike LSB replacement and LSBM, which deal with the pixel values independently, LSB matching revisited (LSBMR) uses a pair of pixels as an embedding unit, in which the LSB of the first pixel carries one bit of secret message, and the relationship (odd-even combination) of the two pixel values carries another bit of secret message. In such a way, the modification rate of pixels can decrease from 0.5 to 0.375 bits/pixel (bpp) in the case of a maximum embedding rate, meaning fewer changes to the cover image at the same payload compared to LSB replacement and LSBM. It is also shown that such a new scheme can avoid the LSB replacement style asymmetry, and thus it should make detection slightly more difficult than for the LSBM approach, based on our experiments.

II. ANALYSIS OF LIMITATIONS OF RELEVANT APPROACHES AND STRATEGIES

In this section, we first give a brief overview of the typical LSB-based approaches, including LSB replacement, LSBM, and LSBMR, and some adaptive schemes, including the original PVD scheme, the improved version of PVD (IPVD), adaptive edges with LSB (AE-LSB), and hiding behind corners (HBC), and then show some image examples to expose the limitations of these existing schemes. Finally, we propose some strategies to overcome these limitations. In the LSB replacement and LSBM approaches, the embedding process is very similar. Given a secret bit stream to be embedded, a traveling order in the cover image is first generated by a PRNG, and then each pixel along the traveling order is dealt with separately. For LSB replacement, the secret bit simply overwrites the LSB of the pixel, i.e., the first bit plane, while the higher bit planes are preserved. For the LSBM scheme, if the secret bit is not equal to the LSB of the given pixel, then ±1 is added randomly to the pixel while keeping the altered pixel in the range of [0, 255]. In such a way, the LSB of pixels along the traveling order will match the secret bit stream after data hiding for both LSB replacement and LSBM. Therefore, the extracting process is exactly the same for the two approaches. It first generates the same traveling order according to a shared key, and then the hidden message can be extracted correctly by checking the parity bit of pixel values.
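The following is a minimal sketch (ours, for illustration) of the two per-pixel embedding rules just described: LSB replacement overwrites the LSB, while LSB matching randomly adds or subtracts 1 when the LSB disagrees with the secret bit.

```python
import random

def lsb_replace(pixel: int, bit: int) -> int:
    """LSB replacement: overwrite the first bit plane, preserve the rest."""
    return (pixel & ~1) | bit

def lsb_match(pixel: int, bit: int) -> int:
    """LSB matching: randomly add or subtract 1 when the LSB mismatches."""
    if (pixel & 1) == bit:
        return pixel
    step = random.choice((-1, 1))
    # keep the altered pixel within the valid range [0, 255]
    if pixel == 0:
        step = 1
    elif pixel == 255:
        step = -1
    return pixel + step
```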
LSBMR applies a pixel pair (x(i), x(i+1)) in the cover image as an embedding unit. After message embedding, the unit is modified to (x(i)', x(i+1)') in the stego image, which satisfies

LSB(x(i)') = m(i)
LSB(⌊x(i)'/2⌋ + x(i+1)') = m(i+1)

where LSB(·) denotes the least significant bit of a pixel value, and m(i) and m(i+1) are the two secret bits to be embedded. By using the relationship (odd-even combination) of adjacent pixels, the modification rate of pixels in LSBMR decreases compared with LSB replacement and LSBM at the same embedding rate. What is more, it does not introduce the LSB replacement style asymmetry. Similarly, in data extraction, it first generates a traveling order by a PRNG with a shared key. Then, for each embedding unit along the order, two bits can be extracted. The first secret bit is the LSB of the first pixel value, and the second bit can be obtained by calculating the relationship between the two pixels as shown above.

… data extraction. In practice, such side information (7 bits in our work) can be embedded into a predetermined region of the image. In data extraction, the scheme first extracts the side information from the stego image. Based on the side information, it then does some preprocessing and identifies the regions that have been used for data hiding. Finally, it obtains the secret message according to the corresponding extraction algorithm. We apply such a region-adaptive scheme to the spatial LSB domain. We use the absolute difference between two adjacent pixels as the criterion for region selection, and use LSBMR as the data hiding algorithm. The details of the data embedding and data extraction algorithms are as follows.
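As a concrete illustration of the LSBMR unit defined by the two equations above (a sketch of the standard pair rule, not the paper's full region-adaptive algorithm; boundary clamping at 0/255 is omitted for clarity):

```python
import random

def lsb(v: int) -> int:
    return v & 1

def lsbmr_embed_pair(x1: int, x2: int, m1: int, m2: int):
    """Embed two secret bits (m1, m2) into a pixel pair with LSBMR."""
    if lsb(x1) == m1:
        # m1 already matches; adjust x2 only if the pair relation misses m2
        if lsb(x1 // 2 + x2) != m2:
            x2 += random.choice((-1, 1))  # either direction flips the parity
    else:
        # changing x1 by -1 or +1 both fix m1; pick the one that also fixes m2
        x1 = x1 - 1 if lsb((x1 - 1) // 2 + x2) == m2 else x1 + 1
    return x1, x2

def lsbmr_extract_pair(y1: int, y2: int):
    """Recover the two bits: the LSB of y1 and the parity of (y1//2 + y2)."""
    return lsb(y1), lsb(y1 // 2 + y2)

# example: embed (1, 0) into the pair (100, 57) and read it back
stego = lsbmr_embed_pair(100, 57, 1, 0)
assert lsbmr_extract_pair(*stego) == (1, 0)
```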
III. RELATED WORKS
… released. When the threshold is 0, all the embedding units within the cover become available. In such a case, our method can achieve the maximum embedding capacity of 100% (100% means 1 bpp on average for all the methods in this paper), and therefore the embedding capacity of our proposed method is almost the same as that of the LSBM and LSBMR methods except for 7 additional bits. It can also be observed that most secret bits are hidden within the edge regions when the embedding rate is low, e.g., less than 30% in the example, while keeping smooth regions such as the sky in the top left corner as they are. Therefore, the subjective quality of our stegos would be improved based on human visual system (HVS) characteristics.

TABLE II: RMSE AND PSNR VALUES OF STEGO-IMAGES

Image    | RMSE | PSNR  | RMSE | PSNR  | RMSE | PSNR
Lena     | 2.07 | 41.69 | 0.97 | 48.43 | 0.28 | 59.05
Baboon   | 3.25 | 37.90 | 1.59 | 44.10 | 0.27 | 59.36
Peppers  | 2.09 | 41.73 | 1.20 | 47.19 | 0.39 | 56.24

TABLE III: VALUES OF MSE, RMSE, PSNR AND AFCPV OF STEGO-IMAGE IN WHICH AN ATM CARD IMAGE IS EMBEDDED

Cover Image | MSE  | RMSE | PSNR  | AFCPV
Lena        | 0.14 | 0.38 | 56.42 | 0.001029

Fig. 2: Stego images for the proposed method (MSE: 0.08) and the PVD method (MSE: 4.28)
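For reference, the MSE, RMSE, and PSNR figures reported in Tables II and III are conventionally computed for 8-bit images as in this small helper (ours, not the paper's code):

```python
import numpy as np

def mse_rmse_psnr(cover: np.ndarray, stego: np.ndarray):
    """Return (MSE, RMSE, PSNR in dB) between an 8-bit cover and stego image."""
    err = (cover.astype(np.float64) - stego.astype(np.float64)) ** 2
    mse = err.mean()
    rmse = np.sqrt(mse)
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    return mse, rmse, psnr
```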
ABSTRACT - Information Extraction (IE) is the name given to any process which selectively structures and combines data that is found, explicitly stated or implied, in one or more texts. The final output of the extraction process varies; in every case, however, it can be transformed so as to populate some type of database. Information Extraction plays an important role in Web knowledge discovery and management. Information analysts working long term on specific tasks already carry out information extraction manually with the express goal of database creation. The two most important tasks in information extraction from the Web are web page structure understanding and natural language sentence processing. Our recent work on web page understanding introduces a joint model of Hierarchical Conditional Random Fields (HCRFs) and extended Semi-Markov Conditional Random Fields (Semi-CRFs) to leverage the page structure understanding results in free text segmentation and labeling. The HCRF model can reflect the structure and the Semi-CRF model can make use of the gazetteers. In this top-down integration model, the decision of the HCRF model could guide the decision making of the Semi-CRF model. However, the drawback of the top-down integration strategy is also apparent, i.e., the decision of the Semi-CRF model could not be used by the HCRF model to guide its decision making. The WebNLP framework consists of two components, a structure understanding component and a text understanding component. WebNLP enables bi-directional integration of page structure understanding and text understanding in an iterative manner.

1 INTRODUCTION

The World Wide Web contains huge amounts of data. However, we cannot benefit very much from the large amount of raw webpages unless the information within them is extracted accurately and organized well. Therefore, information extraction (IE) plays an important role in Web knowledge discovery and management. Among various information extraction tasks, extracting structured Web information about real-world entities (such as people, organizations, locations, publications, products) has received much attention of late. However, little work has been done toward an integrated statistical model for understanding webpage structures and processing natural language sentences within the HTML elements of the webpage. Our recent work on Web object extraction has introduced a template-independent approach to understand the visual layout structure of a webpage and to effectively label the HTML elements with attribute names of an entity. Our latest work on webpage understanding introduces a joint model of the Hierarchical Conditional Random Fields (HCRFs) model and the extended Semi-Markov Conditional Random Fields (Semi-CRFs) model to leverage the page structure
understanding results in free text segmentation and labeling. The HCRF model can reflect the structure and the Semi-CRF model can make use of the gazetteers. In this top-down integration model, the decision of the HCRF model could guide the decision of the Semi-CRF model. However, the drawback of the top-down strategy is that the decision of the Semi-CRF model could not be used by the HCRF model to refine its decision making. In this paper, we introduce a novel framework called WebNLP that enables bidirectional integration of page structure understanding and text understanding in an iterative manner. In this manner, the results of page structure understanding and text understanding can be used to guide the decision making of each other, and the performance of the two understanding procedures is boosted iteratively. Although the WebNLP framework is motivated by multiple mentions of object attributes (named entities) in a webpage, it will also improve entity extraction from webpages without multiple mentions because of the joint optimization nature of the framework.

The main contributions of this work are as follows:
1. We introduce a novel framework for webpage understanding called WebNLP to boost the performance of page structure understanding and shallow natural language processing iteratively.
2. We introduce the multiple occurrence features to the WebNLP framework. They improve both the precision and recall of named entity extraction and structured Web object extraction on a webpage.
3. Shallow natural language processing features are applied to the WebNLP framework, which allows training of the natural language features on an existing large corpus different from the limited labeled webpages.

2 RELATED WORK

Webpage understanding plays an important role in information retrieval from the Web. There are two main branches of work for webpage understanding: template-dependent approaches and template-independent approaches. Template-dependent approaches (i.e., wrapper-based approaches) can generate wrappers either with supervision or without supervision. The supervised approaches take in some manually labeled webpages and learn some extraction rules (i.e., wrappers) based on the labeling results. Unsupervised approaches do not need labeled training samples. They first automatically discover clusters of the webpages and then produce wrappers from the clustered webpages. No matter how the wrappers are generated, they can only work on the webpages generated by the same template. Therefore, they are not suitable for general-purpose webpage understanding.

In contrast, template-independent approaches can process various pages from different templates. However, most of the methods in the literature can only handle some special kinds of pages or specific tasks such as object block (i.e., data record) detection. Zhai and Liu proposed an algorithm to extract structured data from list pages. The method consists of two steps. It first identifies individual records based on visual information and a tree matching method. Then a partial tree alignment technique is used to align and extract data items from the identified records. First, they use the Vision-based Page Segmentation (VIPS) algorithm to partition a webpage into semantic blocks with a hierarchy structure. Then, spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms, such as SVM and neural networks, are applied to train various block importance models.

It is natural to close the loop in webpage understanding by introducing a bidirectional integration model, where the bottom-up model using text understanding to guide structure understanding is integrated with the top-down model mentioned above. However, the natural language understanding component in the loop needs to be accurate enough to provide positive
feedback to the structure understanding component. In [11], the Semi-CRF model is designed to handle simple text fragment segmentation, such as the segmentation between city, state, and zip code. Therefore, the model only contains some regular expression features. For these regular expression features, the model can be trained to achieve nearly optimal parameters with only hundreds of labeled webpages. However, these features are not comprehensive enough to segment and label the natural language sentences in the webpage for tasks like business name extraction.

The task of template-independent webpage understanding is defined in this paper as the task of page structure understanding and text content segmentation and labeling. The state-of-the-art webpage structure understanding algorithm is the HCRF algorithm. HCRF organizes the segmentation of the page hierarchically to form a tree structure and conducts inference on the vision tree to tag each vision node (vision block) with a label. The HCRF model has been proved effective for product information extraction. Since the attribute values of product objects (such as the product price, image, and description) are usually the entire text content of an HTML element, text segmentation of the content within HTML elements is done as a postprocessing step.

The requirement of text understanding in information retrieval is simpler than classical natural language understanding. Deep parsing of the sentences is unnecessary in most of the cases. Shallow parsing that can extract some important named entities is usually enough. The most popular technique used for named entity recognition is the Conditional Random Field (CRF), which is language independent. The combination of structure understanding and text understanding is natural. All this work holds the belief that structure understanding can help text understanding. For example, Zhu et al. described a joint model that was able to segment and label the text within the vision node. It integrates the HCRF model and the Semi-CRF model. It also extends the Semi-CRF model to take the vision node label assignment as an input of the feature functions. The label of the vision node is actually a switch. It eliminates unnecessary searching paths in the optimization procedure of the Semi-CRF model. This joint model is in fact only a top-down integration, where only the label of the vision node can guide the segmentation and labeling of its inner text. The labeling of the text strings cannot be used to refine the labeling of the vision nodes.

Observing the drawbacks of existing models, we propose our WebNLP framework. The differences between that model and the WebNLP framework are obvious. First, the WebNLP framework is a bidirectional integration strategy, where the page structure understanding and the text understanding are reinforced by each other in an iterative way. It closes the loop in webpage understanding. Second, we introduce multiple mention features in this new framework. Our model treats the segmentation and labeling decisions at all mentions of one same entity as its observation. Such a treatment greatly expands the valid features of the entity to make more accurate decisions. Third, we introduce an auxiliary corpus to train the weights of the statistical language features of the extended Semi-CRF model. This makes our model perform much better than the extended Semi-CRF model with only regular expression matching features and sequential structure features.

3 PROBLEM DEFINITION
This paper aims at introducing a joint framework that can segment and label both the structure layout and the text in the webpage. In this section, we first introduce the data representation of the structure layout of the webpage and the text content within the webpage. Then, we formally define the webpage understanding problem.

3.1 Data Representation

We use the VIPS approach to segment a webpage into visually coherent blocks. VIPS makes use of page layout features, such as client region, font, color, and size, to construct a vision tree representation of the webpage. Different from the HTML DOM tree, each node in the vision tree represents a region on the webpage. The region of the parent node is the aggregation of those of all its child nodes. The root node represents the whole webpage. All the leaf nodes form the most detailed flat segmentation of the webpage. Only leaf nodes have inner text content. The text content inside leaf nodes may contain information like a business name. The text content could be structured text like address lines or grammatical paragraphs, which contain the attribute values of an entity.

In this study, we use the vision tree as the data representation for structure understanding. We use X = {x1, x2, ..., xi, ..., x|X|} to denote the entire vision tree of a webpage. xi is the observation on the i-th vision node, which can be either an inner node or a leaf node. The observation contains both the visual information, e.g., the position of the node, and the semantic information, e.g., the text string within the node.
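An illustrative data structure for the vision tree X just described (an assumption of ours, not the paper's implementation): each node carries its visual observation, and only leaf nodes carry inner text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class VisionNode:
    """One node xi of the vision tree: a region of the rendered webpage."""
    bbox: Tuple[int, int, int, int]                 # visual observation: position and size
    children: List["VisionNode"] = field(default_factory=list)
    text: Optional[str] = None                      # semantic observation; leaf nodes only

    @property
    def is_leaf(self) -> bool:
        return not self.children
```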
Each vision node is associated with a label h to represent the role of the node in the whole tree, e.g., whether the node contains all or some of the attributes of an object. So, H = {h1, h2, ..., hi, ..., h|X|} represents the labels of the vision tree X. We denote the label space of h by Q.

The text string within a leaf node is represented by a character sequence. Understanding the text means segmenting the text into nonoverlapping pieces and tagging each piece with a semantic label. In this paper, text understanding is equal to text segmentation and labeling. We use s = {s1, s2, ..., sm, ..., s|s|} to represent the segmentation and tagging over the text string within a leaf node x. Each segment in s is a triple sm = {αm, βm, ym}, in which αm is the starting position, βm is the end position, and ym is the segment label that is assigned to all the characters within the segment. We use |x| to denote the length of the text string within the vision node x. Then, segment sm satisfies 0 ≤ αm < βm ≤ |x| and α(m+1) = βm + 1. Named entities are some special segments differentiated from other segments by their labels. We denote the label space of y by Y. All the segmentations and taggings of the leaf nodes in the vision tree are denoted by S = {s1, s2, ..., si, ..., s|S|}. Unless otherwise specified, these symbols have the same meaning throughout the paper.

3.2 Problem Definition

Given the data representation of the page structure and text strings, we can define the webpage understanding problem formally as follows:

Definition 1 (Joint optimization of structure understanding and text understanding). Given a vision tree X, the goal of joint optimization of structure understanding and text understanding is to find both the optimal assignment of the node labels and text segmentations (H, S)*.

This definition is the ultimate goal of webpage understanding, i.e., the page structure and the text content should be understood together. However, such a definition of the problem is too hard because the search space is the Cartesian product of Q and Y. Fortunately, the negative logarithm of the posterior in (1) will be a convex function if we use the exponential function as the potential function [24]. Then we can use coordinatewise optimization to optimize H and S iteratively. In this manner, we can solve two simpler conditional optimization problems instead of solving the joint optimization problem in (1) directly, i.e., we do structure understanding and text understanding separately and iteratively. The formal definitions of the two conditional optimization problems are as follows:

Definition 2 (Structure understanding). Given a vision tree X and the text segmentation and labeling results S on the leaf nodes of the tree, structure understanding is to find the optimal label assignment H* of all the nodes in the vision tree. The objective of structure understanding is to identify the labels of all the vision nodes in the vision tree. Both the raw observations of the nodes in the vision tree and the understanding results about the text within each leaf node are used to find the optimal label assignment of all the nodes on the tree.

Definition 3 (Text understanding). Given a vision tree X and the label assignment H on all vision nodes, text understanding is to find the optimal segmentation and labeling S* on the leaf nodes. The task of the text understanding problem in entity extraction is to identify all the named entities in the webpage. The labeling results of the vision nodes will constrain the text understanding component to search only part of the label space of the named entities. The labels of the named entities within a vision node are forced to be compatible with the label of the node assigned by the structure understanding.

The problem described in Definition 1 can be solved by solving the two subproblems in Definition 2 and Definition 3 iteratively, starting from any reasonable initial solution. In Definition 2, the S in the condition is the optimum of the text understanding in the last iteration, and in Definition 3, the H in the condition is the optimum of the structure understanding in the last iteration. The iteration can begin with either structure understanding or text understanding. In this work, we begin with text understanding. The features related to the label given by structure understanding are set to zero in the first run of text understanding. The loop stops when the optima in two adjacent iterations are close enough.
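The coordinatewise decomposition just described can be summarized in the following schematic loop; this is a sketch under assumed `decode` interfaces for the two models, not the authors' API.

```python
def joint_understanding(tree, structure_model, text_model, max_iters=10):
    """Alternate text understanding (Definition 3) and structure
    understanding (Definition 2) until the decisions stop changing."""
    H = None  # node labels unknown before the first pass
    S = None  # structure-dependent features are zeroed in the first text run
    for _ in range(max_iters):
        S_new = text_model.decode(tree, H)            # argmax_S P(S | X, H)
        H_new = structure_model.decode(tree, S_new)   # argmax_H P(H | X, S)
        if (S_new, H_new) == (S, H):                  # optima of adjacent iterations agree
            break
        S, H = S_new, H_new
    return H, S
```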
4 WEBNLP FRAMEWORK

In this section, we introduce the WebNLP framework to solve the webpage understanding problem. We first introduce the framework intuitively and describe the individual models within the framework formally. Then, we describe how we integrate the page structure understanding model and the text understanding model together in the framework. The parameter learning method and the label assignment procedure will be explained last.

4.1 Overview

The WebNLP framework consists of two components, i.e., a structure understanding component and a text understanding component. The observations of these two components are both from the webpage. The understanding results of one component can be used by the other component to make a decision. The information flows between the two components form a closed loop. The beginning of the loop is not very important. However, we will show that starting from the text understanding component is a good choice.

The structure understanding component assigns labels to the vision blocks in a webpage, considering visual layout features directly from the webpage together with the segments returned by the text understanding component. If the segments of the inner text are not available, it will work without such information. The text understanding component segments the text string within the vision block according to the statistical language features and the label of the vision block assigned by the structure understanding component. If the label of the vision block is not available, it can also work without such information. The two components run iteratively until some stop criteria are met. Such iterative optimization can boost the performance of both the structure understanding component and the text understanding component.

4.2 The Extended Models

As we introduced previously, the state-of-the-art models for webpage structure understanding and text understanding are the HCRF model and the Semi-CRF model, respectively. However, there is no way to make them interact with each other in their original forms. Therefore, we extend them by introducing additional input parameters to the feature functions. The original forms of the HCRF model and the Semi-CRF model have been introduced in Section 4. Therefore, we will only introduce the forms of the extended HCRF model and the extended Semi-CRF model in this section. We first extend the HCRF model by introducing other kinds of feature functions. These feature functions take the segmentation of the text strings as their input. Analogizing to the feature functions defined in Section 4.2, we use ek(H|t, X, S) to represent the feature functions having text string segmentation input. To simplify the expression, we use the functions defined on the triangle to represent all functions defined on the vertex, edge, or triangle. As the WebNLP framework is an iterative one, we further use the superscript j to indicate the decision in the j-th iteration.

4.3 Model Integration

We will analyze how these two models are integrated together in this section. Fig. 2 gives an illustrative example of the connection between the extended HCRF model and the extended Semi-CRF model in a webpage. It is generated based on the example webpage shown in Fig. 1. There are two types of connections in the integrated model. One is the connection between the vision node label and the segmentation of the inner text. The other is the connection between multiple mentions of a same named entity.

4.3.1 Vision Tree Node and Its Inner Text

The natural connection between the extended HCRF model and the extended Semi-CRF model is via the vision tree node and its inner text. The feature functions that connect the two models are rk(·) in the extended Semi-CRF model and ek(·) in the extended HCRF model. Feature function rk(·) in the extended Semi-CRF model takes the labeling results of the leaf node given by the extended HCRF model as its input. Feature function ek(·) in the extended HCRF model uses the segmentation and labeling results of the extended Semi-CRF model as its input.

4.3.2 Multiple Mentions

In many cases, a named entity has more than one mention within a webpage. Therefore, it is natural to collect evidence from all the different mentions of one same named entity to make a decision on all these occurrences together. The evidence from all the other mentions of a named entity is delivered to the vision tree node, where one of the mentions of the named entity lies, via feature function uk(·), when the extended Semi-CRF model is working. uk(·) can introduce the segmentation and labeling evidence from other occurrences of the text fragment all over the current webpage. By referencing the decision S^(j-1) over all the text strings in the last iteration, uk(·) can determine whether the same text fragment has been labeled as an ORGANIZATION elsewhere, or whether it has been given a label other than STREET. By this means, the evidence for a same named entity is shared among all its occurrences within the webpage.
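An illustrative multiple-mention feature in the spirit of uk(·) might look as follows; the signature is our assumption, not the paper's definition. It fires when the same text fragment received the label in question somewhere else in the previous iteration's decision S^(j-1).

```python
def u_k(fragment: str, label: str, prev_labels: dict) -> float:
    """prev_labels maps text fragments to the labels assigned in iteration j-1;
    the feature is 1.0 when another mention already carried this label."""
    return 1.0 if prev_labels.get(fragment) == label else 0.0
```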
not seen by BHS. However, it is a practical strategy to incorporate as many resources as possible, as long as these resources are easy to obtain and the algorithm can handle them. For the HNS algorithm, the additional corpus is easy to obtain, and it can be handled without too much effort.

The contribution of the multiple mention features is reflected by the difference between the MHS algorithm and the NHS algorithm. The multiple mention features helped the MHS algorithm to increase both the precision and recall of the business NAME compared with the NHS algorithm. However, we can see that the improvement is limited. It proves that the simple feature sharing mechanism could not fully utilize the information. The WebNLP framework gets the best numbers on all attributes and on the object as a whole. It amends the decrease of the recall of CITY and STATE by reusing part of the Semi-CRF model in BHS. The iterative labeling procedure greatly improved the recall of the business NAME. In our experiment, we found that two iterations were enough to make the labeling procedure converge. Therefore, the process of the WebNLP algorithm in this experiment was Semi-CRF → HCRF → Semi-CRF → HCRF → Semi-CRF.

We can also conclude from Table 1 that the object extraction benefits from the improvement of the attribute extraction, i.e., the extended Semi-CRF model helps the extended HCRF model to make a better decision on the object block extraction. Essentially, the object is described by its associated attributes.

6 CONCLUSIONS

Webpage understanding plays an important role in Web search and mining. It contains two main tasks, i.e., page structure understanding and natural language understanding. However, little work has been done toward an integrated statistical model for understanding webpage structures and processing natural language sentences within the HTML elements. In this paper, we introduced the WebNLP framework for webpage understanding. It enables bidirectional integration of page structure understanding and natural language understanding. Specifically, the WebNLP framework is composed of two models, i.e., the extended HCRF model for structure understanding and the extended Semi-CRF model for text understanding. The performance of both models can be boosted in the iterative optimization procedure. An auxiliary corpus is introduced to train the statistical language features in the extended Semi-CRF model for text understanding, and the multiple occurrence features are also used in the extended Semi-CRF model by adding the decision of the model in the last iteration. Therefore, the extended Semi-CRF model is improved by using both the labels of the vision nodes assigned by the HCRF model and the text segmentation and labeling results given by the extended Semi-CRF model itself in the last iteration as additional input parameters in some feature functions; the extended HCRF model benefits from the extended Semi-CRF model via using the segmentation and labeling results of the text strings explicitly in the feature functions. The WebNLP framework closes the loop in webpage understanding for the first time. The experimental results show that the WebNLP framework performs significantly better than the state-of-the-art algorithms on English local entity extraction and Chinese named entity extraction on webpages.

REFERENCES

[1] J. Cowie and W. Lehnert, "Information Extraction," Comm. ACM, vol. 39, no. 1, pp. 80-91, 1996.
[2] C. Cardie, "Empirical Methods in Information Extraction," AI Magazine, vol. 18, no. 4, pp. 65-80, 1997.
[3] R. Baumgartner, S. Flesca, and G. Gottlob, "Visual Web Information Extraction with Lixto," Proc. Conf. Very Large Data Bases (VLDB), pp. 119-128, 2001.
[4] A. Arasu and H. Garcia-Molina, "Extracting Structured Data from Web Pages," Proc. ACM SIGMOD, pp. 337-348, 2003.
[5] D.W. Embley, Y.S. Jiang, and Y.-K. Ng, "Record-Boundary Discovery in Web Documents," Proc. ACM SIGMOD, pp. 467-478, 1999.
[6] N. Kushmerick, "Wrapper Induction: Efficiency and Expressiveness," Artificial Intelligence, vol. 118, nos. 1/2, pp. 15-68, 2000.
[7] K. Lerman, S. Minton, and C.A. Knoblock, "Wrapper Maintenance: A Machine Learning Approach," J. Artificial Intelligence Research (JAIR), vol. 18, pp. 149-181, 2003.
[8] I. Muslea, S. Minton, and C.A. Knoblock, "Hierarchical Wrapper Induction for Semistructured Information Sources," Autonomous Agents and Multi-Agent Systems, vol. 4, nos. 1/2, pp. 93-114, 2001.
[9] J. Zhu, Z. Nie, J.-R. Wen, B. Zhang, and W.-Y. Ma, "Simultaneous Record Detection and Attribute Labeling in Web Data Extraction," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 494-503, 2006.
* II M.E. (CC), PSNA College of Engineering and Technology, Dindigul, Tamil Nadu.
Email: pcmathew_bmc@yahoo.com
** Lecturer, Department of Information Technology, PSNA College of Engineering and Technology, Dindigul, Tamil Nadu
from those images, while higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and performing the cognitive functions associated with human vision. In particular, digital image processing is the practical technology for areas such as image compression, classification, feature extraction, pattern recognition, projection, and multi-scale signal analysis.

II. COMPRESSION OF ENCRYPTED IMAGES

Security in communication systems has become increasingly important in recent times. The Internet has become a hostile environment, with both wired and wireless channels offering no inherent assurance of confidentiality. Strong encryption schemes, such as the Advanced Encryption Standard (AES), have been designed to provide confidentiality for arbitrary binary data. However, communications have become increasingly multimedia in nature, and such strong encryption schemes do not take into account the special characteristics of multimedia data and the way in which they are accessed. Images and video are typically large in size compared to text and audio, and often already consume significant computational resources at both the source and receiver for coding and decoding, respectively. Also, applications such as remote surveillance may involve the streaming of sensitive visual image data over untrusted networks. Confidentiality may be required, but blindly applying a strong encryption scheme such as AES would demand a prohibitive amount of computational resources for the large volume of real-time data. Other applications, such as online collaboration, may involve the use of power-limited mobile devices, such as mobile phones and personal digital assistants (PDAs) with embedded imaging capabilities, forming ad-hoc wireless networks. Most of the computational resources of these devices are dedicated to the coding and decoding of the visual data, making the application of schemes such as AES exceedingly difficult or impossible.

For secure transmission of data through the communication channel, the data is usually first compressed and then encrypted at the source; at the destination, the data is received and decrypted, followed by decompression [1]. This is illustrated in Fig. 1.

Fig. 1. Conventional approach for secure data transmission.

But this traditional method is not suitable for some applications. For example, suppose John wants to send information to Ken while Ben is the network provider, and John wants to keep the information confidential from Ben. In this situation, John encrypts the data using a simple cipher and gets it forwarded. Ben can then compress the data without accessing the secret key. If Ken holds the secret key used by John, he will be able to perform joint decryption and decompression. The overall performance of the system can thus be increased. This is illustrated in Fig. 2.

The rest of the paper is organized as follows. Section III covers the related work in this area. Section IV gives a detailed explanation of the proposed system. Section V concludes and discusses future work.

III. RELATED WORK

In the existing system, lossless compression of encrypted sources can be achieved through Slepian-Wolf coding [3]. For encrypted real-world sources, such as images, the key to improving the compression efficiency is how the source dependency is exploited. Trellis Coded Vector Quantization [3] can also be used for compressing encrypted image sources. It has been reported to produce good results for binary images, but challenges remain when it comes to practical real-world applications. The coding efficiency can be improved only by exploiting the source dependency. Both of these techniques have the following disadvantages:
• Markov decoding in Slepian-Wolf coding is computationally expensive.
• The source dependency is not fully utilized.
• Since image and video are highly nonstationary, the Markov model cannot describe their local statistics precisely.
• For 8-bit gray scale images, only the two most significant bit-planes are compressible by employing a 2-D Markov model in bit planes [10].

A. Encryption
Image encryption techniques try to convert an image to another one that is hard to understand. On the other hand, image decryption retrieves the original image from the encrypted one. There are various image encryption systems to encrypt and decrypt data, and no single encryption algorithm satisfies all the different image types.

Fig. 2. Secure transmission using compression of encrypted data.

Most of the algorithms specifically designed to encrypt digital images were proposed in the mid-1990s. There are two major groups of image encryption algorithms: (a) non-chaos selective methods and (b) chaos-based selective or non-selective methods. Most of these algorithms are designed for a specific image format, compressed or uncompressed, and some of them are even format compliant. There are methods that offer light encryption (degradation), while others offer a strong form of encryption. Some of the algorithms are scalable and have different modes ranging from degradation to strong encryption.

B. Image Compression
Data compression is one of the enabling technologies for every aspect of the multimedia revolution. Cellular phones would not be able to provide communication with increasing clarity without data compression. Data compression is the art and science of representing information in compact form. Uncompressed multimedia (graphics, audio and video) data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. In a distributed environment, large image files remain a major bottleneck within systems. Image compression is an important component of the solutions available for creating image file sizes of manageable and transmittable dimensions. Platform portability and performance are important in the selection of the compression/decompression technique to be employed.

Image compression has become increasingly important with the continuous development of the Internet, remote sensing and satellite communication techniques. Due to the high cost of providing a large transmission bandwidth and a huge amount of storage space, many fast and efficient image compression engines have been introduced.

In image processing applications such as web browsing, photography, image editing and printing, a lossy coding such as JPEG is sufficient as an image compression tool. Although some information loss can be tolerated in most of these applications, there are certain image processing applications that demand no pixel difference between the original and the reconstructed image. Such applications include medical imaging, remote sensing, satellite imaging and forensic analysis, where lossless compression is extremely important.

IV. PROPOSED WORK

In the proposed system, in order to achieve efficient compression of encrypted images, a Resolution Progressive Compression (RPC) scheme is used. Here the encryption is performed using the RSA algorithm. The scheme compresses an encrypted image progressively in resolution, such that the decoder can observe a low-resolution version of the image, study local statistics based on it, and use the statistics to decode the next resolution level. The success of the RPC scheme is due to enabling partial access to the current source at the decoder side, which improves the decoder's learning of the source statistics.

The encoder gets the ciphertext and decomposes it into four subimages, namely, the 00, 01, 10, and 11 sub-images. Each sub-image is a downsampled-by-two version of the encrypted image. When the decomposed image is obtained, we try to find a way to code the wavelet coefficients into an efficient result, taking redundancy and storage space into consideration.
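As a concrete illustration of this decomposition step, the following is a minimal Python sketch (our own illustration, not the paper's code): the four subimages are obtained by sampling pixels according to the parity of their row and column indices, so each is a downsampled-by-two version of the encrypted image.

# Minimal sketch of the 00/01/10/11 decomposition described above.
# Assumption: the encrypted image is a 2-D list of pixel values with even
# dimensions; subimage "rc" keeps pixels whose row index has parity r and
# whose column index has parity c.
def decompose(img):
    h, w = len(img), len(img[0])
    subs = {}
    for r in (0, 1):
        for c in (0, 1):
            subs[f"{r}{c}"] = [[img[i][j] for j in range(c, w, 2)]
                               for i in range(r, h, 2)]
    return subs

# Example: a 4x4 "ciphertext" yields four 2x2 subimages.
img = [[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]]
print(decompose(img)["00"])   # [[1, 3], [9, 11]]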
System Description

Fig. 3. Block Diagram of RPC Scheme (with a context adaptive interpolation block).

SPIHT is one of the most advanced schemes available, even outperforming the state-of-the-art JPEG 2000 in some situations. The basic principle is the same: a progressive coding is applied, processing the image with respect to a lowering threshold. The difference is in the concept of zerotrees (spatial orientation trees in SPIHT). This is an idea that takes the relationships between coefficients across subbands in different levels into consideration. The first idea is always the same: if a coefficient in the highest level of the transform in a particular subband is considered insignificant against a particular threshold, it is very probable that its descendants in lower levels will be insignificant too, so we can code quite a large group of coefficients with one symbol. In the SPIHT algorithm, each 2x2 block of coefficients in the root level corresponds to three trees of coefficients, as shown in Fig. 4. The coefficient at (i,j) is denoted as Ci,j. The following sets of coefficients are defined:
• O(i,j) is the set of coordinates of the children of the coefficient at (i,j).
• D(i,j) is the set of coordinates of all descendants of the coefficient at (i,j).
• H is the set of coordinates of all coefficients in the root level.
• L(i,j) = D(i,j) − O(i,j).
Given a threshold T = 2^n, a set of coefficients S is significant if there is a coefficient in S whose magnitude is at least T. Three lists are maintained by the algorithm:
1) list of insignificant sets (LIS);
2) list of insignificant pixels (LIP);
3) list of significant pixels (LSP).
The LIS contains two types of entries, representing the sets D(i,j) and L(i,j). The LIP is a list of insignificant coefficients that do not belong to any of the sets in the LIS. The LSP is a list of coefficients that have been identified as significant. The SPIHT algorithm encodes the wavelet coefficients by selecting a threshold T such that T ≤ max(i,j) |Ci,j| < 2T, where (i,j) ranges over all coordinates in the coefficient matrix. Initially, the LIP contains the coefficients in H, the LIS contains D(i,j) entries, where (i,j) are coordinates with descendants in H, and the LSP is empty. During the sorting pass, the significant coefficients in the LIS are identified by partitioning the sets D(i,j) into L(i,j) and the individual coefficients in O(i,j), or L(i,j) into D(k,l), where (k,l) ∈ O(i,j). During the refinement pass, all coefficients in the LSP that have been identified as significant in previous passes are refined in a way similar to binary search. Each significant coefficient is moved to the LSP. The threshold is decreased by a factor of two, and the above steps are repeated. The encoding process stops when the desired bit rate is reached. The output is fully embedded, so that the output at a higher bit rate contains the output at all lower bit rates embedded at the beginning of the data stream.
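A hedged Python sketch of the set definitions and significance test above (our own illustration; it assumes a square, power-of-two-sized coefficient matrix with the usual SPIHT child layout, where the children of (i,j) sit at (2i,2j), (2i,2j+1), (2i+1,2j), (2i+1,2j+1), and the top-left corner of each root block has no descendants):

import random

def O(i, j, n):
    # children of (i, j); not valid for the descendant-free (0, 0) corner
    kids = [(2*i, 2*j), (2*i, 2*j + 1), (2*i + 1, 2*j), (2*i + 1, 2*j + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]

def D(i, j, n):
    # all descendants of (i, j), collected level by level
    out, stack = [], O(i, j, n)
    while stack:
        a, b = stack.pop()
        out.append((a, b))
        stack.extend(O(a, b, n))
    return out

def significant(coeffs, coords, T):
    # a set S is significant w.r.t. threshold T if some |C_ij| >= T
    return any(abs(coeffs[a][b]) >= T for a, b in coords)

n = 8
coeffs = [[random.randint(-63, 63) for _ in range(n)] for _ in range(n)]
print(significant(coeffs, D(0, 1, n), 32))   # tests the tree rooted at (0, 1)

With these pieces, the sorting pass is the loop that moves coordinates between the LIP, LIS, and LSP by calling significant() on D(i,j) and L(i,j) = D(i,j) minus O(i,j), halving T each round.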
LSBs are sent prior to the MSBs, so that the decoder can have better knowledge about the pixels before starting to decode the MSBs. The four MSBs, on the other hand, are Slepian-Wolf encoded using rate-compatible punctured turbo codes in a bit-plane based fashion. The sending rate of each Slepian-Wolf coded bit-plane is determined by the decoder's feedback.

V. CONCLUSION

An efficient scheme for compression of encrypted image data was proposed, employing the SPIHT compression algorithm and the RSA algorithm. This method provides better coding efficiency and less computational complexity than existing approaches. The technique allows only partial access to current sources at the decoder side. In the future, we could extend it to compression of encrypted videos, where the Resolution Progressive Compression scheme can be used for interframe and intraframe correlation learning at the decoder side.

REFERENCES

[1] M. Johnson, P. Ishwar, V. M. Prabhakaran, D. Schonberg, and K. Ramchandran, "On compressing encrypted data," IEEE Trans. Signal Process., vol. 52, no. 10, pp. 2992-3006, Oct. 2004.
[2] A. Liveris, Z. Xiong, and C. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440-442, Oct. 2002.
[3] Y. Yang, V. Stankovic, and Z. Xiong, "Image encryption and data hiding: Duality and code designs," in Proc. Inf. Theory Workshop, Lake Tahoe, CA, Sep. 2007, pp. 295-300.
[4] D. Schonberg, "Practical Distributed Source Coding and its Application to the Compression of Encrypted Data," Ph.D. dissertation, Univ. California, Berkeley, 2007.
[5] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. IT-19, pp. 471-480, Jul. 1973.
[6] Li-Minn Ang and Kah Phooi Seng, "Lossless Image Compression using Tuned Degree-K Zerotree Wavelet Coding," Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[7] A. A. Kassim and W. S. Lee, "Embedded Color Image Coding Using SPIHT With Partially Linked Spatial Orientation Trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, pp. 203-206, 2003.
[8] Q. Yao, W. Zeng, and W. Liu, "Multi-resolution based hybrid spatiotemporal compression of encrypted videos," in Proc. IEEE Int. Conf. Acoust., Speech and Sig. Process., Taipei, Taiwan, R.O.C., Apr. 2009, pp. 725-728.
[9] J. Bajcsy and P. Mitran, "Coding for the Slepian-Wolf problem with turbo codes," in Proc. IEEE Global Telecommun. Conf., San Antonio, TX, Nov. 2001, pp. 1400-1404.
[10] W. Liu, W. Zeng, L. Dong, and Q. Yao, "Efficient Compression of Encrypted Grayscale Images," IEEE Trans. Image Processing, vol. 19, no. 4, Apr. 2010.
[11] J. J. Amador and R. W. Green, "Symmetric-Key Block Cipher for Image and Text Cryptography," International Journal of Imaging Systems and Technology, no. 3, pp. 178-188, 2005.
[12] M. J. Weinberger, J. J. Rissanen, and R. B. Arps, "Applications of universal context modeling to lossless compression of gray-scale images," IEEE Trans. Image Processing, vol. 5, pp. 575-586, Apr. 1996.
only simple clones is known in the field. The main problem is the huge number of simple clones typically reported by clone detection tools. There have been a number of attempts to move beyond the raw data of simple clones. We observed that at the core of structural clones there are often simple clones that coexist and relate to each other in certain ways.

We proposed a technique to detect some specific types of structural clones from the repeated combinations of colocated simple clones. We implemented the structural clone detection technique in a tool called CCFinder, implemented in C++. It has its own token-based simple clone detector. Our structural clone detection technique works with the information of simple clones, which may come from any clone detection tool. It only requires the knowledge of simple clone sets and the location of their instances in programs. As structural clones often represent some domain or design concepts, their knowledge helps in program understanding, and their detection opens new options for design recovery that are both practical and scalable. Representing these repeated program structures of large granularity in a generic form also offers interesting opportunities for reuse, and their detection becomes useful in the reengineering of legacy systems for better maintenance.

We can find clone patterns in different units of code, whether methods, classes, components or modules, gaining useful insights into the cloning situation at different levels of abstraction. We initially tried this approach at the file level, by finding the frequently occurring clone patterns in different files and analyzing those patterns, with promising results. By detecting the frequently co-occurring clone classes in different files, we can isolate the groups of files that have strong similarity with each other. This is achieved by a clustering algorithm that we have devised for this particular problem. These clusters of highly similar files form basic structural clones.
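As a toy illustration of this idea (our own sketch, not the tool's implementation), frequently co-occurring clone classes can be found by treating each file as the set of simple clone class ids it contains and counting the support of each combination; the file and clone-class names below are hypothetical.

from itertools import combinations
from collections import defaultdict

def frequent_clone_patterns(file_clones, min_support=2, max_size=3):
    """file_clones: {filename: set of simple-clone-class ids}.
    Returns {pattern (frozenset of clone classes): set of supporting files}."""
    support = defaultdict(set)
    for fname, classes in file_clones.items():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(classes), size):
                support[frozenset(combo)].add(fname)
    return {p: files for p, files in support.items()
            if len(files) >= min_support}

files = {
    "editStaff.php":   {"SC1", "SC2", "SC3"},
    "editTask.php":    {"SC1", "SC2", "SC3"},
    "editProject.php": {"SC1", "SC2"},
}
for pattern, hits in frequent_clone_patterns(files).items():
    print(sorted(pattern), "->", sorted(hits))

A production miner would use a frequent closed item set algorithm instead of this exhaustive enumeration, but the input/output shape is the same: patterns of co-located clone classes together with the files that support them.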
The remainder of this paper is organized as follows: In Section 2, we define the types of structural clones, the higher-level similarities in programs. Section 3 describes our detection of structural clones with data mining. Section 4 describes the implementation of CCFinder. Section 5 describes a mechanism to create a generic representation of the structural clones found in the system, for better maintenance and reuse. Section 6 presents the related work in higher-level similarities and design recovery. Section 8 concludes the paper and presents future work.

2 Structural Clones: Higher-Level Similarities in Programs

We describe in detail the phenomenon of higher-level similarities, which we call structural clones. We define structural clones as similar program structures that can be analyzed hierarchically, at many levels of abstraction, with similar code fragments at the bottom of such a hierarchy. Locating these higher-level similarities can have significant value for program understanding, evolution, reuse, and reengineering.

2.1 From Simple Clones to Structural Clones

We primarily focus on similarity patterns representing design concepts or solutions that can be of significant importance in the context of understanding, maintaining, reengineering or reusing programs. We use the term structural clone to mean similar program structures that are configurations of lower-level similar program entities. Therefore, our structural clones may form a hierarchy of clones, with cloned code fragments at the bottom level.

2.1.1 File-Level Structural Clones

Fig. 1. File-level structural clone.

Functions shown in the same shade are clones of each other (e.g., staff_fn1, task_fn1, project_fn1).
The relationship between the functions is 'same file', which holds between fragments of the same file, regardless of the order in which they appear. The three host files editStaff.php, editTask.php and editProject.php perform similar tasks, but belong to three different modules (i.e., the Staff module, Task module, and Project module). Provided these structural clones cover a substantial portion of the host files, we can consider the three files as abstract entities that are clones of each other, as discussed in the previous section. This illustrates how the concept of structural clones helps us to move from smaller entities (in this case, functions) to larger entities (in this case, files). These files can now be considered as entities in forming a higher-level structure, where each structure has the four files create[M].php, display[M].php, edit[M].php, delete[M].php ([M] = Staff, Task, Project) as entities and the relationship 'same folder' among the entities (relationships are not shown in the figure). Note that the module Project does not carry a deleteProject.php file. Still, there was enough similarity among Project and the other modules to consider all of them as structural clones of each other.

2.1.3 Multiple Structural Clones in the Same File
2.1.4 Crosscutting Structural Clones
The output from FCIM is in the format shown in Figure 7. Each row represents one frequent clone pattern along with its support count, indicating the number of files containing this clone pattern; the interpretation is likewise for the other data rows.
Some clones may overlap in a file, as discussed earlier, so we cannot simply add up the sizes of all clones in a pattern to find the file coverage. The clustering based on these values and other parameters can also be made totally customizable to suit the needs of different users. Currently, we let the user specify minimum FPC and FTC values to indicate the significance of a cluster. The cluster will be considered significant even if one file has an FPC or FTC value greater than the threshold values. The expected output is to find all the significant clusters that cover the maximum number of files, where preferably no file is repeated in two clusters.

The output is generated in the form of text files so that any visualization tool developed in the future can easily interface with Clone Miner. For performance evaluation, we ran CCFinder on the full J2SE 1.5 source code, consisting of 6,558 source files in 370 directories, 625,096 LOC (excluding comments and blank lines), and 70,285 methods, using different values of minimum clone size. For forming FCSets and MCSets, a value of 50 tokens is used for the clustering parameter minLen, where the length is measured in terms of tokens. Likewise, for minCover, a value of 50 percent is used in all cases. The tests were run on a Pentium IV machine with a 3.0 GHz processor and 1 GB RAM. Each time it took around two to three minutes to run the whole process, from finding simple clones to the analysis of files, methods, and directories for structural clones.
configurability, and the Diff feature can aid the user in the finer details of refactoring. This proposed method is somewhat general due to the varying objectives; it gives a basic framework for the analysis process.

6 Related Works

Clone detection tools produce an overwhelming volume of simple clones' data that is difficult to analyze in order to find useful clones. This problem prompted different solutions that are related to our idea of detecting structural clones. One family of solutions is clone detection techniques using Program Dependence Graphs (PDGs). In addition to the simple clones, these tools can also detect noncontiguous clones, where the segments of a clone are connected by control and data dependency information links. Such clones also fall under the premise of structural clones. While our technique detects structural clones with segments related to each other based only on their colocation, with or without information links, the PDG-based techniques relate them using the information links only. Moreover, the clustering mechanism in Clone Miner, which identifies groups of highly similar methods, files, or directories based on their contained clones, is missing from these techniques.

PR-Miner is another tool that discovers implicit programming rules using the frequent item set technique. Compared to the structural clones found by Clone Miner, these programming rules are much smaller entities, usually confined to a couple of function calls within a function. The work by Ammons et al. is also similar, finding the frequent interaction patterns of a piece of code with an API or an ADT, and representing them in the form of a state machine. These frequent interaction patterns may appear as a special type of structural clone, in which the dynamic relationship of cloned entities is considered. Similar to Clone Miner, this tool also helps in avoiding update anomalies, though only in the context of anomalies to the frequent interaction patterns.

We have described structural clones as a configuration of lower-level clones. We presented a technique for detecting structural clones. The process starts by finding simple clones (that is, similar code fragments). Increasingly higher-level similarities are then found incrementally using the data mining technique of finding frequent closed item sets, and clustering. We implemented the structural clone detection technique in a tool called Clone Miner. While Clone Miner can also detect simple clones, its underlying structural clone detection technique can work with the output from any simple clone detector. Structural clone information leads to better program understanding, maintenance, reengineering and reuse.

8 References

[1] H.A. Basit and S. Jarzabek, "Detecting Higher-Level Similarity Patterns in Programs," Proc. European Software Eng. Conf. and ACM SIGSOFT Symp. Foundations of Software Eng., pp. 156-165, Sept. 2005.
[2] J.R. Cordy, "Comprehending Reality: Practical Barriers to Industrial Adoption of Software Maintenance Automation," Proc. 11th IEEE Int'l Workshop Program Comprehension (keynote paper), pp. 196-206, 2003.
[3] A. De Lucia, G. Scanniello, and G. Tortora, "Identifying Clones in Dynamic Web Sites Using Similarity Thresholds," Proc. Int'l Conf. Enterprise Information Systems, pp. 391-396, 2004.
[4] J.Y. Gil and I. Maman, "Micro Patterns in Java Code," Proc. 20th Object Oriented Programming Systems Languages and Applications, pp. 97-116, 2005.
[5] G. Grahne and J. Zhu, "Efficiently Using Prefix-Trees in Mining Frequent Itemsets," Proc. First IEEE ICDM Workshop Frequent Itemset Mining Implementations, Nov. 2003.
customer data that are predictive. For example, a firm could use data mining to predict the behavior surrounding a particular lifecycle event (e.g., retirement), find other people in similar life stages, and determine which customers are following similar behavior patterns.

Analysis of Data Mining in CRM

Problem Context: The maximization of the lifetime values of the (entire) customer base, in the context of a company's strategy, is a key objective of CRM. Various processes and personnel in an organization must adopt CRM practices that are aligned with corporate goals. For each institution, corporate strategies such as diversification, coverage of market niches or minimization of operating costs are implemented by "measures", such as mass customization, segment-specific product configurations etc. The role of CRM is in supporting customer-related strategic measures.

Customer understanding is the core of CRM. It is the basis for maximizing customer lifetime value, which in turn encompasses customer segmentation and actions to maximize customer conversion, retention, loyalty and profitability. Proper customer understanding and actionability lead to increased customer lifetime value. Incorrect customer understanding can lead to hazardous actions. Similarly, unfocused actions, such as unbounded attempts to access or retain all customers, can lead to a decrease in customer lifetime value (law of diminishing returns). Hence, emphasis should be put on correct customer understanding and concerted actions derived from it.

Figure 1. The Basic CRM Cycle.

In Figure 1, boxes represent actions:
• The customer takes the initiative of contacting the company, e.g. to purchase something, to ask for after-sales support, to make a reclamation or a suggestion, etc.
• The company takes the initiative of contacting the customer, e.g. by launching a marketing campaign, or selling in an electronic store or a brick-and-mortar store.
• The company takes the initiative of understanding the customers by analyzing the information available from the other two types of action. The results of this understanding guide the future behaviour of the company towards the customer, both when it contacts the customer and when the customer contacts it.

The reality of CRM, especially in large companies, looks quite different from the central coordination and integration suggested by Figure 1:
• Information about customers flows into the company from many channels, but not all of them are intended for the acquisition of customer-related knowledge.
• Information about customers is actively gathered to support well-planned customer-related actions, such as marketing campaigns and the launching of new products. The knowledge acquired as the result of these actions is not always juxtaposed with the original assumptions, often because the action-taking organizational unit is different from the information-gathering unit. In many cases, neither the original information nor the derived knowledge is made available outside the borders of the organizational unit(s) involved. Sometimes, not even their existence is known.
• The limited availability of customer-related information and knowledge has several causes. Political reasons, e.g. rivalry among organizational units, are known to often lead to data and knowledge hoarding. A frequently expressed concern of data owners is that data, especially in aggregated form,
3. Edelstein H. Data mining: exploiting the hidden trends in your data. DB2 Online Magazine. http://www.db2mag.com/9701edel.htm
4. Data Intelligence Group, Pilot Software. An overview of data mining at Dun & Bradstreet. DIG White Paper, 1995. http://www3.shore.net/_kht/text/wp9501/wp9501.shtml
5. Decision Support Solutions: Compaq. Object relational data mining technology for a competitive advantage. http://www.tandem.com/brfs—wps?odmadvwp/odmadvwp.htm
6. Garvin M. Data mining and the web: what they can do together. DMReview.com, 1998. http://www.dmreview.com/editorial/dmreview/print—
9. Hill L. CRM: easier said than done. Intelligent Enterprise 1999;2(18):53.
10. Thearling K. Data mining and CRM: zeroing in on your best customers. DM Direct, December 1999. http://www.dmreview.com/editorial/dmreview/print—action.cfm?EdID
11. Chablo E, Marketing Director, smartFOCUS Limited. The importance of marketing data intelligence in delivering successful CRM, 1999. http://www.crm-forum.com/crm—forum—white—papers/mdi/sld01.htm
12. Thearling K, Exchange Applications, Inc. Increasing customer value by integrating data mining and campaign management software. Exchange Applications White Paper, 1998. http://www.crmforum.com/crm—forum—white—papers/icv/sld01.html
time the object updates its location (whether client-initiated or server-initiated), the index should be optimized to handle frequent updates.

3.3 Query Processor and Location Manager

The location manager computes the safe region of an object. The safe region is based on the quarantine line. The location manager recomputes the safe region in two cases: 1) after a new query is evaluated; 2) after an object sends a location update. The query processor evaluates the query using the kNN algorithm (Algorithm 2).

The quarantine areas overlap this cell, as the p.srQ for any rest query is the cell itself. These overlapping queries are called relevant queries and are exactly pointed to by the bucket of this cell in the query index.
region p.sr and the safe region for this new query Q, i.e., p.sr′ = p.sr ∩ p.srQ. 2) After processing a source-initiated location update of object p, p's safe region needs to be completely recomputed by computing p.srQ for each relevant query. 3) During the processing of a source-initiated location update, if object p is probed, its safe region is also completely recomputed as in case 2. Although it is still a probe as in case 1 and only one p.srQ changes (i.e., the query which probes p), we completely recompute p.sr, since p.srQ could be enlarged by this probe, and recomputing it allows such enlargement to contribute to p.sr. The objective of a safe region is to reduce the number of location updates.

Algorithm 2: Evaluating a new kNN Query
Input: root: root node of the object index; q: the query point
Output: C: the set of kNNs
Procedure:
1: initialize queues H and h;
2: enqueue <root, d(q,root)> into H;
3: while |C| < k and H is not empty do
4:   u = H.pop();
5:   if u is a leaf entry then
6:     while d(q,u) > D(q,v) do
7:       v = h.pop();
8:       insert v into C;
9:     enqueue u into h;
10:  else if u is an index entry then
11:    for each child entry v of u do
12:      enqueue <v, d(v,q)> into H;

The query is evaluated using Algorithm 2. The algorithm maintains an additional priority queue h besides H. It is a priority queue of objects sorted by the "closer" relation. The reason to introduce h is that when an object p is popped from H, it is not guaranteed to be a kNN in the new space. Therefore, h is used to hold p until it can be guaranteed to be a kNN. This occurs when another object p′ is popped from H and its minimum distance to q (d(q,p′)) is larger than the maximum distance of p to q (D(q,p)). In general, when an object u is popped from H, we need to do the following. If d(q,u) is larger than D(q,v), where v is the top object in h, then v is guaranteed to be a kNN and is removed from h. Then, d(q,u) is compared with the next D(q,v) until it is no longer the larger one. Then, u itself is inserted into h, and the algorithm continues to pop the next entry from H. The algorithm continues until k objects are returned.
12: enqueue<v, d(v,q)>into H; evaluating a 1NN query at the same query
point to find a new result object u. The
evaluation of such a query is almost the same
The query is evaluated using the algorithm 2 as Algorithm 2, except that all existing kNN
the algorithm maintains an additional priority result objects are not considered. The final
queue h besides H. It is a priority queue of step of reevaluation is to locate the order of
objects sorted by the ―closer‖ relation. The new
reason to introduce h is that when an object p result object p in the kNN set. This is done by
is popped from H, it is not guaranteed a kNN comparing it with other existing objects in the
in the new space. Therefore, H is used to hold kNN set using the ―closer‖ relation. For cases 1
p until it can be guaranteed a kNN. This and 2, since this object is a new result object,
occurs when another object p‘ is popped from the comparison should start from the kth NN,
H, and its minimum distance to q (d(q,p‘)) is then
larger than the maximum distance of p to q
[Results graph: values (0 to 0.8) versus k (overall, 2-5) for Proposed (No Dummy) and Proposed (Dummy); the graph reports the location anonymity set found.]
find a party trusted by all, which can read all the private tuples for computations without leaking the tuples. We focus in this paper on two tuple matching problems, Privacy-Preserving Duplicate Tuple Matching (PPDTM) and Privacy-Preserving Threshold Attributes Matching (PPTAM), and address their privacy protection issues without the existence of a trusted party.

Motivations for PPDTM

PPDTM has numerous applications. For example, a foundation providing grants to support participation in academic conferences usually does not accept an application that is simultaneously submitted to another foundation (duplicate submission). It may want to find out all duplicate applications and remove them from its pool: denoted by a tuple with attributes such as the abbreviated conference name, abbreviated academic paper title, and the first author's name, a submission in this foundation can be checked by tuple matching with those in other foundations. However, all foundations should do this under protection of applicants' personal information, following privacy policies.

Motivations for PPTAM

PPTAM can be used to securely find out regular rules in a temporal database composed of timestamped transactions. The temporal database can be weekly sales data of a supermarket, investment profits of a financial institution, etc. Some regular rules may be interesting for forecasting and decision support of the database owners, e.g., "Every day 6 to 7 PM, more than 100 bottles of beer are sold." The rules considered in this paper are about whether each single item (beer, chips, etc.) has a threshold count in a regular time interval, so they are different from the cyclic association rules in [27], which are regular rules among associated items (beer → chips), such as "Every day 6 to 7 PM, 87 percent of customers who buy beer also buy chips." We consider the scenario that several retail stores ally to stand in competition with some retailing magnate, and they want to find out the regular rules over the union of their databases without publishing their databases, for privacy concerns.

Our Contributions

Our main contributions in this paper include:
1. By calculating the total exps (modular exponentiations) and muls (modular multiplications), and the total communication bits, our PPDTM protocol for the semihonest model has a lower computation cost than the related solution, by trading off communication cost. Since the computation cost dominates the whole cost, our protocol is faster than that solution in a distributed system.
2. Our PPTAM protocol for the semihonest model has lower computation and communication costs than the related solution derived by the techniques in [22] and [23].
3. By constructing the required zero-knowledge proofs, we extend the PPDTM and PPTAM protocols to the malicious model. Some of these zero-knowledge proofs were also mentioned in [23], but detailed constructions were not given. In addition, the Proof of Correct Polynomial Evaluation (POCPE) is not considered in [23], without which an adversary can ask for decryptions of any useful information for itself. In comparison with the solutions derived from the techniques in [23], our PPDTM protocol in the malicious model has the same magnitude of costs, and our PPTAM protocol in the malicious model has lower costs in both computation and communication.

PRIVACY-PRESERVING DUPLICATE TUPLE MATCHING

Main Idea

Denoting T(i,j) as T(i,j)_1 ∥ ... ∥ T(i,j)_M, P_i can compute a polynomial f_i to represent its inputs T_i: f_i = (x − T(i,1)) ··· (x − T(i,S)) mod N. Then, each coefficient of f_i is in Z_N, and can be encrypted to get E(f_i). If T(i,j) has a duplicate on some P_i′ (i′ ≠ i), its evaluation on the polynomial G_i = ∏_{i′ ∈ {0,...,N−1}, i′ ≠ i} f_i′ is 0. If T(i,j) has no duplicate, we make the evaluation a random number by randomizing G_i as F_i in (2), where r is a random number over Z_N. To prevent an adversary from factoring F_i to get the honest parties' inputs, we need to encrypt F_i, evaluate T(i,j) for j = 1,...,S in E(F_i), and only decrypt the evaluations. If T(i,j) has a duplicate, the decryption will be 0 in the Paillier scheme. If it has no duplicate, by Lemma 1, the decryption will be a random number.
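To make the mechanics of the PPDTM idea above concrete, here is a toy, self-contained sketch (our own illustration with textbook Paillier, insecurely small parameters, a single encrypting party, and no zero-knowledge proofs; it is not the paper's protocol): one party encrypts the coefficients of f(x) = ∏_j (x − t_j) mod N, a probe value t is evaluated homomorphically and blinded by a random exponent, and the decryption is 0 exactly when t is a duplicate.

import random

p, q = 1789, 1861              # toy primes; real deployments need large ones
N, N2 = p * q, (p * q) ** 2
phi = (p - 1) * (q - 1)
g = N + 1                      # standard Paillier generator choice

def enc(m):
    r = random.randrange(1, N)
    return pow(g, m, N2) * pow(r, N, N2) % N2

def dec(c):
    u = pow(c, phi, N2)
    return (u - 1) // N * pow(phi, -1, N) % N

def poly_from_roots(roots):
    # coefficients of prod_j (x - t_j) mod N, lowest degree first
    c = [1]
    for t in roots:
        nxt = [0] * (len(c) + 1)
        for k, a in enumerate(c):
            nxt[k + 1] = (nxt[k + 1] + a) % N   # a * x^(k+1)
            nxt[k] = (nxt[k] - t * a) % N       # -t * a * x^k
        c = nxt
    return c

def eval_encrypted(enc_coeffs, t):
    # homomorphic evaluation: E(f(t)) = prod_k E(c_k)^(t^k) mod N^2
    acc, tk = 1, 1
    for ec in enc_coeffs:
        acc = acc * pow(ec, tk, N2) % N2
        tk = tk * t % N
    return acc

enc_coeffs = [enc(a) for a in poly_from_roots([101, 202, 303])]  # P_i's tuples
for probe in (202, 555):
    r = random.randrange(1, N)
    blinded = pow(eval_encrypted(enc_coeffs, probe), r, N2)      # E(r * f(t))
    print(probe, "duplicate" if dec(blinded) == 0 else "no match (random)")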
PRIVACY-PRESERVING THRESHOLD ATTRIBUTES MATCHING

Main Idea

As defined in Section 1, for the PPTAM problem, each P_i needs to privately determine whether T(i,j) ∈ T. We will treat each attribute of a tuple as an individual input.
the accuracy of the suspicion assignment for each client session, DDoS Shield can provide efficient session schedulers for defending against possible DDoS attacks. However, the overhead of per-session validation is not negligible, especially for services with dense traffic. CAPTCHA-based defenses introduce additional service delays for legitimate clients and are also restricted to human-interaction services. A kernel observation and brief summary of our method is: the identification of attackers can be much faster if we find them out by testing the clients in groups instead of one by one. Thus, the key problem is how to group clients and assign them to different server machines in a sophisticated way, so that if any server is found under attack, we can immediately identify and filter the attackers out of its client set. Apparently, this problem resembles group testing (GT) theory, which aims to discover defective items in a large population with the minimum number of tests, where each test is applied to a subset of items, called a pool, instead of testing the items one by one. Therefore, we apply GT theory to this network security issue and propose specific algorithms and protocols to achieve high detection performance in terms of short detection latency and low false positive/negative rates. Since the detections are merely based on the status of service resource usage of the victim servers, no individual signature-based authentications or data classifications are required; thus, it may overcome the limitations of the current solutions. GT was proposed during World War Two and has been applied to many areas since then, such as medical testing, computer networks, and molecular biology. The advantages of GT lie in its prompt testing efficiency and fault-tolerant decoding methods. To the best of our knowledge, the first attempts to apply GT to networking attack defense were proposed in parallel by Thai et al. (the preliminary work of this paper) and Khattab et al. The latter proposed a detection system based on "Reduced-Randomness Nonadaptive Combinatorial Group Testing". However, since this method only counts the number of incoming requests rather than monitoring the server status, it is restricted to defending against high-rate DoS attacks and cannot handle high-workload ones.

From a system viewpoint, our defense scheme is to embed multiple virtual servers within each physical back-end server and map these virtual servers to the testing pools in GT, then assign clients into these pools by distributing their service requests to different virtual servers. By periodically monitoring some indicators of resource usage in each server (e.g., average response time) and comparing them with some dynamic thresholds, all the virtual servers can be judged as "safe" or "under attack." By means of the decoding algorithm of GT, all the attackers can be identified. Therefore, the biggest challenges of this method are threefold: 1) how to construct a testing matrix to enable prompt and accurate detection; 2) how to regulate the service requests to match the matrix in a practical system; 3) how to establish proper thresholds for server resource usage indicators, to generate accurate test outcomes. Similar to all the earlier applications of GT, this new application to network security requires modifications of the classical GT model and algorithms, so as to overcome the obstacles of applying the theoretical models to practical scenarios. Specifically, the classical GT theory assumes that each pool can have as many items as needed and that the number of pools for testing is unrestricted. However, in order to provide real application services, virtual servers cannot have infinite quantity or capacity, i.e., constraints on these two parameters are required to complete our testing model.

Our main contributions in this paper are as follows:
• Propose a new size-constrained GT model for practical DoS detection scenarios.
• Provide an end-to-end underlying system for GT-based schemes, without introducing complexity at the network core.
• Provide multiple dynamic thresholds for resource usage indicators, which help avoid erroneous tests caused by legitimate bursts and diagnose servers handling various numbers of clients.
• Present three novel detection algorithms based on the proposed system, and show their high efficiencies in terms of detection delay and false positive/negative rate via theoretical analysis and simulations.
Besides application DoS attacks, our defense system is applicable to DoS attacks on other layers, e.g., the protocol-layer SYN flood attack, where victim servers are exhausted by massive half-open connections. Although these attacks occur in different layers and in different styles, the victim machines will gradually run out of service resources and indicate anomaly. Since our mechanism only relies on the feedback of the victims, instead of monitoring client behaviors or properties, it is promising for tackling these attack types.

The paper is organized as follows: In Section 2, we briefly introduce some preliminaries of GT theory, as well as the attacker model and the victim/detection model of our system. In Section 3, we propose the detection strategy derived from the adjusted GT model and illustrate the detailed components of the presented system. In Section 4 we present simulation results, and finally we reach our conclusion by summarizing our contributions and providing further discussion of the false positive/negative rate.

2 PRELIMINARY

2.1 Classic Group Testing Model

2.1.1 Basic Idea
The classic GT model consists of t pools and n items (including at most d positive ones). As shown in Fig. 1, this model can be represented by a t × n binary matrix M, where rows represent the pools and columns represent the items. An entry M[i,j] = 1 if and only if the ith pool contains the jth item; otherwise, M[i,j] = 0. The t-dimensional binary column vector V denotes the test outcomes of these t pools, where a 1-entry represents a positive outcome and a 0-entry represents a negative one. Note that a positive outcome indicates that at least one positive item exists within this pool, whereas a negative one means that all the items in the current pool are negative.

2.1.2 Classic Methods
Two traditional GT methods are adaptive and nonadaptive. Adaptive methods, a.k.a. sequential GT, use the results of previous tests to determine the pool for the next test and complete the test within several rounds, while nonadaptive GT methods employ a d-disjunct matrix, run multiple tests in parallel, and finish the test within only one round. We investigate both of these methods and propose three algorithms accordingly.

2.1.3 Decoding Algorithms
For sequential GT, at the end of each round, items in negative pools are identified as negative, while the ones in positive pools require further testing. Notice that an item is identified as positive only if it is the only item in a positive pool. Nonadaptive GT takes d-disjunct matrices as the testing matrix M, where no column is contained in the Boolean summation of any other d columns. Du and Hwang proposed a simple decoding algorithm for this matrix type. A sketch of this algorithm can be shown using Fig. 1 as an example. Outcomes V[3] and V[4] are 0, so the items in pool 3 and pool 4 are negative, i.e., items 3, 4, and 5 are negative. If this matrix M is a d-disjunct matrix, items other than those appearing in the negative pools are positive; therefore, items 1 and 2 are the positive ones.
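A minimal sketch of this naive decoding rule (our own illustration, assuming a d-disjunct testing matrix): every item that appears in some negative pool is cleared, and all remaining items are reported positive.

# Naive nonadaptive group-testing decoder sketched above: given a d-disjunct
# t x n matrix M and outcome vector V, items covered by a negative pool are
# negative; everything left over is positive.
def decode(M, V):
    t, n = len(M), len(M[0])
    negative = set()
    for i in range(t):
        if V[i] == 0:                      # pool i tested negative
            negative |= {j for j in range(n) if M[i][j] == 1}
    return [j for j in range(n) if j not in negative]

# Hypothetical 4x5 matrix consistent with the Fig. 1 description in the text
# (the actual figure is not reproduced here): pools 3 and 4 are negative,
# clearing items 3-5 and leaving items 1 and 2 positive (0-indexed below).
M = [[1, 0, 1, 0, 1],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0],
     [0, 0, 0, 1, 1]]
V = [1, 1, 0, 0]
print(decode(M, V))   # [0, 1]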
2.1.4 Application to Attack Detection
A detection model based on GT can assume that there are t virtual servers and n clients, among which d clients are attackers. Considering the matrix in Fig. 1, the clients can be mapped into the columns and the virtual servers into the rows of M, where M[i,j] = 1 if and only if the requests from client j are distributed to virtual server i. With regard to the test outcome column V, we have V[i] = 1 if and only if virtual server i has received malicious requests from at least one attacker, but we cannot identify the attackers at once unless this virtual server is handling only one client. Otherwise, if V[i] = 0, all the clients assigned to server i are legitimate.

Fig. 1. Binary testing matrix M and testing outcome vector V.
The d attackers can then be captured by decoding the test outcome vector V and the matrix M.

This matches the characteristics of this attack. Due to the benefits of the virtual servers we employ, this constraint can be relaxed, but we keep it for the theoretical analysis in the current work.
virtual servers within this machine by means of a virtual switch. This distribution depends on the testing matrix generated by the detection algorithm. By periodically monitoring the average response time to service requests and comparing it with specific thresholds fetched from a legitimate profile, each virtual server is associated with a "negative" or "positive" outcome. Therefore, a decision over the identities of all clients can be made among all physical servers, as discussed further in Section 3.

Fig. 3. Two-state diagram of the system.

3 STRATEGY AND DETECTION SYSTEM

3.1 Size-Constrained Group Testing
As mentioned in the detection model, each testing pool is mapped to a virtual server within a back-end server machine. Although the maximum number of virtual servers can be extremely large, since each virtual server requires enough service resources to manage client requests, it is practical to have the virtual server quantity (maximum number of servers) and capacity (maximum number of clients that can be handled in parallel) constrained by two input parameters K and w, respectively. Therefore, the traditional GT model is extended with these constraints to match our system setting. The maximum number of attackers d is assumed known beforehand. Scenarios with nondeterministic d are out of the scope of this paper. In fact, these scenarios can be readily handled by first testing with an estimated d, then increasing d if exactly d positive items are found.

3.2 Detection System
The implementation difficulties of our detection scheme are threefold: how to construct a proper testing matrix M, how to distribute client requests based on M with low overhead, and how to generate test outcomes with high accuracy. We will address the last two in this section and leave the first one to the next section.

3.2.1 System Overview
As mentioned in the detection model, each back-end server works as an independent testing domain, where all virtual servers within it serve as testing pools. In the following sections, we only discuss the operations within one back-end server; they are similar in any other server. The detection consists of multiple testing rounds, and each round can be sketched in four stages (Fig. 4). First, generate and update the matrix M for testing. Second, "assign" clients to virtual servers based on M: the back-end server maps each client into one distinct column in M and distributes an encrypted token queue to it. Each token in the token queue corresponds to a 1-entry in the mapped column, i.e., client j receives a token with destination virtual server i iff M[i,j] = 1. Piggybacked with one token, each request is forwarded to a virtual server by the virtual switch. In addition, requests are validated on arrival at the physical servers for faked tokens or identified malicious IDs. This procedure ensures that all the client requests are distributed exactly as the matrix M regulates, and prevents any attackers from accessing virtual servers other than the ones assigned to them. Third, all the servers are monitored for their service resource usage periodically; specifically, the arriving request aggregate (the total number of incoming requests) and the average response time of each virtual server are recorded and compared with some dynamic thresholds to be shown later. All virtual servers are associated with positive or negative outcomes accordingly. Fourth, decode these outcomes and identify legitimate or malicious IDs. By following the detection algorithms (presented in the next section), all the attackers can be identified within several testing rounds. To lower the overhead and delay introduced by the mapping and piggybacking for each request, the system is exempted from this procedure in the normal service state. As shown in Fig. 3, the back-end server cycles between two states, which we refer to as NORMAL mode and DANGER mode.
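A small sketch of the token-assignment stage (our own illustration; token encryption and queue management are omitted): client j receives one token per 1-entry in its column of M, naming the only virtual servers its requests may reach, and a request is accepted only if its token matches.

# Sketch of the "assign" stage above: tokens are derived from the testing
# matrix M (t virtual servers x n clients); encryption is omitted here.
def issue_tokens(M):
    t, n = len(M), len(M[0])
    return {j: [i for i in range(t) if M[i][j] == 1] for j in range(n)}

def validate(client, server, tokens):
    # a request is accepted only if the client holds a token for that server
    return server in tokens.get(client, [])

M = [[1, 0, 1],
     [0, 1, 1]]
tokens = issue_tokens(M)          # {0: [0], 1: [1], 2: [0, 1]}
print(validate(2, 1, tokens))     # True
print(validate(0, 1, tokens))     # False: client 0 has no token for server 1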
[6] "Probes Selection for Biological Target Identification," Proc. Conf. Data Mining, Systems Analysis and Optimization in Biomedicine, 2007.
[7] J. Mirkovic, J. Martin, and P. Reiher, "A Taxonomy of DDoS Attacks and DDoS Defense Mechanisms," Technical Report 020018, Computer Science Dept., UCLA, 2002.
[8] M.J. Atallah, M.T. Goodrich, and R. Tamassia, "Indexing Information for Data Forensics," Proc. Int'l Conf. Applied Cryptography and Network Security (ACNS), pp. 206-221, 2005.
[9] J. Lemon, "Resisting SYN Flood DoS Attacks with a SYN Cache," Proc. BSDCON, 2002.
[10] Service Provider Infrastructure Security, "Detecting, Tracing, and Mitigating Network Anomalies," http://www.arbornetworks.com, 2005.
[11] Y. Kim, W.C. Lau, M.C. Chuah, and H.J. Chao, "PacketScore: Statistics-Based Overload Control against Distributed Denial-of-Service Attacks," Proc. IEEE INFOCOM, 2004.
[12] F. Kargl, J. Maier, and M. Weber, "Protecting Web Servers from Distributed Denial of Service Attacks," Proc. 10th Int'l Conf. World Wide Web (WWW '01), pp. 514-524, 2001.
[13] L. Ricciulli, P. Lincoln, and P. Kakkar, "TCP SYN Flooding Defense," Proc. Comm. Networks and Distributed Systems Modeling and Simulation Conf. (CNDS), 1999.
[14] D.Z. Du and F.K. Hwang, Pooling Designs: Group Testing in Molecular Biology. World Scientific, 2006.
[15] M.T. Thai, D. MacCallum, P. Deng, and W. Wu, "Decoding Algorithms in Pooling Designs with Inhibitors and Fault Tolerance," Int'l J. Bioinformatics Research and Applications, vol. 3, no. 2, pp. 145-152, 2007.
iii) Methods that use overlapping index structures in order to represent the state of the database in different (valid or transaction) time instants.

4. TEMPORAL DATA AND QUERY PROCESSING
Temporal attributes are important for many applications. We investigate the management of moving objects, whose locations change continuously over time and hence require continuous answers for queries.

4.1 INDEXING AND MINING ON TEMPORAL DATA
Indexing and retrieving data records according to their temporal attributes are primitive functionalities for managing temporal data. While many temporal indexes have been proposed, it is shown in [6] how the TSB-tree, a well-known temporal index structure, is implemented in a commercial database and still retains a performance close to that of a non-temporal, standard B+-tree. This involves (i) unique designs of version chaining and treating index terms as versioned records to achieve a TSB-tree implementation with backward compatibility with the B+-tree, (ii) a data compression scheme that substantially reduces the storage needed for preserving historical data, and (iii) dealing with technical issues such as concurrency control, recovery, handling uncommitted data, and log management. Ex: stock market data.

4.2 QUERY PROCESSING
The major objective of a STAM is to efficiently handle query processing. The broader the set of queries supported, the more applicable and useful the access method becomes. A set of fundamental query types as well as some specialized queries are discussed in the sequel.

selection queries:
Queries of the form "find all objects that have lied within a specific area (or at a specific point), during a specific time interval (or at a specific time instant)" are expected to be the most common ones addressed by STDBMS users. Assuming a hierarchical tree structure, the retrieval procedure is straightforward: starting from the root node(s), a downwards traversal of the index is executed by applying the criterion of intersected intervals (for time) and ranges (for space) between the query window and each node approximation. It is important to point out that pure temporal or pure spatial selection queries need to be supported as well.
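A minimal sketch of this downward traversal, assuming an R-tree-like node layout (the class and function names are ours, not from any particular STAM):

# Sketch of a spatiotemporal selection query: prune a subtree unless both
# the node's spatial range and its time interval intersect the query window.
def overlaps(lo1, hi1, lo2, hi2):
    return lo1 <= hi2 and lo2 <= hi1

class Node:
    def __init__(self, x_range, t_range, children=None, objects=None):
        self.x_range = x_range          # (x_min, x_max) spatial approximation
        self.t_range = t_range          # (t_start, t_end) temporal approximation
        self.children = children or []  # internal entries
        self.objects = objects or []    # leaf entries

def st_selection(node, q_space, q_time, result):
    if not (overlaps(*node.x_range, *q_space) and overlaps(*node.t_range, *q_time)):
        return                          # prune this subtree
    result.extend(node.objects)         # leaf entries become candidates
    for child in node.children:
        st_selection(child, q_space, q_time, result)

leaf = Node((0, 10), (0, 5), objects=["obj1"])
root = Node((0, 100), (0, 50), children=[leaf])
res = []
st_selection(root, (5, 20), (1, 2), res)
print(res)  # -> ['obj1']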
join queries:
Queries of the form "find all pairs of objects that have lied spatially close (i.e., within distance X), during a specific time interval (or at a specific time instant)" are also crucial in spatiotemporal databases. An immediate application is accident detection by comparing vehicle trajectories. The retrieval procedure is also straightforward: starting from the two root nodes, a downwards traversal of the two indexes is performed in parallel, by comparing the entries of each visited node according to the overlap operator, such as the synchronized tree traversal proposed in [14] for R-tree structures.

nearest-neighbor queries:
Given an object X, the nearest-neighbor query requests the k closest objects with respect to X. For example, the query "find the 5 closest ambulances with respect to the accident place" is a nearest-neighbor query. Evidently, such a query can be supported by the algorithm proposed in [13]. However, consider the query "find the 5 closest ambulances with respect to the accident place in a time interval of 2 minutes before and after the accident, knowing the directions and velocities of ambulances and the street map". Evidently, more sophisticated algorithms are required, towards spatiotemporal nearest-neighbor query processing.

5. ACCESS METHODS FOR PAST, PRESENT, AND FUTURE SPATIO-TEMPORAL DATA

The RPPF-tree [10]: The RPPF-tree (Past, Present, and Future) indexes positions of moving objects at all points in time. The past positions of an object between two consecutive samples are linearly interpolated and the future positions are computed via a linear function from the last sample. The RPPF-tree applies partial persistence to the TPR-tree to capture the past positions. Leaf and non-leaf entries of the RPPF-tree include a time interval of
validity - [insertion time, deletion time]. When a node, say x, is split at time t, entries in x alive at t are copied to a new node, say y, and their timestamps are set to [t, ?) (i.e., their deletion times are unidentified). While a time-parameterized bounding rectangle (TPBR) of the TPR-tree is valid from the current time, the structure of a TPBR in the RPPF-tree is valid from its insertion time. The straightforward, optimized, and double TPBRs are studied. In the straightforward approach, the bounding rectangle is the integral of the TPBR from its insertion time to infinity. In the optimized TPBR, the bounding rectangle is the integral of the TPBR from its insertion time to (current time + H time units in the future that can be efficiently queried). The straightforward and optimized TPBRs cannot be tightened since these rectangles start from their insertion times. The double TPBR allows tightening by having two components: a tail MBR and a head MBR. The tail MBR starts at the time of the last update and extends to infinity, and thus is a regular TPBR of the TPR-tree. The head MBR bounds the finite historical trajectories from the insertion time to the last update. Querying is similar to the regular TPR-tree search, with the exception of redefining the intersection function to accommodate the double TPBR.

PCFI+-index [11]: The Past-Current-Future+-Index builds on SETI and the TPR*-tree [43]. As in SETI, space is divided into non-overlapping cells. For each cell, an in-memory TPR*-tree indexes the current positions and velocities of objects. Current data records are organized as a main-memory hash index, hence allowing efficient access to current positions. To index the objects' past trajectories, the PCFI+-index uses a sparse on-disk R*-tree to index the lifetimes of historical data that only contains the segments from one cell. Insertion, update, and deletion are similar to those of SETI and the TPR*-tree. Upon update, if the new location resides inside the same partition, a new segment is inserted into the historical data file and the TPR*-tree updates the new location for the object. Otherwise, a split occurs and two segments are inserted into the historical data file at different pages; the corresponding entry in the old TPR*-tree is removed and is inserted into another TPR*-tree. If the insertion of a segment overflows a page, the corresponding R*-tree entry is updated to set its end time.

BBx-index [12]: The BBx-index uses the Bx-tree techniques to support the present and future. To index the past, the BBx-index keeps multiple tree versions. Each tree entry has the form <xrep; tstart; tend; pointer>, where xrep is the transformed 1D object location value using an SFC, and tstart and tend are the insert and update times of the object, respectively. Each tree corresponds to a timestamp signature, being the end timestamp of a phase when the tree is built, and a lifespan, being the minimum start time and the maximum end time of all the entries in that tree. Unlike the Bx-tree, which concatenates the timestamp signature and the 1D transformed value, the BBx-index maintains a separate tree for each timestamp signature, and models the moving objects from the past to the future. Insertion is the same as in the Bx-tree. Instead of deleting an object, update sets the end time of the object to the current time, followed by an insertion of the updated object into the latest tree.

6. CONCLUSION

In this short survey, we presented an overview of existing spatio-temporal index structures. Spatio-temporal indexing methods are classified based on the type and time of the queries they can support. With the variety of spatio-temporal access methods, it becomes essential to have a general and extensible framework to support the class of spatio-temporal indexing. There is still a lot of research work that needs to be investigated in spatio-temporal indexing. A variety of tests using synthetic and real spatiotemporal data sets are necessary in order to better understand the spatiotemporal indexing and retrieval issues.

REFERENCES:
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", second edition, Morgan Kaufmann Publishers, an imprint of Elsevier.
[2] Taher Omran Ahmed and Maryvonne Miquel, "Multidimensional Structures Dedicated to Continuous Spatiotemporal Phenomena", Springer-Verlag Berlin Heidelberg, 2005, pp. 1-12.
[3] Dimitris Papadias, Yufei Tao, Panos Kalnis, and Jun Zhang, "Indexing Spatio-Temporal Data Warehouses," Proc. Int'l Conf. Data Engineering, 2002.
[4] R. Kimball, The Data Warehouse Toolkit, John Wiley, 1996.
[5] Ghazi H. Al-Naymat, "New Methods for Mining Sequential and Time Series Data", Ph.D. thesis, The University of Sydney, June 2009.
[6] Anthony David Veness, "A real-time spatio-temporal data exploration tool for marine
environment due to the fact that they cannot deal with the ever-changing environment. Especially, the synchronization in FLS programs causes inefficiency: one late job can delay the whole process. This raises the need for methods that make FLS applications robust against changes in the grid environment. A load-balancing system [1] is a cluster system composed of multiple hosts. Each individual host, which can provide services independently without any external assistance of other hosts, has equivalent status in the system. They form a cluster in a symmetrical way. Logically, from the user side, these machines function as a single large virtual machine. [4] Instead of taking all these parameters for the comparison, we have considered only two parameters to represent the fairness and throughput of the scheduling algorithms. The parameters are batch completion time and workstation processing completion time. If workstations are completing their processing in less time, then the throughput of the algorithm is higher. Similarly, if batch completion time is smaller, then each job is getting proportionate CPU time, resulting in the improved fairness of the algorithm. We have also plotted graphs for comparing the variation among these values. From the graphs, it is observed that the metrics considered by us can very well replace the performance metrics considered by Hui and Chanson. In [2], once a BigJob is launched, sub-jobs keep running until the final solution is achieved and the manager quits the Pilot-Job at that time. In case multiple BigJobs are submitted for the same simulation, or if a load balancing function is included, sub-jobs experience several restarts from their checkpointing data.

2. System Modelling
We briefly describe the concept of the FLS model. Then, in Section 2.2, we describe the implementation details of DLB and JR.

Figure: DLB Architecture (master/slave cluster of services)

2.1 Full Load Situations
FLS parallel programs have the property that the problem can be divided into subproblems or jobs, each of which can be solved or executed in roughly the same way. Each run consists of I iterations of P jobs, which are distributed on P processors: each processor receives one job per iteration. Further, every run contains I synchronization moments: after computing the jobs, all the processors send their data and wait for each other's data before the next iteration starts. In general, the runtime is equal to the sum of the individual iteration times (ITs). The figure presents the situation for one iteration of an FLS run in a grid environment: each processor receives a job, and the IT is equal to the maximum of the individual job times plus the synchronization time (ST). ELB assumes no prior knowledge of the processor speeds of the nodes and consequently balances the load equally among the different nodes. The standard FLS program is implemented according to the ELB principle.

2.2 Load Balancing and Job Replication
In this section, we briefly discuss the two main methods to cope with the dynamics of the grid environment: DLB and JR.

2.2.1 Dynamic Load Balancing
DLB starts with the execution of an iteration, which does not differ from the common FLS program explained above. However, at the end of each iteration, the processors predict their processing speed for the next iteration. We select one processor to be the DLB scheduler. After every N iterations, the processors send their predictions to this scheduler. Subsequently, this processor calculates the "optimal" load distribution given those predictions and sends relevant information to each processor. The load distribution is optimal when all processors finish their calculation exactly at the same time. Therefore, it is "optimal" when the load assigned to each processor is proportional to its predicted processor speed.
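As a minimal illustration of this proportional rule (the function name and the example values are ours, not the paper's):

# Each processor's share of the load is proportional to its predicted
# speed, so all processors are expected to finish simultaneously.
def dlb_distribution(total_jobs, predicted_speeds):
    total_speed = sum(predicted_speeds)
    return [total_jobs * s / total_speed for s in predicted_speeds]

# 100 jobs over four processors with predicted speeds 4, 2, 2, and 1:
print(dlb_distribution(100, [4, 2, 2, 1]))
# -> [44.44..., 22.22..., 22.22..., 11.11...]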
Finally, all processors redistribute the load. Fig. 1b provides an overview of the different steps within a DLB implementation on four processors. The effectiveness of DLB partly relies on the dividing possibilities of the load. Load balancing at every single iteration is rarely a good strategy. On one hand, the runtime of a parallel application directly depends on the overhead of DLB, and therefore, it is better to increase the number of iterations between two load balancing steps. On the other hand, less load balancing leads to an imbalance of the load for the processors for sustained periods of time due to significant changes in processing speeds. We present the theoretical speedups in runtimes when using DLB compared to ELB, given that the application rebalances the load every N iterations but without taking into account the overhead. Based on those speedups and the load balancing overhead addressed above, a suitable value of N was found to be 2.5P iterations, for P > 1. The effectiveness of DLB strongly relies on the accuracy of the predictions. Previous research has shown that the Dynamic Exponential Smoothing (DES) predictor accurately predicts the job times on shared processors. For that reason, the DES predictor has been implemented in the DLB simulation implementation of this paper.

2.2.2 Job Replication
In this section, we introduce the concept of job replication in FLS parallel programs. In an R-JR run, R - 1 exact copies of each job have been created and have to be executed, such that there exist R samples of each job. Two copies of a job perform exactly the same computations: the data sets, the parameters, and the calculations are completely the same. A JR run consists of I iterations. One iteration takes in total R steps. R copies of all P jobs have been distributed to P processors, and therefore, each processor receives each iteration R different jobs. As soon as a processor has finished one of the copies, it sends a message to the other processors that they can kill the job and start the next job in the sequence.

3. Modules

3.1 DEDUCTION OF RESOURCES
The server holds the complete text that was encrypted by the different clients. The server must deduce the number of clients that have access to itself. Basically, the relationship between the resources is a client-server relationship. The clients in the network are attached loosely to the server, and hence the number of clients changes dynamically in the network, which has to be known to the server.

3.2 PERFORMANCE ANALYSIS
The server must deduce the free workspaces in the resources. If the free workspace is between 40 and 100%, a fragment of the text is assigned to the client. If it is less than 40%, then it means that the clients are not free; hence the server does not overload those clients. The server maintains entries about its clients.

3.3 FRAGMENTATION
Based on the performance analysis, like CPU idle time and the total memory of the remote system (i.e., free and usage), the tasks are fragmented. The fragment is based on the amount of free space available on the remote system.

3.4 SCHEDULING & REFRAGMENTATION
The fragments are then sent to the clients dynamically. The server has control over the resources residing at the clients. After performing the tasks, encrypted fragments are sent to the server to obtain the final encrypted message.

4. Function of DLB

4.1 DLB Experiments
In this section, we present the results of the simulations of the DLB runs. We investigated the DLB runtimes with both sets of processors for runs with a CCR of 0.01, 0.25, and 0.50 on 1, 2, 4, 8, 16, and 32 processors. In light of the above quantitative analyses and results, this section develops a deterministic algorithm for the broadcast schedule optimization and the assessment of mean service access. One figure depicts the average runtimes on a logarithmic scale of all performed simulations on nodes of set one; another depicts the average runtimes on a logarithmic scale of all the performed simulations on nodes of set two. From the simulation results of the runs with a CCR of 0.01, we conclude that selecting more processors in the run decreases the runtimes, which is the main motivation for programming in parallel. Although the rescheduling and send times increase when more processors are selected in the run, the decrease in the computation times for this case is always higher. As shown, we draw different conclusions when the CCR is higher. For runs on nodes of set one and a CCR of 0.25, we notice a decrease in runtimes until the amount of 16 processors is selected. When more processors have
4.2 Implementation

4.2.1 DEDUCTION OF RESOURCES
The server holds the complete text that was encrypted by the different clients. The server must deduce the number of clients that have access to itself. Basically, the relationship between the resources is a client-server relationship. The clients in the network are attached loosely to the server, and hence the number of clients changes dynamically in the network, which has to be known to the server.

4.2.2 PERFORMANCE ANALYSIS
The server must deduce the free workspaces in the resources. If the free workspace is between 40 and 100%, a fragment of the text is assigned to the client. If it is less than 40%, then it means that the clients are not free; hence the server does not overload those clients. The server maintains entries about its clients.

Conclusion and Future Work
The proposed approaches are better than the existing approach. Even a single processor can finish the task quickly if it is faster than other processors. The load distribution is optimal when all processors finish their calculation exactly at the same time. Hence it is "optimal" when the load assigned to each processor is proportional to its predicted processor speed.

References
[1] Jie Chang, Wen'an Zhou, Junde Song, and Zhiqi Lin (Beijing University of Posts and Telecommunications), "Scheduling Algorithm of Load Balancing Based on Dynamic Policies," 2010 IEEE, DOI 10.1109/ICNS.2010.57.
[2] Soon-Heum Ko, Nayong Kim, Joohyun Kim, Abhinav Thota, and Shantenu Jha, "Efficient Runtime Environment for Coupled Multi-Physics Simulations: Dynamic Resource Allocation and Load-Balancing," Proc. 10th IEEE/ACM Int'l Conf. Cluster, Cloud and Grid Computing, 2010 IEEE, DOI 10.1109/CCGRID.2010.107.
[3] P.K. Suri (Department of Computer Sc. & Applications, Kurukshetra University, Kurukshetra, India), "An Efficient Decentralized Load Balancing Algorithm for Grid," 2010 IEEE.
[4] Hemant Kumar Mehta, Manohar Chandwani, and Priyesh Kanungo, "A Modified Delay Strategy for Dynamic Load Balancing in Cluster and Grid Environment," 2010 IEEE.
[6] Menno Dobber, Rob van der Mei, and Ger Koole, "Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment," IEEE Trans. Parallel and Distributed Systems, vol. 20, no. 2, Feb. 2009.
[7] "Dynamic Load Balancing for Cluster-Based Publish/Subscribe System," 2009.
[8] "The Contribution of Static and Dynamic Load Balancing in a Real-Time Distributed Air Defence Simulation," 2008.
[9] H. Attiya, "Two Phase Algorithm for Load Balancing in Heterogeneous Distributed Systems," Proc. 12th Euromicro Conf. Parallel, Distributed and Network-Based Processing (PDP '04), p. 434, 2004.
[10] R. Bajaj and D.P. Agrawal, "Improving Scheduling of Tasks in a Heterogeneous Environment," IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 2, pp. 107-118, Feb. 2004.
[11] I. Banicescu and V. Velusamy, "Load Balancing Highly Irregular Computations with the Adaptive Factoring," Proc. 16th Int'l Parallel and Distributed Processing Symp. (IPDPS '02), p. 195, 2002.
A. Semantic mapping
Simply combine the knowledge sets K1 and K2 of the two SubOs together to get a new one, K. Additional extraction may be required in order to preserve a valid knowledge set in the new SubO. For example, if c1 is a class in K1, c2 is a class in K2, and there is a property r between the two classes in O, neither K1 nor K2 contains r. Then, we need to retrieve r from O in order to maintain the completeness of K.

IV. SUBO-BASED RESOURCE REUSE

A. Resource Reuse

Generally, the goal of learning is to gain knowledge, skills, and experiences in order to solve problems better and faster. The users of e-learning systems need to reuse various e-learning resources to achieve a goal or solve a problem. It receives and handles requests for e-learning resources from applications or users. How to satisfy

Given a resource request q = <c1, c2, ..., ck> and a SubO B = <C, K, O>, to what extent B is able to match q is represented by the knowledge matching coefficient (KMC). The KMC of B to q is given as

KMC(B, q) = α·n1 + β·n2 + γ·n3    (1)

where α, β, and γ are weighting coefficients. n1 refers to the number of exact matches between the classes in K and the classes in q, n2 refers to the number of plugin matches, and n3 refers to the number of subsume matches. The importance of the three weighting coefficients α, β, and γ is not equal: α is the most important in evaluating KMC, followed by β and γ. Therefore, values are assigned to the weighting coefficients according to the condition α > β > γ. One point to take into account is the normalization of the weighting coefficients. In order to evaluate KMC with different sets of the coefficients, one should normalize the coefficients.
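A small sketch of evaluating (1) follows; the concrete weight values are illustrative (they only need to satisfy α > β > γ), and normalizing them to sum to 1 is an assumption on our part, since the normalization condition itself is cut off above.

# Illustrative evaluation of KMC(B, q) = a*n1 + b*n2 + c*n3 from (1),
# where n1/n2/n3 count exact, plugin, and subsume matches. The weights
# (0.6, 0.3, 0.1) are example values satisfying a > b > c; dividing by
# their sum normalizes them to 1 (an assumption, see above).
def kmc(n_exact, n_plugin, n_subsume, weights=(0.6, 0.3, 0.1)):
    a, b, c = weights
    assert a > b > c  # exact matches weigh most, subsume matches least
    total = a + b + c
    return (a * n_exact + b * n_plugin + c * n_subsume) / total

# A SubO with 3 exact, 1 plugin, and 2 subsume matches against request q:
print(kmc(3, 1, 2))  # -> 2.3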
entire database sequence of feature vectors for exhaustive comparison. Our approach does not involve the presegmentation of video required by the proposals based on shot boundary detection [9], [19], [21], [22], [23]. Shot resolution, which could be a few seconds in duration, is usually too coarse to accurately locate a subsequence boundary. Meanwhile, our approach based on frame subsampling is capable of identifying video content containing ambiguous shot boundaries (such as dynamic commercials, TV program lead-in and lead-out subsequences).

II. RELATED WORK

A. Video Copy Detection

Extensive research efforts have been made on extracting and matching content-based signatures to detect copies of videos. Mohan [4] introduced employing the ordinal measure for video sequence matching. Naphade et al. [8] developed an efficient scheme to match video clips using color histogram intersection. Pua et al. [9] proposed a method based on color moment features to search for video copies from a long segmented sequence. In their work, the query sequence slides frame by frame on the database video with a fixed-length window. In addition to distortions introduced by different encoding parameters, Kim and Vasudev [5] proposed to use spatiotemporal ordinal signatures of frames to further address display format conversions, such as different aspect ratios (letter-box, pillar-box, or other styles). Since the process of video transformation could give rise to several distortions, techniques circumventing these variations by global signatures have been considered. They tend to depict a video globally rather than focusing on its sequential details. This method is efficient, but has limitations with blurry shot boundaries or a very limited number of shots. Moreover, in reality, a query video clip can be just a shot or even a subshot. However, this method is only applicable for queries which consist of multiple shots.

B. Video Similarity Search

The methods mentioned above have only been designed to detect videos of the same temporal order and length. To further search videos with changes from the query due to content editing, a number of algorithms have been proposed to evaluate video similarity. To deal with inserting in or cutting out partial content, Hua et al. [6] used dynamic programming based on the ordinal measure of resampled frames at a uniform sampling rate to find the best match for different-length video sequences. This method has only been tested on a small video database. Through time warping distance computation, they achieved higher search accuracy than the methods proposed in [5] and [6]. However, with the growing popularity of video editing tools, videos can be temporally manipulated with ease. This work will extend the investigations of copy detection not only in the aspect of potentially different length but also in allowing flexible temporal order (tolerance to content reordering). Cheung and Zakhor [11], [12] developed Video Signature to summarize each video with a small set of sampled frames by a randomized algorithm. Shen et al. [13] proposed Video Triplet to represent each clip with a number of frame clusters and estimate the cluster similarity by the volume of intersection between two hyperspheres multiplied by the smaller density. It also derives the overall video similarity by the total number of similar frames shared by two videos. For compactness, these summarizations inevitably lose temporal information. Videos are treated as a "bag" of frames; thus, they lack the ability to differentiate two sequences with temporal reordering, such as "ABCD" and "ACBD". Various time series similarity measures can be considered, such as Mean distance, DTW, and LCSS, all of which can be extended to measure the similarity of multidimensional trajectories and applied for video matching. However, Mean distance adheres to temporal order in a rigid manner, does not allow frame alignment or gaps, and is very sensitive to noise. DTW can be utilized to address frame alignment by repeating some frames as many times as needed without extra cost [7], but no frame can be skipped even if it is just noise. In addition, its capability is limited in the context of partial content reordering. LCSS is proposed to address temporal order and handle possible noise by allowing some elements to be skipped without rearranging the
sequence order [21], but it will ignore the effect of potentially different gap numbers. As known from research on psychology, the visual judgment of human perception involves a number of factors. The proposed model incorporating different factors for measuring video similarity is inspired by the weighted schemes [19], [22], [23] originally introduced at shot level.

Definition 1. Video subsequence identification. Let Q = {q1, q2, . . . , q|Q|} be a short query clip and S = {s1, s2, . . . , s|S|} be the long database video sequence, where qi = {qi1, . . . , qid} ∈ Q and sj = {sj1, . . . , sjd} ∈ S are d-dimensional feature vectors representing video frames, and |Q| and |S| denote the total frame numbers of Q and S, respectively (normally |Q| « |S|). Video subsequence identification is to find Ŝ = {sm, sm+1, . . . , sn} in S, where 1 ≤ m ≤ n ≤ |S|, which is the most similar part to Q under a defined score function.

For easy reference, a list of notations used in this paper is shown in Table 1.

TABLE 1: A List of Notations

database video with query. It facilitates safely pruning a large portion of irrelevant parts and rapidly locating some promising candidates for further similarity evaluations. Constructing a bipartite graph representing the similar frame mapping relationship between Q and S with an efficient batch kNN search algorithm [18], all the possibly similar video subsequences along the 1D temporal line can be extracted. Then, to effectively but still efficiently identify the most similar subsequence, the proposed query processing is conducted in a coarse-to-fine style. Imposing a one-to-one mapping constraint similar in spirit to that of [19], Maximum Size Matching (MSM) [20] is employed to rapidly filter some actually nonsimilar subsequences with lower computational cost. The smaller number of candidates which contain eligible numbers of similar frames are then further evaluated with relatively higher computational cost for accurate identification. Since measuring the video similarities for all the possible 1:1 mappings in a subgraph is computationally intractable, a heuristic method, Sub-Maximum Similarity Matching (SMSM), is devised to quickly identify the subsequence corresponding to the most suitable 1:1 mapping.

A. Retrieving Similar Frames

Similar frame retrieval in S for each element qi ∈ Q is processed as a range or kNN search. Given qi and S, Algorithm 1 gives the framework of retrieving similar frames. The output set F(qi) consists of frames of S. However, as explained later, we are more inclined to have each qi retrieve the same number of similar frames, and the differences of the maximum distances dmax(qi, sj), where sj ∈ F(qi), and dmax(qi', sj'), where sj' ∈ F(qi'), can vary substantially. Therefore, kNN search is preferred.
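The retrieval step can be sketched as a plain per-frame kNN search. This is a naive stand-in for Algorithm 1 and the batch kNN search of [18]: no index is used, and the function names and toy feature vectors are illustrative only.

import heapq

def knn_frames(qi, S, k):
    # qi: d-dimensional query frame; S: list of d-dimensional database
    # frames. Returns F(qi): indices of the k database frames closest to
    # qi in Euclidean distance.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [j for _, j in heapq.nsmallest(
        k, ((dist2(qi, sj), j) for j, sj in enumerate(S)))]

# Each query frame retrieves the same number k of similar frames, as the
# text prefers kNN search over range search:
Q = [(0.1, 0.2), (0.8, 0.9)]                     # toy 2D features of query frames
S = [(0.0, 0.2), (0.5, 0.5), (0.9, 0.9), (0.1, 0.3)]
F = {i: knn_frames(qi, S, k=2) for i, qi in enumerate(Q)}
print(F)  # -> {0: [0, 3], 1: [2, 1]}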
segment may have multiple 1:1 mappings, and the most similar subsequence in S may only be a portion of the sequence. Next, we further refine it to find the most suitable 1:1 mapping for accurate identification (or ranking), by considering visual content, temporal order and frame alignment simultaneously.

IV. EXPERIMENTS

A. Effectiveness

To measure the effectiveness of our approach, we use hit ratio, which is defined as the number of queries for which our method correctly identifies the position of the most similar subsequence (ground truth), divided by the total number of queries. Note that since for each query there is only one target subsequence (where the original fragment was extracted) in the database, hit ratio corresponds to P(1), i.e., the precision value at the first rank. The original video has also been manually inspected so that the ground truth of each query clip can be validated.

B. Efficiency

To show the efficiency of our approach, we use response time, which indicates the average running time of a query. Without SMSM, all the possible 1:1 mappings will be evaluated. Since it is computationally intractable to enumerate all 1:1 mappings to find the most suitable one, and there is no prior practical method dealing with this problem for performance comparison, we mainly study the efficiency of our approach by investigating the effect of MSM filtering. Without MSM, all the segments extracted in dense segment extraction will be processed, while with MSM, only a small number of segments are expected. Note that the performance comparison is not affected by the underlying high-dimensional indexing method.

V. CONCLUSIONS

This paper has presented an effective and efficient query processing strategy for temporal localization of similar content from a long unsegmented video stream, considering that the target subsequence may be an approximate occurrence of potentially different ordering or length with the query clip. In the preliminary phase, the similar frames of the query clip are retrieved by a batch query algorithm. Then, a bipartite graph is constructed to exploit the opportunity of spatial pruning; thus, the high-dimensional query and database video sequence can be transformed to two sides of a bipartite graph. Only the dense segments are roughly obtained as possibly similar subsequences. In the filter-and-refine phase, some nonsimilar segments are first filtered, and several relevant segments are then processed to quickly identify the most suitable 1:1 mapping by optimizing the factors of visual content, temporal order, and frame alignment together. In practice, visually similar videos may exhibit different orderings due to content editing, which yields some intrinsic cross mappings. Our video similarity model, which elegantly achieves a balance between the approaches of neglecting temporal order and strictly adhering to temporal order, is particularly suitable for dealing with this case, and thus can support accurate identification. Although only the color feature is used in our experiments, the proposed approach inherently supports other features. For future work, we plan to further investigate the effect of representing videos by other features, such as the ordinal signature. Moreover, the weight of each factor for measuring video similarity might be adjusted by user feedback to embody the degree of similarity more completely and systematically.

ACKNOWLEDGMENTS

Sound and Vision video is copyrighted. The Sound and Vision video used in this work is provided solely for research purposes through the TREC Video Information Retrieval Evaluation Project Collection. The authors would like to thank the anonymous reviewers for their comments, which led to improvements of this paper. This work is supported in part by the Australian Research Council under Grant DP0663272.

REFERENCES

[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[2] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast Subsequence Matching in Time-Series Databases," Proc. ACM SIGMOD '94, pp. 419-429, 1994.
[3] H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, "Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis," J. Visual Comm. and Image Representation, vol. 14, no. 2, pp. 150-183, 2003.
[4] R. Mohan, "Video Sequence Matching," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), pp. 3697-3700, 1998.
[5] C. Kim and B. Vasudev, "Spatiotemporal Sequence Matching for Efficient Video Copy Detection," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 127-132, 2005.
[6] X.-S. Hua, X. Chen, and H. Zhang, "Robust Video Signature Based on Ordinal Measure," Proc. IEEE Int'l Conf. Image Processing (ICIP '04), pp. 685-688, 2004.
[7] C.-Y. Chiu, C.-H. Li, H.-A. Wang, C.-S. Chen, and L.-F. Chien, "A Time Warping Based Approach for Video Copy Detection," Proc. 18th Int'l Conf. Pattern Recognition (ICPR '06), vol. 3, pp. 228-231, 2006.
[8] M.R. Naphade, M.M. Yeung, and B.-L. Yeo, "A Novel Scheme for Fast and Efficient Video Sequence Matching Using Compact Signatures," Proc. Storage and Retrieval for Image and Video Databases (SPIE '00), pp. 564-572, 2000.
[9] K.M. Pua, J.M. Gauch, S. Gauch, and J.Z. Miadowicz, "Real Time Repeated Video Sequence Identification," Computer Vision and Image Understanding, vol. 93, no. 3, pp. 310-327, 2004.
[10] K. Kashino, T. Kurozumi, and H. Murase, "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning," IEEE Trans. Multimedia, vol. 5, no. 3, pp. 348-357, 2003.
[11] S.-C.S. Cheung and A. Zakhor, "Efficient Video Similarity Measurement with Video Signature," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 59-74, 2003.
[12] S.-C.S. Cheung and A. Zakhor, "Fast Similarity Search and Clustering of Video Sequences on the World-Wide-Web," IEEE Trans. Multimedia, vol. 7, no. 3, pp. 524-537, 2005.
[13] H.T. Shen, B.C. Ooi, X. Zhou, and Z. Huang, "Towards Effective Indexing for Very Large Video Sequence Database," Proc. ACM SIGMOD '05, pp. 730-741, 2005.
[14] H.T. Shen, X. Zhou, Z. Huang, J. Shao, and X. Zhou, "Uqlips: A Real-Time Near-Duplicate Video Clip Detection System," Proc. 33rd Int'l Conf. Very Large Databases (VLDB '07), pp. 1374-1377, 2007.
[15] J. Yuan, L.-Y. Duan, Q. Tian, S. Ranganath, and C. Xu, "Fast and Robust Short Video Clip Search for Copy Detection," Proc. Fifth IEEE Pacific-Rim Conf. Multimedia (PCM '04), vol. 2, pp. 479-488, 2004.
[16] J. Shao, Z. Huang, H.T. Shen, X. Zhou, E.-P. Lim, and Y. Li, "Batch Nearest Neighbor Search for Video Retrieval," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 409-420, 2008.
[17] Y. Peng and C.-W. Ngo, "Clip-Based Similarity Measure for Query-Dependent Clip Retrieval and Video Summarization," IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 612-627, 2006.
[18] D.R. Shier, "Matchings and Assignments," Handbook of Graph Theory, J.L. Gross and J. Yellen, eds., pp. 1103-1116, CRC Press, 2004.
[19] L. Chen and T.-S. Chua, "A Match and Tiling Approach to Content-Based Video Retrieval," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '01), pp. 417-420, 2001.
[20] X. Liu, Y. Zhuang, and Y. Pan, "A New Approach to Retrieve Video by Example Video Clip," Proc. Seventh ACM Int'l Conf. Multimedia (MULTIMEDIA '99), vol. 2, pp. 41-44, 1999.
[21] Y. Wu, Y. Zhuang, and Y. Pan, "Content-Based Video Similarity Model," Proc. Eighth ACM Int'l Conf. Multimedia (MULTIMEDIA '00), pp. 465-467, 2000.
CONCLUSIONS

Fig. 1: Example of defence in depth

The terms reasonable and prudent person, due care, and due diligence have been used in the fields of finance, securities, and law for many years. In recent years these terms have found their way into the fields of computing and information security. U.S. Federal Sentencing Guidelines now make it possible to hold corporate officers liable for failing to exercise due care and due diligence.

ACKNOWLEDGEMENTS

Computer and information security is an advanced topic. I owe a great many thanks to a great many people who helped and supported me during the writing of this paper.
This section focuses on solving the Intra-CECB problem. The sufficient and necessary condition for Intra-CECB is first presented. It is proven that energy consumption among nodes within each corona can be balanced if and only if the amount of data received by nodes in this corona is balanced. Based on this observation, a localized zone-based routing scheme is designed to balance energy consumption among nodes within each corona.

This section concentrates on solving the Inter-CECB problem. When energy consumption is balanced among nodes in Ci, all nodes in Ci transmit the same amount of data Fi through hop-by-hop transmission mode, transmit the same amount of data Di through direct transmission mode, and receive the same amount of data Si from nodes in Ci-1.

5.1 Optimal Data Distribution Ratio Allocation
using heuristic algorithms. In this scheme, the optimal corona number is computed offline using a simulated annealing algorithm and then distributed to all nodes in the network set-up phase.

7 The EBDG Protocol

In this section, the EBDG protocol is designed. The operation of EBDG is divided into two phases: the network set-up phase and the data gathering phase.

7.1 Network Set-Up Phase

As discussed in Sections 4 and 5, network parameters such as the optimal number of coronas n, the optimal number of subcoronas w, and the optimal data distribution ratio for each corona can be computed offline. In the network set-up phase, the sink distributes these global parameters to all nodes through broadcasting, and each node establishes its corona, subcorona, and zone identifications based on these parameters. In hop-by-hop transmission mode, EBDG uses the zone-based routing scheme presented in Section 4.1 to balance energy consumption among nodes within the same corona.

7.2 Data Gathering Stage

In EBDG, all sensor nodes work in two states: active and sleep. In the active state, each node can transmit data, receive data and perform data aggregation. In the sleep state, each node turns off its radio to save energy.

8 EXTENSION TO LARGE-SCALE DATA GATHERING SENSOR NETWORKS

In mixed-routing schemes, the basic requirement is that all nodes must have the capability to directly communicate with the sink. However, in practice, most realistic sensor motes usually have limited transmission range, which indicates that the mixed-routing scheme is not suitable for large-scale sensor networks. In this section, the solution for extending EBDG to large-scale sensor networks is proposed by employing the advantages of clustering techniques. Clustering is a promising technique for large-scale sensor networks because of its high scalability and efficiency. By dividing the whole network into small clusters, each node only needs to communicate with its cluster head through single or multihop routing, thereby eliminating the requirement for a large communication range. Similar to prior work, it is assumed that the network is composed of two kinds of nodes: regular nodes and cluster-head nodes. The regular nodes have battery energy E0, and do the basic sensing, data aggregation as well as packet relaying. The cluster-head nodes are equipped with battery energy E1, which is much larger than E0, and these nodes are responsible for collecting data from nodes within each cluster and transmitting the data to the remote sink. The data gathering in the extended EBDG is performed as follows:

Intra-Cluster: In each cluster, EBDG is employed to gather the data from all nodes to the cluster head. Therefore, energy consumption is balanced among nodes within each cluster.
Inter-Cluster: All the cluster heads form into a supercluster with the sink acting as the cluster head. Then EBDG can be used to balance energy consumption among cluster heads. Besides this approach, some schemes based on a mobile sink can also be used to achieve the same goal.

9 SIMULATION RESULTS AND ANALYSIS

In this section, the performance of EBDG is evaluated through extensive simulations. To demonstrate the efficiency of EBDG in terms of balancing energy consumption and maximizing network lifetime, EBDG is compared with a conventional multihop routing scheme, a direct transmission scheme, a cluster-head rotation scheme, and a maximum lifetime data gathering scheme.
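To give a feel for the trade-off that the data distribution ratios resolve, the following toy sketch assumes a first-order radio energy model; the model and every constant in it are our illustrative assumptions, not EBDG's actual parameters.

# Toy illustration of mixed routing: a node in corona i forwards a
# fraction p of its traffic hop-by-hop (one corona inward, distance r)
# and sends the rest directly to the sink (distance i*r). First-order
# radio model assumed for illustration only:
#   E_tx(b, d) = b * (E_ELEC + E_AMP * d**2)
E_ELEC, E_AMP = 50e-9, 100e-12   # J/bit and J/bit/m^2 (illustrative values)

def energy_per_round(bits, p, corona, r):
    hop = p * bits * (E_ELEC + E_AMP * r ** 2)                      # hop-by-hop share
    direct = (1 - p) * bits * (E_ELEC + E_AMP * (corona * r) ** 2)  # direct share
    return hop + direct

# A larger hop-by-hop share p lowers this node's own cost but loads inner
# coronas with extra relaying; EBDG's ratios are chosen so that the
# per-round energy is equalized across coronas.
for p in (0.0, 0.5, 1.0):
    print(p, energy_per_round(bits=1000, p=p, corona=4, r=50))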
9.1 Comparison with Multihop Routing and Direct Transmission Schemes

In EBDG, energy consumption among all nodes in the network is balanced, in the expectation that all nodes should run out of energy at nearly the same time. This set of simulations is focused on evaluating the performance of EBDG in terms of energy consumption balancing and network lifetime extension, by comparing with multihop routing and direct transmission schemes. For multihop routing schemes, the energy-efficient geographic routing protocol EEGR is referred to for comparison. EEGR can provide near-optimal sensor-to-sink multihop data delivery in terms of minimizing the total energy consumption for delivering each packet. For the direct transmission scheme (DT), in each round, every node generates its data, performs data compression and transmits the data directly to the sink without any relay.

aggregation (MLDA) and without data aggregation (MLDR). A centralized algorithm which can generate a near-optimal data gathering schedule in terms of maximizing network lifetime was proposed. In MLDA/MLDR, the data gathering schedule is a collection of directed trees rooted at the sink that span all the sensors. Each tree may be used for one or more rounds, and lifetime maximization is achieved by optimally allocating the number of rounds to each tree.
The EBDG protocol and its extension to large-scale data-gathering sensor networks were developed. Simulation results show that EBDG can improve system lifetime by an order of magnitude compared with a multihop transmission scheme, a direct transmission scheme, and a cluster-head rotation scheme. Future extensions of this work can be done in two directions. First, this work is based on a collision-free MAC protocol. Future work can be done to extend it to networks with contention-based MAC protocols. Second, in this study, it is assumed that all nodes in the same corona have the same data distribution ratio, since assigning each node a different data distribution ratio would significantly increase
sharing the infrastructure they use for voice, video, or web applications.

Cloud Computing: A Good Idea in Government
Cloud computing can reduce the costs of existing services and enable government to cost-effectively introduce enhanced services. Citizens benefit from cloud computing because their tax dollars are used more efficiently. Government IT costs often decrease because agencies don't need to purchase more capacity than they need to prepare for usage spikes. Management costs can decrease, as well. Agency IT personnel spend less time and resources making the IT infrastructure efficient, which enables them to focus on the core mission. Cloud computing also makes it much easier for agencies to introduce new citizen services. Examples include interactive Web 2.0 applications that let you share videos or collaborate with coworkers on a social networking site.

WHERE IS THE CLOUD?
An agency can host a cloud itself, subscribe to a cloud service hosted by another agency, or subscribe to a service from a third-party service provider. Some agencies subscribe to an external cloud for some services and build a private cloud for others, depending on the criticality and security classification of the service. The Office of Management and Budget, the General Services Administration, and the National Institute of Standards and Technology are all defining standards for cloud procurement and acquisition.
To host a cloud, the agency needs a platform with the following characteristics:
• Performance: The platform needs to be able to support high transaction volume and multiple applications.
• Low management overhead: Acquiring new servers should not add management burden. Automated provisioning offloads the agency IT department. And to keep costs down, IT staff should be able to manage computing, storage access, network infrastructure, and virtualization from one interface.
• Energy efficiency: Look for a platform that minimizes the number of components to power and cool.
Cloud computing is a relatively new way of referring to the use of shared computing resources, and it is an alternative to having local servers handle applications. Cloud computing groups together large numbers of compute servers and other resources and typically offers their combined capacity on an on-demand, pay-per-cycle basis. The end users of a cloud computing network usually have no idea where the servers are physically located; they just spin up their application and start working.
Cloud computing is fully enabled by virtualization technology (hypervisors) and virtual appliances. A virtual appliance is an application that is bundled with all the components that it needs to run, along with a streamlined operating system. In a cloud computing environment, a virtual appliance can be instantly provisioned and decommissioned as needed, without complex configuration of the operating environment.
This flexibility is the key advantage of cloud computing, and what distinguishes it from other forms of grid or utility computing and software as a service (SaaS). The ability to launch new instances of an application with minimal labor and expense allows application providers to:
• Scale up and down rapidly
• Recover from a failure
• Bring up development or test instances
• Roll out new versions to the customer base
• Efficiently load test an application

THE ECONOMICS OF THE CLOUD
Before we delve into how to architect an application for a cloud computing environment, we should explain why it is financially advantageous to do so.
The first clear advantage of using an existing cloud infrastructure is that you don't have to make the capital investment yourself. Rather than expending the time and cost to build a data center, you use someone else's investment and turn what would have been a massive capital outlay into a manageable variable cost.
In the pay-per-cycle model of cloud computing, you can start small and requisition more computer time as your business grows. This makes starting up inexpensive, and gives you time to build your on-demand business before investing in additional capacity. Instead of investing ahead of demand, you simply use
and pay for exactly what you need when you need it.
Development time can also be a significant cost in creating an on-demand application environment. If you adopt a SaaS model, your entire application must be re-architected to support multi-tenancy. With cloud computing, the cost of a machine year in the Amazon EC2 cloud (~$880 annually) is much less than the cost of a fully-loaded developer (anywhere from $400-$1000 per day). This makes it a lot less expensive to scale up more virtual servers in the cloud than it is to spend even one day on development.
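To put those figures in rough perspective, the arithmetic behind the claim is simply $880 per machine-year divided by a $400-$1000 fully-loaded developer day: a full year of EC2 capacity costs about one to two days of development effort ($880 / $400 ≈ 2.2 days; $880 / $1000 ≈ 0.9 days).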
Finally, you can save money by designing your application with a simpler architecture ideal for cloud computing, which we'll spend the rest of this paper discussing. A simpler architecture speeds time to market because it is easier to test, and you can eliminate some of the equipment and processes required to migrate an application from development into production. All the activities involved with development, test, QA and production can exist side-by-side in separate instances running in the cloud.

ARCHITECTURAL CONSIDERATIONS
Figure 1: MGCFP
Figure 1 shows the architecture of the mining-grid centric e-finance portal (MGCFP) that has been developed by us. The MGCFP consists of the following primary applications: banking, investment, insurance, mortgage and loans, and wealth management, as a set of integrated financial services. The architecture comprises a distributed, multi-tiered, service-oriented, component-based solution that offers a high degree of modularity. The solution is available on the open industry-standard J2EE platform. The portal enables the financial enterprise to have a common infrastructure that encapsulates business rules, back-end connectivity logic and transaction behavior, enabling banks to write once and deploy everywhere, across channels. The solution ensures a unified view of customer interactions to both the customers and the enterprises.

Designing an application to run as a virtual appliance in a cloud computing environment is very different than designing it for an on-premise or SaaS deployment. To be successful in the cloud, your application must be designed to scale easily, tolerate failures and include management tools. We discuss these considerations below.

Scale
Cloud computing offers the potential for nearly unlimited scalability, as long as the application is designed to scale from the outset. The best way to ensure this is to follow some basic application design guidelines:
Start simple: Avoid complex design and performance enhancements or optimizations in favor of simplicity. It's a good idea to start with the simplest application and rely on the scalability of the cloud to provide enough servers to ensure good application performance. Once you've gotten some traction and demand has grown, you can focus on improving the efficiency of your application, which allows you to serve more users with the same number of servers or to reduce the number of servers while maintaining performance. Some common design techniques to improve performance include caching, server affinity, multi-threading and tight sharing of data, but they all make it more difficult to distribute your application across many servers. This is the reason you don't want to introduce them at the outset; only consider them when you need to and can ensure that you are not breaking horizontal scalability.
Split application functions and couple loosely: Use separate systems for different pieces of application functionality and avoid synchronous connections between them. Again, as demand grows, you can scale each one independently instead of having to scale the entire
can be planned and executed ahead of any real production failure.
Be aware of the real cost of failure: Of course the ideal situation is avoiding any application failure, but what is the cost to provide that assurance? A large internet company once said that it could tolerate failure as long as the impact was small enough not to be noticeable to the overall customer base. This assertion came from an analysis of what it would cost to ensure seven nines of application uptime versus the impact of a failure on a portion of the customer base.

Manage
Deploying cloud applications as virtual appliances makes management significantly easier. The appliances should bring with them all of the software they need for their entire lifecycle in the cloud. More important, they should be built in a systematic way, akin to an assembly-line production effort as opposed to a hand-crafted approach. The reason for this systematic approach is the consistency of creating and re-creating images. We have shown how effectively scaling and failure recovery can be handled by rapid provisioning of new systems, but these benefits cannot be achieved if the images to be provisioned are not consistent and repeatable.
When building appliances, it is obvious that they should contain the operating system and any middleware components they need. Less obvious are the software packages that allow them to automatically configure themselves, monitor and report their state back to a management system, and update themselves in an automated fashion. Automating the appliance configuration and updates means that as the application grows in the cloud, the management overhead does not grow in proportion. In this way appliances can live inside the cloud for any length of time with minimal management overhead. When appliances are instantiated in the cloud, they should also plug into a monitoring and management system. This system will allow you to track application instances running in the cloud, migrate or shut down instances as needed, and gather logs and other system information necessary for troubleshooting or auditing. Without a management system to handle the virtual appliances, it is likely that the application will slowly sprawl across the cloud, wasting resources and money.
Your management system also plays an important role in the testing and deployment process. We have already highlighted how the cloud can be used for everything from general testing to load testing to testing for specific failure scenarios. Including your testing in your management system allows you to bring up a test cluster, conduct any testing that is required and then migrate the tested application into production. The uniform resources that underlie the cloud mean that you can achieve a rapid release-to-production process, allowing you to deliver updated features and functions to your customers faster.
Finally, by automating the creation and management of these appliances, you are tackling one of the most difficult and expensive problems in software today: variability. By producing a consistent appliance image and managing it effectively, you are removing variability from the release management and deployment process. Reducing the variability reduces the chances of mistakes – mistakes that can cost you money.

The advantages of designing your application for management in the cloud include:
• Reducing the cost and overhead of preparing the application for the cloud
• Reducing the overhead of bringing up new instances of the application
• Eliminating application sprawl
• Reducing the chance for mistakes as the application is scaled out, failed over, upgraded, etc.

ISSUES OF CLOUD COMPUTING
The legal issues arising out of cloud computing can be broadly categorized as operational, legislative or regulatory, security, third-party contractual limitations, risk allocation or mitigation, and those relating to jurisdiction.
Operational legal issues concern legal issues that arise from the use of cloud computing services on a day-to-day basis, and include concerns such as access to information of the business and the manner of storage of the said information. It is imperative that such issues be addressed prior to availing the services of a service provider, and be adequately dealt with in the contractual negotiations. Also included in operational issues are those of upgrade and vendor lock-in. This would imply that the business must consider whether, while performing its operations, it would be able to upgrade to newer operating procedures and systems, and who, and to what extent, shall be responsible for the process.
Another operational concern that a business must consider is data portability. Would it be possible, in the event of discontinuation of the relationship between the vendor and the business, or in case of technical, financial or other difficulties, for the business to access its information through other applications or service providers? It is essential for businesses to consider such a scenario, since there have been several instances of data being lost due to technical hitches or due to the vendor closing up shop. Such contingencies, if provided for and dealt with in the contract between the parties, can go a long way in eliminating risks and also allocating liability in case of loss.

REFERENCES
[1] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Communications of the ACM, ACM, January 2008, pp. 107-113.
[2] L.A. Barroso, J. Dean, U. Holzle, "Web Search for a Planet: The Google Cluster Architecture", IEEE Micro, March-April 2003, pp. 22-28.
[3] S. Ghemawat, H. Gobioff, S. Leung, "The Google File System", SOSP, ACM, December 2003, pp. 29-43.
[4] L. Ivanov, H. Hadimioglu, M. Hoffman, "A New Look at Parallel Computing in the Computer Science Curriculum", Journal of Computing Sciences in Colleges, Consortium for Computing Sciences in Colleges, May 2008, pp. 176-179.
[5] B. Raghavan, K. Vishwanath, S. Ramabhadran, K. Yocum, A.C. Snoeren, "Cloud Control with Distributed Rate Limiting", SIGCOMM Computer Communication Review, ACM, October 2007, pp. 337-348.
[6] A. Weiss, "Computing in the Clouds", netWorker, ACM, December 2007, pp. 16-25.
[7] J. Hu, N. Zhong, "Developing Mining-Grid Centric e-Finance Portal", in Proc. of the International Conference on Web Intelligence, Hong Kong, 2006, pp. 966-969.
The power supply is automatically cut off to prevent electrical damage, and the user is alerted in time for troubleshooting. The motor unit is built around the PIC16F877A microcontroller, which receives data from the voltage and current sampling and signal processing unit through the mutual-inductance current output and mutual-inductance voltage output. A temperature sensing circuit and a vibration-fault transform circuit also feed their data to the PIC microcontroller for fault-data storage. A ZigBee receiver is connected to the PIC, along with an LCD display for showing the data, the motor fault control circuit, and the fault data storage.
Figure 2. Monitor unit (block diagram: monitoring program, motor data base, PC, ZigBee link)

III. ZIGBEE MODULE:
The IEEE 802.15.4 standard defines the protocol and interconnection of devices via radio communication in a personal area network (PAN). It operates in the ISM (Industrial, Scientific and Medical) radio bands, at 868 MHz in Europe, 915 MHz in the USA and 2.4 GHz worldwide. The purpose is to provide a standard for ultra-low complexity, ultra-low cost, ultra-low power consumption, and low-data-rate wireless connectivity. The system framework for the health monitoring system based on a wireless sensor network is made up of data collection nodes and a PAN network coordinator.
The data collection nodes can carry out desired functions such as detecting the vibration signals, signal quantizing, simple processing, and IEEE 802.15.4 standard package framing to transmit data to the PAN network coordinator. In addition, they can also receive data frames from other nodes, add multi-hop information, repackage the frames, and then transmit the new data frames toward the network coordinator in the same manner. Once it receives the data, the PAN network coordinator uploads the received data to a computer for further processing and analysis.

Current transformer:
A current transformer senses the current in the induction motor and converts it into a corresponding voltage signal. The current transformer output (through a signal conditioning circuit) is given to an analog input port pin of the microcontroller. Here we use the PIC16F877A, which has an inbuilt A/D converter.

Potential transformer:
Similarly, the potential transformer steps the line voltage down to 5 V peak. The voltage signal is given to an analog input pin of the PIC16F877A.
Example: if the potential transformer output is given to input AN0 (the first ADC channel) of the PIC controller, we have to call set_adc_channel(0) and then read_adc(). In the main function:

    int z;
    set_adc_channel(0);   // select ADC channel AN0
    delay_ms(100);        // let the sampling capacitor settle
    z = read_adc();       // read the digitized PT output

Here the variable z holds the digital output of the potential transformer.

Sensors:
Temperature sensor LM35:
These sensors use a solid-state technique to determine the temperature, exploiting the fact that as temperature increases, the voltage across a diode increases at a known rate. Usually, a temperature sensor converts the temperature into an equivalent voltage output; the IC LM35 is such a sensor.
Here we describe a simple temperature measurement and display system based on the LM35 sensor and the PIC16F877A microcontroller. The temperature in degrees Celsius is displayed on a 16×2 LCD. Fig. 3 shows the functional block diagram of the temperature monitoring system.
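Extending the ADC example above, the raw reading can be converted to a Celsius value before it is written to the LCD. The following is a hedged sketch, not code from the paper: it assumes a 10-bit ADC with a 5 V reference (typical for the PIC16F877A, but configuration-dependent) and the LM35's 10 mV/°C output scale, under which the millivolt value is numerically equal to tenths of a degree:

    // Hypothetical helper: convert a 10-bit ADC count of the LM35 output
    // to tenths of a degree Celsius. Assumes Vref = 5 V, so one count is
    // 5000/1023 mV, and the LM35 produces 10 mV per degree C.
    int16 lm35_tenths_celsius(int16 counts) {
        int32 mv = (int32)counts * 5000 / 1023;  // ADC counts -> millivolts
        return (int16)mv;                        // 10 mV/°C => mV = 0.1 °C units
    }

For example, a reading of 62 counts is about 303 mV, i.e., roughly 30.3 °C.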
Figure 3. Circuit of the temperature monitoring system

Vibration sensor:
The vibration sensor can be mounted on the grinding machine either by use of the magnetic mount provided, or by permanent stud mount. The magnetic mount should be used during initial system start-up, until a good permanent location is found on the grinding machine for the sensor. The sensor can then be permanently stud-mounted at that location. When stud-mounting the sensor, a machined flat should be supplied at the mounting location on the machine.

Signal conditioning unit:
The outputs of both the current and voltage transformers are alternating current (AC). The microcontroller unit works on DC, so we must convert AC to DC. For that purpose we use a bridge rectifier and a variable resistor. This setup is called the signal conditioning circuit.

IV. BIBLIOGRAPHY:
[1] HAO Yingji and LI Liangfu, "A study of an intelligent monitoring protection system based on the 80C196 microcomputer for use with motors," Industrial Instrumentation & Automation, No. 4, 2001, pp. 50-55.
[2] Jiang Xianglong, Cheng Shanmei and Xia Litao, "The Design of Intelligent Monitor and Protection System of AC Motor," Monitor and Protection, No. 8, 2001, pp. 18-20.
[3] LI Junbin, ZHANG Yanxian and YANG Guangde, "Application of PIC MCU in Electromotor Protect," Journal of Zibo University, Vol. 3, No. 4, Dec. 2001, pp. 57-59.
[4] YI Pan, SHI Yihui and ZHANG Chengxue, "Study of low voltage motor protection devices," RELAY, Vol. 34, No. 19, 2006, pp. 7-10.
[5] ZHANG Nan, HUANG Yizhuang and LI Xuanhua, "Multi-task Processing in the Integrated Protection Device," Relay, Vol. 31, No. 3, 2003, pp. 31-32.
[6] HU Zhijian, ZHANG Chengxue and CHENG Yunping, "Study on Protective Algorithm for Elimination of Decaying Aperiodic Component," Power System Technology, Vol. 25, No. 3, 2001, pp. 7211.
such delay, its performance can be poor or it can be rendered totally inoperative and useless. Our Reliability Driven middleware, ReD, allows an MoG scheduler to make informed decisions, selectively submitting job portions to hosts having superior checkpointing arrangements in order to ensure successful completion by 1) providing highly reliable checkpointing, increasing the probability of successful recovery, minimizing rollback delay, and 2) providing performance prediction to the scheduler, enabling the client's specified maximum delay tolerance to be better negotiated and matched with MoG resource capabilities. Suitable for scientific applications, MoGs are particularly useful in remote areas where access to the wired Grid is infeasible and autonomous, collaborative computing is needed.
checkpointed data must be stored on stable safe storage (i.e., a computer server or PC, dubbed a base station, BS, on a wired network). The methodology is supported by recent routing mechanisms that interconnect inadvertently partitioned ad hoc MH networks [19]. Checkpointing wireless MHs to BSs has its own drawbacks, however, when not all MHs are adjacent to BSs or when BSs do not exist (like the MoG at hand). Mobility is a major impediment to moving checkpointed data from MH to BS. A complication is that routes between MH and BS change frequently due to varying wireless links, complete and intermittent disconnections, and mobility. The frequent need for multihop relays of checkpoint messages to access wired storage can lead to heavy traffic, significant latency, and needless power consumption due to collisions and interference.

III. DECENTRALIZED CHECKPOINTING IN THE MOG
This work focuses on the MH checkpointing arrangement mechanism, seeking superior checkpointing arrangements to maximize the probability of distributed application completion without sustaining an unrecoverable failure. It deals with MoG checkpointing among neighboring MHs without any access point or BS.
2. If interested in some statistic f(θ): compute E_{θ|x}[f(θ)]. This is a "full" Bayesian approach.

Problem 1: Normalization. How do we compute the normalization constant p(x) (the evidence)?
1) The evidence is the value of the joint distribution of the sample x_1, ..., x_n at the single point (x_1, ..., x_n). We cannot hope to estimate p(x) from a single point!
2) The evidence is also the normalization constant of the posterior, so

    p(x) = ∫ p(x|θ) p(θ) dθ

With p(x|θ) and p(θ) known, p(x) is computable, but one has to integrate.
Problem 2: Expectations require integration against p(x|θ)p(θ). Analytic integration is perfect if it works, but even for many simple standard models (e.g., Gaussian likelihood + Cauchy prior) the integral has no analytic solution. Quadrature is the next step if analysis does not work; its problem is the curse of dimensionality (example: estimating the parameters of a 3D Gaussian requires quadrature on a 9D grid). Monte Carlo integration, e.g., MCMC sampling, is very powerful but requires some expertise.
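To make the Monte Carlo option concrete, the evidence integral above can be approximated by drawing θ from the prior and averaging the likelihood: p(x) ≈ (1/N) Σ_i p(x|θ_i). The following is only an illustrative sketch in C; the Gaussian likelihood and standard-normal prior are assumptions chosen for the example, not part of the original text:

    #include <math.h>
    #include <stdlib.h>

    /* Gaussian density N(v; mean, sd^2). */
    static double gauss_pdf(double v, double mean, double sd) {
        double z = (v - mean) / sd;
        return exp(-0.5 * z * z) / (sd * sqrt(2.0 * 3.14159265358979));
    }

    /* One draw from N(mean, sd^2) via the Box-Muller transform. */
    static double gauss_sample(double mean, double sd) {
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return mean + sd * sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979 * u2);
    }

    /* Monte Carlo estimate of p(x) = integral of p(x|theta) p(theta) dtheta:
     * sample theta_i from the prior, average the likelihood of the data. */
    double evidence_mc(const double *x, int n, long trials) {
        double sum = 0.0;
        for (long i = 0; i < trials; i++) {
            double theta = gauss_sample(0.0, 1.0);   /* prior: N(0, 1) */
            double lik = 1.0;
            for (int k = 0; k < n; k++)              /* likelihood: product of N(x_k; theta, 1) */
                lik *= gauss_pdf(x[k], theta, 1.0);
            sum += lik;
        }
        return sum / trials;                         /* unbiased estimate of the evidence */
    }

As the text warns, this naive estimator degrades quickly in higher dimensions, which is why MCMC sampling is usually preferred.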
Supercomputers," Proc. Int'l Conf. Dependable Systems and Networks, pp. 812-821, July 2005.
[13] A. Agbaria and W. Sanders, "Application-Driven Coordination-Free Distributed Checkpointing," Proc. 25th IEEE Conf. Distributed Computing Systems, pp. 177-186, June 2005.
[14] A. Oliner, R. Sahoo, J. Moreira, and M. Gupta, "Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems," Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp., Apr. 2005.
[15] C. Lin, S. Kuo, and Y. Huang, "A Checkpointing Tool for Palm Operating System," Proc. Int'l Conf. Dependable Systems and Networks, pp. 71-76, July 2001.
[16] D. Pradhan, P. Krishna, and N. Vaidya, "Recoverable Mobile Environment: Design and Trade-Off Analysis," Proc. Symp. Fault-Tolerant Computing, pp. 16-25, June 1996.
[17] H. Higaki and M. Takizawa, "Checkpoint-Recovery Protocol for Reliable Mobile Systems," Proc. 17th IEEE Symp. Reliable Distributed Systems, pp. 93-99, Oct. 1998.
[18] C. Ou, K. Ssu, and H. Jiau, "Connecting Network Partitions with Location-Assisted Forwarding Nodes in Mobile Ad Hoc Environments," Proc. 10th IEEE Pacific Rim Int'l Symp. Dependable Computing, pp. 239-247, Mar. 2004.
[19] K. Ssu et al., "Adaptive Checkpointing with Storage Management for Mobile Environments," IEEE Trans. Reliability, vol. 48, no. 4, pp. 315-324, Dec. 1999.
[20] G. Cao and M. Singhal, "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems," IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, Feb. 2001.
made dynamically based on the network connectivity. One classification of wireless ad hoc network is the MANET (mobile ad hoc network). A mobile ad hoc network (MANET), sometimes called a mobile mesh network, is a self-configuring network of mobile devices connected by wireless links. Each device in a MANET is free to move independently in any direction, and will therefore change its links to other devices frequently. Each must forward traffic unrelated to its own use, and therefore be a router. The primary challenge in building a MANET is equipping each device to continuously maintain the information required to properly route traffic [1]. Such networks may operate by themselves or may be connected to the larger Internet. MANETs are a kind of wireless ad hoc network that usually has a routable networking environment on top of a link-layer ad hoc network. They are also a type of mesh network, but many mesh networks are not mobile or not wireless. MANETs are highly vulnerable to attacks due to the open medium, dynamically changing network topology, co-operative algorithms, lack of a centralized monitoring and management point, and lack of a clear line of defense. One of the typical routing protocols for MANETs is called Ad Hoc On-Demand Distance Vector (AODV) [2]. One of the possible and commonest attacks in ad hoc networks is the black hole attack.

B. AODV PROTOCOL
Ad hoc On-Demand Distance Vector (AODV) routing is a routing protocol for mobile ad hoc networks (MANETs) and other wireless ad hoc networks. It is a reactive routing protocol, meaning that it establishes a route to a destination only on demand. In contrast, the most common routing protocols of the Internet are proactive, meaning they find routing paths independently of the usage of the paths. AODV is, as the name indicates, a distance-vector routing protocol. AODV avoids the counting-to-infinity problem of other distance-vector protocols by using sequence numbers on route updates, a technique pioneered by DSDV. AODV is capable of both unicast and multicast routing.
In AODV, the network is silent until a connection is needed. At that point the network node that needs a connection broadcasts a request for connection. Other AODV nodes forward this message, and record the node that they heard it from, creating an explosion of temporary routes back to the needy node. When a node receives such a message and already has a route to the desired node, it sends a message backwards through a temporary route to the requesting node. The needy node then begins using the route that has the least number of hops through other nodes. Unused entries in the routing tables are recycled after a time. When a link fails, a routing error is passed back to a transmitting node, and the process repeats. Much of the complexity of the protocol is to lower the number of messages and so conserve the capacity of the network. For example, each request for a route has a sequence number. Nodes use this sequence number so that they do not repeat route requests that they have already passed on. Another such feature is that the route requests have a "time to live" number that limits how many times they can be retransmitted. Another such feature is that if a route request fails, another route request may not be sent until twice as much time has passed as the timeout of the previous route request. The advantage of AODV is that it creates no extra traffic for communication along existing links. Also, distance vector routing is simple, and doesn't require much memory or calculation. However, AODV requires more time to establish a connection, and the initial communication to establish a route is heavier than in some other approaches. A RouteRequest carries the source identifier, the destination identifier, the source sequence number, the destination sequence number, the broadcast identifier, and the time to live (TTL) field.
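The RouteRequest fields just listed map naturally onto a packet structure. The sketch below is illustrative only: the field names and widths are ours, and the authoritative on-the-wire layout is the one in RFC 3561.

    #include <stdint.h>

    /* Illustrative RouteRequest (RREQ) carrying the fields named above.
     * See RFC 3561 for the exact AODV message format. */
    struct rreq {
        uint32_t src_id;        /* source (originator) identifier         */
        uint32_t dst_id;        /* destination identifier                 */
        uint32_t src_seq;       /* source sequence number                 */
        uint32_t dst_seq;       /* last known destination sequence number */
        uint32_t broadcast_id;  /* per-source broadcast (RREQ) identifier */
        uint8_t  ttl;           /* time to live: bounds retransmissions   */
    };

A node that has already seen a given (src_id, broadcast_id) pair simply drops the duplicate, which is the duplicate-suppression behavior described above.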
C. BLACK HOLE ATTACK
One of the possible and commonest attacks in ad hoc networks is the black hole attack [3]. In the black hole attack, a malicious node advertises itself as having the shortest path to the destination node. The black hole attack is a security threat in which the traffic is redirected to a node that actually does not exist in the network. It is an analogy to the black hole in the universe, in which things disappear. The node presents itself to other nodes in such a way that it can attack other nodes and networks, knowing that it has the shortest
path. MANETs must have a secure way for transmission and communication, which is a quite challenging and vital issue.

2. PROBLEM STATEMENT
When a node moves out of the transmission range of the source node, the source node assumes a normal node to be a malicious node (false positive). When a normal node detects a malicious node, the node will broadcast an alarm message. However, this alarm message may not arrive at the source node in time for various reasons, such as no route to the source node. Thus the source node cannot judge that there is a malicious node in the route and begins to send data along this dangerous route (false negative). The sequence number generated by a destination node is very important; a malicious node can play the role of a sequence number collector in order to get the sequence numbers of as many other nodes as possible by broadcasting requests with high frequency to different nodes in the MANET, so that this collector always keeps the freshest sequence numbers of other nodes.

3. RELATED WORK
Satoshi Kurosawa et al. [4] proposed a dynamic learning method to detect the black hole attack in AODV-based MANETs. This paper analyzes the black hole attack, which is one of the possible attacks in ad hoc networks. In conventional schemes, anomaly detection is achieved by defining the normal state from static training data. However, in mobile ad hoc networks, where the network topology dynamically changes, such a static training method cannot be used efficiently. In this paper, an anomaly detection scheme using a dynamic training method, in which the training data is updated at regular time intervals, is proposed. Latha Tamilselvan et al. [5] proposed a method to prevent the black hole attack in MANETs. To reduce the probability, it is proposed to wait and check the replies from all the neighboring nodes to find a safe route. Handling the timer expiration event is difficult, and the handling of route reply packets takes more time. Payal N. Raj et al. [6] proposed a dynamic learning system against the black hole attack in AODV-based MANETs. In this paper, a DPRAODV (Detection, Prevention and Reactive AODV) to prevent security threats of the black hole by notifying other nodes in the network of the incident is proposed. The prevention scheme detects the malicious nodes and isolates them from the active data forwarding and routing, and reacts by sending an ALARM packet to its neighbors. The calculation of the threshold value is difficult. Zhao Min et al. [7] proposed a method to prevent the cooperative black hole attack for MANETs. Two authentication mechanisms, based on the hash function, the Message Authentication Code (MAC) and the Pseudo Random Function (PRF), are proposed to provide fast message verification and group identification, identify multiple black holes cooperating with each other, and discover the safe routing avoiding the cooperative black hole attack. Xiao Yang Zhang et al. [8] proposed a method to detect the black hole attack in MANETs. Every conventional method to detect such an attack has a defect of a rather high rate of misjudgment in the detection. In order to overcome this defect, a new detection method based on checking the sequence number in the Route Reply message, making use of a new message originated by the destination node and also monitoring the messages relayed by the intermediate nodes in the route, is proposed. It can detect more than one black hole attacker at the same time, with no need of any threshold or trust level. When a node moves out of the transmission range of the source node, the source node assumes the normal node to be a malicious node. When a normal node detects a malicious node, the node will broadcast an alarm message; however, this alarm message may not arrive at the source node in time for various reasons, such as no route to the source node. Thus the source node cannot judge that there is a malicious node in the route and begins to send data along this dangerous route. H. Lan Nguyen et al. [9] made a study of different types of attacks on multicast in MANETs. Security is an essential requirement in mobile ad hoc networks (MANETs). In the above study the following are analyzed: First, protocols that use the duplicate suppression mechanism, such as ODMRP, MAODV, and ADMR, are very vulnerable to rushing attacks. Second, although the operations of black hole attacks and neighbor attacks are different, they both cause the same degree of damage to the performance of a multicast
group in terms of packet loss rate. Finally, jellyfish attacks do not affect the packet delivery ratio or the throughput of a multicast group, but they severely increase the packet end-to-end delay and delay jitter. The performance of a multicast session in MANETs under attacks depends heavily on many factors, such as the number of multicast senders, the number of multicast receivers, and the number of attackers as well as their positions.

4. PROPOSED SCHEME
This section presents the extension of Association-based Routing, which is to be applied over the AODV protocol in order to enhance the security. The purpose of this scheme is to fortify the existing implementation by selecting the best and most secure route in the network. For each node in the network, a trust value will be stored that represents the trustworthiness of each of its neighbor nodes. This trust value will be adjusted based on the experiences that the node has with its neighbor nodes.
In our proposed scheme we classify the association among the nodes and their neighboring nodes into three types, as below.

1) UNASSOCIATED (UA)
The nodes that newly joined the network, and those nodes which have not forwarded any message, come under this category. The trust levels are very low and the probability of malicious behavior is very high.

2) ASSOCIATED (A)
The nodes that have started to send messages but have some more messages to forward come under this category. The trust levels are neither low nor too high; the probability of malicious nodes in the network is to be observed.

3) FRIEND
... the probability of malicious behavior is very less. Table 1 is the association table of node 1 in Fig. 1.

A. BLOCK DIAGRAM
(Block diagram: nodes entering the MANET; the behavior of each node is analyzed; the node is classified as Unassociated (black hole node), Associated, or Friend. Unassociated nodes are not given preference in route selection and data forwarding; Associated and Friend nodes are given preference, and the message is forwarded to the correct destination.)

B. CALCULATION OF TRUST VALUE
The trust values are calculated based on the following parameters of the nodes. We propose a very simple equation for the calculation of the trust value:
R = the ratio between the number of packets forwarded and the number of packets to be forwarded
A = acknowledgement bit (0 or 1)
The threshold trust level is calculated by using (1):
T = tanh(R + A)    (1)
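A node can evaluate equation (1) directly from its forwarding statistics. The sketch below is our illustration, not the authors' code; the function name and the divide-by-zero guard are additions:

    #include <math.h>

    /* Trust value per equation (1): T = tanh(R + A), where R is the ratio of
     * packets forwarded to packets that should have been forwarded, and A is
     * the acknowledgement bit (0 or 1). */
    double trust_value(unsigned forwarded, unsigned to_forward, int ack_bit) {
        double r = (to_forward > 0) ? (double)forwarded / to_forward : 0.0;
        return tanh(r + ack_bit);   /* bounded below 1.0, rises with good behavior */
    }

With a perfect forwarder that also acknowledges (R = 1, A = 1), T = tanh(2) ≈ 0.96, while a node that forwards nothing and never acknowledges gets T = 0.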
message to the destination and consumes the entire message, our proposed scheme identifies more than one black hole node, and the data is not allowed to pass through the black hole node's path; thus the delay and overhead in route selection are reduced.

REFERENCES
[1] Elizabeth M. Royer and Chai-Keong Toh, "A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks," IEEE Personal Communications, pp. 46-55, April 1999.
[2] C.E. Perkins, S.R. Das, and E. Royer, "Ad Hoc On-Demand Distance Vector (AODV) Routing", RFC 3561.
[3] H. Lan Nguyen and U. Trang Nguyen, "A study of different types of attacks on multicast in mobile ad hoc networks", Ad Hoc Networks, Vol. 6, No. 1, 2007.
[4] Satoshi Kurosawa, Hidehisa Nakayama, Nei Kato, Abbas Jamalipour, and Yoshiaki Nemoto, "Detecting Blackhole Attack on AODV-based Mobile Ad Hoc Network by Dynamic Learning Method", International Journal of Network Security, Vol. 5, pp. 338-346, November 2007.
[5] Latha Tamilselvan and V. Sankaranarayanan, "Prevention of Black Hole Attack in MANET", The 2nd International Conference on Wireless, Broadband and Ultra Wideband Communications, January 2007.
[6] Payal N. Raj and Prashant B. Swadas, "DPRAODV: A Dynamic Learning System Against Black Hole Attack in AODV Based MANET", IJCSI International Journal of Computer Science Issues, Vol. 2, 2009.
[7] Zhao Min and Zhou Jiliu, "Cooperative Black Hole Attack Prevention for Mobile Ad Hoc Networks", International Symposium on Information Engineering and Electronic Commerce, 2009.
[8] Xiao Yang Zhang, Yuji Sekiya and Yasushi Wakahara, "Proposal of a Method to Detect Black Hole Attack in MANET", Autonomous Decentralized Systems (ISADS 2009), pp. 1-6, 2009.
[9] H. Lan Nguyen and U. Trang Nguyen, "A Study of Different Types of Attacks on Multicast in Mobile Ad Hoc Networks", Ad Hoc Networks, Vol. 6, No. 1, 2007.
[10] Network Simulator official site for package distribution, http://www.isi.edu/nsnam
***Department of Computer Science and Engineering, Sun College of Engineering and Technology.

Abstract:
Congestion is a difficult problem in wireless sensor networks; it causes an increase in the amount of data loss and delays in data transmission. Both node-level and link-level congestion have a direct impact on energy efficiency and Quality of Data (QoD). In this paper, we propose a new congestion control technique called the Adaptive Compression-based Congestion Control Technique (ACT), which is designed for remote monitoring of patient vital signs and physiological signals. The compression techniques used in the ACT are the Discrete Wavelet Transform (DWT), Adaptive Differential Pulse Code Modulation (ADPCM), and Run-Length Coding (RLC). DWT is introduced for priority-based congestion control because it classifies the data into four groups with different frequencies. Congestion is detected in advance, using ACT at each intermediate sensor node. The main purpose of ACT is to guarantee high quality of data by reducing dropped data due to congestion. ACT increases the network efficiency and guarantees fairness to sensor nodes as compared with the existing methods. Moreover, it exhibits a very high ratio of available data in the sink.

1. INTRODUCTION
1.1 Wireless Sensor Networks:
With the popularity of laptops, cell phones, PDAs, GPS devices, RFID, and intelligent electronics in the post-PC era, computing devices have become cheaper, more mobile, more distributed, and more pervasive in daily life. It is now possible to construct, from commercial off-the-shelf (COTS) components, a wallet-size embedded system with the equivalent capability of a 90's PC. Such embedded systems can be supported with scaled-down Windows or Linux operating systems. From this perspective, the emergence of wireless sensor networks (WSNs) is essentially the latest trend of Moore's Law toward the miniaturization and ubiquity of computing devices. Typically, a wireless sensor node (or simply sensor node) consists of sensing, computing, communication, actuation, and power components. These components are integrated on a single or multiple boards, and packaged in a few cubic inches. With state-of-the-art, low-power circuit and networking technologies, a sensor node powered by 2 AA batteries can last for up to three years with a 1% low-duty-cycle working mode. A WSN usually consists of tens to thousands of such nodes that communicate through wireless channels for information sharing and cooperative processing. WSNs can be
2. System Model
Fig 1: Architecture Diagram (figure labels: monitoring nodes, service provider, decision)
Fig 1 shows the architecture diagram, where the physiological and vital signs are monitored from sensor nodes. These details are stored in the database, where all records about the patients are maintained. The Priority Scheduler schedules the data packets based on priority. During the transmission of data packets, the congestion is controlled using the ACT protocol.
2.1. Problem
There are mainly two types of
congestion in WSNs. The first type is the
node-level congestion which occurs due to
queue overflow inside the node. Queue
overflow might lead to packet drop and this
leads to retransmission if required and
therefore consumes additional energy.
Wireless channels are shared by several nodes
using Carrier Sense Multiple Access (CSMA)-
like protocols and thus collisions among sensor
nodes can occur when multiple sensor nodes try to occupy the channel concurrently. This is the second type of congestion: link-level congestion. Link-level congestion increases the packet service time and decreases the link utilization. Both node-level and link-level congestion have a direct impact on energy efficiency and Quality of Data (QoD). Therefore, congestion must be efficiently controlled. A congestion control protocol's efficiency depends on how well it can achieve the following objectives: first, energy efficiency should be improved in order to extend system lifetime; therefore, congestion control protocols need to avoid or reduce packet loss due to buffer overflow and maintain a low control overhead that consumes less energy. Second, it is also necessary to support traditional QoS metrics such as packet loss ratio, packet delay, and throughput. Third, fairness needs to be guaranteed so that each node can achieve fair throughput.
In this paper we propose a concept called ACT (Adaptive Compression-based Congestion Control Technique). ACT transforms the data from the time domain to the frequency domain. It reduces the range of the data by using ADPCM (Adaptive Differential Pulse Code Modulation), and then reduces the number of packets with the help of RLC before transferring data at the source node. ACT introduces the DWT (Discrete Wavelet Transform) for priority-based congestion control because the DWT classifies data into four groups with different frequencies. The ACT assigns priorities to these data groups in inverse proportion to the respective frequencies of the data groups, and defines the quantization step size of ADPCM in inverse proportion to the priorities.

2.2. Operation of ACT:
ACT checks the queue state periodically using a routing timer. If the queue is congested with packets, then the ACT checks whether the ARC and APC are applicable or not. If the ARC and APC are applicable, the ACT applies the APC to source packets and the ARC to transit packets. If the congestion persists, the quantization step size is increased drastically. If the quantization step size reaches its limit and the queue is still congested, the ACT starts to drop packets with a low priority in the queue and sends routing packets with a congestion notification. If a sensor node receives a routing packet with a congestion notification from the parent node, the child sensor node increases the transmission interval of packets in the queue and checks whether the ARC and APC are applicable. If the child node faces congestion similar to the parent node, it propagates the congestion notification to its child nodes.

2.3. Compression Technique:
In ACT we use three compression techniques, namely DWT, ADPCM, and RLC. The Discrete Wavelet Transform (DWT), which is based on sub-band coding, is found to yield a fast computation of the wavelet transform. Adaptive DPCM (ADPCM) is a variant of Differential Pulse-Code Modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required bandwidth for a given signal-to-noise ratio. The most commonly used entropy encoders are the Huffman encoder and the arithmetic encoder, although for applications requiring fast execution, simple Run-Length Coding (RLC) is very effective. It is important to note that a properly designed quantizer and entropy encoder are absolutely necessary, along with optimum signal transformation, to get the best possible compression.
Fig 2: Data compression procedure
The ACT first transforms the data from the time domain to the frequency domain by using the DWT, reduces the range of the data with the help of ADPCM, and then reduces the number of packets by employing RLC before the transfer of data at the source node. Then, it introduces the DWT for priority-based congestion control because the DWT classifies the data into four groups with different frequencies. Subsequently, it assigns priorities to these data groups in inverse proportion to the respective frequencies of the data groups and defines the quantization step size of ADPCM in inverse proportion to the priorities. RLC generates a smaller number of
packets for a packet with a low priority. In the relaying node, the ACT reduces the number of packets by increasing the quantization step size of ADPCM in case of congestion. The destination node (usually a sink node) reverses the compression procedure: a sink node should apply Inverse Run-Length Coding (IRLC), Inverse Adaptive Differential Pulse Code Modulation (IADPCM), and then the Inverse Discrete Wavelet Transform (IDWT).

2.4. Adaptive Queue Operation in Congestion
The DWT classifies the incoming data into four groups based on priority; these data will be given to the queue. The queue in the network layer works in First Come First Serve mode [Fig 3-A].
Fig 3: Queue Operation in Congestion (panels A and B)
If there is no packet in the sending queue, a packet is served immediately [Fig 3-A-(2)]. If packets are already present in the queue, the packet that was inserted last has to wait for the previous packets to be served [Fig 3-A-(3)]. If there is no congestion, packets will be served in the order in which they are inserted. If packets are congested in the queue, they must wait until the congestion is reduced [Fig 3-A-(4)]. During congestion, the queue is served in the clockwise direction and a routing packet is inserted at the end of the queue.
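When the queue saturates despite the adaptive steps above, ACT drops low-priority packets first. The following is a minimal sketch of that drop rule; the struct and names are our hypothetical illustration, not the paper's implementation:

    /* Drop the lowest-priority packet when the queue is full.
     * Priority 0 is highest (the lowest-frequency DWT group). */
    typedef struct { int priority; /* ... payload ... */ } packet_t;

    int index_to_drop(const packet_t q[], int len) {
        int worst = 0;
        for (int i = 1; i < len; i++)
            if (q[i].priority > q[worst].priority)  /* larger value = lower priority */
                worst = i;
        return worst;  /* caller removes this slot to admit the new packet */
    }

This keeps the high-priority (low-frequency) groups intact, matching the inverse-proportion priority rule described in Section 2.3.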
2.5. APC
The DWT and RLC are similar to the ones conventionally used, and the ADPCM is a reduced version used for sensed-data compression. The encoder of the ADPCM consists of a difference signal computation, an adaptive quantizer, an inverse adaptive quantizer, quantizer scale factor adaptation, and a signal reconstructor. The difference signal computation subtracts the reconstructed signal from the original data. The subtraction of two successive data packets that have similar values reduces the range of data values. The range of values is reduced again by the adaptive quantizer. The quantized data is encoded by RLC and then inserted into the network layer. The quantized data are also inversely quantized by an inverse adaptive quantizer and reconstructed into the signal by the signal reconstructor. The number of generated packets is varied by the adaptive quantizer, which is controlled by the quantizer scale factor adaptation: the quantizer scale factor adapter increases the quantizer step size in proportion to the strength of congestion. Further, the decoder of the ADPCM consists of an inverse adaptive quantizer, a signal reconstructor, and a quantizer scale factor adapter.

2.6. ARC
The ARC controls the output packet rate by re-encoding the transit packets in the relaying queue. It consists of RLC, IRLC, ADPCM, and IADPCM units. Transit packets are in a compressed state after RLC is performed, and therefore the ARC first decodes them with the help of IRLC. Then, the decompressed transit packets are reconstructed by IADPCM and compressed again with a different quantizer step size, which is affected by the congestion state.
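The run-length stage that the APC and ARC both rely on is straightforward. The following sketch encodes a buffer as (value, count) pairs; it is our illustration of generic RLC, not the authors' implementation:

    #include <stdint.h>

    /* Run-length encode src[0..n) into (value, count) pairs in dst.
     * Returns the number of bytes written to dst (2 bytes per run).
     * The caller must size dst for the worst case of 2*n bytes. */
    int rlc_encode(const uint8_t *src, int n, uint8_t *dst) {
        int out = 0;
        for (int i = 0; i < n; ) {
            uint8_t value = src[i];
            int run = 1;
            while (i + run < n && src[i + run] == value && run < 255)
                run++;                     /* extend the run, capped at 255 */
            dst[out++] = value;
            dst[out++] = (uint8_t)run;
            i += run;
        }
        return out;
    }

Because ADPCM's coarser quantization under congestion produces longer runs of identical values, raising the quantizer step size directly shrinks the RLC output, which is how the ARC trades fidelity for packet count.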
3. Simulation Results
We use a simulation study to evaluate the performance of the proposed protocol under different scenarios. For this purpose, we simulated a wireless biomedical sensor network including 10 different patients.
to the particular patient's sensor node and increases the patient's priority. All sensor nodes along the path detect this change in situation and allocate more network bandwidth for vital signs and physiological signals from the patients in need.

References:
1. Moghaddam, M.H.Y. Adjeroh, "A Novel Congestion Control Protocol for Vital Signs Monitoring in Wireless Biomedical Sensor Networks," Proceeding of IEEE International Conference on Information Acquisition, 08 July 2010.
2. Lee, J.-H.; Jung, I.-B. Adaptive-Compression Based Congestion Control Technique for Wireless Sensor Networks. Sensors 2010, 10, 2919-2945.
3. H. L. Ren, M. Q. H. Meng, and X. J. Chen, "Physiological Information Acquisition through Wireless Biomedical Sensor Networks," Proceeding of IEEE International Conference on Information Acquisition, July 2005.
4. M.H. Yaghmaee and Donald Adjeroh, "A New Priority Based Congestion Control Protocol for Wireless Multimedia Sensor Networks", in Proceedings, 9th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WOWMOM 2008), Newport Beach, CA, June 23-27, 2008.
5. B. Hull, K. Jamieson, and H. Balakrishnan, "Mitigating Congestion in Wireless Sensor Networks," in Proc. ACM SenSys '04, Baltimore, MD, Nov. 3-5, 2004.
6. C.-Y. Wan, S. B. Eisenman, and A. T. Campbell, "CODA: Congestion Detection and Avoidance in Sensor Networks," in Proc. ACM SenSys '03, Los Angeles, CA, Nov. 5-7, 2003.
Abstract—In infrastructure-less ad hoc networks, efficient usage of energy is very critical because of the limited energy available to the sensor nodes. Among various phenomena that consume energy, radio communication is by far the most demanding one. One of the effective ways to limit unnecessary energy loss is to control the power at which the nodes transmit signals. In this paper, we apply game theory to solve the power control problem in a CDMA-based distributed sensor network. We formulate a noncooperative game under incomplete information and study the existence of Nash equilibrium. With the help of this equilibrium, we devise a distributed algorithm for optimal power control and prove that the system is power stable only if the nodes comply with certain transmit power thresholds. We show that even in a noncooperative scenario, it is in the best interest of the nodes to comply with these thresholds. The power level at which a node should transmit, to maximize its utility, is evaluated. Moreover, we compare the utilities when the nodes are allowed to transmit with discrete and continuous power levels; the performance with discrete levels is upper bounded by the continuous case. We define a distortion metric that gives a quantitative measure of the goodness of having finite power levels and also find those levels that minimize the distortion. Numerical results demonstrate that the proposed algorithm achieves the best possible payoff/utility for the sensor nodes even while consuming less power.
Index Terms—Wireless ad hoc network, game theory, distributed power control, energy efficiency.
I. INTRODUCTION
The advancements in wireless communication technologies coupled with the techniques for miniaturization of electronic devices have enabled the development of low-cost, low-power, multifunctional sensor networks. The sensor nodes in these networks are equipped with sensing mechanisms that gather and process information. These nodes are also capable of communicating untethered over short distances [1]. Oftentimes, sensor networks are deployed at locations that do not allow human intervention due to difficulty in accessing such areas; hence, refurbishing energy via replacing batteries is infeasible. As a result, these networks are deployed only once, with a finite amount of energy available to every sensor node. As energy is depleted by sensing, computing, and communication activity, the algorithms and protocols that are used must be as energy efficient as possible. Since the transmission of data signals consumes the most energy, transmission at the optimal transmit power level is very crucial. This is because a node will always try to transmit at high power levels just to make sure that the packets are delivered with a high success probability. Hence, smart power control algorithms must be employed that find the optimal transmit power level for a node for a given set of local conditions. Some distributed iterative power control algorithms have been
proposed for cellular networks; these algorithms investigate how to find the power vector for all the nodes that minimizes the total power, with good convergence [2], [3]. In this respect, it is important that concepts from game theory are used to guide the design process of the nodes that work in a distributed manner. Ideas and fundamental results from game theory have been used for solving resource management problems in many computational systems, such as network bandwidth allocation, distributed database query optimization, and allocating resources in distributed systems such as clusters, grids, and peer-to-peer networks ([4], [5], [6], [7], [8], [9], [10], [11], and references therein). In a game theoretic framework, the nodes buy, sell, and consume goods in response to the prices that are exhibited in a virtual market. A node attempts to maximize its "profit" for taking a series of actions. Whether or not a node receives a profit is defined by the success of the action; for example, whether a packet is successfully received. The essence of this research is the application of game theory to achieve efficient energy usage through optimal selection of the transmit power level.
In this paper, we take a game theoretic approach to regulate the transmit power levels of the nodes in a distributed manner and investigate if any optimality is achievable. We focus on the problem of optimal power control in wireless sensor networks with the aim of maximizing the net utilities (defined later) for the rational sensor nodes. One may argue that the sensor nodes usually belong to the same authority, and hence they can be programmed to negotiate strategies that are most advantageous for the entire network. However, this claim may not be applicable to the power control problem in sensor networks, as strategies for transmission power and negotiation for self-coexistence must be decided in real time and in a distributed manner [12], [13]. We adopt a noncooperative game model where each node tries to maximize its net utility. Net utility is computed by considering the benefit received and the cost. In summary, the contributions of this paper are as follows:
• We formulate a noncooperative game under incomplete information for the distributed sensor nodes. We define the benefit received and the cost incurred, and hence the net utility for successful packet transmission.
• We investigate the existence of Nash equilibrium [14]. We observe that there exist a transmission power threshold and a channel quality threshold that the nodes must comply with in order to achieve Nash equilibrium. We also observe that with repeated games in effect, sensor nodes follow the transmission strategies to achieve Nash equilibrium even without the presence of any third-party enforcement.
• Next, we consider a system that would allow only a finite number of discrete power levels. A metric called the distortion factor is defined to investigate the performance of such a system and compare it with systems that would allow any continuous power level. We also propose a technique to find the power levels that would minimize the distortion.
• We present numerical results to verify the performance of the proposed games. The results show that if the nodes comply with the transmit thresholds, net utility is maximized. Also, with the proposed mechanism of finding discrete power levels, the distortion factor is reduced.

II. GAME THEORY FOR AD HOC/SENSOR NETWORKS
Game theory has been successfully used in ad hoc and sensor networks for designing mechanisms to induce desirable equilibria by offering incentives to the forwarding nodes [15], [16], [17] and also punishing nodes for misbehaving [18]. Recently, there has been a growing interest in applying game theoretic techniques to solve problems where there are agents/nodes that might not have the motive or incentive to cooperate. Such noncooperation is very likely, since the rational agents will not work (e.g., forward packets) for others unless, and until, convinced that such cooperation will eventually be helpful for themselves. In [13], Niyato et al. investigated energy harvesting technologies required for autonomous sensor networks using a noncooperative game theoretic technique. Nash equilibrium was proposed as the solution of this game to obtain the
optimal probabilities of the two states, viz., sleep and wake up, that were used for energy conservation. Their solutions revealed that sensor nodes selfishly try to conserve energy at the expense of a high packet blocking probability. Xidong et al. applied a game theoretic dynamic power management (DPM) policy for distributed wireless sensor networks using repeated stage games.
As far as ad hoc networks are concerned, Buttyan and Hubaux [16] proposed the concept of virtual currency (called "nuglets"), which is a method to reward nodes participating in forwarding packets in a mobile ad hoc network. The Terminodes project [24] has proposed a method that encourages cooperation in ad hoc networks that is based on the principles laid out in [25]. It has been well established that incorporating pricing schemes (in terms of reward and penalty) can stimulate a cooperative environment, which benefits both the network and the nodes. A traffic-pricing-based approach was proposed in [26].
Though game theory has been used to study various aspects of ad hoc and sensor networks, there is none that tries to find the optimal transmission power levels when the nodes are allowed both continuous and discrete power levels. The problem arises due to the difficulty in characterizing the information that each sensor node has about the others. Hence, seeking the desired operating point in the incomplete-information scenario becomes a challenge. Though there are several game theory power control approaches for cellular networks (see [31] and references therein), those centralized algorithms cannot be directly applied to sensor networks. In this paper, we attempt to develop a game theoretic framework that helps the nodes decide on the optimal power levels for a specified objective given by the utility function.

III. INTERFERENCE FOR RANDOMLY DISTRIBUTED NODES
We consider the problem of communication between neighboring nodes in a network that consists of sensor nodes scattered randomly over an area. Given that the sensor nodes have limited energy, buffer space, and other resources, contention-based protocols may not be a suitable option. Here, as an alternative, we use code division multiplexing, where distinct codes (signatures) can be allocated to different nodes, with possible code reuse between spatially separated nodes. In general, due to nonzero cross-correlation between node signatures, we understand that there is an upper limit on the number of simultaneously active nodes in the vicinity of a receiver (i.e., within the interference range of a receiver) so that the received SINR stays above a minimum operational threshold.
To obtain the node distribution, we use the following assumptions and definitions:
• All nodes have an omnidirectional transmit and receive antenna of the same gain.
• Receiving and interference ranges for each sensor node depend on the transmission power of the sender and of the other sensor nodes in the vicinity.
• The receiving distance is defined as the maximum distance from which a receiving node can correctly recover a transmitted signal.
• The interference distance is defined as the maximum distance from which a receiving node can sense a carrier.
• The signal power level at each receiver is controlled by the corresponding transmitter and is equal to the lowest possible operational threshold. Since the internodal distance varies randomly, the required transmit power is different for different transmitter-receiver pairs.
Fig. 1 shows node w as the receiver under consideration. Node u, while transmitting to node v, acts as an interferer to node w. Note that the reverse need not necessarily be true, since the transmission powers of node u and node w can be different. A rigorous treatment for the distribution of interference power can be found in [32].
Fig. 1. Interference at node w from a local neighbor node u.
A. Game Formulation

We assume a set of homogeneous nodes in our sensor network playing a repeated game. The information from previous rounds is used to devise strategies in future rounds. We focus our attention on a particular node with potentially as many as N neighbors within the interference range. Due to the homogeneity of the nodes, the actions allowed to the nodes are the same, i.e., every node can transmit with any power level to make its transmission successful. Also, the nodes have no information on whether the other nodes are transmitting, leading to an incomplete-information scenario [34]. If a node transmits with an arbitrarily high power level, it will increase the interference level of the other nodes. The neighboring nodes in turn will transmit at higher power to overcome the effect of the high interference. Soon, this will lead to a noncooperative situation. To control this noncooperative behavior, we try to devise an equilibrium game strategy which will impose constraints on the nodes to act in a cooperative manner even in a noncooperative network.

We assume the existence of strategy sets S_1, S_2, ..., S_N for the nodes 1, 2, ..., N. These sets consist of all possible power levels, ranging from the minimum transmit power to the maximum transmit power. We consider that the strategy profile of all the nodes is identical, i.e., all the nodes can transmit with a power level between P_min and P_max. Since all the nodes are identical, we assume that the same set of allowable transmit powers applies to all the nodes. Hence, S_i = S_j = S, where i and j denote any two nodes, and S is the fixed strategy profile of any node. Then, (2) reduces to the corresponding expression with the common strategy set S.

B. Utility

The game is played by having all the nodes simultaneously pick their individual strategies. This set of choices results in some strategy profile s, which we call the outcome of the game. Each node has a set of preferences over these outcomes s. At the end of an action, each node i receives a utility value u_i(s_i, s_-i), where s_-i is the strategy profile of all the nodes but the ith node. Note that the utility each node receives depends not only on the strategy it picked, but also on the strategies which all the other nodes picked.
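To make the utility structure above concrete, here is a minimal sketch (in Python) of computing a node's net utility over a discrete strategy set and picking its best response with the other nodes' strategies held fixed. The benefit model (success probability as a function of SINR), the noise term, and the constant c are illustrative assumptions; only the "net utility = benefit received - cost incurred" structure and the linear/quadratic/exponential cost families come from the text.

import math

# Discrete strategy set (mW); these are the levels used later in the
# numerical results: 1, 5, 20, 30, 50, and 100 mW.
POWER_LEVELS = [1.0, 5.0, 20.0, 30.0, 50.0, 100.0]

def success_probability(p, interference, noise=1e-3):
    # Placeholder benefit model: success improves with the received SINR.
    sinr = p / (interference + noise)
    return 1.0 - math.exp(-sinr)  # assumed shape, not the paper's

def net_utility(p, interference, cost="linear", c=0.01):
    # Net utility = benefit received - cost incurred.
    benefit = success_probability(p, interference)
    costs = {"linear": c * p,
             "quadratic": c * p ** 2,
             "exponential": math.exp(c * p) - 1.0}
    return benefit - costs[cost]

def best_response(interference, cost="linear"):
    # The power level maximizing net utility, holding the other nodes'
    # strategies (summarized here by the interference level) fixed.
    return max(POWER_LEVELS, key=lambda p: net_utility(p, interference, cost))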
Then, the optimal transmit power is the power level which will maximize the expected power efficiency.

A. Distortion Factor

We define the distortion factor, D, as the difference between the best possible net utility obtainable with continuous power levels and the best possible net utility obtained with discrete power levels. Given the transmission powers in the continuous and discrete cases, respectively, as P_c and P_d, the distortion factor for the ith node is represented by

D_i = max over P_c of u_i(P_c, s_-i) - max over P_d of u_i(P_d, s_-i),

where s_-i represents the strategy profile of the rest of the nodes. With an increase in the number of power levels, the distortion can be reduced.

We consider that the sensor nodes can transmit uniformly in the range [P_min, P_max]. We assume that the SINR received by the nodes is uniformly distributed between a minimum and a maximum value; the SINR is assumed to range from -12.5 dB to 11.5 dB.

Figs. 2 and 3 show the average bit error rate and the probability of successful transmission, respectively, for different values of SINR (in dB) perceived by node j from all its neighboring nodes. We show the results for two different modulation schemes: DPSK and noncoherent PSK. As expected, with improvement in channel condition, i.e., with increase in SINR, the probability of successful transmission increases.

Fig. 2. Average bit error rate.

Fig. 4 presents the maximum power efficiency for both schemes. More precisely, from the graphs, we find that if the SINR is low and the transmitting power P is high, then the power efficiency is almost equal to zero. This proves our previous claim: during bad channel conditions or below a certain threshold channel condition (when the SINR of the intended receiver node is very low), a node should not transmit, as transmitting only increases its power consumption and the expected power consumption is then no longer minimized. On the contrary, when the SINR is high, a node should transmit with low power to maximize its power efficiency; in this case, increasing the transmitting power unnecessarily will decrease the power efficiency below its maximum. It is intuitive that there will be an optimal value beyond which the net utility will only decrease. This figure serves as a guideline for calculating the desired transmitting power to maximize the net utility of a node transmitting to another node, given the strategies taken by all other nodes.

For finding the best response to the strategies adopted by other nodes, we assume a subset of nodes to be active, operating with fixed strategies. Fig. 6 shows the effect of having nonuniform power levels. We choose 1, 5, 20, 30, 50, and 100 mW as the power levels. For our calculation, we varied the transmitting power from 1 to 100 mW.
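The distortion factor defined above can be approximated numerically: take the best net utility over a fine grid of power levels as a stand-in for the continuous optimum, and subtract the best net utility achievable over the allowed discrete levels. This sketch reuses the hypothetical net_utility function from the earlier listing and the 1-100 mW range used in the calculations above.

def distortion_factor(interference, levels, cost="linear", grid_step=0.1):
    # Approximate the continuous optimum on a fine grid over [1, 100] mW.
    best_cont = float("-inf")
    p = 1.0
    while p <= 100.0:
        best_cont = max(best_cont, net_utility(p, interference, cost))
        p += grid_step
    # Best net utility restricted to the discrete strategy set.
    best_disc = max(net_utility(q, interference, cost) for q in levels)
    return best_cont - best_disc  # shrinks as the number of levels grows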
We find that there exist points for each of the cost functions considered (i.e., linear, quadratic, and exponential) which give the maximum net utility with the strategies taken by all other nodes held fixed. This desired transmitting power level gives the best response for the node. If a node unilaterally changes its strategy and does not transmit with this power level, then the node will not get its best response and will not be able to reach Nash equilibrium, even if a Nash equilibrium exists for this model.

Fig. 5 plots the net utility against the transmission power for a fixed received power. We compare the continuous power level with two sets of discrete power levels; one set has six and the other has 20 power levels. The power levels are uniformly spaced between the maximum and the minimum. As expected, with more allowed power levels, the maximum net utility gets closer to that obtained with continuous power levels. Here, we compare our proposed mechanism of finding discrete power levels based on the interference distribution against uniformly spaced discrete power levels. The result shows that the distortion factor is reduced with an increase in the number of power levels. Moreover, the distortion obtained is reduced if knowledge of the interference is used instead of uniformly spaced power levels.

Fig. 6. Net utility for nonuniform power levels.

We found that a Nash equilibrium exists if we assume a minimum threshold for the channel condition and a maximum threshold for the power level. We suggest that a node should only transmit when its channel condition is better than the minimum threshold and its transmission power level is below the threshold power level. We evaluated the desired power level at which the nodes should transmit to maximize their utilities under any given condition.
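The Nash equilibrium condition discussed above (no node gains by a unilateral deviation) can be illustrated by iterating simultaneous best responses until no node changes its power level. The interference model below (sum of the other nodes' powers scaled by a path gain) is a toy assumption; the sketch reuses the hypothetical best_response and POWER_LEVELS from the earlier listing.

def iterate_to_equilibrium(n_nodes=5, gain=0.02, max_rounds=100):
    # Start every node at the lowest power level and apply simultaneous
    # best responses; a fixed point is a Nash equilibrium of the game.
    powers = [min(POWER_LEVELS)] * n_nodes
    for _ in range(max_rounds):
        new = [best_response(gain * (sum(powers) - powers[i]))
               for i in range(n_nodes)]
        if new == powers:
            return powers  # no node can gain by deviating unilaterally
        powers = new
    return powers  # may cycle; returned as-is after max_rounds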
VIII. REFERENCES
[3] R. Yates, "A Framework for Uplink Power Control in Cellular Radio Systems," IEEE J. Selected Areas in Comm., vol. 13, no. 7, pp. 1341-1348, Sept. 1995.
[5] S. Clearwater, Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific, 1996.
[6] F. Kelly, A. Maulloo, and D. Tan, "Rate Control in Communications Networks: Shadow Prices, Proportional Fairness and Stability," J. Operations Research Soc., vol. 49, pp. 237-252, 1998.
[7] P. Key and D. McAuley, "Differential QoS and Pricing in Networks: Where Flow Control Meets Game Theory," IEE Proc. Software, vol. 146, no. 1, pp. 39-43, Feb. 1999.
[8] H. Lin, M. Chatterjee, S. Das, and K. Basu, "ARC: An Integrated Admission and Rate Control Framework for CDMA Data Networks Based on Non-Cooperative Games," Proc. Ninth Ann. Int'l Conf. Mobile Computing and Networking, pp. 326-338, 2003.
[9] R. Maheswaran and T. Basar, "Decentralized Network Resource Allocation as a Repeated Noncooperative Market Game," Proc. 40th IEEE Conf. Decision and Control, vol. 5, pp. 4565-4570, 2001.
[10] M. Stonebraker, R. Devine, M. Kornacker, W. Litwin, A. Pfeffer, A. Sah, and C. Staelin, "An Economic Paradigm for Query Processing and Data Migration in Mariposa," Proc. Third Int'l Conf. Parallel and Distributed Information Systems, pp. 58-67, Sept. 1994.
[11] R. Subrata, A. Zomaya, and B. Landfeldt, "Game-Theoretic Approach for Load Balancing in Computational Grids," IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 1, pp. 66-76, Jan. 2008.
[12] S. Sengupta and M. Chatterjee, "An Economic Framework for Dynamic Spectrum Access and Service Pricing," IEEE/ACM Trans. Networking, vol. 17, no. 4, pp. 1200-1213, Aug. 2009.
[13] M. Kubisch, H. Karl, A. Wolisz, L. Zhong, and J. Rabaey, "Distributed Algorithms for Transmission Power Control in Wireless Sensor Networks," Proc. IEEE Wireless Comm. and Networking Conf., vol. 1, pp. 558-563, 2003.
[14] D. Niyato, E. Hossain, M. Rashid, and V. Bhargava, "Wireless Sensor Networks with Energy Harvesting Technologies: A Game-Theoretic Approach to Optimal Energy Management," IEEE Wireless Comm., vol. 14, no. 4, pp. 90-96, Aug. 2007.
[15] J. Nash, "Equilibrium Points in N-Person Games," Proc. Nat'l Academy of Sciences, vol. 36, pp. 48-49, 1950.
S. Buchegger and J. Le Boudec, "Performance Analysis of the CONFIDANT Protocol," Proc. Third ACM Int'l Symp. Mobile Ad Hoc Networking & Computing, pp. 226-236, 2002.
[16] L. Buttyan and J.P. Hubaux, "Nuglets: A Virtual Currency to Stimulate Cooperation in Self-organized Mobile Ad-Hoc Networks," Technical Report DSC/2001/001, Swiss Fed. Inst. of Technology, Jan. 2001.
[17] W. Wang, M. Chatterjee, and K. Kwiat, "Enforcing Cooperation in Ad Hoc Networks with Unreliable Channel," Proc. Fifth IEEE Int'l Conf. Mobile Ad-Hoc and Sensor Systems (MASS), pp. 456-462, 2008.
[18] V. Srinivasan, P. Nuggehalli, C. Chiasserini, and R. Rao, "Cooperation in Wireless Ad Hoc Networks," Proc. IEEE INFOCOM, vol. 2, pp. 808-817, Apr. 2003.
[...] "Data Networks," IEEE Trans. Comm., vol. 53, no. 11, pp. 1885-1894, Nov. 2005.
rate of the wireless channel increases the congestion possibility. This is not desirable for applications like real-time online streaming or stock quote delivery. Therefore, for applications where the quality of service is critical to end users, a multicast authentication protocol should provide a certain level of resilience to packet loss. Efficiency and packet loss resilience can hardly be supported simultaneously by conventional multicast schemes. In order to reduce computation overhead, conventional schemes use efficient signature algorithms [8], [9] or amortize one signature over a block of packets [10]-[26], at the expense of increased communication overhead [8]-[11] or vulnerability to packet loss [12]-[26].

Another problem with the schemes in [8]-[25] is that they are vulnerable to packet injection by malicious attackers. An attacker may compromise a multicast system by intentionally injecting forged packets to consume receivers' resources, leading to Denial of Service (DoS). In the literature, some schemes attempt to provide DoS resilience in addition to meeting the efficiency requirement and addressing the packet loss problem. However, they still have the packet loss problem, because they are based on the same approach as previous schemes [10], [11], [22]-[25].

Recently, we demonstrated that batch signature schemes can be used to improve the performance of broadcast authentication [5], [6]. In this paper, we present our comprehensive study on this approach and propose a novel multicast authentication protocol called SMABS (short for Multicast Authentication based on Batch Signature). SMABS includes two schemes. The basic scheme (called SMABS-B hereafter) utilizes an efficient asymmetric cryptographic primitive called batch signature, which supports the authentication of any number of packets simultaneously with one signature verification, to address the efficiency and packet loss problems in general environments. The enhanced scheme (called SMABS-E hereafter) combines SMABS-B with packet filtering to alleviate the DoS impact in hostile environments. SMABS provides data integrity, origin authentication, and nonrepudiation, as previous asymmetric-key-based protocols do. In addition, we make the following contributions:

1. Our SMABS can achieve perfect resilience to packet loss in lossy channels, in the sense that no matter how many packets are lost, the already-received packets can still be authenticated by receivers.
2. SMABS-B is efficient in terms of latency, computation, and communication overhead. Though SMABS-E is less efficient than SMABS-B, since it includes the DoS defense, its overhead is still at the same level as previous schemes.
3. We propose two new batch signature schemes based on BLS [36] and DSA [38] and show that they are more efficient than the batch RSA [33] signature scheme.

The rest of the paper is organized as follows: We briefly review related work in Section 2. Then, we present a basic scheme for lossy channels in Section 3, which also includes three batch signature schemes based on RSA [33], BLS [36], and DSA [38], respectively. An enhanced scheme is discussed in Section 4. After performance evaluation in Section 5, the paper is concluded in Section 6.

II. RELATED WORKS

The schemes in [8], [9] follow the ideal approach of signing and verifying each packet individually, but reduce the computation overhead at the sender by using one-time signatures [8] or k-time signatures [9]. They are suitable for RSA [33], which is expensive on signing while cheap on verifying. For each packet, however, each receiver needs to perform one or more verifications on its one-time or k-time signature plus one ordinary signature verification. Moreover, a one-time signature is too long (on the order of 1,000 bytes).

Tree chaining was proposed in [10], [11] by constructing a tree for a block of packets. The root of the tree is signed by the sender. Each packet carries the signed root and multiple hashes. When a receiver receives one packet in the block, it uses the authentication information in the packet to authenticate it. The buffered authentication information is further used to authenticate other packets in the same block.

Graph chaining was studied in [12]-[21]. A multicast stream is divided into blocks, and each block is associated with a signature. In each block, the hash of each packet is embedded into several other packets in a deterministic or probabilistic way. The hashes form a graph, in which each path links a packet to the block signature. Each receiver verifies the block signature and then authenticates the packets.

Erasure codes were used in [22]-[25]. A signature is generated for the concatenation of the
hashes of all the packets in one block and is then erasure-coded into many pieces.

All these schemes [10]-[25] are indeed computationally efficient, since each receiver needs to verify only one signature for a block of packets. However, they all increase packet overhead for hashes or erasure codes, and the block design introduces latency when buffering many packets. Another major problem is that most schemes [12]-[25] are vulnerable to packet loss even though they are designed to tolerate a certain level of packet loss. If too many packets are lost, other packets may not be authenticated. In particular, if a block signature is lost, the entire block cannot be authenticated.

Moreover, previous schemes [8]-[25] target lossy channels, which are realistic in our daily life since the Internet and wireless networks suffer from packet loss. In a hostile environment, however, an active attacker can inject forged packets to consume receivers' resources, leading to DoS. In particular, the schemes in [8]-[21] are vulnerable to forged signature attacks, because they require each receiver to verify each signature in order to authenticate data packets, and the schemes in [22]-[25] suffer from packet injection, because each receiver has to distinguish a certain number of valid packets from a large pool of packets including injected ones, which is very time-consuming.

In order to deal with DoS, schemes such as PARM, PRABS, and LTT were proposed. PARM is similar to the tree chaining scheme [10], [11] in the sense that multiple one-way hash chains are used as shared keys between the sender and receivers. Unfortunately, these schemes are still vulnerable to DoS, because they require that one-way hash chains be signed and transmitted to each receiver, and therefore an attacker can inject forged signatures for the one-way hash chains. PRABS uses distillation codes to deal with DoS. LTT uses error correction codes to replace the erasure codes in the schemes of [22]-[25]; the reason is that error correction codes tolerate erroneous packets. These three schemes are resilient to DoS, but they still have the packet loss problem.

In this paper, we focus on the signature approach. Though confidentiality is another important issue for securing multicast, it can be achieved through group key management; here we focus on multicast authentication.

III. BASIC SCHEME

Our target is to authenticate multicast streams from a sender to multiple receivers. Generally, the sender is a powerful multicast server managed by a central authority and can be trusted. The sender signs each packet with a signature and transmits it to multiple receivers through a multicast routing protocol. Each receiver is a less powerful device with resource constraints and may be managed by a non-trustworthy person. Each receiver needs to assure that the received packets are really from the sender (authenticity) and that the sender cannot deny the signing operation (nonrepudiation), by verifying the corresponding signatures.

Ideally, authenticating a multicast stream can be achieved by signing and verifying each packet. However, the per-packet signature design has been criticized for its high computation cost, and therefore most previous schemes [10]-[25] incorporate a block-based design, as shown in Section 2. They do reduce the computation cost, but they also introduce new problems. First, the block design builds up correlation among packets and makes them vulnerable to packet loss, which is inherent in the Internet and wireless networks. Second, the heterogeneity of receivers means that the buffer resource at each receiver is different and can vary over time depending on the overall load at the receiver; in the block design, the required block size, which is chosen by the sender, may not be satisfiable by each receiver. Third, the correlation among packets can incur additional latency. Consider a high-layer application that needs new data from the low-layer authentication module in order to render a smooth video stream to the client user. It is desirable that the lower-layer authentication module deliver authenticated packets at the time when the high-layer application needs new data. In the per-packet signature design this is not a problem, since each packet is independently verifiable at any time. In the block design, however, it is possible that the packets buffered at the low-layer authentication module are not verifiable because the correlated packets, especially the block signatures, have not been received. Therefore, the high-layer application has to either wait, which leads to additional latency, or return with a no-available-packets exception, which could be interpreted as the buffered packets being "lost."

In view of the problems regarding the sender-favored block-based approach, we conceive a
receiver-oriented approach by taking into account the heterogeneity of the receivers. Receiving devices have different computation and communication capabilities: some could be powerful desktop computers, while others could be cheap handsets with limited buffers and low-end CPUs.

In order to fulfill this requirement, the basic scheme SMABS-B uses an efficient cryptographic primitive called batch signature, which supports simultaneously verifying the signatures of any number of packets. In particular, when a receiver collects n packets

P_i = {m_i, σ_i}, i = 1, ..., n,

where m_i is the data payload, σ_i is the corresponding signature, and n can be any positive integer, it can input them into an algorithm

BatchVerify(P_1, P_2, ..., P_n) -> {True, False}.

If the output is True, the receiver knows the n packets are authentic, and otherwise not.

To support authenticity and efficiency, the BatchVerify() algorithm should satisfy the following properties:

1. Given a batch of packets that have been signed by the sender, BatchVerify() outputs True.
2. Given a batch of packets including some unauthentic packets, the probability that BatchVerify() outputs True is very low.
3. The computation complexity of BatchVerify() is comparable to that of verifying one signature and increases only gradually as the batch size n increases.

The computation complexity of BatchVerify() reflects the fact that there is some additional cost for processing multiple packets. As we will show later, those additional computations are mostly modular additions and multiplications, which are much faster than the modular exponentiations required in the final signature verification. Theoretically, a concern arises if this cost grows higher than the final signature verification when the batch size is too large; however, this is not the case in practice. In order to show the merit of signature preaggregation, we implemented batch signatures using our Batch-BLS (discussed later) as an example. We measured the normalized time cost of batch signature verification with the batch size growing from 1 to 1,000, and recorded the results for two scenarios, with and without signature preaggregation.

SMABS-B uses a per-packet signature instead of a per-block signature and thus eliminates the correlation among packets. This packet independency makes SMABS-B perfectly resilient to packet loss. The Internet and wireless channels tend to be lossy due to congestion or channel instability, where packets can be lost according to different loss models, such as random loss or burst loss. This is a significant advantage over previous schemes [10]-[25]. Meanwhile, efficiency can also be achieved, because a batch of packets can be authenticated simultaneously through one batch signature verification operation. The packet independency also brings other benefits in terms of smaller latency and communication overhead compared with previous schemes [10]-[25], [32]. In particular, each receiver can verify the authenticity of all the received packets in its buffer whenever the high-layer applications require, and there is no additional hash or code overhead in each packet.

Next, we present three implementations. In addition to the one based on RSA [33], we propose two new batch signature schemes based on BLS [36] and DSA [38], which are more efficient than batch RSA. We must point out, and will show later, that SMABS is independent of these signature algorithms. This independence brings the freedom to optimize SMABS for a particular physical system or platform as much as possible.

Batch RSA Scheme

RSA

RSA [33] is a very popular cryptographic algorithm used in many security protocols. In order to use RSA, a sender chooses two large random primes P and Q to get N = PQ, and then calculates two exponents e and d such that ed = 1 mod φ(N), where φ(N) = (P - 1)(Q - 1). The sender publishes (e, N) as its public key and keeps d secret as its private key. A signature of a message m can be generated as σ = (h(m))^d mod N, where h() is a collision-resistant hash function. The sender sends {m, σ} to a receiver, which can verify the authenticity of the message m by checking σ^e = h(m) mod N.

BATCH RSA

To accelerate the authentication of multiple signatures, the batch verification of RSA [34], [35] can be used. Given n packets {m_i, σ_i}, i = 1, ..., n, where m_i is the data payload, σ_i is the corresponding signature, and n is any positive integer, the receiver can first calculate h_i = h(m_i) and then perform the following verification:

(σ_1 σ_2 ... σ_n)^e = h_1 h_2 ... h_n mod N.

If all n packets are truly from the sender, the equation holds, because σ_i^e = h_i mod N for each i.
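The batch RSA check above can be sketched in a few lines of Python. This is a toy illustration assuming textbook RSA (no padding) and a hash simply reduced into Z_N; the function and parameter names are ours, not the paper's.

import hashlib
from math import prod

def h(m, N):
    # Collision-resistant hash mapped into Z_N (toy reduction for the sketch).
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def batch_verify(packets, e, N):
    # packets: list of (m_i, sigma_i) pairs.
    # Checks (prod sigma_i)^e == prod h(m_i) (mod N) with one exponentiation.
    sig_prod = prod(s for _, s in packets) % N
    hash_prod = prod(h(m, N) for m, _ in packets) % N
    return pow(sig_prod, e, N) == hash_prod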
attacker B can break BLS under the chosen-message attack by colluding with A.

Proof. Suppose B is given n - 1 messages m_1, ..., m_{n-1} and their valid signatures σ_1, ..., σ_{n-1}. B can forge a signature σ_n for any chosen message m_n, such that σ_n satisfies the BLS signature scheme, by colluding with A in the following steps:

1. B sends the n messages and n - 1 signatures to A.
2. Because A can break the batch BLS scheme, A generates n false signatures σ'_1, ..., σ'_n that pass the batch BLS verification, then returns to B the value w = σ'_1 ... σ'_n.
3. B computes σ_n = w (σ_1 ... σ_{n-1})^{-1} as the signature for m_n, because σ_1, ..., σ_n then pass the batch BLS verification.

Also, as with batch RSA, an attacker may not forge signatures but may manipulate authentic packets to produce invalid signatures. For instance, two packets {m_i, σ_i} and {m_j, σ_j}, for i ≠ j, can be replaced with {m_i, σ_j} and {m_j, σ_i} and still pass the batch verification. However, this does not affect the correctness and the authenticity of m_i and m_j, because they have been correctly signed by the sender.

Boyd and Pavlovski pointed out that Harn's work is still vulnerable to malicious attacks. Here, we propose a batch DSA scheme based on Harn's work and counteract the attack they described.

Harn DSA

In Harn DSA, the system parameters are defined as:

1. p, a prime longer than 512 bits.
2. q, a 160-bit prime divisor of p - 1.
3. g, a generator of order q in Z_p*, i.e., g^q = 1 mod p.
4. x, the private key of the signer, 0 < x < q.
5. y, the public key of the signer, y = g^x mod p.
6. h(), a hash function generating output in Z_q.

Given a message m, the signer generates a signature by:

1. randomly selecting an integer k with 0 < k < q,
2. computing h = h(m),
3. computing r = g^k mod p, and
4. computing s = rk - hx mod q.

The signature for m is (r, s). The receiver can verify the signature by first computing h = h(m) and then checking whether

r^r = g^s y^h mod p.

This is because, if the packet is authentic, then g^s y^h = g^{rk - hx} g^{hx} = g^{rk} = r^r mod p.
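To make the reconstructed signing and verification equations concrete, here is a toy sketch of Harn DSA as described above (r = g^k mod p, s = rk - hx mod q, check r^r = g^s y^h mod p). The parameters are deliberately tiny and the hash is simplified; this is for illustration only and is in no way a secure implementation.

import hashlib
import secrets

# Toy parameters: real Harn DSA uses p longer than 512 bits and a 160-bit q.
p, q, g = 467, 233, 4  # q divides p - 1 = 466; g = 4 has order q mod p

def h(m):
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1  # private key, 0 < x < q
    return x, pow(g, x, p)            # public key y = g^x mod p

def sign(m, x):
    k = secrets.randbelow(q - 1) + 1  # random k, 0 < k < q
    r = pow(g, k, p)                  # r = g^k mod p
    s = (r * k - h(m) * x) % q        # s = rk - hx mod q
    return r, s

def verify(m, r, s, y):
    # From s = rk - hx: g^s * y^h = g^(rk) = r^r (mod p).
    return pow(r, r, p) == (pow(g, s, p) * pow(y, h(m), p)) % p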
Fig. 4. Verification rate under the burst loss model with maximum burst length 10.
As shown later, however, Tree [11] achieves this independency by incurring large overhead and latency at the sender and at each receiver, and it is vulnerable to DoS, while our SMABS-B has less overhead and latency and SMABS-E is resilient to DoS at the same level of overhead as Tree [11].

One thing to point out is that we do not differentiate between SMABS-B and SMABS-E in Fig. 3 and Fig. 4. SMABS-B is perfectly resilient to packet loss because of its inherent design. While SMABS-E is not designed for lossy channels, it can also achieve perfect resilience to packet loss in lossy channels. In the lossy channel model, where no DoS attack is assumed to be present, we can set the threshold t = 1 (refer to Section 4) for SMABS-E, and thus each receiver can start batch verification as long as at least one packet is received for each set of packets constructed under the same Merkle tree.

Efficiency

We consider latency, computation, and communication overhead for the efficiency evaluation under lossy channels and DoS channels.

Comparisons over Lossy Channels

We compare SMABS-B with the well-known loss-tolerant schemes tree chaining (Tree) [11], EMSS [14], PiggyBack [16], augmented chain (AugChain) [18], and SAIDA [23]. We also include SMABS-E and three DoS-resilient schemes, PRABS [30], BAS [31], and LTT [32], in the table just for comparison, even though they are not designed for lossy channels.

Previous block-based schemes introduce latency either at the sender [11], [16], at each receiver [14], [31], or at both [18], [23], [30], [32]. The latency is inherent in the block design due to chaining or coding. At the sender side, the correlation among a block of packets has to be established before the sender starts sending the packets. At each receiver, the latency is incurred when the high-layer application waits for the buffered packets to be authenticated after the correlation is recovered. This receiver-side latency varies depending on whether the correlation among the underlying buffered packets has been recovered when the high-layer application needs new data, and its maximum value is the block size. SMABS-B eliminates the correlation among packets.

In SMABS, a trade-off for perfect resilience to packet loss is that the sender needs to sign each packet, which incurs more computation overhead than conventional block-based schemes. Therefore, efficient signature generation is desirable at the sender. Compared with RSA [33], which is efficient in verifying but expensive in signing, BLS [36] and DSA [38] are good candidates, as we will show later.

For n packets, Tree [11] requires an overhead of n signatures and O(n log2 n) hashes; the schemes in [14], [16], [18], [23] require one or more signatures and up to O(n^2) hashes. SMABS-B and SMABS-E require n signatures, and SMABS-E requires an additional O(n log2 n) hashes. If long signatures are used (like 1,024-bit RSA), SMABS-B and SMABS-E have more communication overhead than the schemes in [14], [16], [18], [23], which is also the case for Tree [11]. However, BLS generates short signatures of 171 bits, which is comparable to most well-known hash algorithms, MD5 (128 bits) and SHA-1 (160 bits).

Comparisons over DoS Channels

DoS is a method for an attacker to deplete the resources of a receiver. Processing injected packets from the attacker always consumes a certain amount of resource. Here, we assume an attack factor β, meaning that for n valid packets, βn invalid packets are injected.

For the schemes in [11], [14], [16], [18], which authenticate signatures first and then authenticate packets through hash chains, the attacker can inject βn forged signature packets, and signature verification is an expensive operation. For SAIDA [23], which requires erasure decoding, the attacker simply injects βn forged packets, because each receiver has to choose a certain number of valid packets from all the (1 + β)n packets to do the decoding, which can take a significant number of tries.

5.3 Comparisons of Signature Schemes

We compare the computation overhead of the three batch signature schemes in Table 4. RSA and BLS require one modular exponentiation at the sender, and DSA requires two modular multiplications when the r value is computed offline. Usually, one c-bit modular exponentiation is equivalent to 1.5c modular multiplications over the same field. Moreover, a c-bit modular exponentiation in the DLP setting is equivalent to a c/6-bit modular exponentiation in BLS for the same security level. Therefore, we can estimate that the computation overhead of one 1,024-bit RSA signing operation is roughly equivalent to that of 768 DSA signing operations (1,536 modular multiplications) and that of 6 BLS signing operations (each one corresponding to 255 modular multiplications).

According to the reported computational overhead of signature schemes on a PIII 1 GHz CPU, the signing and verification times for 1,024-bit RSA with a 1,007-bit private key are 7.9 ms and 0.4 ms; for 157-bit BLS they are 2.75 ms and 81 ms; and for 1,024-bit DSA with a 160-bit private key (without precomputing the r value) they are 4.09 ms and 4.87 ms. We can observe that for BLS and DSA the signing is efficient but the verification is expensive, and vice versa for RSA.

Given the same security level as 1,024-bit RSA, BLS generates a 171-bit signature and DSA generates a 320-bit signature. It is clear that, by using BLS or DSA, SMABS can achieve more bandwidth efficiency than
using RSA, and could be even more efficient than conventional schemes that use a large number of hashes.

VI. CONCLUSION

To reduce the signature verification overheads in secure multimedia multicasting, block-based authentication schemes have been proposed. Unfortunately, most previous schemes have problems such as vulnerability to packet loss and lack of resilience to denial of service (DoS) attacks. To overcome these problems, we developed a novel authentication scheme, SMABS. We have demonstrated that SMABS is perfectly resilient to packet loss, due to the elimination of the correlation among packets, and can effectively deal with DoS attacks. Moreover, we have shown that the use of batch signatures can achieve overhead less than or comparable to that of conventional schemes. Finally, we developed two new batch signature schemes based on BLS and DSA, which are more efficient than the batch RSA signature scheme.

REFERENCES

[1] S.E. Deering, "Multicast Routing in Internetworks and Extended LANs," Proc. ACM SIGCOMM Symp. Comm. Architectures and Protocols, pp. 55-64, Aug. 1988.
[2] T. Ballardie and J. Crowcroft, "Multicast-Specific Security Threats and Counter-Measures," Proc. Second Ann. Network and Distributed System Security Symp. (NDSS '95), pp. 2-16, Feb. 1995.
[3] P. Judge and M. Ammar, "Security Issues and Solutions in Multicast Content Distribution: A Survey," IEEE Network Magazine, vol. 17, no. 1, pp. 30-36, Jan./Feb. 2003.
[4] Y. Challal, H. Bettahar, and A. Bouabdallah, "A Taxonomy of Multicast Data Origin Authentication: Issues and Solutions," IEEE Comm. Surveys & Tutorials, vol. 6, no. 3, pp. 34-57, Oct. 2004.
[5] Y. Zhou and Y. Fang, "BABRA: Batch-Based Broadcast Authentication in Wireless Sensor Networks," Proc. IEEE GLOBECOM, Nov. 2006.
[6] Y. Zhou and Y. Fang, "Multimedia Broadcast Authentication Based on Batch Signature," IEEE Comm. Magazine, vol. 45, no. 8, pp. 72-77, 2007.
[7] K. Ren, K. Zeng, W. Lou, and P.J. Moran, "On Broadcast Authentication in Wireless Sensor Networks," Proc. First Ann. Int'l Conf. Wireless Algorithms, Systems, and Applications (WASA '06).
[8] S. Even, O. Goldreich, and S. Micali, "On-Line/Off-Line Digital Signatures," J. Cryptology, vol. 9, pp. 35-67, 1996.
[9] P. Rohatgi, "A Compact and Fast Hybrid Signature Scheme for Multicast Packets," Proc. Sixth ACM Conf. Computer and Comm. Security (CCS '99), Nov. 1999.
[10] C.K. Wong and S.S. Lam, "Digital Signatures for Flows and Multicasts," Proc. Sixth Int'l Conf. Network Protocols (ICNP '98), pp. 198-209, Oct. 1998.
[11] C.K. Wong and S.S. Lam, "Digital Signatures for Flows and Multicasts," IEEE/ACM Trans. Networking, vol. 7, no. 4, pp. 502-513, Aug. 1999.
[12] R. Gennaro and P. Rohatgi, "How to Sign Digital Streams," Information and Computation, vol. 165, no. 1, pp. 100-116, Feb. 2001.
[13] R. Gennaro and P. Rohatgi, "How to Sign Digital Streams," Proc. 17th Ann. Cryptology Conf. Advances in Cryptology (CRYPTO '97), Aug. 1997.
[14] A. Perrig, R. Canetti, J.D. Tygar, and D. Song, "Efficient Authentication and Signing of Multicast Streams over Lossy Channels," Proc. IEEE Symp. Security and Privacy (SP '00), pp. 56-75, May 2000.
[15] Y. Challal, H. Bettahar, and A. Bouabdallah, "A2Cast: An Adaptive Source Authentication Protocol for Multicast Streams," Proc. Ninth Int'l Symp. Computers and Comm. (ISCC '04), vol. 1, pp. 363-368, June 2004.
[16] S. Miner and J. Staddon, "Graph-Based Authentication of Digital Streams," Proc. IEEE Symp. Security and Privacy (SP '01), pp. 232-246, May 2001.
[17] Z. Zhang, Q. Sun, W-C Wong, J. Apostolopoulos, and S. Wee, "A Content-Aware Stream Authentication Scheme Optimized for Distortion and Overhead," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '06), pp. 541-544, July 2006.
[18] P. Golle and N. Modadugu, "Authenticating Streamed Data in the Presence of Random Packet Loss," Proc. Eighth Ann. Network and Distributed System Security Symp. (NDSS '01), Feb. 2001.
[19] Z. Zhang, Q. Sun, and W-C Wong, "A Proposal of Butterfly-Graph Based Stream Authentication over Lossy Networks," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '05), July 2005.
[20] S. Ueda, N. Kawaguchi, H. Shigeno, and K. Okada, "Stream Authentication Scheme for the Use over the IP Telephony," Proc. 18th Int'l Conf. Advanced Information Networking and Applications (AINA '04), vol. 2, pp. 164-169, Mar. 2004.
[21] D. Song, D. Zuckerman, and J.D. Tygar, "Expander Graphs for Digital Stream Authentication and Robust Overlay Networks," Proc. IEEE Symp. Security and Privacy (S&P '02), May 2002.
[22] J.M. Park, E.K.P. Chong, and H.J. Siegel, "Efficient Multicast Packet Authentication Using Signature Amortization," Proc. IEEE Symp. Security and Privacy (SP '02), pp. 227-240, May 2002.
[23] J.M. Park, E.K.P. Chong, and H.J. Siegel, "Efficient Multicast Stream Authentication Using Erasure Codes," ACM Trans. Information and System Security, vol. 6, no. 2, pp. 258-285, May 2003.
[24] A. Pannetrat and R. Molva, "Authenticating Real Time Packet Streams and Multicasts," Proc. Seventh IEEE Int'l Symp. Computers and Comm. (ISCC '02), pp. 490-495, July 2002.
[25] A. Pannetrat and R. Molva, "Efficient Multicast Packet Authentication," Proc. 10th Ann. Network and Distributed System Security Symp. (NDSS '03), Feb. 2003.
**H.O.D., P.G. Department, S.A. Engineering College, Poonamallee-Avadi Road, Veeraragavapuram, Thiruverkadu Post, Chennai - 600 077, Tamil Nadu, India.
Here, the linear transform can be realized by different operations such as shearing, scaling, and rotation; it is applied to provide the interrelationship among the textons.

The solution provided by the Fourier transform boundary formula is [U, V]^T, in which U and V are the coefficients of the Fourier function representing the x and y coordinates. The linear operator of the Fourier transform is

[U_K, V_K]^T = A^k [U_0, V_0]^T,

where [U_0, V_0]^T represents the coefficients of the affine transformation.
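A minimal sketch of this linear operator: repeatedly applying a 2x2 affine matrix A to the Fourier coefficient pairs [U, V]^T. The shear matrix chosen here is an arbitrary example of the shearing/scaling/rotation operations named above.

import numpy as np

# Example affine operator: a shear (one of the operations mentioned above).
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])

def apply_operator(U0, V0, k):
    # [U_K, V_K]^T = A^k [U_0, V_0]^T, applied to every coefficient pair.
    coeffs = np.stack([U0, V0])            # shape (2, number_of_coefficients)
    return np.linalg.matrix_power(A, k) @ coeffs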
B. Coefficients of Patches: Localized and GSM Method

Marginal statistics of wavelet coefficients are attractive for their tractability when the coefficients are treated as independent. In natural images, however, coefficients at adjacent spatial and scale locations exhibit statistical interdependencies. These dependencies are exploited in many texture analysis, image coding, denoising, and artifact removal processes [19]-[21], [6], [18]. Multivariate probability models are built from the dependencies captured in natural images and are applied to small patches of wavelet coefficients. Denoising is determined by the collection of coefficients centered around the one being estimated [14]; the coarser-scale subband contains the parent coefficient, as shown in Fig. 1a ((n+1)th scale, lower frequency). This permits cross-scale dependencies in natural images [22], [20]. The image cannot simply be split into independent blocks, since block-boundary artifacts would appear; instead of applying the local method to non-overlapping patches, overlapping patches can be used to remove these artifacts.

A single large coefficient indicates other large coefficients at adjacent locations; this behavior is captured by the Gaussian scale mixture (GSM). Let υ in R^d denote a d-dimensional patch of coefficients. The GSM is written υ = √z u, in which z is a scalar hidden variable that varies spatially, and u is a zero-mean Gaussian with covariance C. The GSM model accounts for the inhomogeneity of wavelet coefficient amplitudes through the scalar variable z, while u provides the homogeneity of the Gaussian process. The original subband marginal statistics can be described by a log histogram, with

g(υ; C) = exp(-(1/2) υ^T C^{-1} υ)

the (unnormalized) zero-mean multivariate Gaussian with covariance C.
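The GSM construction above (υ = √z u, with u a zero-mean Gaussian of covariance C and z a spatially varying scalar) can be sampled directly, which makes the heavy-tailed behavior easy to see. The log-normal choice for z below is an assumption for the sketch; the text does not specify a prior for z.

import numpy as np

def sample_gsm(C, n_samples, rng=None):
    # Draw GSM coefficient patches: v = sqrt(z) * u with u ~ N(0, C).
    rng = np.random.default_rng() if rng is None else rng
    d = C.shape[0]
    u = rng.multivariate_normal(np.zeros(d), C, size=n_samples)
    z = rng.lognormal(mean=0.0, sigma=1.0, size=n_samples)  # assumed prior
    return np.sqrt(z)[:, None] * u  # heavy-tailed, correlated coefficients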
is applied to the image block over the domain D_i and mapped onto the range cell B_i.

Figure 2 displays the first eight iterations of a fractal code for the "Lena" image, applied to an initial black image. The values of the SNR between the original "Lena" image and successive terms of the reconstruction sequence are listed in Table 2. Convergence of the sequence of images is obtained, within 0.2 dB of accuracy, at the eighth iteration.

Let N_s, N_m, and N_e denote the total numbers (parents and children) of range shade blocks, midrange blocks, and edge blocks in the original image. The bit rate is then given in bpp by an expression over these block counts, where N_p = B is the total number of parent blocks in the image.
Encoding specifications (block type and parameters)

Partition
  Range blocks: 16x16 (parent), 8x8 (child)

Domain pool
  Domain blocks: 16x16 (parent), 8x8 (child)
  Classification: shade (s), midrange (m), simple edge (se), mixed edge (me)

Transformation pool
  Shade block: absorption at g in {gl_min, ..., gl_max}
  Midrange block: gray-level scaling by a in {0.7, ..., 1.0}; translation by Δg in {-gl_max, ..., gl_max}
  Edge block: gray-level scaling by a in {0.2, ..., 0.9}; translation by Δg in {-gl_max, ..., gl_max}; isometries {l_n}, 0 <= n <= 7

SNR: 27.7 dB
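The decoding loop described above (iterating the contractive block transforms from an arbitrary initial image, here black, until the sequence converges) can be sketched as follows. The transform record layout and the 2x2 averaging used to shrink a parent-sized domain block to a child-sized range block are illustrative assumptions.

import numpy as np

def fractal_decode(transforms, size=256, iterations=8):
    # transforms: list of ((dx, dy), (rx, ry), B, a, dg) records; each 2Bx2B
    # domain block is shrunk to BxB, gray-level scaled by a, translated by
    # dg, and written to its range cell. The range cells are assumed to tile
    # the image, so iterating from a black image converges to the attractor.
    img = np.zeros((size, size))
    for _ in range(iterations):
        out = np.empty_like(img)
        for (dx, dy), (rx, ry), B, a, dg in transforms:
            dom = img[dy:dy + 2 * B, dx:dx + 2 * B]
            shrunk = dom.reshape(B, 2, B, 2).mean(axis=(1, 3))  # 2x2 average
            out[ry:ry + B, rx:rx + B] = a * shrunk + dg
        img = out
    return img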
System performance

I(x, y, t) = Σ_i Σ_{t'} a_i(t') φ_i(x, y, t - t') + ν(x, y, t)
The number of signals a_i(t) exceeds the dimensionality of the movie I(x, y, t); the model is illustrated schematically in figure 1. The basis functions may be applied at any time within the movie sequence.

The coefficients for a given image sequence are computed by maximizing the posterior distribution over the coefficients:

a* = arg max_a P(a | I; θ) = arg max_a P(I | a, θ) P(a | θ),

with the prior over coefficients

P(a | θ) = Π_i Π_t P(a_i(t)),  P(a_i(t)) ∝ exp(-λ S(a_i(t))),

where S is a non-convex function appropriate for shaping the prior to be of sparse form (i.e., more peaked at zero and with heavier tails compared to a Gaussian of the same variance, as shown in figure 2). Here we use S(x) = log(1 + (x/σ)^2), where σ is a scaling parameter and λ controls the degree of sparseness.
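Under the objective above (maximize log P(I|a, θ) + log P(a|θ) with the penalty S(x) = log(1 + (x/σ)^2)), coefficient inference can be sketched as plain gradient ascent. The linear generative model I ≈ Φa, the Gaussian likelihood, and the step size are illustrative assumptions standing in for the paper's exact procedure.

import numpy as np

def S_prime(a, sigma=0.316):
    # Derivative of S(x) = log(1 + (x/sigma)^2).
    return (2.0 * a / sigma**2) / (1.0 + (a / sigma) ** 2)

def infer_coefficients(I, Phi, lam=0.1, lr=0.01, steps=500):
    # Gradient ascent on log P(I|a) + log P(a): a Gaussian reconstruction
    # term for I ~ Phi @ a plus the sparse prior term -lam * sum S(a_i).
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        residual = I - Phi @ a          # reconstruction error
        grad = Phi.T @ residual - lam * S_prime(a)
        a += lr * grad
    return a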
4. EXPERIMENTAL RESULTS
symmetry with other patches, particularly patches from background areas with uniform intensity. The first obstacle, therefore, was to develop a texture analyzer so that a patch with structural texture can be distinguished and treated differently from nonstructural textures. Two different approaches to estimating the distance in affine space were presented: one based upon the warping residue and the other based upon affine invariant features. The latter provides a more practical solution in terms of computational efficiency. Affine invariance has received much attention with the recent emergence of content-based retrieval systems, which could take advantage of this work. The majority of existing texture analysis methods, however, are not designed to analyze texture from an invariance viewpoint. Several noteworthy geometric invariant analysis methods share the common theme of directional pattern recognition. This led us to develop the texture model used for structured-ness analysis into a fully affine invariant descriptor. The usefulness of the new affine invariant feature was demonstrated in a multiresolution framework for the segmentation of a textured object. This work not only presents an interesting approach to the segmentation task but also offers a feasible solution for efficient implementation. The underlying concept has been applied to image classification by many researchers, but few have applied the affine symmetry model to segmentation by partitioning the image into blocks. The complexity of the algorithms, however, has been a major issue prohibiting practical implementations. The motivation has been to develop a computationally efficient image texture classification algorithm while maintaining the texture discriminative power of previous approaches. The simplicity and efficiency of the presented approach utilizing an affine invariant shape description is demonstrated. It may be of interest where efficient texture segmentation is required. Experimental evaluation indicates acceptable segmentation results for structural texture and robustness of the algorithm to noise. Further study utilizing a random field segmentation framework with other useful features may improve the algorithm, thereby determining the optimal number of segmented regions. Additionally, it can also be utilized for image compression.

REFERENCES

[1] T. Hsu and R. Wilson, "A two-component model of texture for analysis and synthesis," IEEE Trans. Image Process., vol. 7, no. 10, pp. 1466-1476, Oct. 1998.
[2] R. Haralick, "Statistical and structural approaches to texture," Proc. IEEE, vol. 67, no. 5, pp. 786-804, May 1979.
[3] R. Haralick, K. Shanmugam, and I. Dinstein, "Texture features for image classification," IEEE Trans. Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610-621, Nov. 1973.
[4] B. Olshausen and D. Field, "What is the other 85% of V1 doing?," in 23 Problems in Systems Neuroscience. London, U.K.: Oxford Univ. Press, 2004.
[5] B. Olshausen and D. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.
[6] E. Oja, "A simplified neuron model as a principal component analyzer," J. Math. Biol., vol. 15, pp. 267-273, 1982.
[7] R.P.N. Rao, B.A. Olshausen, and M.S. Lewicki, Probabilistic Models of the Brain: Perception and Neural Function. Cambridge, MA: MIT Press, 2002.
[8] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. Hoboken, NJ: Wiley, 2001.
[9] B. Julesz, "Texture gradients: The texton theory revisited," Spatial Vision, vol. 1, no. 1, pp. 19-30, 1985.
[10] S. Zhu, C. Guo, Y. Wang, and Z. Xu, "What are textons?," Int. J. Comput. Vis., pp. 121-143, 2005.
[11] B. Olshausen and D. Field, "Sparse coding with an over-complete basis set: A strategy employed by V1?," Vision Research, vol. 37, pp. 3311-3325, 1997.
[12] B. Olshausen, "Learning sparse, overcomplete representations of time-varying natural images," in Proc. IEEE Int. Conf. Image Processing, 2003, vol. 1, pp. 41-44.
[13] A. Jacquin, "A novel fractal block-coding technique for digital images," in Proc. IEEE ICASSP, Albuquerque, NM, Apr. 1990, pp. 2225-2228.
[14] Y. Fisher, Fractal Image Compression: Theory and Applications, ser. Communications and Information Theory. New York: Springer-Verlag, 1995.
[15] R. Bracewell, K. Chang, A. Jha, and Y. Wang, "Affine theorem for two-dimensional Fourier transform," Electron. Lett., vol. 29, no. 3, p. 304, 1993.
[16] E. Brigham, The Fast Fourier Transform and Its Applications. Upper Saddle River, NJ: Prentice-Hall, 1988.
[17] Z. Yao, N. Rajpoot, and R. Wilson, "Directional wavelet with Fourier-type bases for image processing," in Wavelet Analysis and Applications, ser. Applied and Numerical Harmonic Analysis, T. Qian and M.I. Vai, Eds. New York: Springer-Verlag, 2007, pp. 123-142.
[18] H. Park, G. Martin, and Z. Yao, "Image denoising with directional bases," in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. 301-304.
[19] D. Hammond and E. Simoncelli, "Image modeling and denoising with orientation-adapted Gaussian scale mixtures," IEEE Trans. Image Process., vol. 17, no. 11, pp. 2089-2101, Nov. 2008.
[20] K. Arbter, W. Snyder, H. Burkhardt, and G. Hirzinger, "Application of affine-invariant Fourier descriptors to recognition of 3-D objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 640-647, Jul. 1990.
[21] R. Wilson, A. Calway, and E. Pearson, "A generalized wavelet transform for Fourier analysis: The multiresolution Fourier transform and its application to image and audio signal analysis," IEEE Trans. Information Theory, vol. 38, no. 2, pp. 674-690, Mar. 1992.
[22] H. Park, G. Martin, and A. Bhalerao, "Structural texture segmentation using affine symmetry," in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. 49-52.
[23] R. Wilson and C.-T. Li, "A class of discrete multiresolution random fields and its application to image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 1, pp. 42-56, Jan. 2003.
[24] A. Bhalerao and R. Wilson, "Affine invariant image segmentation," presented at the British Machine Vision Conference, Kingston University, U.K., 2004.
[25] T. Smith, "Texture Modeling and Synthesis in Two and Three Dimensions," M.S. thesis, Dept. Comput. Sci., Univ. Warwick, Coventry, U.K., 2004.
[26] H. Park, A. Bhalerao, G. Martin, and A. Yu, "An affine symmetric approach to natural image compression," in Proc. 2nd Int. Mobile Multimedia Communications Conf., Alghero, Italy, Sep. 2006, pp. 1-6.
[27] C.-T. Li, "Multiresolution image segmentation integrating Gibbs sampler and region merging algorithm," Signal Process., vol. 83, pp. 67-78, 2003.
[28] D. Donoho and I. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200-1224, 1995.
[29] A. Bhalerao and R. Wilson, "Warplet: An image-dependent wavelet representation," in Proc. IEEE Int. Conf. Image Processing, 2005, pp. 490-493.
- The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.
- Each computer has only a limited, incomplete view of the system; each computer may know only one part of the input.

A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable. It is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:

- In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.
- In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.

There are two main reasons for using distributed systems and distributed computing.

1. The very nature of the application may require the use of a communication network that connects several computers. For example, data is produced in one physical location and is needed in another location.
2. There are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons.

For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers than with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.

Distributed storage needs increase almost exponentially with the widespread use of e-mail, photos, videos, logs, etc. Everything cannot be stored on one large disk: if that disk fails, all the stored information is lost. The solution is to store the user's information, along with some redundant information, across many disks. Then, even if a disk fails, there is still enough information in the surviving disks, and the lost information can be rebuilt on a new disk. But this is not simple to do, because today's large data centers have so many disks that multiple disk failures are common, and permanent data loss becomes likely. The affected performance metrics include storage efficiency, saturation throughput, rebuild time, mean time to data loss, and encoding/decoding/update/rebuild complexity. Hence, erasure codes are used to overcome the above: the data on n disks are encoded onto (n + m) disks such that the whole system can tolerate up to m disk failures.

Distributed networked storage systems aim to provide a storage service on the Internet. Current research on distributed networked storage systems focuses on:

- efficiency of the storage system, and
- robustness of the storage system.

The methods for accelerating the storing and retrieval processes should operate with minimal cost and maximal robustness. Since the Internet is a public environment that anyone can freely access, it is also important to consider the privacy of the users' stored information. The goal is to design distributed networked storage systems in such a way that privacy is guaranteed while maintaining the distributed structure, and to ensure that the data stored in the system remain private even if all storage servers in the system are compromised.
single computer would be possible in principle, systems [2], [3], [4] is to store data reliably over a
but the use of a distributed system is very long period of time by using a distributed
beneficial for practical reasons. accumulation of storage servers. Long term reliability
For example, it may be more cost-efficient to requires some sort of redundancy. A straightforward
obtain the desired level of performance by using a solution is simple replication; however, the storage
cluster of several low-end computers, in comparison cost for the system is high. Erasure codes are
with a single high-end computer. A distributed system proposed in several designs for reducing the storage
can be more reliable than a non-distributed system, as overhead in each storage server [5], [6] after linear
there is no single point of failure. Moreover, a network codes [7], [8] are proposed. A decentralized
distributed system may be easier to expand and erasure code [9] is an erasure code with a fully
manage than a monolithic uniprocessor system. decentralized encoding process. Assume that there are
Distributed Storage needs increase almost n storage servers in the networked storage system,
exponentially – widespread use of e-mail, photos, and k messages are stored into the storage servers
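To make the (n+m) encoding concrete, here is a minimal sketch of the simplest member of this family: a single XOR parity disk (m = 1), which lets the system rebuild any one failed disk. This is an illustration only, not the scheme proposed in this paper, and the function names are ours.

    from functools import reduce

    def xor_blocks(blocks):
        # XOR equal-length byte blocks together.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def encode_with_parity(data_blocks):
        # (n, n+1) erasure code: keep the n data blocks, add one XOR parity block.
        return list(data_blocks) + [xor_blocks(data_blocks)]

    def rebuild(stored_blocks, lost_index):
        # The XOR of all surviving blocks (data and parity) equals the lost one.
        return xor_blocks([b for i, b in enumerate(stored_blocks) if i != lost_index])

    data = [b"mail", b"foto", b"vids"]        # n = 3 data disks
    stored = encode_with_parity(data)         # n + m = 4 disks, m = 1
    assert rebuild(stored, 1) == b"foto"      # any single disk failure is tolerated

Tolerating m > 1 failures requires codes over larger alphabets (Reed-Solomon, for example), but the storage accounting is the same: n disks of data become n+m encoded disks.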
Distributed networked storage systems aim to provide this storage service on the Internet. Current research on distributed networked storage systems focuses on:
• Efficiency of the storage system
• Robustness of the storage system
The methods for accelerating the storing and retrieval processes should operate with minimal cost and maximal robustness. Since the Internet is a public environment that anyone can freely access, it is also important to consider the privacy of the users' stored information. The goal is to design distributed networked storage systems in such a way that privacy is guaranteed while maintaining the distributed structure, and to ensure that the data stored in the system remain private even if all storage servers in the system are compromised.

2. LITERATURE REVIEW

The purpose of distributed networked storage systems [2], [3], [4] is to store data reliably over a long period of time by using a distributed collection of storage servers. Long-term reliability requires some sort of redundancy. A straightforward solution is simple replication; however, its storage cost is high. Erasure codes were proposed in several designs for reducing the storage overhead in each storage server [5], [6], after linear network codes [7], [8] were introduced. A decentralized erasure code [9] is an erasure code with a fully decentralized encoding process. Assume that there are n storage servers in the networked storage system, and that k messages are stored into the storage servers such that one can retrieve the k messages by querying only k storage servers. The method of erasure codes provides some level of privacy guarantee, since the data stored in fewer than k storage servers are not enough to reveal all information about the k messages. However, it is hard to assure that fewer than k storage servers are compromised in an open network. Thus, a more sophisticated method is required to protect the data in the storage servers while the owner of the messages can still retrieve them, even if only some storage servers respond to the retrieval request.
3. SYSTEM DESIGN

The scope of the project is to protect the data from unauthorized access in the distributed networked storage system. To reduce unauthorized access, a secure decentralized erasure code (SDEC) is used, so that the data are secured in encrypted form. Decentralized erasure codes are designed for reducing the storage overhead in each storage server and have a fully decentralized encoding process. The proposed SDEC combines the concepts of data encryption and decentralized erasure codes: the messages are stored in an encrypted form, so that even if an attacker compromises all storage servers, he cannot compute any information about the content of the messages. In cryptography, the security of a system rests on the protection of the secret key. Thus, the key servers that hold the secret key are set up or carefully chosen by the owner; due to their importance, they are highly protected by various system and cryptographic mechanisms [10], [11].

In this storage system, the owner shares his decryption key with a set of key servers in order to mitigate the risk of key leakage. As long as fewer than t key servers are compromised by the attacker, the decryption key is safe. Furthermore, as long as t key servers obtain cipher texts from some storage servers to decrypt, the owner can compute the messages back.
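A t-out-of-m key sharing of this kind is commonly realized with Shamir's secret sharing. The sketch below is an illustration of that standard construction, not the paper's own protocol; the prime modulus and helper names are our assumptions.

    import random

    P = 2**127 - 1  # a prime larger than any secret shared here (illustrative)

    def share_secret(secret, t, m):
        # Hide the secret in the constant term of a random degree-(t-1)
        # polynomial and hand one point on it to each of the m key servers.
        coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
        f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, m + 1)]

    def recover_secret(shares):
        # Lagrange interpolation at x = 0 over any t of the shares.
        total = 0
        for xi, yi in shares:
            num = den = 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, -1, P)) % P
        return total

    shares = share_secret(123456789, t=3, m=5)   # 5 key servers, threshold 3
    assert recover_secret(random.sample(shares, 3)) == 123456789

Any t shares reconstruct the key, while t - 1 or fewer reveal nothing about it, which matches the two guarantees stated above.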
The system should maintain its decentralized architecture while applying the data encryption. Thus, a new threshold public key encryption scheme is used in which each key server performs decryption independently. In traditional threshold public key encryption schemes [12], [13], decrypting a set of cipher texts requires that each key server decrypt all of the cipher texts. In the new threshold public key encryption scheme, decrypting a set of cipher texts only requires that each key server decrypt one of the cipher texts. As a result, the distributed networked storage system constructed with the secure decentralized erasure code is secure and fully decentralized: each encrypted message is distributed independently; each storage server performs the encoding process independently; and each key server executes decryption independently.

4. SYSTEM ARCHITECTURE

The figure provides an overview of the system. There are k messages Mi, 1 ≤ i ≤ k, to be stored into n storage servers SSi, 1 ≤ i ≤ n. These messages are the segments of a file. For those k messages, a message identifier is assigned. Each message Mi is encrypted under the owner's public key pk as Ci = E(pk, Mi). Then, each cipher text is sent to v storage servers, where the storage servers are randomly chosen. Each storage server SSi combines the received cipher texts by using the decentralized erasure code to form the stored data ζi. The owner's secret key sk is shared among m key servers KSi, 1 ≤ i ≤ m, by a threshold secret sharing scheme, so that key server KSi holds a secret key share ski. To retrieve the k messages, the owner instructs the m key servers such that each key server retrieves stored data from u storage servers and performs partial decryption on the retrieved data. Then, the owner collects the partial decryption results, called decryption shares, from the key servers and combines them to recover the k messages.

5. PRELIMINARIES

This section briefly describes bilinear maps, the threshold public key encryption scheme using a bilinear map that is proposed, and an overview of decentralized erasure codes.

5.1 BILINEAR MAPS AND ASSUMPTIONS

Bilinear map: Given two cyclic multiplicative groups with prime order and a generator, a bilinear mapping can be constructed provided it satisfies bilinearity and non-degeneracy. The assumption based on the bilinear map is the Bilinear Diffie-Hellman assumption: that the underlying problem is hard to solve with non-negligible probability in polynomial time.
3) Combining and decoding: The owner chooses decryption shares from all received data and computes the message identifier. If the number of received decryption shares is larger than the number of key servers required, the owner randomly selects among them; if it is smaller, the retrieval process fails. Using the message identifier, the owner decrypts and obtains the original information.

6. CONCLUSION

Thus, the distributed networked storage system constructed with SDEC provides both the storage service and the key management service, and the construction is fully decentralized: each encrypted message is distributed independently; each storage server performs the encoding process in a decentralized way; and each key server queries the storage servers independently. Moreover, the secure distributed networked storage system guarantees the privacy of messages even if all storage servers are compromised. Hence, the storage system securely stores data for a long period of time on untrusted storage servers in a distributed network structure.

REFERENCES

[1] Hsiao-Ying Lin and Wen-Guey Tzeng, "A Secure Decentralized Erasure Code for Distributed Networked Storage," IEEE Transactions on Parallel and Distributed Systems, 2010.
[2] J. Kubiatowicz, D. Bindel, Y. Chen, S. E. Czerwinski, P. R. Eaton, D. Geels, R. Gummadi, S. C. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Y. Zhao, "OceanStore: an architecture for global-scale persistent storage," in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS, vol. 35. ACM, 2000, pp. 190-201.
[3] S. C. Rhea, C. Wells, P. R. Eaton, D. Geels, B. Y. Zhao, H. Weatherspoon, and J. Kubiatowicz, "Maintenance-free
[4] in Proceedings of the 18th Symposium on Operating Systems Principles - SOSP. ACM, 2001, pp. 202-215.
[5] S. Acedański, S. Deb, M. Médard, and R. Koetter, "How good is random linear coding based distributed networked storage," in Proceedings of the First Workshop on Network Coding, Theory, and Applications - NetCod, 2005.
[6] C. Gkantsidis and P. Rodriguez, "Network coding for large scale content distribution," in Proceedings of IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies - INFOCOM, vol. 4. IEEE Communications Society, 2005, pp. 2235-2245.
[7] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, pp. 1204-1216, 2000.
[8] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, vol. 49, pp. 371-381, 2003.
[9] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Decentralized erasure codes for distributed networked storage," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2809-2816, 2006.
[10] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, "Proactive secret sharing or: How to cope with perpetual leakage," in Proceedings of the 15th Annual International Cryptology Conference - CRYPTO, ser. Lecture Notes in Computer Science, vol. 963. Springer, 1995, pp. 339-352.
[11] C. Cachin, K. Kursawe, A. Lysyanskaya, and R. Strobl, "Asynchronous verifiable secret sharing and proactive cryptosystems," in Proceedings of the 9th ACM Conference on Computer and Communications Security - CCS. ACM, 2002, pp. 88-97.
[12] R. Canetti and S. Goldwasser, "An efficient threshold public key cryptosystem secure against adaptive chosen cipher text attack," 1999, pp. 90-106.
[13] D. Boneh, X. Boyen, and S. Halevi, "Chosen cipher text secure public key threshold encryption without random oracles," in CT-RSA, ser. Lecture Notes in Computer Science, vol. 3860. Springer, 2006, pp. 226-243.
guaranteed to converge back to its legitimate states, where every sensor accepts every fresh flood message and discards every redundant flood message.

In this paper, we discuss a family of four flood sequencing protocols: a sequencing free protocol, a linear sequencing protocol, a circular sequencing protocol, and a differentiated sequencing protocol. We analyse the stabilization properties of these four protocols. For each of the protocols, we first compute an upper bound on the convergence time of the protocol from an illegitimate state to legitimate states. Second, we compute an upper bound on the number of fresh flood messages that can be discarded by each sensor during the convergence. Third, we compute an upper bound on the number of redundant flood messages that can be accepted by each sensor during the convergence.

XIII. RELATED WORK

A flood sequencing protocol can be designed in various ways, depending on several design decisions, such as how the next sequence number is selected by the base station, how each sensor determines, based on the sequence number in a received message, whether the received message is fresh or redundant, and what information the base station and each sensor store in their local memory. The practice of using sequence numbers to distinguish between fresh and redundant flood messages has been adopted by most flood protocols in the literature; however, the stabilization properties of these protocols (over the design decisions of their flood sequencing protocols) were not specified.

In Scalable Reliable Multicast (SRM) [10], when a receiver in a multicast group detects that it has a missing data message, it attempts to retrieve the message from any node in the group by requesting retransmission. This work is based on the assumption that each data message has a unique and persistent name, and it utilizes application data units to name messages. In a flood sequencing protocol, sensors can use sequence numbers only in a limited range for flood messages. Thus, the sensors cannot identify a message uniquely based on its sequence number, and cannot use the sequence number for requesting retransmission or replying to a request.

The protocols in [11], [12] use named data that is specific to applications for dissemination and routing in sensor networks. However, a flood sequencing protocol can be used before any application is deployed in the network. Thus, using named data is not suitable for a flood sequencing protocol.
Fig. 4. A specification of sensor u in a sequencing free protocol.

The stabilization property of the sequencing free protocol can be stated by the following three theorems. Theorem 1A gives an upper bound on the convergence time of the protocol from an illegitimate state to legitimate states. Theorem 1B gives an upper bound on the number of fresh messages that can be discarded by each sensor during the convergence. Theorem 1C gives an upper bound on the number of redundant messages that can be accepted by each sensor during the convergence.

Theorem 1A. In the sequencing free protocol, starting from any illegitimate state, the protocol reaches a legitimate state within 2*f time units, and continues to execute within legitimate states.

Fig. 5. A specification of sensor 0 in a linear sequencing protocol.

When sensor u receives a data (h, s) message, sensor u accepts the message if s > slast, and forwards the message if h > 1. Otherwise, sensor u discards the message. The receiving action of u is given as follows:
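The figure with the formal receiving action is not reproduced in these proceedings, so the following sketch restates the rule just described in code form; the names (slast, accept, forward) are ours, not the authors' notation.

    def receive_linear(sensor, h, s, payload):
        # Linear sequencing rule: a message is fresh iff its sequence
        # number s is strictly larger than the last accepted one.
        if s > sensor.slast:
            sensor.slast = s                        # remember the freshest number
            sensor.accept(payload)                  # deliver the fresh flood message
            if h > 1:
                sensor.forward(h - 1, s, payload)   # keep flooding, one hop fewer
        # otherwise the message is redundant and is discarded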
Let k be the maximum value between 1 and k', where k' is the maximum difference slast.u - slast.0 over all sensors u in the network at an initial state. Note that the value of k is finite but unbounded.

Theorem 2A. In the linear sequencing protocol, starting from any illegitimate state, the protocol reaches a legitimate state within (k+1)*f time units, and continues to execute within legitimate states.
Fig. 6. A specification of sensor u in a linear sequencing protocol.

Theorem 2B. In the linear sequencing protocol, starting from any illegitimate state, every sensor discards at most (k+1)*f fresh messages (before the protocol converges to a legitimate state).

The function Larger(s, slast) returns true if s is logically larger than slast, and otherwise returns false. Sensor u accepts the message if Larger(s, slast) returns true, and forwards it if h > 1. The receiving action of sensor u is modified as follows:
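One common way to realize such a "logically larger" test over a circular sequence space of smax values is serial-number comparison, sketched below; this is an assumed implementation, not the authors' definition.

    def larger(s, slast, smax):
        # True iff s is ahead of slast by less than half the circular
        # sequence space, so the comparison survives wrap-around.
        return 0 < (s - slast) % smax < smax // 2

    # With smax = 16: 3 is logically larger than 14 (wrap-around),
    # while 14 is not logically larger than 3.
    assert larger(3, 14, 16) and not larger(14, 3, 16)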
flood messages, the differentiated sequencing protocol cannot be self-stabilizing.

Sensor 0 in this protocol is identical to the one in the circular sequencing protocol. However, when a sensor u receives a data (h, s) message, sensor u accepts the message if s is different from slast, and forwards the message if h > 1. The receiving action of sensor u is modified as follows:
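Since the modified action is again given only as a figure, here is the same rule as a sketch (assumed names as before): the freshness test weakens to a simple inequality.

    def receive_differentiated(sensor, h, s, payload):
        # Differentiated sequencing rule: any sequence number that differs
        # from the last accepted one is treated as fresh.
        if s != sensor.slast:
            sensor.slast = s
            sensor.accept(payload)
            if h > 1:
                sensor.forward(h - 1, s, payload)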
Theorem 4A. In the differentiated sequencing protocol, starting from any illegitimate state, the protocol reaches a legitimate state within (smax/2 + 2)*f time units, and continues to execute within legitimate states.

Theorem 4B. In the differentiated sequencing protocol, starting from any illegitimate state, every sensor discards at most (smax/2 + 2)*f fresh messages (before the protocol converges to a legitimate state).
3) For each in-neighbor w of v, other than u, if w sends a message at t, then a random integer is uniformly selected in the range 0..99, where p' is the probability label of edge (w, v). If the selected number is at least 100 - p', then the message sent by w collides with the message sent by u.
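Read this way, a concurrent transmission on an edge labeled p' collides with probability p'/100. A short sketch of the draw, under that reading:

    import random

    def collides(p_prime):
        # Draw an integer uniformly from 0..99; a value of at least
        # 100 - p_prime means the concurrent message collides, so a
        # collision occurs with probability p_prime / 100.
        return random.randrange(100) >= 100 - p_prime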
Fig. 3. Reach of the four flood sequencing protocols starting from an illegitimate state in sparse networks. (a) A 10×10 network. (b) A 20×20 network.

XVII. CONCLUSIONS

In this paper, we discussed a family of four flood sequencing protocols that use sequence numbers to distinguish between fresh and redundant flood messages. The members of our family are the sequencing free protocol, the linear sequencing protocol, the circular sequencing protocol, and the differentiated sequencing protocol. We concluded that the differentiated sequencing protocol has better
overall performance in terms of communication, stabilization, and stable properties, compared to the other three protocols. Our analysis is useful for sensor network designers and developers in selecting a flood sequencing protocol that satisfies the needs of a target sensor network.

XVIII. ACKNOWLEDGMENT

The authors are grateful to the anonymous referees for their helpful comments. A preliminary version of this paper appeared in IEEE Transactions on Parallel and Distributed Systems (July 2010) [1].

XIX. REFERENCES

[1] Young-ri Choi and Chin-Tser Huang, "Stabilization of Flood Sequencing Protocols in Sensor Networks," IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 7, July 2010.
[2] S. Ni, Y. Tseng, Y. Chen, and J. Sheu, "The Broadcast Storm Problem in a Mobile Ad Hoc Network," Proc. ACM MobiCom, pp. 151-162, 1999.
[3] B. Williams and T. Camp, "Comparison of Broadcasting Techniques for Mobile Ad Hoc Networks," Proc. ACM Int'l Symp. Mobile Ad Hoc Networking and Computing, 2002.
[4] D.B. Johnson and D.A. Maltz, "Dynamic Source Routing in Ad Hoc Wireless Networks," Mobile Computing, Chapter 5, vol. 353, pp. 153-181, Kluwer Academic Publishers, 1996.
[5] W. Peng and X. Lu, "AHBP: An Efficient Broadcast Protocol for Mobile Ad Hoc Network," J. Science and Technology, 2001.
[6] A. Durresi, V. Paruchuri, S.S. Iyengar, and R. Kannan, "Optimized Broadcast Protocol for Sensor Networks," IEEE Trans. Computers, vol. 54, no. 8, pp. 1013-1024, Aug. 2005.
[7] H. Sabbineni and K. Chakrabarty, "Location-Aided Flooding: An Energy-Efficient Data Dissemination Protocol for Wireless Sensor Networks," IEEE Trans. Computers, vol. 54, no. 1, pp. 36-46, Jan. 2005.
[8] D. Ganesan, B. Krishnamurthy, A. Woo, D. Culler, D. Estrin, and S. Wicker, "An Empirical Study of Epidemic Algorithms in Large Scale Multihop Wireless Networks," IRP-TR-02-003, 2002.
[9] M. Heissenbüttel, T. Braun, M. Waelchli, and T. Bernoulli, "Optimized Stateless Broadcasting in Wireless Multi-Hop Networks," Proc. IEEE INFOCOM, 2006.
[10] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang, "A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing," IEEE/ACM Trans. Networking, vol. 5, no. 6, pp. 784-803, Dec. 1997.
[11] J. Kulik, W. Heinzelman, and H. Balakrishnan, "Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks," Wireless Networks, vol. 8, nos. 2/3, pp. 169-185, 2002.
[12] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks," Proc. ACM MobiCom, 2000.
*Dr. KARTHIKEYANI V., **PARVIN BEGUM I., ***TAJUDIN K., ****SHAHINA BEGAM I.
* Assistant Professor, Department of Computer Science, Govt. Arts College for Women, Salem-08, drvkarthikeyani@gmail.com
** Lecturer, Department of Computer Application, Soka Ikeda College of Arts and Science, Chennai-99, parvinnadiya@gmail.com
*** Lecturer, Department of Computer Science, New College, Royapettah, Chennai-14, tajudinap@gmail.com
**** Asst. Professor, Department of MCA, VelTech Dr.RR & Dr.SR Engg College, Chennai, sbshahintaj@gmail.com
ABSTRACT

This paper describes knowledge discovery through text mining for extracting association rules from a collection of databases, the main contribution of the technique being its combination with Information Retrieval (TF-IDF). It consists of three phases: (i) a database collection phase; (ii) an association rule mining phase, in which the EART system, which treats only texts and not images or figures, discovers association rules amongst the keywords labeling the collection of text databases; and (iii) a visualization phase. The term mining refers loosely to finding relevant information or discovering knowledge from a large volume of data, and the knowledge discovered from a database can be represented by a set of rules. Experiments were applied on a collection of databases selected from MEDLINE that are related to the outbreaks of chikungunya and TB in Tamil Nadu.

Key words: Text Mining, Data Mining, association rule.

I. INTRODUCTION

The information age is characterized by a rapid growth of information available in electronic media such as databases, data warehouses, intranet documents, business emails and the WWW. This growth has created a demanding task called Knowledge Discovery in Databases (KDD) and in Texts (KDT). Therefore, researchers and companies in recent years [1] have focused on this task and significant progress has been made. Text Mining (TM) and Knowledge Discovery in Text (KDT) are new research areas that try to solve the problem of information overload by using techniques from Data Mining, Machine Learning, and Natural Language Processing (NLP), with NLP components improving
over time. A manual search on a corpus is not sufficient for giving answers to more complex research questions [8]. The NLP field can be used to support scientific work, through Information Retrieval (IR), Information Extraction (IE) and Knowledge Discovery.

Association rule mining is one of the important techniques of data mining. Association rules highlight correlations between keywords in the texts. Moreover, association rules are easy to understand and to interpret for an analyst. In this paper, we focus on the extraction of association rules amongst the keywords labeling the database. Since collections of databases are a valuable source of knowledge and considered as assets, it is worthwhile to invest in efforts to get access to these sources.

The outline of the paper is as follows: in Section II, we present the EART system. Experimental results are presented in Section III. Section IV presents the related work. Section V provides the conclusion and future work.

II. EART SYSTEM

In the text mining analysis, data on diseases are collected from public health organizations. Based on these data, a database is created recording the district name and the diseases reported there.

B). Association Rule
The goal of association rule mining is to generate all possible rules that exceed some minimum user-specified support and confidence thresholds [7].

Various proposals for mining association rules from transaction data have been presented in different contexts. Some of these proposals are constraint-based, in the sense that all rules must fulfill a predefined set of conditions, such as support and confidence [4], [6].

An Enhanced Scaling Apriori for association rule mining efficiency considers rules of the form X → Y, where X and Y are sets of items. The problem is to find all association rules that satisfy user-specified minimum support and minimum confidence constraints. Conceptually, this problem can be viewed as finding associations between the "1" values in a relational table where all the attributes are Boolean. The table has an attribute corresponding to each item and a record corresponding to each disease. The value of an attribute for a given record is "1" if the disease corresponding to the attribute is present in the transaction corresponding to the record, and "0" otherwise. Relational tables in most disease and scientific domains have richer attribute types: attributes can be quantitative (e.g. affected, status), and Boolean attributes can be considered a special case of categorical attributes. This research work defines the problem of mining association rules over quantitative attributes in large relational tables and techniques for discovering such rules; this is referred to as the Quantitative Association Rules problem. The problem of mining association rules in categorical data is presented for diseases.

The original problem of mining association rules was formulated as how to find rules of the form set1 → set2. Such a rule is supposed to denote affinity or correlation among the two sets containing nominal or ordinal data items; its statistical basis is represented in the form of the minimum support and confidence measures of the rule.
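As a concrete illustration of these two measures (not the EART implementation itself), the sketch below computes the support and confidence of a rule X → Y over a toy district/disease table like the one in Section III; the data values are invented.

    def support(rows, items):
        # Fraction of records (districts) containing every item in `items`.
        return sum(items <= row for row in rows) / len(rows)

    def confidence(rows, x, y):
        # Conditional frequency of Y given X: support(X u Y) / support(X).
        return support(rows, x | y) / support(rows, x)

    # Each row lists the diseases reported in one district.
    rows = [{"TB", "chik"}, {"TB"}, {"TB", "chik"}, {"TB", "chik"}, {"TB"}]
    print(support(rows, {"TB", "chik"}))       # 0.6
    print(confidence(rows, {"TB"}, {"chik"}))  # 0.6: TB -> chik holds in 60% of TB rows

A rule is reported only if both numbers exceed the user-specified minimum thresholds.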
C). Visualization Phase

Information visualization for text mining typically involves producing a 2D or 3D representation that exposes selected types of semantic patterns in a database collection, with visualization characteristics chosen so that the rendering is meaningful. Visualizations can be generated based on attributes such as keyword, district id, district name, support and confidence. Principal-components visualizations represent the relationships among the association rules. Text data needs separate visual tools that combine numeric and textual information: the higher dimensionality of text makes it harder to view than numeric data with its fewer dimensions. This approach supports the user in quickly identifying the main topics or concepts by their importance in the representation. The ability to visualize large data sets lets users quickly explore the semantic relationships that exist in a large collection of data [4].

III. EXPERIMENTAL RESULT
DATA REPORT

Did no   Dname          TB   Chik
101      Chennai        1    1
102      Coimbatore     1    0
103      Cuddalore      1    1
104      Dharmapuri     1    1
105      Dindigul       1    0
106      Erode          1    0
107      Kanchipuram    1    0
Abstract—We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.

1 INTRODUCTION
In the course of doing business, sometimes sensitive data must be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data.

We consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique where the data is modified and made "less sensitive" before being handed to agents. For example, one can add random noise to certain attributes, or one can replace exact values by ranges [18]. However, in some cases it is important not to alter the original distributor's data. For example, if an outsourcer is doing our payroll, he must have the exact salary and customer bank account numbers. If medical researchers will be treating patients (as opposed to simply computing statistics), they may need accurate data for the patients.

Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but again involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.

In this paper we study unobtrusive techniques for detecting leakage of a set of objects or records. Specifically, we study the following scenario: After giving a set of objects to agents, the distributor discovers some of those same objects in an unauthorized place. (For example, the data may be found on a web site, or may be obtained through a legal discovery process.) At this point the distributor can assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. Using an analogy with cookies stolen from a cookie jar, if we catch Freddie with a single cookie, he can argue that a friend gave him the cookie. But if we catch Freddie with 5 cookies, it will be much harder for him to argue that his hands were not in the cookie jar. If the distributor sees "enough evidence" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings. In this paper we develop a model for assessing the "guilt" of
agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. Finally, we also consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out an agent was given one or more fake objects that were leaked, then the distributor can be more confident that agent was guilty. We start in Section 2 by introducing our problem setup and the notation we use. In the first part of the paper, Sections 4 and 5, we present a model for calculating "guilt" probabilities in cases of data leakage. Then, in the second part, Sections 6 and 7, we present strategies for data allocation to agents. Finally, in Section 8, we evaluate the strategies in different data leakage scenarios, and check whether they indeed help us to identify a leaker.

2. PROBLEM SETUP AND NOTATION

2.1 Entities and Agents

A distributor owns a set T = {t1, . . . , tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size; e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
• Sample request Ri = SAMPLE(T, mi): Any subset of mi records from T can be given to Ui.
• Explicit request Ri = EXPLICIT(T, condi): Agent Ui receives all the T objects that satisfy condi.

Example. Say T contains customer records for a given company A. Company A hires a marketing agency U1 to do an on-line survey of customers. Since any customers will do for the survey, U1 requests a sample of 1000 customer records. At the same time, company A subcontracts with agent U2 to handle billing for all California customers. Thus, U2 receives all T records that satisfy the condition "state is California."

Although we do not discuss it here, our model can be easily extended to requests for a sample of objects that satisfy a condition (e.g., an agent wants any 100 California customer records). Also note that we do not concern ourselves with the randomness of a sample. (We assume that if a random sample is required, there are enough T records so that the to-be-presented object selection schemes can pick random records from T.)
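The two request types can be phrased directly in code. This is a small sketch of the definitions above with an assumed record format, not the paper's implementation:

    import random

    def sample_request(T, m):
        # SAMPLE(T, m): any subset of m records from T may be given to the agent.
        return random.sample(T, m)

    def explicit_request(T, cond):
        # EXPLICIT(T, cond): the agent receives all records of T satisfying cond.
        return [t for t in T if cond(t)]

    T = [{"name": "X", "state": "CA"}, {"name": "Y", "state": "NY"},
         {"name": "Z", "state": "CA"}]
    R1 = sample_request(T, 2)                               # the survey agency
    R2 = explicit_request(T, lambda t: t["state"] == "CA")  # the billing agent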
2.2 Guilty Agents

Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means that some third party, called the target, has been caught in possession of S. For example, this target may be displaying S on its web site, or perhaps as part of a legal discovery process the target turned over S to the distributor. Since the agents U1, . . . , Un have some of the data, it is reasonable to suspect them of leaking the data. However, the agents can argue that they are innocent, and that the S data was obtained by the target through other means. For example, say one of the objects in S represents a customer X. Perhaps X is also a customer of some other company, and that company provided the data to the target. Or perhaps X can be reconstructed from various publicly available sources on the web. Our goal is to estimate the likelihood that the leaked data came from the agents as opposed to other sources. Intuitively, the more data in S, the harder it is for the agents to argue they did not leak anything. Similarly, the "rarer" the objects, the harder it is to argue that the target obtained them through other means. Not only do we want to estimate the likelihood the agents leaked data, but we would also like to find out if one of them in particular was more likely to be the leaker. For instance, if one of the S objects was only given to agent U1, while the other objects were given to all agents, we may suspect U1 more. The model we present next captures this intuition.

We say an agent Ui is guilty if it contributes one or more objects to the target. We denote the event that agent Ui is guilty as Gi, and the event that agent Ui is guilty for a given leaked set S as Gi|S. Our next step is to estimate Pr{Gi|S}, i.e., the probability that agent Ui is guilty given evidence S.

3 RELATED WORK

The guilt detection approach we present is related to the data provenance problem [3]: tracing the lineage of the S objects essentially implies the detection of the guilty agents. Tutorial [4] provides a good overview of the research conducted in this field. Suggested solutions are domain specific, such as lineage tracing for data warehouses [5], and assume some prior knowledge on the way a data view is created out of data sources. Our problem formulation with objects and sets is more general and simplifies lineage tracing, since we do not consider any data transformation from the Ri sets to S. As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects. Watermarks were initially used in images [16], video [8] and audio data [6] whose digital representation includes
considerable redundancy. Recently, [1], [17], [10], [7] and other works have also studied mark insertion into relational data. Our approach and watermarking are similar in the sense of providing agents with some kind of receiver-identifying information. However, by its very nature, a watermark modifies the item being watermarked. If the object to be watermarked cannot be modified, then a watermark cannot be inserted, and in such cases methods that attach watermarks to the distributed data are not applicable. Finally, there are also many other works on mechanisms that allow only authorized users to access sensitive data through access control policies [9], [2]. Such approaches prevent, in some sense, data leakage by sharing information only with trusted parties. However, these policies are restrictive and may make it impossible to satisfy agents' requests.

4 AGENT GUILT MODEL

To compute Pr{Gi|S}, we need an estimate for the probability that values in S can be "guessed" by the target. For instance, say some of the objects in S are emails of individuals. We can conduct an experiment and ask a person with approximately the expertise and resources of the target to find the email of, say, 100 individuals. If this person can find, say, 90 emails, then we can reasonably guess that the probability of finding one email is 0.9. On the other hand, if the objects in question are bank account numbers, the person may only discover, say, 20, leading to an estimate of 0.2. We call this estimate pt, the probability that object t can be guessed by the target.
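To show how such estimates feed into Pr{Gi|S}, here is a sketch under two assumptions that are ours rather than the text's: every object has the same guessing probability p, and an object not guessed by the target was leaked by exactly one of the agents holding it, each equally likely, independently across objects.

    def guilt_probability(S, R_i, holders, p):
        # Estimate Pr{Gi|S} for one agent holding the object set R_i.
        # holders[t] is the number of agents that were given object t;
        # each object t in S that the agent holds implicates it with
        # probability (1 - p) / holders[t].
        pr_innocent = 1.0
        for t in S & R_i:
            pr_innocent *= 1.0 - (1.0 - p) / holders[t]
        return 1.0 - pr_innocent

    # The Section 5.1 setup: 16 leaked objects, all held by U1, 8 also by U2.
    S = set(range(16))
    holders = {t: 2 if t < 8 else 1 for t in S}
    print(guilt_probability(S, set(range(16)), holders, p=0.2))  # U1: close to 1
    print(guilt_probability(S, set(range(8)), holders, p=0.2))   # U2: lower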
Section 2: sample and explicit. Fake objects are
objects generated by the distributor that are not in set
5 GUILT MODEL ANALYSIS T. The objects are designed to look like real objects,
and are distributed to agents together with the T
In order to see how our model parameters interact objects, in orderto increase the chances of detecting
and to check if the interactions match our intuition, in agents that leak data. We discuss fake objects in more
this section we study two simple scenarios. In each detail in Section 6.1 below.As shown in Figure 2, we
scenario we have a target that has obtained all the represent our four problem nstances with the names
distributor‘s objects, i.e., T = S. EF, EF, SF and SF, where Estands for explicit requests,
5.1 Impact of Probability p S for sample requests, F for the use of fake objects,
In our first scenario, T contains 16 objects: all of them and F for the case where fake objects are not allowed.
are given to agent U1 and only 8 are given to a Note that for simplicity we are assuming that in the E
second agent U2. We calculate the probabilities problem instances, all agents make explicitwhile in the
Pr{G1|S} and Pr{G2|S} for p in the range [0,1] and S instances, all agents make sample requests.Our
we present the results in Figure 1(a). The dashed line results can be extended to handle mixed cases, with
shows Pr{G1|S}and the solid line shows Pr{G2|S}. As some explicit and some sample requests. We provide
p approaches 0, it becomes more and more unlikely here a small example to illustrate how mixed requests
that the target guessed all 16 values. Each agent has can be handled, but then do not elaborate further.
enough of the leaked data that its individual guilt Assume that we have two agents with requests R1 =
approaches 1. However, as p increases in value, the EXPLICIT(T, cond1) and R2 = SAMPLE(T_, 1)where T_
= EXPLICIT(T, cond2). Further, say cond1 is "state=CA" (objects have a state field). If agent U2 has the same condition, cond2 = cond1, we can create an equivalent problem with sample data requests on set T′. That is, our problem will be how to distribute the CA objects to two agents, with R1 = SAMPLE(T′, |T′|) and R2 = SAMPLE(T′, 1). If instead U2 uses condition "state=NY," we can solve two different problems, for sets T′ and T − T′; in each problem we will have only one agent. Finally, if the conditions partially overlap, i.e., R1 ∩ T′ ≠ ∅ but R1 ≠ T′, we can solve three different problems, for sets R1 − T′, R1 ∩ T′ and T′ − R1.
7 ALLOCATION STRATEGIES

In this section we describe allocation strategies that solve exactly or approximately the scalar versions of Equation 8 for the different instances presented in Figure 2. We resort to approximate solutions in cases where it is inefficient to solve the optimization problem exactly. In Section 7.1 we deal with problems with explicit data requests, and in Section 7.2 with problems with sample data requests.

8 EXPERIMENTAL RESULTS

We implemented the presented allocation algorithms in Python and conducted experiments with simulated data leakage problems to evaluate their performance. In Section 8.1 we present the metrics we use for the algorithm evaluation, and in Sections 8.2 and 8.3 we present the evaluation for sample requests and explicit data requests, respectively.

8.1 Metrics

In Section 7 we presented algorithms to optimize the problem of Equation 8, which is an approximation to the original optimization problem of Equation 7. In this section we evaluate the presented algorithms with respect to the original problem. In this way we measure not only the algorithm performance, but also implicitly evaluate how effective the approximation is.

8.2 Explicit Requests

In the first place, the goal of these experiments was to see whether fake objects in the distributed data sets yield significant improvement in our chances of detecting a guilty agent. In the second place, we wanted to evaluate our e-optimal algorithm relative to a random allocation. We focus on scenarios with a few objects that are shared among multiple agents. These are the most interesting scenarios, since object sharing makes it difficult to distinguish guilty from non-guilty agents. Scenarios with more objects to distribute or scenarios with objects shared among fewer agents are obviously easier to handle. As far as scenarios with many objects to distribute and many overlapping agent requests are concerned, they are similar to the scenarios we study, since we can map them to the distribution of many small subsets. In our scenarios we have a set of |T| = 10 objects for which there are requests by n = 10 different agents; each scenario is run 10 times for each value of B, and the results we present are the average over the 10 runs.

9 CONCLUSIONS

In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may not be 100% trusted, and we may not be certain whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks.

In spite of these difficulties, we have shown it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be "guessed" by other means. Our model is relatively simple, but we believe it captures the essential trade-offs. The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor's chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is large overlap in the data that agents must receive.

Our future work includes the investigation of agent guilt models that capture leakage scenarios not studied in this paper. For example, what is the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in [14]. Another open problem is the extension of our allocation strategies so that they can handle agent requests in an online fashion (the presented strategies assume that there is a fixed set of agents with requests known in advance).
* Assistant Professor, Department of Computer Science and Engineering, S.A. Engineering College, Chennai.
Abstract: A secured cloud computing for life care integrated with a Wireless Sensor Network (WSN) monitors human health and activities, and shares this information among doctors, care-givers, clinics, and pharmacies in the Cloud; it incorporates various technologies with novel ideas, including sensor networks, Cloud computing security, and activity recognition. In addition, in an emergency condition the person may communicate with the care givers through voice communication, and alert messages are automatically sent to the care giver's mobile phone and to the main server, so the patients can be provided with better care and services. Cloud computing also offers the facility of sharing resources at lower cost and provides anywhere-access to patients' details, which are maintained in a secured way so that nobody other than an authenticated and authorized person can access them.

Keywords: Wireless sensor network (WSN), Voice over Internet Protocol (VoIP), Activity recognition, Accelerometer, Hidden Markov Model (HMM), Conditional Random Fields (CRF).

1 Introduction

In recent years, Wireless Sensor Networks (WSNs) have been employed to monitor human health and provide life care services. Existing life care systems simply monitor human health and rely on a centralized server to store and process sensed data, leading to a high cost of system maintenance, yet with limited services and low performance. For instance, the Korea u-Care System for a Solitary Senior Citizen (SSC) monitors human health at home and provides limited services, such as 24 hours × 365 days safety monitoring for an SSC, emergency-connection services, and information sharing services. In this paper, I propose a secured Cloud computing for life care integrated with WSN, which monitors not only human health but also human activities [1], including emergency notification to the care givers, to provide low-cost, high-quality care service to users.

WSNs are deployed in home environments for monitoring and collecting raw data. The software architecture is built to gather data efficiently and precisely. Sensed data is uploaded to Clouds using a fast and scalable Sensor Data Dissemination mechanism [2][3]. In the Cloud, this sensed data is either health data or can be used to detect
human activities. For human activity recognition, there are two novel approaches: embodied sensor-based and video-based activity recognition [2]. In the former approach, a gyroscope- and accelerometer-supported sensor is attached to the human body (e.g. on the wrist). Using the gyroscope and accelerometer data, an activity is predicted or inferred based on Semi-Markov Conditional Random Fields. Detected activities can be simple (e.g. sitting, standing, and falling down) or more complicated (e.g. eating, reading, teeth brushing, and exercising). In the latter approach, activities are detected by collecting images from cameras, extracting the background to get body shapes, and comparing them to predefined patterns. It can detect basic activities like walking, sitting, and falling down. An ontology engine is designed to deduce high-level activities and make decisions according to the user profile and performed activities. To access data on the Cloud, the user must be authenticated and granted access permissions. An image-based authentication and an activity-based access control are proposed to enhance the security and flexibility of user access [4][5]. Independent Clouds can collaborate with each other by using the Cloud Dynamic Collaboration method [3]. Using these data on the Cloud, many low-cost and high-quality life care services can be provided to users.

2 Paper contribution and Outline

The paper's contributions are twofold. First, it proposes a novel implementation of VoIP (Voice over Internet Protocol): voice communication between patients, or others in the patient's environment, and the care givers, through the server administrators in the hospital environment, combined with human activity recognition by cameras [1]. Second, emergency alert messages are passed to the doctors and care givers on their mobile phones via the Short Message Service.

3 Related Works

For human activity recognition in the patient environment, a semi-CRF is used [2].

So far, many algorithms have been proposed for accelerometer-based activity recognition. Decision tree, support vector machine and some other kinds of classification methods were evaluated in [9] and [6]. To make use of the sequential structure of activities, the Hidden Markov Model (HMM) was used in [10]. Recently, the Conditional Random Fields model (CRF) was introduced as a much better approach than HMM for sequential modelling; thus, some researchers have successfully applied CRF to activity recognition [7], [11]. A limitation of both the conventional HMM and first-order CRF is the Markovian property, which assumes that the current state depends only on the previous state. Because of this assumption, the labels of two adjacent states must be supposed to occur successively in the observation sequence. Unfortunately, this presumption is not always satisfied in reality: in the activity recognition problem, two expected activities (activities that we want to recognize) are often separated by irrelevant activities (activities that we do not intend to detect). Furthermore, constant self-transition probabilities cause the distribution of a state's duration to be geometric [8], which is inappropriate for real activity durations.

For accessing the data on the Cloud, the user must be authenticated and granted access permissions [4], using a Human Identification Protocol based on the ability of humans to efficiently process an image given a secret predicate. It is a challenge-response protocol in which a subset of the images presented satisfies a secret predicate shared by the challenger and the user. We conjecture that it is hard for adversaries, both humans and programs, to guess this secret predicate. It can be efficiently executed by humans with knowledge of the secret, which in turn is easily memorable and replaceable.

4 Implementation

The functional architecture is shown in Figure 1 below. First of all, human activity data is captured from sensors and cameras and transmitted to the Cloud Gateway. The gateway classifies data into health data, gyroscope and accelerometer data, and imaging data, and stores them in a local database. The Filtering Module filters redundant and noisy data to reduce communication overhead before sending to the Cloud. The filtered data is also
[Figure 1. Functional architecture: Context-Aware Engine (Activity Recognition Engine, Context-Awareness, Ontology Engine), health data flow, and Security Manager.]
Cloud data. Data is forwarded to authenticated nurses and doctors.
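A minimal sketch of the gateway step just described (classify incoming readings, then drop near-duplicate ones before upload); the record format and the threshold are assumptions, not part of the paper.

    def classify(record):
        # Route a sensed record into one of the gateway's three categories.
        kinds = {"pulse": "health", "gyro": "motion",
                 "accel": "motion", "frame": "imaging"}
        return kinds.get(record["sensor"], "health")

    def filter_redundant(records, min_delta=0.5):
        # Keep a reading only if it differs enough from the last kept one
        # for the same sensor, reducing communication overhead before the
        # upload to the Cloud.
        kept, last = [], {}
        for r in records:
            key = r["sensor"]
            if key not in last or abs(r["value"] - last[key]) >= min_delta:
                kept.append(r)
                last[key] = r["value"]
        return kept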
5 Conclusion and future work

This paper presented a secured Cloud computing for life care integrated with WSN. It monitors human health as well as activities, and shares this information among doctors, care-givers, clinics, and pharmacies from the Cloud to provide low-cost and high-quality care to users. The proposed system is thus a combination of various technologies with novel proposed ideas.

Future plans are to provide more services to patients with different kinds of diseases, such as nervous disorders and cardiac problems. Improving the security and privacy of data available on the Cloud is also in the pipeline. Another extension is to extend its services to military services.

6 References

[1] Xuan Hung Le, Sungyoung Lee, Phan Tran Ho Truc, La The Vinh, Asad Masood Khattak, Manhyung Han, DangViet Hung, Mohammad M. Hassan, Miso (Hyung-Il) Kim, Kyo-Ho Koo, Young-Koo Lee, Eui-Nam Huh.
[2] L. Vinh, X.H. Le, S. Lee. Semi Markov Conditional Random Fields for Accelerometer Based Activity Recognition (submitted).
[3] M. Hassan, E. Huh. A Framework of Sensor-Cloud Integration: Opportunities and Challenges. International Conference on Ubiquitous Information Management and Communication.
[4] H. Jameel, R.A. Shaikh, H. Lee and S. Lee. Human Identification through Image Evaluation Using Secret Predicates. Topics in Cryptology - CT-RSA 07, LNCS 4377 (2007) 67-84.
[5] X.H. Le, S. Lee, Y. Lee, H. Lee. Activity-based Access Control Model to Hospital Information. Proceedings of the 13th IEEE Int. Conf. on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), Seoul, Korea, 2007, pp. 488-496.
[6] Ling Bao and Stephen S. Intille. Activity recognition from user-annotated acceleration data. In Proceedings of the 2nd International Conference on Pervasive Computing, volume 3001, pages 1-17, 2004.
[7] Lin Liao, Dieter Fox, and Henry A. Kautz. Extracting places and activities from GPS traces using hierarchical conditional random fields. International Journal of Robotics Research, 26:119-134, 2007.
[8] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 257-286, 1989.
[9] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity recognition from accelerometer data. In Proceedings of the 20th National Conference on Artificial Intelligence, volume 20, pages 1541-1546, 2005.
[10] Jaakko Suutala, Susanna Pirttikangas, and Juha Röning. Discriminative temporal smoothing for activity recognition from wearable sensors. In Proceedings of the 4th International Symposium on Ubiquitous Computing Systems, volume 4836, pages 182-195, 2007.
[11] Douglas L. Vail, Manuela M. Veloso, and John D. Lafferty. Conditional random fields for activity recognition. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multi-agent Systems, page 235, 2007.
The original Chaum mix network operates on entire mail messages at a time and therefore does not need to pay particular attention to the latency added by the mixes. Increasingly, the data exchanged exceed by far the capacity of mixes, for example, in file-sharing applications. As a result, current mixes operate on individual packets of a flow rather than on entire messages. In conjunction with source routing at the sender, this allows for very efficient network-level implementations of mix networks.

Mixes are also being used in applications where low latency is relevant, for example, voice-over-IP or video streaming. Many other applications, such as traditional FTP or file-sharing applications, rely on delay-sensitive protocols, such as TCP, and are therefore in turn delay-sensitive as well. For such applications, it is well known that the level of traffic perturbation caused by the mix network must be carefully chosen in order not to unduly affect the delay and throughput requirements of the applications.

This paper focuses on the quantitative evaluation of mix performance. We focus our analysis on a particular type of attack, which we call the flow-correlation attack. In general, flow-correlation attacks attempt to reduce the anonymity degree by estimating the path of flows through the mix network. Flow correlation analyzes the traffic on a set of links (observation points) inside the network and estimates the likelihood for each link to be on the path of the flow under consideration. An adversary analyzes the network traffic with the intention of identifying which of several output ports a flow at an input port of a mix is taking. Obviously, flow correlation helps the adversary identify the path of a flow and consequently reveal other critical information related to the flow (e.g., sender and receiver).

C. Goals
Major contributions are summarized as follows:
1) Two Classes of Correlation Methods: We formally model the behavior of an adversary who launches flow-correlation attacks. In order to successfully identify the path taken by a particular flow, the attacker measures the dependency of traffic flows. Two classes of correlation methods are considered, namely time-domain methods and frequency-domain methods.
2) Detection Rate: We measure the effectiveness of a number of popular mix strategies in countering flow-correlation attacks. Mixes with any tested batching strategy may fail under flow-correlation attacks in the sense that, for a given flow over an input link, the adversary can effectively detect which output link is used by the same flow. To measure this, the detection rate is used, which is the probability that the adversary correctly correlates flows into and out of a mix, defined as the measure of success for the attack.

XXI. RELATED WORKS
Chaum [1] pioneered the idea of anonymous communication. Since then, researchers have applied the idea to different applications such as message-based e-mail and flow-based low-latency communications, and they have developed new defense techniques as more attacks have been proposed.

For anonymous e-mail applications, Chaum [1] proposed using relay servers, called mixes, which encrypt and reroute messages. An encrypted message is analogous to an onion constructed by a sender, who sends the onion to the first mix:
a) Using its private key, the first mix peels off the first layer, which is encrypted using the public key of the first mix.
b) Inside the first layer are the second mix's address and the rest of the onion, which is encrypted with the second mix's public key.
c) After getting the second mix's address, the first mix forwards the peeled onion to the second mix. This process repeats all the way to the receiver.
d) The core part of the onion is the receiver's address and the real message to be sent to the receiver by the last mix.
Chaum also proposed return addresses and digital pseudonyms for users to communicate with each other anonymously.

Zhenghao Zhang [2] proposes simultaneous Multiple Packet Transmission (MPT) to improve the downlink performance of networks. Using multiple packet transmission, two compatible packets can be sent simultaneously by the sender to two different systems. This will increase the performance of a network. The paper gives a fast approximation algorithm that is capable of finding a matching at least 75% of the size of a maximum matching in a calculated time. There are some limits on the arrival rate that a network can allow. The project can enhance MPT, and the results show that the maximum arrival rate increases.
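The layered construction in steps a)-d) can be sketched as follows. This is a minimal illustration, not Chaum's protocol itself: real mixes encrypt each layer with the mix's public key, whereas here symmetric Fernet keys (from the third-party cryptography package) stand in for key pairs to keep the example self-contained.

```python
from cryptography.fernet import Fernet

mix_keys = [Fernet.generate_key() for _ in range(3)]   # one key per mix

def build_onion(message: bytes, receiver: str, keys) -> bytes:
    """Encrypt innermost-first: the core holds the receiver address and
    the real message; each layer is sealed for one mix on the path."""
    onion = f"{receiver}|".encode() + message
    for key in reversed(keys):
        onion = Fernet(key).encrypt(onion)
    return onion

def peel(onion: bytes, key: bytes) -> bytes:
    """What one mix does on receipt: remove exactly its own layer."""
    return Fernet(key).decrypt(onion)

onion = build_onion(b"hello", "receiver-addr", mix_keys)
for key in mix_keys:          # the onion visits the mixes in path order
    onion = peel(onion, key)
print(onion)                  # b'receiver-addr|hello' at the last mix
```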
We define the distance function d(Xi, Yj), which measures the "distance" between an input flow at input link i and the traffic at output link j. The smaller the distance, the more likely the flow on the input link is correlated to the corresponding flow on the output link. Once the distance function has been defined between an input flow and an output link, the correlation analysis can easily be carried out by selecting the output link whose traffic has the minimum distance to the input flow pattern vector Xi.

This paper focuses on preventing the flow-correlation attack by the adversary node. The flow-correlation attack can be overcome by using an intermediate node. This intermediate node performs the batching and reordering techniques for providing security to the data via a congested network. In this paper, a single mix can achieve a certain level of communication anonymity. The sender of a message attaches the receiver address to a packet and encrypts it using the mix's public key. Upon receiving a packet, a mix decodes the packet. Different from an ordinary router, a mix usually will not relay the received packet immediately. Rather, it collects several packets and then sends them out in a batch. The order of packets may be altered as well. Techniques such as batching and reordering are simple means to perturb the timing behavior of packets across a mix, which in turn is considered necessary for mixes to prevent timing-based attacks. Due to this batching and reordering, intruders cannot hack any data or files that are batched.

D. Flow Pattern Vector Extraction
Once the data are collected, the relevant pattern vectors must be extracted. Recall that the batching strategies in Table 1 can be classified into two classes: threshold-triggered batching (s1, s3, and s5) and timer-triggered batching (s2, s4, s6, and s7). The packet timing characteristics at the output link allow for targeted feature extraction for these different classes of batching.

For threshold-triggered batching strategies, packets leave the mix in batches. Hence, the interarrival time of packets in a batch is determined by the link bandwidth, which is independent of the input flow. Thus, the useful information to the adversary is the number of packets in a batch and the time that elapses between two batches. Normalizing this relationship, we define the elements in pattern vector Yj as

Yj,k = (number of packets in batch k in the sampling interval) / ((ending time of batch k) - (ending time of batch k-1)).

For timer-triggered batching strategies, a batch of packets is sent whenever a timer fires. The length of the time interval between two consecutive timer events is a predefined constant. Thus, following a similar argument made for the threshold-triggered batching strategies, we define the elements in pattern vector Yj as follows:

Yj,k = (number of packets in the kth time-out interval) / ((time of kth time-out) - (time of (k-1)th time-out))
     = (number of packets in the kth time-out interval) / (predefined inter-time-out length).

For the traffic without batching (i.e., the baseline strategy s0 defined in Table 1), we use methods similar to those defined for timer-triggered batching strategies, as shown in (5). The basic idea in the methods for extraction of pattern vectors is to partition a sampling interval into multiple subintervals and to calculate the average traffic rate in each subinterval as the values of the elements of the traffic pattern vectors. The above two methods differ in how to partition the interval, depending on which batching strategy is used by the mix. We take a similar approach to extract the pattern vectors Xi corresponding to the Yj. Again, the specific method of subinterval partition depends on how the mix batches the packets.
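Under the definitions above, extracting pattern vectors and running the minimum-distance correlation can be sketched as below; the Euclidean norm stands in for the paper's distance function d(Xi, Yj) (the paper considers time- and frequency-domain variants), and the toy packet counts are invented for the example.

```python
import numpy as np

def pattern_vector_threshold(batch_counts, batch_end_times):
    """Y_{j,k} for threshold-triggered batching: packets in batch k
    divided by the time elapsed since the previous batch ended."""
    counts = np.asarray(batch_counts, dtype=float)
    ends = np.asarray(batch_end_times, dtype=float)
    gaps = np.diff(ends, prepend=0.0)
    return counts / gaps

def pattern_vector_timer(interval_counts, timeout):
    """Y_{j,k} for timer-triggered batching: packets in the kth
    time-out interval divided by the constant inter-time-out length."""
    return np.asarray(interval_counts, dtype=float) / timeout

def correlate(x, ys):
    """Select the output link whose pattern vector is closest to the
    input flow's vector x (minimum-distance correlation)."""
    dists = [np.linalg.norm(x - y) for y in ys]
    return int(np.argmin(dists)), dists

x = pattern_vector_timer([5, 7, 6, 5], timeout=1.0)           # input flow
y_links = [pattern_vector_timer(c, 1.0) for c in ([1, 1, 2, 1],
                                                  [5, 6, 6, 5])]
link, dists = correlate(x, y_links)
print(link, dists)   # output link 1 matches the input flow best
```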
XXII. PROCESS DESCRIPTION
A. Shortest Path Algorithm
The shortest path problem is the problem of finding a path between two vertices (or nodes) such that the sum of the weights of its constituent edges is minimized. Shortest path algorithms are applied to automatically find directions between physical locations, such as driving directions on web mapping websites like MapQuest or Google Maps. The shortest path algorithm is used in the proposed scheme to find the intermediate system from the end-systems. Then the batching algorithm is used for sending the data in a secure manner.

B. Batching Strategies
Batching strategies are designed to prevent not only simple timing analysis attacks, but also powerful trickle attacks, flood attacks, and many other forms of attacks. The seven batching strategies are listed in Table 1, in which batching strategies from s1 to s4 are denoted as simple mixes, while batching strategies from s5 to s7 are denoted as pool mixes.

Table 1 - Batching Strategies

Strategy   Name                      Adjustable     Algorithm
index                                parameters
s0         Simple Proxy              none           no batching or reordering
s1         Threshold Mix             <m>            if n = m, send n packets
s2         Timed Mix                 <t>            if timer times out, send n packets
s3         Threshold or Timed Mix    <m, t>         if timer times out, send n packets; else if n = m {send n packets; reset the timer}
s4         Threshold and Timed Mix   <m, t>         if (timer times out) and (n >= m), send n packets
s5         Threshold Pool Mix        <m, f>         if n = m + f, send m randomly chosen packets
s6         Timed Pool Mix            <t, f>         if (timer times out) and (n > f), send n - f randomly chosen packets
s7         Timed Dynamic-Pool Mix    <m, t, f, p>   if (timer times out) and (n >= m + f), send max(1, ceil(p(n - f))) randomly chosen packets

Glossary: n is the queue size; m is the threshold that controls packet sending; t is the timer's period, if a timer is used; f is the minimum number of packets left in the pool for pool mixes; p is a fraction used only in the Timed Dynamic-Pool Mix.

Batching is typically accompanied by reordering. In this proposed scheme, the attacks focus on the traffic characteristics. As reordering does not significantly change packet interarrival times for mixes that use batching, these attacks are unaffected by reordering. Thus, these results are applicable to systems that use any kind of reordering method. More precisely, reorderings are in all cases caused by packets being delayed by the batcher, and can therefore be handled by modifying the batching algorithm accordingly. Any of the batching strategies can be implemented in two ways: link-based batching, in which each output link has a separate queue, and mix-based batching, in which the entire mix has only one queue. Each of these two methods has its own advantages and disadvantages. The control of link-based batching is distributed inside the mix and hence may have good efficiency.

XXIII. SYSTEM ANALYSIS
A. Architecture
The proposed system architecture consists of three main modules (objectives), namely, the client module, the server module and an intermediator module.
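A minimal sketch of two of the strategies in Table 1, assuming in-memory queues and ignoring cryptography entirely; the class names and packet labels are illustrative only.

```python
import random
from collections import deque

class ThresholdMix:
    """Strategy s1 <m>: when the queue reaches m packets, flush them all
    as one batch, in random order (batching plus reordering)."""
    def __init__(self, m):
        self.m, self.queue = m, deque()
    def receive(self, pkt):
        self.queue.append(pkt)
        if len(self.queue) >= self.m:
            batch = list(self.queue)
            self.queue.clear()
            random.shuffle(batch)     # reordering
            return batch              # sent out together
        return []

class TimedMix:
    """Strategy s2 <t>: on every timer tick, flush whatever is queued."""
    def __init__(self):
        self.queue = deque()
    def receive(self, pkt):
        self.queue.append(pkt)
    def timer_fires(self):
        batch = list(self.queue)
        self.queue.clear()
        random.shuffle(batch)
        return batch

mix = ThresholdMix(m=3)
for i in range(7):
    out = mix.receive(f"pkt{i}")
    if out:
        print("threshold batch:", out)   # two batches of 3; pkt6 queued

tm = TimedMix()
for i in range(4):
    tm.receive(f"p{i}")
print("timed batch:", tm.timer_fires())
```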
[5] O.R.D. Archives, "Link Padding and the Intersection Attack," http://archives.seul.org/or/dev, 2002.
[6] P.F. Syverson, D.M. Goldschlag, and M.G. Reed, "Anonymous Connections and Onion Routing," Proc. IEEE Symp. Security and Privacy, pp. 44-54, 1997.
[7] P. Boucher, A. Shostack, and I. Goldberg, "Freedom Systems 2.0 Architecture," http://www.freedom.net/products/whitepapers/Freedom_System_2_Architecture.pdf, Dec. 2000.
[8] R. Dingledine, N. Mathewson, and P. Syverson, "Tor: The Second-Generation Onion Router," Proc. 13th USENIX Security Symp., pp. 303-320, Aug. 2004.
2 discusses related work; Section 3 elaborates on check-pointing and provides a simulation-based comparison between check-pointing and replication; Section 4, in turn, discusses simulation results, while Section 5 concludes the paper.

XXVII. RELATED WORK
The fault tolerance scheme is responsible for detecting failure events and supporting schedulers in making appropriate decisions regarding the scheduling of failed jobs. Condor-G [6] detects resource failure using polling, while a heartbeat mechanism is used in NetSolve [7]. In [8], a fault tolerance scheme for a dynamic scheduler is presented with the aim of improving performance in the presence of multiple simultaneous failures. A fault-tolerant resource broker architecture for an economy-based computational grid, which combines check-pointing and transactions to provide fault-tolerant scheduling, is implemented in [9]. Failure models are created by means of probabilistic distributions with fully configurable parameters [10]. A large number of research efforts have already been devoted to fault tolerance in the scope of distributed environments. Aspects that have been explored include the design and implementation of fault detection services [4], [5], as well as the development of failure prediction [3], [6], [7], [8] and recovery strategies [9], [10]. The latter are often implemented through job check-pointing in combination with migration and job replication. Although both methods aim to improve system performance in the presence of failure, their effectiveness largely depends on tuning runtime parameters such as the check-pointing interval and the number of replicas. Determining optimal values for these parameters is far from trivial, for it requires good knowledge of the application and the distributed system at hand.

XXVIII. CHECKPOINTING
Check-pointing is the process of saving the current state to stable storage, so that whenever a resource failure occurs, the jobs running on that resource are allocated to another available resource and resume from the saved state instead of starting from the beginning. This reduces the execution time and cost, and increases the computing performance.

E. Job Completion Times With Check-pointing
When a job checkpoints, completed work is saved to stable storage at intervals as the job is executing. If the job fails, it resumes processing from the last checkpoint. Let k represent the number of times a job will checkpoint during its execution, and consider the probability that the job finishes the first time it executes, where β is the failure rate of the system. Then

Pc(n | t/k) = ...     (1)

is the probability that there are n failures for a job with k checkpoints. The expected number of times the job must run before it successfully completes is given by

ic(n | t/k) = ...     (2)

The pdf that the job will fail at time x with k checkpoints is f(x); the average time it takes the job to complete with k checkpoints when failures can occur is

Tc(t/k) = ∫ ... dx     (3)

When f(x) is exponentially distributed, it can be shown that Tc(t) is

Tc(t) = ... (1 - βr)(k - 1)c.     (4)

Using equations (6)-(7), the optimal number of checkpoints for some job with time t, recovery cost C, failure rate β, checkpoint cost c, and job service time distribution f(x) can be determined by minimizing the function given in (4) above and solving for k; for exponential distributions (by using Equation (7)), this would be

kopt = γ + ...     (5)

Tc(t) gives the mean time to finish the task for a job with fixed t and k. If job service times are modeled by some distribution f(x), then the mean time to finish the job, averaged over all possible job times, is

Tc(t) = ...

Again, observe that the integral for Tc(t) does not converge if f(x) decays too slowly. This indicates that Tc(t) is infinite for all distributions that go to 0 more slowly than exp(-βx), which implies that Tc(t) falls into the class of PT distributions.

To estimate Tc(t) for small β, the exact expression can be replaced with an approximation. Thus, it can be shown that
...     (7)

Averaging over all possible job times, we get

T'c(t) = (1 - βr) ...     (8)

When f(x) is exponentially distributed, it can be seen that T'c(t) is

T'c(t) = (1 - βr) ...     (9)

Using equations (6)-(7), the optimal number of checkpoints for some job with time t, recovery cost C, failure rate β, checkpoint cost c, and job service time distribution f(x) can be determined by minimizing the function given in (6) above and solving for k; for exponential distributions (by using Equation (7)),

kopt = γ + ...     (10)

Using the optimal number of checkpoints sometimes breaks up the highly variable behavior (i.e., infinite mean and variance); however, as the mean time of the job increases, using the optimal number of checkpoints does not necessarily break up this behavior. Truncation parameters are special cases of hyper-exponential distributions that asymptotically behave like PT distributions as the number of exponential phases approaches infinity. That is, if one

Fig. 1. Simulated topology.
Fig. 2. Resources executing jobs with check-pointing.
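The completion-time model above can be explored numerically. The sketch below is a Monte Carlo estimate under the stated assumptions (exponential failures of rate β, fixed checkpoint cost c and recovery cost r, work split evenly across k checkpoints); it is an illustration of the model, not the authors' simulator.

```python
import random

def completion_time(t, k, beta, c, r, rng=random):
    """Wall-clock time to finish a job of service time t with k
    checkpoints: each segment of work (plus its checkpoint write) must
    complete before the next exponential failure; otherwise the elapsed
    work is lost and the recovery cost r is paid."""
    seg = t / k                   # work between consecutive checkpoints
    total = 0.0
    for _ in range(k):
        while True:
            fail_at = rng.expovariate(beta)
            if fail_at >= seg + c:        # segment + checkpoint survive
                total += seg + c
                break
            total += fail_at + r          # failed mid-segment: retry
    return total

random.seed(1)
for k in (1, 2, 5, 10, 20):
    mean = sum(completion_time(100.0, k, beta=0.02, c=0.5, r=1.0)
               for _ in range(2000)) / 2000
    print(k, round(mean, 1))   # exposes the trade-off that defines k_opt
```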
Mrs. P. Mahalakshmi
Department of Computer Science Engineering,
Jerusalem College of Engineering,
Anna University Chennai,
Chennai, India.
static in their location. Thus, the focus of routing studies in the WMN has moved to performance enhancement by using sophisticated routing metrics. For example, as routing metrics, researchers have proposed the expected transmission number (ETX), the expected transmission time (ETT) and the weighted cumulative ETT (WCETT), the metric of interference and channel switching (MIC), and the modified expected number of transmissions (mETX) and effective number of transmissions (ENT). Although these metrics have shown significant performance improvement over the traditional hop-count routing metric, they neglect the problem of traffic load imbalance in the WMN.

In the WMN, a great portion of users tends to communicate with outside networks via the wired gateways. In such an environment, the wireless links around the gateways are likely to be a bottleneck of the network. If the routing algorithm does not take account of the traffic load, some gateways may be overloaded while others may not. This load imbalance can be resolved by introducing a load-aware routing scheme that adopts a routing metric with a load factor. When the load-aware routing algorithm is designed to maximize the system capacity, the major benefit of the load-aware routing is the enhancement of the overall system capacity due to the use of underutilized paths.

II. SYSTEM MODEL
Each wireless router in a WMN is fixed at a location. Thus, the WMN topology does not change frequently and the channel quality is quasi-static. In addition, each wireless router in general serves so many subscribers (i.e., users) that the characteristics of the aggregated traffic are stable over time. Therefore, we design the routing scheme under a system model in which the topology and user configuration are stable.

In Fig. 1, a node stands for a wireless router, which not only delivers data for its own users but also relays data traffic for other wireless routers. Among the nodes, there are some gateway nodes connected to the wired backhaul network. Each user is associated with its serving node. In this paper, we do not deal with the interface between a user and its serving node, in order to focus on the mesh network itself. Through the serving node, a user can send (receive) data traffic to (from) another user in the WMN or to (from) outside networks via the gateway nodes. If node n can transmit data to node m directly (i.e., without relaying), there exists a link from the node n to the node m. In this paper, we define a link as unidirectional. For the mathematical representation, we define N and L as the sets of the indices of all nodes and all links in the network, respectively. In Table 1, we summarize all mathematical notations introduced in this section.

TABLE 1
Table of Symbols

Notation   Description
N          Set of indices of all nodes
L          Set of indices of all links
F          Set of indices of flows
C          Set of indices of clusters
Dr         Set of indices of all intermediate links on route r
Hl         Set of indices of all routes passing through link l
Gf         Set of indices of all possible routes for flow f
Qr         Set of indices of all flows using route r
Mc         Set of indices of all links in cluster c
Vl         Set of indices of all clusters including link l
Pf,r       Flow data rate of flow f on route r
dl         Effective transmission rate of link l
al         Airtime ratio of link l (ratio of the time for data transmission to the whole time)
uf(x)      Utility of flow f when the data rate is x
α          System-wide fairness parameter
Pf         Priority of flow f
ζ          Delay penalty parameter
ξ          Dampening parameter

Fig. 1. Example mesh network.

The WMN under consideration provides a connection-oriented service, where connections are managed in the unit of a flow. A flow is also
unidirectional. A user can communicate with the other user or the gateway node after setting up a flow connecting them. Since a user is connected to a unique node, the flow between a pair of users can also be specified by the corresponding node pair. The node where a flow starts (ends) will be called the source (destination) node of the flow. Fig. 1 shows an example scenario where a user intends to send data to outside networks. As seen in this figure, if a flow conveys data to (from) outside networks, all gateway nodes can be the destination (source) node of the flow. We will identify a flow by an index, generally f, and define F as the set of the indices of all flows in the network.

Data traffic on a flow is conveyed to the destination node through a multihop route. We only consider acyclic routes. Thus, a route can be determined by the set of all intermediate links that the route takes. We will index a route by r and define Dr as the set of the indices of all intermediate links on the route r. For a flow, there can be a number of possible routes that connect the source and destination nodes. Let Gf denote the set of the indices of all possible routes for flow f.

III. DISTRIBUTED IMPLEMENTATION
The routing scheme can be implemented in a distributed way, which improves the scalability of the WMN. In this section, we discuss the distributed implementation of the proposed scheme. The flow data rate vectors and the Lagrange multipliers are distributively managed by the nodes in the WMN. The flow data rate vector of flow f is managed by the source node of the flow; recall that it is the single-path flow data rate vector associated with the active route of the flow.

For implementation, one node within a cluster is designated as the head of the cluster. The head of a cluster is assumed to be able to communicate with the transmitting nodes of the links in its cluster. Let us call the head of cluster c the "cluster head" c; the cluster head c takes the role of maintaining and updating the load control variable of its cluster.

When the dual decomposition method is used, different variables can be updated according to different time schedules. Therefore, in order to improve the convergence speed, in practice, different network entities carry out these operations asynchronously, by using currently available information. Though it is difficult to prove that the asynchronous operation leads to the exact solution, we have confirmed by simulation that the solutions produced by the asynchronous and the synchronous operations are the same in our routing problem. In the following, we describe the three operations in more detail when they are implemented asynchronously.

Link cost control: For link cost control, the cluster head c gathers the information on the total load in the cluster c and adjusts the load control variable to control the load on the cluster. This process is as follows: each node estimates the total loads for all its outgoing links and periodically broadcasts the estimated load. Having received the broadcast messages, the cluster head can compute the total airtime ratio consumed by the links in its cluster. If the total airtime ratio exceeds the available airtime ratio, the cluster is overloaded, and the cluster head c increases the load control variable. If the cluster c is not overloaded, that is, its total airtime ratio is smaller than the available airtime ratio, the cluster head c decreases it. The cluster head c periodically broadcasts the updated value.

Routing: The link cost of link l is calculated from the effective transmission rate dl of the link and the load control variable. Since dl is the effective transmission rate, reflecting the PHY transmission rate as well as the packet error probability, its reciprocal is equivalent to the ETT. Therefore, the link cost in the proposed scheme can be viewed as the ETT augmented with the load control variable. To find the optimal route on which the sum of the proposed link costs is minimized, we can use either existing proactive routing protocols or reactive routing protocols. The source node of flow f periodically finds the new optimal route by using these routing protocols. The source node is aware of the link costs on the current active route from the periodic report, and is also informed of the link costs on the new optimal route by the routing protocol. Based on these link costs, the source node decides whether to change the active route or not.

Flow/Congestion control: The source node periodically recalculates the flow data rate by using the link costs on the active route. The source node lowers the flow data rate of its traffic to the recalculated flow data rate; in this way, the source node can be quenched when the active route passes through a congested area of the network.
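The link cost control loop can be sketched as a simple feedback rule. The extraction lost the paper's exact expressions, so the update step, the multiplicative cost form, and all numbers below are assumptions consistent with the description above.

```python
def update_load_control(v, total_airtime, a_avail, step=0.1):
    """Cluster head: raise the load control variable when the cluster's
    total airtime ratio exceeds the available ratio (overload), lower it
    otherwise; the variable never goes below zero."""
    return max(0.0, v + step * (total_airtime - a_avail))

def link_cost(d_l, controls):
    """ETT-like base cost (1/d_l, with d_l the effective transmission
    rate) scaled by the load control variables of the clusters that
    contain the link. The multiplicative form is an assumption."""
    return (1.0 / d_l) * (1.0 + sum(controls))

v = 0.0
for measured in (0.9, 1.2, 1.3, 1.0, 0.7):    # total airtime per period
    v = update_load_control(v, measured, a_avail=1.0)
    print(round(v, 2), round(link_cost(54.0, [v]), 4))
# the cost rises while the cluster is overloaded and relaxes afterwards
```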
By the above three operations, network-wide load balance can be achieved. If an area of the network is overloaded, the link costs around the area are increased by the link cost control operation. Then, the source node of a flow passing through the area reduces its flow data rate, or finds another route that allows a higher flow data rate.

Moreover, the dampening algorithm should be able to be implemented in a distributed way. To accomplish these goals, the dampening algorithm prevents route flapping by changing the active route more conservatively than the original algorithm does. When the original algorithm is used, at the jth iteration the algorithm finds any optimal route and immediately changes the active route to the new optimal route. However, the dampening algorithm changes the active route only if the new route improves on the current one by a certain margin.

Let us explain the operation of the dampening algorithm. We define ξ as the dampening parameter, which controls the conservativeness in changing the route. The value of ξ is between zero and one. If ξ is set to one, the dampening algorithm is the same as the original algorithm; the active route is changed more conservatively with a smaller value of ξ. At the jth iteration, the dampening algorithm first finds any optimal route. Consequently, the dampening algorithm can alleviate the route flapping problem if the dampening parameter ξ is set to a sufficiently small value. The stability comes at the cost of suboptimal route selection.
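A sketch of the dampened route-change rule follows. Since the precise comparison was lost in extraction, the margin test below (switch only when the candidate route's cost beats the active route's cost by the factor ξ) is an assumption consistent with the description, with ξ = 1 reproducing the original always-switch behaviour.

```python
def should_switch(cost_active, cost_candidate, xi):
    """Dampened route change: adopt the new route only when its cost is
    lower than xi times the active route's cost (0 < xi <= 1). Smaller
    xi switches more conservatively and suppresses route flapping."""
    return cost_candidate < xi * cost_active

print(should_switch(10.0, 9.5, xi=1.0))  # True : original algorithm switches
print(should_switch(10.0, 9.5, xi=0.9))  # False: a 5% gain is within margin
```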
flows is 100). This means that the proposed scheme is robust to high load skewness owing to its load-balancing capability. To clearly show the convergence speed, we assume in this simulation that all operations are performed synchronously at each iteration, instead of employing the asynchronous distributed implementation of the previous simulations. Note that one iteration includes one round of routing, link cost control, and flow/congestion control. The routes are stabilized as the number of iterations increases. Every 100 iterations, we count the number of flows whose route has changed during the last 100 iterations. The simulation is performed in both the gateway and no-gateway scenarios. The load skewness is one for the gateway scenario, and the concentrated traffic model is selected for the no-gateway scenario. The total number of flows is 100.

We see that almost all route changes occur within 200 iterations for both scenarios. It is noted that this number of iterations (i.e., 200) is required for convergence from the initial state, where all flows start simultaneously. Since there is only a small change in network configuration (e.g., addition or deletion of a flow) at a time in usual situations, a much smaller number of iterations is needed to converge in practice. In the case of the gateway scenario, we observe that no route change takes place after 1,700 iterations. In the no-gateway scenario, a few routes change continuously due to route flapping. However, this is acceptable since the number of route flappings is very small compared to the total number of flows. As mentioned before, if needed, the routing scheme can be further stabilized by decreasing the parameter ξ.

VI. CONCLUSION
A load-aware routing scheme is developed for the WMN. We have formulated the routing problem as an optimization problem, and have solved it by using the dual decomposition method. The dual decomposition method makes it possible to design a distributed routing scheme. However, there could be a route flapping problem in the distributed scheme. To tackle this problem, we have suggested a dampening algorithm and have analyzed its performance. The numerical results show that the proposed scheme with the dampening algorithm converges to a stable state and achieves much higher throughput than the ETT-based scheme does, owing to its load-balancing capability. The main advantage of the proposed routing scheme is that it is favorable to practical implementation although it is theoretically designed. The proposed scheme is a practical single-path routing scheme, unlike other multipath routing schemes designed by using optimization theory. Also, the proposed scheme can easily be implemented in a distributed way by means of the existing routing algorithms. The proposed scheme can be applied to various single-band PHY/MAC layer protocols. In future work, we can extend the proposed scheme so that it can also be applied to multiband protocols, which can provide larger bandwidth to the WMN.
MRS. M. RAJALAKSHMI, M.E.,
ASSISTANT PROFESSOR OF ECE DEPT.,
Adiparasakthi Engineering College,
Melmaruvathur.
are significantly different with respect to the same characteristic(s).

During segmentation, an image is preprocessed, which can involve restoration, enhancement, or simply representation of the data. Certain features are extracted to segment the image into its key components. The segmented image is routed to a classifier or an image-understanding system. The image classification process maps different regions or segments into one of several objects; each object is identified by a label. The image understanding system then determines the relationships between different objects in a scene to provide a complete scene description. Powerful segmentation techniques are currently available; however, each technique is ad hoc. The creation of hybrid techniques seems to be a promising future research area with respect to current Navy digital mapping applications. Medical image segmentation refers to the segmentation of known anatomic structures from medical images. Structures of interest include organs or parts thereof, such as cardiac ventricles or kidneys, abnormalities such as tumors and cysts, as well as other structures such as bones, vessels, brain structures, etc. The overall objective of such methods is referred to as computer-aided diagnosis; they are used for assisting doctors in evaluating medical imagery or in recognizing abnormal findings in a medical image.

2. METHODOLOGY
Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, thresholding approaches, region growing approaches, and clustering techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain.

CLUSTERING METHODS
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster [2]. These centroids should be placed in a cunning way, because different locations cause different results. So the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to a given data set and associate it to the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step.

After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid. A loop has been generated. As a result of this loop we may notice that the k centroids change their location step by step until no more changes are made; in other words, the centroids do not move any more. Finally, this algorithm aims at minimizing an objective function, in this case a squared error function.

Using K-means clustering we can segment angiographic images. The goal is to propose an algorithm that works better for large datasets and finds good initial centroids. K-means clustering is an iterative technique that is used to partition an image into K clusters.

REGION GROWING METHODS
In the region growing technique, image pixels that belong to an object are segmented into regions. Segmentation is performed based on some predefined criteria. Two pixels can be grouped together if they have the same intensity characteristics or if they are close to each other. It is assumed that pixels that are close to each other and have similar intensity values are likely to belong to the same object. The simplest form of segmentation can be achieved through thresholding and component labeling. Another method is to find region boundaries using edge detection. Region growing is a procedure that groups pixels or subregions into larger regions.

The simplest of these approaches is pixel aggregation, which starts with a set of "seed" points and from these grows regions by appending to each seed point those neighboring pixels that have properties similar to the seed.
OTSU’S METHOD
cell core, because the entire image does not have the same intensity.

FIG. 3: Segmentation using the region growing method

Using Otsu's method, the threshold value of the cardiac image is calculated; setting that as the threshold value, the biopsy cardiac image is segmented, as shown in Fig. 4. From the segmented image we cannot clearly diagnose the mismatch of the tissue or cell core, because the threshold value is dependent upon the user: this method produces different segmented images for different values of the threshold.

pixel by pixel. For values greater than 1, the value used in the process of iteration of the algorithm is the value that was defined by the user.

B. Valleys Analysis
The identification of the histogram valleys is very important, because the thresholds are concentrated in these valleys, and therefore so is the division of classes. The algorithm identifies these valleys automatically using the transitions of the sign of the histogram values, which is done in the following way:
• First, compare the first group's value, which was determined in the histogram segmentation, with this group's last value. If the first is lower, it means that the histogram values are increasing and the sign is positive. Otherwise, the histogram values are decreasing and the sign is negative.
• The next step is to identify the sign of the next group; every time there is a transition from negative to positive, a valley is identified. Once the first valley is found, pass to the next step, the analysis of the percentage of slope.
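For reference, the classical Otsu criterion that this section applies to the cardiac images can be sketched as an exhaustive search for the threshold maximizing the between-class variance; the synthetic bimodal data below is purely illustrative.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the two resulting pixel classes."""
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist.astype(float) / hist.sum()
    levels = np.arange(bins)
    best_t, best_var = 0, -1.0
    for t in range(1, bins):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(60, 10, 500),
                      rng.normal(180, 12, 500)]).clip(0, 255)
t = otsu_threshold(img)
print(t)                    # lands between the two intensity modes
segmented = img >= t        # binary segmentation of the image
```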
BLOCK DIAGRAM
Transactions on Systems, Man, and Cybernetics, v. 9, n. 1, 1979.
Information Science and Engineering 17, 713-727.
WMN. The utility is a value which quantifies how satisfied a user is with the network. Since the degree of user satisfaction depends on the network performance, the utility can be given as a function of the user throughput. Generally, the utility function is concave, to reflect the law of diminishing marginal utility. To design the scheme, we use the dual decomposition method for utility maximization. Using this method, we can incorporate not only the load-aware routing scheme but also congestion control and fair rate allocation mechanisms into the WMN. Most notably, we can implement the load-aware routing scheme in a distributed way owing to the structure of the dual decomposition method.

In the proposed routing scheme, a WMN is divided into multiple overlapping clusters. A cluster head takes the role of controlling the traffic load on the wireless links in its cluster. The cluster head periodically estimates the total traffic load on the cluster and increases the "link costs" of the links in the cluster. In this paper we propose an algorithm to organize these mesh nodes into well-defined clusters with less-energy-constrained gateway nodes acting as cluster heads, and to balance load among these gateways. Simulation results show how our approach can balance the load and improve the lifetime of the system.

2 RELATED WORKS
For the WMN, a number of routing metrics and algorithms have been proposed to take advantage of the stationary topology. The first routing metric is the ETX, which is the expected number of transmissions required to deliver a packet to the neighbor. In the minimum loss (ML) metric, this is used to find the route with the lowest end-to-end loss probability. The medium time metric (MTM) is proposed for the multirate network; the MTM of a link is inversely proportional to the physical layer transmission rate of the link. The ETT is a combination of the ETX and the MTM: the ETT is the time required to transmit a single packet over a link in the multirate network, calculated in consideration of both the number of transmissions and the physical layer transmission rate.

The routing metric and algorithm for the multiradio WMN are the WCETT and the multi-radio link quality source routing (MR-LQSR), respectively. The WCETT is a modification of the ETT to consider the intra-flow interference. While the WCETT only considers the intra-flow interference, the MIC and the interference-aware (iAWARE) metrics take account of the inter-flow interference as well as the intra-flow interference.

The mETX and the ENT are proposed to cope with fast link quality variation. These routing metrics contain the standard deviation of the link quality in addition to the average link quality. The blacklist-aided forwarding (BAF) algorithm tackles the problem of short-term link quality degradation by disseminating a blacklist, i.e., a set of currently degraded links. The ExOR algorithm determines the next hop after the transmission for that hop, without predetermined routes. The ExOR can choose the next hop that successfully received the packet, and therefore it is robust to packet error and link quality variation. The resilient opportunistic mesh routing (ROMER) algorithm uses opportunistic forwarding to deal with short-term link quality variation. The ROMER maintains the long-term routes and opportunistically expands or shrinks them at runtime.

The ad hoc on-demand distance vector spanning tree (AODV-ST) is an adaptation of the AODV protocol to the WMN with wired gateways. The AODV-ST constructs a spanning tree of which the root is the gateway. A routing and channel assignment algorithm has also been proposed for the multichannel WMN; in this algorithm, a spanning tree is formed in such a way that a node attaches itself to a parent node.

The load-aware routing protocols incorporate the load factor into their routing metrics. The dynamic load-aware routing (DLAR) takes as the routing metric the number of packets queued in the node interface. The load-balanced ad hoc routing (LBAR) counts the number of active paths on a node and its neighbors, and uses it as a routing metric. Both the DLAR and LBAR are designed for the mobile ad hoc network, and aim to reduce the packet delay and the packet loss ratio. An admission control and load balancing algorithm is proposed for the 802.11 mesh
networks. In this work, the available radio time (ART) is calculated for each node, and the route with the largest ART is selected when a new connection is requested. This algorithm tries to maximize the average number of connections. There is also the WCETT load balancing (WCETT-LB) metric: the WCETT-LB is the WCETT augmented by a load factor consisting of the average queue length and the degree of traffic concentration.

The QoS-aware routing algorithm with congestion control and load balancing (QRCCLB) calculates the number of congested nodes on each route and chooses the route with the smallest number of congested nodes. Compared to these load-aware routing protocols, the proposed routing scheme has three major advantages. First, the proposed scheme is designed to maximize the system capacity by considering all necessary elements for load balancing, e.g., the interference between flows, the link capacity, and the user demand. On the other hand, the existing protocols fail to reflect these elements, since they use heuristically designed routing metrics. For example, the DLAR, the ART, and the WCETT-LB do not take account of the interference between flows. Also, the link capacity is not considered by the DLAR, the LBAR, the ART, and the QRCCLB. Second, the proposed scheme can guarantee fairness between users. When the network load is high, it is of importance for users to fairly share scarce radio resources. However, the existing protocols cannot fairly allocate resources, since they are unable to distinguish which route is monopolized by a small number of users. Third, the proposed scheme can provide routes that are stable over time. Since most of the existing protocols adopt highly variable routing metrics such as the queue length or the collision probability, they are prone to suffer from the route flapping problem.

We design the proposed routing scheme by using the dual decomposition method for network utility maximization. To use this method, one should formulate the global optimization problem, which is to maximize the total system utility under the constraints on the traffic flows and the radio resources. After the constraints are relaxed by the Lagrange multipliers, the whole problem can be decomposed into subproblems which are solved by the different network layers in the different network nodes. In the decomposed problem, the Lagrange multipliers act as an interface between the layers and the nodes, enabling the distributed entities to find the global optimal solution only by solving their own subproblems. Therefore, the dual decomposition method provides a systematic way to design a distributed algorithm which finds the global optimal solution.
397
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE
wireless router serves so many subscribers If all flows convey data traffic
(i.e., users) in general that the characteristic through each route at their flow data rates,
of the aggregated traffic is stable over time. the sum of the data rates of traffic passing
Therefore, we design the routing through link l is calculated as Pr2Hl Pf2Qr
scheme under the system model of which _f;r, where Hl is defined as the set of the
topology and user configuration are stable. In indices of all routes passing through the link l,
Fig. 1, we illustrate an example of the WMN. and Qr is the indices of all flows that use the
In this figure, a node stands for a wireless route r.We define the ―airtime ratio‖ of the link
router, which not only delivers data for its own l, denoted by al, as the ratio of the time taken
users, but also relays data traffic for other up by the transmission to the total time of link
wireless routers. Among nodes, there are l. The airtime ratio of the link l can be
some gateway nodes connected to the wired calculated as the sum of the data rates on the
backhaul network. Each user is associated link l divided by the effective transmission rate
with its serving node. In this paper, we do not of the link l. That is,
deal with the interface between a user and its
serving node to focus on the mesh network
itself. Through the serving node, a user can
send (receive) data traffic to (from) the other Now, we discuss the restriction on the
user in the WMN or to (from) outside networks radio resource allocation. For the protocols
via the gateway nodes. If node n can transmit under consideration, time is the only radio
data to node m directly (i.e., without relaying), resource, which is shared by links for data
there exists a link from the node n to the transmission. If two links are adjacent enough
nodem. In this paper, we define a link as to interfere with each other, packets cannot be
unidirectional. conveyed through the two links at the same
time. To incorporate this restriction into the
3.2 Physical and Medium Access Control Layer proposed scheme, we divide the WMN into
Model multiple overlapping clusters. A cluster
The proposed scheme can be includes the links adjacent enough to interfere
implemented on top of various physical with each other. Therefore, any pair of links in
(PHY) and medium access control (MAC) the same cluster cannot deliver packets
layer protocols that utilize a limited simultaneously. A cluster is generally indexed
bandwidth and divide the time for multiple by c, and let C be the set of the indices of all
access, for example, such as the carrier sense clusters in the WMN. We also define Mc as the
multiple access/collision avoidance set of all links in the cluster c. The proposed
(CSMA/CA), the time division multiple access scheme estimates the traffic load in each
(TDMA), cluster. The traffic load in a cluster is the sum
and the reservation ALOHA (R-ALOHA). of the traffic load on the links in the cluster. If
The effective transmission rate of a the traffic load in a cluster is estimated to be
link is defined as the number of actually too high, the proposed scheme can redirect
transmitted bits divided by the time spent for the routes passing through the overloaded
data transmission, calculated in consideration cluster for load balancing. The airtime ratio of
of retransmissions due to errors. That is, the a link represents the traffic load on the link. If
effective transmission rate can be calculated the sum of the airtime ratios of the links in a
as the PHY layer transmission rate times the cluster exceeds a certain bound, the cluster
probability of successful transmission. The can be regarded as overloaded. Roughly, we
PHY layer transmission rate can be fixed, or assume that a fixed portion of the time can be
can be adaptively adjusted according to the used for data transmission, while the
channel quality by means of rate control remainder is used for the purpose of control,
schemes such as the receiver-based autorate e.g., control message exchange and random
(RBAR). In the WMN under consideration, the back-off. Let _ denote the ratio of the time for
effective transmission rate of a link is assumed data transmission to the whole time. Since
to be static for a long time due to fixed only a link can convey data traffic at a time
locations of nodes. We define dl as the within a cluster, the sum of the airtime ratios
effective transmission rate of the link l. of the links in a cluster cannot exceed _.
Therefore, we have the following constraint:
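As an illustration of this airtime-ratio bookkeeping, the following minimal sketch (in Python) computes $a_l$ for every link and flags overloaded clusters. All names (`flow_rate` lists, `routes`, the threshold `eta`) and the numeric values are illustrative assumptions, not taken from the paper.

```python
# Sketch: compute airtime ratios a_l and check the per-cluster constraint.
# Assumed inputs (illustrative): d[l] is the effective transmission rate of
# link l, routes[r] lists the links on route r, flows[r] maps route r to the
# data rates of the flows using it, clusters[c] is the set of links in
# cluster c, and eta is the fraction of time usable for data transmission.

def airtime_ratios(d, routes, flows):
    load = {l: 0.0 for l in d}                # sum of data rates per link
    for r, links in routes.items():
        rate = sum(flows.get(r, []))          # total rate of flows on route r
        for l in links:
            load[l] += rate
    return {l: load[l] / d[l] for l in d}     # a_l = load_l / d_l

def overloaded_clusters(a, clusters, eta=0.8):
    # A cluster is overloaded when the airtime ratios of its links sum past eta.
    return [c for c, links in clusters.items()
            if sum(a[l] for l in links) > eta]

# Tiny example: two links sharing one interference cluster.
d = {"l1": 54e6, "l2": 11e6}                  # effective rates in bits/s
routes = {"r1": ["l1", "l2"]}
flows = {"r1": [4e6, 2e6]}                    # two flows, 4 and 2 Mbit/s
a = airtime_ratios(d, routes, flows)
print(a, overloaded_clusters(a, {"c1": ["l1", "l2"]}))
```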
4.1 Optimization Heuristics
Before minimizing the objective function, we allocate the nodes in the ESet to their respective clusters and calculate the load. If we allocate the remaining nodes to the clusters only by minimizing the objective function, we experience large overlapping of clusters. Considering only the load on gateways as a factor for clustering might do so at the expense of the sensors: our experiments also show that some sensors are not part of the gateways nearest to them, which increases the communication energy of the sensors. Exhaustive search methods like simulated annealing can be used to find optimum results that balance the load as well as maintain the minimum distance to the gateway, but with these methods the complexity of the algorithm grows with the number of sensors and gateways. In order to balance the load on gateways and preserve the precious energy of sensors, we select the few nodes that are located radially near a gateway and include them in the ESet of the gateway. A node is included in the ESet of a gateway if its distance to the gateway is less than a critical distance. Initially, the critical distance is equal to the minimum distance in the ESet. Then the critical distance is gradually increased till the median of the distances in the ESet is reached. This procedure is repeated for all the gateways in increasing order of cardinality, which balances load while performing the selection. Experimental results show that this method significantly reduces the number of nodes to be considered for exhaustive search and reduces overlapping between the clusters.

Now, we start clustering the remaining sensors in the system. Since sensors cannot reach all the gateways, minimizing the objective function for those gateways would unnecessarily increase the complexity of the algorithm. In order to save computation, we sort the sensors in increasing order of their reach. Nodes with the same reach are grouped together to avoid the extra computation of calculating the objective function for the gateways they cannot reach. Nodes with lower reach are considered first because they have fewer clusters to join. The objective function is calculated by assigning these nodes to the gateways they can reach. The node becomes part of the cluster for which it minimizes the objective function. The process is repeated till all the sensor nodes are clustered; a sketch of this greedy pass is given below.
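The following minimal sketch captures the greedy assignment just described. The excerpt does not give the exact objective function, so a simple distance-plus-load proxy stands in for it here; `radio_range`, the coordinates, and all names are illustrative assumptions.

```python
# Sketch of the clustering pass: sort unclustered sensors by how many
# gateways they can reach, then assign each to the reachable gateway that
# minimizes the objective. The objective used here (distance plus current
# cluster load) is an illustrative stand-in, not the paper's formulation.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cluster(sensors, gateways, radio_range):
    load = {g: 0 for g in gateways}           # per-gateway load (node count)
    assign = {}
    # Reach = gateways within radio range; nodes with fewer choices go first.
    reach = {s: [g for g in gateways
                 if dist(sensors[s], gateways[g]) <= radio_range]
             for s in sensors}
    for s in sorted(sensors, key=lambda s: len(reach[s])):
        if not reach[s]:
            continue                          # unreachable sensor stays unclustered
        g = min(reach[s], key=lambda g: dist(sensors[s], gateways[g]) + load[g])
        assign[s] = g
        load[g] += 1
    return assign, load

sensors = {"s1": (0, 1), "s2": (2, 2), "s3": (9, 9)}
gateways = {"g1": (0, 0), "g2": (10, 10)}
print(cluster(sensors, gateways, radio_range=5.0))
```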
5 Performance Results
In this section we present some results obtained by our simulation. To evaluate the performance of our algorithm, we compare the results with shortest distance clustering, where a gateway includes a sensor in its cluster if the distance between them is minimum. We measure three different properties of the system based on different metrics. Standard deviation of load per cluster: experiments are performed to measure the load on each cluster after clustering. The standard deviation of the load of the system gives a good evaluation of the distribution of load per cluster. We measure the deviation in load by varying the number of gateways from 2 to 10 in a fixed 100-node network. In order to demonstrate that the load is balanced for any setup, we ran the same experiments for 10 different normal distributions. The same experiments are performed with shortest distance clustering and the results are compared with our approach. Variance in load signifies that load is not uniformly distributed among the clusters. Results demonstrate that for all distributions our approach outperforms shortest distance clustering.

To test our system for different sensor densities, we measured the standard deviation of load by using 5 gateways and increasing the number of sensors in the system from 100 to 500 with uniform increments.
The graph shown in Fig. 3 clearly indicates that our approach increases the scalability of the system. The performance of our approach remains constant as density increases, while the rising curve of shortest distance clustering indicates that its variance in load grows with density. The demonstrated results are based on a normal distribution of sensors.

Average communication energy per cluster: we measure the total energy required to communicate between a gateway and all the sensors in its cluster. Communication energy is directly proportional to the distance between two nodes. If clusters are formed based on shortest distance, the average energy consumed will be minimal but the load will not be balanced. Sensors clustered by the shortest distance method will consume less communication energy in the beginning but more energy later, due to the overhead of re-clustering. We try to minimize the average communication energy so as to perform as well as the shortest distance algorithm in terms of communication energy. The experimental results, in Fig. 4, show that the performance of shortest distance clustering decreases as the number of clusters increases.

6. Conclusions and future work
In this paper we have introduced an approach to cluster unattended wireless sensors about a few high-energy gateway nodes and balance load among these clusters. The gateway node acts as a centralized manager to handle the sensors and serves as a hop to relay data from sensors to a distant command node. If nodes are not uniformly distributed around the gateways, the clusters formed will be of varied load, which will affect the lifetime and energy consumption of the system. Simulation results demonstrate that our algorithm consistently balances load among different clusters and performs well for all distributions of nodes. Our future plan includes extending the clustering model to allow gateway mobility. Also, we plan to study different failure scenarios in sensor networks and introduce run-time fault-tolerance in the system.

REFERENCES
[1] R. Bruno, M. Conti, and E. Gregori, "Mesh Networks: Commodity Multihop Ad Hoc Networks," IEEE Comm. Magazine, vol. 43, no. 3, pp. 123-131, Mar. 2005.
[2] D. De Couto, D. Aguayo, J. Bicket, and R. Morris, "A High-Throughput Path Metric for Multi-Hop Wireless Routing," Proc. ACM MobiCom, Sept. 2003.
[3] D. Passos, D.V. Teixeira, D.C. Muchaluat-Saade, L.C.S. Magalhaes, and C.V.N. Albuquerque, "Mesh Network Performance Measurements," Proc. Int'l Information and Telecomm. Technologies Symp., Dec. 2006.
[4] B. Awerbuch, D. Holmer, and H. Rubens, "The Medium Time Metric: High Throughput Route Selection in Multi-Rate Ad Hoc Wireless Networks," Mobile Networks and Applications, vol. 11, no. 2, pp. 253-266, Apr. 2006.
[5] R. Draves, J. Padhye, and B. Zill, "Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks," Proc. ACM MobiCom, Sept. 2004.
[6] Y. Yang, J. Wang, and R. Kravets, "Designing Routing Metrics for Mesh Networks," Proc. IEEE Workshop Wireless Mesh Networks, Sept. 2005.
Codes
General Factors:
  Age; Job type; Sleep time
  Sex: M = Male, F = Female
  Smoking: Yes / No
  Physical Exercise: Yes / No
  Using meat and salt: Yes / No
  Using milk and egg: Yes / No
  Alcohol consumption: Yes / No
Lipids (mg/dL):
  CH: 1: <200; 2: 200-239; 3: >240
  TG: 1: <150; 2: 150-199; 3: 200-499; 4: >500
  HDL: 1: <40; 2: 40-59; 3: >60
  LDL: Optimal: <100; Acceptable: 100-129; Borderline: 130-159; High: 160-189; Very High: >190
Blood Factors:
  WBC: Yes: >11000, No: <11000
  RBC: Yes: >4-5, No: <4-5
  HB: Yes: >11-13, No: <11-13
  HCT: Yes: >30-35, No: <30-35
  PLT: Yes: >1-2, No: <1-2
  MCV: Yes: >30-35, No: <30-35
  MCH: Yes: >28-36, No: <28-36
Obesity Factors:
  BMI: Yes: >18.5-24.5, No: <18.5-24.5
  WC: Yes: >86, No: <86
  WHR: (Woman) Yes: >0.7, No: <0.7; (Man) Yes: >0.9, No: <0.9
Apolipoprotein (mg/dL):
  APOA: Yes: >2-200, No: <2-200
  APOB: Yes: >40-125, No: <40-125
  APOB/APOA: Yes: >0.9, No: <0.9
Inflammation Factor, CRP (mg/dL):
  Yes: >10, No: <10
Sugar Factors (mg/dL):
  FBS: Yes: >110, No: <110
  HOMA: Yes: >2.5, No: <2.5
Resting Blood Pressure (mmHg):
  Yes: >110, No: <110
More importance is given to the numerator, as the denominator does not depend on the class and is effectively constant: it is the normalizing factor, which is equal for all classes. The Bayesian classifier has high accuracy when applied to large databases. The simple Bayesian classifier is called Naïve Bayes; it assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This is called class independence.
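To make the class-independence assumption concrete, here is a small sketch of Naïve Bayes scoring over categorical attributes. The attribute values and records are invented for illustration; they are not from the study's dataset.

```python
# Sketch: Naive Bayes with categorical attributes. P(C|x) is proportional
# to P(C) * prod_i P(x_i | C); the denominator P(x) is the constant
# normalizing factor and can be ignored when ranking classes.
from collections import Counter, defaultdict

def train(rows, labels):
    prior = Counter(labels)
    cond = defaultdict(Counter)               # cond[(i, class)][value] counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return prior, cond

def classify(row, prior, cond, classes):
    n = sum(prior.values())
    best, best_score = None, -1.0
    for c in classes:
        score = prior[c] / n                  # class prior P(C)
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            # Laplace smoothing so unseen values do not zero the product.
            score *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if score > best_score:
            best, best_score = c, score
    return best

rows = [("yes", "high"), ("no", "low"), ("yes", "high"), ("no", "high")]
labels = ["risk", "ok", "risk", "ok"]
prior, cond = train(rows, labels)
print(classify(("yes", "low"), prior, cond, ["risk", "ok"]))
```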
Table 3: Classification errors

Simple CART (overall accuracy: 62.13%)
Class       TP Rate  FP Rate  Precision  Recall  F-Measure
Optimal     0.289    0.05     0.6        0.289   0.39
Acceptable  0.835    0.201    0.622      0.835   0.713
Borderline  1        0.288    0.625      1       0.769
High        0        0        0          0       0
Very High   0        0        0          0       0

ID3 (overall accuracy: 92.4%)
Class       TP Rate  FP Rate  Precision  Recall  F-Measure
Optimal     0.96     0.033    0.883      0.96    0.92
Acceptable  0.908    0.012    0.967      0.908   0.937
Borderline  0.984    0.051    0.903      0.984   0.942
High        0.811    0.006    0.959      0.811   0.879
Very High   0.772    0.001    0.974      0.772   0.861

J48 (overall accuracy: 73.4%)
Class       TP Rate  FP Rate  Precision  Recall  F-Measure
Optimal     0.634    0.055    0.753      0.634   0.689
Acceptable  0.825    0.101    0.764      0.825   0.793
Borderline  0.945    0.185    0.71       0.945   0.811
High        0.353    0.022    0.738      0.353   0.478
Very High   0.275    0.006    0.624      0.275   0.382

Naïve Bayes (overall accuracy: 60.74%)
Class       TP Rate  FP Rate  Precision  Recall  F-Measure
Optimal     0.301    0.071    0.526      0.301   0.383
Acceptable  0.777    0.197    0.61       0.777   0.683
Borderline  0.999    0.287    0.625      0.999   0.769
High        0.001    0.001    0.167      0.001   0.001
Very High   0        0        0          0       0
4. Accuracy Measures
The performance of the different data mining algorithms is analyzed with the help of accuracy measures, which include True Positive rate, False Positive rate, Precision, Recall and F-Measure. A confusion matrix is an efficient way to represent these values in matrix format.
a) True Positive rate: the TruePositive (TP) rate is the proportion of examples which were classified as class x among all examples which truly have class x, i.e., how much of the class was captured. It is equivalent to Recall. In the confusion matrix, this is the diagonal element divided by the sum over the relevant row.
b) False Positive rate: the FalsePositive (FP) rate is the proportion of examples which were classified as class x but belong to a different class, among all examples which are not of class x. In the matrix, this is the column sum of class x minus the diagonal element, divided by the row sums of all other classes.
c) Precision: the Precision is the proportion of the examples which truly have class x among all those which were classified as class x. In the matrix, this is the diagonal element divided by the sum over the relevant column.
d) Recall: Recall is the same as the true positive rate.
e) F-Measure: the F-Measure is simply 2*Precision*Recall/(Precision+Recall), a combined measure of precision and recall.
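These per-class measures can be read directly off a confusion matrix; the sketch below follows the row/column definitions given above. The 2x2 matrix is made up for illustration.

```python
# Sketch: per-class TP rate (recall), FP rate, precision and F-measure
# from a confusion matrix cm, where cm[i][j] counts examples of true
# class i predicted as class j.

def per_class_metrics(cm, k):
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                              # diagonal element
    actual = sum(cm[k])                        # row sum: true class-k examples
    predicted = sum(row[k] for row in cm)      # column sum: predicted-k examples
    tp_rate = tp / actual if actual else 0.0   # equals recall
    fp = predicted - tp
    fp_rate = fp / (total - actual) if total - actual else 0.0
    precision = tp / predicted if predicted else 0.0
    f = (2 * precision * tp_rate / (precision + tp_rate)
         if precision + tp_rate else 0.0)
    return tp_rate, fp_rate, precision, f

cm = [[50, 10],   # class 0: 50 correct, 10 misclassified as class 1
      [5, 35]]    # class 1: 35 correct, 5 misclassified as class 0
for k in range(2):
    print(k, per_class_metrics(cm, k))
```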
5. Experimental Results
The WEKA tool, which is open source Java software, is used to provide the classification techniques; default settings of WEKA are used for the other options. The 29 predictor variables and LDL as the target variable are used in the tool. The major factors as per Simple CART are CH, APOA, CRP, PLT, HCT and FBS. According to ID3, APOA, CRP, resting blood pressure, HDL and MCH are the major risk factors. J48 gives the major risk factors as APOA, CRP, HCT, HDL, PLT and HOMA. The classification errors are shown in Table 3. ID3, with 92.4% accuracy, performs better than Simple CART, J48 and Naïve Bayes with 62%, 73% and 60%, respectively.

6. Conclusion and future work
In this paper we compared the performance of different classification algorithms, namely Simple CART, ID3, J48 and Naïve Bayes, for mining the major risk factors of cardiovascular disease. By choosing 29 predictor variables and LDL as the target class, it was shown that APOA, CRP, resting blood pressure, HDL and MCH are the major risk factors according to ID3, which has an accuracy rate of 92.4%. In future work we can expand and enhance this work with clustering, association analysis and other classification algorithms.

7. References
[1] Alierza Kajabadi, Mohamad Hosein Sarace, Sedighe Asgari, "Data Mining Cardiovascular Risk Factors," IEEE, 2009.
[2] Dan-Anderi, Adela Viviana, "Overview on How Data Mining Tools May Support Cardiovascular Disease Prediction," Journal of Applied Computer Science & Mathematics, pp. 57-62, 2010.
[3] Minas A. Karaolis, Joseph A. Moutiris, Demetra Hadjipanayi, Constantinos S. Pattichis, "Assessment of the Risk Factors of Coronary Heart Events Based on Data Mining With Decision Trees," IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 3, May 2010.
[4] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques," Second edition, Elsevier, 2006.
[5] L. Gaga, V. Moustakis, Y. Vlachakis, G. Charissis, "ID3: Enhancing Medical Knowledge Acquisition with Machine Learning," Applied Artificial Intelligence, vol. 10, pp. 79-94, Taylor & Francis, 1996.
[6] I.H. Witten and E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques," 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
[7] K. Mollazade, H. Ahmadi, M. Omid, R. Alimardani, "An Intelligent Combined Method Based on Power Spectral Density, Decision Trees and Fuzzy Logic for Hydraulic Pumps Fault Diagnosis," International Journal of Intelligent Technology, vol. 3, issue 4, pp. 251-263, 2008.
[8] D.W. Aha, D. Kibler and M.K. Albert, "Instance-Based Learning Algorithms," Machine Learning, 6(1):37-66, 1991.
[9] Z. Zheng, G.I. Webb, "Lazy Learning of Bayesian Rules," Machine Learning, 41, 53-87, Kluwer Academic Publishers, 2000.
them with a high reliability, performance and cost-efficiency. In those cases the mobile Grid has the meaning of "gridifying" the mobile resources. In the second case, of having mobile Grid resources, we should underline that the performance of current mobile devices has increased significantly.

A. CHECKPOINTING
A checkpoint facility enables the intermediate state of a process to be saved to a file. Users can later resume execution of the process from the checkpoint file. This prevents the loss of data generated by long-running processes due to program or system failures, and it also facilitates debugging when a bug appears only after the program has executed for a long time.

Fig 1: A typical Checkpoint/Restart on an application
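A minimal sketch of the checkpoint/restart idea in Fig. 1, assuming a pickleable application state and a local checkpoint file (the file name, step counter, and workload are illustrative):

```python
# Sketch: periodically save intermediate state to a file, and resume from
# the latest checkpoint after a failure instead of restarting from scratch.
import os, pickle

CKPT = "job.ckpt"

def run_job(total_steps, ckpt_every=100):
    state = {"step": 0, "acc": 0}
    if os.path.exists(CKPT):                   # resume from the checkpoint file
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    while state["step"] < total_steps:
        state["acc"] += state["step"]          # stand-in for real work
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            with open(CKPT + ".tmp", "wb") as f:
                pickle.dump(state, f)          # write-then-rename keeps the
            os.replace(CKPT + ".tmp", CKPT)    # checkpoint crash-consistent
    return state["acc"]

print(run_job(1000))
```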
II. RELATED WORKS
A lot of studies have been done on Mobile Grids (MoGs), which are receiving growing attention and are expected to become a critical part of the future computational Grid, involving mobile hosts both to facilitate user access to the Grid and to offer computing resources [4]. A MoG can involve a number of mobile hosts (MHs) having wireless interconnections among one another or to access points [2][3]. Indeed, a recent push by HP to equip business notebooks with integrated global broadband wireless connectivity has made it possible to form a truly mobile Grid (MoG) that consists of MHs providing computing utility services collaboratively, with or without connections to a wired Grid [5][6][7]. Various checkpointing mechanisms have been pursued for distributed systems (whose computing hosts are wire-connected). However, they are not suitable for the MoG, because their checkpointing arrangements 1) are relatively immaterial there, as checkpointed data from hosts can be stored at a designated server or servers, since connections to a server are deemed reliable, of high bandwidth, and of low latency, and 2) fail to deal with link disconnections and the degree of topological dynamicity of the system [6][9]. At any time during job execution, a host or link failure may lead to severe performance degradation or even total job abortion, unless execution checkpointing is incorporated [8].

The mobile Grid will introduce changes to the general Grid concept. New functionalities of the Grid will be needed, since the old ones will not make use of all the capabilities that will be available [8]. These functionalities will involve end-to-end solutions with emphasis on Quality of Service (QoS) and security, as well as interoperability issues between the diverse technologies involved. Enhanced security policies and approaches addressing large-scale and heterogeneous environments will be needed [1][2]. Additionally, the volatile, mobile and poorly networked environments have to be addressed with adaptable QoS aspects, which have to be contextualized with respect to users and their profiles. Mobile Grids will make use of the value that many mobile users perceive due to the advanced capabilities of their mobile devices [12]. These advanced capabilities refer to the comparison of today's mobile devices with the ones that existed in the past. Although mobile devices are subject to physical constraints due to their nature, they can have computational and storage capabilities similar to PCs, high-quality displays, multiple interfaces (for instance Bluetooth, Ethernet adapters, USB, Infrared), etc. This value can be converted into revenue for service providers, which implies changes in various business models and policy issues. Complex workflows for businesses will be needed, and Virtual Organizations (VO) will be enriched with the opportunity for automatic federations and resource sharing schemes [15]. Valuations of the diverse services, especially with respect to the Service/Sharing Level Agreement, have to be adapted in relation to QoS aspects. Enterprises have to issue policies that will handle the conflict between public rights for their users and viable models for operation. Moreover, the fair use of the Grid must be determined by reconciling rights of public access to resources with private ownership of infrastructure and resources [13]. In the sequel we present a short discussion of some of the most important challenges of Mobile Grids with respect to the resource management topic. Of course many things will change and raise their own challenges to be addressed in this new context.
quickly and concurrently find superior arrangements within each cluster instead of having to labor toward a global MoG solution.
2. ReD makes decisions about whether to request, accept, or break checkpointing relationships locally (at the MH level) and in a fully distributed manner, instead of attempting a high-level centralized or global consensus.
3. ReD keeps checkpoint transmissions local, i.e., neighbor to neighbor, not requiring multiple hops and significant additional transmission overhead to achieve checkpointing relationships, and
4. ReD allows a given consumer or provider to break its existing checkpointing relationship only when the arrangement reliability improvement is significant, thus promoting stability.

C. ReD METHODOLOGY DESCRIPTION
The central mechanism of our MoG middleware component is our Reliability Driven (ReD) protocol, which is aware of the reliabilities of links among MHs within the MoG, a significant indicator of the service quality (i.e., QoS) a distributed application will receive. Defining the term "connectivity" to mean the parallel reliability of all links from a given node to its neighbors, ReD makes use of these link reliability values to determine the best possible checkpointing arrangement dynamically. We seek to maximize the arrangement reliability R_A, where C_i is the connectivity of consumer i, P_j is the connectivity of provider j, and L_ij is the reliability of the wireless link from C_i to P_j. Because we determined the problem of finding the optimum checkpointing arrangement to be NP-Complete, ReD utilizes a heuristic algorithm. To ensure convergence within a reasonable time, the global MoG is partitioned into clusters. Because ReD operates within clusters and is localized, it often arrives at suboptimal checkpoint arrangements.

Upon initiation or refresh, if some consumer, MHk, does not have a designated provider, it begins to look for one. In doing so, it examines and compares the λka × ρa products of each of its n neighbors, MHa. Next, it transmits a checkpoint request, first to the host at the top of the list (having the greatest λka × ρa product), e.g., MHl. In essence, a checkpoint request asks the provider MHl's permission to send checkpointed data to it. If the prospective provider, MHl, has no consumer of record, it readily grants permission and sends a positive acknowledgment back to MHk, establishing a MHk → MHl relationship. On the other hand, if a relationship, say MHj → MHl, already exists, MHl checks to see if the requesting consumer's pairing reliability gain is greater than that of its existing paired consumer. If so, it breaks its relationship with MHj by sending it a break message, and then grants permission to MHk by sending it an acknowledgment. If, on the other hand, the statement proves false, MHk is sent a negative acknowledgment, and MHl maintains the relationship MHj → MHl, unless it is otherwise severed due to mobility or a weak signal. ReD's protocol messages and the pseudocode used are listed in the Appendix. Considering, for a consumer MHk paired with some provider MHa, the pairing gain of k on a, ReD has been enhanced since our initial preliminary work to include pairing gain considerations when comparing prospective relationships, as opposed to just strictly comparing alternative relationship reliabilities directly. It attempts to globally maximize R_A through local, decentralized MHk → MHa pairing decisions. To allow for mobility, tables of connectivity and link reliabilities are updated and aged via the soft state process. A consumer periodically refreshes, checks its sorted list of λka × ρa products, and determines if it might do better to find another provider, whereupon it takes action; a sketch of this pairing logic is given below. A provider, upon loss of its relationship with a consumer for any reason, deletes its consumer pointer and admits requests from other consumers. Finally, upon receiving a break message, a paired consumer initiates the process of finding a checkpoint provider all over again. Note that ReD is designed to be IID (meaning it runs on each host independently and identically). Because messages can be lost in transmission, especially over poor wireless links, tables of host connectivity, link reliabilities, and consumer and provider pointers are maintained at hosts by the soft state registration process; e.g., due to mobility or a weak signal, a host may declare that it has lost its provider (or consumer) and attempt to find a new one.
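The consumer side of this handshake can be sketched as follows. Assumed inputs: `lam[a]` stands for λ_ka, the reliability of the link from consumer k to neighbor a, and `rho[a]` for ρ_a, the neighbor's connectivity; the provider's accept rule shown here is a simplification of ReD's pairing-gain comparison.

```python
# Sketch: a consumer ranks neighbors by the lambda * rho product and asks
# the best one first; a provider grants the request if it is free, or if
# the new consumer offers a higher product than its current consumer.

def rank_providers(lam, rho):
    # Highest lambda_ka * rho_a first, as in ReD's sorted product list.
    return sorted(lam, key=lambda a: lam[a] * rho[a], reverse=True)

def provider_decide(current_product, offered_product):
    # True -> positive acknowledgment (after a break message, if paired).
    return current_product is None or offered_product > current_product

lam = {"MHl": 0.9, "MHm": 0.7, "MHn": 0.95}
rho = {"MHl": 0.8, "MHm": 0.9, "MHn": 0.5}
for a in rank_providers(lam, rho):            # MHl (0.72), MHm (0.63), MHn (0.475)
    if provider_decide(current_product=None, offered_product=lam[a] * rho[a]):
        print("paired with", a)
        break
```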
IV. IMPLEMENTATION
To evaluate our approach, we design an optimal global arrangement to improve the performance of wired grid computing systems, which we will develop in C# to implement our techniques in a network-based approach. As the figure shows, this yields superior-reliability checkpointing arrangements in the MoG.
Abstract— A good user profiling strategy is an essential and fundamental component in search engine personalization. Most existing user profiling strategies, such as Web server log files and metadata describing page contents, are based on objects that users are interested in (i.e., positive preferences), but not the objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user-profiling strategies that are based on both positive and negative preferences. We evaluate the proposed methods against our previously proposed personalized query clustering method. User profiling methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search. An accurate user profile can greatly improve a search engine's performance by identifying the information needs of individual users. We apply preference mining rules to infer not only users' positive preferences but also their negative preferences, and utilize both kinds of preferences in deriving user profiles.

Keywords— Negative preferences, personalization, personalized query clustering, search engine, user profiling.

XXIX. INTRODUCTION
The constantly growing information supply in Internet-based information systems places high demands on concepts and technologies that support users in filtering relevant information. Information retrieval (IR) and information filtering (IF) are two analytical information-seeking strategies; in this paper we focus on information filtering. Information filtering assumes a rather stable user interest (reflected through a user profile) but has to deal with highly dynamic information sources [20]. In IF systems, a user profile typically includes long-term user interests [3], and the acceptance of IF systems highly depends on the quality of the user profiles. In particular, a user profile describes a set of user interests which can be modeled via categories like sports, technology, or nutrition, and can be used for the purpose of information filtering. The definition of user profiles can be explicit, implicit, or a combination of both. In the explicit approach, the system interacts with the user and acquires feedback on the information that the user has retrieved or filtered. In turn, the user can, for example, indicate which filtering results are of most interest to him to improve future filtering results (so-called relevance feedback). A good user profiling strategy is an essential and fundamental component in search engine personalization.

XXX. RELATED WORKS
User profiling strategies can be broadly classified into two main approaches: document-based and concept-based approaches. Document-based user profiling methods aim at capturing users' clicking and browsing behaviors. Users' document preferences are first extracted from the clickthrough data and then used to learn the user behavior model, which is
usually represented as a set of weighted features. On the other hand, concept-based user profiling methods aim at capturing users' conceptual needs. Users' browsed documents and search histories are automatically mapped into a set of topical categories. User profiles are created based on the users' preferences on the extracted topical categories.

2.1 Document-Based Methods
Most document-based methods focus on analyzing users' clicking and browsing behaviors recorded in the users' clickthrough data. On Web search engines, clickthrough data is a kind of implicit feedback from users. Table 1 is an example of clickthrough data for the query "apple," showing the URLs returned from the search engine for the query and the URLs clicked on by the user.

TABLE 1: THE CLICKTHROUGH DATA FOR THE QUERY "APPLE"

Section 4, are employed to create concept-based user profiles. Finally, the concept-based user profiles are compared with each other and, as a baseline, against our previously proposed personalized concept-based clustering algorithm.

3.1 Concept Extraction Using Web-Snippets
Our concept extraction method is inspired by the well-known problem of finding frequent item sets in data mining [9]. When a user submits a query to the search engine, a set of web-snippets is returned to the user for identifying the relevant items. We assume that if a keyword or a phrase appears frequently in the web-snippets of a particular query, it represents an important concept related to the query, because it co-exists in close proximity with the query in the top documents. We use the following support formula for measuring the interestingness of a particular keyword/phrase $t_i$ with respect to the returned web-snippets arising from a query $q$:

$$\text{support}(t_i) = \frac{sf(t_i)}{n} \cdot |t_i|,$$

where $sf(t_i)$ is the snippet frequency of $t_i$ (the number of web-snippets containing $t_i$), $n$ is the number of web-snippets returned for $q$, and $|t_i|$ is the number of terms in the keyword/phrase $t_i$.
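Under the reconstruction of the support formula above, the extraction step can be sketched as follows; the snippet texts and candidate phrases are illustrative, and a real implementation would add a minimum-support threshold.

```python
# Sketch of concept extraction: a keyword/phrase t_i is a candidate concept
# when support(t_i) = sf(t_i)/n * |t_i| is high, where sf(t_i) is the number
# of web-snippets containing t_i, n is the number of snippets returned, and
# |t_i| is the number of terms in t_i.

def support(term, snippets):
    sf = sum(term in s.lower() for s in snippets)   # snippet frequency
    return sf / len(snippets) * len(term.split())   # longer phrases weigh more

snippets = [
    "apple ipod nano 8gb review",
    "apple iphone deals and reviews",
    "apple pie recipe with fresh apples",
    "ipod touch accessories",
]
for t in ["ipod", "iphone", "recipe", "apple pie"]:
    print(t, round(support(t, snippets), 2))
```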
where the similarity between concepts $t_i$ and $t_j$ is composed of a title similarity, a summary similarity, and a cross title-summary similarity, defined in terms of the following quantities: $n$, the total number of web-snippets returned; the joint snippet frequency of concepts $t_i$ and $t_j$ in document titles; the snippet frequency of a concept $t$ in document titles; the joint snippet frequency of $t_i$ and $t_j$ in document summaries; the snippet frequency of a concept $t$ in document summaries; and the joint snippet frequency of concept $t_i$ in a document title and $t_j$ in the document's summary (or vice versa).

Fig. 3. (a) A concept relationship graph for the query "apple" derived without incorporating user clickthroughs. (b) A concept preference profile constructed using the user clickthroughs and the concept relationship graph in (a). $w_{t_i}$ is the interestingness of the concept $t_i$ to the user. More clicks on a concept gradually increase the interestingness $w_{t_i}$ of the concept.

3.3 Creating User Concept Preference Profile
The concept relationship graph is first derived without taking user clickthroughs into account. Intuitively, the graph shows the possible concept space arising from the user's queries. The concept space, in general, covers more than what the user actually wants. For example, when the user searches for the query "apple," the concept space derived from the web-snippets contains concepts such as "ipod," "iphone," and "recipe." Therefore, we propose the following rules to capture the user's interestingness $w_{t_i}$ in an extracted concept $t_i$ when a clicked web-snippet $s_j$, denoted by $click(s_j)$, is found:

$$click(s_j) \Rightarrow w_{t_i} = w_{t_i} + 1, \text{ if concept } t_i \text{ appears in } s_j,$$
$$click(s_j) \Rightarrow w_{t_j} = w_{t_j} + sim(t_i, t_j), \text{ if } t_j \text{ is a neighborhood concept of } t_i,$$

where $s_j$ is a web-snippet, $w_{t_i}$ is the interestingness weight of the concept $t_i$, and $t_j$ is a neighborhood concept of $t_i$.
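A sketch of this weight update, assuming the neighborhood relation and similarity values are given by the concept relationship graph (the graph and numbers below are illustrative, and the increment rule follows the reconstruction above):

```python
# Sketch: when a clicked snippet s_j contains concept t_i, increment w_ti
# by 1; neighborhood concepts t_j of t_i receive the smaller increment
# sim(t_i, t_j).
from collections import defaultdict

sim = {("ipod", "iphone"): 0.4}                # sim(t_i, t_j) from the graph
neighbors = {"ipod": ["iphone"]}

def on_click(snippet_concepts, w):
    for ti in snippet_concepts:
        w[ti] += 1.0                           # direct evidence of interest
        for tj in neighbors.get(ti, []):
            w[tj] += sim.get((ti, tj), 0.0)    # propagated interest
    return w

w = defaultdict(float)
print(dict(on_click({"ipod"}, w)))             # {'ipod': 1.0, 'iphone': 0.4}
```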
XXXII. QUERY CLUSTERING ALGORITHM
methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search.

6.1 Experimental Setup
The query and clickthrough data for evaluation are adopted from our previous work [11]. To evaluate the performance of our user profiling strategies, we developed a middleware for Google to collect clickthrough data. We used 500 test queries, which are intentionally designed to have ambiguous meanings (e.g., the query "kodak" can refer to a digital camera or a camera film). We asked human judges to determine a standard cluster for each query. To avoid any bias, the test queries are randomly selected from 10 different categories. Table 8 shows the topical categories from which the test queries are chosen. When a query is submitted to the middleware, a list containing the top 100 search results together with the extracted concepts is returned to the users, and the users are required to click on the results they find relevant to their queries.

XXXV. CONCLUSIONS
An accurate user profile can greatly improve a search engine's performance by identifying the information needs of individual users. In this paper, we proposed and evaluated several user profiling strategies. The techniques make use of clickthrough data and concepts extracted from web-snippets to build concept-based user profiles automatically. We applied preference mining rules to infer not only users' positive preferences but also their negative preferences, and utilized both kinds of preferences in deriving user profiles.

REFERENCES
E. Agichtein, E. Brill, and S. Dumais, "Improving Web Search Ranking by Incorporating User Behavior Information," Proc. ACM SIGIR, 2006.
E. Agichtein, E. Brill, S. Dumais, and R. Ragno, "Learning User Interaction Models for Predicting Web Search Result Preferences," Proc. ACM SIGIR, 2006.
Appendix: 500 Test Queries, http://www.cse.ust.hk/~dlee/tkde09/Appendix.pdf, 2009.
R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Query Recommendation Using Query Logs in Search Engines," Proc. Int'l Workshop Current Trends in Database Technology, pp. 588-596, 2004.
D. Beeferman and A. Berger, "Agglomerative Clustering of a Search Engine Query Log," Proc. ACM SIGKDD, 2000.
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to Rank Using Gradient Descent," Proc. Int'l Conf. Machine Learning (ICML), 2005.
K.W. Church, W. Gale, P. Hanks, and D. Hindle, "Using Statistics in Lexical Analysis," Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Lawrence Erlbaum, 1991.
Z. Dou, R. Song, and J.-R. Wen, "A Large-Scale Evaluation and Analysis of Personalized Search Strategies," Proc. World Wide Web (WWW) Conf., 2007.
S. Gauch, J. Chaffee, and A. Pretschner, "Ontology-Based Personalized Search and Browsing," ACM Web Intelligence and Agent Systems, vol. 1, nos. 3/4, pp. 219-234, 2003.
T. Joachims, "Optimizing Search Engines Using Clickthrough Data," Proc. ACM SIGKDD, 2002.
K.W.-T. Leung, W. Ng, and D.L. Lee, "Personalized Concept-Based Clustering of Search Engine Queries," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
B. Liu, W.S. Lee, P.S. Yu, and X. Li, "Partially Supervised Classification of Text Documents," Proc. Int'l Conf. Machine Learning (ICML), 2002.
F. Liu, C. Yu, and W. Meng, "Personalized Web Search by Mapping User Queries to Categories," Proc. Int'l Conf. Information and Knowledge Management (CIKM), 2002.
Magellan, http://magellan.mckinley.com/, 2008.
W. Ng, L. Deng, and D.L. Lee, "Mining User Preference Using Spy Voting for Search Engine Personalization," ACM Trans. Internet Technology.
ABSTRACT
Distributed Denial of Service (DDoS) attacks continue to plague the Internet. The application DoS attack, which aims at disrupting application service rather than depleting network resources, has emerged as a larger threat to network services than the classic DoS attack. When a DDoS attack is detected by the IDS, the firewall simply discards all over-bounded traffic for a victim or absolutely decreases the threshold of the router. Moreover, attackers use spoofed IP addresses; defense against these attacks is complicated by spoofed source IP addresses, which make it difficult to determine a packet's true origin. To identify application DoS attacks, we propose a novel group testing (GT)-based approach deployed on back-end servers, which not only offers a theoretical method to obtain short detection delay and low false positive/negative rates, but also provides an underlying framework against general network attacks. More specifically, we first extend the classic GT model with size constraints for practical purposes, then redistribute the client service requests to multiple virtual servers embedded within each back-end server machine according to specific testing matrices. Since this method only counts the number of incoming requests rather than monitoring the server status, it is restricted to defending against high-rate DoS attacks. Based on this framework, we propose a two-mode detection mechanism using dynamic thresholds to efficiently identify the attackers.

Index Terms—IDS, Application DoS, group testing, network security.

1. INTRODUCTION
Denial of Service (DoS) attacks aimed at disrupting network services range from simple bandwidth exhaustion attacks, and those targeted at flaws in commercial software, to complex distributed attacks exploiting specific commercial off-the-shelf (COTS) software flaws. The DoS attack, which aims to make a service unavailable to legitimate clients, has become a severe threat to Internet security [2]. Traditional DoS attacks mainly abuse the network bandwidth around the Internet subsystems and degrade the quality of service by generating congestion in the network [2], [3]. Consequently, several network-based defense methods have tried to detect these attacks by controlling traffic volume or differentiating traffic patterns at the intermediate routers [9], [10]. However, with the boost in network bandwidth and application service types, the target of DoS attacks has recently shifted from the network to server resources and application procedures themselves, forming the new application DoS attack [1], [2]. Application DoS attacks exhibit three advantages over traditional DoS attacks which help them evade normal detection: the malicious traffic is always indistinguishable from normal traffic; they adopt automated scripts, avoiding the need for a large number of "zombie" machines or much bandwidth to launch the attack; and they are much harder to trace due to multiple
redirections at proxies. According to these characteristics, the malicious traffic can be classified into legitimate-like requests of two kinds: 1) requests at a high inter-arrival rate, and 2) requests consuming more service resources. We call these two cases "high-rate" and "high-workload" attacks, respectively. Since these attacks usually do not cause congestion at the network level and thus bypass network-based monitoring systems [21], detection and mitigation at the end system of the victim servers have been proposed via two techniques, DDoS shield and CAPTCHA-based defense. In DDoS shield, session validation is based on a legitimate behavior profile, and in CAPTCHA-based defense, authentication uses human-solvable puzzles. The overhead of per-session validation is not negligible, especially for services with dense traffic. CAPTCHA-based defenses introduce additional service delays for legitimate clients and are also restricted to human interaction services.

2.1 Classic Group Testing Model
The identification of attackers can be much faster if we can find them out by testing the clients in groups instead of one by one. Thus, the key problem is how to group clients and assign them to different server machines in a sophisticated way, so that if any server is found under attack, we can immediately identify and filter the attackers out of its client set. Apparently, this problem resembles group testing (GT) theory [14], which aims to discover defective items in a large population with the minimum number of tests, where each test is applied to a subset of items, called a pool, instead of testing the items one by one. Therefore, we apply GT theory to this network security issue and propose specific algorithms and protocols to achieve high detection performance in terms of short detection latency and low false positive/negative rates. Since the detections are merely based on the status of the service resource usage of the victim servers, no individual signature-based authentications or data classifications are required; thus, this approach may overcome the limitations of the current solutions.

2.1.1 Basic Idea
The classic GT model consists of t pools and n items (including at most d positive ones). As shown in Fig. 1, this model can be represented by a t×n binary matrix M, where rows represent the pools and columns represent the items. An entry M[i, j] = 1 if and only if the ith pool contains the jth item; otherwise, M[i, j] = 0. The t-dimensional binary column vector V denotes the test outcomes of these t pools, where a 1-entry represents a positive outcome and a 0-entry represents a negative one. Note that a positive outcome indicates that at least one positive item exists within the pool, whereas a negative one means that all the items in the current pool are negative.

2.1.2 Classic Methods
Two traditional GT methods are adaptive and nonadaptive [14]. Adaptive methods use the results of previous tests to determine the pool for the next test, and complete the testing within several rounds, while nonadaptive GT methods employ a d-disjunct matrix [14], run multiple tests in parallel, and finish the testing within only one round.

2.1.3 Decoding Algorithms
For sequential GT, at the end of each round, items in negative pools are identified as negative, while the ones in positive pools require further testing. Notice that an item is identified as positive only if it is the only item in a positive pool. Nonadaptive GT takes a d-disjunct matrix as the testing matrix M, where no column is contained in the Boolean summation of any other d columns.

Fig. 1. Binary testing matrix M and testing outcome vector V.

Taking Fig. 1 as an example, outcomes V[3] and V[4] are 0, so the items in pool 3 and pool 4 are negative, i.e., items 3, 4, and 5 are negative. If the matrix M is a d-disjunct matrix, the items other than those appearing in the negative pools are positive; therefore, items 1 and 2 are positive.
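This decoding rule is mechanical; the sketch below applies it to a small matrix chosen to reproduce the outcome pattern narrated for Fig. 1 (the actual matrix in Fig. 1 is not recoverable from this excerpt, so M below is an illustrative assumption).

```python
# Sketch: nonadaptive GT decoding. Every item that appears in some pool
# with a negative outcome is negative; with a d-disjunct matrix, all
# remaining items are positive.

def decode(M, V):
    n = len(M[0])
    negative = {j for i, row in enumerate(M) if V[i] == 0
                for j in range(n) if row[j] == 1}
    positive = set(range(n)) - negative        # valid when M is d-disjunct
    return positive, negative

M = [[1, 0, 1, 0, 0],   # pool 1: items 1 and 3
     [0, 1, 0, 0, 1],   # pool 2: items 2 and 5
     [0, 0, 1, 1, 0],   # pool 3: items 3 and 4
     [0, 0, 0, 1, 1]]   # pool 4: items 4 and 5
V = [1, 1, 0, 0]        # pools 3 and 4 come back negative
pos, neg = decode(M, V)
print(sorted(j + 1 for j in pos), sorted(j + 1 for j in neg))  # [1, 2] [3, 4, 5]
```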
2.1.4 Apply to Attack Detection
client requests strictly evenly to the back-end servers, i.e., without considering session stickiness. The way of distributing token queues, to be mentioned later, is tightly related to this assumption. However, even if the proxies conduct more sophisticated forwarding, the token queue distribution can be readily adapted by manipulating the token piggybacking mechanism at the client side accordingly. Since the testing procedure requires distributing intra-session requests to different virtual servers, the overhead of maintaining consistent session state is incurred. Our motivation for utilizing virtual servers is that each can retrieve the latest client state through the shared memory, which resembles the principle of the Network File System (NFS). An alternative way out is to forward intra-session requests to the same virtual server, which calls for a longer testing period for each round.

5. SIMULATION CONFIGURATIONS AND RESULTS
The PND and SDP algorithms achieve slightly better performance than the SDoP algorithm. Furthermore, the efficiency of the PND algorithm can be further enhanced by optimizing the d-disjunct matrix employed and the state maintenance efficiency.

6. RELATED WORK IN DoS DETECTION
Numerous defense schemes against DoS have been proposed and developed [7]; these can be categorized into network-based mechanisms and system-based ones. Existing network-based mechanisms aim to identify the malicious packets at the intermediate routers or hosts [10], [12], by either checking the traffic volumes or the traffic distributions. However, the application DoS attacks have no necessary deviations in these metrics from the legitimate traffic statistics; therefore, network-based mechanisms cannot efficiently handle these attack types.
The proposed scheme greatly reduces the computation cost required for each mobile user. Furthermore, the proposed scheme is formally demonstrated to be immune to both the replay attack and the impersonating attack.

The rest of this paper is organized as follows. The objective and Hwang and Chang's scheme [1] are briefly described in Sections II and III. The basic idea is illustrated in Section IV. In Section V, I present an efficient mutual authentication scheme for mobile communications. The conclusion, future enhancement and references are discussed in Sections VI, VII and VIII.

II. Objective
The main objective is to propose a secure mutual authentication and key exchange scheme for mobile communications by the nested one-time secret mechanisms, and also to accomplish fast mutual authentication for the mobile environment.

III. Review Of Hwang And Chang's Scheme
There are two basic approaches to achieve mutual authentication between two entities (Alice and Bob). One is the timestamp-based approach, and the other is the nonce-based approach.

The Assumptions Of A Timestamp-Based Authentication Scheme:
1) The clocks of Alice and Bob must be synchronous.
2) The transmission time for the authentication message transmitted from Alice to Bob (or from Bob to Alice) must be stable.

The Advantages Of A Timestamp-Based Authentication Scheme:
1) The protocol only requires two rounds of transmission to reach the goal of mutual authentication.
2) It is efficient in computation and communication.

Although timestamp-based authentication schemes are simple and efficient, the above two constraints make them impractical in the Internet and mobile environments, since most users' clocks are not synchronous with the server's or system's clocks, and the transmission time is usually not stable.

The Advantages Of A Nonce-Based Authentication Scheme:
1) It is not necessary to synchronize the clocks of Alice and Bob.
2) The transmission time for the authentication message transmitted from Alice to Bob (or from Bob to Alice) can be unstable.

The Advantages Of An Authentication Scheme Based On Onetime Secrets:
1) The protocol only requires two rounds of transmission to reach the goal of mutual authentication.
2) It is more efficient than a nonce-based authentication scheme in computation and communication. (However, it is less efficient than a timestamp-based scheme since an additional string
must be computed in the scheme based on a one-time secret.)

The Drawback Of An Authentication Scheme Based On Onetime Secrets:
Alice and Bob must store an extra string, i.e., the one-time secret, in their devices or computers.

The comparisons of the three authentication mechanisms (i.e., timestamps, one-time secrets, and nonces) are summarized in Table I.

On the other hand, it is difficult to synchronize the clocks of the system (the VLR and the HLR) and all mobile users. Hence, I cannot utilize the timestamp-based solution to construct the authentication protocol between the system and every mobile user, even though that solution is the most efficient one among the three authentication mechanisms. Owing to the assumption of the mechanism based on one-time secrets, it cannot form the authentication protocol for the initial authentication between the system and each mobile user. Thus, I adopt the nonce-based mechanism to establish the authentication protocol for the initial authentication between the system and every user.
2) Initial Authentication For User And The Current VLR
The next process is the Initial Authentication protocol for the user and the current VLR. The user generates A and sends A to the VLR; the VLR verifies A, then generates D and sends D to the user; the user verifies D and sends x to the VLR, where
A = EW(s + 1)
D = EW(x, y, s)
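A toy enactment of these messages is sketched below, assuming W is a 256-bit key already shared between the user and the VLR, using AES-GCM from the third-party `cryptography` package as the E_W primitive. The packing of s, x, and y into fixed-width byte fields is an invented illustration, not the paper's encoding.

```python
# Sketch of the Initial Authentication exchange:
#   user -> VLR : A = E_W(s + 1)
#   VLR  -> user: D = E_W(x, y, s)
#   user -> VLR : x
# W: shared 256-bit AES key; s: current one-time secret; x, y: fresh values.
# One AESGCM object stands in for both parties, since both hold W.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

W = AESGCM.generate_key(bit_length=256)
enc = AESGCM(W)

def E(*ints):                                  # encrypt ints as 8-byte fields
    nonce = os.urandom(12)
    data = b"".join(i.to_bytes(8, "big") for i in ints)
    return nonce + enc.encrypt(nonce, data, None)

def D_(blob):                                  # decrypt back to int fields
    data = enc.decrypt(blob[:12], blob[12:], None)
    return [int.from_bytes(data[i:i + 8], "big") for i in range(0, len(data), 8)]

s = 41                                         # one-time secret known to both
A = E(s + 1)                                   # user -> VLR
assert D_(A)[0] == s + 1                       # VLR verifies A

x, y = 7, 9                                    # VLR picks fresh values
Dmsg = E(x, y, s)                              # VLR -> user
rx, ry, rs = D_(Dmsg)
assert rs == s                                 # user verifies s, learns x and y
print("user sends x =", rx)                    # user -> VLR: x
```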
3) Final Authentication For User And The Current VLR
After the Initial Authentication protocol for the user and the current VLR, if the user visits a new VLR, the Authentication Protocol for Mobile User and the System is run again; otherwise, the Final Authentication Protocol for User and the Current VLR can be performed. After the Final Authentication, if the user visits a new VLR, the Authentication Protocol for Mobile User and the System is again run; otherwise, the authentication process ends.

According to the specification of the Advanced Encryption Standard (AES) [2], which is the current standard for symmetric cryptosystems, the key length of every encryption/decryption key in the proposed scheme is 256 bits, which generates a sufficiently large space of possible key values.

VI. Conclusion
I have proposed a secure mutual authentication and key exchange scheme for mobile communications based on a novel mechanism, i.e., nested one-time secrets. The proposed scheme can withstand the replay attack and the impersonating attack on mobile communications and speeds up authentication. Not only does the proposed scheme reduce the communication and computation cost, but the security of the scheme has also been formally proved.

VII. Future Enhancement
This project uses symmetric-key algorithms for encryption. The enhancement of this project is to use an asymmetric-key algorithm for encryption.
VIII. References:
While the retrieval task conventionally returns similar clips from a large collection of videos which have been either chopped up into similar lengths or cut at content boundaries, the subsequence identification task aims at finding whether there exists any subsequence of a long database video that shares similar content with a query clip. In other words, while for the former the clips to search have already been segmented and are always ready for similarity ranking [11], [12], [13], [14], [15], the latter is a typical subsequence matching problem. Because the boundary and even the length of the target subsequence are not available initially, it is not known in advance which fragments to evaluate similarities over. Therefore, most existing methods for the retrieval task on video clip collections [11], [12], [13], [14], [15] are not applicable to this more complicated problem.

The videos in both groups are highly relevant, but not copies. Another example is the extended cinema version of a Toyota commercial (60 seconds) and its shorter TV version (30 seconds), which obviously are not copies of each other by definition. On the other hand, a video copy may no longer be regarded as visually similar if it is transformed substantially. Video subsequence matching techniques that use a fixed-length sliding window at every possible offset of the database sequence for exhaustive comparison [4], [5], [8] are not efficient, especially when seeking over a long-running video. Although a temporal skip scheme using a similarity upper bound [10], [15] can accelerate the search process by reducing the number of candidate subsequences, in the scenario where a target subsequence could actually have a different ordering or length from the query, these methods may not be effective.
slides frame by frame on the database video with a fixed-length window. In addition to distortions introduced by different encoding parameters, Kim and Vasudev [5] proposed to use spatiotemporal ordinal signatures of frames to further address display format conversions, such as different aspect ratios (letter-box, pillar-box, or other styles). Since the process of video transformation can give rise to several distortions, techniques circumventing these variations by global signatures have been considered. They tend to depict a video globally rather than focusing on its sequential details. This method is efficient, but has limitations with blurry shot boundaries or a very limited number of shots. Moreover, in reality, a query video clip can be just a shot or even a subshot, whereas this method is only applicable to queries consisting of multiple shots.

B. Video Similarity Search

The methods mentioned above have only been designed to detect videos of the same temporal order and length. To further search videos with changes from the query due to content editing, a number of algorithms have been proposed to evaluate video similarity. To deal with insertion or removal of partial content, Hua et al. [6] used dynamic programming based on an ordinal measure of frames resampled at a uniform rate to find the best match between video sequences of different lengths. This method has only been tested on a small video database. Through time warping distance computation, they achieved higher search accuracy than the methods proposed in [5] and [6]. However, with the growing popularity of video editing tools, videos can be temporally manipulated with ease. This work extends the investigation of copy detection not only to targets of potentially different length but also to flexible temporal order (tolerance to content reordering). Cheung and Zakhor [11], [12] developed Video Signature to summarize each video with a small set of frames sampled by a randomized algorithm. Shen et al. [13] proposed Video Triplet to represent each clip with a number of frame clusters and to estimate cluster similarity by the volume of intersection between two hyperspheres multiplied by the smaller density; the overall video similarity is then derived from the total number of similar frames shared by two videos. For compactness, these summarizations inevitably lose temporal information: videos are treated as a "bag" of frames and thus lack the ability to differentiate two sequences with temporal reordering, such as "ABCD" and "ACBD." Various time series similarity measures can be considered, such as Mean distance, DTW, and LCSS, all of which can be extended to measure the similarity of multidimensional trajectories and applied to video matching. However, Mean distance adheres to temporal order in a rigid manner, does not allow frame alignment or gaps, and is very sensitive to noise. DTW can be utilized to address frame alignment by repeating some frames as many times as needed without extra cost [7], but no frame can be skipped even if it is just noise; in addition, its capacity is limited in the context of partial content reordering. LCSS is proposed to address temporal order and handle possible noise by allowing some elements to be skipped without rearranging the sequence order [21], but it ignores the effect of potentially different numbers of gaps. As known from research on psychology, the visual judgment of human perception involves a number of factors. The proposed model incorporating different factors for measuring video similarity is inspired by the weighted schemes [19] originally introduced at the shot level.

Definition 1. Video subsequence identification. Let Q = {q1, q2, ..., q|Q|} be a short query clip and S = {s1, s2, ..., s|S|} be the long database video sequence, where qi = {qi1, ..., qid} ∈ Q and sj = {sj1, ..., sjd} ∈ S are d-dimensional feature vectors representing video frames, and |Q| and |S| denote the total frame numbers of Q and S, respectively (normally |Q| << |S|). Video subsequence identification is to find Ŝ = {sm, sm+1, ..., sn} in S, where 1 ≤ m ≤ n ≤ |S|, which is the most similar part to Q under a defined score function.

For easy reference, a list of notations used in this paper is shown in Table 1.

TABLE 1. A List of Notations
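As a concrete illustration of Definition 1, the following is a minimal brute-force sketch, not the paper's method (which avoids exhaustive search via bipartite matching): it scores every candidate subsequence of S against Q with a simple mean-frame-distance score. The feature dimensionality, the candidate length bounds, and the score function are illustrative assumptions.

```python
# Sketch only: exhaustive video subsequence identification under a
# mean-frame-distance score (an assumed, illustrative score function).
import numpy as np

def mean_distance(Q: np.ndarray, S_hat: np.ndarray) -> float:
    """Average Euclidean distance between temporally aligned frames,
    resampling S_hat to |Q| frames when the lengths differ."""
    idx = np.linspace(0, len(S_hat) - 1, num=len(Q)).round().astype(int)
    return float(np.linalg.norm(Q - S_hat[idx], axis=1).mean())

def identify_subsequence(Q: np.ndarray, S: np.ndarray):
    """Return (m, n) of the subsequence S[m..n] most similar to Q."""
    best, best_span = np.inf, (0, 0)
    for m in range(len(S)):
        # restrict candidate lengths around |Q| to keep the search bounded
        for n in range(m + len(Q) // 2, min(m + 2 * len(Q), len(S))):
            d = mean_distance(Q, S[m:n + 1])
            if d < best:
                best, best_span = d, (m, n)
    return best_span

# Toy example: 3-dimensional color features of a 200-frame database video.
rng = np.random.default_rng(0)
S = rng.random((200, 3))
Q = S[60:80] + rng.normal(scale=0.01, size=(20, 3))  # noisy copy of S[60..79]
print(identify_subsequence(Q, S))                    # expect roughly (60, 79)
```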
of this, we avoid comparing all the possible subsequences in S, which is infeasible, and instead safely and rapidly filter out a large portion of irrelevant parts prior to similarity evaluations. To do so, the densely matched segments of S containing all the possibly similar video subsequences have to be identified. Note that it is unnecessary to maintain the entire graph; instead, we just process the small-sized edge set E and the candidate subsequences in the following steps.

C. Dense Segment Extraction

Along the S side of G, with integer counts in {0, 1, ..., |Q|}, we consider where the chances of nonzero counts appearing are relatively large. Considering potential frame alignment and gaps, segments without strictly consecutive nonzero counts, e.g., the segment {s1, ..., s6} with counts "241101," should also be accepted. To depict the frequency of similar frame mappings, we introduce the density of a segment.

D. Filtering by Maximum Size Matching

After locating the dense segments, we have k separate subgraphs of the form Gk = {Vk, Ek}, where Vk is the vertex set and Ek is the edge set representing similar frame mappings. However, high density of a segment alone cannot sufficiently indicate high similarity to the query, because it neglects the actual number of similar frames, temporal order, and frame alignment.

Definition 2. MSM. A matching M in G = {V, E} is a subset of E with pairwise nonadjacent edges. The size of matching M is the number of edges in M, written as |M|. The MSM of G is a matching M_MSM with the largest size |M_MSM|.

Relative to a matching Mk in Gk = {Vk, Ek}, we say the vertices belonging to the edges of Mk are saturated by the matching, and the others are unsaturated. MSM is characterized by the absence of augmenting paths [20]: a matching Mk in Gk is an MSM if and only if Gk has no Mk-augmenting path. Starting with a matching of size 0 in each subgraph, the augmenting path algorithm progressively selects an augmenting path to enlarge the current matching size by one at a time. We can search for an Mk-augmenting path from each Mk-unsaturated vertex. The detailed MSM algorithm can be found in [20].
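The augmenting-path procedure just described can be sketched as follows. This is a generic maximum-matching routine, not the paper's optimized implementation, and the adjacency-list representation (query-frame vertices mapped to the segment-frame vertices they are similar to) is an assumption made for illustration.

```python
# Sketch only: maximum size matching (MSM) grown by augmenting paths,
# starting from an empty matching, as characterized in the text.
def maximum_size_matching(adj: dict) -> dict:
    """adj maps each query-frame vertex to the segment-frame vertices it is
    similar to; returns the matching as {segment_vertex: query_vertex}."""
    match = {}                                  # saturated segment vertices

    def augment(u, visited):
        """DFS for an augmenting path from unsaturated query vertex u."""
        for v in adj.get(u, ()):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere:
            if v not in match or augment(match[v], visited):
                match[v] = u
                return True
        return False

    for u in adj:                               # one attempt per query vertex
        augment(u, set())
    return match

# Toy subgraph G_k: q1 and q2 both match s1, and q2 also matches s2.
edges = {"q1": ["s1"], "q2": ["s1", "s2"]}
print(maximum_size_matching(edges))   # {'s1': 'q1', 's2': 'q2'}, |MSM| = 2
```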
E. Refinement by Sub-Maximum Similarity Matching

The above filtering step can be viewed as a rough similarity evaluation that disregards temporal information. Observing that a segment may have multiple 1:1 mappings, and that the most similar subsequence in S may only be a portion of the segment, we next refine it to find the most suitable 1:1 mapping for accurate identification (or ranking) by considering visual content, temporal order, and frame alignment simultaneously.

IX. EXPERIMENTS

A. Effectiveness

To measure the effectiveness of our approach, we use the hit ratio, defined as the number of queries for which our method correctly identifies the position of the most similar subsequence (the ground truth), divided by the total number of queries. Note that since for each query there is only one target subsequence (where the original fragment was extracted) in the database, the hit ratio corresponds to P(1), i.e., the precision value at the first rank. The original video has also been manually inspected so that the ground truth of each query clip can be validated.

B. Efficiency

To show the efficiency of our approach, we use response time, which indicates the average running time of a query. Without SMSM, all the possible 1:1 mappings would be evaluated. Since it is computationally intractable to enumerate all 1:1 mappings to find the most suitable one, and there is no prior practical method dealing with this problem for performance comparison, we mainly study the efficiency of our approach by investigating the effect of MSM filtering. Without MSM, all the segments extracted in dense segment extraction are processed, while with MSM, only a small number of segments are expected. Note that the performance comparison is not affected by the underlying high-dimensional indexing method.

X. CONCLUSIONS

This paper has presented an effective and efficient query processing strategy for temporal localization of similar content in a long, unsegmented video stream, considering that the target subsequence may be an approximate occurrence with potentially different ordering or length from the query clip. In the preliminary phase, the frames similar to those of the query clip are retrieved by a batch query algorithm. Then, a bipartite graph is constructed to exploit the opportunity of spatial pruning; thus, the high-dimensional query and database video sequences can be transformed into the two sides of a bipartite graph, and only the dense segments are roughly obtained as possibly similar subsequences. In the filter-and-refine phase, nonsimilar segments are first filtered, and several relevant segments are then processed to quickly identify the most suitable 1:1 mapping by jointly optimizing the factors of visual content, temporal order, and frame alignment. In practice, visually similar videos may exhibit different orderings due to content editing, which yields some intrinsic cross mappings. Our video similarity model, which elegantly achieves a balance between neglecting temporal order and strictly adhering to it, is particularly suitable for dealing with this case and can thus support accurate identification. Although only a color feature is used in our experiments, the proposed approach inherently supports other features. For future work, we plan to further investigate the effect of representing videos by other features, such as the ordinal signature. Moreover, the weight of each factor for measuring video similarity might be adjusted by user feedback to embody the degree of similarity more completely and systematically.

ACKNOWLEDGMENTS

The Sound and Vision video is copyrighted. The Sound and Vision video used in this work is provided solely for research purposes through the TREC Video Information Retrieval Evaluation Project Collection. The authors would like to thank the anonymous reviewers for their comments, which led to improvements of this paper. This work is supported in part by the Australian Research Council under Grant DP0663272.

REFERENCES

[22] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[23] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast Subsequence Matching in Time-Series Databases," Proc. ACM SIGMOD '94, pp. 419-429, 1994.
[24] H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, "Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis," J. Visual Comm. and Image Representation, vol. 14, no. 2, pp. 150-183, 2003.
[25] R. Mohan, "Video Sequence Matching," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), pp. 3697-3700, 1998.
[26] C. Kim and B. Vasudev, "Spatiotemporal Sequence Matching for Efficient Video Copy Detection," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 127-132, 2005.
[27] X.-S. Hua, X. Chen, and H. Zhang, "Robust Video Signature Based on Ordinal Measure," Proc. IEEE Int'l Conf. Image Processing (ICIP '04), pp. 685-688, 2004.
[28] C.-Y. Chiu, C.-H. Li, H.-A. Wang, C.-S. Chen, and L.-F. Chien, "A Time Warping Based Approach for Video Copy Detection," Proc. 18th Int'l Conf. Pattern Recognition (ICPR '06), vol. 3, pp. 228-231, 2006.
[29] M.R. Naphade, M.M. Yeung, and B.-L. Yeo, "A Novel Scheme for Fast and Efficient Video Sequence Matching Using Compact Signatures," Proc. Storage and Retrieval for Image and Video Databases (SPIE '00), pp. 564-572, 2000.
[30] K.M. Pua, J.M. Gauch, S. Gauch, and J.Z. Miadowicz, "Real Time Repeated Video Sequence Identification," Computer Vision and Image Understanding, vol. 93, no. 3, pp. 310-327, 2004.
[31] K. Kashino, T. Kurozumi, and H. Murase, "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning," IEEE Trans. Multimedia, vol. 5, no. 3, pp. 348-357, 2003.
[32] S.-C.S. Cheung and A. Zakhor, "Efficient Video Similarity Measurement with Video Signature," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 59-74, 2003.
[33] S.-C.S. Cheung and A. Zakhor, "Fast Similarity Search and Clustering of Video Sequences on the World-Wide-Web," IEEE Trans. Multimedia, vol. 7, no. 3, pp. 524-537, 2005.
[34] H.T. Shen, B.C. Ooi, X. Zhou, and Z. Huang, "Towards Effective Indexing for Very Large Video Sequence Database," Proc. ACM SIGMOD '05, pp. 730-741, 2005.
[35] H.T. Shen, X. Zhou, Z. Huang, J. Shao, and X. Zhou, "UQLIPS: A Real-Time Near-Duplicate Video Clip Detection System," Proc. 33rd Int'l Conf. Very Large Databases (VLDB '07), pp. 1374-1377, 2007.
[36] J. Yuan, L.-Y. Duan, Q. Tian, S. Ranganath, and C. Xu, "Fast and Robust Short Video Clip Search for Copy Detection," Proc. Fifth IEEE Pacific-Rim Conf. Multimedia (PCM '04), vol. 2, pp. 479-488, 2004.
[37] J. Shao, Z. Huang, H.T. Shen, X. Zhou, E.-P. Lim, and Y. Li, "Batch Nearest Neighbor Search for Video Retrieval," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 409-420, 2008.
[38] Y. Peng and C.-W. Ngo, "Clip-Based Similarity Measure for Query-Dependent Clip Retrieval and Video Summarization," IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 612-627, 2006.
[39] D.R. Shier, "Matchings and Assignments," Handbook of Graph Theory, J.L. Gross and J. Yellen, eds., pp. 1103-1116, CRC Press, 2004.
[40] L. Chen and T.-S. Chua, "A Match and Tiling Approach to Content-Based Video Retrieval," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '01), pp. 417-420, 2001.
[41] X. Liu, Y. Zhuang, and Y. Pan, "A New Approach to Retrieve Video by Example Video Clip," Proc. Seventh ACM Int'l Conf. Multimedia (MULTIMEDIA '99), vol. 2, pp. 41-44, 1999.
[42] Y. Wu, Y. Zhuang, and Y. Pan, "Content-Based Video Similarity Model," Proc. Eighth ACM Int'l Conf. Multimedia (MULTIMEDIA '00), pp. 465-467, 2000.
Abstract— A mobile ad hoc network (MANET) is a self-configuring network of mobile devices connected by wireless links. Security has become important to MANETs due to their nature and their use in many mission- and life-critical applications, so there is a critical need to replace single-layer intrusion detection technology with multi-layer detection. An efficient cross-layer intrusion detection system is proposed to discover malicious nodes and different types of DoS attacks by exploiting information available across different layers of the protocol stack, in order to improve the accuracy of detection. A fixed-width clustering algorithm is used for efficient detection of anomalies in MANET traffic and of different types of attacks on the network.

Keywords— MANET, Fixed Width Clustering algorithm, Cross Layer

INTRODUCTION

Ad hoc networks are a new paradigm of wireless communication for mobile hosts. A wireless ad hoc network consists of a collection of "peer" mobile nodes that are capable of communicating with each other without help from a fixed infrastructure. A mobile ad hoc network (MANET) does not rely on a preexisting infrastructure, such as routers in wired networks or access points in managed (infrastructure) wireless networks. Instead, each node participates in routing by forwarding data for other nodes, so the determination of which nodes forward data is made dynamically based on the network connectivity.

Nodes in a MANET usually share the same physical medium; they transmit and acquire signals in the same frequency band and follow the same hopping sequence or spreading code. The data link layer manages the wireless link resources and coordinates medium access among neighboring nodes. The medium access control (MAC) protocol allows mobile nodes to share a common broadcast channel. The network layer maintains the multi-hop communication paths across the network; all nodes in a mobile ad hoc network function as routers that discover and maintain routes to other nodes.

The nature of mobility creates new vulnerabilities due to the open medium, dynamically changing network topology, cooperative algorithms, and lack of centralized monitoring and management points. A malicious node may take advantage of a MANET node to launch routing attacks, since each node acts as a router to communicate with the others. The wireless links between the nodes, together with mobility, raise the challenge for an IDS to detect attacks. It is very difficult and challenging for an intrusion detection system (IDS) to fully detect routing attacks due to MANET characteristics, so the IDS needs a scalable architecture to collect sufficient evidence to detect routing attacks effectively.

We have proposed a new intrusion detection architecture which incorporates a cross-layer design that interacts between the layers. In addition, we have used an association module to link the OSI protocol stack and the IDS module, which results in low overhead during data collection. We have
implemented the fixed-width clustering algorithm in the anomaly detection engine for efficient detection of intrusions in ad hoc networks.

The rest of the paper is organized as follows. Section II presents the related work. A brief description of cross-layer techniques in IDS, followed by the association module, is given in Section III. A detailed description of the intrusion detection module and its underlying architecture is given in Section IV. The anomaly detection mechanism used in the MANET is discussed in Section V. Finally, Section VI concludes the research carried out and outlines future work.

RELATED WORKS

Many studies have been done on security prevention measures for infrastructure-based wireless networks, but few works address the prospect of intrusion detection [1]. Some general approaches have been used in a distributed manner to ensure the authenticity and integrity of routing information, such as key generation and management, on the prevention side. Authentication-based approaches are used to secure the integrity and authenticity of routing messages, such as [2], [3]. There are difficulties in realizing some of these schemes: cryptographic mechanisms, for example, are relatively expensive on a MANET because of limited computational capacity. A number of intrusion detection schemes have been presented for ad hoc networks. In [4], Zhang proposes an architecture for a distributed and cooperative intrusion detection system for ad hoc networks based on statistical anomaly detection techniques, but does not properly describe the simulation scenario, and the type of mobility used is not mentioned.

In [5], Huang details an anomaly detection technique that explores the correlations among the features of nodes and discusses routing anomalies. In [6], A. Mishra emphasizes the challenge of intrusion detection in ad hoc networks and proposes the use of anomaly detection, but does not provide a detailed solution or implementation for the problem. In [7], Kisung Kim discusses the sinkhole attack, one of the representative attacks in MANETs, caused by attempts to draw all network traffic to a sinkhole node. He focuses on the sinkhole problem in the Dynamic Source Routing (DSR) protocol in MANETs and detects the sinkhole node using several useful sinkhole indicators derived from analyzing the sinkhole problem.

In [8], Loo presents an intrusion detection method using a clustering algorithm for routing attacks in sensor networks. It is able to detect three important types of routing attacks, including sinkhole attacks, which are an intense form of attack. There are some flaws, such as the absence of a simulation platform that can support a wider variety of attacks on larger-scale networks. The fixed-width clustering algorithm has been shown to be highly effective for anomaly detection in network intrusion [9]; that work presents a geometric framework for unsupervised anomaly detection, but it needs more feature maps over different kinds of data and more extensive experiments evaluating the methods presented.

CROSS LAYER TECHNIQUES IN IDS

The very advantage of mobility in a MANET leads to its vulnerabilities. For efficient intrusion detection, we have used cross-layer techniques in the IDS. The traditional layered network approach, with its purpose of separating routing, scheduling, rate control, and power control, is not efficient for ad hoc wireless networks. A. Goldsmith discussed that rate control, power control, medium access, and routing are the building blocks of wireless network design [10].

Generally, routing is considered in the routing layer and medium access in the MAC layer, whereas power control and rate control are sometimes considered in the PHY layer and sometimes in the MAC layer. With the help of cross-layer interaction, the routing layer forwards possible route choices to the MAC layer, and the MAC layer decides among the possible routes using congestion and IDS information and returns the result to the routing layer.

The selection of the correct combination of layers in the design of a cross-layer IDS is critical to detect attacks targeted at, or sourced from, any layer rapidly. It is optimal to incorporate the MAC layer in the cross-layer design for IDS, as DoS attacks are better detected at this layer. The routing protocol layer and the MAC layer are chosen for detecting routing attacks in an efficient way. Data with behavioural information consisting of layer-specific information are collected from multiple layers and forwarded to the data analysis module, which is located in an optimal location [11]. Figure 1 illustrates the cross-layer design.

This cross-layer technique incorporating IDS leads to an escalating detection rate of malicious node behaviour, increasing true positives and reducing false positives in the MANET. It also alleviates congestion and can adapt to changing network and traffic characteristics. In order to evade congestion and reroute traffic, the MAC and routing layers have to cooperate with each other and with the IDS in order to avoid the insertion of malicious nodes into the new routes.
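As a concrete picture of the layer-specific data collection described above, the following sketch shows how per-node MAC-layer and routing-layer observations might be combined into a single feature vector for the data analysis module. All field names and values are illustrative assumptions, not the paper's actual feature set.

```python
# Sketch only: a cross-layer feature record assembled from MAC-layer and
# routing-layer statistics, then flattened for the anomaly detection engine.
from dataclasses import dataclass, astuple

@dataclass
class CrossLayerFeatures:
    # MAC-layer observations
    rts_count: int          # RTS frames overheard
    retransmissions: int    # link-level retries (a congestion indicator)
    # Routing-layer observations
    route_requests: int     # RREQs forwarded
    route_errors: int       # RERRs generated
    data_packets_rx: int    # used to spot DoS-style traffic floods

def to_feature_vector(f: CrossLayerFeatures) -> list:
    """Flatten the layered observations into one vector for analysis."""
    return list(astuple(f))

sample = CrossLayerFeatures(12, 3, 5, 1, 240)
print(to_feature_vector(sample))   # [12, 3, 5, 1, 240]
```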
strength of intrusion evidence. It collects them in the alert cache for t seconds. If there are more abnormal predictions than normal predictions, the state is regarded as "abnormal," and with adequate information an alarm is generated to report that intrusive activity is present in the system.

ANOMALY DETECTION MECHANISM

The anomaly detection system creates a baseline profile of the normal activities of the network traffic. The main objective is to collect a set of useful features from the traffic in order to decide whether the sampled traffic is normal or abnormal. Some advantages of an anomaly detection system are that it can detect new and unknown attacks, it can detect insider attacks, and it is very difficult for the attacker to carry out attacks without setting off an alarm [18]. The process of anomaly detection comprises two phases: training and testing.

A. Construction of Normal Dataset

The data obtained from the audit data sources mostly contain local routing information, data and control information from the MAC and routing layers, and other traffic statistics. The training of data may entail modeling the distribution of a given set of training points or characteristic network traffic samples.

We have to make a few assumptions so that the traffic traced from the network contains no attack traffic [19]:
• The normal traffic occurs far more frequently than the attack traffic.
• The attack traffic samples are statistically different from the normal connections.

Given these two assumptions, the attacks will appear as outliers in the feature space, so they can be detected by analyzing and identifying anomalies in the data set.

B. Feature Construction

For feature construction, an unsupervised method is used to construct the feature set. A clustering algorithm is used to construct features from the audit data. The feature set is created from the audit data, and the most common features, those whose weight is not smaller than the minimum threshold, are selected as the essential feature set. A set of substantial features should be obtained from the incoming traffic that differentiates normal data from intrusive data. Only a small amount of semantic information is captured, which results in better detection performance and saves computation time. For feature construction, we collect traffic-related features as well as non-traffic-related features which represent routing conditions. We use some of the features for detecting DoS attacks and attacks that manipulate the routing protocol. The number of data packets received is used to detect an unusual level of data traffic which may indicate a DoS attack based on a data traffic flood.

C. Training Normal Data Using the Cluster Mechanism

We have implemented the fixed-width clustering algorithm as an approach to anomaly detection. It calculates the number of points near each point in the feature space. In the fixed-width clustering technique, a set of clusters is formed in which each cluster has a fixed radius w, also known as the cluster width, in the feature space [20]. The cluster width w is chosen as the maximum threshold radius of a cluster.

Algorithm: Fixed Width Clustering
Input: k: the number of clusters; ST: a data set containing n traffic samples.
Output: A set of k clusters.
Method:
(1) Arbitrarily choose k objects from ST as the initial cluster centers;
(2) Repeat
(3) (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster;
(4) Update the cluster means, i.e., calculate the mean value of the objects in each cluster;
(5) Until no change.

Explanation of the fixed-width algorithm: A set of network traffic samples ST is obtained from the audit data for training purposes. Each sample si in the training set is represented by a d-dimensional vector of attributes. In the beginning, the set of clusters as well as the number of clusters are null. Since there is significant variation in each attribute, normalization is done before mapping into the feature space when calculating the distance between points, to ensure that all features have the same influence; it is obtained by normalizing each continuous attribute in terms of the number of standard deviations from the mean. The first point of the data forms the centre of a new cluster: a new cluster ψ1 is formed with centroid ψ1* from sample si. For every succeeding point, we measure the distance of the traffic sample si to the centroid of each cluster ψ1* generated in the cluster set Ψ. If the distance to the nearest cluster ψn is within w of the cluster center, the point is assigned to that cluster, the centroid of the closest cluster is updated, and the total number of points in the cluster is incremented; otherwise, the new point forms the centroid of a new cluster. Euclidean distance together with argmin is used because it is convenient to select the items which minimize the functions; as a result, the computational load is decreased. Moreover, the traffic samples are not stored, and only one pass through the traffic samples is required. In the final stage of training, labeling of clusters is done based on the initial assumptions, namely that the ratio of attack traffic to normal traffic is very small and that the anomalous data points are statistically different from normal data points. If a cluster contains less than a threshold η% of the total set of points, it is considered anomalous; otherwise the cluster is labeled as normal. Since the points in dense regions will exceed the threshold, we only consider the points that are outliers.
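The training and testing procedure just described can be sketched as follows: a single pass over normalized samples, a fixed cluster width w, online centroid updates, and labeling of sparse clusters as anomalous. The parameter values and the synthetic traffic below are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch only: one-pass fixed-width clustering with anomaly labeling.
import numpy as np

def fixed_width_clustering(samples, w, eta=1.0):
    """Clusters have fixed radius w in the normalized feature space;
    clusters holding fewer than eta percent of points are anomalous."""
    mean, std = samples.mean(axis=0), samples.std(axis=0)
    z = (samples - mean) / std                      # zero mean, unit std
    centroids, counts = [], []
    for point in z:
        if centroids:
            d = np.linalg.norm(np.asarray(centroids) - point, axis=1)
            nearest = int(d.argmin())
            if d[nearest] <= w:                     # assign to nearest cluster
                counts[nearest] += 1                # and update its centroid
                centroids[nearest] += (point - centroids[nearest]) / counts[nearest]
                continue
        centroids.append(point.copy())              # point seeds a new cluster
        counts.append(1)
    threshold = eta / 100.0 * len(z)                # eta percent of all points
    labels = ["anomalous" if c < threshold else "normal" for c in counts]
    return np.asarray(centroids), labels, mean, std

def classify(x, centroids, labels, w, mean, std):
    """Testing phase: a sample takes its nearest cluster's label, or is
    labeled anomalous if it lies outside every cluster's width."""
    d = np.linalg.norm(centroids - (x - mean) / std, axis=1)
    return labels[int(d.argmin())] if d.min() <= w else "anomalous"

rng = np.random.default_rng(1)
traffic = np.vstack([rng.normal(0, 1, (500, 4)),    # mostly normal traffic
                     rng.normal(6, 1, (5, 4))])     # a few attack samples
C, L, mu, sigma = fixed_width_clustering(traffic, w=1.5, eta=2.0)
print(classify(rng.normal(6, 1, 4), C, L, 1.5, mu, sigma))  # likely "anomalous"
```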
D. Testing Phase

The testing phase takes place by comparing each new traffic sample with the cluster set Ψ to determine anomalousness. The distance between a new traffic sample point si and each cluster centroid ψ1* is calculated. If the distance from the test point s to the centroid of its nearest cluster is less than the cluster width parameter w, the traffic sample shares the label, either normal or anomalous, of its nearest cluster. If the distance from s to the nearest cluster is greater than w, then s lies in a less dense region of the feature space and is labeled as anomalous. Compared with our IDS module, [14] has higher system complexity due to nonlinear pattern recognition, whereas the proposed IDS is simple, using association rules to comply with the anomaly profiling. Similarly, [12] has message overhead and as a result consumes more power, straining the battery, while the proposed IDS consumes less energy by adopting association rules.

CONCLUSIONS AND FUTURE WORK

An efficient intrusion detection mechanism based on anomaly detection, utilizing a cluster data mining technique, has been presented in this paper. We expect our proposed cross-layer-based intrusion detection architecture to detect DoS attacks, sinkhole attacks at different layers of the protocol stack, and various types of UDP flooding attacks in an efficient way.

Future work will involve implementing the proposed architecture with the fixed-width algorithm, carrying out simulations using the NS-2 simulator, and analyzing the results.

REFERENCES

[1] S. Jacobs, S. Glass, T. Hiller, and C. Perkins, "Mobile IP authentication, authorization, and accounting requirements," Request for Comments 2977, Internet Engineering Task Force, October 2000.
[2] K. Sanzgiri, B. Dahill, B.N. Levine, E.B. Royer, and C. Shields, "A Secure Routing Protocol for Ad-hoc Networks," in Proceedings of the International Conference on Network Protocols (ICNP), 2002.
[3] Yih-Chun Hu, Adrian Perrig, and David Johnson, "Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks," in Proceedings of MobiCom, 2002.
[4] Y. Zhang, W. Lee, and Y.-A. Huang, "Intrusion Detection Techniques for Mobile Wireless Networks," ACM J. Wireless Networks, pp. 545-556, 2003.
[5] Y. Huang, W. Fan, W. Lee, and P.S. Yu, "Cross-feature analysis for detecting ad-hoc routing anomalies," in Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS), Providence, pp. 478-487, 2003.
[6] A. Mishra, K. Nadkarni, and A. Patcha, "Intrusion Detection in Wireless Ad-Hoc Networks," IEEE Wireless Communications, pp. 48-60, February 2004.
[7] Kisung Kim and Sehun Kim, "A Sinkhole Detection Method based on Incremental Learning in Wireless Ad Hoc Networks," 2008.
[8] C. Loo, M. Ng, C. Leckie, and M. Palaniswami, "Intrusion Detection for Routing Attacks in Sensor Networks," International Journal of Distributed Sensor Networks, pp. 313-332, October-December 2006.
[9] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, "A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data," in Applications of Data Mining in Computer Security, Kluwer, 2002.
[10] A. Goldsmith and S.B. Wicker, "Design challenges for energy-constrained ad hoc wireless networks," IEEE Wireless Communications, vol. 9, no. 4, pp. 8-27, August 2002.
[11] C.J. John Felix, A. Das, B.C. Seet, and B.S. Lee, "Cross Layer versus Single Layer Approaches for Intrusion Detection in MANET," in IEEE International Conference on Networks, Adelaide, pp. 194-199, November 2007.
[12] J.S. Baras and S. Radosavac, "Attacks and Defenses Utilizing Cross-Layer Interactions in MANET," in Workshop on Cross-Layer Issues in the Design of Tactical Mobile Ad Hoc Wireless Networks: Integration of Communication and Networking Functions to Support Optimal Information Management, Washington, DC, June 2004.
[13] L. Yu, L. Yang, and M. Hong, "Short Paper: A Distributed Cross-Layer Intrusion Detection System for Ad Hoc Networks," in Proceedings of the 1st International Conference on Security and Privacy for Emerging Areas in Communication Networks, Athens, Greece, pp. 418-420, September 2005.
[14] C.J. John Felix, A. Das, B.C. Seet, and B.-S. Lee, "CRADS: Integrated Cross Layer Approach for Detecting Routing Attacks in MANETs," in IEEE Wireless Communications and Networking Conference (WCNC), Las Vegas, CA, USA, pp. 1525-1530, March 2008.
[15] R. Srikant, "Fast algorithms for mining association rules and sequential patterns," PhD Thesis, University of Wisconsin, Madison, 1996.
[16] S.J. Hua and M.C. Xiang, "Anomaly Detection Based on Data-Mining for Routing Attacks in Wireless Sensor Networks," in Second International Conference on Communications and Networking in China (CHINACOM '07), pp. 296-300, August 2007.
[17] R. Shrestha, K.H. Han, J.Y. Sung, K.J. Park, D.Y. Choi, and S.J. Han, "An Intrusion Detection System in Mobile Ad-Hoc Networks with Enhanced Cross Layer Features," KICS Conference, Suncheon University, pp. 264-268, May 2009.
[18] A. Patcha and J.M. Park, "An overview of anomaly detection techniques: existing solutions and latest technological trends," Elsevier Computer Networks, vol. 51, no. 12, pp. 3448-3470, 2007.
[19] L. Portnoy, E. Eskin, and S. Stolfo, "Intrusion detection with unlabeled data using clustering," in Proceedings of the Workshop on Data Mining for Security Applications, November 2001.
[20] C. Loo, M. Ng, C. Leckie, and M. Palaniswami, "Intrusion Detection for Routing Attacks in Sensor Networks," International Journal of Distributed Sensor Networks, pp. 313-332, October-December 2006.
Abstract— Designing cost-efficient, secure network protocols for Wireless Sensor Networks (WSNs) is a challenging problem because sensors are resource-limited wireless devices. Since communication cost is the most dominant factor in a sensor's energy consumption, we introduce a wireless sensor network security scheme using virtual-energy-based encryption for WSNs that significantly reduces the number of transmissions needed for rekeying to avoid stale keys. The key to the hashing function dynamically changes as a function of the transient energy of the sensor, thus eliminating the need for rekeying; multiple sensing nodes each use their own authentication key. Beyond the goal of saving energy, minimal transmission is imperative for some military applications of WSNs where an adversary could be monitoring the wireless spectrum. Energy-Based Encryption and Keying is a secure communication framework where sensed data is encoded using a scheme based on a permutation code generated via the RC4 encryption mechanism. The key to the RC4 encryption mechanism dynamically changes as a function of the residual virtual energy of the sensor. Thus, a one-time dynamic key is employed for one packet only, and different keys are used for the successive packets of the stream. The intermediate nodes along the path to the sink are able to verify the authenticity and integrity of the incoming packets using a predicted value of the key generated by the sender's virtual energy, thus requiring no specific rekeying messages. Energy-Based Encryption and Keying is able to efficiently detect and filter false data injected into the network by malicious outsiders.

Keywords— Wireless Sensor Networks, Security in Wireless Sensor Networks, energy-based keying, resource-constrained devices.

I. INTRODUCTION

Today, WSNs are no longer a nascent technology, and future advances will bring more sensor applications into our daily lives as well as into many diverse and challenging application scenarios. For example, in a battlefield scenario, sensors may be used to detect the location of enemy sniper fire or to detect harmful chemical agents before they reach troops. In another potential scenario, sensor nodes forming a network under water could be used for oceanographic data collection, pollution monitoring, assisted navigation, military surveillance, and mine reconnaissance operations. The use of sensors will also evolve from merely capturing data to systems that can be used for real-time compound event alerting, from surveillance of territorial waters to biological and chemical attack detection.

In this regard, designing secure protocols for wireless sensor networks is vital. However, designing secure protocols for WSNs first requires a detailed understanding of WSN technology and its relevant security aspects. Compared to other wireless networking technologies, WSNs have unique characteristics that need to be taken into account when building protocols. Among many factors, the available resources (i.e., power, computational capacity, and memory) onboard the sensor nodes are severely limited.
In this paper, we focus on keying mechanisms for WSNs. There are two fundamental key management schemes for WSNs: static and dynamic. In static key management schemes, key management functions (i.e., key generation and distribution) are handled statically; that is, the sensors have a fixed number of keys loaded either prior to or shortly after network deployment. On the other hand, dynamic key management schemes perform keying functions (rekeying) either periodically or on demand as needed by the network, and the sensors dynamically exchange keys to communicate. Although dynamic schemes are more attack-resilient than static ones, one significant disadvantage is that they increase the communication overhead due to keys being refreshed or redistributed from time to time in the network. There are many reasons for key refreshment, including updating keys after a key revocation has occurred, refreshing keys so that they do not become stale, or changing keys due to dynamic changes in the topology. In this paper, we seek to minimize the overhead associated with refreshing keys to avoid them becoming stale, because the communication cost is the most dominant factor in a sensor's energy consumption.

The purpose of this paper is to develop an efficient and secure communication framework for WSN applications. The secure communication framework provides a technique to verify data in line and drop false packets from malicious nodes, thus maintaining the health of the sensor network. Wireless sensor network security using virtual-energy-based encryption dynamically updates keys without exchanging messages for key renewals and embeds integrity into packets, as opposed to enlarging the packet by appending message authentication codes (MACs). Specifically, each item of sensed data is protected using a simple encoding scheme based on a permutation code generated with the RC4 encryption scheme and sent toward the sink. The key to the encryption scheme dynamically changes as a function of the residual virtual energy of the sensor, thus eliminating the need for rekeying. Therefore, a one-time dynamic key is used for one message generated by the source sensor, and different keys are used for the successive packets of the stream. The nodes forwarding the data along the path to the sink are able to verify the authenticity and integrity of the data and to provide nonrepudiation. The protocol is able to continue its operations under dire communication conditions, as it may be operating in a highly error-prone deployment area such as under water.

The contributions of this paper are as follows:
1. a dynamic en route filtering mechanism that does not exchange explicit control messages for rekeying;
2. provision of one-time keys for each packet transmitted, to avoid stale keys;
3. a modular and flexible security architecture with a simple technique for ensuring authenticity, integrity, and nonrepudiation of data without enlarging packets with MACs; and
4. a robust secure communication framework that is operational in dire communication situations and over unreliable medium access control layers.

This paper is structured as follows: Section 2 presents the background and motivation for the concepts discussed through the rest of the paper, followed by Sections 3, 4, 5, and finally Section 6.

2 BACKGROUND AND MOTIVATION

One significant aspect of confidentiality research in WSNs entails designing efficient key management schemes. This is because, regardless of the encryption mechanism chosen for WSNs, the keys must be made available to the communicating nodes (e.g., sources and sink(s)). The keys could be distributed to the sensors before network deployment, or they could be redistributed (rekeying) to nodes on demand as triggered by keying events. The former is static key management and the latter is dynamic key management. There are myriad variations of these basic schemes in the literature. The main motivation behind wireless sensor network security using virtual-energy-based encryption is that communication cost is the most dominant factor in a sensor's energy consumption.

Rekeying with control messages is the approach of existing dynamic keying schemes, whereas rekeying without extra control messages is the primary feature of this framework. Dynamic keying schemes go through the phase of rekeying either periodically or on demand as needed by the network to refresh the security of the system. With rekeying, the sensors dynamically exchange keys that are used for securing the communication. Hence, the energy cost function for the keying process from a source sensor to the sink, while sending a message on a particular path with dynamic key-based schemes, can be written as follows (assuming the computation cost, E_comp, is approximately fixed):

E_Dyn = (E_Kdisc + E_comp) * E[n_h] * X / T    (1)
Fig. 1. Keying cost of dynamic key-based schemes based on E[n_h].
Fig. 2. Modular structure of wireless sensor network security using virtual-energy-based encryption.

3 SEMANTICS OF EBEK

The EBEK framework comprises three modules: Energy-Based Keying, Crypto, and Forwarding. The energy-based keying process involves the creation of dynamic keys; contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on its residual virtual energy. The key is then fed into the crypto module. The crypto module employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. The encoding is a simple encryption mechanism adopted for EBEK; however, EBEK's flexible architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. Last, the forwarding module handles the sending and receiving of encoded packets along the path to the sink. A high-level view of the EBEK framework and its underlying modules is shown in Fig. 2. These modules are explained in further detail below; important notations used are given in Table 1.

4 RELATED WORK

En route dynamic filtering of malicious packets has been the focus of several studies, including DEF by Yu and Guan. As the details are given in the performance evaluation section, where they are compared with the VEBEK framework, the reader is referred to that section so as not to replicate the same information here. Moreover, Ma's work applies the same filtering concept at the sink and utilizes packets with multiple MACs appended. A work proposed by Hyun and Kim uses relative location information to make compromised data meaningless and to protect the data without cryptographic methods. Using static pairwise keys and two MACs appended to the sensor reports, "an interleaved hop-by-hop authentication scheme for filtering of injected false data" was proposed by Zhu et al. to address both insider and outsider threats. However, the common downside of all these schemes is that they are complicated for resource-constrained sensors, and they either utilize many keys or transmit many messages in the network, which increases the energy consumption of WSNs. Also, these studies have not been designed to handle dire communication scenarios, unlike VEBEK. Another significant observation about all of these works is that a realistic energy analysis of the protocols was not presented.

Last, the concept of dynamic energy-based encoding and filtering was originally introduced by the DEEF framework, and EBEK has been largely inspired by DEEF. However, VEBEK improves on DEEF in several ways. First, VEBEK utilizes virtual energy in place of actual battery levels to create dynamic keys. EBEK's approach is more reasonable because, in real life, battery levels may fluctuate, and differences in battery levels across nodes may cause synchronization problems, which can lead to packet drops. Second, VEBEK integrates the handling of communication errors into its logic, which is missing in DEEF. Last, EBEK is implemented on top of a realistic WSN routing protocol, i.e., Directed Diffusion, while DEEF treats the topic only theoretically. Another crucial idea of this paper is the notion of sharing a dynamic cryptic credential (i.e., virtual energy) among the sensors. A similar approach was suggested in the SPINS study via the SNEP protocol; in particular, nodes share a secret counter when generating keys, and it is updated for every new key. However, the SNEP protocol does not consider packets dropped in the network due to communication errors. Although another study, MiniSec, recognizes this issue, the solution it suggests still increases the packet size by including parts of a counter value in the packet structure. Finally, one useful pertinent work surveys cryptographic primitives and implementations for sensor nodes.

5 CONCLUSION AND FUTURE WORK

Communication is very costly for wireless sensor networks (WSNs) and for certain WSN applications. Independent of the goal of saving energy, it may be very important to minimize the exchange of messages (e.g., in military scenarios). To address these concerns, we presented a secure communication framework for WSNs called Energy-Based Encryption and Keying. In comparison with other key management schemes, EBEK has the following benefits: 1) it does not exchange control messages for key renewals and is therefore able to save more energy and is less chatty;
2) it uses one key per message, so successive packets of the stream use different keys, making VEBEK more resilient to certain attacks (e.g., replay attacks, brute-force attacks, and masquerade attacks); and 3) it unbundles key generation from security services, providing a flexible modular architecture that allows for easy adoption of different key-based encryption or hashing schemes. We have evaluated EBEK's feasibility and performance through both theoretical analysis and simulations. Our results show that the different operational modes of EBEK (I and II) can be configured to provide optimal performance in a variety of network configurations, depending largely on the application of the sensor network. We also compared the energy performance of our framework with other en route malicious data filtering schemes. Our results show that VEBEK performs better (in the worst case, a 60 to 100 percent improvement in energy savings) than the others while providing support for communication error handling, which was not the focus of earlier studies. Our future work will address insider threats and dynamic paths.

6 REFERENCES

[1] S. Uluagac, C. Lee, R. Beyah, and J. Copeland, "Designing Secure Protocols for Wireless Sensor Networks," Wireless Algorithms, Systems, and Applications, vol. 5258, pp. 503-514, Springer, 2008.
[2] H. Hou, C. Corbett, Y. Li, and R. Beyah, "Dynamic Energy-Based Encoding and Filtering in Sensor Networks," Proc. IEEE Military Comm. Conf. (MILCOM '07), Oct. 2007.
[3] F. Ye, H. Luo, S. Lu, and L. Zhang, "Statistical En-Route Filtering of Injected False Data in Sensor Networks," IEEE J. Selected Areas in Comm., vol. 23, no. 4, pp. 839-850, Apr. 2005.
[4] Z. Yu and Y. Guan, "A Dynamic En-Route Scheme for Filtering False Data Injection in Wireless Sensor Networks," Proc. IEEE INFOCOM, pp. 1-12, Apr. 2006.
[5] C. Krauß, M. Schneider, K. Bayarou, and C. Eckert, "STEF: A Secure Ticket-Based En-Route Filtering Scheme for Wireless Sensor Networks," Proc. Second Int'l Conf. Availability, Reliability and Security (ARES '07), pp. 310-317, Apr. 2007.
[6] S. Zhu, S. Setia, S. Jajodia, and P. Ning, "An Interleaved Hop-by-Hop Authentication Scheme for Filtering of Injected False Data in Sensor Networks," Proc. IEEE Symp. Security and Privacy, 2004.
[7] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, "SPINS: Security Protocols for Sensor Networks," Proc. ACM MobiCom, 2001.
[8] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless Sensor Networks: A Survey," Computer Networks, vol. 38, no. 4, pp. 393-422, Mar. 2002.
[9] C. Vu, R. Beyah, and Y. Li, "Composite Event Detection in Wireless Sensor Networks," Proc. IEEE Int'l Performance, Computing, and Comm. Conf. (IPCCC '07), Apr. 2007.
[10] L. Eschenauer and V.D. Gligor, "A Key-Management Scheme for Distributed Sensor Networks," Proc. Ninth ACM Conf. Computer and Comm. Security, pp. 41-47, 2002.
[11] M. Eltoweissy, M. Moharrum, and R. Mukkamala, "Dynamic Key Management in Sensor Networks," IEEE Comm. Magazine, vol. 44, no. 4, pp. 122-130, Apr. 2006.
[12] K. Akkaya and M. Younis, "A Survey on Routing Protocols for Wireless Sensor Networks," Ad Hoc Networks, vol. 3, no. 3, pp. 325-349, 2005.
Sites Referred
1) http://en.wikipedia.org/wiki/Sensor_Networks
2) http://searchdatacenter.techtarget.com/definition/sensor-network
Abstract: Image registration is a fundamental operation in image analysis. Conventionally, ensembles (image sets) are registered by choosing one image as a template, and every other image is registered to it. The problem with this pairwise approach is that the results depend on which image is chosen as the template. Since different sensors create images with different features, the issue is particularly acute for multi-sensor ensembles. The problem of registration becomes more difficult when the images come from different sources. This paper addresses the question of how to register more than two images. We present a method that employs clustering to simultaneously register an entire ensemble of images. The method performs clustering in the JISP, jointly modeling the distribution of points in the JISP as it estimates the motion parameters. The method computes the registration solution and, at the same time, generates a model of the transfer functions among the images of the ensemble.

Index Terms—registration, multi-sensor, multi-image, mutual information, Gaussian mixture models.

I. INTRODUCTION

Image registration is the process of transforming different sets of data into one coordinate system. The data may be multiple photographs, or data from different sensors, from different times, or from different viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and in compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements.

This paper addresses the question of how to register more than two images. Suppose you have several images, all of the same content, and you want to register them all together. We call this collection of images an ensemble. The vast majority of registration methods are designed to register only two images at a time, and it is not clear how to use these pairwise methods for ensemble registration.

The problem of registration becomes more difficult when the images come from different sources. For example, a body part could be imaged with different modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET); a region of the earth could be captured by satellite imagery using a variety of different sensors; or several images of a face could be acquired under different illumination conditions. In these cases, the image intensities cannot be compared directly because, although the images depict the same content, they do so with different transfer functions. We refer to such registration problems as multisensor registration.

Pairwise ensemble registration has the undesirable property that the solution depends on which pairs of images are chosen and registered. We will refer to this issue as selection dependency. In addition, most pairwise registration methods do not offer a way to guarantee that redundancy in the solution is consistent.
Fig. 1. Success rate of the existing pairwise registration method.

Consider a pairwise method that registers phantom image A to B and B to C. By composing those two transformations, one can derive a transformation from A to C. However, it is extremely unlikely that registering A to C will yield exactly the same transformation. We refer to this phenomenon as internal inconsistency.

We hypothesize that a registration strategy that registers all the images simultaneously can avoid both selection dependency and internal inconsistency. That is, including all the images in a single, global registration problem precludes the need to choose which pairs to register, while generating a solution that is not redundant (and, thus, is internally consistent). Moreover, we hypothesize that the statistical power of using all the images at the same time, rather than just two at a time, will yield more accurate registration solutions.

In this paper, we present a method that employs clustering to simultaneously register an entire ensemble of images. The method computes the registration solution, and at the same time generates a model of the transfer functions among the images of the ensemble.

A. Goals

The major contributions are summarized as follows:

3) Image transformation. Joint alignment for an image ensemble can rectify images in the spatial domain such that the aligned images are as similar to each other as possible. This important technology has been applied to various object classes and medical applications. However, previous approaches to joint alignment work on an ensemble of a single object class. Given an ensemble with multiple object classes, we propose an approach to automatically and simultaneously solve two problems, image alignment and clustering. Both the alignment parameters and the clustering parameters are formulated into a unified objective function, whose optimization leads to an unsupervised joint estimation approach. It is further extended to semi-supervised simultaneous estimation where a few labeled images are provided. Extensive experiments on diverse real-world databases demonstrate the capabilities of our work on this challenging problem.

4) Image recognition. The recognition of members of certain object classes, such as faces or cars, can be substantially improved by first transforming a detected object into a canonical pose. Such alignment reduces the variability that a classifier/recognizer must cope with in the modeling stage. Given a large set of training images, one popular alignment approach is called congealing, which jointly estimates the alignment/warping parameters in an unsupervised manner for each image in an ensemble. It has been shown that congealing can be reliably performed for faces and that it improves appearance-based face recognition performance. The conventional congealing approach works on an image ensemble of a single object class. However, in practice we often encounter situations where multiple object classes, or object modes, are exhibited in an ensemble.

II. BACKGROUND
Consider two images, one overlaid on the other. Each pixel corresponds to two intensity values, one from each of the two images. This 2-tuple can be plotted in the joint intensity space, where each axis corresponds to intensity from one of the images. Plotting the points for all the pixels creates a scatter plot in this joint intensity space, and we refer to this scatter plot as the joint intensity scatter plot, or JISP.

The implicit assumption linking different images of the same object is that they are recognizable as the same object because of some consistency by which intensities are assigned to components in the image. The pixels with intensities near x in one image often correspond to pixels with intensities near y in the other image. We call this correspondence an intensity mapping. An intensity mapping need not be one-to-one.

Each object in an image corresponds to a coherent collection of points in the JISP. As two images are moved out of register, the spatial correspondence of objects in the images gets disturbed, causing the coherence of the JISP to be disrupted. The clusters and swaths of scatter points spread out and move around because some bone pixels are now paired with
muscle pixels, others with fat pixels, etc. Intensity-based multi-sensor image registration is based on this observation. The objective is to move the images until the JISP is optimally coherent, or minimally dispersed.

One of the most successful applications of this idea uses the entropy of the joint histogram to quantify dispersion. Given the JISP between two images, one forms a joint histogram to reflect the density of points in the scatter plot. One can compute the entropy of this histogram. The lower the entropy, the more compact and tightly clustered the scatter plot, and hence the more closely registered the two images.

The same idea can be applied to ensemble registration. The problem with the entropy-based methods is that they do not scale well for registration with more than two images. The joint histogram is an intermediary to those cost functions, and as you add more images to the problem, the number of histogram bins increases exponentially. For example, the joint histogram among five images, with each axis partitioned into 64 bins, has 64^5 = 2^30 bins (over 1 billion). With 256 intensity bins per image, it gives us 256^5 = 2^40 bins (over 1 trillion). Hence, these histogram-based methods are infeasible for ensemble registration.
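For the two-image case, the joint-histogram entropy is easy to compute; the following is a minimal numpy sketch (function and parameter names are ours, not the paper's):

import numpy as np

def joint_histogram_entropy(img_a, img_b, bins=64):
    # Joint histogram of the two overlaid images (the binned JISP).
    h, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = h / h.sum()
    p = p[p > 0]  # empty bins contribute nothing to the entropy
    # Lower entropy = more compact scatter plot = better registration.
    return float(-(p * np.log2(p)).sum())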
Some registration methods measure the dispersion in the JISP without the need to form the joint histogram. The dispersion is quantified as the length of a minimum-length spanning tree on the joint intensity scatter plot. Roche et al. model the clusters in the JISP as a polynomial, thus assuming a functional relation between the intensities in the two images; this is often not the case. Some other ensemble registration methods have recently emerged in the literature. However, these methods have not been demonstrated on multi-sensor image ensembles, but rather focus on the problem of registering a set of images from the same modality to form a template (or so-called atlas). A different domain-specific method was designed to simultaneously register sets of brain MR images, but it relies on the use of a human brain atlas to perform tissue classification, and then aligns the tissue-classification images. Finally, one method jointly registers and clusters a set of motion-corrupted images, automatically grouping images by similarity. However, that method assumes that the set of images is composed of moved and noisy versions of a set of prototype images, so the registration of the images to their class archetype amounts to mono-modal registration. Hence, these methods are not suitable for general-purpose multi-sensor ensemble registration.

III. ENSEMBLE REGISTRATION METHOD
Our approach to minimizing the dispersion in the JISP involves two steps:
(i) Density Estimation
(ii) Motion Adjustment

Suppose we are registering an ensemble of D images. Then each pixel in our image domain has D values associated with it. We will refer to the vector of intensities for a single pixel as an "intensity vector", and denote the intensity vector for pixel x by Ix. Let us represent our density estimate by θ and the motion parameters by φ. If we model the pixels as spatially independent variables, the likelihood of observing the images can be written

L(φ, θ) = Πx p(Ix | φ, θ),

where p is a probability function (defined later) and x denotes a pixel in our image domain (usually a subset of R^2 or R^3). Thus, L(φ, θ) is the probability of observing the set of intensity vectors, given the distribution specified by θ.

Because of the form of L, it is easier to optimize its logarithm, log L, because the product over x turns into a sum:

log L(φ, θ) = Σx log p(Ix | φ, θ).

Our aim is to maximize log L(φ, θ). There are three steps in our algorithm:
Gaussian mixture model
Density Estimation
Motion Adjustment

F. Gaussian Mixture Model
The density of points in the JISP is modelled using a Gaussian mixture model. The mixture consists of K Gaussian components, each specified by a mean μk and a covariance matrix Σk. Then, for a single pixel location x, the likelihood of observing the intensity vector Ix is

p(Ix | θ) = Σk wk N(Ix; μk, Σk),

where the weights wk are non-negative and sum to one.
H. Motion Adjustment
…adjusted. Ten EM iterations are executed for each of these three phases. The algorithm has an expectation step that maps scatter points to clusters, followed by a maximization step that re-estimates the optimal clusters. The advantage of using this algorithm with a GMM is that each iteration has a closed-form, least-squares solution. Density estimation of the clusters is modeled as a Gaussian mixture model (GMM) and is established iteratively using an expectation-maximization (EM) method. The motion parameters are also solved using an iterative Newton-type method. The iterates of these two methods are interleaved, thereby solving the two problems (density estimation and registration) in synchrony.
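A self-contained EM sketch for the density-estimation half is given below (numpy/scipy assumed; names ours). The Newton-type update of the motion parameters, whose closed form depends on the chosen transformation model, would be interleaved between such EM runs:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(points, K, iters=10, seed=0):
    # points: (N, D) JISP scatter points; returns (weights, means, covs).
    rng = np.random.default_rng(seed)
    N, D = points.shape
    w = np.full(K, 1.0 / K)
    mu = points[rng.choice(N, size=K, replace=False)]  # init means from data
    cov = np.array([np.cov(points.T).reshape(D, D) + 1e-6 * np.eye(D)
                    for _ in range(K)])
    for _ in range(iters):
        # E-step: responsibility of each cluster for each scatter point.
        resp = np.column_stack([
            w[k] * multivariate_normal.pdf(points, mu[k], cov[k])
            for k in range(K)])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: closed-form re-estimation of the optimal clusters.
        Nk = resp.sum(axis=0)
        w = Nk / N
        mu = (resp.T @ points) / Nk[:, None]
        for k in range(K):
            d = points - mu[k]
            cov[k] = (resp[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
    return w, mu, cov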
V. SYSTEM ANALYSIS

A. Architecture

[Fig. 2. Architecture: input image → image conversion (IMG1–IMG4) → image template → image comparison → output image.]

The proposed system architecture consists of four main modules (objectives), namely, the user interface module, the image conversion module, the template module, and the recognition module.

B. Modules
A modular design reduces complexity, facilitates change (a critical aspect of software maintainability), and results in easier implementation by encouraging parallel development of different parts of the system. Software with effective modularity is easier to develop because functions may be compartmentalized and interfaces are simplified. Software architecture embodies modularity; that is, software is divided into separately named and addressable components, called modules, that are integrated to satisfy problem requirements.

The following are the modules of the project, planned to complete the project with respect to the proposed system while overcoming the existing system and providing support for future enhancement:
(i) User interface design
(ii) Image conversion
(iii) Recognition
(iv) Output

We design the windows for the project in the user interface design. These windows are used to give input to the process and to display the output. Recognition is performed at the level of informative features extracted from images of objects. The features are combined in vectors called templates. Templates of different objects are stored in a library.

VI. EMPIRICAL EVALUATION

A. Satellite images
Out of the 300 pairwise registrations (10 trials, each with 30 registration pairs), the initial average error for the unregistered images was 15.9 pixels. The ensemble clustering registration method failed on 20 of them. The pairwise clustering method failed on 150 pairs (50%), and FLIRT's correlation ratio method failed on 37 of the pairs (12%). Both clustering methods used a multiresolution framework with scales 20%, 50%, and 100%.

It is worth noting that the 20 misregistration cases for the ensemble method were the result of only two registration failures. In each of two trials, one of the six images failed to converge to the other five (and vice versa), and was thus recorded as 10 misregistered image pairs. Of the successful cases, the ensemble clustering method had a mean error of 0.31, while the pairwise clustering method and FLIRT's CR method reported 0.65 and 0.41, respectively.

This method is generally used to register images from different sensors.
VII. DISCUSSION
Our method can be viewed as a parametric regression method, with the number of parameters dictated by the number of Gaussian components. The clustering registration method scales linearly with the number of Gaussian components (k) and the number of pixels (N). However, the computation time is proportional to the square of the number of motion parameters (M) and the cube of the number of images (D), because of the matrix products in (17). More precisely, the method has computational complexity O(kNM^2 D^3).

REFERENCES
[1] M. Jenkinson and S. Smith, "A global optimisation method for robust affine registration of brain images," Med Image Anal, vol. 5, no. 2, pp. 143-156, 2001.
[2] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens, and G. Marchal, "Automated multi-modality image registration based on information theory," in Proceedings of Info Proc Med Imaging, Y. Bizais, C. Barillot, and R. Di Paola, Eds., 1995, pp. 263-274.
*G. Shoba, B.E., M.E.    **S. Uma, B.E., M.E.
…encrypted data is entirely dependent on two things: the strength of the cryptographic algorithm and the secrecy of the key.

[Figure: Plain Text → Encryption → Cipher Text]

Triple-DES, or DES-EDE (encrypt-decrypt-encrypt), uses three applications of DES and two independent DES keys to produce an effective key length of 168 bits. IDEA uses a fixed-length, 128-bit key (larger than DES but smaller than Triple-DES); it is also faster than Triple-DES. Others use variable-length keys and are claimed to be even faster than IDEA. Despite the efficiency of symmetric key cryptography, it has a fundamental weak spot: key management. Since the same key is used for encryption and decryption, it must be kept secure. If an adversary knows the key, then the message can be decrypted. At the same time, the key must be available to the sender and the receiver, and these two parties may be physically separated. Symmetric key cryptography transforms the problem of transmitting messages securely into that of transmitting keys securely. This is an improvement, because keys are much smaller than messages, and the keys can be generated beforehand. Nevertheless, ensuring that the sender and receiver are using the same key, and that potential adversaries do not know this key, remains a major stumbling block. This is referred to as the key management problem.
…is also true: to encrypt with the private key means you can decrypt only with the public key.

Hash functions
A hash function is a type of one-way function; these are fundamental for much of cryptography. A one-way function is a function that is easy to calculate but hard to invert: it is difficult to calculate the input to the function given its output. The precise meanings of "easy" and "hard" can be specified mathematically. With rare exceptions, almost the entire field of public key cryptography rests on the existence of one-way functions.

In this application, functions are characterized and evaluated in terms of their ability to withstand attack by an adversary. More specifically, given a message x, if it is computationally infeasible to find a message y not equal to x such that H(x) = H(y), then H is said to be a weakly collision-free hash function. A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).

The requirements for a good cryptographic hash function are stronger than those in many other applications (error correction and audio identification not included). For this reason, cryptographic hash functions make good stock hash functions, even functions whose cryptographic security is compromised, such as MD5 and SHA-1; the SHA-2 algorithm, however, has no known compromises. A hash function can also be described as a function with certain additional security properties, making it suitable for use as a primitive in various information security applications, such as authentication and message integrity. It takes a long string (or message) of any length as input and produces a fixed-length string as output, sometimes termed a message digest or a digital fingerprint.

2.2 SHA-2
The SHA-2 functions implement the NIST Secure Hash Standard. They are used to generate a condensed representation of a message, called a message digest, suitable for use as a digital signature. There are three families of functions, with names corresponding to the number of bits in the resulting message digest. The SHA-256 functions are limited to processing a message of less than 2^64 bits as input. The SHA-384 and SHA-512 functions can process a message of at most 2^128 - 1 bits as input. The SHA-2 functions are considered to be more secure than the SHA-1 functions, with which they share a similar interface.

The 256-, 384-, and 512-bit versions of SHA-2 share the same interface. SHA-256 and SHA-512 are novel hash functions computed with 32- and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds. SHA-224 and SHA-384 are simply truncated versions of the first two, computed with different initial values. The SHA-2 functions are not as widely used as SHA-1, despite their better security.
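For illustration, the SHA-2 family is available through Python's standard hashlib module; this small sketch (the message is our example) prints each digest and its size:

import hashlib

message = b"example message"
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message).hexdigest()
    # hex length x 4 = digest size in bits (224, 256, 384, 512)
    print(name, 4 * len(digest), "bits:", digest)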
3. Steganography
The objective of this work is to develop a compressed-video steganographic scheme that can provide provable security with high computing speed, and that embeds secret messages into images without producing noticeable changes. Here we embed data in video frames. A video can be viewed as a sequence of still images, so data embedding in videos seems very similar to embedding in images. However, there are many differences between data hiding in images and in videos; the first important difference is the size of the host media. Since videos contain more samples (pixels, or transform-domain coefficients), a video has a higher capacity than a still image and more data can be embedded in it. Also, there are some characteristics of videos which cannot be found in images: perceptual redundancy in videos is due to their temporal features.

[Fig 2. Steganography process: original image/frame + cipher text → embed → stego image.]

Here the data hiding operations are executed entirely in the compressed domain. On the other hand, when a really high amount of data must be embedded
in the case of video sequences, there is a more demanding constraint on the real-time effectiveness of the system. The method utilizes the characteristic of human vision's sensitivity to color value variations. The aim is to offer safe exchange of color stego video across the internet that is resistant to all the steganalysis methods, such as statistical and visual analysis.

Image-based and video-based steganographic techniques are mainly classified into spatial-domain and frequency-domain based methods. The former embedding techniques are LSB, matrix embedding, etc. Two important parameters for evaluating the performance of a steganographic system are capacity and imperceptibility. Capacity refers to the amount of data that can be hidden in the cover medium so that no perceptible distortion is introduced. Imperceptibility, or transparency, represents the invisibility of the hidden data in the cover media without degrading the perceptual quality by data embedding.

This architecture consists of four functions: I, P and B frame extraction, the scene change detector, motion vector calculation, and the data embedder and steganalysis. The details of data embedding in P and B frames are as follows:
1. For each P and B frame, motion vectors are extracted from the bitstream.
2. The magnitude of each motion vector is calculated as |MVj| = sqrt(Hj^2 + Vj^2), where MVj is the motion vector of the jth macroblock, and Hj and Vj are the horizontal and vertical components of the MV, respectively.
3. This magnitude is compared with a threshold.
4. The block with the maximum magnitude is selected, and the data is embedded using the PVD method (see the sketch after this section).

To increase the capacity of the hidden secret information and to provide a stego image imperceptible to human vision, pixel-value differencing (PVD) is used for embedding.
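A minimal sketch of steps 2-4 above (the function name is ours; the PVD embedding itself is a separate step and is not shown):

import math

def select_embedding_block(motion_vectors, threshold):
    # motion_vectors: list of (H_j, V_j) component pairs, one per macroblock.
    best_j, best_mag = None, threshold
    for j, (h, v) in enumerate(motion_vectors):
        mag = math.sqrt(h * h + v * v)  # |MV_j| = sqrt(H_j^2 + V_j^2)
        if mag > best_mag:
            best_j, best_mag = j, mag
    # Index of the macroblock to embed into (None if all below threshold).
    return best_j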
Abstract—Object detection and classification are necessary components in an artificially intelligent autonomous system. It is expected that these artificially intelligent autonomous systems will venture onto the streets of the world, thus requiring detection and classification of car objects commonly found on the street. The identification and classification of objects in an image should be fast and accurate. The aim of our project is to detect the object as soon as possible with better accuracy and improved performance even if the object varies in appearance. Object identification and classification is a challenging process when objects of the same category appear with large variation. Though a number of papers deal with appearance variation, the object detection process is considered to be slower. In our proposed work, we tend to detect the object as quickly as possible and improve the detection speed by using optimized detectors, i.e., a small subset of detectors for the given input. Also, we detect the multi-posed vehicle for small variations of the rotation angle. Moreover, we avoid false detection when objects are close to one another.

Keywords—Detection, Classification, Multi-posed vehicle

I. INTRODUCTION
Given an input image, object detection is to determine whether or not the specified object is present. Object detection is a very complex problem that includes some real hardcore math and long tuning of parameters to the computation methods. In our project, the objects in question are vehicles, i.e., cars. Object detection and classification are necessary components in an artificially intelligent autonomous system. Especially, object classification plays a major role in applications such as security systems, traffic surveillance systems, target identification, etc. It is expected that these artificially intelligent autonomous systems will venture onto the streets of the world, thus requiring detection and classification of car objects commonly found on the street. In reality, these classification systems face two types of problems: i) objects of the same category with large variation in appearance; ii) objects under different viewing conditions like occlusion and complex backgrounds containing buildings, trees, people, road views, etc. This paper tries to bring out the importance of eliminating the background as early as possible. Thus the background is removed and the image is fed to a small subset of detectors to improve the speed of object detection.

The existing system deals with a whole bank of detectors for the given input image. Then, during object detection, we tend to avoid false detection. There are three main contributions of our object detection framework. The first contribution is a new image representation called an integral image that allows for very fast feature evaluation. The integral image can be computed from an image using a few operations per pixel. The second contribution is a method for constructing a classifier by selecting a small number of important features using AdaBoost. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance. The third major contribution is a method for combining successively more complex classifiers in a cascade structure, which dramatically increases the speed of the detector by focussing attention on promising regions of the image.
Our object detection procedure classifies images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data. For this system there is also a second critical motivation for features: the feature-based system operates much faster than a pixel-based system.

The research on object detection and recognition focuses on:
1) Representation: how to represent an object.
2) Learning: machine learning algorithms to learn the common property of a class of objects.
3) Recognition: identifying the object in an image using the learned models.

In our proposed work, we tend to detect the object as quickly as possible and improve the detection speed by using optimized detectors, i.e., a small subset of detectors for the given input. Also, we detect the multi-posed vehicle for small variations of the rotation angle. Moreover, we avoid false detection when objects are close to one another.

Phase I is divided into four modules; they are:
1. Background Subtraction Module
2. Feature Extraction and Feature Selection Module
3. Training Module
4. Object Classification Module

In the Background Subtraction Module, we first convert … For feature extraction, we used two methods to compare their efficiency, i.e., Principal Component Analysis (PCA) and Histogram of Oriented Gradients (HOG). After extracting the features using these two methods, we perform feature selection using the Adaptive Boosting (AdaBoost) technique, obtaining the relevant features. In the Training Module, the relevant features are used to train the classifier. We perform training with 100 car images and 100 non-car images; the trained features are then classified. In the Classification Module, the Support Vector Machine (SVM) classifier is used to classify the objects. The trained features are divided into two classes, to distinguish car images from non-car images. After classification, the query image, i.e., the image to be tested, is given as input. Its features are extracted and used as the test features, and classification is done likewise. The object is then identified by performing the above process.
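A condensed sketch of the HOG + SVM path of this pipeline, assuming scikit-image and scikit-learn (names are ours; the AdaBoost feature-selection stage and background subtraction are omitted):

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

def train_and_classify(car_imgs, noncar_imgs, query_img, size=(64, 64)):
    # HOG feature vector for each grayscale image, at a common size.
    feats = lambda imgs: np.array([hog(resize(im, size)) for im in imgs])
    X = np.vstack([feats(car_imgs), feats(noncar_imgs)])
    y = np.array([1] * len(car_imgs) + [0] * len(noncar_imgs))
    clf = SVC(kernel="linear").fit(X, y)  # two classes: car / non-car
    return int(clf.predict(feats([query_img]))[0])  # 1 = car, 0 = non-car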
…the Training set is of about 100 images and the Testing set is of about 70 images.

Table 5.1 Predicted results based on HOG features
Images           Tested   Correct   Wrong
Car images       50       50        0
Non-car images   20       16        4
Total            70       66        4
Accuracy                  94%       6%

Predicted results based on PCA features:
Non-car images   20       13        7
Accuracy                  89%       11%
Comparing the HOG and PCA techniques, we found that HOG performs better than PCA. So we used the HOG technique for further testing the accuracy of object detection and classification.

REFERENCES
[1] Jie Cao, Li Li. "Vehicle Objects Detection of Video Images Based on Gray-Scale Characteristics". First International Workshop on Education Technology and Computer Science, pp. 937-940, 2009.
[2] B. Wu and R. Nevatia. "Cluster boosted tree classifier for multi-view multi-pose object detection". In Proc. IEEE International Conf. on Computer Vision, pp. 1-8, 2007.
[3] P. Viola and M. Jones. "Robust real time object detection". International Journal of Computer Vision, 57(2), pp. 137-154, 2004.
[4] A. Torralba, K. Murphy, and W. Freeman. "Sharing features: Efficient boosting procedures for multiclass object detection". In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, 2004.
[5] G. Shakhnarovich, P. Viola, and T. Darrell. "Fast pose estimation with parameter-sensitive hashing". In Proc. IEEE International Conf. on Computer Vision, pp. 5-8, 2003.
[6] C. Papageorgiou and T. Poggio. "A trainable system for object detection". International Journal of Computer Vision, 38(1), pp. 15-33, 2000.
1. INTRODUCTION
A wireless mesh network (WMN) is a communications network made up of radio nodes organized in a mesh topology. Wireless mesh networks often consist of mesh clients, mesh routers, and gateways. A mesh network is reliable and offers redundancy: when one node can no longer operate, the rest of the nodes can still communicate with each other.

1.1 Routing
Routing algorithms have used many different metrics to determine the best route. Sophisticated routing algorithms can base route selection on multiple metrics, combining them in a single (hybrid) metric. All the following metrics have been used:
Path length
Reliability
Delay
Bandwidth
Load

[Figure: system flow — user input → topology construction → dual decomposition → calculating available path (database-backed).]
…and attempts to associate this address with a next hop.

Route Flap Damping Algorithm
The dual decomposition method makes it possible to design a distributed routing scheme. However, there could be a route-flapping problem in the distributed scheme. To tackle this problem, we have suggested a dampening algorithm and have analyzed the performance of the algorithms. Route flap damping (RFD) plays an important role in maintaining the stability of Internet routing when receiving a route r with prefix d from peer j.

…starts Load Estimation. It takes the required information from the database and starts calculating the load; it uses Dijkstra's algorithm for distance calculation and increases the link cost in order to overcome the overloaded situation. The Route Flap Damping Algorithm is used to do proper load sharing. After the Load Estimation is done, it selects the best path and sends the message to the destination.
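A minimal sketch of the Dijkstra distance calculation used by the load-estimation step (the graph representation is our assumption):

import heapq

def dijkstra(graph, src):
    # graph: node -> {neighbor: link cost}; returns shortest distances.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Increasing the cost of an overloaded link before re-running dijkstra()
# steers subsequent path selections away from it.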
[3] http://www.springerlink.com/content/h7183162401gk741/
[4] http://www.linuxjournal.com/article/3345
[5] http://portal.acm.org/citation.cfm?id=1618241
[6] http://airccse.org/journal/jwmn/0210s5.pdf
[7] http://www.computer.org/portal/web/csdl/doi/10.1109/AINAW.2007.50
*saranya_munirathinam@yahoo.com, 9965107139
**suren.csc@gmail.com, 9043409185
***ssmani184@gmail.com, 9788736081
Sri Ram Engineering College
…it to some events that communicate with the computer. In our work we were trying to compensate people who have hand disabilities that prevent them from using the mouse, by designing an application that uses facial features (nose tip and eyes) to interact with the computer. The nose tip was selected as the pointing device because, being located in the middle of the face, it is more comfortable to track the mouse as the face moves. The eyes were used to simulate mouse clicks, so the user can fire click events by blinking.

While different devices have been used in HCI (e.g. infrared cameras, sensors, microphones), we used an off-the-shelf webcam that affords a moderate resolution and frame rate as the capturing device, in order to make the program affordable for all individuals. We present an algorithm that distinguishes true eye blinks from involuntary ones, and that detects and tracks the desired facial features precisely.

OBJECTIVES
The paper enables the user to select items present on the computer screen with the mouse, not directly with the hand but through the movement of his/her head. The main objectives of this paper are listed as follows:
1. The mouse pointer is driven by the nose tip of the human.
2. Selecting an item on the screen is done by blinking the left eye.
3. A right-click event of the mouse button is done by blinking the right eye.

PAPER DESCRIPTION

PROBLEM DEFINITION
This paper aims to present an application that is capable of replacing the traditional mouse with human facial features. Facial features (nose tip and eyes) are detected and tracked in real time to use their actions as mouse events. The coordinates and movement of the nose tip in the live video feed are translated to become the coordinates and movement of the mouse pointer on the user's screen. The left/right eye blinks fire left/right mouse click events.

MODULES DESCRIPTION
The paper concerns each and every aspect of detecting and tracking the face, in the following hierarchy.

Video Frame Module
The first module is to capture the live image using the web cam and to detect a face in the captured image segment.

Face Detection Module
Face detection has always been a vast research field in the computer vision world, considering that it is the backbone of any application that deals with the human face (e.g. surveillance systems, access control). Researchers did not spare any effort or imagination in inventing and evolving methods to localize, extract, and verify faces in images.

Simple heuristics were applied to images taken with certain restrictions (e.g. a plain background). These methods, however, have improved over time and become more robust to lighting conditions, face orientation, and scale. Despite the large number of face detection methods, they can be organized in two main categories: feature-based methods and image-based methods.
The first involves finding facial features (e.g., nostrils, eyebrows, lips, eye pupils) and, in order to verify their authenticity, performing geometrical analysis of their locations, areas, and distances from each other. This feature-based analysis will eventually lead to the localization of the face and the features that it contains.

Some of the most famous methods applied in this category are skin models and motion cues, which are effective in image segmentation and face extraction. On one hand, feature-based analysis is known for its pixel-accurate feature localization and its speed; on the other hand, its lack of robustness against head rotation and scale has been a drawback for its application in computer vision.

The second is based on scanning the image of interest with a window that looks for faces at all scales and locations. This category of face detection implies pattern recognition, and achieves it with simple methods such as template matching or with more advanced techniques such as neural networks and support vector machines. Image-based detection methods are popular because of their robustness against head rotation and scale, despite the fact that the exhaustive window scanning requires heavy computations.

[Fig: Face detection, general steps.]

We will be using feature-based face detection methods to reduce the area in which we are looking for the face, so we can decrease the execution time.

[Fig: SSR filter.]

The sum of pixels in each sector is denoted as S, along with the sector number. The use of this filter will be explained in detail in the face detection algorithm.

Integral Image
In order to facilitate the use of SSR filters, an intermediate image representation called the integral image will be used. In this representation, the integral image at location x, y contains the sum of the pixels which are above and to the left of the pixel x, y [3].

[Fig: Integral image. ii: integral image, i: pixel value.]

With this representation, calculating the sectors of the SSR filter becomes fast and easy. No matter how big the sector is, we need only 3 arithmetic operations to calculate the sum of pixels which belong to it. So each SSR filter requires 6*3 operations to calculate.

Find Candidate Faces Module
The human face is governed by proportions that define the different sizes and distances between facial features. We will be using these proportions in our heuristics to improve facial feature detection and tracking. To find candidate faces, the SSR filter will be used in the following way.
At first we calculate the integral image by making one pass over the video frame using these equations:
s(x, y) = s(x, y-1) + i(x, y)
ii(x, y) = ii(x-1, y) + s(x, y)
where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0.
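A compact numpy rendering of these equations (arrays indexed [y, x]; helper names are ours), including the 3-operation sector sum noted earlier:

import numpy as np

def integral_image(frame):
    # One pass: cumulative row sums, then cumulative column sums, i.e.
    # s(x, y) = s(x, y-1) + i(x, y); ii(x, y) = ii(x-1, y) + s(x, y).
    ii = frame.cumsum(axis=0).cumsum(axis=1)
    # Pad with a zero row/column so the border cases s(x, -1) = 0 and
    # ii(-1, y) = 0 need no special handling in sector_sum().
    return np.pad(ii, ((1, 0), (1, 0)))

def sector_sum(ii, x1, y1, x2, y2):
    # Sum of pixels in [x1..x2] x [y1..y2]: only 3 arithmetic operations,
    # no matter how big the sector is.
    return ii[y2 + 1, x2 + 1] - ii[y1, x2 + 1] - ii[y2 + 1, x1] + ii[y1, x1]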
[Figure: an ideal location of the SSR filter, where its center is considered as a candidate face.]

…If the conditions are fulfilled, then the center of the filter will be considered as a face candidate.

Extract BTE Templates
Now that we have found pupil candidates for each of the clusters (face candidates), we can extract BTE templates in order to pass them to the support vector machine. As mentioned earlier, we trained our support vector machine on templates of size 35 x 21 pixels, so no matter how big the template that we are going to extract is, we need to scale it down to that size.

In order to find the scale rate (SR), we divide the distance between the left and right pupil candidates by 23, where 23 is the distance between the left and right pupils in our training templates. We then extract a template of size 35*SR x 21*SR, where the left and right pupil candidates are aligned on row 8*SR and the distance between them is 23*SR pixels. After extracting the template, we scale it down by SR, and we get a template that has the size and alignments of the training templates.
…line we will get 35 pixels that cover that line; the same for the height of the template (21*SR): if we select every SR-th line, we will get 21 lines that cover 21*SR lines. So the final result will be 35*21 pixels that represent 35*SR x 21*SR pixels; in other words, we get a 35*21 template that represents the SR-downscale of the original template, and it is now ready to be passed to the support vector machine.

Find Nose Tip Module
Now that we have located the eyes, the final step is to find the nose tip. The first step is to extract the Region of Interest (ROI); in case the face was rotated, we need to rotate the ROI back to a horizontal alignment of the eyes. From the following figure we can see that the blue line defines a perfect square through the pupils and the outside corners of the mouth; the nose tip should fall inside this square, so this square becomes our region of interest in finding the nose tip.

Since the nose bridge is brighter than the surrounding features, the values should accumulate faster at the bridge location. In the vertical intensity profile we add horizontally, to each column, the values of the columns that precede it in the ROI; the same as in the horizontal profile, the values accumulate faster at the nose tip position, so the maximum value gives us the y coordinate of the nose tip. From both the horizontal and vertical profiles we were able to locate the nose tip position, but unfortunately this method alone did not give accurate results, because there might be several maximum values in a profile that are close to each other, and choosing the correct maximum that really points out the coordinates of the nose tip is a difficult task.

So instead of using the intensity profiles alone, we will apply the following method. At first we need to locate the nose bridge, and then we will find the nose tip on that bridge. As mentioned earlier, the nose bridge is brighter than the surrounding features, so we will use this criterion to locate the nose-bridge point (NBP) on each line of the ROI.
[Fig: Nose bridge detection with the SSR filter and the horizontal profile. The ROI is enlarged only for clearer vision.]

Now that we have located the nose bridge, we need to find the nose tip on that bridge. Since each NBP represents the brightest S2 sector on the line it belongs to, and that S2 sector contains the accumulated vertical sum of the intensities in that sector from the first line down to that line, we will be using this information to locate the nose tip.

CONCLUSION
In detection mode, the eyes and nose tip were located accurately when the following conditions were fulfilled:
The face is not rotated more than 5° around the axis that passes through the nose tip (as long as the eyes fall in sectors S1 and S3 of the SSR filter).
The face is not rotated more than 30° around the axis that passes through the neck (profile view).
Wearing glasses does not affect our detection process.
As for different scales, it is best to be about 35 cm from the webcam, because when the face is a bit far from the screen the program may detect a false positive, especially when the background is jammed (crowded).
…node. The key chain is generated by finding the key combinations using the Chinese remainder theorem (CRT) formulation, which applies a mod function between the values of the key pool and the relatively prime numbers.

Let A denote the key pool, which ranges from 0 to M-1, and let m0 and m1 be the relatively prime numbers selected during the parameter selection phase; the key chains are generated using the CRT formula (A mod m0, A mod m1).

Algorithm:
Input: mi = relatively prime numbers, M = product of the mi, n = number of relatively prime numbers.
Output: Key ring set.
A = (0 to M-1)
for (j = 0; j <= M-1; j++) {
    for (i = 0; i <= n-1; i++) {
        KR[i][j] = j mod mi;
    }
}

Example: let the key pool be A = {0,1,2,3,4,5}, where M = 6. The various key combinations of mi = (2,3) are found using the Chinese remainder theorem. The key chain generated for this example using the mod functions (A mod m0, A mod m1) is {(0,0), (1,1), (0,2), (1,0), (0,1), (1,2)}.

Key Chain Selection
Key chain selection is performed under the constraint that there should be no repeated keys within a key chain; nodes having repeated keys are rejected. A key chain without any repeated key is termed a valid key chain, and the count of such valid key chains determines the network size that can be supported. Combinations with repeated keys are considered invalid key chains.

Eliminate key rings which satisfy Condition 1:
Condition 1: check whether any keys are repeated within the key ring.
Condition 2: check whether two key rings have the same keys but in a different order; if so, keep one of the key rings and eliminate the rest.

From Example 1, considering the conditions specified above, the valid key chains are (0,1), (0,2), (1,2), and the number of nodes supported by mi = (2,3) is 3.

Analysis of Connectivity, Resilience and Memory Requirements
Key connectivity is the probability that two (or more) sensor nodes store the same key or keying material. Enough key connectivity must be provided for a WSN to perform its intended functionality. Key connectivity can be calculated using the following formulas, where n = number of nodes supported:
Total number of pairs = n(n-1)/2.
Number of pairs sharing at least one key = (total pairs) - (number of pairs NOT sharing at least one key).
Key connectivity (KC) = (number of pairs sharing at least one key) / (total pairs).

Resilience is the resistance against node capture. Compromise of security credentials, which are stored on a sensor node or exchanged over radio links, should not reveal any credential information. These factors are to be analyzed.
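A runnable sketch of the chain generation, the validity conditions, and the connectivity formula above (function names are ours), reproducing the mi = (2, 3) example:

from itertools import combinations

def key_chains(ms):
    # Chain for each pool value j in A = {0..M-1}: (j mod m_0, ..., j mod m_{n-1}).
    M = 1
    for m in ms:
        M *= m
    return [tuple(j % m for m in ms) for j in range(M)]

def valid_key_rings(chains):
    # Condition 1: no repeated key inside a chain.
    # Condition 2: keep only one ring per unordered key set.
    seen, rings = set(), []
    for c in chains:
        if len(set(c)) == len(c) and frozenset(c) not in seen:
            seen.add(frozenset(c))
            rings.append(c)
    return rings

def key_connectivity(rings):
    # KC = (pairs sharing at least one key) / (n(n-1)/2 total pairs).
    pairs = list(combinations(rings, 2))
    sharing = sum(1 for a, b in pairs if set(a) & set(b))
    return sharing / len(pairs)

chains = key_chains([2, 3])       # [(0,0),(1,1),(0,2),(1,0),(0,1),(1,2)]
rings = valid_key_rings(chains)   # key sets {0,2}, {0,1}, {1,2}
print(key_connectivity(rings))    # 1.0: every pair of rings shares a key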
CONCLUSION
To overcome the various security vulnerabilities in wireless sensor networks, the design of a pre-distribution algorithm using a deterministic approach is initiated. The deterministic approach is the process of determining the keys or key chains based on some criteria. In this paper we have proposed a novel deterministic key pre-distribution algorithm using the Chinese remainder theorem for distributed wireless sensor networks. In our future work, after analyzing the performance metrics of the key distribution algorithm, we plan to extend this novel algorithm to hierarchical wireless networks by adopting a probabilistic method.

REFERENCES
[1] L. Eschenauer and V. D. Gligor, "A key-management scheme for distributed sensor networks," in ACM CCS, 2002, pp. 41-47.
[2] Camtepe, S. and Yener, B., "Combinatorial design of key distribution mechanisms for wireless sensor networks," in 9th European Symposium on Research in Computer Security, 2004.
[3] Elaine Shi and Adrian Perrig, Carnegie Mellon University, "Designing Secure Sensor Networks," IEEE Wireless Communications, December 2004.
[4] Hung-Min Sun, Shih-Ying Chang, Yu-Hsiang Hung, Yu-Kai Tseng and Hsin-Ta Chiao, "Decomposable Forward Error Correction Codes Based on Chinese Remainder Theorem," 10th International Symposium on Pervasive Systems, Algorithms, and Networks, 2009.
[5] Zhu, S., Setia, S., and Jajodia, S., "LEAP: Efficient security mechanisms for large-scale distributed sensor networks," in 10th ACM Conference on Computer and Communications Security, 2003.
[6] Jianmin Zhang, Wenqi Yu, Xiande Liu, "CRTBA: Chinese Remainder Theorem-Based Broadcast Authentication in Wireless Sensor Networks," IEEE, 2009.
[7] Kaoru Kurosawa, Wataru Kishimoto, and Takeshi Koshiba, "A Combinatorial Approach to Deriving Lower Bounds," IEEE Transactions on Information Theory, vol. 54, no. 6, June 2008.
[8] Hwang, D., Lai, B., and Verbauwhede, I., "Energy-memory-security tradeoffs in distributed sensor networks," in 3rd International Conference on Ad-Hoc Networks and Wireless; 1st ACM Workshop on Security of Ad Hoc and Sensor Networks, 2004.
[9] Seyit A. Camtepe and Bulent Yener, Rensselaer Polytechnic Institute, "Key Distribution Mechanisms for Wireless Sensor Networks: A Survey," Technical Report TR-05-07, March 23, 2005.
[10] T. Kavitha and D. Sridharan, "Hybrid Design of Scalable Key Distribution for Wireless Sensor Networks," IACSIT International Journal of Engineering and Technology, Vol. 2, No. 2, April 2010.
[11] Taekyoung Kwon, JongHyup Lee, and JooSeok Song, "Location-Based Pairwise Key Predistribution for Wireless Sensor Networks," IEEE Transactions on Wireless Communications, vol. 8, no. 11, November 2009.
[12] Xiangqian Chen, Kia Makki, Kang Yen and Niki Pissinou, "Sensor network security: a survey," IEEE Communications Surveys and Tutorials, vol. 11, 2009.
[13] Yi Cheng and Dharma P. Agrawal, "Efficient Pairwise Key Establishment and Management in Static Wireless Sensor Networks," IEEE, 2005.
[14] Yong Wang, Garhan Attebury, and Byrav Ramamurthy, University of Nebraska-Lincoln, "A survey of security issues in wireless sensor networks," IEEE Communications Surveys and Tutorials, 2nd Quarter 2006, Volume 8, No. 2.
[15] Nguyen, H.T.T., Guizani, M., Minho Jo, and Eui-Nam Huh, "An Efficient Signal-Range-Based Probabilistic Key Predistribution Scheme in a Wireless Sensor Network," IEEE Transactions on Vehicular Technology, Volume 58, Issue 5, 2009, pp. 2482-2497.
[16] S. A. Camtepe and B. Yener, "Key Distribution Mechanisms for Wireless Sensor Networks: A Survey," Computer Science Department at RPI, Tech. Rep. TR-05-07, 2005.
[17] Yang Xiao, Venkata Krishna Rayi, Bo Sun, Xiaojiang Du, Fei Hu and Michael Galloway, "A survey of key management schemes in wireless sensor networks," Computer Communications, Elsevier Science Direct, 2007, pp. 2314-2341.
[18] Yang Xiao, Venkata Krishna Rayi, Bo Sun, Xiaojiang Du, Fei Hu, and Michael Galloway, "A Survey of Key Management Schemes in Wireless Sensor Networks," Computer Communications, Special Issue on Security on Wireless Ad Hoc and Sensor Networks, April 24, 2007.
[19] Sun Dong-Mei and He Bing, "Review of Key Management Mechanisms in Wireless Sensor Networks," Acta Automatica Sinica, Vol. 32, No. 6, November 2006.
[20] T. Kavitha and D. Sridharan, "Security vulnerabilities in wireless sensor networks: a survey," International Journal on Information Assurance and Security (JIAS), issue 5, 2010, pp. 031-044.
[21] Du, W., Deng, J., Han, Y., Chen, S., and Varshney, P., "A key management scheme for wireless sensor networks using deployment knowledge," in IEEE Infocom '04, 2004.
Abstract—Many web sites contain large sets of pages generated using a common template or layout, so web data extraction has been an important part of many web data analysis applications. In this paper, we study the problem of automatically extracting database values from template-generated web pages without any learning examples. We also study an unsupervised, page-level data extraction approach to deduce the schema and templates for each individual web site, which contains either singleton or multiple data records in one webpage. FiVaTech applies tree matching, tree alignment, and mining techniques to achieve this challenging task. In experiments, FiVaTech has much higher precision than EXALG. The experiments show an encouraging result for the test pages used in many state-of-the-art Web data extraction works.

Index Terms—Semi-structured data, Web data extraction, multiple trees merging, wrapper induction.

1 INTRODUCTION
The DEEP Web, as is known to everyone, contains magnitudes more, and more valuable, information than the surface Web. However, making use of such consolidated information requires substantial effort, since the pages are generated for visualization, not for data exchange. Thus, extracting information from the Webpages of searchable Websites has been a key step for Web information integration. An important characteristic of pages belonging to the same Website is that such pages share the same template, since they are encoded in a consistent manner across all the pages. In other words, these pages are generated with a predefined template by plugging in data values. In practice, template pages can also occur on the surface Web (with static hyperlinks). In addition, templates can also be used to render a list of records to show objects of the same kind. Thus, information extraction from template pages can be applied in many situations. What is special about template pages is that the extraction targets for template Webpages are almost equal to the data values embedded during page
Fig. 1. (a) A Webpage and its two different schemas: (b) S and (c) S'.
When multiple pages are given, the extraction target aims at page-wide information (e.g., RoadRunner [4] and EXALG [1]). When single pages are given, the extraction target is usually constrained to record-wide information (e.g., IEPAD [2], DeLa [11], and DEPTA [14]), which involves the additional issue of record-boundary detection. Page-level extraction tasks, although they do not involve the additional problem of boundary detection, are much more complicated than record-level extraction tasks, since more data are concerned. In this paper, we focus on page-level extraction tasks and propose a new approach, called FiVaTech, to automatically detect the schema of a Website. The proposed technique presents a new structure, called the fixed/variant pattern tree, a tree that carries all of the required information needed to identify the template and detect the data schema. We combine several techniques: alignment, pattern mining, as well as the idea of tree templates, to solve the much more difficult problem of page-level template construction. In experiments, FiVaTech has much higher precision than EXALG, one of the few page-level extraction systems.

2 PROBLEM FORMULATION
In this section, we formulate the model for page creation, which describes how data are embedded using a template. As we know, a Webpage is created by an encoding function that combines a data instance with the template to form the Webpage, where all data instances of the database conform to a common schema, which can be defined as follows (a similar definition can also be found in EXALG [1]):

Definition 2.1 (Structured data). A data schema can be of the following types:
1. A basic type β represents a string of tokens, where a token is some basic unit of text.
2. If η1, η2, ..., ηk are types, then their ordered list <η1, η2, ..., ηk> also forms a type η. We say that the type η is constructed from the types <η1, η2, ..., ηk> using a type constructor of order k. An instance of the k-order η is of the form <x1, x2, ..., xk>, where x1, x2, ..., xk are instances of types η1, η2, ..., ηk, respectively. The type η is called:
a. A tuple, denoted by <k>η, if the cardinality (the number of instances) is 1 for every instantiation.

b. An option, denoted by <k>?η, if the cardinality is either 0 or 1 for every instantiation.

c. A set, denoted by {k}η, if the cardinality is greater than 1 for some instantiation.

d. A disjunction, denoted by (η1|η2|...|ηk)η, if η1, ..., ηk are options and the cardinality sum of the k options equals 1 for every instantiation of η.
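To make Definition 2.1 concrete, the type system can be transcribed as a small Python sketch (ours, not part of the paper; all names are illustrative):

from dataclasses import dataclass
from enum import Enum
from typing import List

class Kind(Enum):
    TUPLE = "tuple"              # cardinality exactly 1 per instantiation
    OPTION = "option"            # cardinality 0 or 1
    SET = "set"                  # cardinality > 1 for some instantiation
    DISJUNCTION = "disjunction"  # exactly one of the k components present

@dataclass
class Basic:                     # a basic type: a string of tokens
    name: str                    # e.g. "beta1"

@dataclass
class Constructor:               # a k-order type constructor
    kind: Kind
    components: List[object]     # the ordered list <eta_1, ..., eta_k>

    @property
    def order(self) -> int:      # the k of the k-order constructor
        return len(self.components)

# Example: {<beta1, (beta2)?>} -- a set of 2-tuples whose second
# component is optional.
schema = Constructor(Kind.SET, [
    Constructor(Kind.TUPLE, [Basic("beta1"),
                             Constructor(Kind.OPTION, [Basic("beta2")])]),
])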
As mentioned before, template pages are generated by embedding a data instance in a predefined template via a CGI program. Thus, the reverse engineering of finding the template and the data schema from input Webpages should be established on some page generation model, which we describe next. The advantage of a tree-based page generation model is that it does not involve ending tags (e.g., </html>, </body>, etc.) in its templates, as the string-based page generation model applied in EXALG does. Concatenation is a required operation in the page generation model, since subitems of data must be encoded with templates to form the result. For example, encoding a k-order type constructor η with instance x involves the concatenation of template trees T with all the encoded trees of its subitems for x. However, tree concatenation is more complicated, since there is more than one point at which a subtree can be appended to the rightmost path of an existing tree. Thus, we need to consider the insertion position for tree concatenation.

We next introduce how input DOM trees can be recognized and merged into the pattern tree for schema detection. According to our page generation model, data instances of the same type have the same path from the root in the DOM trees of the input pages. Thus, our algorithm does not need to merge similar subtrees from different levels, and the task of merging multiple trees can be broken down from the tree level to the string level. Starting from the root nodes <html> of all input DOM trees, which belong to some type constructor we want to discover, our algorithm applies a new multiple string alignment algorithm to their first-level child nodes. There are at least two advantages in this design. First, the number of child nodes under a parent node is much smaller than the number of nodes in the whole DOM tree or the number of HTML tags in a Webpage; thus, the effort for multiple string alignment here is less than that of the two complete page alignments in RoadRunner [4]. Second, nodes with the same tag name (but with different functions) can be better differentiated by the subtrees they represent, which is an important feature not used in EXALG [1]. Our algorithm recognizes such nodes as peer nodes and denotes those child nodes by the same symbol to facilitate the subsequent string alignment. After the string alignment step, we conduct pattern mining on the aligned string S to discover all possible repeats (set-type data) from length 1 to length |S|/2. After removing extra occurrences of a discovered pattern (as in DeLa [11]), we can then decide whether data are optional or not based on their occurrence vector, an idea similar to that in EXALG [1].
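The repeat-mining step just described can be illustrated with a simplified Python sketch (our illustration, not FiVaTech's actual miner, which works on the pattern tree and occurrence vectors): for each pattern length from 1 up to |S|/2, adjacent repeated occurrences in the aligned child-node string are collapsed to a single occurrence, which is how set-type data reveals itself.

def collapse_repeats(s: str) -> str:
    # s: aligned child-node string, one symbol per peer-node class
    for length in range(1, len(s) // 2 + 1):
        i, out = 0, []
        while i < len(s):
            pattern = s[i:i + length]
            j = i + length
            # skip adjacent duplicates of the pattern (extra occurrences)
            while s[j:j + length] == pattern:
                j += length
            out.append(pattern)
            i = j
        s = "".join(out)
    return s

# Three adjacent occurrences of the record pattern "ab" collapse to one:
print(collapse_repeats("abababx"))  # -> "abx"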
schema produced by our system and, at the same time, to compare FiVaTech with EXALG [1], the page-level data extraction approach that also detects the schema of a Website. Given a set of Webpages of a Website as input, FiVaTech outputs three types of files for the Website. The first type (a text file) presents the schema (data values) of the Website in an XML-like structure. We use these XML files in the experiment to compare FiVaTech with EXALG.

6 COMPARISONS WITH RELATED WORK

Web data extraction has been a hot topic for the past 10 years. Many approaches have been developed for different task domains, automation degrees, and techniques [3], [7]. Regardless of the granularity of the tokens used, both FiVaTech and EXALG recognize template and data tokens in the input Webpages. EXALG treats HTML tags as part of the data and proposes a general technique to identify tokens that are part of the data and tokens that are part of the template, by using the occurrence vector of each token and by differentiating the roles of the tokens according to their DOM tree paths. Although this assumption is true, differentiating HTML tag tokens is a big challenge and causes many problems. Also, EXALG assumes that a pair of two valid equivalence classes is nested, although this is not necessarily true: two data records may be intertwined in terms of their HTML code.

Finally, a more compact schema can be produced by compressing continuous tuples and removing continuous sets and any 1-tuples. A list of types η1, η2, ..., ηn is continuous if ηi is a child of ηi-1 (for n ≥ i > 1). If η1, η2, ..., ηn are tuples of order k1, k2, ..., kn, respectively, then the new compressed tuple is of order k1 + k2 + ... + kn - n + 1. For the above example, we can compress η3, η4, η5, η7 to get a 7-tuple (= 2+2+4+2+2-4+1) and the new schema S = {{β1, β2, (β3), β4, (β5)?η8, (<{β6}η11>η10)?η9, (<β7>η13)?η12}ω}η1, where ω is a 7-set.
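The compression arithmetic can be stated in one line of Python (the order list below is illustrative, not the paper's example):

# Merging a continuous chain of tuples of orders k1, ..., kn yields one
# tuple of order k1 + k2 + ... + kn - n + 1 (each parent/child link
# shares one slot).
def compressed_order(orders):
    return sum(orders) - len(orders) + 1

print(compressed_order([2, 2, 4, 2, 2]))  # -> 8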
7 CONCLUSIONS

In this paper, we proposed a new Web data extraction approach, called FiVaTech, for the problem of page-level data extraction. We formulate the page generation model using an encoding scheme based on tree templates and a schema, which organize data by their parent nodes in the DOM trees. FiVaTech contains two phases: phase I merges the input DOM trees to construct the fixed/variant pattern tree, and phase II performs schema and template detection based on the pattern tree. According to our page generation model, data instances of the same type have the same path in the DOM trees of the input pages. Thus, the alignment of input DOM trees can be implemented by string alignment at each internal node. We design a new algorithm for multiple string alignment which takes optional- and set-type data into consideration. The advantage is that nodes with the same tag name can be better differentiated by the subtrees they contain. Meanwhile, the result of alignment makes pattern mining more accurate. With the constructed fixed/variant pattern tree, we can easily deduce the schema and template for the input Webpages. Although many unsupervised approaches have been proposed for Web data extraction (see [3], [7] for a survey), very few works (RoadRunner and EXALG) solve this problem at the page level. The proposed page generation model with tree-based templates matches the nature of Webpages. Meanwhile, the merged pattern tree gives very good results for schema and template deduction.

REFERENCES
[1] A. Arasu and H. Garcia-Molina, "Extracting Structured Data from Web Pages," Proc. ACM SIGMOD, pp. 337-348, 2003.
[2] C.-H. Chang and S.-C. Lui, "IEPAD: Information Extraction Based on Pattern Discovery," Proc. Int'l Conf. World Wide Web (WWW-10), pp. 223-231, 2001.
[3] C.-H. Chang, M. Kayed, M.R. Girgis, and K.A. Shaalan, "Survey of Web Information Extraction Systems," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 10, pp. 1411-1428, Oct. 2006.
[4] V. Crescenzi, G. Mecca, and P. Merialdo, "RoadRunner: Towards Automatic Data Extraction from Large Web Sites," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 109-118, 2001.
2. IMPLEMENTATION
user interaction needed to execute further parts of the application. In such cases, pressing the button may result in the execution of additional PHP source files. There are two challenges involved in dealing with such interactive applications.

3. ALGORITHM

Fig. 1 shows pseudocode for the algorithm, which extends the algorithm in Fig. 2 with explicit-state model checking to handle the complexity of simulating user inputs. The algorithm tracks the state of the environment and automatically discovers additional configurations based on an analysis of the output for available user options. In particular, the algorithm 1) tracks changes to the state of the environment (i.e., session state, cookies, and the database) and 2) performs an "on-the-fly" analysis of the output produced by the program to determine what user options it contains, with their associated PHP scripts. The algorithm uses a set of configurations that are already in the queue (line 14) and performs state matching in order to explore only new configurations (line 11).
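As a schematic reconstruction (ours, not the paper's Fig. 1), the loop below shows the two ingredients named above: a work queue of configurations with state matching, and on-the-fly output analysis that turns each discovered user option into a new configuration. Here run, extract_user_options and config.extend are hypothetical stand-ins.

from collections import deque

def explore(initial_config, run, extract_user_options):
    # run(config) -> (output, env_state): executes the PHP script under
    # the given configuration and returns its output plus the resulting
    # environment state (session state, cookies, database contents).
    queue = deque([initial_config])
    seen_states = set()
    while queue:
        config = queue.popleft()
        output, env_state = run(config)
        if env_state in seen_states:
            continue                 # state matching: skip explored states
        seen_states.add(env_state)
        # (the HTML-validator oracle would be applied to `output` here)
        # on-the-fly analysis: every user option (button, link, form)
        # found in the output becomes a follow-up configuration
        for option in extract_user_options(output):
            queue.append(config.extend(option))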
[Fig. 1 and Fig. 2: pseudocode listings referenced above]
4. CONCLUSION

We have presented a technique for finding faults in PHP Web applications that is based on combined concrete and symbolic execution. The work is novel in several respects. First, the technique not only detects runtime errors but also uses an HTML validator as an oracle to determine situations where malformed HTML is created. Second, we address a number of PHP-specific issues, such as the simulation of the interactive user input that occurs when user-interface elements on generated HTML pages are activated, resulting in the execution of additional PHP scripts. Third, we perform an automated analysis to minimize the size of failure-inducing inputs.
*N.Suganya, (P.G.Student)/CSE,
**K.Jayashree (Senior Lecturer)/CSE
*suganya2011@gmail.com, Ph: +91 9944573696
**K_jayashree106@yahoo.com
Rajalakshmi Engineering College, Chennai, India
Abstract—Nowadays the number of services published over the Internet is growing at an explosive speed, so it is difficult for service requesters to select satisfactory Web services from among those that provide similar functionality. Quality of Service (QoS) is considered the most important criterion for service filtering. In this paper, the Web service description models consider the service QoS information and present an overall Web service selection and ranking scheme for fulfilling the service requester's functional and non-functional requirements. The service selection method is based on the particle swarm optimization technique. By using this multi-objective particle swarm optimization technique, a number of QoS values can be optimized at the same time, which ultimately improves the service performance. This method can significantly improve the problem-solving speed and reduce the complexity of selecting, ranking and updating QoS-aware Web services.

Keywords: Web Services, Service Selection, Particle Swarm Optimization, Service Oriented Architecture, Quality of Service

Introduction

Web Services (WSs) are modular, self-describing, and loosely coupled software applications that can be advertised, located, and used across the Internet using a set of standards such as SOAP, WSDL, and UDDI. Web service technology is becoming more and more popular in many practical application domains, such as electronic commerce, flow management, application integration, etc. It presents a promising solution for solving the platform interoperability problems encountered by application system integrators.

With the rapid development of Web service technology in recent years, traditional XML-based standards (i.e., UDDI) have matured for the service registry and discovery process. Services can be dynamically discovered and integrated at runtime in order to develop and deploy business applications. However, the standard WS techniques (such as WSDL and UDDI) fail to realize dynamic WS discovery (WSDi), as they rely on syntactic and static descriptions of service interfaces and other non-functional service attributes for publishing and finding WSs. As a result, the corresponding WSDi mechanisms return results with low precision and recall. In addition, no means are provided to select among multiple functionally equivalent WSs. The solution to the last problem is QoS-based WS Description and Discovery (WSDD). QoS for WSs is a set of non-functional properties that encompass performance characteristics, among others. As users are very concerned about the performance of the WSs they use, QoS can be used for discriminating between functionally equivalent WSs.

There is a large body of related work in service selection that attempts to solve the service selection problem using various techniques. The work proposed in reference [1] presented a model of reputation-enhanced QoS-based Web service discovery that combines an augmented UDDI registry to publish the QoS information and a reputation manager to assign reputation scores to
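As a rough illustration of the selection technique named in the abstract, the following is a minimal global-best particle swarm optimization sketch in Python (ours, not the authors' implementation; all parameters and the demo fitness are illustrative):

import random

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    # standard global-best PSO minimizing `fitness`
    pos = [[random.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

# Demo fitness: distance to a hypothetical ideal QoS vector (normalized
# response time, cost, availability). In the paper's setting the fitness
# would score a candidate service by several QoS values at once.
ideal = [0.0, 0.0, 1.0]
fit = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, ideal))
print(pso(fit, dim=3))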
Service providers "publish" their services to a service broker. Service requesters "find" required services by using the service broker, and then "bind" to these services.

B. QoS Attributes

QoS for Web services is defined as the non-functional properties of the service being provided to its users. These properties are also called metrics; common quality attributes for Web services are response time, availability, latency, cost, and throughput. In Service Oriented Architectures (SOA), both service providers and service clients should be able to define QoS-related statements to enable QoS-aware service publication and service discovery. The first step toward supporting QoS in WSs is a precise definition of the quality metrics related to the services. The quality attributes are classified and defined according to users' QoS requirements (different requirements for different user profiles). These attributes and their related internal properties are identified and described by accurate and valid metrics. Classes of QoS attributes are defined; each one has different QoS attributes with different values.

With semantic annotations being added to WSDL and UDDI, basic definitions of a Web service, an operation of a service, and a user query are given as follows:

Definition 1: Web Service
A Web service ws is the 4-tuple ws = {n, d, q, P}, where:
1. n is the name of the Web service,
2. d is the functional description of the Web service,
3. q is a set of quality items of the Web service,
4. P = {p1, p2, ..., pm} is a set of operations of the Web service.

Definition 2: Operation
An operation p is the 4-tuple p = {n, d, I, O}, where:
1. n is the name of the operation,
2. d is the functional description of the operation,
3. I = {i1, i2, ..., in} is a set of input parameters of the operation,
4. O = {o1, o2, ..., om} is a set of output parameters of the operation.

Definition 3: User Query
A user query r is the 4-tuple r = {n, I, O, λ}, where:
1. n is the functional name of the user query, i.e., the desired functional operation,
2. I is a set of input parameters of the user query,
3. O is a set of output parameters of the user query,
4. λ (0 < λ ≤ 1) is a value set by the user; when the similarity of a Web service to the user query is not less than λ, the Web service is a suitable service. λ can be changed as needed.
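Definitions 1-3 can be transcribed directly as Python dataclasses (a sketch; field names mirror the tuples above, and the similarity function itself, i.e., the semantic matching of names and parameters, is outside its scope):

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Operation:              # p = {n, d, I, O}
    n: str                    # name
    d: str                    # functional description
    I: List[str]              # input parameters
    O: List[str]              # output parameters

@dataclass
class WebService:             # ws = {n, d, q, P}
    n: str
    d: str
    q: Dict[str, float]       # quality items, e.g. {"response_ms": 120}
    P: List[Operation]

@dataclass
class UserQuery:              # r = {n, I, O, lambda}
    n: str
    I: List[str]
    O: List[str]
    lam: float                # 0 < lam <= 1, the user-set threshold

def matches(ws: WebService, r: UserQuery,
            similarity: Callable[[WebService, UserQuery], float]) -> bool:
    # Definition 3: a service is suitable when its similarity to the
    # query is not less than lambda.
    return similarity(ws, r) >= r.lam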
[Figure: Searching Process]
2 Assistant Professor, Department of Computer Science, Adhiparasakthi Engineering College, Melmaruvathur.
2.2 Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

Large applications executing on Grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The sources of the problems are node failures and the need for dynamic configuration over extensive runtimes. This paper presents two fault tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging.

THEFT-INDUCED CHECKPOINTING

The dataflow graph constitutes a global state of the system. In order to use its abstraction for recovery, it is necessary that this global state also represents a consistent global state. We can capture the abstraction of the execution state at two extremes. At Level 0, one assumes the representation derived from the construction of the dataflow graph, whereas at Level 1, the interpretation is derived as the result of its evaluation, which occurs at the time of scheduling.

SYSTEMATIC EVENT LOGGING

C1: Once a task starts executing, it will continue without being affected by external events, until its execution ends.

C2: The execution of a task is deterministic with respect to the tasks and shared data objects that are created. Note that this implies that the execution will always create the same (isomorphic) dataflow graph.

2.3 On the Design of Fault-Tolerant Scheduling Strategies Using the Primary-Backup Approach for Computational Grids with Low Replication Costs

Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as often geographically distributed nodes cooperate to execute a task. By and large, the primary-backup approach is a common methodology used for fault tolerance, wherein each task has a primary copy and a backup copy on two different processors. For independent tasks, the backup copy can overload with other backup copies on the same processor, as long as their corresponding primary copies are scheduled on different processors. However, for dependent tasks, precedence constraints among tasks must be considered when scheduling backup copies and overloading backups. In this paper, we first identify two cases that may happen when scheduling dependent tasks with the primary-backup approach.

The following are the conditions under which backups can be overloaded on a processor:

1. Backups scheduled on a processor can overload only if their primaries are scheduled on different processors.
Therefore, although several backups may be overlapped on a processor, at most one of them needs to be executed under the single-processor-failure model. The case that several overlapped backups execute concurrently will not happen.

2. At most one of these primaries is expected to encounter a fault. This is to ensure that at most one backup is required to be executed among the overloaded backups.

3. At most one version of a task is expected to encounter a fault. In other words, if the primary of a task fails, its backup always succeeds. This condition is guaranteed by the assumption that the minimum required value of the mean time to failure (MTTF) is always greater than or equal to the maximum task execution time in a primary-backup approach.
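Overloading condition 1 can be checked with a few lines of Python (our sketch; task and processor names are illustrative):

def can_overload(existing_backups, new_task, primary_proc):
    # existing_backups: tasks whose backups already sit on this processor.
    # primary_proc[t]: processor hosting the primary copy of task t.
    # A new backup may join only if its primary runs on a different
    # processor than every co-located backup's primary (condition 1).
    return all(primary_proc[t] != primary_proc[new_task]
               for t in existing_backups)

primary_proc = {"t1": "P1", "t2": "P2", "t3": "P1"}
print(can_overload({"t1", "t2"}, "t3", primary_proc))  # False: t1 and t3 share P1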
3. DESIGN AND IMPLEMENTATION

The mobile moving objects are monitored and backed up by the nearest neighborhood node. A mobile host failure or data loss may lead to severe performance degradation or even total job abortion, unless execution checkpointing is incorporated. Certain hosts may suspend execution while waiting for intermediate results (as input to their processes) that may never arrive due to host or link failure; if the link fails, we cannot recover our data. Checkpointing forces hosts involved in job execution to periodically save intermediate states, registers, process control blocks, messages, logs, etc., to stable storage. This stored checkpoint information can then be used to resume job execution at a substitute host chosen to run the recovered application in place of the failed host. If data is deleted, the data should be recovered from the nearest neighbour node to avoid delay and improve QoS. Upon host failure or inadvertent link disconnection, job execution at a substitute host can then be resumed from the last good checkpoint. This crucial function avoids having to start job execution all over again from the very beginning in the presence of every failure, thus substantially enhancing the performance realized by grid applications.

3.1 Server Module

This module monitors the client objects and stores all the details of the clients. If a client misses any details, the server can send the missed data to the corresponding client that requested it. The server can choose the path used to send the data; in choosing the path, a hierarchical clustering algorithm is used. The admin can view all the details of users through their respective user-ids and passwords. In mobile registration, the user must register a new user id and password with their respective mobile number for any further verification by the admin. In user login, users can view their updated details by signing in with their respective user-id and password. In view mobile transaction, the admin has the right to view
all the details of the users, and also all the transactions made by them.

3.2 Mobile Host

The mobile host always monitors the user and sends the client data to the server. Set checkpoint is used to save up-to-date client information in the mobile host. Updating the information to the server is used to update the respective requested information to the admin.

3.3 Moving Object

The client is the moving object, which sends requests to the server and gets the details from the server. In information recovery, users can send a request to the admin regarding their loss of information or data, and they can log in with their respective user-id and password to view their data.

2. Random checkpoints are set up to achieve surprise, as opposed to known, permanently located, manned checkpoints. They might be established in locations where they cannot be observed by approaching traffic until it is too late to withdraw and escape without being observed.

3. The middleware is built from layered interacting packages and may be tailored using different managers called by a common API, so that users need not be concerned with the different syntax and access methods of specific packages. The most common example is the job scheduler, which can be any of a more or less complex set of products.

4.1 EVALUATION METRICS
transmit checkpoints and to reach checkpointed data when it is needed.

5 CONCLUSION AND FUTURE WORK

In the proposed system we have used random checkpointing arrangements to minimize latency and produce 100% accuracy using fewer wireless connections. Nodal mobility in a large MoG may occasionally render an MH participating in one job execution unreachable from the remaining MHs, calling for efficient checkpointing in support of long job executions. As earlier proposed checkpointing approaches cannot be applied directly to MoGs and are not QoS-aware, we have dealt with QoS-aware checkpointing and recovery specifically for MoGs, with this paper focusing solely on the checkpointing arrangement. It has been demonstrated, via simulation and actual test-bed studies, that ReD achieves significant reliability gains by quickly and efficiently determining checkpointing arrangements for most MHs in a MoG. ReD is shown to outperform its RCA counterpart in terms of the average reliability metric, and does so with fewer required messages and superior stability (which is crucial to the checkpoint arrangement, minimization of latency, and wireless bandwidth utilization).

High-mobility environments necessitate some moderation, so that the system can flexibly adjust arrangements sufficiently to more optimally track reliability as host positions and network conditions change. This is borne out by the flattening of the curves, indicating a point of diminishing returns upon further increases. Therefore, both average host mobility and average provider density are inputs to the process of modulation, and thus to stability-versus-reliability control. Using the stability control mechanism, hosts could theoretically snoop broadcasts of general wireless traffic to monitor the level of break-message activity, gather data about neighborhood density and mobility through exchange with neighbors about connectivity, and thus modulate accordingly, in order to effectively and responsively control stability versus arrangement reliability.

6 REFERENCES
carrying any integrated terminal can use a wide range of applications provided by multiple wireless networks. 4G systems provide not only telecommunications services, but also data-rate services with good system reliability. At the same time, a low per-bit transmission cost is maintained. Users can use multiple services from any provider at the same time, and a mobile may simultaneously connect to different wireless systems.

Under these circumstances, critical to the satisfaction of service users is being able to select and utilize the best access technologies among the available ones. In this paper, we envision a comprehensive architectural platform for mobility management that allows end-users to dynamically and fully take advantage of different access networks. The existing solutions cannot satisfy all the different QoS constraints of the various applications together with the monetary cost.

In this paper, we also deploy an end-to-end mobility management technique that is similar to the ones proposed in [2]. However, our main objective in adopting end-to-end mobility management is not just to achieve the objectives of [2], but also to consider the monetary cost. Specifically, the objectives of the proposed architecture can be summarized as follows:

1. Enabling the triggering of handover decisions when a user enters or moves away from the current network.
2. Enabling handover decisions that optimize the handover performance in terms of each application's requirements and also the monetary cost.
3. Implementing a route selection algorithm that maximizes the overall battery lifetime while providing quality of service.

These objectives require the gathering of dynamic status information across the entire protocol stack, as well as the dynamic adjustment of protocol parameters and controls accordingly. For instance, not only the information on the available wireless network interfaces and their characteristics, but also the application-level performance measures and requirements, should be collected and considered together in order to make an optimal handover decision for each user flow.

The rest of this paper is organized in the following way. Section 3 gives a detailed explanation of the architecture of the proposed platform. Simulations and numerical results illustrating the potential merits of the proposed platform and the algorithms are presented in Section 5. The final section concludes the paper.

Handovers refer to the automatic failover from one technology to another in order to maintain communication. This handoff technology is needed for seamless mobility, to keep the connection alive without any interruption. The types of handovers are:
1) Horizontal Handover
2) Vertical Handover
3) Downward and Upward Handover

In a vertical handover the user can move between different network access technologies; the mobility is performed between different layers. In a vertical handover the mobile node moves across different heterogeneous networks and not only changes its IP address but also changes its network interface, QoS characteristics, etc.

2. MOBILITY MANAGEMENT

Mobility management contains two components: location management and handover management.

Location management enables the network to find the current attachment point of a mobile user. The first step is location registration (or location update). In this step, the mobile terminal periodically informs the network of its up-to-date location information, allowing the network to authenticate the user and update the user's location profile. The second step is call delivery: the network determines the current location in which a mobile terminal is located. There are some challenges in the design of location management, especially for inter-domain roaming, in terms of the signaling overhead, call
delivery latency, and QoS guarantees in different systems.

Handover management enables the network to maintain an on-going connection when a mobile terminal switches its access point. Handover is a three-stage process. First, the initiation of the handover is triggered by the user, a network agent, or changing network conditions. The second stage is new connection generation, where the network must find new resources for the handover connection and perform any additional routing operations. Finally, dataflow control needs to maintain the delivery of the data from the old connection path to the new connection path according to the agreed-upon service guarantees. Mobility management in heterogeneous networks can happen in different layers of the OSI protocol stack reference model, such as the network layer, transport layer, and application layer.

3. A PROTOCOL FOR MAINTAINING MOBILITY MANAGEMENT ACROSS LAYERS

functional modules to facilitate efficient inter-layer communications. The MAs are the interface to each legacy protocol layer; they monitor and collect protocol-specific dynamic status information, and adjust the protocol controls, without requiring direct modification of the existing protocols.

The PDB maintains both the static and the dynamic information necessary for handover-related decisions and processes; the dynamic part of the information in the PDB is updated by the MAs. This includes information about the available network interfaces, the available bandwidth, and the protocols for increasing the QoS.

The decision engine (DE) maintains the per-application handover processing policies to enable seamless handover of each user session. A set of rules determining when to trigger the handover decision procedure for a certain service flow is also maintained, to avoid unnecessary handover decision processing caused by redundant status reports from multiple MAs. In making the handover decisions, the DE utilizes the information on the predefined key parameters across the protocol layers by obtaining the necessary static/dynamic data from the PDB. The DE consists of one or more application-specific DEs (ADEs). Each ADE contains application-specific mechanisms for handover decisions (HDM) and the
handoff, the VHDC evaluates the APs and then directs a handoff operation to the network with the optimal performance/cost. On the other hand, if no other APs are found for a possible handoff, then the cellular network is considered the best available wireless network.

The IP Agent is responsible for mapping the end-point addresses of ongoing sessions to the addresses corresponding to the current location. It enables the discovery of a peer's current location, as well as continuity of data delivery transparent to the mobility, by tracking the IP address changes of the end points. The functionality of the IP Agent consists of two core modules: the address management module (AMM) and the location management module (LMM).

on-going user session. An SPT entry is created for each user session, and it contains the 5-tuple representing the user session, i.e., (source perceived address, destination perceived address, transport protocol, source port number, destination port number), the transport IP and MAC addresses, the ADE class of the user session, and the unique session identification (SID). The SID is used by all functional modules of the proposed platform to identify the user session uniquely. The DE also updates its SPT entry and/or executes the ADE in the following three cases: (1) a new user session initiation is notified by the application layer MA,
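The SPT entry just described maps naturally onto a small data structure; the following Python sketch (ours) mirrors the fields listed above:

from dataclasses import dataclass

@dataclass(frozen=True)
class FiveTuple:                  # identifies the user session
    src_perceived_addr: str
    dst_perceived_addr: str
    transport_protocol: str       # e.g. "TCP" or "UDP"
    src_port: int
    dst_port: int

@dataclass
class SptEntry:
    flow: FiveTuple
    transport_ip: str             # current transport (location) address
    mac_addr: str
    ade_class: str                # which application-specific DE handles it
    sid: int                      # unique session identification (SID)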
mobility management, we define two different kinds of IP addresses recognized by the proposed platform: (1) the perceived address, an IP address originally perceived by the transport layer session and its socket, and (2) the transport address, an IP address used for the actual data transportation, which changes per handover and represents the current location of an MN.

When a new user session is initiated, the DE is informed of the new user session initiation by the application MA. The DE first determines the SID and the ADE class for the new session. The DE then consults the PDB to find all of the available network interfaces. If more than one interface is active, the DE executes the ADE in order to determine the initial transport address for the user session. Determining these three things completes the generation of the SPT entry for the new session. The DE then announces the new user session initiation to the IP Agent, with the source/destination perceived addresses and the source transport address of the session, and the ADE generates the directions to the MAs regarding the parameters and criteria for the new user session. The AMM of the IP Agent then generates a new MIT entry for the new user session, and queries the LMM to find the destination transport address. By obtaining the peer's current-location IP address from the LMM, the AMM completes the new MIT entry. The AMM then starts establishing an end-to-end mobility management session for the new user session with the peer AMM, by transmitting an end-to-end mobility management (E2E-MM) message.

3. When the QoS requirements of applications in the current access network cannot be satisfied.
4. When the RSS for the MN has dropped below a specified threshold.

When a new access network is detected or the current access network is unreachable, the MAs in the PHY/MAC layer can generate HDTEs through the detection of link status changes. Further, the platform allows the network layer MA to generate HDTEs by detecting a new IP address acquisition from a reachable neighbor network of the current access router, or by receiving ICMP error messages due to a link failure over the data path.

When the QoS requirements of applications in the current access network cannot be satisfied, any MA of any protocol layer can generate HDTEs, as different QoS parameters are monitored in each protocol layer. For example, QoS parameters for a video service such as the peak signal-to-noise ratio (PSNR), delay-jitter, or packet loss rate can be monitored by the MAs in the application, transport and network layers.

When the RSS for the MN has dropped below a specified threshold, or when a new access network is detected or the current access network is not reachable, the generated HDTE is also sent to the VHDC to optimize the system performance [1].

For each AP, the load on the AP can be given as Load(P) = ∑ e.
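The HDTE trigger causes can be sketched as follows (our illustration; causes 1 and 2, new-network detection and network unreachability, are inferred from the explanatory paragraphs above):

from enum import IntEnum

class HdteCause(IntEnum):
    NEW_NETWORK_DETECTED = 1   # a new access network is detected
    NETWORK_UNREACHABLE = 2    # current access network is unreachable
    QOS_NOT_SATISFIED = 3      # application QoS cannot be satisfied
    RSS_BELOW_THRESHOLD = 4    # received signal strength below threshold

def raised_causes(new_network: bool, link_up: bool,
                  qos_ok: bool, rss: float, rss_threshold: float):
    # Return the HDTE causes raised by the monitoring agents (MAs),
    # sorted so that lower cause numbers (higher preference) come first.
    causes = []
    if new_network:
        causes.append(HdteCause.NEW_NETWORK_DETECTED)
    if not link_up:
        causes.append(HdteCause.NETWORK_UNREACHABLE)
    if not qos_ok:
        causes.append(HdteCause.QOS_NOT_SATISFIED)
    if rss < rss_threshold:
        causes.append(HdteCause.RSS_BELOW_THRESHOLD)
    return sorted(causes)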
Referring to the SPT, the DE determines the sessions that enter the handover decision procedure for each HDTE generated. The DE can receive multiple HDTEs in a row from various layers; preference must be given to causes 1 and 2 rather than causes 3 and 4.

4.2 THE HANDOVER PROCEDURE

As illustrated in Fig. 2, when an MA notifies the DE of HDTEs to trigger the handover decision procedure, the DE determines the related session by referring to the SPT and decides whether it is necessary to execute the corresponding ADE. If the ADE is invoked, the HDM of the selected ADE is executed first. The HDM consults the PDB to obtain the necessary status information to make the application handover decision and the network selection, and the handover takes place. Then the VHDC is invoked; it checks for the triggers, and execution takes place. The SPT entry is also updated with the new transport IP and MAC addresses. In order to inform its peer IP Agent about the change of its transport address, the DE passes the new transport address to the AMM of the IP Agent with the Handover Indication signal. The AMM then modifies the source transport address of the corresponding session in the MIT. Further, the AMM sends out an E2E-MM message informing the peer AMM of its new transport address. In this case, the E2E-MM message may also include additional information necessary for the transport/application layer control adjustment for a seamless handover.

A route selection algorithm is used to forward the packets; we use Dynamic Source Routing (DSR) for forwarding.

5. PERFORMANCE EVALUATION

We have constructed a simulation model with NS-2 [3] to evaluate the performance of the proposed platform and the application-specific handover decision approach. In this simulation, we considered the throughput, packet loss, delay, load balancing, and the battery-power consumption incurred only by handovers. For the handover decision policies, both the monetary cost and the application QoS are the metrics of most interest to the users as well as to the service providers.

In our simulation we consider both the application QoS and the monetary cost as the handover decision policy, in order to illustrate the gains of the proposed platform in a straightforward way. Since there is a tradeoff between the monetary cost and the application QoS, it is complicated to illustrate the gains of the proposed approach in terms of both metrics, but we evaluate the performance of the proposed platform in terms of both the application QoS and the monetary cost. Whenever a trigger occurs, the VHDC checks for a handover decision and maintains the load balancing, thereby optimizing the battery lifetime of the MN.

(1) For file transfer, the policy was set to maximize the throughput. Therefore, whenever a new network is discovered or the current access network is not reachable, the HDM is called to select the network with the highest available bandwidth among the candidate networks. After the handover completes, when the peer IP Agent receives E2E-MM control messages notifying it of the transmission path update, the peer's TAC can direct the MA to resume the data transmission using the adaptive transmission rate on the new data path.
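The two per-application decision policies, (1) above and (2) described next, can be sketched together in a few lines of Python (our illustration; the candidate-network fields are hypothetical):

def select_network(candidates, policy):
    # candidates: dicts such as
    # {"name": "wlan0", "bandwidth_mbps": 20.0, "coverage_m": 300, "qos_ok": True}
    if policy == "file_transfer":
        # policy (1): maximize throughput via highest available bandwidth
        return max(candidates, key=lambda n: n["bandwidth_mbps"])
    if policy == "audio_video":
        # policy (2): among QoS-satisfying networks, prefer the largest
        # coverage so as to minimize the number of handovers
        ok = [n for n in candidates if n["qos_ok"]]
        return max(ok, key=lambda n: n["coverage_m"]) if ok else None
    raise ValueError("unknown policy: " + policy)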
(2) For audio and video data, the policy was set to minimize the handover latency, packet delay-jitter variations, and packet loss. Therefore, the MN attempts a handover only when the current service network cannot meet its QoS requirement or the MN moves out of the coverage of the current service network. A network with the largest coverage is selected, as long as it provides the required QoS, in order to minimize the number of handovers.

CONCLUSION

When connections need to migrate between heterogeneous networks, a seamless vertical handoff is necessary. We have proposed a platform for mobility management, for which seamless vertical handoff is the first step. There is a trade-off between the monetary cost and the application QoS, and in existing systems it is complicated to illustrate the gains in terms of both metrics. In this paper we have improved the monetary cost as well as the QoS of applications, which is not available in the existing methods. To improve the monetary cost we maximize the battery lifetime of each MN; in a heterogeneous network, the amount of traffic that each MN relays has a great impact on the MN's battery lifetime. Hence a VHD algorithm and a route selection algorithm are used to improve the monetary cost. We take into account QoS parameters such as throughput, jitter, packet error rate and end-to-end delay to improve the QoS of the application services. Results are simulated using the NS-2 simulator.

REFERENCES:
[1] SuKyoung Lee, Kotikalapudi Sriram, Kyungsoo Kim, Yoon Hyuk Kim, and Nada Golmie, "Vertical Handoff Decision Algorithms for Providing Optimized Performance in Heterogeneous Wireless Networks," IEEE Transactions on Vehicular Technology, vol. 58, no. 2, February 2009.
[2] Moonjeong Chang, Hyunjeong Lee, and Meejeong Lee (Ewha Womans University, Seoul, Republic of Korea), "A Per-Application Mobility Management Platform for Application-Specific Handover Decision in Overlay Networks," Computer Networks, www.elsevier.com/locate/comnet.
[3] NS-2, http://www.isi.edu/nsnam/ns.
Topology-aware techniques for P2P have been extensively studied in recent years. Three main approaches are widely used to construct topology-aware structured P2P overlays: geographic layout, proximity routing, and proximity neighbor selection [6]. Geographic layout reflects the physical location of a node in the value of its NodeID. As the NodeID assignment mechanism is always determined by the architecture itself, the geographic layout method is always conceived for a certain P2P protocol. [7] presents an effective implementation in CAN [1] that achieves a low delay stretch. In addition, [8] proposes a method of obtaining an appropriate NodeID by considering the node's physical position in Chord [2]. The proximity routing approach attempts to select a relatively near node from a set of candidates as the next hop of routing.

locations, into self-organized clusters with the properties of certainty and symmetry.

A. Basic definitions

Before illustrating the clustering algorithm, four basic definitions are introduced to show the method of physical-topology-aware overlay construction and cluster identification.

Definition 1) The reference frame of the physical topology is defined by n different landmarks [10] in the Internet; each landmark stands for one dimension of the physical topology. In such a reference frame, a node locates itself according to the Round Trip Time (RTT) measured by sending ICMP echo messages (Ping) to each landmark. For accuracy, the landmarks are supposed to be distributed uniformly across the Internet.
10:   if Receive(Nm) = true
11:     i++;
12:     Update the routing table;
13:   endif
14: endfor
15: endif
16: endwhile
17: if i ≤ K
18:   return the contact information of Nj;
19: else
20:   return 0;
21: endif
22: end function
B. Logical Topology

The logical topology of the original Kademlia is an incomplete binary tree, where nodes are determined as leaves by unique prefixes of 160-bit hash quantities. The notion of distance is defined to be the bit-wise XOR between NodeIDs.

The routing table is an improvement on the traditional k-buckets [4] in the proposed Kademlia. Every node stores a list of <NodeID, IP address, Port, Dis, Time> records for neighbors whose XOR distance falls into the range between 2^i and 2^(i+1). If Ni ∈ Clusteri and Nj ∈ Clusterj, Dis stands for the mapping distance between Clusteri and Clusterj. In each k-bucket, records are sorted by Dis, with the nearest node at the head and the farthest node at the end; thus the nearer neighbors have priority to be requested. Time registers the latest contact time of the neighbor, which is the basis for updating the routing table. The farthest XOR distance between nodes is 2^(n⌈log2 n⌉+128), and there are n⌈log2 n⌉+128 k-buckets in total, since NodeIDs are (n⌈log2 n⌉+128)-bit quantities. The original Kademlia defines four RPCs: PING, STORE, FIND_NODE and FIND_VALUE. In the proposed structure, they are also adopted in the routing algorithm, which is detailed in Algorithm II.
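The classic Kademlia bucketing rule referenced above (a neighbor at XOR distance d, with 2^i ≤ d < 2^(i+1), belongs to k-bucket i) can be sketched in Python (our illustration):

from dataclasses import dataclass

@dataclass
class Record:
    node_id: int
    ip: str
    port: int
    dis: int      # mapping distance between the two nodes' clusters
    time: float   # latest contact time, used to refresh the bucket

def bucket_index(my_id: int, other_id: int) -> int:
    d = my_id ^ other_id          # bit-wise XOR distance
    if d == 0:
        raise ValueError("a node does not bucket itself")
    return d.bit_length() - 1     # i such that 2^i <= d < 2^(i+1)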
D. Theoretical Analysis
[11] G. Kwon and K. D. Ryu, "BYPASS: topology-aware lookup overlay for DHT-based P2P file locating services," Proc. International Conference on Parallel and Distributed Systems, IEEE Computer Society, 2004, pp. 297-304, doi: 10.1109/ICPADS.2004.24.
[12] F. Hong, M. Li and J. D. Yu, "PChord: improvement on Chord to achieve better routing efficiency by exploiting proximity," Proc. IEEE International Conference on Distributed Computing Systems Workshops, IEEE Computer Society, 2005, pp. 806-811, doi: 10.1109/ICDCSW.2005.108.
[13] Y. Liu, P. Yang, Z. Chu and J. G. Wu, "TCS-Chord: an improved routing algorithm to Chord based on the topology-aware clustering in self-organizing mode," Proc. International Conference on Semantics, Knowledge, and Grid, IEEE Computer Society, 2005, pp. 25-25, doi: 10.1109/SKG.2005.121.
[14] Y. Liu and P. Yang, "An advanced algorithm to P2P semantic routing based on the topologically-aware clustering in self-organizing mode," Journal of Software, 2006, vol. 17, part 2, pp. 339-348.
J. P. Xiong, Y. W. Zhang, P. L. Hong and J. S. Li, "Reduce Chord routing latency issue in the context of IPv6," IEEE Communications Letters, vol. 10, Jan. 2006, pp. 62-64, doi: 10.1109/LCOMM.2006.1576571.
[20] J. P. Xiong, Y. W. Zhang, P. L. Hong and J. S. Li, "Chord6: IPv6 based topology-aware Chord," Proc. International Conference on Networking and Services, IEEE Computer Society, 2005, pp. 4.
[21] S. G. Wang, H. Ji, T. Li and J. Q. Mei, "Topology-aware peer-to-peer overlay network for Ad-hoc," The Journal of China Universities of Posts and Telecommunications, vol. 16, Feb. 2009, pp. 111-115.
[22] R. Winter, T. Zahn and J. Schiller, "Random landmarking in mobile, topology-aware peer-to-peer networks," Proc. IEEE International Workshop on Future Trends of Distributed Computing Systems, IEEE Press, 2004, pp. 319-324.
[23] The P2PSim Project, http://pdos.csail.mit.edu/p2psim/, July 2008.