PROCEEDINGS OF "4th NATIONAL CONFERENCE ON ADVANCED COMPUTING TECHNOLOGIES" (NCACT'11), FEBRUARY 2, 2011 @ S.A. ENGINEERING COLLEGE


2nd FEBRUARY, 2011


Organized by

DEPARTMENT
OF
COMPUTER SCIENCE AND ENGINEERING

S.A. ENGINEERING
COLLEGE
NBA ACCREDITED & ISO 9001:2008 CERTIFIED INSTITUTION
Poonamallee – Avadi Road, Veeraraghavapuram,
Thiruverkadu, Chennai – 600 077.
E-Mail: ncact2011@saec.ac.in Website: www.saec.ac.in
Phone Nos: 044 – 26801999, 26801499
Fax No: 044 – 26801899

Sponsored by
DHARMA NAIDU EDUCATIONAL AND
CHARITABLE TRUST

BOARD OF TRUSTEES

(Late) D. SUDHARSSANAM, Founder
Shri. D. DURAISWAMY, Chairman
Shri. D. PARANTHAMAN, Vice Chairman
Shri. D. DASARATHAN, Secretary
Shri. S. AMARNAATH, Treasurer
Shri. S. GOPINATH, Joint Secretary


PREFACE
The Department of Computer Science & Engineering, S.A. Engineering College, Chennai, organized the 4th National Conference on Advanced Computing Technologies (NCACT-2011) on 2nd February 2011. This National Conference NCACT-2011 aims:

• To create awareness and to provide a platform for the participants to upgrade their knowledge and experience, and to discuss ways of disseminating the latest developments and advances in computing technology.
• To reflect the current focus of global research, recent developments, challenges and emerging trends in the field of Advanced Computing Technologies.
• To deliberate through the presentation of papers.

Areas of the Conference

• Cloud, Grid and Quantum Computing
• Nano, Distributed and Parallel Computing
• Wearable, Ubiquitous Computing
• Computer and Information Security
• Wireless Networks
• Multimedia Network and Applications
• 3G/4G Networks
• E-learning Methodologies
• Data Mining and Warehousing
• Intelligent Web Services
• Information Retrieval
• SOA Tools and Services
• Computational Intelligence

A total of about 172 technical papers were received from postgraduate students, faculty members and research scholars from R&D organizations, covering a wide spectrum of areas, viz. Cloud, Grid and Quantum Computing; Data Mining and Data Warehousing; Network Security; Wireless Technologies; Operating Systems; Web Mining, etc. These papers were peer reviewed by technical experts, and 78 papers were selected for presentation. This volume is a record of current research on recent trends in advanced computing technologies. We would like to express our sincere thanks to Shri. D. Duraiswamy, Chairman; Shri. D. Dasarathan, Secretary; Shri. D. Paranthaman, Vice Chairman; Shri. S. Amarnaath, Treasurer; Thiru P. Venkatesh Raja, Director; and Dr. S. Suyambhazhahan, Principal, for providing us all the support for conducting this 4th National Conference. We thank the various organizations that have deputed delegates to participate in the conference. We wish to express our sincere thanks to all advisory committee members for their cordiality and for sharing their expertise during the various processes of the conference. We also thank the faculty members and students of the Department of Computer Science & Engineering for their cooperation in making this conference a grand success.

EDITORIAL BOARD

N. PARTHEEBAN (Ph.D.),
ASSISTANT PROFESSOR
COMPUTER SCIENCE AND ENGINEERING


STEERING COMMITTEE

CHIEF PATRON
(Late) Shri. D. SUDHARSSANAM, Founder

Shri. D. DURAISWAMY, Chairman
Shri. D. PARANTHAMAN, Vice Chairman
Shri. D. DASARATHAN, Secretary
Shri. S. AMARNAATH, Treasurer
Shri. S. GOPINATH, Joint Secretary

PATRON
Shri. P. VENKATESH RAJA, Director

CO-PATRON
Dr. S. SUYAMBAZHAHAN, Principal

ADVISORY COMMITTEE
Dr. C. CHELLAPPAN, Professor, Anna University
Dr. K.S. EASWARAKUMAR, Professor, Anna University, Chennai
Dr. V. RHYMEND UTHARIARAJ, Director, Ramanujan Computing Centre, Anna University
Dr. A. KANNAN, Professor, Anna University
Dr. S. VALLI, Associate Professor, Anna University
Dr. V. UMA MAHESHWARI, Associate Professor, Anna University
Dr. A.P. SHANTHI, Professor, Anna University
Dr. B. VINAYAGA SUNDARAM, MIT, Chennai
Dr. S.R. BALASUNDARAM, Professor, NIT, Trichy

CONFERENCE CHAIRMAN
Mrs. P.N. JEBARANI SARGUNAR (Ph.D), HOD / CSE

CO-ORDINATORS
Mr. N. PARTHEEBAN (Ph.D), Asst. Prof. / CSE
Mrs. B. MURUGESWARI (Ph.D), Asst. Prof. / CSE


ORGANIZING COMMITTEE MEMBERS:

Mrs. R. GEETHA, M.E., Asst. Professor
Mr. C. BALAKRISHNAN, M.E., Asst. Professor
Mrs. N.S. USHA, M.E., Asst. Professor
Mrs. D. CHITRA, M.E., Asst. Professor
Mrs. E. SUJATHA, M.Tech., Asst. Professor
Mrs. A. BHAGYALAKSHMI, M.E., Asst. Professor
Mrs. S. KALPANA DEVI, M.Tech., Asst. Professor
Mr. M. BALASUBRAMANIAN, B.E., Senior Lecturer
Mr. A. MANI, M.E., Senior Lecturer
Mrs. PAUL JASMINE RANI, M.E., Senior Lecturer
Mrs. NITHYA, M.E., Senior Lecturer
Mrs. VANITHA, B.E., Lecturer
Ms. K. RAMYA DEVI, B.E., Lecturer
Mr. G. THIAGARAJAN, B.E., Lecturer
Mr. S. PRABHU, B.E., Lecturer
Mrs. JOYCE JESIE, B.E., Lecturer
Mrs. R. SUDHA, B.E., Lecturer
Mr. MUTHU KUMARASWAMY, B.E., Lecturer
Ms. S. PREETHI, B.Tech., Lecturer


VISION AND MISSION

• To transform our institution into a quality technical education center imparting updated technical knowledge with character building.

• To create an excellent teaching and learning environment for our staff and students to realize their full potential, thus enabling them to contribute positively to the community.

• To significantly enhance the self-confidence level for developing the creative skills of staff and students.

ABOUT THE COLLEGE

S.A. Engineering College was established by the Dharma Naidu Educational & Charitable Trust in the year 1998-'99. The college is approved by AICTE, Delhi, and affiliated to Anna University, Chennai, Tamilnadu. The college is well planned and well designed, spread over 42 acres, and has more than 2.91 lakh sq.ft. of constructed area. In recognition of the high-calibre quality system implemented in the administration of the institution and the achievement of its goals, M/s TUV has accorded ISO 9001:2008 certification. All the undergraduate programmes offered are accredited by the National Board of Accreditation (NBA). The college offers the following 6 U.G. programmes and 5 P.G. programmes:

B.E. - Computer Science and Engineering
B.E. - Electronics and Communication Engineering
B.E. - Electrical and Electronics Engineering
B.E. - Mechanical Engineering
B.E. - Civil Engineering
B.Tech. - Information Technology
M.E. - Computer Science Engineering
M.E. - Communication Engineering
M.E. - Embedded Systems Technologies
M.B.A. - Master of Business Administration
M.C.A. - Master of Computer Applications

The college maintains a high standard of education by providing a wide array of world-class academic facilities, employing highly qualified and experienced faculty members, and creating an ambience conducive to quality education.


ABOUT THE DEPARTMENT

The Department of Computer Science & Engineering was established in the year 1998. The department is accredited by NBA. The department has grown strong over the years in terms of infrastructure facilities, an experienced and dedicated team of faculty, technical expertise, modern teaching aids, tutorial rooms, and well-equipped and spacious laboratories. The department has a separate library and a seminar hall with all the latest equipment. The department currently has an intake of 120 students. The department also has 120 kVA power backup and 2 Mbps leased-line Internet connectivity for the benefit of the students.


(NBA Accredited and ISO 9001:2008 Certified Institution)
Poonamallee – Avadi Road, Veeraraghavapuram, Chennai – 600 077.
E-Mail: ncact2011@saec.ac.in Website: www.saec.ac.in
Phone Nos: 044 – 26801999, 26801499 Fax No: 044 – 26801899

S. AMARNAATH, M.Com.,
CORRESPONDENT

MESSAGE
We at this institution constantly strive to provide an excellent academic environment for the benefit of students and faculty so that they will acquire a technological competence synonymous with human dignity and values.

We are dedicated to a continuous process, through this "4th National Conference on ADVANCED COMPUTING TECHNOLOGIES – NCACT'11", of upgrading academic performance and managerial practices through infrastructure and technological facilities. This commitment will enable us to provide updated knowledge inputs and practical support to the participants in order to build their confidence level.

I am happy to know that our institution is maintaining the tradition it has set with respect to achievements in Engineering & Technology, cultural and other activities of the organization, extending to another milestone with this Conference in the academic year 2010-2011, organized by the Department of Computer Science and Engineering.

I congratulate and offer my best wishes to the Principal and committee members who have involved themselves in this conference towards academic development for the benefit of the student community.

S. AMARNAATH


(NBA Accredited and ISO 9001:2008 Certified Institution)
Poonamallee – Avadi Road, Veeraraghavapuram, Chennai – 600 077.
E-Mail: ncact2011@saec.ac.in Website: www.saec.ac.in
Phone Nos: 044 – 26801999, 26801499 Fax No: 044 – 26801899

P. VENKATESH RAJA, B.E., M.S.,
DIRECTOR

MESSAGE
This institution is a tribute to the great organizing genius of its Founder. Without his initiative and inspiration it would have been impossible to found an institution of this character. This institution is a memorable experiment in the moral and technological regeneration of India. It stands for nothing less.

We propose to maintain here standards of discipline and decorum, of decency, dignity and character building, that are equaled by few and surpassed by none in contemporary education systems.

With this, we are proud to conduct the "4th National Conference on ADVANCED COMPUTING TECHNOLOGIES – NCACT'11" on 2nd February 2011. We thank the Principal and faculty members who have involved themselves in this Conference, and the participants who have come forward to benefit themselves and to develop their academic knowledge and confidence through this conference.

P. VENKATESH RAJA


(NBA Accredited and ISO 9001:2008 Certified Institution)
Poonamallee – Avadi Road, Veeraraghavapuram, Chennai – 600 077.
E-Mail: ncact2011@saec.ac.in Website: www.saec.ac.in
Phone Nos: 044 – 26801999, 26801499 Fax No: 044 – 26801899

Dr. S. SUYAMBAZHAHAN, M.E., Ph.D. (IITM)
PRINCIPAL

MESSAGE

I appreciate the initiative taken by the heads of department and the faculty members of Computer Science and Engineering in conducting the "4th National Conference on ADVANCED COMPUTING TECHNOLOGIES – NCACT'11" on 2nd February 2011 at our college campus.

It also gives a sense of accomplishment and achievement to the students and staff of the CSE department to release the proceedings on this occasion, which focus on the latest advancements in the areas of computing technology. I am sure this Conference highlights and brings out the best of every paper presented by the authors from academia, R&D institutions and the student community pursuing higher degree and doctoral programmes in various institutions.

I sincerely appreciate the efforts made by the Principal, HOD, staff and students with a great sense of belongingness and ownership, and wish them great success in the times to come.

Dr. S. SUYAMBAZHAHAN


S.NO | NAME OF THE AUTHOR(S) | TITLE OF THE PAPER | NAME OF THE COLLEGE | MAIL ID / CONTACT NO
1 | VINITHRA.I | EARLY DETECTION AND CONGESTION CONTROL IN HETEROGENEOUS NETWORKS | PITAM | psoftworld87@gmail.com
2 | GOMATHI.V | CONTENT BASED IMAGE RETRIEVAL ON MOBILE DEVICES BY NAVIGATION PATTERN BASED RELEVANCE FEEDBACK | PITAM | gomathi.gowrisankar@gmail.com
3 | ANNA ARASU.A, R.NAKEERAN | REVERSE NEAREST NEIGHBOR FOR ANONYMOUS QUERIES | DR.PAULS ENGINEERING COLLEGE | smilearasucse@gmail.com, sughandhiram@yahoo.com
4 | BHANUMATHI.R | AN EFFICIENT IMAGE ENHANCEMENT BASED ON DWT AND SVD | PITAM | bhanu291987@yahoo.co.in
5 | UMAMAHESWARI.P.K | ANALYSIS AND PROTECTION OF KEY DISTRIBUTION SCHEME FOR SECURE GROUP COMMUNICATION | PITAM | umapk2008@gmail.com
6 | JABALIDBIN.M, S.MADHAN KUMAR | SECURE INFORMATION DELIVERY IN WIRELESS SENSOR NODES | VEL TECH MULTI TECH DR.RR & DR.SR ENGINEERING COLLEGE | madhan868@gmail.com, jabalidbin@yahoo.com
7 | GEETHANJALI JAYACHANDRAN, N.GOMATHI, V.R. VIMAL | CONTENT AWARE PLAYOUT FOR VIDEO STREAMING | VEL TECH MULTI TECH SRS ENGG | geethavec@gmail.com, gomathi1974@gmail.com, vimalraman2004@gmail.com
8 | ANUSHA.S, B.BHUVANESWARAN | CRYPTANALYSIS OF AN EDGE CRYPT ALGORITHM | RAJALAKSHMI ENGINEERING COLLEGE | s_anusha25@yahoo.com, bhuvangates@yahoo.com
9 | GAYATHRI.U | A CONCURRENCY CONTROL PROTOCOL USING ZR+-TREES FOR SPATIAL JOIN AND KNN QUERIES | RAJALAKSHMI ENGINEERING COLLEGE | gaayathriu@gmail.com
10 | LEKSHMI PRIYA.R | SECURE ENERGY EFFICIENT DATA AGGREGATION PROTOCOL FOR DATA REPORTING IN WIRELESS SENSOR NETWORKS | RAJALAKSHMI ENGINEERING COLLEGE | yaraj09@gmail.com
11 | EVANGELIN HEMA MARIYA.R | EFFICIENT ENERGY SAVING USING DISTRIBUTED CLUSTER HEADS IN WIRELESS SENSOR NETWORKS | RAJALAKSHMI ENGINEERING COLLEGE | Hema.mariya@gmail.com
12 | VADHANI.R | A NOVEL FRAMEWORK FOR DENIAL OF PHISHING BY COMBINING HEURISTIC & CONTENT BASED SEARCH ALGORITHM | RAJALAKSHMI ENGINEERING COLLEGE | Vadhanitamilarasi@gmail.com
13 | SURESHBABU.D, C.PRABHAKARAN | RISK ESTIMATION USING OBJECT-ORIENTED METRICS | VEL TECH MULTI TECH DR.RR & DR.SR ENGINEERING COLLEGE | sureshbabu.me@gmail.com
14 | PIRAMANAYAGAM.M, M.YUVARAJU | SECURE ENCRYPTION AND KEYING BASED ON VIRTUAL ENERGY FOR WIRELESS SENSOR NETWORKS | DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, ANNA UNIVERSITY OF TECHNOLOGY, COIMBATORE | rajaucbe@gmail.com
15 | SIVARANJANI.P, P.NEELAVENI | HYBRID INFRASTRUCTURE SYSTEM FOR EXECUTING SERVICE WORKFLOWS | G.K.M COLLEGE OF ENGINEERING AND TECHNOLOGY, PERINGALATHUR, CHENNAI | ranjusiva22@gmail.com
16 | NANDHINI.T.J | IMPROVISED SOLUTION THROUGH MERKLE TREE ALGORITHM FOR SECURE MULTIPATH ROUTING WITH EFFICIENT COLLABORATION OF BLACK HOLES | RAJALAKSHMI ENGINEERING COLLEGE | nandhinii.km@gmail.com
17 | LAKSHMI PRIYA.R.V, R. TAMILARASI | SYBIL GUARD: DEFENDING AGAINST SYBIL ATTACKS VIA SOCIAL NETWORKS | VELAMMAL ENGINEERING COLLEGE, CHENNAI | priya_be_cs@yahoo.co.in, tamil1806@yahoo.co.in
18 | ANAND.V.J, SELVAKUMAR.V.S | REDUNDANCY CHECK ARCHITECTURE | RAJALAKSHMI ENGINEERING COLLEGE | vjanandbe_08@hotmail.com, selvakumar.vs@rajalakshmi.edu.in
19 | LINGESAN.J, R.KANNAMMA | MODELING BOTNET PROPAGATION FOR DETECTING BOTMASTERS | PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT | lingesan.j@gmail.com, kannamma.sridharan@gmail.com
20 | SENTHILMURUGAN.T, SENTHIL.P | IMPROVING SECURITY PERFORMANCE OF MOBILE AD-HOC NETWORKS AGAINST ATTACKS | VEL TECH DR.RR & DR.SR TECHNICAL UNIVERSITY | senthilmuruganme@gmail.com (Mobile: 9176031383), SSSENTHIL.P@gmail.com
21 | MANIKANDAN.T, POONGUZHALI.C, D.CHITHRA | IMAGE RECOGNITION FOR DESIGNING CAPTCHAS | TAGORE ENGINEERING COLLEGE / S.A. ENGINEERING COLLEGE | manikandan.tk@gmail.com, poonguzhali81@yahoo.co.in, meetchithra.d@gmail.com
22 | RAMNATH.M, S.ARUNA, P.PRABHU | PMG BASED HANDOFF IN WIRELESS MESH NETWORKS | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE, AVADI, CHENNAI | Ramnath25@gmail.com
23 | ARJUNADHITYAA.K.R, D.ANANDHI | A NOVEL TECHNIQUE FOR DETECTING DATA HIDDEN ON DIGITAL IMAGE USING STEGANOGRAPHY | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | kr.arjunadhityaa@gmail.com, anandhime@yahoo.com
24 | MAHAALAKSHMI K, NEELAKANDAN S | AUTOMATIC DATA EXTRACTION FROM WEBPAGES BY WEBNLP | VEL TECH MULTI TECH DR.RR & DR.SR ENGG COLLEGE | kmahaalakshmi@yahoo.com, snksnk07@gmail.com, anandhime@ymail.com
25 | MATHEW P C, M. ARUNKUMAR | IMPLEMENTATION OF ENCRYPTED IMAGE COMPRESSION USING RESOLUTION PROGRESSIVE COMPRESSION SCHEME | PSNA COLLEGE OF ENGINEERING AND TECHNOLOGY, DINDIGUL, TAMIL NADU | pcmathew_bmc@yahoo.com
26 | REVATHI.P, J. JAGADEESH | IDENTIFICATION OF STRUCTURAL CLONES USING ASSOCIATION RULE AND CLUSTERING | VEL TECH MULTI TECH DR.RANGARAJAN & DR.SAKUNTHALA ENGINEERING COLLEGE | me.jaga4688@gmail.com, p_revathime@gmail.com
27 | ASOKKUMAR.S | DATA MINING TECHNIQUES FOR CUSTOMER RELATIONSHIP MANAGEMENT | RESEARCH SCHOLAR, ANNA UNIVERSITY OF TECHNOLOGY, COIMBATORE | asokkumar777@gmail.com
28 | NANCY.P.N, PROF.R.PRASANNA KUMAR | LOCATION DEPENDENT PRIVACY AWARE MONITORING FRAMEWORK FOR SAFE REGION MOVING OBJECTS | JAYA ENGINEERING COLLEGE | baslinnancy24@gmail.com
29 | SANTHIKALA.M, ANANTHARAJ.B, DR.T.RAVI | PRIVACY-PRESERVING USING TUPLE AND THRESHOLD MATCHING IN DISTRIBUTED SYSTEMS | THIRUVALLUVAR COLLEGE OF ENGINEERING AND TECHNOLOGY, VANDAVASI | saravanansanthi@yahoo.co.in, 9789074232
30 | HIMAVANTHA RAJU VATSAVAI, MS.G.MUNEESWARI M.E.(Ph.D) | DDOS DEFENSE MECHANISMS FOR DETECTING, TRACING AND MITIGATING NETWORK WIDE ANOMALIES | R.M.K ENGINEERING COLLEGE, KAVARAIPETTAI | himavanthraju@gmail.com
31 | SHAHINA BEGAM.I, DR.KARTHIKEYANI.V, TAJUDIN.K, PARVIN BEGAM.I | SPATIO-TEMPORAL INDEX STRUCTURE ANALYSIS | VEL TECH HIGH TECH DR.RR & DR.SR ENGG COLLEGE | sbshahintaj@gmail.com
32 | SIVAKAMY.N, B.MURUGESWARI, DR.C.JAYAKUMAR | MODIFIED DELAY STRATEGY IN GRID ENVIRONMENT | S.A. ENGINEERING COLLEGE, CHENNAI | sivakamyn@yahoo.com
33 | INDUMATHY.C | AN EFFECTIVE WEB-BASED E-LEARNING BY MANAGING RESOURCES USING ONTOLOGY | EASWARI ENGINEERING COLLEGE | indumathyc@hotmail.com
34 | ESWARI.R | EFFECTIVE AND EFFICIENT QUERY PROCESSING FOR IDENTIFYING VIDEO SUBSEQUENCE | EASWARI ENGINEERING COLLEGE | eswariram_88@yahoo.co.in
35 | VEENA.K | COMPUTER AND INFORMATION SECURITY | ANNA UNIVERSITY | dce.veena@gmail.com
36 | ALEN JEFFIE PENELOPE.J, L.BHAGYALAKSHMI | ENHANCING THE LIFETIME OF DATA GATHERING WIRELESS SENSOR NETWORK BY BALANCING ENERGY CONSUMPTION | EASWARI ENGINEERING COLLEGE | msg2allen@yahoo.co.in
37 | AMUDHA.S, ALLIRANI.P, M.KAVITHA | SECURITY ISSUES AND PRIVACY OF CLOUD COMPUTING | ST PETER'S UNIVERSITY / SRIRAM ENGINEERING COLLEGE | amudhasaravanan@gmail.com, 9994554412
38 | FATHIMA.K, MS. KOUSALYA | COST EFFECTIVE WIRELESS HEALTH MONITORING SYSTEM FOR INDUCTION MOTORS | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE, AVADI | fathimakhadar@gmail.com
39 | SANTHOSHKUMAR.S.P, M.YUVARAJU | RANDOM CHECKPOINTING ARRANGEMENT IN DECENTRALIZED MOBILE GRID COMPUTING | ANNA UNIVERSITY OF TECHNOLOGY, COIMBATORE, INDIA | sp_santhoshkumar@yahoo.co.in
40 | AARTHI.S | NOVEL METHOD FOR SEQUENCE NUMBER COLLECTOR PROBLEM IN BLACK HOLE ATTACK DETECTION - AODV BASED MANET | RAJALAKSHMI ENGINEERING COLLEGE, CHENNAI | cse.aarthi@gmail.com
41 | JESHIFA.G.IMMANUEL, A.FIDAL CASTRO, PROF.E.BABU RAJ | ADVANCED CONGESTION CONTROL TECHNIQUE FOR HEALTH CARE MONITORING IN WIRELESS BIOMEDICAL SENSOR NETWORKS | JAYA ENGINEERING COLLEGE, THIRUNINDRAVUR - 602024 | jeshifa@gmail.com
42 | NITHYA KUMARI.K, BHAGYALAKSHMI.L | A GAME THEORETIC FRAMEWORK FOR POWER CONTROL IN WIRELESS AD HOC NETWORKS | EASWARI ENGINEERING COLLEGE | kumari.nithu@yahoo.co.in
43 | UMA.R, L. PAUL JASMINE RANI | SMABS: SECURE MULTICAST AUTHENTICATION BASED ON BATCH SIGNATURE | S.A. ENGINEERING COLLEGE | uma_devi1985@yahoo.com
44 | SANJAIKUMAR.K, G.UMARANI SRIKANTH M.E.(Ph.D) | AFFINE SYMMETRIC IMAGE MODEL | S.A. ENGINEERING COLLEGE | k.sanjai31@rocketmail.com
45 | MADHAVI.S, S. KALPANA DEVI | COMBINING TPE SCHEME AND SDEC FOR SECURE DISTRIBUTED NETWORKED STORAGE | S.A. ENGINEERING COLLEGE | Madhavi11lakshmi@gmail.com
46 | LAVANYA.R, E.SUJATHA | PERFORMANCE EVALUATION OF FLOOD SEQUENCING PROTOCOLS IN SENSOR NETWORKS | S.A. ENGINEERING COLLEGE | laaviraj@gmail.com
47 | PARVIN BEGUM.I, DR.KARTHIKEYANI.V, TAJUDIN.K, SHAHINA BEGAM.I | KNOWLEDGE DISCOVERY PROCESS THROUGH TEXT MINING | SOKA IKEDA COLLEGE OF ARTS AND SCIENCE | parvinnadiya@gmail.com
48 | SUNIL.P | DATA LEAKAGE DETECTION USING ROBUST AUDIO HIDING TECHNIQUES | PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT, ARANVAYALKUPPAM | psunilcad@gmail.com
49 | KIRUTHIKA DEVI.S, A.BHAGYALAKSHMI | A SECURED CLOUD COMPUTING FOR LIFE CARE INTEGRATED WITH WSN | S.A. ENGINEERING COLLEGE, CHENNAI | kiruthisuju29@gmail.com
50 | RENUKADEVI.M, G.UMARANI SRIKANTH | TRAFFIC ANALYSIS AGAINST FLOW CORRELATION ATTACKS | S.A. ENGINEERING COLLEGE, CHENNAI - 77 | cserenukadevi13@gmail.com
51 | SAMSUL ADAM.M, U.SYED ABUDHAGIR, M.DEIVAMANI | A FAULT TOLERANT BASED RESOURCE ALLOCATION FOR THE GRID ENVIRONMENT | COLLEGE OF ENGINEERING, GUINDY | Adams146@gmail.com
52 | SIVAPERUMAL.V, P. MAHALAKSHMI | OPTIMIZED ROUTING ALGORITHM FOR WIRELESS MESH NETWORKS | JERUSALEM COLLEGE OF ENGINEERING, ANNA UNIVERSITY CHENNAI | shiva_sathya@yahoo.com
53 | SIVARANJANI.G, M.RAJALAKSHMI | AUTOMATIC MULTILEVEL THRESHOLDING OF DIGITAL IMAGES | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | shivapss@gmail.com
54 | BALAJI.V, PARTHEEBAN.N | AN EFFICIENT CROSS LAYER INTRUSION DETECTION TECHNIQUE FOR MANET | S.A. ENGINEERING COLLEGE | Knparthi78@gmail.com
55 | HANSON THAYA.S | WIRELESS SENSOR NETWORK SECURITY USING VIRTUAL ENERGY BASED ENCRYPTION | S.A. ENGINEERING COLLEGE | Hanson2001@gmail.com
56 | SANGEETHA.J | QOS-AWARE CHECKPOINTING ARRANGEMENT IN MOBILE GRID ENVIRONMENT | S.A. ENGINEERING COLLEGE | Jsnageetha22@gmail.com
57 | SIREESHA.P | | S.A. ENGINEERING COLLEGE | Siri.it.05@gmail.com
58 | SIVASAKTHI.K | EXTENDED QUERY ORIENTED, CONCEPT-BASED USER PROFILES FROM SEARCH ENGINE LOGS | S.A. ENGINEERING COLLEGE | 1k.s.shakthi@gmail.com
59 | ALGUMANI.S | ENSEMBLE REGISTRATION OF MULTI SENSOR IMAGES | S.A. ENGINEERING COLLEGE | r.alagumani@gmail.com
60 | UMA.S, G.SHOBA | EMBEDDING CRYPTOGRAPHY IN VIDEO STEGANOGRAPHY | DR.PAULS ENGINEERING COLLEGE | dewuma@gmail.com
61 | KRISHNA KUMAR.N, MADHU SUDHANAN.S | RULE CLASSIFICATION FOR MEDICAL DATASET | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Krishnakumarme2603@gmail.com
62 | NIROSHA.N | ENHANCED VEHICLE DETECTION BY EARLY OBJECT IDENTIFICATION | PITAM | niroshait@gmail.com
63 | BHARATHIRAJA.S, S.SUMATHI | EFFICIENT ROUTING BASED ON LOAD BALANCING IN WIRELESS MESH NETWORKS | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Bharathiraja.88s@gmail.com
64 | SWAPNA.P, K.SAILAKSHMI | | S.V.V.S.N. ENGINEERING COLLEGE | Priyaswapna245@gmail.com
65 | USHA.M | ROUTING BASED ON LOAD BALANCING IN MULTI-HOP WIRELESS MESH NETWORKS | VELAMMAL ENGINEERING COLLEGE | umahalingam@gmail.com
66 | DURAI MURUGAN.J | FLEXIBLE LOAD BALANCING IN MULTI-SERVER GRID ENVIRONMENT | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | Dmurugan02@gmail.com
67 | JEGADEESAN.R | EFFICIENT LOAD BALANCING IN VIDEO SERVERS FOR VOD SYSTEM | VEL HIGH TECH | ramjaganjagan@gmail.com
68 | RAMALINGAM.D | A TOOL FOR FINDING BUGS IN WEB APPLICATIONS | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | Ramscse_2006@yahoo.co.in
69 | JENIFA SUBHA PRIYA.S | DESIGN OF DETERMINISTIC KEY DISTRIBUTION FOR WSN | JERUSALEM ENGINEERING COLLEGE | Jenefa16@gmail.com
70 | SURENDRAN.M, M.SARANYA, S.SUBRAMANIAN | VIRTUAL MOUSE USING HCI | SRIRAM ENGINEERING COLLEGE | Suren.csc@gmail.com
71 | PANNER SELVI.R | EFFICIENTLY IDENTIFYING DDOS ATTACKS BY GROUP BASED THEORY | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | Rpanner_selvi@yahoo.co.in
72 | TAMILARASI.P | DEDUCING THE SCHEMA FOR WEBSITES USING PAGE-LEVEL WEB DATA EXTRACTION | VEL TECH MULTI TECH DR.RANGARAJAN DR.SAKUNTHALA ENGINEERING COLLEGE | tamuluparaman@gmail.com
73 | SUGANYA.N | QOS METRICS IN PARTICLE SWARM TECHNIQUE FOR SELECTION, RANKING AND UPDATION OF WEB SERVICE | RAJALAKSHMI ENGINEERING COLLEGE | Suganya2011@gmail.com
74 | BINU JOHN | ANALYSIS ON THE PERFORMANCE OF VARIOUS DATA MINING ALGORITHMS FOR CARDIOVASCULAR RISK FACTORS | RAJALAKSHMI ENGINEERING COLLEGE | Binujohn86@gmail.com
75 | VICTORIYA | MINIMIZATION OF HANDOFF FAILURE PROBABILITY IN NGWS USING CHMP | S.K.P. ENGINEERING COLLEGE, TIRUVANNAMALAI | Victoriya.isai@gmail.com
76 | SUDHA RAJESH | LEARNING DISCRIMINATIVE CANONICAL CORRELATIONS FOR OBJECT RECOGNITION WITH IMAGE | SRR ENGINEERING COLLEGE | --
77 | BHUVANESWARI | DISTRIBUTED DATA BACKUP AND RELIABLE RECOVERY FROM MOBILE GRID ENVIRONMENT | ADHIPARASAKTHI ENGINEERING COLLEGE, MELMARUVATHUR | bhuvanabharathy@gmail.com


EARLY DETECTION AND CONGESTION CONTROL IN HETEROGENEOUS NETWORKS

* Vinithra.I
PG SCHOLAR, M.E. CSE, Prathyusha Institute of Technology and Management,
Email id: psoftworld87@gmail.com, 9003983630

Abstract— When heterogeneous congestion control protocols that react to different pricing signals share the same network, the current theory based on utility maximization fails to predict the network behavior. Unlike in a homogeneous network, the bandwidth allocation now depends on router parameters and flow arrival patterns. This paper studies properties of the network such as fairness and uniqueness, and analyzes its withstanding capability along with its optimality and stability. It extends the study with two objectives: analyzing the optimality and stability of such networks and designing control schemes to improve those properties. First, we demonstrate the intricate behavior of a heterogeneous network through simulations and present a framework to help understand its equilibrium properties. Second, we propose a simple source-based algorithm to decouple bandwidth allocation from router parameters and flow arrival patterns by only updating a linear parameter in the sources' algorithms on a slow timescale. It steers a network to the unique optimal equilibrium. The scheme can be deployed incrementally, as the existing protocol needs no change and only new protocols need to adopt the slow timescale adaptation.

Index Terms—Congestion control, heterogeneous protocols, optimal allocation, stability.

I. INTRODUCTION

CONGESTION control in the Transmission Control Protocol (TCP), first introduced in [1], has enabled the explosive growth of the Internet. The currently predominant implementation, referred to as TCP Reno in this paper, uses packet loss as the congestion signal to dynamically adapt its transmission rate, or more precisely, its window size. It has worked remarkably well in the past, but its limitations in wireless networks and in networks with large bandwidth-delay product have motivated various proposals, some of which use different congestion signals. For example, in addition to loss-based protocols such as HighSpeed TCP [3], STCP [4] and BIC TCP [6], schemes that use queuing delay include the earlier proposals CARD [9], DUAL [10] and Vegas [10], and the recent proposal FAST [10].

Schemes that use a one-bit congestion signal include ECN [15], and those that use multibit feedback include XCP [9], MaxNet [16], and RCP [16]. Indeed, the Linux operating system already allows users to choose from a variety of congestion control algorithms since kernel version 2.6.13, including TCP-Illinois, which uses both packet loss and delay as congestion signals.

Recently, Compound TCP [12], which also uses multiple congestion signals, is deployed in the Windows Vista and Windows Server 2008 TCP stack [13]. Furthermore, if explicit feedback is deployed, it will become possible to feed back different signals to different users to implement new applications and services. Note that in this case, the heterogeneous signals can all be loss-based – different users receiving different explicit values based on the same actual link loss rate – or all delay-based, or a mix. Clearly, going forward, our network will become more heterogeneous, with protocols that react to different congestion signals interacting. Yet, our understanding of such a heterogeneous network is rudimentary. For example, a heterogeneous network, as shown in an early companion paper [11], may have multiple equilibrium points, and they cannot all be stable unless the equilibrium is globally unique.

In a homogeneous network, even though the sources may control their rates using different algorithms, they all adapt to the same congestion signal, e.g., all react to packet loss rate, as in the various variants of Reno and TFRC [8], or all to queuing delay, as in Vegas and FAST. For homogeneous networks, besides various detailed studies, there is already a well-developed theory, based on network utility maximization, that can help understand and engineer network behaviors.

In particular, it is known that a homogeneous network of general topology always has a unique equilibrium (operating point). It maximizes aggregate utility, and the fairness associated with it can be well predicted and controlled. More importantly, the bandwidth allocation depends only on the congestion control algorithms (equivalently, their underlying utility functions) but not on network parameters (e.g., buffer sizes) or flow arrival patterns, and hence can be designed through the choice of end-to-end TCP algorithms. For heterogeneous networks, in contrast, we in general cannot predict, nor control, the bandwidth allocation purely through the design of end-to-end congestion control algorithms. This implies, for example, that the standard "TCP friendly" concept is not well defined anymore. We propose a general scheme to steer an arbitrary heterogeneous network to the unique equilibrium point that maximizes the standard weighted aggregate utility by updating a linear scaler in the sources' algorithms on a slow timescale.

The scheme requires only local end-to-end information but does assume all flows have access to a common price, which is generally true in practice since the common price can be what the incumbent dominant protocol uses. It can be deployed incrementally, as the existing protocol needs no change and only the new protocols need to adopt the slow timescale adaptation. Packet-level (ns-2) simulation results using TCP Reno and FAST are presented, along with Linux experiments on a realistic testbed. We summarize here the main results that we have derived about heterogeneous congestion control in [11] and this paper.

• Existence of equilibrium: Theorem 2 in [11].
• Uniqueness of equilibrium:
—Local uniqueness: Theorem 3 in [11];
—Global uniqueness: Theorems 7 and 12 in [11].
• Optimality of equilibrium:
—Efficiency: Theorem 1 and Corollary 3 in this paper;
—Fairness: Theorems 2 and 3 in this paper.
• Stability of equilibrium:
—Local stability: Theorem 4 in this paper.

II. TWO MOTIVATING EXAMPLES

In this section, we describe two simulations to illustrate some particular throughput behavior in heterogeneous networks. All simulations use TCP Reno, which uses packet loss as congestion signal, and FAST TCP, which uses queueing delay as congestion signal. The first experiment (Example 1a) shows that when a Reno flow shares a single bottleneck link with a FAST flow, the relative bandwidth allocation depends critically on the link parameter (buffer size): the Reno flow achieves higher bandwidth than FAST when the buffer size is large and smaller bandwidth when it is small. This implies that one cannot control the fairness between Reno and FAST through just the design of end-to-end congestion control algorithms, since fairness is now linked to network parameters, unlike in the case of homogeneous networks.

The second experiment (Example 2a) shows that even on a (multilink) network with fixed parameters, one cannot control the fairness between Reno and FAST, because the relative allocation can change dramatically depending on which flow starts first.

FAST [15] is a high-speed TCP variant that uses delay as its main control signal. Periodically, a FAST flow adjusts its congestion window according to (1). In equilibrium, each FAST flow achieves a throughput x = α/q, where q is the equilibrium queueing delay observed by the flow. Hence, α is the number of packets that each FAST flow maintains in the bottleneck links along its path. In this example, one FAST flow and one Reno flow share a single bottleneck link with a capacity of 8.3 packets per ms (equivalent to 100 Mbps with maximum packet size) and a round-trip propagation delay of 50 ms.
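The FAST model above can be checked numerically. Below is a minimal Python sketch (ours, not the paper's ns-2 setup) of the standard FAST update w <- (1-gamma)w + gamma(w*baseRTT/RTT + alpha), for a single FAST flow whose values mirror this example: alpha = 50 packets, 50 ms propagation delay, and an 8.3 packets/ms link.

# Sketch of the periodic FAST window update described above. In equilibrium
# the flow keeps `alpha` packets queued, so its throughput is x = alpha / q,
# where q is the equilibrium queueing delay. All values are illustrative.

def fast_window_update(w: float, base_rtt: float, rtt: float,
                       alpha: float, gamma: float = 0.5) -> float:
    """One FAST update: move the window toward w * baseRTT/RTT + alpha."""
    target = w * base_rtt / rtt + alpha
    return (1 - gamma) * w + gamma * target

w = 100.0                                  # initial window (packets)
base_rtt, rtt, alpha = 0.050, 0.056, 50.0  # 50 ms propagation, 6 ms queueing
for _ in range(200):
    w = fast_window_update(w, base_rtt, rtt, alpha)

q = rtt - base_rtt
print(round(w / rtt), round(alpha / q))    # both ~8333 pkt/s = 8.3 pkts per ms

At the fixed point w = alpha * RTT / q, so the two printed throughputs agree, matching the x = alpha/q relation quoted above.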

A. Example 1a: Dependence of Bandwidth Allocation on Network Buffer Size

The topology is shown in Fig. 1. The FAST flow fixes its parameter α at 50 packets. In all of the ns-2 simulations in this paper, heavy-tail noise traffic is introduced at each link at an average rate of 10% of the link capacity. Fig. 2 shows the result with a bottleneck buffer size of 400 packets. In this case, FAST gets an average of 2.1 packets per ms while Reno gets 5.4 packets per ms.

III. OPTIMALITY

As we have shown in [11], for heterogeneous congestion control networks, the equilibrium cannot be characterized by utility maximization anymore. In this section, we further investigate the deviation from optimality in terms of both efficiency and fairness. This analysis provides insights on networks with heterogeneous congestion signals, for example, how to define interprotocol fairness. It also motivates the algorithm design.

A. Efficiency

We first make the following key observation, which motivates the other results on optimality and the algorithm development.

Theorem 1: Given an equilibrium p*, there exists a positive vector γ(p*) such that the equilibrium rate vector x*(p*) is the unique solution of the following problem:

max_{x >= 0} Σ_{j,i} γ_i^j U_i^j(x_i^j)  subject to Rx <= c

Corollary 1: All equilibrium points are Pareto efficient.

Corollary 2: Assume all utility functions are nonnegative, i.e., U(x) >= 0. Suppose the optimal aggregate utility is U* and Û is the achieved aggregate utility at an equilibrium x̂ of a network with heterogeneous protocols. Then

Û / U* >= γ_min / γ_max

B. Fairness

In this subsection, we study fairness in networks shared by heterogeneous congestion control protocols. Two questions we address are: how the flows within each protocol share among themselves (intraprotocol fairness) and how these protocols share bandwidth in equilibrium (interprotocol fairness). The results here generalize the corresponding theorems in [10].
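To make Theorem 1 concrete, here is a small numerical sketch (our own toy topology and weights, not the paper's experiments) that solves the γ-weighted utility maximization with the standard dual congestion-price iteration. With logarithmic utilities, the rate maximizing γ·log(x) − q·x is simply γ/q, where q is the flow's path price.

# Solve max_{x>=0} sum_i g[i]*log(x_i) s.t. R x <= c by dual gradient ascent
# on the link prices p; this is the optimization in Theorem 1 with U = log
# and gamma = g. Topology and numbers are made up for illustration.
import numpy as np

R = np.array([[1, 1, 0],                  # link 1 carries flows 1, 2
              [0, 1, 1]], dtype=float)    # link 2 carries flows 2, 3
c = np.array([8.3, 8.3])                  # link capacities (packets per ms)
g = np.array([1.0, 0.5, 2.0])             # the positive vector gamma

p = np.ones(2)                            # link prices
for _ in range(20000):
    q = R.T @ p                                    # path price of each flow
    x = g / np.maximum(q, 1e-9)                    # utility-maximizing rates
    p = np.maximum(p + 1e-4 * (R @ x - c), 0.0)    # price (dual) update

print(np.round(x, 2), np.round(R @ x, 2))  # equilibrium rates; loads -> c

Changing the vector γ reweights the equilibrium rates without touching the capacities, which is exactly the degree of freedom that the slow-timescale scheme of Section V exploits.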

1) Intraprotocol Fairness: When a network is shared only by flows using the same congestion signal, the utility functions describe how the flows share bandwidth among themselves. When flows using different congestion signals share the same network, this feature is still preserved "locally" within each protocol.

Theorem 2: Given an equilibrium (x̂, p̂), let ĉ^j := R^j x̂^j be the total bandwidth consumed by flows using protocol j at each link. The corresponding flow rates x̂^j are the unique solution of

max_{x^j >= 0} Σ_{i=1}^{N_j} U_i^j(x_i^j)  subject to R^j x^j <= ĉ^j

2) Interprotocol Fairness: Even though flows using different congestion signals individually solve a utility maximization problem to determine their intraprotocol fairness, they in general do not jointly solve any predefined convex utility maximization problem. Here we provide a feasibility result, which says that any reasonable interprotocol fairness is achievable by linearly scaling the congestion control algorithms. Assume flow (j, i) has a parameter μ_i^j with which it chooses its rate in the following way:

x_i^j(q_i^j) = ((U_i^j)')^{-1}((1/μ_i^j) q_i^j)

Theorem 3: For every link l, assume there is at least one type-j flow that only uses that link. Given any x ∈ X, there exists a μ >= 0 such that x ∈ S(μ).

IV. STABILITY

For general dynamical systems, a globally unique equilibrium point may not even be locally stable. In this section, we focus on the stability of heterogeneous congestion control protocols, which dictates whether an equilibrium can manifest itself experimentally or not. For general networks, it is shown that once the "degree of heterogeneity" is properly bounded, the equilibrium is not only unique but also locally stable. We now state the general result on local stability. It essentially says that if the similarity condition on the price mapping functions that guarantees uniqueness is satisfied, the unique equilibrium is also locally stable. In particular, if for any l all m_l^j are the same, the equilibrium is locally stable. This certainly agrees with our knowledge of the homogeneous case.

Theorem 4: If for any vector j ∈ {1, ..., J}^L and any permutations ζ, k, n in {1, ..., L}^L,

Π_{l=1}^{L} m_l'[k(j)] + Π_{l=1}^{L} m_l'[n(j)] >= Π_{l=1}^{L} m_l'[ζ(j)]

then the equilibrium of a regular network is locally stable.

V. SLOW TIMESCALE UPDATE

A. Motivation

As pointed out in Corollary 2, all equilibria are Pareto efficient. However, based on the analysis, large efficiency loss may occur and no guarantee on fairness can be provided. This motivates us to turn from analysis to design, and to develop a readily implementable control mechanism that "drives" any network with heterogeneous congestion control protocols to a target operating point with a fair and efficient bandwidth allocation. Our target equilibrium is the maximizer of some weighted aggregate utility. The first step is to establish the existence and uniqueness of such a solution.

Theorem 5: For any given network (c, m, R, U) and any positive weight vector, there exists a unique positive vector μ such that, if every source scales its own prices by μ_i^j, i.e.,

x_i^j = ((U_i^j)')^{-1}((1/μ_i^j) Σ_l m_l^j(p_l)),

then the unique equilibrium maximizes the weighted aggregate utility.

Algorithm 1: Two-Timescale Adaptation

1. Every source chooses its rate by x_i^j(t) = ((U_i^j)')^{-1}(q_i^j(t) / μ_i^j(t)).

2. Every source updates its μ_i^j by

μ_i^j(t+T) = μ_i^j(t) + k_i^j ( Σ_{l∈L(j,i)} m_l^j(p_l(t+T)) / Σ_{l∈L(j,i)} p_l(t+T) − μ_i^j(t) ),

where k_i^j is the stepsize of flow (j, i), and T is large enough so that the fast timescale dynamics among x and p can reach steady state.
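A compact sketch of the slow-timescale step of Algorithm 1 follows (our Python paraphrase; the fast-timescale congestion control between updates is stubbed out, and the price sums are illustrative values, not measurements).

# One slow-timescale update of Algorithm 1: after the fast dynamics settle,
# move mu toward the ratio of the flow's effective path price
# sum_l m_l^j(p_l) to the common path price sum_l p_l.

def slow_update(mu: float, effective_price: float, common_price: float,
                k: float = 0.5) -> float:
    """mu(t+T) = mu(t) + k * (sum m_l(p_l) / sum p_l - mu(t))."""
    return mu + k * (effective_price / common_price - mu)

mu = 1.0
for _ in range(30):
    # ... fast timescale: run the congestion controllers until x and p
    # reach steady state, then measure the two path-price sums ...
    effective = 0.012   # stand-in for sum of m_l^j(p_l) over the flow's path
    common = 0.004      # stand-in for sum of p_l over the flow's path
    mu = slow_update(mu, effective, common)

print(round(mu, 4))     # converges to effective/common = 3.0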

VII. SIMULATION RESULTS: RENO AND FAST

In this section, we apply Algorithm 1 to the case of Reno and FAST coexisting in the same network to resolve the issues illustrated in Section II. It demonstrates how the algorithm can be deployed incrementally, where the existing protocol (Reno in this case) needs no change and only the new protocols (FAST in this case) need to adopt slow timescale adaptation for the whole network to converge to the unique equilibrium that maximizes (weighted) aggregate utility. Experiments in this section were conducted in ns-2.

We take Reno's loss probability as the link price, i.e., m_l^1(p_l) = p_l for Reno. Algorithm 1 then reduces to an α-adaptation scheme for FAST that uses only end-to-end local information available to each flow. This algorithm, displayed as Algorithm 2, tunes the value of α according to the signals of queueing delay and loss on a large timescale. The basic idea is that FAST should adjust its aggressiveness (parameter α) to the proper level by looking at the ratio of end-to-end queueing delay and end-to-end loss. Therefore FAST also reacts to loss, on a slow timescale.

Fig. 3. FAST versus Reno with buffer size 400 packets: (a) a sample and (b) the average behaviour.

Fig. 4. FAST versus Reno with buffer size 80 packets: (a) a sample and (b) the average behaviour.

Algorithm 2: α-Adaptation Algorithm

1. Every α update interval (2 min by default), calculate

α* = (q / (l w)) α_0

where α_0 is the initial α value, q and l are the average queueing delay and average packet loss rate over the interval, and w is a parameter with the same unit as q/l that determines the relative fairness between delay-based and loss-based protocols. Then

α = min{(1+δ)α, α*} if α < α*;  α = max{(1−δ)α, α*} if α > α*,

where δ determines the responsiveness and is 0.1 by default.

2. Every window update interval (20 ms by default), run the FAST window update.
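Algorithm 2 translates directly into code. The sketch below is our rendering of the stated update rule, not the authors' ns-2 implementation; the measured average delay q and loss rate l fed into it are made-up numbers.

# alpha-adaptation for a FAST flow: compute the target alpha* from the
# queueing-delay/loss ratio, then move alpha a bounded (1 +/- delta) step.

def alpha_update(alpha: float, q: float, loss: float,
                 alpha0: float, w: float, delta: float = 0.1) -> float:
    """One slow-timescale step of Algorithm 2."""
    alpha_star = (q / (loss * w)) * alpha0
    if alpha < alpha_star:
        return min((1 + delta) * alpha, alpha_star)
    return max((1 - delta) * alpha, alpha_star)

# Illustrative run: q = 40 ms, loss = 0.1%, w = 125 s as in Example 1b.
alpha, alpha0, w = 50.0, 50.0, 125.0
for _ in range(40):
    alpha = alpha_update(alpha, q=0.040, loss=0.001, alpha0=alpha0, w=w)
print(alpha)   # settles at alpha* = (0.040 / (0.001 * 125)) * 50 = 16.0

A large buffer raises q and hence α*, letting FAST hold more packets in the queue; a high loss rate lowers α*, backing FAST off against loss-based flows, which is the behaviour the examples below exhibit.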
A. Example 1b: Independence of Bandwidth Allocation from Buffer Size

We repeat the simulations in Example 1a with Algorithm 2, with w set to 125 s. With Algorithm 2, FAST achieves 3.4 packets per ms with buffer size 400 and 3.2 packets per ms with buffer size 80, while Reno gets 4.2 and 4.1 packets per ms, respectively. The fairness is greatly improved and is now essentially independent of buffer size. This is summarized in Table I by listing the ratio of Reno's bandwidth to FAST's. We also note that the utilization of the link in the small-buffer case increases significantly, from 53.6% to 97.7%. The trajectories of α with the different buffer sizes are presented in Fig. 4. It is clear that although FAST starts with the same α in both cases, it finally ends up with a much larger α in the large-buffer scenario, as it experiences a much higher equilibrium queueing delay with the large buffer.

B. Example 2b: Independence of Bandwidth Allocation from Flow Arrival Pattern

We repeat the simulations in Example 2a with Algorithm 2, with w set to 1,820 s. Figs. 5 and 6 show the effect of α adaptation in the multiple-bottleneck network. Theorem 5 guarantees a unique equilibrium when we adapt α according to Algorithm 2. In this particular case, this single equilibrium is around the point where each Reno flow gets a throughput of 0.6 packets per ms and each FAST flow gets 1.5 packets per ms. At this single equilibrium, links 1 and 3 are the bottleneck links. In Fig. 5, FAST flows start at time zero and link 2 becomes the bottleneck. When Reno flows join at the 100th second, the ratio of queueing delay and loss at link 2 is much higher than the target value. The FAST flows hence reduce their α values gradually, and the set of bottleneck links switches from link 2 to links 1 and 3 around the 2000th second. After that, the FAST flows and Reno flows converge to the unique equilibrium.

Fig. 5. FAST starts first: (a) a sample and (b) the average behaviour.

Fig. 6. Reno starts first: (a) a sample and (b) the average behaviour.

VIII. CONCLUSION

Congestion control has been extensively studied for networks running a single protocol. However, when sources sharing the same network react to different congestion signals, the existing duality model no longer explains the behavior of bandwidth allocation. The existence and uniqueness properties of equilibrium in the heterogeneous-protocol case are examined in [11].

In this paper, we study properties of the network such as fairness and uniqueness, and analyze its withstanding capability along with its optimality and stability. In particular, it is shown that the equilibrium is still Pareto efficient, but there is efficiency loss. On fairness, intraprotocol fairness is still determined by a utility maximization problem, while interprotocol fairness is the part over which we do not have control. However, we can achieve any desired interprotocol fairness by properly choosing protocol parameters. Motivated by the analytical results, we further propose a distributed scheme to steer the whole network to the unique optimal equilibrium.

The scheme only needs to update a linear scalar in the source algorithm on a slow timescale. It can be deployed incrementally as the existing protocol needs no change and only the new protocols need to adapt on the slow timescale. There are several interesting directions in this relatively open area. For example, more efforts are still needed to fully

clarify the global dynamics of the two-timescale system. The main technical difficulty here is that the fast timescale system may have multiple equilibria, and therefore the usual two-timescale argument (e.g., singular perturbation) is not applicable. Our current model assumes each protocol only reacts to one particular price on the fast timescale, even when it has access to multiple types of prices. Finally, the current results should be extended from the static to the dynamic setting where flows come and go.

REFERENCES

[1] T. Bonald and L. Massoulié, "Impact of fairness on Internet performance," in Proc. ACM Sigmetrics, Jun. 2001, pp. 82–91.
[2] L. Brakmo and L. Peterson, "TCP Vegas: End-to-end congestion avoidance on a global Internet," IEEE J. Sel. Areas Commun., vol. 13, no. 6, pp. 1465–1480, Oct. 1995.
[3] S. Deb and R. Srikant, "Rate-based versus queue-based models of congestion control," IEEE Trans. Autom. Control, vol. 51, no. 4, pp. 606–618, Apr. 2006.
[4] S. Floyd and V. Jacobson, "Random early detection gateways for congestion avoidance," IEEE/ACM Trans. Netw., vol. 1, no. 4, pp. 397–413, Aug. 1993.
[5] R. Jain, "A delay-based approach for congestion avoidance in interconnected heterogeneous computer networks," ACM Comput. Commun. Rev., vol. 19, no. 5, pp. 56–71, Oct. 1989.
[6] S. Kunniyur and R. Srikant, "End-to-end congestion control: Utility functions, random losses and ECN marks," IEEE/ACM Trans. Netw., vol. 11, no. 5, pp. 689–702, Oct. 2003.
[7] S. Low, "A duality model of TCP and queue management algorithms," IEEE/ACM Trans. Netw., vol. 11, no. 4, pp. 525–536, Aug. 2003.
[8] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Trans. Netw., vol. 8, no. 5, pp. 556–567, Oct. 2000.
[9] K. Ramakrishnan, S. Floyd, and D. Black, "The addition of explicit congestion notification (ECN) to IP," Internet Engineering Task Force, RFC 3168, 2001.
[10] A. Tang, J. Wang, S. Hedge, and S. Low, "Equilibrium and fairness of networks shared by TCP Reno and Vegas/FAST," Telecommun. Syst., vol. 30, no. 4, pp. 417–439, Dec. 2005.
[11] A. Tang, J. Wang, S. Low, and M. Chiang, "Equilibrium of heterogeneous congestion control: Existence and uniqueness," IEEE/ACM Trans. Netw., vol. 15, no. 4, pp. 824–837, Aug. 2007.
[12] V. Jacobson, "Congestion avoidance and control," in Proc. ACM SIGCOMM, 1988, pp. 314–329.
[13] "WAN-in-Lab," [Online]. Available: http://wil.cs.caltech.edu
[14] Z. Wang and J. Crowcroft, "Eliminating periodic packet losses in the 4.3-Tahoe BSD TCP congestion control algorithm," ACM Comput. Commun. Rev., vol. 22, no. 2, pp. 9–16, Apr. 1992.
[15] D. Wei, C. Jin, S. Low, and S. Hegde, "FAST TCP: Motivation, architecture, algorithms, performance," IEEE/ACM Trans. Netw., vol. 14, no. 6, pp. 1246–1259, Dec. 2006.
[16] B. Wydrowski, L. H. Andrew, and M. Zukerman, "MaxNet: A congestion control architecture for scalable networks," IEEE Commun. Lett., vol. 7, no. 10, pp. 511–513, 2003.
[17] L. Xu, K. Harfoush, and I. Rhee, "Binary increase congestion control for fast long-distance networks," in Proc. IEEE INFOCOM, 2004, vol. 4, pp. 2514–2524.
[18] W. Stallings, High-Speed Networks, 4th ed., Pearson.
[22] S. Liu, T. Basar, and R. Srikant, "TCP-Illinois: A loss and delay-based congestion control algorithm for high-speed networks," in Proc. 1st VALUETOOLS, 2006, Article no. 55.

CONTENT BASED IMAGE RETRIEVAL ON MOBILE DEVICES BY NAVIGATION PATTERN BASED RELEVANCE FEEDBACK

*V. Gomathi
PG SCHOLAR, M.E. CSE, Prathyusha Institute of Technology and Management
Email: gomathi.gowrisankar@gmail.com, 9941446176

Abstract— Content-based image retrieval (CBIR) is the mainstay of image retrieval systems. Image retrieval over mobile devices is a challenging research problem. This paper presents a client-server architecture in which the client (mobile) sends a content-based query request to the server (PC), and the server performs an interactive content-based query and sends the query results to the client. The implementation of an advanced retrieval scheme is presented. The interactive query (IQ) presented on mobile platforms can avoid unwanted progressive query results and thus reduce the server query time and memory. To be more profitable, relevance feedback techniques were incorporated into CBIR such that more precise results can be obtained by taking the user's feedback into account. This paper proposes the NPRF (Navigation Pattern-based Relevance Feedback) search algorithm to achieve high efficiency and effectiveness of CBIR in coping with large-scale image data. The search algorithm makes use of the discovered navigation patterns and three kinds of query refinement strategies, QPM (Query-Point-Movement), QR (Query-Reweighting) and QEX (Query Expansion), to converge the search space towards the user's intention effectively. By using the NPRF method, high quality of image retrieval on RF can be achieved in a small number of feedbacks.

Keywords— content; relevance feedback; navigation pattern; mobile; retrieval.

I. INTRODUCTION

The mobile phone industry has gone through a phenomenal change over the past few years, with significant advances in the areas of communications and multimedia. Currently, state-of-the-art multimedia-compliant mobile phones equipped with digital cameras and camcorders have inherent support for network connection and thus enable access to large amounts of digital media. Nowadays, mobile platforms support Java [5], which provides rich programming APIs (Application Programming Interfaces). With the facility for capturing and storing digital media in smart phones, there is a need for content management and a system to provide rapid retrieval of digital media items from large media archives. Therefore, it has become vital to retrieve desired information expeditiously and efficiently using these devices. Content-based image retrieval (CBIR) addresses the problem of accessing the images that bear certain content, and usually relies on the characterization of low-level features such as color, shape and texture, all of which can be extracted from the images. The CBIR area possesses a tremendous potential for exploration and utilization, equally for researchers and people in industry, due to its promising results. It has been an active area of research for the past decade. The content-based retrieval of a desired multimedia item is currently based upon indexing of the content by the extraction of low-level visual features based on shape, color and texture.

Systems such as the "Multimedia Video Indexing and Retrieval System" (MUVIS) [14], VisualSEEk [13], Photobook [12] and Virage have a framework designed for indexing and retrieving images and/or audio-video files. The contemporary MUVIS has been developed as a system for content-based multimedia retrieval on a PC-based environment. It provides a unified and global framework that consists of a robust set of applications for capturing, recording, indexing and retrieval, combined with browsing and various other audiovisual and semantic capabilities.
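The client-server query flow described in the abstract can be sketched in a few lines. The JSON-over-TCP wire format, port and database below are our own stand-ins, not the MUVIS protocol: the mobile client ships a query descriptor to the server, which ranks its image database and returns the ids of the best matches.

# Toy CBIR client-server exchange: the client sends a feature vector, the
# server answers with the 5 nearest image ids. Purely illustrative.
import json, socket, threading, time
import numpy as np

rng = np.random.default_rng(0)
DB = {f"img{i}": rng.random(8) for i in range(100)}   # image id -> descriptor

def handle(conn):
    query = np.array(json.loads(conn.recv(4096).decode()))
    ranked = sorted(DB, key=lambda k: float(np.linalg.norm(DB[k] - query)))
    conn.sendall(json.dumps(ranked[:5]).encode())     # top-5 result ids
    conn.close()

def serve(port=5005):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

threading.Thread(target=serve, daemon=True).start()
time.sleep(0.2)                       # let the server start listening

cli = socket.socket()                 # this side would run on the phone
cli.connect(("127.0.0.1", 5005))
cli.sendall(json.dumps([float(v) for v in rng.random(8)]).encode())
print(json.loads(cli.recv(4096).decode()))   # ids of the most similar images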

Toward this purpose, our research work targets to bring the MUVIS framework beyond the desktop environment into the realm of wireless devices such as mobile phones, Personal Digital Assistants (PDAs), communicators, etc., where the user can perform query operations on large multimedia databases and query results can be retrieved within a reasonable time. Therefore, our main goal is to design and develop a CBIR system that enables any (mobile) client supporting the Java platform to retrieve images similar to the query image from an image database, accompanied by a dedicated server application.

In general, the purpose of CBIR is to present an image conceptually, with a set of low-level visual features such as colour, texture and shape. These conventional approaches for image retrieval are based on the computation of the similarity between the user's query and images via a query-by-example (QBE) system. In such a system, the user can pick up some preferred images to refine the image exploration iteratively. The feedback procedure, called Relevance Feedback (RF), repeats until the user is satisfied with the retrieval results. Although a number of RF studies have been made on interactive CBIR, they still incur some common problems, namely redundant browsing and exploration convergence. First, in terms of redundant browsing, most existing RF methods focus on how to earn the user's satisfaction in one query process. That is, existing methods refine the query again and again by analysing the specific relevant images picked up by the users. Especially for compound and complex images, the users might go through a long series of feedbacks to obtain the desired images using current RF approaches. The proposed approach NPRF integrates the discovered navigation patterns and three RF techniques to achieve efficient and effective image retrieval. The major difference between our proposed approach and other contemporary approaches is that it approximates an optimal solution to resolve the problems existing in current RF, such as redundant browsing and exploration convergence.

This paper is organized as follows: Section 2 gives a brief overview of CBIR; Section 3 describes the basic architecture and several functionalities of M-MUVIS; Section 4 describes query techniques; Section 5 describes the RF method based on the NPRF search; Section 6 describes the experimental results. Finally, the conclusion is Section 7.

II. CONTENT BASED IMAGE RETRIEVAL (CBIR)

Content-based image retrieval, known as CBIR, extracts several features that describe the content of the image, mapping the visual content of the images into a new space called the feature space. The feature space values for a given image are stored in a descriptor that can be used for retrieving similar images. To achieve these goals, CBIR systems use three basic types of features: colour features, texture features and shape features. High retrieval scores in content-based image retrieval systems can be attained by adopting relevance feedback mechanisms. These mechanisms require the user to grade the quality of the query results by marking the retrieved images as being either relevant or not. Then, the search engine uses this grading information in subsequent queries to better satisfy users' needs. It is noted that while relevance feedback mechanisms were first introduced in the information retrieval field, they currently receive considerable attention in the CBIR field. This project mainly focuses on efficient content-based image retrieval on mobile devices using navigation pattern based relevance feedback. This section contains basic information on CBIR and discusses the techniques used.

Fig. 1: Content-based image retrieval.

For the design of content-based retrieval systems, a designer needs to consider four aspects: feature extraction and representation, dimension reduction of features, indexing, and query specifications, as shown in Figure 1.

32
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

ASPECTS: FEATURE EXTRACTION AND PROCESSING HIGH-LEVEL QUERIES ARISES FROM


REPRESENTATION, DIMENSION REDUCTION OF EXTERNAL KNOWLEDGE WITH THE DESCRIPTION OF
FEATURE, INDEXING, AND QUERY SPECIFICATIONS, LOW-LEVEL FEATURES, KNOWN AS THE SEMANTIC GAP.
WHICH WILL BE SHOWN IN THE FIGURE 1. THE RETRIEVAL PROCESS REQUIRES A TRANSLATION
MECHANISM THAT CAN CONVERT THE QUERY OF
A. FEATURE EXTRACTION AND ―MONA LISA SMILE‖ INTO LOW-LEVEL FEATURES. TWO
REPRESENTATION POSSIBLE SOLUTIONS HAVE BEEN PROPOSED TO
MINIMIZE THE SEMANTIC GAP. THE FIRST IS
REPRESENTATION OF MEDIA NEEDS TO CONSIDER AUTOMATIC METADATA GENERATION TO THE MEDIA.
WHICH FEATURES ARE MOST USEFUL FOR AUTOMATIC ANNOTATION STILL INVOLVES THE
REPRESENTING THE CONTENTS OF MEDIA AND WHICH SEMANTIC CONCEPT AND REQUIRES DIFFERENT
APPROACHES CAN EFFECTIVELY CODE THE ATTRIBUTES SCHEMES FOR VARIOUS MEDIA. THE SECOND USES
OF THE MEDIA. THE FEATURES ARE TYPICALLY RELEVANCE FEEDBACK TO ALLOW THE RETRIEVAL
EXTRACTED OFF -LINE SO THAT EFFICIENT SYSTEM TO LEARN AND UNDERSTAND THE SEMANTIC
COMPUTATION IS NOT A SIGNIFICANT ISSUE, BUT CONTEXT OF A QUERY OPERATION.
LARGE COLLECTIONS STILL NEED A LONGER TIME TO
COMPUTE THE FEATURES. FEATURES OF MEDIA
CONTENT CAN BE CLASSIFIED INTO LOW-LEVEL AND
HIGH-LEVEL FEATURES. D. DIMENSION REDUCTION OF FEATURE
VECTOR
B. LOW-LEVEL FEATURES
MANY MULTIMEDIA DATABASES CONTAIN LARGE
LOW-LEVEL FEATURES SUCH AS OBJECT NUMBERS OF FEATURES THAT ARE USED TO ANALYZE
MOTION, COLOR, SHAPE, TEXTURE, LOUDNESS, POWER AND QUERY THE DATABASE. SUCH A FEATURE-VECTOR
SPECTRUM, BANDWIDTH, AND PITCH ARE EXTRACTED SET IS CONSIDERED AS HIGH DIMENSIONALITY. HIGH
DIRECTLY FROM MEDIA IN THE DATABASE. FEATURES DIMENSIONALITY CAUSES THE ―CURSE OF DIMENSION‖
AT THIS LEVEL ARE OBJECTIVELY DERIVED FROM THE PROBLEM, WHERE THE COMPLEXITY AND
MEDIA RATHER THAN REFERRING TO ANY EXTERNAL COMPUTATIONAL COST OF THE QUERY INCREASES
SEMANTICS. FEATURES EXTRACTED AT THIS LEVEL CAN EXPONENTIALLY WITH THE NUMBER OF DIMENSIONS.
ANSWER QUERIES SUCH AS ―FINDING IMAGES WITH DIMENSION REDUCTION IS A POPULAR TECHNIQUE TO
MORE THAN 20% DISTRIBUTION IN BLUE AND GREEN OVERCOME THIS PROBLEM AND SUPPORT EFFICIENT
COLOR,‖ WHICH MIGHT RETRIEVE SEVERAL IMAGES RETRIEVAL IN LARGE-SCALE DATABASES. HOWEVER,
WITH BLUE SKY AND GREEN GRASS MANY EFFECTIVE THERE IS A TRADEOFF BETWEEN THE EFFICIENCY
APPROACHES TO LOW -LEVEL FEATURE EXTRACTION OBTAINED THROUGH DIMENSION REDUCTION AND THE
HAVE BEEN DEVELOPED FOR VARIOUS PURPOSES. COMPLETENESS OBTAINED THROUGH THE
INFORMATION EXTRACTED. IF EACH DATA IS
C. HIGH-LEVEL FEATURES REPRESENTED BY A SMALLER NUMBER OF DIMENSIONS,
THE SPEED OF RETRIEVAL IS INCREASED. HOWEVER,
HIGH-LEVEL FEATURES ARE ALSO CALLED SEMANTIC SOME INFORMATION MAY BE LOST. ONE OF THE MOST
FEATURES. FEATURES SUCH AS TIMBRE, RHYTHM, WIDELY USED TECHNIQUES IN MULTIMEDIA RETRIEVAL
INSTRUMENTS, AND EVENTS INVOLVE DIFFERENT IS PRINCIPAL COMPONENT ANALYSIS (PCA). PCA IS
DEGREES OF SEMANTICS CONTAINED IN THE MEDIA. USED TO TRANSFORM THE ORIGINAL DATA OF HIGH
HIGH-LEVEL FEATURES ARE SUPPOSED TO DEAL WITH DIMENSIONALITY INTO A NEW COORDINATE SYSTEM
SEMANTIC QUERIES (E.G., ―FINDING A PICTURE OF WITH LOW DIMENSIONALITY BY FINDING DATA WITH
WATER‖ OR ―SEARCHING FOR MONA LISA SMILE‖). HIGH DISCRIMINATING POWER. THE NEW COORDINATE
THE LATTER QUERY CONTAINS HIGHER-DEGREE SYSTEM REMOVES THE REDUNDANT DATA AND THE
SEMANTICS THAN THE FORMER. AS WATER IN IMAGES NEW SET OF DATA MAY BETTER REPRESENT THE
DISPLAYS THE HOMOGENEOUS TEXTURE REPRESENTED ESSENTIAL INFORMATION.
IN LOW-LEVEL FEATURES, SUCH A QUERY IS EASIER TO
PROCESS. TO RETRIEVE THE LATTER QUERY, THE E. INDEXING
RETRIEVAL SYSTEM REQUIRES PRIOR KNOWLEDGE
THAT CAN IDENTIFY THAT MONA LISA IS A WOMAN, THE RETRIEVAL SYSTEM TYPICALLY CONTAINS TWO
WHO IS A SPECIFIC CHARACTER RATHER THAN ANY MECHANISMS: SIMILARITY MEASUREMENT AND MULTI-
OTHER WOMAN IN A PAINTING. THE DIFFICULTY IN DIMENSIONAL INDEXING. SIMILARITY MEASUREMENT IS

33
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

USED TO FIND THE MOST SIMILAR OBJECTS. MULTI- SYSTEMS ARE TRADITIONALLY PERFORMED BY USING
DIMENSIONAL INDEXING IS USED TO ACCELERATE THE AN EXAMPLE OR SERIES OF EXAMPLES. THE TASK OF
QUERY PERFORMANCE IN THE SEARCH PROCESS. THE SYSTEM IS TO DETERMINE WHICH CANDIDATES
ARE THE MOST SIMILAR TO THE GIVEN EXAMPLE. THIS
F. SIMILARITY MEASUREMENT DESIGN IS GENERALLY TERMED QUERY BY EXAMPLE
(QBE) MODE. THE SUCCESS OF THE QUERY IN THIS
TO MEASURE THE SIMILARITY, THE GENERAL APPROACH HEAVILY DEPENDS ON THE INITIAL SET OF
APPROACH IS TO REPRESENT THE DATA FEATURES AS CANDIDATES.
MULTI-DIMENSIONAL POINTS AND THEN TO CALCULATE
THE DISTANCES BETWEEN THE CORRESPONDING I. Relevance Feedback
MULTI-DIMENSIONAL POINTS. SELECTION OF METRICS
HAS A DIRECT IMPACT ON THE PERFORMANCE OF A HIGH RETRIEVAL SCORES IN CONTENT-BASED
RETRIEVAL SYSTEM. EUCLIDEAN DISTANCE IS THE IMAGE RETRIEVAL SYSTEMS CAN BE ATTAINED BY
MOST COMMON METRIC USED TO MEASURE THE ADOPTING RELEVANCE FEEDBACK MECHANISMS. THESE
DISTANCE BETWEEN TWO POINTS IN MULTI- MECHANISMS REQUIRE THE USER TO GRADE THE
DIMENSIONAL SPACE. HOWEVER, FOR SOME QUALITY OF THE QUERY RESULTS BY MARKING THE
APPLICATIONS, EUCLIDEAN DISTANCE IS NOT RETRIEVED IMAGES AS BEING EITHER RELEVANT OR
COMPATIBLE WITH THE HUMAN PERCEIVED SIMILARITY. NOT. THEN, THE SEARCH ENGINE USES THIS GRADING
A NUMBER OF METRICS (E.G., MINKOWSKI-FORM INFORMATION IN SUBSEQUENT QUERIES TO BETTER
DISTANCE, EARTH MOVER‘S DISTANCE, AND SATISFY USERS' NEEDS. IT IS NOTED THAT WHILE
PROPORTIONAL TRANSPORTATION DISTANCE) HAVE RELEVANCE FEEDBACK MECHANISMS WERE FIRST
BEEN PROPOSED FOR SPECIFIC PURPOSES. INTRODUCED IN THE INFORMATION RETRIEVAL FIELD,
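To make the two most common metrics concrete, the following minimal Java sketch (our illustration; the class and method names are not part of MUVIS) computes the Euclidean and general Minkowski-form distances between two feature descriptors:

    // Minimal sketch of the two most common similarity metrics discussed above.
    public class FeatureDistance {

        // Euclidean distance is the Minkowski-form distance with p = 2.
        public static double euclidean(double[] a, double[] b) {
            return minkowski(a, b, 2.0);
        }

        // General Minkowski-form distance: (sum_i |a_i - b_i|^p)^(1/p).
        public static double minkowski(double[] a, double[] b, double p) {
            if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += Math.pow(Math.abs(a[i] - b[i]), p);
            }
            return Math.pow(sum, 1.0 / p);
        }

        public static void main(String[] args) {
            double[] q = {0.12, 0.40, 0.55};   // query image descriptor (toy values)
            double[] d = {0.10, 0.45, 0.50};   // database image descriptor (toy values)
            System.out.println("Euclidean  : " + euclidean(q, d));
            System.out.println("Minkowski-1: " + minkowski(q, d, 1.0)); // city-block
        }
    }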
G. Multi-Dimensional Indexing

Retrieval of the media is usually based not only on the value of certain attributes, but also on the location of a feature vector in the feature space. In addition, a retrieval query on a database of multimedia with multi-dimensional feature vectors usually requires fast execution of search operations. To support such search operations, an appropriate multi-dimensional access method has to be used for indexing the reduced, but still high-dimensional, feature vectors. Popular multi-dimensional indexing methods include the R-tree and the R*-tree. These multi-dimensional indexing methods perform well with a limit of up to 20 dimensions. One approach transforms music into numeric forms and develops an index structure based on the R-tree for effective retrieval.

H. Query Specifications

Querying is used to search for a set of results with similar content to the specified examples. Based on the type of media, queries in content-based retrieval systems can be designed for several modes (e.g., query by sketch, query by painting [for video and image], query by singing [for audio], and query by example). Queries in multimedia retrieval systems are traditionally performed by using an example or a series of examples. The task of the system is to determine which candidates are the most similar to the given example. This design is generally termed query-by-example (QBE) mode. The success of the query in this approach heavily depends on the initial set of candidates.

I. Relevance Feedback

High retrieval scores in content-based image retrieval systems can be attained by adopting relevance feedback mechanisms. These mechanisms require the user to grade the quality of the query results by marking the retrieved images as being either relevant or not. Then, the search engine uses this grading information in subsequent queries to better satisfy the users' needs. It is noted that while relevance feedback mechanisms were first introduced in the information retrieval field, they currently receive considerable attention in the CBIR field.

III. M-MUVIS FRAMEWORK

Our research work targets to bring the MUVIS framework beyond the desktop environment into the realm of wireless devices such as mobile phones, Personal Digital Assistants (PDAs) and communicators, where the user can perform query operations in large multimedia databases and query results can be retrieved within a reasonable time. Therefore, our main goal is to design and develop a CBIR system that enables any (mobile) client supporting the Java platform to retrieve images similar to the query image from an image database, which is accompanied by a dedicated server application. The developed system, so-called Mobile MUVIS (M-MUVIS), shown in Figure 2, is structured upon the contemporary MUVIS framework and has a client-server architecture. The M-MUVIS server basically comprises two Java servlets [5] running inside a Tomcat [8] web server, which in effect transforms standalone MUVIS into a web application. The MUVIS Query Server (MQS) has native libraries for efficient image-query-related operations. The second servlet, the so-called MUVIS Media Retrieval Server (MMRS), is used for media retrieval. In order to take advantage of the flexibility and portability of Java, an M-MUVIS client application has been developed using Java 2, Micro Edition (J2ME) [5]. Such a system can find its application in the sharing or reuse of digital media, content management, networked photo albums, shopping and travel.

Fig 2: M-MUVIS framework
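The paper describes the two servlets only at the architectural level; the following is a hypothetical sketch of how such a query servlet could receive a content-based query request inside Tomcat. The parameter names and the plain-text reply are assumptions, not the actual MQS interface.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Hypothetical sketch in the spirit of the MUVIS Query Server (MQS).
    // The real MQS delegates the similarity search to native libraries.
    public class QueryServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            String imageId = req.getParameter("imageId");    // id of the query image
            int k = Integer.parseInt(req.getParameter("k")); // number of results
            resp.setContentType("text/plain");
            // A real implementation would run the content-based query here and
            // stream back the ranked result ids for the client to fetch via MMRS.
            resp.getWriter().println("query=" + imageId + " topK=" + k);
        }
    }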
IV. QUERY TECHNIQUES
With growing image content, an efficient image retrieval technique is deemed necessary. Especially for a mobile device user, performing a query can be an annoying experience due to the large query processing time [2], [6]. It is therefore vital to devise a method which not only reduces the query processing time but also performs the query operation without requiring a system equipped with high-performance hardware such as fast processors and large memory. In this paper we present an Interactive Query (IQ) [6] for a mobile device, which achieves good retrieval performance without requiring a superior-performing system on the server side, and reduces the network bandwidth and processing power needed on the client side. Before IQ, M-MUVIS supported Normal Query (NQ) and Progressive Query (PQ) [2]. In NQ, the query results are based on comparing the similarity distances of all the image primitives present in the entire database and performing a ranking operation afterwards. NQ is costly in terms of processing power, and in case of an abrupt stop during the query process the retrieved query information is lost. PQ generates query results after a fixed time interval. In a large image database with a small time interval, PQ generates many results that consume a lot of memory and server processing power. The server sends the desired intermediate result (as selected by the client) to the client. Sending the intermediate results to the client consumes extra network bandwidth, RAM, processing power and battery power of the device, whereas IQ provides efficient retrieval without generating many intermediate query results in a larger image database.

V. NPRF SEARCH
Despite the power of the search strategies, it is very difficult to optimize the retrieval quality of CBIR within only one query process. The hidden problem is that the extracted visual features are too diverse to capture the concept of the user's query. To solve such problems, in the QBE system, the user can pick up some preferred images to refine the image exploration iteratively. The feedback procedure, called Relevance Feedback (RF), repeats until the user is satisfied with the retrieval results. Although a number of RF studies have been made on interactive CBIR, they still incur some common problems, namely redundant browsing and exploration convergence. To resolve the aforementioned problems, we propose a novel method named NPRF (Navigation Pattern-Based Relevance Feedback) to achieve high retrieval quality of CBIR with RF by using the discovered navigation patterns. The proposed approach NPRF integrates the discovered navigation patterns and three RF techniques to achieve efficient and effective image retrieval:

- Query Reweighting (QR): Some previous work keeps an eye on investigating which visual features are important for those images (positive examples) picked up by the users at each feedback (also called an iteration). For this kind of approach, no matter how the weighted or generalized distance function is adapted, the diverse visual features extremely limit the effectiveness of image retrieval. Figure 3 illustrates this limitation: although the search area is continuously updated by re-weighting the features, some targets could be lost.

Fig 3: Query refinement techniques

- Query Point Movement (QPM): Another solution for enhancing the accuracy of image retrieval is moving the query point towards the contour of the user's preference in feature space. QPM regards multiple positive examples as a new query point at each feedback. After several forceful changes of location and contour, the query point should be close to a convex region of the user's interest.

- Query Expansion (QEX): If QR and QPM cannot completely cover the user's interest spreading in the broad feature space, diverse results for the same concept are difficult to obtain. For this reason, the modified version of MARS groups the similar relevant points into several clusters, and selects good representative points from these clusters to construct a multipoint query.
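As a concrete illustration of the query-point-movement idea (and of step 1 of the NPRF search described below), the following sketch, which is our simplification rather than the authors' code, averages the positive examples into a new query point:

    // Illustrative sketch of Query Point Movement: the positive examples picked
    // up at a feedback iteration are averaged into a new query point.
    public class QueryPointMovement {

        public static double[] moveQueryPoint(double[][] positiveExamples) {
            int dim = positiveExamples[0].length;
            double[] newQuery = new double[dim];
            for (double[] example : positiveExamples) {
                for (int i = 0; i < dim; i++) {
                    newQuery[i] += example[i];
                }
            }
            for (int i = 0; i < dim; i++) {
                newQuery[i] /= positiveExamples.length; // centroid of the positives
            }
            return newQuery;
        }

        public static void main(String[] args) {
            double[][] positives = {{0.2, 0.8}, {0.4, 0.6}, {0.3, 0.7}}; // toy data
            double[] q = moveQueryPoint(positives);
            System.out.println("new query point: (" + q[0] + ", " + q[1] + ")");
        }
    }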
Overview of NPRF (Navigation Pattern-Based Relevance Feedback)

The proposed approach involves various operations. As depicted in Figure 4, each operational phase contains some critical components for completing its specific process. The first query process is called the initial feedback. Next, the good examples picked up by the user deliver valuable information to the image search phase, including new feature weights, a new query point and the user's intention. Then, by using the navigation patterns, three search strategies, with respect to QPM, QR and QEX, are hybridized to find the desired images. Overall, at each feedback, the results are presented to the user and the related browsing information is stored in the log database. After accumulating the users' long-term browsing behaviours, an off-line operation for knowledge discovery is triggered to perform navigation pattern mining and pattern indexing. The framework of the proposed approach is briefly described as follows:

Fig 4: Workflow of NPRF search

- Initial query processing phase: Without considering the feature weights, this phase extracts the visual features from the original query image to find similar images. Afterward, the good examples picked up by the user are further analyzed at the first feedback.
- Image search phase: Behind the search phase, our intent is to extend the one search point to multiple search points by integrating the navigation patterns with the proposed algorithm NPRF search. In this phase, a new query point at each feedback is generated from the preceding positive examples, and then the K nearest images to the new query point can be found by expanding the weighted query. The search procedure does not stop unless the user is satisfied with the retrieval results.
- Knowledge discovery phase: Learning from the users' behaviours in image retrieval can be viewed as one type of knowledge discovery. The navigation patterns derived from the users' behaviour help to predict optimal image browsing paths.
- Data storage phase: The databases in this phase can be regarded as the knowledge marts of a knowledge warehouse, which store an integrated, time-variant and non-volatile collection of useful data, including images, navigation patterns, log files, and image features. The knowledge warehouse is very helpful for improving the quality of image retrieval.

Algorithm NPRF Search
In brief, the iterative search procedure of the NPRF Search algorithm can be decomposed into several steps: 1) generate a new query point by averaging the visual features of the positive examples, 2) find the top s relevant visual query points from the set of the nearest leaf nodes, and 3) finally, return the top k relevant images to the user. By collecting a large number of query transactions, most queries can be well answered to match the user's interests by NPRF Search. The details of the NPRF Search algorithm are described as follows. The simplest algorithm for identifying a sample from the test set is called the Nearest Neighbor method. The object of interest is compared to every sample in the training set, using a distance measure, a similarity measure, or a combination of measures. This process is computationally intensive and not very robust. We can make the Nearest Neighbor method more robust by selecting not just the closest sample in the training set, but by considering a group of close feature vectors. This is called the K-Nearest Neighbor method, where, for example, K = 5. Then we assign the unknown feature vector to the class that occurs most often in the set of K neighbors. This is still very computationally intensive, since we have to compare each unknown sample to every sample in the training set, and we want the training set to be as large as possible to maximize success. We can reduce this computational burden by using a method called Nearest Centroid. Here, we find the centroids for each class from the samples in the training set, and then we compare the unknown samples to the representative centroids only. The centroids are calculated by finding the average value for each vector component in the training set.

K-means Algorithm
Step 1: Enter how many clusters (let it be "k").
Step 2: Randomly guess k cluster center locations.
Step 3: Each data point finds out which center it is closest to.
Step 4: Thus each center "owns" a set of points.
Step 5: Each center finds the centroid of its own points.
Step 6: The center now moves to the new centroid.
Step 7: Repeat Step 3 to Step 6 until terminated.
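The seven steps translate directly into code. The following compact Java sketch is our illustration; Euclidean distance and a fixed iteration count (in place of Step 7's unspecified termination test) are assumptions.

    import java.util.Random;

    // Compact K-means sketch following Steps 1-7 above.
    public class KMeans {
        public static double[][] cluster(double[][] points, int k, int iterations) {
            Random rnd = new Random(42);
            int dim = points[0].length;
            double[][] centers = new double[k][dim];
            for (int c = 0; c < k; c++) {                 // Step 2: random initial centers
                centers[c] = points[rnd.nextInt(points.length)].clone();
            }
            int[] owner = new int[points.length];
            for (int it = 0; it < iterations; it++) {
                for (int p = 0; p < points.length; p++) { // Step 3: find closest center
                    double best = Double.MAX_VALUE;
                    for (int c = 0; c < k; c++) {
                        double d = 0;
                        for (int i = 0; i < dim; i++) {
                            double diff = points[p][i] - centers[c][i];
                            d += diff * diff;
                        }
                        if (d < best) { best = d; owner[p] = c; } // Step 4: ownership
                    }
                }
                double[][] sum = new double[k][dim];      // Steps 5-6: move to centroid
                int[] count = new int[k];
                for (int p = 0; p < points.length; p++) {
                    count[owner[p]]++;
                    for (int i = 0; i < dim; i++) sum[owner[p]][i] += points[p][i];
                }
                for (int c = 0; c < k; c++) {
                    if (count[c] == 0) continue;          // keep empty clusters in place
                    for (int i = 0; i < dim; i++) centers[c][i] = sum[c][i] / count[c];
                }
            }                                             // Step 7: repeat
            return centers;
        }
    }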

VI. EXPERIMENTAL RESULT

Experimental Data
The experimental data came from the collection of the Corel image database and from web images. We prepared different kinds of datasets; each category contains 200 images. All the experiments were implemented in Java, running on Java-enabled mobile phones (such as the Nokia N95 and Nokia 5800) and on a personal computer with a 3.5 GHz processor and 1 GB RAM.

Retrieval Efficiency
To analyze the effectiveness of our proposed approach, two major criteria, namely precision and coverage, are used to measure the experimental evaluations. They are defined as:

Precision = (No. of relevant images retrieved / Total no. of images retrieved) × 100
Coverage = (No. of correct images retrieved / Total no. of relevant images) × 100
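In code, the two measures are simple ratios; the following small sketch (the variable names are ours) computes both:

    // Direct translation of the two evaluation measures defined above.
    public class RetrievalMetrics {
        public static double precision(int relevantRetrieved, int totalRetrieved) {
            return 100.0 * relevantRetrieved / totalRetrieved;
        }

        public static double coverage(int correctRetrieved, int totalRelevant) {
            return 100.0 * correctRetrieved / totalRelevant;
        }

        public static void main(String[] args) {
            // e.g., 14 of 20 retrieved images are relevant, out of 35 relevant overall
            System.out.println("precision = " + precision(14, 20) + "%");
            System.out.println("coverage  = " + coverage(14, 35) + "%");
        }
    }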
Experimental Result

Fig 5: The query result shown on Nokia 5800

Fig 6: The resulting example for NPRF

VII. CONCLUSION
The dramatic rise in the sizes of image databases has stirred the development of effective and efficient retrieval systems. The development of these systems started with retrieving images using textual annotations, but later introduced image retrieval based on content. This came to be known as CBIR, or Content-Based Image Retrieval. Systems using CBIR retrieve images based on visual features such as colour, texture and shape, as opposed to depending on image descriptions or textual indexing. In this paper we researched various modes of representing and retrieving the image properties of colour, texture and shape. This system aims mainly at content-based efficient image retrieval on mobile devices, that is, a client-server architecture where a server runs on a personal computer and a client on the device. The client sends a content-based query request to the server, and the server performs an interactive content-based query and sends the query results to the client. To be more profitable, NPRF search techniques were incorporated into CBIR such that more precise results can be obtained by taking the user's feedback into account. NPRF search can bring out more accurate results.

REFERENCES
[1] I. Ahmad, S. Kiranyaz and M. Gabbouj, "An Efficient Image Retrieval Scheme on Java Enabled Mobile Devices," MMSP 05, International Workshop on Multimedia Signal Processing, Shanghai, China, November 2005.
[2] I. Ahmad, S. Abdullah, S. Kiranyaz and M. Gabbouj, "Progressive query technique for image retrieval on mobile devices," CBMI, June 21-23, 2005, Riga, Latvia.
[3] V. Chopra, A. Bakore, J. Eaves, B. Galbraith, S. Li and C. Wiggers, "Professional Apache Tomcat 5," Wrox, May 2004, ISBN 0764559028.
[4] H. M. Deitel and P. J. Deitel, "Java How to Program," 5th Edition, Prentice Hall, December 1999.
[5] J. Keogh, "The Complete Reference J2ME," McGraw-Hill Osborne, Feb 27, 2003, ISBN 0072227109.
[6] S. Kiranyaz and M. Gabbouj, "Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-based Retrieval on Multimedia Databases," IEEE Transactions on Multimedia, vol. 9, no. 1, January 2007, pp. 102-119.
[7] I. Ahmad, S. Kiranyaz and M. Gabbouj, "An Efficient Image Retrieval Scheme on Java Enabled Mobile Devices," MMSP 05, International Workshop on Multimedia Signal Processing, Shanghai, China, 2005.
[8] M. Gabbouj, E. Guldogan, M. Partio-Birinci and A. Iftikhar, "An Extended Framework Structure in MUVIS for Content-based Multimedia Indexing and Retrieval," IEEE, 2007.
[9] M. Gabbouj, I. Ahmad, M. Y. Amin and S. Kiranyaz, "Content-based Image Retrieval for Connected Mobile Devices," IEEE, 2003.
[10] V. N. Gudivada, "Relevance Feedback in Content-Based Image Retrieval," Marshall University, Huntington, IEEE, 2000.
[11] S. Deb and Y. Zhang, "An Overview of Content-based Image Retrieval Techniques," School of Computer Science and Mathematics, Australia.
[12] "Facebook", http://www.facebook.com/
[13] "Flickr", http://www.flickr.com/
[14] "MUVIS", http://muvis.cs.tut.fi/
[15] "Nokia", http://www.nokia.com/


REVERSE NEAREST NEIGHBOR FOR ANONYMOUS


QUERIES
*A.Anna Arasu **R.Nakeeran

*PG Student in Department of Computer Science and Engineering,


Dr.Pauls Engineering College, Anna University,
Vanur, Tamilnadu, India
smilearasucse@gmail.com

**Professor in Department of Computer Science and Engineering,


Dr.Pauls Engineering College, Anna University,
Vanur, Tamilnadu, India
sughandhiram@yahoo.com
Abstract— In this paper we propose an algorithm for answering reverse nearest neighbor (RNN) queries anonymously. This class of queries is strongly related to that of nearest neighbor queries, although the two are not necessarily complementary. The increasing availability of location-aware mobile devices has given rise to a flurry of location-based services (LBSs). On the other hand, revealing exact user locations to a (potentially untrusted) LBS may pinpoint their identities and breach their privacy. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. We design location obfuscation techniques that: 1) provide anonymous LBS access to the users and 2) allow efficient query processing at the LBS side. Our methods are experimentally evaluated with real and synthetic data.

Keywords— Location based service, Reverse nearest neighbor, Anonymous query.

INTRODUCTION
The past decade has seen the assimilation of sensor networks and location-based systems in real-world applications such as enhanced 911 services, army strategic planning, retail services, and mixed-reality games. The continuous movement of objects within these applications calls for new query processing techniques that scale up with the high rates of location updates. While numerous works have addressed continuous range queries (e.g., see [1], [2], [4]) and continuous nearest neighbor queries, there is still a lack of research addressing continuous reverse nearest neighbor (RNN) queries.

We are currently experiencing rapid developments in key technology areas that combine to promise widespread use of mobile, personal information appliances, most of which will be on-line, i.e., on the Internet. Industry analysts uniformly predict that wireless, mobile
Internet terminals will outnumber the desktop computers on the Internet. This proliferation of devices offers companies the opportunity to provide a diverse range of e-services, many of which will exploit knowledge of the user's changing location. Location awareness is enabled by a combination of political developments, e.g., the de-scrambling of the GPS signals and the US E911 mandate, and the continued advances in both infrastructure-based and handset-based positioning technologies.

The area of location-based games offers good examples of services where the positions of the mobile users play a central role. In the recently released BotFighters game, by the Swedish company It's Alive, players get points for finding and "shooting" other players via their mobile phones. Only players close by can be shot. In such mixed-reality games, the real physical world becomes the backdrop of the game, instead of the world created on the limited displays of wireless devices.

The low cost and small size of positioning equipment (e.g., GPS receivers) have allowed their embedding into PDAs and mobile phones. The wide availability of these location-aware portable devices has given rise to a flourishing industry of location-based services (LBSs). An LBS makes spatial data available to the users through one or more location servers (LSs) that index and answer user queries on them. Examples of spatial queries could be "Where is the closest hospital to my current location?" or "Which pharmacies are open within a 1 km radius?" In order for the LS to be able to answer such questions, it needs to know the position of the querying user.

There exist many algorithms for efficient spatial query processing, but the main challenge in the LBS industry is of a different nature. In particular, users are reluctant to use LBSs, since revealing their position may link to their identity. Even though a user may create a fake ID to access the service, her location alone may disclose her actual identity. Linking a position to an individual is possible by various means, such as publicly available information (e.g., city maps and telephone directories), physical observation, cell phone signal triangulation, etc. User privacy may be threatened because of the sensitive nature of accessed data, e.g., inquiring for pharmacies that offer medicines for diseases associated with a social stigma, or asking for nearby addiction recovery groups (Alcoholics/Narcotics Anonymous, etc.). Another source of threats comes from less sensitive data (e.g., gas station locations, shops, restaurants, etc.) that may reveal the user's interests and shopping needs, resulting in a flood of unsolicited advertisements through e-coupons and personal messages.

To solve this problem, the following general approach is taken. When a user u wishes to pose a query, she sends her location to a trusted server, the anonymizer (AZ), through a secure connection (e.g., SSL). The latter obfuscates her location, replacing it with an anonymizing spatial region (ASR) that encloses u. The ASR is then forwarded to the LS. Ignoring where exactly u is, the LS retrieves (and reports to the AZ) a candidate set (CS) that is guaranteed to contain the query results for any possible user location inside the ASR. The AZ receives the CS and reports to u the subset of candidates that corresponds to her original query. In order for the AZ to produce valid ASRs, the users send location updates whenever they move (through their secure connection).
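This round trip can be summarized in code as three cooperating steps. The sketch below is our simplification, written in Java for consistency with the rest of these proceedings (the paper's own implementation is in C++): a fixed square ASR, a brute-force candidate scan for an r-range query, and the anonymizer-side refinement.

    import java.util.ArrayList;
    import java.util.List;

    // Simplified sketch of the user -> anonymizer (AZ) -> location server (LS)
    // round trip described above.
    public class AnonymousQueryFlow {

        // AZ side: replace the exact location with a square ASR enclosing it.
        // (Real ASRs are built so that k users fall inside; the fixed square
        // is our simplification.)
        static double[] buildAsr(double x, double y, double halfSide) {
            return new double[] { x - halfSide, y - halfSide, x + halfSide, y + halfSide };
        }

        // LS side: for an r-range query, any object within distance r of SOME
        // point of the ASR may be an answer, so the candidate set (CS) keeps
        // every object whose minimum distance to the box is at most r.
        static List<double[]> candidateSet(double[] asr, List<double[]> objects, double r) {
            List<double[]> cs = new ArrayList<>();
            for (double[] o : objects) {
                double dx = Math.max(0, Math.max(asr[0] - o[0], o[0] - asr[2]));
                double dy = Math.max(0, Math.max(asr[1] - o[1], o[1] - asr[3]));
                if (Math.hypot(dx, dy) <= r) cs.add(o);
            }
            return cs;
        }

        // AZ side again: filter the CS down to the exact answer for the real
        // user location and report it back to the user.
        static List<double[]> refine(double x, double y, double r, List<double[]> cs) {
            List<double[]> answer = new ArrayList<>();
            for (double[] o : cs)
                if (Math.hypot(o[0] - x, o[1] - y) <= r) answer.add(o);
            return answer;
        }
    }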
RELATED WORK
There is recent interest in developing new continuous query processors to cope with the recent advances in dynamic location-aware environments. As a result, new algorithms have been developed for various types of continuous location-based queries, e.g., continuous range queries [1], [2], [4], continuous nearest neighbor queries [3], and continuous aggregates [4], [5]. Although reverse nearest neighbor queries are of the same importance as these query types, little
work has been done to develop efficient algorithms for continuous reverse nearest neighbor queries.

Various algorithms have been proposed for snapshot RNNs in different environments, e.g., in Euclidean space, metric space, high-dimensional space, ad-hoc space and large graphs. In this paper, we mainly focus on Euclidean space, in which it has been proved that there are at most six reverse nearest neighbors for the monochromatic case. Utilizing this property, an approach has been introduced that divides the space into six pie regions. Then, six nearest neighbor objects (one object in each pie) are used as filters to limit the search space.

The RdNN-tree [12] extends the RNN-tree by combining the two index structures (NN-tree and RNN-tree) into one common index. It is also designed for reverse 1-nearest neighbor search. For each object p, the distance to p's 1-nearest neighbor, i.e. nndist1(p), is precomputed. In general, the RdNN-tree is an R-tree-like structure containing data objects in the data nodes and MBRs in the directory nodes. In addition, for each data node N, the maximum of the 1-nearest-neighbor distances of the objects in N is aggregated. An inner node of the RdNN-tree aggregates the maximum 1-nearest-neighbor distance of all its child nodes. A reverse 1-nearest neighbor query is processed top-down by pruning those nodes N where the maximum 1-nearest-neighbor distance of N is greater than the distance between the query object q and N, because in this case N cannot contain true hits anymore. Due to the materialization of the 1-nearest-neighbor distance of all data objects, the RdNN-tree does not need to compute 1-nearest neighbor queries for each object.

A recent technique for finding monochromatic reverse nearest neighbors for moving objects [8] is similar to our problem, except that the velocity of each object is given as part of the input and each object is assumed to move on a plane, which can then be indexed using a TPR-tree. However, in our proposed methods, we do not assume a specific velocity, and objects can move in any direction rather than being constrained to a single direction. To our knowledge, there is only one algorithm that does not assume that objects move on a single plane with a given velocity, termed CRNN [6], for the continuous evaluation of reverse nearest neighbor queries. CRNN extends the idea of dividing the space into six pies, originally developed for snapshot queries [7], to dynamic environments. As a result, CRNN monitors each pie region along with six moving objects at every time interval. However, CRNN has two main disadvantages: 1) CRNN is limited to monochromatic RNN queries, and 2) CRNN always assumes a constant worst-case scenario at every time interval, where it is assumed that there are always six RNNs. These drawbacks arise from the fact that CRNN ignores the relationship between the neighboring pies.

The reverse nearest neighbor query is intimately related to nearest neighbor queries. In this section, we first overview the existing proposals for answering nearest neighbor queries, for both stationary and moving points. Then, we discuss the proposals related to reverse nearest neighbor queries.

III REVERSE NEAREST NEIGHBOR ALGORITHM

Reverse nearest neighbor queries: to our knowledge, three solutions exist for answering RNN queries for non-moving points in two and higher dimensional spaces. Stanoi et al. present a solution for answering RNN queries in two-dimensional space. Their algorithm is based on the following observations [20]. Let the space around the query point q be divided into six equal regions Si (1 ≤ i ≤ 6) by straight lines intersecting at q, as shown in Figure 1. Then, there exist at most six RNN points for q, and they are distributed as follows:
1. There exist at most two RNN points in each region Si.
2. If there exist exactly two RNN points in a region Si, then each point must be on one of the space-dividing lines through q delimiting Si.

Figure 1: Division of the Space around Query Point q

The same kind of observation leads to the following property. Let p be the NN point of q in Si. If p is not on one of the space-dividing lines, either q is the NN point of p (and then p is the RNN point of q), or q has no RNN point in Si. Stanoi et al. prove this property. These observations enable a reduction of the RNN problem to the NN problem. For each region Si, a candidate set of one or two NN points of q in that region is found. (A set with more than two NN points is not a candidate set.) Then, for each of those points, it is checked whether q is the nearest neighbor of that point. The answer to the RNN(q) query consists of those candidate points that have q as their nearest neighbor.
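A minimal sketch of this filter-and-verify idea for static points follows (in Java for consistency with the rest of these proceedings; brute-force scans stand in for the index-based NN queries, and the boundary case of two RNN points per region is ignored):

    import java.util.ArrayList;
    import java.util.List;

    // Six-region RNN for static points: find the NN of q in each 60-degree
    // sector around q (the candidates), then keep a candidate only if q is in
    // turn its nearest neighbor. Assumes q itself is not in the point set.
    public class SixRegionRnn {

        static int sector(double[] q, double[] p) {
            double angle = Math.atan2(p[1] - q[1], p[0] - q[0]); // (-pi, pi]
            if (angle < 0) angle += 2 * Math.PI;
            return (int) (angle / (Math.PI / 3)) % 6;            // 0..5
        }

        static double dist(double[] a, double[] b) {
            return Math.hypot(a[0] - b[0], a[1] - b[1]);
        }

        static List<double[]> rnn(double[] q, List<double[]> points) {
            double[][] cand = new double[6][];
            for (double[] p : points) {                 // NN of q per sector
                int s = sector(q, p);
                if (cand[s] == null || dist(q, p) < dist(q, cand[s])) cand[s] = p;
            }
            List<double[]> result = new ArrayList<>();
            for (double[] c : cand) {
                if (c == null) continue;
                boolean qIsNn = true;                   // verify: is q the NN of c?
                for (double[] p : points) {
                    if (p != c && dist(c, p) < dist(c, q)) { qIsNn = false; break; }
                }
                if (qIsNn) result.add(c);
            }
            return result;
        }
    }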
In another solution for answering RNN queries, Korn and Muthukrishnan use two R-trees for the querying, insertion, and deletion of points. In the RNN-tree, the minimum bounding rectangles of circles having a point as their center and the distance to the nearest neighbor of that point as their radius are stored. The NN-tree is simply an R*-tree where the data points are stored. Yang and Lin improve the solution of Korn and Muthukrishnan by introducing the Rdnn-tree, which makes it possible to answer both RNN queries and NN queries using a single tree. None of the above-mentioned methods handles continuously moving points. In the next section, before presenting our method, we discuss the extendibility of these methods to support continuously moving points.

IV ALGORITHM FOR FINDING REVERSE NEAREST NEIGHBORS

In this section, we describe the algorithm Find RNN, which computes the reverse nearest neighbors for a continuously moving point in the plane. The notation is the same as in the previous section. The algorithm, shown in Figure 2, produces a list LRNN = {<pj, Tj>}, where pj is the reverse nearest neighbor of q during time interval Tj. Note that the format of LRNN differs from the format of the answer to the RNN query, as defined, where intervals Tj do not overlap and have sets of points associated with them. To simplify the description of the algorithms, we use this format in the rest of the paper. Having LRNN, it is quite straightforward to transform it into the format described, by sorting the end points of the time intervals and performing a "time sweep" to collect points for each of the formed time intervals.

Figure 2: Algorithm Computing Reverse Nearest Neighbors for Moving Objects in the Plane

To reduce the disk I/O incurred by the algorithm, all six sets Bi are found in a single traversal of the index. Note that if, at some time, there is more than one nearest neighbor in some Si, those nearest neighbors are nearer to each other than to the query point, meaning that Si will hold no RNN points for that time. We thus assume in the following that, in the sets Bi, each interval Tij consists of a single nearest neighbor point, nnij. All the RNN candidates nnij are also verified in one traversal. To make this possible, we use Σi,j M(R, nnij) as the metric for ordering the search in step 2.1 of Find RNN.
In addition, a point or a rectangle is pruned only if it can be pruned for each of the query point's nnij.

Thus, the index is traversed twice in total. When analysing the I/O complexity of Find RNN, we observe that in the worst case all nodes of the tree are visited to find the nearest neighbors using Find NN, which is performed twice. As noted by Hjaltason and Samet, this is even the case for static points, where the size of the result set is constant. For points with linear movement, the worst-case size of the result set of the NN query is O(N) (where N is the database size). The size of the result set of Find NN is important because if the combined size of the sets Bi is too large, the Bi will not fit in main memory.

V SINGLE QUERY PROCESSING
Processing is based on Theorem 1. A direct implementation of the theorem uses (network-based) search operations as off-the-shelf building blocks. Thus, the NAP query evaluation methodology is readily deployable on existing systems, and can be easily adapted to different network storage schemes. As a case study, in this section we focus on a particular storage scheme and network expansion framework, in order to provide a concrete NAP prototype. Consider first the scenario where the AZ sends a single AEL query to the LS. CS computation follows Algorithm 1. Step 1 computes the border nodes of the AEL (using the edge R-tree and the adjacency index). Step 2 queries the ORT and places into the CS all objects falling on some AEL edge. Then, steps 3 and 4 expand all border nodes to include in the CS their kNN objects (or, for the r-range query type, the objects within distance r from them). Depending on the query type, some optimizations are possible to reduce the LS processing cost. A code sketch of the candidate set computation is given after the listing below.

Algorithm: Candidate Set Computation in NAP Search (AEL L)
1. Identify the border nodes of L
2. Query the ORT and collect objects falling on edges in L
3. For every border node n
4. Perform a range (or NN) search with parameter r (k) at n
5. Form CS as the union of objects retrieved in steps 2 and 4
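The control flow of Algorithm 1 can be sketched as follows; the types RoadNetwork, ObjectIndex, Edge and Node are hypothetical placeholders for the edge R-tree, adjacency index and ORT, and the sketch is in Java rather than the authors' C++.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Control-flow sketch of Algorithm 1 (candidate set computation).
    public class NapCandidateSet {

        interface ObjectIndex {                       // stands in for the ORT
            Set<Integer> objectsOnEdges(List<Edge> edges);
        }
        interface RoadNetwork {                       // adjacency index + edge R-tree
            List<Node> borderNodes(List<Edge> ael);                      // step 1
            Set<Integer> rangeSearch(Node n, double r, ObjectIndex ort); // step 4
        }
        static class Edge {}
        static class Node {}

        static Set<Integer> candidateSet(List<Edge> ael, RoadNetwork net,
                                         ObjectIndex ort, double r) {
            Set<Integer> cs = new HashSet<>(ort.objectsOnEdges(ael));    // step 2
            for (Node n : net.borderNodes(ael)) {                        // step 3
                cs.addAll(net.rangeSearch(n, r, ort));                   // step 4
            }
            return cs;                                                   // step 5
        }
    }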
VI RANGE QUERY OPTIMIZATIONS
If the query type is r-range, border node expansion (in step 4) does not need to proceed to AEL edges, because the corresponding objects are either on some AEL edge (and thus retrieved by step 2) or, if they are outside the AEL, they are discovered by the expansion of another border node. An additional optimization particular to the range query type is to combine steps 2 and 4 so that CS objects are collected by querying the ORT only once. Specifically, after step 1, we expand the network (using only the adjacency index) for every border node up to distance r, and then query the ORT to retrieve the objects that fall on some of the acquired edges or inside the AEL. A further optimization is that when the expansion of a border node n visits (i.e., deheaps) a previously expanded node n', the expansion need not proceed to (i.e., en-heap) the adjacent nodes of n', since the objects reachable through n' are inserted into the CS by the expansion of n'.

VII KNN QUERY OPTIMIZATIONS
If the query type is kNN, in step 4 the LS retrieves the kNNs of each border node using network expansion in all directions, i.e., it also proceeds on the AEL edges. The reason is that even if some NNs fall inside the AEL or belong to the kNN set of other border nodes, they lead to earlier termination of the expansion. However, kNN processing allows for an optimization on ORT accesses: if a border node expansion needs to process objects that fall inside the AEL or lie on an edge encountered in a previous expansion, we need not query the ORT, but may directly use the data objects already fetched into the memory-resident CS. Another optimization is to reuse the kNN results of previously expanded border nodes; if
during the expansion we deheap some of these nodes, we directly insert their kNN objects into the temporary result (denoted by W in Section 2.1) and do not en-heap their adjacent nodes.

VIII EXPERIMENTAL EVALUATION
The algorithms presented in this paper were implemented in C++, using a TPR-tree implementation based on GiST. Specifically, the TPR-tree implementation with self-tuning time horizon was used. We investigate the performance of the algorithms in terms of the number of I/O operations they perform. The disk page size (and the size of a TPR-tree node) is set to 4 kB, which results in 204 entries per leaf node in the trees. An LRU page buffer of 50 pages is used, with the root of a tree always pinned in the buffer. The nodes changed during an index operation are marked as "dirty" in the buffer and are written to disk at the end of the operation or when they otherwise have to be removed from the buffer.

The performance studies are based on synthetically generated workloads that intermix update operations and queries. To generate the workloads, we simulate objects moving in a region of space with dimensions 1000 x 1000 kilometers. Whenever an object reports its movement, the old information pertaining to the object is deleted from the index (assuming this is not the first reported movement from this object), and the new information is inserted into the index. Two types of workloads were used in the experiments. In most of the experiments, we use uniform workloads, where the positions of points and their velocities are distributed uniformly. The speeds of objects vary from 0 to 3 kilometers per time unit (minute). In other experiments, more realistic workloads are used, where objects move in a network of two-way routes, interconnecting a number of destinations uniformly distributed in the plane. Points start at random positions on routes and are assigned with equal probability to one of three groups of points with maximum speeds of 0.75, 1.5, and 3 km/min. Then, queries are introduced, intermixed with additional updates. Each query corresponds to a randomly selected point from the currently active data set. Our performance graphs report average numbers of I/O operations per query.

In this section, we evaluate the robustness and scalability of our proposed methods on a real road network. Our algorithms were implemented in C++ and the experiments were executed on a Pentium D 2.8 GHz PC. We measured the average of the following performance values over a query workload of 100 queries: 1) anonymization time and refinement time at the anonymizer AZ, 2) I/O time and CPU time for query processing at the location server LS, and 3) the communication cost (in terms of transmitted points) for the anonymizing edge list AEL and the candidate set CS. Note that each edge in the AEL is counted as two points.

CONCLUSION
In this paper we proposed a framework for answering reverse nearest neighbor queries using k-anonymity, and we showed experimental results for the robustness and scalability of the proposed system using road networks.

ACKNOWLEDGMENT
The authors gratefully acknowledge the following individuals for their support: Mr. R. Nakeeran, Professor, Dr. Pauls Engineering College, and my family and friends, for their valuable guidance, for devoting their precious time, and for sharing their knowledge and co-operation.

REFERENCES
[1] B. Gedik and L. Liu, "MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System," Proc. Int'l Conf. Extending Database Technology (EDBT), 2004.
[2] H. Hu, J. Xu, and D.L. Lee, "A Generic Framework for Monitoring Continuous Spatial Queries over Moving Objects," Proc. ACM SIGMOD, 2005.

[3] G.S. Iwerks, H. Samet, and K. Smith, "Continuous K-Nearest Neighbor Queries for Continuously Moving Points with Updates," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2003.
[4] I. Lazaridis, K. Porkaew, and S. Mehrotra, "Dynamic Queries over Mobile Objects," Proc. Int'l Conf. Extending Database Technology (EDBT), 2002.
[5] M. Hadjieleftheriou, G. Kollios, D. Gunopulos, and V.J. Tsotras, "On-Line Discovery of Dense Areas in Spatio-temporal Databases," Proc. Int'l Symp. Spatial and Temporal Databases (SSTD), 2003.
[6] C.S. Jensen, D. Lin, B.C. Ooi, and R. Zhang, "Effective Density Queries on Continuously Moving Objects," Proc. Int'l Conf. Data Eng. (ICDE), 2006.
[7] I. Stanoi, D. Agrawal, and A. El Abbadi, "Reverse Nearest Neighbor Queries for Dynamic Databases," Proc. DMKD, 2000.
[8] S. Saltenis and C.S. Jensen, "Indexing of Moving Objects for Location-Based Services," TimeCenter TR-63, 24 pages, 2001.


AN EFFICIENT IMAGE ENHANCEMENT


BASED ON DWT AND SVD
R.Bhanumathi,
PG SCHOLAR, M.E.CSE, PITAM, bhanu291987@yahoo.co.in,09884229007
ABSTRACT
This paper proposes a new satellite image contrast enhancement technique based on the discrete wavelet transform (DWT) and singular value decomposition (SVD). A two-dimensional discrete Haar wavelet transform is applied to the given satellite image. It decomposes the input image into four sub-bands: one average component (LL) and three detail components (LH, HL, HH). Then SVD (Singular Value Decomposition) is applied to the LL sub-band only. The singular value matrix represents the intensity information of the given image, and any change in the singular values changes the intensity of the input image. The technique converts the image into the SVD domain and, after normalizing the singular value matrix, reconstructs the image in the spatial domain by using the updated singular value matrix; it then reconstructs the enhanced image by applying the inverse DWT (IDWT). The technique is compared with conventional image equalization techniques such as standard general histogram equalization (GHE) and local histogram equalization (LHE), as well as state-of-the-art techniques such as brightness preserving dynamic histogram equalization (BPDHE) and singular value equalization (SVE).

INTRODUCTION
An image is an array, or matrix, of square pixels (picture elements) arranged in columns and rows. Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. This article is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.

Satellite images are used in many applications such as geoscience studies, astronomy, and geographical information systems. One of the most important quality factors of satellite images comes from their contrast. Contrast is created by the difference in luminance reflected from two adjacent surfaces. In visual perception, contrast is determined by the difference in the color and brightness of an object relative to other objects. If the contrast of an image is highly concentrated in a specific range, information may be lost in those areas which are excessively and uniformly concentrated. The problem is to optimize the contrast of an image in order to represent all the information in the input image.

Histogram equalization is a method in image processing of contrast adjustment using the image's histogram. This method usually increases the local contrast of many images, especially when the usable data of the image is represented by close contrast values. Through this adjustment, the intensities can be better distributed on the histogram. This allows areas of lower local contrast to gain a higher
contrast without affecting the global contrast. Histogram equalization accomplishes this by effectively spreading out the most frequent intensity values.
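As a reference point for the comparisons made later in this paper, the following minimal Java sketch implements global histogram equalization (GHE) on an 8-bit grayscale image; the 2-D array representation is our simplification:

    // Minimal sketch of global histogram equalization (GHE) for an 8-bit
    // grayscale image stored as a 2-D int array.
    public class HistogramEqualization {
        public static int[][] equalize(int[][] img) {
            int h = img.length, w = img[0].length, n = h * w;
            int[] hist = new int[256];
            for (int[] row : img)
                for (int v : row) hist[v]++;          // grey-level histogram
            double[] cdf = new double[256];           // cumulative distribution
            double cum = 0;
            for (int v = 0; v < 256; v++) {
                cum += hist[v] / (double) n;
                cdf[v] = cum;
            }
            int[][] out = new int[h][w];
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)           // spread intensities via CDF
                    out[y][x] = (int) Math.round(cdf[img[y][x]] * 255);
            return out;
        }
    }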
PROPOSED SYSTEM
A two-dimensional discrete Haar wavelet transform is applied to the given satellite image. It decomposes the input image into four sub-bands: one average component (LL) and three detail components (LH, HL, HH). Then SVD (Singular Value Decomposition) is applied to the LL sub-band only. The Haar wavelet has the advantage of being simple to compute and easy to understand; each step in the Haar wavelet transform calculates a set of scaling coefficients and a set of wavelet coefficients.
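A one-level 2-D Haar decomposition can be sketched as below (our illustration; the division by 4 as normalization and even image dimensions are assumptions):

    // One-level 2-D Haar DWT sketch: each 2x2 block of the input produces one
    // coefficient in each of the LL, LH, HL and HH sub-bands.
    public class HaarDwt {
        public static double[][][] decompose(double[][] img) {
            int h = img.length / 2, w = img[0].length / 2;
            double[][] ll = new double[h][w], lh = new double[h][w];
            double[][] hl = new double[h][w], hh = new double[h][w];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    double a = img[2 * y][2 * x],     b = img[2 * y][2 * x + 1];
                    double c = img[2 * y + 1][2 * x], d = img[2 * y + 1][2 * x + 1];
                    ll[y][x] = (a + b + c + d) / 4;   // average (approximation)
                    hl[y][x] = (a - b + c - d) / 4;   // horizontal detail
                    lh[y][x] = (a + b - c - d) / 4;   // vertical detail
                    hh[y][x] = (a - b - c + d) / 4;   // diagonal detail
                }
            }
            return new double[][][] { ll, lh, hl, hh };
        }
    }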
The singular value matrix represents the intensity information of the given image, and any change in the singular values changes the intensity of the input image. This technique converts the image into the SVD domain and, after normalizing the singular value matrix, reconstructs the image in the spatial domain by using the updated singular value matrix. The technique is called singular value equalization (SVE).
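Assuming a linear algebra library such as JAMA (the paper does not name one), the SVE step can be sketched as scaling the singular values of the LL sub-band by a correction factor derived from its equalized counterpart and rebuilding the matrix:

    import Jama.Matrix;
    import Jama.SingularValueDecomposition;

    // Sketch of singular value equalization (SVE) on the LL sub-band. The
    // singular values of the input are scaled by
    //   xi = maxSV(equalized LL) / maxSV(input LL)
    // and the sub-band is rebuilt as U * (xi * S) * V'. JAMA's SVD assumes
    // the matrix has at least as many rows as columns.
    public class SingularValueEqualization {
        public static Matrix equalize(Matrix ll, Matrix llGhe) {
            SingularValueDecomposition svdIn  = ll.svd();
            SingularValueDecomposition svdGhe = llGhe.svd();
            double xi = svdGhe.getSingularValues()[0] / svdIn.getSingularValues()[0];
            Matrix s = svdIn.getS().times(xi);        // normalize singular values
            return svdIn.getU().times(s).times(svdIn.getV().transpose());
        }
    }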

The LL sub-band concentrates the illumination information. That is why the LL sub-band goes through the SVE process, which preserves the high-frequency components (i.e., edges). Hence, after the inverse DWT (IDWT), the resultant image will be sharper with good contrast. The technique was compared with the GHE, LHE, BPDHE, and SVE techniques.

DOMAIN ANALYSIS
The basic hardware and software requirements for this paper are analyzed and specified.

Hardware Requirements
- Hard disk: 80 GB
- RAM: 512 MB
- Processor: Pentium IV

Software Requirements
- Operating System: Windows XP
- Language: Java

TYPES OF IMAGES
1. Intensity image (gray scale image)
2. Binary image
3. RGB image
4. Multiframe image

SYSTEM ANALYSIS
BLOCK DIAGRAM

Figure no. 3.1 Detailed steps of the proposed equalization technique

ARCHITECTURE DIAGRAM
Figure no. 3.2 Architecture Diagram of DWT

DATA FLOW DIAGRAM
The data flow strategy shows the use of data in a system pictorially. The tools used in this strategy show all the essential features of a system and how they fit together. Data flow tools help by illustrating the essential components of a system and their interactions. Data flow diagrams are one of the most important tools in the data flow strategy. A data flow diagram is a means of representing a system at any level of detail with a graphic network of symbols showing data flows, data stores, data processes and data sources/destinations. The purpose of a data flow diagram is to provide a semantic bridge between users and system developers.

LEVEL 0 DFD

[Figure no. 3.3 Level 0 DFD: a low-contrast satellite image enters the DWT- and SVD-based enhancement process, and an equalized satellite image is produced]

In this chapter we discuss the context diagram, the data flow diagrams and the modules present in our paper, with a description of each module.

MODULES
1. Get image
2. Equalize image using GHE
3. Apply DWT
4. Apply SVD
5. Apply IDWT
6. Get the equalized image

CONCLUSION
The singular-value-based image equalization (SVE) technique is based on equalizing the singular value matrix obtained by singular value decomposition (SVD). A new satellite image contrast enhancement technique based on DWT and SVD is implemented. The technique decomposes the input image into the DWT sub-bands and, after updating the singular value matrix of the LL sub-band, reconstructs the image by using IDWT. The Discrete Wavelet Transform (DWT) is any wavelet transform for which the wavelets are discretely sampled; the Haar DWT illustrates the desirable properties of wavelets in general. The singular value matrix represents the intensity information of the given image, and any change in the singular values changes the intensity of the input image. This technique converts the image into the SVD domain and, after normalizing the singular value matrix, reconstructs the image in the spatial domain by using the updated singular value matrix. The technique was compared with the GHE, LHE, BPDHE and SVE techniques. The visual results on the final image quality show its superiority over the conventional and state-of-the-art techniques.

REFERENCES
[1] H. Demirel, G. Anbarjafari, and M. N. S. Jahromi, "Image equalization based on singular value decomposition," in Proc. 23rd IEEE Int. Symp. Comput. Inf. Sci., Istanbul, Turkey, Oct. 2008, pp. 1–5.
[2] H. Ibrahim and N. S. P. Kong, "Brightness preserving dynamic histogram equalization for image contrast enhancement," IEEE Trans. Consum. Electron., vol. 53, no. 4, pp. 1752–1758, Nov. 2007.
[3] T. Kim and H. S. Yang, "A multidimensional histogram equalization by fitting an isotropic Gaussian mixture to a uniform distribution," in Proc. IEEE Int. Conf. Image Process., Oct. 8–11, 2006, pp. 2865–2868.
[4] H. Demirel and G. Anbarjafari, "Satellite image super resolution using complex wavelet transform," IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, Jan. 2010.
[5] C. C. Sun, S. J. Ruan, M. C. Shie, and T. W. Pai, "Dynamic contrast enhancement based on histogram specification," IEEE Trans. Consum. Electron., vol. 51, no. 4, pp. 1300–1305, Nov. 2005.
[6] W. G. Shadeed, D. I. Abu-Al-Nadi, and M. J. Mismar, "Road traffic sign detection in color images," in Proc. 10th IEEE Int. Conf. Electron.,


ANALYSIS AND PROTECTION OF KEY DISTRIBUTION


SCHEME FOR SECURE GROUP COMMUNICATION
*P.K.UMAMAHESWARI
PG SCHOLAR, M.E. CSE, PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
umapk2008@gmail.com

Abstract—In secure group-oriented applications, key management schemes are employed to distribute and update keys such that unauthorized parties cannot access group communications. Key management, however, can disclose information about the dynamics of group membership, such as the group size and the number of joining and departing users. This is a threat to applications with confidential group membership information. This paper investigates techniques that can stealthily acquire group dynamic information from key management. I have shown that insiders and outsiders can successfully obtain group membership information by exploiting key establishment and key updating procedures in many popular key management schemes. Particularly, I develop three methods targeting tree-based centralized key management schemes. Further, I propose a defense technique utilizing batch rekeying and phantom users, and derive performance criteria that describe the security level of the proposed scheme using mutual information. The proposed defense scheme is evaluated based on data from MBone multicast sessions. I also provide a brief analysis of the disclosure of group dynamic information in contributory key management schemes.

Index Terms—Security, Group Key Management, RSA

1. INTRODUCTION

The motivation for this project is that the ubiquity of communication networks is facilitating applications that allow communication and collaboration among a large number of diverse users. Group key management, which is concerned with generating and updating secret keys, is one of the fundamental technologies used to secure such group communications. Key management facilitates access control and data confidentiality by ensuring that the keys used to encrypt group communication are shared only among legitimate group members. Thus, only legitimate group members can access group communications. The shared group key can also be used for authentication: when a message is encrypted using the group key, the message must be from a legitimate group member.

There are three types of group key management schemes. In centralized key management, group members trust a centralized server, referred to as the key distribution center (KDC), which generates and distributes encryption keys. In decentralized schemes, the task of the KDC is divided among subgroup managers. In contributory key management schemes, group members are trusted equally and all participate in key establishment.

The design of current key management schemes focuses on maintaining key secrecy and reducing the overhead associated with key updating. For the efficiency of a keying solution for secure multicast applications, it is often beneficial to use features of multicast communications that make it an efficient form of group communication. The ideal key distribution efficiency in a multicast environment is O(1): a centralized server may transmit only a single keying message to the entire group to perform a group rekey, and every group member can extract the required key material from this one message. In contrast, the efficiency of using unicast techniques to distribute a group key separately to each group member is O(n). Note that in most cases it is more efficient to perform the initial keying of participants in a unicast fashion during the registration process. The registration function is inherently one-to-one between a single participant and the Initiator or other trusted registration authority. By coupling registration with key distribution, the overall number of transmissions required to perform both functions can be reduced. Keying functions may be either centralized or distributed throughout the architecture. In a centralized architecture, keying functions are restricted to a single trusted authority; in some cases, this may be the Initiator of a session or another entity assigned by the Initiator to handle these vital functions. For scalability purposes, keying and registration functions may be distributed to other trusted entities. Applications of the "one-to-many" type may benefit from a strictly centralized architecture. Alternatively, distributed architectures may prove more scalable, since processing and storage requirements are distributed across the network. However, key management can disclose information about dynamic group membership to both insiders and outsiders. In other words, while the content of group communication is protected by encryption using the secret keys, group dynamic information is disclosed through key management. I collectively refer to Group Dynamic Information (GDI) as information describing the dynamic group membership, including the number of users in a multicast group as a function of time and the number of joining or departing users in a time interval.

This paper analyses the GDI leakage problem and proposes a framework to protect GDI from insiders and outsiders. To protect GDI, I develop a defence method that is fully compatible with existing key management schemes. In centralized key management schemes, there exists a key server that generates and distributes the decryption keys. The GDI should be kept confidential in many group-oriented applications, yet acquiring GDI from key management can be simple and stealthy. Instead of trying to break encryption or compromise the key distribution center, the adversaries can subscribe to the service as regular users; in this case, they are referred to as insiders. Insiders can obtain a very accurate estimation of GDI by monitoring the rekeying messages, which are the messages conveying new key updating information.

The main objective is to study the techniques that can stealthily acquire group dynamic information from key management. I show that insiders and outsiders can successfully obtain group membership information by exploiting key establishment and key updating procedures in many popular key management schemes. In secure group communications, users of a group share a common group key. A key server sends the group key to authorized new users and performs group rekeying for group users whenever the key changes. Rekey transport has an eventual reliability and a soft real-time requirement, and the
rekey workload has a sparseness property; that is, each group user only needs to receive a small fraction of the packets that carry a rekey message sent by the key server.

Fig 1.1 - Functional Components of a Key Management Service

ADVANTAGES

Collecting and analyzing the join/leave behavior is of great benefit in understanding how any networking infrastructure with multicast and real-time capabilities will be used. This collected data is then analyzed to produce
(1) basic statistics about group size and membership turnover,
(2) information about the temporal and spatial dynamics of the multicast group,
(3) user join/leave behavior workloads and models, and
(4) the cost of preventing GDI leakage.

DISADVANTAGES

The disadvantage of batch rekeying is that joining/departing users will be able to access a small amount of information before/after their join/departure. Thus, the parameter Bt must be chosen based on the group policies. In particular, Bt should be smaller than the maximum acceptable delay between revoking a user and sending information that should not be accessed by the revoked user.

APPLICATIONS
• Artificial Intelligence
• Clustering Structure
• Distribution Novel Network Technique
• Intranet Service
• Multicast Network
• Multimedia
• Real Time Systems
• Tele Conference
• Wireless Micro Sensor Networks

2. RELATED WORKS

The paper [4] addresses implementing security for IP multicast networks. These issues are of importance to application developers wishing to implement security services for their multicast applications. It investigates the steps required to create a secure multicast session, including issues of group membership and key distribution. A common simple set of criteria is established that can be used to evaluate multicast keying architectures. The criteria focus on the efficiency and scalability of the keying solution. Using these criteria, several keying architectures are evaluated and compared to determine their strengths and weaknesses.

Group communication can benefit from IP multicast to achieve scalable exchange of messages. However, there is a challenge of effectively controlling access to the transmitted data. IP multicast by itself does not provide any mechanisms for preventing non-group members from having access to the group communication. Although encryption can be used to protect messages exchanged among group members, distributing the cryptographic keys becomes an issue. In [2], researchers have proposed several different approaches to group key management. These approaches can be divided into three main classes: centralized group key management protocols, decentralized architectures, and distributed key management protocols. The three classes are described and an insight given into their features and goals. The area of group key
management is then surveyed and the proposed solutions are classified according to those characteristics.

The report [8] contains a discussion of the difficult problem of key management for multicast communication sessions. It focuses on two main areas of concern with respect to key management: initializing the multicast group with a common net key, and rekeying the multicast group. A rekey may be necessary upon the compromise of a user or for other reasons (e.g., periodic rekey). In particular, this report identifies a technique which allows for secure compromise recovery, while also being robust against collusion of excluded users. This is one important feature of multicast key management which has not been addressed in detail by most other multicast key management proposals. The benefits of this proposed technique are that it minimizes the number of transmissions required to rekey the multicast group and it imposes minimal storage requirements on the multicast group.

[12] As the Internet is expected to better support multimedia applications, new services will need to be deployed. An example of one of these next-generation services is multicast communication, the one-to-many delivery of data. Over the last ten years, multicast research as well as deployment efforts have both been major areas of interest. In order to bridge the gap between the initial deployment experiments and the availability of multicast as a robust network service, there needs to be a full complement of multicast monitoring tools. The paper first surveys the debugging, modeling, and management tools that have evolved alongside the Internet's multicast infrastructure. Through this survey, important generalizations are observed in three areas: (1) the challenges unique to monitoring multicast, (2) a methodology common to many multicast monitoring tools/systems, and (3) a set of considerations important to the development of new tools/systems.

3. SYSTEM MODEL AND BACKGROUND

HIERARCHICAL TREE APPROACH

The Hierarchical Tree Approach is our recommended approach to address the multicast key management problem. This approach provides the following requisite features:

1. Secure removal of a compromised user from the multicast group
2. Transmission efficiency
3. Storage efficiency

There are three methods to obtain GDI.

A. Tree-based centralized key management schemes

Similar to other tree-based schemes, the centralized VersaKey scheme employs a key tree to maintain the keying material. Each node of the key tree is associated with a key. The root of the key tree is associated with the session key (SK), Ks, which is used to encrypt the multicast content. Each leaf node is associated with a user's private key, ui, which is only known by this user and the KDC. The intermediate nodes are associated with key-encryption keys (KEKs), which are auxiliary keys used only for the purpose of protecting the session key and other KEKs.
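A minimal sketch (not the actual VersaKey implementation) of the key tree described in Section A, assuming a fully loaded balanced binary tree in which each user holds its leaf key, the KEKs on the path to the root, and the session key. It illustrates why a departure forces roughly log2 N key updates, the structural fact the estimators in the following sections exploit.

import math

def path_keys(user_index: int, group_size: int):
    # Keys shared by user `user_index` in a balanced binary key tree:
    # the leaf (private key), the KEKs on the path upward, and the SK
    # at the root. All of these must be replaced when the user leaves.
    depth = math.ceil(math.log2(group_size))
    node, path = user_index, []
    for level in range(depth, -1, -1):
        path.append((level, node))
        node //= 2
    return path

# In a 16-user group a departure invalidates len(path_keys(...)) = 5 keys,
# so the rekeying-message size grows with log2 of the group size.
print(path_keys(user_index=5, group_size=16))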
B. Rekeying-message format

An insider receives rekeying messages, decrypts some of the messages, and observes the rekeying-message size without having to understand the content of all messages.
C. Estimation of the group size from the rekeying-message size

In some tree-based key management schemes, the key tree is fully loaded and maintained as balanced as possible by putting the joining users on the shortest branches. In this case, the group size N(t) can be estimated directly from the rekeying-message size. Here, I derive a Maximum Likelihood (ML) estimator and then demonstrate the effectiveness of this estimator through simulations. This ML estimator is first applied in simulated group communications.

D. Estimation of group size based on key IDs

Each key contains the secret material that is the content of the key, and a key selector that is used to distinguish the key. The key selector consists of: 1) a unique ID that stays the same even if the key content changes, and 2) a version and revision field, reflecting updates of the key. The basic format of the rekeying messages is {Ky}Kx, representing Ky encrypted by Kx. This message has two parts. The first part is the key selector of Kx, which is not encrypted, because otherwise a user would not be able to understand this message. The second part is Ky and the key selector of Ky, encrypted by Kx. Thus, in the current implementation, everyone who can overhear the rekeying messages can see the ID of Kx.
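A sketch of the {Ky}Kx message layout just described, with hypothetical field names; a real cipher would fill in the ciphertext and is not shown.

from dataclasses import dataclass

@dataclass
class KeySelector:
    key_id: int    # unique ID, stable across key updates
    version: int   # incremented when the key content changes
    revision: int

@dataclass
class RekeyMessage:
    kx_selector: KeySelector  # sent in the clear so receivers can find Kx
    ciphertext: bytes         # {Ky and Ky's selector} encrypted under Kx

# Because kx_selector travels unencrypted, any eavesdropper collects the
# IDs of the keys in use -- the leak behind key-ID-based estimation.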
E. GDI vulnerability in other key management schemes

Tree-based key management schemes have been known for their efficiency in terms of communication, computation and storage overhead. Besides the tree-based scheme, the VersaKey framework also includes a centralized flat scheme. When a user joins or leaves the group, the rekeying-message size equals the length of the binary representation of the user IDs, which can be independent of N(t).

In Iolus, a large group is decomposed into a number of subgroups, and trusted local security agents perform admission control and key updating for the subgroups. This architecture reduces the number of users affected by key updating resulting from membership changes. Since key updating is localized within each subgroup, insiders or outsiders can only obtain the dynamic membership information of the subgroups that they belong to or can monitor.

The idea of clustering was introduced to achieve efficiency by localizing key updating. The group members are organized into a hierarchical clustering structure. The cluster leaders are selected from group members and perform partial key management. Since the cluster leaders establish keys for the cluster members through pair-wise key exchange, the cluster members cannot obtain GDI of their clusters. However, the cluster leaders naturally obtain the dynamic membership information of their clusters and all clusters below (with cluster sizes ranging from 3 to 15). Therefore, this key management scheme can be applied only when a large portion of group members are trusted to perform key management and obtain GDI.

A topology-matching key management (TMKM) scheme was presented to reduce the communication overhead by matching the key tree with the network topology and localizing the transmission of the rekeying messages. In this scheme, group members receive only the rekeying messages that are useful for themselves and their neighbors.
Fig 2.1 – A Typical Key Management Tree

TECHNIQUES USED

To protect GDI in the time domain, use batch rekeying, which postpones the updates of the keys in order to remove the correlation between the time of key updating and the time when users join/leave the group. In particular, implement batch rekeying as periodic updates of keys: the users who join or leave the group in the time interval [(k-1)Bt, kBt] are added to or removed from the key tree together at time kBt, where k is a positive integer and Bt is the key updating period. By doing so, the time-domain observations do not contain information about when users join/leave the group. It is important to note that batch rekeying was originally proposed to reduce the rekeying overhead.

To reduce the amount of GDI in the message domain, insert phantom users into the system. These phantom users, as well as their join and departure behaviors, are created by the KDC in such a way that the combined effects of the phantom users and the real users lead to a new rekeying process, called the observed rekeying process.

Artificial GDI functions can also be non-deterministic. The proposed defense scheme is compatible with any artificial GDI functions that satisfy the requirements (r1)-(r4). Utilizing phantom users and batch rekeying is not the only solution to the problem of GDI leakage; there are other techniques that can protect GDI against one or several attack methods.

It is important to point out that the idea of employing phantom users is not complicated. The challenge is to determine the amount of phantom users such that the observed rekeying process reveals the least amount of GDI given the resource consumption constraint.
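A sketch of these two defenses combined, under stated assumptions: join/leave events are queued and applied together at multiples of the batch period Bt, and each batch is padded with phantom events up to an artificial-GDI target. target_events() is a hypothetical stand-in for the artificial GDI function discussed above.

import random

def batch_rekey(event_log, Bt, target_events):
    # Group real (time, "join"/"leave") events into batches of length Bt.
    batches = {}
    for t, event in event_log:
        k = int(t // Bt) + 1          # batch applied at time k*Bt
        batches.setdefault(k, []).append(event)
    observed = {}
    for k, events in sorted(batches.items()):
        # Pad with phantom joins/leaves so the observed rekeying process
        # follows the artificial GDI rather than the real one.
        phantoms = max(0, target_events(k) - len(events))
        observed[k] = events + [random.choice(["join", "leave"])
                                for _ in range(phantoms)]
    return observed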
4. CONSTRUCTION OF RSA

Group Formation

The formation of the multicast group is Root initiated; the hierarchical approach is consistent with user-initiated joining, which is the method of multicast group formation presented here. User-initiated joining may be desirable when some core subset of users in the multicast group needs to be brought on-line and communicating more quickly. Other participants in the multicast group can then be brought in when they wish. In this type of approach, though, there does not exist a finite period of time by when it can be ensured that all participants will be a part of the multicast group.

Sender Specific Authentication

In the multicast environment, the possibility exists that participants of the group at times may want to uniquely identify which participant is the sender of a multicast group message. In the
multicast key distribution system described above, the notion of "sender specific keys" is presented. Another option to allow participants of a multicast group to uniquely determine the sender of a message is through the use of a signature process. When a member of the multicast group signs a message with their own private signature key, the recipients of that signed message in the multicast group can use the sender's public verification key to determine if indeed the message is from who it is claimed to be from.

Group Controller

The group controller (or KDC) that is used in this architecture may not be the best design for small, interactive groups. But for large, single-source multicast groups, it is generally undesirable to distribute key management functions among group members: unlike small, interactive groups, large single-source multicast groups generally need a specialized KDC to support large numbers of group members. Large distributed simulations, moreover, may combine the need for large-group operation with many senders.

RSA Authentication

The sender and receiver of a message know and use the same secret key; the sender uses the secret key to encrypt the message, and the receiver uses the same secret key to decrypt the message. This method is known as secret key or symmetric cryptography. The main challenge is getting the sender and receiver to agree on the secret key without anyone else finding out. If they are in separate physical locations, they must trust a courier, a phone system, or some other transmission medium to prevent the disclosure of the secret key. Anyone who overhears or intercepts the key in transit can later read, modify, and forge all messages encrypted or authenticated using that key. The generation, transmission and storage of keys is called key management; all cryptosystems must deal with key management issues. Because all keys in a secret-key cryptosystem must remain secret, secret-key cryptography often has difficulty providing secure key management, especially in open systems with a large number of users.

The RSA cryptosystem is a public-key cryptosystem that offers both encryption and digital signatures (authentication). Ronald Rivest, Adi Shamir, and Leonard Adleman developed the RSA system in 1977; RSA stands for the first letter in each of its inventors' last names. The RSA algorithm works as follows: take two large primes, p and q, and compute their product n = pq; n is called the modulus. Choose a number, e, less than n and relatively prime to (p-1)(q-1), which means e and (p-1)(q-1) have no common factors except 1. Find another number d such that (ed - 1) is divisible by (p-1)(q-1). The values e and d are called the public and private exponents, respectively. The public key is the pair (n, e); the private key is (n, d). The factors p and q may be destroyed or kept with the private key. It is currently difficult to obtain the private key d from the public key (n, e). However, if one could factor n into p and q, then one could obtain the private key d. Thus the security of the RSA system is based on the assumption that factoring is difficult; the discovery of an easy method of factoring would "break" RSA. The example below shows how the RSA system can be used for encryption and authentication.
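A toy, textbook-RSA walkthrough of the algorithm just described (tiny primes, no padding -- insecure, for illustration only; the modular inverse via pow requires Python 3.8+).

p, q = 61, 53
n = p * q                  # modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime to (p-1)(q-1)
d = pow(e, -1, phi)        # private exponent: (e*d - 1) divisible by phi
public_key, private_key = (n, e), (n, d)

m = 42                     # message, must be smaller than n
c = pow(m, e, n)           # encryption with the public key
assert pow(c, d, n) == m   # decryption with the private key

s = pow(m, d, n)           # signing with the private key
assert pow(s, e, n) == m   # verification with the public key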
5. PERFORMANCE MEASURE AND OPTIMIZATION

Define two performance criteria and evaluate the performance of the proposed defense technique. The criteria are (a) the amount of information leaked to the insiders and outsiders, measured by mutual information, and (b) the communication overhead introduced by the phantom users. I study the tradeoff between these two metrics and provide a framework for choosing the proper amount of phantom users.

A. The leakage of GDI

Use mutual information to measure the leakage of the GDI, which represents the maximum amount of information that can possibly be revealed. Since the observed rekeying process is determined by the artificial GDI, and the artificial GDI is only related with the real GDI, the following Markov chain can be formed: real GDI -> artificial GDI -> observed rekeying process. Thus, the mutual information between the observed process and the real GDI is no more than the mutual information between the real and artificial GDI.
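In LaTeX notation, this is the data processing inequality: writing $X$ for the real GDI, $Y$ for the artificial GDI, and $Z$ for the observed rekeying process, the Markov chain $X \to Y \to Z$ implies

$$ I(X; Z) \le I(X; Y), $$

so the leakage to any observer of the rekeying process is bounded by the mutual information between the real and the artificial GDI.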
B. Communication Overhead

Communication overhead, measured by the rekeying-message size, is one of the major performance criteria of key management schemes.

C. System Optimization

From the system design point of view, the leakage of the GDI is minimized while the extra communication overhead does not exceed certain requirements.

D. Backward Secrecy

Newly joined group members must have no access to past group communication.

E. Forward Secrecy

Revoked group members must have no access to future group communication.

6. FUTURE ENHANCEMENT

Key management is a technology that enables key updating in real time as group membership changes. Future commercial multicast services, which could occur in non-traditional broadcast media such as the Internet and 3G/4G wireless networks, will allow a user to subscribe to an arbitrary set of programs and change his/her subscription at any time.

This work has examined aspects of key management schemes that can reveal group dynamic information; new attacks may emerge in the future. The search for the best artificial GDI functions will be investigated in future work.

In future secure multicast applications with confidential GDI, research will be done to give service providers control over whether multicast monitoring tools can be used or not.

Work on many other aspects, such as traffic analysis, can be jointly investigated with key management so that GDI will be better protected against attacks from other angles.

7. CONCLUSION

This paper raised the issues of GDI disclosure through key management in secure group communications. Such a security concern has not been addressed in the design of current key management schemes. In particular, this paper has made two main contributions. First, it presents several effective methods that could get dynamic group membership information from the current centralized key management schemes. This study showed that GDI could be easily obtained by insiders and outsiders who exploited the rekeying messages in key management protocols. This posed a threat to group communications with confidential GDI. Second, it develops defense
techniques that could protect GDI by utilizing batch rekeying and phantom users. For the proposed defense techniques, the fundamental tradeoff between the communication overhead and the leakage of GDI was studied. In addition, this paper provided a brief discussion of the GDI problem in contributory key management schemes. It was argued that contributory schemes were not suitable for applications in which GDI should be protected.

REFERENCES

[1] I. Chang, R. Engel, D. Kandlur, D. Pendarakis, and D. Saha, "Key management for secure Internet multicast using Boolean function minimization techniques," Proc. IEEE INFOCOM '99, vol. 2, 1999.
[2] R. Poovendran and J. Baras, "An information-theoretic approach for design and analysis of rooted-tree-based multicast key management schemes," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2824–2834, 2001.
[3] A. Sherman and D. McGrew, "Key establishment in large dynamic groups using one-way function trees," IEEE Transactions on Software Engineering, pp. 444–458, 2003.
[4] A. Perrig, D. Song, and J. Tygar, "ELK, a new protocol for efficient large-group key distribution," Proc. IEEE Symposium on Security and Privacy, pp. 247–262, 2001.
[5] L. Cheung, J. Cooley, R. Khazan, and C. Newport, "Collusion-resistant group key management using attribute-based encryption," Cryptology ePrint Archive, Report 2007/161, 2007. http://eprint.iacr.org
[6] J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," Proc. 28th IEEE Symposium on Security and Privacy (Oakland), 2007.
[7] X. Li, Y. Yang, M. Gouda, and S. Lam, "Batch rekeying for secure group communications," Proc. 10th International Conference on World Wide Web, pp. 525–534, 2001.
[8] M. Moyer, J. Rao, and P. Rohatgi, "A survey of security issues in multicast communications," IEEE Network, vol. 13, no. 6, pp. 12–23, 1999.
[9] S. Rafaeli and D. Hutchison, "A survey of key management for secure group communication," ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 309–329, 2003.
[10] E. McCluskey, "Minimization of Boolean functions," Bell System Technical Journal, vol. 35, no. 5, pp. 1417–1444, 1956.
[11] A. Fiat and M. Naor, "Broadcast encryption," Advances in Cryptology – CRYPTO '93, Lecture Notes in Computer Science, vol. 773, pp. 480–491, 1994.
[12] D. Boneh, A. Sahai, and B. Waters, "Fully collusion resistant traitor tracing with short ciphertexts and private keys," pp. 573–592, 2006.
SECURE INFORMATION DELIVERY IN WIRELESS SENSOR NODES

S. MADHAN KUMAR*, M. JABALIDBIN#
* Department of Computer Science and Engineering, Vel Tech Multi Tech Dr.RR Dr.SR Engineering College, Chennai, Tamil Nadu.
madhan868@gmail.com
# M.E., Computer Science and Engineering, Vel Tech Multi Tech Dr.RR Dr.SR Engineering College, Chennai, Tamil Nadu.
jabalidbin@yahoo.com
ABSTRACT

Compromised node and denial of service are two key attacks in wireless sensor networks (WSNs). In this paper, data delivery mechanisms with high probability circumvent black holes formed by these attacks. Classic multipath routing approaches are vulnerable to such attacks, mainly due to their deterministic nature. So once the adversary acquires the routing algorithm, it can compute the same routes known to the source, hence making all information sent over these routes vulnerable to its attacks. Under our designs, the routes taken by the "shares" of different packets change over time. So even if the routing algorithm becomes known to the adversary, the adversary still cannot pinpoint the routes traversed by each packet. Besides randomness, the generated M routes are also highly dispersive and energy efficient, making them quite capable of circumventing black holes. We formulate an optimization problem to minimize the end-to-end energy consumption under given security constraints.

Key Words: Randomized multipath routing, wireless sensor network, secure data delivery.

1. INTRODUCTION

Of the various possible security threats encountered in a wireless sensor network (WSN), in this paper we are specifically interested in combating two types of attacks: compromised node (CN) and denial of service (DOS). In the CN attack, an adversary physically compromises a subset of nodes to eavesdrop information, whereas in the DOS attack, the adversary interferes with the normal operation of the network by actively disrupting, changing, or even paralyzing the functionality of a subset of nodes. These two attacks are similar in the sense that they both generate black holes: areas within which the adversary can either passively intercept or actively block information delivery.

Due to the unattended nature of WSNs, adversaries can easily produce such black holes. Severe CN and DOS attacks can disrupt normal data delivery between sensor nodes and the sink, or even partition the topology. A conventional cryptography-based security method cannot alone provide satisfactory solutions to these problems. This is because, by definition, once a node is compromised, the adversary can always acquire the encryption/decryption keys of that node, and thus can intercept any information passed through it.
Likewise, an adversary can always perform DOS attacks (e.g., jamming) even if it does not have any knowledge of the underlying cryptosystem. One remedial solution to these attacks is to exploit the network's routing functionality. Specifically, if the locations of the black holes are known a priori, then data can be delivered over paths that circumvent (bypass) these holes, whenever possible.

Three security problems exist in the above counter-attack approach. First, this approach is no longer valid if the adversary can selectively compromise or jam nodes. This is because the route computation in the above multipath routing algorithms is deterministic, in the sense that for a given topology and given source and destination nodes, the same set of routes is always computed by the routing algorithm. As a result, once the routing algorithm becomes known to the adversary (this can be done, e.g., through memory interrogation of the compromised node), the adversary can compute the set of routes for any given source and destination. Then, the adversary can pinpoint one particular node in each route and compromise (or jam) these nodes. Such an attack can intercept all shares of the information, rendering the above counter-attack approaches ineffective.

Second, as has been pointed out, actually very few node-disjoint routes can be found when the node density is moderate and the source and destination nodes are several hops apart. For example, for a node degree of 8, on average only two node-disjoint routes can be found between a source and a destination that are at least 7 hops apart. There is also a 30 percent probability that no node-disjoint paths can be found between the source and the destination. The lack of enough routes significantly undermines the security performance of this multipath approach.

Last, because the set of routes is computed under certain constraints, the routes may not be spatially dispersive enough to circumvent a moderate-size black hole.

2. RANDOMIZED MULTIPATH DELIVERY

We take a three-phase approach for secure information delivery in a WSN: secret sharing of information, randomized propagation of each information share, and normal routing (e.g., min-hop routing) toward the sink. More specifically, when a sensor node wants to send a packet to the sink, it first breaks the packet into M shares. Each share is then transmitted to some randomly selected neighbour. That neighbour will continue to relay the share it has received to other randomly selected neighbours, and so on. In each share, there is a TTL field, whose initial value is set by the source node to control the total number of random relays. After each relay, the TTL field is reduced by 1. When the TTL value reaches 0, the last node to receive this share begins to route it toward the sink using min-hop routing. Once the sink collects at least T shares, it can reconstruct the original packet. No information can be recovered from fewer than T shares.
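A minimal sketch of the (T, M) threshold secret sharing used in the first phase (Shamir's scheme over a prime field); the prime modulus and the byte-to-integer packing are our illustrative choices, not prescribed by the paper.

import random

P = 2**127 - 1  # a Mersenne prime; the packed secret must be smaller than P

def split(secret: int, m: int, t: int):
    # Split `secret` into m shares such that any t of them reconstruct
    # it: shares are points on a random degree-(t-1) polynomial whose
    # constant term is the secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, m + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

packet = int.from_bytes(b"sensor reading", "big")
shares = split(packet, m=6, t=3)       # M = 6 shares, threshold T = 3
assert reconstruct(random.sample(shares, 3)) == packet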
Fig.1 Randomized dispersive routing in a WSN.

The effect of route dispersiveness on bypassing black holes is illustrated in Fig. 2, where the dotted circles represent the ranges the secret shares can be
propagated to in the random propagation phase. A larger dotted circle implies that the resulting routes are geographically more dispersive. Comparing the two cases in Fig. 2, it is clear that the routes of higher dispersiveness are more capable of avoiding the black hole. Clearly, the random propagation phase is the key component that dictates the security and energy performance of the entire mechanism.

Fig.2 Implication of route dispersiveness on bypassing the black hole. (a) Routes of higher dispersiveness. (b) Routes of lower dispersiveness.

2.1 Random Propagation of Information Shares

To diversify routes, an ideal random propagation algorithm would propagate shares as dispersively as possible. Typically, this means propagating the shares farther from their source. At the same time, it is highly desirable to have an energy-efficient propagation, which calls for limiting the number of randomly propagated hops. The challenge here lies in the random and distributed nature of the propagation: a share may be sent one hop farther from its source in a given step, but may be sent back closer to the source in the next step, wasting both steps from a security standpoint. To tackle this issue, some control needs to be imposed on the random propagation process.

2.1.1 Purely Random Propagation (Baseline Scheme)

In PRP, shares are propagated based on one-hop neighbourhood information. More specifically, a sensor node maintains a neighbor list, which contains the ids of all nodes within its transmission range. When a source node wants to send shares to the sink, it includes a TTL of initial value N in each share. It then randomly selects a neighbor for each share, and unicasts the share to that neighbor. After receiving the share, the neighbor first decrements the TTL. If the new TTL is greater than 0, the neighbor randomly picks a node from its neighbor list (this node cannot be the source node) and relays the share to it, and so on. When the TTL reaches 0, the final node receiving this share stops the random propagation of this share, and starts routing it toward the sink using normal min-hop routing. The WANDERER scheme [2] is a special case of PRP with N = 1. The main drawback of PRP is that its propagation efficiency can be low, because a share may be propagated back and forth multiple times between neighbouring hops. Increasing the TTL value does not fully address this problem. This is because the random propagation process reaches steady state under a large TTL, and its distribution will no longer change even if the TTL becomes larger.
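A sketch of one PRP walk under stated assumptions: the network is an adjacency dict (node id -> neighbor ids), standing in for each sensor's one-hop neighbor list; min-hop routing after the walk is not shown.

import random

def prp_propagate(graph, source, ttl):
    # Relay the share to a uniformly random neighbor (never back to the
    # source itself) until the TTL reaches 0; the returned node is the
    # one that starts min-hop routing toward the sink.
    current = source
    while ttl > 0:
        candidates = [n for n in graph[current] if n != source]
        if not candidates:
            break
        current = random.choice(candidates)
        ttl -= 1   # decremented at each relay
    return current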
2.1.2 Non-Repetitive Random Propagation

NRRP is based on PRP, but it improves the propagation efficiency by recording the nodes traversed so far. Specifically, NRRP adds a "node-in-route" (NIR) field to the header of each share. Initially, this field is empty. Starting from the source node, whenever a node propagates the share to the next hop, the id of the upstream node is appended to the NIR field. Nodes included in the NIR are excluded from the random pick at the next hop. This non-repetitive propagation guarantees that the
share will be relayed to a different node in each step of random propagation, leading to better propagation efficiency.

2.1.3 Directed Random Propagation

DRP improves the propagation efficiency by using two-hop neighbourhood information. More specifically, DRP adds a "last-hop neighbour list" (LHNL) field to the header of each share. Before a share is propagated to the next node, the relaying node first updates the LHNL field with its neighbor list. When the next node receives the share, it compares the LHNL field against its own neighbor list, and randomly picks one node from its neighbors that are not in the LHNL. It then decrements the TTL value, updates the LHNL field, and relays the share to the next hop, and so on. Whenever the LHNL fully overlaps with or contains the relaying node's neighbour list, a random neighbour is selected, just as in the case of the PRP scheme. With this propagation method, DRP reduces the chance of propagating a share back and forth by eliminating this type of propagation within any two consecutive steps. Compared with PRP, DRP attempts to push a share outward away from the source, and thus leads to better propagation efficiency for a given TTL value.
2.1.4 Multicast Tree-Assisted Random Propagation

MTRP aims at actively improving the energy efficiency of random propagation while preserving the dispersiveness of DRP. Among the three different routes taken by shares, the route on the bottom right is the most energy efficient, because it is the shortest end-to-end path. So, in order to improve energy efficiency, shares should preferably be propagated in the direction of the sink. In other words, their propagation should be restricted to the right half of the circle in Fig. 1. Conventionally, directional routing requires location information of both the source and the destination nodes, and sometimes of intermediate nodes.

3. ASYMPTOTIC ANALYSIS OF THE PRP SCHEME

The random routes generated by the four algorithms in Section 2 are not necessarily node-disjoint. So, a natural question is how good these routes are in avoiding black holes. We answer this question by conducting an asymptotic analysis of the PRP scheme. Theoretically, such analysis can be interpreted as an approximation of the performance when the node density is sufficiently large. It also serves as a lower bound on the performance of the NRRP, DRP, and MTRP schemes. Note that the security analyses for the CN and DOS attacks are similar, because both involve calculating the packet interception probability; the same treatment can be applied to the DOS attack with a straightforward modification.

3.1 Network and Attack Models

We consider an area S that is uniformly covered by sensors of a given density, with a unit-disk model for the sensor communication, i.e., the transmitted signal from a sensor can be successfully received by any sensor that is at most Rh meters away. Multihop relay is used if the intended destination is more than Rh away from the source. Link-level security has been established through a conventional cryptography-based bootstrapping algorithm, i.e., consecutive links along an end-to-end path are encrypted by symmetric link keys. So, when a node A wants to send a share to its neighbor B, it first encrypts the plaintext using link key KAB and then sends the ciphertext to B. When B wants to forward the received share to its neighbor C, it decrypts the ciphertext using key KAB, re-encrypts the plaintext using key KBC, then sends it to C, and so on. In this way, the openness of the wireless medium is eliminated: a node cannot decrypt a ciphertext overheard over the wireless channel if it is not the intended receiver.
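A toy illustration (not a real cipher) of the hop-by-hop relay just described: each hop decrypts with the upstream link key and re-encrypts with the downstream one. keystream() is a stand-in for a proper symmetric cipher such as AES.

import hashlib

def keystream(link_key: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(link_key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(data: bytes, link_key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(link_key, len(data))))

K_AB, K_BC = b"link key A-B", b"link key B-C"  # pairwise link keys
share = b"secret share"
ct_ab = xor_crypt(share, K_AB)                 # A -> B over the air
plain = xor_crypt(ct_ab, K_AB)                 # B decrypts with K_AB...
ct_bc = xor_crypt(plain, K_BC)                 # ...re-encrypts with K_BC -> C
assert xor_crypt(ct_bc, K_BC) == share         # only C can read this hop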
A link key is safe unless the adversary physically compromises either
side of the link. The adversary has the ability to compromise multiple nodes. However, we assume that the adversary cannot compromise the sink and its immediate surrounding nodes. This assumption is reasonable because the sink's neighbourhood is usually a small area, and can easily be physically secured by the network operator, e.g., by deploying guards or installing video surveillance/monitoring equipment.

3.2 Encryption

Encryption is the conversion of data into a form called a ciphertext. There are two basic techniques for encrypting information:
• symmetric encryption (or secret key encryption)
• asymmetric encryption (or public key encryption)

3.3 Symmetric Encryption

Symmetric encryption (also known as symmetric-key encryption, single-key encryption, one-key encryption and private key encryption) is a type of encryption where the same secret key is used to encrypt and decrypt information, or there is a simple transform between the two keys. A secret key can be a number, a word, or just a string of random letters. The secret key is applied to the information to change the content in a particular way; this might be as simple as shifting each letter by a number of places in the alphabet. Symmetric algorithms require that both the sender and the receiver know the secret key, so they can encrypt and decrypt all information. There are two types of symmetric algorithms:
• stream algorithms (stream ciphers)
• block algorithms (block ciphers)
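A brief example of symmetric encryption in practice, assuming the third-party Python `cryptography` package (our choice of library, not the paper's): Fernet is an authenticated symmetric scheme in which one generated key both encrypts and decrypts.

from cryptography.fernet import Fernet

secret_key = Fernet.generate_key()  # the single key both parties share
cipher = Fernet(secret_key)
token = cipher.encrypt(b"sensor data")
assert cipher.decrypt(token) == b"sensor data"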
3.4 Asymmetric Encryption (Public Key Encryption)

Asymmetric encryption uses different keys for encryption and decryption. The decryption key is very hard to derive from the encryption key. The encryption key is public so that anyone can encrypt a message; however, the decryption key is private, so that only the receiver is able to decrypt the message. It is common to set up "key pairs" within a network so that each user has a public and a private key. The public key is made available to everyone so that they can send messages, but the private key is only made available to the person it belongs to.

4. MULTIPATH CONCEPT

4.1 DSR Protocol Description

The DSR protocol is composed of two mechanisms that work together to allow the discovery and maintenance of source routes in the ad hoc network [6]. Route Discovery is the mechanism by which a node S wishing to send a packet to a destination node D obtains a source route to D. Route Discovery is used only when S attempts to send a packet to D and does not already know a route to D [3]. Route Maintenance is the mechanism by which node S is able to detect, while using a source route to D, whether the network topology has changed such that it can no longer use its route to D because a link along the route no longer works. When Route Maintenance indicates a source route is broken, S can attempt to use any other route it happens to know to D, or can invoke Route Discovery again to find a new route. Route Maintenance is used only when S is actually sending packets to D.

4.2 Route Discovery and Route Maintenance

When a traffic source needs a route to a destination, it initiates a route discovery process. Route discovery typically involves a network-wide flood of route request (RREQ) packets targeting the destination, then waiting for a route reply (RREP) [7]. An intermediate node receiving a RREQ packet first sets up a reverse path to the source, using the previous hop of the RREQ as the next hop on the reverse path.
If a valid route to the destination is available, then the intermediate node generates a RREP; otherwise the RREQ is re-broadcast. Duplicate RREQ packets received at any node are discarded. When the destination receives the RREQ, it also generates a RREP. The RREP is routed back to the source using the reverse path. As the RREP proceeds to the source, a forward path to the destination is established.
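A condensed sketch of this RREQ/RREP exchange under stated assumptions: synchronous flooding over an adjacency dict, with no timers or packet loss modeled.

from collections import deque

def route_discovery(graph, src, dst):
    # Flood the RREQ: each node records the previous hop (its reverse
    # path), and duplicates are discarded via the prev_hop table.
    prev_hop = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Destination generates the RREP; walking the reverse-path
            # pointers back mimics the RREP travelling to the source
            # and yields the established forward path.
            path, n = [], dst
            while n is not None:
                path.append(n)
                n = prev_hop[n]
            return list(reversed(path))  # forward path src -> dst
        for nbr in graph[node]:
            if nbr not in prev_hop:      # duplicate RREQs are discarded
                prev_hop[nbr] = node
                queue.append(nbr)
    return None                          # no route to dst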
4.3 Multipath Routing

Multipath routing is needed for secure communication when route recovery cannot be guaranteed to be done fast enough because of the high mobility of the system [2]. With standby paths, traffic can be redirected whenever we have a route failure, thus reducing route recovery time. Multipath routing also offers other quality of service advantages (such as load balancing, aggregation of network bandwidth, reducing traffic congestion, etc.). Multipath routing in networks with no fixed infrastructure is a major challenge and in general requires a different approach from that used with fixed infrastructures [5].

4.4 Split Multipath Routing

Split Multipath Routing (SMR) is an on-demand routing protocol that builds multiple routes using request/reply cycles [4]. When the source needs a route to the destination but no route information is known, it floods the ROUTE REQUEST (RREQ) message to the entire network. Because this packet is flooded, several duplicates that traversed through different routes reach the destination. The destination node selects multiple disjoint routes and sends ROUTE REPLY (RREP) packets back to the source via the chosen routes.

5. SENSOR NETWORK ARCHITECTURE

A sensor network is composed of a large number of sensor nodes [1] that are densely deployed either inside the phenomenon or very close to it. The position of sensor nodes need not be engineered or predetermined. This allows random deployment in inaccessible terrains or disaster relief operations. On the other hand, it also means that sensor network protocols and algorithms must possess self-organising capabilities. Another unique feature of sensor networks is the co-operative effort of sensor nodes. The sink may communicate with the task manager via the Internet or satellite. The design of sensor nodes is influenced by factors such as:
• Fault tolerance
• Scalability
• Production cost
• Operating environment
• Sensor network topology
• Hardware constraints
• Transmission media
• Power consumption

Fig.3 Sensor nodes scattered in a sensor field.

The sensor nodes scattered in a sensor field are shown in Fig. 3. Each of these scattered sensor nodes has the capability to collect data and route it back to the sink. The data is routed back to the sink by a multi-hop infrastructureless architecture [1].

6. CONCLUSION

This paper has shown the effectiveness of randomized dispersive routing in combating CN and DOS attacks. By appropriately setting the secret sharing and propagation
parameters, the packet interception probability can easily be reduced by the proposed algorithms to a value at least one order of magnitude smaller than that of approaches using deterministic node-disjoint multipath routing. The proposed algorithms can be applied to selective packets in WSNs to provide additional security levels against adversaries attempting to acquire these packets. By adjusting the random propagation and secret sharing parameters (N and M), different security levels can be provided by our algorithms at different energy costs. Considering that the percentage of packets in a WSN that require a high security level is small, we believe that the selective use of the proposed algorithms does not significantly impact the energy efficiency of the entire system. We have assumed a small number of black holes in the WSN. In reality, a stronger attack could be formed, whereby the adversary selectively compromises a large number of sensors that are several hops away from the sink to form clusters of black holes around the sink. Collaborating with each other, these black holes can form a cut around the sink and can block every path between the source and the sink.

REFERENCES

1. I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Comm. Magazine, vol. 40, no. 8, pp. 102-114, Aug. 2002.
2. M. Burmester and T.V. Le, "Secure Multipath Communication in Mobile Ad Hoc Networks," Proc. Int'l Conf. Information Technology: Coding and Computing, pp. 405-409, 2004.
3. D.B. Johnson, D.A. Maltz, and J. Broch, "DSR: The Dynamic Source Routing Protocol for Multihop Wireless Ad Hoc Networks," Ad Hoc Networking, C.E. Perkins, ed., pp. 139-172, Addison-Wesley, 2001.
4. S.J. Lee and M. Gerla, "Split Multipath Routing with Maximally Disjoint Paths in Ad Hoc Networks," Proc. IEEE Int'l Conf. Comm. (ICC), pp. 3201-3205, 2001.
5. W. Lou, W. Liu, and Y. Zhang, "Performance Optimization Using Multipath Routing in Mobile Ad Hoc and Wireless Sensor Networks," Proc. Combinatorial Optimization in Comm. Networks, pp. 117-146, 2006.
6. D.B. Johnson, D.A. Maltz, and J. Broch, "DSR: The Dynamic Source Routing Protocol for Multi-Hop Wireless Ad Hoc Networks," Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3891.
7. M.K. Marina and S.R. Das, "On-demand Multipath Distance Vector Routing."
CONTENT AWARE PLAYOUT FOR VIDEO STREAMING

*Geethanjali Jayachandran, **N. Gomathi and ***V.R. Vimal

*II M.E. (CSE), VelTech MultiTech SRS Engg College, geethavec@gmail.com
**Prof., CSE, VelTech MultiTech SRS Engg College, gomathi1974@gmail.com
***Asst. Prof., CSE, VelTech MultiTech SRS Engg College, vimalraman2004@gmail.com
Abstract—Media streaming over wireless links is a challenging problem due to both the unreliable, time-varying nature of the wireless channel and the stringent delivery requirements of media traffic. In this paper, we use joint control of packet scheduling at the transmitter and content-aware playout at the receiver, so as to maximize the quality of media streaming over a wireless link. Our contributions are twofold. First, we formulate and study the problem of joint scheduling and playout control in the framework of Markov decision processes. Second, we propose a novel content-aware adaptive playout control that takes into account the content of a video sequence, and in particular the motion characteristics of different scenes. We find that the joint scheduling and playout control can significantly improve the quality of the received video, at the expense of only a small amount of playout slowdown. Furthermore, the content-aware adaptive playout places the slowdown preferentially in the low-motion scenes, where its perceived effect is lower.

Index Terms—Adaptive media playout, cross-layer optimization, network control, packet scheduling, video-aware adaptation and communication.

I. INTRODUCTION

Recent advances in video compression and streaming as well as in wireless networking technologies (next-generation cellular networks and high-throughput wireless LANs) are rapidly opening up opportunities for media streaming over wireless links. However, the erratic and time-varying nature of a wireless channel is still a serious challenge for the support of high-quality media applications. To deal with these problems, network-adaptive techniques have been proposed [5] that try to overcome the time-variations of the wireless channel using controls at various layers at the transmitter and/or the receiver. In this work, we consider the transmission of pre-stored media units over a wireless channel which supports a time-varying throughput. We investigate the joint control of packet scheduling at the transmitter (Tx) and playout speed at the receiver (Rx), so as to overcome the variations of the channel and maximize the perceived video quality, in terms of both picture and playout quality. We couple the actions of the transmitter and the receiver, so that they coordinate to overcome the variations of the wireless channel. We jointly consider and optimize several layers, including packet scheduling
at the medium access control (MAC) layer, together with playout and content-awareness at the video application layer. Video content is taken into account both in playout as well as in rate-distortion optimized packet scheduling. We briefly note the following intuitive tradeoffs faced by the individual controls in the attempt to maximize video quality. At the Tx side, the dilemma is the following: on one hand we want to transmit all media units; on the other hand, during periods when the bandwidth is scarce, we may choose to transmit the most important units and skip some others, depending on their rate-distortion values.

At the Rx side, the dilemma is the following: on one hand, we want to display the sequence at the natural frame rate; on the other hand, during bad periods of the channel, we may choose to slow down the playout in order to extend the playout deadlines of packets in transmission, and avoid late packet arrivals (leading to buffer underflow and frame losses), but at the expense of the potentially annoying slower playout. A novel aspect of this work is that we perform content-aware playout variation; that is, we take into account the characteristics of a video scene when we adapt the playout speed. The contributions of this work are the following.

1. We formulate the problem of joint playout and scheduling within the framework of Markov decision processes and we obtain the optimal control using dynamic programming (a generic sketch of this machinery follows below).

2. We introduce the idea of content-aware playout and demonstrate that it significantly improves the user experience. The idea is to vary the playout speed of scenes, based on the scene content; e.g., scenes with low or no motion typically may be less affected by playout variation than scenes with high motion.
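A generic value-iteration sketch of the dynamic-programming step mentioned in contribution 1. The states, actions, transition kernel P and reward R below are placeholders, not the paper's actual buffer/channel state space or its distortion-based objective.

def value_iteration(states, actions, P, R, gamma=0.95, tol=1e-6):
    # P[s][a] is a list of (probability, next_state) pairs; R[s][a] is
    # the immediate reward of taking action a in state s.
    V = {s: 0.0 for s in states}
    while True:
        V_new = {s: max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                        for a in actions)
                 for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new   # the optimal control picks the maximizing action
        V = V_new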
The rest of the paper is structured as follows. Section II discusses related work. Section III introduces the system model and problem formulation. Section IV provides simulation results. Section V concludes the paper.

II. RELATED WORK

Streaming media over an unreliable and/or time-varying network, whether this is the Internet or a wireless network, is a large problem space with various aspects and control parameters. Several network-adaptive techniques have been proposed [5], including rate-distortion optimized packet scheduling [6], power [12] and/or rate control at the transmitter, and playout speed adaptation at the receiver [13], [9]. Wireless video, in particular, is a very challenging problem, due to the limited, time-varying resources of the wireless channel; a survey can be found in [4]. There is a large body of work on cross-layer design for video streaming over wireless, including [15], [2], [18], to mention a few representative examples. Our work also falls within the scope of cross-layer optimization. In the rest of this section, we clarify our contribution and comparison to prior work in this problem space.

A. Prior Work on Adaptive Playout

Playout control at the receiver can mitigate packet delay variation and provide smoother playout. Adaptive playout has been used in the past for media streaming over the Internet for both audio [17], [14], and video [13], [9], [10]. Our work proposes for the first time to make playout control content-aware. That is, given a certain amount of playout slowdown and variation caused by bad channel periods, we are interested in applying this slowdown to those parts of the video sequence that are less sensitive from a perceptual point of view. Within the adaptive playout literature, the closest to our work are [13] and [9]. However, there are two differences. The first
difference is that we propose, for the first time, a content-aware playout; we build on and extend the metrics in [9] to include the notion of the motion intensity of a scene. A second and more subtle difference lies in the formulation. [13] models the system as a Markov chain and analyzes the performance of adaptive algorithms that slow down or speed up the playout rate based on the buffer occupancy. However, the parameters of these algorithms, such as buffer occupancy thresholds and speedup and slowdown factors, are fixed and must be chosen offline. In contrast, we model the system as a controlled Markov chain, which allows for more fine-tuned control: the control policy itself is optimized for the parameters of the system, including the channel characteristics. For example, when the channel is good, the playout policy can be optimistic and use low levels of buffer occupancy below which it starts to slow down; when the channel is bad, the optimal policy should be more conservative and start slowing down even when the buffer is relatively full. Finally, another difference lies in the system design: [13] performs slowdown and speedup at the Rx, while we perform slowdown at the Rx and drop late packets at the Tx, thus saving system resources from unnecessary transmissions.

B. Prior Work on Packet Scheduling

In this paper, we use packet scheduling at the Tx to complement the playout functionality at the Rx. The main purpose of the scheduler is to discard late packets and catch up with the accumulated delay caused by playout slowdown during bad channel periods; these late packets would be dropped anyway at the Rx, but dropping them at the Tx saves system resources. In addition, we enhanced the scheduler to transmit a subset of the video packets so as to meet the channel rate constraint with minimum distortion. This paper does not aim at improving the state of the art in rate-distortion optimized scheduling; instead, its contribution lies in the playout control. The scheduling control enhances the playout and is optimized for that purpose. The state of the art in rate-distortion optimized packet scheduling is currently the RaDiO family of techniques [3]: in every transmission opportunity, a decision is made as to which media units to transmit and which to discard, so as to maximize the expected quality of the received video subject to a constraint on the transmission rate, taking into consideration transmission errors, delays, and decoding dependencies. Similar to RaDiO, our scheduler efficiently allocates bandwidth among packets so as to minimize distortion and meet playout deadlines. Both works propose analytical frameworks to study video transmitted over networks. However, there are two differences. First, the two modeling approaches are different: we formulate the problem as a controlled Markov chain, thus being able to exploit the channel variations, while RaDiO formulates the problem using Lagrange multipliers, thus optimizing for the average case. Second, different simplifications are used to efficiently search for the optimal solution: RaDiO optimizes the transmission policy for one packet at a time, while we constrain our policies to in-order transmission. Our approach could also be extended to include out-of-order transmissions. Another framework for optimizing packet scheduling is CoDiO, congestion-distortion optimized streaming [16], which takes into account congestion, which is detrimental both to other flows and to the stream itself; in a somewhat similar spirit, our scheme may purposely drop late packets at the transmitter in order to avoid a self-inflicted increase in the stream's end-to-end delay. Finally, we would like to note that, in this paper, we focus on nonscalable video encoding, which accounts for the great majority of pre-encoded video today as well as in the foreseeable future. If the original video is encoded with scalable video coding, then there is more flexibility in terms of what to drop to fit the available bandwidth and delay constraints; however, some of the techniques proposed in this paper for
assessing what to drop may still be applicable.

C. Relation to Our Prior Work

In the past, we have also used the general framework of controlled Markov chains to study different control problems, with emphasis on power control. The closest of our past work to this paper is [12], where we studied power-playout control for video streaming over a channel with time-varying error characteristics. In this paper, we use the same modeling and optimization methodology, but we study a different control problem. The first difference is that we deal with a different input: an error-free channel with time-varying rate is given to us, and we try to overcome its fluctuations. We have no control over the channel characteristics; we can only adapt to their fluctuations. (In contrast, in [12], we considered a channel with fixed rate and time-varying packet loss rate, which we could affect by varying the power.) From a streaming application's perspective, the setting considered in this paper is more realistic in today's wireless systems, as explained in Section III-B. Second, we introduce two novel controls. We introduce, for the first time, content-aware playout: the idea is to selectively vary the playout in scenes with less motion, where the effect will be less perceived. In addition, we use packet scheduling to discard late packets and catch up with the delay accumulated from slowing down during bad channel periods. In contrast, in [12], we did not consider delay-sensitive applications and we transmitted all packets. An early version of this work appeared in [11]. This journal paper is extended to include: the results of in-house subjective testing; a greedy algorithm for packet scheduling; more details about the video sequences used and their motion intensity; additional simulation results for a range of channels and for different rate adaptation timescales; a comparison to related work; and a detailed discussion of rate adaptation techniques for video and audio.

Figure 1: Joint scheduling and playout control over a time-varying wireless network.

III. SYSTEM MODEL AND PROBLEM FORMULATION

Video Quality Assessment Schemes

Objective QoS Measures

In an optimal case, the quality of video is monitored during transmission. Based on the measurements, adjustment of parameters and possible retransmission of the data are carried out. Objective quality assessment methods for digital video can be classified into three categories. In the first category, the quality is evaluated by comparing the decoded video sequence to the original. The objectivity of this method is owed to the fact that there is no human interaction; the original video sequence and the impaired one are fed to a computer algorithm that calculates the distortion between the two. The second category contains methods that compare features calculated from the original and the decoded video sequences. The methods of the third category make observations only on the decoded video and estimate the quality using only that information. The Video Quality Experts Group (VQEG) calls these groups the full, the reduced, and the no reference methods [15]. Traditional signal distortion measures use an error signal to determine the quality of a system. The error signal is
the absolute difference between the original and the processed signal. The traditional quality metrics are the Root Mean Square Error (RMSE), the Signal-to-Noise Ratio (SNR), and the Peak Signal-to-Noise Ratio (PSNR) in dB. In this work we employ a full reference method and use the PSNR as the objective quality metric.
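As a concrete illustration of this full-reference metric, a minimal PSNR computation for 8-bit frames can be written as follows (a sketch of our own, assuming the original and processed frames are NumPy arrays of equal size):

    import numpy as np

    def psnr(original, processed):
        """Peak Signal-to-Noise Ratio between two 8-bit frames, in dB."""
        diff = original.astype(np.float64) - processed.astype(np.float64)
        mse = np.mean(diff ** 2)            # mean squared error
        if mse == 0:
            return float('inf')             # identical frames
        return 10.0 * np.log10((255.0 ** 2) / mse)

Here 255 is the peak signal value for 8-bit samples; RMSE is simply the square root of the mse term above.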
There are numerous metrics used to express the objective quality of an image or video; they cannot, however, fully characterize the response and end satisfaction of the viewer. Perceived quality of a video is measured through human "grading" of streams, which helps collect and utilize the general user view. There is a number of perceived quality of service measurement techniques; most of them are explained in [19]. The following are the most popular: a) DSIS (Double Stimulus Impairment Scale); b) DSCQS (Double Stimulus Continuous Quality Scale); c) SCAJ (Stimulus Comparison Adjectival Categorical Judgement); d) SAMVIQ (Subjective Assessment Method for Video Quality evaluation). In this work we have used the SAMVIQ [8] method. SAMVIQ is based on random playout. The individual viewer can start and stop the evaluation process as he wishes and is allowed to determine his own pace for performing grading, modifying grades, repeating playout when needed, etc. For the SAMVIQ method, quality evaluation is carried out scene after scene, including an explicit reference, a hidden reference, and various algorithms (codecs). As a result, SAMVIQ offers higher reliability, i.e., smaller standard deviations. A major advantage of this subjective evaluation scheme is the way video sequences are presented to the viewer. In SAMVIQ, video sequences are shown in multi-stimulus form, so that the user can choose the order of tests and correct their votes as appropriate. As the viewers can directly compare the impaired sequences among themselves and against the reference, they can grade them accordingly. Thus, viewers are generally able to discriminate the different quality levels better with SAMVIQ than with the other methods. In addition, in this method there is only one viewer at a time, which alleviates a "group effect".

Evaluation Setup and Scenarios

Topology

The evaluation topology consists of one Video Streaming Server, two backbone routers, and video clients of variable types and connectivity methods (fixed, mobile, wired, wireless), as shown in Fig. 1. The video streaming server is attached to the first backbone router with a link which has 10 Mbps bandwidth and 10 ms propagation delay. These values remain constant during all scenarios. This router is connected to a second router using a link with unspecified and variable bandwidth, propagation delay, and packet loss. The different parameter values used to characterize this variable link are shown in Table 1. Using this topology, we conducted several experiments for two different sample sequences and with fixed-wired clients, fixed-wireless clients, and mobile-wireless clients.

Variable Test Parameters

The choice of the parameters used in the video quality evaluations (Table 1) was based on the typical characteristics of mobile and wireless networks, as these are described in Section 2. For example, the Link Bandwidth can be considered as either the last-hop access link bandwidth or the bandwidth available to the user. The values chosen can represent typical wired home access rates (modem, ISDN, xDSL) or different bearer rates for UMTS.
Video Stream Bit Rate:  64, 128, 256, 512, 768 Kbps
Link Bandwidth:         64 Kbps, 100 Kbps, 256 Kbps, 512 Kbps, 1 Mbps
Propagation Delay:      10, 20, 100, 200, 400 ms
Packet Loss:            10^-5 to 10^-3

Table 1: Variable test parameters

Test sequences

The test sequences used in this work were the sample sequences Foreman and Claire. The sequences were chosen because of their different characteristics. The first is a stream with a fair amount of movement and change of background, whereas the second is a more static sequence. The characteristics of these sequences are shown in Table 2. The sample sequences were encoded in MPEG-4 format with a free software tool called FFMPEG encoder [20]. The two sequences have a temporal resolution of 30 frames per second and GoP (Group of Pictures) pattern IBBPBBPBBPBB. Each sequence was encoded at the rates shown in Table 1. The video stream bit rate (the terms video stream bit rate and video encoding rate are used interchangeably in this paper) varies from 64 Kbps to 768 Kbps. This rate is the average produced by the encoder. Since the encoding of the sample video sequences is based on MPEG-4, individual frames have variable sizes.

Trace     Resolution  Total Frames  I   P    B
Foreman   176x144     400           34  100  266
Claire    176x144     494           42  124  328

Table 2: Characteristics of the video sequences

Data Collection

All the aforementioned experiments were conducted with the open source network simulator tool NS2 [21]. Based on the open source framework called EvalVid [7], we were able to collect all the necessary information needed for the objective video quality evaluation, such as PSNR values, frames lost, packet end-to-end delay, and packet jitter. Some new functionality was implemented in NS2 in order to support EvalVid.

IV SIMULATION RESULTS

Fig. 2 represents the performance of the system in terms of PSNR obtained with content-aware playout and content-unaware playout. The content-aware playout provides better quality video.

Figure 2: PSNR evaluation

V CONCLUSION

In this work, we formulated the problem of media streaming over a time-varying wireless channel as a stochastic control problem, and we analyzed the joint control of packet scheduling and content-aware playout. We showed that a small increase in playout duration can result in a significant increase in video quality. Furthermore, we proposed to take into account the characteristics of each scene in a video sequence in order to adapt the playout control; this reduces the perceived effect of playout speed variation. Our proposed method can improve the quality of the video stream over the network.
REFERENCES

[1] Berry and E. Yeh, "Cross-layer wireless resource allocation—Fundamental performance limits for wireless fading channels," IEEE Signal Process. Mag., vol. 21, no. 5, pp. 59–68, Sep. 2004.
[2] Cabrera, A. Ortega, and J. I. Ronda, "Stochastic rate-control of video coders for wireless channels," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 496–510, Jun. 2002.
[3] P. Chou and Z. Miao, "Rate-distortion optimized streaming of packetized media," Microsoft Research Technical Report MSR-TR-2001-35, Feb. 2001.
[4] Färber and B. Girod, "Wireless Video," in Compressed Video over Networks, A. Reibman and M.-T. Sun, Eds. New York: Marcel Dekker, 2000.
[5] Girod, J. Chakareski, M. Kalman, Y. J. Liang, E. Setton, and R. Zhang, "Advances in network-adaptive video streaming," in Proc. IWDC 2002, Capri, Italy, Sept. 2002.
[6] Jacob Chakareski and Pascal Frossard, "Rate-Distortion Optimized Distributed Packet Scheduling of Multiple Video Streams Over Shared Communication Resources," IEEE Trans. Multimedia, vol. 8, no. 2, April 2006.
[7] Jirka Klaue, Berthold Rathke, and Adam Wolisz, "EvalVid – A Framework for Video Transmission and Quality Evaluation," in Proc. International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, Urbana, Illinois, USA, September 2003.
[8] Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ: A New EBU Methodology for Video Quality Evaluations in Multimedia," SMPTE Motion Imaging Journal, April 2005, pp. 152–160.
[9] Kalman, E. Steinbach, and B. Girod, "Adaptive media playout for low delay video streaming over error-prone channels," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, June 2004.
[10] Kalman, E. Steinbach, and B. Girod, "Rate-distortion optimized video streaming with adaptive playout," in IEEE ICIP-2002, vol. 3, pp. 189–192, Sep. 2002.
[11] Li, A. Markopoulou, J. Apostolopoulos, and N. Bambos, "Packet transmission and content-dependent playout variation for video streaming over wireless networks," in Proc. IEEE MMSP 2005 (Special Session on Content Aware Video Coding and Transmission), Shanghai, China, Oct. 2005.
[12] Li, A. Markopoulou, N. Bambos, and J. Apostolopoulos, "Joint power-playout control schemes for media streaming over wireless links," in Proc. IEEE Packet Video, Irvine, CA, Dec. 2004.
[13] Liang, N. Farber, and B. Girod, "Adaptive playout scheduling and loss concealment for voice communications over the networks," IEEE Trans. on Multimedia, Apr. 2001.
[14] S. Moon, J. Kurose, and D. Towsley, "Packet audio playout delay adjustment: performance bounds and algorithms," ACM/Springer Multimedia Systems, vol. 6, pp. 17–28, Jan. 1998.
[15] M. Rohaly et al., "Video Quality Experts Group: Current Results and Future Directions," in SPIE Visual Communications and Image Processing, Perth, Australia, June 21–23, 2000, vol. 4067, pp. 742–753.
[16] Setton and B. Girod, "Congestion-Distortion Optimized Scheduling of Video," Multimedia Signal Processing Workshop (MMSP), Siena, Italy, pp. 99–102, Oct. 2004.
[17] Towsley, H. Schulzrinne, R. Ramjee, and J. Kurose, "Adaptive playout mechanisms for packetized audio applications in wide-area networks," in Infocom 1994, Jun. 1994.
[18] M. Vijaykumar and Santhosha Rao, "A cross layer framework for adaptive video streaming over IEEE 802.11 wireless networks," in ICCCT, 2010.
[19] ITU-R BT.500-11, "Methodology for the subjective assessment of the quality of television pictures."
[20] FFMPEG Multimedia System. http://ffmpeg.sourceforge.net/index.php
[21] Network Simulator 2. http://www.isi.edu/nsnam/ns/
CRYPTANALYSIS OF AN EDGE CRYPT ALGORITHM

*S. Anusha, (P.G. Student)/CSE, s_anusha25@yahoo.com
**Mr. B. Bhuvaneswaran, Lecturer/CSE, bhuvangates@yahoo.com
Rajalakshmi Engineering College, Chennai, India.
Abstract — In the modern medical world, communication of medical images over the network occurs frequently, so the security of medical images becomes more and more important. In order to fulfill the security of medical images, many image encryption methods have been proposed. Recently, a lossless encryption method for medical images using edge maps was proposed. It encrypts medical images using the information contained in the edge map of the image. This edge crypt algorithm can fully protect selected objects/regions within the medical images or the entire medical images, and it can also encrypt other types of images, such as grayscale or color images. The algorithm encrypts medical images using a combination of four different methods. A random bit sequence, generated from a logistic chaotic map, is used to encrypt the edge map. In the edge crypt algorithm, the encryption process involves an XOR operation and logistic chaotic maps. It is insecure because design weaknesses are involved in both of the operations mentioned above, so the edge crypt algorithm is not secure from the cryptographic point of view. This paper addresses the protection of images and the problems involved in the edge crypt algorithm. Further, this paper identifies some security flaws in these operational methods and shows how to break them with a ciphertext attack. It also points out that the edge crypt algorithm is not secure against ciphertext attacks, since the edge map alone is enough to obtain the security keys of the algorithm, making it possible to reconstruct the original image using these security keys.

Keywords - edge crypt algorithm, logistic chaotic map encryption, cryptanalysis, ciphertext attack, shuffling process.

I. Introduction

Modern telecommunications technologies allow sending and receiving files, images, and data in a relatively short time. Nowadays, the use of traditional symmetric and asymmetric cryptography is the standard way to secure the information exchange. The content protection of multimedia data (especially digital images and videos) through encryption has attracted more and more attention with the rapid development of multimedia and network technologies in the past decades. To offer reasonable background knowledge on the content of this paper, a very brief introduction to some existing image/video encryption schemes is given in the following.

Dusit Niyato [1] formulates a Constrained Markov Decision Process (CMDP) to carry out the optimization; the objective of this formulation is to minimize the connection cost. Optimization of capacity reservation in the radio access network is not considered, and queue management and transmission scheduling in a patient-attached device are also not considered. Koredianto Usman et al. [2]
proposed a random permutation preceded by a simple pixel arrangement, which achieves high computation speed and a long permutation key inherited from the large image size; pixel arrangement and permutation provide simple and quick processes. For compressed images such as JPEG or GIF format, this scheme is not applicable. Arooshi Kumar et al. [3] build an effective wireless patient management system (WPMS) using open software that can be accessed via a personal digital assistant (PDA) or a cell phone, but more details about the patient must be added. A. G. Fragopoulos et al. [4] intend to fill the security gap; the authors utilize MPEG-21 standard primitives, accomplishing protection of transmitted medical information and enhancement of the patient's privacy. The main problem is that safe destruction of the medical data after viewing is not applied here.

Faxin Yu et al. [5] proposed a method to encrypt an image in two shares with the following properties: the method is suitable for binary, gray level, or color image encryption, and the bit error rates are adjustable, so the secret image can be recovered to different extents, including lossless and nearly lossless. In transmission, however, when one pixel of the share images is lost or an error appears, a wrong pixel is received. Zahia Brahimi et al. [6] proposed an image encryption scheme based on the JPEG2000 codec for medical images. This scheme is based on the precinct organization in JPEG2000 for selecting sensitive data to encrypt, where the corresponding code blocks are also permuted. These schemes are of low cost and support direct bit-rate control, but they are not secure against known-plaintext and/or chosen-plaintext attacks. K. C. Ravishankar et al. [7] proposed, to overcome the security gap, a region-based selective image encryption technique which provides the facilities of selective encryption and selective reconstruction of images. Original information can be accessed by unauthorized users because of the partial encryption.

Meghdad Ashtiyani et al. [8] proposed a scheme for medical image encryption based on a combination of scrambling and confusion. A chaotic cat map is used for scrambling the addresses of the medical image pixels. Large computational cost, time consumption, and design weaknesses are also involved in this technique. Yicong Zhou [9] investigates a new application of the edge map for medical image encryption in the non-compression domain. That paper provides a lossless approach, called edge crypt, to encrypt medical images using the information contained in an edge map. The edge crypt algorithm also contains some security flaws: it has design weaknesses in the logistic chaotic map and the XOR operation. In this cryptanalysis, the edge crypt algorithm is implemented using Java NetBeans and the real security problem is identified.

II. ENCRYPTION SCHEME

The edge crypt algorithm [9] is used to encrypt medical images using the information contained in the edge map. It obtains the edge map of the medical image by applying a specific type of edge detector, such as Sobel, Canny, or any other, with some threshold value. The algorithm first decomposes the medical image into several binary bit planes, encrypts all bit planes by performing an XOR operation between the edge map and each bit plane, encrypts the edge map using a random bit sequence generated from the logistic chaotic map, interleaves the encrypted edge map among the XORed bit planes, reverses the order of all bit planes, and combines them to obtain the final encrypted medical image. Fig. 1 shows the steps of the edge crypt algorithm.
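These steps can be sketched compactly for a single-channel 8-bit image. The following is our own simplified NumPy rendering, not the authors' implementation; the interleaving position is a free parameter of the scheme, and the keystream is assumed to be supplied by the chaotic generator described below:

    import numpy as np

    def edge_crypt(image, edge_map, keystream, interleave_at=4):
        """Simplified edge-crypt encryption for an 8-bit grayscale image.

        image:     2-D uint8 array
        edge_map:  2-D 0/1 uint8 array of the same shape
        keystream: 2-D 0/1 uint8 array of the same shape (from the chaotic map)
        """
        # 1. Decompose the image into 8 binary bit planes (LSB first).
        planes = [((image >> b) & 1).astype(np.uint8) for b in range(8)]
        # 2. XOR every bit plane with the edge map.
        planes = [p ^ edge_map for p in planes]
        # 3. Encrypt the edge map with the chaotic keystream.
        enc_edge = edge_map ^ keystream
        # 4. Interleave the encrypted edge map among the XORed bit planes.
        planes.insert(interleave_at, enc_edge)
        # 5. Reverse the order of all planes before recombination.
        planes.reverse()
        return planes  # 9 planes; packing them into the output format is scheme-specific

Decryption reverses these steps using the same keys, as described later in this section.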
region based selective image encryption
technique which provides the facilities of Bit-plane shuffling process is added to
selective encryption and selective change the values of image pixels in the
reconstruction of images. Original vertical direction to improve the security of
information is accessed by unauthorized this algorithm. The users have flexibility to
users because of partial encryption. utilize any existing approach to shuffle the
order of the bit planes; here, the scheme simply reverses the order of the bit planes. A random bit sequence generated from a logistic chaotic map is used to encrypt the edge map: an XOR operation is performed between each bit of the random bit sequence and each pixel of the edge map to obtain the encrypted edge map. The logistic chaotic map is defined as follows:

    xn+1 = r · xn · (1 − xn)

where the parameter r is a rational number with 3.5699456 < r ≤ 4. If the size of the edge map is M×N, the random bit sequence can be generated from this map by the definition given in [9], where n = 0, 1, 2, …, MN − 1.
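A minimal sketch of such a keystream generator follows. The exact bit-extraction rule of [9] is not reproduced here; thresholding the chaotic state at 0.5 is assumed purely for illustration:

    def logistic_keystream(x0, r, length):
        """Generate `length` pseudo-random bits from the logistic map
        x_{n+1} = r * x_n * (1 - x_n), with 3.5699456 < r <= 4."""
        bits = []
        x = x0
        for _ in range(length):
            x = r * x * (1.0 - x)               # one chaotic iteration
            bits.append(1 if x >= 0.5 else 0)   # illustrative thresholding
        return bits

    # Example: a keystream for an M x N edge map, with the parameters
    # used in the MRI example later in this section.
    stream = logistic_keystream(x0=0.6, r=3.65, length=64)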
Fig. 1: Block diagram of the edge crypt algorithm

The security keys of the edge crypt algorithm include the initial condition x0, the parameter r of the logistic chaotic map, the interleaved location of the edge map, and the type of the edge detector and its threshold value. The users have the flexibility to choose any existing approach for edge detection and to select any threshold value for the edge detector. The edge map can also be interleaved between any two bit planes. In the decryption process, the authorized users do not have to know the type of the edge detector and its threshold value to reconstruct the original image, because the edge map has been sent to the users along with the encrypted image.

The edge map can be completely recovered only by using the correct security keys: the location at which the edge map is interleaved, as well as the initial condition x0 and the parameter r of the logistic chaotic map. The decryption process first decomposes the encrypted image into binary bit planes. It then reverses the order of all bit planes and extracts the edge map from the bit planes. The edge map is reconstructed using the security keys. The algorithm performs an XOR operation between the edge map and each bit plane and combines the XORed bit planes to obtain the reconstructed medical image.

This is an example [9] of MRI image encryption. The encrypted image in Fig. 2(c) is completely different from the original MRI brain image in Fig. 2(a). The histogram in Fig. 2(g) shows the nearly equal distribution of the pixel values in the encrypted image. This makes the encrypted image difficult to break by attacks; the original image can be protected with a high level of security. This is one of the advantages of the presented algorithm. The original image has been completely reconstructed. The reconstruction can be verified from the reconstructed image in Fig. 2(d) and its histogram in Fig. 2(h), because both are exactly the same as for the original image.

The edge map in this example is generated by the Sobel edge detector with threshold 0.5. It is encrypted by the logistic
chaotic map with the initial condition x0 = 0.6 and the parameter r = 3.65.

Fig. 2: MRI image encryption. (a) The original MRI image; (b) the edge map obtained by the Sobel edge detector with threshold 0.5; (c) the encrypted MRI image; (d) the reconstructed MRI image; (e) histogram of the original MRI image; (f) the encrypted edge map, x0 = 0.6, r = 3.65; (g) histogram of the encrypted MRI image; (h) histogram of the reconstructed MRI image.

III. CRYPTANALYSIS

Security is a major concern for the encrypted objects as well as for the encryption algorithm. Security issues of the edge crypt algorithm are discussed in this section. Cryptanalysis addresses the weaknesses of proposed algorithms and their effectiveness against the relevant attacks. Thus, the main task of cryptanalysis is to reconstruct the key, or an equivalent form of it that can successfully reconstruct all or part of the plaintexts. A cryptographic cipher should be secure enough against all kinds of attacks. For most ciphers, the following four attacks under different scenarios should be checked:

• The ciphertext-only attack: attackers can only observe some ciphertexts;
• The known-plaintext attack: attackers can get some plaintexts and the corresponding ciphertexts;
• The chosen-plaintext attack: attackers can choose some plaintexts and get the corresponding ciphertexts;
• The chosen-ciphertext attack: attackers can choose some ciphertexts and get the corresponding plaintexts.

As the world has become more digitalized today, the last two attacks, which used to occur rarely, have become common. Many image/video encryption schemes are not secure enough against ciphertext attacks, and this edge crypt algorithm is among them. This paper studies the security of the image encryption scheme and reports the following findings:

(1) The scheme can be broken by using cipher images;
(2) There exist some weak keys and equivalent keys;
(3) The scheme is not sufficiently sensitive to changes of the plain-images; and
(4) The logistic chaotic map is not random enough to be used for encryption.

Fig. 3: System design

The main drawback of the edge crypt algorithm is that the edge map is sent along with the encrypted image, so it is not secure against unauthorized access. Fig. 3 explains the cryptanalysis of the edge crypt algorithm. The cipher image is XORed with the generated random numbers, which yields the edge map of the image, and the edge map is taken out of the cipher image. The other pixel values, except the edge map, are XORed with the edge map values. Finally the result is reversed and it is checked
whether it gives the proper image pixel values or not. Finally the original image is reconstructed using the random number generation method. Fig. 4 gives the overall structure of the cryptanalysis process.

Fig. 4: Original image reconstruction by unauthorized users

Fig. 5 shows that authorized users use the security key and apply the edge crypt algorithm to decrypt the medical image. If a cryptanalyst or a hacker obtains the edge map and the encrypted medical image, then the weakness of the algorithm can be exploited, leading to image reconstruction.

Fig. 5: Example of the overall process during transmission of medical images

IV. EXPERIMENTAL RESULT

Less Susceptibility to the Change of Plain-Image

It is well known that the ciphertext of a secure encryption scheme should be very sensitive to changes of the plaintext, but the encryption scheme under test fails to comply with this requirement. If a plain-image I has only one pixel difference at position (i, j), the difference will be permuted to a new position (i*, j*) according to the shuffling matrix P*. Because all plain-pixels before (i*, j*) are identical for the plain-image, the cipher image will also be identical up to that point. This shows the low sensitivity of the image encryption scheme to changes in the plain-image.

Key space reduction

x0 and r are also part of the key. However, from an attacker's point of view, the attacker only needs to guess the chaotic states reached after the iterations seeded by x0 and r, and can use those states directly as the initial conditions of the logistic chaotic map. In this way, x0 and r are effectively removed from the keys, which reduces the key space.
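To see why x0 need not be guessed directly, note that the keystream from any point onward is determined by the current chaotic state alone; an attacker who recovers one intermediate state can reproduce all subsequent bits. A small sketch of this observation (illustrative only, using the same thresholding assumption as earlier):

    def keystream_from_state(x, r, length):
        """Reproduce subsequent keystream bits from any intermediate
        chaotic state x, without knowledge of the original seed x0."""
        bits = []
        for _ in range(length):
            x = r * x * (1.0 - x)               # continue the iteration from x
            bits.append(1 if x >= 0.5 else 0)   # illustrative thresholding
        return bits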
Slow Encryption

Chaotic iteration systems involve complicated numerical functions, so the encryption speed is expected to be very slow compared with other encryption techniques. The implementation of edge crypt has shown clearly that the usage of continuous chaotic systems can drastically reduce the encryption speed. Since there are no other obvious merits in using continuous chaotic systems rather than a simple discrete-time chaotic map, the usage of the logistic chaotic system in the image encryption scheme is unnecessary; these continuous chaotic systems can be replaced by a simpler discrete-time chaotic map without compromising the security.

Ciphertext Attack

When a variation of a stream cipher is created, obtaining the key stream is totally equivalent to obtaining the key. This section presents a ciphertext attack which allows recovering both the key stream and the shuffling matrix. Let us consider a cipher image I such that I'(i) = I'(j) = a for all i, j = 1, …, MN. In this case, the shuffling part does not work, so we have I* = I. Then, we can recover the key stream by generating the random numbers that equal the encrypted pixel values; at this point the edge map pixel values are exposed.

After removing the edge map part, the other cipher pixel values are XORed with the edge map pixel values. According to the general cryptanalysis of permutation-only ciphers, only ⌈log256(MN)⌉ chosen cipher-images are needed to recover the shuffling matrix P*. In total, ⌈log256(MN)⌉ + 1 cipher-images are needed to perform this ciphertext attack.
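The core of the attack can be summarized in a few lines of schematic Python (our own illustration, not a working exploit; `decrypt` stands for an assumed decryption oracle available to the attacker, and all names are ours):

    import numpy as np

    def recover_keystream(decrypt, shape, value=0):
        """Chosen-ciphertext step: submit a constant cipher image, so the
        shuffling step acts as the identity, and read the keystream off
        the oracle's output by XOR with the known constant."""
        cipher = np.full(shape, value, dtype=np.uint8)  # I'(i) = I'(j) = a for all i, j
        plain = decrypt(cipher)                         # oracle call
        return plain ^ value                            # keystream = plain XOR constant

Recovering the shuffling matrix then requires the additional ⌈log256(MN)⌉ chosen cipher-images noted above.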
Sample Outcome

Fig. 6: Edge detection

Fig. 6 shows that the sharp edges contained in the image are detected. This edge detection is important only for the edge map encryption. Here, a Canny edge detector algorithm is applied to detect the edges.

Fig. 7 displays the edge image size: the edge map may be larger than the original image, depending on the edges of the original image, and sometimes it may be smaller than the original image size. Fig. 8 shows that the edge map is encrypted using the random numbers generated by the chaotic map. The image is also encrypted through decomposition into binary bit planes, each bit plane being encrypted with the edge map pixel values. This result also shows the final XOR values.
Fig. 7: Edge-detected image and edge map values

Fig. 8: Edge map encryption and chaotic map encryption with XOR values

Fig. 9: Final encrypted image

V. CONCLUSION

The security of a recently proposed image encryption scheme based on an edge crypt algorithm has been studied. It is found that the scheme can be broken with only cipher-images through a real implementation of the edge crypt algorithm. In addition, it is found that the scheme has some weak keys and equivalent keys, and that it is not sufficiently sensitive to changes of the plain-images. Furthermore, the pseudo-random number sequence generated by iterating the logistic chaotic function is found to be insufficiently random for secure encryption. Thus, it has been identified that security flaws are involved in the edge crypt algorithm; both a ciphertext attack and design weaknesses apply to it. Therefore, it is not recommended for applications requiring a high level of security.

REFERENCES

[1] Dusit Niyato, Ekram Hossain, and Sergio Camorlinga, "Remote Patient Monitoring Service using Heterogeneous Wireless Access Networks: Architecture and Optimization," IEEE Journal on Selected Areas in Communications, vol. 27, no. 4, May 2009.
[2] Koredianto Usman, Hiroshi Juzoji, Isao Nakajima, Soegijardjo Soegidjoko, Mohamad Ramdhani, Toshihiro Hori, and Seiji Igi, "Medical Image Encryption Based on Pixel Arrangement and Random Permutation for Transmission Security," 2007.
[3] Arooshi Kumar, Rajita Kumar, and Sanjuli Agarwal, "Wireless Information System for Patient Health Care Management," duPont Manual High School and Computer Science Department, Indiana University Southeast, Indiana, 2007.
[4] A. G. Fragopoulos, J. Gialelis, and D. Serpanos, "Imposing Holistic Privacy and Data Security on Person Centric eHealth Monitoring Infrastructures," 2010.
[5] Faxin Yu, Hao Luo, Jeng-Shyang Pan, and Zhe-Ming Lu, "Lossless and Lossy Image Secret Sharing for Data Transmission," Eighth International Conference on Intelligent Systems Design and Applications, 2008.
[6] Zahia Brahimi, Hamid Bessalah, A. Tarabet, and M. K. Kholladi,
"A new selective encryption technique of JPEG2000 code stream for medical images transmission," 2008 5th International Multi-Conference on Systems, Signals and Devices, 2008.
[7] K. C. Ravishankar and M. G. Venkateshmurthy, "Region Based Selective Image Encryption," Department of Computer Science and Engineering, Malnad College of Engineering, Hassan, Karnataka, India, 2006.
[8] Meghdad Ashtiyani, Parmida Moradi Birgani, and Hesam M. Hosseini, "Chaos-Based Medical Image Encryption Using Symmetric Cryptography," Information and Communication Technologies: From Theory and Application, ICTA 2008, 3rd International Conference, 7–11 April 2008, pp. 1–5.
[9] Yicong Zhou, Karen Panetta, and Sos Agaian, "A Lossless Encryption Method for Medical Images Using Edge Maps," IEEE 31st Annual International Conference of the IEEE EMBS, Minneapolis, Minnesota, USA, September 2–6, 2009.
A CONCURRENCY CONTROL PROTOCOL USING ZR+-TREES FOR SPATIAL JOIN AND KNN QUERIES

U. Gayathri, (P.G. Student)/CSE,
Rajalakshmi Engineering College,
gaayathriu@gmail.com,
Contact: 9444931673
Abstract—Developments in database technology enable changes in the practice of GIS (Geographic Information Systems), where spatial data is widely used. Multidimensional indices like R-trees and their variants can directly handle spatial data and have various concurrency control mechanisms. In the existing system, a dynamic granular approach to phantom protection in R-trees and their variants is GLIP, the Granular Locking Indexing Protocol. The overlapping of leaf nodes in the R+-tree is overcome using ZR+-trees. In the proposed system, the aim is to implement GLIP using the ZR+-tree and to extend it to spatial join and KNN queries.

Keywords – spatial data, concurrency control.

1 INTRODUCTION

Multidimensional database systems have gathered tremendous market momentum as the platform for building new decision-support applications. A multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". The data still remains interrelated. Even when data is manipulated, it is still easy to access, as well as being a compact type of database.

Spatial databases contain multidimensional data with explicit knowledge about objects, their extent, and their position in space. The objects are usually represented in some vector-based format, and their relative position may be explicit or implicit (i.e., derivable from the internal representation of their absolute positions). Database systems use indexes to quickly look up values, and the way that most databases index data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up database operations.

Many indexing structures (e.g., the R-tree family, Generalized Search Trees (GiSTs), grid files, and z-ordering) have been proposed to support fast access to multidimensional data in relational databases. The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles. R+-trees are a compromise between R-trees and kd-trees; they avoid overlapping of internal nodes by inserting an object into multiple leaves if necessary. ZR+-trees resolve the limitations of the original R+-tree by eliminating the overlaps of leaf nodes.

The spatial join operation is used to combine two or more datasets with respect to a spatial predicate. When data is organized in an R-tree, the k nearest neighbours of all points can efficiently be computed using a spatial join.

Much related work has been proposed in which the spatial join operation is performed and the k nearest neighbours of all points are computed. In [1], location-aware environments are examined. They are characterized by a large
number of objects and a large number of continuous queries, and both the objects and the continuous queries may change their locations over time. The Shared Execution Algorithm (SEA-CNN, for short) was introduced to efficiently maintain the answer results of CKNN queries. The problem of evaluating the multiple queries that arise is solved by a spatial join between the query table and the object table. Reference [2] proposes an energy-efficient spatial join algorithm for multiple sensor networks employing a spatial semijoin strategy. For optimization of the algorithm, a GR-tree index and a grid-ID-based spatial approximation method, which are unique to sensor networks, are proposed. The GR-tree is a distributed spatial index over the sensor nodes, which efficiently prunes away the nodes that will not participate in a spatial join result. The grid-ID-based approximation provides a great reduction in communication cost by approximating many spatial objects in simpler forms. Experiments demonstrate that the algorithm outperforms existing methods in reducing energy consumption at the nodes. The 2-SN spatial join algorithm based on a spatial semijoin strategy is presented in that paper; for its optimization, a GR-tree index and a grid-ID-based spatial approximation method, which greatly reduce the communication cost, are additionally proposed. Reference [3] describes how spatial join can be effectively implemented and accelerated with MapReduce on clusters. MapReduce is a widely used parallel programming model and computing platform. With MapReduce, it is very easy to develop scalable parallel programs to process data-intensive applications on clusters of commodity machines. However, it does not directly support the processing of heterogeneous related datasets, which is common in operations like spatial joins. The algorithm designed is named SJMR (Spatial Join with MapReduce). SJMR is the first parallel spatial join algorithm optimized for MapReduce, allowing spatial join to be processed using the MapReduce platform on clusters of commodity machines. The SJMR algorithm splits the inputs into disjoint partitions with a spatial partitioning function at the Map stage, and merges every partition with a strip-based plane-sweeping algorithm and a tile-based duplicate avoidance method at the Reduce stage. The strategies of SJMR can also be used in other parallel environments, especially where neither of the inputs has a spatial index.

Reference [4] proposes two efficient and scalable algorithms using grid indices. Many location-based applications require constant monitoring of k-nearest neighbor (k-NN) queries over moving objects within a geographic area. The authors present an analysis of two proposed approaches, namely query indexing and object indexing. Existing approaches to this problem have focused on predictive queries and relied on the assumption that the trajectories of the objects are fully predictable at query processing time. One approach is based on indexing objects, and the other on queries. For each approach, a cost model is developed, and a detailed analysis along with the respective applicability is presented. The object-indexing approach is further extended to multiple levels to handle skewed data. The results of the analysis are validated, extensions of the basic methods to handle non-uniform data efficiently are presented, and a variety of experiments exploring the benefits of the approach in a variety of parameter settings were conducted. Reference [5] investigates the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment, which can be several orders of magnitude larger than that of the point data. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. An efficient solution based on multiresolution terrain models is proposed. This approach eliminates the need for the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models. This is the first in-depth study of efficient sk-NN query processing in spatial databases. The proposed algorithm MR3 focuses on the underlying terrain data management and can avoid extremely expensive surface distance computation by ranking objects based on estimated surface distance ranges. Two novel multiresolution data structures, DMTM and MSDN, have been used to
remodel the terrain data to significantly reduce the CPU and I/O costs by accessing and processing surface data in a just-enough manner. Experiments using large-scale, real terrain data have shown that MR3 outperforms the benchmark algorithm in all cases by nearly one order of magnitude. In reference [6], a maintenance-free, itinerary-based approach called Density-aware Itinerary KNN query processing (DIKNN) is proposed. Current approaches to K Nearest Neighbor (KNN) search in mobile sensor networks require a certain kind of indexing support. This index could be either a centralized spatial index or an in-network data structure that is distributed over the sensor nodes. Creation and maintenance of these index structures, to reflect the network dynamics due to sensor node mobility, may result in long query response times and low battery efficiency, thus limiting their practical use. DIKNN divides the search area into multiple cone-shaped areas centered at the query point. It then performs a query dissemination and response collection itinerary in each of the cone-shaped areas in parallel. The design of the DIKNN scheme also takes into account challenging issues such as the dynamic adjustment of the search radius (in terms of number of hops) according to spatial irregularity or mobility of sensor nodes. DIKNN is a cost-effective solution for handling KNN queries in mobile sensor networks; it integrates query propagation with data collection along a well-designed itinerary traversal, which requires no infrastructure and is able to sustain rapid changes of the network topology. A simple and effective KNNB algorithm has been proposed to estimate the KNN boundary under the trade-off between query accuracy and energy efficiency. Dynamic adjustment of the KNN boundary has also been addressed to cope with spatial irregularity and mobility of sensor nodes. From extensive simulation results, DIKNN exhibits superior performance in terms of energy efficiency, query latency, and accuracy in various network conditions. Reference [7] proposes an index structure on land surfaces that enables exact and fast responses to skNN queries. Two complementary indexing schemes, namely the Tight Surface Index (TSI) and the Loose Surface Index (LSI), are constructed and stored collectively in a single novel data structure called the Surface Index R-tree (SIR-tree). With those indexes, an skNN query can be efficiently processed by localizing the search and minimizing the invocation of the costly surface distance computation, hence incurring low I/O and computation costs. The authors have introduced an efficient skNN processing method that provides: 1) exact answers to the queries, 2) the actual shortest surface paths, and 3) incremental results. The approach is compared in accuracy with the range ranking method and in response time with the Chen-Han algorithm. While its results are 100% accurate (vs. lower than 50% accuracy for the most accurate variation when k > 5), its response time is 4 to 5 times better than an efficient variation for most cases. The authors of reference [8] study the problem of processing rank-based KNN queries against uncertain data. Besides applying the expected rank semantic to compute KNN, the median rank, which is less sensitive to outliers, is also introduced. Both ranking methods satisfy nice top-k properties such as exact-k, containment, unique ranking, value invariance, stability, and fairfulness. For a given query q, IO- and CPU-efficient algorithms are proposed in the paper to compute KNN based on the expected (median) ranks of the uncertain objects. To tackle the correlations of the uncertain objects and the high IO cost caused by the large number of instances of the uncertain objects, randomized algorithms are proposed to approximately compute KNN with theoretical guarantees. Rank-based KNN queries on uncertain data are thus studied, where expected (median) ranks satisfying important top-k properties are adopted as the ranking criteria. Exact and randomized algorithms integrating efficient object pruning and IO accessing techniques are developed to process queries modeled by either query points or uncertain regions.

II. BACKGROUND

A. Overview of the R-tree Family

1. R-tree

R-trees are tree data structures that are similar to B-trees, but are used for spatial access methods, i.e., for indexing multidimensional information; for example, the (X, Y) coordinates of geographical data. A common real-world usage for an R-tree might be: "Find all museums within 2 kilometers (1.2 mi) of my current location". The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles (MBRs, otherwise known as bounding boxes; "rectangle" is what the "R" in R-tree
stands for). Each node of an R-tree has a variable number of entries (up to some pre-defined maximum). Each entry within a non-leaf node stores two pieces of data: a way of identifying a child node, and the bounding box of all entries within this child node.

Fig.1. Spatial Ordering

Fig.2. R-tree

2. R+-Tree

An R+ tree is a method for looking up data using a location, often (x, y) coordinates, and often for locations on the surface of the earth. Searching on one number is a solved problem; searching on two or more, and asking for locations that are nearby in both x and y directions, requires craftier algorithms. Fundamentally, an R+ tree is a tree data structure, a variant of the R tree, used for indexing spatial information. R+ trees are a compromise between R-trees and kd-trees: they avoid overlapping of internal nodes by inserting an object into multiple leaves if necessary.

R+ trees differ from R trees in that:

• Nodes are not guaranteed to be at least half filled;
• The entries of any internal node do not overlap;
• An object ID may be stored in more than one leaf node.

Advantages over R-trees

• Because nodes do not overlap each other, point query performance benefits, since all spatial regions are covered by at most one node.
• A single path is followed and fewer nodes are visited than with the R-tree.

Disadvantages

• Since rectangles are duplicated, an R+ tree can be larger than an R tree built on the same data set.
• Construction and maintenance of R+ trees is more complex than the construction and maintenance of R trees and other variants of the R tree.

Fig.3. R+-tree

3. ZR+-Tree

ZR+-trees resolve the limitations of the original R+-tree by eliminating the overlaps of leaf nodes. The essential idea behind the ZR+-tree is to logically clip the data objects to fit them into the exclusive leaf nodes.
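The clipping idea can be illustrated with plain rectangle arithmetic. The following is our own sketch, not the paper's algorithm; an object's MBR is clipped against each of the non-overlapping leaf regions it touches, and each fragment keeps its own MBR, as the ZR+-tree requires:

    def clip(obj_mbr, leaf_mbr):
        """Intersection of two axis-aligned rectangles (xmin, ymin, xmax, ymax),
        or None if they do not overlap."""
        xmin, ymin = max(obj_mbr[0], leaf_mbr[0]), max(obj_mbr[1], leaf_mbr[1])
        xmax, ymax = min(obj_mbr[2], leaf_mbr[2]), min(obj_mbr[3], leaf_mbr[3])
        return (xmin, ymin, xmax, ymax) if xmin <= xmax and ymin <= ymax else None

    def clip_into_leaves(obj_mbr, leaf_mbrs):
        """Segment an object across the exclusive leaf regions it intersects."""
        fragments = (clip(obj_mbr, leaf) for leaf in leaf_mbrs)
        return [frag for frag in fragments if frag is not None]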
There are two fundamental differences between the clipping techniques applied in the ZR+-tree and the R+-tree:

• From the definition of the ZR+-tree, object clipping in the ZR+-tree must differentiate the MBRs of the segmented objects in leaf nodes, while the clipping in the R+-tree retains the original MBRs.
• In the ZR+-tree, each entry in a leaf node is a list of segmented objects that share the same MBR, while each leaf node entry in the R+-tree contains exactly one object.

Advantages over R+-trees

• It eliminates overlaps even among entries in different leaf nodes.
• The multidimensional access method, the ZR+-tree, utilizes object clipping, optimized insertion, and reinsert approaches to refine the indexing structure and remove limitations in constructing and updating R+-trees.

Disadvantages

• The number of entries in the ZR+-tree may be larger than the number of actual objects due to fragmentation.
• The insert and delete operations consume extra CPU cycles and I/O operations, and hence are slower than in R-trees and R+-trees.

Fig.4. ZR+-tree

B. Concurrency Control

Concurrent access to data through a multidimensional index structure introduces two independent concurrency control problems. First, techniques must be developed to prevent concurrent insertions, deletions, and updates from violating the consistency of the data structure. Usage of the standard two-phase locking protocol for this purpose results in the data structure becoming a bottleneck and thus in poor performance. Second, techniques must be developed to protect the ranges specified in a retrieval from subsequent insertions and deletions before the retrieval commits. Such insertions and deletions are referred to as phantoms. The Granular Locking for Clipping Indexing protocol (GLIP) provides phantom update protection for the R+-tree and its variants. The concurrency control protocol GLIP provides serializable isolation, consistency, and deadlock-free operations for indexing trees with object clipping.

C. Spatial Join

The spatial join operation is used to combine two or more data sets with respect to a spatial predicate. A typical example of a spatial join query is "Find all pairs of rivers and cities that intersect". The predicate can be a combination of directional, distance, and topological spatial relations. In the case of a non-spatial join, the joining attributes must be of the same type, but for a spatial join they can be of different types. Usually each spatial attribute is represented by its minimum bounding rectangle (MBR).

Fig.5. Spatial join

D. KNN Queries
When data is organized in an R-tree, the k nearest neighbors of all points can efficiently be computed using a spatial join. The class of k Nearest Neighbor (kNN) queries is frequently used in geospatial applications. New geospatial applications heavily operate on a third dimension, i.e., the land surface. Spatial queries for extracting data from wireless sensor networks are important for many applications, such as environmental monitoring and military surveillance. A K Nearest Neighbor (KNN) query facilitates sampling of monitored sensor data in correspondence with a given query location.

III. SPATIAL JOIN ALGORITHM

In a tree structure, every node of the tree corresponds to a region of the data space. An internal node's region covers the regions of its sub-nodes, and each node might or might not overlap other nodes, depending on the index type. Each node is typically stored on one page of external memory. A spatial join can be performed efficiently with a synchronized traversal of the indices.

Step 1. Starting with the two root nodes of the indices, rootA and rootB, the algorithm finds intersections between the sub-nodes of rootA and rootB using the FIND INTERSECTING PAIRS function.

Step 2. The intersecting sub-node pairs are added to the priority queue, and these pairs are checked for intersecting sub-nodes in later iterations.

Step 3. If the two nodes are leaves, then the leaves are compared to report any intersecting objects.

FIND INTERSECTING PAIRS function: When joining two regions A and B, which could represent two index nodes, two data pages, or two partitions, if both regions cover the same space and fit in internal memory, then every object in region A needs to be joined with every other object in region B.

If the regions do not cover the same space, as is likely when joining index nodes, then the search space can be reduced: only objects within the intersecting region of the two nodes need to be compared, as shown in Figure 6.

Fig.6. When joining two data pages, only the objects within the intersecting region of the pages (colored blue) need to be considered.
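A minimal sketch of this synchronized traversal over MBRs is given below (our own illustration of the steps above, not the paper's implementation; nodes are assumed to expose `children`, `is_leaf`, and an axis-aligned rectangle `mbr = (xmin, ymin, xmax, ymax)`, and the two trees are assumed to have equal height so that leaves pair with leaves):

    from collections import deque

    def rects_intersect(a, b):
        """Axis-aligned MBR intersection test."""
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def spatial_join(root_a, root_b):
        """Report intersecting leaf-entry pairs by synchronized traversal."""
        queue = deque([(root_a, root_b)])        # Step 1: start at the two roots
        results = []
        while queue:
            na, nb = queue.popleft()
            if na.is_leaf and nb.is_leaf:
                # Step 3: leaves are compared to report intersecting objects.
                results.extend((ea, eb)
                               for ea in na.children for eb in nb.children
                               if rects_intersect(ea.mbr, eb.mbr))
            else:
                # FIND INTERSECTING PAIRS / Step 2: enqueue overlapping sub-node pairs.
                queue.extend((ca, cb)
                             for ca in na.children for cb in nb.children
                             if rects_intersect(ca.mbr, cb.mbr))
        return results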
IV. KNN ALGORITHM

K-nearest-neighbours is a part of supervised learning that has been used in many applications in the field of data mining, statistical pattern analysis, and many others.

K-nearest-neighbours (KNN) measures the distance between a query scenario and a set of scenarios in the data set. The distance between two scenarios can be calculated using some distance function, such as absolute distance measuring or Euclidean distance measuring. The method proceeds in the four steps below; a small illustrative sketch follows Step 4.

Step 1. Determine the parameter k = the number of nearest neighbours.

Step 2. Calculate the distance between the query instance and all of the training samples using any of the distance algorithms.

Step 3. Sort the distances of all the training samples and determine the nearest neighbours based on the k-th minimum distance.
Step 4. Use the majority of the nearest neighbours as the prediction value.
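These four steps translate directly into a small classifier (an illustrative sketch of our own, using Euclidean distance):

    import math
    from collections import Counter

    def knn_predict(query, samples, k=3):
        """samples: list of (feature_vector, label) training pairs."""
        # Steps 1-2: distance from the query to every training sample.
        dists = [(math.dist(query, vec), label) for vec, label in samples]
        # Step 3: sort and keep the k nearest neighbours.
        neighbours = sorted(dists, key=lambda d: d[0])[:k]
        # Step 4: majority vote among the k labels.
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]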
V. CONCLUSION

The objects are arranged in the ZR+-tree format, in which the search operation can be performed more efficiently than in R+-trees. The ZR+-tree segments the objects to ensure that every fragment is fully covered by a leaf node. This clipping-object design provides a better indexing structure. Furthermore, several structural limitations of the R+-tree are overcome in the ZR+-tree by the use of non-overlapping clipping and a clustering-based reinsert procedure. The spatial join is performed to identify the intersected regions without duplicates, and the kNN queries help in identifying the nearest neighbours.

SECURE ENERGY EFFICIENT DATA AGGREGATION PROTOCOL FOR DATA REPORTING IN WIRELESS SENSOR NETWORKS

R. Lekshmi Priya
M.E Final Year, Dept of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai.
Email id: yaraj09@gmail.com
Contact No: 9994498924
Abstract: In wireless sensor networks, adversaries can inject false data reports via compromised nodes and launch DoS attacks against legitimate reports. Recently, a number of filtering schemes against false reports have been proposed. However, they either lack strong filtering capacity or cannot support highly dynamic sensor networks very well. In this paper, a dynamic en-route filtering scheme is proposed along with the Energy-Efficient Secure Pattern based Data Aggregation (ESPDA) protocol, which addresses both false report injection and DoS attacks and reduces the energy consumed by redundant data transmission to the cluster head. A Hill Climbing key dissemination approach is designed that ensures that the nodes closer to data sources have stronger filtering capacity. The ESPDA protocol used here reduces the redundant data transmission from sensor nodes to the cluster head.

Keywords: Data reporting, en-route filtering scheme, wireless sensor networks, ESPDA, Hill Climbing approach.

I. Introduction

Wireless sensor networks consist of a large number of small sensor nodes having limited computation capacity, restricted memory space, limited power resources, and short-range radio communication devices. In military applications, sensor nodes may be deployed in hostile environments such as battlefields to monitor the activities of enemy forces. In these scenarios, sensor networks may suffer different types of malicious attacks.

One type is called false report injection attacks [2], in which adversaries inject into sensor networks false data reports containing nonexistent events or faked readings from compromised nodes. The adversaries may also launch DoS attacks against legitimate reports. In selective forwarding attacks [5], they may selectively drop legitimate reports, while in report disruption attacks [4], they can intentionally contaminate the authentication information of legitimate reports to make them filtered out by other nodes.

Several schemes have been proposed to address false report injection attacks and DoS attacks. However, they all have some limitations. The statistical en-route filtering (SEF) scheme [1] is independent of network topology, but it has limited filtering capacity and cannot prevent impersonating attacks on legitimate nodes.

The interleaved hop-by-hop authentication (IHA) scheme [2] has a drawback: it must periodically establish multihop pairwise keys between nodes. Moreover, it requires a fixed path between the base station and each cluster-head to transmit messages in both directions, which cannot be guaranteed due to the dynamic topology of sensor networks or due to the use of some underlying routing protocol such as GPSR [8] or GEAR [9].
The commutative cipher based en-route filtering (CCEF) scheme [3] relies on fixed paths, as IHA does. Second, it needs expensive public-key operations to implement commutative ciphers. Third, it can only filter the false reports generated by a malicious node without the session key, not those generated by a compromised cluster-head or other sensing nodes.

In the location-based resilient security (LBRS) solution [4], the adversaries can intentionally attach invalid MACs to legitimate reports to make them dropped by other nodes. In addition, LBRS suffers a severe drawback: it assumes that all the nodes can determine their locations and generate location-based keys within a short secure time slot.

The location-aware end-to-end data security (LEDS) [5] scheme assumes that sensor nodes can generate the location-based keys bound to cells within a secure short time slot, like LBRS. However, this cannot prevent the adversaries from sending false reports carrying fewer than the required number of valid shares. In addition, LEDS addresses selective forwarding attacks by letting the whole cell of nodes forward reports, which incurs high communication overhead.

The dynamic en-route filtering scheme [6] makes use of a hill climbing approach for the early detection of false reports. However, no scheme is used there for the reduction of redundant data transmission from the sensor nodes to the cluster head; the period for re-disseminating the authentication keys is not taken into account; the metrics for choosing the forwarding nodes are not considered; and the extra control messages increase operation complexity and also incur extra overhead.

The Energy-Efficient Secure Pattern based Data Aggregation (ESPDA) [7] protocol prevents the redundant data transmission from sensor nodes to cluster-heads. However, ESPDA does not give any mechanism to reduce the number of hops travelled by the false data reports.

In this paper, a dynamic en-route filtering scheme is proposed along with the ESPDA protocol to address both false report injection attacks and DoS attacks in wireless sensor networks. In the proposed scheme, sensor nodes are organized into clusters. Each legitimate report should be validated by multiple message authentication codes (MACs). Before sending reports, nodes disseminate their keys to forwarding nodes using the Hill Climbing approach. Then, they send reports in rounds. Each node can monitor its neighbours by overhearing their broadcasts, which prevents the compromised nodes from changing the reports. The ESPDA protocol used in this paper prevents the redundant data transmission from sensor nodes to cluster-heads. ESPDA is energy and bandwidth efficient because cluster-heads prevent the transmission of redundant data from sensor nodes. This scheme can deal efficiently with the topology changes of the sensor networks.

II. PROPOSED SCHEME

A. System model

The communication region of a wireless sensor node is modelled as a circular area of radius r, which is called the transmission range. Only the bidirectional links between neighbour nodes are considered. Based on these assumptions, two nodes must be neighbours of each other and can always communicate with each other if the distance between them is no more than r. The nodes detecting an event are called sensing nodes. They generate and broadcast the sensing reports to the cluster-head. The cluster-head is responsible for aggregating these sensing reports into the aggregated reports and forwarding these aggregated reports to the base station through some forwarding nodes.
Fig.1. Sensor nodes are organized into clusters. The big dashed circles outline the regions of clusters. CH and BS denote Cluster-Head and Base Station respectively. u1~u5 are forwarding nodes, and v1~v8 are sensing nodes (they can also serve as forwarding nodes for other clusters). The black dots represent the compromised nodes, which are located either within the clusters or en-route.

Fig. 1 illustrates the organization of sensing nodes in wireless sensor networks. The topologies of the wireless sensor networks are assumed to change frequently.

B. Goals

Compared to existing ones, the proposed scheme is expected to achieve the following goals:

1) It can offer stronger filtering capacity and drop false reports earlier with an acceptable memory requirement.
2) It can address or mitigate the impact of DoS attacks such as report disruption attacks and selective forwarding attacks.
3) It can accommodate highly dynamic sensor networks.
4) It should not rely on any fixed paths between the base station and cluster-heads to transmit messages.
5) It should prevent the uncompromised nodes from being impersonated. Therefore, when the compromised nodes are detected, the infected clusters can be easily quarantined by the base station.
6) It prevents the redundant transmission of data reports to the cluster head by the sensor nodes, thereby reducing the energy consumption of the nodes.

C. Overview of the scheme

Specifically, the proposed scheme can be divided into four phases: the redundant data reduction, key predistribution, key dissemination, and report forwarding phases. In the redundant data reduction phase, the redundant data sent from the sensor nodes to the cluster heads are reduced by using the ESPDA protocol. In the key predistribution phase, each node is preloaded with a distinct seed key from which it can generate a hash chain of its auth-keys. In the key dissemination phase, the cluster-head disseminates each node's first auth-key to the forwarding nodes, which will then be able to filter false reports. In the report forwarding phase, each forwarding node verifies the reports using the disclosed auth-keys and the disseminated ones. If the reports are valid, the forwarding node discloses the auth-keys to its next-hop node after overhearing that node's broadcast. Otherwise, it informs the next-hop node to drop the invalid reports. This process is repeated by every forwarding node until the reports are dropped or delivered to the base station.

Fig. 2 demonstrates the relationship between the four phases of the proposed scheme.

III. ALGORITHM DESCRIPTION

In this section, the procedure of each phase is given in detail.

1) Redundant data reduction phase:

The ESPDA protocol is used to reduce the redundant data transmission from the sensor nodes to the cluster head. It makes use of the sleep-active mode coordination protocol to eliminate the redundant data transmission. Each sensor node is set to either idle or active mode for its sensing operation, based on the connectivity and the conditions of the sensing environment.
One technique is to focus on reducing the redundant data to be transmitted from sensor nodes to cluster-heads. In ESPDA, nodes that have overlapping sensing ranges are identified, and the sensing units of some of those nodes are turned off for a bounded amount of time; this reduces energy wastage, since these nodes would otherwise produce redundant data due to the overlapping.

Fig. 2. The relationship between the four phases of the proposed scheme. Redundant data reduction is done within the clusters. Key predistribution is performed only once. Key dissemination is executed by clusters periodically. Report forwarding happens at each forwarding node in every round.

Whenever an event occurs in its sensing range, a sensor node will send the corresponding pattern code to its cluster-head along with its neighbours. In the sleep protocol, a sensor node cooperates with its neighbours to identify the overlapping coverage regions. Neighbouring nodes can communicate with each other via the cluster-head. The term sleep protocol is used to refer to turning off the sensing unit of the nodes rather than turning off the radio. When the timer expires, each node performs the following algorithm.

ALGORITHM:
Begin
1. if (events in buffer observed by neighbours which have not broadcast their decision)
2. Z' = min(2 * Z, Zmax)
3. Broadcast sleep decision to neighbours
4. Turn off sensing unit for duration Z
5. else
6. Z' = 0.5 * T
7. Stay awake for next slot
8. endif
9. Flush event buffer
End

The algorithms to generate the pattern codes and their comparisons are as follows:

ALGORITHM: Pattern Generation (PG)
Input: Sensor reading D; the data parameters being sensed.
Output: Pattern-code (PC)
Begin
1. Variable PC = []; // Initialize the pattern code
2. for each data parameter
3. Declare n; // Number of data intervals for this data type
4. Declare interval[n], criticalvalue[n]; // Lookup tables
5. Extract the data from D for the corresponding data parameter.
6. Round off the data to the precision required by the corresponding data parameter.
7. for i = 1 to n
8. interval[i] = threshold[i-1] - threshold[i];
9. endfor
10. if (new seed sent by cluster-head) then
11. // Refresh the mapping of critical values to data intervals
12. for i = 1 to n
13. criticalvalue[i] = S(criticalvalue[i], seed);
14. endfor
15. endif
16. Find the respective critical value for each current datum sensed, using the interval and criticalvalue lookup tables.
17. PC = PC + [critical value]; // Concatenate critical value to pattern code
18. endfor
19. PC = PC + [Timestamp] + [Sensor Id]; // Append timestamp and sensor id
End
2) Key Predistribution phase:

Key predistribution needs to be performed only once. It consists of two steps.

Step1: Each node is preloaded with a distinct seed key. From the seed key, it can generate a sequence of auth-keys using a common hash function h. Thus, each node's auth-keys form a hash chain. Let m denote the length of the hash chain. Given node vi as well as its seed key kvi^m, its auth-keys can be calculated as follows:

kvi^j = h^(m-j)(kvi^m), for j = 0, 1, ..., m-1

where vi is the node's index, and h^2(kvi^m) means hashing kvi^m twice.

Step2: Besides the seed key, each node is also equipped with l+1 secret keys, where l keys (called y-keys) are randomly picked from a global key pool (called the y-key pool) of size v, and the remaining one (called the z-key) is randomly chosen from another global key pool (the z-key pool) of size w.

3) Key Dissemination phase:

The cluster-head discloses the sensing nodes' auth-keys after sending the reports of each round. However, this is vulnerable to an attack in which a malicious node pretends to be a cluster-head and injects arbitrary reports followed by falsified auth-keys. To prevent this attack, key dissemination is used: the cluster-head should disseminate the first auth-keys of all nodes to the forwarding nodes before sending the reports in the first round. By using the disseminated keys, the forwarding nodes can verify the authenticity of the disclosed auth-keys, which are in turn used to check the validity and integrity of the reports. Key dissemination should be performed periodically.

The detailed procedure of the key dissemination phase is as follows:

Step1: Each node constructs an Auth message, which contains l+1 copies of its current auth-key, each encrypted using a different one of its secret keys.

Step2: The cluster-head collects the Auth messages from all nodes and aggregates them into a message K(n).

Step3: The cluster-head chooses q (q>1) forwarding nodes from its neighbours and forwards them the message K(n).

Step4: When a forwarding node receives K(n), it performs the following operations:
1) It verifies K(n) to see if K(n) contains at least t distinct indexes of z-keys.
2) It checks the indexes of the secret keys in K(n) to see if it has any shared key. When a shared secret key is found, it decrypts the corresponding auth-key using that key and stores the auth-key in its memory.
3) K(n) does not need to be disseminated to the base station. hmax is defined as the maximum number of hops over which K(n) should be disseminated. Each forwarding node discards a K(n) that has already been disseminated hmax hops. Otherwise, it forwards K(n) to its q downstream neighbour nodes.

Each node receiving K(n) repeats these operations until K(n) gets to the base station or has been disseminated hmax hops.

Hill climbing: Two important observations are introduced. First, when multiple clusters disseminate keys at the same time, some forwarding nodes need to store the auth-keys of different clusters. The nodes closer to the base station need to store more auth-keys than the others (typically those closer to clusters) do, because they are usually the hot spots and have to serve more clusters.

For example, in Fig. 1, u3 serves two clusters and u1 serves only one, so u3 has to store more auth-keys. Second, the false reports are mainly filtered by the nodes closer to clusters, while most nodes closer to the base station have no chance to use the auth-keys they store for filtering. If the nodes closer to clusters hold more auth-keys, the false reports can be dropped earlier.

Hill Climbing involves two variations, one for the key predistribution phase and the other for the key dissemination phase. The first variation is: in Step2 of the key predistribution phase, instead of picking y-keys from a global key pool, each node selects each of its y-keys randomly from an independent hash chain. Specifically, the original y-key pool is partitioned into l equal-sized hash chains, each containing v/l keys that are generated from a distinct seed key. The second variation is: in Step4 of the key dissemination phase, after a forwarding node decrypts an auth-key from K(n), it updates K(n) by encrypting the auth-key using its own y-key and then forwards the updated K(n) to its downstream neighbour nodes. By enforcing this substitution at every forwarding node, it becomes harder and harder for the nodes closer to the base station to decrypt the auth-keys from K(n). Consequently, the nodes closer to clusters store more auth-keys, which makes the false reports be dropped earlier.

Fig. 3. Key predistribution for Hill Climbing. Each y-key is randomly selected from a different hash chain of length u = v/l, but the z-keys are still selected from the global key pool.

4) Report Forwarding Phase:

In this phase, sensing nodes generate sensing reports in rounds. Each round contains a fixed number of reports, and this number is predetermined before the nodes are deployed. In each round, every sensing node chooses a new auth-key, i.e., the node's current auth-key, to authenticate its reports.

Given node vi, its sensing report r(vi) is

r(vi) = {E, vi, ji, MAC(E, kvi^ji)}

where E denotes the event information, ji is the index of vi's current auth-key, and MAC(E, kvi^ji) is the MAC generated from E using key kvi^ji.

In each round, the cluster-head generates the aggregated reports and forwards them to the next-hop nodes. Then, it discloses the sensing nodes' auth-keys after overhearing the broadcast from the next-hop node. The reports are forwarded hop-by-hop to the base station. At every hop, a forwarding node verifies the validity of the reports using the disclosed keys and informs its own next-hop node of the verification result. The same procedure is repeated at each forwarding node until the reports are dropped or delivered to the base station. The detailed procedure consists of the following steps.

Step1: In each round, the cluster-head collects the sensing reports from all sensing nodes and generates a number of aggregated reports such as R1, R2, .... It sends these aggregated reports and an OK message to the next hop, uj. For example, an aggregated report R looks as follows:

R = {r(vi1), ..., r(vit)}

Step2: Receiving the aggregated reports and the OK message, uj forwards the aggregated reports to the next hop, uj+1. The cluster-head overhears the broadcast of the aggregated reports from uj.

Step3: Overhearing the broadcast from uj, the cluster-head discloses the auth-keys to uj by a message K(t). K(t) contains the auth-keys of vi1, ..., vit. It has the same format as K(n), but carries only t auth-keys.

Step4: Receiving K(t), uj first checks the authenticity of the disclosed keys using the disseminated ones that it decrypted from K(n) before.
Then, it verifies the integrity and validity of the reports by checking the MACs of the reports using the disclosed keys.

Step5: If the reports are valid, uj sends an OK message to uj+1. Otherwise, it informs uj+1 to drop the invalid reports.

Step6: Similar to Step2, uj+1 forwards the reports to the next hop.

Step7: Similar to Step3, after overhearing the broadcast from uj+1, uj discloses K(t) to uj+1.

Every forwarding node repeats Step4 to Step7 until the reports are dropped or delivered to the base station. The broadcast nature of wireless communication is taken into account: in this scheme, each node monitors its next-hop node to ensure that no message is forged or changed intentionally.
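The per-hop check of Steps 4 and 5 can be sketched as follows. This Python fragment assumes HMAC-SHA256 as the MAC and treats the hash-chain check of Step 4 as already done (see the earlier sketch); the field names and the OK/DROP encoding are illustrative, not part of the protocol.

import hashlib, hmac

def mac(event: bytes, key: bytes) -> bytes:
    """MAC(E, k) attached to a sensing report r(vi)."""
    return hmac.new(key, event, hashlib.sha256).digest()

def check_aggregated_report(reports, disclosed_keys):
    """Per-hop verification: uj recomputes the MAC of each report in
    R = {r(vi1), ..., r(vit)} with the auth-keys disclosed in K(t); it
    returns 'OK' for the next hop, or the invalid reports to drop.

    reports: list of dicts {'event': bytes, 'node': str, 'mac': bytes}
    disclosed_keys: dict node id -> disclosed auth-key (already
    authenticated against the disseminated keys, as in Step 4).
    """
    invalid = [r for r in reports
               if not hmac.compare_digest(
                   r['mac'], mac(r['event'], disclosed_keys[r['node']]))]
    return ('OK', []) if not invalid else ('DROP', invalid)

key = b'auth-key-of-v1'
r1 = {'event': b'E1', 'node': 'v1', 'mac': mac(b'E1', key)}
print(check_aggregated_report([r1], {'v1': key}))   # ('OK', [])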
IV. CONCLUSION

In wireless sensor networks, adversaries can inject false data reports via compromised nodes and launch denial-of-service attacks. In this paper, a dynamic en-route filtering scheme that addresses both false report injection and DoS attacks in wireless sensor networks is proposed. The Hill Climbing key dissemination approach is used to ensure that the nodes closer to data sources have stronger filtering capacity. The redundant transmission of data reports by the sensor nodes to the cluster head is reduced by using the Energy-Efficient Secure Pattern based Data Aggregation (ESPDA) protocol. This reduces the energy consumption of the sensor nodes and improves the filtering capacity of the scheme.

REFERENCES

[1] F. Ye, H. Luo, S. Lu, and L. Zhang, "Statistical en-route detection and filtering of injected false data in sensor networks," in Proc. IEEE INFOCOM, 2004, vol. 4, pp. 2446-2457.
[2] S. Zhu, S. Setia, S. Jajodia, and P. Ning, "An interleaved hop-by-hop authentication scheme for filtering of injected false data in sensor networks," in Proc. IEEE Symp. Security and Privacy, 2004, pp. 259-271.
[3] H. Yang and S. Lu, "Commutative cipher based en-route filtering in wireless sensor networks," in Proc. IEEE VTC, 2004, vol. 2, pp. 1223-1227.
[4] H. Yang, F. Ye, Y. Yuan, S. Lu, and W. Arbaugh, "Toward resilient security in wireless sensor networks," in Proc. ACM MobiHoc, 2005, pp. 34-45.
[5] K. Ren, W. Lou, and Y. Zhang, "LEDS: Providing location-aware end-to-end data security in wireless sensor networks," in Proc. IEEE INFOCOM, 2006, pp. 1-12.
[6] Z. Yu and Y. Guan, "A dynamic en-route filtering scheme for data reporting in wireless sensor networks," IEEE/ACM Trans. Networking, vol. 18, no. 1, Feb. 2010.
[7] H. Cam, S. Ozdemir, P. Nair, D. Muthuavinashiappan, and H. O. Sanli, "Energy-efficient secure pattern based data aggregation for wireless sensor networks," Computer Communications, vol. 29, no. 4, Feb. 2006.
[8] B. Karp and H. T. Kung, "GPSR: Greedy perimeter stateless routing for wireless networks," in Proc. ACM MobiCom, 2000, pp. 243-254.
[9] Y. Yu, R. Govindan, and D. Estrin, "Geographical and energy aware routing: A recursive data dissemination protocol for wireless sensor networks," Comput. Sci. Dept., Univ. California, Los Angeles, UCLA-CSD TR-01-0023, 2001.
[10] Z. Yu and Y. Guan, "A dynamic en-route scheme for filtering false data injection in wireless sensor networks," in Proc. IEEE INFOCOM, 2006, pp. 1-12.
EFFICIENT ENERGY SAVING USING DISTRIBUTED CLUSTER HEADS IN WIRELESS SENSOR NETWORKS

R. Evangelin Hema Mariya
M.E Final Year, Dept of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai.
Email id: Hema.mariya@Gmail.com, 09600250469.
Abstract: The main challenge in a wireless sensor network is to optimize energy consumption when collecting data from sensor nodes. In many ad-hoc sensor networks, the important requirements are prolonged network lifetime, scalability, and load balancing. These requirements can be effectively achieved by the technique of clustering sensor nodes. In this paper, a distributed clustering algorithm is used to collect data from sensor nodes and to reduce energy consumption using tabu search, which enables clustering with a low communication cost. Tabu search uses a local or neighborhood search procedure to iteratively move from a solution x to a solution x' in the neighborhood of x, until some stopping criterion has been satisfied. Communication between the distributed cluster heads is to be achieved. This approach is suitable for avoiding energy wastage during transmission and for prolonging the lifetime of sensor networks. The performance of the distributed approach is compared with that of the centralized clustering approach.

Index Terms: wireless sensor networks, clustering, efficient energy saving, distributed cluster heads.

1. Introduction

Sensor networks have recently emerged as an important computing platform. Sensor nodes are typically less mobile and more densely deployed than mobile ad-hoc networks (MANETs). Wireless sensor network (WSN) technology is a key component for ubiquitous computing. A WSN consists of a large number of sensor nodes. Each sensor node senses environmental conditions such as temperature, pressure, and light, and sends the sensed data to a base station (BS), which is generally far away. Since the sensor nodes are powered by limited-capacity batteries, low energy consumption is important for sensor nodes in order to prolong the lifetime of the network. In order to reduce the energy consumption, a clustering and data aggregation approach is used. In this approach, sensor nodes are divided into clusters, and for each cluster, one representative node, called the cluster head (CH), aggregates all the data within the cluster and sends the data to the BS. Since only CH nodes need long-distance transmission, the other nodes save energy, which increases the scalability and lifetime of the network.

Clustering is one of the fundamental issues in wireless ad-hoc and sensor networks. In clustered sensor networks, cluster heads (CHs) are responsible for data fusion within each cluster and transmit the aggregated data to the remote base station (BS). With clustering, the network payload is greatly reduced, i.e., battery energy can be considerably saved. In order to prolong the network lifetime, energy-efficient protocols should be designed for the characteristics of WSNs. Efficiently organizing sensor nodes into clusters is useful in reducing energy consumption.
Many energy-efficient routing protocols are designed based on the clustering structure. The clustering technique can also be used to perform data aggregation, which combines the data from source nodes into a small set of meaningful information. Under the condition of achieving the sufficient data rate specified by applications, the fewer messages are transmitted, the more energy is saved. Localized algorithms can efficiently operate within clusters and need not wait for control messages propagating across the whole network. Therefore, localized algorithms bring better scalability to large networks than centralized algorithms, which are executed on a global structure. The clustering technique can be extremely effective in broadcast and data query. Cluster-heads help to broadcast messages and collect the data of interest within their own clusters.

During data collection, two mechanisms are used to reduce energy consumption: message aggregation and filtering of redundant data. These mechanisms generally use clustering methods in order to coordinate aggregation and filtering. Clustering is particularly useful for applications that require scalability to hundreds or thousands of nodes. Scalability in this context implies the need for load balancing and efficient resource utilization. Applications requiring efficient data aggregation are natural candidates for clustering. Routing protocols can also employ clustering. Clustering was proposed as a useful tool for efficiently pinpointing object locations. Clustering can be extremely effective in one-to-many, many-to-one, one-to-any, or one-to-all (broadcast) communication. For example, in many-to-one communication, clustering can support data fusion and reduce communication interference. The essential operation in sensor node clustering is to select a set of cluster heads among the nodes in the network and to cluster the rest of the nodes with these heads. Cluster heads are responsible for coordination among the nodes within their clusters (intra-cluster coordination) and for communication with each other and/or with external observers on behalf of their clusters (inter-cluster communication). Energy-efficient operations are essential for extending the lifetime of wireless sensor networks. Among the energy-saving solutions, clustering sensor nodes is an interesting alternative that features a reduction in energy consumption through: (i) aggregating data; (ii) controlling transmission power levels; (iii) balancing load; and (iv) putting redundant sensor nodes to sleep.

This paper proposes a distributed clustering mechanism equipped with energy maps and constrained by Quality-of-Service (QoS) requirements. Such a clustering mechanism is used to collect data in sensor networks. The first original aspect of this investigation consists of adding these constraints to the clustering mechanism, which helps the data collection algorithm to reduce energy consumption and to provide applications with the required information without burdening them with unnecessary data. The existing centralized clustering methods cannot be used to solve this issue, because our approach to modelling the problem assumes that the numbers of clusters and cluster heads are unknown before the clusters are created, which constitutes another major original facet of this paper.

2. Problem Statement

The centralized approach is less efficient than the distributed approach in the cluster building phase. The nodes in the centralized approach have to send their information to a central node that collects all of the information and runs the algorithm to build the clusters. The energy consumed by building clusters and the energy consumed during the data collection phase are higher in the centralized approach.

3. Related work
There is a large body of related work in cluster formation and inter-cluster communication that attempts to solve similar problems using various techniques.

Moussaoui et al. [1] discuss a "novel energy efficient and reliable clustering (EERC) algorithm" that rebuilds the clusters when there is a heavy load on the CH. However, there is a significant reliability problem in that cluster-heads are easy to attack. This may render the whole cluster useless, thus greatly reducing the network reliability.

Raghuwanshi et al. [2] similarly use cluster-based communication. Communication within the cluster takes place over a one-hop distance, while traffic moves through the network over multiple hops to points that are connected to a much larger infrastructure. A handshake takes place between the broadcasting cluster-head and the non-cluster-head neighbors before any data transmission can begin. Each time the nodes in the network configure, new/mobile/hibernating nodes get discovered by the local search performed as a part of the dynamic clustering scheme. Nodes that are closer in distance can have lower energy levels than farther nodes and run out of battery power quickly. The broadcast is done to make a node's presence known to all neighbours at a single-hop distance. Based on the assumption that at least one node is awake at a one-hop distance, the corresponding cluster-head sets a timer for which it decides to stay as the cluster-head.

Younis et al. [3] propose another clustering method that forms clusters in a distributed manner. Network lifetime can be defined as the time elapsed until the first node (or the last node) in the network depletes its energy (dies). An energy-efficient clustering method is implemented using a protocol, HEED (Hybrid Energy-Efficient Distributed clustering), that periodically selects cluster heads according to a hybrid of their residual energy and a secondary parameter, such as a node's proximity to its neighbors or node degree. Cluster heads are randomly selected based on their residual energy, and nodes join clusters such that the communication cost is minimized. Simulation results show that HEED prolongs network lifetime and that the clusters it produces exhibit several appealing characteristics.

Suchismita Chinara et al. [4] propose cluster head selection criteria using an adaptive algorithm. As the selected cluster heads form the routing backbone of the dynamic network, better stability is ensured by preferring low-mobility nodes to act as cluster heads. The weight-based distributed mobility adaptive algorithm DMAC aims to distribute the time for which a node is selected as cluster head in a uniform manner, so that every node obtains a nearly equal opportunity to act as a central router for its neighbor nodes.

El Rhazi et al. [5] propose a data collection algorithm using energy maps. Data aggregation and filtering methods, which minimize the messages transmitted over a network, are widely used at the moment to reduce power consumption. A new data collection mechanism that uses a distributed clustering method is presented. The new cluster building approach is based on the network energy map and the QoS requirements specified by an application. The energy consumption model determines the sensor lifetime. The energy map, the component that contains information concerning the remaining available energy in all network areas, can be used to prolong the network lifetime. A novel data collection approach for sensor networks that uses energy maps to reduce power consumption and increase network coverage is employed. The nodes consume more energy compared to TAG.

Heinzelman et al. [6] focus on the limits of the scalability of the protocol. For this, LEACH, an application-specific protocol architecture, is proposed. LEACH is a protocol architecture where computation is performed locally to reduce the amount of transmitted data; network configuration and operation are done using local control; and the media access control (MAC) and routing protocols enable low-energy networking.
The advantage of rotating the cluster-head position among all the nodes enables LEACH to achieve a longer lifetime than static clustering. LEACH is not as efficient as LEACH-C.

Lee et al. [7] define an energy consumption model. It shows the impact of the coverage aging process of a sensor network, i.e., how it degrades over time as some nodes become energy-depleted. To evaluate sensing coverage with heterogeneous deployments, total sensing coverage is used, which represents the total information that can be extracted from all functioning sensors in a network area. The energy consumption model determines a device's lifetime by considering application-specific event characteristics, the network-specific data extraction model, and the communication method. High-cost devices can function as a cluster-head or sink to collect and process the data from low-cost sensors, which can enhance the duration of the network sensing operation.

Liang et al. [8] propose an energy-efficient method for data gathering to prolong network lifetime. The objective is to maximize the network lifetime without any knowledge of future query arrivals and generation rates. In other words, the objective is to maximize the number of data gathering queries answered until the first node in the network fails. Their algorithm, MNL, significantly outperforms all the other algorithms in terms of the network lifetime delivered.

Basu et al. [9] discuss data dissemination and gathering. A majority of sensor networking applications involve data gathering and dissemination; hence, energy-efficient mechanisms for providing these services become critical. However, due to the broadcast nature of the wireless channel, many nodes in the vicinity of a sender node overhear its packet transmissions even if they are not the intended recipients. This redundant reception results in unnecessary expenditure of the recipients' battery energy. Turning off neighboring radios during a certain point-to-point wireless transmission can mitigate this cost. To overcome this, an energy-efficient data gathering and dissemination algorithm is used.

3. Preliminaries of the Proposed Algorithm

3.1 Energy consumption model

The energy consumption model determines the sensor lifetime. This model is affected by the application type, the data extraction model, and the network communication model. The energy consumption for a single cycle is calculated as follows:

Ecycle = ED + ES + ET + ER

where ED, ES, ET and ER represent the energy required for data processing, sensing, transmitting and receiving per cycle, respectively. The quantity of energy spent on each operation depends on the network and the event model.
3.2 Energy maps

The energy map, the component that contains information concerning the remaining available energy in all network areas, can be used to prolong the network lifetime.

3.3 Data Collection Mechanism

Generally, sensor networks contain a large quantity of nodes that collect measurements before sending them to the applications. If all nodes forwarded their measurements, the volume of data received by the applications would increase exponentially, rendering data processing a tedious task. A sensor system should thus contain mechanisms that allow the applications to express their requirements in terms of the required quality of data. Data aggregation and data filtering are two methods that reduce the quantity of data received by the applications. The aim of those two methods is not only to minimize energy consumption by decreasing the number of messages exchanged in the network, but also to provide the applications with the needed data without needlessly overloading them with exorbitant quantities of messages.
The data aggregation mechanism allows for the gathering of several measures into one record whose size is less than the total extent of the initial records. However, the result's semantics must not contradict the initial record semantics. Moreover, it must not lose the meanings of the initial records. The data filtering mechanism makes it possible to ignore measurements considered redundant or irrelevant to the application's needs. A sensor system provides the applications with the means to express the criteria used to determine measurement relevancy; e.g., an application could be concerned with temperatures that are 1) lower than a given value and 2) recorded within a delimited zone. The sensor system filters the network messages and forwards only those that respect the filter conditions.

3.5 A Tabu Search Approach

In order to facilitate the usage of tabu search for the cluster building problem (CBP), a new graph called Gr is defined. It is capable of determining feasible clusters. A feasible cluster consists of a set of nodes that fulfil the cluster building constraints. Nodes that satisfy the coverage constraint, i.e., that ensure zone coverage, are called active nodes. The vertices of Gr represent the network nodes. An edge is defined in graph Gr between nodes i and j if they satisfy the constraints. Consequently, it is clear that a clique in Gr embodies a feasible cluster. A clique consists of a set of nodes that are adjacent to one another.

Five steps should be conducted in order to adapt tabu search heuristics to solve a particular problem:
1. Design an algorithm that returns an initial solution,
2. Define moves that determine the neighbourhood N of a solution s,
3. Determine the content and size of the tabu lists,
4. Define the aspiration criteria,
5. Design intensification and diversification mechanisms.

Initial solution:
The goal is to find an appropriate initial solution for the problem, in order to get the best solution from the tabu search iterations within a reasonable delay.

The neighborhood definition:
It involves a move involving a regular node, a move involving an active node, and a move involving a cluster head.

Tabu lists:
Our adaptation proposes two tabu lists: a reassignment list and a re-election list.

• Reassignment list: The first tabu list prevents cycles that can be generated by reassigning a node to the same cluster. After each move, which consists of reassigning a node to a cluster, the pair is added to this tabu list.

• Re-election list: The second tabu list prevents the re-election of an active node in the same cluster. After a move consisting of electing a node a in a cluster, two pairs of nodes are added to the re-election list: the first pair prohibits the move, and the second pair prevents the reverse move.

Figure 1 – Flow Diagram for Tabu Search for Clustering

Aspiration criteria:
The aspiration criterion consists of accepting a move inventoried in the tabu list if it engenders a solution that is superior to the best solution found so far.
Figure 2 – Tabu Search Algorithm for Clustering (flow: generate an initial solution; construct the neighbourhood; select the best neighbour; update the best solution; update the memory structures; repeat while more iterations remain, then stop).

Design intensification and diversification mechanisms:

Diversification and intensification are two mechanisms that make it possible to improve tabu search methods. They start by analysing the appropriate solutions visited and obtaining their common properties, in order to be able to intensify the search in a given neighborhood or to diversify the searches.

The algorithm ends when one of the following three conditions occurs:
1. All possible moves are prohibited by the tabu lists;
2. The maximal number of iterations allowed has been reached;
3. The maximal number of iterations where the best solution is not successively enhanced has been reached.
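The loop of Figure 2, together with the tabu list, the aspiration criterion, and the three stopping conditions, can be summarized in a short sketch. The following Python skeleton is an illustrative adaptation rather than the paper's implementation; the move encoding, the gain function, and the list sizes are assumptions.

def tabu_search(initial, neighbours, gain, max_iters=200, tabu_len=20,
                stall_limit=50):
    """Generate an initial solution, repeatedly move to the best
    non-tabu neighbour, keep the best solution seen, and stop on the
    three conditions listed above. `neighbours(s)` yields
    (move, new_solution) pairs; `gain` scores a solution (higher is
    better)."""
    current = best = initial
    tabu, stall = [], 0
    for _ in range(max_iters):                    # stopping condition 2
        candidates = [(m, s) for m, s in neighbours(current)
                      if m not in tabu or gain(s) > gain(best)]  # aspiration
        if not candidates:                        # stopping condition 1
            break
        move, current = max(candidates, key=lambda c: gain(c[1]))
        tabu.append(move)                         # update memory structures
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if gain(current) > gain(best):
            best, stall = current, 0
        else:
            stall += 1
            if stall >= stall_limit:              # stopping condition 3
                break
    return best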
4. Simulation Result and Discussion

Comparison between the centralized and distributed approaches: Figure 3 shows the energy consumed to build the clusters by the centralized and distributed approaches. The results show that the distributed approach needs less energy than the centralized approach, and the gap between these energies becomes bigger as the network size increases.

Figure 3 – Comparison between the Centralized and Distributed Approaches

The reason behind this result is that the central node needs to generate a considerable number of messages in order to collect all the node information.

5. Conclusions

This paper has presented a heuristic approach based on an energy-efficient search to solve clustering problems where the numbers of clusters and cluster heads are unknown beforehand. The tabu search adaptation consists of defining three types of moves that allow reassigning nodes to clusters, selecting cluster heads, and removing existing clusters. Such moves use the largest-size clique in a feasibility cluster graph, which facilitates the analysis of several solutions and makes it possible to compare them using a gain function. The performance of the distributed approach is compared with that of a centralized approach, and we conclude that the centralized approach is less efficient than the distributed approach in the cluster building phase.
References

[1] O. Moussaoui, A. Ksentini, M. Naimi, and M. Gueroui, "A Novel Clustering Algorithm for Efficient Energy Saving in Wireless Sensor Networks," Proc. Seventh Int'l Symp. Computer Networks (ISCN '06), pp. 66-72, 2006.
[2] S. Raghuwanshi and A. Mishra, "A Self-Adaptive Clustering Based Algorithm for Increased Energy-Efficiency and Scalability in Wireless Sensor Networks," Proc. IEEE 58th Vehicular Technology Conf. (VTC '03), vol. 5, pp. 2921-2925, 2003.
[3] O. Younis and S. Fahmy, "Distributed Clustering in Ad-Hoc Sensor Networks: A Hybrid, Energy-Efficient Approach," Proc. IEEE INFOCOM, pp. 629-640, 2004.
[4] S. Chinara and S. K. Rath, "Energy Efficient Mobility Adaptive Distributed Clustering Algorithm for Mobile Ad Hoc Network," Proc. ADCOM 2008, pp. 265-272, 2008.
[5] A. El Rhazi and S. Pierre, "A Data Collection Algorithm Using Energy Maps in Sensor Networks," Proc. Third IEEE Int'l Conf. Wireless and Mobile Computing, Networking, and Comm. (WiMob '07), 2007.
[6] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An Application Specific Protocol Architecture for Wireless Microsensor Networks," IEEE Trans. Wireless Comm., vol. 1, no. 4, pp. 660-670, Oct. 2002.
[7] J. J. Lee, B. Krishnamachari, and C. C. J. Kuo, "Impact of Heterogeneous Deployment on Lifetime Sensing Coverage in Sensor Networks," Proc. IEEE Sensor and Ad Hoc Comm. and Networks Conf. (SECON '04), pp. 367-376, 2004.
[8] W. Liang and Y. Liu, "Online Data Gathering for Maximizing Network Lifetime in Sensor Networks," IEEE Trans. Mobile Computing, vol. 6, no. 1, pp. 2-11, Jan. 2007.
[9] P. Basu and J. Redi, "Effect of Overhearing Transmissions on Energy Efficiency in Dense Sensor Networks," Proc. Third Int'l Symp. Information Processing in Sensor Networks (IPSN '04), pp. 196-204, Apr. 2004.
A NOVEL FRAMEWORK FOR DENIAL OF PHISHING BY COMBINING HEURISTIC & CONTENT BASED SEARCH ALGORITHM

R. Vadhani, M.E. (Computer Science Engineering), Department of Computer Science, Rajalakshmi Engineering College, Thandalam, Chennai, Vadhanitamilarasi@gmail.com, 09884069489
Abstract: Phishing is a current social engineering attack that results in online identity theft. In a phishing attack, the attacker persuades the victim to reveal confidential information by using web site spoofing techniques. There are two major approaches in phishing detection: the blacklist approach and the heuristics-based approach. A blacklist is a list of known phishing sites, compared against the sites being accessed in order to distinguish whether the sites are original or not. Heuristics-based approaches employ common characteristics of phishing sites, such as distinctive keywords used in web pages or URLs, in order to detect new phishing sites that are not yet listed in blacklists. To overcome this weakness, visual similarity-based detection techniques have been proposed. The proposed system is a phishing detection mechanism based on a heuristic and content based search algorithm, by which the false-positive rate can be reduced. To decrease the false-positive rate, the accuracy of calculating similarity is improved by considering not only the images displayed on the web page but also HTML analysis: the system compares the content, language and domain in its detection mechanism.

KEYWORDS: Security Usability, Phishing, Anti-Phishing, Visual Similarity.

1. INTRODUCTION

A phishing attack is an attack whereby users' account information is stolen by a fake website. Attacks target customers of banks and online payment services. Phishing attacks, which steal users' account information by fake websites and spoofed emails, have become a serious problem for Internet users. Attackers send an e-mail to the victim users, leading them to phishing sites. Because phishing attacks are becoming more skilful, we must take countermeasures against these attacks. Phishing has become a significant threat to Internet users. Phishing attacks typically use legitimate-looking but fake emails and websites to deceive users into disclosing personal or financial information to the attacker.

Phishing is the creation of email messages and webpages that are replicas of existing sites to fool users into submitting personal, financial, or password data to the fraudsters. Users can also be tricked into downloading and installing hostile software, which searches the user's computer or monitors online activities to steal private information. There are many anti-phishing toolbars used to protect the user from phishing attacks, such as EarthLink, Google, Netcraft, Cloudmark, and Internet Explorer 7.
FIG 1. Flow Diagram of a Phishing Attack

A. PROCESS OF A PHISHING ATTACK

In a typical phishing attack, the phisher sends a large number of spoofed (i.e., fake) e-mails to random Internet users that seem to be coming from a legitimate and well-known business organization (e.g., financial institutions, credit card companies, etc.). The e-mail urges the victim to update his personal information as a condition for not losing access rights to specific services (e.g., access to an online bank account). By clicking on the link provided, the victim is directed to a bogus website implemented by the attacker. The phishing website is structured as a clone of the original website, so the victim is not able to distinguish it from that of the service he/she has access to. The web site URLs are encoded or obfuscated so as not to raise suspicion. IDN spoofing, for example, uses Unicode URLs that render URLs in browsers in a way that the address looks like the original web site address but actually links to a fake web site with a different address. Victims can also be redirected to a phishing website by first using malware to install a malicious Browser Helper Object. BHOs are DLLs that allow developers to customize and control Internet Explorer, but they also allow phishers to compromise connections. Alternatively, the hosts file on the victim's machine is corrupted, for example using malware. The hosts file maintains a local mapping between DNS names and IP addresses. By inserting a fake DNS entry into the user's hosts file, it will appear that the web browser is connecting to a legitimate website when in fact it is connecting to a phishing website.

FIG 2. Phishing Attack

B. CLASSIFICATION OF PHISHING ATTACKS

• Spoofed e-mails are sent to a set of victims asking them to update their password, account data, etc.
• MSN, ICQ, AOL and other IM channels are used to reach the victims. Social engineering techniques are used to gain the victim's sensitive information.
• Calling the victim on the phone, classic social engineering techniques are used by the phisher.
• Another kind of attack is based on Internet browser vulnerabilities. This approach is usually adopted to automatically install dialers.

C. TECHNIQUES IN PHISHING DETECTION

• Server-based techniques are implemented by service providers (e.g., Internet service providers, e-commerce stores, financial institutions, etc.).
• Client-based techniques are implemented at the user end point through the browser.

D. ANTI-PHISHING TOOLS

The existing anti-phishing tools are categorized into blacklist-based, heuristic-based, and content-based tools. Google Safe Browsing for Firefox is one of the blacklist-based tools; it is a web browser extension that alerts users if a visited web page appears to be asking for personal or financial information under false pretenses, by combining advanced algorithms with reports about misleading pages from a number of sources. Because Google Safe Browsing for Firefox uses a blacklist, it is vulnerable to new phishing sites.

E. EFFECTIVENESS OF ANTI-PHISHING TOOLS

There are two methods of anti-phishing: protection by a toolbar and protection by an anti-phishing browser. Toolbars do not support all web browsers, and users are thus forced to use the specific browsers that are supported by the toolbars. Anti-phishing is divided into two types, server based and client based. Therefore, we must install certain browsers for installing anti-phishing toolbars.

In addition, it must be considered how effective the toolbar and the anti-phishing browser are in preventing users from visiting web sites the toolbars have determined to be fraudulent. Studies found that many people ignored the toolbar security indicators and instead used the site's content to decide whether or not it was a scam. Therefore, toolbar-based implementations are not so effective, and it has been shown that most people cannot distinguish between a legitimate site and a phishing site, even when they are aware that they are being tested for their ability to identify a phishing attack. If a toolbar sometimes makes mistakes and identifies legitimate sites as phishing sites, users may learn to distrust the toolbar. FIG 3 shows how a security toolbar indicates a phishing attack.

F. DRAWBACKS OF THE SECURITY TOOLBAR APPROACH

• A toolbar is a small display in the peripheral area of the browser, compared to the large main window that displays the web content. Users may not pay enough attention to the toolbar at the right times to notice an attack.
• A security toolbar shows security-related information, but security is rarely the user's primary goal in web browsing. Users may not care about the toolbar's display even if they do notice it.
• If a toolbar sometimes makes mistakes and identifies legitimate sites as phishing sites, users may learn to distrust the toolbar.

FIG 3. Visual security tool indicator in the Mozilla Firefox browser

G. FALSE POSITIVE RATE

Phishing site detection using only the images displayed on web pages with an initial image database produces a higher false-positive rate (e.g., the site actually accessed is a legitimate site, but the system reports it as a phishing site), so the false-positive rate is high.
To solve this problem, the system compares not only the image displayed on the web page but also the content and the language in which the web page is designed, so the false-positive rate is reduced.

H. RELATED WORK

A growing number of user studies are investigating why phishing attacks are so effective against computer users.

• TITLE: "An Evaluation of Anti-Phishing Toolbars".
REFERENCE: In November 2006, a study at Carnegie Mellon University found that the anti-phishing toolbars examined in the study left a lot to be desired [1]. The researchers found that three of the 10 toolbars, SpoofGuard, EarthLink and Netcraft, were able to identify over 75% of the phishing sites tested.
LIMITATION: Four of the toolbars were not able to identify even half the phishing sites tested. At the same time, SpoofGuard incorrectly identified 38% of the legitimate URLs as phishing URLs.

• TITLE: "Do Security Toolbars Actually Prevent Phishing Attacks?"
REFERENCE: At an ACM conference in April 2006, researchers examined three types of security toolbars, as well as browser address and status bars, to test their effectiveness at preventing phishing attacks [2].
LIMITATION: All the toolbars failed to prevent users from being spoofed by high-quality phishing attacks.

• TITLE: "Phishing Activity Trends Report, Q1 2008".
REFERENCE: The APWG, founded as the Anti-Phishing Working Group, serves as a public and industry resource for information about the problem of phishing and email fraud, including the identification and promotion of pragmatic technical solutions that can provide immediate protection and benefits against phishing attacks [3].
LIMITATION: Sometimes the APWG group is unable to identify the phishing website.

• TITLE: "A layout-similarity-based approach for detecting phishing pages".
REFERENCE: Angelo P. E. Rosiello [4] explained a client-side solution for phishing attacks. This approach makes DOM-based layout comparisons of legitimate sites with potential phishing sites to detect phishing pages.
LIMITATION: Only applicable as a client-based solution; not possible as a server-based solution.

• TITLE: "A Content-Based Approach to Detecting Phishing Web Sites".
REFERENCE: Yue Zhang [5] founded the CANTINA algorithm, which takes Robust Hyperlinks, an idea for overcoming "page not found" problems using the well-known Term Frequency / Inverse Document Frequency (TF-IDF) algorithm, and applies it to anti-phishing.
LIMITATION: The TF-IDF approach can identify 97% of phishing sites with about a 6% false-positive rate.
38% of the legitimate URLs as phishing
URLs. 2. DESIGN
 TITLE:‖Do Security
ToolbarsActually Prevent Phishing The UML diagram describes how the
Attacks?‖ Phishing detection carried out in the
REFERENCE: In ACM Conference April system.
2006 researcher identify three types of
security toolbars, as well as browser
address and status bars, to test their
effectiveness at preventing phishing
attacks. [2]
LIMITATION: All the tool bars are failed to
prevent users from being spoofed by high-
quality phishing attacks.
 TITLE:Phishing Activity Trends
Report, Q1 2008‖.
 REFERENCE: The APWG, founded
as the Anti-Phishing Working Group and it
is serves as a public and industry
resource for information about the
problem of phishing and email fraud,
including identification and promotion of
pragmatic technical solutions that can
provide immediate protection and benefits
against phishing attacks[3].

105
th
PROCEEDINGS OF 4 NATIONAL CONFERENCE ON HIGH PERFORMANCE
COMPUTING on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

3. ALGORITHM

ALGORITHM FOR PHISHING DETECTION

STEP 1: Start.
STEP 2: Input the URL of the website.
STEP 3: Capture the screen shot automatically and store it in the database.
STEP 4: Compare it with the available image database using the K-means clustering algorithm.
STEP 5: If the threshold value* of the image does not exceed the limit, go to step (9).
STEP 6: If the images are dissimilar, compare the content and language of the web page stored in the database.
STEP 7: If the comparison matches, go to step (9).
STEP 8: Store the website as a phishing site, then go to step (10).
STEP 9: Store the website as legitimate in the database.
STEP 10: Display the result.
STEP 11: Stop the process.

FIG 4. UML DIAGRAM FOR THE SYSTEM (flowchart: input the URL of the website → capture the screen shot → search the image database → check whether the threshold value is exceeded → compare the content and language → register in the image database as legitimate or phishing → display whether the site is phishing or legitimate)

The system first determines whether input URLs are in the right format. The image database consists of pairs of URLs and the images they display on the web browser. First, this system accesses the targeted URL with the web browser and takes the image displayed on the browser. Next, the system compares the image with those in the image database. Each entry in the database has one of three labels: legitimate, phishing, and unknown. If an image is registered in the database, this method regards the site as an imitating site. The system distinguishes malicious from legitimate web pages by comparing the domain and the images. If the image of the model site is not in the image database, this method can still detect malicious web pages through the similarity between victim sites imitating the same site.
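The following is a minimal, self-contained sketch of this pipeline. The screenshot fingerprinting and the image comparison are stubbed with toy hash-based stand-ins, since the K-means image match and the content/language check are not specified in enough detail to reproduce; every name below is illustrative only, not the system's actual code.

```python
import hashlib

def toy_fingerprint(data: bytes) -> int:
    # Stand-in for an image feature vector: a 64-bit digest prefix.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def toy_distance(a: int, b: int) -> int:
    # Stand-in for image distance: Hamming distance between fingerprints.
    return bin(a ^ b).count("1")

def classify(url: str, screenshot: bytes, db: dict, threshold: int = 12) -> str:
    """db maps a model site's URL to its screenshot fingerprint."""
    fp = toy_fingerprint(screenshot)
    # Step 4: find the closest entry in the image database.
    closest_url, closest_fp = min(db.items(),
                                  key=lambda kv: toy_distance(fp, kv[1]))
    if toy_distance(fp, closest_fp) > threshold:
        return "legitimate"   # Step 5: no visually similar model site.
    if url == closest_url:
        return "legitimate"   # Steps 6-7: content/domain agree with the model.
    return "phishing"         # Step 8: imitates a known site's appearance.

# Toy usage: an unknown URL whose screenshot matches a banked site's image.
bank_shot = b"pixels of bank.example home page"
db = {"https://bank.example": toy_fingerprint(bank_shot)}
print(classify("https://evil.example", bank_shot, db))  # phishing
```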
4. IMPLEMENTATION

STEPS INVOLVED IN PHISHING DETECTION
MECHANISM IN PROPOSED METHOD

IMAGE CAPTURE

This is the first phase of the project. In this module we prepare the system to obtain images displayed on the web browser. We prepare a virtual screen of an X window to display a web browser. The proposed system accesses the site of the URL, and when the web page is displayed, the system takes a screen shot of the web browser.

INITIALIZATION OF DATABASE

Our system can use the initial database for determination from the beginning. In the initial state, the image database consists of legitimate sites and phishing sites.

IMAGE SIMILARITY

Each entry in the image database has one of three labels: legitimate, phishing, and unknown. The image similarity search takes place based on the features of the web page and on the threshold calculation.

CONTENT AND LANGUAGE SIMILARITY

In the content similarity search, the content of the web page is extracted and compared; then, in the language similarity search, the language in which the web page is designed is compared, and the result is produced.

FIG 5. ARCHITECTURE DIAGRAM (the end user enters the URL in the web browser; the displayed image is captured; image and content comparison is performed against the image database, whose entries are labeled legitimate, phishing, or unknown)

* THRESHOLD CALCULATION FOR IMAGE COMPARISON

(1) An initial threshold T is chosen.
(2) The image is segmented into two sets:
G1 = {f(m,n) : f(m,n) > T} (object pixels)
G2 = {f(m,n) : f(m,n) <= T} (background pixels),
where f(m,n) is the value of the pixel located in the mth column and nth row.
(3) The average of each set is computed:
m1 = average value of G1,
m2 = average value of G2.
(4) A new threshold value is created as the average of m1 and m2:
T' = (m1 + m2) / 2.
(5) Using the new threshold value, go back to step (2), repeating until the threshold converges.
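A direct sketch of steps (1)-(5), assuming a grayscale image given as a list of pixel-value rows; the initial threshold and the convergence tolerance are assumptions, since they are not stated above.

```python
def iterative_threshold(image, t=128.0, eps=0.5):
    """Iterate T' = (m1 + m2) / 2 until the threshold stops moving."""
    pixels = [p for row in image for p in row]
    while True:
        g1 = [p for p in pixels if p > t]    # object pixels, f(m,n) > T
        g2 = [p for p in pixels if p <= t]   # background pixels, f(m,n) <= T
        m1 = sum(g1) / len(g1) if g1 else t  # average of G1
        m2 = sum(g2) / len(g2) if g2 else t  # average of G2
        t_new = (m1 + m2) / 2.0              # step (4)
        if abs(t_new - t) < eps:             # converged: stop iterating
            return t_new
        t = t_new

# Toy usage on a 2x4 "image" with a dark and a bright region.
print(iterative_threshold([[10, 12, 200, 210], [8, 14, 205, 220]]))
```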
5. CONCLUSION

Phishing has become a significant threat to Internet users. Phishing attacks typically use legitimate-looking but fake emails and websites to deceive users into disclosing personal or financial information to the attacker. Phishing is a form of criminal conduct that poses increasing threats to consumers, financial institutions, and commercial enterprises in Canada, the United States, India, and other countries. Because phishing shows no sign of abating, and indeed is likely to continue in newer and more sophisticated forms, it demands a coordinated response from law enforcement, other government agencies, and the private sector. In this paper, we propose a phishing detection mechanism based on a novel framework for denial of phishing that combines heuristic and content-based search algorithms. Here the false positive rate is reduced.

REFERENCES

[1] Lorrie Cranor, Serge Egelman, Jason Hong, and Yue Zhang, "Phinding Phish: An Evaluation of Anti-Phishing Toolbars," 14th Annual Network and Distributed System Security Symposium (NDSS 2007).
[2] Min Wu, Robert C. Miller and Simson L. Garfinkel, "Do Security Toolbars Actually Prevent Phishing Attacks?," Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2006).
[3] Anti-Phishing Working Group, "Phishing Activity Trends Report, Q1 2008," Aug. 2008. http://www.antiphishing.org/reports/apwg_report_Q1_2008.pdf
[4] Angelo Rosiello, Christopher Kruegel, Engin Kirda and Fabrizio Ferrandi, "A layout-similarity-based approach for detecting phishing pages," 3rd International Conference on Security and Privacy in Communication Networks (SecureComm 2007).
[5] Yue Zhang, Jason Hong, Lorrie Cranor, "CANTINA: A Content-Based Approach to Detecting Phishing Web Sites," 16th International World Wide Web Conference (WWW 2007).
RISK ESTIMATION USING OBJECT-ORIENTED METRICS

*D. Sureshbabu, **C. Prabhakaran
*Asst. Professor, Department of IT, Vel Tech Multi Tech Dr.RR & Dr.SR Engineering College, No.60, Avadi-Alamathi Road, Chennai-62.
**PG Student, Department of IT, Vel Tech Multi Tech Dr.RR & Dr.SR Engineering College, No.60, Avadi-Alamathi Road, Chennai-62.
sureshbabu.me@gmail.com (9894905247), augustprabha@gmail.com (9003705766)
Abstract - With the increasing use of object-oriented methods in new software development, there is a growing need both to document and to improve current practice in object-oriented design and development. In response to this need, a number of researchers have developed various metrics for object-oriented systems as proposed aids to the management of these systems. Object-oriented metrics have been validated empirically as measures of design complexity. However, few studies have been conducted to formulate guidelines for interpreting the complexity of a software design using metrics. To keep the OO development approach efficient, risk (complexity) level estimation should be done with OO metrics. For such estimation we use a statistical model, derived from logistic regression, to identify threshold values for OO metrics. Classes can be clustered into low and high risk levels using the threshold values. These metrics can be used to mitigate potential problems in software complexity.

1 Introduction

Object-oriented design and development is becoming very popular in today's software development environment. Object oriented development requires not only a different approach to design and implementation, it requires a different approach to software metrics. Since object oriented technology uses objects, and not algorithms, as its fundamental building blocks, the approach to software metrics for object oriented programs must be different from the standard metrics set. Some metrics, such as lines of code and cyclomatic complexity, have become accepted as "standard" for traditional functional/procedural programs, but for object oriented programs there are many proposed metrics in the literature. The question is, "Which object oriented metrics should a project use, and can any of the traditional metrics be adapted to the object oriented environment?"

Object oriented software development requires a different approach from more traditional functional decomposition and data flow development methods. While the functional and data flow approaches commence by considering the system's behavior and/or data separately, object oriented analysis approaches the problem by looking for system entities that combine them. Object oriented analysis and design focuses on objects as the primary agents involved in a computation; each class of data and related operations is collected into a single system entity.

Our approach is to identify the areas that are the primary constructs of object oriented design and to select metrics that evaluate those areas. The metrics focus on internal object
structures that reflect the complexity of each individual entity, such as methods and classes, and on external complexity that measures the interactions among entities, such as coupling and inheritance. Metrics measure computational complexity, which affects the efficiency of an algorithm and the use of machine resources, as well as psychological complexity factors that affect the ability of a programmer to create, modify and maintain software. The OO metrics have been used to assess the quality of the software design, such as the fault-proneness and the maintainability of classes.

2 Object Oriented Metrics

The object oriented metrics support the goal of measuring design and code quality in object oriented programming structures.

Design and Complexity Metrics

Classes and methods are the basic constructs of OO technology. The amount of function provided by OO software can be estimated based on the number of identified classes and methods or their variants. Hence the basic metrics are related to classes and methods. For design and complexity measures, the metrics have to deal with specific OO characteristics such as inheritance, instance variables and coupling.

Lorenz Metrics and Rules of Thumb

Lorenz proposed eleven metrics as OO design metrics. They are listed below:
• Average Method Size (LOC)
• Average Number of Methods per Class
• Average Number of Instance Variables per Class
• Class Hierarchy Nesting Level (DIT)
• Number of Subsystem/Subsystem Relationships
• Number of Class/Class Relationships in Each Subsystem
• Instance Variable Usage
• Average Number of Comment Lines (per Method)
• Number of Problem Reports per Class
• Number of Times Class is Reused
• Number of Classes and Methods Thrown Away

The CK OO Metric Suite

Chidamber and Kemerer (CK) proposed six OO design and complexity metrics:

Weighted Methods per Class (WMC)
The WMC metric is the sum of the complexities of all methods of a class, computed as the sum of the cyclomatic complexities of all the methods in the class. Therefore, high values of the WMC metric mean high complexity as well.

Depth of Inheritance Hierarchy (DIT)
The DIT measures the length of the inheritance chain from the root of the inheritance tree to the measured class. The DIT metric is an indicator of the number of ancestors of a class. It may require developers and testers to understand all ancestors to comprehend all specializations of the class, which is necessary to maintain the class or to uncover pre- and post-release faults.

Coupling Between Objects (CBO)
An object class is coupled to another one if it invokes the other's member functions or instance variables. The CBO metric counts the number of other classes to which a class is coupled.

Lack of Cohesion of Methods (LCOM)
The cohesion of a class is indicated by how closely the local methods are related to the local instance variables in the class. High cohesion indicates good class subdivision. The LCOM metric measures the dissimilarity of methods in a class by their usage of instance variables; LCOM is measured as the number of disjoint sets of local methods. Lack of cohesion increases complexity and opportunities for error during the development process.

Response for Class (RFC)
This is the number of methods that can be executed in response to a message received by an object of that class. The larger the number of methods that can be invoked from a class through messages, the greater the complexity
of the class. It captures the size of the response set of a class. The response set of a class is all the methods called by local methods: RFC is the number of local methods plus the number of methods called by local methods.

Number of Child Classes (NOC)
The NOC metric counts the number of descendants of a class. The number of children represents the number of specializations and uses of a class; therefore, understanding all child classes is important to understanding the parent. A high number of children increases the burden on developers and testers in comprehending, maintaining, and uncovering pre- and post-release faults.
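As an illustration of how lightweight these structural metrics are to compute, here is a toy sketch of DIT and NOC over a hypothetical parent map; real tools extract the hierarchy from source code, which is out of scope here.

```python
# Hypothetical inheritance hierarchy: class name -> parent (None for the root).
parents = {"Base": None, "A": "Base", "B": "Base", "C": "A"}

def dit(cls: str) -> int:
    """Depth of inheritance: length of the chain from cls up to the root."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth

def noc(cls: str) -> int:
    """Number of children: classes that name cls as their direct parent."""
    return sum(1 for p in parents.values() if p == cls)

print(dit("C"))     # 2: C -> A -> Base
print(noc("Base"))  # 2: A and B
```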
3 Approach

Approach Overview
The objective of this work is to estimate risk levels in software development using OO metrics. This is accomplished by means of a statistical model, derived from logistic regression (LR), to identify threshold values for the Chidamber and Kemerer (CK) metrics. The logistic regression model yields a probability value for each metric individually, from which we derive the threshold values for each of the metrics. By using those identified threshold values, the classes under examination can be clustered into low and high risk levels. Predicting the probability of faulty classes provides the information needed to guide developers in their endeavor to improve software quality and to reduce the costs of testing and maintenance. The probability of faults in classes can be used to rank classes by risk level: the classes within the high risk level need more investigation than the classes within the low risk level.

Software metrics thresholds can be used to raise an alarm for classes that fall within a given risk level. With the help of the threshold values, developers and testers can scrutinize the classes as the project progresses and prepare design resolutions for these classes. The developers and testers may use these thresholds to identify refactoring candidates, such as bad code classes. Therefore, software developers and testers need convenient and intuitive techniques for identifying classes that exceed an empirically specified risk level. Hence software metrics have been validated theoretically and empirically as good predictors of quality factors. But in previous approaches, metrics have not been validated as measures of design complexity: the lack of empirical validation of acceptable risk levels in quality assurance tools, and the absence of quantitative models that can easily derive metric threshold values without repeating the tedious process of data collection, make OO design assessment complex. Studies have found significant correlations between bad code and software faults, and associations between metrics and the fault-proneness of classes, but these associations have not been exploited effectively to identify threshold effects.

Univariate Logistic Regression Analysis
Logistic regression is used to validate the metrics and to construct the threshold model. In this section, we discuss the use of logistic regression to identify threshold values, which results in the identification of faulty classes. The general logistic regression model is as follows:

P(X) = e^(g(X)) / (1 + e^(g(X))) ----(1)

where g(X) = α + β*X is the logit function, P is the probability of a class being faulty, X is the OO metric, β is the coefficient estimated by maximizing the log-likelihood, and α is the estimated constant.

Table 1. Significance levels (P-values) from the univariate logistic analysis (table values not reproduced)

The CBO, RFC, and WMC metrics are significant predictors of the fault-proneness of classes.

Threshold Effects Analysis
The Value of an Acceptable Risk Level (VARL) threshold values are calculated using (2), and the results are presented in Table 2. For classes with metric values below VARL, the risk of a fault occurrence is lower than the probability p0. The VARL is calculated as follows:

VARL = (1/β)(log(p0/(1-p0)) - α) ----(2)

The α and β values are the coefficient estimates used in the calculation of the VARL values. Table 2 shows the results of the VARL at five levels of risk (between p0 = 0.05 and p0 = 0.10).

We have two observations on the results presented in Table 2. First, for different p0 values we obtain different threshold values for each metric, because p0 is a factor in calculating the threshold values. Second, the threshold values at p0 = 0.05 lie outside the observation range of all metrics; therefore, we cannot identify thresholds at this level. Table 2 shows many potential threshold values, and we need to select the threshold value that results in the best classification accuracy. Fig. 1 shows the relationship between the number of bugs and the risk levels for the three metrics. These graphs show the use of a specific risk level in identifying bugs: the best VARL is the one that predicts more bugs.

Table 2. VARL threshold values for p0 (table values not reproduced)

Fig. 1. The relationship between bugs and risk levels (p0)
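A small sketch of equations (1) and (2). The coefficients below are hypothetical stand-ins for values a fitted univariate logistic regression would produce; they are for illustration only.

```python
import math

def fault_probability(x: float, alpha: float, beta: float) -> float:
    g = alpha + beta * x                       # logit g(X) = alpha + beta*X
    return math.exp(g) / (1.0 + math.exp(g))   # P(X), equation (1)

def varl(p0: float, alpha: float, beta: float) -> float:
    # VARL = (1/beta) * (log(p0 / (1 - p0)) - alpha), equation (2)
    return (math.log(p0 / (1.0 - p0)) - alpha) / beta

alpha, beta = -3.26, 0.19          # hypothetical coefficients for one metric
t = varl(0.10, alpha, beta)        # metric threshold at risk level p0 = 0.10
print(round(t, 2))                 # ~5.6
print(fault_probability(5.0, alpha, beta) < 0.10)  # below VARL -> risk < p0
```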
The above figure shows that the number of bugs in a project increases with the probability of a class being faulty. Whenever this happens, refinement work should be done to minimise the occurrence of cumulative numbers of bugs.

4 Conclusion

The threshold values provide a meaningful interpretation for the metrics and provide a surrogate to identify classes at risk. The classes that exceed a threshold value can be selected for more testing to improve their internal quality, which increases testing efficiency. This approach can be applied on an open-source system. It is clear that OO metrics serve the project manager, the developer, and the tester in assuring the quality of the software product and in mitigating potential problems in software complexity.

References

[1] L. Briand, J. Wust, and H. Lounis, "Replicated Case Studies for Investigating Quality Factors in Object-Oriented Designs," Empirical Software Eng., vol. 6, no. 1, pp. 11-58, 2001.
[2] L. Rosenberg, "Metrics for Object-Oriented Environment," Proc. EFAITP/AIE Third Ann. Software Metrics Conf., 1997.
[3] S. Chidamber, D. Darcy, and C. Kemerer, "Managerial Use of Metrics for Object Oriented Software: An Exploratory Analysis," IEEE Trans. Software Eng., vol. 24, no. 8, pp. 629-639, Aug. 1998.
[4] L. Briand, J. Daly, and J. Wust, "A Unified Framework for Coupling Measurement in Object-Oriented Systems," IEEE Trans. Software Eng., vol. 25, no. 1, pp. 91-121, Jan./Feb. 1999.
[5] M.H. Tang, M.H. Kao, and M.H. Chen, "An Empirical Study on Object-Oriented Metrics," Proc. Sixth Int'l Symp. Software Metrics, pp. 242-249, 1999.
[6] S. Chidamber and C.F. Kemerer, "A Metrics Suite for Object-Oriented Design," IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
[7] Y. Zhou and H. Leung, "Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults," IEEE Trans. Software Eng., vol. 32, no. 10, pp. 771-789, Oct. 2006.
[8] R. Gronback, "Software Remodeling: Improving Design and Implementation Quality Using Audits, Metrics and Refactoring in Borland Together Control Center," a Borland white paper, Jan. 2003.
[9] M. Cartwright and M. Shepperd, "An Empirical Investigation of an Object-Oriented Software System," IEEE Trans. Software Eng., vol. 26, no. 8, pp. 786-796, Aug. 2000.
EMBEDDING CRYPTOGRAPHY IN VIDEO STEGANOGRAPHY

G. SHOBA, B.E., M.E.
Dr. Paul's Engineering College, mailtoshoba@gmail.com, 09443033607
S. UMA, B.E., M.E.
Dr. Paul's Engineering College, dewuma@gmail.com, 08825226654
Abstract:

Steganography is the art of hiding information in ways that avert the revealing of hidden messages, whereas cryptographic techniques try to conceal the contents of a message. We present a video steganographic scheme that can provide provable security with high computing speed and that embeds secret messages into images without producing noticeable changes. Here we embed data in video frames. In this work, we show a model for a case where extreme security is needed; in such a case steganocryptography (steganography plus cryptography) is used. In this model we use the Secure Hash Algorithm-2.

1. Introduction

The security and privacy of digital videos have become increasingly important in today's highly computerized and interconnected world. Digital media content must be protected in applications such as pay-per-view TV or confidential video conferencing, as well as in medical, industrial or military multimedia systems. With the rise of wireless portable devices, many users seek to protect the private multimedia messages that are exchanged over wireless or wired networks. In general, applying a well-established, general-purpose hash function or encryption algorithm to ensure confidentiality during video transmission is a good idea from a security point of view. Cryptography referred almost exclusively to encryption, which is the process of converting ordinary information (plaintext) into unintelligible form, i.e., ciphertext. Decryption is the reverse: moving from the unintelligible ciphertext back to plaintext. A cipher is a pair of algorithms that create the encryption and the reversing decryption. The detailed operation of a cipher is controlled both by the algorithm and, in each instance, by a key. This is a secret parameter (ideally known only to the communicants) for a specific message exchange context. A "cryptosystem" is the ordered list of elements of finite possible plaintexts, finite possible ciphertexts, finite possible keys, and the encryption and decryption algorithms which correspond to each key. Keys are important, as ciphers without variable keys can be trivially broken with only the knowledge of the cipher used and are therefore useless (or even counter-productive) for most purposes.

Text, image, audio, and video can all be represented as digital data. The explosion of Internet applications leads people into the digital world, and communication via digital data has become commonplace. However, new issues also arise and have been explored, such as data security in digital communications, copyright protection of digitized properties, invisible communication via digital media, etc. Steganography is the art of hiding information in ways that prevent the detection of the hidden message. In steganography, the object of communication is the hidden message, and the cover data are only the means of sending it. Secret information as well as
cover data can be any multimedia data, like text, image, audio, video, etc.

2. Cryptography

Data that can be read and understood without any special measures is called plaintext or cleartext. The method of disguising plaintext in such a way as to hide its substance is called encryption. Encrypting plaintext results in unreadable gibberish called ciphertext. You use encryption to ensure that information is hidden from anyone for whom it is not intended, even those who can see the encrypted data. The process of reverting ciphertext to its original plaintext is called decryption. A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and decryption process. A cryptographic algorithm works in combination with a key (a word, number, or phrase) to encrypt the plaintext. The same plaintext encrypts to different ciphertext with different keys. The security of encrypted data is entirely dependent on two things: the strength of the cryptographic algorithm and the secrecy of the key.

Fig 1. Process of encryption and decryption (plaintext → encryption → ciphertext; ciphertext → decryption → plaintext)

2.1 TYPES OF CRYPTOGRAPHIC ALGORITHMS

There are several ways of classifying cryptographic algorithms. For the purposes of this report they will be categorized based on the number of keys that are employed for encryption and decryption, and further defined by their application and use. The following three types of algorithm are discussed.

Symmetric Key Cryptography
The most widely used symmetric key cryptographic method is the Data Encryption Standard (DES). It is still the most widely used symmetric-key approach. It uses a fixed-length, 56-bit key and an efficient algorithm to quickly encrypt and decrypt messages. It can be easily implemented in hardware, making the encryption and decryption process even faster. In general, increasing the key size makes the system more secure. A variation of DES, called Triple-DES or DES-EDE (encrypt-decrypt-encrypt), uses three applications of DES and two independent DES keys to produce an effective key length of 168 bits.

IDEA uses a fixed-length, 128-bit key (larger than DES but smaller than Triple-DES). It is also faster than Triple-DES. Newer symmetric algorithms use variable-length keys and are claimed to be even faster than IDEA.

Despite the efficiency of symmetric key cryptography, it has a fundamental weak spot: key management. Since the same key is used for encryption and decryption, it must be kept secure. If an adversary knows the key, then the message can be decrypted. At the same time, the key must be available to the sender and the receiver, and these two parties may be physically separated. Symmetric key cryptography transforms the problem of transmitting messages securely into that of transmitting keys securely. This is an improvement, because keys are much smaller than messages and can be generated beforehand. Nevertheless, ensuring that the sender and receiver are using the same key, and that potential adversaries do not know this key, remains a major stumbling block. This is referred to as the key management problem.

Public/Private Key Cryptography
Asymmetric key cryptography overcomes the key management problem by using
different encryption and decryption key pairs. Having knowledge of one key, say the encryption key, is not sufficient to determine the other key (the decryption key). Therefore, the encryption key can be made public, provided the decryption key is held only by the party wishing to receive encrypted messages (hence the name public/private key cryptography). Anyone can use the public key to encrypt a message, but only the recipient can decrypt it.

RSA is a widely used public/private key algorithm, named after the initials of its inventors, Ronald L. Rivest, Adi Shamir, and Leonard M. Adleman. It depends on the difficulty of factoring the product of two very large prime numbers. Although used for encrypting whole messages, RSA is much less efficient than symmetric key algorithms such as DES. ElGamal is another public/private key algorithm; it relies on a different mathematical problem than RSA, namely the discrete logarithm problem.

The mathematical relationship between the public/private key pair permits a general rule: any message encrypted with one key of the pair can be successfully decrypted only with that key's counterpart. To encrypt with the public key means you can decrypt only with the private key. The converse is also true: to encrypt with the private key means you can decrypt only with the public key.

Hash functions
A hash function is a type of one-way function; these are fundamental for much of cryptography. A one-way function is a function that is easy to calculate but hard to invert: it is difficult to calculate the input to the function given its output. The precise meanings of "easy" and "hard" can be specified mathematically. With rare exceptions, almost the entire field of public key cryptography rests on the existence of one-way functions.

In this application, functions are characterized and evaluated in terms of their ability to withstand attack by an adversary. More specifically, given a message x, if it is computationally infeasible to find a message y not equal to x such that H(x) = H(y), then H is said to be a weakly collision-free hash function. A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).

The requirements for a good cryptographic hash function are stronger than those in many other applications (error correction and audio identification not included). For this reason, cryptographic hash functions make good stock hash functions, even functions whose cryptographic security is compromised, such as MD5 and SHA-1; the SHA-2 algorithm, however, has no known compromises. A hash function can be described as a function with certain additional security properties that make it suitable for use as a primitive in various information security applications, such as authentication and message integrity. It takes a long string (or message) of any length as input and produces a fixed-length string as output, sometimes termed a message digest or a digital fingerprint.

2.2 SHA-2
The SHA-2 functions implement the NIST Secure Hash Standard. The SHA-2 functions are used to generate a condensed representation of a message called a message digest, suitable for use as a digital signature. There are three families of functions, with names corresponding to the number of bits in the resulting message digest. The SHA-256 functions are limited to processing a message of less than 2^64 bits as input. The SHA-384 and SHA-512 functions can process a message of at most 2^128 - 1 bits as input. The SHA-2 functions are considered to be more secure than the SHA-1 functions, with which they share a similar interface.

The 256-, 384-, and 512-bit versions of SHA-2 share the same interface. SHA-256 and SHA-512 are novel hash functions
computed with 32- and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds. SHA-224 and SHA-384 are simply truncated versions of the first two, computed with different initial values. The SHA-2 functions are not as widely used as SHA-1, despite their better security.
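As a concrete illustration of that shared interface, the sketch below computes SHA-256/384/512 digests with Python's standard hashlib; it is illustrative only and is not part of the proposed scheme.

```python
import hashlib

message = b"secret message to be fingerprinted"
for name in ("sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message).hexdigest()
    # Each hex character encodes 4 bits of the fixed-length digest.
    print(name, len(digest) * 4, "bits:", digest[:16] + "...")
```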
3. Steganography

The objective of this work is to develop a compressed video steganographic scheme that can provide provable security with high computing speed and that embeds secret messages into images without producing noticeable changes. Here we embed data in video frames. A video can be viewed as a sequence of still images, so data embedding in videos seems very similar to embedding in images. However, there are many differences between data hiding in images and in videos; the first important difference is the size of the host media. Since videos contain a larger number of pixels or transform domain coefficients, a video has higher capacity than a still image, and more data can be embedded in the video. Also, there are some characteristics of videos which cannot be found in images, as perceptual redundancy in videos arises from their temporal features.

Fig 2. Steganography process (original image/frame + cipher text → embed → stego image)

Here the data hiding operations are executed entirely in the compressed domain. On the other hand, when a really high amount of data must be embedded, as in the case of video sequences, there is a more demanding constraint on the real-time effectiveness of the system. The method utilizes the characteristic of human vision's sensitivity to color value variations. The aim is to offer safe exchange of color stego video across the Internet that is resistant to steganalysis methods such as statistical and visual analysis.

Image-based and video-based steganographic techniques are mainly classified into spatial domain and frequency domain methods; the former include embedding techniques such as LSB and matrix embedding. Two important parameters for evaluating the performance of a steganographic system are capacity and imperceptibility. Capacity refers to the amount of data that can be hidden in the cover medium so that no perceptible distortion is introduced. Imperceptibility or transparency represents the invisibility of the hidden data in the cover media without degrading the perceptual quality by data embedding.
FIG 3. Process of steganography and steganalysis

A steganographic algorithm for compressed video is introduced here, operating directly on the compressed bit stream. The secret data are embedded in I frames, P frames, and B frames. This secure compressed-video steganographic architecture takes account of video statistical invisibility. The framework is shown in Figure 3; the architecture consists of four functions: I, P and B frame extraction; the scene change detector; motion vector calculation; and the data embedder and steganalysis. The details of data embedding in P and B frames are as follows:

1. For each P and B frame, motion vectors are extracted from the bitstream.
2. The magnitude of each motion vector is calculated as |MVj| = sqrt(Hj^2 + Vj^2), where MVj is the motion vector of the jth macroblock, and Hj and Vj are the horizontal and vertical components of the MV, respectively.
3. This magnitude is compared with a threshold.
4. The block with the maximum magnitude is selected, and the data are embedded using the PVD method.

To increase the capacity of the hidden secret information and to provide an imperceptible stego image for human vision, pixel-value differencing (PVD) is used for embedding.

3.1 Compressed Video Steganographic Algorithm

Here a novel steganographic approach called tri-way pixel-value differencing with pseudorandom dithering (TPVDD) is used for embedding. TPVDD enlarges the capacity of the hidden secret information and provides an imperceptible stego-image for human vision with enhanced security. A small difference value between consecutive pixels indicates a smooth area, while a large one indicates an edge area. According to the properties of human vision, eyes can tolerate more changes in sharp-edge blocks than in smooth blocks; that is, more data can be embedded into the edge areas than into smooth areas. This capability is used in this approach, which leads to good imperceptibility with a high embedding rate. The tri-way differencing scheme is explained as follows: in general, the edges in an image are roughly classified into vertical, horizontal, and two kinds of diagonal directions. Motivated by the PVD method, using two-pixel pairs on one directional edge can work efficiently for information hiding, and considering four directions from four two-pixel pairs should accomplish even more. A sketch of the basic PVD step appears below.
1. For each P and B frames, motion Cipher Text Embedding in video
vectors are extracted from the bitstream. Figure 4 shows frames before and after
2. The magnitude of each motion vector is embedding. Here Text data‘s are the
calculated as follows: secret information .
MVj |= sqrt(| Hj2+Vj 2 )
where j MV the motion vector of the jth
macroblock, and i H is horizontal and j V is
the vertical components of the MV
respectively.
3. This magnitude is compared with a
threshold
Fig 4. Frame before and after embedding

Conclusion

A video steganographic scheme along with SHA-2 was proposed in this paper, operating directly in the compressed domain. This technique provides high capacity and an imperceptible stego-image, for human vision, of the hidden secret information. Here the frames with the maximum scene-change blocks were used for embedding. The performance of the steganographic algorithm was studied, and experimental results show that this scheme can be applied to compressed videos with no noticeable degradation in visual quality.

References

[1] F. Hartung, B. Girod: Watermarking of uncompressed and compressed video, Signal Processing, Special Issue on Copyright Protection and Access Control for Multimedia Services, 1998, 66(3): 283-301.
[2] Bin Liu, Fenlin Liu, Chunfang Yang and Yifeng Sun: Secure Steganography in Compressed Video Bitstreams, The Third International Conference on Availability, Reliability and Security, 2008.
[3] Ko-Chin Chang, Chien-Ping Chang, Ping S. Huang, and Te-Ming Tu: A Novel Image Steganographic Method Using Tri-way Pixel-Value Differencing, Journal of Multimedia, Vol. 3, No. 2, June 2008.
[4] Y. K. Lee, L. H. Chen: High capacity image steganographic model, IEE Proceedings on Vision, Image and Signal Processing, Vol. 147, No. 3, pp. 288-294, 2000.
[5] D.-C. Wu and W.-H. Tsai: A steganographic method for images by pixel-value differencing, Pattern Recognition Letters, Vol. 24, pp. 1613-1626, 2003.
[6] Y. J. Dai, L. H. Zhang and Y. X. Yang: A New Method of MPEG Video Watermarking Technology, International Conference on Communication Technology Proceedings (ICCT), 2003.
[7] G. C. Langelaar and R. L. Lagendijk: Optimal Differential Energy Watermarking of DCT Encoded Images and Video, IEEE Trans. on Image Processing, 2001, 10(1): 148-158.
[8] Bin Liu, Fenlin Liu, Chunfang Yang and Yifeng Sun: "Secure Steganography in Compressed Video Bitstreams," Proc. of the Int. Conf. IEEE ARS, pp. 520-525, 2008.
[9] A. Hanafy, Gouda I. Salama and Yahya Z. Mohasseb: "A Secure Covert Communication Model Based on Video Steganography," Proc. of the Int. Conf. IEEE Military Communications, 2008.
[10] M. Abadi and B. Blanchet: Secrecy types for asymmetric communication, in Foundations of Software Science and Computation Structures, volume 2030 of Lecture Notes in Computer Science, pages 25-41, Springer, 2001.
[11] M. Abadi and B. Blanchet: Analyzing security protocols with secrecy types and logic programs, in 29th ACM Symposium on Principles of Programming Languages, pages 33-44, 2002.
[12] M. Abadi: Secrecy by typing in security protocols, Journal of the ACM, 46(5): 749-786, September 1999.
[13] X. Y. Wang, X. J. Lai, D. G. Feng, H. Chen, X. Y. Yu: Cryptanalysis for Hash Functions MD4 and RIPEMD, Advances in Cryptology - Eurocrypt '05, pp. 1-18, Springer-Verlag, May 2005.
[14] X. Wang, Y. Yin, H. Yu: Finding Collisions in the Full SHA-1, in Advances in Cryptology - CRYPTO '05, 2005.
[15] A. Lenstra, X. Wang and B. de Weger: Colliding X.509 Certificates, Cryptology ePrint Archive, Report 2005/067, 2005. Available at: http://eprint.iacr.org/
SECURE ENCRYPTION AND KEYING BASED ON VIRTUAL ENERGY FOR WIRELESS SENSOR NETWORKS

*M. Piramanayagam, **M. Yuvaraju
*PG Scholar, prabhujef@gmail.com, +91-9894795055
**Assistant Professor, rajaucbe@gmail.com
Department of Computer Science and Engineering, Anna University of Technology, Coimbatore, India.
Abstract- In order to provide secure and cost-efficient encryption and keying for wireless sensor networks, this paper introduces secure encryption and keying based on virtual energy for wireless sensor networks (WSNs). Since sensors are resource-limited wireless devices, and communication cost is the most dominant factor in a WSN, the proposed system can save the energy of the sensors by monitoring the wireless spectrum, so that unattended sensors can be reused efficiently, and by generating dynamic keys for rekeying to avoid stale keys. Here the sensed data is encoded with the RC4 encryption mechanism. The key to the mechanism dynamically changes as a function of the residual virtual energy of the sensor. The intermediate nodes along the path to the sink are able to verify the authenticity and integrity of incoming packets using a predicted value of the key generated from the sender's virtual energy, thus requiring no specific rekeying messages. To protect the keys from malicious outsiders, a hashing function is used. This scheme can eliminate the false injection of data into the network, eliminate insider threats, and provide dynamic paths.

Index Terms— WSN security, RC4 encryption, hashing, virtual energy

I. INTRODUCTION

Sensor network technology has rapidly developed in recent years and will be used in a variety of environments. Accordingly, people will come to rely more on sensor networks. For example, in a battlefield scenario, sensors may be used to detect the location of enemy sniper fire or to detect harmful chemical agents before they reach troops. Research on WSNs indicates that the energy required for transmission is greater than the energy required for processing data. Due to this fact, many energy-aware routing protocols have been introduced. Sensor networks run on very small batteries with very low energy, and it is next to impossible to change the battery of a node once it is deployed. In most cases, nodes survive on energy recharged with the help of photovoltaic or thermal conversion.

It is very important to provide authentic and accurate data to surrounding sensor nodes and to the sink to trigger time-critical responses. Protocols should be resilient against false data injected into the network by malicious nodes. Otherwise, the consequences of propagating false or redundant data are costly, depleting limited network resources and wasting response efforts. Focusing on key management,
there are two fundamental key management schemes for WSNs: static and dynamic. In static key management schemes, key management functions (i.e., key generation and distribution) are handled statically; that is, the sensors have a fixed number of keys loaded either prior to or shortly after network deployment. On the other hand, dynamic key management schemes perform keying functions (rekeying) either periodically or on demand as needed by the network, and the sensors dynamically exchange keys to communicate. Although dynamic schemes are more attack-resilient than static ones, one significant disadvantage is that they increase communication overhead due to keys being refreshed or redistributed from time to time in the network. There are many reasons for key refreshment, including: updating keys after a key revocation has occurred, refreshing keys so that they do not become stale, or changing keys due to dynamic changes in the topology. In this paper, we seek to minimize the overhead associated with refreshing keys to avoid them becoming stale. Because the communication cost is the most dominant factor in a sensor's energy consumption, the message transmission cost for rekeying is an important issue in a WSN deployment (as analyzed in the next section). Furthermore, for certain WSN applications (e.g., military applications), it may be very important to minimize the number of messages to decrease the probability of detection if deployed in enemy territory. That is, being less "chatty" intuitively decreases the number of opportunities for malicious entities to eavesdrop or intercept packets.

II. RELATED WORKS

Dynamic keying schemes go through the phase of rekeying either periodically or on demand as needed by the network to refresh the security of the system. With rekeying, the sensors dynamically exchange the keys that are used for securing the communication. A drawback of DEEF [2] is that in reality battery levels may fluctuate, and the differences in battery levels across nodes may spur synchronization problems, which can cause packet drops. Ma's work [3] applies the same filtering concept at the sink and utilizes packets with multiple MACs appended. A work [4] proposed by Hyun and Kim uses relative location information to make compromised data meaningless and to protect the data without cryptographic methods. In [5], using static pairwise keys and two MACs appended to the sensor reports, "an interleaved hop-by-hop authentication scheme for filtering of injected false data" was proposed by Zhu et al. to address both insider and outsider threats.

Another crucial idea of this paper is the notion of sharing a dynamic cryptic credential (i.e., virtual energy) among the sensors. A similar approach was suggested in the SPINS study [6] via the SNEP protocol. In particular, nodes share a secret counter when generating keys, and it is updated for every new key. However, the SNEP protocol does not consider packets dropped in the network due to communication errors. Although another study, MiniSec [7], recognizes this issue, the solution it suggests still increases the packet size by including parts of a counter value in the packet structure. The following sections address the related works briefly.

A. Dynamic energy-based encoding and filtering

H. Hou, C. Corbett, Y. Li, and R. Beyah proposed DEEF. In critical sensor deployments it is important to ensure the authenticity and integrity of sensed data. Further, one must ensure that false data injected into the network by malicious nodes is not perceived as accurate data. They present the Dynamic Energy-based Encoding and Filtering (DEEF) [2] framework to detect the injection of false data into a sensor network. DEEF requires that each sensed event report be encoded using a simple encoding scheme based on a keyed hash. The key to the hashing function dynamically changes as a function of the transient energy of the sensor, thus requiring no need for re-keying. Depending on the cost of transmission versus the computational cost of encoding, it may be important to remove false data as quickly as possible. Accordingly, DEEF can provide authentication at the edge of the network or authentication inside the sensor network. Depending on the configuration, as a report is forwarded, each node along the way verifies the correctness of the encoding probabilistically and drops those reports that are invalid. They evaluated DEEF's feasibility and performance through analysis. Their results show that DEEF, without incurring transmission overhead (increasing packet size), is able to eliminate 90%-99% of false data injected
from an outsider within 9 hops before it reaches the sink.

B. Statistical en-route filtering of injected false data in sensor networks

Fan Ye, Haiyun Luo and Songwu Lu proposed "Statistical En-Route Filtering of Injected False Data in Sensor Networks" (SEF) to detect and drop false reports during the forwarding process. Assuming that the same event can be detected by multiple sensors, in SEF each of the detecting sensors generates a keyed message authentication code (MAC), and multiple MACs are attached to the event report. As the report is forwarded, each node along the way verifies the correctness of the MACs probabilistically and drops those with invalid MACs. SEF exploits the network scale to filter out false reports through collective decision-making by multiple detecting nodes and collective false detection by multiple forwarding nodes. The authors evaluated SEF's feasibility and performance through analysis, simulation, and implementation. Their results show that SEF can be implemented efficiently in sensor nodes as small as the Mica2. It can drop up to 70% of bogus reports injected by a compromised node within five hops, and reduce energy consumption by 65% or more in many cases.

C. SPINS: Security Protocols for Sensor Networks

A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar proposed SPINS. As sensor networks edge closer towards widespread deployment, security issues become a central concern. So far, much research has focused on making sensor networks feasible and useful, and has not concentrated on security. They present a suite of security building blocks optimized for resource-constrained environments and wireless communication. SPINS has two secure building blocks: SNEP and μTESLA. SNEP provides the following important baseline security primitives: data confidentiality, two-party data authentication, and data freshness. A particularly hard problem is to provide efficient broadcast authentication, which is an important mechanism for sensor networks. μTESLA is a new protocol which provides authenticated broadcast for severely resource-constrained environments. They implemented the above protocols and show that they are practical even on minimal hardware: the performance of the protocol suite easily matches the data rate of their network. Additionally, they demonstrate that the suite can be used for building higher-level protocols.

D. Dynamic en-route scheme for filtering false data injection

Zhen Yu and Yong Guan proposed a dynamic en-route filtering scheme for false data injection attacks in wireless sensor networks. In sensor networks, adversaries can inject false data reports containing bogus sensor readings or nonexistent events from compromised nodes. Such attacks may not only cause false alarms but also drain the limited energy of sensor nodes. Several existing schemes for filtering false reports either cannot deal with the dynamic topology of sensor networks or have limited filtering capacity. In their scheme, a legitimate report is endorsed by multiple sensing nodes using their own authentication keys generated from one-way hash chains. The cluster head uses a hill-climbing approach to disseminate the authentication keys of sensing nodes to the forwarding nodes along multiple paths toward the base station.

The proposed system addresses all the issues discussed in the previous works and provides security in an efficient manner.

III. OVERVIEW OF THE SYSTEM

This paper provides a secure communication framework with a technique to verify data in line and drop false packets from malicious nodes, thus maintaining the health of the sensor network. It dynamically updates keys without exchanging messages for key renewals, and embeds integrity into packets rather than enlarging the packet by appending message authentication codes (MACs). Specifically, each sensed datum is protected using a simple encoding scheme based on a permutation code generated with the RC4 encryption scheme and sent towards the sink. The key to the encryption scheme dynamically changes as a function of the residual virtual energy of the sensor, thus requiring no rekeying. The nodes forwarding the data along the path to the sink are able to verify the authenticity and integrity of the data and to provide non-repudiation.

The contributions of this paper are as follows. First, a dynamic en-route filtering mechanism that does not exchange explicit control messages for rekeying.
Second, provision of one-time keys for each packet transmitted, to avoid stale keys. Third, a modular and flexible security architecture with a simple technique for ensuring authenticity, integrity, and non-repudiation of data without enlarging packets with MACs. Fourth, a robust secure communication framework that is operational in dire communication situations and over unreliable medium access control layers. The random distribution of data is done using DES techniques, which provides security in an efficient way. The energy of the sensor is saved by doing all the encryption and decryption with the residual energy of the sensor.

IV. MODULES

The virtual energy-based keying process involves the creation of dynamic keys. Contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on its residual virtual energy. The key is then fed into the crypto module. The crypto module employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. The encoding is the simple encryption mechanism adopted here; however, the architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. Last, the forwarding module handles the process of sending or receiving encoded packets along the path to the sink.

Fig 1. Modular diagram

A. Virtual Energy-Based Keying Module

The virtual energy-based keying process involves the creation of dynamic keys. Contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on its residual virtual energy. The energy-based keying module ensures that each detected packet is associated with a new unique key generated from the transient value of the virtual energy. After the dynamic key is generated, it is passed to the crypto module, where the desired security services are implemented. The process of key generation is initiated when data is sensed; thus, no explicit mechanism is needed to refresh or update keys. Moreover, the dynamic nature of the keys makes it difficult for attackers to intercept enough packets to break the encoding algorithm.

B. Crypto Module

The crypto module employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. Due to the resource constraints of WSNs, traditional digital signatures or encryption mechanisms requiring expensive cryptography are not viable: the scheme must be simple, yet effective. Thus, in this section, we introduce a simple encoding operation similar to that used in [2]. The encoding operation is essentially a permutation of the bits in the packet according to the dynamically created permutation code via the RC4 encryption mechanism. The key to RC4 is created by the previous module (the virtual energy-based keying module). The purpose of the crypto module is to provide simple confidentiality of the packet header and payload while ensuring the authenticity and integrity of sensed data without incurring the transmission overhead of traditional schemes. Since the key generation and handling process is done in another module, this flexible architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. In this module, a DES technique is used to provide random packet transmission from source to sink in order to provide security.
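The sketch below illustrates the idea of this module: derive an RC4 keystream from the current virtual-energy-based key and use it to permute the packet's bits. The key derivation (hashing the virtual energy value) and the Fisher-Yates shuffle driven by RC4 bytes are assumptions made for illustration; the paper's exact scheme is not reproduced.

```python
import hashlib

def rc4_stream(key: bytes, n: int) -> bytes:
    s = list(range(256))                      # RC4 key-scheduling algorithm
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out, i, j = [], 0, 0
    for _ in range(n):                        # pseudo-random generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def permutation_code(virtual_energy: float, n_bits: int) -> list:
    # Assumed derivation: hash the virtual energy value into an RC4 key.
    key = hashlib.sha256(str(virtual_energy).encode()).digest()
    stream = rc4_stream(key, n_bits * 2)
    order = list(range(n_bits))
    for i in range(n_bits - 1, 0, -1):        # Fisher-Yates shuffle from RC4 bytes
        k = (stream[2 * i] << 8 | stream[2 * i + 1]) % (i + 1)
        order[i], order[k] = order[k], order[i]
    return order

def encode_bits(bits: list, order: list) -> list:
    return [bits[i] for i in order]           # permute the packet's bits

print(encode_bits([1, 0, 1, 1, 0, 0, 1, 0], permutation_code(94.5, 8)))
```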
C. Forwarding module
The final module is the forwarding module. It is responsible for sending packets (reports) initiated at the current node (the source node) or received from other sensors (forwarding nodes) along the path to the sink. The reports traverse the network through forwarding nodes and finally reach the terminating node.

V. ALGORITHMS
In the forwarding module, the following algorithms are used to forward packets from source to sink.

1. Source Node Algorithm
When an event is detected by a source node, the next step is for the report to be secured. The source node uses the local virtual energy value and an IV (or the previous key value, if this is not the first transmission) to construct the next key. As discussed earlier, this dynamic key generation is primarily handled by the first module: the source sensor fetches the current value of the virtual energy from the virtual energy-based keying module. The key is then used as input into the RC4 algorithm inside the crypto module to create a permutation code for encoding the message. The encoded message and the clear-text ID of the originating node are transmitted to the next hop (forwarding node or sink) in a packet format in which [x]_Pc denotes the encoding of x with the permutation code Pc. The local virtual energy value is updated and stored for use with the transmission of the next report.

2. Forwarder Node Algorithm
Once the forwarding node receives the packet, it first checks its watch-list to determine whether the packet came from a node it is watching. If the sending node is not being watched by the current node, the packet is forwarded without modification or authentication; although this node performed actions on the packet (it received and forwarded it), its locally perceived virtual energy value for the sender is not updated, which maintains synchronization with nodes watching the sender further up the route. If the sending node is being watched, the forwarding node checks the current virtual energy record stored for the sending node (Algorithm 2), extracts the energy value, and derives the key. It then authenticates the message by decoding it and comparing the plaintext node ID with the encoded node ID. If the packet is authentic, an updated virtual energy value is stored in the record associated with the sending node.
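A minimal sketch of the forwarder logic just described, under the assumption of a decode() operation that inverts the crypto module's encoding and a per-sender watch-list record holding the perceived virtual energy (all names are illustrative, not from the paper):

    def on_receive(node, packet, sender_id):
        record = node.watchlist.get(sender_id)
        if record is None:
            # Sender not watched: relay unchanged and keep no energy state,
            # so that watchers further up the route stay synchronized.
            node.forward(packet)
            return
        key = derive_key(record.virtual_energy)      # key from the stored energy
        plaintext = decode(packet.encoded, key)      # invert the permutation
        if plaintext.node_id == packet.clear_id:     # authenticity check
            record.virtual_energy = next_energy(record.virtual_energy)
            node.forward(packet)                     # authentic: relay onward
        else:
            node.drop(packet)                        # failed check: drop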
VI. CONCLUSION
Communication is very costly for wireless sensor networks (WSNs) and for certain WSN applications. Independent of the goal of saving energy, it may be very important to minimize the exchange of messages (e.g., in military scenarios). To address these concerns, the presented communication framework for WSNs provides secure encryption and keying based on virtual energy. In comparison with other key management schemes, it has the following benefits: 1) it does not exchange control messages for key renewals and is therefore able to save more energy and is less chatty; 2) it uses one key per message, so successive packets of the stream use different keys, making it more resilient to certain attacks (e.g., replay, brute-force, and masquerade attacks); and 3) it unbundles key generation from security services, providing a flexible modular architecture that allows easy adoption of different key-based encryption or hashing schemes. The DES algorithm is implemented to increase security.

REFERENCES
[1] A.S. Uluagac, R.A. Beyah, and Y. Li, "VEBEK: Virtual Energy-Based Encryption and Keying for Wireless Sensor Networks," IEEE Trans. Mobile Computing, vol. 9, no. 7, July 2010.
[2] H. Hou, C. Corbett, Y. Li, and R. Beyah, "Dynamic Energy-Based Encoding and Filtering in Sensor Networks," Proc. IEEE Military Comm. Conf. (MILCOM '07), Oct. 2007.
[3] M. Ma, "Resilience of Sink Filtering Scheme in Wireless Sensor Networks," Computer Comm., vol. 30, no. 1, pp. 55-65, 2006.
[4] V. Raghunathan, C. Schurgers, S. Park, and M.B. Srivastava, "Energy Aware Wireless Sensor Networks," pp. 1-17, Dept. of Electrical Eng., Univ. of California, Los Angeles, 2004.
[5] S. Zhu, S. Setia, S. Jajodia, and P. Ning, "An Interleaved Hop-by-Hop Authentication Scheme for Filtering of Injected False Data in Sensor Networks," Proc. IEEE Symp. Security and Privacy, 2004.
[6] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, "SPINS: Security Protocols for Sensor Networks," Proc. ACM MobiCom, 2001.
[7] M. Luk, G. Mezzour, A. Perrig, and V. Gligor, "MiniSec: A Secure Sensor Network Communication Architecture," Proc. Sixth Int'l Symp. Information Processing in Sensor Networks (IPSN '07), pp. 479-488, Apr. 2007.
[8] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless Sensor Networks: A Survey," Computer Networks, vol. 38, no. 4, pp. 393-422, Mar. 2002.
[9] C. Vu, R. Beyah, and Y. Li, "Composite Event Detection in Wireless Sensor Networks," Proc. IEEE Int'l Performance, Computing, and Comm. Conf. (IPCCC '07), Apr. 2007.
HYBRID INFRASTRUCTURE SYSTEM FOR EXECUTING SERVICE WORKFLOWS
*P. Sivaranjani **P. Neelaveni
G.K.M College of Engineering and Technology, Peringalathur, Chennai
ranjusiva22@gmail.com
Abstract - Cloud computing systems provide on-demand access to computational resources for dedicated use. Grid computing allows users to share heterogeneous resources from multiple administrative domains applied to common tasks. In this paper, we discuss the characteristics and requirements of a hybrid infrastructure composed of both grid and cloud technologies. The infrastructure is used to manage the execution of service workflows in such a system through dynamic service composition, which is achieved by the autonomic computing characteristics of the cloud technologies. The infrastructure can be expanded by acquiring computational resources on demand from the cloud during workflow execution, and it manages these resources and the workflow execution without user interference. An optimized scheduling algorithm is used. The hybrid infrastructure enables the execution of service workflows of grid jobs using cloud technology.

Keywords - Grid Process Orchestration, Dynamic Deployment Virtual Resource, Cloud System Interface

I. INTRODUCTION
Grid computing refers to the combination of computer resources from multiple administrative domains to reach a common goal. Cloud computing brings supercomputing to users, allowing them to transparently achieve virtually unbounded processing and storage accessible from their laptops or personal computers.
In the cloud computing paradigm, details are abstracted from the users: they do not need knowledge of, expertise in, or control over the technology infrastructure of the cloud they are using. It typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet [1]. The cloud computing characteristics are on-demand self-service, ubiquitous network access, independent resource location (reliability), rapid elasticity (scalability), and pay per use. Cloud computing allows the use of Service Oriented Computing (SOC) standards, permitting users to establish links between services and organize them as workflows instead of building traditional applications using programming languages. The on-demand computing offered by the cloud allows users to keep using their particular systems (computers, clusters, and grids) while aggregating cloud resources as they need. However, this technology union results in a hybrid computing system with new demands, notably in resource management. Besides that, even though it uses the SOC paradigm, the cloud does not offer support for dynamic service workflow composition and coordination.
In this paper we discuss the characteristics of a hybrid system, composed of the union of a grid with a cloud, and we propose an infrastructure able to manage the execution of service workflows in such a system.
This paper is organized as follows. Some basic concepts and related works are presented in Section II, while Section III shows the infrastructure to execute service workflows in the hybrid system. Section IV presents the system architecture, and application scenarios
are discussed in Section V. Conclusions and future works are shown in Section VI.

II. CONCEPTS AND RELATED WORKS

Our infrastructure is directed to systems that use the service oriented computing paradigm. In our work we combine a service oriented grid and the Nimbus cloud [2], an option based on Amazon's Elastic Compute Cloud (EC2) [3].
Grids are environments where shared heterogeneous computing resources are connected through a network, local or remote [4]. Grids allow institutions and people to share resources and objectives through security rules and use policies, comprising the so-called Virtual Organizations (VOs) [4]. The Open Grid Services Architecture (OGSA) standard [4] proposes that interoperability among heterogeneous grid resources be achieved through Internet protocols, allowing grids to use standards and paradigms from service oriented computing (SOC) [5]. In our work we used the Globus Toolkit version 4 (GT4) [19], an OGSA implementation from the Globus Alliance [6].
The grid and virtual organization dynamics intensify the need for on-demand provisioning, where organizational requirements must guide the system configuration. It is the environment's duty to dynamically provide the services related to each application when they are needed. It is not recommendable to make all services available in all resources in the grid, since this can overload resources and consume processing power, memory, and bandwidth without need. To allow on-demand provisioning, it is necessary to support dynamic instantiation of services, i.e., to send the service to the resource, publish it so it can be handled by a container, and activate the container so it can start replying to service requisitions. Our infrastructure aggregates functionalities for on-demand service provisioning during workflow execution, since GT4 does not offer such functionality.
The cloud computing paradigm abstracts details from users, who no longer need knowledge about the technology infrastructure that supports the cloud. It typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet [1]. Cloud computing delivers three defined models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). In SaaS the consumer uses an application but does not control the host environment; Google Apps [7] and Salesforce.com [8] are examples of this model. In PaaS the consumers use a hosting environment for their applications; the Google App Engine [9] and Amazon Web Services [10] are PaaS examples, and in this model the platform is typically an application framework. In IaaS the consumer uses computing resources such as processing power and storage and can control the environment, including the deployment of applications; Amazon Elastic Compute Cloud [3], Globus Nimbus [2], and Eucalyptus [11] are good examples of this model.
The most popular model is IaaS. In a simplified manner, we can understand this cloud model as a set of virtual servers accessible through the Internet, which can be managed, monitored, and maintained dynamically and remotely. It is easy to see that the virtualization concept is fundamental in the IaaS model. Virtualization [12] is the process of presenting a logical grouping or subset of computing resources so that they can be accessed in abstract ways, with benefits over the original configuration. The virtualization software abstracts the hardware by creating an interface to virtual machines (VMs), which represent virtualized resources such as CPUs, physical memory, network connections, and peripherals. Each virtual machine is an isolated execution environment independent from the others; each VM can have its own operating system, applications, and network services. This isolation allows users to have control over the resource without interference from other participants in the cloud.
Clouds and grids are distinct. Clouds provide a full private cluster, where individual users can access resources from the pool, and their resources are "opaque," being accessible through the user interface without knowledge of hardware details. Grids permit individual users to select resources and get most, if not
all, the resources in a single request. Its middleware approach takes federation as a first principle, and it exposes resources without any preparation or isolation. These differences call for different architectures for each one and require personalized solutions. Our infrastructure supplies service support, offering automatic service deployment in the dynamically provided resources and controlling the service workflow execution through a workflow manager, interacting with the cloud and the grid in a transparent manner without user interference.
Some works propose solutions for the execution of workflows in grids or clouds, but only a few consider a hybrid environment. In [13] the authors explore the use of cloud computing for scientific workflows; the approach is to evaluate the tradeoffs between running tasks in a local environment, if one is available, and running in a virtual environment via remote, wide-area network resource access. The work in [14] describes a scalable and lightweight computational workflow system for clouds which can run workflow jobs composed of multiple Hadoop MapReduce or legacy programs. However, neither work offers support for services.
In [15] the authors show issues that limit the use of clouds for highly distributed applications in a hybrid system; however, it lacks interoperability between different cloud platforms and does not offer support for service workflows. The authors of [16] propose a hybrid system formed by the DIET Grid and Eucalyptus, showing possible ways of connecting these two architectures as well as the requirements to achieve this, but it does not support services or service workflows. In [17], the authors show a solution that automatically schedules workflow steps to underutilized hosts and provides new hosts using cloud computing infrastructures. This interesting work extends a BPEL implementation to dynamically schedule the service calls of a BPEL process based on the target hosts' load [20]; to handle peak loads, it integrates a provisioning component that dynamically launches virtual machines in Amazon's EC2 infrastructure and deploys the required middleware components (web/Grid service stack) on the fly. However, it does not support hybrid systems.
The work proposed in this paper aggregates support for service workflow execution in both grids and clouds, offering on-demand dynamic instantiation and service publication. Its functionalities allow the user to execute abstract workflows without indicating the resources where each part of the workflow will execute. Besides that, the hybrid system management allows the use of different clouds with several architectures.

III. THE HYBRID INFRASTRUCTURE

On-demand computing requires service scalability, flexibility, and availability. To supply these requirements, it is important for the infrastructure to offer reconfiguration, with the possibility of deploying new resources or updating existing ones without stopping processes in execution. In a hybrid system composed of a grid with the possibility of accessing a cloud computing infrastructure, the workflow management must supply these requirements at several levels. First, it must provide facilities for the user to make submissions without the need to choose or indicate the localization of the computational resources to be used. Inside the grid boundary, the workflow manager must find the best resources available and, when necessary, make the dynamic deployment of services in these resources. On the other hand, inside the cloud boundary, the infrastructure must be able to interact with the cloud interfaces to obtain computational resources; after that, it must be able to prepare these resources according to the workflow's necessities, making the dynamic deployment of services in the resources inside the cloud. This deployment can be made when local resources are not enough for the workflow's necessities. This configuration increases the computational power of the grid without new infrastructure investment, using the on-demand computing advantages provided by the cloud.
In this paper we show an infrastructure for the execution of service workflows in hybrid systems composed of grids and clouds. The infrastructure provides dynamic instantiation of services when necessary, and it is formed
by a set of services which offer the following functionalities [21]:
• Simple workflow description language: users describe workflows through the Grid Process Orchestration Language (GPOL) [18]. GPOL allows users to build abstract workflows, where the computational resources do not need to be indicated (a sketch of such a description is given below).
• Dynamic service instantiation: during the workflow execution, the infrastructure searches, in the grid and in the cloud, for the best computational resources available to execute each service.
• Automatic reference coordination (endpoint reference service): when offering dynamic instantiation, some activities must be transparent to the users. For example, consider a service that, when executed, generates a file that is used by other services in the workflow. Because service localization is made on demand, the infrastructure resolves the references between services at execution time, without user interference.
• Dynamic service deployment: when the best resource option is identified, the infrastructure can deploy the new service if necessary. The dynamic deployment is executed regardless of whether the resource is in the grid or in the cloud; and
• Robust workflow execution: if a service fails during the execution, the infrastructure can search for an alternative resource, automatically redirecting the execution and making the necessary adjustments in the service references. If there is no resource with the service available, the infrastructure tries to publish the service in a resource.
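To make the notion of an abstract workflow concrete, the sketch below describes a two-step workflow with no resource bindings, leaving placement entirely to the workflow manager. The dictionary form and every field name are illustrative assumptions; GPOL's actual syntax is not reproduced in this paper.

    # Hypothetical abstract workflow: services and dependencies only,
    # with no host, cluster, or cloud named anywhere.
    workflow = {
        "name": "imaging-example",
        "tasks": [
            {"id": "project",  "service": "ImageProjection", "inputs": ["raw/*.img"]},
            {"id": "assemble", "service": "MosaicAssembly",  "after": ["project"]},
        ],
    }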
IV. THE HYBRID INFRASTRUCTURE ARCHITECTURE

The architectural diagram of the hybrid system (Figure 1) shows that the first step in the process is the creation of the workflow. The workflow application is made to run in the hybrid infrastructure, which consists of both a grid boundary and a cloud boundary; an imaging application is used as the workflow. The workflow is created and submitted to the Hybrid Workflow Manager without indicating the resources needed for its execution, with no user interference in the allocation of resources. Grid Process Orchestration (GPO) is the workflow manager: a middleware that supports interoperability of distributed applications requiring service composition in the computational grid. The Hybrid Workflow Manager is responsible for managing the tasks of the workflow; the GPO allows the creation and management of application flows, tasks, and services in grids, and the workflow task level is managed by the GPO. The workflow is given to the grid boundary as well as to the cloud software.
The Grid Workflow Scheduling Engine can schedule the grid services efficiently using the grid workflow monitor and the grid service information. An optimized scheduling algorithm is used for better performance [22].

[Figure 1. Architecture: the Hybrid Workflow Manager (GPO) coordinates the Scheduler Service, the Grid Workflow Scheduling Engine, the grid workflow monitor, the Data Repository, the Resource Monitor, the Dynamic Deployment Virtual Resource, the Cloud Service/Cloud Software, and the underlying computational resources.]
The grid workflow monitor tracks the execution and level of the workflow, as well as the number of resource instances allocated to each service in the workflow, while the grid service information indicates the services performed at each level. Based on this information, the grid workflow engine uses the scheduler service to schedule the tasks; the scheduler service distributes the workflow services to be executed on the grid resources. Resource monitors observe the level of workflow execution and gather information about the computational resources in the hybrid system, grid or cloud. They operate in a distributed manner, maintaining one instance in each computational resource; such instances are used on demand by the other services when information about resources is needed. The workflow manager uses the resource monitor to know which resources are in the grid at a given time, and the scheduler uses it to obtain
information about the current state of the resources. Based on the information stored by the resource monitor, the scheduler can simply schedule the unscheduled services. If any workflow exceeds the given limit, an alert level is raised and the workflow service accordingly shifts to the cloud boundary; the resource monitor then monitors the workflow service in the cloud boundary, which provides the dynamic service. All the information about the resources and their history is stored in the data repositories. The resources repository holds information such as the characteristics of each computational resource, its performance history, and its load; this information can also be used by the scheduler in its decision process. Besides that, the services repository holds information about the services available in the grid and stores the files necessary for dynamic publication. When the service workflow exceeds the limit, the dynamic service is provided by the cloud software, which offers the autonomic computing feature of automatically allocating resources to the workflow service without any user interference.
The Dynamic Deployment Virtual Resource (DDVM) is used by the infrastructure when resources from the cloud are needed. It is composed of two groups of services: the first group is the DDVM itself, which communicates with the infrastructure and takes care of the functionalities; the second group, called the Cloud Interface Service (CIS), makes the interface with the cloud. This layout gives flexibility and scalability to the infrastructure. For each resource, one instance of the DDVM/CIS pair is responsible for the binding between the workflow manager and the cloud resource. To use cloud resources, the GPO communicates with the DDVM, which communicates with the cloud through the CIS and requests a resource. The resource monitor maintains information about the cloud load. The computational resource layer contains all the resources used in both the grid and the cloud. The hybrid system thus enables the execution of service workflows through dynamic service composition: the infrastructure supplies service support, offering automatic service deployment in the dynamically provided resources and controlling the service workflow execution through a workflow manager that interacts with the cloud and the grid in a transparent manner, without user interference.
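A minimal sketch of the alert-driven shift from the grid boundary to the cloud boundary described above; the DDVM/CIS interaction is modeled as plain method calls, and all class and method names are assumptions:

    def schedule_service(service, resource_monitor, ddvm, limit):
        candidates = resource_monitor.grid_resources()
        best = min(candidates, key=lambda r: r.load, default=None)
        if best is not None and best.load <= limit:
            return best.deploy(service)        # stay inside the grid boundary
        # Alert level: the grid is saturated, so acquire a cloud resource.
        vm = ddvm.request_resource()           # DDVM asks the cloud through CIS
        vm.prepare(service.dependencies)       # dynamic deployment in the cloud
        resource_monitor.watch(vm)             # keep monitoring in the cloud
        return vm.deploy(service)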
V. APPLICATION SCENARIOS

The proposed infrastructure can be useful in many scenarios that appear in grid application executions nowadays. If we consider the minimization of the makespan as the objective to be achieved, the infrastructure always has the option of requesting cloud resources when the currently available resources cannot give a satisfactory makespan. For instance, this can occur if the grid is overloaded with many submissions at a load peak: a new workflow may have its speedup heavily penalized because it may need to wait for many other workflows to finish their execution.
Another scenario where we envision our infrastructure being applied is when a workflow has more parallel tasks than the number of resources available. An example of an application that is represented by a workflow and can have different sizes is Montage [23], an image application that makes mosaics of the sky for astronomy research. Its workflow size depends on the square-degree size of the sky to be generated: for 1 square degree of the sky, a workflow with 232 jobs is executed; for 10 square degrees, a 20,652-job workflow is executed, dealing with an amount of data near 100 GB; the full sky is around 400,000 square degrees [24]. For such an application, an elastic infrastructure is desired, where the number of resources can be adapted according to the size of the application to be run. In our hybrid system, the infrastructure can avoid using the cloud when the grid resources are sufficient for the execution of the workflow; on the other hand, when the workflow is too large, the infrastructure can gather resources from clouds to afford its execution.
Our infrastructure can also be applied in cases where deadlines for the completion of the workflow execution exist. If the scheduler finds that the grid itself is not able to provide resources in the quantity and quality needed
to execute a workflow within a given deadline, it may ask the cloud for resources to compose the infrastructure and therefore be capable of finishing the workflow before the deadline is reached.

VI. CONCLUSION

Cloud computing provides computational resources on demand for dedicated use. The computational grid, on the other hand, proposes interoperability among heterogeneous resources through Internet protocols. In this paper we discussed the characteristics and requirements of a hybrid system formed by these two technologies, and we proposed an infrastructure for the management of service workflows in this system. Our motivation comes from the fact that neither technology offers adequate support for the execution of service workflows, giving the user the responsibility of preparing the environment for such execution. We propose an infrastructure that covers this aspect, offering support to automatically install services in the resources dynamically provided by the grid or by the cloud, while providing service workflow execution control. Our workflow management system interacts with the cloud and with the grid in a transparent manner, without user interference, and its functionalities allow the execution of abstract workflows without indicating which resource must be used. Additionally, the proposed hybrid system management gives flexibility and scalability to our infrastructure, permitting the use of several clouds with different architectures in an independent and simultaneous way.
Future work includes studying how to identify cloud load peaks and how to choose the best cloud.
REFERENCES
[1] "The NIST Definition of Cloud Computing v15," National Institute of Standards and Technology (NIST), Tech. Rep., July 2009. [Online]. Available: http://csrc.nist.gov/groups/SNS/cloudcomputing/cloud-defv15.doc
[2] "Nimbus Toolkit," Globus Alliance, 2009. [Online]. Available: http://workspace.globus.org/
[3] "Amazon Elastic Compute Cloud (Amazon EC2)," 2009. [Online]. Available: http://aws.amazon.com/ec2/
[4] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, vol. 15, no. 3, pp. 200-222, 2001.
[5] F. Curbera, R. Khalaf, N. Mukhi, S. Tai, and S. Weerawarana, "The Next Step in Web Services," Communications of the ACM, vol. 46, no. 10, pp. 29-34, 2003.
[6] "Globus Toolkit Version 4," Globus Alliance, 2008. [Online]. Available: http://www.globus.org/toolkit/
[7] "Google Apps," 2009. [Online]. Available: http://www.google.com/apps/intl/enB/business/index.html
[8] "Salesforce," 2009. [Online]. Available: http://www.salesforce.com/
[9] "Google App Engine," 2009. [Online]. Available: http://code.google.com/intl/en/appengine/
[10] "Amazon Web Services (AWS)," 2009. [Online]. Available: http://aws.amazon.com/
[11] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The Eucalyptus Open-Source Cloud-Computing System," Proc. 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID), Washington, DC, USA: IEEE, May 2009, pp. 124-131. [Online]. Available: http://dx.doi.org/10.1109/CCGRID.2009.93
[12] J. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, 2003.
[13] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, "On the Use of Cloud Computing for Scientific Workflows," Proc. 2008 Fourth IEEE International Conference on eScience (ESCIENCE '08), Washington, DC, USA: IEEE Computer Society, 2008, pp. 640-645.
[14] C. Zhang and H. De Sterck, "CloudWF: A Computational Workflow System for Clouds Based on Hadoop," in CloudCom, ser. Lecture
Notes in Computer Science, M.G. Jaatun, G. Zhao, and C. Rong, Eds., vol. 5931, Springer, 2009, pp. 393-404.
[15] S. Jha, A. Merzky, and G. Fox, "Using Clouds to Provide Grids with Higher Levels of Abstraction and Explicit Support for Usage Modes," Concurrency and Computation: Practice and Experience, vol. 21, no. 8, pp. 1087-1108, 2009.
[16] E. Caron, F. Desprez, D. Loureiro, and A. Muresan, "Cloud Computing Resource Management through a Grid Middleware: A Case Study with DIET and Eucalyptus," Proc. IEEE International Conference on Cloud Computing, pp. 151-154, 2009.
[17] T. Dornemann, E. Juhnke, and B. Freisleben, "On-Demand Resource Provisioning for BPEL Workflows Using Amazon's Elastic Compute Cloud," Proc. 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID '09), Washington, DC, USA: IEEE Computer Society, 2009, pp. 140-147.
[18] C. Senna, L. Bittencourt, and E. Madeira, "Execution of Service Workflows in Grid Environments," Proc. 5th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities and Workshops (TridentCom 2009), IEEE, 2009, pp. 1-10.
[19] I. Foster, "Globus Toolkit Version 4: Software for Service-Oriented Systems," IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, Beijing, China, 2005, pp. 2-13.
[20] "Web Services Business Process Execution Language Version 2.0," OASIS Web Services Business Process Execution Language (WSBPEL) TC, Tech. Rep., April 2007. [Online]. Available: http://docs.oasisopen.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html
[21] L.F. Bittencourt, C.R. Senna, and E.R.M. Madeira, "Enabling Execution of Service Workflows in Grid/Cloud Hybrid Systems," Network Operations and Management Symposium Workshops (NOMS Wksps), IEEE/IFIP, 2010.
[22] A. Simion, D. Sbirlea, F. Pop, and V. Cristea, "Dynamic Scheduling Algorithms for Workflow Applications in Grid Environment," 2009 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.
[23] E. Deelman, G. Singh, M.-H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G.B. Berriman, J. Good, A. Laity, J.C. Jacob, and D.S. Katz, "Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems," Scientific Programming Journal, vol. 13, no. 3, pp. 219-237, 2005.
[24] E. Deelman, "Clouds: An Opportunity for Scientific Applications?" (keynote at the 2008 Cracow Grid Workshops), 2008.
IMPROVISED SOLUTION THROUGH MERKLE TREE ALGORITHM FOR SECURE MULTIPATH ROUTING WITH EFFICIENT COLLABORATION OF BLACK HOLES
T.J. Nandhini, M.E - Computer Science, Rajalakshmi Engineering College, E-mail: nandhinii.km@gmail.com, Mobile No: 9566771075
Abstract - Compromised node and denial of service are two key attacks in wireless sensor networks (WSNs). These two attacks are similar in the sense that they both generate black holes. We argue that classic multipath routing approaches are vulnerable to such attacks: if the adversary acquires the routing algorithm, it can compute the same routes known to the source, making all information sent over these routes vulnerable to its attacks. In this paper, we develop mechanisms that generate randomized multipath routes, and we study data delivery mechanisms that can, with high probability, circumvent the black holes formed by these attacks. Under our designs, the routes taken by the "shares" of different packets change over time, so even if the routing algorithm becomes known to the adversary, the adversary still cannot pinpoint the routes traversed by each packet. Beyond this, we use Shamir's secret sharing algorithm and the Merkle tree algorithm to secure the data packets from attackers. The generated routes are also highly dispersive and energy efficient, making them quite capable of circumventing black holes. We analytically investigate the security and energy performance of the proposed schemes and formulate an optimization problem to minimize the end-to-end energy consumption under given security constraints.

Index Terms - Randomized multipath routing, wireless sensor network, secure data delivery.

1. INTRODUCTION
1.1 Motivations
Of the various possible security threats encountered in a wireless sensor network (WSN), in this paper we are specifically interested in combating two types of attacks: compromised node (CN) and denial of service (DOS). In the CN attack, an adversary physically compromises a subset of nodes to eavesdrop on information, whereas in the DOS attack, the adversary interferes with the normal operation of the network by actively disrupting or changing normal data delivery between sensor nodes and the sink, partitioning the topology, or even paralyzing the functionality of a subset of nodes. These two attacks are similar in the sense that they both generate black holes: areas within which the adversary can either passively intercept or actively block information delivery. A conventional cryptography-based security method cannot alone provide satisfactory solutions to these problems. This is because, by definition, once a node is compromised, the adversary can always acquire the encryption/decryption keys of that node, and thus can intercept any information passed through it. Likewise, an adversary can always perform DOS attacks (e.g., jamming) even if it does not have any knowledge of the underlying cryptosystem. One remedial solution to these attacks is to exploit the network's routing functionality. Specifically, if the locations of the black holes are known a
priori, then data can be delivered over paths that circumvent (bypass) these holes whenever possible. In practice, due to the difficulty of acquiring such location information, the above idea is implemented in a probabilistic manner, typically through a two-step process. First, the packet is broken into M shares (i.e., components of a packet that carry partial information) using a (T, M)-threshold secret sharing mechanism such as Shamir's algorithm. The original information can be recovered from a combination of at least T shares, but no information can be guessed from fewer than T shares; the secret is reconstructed by interpolating the sharing polynomial from T shares y_j using the Lagrange basis polynomials l_j(x):

    F(x) = Σ_{j=0}^{T-1} y_j · l_j(x)    (1)
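For concreteness, a minimal sketch of (T, M)-threshold splitting and Lagrange reconstruction over a prime field; the field size and the integer packaging of the secret are choices made for this example, not taken from the paper:

    import random

    P = 2**127 - 1  # a Mersenne prime large enough for small secrets

    def split(secret: int, T: int, M: int):
        # Evaluate a random degree-(T-1) polynomial with F(0) = secret.
        coeffs = [secret] + [random.randrange(P) for _ in range(T - 1)]
        return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
                for x in range(1, M + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0, i.e., F(0) = sum_j y_j * l_j(0).
        secret = 0
        for j, (xj, yj) in enumerate(shares):
            num = den = 1
            for k, (xk, _) in enumerate(shares):
                if k != j:
                    num = num * (-xk) % P
                    den = den * (xj - xk) % P
            secret = (secret + yj * num * pow(den, P - 2, P)) % P
        return secret

    shares = split(123456789, T=3, M=5)
    assert reconstruct(shares[:3]) == 123456789   # any T of the M shares suffice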
Second, multiple routes from the source to the destination are computed according to some multipath routing algorithm; these routes are node-disjoint or maximally node-disjoint subject to certain constraints. We argue that four security problems exist in the above approach. First, this approach is no longer valid if the adversary can selectively compromise or jam nodes. This is because the route computation in the above multipath routing algorithms is deterministic, in the sense that for a given topology and given source and destination nodes, the same set of routes is always computed by the routing algorithm. As a result, once the routing algorithm becomes known to the adversary, the adversary can compute the set of routes for any given source and destination; it can then pinpoint one particular node in each route and compromise (or jam) these nodes, intercepting all shares of the information. Second, in practice very few node-disjoint routes can be found when the node density is moderate and the source and destination nodes are several hops apart. Third, it assigns a single secret key to the whole set of packets, so if the adversary pinpoints the key, it can easily retrieve all packets in the set. Last, because the set of routes is computed under certain constraints, the routes may not be spatially dispersive enough to circumvent a moderate-size black hole.
In this paper, we propose a randomized multipath routing algorithm that can overcome the above problems. In this algorithm, multiple paths are computed in a randomized way each time an information packet needs to be sent, such that the set of routes taken by various shares of different packets keeps changing over time. As a result, a large number of routes can potentially be generated for each source and destination. To intercept different packets, the adversary has to compromise or jam all possible routes from the source to the destination, which is practically impossible.
Because routes are now randomly generated, they may no longer be node-disjoint. However, the algorithm ensures that the randomly generated routes are as dispersive as possible, i.e., the routes are geographically separated as far as possible such that they have a high likelihood of not simultaneously passing through a black hole. The main challenge in our design is to generate highly dispersive random routes at low energy cost. In addition, to secure a set of packets we propose the Merkle tree algorithm, which authenticates the set of packets: it generates a tree for the set and attaches a mark to each packet, saving computation overhead at each receiver.

[Fig. 1. Randomized dispersive routing in a WSN.]
1.2 Contributions and Organization
The key contributions of this work are as follows:
1. We explore the potential of random dispersion for information delivery in WSNs. Depending on the type of information available to a sensor, we develop four distributed schemes for propagating information "shares": purely random propagation (PRP), directed random propagation (DRP), nonrepetitive random propagation (NRRP), and multicast tree-assisted random propagation (MTRP). PRP utilizes only one-hop neighborhood information and provides baseline performance. DRP utilizes two-hop neighborhood information to improve the propagation efficiency, leading to a smaller packet interception probability. The NRRP scheme achieves a similar effect in a different way: it records all traversed nodes to avoid traversing them again in the future. MTRP tries to propagate shares in the direction of the sink, making the delivery process more energy efficient.
2. We theoretically evaluate the goodness of these dispersive routes in terms of avoiding black holes. Our analysis helps us better understand how security is achieved under dispersive routing. Based on this analysis, we investigate the trade-off between the random propagation parameter and the secret sharing parameter, and we further optimize these parameters to minimize the end-to-end energy consumption under a given security constraint.
3. We study the performance of the proposed schemes under more realistic settings. Our results are used to verify the effectiveness of our design: when the parameters are appropriately set, all four randomized schemes are shown to provide better security performance at a reasonable energy cost than their deterministic counterparts.

2. RANDOMIZED MULTIPATH DELIVERY

2.1 Overview
As illustrated in Fig. 1, we consider a three-phase approach for secure information delivery in a WSN: secret sharing of information, randomized propagation of each information share, and normal routing (e.g., min-hop routing) toward the sink. More specifically, when a sensor node wants to send a packet to the sink, it first breaks the packet into M shares according to a (T, M)-threshold secret sharing algorithm, e.g., Shamir's algorithm. Each share is then transmitted to some randomly selected neighbor. That neighbor will continue to relay the share it has received to other randomly selected neighbors, and so on. Each share carries a TTL field, whose initial value is set by the source node to control the total number of random relays; after each relay, the TTL field is reduced by 1. When the TTL value reaches 0, the last node to receive the share begins to route it toward the sink using min-hop routing. Once the sink collects at least T shares, it can reconstruct the original packet; no information can be recovered from fewer than T shares. The effect of route dispersiveness on bypassing black holes is illustrated in Fig. 2, where the dotted circles represent the ranges the secret shares can be propagated to in the random propagation phase. A larger dotted circle implies that the resulting routes are geographically more dispersive. Comparing the two cases in Fig. 2, it is clear that routes of higher dispersiveness are more capable of avoiding the black hole. Clearly, the random propagation phase is the key component that dictates the security and energy performance of the entire mechanism.
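A minimal sketch of the purely random propagation phase with its TTL-controlled handoff to min-hop routing; the shamir_split helper and the node interface are assumptions standing in for the mechanisms above:

    import random

    def prp_relay(node, share):
        if share.ttl > 0:
            share.ttl -= 1                                     # spend one random relay
            node.send(random.choice(node.neighbors()), share)  # one-hop info only (PRP)
        else:
            node.send(node.min_hop_next(), share)              # TTL exhausted: head for sink

    def source_send(node, packet, T, M, N):
        for share in shamir_split(packet, T, M):   # M shares, any T reconstruct
            share.ttl = N                          # N controls the dispersion range
            prp_relay(node, share)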
[Fig. 2. Implication of route dispersiveness on bypassing the black hole: (a) routes of higher dispersiveness; (b) routes of lower dispersiveness.]

2.2 Energy-Optimal Secret Sharing and Random Propagation
Random Propagation
moderate. Thus, the optimal(Mo,No) can
In this section, we consider the problem of
be solved by the exhaustive search
deciding theparameters for secret sharing
algorithm.
(M) and random propagation(N) to
achieve a desired security performance.
To obtain themaximum protection of the
information, the thresholdparameter
should be set as T ¼ M. Then, increasing
thenumber of propagation steps (N) and
increasing thenumber of shares a packet is
broken into (M) has a similareffect on
reducing the message interception
probability.Specifically, to achieve a given
Ps(max) for a packet, we could
either break the packet into more shares
but restrict therandom propagation of
these shares within a smaller range,or
break the packet into fewer shares but
randomlypropagate these shares into a
larger range. Therefore, whenthe security
performance is concerned, a trade-off
relationship Fig.3. Energy consumption under different
exists between the parameters M and N. (N, M).
On the otherhand, although different
combinations of M and N maycontribute to 3.DATA SECURING USING MERKLE TREE
the same Psmax , their energy cost may We propose merkle tree algorithm for
bedifferent, depending on the parameters secure information in set of packets. It
Ls, Lp, and q. This motivates us to include generates merkle tree for the set and
their energy consumption attaches a mark to each packet. The root
intoconsideration when deciding the secret can be recovered based on each packet
sharing and randompropagation ant its mark and each packet in the set
parameters: We can formulate an has unique secret key. Each receiver does
not need to wait for a set to include all the packets under the Merkle tree, and it can batch-verify the set at any time. Once the set is authentic, the corresponding root can be used to authenticate the rest of the packets under the same Merkle tree without batch-verifying them, which saves computation overhead at each receiver. Each internal node has exactly two children, which allows more data items to be added to the set.
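A compact sketch of this mechanism: build a binary Merkle tree over a set of packets, attach a sibling path (the "mark") to each packet, and recover the root from any single packet plus its mark. SHA-256 and all function names are assumptions made for the example.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_levels(packets):
        level = [h(p) for p in packets]          # leaves: hashes of the packets
        levels = [level]
        while len(level) > 1:
            if len(level) % 2:                   # duplicate the last node if odd
                level = level + [level[-1]]
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def mark(levels, index):
        # Sibling hashes from leaf to root: the per-packet "mark".
        path = []
        for level in levels[:-1]:
            if len(level) % 2:
                level = level + [level[-1]]
            path.append((level[index ^ 1], index % 2))
            index //= 2
        return path

    def root_from(packet: bytes, path):
        node = h(packet)
        for sib, is_right in path:
            node = h(sib + node) if is_right else h(node + sib)
        return node

    packets = [b"pkt-%d" % i for i in range(5)]
    levels = build_levels(packets)
    assert root_from(packets[2], mark(levels, 2)) == levels[-1][0]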
4. CONCLUSIONS
Our analysis has shown the effectiveness of randomized dispersive multipath routing in combating CN and DOS attacks. The generated routes are highly dispersive and energy efficient. By appropriately setting the secret sharing and propagation parameters, the packet interception probability can be reduced by the proposed algorithms to as low as 10^-3. The secret sharing and Merkle tree algorithms can be applied to packets to provide additional security levels, helping to handle multiple collaborating black holes.
REFERENCES
[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Comm. Magazine, vol. 40, no. 8, pp. 102-114, Aug. 2002.
[2] C.L. Barrett, S.J. Eidenbenz, L. Kroc, M. Marathe, and J.P. Smith, "Parametric Probabilistic Sensor Network Routing," Proc. ACM Int'l Conf. Wireless Sensor Networks and Applications (WSNA), pp. 122-131, 2003.
[3] M. Burmester and T.V. Le, "Secure Multipath Communication in Mobile Ad Hoc Networks," Proc. Int'l Conf. Information Technology: Coding and Computing, pp. 405-409, 2004.
[4] T. Claveirole, M.D. de Amorim, M. Abdalla, and Y. Viniotis, "Securing Wireless Sensor Networks Against Aggregator Compromises," IEEE Comm. Magazine, vol. 46, no. 4, pp. 134-141, Apr. 2008.
[5] D.B. Johnson, D.A. Maltz, and J. Broch, "DSR: The Dynamic Source Routing Protocol for Multihop Wireless Ad Hoc Networks," Ad Hoc Networking, C.E. Perkins, ed., pp. 139-172, Addison-Wesley, 2001.
[6] P.C. Lee, V. Misra, and D. Rubenstein, "Distributed Algorithms for Secure Multipath Routing," Proc. IEEE INFOCOM, pp. 1952-1963, Mar. 2005.
[7] P.C. Lee, V. Misra, and D. Rubenstein, "Distributed Algorithms for Secure Multipath Routing in Attack-Resistant Networks," IEEE/ACM Trans. Networking, vol. 15, no. 6, pp. 1490-1501, Dec. 2007.
[8] S.J. Lee and M. Gerla, "Split Multipath Routing with Maximally Disjoint Paths in Ad Hoc Networks," Proc. IEEE Int'l Conf. Comm. (ICC), pp. 3201-3205, 2001.
[9] X.Y. Li, K. Moaveninejad, and O. Frieder, "Regional Gossip Routing for Wireless Ad Hoc Networks," ACM J. Mobile Networks and Applications, vol. 10, nos. 1-2, pp. 61-77, Feb. 2005.
[10] W. Lou and Y. Kwon, "H-SPREAD: A Hybrid Multipath Scheme for Secure and Reliable Data Collection in Wireless Sensor Networks," IEEE Trans. Vehicular Technology, vol. 55, no. 4, pp. 1320-1330, July 2006.
[11] W. Lou, W. Liu, and Y. Fang, "SPREAD: Enhancing Data Confidentiality in Mobile Ad Hoc Networks," Proc. IEEE INFOCOM, vol. 4, pp. 2404-2413, Mar. 2004.
[12] W. Lou, W. Liu, and Y. Zhang, "Performance Optimization Using Multipath Routing in
Mobile Ad Hoc and Wireless Sensor Networks," Proc. Combinatorial Optimization in Comm. Networks, pp. 117-146, 2006.
[13] M.K. Marina and S.R. Das, "On-Demand Multipath Distance Vector Routing in Ad Hoc Networks," Proc. IEEE Int'l Conf. Network Protocols (ICNP), pp. 14-23, Nov. 2001.
[14] R. Mavropodi, P. Kotzanikolaou, and C. Douligeris, "SecMR, a Secure Multipath Routing Protocol for Ad Hoc Networks," Ad Hoc Networks, vol. 5, no. 1, pp. 87-99, Jan. 2007.
[15] N.F. Maxemchuk, "Dispersity Routing," Proc. IEEE Int'l Conf. Comm. (ICC), pp. 41.10-41.13, 1975.
[16] P. Papadimitratos and Z.J. Haas, "Secure Routing for Mobile Ad Hoc Networks," Proc. SCS Comm. Networks and Distributed Systems Modeling and Simulation Conf. (CNDS), 2002.
[17] P. Papadimitratos and Z.J. Haas, "Secure Data Communication in Mobile Ad Hoc Networks," IEEE J. Selected Areas in Comm., vol. 24, no. 2, pp. 343-356, Feb. 2006.
[18] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and D. Tygar, "SPINS: Security Protocols for Sensor Networks," Proc. ACM MobiCom, 2001.
[19] K. Ren, W. Lou, and Y. Zhang, "LEDS: Providing Location-Aware End-to-End Data Security in Wireless Sensor Networks," Proc. IEEE INFOCOM, 2006.
[20] B. Vaidya, J.Y. Pyun, J.A. Park, and S.J. Han, "Secure Multipath Routing Scheme for Mobile Ad Hoc Network," Proc. IEEE Int'l Symp. Dependable, Autonomic and Secure Computing, pp. 163-171, 2007.
SYBIL GUARD: DEFENDING AGAINST SYBIL ATTACKS VIA SOCIAL NETWORKS
*R.V. Lakshmi Priya **Mrs. R. Tamilarasi
*P.G Scholar, priya_be_cs@yahoo.co.in, 9715837210
**Assistant Professor II, tamil1806@yahoo.co.in
Department of Computer Science and Engineering, Velammal Engineering College, Chennai, Tamil Nadu, India
Abstract—This paper presents Sybil Guard, for defending against Sybil attacks without relying on a trusted central authority. Open-access systems such as peer-to-peer systems aim to provide service to any user who wants to use the service. Peer-to-peer and other decentralized, distributed systems are known to be particularly vulnerable to Sybil attacks, in which a malicious user obtains multiple fake identities and pretends to be multiple, distinct nodes in the system. Among the small number of decentralized approaches, our recent Sybil Guard leverages a key insight on social networks to bound the number of Sybil nodes accepted. Despite its promising direction, Sybil Guard can allow a large number of Sybil nodes to be accepted. Furthermore, Sybil Guard assumes that social networks are fast mixing, which has never been confirmed in the real world. Sybil Guard exploits this property to bound the number of identities a malicious user can create, and it offers dramatically improved and near-optimal guarantees: the guarantee is at most a log n factor away from optimal when considering approaches based on fast-mixing social networks. Sybil Limit leverages multiple independent instances of the random route protocol to perform many short random routes, exploiting intersections on edges instead of nodes and using the novel balance condition to deal with escaping tails of the verifier. Sybil Guard provides the first evidence that real-world social networks are indeed fast mixing, which validates the fundamental assumption behind Sybil Guard's approach.

Keywords- Social networks, Sybil attack, Sybil identities, Sybil Guard, Sybil Limit.

I. Introduction
The Sybil attack [3] is the fundamental problem in which the attacker can create multiple identities; it has already been observed in real-world peer-to-peer systems. Social networking is the grouping of individuals into specific groups, like small rural communities or a neighborhood subdivision. Although social networking is possible in person, especially in the workplace, universities, and high schools, it is most popular online. Social networking websites function like an online community of internet users.
[Figure 1. The social network]

The local entity has no direct physical knowledge [3] of remote entities; it recognizes only informational abstractions called identities. The system must ensure that distinct identities refer to distinct entities; otherwise, when the local entity selects a subset of identities to redundantly perform a remote operation, it can be duped into selecting a single remote entity multiple times, thereby defeating the redundancy. Illegitimately presenting multiple identities creates a Sybil attack [3] on the system. Peer-to-peer and other decentralized, distributed systems are known to be particularly open to Sybil attacks, in which a malicious user obtains multiple forged identities [6] and pretends to be multiple, distinct nodes in the system. Without a trusted central authority that can tie identities to real human beings, defending against Sybil attacks is quite challenging. When a malicious user's Sybil nodes comprise a large fraction of the nodes in the system, that one user is able to "outvote" the honest users in a wide variety of collaborative tasks. The exact form of such collaboration and the exact fraction of Sybil nodes these collaborative tasks can tolerate may differ from case to case. The ultimate form is reached with a Sybil attack [3] in which the attacker creates a potentially unlimited number of fake identities [6] (i.e., Sybil identities) to vote. A generic requirement for defeating such attacks is that the number of Sybil nodes be properly bounded. Sybil Guard, a new protocol, is the solution for defending against Sybil attacks without relying on a trusted central authority; Sybil Limit leverages a key insight regarding social networks.

The Sybil Guard Approach
Recently, we proposed Sybil Guard, a new protocol for defending against Sybil attacks without relying on a trusted central authority. In a social network (Figure 1), the vertices (nodes) are identities in the distributed system and the (undirected) edges correspond to human-established trust relations in the real world. The edges connecting the honest region (i.e., the region containing all the honest nodes) and the Sybil region (i.e., the region containing all the Sybil identities created by malicious users) are called attack edges. Sybil Guard ensures that the number of attack edges is independent of the number of Sybil identities and is limited by the number of trust relation pairs between malicious users and honest users. Sybil Guard is a completely decentralized protocol and enables any honest node V, called the verifier, to decide whether or not to accept another node S, called the suspect; "accepting" means that V is willing to do collaborative tasks with S. Sybil Guard's provable (probabilistic) guarantees hold for (1 - ε)n verifiers out of the n honest nodes, where ε is some small constant close to 0.
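The random routes underpinning this approach can be sketched as follows, assuming the standard construction in which every node fixes a random one-to-one mapping between its incoming and outgoing edges, so a route is fully determined by its first hop (a simplified model, not the complete protocol):

    import random

    class Node:
        def __init__(self, name, neighbors):
            self.name = name
            shuffled = list(neighbors)
            random.shuffle(shuffled)
            # Fixed one-to-one map: the edge a route enters on determines
            # the edge it leaves on.
            self.route_map = dict(zip(neighbors, shuffled))

    def random_route(nodes, start, first_hop, length):
        route, prev, cur = [start], start, first_hop
        for _ in range(length):
            route.append(cur)
            nxt = nodes[cur].route_map[prev]   # exit edge fixed by entry edge
            prev, cur = cur, nxt
        return route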
II. Related Work
A. Sybil Attack in Sensor Networks
Security is important for many sensor network applications [5], and it is complicated by the broadcast nature of wireless communication and the lack of tamper-resistant hardware. Sensor nodes have limited storage and computational resources, rendering public key cryptography impractical. The Sybil attack is a harmful attack in sensor networks: a malicious node behaves as if it were a larger number of nodes, for example by impersonating other nodes or simply by claiming false identities. An attacker may generate an arbitrary number of additional node identities using only one physical device. Several novel
methods have been proposed by which a node can verify whether other identities are Sybil identities, including radio resource testing, key validation for random key predistribution, position verification, and registration. In direct validation, a node directly tests whether another node identity is valid. The most promising of these methods is random key predistribution, which associates a node's keys with its identity. Random key predistribution is used in many scenarios for secure communication, and because it relies on well-understood cryptographic principles it is easier to analyze than other methods. These methods are robust to compromised nodes. In indirect validation, nodes that have already been verified are allowed to vouch for or refute other nodes. This paper [5] leaves secure methods of indirect validation as future work.

B. Sybil Attack In Recommendation Systems
Recommendation systems [8] can be attacked in various ways, and the ultimate attack form is reached with a Sybil attack, where the attacker creates a potentially unlimited number of Sybil identities to vote. Defending against Sybil attacks is often quite challenging, and the nature of recommendation systems makes it even harder. By exploiting the heavy-tail distribution of the typical voting behavior of honest identities, and by carefully identifying whether the system is already getting "enough help" from the (weighted) voters already taken into account or whether more "help" is needed, DSybil [8] can defend against an unlimited number of Sybil identities over time. DSybil provides a growing defense: if the user has used DSybil for some time when the attack starts, the loss will be significantly smaller than the loss under the worst-case attack. The authors also integrate DSybil into real-world recommendation systems and study the system's robustness against DDoS.

C. Sybil Attack In Peer To Peer Systems
Networked applications [4] often assume or require that identities on the network have a one-to-one relationship with individual entities in the external world. A single individual who controls many identities can disrupt, manipulate, or corrupt peer-to-peer applications and other applications that rely on redundancy; this is commonly called the Sybil attack. The problem addressed is the detection of the Sybil attack. To solve this problem, the paper introduces a trust game that makes false claims financially risky for the claimant. The informant [4] will accept the game if and only if she is Sybil with a low opportunity cost, and the target will cooperate if and only if she is identical to the informant. The Sybil Game is a more sophisticated game that includes the economic benefit to the detective of learning of Sybils and the economic cost to informant and target of revealing that Sybils are present. This paper [4] proves the optimal strategies for each participant. The detective will offer the game if and only if it will determine her choice about using the application in which these identities participate. As future work, the authors intend to develop a protocol to detect the Sybil attack.

The methodology applied in [1] comprises inferring honest sets, approximating EXX, representing the "gap" between the cases when the full graph is and is not fast mixing, sampling honest configurations, experimental evaluation using synthetic data, and a final experimental evaluation using real-world data. Through analytical results as well as experiments on simulated and real-world network topologies, the authors show that, given standard constraints on the adversary, Sybil Infer [1] is secure, in that it successfully distinguishes between honest and dishonest nodes and is not susceptible to manipulation by the adversary. Results show that Sybil Infer outperforms state-of-the-art algorithms, both in being more widely applicable and in providing vastly more accurate results. Modifying the simple-minded protocol into a fully fledged one-hop distributed hash table is an interesting challenge for future work. Sybil Infer can also be applied to specific online communities. In such cases a set of nodes
belonging to a certain community of interest can be extracted to form a sub-graph. Sybil Infer can then be applied on this partial view of the graph to detect nodes that are less well integrated than others in the group.

An online voting system [7] is a web-based system that facilitates the running of elections and surveys online. Such a system has been developed to simplify the process of organizing elections and make it convenient for voters to vote remotely from their home computers, while taking into consideration security and anonymity and providing auditing capabilities. An online voting system is liable to the Sybil attack, where adversaries can out-vote real users by creating several Sybil identities. A basic problem with any user-based content rating system is the Sybil attack, where the attacker can out-vote real users by creating many Sybil identities.

SumUp [7] is a Sybil-resilient online content rating system that leverages a trust network among users to defend against Sybil attacks with strong security guarantees, preventing adversaries from arbitrarily distorting voting results. SumUp addresses the basic vote aggregation problem of how to aggregate votes from different users in a trust network in the face of Sybil identities casting an arbitrarily large number of false votes. By using the technique of adaptive vote flow aggregation, SumUp can significantly limit the number of false votes cast by adversaries to no more than the number of attack edges in the trust network. SumUp also leverages the user voting history to further restrict the voting power of adversaries who continuously misbehave to below the number of attack edges. The goals are to aggregate all votes from honest users, to limit the number of false votes from the attacker, and eventually to ignore votes from nodes that repetitively cast false votes. Capacity assignment constructs a vote envelope around the source. SumUp bounds the power of attackers according to the number of attack edges, regardless of the number of Sybil identities: SumUp limits the number of false votes to no more than the number of attack edges with high probability. SumUp can significantly limit the number of bogus votes without affecting the number of honest votes that can be gathered. Additionally, SumUp uses a specific feedback mechanism: user feedback on false votes further reduces the attack capacity to below the number of attack edges. Capacity assignment should minimize the attack capacity. SumUp collects votes for a trusted source by computing a set of max-flow paths on the trust graph from the source to all voters. The basic design has two limitations. First, although the expected attack capacity is bounded by the number of attack edges, there might be cases where it is high, when some adversarial identities happen to be close to the source. Second, the basic design only bounds the number of false votes collected on a single object; as a result, adversaries can still cast up to that many false votes on every object in the system. The real-world benefits of SumUp are shown by evaluating it on the voting trace of Digg: SumUp has detected many suspicious articles marked as "popular" by Digg. Digg is a social news website, a place for people to discover and share content from anywhere on the web.

III. System Model And Attack Model
Sybil Limit adopts a similar system model and attack model as Sybil Guard. The system has honest human beings as honest users, each with one honest identity/node. Honest nodes obey the protocol. The system also has one or more malicious human beings as malicious users, each with one or more identities/nodes. To unify terminology, we call all identities created by malicious users Sybil identities/nodes. Sybil nodes are Byzantine and may behave arbitrarily. All Sybil nodes are colluding and are controlled by an adversary. A compromised honest node is completely controlled by the adversary and hence is
considered a Sybil node and not an honest node.

Every node is simultaneously a suspect and a verifier. As in Sybil Guard, we assume that each suspect S has a locally generated public/private key pair, which serves to prevent the adversary from "stealing" S's identity after S is accepted. When a verifier V accepts a suspect S, V actually accepts S's public key, which can be used later for authentication.

IV. Sybil Limit Protocol
Sybil Limit has two component protocols: a secure random route protocol and a verification protocol. The first protocol runs in the background and maintains information used by the second protocol.

A. Random Walk And Random Routes
Sybil Guard uses a special kind of random walk, called random routes, in the social network. In a random walk, at each hop the current node flips a coin on the fly to select a uniformly random edge to direct the walk (the walk is allowed to turn back). For random routes, each node uses a precomputed random permutation "x1x2...xd," where d is the degree of the node, as a one-to-one mapping from incoming edges to outgoing edges. A random route entering via edge i will always exit via edge xi. This precomputed permutation, or routing table, serves to introduce external correlation across multiple random routes. Namely, once two random routes traverse the same directed edge, they will merge and stay merged (i.e., they converge). Furthermore, the outgoing edge uniquely determines the incoming edge as well; thus the random routes can be back-traced. These two properties are key to Sybil Guard's guarantees. As a side effect, such routing tables also introduce internal correlation within a single random route: namely, if a random route visits the same node more than once, the exiting edges will be correlated. In Sybil Guard, a random walk starting from an honest node in the social network is called escaping if it ever crosses any attack edge.

B. Secure Random Route Protocol
We first focus on all the suspects in Sybil Limit, i.e., nodes seeking to be accepted. Figure 2 presents the pseudo-code for how they perform random routes. In the protocol, each node has a public/private key pair and communicates only with its neighbors in the social network. Every pair of neighbors shares a unique symmetric secret key (the edge key, established out of band) for authenticating each other. A Sybil node M1 may disclose its edge key with some honest node A to another Sybil node M2. However, because all neighbors are authenticated via the edge key, when M2 sends a message to A, A will still route the message as if it came from M1. In the protocol, every node has a precomputed random permutation x1x2...xd (d being the node's degree) as its routing table. The routing table never changes unless the node adds new neighbors or deletes old neighbors. A suspect S starts a random route along a uniformly random edge (of S) and propagates along the route its public key Ks together with a counter initialized to 1.
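To make the routing-table idea concrete, the following small Python sketch (an illustration only; the toy graph, seed values, and function names are our own, not from Sybil Guard or Sybil Limit) fixes a random permutation per node and walks a route with it. Because the next hop is a deterministic function of the (previous node, current node) pair, two routes that share a directed edge merge from that point on, and every route can be back-traced.

# Illustrative sketch: random routes via per-node permutation routing tables.
import random

def build_routing_tables(adj, seed=1):
    """For each node, fix a random one-to-one mapping from incoming
    edges (neighbors) to outgoing edges (neighbors)."""
    rng = random.Random(seed)
    tables = {}
    for node, neighbors in adj.items():
        outgoing = list(neighbors)
        rng.shuffle(outgoing)
        # incoming neighbor i is mapped to outgoing neighbor x_i
        tables[node] = dict(zip(neighbors, outgoing))
    return tables

def random_route(adj, tables, start, w, rng):
    """Walk w hops: the first hop is uniformly random, every later hop
    is determined by the current node's permutation routing table."""
    prev, cur = start, rng.choice(adj[start])
    for _ in range(w - 1):
        prev, cur = cur, tables[cur][prev]
    return (prev, cur)            # the last directed edge: the route's "tail"

# Toy 6-node social network (symmetric friendship edges).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
tables = build_routing_tables(adj)
print(random_route(adj, tables, 0, 4, random.Random(7)))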
Executed by each suspect S:
1. S picks a uniformly random neighbor Y;
2. S sends to Y: (1, S's public key Ks, MAC(1 || Ks)), with the MAC generated using the edge key between S and Y;

Executed by each node B upon receiving a message (i, Ks, MAC) from some neighbor A:
1. discard the message if the MAC does not verify or i < 1 or i > w;
2. if (i = w)
3.    record Ks under the edge name "KA -> KB", where KA and KB are A's and B's public keys, respectively;
4. else
5.    look up the routing table and determine to which neighbor (C) the random route should be directed;
6.    B sends to C: (i + 1, Ks, MAC((i + 1) || Ks)), with the MAC generated using the edge key between B and C;

Figure 2. Secure random route protocol.
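A minimal executable rendering of the Figure 2 pseudo-code is sketched below, under assumed details: HMAC-SHA256 stands in for the MAC, and the three-node chain, edge keys, and routing table are toy values, not part of the protocol specification.

import hmac, hashlib

w = 2                                   # route length (the mixing time in Sybil Limit)

def mac(edge_key, i, ks):
    return hmac.new(edge_key, f"{i}||{ks}".encode(), hashlib.sha256).digest()

# Toy chain 0 - 1 - 2; each neighbor pair shares an out-of-band edge key.
edge_key = {frozenset({0, 1}): b"k01", frozenset({1, 2}): b"k12"}
routing = {1: {0: 2}}                   # node 1 routes a message arriving from 0 out to 2
registry = {}                           # directed edge (A, B) -> registered public key

def deliver(a, b, i, ks, tag):
    """Node b receives (i, Ks, MAC) from neighbor a (steps 1-6 of Figure 2)."""
    if tag != mac(edge_key[frozenset({a, b})], i, ks) or not (1 <= i <= w):
        return                          # step 1: discard the message
    if i == w:
        registry[(a, b)] = ks           # steps 2-3: record Ks under edge "KA -> KB"
    else:
        c = routing[b][a]               # step 5: consult b's routing table
        deliver(b, c, i + 1, ks, mac(edge_key[frozenset({b, c})], i + 1, ks))

# Suspect S = node 0 starts a route with its public key and counter 1 (steps 1-2).
deliver(0, 1, 1, "Ks_of_S", mac(edge_key[frozenset({0, 1})], 1, "Ks_of_S"))
print(registry)                         # {(1, 2): 'Ks_of_S'}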
Every node along the route increments the counter and forwards the message until the counter reaches w, the length of a random route. Sybil Limit's end guarantees hold even if Sybil nodes on the route modify the message. In Sybil Limit, w is chosen to be the mixing time of the honest region of the social network. All these random routes need to be performed only once (until the social network changes), and the relevant information will be recorded.

C. Verification Protocol
After the secure random route protocol stabilizes, a verifier can invoke the verification protocol in Figure 3 to determine whether to accept a suspect S. The intersection condition requires that S's tails and V's tails must intersect (the instance number is ignored when determining intersection), with S being registered at the intersecting tail. In contrast, Sybil Guard has an intersection condition on nodes (instead of on edges or tails). For the balance condition, V maintains r counters corresponding to its r tails. Every accepted suspect increments the "load" of some tail. The balance condition requires that accepting S should not result in a large "load spike" and cause the load on any tail to exceed h.max(log r, a). Here a is the current average load across all V's tails, and h is some universal constant that is not too small. In comparison, Sybil Guard does not have any balance condition. The verification protocol can be made highly efficient. The adversary may intentionally introduce additional intersections in the Sybil region between S's and V's escaping tails.

D. Estimating The Number Of Routes Needed
Sybil Limit uses a novel and perhaps counterintuitive benchmarking technique [9] to address the number-of-routes problem by mixing the real suspects with some random benchmark nodes [9] that are already known to be mostly honest. Every verifier V maintains two sets of suspects: the benchmark set K and the test set T. The benchmark set is constructed by repeatedly performing random routes of length w and then adding the ending node (called the benchmark node) to K. Let K+ and K- be the sets of honest and Sybil suspects in K, respectively. Sybil Limit does not know which nodes in K belong to K+. However, a key property here is that because the escaping probability [2] of such random routes is o(1), even without invoking Sybil Limit, we are assured that |K-| / |K| = o(1). The test set T contains the real suspects that V wants to verify, which may or may not happen to belong to K. We similarly define T+ and T-. Our technique will hinge upon the adversary not knowing K+ or T- even though it may know K+ U T+ and K- U T-.

To estimate r, a verifier V starts from r = 1 and then repeatedly doubles r. For every r value, V verifies all suspects in K and T. It stops doubling when most of the nodes in K are accepted, and then makes a final determination for each suspect in T. The benchmarking technique may appear counterintuitive in two aspects.
1. S sends to V its public key Ks and S's set of tails {(j, KA, KB) | S's tail in the jth s-instance is the edge "A -> B", and KA (KB) is A's (B's) public key};
2. V computes the set of intersecting tails X = {(i, KA, KB) | (i, KA, KB) is V's tail and (j, KA, KB) is S's tail};
3. For every (i, KA, KB) in X, V authenticates B using KB and asks B whether S is registered under "KA -> KB"; if not, remove (i, KA, KB) from X;
4. If X is empty, then reject S and return;
5. Let a = (1 + sum of ci for i = 1..r) / r and b = h.max(log r, a);
6. Let cmin be the smallest counter among those ci's corresponding to the (i, KA, KB) that still remain in X;
7. If (cmin + 1) > b, then reject S; otherwise, increment cmin and accept S;

Figure 3. Verification protocol.
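The acceptance logic of Figure 3 can be sketched as follows (an illustration, not the authors' code; the data layout, the toy demo values, and the value of the constant h are assumptions).

import math

H = 4                                          # the universal constant h (assumed value)

def verify(v_tails, v_loads, s_tails, registered, ks):
    """v_tails: V's (instance, KA, KB) tails; v_loads: the r load counters;
    registered: (KA, KB) edge -> set of public keys registered there."""
    r = len(v_tails)
    # Step 2: intersect V's tails with S's tails (S's instance number ignored).
    x = [i for i, (inst, ka, kb) in enumerate(v_tails)
         if any((ka, kb) == (sa, sb) for (_, sa, sb) in s_tails)]
    # Step 3: keep only intersecting tails where S's key is actually registered.
    x = [i for i in x if ks in registered.get(v_tails[i][1:], set())]
    if not x:
        return False                           # step 4: no intersection -> reject
    # Steps 5-7: the balance condition.
    a = (1 + sum(v_loads)) / r
    b = H * max(math.log(r), a)
    i_min = min(x, key=lambda i: v_loads[i])
    if v_loads[i_min] + 1 > b:
        return False                           # accepting S would overload a tail
    v_loads[i_min] += 1
    return True

loads = [0, 0]
reg = {("K1", "K2"): {"Ks"}}
print(verify([(1, "K1", "K2"), (2, "K3", "K4")], loads,
             [(1, "K1", "K2")], reg, "Ks"))   # True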
First, if Sybil Limit uses an underestimated r, it will be the adversary that helps it to accept most of the honest nodes. Second, the benchmark set K is itself a set with an o(1) fraction of Sybil nodes, so it may appear that an application could just use the nodes in K directly and avoid the full Sybil Limit protocol.
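The doubling search for r described in Section IV-D can be sketched as follows (illustrative only: accepts(r, suspect) stands for running the Figure 3 verification with r routes, and the 0.95 threshold is an assumed reading of "most").

def estimate_r(benchmark_set, test_set, accepts, threshold=0.95):
    """Double r until most of the benchmark set K is accepted, then make
    a final determination for every suspect in the test set T."""
    r = 1
    while True:
        accepted_k = sum(1 for s in benchmark_set if accepts(r, s))
        if accepted_k >= threshold * len(benchmark_set):
            break
        r *= 2
    return r, {s: accepts(r, s) for s in test_set}

# Toy stand-in: suspects are numbers, and a suspect is accepted once r
# reaches its (made-up) difficulty value.
accepts = lambda r, s: r >= s
print(estimate_r(benchmark_set=[1, 2, 4, 4], test_set=[3, 64], accepts=accepts))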
V. Evaluation
Our experiments mainly serve to validate such an assumption, based on real-world social networks. Such validation has a more general implication beyond Sybil Limit: these results will tell us whether the approach of leveraging social networks to combat Sybil attacks is valid. A second goal of our experiments is to gain a better understanding of the hidden constant in Sybil Limit's O(log n) guarantee.

Figure 4. Result of Sybil attack.

To bound the number of Sybil nodes, we use Java and JavaFX to demonstrate our Sybil guarantees. JavaFX is used for the complete graphical user interface design. Sybil Guard uses registry tables and witness tables. Registry tables ensure that each node registers with the nodes on its random routes. The witness table is propagated and updated in a similar fashion as the registry table, except that it propagates "backward." This process is used to verify the receiver node and needs to perform an intersection between each of its random routes. It reduces communication overhead. When a node interacts with another node, it always authenticates that node by requiring it to sign every message sent, using its private key. In Sybil Guard, a node communicates with other nodes only when (i) it tries to verify another node, and hence needs to contact the intersection nodes of the random routes, and (ii) it propagates its registry and witness tables to its neighbors. It also has a mechanism that allows a node to bypass offline nodes when propagating registry and witness
tables, the social network may change again. Thus, it is helpful to consider it as a decentralized, background stabilization process.

Figure 5. Probability of routes remaining entirely within the honest region.

Figure 6. Probability of an honest node accepting another honest node.

Figure 5 shows the probability that the majority of an honest node's routes remain entirely in the honest region. As we can see from Figure 5, the probability [2] is always almost 100% before g = 2000, and only drops to 99.8% when g = 2500. This means that even with 2500 attack edges, only 0.2% of the nodes are not protected by Sybil Guard. These are mostly nodes adjacent to multiple attack edges; in some sense, these nodes are "paying the price" for being friends of Sybil attackers. For the 10000-node topology and the 100-node topology, g = 204 and g = 11 will result in 0.4% and 5.1% of nodes unprotected, respectively. For better understanding, Figure 5 also includes a second curve showing the probability of a single route remaining entirely in the honest region.

Figure 6 presents the probability of V accepting S, as a function of the number of attack edges g. This probability [2] is still 99.8% with 2500 attack edges, which is quite satisfactory. The case without using redundancy is much worse (even if we seek only a single intersection), demonstrating that exploiting redundancy is necessary. For our 10000-node topology and 100-node topology, g = 204 and g = 11 give probabilities of 99.6% and 87.7%, respectively. Notice that an 87.7% probability does not mean that 12.3% of the nodes will not be accepted by the system; it only means that, given a verifier, 12.3% of the nodes will not be accepted by that verifier. Each honest node, on average, should still be accepted by 87.7% of the honest nodes (verifiers).

VI. Concluding Remarks
This paper presented Sybil Guard, a near-optimal defense against Sybil attacks using social networks. Sybil Guard's improvement derives from the combination of multiple novel techniques: 1) leveraging multiple independent instances of the random route protocol to perform many short random routes; 2) exploiting intersections on edges instead of nodes; 3) using the novel balance condition to deal with escaping tails of the verifier; and 4) using the novel benchmarking technique to safely estimate r. Finally, our results on real-world social networks confirmed their fast-mixing property and, thus, validated the fundamental assumption behind Sybil Limit's (and Sybil Guard's) approach. As future work, we intend to implement Sybil Limit within the context of some real-world applications and demonstrate its utility.

References
[1] G. Danezis and P. Mittal, "SybilInfer: Detecting sybil nodes using social networks," presented at the NDSS, 2009.
[2] M. Mitzenmacher and E. Upfal, Probability and Computing. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[3] J. Douceur, "The Sybil attack," in Proc. IPTPS, 2002, pp. 251-260.
[4] N. B. Margolin and B. N. Levine, "Informant: Detecting sybils using incentives," in Proc. Financial Cryptography, 2007, pp. 192-207.
[5] J. Newsome, E. Shi, D. Song, and A. Perrig, "The Sybil attack in sensor networks: Analysis & defenses," in Proc. ACM/IEEE IPSN, 2004, pp. 259-268.
[6] B. Pretre, "Attacks on Peer to Peer Networks," Semester Thesis.
[7] N. Tran, B. Min, J. Li, and L. Subramanian, "Sybil-resilient online content voting," in Proc. USENIX NSDI, 2009, pp. 15-28.
[8] H. Yu, C. Shi, M. Kaminsky, P. B. Gibbons, and F. Xiao, "DSybil: Optimal sybil-resistance for recommendation systems," in Proc. IEEE Symp. Security Privacy, 2009, pp. 283-298.
[9] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, "SybilLimit: A near-optimal social network defense against sybil attacks," IEEE/ACM Trans. Netw., vol. 18, no. 3, Jun. 2010.
REDUNDANCY CHECK ARCHITECTURE
*Anand.V.J, **Selvakumar.V.S
*vjanandbe_08@hotmail.com
**selvakumar.vs@rajalakshmi.edu.in
Department Of ECE
Rajalakshmi Engineering College
Thandalam, Chennai-602105, India
Abstract— Cyclic redundancy check (CRC) is widely used to detect errors in data communication and storage devices. When high-speed data transmission is required, the general serial implementation cannot meet the speed requirements. Since parallel processing is a very efficient way to increase the throughput rate, a parallel CRC implementation is used in the proposed method. Here we use a 32-bit CRC architecture for rendering better performance. The CRC generator polynomial G(x) = 1 + x + x^2 + x^4 + x^5 + x^7 + x^9 + x^10 + x^11 + x^12 + x^16 + x^22 + x^23 + x^26 + x^32 is used because of its better Hamming Distance (HD) value and its features. A high-speed VLSI-based architecture is used to obtain a low-area, low-power CRC architecture. The proposed method checks for the better parallel 32-bit CRC architecture, with minimum iteration bound and improved throughput along with a smaller increase in area when compared with previously proposed parallel CRC architectures. Xilinx and ModelSim are used to analyze the operating frequency and the LUTs used.

Keywords— Cyclic Redundancy Check (CRC), LFSR, VLSI, Verilog, Digital logic.

I. INTRODUCTION

Cyclic Redundancy Check (CRC) [1] has been widely used in data communication and storage devices as a powerful tool for dealing with data errors. It can also be applied to testing integrated circuits and detecting logic faults. When high-speed data transmission is required, the general serial implementation cannot meet the speed requirement, whereas parallel processing is a very efficient way to increase the throughput rate. Although [3],[4] parallel processing increases the number of message bits that can be processed in one clock cycle, it can also lead to a long critical path (CP); thus, the increase of throughput rate that is achieved by parallel processing will be reduced by the decrease of circuit speed. Another issue is the increase of hardware cost caused by parallel processing, which needs to be controlled. Here we also come across issues like the iteration bound bottleneck and the fan-out bottleneck.

CRC implementations can use either hardware or software methods. The factors on which a code is selected depend on the amount of protection needed, the overhead involved, the implementation cost, the error control strategy, and the nature of errors. The ARQ (automatic repeat request) control strategy is used with CRC, which uses error detection along with retransmission. This provides better protection with fewer check bits than needed for error correction; it is found that the probability of error in error detection is lower compared to error correction. For any given message, CRC can detect the following types of errors:

• All single-bit errors.
• All double-bit errors, as long as the generator polynomial has a factor with at least three terms.
• All odd-bit errors, as long as the generator polynomial contains the factor (x + 1).
• Any burst error for which the length of the burst is less than the length of the CRC.
• Larger burst errors.
• Data misordering detection.

Hardware implementation on VLSI is preferred for several reasons:
• Bit-wise software implementation suffers from slow processing, limited application to lower encoding rates, and a large delay before delivering the data.
• Byte-wise software takes considerable CPU time and requires large memory for the processing.
• Hardware implementations, in contrast, are simple, fast, and easy to realize.

The proposed design starts from the LFSR used in serial CRC. An unfolding algorithm [2] is used to realize parallel processing. Direct application of this algorithm may yield a parallel circuit with a large iteration bound [2], so delay elements are added (pipelining) so as to achieve the minimum critical path. The critical path (CP) of a data flow graph (DFG) is the path with the longest computation time among all paths that contain zero delays. To achieve high speed, the length of the CP must be reduced by pipelining and parallel processing. Finally, a retiming algorithm is applied to obtain the lowest achievable CP.

The article is structured as follows. Section II illustrates the key factors in CRC. Section III briefs the related works on parallel CRC. In Section IV, the methods used to reduce the critical path are discussed. Finally, in Section V, the results are analyzed.

II. CYCLIC REDUNDANCY CHECK

The CRC can be briefly explained as follows. Let us suppose that a transmitter, Tx, sends a sequence, S1, of k bits {a0, a1, ..., ak-1} to a receiver, Rx. At the same time, Tx generates another sequence, S2, of m bits, {b0, b1, ..., bm-1}, to allow the receiver to detect possible errors. The sequence S2 is commonly known as a Frame Check Sequence (FCS). It is generated by taking into account the fact that the complete sequence, S = S1 U S2, obtained by concatenating S1 and S2, has the property that it is divisible (following a particular arithmetic) by some predetermined sequence P, {p0, p1, ..., pm}, of m+1 bits. After Tx sends S to Rx, Rx divides S (i.e., the message and the FCS) by P, using the same particular arithmetic, after it receives the message. If there is no remainder, Rx assumes there was no error.

The product operator is accomplished by a bitwise AND, whereas both sum and subtraction are accomplished by bitwise XOR operators. A CRC circuit can easily be realized as a special shift register, called an LFSR, which is used by both transmitter and receiver. On the transmitter side, the dividend is the sequence S1 concatenated with a sequence of m zeros to the right, and the divisor is P. On the receiver side, the dividend is the received sequence and the divisor is the same P.

One possible LFSR [4] is shown in Figure 1. Its m FFs have a common clock and clear signal. The input x'i of the ith FF is obtained by taking an XOR of the (i-1)th FF output and a term given by the logical AND between pi and xm-1. The signal x'0 is obtained by taking an XOR of the input d and xm-1. If pi is zero, only a shift operation is performed (i.e., the XOR related to x'i is not required); otherwise, the feedback xm-1 is XOR-ed with xi-1. We point out that the AND gates in Figure 1 are unnecessary if the divisor P is time-invariant.
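The serial division just described can be modeled in a few lines of software (a sketch for intuition only; the paper's implementation is Verilog hardware, not Python).

# Software model of the serial LFSR division described above.
def crc_lfsr(message_bits, p):
    """message_bits: list of 0/1 (S1, MSB first); p: divisor bits p0..pm as a
    list of length m+1, e.g. x^4 + x^3 + x + 1 -> [1, 1, 0, 1, 1]."""
    m = len(p) - 1
    reg = [0] * m                          # the m flip-flops, cleared
    for d in message_bits + [0] * m:       # send S1, then m zero bits through d
        feedback = reg[m - 1]              # x_{m-1} before the shift
        # x'_i = x_{i-1} XOR (p_i AND x_{m-1}); x'_0 = d XOR x_{m-1}
        for i in range(m - 1, 0, -1):
            reg[i] = reg[i - 1] ^ (p[i] & feedback)
        reg[0] = d ^ feedback
    return reg                             # the FCS sits in the flip-flops

# G(x) = x^4 + x^3 + x + 1 (the Figure 2 polynomial) on an 8-bit message:
print(crc_lfsr([1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 1]))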
Figure 1: LFSR architecture.

The sequence S1 is sent serially to the input d of the circuit, starting from the most significant bit, b0. Let us suppose that the k bits of the sequence S1 are an integral multiple of m, the degree of the divisor P. The process begins by clearing all FFs. Then, all k bits are sent, once per clock cycle. Finally, m zero bits are sent through d. In the end, the FCS appears at the outputs of the FFs.

Figure 2: LFSR for the polynomial G(x) = x^4 + x^3 + x + 1.

III. RELATED WORKS

Parallel CRC hardware is attractive because it processes the message in blocks. The main works on parallel CRC in the literature are given below.
• In 1990, Albertengo and Sisto [5] proposed an interesting analytical approach. Their idea was to apply digital filter theory to the classical CRC circuit. They derived a method for determining the logic equations for any generator polynomial. Their formalization is based on a z-transform. To obtain the logic equations, many polynomial divisions are needed; thus, it is not possible to write synthesizable VHDL code that automatically generates the equations for parallel CRCs.
• In 1996, Braun et al. [6] presented an approach suitable for FPGA implementation. A very complex analytical proof is presented. They developed a special heuristic logic minimization to compute CRC checksums on FPGAs in parallel.
• In 2001, Sprachmann [7] implemented parallel CRC circuits of LFSR2. He proposed interesting parametric VHDL codes. The derivation is valid for any polynomial and data width w, but the equations are not so optimized.
• In [8], a software-based parallel CRC algorithm called 'N-byte RCC (repetition of computation and combination)' is proposed. This method iteratively processes the message by 'slicing-by-4' and combines the results through 'zero block lookup tables'. It can parallelize the CRC calculation across any number of processors.
• [9] deals with algorithms that can ideally read arbitrarily large amounts of data at a time, while optimizing their memory requirement to meet the constraints of specific computer architectures. The algorithms can also create an arbitrary number of slices in each step of their execution.
• In [10], Chao Cheng and Keshab K. Parhi propose an improved three-step LFSR architecture with both higher hardware efficiency and speed. Generator polynomials for the first and third steps are constructed with iterated small-length polynomials, which can in turn be easily handled by a look-ahead pipelining algorithm. Efficient high-speed parallel LFSR structures must address two
important issues: the large fan-out bottleneck and the iteration bound bottleneck.

IV. METHODS TO REDUCE ITERATION BOUNDS

A. Pipelining

Pipelining [2] leads to a reduction in the critical path (CP), which can be exploited either to increase the clock speed or sample speed, or to reduce power consumption at the same speed. In parallel processing, multiple outputs are computed in parallel in a clock period; therefore, the effective speed is increased by the level of parallelism. Similar to pipelining, parallel processing can also be used to reduce power consumption. Pipelined processing involves inter-processor communication. By inserting latches or registers between combinational logic circuits, the critical path can be shortened.

B. Parallel Processing

Parallel processing and pipelining techniques are duals of each other: if a computation can be pipelined, it can also be processed in parallel. Both of them exploit the concurrency available in the computation, in different ways. Consider a single-input single-output (SISO) FIR filter:

y(n) = a*x(n) + b*x(n-1) + c*x(n-2)    (1)

We have to convert the SISO system into a MIMO (multiple-input multiple-output) system in order to obtain a parallel processing structure. For example, to get a parallel system with 3 inputs per clock cycle (i.e., level of parallel processing L = 3):

y(3k)   = a*x(3k)   + b*x(3k-1) + c*x(3k-2)    (2)
y(3k+1) = a*x(3k+1) + b*x(3k)   + c*x(3k-1)    (3)
y(3k+2) = a*x(3k+2) + b*x(3k+1) + c*x(3k)      (4)

Here k denotes the clock cycle.
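A direct software model of the L = 3 block processing in (2)-(4) is sketched below: it consumes three inputs and emits three outputs per loop iteration (one "clock cycle"), which is exactly the MIMO reformulation described above (illustrative Python, with made-up coefficient values).

def fir3_parallel(x, a, b, c):
    """y(n) = a*x(n) + b*x(n-1) + c*x(n-2), computed 3 samples per cycle."""
    y, x1, x2 = [], 0, 0               # x1 = x(n-1), x2 = x(n-2), zero initial state
    for k in range(0, len(x) - 2, 3):  # one loop body = one clock cycle
        x0, xa, xb = x[k], x[k + 1], x[k + 2]
        y.append(a * x0 + b * x1 + c * x2)   # y(3k)
        y.append(a * xa + b * x0 + c * x1)   # y(3k+1)
        y.append(a * xb + b * xa + c * x0)   # y(3k+2)
        x1, x2 = xb, xa                      # carry the filter state into the next cycle
    return y

print(fir3_parallel([1, 2, 3, 4, 5, 6], a=1, b=2, c=3))
# [1, 4, 10, 16, 22, 28]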
Figure 3. Block processing.

Pipelining can be used only to the extent that the critical path computation time is limited by the communication (or I/O) bound. Once this is reached, pipelining can no longer increase the speed. In such cases, pipelining can be combined with parallel processing to further increase the speed of the DSP system. By combining parallel processing (block size L) and pipelining (M pipelining stages), the sample period can be reduced to

T_sample = T_clk / L = T_seq / (L x M)    (5)

where T_seq is the computation time of the original sequential critical path.

C. Retiming

After we apply pipelining to the original serial CRC architecture, the minimum achievable CP (iteration bound) of the unfolded CRC architecture is reduced. Retiming [2] is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit. Retiming can be used to increase the clock rate of a circuit by reducing the computation time of the critical path. Retiming can also be used to
reduce the number of registers in a circuit, and to reduce the power consumption of a circuit by reducing switching, which contributes to the dynamic power consumption in static CMOS circuits. Figure 4 shows the overall block diagram of the proposed design.

Figure 4. Block diagram of CRC based on pipelining, unfolding and retiming.

V. CRC PERFORMANCE ANALYSIS

The serial architecture of the 32-bit CRC has been described in Verilog and synthesized using the Xilinx Spartan-3E and ModelSim tools. Table 1 shows the device utilization summary of the architecture, and Table 2 the timing summary (speed grade) of the architecture.

TABLE I. DEVICE UTILIZATION SUMMARY

Parameter                  | Value
Number of slices           | 18 out of 4656
Number of slice flip-flops | 32 out of 9312
Number of 4-input LUTs     | 15 out of 9312
Number of IOs              | 35
Number of bonded IOBs      | 35 out of 190
Number of GCLKs            | 1 out of 24

TABLE II. TIMING SUMMARY

Parameter                                 | Value
Minimum period                            | 2.699 ns
Minimum input arrival time before clock   | 4.515 ns
Maximum output required time after clock  | 4.880 ns
Clock period                              | 2.699 ns
Maximum frequency                         | 370.508 MHz
Total number of paths                     | 45/32

Figure 5. RTL diagram of the 32-bit CRC.

VI. CONCLUSION

Cyclic redundancy check (CRC) is an error detecting code that is widely used to detect corruption in blocks of data that have been transmitted or stored. The number of errors detected depends on the generator polynomial used. Due to technology development, higher-degree generator polynomials are to be used, so a better architecture is required to complete the operation in the required time; for error detection, the time consumed by the operation is a major constraint. Here a 32-
bit CRC is taken for high-performance operations and faster applications. Here we have chosen a serial implementation of the 32-bit CRC. The frequency of operation and the area usage of the serial architecture are analyzed: the 32-bit serial CRC uses 18 slices, and the frequency of operation is found to be 370.508 MHz. Hence the proposed method provides an increase in clock rate and better performance within less area. Depending on the required reduction in the iteration bound, we will perform three- or four-level pipelining. Care has to be taken not to increase the area of the architecture along with pipelining, as pipelining reduces the critical path (CP) by adding delay elements, which increases the size of the architecture.

REFERENCES
[1] T. V. Ramabadran and S. S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, vol. 8, no. 4, pp. 62-75, Aug. 1988.
[2] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. Hoboken, NJ: Wiley, 1999.
[3] T.-B. Pei and C. Zukowski, "High-speed parallel CRC circuits in VLSI," IEEE Trans. Commun., vol. 40, no. 4, pp. 653-657, Apr. 1992.
[4] G. Campobello, G. Patané, and M. Russo, "Parallel CRC realization," IEEE Trans. Comput., vol. 52, no. 10, pp. 1312-1319, Oct. 2003.
[5] G. Albertengo and R. Sisto, "Parallel CRC generation," IEEE Micro, vol. 10, no. 5, pp. 63-71, Oct. 1990.
[6] M. Braun et al., "Parallel CRC computation in FPGAs," in Proc. Workshop on Field Programmable Logic and Applications, 1996.
[7] M. Sprachmann, "Automatic generation of parallel CRC circuits," IEEE Design and Test of Computers, May 2001.
[8] Y. Do, S.-R. Yoon, T. Kim, K. E. Pyun, and S.-C. Park, "High-speed parallel architecture for software-based CRC," in Proc. IEEE CCNC, 2008.
[9] M. E. Kounavis and F. L. Berry, "Novel table lookup-based algorithms for high-performance CRC generation," IEEE Trans. Comput., vol. 57, no. 11, pp. 1550-1560, Nov. 2008.
[10] C. Cheng and K. K. Parhi, "High-speed VLSI architecture for general linear feedback shift register (LFSR) structures," IEEE, 2009.
[11] K. K. Parhi, "Eliminating the fanout bottleneck in parallel long BCH encoders," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 51, no. 3, pp. 512-516, 2004.
[12] A. Tanenbaum, Computer Networks, 4th ed. Prentice Hall, 2003.
[13] P. Koopman, "32-bit cyclic redundancy codes for Internet applications," in Proc. Int. Conf. Dependable Systems and Networks (DSN '02), Washington, DC, USA: IEEE Computer Society, 2002, pp. 459-468.
[14] P. Koopman and T. Chakravarty, "Cyclic redundancy code (CRC) polynomial selection for embedded networks," in Proc. Int. Conf. Dependable Systems and Networks (DSN '04), Washington, DC, USA: IEEE Computer Society, 2004, pp. 145-154.
MODELING BOTNET PROPAGATION FOR DETECTING BOTMASTERS

*J. LINGESAN, **R. KANNAMMA
M.E. – COMPUTER SCIENCE AND ENGINEERING
PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
TIRUVALLUR – 602 025
lingesan.j@gmail.com, kannamma.sridharan@gmail.com
Abstract— A "botnet" consists of a network of compromised computers controlled by an attacker ("botmaster"). Recently, botnets have become the root cause of many Internet attacks. To be well prepared for future attacks, it is not sufficient to study how to detect and defend against the botnets that have appeared in the past. More importantly, we should study advanced botnet designs that could be developed by botmasters in the near future. In this paper, we present the design of an advanced hybrid peer-to-peer botnet. Compared with current botnets, the proposed botnet is harder to shut down, monitor, and hijack. It provides robust network connectivity, individualized encryption and control traffic dispersion, limited botnet exposure by each bot, and easy monitoring and recovery by its botmaster. Our enhancement is to defend against such an advanced botnet. We secure the data in every bot by requiring a session key for viewing that data. This key changes at every transaction, and the key changing mechanism is controlled and issued by the authorization person. For every transaction, the receiver must register with the authorization person; after registration, the authorization person provides the session key to the requesting receiver. At the time of receiver node registration, the authorization person can read the connected nodes' information. In this way we can trace the transactions, which is used to find botmasters.

Index Terms— Botnet, Botmaster, Bot, Honeypot

1. INTRODUCTION

Internet malware attacks have evolved into better-organized and more profit-centered endeavors. E-mail spam, extortion through denial-of-service attacks, and click fraud represent a few examples of this emerging trend. "Botnets" are a root cause of these problems. A "botnet" consists of a network of compromised computers ("bots") connected to the Internet that is controlled by a remote attacker ("botmaster"). Since a botmaster could scatter attack tasks over hundreds or even tens of thousands of computers distributed across the Internet, the enormous cumulative bandwidth and large number of attack sources make botnet-based attacks extremely dangerous and hard to defend against. Compared to other Internet malware, the unique feature of a botnet lies in its control communication network. Most botnets that have appeared until now have had a common centralized architecture: bots in the botnet connect directly to some special hosts (called "command-and-control" servers, or "C&C" servers). These C&C servers receive commands from their botmaster and forward them to the other bots in the network. From now on, we will call a botnet with such a control communication architecture a "C&C botnet." Fig. 1 shows the basic control communication architecture for a typical C&C botnet (in reality, a C&C botnet usually has more than two C&C servers). Arrows represent the directions of network connections. As botnet-based attacks become popular and dangerous, security researchers have studied how to detect, monitor, and defend against them. Most of the current research has focused upon
the C&C botnets that have appeared in the past, especially Internet Relay Chat (IRC)-based botnets. It is necessary to conduct such research in order to deal with the threat we are facing today. However, it is equally important to conduct research on advanced botnet designs that could be developed by attackers in the near future. Otherwise, we will remain susceptible to the next generation of Internet malware attacks.

From a botmaster's perspective, the C&C servers are the fundamental weak points in current botnet architectures. First, a botmaster will lose control of her botnet once the limited number of C&C servers are shut down by defenders. Second, defenders could easily obtain the identities (e.g., IP addresses) of all C&C servers based on their service traffic to a large number of bots [7], or simply from one single captured bot (which contains the list of C&C servers). Third, an entire botnet may be exposed once a C&C server in the botnet is hijacked or captured by defenders [4]. As network security practitioners put more resources and effort into defending against botnet attacks, hackers will develop and deploy the next generation of botnets with a different control architecture.

CURRENT BOTNETS AND THEIR WEAKNESSES
• The C&C servers are the fundamental weak points in current botnet architectures.
• A botmaster will lose control of her botnet once the limited number of C&C servers are shut down by defenders.
• Defenders could easily obtain the identities (e.g., IP addresses) of all C&C servers based on their service traffic to a large number of bots, or simply from one single captured bot (which contains the list of C&C servers).
• An entire botnet may be exposed once a C&C server in the botnet is hijacked or captured by defenders.

Fig. 1. C&C architecture of a C&C botnet.

PROPOSED BOTNETS AND THEIR ADVANTAGES
• Robust network connectivity, individualized encryption and control traffic dispersion, and limited botnet exposure by each bot.
• Easy monitoring and recovery by the botmaster.
• No bootstrap procedure.
• Each bot has a peer list to communicate.
• A report command to communicate.
• An update command to contact a sensor host to update the bot's peer list.
• Bots with static IP addresses are candidates for being in peer lists.
• A servent bot listens on a self-defined port and uses an appropriate key for incoming traffic.

2. RELATED WORKS
[1] Global Internet threats are undergoing a profound transformation from attacks designed solely to disable infrastructure to those that also target people and organizations. Behind these new attacks is a large pool of compromised hosts sitting in homes, schools, businesses, and governments around the world. These systems are infected with a bot that communicates with a bot controller and other bots to form what is commonly referred to as a zombie army or botnet. Botnets are a very real and quickly evolving problem that is still not well understood or studied.

[2] Time zones play an important and unexplored role in malware epidemics. To understand how time and location affect malware spread dynamics, we studied botnets, or large coordinated collections of victim machines (zombies) controlled by attackers. Over a six-month period we observed dozens of botnets representing millions of victims. We noted diurnal properties in botnet activity, which we suspect occur because victims turn their computers off at night. Through binary analysis, we also confirmed that some botnets demonstrated a bias in infecting regional populations. Clearly, computers that are offline are not infectious, and any regional bias in infections will affect the overall growth of the botnet. We therefore created a diurnal propagation model. The model uses diurnal shaping functions to capture regional variations in online vulnerable populations.

[3] Denial-of-Service (DoS) attacks pose a significant threat to the Internet today, especially if they are distributed, i.e., launched simultaneously at a large number of systems. Reactive techniques that try to detect such an attack and throttle down malicious traffic prevail today but usually require an additional infrastructure to be really effective. In this paper we show that preventive mechanisms can be as effective with much less effort. DoS attack prevention is based on the observation that coordinated automated activity by many hosts needs a mechanism to remotely control them.

[4] Recent denial-of-service attacks are mounted by professionals using botnets of tens of thousands of compromised machines. To circumvent detection, attackers are increasingly moving away from bandwidth floods to attacks that mimic the Web browsing behaviour of a large number of clients and target expensive higher-layer resources such as CPU, database, and disk bandwidth. The resulting attacks are hard to defend against using standard techniques, as the malicious requests differ from the legitimate ones in intent but not in content.

[5] Botnets, networks of (typically compromised) machines, are often used for nefarious activities (e.g., spam, click fraud, denial-of-service attacks, etc.). Identifying members of botnets could help stem these attacks, but passively detecting botnet membership (i.e., without disrupting the operation of the botnet) proves to be difficult. This paper studies the effectiveness of monitoring lookups to a DNS-based black hole list (DNSBL) to expose botnet membership.

3. BOTNET ARCHITECTURE

3.1 Two Classes of Bots
The bots in the proposed P2P botnet are classified into two groups. The first group contains bots that have static, non-private IP addresses and are accessible from the global Internet. Bots in the first group are called servent bots, since they behave as both clients and servers. The second group contains the remaining bots, including 1) bots with dynamically allocated IP addresses, 2) bots with private IP addresses, and 3) bots behind firewalls such that they cannot be connected from the global Internet. The second group of bots is called client bots, since they will not accept incoming connections.

Fig. 2. C&C architecture of the proposed botnet.

3.2 Botnet Command and Control Architecture

Fig. 2 illustrates the C&C architecture of the proposed botnet. The illustrative botnet shown in this figure has five servent bots and three client bots. The peer list size is two (i.e., each bot's peer list contains the IP
addresses of two servent bots). An arrow from bot A to bot B represents bot A initiating a connection to bot B. This figure shows that a big cloud of servent bots interconnect with each other; they form the backbone of the control communication network of the botnet. A botmaster injects her commands through any bot(s) in the botnet. Both client and servent bots periodically connect to the servent bots in their peer lists in order to retrieve commands issued by their botmaster. When a bot receives a new command that it has never seen before (e.g., each command has a unique ID), it immediately forwards the command to all servent bots in its peer list. In addition, if it is itself a servent bot, it will also forward the command to any bots connecting to it.

3.3 Relationship between Traditional C&C Botnets and the Proposed Botnet

Compared to a C&C botnet (see Fig. 1), it is easy to see that the proposed hybrid P2P botnet shown in Fig. 2 is actually an extension of a C&C botnet. The hybrid P2P botnet is equivalent to a C&C botnet where servent bots take the role of C&C servers: the number of C&C servers (servent bots) is greatly enlarged, and they interconnect with each other. Indeed, the large number of servent bots is the primary reason why the proposed hybrid P2P botnet is very hard to shut down.

4. DETECTING BOTMASTERS

4.1 Securing Data

We can secure the data in every bot by requiring a session key for viewing that data. This key changes at every transaction, and the key changing mechanism is controlled and issued by the authorization person. For every transaction, the receiver must register with the authorization person; after registration, the authorization person provides the session key to the requesting receiver. At the time of receiver node registration, the authorization person can read the connected nodes' information. In this way we can trace the transactions, which is used to find botmasters.

4.2 Session Key Generation
Every node's data has a session key embedded; that session key is generated by the authentication server. This session key is used to open the data. It is a changeable key: at every transaction the session key is changed, and the changing mechanism is controlled by the authentication server.

4.3 Data Transformation and Botmaster Detection
At registration, the authentication server retrieves the details (own IP and ID, plus the connected node/botserver IDs and IPs) of the botserver. If the requesting botserver/node is an authorized botserver/node, the authentication server provides the session key, and the botserver/node can read the data by using the session key. When data is transmitted to a requesting botserver/node, the session key is changed for that transaction, so the receiving botserver/node needs the modified session key; the botserver/node can then register with the authentication server for the modified session key. If the requesting botserver/node is not an authorized botserver/node, that botserver/node is a botmaster.
[Flow chart: a botserver/node registers with the authentication server; if the requesting botserver/node is authorized, the server provides the session key and the data can be read; the session key is changed at every transaction, so a receiving botserver/node must re-register for the modified key; if the requester is not authorized, it is a botmaster.]
4.4 ALGORITHM
A key agreement protocol is utilized in a network system such that two users are able to construct a shared common key. Instead of tamper-proof hardware, we propose a scheme to solve the problem of supervising secure communication in a level-based hierarchy. The scheme is divided into three phases: the initialization phase, the communication phase, and the supervising phase. In the initialization phase, assume that there exists a key distribution center (KDC), which is responsible for initiating parameters and evaluating special information for the users in the system. Once a user enrolls in the system, the KDC provides him (her) with an ID number and a secret key. In the communication phase, two users at the same security level can communicate securely with each other; they can generate the common session key by using the RNA scheme. For supervision by a higher-level user, the communicating users are required to transmit the encrypted messages to him (her). In the supervising phase, a higher-level user needs to derive the lock value and use it to compute the secret value of the two communicating users. Afterward, he (she) can derive the session key of the two users and then supervise the communications. For the level-based hierarchy, any two users at the same security level can efficiently establish a session key shared by them and apply it to encrypt (decrypt) the communication messages; users at a higher security level can efficiently derive the session key and supervise the communication. We propose a scheme based on software computation without any assumption of tamper-proof hardware, and the software technique offers more flexibility in real applications.

Random Number Access Algorithm
In the random number access (RNA) algorithm, dynamically many accesses can be performed. The RNA algorithm supports dynamic accessing in the encryption process of the session system. In this context, the Encrypting File System (EFS) is a feature of the Windows 2000 operating system that lets any file or folder be stored in encrypted form and decrypted only by an individual user and an authorized recovery agent. EFS is especially useful for mobile computer users, whose computers (and files) are subject to physical theft, and for storing highly sensitive data. EFS simply makes encryption an attribute of any file or folder. To store and retrieve a file or folder, a user must request a key from a program that is built into Windows 2000. Although an encrypting file system has existed in, or been an add-on to, other operating systems, its inclusion in Windows 2000 is expected to bring the idea to a larger audience. Encryption is the process of scrambling data so as to render it unreadable to all but the holder of the correct decryption key. In the encryption and decryption process, the RNA algorithm is used for dynamic accessing of the session key.

4.5 Elliptic Curve Cryptography (ECC)
ECC is an approach to public-key cryptography based on the mathematics of elliptic curves. The use of elliptic curves in cryptography was suggested independently by Neal Koblitz and Victor Miller in 1985. The main benefit of ECC is that under certain situations it uses smaller keys than other methods, such as RSA, while providing an equivalent or higher level of
security. One drawback, however, is that encrypting and decrypting in elliptic curve cryptosystems may take longer than in other cryptosystems. There are several slightly different versions of elliptic curve cryptography, all of which rely on the widely believed difficulty of solving the discrete logarithm problem for the group of an elliptic curve over some finite field. The most popular finite fields for this are the integers modulo a prime number, GF(p), or a Galois field of characteristic two, GF(2^m). Galois fields whose size is a power of some other prime have also been proposed, but are considered a bit dubious among cryptanalysts.

Given an elliptic curve E and a field GF(q), we consider the abelian group of rational points E(q) of the form (x, y), where both x and y are in GF(q) and where the group operation "+" is defined on this curve in the standard way. We then define a second operation "*": Z x E(q) -> E(q): if P is some point in E(q), then we define 2*P = P + P, 3*P = 2*P + P = P + P + P, and so on. Note that given integers j and k, j*(k*P) = (j*k)*P = k*(j*P). The elliptic curve discrete logarithm problem (ECDLP) is then to determine the integer k, given points P and Q, and given that k*P = Q.

It is believed that the usual discrete logarithm problem over the multiplicative group of a finite field (DLP) and the ECDLP are not equivalent problems, and that the ECDLP is significantly more difficult than the DLP. In cryptographic use, a specific base point G is selected and published for use with the curve E(q). A private key k is selected as a random integer, and the value P = k*G is published as the public key (note that the purported difficulty of the ECDLP implies that k is hard to determine from P). If Alice and Bob have private keys kA and kB and public keys PA and PB, then Alice can calculate kA*PB = (kA*kB)*G, and Bob can compute the same value as kB*PA = (kB*kA)*G. This allows the establishment of a "secret" value that both Alice and Bob can easily compute, but which is difficult for any third party to derive. In addition, Bob does not gain any new knowledge about kA during this transaction, so Alice's private key remains private. The actual methods used to then encrypt messages between Alice and Bob based on this secret value are adaptations of older discrete logarithm cryptosystems, including Diffie-Hellman, the ElGamal discrete log cryptosystem, and DSA.

Doing the group operations needed to run the system is slower for an ECC system than for a factorization system or a modulo-integer discrete log system of the same size. However, proponents of ECC systems believe that the ECDLP problem is significantly harder than the DLP or factorisation problems, and so equal security can be provided by much smaller key lengths using ECC, to the extent that it can actually be faster than, for instance, RSA. Published results to date tend to support this belief, but some experts are skeptical. ECC is widely regarded as the strongest asymmetric algorithm at a given key length, so it may become useful over links that have very tight bandwidth requirements.
does not gain any new knowledge about kA during this These two metric functions have clear physical
transaction, so that Alice's private key remains private. meanings. The metric C(p) shows how well a botnet
The actual methods used to then encrypt messages survives a
between Alice and Bob based on this secret value are defense action by keeping the remaining members
adaptations of older discrete logarithm cryptosystems connected together. The metric D(p) shows how
originally described for use on other groups. These densely the
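To make the k*P operation and the shared-value computation above concrete, the following self-contained Java sketch implements double-and-add scalar multiplication on a toy curve y^2 = x^3 + 2x + 3 over GF(97) and performs the Diffie-Hellman-style exchange described above. The curve parameters, base point and private keys are illustrative assumptions only (far too small for real security), not values from this paper:

import java.math.BigInteger;

public class EcdhSketch {
    static final BigInteger p = BigInteger.valueOf(97);   // toy prime field GF(97)
    static final BigInteger a = BigInteger.valueOf(2);    // curve: y^2 = x^3 + 2x + 3
    static final BigInteger THREE = BigInteger.valueOf(3);

    // A rational point (x, y); null stands for the point at infinity.
    static final class Point {
        final BigInteger x, y;
        Point(BigInteger x, BigInteger y) { this.x = x; this.y = y; }
    }

    // The group operation "+" on the curve (chord-and-tangent rule).
    static Point add(Point P, Point Q) {
        if (P == null) return Q;
        if (Q == null) return P;
        BigInteger lambda;
        if (P.x.equals(Q.x)) {
            if (P.y.add(Q.y).mod(p).signum() == 0) return null; // P + (-P) = infinity
            lambda = P.x.pow(2).multiply(THREE).add(a)          // tangent slope
                        .multiply(P.y.shiftLeft(1).modInverse(p)).mod(p);
        } else {
            lambda = Q.y.subtract(P.y)                          // chord slope
                        .multiply(Q.x.subtract(P.x).modInverse(p)).mod(p);
        }
        BigInteger x3 = lambda.pow(2).subtract(P.x).subtract(Q.x).mod(p);
        BigInteger y3 = lambda.multiply(P.x.subtract(x3)).subtract(P.y).mod(p);
        return new Point(x3, y3);
    }

    // k*P by double-and-add: the "*" operation defined in the text.
    static Point multiply(BigInteger k, Point P) {
        Point result = null;                                    // start at infinity
        for (int i = k.bitLength() - 1; i >= 0; i--) {
            result = add(result, result);                       // double
            if (k.testBit(i)) result = add(result, P);          // add
        }
        return result;
    }

    // Null-safe point equality (null = point at infinity).
    static boolean eq(Point A, Point B) {
        if (A == null || B == null) return A == B;
        return A.x.equals(B.x) && A.y.equals(B.y);
    }

    public static void main(String[] args) {
        Point G = new Point(BigInteger.valueOf(3), BigInteger.valueOf(6)); // base point on the curve
        BigInteger kA = BigInteger.valueOf(11), kB = BigInteger.valueOf(19); // private keys
        Point PA = multiply(kA, G), PB = multiply(kB, G);       // public keys
        // Alice computes kA*PB, Bob computes kB*PA: both equal (kA*kB)*G.
        System.out.println(eq(multiply(kA, PB), multiply(kB, PA)));  // prints true
    }
}

Running the main method prints true, confirming that kA*PB and kB*PA yield the same shared point, exactly as in the exchange described above; a real deployment would of course use a standardized curve and key sizes.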
5. PERFORMANCE MEASURE
Two factors affect the connectivity of a botnet: 1) some bots are removed by defenders, and 2) some bots are offline. These two factors, even though completely different, have the same impact on botnet connectivity when the botnet is used by its botmaster at a specific time. Let C(p) denote the connected ratio and D(p) denote the degree ratio after removing the top p fraction of mostly connected bots among the peer-list updating servent bots—this is the most efficient and aggressive defense that could be mounted when defenders have complete knowledge (topology, bot IP addresses, etc.) of the botnet. C(p) and D(p) are defined as

C(p) = (number of bots in the largest connected graph) / (number of remaining bots)

D(p) = (average degree of the largest connected graph) / (average degree of the original botnet)

These two metric functions have clear physical meanings. The metric C(p) shows how well a botnet survives a defense action by keeping the remaining members connected together. The metric D(p) shows how densely the remaining botnet is connected together—it exhibits the ability of the remaining botnet to survive a further removal.
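As an illustration of how C(p) and D(p) could be evaluated, the following Java sketch (our own helper, not from the paper) finds the largest connected component of the remaining botnet graph with a breadth-first search and applies the two definitions above; the adjacency map and the original average degree are assumed inputs:

import java.util.*;

public class BotnetMetrics {
    // adj: each remaining bot id -> ids of its remaining neighbors (undirected).
    // originalAvgDegree: average degree of the botnet before any removal.
    // Returns {C(p), D(p)}; assumes a non-empty remaining graph.
    static double[] connectivityMetrics(Map<Integer, List<Integer>> adj,
                                        double originalAvgDegree) {
        Set<Integer> visited = new HashSet<>();
        List<Integer> largest = new ArrayList<>();
        for (Integer start : adj.keySet()) {               // visit every component
            if (visited.contains(start)) continue;
            List<Integer> comp = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited.add(start);
            while (!queue.isEmpty()) {                     // breadth-first search
                int u = queue.poll();
                comp.add(u);
                for (int v : adj.get(u))
                    if (visited.add(v)) queue.add(v);
            }
            if (comp.size() > largest.size()) largest = comp;
        }
        long degreeSum = 0;                                // sum of degrees inside the component
        for (int u : largest) degreeSum += adj.get(u).size();
        double c = (double) largest.size() / adj.size();   // C(p)
        double avgDegree = (double) degreeSum / largest.size();
        return new double[] { c, avgDegree / originalAvgDegree };  // {C(p), D(p)}
    }
}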
6. FUTURE ENHANCEMENT
We see that the honeypot plays a critical role in most defense methods against the proposed botnet. Botmasters might design countermeasures against honeypot defense systems. Such countermeasures might include detecting honeypots based on software or hardware fingerprinting, or exploiting the legal and ethical constraints held by honeypot owners. Most current botnets do not attempt to avoid honeypots—perhaps simply because attackers have not yet felt the threat from honeypot defense. As honeypot-based defense becomes popular and widely deployed, we believe botmasters will eventually add honeypot detection mechanisms to their botnets.

The war between honeypot-based defense and honeypot-aware botnet attack will come soon and intensify in the near future. For botnet defense, current research shows that it is not very hard to monitor Internet botnets. The hard problem is how to defend against attacks sent from botnets, since it is normally very hard to shut down a botnet's control. Because of legal and ethical reasons, we as security defenders cannot actively attack or compromise a remote bot machine or a botnet C&C server, even if we are sure a remote machine is installed with a bot program. For example, the well-known "good worm" approach is not practical in the real world. The current practice of collaborating with the ISPs containing bot-infected machines is slow and resource consuming. There are still significant challenges in botnet defense research in this respect.

7. CONCLUSION
To be well prepared for future botnet attacks, we should study advanced botnet attack techniques that could be developed by botmasters in the near future. In this project we present the design of an advanced hybrid P2P botnet. Compared with current botnets, the proposed one is harder to monitor and much harder to shut down. It provides robust network connectivity, individualized encryption and control traffic dispersion, limited botnet exposure by each captured bot, and easy monitoring and recovery by its botmaster.
IMPROVING SECURITY PERFORMANCE OF MOBILE AD-HOC NETWORKS AGAINST ATTACKS

*Senthilmurugan.T, **Senthil.P, ***Manikandan.T

*Research Scholar, Vel Tech DR.RR & DR.SR Technical University, Chennai-62
senthilmuruganme@gmail.com
Mobile No.: 9176031383
**B.E(CSE), Vel Tech Engineering College, Avadi, Chennai-62
SSSENTHIL.P@gmail.com
Mobile No.: 9600685018
***B.TECH (IT), Tagore Engineering College, Rathinamangalam, Chennai-48.
manikandan.tk@gmail.com
ABSTRACT

Mobile ad hoc network (MANET) is a combination of mobile hosts that can communicate with each other, and it is a wireless network. This network depends on the mobile nodes, and there is no infrastructure: no routers, servers, access points or cables. In wireless communications, the traffic across a mobile ad hoc network can be highly vulnerable to security threats. Because of features like the unreliability of wireless links between nodes, constantly changing topology, restricted battery power, lack of centralized control and others, mobile ad hoc networks are more prone to suffer from malicious behaviors than traditional wired networks. In this paper a new combined approach is used, which combines three techniques: the principle of conservation of flow (PCF) and acknowledgements (ACK) with the Ad hoc On-demand Distance Vector protocol (AODV). This approach is used to identify and prevent malicious nodes exhibiting different network layer attacks, and it is compared with a nearest approach. The performance evaluation is done based on a few network parameters.

1. INTRODUCTION

1.1 MANET
Mobile ad hoc network (MANET) is a type of wireless network, and it is a combination of mobile hosts that can communicate with each other. This network depends on the mobile nodes, and there is no infrastructure: no routers, servers, access points or cables. Nodes or mobiles can move freely, so a node may change its location from time to time. All nodes of these networks behave as routers and take part in the discovery and maintenance of routes to other nodes in the network.

1.2 Properties of MANET Routing Protocols
Distributed operation: this is an essential property, but it should be stated nonetheless.
Loop-freedom: not required per se in light of certain quantitative measures, but generally desirable to avoid problems such as worst-case phenomena, e.g., a small fraction of packets spinning around in the network for arbitrary time periods.
Demand-based operation: instead of assuming a uniform traffic distribution within the network, the routing algorithm adapts to the traffic pattern on a demand or need basis.
Proactive operation: the flip-side of demand-based operation.
Security: without some form of network-level security, a MANET routing protocol is vulnerable to many forms of attack. This property may require close coupling with the link-layer protocol through a standardized interface.
1.3 Attacks

The inherent features of mobile ad hoc networks make them more vulnerable to a wide variety of attacks by misbehaving nodes. Such attacks can be classified as passive and active attacks. Among active attacks, we mainly consider the internal attacks on the network layer, such as the black hole attack, gray hole attack, worm hole attack, message tampering and routing attacks. A malicious node drops packets or generates additional packets solely to disrupt the network performance and prevent other nodes from accessing any network services (a denial of service attack) [2]. Misbehavior can be divided into two categories [3]: routing misbehavior (failure to behave in accordance with a routing protocol) and packet forwarding misbehavior (failure to correctly forward data packets in accordance with a data transfer protocol).

2. RELATED WORK

In this paper we focus on the latter. Our approach consists of an algorithm that performs two tasks: a) it enables packet forwarding misbehavior detection through the principle of conservation of flow (PFC), and b) it enables the prevention of nodes that are consistently detected exhibiting packet forwarding misbehavior. A node that is accused of misbehavior is denied access to the network by its peers, which ignore any of its transmission attempts. Thus, misbehaving nodes are isolated from the rest of the network.

2.1. Routing and Packet Forwarding Protection

Secure routing protocols have been proposed based on existing ad hoc routing protocols. These eliminate some of the optimizations introduced in the original routing protocols, because those optimizations can be exploited to launch different types of attacks. Examples of such protocols are the secure efficient distance vector routing protocol, which is based on the destination sequenced distance vector; the secure ad-hoc on-demand distance vector routing protocol, based on AODV; and the secure on-demand routing protocol for ad hoc networks, based on the dynamic source routing protocol and the proposed timed efficient stream loss-tolerant authentication protocol. Another work extending DSR to provide it with security mechanisms is Cooperation Of Nodes: Fairness In Dynamic Ad-hoc Networks [1].

2.2. Misbehavior Detection

In order to provide reliable network connectivity, some work has been carried out that aims to protect data packet forwarding against malicious attacks. There are some approaches that detect malicious behavior in the data forwarding phase. WATCHERS (Watching for Anomalies in Transit Conservation: a Heuristic for Ensuring Router Security) is a protocol designed to detect disruptive routers in fixed networks through analysis of the number of packets entering and exiting a router [2].

SCAN (self-organized network-layer security in mobile ad hoc networks) focuses on securing packet delivery. It uses AODV, but argues that the same ideas are applicable to other routing protocols. SCAN assumes a network with sufficient node density that nodes can overhear packets being received by a neighbor, in addition to packets being sent by the neighbor. SCAN nodes monitor their neighbors by listening to packets that are forwarded to them. The SCAN node maintains a copy of the neighbor's routing table and determines the next-hop node to which the neighbor should forward the packet; if the packet is not overheard as being forwarded, it is considered to have been dropped. In the proposed algorithm, by contrast, the nodes do not need to overhear transmissions to and from any neighbor in order to detect misbehavior.

Finally, a system that can mitigate the effects of packet dropping has been proposed. This is composed of two mechanisms that are kept in all network nodes: a watchdog and a path rater. The watchdog mechanism identifies any misbehaving nodes by promiscuously listening to the next node in the packet's path. If such a node drops more than a predefined threshold of packets, the source of the communication is notified. The path rater mechanism keeps a rating for every other node in the network it knows about. The proposed approach, in contrast, denies access to the
network to any node that has been identified as malicious, thus discouraging them from dropping packets.

A different approach uses a 2ACK scheme, which is a network-layer technique to detect misbehaving links and to mitigate their effects. It can be implemented as an add-on to existing routing protocols for MANETs, such as DSR (Dynamic Source Routing). The 2ACK scheme detects misbehavior through the use of a new type of acknowledgment packet, termed 2ACK. A 2ACK packet is assigned a fixed route of two hops (three nodes), in the opposite direction of the data traffic route. The proposed algorithm, in contrast, just uses a simple acknowledgement approach instead of the 2ACK scheme, which increases overhead.

3. ALGORITHM

A brief overview of the design aspects of the algorithm with the AODV protocol implementation, a simple acknowledgement approach and the principle of flow conservation is given as follows. The proposed algorithm aims at efficient data forwarding in the network and in that process monitors the misbehaving nodes or routes, so that such nodes or routes are avoided in data forwarding.

The proposed system is developed by using a simple acknowledgement approach with two-way communication. Once the sender sends the message, it waits for the acknowledgement back from the receiver to confirm whether the message has reached the receiver or not. The particular data frame formats which specify the various fields in the data and acknowledgement frames are also presented. The routing takes place according to an on-demand protocol like AODV (Ad hoc On-demand Distance Vector protocol). The malicious behavior which exhibits significant packet dropping is identified by the principle of flow conservation. Hence, the approach is a combination of three techniques (ACK+AODV+PFC).

1. AODV protocol initiates routing and selects the path based on the highest destination sequence number: When a source has data to transmit to an unknown destination, it broadcasts a Route Request (RREQ) for that destination. At each intermediate node, when a RREQ is received, a route to the source is created. If the receiving node has not received this RREQ before, is not the destination and does not have a current route to the destination, it rebroadcasts the RREQ. If the receiving node is the destination or has a current route to the destination, it generates a Route Reply (RREP). The RREP is unicast in a hop-by-hop fashion to the source. As the RREP propagates, each intermediate node creates a route to the destination. When the source receives the RREP, it records the route to the destination and can begin sending data.

2. Sender connects to the nearest intermediate node: The snippet below describes the procedure of connecting to an intermediate node:

Socket soc=null;
try
{
    soc=new Socket(currentnode,currentport);
}
catch(Exception e)
{
    if (currentport==4000)
        JOptionPane.showMessageDialog(jf,"Error","connection error",JOptionPane.ERROR_MESSAGE);
    else
        JOptionPane.showMessageDialog(jf,"Error while connecting to centernode2","Error in connection",JOptionPane.ERROR_MESSAGE);
}

3. Dividing message/data into packets: First the length of the msg is calculated; if it is less than 48 bytes, the data frame is generated according to the data frame format and sent. Otherwise the msg is divided into packets of 48 bytes each.

The pseudo code for the same is as follows:

st=0, end=48, split=0;
len=200, len1=len;
extract the first 48 bytes
while ( len > 48 )
{
    len1=len-48
    if(len1<=48)
    {
        extract ( end, len )
        …}
    else
    {
        split=end+48
        extract ( end, split )
        …}
}

The variable st will point to the start of the message, end will be initialized to 48, and the split variable keeps adding 48 to end, to point to the next position where the message has to be split.

4. Creates data frame with destination address, sender name, hash code and message: The data frame contains the following fields as shown below:

Destination address: taken from the text field, entered by the user.

Host name: obtained using the following snippet of code.

InetAddress inta=InetAddress.getLocalHost();
String hostname=inta.getHostName(); // the sender's host name

Hash code: the function hashing() is called with msg as the parameter, which calculates and returns the hash code.

Message: msg is taken from the text area; it is either manually entered by the user or browsed and copied from a text file.

5. Sends the packet: Once the connection is established, BufferedReader and BufferedOutputStream are used to create the input and output streams that send and receive packets in bytes. The functions write() and read() are used for sending the packet and receiving the acknowledgement.

6. Waits for acknowledgement: The sender keeps waiting till an acknowledgement is received from the intermediate node. The function read() reads the acknowledgement written by the intermediate node to the sender into the string object chstr, and returns the number of bytes read. The following snippet shows the infinite loop that is used for waiting.

while (true) // read ACK
{
    readcnt=in1.read(chstr);
    if(readcnt <=0)
        continue;
    else
        break;
}

7. Calculates time taken and the number of packets lost: The moment the message is sent, the time is saved in start, which is a long variable, and once the acknowledgement has arrived, the time is again noted in the long variable end.

start=System.currentTimeMillis(); // when the message is sent
end=System.currentTimeMillis();   // when the acknowledgement is received

The total time taken for the message to be sent and the acknowledgement to reach back is calculated as end-start. Every time a packet is sent, a counter cpkt is incremented. If the total time taken exceeds the wait time limit, which is 20 msec, a counter cmiss that keeps count of packets lost is incremented. This uses the principle of flow conservation for calculating the (cmiss/cpkt) ratio, which is explained in the following step.

8. Chooses the intermediate node: Once the whole message is sent, a packet called "done" is sent by the sender to mark the end of the message. If the ratio (cmiss/cpkt) exceeds 20%, the link is said to be misbehaving. And if the acknowledgement field that is extracted from the ack packet sent by the destination matches "CONFIDENTIALITY LOST", then we consider that the message has been modified. If the ratio (cmiss/cpkt) is less than 20% and the acknowledgement field extracted is "ACK", then the link is considered to be working properly. Thus the sender displays an appropriate information message indicating the behaviour of the link.

If the link is misbehaving or the confidentiality of the message is lost, there has to be a switch in the intermediate node used. This is done so that in the next session, a faithful communication is carried out. In case the link is learnt to be working properly, then the same link is used for further sessions of sending messages.
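The decision logic of steps 7 and 8 can be summarized in a small Java helper. This sketch is illustrative (the class and method names are ours, not from the paper); the 20% threshold, the 20 msec wait limit and the interpretation of the acknowledgement field follow the description above:

public class LinkMonitor {
    static final double MISS_THRESHOLD = 0.20;  // 20% of packets lost
    static final long WAIT_LIMIT_MS = 20;       // ack wait time limit per packet

    // Decide the link status once the "done" packet has been sent.
    public static String linkStatus(int cpkt, int cmiss, String ackField) {
        if ("CONFIDENTIALITY LOST".equals(ackField)) {
            return "MESSAGE MODIFIED";          // integrity check failed
        }
        double missRatio = (double) cmiss / cpkt;
        if (missRatio > MISS_THRESHOLD) {
            return "MISBEHAVING";               // switch the intermediate node
        }
        if ("ACK".equals(ackField)) {
            return "WORKING PROPERLY";          // keep using the same link
        }
        return "UNKNOWN";
    }
}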
4. EXPERIMENT ANALYSIS AND RESULTS

The proposed algorithm was practically implemented and tested in a lab scenario. Through the experimental analysis it is found that the algorithm exactly shows the results for two attacks. To analyze the algorithm with the three mechanisms combined as (AODV+ACK+PFC), traffic sources of constant bit rate (CBR) based on TCP have been used. The following Table 1 shows the results for the experiment conducted:

Table 1. Summary of Results

No. of nodes   Cpkt   Cmiss   Cmiss/Cpkt   Time taken   Link Status
30             158    4       0.025        1000 Sec     Proper
30             4      4       1.0          1065 Sec     Misbehaving

4.1. Performance Analysis

We have considered two of the network parameters for evaluating the performance with the combined (AODV+ACK+PFC) scheme.

• Packet delivery ratio – the ratio of the number of packets received at the destination to the number of packets sent by the source.

• Routing overhead – the number of routing packets transmitted per data packet delivered at the destination.

The following Table 2 shows the comparison of results of the proposed combined approach with that of a nearest approach, namely the (2ACK+DSR) approach. The (2ACK+DSR) approach is taken because it also uses a 2-hop acknowledgement scheme.
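For clarity, the two parameters can be computed from simple counters, as in the following illustrative snippet (the names are ours, not from the paper):

public class NetworkMetrics {
    // Packet delivery ratio: packets received at destination / packets sent by source.
    static double packetDeliveryRatio(long received, long sent) {
        return (double) received / sent;
    }
    // Routing overhead: routing packets transmitted per data packet delivered.
    static double routingOverhead(long routingPackets, long dataDelivered) {
        return (double) routingPackets / dataDelivered;
    }
}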
Table 2. Comparison Results

Approach       Link Status   Packet Delivery Ratio   Routing Overhead
AODV+ACK+PFC   Proper        100%                    Low
AODV+ACK+PFC   Misbehaving   95%                     Low
2ACK+DSR       Proper        98%                     High
2ACK+DSR       Misbehaving   91%                     High

From the Table 2 comparison results for the link status considered as working properly and misbehaving, it is observed that even when the misbehaviour is high, the packet delivery ratio is 100% with the (AODV+ACK+PFC) scheme compared to (2ACK+DSR).

5. CONCLUSIONS

MANET security issues foster many ideas and approaches, as MANETs have potential widespread applications in military and civilian communications. In these networks there is greater dependence on the cooperation of all nodes to perform networking functions, which makes them highly vulnerable to malicious nodes. One such misbehavior is related to the routing of packets. When such misbehaving nodes take part in the route discovery process but refuse to forward the data packets, routing performance may be degraded severely. In this paper, we have investigated the performance degradation caused by such malicious (misbehaving) nodes in MANETs. We have proposed and evaluated a technique, (AODV+ACK+PFC), to detect and mitigate the effect of such routing misbehavior. An immediate enhancement of this scheme can be made by evaluating it for a larger number of nodes and network parameters. Through simulations it can be compared with the nearest methods. Further, the scheme can also be extended to identify and prevent more network layer attacks, so that the approach can be made more robust against attacks.

REFERENCES

[1] V.P. Sundararajan and A. Shanmugam, "Performance Analysis of Selfish Node Aware Routing Protocol for Mobile Ad Hoc Networks", ICGST-CNIR Journal, Volume 9, Issue 1, July 2009.
[2] Payal N. Raj and Prashant B. Swadas, "DPRAODV: A Dynamic Learning System Against Blackhole Attack in AODV Based MANET", International Journal of Computer Science Issues, Vol. 2, 2009.

[3] Shailender Gupta and Chander Kumar, "Shared Information Based Security Solution for Mobile Ad Hoc Networks", International Journal of Wireless and Mobile Networks (IJWMN), Vol. 2, No. 1, Feb 2010.

[4] Emmanouil A. Panaousis, Tipu A. Ramrekha, Grant P. Millar and Christos Politis, "Adaptive and Secure Routing Protocol for Emergency Mobile Ad Hoc Networks", International Journal of Wireless and Mobile Networks (IJWMN), Vol. 2, No. 2, May 2010.

[5] Y. Zhang and W. Lee, "Intrusion Detection in Wireless Ad-hoc Networks", in Proc. 6th ACM International Conference on Mobile Computing and Networking, Boston, USA, August 2000.
IMAGE RECOGNITION FOR DESIGNING CAPTCHAS

*C.Poonguzhali, **D.Chithra
Master of Engineering in Computer Science
S.A.Engineering College
*poonguzhali81@yahoo.co.in, **meetchithra.d@gmail.com
Abstract — This project proposes IMAGINATION (IMAge Generation for INternet AuthenticaTION), a system for the generation of attack-resistant, user-friendly, image-based CAPTCHAs. In our system, we produce controlled distortions on randomly chosen images and present them to the user for annotation from a given list of words. The distortions are performed in a way that satisfies the incongruous requirements of low perceptual degradation and high resistance to attack by content-based image retrieval systems. Word choices are carefully generated to avoid ambiguity as well as to avoid attacks based on the choices themselves. Preliminary results demonstrate the attack-resistance and user-friendliness of our system compared to text-based CAPTCHAs.

General Terms — Verification & Security.
Keywords — Automated Turing Test, CAPTCHA, and Image Retrieval.

INTRODUCTION

A way to tell apart a human from a computer by a test is known as a Turing Test [10]. When a computer program is able to generate such tests and evaluate the result, it is known as a CAPTCHA (Completely Automated Public test to Tell Computers and Humans Apart) [1]. In the past, Websites have often been attacked by malicious programs that register for service on a massive scale. Programs can be written to automatically consume large amounts of Web resources or bias results in on-line voting. This has driven researchers to the idea of CAPTCHA-based security, to ensure that such attacks are not possible without human intervention, which in turn makes them ineffective. CAPTCHA-based security protocols have also been proposed for related issues, e.g., countering Distributed Denial-of-Service (DDoS) attacks on Web servers [6]. A CAPTCHA acts as a security mechanism by requiring a correct answer to a question which only a human can answer any better than a random guess. Humans have speed limitations and hence cannot replicate the impact of an automated program.

Thus the basic requirement of a CAPTCHA is that computer programs must be slower than humans in responding correctly. To that purpose, the semantic gap [9] between human understanding and the current level of machine intelligence can be exploited. Most current CAPTCHAs are text-based. Commercial text-based CAPTCHAs have been broken using object-recognition techniques [7], with accuracies of up to 99% on EZ-Gimpy. This reduces the reliability of security protocols based on text-based CAPTCHAs. There have been attempts to make these systems harder to break by systematically adding noise and distortion, but that often makes them hard for humans to decipher as well. Image-based CAPTCHAs such as [1, 3, 8] have been proposed as alternatives to the text media, and more robust and user-friendly systems can be developed. State-of-the-art content-based image retrieval (CBIR) and annotation techniques have shown great promise at automatically finding semantically similar images or naming them, both of which provide means of attacking image-based CAPTCHAs. The user-friendliness of such systems is potentially compromised when repeated responses are required [3] or deformed face images are shown [8].

One solution is to randomly distort the images before presenting them. However, current image matching techniques are robust to various kinds of distortions, and hence a systematic distortion is required. Here, we present IMAGINATION, a system for generating user-friendly image-based CAPTCHAs robust against automated attacks. Given a database of images of simple concepts, a two-step user interface allows quick testing for humans while being expensive for machines. Controlled composite distortions on the images maintain visual clarity for recognition by humans while making the same difficult for automated systems.

Requiring the user to type in the annotation may lead to problems like misspelling and polysemy [3]. In our system, we present to the user a set of word choices, and the user must choose the most suitable image descriptor. A problem with generating word
choices is that we might end up having, say, the word "dog" and the word "wolf" in the list, and this may cause ambiguity in labeling. To avoid this problem, we propose a WordNet-based [5] algorithm to generate a semantically non-overlapping set of word choices while preventing odd-one-out attacks using the choices themselves. Because the number of choices is limited, the location of the mouse-click on the composite image acts as additional user input; together with the annotation, it forms the two-step mechanism to reduce the rate of random attacks.

II. HELPFUL HINTS

A. Algorithmic Recognizability

Algorithms that attempt to perform image recognition under distortion can be viewed from two different angles here. First, they can be thought of as methods that potential adversaries may employ in order to break image CAPTCHAs. Second, they can be considered as intelligent vision systems. Because the images in question can be widely varying and be part of a large image repository, content-based image retrieval (CBIR) systems [30] seem apt. Essentially a memory-based method of attack, the assumption is that the adversary has access to the original (undistorted) images (which happens to be a requirement [3] of CAPTCHAs) for matching with the distorted image presented. While our experiments focus on image matching algorithms, other types of algorithms also seem plausible attack strategies. Near-duplicate detection [15], which focuses on finding marginally modified/distorted copyrighted images, seems to be a potential choice as well. This is part of our future work. Automatic image annotation and scene recognition techniques [7] have potential, but given the current state-of-the-art, these methods are unlikely to do better than direct image-to-image matching.

B. Human Recognizability

We measure human recognizability under distortion using a controlled user study. An image I is sampled from χ, subjected to distortion δy(·), and then presented to a user, along with a set of 15 word choices, one of which is unambiguously an appropriate label. While more than 15 choices would make the test harder to solve automatically, too many choices also make it more challenging for humans and hence affect usability. The user choice, made from the word list, is recorded alongside the particular image category and distortion type. Since it is difficult to get user responses for each distortion type over all images χ, we measure the average recognizability for a given distortion using the following. If χ(δy) is the set of all images presented to users subjected to δy(·),

r̂hum(δy) = (1/|χ(δy)|) · Σ over I in χ(δy) of 1{I is correctly recognized}    (4)

where 1{·} is the indicator function. The implicit assumptions made here, under which this term is comparable to the corresponding measure for another distortion, e.g., r̂hum(δx), are that (a) all users independently assess recognizability of a distorted image (since they are presented privately, one at a time), and (b) with a sufficient, but not necessarily identical, number of responses, the average recognizability measures converge to their true value.

Assessing Recognizability with the User Study: The user study we use in order to measure what we term the average human recognizability under distortion δy is only one of many ways to assess the ability of humans to recognize images in clutter. This metric is designed specifically to assess the usability of CAPTCHAs, and may not reflect on general human vision. Furthermore, the study simply asks users to choose one appropriate image label from a list of 15 words, and recognizability is measured as the fraction of times the various users made the correct choice. While a correct selection may mean that the user recognized the object in the image correctly, it could also mean that it was the only choice perceived to be correct, by elimination of choices (i.e., best among many poor matches), or even a random draw from a reduced set of potential matches. Furthermore, using the averaged responses over multiple users could mean that the CAPTCHA may still be unusable by some fraction of the population. While it is very difficult to assess true recognizability, our metric serves the purpose it is used for: the ability of users to pick one correct label from a list of choices, given a distorted image, and hence we use these averaged values in the CAPTCHA design. Furthermore, the user study consists of roughly the same number of responses from over 250 random users, making the average recognizability metric fairly representative. Later in Sec. V, we will see that there is sufficient room for relaxing the intensity of distortions so as to ensure high recognizability for most users, without compromising on security.
C. Candidate Distortions

We look at image distortion candidates that are relevant in designing image CAPTCHAs. With the exception of the requirement that the distortion should obfuscate machine vision more than human vision, the space of possible distortions δy(·) is unlimited. Any choice of distortion gets further support if simple filtering or other pre-processing steps are ineffective in undoing the distortion. Furthermore, we avoid non-linear transformations on the images so as to retain basic shape information, the loss of which can severely affect human recognizability. For the same reason we do not use other images or templates to distort an image. Pseudo-randomly generated distortions are particularly useful here, as with text CAPTCHAs. For the purpose of making it harder for machine recognition to undo the effect of distortion, we need to also consider the approaches taken in computer vision for this task. In the literature, the fundamental step in generic recognition tasks has been low-level feature extraction from the images [30], [7]. In fact, this is the only part of the recognition process that we have the power to affect.

1 Color Quantization
Instead of allowing the full color range, we quantize the color space for image representation. For each image, we transform pixels from RGB to CIE-LUV color space. The resultant color points, represented in R^3 space, are subject to k-means clustering with k-center initialization [4]. A parameter controls the number of color clusters generated by the k-means algorithm. All colors are then mapped to this reduced set of colors. A lower number of color clusters translates to loss of information and hence lower recognizability.

2 Dithering
Similar to half-toning in the printing industry, color dithering is a digital equivalent that uses a few colors to produce the illusion of color depth. This is a particularly attractive distortion method here, since it affects low-level feature extraction (on which machine recognition is dependent) while having, by design, minimal effect on human vision. Straightforward application of dithering is, however, ineffective for this purpose since a simple mean filter can restore much of the original image. Instead, we randomly partition the image in the following two ways: (a) multiple random orthogonal partitions, or (b) image segments generated using k-means clustering with k-center initialization on color, followed by connected component labeling.

In either case, for each such partition, randomly select y colors (y being the parameter for this distortion) and use them to dither that region. This leaves a segment-wise dithering effect on the image, which is difficult to undo. Automatic image segmentation is expected to be particularly affected. The distortion tends to have a more severe effect on recognizability at lower values of y.

3 Cutting and Re-scaling
For machine recognition methods that rely on pixel-to-pixel correspondence based matching, scaling and translation help make them ineffective. Take a portion of one of the four sides of the image, cut out between 10-20% from the edge (chosen at random), and re-scale the remainder to bring it back to the original image dimensions. This is rarely disruptive to human recognition, since items of interest occupy the central region in our image set. On the other hand, it breaks the pixel correspondence. Which side to cut is also selected at random.

4 Line and Curve Noise
Addition of pixel-wide noise to images is typically reversible by median filtering, unless very large quantities are added, in which case human recognizability also drops. Instead, stronger noise elements can be added to the image at random. In particular, thick lines, sinusoids, and higher-order curves are added.

5 Word Choice Generator
The word choice generator quickly creates an unambiguous list of 15 words, inclusive of the correct label. For this, we make use of a WordNet-based [22] word similarity measure proposed by Leacock and Chodorow [17]. The 14 incorrect choices are generated by sampling from the word pool, avoiding any one that is too similar semantically (determined by a threshold on similarity) to the correct label. Though a more elaborate strategy was proposed in [8], for limited pools of words this simpler strategy was equally effective.
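As an illustration of the color quantization distortion described under "1 Color Quantization" above, the following Java routine runs a plain k-means loop over pixel colors and maps every pixel to its cluster center. It is a sketch under stated assumptions: it uses naive random initialization rather than the k-center initialization of [4], and works directly on 3-component color vectors (the paper clusters CIE-LUV values after conversion from RGB):

import java.util.*;

public class ColorQuantizer {
    // Quantize 'pixels' (each a 3-component color) to 'k' representative colors.
    public static float[][] quantize(float[][] pixels, int k, int iterations) {
        Random rnd = new Random(42);
        float[][] centers = new float[k][3];
        for (int i = 0; i < k; i++)                    // naive random initialization
            centers[i] = pixels[rnd.nextInt(pixels.length)].clone();
        int[] assign = new int[pixels.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: nearest center for each pixel.
            for (int p = 0; p < pixels.length; p++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = dist2(pixels[p], centers[c]);
                    if (d < best) { best = d; assign[p] = c; }
                }
            }
            // Update step: recompute each center as the mean of its pixels.
            float[][] sum = new float[k][3];
            int[] count = new int[k];
            for (int p = 0; p < pixels.length; p++) {
                for (int d = 0; d < 3; d++) sum[assign[p]][d] += pixels[p][d];
                count[assign[p]]++;
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int d = 0; d < 3; d++) centers[c][d] = sum[c][d] / count[c];
        }
        // Map every pixel to its cluster center: the reduced color set.
        float[][] out = new float[pixels.length][];
        for (int p = 0; p < pixels.length; p++) out[p] = centers[assign[p]].clone();
        return out;
    }

    private static double dist2(float[] a, float[] b) {
        double s = 0;
        for (int d = 0; d < 3; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }
}

Lowering k in this sketch corresponds to the parameter of the distortion: fewer clusters mean more information loss and lower recognizability, as stated above.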
III EXISTING SYSTEM

The existing system is mainly based on DoS (Denial of Service) and CBIR (Content Based Image Retrieval). In general, DoS attacks involve generating a large number of automated (machine) requests to one or more network devices (e.g., servers) for resources in some form, with the goal of overwhelming them and preventing legitimate (human) users from getting their service. In distributed DoS, multiple machines are compromised and used for coordinated automated attacks, making it hard to detect and block the attack sources.

To prevent such forms of attacks and save resources, the servers or other network devices can require CAPTCHA solutions to accompany each request, thus forcing human intervention and hence, at the very least, reducing the intensity of the attacks. Because CAPTCHAs can potentially play a very critical role in system security, it is imperative that the design and implementation of CAPTCHAs be relatively foolproof. There has been sizable research output in designing as well as breaking CAPTCHAs. In both these efforts, computing research stands to benefit.

IV PROPOSED SYSTEM

This proposed system explores the exploitation of this limitation for potentially preventing automated network attacks by bots, such as eating up resources, biasing results, etc.

While undistorted natural images have been shown to be algorithmically recognizable and searchable by content to moderate levels, controlled distortions of specific type and strength can potentially make machine recognition harder without affecting human recognition. This difference in recognizability makes it a promising candidate for automated Turing tests called CAPTCHAs, which can differentiate humans from machines. The application of controlled distortions of varying nature and strength is studied, along with their effect on human and machine recognizability. While human recognizability is measured on the basis of an extensive user study, machine recognizability is based on memory-based content-based image retrieval (CBIR) and matching algorithms. A significant research topic within signal analysis, CBIR is actually conceived here as a tool for an adversary, so as to help us design more foolproof image CAPTCHAs.

The system presents the user with a set of 8 images tiled to form a single composite image. The user must then select an image she wants to annotate by clicking near its geometric center. If the location of the click is near one of the centers, a controlled distortion is performed on the selected image, which is displayed along with a set of word choices pertaining to it, and the user must choose the appropriate one. If the click is not near any of the centers or the choice is invalid, the test restarts. Otherwise, this click-and-annotate process is repeated one more time, passing which the CAPTCHA is considered cleared. The reason for having the click phase is that the word choices are limited, making the random attack rate fairly high. Instead of having numerous rounds of annotation, user clicks tend to make the system more user-friendly, while decreasing the attack rate.

Click Step
A single image is created on-the-fly by sampling 8 images from R and tiling them according to a randomly generated orthogonal partition. This image is then similarly partitioned twice over. Each time, and for each partition, 18 colors are chosen at random from the RGB space and are used to dither that partition using the two-stage Floyd-Steinberg error-diffusion algorithm.

The two rounds of dithering are employed to ensure that there is increased ambiguity in image borders (more candidate 'edges'), and to make it much more difficult to infer the original layout. An example of such an image is shown in Fig. 1. What the user needs to do is select near the physical center of any one of the 8 images. Upon successfully clicking within a tolerance radius r of one of the 8 image centers, the user is allowed to proceed. Otherwise, authentication is considered failed.

Figure 1. Image-based authentication
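The click test described above reduces to a point-in-disc check against the centers of the 8 tiles. A minimal Java sketch, with the tile centers and tolerance radius r as assumed inputs, follows:

public class ClickStep {
    // centers[i] = {cx, cy}, the geometric center of the i-th tiled image;
    // r is the tolerance radius around each center.
    static boolean isValidClick(double clickX, double clickY,
                                double[][] centers, double r) {
        for (double[] c : centers) {
            double dx = clickX - c[0], dy = clickY - c[1];
            if (dx * dx + dy * dy <= r * r) return true;  // within radius r of a center
        }
        return false;  // not near any of the 8 centers: authentication fails
    }
}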
Annotate Step
Here, an image is sampled from R; a distortion type and strength are chosen (from among those that satisfy the requirements; we find this out experimentally, as described in Sec. V) and applied to the image, which is presented to the user along with an unambiguous choice of 15 words (generated automatically). A sample screenshot is presented in Fig. 2. If the user fails in image recognition, authentication is immediately considered failed and a re-start from step 1 is necessary.

Figure 2. Distorted Images

Determining the Allowed Distortion Set
Images can be distorted in various ways. The design of an allowed distortion set D requires the inclusion of distortions that maintain good visual clarity for recognition by humans while making automated recognition hard. CAPTCHA requires that the annotated database and relevant code be publicly available, for added security. If undistorted images from the database were presented as CAPTCHAs, attacks would be trivial. Previously proposed systems are liable to such attacks. If the images are randomly distorted before being presented to the user, it may still be possible to perform attacks using computer vision techniques such as affine/scale-invariant features and CBIR.

The main aim is to build image-based CAPTCHAs secure against such attacks. Certain assumptions about possible attack strategies are needed in order to design attack-resistant distortions. Here, the only feasible way is to use CBIR to perform inexact matches between the distorted image and the set of images in the database, and use the label associated with an appropriately matched one for attack. This assumption is reasonable since an attack strategy needs to work on the entire image database in real time in order to be effective, and image retrieval usually scales better than other techniques.

Determining the Word Choice Set
For word choice generation, factors related to image-based CAPTCHAs that have not been previously addressed are: it may be possible to remove ambiguity in labeling images (hence making annotation easier for humans) by the choices themselves; the images might seem to have multiple valid labels (e.g. a tiger in a lake can be seen as "tiger" and "lake" as separate entities), and this may cause ambiguity; and the choices themselves may result in odd-one-out attacks if the correct choice is semantically different from all others. An algorithm is proposed to generate the word choice set W containing unambiguous choices for the ease of users, while ensuring that word-based attacks are ineffective. For this, a WordNet-based [5] semantic word similarity measure [4] is used, denoted by d(w1, w2), where w1 and w2 are English words. Given the correct annotation wk (e.g. "tiger") of image ik, and optionally, other words Wo (e.g. {"lake"}), with the requirement of Nw choices, the algorithm for determining W is as follows (see the sketch after the steps):

1. Set W ← {wk} + Wo, t ← 1.
2. Choose a word wi ∉ W randomly from the database.
3. flag = 0.
4. For each word wj ∈ W:
5.   If d(wj, wi) < θ then flag = 1.
6. If flag = 1 then go to step 2.
7. W ← W + {wi}; t ← t + 1.
8. If t < Nw then go to step 2.
9. W ← W − Wo.

The value of θ depends on what range of values the word similarity measure yields, and it can be determined empirically or based on user surveys (i.e. what values of θ cause ambiguity). Geometrically speaking, this method yields word choices as if all the words lie beyond the boundaries of an Nw-dimensional simplex or hyper-tetrahedron.
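A direct Java transcription of the algorithm above is sketched below. The measure d is abstracted as a callback (smaller values meaning semantically closer, matching the rejection test in step 5), since the real Leacock-Chodorow measure requires a WordNet index; all class and method names are ours:

import java.util.*;

public class WordChoiceGenerator {
    // d(w1, w2): semantic measure over English words; values below theta
    // are treated as "too similar" per step 5 of the algorithm.
    interface Distance { double d(String w1, String w2); }

    static List<String> buildChoices(String wk, List<String> wo, List<String> pool,
                                     int nw, double theta, Distance dist, Random rnd) {
        List<String> w = new ArrayList<>();
        w.add(wk);                                 // step 1: W <- {wk} + Wo, t <- 1
        w.addAll(wo);
        int t = 1;
        while (t < nw) {                           // step 8 loops back until t = Nw
            String wi = pool.get(rnd.nextInt(pool.size()));  // step 2
            if (w.contains(wi)) continue;          // wi must not already be in W
            boolean flag = false;                  // step 3
            for (String wj : w) {                  // steps 4-5: reject if too close
                if (dist.d(wj, wi) < theta) { flag = true; break; }
            }
            if (flag) continue;                    // step 6: resample
            w.add(wi);                             // step 7
            t++;
        }
        w.removeAll(wo);                           // step 9: W <- W - Wo
        return w;                                  // Nw choices including wk
    }
}

With Nw = 15 this yields the correct label plus 14 incorrect choices, all mutually separated by at least θ, as the text requires.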
V. CONCLUSION AND FUTURE WORK

We have presented a novel way to distinguish humans from machines by an image recognition test, one that has far-reaching implications in computer and information security. The key point is that image recognition, especially under missing or pseudo information, is still largely unsolved, and this fact can be exploited for the purpose of building better CAPTCHA systems than the vulnerable text-based CAPTCHAs that are in use today. We have explored the space of systematic distortions as a means of making automated image matching and recognition a very hard AI problem. Without on-the-fly distortion, and with the original images publicly available, image recognition by matching is a trivial task. We have learned that atomic distortions are largely ineffective in reducing machine-based attacks, but when multiple atomic distortions combine, their effect significantly reduces machine recognizability.
Our study, while in no way encompassing the entire space of distortions (or algorithms that can recognize under distortion), presents one way to understand the effects of distortion on the recognizability of images in general, and more specifically to help design image CAPTCHA systems. Furthermore, it attempts to expose the weaknesses of low-level feature extraction to very simple artificial distortions. In the future, large-scale user studies can be carried out on the ease of use, a Web interface to the IMAGINATION system can be built, and greater attack resistance can be generated by considering other possible attack strategies such as interest points, scale invariants, and other object recognition techniques.

REFERENCES
[1] L. von Ahn, M. Blum, and J. Langford, "Telling Humans and Computers Apart (Automatically) or How Lazy Cryptographers do AI," Communications of the ACM, 47(2):57-60, 2004.
[2] A.L. Blum and P. Langley, "Selection of Relevant Features and Examples in Machine Learning," Artificial Intelligence, 97(1-2):245-271, 1997.
[3] "The CAPTCHA Project," http://www.captcha.net.
[4] K. Chellapilla and P. Y. Simard, "Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)," Proc. NIPS, 2004.
[5] M. Chew and J. D. Tygar, "Image Recognition CAPTCHAs," Proc. ISC, 2004.
[6] Computerworld, "Building a better spam-blocking CAPTCHA," http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9126378. Retrieved on 01/23/2009.
[7] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, 40(2):1-60, 2008.
[8] R. Datta, J. Li, and J. Z. Wang, "IMAGINATION: A Robust Image-based CAPTCHA Generation System," Proc. ACM Multimedia, 2005.
[9] J. Elson, J. R. Douceur, J. Howell, and J. Saul, "Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization," Proc. ACM CCS, 2007.
[10] F. Fleuret and D. Geman, "Stationary Features and Cat Detection," J. Machine Learning Research, 9:2549-2578, 2008.
[11] R.W. Floyd and L. Steinberg, "An Adaptive Algorithm for Spatial Grey Scale," Proc. Society of Information Display, 17:75-77, 1976.
[12] P. Golle, "Machine Learning Attacks Against the Asirra CAPTCHA," Proc. ACM CCS, 2008.
[13] Guardian, "How Captcha was foiled: Are you a man or a mouse?" http://www.guardian.co.uk/technology/2008/internet.captcha. Retrieved on 08/28/2008.
[14] A.K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
[15] Y. Ke, R. Sukthankar, and L. Huston, "Efficient Near-duplicate Detection and Subimage Retrieval," Proc. ACM Multimedia, 2004.
[16] R.E. Korf, "Optimal Rectangle Packing: New Results," Proc. ICAPS, 2004.
[17] C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Identification," Fellbaum, 1998.
[18] D.B. Lenat, "Cyc: A Large-Scale Investment in Knowledge Infrastructure," Comm. of the ACM, 38(11):33-38, 1995.
[19] J. Li and J.Z. Wang, "Real-time Computerized Annotation of Pictures," IEEE Trans. Pattern Analysis and Machine Intelligence, 30(6):985-1002, 2008.
[20] D.G. Lowe, "Object Recognition from Local Scale-invariant Features," Proc. ICCV, 1999.
[21] C.L. Mallows, "A Note on Asymptotic Joint Normality," Annals of Mathematical Statistics, 43(2):508-515, 1972.
[22] G. Miller, "WordNet: A Lexical Database for English," Communications of the ACM, 38(11):39-41, 1995.
[23] W.G. Morein, A. Stavrou, D.L. Cook, A.D. Keromytis, V. Mishra, and D.
PMG BASED HANDOFF IN WIRELESS MESH NETWORKS

*S.Aruna, **P.Prabhu, ***M.Ramnath

*ASSISTANT PROFESSOR, DEPARTMENT OF IT, Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College, Avadi, Chennai. andersaruna@gmail.com
**ASSISTANT PROFESSOR, DEPARTMENT OF IT, Anjalai Ammal-Mahalingam Engineering College, Kovilvenni, Tiruvarur. ppradhu07@gmail.com
***PG STUDENT (M.E/NETWORK ENGINEERING), DEPARTMENT OF IT, Vel Tech Multi Tech Dr.Rangarajan Dr.Sakunthala Engineering College, Avadi, Chennai. ramnath25@gmail.com
Abstract — The wireless mesh network (WMN) has recently emerged as a promising technology for next-generation wireless access. In a WMN, each mesh client has a Mobile Agent (MA) residing on its registered mesh router to handle the handoff signaling process. Handoff management for IP-based WMNs remains largely unexplored, and conventional handoff mechanisms can cause significant performance degradation when directly applied to WMNs, because they overlook the key features of WMNs. The proposed Planned Multicast Group (PMG)-based architecture can facilitate cross-layer handoffs and hence reduce the total handoff delay caused by multiple layers. Extensive simulations are conducted to evaluate the feasibility and efficiency of the proposed PMG approach.

Keywords — Mobility, Wireless Mesh Networks, Planned Multicast Group

I. INTRODUCTION

In the last few years, the wireless mesh network (WMN) has drawn significant attention as a fast, easy, and inexpensive solution for broadband wireless access. However, there are still many technical challenges that we have to overcome before the WMN can be fully deployed. Particularly, it is crucial to provide mobility support in the WMN, because wireless users are free to move anywhere at any time. In this paper, we propose a mobile agent (MA)-based handoff approach to address the issue of user mobility. Our approach aims to reduce handoff delay and provide seamless handoff.

Wireless mesh networks (WMNs) are dynamically self-organized and self-configured, with the nodes in the network automatically establishing an ad hoc network and maintaining the mesh connectivity. WMNs are comprised of two types of nodes: mesh routers and mesh clients. Other than the routing capability for gateway/bridge functions as in a conventional wireless router, a mesh router contains additional routing functions to support mesh networking. Through multi-hop communications, the same coverage can be achieved by a mesh router with much lower transmission power. To further improve the flexibility of mesh networking, a mesh router is usually equipped with multiple wireless interfaces built on either the same or different wireless access technologies. In spite of all these differences, mesh and conventional wireless routers are usually built based on a similar platform.

Mesh routers have minimal mobility and form the mesh backbone for mesh clients. Thus, although mesh clients can also work as a router for mesh networking, the hardware platform and software for them can be much simpler than those for mesh routers. For example, communication protocols for mesh clients can be
light-weight, gateway or bridge functions do not exist in mesh clients, and only a single wireless interface is needed in a mesh client.

II. HANDOFF CHALLENGES IN WMNs

A. WMNs

As shown in Fig. 1, a WMN consists of two types of nodes: mesh routers and mesh clients. The mesh routers form an infrastructure of the mesh backbone for mesh clients. In general, mesh routers have minimal mobility and operate just like a network of fixed routers, except that they are connected by wireless links through wireless technologies such as IEEE 802.11. We observe from Fig. 1 that a WMN can access the Internet through a gateway mesh router, which is connected to the internet protocol (IP) core network with physical wires. In a WMN, every mesh router is equipped with a traffic aggregation device (similar to an 802.11 access point) that interacts with individual mesh clients. The mesh router relays aggregated data traffic of mesh clients to and from the IP core network. Typically, a mesh router has multiple wireless interfaces to communicate with other mesh routers, and each wireless interface corresponds to one wireless channel. These wireless channels have different characteristics, because wireless interfaces run on different frequencies and are built on either the same or different wireless access technologies, e.g., IEEE 802.11a/b/g/n. It is also possible that directional antennas are employed on some interfaces to establish wireless channels over long distances.

Fig. 1: Wireless Mesh Backbone Network

B. Handoff Challenges

Mesh clients achieve Internet access through mesh routers. A mesh client quite often moves from the coverage of one mesh router to that of another. As a result, it becomes an urgent task in WMNs to maintain the ongoing connections of roaming users. The mobile IP and related protocols can be applied to WMNs to support user mobility, but they only focus on the IP identity problem. In this paper, we investigate another important aspect of user mobility support, i.e., the handoff process. Ideally, WMN handoff should be accomplished with low computing cost and short latency so that the handoff process can be completely transparent to mesh clients. In this paper, we define a WMN that offers the above handoff function as a seamless handoff WMN.

Most WMNs today require specially modified clients to transfer connectivity from one mesh router to another. Although some of them give the appearance of continuous connectivity to a roaming client, handoff delay can be as long as several seconds. This delay is unacceptable for real-time applications, such as voice over IP (VoIP) or videoconferencing. In the current 802.11 implementation, the handoff consists of two phases, i.e., channel scanning and connection reestablishment. During channel scanning, the mesh client scans all channels to collect information about neighboring mesh routers. During connection reestablishment, the mesh client first registers to the new mesh router through authentication, then proceeds to the post-registration stage, which includes reassociation, CAC, rerouting, and resource reservation to meet the requirements of real-time applications. To reduce handoff delay, previous studies mainly focused on shortening the channel scan latency. Different from previous works, in this paper, we propose an MA-based handoff architecture, where an MA takes care of the handoff signaling process in the network layer and above. Specifically, an MA accomplishes the tasks of the post-registration stage, such as reassociation, CAC, rerouting, and resource reservation, prior to the actual handoff. As a result, the handoff mesh client is able to immediately continue connectivity after registering to the new mesh router.

III. RELATED WORKS

In this section, we propose an MA-based handoff architecture, which offers seamless and fast handoff

to support VoIP and other real-time applications. In our approach, all the handoff logics are done by the MA, and only the standard medium-access control protocol and IP are used. Therefore, it is compatible with any 802.11 mobile devices, regardless of the vendor or architecture.

A. MA-BASED HANDOFF ARCHITECTURE IN WMNs

An MA is an executing program that can migrate during execution from machine to machine in a heterogeneous network. In other words, the agent can suspend its execution, migrate to another machine, and then resume execution on the new machine from the point at which it left off. On each machine, the agent interacts with local resources to accomplish its task. MAs have several advantages in developing distributed computing applications. By migrating to the information resource, an agent can locally invoke resource operations, eliminating the network transfer of intermediate data. By migrating to the other side of an unreliable network link, an agent can continue executing even if the network link goes down, making MAs particularly attractive in mobile computing environments. By choosing different migration strategies, an agent can adapt itself to different tasks and network conditions, achieving full flexibility and customization. It is appropriate to deploy MAs in a WMN, since a WMN is a typical distributed system with the feature of "mobility."
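The suspend-migrate-resume cycle described above can be made concrete with a short sketch. The following Python fragment is purely illustrative: the class, its steps, and the pickle-based transport are our assumptions rather than any particular MA platform, but it shows how an agent can checkpoint its execution state on one machine and resume on another with its history intact.

import pickle

class ClientAgent:
    """Illustrative mobile agent that carries its progress with it."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.completed_steps = []

    def run_step(self, step):
        # Placeholder for reassociation, CAC, rerouting, reservation, etc.
        self.completed_steps.append(step)

    def suspend(self):
        # Serialize the agent's state before migrating to the next router.
        return pickle.dumps(self)

    @staticmethod
    def resume(blob):
        # Reconstruct the agent on the destination machine.
        return pickle.loads(blob)

agent = ClientAgent("mn-42")
agent.run_step("reassociation")
blob = agent.suspend()              # bytes shipped over the mesh backbone
agent2 = ClientAgent.resume(blob)   # resumes from where it left off
agent2.run_step("resource-reservation")
print(agent2.completed_steps)       # ['reassociation', 'resource-reservation']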
B. MA-Based Handoff

To provide seamless handoff, we apply MA technology to WMNs. As shown in Fig. 2, in our solution, each mesh client is assigned a "client MA." The mesh client places its client MA in the mesh router that it registers with. If the mesh client moves from the coverage of one mesh router to that of another mesh router, the client MA also migrates. We study the scenario in which a mesh client moves from the coverage of one mesh router to that of another mesh router during a call. To eliminate the overall handoff latency, we can employ a proactive scan scheme to counteract channel scan delay and our MA approach to counteract connection reestablishment delay. Particularly, when the scan trigger of the proactive scan scheme is fired, the mesh client will actively probe channels and choose the appropriate neighboring mesh router for handoff. Then, the client MA will move from the current mesh router to the chosen mesh router and complete the processes of reassociation, CAC, rerouting, resource reservation, etc. Once the handoff trigger is fired, the mesh client will register to the new mesh router and resume all the connections using the facilities that have been prepared by the client MA earlier.

Fig. 2: Architecture of MA-based handoff in WMNs.

Fig. 3: Process of MA-based handoff.

Fig. 3 demonstrates that there are five steps in the joint handoff process of the proactive scan scheme and our MA approach. First, the scan trigger of the proactive scan scheme activates the channel scan, which locates the new mesh router for handoff. Second, the mesh client will inform its client MA on the current mesh router which one is the new mesh router. Third, the current mesh router transfers the

client MA to the new mesh router in the neighborhood, and the client MA will pre-set up backup connections on the new mesh router to prepare for seamless handoff. The pre-setup of backup connections usually involves reassociation for context switching between the old access point (AP) and the new AP by the inter-access-point protocol, interaction with the CAC module for resource reservation, and negotiation with the routing protocol for network layer path reestablishment. Fourth, once the backup connection is built up, the client MA will notify the mesh client that it is ready for handoff. Finally, the mesh client receives the notification and waits for the firing of the handoff trigger to register to the new mesh router and complete the handoff. The foregoing illustrations show that before the actual handoff occurs in the fifth step, the client MA has already constructed a backup connection on the new mesh router in the third step. As a result, the overall handoff delay only involves registration delay, which is spent on the authentication information exchange between the mesh client and the new mesh router. In addition to reducing handoff delay, MA-based handoff can also achieve high computing efficiency. The client MA executes handoff logics on the mesh router, where the network computing resource is affluent, and thus releases the burden on the mesh client, which is dedicated to running user applications.

IV. PROPOSED PMG CROSS-LAYER HANDOFF DESIGN

We propose a PMG-based approach to position and configure mesh routers in order to form a scalable wireless mesh backbone for mobility assistance. The benefit of this approach is that the protocols used for address management and handoffs can be streamlined to take advantage of the resulting network architecture. Under the PMG approach, mesh routers are grouped into connected multicast groups rooted at gateway mesh routers. Special mesh routers, namely PMGs, are equipped with multiple IP addresses, with each address corresponding to a different subnet. Note that a mesh router can use the Address Resolution Protocol (ARP) to map different IP addresses to the MAC address of the router. PMGs are the bridging nodes connecting different groups. They can facilitate information exchange between different groups during inter-gateway handoffs.

Our PMG-based WMN architecture has the following major advantages:

By planning multicast groups during the deployment, each mesh router knows which subnet it belongs to in advance. This design makes it straightforward for address management and L3 handoff detection.

Both multicast groups and PMG multicast messages can be easily implemented in IPv6, as multicast addressing is a required part of the IPv6 protocol. Therefore, we believe that this solution is feasible and practical to be implemented in future IPv6-based WMNs.

Information sharing for network management for intra-subnet roaming is restrained to within a multicast group, instead of broadcasting to the whole mesh backbone, which saves signaling overhead. Information sharing between groups can be implemented using PMGs.

Since the PMG-WMN architecture can facilitate the cross-layer protocol design and PMGs are able to exchange handoff information between different subnets, both intra- and inter-gateway mobility can be improved.
the Address Resolution Protocol (ARP) to map
different IP addresses to the MAC address of the
router. PMGs are the bridging nodes connecting Fig.4: Hand-off Delay using PMG based cross-layer
different groups. They can facilitate information handoff design
exchange between different groups during inter-
gateway handoffs.

179
th
PROCEEDINGS OF 4 NATIONAL CONFERENCE ON HIGH PERFORMANCE
COMPUTING on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

parallel before an MN completes a L2 handoff. Fig. 5 shows the sequence of the handoff delays of the proposed PMG-based cross-layer handoff scheme, and the complete handoff procedure is shown in Algorithm 1.

Algorithm 1: PMG-based algorithm for cross-layer handoffs
1. While (true) {
2.   If (RL3T < RCUR <= RL2T)
3.     MN sends an HOM to its oAP to retrieve the nAP list;
4.     oAP informs nAPs to activate an additional channel;
5.     MN sends Probe Request & waits for Probe Response;
6.     MN sorts nAPs & obtains the cAP;
7.     MN sends an HOM which contains the preferred cAP's network ID to its oAP;
8.     oAP sends a multicast PMGM to locate the PMGMR;
9.   If (L2HT < RCUR <= RL3T)
10.    oAP unicasts to the PMGMR for handoff preparations;
11.    If (cAP belongs to another subnet)
12.      PMGMR formulates an address for the MN;
13.      PMGMR prepares the UHRD to the new gateway & the LHRD to the cAP;
14.      SIP message exchanges;
15.    else
16.      PMGMR prepares the UHRD to the old gateway & the LHRD to the cAP;
17.  If (RCUR <= L2HT)
18.    If (subnet changes)
19.      MN associates to the cAP;
20.      MN obtains a new IP address & uses the obtained routing path for address binding with the HA;
21.      MN resumes the multimedia session on layer-5;
22.    else
23.      MN associates to the cAP;
24.      MN uses the obtained routing path for resuming the multimedia session on layer-5;
25. }
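The three thresholds RL2T, RL3T, and L2HT in Algorithm 1 partition the handoff into phases as the MN's current metric RCUR decays. A minimal Python sketch of that dispatch, with the threshold names taken from the algorithm and everything else assumed for illustration:

def handoff_phase(r_cur, rl2t, rl3t, l2ht):
    """Map RCUR to a handoff phase; RL2T > RL3T > L2HT are crossed in order."""
    if rl3t < r_cur <= rl2t:
        return "scan"        # steps 2-8: probe channels, pick cAP, find PMGMR
    if l2ht < r_cur <= rl3t:
        return "l3-prepare"  # steps 9-16: PMGMR prepares UHRD/LHRD paths
    if r_cur <= l2ht:
        return "associate"   # steps 17-24: associate to cAP, resume session
    return "idle"            # still well inside the old AP's coverage

# Thresholds here are arbitrary values chosen only for the demonstration.
for r in (0.9, 0.6, 0.4, 0.1):
    print(r, handoff_phase(r, rl2t=0.7, rl3t=0.5, l2ht=0.2))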
scheme under which an MN detects a L3 handoff by
A. L3 Handoff Preparation to Eliminate the L3 Handoff Detection Delay and Routing Path Discovery Delay

When the RCUR of the MN reaches the RL3T, it triggers the oAP to notify the PMG to prepare for the L3 handoff. The PMG first checks whether the cAP is located in the current subnet of the MN or not. For the intra-gateway case, the IP address of the MN does not need to change. The cAP will be notified by the PMG to prepare the LHRD path between the cAP and the PMGMR for the MN in advance, while the PMG takes care of the UHRD to the old gateway. For the inter-gateway case, the corresponding PMG, which belongs to both the old and new subnets, first formulates an IP address for the MN by using the cAP's network ID and the MN's interface ID via the PMGM it receives. This IP address is stored in the PMGMR's and cAP's routing tables before the MN's L3 handoff starts. Furthermore, the PMG prepares for the LHRD to the cAP and the UHRD to the new gateway. By doing so, the routing paths for both the binding update to the HA and the binding acknowledgement from the HA are prepared for the MN in advance. Nevertheless, since the PMGMR could be multiple hops away from the cAP and the gateway, the preparation time for both the LHRD and UHRD increases with the number of hops. After the MN finishes the L2 handoff, the cAP sends the MN its IP address. Our objective in this stage is to eliminate both the L3 handoff detection delay and the routing path discovery delay, which are significant handoff delays in the L3 handoff.

V. PERFORMANCE EVALUATION

In this section, we conduct simulations to evaluate the performance of the proposed PMG-based cross-layer handoff scheme. Since the current modeler does not provide WMN handoff support, we implement new models for mesh routers with both Mobile IPv6 and AODV routing functionalities activated so as to realize the handoff support in IP-based infrastructure WMNs.

A. Simulation Setup

We developed two default handoff scenarios in WMNs in order to compare with our proposed PMG-based WMN architecture. One is the default-based handoff scheme, which depends on RA messages to trigger an MN's L3 handoff, as explained in Section III-A. The other is the gateway-based handoff scheme, under which an MN detects a L3 handoff by receiving a reply message from the gateway. In our handoff simulation, the WMN is composed of two gateways, a few regular mesh routers, and one PMG. All mesh routers' and gateways' wireless interfaces use both AODV and IPv6 routing protocols for delivering multihop IPv6 traffic. Only the PMG has multiple IP addresses (two IP addresses in our simulation), with each IPv6 address belonging to a different subnet. The PMG message interval on the PMGMR is uniformly distributed from 0.5 s to 1 s. The

Internet backbone network has a constant latency of 0.1 second.

B. Results Analysis

Fig. 5: Total handoff delay under different percentages of background traffic.

Fig. 5 shows the detailed delay elements incurred in L2, L3, and L5 handoffs versus the number of wireless hops between the AP of the MN and the gateway, under the three considered scenarios (default-based, gateway-based, and PMG-based). In the figure, L2SD is the L2 channel scanning delay, L3DD is the L3 handoff detection delay, L3UD is the L3 binding update delay, L3AD is the L3 binding acknowledgement delay, L5RD is the L5 multimedia session RE-INVITE delay, and THD is the total handoff delay. Under our proposed handoff scheme, the MN is notified of the potential channel information before being associated to the cAP, due to the exchange of handoff information between the PMG and MN via L3 messages. For L3DD, unlike the case of the default-based and gateway-based handoff schemes, in which the MN needs to wait for either a RA message or the reply message from a gateway to determine whether it has changed subnets, the MN in our PMG-based architecture can start a L3 handoff immediately after a L2 handoff finishes. So the L3DD in our proposed scheme can be reduced to almost zero.

There is no major difference in the other three delays (L3UD, L3AD, and L5RD) between the default-based and gateway-based scenarios, since after the MN detects its subnet change, it starts the L3 and L5 handoffs sequentially. In our PMG-based handoff scheme, as the PMGMR triggers the route path preparation in the target subnet prior to the MN's arrival, the L3UD, L3AD, and L5RD can be reduced to a level only depending on the multi-hop signaling message traversal time. In Fig. 6, the total handoff delay is much lower in our PMG-based handoff scheme as compared to the other two schemes, because our proposed scheme employs a cross-layer design and eliminates L3 handoff detection and route discovery delay.

On one hand, from the figure, we can see that the overhead is large if the MN is triggered to start the handoff preparation early. On the other hand, the total delay increases if the handoff preparation is triggered late. Therefore, it is vitally important to choose an appropriate handoff threshold in order to balance the tradeoff between the overhead generated during a handoff and the corresponding handoff delay.

Fig. 6: Total handoff delay and number of overhead messages.

VI. CONCLUSION

In this paper, we introduced a novel Planned Multicast Group (PMG)-based architectural design to facilitate cross-layer handoffs in WMNs. By implementing PMG mesh routers (PMGs), which are strategically placed in the mesh backbone to cover target subnets, inter-gateway handoff preparations can be carried out proactively before an MN loses its connection with the old subnet. We designed the detailed procedure of the proposed PMG-based cross-layer handoff scheme. Through a comprehensive simulation study using the NS simulator, we showed that our proposed PMG-based cross-layer handoff scheme significantly reduces the total handoff delay, as compared to conventional handoff schemes. Further reduction of the handoff delay can be achieved through efficient multihop routing and MAC protocol design.

VII. REFERENCES

[1] I. F. Akyildiz and X. Wang, "A survey on wireless mesh networks," IEEE Communications Magazine, vol. 43, no. 9, pp. 23-30, Sept. 2005.
[2] C. E. Perkins, "IP mobility support for IPv4," Request for Comments (RFC) 3220, Internet Engineering Task Force (IETF), January 2002.
[3] G. Holland and N. H. Vaidya, "Analysis of TCP performance over mobile ad hoc networks," in Proc. ACM MobiCom, 1999, pp. 219-230.
[4] I. Ramani and S. Savage, "SyncScan: practical fast handoff for 802.11 infrastructure networks," in Proc. IEEE INFOCOM, 2005, pp. 675-684.
[5] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: Session initiation protocol," Request for Comments (RFC) 3261, IETF, June 2002.
[6] H. Wu, K. Tan, Y. Zhang, and Q. Zhang, "Proactive scan: fast handoff with smart triggers for 802.11 wireless LAN," in Proc. IEEE INFOCOM, 2007, pp. 749-757.
[7] N. Montavont and T. Noel, "Handover management for mobile nodes in IPv6 networks," IEEE Communications Magazine, vol. 40, no. 8, pp. 38-43, August 2002.
[8] H. Soliman, C. Castelluccia, K. El Malki, and L. Bellier, "Hierarchical Mobile IPv6 Mobility Management (HMIPv6)," Request for Comments (RFC) 4140, IETF, August 2005.
[9] G. Dommety et al., "Fast Handovers for Mobile IPv6," Request for Comments (RFC) 4068, IETF, July 2005.
[10] M. Buddhikot, A. Hari, K. Singh, and S. Miller, "MobileNAT: A new technique for mobility across heterogeneous address spaces," Mobile Networks and Applications, vol. 10, no. 3, pp. 289-302, June 2005.
[11] V. Navda, A. Kashyap, and S. R. Das, "Design and evaluation of iMesh: an infrastructure-mode wireless mesh network," in Proc. Sixth IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), June 2005, pp. 164-170.
[12] Y. Amir, C. Danilov, M. Hilsdale, R. Musaloui-Elefteri, and N. Rivera, "Fast handoff for seamless wireless mesh networks," in Proc. ACM MobiSys, 2006, pp. 83-95.
[13] D. C. Plummer, "An Ethernet Address Resolution Protocol," IETF Request for Comments (RFC) 826, November 1982.
[14] C. E. Perkins, E. M. Belding-Royer, and S. Das, "Ad hoc on-demand distance vector (AODV) routing," Request for Comments (RFC) 3561, IETF, July 2003.
[15] C. Chang, C. J. Chang, and K. R. Lo, "Analysis of a hierarchical cellular system with reneging and dropping for waiting new calls and handoff calls," IEEE Trans. Veh. Technol., vol. 48, no. 4, pp. 1080-1091, Jul. 1999.
[16] V. K. N. Lau and S. V. Maric, "Mobility of queued call requests of a new call-queuing technique for cellular systems," IEEE Trans. Veh. Technol., vol. 47, no. 2, pp. 480-488, May 1998.
[17] W. Zhuang, B. Bensaou, and K. C. Chua, "Adaptive quality of service handoff priority scheme for mobile multimedia networks," IEEE Trans. Veh. Technol., vol. 49, no. 2, pp. 494-505, Mar. 2000.
[18] J. Zhang, J. W. Mark, and X. Shen, "An adaptive handoff priority scheme for wireless MC-CDMA cellular networks supporting multimedia applications," in Proc. IEEE GLOBECOM, Nov. 2004, vol. 5, pp. 3088-3092.
[19] M. R. Kibria and A. Jamalipour, "NXG04-5: Fair call admission control for prioritizing vertical handoff in multi-traffic B3G networks," in Proc. IEEE GLOBECOM, Nov. 2006, pp. 1-5.
[20] R. L. Geiger, J. D. Solomon, and K. J. Crisler, "Wireless network extension using mobile IP," IEEE Micro, vol. 17, no. 6, pp. 63-68, Nov./Dec. 1997.
[21] S. Mohanty and I. F. Akyildiz, "Performance analysis of handoff techniques based on mobile IP, TCP-migrate, and SIP," IEEE Trans. Mobile Comput., vol. 6, no. 7, pp. 731-747, Jul. 2007.


A NOVEL TECHNIQUE FOR DETECTING DATA HIDDEN ON DIGITAL IMAGE USING STEGANOGRAPHY

*D. Anandhi  **K. R. ArjunAdhityaa
* Asst. Prof., Department of Information Technology, anandhime@yahoo.com
** PG Student, Department of Information Technology, kr.arjunadhityaa@gmail.com
Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi-Alamathi Road, Chennai-62, India

ABSTRACT

Steganography is a technique for information hiding. It aims to embed secret data into a digital cover media, such as digital image, video, etc. On the other side, steganalysis aims to expose the presence of hidden secret messages in those stego media. If there exists a steganalytic algorithm which can guess whether a given media is a cover or not with a higher probability than random guessing, the steganographic system is considered broken. In the existing system, pixel values are chosen randomly according to a pseudorandom number generator (PRNG). This generally affects the visual quality of the image, which can be determined by pixel value differencing. In the proposed system, two new approaches are used: edge adaptive detection, which hides the data in the sharper edges, and the LSB substitution method, in which the choice of embedding positions within a cover image mainly depends on a pseudorandom number. These two methods generally improve the security and quality of the image when compared to the previous approach.

Index Terms - Least-significant-bit (LSB)-based steganography, least significant bit substitution method, least significant bit matching and replacement (LSBMR).

I. INTRODUCTION

STEGANOGRAPHY is a technique for information hiding. It aims to embed secret data into a digital cover media, such as digital audio, image, video, etc., without being suspicious. On the other side, steganalysis aims to expose the presence of hidden secret messages in those stego media. If there exists a steganalytic algorithm which can guess whether a given media is a cover or not with a higher probability than random guessing, the steganographic system is considered broken. In practice, two properties, undetectability and embedding capacity, should be carefully considered when designing a steganographic algorithm. Usually, the larger the payload embedded in a cover, the more detectable artifacts would be introduced into the stego. In many applications, the most important requirement for steganography is undetectability, which means that the

stegos should be visually and statistically similar to the covers while keeping the embedding rate as high as possible. In this paper, we consider digital images as covers and investigate an adaptive and secure data hiding scheme in the spatial least-significant-bit (LSB) domain.

LSB replacement is a well-known steganographic method. In this embedding scheme, only the LSB plane of the cover image is overwritten with the secret bit stream according to a pseudorandom number generator (PRNG). As a result, some structural asymmetry (never decreasing even pixels and increasing odd pixels when hiding the data) is introduced, and thus it is very easy to detect the existence of a hidden message even at a low embedding rate using some reported steganalytic algorithms, such as the Chi-squared attack, regular/singular groups (RS) analysis, sample pair analysis, and the general framework for structural steganalysis. LSB matching (LSBM) employs a minor modification to LSB replacement. If the secret bit does not match the LSB of the cover image, then +1 or -1 is randomly added to the corresponding pixel value. Statistically, the probability of increasing or decreasing each modified pixel value is the same, and so the obvious asymmetry artifacts introduced by LSB replacement can be easily avoided. Therefore, the common approaches used to detect LSB replacement are totally ineffective at detecting the LSBM. Up to now, several steganalytic algorithms have been proposed to analyze the LSBM scheme.

Unlike LSB replacement and LSBM, which deal with the pixel values independently, LSB matching revisited (LSBMR) uses a pair of pixels as an embedding unit, in which the LSB of the first pixel carries one bit of secret message, and the relationship (odd-even combination) of the two pixel values carries another bit of secret message. In such a way, the modification rate of pixels can decrease from 0.5 to 0.375 bits/pixel (bpp) in the case of a maximum embedding rate, meaning fewer changes to the cover image at the same payload compared to LSB replacement and LSBM. It is also shown that such a new scheme can avoid the LSB replacement style asymmetry, and thus it should make detection slightly more difficult than the LSBM approach, based on our experiments.

II. ANALYSIS OF LIMITATIONS OF RELEVANT APPROACHES AND STRATEGIES

In this section, we first give a brief overview of the typical LSB-based approaches, including LSB replacement, LSBM, and LSBMR, and some adaptive schemes, including the original PVD scheme, the improved version of PVD (IPVD), adaptive edges with LSB (AE-LSB), and hiding behind corners (HBC), and then show some image examples to expose the limitations of these existing schemes. Finally, we propose some strategies to overcome these limitations. In the LSB replacement and LSBM approaches, the embedding process is very similar. Given a secret bit stream to be embedded, a traveling order in the cover image is first generated by a PRNG, and then each pixel along the traveling order is dealt with separately. For LSB replacement, the secret bit simply overwrites the LSB of the pixel, i.e., the first bit plane, while the higher bit planes are preserved. For the LSBM scheme, if the secret bit is not equal to the LSB of the given pixel, then +1 or -1 is added randomly to the pixel while keeping the altered pixel in the range [0, 255]. In such a way, the LSB of pixels along the traveling order will match the secret bit stream after data hiding, both for LSB replacement and LSBM. Therefore, the extracting process is exactly the same for the two approaches. It first generates the same traveling order according to a shared key, and then the hidden message

can be extracted correctly by checking the parity bit of pixel values.

LSBMR applies a pixel pair (x_i, x_i+1) in the cover image as an embedding unit. After message embedding, the unit is modified as (x_i', x_i+1') in the stego image, which satisfies

LSB(x_i') = m_i
LSB(floor(x_i'/2) + x_i+1') = m_i+1

where LSB(.) denotes the least significant bit of a pixel value, and m_i and m_i+1 are the two secret bits to be embedded. By using the relationship (odd-even combination) of adjacent pixels, the modification rate of pixels in LSBMR would decrease compared with LSB replacement and LSBM at the same embedding rate. What is more, it does not introduce the LSB replacement style asymmetry. Similarly, in data extraction, it first generates a traveling order by a PRNG with a shared key. Then, for each embedding unit along the order, two bits can be extracted. The first secret bit is the LSB of the first pixel value, and the second bit can be obtained by calculating the relationship between the two pixels as shown above.
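A minimal Python sketch of this pairwise embedding and extraction, with names mirroring the formulas above; boundary handling at pixel values 0 and 255 is omitted for brevity, and the case logic reflects the scheme as we read it:

import random

def f(a, b):
    """The LSBMR pairing function: LSB(floor(a/2) + b)."""
    return (a // 2 + b) & 1

def embed_pair(x1, x2, m1, m2):
    """Hide two bits (m1, m2) in the pixel pair (x1, x2)."""
    if (x1 & 1) == m1:
        if f(x1, x2) != m2:
            x2 += random.choice((-1, 1))   # adjust only the second pixel
    else:
        x1 = x1 - 1 if f(x1 - 1, x2) == m2 else x1 + 1
    return x1, x2

def extract_pair(y1, y2):
    return y1 & 1, f(y1, y2)

# Round trip: both secret bits survive the embedding.
s1, s2 = embed_pair(100, 57, 1, 0)
assert extract_pair(s1, s2) == (1, 0)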

III. RELATED WORKS

A. EDGE ADAPTIVE DETECTION

The flow diagram of our proposed scheme is illustrated in Fig. 4. In the data embedding stage, the scheme first initializes some parameters, which are used for subsequent data preprocessing and region selection, and then estimates the capacity of those selected regions. If the regions are large enough for hiding the given secret message, then data hiding is performed on the selected regions. Finally, it does some postprocessing to obtain the stego image. Otherwise, the scheme needs to revise the parameters and then repeats region selection and capacity estimation until the message can be embedded completely. Please note that the parameters may be different for different image content and secret messages. We need them as side information to guarantee the validity of data extraction. In practice, such side information (7 bits in our work) can be embedded into a predetermined region of the image.

In data extraction, the scheme first extracts the side information from the stego image. Based on the side information, it then does some preprocessing and identifies the regions that have been used for data hiding. Finally, it obtains the secret message according to the corresponding extraction algorithm. We apply such a region adaptive scheme to the spatial LSB domain. We use the absolute difference between two adjacent pixels as the criterion for region selection, and use LSBMR as the data hiding algorithm. The details of the data embedding and data extraction algorithms are as follows.

It contains 3 different steps:
1. Region Selection
2. Data Embedding
3. Data Extraction

1. Region Selection

First, it initializes some parameters, which are used for subsequent data preprocessing and region selection, and then estimates the capacity of those selected regions. If the regions are not large enough for hiding the given secret message, the scheme revises the parameters and then repeats region selection. A traveling order in the cover image is first generated, and then each pixel value is dealt with separately.

2. Data Embedding

In the data embedding stage, the scheme first initializes some parameters, which are used for subsequent data preprocessing and region selection, and then estimates the capacity of those selected regions. If the regions are large enough for hiding the given secret message, then data hiding is performed on the selected regions. Finally, it does some postprocessing to obtain the stego image.

Step 1: The cover image is first divided into non-overlapping blocks of pixels. Each small block is rotated by a random degree in the range {0, 90, 180, 270}, as determined by a secret key. The resulting image is rearranged as a row vector V by raster scanning.

Step 2: According to the scheme of LSBMR, 2 secret bits can be embedded into each embedding unit. Therefore, for a given secret message M, the threshold t for region selection can be determined as follows. Let EU(t) be the set of pixel pairs whose absolute differences are greater than or equal to a parameter t:

EU(t) = {(x_i, x_i+1) : |x_i - x_i+1| >= t, for all (x_i, x_i+1) in V}

The threshold is then calculated as

t = arg max_t { 2 x |EU(t)| >= |M| }

Step 3: Perform data hiding on the set EU(t). We deal with the above embedding units in a pseudorandom order determined by a secret key. For each unit, we perform the data hiding according to the following four cases, where f(a, b) = LSB(floor(a/2) + b) and "!=" denotes "not equal to":

Case 1: LSB(x_i) = m_i and f(x_i, x_i+1) = m_i+1: (x_i', x_i+1') = (x_i, x_i+1)
Case 2: LSB(x_i) = m_i and f(x_i, x_i+1) != m_i+1: (x_i', x_i+1') = (x_i, x_i+1 + r)
Case 3: LSB(x_i) != m_i and f(x_i - 1, x_i+1) = m_i+1: (x_i', x_i+1') = (x_i - 1, x_i+1)
Case 4: LSB(x_i) != m_i and f(x_i - 1, x_i+1) != m_i+1: (x_i', x_i+1') = (x_i + 1, x_i+1)

Here r is a random value in {-1, +1}, and (x_i', x_i+1') denotes the pixel pair after data hiding.
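The threshold search of Step 2 follows directly from the definition of EU(t): take the largest t whose qualifying units can still carry the whole message at 2 bits per unit. The Python sketch below is illustrative; the block rotation of Step 1 and the pseudorandom traveling order of Step 3 are omitted.

def region_threshold(pixels, msg_bits):
    """Largest t with 2 * |EU(t)| >= |M|, per the formula above."""
    pairs = list(zip(pixels[::2], pixels[1::2]))   # non-overlapping units
    for t in range(255, -1, -1):                   # prefer the sharpest edges
        eu = [p for p in pairs if abs(p[0] - p[1]) >= t]
        if 2 * len(eu) >= msg_bits:
            return t
    raise ValueError("message too long for this cover")

# Toy example: only the two highest-contrast pairs are needed, so t stays large.
cover = [12, 200, 13, 14, 90, 40, 7, 7, 250, 3]
print(region_threshold(cover, msg_bits=4))   # prints 188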
Step 4: After data hiding, the resulting image is divided into non-overlapping blocks. The blocks are then rotated by a random number of degrees based on the secret key. The process is very similar to Step 1, except that the random degrees are opposite. Then we embed the two parameters into a preset region which has not been used for data hiding.

3. Data Extraction

To extract data, we first extract the side information, i.e., the block size and the threshold, from the stego image. We then do exactly the same things as in Step 1 of data embedding. The stego image is divided into blocks, and the blocks are then rotated by random degrees based on the secret key. The resulting image is rearranged as a row vector. Finally, we get the embedding units by dividing the vector into non-overlapping blocks of two consecutive pixels. We travel the embedding units whose absolute differences are greater than or equal to the threshold, according to a pseudorandom order based on the secret key, until all the hidden bits are extracted completely. For each qualified embedding unit, we extract the two secret bits as follows:

m_i = LSB(x_i'), m_i+1 = LSB(floor(x_i'/2) + x_i+1')

B. LSB SUBSTITUTION METHOD

It contains 3 different steps:
1. Region Selection
2. Data Embedding
3. Data Extraction

1. Region Selection

First, it initializes some parameters, which are used for subsequent data preprocessing and region selection. The choice of embedding positions within a cover image

mainly depends on a pseudorandom number.

2. Data Embedding

Before embedding the data, we use an 8-bit secret key and XOR it with all the bytes of the message to be embedded. The message is recovered by an XOR operation with the same key. Every pixel value in the image is analyzed, and the following checking process is employed:

1. If the value of the pixel, say gi, is in the range 240 <= gi <= 255, then we embed 4 bits of secret data into the 4 LSBs of the pixel. This can be done by observing the first 4 Most Significant Bits (MSBs); if they are all 1s, then the remaining 4 LSBs can be used for embedding data.

2. If the value of gi (first 3 MSBs all 1s) is in the range 224 <= gi <= 239, then we embed 3 bits of secret data into the 3 LSBs of the pixel.

3. If the value of gi (first 2 MSBs all 1s) is in the range 192 <= gi <= 223, then we embed 2 bits of secret data into the 2 LSBs of the pixel.

4. In all other cases, i.e., for values in the range 0 <= gi <= 191, we embed 1 bit of secret data into 1 LSB of the pixel.

Similarly, we can retrieve the secret data from the gray values of the stego image by again checking the first four MSBs of the pixel value and retrieving the embedded data.
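A minimal Python sketch of this checking process, together with the XOR pre-whitening by the shared 8-bit key; the helper names are ours, and the caller is assumed to supply exactly capacity(g) bits per pixel:

def capacity(g):
    """Bits embeddable in gray value g, per the four ranges above."""
    if 240 <= g <= 255:
        return 4
    if 224 <= g <= 239:
        return 3
    if 192 <= g <= 223:
        return 2
    return 1            # 0 <= g <= 191

def embed_pixel(g, bits):
    """Overwrite capacity(g) LSBs of g with the given bits (MSB first)."""
    n = capacity(g)
    value = 0
    for b in bits[:n]:
        value = (value << 1) | b
    return (g & ~((1 << n) - 1)) | value

def whiten(message, key):
    """XOR every byte with the 8-bit key; applying it again recovers the data."""
    return bytes(b ^ key for b in message)

print(capacity(245), capacity(230), capacity(200), capacity(100))  # 4 3 2 1
print(embed_pixel(0b11110000, [1, 0, 1, 1]))                       # 251

Because only the LSBs are overwritten, the leading MSBs, and therefore capacity(g), are unchanged by embedding, which is what allows blind extraction to recompute the same bit count per pixel.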

3. Data Extraction

In the extracting phase, the original range table is necessary. It is used to partition the stego image by the same method used for the cover image. Calculate the difference value d*(pi, pi+1) for each block of two consecutive pixels. Then, find the optimum Ri of the d*, the same as in the hiding phase. Subtract li from d*(pi, pi+1), and b0 is obtained. The b0 value represents the secret data as a decimal number. Transform b0 into binary with t bits, where t = floor(log2 wi). The t bits stand for the original secret data of hiding.

IV. EXPERIMENT RESULTS

In our experiments, cover images "Lena," "Baboon," and "Peppers" were used, each with size 512x512. Three of the cover images were used to embed a text message. They were compared with PVD, and the results obtained are shown in Table II. In addition, we have also introduced a new parameter in our experiment, known as the Average Fractional Change in Pixel Value, abbreviated AFCPV. The AFCPV results for the proposed method are included in Table III, where the message used is an image of an ATM card.

TABLE I
EMBEDDING CAPACITY OBTAINED IN PVD AND THE PROPOSED METHOD

Cover Image | PVD Method I (range widths 8, 8, 16, 32, 64, 128), capacity in bytes | PVD Method II (range widths 2, 2, 4, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64), capacity in bytes | Our Method, capacity in bytes
Lena | 50,960 | 25,940 | 35,827
Baboon | 56,291 | 36,061 | 34,235
Peppers | 50,685 | 27,269 | 60,317

TABLE II
VALUES OF RMSE AND PSNR OF STEGO IMAGES IN WHICH A FILE CONSISTING OF TEXT IS EMBEDDED

Cover Image | PVD Method I (range widths 8, 8, 16, 32, 64, 128): RMSE, PSNR | PVD Method II (range widths 2, 2, 4, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64): RMSE, PSNR | Our Method: RMSE, PSNR
Lena | 2.07, 41.69 | 0.97, 48.43 | 0.28, 59.05
Baboon | 3.25, 37.90 | 1.59, 44.10 | 0.27, 59.36
Peppers | 2.09, 41.73 | 1.20, 47.19 | 0.39, 56.24

TABLE III
VALUES OF MSE, RMSE, PSNR AND AFCPV OF STEGO IMAGE IN WHICH AN ATM CARD IMAGE IS EMBEDDED

Cover Image | MSE | RMSE | PSNR | AFCPV
Lena | 0.14 | 0.38 | 56.42 | 0.001029

Fig. 2: Stego images for the proposed method (MSE: 0.08) and the PVD method (MSE: 4.28).

Based on experiments, we also observe that the performances of the first three edge-based schemes, i.e., PVD, IPVD, and AE-LSB, are poorer than the LSB-based approaches. For the HBC method, its performance is similar to our method, although it can be easily detected by the RS analysis (please refer to Table II), which indicates that it is more difficult to detect those pixel changes that lie along the edge regions using the four universal feature sets.

A. Embedding Capacity and Image Quality Analysis

One of the important properties of our steganographic method is that it can first choose the sharper edge regions for data hiding according to the size of the secret message by adjusting a threshold t. As illustrated in Fig. 5, the larger the number of secret bits to be embedded, the smaller the threshold becomes, which means that more embedding units with lower gradients in the cover image can be released. When t is 0, all the embedding units within the cover become available. In such a case, our method can achieve the maximum embedding capacity of 100% (100% means 1 bpp on average for all the methods in this paper), and therefore, the embedding capacity of our proposed method is almost the same as the LSBM and LSBMR methods, except for 7 additional bits. It can also be observed that most secret bits are hidden within the edge regions when the embedding rate is low, e.g., less than 30% in the example, while keeping those smooth regions, such as the sky in the top left corner, as they are. Therefore, the subjective quality of our stegos would be improved based on the human visual system (HVS) characteristics.

B. Visual Attack

Although our method embeds the secret message bits by changing those pixels along the edge regions, it would not leave any obvious visual artifacts in the LSB planes of the stegos, based on our extensive experiments. Fig. 6 shows the LSB of the cover and its stegos using our proposed method with an embedding rate of 30% and 50%, respectively. It is observed that there is no visual trace; also, most smooth regions are well preserved. While for the LSBM, LSBMR, and some PVD-based methods with the random embedding scheme, the smooth regions would be inevitably disturbed and

thus become more random. It shows the LSB planes of the cover and its stegos using the seven steganographic methods with the same embedding rate of 50%, respectively. It is observed that the LSB planes of stegos using the LSBM, LSBMR, PVD, and IPVD methods (especially the LSBM, due to its higher modification rate) look more random compared with the others. On zooming in, these artifacts are more clearly observed, as illustrated. Please note that the smooth regions can also be preserved for HBC, and less smooth regions will be contaminated for AE-LSB due to its lower modification rate.

VI. CONCLUSION

In this project, we have compared image steganographic methods in the spatial domain: the edge adaptive method and LSB adaptive substitution. There exist some smooth regions in natural images; if a message is embedded in those regions, the LSB of the stego image gets affected, and hence it is easier to detect. To preserve the visual features of the image, the edge adaptive method embeds the secret message into sharper edge regions, while LSB adaptive substitution embeds the secret message by randomly selecting embedding positions within a cover image. In the LSB substitution method, a larger number of bits can be embedded, but this affects the quality of the message, whereas in edge adaptive detection only a smaller number of bits can be embedded in the stego image, and it does not affect the quality of the image. Both methods have significantly improved the quality and security of the image, and each has its own advantages and disadvantages.

VII. FUTURE ENHANCEMENTS

It is expected that the edge adaptive method and the LSB substitution method can be extended to hiding an image in an image. They can also be extended to steganographic methods for audio/video in the spatial domain or frequency domain when the embedding rate is less than the maximal amount.

VIII. REFERENCES

[1] J. Mielikainen, "LSB matching revisited," IEEE Signal Process. Lett., vol. 13, no. 5, pp. 285-287, May 2006.
[2] A. Westfeld and A. Pfitzmann, "Attacks on steganographic systems," in Proc. 3rd Int. Workshop on Information Hiding, 1999, vol. 1768, pp. 61-76.
[3] J. Fridrich, M. Goljan, and R. Du, "Detecting LSB steganography in color and gray-scale images," IEEE Multimedia, vol. 8, no. 4, pp. 22-28, Oct. 2001.
[4] S. Dumitrescu, X. Wu, and Z. Wang, "Detection of LSB steganography via sample pair analysis," IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1995-2007, Jul. 2003.
[5] A. D. Ker, "A general framework for structural steganalysis of LSB replacement," in Proc. 7th Int. Workshop on Information Hiding, 2005, vol. 3427, pp. 296-311.
[6] A. D. Ker, "A fusion of maximum likelihood and structural steganalysis," in Proc. 9th Int. Workshop on Information Hiding, 2007, vol. 4567, pp. 204-219.
[7] J. Harmsen and W. Pearlman, "Steganalysis of additive-noise modelable information hiding," Proc. SPIE Electronic Imaging, vol. 5020, pp. 131-142, 2003.
[8] A. D. Ker, "Steganalysis of LSB matching in grayscale images," IEEE Signal Process. Lett., vol. 12, no. 6, pp. 441-444, Jun. 2005.
[9] F. Huang, B. Li, and J. Huang, "Attack LSB matching steganography by counting alteration rate of the number of neighbourhood gray levels," in Proc. IEEE Int. Conf. Image Processing, Oct. 16-19, 2007, vol. 1, pp. 401-404.
[10] X. Li, T. Zeng, and B. Yang, "Detecting LSB matching by applying calibration technique for difference image," in Proc. 10th ACM Workshop on Multimedia and Security, Oxford, U.K., 2008, pp. 133-138.
[11] Y. Q. Shi et al., "Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network," in Proc. IEEE Int. Conf. Multimedia and Expo, Jul. 6-8, 2005, pp. 269-272.
[12] B. Li, J. Huang, and Y. Q. Shi, "Textural features based universal steganalysis," Proc. SPIE on Security, Forensics, Steganography and Watermarking of Multimedia, vol. 6819, p. 681912, 2008.
[13] M. Goljan, J. Fridrich, and T. Holotyak, "New blind steganalysis and its implications," Proc. SPIE on Security, Forensics, Steganography and Watermarking of Multimedia, vol. 6072, pp. 1-13, 2006.
[14] K. Hempstalk, "Hiding behind corners: Using edges in images for better steganography," in Proc. Computing Women's Congress, Hamilton, New Zealand, 2006.
[15] K. M. Singh, L. S. Singh, A. B. Singh, and K. S. Devi, "Hiding secret message in edges of the image," in Proc. Int. Conf. Information and Communication Technology, Mar. 2007, pp. 238-241.
[16] M. D. Swanson, B. Zhu, and A. H. Tewfik, "Robust data hiding for images," in Proc. IEEE Digital Signal Processing Workshop, Sep. 1996, pp. 37-40.
[17] D. Wu and W. Tsai, "A steganographic method for images by pixel value differencing," Pattern Recognit. Lett., vol. 24, pp. 1613-1626, 2003.
[18] X. Zhang and S. Wang, "Vulnerability of pixel-value differencing steganography to histogram analysis and modification for enhanced security," Pattern Recognit. Lett., vol. 25, pp. 331-339, 2004.
[19] C. H. Yang, C. Y. Weng, S. J. Wang, and H. M. Sun, "Adaptive data hiding in edge areas of images with spatial LSB domain systems," IEEE Trans. Inf. Forensics Security, vol. 3, no. 3, pp. 488-497, Sep. 2008.
[20] M. Kharrazi, H. T. Sencar, and N. Memon, "Cover selection for steganographic embedding," in Proc. IEEE Int. Conf. Image Processing, Oct. 8-11, 2006, pp. 117-120.

AUTOMATIC DATA EXTRACTION FROM WEBPAGES BY WEBNLP

*Mrs. ANANDHI D  **Mr. NEELAKANDAN S  ***Mrs. MAHAALAKSHMI K
* ASST PROF, Department of Information Technology, VEL TECH MULTI TECH Dr.RR & Dr.SR ENGG COLLEGE, anandhime@ymail.com
** ASST PROF, Department of Information Technology, VEL TECH MULTI TECH Dr.RR & Dr.SR ENGG COLLEGE, snksnk07@gmail.com
*** PG STUDENT (M.TECH/IT), Department of Information Technology, VEL TECH MULTI TECH Dr.RR & Dr.SR ENGG COLLEGE, kmahaalakshmi@yahoo.com

ABSTRACT - Information Extraction (IE) is the name given to any process which selectively structures and combines data that is found, explicitly stated or implied, in one or more texts. The final output of the extraction process varies; in every case, however, it can be transformed so as to populate some type of database. Information Extraction plays an important role in Web knowledge discovery and management. Information analysts working long term on specific tasks already carry out information extraction manually with the express goal of database creation. The two most important tasks in information extraction from the Web are web page structure understanding and natural language sentence processing. Our recent work on web page understanding introduces a joint model of Hierarchical Conditional Random Fields (HCRFs) and extended Semi-Markov Conditional Random Fields (Semi-CRFs) to leverage the page structure understanding results in free text segmentation and labeling. The HCRF model can reflect the structure, and the Semi-CRF model can make use of the gazetteers. In this top-down integration model, the decision of the HCRF model could guide the decision making of the Semi-CRF model. However, the drawback of the top-down integration strategy is also apparent, i.e., the decision of the Semi-CRF model could not be used by the HCRF model to guide its decision making. The WebNLP framework consists of two components, a structure understanding component and a text understanding component. WebNLP enables bi-directional integration of page structure understanding and text understanding in an iterative manner.

1 INTRODUCTION

The World Wide Web contains huge amounts of data. However, we cannot benefit very much from the large amount of raw webpages unless the information within them is extracted accurately and organized well. Therefore, information extraction (IE) plays an important role in Web knowledge discovery and management. Among various information extraction tasks, extracting structured Web information about real-world entities (such as people, organizations, locations, publications, products) has received much attention of late. However, little work has been done toward an integrated statistical model for understanding webpage structures and processing natural language sentences within the HTML elements of the webpage. Our recent work on Web object extraction has introduced a template-independent approach to understand the visual layout structure of a webpage and to effectively label the HTML elements with attribute names of an entity.

Our latest work on webpage understanding introduces a joint model of the Hierarchical Conditional Random Fields (HCRFs) model and the extended Semi-Markov Conditional Random Fields (Semi-CRFs) model to leverage the page structure

understanding results in free text segmentation and labeling. The HCRF model can reflect the structure, and the Semi-CRF model can make use of the gazetteers. In this top-down integration model, the decision of the HCRF model could guide the decision of the Semi-CRF model. However, the drawback of the top-down strategy is that the decision of the Semi-CRF model could not be used by the HCRF model to refine its decision making. In this paper, we introduce a novel framework called WebNLP that enables bidirectional integration of page structure understanding and text understanding in an iterative manner. In this manner, the results of page structure understanding and text understanding can be used to guide the decision making of each other, and the performance of the two understanding procedures is boosted iteratively.

Although the WebNLP framework is motivated by multiple mentions of object attributes (named entities) in a webpage, it will also improve entity extraction from webpages without multiple mentions because of the joint optimization nature of the framework.

The main contributions of this work are as follows:
1. We introduce a novel framework for webpage understanding called WebNLP to boost the performance of page structure understanding and shallow natural language processing iteratively.
2. We introduce the multiple occurrence features to the WebNLP framework. They improve both the precision and recall of named entity extraction and structured Web object extraction on a webpage.
3. Shallow natural language processing features are applied to the WebNLP framework, which allows training of the natural language features on an existing large corpus different from the limited labeled webpages.

2 RELATED WORK

Webpage understanding plays an important role in information retrieval from the Web. There are two main branches of work for webpage understanding: template-dependent approaches and template-independent approaches.

Template-dependent approaches (i.e., wrapper-based approaches) can generate wrappers either with supervision or without supervision. The supervised approaches take in some manually labeled webpages and learn some extraction rules (i.e., wrappers) based on the labeling results. Unsupervised approaches do not need labeled training samples. They first automatically discover clusters of the webpages and then produce wrappers from the clustered webpages. No matter how the wrappers are generated, they can only work on the webpages generated by the same template. Therefore, they are not suitable for general purpose webpage understanding.

In contrast, template-independent approaches can process various pages from different templates. However, most of the methods in the literature can only handle some special kinds of pages or specific tasks, such as object block (i.e., data record) detection. Zhai and Liu proposed an algorithm to extract structured data from list pages. The method consists of two steps. It first identifies individual records based on visual information and a tree matching method. Then a partial tree alignment technique is used to align and extract data items from the identified records. First, they use the Vision-based Page Segmentation (VIPS) algorithm to partition a webpage into semantic blocks with a hierarchy structure. Then, spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms, such as SVM and neural networks, are applied to train various block importance models.

It is natural to close the loop in webpage understanding by introducing a bidirectional integration model, where the bottom-up model using text understanding to guide structure understanding is integrated with the top-down model mentioned above. However, the natural language understanding component in the loop needs to be accurate enough to provide positive

feedback to the structure understanding component. In [11], the Semi-CRF model is designed to handle simple text fragment segmentation, such as the segmentation between city, state, and zip code. Therefore, the model only contains some regular expression features. For these regular expression features, the model can be trained to achieve nearly optimal parameters with only hundreds of labeled webpages. However, these features are not comprehensive enough to segment and label the natural language sentences in the webpage for tasks like business name extraction.

The task of template-independent webpage understanding is defined in this paper as the task of page structure understanding and text content segmentation and labeling. The state-of-the-art webpage structure understanding algorithm is the HCRF algorithm. HCRF organizes the segmentation of the page hierarchically to form a tree structure and conducts inference on the vision tree to tag each vision node (vision block) with a label. The HCRF model has been proved effective for product information extraction. Since the attribute values of product objects (such as the product price, image, and description) are usually the entire text content of an HTML element, text segmentation of the content within HTML elements is done as a postprocessing step.

The requirement of text understanding in information retrieval is simpler than classical natural language understanding. Deep parsing of the sentences is unnecessary in most cases. Shallow parsing that can extract some important named entities is usually enough. The most popular technique used for named entity recognition is the Conditional Random Field (CRF), which is language independent. The combination of structure understanding and text understanding is natural. All this work holds the belief that structure understanding can help text understanding. For example, Zhu et al. described a joint model that was able to segment and label the text within the vision node. It integrates the HCRF model and the Semi-CRF model. It also extends the Semi-CRF model to take the vision node label assignment as an input of the feature functions. The label of the vision node is actually a switch. It eliminates unnecessary searching paths in the optimization procedure of the Semi-CRF model. This joint model is in fact only a top-down integration, where only the label of the vision node can guide the segmentation and labeling of its inner text. The labeling of the text strings cannot be used to refine the labeling of the vision nodes.

Observing the drawback of existing models, we propose our WebNLP framework. The differences between the previous model and the WebNLP framework are obvious. First, the WebNLP framework is a bidirectional integration strategy, where the page structure understanding and the text understanding are reinforced by each other in an iterative way. It closes the loop in webpage understanding. Second, we introduce multiple mention features in this new framework. Our model treats the segmentation and labeling decisions at all mentions of one same entity as its observation. Such a treatment greatly expands the valid features of the entity to make more accurate decisions. Third, we introduce an auxiliary corpus to train the weights of the statistical language features of the extended Semi-CRF model. It makes our model perform much better than the extended Semi-CRF model with only regular expression matching features and sequential structure features.
This paper aims at introducing a joint framework whole webpage. All the leaf nodes form the most
that can segment and label both the structure detailed flat segmentation of the webpage. Only
layout and text in the webpage. In this section, we leaf nodes have inner text content. The text
first introduce the data representation of the content inside leaf nodes may contain information
structure layout of the webpage and the text like business name. The text content could be
content within the webpage. Then, we formally structured text like address lines or grammatical
define the webpage understanding problem paragraphs, which contain the attribute values of
3.1 Data Representation an entity.
We use the VIPS approach to segment a webpage In this study, we use the vision tree as the data
into visually coherent blocks. VIPS makes use of representation for the structure understanding. We
page layout features, such as client region, font, use X ¼ fx1; x2; . . . ; xi; . . . ; xjXjg to denote
color, and size, to construct a vision tree the entire vision tree of a webpage. xi is the
representation of the webpage. Different from the observation on the ith vision node, which can be
HTML DOM tree, each node in the vision tree either inner node or leaf node. The observation
represents a region on the webpage. The region of contains both the visual information, e.g., the
the parent node is the aggregation of those of all position of the node, and the semantic
its child nodes. The root node represents the information, e.g., the text string within the node.
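As an illustration of this representation, the following is a minimal Python sketch of a vision-tree node carrying both kinds of observations; the names are illustrative only and this is not the authors' implementation:

    class VisionNode:
        """One region of the webpage; the root covers the whole page."""
        def __init__(self, position, text=None, children=None):
            self.position = position        # visual observation, e.g., a bounding box
            self.text = text                # inner text; only leaf nodes carry text
            self.children = children or []  # child regions aggregated by this region
            self.label = None               # h, assigned during structure understanding

        def is_leaf(self):
            return not self.children

    def leaves(node):
        """The leaf nodes form the most detailed flat segmentation of the page."""
        if node.is_leaf():
            return [node]
        return [lf for child in node.children for lf in leaves(child)]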
Each vision node is associated with a label h to represent the role of the node in the whole tree, e.g., whether the node contains all or some of the attributes of an object. So, H = {h1, h2, ..., hi, ..., h|X|} represents the labels of the vision tree X. We denote the label space of h by Q.

The text string within a leaf node is represented by a character sequence. Understanding the text means segmenting the text into nonoverlapping pieces and tagging each piece with a semantic label. In this paper, text understanding is equal to text segmentation and labeling. We use s = {s1, s2, ..., sm, ..., s|s|} to represent the segmentation and tagging over the text string within a leaf node x. Each segment in s is a triple sm = {αm, βm, ym}, in which αm is the starting position, βm is the end position, and ym is the segment label that is assigned to all the characters within the segment. We use |x| to denote the length of the text string within the vision node x. Then, segment sm satisfies 0 ≤ αm < βm ≤ |x| and αm+1 = βm + 1. Named entities are some special segments differentiated from other segments by their labels. We denote the label space of y by Y. All the segmentation and tagging of the leaf nodes in the vision tree are denoted by S = {s1, s2, ..., si, ..., s|S|}. Unless otherwise specified, these symbols defined above have the same meaning throughout the paper.

3.2 Problem Definition

Given the data representation of the page structure and text strings, we can define the webpage understanding problem formally as follows:

Definition 1 (Joint optimization of structure understanding and text understanding). Given a vision tree X, the goal of joint optimization of structure understanding and text understanding is to find both the optimal assignment of the node labels and text segmentations, (H, S)*.

This definition is the ultimate goal of webpage understanding, i.e., the page structure and the text content should be understood together. However, such a definition of the problem is too hard because the search space is the Cartesian product of Q and Y. Fortunately, the negative logarithm of the posterior in (1) will be a convex function if we use the exponential function as the potential function [24]. Then we can use coordinatewise optimization to optimize H and S iteratively. In this manner, we can solve two simpler conditional optimization problems instead of solving the joint optimization problem in (1) directly, i.e., we do structure understanding and text understanding separately and iteratively. The formal definitions of the two conditional optimization problems are as follows:

Definition 2 (Structure understanding). Given a vision tree X and the text segmentation and labeling results S on the leaf nodes of the tree, structure understanding is to find the optimal label assignment H* of all the nodes in the vision tree.

The objective of structure understanding is to identify the labels of all the vision nodes in the vision tree. Both the raw observations of the nodes in the vision tree and the understanding results about the text within each leaf node are used to find the optimal label assignment of all the nodes on the tree.

Definition 3 (Text understanding). Given a vision tree X and the label assignment H on all vision nodes, text understanding is to find the optimal segmentation and labeling S* on the leaf nodes.

The task of the text understanding problem in entity extraction is to identify all the named entities in the webpage. The labeling results of the vision nodes will constrain the text understanding component to search only part of the label space of the named entities. The labels of the named entities within a vision node are forced to be compatible with the label of the node assigned by the structure understanding. The problem described in Definition 1 can be solved by solving the two subproblems in Definition 2 and Definition 3 iteratively, starting from any reasonable initial solution. In Definition 2, the S in the condition is the optimum of the text understanding in the last iteration, and in Definition 3, the H in the condition is the optimum of the structure understanding in the last iteration. The iteration can begin with either the structure understanding or the text understanding. In this work, we begin with the text understanding. The features related to the label given by structure understanding are set to zero in the first run of text understanding. The loop stops when the optima in two adjacent iterations are close enough.

4 WEBNLP FRAMEWORK

In this section, we introduce the WebNLP framework to solve the webpage understanding problem. We first introduce the framework intuitively and describe the individual models within the framework formally. Then, we describe how we integrate the page structure understanding model and the text understanding model together in the framework. The parameter
learning method and the label assignment procedure will be explained last.

4.1 Overview

The WebNLP framework consists of two components, i.e., a structure understanding component and a text understanding component. The observations of these two components both come from the webpage. The understanding results of one component can be used by the other component to make a decision. The information flows between the two components form a closed loop. The beginning of the loop is not very important; however, we will show that starting from the text understanding component is a good choice.

The structure understanding component assigns labels to the vision blocks in a webpage, considering visual layout features directly from the webpage together with the segments returned by the text understanding component. If the segments of the inner text are not available, it will work without such information. The text understanding component segments the text string within the vision block according to the statistical language features and the label of the vision block assigned by the structure understanding component. If the label of the vision block is not available, it can also work without such information. The two components run iteratively until some stop criteria are met. Such iterative optimization can boost the performance of both the structure understanding component and the text understanding component.

4.2 The Extended Models

As we introduced previously, the state-of-the-art models for webpage structure understanding and text understanding are the HCRF model and the Semi-CRF model, respectively. However, there is no way to make them interact with each other in their original forms. Therefore, we extend them by introducing additional input parameters to the feature functions. The original forms of the HCRF model and the Semi-CRF model have been introduced in Section 4; therefore, we only introduce the forms of the extended HCRF model and the extended Semi-CRF model in this section.

We first extend the HCRF model by introducing other kinds of feature functions. These feature functions take the segmentation of the text strings as their input. Analogizing to the feature functions defined in Section 4.2, we use ek(H|t, X, S) to represent the feature functions having text string segmentation as input. To simplify the expression, we use the functions defined on the triangle to represent all functions defined on the vertex, edge, or triangle. As the WebNLP framework is an iterative one, we further use the superscript j to indicate the decision in the jth iteration.

4.3 Model Integration

We analyze how these two models are integrated together in this section. Fig. 2 gives an illustrative example of the connection between the extended HCRF model and the extended Semi-CRF model in a webpage. It is generated based on the example webpage shown in Fig. 1. There are two types of connections in the integrated model. One is the connection between the vision node label and the segmentation of the inner text. The other is the connection between multiple mentions of a same named entity.

4.3.1 Vision Tree Node and Its Inner Text

The natural connection between the extended HCRF model and the extended Semi-CRF model is via the vision tree node and its inner text. The feature functions that connect the two models are rk(·) in the extended Semi-CRF model and ek(·) in the extended HCRF model. Feature function rk(·) in the extended Semi-CRF model takes the labeling results of the leaf node given by the extended HCRF model as its input. Feature function ek(·) in the extended HCRF model uses the segmentation and labeling results of the extended Semi-CRF model as its input.

4.3.2 Multiple Mentions

In many cases, a named entity has more than one mention within a webpage. Therefore, it is natural to collect evidence from all the different mentions of one same named entity to make a decision on all these occurrences together. The evidence from all the other mentions of a named entity is delivered, via feature function uk(·), to the vision tree node where one of the mentions of the named entity lies, when the extended Semi-CRF model is working.

uk(·) can introduce the segmentation and labeling evidence from other occurrences of the text fragment all over the current webpage. By referencing the decision S^(j-1) over all the text strings in the last iteration, uk(·) can determine whether the same text fragment has been labeled as an ORGANIZATION elsewhere, or whether it has been given a label other than STREET. By this means, the evidence for a same named entity is shared among all its occurrences within the webpage.
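The closed loop can be summarized in the following Python sketch. It is illustrative only, not the authors' code: text_understanding() stands in for extended Semi-CRF inference (Definition 3) and structure_understanding() for extended HCRF inference (Definition 2), each assumed to return a labeling and its score:

    def webnlp_loop(vision_tree, text_understanding, structure_understanding,
                    max_iterations=10, tolerance=1e-6):
        H = None                      # vision-node labels; unknown in the first run,
                                      # so label-dependent features evaluate to zero
        S = None                      # text segmentation and labeling
        previous = float("-inf")
        for _ in range(max_iterations):
            S, s_score = text_understanding(vision_tree, H)       # uses H from last iteration
            H, h_score = structure_understanding(vision_tree, S)  # uses the fresh S
            score = s_score + h_score
            if abs(score - previous) < tolerance:  # optima in adjacent iterations close enough
                break
            previous = score
        return H, S

As reported in the experiments below, two iterations already sufficed in practice, giving the schedule Semi-CRF, HCRF, Semi-CRF, HCRF, Semi-CRF.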
5 EXPERIMENTS

We have carried out two sets of experiments to illustrate the effectiveness of the WebNLP framework. The first set of experiments is based on the Windows Live Local search service. It focuses on local business object extraction from English webpages. The second set of experiments is to extract named entities from Chinese webpages for a social network research project. We describe the experiments in detail in the following sections.

5.1 Experiments on Local Search Service

The task of this application is to extract local entities within a webpage for Windows Live Local search. The attributes of a business entity in this experiment include the business NAME, STREET, CITY, and STATE. These attributes are essential to identify a specific business in Windows Live Local search.

5.1.1 Data Set

The webpages used in this experiment are crawled according to a business name list. Because it is required that every webpage should contain all four attributes to identify a business entity, we select 456 pages from all crawled pages satisfying this requirement. An example of the pages used in this experiment has been shown in Fig. 1. The attributes in these webpages are all manually labeled. We randomly select 200 pages for training, and the remaining 256 pages are left for testing. The auxiliary training corpus for the statistical language features in the extended Semi-CRF model comes from two sources. The first one is the text in the labeled webpages; there were 3,030 sentences from this source. The second one was automatically generated from Microsoft Live search. We sent queries with quoted company names, randomly selected from yellow pages data, to the Live search engine. Then, we filtered the returned snippets to select the sentences containing the company name. In this way, we could conveniently get a large amount of accurately labeled data quickly. We picked 30,000 sentences from this source.

5.1.2 Methods and Evaluation Metrics

We compared four different algorithms in this experiment: BHS, NHS, MHS, and WebNLP. The first algorithm is the original HCRF and extended Semi-CRF framework. We name it the Basic HCRF and extended Semi-CRF (BHS) algorithm. It is the algorithm described in [11]. The second algorithm is similar to the BHS algorithm. The only difference from BHS is that it adds the natural language features directly into the extended Semi-CRF model. We name it the Natural language HCRF and extended Semi-CRF (NHS) algorithm. The extended Semi-CRF model in the NHS algorithm is trained using both the text nodes from the labeled webpages and the corpus data. The superlabels of all sentences from the Live search are set as NAME because we only queried the NAME. The rest of this algorithm is identical to the BHS algorithm. The third algorithm is based on the NHS algorithm. It further adds global multiple mention feature functions to the HCRF model. We name this algorithm Multiple mentions HCRF and extended Semi-CRF (MHS). These feature functions are all for the business NAME attribute. The summaries of the features at other mentions of the same business NAME candidate are used as feature functions for the current mention. It is a kind of feature sharing.

5.1.3 Results and Discussions

The object extraction results, as well as the attribute extraction results, of the different algorithms are reported in Table 1. We can see that the P/R/F1 of all the attributes and the object of the proposed WebNLP framework are the highest among the four algorithms. An interesting result in Table 1 is that the precision of the attributes CITY and STATE of all these algorithms is 100 percent. We checked the values of the features related to CITY and STATE, and found out that the 100 percent precision is because the validation features from the gazetteer are quite strong, i.e., all erroneous extractions of the city and state are filtered by the gazetteer-based features. We also analyzed the contribution of different components of the framework. It shows that all components of the WebNLP framework contribute to its good performance.

The contribution of the statistical language features can be seen from the comparison of the NHS algorithm and the BHS algorithm. The statistical language features in the NHS algorithm help to improve the precision of the business NAME and STREET because the two attributes may not be easy to segment precisely without some statistical language evidence. The NLP features provide accurate segmentation suggestions for the extended Semi-CRF model. Though the recall of some attributes becomes low, it can be amended by future components added to the framework. We have to admit that the comparison between NHS and BHS is a bit unfair because NHS used an additional corpus that was
not seen by BHS. However, it is a practical strategy to incorporate as many resources as possible, as long as these resources are easy to obtain and the algorithm can handle them. For the NHS algorithm, the additional corpus is easy to obtain, and the algorithm can handle it without too much effort.

The contribution of the multiple mention features is reflected by the difference between the MHS algorithm and the NHS algorithm. The multiple mention features helped the MHS algorithm to increase both the precision and recall of the business NAME compared with the NHS algorithm. However, we can see that the improvement is limited. It proves that the simple feature sharing mechanism could not fully utilize the information.

The WebNLP framework gets the best numbers on all attributes and on the object as a whole. It amends the decrease of the recall of CITY and STATE by reusing part of the Semi-CRF model in BHS. The iterative labeling procedure greatly improved the recall of the business NAME. In our experiment, we found out that two iterations were enough to make the labeling procedure converge. Therefore, the process of the WebNLP algorithm in this experiment was Semi-CRF, HCRF, Semi-CRF, HCRF, Semi-CRF.

We can also conclude from Table 1 that the object extraction benefits from the improvement of the attribute extraction, i.e., the extended Semi-CRF model helps the extended HCRF model to make a better decision on the object block extraction. Essentially, the object is described by its associated attributes.

6 CONCLUSIONS

Webpage understanding plays an important role in Web search and mining. It contains two main tasks, i.e., page structure understanding and natural language understanding. However, little work has been done toward an integrated statistical model for understanding webpage structures and processing natural language sentences within the HTML elements.

In this paper, we introduced the WebNLP framework for webpage understanding. It enables bidirectional integration of page structure understanding and natural language understanding. Specifically, the WebNLP framework is composed of two models, i.e., the extended HCRF model for structure understanding and the extended Semi-CRF model for text understanding. The performance of both models can be boosted in the iterative optimization procedure. The auxiliary corpus is introduced to train the statistical language features in the extended Semi-CRF model for text understanding, and the multiple occurrence features are also used in the extended Semi-CRF model by adding the decision of the model in the last iteration. Therefore, the extended Semi-CRF model is improved by using both the labels of the vision nodes assigned by the HCRF model and the text segmentation and labeling results given by the extended Semi-CRF model itself in the last iteration as additional input parameters in some feature functions; the extended HCRF model benefits from the extended Semi-CRF model by using the segmentation and labeling results of the text strings explicitly in the feature functions. The WebNLP framework closes the loop in webpage understanding for the first time. The experimental results show that the WebNLP framework performs significantly better than the state-of-the-art algorithms on English local entity extraction and Chinese named entity extraction on webpages.

REFERENCES

[1] J. Cowie and W. Lehnert, "Information Extraction," Comm. ACM, vol. 39, no. 1, pp. 80-91, 1996.
[2] C. Cardie, "Empirical Methods in Information Extraction," AI Magazine, vol. 18, no. 4, pp. 65-80, 1997.
[3] R. Baumgartner, S. Flesca, and G. Gottlob, "Visual Web Information Extraction with Lixto," Proc. Conf. Very Large Data Bases (VLDB), pp. 119-128, 2001.
[4] A. Arasu and H. Garcia-Molina, "Extracting Structured Data from Web Pages," Proc. ACM SIGMOD, pp. 337-348, 2003.
[5] D.W. Embley, Y.S. Jiang, and Y.-K. Ng, "Record-Boundary Discovery in Web Documents," Proc. ACM SIGMOD, pp. 467-478, 1999.
[6] N. Kushmerick, "Wrapper Induction: Efficiency and Expressiveness," Artificial Intelligence, vol. 118, nos. 1/2, pp. 15-68, 2000.
[7] K. Lerman, S. Minton, and C.A. Knoblock, "Wrapper Maintenance: A Machine Learning Approach," J. Artificial Intelligence Research (JAIR), vol. 18, pp. 149-181, 2003.
[8] I. Muslea, S. Minton, and C.A. Knoblock, "Hierarchical Wrapper Induction for Semistructured Information Sources," Autonomous Agents and Multi-Agent Systems, vol. 4, nos. 1/2, pp. 93-114, 2001.
[9] J. Zhu, Z. Nie, J.-R. Wen, B. Zhang, and W.-Y. Ma, "Simultaneous Record Detection and Attribute Labeling in Web Data Extraction," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 494-503, 2006.
IMPLEMENTATION OF ENCRYPTED IMAGE COMPRESSION USING RESOLUTION PROGRESSIVE COMPRESSION SCHEME

*Mathew P C **M. Arunkumar

*II M.E CC, PSNA College of Engineering and Technology, Dindigul, Tamil Nadu. Email: pcmathew_bmc@yahoo.com
**Lecturer, Department of Information Technology, PSNA College of Engg and Technology, Dindigul, Tamil Nadu
Abstract—When it is desired to transmit redundant data over an insecure channel, it is customary to encrypt the data. Here the image data is first encrypted and then undergoes compression in resolution, so the decoder first gets only a lower-resolution version of the image. The source dependency is exploited to improve the compression efficiency.

Index Terms—Advanced Encryption Standard, Encrypted images, Image Processing, Markov Decoding.

I. INTRODUCTION

Image processing is any form of information processing in which the input is an image. Image processing studies how to transform, store, and retrieve images. Digital image processing is the use of computer algorithms to perform image processing on digital images.

Many of the techniques of image processing were developed with application to satellite imagery, medical imaging, object recognition, and photo enhancement. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest.

An image can be defined as a two-dimensional function f(x, y) (2-D image), where x and y are spatial coordinates, and the amplitude of f at any pair (x, y) is the gray level of the image at that point. For example, a gray level image can be represented as fij, where fij = f(x, y). When x, y, and the amplitude values of f are finite, discrete quantities, the image is called a "digital image". The finite set of digital values is called picture elements, or pixels. Typically, the pixels are stored in computer memory as a two-dimensional array or matrix of real numbers.

Color images are formed by a combination of individual 2-D images. Many of the image processing techniques for monochrome images can be extended to color (3-D) images by processing the three component images individually.

Digital image processing refers to processing a digital image by means of a digital computer, and to the study of algorithms for their transformation. Since the data of a digital image is in matrix form, DIP can utilize a number of mathematical techniques. The essential subject areas are computational linear algebra, integral transforms, statistics, and other techniques of numerical analysis. Many DIP algorithms can be written in terms of matrix equations; hence, computational methods in linear algebra become an important aspect of the subject.

Digital image processing encompasses a wide and varied field of application, such as image operation and compression, computer vision, and image analysis (also called image understanding). Three types of computerized processing are commonly considered: low-level processing is characterized by the fact that both its inputs and outputs are images; mid-level processing on images is characterized by the fact that its inputs are images, but its outputs are attributes extracted
from those images; higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and performing the cognitive functions associated with human vision. In particular, digital image processing is the practical technology for areas such as image compression, classification, feature extraction, pattern recognition, projection, and multi-scale signal analysis.

II. COMPRESSION OF ENCRYPTED IMAGES

Security in communication systems has become increasingly important in recent times. The Internet has become a hostile environment, with both wired and wireless channels offering no inherent assurance of confidentiality. Strong encryption schemes, such as the Advanced Encryption Standard (AES), have been designed to provide confidentiality for arbitrary binary data. However, communications have become increasingly multimedia in nature, and such strong encryption schemes do not take into account the special characteristics of multimedia data and the way in which they are accessed. Images and video are typically large in size compared to text and audio, and often already consume significant computational resources at both the source and receiver for coding and decoding, respectively. Also, applications such as remote surveillance may involve the streaming of sensitive visual image data over untrusted networks. Confidentiality may be required, but blindly applying a strong encryption scheme such as AES would demand a prohibitive amount of computational resources for the large volume of real-time data. Other applications, such as online collaboration, may involve the use of power-limited mobile devices, such as mobile phones and personal digital assistants (PDAs) with embedded imaging capabilities, forming ad-hoc wireless networks. Most of the computational resources of the devices are dedicated to the coding and decoding of the visual data, making the application of schemes such as AES exceedingly difficult or impossible.

For secure transmission of data through the communication channel, the data is usually first compressed and then encrypted at the source; at the destination, the data is received and decrypted, followed by decompression [1]. This is illustrated in Fig. 1.

Fig. 1. Conventional approach for secure data transmission (compress, then encrypt, at the source; decrypt, then decompress, at the receiver).

But this traditional method is not suitable for some applications. For example, suppose John wants to send information to Ken, while Ben is the network provider. John wants to keep the information confidential from Ben. In this situation, John encrypts the data using a simple cipher and gets it forwarded. Thus Ben can compress the data without accessing the secret key. If Ken holds the secret key used by John, then he will be able to perform joint decryption and decompression. Thus the overall performance of the system can be increased. This is illustrated in Fig. 2.

The rest of the paper is organized as follows. Section III reviews the related work in this area. Section IV gives a detailed explanation of the proposed system. Section V concludes and discusses future work.

III. RELATED WORK

In the existing system, lossless compression of encrypted sources can be achieved through Slepian-Wolf coding [3]. For encrypted real-world sources, such as images, the key to improving the compression efficiency is how the source dependency is exploited. Trellis Coded Vector Quantization [3] can also be used for compressing encrypted image sources. It has been reported that good results are produced for binary images, but challenges remain when it comes to practical real-world applications. The coding efficiency can be improved only by exploiting the source dependency. Both of these techniques have the following disadvantages:
• Markov decoding in Slepian-Wolf coding is expensive in computational complexity.
• The source dependency is not fully utilized.
• Since image and video are highly nonstationary, the Markov model cannot describe their local statistics precisely.
• For 8-bit gray scale images, only the two most significant bit-planes are compressible by employing a 2-D Markov model in bit planes [10].

A. Encryption

Image encryption techniques try to convert an image to another one that is hard to understand. On the other hand, image decryption retrieves the original image from the encrypted one. There are various image encryption systems to encrypt and decrypt data, and no single encryption algorithm satisfies all the different image types.

Fig. 2. Secure transmission using compression of encrypted data.

Most of the algorithms specifically designed to encrypt digital images were proposed in the mid-1990s. There are two major groups of image encryption algorithms: (a) non-chaos selective methods and (b) chaos-based selective or non-selective methods. Most of these algorithms are designed for a specific image format, compressed or uncompressed, and some of them are even format compliant. There are methods that offer light encryption (degradation), while others offer a strong form of encryption. Some of the algorithms are scalable and have different modes ranging from degradation to strong encryption.

B. Image Compression

Data compression is one of the enabling technologies for every aspect of the multimedia revolution. Cellular phones would not be able to provide communication with increasing clarity without data compression. Data compression is the art and science of representing information in compact form. Uncompressed multimedia (graphics, audio, and video) data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. In a distributed environment, large image files remain a major bottleneck within systems. Image compression is an important component of the solutions available for creating image file sizes of manageable and transmittable dimensions. Platform portability and performance are important in the selection of the compression/decompression technique to be employed.

Image compression has become increasingly important with the continuous development of Internet, remote sensing, and satellite communication techniques. Due to the high cost of providing a large transmission bandwidth and a huge amount of storage space, many fast and efficient image compression engines have been introduced.

In image processing applications such as web browsing, photography, image editing, and printing, a lossy coding such as JPEG is sufficient as an image compression tool. Although some information loss can be tolerated in most of these applications, there are certain image processing applications that demand no pixel difference between the original and the reconstructed image. Such applications include medical imaging, remote sensing, satellite imaging, and forensic analysis, where lossless compression is extremely important.

IV. PROPOSED WORK

In the proposed system, in order to achieve efficient compression of encrypted images, a Resolution Progressive Compression (RPC) scheme is used. Here the encryption is performed using the RSA algorithm. The scheme compresses an encrypted image progressively in resolution, such that the decoder can observe a low-resolution version of the image, study local statistics based on it, and use the statistics to decode the next resolution level. The success of the RPC scheme is due to enabling partial access to the current source at the decoder side to improve the decoder's learning of the source statistics.

The encoder gets the ciphertext and decomposes it into four sub-images, namely, the 00, 01, 10, and 11 sub-images. Each sub-image is a downsampled-by-two version of the encrypted image. Once the decomposed image is obtained, we try to find a way to code the wavelet coefficients efficiently, taking redundancy and storage space into consideration.
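The decomposition itself is a simple polyphase split. The following is a minimal numpy sketch, an illustration rather than the authors' code:

    import numpy as np

    def decompose(image):
        """Split an image into the 00, 01, 10 and 11 sub-images by taking
        even/odd rows and columns; each is a downsampled-by-two version."""
        return {
            "00": image[0::2, 0::2],
            "01": image[0::2, 1::2],
            "10": image[1::2, 0::2],
            "11": image[1::2, 1::2],
        }

Applying the same split recursively to the 00 sub-image produces the successive resolution levels used by the scheme.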
System Description

Fig. 3. Block diagram of the RPC scheme (with context-adaptive interpolation and localized channel estimation).

SPIHT is one of the most advanced schemes available, even outperforming the state-of-the-art JPEG 2000 in some situations. The basic principle is the same: a progressive coding is applied, processing the image with respect to a lowering threshold. The difference is in the concept of zerotrees (spatial orientation trees in SPIHT). This is an idea that takes the bonds between coefficients across subbands in different levels into consideration. The first idea is always the same: if a coefficient in the highest level of the transform in a particular subband is considered insignificant against a particular threshold, it is very probable that its descendants in lower levels will be insignificant too, so we can code quite a large group of coefficients with one symbol. In the SPIHT algorithm, each 2x2 block of coefficients in the root level corresponds to three trees of coefficients, as shown in Fig. 4. The coefficient at (i, j) is denoted as Ci,j. The following sets of coefficients are defined:
• O(i,j) is the set of coordinates of the children of the coefficient at (i,j).
• D(i,j) is the set of coordinates of all descendants of the coefficient at (i,j).
• H is the set of coordinates of all coefficients in the root level.
• L(i,j) = D(i,j) − O(i,j).
Given a threshold T = 2^n, a set of coefficients S is significant if there is a coefficient in S whose magnitude is at least T.

Fig. 4. Trees of wavelet coefficients.

Three lists are maintained by the algorithm:
1) list of insignificant sets (LIS);
2) list of insignificant pixels (LIP);
3) list of significant pixels (LSP).
The LIS contains two types of entries, representing the sets D(i,j) and L(i,j). The LIP is a list of insignificant coefficients that do not belong to any of the sets in the LIS. The LSP is a list of coefficients that have been identified as significant. The SPIHT algorithm encodes the wavelet coefficients by selecting a threshold T such that T ≤ max(i,j) |Ci,j| < 2T, where (i,j) ranges over all coordinates in the coefficient matrix. Initially, the LIP contains the coefficients in H, the LIS contains D(i,j) entries, where (i,j) are coordinates with descendants in H, and the LSP is empty. During the sorting pass, the significant coefficients in the LIS are identified by partitioning the sets D(i,j) into L(i,j) and the individual coefficients in O(i,j), or L(i,j) into D(k,l), where (k,l) ∈ O(i,j). During the refinement pass, all coefficients in the LSP that have been identified as significant in previous passes are refined in a way similar to binary search. Each significant coefficient is moved to the LSP. The threshold is decreased by a factor of two, and the above steps are repeated. The encoding process stops when the desired bit rate is reached. The output is fully embedded, so that the output at a higher bit rate contains the output at all lower bit rates embedded at the beginning of the data stream.
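The control flow just described can be illustrated with a heavily simplified, runnable Python sketch. It keeps the LIP/LSP bookkeeping, the sorting and refinement passes, and the threshold halving, but deliberately omits the LIS set-partitioning over D(i,j) and L(i,j) that lets real SPIHT code whole insignificant trees with one symbol:

    def embedded_passes(coeffs, num_passes):
        """Simplified embedded coding: significance tests against a halving
        threshold, sign bits for newly significant coefficients, and one
        refinement bit per pass for previously significant ones."""
        peak = max(abs(c) for c in coeffs)
        T = 1
        while 2 * T <= peak:
            T *= 2                                   # T = 2^n with T <= max|c| < 2T
        LIP = list(range(len(coeffs)))               # insignificant coefficients
        LSP = []                                     # significant coefficients
        bits = []
        for _ in range(num_passes):
            if T == 0:
                break
            newly_significant = []
            for i in list(LIP):                      # sorting pass
                significant = abs(coeffs[i]) >= T
                bits.append(int(significant))
                if significant:
                    bits.append(int(coeffs[i] < 0))  # sign bit
                    LIP.remove(i)
                    newly_significant.append(i)
            for i in LSP:                            # refinement pass: emit the bit
                bits.append((abs(coeffs[i]) // T) & 1)  # of |c| at the current plane
            LSP.extend(newly_significant)
            T //= 2                                  # halve the threshold
        return bits

Truncating the returned bit list at any point still yields a decodable, lower-quality approximation, which is the embedded property exploited above.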
The algorithm has several advantages. The first one is an intensive progressive capability: we can interrupt the decoding (or coding) at any time, and a result of the maximum possible detail can be reconstructed with one-bit precision. This is very desirable when transmitting files over the internet, since users with slower connection speeds can download only a small part of the file, obtaining a much more usable result when compared to other codecs such as progressive JPEG. The second advantage is a very compact output bitstream with large bit variability: no additional entropy coding or scrambling has to be applied.

Fig. 5. Layout of three-level decomposition of the unencrypted image.

A feedback channel is needed for the encoder to know how many bits to transmit for each sub-image, which generally increases the transmission delay. However, this cost is reasonable because the encoder has no idea about the source statistics and cannot determine the coding rate. It is the decoder who is able to learn such information and advise the encoder. On the other hand, the feedback channel does consume some bandwidth, but the consumption is not directly related to the compression efficiency, and the amount of information transmitted through the feedback channel is minimal.

The SI (side information) generation in our scheme is through interpolation. For the sake of simplicity, for any pixel in the target sub-image, we only use the four horizontal and vertical neighbors or the four diagonal neighbors in the known sub-image(s) for the interpolation. Intuitively, the SI quality will be better if the neighbors are geometrically closer to the pixel to be interpolated. Hence, we use a two-step interpolation in each resolution level to improve the SI estimation. First, sub-image 11 is interpolated from sub-image 00; after sub-image 11 is decoded, we use both 00 and 11 to interpolate 01 and 10.
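The geometry of the two steps can be sketched in numpy. This is an illustration under simplifying assumptions (plain averaging of the four neighbors, edge replication at the borders), not the authors' context-adaptive interpolator:

    import numpy as np

    def interpolate_11_from_00(s00):
        """Step 1: pixel (i, j) of the 11 sub-image lies diagonally between
        00(i, j), 00(i, j+1), 00(i+1, j) and 00(i+1, j+1)."""
        p = np.pad(s00.astype(float), ((0, 1), (0, 1)), mode="edge")
        return (p[:-1, :-1] + p[:-1, 1:] + p[1:, :-1] + p[1:, 1:]) / 4.0

    def interpolate_01_from_00_and_11(s00, s11):
        """Step 2: pixel (i, j) of the 01 sub-image has horizontal neighbors
        00(i, j), 00(i, j+1) and vertical neighbors 11(i-1, j), 11(i, j)."""
        h = np.pad(s00.astype(float), ((0, 0), (0, 1)), mode="edge")
        v = np.pad(s11.astype(float), ((1, 0), (0, 0)), mode="edge")
        return (h[:, :-1] + h[:, 1:] + v[:-1, :] + v[1:, :]) / 4.0

The 10 sub-image is interpolated analogously, with the roles of rows and columns exchanged.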
Decoding starts from the 00 sub-image of the lowest-resolution level, say, level N. We suggest transmitting the uncompressed 00N sub-image as the doped bits. Thus, the 00N sub-image can be known by the decoder without ambiguity, and knowledge about the local statistics will be derived based on it. Next, the other sub-images of the same resolution level are interpolated from the decrypted 00N sub-image.

Fig. 6. Decoder diagram for decoding the 11N sub-image.

Fig. 7. Two-step interpolation at the decoder side.

Slepian-Wolf decoding treats the SI as a noisy version of the source to be decoded. We can consider that there is a virtual channel between the source and the SI. To perform Slepian-Wolf decoding, it is also necessary for the decoder to estimate the statistics of the virtual channel. The encoder decomposes each encrypted image into four resolution levels. The sub-images in the lowest-resolution level are sent without compression, but the decoder still performs inter-sub-image interpolation. For the other sub-images, we transmit the four least significant bit-planes (LSB) as raw bits, because there is not much gain in employing Slepian-Wolf coding on them. The four
LSBs are sent prior to the MSBs, such that the decoder can have better knowledge about the pixels before starting to decode the MSBs. The four MSBs, on the other hand, are Slepian-Wolf encoded using rate-compatible punctured turbo codes in a bit-plane based fashion. The sending rate of each Slepian-Wolf coded bit-plane is determined by the decoder's feedback.

V. CONCLUSION

An efficient scheme for compression of encrypted image data was proposed, employing the SPIHT compression algorithm and the RSA algorithm. This method provides better coding efficiency and less computational complexity than existing approaches. The technique allows only partial access to the current source at the decoder side. In the future, this approach could be extended to the compression of encrypted videos, where the Resolution Progressive Compression scheme can be used for interframe and intraframe correlation learning at the decoder side.

REFERENCES

[1] M. Johnson, P. Ishwar, V. M. Prabhakaran, D. Schonberg, and K. Ramchandran, "On compressing encrypted data," IEEE Trans. Signal Process., vol. 52, no. 10, pp. 2992–3006, Oct. 2004.
[2] A. Liveris, Z. Xiong, and C. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, pp. 440–442, Oct. 2002.
[3] Y. Yang, V. Stankovic, and Z. Xiong, "Image encryption and data hiding: Duality and code designs," in Proc. Inf. Theory Workshop, Lake Tahoe, CA, Sep. 2007, pp. 295–300.
[4] D. Schonberg, "Practical Distributed Source Coding and its Application to the Compression of Encrypted Data," Ph.D. dissertation, Univ. California, Berkeley, 2007.
[5] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. IT-19, pp. 471–480, Jul. 1973.
[6] Li-Minn Ang and Kah Phooi Seng, "Lossless Image Compression using Tuned Degree-K Zerotree Wavelet Coding," Proc. Int'l MultiConference of Engineers and Computer Scientists, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[7] A. A. Kassim and W. S. Lee, "Embedded Color Image Coding Using SPIHT With Partially Linked Spatial Orientation Trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, pp. 203-206, 2003.
[8] Q. Yao, W. Zeng, and W. Liu, "Multi-resolution based hybrid spatiotemporal compression of encrypted videos," in Proc. IEEE Int. Conf. Acous., Speech and Sig. Process., Taipei, Taiwan, R.O.C., Apr. 2009, pp. 725–728.
[9] J. Bajcsy and P. Mitran, "Coding for the Slepian-Wolf problem with turbo codes," in Proc. IEEE Global Telecommun. Conf., San Antonio, TX, Nov. 2001, pp. 1400–1404.
[10] Wei Liu, Wenjun Zeng, Lina Dong, and Qiuming Yao, "Efficient Compression of Encrypted Grayscale Images," IEEE Trans. Image Processing, vol. 19, no. 4, Apr. 2010.
[11] J. J. Amador and R. W. Green, "Symmetric-Key Block Cipher for Image and Text Cryptography," Int'l Journal of Imaging Systems and Technology, no. 3, 2005, pp. 178-188.
[12] M. J. Weinberger, J. J. Rissanen, and R. B. Arps, "Applications of universal context modeling to lossless compression of gray-scale images," IEEE Trans. Image Processing, vol. 5, pp. 575–586, Apr. 1996.
IDENTIFICATION OF STRUCTURAL CLONES USING ASSOCIATION RULE AND CLUSTERING

*P. Revathi **J. Jagadeesh

*Asst. Prof., Department of Information Technology, p_revathime@gmail.com
**PG Student, Department of Information Technology, me.jaga4688@gmail.com
Vel Tech Multi Tech Dr. Rangarajan & Dr. Sakunthala Engineering College, Avadi-Alamathi Road, Chennai-62, India
Abstract – Code clones are similar program structures of considerable size and significant similarity. Simple clone sets are formed by similar code fragments in software. The problem is the huge number of simple clones typically reported by clone detection tools. We observed that recurring patterns of simple clones – so-called structural clones – often indicate the presence of interesting design-level similarities. We propose a technique to detect some specific types of structural clones from the repeated combinations of co-located simple clones. We find the patterns of co-occurring clones in different files using the frequent itemset mining (FIM) technique. Finally, we perform file clustering to detect those clusters of highly similar files that are likely to contribute to a design-level similarity pattern. We implement the structural clone detection technique in a tool called CCFinder. Detection of clones provides several benefits in terms of maintenance, program understanding, reengineering, and reuse.

Keywords – Design concepts, maintainability, reengineering and reusable software.

1 INTRODUCTION

Code clones are similar program structures of considerable size and significant similarity. Several studies suggest that as much as 20-50 percent of large software systems consist of cloned code. Knowing the location of clones helps in program understanding and maintenance. The detection and subsequent resolution of clones by refactoring (function calls, macros, templates, etc.), however, promises a decrease in maintenance costs and code size.

In the past decade, clone detection and resolution has received considerable attention from the software engineering research community, and many clone detection tools have been developed. Clone detection has been focused on detecting similar code fragments – so-called simple clones. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones, whose unification not only brings more size reduction, but also helps in understanding the design of the system for better maintenance and future enhancement. The limitation of considering
only simple clones is known in the field. The main problem is the huge number of simple clones typically reported by clone detection tools. There have been a number of attempts to move beyond the raw data of simple clones. We observed that at the core of structural clones there are often simple clones that coexist and relate to each other in certain ways.

We propose a technique to detect some specific types of structural clones from the repeated combinations of co-located simple clones. We implemented the structural clone detection technique in a tool called CCFinder, implemented in C++. It has its own token-based simple clone detector. Our structural clone detection technique works with the information of simple clones, which may come from any clone detection tool. It only requires the knowledge of simple clone sets and the location of their instances in programs. As structural clones often represent some domain or design concepts, their knowledge helps in program understanding, and their detection opens new options for design recovery that are both practical and scalable. Representing these repeated program structures of large granularity in a generic form also offers interesting opportunities for reuse, and their detection becomes useful in the reengineering of legacy systems for better maintenance.

We can find clone patterns in different units of code, either methods or classes or components or modules, gaining useful insights into the cloning situation at different levels of abstraction. We have initially tried this approach at the file level, by finding the frequently occurring clone patterns in different files and analyzing those patterns, with promising results. By detecting the frequently co-occurring clone classes in different files, we can isolate the groups of files that have strong similarity with each other. This is achieved by a clustering algorithm that we have devised for this particular problem. These clusters of highly similar files form basic structural clones.

The remainder of this paper is organized as follows: In Section 2, we define the types of structural clones, i.e., higher-level similarities in programs. Section 3 describes our detection of structural clones with data mining. Section 4 describes the implementation of CCFinder. Section 5 describes a mechanism to create a generic representation of the structural clones found in the system for better maintenance and reuse. Section 6 presents the related work in higher-level similarities and design recovery. Section 8 concludes the paper and presents future work.

2 Structural Clones – Higher Level Similarities in Programs

We describe in detail the phenomenon of higher-level similarities, which we call structural clones. We define structural clones as similar program structures that can be analyzed hierarchically, at many levels of abstraction, with similar code fragments at the bottom of such a hierarchy. Locating these higher-level similarities can have significant value for program understanding, evolution, reuse, and reengineering.

2.1 From Simple Clones to Structural Clones

We primarily focus on similarity patterns representing design concepts or solutions that can be of significant importance in the context of understanding, maintaining, reengineering, or reusing programs. We use the term structural clone to mean similar program structures that are configurations of lower-level similar program entities. Therefore, our structural clones may form a hierarchy of clones, with cloned code fragments at the bottom level.

2.1.1 File-Level Structural Clones

Fig 1 – File-level structural clone

Functions shown in the same shade are clones of each other (e.g., staff_fn1, task_fn1, project_fn1). The
relationship between the functions is 'same file', which holds between fragments of the same file, regardless of the order in which they appear. The three host files editStaff.php, editTask.php, and editProject.php perform similar tasks, but belong to three different modules (i.e., Staff module, Task module, and Project module). Provided these structural clones cover a substantial portion of the host files, we can consider the three files as abstract entities that are clones of each other, as discussed in the previous section. This illustrates how the concept of structural clones helps us to move from smaller entities (in this case, functions) to larger entities (in this case, files). These files can now be considered as entities in forming a higher-level structure.

2.1.2 A Module-Level Structural Clone

Fig 2 – Module-level structural clone

According to the definition, structural clones of higher granularity can be made up of structural clones of lower granularity. For example, a module-level structural clone can consist of file-level structural clones. Such a situation is illustrated by the structural clone found in a web portal implementation, as shown in Figure 2. In this portal, files belonging to each module are stored in a separate folder. Each module contains a set of files providing module-specific implementations of certain common functionalities (e.g., create, display, edit, delete). When the module functionalities are similar, each of these common files ends up being a file-level structural clone of its counterparts in other modules. One such case was the basis for the previous example. At a larger granularity, the modules Staff, Task, and Project can be considered structural clones, where each structure has four files create[M].php, display[M].php, edit[M].php, delete[M].php ([M] = Staff, Task, Project) as entities, and the relationship 'same folder' among the entities (relationships are not shown in the figure). Note that the module Project does not carry a deleteProject.php file. Still, there was enough similarity among Project and the other modules to consider all of them as structural clones of each other.

2.1.3 Multiple Structural Clones in the Same File

Fig 3 – Multiple structural clones in the same file

Multiple structural clones can also exist in a single file, as Figure 3 shows. The four structural clones are structures of code fragments that are part of different templates representing various hashed associative containers. Each structural clone covers a significant part of the template it belongs to; hence we can consider these templates as 'abstract entities' (ignoring their internal structure) and form a clone class of four clones at the next higher level. Furthermore, if the two templates present in one file are joined to each other with the 'same file' relationship, we have a structural clone class of two structures, one in each file. Raising the level of abstraction by one more step, we observed that the two templates present in each file cover the files significantly, so the files can also be considered as 'abstract entities', forming a clone class of two cloned files.
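As a toy illustration of this file-level abstraction (not the paper's tool; the file names and clone-class IDs below are examples only), files can be treated as sets of simple clone class IDs and grouped when they share the same combination of co-located clones. The actual detection, based on frequent itemset mining, is described in Section 3:

    from collections import defaultdict

    def group_by_clone_pattern(file_to_clone_classes):
        """Group files that contain an identical set of simple clone classes;
        each group of 2+ files is a candidate file-level structural clone."""
        groups = defaultdict(list)
        for file_name, clone_classes in file_to_clone_classes.items():
            groups[frozenset(clone_classes)].append(file_name)
        return [files for files in groups.values() if len(files) > 1]

    example = {
        "editStaff.php": {9, 15, 28},
        "editTask.php": {9, 15, 28},
        "editProject.php": {9, 15, 28},
        "index.php": {40},
    }
    print(group_by_clone_pattern(example))
    # [['editStaff.php', 'editTask.php', 'editProject.php']]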
2.1.4 Crosscutting Structural Clones

Structural clones can crosscut files (or classes, modules, etc.), as the example in Figure 4 shows. This example involves three PHP files belonging to a Web portal module that supports two similar crosscutting features. This results in two structural clones, each consisting of code fragments belonging to one of the two crosscutting features.

Fig 4 – Two crosscutting structural clones

2.1.5 Structural Clones Based on Inheritance Hierarchy

The relationship(s) between entities of a structural clone can vary widely. In object-oriented systems, a set of entities related by inheritance can be used to define a structural clone. We found such a case in the Buffer library (java.nio.*). Figure 5 shows two instances (out of seven) of the structural clone, each consisting of seven Java classes. More information on the structure of the Buffer library, and on how the 'feature combinatorics' problem gave rise to this structure, has been reported elsewhere.

Fig 5 – Structural clones based on hierarchy

3 Detecting Structural Clones with Data Mining

This section focuses on our approach to detect some specific types of structural clones from the bottom-up analysis of similarities, using a data mining technique similar to the well-known market basket analysis. This analysis builds on the detection of simple clones discussed in the previous chapter. We introduce an iterative approach for the detection of structural clones by moving from the low-level similarities to the higher-level similarities. To raise the level of analysis, we use abstraction of entities based on clone coverage. The higher-level entities that have significant low-level cloning are grouped together. These groups of entities form the basic similarity blocks for the next higher level of analysis.

3.1 Finding Recurring Patterns of Simple Clone Classes

Here we describe the detection of patterns of simple clones in a file, the first level of structural clones. The same technique can be applied to detect structural clones at other levels, as will be described later. An example of this format is shown in Figure 6. After detecting simple clone classes in a system, the data of simple clone classes is organized in terms of files represented by their IDs. The first data row says that the file with file ID 12 contains three clone instances belonging to clone class 9 and one instance from each of the clone classes 15, 28, 38, and 40. The
The interpretation is likewise for the other data rows.

Fig 6 - Simple clone classes listed per file

To detect the recurring patterns of simple clones in different files, we apply the "market basket analysis" technique from the data mining domain. The idea behind this technique is to find the items that are usually purchased together by different customers of a department store. These patterns of clone classes act as a unique representation for a group of files and, depending upon their significance in terms of file coverage, lead to identifying groups of highly similar files. This will be the next level of structural clones. Market basket analysis is done with "frequent itemset mining" (FIM). The difference between our problem and the standard frequent itemset mining problem is that in FIM the items in a transaction are considered unique, whereas in our data one file may contain multiple instances of the same clone class. We could normalize the data by removing the repeated instances, but by doing so we would miss important information, as multiple occurrences of instances of the same clone class in different files is a valid pattern of clones. For example, 9, 9, 9, 15 is a valid clone pattern represented in File 12 and File 14 in Figure 6.
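To make the multiset variant of frequent pattern mining concrete, the following is a minimal Python sketch; the file contents, the support threshold of 2, and the helper names are illustrative and are not part of the tool described in this paper.

from collections import Counter
from itertools import combinations

# Each "transaction" is the multiset of simple-clone-class IDs in one file
# (the data loosely mirrors the Figure 6 example; IDs are illustrative).
files = {
    12: Counter({9: 3, 15: 1, 28: 1, 38: 1, 40: 1}),
    14: Counter({9: 3, 15: 1}),
    17: Counter({28: 1, 38: 1}),
}
MIN_SUPPORT = 2  # report a pattern even if it occurs in only 2 files

def support(pattern, files):
    # Number of files whose clone-class multiset contains the pattern.
    return sum(1 for bag in files.values()
               if all(bag[c] >= n for c, n in pattern.items()))

# Enumerate candidate patterns as sub-multisets of each file (exponential,
# which is fine for a sketch) and keep the frequent ones.
candidates = set()
for bag in files.values():
    items = sorted(bag.elements())
    for size in range(1, len(items) + 1):
        candidates.update(combinations(items, size))

frequent = {c: support(Counter(c), files) for c in candidates}
frequent = {c: s for c, s in frequent.items() if s >= MIN_SUPPORT}

def proper_super(big, small):
    # True if multiset big strictly contains multiset small.
    return big != small and all(big[k] >= v for k, v in small.items())

# Keep only closed patterns: no frequent superpattern has the same support.
closed = {c: s for c, s in frequent.items()
          if not any(s2 == s and proper_super(Counter(d), Counter(c))
                     for d, s2 in frequent.items())}
print(closed)  # e.g. (9, 9, 9, 15) with support 2, (28, 38) with support 2

Note how the multiset pattern (9, 9, 9, 15) is reported while its sub-patterns such as (9, 9, 9) are suppressed, because a superpattern with the same support subsumes them.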
Mining all frequent itemsets returns many frequent itemsets that are subsets of bigger frequent itemsets. The correct solution in our case is to perform "Frequent Closed Itemset Mining" (FCIM), where only those itemsets are reported that are not subsets of any bigger frequent itemset with the same support. One of the parameters for FCIM is the support count, which is the number of files that contain the detected pattern of simple clones. In our case, we have hard-coded the support to be 2, so that a clone pattern is reported even if it is present in only 2 files, as it could still be significant for maintenance based on its size. The output from FCIM is in the format shown in Figure 7: each row represents one frequent clone pattern along with its support count, indicating the number of files containing this clone pattern.

Fig 7 - Frequent clone patterns with support count

FCIM only deals with detecting frequent patterns. These clone patterns can also be considered as unrestricted gapped clones, where any number of gaps is allowed, with arbitrary size and ordering. More work is required to isolate clone patterns where the gaps are small and the clones are more cohesive, to obtain more meaningful gapped clones. Finding gapped clones in this way also provides the flexibility to detect rearranged gapped clones, where the cloned parts can occur in arbitrary order and need not be arranged in the same way. Figure 8 shows the algorithm for finding simple clone patterns.

Fig 8 – Algorithm for finding simple clone patterns

3.2 Clustering Highly Cloned Files

To measure file coverage by a clone pattern, we calculate two metrics, namely the File Percentage Coverage (FPC), which indicates the percentage of a file covered by a clone pattern, and the File Token Coverage (FTC), which gives the number of tokens in a file covered by the clone pattern. These metrics are calculated for each file containing the clone pattern. One complication here is that
some clones may overlap in a file, as discussed earlier, so we cannot simply add up the sizes of all clones in a pattern to find the file coverage. The clustering based on these values and other parameters can also be made fully customizable to suit the needs of different users. Currently, we let the user specify minimum FPC and FTC values to indicate the significance of a cluster. The cluster is considered significant even if only one file has an FPC or FTC value greater than the threshold values. The expected output is to find all the significant clusters that cover the maximum number of files, with preferably no file repeated in two clusters.

Fig 9 - Algorithm for cluster pruning

Step 1 of the algorithm says that we remove from consideration all those clusters where no file passes the minimum criteria of FPC and FTC values; these are clusters where only very small clones exist between files. In step 2, we sort clusters based on their support count. When the support count of two or more clusters is the same, the clusters are sorted based on the maximum FPC value of the constituent files. Steps 3 and 4 prune clusters. These clusters of highly similar files give us the next level of structural clones, which we call file clone classes (FCC).
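Because clones in a pattern may overlap inside a file, FTC and FPC cannot be obtained by simply summing clone sizes; the minimal sketch below computes them by first merging overlapping token intervals. The interval representation and the sample numbers are illustrative assumptions.

def merge_intervals(intervals):
    # Merge overlapping [start, end) token intervals.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def coverage(file_tokens, clone_intervals):
    # Return (FTC, FPC): tokens covered, and percentage of the file covered.
    ftc = sum(end - start for start, end in merge_intervals(clone_intervals))
    fpc = 100.0 * ftc / file_tokens
    return ftc, fpc

# Two overlapping clones and one disjoint clone in a 400-token file.
print(coverage(400, [(0, 120), (100, 180), (300, 350)]))  # (230, 57.5)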
4 TOOL IMPLEMENTATION

CCFinder implements the structural clone detection techniques presented in this paper. CCFinder is written in C++, and it has its own token-based simple clone detector [6]. For frequent closed itemset mining (FCIM), we use the algorithm from [21]. For manipulation of the clones' data, CCFinder makes use of the STL containers from the standard C++ library. The output from CCFinder is generated in the form of text files so that any visualization tool developed in the future can easily interface with Clone Miner.

For performance evaluation, we ran CCFinder on the full J2SE 1.5 source code, consisting of 6,558 source files in 370 directories, 625,096 LOC (excluding comments and blank lines), and 70,285 methods, using different values of minimum clone size. For forming FCSets and MCSets, a value of 50 tokens is used for the clustering parameter minLen, where the length is measured in terms of tokens. Likewise, for minCover, a value of 50 percent is used in all cases. The tests were run on a Pentium IV machine with a 3.0 GHz processor and 1 GB RAM. Each time, it took around two to three minutes to run the whole process, from finding simple clones to the analysis of files, methods, and directories for structural clones.

5 Structural Clones in Software Maintenance and Reuse

To improve the design of legacy systems, various restructuring or refactoring techniques can be applied [Opd92] [Fow99]. Analysis of structural clones is helpful in locating places where high-level duplication is present, which can be restructured or refactored. For redesigning the system to enhance the maintainability of legacy code, we propose some structural clone based techniques. A good starting point is to analyze the file clone classes (i.e., groups of cloned files). After choosing file clone classes for refactoring, simple clones or method clones within those groups of files can be more easily refactored because of the context information. It may also be possible to apply several small refactorings simultaneously, for example moving several cloned methods together to the parent class, or simply changing the inheritance structure to remove duplicates. Having only the knowledge of simple clones, the possibility of making such bigger changes is not very apparent, and one has to go step by step, with the risk of missing the bigger picture altogether. Analysis can also be done at the level of code fragments, methods, or directories, depending on how intense the cloning is and how major a reengineering effort is practical. The other analysis features built inside Clone Analyzer, such as the structural clone
configurability and the Diff feature, can aid the user in the finer details of refactoring. This proposed method is somewhat general due to the varying objectives; it gives a basic framework for the analysis process.

6 Related Works

Clone detection tools produce an overwhelming volume of simple clones' data that is difficult to analyze in order to find useful clones. This problem prompted different solutions that are related to our idea of detecting structural clones, such as clone detection techniques using Program Dependence Graphs (PDG). In addition to simple clones, these tools can also detect noncontiguous clones, where the segments of a clone are connected by control and data dependency links. Such clones also fall under the premise of structural clones. While our technique detects structural clones with segments related to each other based only on their colocation, with or without information links, the PDG-based techniques relate them using the information links only. Moreover, the clustering mechanism in Clone Miner, which identifies groups of highly similar methods, files, or directories based on their contained clones, is missing from these techniques.

PR-Miner is another tool that discovers implicit programming rules using the frequent itemset technique. Compared to the structural clones found by Clone Miner, these programming rules are much smaller entities, usually confined to a couple of function calls within a function. The work by Ammons et al. is also similar, finding the frequent interaction patterns of a piece of code with an API or an ADT and representing them in the form of a state machine. These frequent interaction patterns may appear as a special type of structural clone, in which the dynamic relationship of cloned entities is considered. Similar to Clone Miner, this tool also helps in avoiding update anomalies, though only in the context of anomalies to the frequent interaction patterns.

7 Conclusions

We emphasized the need to study code cloning at a higher level. We introduced the concept of a structural clone as a repeating configuration of lower-level clones. We presented a technique for detecting structural clones. The process starts by finding simple clones (that is, similar code fragments); increasingly higher-level similarities are then found incrementally using the data mining techniques of frequent closed itemset mining and clustering. We implemented the structural clone detection technique in a tool called Clone Miner. While Clone Miner can also detect simple clones, its underlying structural clone detection technique can work with the output from any simple clone detector. Structural clone information leads to better program understanding, maintenance, reengineering, and reuse.

8 References

[1] H.A. Basit and S. Jarzabek, "Detecting Higher-Level Similarity Patterns in Programs," Proc. European Software Eng. Conf. and ACM SIGSOFT Symp. Foundations of Software Eng., pp. 156-165, Sept. 2005.

[2] J.R. Cordy, "Comprehending Reality: Practical Barriers to Industrial Adoption of Software Maintenance Automation," Proc. 11th IEEE Int'l Workshop Program Comprehension (keynote paper), pp. 196-206, 2003.

[3] A. De Lucia, G. Scanniello, and G. Tortora, "Identifying Clones in Dynamic Web Sites Using Similarity Thresholds," Proc. Int'l Conf. Enterprise Information Systems, pp. 391-396, 2004.

[4] J.Y. Gil and I. Maman, "Micro Patterns in Java Code," Proc. 20th Object Oriented Programming Systems Languages and Applications, pp. 97-116, 2005.

[5] G. Grahne and J. Zhu, "Efficiently Using Prefix-Trees in Mining Frequent Itemsets," Proc. First IEEE ICDM Workshop Frequent Itemset Mining Implementations, Nov. 2003.

[6] J. Han and M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, 2001.
[7] Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue, "ARIES: Refactoring Support Environment Based on Code Clone Analysis," Proc. Eighth IASTED Int'l Conf. Software Eng. and Applications, pp. 222-229, Nov. 2004.

[8] S. Jarzabek, "Effective Software Maintenance and Evolution: A Reuse-Based Approach," CRC Press, Taylor and Francis, 2007.

[9] S. Jarzabek and S. Li, "Unifying Clones with a Generative Programming Technique: A Case Study," J. Software Maintenance and Evolution: Research and Practice, vol. 18, no. 4, pp. 267-292, July 2006.

[10] C. Kapser and M.W. Godfrey, "Toward a Taxonomy of Clones in Source Code: A Case Study," Proc. Int'l Workshop Evolution of Large Scale Industrial Software Architectures, pp. 67-78, 2003.

[11] C. Rich and L.M. Wills, "Recognizing a Program's Design: A Graph-Parsing Approach," IEEE Software, vol. 7, no. 1, pp. 82-89, Jan. 1990.
DATA MINING TECHNIQUES FOR CUSTOMER RELATIONSHIP MANAGEMENT

S. Asokkumar
Research Scholar, Anna University of Technology Coimbatore

Abstract

Data Mining has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of data mining research and development has yielded several commercially available systems. The core components of data mining technology have been developing for decades in research areas such as statistics, artificial intelligence, and machine learning. A top-level breakdown of data mining technologies is based on data retention. It should be clear from the discussion so far that customer relationship management is a broad topic with many layers, one of which is data mining, and that data mining is a method or tool that can aid companies in their quest to become more customer-oriented. Data mining represents the link between the data stored over many years through various interactions with customers in diverse situations and the knowledge necessary to be successful in relationship marketing concepts.

Keywords: Commercialization, Artificial intelligence, Marketing, Top-level

Introduction

Data Mining has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of data mining research and development has yielded several commercially available systems, both stand-alone and integrated with database systems, produced scalable versions of algorithms for many classical data mining problems, and introduced novel pattern discovery problems.

A new business culture is developing today. Within it, the economics of customer relationships are changing in fundamental ways, and companies are facing the need to implement new solutions and strategies that address these changes. The concepts of mass production and mass marketing, first created during the Industrial Revolution, are being supplanted by new ideas in which customer relationships are the central business issue. Firms today are concerned with increasing customer value through analysis of the customer lifecycle. The technologies of data warehousing, data mining, and other customer relationship management (CRM) techniques afford new opportunities for businesses to act on the concepts of relationship marketing.

Data mining techniques are the result of a long research and product development process. The origin of data mining lies with the first storage of data on computers, and continues with improvements in data access, until today technology allows users to navigate through data in real time. In the evolution from business data to useful information, each step is built on the previous ones.

Table 1: Evolutionary stages of data mining

Stage                   | Enabling Technologies                                             | Characteristics
Data Collection (1960s) | Computers, tapes, disks                                           | Retrospective, static data delivery
Data Access (1980s)     | RDBMS, SQL, ODBC                                                  | Retrospective, dynamic data delivery at record level
Data Navigation (1990s) | OLAP, multidimensional databases, data warehouse                  | Retrospective, dynamic data delivery at multiple levels
Data Mining (2000)      | Advanced algorithms, multiprocessor computers, massive databases  | Prospective, proactive information delivery
Table 1 shows the evolutionary stages from the perspective of the user. In the first stage, Data Collection, individual sites collected data used to make simple calculations such as summations or averages. Information generated at this step answered business questions related to figures derived from data collection sites, such as total revenue or average total revenue over a period of time. Specific application programs were created for collecting data and performing calculations. The second step, Data Access, used databases to store data in a structured format. At this stage, company-wide policies for data collection and for reporting of management information were established. Because every business unit conformed to specific requirements or formats, businesses could query the information system regarding branch sales during any specified time period. Once individual figures were known, questions that probed the performance of aggregated sites could be asked. With multidimensional databases, a business could obtain either a global view or drill down to a particular site for comparisons with its peers (Data Navigation). Finally, on-line analytic tools provided real-time feedback and information exchange with collaborating business units (Data Mining). This capability is useful when sales representatives or customer service persons need to retrieve customer information on-line and respond to questions on a real-time basis. Information systems can query past data up to and including the current level of business.

The core components of data mining technology have been developing for decades in research areas such as statistics, artificial intelligence, and machine learning. Today, these technologies are mature, and when coupled with relational database systems and a culture of data integration, they create a business environment that can capitalize on knowledge formerly buried within the systems.

Applications of data mining

Data mining tools take data and construct a representation of reality in the form of a model. The resulting model describes patterns and relationships present in the data. From a process orientation, data mining activities fall into three general categories, as summarized in the figure below:

[Figure: taxonomy of data mining activities. Discovery covers conditional logic, affinities and association, and trends and variations; Predictive Modeling covers outcome prediction and forecasting; Forensic Analysis covers deviation detection and link analysis]

Discovery: the process of looking in a database to find hidden patterns without a predetermined idea or hypothesis about what the patterns may be.
Predictive Modeling: the process of taking patterns discovered in the database and using them to predict the future.

Forensic Analysis: the process of applying the extracted patterns to find anomalous or unusual data elements.

Data mining techniques

A top-level breakdown of data mining technologies is based on data retention; in other words, is the data retained or discarded after it has been mined? In early approaches to data mining, the data set was maintained for future pattern matching. The retention-based techniques apply only to the tasks of predictive modeling and forensic analysis, and not to knowledge discovery, since they do not distill any patterns. Approaches based on pattern distillation fall into three categories: logical, cross tabulation, and equational. These technologies extract patterns from a data set and then use the patterns for various purposes. They ask, "What types of patterns can be extracted and how are they represented?" The logical approach deals with both numeric and non-numeric data. Equations require all data to be numeric, while cross tabulations work only on non-numeric data. Table 2 summarizes the pros and cons of these categories.

Table 2: Pros and cons of the pattern distillation categories

Customer Relationship Management (CRM)

Customer Relationship Management (CRM) emerged in the last decade to reflect the central role of the customer in the strategic positioning of a company. CRM takes a holistic view of customers. It encompasses all measures for understanding the customers and for exploiting this knowledge to design and implement marketing activities, align production, and coordinate the supply chain. CRM puts emphasis on the coordination of such measures, also implying the integration of customer-related data, metadata, and knowledge, and the centralized planning and evaluation of measures to increase customer lifetime value. CRM gains in importance for companies that serve multiple groups of customers and exploit different interaction channels for them. This is due to the fact that information about the customers, which can be acquired for each group and across any channel, should be integrated with existing knowledge and exploited in a coordinated fashion. It should be noted, however, that CRM is a broadly used term that covers a wide variety of functions, not all of which require data mining. These functions include marketing automation (e.g., campaign management, cross- and up-sell, customer segmentation, customer retention), sales force automation (e.g., contact management, lead generation, sales analytics, generation of quotes, product configuration), and contact center management (e.g., call management, integration of multiple contact channels, problem escalation and resolution, metrics and monitoring, logging interactions and auditing), among others.

Data Mining Technology
Decision Trees

Decision trees are a way of representing a series of rules that lead to a class or value.

Neural networks

Neural networks are of particular interest because they offer a means of efficiently modeling large and complex problems in which there may be hundreds of predictor variables that have interactions.

Clustering

Clustering divides a database into different groups. The goal of clustering is to find groups that are very different from each other and whose members are very similar to each other. Unlike classification, we do not know what the clusters will be when we start, or by which attributes the data will be clustered. Consequently, someone who is knowledgeable in the business must interpret the clusters. After we have found clusters that reasonably segment our database, these clusters may then be used to classify new data.
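As a concrete illustration of this idea, the sketch below segments customers by two behavioral attributes and then assigns a new customer to the nearest segment. The attributes, the data, and the use of scikit-learn's KMeans are illustrative choices, not something prescribed by this paper.

import numpy as np
from sklearn.cluster import KMeans

# Illustrative customer records: [purchases per year, average basket value].
customers = np.array([
    [2, 15.0], [3, 18.5], [40, 120.0],
    [38, 110.0], [12, 55.0], [14, 60.0],
])

# Divide the customer base into three groups.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(model.labels_)                   # segment of each existing customer

# After a domain expert interprets the segments,
# they can be used to classify new customers.
print(model.predict([[36.0, 100.0]]))  # nearest segment for a new customer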
Data mining and customer relationship management

It should be clear from the discussion so far that customer relationship management is a broad topic with many layers, one of which is data mining, and that data mining is a method or tool that can aid companies in their quest to become more customer-oriented. Now we need to step back and see how all the pieces fit together.

The relationship

The term "customer lifecycle" refers to the stages in the relationship between a customer and a business. It is important to understand the customer lifecycle because it relates directly to customer revenue and customer profitability. Marketers say there are three ways to increase a customer's value:
(1) increase their use (or purchases) of products they already have;
(2) sell them more or higher-margin products; and
(3) keep the customers for a longer period of time.

However, the customer relationship changes over time, evolving as the business and the customer learn more about each other. So why is the customer lifecycle important? Simply put, it is a framework for understanding customer behavior. In general, there are four key stages in the customer lifecycle:

1. Prospects: people who are not yet customers but are in the target market
2. Responders: prospects who show an interest in a product or service
3. Active Customers: people who are currently using the product or service
4. Former Customers: may be "bad" customers who did not pay their bills or who incurred high costs; those who are not appropriate customers because they are no longer part of the target market; or those who may have shifted their purchases to competing products.

The customer lifecycle provides a good framework for applying data mining to CRM. On the "input" side of data mining, the customer lifecycle tells what information is available. On the "output" side, the customer lifecycle tells what is likely to be interesting.

Looking first at the input side, there is relatively little information about prospects except what is learned through data purchased from outside sources. There are two exceptions: one, there are more prospecting data warehouses in various industries that track acquisition campaigns directed at prospects; two, click-stream information is available about prospects' behavior on some websites. Data mining can predict the profitability of prospects as they become active customers, how long they will be active customers, and how likely they are to leave. In addition, data mining can be used over a period of time to predict changes in details. It will not be an accurate predictor of when most lifecycle events occur. Rather, it will help the organization identify patterns in their
customer data that are predictive. For example, a firm could use data mining to predict the behavior surrounding a particular lifecycle event (e.g., retirement), find other people in similar life stages, and determine which customers are following similar behavior patterns.

Analysis of Data Mining in CRM

Problem Context: The maximization of the lifetime values of the (entire) customer base in the context of a company's strategy is a key objective of CRM. Various processes and personnel in an organization must adopt CRM practices that are aligned with corporate goals. For each institution, corporate strategies such as diversification, coverage of market niches, or minimization of operating costs are implemented by "measures", such as mass customization, segment-specific product configurations, etc. The role of CRM is in supporting customer-related strategic measures.

Customer understanding is the core of CRM. It is the basis for maximizing customer lifetime value, which in turn encompasses customer segmentation and actions to maximize customer conversion, retention, loyalty, and profitability. Proper customer understanding and actionability lead to increased customer lifetime value. Incorrect customer understanding can lead to hazardous actions. Similarly, unfocused actions, such as unbounded attempts to acquire or retain all customers, can lead to a decrease of customer lifetime value (the law of diminishing returns). Hence, emphasis should be put on correct customer understanding and on concerted actions derived from it.

Figure 1: The Basic CRM Cycle

In Figure 1, boxes represent actions:
• The customer takes the initiative of contacting the company, e.g., to purchase something, to ask for after-sales support, to make a reclamation or a suggestion, etc.
• The company takes the initiative of contacting the customer, e.g., by launching a marketing campaign, selling in an electronic store or a brick-and-mortar store, etc.
• The company takes the initiative of understanding the customers by analyzing the information available from the other two types of action. The results of this understanding guide the future behaviour of the company towards the customer, both when it contacts the customer and when the customer contacts it.

The reality of CRM, especially in large companies, looks quite different from the central coordination and integration suggested by Figure 1:
• Information about customers flows into the company from many channels, but not all of them are intended for the acquisition of customer-related knowledge.
• Information about customers is actively gathered to support well-planned customer-related actions, such as marketing campaigns and the launching of new products. The knowledge acquired as the result of these actions is not always juxtaposed with the original assumptions, often because the action-taking organizational unit is different from the information-gathering unit. In many cases, neither the original information nor the derived knowledge is made available outside the borders of the organizational unit(s) involved. Sometimes, not even their existence is known.
• The limited availability of customer-related information and knowledge has several causes. Political reasons, e.g., rivalry among organizational units, are known to often lead to data and knowledge hoarding. A frequently expressed concern of data owners is that data, especially in aggregated form,
cannot be interpreted properly without an advanced understanding of the collection and aggregation process. Finally, confidentiality constraints, privacy considerations, and legal restrictions often disallow the transfer of data and derived patterns among departments.
• In general, one must assume that data gathered by an organizational unit for a given purpose cannot be exported unconditionally to other units or used for another purpose, and that in many cases such an export or usage is not permitted at all.
• Hence, it is not feasible to strive for a solution that integrates all customer-related data into a corporate warehouse. The focus should rather be on mining non-integrated, distributed data while preserving privacy and confidentiality constraints.

Applying Data Mining to CRM

In order to build good models for your CRM system, there are a number of steps you must follow. The basic steps of data mining for effective CRM are:
1. Define business problem
2. Build marketing database
3. Explore data
4. Prepare data for modeling
5. Build model
6. Evaluate model
7. Deploy model and results

The more effectively we can use the information about our customers to meet their needs, the more profitable we will be. But operational CRM needs analytical CRM, with predictive data mining models at its core. The route to a successful business requires that we understand our customers and their requirements, and mining is the essential guide.

Conclusion

Intelligent CRM improves the customer relationship using the data about customers. Customer relationship management is essential to compete effectively in today's marketplace. In choosing a suitable technology for personalization or CRM, organizations must be aware of the tradeoffs when considering differing data mining software applications. The choice among different options is not as critical as the choice to use data mining technologies in a CRM initiative. Data mining represents the link between the data stored over many years through various interactions with customers in diverse situations and the knowledge necessary to be successful in relationship marketing concepts. In order to unlock the potential of this information, data mining performs analysis that would be too complicated and time-consuming for statisticians, and arrives at previously unknown nuggets of information that are used to improve customer retention, response rates, attraction, and cross-selling. Through the full implementation of a CRM program, which must include data mining, organizations foster improved loyalty, increase the value of their customers, and attract the right customers. As customers and businesses interact more frequently, businesses will have to leverage CRM and related technologies to capture and analyze massive amounts of customer information.

Businesses also have a duty to execute their privacy policy so as to establish and maintain good customer relationships. For such a sensitive issue as privacy, the burden is on businesses when it comes to building and keeping trust. The nature of trust is so fragile that once violated, it vanishes. Current CRM solutions focus primarily on analyzing consumer information for economic benefits, and very little touches on ensuring privacy. As privacy issues become major concerns for consumers, surely an integrated solution that streamlines and enhances the entire process of managing customer relationships will become even more necessary.
LOCATION DEPENDENT PRIVACY AWARE MONITORING FRAMEWORK FOR SAFE REGION MOVING OBJECTS

*P.N. Nancy **Prof. R. Prasanna Kumar ***Dr. T. Ravi
*M.E. Computer Science and Engineering, Jaya Engineering College. Email: baslinnancy24@gmail.com
**Computer Science and Engineering, Jaya Engineering College
***Computer Science and Engineering, KCG College of Engineering

ABSTRACT:
Traffic monitoring accuracy, efficiency, and privacy are necessary for the operation, maintenance, and control of communication networks. Traffic monitoring also has important implications for user privacy. Efficiency and privacy are two fundamental issues in moving object monitoring. In this project I propose a privacy-aware monitoring (PAM) framework that addresses both issues. The framework distinguishes itself from the existing work by being the first to holistically address the issues of location updating in terms of monitoring accuracy, efficiency, and privacy, particularly when and how mobile clients should send location updates to the server. I propose a spatial query monitoring framework for continuously moving objects with methods to minimize the processing time at the server and/or the communication cost incurred by location updates. Due to the time-critical nature of the problem, the data are usually stored in main memory to allow fast processing. Based on the notions of safe region and most probable result, PAM performs location updates only when they would likely alter the query results. I develop efficient query evaluation/reevaluation and safe region computation algorithms in the framework.

1 INTRODUCTION

Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. Such mining demands an integration of data mining with spatial database technologies. It can be used for understanding spatial data, discovering spatial relationships and relationships between spatial and non-spatial data, constructing spatial knowledge bases, reorganizing spatial databases, and optimizing spatial queries. It is expected to have wide applications in moving mobile objects, geographic information systems, geomarketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, environmental studies, and many other areas where spatial data are used. A crucial challenge to spatial data mining is the exploration of efficient spatial data mining techniques, due to the huge amount of spatial data and the complexity of spatial types and spatial access methods.

The fundamental problem in a monitoring system is when and how a mobile client should send location updates to the server, because this determines three principal performance measures of monitoring: accuracy, efficiency, and privacy. Accuracy means how often the monitored results are correct, and it heavily depends on the frequency and accuracy of location updates.
[Figure 1: mobile clients send location updates to the database server via base stations; application servers register queries at the database server and receive result updates]

Fig 1: The System Architecture

In mobile and spatiotemporal databases, monitoring continuous spatial queries over moving objects is needed in numerous applications such as public transportation, logistics, and location-based services [1]. Fig 1 shows a typical monitoring system, which consists of a database server, application servers, and a large number of moving objects (i.e., mobile clients) communicating through base stations. The application servers gather monitoring requests and register spatial queries at the database server, which then continuously updates the query results until the queries are deregistered. Two commonly used updating approaches are periodic update and deviation update [3], [5], [7], [9]. The fundamental problem in the existing monitoring systems is when and how a mobile client should send location updates to the server, because this determines the three principal performance measures of monitoring: accuracy, efficiency, and privacy. The existing monitoring systems have the following problems. The monitoring accuracy is low: query results are correct only at the time instances of periodic updates, but not in between them or at any time between deviation updates. The server workload under periodic updates is not balanced over time: it reaches its peak when updates arrive (they must arrive simultaneously for correct results) and trigger query reevaluation, but the server is idle for the rest of the time. Finally, the privacy issue is simply ignored, by assuming that the clients are always willing to provide their exact positions to the server. Some recent work attempted to remedy the privacy issue: location cloaking was proposed to blur the exact client positions into bounding boxes [6], [8], [10], [11].

The proposed system is a framework for monitoring continuous spatial queries over moving objects. The framework is the first to holistically address the issue of location updating with regard to monitoring accuracy, efficiency, and privacy. It is a monitoring framework where the clients are aware of the spatial queries being monitored, so they send location updates only when the results for some queries might change. The safe region is computed based on the queries in such a way that the current results of all queries remain valid as long as all objects reside inside their respective safe regions. We also devise three client update strategies that optimize accuracy, privacy, and efficiency, respectively. The performance of our framework is evaluated through a series of experiments. The framework is robust and scales well with various parameter settings, such as privacy requirement, moving speed, and the number of queries and moving objects. [4] presents a framework for continuous reverse k nearest neighbor queries that assigns each object and query a rectangular safe region; the results of a query are recomputed whenever the query changes its location. For protecting users' locations while responding to a query, the privacy constraints of the problem do not allow revealing users' location information to an untrusted entity. [2] uses a one-way transformation to preserve users' location privacy by encoding the space of all static and dynamic objects and answering the query blindly in the encoded space. The integration of privacy into the monitoring framework poses challenges to the design of PAM: with the introduction of bounding boxes, the result of a query is no longer unique. Since efficiency and privacy play a major role, we use PAM (Privacy Aware Monitoring) for location updating in terms of monitoring accuracy, efficiency, and privacy, and we develop efficient query evaluation/reevaluation and safe region computation algorithms in the framework.

2 RELATED WORK

Prior work on spatiotemporal query processing assumed a static data set and
focused on efficient access methods and query evaluation algorithms. Recently, a lot of attention has been paid to moving-object databases, where data objects, queries, or both move. In our work, the continuous spatial queries are monitored using a kNN algorithm, and spatial queries are evaluated using query evaluation algorithms. If an object moves from one safe region to another safe region (Section 3), then recomputation or reevaluation is needed. Our framework is distinguished from existing studies by being a comprehensive framework focusing on location updates. To protect location privacy, various cloaking or anonymizing techniques have been proposed to hide the client's actual location.

3 FRAMEWORK OVERVIEW

The PAM framework works as follows (see Fig 2). Application servers register spatial queries at the database server. The query processor identifies the queries affected by an incoming update using the query index and reevaluates them using the object index. The updated query results are reported to the application servers. The location manager computes a new safe region for the updating object, checking the object index and getting the updates from the database server. The location-based server manages the locations. The location constraint matching engine matches locations with the corresponding queries and sends the location updates to the mobile clients. The location manager computes the safe region; if the object is present within its safe region, then there is no need for reevaluation. This paper improves the performance of the PAM framework by improving efficiency, privacy, and accuracy.

Algorithm 1: Overview of Database Behavior
1: while receiving a request do
2:   if the request is to register query q then
3:     evaluate q;
4:     compute its quarantine area and insert it into the query index;
5:     return the results to the application server;
6:     update the changed safe regions of objects;
7:   else if the request is to deregister query q then
8:     remove q from the query index;
9:   else if the request is a location update from object p then
10:    determine the set of affected queries;
11:    for each affected query q' do
12:      reevaluate q';
13:      update the results to the application server;
14:      recompute its quarantine area and update the query index;
15:    update the safe region of p;

3.1 Query Answering with Object Indexing

At any time instance t, the index structure consists of cells (i, j), each having an object list PL(i, j) containing the identifiers (IDs) of the objects enclosed by cell (i, j), namely PL(i, j) = { p(t) Є P(t) : (p(t).x, p(t).y) Є [iδ, (i+1)δ) × [jδ, (j+1)δ) } [12]. Building the index requires scanning through the objects and inserting each object p(t) into the corresponding cell. The index structure is shown in Fig 3. The database server stores: 1. the query parameters (e.g., the rectangle of a range query, or the query point and the k value of a kNN query); 2. the current query results; and 3. the quarantine area of the query. The quarantine area is used to identify the queries whose results might be affected by an incoming location update; it originates from the quarantine line [1].
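A minimal sketch of this grid structure is shown below; the cell size δ, the object coordinates, and the dictionary-of-lists representation are illustrative choices rather than the paper's exact data structures.

from collections import defaultdict

DELTA = 10.0  # grid cell size delta (illustrative)

def build_grid_index(objects, delta=DELTA):
    # Map each cell (i, j) to the list PL(i, j) of IDs of the objects whose
    # position lies in [i*delta, (i+1)*delta) x [j*delta, (j+1)*delta).
    pl = defaultdict(list)
    for oid, (x, y) in objects.items():
        pl[(int(x // delta), int(y // delta))].append(oid)
    return pl

objects = {1: (3.0, 12.5), 2: (14.2, 12.9), 3: (15.8, 3.3)}
print(dict(build_grid_index(objects)))  # {(0, 1): [1], (1, 1): [2], (1, 0): [3]}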
3.2 The Object Index

The object index is the server-side view of all objects. More specifically, to evaluate queries, the server must store the spatial range, in the form of a bounding box, within which each object can possibly be located. The object index stores the current safe regions of all the objects. While many spatial index structures can serve this purpose, this paper employs the well-known R-tree based index. Since the safe region changes each
time the object updates its location (either client-initiated or server-initiated), the index should be optimized to handle frequent updates.

3.3 Query Processor and Location Manager

The location manager computes the safe region of an object. The safe region is based on the quarantine line. The location manager recomputes the safe region in two cases: 1. after a new query is evaluated; 2. after an object sends a location update. The query processor evaluates queries using the kNN algorithm (Algorithm 2).

[Figure 2: the object index, query index, and database server interact with the query processor and the location manager; the location-based server and the location constraint matching engine exchange safe regions, bounding boxes, and location updates with the mobile client]

Fig 2: PAM framework overview

4 SAFE REGION COMPUTATION

The safe region of a moving object p (denoted as p.sr) designates how far p can reach without affecting the results of any registered query. As queries are independent of each other, we define the safe region for a query Q (denoted as p.srQ) as the rectangular region in which p does not affect Q's result. p.srQ is essentially a rectangular approximation of Q's quarantine area or its complement. Obviously, p.sr is the intersection of the individual p.srQ for all registered queries. To efficiently eliminate those queries whose p.srQ do not contribute to p.sr, we require p.sr (and p.srQ) to be fully contained in the grid cell in which p currently resides. By this means, we only need to compute p.srQ for those queries whose quarantine areas overlap this cell, as the p.srQ for any other query is the cell itself. These overlapping queries are called relevant queries, and they are exactly the ones pointed to by the bucket of this cell in the query index.

Fig 3: Data structure of the Object Index

The safe region of an object p is computed in three cases: 1) During the evaluation of a new query Q, if p is probed, its safe region needs to be updated. Since none of the existing queries change their quarantine areas, the new safe region p.sr′ is simply the intersection of the current safe
region p.sr and the safe region for this new query Q, i.e., p.sr′ = p.sr ∩ p.srQ. 2) After processing a source-initiated location update of object p, p's safe region needs to be completely recomputed, by computing the p.srQ for each relevant query. 3) During the processing of a source-initiated location update, if object p is probed, its safe region is also completely recomputed, as in case 2. Although it is still a probe as in case 1 and only one p.srQ changes (i.e., that of the query which probes p), we completely recompute p.sr, since p.srQ could be enlarged by this probe, and recomputing it allows such enlargement to contribute to p.sr. The objective of a safe region is to reduce the number of location updates.
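Because p.sr is just the per-query rectangles intersected with each other and clipped to the current grid cell, the computation reduces to intersecting axis-aligned rectangles. A minimal sketch under that reading, with illustrative rectangle tuples:

def intersect(r1, r2):
    # Intersection of two axis-aligned rectangles (x1, y1, x2, y2), or None.
    x1, y1 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x2, y2 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def safe_region(cell, per_query_regions):
    # p.sr = cell intersected with p.srQ of each relevant query.
    sr = cell
    for srq in per_query_regions:
        sr = intersect(sr, srq)
        if sr is None:  # degenerate case: empty safe region
            break
    return sr

cell = (10, 10, 20, 20)
print(safe_region(cell, [(0, 0, 18, 25), (12, 5, 30, 19)]))  # (12, 10, 18, 19)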
Algorithm 2: Evaluating a new kNN Query
Input: root: root node of the object index; q: the query point
Output: C: the set of kNNs
Procedure:
1: initialize queues H and h;
2: enqueue <root, d(q, root)> into H;
3: while |C| < k and H is not empty do
4:   u = H.pop();
5:   if u is a leaf entry then
6:     while d(q, u) > D(q, v) do
7:       v = h.pop();
8:       insert v into C;
9:     enqueue u into h;
10:  else if u is an index entry then
11:    for each child entry v of u do
12:      enqueue <v, d(q, v)> into H;

The query is evaluated using Algorithm 2. The algorithm maintains an additional priority queue h besides H; h is a priority queue of objects sorted by the "closer" relation. The reason to introduce h is that when an object p is popped from H, it is not yet guaranteed to be a kNN, so h is used to hold p until it can be guaranteed to be a kNN. This occurs when another object p' is popped from H and its minimum distance to q, d(q, p'), is larger than the maximum distance of p to q, D(q, p). In general, when an object u is popped from H, we do the following: if d(q, u) is larger than D(q, v), where v is the top object in h, then v is guaranteed to be a kNN and is removed from h; d(q, u) is then compared with the next D(q, v) until it is no longer the larger one. Then u itself is inserted into h, and the algorithm continues to pop the next entry from H. The algorithm continues until k objects are returned.

Algorithm 3: Reevaluating a kNN Query
Input: C: existing set of kNNs; p: the updating object
Output: C: the new set of kNNs
Procedure:
1: if p is closer to q than the k-th NN then
2:   if p Є C then
3:     p* = the rank of p in C;
4:   else
5:     p* = k;
6:     enqueue p into C;
7: else
8:   if p Є C then
9:     evaluate a 1NN query to find u;
10:    p* = k;
11:    remove p and enqueue u into C;
12: relocate p or u in C, starting from p*;

To reevaluate an existing kNN query that is affected by the updating object p, the first step is to decide whether p is a result object by comparing p with the kth NN using the "closer" relation: if p is closer, then it is a result object; otherwise, it is a non-result object. This leads to three cases: 1) p was a result object but is no longer one; 2) p was not a result object but becomes one; and 3) p is and was a result object. For case 1, there are fewer than k result objects left, so there is an additional step of evaluating a 1NN query at the same query point to find a new result object u. The evaluation of such a query is almost the same as Algorithm 2, except that the existing kNN result objects are not considered. The final step of reevaluation is to locate the order of the new result object p in the kNN set. This is done by comparing it with the other existing objects in the kNN set using the "closer" relation. For cases 1 and 2, since this object is a new result object, the comparison starts from the kth NN, then the (k-1)th NN, and so on.
However, for case 3, since p was already in the set, the comparison can start from where p was. Algorithm 3 shows the pseudocode of kNN query reevaluation, where p* denotes the starting position of the comparison.

5 PRIVACY PROTECTION

Algorithm 4: Cloaking Algorithm
G(V, E): a directed graph, with V the set of nodes (requests) and E the set of edges;
edge e_ij = (r_i, r_j) Є E iff |r_i r_j| < r_i.δ, and edge e_ji = (r_j, r_i) Є E iff |r_i r_j| < r_j.δ;
a request r = (id, l, Δt, k, δ, data, t);
r_i can be anonymized immediately if there are at least k-1 other forwarded requests in U_out and k-1 other forwarded requests in U_in;
location anonymity set U_out = {requests that are outgoing neighbors};
identifier anonymity set U_in = {requests that are incoming neighbors};
a spatial index over the request positions (x, y);
a min-heap ordered on (t + Δt).

The key idea underlying Algorithm 4 is that a given degree of anonymity can be maintained for any location and user. The directed graph finds the location anonymity set and the identifier anonymity set that satisfy the location k-anonymity model through the neighborships of the request nodes. The spatial index uses window queries to facilitate construction and maintenance of the neighborships in the graph. The min-heap orders the requests according to their cloaking deadlines and detects the expiration of requests. A request can be anonymized if and only if its cloaking region covers the locations of at least k-1 other requests (the location anonymity set), |{j : r_j.l Є r_i.L, 1 ≤ j ≤ n, j ≠ i}| ≥ k-1, and its own location is covered by the cloaking regions of at least k-1 other requests (the identifier anonymity set), |{j : r_i.l Є r_j.L, 1 ≤ j ≤ n, j ≠ i}| ≥ k-1. For location privacy, the user location is expanded into a cloaking region such that the location k-anonymity model is satisfied, the request is anonymized before the predefined maximum cloaking delay, and the cloaking region size does not exceed a threshold.
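Read literally, the two conditions require that a request's cloaking region cover the locations of at least k-1 other requests and that its own location fall inside at least k-1 other cloaking regions. A small sketch of that check, with illustrative request records (l is the exact location, L the cloaking region):

def covers(region, point):
    # Does the axis-aligned cloaking region (x1, y1, x2, y2) contain a point?
    x1, y1, x2, y2 = region
    return x1 <= point[0] <= x2 and y1 <= point[1] <= y2

def can_anonymize(i, requests, k):
    # Both k-anonymity conditions for request i over all current requests.
    loc_set = [j for j in requests if j != i
               and covers(requests[i]["L"], requests[j]["l"])]
    id_set = [j for j in requests if j != i
              and covers(requests[j]["L"], requests[i]["l"])]
    return len(loc_set) >= k - 1 and len(id_set) >= k - 1

requests = {
    1: {"l": (5, 5), "L": (0, 0, 10, 10)},
    2: {"l": (6, 4), "L": (2, 2, 12, 12)},
    3: {"l": (7, 8), "L": (4, 3, 11, 11)},
}
print(can_anonymize(1, requests, k=3))  # True: both sets contain requests 2 and 3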
6 PERFORMANCE EVALUATION

We have developed a discrete event-driven simulator to evaluate the performance of the proposed safe-region-based query monitoring (SRB) framework. We compare it with two other schemes, i.e., optimal monitoring (OPT) and periodic monitoring (PRD).

6.1 Cloaking Success Rate

[Figure: cloaking success rate versus k for CliqueCloak and the proposed scheme (no dummy); a larger k gives a lower success rate]

[Figure: relative location anonymity level (k'/k) versus k for CliqueCloak and the proposed scheme with and without dummies]
[Figure: average cloaking time in milliseconds versus k for CliqueCloak and the proposed scheme with and without dummies]

This method has a much shorter cloaking time.

7 CONCLUSIONS

This paper proposes a framework for monitoring continuous spatial queries over moving objects. The framework distinguishes itself from existing work by being the first to address the location update issue and to provide a common interface for monitoring mixed types of queries. Based on the notion of safe region, the location updates are query aware, and thus the wireless communication and query reevaluation costs are significantly reduced; this paper provides detailed algorithms for query evaluation/reevaluation and safe region computation. We established that the proposed approach is a very efficient solution even if there are no limits on object speed, on the nature of movement, or on the fraction of objects that move at any moment in time, and this paper also proposes a cloaking algorithm for privacy protection.

8 REFERENCES

[1] H. Hu, J. Xu, and D.L. Lee, "PAM: An Efficient and Privacy-Aware Monitoring Framework for Continuously Moving Objects."

[2] "Blind Evaluation of Nearest Neighbor Queries Using Space Transformation to Preserve Location Privacy," pp. 239-257, ACM, 2007.

[3] C.S. Jensen, D. Lin, and B.C. Ooi, "Query and Update Efficient B+-Tree Based Indexing of Moving Objects," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.

[4] "An Efficient Technique to Continuously Monitoring Reverse kNN," VLDB '09, Aug. 24-28, 2009, Lyon, France, VLDB Endowment.

[5] M.F. Mokbel, X. Xiong, and W.G. Aref, "SINA: Scalable Incremental Processing of Continuous Queries in Spatio-Temporal Databases," Proc. ACM SIGMOD, 2004.

[6] M. Gruteser and D. Grunwald, "Anonymous Usage of Location-Based Services through Spatial and Temporal Cloaking," Proc. MobiSys, 2003.

[7] S. Prabhakar, Y. Xia, D.V. Kalashnikov, W.G. Aref, and S.E. Hambrusch, "Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects," IEEE Trans. Computers, vol. 51, no. 10, pp. 1124-1140, Oct. 2002.

[8] B. Gedik and L. Liu, "Location Privacy in Mobile Systems: A Personalized Anonymization Model," Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS), pp. 620-629, 2005.

[9] X. Yu, K.Q. Pu, and N. Koudas, "Monitoring k-Nearest Neighbor Queries over Moving Objects," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2005.

[10] M.F. Mokbel, C.-Y. Chow, and W.G. Aref, "The New Casper: Query Processing for Location Services without Compromising Privacy," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 763-774, 2006.

[11] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias, "Preventing Location-Based Identity Inference in Anonymous Spatial Queries," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 12, pp. 1719-1733, Dec. 2007.

[12] X. Yu, K.Q. Pu, and N. Koudas, "Monitoring k-Nearest Neighbor Queries over Moving Objects."
PRIVACY-PRESERVING USING TUPLE AND THRESHOLD MATCHING IN DISTRIBUTED SYSTEMS

Author: Santhikala. M, M.E. Computer Science and Engineering,
Thiruvalluvar College of Engineering and Technology, Vandavasi.
Email: saravanansanthi@yahoo.co.in. Mobile no: 9789074232

Co-Author: Anantharaj. B, Asst. Professor, Dept. of Information Technology,
Thiruvalluvar College of Engineering and Technology, Vandavasi.

ABSTRACT

This project aims at removing the problems associated with tuple matching and lack of privacy in distributed databases. Nowadays many problems arise because of a lack of privacy. When matching tuples in a distributed database, the data sources may need to protect their private tuples against others, while it is difficult for them to find a trusted party that can read all the private tuples for computations without leaking the tuples. This project deals with two tuple matching problems, Privacy-Preserving Duplicate Tuple Matching (PPDTM) and Privacy-Preserving Threshold Attributes Matching (PPTAM), and addresses their privacy protection issues without the existence of a trusted party. Privacy-Preserving Duplicate Tuple Matching has many applications, such as departments of government, transport support schemes, social welfare departments, or employee retraining boards. I have also introduced a zero-knowledge proof to provide a transcript to a party if it is identified as an intruder. Privacy-Preserving Threshold Attributes Matching can be used to securely find out the overall profit or loss, or a survey, of the organization. To calculate this, it sets some threshold value and gets information from the other parties. All parties can access that information from the database. The corresponding party provides the key to the requesting party, and only then can the requesting party view the original information. After that, the parties compare all the tables and do their processing to get their output. These two protocols are applied on the basis of the Paillier encryption mechanism.

Index Terms: Privacy preservation, distributed database, secure computation, zero-knowledge proof.

INTRODUCTION

TUPLE matching is a basic problem that has been encountered in many applications of databases. In tuple matching among distributed databases, data sources may need to protect their private tuples against others, while it is difficult for them to
find a party trusted by all, which can read all the private tuples for computations without leaking the tuples. We focus in this paper on two tuple matching problems, Privacy-Preserving Duplicate Tuple Matching (PPDTM) and Privacy-Preserving Threshold Attributes Matching (PPTAM), and address their privacy protection issues without the existence of a trusted party.

Motivations for PPDTM
PPDTM has numerous applications. For example, a foundation providing grants to support participation in academic conferences usually does not accept an application that is simultaneously submitted to another foundation (duplicate submission). It may want to find out all duplicate applications and remove them from its pool: denoted by a tuple with attributes such as the abbreviated conference name, abbreviated academic paper title, and the first author's name, a submission in this foundation can be checked by tuple matching with those in other foundations. However, all foundations should do this under protection of applicants' personal information, following privacy policies.

Motivations for PPTAM
PPTAM can be used to securely find out regular rules in a temporal database composed of timestamped transactions. The temporal database can be weekly sales data of a supermarket, investment profits of a financial institution, etc. Some regular rules may be interesting for forecasting and decision support of the database owners, e.g., "Every day 6 to 7 PM, more than 100 bottles of beer are sold." The rules considered in this paper are about whether each single item (beer, chips, etc.) has a threshold count in a regular time interval, so they are different from the cyclic association rules in [27], which are regular rules among associated items (beer -> chips), such as "Every day 6 to 7 PM, 87 percent of customers who buy beer also buy chips." We consider the scenario that several retail stores ally to stand in competition with some retailing magnate, and they want to find out the regular rules over the union of their databases without publishing their databases, for privacy concerns.

My contributions
My main contributions in this paper include:
1. By calculating the total exps (modular exponentiations) and muls (modular multiplications), and total communication bits, our PPDTM protocol for the semihonest model has a lower computation cost than the related solution, by trading off communication cost. Since the computation cost dominates the whole cost, our protocol is faster than the related solution in a distributed system.
2. Our PPTAM protocol for the semihonest model has lower computation and communication costs than the related solution derived by the techniques in [22] and [23].
3. By constructing the required zero-knowledge proofs, we extend the PPDTM and PPTAM protocols to the malicious model. Some of these zero-knowledge proofs were also mentioned in [23], but detailed constructions were not given. In addition, the Proof of Correct Polynomial Evaluation (POCPE) is not considered in [23], without which an adversary can ask for decryptions of any information useful to itself. In comparison with the solutions derived from the techniques in [23], our PPDTM protocol in the malicious model has the same magnitude of costs, and our PPTAM protocol in the malicious model has lower costs in both computation and communication.

PRIVACY-PRESERVING DUPLICATE TUPLE MATCHING
Main Idea
Denoting T(i,j) as the concatenation T(i,j)_1 || ... || T(i,j)_M, P_i can compute a polynomial f_i to represent its inputs T_i: f_i(x) = (x - T(i,1)) ... (x - T(i,S)) mod N. Then each coefficient of f_i is in Z_N and can be encrypted to get E(f_i). If T(i,j) has a duplicate at some P_i' (i' != i), its evaluation on the polynomial G_i = product of f_i' over i' in {0,...,N-1}, i' != i, is 0. If T(i,j) has no duplicate, we make the evaluation a random number by randomizing G_i as F_i in (2), where r is a random number over Z_N. To prevent an adversary from factoring F_i to get the honest parties' inputs, we need to encrypt F_i, evaluate T(i,j) for j = 1,...,S in E(F_i), and only decrypt the evaluations. If T(i,j) has a duplicate, the decryption will be 0 in the Paillier scheme. If it has no duplicate, by Lemma 1, the decryption will be a random number.
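To make the polynomial encoding concrete, the following Python sketch (our illustration, not the paper's implementation) represents one party's tuples as the roots of a polynomial and evaluates another party's tuple on it; the randomized evaluation is 0 exactly when the tuple is a duplicate. The Paillier encryption layer is omitted, and the modulus and function names are assumptions for illustration only.

    import random

    N = 2**61 - 1  # stand-in prime modulus; the actual protocol works over a Paillier modulus

    def tuple_poly(tuples, modulus):
        # Coefficients of f(x) = (x - t_1)...(x - t_S) mod N, lowest degree first.
        coeffs = [1]
        for t in tuples:
            new = [0] * (len(coeffs) + 1)
            for i, c in enumerate(coeffs):
                new[i] = (new[i] - t * c) % modulus      # -t * x^i term
                new[i + 1] = (new[i + 1] + c) % modulus  # x^(i+1) term
            coeffs = new
        return coeffs

    def evaluate(coeffs, x, modulus):
        # Horner evaluation of the polynomial at x.
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % modulus
        return result

    p0_tuples = [1001, 2002, 3003]      # P0's tuples, already hashed to integers
    f0 = tuple_poly(p0_tuples, N)       # in the protocol, only E(f0) would be shared
    for t in [2002, 4004]:              # another party tests its own tuples
        r = random.randrange(1, N)      # randomizer masks non-duplicate evaluations
        ev = (r * evaluate(f0, t, N)) % N
        print(t, "->", "duplicate" if ev == 0 else "no duplicate")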
PRIVACY-PRESERVING THRESHOLD ATTRIBUTES MATCHING
Main Idea
As defined in Section 1, for the PPTAM problem, each P_i needs to privately determine whether T(i,j) belongs to the threshold-matching set. We will treat each attribute of a tuple as an individual input. The
value set of the kth attribute on P_i is A_k^i = {T(i,j)_k | j = 1, ..., S}, and the value set of the kth attribute over all N parties is A_k = A_k^0 ∪ ... ∪ A_k^(N-1). We can construct the polynomial f_k^i(x_k) = (x_k - T(i,1)_k) ... (x_k - T(i,S)_k) mod N for A_k^i, and f_k(x_k) = f_k^0(x_k) ... f_k^(N-1)(x_k) mod N for A_k; f_k^(l)(x_k) denotes the lth derivative of f_k(x_k). Our PPTAM protocol is based on the following "iff" relation: an element a appears in the multiset {a_1, ..., a_n} at least d times if and only if a is a common root of the polynomial g and the derivatives g^(1), ..., g^(d-1), in which g(x) = (x - a_1) ... (x - a_n). To determine whether T(i,j) belongs to the threshold-matching set, we compute f_k^(l) for k = 1, ..., M and l = 1, ..., d-1, and randomize them as f_k^(l) · (s_(0,k,l) + ... + s_(N-1,k,l)), in which s_(i,k,l) ∈ Z_N is a nonzero random number generated by P_i. We also construct a multivariate polynomial G(x_1, ..., x_M) = g_1(x_1) + ... + g_M(x_M), in which g_k(x_k) = f_k(x_k) · (sum over i of s_(i,k,0)) + ... + f_k^(d-1)(x_k) · (sum over i of s_(i,k,d-1)). By Lemma 2 in Section 6.2, we can use the evaluation of T(i,j) at G(x_1, ..., x_M) to determine whether T(i,j) belongs to the threshold-matching set.
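The "iff" relation above is easy to verify directly. The Python sketch below is a plain-integer illustration of just that algebra, with no encryption, masking, or modular arithmetic; the helper names are ours:

    def poly_from_roots(roots):
        # Coefficients of g(x) = (x - a_1)...(x - a_n), lowest degree first.
        coeffs = [1]
        for r in roots:
            new = [0] * (len(coeffs) + 1)
            for i, c in enumerate(coeffs):
                new[i] -= r * c
                new[i + 1] += c
            coeffs = new
        return coeffs

    def derivative(coeffs):
        # Coefficients of g'(x) from those of g(x).
        return [i * c for i, c in enumerate(coeffs)][1:]

    def evaluate(coeffs, x):
        result = 0
        for c in reversed(coeffs):
            result = result * x + c
        return result

    def occurs_at_least(a, values, d):
        # a occurs >= d times in values  iff  g, g', ..., g^(d-1) all vanish at a.
        g = poly_from_roots(values)
        for _ in range(d):
            if evaluate(g, a) != 0:
                return False
            g = derivative(g)
        return True

    print(occurs_at_least(5, [5, 5, 7, 5], 3))   # True: 5 occurs three times
    print(occurs_at_least(7, [5, 5, 7, 5], 2))   # False: 7 occurs only once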
CONCLUDING REMARKS
We present protocols for the problem of PPDTM and the problem of PPTAM among N parties. The protocols are constructed in the semihonest model first, and then extended to the malicious model by zero-knowledge proofs. Solutions can also be derived from the techniques in [22] and [23]. We analyze the computation and communication costs of each protocol, and run experiments on the running times, to show the complexity superiorities of our protocols over the derived solutions. Experiments show that our protocols are suitable for non-real-time scenarios where no urgent response is required, e.g., finding out duplicate applications for conference grants or government supports, or finding out regular rules in distributed temporal databases. Our PPTAM protocol is also suitable for real-time distributed anomaly detection. As pointed out in [25], a 1,024-bit modulus (or a 160-bit elliptic curve) is enough for long-term security. If a larger key size is required to ensure stronger security, a hardware-implemented ECP can be deployed. For example, one scalar multiplication on a 160-bit elliptic curve can take 0.21 ms on the elliptic curve processor of [28].
DDOS DEFENSE MECHANISMS FOR DETECTING, TRACING AND MITIGATING NETWORK-WIDE ANOMALIES

*Himavantha Raju Vatsavai, **Ms. G. Muneeswari, M.E., (Ph.D)
*Department of Computer Science & Engineering,
**Asst. Professor, Department of Computer Science & Engineering,
R.M.K. Engineering College, Kavaraipettai – 6010206, India.
Abstract - Application DoS attack, which aims at disrupting application service rather than depleting the network resource, has emerged as a larger threat to network services, compared to the classic DoS attack. Owing to its high similarity to legitimate traffic and much lower launching overhead than the classic DDoS attack, this new assault type cannot be efficiently detected or prevented by existing detection solutions. To identify application DoS attack, we propose a novel group testing (GT)-based approach deployed on back-end servers, which not only offers a theoretical method to obtain short detection delay and low false positive/negative rate, but also provides an underlying framework against general network attacks. More specifically, we first extend the classic GT model with size constraints for practical purposes, then redistribute the client service requests to multiple virtual servers embedded within each back-end server machine, according to specific testing matrices. Based on this framework, we propose a two-mode detection mechanism using some dynamic thresholds to efficiently identify the attackers. The focus of this work lies in the detection algorithms proposed and the corresponding theoretical complexity analysis.

Keywords–Application DoS, group testing, network security.

I. INTRODUCTION
DENIAL-OF-SERVICE (DoS) attack, which aims to make a service unavailable to legitimate clients, has become a severe threat to Internet security. Traditional DoS attacks mainly abuse the network bandwidth around the Internet subsystems and degrade the quality of service by generating congestion at the network. Consequently, several network-based defense methods have tried to detect these attacks by controlling traffic volume or differentiating traffic patterns at the intermediate routers. However, with the boost in network bandwidth and application service types, the target of DoS attacks has recently shifted from network to server resources and application procedures themselves, forming the new application DoS attack. By exploiting flaws in application design and implementation, application DoS attacks exhibit three advantages over traditional DoS attacks which help evade normal detection: malicious traffic is always indistinguishable from normal traffic; automated scripts avoid the need for a large number of "zombie" machines or bandwidth to launch the attack; and the attack is much harder to trace due to multiple redirections at proxies. According to these characteristics, the malicious traffic can be classified into legitimate-like requests of two cases: 1) at a high inter-arrival rate and 2) consuming more service resources. We call these two cases "high-rate" and "high-workload" attacks, respectively, in this paper.

Since these attacks usually do not cause congestion at the network level and thus bypass network-based monitoring systems, detection and mitigation at the end system of the victim servers have been proposed. Among them, the DDoS shield and CAPTCHA-based defense are the representatives of the two major techniques of system-based approaches: session validation based on a legitimate behavior profile, and authentication using human-solvable puzzles. By enhancing
the accuracy of the suspicion assignment for each client session, DDoS shield can provide efficient session schedulers for defending possible DDoS attacks. However, the overhead for per-session validation is not negligible, especially for services with dense traffic. CAPTCHA-based defenses introduce additional service delays for legitimate clients and are also restricted to human-interaction services. A kernel observation and brief summary of our method is: the identification of attackers can be much faster if we can find them out by testing the clients in groups instead of one by one. Thus, the key problem is how to group clients and assign them to different server machines in a sophisticated way, so that if any server is found under attack, we can immediately identify and filter the attackers out of its client set. Apparently, this problem resembles group testing (GT) theory, which aims to discover defective items in a large population with the minimum number of tests, where each test is applied to a subset of items, called pools, instead of testing them one by one. Therefore, we apply GT theory to this network security issue and propose specific algorithms and protocols to achieve high detection performance in terms of short detection latency and low false positive/negative rate. Since the detections are based merely on the status of service resource usage of the victim servers, no individual signature-based authentications or data classifications are required; thus, it may overcome the limitations of the current solutions. GT was proposed during World War Two and has been applied to many areas since then, such as medical testing, computer networks, and molecular biology. The advantages of GT lie in its prompt testing efficiency and fault-tolerant decoding methods. To the best of our knowledge, the first attempts to apply GT to networking attack defense were proposed in parallel by Thai et al. (which is the preliminary work of this journal) and Khattab et al. The latter proposed a detection system based on "Reduced-Randomness Nonadaptive Combinatorial Group Testing". However, since this method only counts the number of incoming requests rather than monitoring the server status, it is restricted to defending high-rate DoS attacks and cannot handle high-workload ones.

From a system viewpoint, our defense scheme is to embed multiple virtual servers within each physical back-end server and map these virtual servers to the testing pools in GT, then assign clients into these pools by distributing their service requests to different virtual servers. By periodically monitoring some indicators (e.g., average responding time) for resource usage in each server and comparing them with some dynamic thresholds, all the virtual servers can be judged as "safe" or "under attack." By means of the decoding algorithm of GT, all the attackers can be identified. Therefore, the biggest challenges of this method are threefold: 1) How to construct a testing matrix to enable prompt and accurate detection. 2) How to regulate the service requests to match the matrix in a practical system. 3) How to establish proper thresholds for server resource usage indicators, to generate accurate test outcomes. Similar to all the earlier applications of GT, this new application to network security requires modifications of the classical GT model and algorithms, so as to overcome the obstacle of applying the theoretical models to practical scenarios. Specifically, the classical GT theory assumes that each pool can have as many items as needed and the number of pools for testing is unrestricted. However, in order to provide real application services, virtual servers cannot have infinite quantity or capacity, i.e., constraints on these two parameters are required to complete our testing model.

Our main contributions in this paper are as follows:
• Propose a new size-constrained GT model for practical DoS detection scenarios.
• Provide an end-to-end underlying system for GT-based schemes, without introducing complexity at the network core.
• Provide multiple dynamic thresholds for resource usage indicators, which help avoid erroneous tests caused by legitimate bursts and diagnose servers handling various amounts of clients.
• Present three novel detection algorithms based on the proposed system, and show their
high efficiencies in terms of detection delay and false positive/negative rate via theoretical analysis and simulations.

Besides application DoS attacks, our defense system is applicable to DoS attacks on other layers, e.g., the protocol-layer SYN flood attack, where victim servers are exhausted by massive half-open connections. Although these attacks occur in different layers and in different styles, the victim machines will gradually run out of service resources and indicate the anomaly. Since our mechanism relies only on the feedback of the victims, instead of monitoring the client behaviors or properties, it is promising in tackling these attack types.

The paper is organized as follows: In Section 2, we briefly introduce some preliminaries of GT theory, as well as the attacker model and the victim/detection model of our system. In Section 3, we propose the detection strategy derived from the adjusted GT model and illustrate the detailed components of the presented system. In Section 4 we present simulation results, and finally, we reach our conclusion by summarizing our contributions and providing further discussion of the false positive/negative rate.

2 PRELIMINARY

2.1 Classic Group Testing Model
2.1.1 Basic Idea
The classic GT model consists of t pools and n items (including at most d positive ones). As shown in Fig. 1, this model can be represented by a t x n binary matrix M where rows represent the pools and columns represent the items. An entry M[i,j] = 1 if and only if the ith pool contains the jth item; otherwise, M[i,j] = 0. The t-dimensional binary column vector V denotes the test outcomes of these t pools, where a 1-entry represents a positive outcome and a 0-entry represents a negative one. Note that a positive outcome indicates that at least one positive item exists within this pool, whereas a negative one means that all the items in the current pool are negative.

2.1.2 Classic Methods
Two traditional GT methods are adaptive and nonadaptive. Adaptive methods, a.k.a. sequential GT, use the results of previous tests to determine the pool for the next test and complete the test within several rounds, while nonadaptive GT methods employ a d-disjunct matrix, run multiple tests in parallel, and finish the test within only one round. We investigate both these methods and propose three algorithms accordingly.

2.1.3 Decoding Algorithms
For sequential GT, at the end of each round, items in negative pools are identified as negative, while the ones in positive pools require further testing. Notice that one item is identified as positive only if it is the only item in a positive pool. Nonadaptive GT takes d-disjunct matrices as the testing matrix M, where no column is contained in the Boolean summation of any other d columns. Du and Hwang proposed a simple decoding algorithm for this matrix type. A sketch of this algorithm can be shown using Fig. 1 as an example. Outcomes V[3] and V[4] are 0, so items in pool 3 and pool 4 are negative, i.e., items 3, 4, and 5 are negative. If this matrix M is a d-disjunct matrix, items other than those appearing in the negative pools are positive; therefore, items 1 and 2 are the positive ones.

Fig. 1. Binary testing matrix M and testing outcome vector V.

2.1.4 Apply to Attack Detection
A detection model based on GT can assume that there are t virtual servers and n clients, among which d clients are attackers. Considering the matrix in Fig. 1, the clients can be mapped into the columns and the virtual servers into the rows of M, where M[i,j] = 1 if and only if the requests from client j are distributed to virtual server i. With regard to the test outcome column V, we have V[i] = 1 if and only if virtual server i has received malicious requests from at least one attacker, but we cannot identify the attackers at once unless this virtual server is handling only one client. Otherwise, if V[i] = 0, all the clients assigned to server i are legitimate. The d
attackers can then be captured by decoding the test outcome vector V and the matrix M.
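The decoding rule of Section 2.1.3 can be stated in a few lines of Python. The sketch below clears every client that appears in a pool with outcome 0 and flags the rest; the matrix mirrors the Fig. 1 walk-through (pools 3 and 4 negative, items 1 and 2 flagged), rewritten here with 0-based indices:

    def decode(M, V):
        """M: t x n 0/1 testing matrix; V: t pool outcomes. Returns flagged columns."""
        t, n = len(M), len(M[0])
        negative = set()
        for i in range(t):
            if V[i] == 0:                    # pool i tested negative, so
                for j in range(n):           # every item it contains is negative
                    if M[i][j] == 1:
                        negative.add(j)
        return [j for j in range(n) if j not in negative]

    # 4 virtual servers (pools), 5 clients (items); servers 2 and 3 report 0.
    M = [[1, 0, 1, 0, 1],
         [0, 1, 0, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 1, 1]]
    V = [1, 1, 0, 0]
    print(decode(M, V))  # -> [0, 1]: clients 0 and 1 are flagged as attackers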
2.2 Attacker Model
The maximum destruction caused by the attacks includes the depletion of the application service resource at the server side, the unavailability of service access to legitimate users, and possible fatal system errors which require rebooting the server for recovery. We assume that any malicious behaviors can be discovered by monitoring the service resource usage, based on dynamic value thresholds over the monitored objects. Data manipulation and system intrusion are out of this scope. We assume that the application interface presented by the servers can be readily discovered and that clients communicate with the servers using HTTP/1.1 sessions on TCP connections. We consider the case that each client provides a non-spoofed ID which is utilized to identify the client during our detection period. Despite the fact that the application DoS attack is difficult to trace, by identifying the IDs of attackers, the firewall can block the subsequent malicious requests.

As mentioned in Section 1, the attackers are assumed to launch application service requests either at high inter-arrival rate or at high workload, or even both. The term "request" refers to either a main request or an embedded request for an HTTP page. Since the detection scheme proposed will be orthogonal to session affinity, we do not consider the repeated one-shot attack. We further assume that the number of attackers d << n, where n is the total client amount. This arises from the characteristics of this attack. Due to the benefits of the virtual servers we employ, this constraint can be relaxed, but we keep it for the theoretical analysis in the current work.

2.3 Victim/Detection Model
The victim model in our general framework consists of multiple back-end servers, which can be Web application servers, database servers, and distributed file systems. We do not take classic multitier Web servers as the model, since our detection scheme is deployed directly on the victim tier and identifies the attacks targeting the same victim tier; thus, multitier attacks should be separated into several classes to utilize this detection scheme. The victim model along with front-end proxies is shown in Fig. 2.

Fig. 2. Victim/detection model.

We assume that all the back-end servers provide multiple types of application services to clients using the HTTP/1.1 protocol on TCP connections. Each back-end server is assumed to have the same amount of resource. Moreover, the application services to clients are provided by K virtual private servers, which are embedded in the physical back-end server machine and operate in parallel. Each virtual server is assigned an equal amount of static service resources, e.g., CPU, storage, memory, and network bandwidth. The operation of any virtual server will not affect the other virtual servers in the same physical machine. The reasons for utilizing virtual servers are twofold: first, each virtual server can reboot independently, and thus is feasible for recovery from possible fatal destruction; second, the state transfer overhead for moving clients among different virtual servers is much smaller than the transfer among physical server machines.

As soon as the client requests arrive at the front-end proxy, they will be distributed to multiple back-end servers for load balancing, whether session-sticky or not. Notice that our detection scheme is behind this front-end tier, so the load-balancing mechanism is orthogonal to our setting. On being accepted by one physical server, one request will be simply validated based on the list of all identified attacker IDs (blacklist). If it passes the authentication, it will be distributed to one of the
virtual servers within this machine by means of the virtual switch. This distribution depends on the testing matrix generated by the detection algorithm. By periodically monitoring the average response time to service requests and comparing it with specific thresholds fetched from a legitimate profile, each virtual server is associated with a "negative" or "positive" outcome. Therefore, a decision over the identities of all clients can be made among all physical servers, as discussed further in the following Section 3.

Fig. 3. Two-state diagram of the system.

3 STRATEGY AND DETECTION SYSTEM

3.1 Size Constraint Group Testing
As mentioned in the detection model, each testing pool is mapped to a virtual server within a back-end server machine. Although the maximum number of virtual servers can be extremely large, since each virtual server requires enough service resources to manage client requests, it is practical to have the virtual server quantity (maximum number of servers) and capacity (maximum number of clients that can be handled in parallel) constrained by two input parameters K and w, respectively. Therefore, the traditional GT model is extended with these constraints to match our system setting. The maximum number of attackers d is assumed known beforehand. Scenarios with nondeterministic d are out of the scope of this paper. In fact, these scenarios can be readily handled by first testing with an estimated d, then increasing d if exactly d positive items are found.
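That fallback for an unknown attacker count can be sketched as follows, assuming a hypothetical helper run_detection(d) that performs one complete detection pass with a d-disjunct matrix and returns the flagged clients:

    def estimate_attackers(run_detection, d_initial=1, d_max=64):
        # run_detection(d) is an assumed stand-in for one full GT pass.
        d = d_initial
        while d <= d_max:
            suspects = run_detection(d)
            if len(suspects) < d:    # fewer than d positives: estimate sufficed
                return suspects
            d *= 2                   # exactly d positives found: enlarge d, retest
        raise RuntimeError("attacker count exceeds the supported bound")

    # Mock pass: pretend there are really 5 attackers, so a run with capacity d
    # reports min(5, d) suspects.
    print(estimate_attackers(lambda d: set(range(min(5, d)))))  # -> {0,1,2,3,4}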
3.2 Detection System
The implementation difficulties of our detection scheme are threefold: how to construct a proper testing matrix M, how to distribute client requests based on M with low overhead, and how to generate test outcomes with high accuracy. We will address the last two in this section and leave the first one to the next section.

3.2.1 System Overview
As mentioned in the detection model, each back-end server works as an independent testing domain, where all virtual servers within it serve as testing pools. In the following sections, we only discuss the operations within one back-end server; it is similar in any other server. The detection consists of multiple testing rounds, and each round can be sketched in four stages (Fig. 4): First, generate and update the matrix M for testing. Second, "assign" clients to virtual servers based on M. The back-end server maps each client into one distinct column in M and distributes an encrypted token queue to it. Each token in the token queue corresponds to a 1-entry in the mapped column, i.e., client j receives a token with destination virtual server i iff M[i,j] = 1. Being piggybacked with one token, each request is forwarded to a virtual server by the virtual switch. In addition, requests are validated on arriving at the physical servers for faked tokens or identified malicious IDs. This procedure ensures that all the client requests are distributed exactly as the matrix M regulates and prevents any attackers from accessing virtual servers other than the ones assigned to them. Third, all the servers are monitored for their service resource usage periodically; specifically, the arriving request aggregate (the total number of incoming requests) and the average response time of each virtual server are recorded and compared with some dynamic thresholds to be shown later. All virtual servers are associated with positive or negative outcomes accordingly. Fourth, decode these outcomes and identify legitimate or malicious IDs. By following the detection algorithms (presented in the next section), all the attackers can be identified within several testing rounds. To lower the overhead and delay introduced by the mapping and piggybacking for each request, the system is exempted from this procedure in the normal service state. As shown in Fig. 3, the back-end server cycles between two states, which we refer to as NORMAL mode and DANGER mode.
Once the estimated response time of any virtual server exceeds some profile-based threshold, the whole back-end server will transfer to the DANGER mode and execute the detection scheme. Whenever the average response time of each virtual server falls below the threshold, the physical server returns to NORMAL mode.
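A toy illustration of this two-mode operation follows; the threshold value and names are assumptions for the sketch, not values from the system:

    THRESHOLD_MS = 250.0  # assumed profile-based response-time threshold

    def next_mode(current_mode, response_times_ms):
        if current_mode == "NORMAL" and max(response_times_ms) > THRESHOLD_MS:
            return "DANGER"   # start running detection rounds
        if current_mode == "DANGER" and max(response_times_ms) <= THRESHOLD_MS:
            return "NORMAL"   # all virtual servers look healthy again
        return current_mode

    mode = "NORMAL"
    for sample in ([120, 90, 80], [400, 95, 85], [140, 90, 70]):
        mode = next_mode(mode, sample)
        print(mode)           # NORMAL -> DANGER -> NORMAL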
3.2.2 Configuration Details
Several critical issues regarding the implementation are as follows:

Fig. 4. One testing round in DANGER mode.

Session state transfer. By deploying the detection service on the back-end server tier, our scheme is orthogonal to the session state transfer problem caused by the load balancing at the reverse proxies (front-end tier). To simplify the discussion of implementation details, we assume that the front-end proxies distribute client requests strictly evenly to the back-end servers, i.e., without considering session stickiness. The way of distributing token queues mentioned later is tightly related to this assumption. However, even if the proxies conduct more sophisticated forwarding, the token queue distribution can be readily adapted by manipulating the token piggybacking mechanism at the client side accordingly. Since the testing procedure requires distributing intra-session requests to different virtual servers, an overhead for maintaining consistent session state is incurred. Our motivation for utilizing virtual servers is to decrease such overhead to the minimum, since multiple virtual servers can retrieve the latest client state through the shared memory, which resembles the principle of the Network File System (NFS). An alternative way out is to forward intra-session requests to the same virtual server, which calls for a longer testing period for each round (to be further discussed later), but we prefer faster detection in this paper, and thus adopt the former method.

Matrix generation. The testing matrix M, which regulates distributing which client request to which server, poses as the kernel part of this paper. All three algorithms proposed in the next section are concerned with the design of M for the purpose of shortening the testing period and decreasing the false positive/negative rate. Since the detection phase usually undergoes multiple testing rounds, M is required to be regenerated at the beginning of each round. The time overhead for calculating this M is quite low and will be shown via analytical proofs in Section 4.

Distributing tokens. Two main purposes of utilizing tokens are associating each client with a unique, non-spoofed ID and assigning them to a set of virtual servers based on the testing matrix. On receiving the connection request from a client, each back-end server responds with a token queue where each token is a 4-tuple: (client ID, virtual server ID, matrix version, encrypted key). "Client ID" refers to the unique non-spoofed ID for each client, which we assume unchanged during our testing period (DANGER mode). "Virtual server ID" is the index of each virtual server within the back-end server. This can be implemented as a simple index value, or through a mapping from the IP addresses of all virtual servers. The back-end server blocks out-of-date tokens by checking their "matrix version" value, to avoid messing up the request distribution with nonuniform matrices. With regard to the "encrypted key," it is an encrypted value generated by hashing the former three values and a secured service key. This helps rule out any faked tokens generated by attackers. Assuming that the load balancing at the proxies is strictly even for all back-end servers, the client has to agree on piggybacking each request with the token at the head of one token queue, and then the next request with the token at the head of the next token queue, when receiving this application service. Notice that with multiple token queues released by multiple back-end servers, it is nontrivial to implement the correct request distribution.
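One possible reading of this token construction, using HMAC-SHA256 as a stand-in for the hashed "encrypted key" (the primitive, key, and names are our assumptions, not the paper's exact construction):

    import hmac, hashlib

    SERVICE_KEY = b"secret-service-key"  # secured key known only to back-end servers

    def make_token(client_id, vserver_id, matrix_version):
        msg = f"{client_id}|{vserver_id}|{matrix_version}".encode()
        tag = hmac.new(SERVICE_KEY, msg, hashlib.sha256).hexdigest()
        return (client_id, vserver_id, matrix_version, tag)

    def validate_token(token, current_version):
        client_id, vserver_id, version, tag = token
        if version != current_version:       # block out-of-date tokens
            return False
        msg = f"{client_id}|{vserver_id}|{version}".encode()
        expected = hmac.new(SERVICE_KEY, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(tag, expected)  # rule out faked tokens

    token = make_token(client_id=42, vserver_id=3, matrix_version=7)
    print(validate_token(token, current_version=7))  # True
    print(validate_token(token, current_version=8))  # False: stale matrix version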
4 SIMULATIONS

To demonstrate the theoretical complexity results shown in the previous section, we conduct a simulation study of the proposed system in terms of four metrics: the average testing latency T, which refers to the length of the time interval from the attackers starting to send requests till all of them are identified; the average false positive rate fp; the false negative rate fn; as well as the average number of testing rounds Rtest, which stands for the number of testing rounds needed for identifying all the clients by each algorithm.

5 CONCLUSIONS
We proposed a novel technique for detecting application DoS attacks by means of a new constraint-based group testing model. Motivated by classic GT methods, three detection algorithms were proposed and a system based on these algorithms was introduced. Theoretical analysis and preliminary simulation results demonstrated the outstanding performance of this system in terms of low detection latency and false positive/negative rate. Our focus in this paper is to apply group testing principles to application DoS attacks, and to provide an underlying framework for detection against a general case of network assaults, where malicious requests are indistinguishable from normal ones. For future work, we will continue to investigate the potential of this scheme and improve the proposed system to enhance the detection efficiency. Some possible directions for this can be:

1. The sequential algorithm can be adjusted to avoid the requirement of isolating attackers.
2. A more efficient d-disjunct matrix could dramatically decrease the detection latency, as we showed in the theoretical analysis. A new construction method for this is to be proposed and can be a major theoretical work for another paper.
3. The overhead of maintaining the state transfer among virtual servers can be further decreased by more sophisticated techniques.
4. Even though we already have quite a low false positive/negative rate from the algorithms, we can still improve it via fault-tolerant group testing methods, as discussed next.

Besides attempting to alleviate the errors resulting from the proactive learning on legitimate traffic, we will develop our algorithms using the error-tolerant d-disjunct matrix, which can still provide correct identification results even in the presence of some erroneous tests. Specifically, a testing matrix M is (d, z)-disjunct if any single column of it has at least z elements that are not in the union of any d other columns. By applying this matrix M to our PND algorithm, up to d positive items can be correctly identified if the number of erroneous tests is not more than z - 1. With the efficient decoding algorithms proposed by Du and Hwang, this error-tolerant matrix has great potential to improve the performance of the PND algorithm and handle application DoS attacks more efficiently.
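Because the (d, z)-disjunct property is purely combinatorial, it can be checked by brute force on small matrices. The sketch below merely verifies the definition quoted above; it is not one of the efficient constructions or decoders of Du and Hwang:

    from itertools import combinations

    def is_dz_disjunct(M, d, z):
        # Every column must keep at least z rows set to 1 that are not covered
        # by the Boolean union of any d other columns. Exponential brute force,
        # suitable only for tiny matrices.
        t, n = len(M), len(M[0])
        for j in range(n):
            others = [k for k in range(n) if k != j]
            for comb in combinations(others, min(d, len(others))):
                union = [any(M[i][k] for k in comb) for i in range(t)]
                private = sum(1 for i in range(t) if M[i][j] == 1 and not union[i])
                if private < z:
                    return False
        return True

    I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(is_dz_disjunct(I3, d=2, z=1))  # True: each column keeps a private row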
REFERENCES
[1] S. Ranjan, R. Swaminathan, M. Uysal, and E. Knightly, "DDoS-Resilient Scheduling to Counter Application Layer Attacks under Imperfect Detection," Proc. IEEE INFOCOM, Apr. 2006.
[2] S. Vries, "A Corsaire White Paper: Application Denial of Service (DoS) Attacks," application-level-dos-attacks.pdf, 2010.
[3] S. Kandula, D. Katabi, M. Jacob, and A.W. Berger, "Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds," Proc. Second Symp. Networked Systems Design and Implementation (NSDI), May 2005.
[4] S. Khattab, S. Gobriel, R. Melhem, and D. Mosse, "Live Baiting for Service-Level DoS Attackers," Proc. IEEE INFOCOM, 2008.
[5] M.T. Thai, Y. Xuan, I. Shin, and T. Znati, "On Detection of Malicious Users Using Group Testing Techniques," Proc. Int'l Conf. Distributed Computing Systems (ICDCS), 2008.
[6] M.T. Thai, P. Deng, W. Wu, and T. Znati, "Approximation Algorithms of Nonunique
Probes Selection for Biological Target Identification," Proc. Conf. Data Mining, Systems Analysis and Optimization in Biomedicine, 2007.
[7] J. Mirkovic, J. Martin, and P. Reiher, "A Taxonomy of DDoS Attacks and DDoS Defense Mechanisms," Technical Report 020018, Computer Science Dept., UCLA, 2002.
[8] M.J. Atallah, M.T. Goodrich, and R. Tamassia, "Indexing Information for Data Forensics," Proc. Int'l Conf. Applied Cryptography and Network Security (ACNS), pp. 206-221, 2005.
[9] J. Lemon, "Resisting SYN Flood DoS Attacks with a SYN Cache," Proc. BSDCON, 2002.
[10] Service Provider Infrastructure Security, "Detecting, Tracing, and Mitigating Network Anomalies," http://www.arbornetworks.com, 2005.
[11] Y. Kim, W.C. Lau, M.C. Chuah, and H.J. Chao, "PacketScore: Statistics-Based Overload Control against Distributed Denial-of-Service Attacks," Proc. IEEE INFOCOM, 2004.
[12] F. Kargl, J. Maier, and M. Weber, "Protecting Web Servers from Distributed Denial of Service Attacks," Proc. 10th Int'l Conf. World Wide Web (WWW '01), pp. 514-524, 2001.
[13] L. Ricciulli, P. Lincoln, and P. Kakkar, "TCP SYN Flooding Defense," Proc. Comm. Networks and Distributed Systems Modeling and Simulation Conf. (CNDS), 1999.
[14] D.Z. Du and F.K. Hwang, Pooling Designs: Group Testing in Molecular Biology. World Scientific, 2006.
[15] M.T. Thai, D. MacCallum, P. Deng, and W. Wu, "Decoding Algorithms in Pooling Designs with Inhibitors and Fault Tolerance," Int'l J. Bioinformatics Research and Applications, vol. 3, no. 2, pp. 145-152, 2007.
SPATIO-TEMPORAL INDEX STRUCTURE ANALYSIS

*Dr. KARTHIKEYANI. V., **SHAHINA BEGAM. I., ***TAJUDIN. K., ****PARVIN BEGAM. I
*Assistant Professor, Department of Computer Science, Govt. Arts College for Women, Salem-08. E-Mail: drvkarthikeyani@gmail.com
**Asst. Professor, Department of MCA, Vel Tech High Tech Dr.RR & Dr.SR Engg College, Ch-62. E-Mail: sbshahintaj@gmail.com
***Lecturer, Department of Computer Science, New College, Royapettah, Chennai-14. tajudinap@gmail.com
****Lecturer, Department of Computer Application, Soka Ikedha College of Arts and Science, Ch-99. parvinnadiya@gmail.com
ABSTRACT:
Index structures have always been an intensive and extensive research topic in the database field. The main task of an index structure is to ensure fast access to single or several records in a database on the basis of a search key and thus avoid an otherwise necessary sequential scan through a file. For spatial data, many geometric index structures have been devised as multidimensional spatial access methods (SAM); these structures do not include the time aspect. For temporal data, temporal index structures have been proposed to index valid and/or transaction times; these structures do not consider any spatial aspects. Both kinds of index structures form the basis for the current approaches to spatio-temporal index structures. Here we discuss a specification and classification scheme for spatio-temporal access methods, and also discuss how indexing is used for multidimensional objects and efficiently supports query processing.

Keywords: Spatio-temporal, specifics of spatio-temporal index structures, continuous query

1. INTRODUCTION
The main task of an index structure is to ensure fast access to single or several records in a database on the basis of the search key and avoid an otherwise necessary sequential scan through a file. The goal of this approach is to concentrate on the movement of the spatial objects. Our goal is also to constitute a specification and classification scheme for spatio-temporal access methods (STAMs) in the database.

A spatio-temporal object o (denoted by its identification number o_id) is a time-evolving spatial object, that is, its evolution or history is represented by a set of triplets (o_id, si, ti), where si (also called spacestamp) is the location of object o_id at instant ti (also called timestamp). Hence, a point (nonpoint) time-evolving object can be represented by a line (volume) in the three-dimensional space and corresponds to a moving point (moving region).

2. SPECIFICS OF SPATIO-TEMPORAL INDEX STRUCTURES
Here we give a simple method of geometric index structure analysis. Figure (2.1) shows an example of spatial and spatio-temporal objects.

Figure (2.1)

2.1 DATA SET SUPPORTED:
In the spatio-temporal case, moving points, moving regions, and moving lines are identified as the essential time-evolving spatial object classes.

2.2 VALID VERSUS TRANSACTION TIME:
Temporal database research has identified a number of time models that can partially coexist. In particular, transaction time (i.e., the time when a fact is current in the database and may be retrieved) and valid time (i.e., the time when a fact is true in the modelled reality) have been identified. This has led to three known kinds of spatio-temporal database management systems (STDBMS): valid time (also
called historical), transaction-time (also called rollback), and bitemporal databases.

2.3 DATABASE DYNAMICS:
Another way of characterizing the requirements of spatio-temporal index structures is to take into account the degree of dynamics of a spatio-temporal database.
i) The first case is that the cardinality of the database (i.e., the number of moving objects) is static over time, but moving objects may change their location.
ii) The second case is the reverse.
iii) The third case allows both the database cardinality and the objects' locations to vary over time.
iv) The fourth and remaining case relates to a database consisting of a fixed number of nontemporal spatial objects; these can be handled with spatial database approaches.

2.4 LOADING OF DATA:
Dynamic loading of data into the database: i.e., we distinguish between applications whose data are bulk loaded with a timestamp ti into the database and where an update of past instants is not allowed, and applications with dynamic insertions and updates of object timestamps. The design of an efficient STAM also depends on this distinction.

2.5 APPROXIMATION OF MOVING OBJECTS:
R-trees and their variants approximate spatial objects by their minimum bounding boxes (MBB) in order to construct the spatial index. Transferring this approach to spatio-temporal objects turns out to be an inefficient solution due to the dynamic nature of moving objects. Since these objects are moving around, their MBBs include a vast amount of dead, non-visited space.

Fig. (2.2) The MBB of a moving point occupies a large portion of the data space.

2.6 PECULIARITY OF THE TIME DIMENSION:
For a moving object, time is not just another third dimension, since it takes monotonically increasing values. For two consecutive triplets (o_id, si, ti) and (o_id, si+1, ti+1) of an object o_id, ti+1 > ti always holds.

2.7 SUPPORT OF SPECIFIC SPATIO-TEMPORAL QUERIES:
The ultimate objective of indexing techniques is the efficient retrieval of data that satisfy the constraints of a user-defined query. For example, in the spatial domain, operations such as spatial selection, spatial join, and nearest-neighborhood queries are of interest. In the spatio-temporal domain, users could be interested in other kinds of queries.
Time slice queries, where the timeslice can be an instant or an interval:
Ex-1: Find all spots where it rained yesterday at 12.00 pm.
Ex-2: Find all pairs of objects that have lied spatially close (i.e., within distance X), during a specific time interval (or at a specific time instant).

Figure (2.2) Steps for spatio-temporal retrieval (showing Working Memory (W), Present Data, and a Historical Data queue).
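As a concrete illustration of a timeslice query over the triplet representation of Section 1, the following Python sketch stores each trajectory as a time-ordered list of (location, timestamp) samples and interpolates linearly between the two surrounding samples (much as the RPPF-tree of Section 5 interpolates past positions); all names here are illustrative:

    from bisect import bisect_left

    def position_at(trajectory, t):
        """trajectory: list of ((x, y), timestamp), sorted by timestamp."""
        times = [ts for _, ts in trajectory]
        k = bisect_left(times, t)
        if k == 0:
            return trajectory[0][0]
        if k == len(times):
            return trajectory[-1][0]
        (x0, y0), t0 = trajectory[k - 1]
        (x1, y1), t1 = trajectory[k]
        a = (t - t0) / (t1 - t0)             # interpolation fraction
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

    def timeslice(objects, t):
        """objects: dict o_id -> trajectory; returns o_id -> position at t."""
        return {oid: position_at(tr, t) for oid, tr in objects.items()}

    objects = {"cab7": [((0.0, 0.0), 0), ((10.0, 0.0), 10)]}
    print(timeslice(objects, 5))  # {'cab7': (5.0, 0.0)}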
3. EVALUATION OF FOUR INDEX STRUCTURES
Designs of spatio-temporal indexing techniques have so far been largely restricted to purely spatial indexing supporting multidimensional data, or to temporal indexing for standard data types (e.g., numbers, strings). These approaches can be classified into the following categories:
i) Methods that treat time as another dimension.
ii) Methods that incorporate the time information into the nodes of the index structure but without assuming another dimension.
iii) Methods that use overlapping index structures in order to represent the states of the database at different (valid or transaction) time instants.

4. TEMPORAL DATA AND QUERY PROCESSING
Temporal attributes are important for many applications. We investigate the management of moving objects, whose locations change continuously over time and hence require continuous answers to queries.

4.1 INDEXING AND MINING ON TEMPORAL DATA
Indexing and retrieving data records according to their temporal attributes are primitive functionalities for managing temporal data. While many temporal indexes have been proposed, it is shown in [6] how the TSB-tree, a well-known temporal index structure, is implemented in a commercial database and still retains a performance close to that of a non-temporal, standard B+-tree. This involves (i) unique designs of version chaining and treating index terms as versioned records to achieve the TSB-tree implementation with backward compatibility with the B+-tree, (ii) a data compression scheme that substantially reduces the storage needed for preserving historical data, and (iii) dealing with technical issues such as concurrency control, recovery, handling uncommitted data, and log management.
Ex: Stock market.

4.2 QUERY PROCESSING
The major objective of a STAM is to efficiently handle query processing. The broader the set of queries supported, the more applicable and useful the access method becomes. A set of fundamental query types as well as some specialized queries are discussed in the sequel.

Selection queries: Queries of the form "find all objects that have lied within a specific area (or at a specific point), during a specific time interval (or at a specific time instant)" are expected to be the most common ones addressed by STDBMS users. Assuming a hierarchical tree structure, the retrieval procedure is straightforward: starting from the root node(s), a downwards traversal of the index is executed by applying the criterion of intersected intervals (for time) and ranges (for space) between the query window and each node approximation. It is important to point out that pure temporal or pure spatial selection queries need to be supported as well.

Join queries: Queries of the form "find all pairs of objects that have lied spatially close (i.e., within distance X), during a specific time interval (or at a specific time instant)" are also crucial in spatio-temporal databases. An immediate application is accident detection by comparing vehicle trajectories. The retrieval procedure is also straightforward: starting from the two root nodes, a downwards traversal of the two indexes is performed in parallel, by comparing the entries of each visited node according to the overlap operator, such as the synchronized tree traversal proposed in [14] for R-tree structures.

Nearest-neighbor queries: Given an object X, the nearest-neighbor query requests the k closest objects with respect to X. For example, the query "find the 5 closest ambulances with respect to the accident place" is a nearest-neighbor query. Evidently, such a query can be supported by the algorithm proposed in [13]. However, consider the query: "find the 5 closest ambulances with respect to the accident place in a time interval of 2 minutes before and after the accident, knowing the directions and velocities of ambulances and the street map". Evidently, more sophisticated algorithms are required, towards spatio-temporal nearest-neighbor query processing.

5. ACCESS METHODS FOR PAST, PRESENT, AND FUTURE SPATIO-TEMPORAL DATA
RPPF-tree [10]: The RPPF-tree (Past, Present, and Future) indexes positions of moving objects at all points in time. The past positions of an object between two consecutive samples are linearly interpolated and the future positions are computed via a linear function from the last sample. The RPPF-tree applies partial persistence to the TPR-tree to capture the past positions. Leaf and non-leaf entries of the RPPF-tree include a time interval of
validity - [insertion time, deletion time]. When a node, say x, is split at time t, entries in x alive at t are copied to a new node, say y, and their timestamps are set to [t, ∞) (i.e., their deletion times are unidentified). While a time-parameterized bounding rectangle (TPBR) of the TPR-tree is valid from the current time, the structure of a TPBR in the RPPF-tree is valid from its insertion time. The straightforward, optimized, and double TPBRs are studied. In the straightforward approach, the bounding rectangle is the integral of the TPBR from its insertion time to infinity. In the optimized TPBR, the bounding rectangle is the integral of the TPBR from its insertion time to (current time + H time units into the future that can be efficiently queried). The straightforward and optimized TPBRs cannot be tightened since these rectangles start from their insertion times. The double TPBR allows tightening by having two components: a tail MBR and a head MBR. The tail MBR starts at the time of the last update and extends to infinity, and thus is a regular TPBR of the TPR-tree. The head MBR bounds the finite historical trajectories from the insertion time to the last update. Querying is similar to the regular TPR-tree search, with the exception of redefining the intersection function to accommodate the double TPBR.

PCFI+-index [11]: The Past-Current-Future+-Index builds on SETI and the TPR*-tree [43]. As in SETI, space is divided into non-overlapping cells. For each cell, an in-memory TPR*-tree indexes the current positions and velocities of objects. Current data records are organized as a main-memory hash index, hence allowing efficient access to current positions. To index the objects' past trajectories, the PCFI+-index uses a sparse on-disk R*-tree to index the lifetimes of the historical data, which only contain the segments from one cell. Insertion, update, and deletion are similar to those of SETI and the TPR*-tree. Upon update, if the new location resides inside the same partition, a new segment is inserted into the historical data file and the TPR*-tree updates the new location for the object. Otherwise, a split occurs and two segments are inserted into the historical data file at different pages; the corresponding entry in the old TPR*-tree is removed and is inserted into another TPR*-tree. If the insertion of a segment overflows a page, the corresponding R*-tree entry is updated to set its end time.

BBx-index [12]: The BBx-index uses the Bx-tree techniques to support the present and future. To index the past, the BBx-index keeps multiple tree versions. Each tree entry has the form <x_rep, tstart, tend, pointer>, where x_rep is the transformed 1D object location value using an SFC, and tstart and tend are the insert and update times of the object, respectively. Each tree corresponds to a timestamp signature, being the end timestamp of a phase when the tree is built, and a lifespan, being the minimum start time and the maximum end time of all the entries in that tree. Unlike the Bx-tree, which concatenates the timestamp signature and the 1D transformed value, the BBx-index maintains a separate tree for each timestamp signature, and models the moving objects from the past to the future. Insertion is the same as in the Bx-tree. Instead of deleting an object, an update sets the end time of the object to the current time, followed by an insertion of the updated object into the latest tree.

6. CONCLUSION
In this short survey, we presented an overview of existing spatio-temporal index structures. Spatio-temporal indexing methods are classified based on the type and time of the queries they can support. With the variety of spatio-temporal access methods, it becomes essential to have a general and extensible framework to support the class of spatio-temporal indexing. There is still a lot of research work that needs to be investigated in spatio-temporal indexing. A variety of tests using synthetic and real spatio-temporal data sets are necessary in order to better understand the spatio-temporal indexing and retrieval issues.

REFERENCES:
[1] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", second edition, Morgan Kaufmann Publishers, an imprint of Elsevier.
[2] Taher Omran Ahmed and Maryvonne Miquel, "Multidimensional Structures Dedicated to Continuous Spatiotemporal Phenomena", Springer-Verlag Berlin Heidelberg, 2005, pp. 1-12.
[3] Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang, "Indexing Spatio-Temporal Data Warehouses," Data Engineering, International Conference, 2002.
[4] Kimball, R., The Data Warehouse Toolkit. John Wiley, 1996.
[5] Ghazi H. Al-Naymat, "New Methods for Mining Sequential and Time Series Data", Ph.D. thesis, The University of Sydney, June 2009.
[6] Anthony David Veness, "A real-time spatio-temporal data exploration tool for marine
research", Master of Applied Science, University of Tasmania, October 2009.
[7] S. Sudarshan and Krithi Ramamritham, "Data Warehousing and Data Mining", IIT Bombay, sudarsha@cse.iitb.ernet.in, krithi@cse.iitb.ernet.in.
[8] Subramanian Arumugam, "Efficient Algorithms for Spatiotemporal Data Management", Ph.D. thesis, University of Florida, 2008.
[9] WANG Jizhou, LI Chengming, "Research on the framework of spatio-temporal data warehouse", Institute of GIS, Chinese Academy of Surveying and Mapping, International Conference 2009, wjz@casm.ac.cn, cmli@casm.ac.cn, icaci.org/documents/ICC_proceedings/ICC2009/html/nonref/3_4.pdf.
[10] M. Pelanis, S. Šaltenis, and C. Jensen, "Indexing the past, present, and anticipated future positions of moving objects", TODS, 31(1):255–298, 2006.
[11] Z.-H. Liu, X.-L. Liu, J.-W. Ge, and H.-Y. Bae, "Indexing large moving objects from past to future with PCFI+-index", in COMAD, pages 131–137, 2005.
[12] D. Lin, C. Jensen, B. Ooi, and S. Šaltenis, "Efficient indexing of the historical, present, and future positions of moving objects", in MDM, pages 59–66, 2005.
[13] N. Roussopoulos, S. Kelley, F. Vincent, "Nearest Neighbor Queries", Proceedings of ACM SIGMOD Conference, 1995.
[14] T. Brinkhoff, H.-P. Kriegel, B. Seeger, "Efficient Processing of Spatial Joins Using R-trees", Proceedings of ACM SIGMOD Conference, 1993.
MODIFIED DELAY STRATEGY IN GRID ENVIRONMENT

*N. Sivakamy, **B. Murugeswari, ***Dr. C. Jayakumar
*Final Year M.E. CSE, S.A. Engineering College, Chennai. sivakamyn@yahoo.com
**Asst. Prof. in CSE, S.A. Engg College, Chennai. muthusans@gmail.com
***Prof. in CSE Dept., R.M.K. Engg College, Chennai. niyansree@gmail.com
Abstract
Grids are a form of distributed computing in which many networked, loosely coupled computers act together to perform very large tasks. The scheduling of new jobs is delayed, instead of dispatching them to overloaded workstations, when the workstations are under Full Load Situations (FLS). A new strategy based on Dynamic Load Balancing (DLB) is named the mDELAY strategy. In Dynamic Load Balancing, the decisions are taken at run time based on the current state of the system. The mDELAY algorithm significantly enhances the DELAY strategy, which is a delay-based scheduling algorithm. The mDELAY algorithm produces better batch completion time and improved load balancing at workstations. To improve the performance, a decentralized scheduling framework based on web services or grid services is used. The processor load is calculated when assigning a job to each processor. This processor calculates the "optimal" load distribution given those predictions and sends the relevant information to each processor.

Keywords: Synchronization, Fragmentation, Defragmentation, Task Sharing, Scheduling, Load Calculation, Data Dissemination.

1. Introduction
Variation in the available resources (e.g., computing power and bandwidth) may have a dramatic impact on the runtimes of parallel applications. Over the years, much research has been done on this subject in grid computing. Generally, two methods for parallel applications have been developed to deal with those fluctuations in processor speeds on the nodes. Dynamic Load Balancing (DLB) adapts the load on the different processors in proportion to the expected processor speeds. Job Replication (JR) makes a given number of copies of each job, sends the copies and the original job to different processors, and waits until the first replication is finished. A comparison of the performance of those two methods in a heterogeneous, globally distributed grid environment has, to the best of the authors' knowledge, never been performed. Recently, a variety of grid test beds have been developed. This enables us to perform comprehensive measurements of realistic job times to investigate how well certain implementations of grid applications perform in practice for a wide range of different experimental setups. In this paper, we provide extensive trace-driven simulation experiments of DLB and JR as two implementation concepts to deal with the ever-changing environment on widespread grid nodes. Moreover, we introduce a new selection method that dynamically selects the best implementation, and show its effectiveness in global-scale grid environments.

1.1 Related Works
The computation and communication structure of many parallel applications can be described by the Full Load Situations Parallel (FLS) model [4]. The relevance of the FLS model lies in the fact that it has the important property that problems can be divided into sub-problems, each of which can be solved or executed in roughly the same way. As such, in the absence of any prior knowledge about the processor speeds and link rates in large-scale grid environments, the FLS model can be seen as a default means to parallelize computationally intensive applications. [6] The BSP model includes the structure of Single Program Multiple Data, which is a classification of the structure of many parallel implementations. Currently, not many applications of the FLS type are able to run in a grid
environment, due to the fact that they cannot deal with the ever-changing environment. In particular, the synchronization in FLS programs causes inefficiency: one late job can delay the whole process. This raises the need for methods that make FLS applications robust against changes in the grid environment. A load-balancing system [1] is a cluster system composed of multiple hosts. Each individual host, which can provide services independently without any external assistance from other hosts, has equivalent status in the system. They form a cluster in a symmetrical way. Logically, from the user side, these machines function as a single large virtual machine. [4] Instead of taking all these parameters for the comparison, we have considered only two parameters to represent the fairness and throughput of the scheduling algorithms. The parameters are batch completion time and workstation processing completion time. If workstations are completing their processing in less time, then the throughput of the algorithm is higher. Similarly, if the batch completion time is smaller, then each job is getting proportionate CPU time, resulting in improved fairness of the algorithm. We have also plotted graphs for comparing the variation among these values. From the graphs, it is observed that the metrics considered by us can very well replace the performance metrics considered by Hui and Chanson. [2] When a BigJob is launched, sub-jobs keep running until the final solution is achieved, and the manager quits the Pilot-Job at that time. In case multiple BigJobs are submitted for the same simulation, or if a load balancing function is included, sub-jobs experience several restarts from their checkpointing data.

2. System Modelling

We briefly describe the concept of the FLS model. Then, in Section 2.2, we describe the implementation details of DLB and JR.

2.1 Full Load Situations

FLS parallel programs have the property that the problem can be divided into subproblems or jobs, each of which can be solved or executed in roughly the same way. Each run consists of I iterations of P jobs, which are distributed on P processors: each processor receives one job per iteration. Further, every run contains I synchronization moments: after computing the jobs, all the processors send their data and wait for each other's data before the next iteration starts. In general, the runtime is equal to the sum of the individual iteration times (ITs). The figure presents the situation for one iteration of an FLS run in a grid environment: each processor receives a job, and the IT is equal to the maximum of the individual job times plus the synchronization time (ST). ELB assumes no prior knowledge of the processor speeds of the nodes and consequently balances the load equally among the different nodes. The standard FLS program is implemented according to the ELB principle.

2.2 Load Balancing and Job Replication

In this section, we briefly discuss the two main methods to cope with the dynamics of the grid environment: DLB and JR.

2.2.1 Dynamic Load Balancing

Figure: DLB Architecture (a master/scheduler coordinating slave and service nodes)

DLB starts with the execution of an iteration, which does not differ from the common FLS program explained above. However, at the end of each iteration, the processors predict their processing speed for the next iteration. We select one processor to be the DLB scheduler. After every N iterations, the processors send their predictions to this scheduler. Subsequently, this processor calculates the "optimal" load distribution given those predictions and sends relevant information to each processor. The load distribution is optimal when all processors finish their calculation exactly at the same time. Therefore, it is "optimal" when the load assigned to each processor is proportional to its predicted processor speed. Finally, all processors redistribute the load.
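As a concrete illustration of this proportionality rule, the following minimal Python sketch (illustrative only; the function and variable names are hypothetical and not from the paper) computes such a distribution from predicted speeds:

def dlb_distribution(predicted_speeds, total_load):
    # Split total_load across processors in proportion to their predicted
    # speeds, so that all are expected to finish at the same time.
    total_speed = sum(predicted_speeds)
    return [total_load * s / total_speed for s in predicted_speeds]

# Example: 4 processors, one predicted to be twice as fast as the others.
print(dlb_distribution([1.0, 1.0, 2.0, 1.0], 100.0))
# -> [20.0, 20.0, 40.0, 20.0]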
Fig. 1b provides an overview of the different steps within a DLB implementation on four processors. The effectiveness of DLB partly relies on the divisibility of the load. Load balancing at every single iteration is rarely a good strategy. On the one hand, the runtime of a parallel application directly depends on the overhead of DLB, and therefore, it is better to increase the number of iterations between two load balancing steps. On the other hand, less load balancing leads to an imbalance of the load on the processors for sustained periods of time due to significant changes in processing speeds. The theoretical speedups in runtimes when using DLB compared to ELB can be derived, given that the application rebalances the load every N iterations but without taking into account the overhead. Based on those speedups and the load balancing overhead addressed above, a suitable value of N was found to be 2.5P iterations, for P > 1. The effectiveness of DLB strongly relies on the accuracy of the predictions. Previous research has shown that the Dynamic Exponential Smoothing (DES) predictor accurately predicts the job times on shared processors. For that reason, the DES predictor has been implemented in the DLB simulation implementation of this paper.

2.2.2 Job Replication

In this section, we introduce the concept of job replication in FLS parallel programs. In an R-JR run, R − 1 exact copies of each job are created and have to be executed, such that there exist R samples of each job. Two copies of a job perform exactly the same computations: the data sets, the parameters, and the calculations are completely the same. A JR run consists of I iterations. One iteration takes in total R steps. R copies of all P jobs are distributed to P processors, and therefore, each processor receives R different jobs per iteration. As soon as a processor has finished one of the copies, it sends a message to the other processors that they can kill the job and start the next job in the sequence.
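To make the replication mechanics concrete, the following minimal Python sketch (illustrative only; names and the job-time distribution are assumptions, not from the paper) simulates one JR iteration, where the earliest-finishing replica determines each job's completion and the remaining copies are killed:

import random

def jr_iteration_time(num_jobs, replicas, mean_job_time=1.0):
    # One JR iteration: each job runs 'replicas' copies on different
    # processors; a job completes when its fastest copy finishes.
    job_times = []
    for _ in range(num_jobs):
        copies = [random.expovariate(1.0 / mean_job_time) for _ in range(replicas)]
        job_times.append(min(copies))  # remaining copies are killed
    return max(job_times)  # the iteration ends when the slowest job is done

print(jr_iteration_time(num_jobs=8, replicas=2))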
3. Modules

3.1 DEDUCTION OF RESOURCES
The server holds the complete text that was encrypted by the different clients. The server must deduce the number of clients that have access to it. Basically, the relationship between the resources is a client-server relationship. The clients in the network are attached loosely to the server, and hence the number of clients changes dynamically in the network, which has to be known to the server.

3.2 PERFORMANCE ANALYSIS
The server must deduce the free workspace in the resources. If the free workspace is between 40 and 100 percent, a fragment of the text is assigned to the client. If it is less than 40 percent, it means that the client is not free, and hence the server does not overload those clients. The server maintains entries about its clients.

3.3 FRAGMENTATION
Based on the performance analysis, such as CPU idle time and the total memory of the remote system (i.e., free and used), the tasks are fragmented. The fragmentation is based on the amount of free space available on the remote system.

3.4 SCHEDULING & REFRAGMENTATION
The fragments are then sent to the clients dynamically. The server has control over the resources residing at the clients. After performing the tasks, encrypted fragments are sent to the server to obtain the final encrypted message.
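The workspace rule of Sections 3.2 and 3.3 can be summarized in a small sketch. The following Python fragment (hypothetical names; a sketch of the described rule, not code from the paper) decides whether a client receives a fragment and sizes the fragment by the client's free space:

def assign_fragment(free_workspace_pct, task_size):
    # Assign a fragment only to clients with 40-100% free workspace;
    # the fragment size grows with the client's free space.
    if free_workspace_pct < 40:
        return None  # client is busy; do not overload it
    return int(task_size * free_workspace_pct / 100.0)

print(assign_fragment(75, 1000))  # -> 750
print(assign_fragment(30, 1000))  # -> None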
4. Function of DLB

4.1 DLB Experiments
In this section, we present the results of the simulations of the DLB runs. We investigated the DLB runtimes with both sets of processors for runs with a CCR of 0.01, 0.25, and 0.50 on 1, 2, 4, 8, 16, and 32 processors. In light of the above quantitative analyses and results, this section develops a deterministic algorithm for the broadcast schedule optimization and the assessment of mean service access. The figures depict the average runtimes, on a logarithmic scale, of all performed simulations on nodes of set one and on nodes of set two. From the simulation results of the runs with a CCR of 0.01, we conclude that selecting more processors in the run decreases the runtimes, which is the main motivation for programming in parallel. Although the rescheduling and send times increase when more processors are selected in the run, the decrease in the computation times for this case is always higher. We draw different conclusions when the CCR is higher. For runs on nodes of set one and a CCR of 0.25, we notice a decrease in runtimes until 16 processors are selected. When more processors have been selected, the runtimes will increase due to the significant heights of the rescheduling and send times.
Furthermore, we conclude that for every experimental setting, DLB consistently shows a speedup in comparison to ELB, even for runs with a CCR of 0.50. However, one could doubt the effectiveness of programming in parallel, because of the increase in runtimes when more processors are used.

4.2 Implementation

4.2.1 DEDUCTION OF RESOURCES
The server holds the complete text that was encrypted by the different clients. The server must deduce the number of clients that have access to it. Basically, the relationship between the resources is a client-server relationship. The clients in the network are attached loosely to the server, and hence the number of clients changes dynamically in the network, which has to be known to the server.

4.2.2 PERFORMANCE ANALYSIS
The server must deduce the free workspace in the resources. If the free workspace is between 40 and 100 percent, a fragment of the text is assigned to the client. If it is less than 40 percent, it means that the client is not free, and hence the server does not overload those clients. The server maintains entries about its clients.

Conclusion and Future Work
The proposed approaches are better than the existing approach. Even a single processor can finish the task quickly if it is faster than the other processors. The load distribution is optimal when all processors finish their calculation exactly at the same time. Hence, it is "optimal" when the load assigned to each processor is proportional to its predicted processor speed.

References
[1] Jie Chang, Wen'an Zhou, Junde Song, Zhiqi Lin, Beijing University of Posts and Telecommunications, "Scheduling Algorithm of Load Balancing Based on Dynamic Policies," 2010 IEEE, DOI 10.1109/ICNS.2010.57.
[2] Soon-Heum Ko, Nayong Kim, Joohyun Kim, Abhinav Thota, Shantenu Jha, "Efficient Runtime Environment for Coupled Multi-Physics Simulations: Dynamic Resource Allocation and Load-Balancing," 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010 IEEE, DOI 10.1109/CCGRID.2010.107.
[3] P. K. Suri, Department of Computer Sc. & Applications, Kurukshetra University, Kurukshetra, India, "An Efficient Decentralized Load Balancing Algorithm for Grid," 2010 IEEE.
[4] Hemant Kumar Mehta, Manohar Chandwani, Priyesh Kanungo, "A Modified Delay Strategy for Dynamic Load Balancing in Cluster and Grid Environment," 2010 IEEE.
[5] Hojjat Jafarpour, Sharad Mehrotra, and Nalini Venkatasubramanian, Department of Computer Science, University of California, Irvine, "Dynamic Load Balancing for Cluster-based Publish/Subscribe System," IEEE ICDCS 2009.
[6] Menno Dobber, Rob van der Mei, and Ger Koole, "Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 2, February 2009.
[7] "Dynamic Load Balancing for Cluster-Based Publish/Subscribe System," 2009.
[8] "The Contribution of Static and Dynamic Load Balancing in a Real-Time Distributed Air Defense Simulation," 2008.
[9] H. Attiya, "Two Phase Algorithm for Load Balancing in Heterogeneous Distributed Systems," Proc. 12th Euromicro Conf. Parallel, Distributed and Network-Based Processing (PDP '04), p. 434, 2004.
[10] R. Bajaj and D.P. Agrawal, "Improving Scheduling of Tasks in a Heterogeneous Environment," IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 2, pp. 107-118, Feb. 2004.
[11] I. Banicescu and V. Velusamy, "Load Balancing Highly Irregular Computations with the Adaptive Factoring," Proc. 16th Int'l Parallel and Distributed Processing Symp. (IPDPS '02), p. 195, 2002.
AN EFFECTIVE WEB-BASED E-LEARNING BY MANAGING RESOURCES USING ONTOLOGY

Indumathy.C
Easwari Engineering College, Anna University
indumathyc@hotmail.com
Abstract—Recent advances in Web and information technologies have resulted in many e-learning resources. There is an emerging requirement to manage and reuse relevant resources together to achieve on-demand e-learning in the Web. Ontologies have become a key technology for enabling semantic-driven resource management. To meet the requirements of semantic-based resource management for Web-based e-learning, one should go beyond using domain ontologies statically. In this paper, a semantic mapping mechanism is provided to integrate e-learning databases by using ontology semantics. Heterogeneous e-learning databases can be integrated under a mediated ontology. Taking into account the locality of resource reuse, context-specific portions from the whole ontology are represented as subontologies. Resources are reused based on the subontology approach by using an evolutionary algorithm.

Index Terms—collaborative learning, information resource management, knowledge reuse.

I. INTRODUCTION

Nowadays, Web-based e-learning has become one of the important methods for students or users to acquire various kinds of knowledge. According to [1], e-learning was defined as using various technological tools that are either Web based, Web distributed, or Web capable for the purposes of education. Murdock in [2] modelled learning as an active, deliberative, goal-driven process of constructing plans over planning operations. An important aspect of an e-learning system is to provide required e-learning materials to learners. Recent advances in Web and information technologies, with the increasing decentralization of organizational structures, have resulted in a massive amount of e-learning resources in many disciplines such as medicine, biology, etc. There are many related resources grouped into e-learning materials on various topics, which are accessible on the Web.

One common assumption for Web-based e-learning is that e-learning resources could be provided by different enterprises or institutes with only spontaneous relationships. The vast amount of e-learning resources today is distributed among many specialized databases and websites. The distributed resources may be potentially related to each other within an e-learning system, and it is necessary for users to retrieve and reuse them in a global scope. An e-learning system needs to compose relevant resources together in order to achieve on-demand and collaborative e-learning in the Web. Besides the semantic heterogeneity of e-learning resources, there are semantic gaps between user requirements and e-learning resources. Although users could query or search e-learning resources in the Web, the semantic gap between user requirements and e-learning resources impedes systems from satisfying the resource requests from users well. An adaptive hypermedia system [3], [4] is proposed to build a model of the background, experience, preferences, and knowledge of an individual user throughout the interaction with the user, and resources like educational hypermedia in the model can be adapted to the requirements of the user. In order to achieve such an adaptive e-learning system, it is necessary to manage e-learning resources semantically. Ontology [5] has recently been studied rigorously as the common standard for specifying Web resource semantics to fill the semantic gap and overcome the problem of semantic heterogeneity. Ontologies are increasingly seen as a key technology for enabling semantic-driven knowledge processing [6]. Ontologies
can be used to improve resource management for e-learning systems with semantic interoperability. To meet the on-demand resource management for Web-based e-learning, one should go beyond using domain ontologies statically and allow different focused aspects of the ontologies to evolve as specialized subontologies (SubOs) via a specialization process based on past experience. In this paper, a dynamic SubO mechanism for adaptive and efficient e-learning resource management is used.

Various works in the field of ontology related to e-learning are given below. Papasalouros et al. [7] propose an RDF encoding of the conceptual model following a specific RDF schema that is appropriate for educational adaptive hypermedia applications. Huang et al. [8] present a novel context-aware semantic e-learning approach to integrate content provision, the learning process, and learner personality in an integrated framework. Besides, there have been some ongoing research efforts reported in the field of ontology which are related to the idea of SubO-based resource management. Ontology modularity is mainly about segmenting or extracting modules from large ontologies to satisfy application requirements. Noy and Musen [9] define a portion of an ontology as an ontology view and propose an approach for specifying and maintaining ontology views. Seidenberg et al. [10] propose some algorithms for extracting relevant segments from large DL ontologies. Grau et al. [11] investigate the modularity of ontologies and provide a notion of modularity grounded on the semantics of OWL-DL. Doran [12] focuses on ontology modularization. The aim of his research is to reduce the size of the ontology being imported into a new application. The term ontology evolution in the area of ontology engineering is treated as a part of the ontology versioning mechanism that is analyzed in [13], which refers to access to data through different versions of an ontology. Noy and Klein [14] develop automatic techniques for finding similarities and differences between versions and analyze the effects of ontology-change operations. Stojanovic et al. [15] identify a possible six-phase ontology evolution process, which includes identification of the change, change representation, change propagation, analysis of the effects of the change, change validation, and change implementation. Their focus is on providing the user with capabilities to control and customize it. Flouris [16] uses the extensive work that has been performed in the field of belief change and applies it to the ontology evolution context.

Existing works for Semantic-Web- or ontology-based e-learning tend to use ontologies or semantic models statically to mediate e-learning resources or improve e-learning behaviors. In contrast to the above and the approaches reviewed earlier, this work on e-learning resource management relies on a SubO-based approach that reuses a large-scale ontology dynamically. The integration of e-learning resources by semantic mapping may be similar to existing research on ontology-based mapping or integration of e-learning resources; however, this approach is extended with a dynamic SubO evolution mechanism for resource reuse. In contrast to ontology modularity and ontology evolution, the main concern of SubO evolution is to evolve the resource repository of the e-learning system based on a genetic algorithm (GA).

The main contributions of this paper are the following:
• To provide a semantic mapping mechanism to integrate e-learning databases by using ontology semantics. E-learning databases can be integrated under a mediated ontology.
• To define the SubO by taking into account the locality of resource reference, representing context-specific portions from the whole ontology as SubOs.
• To propose a SubO-based resource reuse approach based on an evolutionary algorithm. It will improve the local knowledge structure of e-learning systems, which can reuse SubOs to achieve adaptive resource management.
II. SEMANTIC-BASED RESOURCE INTEGRATION

A. Semantic Mapping

Everything that contains learnable knowledge is an e-learning resource. The granularity to determine an e-learning resource is not unique: a paper can be a resource, a database can be a resource, and a website as a whole can also be a resource. There is a vast amount of e-learning resources in various disciplines, and many of them exist in terms of structured or semistructured data (e.g., Web pages, relational databases), most of which can be accessed in the Web. The information in those resources is potentially related from the aspect of e-learning, and it is necessary for users to reuse it in a global scope. There is an increased emphasis on the integration of existing heterogeneous e-learning resources for Web-based e-learning. As the foundation of the Semantic Web, ontologies are increasingly seen as a key technology for enabling semantic-driven or semantically mediated resource integration. In this approach, a large-scale domain ontology acts as a semantic mediator for integrating distributed and heterogeneous e-learning resources (see Fig. 1). An e-learning resource has a major topic (e.g., set theory, acupuncture, or genome) or schema. The schemata of various e-learning resources are mapped to a mediated ontology in a universal scope. The main focus of this work is on mapping database resources to the ontology, while mapping other unstructured e-learning resources like Web pages often requires transforming a natural description into an explicit semantic description of the resource, which falls into the fields of information extraction and natural language processing. We leave this issue to future work, and it will not be further discussed in the paper.

Fig. 1. Semantic-based resource integration by using domain ontology.

Relational schemata of e-learning databases that contain learning materials are mapped to a global ontology according to their underlying intrinsic relationships. Tables in different databases are mapped to ontology classes (a class is referred to as a concept throughout this paper), and table fields (columns) are mapped to ontology properties (a property is referred to as a relationship between concepts in the ontology) (see Fig. 2). When a table is mapped to a class, the records in the table can be treated as instances of the class. Implicit relationships between databases are interpreted as semantic relationships in the ontology.

Fig. 2. Semantic mapping between databases and ontology.
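As a concrete illustration of this table-to-class mapping, the following minimal Python sketch (illustrative only; the database, table, and class names are hypothetical, not from the paper) registers a relational table as an ontology class and its columns as properties, then treats a record as a class instance:

# A hypothetical mapping entry as it might be kept in a semantic registry.
mapping = {
    "database": "medicine_db",
    "table": "herb",                # table -> ontology class (concept)
    "class": "onto:MedicinalHerb",
    "columns": {                    # column -> ontology property
        "herb_name": "onto:hasName",
        "effect":    "onto:hasTherapeuticEffect",
    },
}

def record_to_instance(record, mapping):
    # Treat a table record as an instance of the mapped class.
    inst = {"type": mapping["class"]}
    for col, val in record.items():
        inst[mapping["columns"][col]] = val
    return inst

print(record_to_instance({"herb_name": "ginseng", "effect": "tonic"}, mapping))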
Semantic mapping information is stored in a semantic registry and can be reused by applications. Users or applications are able to inquire about mapping information at the semantic registry. A mapping definition can be exported as XML files and reused by applications. Organizing resources around the ontology provides a middle layer that makes resource integration more efficient, reducing the cost and maintenance of the application. Besides, as the ontology can grow over time as new e-learning resources become available, new relations are continually being made, with new knowledge assimilated into the ontology.

B. View-Based Semantic Integration

According to the conventional data integration literature [31], the view-based approach has a well-understood foundation and has been proved to be flexible for heterogeneous data resource integration. There are two kinds of view in conventional data integration systems: global-as-view (GAV) and local-as-view (LAV). The experiences from conventional data integration systems tell us that LAV provides greater extensibility than GAV: the addition of new resources is less likely to require a change to the mediated schema [17]. In domain-specific e-learning, new e-learning resources are regularly added, so the total number of databases increases gradually. Therefore, the LAV approach is employed in our approach; that is, each relational table in e-learning databases is defined as a view over the mediated ontology.

III. SUBONTOLOGY

A. Definition

The problem of resource integration was addressed in the previous section; we reuse e-learning resources further in the following sections. Ontologies play the role of mediator in our e-learning resource management approach. A large-scale mediated ontology contains relatively complete knowledge about the domain of e-learning. According to [33], the activities of many semantic-based systems always rely only on localized information and knowledge. The conjecture is that the activities of Web-based e-learning need only particular collections of resources. As e-learning resources are mapped to a large-scale ontology in this approach, an e-learning system needs to extract specific portions from the ontology and keep evolving them to satisfy the requirements of users. Different portions of the ontology with related e-learning resources are reused by users or applications. Taking into account the locality of resource reference, those context-specific portions from the whole ontology are represented as SubOs [10], which can be reused in e-learning systems.

Given an ontology O with n classes, a concept set C = {ci | ci ∈ O; i = 1, 2, ..., k}, k ≤ n, and a knowledge set K ⊆ O with C ⊆ K, a SubO B is a triple <C, K, O>. The concept set C denotes the learning scenario or context, and the elements in C are ontology classes. Given C, one can derive the knowledge set K by searching the ontology O (also called the source ontology for SubOs) for the classes in C. The knowledge set K includes a collection of ontology classes and properties. Given two classes c1, c2 ∈ O with a property r between c1 and c2, if c1, c2 ∈ K, then r ∈ K. This attribute ensures that K is a meaningful subset of the ontology. If a large-scale ontology is represented as a complex semantic graph, K may include one or more subgraphs (components) from the semantic graph.
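For illustration, the triple structure and the property-closure attribute of the definition can be written down directly. The following minimal Python sketch uses hypothetical names and is not the authors' implementation:

from dataclasses import dataclass

@dataclass
class SubO:
    concepts: set    # C: context classes, a subset of the ontology's classes
    knowledge: set   # K: classes plus the properties among them
    source: str      # O: identifier of the source ontology

def close_knowledge(classes, properties):
    # Ensure that whenever both endpoint classes are in K,
    # the property between them is also in K.
    k = set(classes)
    for (c1, c2, r) in properties:  # ontology edges as (class, class, property)
        if c1 in k and c2 in k:
            k.add(r)
    return k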
B. SubO Operations

In this section, we describe the manipulation of SubOs based on the semantic structure of the ontology, defining a set of basic SubO operations as follows.

Extract. Given an ontology O and a concept set C = {ci} ⊆ O, the extract operator's input can be represented as a triple <C, n, O>, where n is the traversal depth. The operator will return a SubO B.

Algorithm 1. Extract(C, n, O)
Input: an ontology O, a concept set C = {c1, c2, ..., ck}, and the traversal depth n.
Output: a SubO B.
Comments: extract a SubO from an ontology.

for i ← 1 to k
  perform breadth-first traversal from ci in O
  terminate traversal up to a depth of n
  add the result set to the set K
end loop
get the corresponding SubO B = <C, K, O>
return B

The extract operator obtains the contents of a SubO mainly by breadth-first traversal through properties in the source ontology. The traversal depth n is an alterable parameter for the SubO extraction.

Store. Given a SubO B = <C, K, O> and a SubO repository R, the store operator's input is a pair <B, R>.

Algorithm 2. Store(B, R)
Input: a SubO B = <C, K, O>, a SubO repository R.
Output: a Boolean value ret.
Comments: store a SubO into a repository.

n ← sizeOf(R) //get the number of the SubOs in R
for i ← 1 to n
  get the ith object in R
  assign the object to Bi
  if B equals Bi then
    return false
  end if
end loop
store C, K, and O in three fields respectively
return true

We store newly extracted SubOs as small ontology models in the local repository of an e-learning system, which is inspired by caching. Caches work well because of a principle known as locality of reference. There are several methods to restrict the size of SubOs. First, we can use the traversal depth in extraction to constrain the size of SubOs. Second, the repository space also limits the size of SubOs. If the size of a SubO is too large, e.g., exceeding a threshold size, it cannot be accepted by the repository. When a repository is full, we can replace the SubOs barely used with new ones according to some cache replacement policy.

Compare. Given an ontology O and two SubOs B1 = <C1, K1, O> and B2 = <C2, K2, O> to be compared, the compare operator takes a triple <B1, B2, O> as its input and returns a similarity degree sim.

Algorithm 3. Compare(B1, B2, O)
Input: an ontology O, two SubOs B1 = <C1, K1, O> and B2 = <C2, K2, O>.
Output: a similarity degree sim.
Comments: compare two SubOs.

compute the Levenshtein distance between C1 and C2
assign the result to LD(C1, C2)
compute the Levenshtein distance between K1 and K2
assign the result to LD(K1, K2)
RLD(B1, B2) ← LD(C1, C2) / |C1| or LD(K1, K2) / |K1|
sim ← 1 − RLD(B1, B2)
return sim

The discrepancy of SubOs is computed based on their Levenshtein distance. There are two strategies for computing the Levenshtein distance: one is based on the Levenshtein distance of the two concept sets C1 and C2, which is defined as the number of changes required to turn one concept set into another, and the other is based on the Levenshtein distance of the two knowledge sets K1 and K2. The latter is more complex but of higher accuracy. As the size of a knowledge set is not invariant, we use the relative Levenshtein distance (RLD) instead of the conventional Levenshtein distance directly.

Retrieve. Given a context description D and a SubO repository R, the retrieve operator takes a pair <D, R> as input and returns a SubO B. A SubO is retrieved from R by computing the RLD between D and the concept sets of the SubOs in R. Here, the retrieved SubO has a similarity higher than a threshold. If no matched SubO is found, a new SubO will be extracted from the source ontology O.
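A minimal Python sketch of this relative-Levenshtein comparison over concept sets follows (illustrative only; the set-based edit distance and helper names are assumptions, not the authors' code):

def levenshtein_sets(a, b):
    # Set-based edit distance: elements to delete from a plus
    # elements to insert from b to turn a into b.
    return len(a - b) + len(b - a)

def compare_subos(c1, c2):
    # Relative Levenshtein distance over concept sets,
    # mapped to a similarity degree in [0, 1].
    rld = levenshtein_sets(c1, c2) / max(len(c1), 1)
    return 1.0 - min(rld, 1.0)

print(compare_subos({"Herb", "Formula", "Disease"}, {"Herb", "Disease"}))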
Algorithm 4. Retrieve(D, R)
Input: a context description D, a SubO repository R.
Output: a SubO B.
Comments: retrieve a SubO from a repository.

n ← sizeOf(R) //get the number of the SubOs in R
for i ← 1 to n
  get the ith object in R
  assign the object to Bi = <Ci, Ki, O>
  sim ← 1 − RLD(D, Ci)
  if sim > Tsim then //Tsim is the threshold value for semantic similarity
    return Bi
  end if
end loop
return null

Merge. Given an ontology O and two SubOs B1 = <C1, K1, O> and B2 = <C2, K2, O>, the merge operator takes a triple <B1, B2, O> as its input and returns a SubO B.

Algorithm 5. Merge(B1, B2, O)
Input: an ontology O, and two SubOs B1 = <C1, K1, O> and B2 = <C2, K2, O>.
Output: a SubO B.
Comments: merge two SubOs into a new one.

merge C1 and C2 into a new set C
merge K1 and K2 into a new set K
check the completeness of K
extract additional properties from O
add the additional contents to K
get the corresponding SubO B = <C, K, O>
return B

We simply combine the knowledge sets K1 and K2 of the two SubOs together to get a new one, K. Additional extraction may be required in order to preserve a valid knowledge set in the new SubO. For example, suppose c1 is a class in K1, c2 is a class in K2, and there is a property r between the two classes in O, while neither K1 nor K2 contains r. Then, we need to retrieve r from O in order to maintain the completeness of K.

IV. SUBO-BASED RESOURCE REUSE

A. Resource Reuse

Generally, the goal of learning is to gain knowledge, skills, and experiences in order to solve problems better and faster. The users of e-learning systems need to reuse various e-learning resources to achieve a goal or solve a problem. The system receives and handles requests for e-learning resources from applications or users. How to satisfy various requirements is one of the most important issues in e-learning resource reuse. As e-learning resources are mapped to a mediated ontology, a user request for e-learning resources can be transformed into a semantic query on the ontology in this approach. As soon as a portion of the ontology is retrieved, so are the e-learning resources mapped to that portion of the ontology. Given an ontology O with a collection of e-learning resources R, a resource request q is a tuple <c1, c2, ..., ck>, where ci ∈ O.

An e-learning system needs to coordinate and reuse e-learning resources to satisfy resource requests. A SubO groups a collection of e-learning resources together with semantics in this approach. We simplify the process of resource reuse into matching the classes in resource requests with the ones in SubOs. The matching degree between a resource request and a SubO is computed using the following definitions as the building blocks. Assume that A and B are two ontology classes:
• Exact. A is equal to B, or A and B are just the same class, denoted as A ≡ B.
• Plugin. A is a superclass of B, or A subsumes B, denoted as B ⊑ A.
• Subsume. A is a subclass of B, or B subsumes A, denoted as A ⊑ B.

Given a resource request q = <c1, c2, ..., ck> and a SubO B = <C, K, O>, to what extent B is able to match q is represented by the knowledge matching coefficient (KMC). The KMC of B to q is given as

KMC(B, q) = α·n1 + β·n2 + γ·n3    (1)

where α, β, and γ are weighting coefficients; n1 refers to the number of exact matches between the classes in K and the classes in q, n2 refers to the number of plugin matches, and n3 refers to the number of subsume matches. The importance of the three weighting coefficients α, β, and γ is not equal: α is the most important in evaluating KMC, followed by β and γ. Therefore, we assign values to the weighting coefficients according to the condition α > β > γ. One aspect to take into account is the normalization of the weighting coefficients. In order to evaluate KMC with different sets of the coefficients, we normalize the coefficients based on α + β + γ = 1.
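A minimal Python sketch of the KMC computation in (1) follows (illustrative only; the class-hierarchy helper and the example weights are assumptions under the stated constraints, not the authors' code):

def kmc(request, subo_knowledge, is_subclass, w=(0.5, 0.3, 0.2)):
    # Knowledge matching coefficient: weighted count of exact,
    # plugin, and subsume matches; weights chosen so that
    # alpha > beta > gamma and alpha + beta + gamma = 1.
    alpha, beta, gamma = w
    n1 = n2 = n3 = 0
    for q in request:
        for c in subo_knowledge:
            if q == c:
                n1 += 1                  # exact match
            elif is_subclass(q, c):
                n2 += 1                  # plugin: c subsumes q
            elif is_subclass(c, q):
                n3 += 1                  # subsume: q subsumes c
    return alpha * n1 + beta * n2 + gamma * n3

# Toy hierarchy: "Herb" is a subclass of "Plant".
subclass_pairs = {("Herb", "Plant")}
is_sub = lambda a, b: (a, b) in subclass_pairs
print(kmc(["Herb", "Plant"], {"Plant", "Disease"}, is_sub))  # -> 0.8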
Besides pure SubO extraction and caching, the ability of the SubOs to self-evolve is required in order to improve on-demand resource reuse.

B. Problem Representation

The genetic algorithm (GA) has been applied to many problems of optimization and classification. GA is used here to solve the problem of SubO evolution to support dynamic resource management and reuse. For mapping our problem to the GA formulation, two steps need to be performed, namely, problem encoding and determining the evaluation function based on the ontology semantics. In GA, a chromosome is composed of a list of genes, and a set of chromosomes groups together as a population. In the problem of SubO evolution, the question is how to represent SubOs as chromosomes. Being a subset of an ontology is a common feature of SubOs from the same source ontology. It means that we are evolving a collection of subsets of the same ontology. We take the source ontology as the problem space and map it to an encoding space that consists of the characters 0 and 1. Given that the number of classes of a source ontology is n, the length of a chromosome is also n. Each class appearing in the SubO will have the corresponding allele (gene) in the corresponding chromosome set to 1, or 0 otherwise. We now define two additional SubO operations based on the problem encoding for SubOs.

Encode. Given a SubO B = <C, K, O> and an ontology O, the encode operator's input is a pair <B, O>. The operator will return a chromosome S.

Algorithm 6. Encode(B, O)
Input: a SubO B = <C, K, O>, an ontology O.
Output: a chromosome S.
Comments: encode a SubO into a chromosome.

convert O to a class list L
n ← sizeOf(L) //get the number of the concepts in O
create a chromosome S with n alleles, each allele set to 0
for i ← 1 to n //set the alleles for the chromosome
  get the ith class in L
  assign the class to ci
  if ci appears in K
    set the ith allele in S to 1
  end if
end loop
return S

The encode operation proceeds as follows:
• Convert O into a list of classes.
• Create a chromosome S with each allele set to 0.
• Compare the classes in B with the classes in O and change the allele genes of S.
• Transform B into a chromosome S.

Decode. Given a chromosome S and an ontology O, the decode operator's input is a pair <S, O>. The operator will return a SubO B.

Algorithm 7. Decode(S, O)
Input: a chromosome S, an ontology O.
Output: a SubO B.
Comments: decode a chromosome into a SubO.

convert O to a class list L
n ← sizeOf(S) //get the number of the alleles in S
for i ← 1 to n //get the knowledge set for the SubO
  get the ith allele in S
  assign the allele to ai
  if ai = 1 then
    add the ith class in L to a set K //retrieve class from the source
  end if
end loop
retrieve properties for the classes in K
cluster K into subgraphs
for each subgraph //get the concept set for the SubO
  pick out the root node
  add the class of the root node to a set C
end loop
compose a triple B = <C, K, O>
return B
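A minimal Python sketch of this bit-string encoding follows (hypothetical names; it mirrors Algorithms 6 and 7 but omits the property retrieval and subgraph clustering steps):

def encode(subo_knowledge, ontology_classes):
    # SubO -> chromosome: allele i is 1 iff the ith ontology
    # class appears in the SubO's knowledge set.
    return [1 if c in subo_knowledge else 0 for c in ontology_classes]

def decode(chromosome, ontology_classes):
    # Chromosome -> knowledge set (property retrieval and
    # subgraph clustering are omitted in this sketch).
    return {c for c, a in zip(ontology_classes, chromosome) if a == 1}

classes = ["Herb", "Formula", "Disease", "Symptom"]
s = encode({"Herb", "Disease"}, classes)
print(s)                   # -> [1, 0, 1, 0]
print(decode(s, classes))  # -> {'Herb', 'Disease'}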
The decode operation proceeds as follows:
• Retrieve a knowledge set K from O by checking the alleles of the chromosome S.
• The semantic graph of K consists of several subgraphs. Pick out the root concept of each subgraph and compose a concept set C.
• Finally, convert S into a SubO B = <C, K, O>.

The decode operator is frequently performed, as chromosomes for SubOs evolve generation by generation, and new chromosomes have to be transformed back into SubOs. With the problem of encoding solved, the remaining job is to determine the fitness function. The fitness function is a measure of performance for SubOs. A SubO consists of several separate components; that is, the knowledge of a SubO clusters into different components. If the knowledge of a SubO clusters into fewer components, the SubO is logically more meaningful and complete with respect to resource requests. Therefore, we can evaluate a SubO by computing the clustering degree of its knowledge. Given a SubO B = <C, K, O>, the extent to which the knowledge in B forms clusters can be quantified as a knowledge clustering coefficient (KCC), given as

KCC(B) = (n1² + n2² + ... + nk²) / n²    (2)

where it is assumed that the corresponding semantic graph is divided into k components, ni refers to the number of classes in the ith component, and the total number of classes in the ontology is n. The fitness function of the evolution approach mainly evaluates the fitness of a chromosome by the KCC of its SubO. A chromosome with a higher KCC gets a higher chance to survive in the evolution. Let P be a population with m chromosomes. Then, the fitness value fi of the ith chromosome in P is calculated as follows:

fi = KCC(Bi) / (KCC(B1) + KCC(B2) + ... + KCC(Bm)),  i = 1, 2, ..., m    (3)

which is only related to the knowledge structure of the SubO and does not depend on specific problem domains.

C. Evolution

The following steps evolve a set of SubOs based on a set of resource requests and produce a new set of SubOs for resource reuse (a compact sketch of this loop is given at the end of this section):
1. Retrieve different SubOs from a repository or extract different SubOs from the source ontology according to different resource requests.
2. Encode the SubOs as a population of chromosomes.
3. Evolve the population based on GA.
4. Compare the chromosomes in the population and merge ones with high similarity.
5. Evaluate the chromosomes in the population.
6. Terminate if the overall fitness is higher than a threshold value. Otherwise, go to step 3.
7. Decode the chromosomes in the population to SubOs and return a new set of SubOs.
8. Retrieve the e-learning resources related to the SubOs.

The local knowledge structure of e-learning systems becomes more adaptive to resource requests via SubO evolution.
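A minimal Python sketch of the core of this loop (steps 3-6) follows; the selection scheme, mutation rate, and generation cap are illustrative assumptions, not the authors' parameters:

import random

def evolve(population, fitness, threshold, mutation_rate=0.01, max_gen=100):
    # GA loop over SubO chromosomes (bit lists): evaluate, select
    # proportionally to fitness (cf. equation (3)), mutate, and stop
    # once the average fitness exceeds the threshold (step 6).
    for _ in range(max_gen):
        scores = [fitness(c) for c in population]
        if sum(scores) / len(scores) >= threshold:
            break  # overall fitness is high enough
        # fitness-proportional selection; small epsilon avoids zero weights
        population = random.choices(population,
                                    weights=[s + 1e-9 for s in scores],
                                    k=len(population))
        # bit-flip mutation keeps the population exploring new SubOs
        population = [[1 - g if random.random() < mutation_rate else g
                       for g in chrom] for chrom in population]
    return population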
V. CONCLUSION

In this paper, to achieve on-demand semantic-based resource management for Web-based e-learning, we go beyond using domain ontologies statically. A semantic mapping mechanism is proposed to integrate e-learning databases by using ontology semantics. We define context-specific portions from the whole ontology as SubOs and propose a SubO-based resource reuse approach, describing the GA-based evolution steps for dynamic e-learning resource reuse in detail. However, e-learning is also a widely open research area, and there is still much room for improvement on the method. Future research issues include 1) improving the proposed evolution approach by making use of and comparing different evolutionary algorithms, 2) applying the proposed approach to support more applications, and 3) extending to the situation with multiple e-learning systems or services.

VI. REFERENCES

[1] M. Nichols and D. Schedule, "A Theory for eLearning," Educational Technology and Soc., vol. 6, no. 2, pp. 1-10, 2003.
[2] J. Murdock, G. Shippey, and A. Ram, "Case Based Planning to Learn," Proc. Second Int'l Conf. Case-Based Reasoning Research and Development (ICCBR '97), pp. 467-476, 1997.
[3] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific Am., vol. 284, no. 5, pp. 34-43, 2001.
[4] P. Brusilovsky, "Adaptive Educational Hypermedia," Proc. 10th Int'l PEG Conf., pp. 8-12, 2001.
[5] T. Gruber, "A Translation Approach to Portable Ontology Specifications," Knowledge Acquisition, vol. 5, no. 2, pp. 199-220, 1993.
[6] A. Maedche, B. Motik, L. Stojanovic, R. Studer, and R. Volz, "Ontologies for Enterprise Knowledge Management," IEEE Intelligent Systems, vol. 18, no. 2, pp. 26-33, Mar./Apr. 2003.
[7] A. Papasalouros, S. Retalis, and N. Papaspyrou, "Semantic Description of Educational Adaptive Hypermedia Based on a Conceptual Model," Educational Technology and Soc., vol. 7, no. 4, pp. 129-142, 2004.
[8] W. Huang, D. Webster, D. Wood, and T. Ishaya, "An Intelligent Semantic e-Learning Framework Using Context-Aware Semantic Web Technologies," British J. Educational Technology, vol. 37, no. 3, pp. 351-373, 2006.
[9] N.F. Noy and M.A. Musen, "Specifying Ontology Views by Traversal," Proc. Third Int'l Semantic Web Conf. (ISWC '04), pp. 713-725, 2004.
[10] J. Seidenberg and A. Rector, "Web Ontology Segmentation: Analysis, Classification and Use," Proc. 15th Int'l World Wide Web Conf. (WWW '06), pp. 13-22, 2006.
[11] B.C. Grau et al., "Modularity and Web Ontologies," Proc. 10th Int'l Conf. Principles of Knowledge Representation and Reasoning (KR '06), pp. 198-209, 2006.
[12] P. Doran, "Ontology Reuse via Ontology Modularisation," Proc. KnowledgeWeb PhD Symp. (KWEPSY '06), http://www.l3s.de/kweb/kwepsy2006/FinalSubmissions/kwepsy2006_doran.pdf, 2006.
[13] M. Klein and D. Fensel, "Ontology Versioning for the Semantic Web," Proc. First Int'l Semantic Web Working Symp. (SWWS '01), pp. 483-493, 2001.
[14] N.F. Noy and M. Klein, "Ontology Evolution: Not the Same as Schema Evolution," SMI Technical Report SMI-2002-0926, http://smi.stanford.edu/smi-web/reports/SMI-2002-0926.pdf, 2002.
[15] L. Stojanovic, A. Maedche, B. Motik, and N. Stojanovic, "User-Driven Ontology Evolution Management," Proc. 13th Int'l Conf. Knowledge Eng. and Knowledge Management (EKAW '02), pp. 285-300, 2002.
[16] G. Flouris, "On Belief Change and Ontology Evolution: Thesis," AI Comm., vol. 19, no. 4, pp. 395-397, 2006.
[17] A.Y. Halevy, "Answering Queries Using Views: A Survey," VLDB J., vol. 10, pp. 270-294, 2001.
EFFECTIVE AND EFFICIENT QUERY PROCESSING FOR IDENTIFYING VIDEO SUBSEQUENCE

Eswari.R
Easwari Engineering College, Anna University
eswariram_88@yahoo.co.in
Abstract—Many investigations have been made on content-based video retrieval. However, despite its importance, video subsequence identification, which is to find similar content to a short query clip in a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, with an extension to identify occurrences of potentially different ordering or length due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune some irrelevant subsequences. During the filtering stage, Maximum Size Matching is deployed for each subgraph constructed by the query and a candidate subsequence to obtain a smaller set of candidates. During the refinement stage, Sub-Maximum Similarity Matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information.

Keywords—Multimedia databases, subsequence identification, query processing, similarity measures

I. INTRODUCTION

ANSWERING queries based on "alike" but maybe not exactly "same" content is known as similarity search. It has been widely used to simulate the process of object proximity ranking performed by human specialists, for example in image retrieval [1] and time series matching [2]. Nowadays, the rapid advances in multimedia and network technologies popularize many applications of video databases, and sophisticated techniques for representing, matching, and indexing videos are in high demand. A video sequence is an ordered set of a large number of frames, and from the database research perspective, each frame is usually represented by a high-dimensional vector extracted from some low-level content features, such as color distribution, texture pattern, or shape structure within the original media domain [3]. Matching of videos is often translated into searches among these feature vectors [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. In practice, it is often undesirable to manually check whether a video is part of a long stream by browsing its entire length; thus, a reliable solution for automatically finding similar content is imperative. Video subsequence identification involves locating the position of the most similar part with respect to a user-specified query clip Q in a long prestored video sequence S. Ideally, it can identify relevant video even if there exists some transformation distortion, partial content reordering, insertion, deletion, or replacement. Its typical applications include the following:

• Recognition for copyright enforcement. Video content owners would like to be aware of any use of their material, in any media or
representation. For example, the producers of certain movie scenes may want to identify whether or where their original films have been reused by others, even with some kind of remixing for multimedia authoring.
• TV commercial detection. Some companies would like to track their TV commercials when they are aired on different channels during a certain time period for statistical purposes. They can verify whether their commercials have been actually broadcast as contracted, and it is also valuable to monitor how their competitors conduct advertisements to apprehend their marketing strategies.

Fig. 1. Visually similar videos, but not copies. (a) Inserting different sales and local contact information. (b) Modifying some content, and rearranging partial order.

The primary difference between video retrieval and video subsequence identification is that, while the retrieval task conventionally returns similar clips from a large collection of videos which have been either chopped up into similar lengths or cut at content boundaries, the subsequence identification task aims at finding whether there exists any subsequence of a long database video that shares similar content with a query clip. In other words, while for the former the clips for search have already been segmented and are always ready for similarity ranking [11], [12], [13], [14], [15], the latter is a typical subsequence matching problem. Because the boundary and even the length of the target subsequence are not available initially, choosing which fragments to evaluate for similarity is not known in advance. Therefore, most existing methods for the retrieval task on video clip collections [11], [12], [13], [14], [15] are not applicable to this more complicated problem.

This paper addresses a different and considerably harder problem of searching for visually similar videos. Different from copy detection, which normally considers transformation distortions only, a visually similar video can be further relaxed to be changed with content editing at frame or shot level (swap, insertion, deletion, or substitution), and thus could have different ordering or length from the original source. Fig. 1 shows two groups of similar TV commercials, for Tourism New South Wales and a company of Australia, respectively. Each of them is displayed with five sampled frames extracted at the same time stamps. The corresponding videos in both groups are highly relevant, but not copies. Another example is the extended cinema version of a Toyota commercial (60 seconds) and its shorter TV version (30 seconds), which obviously are not copies of each other by definition. On the other hand, a video copy may no longer be regarded as visually similar if transformed substantially. Video subsequence matching techniques using a fixed-length sliding window at every possible offset of the database sequence for exhaustive comparison [4], [5], [8] are not efficient, especially in the case of seeking over a long-running video. Although a temporal skip scheme using a similarity upper bound [10], [15] can accelerate the search process by reducing the number of candidate subsequences, under the scenario that a target subsequence could actually have different ordering or length from the query, these methods could be ineffective.

Compared with existing methods, our approach has the following distinct features:
• In contrast to the fast sequential search scheme applying temporal pruning to accelerate the search process [10], [15], which assumes the query and target subsequence are strictly of the same ordering and length, our approach adopts spatial pruning to avoid seeking over the
entire database sequence of feature vectors for exhaustive comparison.
• Our approach does not involve the presegmentation of video required by the proposals based on shot boundary detection [9], [19], [21], [22], [23]. Shot resolution, which could be a few seconds in duration, is usually too coarse to accurately locate a subsequence boundary. Meanwhile, our approach, based on frame subsampling, is capable of identifying video content containing ambiguous shot boundaries (such as dynamic commercials, TV program lead-in and lead-out subsequences).

II. RELATED WORK

A. Video Copy Detection

Extensive research efforts have been made on extracting and matching content-based signatures to detect copies of videos. Mohan [4] introduced the use of ordinal measures for video sequence matching. Naphade et al. [8] developed an efficient scheme to match video clips using color histogram intersection. Pua et al. [9] proposed a method based on color moment features to search for video copies in a long segmented sequence. In their work, the query sequence slides frame by frame over the database video with a fixed-length window. In addition to distortions introduced by different encoding parameters, Kim and Vasudev [5] proposed to use spatiotemporal ordinal signatures of frames to further address display format conversions, such as different aspect ratios (letter-box, pillar-box, or other styles). Since the process of video transformation could give rise to several distortions, techniques circumventing these variations by global signatures have been considered. They tend to depict a video globally rather than focusing on its sequential details. This method is efficient but has limitations with blurry shot boundaries or a very limited number of shots. Moreover, in reality, a query video clip can be just a shot or even a subshot. However, this method is only applicable to queries which consist of multiple shots.

B. Video Similarity Search

The methods mentioned above have only been designed to detect videos of the same temporal order and length. To further search videos with changes from the query due to content editing, a number of algorithms have been proposed to evaluate video similarity. To deal with inserting in or cutting out partial content, Hua et al. [6] used dynamic programming based on ordinal measures of frames resampled at a uniform sampling rate to find the best match for video sequences of different lengths. This method has only been tested on a small video database. Through time warping distance computation, they achieved higher search accuracy than the methods proposed in [5] and [6]. However, with the growing popularity of video editing tools, videos can be temporally manipulated with ease. This work will extend the investigations of copy detection not only in the aspect of potentially different length but also in allowing flexible temporal order (tolerance to content reordering). Cheung and Zakhor [11], [12] developed Video Signature to summarize each video with a small set of sampled frames chosen by a randomized algorithm. Shen et al. [13] proposed Video Triplet to represent each clip with a number of frame clusters and estimate the cluster similarity by the volume of intersection between two hyperspheres multiplied by the smaller density. It also derives the overall video similarity from the total number of similar frames shared by two videos. For compactness, these summarizations inevitably lose temporal information. Videos are treated as a "bag" of frames; thus, they lack the ability to differentiate two sequences with temporal reordering, such as "ABCD" and "ACBD." Various time series similarity measures can be considered, such as Mean distance, DTW, and LCSS, all of which can be extended to measure the similarity of multidimensional trajectories and applied to video matching. However, Mean distance adheres to temporal order in a rigid manner, does not allow frame alignment or gaps, and is very sensitive to noise. DTW can be utilized to address frame alignment by repeating some frames as many times as needed without extra cost [7], but no frame can be skipped even if it is just noise. In addition, it is limited in the context of partial content reordering. LCSS is proposed to address temporal order and handle possible noise by allowing some elements to be skipped without rearranging the
sequence order [21], but it will ignore the effect of potentially different gap numbers. As known from research in psychology, the visual judgment of human perception depends on a number of factors. The proposed model incorporating different factors for measuring video similarity is inspired by the weighted schemes [19], [22], [23] originally introduced at the shot level.

Definition 1. Video subsequence identification. Let Q = {q1, q2, ..., q|Q|} be a short query clip and S = {s1, s2, ..., s|S|} be the long database video sequence, where qi = {qi1, ..., qid} ∈ Q and sj = {sj1, ..., sjd} ∈ S are d-dimensional feature vectors representing video frames, and |Q| and |S| denote the total frame numbers of Q and S, respectively (normally |Q| « |S|). Video subsequence identification is to find Ŝ = {sm, sm+1, ..., sn} in S, where 1 ≤ m ≤ n ≤ |S|, which is the most similar part to Q under a defined score function.

For easy reference, a list of notations used in this paper is shown in Table 1.

TABLE 1. A List of Notations.
III. PROPOSED WORK

Motivated by the efficient query capability brought by fruitful research in high-dimensional indexing, we propose a graph transformation and matching approach to process variable-length comparison of the database video with the query. It facilitates safely pruning a large portion of irrelevant parts and rapidly locating some promising candidates for further similarity evaluations. By constructing a bipartite graph representing the similar-frame mapping relationship between Q and S with an efficient batch kNN search algorithm [18], all the possibly similar video subsequences along the 1D temporal line can be extracted. Then, to effectively but still efficiently identify the most similar subsequence, the proposed query processing is conducted in a coarse-to-fine style. Imposing a one-to-one mapping constraint similar in spirit to that of [19], Maximum Size Matching (MSM) [20] is employed to rapidly filter some actually nonsimilar subsequences at lower computational cost. The smaller number of candidates which contain eligible numbers of similar frames are then further evaluated, at relatively higher computational cost, for accurate identification. Since measuring the video similarities for all the possible 1:1 mappings in a subgraph is computationally intractable, a heuristic method, Sub-Maximum Similarity Matching (SMSM), is devised to quickly identify the subsequence corresponding to the most suitable 1:1 mapping.

A. Retrieving Similar Frames

Similar frame retrieval in S for each element qi ∈ Q is processed as a range or kNN search. Given qi and S, Algorithm 1 gives the framework for retrieving similar frames. The output set F(qi) consists of frames of S. However, as explained later, we are more inclined to have each qi retrieve the same number of similar frames, while the differences of the maximum distances dmax(qi, sj), where sj ∈ F(qi), and dmax(qi', sj'), where sj' ∈ F(qi'), can vary substantially. Therefore, kNN search is preferred.

Algorithm 1. Retrieve similar frames
Input: qi, S
Output: F(qi) - the similar frame set of qi
Description:
1: if kNN search is defined then
2:   F(qi) ← {sj | sj ∈ kNN(qi)};
3:   return F(qi);
4: else
5:   F(qi) ← {sj | sj ∈ range(qi)};
6:   return F(qi);
7: end if
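As a complement to the pseudocode, the following is a brute-force stand-in for Algorithm 1 in Python. The real system performs this retrieval through a high-dimensional index with a batch kNN algorithm [18], so the linear scan here is only for illustration.

    import heapq, math

    def retrieve_similar_frames(qi, S, k=None, radius=None):
        # returns indices into S; kNN mode when k is given, range mode otherwise
        if k is not None:
            return [j for _, j in heapq.nsmallest(
                k, ((math.dist(qi, sj), j) for j, sj in enumerate(S)))]
        return [j for j, sj in enumerate(S) if math.dist(qi, sj) <= radius]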


B. Bipartite Graph Transformation

Each frame can be placed as a node along the temporal line of a video. Given a query clip Q and database video S, a short line and a long line can be abstracted, respectively. Hereafter, each frame is no longer modeled as a high-dimensional point as in the preliminary step, but simply as a node. Q and S, which are two finite sets of nodes ordered along the temporal lines, are treated as the two sides of a bipartite graph. Formally, let G = {V, E} be a bipartite graph representing the similar-frame mappings between Q and S.

Fig. 2. Construction of bipartite graph.

Observing the similar-frame mappings along the 1D temporal line of the S side, only a small portion is densely matched, while most parts are not matched at all or are merely sparsely matched. Intuitively, the unmatched and sparsely matched parts can be directly discarded, as they clearly suggest there are no possible subsequences similar to Q, because a necessary condition for a subsequence to be similar to Q is that they share a sufficient number of similar frames [11]. In view of this, we avoid comparing all the possible subsequences in S, which is infeasible, and instead safely and rapidly filter out a large portion of irrelevant parts prior to similarity evaluations. To do so, the densely matched segments of S containing all the possibly similar video subsequences have to be identified. Note that it is unnecessary to maintain the entire graph. Instead, we just process the small-sized edge set E and the subsequences for the following steps.
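Under the same illustrative setting, the graph G = {V, E} can be kept as a small edge list plus per-frame match counts along S, which is all the later dense-segment step needs. This sketch assumes the retrieve_similar_frames helper shown earlier; it is not the paper's exact data structure.

    def build_bipartite_graph(Q, S, k):
        # E holds (query index, database index) pairs; count[j] is the number of
        # query frames mapped to database frame j along the 1D temporal line of S
        E = []
        count = [0] * len(S)
        for i, qi in enumerate(Q):
            for j in retrieve_similar_frames(qi, S, k=k):
                E.append((i, j))
                count[j] += 1
        return E, count

Dense segments can then be located by scanning count for stretches whose fraction of nonzero entries (the segment density) is sufficiently high.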
C. Dense Segment Extraction

Along the S side of G, with integer counts in {0, 1, ..., |Q|}, we consider where the chances of nonzero counts are relatively large. Considering potential frame alignment and gaps, segments without strictly consecutive nonzero counts, e.g., the segment {s1, ..., s6} with counts "241101," should also be accepted. To depict the frequency of similar-frame mappings, we introduce the density of a segment.

D. Filtering by Maximum Size Matching

After locating the dense segments, we have k separate subgraphs of the form Gk = {Vk, Ek}, where Vk is the vertex set and Ek is the edge set representing similar-frame mappings. However, the high density of a segment alone cannot sufficiently indicate high similarity to the query, because it neglects the actual number of similar frames, temporal order, and frame alignment.

Definition 2 (MSM). A matching M in G = {V, E} is a subset of E with pairwise nonadjacent edges. The size of a matching M is the number of edges in M, written as |M|. The MSM of G is a matching MMSM with the largest size |MMSM|.

Relative to a matching Mk in Gk = {Vk, Ek}, we say the vertices belonging to the edges of Mk are saturated by the matching, and the others are unsaturated. MSM is characterized by the absence of augmenting paths [20]: a matching Mk in Gk is its MSM if and only if Gk has no Mk-augmenting path. Starting with a matching of size 0 in each subgraph, the Augmenting Path Algorithm progressively selects an augmenting path to enlarge the current matching size by 1 at a time. We can search for an Mk-augmenting path from each Mk-unsaturated vertex. The detailed MSM algorithm can be found in [20].
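The augmenting-path computation itself is compact. The following is a standard maximum bipartite matching routine (Hungarian-style augmenting paths) over one subgraph given as an adjacency list; it illustrates the filtering primitive described in [20] rather than reproducing the authors' implementation.

    def maximum_size_matching(adj, n_query):
        # adj[u]: database-frame nodes similar to query node u within the subgraph
        match_of = {}  # database node -> query node currently matched to it

        def augment(u, visited):
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    # v is free, or its current partner can be re-matched elsewhere
                    if v not in match_of or augment(match_of[v], visited):
                        match_of[v] = u
                        return True
            return False

        size = sum(augment(u, set()) for u in range(n_query))
        return size, match_of

Subgraphs whose matching size falls below the eligible number of similar frames can then be discarded before the costlier SMSM refinement.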


E. Refinement by Sub-Maximum Similarity Matching

The above filtering step can be viewed as a rough similarity evaluation that disregards temporal information. Observing that a segment may have multiple 1:1 mappings, and that the most similar subsequence in S may only be a portion of the segment, we next further refine it to find the most suitable 1:1 mapping for accurate identification (or ranking), by considering visual content, temporal order, and frame alignment simultaneously.
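The score function optimized by SMSM is defined earlier in the paper and is not reproduced in this excerpt. Purely to illustrate how the three factors might be combined over one candidate 1:1 mapping, here is a hypothetical weighted aggregation; the weights and the component measures are illustrative assumptions, not the authors' definitions.

    def mapping_score(mapping, w_content=0.5, w_order=0.3, w_align=0.2):
        # mapping: nonempty list of (query index, database index, similarity in [0, 1])
        pairs = sorted(mapping)  # order by query index
        content = sum(sim for _, _, sim in pairs) / len(pairs)
        # fraction of consecutive pairs preserving temporal order on the S side
        order = sum(b[1] >= a[1] for a, b in zip(pairs, pairs[1:])) / max(1, len(pairs) - 1)
        # alignment: penalize large gaps between matched database frames
        gaps = [abs(b[1] - a[1] - 1) for a, b in zip(pairs, pairs[1:])]
        align = 1.0 / (1.0 + sum(gaps) / max(1, len(gaps)))
        return w_content * content + w_order * order + w_align * align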
IV. EXPERIMENTS

A. Effectiveness

To measure the effectiveness of our approach, we use hit ratio, defined as the ratio of the number of queries for which our method correctly identifies the position of the most similar subsequence (the ground truth) to the total number of queries. Note that since for each query there is only one target subsequence (where the original fragment was extracted) in the database, hit ratio corresponds to P(1), i.e., the precision value at the first rank. The original video has also been manually inspected so that the ground truth of each query clip can be validated.

B. Efficiency

To show the efficiency of our approach, we use response time, which indicates the average running time of a query. Without SMSM, all the possible 1:1 mappings would be evaluated. Since it is computationally intractable to enumerate all 1:1 mappings to find the most suitable one, and there is no prior practical method dealing with this problem for performance comparison, we mainly study the efficiency of our approach by investigating the effect of MSM filtering. Without MSM, all the segments extracted in dense segment extraction will be processed, while with MSM, only a small number of segments are expected. Note that the performance comparison is not affected by the underlying high-dimensional indexing method.

V. CONCLUSIONS

This paper has presented an effective and efficient query processing strategy for temporal localization of similar content from a long unsegmented video stream, considering that the target subsequence may be an approximate occurrence of potentially different ordering or length with respect to the query clip. In the preliminary phase, the similar frames of the query clip are retrieved by a batch query algorithm. Then, a bipartite graph is constructed to exploit the opportunity of spatial pruning; thus, the high-dimensional query and database video sequence can be transformed into the two sides of a bipartite graph. Only the dense segments are roughly obtained as possibly similar subsequences. In the filter-and-refine phase, some nonsimilar segments are first filtered, and several relevant segments are then processed to quickly identify the most suitable 1:1 mapping by optimizing the factors of visual content, temporal order, and frame alignment together. In practice, visually similar videos may exhibit different orderings due to content editing, which yields some intrinsic cross mappings. Our video similarity model, which elegantly achieves a balance between the approaches of neglecting temporal order and strictly adhering to temporal order, is particularly suitable for dealing with this case and thus can support accurate identification. Although only the color feature is used in our experiments, the proposed approach inherently supports other features. For future work, we plan to further investigate the effect of representing videos by other features, such as ordinal signatures. Moreover, the weight of each factor for measuring video similarity might be adjusted by user feedback to embody the degree of similarity more completely and systematically.

ACKNOWLEDGMENTS

Sound and Vision video is copyrighted. The Sound and Vision video used in this work is provided solely for research purposes through the TREC Video Information Retrieval Evaluation Project Collection. The authors would like to thank the anonymous reviewers for their comments, which led to improvements of this paper. This work is supported in part by the Australian Research Council under Grant DP0663272.

REFERENCES


[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[2] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast Subsequence Matching in Time-Series Databases," Proc. ACM SIGMOD '94, pp. 419-429, 1994.
[3] H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, "Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis," J. Visual Comm. and Image Representation, vol. 14, no. 2, pp. 150-183, 2003.
[4] R. Mohan, "Video Sequence Matching," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), pp. 3697-3700, 1998.
[5] C. Kim and B. Vasudev, "Spatiotemporal Sequence Matching for Efficient Video Copy Detection," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 127-132, 2005.
[6] X.-S. Hua, X. Chen, and H. Zhang, "Robust Video Signature Based on Ordinal Measure," Proc. IEEE Int'l Conf. Image Processing (ICIP '04), pp. 685-688, 2004.
[7] C.-Y. Chiu, C.-H. Li, H.-A. Wang, C.-S. Chen, and L.-F. Chien, "A Time Warping Based Approach for Video Copy Detection," Proc. 18th Int'l Conf. Pattern Recognition (ICPR '06), vol. 3, pp. 228-231, 2006.
[8] M.R. Naphade, M.M. Yeung, and B.-L. Yeo, "A Novel Scheme for Fast and Efficient Video Sequence Matching Using Compact Signatures," Proc. Storage and Retrieval for Image and Video Databases (SPIE '00), pp. 564-572, 2000.
[9] K.M. Pua, J.M. Gauch, S. Gauch, and J.Z. Miadowicz, "Real Time Repeated Video Sequence Identification," Computer Vision and Image Understanding, vol. 93, no. 3, pp. 310-327, 2004.
[10] K. Kashino, T. Kurozumi, and H. Murase, "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning," IEEE Trans. Multimedia, vol. 5, no. 3, pp. 348-357, 2003.
[11] S.-C.S. Cheung and A. Zakhor, "Efficient Video Similarity Measurement with Video Signature," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 59-74, 2003.
[12] S.-C.S. Cheung and A. Zakhor, "Fast Similarity Search and Clustering of Video Sequences on the World-Wide-Web," IEEE Trans. Multimedia, vol. 7, no. 3, pp. 524-537, 2005.
[13] H.T. Shen, B.C. Ooi, X. Zhou, and Z. Huang, "Towards Effective Indexing for Very Large Video Sequence Database," Proc. ACM SIGMOD '05, pp. 730-741, 2005.
[14] H.T. Shen, X. Zhou, Z. Huang, J. Shao, and X. Zhou, "Uqlips: A Real-Time Near-Duplicate Video Clip Detection System," Proc. 33rd Int'l Conf. Very Large Databases (VLDB '07), pp. 1374-1377, 2007.
[15] J. Yuan, L.-Y. Duan, Q. Tian, S. Ranganath, and C. Xu, "Fast and Robust Short Video Clip Search for Copy Detection," Proc. Fifth IEEE Pacific-Rim Conf. Multimedia (PCM '04), vol. 2, pp. 479-488, 2004.
[16] J. Shao, Z. Huang, H.T. Shen, X. Zhou, E.-P. Lim, and Y. Li, "Batch Nearest Neighbor Search for Video Retrieval," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 409-420, 2008.
[17] Y. Peng and C.-W. Ngo, "Clip-Based Similarity Measure for Query-Dependent Clip Retrieval and Video Summarization," IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 612-627, 2006.
[18] D.R. Shier, "Matchings and Assignments," Handbook of Graph Theory, J.L. Gross and J. Yellen, eds., pp. 1103-1116, CRC Press, 2004.
[19] L. Chen and T.-S. Chua, "A Match and Tiling Approach to Content-Based Video Retrieval," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '01), pp. 417-420, 2001.
[20] X. Liu, Y. Zhuang, and Y. Pan, "A New Approach to Retrieve Video by Example Video Clip," Proc. Seventh ACM Int'l Conf. Multimedia (MULTIMEDIA '99), vol. 2, pp. 41-44, 1999.
[21] Y. Wu, Y. Zhuang, and Y. Pan, "Content-Based Video Similarity Model," Proc. Eighth ACM Int'l Conf. Multimedia (MULTIMEDIA '00), pp. 465-467, 2000.


COMPUTER AND INFORMATION SECURITY

K.Veena
Information Technology Department, Anna University
14/68 Third Avenue, Sundar Nagar, Chennai 600 032, Tamil Nadu, India
dce.veena@gmail.com
Abstract—Computer security is a branch of computer technology, known as information security as applied to computers and networks. The objectives of computer security include protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to remain accessible and productive to its intended users. The term computer system security means the collective processes and mechanisms by which sensitive and valuable information and services are protected from publication, tampering, or collapse by unauthorized activities, untrustworthy individuals, and unplanned events, respectively.

INTRODUCTION

Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording, or destruction.

The terms information security, computer security, and information assurance are frequently, and incorrectly, used interchangeably. These fields are often interrelated and share the common goals of protecting the confidentiality, integrity, and availability of information; however, there are some subtle differences between them.

Computer security can focus on ensuring the availability and correct operation of a computer system without concern for the information stored or processed by the computer.

Governments, military, corporations, financial institutions, hospitals, and private businesses amass a great deal of confidential information about their employees, customers, products, research, and financial status. Most of this information is now collected, processed, and stored on electronic computers and transmitted across networks to other computers.

This article presents a general overview of information security and its core concepts.

I. USER AUTHENTICATION SERVICES

In this guide, get tips and advice on user authentication and authorization methods and services: biometrics, two-factor and multifactor authentication, single sign-on (SSO), smartcards, and PKI.

A. Two-Factor and Multifactor Authentication Strategies
User names and passwords are no longer enough, and more enterprises are deploying two-factor or multifactor authentication products. Browse the articles and advice in this section for the latest information on using strong authentication in your organization.

B. Enterprise Single Sign-On (SSO)
Enterprise single sign-on (SSO) technologies can help reduce help-desk calls and user mistakes by consolidating credentials into one single password for all applications and services. Read the news and technical advice here to find deployment strategies and practical advice.

C. PKI and Digital Certificates
Using a public key infrastructure (PKI), a certificate authority (CA), and


digital certificates is a key way to develop a secure network infrastructure for user access, keep data secure, and eliminate hacker threats. Get expert advice and tools to implement PKI in your organization.

D. Security Token and Smart Card Technology
Get tips on how to use security tokens and smart card technology for secure user authentication. This resource defines what a smart card is, and provides information on deployment, smart card writers and readers, and software.

E. Biometric Technology
Get advice from the experts on biometric technology and security. Learn about several biometric authentication devices and methods -- fingerprint and iris scanners, facial recognition -- that can be used for secure user access, and how to implement them.
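Since the guidance above stays at the level of strategy, a concrete illustration may help. The following is a minimal sketch of time-based one-time password (TOTP) verification, the mechanism behind many two-factor authentication products. It follows RFC 6238 with SHA-1 and 6 digits; the secret handling is simplified for illustration, and the parameter values are common defaults rather than requirements.

    import base64, hashlib, hmac, struct, time

    def totp(secret_b32, interval=30, digits=6):
        # derive the current time-step counter and HMAC it with the shared secret
        key = base64.b32decode(secret_b32)  # secret must be standard Base32
        counter = int(time.time()) // interval
        digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
        # dynamic truncation (RFC 4226) picks 4 bytes at an offset in the digest
        offset = digest[-1] & 0x0F
        code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
        return str(code).zfill(digits)

    def verify(user_code, secret_b32):
        # second-factor check: compare in constant time
        return hmac.compare_digest(totp(secret_b32), user_code)

A production deployment would also accept a small window of adjacent time steps to tolerate clock drift.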
II. SECURITY BY DESIGN

The technologies of computer security are based on logic. As security is not necessarily the primary goal of most computer applications, designing a program with security in mind often imposes restrictions on that program's behavior.
There are four approaches to security in computing, and sometimes a combination of approaches is valid:

1. Trust all the software to abide by a security policy, although the software is not trustworthy (this is computer insecurity).
2. Trust all the software to abide by a security policy, with the software validated as trustworthy (by tedious branch and path analysis, for example).
3. Trust no software, but enforce a security policy with mechanisms that are not trustworthy (again, this is computer insecurity).
4. Trust no software, but enforce a security policy with trustworthy hardware mechanisms.

III. SECURITY ARCHITECTURE

Security architecture can be defined as the design artifacts that describe how the security controls (security countermeasures) are positioned and how they relate to the overall information technology architecture. These controls serve to maintain the system's quality attributes, among them confidentiality, integrity, availability, accountability, and assurance.

IV. NOTABLE SYSTEM ACCIDENTS

In 1994, over a hundred intrusions were made by unidentified crackers into the Rome Laboratory, the US Air Force's main command and research facility. Using trojan horse viruses, hackers were able to obtain unrestricted access to Rome's networking systems and remove traces of their activities. The intruders were able to obtain classified files, such as air tasking order systems data, and furthermore were able to penetrate connected networks of the National Aeronautics and Space Administration's Goddard Space Flight Center, Wright-Patterson Air Force Base, some Defense contractors, and other private sector organizations, by posing as a trusted Rome center user.

V. COMPUTER SECURITY POLICY


On April 1, 2009, Senator Jay Rockefeller (D-WV) introduced the "Cybersecurity Act of 2009 - S. 773" in the Senate. The bill, co-written with Senators Evan Bayh (D-IN), Barbara Mikulski (D-MD), Bill Nelson (D-FL), and Olympia Snowe (R-ME), was referred to the Committee on Commerce, Science, and Transportation, which approved a revised version of the same bill (the "Cybersecurity Act of 2010") on March 24, 2010. The bill seeks to increase collaboration between the public and the private sector on cybersecurity issues, especially those private entities that own infrastructures critical to national security interests (the bill quotes John Brennan, the Assistant to the President for Homeland Security and Counterterrorism: "our nation's security and economic prosperity depend on the security, stability, and integrity of communications and information infrastructure that are largely privately-owned and globally-operated," and talks about the country's response to a "cyber-Katrina"), to increase public awareness of cybersecurity issues, and to foster and fund cybersecurity research. Some of the most controversial parts of the bill include Paragraph 315, which grants the President the right to "order the limitation or shutdown of Internet traffic to and from any compromised Federal Government or United States critical infrastructure information system or network." The Electronic Frontier Foundation, an international non-profit digital rights advocacy and legal organization based in the United States, characterized the bill as promoting a "potentially dangerous approach that favors the dramatic over the sober response."

VI. INTERNATIONAL CYBERCRIME REPORTING AND COOPERATION ACT

On March 25, 2010, Representative Yvette Clarke (D-NY) introduced the "International Cybercrime Reporting and Cooperation Act - H.R. 4962" in the House of Representatives; the bill, co-sponsored by seven other representatives (among whom only one Republican), was referred to three House committees. The bill seeks to make sure that the administration keeps Congress informed on information infrastructure, cybercrime, and end-user protection worldwide. It also "directs the President to give priority for assistance to improve legal, judicial, and enforcement capabilities with respect to cybercrime to countries with low information and communications technology levels of development or utilization in their critical infrastructure, telecommunications systems, and financial industries," as well as to develop an action plan and an annual compliance assessment for countries of "cyber concern."

VII. PROTECTING CYBERSPACE AS A NATIONAL ASSET ACT OF 2010 ("KILL SWITCH BILL")

On June 19, 2010, United States Senator Joe Lieberman (I-CT) introduced a bill called "Protecting Cyberspace as a National Asset Act of 2010 - S. 3480," which he co-wrote with Senator Susan Collins (R-ME) and Senator Thomas Carper (D-DE). If signed into law, this controversial bill, which the American media dubbed the "Kill switch bill," would grant the President emergency powers over the Internet. However, all three co-authors of the bill issued a statement claiming that instead, the bill "[narrowed] existing broad Presidential authority to take over telecommunications networks."

VIII. INFORMATION SECURITY COMPONENTS

Information security components, or qualities, are Confidentiality, Integrity, and Availability (CIA). Information systems are decomposed into three main portions, hardware, software, and communications, with the purpose of identifying and applying information security industry standards, as mechanisms of protection and prevention, at three levels or layers: physical, personal, and organizational. Essentially, procedures or policies are implemented to tell people (administrators, users, and operators) how to use products to ensure information security within the organizations.


Fig. 1 Example of information security concepts

IX. DEFENCE IN DEPTH

Fig. 2 Example of defence in depth

The terms reasonable and prudent person, due care, and due diligence have been used in the fields of finance, securities, and law for many years. In recent years these terms have found their way into the fields of computing and information security. U.S. Federal Sentencing Guidelines now make it possible to hold corporate officers liable for failing to exercise due care and due diligence in the management of their information systems.

In the field of information security, Harris offers the following definitions of due care and due diligence:

Attention should be paid to two important points in these definitions. First, in due care, steps are taken to show; this means that the steps can be verified, measured, or can even produce tangible artifacts. Second, in due diligence, there are continual activities; this means that people are actually doing things to monitor and maintain the protection mechanisms, and these activities are ongoing.

CONCLUSIONS

Information security is the ongoing process of exercising due care and due diligence to protect information, and information systems, from unauthorized access, use, disclosure, destruction, modification, disruption, or distribution. The never-ending process of information security involves ongoing training, assessment, protection, monitoring and detection, incident response and repair, documentation, and review. This makes information security an indispensable part of all business operations across different domains.

ACKNOWLEDGEMENTS

Computer and information security is an advanced topic. I owe a great many thanks to the many people who helped and supported me during the writing of this paper.




ENHANCING THE LIFETIME OF DATA GATHERING WIRELESS SENSOR NETWORK BY BALANCING ENERGY CONSUMPTION

J.Alen Jeffie Penelope, L.Bhakaya Laskshmi
Easwari Engineering College, Anna University
msg2allen@yahoo.co.in, Ph: 8015015378
Abstract—Unbalanced energy consumption is an inherent problem in wireless sensor networks characterized by multihop routing and a many-to-one traffic pattern, and this uneven energy dissipation can significantly reduce network lifetime. In this paper, we study the problem of maximizing network lifetime through balancing energy consumption for uniformly deployed data-gathering sensor networks. We formulate the energy consumption balancing problem as an optimal transmitting data distribution problem by combining the ideas of corona-based network division and a mixed-routing strategy together with data aggregation. We first propose a localized zone-based routing scheme that guarantees balanced energy consumption among nodes within each corona. We then design an offline centralized algorithm with time complexity O(n) (n is the number of coronas) to solve the transmitting data distribution problem aimed at balancing energy consumption among nodes in different coronas. An approach for computing the optimal number of coronas in terms of maximizing network lifetime is also presented. Based on the mathematical model, an energy-balanced data gathering (EBDG) protocol is designed, and a solution for extending EBDG to large-scale data-gathering sensor networks is also presented. Simulation results demonstrate that EBDG significantly outperforms conventional multihop transmission schemes, direct transmission schemes, and cluster-head rotation schemes in terms of network lifetime.

1 INTRODUCTION

Recent technological advances in sensor technology and wireless communication have enabled the deployment of large-scale wireless sensor networks for a variety of applications including remote habitat monitoring, battlefield monitoring, and environmental data collection (e.g., temperature, humidity, light, vibration, etc.). In such applications, hundreds or even thousands of low-cost sensor nodes may be dispersed over the monitoring area, and these nodes self-organize into a wireless network, termed a data-gathering sensor network, in which each sensor node must periodically report its sensed data to the sink(s). Sensor nodes in such large-scale data-gathering sensor networks are generally powered by small, inexpensive batteries in the expectation of surviving for a long period. Therefore, energy is of utmost importance in power-constrained data-gathering sensor networks, and energy consumption should be well managed to maximize the post-deployment network lifetime. Sensor-to-sink direct transmission is the easiest way of reporting sensed data to the data sink(s) if the transmission range of each sensor node is large enough to reach a data sink. However, if each node uses power-adjusted sensor-to-sink direct transmission to report the sensed data, the nodes farther away from the sink run out of energy quickly due to the long transmission distance. Moreover, sensor-to-sink long-range transmission is not energy efficient, since the transmission power is proportional to the


square or quadruple of the transmission distance. To save energy, multihop routing is preferable to sensor-to-sink direct transmission for long-distance transmissions. However, multihop routing schemes tend to overuse the nodes close to the sink and make them run out of energy quickly, leading to the existence of energy holes [3] around the sink(s). Experimental results in [4] show that, by the time the sensors close to the sink exhaust their energy budget, up to 90 percent of the initial budget may still be available in the nodes farthest away from the sink. Therefore, unbalanced energy consumption is an inherent problem for both direct transmission and multihop routing schemes, and this unbalanced energy consumption can make the network collapse early due to the death of some critical nodes, resulting in significant network lifetime reduction. The mixed-routing scheme [1], [2], [6], [8], in which each node alternates between hop-by-hop transmission mode and direct transmission mode to report data, is an attractive scheme for dealing with the unbalanced energy consumption problem due to its simplicity and effectiveness. In direct transmission mode, each node sends its data directly to the sink without any relay, and this mode helps to alleviate the relay burden of the nodes close to the sink. In hop-by-hop transmission mode, each node forwards its data to its next-hop neighbors, and this mode helps to relieve the burden of long-distance transmission for the nodes far away from the sink. Therefore, it is possible to obtain fairly even energy consumption among all nodes by properly allocating the amount of data transmitted in the two modes. An energy-balanced mixed-routing scheme beats every other possible routing strategy in terms of network lifetime maximization (NLM). In this paper, we investigate the problems of balancing energy consumption and maximizing network lifetime for data-gathering sensor networks. Similar to the models in [1] and [5], it is assumed that the sensor nodes are uniformly deployed, since near-uniform node deployment is one of the easiest and most practical approaches to providing full sensing coverage and connectivity. The solutions presented in this study are complementary to existing work on designing random uniform node deployment schemes.

This approach takes full advantage of corona-based network division, mixed-routing, and data aggregation, and can be summarized as follows: the network is divided into coronas centered at the sink with equal width, and all nodes in the same corona use the same probability for direct transmission and the same probability for hop-by-hop transmission. The energy consumption balancing problem is then divided into two subproblems: intra-corona energy consumption balancing (Intra-CECB) and inter-corona energy consumption balancing (Inter-CECB). Intra-CECB is solved by optimally dividing each corona to evenly distribute the amount of data received by nodes in each corona. Inter-CECB is solved by optimally allocating the amount of data for direct transmission and hop-by-hop transmission. Finally, the NLM problem is solved by calculating the optimal number of coronas into which the network should be divided. The main contributions of this paper can be summarized as follows: We propose a fully localized zone-based routing scheme for the Intra-CECB problem; by optimally subdividing the coronas into zones and establishing the mapping between zones in different coronas, the zone-based routing scheme can evenly distribute energy consumption among nodes in each corona. We map the network onto a linear network model and design an algorithm with time complexity O(n) (n is the number of coronas) to calculate the optimal data distributions for all coronas, with the objective of balancing energy consumption among nodes in different coronas. We formulate the NLM problem as a balanced energy consumption minimization problem and propose a solution for computing the optimal number of coronas to maximize network lifetime. We design an energy-balanced data gathering protocol called EBDG, which can achieve balanced energy consumption among nodes both within one corona and among different coronas. We also give a solution for extending EBDG to large-scale data-gathering sensor networks. Simulation results show that EBDG can provide near-optimal data gathering in terms of network lifetime.


2 RELATED WORK

Current existing approaches for balancing energy consumption in wireless sensor networks can be classified into four categories: cluster-head rotation schemes, nonuniform deployment schemes, data aggregation schemes, and power-adjusted transmission schemes.

Cluster-head rotation schemes (e.g., LEACH and HEED) achieve fairly even energy consumption among nodes in each cluster by periodically performing cluster-head rotation among all nodes in the cluster. Clustering schemes such as EECS and UCS were further proposed to balance energy consumption among cluster heads by partitioning the network into clusters of unequal size. However, to achieve a desirable balance of energy consumption, cluster-head rotation must be performed frequently, which may add excessive communication overhead to the network, resulting in much energy wastage.

Nonuniform deployment schemes, in which additional nodes are deployed in areas with heavy traffic, have been proposed to deal with the energy hole problem. A nonuniform node distribution strategy was proposed to increase network data capacity. Liu et al. proposed a power-aware nonuniform node distribution scheme to deal with the sink-routing-hole problem for long-term connectivity. Wu et al. proved that suboptimal balanced energy consumption is attainable only if the number of nodes grows in geometric proportion from the outer coronas to the inner ones, except the outermost one. However, nonuniform node distribution schemes may greatly increase the cost of deploying such networks, since sensor devices are more expensive than predicted.

Data aggregation has emerged as a useful paradigm in sensor networks. The key idea is to combine data from different sensors to eliminate redundant transmissions. Balancing energy consumption to increase network lifetime through data aggregation has been discussed, and the problem of mitigating energy holes by traffic compression and aggregation has been studied. However, these studies do not explore the possibility of avoiding energy holes in data-gathering sensor networks.

Power-adjusted transmission is another attractive scheme for balancing energy consumption in wireless sensor networks. The problems of avoiding energy holes and maximizing lifetime in sensor networks with uniform distribution and uniform reporting have been investigated based on corona-based network division and power-adjusted transmission. In a mixed-routing scheme, each node alternates between direct transmission and multihop transmission to transmit data. Efthymiou et al. proposed a slice model and designed a probabilistic data propagation algorithm for balancing energy consumption in sensor networks with uniform node deployment and a uniform event generation rate. Jarry et al. used the probabilistic data propagation algorithm and proved that there is a relationship between energy balancing and life-span maximization. However, the above schemes did not give a solution for balancing energy consumption among nodes in the same slice, nor a solution for maximizing network lifetime. The problem of balancing energy consumption on a linear data-gathering sensor network, considering energy consumption for both data transmission and data receiving, has also been studied: the authors gave a formal definition of an optimal data propagation algorithm with the objective of maximizing network lifetime, and employed a spreading technique to balance energy consumption among sensors within the same slice. Spreading schemes based on energy histograms were designed for energy-efficient routing in wireless sensor networks.

The problem of maximizing the network lifetime of a sensor network has received significant attention in the past few years. Chang and Tassiulas proposed a shortest-cost-path routing algorithm for maximizing network lifetime based on link costs that reflect both the communication energy consumption rates and the residual energy levels. Kalpakis et al. proposed polynomial-time algorithms to solve the maximum lifetime data gathering problem for scenarios both with and without data aggregation. The authors addressed the problem of maximizing


network lifetime of wireless sensor networks through optimal single-session flow routing. The problem of lifetime maximization in interference-limited wireless sensor networks has been studied through cross-layer design techniques, and several distributed algorithms have been designed to maximize the lifetime of wireless sensor networks.

3 SYSTEM MODELS AND PROBLEM STATEMENT

3.1 Network Model

Similar to the models in [7], it is assumed that all sensor nodes are uniformly distributed in a circular monitoring area A of radius R with a given node distribution density. There is only one sink, which is located at the center of A. All sensor nodes have the same maximum transmission range rmax and the same amount of initial energy budget. Each node has knowledge of its geographic location. It is assumed that rmax ≥ R, which guarantees that each node can communicate directly with the sink. This assumption puts a constraint on the size of the network, since sensor devices usually have a limited transmission range; a solution based on clustering techniques is designed to remove this constraint so that the schemes designed in this paper can be used in large-scale wireless sensor networks. The data gathering operation is divided into rounds. In each round, all nodes generate data, perform data aggregation, and send the data to the sink. Between two adjacent rounds, all nodes turn off their radios to save energy. In this study, all nodes in the network have the same data generation rate, and the amount of data generated by every node in each round is l bits.

3.2 Problem Statement

The circular monitoring area A is divided into n concentric coronas C1, C2, ..., Cn centered at the sink, each with the same width r (r = R/n), as shown in Fig. 1. To balance energy consumption, each node alternates between two transmission modes: hop-by-hop transmission and direct transmission.

Fig. 1. Illustration of network division (n = 4).

For any node u ∈ Ci, it forwards its data to a neighbor in Ci-1 when hop-by-hop transmission mode is used, and transmits its packets directly to the sink when direct transmission mode is used. Balanced energy consumption is achieved by optimally distributing the amount of data for hop-by-hop and direct transmissions at each node. It is assumed that all nodes in the same corona Ci have the same data distribution ratio pi, and that all nodes in Ci use the same transmission radius i·r for direct transmission and the same transmission radius r' for hop-by-hop transmission (r' is larger than r in order to balance energy consumption among nodes in the same corona). Based on this, we focus on solving the following problems (a small sketch of this set-up follows the list):

1. How to balance energy consumption among nodes within each corona? We refer to this problem as Intra-Corona Energy Consumption Balancing (Intra-CECB).

2. How to compute the optimal data distribution ratio for each corona Ci so that energy consumption can be balanced among nodes in different coronas? We refer to this problem as Inter-Corona Energy Consumption Balancing (Inter-CECB).

3. What is the optimal number of coronas n in terms of maximizing the network lifetime? We refer to this problem as Network Lifetime Maximization (NLM).

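Before turning to the solutions, here is a minimal sketch of the set-up just described, with the sink at the origin and corona width r. The probability handling is illustrative and not the paper's exact data allocation.

    import math, random

    def corona_index(x, y, r):
        # corona Ci contains the nodes whose distance to the sink lies in ((i-1)r, ir]
        return max(1, math.ceil(math.hypot(x, y) / r))

    def transmission_mode(p_direct):
        # mixed routing: with probability pi send directly to the sink,
        # otherwise forward hop-by-hop to a neighbor in the next inner corona
        return "direct" if random.random() < p_direct else "hop-by-hop"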

4 INTRA-CORONA ENERGY CONSUMPTION BALANCING

This section focuses on solving the Intra-CECB problem. The sufficient and necessary condition for Intra-CECB is first presented: it is proven that energy consumption among nodes within each corona can be balanced if and only if the amount of data received by nodes in this corona is balanced. Based on this observation, a localized zone-based routing scheme is designed to balance energy consumption among nodes within each corona.

4.1 Zone-Based Routing Scheme

The basic idea of the zone-based routing scheme is as follows: each corona is divided into subcoronas, and each subcorona is further divided into zones, as shown in Fig. 2. There is a one-to-one mapping between the zones in two adjacent coronas. When hop-by-hop mode is used, data communication is performed between nodes in two corresponding zones. The objective is to design an optimal zone division scheme so that the amount of data received by nodes in each corona can be balanced.

Fig. 2. The partition of coronas Ci and Ci-1.
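The paper's exact zone-division rule is not reproduced in this excerpt; the following is a hypothetical labeling that shows the kind of geometric bookkeeping involved, with w subcoronas per corona and z angular zones per subcorona as assumed parameters.

    import math

    def zone_id(x, y, r, w, z):
        # hypothetical zone labeling: corona index i, subcorona index within the
        # corona (w subcoronas of width r/w), and angular zone index (z zones);
        # the sink is at the origin
        d = math.hypot(x, y)
        i = max(1, math.ceil(d / r))
        sub = min(w - 1, int((d - (i - 1) * r) / (r / w)))
        ang = int((math.atan2(y, x) % (2 * math.pi)) / (2 * math.pi / z))
        return i, sub, ang

A node's hop-by-hop traffic would then be directed to the zone in corona i-1 paired with (sub, ang) under the one-to-one mapping described above.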
4.2 Hop-by-Hop Transmission Range

In the zone-based routing scheme, the hop-by-hop transmission range must guarantee that any node in any source zone can transmit its data to any node in its corresponding destination zone.

5 INTER-CORONA ENERGY CONSUMPTION BALANCING

This section concentrates on solving the Inter-CECB problem. When energy consumption is balanced among the nodes in Ci, all nodes in Ci transmit the same amount of data Fi through hop-by-hop transmission mode, transmit the same amount of data Di through direct transmission mode, and receive the same amount of data Si from the nodes in Ci-1.

5.1 Optimal Data Distribution Ratio Allocation

Since balanced energy consumption can be achieved by optimally distributing the amount of data for hop-by-hop and direct transmissions at each node, this section focuses on computing the optimal data distribution ratio for each corona.

6 NETWORK LIFETIME MAXIMIZATION

This section focuses on solving the problem of maximizing network lifetime by balancing energy consumption among all nodes in the network.

6.1 Optimal Number of Coronas

Most data are transmitted through hop-by-hop transmission mode, which is more energy efficient than long-distance direct transmission. However, compared with direct transmission mode, additional energy is spent on receiving data in hop-by-hop transmission mode. If the network is divided into only a few coronas, it is likely to incur significant energy dissipation due to the long hop-by-hop transmission distance. On the other hand, if the corona number is too large, a large amount of energy may be wasted on receiving data. The objective of the NLM problem is to compute the optimal number of coronas in terms of maximizing the network lifetime. The NLM problem is a nonlinear integer programming problem; exact algorithms for solving nonlinear integer programming problems have exponential computational complexity, which implies that the problem belongs to the class of NP-hard problems. In practice, such problems are usually solved by


using heuristic algorithms. In this scheme, the optimal corona number is computed offline using a simulated annealing algorithm and then distributed to all nodes in the network set-up phase.
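The annealing step can be sketched as follows. Here lifetime(n) is an assumed callable implementing the lifetime model under a balanced allocation for n coronas (not reproduced in this excerpt), and the cooling parameters are illustrative defaults.

    import math, random

    def optimal_corona_number(lifetime, n_min=2, n_max=50, steps=2000, t0=1.0, alpha=0.995):
        # simulated annealing over the integer corona count n
        n = random.randint(n_min, n_max)
        best, t = n, t0
        for _ in range(steps):
            cand = min(n_max, max(n_min, n + random.choice((-1, 1))))
            delta = lifetime(cand) - lifetime(n)
            # always accept improvements; accept worse moves with decaying probability
            if delta >= 0 or random.random() < math.exp(delta / t):
                n = cand
                if lifetime(n) > lifetime(best):
                    best = n
            t *= alpha
        return best

Because the search space is a single small integer, even this simple heuristic converges quickly, which is consistent with computing the value offline before network set-up.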
7 THE EBDG PROTOCOL

In this section, the EBDG protocol is designed. The operation of EBDG is divided into two phases: a network set-up phase and a data gathering phase.

7.1 Network Set-Up Phase

As discussed in Sections 4 and 5, network parameters such as the optimal number of coronas n, the optimal number of subcoronas w, and the optimal data distribution ratio for each corona can be computed offline. In the network set-up phase, the sink distributes these global parameters to all nodes through broadcasting, and each node establishes its corona, subcorona, and zone identifications based on these parameters. In hop-by-hop transmission mode, EBDG uses the zone-based routing scheme presented in Section 4.1 to balance energy consumption among nodes within the same corona.

7.2 Data Gathering Phase

In EBDG, all sensor nodes work in two states: active and sleep. In the active state, each node can transmit data, receive data, and perform data aggregation. In the sleep state, each node turns off its radio to save energy.

8 EXTENSION TO LARGE-SCALE DATA-GATHERING SENSOR NETWORKS

In mixed-routing schemes, the basic requirement is that all nodes must have the capability to communicate directly with the sink. However, in practice, most realistic sensor motes have a limited transmission range, which indicates that a mixed-routing scheme is not suitable for large-scale sensor networks. In this section, a solution for extending EBDG to large-scale sensor networks is proposed by employing the advantages of clustering techniques. Clustering is a promising technique for large-scale sensor networks because of its high scalability and efficiency. By dividing the whole network into small clusters, each node only needs to communicate with its cluster head through single-hop or multihop routing, thereby eliminating the requirement for a large communication range. Similar to prior work, it is assumed that the network is composed of two kinds of nodes: regular nodes and cluster-head nodes. The regular nodes have battery energy E0, and perform the basic sensing and data aggregation as well as packet relaying. The cluster-head nodes are equipped with battery energy E1, which is much larger than E0, and these nodes are responsible for collecting data from the nodes within each cluster and transmitting the data to the remote sink. Data gathering in the extended EBDG is performed as follows:

• Intra-cluster: In each cluster, EBDG is employed to gather the data from all nodes to the cluster head. Therefore, energy consumption is balanced among nodes within each cluster.
• Inter-cluster: All the cluster heads form a supercluster with the sink acting as the cluster head. Then EBDG can be used to balance energy consumption among the cluster heads. Besides this approach, some schemes based on a mobile sink can also be used to achieve the same goal.

9 SIMULATION RESULTS AND ANALYSIS

In this section, the performance of EBDG is evaluated through extensive simulations. To demonstrate the efficiency of EBDG in terms of balancing energy consumption and maximizing network lifetime, EBDG is compared with a conventional multihop routing scheme, a direct transmission scheme, a cluster-head rotation scheme, and a maximum lifetime data gathering scheme.


9.1 Comparison with Multihop Routing and Direct Transmission Schemes

In EBDG, energy consumption among all nodes in the network is balanced, in the expectation that all nodes should run out of energy at nearly the same time. This set of simulations is focused on evaluating the performance of EBDG in terms of energy consumption balancing and network lifetime extension by comparing it with multihop routing and direct transmission schemes. For multihop routing, the energy-efficient geographic routing protocol EEGR is used for comparison; EEGR can provide near-optimal sensor-to-sink multihop data delivery in terms of minimizing the total energy consumption for delivering each packet. For the direct transmission scheme (DT), in each round, every node generates its data, performs data compression, and transmits the data directly to the sink without any relay.

9.2 Comparison with Cluster-Head Rotation Scheme

In this simulation, EBDG is compared with LEACH, in which cluster heads are frequently rotated to balance energy consumption among nodes within clusters. For simplicity, only one cluster is considered in this simulation. When EBDG is performed, the cluster head is located at the center of the circle and remains unchanged during the entire network lifetime. The energy consumption at the cluster head is not taken into account, since the cluster head in EBDG is equipped with much more power than the regular nodes. When LEACH is performed, the cluster head is reelected at the beginning of each round. To guarantee that there is only one cluster head in each round, the node that first broadcasts the cluster-head advertisement message is selected as the new cluster head.

9.3 Comparison with Maximum Lifetime Data Gathering Scheme

Kalpakis et al. studied the problem of maximizing lifetime data gathering in wireless sensor networks for scenarios both with data aggregation (MLDA) and without data aggregation (MLDR), and proposed a centralized algorithm which can generate a near-optimal data gathering schedule in terms of maximizing network lifetime. In MLDA/MLDR, the data gathering schedule is a collection of directed trees rooted at the sink that span all the sensors. Each tree may be used for one or more rounds, and lifetime maximization is achieved by optimally allocating the number of rounds to each tree.

Fig. 3. Network lifetime: optimal schedule, MLDA/MLDR, and EBDG.

Fig. 3 gives the comparison of network lifetime achieved by the optimal schedule, MLDA/MLDR, and EBDG for scenarios both with and without data aggregation. It can be seen that the lifetime achieved by EBDG keeps very close to that of MLDA/MLDR, which is near optimal.

10 CONCLUSIONS

Unbalanced energy consumption is an important problem in wireless sensor networks, one that can dramatically reduce network lifetime. This paper presents a solution to maximize network lifetime through balancing energy consumption for uniformly deployed data-gathering sensor networks. We formulate the energy balancing problem as the problem of optimal allocation of transmitting data by combining the ideas of corona-based network division and a mixed-routing strategy together with data aggregation. We present solutions for balancing energy consumption among nodes both within the same corona and within different coronas. Based on the model, an


EBDG protocol and its extension to large-scale data-gathering sensor networks were developed. Simulation results show that EBDG can improve system lifetime by an order of magnitude compared with a multihop transmission scheme, a direct transmission scheme, and a cluster-head rotation scheme. Future extensions of this work can be made in two directions. First, this work is based on a collision-free MAC protocol; future work can extend it to networks with contention-based MAC protocols. Second, in this study, it is assumed that all nodes in the same corona have the same data distribution ratio, since assigning each node a different data distribution ratio would significantly increase the computational complexity.

11 REFERENCES

[1] C. Efthymiou, S. Nikoletseas, and J. Rolim, "Energy Balanced Data Propagation in Wireless Sensor Networks," Proc. 18th Int'l Parallel and Distributed Processing Symp. (IPDPS '04), p. 225a, 2004.
[2] W. Guo, Z. Liu, and G. Wu, "An Energy-Balanced Transmission Scheme for Sensor Networks," Proc. First Int'l Conf. Embedded Networked Sensor Systems (SenSys '03), pp. 300-301, 2003.
[3] J. Li and P. Mohapatra, "Analytical Modeling and Mitigation Techniques for the Energy Hole Problem in Sensor Networks," Pervasive and Mobile Computing, vol. 3, pp. 233-254, 2007.
[4] J. Lian, K. Naik, and G.B. Agnew, "Data Capacity Improvement of Wireless Sensor Networks Using Non-Uniform Sensor Distribution," Int'l J. Distributed Sensor Networks, vol. 2, pp. 121-145, 2006.
[5] S. Olariu and I. Stojmenovic, "Design Guidelines for Maximizing Lifetime and Avoiding Energy Holes in Sensor Networks," Proc. IEEE INFOCOM '06, pp. 1-12, 2006.
[6] O. Powell, P. Leone, and J. Rolim, "Energy Optimal Data Propagation in Wireless Sensor Networks," J. Parallel and Distributed Computing, vol. 67, pp. 302-317, 2007.
[7] X. Wu, G. Chen, and S.K. Das, "On the Energy Hole Problem of Nonuniform Node Distribution in Wireless Sensor Networks," Proc. IEEE Int'l Conf. Mobile Adhoc and Sensor Systems (MASS '06), pp. 180-187, 2006.
[8] H. Zhang, H. Shen, and Y. Tan, "Optimal Energy Balanced Data Gathering in Wireless Sensor Networks," Proc. 21st Int'l Parallel and Distributed Processing Symp. (IPDPS '07), pp. 1-10, 2007.


SECURITY ISSUES AND PRIVACY OF CLOUD COMPUTING

*S.Amudha, **M.Kavitha, ***P.Allirani

*S.Amudha, M.E Student, St Peter's University, amudhasaravanan@gmail.com, 9994554412
**M.Kavitha, Assistant Professor, Sriram Engineering College, kavitha_kaushikraja@yahoo.co.in, 9677296354
***P.Allirani, Lecturer, Sriram Engineering College, allirani07@gmail.com, 9445933193
ABSTRACT

This paper discusses the issue of cloud computing and outlines its implications for the privacy of personal information as well as for the confidentiality of business and governmental information. The report finds that for some information and for some business users, sharing may be illegal, may be limited in some ways, or may affect the status or protections of the information shared. The report discusses how, even when no laws or obligations block the ability of a user to disclose information to a cloud provider, disclosure may still not be free of consequences. The report finds that information stored by a business or an individual with a third party may have fewer or weaker privacy or other protections than information in the possession of the creator of the information. The report, in its analysis and discussion of relevant laws, finds that both government agencies and private litigants may be able to obtain information from a third party more easily than from the creator of the information. A cloud provider's terms of service, privacy policy, and location may significantly affect a user's privacy and confidentiality interests.

Keywords: SaaS, PaaS, IaaS, MGCFP

1. INTRODUCTION

Cloud computing is all over the news as a cost-effective way to deliver innovative government services over the network. The term comes from network diagrams, with a puffy cloud representing the behind-the-scenes components. The cloud conveys the notion that you don't really need to know the location or number of servers delivering the service. Your experience is that you request a service, say, setting up a voice, video, and web conference or viewing your tax-return status, and then by some magic you receive it.

Cloud computing unbinds a service from a particular infrastructure. A collection of servers stands at the ready, available to whichever agency or department needs them at any given moment. Depending on the number of people using a service at the same time, the cloud automatically pulls in the right number of servers, adding or releasing servers dynamically as demand fluctuates (a toy sketch of this elastic behavior follows the list below).

Clouds are used for three main purposes, and any agency might today or someday use any or all of them:

• Software as a Service (SaaS): An example is Cisco WebEx, which agencies use to enable people in different locations to collaborate with voice, video, and web sharing. Another example is HR services common to many agencies.
• Platform as a Service (PaaS): Some agencies have begun developing new software not on their own server platform, but on a shared platform in the cloud.
• Infrastructure as a Service (IaaS): Certain government agencies have begun sharing the infrastructure they use for voice, video, or web applications.
sharing the infrastructure they use for voice, video, or web applications.

Cloud Computing a Good Idea in Government:
Cloud computing can reduce the costs of existing services and enable government to cost-effectively introduce enhanced services. Citizens benefit from cloud computing because their tax dollars are used more efficiently. Government IT costs often decrease because agencies don't need to purchase more capacity than they need in order to prepare for usage spikes. Management costs can decrease as well. Agency IT personnel spend less time and resources making the IT infrastructure efficient, which enables them to focus on the core mission. Cloud computing also makes it much easier for agencies to introduce new citizen services. Examples include interactive Web 2.0 applications that let you share videos or collaborate with coworkers on a social networking site.

WHERE IS THE CLOUD?
An agency can host a cloud itself, subscribe to a cloud service hosted by another agency, or subscribe to a service from a third-party service provider. Some agencies subscribe to an external cloud for some services and build a private cloud for others, depending on the criticality and security classification of the service. The Office of Management and Budget, General Services Administration, and National Institute of Standards and Technology are all defining standards for cloud procurement and acquisition.
To host a cloud, the agency needs a platform with the following characteristics:
• Performance: The platform needs to be able to support high transaction volume and multiple applications.
• Low management overhead: Acquiring new servers should not add management burden. Automated provisioning offloads the agency IT department. And to keep costs down, IT staff should be able to manage computing, storage access, network infrastructure, and virtualization from one interface.
• Energy efficiency: Look for a platform that minimizes the number of components to power and cool.
Cloud computing is a relatively new way of referring to the use of shared computing resources, and it is an alternative to having local servers handle applications. Cloud computing groups together large numbers of compute servers and other resources and typically offers their combined capacity on an on-demand, pay-per-cycle basis. The end users of a cloud computing network usually have no idea where the servers are physically located—they just spin up their application and start working.
Cloud computing is fully enabled by virtualization technology (hypervisors) and virtual appliances. A virtual appliance is an application that is bundled with all the components that it needs to run, along with a streamlined operating system. In a cloud computing environment, a virtual appliance can be instantly provisioned and decommissioned as needed, without complex configuration of the operating environment.
This flexibility is the key advantage of cloud computing, and what distinguishes it from other forms of grid or utility computing and software as a service (SaaS). The ability to launch new instances of an application with minimal labor and expense allows application providers to:
• Scale up and down rapidly
• Recover from a failure
• Bring up development or test instances
• Roll out new versions to the customer base
• Efficiently load test an application

THE ECONOMICS OF THE CLOUD
Before we delve into how to architect an application for a cloud computing environment, we should explain why it is financially advantageous to do so.
The first clear advantage of using an existing cloud infrastructure is that you don't have to make the capital investment yourself. Rather than expending the time and cost to build a data center, you use someone else's investment and turn what would have been a massive capital outlay into a manageable variable cost.
In the pay-per-cycle model of cloud computing, you can start small and requisition more computer time as your business grows. This makes starting up inexpensive, and gives you time to build your on-demand business before investing in additional capacity.
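As a back-of-the-envelope model of this trade-off (the symbols below are illustrative assumptions, not taken from the paper), the pay-per-cycle bill tracks actual usage, while ownership cost is front-loaded:

    Cost_cloud(T) = sum over t = 1..T of u_t * r    (only the capacity u_t actually used in period t is billed, at unit rate r)
    Cost_owned(T) = C_capex + T * C_ops             (capital outlay paid up front, plus a fixed running cost)

While demand is small or uncertain, the cumulative usage charge stays below C_capex, which is exactly the start-small advantage described above.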
Instead of investing ahead of demand, you simply use and pay for exactly what you need, when you need it.
Development time can also be a significant cost in creating an on-demand application environment. If you adopt a SaaS model, your entire application must be re-architected to support multi-tenancy. With cloud computing, the cost of a machine year in the Amazon EC2 cloud (~$880 annually) is much less than the cost of a fully-loaded developer (anywhere from $400-$1000 per day). This makes it a lot less expensive to scale up more virtual servers in the cloud than it is to spend even one day on development.
Finally, you can save money by designing your application with a simpler architecture ideal for cloud computing, which we'll spend the rest of this paper discussing. A simpler architecture speeds time to market because it is easier to test, and you can eliminate some of the equipment and processes required to migrate an application from development into production. All the activities involved with development, test, QA and production can exist side-by-side in separate instances running in the cloud.

ARCHITECTURAL CONSIDERATIONS

Figure 1: MGCFP (architecture diagram; figure not reproduced here).
Figure 1 shows the architecture of the mining-grid centric e-finance portal (MGCFP) that has been developed by us. The MGCFP consists of the following primary applications: banking, investment, insurance, mortgage and loans, and wealth management, as a set of integrated financial services. The architecture comprises a distributed, multi-tiered, service-oriented, component-based solution that offers a high degree of modularity. The solution is available on the open industry-standard J2EE platform. The portal enables the financial enterprise to have a common infrastructure that encapsulates business rules, back-end connectivity logic and transaction behavior, enabling banks to write once and deploy everywhere, across channels. The solution ensures a unified view of customer interactions to both the customers and the enterprises.
Designing an application to run as a virtual appliance in a cloud computing environment is very different from designing it for an on-premise or SaaS deployment. We discuss the following considerations: to be successful in the cloud, your application must be designed to scale easily, tolerate failures and include management tools.

Scale
Cloud computing offers the potential for nearly unlimited scalability, as long as the application is designed to scale from the outset. The best way to ensure this is to follow some basic application design guidelines:
Start simple: Avoid complex design and performance enhancements or optimizations in favor of simplicity. It's a good idea to start with the simplest application and rely on the scalability of the cloud to provide enough servers to ensure good application performance. Once you've gotten some traction and demand has grown, then you can focus on improving the efficiency of your application, which allows you to serve more users with the same number of servers, or to reduce the number of servers while maintaining performance. Some common design techniques to improve performance include caching, server affinity, multi-threading and tight sharing of data, but they all make it more difficult to distribute your application across many servers. This is the reason you don't want to introduce them at the outset; only consider them when you need to and can ensure that you are not breaking horizontal scalability.
Split application functions and couple loosely: Use separate systems for different pieces of application functionality and avoid synchronous connections between them. Again, as demand grows, you can scale each one independently instead of having to scale the entire application when you hit a bottleneck. The separation and reusability of functions inherent in SOA make it an ideal architecture for the cloud.
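To make the loose-coupling guideline concrete, here is a minimal, self-contained C sketch (all names and sizes are hypothetical, not from the paper): a front end hands work to an independent worker through an asynchronous in-memory queue rather than a synchronous call, so the two pieces can be scaled, and can fail, independently.

    /* Loose coupling via an asynchronous message queue (illustrative sketch). */
    #include <stdio.h>
    #include <string.h>

    #define QUEUE_CAPACITY 8

    struct work_item { char payload[32]; };

    static struct work_item queue[QUEUE_CAPACITY];
    static int head = 0, tail = 0, count = 0;

    /* Front-end side: fire-and-forget enqueue; returns 0 when full (back-pressure). */
    static int enqueue(const char *payload) {
        if (count == QUEUE_CAPACITY) return 0;
        strncpy(queue[tail].payload, payload, sizeof(queue[tail].payload) - 1);
        queue[tail].payload[sizeof(queue[tail].payload) - 1] = '\0';
        tail = (tail + 1) % QUEUE_CAPACITY;
        count++;
        return 1;
    }

    /* Worker side: drains items independently of the producer's pace. */
    static int dequeue(struct work_item *out) {
        if (count == 0) return 0;
        *out = queue[head];
        head = (head + 1) % QUEUE_CAPACITY;
        count--;
        return 1;
    }

    int main(void) {
        struct work_item item;
        enqueue("resize-image:42");
        enqueue("send-mail:43");
        while (dequeue(&item))            /* worker loop */
            printf("worker handled %s\n", item.payload);
        return 0;
    }

In a real deployment the in-memory buffer would be replaced by a network-visible queue service, but the decoupling idea is the same.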
Network communication: Design the application to use network-based interfaces, and not interprocess communication or file-based communication paradigms. This allows you to scale effectively in the cloud, because each piece of the application can be separated into distinct systems.
Consider the cluster: Rather than scale a single system up to serve all users, consider splitting your system into multiple smaller clusters, each serving a fraction of the application load. This is often called "sharding," and many web services can be split up along one dimension, often users or accounts. Requests can then be directed to the appropriate cluster based on some request attribute, or users can be redirected to a specific cluster at login (a minimal routing sketch follows the list below). To deploy a clustered system, determine the right collection of servers that yields efficient application performance, taking any needed functional redundancy into account; for example, 2 web, 4 application and 2 database servers. You can then scale the application by replicating the ideal cluster size and splitting the system load across the servers in the clusters. You'll soon realize the advantages of cloud computing when it comes to scalability, such as:
• Inexpensive testing – testing can be done against a test cluster without risking the performance or integrity of the production system. You can also test the upper limits of the ideal cluster's performance by using "robot users" in the cloud to generate load.
• Reduced risk – bring up a test instance of the cluster to prove a new code base, and roll out a new version one cluster at a time. Fall back to an older version if the new version doesn't work, without disrupting current users.
• Ability to segment the customer base – use clusters to separate customers with varying demands, such as a large customer who wants a private instance of the application, or one who requires extensive customizations.
• Auto-scaling based on application load – with the ready availability of resources, applications can be built to recognize when they are reaching the limits of their current configuration and automatically bring up new resources.
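Following up the "Consider the cluster" guideline above, a minimal C sketch of shard routing (the cluster count and hash choice are illustrative assumptions, not from the paper): hashing a stable request attribute, here the account name, always sends the same user to the same cluster.

    /* Sharding sketch: route each request to one of N smaller clusters. */
    #include <stdio.h>

    #define NUM_CLUSTERS 4   /* e.g., each cluster = 2 web, 4 app, 2 db servers */

    /* djb2 string hash: stable, so a given account always lands on the same shard. */
    static unsigned long hash_account(const char *s) {
        unsigned long h = 5381;
        while (*s) h = h * 33 + (unsigned char)*s++;
        return h;
    }

    static int pick_cluster(const char *account) {
        return (int)(hash_account(account) % NUM_CLUSTERS);
    }

    int main(void) {
        const char *accounts[] = { "alice", "bob", "carol" };
        for (int i = 0; i < 3; i++)
            printf("%s -> cluster %d\n", accounts[i], pick_cluster(accounts[i]));
        return 0;
    }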
Fail
Inevitably, an application will fail, no matter what its environment. When you design an on-premise or SaaS application, you typically consider several "doomsday" scenarios. The same must be true for designing an application that runs in the cloud.
Build in resiliency and fault tolerance: To tolerate failure, applications must operate as part of a group, while not being too tightly coupled to their peers. Each piece of the application should be able to continue to execute despite the loss of other functions. Asynchronous interfaces are an ideal mechanism for helping application components tolerate failures or momentary unavailability of other components.
Distribute the impact of failure: With a distributed cloud application, a failure in any one application cluster affects only a portion of the application and not the whole application. By spreading the load across multiple clusters in the cloud, you can isolate the individual clusters against failure in another cluster.
Get back up quickly: Automate the launching of new application clusters in order to recover quickly. Application components must be able to come up in an automated fashion, configure themselves and join the application cluster. Cloud computing provides the ideal environment for this fast startup and recovery process.
Data considerations: When an application fails, data persistence and system state cannot be taken for granted. To ensure data preservation, put all data on persistent storage and make sure it is replicated and distributed. If system state is stored and then used in the recovery process, treat it like data, so the system can be restarted from the point of failure.
Test your "doomsday" scenario: Cloud computing makes it easy to bring up an instance of your application to test various failure scenarios. Because of the flexible nature of cloud computing, it is possible to simulate many different failure scenarios at a very reasonable cost. Single instances of a system can be taken off-line to see how the rest of the application will respond. Likewise, multiple recovery scenarios can be planned and executed ahead of any real production failure.
Be aware of the real cost of failure: Of course the ideal situation is avoiding any application failure, but what is the cost to provide that assurance? A large internet company once said that it could tolerate failure as long as the impact was small enough not to be noticeable to the overall customer base. This assertion came from an analysis of what it would cost to ensure seven nines of application uptime versus the impact of a failure on a portion of the customer base.

Manage
Deploying cloud applications as virtual appliances makes management significantly easier. The appliances should bring with them all of the software they need for their entire lifecycle in the cloud. More important, they should be built in a systematic way, akin to an assembly-line production effort, as opposed to a hand-crafted approach. The reason for this systematic approach is the consistency of creating and re-creating images. We have shown how effectively scaling and failure recovery can be handled by rapid provisioning of new systems, but these benefits cannot be achieved if the images to be provisioned are not consistent and repeatable.
When building appliances, it is obvious that they should contain the operating system and any middleware components they need. Less obvious are the software packages that allow them to automatically configure themselves, monitor and report their state back to a management system, and update themselves in an automated fashion. Automating the appliance configuration and updates means that as the application grows in the cloud, the management overhead does not grow in proportion. In this way appliances can live inside the cloud for any length of time with minimal management overhead. When appliances are instantiated in the cloud, they should also plug into a monitoring and management system. This system will allow you to track application instances running in the cloud, migrate or shut down instances as needed, and gather logs and other system information necessary for troubleshooting or auditing. Without a management system to handle the virtual appliances, it is likely that the application will slowly sprawl across the cloud, wasting resources and money.
Your management system also plays an important role in the testing and deployment process. We've already highlighted how the cloud can be used for everything from general testing to load testing to testing for specific failure scenarios. Including your testing in your management system allows you to bring up a test cluster, conduct any testing that is required, and then migrate the tested application into production. The uniform resources that underlie the cloud mean that you can achieve a rapid release-to-production process, allowing you to deliver updated features and functions to your customers faster.
Finally, by automating the creation and management of these appliances, you are tackling one of the most difficult and expensive problems in software today: variability. By producing a consistent appliance image and managing it effectively, you are removing variability from the release management and deployment process. Reducing the variability reduces the chances of mistakes – mistakes that can cost you money.
The advantages of designing your application for management in the cloud include:
• Reducing the cost and overhead of preparing the application for the cloud
• Reducing the overhead of bringing up new instances of the application
• Eliminating application sprawl
• Reducing the chance for mistakes as the application is scaled out, failed over, upgraded, etc.
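A minimal sketch of the kind of monitoring pass such a management system performs, tracking appliance instances and relaunching failed ones. Here is_instance_healthy() and launch_instance() are hypothetical stubs standing in for whatever probe and provisioning calls a provider actually offers; they are not a real API.

    /* Management sketch: watchdog pass over running appliance instances. */
    #include <stdio.h>

    #define NUM_INSTANCES 3

    /* Hypothetical stubs: instance 1 is pretended to have failed. */
    static int  is_instance_healthy(int id) { return id != 1; }
    static void launch_instance(int id)     { printf("relaunching instance %d\n", id); }

    int main(void) {
        /* One monitoring pass; a real manager would loop on a timer and
         * also gather logs for troubleshooting or auditing. */
        for (int id = 0; id < NUM_INSTANCES; id++) {
            if (!is_instance_healthy(id))
                launch_instance(id);
            else
                printf("instance %d ok\n", id);
        }
        return 0;
    }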
ISSUES OF CLOUD COMPUTING
The legal issues arising out of cloud computing can be broadly categorized as operational, legislative or regulatory, security, third-party contractual limitations, risk allocation or mitigation, and those relating to jurisdiction.
Operational legal issues are those that arise from the use of cloud computing services on a day-to-day basis, and include concerns such as access to the business's information and the manner of storage of that information. It is imperative that such issues be addressed prior to availing of a service provider's services, and be adequately dealt with in the contractual negotiations. Also included in operational issues are those of upgrades and vendor lock-in. This implies that the business must consider whether, while performing its operations, it would be able to upgrade to newer operating procedures and systems, and who, and to what extent, shall be responsible for the process.
Another operational concern that a business must consider is data portability. Would it be possible, in the event of discontinuation of the relationship between the vendor and the business, or in case of technical, financial or other difficulties, for the business to access its information through other applications or service providers? It is essential for businesses to consider such a scenario, since there have been several instances of data being lost due to technical hitches or due to the vendor closing up shop. Such contingencies, if provided for and dealt with in the contract between the parties, can go a long way toward eliminating risks and also allocating liability in case of loss.

CONCLUSION
Cloud computing represents an exciting opportunity to bring on-demand applications to customers in an environment of reduced risk and enhanced reliability. However, it is important to understand that existing applications cannot just be unleashed on the cloud as is. Careful attention to design will help ensure a successful deployment.
In particular, cloud-based applications should be deployed as virtual appliances so they contain all the components needed to operate, update and manage them. Simple design will aid with scalability as demand for the application increases. And planning for failure will ensure that the worst doesn't happen when the inevitable occurs.

REFERENCES
[1] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, January 2008, pp. 107-113.
[2] L.A. Barroso, J. Dean, U. Holzle, "Web Search for a Planet: The Google Cluster Architecture," IEEE Micro, March-April 2003, pp. 22-28.
[3] S. Ghemawat, H. Gobioff, S. Leung, "The Google File System," Proc. SOSP, ACM, December 2003, pp. 29-43.
[4] L. Ivanov, H. Hadimioglu, M. Hoffman, "A new look at parallel computing in the computer science curriculum," Journal of Computing Sciences in Colleges, May 2008, pp. 176-179.
[5] B. Raghavan, K. Vishwanath, S. Ramabhadran, K. Yocum, A.C. Snoeren, "Cloud control with distributed rate limiting," SIGCOMM Computer Communication Review, ACM, October 2007, pp. 337-348.
[6] A. Weiss, "Computing in the Clouds," netWorker, ACM, December 2007, pp. 16-25.
[7] J. Hu, N. Zhong, "Developing Mining-Grid Centric e-Finance Portals," in Proc. of the International Conference on Web Intelligence, Hong Kong, 2006, pp. 966-969.
[8] Han Guo-wen, W. Young, "Chaos and Complexity in the Cumulative Effect of Financial Innovation," in Proc. of the International Conference on Management Science and Engineering, Harbin, 2007, pp. 1635-1640.
[9] B. Behsaz, P. Jaferian, M.R. Meybodi, "Comparison of Global Computing with Grid Computing," in Proc. of the International Conference on Parallel and Distributed Computing, Applications and Technologies, December 2006, pp. 531-534.
[10] V. Choudhary, "Software as a Service: Implications for Investment in Software Development," in Proc. of the International Conference on System Sciences, January 2007, p. 209.
[11] M. H. Ibrhaim, K. Holley, N. M. Josuttis, B. Michelson, D. Thomas, J. deVadoss, "The future of SOA: what worked, what didn't, and where is it going from here?" in Proc. of the International Conference on Object Oriented Programming Systems Languages and Applications, 2007, pp. 1034-1038.
[12] M. Dan, "The Business Model of Software-As-A-Service," in Proc. of the International Conference on Service Computing, 2007, pp. 701-702.
[13] E.R. Olsen, "Transitioning to Software as a Service: Realigning Software Engineering Practices with the New Business Model," in Proc. of the International Conference on Service Operations and Logistics, and Informatics, 2006, pp. 266-271.
[14] T. Al-Naeem, F.T. Dabous, F.A. Rabhi, B. Benetallah, "Formulating the architectural design of enterprise applications as a search problem," in Proc. of the Australian Software Engineering Conference, Sydney, pp. 282-291.
[15] N. Looker, J. Xu, "Dependability Assessment of Grid Middleware," in Proc. of the International Conference on Dependable Systems and Networks, 2007, pp. 125-130.
[16] P. Grace, D. Hughes, B. Porter, G.S. Blair, G. Coulson, F. Taiani, "Experiences with open overlays: a middleware approach to network heterogeneity," Operating Systems Review, ACM, May 2008, pp. 123-136.
[17] A. Ali, R. McClatchey, A. Anjum, I. Habib, K. Soomro, M. Asif, A. Adil, A. Mohsin, "From Grid Middleware to a Grid Operating System," in Proc. of the International Conference on Grid and Cooperative Computing, 2006, pp. 9-16.
[18] N. Ramon, J. Ferran, C. David, H. Kevin, C. Jordi, L. Jesus, T. Jordi, "Monitoring and Analysis Framework for Grid Middleware," in Proc. of the International Conference on Parallel, Distributed and Network-Based Processing, 2007, pp. 129-133.
[19] N. Ramon, J. Ferran, T. Jordi, "Should the grid middleware look to self-managing capabilities?" in Proc. of the International Symposium on Autonomous Decentralized Systems, March 2007, pp. 113-122.
[20] F.M. Aymerich, G. Fenu, S. Surcis, "Programmed Bandwidth & Wavelengths Pre-allocation System for Lambda Grids (through lambda grid services)," in Proc. of the Wireless and Optical Communications, 2007, pp. 111-117.
COST EFFECTIVE WIRELESS HEALTH MONITORING SYSTEM FOR INDUCTION MOTORS

*Ms. Fathima K., **Ms. Kousalya, M.E.
*M.E. (Embedded System Technologies), Final Year, EEE Department (Affiliated to Anna University), Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi, Chennai-62. Email: fathimakhadar@gmail.com
**Assistant Professor, EEE Department (Affiliated to Anna University), Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi, Chennai-62. Email: kousalya_r123@yahoo.co.in. Cell: 9944902586 / 9944734943

ABSTRACT:
Motors play an important role in the national economy, yet around 16% of medium-sized electric motors burn out every year. In addition, electrical failures and accident damage cause even greater indirect economic losses to factory production. Based on a detailed analysis of motor operation and fault conditions, and using the capabilities of current microcontrollers, a general motor protection device has been developed, with much of its function achieved by moving work from hardware into software. Fault conditions are stored in a database; when abnormal results occur, the differences between the values are calculated accurately to identify the fault occurrences so that they can be rectified.

I. INTRODUCTION:
Conventional motor protection devices have a number of drawbacks: they do not protect the motor well and are very difficult to promote. Motor fault detectors built from integrated circuits without a CPU can judge current-overrun and phase-loss faults, but they rely on a current increase as practically the only criterion, so the protection principle is rough; actual motor failure is a function of time and of the ambient temperature as well as the electrical current, and such devices can fail to prevent an accident.
At the same time, these devices cannot store fault-condition data records and cannot be set in accordance with the actual load current; they are therefore not accurate or reliable, offer no visual convenience in use, and are not conducive to locating failures, often either refusing to operate or letting the motor burn severely. With the rapid development of digital computers, electronic technology and integrated circuits, the time is ripe for computer monitoring and control systems to replace traditional ones. The focus in most industries is shifting from scheduled maintenance to predictive maintenance, by constantly observing the machine and predicting its condition in advance.
Most industrial motors are monitored by systems which either provide warning signals or shut the system down before any catastrophic failure occurs. Though these are able to prevent permanent damage to the machine, they can neither predict the usable life of the equipment nor provide the severity level of the problem. This resulted in the need for an advanced system, the cost-effective wireless health monitoring system.

II. RELATED WORK:
Traditionally, monitoring systems are realized as wired systems formed by communication cables. Most recent research has investigated fault detection techniques based primarily on motor vibrations.
The cost of installation and maintenance is high and the work is difficult, especially when the equipment is not all at the same location. Although fault detection by means of vibration analysis is quite simple, its main drawback is that the technique is only suitable for steady-state operation, and motor protection is also based on various other parameters. To overcome these restrictions, using wireless sensor networks for monitoring is proposed in this paper.

Figure 1. Motor unit (block diagram: vibration detection, temperature sensing, voltage detection and current detection units feed the PIC through a signal conditioning unit, with a ZigBee radio, LCD and information unit attached).

The current signal of a single phase of the stator current and a vibration signal from a vibration sensor located at the machine bearing cap will serve as the baseline data. The vibration sensor is then removed after installation. It is assumed that any externally induced vibrations, i.e., mechanical vibrations whose origin is external to the motor, will be at frequencies which are not changing in time. The condition monitoring system will then compare the stator current signal measured over time with the baseline current signal to determine if an increase in its harmonic components has occurred. Again, it is important to note that the location of the harmonic component is not changing in almost any practical installation.
Furthermore, this work is intended to verify that measuring the displacement of the air gap (i.e., the current harmonic magnitudes) is a sufficient means of approximating the displacement of the stator frame and, thus, indicates changes in the overall vibration level of the machine. However, this method calculates the vibration motion only arbitrarily, so current alone cannot be taken as the input and feedback for performance analysis and health monitoring of motors judged on their ability to detect induction motor operation abnormalities. The detection of electrical abnormalities through vibration analysis is more beneficial when compared to MSA, as it yields a non-electrical-contact wireless health monitoring system for the induction motor. This paper proposes and develops a ZigBee-based wireless sensor network for health monitoring of induction motors. The vibration signals obtained from the monitoring system are processed with signal processing techniques, and vibration detection techniques are used to predict the severity level of rotor imbalance.
A wireless sensor network is a new information control network that integrates sensor, wireless communication and distributed intelligent processing technology. ZigBee is a new wireless networking technology with low power, low cost and short time-delay characteristics. The system can be used to collect reliable electric current and temperature parameters and to detect motor overload, overrun, leakage and unbalance. All the motor health parameters, such as vibration, current, voltage and proper phasing, are measured and calculated to determine a variety of faults and to raise an alarm at or before the time a fault happens.
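As an illustration of the kind of report such a data collection node might frame for the PAN coordinator, the following C struct sketches one possible application payload. The field layout is an assumption for illustration only, not the paper's actual frame format; the IEEE 802.15.4 MAC adds its own header and footer around such a payload.

    /* Illustrative per-sample report a collection node could send upstream. */
    #include <stdint.h>
    #include <stdio.h>

    struct motor_report {
        uint16_t node_id;        /* which motor/node this reading is from     */
        uint8_t  hop_count;      /* multi-hop info added when relaying        */
        uint16_t current_raw;    /* ADC counts from the current transformer   */
        uint16_t voltage_raw;    /* ADC counts from the potential transformer */
        uint16_t vibration_raw;  /* quantized vibration-sensor sample         */
        int16_t  temp_c_x10;     /* LM35 temperature, tenths of a degree C    */
    };

    int main(void) {
        struct motor_report r = { 7, 0, 512, 498, 120, 315 };
        printf("node %u: temp %d.%d C\n",
               r.node_id, r.temp_c_x10 / 10, r.temp_c_x10 % 10);
        return 0;
    }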
The power supply is automatically cut off to prevent electrical damage, and the user is reminded in time for troubleshooting. The motor unit is built around the PIC16F877A CPU, which receives the data from the voltage and current sampling and signal-processing unit through the mutual-inductance (transformer) current and voltage outputs, together with the temperature sensing circuit and the vibration transform circuit; fault data are stored by the PIC microcontroller. A ZigBee transceiver is connected to the PIC, and an LCD displays the data. The unit also contains the motor fault control circuit and the fault data storage.

Figure 2. Monitor unit (block diagram: a PC running the motor-monitoring program and its database, linked to the motor unit over ZigBee).

III. ZIGBEE MODULE:
The IEEE 802.15.4 standard defines the protocol and interconnection of devices via radio communication in a personal area network (PAN). It operates in the ISM (Industrial, Scientific and Medical) radio bands: at 868 MHz in Europe, 915 MHz in the USA and 2.4 GHz worldwide. The purpose is to provide a standard for ultra-low complexity, ultra-low cost, ultra-low power consumption and low-data-rate wireless connectivity. The system framework for the health monitoring system based on a wireless sensor network is made up of data collection nodes and a PAN network coordinator.
The data collection nodes carry out the desired functions, such as detecting the vibration signals, signal quantizing, simple processing, and IEEE 802.15.4-standard package framing to transmit data to the PAN network coordinator. In addition, they can also receive data frames from other nodes, add multi-hop information and package framing, and then transmit the new data frames toward the network coordinator in the same manner. Once it receives the data, the PAN network coordinator uploads the received data to a computer for further processing and analysis.

Current transformer:
A current transformer senses the current in the induction motor and converts it into a corresponding voltage signal. The current transformer output (through the signal conditioning circuit) is given to an analog input pin of the microcontroller. Here we use the PIC16F877A, which has an inbuilt A/D converter.

Potential transformer:
Similarly, a potential transformer steps the line voltage down to 5 V peak. The voltage signal is given to an analog input pin of the PIC16F877A.

Example:
If the potential transformer output is connected to analog input AN0 (channel 0) of the PIC, we call set_adc_channel(0) and then read_adc(). In the main function:

    int16 z;
    set_adc_channel(0);   // select ADC channel 0 (AN0, the potential transformer)
    delay_ms(100);        // allow the ADC sample/hold capacitor to settle
    z = read_adc();       // z now holds the digitized potential-transformer output

Here the variable z holds the digital output of the potential transformer.

Sensors:
Temperature sensor LM35:
These sensors use a solid-state technique to determine the temperature, exploiting the fact that, as temperature increases, the voltage across a diode increases at a known rate. Usually, a temperature sensor converts the temperature into an equivalent voltage output; the IC LM35 is such a sensor.
Here we describe a simple temperature measurement and display system based on the LM35 sensor and the PIC16F877A microcontroller. The temperature in degrees Celsius is displayed on a 16×2 LCD.
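Continuing the CCS C fragment style used above, a sketch of the LM35 read-out and conversion. It assumes the sensor is wired to AN1, a 5 V ADC reference with a 10-bit result, and the standard CCS lcd.c driver for the display; all three are illustrative assumptions, not details given in the paper.

    int16 counts;
    float temp_c;

    set_adc_channel(1);                           // select the LM35 channel (assumed AN1)
    delay_ms(100);                                // let the sample/hold capacitor settle
    counts = read_adc();                          // 0..1023 counts for 0..5 V (10-bit ADC)
    temp_c = counts * (5000.0 / 1023.0) / 10.0;   // counts -> millivolts -> deg C (LM35: 10 mV/deg C)
    printf(lcd_putc, "Temp: %3.1f C", temp_c);    // show on the 16x2 LCD (assumes lcd.c driver)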
Fig. 3 shows the functional block diagram of the PIC16F877A-based temperature monitoring system.

Figure 3. Circuit of the temperature monitoring system (block diagram: LM35 temperature sensor feeding the PIC16F877A).

Vibration sensor:
The vibration sensor can be mounted on the grinding machine either by use of the magnetic mount provided, or by permanent stud mount. The magnetic mount should be used during initial system start-up, until a good permanent location is found on the grinding machine for the sensor. The sensor can then be permanently stud-mounted at that location. When stud-mounting the sensor, a machined flat should be supplied at the mounting location on the machine.

Signal conditioning unit:
The outputs of both the current and voltage transformers are alternating current (AC). The microcontroller unit works on DC, so we must convert the AC to DC. For that purpose we use a bridge rectifier and a variable resistor; this setup is called the signal conditioning circuit.

Conclusions:
Various fault conditions arising in the course of motor running were analyzed, and mathematical models were used to simulate the motor temperature process. By taking full advantage of the single-chip system's resources, intelligent motor protection is realized, forming a fully functional, practical, intelligent performance monitoring system with a small number of peripheral devices. A thermal model of the low-voltage motor is also established, achieving a variety of general motor fault protections as well as monitoring of the motor's operation. After testing the various parts of the hardware, the system achieves the required monitoring accuracy and stable operation, is effective in use, and meets the target-site requirements for reliable operation of the system, giving it definite value for wider adoption.

IV. BIBLIOGRAPHY:
[1] HAO Yingji and LI Liangfu, "A study of an intelligent monitoring protection system based on the 80C196 microcomputer for use with motors," Industrial Instrumentation & Automation, no. 4, 2001, pp. 50-55.
[2] Jiang Xianglong, Cheng Shanmei and Xia Litao, "The Design of Intelligent Monitor and Protection System of AC Motor," Monitor and Protection, no. 8, 2001, pp. 18-20.
[3] LI Junbin, ZHANG Yanxian and YANG Guangde, "Application of PIC MCU in Electromotor Protect," Journal of Zibo University, vol. 3, no. 4, Dec. 2001, pp. 57-59.
[4] YI Pan, SHI Yihui and ZHANG Chengxue, "Study of low voltage motor protection devices," Relay, vol. 34, no. 19, 2006, pp. 7-10.
[5] ZHANG Nan, HUANG Yizhuang and LI Xuanhua, "Multi-task Processing in the Integrated Protection Device," Relay, vol. 31, no. 3, 2003, pp. 31-32.
[6] HU Zhijian, ZHANG Chengxue and CHENG Yunping, "Study on Protective Algorithm for Elimination of Decaying Aperiodic Component," Power System Technology, vol. 25, no. 3, 2001, pp. 7211.
[7] LUO Shiping, "The Principles and Devices of Relay Protection Realized by Microcomputer [M]," Beijing: China Electric Power Press, 2001.
[8] A. Mahmood, M. Aamir, and M. I. Anis, "Design and Implementation of AMR Smart Grid System," IEEE EPEC 2008 Electric Power Conference, 2008, pp. 1-6.
[9] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, 2002, pp. 393-422.
[10] C. M. Riley, B. K. Lin, T. G. Habetler, and R. R. Schoen, "A method for sensorless on-line vibration monitoring of induction machines," IEEE Trans. Industry Applications, vol. 34, pp. 1240-1245, Nov. 1998.
[11] Chipcon TI, "A True System-on-Chip Solution for 2.4 GHz IEEE 802.15.4/ZigBee, CC2430 Data Sheet (Rev. 2.1)," 2008.
[12] IAR Systems, "IAR Embedded Workbench IDE User Guide," ftp://ftp.iar.se/WWWfiles/guides/ide/ouew-3.pdf, December 2004.
RANDOM CHECKPOINTING ARRANGEMENT IN DECENTRALIZED MOBILE GRID COMPUTING

*S.P. Santhoshkumar, **M. Yuvaraju
*Department of Computer Science and Engineering, Anna University of Technology, Coimbatore, India. sp_santhoshkumar@yahoo.co.in
**Department of Computer Science and Engineering, Anna University of Technology, Coimbatore, India. rajaucbe@gmail.com

Abstract–This paper deals with an autonomic, decentralized, QoS-aware middleware whose function is to establish checkpointing arrangements among MHs dynamically within the MoG, allowing its constituent MHs to support practical collaborative computation. As wireless links are less reliable and MHs move at will, a given job executed collaboratively by multiple MHs relies on efficient checkpointing to enable execution recovery upon a MoG component failure, by transferring recently saved intermediate data and machine states to a substitute MoG component, so that execution can resume from the last checkpoint saved prior to the failure. Our checkpointing methodology requires no BS to achieve its function, as checkpointing is handled within the MoG by keeping checkpointed data from a given MH at immediate neighboring MHs. Our methodology also facilitates encapsulation of the checkpointing function within the MoG, making it transparent to the wired-Grid or mobile client being served. Thus, in order to limit the use of relatively unreliable wireless links, and further to minimize the consumption of the wireless hosts' memory resources and energy, each MH sends its checkpointed data to one selected neighboring MH, and also serves to take checkpointed data from one approved neighboring MH, realizing a decentralized form of checkpointing. This has implications for resource scheduling, checkpoint interval control, and application QoS level negotiation. It fills a novel niche in the ever-developing field of MoG middleware, by proposing and demonstrating how QoS-aware functionality can be practically and efficiently added.

Keywords—Checkpointing, computational Grids, Mobile Grid systems, Decentralized Checkpointing, Bayesian Estimation, Grid Computing, Mobile Computing

I. INTRODUCTION
While most existing Grids refer to clusters of computing and storage resources which are wire-interconnected for offering utility services collaboratively, Mobile Grids (MoGs) are receiving growing attention and are expected to become a critical part of a future computational Grid, involving mobile hosts to facilitate user access to the Grid and to also offer computing resources. A MoG can involve a number of mobile hosts (MHs), i.e., laptop computers, cell phones, PDAs, or wearable computing gear, having wireless interconnections among one another, or to wired-Grid access points. Indeed, a recent push by HP to equip business notebooks with integrated global broadband wireless connectivity has made it possible to form a truly mobile Grid (MoG) that consists of MHs providing computing utility services collaboratively, with or without connections to a wired Grid.
Due to mobility and intermittent wireless link loss, all such scenarios call for robust checkpointing and recovery to support execution, minimizing execution rewind and recovery rollback delay penalties.
Depending upon the application's or job's tolerance for such delay, its performance can be poor, or it can be rendered totally inoperative and useless. Our Reliability Driven middleware, ReD, allows a MoG scheduler to make informed decisions, selectively submitting job portions to hosts having superior checkpointing arrangements, in order to ensure successful completion by 1) providing highly reliable checkpointing, increasing the probability of successful recovery and minimizing rollback delay, and 2) providing performance prediction to the scheduler, enabling the client's specified maximum delay tolerance to be better negotiated and matched with MoG resource capabilities. Suitable for scientific applications, MoGs are particularly useful in remote areas where access to the wired Grid is infeasible and autonomous, collaborative computing is needed.
Checkpointing is thus crucial for practical and feasible job completion, for without it, the MoG's potential is severely limited. ReD works to maximize the probability of checkpointed data recovery during job execution, increasing the likelihood that a distributed application, executed on the MoG, completes without sustaining an unrecoverable failure. It allows collaborative services to be offered practically and autonomously by the MoG. Simulations and an actual testbed implementation show ReD's favorable recovery probabilities with respect to Random Checkpointing Arrangement (RCA) middleware, a QoS-blind comparison protocol producing random, arbitrary checkpointing arrangements.
The rest of this paper is organized as follows: Section 2 outlines related checkpointing work in wire-connected Grid systems and wireless systems with BSs. Section 3 discusses our proposed Reliability Driven (ReD) middleware. Section 4 describes the Bayesian estimation algorithm. Section 5 concludes the article.

II. RELATED WORK
At any time during job execution, a host or link failure may lead to severe performance degradation or even total job abortion, unless execution checkpointing is incorporated. Checkpointing forces hosts involved in job execution to periodically save intermediate states, registers, process control blocks, messages, logs, etc., to stable storage. This stored checkpoint information can then be used to resume job execution at a substitute host chosen to run the recovered application in place of the failed host. Upon host failure or inadvertent link disconnection, job execution at a substitute host can then be resumed from the last good checkpoint. This crucial function avoids having to start job execution all over again from the very beginning in the presence of every failure, thus substantially enhancing the performance realized by grid applications.

A. Checkpointing in Wired Grid Systems
Checkpointing in wired Grid systems has been investigated earlier, with various methodologies proposed [7], [8], [9], [10], [11], [12], [13], [14], where hosts are connected by low-latency, high-speed, wired links having low link and host failure rates [8], [11]. Diskless checkpointing sends checkpointed data to a cluster neighbor (instead of a local disk), in an attempt to reduce checkpointing time overhead on a LAN [10]. This works if message transmission time is less than disk-write time, a realistic possibility for a wired network. The method fails, however, to consider which neighbor or network path is best used to reach storage. Recent work on portable checkpointing for wired Grids assumes centralized "middleware" support for applications, checkpointing, and recovery [10]. However, the MoG often requires decentralized, ad-hoc support of checkpointing and recovery, due to its highly unreliable wireless connections and mobile environment.
B. Checkpointing in Wireless Systems with BSs
Mobile devices will be an integral part of distributed computing as their computational and storage abilities grow. Wireless communications advances, leading to high bandwidth and robustness, will enable such devices to practically operate as part of the computational Grid. Hence, checkpointing in wireless computing systems has received growing attention, with solution approaches treated. Specifically, a checkpointing tool for the Palm Operating System has been developed [15], providing a set of APIs to enable checkpointing functionality on top of the Palm OS. It is useful because the Palm OS causes a reset of the handheld computer upon power loss. With this methodology, checkpointed data must be stored on stable, safe storage (i.e., a computer server or PC, dubbed a base station, BS, on a wired network). The methodology is supported by recent routing mechanisms that interconnect inadvertently partitioned ad-hoc MH networks [19]. Checkpointing wireless MHs to BSs has its own drawbacks, however, when not all MHs are adjacent to BSs or when BSs do not exist (like the MoG at hand). Mobility is a major impediment to moving checkpointed data from MH to BS. A complication is that routes between MH and BS change frequently due to varying wireless links, complete and intermittent disconnections, and mobility. The frequent need for multihop relays of checkpoint messages to access wired storage can lead to heavy traffic, significant latency, and needless power consumption due to collisions and interference.

III. DECENTRALIZED CHECKPOINTING IN THE MOG
This work focuses on the MH checkpointing arrangement mechanism, seeking superior checkpointing arrangements to maximize the probability of distributed application completion without sustaining an unrecoverable failure. It deals with MoG checkpointing among neighboring MHs without any access point or BS.

C. ReD's Heuristic Basis
ReD's algorithm takes desired behavioral controlling heuristics into account in the following ways. First, we require the MoG to be capable of autonomous operation without an access point or BS, and further to reduce the use of relatively unreliable wireless links. ReD ensures this by storing checkpointed data only at neighboring MHs, within the MoG, and not requiring BS access or checkpoint transmission over multiple hops. Second, in a MoG, dynamicity demands that a checkpointing arrangement be converged rapidly and efficiently, even though it may only be close to optimal. While it is true that poor checkpointing arrangements play a role in reducing the Ri we seek to maximize, so too do unconverged arrangements (i.e., arrangements where a significant percentage of consumers are still seeking to establish checkpointing relationships with providers). To ensure convergence within a reasonable time, ReD employs four strategies:
1. ReD is supported by a clustering algorithm, which partitions the global MoG into clusters, allowing ReD to quickly and concurrently find superior arrangements within each cluster instead of having to labor toward a global MoG solution. While many clustering algorithms have been proposed for general ad-hoc networks, a simple clustering algorithm is devised and adopted both in our simulator and in our working testbed as a functional support layer for ReD;
2. ReD makes decisions about whether to request, accept, or break checkpointing relationships locally (at the MH level) and in a fully distributed manner, instead of attempting a high-level centralized or global consensus;
3. ReD keeps checkpoint transmissions local, i.e., neighbor to neighbor, not requiring multiple hops and significant additional transmission overhead to achieve checkpointing relationships; and
4. ReD allows a given consumer or provider to break its existing checkpointing relationship (when a provider breaks a checkpointing relationship, a break message is transmitted to the consumer) only when the arrangement reliability improvement is significant, thus promoting stability.

D. ReD's Methodology
An executing host is considered to be in "failure" if wireless connections to all of its neighbors are disrupted temporarily or permanently, resulting in its isolation and inability to achieve timely delivery of intermediate or final application results to other hosts. Executing MHs with poor connectivity have a greater likelihood of experiencing failure than do those with greater connectivity, and are thus in greater need of checkpointing to the best, most reliably connected providers. In order to evaluate and compare the strength of progressive checkpointing arrangements, we calculate the reliability, Ri, of the whole arrangement on the MoG structure (Mi). Link signal strength decreases inversely with the square of the distance between linked hosts. Reliability mapping for the link is thus based on this assumed signal strength profile, with the failure rate λi assumed to be constant for the small time interval ti, typically a few milliseconds in the mobile environment.
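For intuition, one standard way to instantiate such a reliability mapping (an illustrative assumption; the paper's exact expression is not reproduced here) is an exponential link-survival model:

    r_j(t_j) = e^(-λ_j * t_j)        (probability that link j survives the interval t_j)
    R_i = product over j of r_j(t_j)  (an arrangement survives only if every link it depends on survives)

A stronger link (higher signal strength, hence lower λ_j) contributes a factor closer to 1, which is why ReD steers consumers toward the most reliably connected providers.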
ReD's heuristic method ensures that checkpointing arrangement decisions are made locally and individually at the host level, promoting rapid convergence, while a threshold mechanism is included in order to provide stability control.

Fig. 1. Architecture diagram (figure not reproduced here).

IV. BAYESIAN ESTIMATION ALGORITHM DESCRIPTION
Given a parametric model and data, the task is to estimate the model parameters (the same setting as MLE). Bayesian estimation ≠ Bayes' rule: Bayes' rule picks the class which is most probable given the data, with the probability distribution assumed to be given; even if we use parametric inference to obtain it, we may, e.g., use MLE rather than Bayesian estimation. MLE models the data by a parametric likelihood and considers only the maximizer of the likelihood; Bayesian estimation considers all possible values of θ. 1) To rank them against one another, we need their distribution. 2) To take the data into account, we need a conditional distribution of θ|x. How, then, can we obtain p(θ|x) from the likelihood?
Posterior: Start with p(θ, x) and plug in the definition of the conditional distribution. Result:
p(θ|x) = p(x|θ)p(θ) / p(x)
i.e., posterior = likelihood × prior / evidence.
Consequence: To obtain the data-conditional distribution of θ (the "posterior") from the likelihood p(x|θ), we have to provide p(θ) (the "prior"). In other words: for MLE, we need one model assumption (the likelihood); to work with a full distribution of the parameter, we need a second model assumption (the prior). Since the distribution of θ|x is provided by Bayes' formula, estimation based on the posterior distribution is called Bayesian estimation.
Terms of Bayes' formula. Evidence p(x): since the data is assumed to be given (x is fixed), p(x) is a normalization constant. Always think of the likelihood as a function of θ: 1) the likelihood p(x|θ) is a density w.r.t. x, but x is fixed to one particular value; 2) p(x|θ) is not a density w.r.t. θ (i.e., it is not normalized); some people emphasize this by writing, e.g., l(θ) instead of p(x|θ). The likelihood is the generative model of the data. The prior is user input! This is the key point of criticism often voiced concerning Bayesian methods.
Approach 1: Maximize it. This is called maximum a posteriori (MAP) estimation and is the direct counterpart to MLE (which considers only the maximizer of the likelihood). Applying the logarithm trick, we obtain
θ_MAP = argmax over θ of { log p(x|θ) + log p(θ) }.
The evidence p(x) is not required. This is not really Bayesian estimation: the estimate is once again restricted to a single value; we have penalized MLE by prior knowledge.
Approach 2: Compute expectations.
1. If interested in a parameter estimate: compute the expectation w.r.t. the posterior, E[θ|x].
2. If interested in some statistic f(θ): compute E_{θ|x}[f(θ)]. This is a "full" Bayesian approach.
Problem 1: Normalization. How do we compute the normalization constant p(x) (the evidence)?
1) The evidence is the value of the joint distribution of the sample x1, ..., xn at the single point (x1, ..., xn). We cannot hope to estimate p(x) from a single point!
2) The evidence is also the normalization constant of the posterior, so
p(x) = ∫ p(x|θ) p(θ) dθ.
With p(x|θ) and p(θ) known, p(x) is computable, but one has to integrate.
Problem 2: Expectations require integration against p(x|θ)p(θ). Analytic integration is perfect if it works, but even for many simple standard models (e.g., Gaussian likelihood + Cauchy prior), the integral has no analytic solution. Quadrature is the next step if analysis does not work; its problem is the curse of dimensionality (example: estimating the parameters of a 3D Gaussian leads to quadrature on a 9D grid). Monte Carlo integration, e.g., MCMC sampling, is very powerful, but requires some expertise.
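When the likelihood and prior form a conjugate pair (the case described next), these integrals collapse into closed-form parameter updates. A minimal C sketch for a Gaussian likelihood with known variance and a Gaussian prior on the mean (the numbers are illustrative, not from the paper):

    /* Conjugate-update sketch: Gaussian likelihood N(mu, sigma2) with known
     * sigma2, Gaussian prior N(mu0, tau0_2) on mu. The posterior is again
     * Gaussian, so "integration" reduces to two closed-form updates. */
    #include <stdio.h>

    int main(void) {
        double x[] = { 2.1, 1.9, 2.4, 2.0 };   /* observed data        */
        int    n   = 4;
        double sigma2 = 0.25;                   /* known noise variance */
        double mu0 = 0.0, tau0_2 = 1.0;         /* prior mean/variance  */

        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += x[i];

        /* Posterior precision and mean: the prior is pulled toward the data. */
        double post_prec = 1.0 / tau0_2 + n / sigma2;
        double post_mean = (mu0 / tau0_2 + sum / sigma2) / post_prec;

        printf("posterior mean %.4f, variance %.4f\n", post_mean, 1.0 / post_prec);
        return 0;
    }

The posterior mean is a precision-weighted compromise between the prior mean and the sample mean; the data "updates" the parameter values, as noted below.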
E1. Conjugate pairs [1]Paul J. Darby and Nian-Feng Tzeng,


Let the likelihood be in a family P1 of ―Decentralized QoS-Aware Checkpointing
distributions (e. g. P1 = Gaussian densities), Arrangement in Mobile Grid Computing‖ IEEE
and the prior in P2. Likelihood and prior form transactions on Mobile Computing, vol. 9, no.
a conjugate pair, if the resulting posterior is 8, August 2010.
again in P2. This Meaning isMany standard [2] SUN Microsystems, ―Sun Grid Compute
models have a known conjugate prior, which is Utility,‖ http://www.sun.com/service/sungrid,
also a standard model. Most standard models 2006.
can be handled analytically. Therefore: If our [3] Hewlett-Packard Development Company,
model has a conjugate prior, and if it is a L.P., ―Grid-Computing—Extending the
meaningful prior for our problem, we will be Boundaries of Distributed IT,‖
able to deal with the posterior.Some http://h71028.www7.hp.com/ERC/downloads/
examples (likelihood/prior):Gaussian 4AA03675ENW.pdf?jumpid=reg_R1002_USEN,
(unknown μ, fixed )/Gaussian, Gaussian Jan. 2007.
(fixed μ, unknown )/gamma, Gaussian (both [4] ―IBM Grid Computing,‖ http://www-
par. unknown)/Wishart, multinomial/Dirichlet. 1.ibm.com/grid/about_grid/what_is.shtml, Jan.
The data ―updates‖ the parameter values. 2007.
V. CONCLUDING REMARKS

As earlier proposed checkpointing approaches cannot be applied directly to MoGs and are not QoS-aware, we have dealt with QoS-aware checkpointing and recovery specifically for MoGs, with this paper focusing solely on checkpointing arrangement. It has been demonstrated via simulation and actual testbed studies that ReD achieves significant reliability gains by quickly and efficiently determining checkpointing arrangements for most MHs in a MoG. ReD is shown to outperform its RCA counterpart in terms of the average reliability metric, and does so with fewer required messages and superior stability (which is crucial to checkpoint arrangement, minimization of latency, and wireless bandwidth utilization). Because ReD was tailored for a relatively unreliable wireless mobile environment, its design achieves its checkpoint arrangement functions in a lightweight, distributed manner, while maintaining both low memory and transmission energy footprints. This work has marked implications for resource scheduling, checkpoint interval control, and application QoS level negotiation. It fills a novel niche in the ever-developing field of MoG middleware by proposing and demonstrating how QoS-aware functionality can be practically and efficiently added, and how a Bayesian estimation algorithm can be used for updating the data.
VI. REFERENCES
[1] P.J. Darby and N.-F. Tzeng, "Decentralized QoS-Aware Checkpointing Arrangement in Mobile Grid Computing," IEEE Transactions on Mobile Computing, vol. 9, no. 8, Aug. 2010.
[2] Sun Microsystems, "Sun Grid Compute Utility," http://www.sun.com/service/sungrid, 2006.
[3] Hewlett-Packard Development Company, L.P., "Grid-Computing—Extending the Boundaries of Distributed IT," http://h71028.www7.hp.com/ERC/downloads/4AA03675ENW.pdf?jumpid=reg_R1002_USEN, Jan. 2007.
[4] "IBM Grid Computing," http://www-1.ibm.com/grid/about_grid/what_is.shtml, Jan. 2007.
[5] S. Wesner et al., "Mobile Collaborative Business Grids—A Short Overview of the Akogrimo Project," white paper, Akogrimo Consortium, 2006.
[6] Computerworld, "HP Promises Global Wireless for Notebook PCs," http://www.computerworld.com/mobiletopics/mobile/story/0,10801,110218,00.html?source=NLT_AM&nid=110218, Apr. 2006.
[7] J. Long, W. Fuchs, and J. Abraham, "Compiler-Assisted Static Checkpoint Insertion," Proc. Symp. Fault-Tolerant Computing, pp. 58-65, July 1992.
[8] K. Ssu, B. Yao, and W. Fuchs, "An Adaptive Checkpointing Protocol to Bound Recovery Time with Message Logging," Proc. 18th Symp. Reliable Distributed Systems, pp. 244-252, Oct. 1999.
[9] N. Neves and W. Fuchs, "Coordinated Checkpointing without Direct Coordination," Proc. Int'l Computer Performance and Dependability Symp., pp. 23-31, Sept. 1998.
[10] W. Gao, M. Chen, and T. Nanya, "A Faster Checkpointing and Recovery Algorithm with a Hierarchical Storage Approach," Proc. Eighth Int'l Conf. High-Performance Computing in Asia-Pacific Region, pp. 398-402, Nov. 2005.
[11] R. de Camargo, F. Kon, and A. Goldman, "Portable Checkpointing and Communications for BSP Applications on Dynamic Heterogeneous Grid Environments," Proc. Int'l Symp. Computer Architecture and High Performance Computing, pp. 226-234, Oct. 2005.
[12] L. Wang et al., "Modeling Coordinated Checkpointing for Large-Scale Supercomputers," Proc. Int'l Conf. Dependable Systems and Networks, pp. 812-821, July 2005.
[13] A. Agbaria and W. Sanders, "Application-Driven Coordination-Free Distributed Checkpointing," Proc. 25th IEEE Conf. Distributed Computing Systems, pp. 177-186, June 2005.
[14] A. Oliner, R. Sahoo, J. Moreira, and M. Gupta, "Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems," Proc. 19th IEEE Int'l Parallel and Distributed Processing Symp., Apr. 2005.
[15] C. Lin, S. Kuo, and Y. Huang, "A Checkpointing Tool for Palm Operating System," Proc. Int'l Conf. Dependable Systems and Networks, pp. 71-76, July 2001.
[16] D. Pradhan, P. Krishna, and N. Vaidya, "Recoverable Mobile Environment: Design and Trade-Off Analysis," Proc. Symp. Fault-Tolerant Computing, pp. 16-25, June 1996.
[17] H. Higaki and M. Takizawa, "Checkpoint-Recovery Protocol for Reliable Mobile Systems," Proc. 17th IEEE Symp. Reliable Distributed Systems, pp. 93-99, Oct. 1998.
[18] C. Ou, K. Ssu, and H. Jiau, "Connecting Network Partitions with Location-Assisted Forwarding Nodes in Mobile Ad Hoc Environments," Proc. 10th IEEE Pacific Rim Int'l Symp. Dependable Computing, pp. 239-247, Mar. 2004.
[19] K. Ssu et al., "Adaptive Checkpointing with Storage Management for Mobile Environments," IEEE Trans. Reliability, vol. 48, no. 4, pp. 315-324, Dec. 1999.
[20] G. Cao and M. Singhal, "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems," IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, Feb. 2001.


NOVEL METHOD FOR SEQUENCE NUMBER COLLECTOR PROBLEM IN BLACK HOLE ATTACK DETECTION - AODV BASED MANET

S. AARTHI
M.E. Final Year, Dept. of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai.
Email: cse.aarthi@gmail.com (Ph: 9551003591)

Abstract—A mobile ad hoc network (MANET) is an autonomous system in which nodes/stations are connected with each other through wireless links. MANETs are highly vulnerable to attacks due to the open medium, the dynamically changing topology, and the lack of a centralized monitoring and management point. The most common attack possible in ad hoc networks is the black hole attack, in which a malicious node advertises itself as having the shortest path to the destination node. In the existing method, a detection method based on checking the sequence number in the Route Reply message, by making use of a new message originated by the destination node, was developed; its drawback is that a malicious node can play the role of a sequence number collector in order to obtain the sequence numbers of as many other nodes as possible. In the proposed system, the sequence number collector problem is overcome by classifying the nodes into three categories based on their behavior. The malicious node is isolated from active data forwarding and routing, and the association between the nodes is used for route selection.

Keywords—Secured Routing, AODV, Black hole attack, cooperative black hole attack, Malicious nodes, Adhoc network.

1. INTRODUCTION
A wireless ad hoc network is a decentralized wireless network. The network is ad hoc because it does not rely on preexisting infrastructure, such as routers in wired networks or access points in managed (infrastructure) wireless networks. Instead, each node participates in routing by forwarding data for other nodes, and so the determination of which nodes forward data is made dynamically based on the network connectivity. The decentralized nature of wireless ad hoc networks makes them suitable for a variety of applications where central nodes cannot be relied on, and may improve the scalability of wireless ad hoc networks compared to wireless managed networks, though theoretical and practical limits to the overall capacity of such networks have been identified. Minimal configuration and quick deployment make ad hoc networks suitable for emergency situations like natural disasters or military conflicts. The presence of a dynamic and adaptive routing protocol enables ad hoc networks to be formed quickly. Wireless ad hoc networks can be further classified by their application as mobile ad hoc networks (MANETs), wireless mesh networks, and wireless sensor networks.

A. CHARACTERISTICS OF MANET


One of the classifications of wireless ad hoc networks is the MANET (mobile ad hoc network). A mobile ad hoc network (MANET), sometimes called a mobile mesh network, is a self-configuring network of mobile devices connected by wireless links. Each device in a MANET is free to move independently in any direction, and will therefore change its links to other devices frequently. Each must forward traffic unrelated to its own use, and therefore act as a router. The primary challenge in building a MANET is equipping each device to continuously maintain the information required to properly route traffic [1]. Such networks may operate by themselves or may be connected to the larger Internet. MANETs are a kind of wireless ad hoc network that usually has a routable networking environment on top of a link-layer ad hoc network. They are also a type of mesh network, but many mesh networks are not mobile or not wireless. MANETs are highly vulnerable to attacks due to the open medium, the dynamically changing network topology, cooperative algorithms, the lack of a centralized monitoring and management point, and the lack of a clear line of defense. One of the typical routing protocols for MANETs is Ad Hoc On-Demand Distance Vector (AODV) [2]. One of the most common attacks possible in ad hoc networks is the black hole attack.

B. AODV PROTOCOL
Ad hoc On-Demand Distance Vector (AODV) routing is a routing protocol for mobile ad hoc networks (MANETs) and other wireless ad hoc networks. It is a reactive routing protocol, meaning that it establishes a route to a destination only on demand. In contrast, the most common routing protocols of the Internet are proactive, meaning they find routing paths independently of the usage of the paths. AODV is, as the name indicates, a distance-vector routing protocol. AODV avoids the counting-to-infinity problem of other distance-vector protocols by using sequence numbers on route updates, a technique pioneered by DSDV. AODV is capable of both unicast and multicast routing. In AODV, the network is silent until a connection is needed. At that point, the network node that needs a connection broadcasts a request for connection. Other AODV nodes forward this message and record the node that they heard it from, creating an explosion of temporary routes back to the needy node. When a node receives such a message and already has a route to the desired node, it sends a message backwards through a temporary route to the requesting node. The needy node then begins using the route that has the least number of hops through other nodes. Unused entries in the routing tables are recycled after a time. When a link fails, a routing error is passed back to a transmitting node, and the process repeats. Much of the complexity of the protocol serves to lower the number of messages so as to conserve the capacity of the network. For example, each request for a route has a sequence number; nodes use this sequence number so that they do not repeat route requests that they have already passed on. Another such feature is that route requests carry a "time to live" number that limits how many times they can be retransmitted. Another is that if a route request fails, another route request may not be sent until twice as much time has passed as the timeout of the previous route request. The advantage of AODV is that it creates no extra traffic for communication along existing links. Also, distance-vector routing is simple and does not require much memory or calculation. However, AODV requires more time to establish a connection, and the initial communication to establish a route is heavier than in some other approaches. A RouteRequest carries the source identifier, the destination identifier, the source sequence number, the destination sequence number, the broadcast identifier, and the time to live (TTL) field (a sketch of these fields follows).
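For concreteness, the RouteRequest contents listed above can be sketched as a small record (field names here are hypothetical illustrations of ours; RFC 3561 defines the actual packet format):

```python
from dataclasses import dataclass

@dataclass
class RouteRequest:
    """Sketch of the fields an AODV RREQ carries (names are illustrative)."""
    source_id: int       # originator of the request
    dest_id: int         # node to which a route is sought
    source_seq: int      # freshness of the originator's own routing info
    dest_seq: int        # last known sequence number for the destination
    broadcast_id: int    # with source_id, uniquely identifies this RREQ
    ttl: int             # bounds how many hops the request may travel
```

The (source_id, broadcast_id) pair is what lets nodes suppress route requests they have already forwarded, as described above.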
C. BLACK HOLE ATTACK
One of the most common attacks possible in ad hoc networks is the black hole attack [3]. In the black hole attack, a malicious node advertises itself as having the shortest path to the destination node. The black hole attack is a security threat in which traffic is redirected to a node that actually does not exist in the network. It is an analogy to the black hole in the universe, in which things disappear. The node presents itself in such a way that it can attack other nodes and networks, knowing that it has the shortest path.


MANETs must therefore have a secure way for transmission and communication, which is a quite challenging and vital issue.

2. PROBLEM STATEMENT
When a node moves out of the transmission range of the source node, the source node assumes a normal node to be a malicious node (false positive). When a normal node detects a malicious node, the node will broadcast an alarm message. However, this alarm message may not arrive at the source node in time, for various reasons such as there being no route to the source node. Thus the source node cannot judge that there is a malicious node in the route, and it begins to send data along this dangerous route (false negative). The sequence number generated by a destination node is very important: a malicious node can play the role of a sequence number collector in order to obtain the sequence numbers of as many other nodes as possible, by broadcasting requests with high frequency to different nodes in the MANET, so that this collector always keeps the freshest sequence numbers of other nodes.

3. RELATED WORK
Satoshi Kurosawa et al. [4] proposed a dynamic learning method to detect the black hole attack in AODV-based MANETs. The paper analyzes the black hole attack, which is one of the possible attacks in ad hoc networks. In conventional schemes, anomaly detection is achieved by defining the normal state from static training data. However, in mobile ad hoc networks, where the network topology dynamically changes, such a static training method cannot be used efficiently. An anomaly detection scheme using a dynamic training method, in which the training data is updated at regular time intervals, is therefore proposed. Latha Tamilselvan et al. [5] proposed a method to prevent the black hole attack in MANETs. To reduce the probability of attack, it is proposed to wait and check the replies from all the neighboring nodes to find a safe route. Handling the timer expiration event is difficult, and the handling of the route reply packets takes more time. Payal N. Raj et al. [6] proposed a dynamic learning system against the black hole attack in AODV-based MANETs. In this paper, DPRAODV (Detection, Prevention and Reactive AODV), which prevents the security threat of black holes by notifying other nodes in the network of the incident, is proposed. The prevention scheme detects the malicious nodes, isolates them from active data forwarding and routing, and reacts by sending an ALARM packet to its neighbors. The calculation of the threshold value is difficult. Zhao Min et al. [7] proposed a method to prevent the cooperative black hole attack in MANETs. Two authentication mechanisms based on the hash function, the Message Authentication Code (MAC) and the Pseudo Random Function (PRF), are proposed to provide fast message verification and group identification, to identify multiple black holes cooperating with each other, and to discover safe routing avoiding the cooperative black hole attack. Xiao Yang Zhang et al. [8] proposed a method to detect the black hole attack in MANETs. Every conventional method to detect such an attack has the defect of a rather high rate of misjudgment in the detection. In order to overcome this defect, a new detection method is proposed, based on checking the sequence number in the Route Reply message, making use of a new message originated by the destination node, and also monitoring the messages relayed by the intermediate nodes in the route. It can detect more than one black hole attacker at the same time, with no need of any threshold or trust level. However, when a node moves out of the transmission range of the source node, the source node assumes the normal node to be a malicious node; and when a normal node detects a malicious node and broadcasts an alarm message, that alarm message may not arrive at the source node in time, for various reasons such as there being no route to the source node. Thus the source node cannot judge that there is a malicious node in the route and begins to send data along this dangerous route. H. Lan Nguyen et al. [9] made a study of different types of attacks on multicast in MANETs. Security is an essential requirement in mobile ad hoc networks (MANETs). In the above study, the following are analyzed. First, protocols that use the duplicate suppression mechanism, such as ODMRP, MAODV, and ADMR, are very vulnerable to rushing attacks. Second, although the operations of black hole attacks and neighbor attacks are different, they both cause the same degree of damage to the performance of a multicast


group in terms of packet loss rate. Finally, jellyfish attacks do not affect the packet delivery ratio or the throughput of a multicast group, but they severely increase the packet end-to-end delay and delay jitter. The performance of a multicast session in MANETs under attack depends heavily on many factors, such as the number of multicast senders, the number of multicast receivers, and the number of attackers as well as their positions.

4. PROPOSED SCHEME
This section presents the extension of Association-based Routing, which is to be applied over the AODV protocol in order to enhance security. The purpose of this scheme is to fortify the existing implementation by selecting the best and most secure route in the network. For each node in the network, a trust value will be stored that represents the trustworthiness of each of its neighbor nodes. This trust value will be adjusted based on the experiences that the node has with its neighbor nodes.

In our proposed scheme, we classify the Association between the nodes and their neighboring nodes into three types, as below.

1) UNASSOCIATED (UA)
The nodes that newly joined the network, and those nodes which have not forwarded any message, come under this category. The trust levels are very low, and the probability of malicious behavior is very high.

2) ASSOCIATED (A)
The nodes that have started to send messages but still have some more messages to forward come under this category. The trust levels are neither low nor too high; the probability of malicious nodes in the network is to be observed.

3) FRIEND (F)
The nodes that have forwarded all the messages to the corresponding destination fall under this category; the trust levels between them are very high, and the probability of malicious behavior is very low. Table 1 is the Association table of node 1 in Fig. 1.

A. BLOCK DIAGRAM
[Block diagram: nodes entering the MANET are the input; the behavior of each node is analyzed and the node is classified as Unassociated (black hole node), Associated, or Friend. Unassociated nodes are not given preference in route selection and data forwarding; Associated and Friend nodes are given preference in route selection and data forwarding, and the message is forwarded to the correct destination.]

B. CALCULATION OF TRUST VALUE
The trust values are calculated based on the following parameters of the nodes. We propose a very simple equation for the calculation of the trust value:

T = tanh(R + A)    (1)

where R is the ratio between the number of packets forwarded and the number of packets to be forwarded, and A is the acknowledgement bit (0 or 1). The threshold trust levels are compared against the value calculated by using (1).

Fig. 1. Nodes in an ad hoc network

TABLE I: ASSOCIATION TABLE FOR NODE 1 IN FIG. 1


Neighbor | Nature of Association
2 | F
3 | F
4 | A
5 | F
7 | UA

The threshold trust level for an unassociated node to become associated to its neighbor is represented by TA, and the threshold trust level for an associated node to become a friend of its neighbor is denoted by TF. The Associations are represented as (a sketch of this classification follows):

A (node x → node y) = F when T ≥ TF
A (node x → node y) = A when TA ≤ T < TF
A (node x → node y) = UA when 0 ≤ T < TA
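A minimal sketch of the trust computation and classification (the threshold values TA and TF below are illustrative assumptions of ours; the paper does not fix them):

```python
import math

TA = 0.5   # assumed threshold: Unassociated -> Associated (illustrative)
TF = 0.9   # assumed threshold: Associated -> Friend (illustrative)

def trust_value(forwarded: int, to_forward: int, ack: int) -> float:
    """Trust T = tanh(R + A): R is the forwarding ratio, A the ack bit (0/1)."""
    r = forwarded / to_forward if to_forward else 0.0
    return math.tanh(r + ack)

def association(t: float) -> str:
    """Map a trust value to the three categories of the proposed scheme."""
    if t >= TF:
        return "F"    # Friend
    if t >= TA:
        return "A"    # Associated
    return "UA"       # Unassociated

# Example: a node that forwarded 8 of 10 packets and acknowledged (A = 1)
t = trust_value(8, 10, 1)
print(round(t, 3), association(t))   # 0.947 F
```

Because tanh saturates near 1, a node only reaches Friend status when both its forwarding ratio and its acknowledgement behavior are good.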
Also, the Association between nodes is asymmetric; i.e., A (node x → node y) is an Association evaluated by node x based on the trust levels calculated for its neighbor node y, while A (node y → node x) is the Association from the Association table of node y, evaluated based on the trust levels it assigns to its neighbor. Asymmetric Associations suggest that the data flow may be greater in one direction; in other words, node x may not have trust in node y the same way as node y has trust in node x, or vice versa.

C. PATH SELECTION FOR ACTIVE DATA FORWARDING
When any node wishes to send messages to a distant node, it sends the ROUTE REQUEST to all the neighboring nodes. The ROUTE REPLY obtained from its neighbors is sorted by trust ratings, and the source selects the most trusted path. If its one-hop neighbor node is a friend, then that path is chosen for message transfer. If its one-hop neighbor node is Associated, and the one-hop neighbor of the second-best path is a friend, the second path is chosen. Similarly, an optimal path is chosen based on the degree of Association existing between the neighbor nodes.

TABLE 2: PATH PREFERENCE AMONG NODES (PATH CHOSEN BASED ON PROPOSED SCHEME)

Next hop neighbor in the best path P1 | Next hop neighbor in the next best path P2 | Path selection
F | F | F is chosen in P1 or P2, based on the length of the path
F | A | F is chosen in P1
A | F | F is chosen in P2
A | A | A is chosen in P1 or P2, based on the length of the path
F | UA | F is chosen in P1

The source selects the shortest and the next shortest path. Whenever a neighboring node is a friend, the message transfer is done immediately; this eliminates the overhead of invoking the trust estimator between friends. If it is an associated or unassociated node, the transfer is done based on the ratings. This protocol will converge to the AODV protocol if all the nodes in the ad hoc network are friends. In the proposed scheme, the route is not selected on the basis of the first arrival of the RREP; the source waits till it gets the RREP from all neighboring nodes and decides the routing path based on the nature of the Association between them. Thus the black hole nodes are identified as unassociated in both hops and are not given preference in the route selection (a sketch of the selection rule follows).
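The selection rule of Table 2 can be sketched as follows (an illustrative reading of the table, under the assumption that a friend always outranks an associated node, which in turn outranks an unassociated one):

```python
def choose_path(n1: str, n2: str, len1: int, len2: int) -> str:
    """Select P1 or P2 from the natures of their next-hop neighbors (Table 2).
    n1, n2 are 'F', 'A', or 'UA'; len1, len2 are the path lengths in hops."""
    rank = {"F": 2, "A": 1, "UA": 0}
    if rank[n1] > rank[n2]:
        return "P1"          # rows F/A and F/UA: the friend's path wins
    if rank[n1] < rank[n2]:
        return "P2"          # row A/F: the friend sits on the second path
    return "P1" if len1 <= len2 else "P2"   # rows F/F and A/A: shorter path wins

print(choose_path("A", "F", 3, 4))  # P2, matching the third row of Table 2
```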
5. CONCLUSION
In this paper we have discussed the characteristics of mobile ad hoc networks and the black hole attack. The proposed scheme of an Association-based AODV protocol increases routing security and also encourages the nodes to cooperate in the ad hoc structure. It identifies the malicious nodes and isolates them from active data forwarding and routing. Since the black hole node is the one which does not forward any


message to the destination and consumes the entire message, our proposed scheme identifies more than one black hole node, and the data is not allowed to pass through the black hole node's path; thus delay and overhead in route selection are reduced.

REFERENCES
[1] Elizabeth M. Royer and Chai-Keong Toh, "A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks," IEEE Personal Communications, pp. 46-55, April 1999.
[2] C.E. Perkins, S.R. Das, and E. Royer, "Ad hoc On-Demand Distance Vector (AODV) Routing," RFC 3561.
[3] H. Lan Nguyen and U. Trang Nguyen, "A Study of Different Types of Attacks on Multicast in Mobile Ad Hoc Networks," Ad Hoc Networks, Vol. 6, No. 1, 2007.
[4] Satoshi Kurosawa, Hidehisa Nakayama, Nei Kato, Abbas Jamalipour, and Yoshiaki Nemoto, "Detecting Blackhole Attack on AODV-based Mobile Ad Hoc Network by Dynamic Learning Method," International Journal of Network Security, Vol. 5, pp. 338-346, November 2007.
[5] Latha Tamilselvan and V. Sankaranarayanan, "Prevention of Black Hole Attack in MANET," The 2nd International Conference on Wireless, Broadband and Ultra Wideband Communications, January 2007.
[6] Payal N. Raj and Prashant B. Swadas, "DPRAODV: A Dynamic Learning System Against Black Hole Attack in AODV Based MANET," IJCSI International Journal of Computer Science Issues, Vol. 2, 2009.
[7] Zhao Min and Zhou Jiliu, "Cooperative Black Hole Attack Prevention for Mobile Ad Hoc Networks," International Symposium on Information Engineering and Electronic Commerce, 2009.
[8] Xiao Yang Zhang, Yuji Sekiya, and Yasushi Wakahara, "Proposal of a Method to Detect Black Hole Attack in MANET," Autonomous Decentralized Systems (ISADS 2009), pp. 1-6, 2009.
[9] H. Lan Nguyen and U. Trang Nguyen, "A Study of Different Types of Attacks on Multicast in Mobile Ad Hoc Networks," Ad Hoc Networks, Vol. 6, No. 1, 2007.
[10] Network Simulator official site for package distribution, web reference, http://www.isi.edu/nsnam


ADVANCED CONGESTION CONTROL TECHNIQUE FOR HEALTH CARE MONITORING IN WIRELESS BIOMEDICAL SENSOR NETWORKS

*Jeshifa G. Immanuel  **Asst. Prof. A. Fidal Castro  ***Prof. E. Babu Raj

*M.E (CSE), Jaya Engineering College, Thirunindravur-602024. Email: jeshifa@gmail.com, Mobile no: 8015306705
**Department of Computer Science and Engineering, Jaya Engineering College, Thirunindravur-602024
***Department of Computer Science and Engineering, Sun College of Engineering and Technology

Abstract: Congestion is a difficult problem in wireless sensor networks, causing an increase in the amount of data loss and delays in data transmission. Both node-level and link-level congestion have a direct impact on energy efficiency and Quality of Data (QoD). In this paper, we propose a new congestion control technique called the Adaptive Compression-based congestion control Technique (ACT), which is designed for remote monitoring of patient vital signs and physiological signals. The compression techniques used in the ACT are the Discrete Wavelet Transform (DWT), Adaptive Differential Pulse Code Modulation (ADPCM), and Run-Length Coding (RLC). DWT is introduced for priority-based congestion control because it classifies the data into four groups with different frequencies. Congestion is detected in advance, using ACT at each intermediate sensor node. The main purpose of ACT is to guarantee high quality of data by reducing the data dropped due to congestion. ACT increases the network efficiency and guarantees fairness to sensor nodes as compared with the existing methods. Moreover, it exhibits a very high ratio of available data at the sink.

1. INTRODUCTION
1.1 Wireless Sensor Networks:
With the popularity of laptops, cell phones, PDAs, GPS devices, RFID, and intelligent electronics in the post-PC era, computing devices have become cheaper, more mobile, more distributed, and more pervasive in daily life. It is now possible to construct, from commercial off-the-shelf (COTS) components, a wallet-size embedded system with the equivalent capability of a 90's PC. Such embedded systems can be supported with scaled-down Windows or Linux operating systems. From this perspective, the emergence of wireless sensor networks (WSNs) is essentially the latest trend of Moore's Law toward the miniaturization and ubiquity of computing devices. Typically, a wireless sensor node (or simply sensor node) consists of sensing, computing, communication, actuation, and power components. These components are integrated on a single board or multiple boards, and packaged in a few cubic inches. With state-of-the-art low-power circuit and networking technologies, a sensor node powered by 2 AA batteries can last for up to three years with a 1% low-duty-cycle working mode. A WSN usually consists of tens to thousands of such nodes that communicate through wireless channels for information sharing and cooperative processing. WSNs can be


deployed on a global scale for environmental monitoring and habitat study, over a battlefield for military surveillance and reconnaissance, in emergent environments for search and rescue, in factories for condition-based maintenance, in buildings for infrastructure health monitoring, in homes to realize smart homes, or even in bodies for patient monitoring.

Wireless biomedical sensor networks (WBSNs) involve a convergence of biosensors, wireless communication, and network technologies. These sensors are placed on the human body, hidden in the patient's clothes, allowing the monitoring of a number of parameters in their native environment. The patient's physical signs are monitored using the wireless sensors and transferred in real time, and this vital sign data is given to the corresponding hospital, emergency center, or directly to individual doctors. Using health care monitoring systems, it is possible to monitor changes in the physiological and vital signs of patients and provide feedback to help maintain proper health monitoring. Thus there has been increased interest among research groups in developing wireless recording and monitoring for real-time physiological signals and vital signs [3] such as Electrocardiograms (ECGs), Blood Pressure (BP), Heart Rate (HR), Skin Temperature (ST), Electromyograms (EMGs), Electroencephalograms (EEGs), glucose level, and oxygen saturation. Some of these applications require sensor nodes to send data continuously, whereas in some applications, sensor nodes should send data only when a specified event or phenomenon occurs. With fast continuous transmission, the variation of the phenomenon in the environment can be monitored precisely; however, the dissipation of energy, which is a very critical resource, can drastically increase. The energy consumption can be reduced by sending only the information of an event occurrence, but the variation in the phenomenon then cannot be monitored in detail. For energy efficiency and detailed monitoring, sensor nodes should control the data transmission interval in an inverse proportion to the variation in the phenomenon. Therefore, the transfer rate could also be varied according to the event occurrence. However, variable transfer rates might cause network congestion due to concentrated packets in case of a concurrent occurrence of multiple events. During congestion, sensor nodes usually drop the overflowed packets; however, packet drops lead to data loss and unnecessary energy dissipation. Therefore, routing protocols should efficiently control the network congestion.

Conventional congestion control protocols usually use a back-pressure scheme, which reduces congestion by reducing the transfer rate of the child nodes of a congested node and by decreasing the packet generation rate of a sensor node that causes heavy traffic. However, packet drops might still occur, because the back pressure is propagated slowly due to collisions among sensor nodes in case of serious congestion. Slow propagation of the back pressure may also cause fluctuation in the packet flows, because the network state changes frequently with the back pressure. Moreover, a decrease in the packet generation rate leads to a reduction in the fidelity of an event, because heavy traffic is generated by a sensor node that detects the specified event.

Various congestion control techniques have been studied for wireless sensor networks, such as CODA, PCCP, and QCCP-PS. CODA (Congestion Detection and Avoidance) is an energy-efficient congestion control scheme and comprises three basic mechanisms: (i) receiver-based congestion detection; (ii) open-loop hop-by-hop backpressure; and (iii) closed-loop multi-source regulation. CODA [6] detects congestion based on queue length as well as wireless channel load at intermediate nodes. It uses explicit congestion notification and an AIMD rate adjustment technique. PCCP is an upstream congestion control protocol for WSNs which measures the congestion degree as the ratio of packet inter-arrival time to packet service time. Based on the introduced congestion degree and a node priority index, PCCP utilizes a cross-layer optimization and imposes a hop-by-hop approach to control congestion. PCCP achieves efficient congestion control and flexible weighted fairness for both single-path and multi-path routing. These are general congestion control mechanisms for wireless sensor networks, and none of them made any special considerations for the communication of biomedical signals. The Queue-based Congestion Control Protocol with Priority Support (QCCP-PS) [1] controls congestion


with the packet priority based on the node priority for a WSN. QCCP-PS improves on PCCP by controlling the queue more finely. However, it does not have any mechanism for handling prioritized heterogeneous traffic in the network.

The main purpose of the proposed ACT [2] scheme is to guarantee a high quality of data by reducing the packets dropped due to congestion. First, ACT reduces the number of packets generated at the source node with a compression scheme (DWT, ADPCM, and RLC). Second, ACT reduces the transmission rate at the relaying node with the compression scheme under congestion, by adjusting the quantization step of the ADPCM. Third, ACT assigns priorities based on the result of the DWT to guarantee the reconstruction of data under packet loss. Last, for fast propagation of the congestion notification, the queue is operated adaptively according to the congestion state and queue state.

2. System Model

Fig 1: Architecture Diagram
[The figure shows the monitoring nodes: patient nodes and sensor nodes send the patient details; records are maintained in a database; the collected physical information feeds the ACT protocol, which controls congestion; a Priority Scheduler schedules the patients' details, and when needed the PDA sends a special message to the service provider, which takes the decision.]

Fig 1 shows the architecture diagram, where the physiological and vital signs are monitored by sensor nodes. These details are stored in the database, where all records about the patients are maintained. The Priority Scheduler schedules the data packets based on priority. During the transmission of data packets, congestion is controlled using the ACT protocol.

2.1. Problem
There are mainly two types of congestion in WSNs. The first type is node-level congestion, which occurs due to queue overflow inside the node. Queue overflow might lead to packet drops, which lead to retransmission if required and therefore consume additional energy. Wireless channels are shared by several nodes using Carrier Sense Multiple Access (CSMA)-like protocols, and thus collisions among sensor


nodes can occur when multiple sensor nodes try to occupy the channel concurrently. This is the second type of congestion: link-level congestion. Link-level congestion increases the packet service time and decreases the link utilization. Both node-level and link-level congestion have a direct impact on energy efficiency and Quality of Data (QoD); therefore, congestion must be efficiently controlled. The efficiency of a congestion control protocol depends on how well it can achieve the following objectives. First, energy efficiency should be improved in order to extend the system lifetime; congestion control protocols therefore need to avoid or reduce packet loss due to buffer overflow while maintaining a low control overhead that consumes less energy. Second, it is also necessary to support traditional QoS metrics such as packet loss ratio, packet delay, and throughput. Third, fairness needs to be guaranteed so that each node can achieve a fair throughput.

In this paper we propose a concept called ACT (Adaptive Compression-based congestion control Technique). ACT transforms the data from the time domain to the frequency domain, reduces the range of the data by using ADPCM (Adaptive Differential Pulse Code Modulation), and then reduces the number of packets with the help of RLC before transferring data at the source node. ACT introduces the DWT (Discrete Wavelet Transform) for priority-based congestion control, because the DWT classifies data into four groups with different frequencies. ACT assigns priorities to these data groups in an inverse proportion to the respective frequencies of the data groups, and defines the quantization step size of ADPCM in an inverse proportion to the priorities.

2.2. Operation of ACT:
ACT checks the queue state periodically using a routing timer. If the queue is congested with packets, then ACT checks whether the ARC and APC are applicable or not. If the ARC and APC are applicable, ACT applies the APC for source packets and the ARC for transit packets. If the congestion persists, the quantization step size is increased drastically. If the quantization step size reaches its limit and the queue is still congested, ACT starts to drop packets with a low priority in the queue and sends routing packets with a congestion notification. If a sensor node receives a routing packet with a congestion notification from its parent node, the child sensor node increases the transmission interval of the packets in the queue and checks whether the ARC and APC are applicable. If the child node faces congestion similar to the parent node, it propagates the congestion notification to its own child nodes.

2.3. Compression Technique:
In ACT we use three compression techniques, namely DWT, ADPCM, and RLC. The Discrete Wavelet Transform (DWT), which is based on sub-band coding, is found to yield a fast computation of the wavelet transform. Adaptive DPCM (ADPCM) is a variant of Differential Pulse-Code Modulation (DPCM) that varies the size of the quantization step, to allow further reduction of the required bandwidth for a given signal-to-noise ratio. The most commonly used entropy encoders are the Huffman encoder and the arithmetic encoder, although for applications requiring fast execution, simple Run-Length Coding (RLC) is very effective. It is important to note that a properly designed quantizer and entropy encoder are absolutely necessary, along with an optimum signal transformation, to get the best possible compression (a minimal RLC sketch follows).
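A minimal run-length encoder sketch (illustrative only, not the paper's implementation) shows why RLC pairs well with a coarse quantizer: long runs of identical quantized values collapse to a few (value, count) pairs.

```python
def rle_encode(symbols):
    """Minimal run-length encoder producing (value, run length) pairs."""
    out = []
    for s in symbols:
        if out and out[-1][0] == s:
            out[-1][1] += 1          # extend the current run
        else:
            out.append([s, 1])       # start a new run
    return [tuple(p) for p in out]

print(rle_encode([0, 0, 0, 5, 5, 0, 0, 0, 0]))  # [(0, 3), (5, 2), (0, 4)]
```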
Fig 2: Data compression procedure

The ACT first transforms the data from the time domain to the frequency domain by using the DWT, reduces the range of the data with the help of ADPCM, and then reduces the number of packets by employing RLC before the transfer of data at the source node. Then, it introduces the DWT for priority-based congestion control, because the DWT classifies the data into four groups with different frequencies. Subsequently, it assigns priorities to these data groups in an inverse proportion to the respective frequencies of the data groups, and defines the quantization step size of ADPCM in an inverse proportion to the priorities. RLC generates fewer packets for a packet with a low priority.


In the relaying node, the ACT reduces the number of packets by increasing the quantization step size of ADPCM in case of congestion. The destination node (usually a sink node) reverses the compression procedure: a sink node should apply Inverse Run-Length Coding (IRLC), Inverse Adaptive Differential Pulse Code Modulation (IADPCM), and then the Inverse Discrete Wavelet Transform (IDWT).

2.4. Adaptive Queue Operation in Congestion
The DWT classifies the incoming data into four groups based on priority; these data will be given to the queue. The queue in the network layer works in First Come First Served mode [Fig 3-A].

Fig 3: Queue Operation in Congestion (panels A and B)

If there is no packet in the sending queue, the packet is served immediately [Fig 3-A-(2)]. If packets are already present in the queue, the packet that was inserted last has to wait for the previous packets to be served [Fig 3-A-(3)]. If there is no congestion, packets will be served in the order in which they are inserted. If packets are congested in the queue, they must wait until the congestion is reduced [Fig 3-A-(4)]. During congestion, the queue is served in the clockwise direction and a routing packet is inserted at the end of the queue.

2.5. APC
The DWT and RLC are similar to the ones conventionally used, and the ADPCM is a reduced version used for sensed-data compression. The encoder of the ADPCM consists of a difference signal computation, an adaptive quantizer, an inverse adaptive quantizer, quantizer scale factor adaptation, and a signal reconstructor. The difference signal computer subtracts the reconstructed signal from the original data; the subtraction of two successive data packets that have similar values reduces the range of data values. The range of values is reduced again by the adaptive quantizer. The quantized data is encoded by RLC and then inserted into the network layer. The quantized data is also inversely quantized by the inverse adaptive quantizer and reconstructed into the signal by the signal reconstructor. The number of generated packets is varied by the adaptive quantizer, which is controlled by the quantizer scale factor adaptation; the quantizer scale factor adapter increases the quantizer step size in proportion to the strength of the congestion. Further, the decoder of the ADPCM consists of an inverse adaptive quantizer, a signal reconstructor, and a quantizer scale factor adapter (a toy sketch of this encoder loop follows).
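A toy sketch of the encoder loop just described (illustrative only; the fixed growth factor below stands in for the paper's congestion-driven scale-factor adaptation, and all values are assumed):

```python
def adpcm_encode(samples, step=1.0, grow=1.1):
    """Toy ADPCM-style encoder: quantize the difference between each sample
    and the running reconstruction. A larger (or growing) step yields coarser
    codes with more repeated values, which RLC then compresses well."""
    recon, codes = 0.0, []
    for s in samples:
        q = round((s - recon) / step)   # difference signal + adaptive quantizer
        codes.append(q)
        recon += q * step               # inverse quantizer + signal reconstructor
        step *= grow                    # quantizer scale-factor adaptation
    return codes

print(adpcm_encode([1.0, 1.2, 1.1, 3.0]))   # e.g. [1, 0, 0, 2]
```

Widening the step under congestion, as the APC does, trades reconstruction precision for fewer distinct symbols and hence fewer packets.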

2.6. ARC
The ARC controls the output packet rate by re-encoding the transit packets in the relaying queue. It consists of RLC, IRLC, ADPCM, and IADPCM units. Transit packets are in a compressed state after RLC has been performed, and therefore the ARC first decodes them with the help of IRLC. Then, the decompressed transit packets are reconstructed by IADPCM and compressed again with a different quantizer step size, which is determined by the congestion state.

3. Simulation Results
We use a simulation study to evaluate the performance of the proposed protocol under different scenarios. For this purpose, we simulated a wireless biomedical sensor network including 10 different patients.


A central computer gathers information about 4 different vital signs (namely ECG, BP, HR, and ST) for each patient and records them in a database. We suppose that these different vital signs require different priorities, and thus assigned them different weights. The weights assigned to Class 4, Class 3, Class 2, and Class 1 data packets were 0.4, 0.3, 0.2, and 0.1, respectively. The simulation time is set to 1000 s. Each queue size was set to 100 packets, and we consider exponentially weighted service time at each sensor node. At the beginning of the simulation, we assume all end sensor nodes (the patients) have the same priorities. The source priorities assigned to NORMAL, URGENT, and CRITICAL patients were set to 1, 2, and 3, respectively.

TABLE I. IMPACT OF THE SERVICE DIFFERENTIATION AND PRIORITIZATION ON NORMALIZED AVERAGE THROUGHPUT

Parameter | Value
Total average throughput of the proposed protocol | 0.95073
Average throughput of the proposed protocol for Class 4 | 0.381514
Average throughput of the proposed protocol for Class 3 | 0.28931
Average throughput of the proposed protocol for Class 2 | 0.187313
Average throughput of the proposed protocol for Class 1 | 0.094495
Total average throughput of the PCCP | 0.816194

TABLE II. IMPACT OF THE SERVICE DIFFERENTIATION AND PRIORITIZATION ON THE AVERAGE DELAY

Parameter | Value
Average delay of the proposed protocol for Class 4 | 1 ms
Average delay of the proposed protocol for Class 3 | 1.2 ms
Average delay of the proposed protocol for Class 2 | 1.3 ms
Average delay of the proposed protocol for Class 1 | 1.6 ms
Average delay of the PCCP | 1.3 ms

We evaluate the impact of service differentiation and prioritization in monitoring vital signs and physiological signals in a WBSN. We assume that all patients in the system are in the NORMAL condition. From the results in Table 1, we can observe that the proposed protocol can assign network bandwidth to each traffic class based on its weight (0.4 for Class 4, 0.3 for Class 3, 0.2 for Class 2, and 0.1 for Class 1). Class 4 has the highest throughput, while Class 1 has the lowest throughput (the check below illustrates this).
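The per-class shares implied by Table I can be checked directly (illustrative arithmetic only; the small gap between the summed class values and the reported total of 0.95073 is presumably rounding in the source):

```python
cls = {"Class4": 0.381514, "Class3": 0.28931,
       "Class2": 0.187313, "Class1": 0.094495}
total = sum(cls.values())                      # 0.952632, close to 0.95073
shares = {k: round(v / total, 3) for k, v in cls.items()}
print(shares)  # {'Class4': 0.4, 'Class3': 0.304, 'Class2': 0.197, 'Class1': 0.099}
```

The resulting shares of roughly 0.40/0.30/0.20/0.10 match the configured class weights, which is the claimed weighted-fairness behavior.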
Conclusion:
In this paper, we presented a service prioritization and congestion control protocol for wireless biomedical sensor networks involved in healthcare monitoring. At the local wireless device that gathers the patient's physiological data (the PDA in our description), the sensed vital signs and physiological signals are grouped into different classes. Using weighted scheduling mechanisms, higher priority classes are given a better quality of service and more bandwidth than the lower priority classes. Congestion is detected in advance, using a simple queue-based congestion detection strategy at each intermediate sensor node. Based on the current congestion degree and the priority of its child nodes, the parent node dynamically computes and allocates the transmission rate for each of its children. When the central computer, which maintains the physiological data for each patient, detects any anomaly in the received data, it sends a special message


to the particular patient's sensor node and increases the patient's priority. All sensor nodes along the path detect this change in situation and allocate more network bandwidth for the vital signs and physiological signals from the patients in need.

References:
1. Moghaddam, M.H.Y., and Adjeroh, D., "A Novel Congestion Control Protocol for Vital Signs Monitoring in Wireless Biomedical Sensor Networks," Proceeding of IEEE International Conference on Information Acquisition, 08 July 2010.
2. Lee, J.-H., and Jung, I.-B., "Adaptive-Compression Based Congestion Control Technique for Wireless Sensor Networks," Sensors 2010, 10, 2919-2945.
3. H. L. Ren, M. Q. H. Meng, and X. J. Chen, "Physiological Information Acquisition through Wireless Biomedical Sensor Networks," Proceeding of IEEE International Conference on Information Acquisition, July 2005.
4. M.H. Yaghmaee and Donald Adjeroh, "A New Priority Based Congestion Control Protocol for Wireless Multimedia Sensor Networks," in Proceedings, 9th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WOWMOM 2008), Newport Beach, CA, June 23-27, 2008.
5. B. Hull, K. Jamieson, and H. Balakrishnan, "Mitigating Congestion in Wireless Sensor Networks," in Proc. ACM SenSys '04, Baltimore, MD, Nov. 3-5, 2004.
6. C.-Y. Wan, S. B. Eisenman, and A. T. Campbell, "CODA: Congestion Detection and Avoidance in Sensor Networks," in Proc. ACM SenSys '03, Los Angeles, CA, Nov. 5-7, 2003.


A GAME THEORETIC FRAMEWORK FOR POWER CONTROL IN WIRELESS AD HOC NETWORKS

*Nithya Kumari. K  **Bhagyalakshmi. L
*M.E (Software Engineering), Easwari Engineering College, Anna University. {kumari.nithu@yahoo.co.in}, 9884987753
**Assistant Professor, Easwari Engineering College, Anna University. {suman_bl@yahoo.co.in}, 9841751064

Abstract—In infrastructure-less ad hoc networks, efficient usage of energy is very critical because of the limited energy available to the sensor nodes. Among the various phenomena that consume energy, radio communication is by far the most demanding one. One of the effective ways to limit unnecessary energy loss is to control the power at which the nodes transmit signals. In this paper, we apply game theory to solve the power control problem in a CDMA-based distributed sensor network. We formulate a noncooperative game under incomplete information and study the existence of Nash equilibrium. With the help of this equilibrium, we devise a distributed algorithm for optimal power control and prove that the system is power stable only if the nodes comply with certain transmit power thresholds. We show that even in a noncooperative scenario, it is in the best interest of the nodes to comply with these thresholds. The power level at which a node should transmit, to maximize its utility, is evaluated. Moreover, we compare the utilities when the nodes are allowed to transmit with discrete and continuous power levels; the performance with discrete levels is upper bounded by the continuous case. We define a distortion metric that gives a quantitative measure of the goodness of having finite power levels, and also find those levels that minimize the distortion. Numerical results demonstrate that the proposed algorithm achieves the best possible payoff/utility for the sensor nodes even while consuming less power.

Index Terms—Wireless ad hoc network, game theory, distributed power control, energy efficiency.
I. INTRODUCTION
The advancements in wireless communication technologies, coupled with the techniques for miniaturization of electronic devices, have enabled the development of low-cost, low-power, multifunctional sensor networks. The sensor nodes in these networks are equipped with sensing mechanisms that gather and process information. These nodes are also capable of communicating untethered over short distances [1]. Oftentimes, sensor networks are deployed at locations that do not allow human intervention due to the difficulty in accessing such areas; hence, refurbishing energy via replacing the battery is infeasible. As a result, these networks are deployed only once, with a finite amount of energy available to every sensor node. As energy is depleted for sensing, computing, and communication activity, the algorithms and protocols that are used must be as energy efficient as possible. Since the transmission of data signals consumes the most energy, transmission at the optimal transmit power level is very crucial. This is because a node will always try to transmit at high power levels just to make sure that the packets are delivered with a high success probability. Hence, smart power control algorithms must be employed that find the optimal transmit power level for a node for a given set of local conditions. Some distributed iterative power control algorithms have been


proposed for cellular networks; these algorithms investigate how to find the power vector for all the nodes that minimizes the total power with good convergence [2], [3]. In this respect, it is important that concepts from game theory are used to guide the design process of the nodes that work in a distributed manner. Ideas and fundamental results from game theory have been used for solving resource management problems in many computational systems, such as network bandwidth allocation, distributed database query optimization, and allocating resources in distributed systems such as clusters, grids, and peer-to-peer networks ([4], [5], [6], [7], [8], [9], [10], [11], and references therein). In a game theoretic framework, the nodes buy, sell, and consume goods in response to the prices that are exhibited in a virtual market. A node attempts to maximize its "profit" for taking a series of actions. Whether or not a node receives a profit is defined by the success of the action; for example, whether a packet is successfully received. The essence of this research is the application of game theory to achieve efficient energy usage through optimal selection of the transmit power level.

In this paper, we take a game theoretic approach to regulate the transmit power levels of the nodes in a distributed manner and investigate if any optimality is achievable. We focus on the problem of optimal power control in wireless sensor networks with the aim of maximizing the net utilities (defined later) for the rational sensor nodes. One may argue that the sensor nodes usually belong to the same authority, and hence can be programmed to negotiate strategies that are most advantageous for the entire network. However, this claim may not be applicable to the power control problem in sensor networks, as strategies for transmission power and negotiation for self-coexistence must be decided in real time and in a distributed manner [12], [13]. We adopt a noncooperative game model where each node tries to maximize its net utility. The net utility is computed by considering the benefit received and the cost. In summary, the contributions of this paper are as follows:

• We formulate a noncooperative game under incomplete information for the distributed sensor nodes. We define the benefit received and the cost incurred, and hence the net utility for successful packet transmission.

• We investigate the existence of Nash equilibrium [14]. We observe that there exist a transmission power threshold and a channel quality threshold that the nodes must comply with in order to achieve Nash equilibrium. We also observe that with repeated games in effect, sensor nodes follow the transmission strategies to achieve Nash equilibrium even without the presence of any third-party enforcement.

• Next, we consider a system that would allow only a finite number of discrete power levels. A metric called the distortion factor is defined to investigate the performance of such a system and compare it with systems that would allow any continuous power level. We also propose a technique to find the power levels that would minimize the distortion.

• We present numerical results to verify the performance of the proposed games. The results show that if the nodes comply with the transmit thresholds, net utility is maximized. Also, with the proposed mechanism of finding discrete power levels, the distortion factor is reduced.

II. GAME THEORY FOR AD HOC/SENSOR NETWORKS

Game theory has been successfully used in ad hoc and sensor networks for designing mechanisms to induce desirable equilibria by offering incentives to the forwarding nodes [15], [16], [17] and also by punishing nodes for misbehaving [18]. Recently, there has been a growing interest in applying game theoretic techniques to solve problems where there are agents/nodes that might not have the motive or incentive to cooperate. Such noncooperation is very likely, since the rational agents will not work (e.g., forward packets) for others unless, and until, convinced that such cooperation will eventually be helpful for themselves. In [13], Niyato et al. investigated energy harvesting technologies required for autonomous sensor networks using a noncooperative game theoretic technique. Nash equilibrium was proposed as the solution of this game to obtain the


optimal probabilities of the two states, viz., sleep and wake-up, that were used for energy conservation. Their solutions revealed that sensor nodes selfishly try to conserve energy at the expense of a high packet blocking probability. Xidong et al. applied a game theoretic dynamic power management (DPM) policy for distributed wireless sensor networks using repeated stage games.

As far as ad hoc networks are concerned, Buttyan and Hubaux [16] proposed the concept of virtual currency (called "nuglets"), which is a method to reward nodes participating in forwarding packets in a mobile ad hoc network. The Terminodes project [24] has proposed a method that encourages cooperation in ad hoc networks based on the principles laid out in [25]. It has been well established that incorporating pricing schemes (in terms of reward and penalty) can stimulate a cooperative environment, which benefits both the network and the nodes. A traffic-pricing-based approach was proposed in [26].

Though game theory has been used to study various aspects of ad hoc and sensor networks, there is no work that tries to find the optimal transmission power levels when the nodes are allowed both continuous and discrete power levels. The problem arises due to the difficulty in characterizing the information that each sensor node has about the others. Hence, seeking the desired operating point in the incomplete-information scenario becomes a challenge. Though there are several game theoretic power control approaches for cellular networks (see [31] and references therein), those centralized algorithms cannot be directly applied to sensor networks. In this paper, we attempt to develop a game theoretic framework that helps the nodes decide on the optimal power levels for a specified objective given by the utility function.

III. INTERFERENCE FOR RANDOMLY DISTRIBUTED NODES

We consider the problem of communication between neighboring nodes in a network that consists of sensor nodes scattered randomly over an area. Given that the sensor nodes have limited energy, buffer space, and other resources, contention-based protocols may not be a suitable option. Here, as an alternative, we use code division multiplexing, where distinct codes (signatures) can be allocated to different nodes with possible code reuse between spatially separated nodes. In general, due to the nonzero cross-correlation between node signatures, we understand that there is an upper limit on the number of simultaneously active nodes in the vicinity of a receiver (i.e., within the interference range of a receiver) so that the received SINR stays above a minimum operational threshold.

To obtain the node distribution, we use the following assumptions and definitions:
• All nodes have an omnidirectional transmit and receive antenna of the same gain.
• The receiving and interference ranges for each sensor node depend on the transmission power of the sender and of the other sensor nodes in the vicinity.
• The receiving distance is defined as the maximum distance from which a receiving node can correctly recover a transmitted signal.
• The interference distance is defined as the maximum distance from which a receiving node can sense a carrier.
• The signal power level at each receiver is controlled by the corresponding transmitter and is equal to the lowest possible operational threshold. Since the internodal distance varies randomly, the required transmit power is different for different transmitter-receiver pairs.

Fig. 1 shows node w as the receiver under consideration. Node u, while transmitting to node v, acts as an interferer to node w. Note that the reverse need not necessarily be true, since the transmission powers of node u and node w can be different. A rigorous treatment of the distribution of interference power can be found in [32]. (A small sketch of the received-SINR computation follows.)

Fig. 1. Interference at node w from a local neighbor node u.

311
th
PROCEEDINGS OF 4 NATIONAL CONFERENCE ON HIGH PERFORMANCE
COMPUTING on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

IV. NONCOOPERATIVE GAME UNDER INCOMPLETE INFORMATION

With the average number of interferers for a node known, we formulate a game for such a distributed sensor network and then try to devise the game strategies to find whether any steady-state equilibrium exists for this game model.

A. Game Formulation

We assume a set of homogeneous nodes in our sensor network playing a repeated game. The information from previous rounds is used to devise strategies in future rounds. We focus our attention on a particular node with potentially as many as N neighbors within the interference range. Due to the homogeneity of the nodes, the actions allowed to the nodes are the same, i.e., every node can transmit with any power level to make its transmission successful. Also, the nodes have no information on whether the other nodes are transmitting, leading to an incomplete-information scenario [34]. If the nodes transmit with an arbitrarily high power level, they will increase the interference level of the other nodes. The neighboring nodes in turn will transmit at higher power to overcome the effect of the high interference. Soon, this will lead to a noncooperative situation. To control this noncooperative behavior, we try to devise an equilibrium game strategy which will impose constraints on the nodes to act in a cooperative manner even in a noncooperative network.
We assume the existence of some strategy sets S_1, S_2, ..., S_{N+1} for the nodes 1, 2, 3, 4, 5, ..., (N+1). These sets consist of all possible power levels ranging from the minimum transmit power P_min to the maximum transmit power P_max. Note that a chosen power p_i can be zero; p_i = 0 implies that a node decides not to transmit at that game iteration. However, the question is: will there be any finite value of p_i that the nodes will follow and still be able to maximize their benefits? In this game, if node 1 chooses its power level p_1, node 2 chooses its power level p_2, and so on, we can describe such a set of strategies chosen by all (a node with its neighbors) nodes as one ordered (N+1)-tuple,

    s = (p_1, p_2, ..., p_{N+1}).    (1)

This vector of individual strategies is called a strategy profile (or sometimes a strategy combination). For every different combination of individual choices of strategies, we would get a different strategy profile s. The set of all such strategy profiles is called the space of strategy profiles S. It is simply the Cartesian product of the power vectors for each node. We write it as

    S = S_1 x S_2 x ... x S_{N+1}.    (2)

We consider that the strategy profile of all the nodes is identical, i.e., all the nodes can transmit with a power level between P_min and P_max. Since all the nodes are identical, we assume that the same set of allowable transmit powers applies to all the nodes. Hence, S_i = S_j, where i and j denote any two nodes, and we denote by S* this common strategy set of any node. Then, (2) reduces to

    S = (S*)^{N+1}.    (3)

B. Utility

The game is played by having all the nodes simultaneously pick their individual strategies. This set of choices results in some strategy profile s in S, which we call the outcome of the game. Each node has a set of preferences over these outcomes. At the end of an action, each node receives a utility value u_i(p_i, s_{-i}), where s_{-i} is the strategy profile of all the nodes but the ith node. Note that the utility each node receives depends not only on the strategy it picked, but also on the strategies which all the other nodes picked. In other
words, the utility to any one node depends on the entire strategy profile. The individual utilities of all the nodes for a particular strategy profile define a utility vector for that strategy profile.
With the notation already defined, and to emphasize that the ith node has control over its own power level only, we define the utility (if a node is transmitting) as [36]

    u_i(p_i, s_{-i}) = (L R / (M p_i)) f(γ),

where node i is transmitting to a node j, L is the number of information bits in a packet of size M bits, R is the transmission rate in bits/sec using strategy p_i, and f(γ) is the efficiency function, which increases with the expected SINR γ of the receiving node. We define the efficiency function in terms of P_e, the bit error rate (BER). The bit error rate P_e depends on the channel state and the interference from other nodes; in other words, it is a function of the SINR.

C. Net Utility

With the utility of a node defined, let us consider the cost/penalty incurred by a node. We assume that each sensor node tries to maximize its own utility by adjusting its own power optimally, as given by the utility function. The utility function from a sensor node's perspective takes into account the interference it gets from other nodes; however, it ignores the cost that this node imposes on itself in terms of drainage of energy. Pricing (or a regulating cost) has been shown to be effective in regulating this externality, as it encourages the nodes to use resources more efficiently. We use pricing (cost) as a negative incentive signal to model the usage-based cost that a sensor node must pay for using the resource. Hence, we consider a cost component that accounts for the energy consumed/drained by the sensor nodes with the usage of resources (transmission power). Therefore, we define a metric, the net utility, which is the utility achieved minus the cost incurred. This justifies the rational (self-optimizing) behavior of a sensor node even in the distributed scenario.
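As a concrete reading of this definition, the sketch below evaluates the net utility of a candidate power level as the utility minus a usage-based cost. The bits-per-joule utility shape follows the definition above; the DPSK bit-error model, the SINR model, and all constants are illustrative assumptions, and the three cost forms anticipate the linear, quadratic, and exponential prices considered later in the numerical results.

    import math

    def dpsk_ber(gamma):
        # Standard DPSK bit error rate at SINR gamma.
        return 0.5 * math.exp(-gamma)

    def utility(p, L=800, M=1000, R=10_000, gain=200.0, interference=1.0):
        # u_i = (L*R / (M*p)) * f(gamma): throughput per unit power.
        gamma = gain * p / interference          # assumed SINR model
        f = (1.0 - dpsk_ber(gamma)) ** M         # efficiency of an M-bit packet
        return (L * R / (M * p)) * f

    # Usage-based prices (coefficients are assumptions):
    COSTS = {
        "linear": lambda p: 2e4 * p,
        "quadratic": lambda p: 2e5 * p * p,
        "exponential": lambda p: 50.0 * (math.exp(40.0 * p) - 1.0),
    }

    def net_utility(p, cost=COSTS["linear"]):
        # Net utility: utility achieved minus cost incurred.
        return utility(p) - cost(p)

    levels = [0.001, 0.005, 0.020, 0.030, 0.050, 0.100]  # watts (1-100 mW)
    best = max(levels, key=net_utility)
    print(f"best response: {best * 1e3:.0f} mW")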
First, we try to find the probability of successful transmission. We assume that node i is transmitting to node j. Node j hears not only from node i but also from other neighboring nodes if they are transmitting; these signals appear as interference. The probability of successful transmission of a packet containing M bits from node i to node j can be given by

    P_s = (1 - P_e)^M.

For simplicity, the bit corruption is assumed to be independently and identically distributed. It is clear from the above definition that with increased SINR the bit error probability decreases, which in turn increases the probability of successful transmission, and vice versa.
With the probability of successful transmission defined, we need to find the desired transmit power level for a link over which the packets are to be transmitted. Before doing so, let us try to find the expected power consumption. We consider a scenario where a node is allowed to retransmit a packet if a transmission is unsuccessful, and it continues to retransmit until the transmission is successful. Let the power level chosen by the transmitter node be P, and suppose there are (k - 1) unsuccessful transmissions followed by a successful transmission. Then, the expected power consumption by the transmitter node is

    E[P] = P · Σ_{k=1..∞} k P_s (1 - P_s)^{k-1} = P / P_s.

With the expected power consumption given above, we define the expected power efficiency for power level P as an inverse function of the expected power consumption.
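This retransmission argument can be checked numerically. A minimal sketch, reusing the same illustrative DPSK/SINR assumptions as before: the number of attempts until success is geometric, so the expected consumption is P / P_s, and the power maximizing the expected power efficiency is the one minimizing that quantity over a grid of levels.

    import math

    def success_probability(p, M=1000, gain=200.0, interference=1.0):
        # P_s = (1 - P_e)^M with i.i.d. bit errors; DPSK BER at assumed SINR.
        gamma = gain * p / interference
        ber = 0.5 * math.exp(-gamma)
        return (1.0 - ber) ** M

    def expected_power(p):
        # Geometric retries until success: E[P] = P / P_s.
        ps = success_probability(p)
        return p / ps if ps > 0.0 else float("inf")

    # Expected power efficiency is the inverse of expected consumption,
    # so its maximizer is the minimizer of expected_power.
    levels = [i / 1000.0 for i in range(1, 101)]   # 1 mW .. 100 mW
    p_star = min(levels, key=expected_power)
    print(f"optimal transmit power ~ {p_star * 1e3:.0f} mW")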
V. DETECTING TRANSMISSION POWER
Then, the optimal transmit power is the power level which will maximize the expected power efficiency.

A. Distortion Factor

We define the distortion factor as the difference between the best possible net utility obtainable with a continuous power level and the best possible net utility obtained with discrete power levels. Given the transmission powers in the continuous and discrete cases, respectively, as p_c and p_d, the distortion factor for the ith node is represented by

    DF_i = NU_i(p_c, s_{-i}) - NU_i(p_d, s_{-i}),

where NU_i denotes the net utility of node i and s_{-i} represents the strategy profile of the rest of the nodes. With an increase in the number of power levels, the distortion can be reduced.

Fig. 2. Average bit error rate.
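A direct rendering of this definition, assuming some net-utility function such as the one sketched in Section IV: the distortion factor is the gap between the best net utility over a fine (effectively continuous) power grid and the best over the allowed discrete levels.

    def distortion_factor(net_utility, discrete_levels,
                          p_min=0.001, p_max=0.1, steps=10_000):
        # DF = best net utility (continuous) - best net utility (discrete).
        grid = [p_min + (p_max - p_min) * k / steps for k in range(steps + 1)]
        best_continuous = max(net_utility(p) for p in grid)
        best_discrete = max(net_utility(p) for p in discrete_levels)
        return best_continuous - best_discrete

    # With more allowed levels, the discrete maximum approaches the
    # continuous one, so the distortion factor shrinks toward zero.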
Fig. 3. Successful frame transmission probability.

VI. NUMERICAL RESULTS

We consider that the sensor nodes can transmit uniformly in the range between the minimum and maximum power levels. We assume that the SINR received by the nodes is uniformly distributed between a minimum and a maximum value; the SINR is assumed to range from -12.5 dB to 11.5 dB. For our calculation, we assume fixed values for the remaining packet parameters.
Figs. 2 and 3 show the average bit error rate and the probability of successful transmissions, respectively, for different values of the SINR (in dB) perceived by node j from all its neighboring nodes. We show the results for two different modulation schemes: DPSK and noncoherent PSK. As expected, with improvement in the channel condition, i.e., with increasing SINR, the probability of successful transmission increases.
Fig. 4 presents the maximum power efficiency for both schemes. More precisely, from the graphs, we find that if the SINR is low and the transmitting power P is high, then the power efficiency is almost equal to zero. This supports our earlier claim that during bad channel conditions, or below a certain threshold channel condition (when the SINR of the intended receiver node is very low), a node should not transmit: transmitting only increases its power consumption, and the expected power consumption is then no longer minimized. On the contrary, when the SINR is high, a node should transmit with low power to maximize its power efficiency; in this case, increasing the transmitting power unnecessarily will decrease the power efficiency below its maximum.
It is intuitive that there will be an optimal value of P beyond which the net utility will only decrease. This figure serves as a guideline for calculating the desired transmitting power to maximize the net utility for a node i transmitting to node j, given the strategies taken by all other nodes.

Fig. 4. Maximum efficiency for DPSK.

For finding the best response to the strategies adopted by other nodes, we assume a subset of nodes to be active and operating with fixed strategies. Fig. 6 shows the effect of having nonuniform power levels. We choose 1, 5, 20, 30, 50, and 100 mW as the power levels. For our calculation, we varied the transmitting power from 1 to 100 mW. We find that there exist points for each of the cost functions considered (i.e., linear, quadratic, and exponential) which give the maximum net utility, given the strategies taken by all other nodes
as fixed. This desired transmitting power level gives the best response for the node. If a node unilaterally changes its strategy and does not transmit with this transmitting power level, then the node will not get its best response and will not be able to reach Nash equilibrium, even if a Nash equilibrium exists for this model.
Fig. 5 plots the net utility against the transmission power for a fixed received power. We compare the continuous power level with two sets of discrete power levels; one set has six power levels and the other has 20. The power levels are uniformly spaced between the maximum and the minimum. As expected, with more allowed power levels, the maximum net utility gets closer to that obtained with continuous power levels. Here, we also compare our proposed mechanism of finding discrete power levels based on the interference distribution against uniformly spaced discrete power levels. The results show that the distortion factor is reduced with an increase in the number of power levels. Moreover, the distortion obtained is further reduced if knowledge of the interference is used instead of uniformly spaced power levels.

Fig. 5. Net utility for continuous and (6, 20) discrete power levels.

VII. CONCLUSIONS

In this paper, we presented a game-theoretic approach to solve the power control problem encountered in sensor networks. We used noncooperative games with incomplete information and studied the behavior and existence of Nash equilibrium. We found that Nash equilibrium exists if we assume a minimum threshold for the channel condition and a maximum threshold for the power level. We suggest that a node should only transmit when its channel condition is better than the minimum threshold and its transmission power level is below the threshold power level. We evaluated the desired power level at which the nodes should transmit to maximize their utilities under any given condition.

Fig. 6. Net utility for nonuniform power levels.

Numerical results demonstrate that the proposed algorithm achieves the best possible payoff/utility for the sensor nodes even while consuming less power. We also analyzed the case where nodes are allowed discrete power levels, as in most practical systems, and compared its performance with that of continuous power levels.

VIII. REFERENCES

[1] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Comm. Magazine, vol. 40, no. 8, pp. 102-114, Aug. 2002.
[2] A. Sampath, P.S. Kumar, and J. Holtzman, "Power Control and Resource Management for a Multimedia CDMA Wireless System," Proc. IEEE Int'l Symp. Personal, Indoor and Mobile Radio Communications (PIMRC), vol. 1, pp. 21-25, Sept. 1995.
[3] R. Yates, "A Framework for Uplink Power Control in Cellular Radio Systems," IEEE J. Selected Areas in Comm., vol. 13, no. 7, pp. 1341-1348, Sept. 1995.
[5] S. Clearwater, Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific, 1996.
[6] F. Kelly, A. Maulloo, and D. Tan, "Rate Control in Communications Networks: Shadow Prices, Proportional Fairness and Stability," J. Operations Research Soc., vol. 49, pp. 237-252, 1998.
[7] P. Key and D. McAuley, "Differential QoS and Pricing in Networks: Where Flow Control Meets Game Theory," IEE Proc. Software, vol. 146, no. 1, pp. 39-43, Feb. 1999.
[8] H. Lin, M. Chatterjee, S. Das, and K. Basu, "ARC: An Integrated Admission and Rate Control Framework for CDMA Data Networks Based on Non-Cooperative Games," Proc. Ninth Ann. Int'l Conf. Mobile Computing and Networking, pp. 326-338, 2003.
[9] R. Maheswaran and T. Basar, "Decentralized Network Resource Allocation as a Repeated Noncooperative Market Game," Proc. 40th IEEE Conf. Decision and Control, vol. 5, pp. 4565-4570, 2001.
[10] M. Stonebraker, R. Devine, M. Kornacker, W. Litwin, A. Pfeffer, A. Sah, and C. Staelin, "An Economic Paradigm for Query Processing and Data Migration in Mariposa," Proc. Third Int'l Conf. Parallel and Distributed Information Systems, pp. 58-67, Sept. 1994.
[11] R. Subrata, A. Zomaya, and B. Landfeldt, "Game-Theoretic Approach for Load Balancing in Computational Grids," IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 1, pp. 66-76, Jan. 2008.
[12] S. Sengupta and M. Chatterjee, "An Economic Framework for Dynamic Spectrum Access and Service Pricing," IEEE/ACM Trans. Networking, vol. 17, no. 4, pp. 1200-1213, Aug. 2009.
[13] M. Kubisch, H. Karl, A. Wolisz, L. Zhong, and J. Rabaey, "Distributed Algorithms for Transmission Power Control in Wireless Sensor Networks," Proc. IEEE Wireless Comm. and Networking Conf., vol. 1, pp. 558-563, 2003.
[14] D. Niyato, E. Hossain, M. Rashid, and V. Bhargava, "Wireless Sensor Networks with Energy Harvesting Technologies: A Game-Theoretic Approach to Optimal Energy Management," IEEE Wireless Comm., vol. 14, no. 4, pp. 90-96, Aug. 2007.
[15] J. Nash, "Equilibrium Points in N-Person Games," Proc. Nat'l Academy of Sciences, vol. 36, pp. 48-49, 1950.
S. Buchegger and J. Le Boudec, "Performance Analysis of the CONFIDANT Protocol," Proc. Third ACM Int'l Symp. Mobile Ad Hoc Networking & Computing, pp. 226-236, 2002.
[16] L. Buttyan and J.P. Hubaux, "Nuglets: A Virtual Currency to Stimulate Cooperation in Self-Organized Mobile Ad-Hoc Networks," Technical Report DSC/2001/001, Swiss Fed. Inst. of Technology, Jan. 2001.
[17] W. Wang, M. Chatterjee, and K. Kwiat, "Enforcing Cooperation in Ad Hoc Networks with Unreliable Channel," Proc. Fifth IEEE Int'l Conf. Mobile Ad-Hoc and Sensor Systems (MASS), pp. 456-462, 2008.
[18] V. Srinivasan, P. Nuggehalli, C. Chiasserini, and R. Rao, "Cooperation in
Wireless Ad Hoc Networks," Proc. IEEE INFOCOM, vol. 2, pp. 808-817, Apr. 2003.
[19] L. Blazevic, L. Buttyan, S. Capkun, S. Giordiano, J. Hubaux, and J. Le Boudec, "Self-Organization in Mobile Ad-Hoc Networks: The Approach of Terminodes," IEEE Comm. Magazine, vol. 39, no. 6, pp. 166-174, June 2001.
[20] J. Crowcroft, R. Gibbens, F. Kelly, and S. Ostring, "Modelling Incentives for Collaboration in Mobile Ad Hoc Networks," Proc. Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt '03), 2003.
[21] F. Meshkati, H. Poor, S. Schwartz, and N. Mandayam, "An Energy-Efficient Approach to Power Control and Receiver Design in Wireless Data Networks," IEEE Trans. Comm., vol. 53, no. 11, pp. 1885-1894, Nov. 2005.
[22] S. De, C. Qiao, D. Pados, M. Chatterjee, and S. Philip, "An Integrated Cross-Layer Study of Wireless CDMA Sensor Networks," IEEE J. Selected Areas in Comm. (JSAC), Special Issue on Quality of Service Delivery in Variable Topology Networks, vol. 22, no. 7, pp. 1271-1285, Sept. 2004.
[23] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
Y. Xing and R. Chandramouli, "Distributed Discrete Power Control for Bursty Transmissions over Wireless Data Networks," Proc. IEEE Int'l Conf. Comm. (ICC), vol. 1, pp. 139-143, 2004.
SMABS: SECURE MULTICAST AUTHENTICATION BASED ON BATCH SIGNATURE

*R. Uma, Student, Master of Engineering in Computer Science, S.A. Engineering College, uma_devi1985@yahoo.com
**L. Paul Jasmine Rani, Senior Lecturer, Master of Engineering in Computer Science, S.A. Engineering College, pauljasminerani@yahoo.co.in
Abstract—Conventional block-based multicast authentication schemes overlook the heterogeneity of receivers by letting the sender choose the block size, divide a multicast stream into blocks, associate each block with a signature, and spread the effect of the signature across all the packets in the block through hash graphs or coding algorithms. The correlation among packets makes them vulnerable to packet loss, which is inherent in the Internet and wireless networks. Moreover, the lack of Denial of Service (DoS) resilience renders most of them vulnerable to packet injection in hostile environments. In this paper, we propose a novel multicast authentication protocol, namely SMABS, including two schemes. The basic scheme (SMABS-B) eliminates the correlation among packets and thus provides perfect resilience to packet loss; it is also efficient in terms of latency, computation, and communication overhead due to an efficient cryptographic primitive called batch signature, which supports the authentication of any number of packets simultaneously. We also present an enhanced scheme, SMABS-E, which combines the basic scheme with a packet filtering mechanism to alleviate the DoS impact while preserving the perfect resilience to packet loss.

Keywords—Multimedia, multicast, authentication, signature.

I. INTRODUCTION

MULTICAST [1] is an efficient method to deliver multimedia content from a sender to a group of receivers and is gaining popularity in applications such as real-time stock quotes, interactive games, video conferencing, live video broadcast, and video on demand. Authentication is one of the critical topics in securing multicast [2], [3], [4], [5], [6], [7] in an environment attractive to malicious attacks. Basically, multicast authentication may provide the following security services:
1. Data integrity: Each receiver should be able to assure that received packets have not been modified during transmission.
2. Data origin authentication: Each receiver should be able to assure that each received packet comes from the real sender, as it claims.
3. Nonrepudiation: The sender of a packet should not be able to deny sending the packet to receivers in case there is a dispute between the sender and receivers.
All three services can be supported by an asymmetric key technique called a signature. In an ideal case, the sender generates a signature for each packet with its private key, which is called signing, and each receiver checks the validity of the signature with the sender's public key, which is called verifying. If the verification succeeds, the receiver knows the packet is authentic.
Designing a multicast authentication protocol is not an easy task. Generally, the following real-world issues challenge the design. First, efficiency needs to be considered, especially for receivers. Compared with the multicast sender, which could be a powerful server, receivers can have different capabilities and resources. The receiver heterogeneity requires that the multicast authentication protocol be able to execute not only on powerful desktop computers but also on resource-constrained mobile handsets. In particular, latency, computation, and communication overhead are major issues to be considered. Second, packet loss is inevitable. In the Internet, an overloaded router drops buffered packets according to its preset control policy. Though TCP provides a certain retransmission capability, multicast content is mainly transmitted over UDP, which does not provide any loss recovery support. In mobile environments, the situation is even worse. The instability of the wireless channel can cause packet loss very frequently. Moreover, the smaller data
rate of the wireless channel increases the congestion possibility. This is not desirable for applications like real-time online streaming or stock quote delivery. Therefore, for applications where the quality of service is critical to end users, a multicast authentication protocol should provide a certain level of resilience to packet loss. Efficiency and packet loss resilience can hardly be supported simultaneously by conventional multicast schemes. In order to reduce computation overhead, conventional schemes use efficient signature algorithms [8], [9] or amortize one signature over a block of packets [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26] at the expense of increased communication overhead [8], [9], [10], [11] or vulnerability to packet loss [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26].
Another problem with the schemes in [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] is that they are vulnerable to packet injection by malicious attackers. An attacker may compromise a multicast system by intentionally injecting forged packets to consume receivers' resources, leading to Denial of Service (DoS); this comes on top of the efficiency requirement and the packet loss problem. In the literature, some schemes attempt to provide DoS resilience. However, they still have the packet loss problem, because they are based on the same approach as previous schemes [10], [11], [22], [23], [24], [25].
Recently, we demonstrated that batch signature schemes can be used to improve the performance of broadcast authentication [5], [6]. In this paper, we present our comprehensive study on this approach and propose a novel multicast authentication protocol called SMABS (short for Secure Multicast Authentication Based on Batch Signature). SMABS includes two schemes. The basic scheme (called SMABS-B hereafter) utilizes an efficient asymmetric cryptographic primitive called batch signature, which supports the authentication of any number of packets simultaneously with one signature verification, to address the efficiency and packet loss problems in general environments. The enhanced scheme (called SMABS-E hereafter) combines SMABS-B with packet filtering to alleviate the DoS impact in hostile environments. SMABS provides data integrity, origin authentication, and nonrepudiation, as previous asymmetric-key-based protocols do. In addition, we make the following contributions:
1. Our SMABS can achieve perfect resilience to packet loss in lossy channels, in the sense that no matter how many packets are lost, the already-received packets can still be authenticated by receivers.
2. SMABS-B is efficient in terms of latency, computation, and communication overhead. Though SMABS-E is less efficient than SMABS-B, since it includes the DoS defense, its overhead is still at the same level as previous schemes.
3. We propose two new batch signature schemes based on BLS [36] and DSA [38] and show they are more efficient than the batch RSA [33] signature scheme.
The rest of the paper is organized as follows: We briefly review related work in Section 2. Then, we present a basic scheme for lossy channels in Section 3, which also includes three batch signature schemes based on RSA [33], BLS [36], and DSA [38], respectively. An enhanced scheme is discussed in Section 4. After the performance evaluation in Section 5, the paper is concluded in Section 6.

II. RELATED WORKS

Schemes in [8], [9] follow the ideal approach of signing and verifying each packet individually, but reduce the computation overhead at the sender by using one-time signatures [8] or k-time signatures [9]. They are suitable for RSA [33], which is expensive on signing while cheap on verifying. For each packet, however, each receiver needs to perform one or more verifications on its one-time or k-time signature plus one ordinary signature verification. Moreover, the length of a one-time signature is too long (on the order of 1,000 bytes).
Tree chaining was proposed in [10], [11] by constructing a tree for a block of packets. The root of the tree is signed by the sender. Each packet carries the signed root and multiple hashes. When each receiver receives one packet in the block, it uses the authentication information in the packet to authenticate it. The buffered authentication information is further used to authenticate other packets in the same block.
Graph chaining was studied in [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. A multicast stream is divided into blocks and each block is associated with a signature. In each block, the hash of each packet is embedded into several other packets in a deterministic or probabilistic way. The hashes form a graph, in which each path links a packet to the block signature. Each receiver verifies the block signature and authenticates the packets through the hash paths.
Erasure codes were used in [22], [23], [24], [25]. A signature is generated for the concatenation of the
hashes of all the packets in one block, and the signature is then erasure-coded into many pieces.
All these schemes [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] are indeed computationally efficient, since each receiver needs to verify only one signature for a block of packets. However, they all increase packet overhead for hashes or erasure codes, and the block design introduces latency when buffering many packets. Another major problem is that most schemes [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] are vulnerable to packet loss even though they are designed to tolerate a certain level of packet loss. If too many packets are lost, other packets may not be authenticated. In particular, if a block signature is lost, the entire block cannot be authenticated.
Moreover, the previous schemes [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] target lossy channels, which are realistic in our daily life since the Internet and wireless networks suffer from packet loss. In a hostile environment, however, an active attacker can inject forged packets to consume receivers' resources, leading to DoS. In particular, the schemes in [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21] are vulnerable to forged signature attacks, because they require each receiver to verify each signature in order to authenticate data packets, and the schemes in [22], [23], [24], [25] suffer from packet injection, because each receiver has to distinguish a certain number of valid packets from a large pool of packets including injected ones, which is very time-consuming.
In order to deal with DoS, several schemes have been proposed. PARM is similar to the tree chaining scheme [10], [11] in the sense that multiple one-way hash chains are used as shared keys between the sender and receivers. Unfortunately, these schemes are still vulnerable to DoS, because they require that one-way hash chains be signed and transmitted to each receiver, and therefore an attacker can inject forged signatures for the one-way hash chains. PRABS [30] uses distillation codes to deal with DoS. LTT [32] uses error correction codes to replace the erasure codes in the schemes of [22], [23], [24], [25], the reason being that error correction codes tolerate erroneous packets. These three schemes are resilient to DoS, but they still have the packet loss problem.
In this paper, we focus on the signature approach. Though confidentiality is another important issue for securing multicast, it can be achieved through group key management; here, we focus on multicast authentication.

III. BASIC SCHEME

Our target is to authenticate multicast streams from a sender to multiple receivers. Generally, the sender is a powerful multicast server managed by a central authority and can be trusted. The sender signs each packet with a signature and transmits it to multiple receivers through a multicast routing protocol. Each receiver is a less powerful device with resource constraints and may be managed by a non-trustworthy person. Each receiver needs to assure that the received packets are really from the sender (authenticity) and that the sender cannot deny the signing operation (nonrepudiation), by verifying the corresponding signatures.
Ideally, authenticating a multicast stream can be achieved by signing and verifying each packet. However, the per-packet signature design has been criticized for its high computation cost, and therefore most previous schemes [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] incorporate a block-based design, as shown in Section 2. They do reduce the computation cost, but they also introduce new problems. First, the block design builds up correlation among packets and makes them vulnerable to packet loss, which is inherent in the Internet and wireless networks. Second, the heterogeneity of receivers means that the buffer resource at each receiver is different and can vary over time depending on the overall load at the receiver; in the block design, the required block size, which is chosen by the sender, may not be satisfiable by each receiver. Third, the correlation among packets can incur additional latency. Consider a high-layer application that needs new data from the low-layer authentication module in order to render a smooth video stream to the client user. It is desirable that the lower-layer authentication module deliver authenticated packets to the high-layer application at the time when the high-layer application needs new data. In the per-packet signature design this is not a problem, since each packet is independently verifiable at any time. In the block design, however, it is possible that the packets buffered at the low-layer authentication module are not verifiable, because the correlated packets, especially the block signatures, have not been received. Therefore, the high-layer application has to either wait, which leads to additional latency, or return with a no-available-packets exception, which could be interpreted as meaning the buffered packets are "lost."
In view of the problems with the sender-favored block-based approach, we conceive a receiver-
oriented approach by taking into account the heterogeneity of the receivers. As receiving devices have different computation and communication capabilities, some could be powerful desktop computers, while others could be cheap handsets with limited buffers and low-end CPUs.
In order to fulfill this requirement, the basic scheme SMABS-B uses an efficient cryptographic primitive called batch signature, which supports simultaneously verifying the signatures of any number of packets. In particular, when a receiver collects n packets

    P_i = {m_i, σ_i}, i = 1, ..., n,

where m_i is the data payload, σ_i is the corresponding signature, and n can be any positive integer, it can input them into an algorithm

    BatchVerify(P_1, P_2, ..., P_n) ∈ {True, False}.

If the output is True, the receiver knows the n packets are authentic, and otherwise not.
To support authenticity and efficiency, the BatchVerify() algorithm should satisfy the following properties:
1. Given a batch of packets that have been signed by the sender, BatchVerify() outputs True.
2. Given a batch of packets including some unauthentic packets, the probability that BatchVerify() outputs True is very low.
3. The computation complexity of BatchVerify() is comparable to that of verifying one signature and increases only gradually as the batch size n increases.
The computation complexity of BatchVerify() comes with the fact that there is some additional cost for processing multiple packets. As we will show later, those additional computations are mostly modular additions and multiplications, which are much faster than the modular exponentiations required in the final signature verifications. Theoretically, a concern arises if this cost grows beyond the final signature verification when the batch size is too large. However, this is not the case in practice. In order to show the merit of signature preaggregation, we implemented batch signature using our Batch-BLS (discussed later) as an example. We measured the normalized time cost of batch signature verification with the batch size growing from 1 to 1,000, and recorded the results for two scenarios, with and without signature preaggregation.
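The following sketch shows the receiver-side contract this interface implies: packets accumulate in a buffer with no chaining between them, and the whole buffer is checked with a single BatchVerify call whenever the application asks for data. The class and the packet layout are illustrative assumptions, not the paper's implementation; batch_verify is a placeholder for any of the three concrete instantiations described next.

    from dataclasses import dataclass

    @dataclass
    class Packet:
        payload: bytes     # m_i
        signature: object  # sigma_i

    class SmabsReceiver:
        """Sketch of a SMABS-B style receiver: per-packet signatures,
        verified in one batch whenever the application needs data."""

        def __init__(self, batch_verify):
            self.batch_verify = batch_verify  # pluggable batch signature check
            self.buffer = []

        def on_packet(self, packet):
            # No chaining state: losing other packets never blocks this one.
            self.buffer.append(packet)

        def deliver(self):
            """Called by the high-layer application; returns authentic payloads."""
            batch, self.buffer = self.buffer, []
            if batch and self.batch_verify(batch):
                return [p.payload for p in batch]
            return []  # a failed batch would be handled by SMABS-E filtering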
SMABS-B uses a per-packet signature instead of a per-block signature and thus eliminates the correlation among packets. This packet independence makes SMABS-B perfectly resilient to packet loss. The Internet and wireless channels tend to be lossy due to congestion or channel instability, where packets can be lost according to different loss models, such as random loss or burst loss. This is a significant advantage over previous schemes [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. Meanwhile, efficiency can also be achieved, because a batch of packets can be authenticated simultaneously through one batch signature verification operation. The packet independence also brings other benefits in terms of smaller latency and communication overhead compared with previous schemes [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [32]. In particular, each receiver can verify the authenticity of all the received packets in its buffer whenever the high-layer applications require, and there is no additional hash or code overhead in each packet.
Next, we present three implementations. In addition to the one based on RSA [33], we propose two new batch signature schemes based on BLS [36] and DSA [38], which are more efficient than batch RSA. We must point out, and will show later, that SMABS is independent from these signature algorithms. This independence brings the freedom to optimize SMABS for a particular physical system or platform as much as possible.

Batch RSA Scheme

RSA

RSA [33] is a very popular cryptographic algorithm in many security protocols. In order to use RSA, a sender chooses two large random primes P and Q to get N = PQ, and then calculates two exponents e and d such that ed = 1 mod φ(N), where φ(N) = (P - 1)(Q - 1). The sender publishes (e, N) as its public key and keeps d secret as its private key. A signature of a message m can be generated as σ = (h(m))^d mod N, where h() is a collision-resistant hash function. The sender sends {m, σ} to a receiver, which can verify the authenticity of the message m by checking σ^e = h(m) mod N.

Batch RSA

To accelerate the authentication of multiple signatures, the batch verification of RSA [34], [35] can be used. Given n packets {m_i, σ_i}, i = 1, ..., n, where m_i is the data payload, σ_i is the corresponding signature, and n is any positive integer, the receiver can first calculate h_i = h(m_i) and then perform the following verification:

    (Π_{i=1..n} σ_i)^e = Π_{i=1..n} h_i mod N.

If all n packets are truly from the sender, the equation holds, because

    (Π σ_i)^e = Π σ_i^e = Π (h_i^d)^e = Π h_i mod N.
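A runnable toy of this screening check follows (Python 3.8+). The hard-coded primes are illustrative assumptions, far too small for real use, and the unpadded hash is demo-only; real deployments use 1,024-bit or larger moduli with proper padding.

    import hashlib
    from math import prod

    # Toy RSA key (demo primes only).
    P, Q = 1_000_003, 1_000_033
    N = P * Q
    PHI = (P - 1) * (Q - 1)
    E = 65537
    D = pow(E, -1, PHI)  # e*d = 1 mod phi(N)

    def h(message: bytes) -> int:
        # Collision-resistant hash mapped into Z_N (demo only, no padding).
        return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

    def sign(message: bytes) -> int:
        return pow(h(message), D, N)  # sigma = h(m)^d mod N

    def batch_verify(packets) -> bool:
        """Screening: (prod sigma_i)^e == prod h(m_i) (mod N)."""
        sigmas = prod(s for _, s in packets) % N
        hashes = prod(h(m) for m, _ in packets) % N
        return pow(sigmas, E, N) == hashes

    msgs = [b"pkt-%d" % i for i in range(5)]
    packets = [(m, sign(m)) for m in msgs]
    assert batch_verify(packets)             # authentic batch passes
    packets[2] = (b"forged", packets[2][1])
    assert not batch_verify(packets)         # tampering detected (w.h.p.)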
Before the batch verification, the receiver must ensure that all the messages are distinct; otherwise, batch RSA is vulnerable to a forgery attack. It has been proved that when all the messages are distinct, batch RSA is resistant to signature forgery as long as the underlying RSA algorithm is secure.
In some circumstances, an attacker may not forge signatures but may manipulate authentic packets to produce invalid signatures. For example, given two packets {m_i, σ_i} and {m_j, σ_j} for i ≠ j, the attacker can modify them into {m_i, σ_i·t} and {m_j, σ_j·t^{-1}}. The modified packets can still pass the batch verification, but the signature of each packet is not correct (that is why batch RSA verification is called screening in [35]). However, the attacker can do this only when it has obtained {m_i, σ_i} and {m_j, σ_j}, which means the messages m_i and m_j have been correctly signed by the sender. Therefore, this attack is of no harm to the receiver.

Requirements to the Sender

In most RSA implementations, the public key e is usually small (e = 3, for instance) while the private key d is large. Therefore, RSA signature verification is efficient while signature generation is expensive. This poses a challenge to the computation capability of the sender, because the sender needs to sign each packet. Choosing a small private key d can improve the computation efficiency but compromises the security. If the sender does not have enough resources, a pair {e, d} with comparable sizes can achieve a certain trade-off between computation efficiency and security at the sender's side. If the sender is a powerful server, then signing each packet is affordable in this scenario.

Batch BLS Signature

BLS

The BLS signature scheme uses a cryptographic primitive called pairing, which can be defined as a map e(·,·) from the cyclic group G_1 to the cyclic group G_2 satisfying the following properties:
1. Bilinear: e(u^a, v^b) = e(u, v)^{ab} for all u, v in G_1 and integers a, b.
2. Nondegenerate: for the generator g of G_1, where p is the order of G_1, we have e(g, g) ≠ 1.
The BLS signature scheme consists of three phases:
1. In the key generation phase, a sender chooses a random integer x, 0 < x < p, and computes y = g^x. The private key is x and the public key is y.
2. Given a message m in the signing phase, the sender first computes h = H(m), where H is a hash function mapping messages onto G_1, and then computes σ = h^x. The signature of m is σ.
3. In the verification phase, the receiver first computes h = H(m) and then checks whether e(σ, g) = e(h, y).
If the verification succeeds, then the message m is authentic, because

    e(σ, g) = e(h^x, g) = e(h, g)^x = e(h, g^x) = e(h, y).

One merit of the BLS signature is that it can generate a very short signature. It has been shown in [36] that an n-bit BLS signature can provide a security level equivalent to solving a Discrete Log Problem (DLP) [39] over a finite field of size approximately 2^{6n}. Therefore, a 171-bit BLS signature provides the same level of security as a 1,024-bit DLP-based signature scheme such as DSA [38]. This is a very nice choice in scenarios where communication overhead is an important issue.

Batch BLS

Based on BLS, we propose our batch BLS scheme here. Given n packets {m_i, σ_i}, i = 1, ..., n, the receiver can verify the batch of BLS signatures by first computing h_i = H(m_i) and then checking whether

    e(Π_{i=1..n} σ_i, g) = Π_{i=1..n} e(h_i, y).

This is because, if all the messages are authentic, then

    e(Π σ_i, g) = e((Π h_i)^x, g) = e(Π h_i, g^x) = e(Π h_i, y) = Π e(h_i, y).
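As an illustration of this batch check, the sketch below uses the third-party py_ecc library and its BN128 pairing, an assumed dependency; the pairing there is asymmetric (G1 x G2) whereas the description above uses a symmetric one, but the batch identity is unchanged. The hash-to-point is a demo stand-in, not a secure hash-to-curve.

    import hashlib
    from functools import reduce
    from py_ecc.bn128 import G1, G2, add, multiply, pairing, curve_order

    def hash_to_g1(message: bytes):
        # Demo-only hash-to-point: scalar-multiply the G1 generator.
        n = int.from_bytes(hashlib.sha256(message).digest(), "big")
        return multiply(G1, n % curve_order)

    x = 123456789            # private key (demo value)
    y = multiply(G2, x)      # public key y = g^x

    def sign(message: bytes):
        return multiply(hash_to_g1(message), x)   # sigma = H(m)^x

    def batch_verify(packets, pub):
        # Check e(prod sigma_i, g) == prod e(H(m_i), y).
        sigma_agg = reduce(add, (sig for _, sig in packets))
        lhs = pairing(G2, sigma_agg)
        rhs = reduce(lambda acc, t: acc * t,
                     (pairing(pub, hash_to_g1(m)) for m, _ in packets))
        return lhs == rhs

    msgs = [b"pkt-%d" % i for i in range(3)]
    assert batch_verify([(m, sign(m)) for m in msgs], y)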
We can prove that our batch BLS is secure against signature forgery as long as BLS itself is secure against signature forgery.
Theorem 1. Suppose an attacker A can break the batch BLS by forging signatures. Then, another
attacker B can break BLS under the chosen message attack by colluding with A.
Proof. Suppose B is given n - 1 messages m_1, ..., m_{n-1} and their valid signatures σ_1, ..., σ_{n-1}. B can forge a signature for any chosen message m_n, such that σ_n satisfies the BLS signature scheme, by colluding with A in the following steps:
1. B sends the n messages m_1, ..., m_n and the n - 1 signatures σ_1, ..., σ_{n-1} to A.
2. Because A can break the batch BLS scheme, A generates n false signatures σ'_1, ..., σ'_n that pass the batch BLS verification, then returns to B the aggregate value Π_{i=1..n} σ'_i.
3. B computes σ_n = (Π_{i=1..n} σ'_i) · (Π_{i=1..n-1} σ_i)^{-1} as the signature for m_n, because e(σ_n, g) = Π_{i=1..n} e(h_i, y) · Π_{i=1..n-1} e(h_i, y)^{-1} = e(h_n, y).
Also, as with batch RSA, an attacker may not forge signatures but may manipulate authentic packets to produce invalid signatures. For instance, two packets {m_i, σ_i} and {m_j, σ_j} for i ≠ j can be replaced with {m_i, σ_i·t} and {m_j, σ_j·t^{-1}} and still pass the batch verification. However, this does not affect the correctness and the authenticity of m_i and m_j, because they have been correctly signed by the sender.

Requirements to the Sender

In our batch BLS, the sender needs to sign each packet. Because BLS can provide a security level equivalent to conventional RSA and DSA with a much shorter signature [36], the signing operation is more efficient than RSA signature generation. Moreover, BLS can be implemented over elliptic curves, which have been shown in the literature to be more efficient than the finite integer fields on which RSA is implemented. Therefore, we can expect our batch BLS to be more affordable for the sender than batch RSA.

Batch DSA Signature

A batch DSA signature scheme was proposed by Harn, but it was later found to be insecure. Unfortunately, Boyd and Pavlovski pointed out that Harn's work is still vulnerable to malicious attacks. Here, we propose a batch DSA scheme based on Harn's work that counteracts the attack they described.

Harn DSA

In Harn DSA, some system parameters are defined as:
1. p, a prime longer than 512 bits.
2. q, a 160-bit prime divisor of p - 1.
3. g, a generator of Z_p* with order q, i.e., g^q = 1 mod p.
4. x, the private key of the signer, 0 < x < q.
5. y, the public key of the signer, y = g^x mod p.
6. h(), a hash function generating an output in Z_q.
Given a message m, the signer generates a signature by:
1. randomly selecting an integer k with 0 < k < q,
2. computing h = h(m),
3. computing r = (g^k mod p) mod q, and
4. computing s = rk - hx mod q.
The signature for m is (r, s).
The receiver can verify the signature by first computing h = h(m) and then checking whether

    r = (g^{s·r^{-1}} y^{h·r^{-1}} mod p) mod q.

This is because, if the packet is authentic, then

    g^{s·r^{-1}} y^{h·r^{-1}} = g^{(rk - hx)·r^{-1}} g^{hx·r^{-1}} = g^k mod p.

Harn Batch DSA

Given n packets {m_i, (r_i, s_i)}, i = 1, ..., n, the receiver can verify the batch of signatures by first computing h_i = h(m_i) and then checking whether

    Π_{i=1..n} r_i = (g^{Σ_i s_i·r_i^{-1}} y^{Σ_i h_i·r_i^{-1}} mod p) mod q.

This is because, if the batch of packets is authentic, then Σ_i k_i = Σ_i s_i·r_i^{-1} + x Σ_i h_i·r_i^{-1} mod q, so that g^{Σ s_i·r_i^{-1}} y^{Σ h_i·r_i^{-1}} = g^{Σ k_i} mod p.
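Below is a self-contained toy of this batch verification. Two liberties are taken, both flagged in comments: the parameters are tiny demo primes (the scheme requires |p| > 512 bits), and r is transmitted in full (not reduced mod q) so that the product relation aggregates cleanly mod p. The hash here is over the message alone, as in Harn's scheme; the paper's own variant, described next, hashes (r, m) instead.

    import hashlib
    import random

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    # Toy parameters: prime q, prime p = k*q + 1 so that q | p - 1.
    q = 100_003
    k0 = 2
    while not is_prime(k0 * q + 1):
        k0 += 1
    p = k0 * q + 1
    base = 2
    g = pow(base, (p - 1) // q, p)   # generator of the order-q subgroup
    while g == 1:
        base += 1
        g = pow(base, (p - 1) // q, p)

    x = random.randrange(1, q)       # private key
    y = pow(g, x, p)                 # public key y = g^x mod p

    def h(m: bytes) -> int:
        # Harn form: hash of the message only. The hardened variant in the
        # next subsection would hash (r, m) to resist Boyd-Pavlovski.
        return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

    def sign(m: bytes):
        while True:
            k = random.randrange(1, q)
            r = pow(g, k, p)         # sent in full so products aggregate mod p
            if r % q:                # keep r invertible mod q
                break
        s = ((r % q) * k - h(m) * x) % q    # s = r*k - h*x mod q
        return r, s

    def batch_verify(packets) -> bool:
        # Check prod(r_i) == g^(sum s_i/r_i) * y^(sum h_i/r_i) (mod p).
        a = b = 0
        prod_r = 1
        for m, (r, s) in packets:
            r_inv = pow(r % q, -1, q)
            a = (a + s * r_inv) % q
            b = (b + h(m) * r_inv) % q
            prod_r = prod_r * r % p
        return prod_r == pow(g, a, p) * pow(y, b, p) % p

    packets = [(m, sign(m)) for m in (b"alpha", b"beta", b"gamma")]
    assert batch_verify(packets)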
The Boyd-Pavlovski Attack

Boyd and Pavlovski pointed out an attack against the Harn batch DSA scheme, in which an attacker can forge signatures for any chosen message set that has not been signed by the sender. The process is:
1. Choose B and C, and calculate the corresponding value A mod q.
2. For any message set m_i, i = 1, ..., n, randomly choose r_i, i = 1, ..., n - 2.
3. Compute r_{n-1} and r_n to ensure that the constraint on the r_i values holds (their equation (9)).
4. Randomly choose s_i, i = 1, ..., n - 1, and compute s_n to ensure that the batch verification equation holds (their equation (10)).
With these choices, the forged set {m_i, r_i, s_i}, i = 1, ..., n, satisfies the batch verification with high probability.

Our Batch DSA

In order to counteract the Boyd-Pavlovski attack, our batch DSA makes an improvement to the Harn DSA algorithm. We replace the hash operation h(m) in the signature generation and verification process with h(r, m). Though simple, our method can significantly increase the security of batch DSA. In the Boyd-Pavlovski attack, the attacker can compute the r_i values according to (9) and (10) because the parameters A, C, and the h_i values are known; with h(r, m), each h_i depends on r_i, so the attacker can no longer fix the hash values before choosing the r_i values.
As in the cases of batch RSA and our batch BLS, the attacker may manipulate authentic packets {m_i, (r_i, s_i)} to produce invalid signatures {m_i, (r_i', s_i')} which can still pass the batch verification. The attacker can keep r_i unchanged, randomly choose s_i', i = 1, ..., n - 1, and solve for an s_n' satisfying the batch verification equation. However, this attack does not affect the correctness and authenticity of the messages, because they have really been signed by the sender. Therefore, the receiver can still accept them, because the batch verification succeeds.

Requirements to the Sender

In batch RSA and our batch BLS, the sender needs to compute one modular exponentiation to sign each packet. In our batch DSA, the sender needs to compute one modular exponentiation to get r and two modular multiplications to get s. However, r is independent of the message m. Therefore, the sender can generate many r values offline. When the sender starts a multicast session, it can use the reserved r values to compute the s values.

Preaggregation of Message Hashes and Signatures

If we take a closer look at batch RSA and batch BLS, we can notice that they use exactly the original RSA and BLS algorithms, respectively. The only difference is that the batch algorithms take the aggregations of message hashes and signatures as the parameters to the original signature algorithms. This aggregation is independent of the final signature verification. Therefore, each receiver can compute and update the aggregations after receiving every packet. When the time for batch verification comes, each receiver needs just one signature verification. For batch DSA, the aggregations of message hashes and signatures are inside the signature verification; therefore, the cost of the aggregations is incurred together with the final signature verification. However, if we modify the original DSA so that it takes {sr^{-1}, hr^{-1}, r} instead of {h, s, r} as parameters, then batch DSA can take advantage of preaggregating message hashes and signatures, just like batch RSA and batch BLS.
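The sketch below shows this preaggregation idea for the RSA instantiation: the receiver folds each arriving packet into two running products, leaving a single modular exponentiation for the final batch check. The class name is illustrative; the parameters mirror the toy batch-RSA example given earlier.

    class PreaggregatingReceiver:
        """Fold hashes and signatures into running products per packet,
        leaving one modular exponentiation for the final batch check."""

        def __init__(self, e, n, h):
            self.e, self.n, self.h = e, n, h
            self.sig_prod = 1   # running product of sigma_i mod N
            self.hash_prod = 1  # running product of h(m_i) mod N

        def on_packet(self, m: bytes, sigma: int):
            # Cheap per-packet work: two modular multiplications.
            self.sig_prod = self.sig_prod * sigma % self.n
            self.hash_prod = self.hash_prod * self.h(m) % self.n

        def batch_verify(self) -> bool:
            # One expensive operation, regardless of batch size.
            ok = pow(self.sig_prod, self.e, self.n) == self.hash_prod
            self.sig_prod = self.hash_prod = 1  # reset for the next batch
            return ok

    # Usage with the toy batch-RSA parameters above:
    #   rx = PreaggregatingReceiver(E, N, h)
    #   rx.on_packet(m, sign(m)); ...; rx.batch_verify()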
IV. ENHANCED SCHEME
Fig. 4. Verification rate under the burst loss model with the maximum burst length 10.
Fig. 3. Verification rate under the random loss model.

The basic scheme SMABS-B targets the packet loss problem, which is inherent in the Internet and wireless networks. It has perfect resilience to packet loss, no matter whether the loss is random or bursty.
In some circumstances, however, an attacker can inject forged packets into a batch of packets to disrupt the batch signature verification, leading to DoS. A naive approach to defeating this DoS attack is to divide the batch into multiple smaller batches and perform batch verification over each smaller batch; this divide-and-conquer approach can be recursively carried out for each smaller batch, which means more signature verifications at each receiver. In the worst case, the attacker can inject forged packets at a very high frequency and expect each receiver to stop the batch operation and fall back to basic per-packet signature verification.
In this section, we present an enhanced scheme called SMABS-E, which combines the basic scheme SMABS-B with a packet filtering mechanism to tolerate packet injection. In particular, the sender attaches to each packet a mark, which is unique to the packet and cannot be spoofed. At each receiver, the multicast stream is classified into disjoint sets based on the marks. Each set of packets comes from either the real sender or the attacker. The mark design ensures that a packet from the real sender never falls into any set of packets from the attacker, and vice versa. Next, each receiver only needs to perform BatchVerify() over each set. If the result is True, the set of packets is authentic. If not, the set of packets is from the attacker, and the receiver simply drops the set and does not need to divide it into smaller subsets for further batch verification.
In SMABS-E, a Merkle tree is used to generate the marks. An example is illustrated in Fig. 2. The sender constructs a binary tree for eight packets. Each leaf is the hash of one packet. Each internal node is the hash value of the concatenation of its left and right children. For each packet, the mark is constructed as the set of the siblings of the nodes along the path from the packet to the root. For example, the mark of packet P3 is {H4, H12, H58}, and the root can be recovered as H18 = H(H(H12, H(H(P3), H4)), H58). Due to the collision resistance of the hash function, an attacker cannot construct a forged packet that belongs to the set associated with the Merkle tree, i.e., one from which there is a path to the same root. This guarantees that forged packets cannot fall into the set of authentic packets.
When the sender has a set of packets for multicast, it generates a Merkle tree for the set and attaches a mark to each packet. The root can be recovered from each packet and its mark. Each receiver can find whether two packets belong to the same set by checking whether they lead to the same root value. Therefore, the recovered roots help classify received packets into disjoint sets. Each receiver does not need to wait for a set to include all the packets under the same Merkle tree, and it can batch-verify the set anytime.
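A compact sketch of this mark construction and root recovery follows; the hashing and tree shape follow the description above, while the packet contents are illustrative, and the leaf position is assumed to accompany each packet so the receiver knows the left/right orientation at each level.

    import hashlib

    def H(*parts: bytes) -> bytes:
        return hashlib.sha256(b"".join(parts)).digest()

    def build_marks(packets):
        """Return (root, marks); marks[i] lists leaf i's siblings, leaf to root."""
        assert packets and len(packets) & (len(packets) - 1) == 0  # power of two
        level = [H(p) for p in packets]       # leaves: hashes of packets
        marks = [[] for _ in packets]
        group = 1                             # leaves covered per node here
        while len(level) > 1:
            for leaf in range(len(packets)):
                node = leaf // group          # node covering this leaf
                marks[leaf].append(level[node ^ 1])   # record its sibling
            level = [H(level[i], level[i + 1]) for i in range(0, len(level), 2)]
            group *= 2
        return level[0], marks

    def recover_root(packet, mark, leaf_index):
        """Recompute the root from one packet and its mark."""
        node = H(packet)
        idx = leaf_index
        for sib in mark:
            node = H(sib, node) if idx & 1 else H(node, sib)
            idx //= 2
        return node

    pkts = [b"P%d" % i for i in range(1, 9)]
    root, marks = build_marks(pkts)
    # Every packet plus its mark leads back to the same root => same set.
    assert all(recover_root(p, m, i) == root
               for i, (p, m) in enumerate(zip(pkts, marks)))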
An attacker may inject small sets of forged packets, or even single packets that do not belong to any set. In this case, each receiver holds many small sets, each with only a few packets, and running the BatchVerify algorithm on each small set compromises efficiency. Since the sets from the sender can contain a large number of packets, each receiver can choose a threshold (say t) and start batch verification over one set only when the set has no fewer than t packets. If a set has fewer than t packets and the root value recovered from the set has not been authenticated, the receiver simply drops the set of packets without processing them and thus saves computation resources.

V. PERFORMANCE EVALUATION

In this section, we evaluate SMABS performance in terms of resilience to packet loss, efficiency, and DoS resilience. As we discussed before, SMABS does not assume any particular underlying signature algorithm.

Resilience to Packet Loss

We use simulations to evaluate the resilience to packet loss. The metric here is the verification rate, i.e., the ratio of the number of authenticated packets to the number of received packets. We compare SMABS with some well-known loss-tolerant schemes: EMSS [14], augmented chain (AugChain) [18], PiggyBack [16], tree chain (Tree) [11], and SAIDA [23]. These schemes are representatives of graph chaining, tree chaining, and erasure coding and are widely used in performance evaluations in the literature.
For EMSS [14], we choose the chain configuration 5-11-17-24-36-39, which has the best performance among all the configurations of length 6, as shown in [14]. For AugChain [18], we choose the C3,7 chain configuration. For PiggyBack [16], we choose two class priorities. For Tree chain [11], we choose a binary tree. For SAIDA [23], we choose the (256, 128) erasure code. For all these schemes, we choose a block size of 256 packets and simulate over 100 blocks. We consider random loss and burst loss with a maximum loss length of 10 packets. The verification rates under different loss rates are given in Figs. 3 and 4.
We can see that the verification rates of EMSS [14], augmented chain (AugChain) [18], and PiggyBack [16] decrease quickly as the loss rate increases. The reason is that graph chaining results in correlation among packets, and this correlation is vulnerable to packet loss. SAIDA [23] shows resilience to packet loss up to a certain threshold, because of the threshold performance of erasure codes. Our SMABS and the Tree scheme [11] have perfect resilience to packet loss, in the sense that all the received packets can be authenticated. This is because all the packets in the SMABS and Tree schemes are independent from each other. As we will show
later, however, Tree [11] achieves this independence by incurring large overhead and latency at the sender and each receiver and is vulnerable to DoS, while our SMABS-B has less overhead and latency, and SMABS-E is resilient to DoS at the same level of overhead as Tree [11].
One thing that needs to be pointed out is that we do not differentiate between SMABS-B and SMABS-E in Fig. 3 and Fig. 4. SMABS-B is perfectly resilient to packet loss because of its inherent design. While SMABS-E is not designed for lossy channels, it can also achieve perfect resilience to packet loss there. In the lossy channel model, where no DoS attack is assumed to be present, we can set the threshold t = 1 (refer to Section 4) for SMABS-E, and thus each receiver can start batch verification as long as at least one packet is received for each set of packets constructed under the same Merkle tree.

Efficiency

We consider latency, computation, and communication overhead for the efficiency evaluation under lossy channels and DoS channels.

Comparisons over Lossy Channels

We compare SMABS-B with the well-known loss-tolerant schemes tree chain (Tree) [11], EMSS [14], PiggyBack [16], augmented chain (AugChain) [18], and SAIDA [23]. We also include SMABS-E and three DoS-resilient schemes, PRABS [30], BAS [31], and LTT [32], in the table just for comparison, even though they are not designed for lossy channels.
Previous block-based schemes introduce latency either at the sender [11], [16], or at each receiver [14], [31], or at both [18], [23], [30], [32]. The latency is inherent in the block design due to chaining or coding. At the sender side, the correlation among a block of packets has to be established before the sender starts sending the packets. At each receiver, latency is incurred when the high-layer application waits for the buffered packets to be authenticated after the correlation is recovered. This receiver-side latency is variable, depending on whether the correlation among the underlying buffered packets has been recovered when the high-layer application needs new data, and its maximum value is the block size. SMABS-B eliminates the correlation among packets.
In SMABS, a trade-off for the perfect resilience to packet loss is that the sender needs to sign each packet, which incurs more computation overhead than conventional block-based schemes. Therefore, efficient signature generation is desirable at the sender. Compared with RSA [33], which is efficient in verifying but expensive in signing, BLS [36] and DSA [38] are pretty good candidates, as we will show later.
For n packets, Tree [11] requires an overhead of n signatures and O(n log2 n) hashes; the schemes in [14], [16], [18], [23] require one or more signatures and up to O(n^2) hashes. SMABS-B and SMABS-E require n signatures, and SMABS-E requires an additional O(n log2 n) hashes. If long signatures are used (like 1,024-bit RSA), SMABS-B and SMABS-E have more communication overhead than the schemes in [14], [16], [18], [23], which is also the case for Tree [11]. However, BLS generates short signatures of 171 bits, which is comparable to most well-known hash algorithms, such as MD5 (128 bits) and SHA-1 (160 bits).

Comparisons over DoS Channels

DoS is a method for an attacker to deplete the resources of a receiver. Processing injected packets from the attacker always consumes a certain amount of resource. Here, we assume an attacker factor β, which means that for n valid packets, βn invalid packets are injected.
For the schemes in [11], [14], [16], [18], which authenticate signatures first and then authenticate packets through hash chains, the attacker can inject βn forged signature packets, because signature verification is an expensive operation. For SAIDA [23], which requires erasure decoding, the attacker simply injects βn forged packets, because each receiver has to choose a certain number of valid packets from all the (1 + β)n packets to do the decoding, which can take a significant number of tries.

Comparisons of Signature Schemes

We compare the computation overhead of the three batch signature schemes in Table 4. RSA and BLS require one modular exponentiation at the sender, and DSA requires two modular multiplications when the r value is computed offline. Usually, one c-bit modular exponentiation is equivalent to 1.5c modular multiplications over the same field. Moreover, for the same security level, a c-bit modular exponentiation in a DLP-based scheme is equivalent to roughly a (c/6)-bit exponentiation in BLS. Therefore, we can estimate that the computation overhead of one 1,024-bit RSA signing operation (about 1,536 modular multiplications) is roughly equivalent to that of 768 DSA signing operations and that of 6 BLS signing operations (each one corresponding to 255 modular multiplications).
not when the high layer application needs new data, of signature schemes on PIII 1 GHz CPU, the signing
and its maximum value is the block size. SMABS-B and certification time for 1,024-bit RSA with a 1,007-
eliminates the correlation among packets. bit private key are 7.9 ms and 0.4 ms, for 157-bit BLS
In SMABS, a trade-off for perfect resilience to are 2.75 ms and
packet loss is that the sender needs to sign each 81 ms, and for 1,024-bit DSA with a 160-bit private
packet, which incurs more computation overhead than key(without precompiling r value) are 4.09 ms and
conventional block-based schemes. Therefore, efficient
4.87 ms. We can observe that for BLS and DSA the
signature generation is desirable at the sender.
Compared with RSA [33], which is efficient in verifying signing is efficient but the verification is expensive,
but is expensive in signing, BLS [36] and DSA [38] are and vice versa for RSA.
pretty good candidates as we will show later. Given the same security level as 1,024-bit RSA, BLS
For n packets, Tree [11] require an overhead of n generates a 171-bit signature and DSA generates a
signature and O(nlog2n) hashes, schemes in [14], 320-bit signature. It is clear that by using BLS or DSA,
[16], [18], [23], require one or more signatures and SMABS can achieve more bandwidth efficiency than
up to O(n2) hashes. SMABS-B and SMABS-E require n
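The signing-cost equivalences above can be checked with a short back-of-the-envelope calculation. The sketch below is illustrative only; the function name modexp_cost and the 1.5c-multiplications rule of thumb come from the estimate in the text, not from any SMABS implementation:

import math

def modexp_cost(bits):
    """Approximate cost of one modular exponentiation, in modular multiplications,
    using the rule of thumb: one c-bit exponentiation ~ 1.5*c multiplications."""
    return 1.5 * bits

rsa_sign = modexp_cost(1024)       # 1,024-bit RSA signing: ~1536 multiplications
dsa_sign = 2                       # DSA signing with r precomputed: 2 multiplications
bls_sign = modexp_cost(1024 / 6)   # BLS works in a ~(c/6)-bit group: ~256 multiplications

print(rsa_sign / dsa_sign)         # ~768 DSA signings per RSA signing
print(rsa_sign / bls_sign)         # ~6 BLS signings per RSA signing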
VI. CONCLUSION
To reduce the signature verification overhead in secure multimedia multicasting, block-based authentication schemes have been proposed. Unfortunately, most previous schemes have many problems, such as vulnerability to packet loss and lack of resilience to denial of service (DoS) attacks. To overcome these problems, we develop a novel authentication scheme, SMABS. We have demonstrated that SMABS is perfectly resilient to packet loss due to the elimination of the correlation among packets, and that it can effectively deal with DoS attacks. Moreover, we also show that the use of batch signatures can achieve efficiency less than or comparable with the conventional schemes. Finally, we further develop two new batch signature schemes based on BLS and DSA, which are more efficient than the batch RSA signature scheme.

REFERENCES
[1] S.E. Deering, "Multicast Routing in Internetworks and Extended LANs," Proc. ACM SIGCOMM Symp. Comm. Architectures and Protocols, pp. 55-64, Aug. 1988.
[2] T. Ballardie and J. Crowcroft, "Multicast-Specific Security Threats and Counter-Measures," Proc. Second Ann. Network and Distributed System Security Symp. (NDSS '95), pp. 2-16, Feb. 1995.
[3] P. Judge and M. Ammar, "Security Issues and Solutions in Multicast Content Distribution: A Survey," IEEE Network Magazine, vol. 17, no. 1, pp. 30-36, Jan./Feb. 2003.
[4] Y. Challal, H. Bettahar, and A. Bouabdallah, "A Taxonomy of Multicast Data Origin Authentication: Issues and Solutions," IEEE Comm. Surveys & Tutorials, vol. 6, no. 3, pp. 34-57, Oct. 2004.
[5] Y. Zhou and Y. Fang, "BABRA: Batch-Based Broadcast Authentication in Wireless Sensor Networks," Proc. IEEE GLOBECOM, Nov. 2006.
[6] Y. Zhou and Y. Fang, "Multimedia Broadcast Authentication Based on Batch Signature," IEEE Comm. Magazine, vol. 45, no. 8, pp. 72-77, 2007.
[7] K. Ren, K. Zeng, W. Lou, and P.J. Moran, "On Broadcast Authentication in Wireless Sensor Networks," Proc. First Ann. Int'l Conf. Wireless Algorithms, Systems, and Applications (WASA '06).
[8] S. Even, O. Goldreich, and S. Micali, "On-Line/Offline Digital Signatures," J. Cryptology, vol. 9, pp. 35-67, 1996.
[9] P. Rohatgi, "A Compact and Fast Hybrid Signature Scheme for Multicast Packets," Proc. Sixth ACM Conf. Computer and Comm. Security (CCS '99), Nov. 1999.
[10] C.K. Wong and S.S. Lam, "Digital Signatures for Flows and Multicasts," Proc. Sixth Int'l Conf. Network Protocols (ICNP '98), pp. 198-209, Oct. 1998.
[11] C.K. Wong and S.S. Lam, "Digital Signatures for Flows and Multicasts," IEEE/ACM Trans. Networking, vol. 7, no. 4, pp. 502-513, Aug. 1999.
[12] R. Gennaro and P. Rohatgi, "How to Sign Digital Streams," Information and Computation, vol. 165, no. 1, pp. 100-116, Feb. 2001.
[13] R. Gennaro and P. Rohatgi, "How to Sign Digital Streams," Proc. 17th Ann. Cryptology Conf. Advances in Cryptology (CRYPTO '97), Aug. 1997.
[14] A. Perrig, R. Canetti, J.D. Tygar, and D. Song, "Efficient Authentication and Signing of Multicast Streams over Lossy Channels," Proc. IEEE Symp. Security and Privacy (SP '00), pp. 56-75, May 2000.
[15] Y. Challal, H. Bettahar, and A. Bouabdallah, "A2Cast: An Adaptive Source Authentication Protocol for Multicast Streams," Proc. Ninth Int'l Symp. Computers and Comm. (ISCC '04), vol. 1, pp. 363-368, June 2004.
[16] S. Miner and J. Staddon, "Graph-Based Authentication of Digital Streams," Proc. IEEE Symp. Security and Privacy (SP '01), pp. 232-246, May 2001.
[17] Z. Zhang, Q. Sun, W-C Wong, J. Apostolopoulos, and S. Wee, "A Content-Aware Stream Authentication Scheme Optimized for Distortion and Overhead," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '06), pp. 541-544, July 2006.
[18] P. Golle and N. Modadugu, "Authenticating Streamed Data in the Presence of Random Packet Loss," Proc. Eighth Ann. Network and Distributed System Security Symp. (NDSS '01), Feb. 2001.
[19] Z. Zhang, Q. Sun, and W-C Wong, "A Proposal of Butterfly-Graph Based Stream Authentication over Lossy Networks," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '05), July 2005.
[20] S. Ueda, N. Kawaguchi, H. Shigeno, and K. Okada, "Stream Authentication Scheme for the Use over the IP Telephony," Proc. 18th Int'l Conf. Advanced Information Networking and Application (AINA '04), vol. 2, pp. 164-169, Mar. 2004.
[21] D. Song, D. Zuckerman, and J.D. Tygar, "Expander Graphs for Digital Stream Authentication and Robust Overlay Networks," Proc. 2002 IEEE Symp. Security and Privacy (S&P '02), May 2002.
[22] J.M. Park, E.K.P. Chong, and H.J. Siegel, "Efficient Multicast Packet Authentication Using Signature Amortization," Proc. IEEE Symp. Security and Privacy (SP '02), pp. 227-240, May 2002.
[23] J.M. Park, E.K.P. Chong, and H.J. Siegel, "Efficient Multicast Stream Authentication Using Erasure Codes," ACM Trans. Information and System Security, vol. 6, no. 2, pp. 258-285, May 2003.
[24] A. Pannetrat and R. Molva, "Authenticating Real Time Packet Streams and Multicasts," Proc. Seventh IEEE Int'l Symp. Computers and Comm. (ISCC '02), pp. 490-495, July 2002.
[25] A. Pannetrat and R. Molva, "Efficient Multicast Packet Authentication," Proc. 10th Ann. Network and Distributed System Security Symp. (NDSS '03), Feb. 2003.
AFFINE SYMMETRIC IMAGE MODEL

*K. Sanjaikumar  **G. Umaranisrikanth, M.E., (Ph.D.)

*M.E. Computer Science & Engineering,
S.A. Engineering College, Poonamallee-Avadi Road,
Veeraragavapuram, Thiruverkadu Post,
Chennai-600077, Tamilnadu, India.
k.sanjai31@rocketmail.com

**H.O.D., P.G. Department,
S.A. Engineering College, Poonamallee-Avadi Road,
Veeraragavapuram, Thiruverkadu Post,
Chennai-600077, Tamilnadu, India.
Abstract— Natural images contain considerable redundancy, some of which is successfully captured using recently developed directional wavelets. In this paper, an affine symmetric image model is considered. It provides a flexible scheme to exploit geometric redundancy: a patch of texture from an image is rotated, scaled and sheared to approximate other, similar parts of the image, revealing the self-similarity relation. A texture model is required that identifies structural patterns. The affine symmetry is then exploited between structural textures at a local level, the objective being to find the minimum residual error by estimating the affine transform relating two patches of texture. In this way, self-similarity of the image is exploited across space and scale. Experimental evaluation demonstrates the effectiveness of the approach in affine invariant texture segmentation and image approximation.

KEYWORDS: Affine, textures, patches, invariant textures

1. INTRODUCTION

Images are formed in several ways. Through the retina, humans can view objects either as 3-D physical things or as parts of a 2-D plane, so an image can be considered a collection of the projected surfaces of objects, and most natural surfaces provide textures. Textures also provide a sense of touch, capturing physical surface
characteristics such as roughness, smoothness and the reflection of colors.

Descriptions of textures are taken from [1][3], where they are divided into two types.

Stochastic: a randomized structure created by a stochastic process; it is the inverse of a defined structure.

Defined structure: a periodic arrangement that depends on positioning rules; such textures are characterized by repeated elements.

These two texture types are distinguished by their intensities. Structured textures consist of intensity variations forming predictable, repeating elements, whereas the stochastic model has no predictable views. Stochastic structures are well suited to synthesis: the original texture and its replica show no difference to the human eye.

We can identify small differences in structured texture. Human perception varies with the roughness of surfaces, the intensities of colors, the angle of the viewing position, etc. Structural texture is an important perceptual cue; for example, a jaguar can be located by its blob patterns. Perspective processing is generally not needed to identify structured textures or to separate objects from the background by means of texture cues. From a biological viewpoint, neurons in the cortex identify objects for the brain based on directional orientation and frequency [5]-[8]; these findings have been adopted in statistical processes for linear, directional texture bases [8]. Through this process we evaluate the structural textures and determine their interaction under affine transformation.

Fig. 1 depicts the affine symmetry of a natural image, in which hair patches with directional patterns are matched, after transformation, with the hat and the shoulder. The symmetry is obtained among the parts of the image named 'structural textures'; determining the patch-to-patch correspondence splits the image into affine symmetric regions.

FIG. 1. SAMPLE OF AFFINE SYMMETRIC NATURAL IMAGE

The goal of our process is to provide a basic framework for exploiting affine symmetry in natural images, as depicted in Fig. 2. The main difficulties are the following:
 Determining the differences between structured and non-structured elements.
 Splitting the image into self-similar regions with high efficiency.
 Utilizing the symmetric textures in real applications.
2. AFFINE IMAGE BY TEXTON THEORY

This section is based on texton theory [10][9] and directional patterns. A pattern f is localized in frequency ξ, orientation θ and location u, which are the significant factors in texture as per the definitions in [5][7][11][12]. A combination of linear signals generating the microstructure of a texture is called a texton:

texton = ∫ξ ∫θ f(ξ, θ),    texture = ∫u T(texton).

In this representation, affine-transformed textons are defined by a spatial distribution and superimposed on a lattice. A large number of textons are necessary to represent edges and several other primitive patterns.

Textons are transformed into other forms, in the geometrical sense, by a process T. These processes are evaluated as affine transformations: two spaces are linked by a map, just as the redundancy of images is exploited in fractal coding with iterated maps.
The linear transform can be realized by different processes such as shearing, scaling and rotation, and it is applied to provide the interrelationship among the textons.
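As an illustration of this linear part, the short sketch below composes a 2-D affine matrix from rotation, scaling and shear. This is a generic construction, not code from the paper, and the parameter names are ours:

import numpy as np

def affine_matrix(theta, sx, sy, shear):
    """Compose the linear part A of an affine map: rotation * shear * scaling."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    H = np.array([[1.0, shear],
                  [0.0, 1.0]])
    S = np.diag([sx, sy])
    return R @ H @ S

# Map a patch coordinate x to y = A x + t (the full affine transform).
A = affine_matrix(theta=np.pi / 6, sx=1.2, sy=0.8, shear=0.1)
t = np.array([3.0, -2.0])
x = np.array([5.0, 7.0])
y = A @ x + t
print(y)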
2.a Synchronization of Affine

Based on the affine arc length of a curve, the boundary pixel points are traversed to provide a parametric solution, in which c(x, y) is the contour signature.

A solution is provided by the Fourier transform of the boundary in the form [U, V]^T, in which U and V are the Fourier coefficients representing the x and y coordinates. The linear operator of the Fourier transformation is

[Uk, Vk]^T = A^k [U0, V0]^T,

where [U0, V0]^T represents the coefficients prior to the affine transformation.

2.b Coefficients of Patches: the Localized GSM Method

Marginal statistics of wavelet coefficients are subject to associative tractability, and the coefficients are commonly treated as independent. In natural images, however, coefficients from adjacent space and scale locations exhibit statistical interdependencies. These concepts are utilized in many texture analysis, image coding, denoising and artifact removal processes [19]-[21],[6],[18]. Multivariate probability models are generated from the dependencies captured in natural images and applied to small patches of wavelet coefficients. Denoising is determined by the collection of coefficients centered around a given coefficient [14]; the coarser-scale subband provides its parent coefficient, as in Fig. 3.

Fig. 3 (figure): general neighborhood of a wavelet coefficient across scales, from the (n+1)th scale (lower frequency) to the nth scale (higher frequency). The figure identifies the generalized patch of coefficients at position (X), containing the sibling (S) and parent (P) coefficients.

This permits the cross-scale dependencies in natural images [22][20] to be modeled. Rather than splitting the picture, which introduces block-boundary artifacts, overlapping patches are processed with a local method. A single large coefficient indicates other large coefficients in its adjacent neighborhood, which is captured by the Gaussian scale mixture (GSM). With υ ∈ R^d representing a patch of d coefficients, the GSM is denoted

υ = sqrt(z) u,

in which z is a spatially varying scalar hidden variable and u is a zero-mean multivariate Gaussian with covariance C,

g(u; C) ∝ exp(-1/2 u^T C^(-1) u).

The inhomogeneity of wavelet coefficient amplitudes is provided by the GSM model through the scalar variable z, while u provides the homogeneity of a Gaussian process. The original subband marginal statistics can be examined through log histograms.

Fig. 1.a (figure): effects of divisive normalization. (a) Original subband; (b) log-histogram of marginal statistics for the original subband; (c) subband normalized by the estimated sqrt(z) at each location; (d) log-histogram for the normalized coefficients, showing Gaussian behavior. The dashed line is a parabolic curve fit to the log histogram, for comparison.
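The divisive normalization illustrated in Fig. 1.a can be sketched in a few lines. The estimator below uses a common local estimate of sqrt(z), the RMS amplitude of each coefficient's spatial neighborhood; this is our own illustrative choice, not the paper's exact procedure:

import numpy as np
from scipy.ndimage import uniform_filter

def divisive_normalize(subband, size=3, eps=1e-8):
    """Estimate sqrt(z) as the local RMS of the coefficients and divide it out.
    After normalization the marginals should look closer to Gaussian."""
    local_energy = uniform_filter(subband ** 2, size=size)
    sqrt_z = np.sqrt(local_energy + eps)
    return subband / sqrt_z

# Toy GSM sample: heavy-tailed coefficients built as sqrt(z) * u.
rng = np.random.default_rng(0)
z = rng.exponential(1.0, size=(128, 128))
u = rng.normal(0.0, 1.0, size=(128, 128))
coeffs = np.sqrt(z) * u
normalized = divisive_normalize(coeffs)
print(coeffs.std(), normalized.std())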
3. CONSTRUCTION OF FRACTAL CODES FOR IMAGES

The inverse problem of Iterated Transformation Theory [1,3,4,5] applied to digital images is an image coding problem. Given an original r x r digital image p to encode, it consists of finding, among the class of image transformations defined in Section 2, a transformation τ which leaves p approximately invariant, i.e. which minimizes the distortion

d_L2(p, τ(p)).

An image for which a transformation τ exists such that this distortion is "small" is said to be approximately block self-transformable, and the image τ(p) is then called a collage of p. Since our image transformations have the form

τ = ∪i τi,

the problem is to find, for every range block Bi of size b x b in the image partition, a domain block Di and a block-processing transformation Ti such that the distortion

d_L2(Bi, Ti(Di))

is minimum.
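A brute-force version of this search is easy to state in code. The sketch below is our own simplification (grayscale blocks, a least-squares contrast/brightness fit, no isometries); it pairs a range block with the domain block minimizing the L2 collage error:

import numpy as np

def best_collage_match(range_block, domain_blocks):
    """Return (index, a, g, error): the domain block and gray-level map
    a*D + g minimizing the L2 distortion to the range block."""
    best = (None, 0.0, 0.0, np.inf)
    r = range_block.ravel()
    for idx, D in enumerate(domain_blocks):
        d = D.ravel()
        a, g = np.polyfit(d, r, 1)            # least-squares fit r ~ a*d + g
        err = np.sum((r - (a * d + g)) ** 2)  # collage error for this pairing
        if err < best[3]:
            best = (idx, a, g, err)
    return best

rng = np.random.default_rng(1)
domains = [rng.random((8, 8)) for _ in range(16)]
target = 0.7 * domains[3] + 0.1
print(best_collage_match(target, domains)[0])   # -> 3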
3A. DECODING FRACTAL CODES

3.A.1. Image Reconstruction

The decoding scheme simply consists in iterating a code τ on any initial image μ0 until convergence to a final decoded image is observed. The sequence of images

μn = τ^n(μ0)

is called a fractal reconstruction sequence. The mapping of an image under a fractal code is done sequentially: for each cell index i, the transformation τi
is applied to the image block over the domain Di and mapped onto the range cell Bi.

In figure 2 are displayed the first eight iterations of a fractal code for the "Lena" image, applied to an initial black image. The values of the SNR between the original "Lena" image and successive terms of the reconstruction sequence are listed in Table 2. Convergence of the sequence of images is obtained, within 0.2 dB of accuracy, at the eighth iteration.

Table 2. SNR (dB) between the original image and the reconstruction sequence.
Iteration:  1    2    3    4    5    6    7    8    9    10
SNR:       5.9  19.5 22.8 25.2 26.4 27.0 27.3 27.5 27.6 27.6

3B. Computation of Bit Rates

The full description of a fractal code τ, in view of its storage or transmission, is the key to the evaluation of bit rates. It depends on: (i) the description of the image partition, (ii) the nature of the block transformations used, and (iii) the quantization of the numerical parameters of these block transformations. Three distinct types of parameterized block transformations are used, depending on the type of the underlying range block: shade, midrange or edge. The information in bits needed to represent a block transformation of each type is computed from the encoding characteristics listed in Table 4 and is presented in Table 3.

Table 3. Information in bits for the representation of block transformations.
Block type   Parameters (information in bits)
Shade        gi
Midrange     Di (7+7=14), ai (2), ∆gi (7)
Edge         Di (7+7=14), ai (3), ∆gi (7)

Let Ns, Nm, and Ne denote the total numbers (parents and children) of range shade blocks, midrange blocks, and edge blocks in the original image. The bit rate, in bpp, is obtained by weighting these counts by the per-type bit costs of Table 3 and dividing by the number of pixels, where Np = (N/B)^2 is the total number of parent blocks in the image.

3C. Coding Simulation

The characteristics of the ITT-based system used for the encoding of the 256x256, 6 bpp "Lena" image are given in Table 4. The original image was initially split into four 128x128 sub-images, which were encoded independently. The decoded image, a fractal approximation of the original, is displayed in figure 3. The performance of the encoding system, in terms of bit rate and fidelity of the decoded image to the original, for the particular block statistics specific to "Lena", is also given in Table 4. Image "textures" are well preserved, although some finely textured areas, such as the turban around Lena's hat, are "smoothed out" by the encoding process. Sharp, contrasted contours, whether they are smooth or rugged, are very accurately preserved. Some blocking artifact is visible, as is expected for the type of memoryless block-encoding method used here. However, the reconstruction is free of edge degradation by "staircase effect". This is especially clear in the magnification of Lena's shoulder, shown in figure 4, where the outline of the shoulder is perfectly smooth.
Table 4. Encoding specifications and system performance.

Encoding specifications
Partition: range blocks 16x16 (parent), 8x8 (child).
Domain pool: domain blocks 16x16 (parent), 8x8 (child); classification: shade (s), midrange (m), simple edge (se), mixed edge (me).
Transformation pool:
  Shade block: absorption at g ∈ {gl_min, ..., gl_max}.
  Midrange block: gray-level scaling by a ∈ {.7, ..., 1.0}; translation by ∆g ∈ {-gl_max, ..., gl_max}.
  Edge block: gray-level scaling by a ∈ {.2, ..., .9}; translation by ∆g ∈ {-gl_max, ..., gl_max}; isometries {l_n}, 0 ≤ n ≤ 7.

System performance
Block statistics: s: 11.0 %, m: 30.0 %, e: 59.0 %.
Bit rate: .68 bpp.
SNR: 27.7 dB.

4. TIME VARYING MODEL

A time-varying image, I(x, y, t), is modeled as a linear superposition of basis functions φi(x, y, t), where each basis function is localized in time but can be applied at any instant during the image sequence:

I(x, y, t) = Σi ai(t) * φi(x, y, t) + ν(x, y, t),

where * denotes convolution over time. Thus, the time-varying coefficient ai(t) tells us the amount by which basis function φi is multiplied to model the structure around time t in the moving image sequence. The term ν(x, y, t) is used to model additional structure not well described by this model. Importantly, we examine here the case where the image code is overcomplete, meaning that the number of coefficient
signals ai(t) exceeds the dimensionality of the movie I(x, y, t). The model is illustrated schematically in figure 1.

The coefficients for a given image sequence are computed by maximizing the posterior distribution over the coefficients,

â = arg max_a P(a | I; Θ) = arg max_a P(I | a, Θ) P(a | Θ),

where Θ denotes the model parameters. The image likelihood P(I | a, Θ) is Gaussian (assuming Gaussian noise ν),

P(I | a, Θ) ∝ exp( -λ/2 | I - Σi ai * φi |^2 ),

and λ is the inverse of the noise variance. The prior probability distribution is specified to be factorial (i.e., statistically independent) over both coefficients and time, and the marginal distribution of each coefficient is assumed to be sparse:

P(a | Θ) = Πi Πt P(ai(t)),    P(ai(t)) ∝ exp( -S(ai(t)) ),

where S is a non-convex function appropriate for shaping the prior into sparse form (i.e., more peaked at zero and with heavy tails as compared to a Gaussian of the same variance, as shown in figure 2). Here we use S(x) = log(1 + (x/σ)^2), where σ is a scaling parameter that controls the degree of sparseness.

Fig. 1 (figure): image model. A movie I(x, y, t) is modeled as a linear superposition of spatial-temporal basis functions φi(x, y, t), each of which is localized in time but may be applied at any time within the movie sequence.

Fig. 2 (figure): the prior probability distribution over the coefficients is peaked at zero with heavy tails as compared to a Gaussian of the same variance (overlaid as a dashed line). Such a distribution would result from a sparse activity distribution over the coefficients.

Maximizing the posterior distribution over the coefficients is equivalent to minimizing -log P(a | I, Θ).
This minimization may be carried out by gradient descent on the coefficients; the data term of the gradient correlates each basis function with the residual, where * denotes correlation over time. Note, however, that in order to be considered a causal system, φ(x, y, t) must be zero for t > 0. For now we overlook the issue of causality and focus on what may be learned from sparse coding of time-varying images.
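The MAP estimation just described amounts to minimizing λ/2 |I - Σi ai*φi|^2 + Σ S(ai(t)) by gradient descent. The sketch below is a minimal 1-D illustration under our own simplifications (a static dictionary Phi, no temporal convolution), not the paper's implementation:

import numpy as np

def sparse_code(I, Phi, lam=10.0, sigma=0.3, steps=2000, lr=2e-4):
    """Gradient descent on E(a) = lam/2 * ||I - Phi a||^2 + sum log(1 + (a/sigma)^2)."""
    a = np.zeros(Phi.shape[1])
    for _ in range(steps):
        residual = I - Phi @ a
        sparse_grad = 2.0 * a / (sigma ** 2 + a ** 2)   # dS/da for the log prior
        a -= lr * (-lam * Phi.T @ residual + sparse_grad)
    return a

rng = np.random.default_rng(2)
Phi = rng.normal(size=(64, 128))          # overcomplete: 128 coefficients, 64 samples
I = Phi[:, 5] * 2.0 + Phi[:, 42] * -1.5   # signal built from two basis functions
a = sparse_code(I, Phi)
print(np.argsort(np.abs(a))[-2:])         # should contain the active indices 5 and 42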
4.1 IMAGE WARPLETS

Image regions are modeled as sets of repeating texture patches, assuming that each region can be adequately represented by affine coordinate transformations of some prototypical patch (the source texton) and an intensity scaling. Denoting the coordinates of a target patch fj by y = (p, q)^T, 0 ≤ p, q ≤ N, the target patch estimate is a linear transformation of the source patch,

f̂j(y) = λij fi(Tij^(-1)(y)),    Tij(x) = Aij x + tij,

where Tij is a 2-D affine transformation of the coordinates of some source patch fi and λij is a scaling for image intensity changes, estimated from the energy ratio of the two blocks. The transformation consists of a linear part, Aij, and a translation, tij. It is convenient to use overlapping square image patches (blocks) as warplets of size B, such as 16 x 16, 32 x 32, etc. By using a squared cosine window function centered on the block,

w(y) = cos^2(πp/B) cos^2(πq/B),

and having the blocks overlap by 50%, it is possible to sum the transformed patches, w(y) f̂j(y), without blocking artifacts being introduced. To estimate the affine transformations Tij, the Fourier transform of the source patch, Fi(u), is used to separate the transformation,

Fj(u) = (1/|det Aij|) exp(i u^T tij) Fi((Aij^T)^(-1) u),

such that the linear part, Aij, affects only the amplitude spectrum and the translation, tij, is exhibited as a phase gradient. The amplitude spectrum, |Fi(u)|, is modeled as a two-dimensional, M-component Gaussian mixture,

Gi(u) = Σm am exp(-u^T Cm^(-1) u / 2),

having centroids fixed on the origin of the spectrum and coordinates u = (r, s)^T. The Gaussian parameters {am, Cm} are then estimated by minimizing the residual error,

Σu (|Fi(u)| - Gi(u; am, Cm))^2,

using a standard non-linear optimization method, Levenberg-Marquardt (LM). LM is a conjugate-gradient descent method which requires the gradient with respect to the fitting parameters to be known:

dGi(u)/dam = Gi(u)/am,    dGi(u)/dCm^(-1) = -Gi(u) u u^T / 2.

In the experiments presented below, we have found that a mixture with 2 components is sufficient to model the general shape and amplitude of oriented and periodic patches, so that the model can be uniquely fit to the amplitude spectrum of a target image block. The second step of the affine transformation estimation uses the mixture model of the source block spectrum and searches for a linear transformation that minimizes the squared residuals, Σu (Gi(Aij(u)) - |Fj(u)|)^2. This search can again be performed by the LM method; this time the gradients of Hi(u) = Gi(A(u)) with respect to the parameters of the linear transformation matrix, dHi(u)/dA = -Hi(u) ..., are needed.

The final step is estimating the translation, tij, which is exhibited in the phase spectrum of the block DFT. An estimate of a transformed block, f̂j(y), can be synthesized by applying the linear transformation y = Aij x; the translation tij is then taken as the peak location of the cross-correlation, i.e. argmax_t [f̂j ⋆ fj].

The image warplet, Wi, is defined as the set of all image blocks j transformed to the coordinate frame of block i using Tij^(-1):

Wi = { wij(x) = f̂ij(x), x = Aij^(-1)(y - tij), ∀j }.

PCA (or harmonic analysis) of the warplet blocks wij is used to encode their variability. Then, any image block is represented by the mean warplet plus an appropriate number of modes of variation in the warplet domain, together with the corresponding block-to-block affine warps, Tij.
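The final translation-estimation step has a particularly compact implementation via FFT cross-correlation. The sketch below is generic phase-correlation code, not the authors' implementation; the normalization choices are ours:

import numpy as np

def estimate_translation(f_hat, f):
    """Peak of the circular cross-correlation between two equal-size blocks,
    computed in the Fourier domain (phase correlation)."""
    F1 = np.fft.fft2(f_hat)
    F2 = np.fft.fft2(f)
    cross_power = F1 * np.conj(F2)
    cross_power /= np.abs(cross_power) + 1e-12   # keep phase only
    corr = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Interpret peaks past the midpoint as negative shifts.
    shifts = [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return tuple(shifts)

block = np.random.default_rng(3).random((32, 32))
shifted = np.roll(block, (5, -3), axis=(0, 1))
print(estimate_translation(shifted, block))   # -> (5, -3)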
5. SELF-SIMILARITY CLUSTERING

To discover a set of prototypical warplets, ideally one for each image region, that appropriately 'span' the image, a clustering approach is used. In the supervised case, we cluster not just to the representative (given) cluster centroids, but to an extended family of cluster centers created by taking a finite sub-group from the group (Aij, tij) applied to fl, where fl is a given prototypical block. As with the GMM model estimation described above, we extrapolate the sub-group of rotations and scalings from the amplitude spectrum of the prototype block. In our experiments we have restricted these prototype transformations to a sub-group of rotations and scalings of fl and ignored shears. We have used 8 rotations and 8 scalings to create a cluster family flk, 0 ≤ k < 64. Having applied each transformation k, these prototype blocks give B^2-dimensional vectors: flk(x) → flk. The remaining image blocks are then clustered using a nearest-neighbour criterion. We can imagine that the data feature vectors cluster to the closest points on manifolds represented by the rotation/scaling family of each prototype. Thus each data block is labeled as the closest family member, which already takes the affine group invariance into account. Schölkopf has suggested this same idea, but for extending SVM vectors [6].

Unsupervised clustering can take place directly in the warplet domain on the warped blocks wij, since the warping will have eliminated the local shape variability and allows the data vectors wij to be directly compared. A natural extension to modeling the class variability is to perform a set of local PCAs on each resulting cluster.

6. EXPERIMENTAL RESULTS

Figure 1 shows results of self-similarity image modeling and reconstruction on a fingerprint, two wild-life images with zebras and a jaguar, and the Lena image. The mixture modeling analysis was performed on pre-whitened (Laplacian pyramid) versions of the originals (see [4] for details), and a coarse version of the low-pass information was added back at the end. The number of parameters per pixel for these images is related to the block size B and the image size N, and can be estimated by

P(B) = 11/(B/2)^2 + (B/N)^2.

This assumes that each affine transformation requires 6 parameters, with 1 for the intensity scaling λ, and the low-pass information encoded at twice the spatial resolution as the number of blocks in each dimension, i.e. 4 extra parameters per block: the total being 6 + 4 + 1 = 11 per block. In addition, each prototype itself contributes B^2 pixels per image.

7. SUMMARY AND DISCUSSIONS

We have described an affine symmetric image model that defines patch-to-patch affine relationships on an image with a uniform lattice. Considering textures on every patch, it was realized that not all patches are appropriate for the exploitation of affine

336
th
PROCEEDINGS OF 4 NATIONAL CONFERENCE ON HIGH PERFORMANCE
COMPUTING on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

symmetry with other patches, particularly patches from background areas with uniform intensity. The first obstacle, therefore, was to develop a texture analyzer so that a patch with structural texture can be distinguished and treated differently from nonstructural textures. Two different approaches to estimate the distance in affine space were presented: one based upon the warping residue and the other based upon affine invariant features. The latter provides a more practical solution in terms of computational efficiency. Affine invariance has received much attention with the recent emergence of content-based retrieval systems, which could take advantage of this work. The majority of existing texture analysis methods, however, are not designed to analyze texture from an invariance viewpoint. Several noteworthy geometric invariant analysis methods share the common theme of directional pattern recognition. This led us to develop the texture model used for structuredness analysis further, into a fully affine invariant descriptor. The usefulness of the new affine invariant feature was demonstrated in a multiresolution framework for the segmentation of a textured object. This work not only presents an interesting approach to the segmentation task but also offers a feasible solution for efficient implementation. The underlying concept has been applied to image classification by many researchers, but few have applied the affine symmetry model to segmentation by partitioning the image into blocks. The complexity of the algorithms, however, has been a major issue prohibiting practical implementations. The motivation has been to develop a computationally efficient image texture classification algorithm while maintaining the texture discriminative power of previous approaches. The simplicity and efficiency of the presented approach, utilizing an affine invariant shape description, is demonstrated; it may be of interest where efficient texture segmentation is required. Experimental evaluation indicates acceptable segmentation results for structural texture and the algorithm's robustness to noise. Further study utilizing a random field segmentation framework with other useful features may improve the algorithm, for example by determining the optimal number of segmented regions. Additionally, the model can also be utilized for image compression.

REFERENCES
[1] T. Hsu and R. Wilson, "A two-component model of texture for analysis and synthesis," IEEE Trans. Image Process., vol. 7, no. 10, pp. 1466-1476, Oct. 1998.
[2] R. Haralick, "Statistical and structural approaches to texture," Proc. IEEE, vol. 67, no. 5, pp. 786-804, May 1979.
[3] R. Haralick, K. Shanmugam, and I. Dinstein, "Texture features for image classification," IEEE Trans. SMC, vol. SMC-3, no. 6, pp. 610-621, Nov. 1973.
[4] B. Olshausen and D. Field, "What is the other 85% of V1 doing?," in 23 Problems in Systems Neuroscience. London, U.K.: Oxford Univ. Press, 2004.
[5] B. Olshausen and D. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.
[6] E. Oja, "A simplified neuron model as a principal component analyzer," J. Math. Biol., vol. 15, pp. 267-273, 1982.
[7] R.P.N. Rao, B.A. Olshausen, and M.S. Lewicki, Probabilistic Models of the Brain: Perception and Neural Function. Cambridge, MA: MIT Press, 2002.
[8] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. Hoboken, NJ: Wiley, 2001.
[9] B. Julesz, "Texture gradients: The texton theory revisited," Spatial Vis., vol. 1, no. 1, pp. 19-30, 1985.
[10] S.-C. Zhu, C.-E. Guo, Y. Wang, and Z. Xu, "What are textons?," Int. J. Comput. Vis., pp. 121-143, 2005.
[11] B. Olshausen and D. Field, "Sparse coding with an over-complete basis set: A strategy employed by V1?," Vis. Research, vol. 37, pp. 3311-3325, 1997.
[12] B. Olshausen, "Learning sparse, overcomplete representations of time-varying natural images," in Proc. IEEE Int. Conf. Image Processing, 2003, vol. 1, pp. 41-44.
[13] A. Jacquin, "A novel fractal block-coding technique for digital images," in Proc. IEEE ICASSP, Albuquerque, NM, Apr. 1990, pp. 2225-2228.
[14] Y. Fisher, Fractal Image Compression: Theory and Applications, ser. Communications and Information Theory. New York: Springer-Verlag, 1995.
[15] R. Bracewell, K. Chang, A. Jha, and Y. Wang, "Affine theorem for two-dimensional Fourier transform," Electron. Letts., vol. 29, no. 3, p. 304, 1993.
[16] E. Brigham, The Fast Fourier Transform and Its Applications. Upper Saddle River, NJ: Prentice-Hall, 1988.
[17] Z. Yao, N. Rajpoot, and R. Wilson, "Directional wavelet with Fourier-type bases for image processing," in Wavelet Analysis and Applications, ser. Applied and Numerical Harmonic Analysis, T. Qian and M.I. Vai, Eds. New York: Springer-Verlag, 2007, pp. 123-142.
[18] H. Park, G. Martin, and Z. Yao, "Image denoising with directional bases," in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. 301-304.
[19] D. Hammond and E. Simoncelli, "Image modeling and denoising with orientation-adapted Gaussian scale mixtures," IEEE Trans. Image Process., vol. 17, no. 11, pp. 2089-2101, Nov. 2008.
[20] K. Arbter, W. Snyder, H. Burkhardt, and G. Hirzinger, "Application of affine-invariant Fourier descriptors to recognition of 3-D objects," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 640-647, Jul. 1990.
[21] R. Wilson, A. Calway, and E. Pearson, "A generalized wavelet transform for Fourier analysis: The multiresolution Fourier transform and its application to image and audio signal analysis," IEEE Trans. Inform. Theory, vol. 38, pp. 674-690, Mar. 1992.
[22] H. Park, G. Martin, and A. Bhalerao, "Structural texture segmentation using affine symmetry," in Proc. IEEE Int. Conf. Image Processing, San Antonio, TX, Sep. 2007, pp. 49-52.
[23] R. Wilson and C.-T. Li, "A class of discrete multiresolution random fields and its application to image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 1, pp. 42-56, Jan. 2003.
[24] A. Bhalerao and R. Wilson, "Affine invariant image segmentation," presented at the British Machine Vision Conference, Kingston University, U.K., 2004.
[25] T. Smith, "Texture Modeling and Synthesis in Two and Three Dimensions," M.S. thesis, Dept. Comput. Sci., Univ. Warwick, Coventry, U.K., 2004.
[26] H. Park, A. Bhalerao, G. Martin, and A. Yu, "An affine symmetric approach to natural image compression," in Proc. 2nd Int. Mobile Multimedia Communications Conf., Alghero, Italy, Sep. 2006, pp. 1-6.
[27] C. Li, "Multiresolution image segmentation integrating Gibbs sampler and region merging algorithm," Signal Process., vol. 83, pp. 67-78, 2003.
[28] D. Donoho and I. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer. Statist. Assoc., vol. 90, no. 432, pp. 1200-1224, 1995.
[29] A. Bhalerao and R. Wilson, "Warplet: An image-dependent wavelet representation," in Proc. IEEE Int. Conf. Image Processing, 2005, pp. 490-493.
COMBINING TPE SCHEME AND SDEC FOR SECURE DISTRIBUTED NETWORKED STORAGE

*S. Madhavi  **S. Kalpana Devi
*M.E., Computer Science and Engineering, S.A. Engineering College, Chennai-77
Madhavi11lakshmi@gmail.com
**Assistant Professor, Department of Computer Science and Engineering, S.A. Engineering College, Chennai-77
Abstract--Distributed networked storage systems provide the storage service on the Internet. The data stored in the system should remain private even if all storage servers in the system are compromised. The major challenge in designing these distributed networked storage systems is to provide a better privacy guarantee while maintaining the distributed structure. To achieve this, a new technique called the secure decentralized erasure code, which combines a threshold public key encryption scheme and a variant of the decentralized erasure code, is used. The secure distributed networked storage system constructed by SDEC is decentralized, robust, private, and has low storage cost. To maintain the decentralized architecture while applying data encryption, a new threshold public key encryption scheme is used such that each key server can independently perform decryption. As a result, the distributed networked storage system constructed by SDEC will be secure and fully decentralized.

Index Terms--Secure Decentralized Erasure Code (SDEC), Threshold Public Key Encryption (TPE), security, Distributed Networked Storage.

1. INTRODUCTION

Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs. Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one computer. The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area. The terms are nowadays used in a much wider sense, even referring to autonomous processes that run on the same physical computer and interact with each other by message passing. While there is no single definition of a distributed system, the following defining properties are commonly used:
 There are several autonomous computational entities, each of which has its own local memory.
 The entities communicate with each other by message passing.
A distributed system may have a common goal, such as solving a large computational problem. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.
Other typical properties of distributed systems are:
 The system has to tolerate failures in individual computers.
 The structure of the system (network topology, network latency, number of computers) is not known in advance; the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.
 Each computer has only a limited, incomplete view of the system; each computer may know only one part of the input.

A distributed computer (also known as a distributed memory multiprocessor) is a distributed-memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable. It is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
 In parallel computing, all processors have access to a shared memory. Shared memory can be used to exchange information between processors.
 In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.

There are two main reasons for using distributed systems and distributed computing.
1. The very nature of the application may require the use of a communication network that connects several computers. For example, data is produced in one physical location and it is needed in another location.
2. There are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons.
For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.

Distributed storage needs increase almost exponentially with the widespread use of e-mail, photos, videos, logs, etc. Everything cannot be stored on one large disk: if that disk fails, all the stored information is lost. The solution is to store the user's information, along with some redundant information, across many disks; then, even if a disk fails, there is still enough information in the surviving disks, and the lost information can be rebuilt on a new disk. But it is not simple to do so, because today's large data centers have so many disks that multiple disk failures are common, and permanent data loss becomes likely. Performance metrics such as storage efficiency, saturation throughput, rebuild time, mean time to data loss, and encoding/decoding/update/rebuild complexity are all affected. Hence erasure codes are used to overcome the above, where the data on n disks are encoded onto (n+m) disks such that the whole system can tolerate up to m disk failures.

Distributed networked storage systems aim to provide the storage service on the Internet. Current research on distributed networked storage systems focuses on:
 Efficiency of the storage system
 Robustness of the storage system
The methods for accelerating the storing and retrieval processes should achieve minimal cost and maximal robustness. Since the Internet is a public environment that anyone can freely access, it is also important to consider the privacy of the stored information of the users. The goal is to design distributed networked storage systems in such a way that privacy is guaranteed while maintaining the distributed structure, and to ensure that the data stored in the system remain private even if all storage servers in the system are compromised.

2. LITERATURE REVIEW

The purpose of distributed networked storage systems [2], [3], [4] is to store data reliably over a long period of time by using a distributed collection of storage servers. Long-term reliability requires some sort of redundancy. A straightforward solution is simple replication; however, the storage cost for the system is high. Erasure codes were proposed in several designs for reducing the storage overhead in each storage server [5], [6], after linear network codes [7], [8] were proposed. A decentralized erasure code [9] is an erasure code with a fully decentralized encoding process. Assume that there are n storage servers in the networked storage system, and k messages are stored into the storage servers such that one can retrieve the k messages by querying any k storage servers. The method of erasure codes provides some level of privacy guarantee, since the stored data in fewer than k storage servers are not enough to reveal all information about the k messages. However, it is hard to ensure that fewer than k storage servers are compromised in an open network. Thus, a more sophisticated method is
required to protect the data in the storage servers while still letting the owner of the messages retrieve them even if only some storage servers respond to the retrieval request.

3. SYSTEM DESIGN

The scope of this work is to protect the data from unauthorized access in the distributed networked storage system. To prevent unauthorized access, a concept called SDEC is used, by which the data are secured in encrypted form. SDEC is designed to reduce the storage overhead in each storage server: a decentralized erasure code has a fully decentralized encoding process, and SDEC combines the concepts of data encryption and decentralized erasure codes. In this code, the messages are stored in an encrypted form, so that even if an attacker compromises all storage servers, he cannot compute information about the content of the messages. In cryptography, the security of a system lies in the protection of the secret key. Thus, the key servers that hold the secret key are set up or carefully chosen by the owner; due to their importance, they are highly protected by various system and cryptographic mechanisms [10][11].

In this storage system, the owner shares his decryption key with a set of key servers in order to mitigate the risk of key leakage. As long as fewer than t key servers are compromised by the attacker, the decryption key is safe; furthermore, as long as t key servers get cipher texts from some storage servers to decrypt, the owner can compute the messages back.

The system should maintain the decentralized architecture while applying data encryption. Thus, a new threshold public key encryption scheme is used such that each key server independently performs decryption. In traditional threshold public key encryption schemes [12], [13], decrypting a set of cipher texts requires that each of the key servers decrypt all of the cipher texts. In the new threshold public key encryption scheme, on the other hand, decrypting a set of cipher texts only requires that each of the key servers decrypt one of the cipher texts. As a result, the distributed networked storage system constructed by the secure decentralized erasure code is secure and fully decentralized: each encrypted message is distributed independently, each storage server performs the encoding process independently, and each key server executes decryption independently.

4. SYSTEM ARCHITECTURE

The figure provides an overview of the system. There are k messages Mi, 1 ≤ i ≤ k, to be stored into n storage servers SSi, 1 ≤ i ≤ n. These messages are the segments of a file, and a message identifier is assigned to the k messages. Each message Mi is encrypted under the owner's public key pk as Ci = E(pk, Mi). Then, each cipher text is sent to v randomly chosen storage servers.

Each storage server SSi combines the received cipher texts by using the decentralized erasure code to form the stored data ζi. The owner's secret key sk is shared among m key servers KSi, 1 ≤ i ≤ m, by a threshold secret sharing scheme, so that key server KSi holds a secret key share ski. To retrieve the k messages, the owner instructs the m key servers such that each key server retrieves stored data from u storage servers and partially decrypts the retrieved data. Then, the owner collects the partial decryption results, called decryption shares, from the key servers and combines them to recover the k messages.
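The storage and retrieval flow can be summarized in a short sketch. The code below is a schematic walk-through only: encrypt, combine, share_decrypt and recover are trivial stand-ins of our own, NOT the pairing-based TPE of Section 5, which is what instantiates them in the real scheme:

import random

def encrypt(pk, m):           return ("ct", m)                  # Enc placeholder
def combine(cts):             return list(cts)                  # encoding placeholder
def share_decrypt(share, c):  return c[1]                       # ShareDec placeholder
def recover(shares, k):       return sorted(set(shares))[:k]    # Combine placeholder

class Server:
    def __init__(self):
        self.buffer = []
        self.stored = None

def store(messages, servers, pk, v):
    # Owner: encrypt each message, send it to v randomly chosen servers,
    # then every server encodes whatever it has received.
    for M in messages:
        C = encrypt(pk, M)
        for s in random.sample(servers, v):
            s.buffer.append(C)
    for s in servers:
        s.stored = combine(s.buffer)

def retrieve(key_shares, servers, k, u):
    # Each key server queries u servers and emits decryption shares;
    # the owner combines the shares to recover the k messages.
    shares = []
    for ks in key_shares:
        for s in random.sample(servers, u):
            shares += [share_decrypt(ks, c) for c in s.stored]
    return recover(shares, k)

servers = [Server() for _ in range(8)]
store([11, 22, 33], servers, pk=None, v=3)
print(retrieve(key_shares=[1, 2, 3], servers=servers, k=3, u=4))  # usually [11, 22, 33]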
5. PRELIMINARIES

This section briefly describes bilinear maps, proposes a threshold public key encryption scheme using a bilinear map, and provides an overview of decentralized erasure codes.

5.1 BILINEAR MAPS AND ASSUMPTIONS

Bilinear map: given two cyclic multiplicative groups of prime order with generators, a bilinear mapping between them can be defined, provided it satisfies bilinearity and non-degeneracy. The assumptions based on bilinear maps are:

Bilinear Diffie-Hellman assumption: it is hard to solve the bilinear Diffie-Hellman problem with significant probability in polynomial time.

Decisional Bilinear Diffie-Hellman assumption: it is hard to distinguish the group element in question from a random element.

5.2 THRESHOLD PUBLIC KEY ENCRYPTION

In cryptography, a cryptosystem is called a 'threshold cryptosystem' if, in order to decrypt an encrypted message, a number of parties exceeding a threshold is required to cooperate in the decryption protocol. The message is encrypted using a public key, and the corresponding private key is shared among the participating parties. Let n be the number of parties. Such a system is called (t,n)-threshold if at least t of these parties can efficiently decrypt the ciphertext, while fewer than t have no useful information. Similarly, it is possible to define a (t,n)-threshold signature scheme, where at least t parties are required for creating a signature. Threshold versions of encryption schemes can be built for many public key encryption schemes; the natural goal of such schemes is to be as secure as the original scheme.

A threshold public key encryption consists of 6 algorithms:
 SetUp: generates the public parameters of the whole system.
 KeyGen: generates a key pair, consisting of a public key pk and a secret key sk, for each user.
 ShareKeyGen: each user uses ShareKeyGen to share his secret key into n secret key shares such that any t of them can recover the secret key.
 Enc: encrypts a given message by a public key pk and outputs a cipher text.
 ShareDec: partially decrypts a given cipher text by a secret key share and outputs a decryption share.
 Combine: takes a set of decryption shares as input and outputs the message if and only if there are at least t decryption shares.

When a fixed h is used for a set of cipher texts, the set of those cipher texts is multiplicatively homomorphic. The multiplicative homomorphic property is that, given a cipher text for M1 and a cipher text for M2, a cipher text for M1 x M2 can be generated without knowing the secret key x, M1, or M2.
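The ShareKeyGen/Combine backbone of such a scheme is Shamir secret sharing. The toy below is a generic (t,n) Shamir implementation over a small prime field, not the paper's pairing-based construction; it shows how any t shares recover the secret while fewer reveal nothing:

import random

P = 2**61 - 1  # a prime modulus (toy parameter)

def share_key_gen(secret, t, n):
    """Split `secret` into n shares; any t of them recover it (Shamir)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
    return [(i, poly(i)) for i in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret from >= t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share_key_gen(secret=123456789, t=3, n=5)
print(combine(shares[:3]))   # any 3 shares -> 123456789
print(combine(shares[2:]))   # a different 3 -> 123456789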
5.3 DECENTRALIZED ERASURE CODE

A decentralized erasure code [9] is a random linear code with a sparse generator matrix. Let the message be I = (m1, m2, ..., mk), the generator matrix be G = [gi,j], 1 ≤ i ≤ k, 1 ≤ j ≤ n, and the codeword be O = (w1, w2, ..., wn). The generator matrix is constructed by the encoder as follows:
 For each row, randomly mark an entry as 1, and repeat this process;
 The encoder randomly sets a value for each marked entry.
The decoding process receives k columns of G and the corresponding codeword elements to compute the original message. The decoding is successful if the corresponding k x k submatrix is invertible. While such a code is resilient to erasures, the stored messages themselves are in the clear and can be attacked by other users.

5.4 SECURE DECENTRALIZED ERASURE CODES

Assume that there are n storage servers, which store data, and m key servers, which own secret key shares and perform partial decryption. Consider that the owner has the public key and shares the secret key x among the m key servers with a threshold t, where m ≥ t ≥ k. Let the k messages be M1, M2, ..., Mk. The storage process and the retrieval process are described as follows:

 Storage process: to store the k messages:
1) Message encryption: the owner encrypts all k messages via the threshold public key encryption with the same identifier.
2) Cipher text distribution: for each cipher text, the owner randomly chooses storage servers and sends each of them a copy of the cipher text.
3) Decentralized encoding: each storage server groups all received cipher texts with the same message identifier.

 Retrieval process: to retrieve the k messages:
1) Retrieval command: the owner sends a command to the key servers with the message identifier.
2) Partial decryption: each key server randomly queries storage servers with the message identifier and obtains stored data from at most u storage servers. Then the key server performs ShareDec on each received cipher text with its secret key share to obtain a decryption share of the cipher text.
3) Combining and decoding: the owner chooses decryption shares from all received data, grouped by the message identifier. If the number of received decryption shares is more than required, the owner randomly selects the required number out of them; if it is fewer, the retrieval process fails. Using the message identifier, the owner decrypts and obtains the original information.
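The decentralized erasure code of Section 5.3 that underlies this retrieval is easy to demonstrate over a small prime field. The sketch below is our own toy over GF(p), operating on plaintext messages rather than cipher texts: it encodes k messages onto n servers and decodes from any k columns whose submatrix is invertible:

import random

p = 2**31 - 1  # field modulus for the toy code

def encode(messages, n, d=3):
    """Each message is sent to d random servers with a random coefficient;
    server j stores w_j = sum_i G[i][j] * m_i mod p."""
    k = len(messages)
    G = [[0] * n for _ in range(k)]
    for i in range(k):
        for j in random.sample(range(n), d):
            G[i][j] = random.randrange(1, p)
    w = [sum(G[i][j] * messages[i] for i in range(k)) % p for j in range(n)]
    return G, w

def decode(cols, w_sub, k):
    """cols[r] = the r-th queried column of G; solve the k x k system mod p."""
    A = [cols[r][:] + [w_sub[r]] for r in range(k)]
    for c in range(k):
        pivot = next((r for r in range(c, k) if A[r][c] != 0), None)
        if pivot is None:
            return None  # submatrix not invertible; try other servers
        A[c], A[pivot] = A[pivot], A[c]
        inv = pow(A[c][c], p - 2, p)
        A[c] = [x * inv % p for x in A[c]]
        for r in range(k):
            if r != c and A[r][c]:
                f = A[r][c]
                A[r] = [(A[r][x] - f * A[c][x]) % p for x in range(k + 1)]
    return [row[k] for row in A]

msgs = [101, 202, 303]
G, w = encode(msgs, n=8)
result = None
while result is None:                       # retry until the submatrix is invertible
    picked = random.sample(range(8), 3)     # query any k = 3 servers
    cols = [[G[i][j] for i in range(3)] for j in picked]
    result = decode(cols, [w[j] for j in picked], 3)
print(result)                               # -> [101, 202, 303]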
6. CONCLUSION

The distributed networked storage system constructed by SDEC thus provides both the storage service and the key management service, and the construction is fully decentralized: each encrypted message is distributed independently, each storage server performs the encoding process in a decentralized way, and each key server queries the storage servers independently. Moreover, the secure distributed networked storage system guarantees the privacy of the messages even if all storage servers are compromised. Hence, the storage system securely stores data for a long period of time on untrusted storage servers in a distributed network structure.

REFERENCES
[1] Hsiao-Ying Lin and Wen-Guey Tzeng, "A Secure Decentralized Erasure Code for Distributed Networked Storage," IEEE Transactions on Parallel and Distributed Systems, 2010.
[2] J. Kubiatowicz, D. Bindel, Y. Chen, S.E. Czerwinski, P.R. Eaton, D. Geels, R. Gummadi, S.C. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B.Y. Zhao, "Oceanstore: an architecture for global-scale persistent storage," in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS, vol. 35. ACM, 2000, pp. 190-201.
[3] S.C. Rhea, C. Wells, P.R. Eaton, D. Geels, B.Y. Zhao, H. Weatherspoon, and J. Kubiatowicz, "Maintenance-free global data storage," IEEE Internet Computing, vol. 5, no. 5, pp. 40-49, 2001.
[4] F. Dabek, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica, "Wide-area cooperative storage with CFS," in Proceedings of the 18th Symposium on Operating Systems Principles - SOSP. ACM, 2001, pp. 202-215.
[5] S. Acedański, S. Deb, M. Médard, and R. Koetter, "How good is random linear coding based distributed networked storage," in Proceedings of the First Workshop on Network Coding, Theory, and Applications - NetCod, 2005.
[6] C. Gkantsidis and P. Rodriguez, "Network coding for large scale content distribution," in Proceedings of IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies - INFOCOM, vol. 4. IEEE Communications Society, 2005, pp. 2235-2245.
[7] R. Ahlswede, N. Cai, S.-Y.R. Li, and R.W. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, pp. 1204-1216, 2000.
[8] S.-Y.R. Li, R.W. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, vol. 49, pp. 371-381, 2003.
[9] A.G. Dimakis, V. Prabhakaran, and K. Ramchandran, "Decentralized erasure codes for distributed networked storage," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2809-2816, 2006.
[10] A. Herzberg, S. Jarecki, H. Krawczyk, and M. Yung, "Proactive secret sharing or: How to cope with perpetual leakage," in Proceedings of the 15th Annual International Cryptology Conference - CRYPTO, ser. Lecture Notes in Computer Science, vol. 963. Springer, 1995, pp. 339-352.
[11] C. Cachin, K. Kursawe, A. Lysyanskaya, and R. Strobl, "Asynchronous verifiable secret sharing and proactive cryptosystems," in Proceedings of the 9th ACM Conference on Computer and Communications Security - CCS. ACM, 2002, pp. 88-97.
[12] R. Canetti and S. Goldwasser, "An efficient threshold public key cryptosystem secure against adaptive chosen cipher text attack," 1999, pp. 90-106.
[13] D. Boneh, X. Boyen, and S. Halevi, "Chosen cipher text secure public key threshold encryption without random oracles," in CT-RSA, ser. Lecture Notes in Computer Science, vol. 3860. Springer, 2006, pp. 226-243.
PERFORMANCE EVALUATION OF FLOOD SEQUENCING PROTOCOLS IN SENSOR NETWORKS

*Lavanya R., **E. Sujatha
*Computer Science Department, S.A. Engineering College, laaviraj@gmail.com
**Computer Science Department, S.A. Engineering College, sanjaymohankumar@gmail.com

Abstract— Flood is a communication primitive that can be used by the base station of a sensor network to send a copy of a message to every sensor in the network. When a sensor receives a flood message, the sensor needs to check whether it has received this message for the first time (and so the message is fresh), or it has received the same message earlier (and so the message is redundant). In this paper, a family of four flood sequencing protocols is discussed: a sequencing free protocol, a linear sequencing protocol, a circular sequencing protocol, and a differentiated sequencing protocol. The self-stabilization property, stable properties, collision avoidance, number of sources of transmission, secured communication, and flooding period of these four flood sequencing protocols are analyzed over various settings of sensor networks, and on this basis the performance of all the protocols is evaluated.

Keywords— Self-stabilization, Flood sequencing protocol, Sequence numbers, Sensor networks, Flood

XII. INTRODUCTION

Flood is a communication primitive that can be used by the base station of a sensor network to send a copy of a message to every sensor in the network. The execution of a flood starts by the base station sending a message to all its neighbors. When a sensor receives a message, the sensor needs to check whether it has received this message for the first time or not. Only if the sensor has received the message for the first time, the sensor keeps a copy of the message and may forward the message to all its neighbors. Otherwise, the sensor discards the message.

To distinguish between "fresh" flood messages that a sensor should keep and "redundant" flood messages that a sensor should discard, the base station selects a sequence number and attaches it to a flood message before the base station broadcasts the message. When a sensor receives a flood message, the sensor determines, based on the sequence number in the received message, whether the message is fresh or redundant. The sensor accepts the message if it is fresh and discards the message if it is redundant. We call a protocol that uses sequence numbers to distinguish between fresh flood messages and redundant flood messages a flood sequencing protocol.
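To make the accept-or-discard decision concrete, the sketch below shows the shape of the receiving logic that the rest of the paper refines. It is illustrative only (the helper names are invented here, not taken from the paper), and the predicate is_fresh is exactly the piece that each of the four protocols instantiates differently.

# Illustrative sketch of a sensor's receiving action in a flood sequencing
# protocol. Only the freshness test is protocol-specific.

def deliver(sensor, s):
    print(f"accepted flood message with sequence {s}")

def forward(sensor, h, s):
    print(f"rebroadcasting data({h}, {s})")

def on_receive(sensor, h, s, is_fresh):
    """Handle a data(h, s) message at `sensor`.

    sensor   -- object with attribute slast (last accepted sequence number)
    h        -- remaining hop count carried by the message
    s        -- sequence number carried by the message
    is_fresh -- protocol-specific predicate over (s, slast)
    """
    if is_fresh(s, sensor.slast):
        sensor.slast = s               # accept: remember the sequence number
        deliver(sensor, s)
        if h > 1:
            forward(sensor, h - 1, s)  # forward with decremented hop count
    # otherwise the message is redundant and is discarded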
In a flood sequencing protocol, when a fault corrupts the sequence numbers stored in some sensors in a sensor network, the network can reach an illegitimate state where the sensors discard fresh flood messages and accept redundant flood messages. Therefore, a flood sequencing protocol should be designed such that if the protocol ever reaches an illegitimate state due to some fault, the protocol is guaranteed to converge back to its legitimate states, where every sensor accepts every fresh flood message and discards every redundant flood message.

In this paper, we discuss a family of four flood sequencing protocols: a sequencing free protocol, a linear sequencing protocol, a circular sequencing protocol, and a differentiated sequencing protocol. We analyse the stabilization properties of these four protocols. For each of the protocols, we first compute an upper bound on the convergence time of the protocol from an illegitimate state to legitimate states. Second, we compute an upper bound on the number of fresh flood messages that can be discarded by each sensor during the convergence. Third, we compute an upper bound on the number of redundant flood messages that can be accepted by each sensor during the convergence.

XIII. RELATED WORK

A flood sequencing protocol can be designed in various ways, depending on several design decisions, such as how the next sequence number is selected by the base station, how each sensor determines based on the sequence number in a received message whether the received message is fresh or redundant, and what information the base station and each sensor stores in its local memory. The practice of using sequence numbers to distinguish between fresh and redundant flood messages has been adopted by most flood protocols in the literature.

There have been earlier efforts to study flood protocols in sensor networks [6], [7], [8], [9]. It is important to state that these protocols focus on reducing the total number of retransmissions or forwarding (sensor) nodes for a flood message, while the flood sequencing protocols focus on distinguishing between fresh and redundant messages, to prevent nodes from forwarding the same message more than once.

In [2], [9], it was suggested to associate a sequence number with each flood message. The flood protocols discussed in [3], [4], [7] propose to attach a unique identifier or sequence number to each flood message and make each node maintain a list of identifiers that it has received recently.

In [5], for a recently received flood message, each node maintains an entry of the source address, sequence number, and lifetime. However, in these protocols, any details on how sequence numbers or identifiers are used by nodes, how many identifiers or messages each node maintains, when a node deletes an identifier or a message from the list, or how the lifetime of a message is determined (i.e., the design decisions of their flood sequencing protocols) were not specified.

In Scalable Reliable Multicast (SRM) [10], when a receiver in a multicast group detects that it has a missing data message, it attempts to retrieve the message from any node in the group by requesting retransmission. This work is based on the assumption that each data message has a unique and persistent name, and it utilizes application data units to name messages. In a flood sequencing protocol, sensors can use sequence numbers only in a limited range for flood messages. Thus, the sensors cannot identify a message uniquely based on the sequence number of the message, and cannot use the sequence number for requesting retransmission and replying to a request.

The protocols in [11], [12] use named data that is specific to applications for dissemination and routing in sensor networks. However, a flood sequencing protocol can be used before any application is deployed in the network. Thus, using named data is not suitable for a flood sequencing protocol.

Fig 1. A specification of sensor 0 in a flood sequencing protocol

A directed edge (u, v), from a sensor u to a sensor v, that is labelled with probability p (where p > 0) indicates that if sensor u sends a message, then this message arrives at sensor v with probability p (provided that neither sensor v nor any "neighbouring sensor" of v sends another message at the same time). In this work, two values, 0.95 and 0.5, are selected for p. If the topology of a sensor network has a directed edge from a sensor u to a sensor v, then u is called an in-neighbor of v and v is called an out-neighbor of u.

The sensor 0 specification is given in [1]. During the execution of a sensor network protocol, several faults can occur, resulting in corrupting the state of the protocol arbitrarily. Examples of these faults are wrong initialization, memory corruption, message corruption,
and sensor failure and recovery. We assume that these faults do not continuously occur in the network.

XIV. OVERVIEW OF A FLOOD SEQUENCING PROTOCOL

Consider a network that has n sensors. In this network, sensor 0 is the base station and can initiate message floods over the network. To initiate the flood of a message, sensor 0 selects a sequence number slast for the message, and sends a message of the form data(hmax, slast), where hmax is the maximum number of hops to be made by this data message in the network.

If sensor 0 initiates one flood and shortly after initiates another flood, some forwarded messages from these two floods can collide with one another, causing many sensors in the network not to receive the message of either flood, or not to receive the messages of both floods. To prevent message collision across consecutive flood messages, once sensor 0 broadcasts a message, it needs to wait enough time until this message is no longer forwarded in the network before broadcasting the next message. The time period that sensor 0 needs to wait after broadcasting a message and before broadcasting the next message is called the flood period. The flood period consists of f time units.

Each sensor u that is not sensor 0 keeps track of the last sequence number accepted by u in a variable called slast. When sensor u receives a data(h, s) message, the sensor decides whether it accepts the message based on the values of slast and s, and forwards it as a data(h-1, s) message, provided h > 1.

Fig 2. A specification of sensor u in a flood sequencing protocol.

To reduce the probability of message collision, any sensor u that decides to forward a message chooses a random period whose length is chosen uniformly from the range 1...tmax, and sets its time-out to expire after the chosen random period, so that u can forward the received message at the end of the random period. This random time period is called the forwarding period. A sensor u maintains a variable called new. The value of new is true only when u is in the forwarding period.

This flood sequencing protocol is in a legitimate state iff it satisfies the following two conditions:
1. Every time sensor 0 initiates a new flood, previous flood messages, whether initiated by sensor 0 legitimately or by other sensors illegitimately due to some fault, are no longer forwarded in the network.
2. Every sensor u accepts every fresh flood message, and discards every redundant flood message.
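As a concrete illustration of the random forwarding period just described, the following sketch (illustrative only; the timer granularity and all names are our assumptions, and tmax = 4 is an arbitrary example value) delays each rebroadcast by a uniform number of time units in 1..tmax and uses the new flag as in the overview.

import random

TMAX = 4  # assumed example bound on the forwarding period, in time units

class Sensor:
    def __init__(self):
        self.new = False     # true only while u is inside a forwarding period
        self.pending = None  # message waiting to be forwarded
        self.timeout = 0

    def schedule_forward(self, h, s):
        """Pick a uniform random forwarding period before sending data(h, s)."""
        self.pending = (h, s)
        self.timeout = random.randint(1, TMAX)
        self.new = True

    def tick(self):
        """Advance one time unit; rebroadcast when the chosen period expires."""
        if self.new:
            self.timeout -= 1
            if self.timeout == 0:
                h, s = self.pending
                print(f"forwarding data({h}, {s})")
                self.new = False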
A. First Protocol: Sequencing Free

In this protocol, no sensor can distinguish between fresh and redundant flood messages, so a sensor accepts every received message. This protocol is called the sequencing free protocol. To initiate the flood of a new message, sensor 0 sends a data(hmax) message, and then sets its time-out to expire after f time units to broadcast the next message. The time-out action of sensor 0 is specified as follows:

Fig. 3. A specification of sensor 0 in a sequencing free protocol.

When sensor u receives a data(h) message, u always accepts the message. Sensor u forwards the message as data(h-1) if h > 1 in the received message and new = false in u. The time-out and receiving actions of sensor u are specified as follows:

Fig. 4. A specification of sensor u in a sequencing free protocol.

The stabilization property of the sequencing free protocol can be stated by the following three theorems: Theorem 1A gives an upper bound on the convergence time of the protocol from an illegitimate state to legitimate states. Theorem 1B gives an upper bound on the number of fresh messages that can be discarded by each sensor during the convergence. Theorem 1C gives an upper bound on the number of redundant messages that can be accepted by each sensor during the convergence.

Theorem 1A. In the sequencing free protocol, starting from any illegitimate state, the protocol reaches a legitimate state within 2*f time units, and continues to execute within legitimate states.

Theorem 1B. In the sequencing free protocol, starting from any illegitimate state, every sensor discards no fresh message (before the protocol converges to a legitimate state). Note that starting from any legitimate state, every sensor discards no fresh message, since the sensor accepts every received message.

Theorem 1C. In the sequencing free protocol, starting from any illegitimate state, every sensor accepts at most 2*f redundant messages (before the protocol converges to a legitimate state).

B. Second Protocol: Linear Sequencing

In this protocol, each flood message carries a unique sequence number that is linearly increased, and so a sensor accepts a flood message that has a sequence number larger than the last sequence number accepted by the sensor. This protocol is called the linear sequencing protocol.

Each flood message in this protocol is augmented with a unique sequence number. Whenever sensor 0 broadcasts a new message, sensor 0 increases the sequence number of the last message by one, and attaches the increased sequence number to the message. The time-out action of sensor 0 is given as follows:

Fig. 5. A specification of sensor 0 in a linear sequencing protocol

When sensor u receives a data(h, s) message, sensor u accepts the message if s > slast, and forwards the message if h > 1. Otherwise, sensor u discards the message. The receiving action of u is given as follows:

Let k be the maximum value between 1 and k', where k' is the maximum difference slast.u - slast.0 for any sensor u in the network at an initial state. Note that the value of k is finite but unbounded.

Theorem 2A. In the linear sequencing protocol, starting from any illegitimate state, the protocol reaches a legitimate state within (k+1)*f time units, and continues to execute within legitimate states.
Fig. 6. A specification of sensor u in a linear sequencing protocol

Theorem 2B. In the linear sequencing protocol, starting from any illegitimate state, every sensor discards at most (k+1)*f fresh messages (before the protocol converges to a legitimate state).

Theorem 2C. In the linear sequencing protocol, starting from any illegitimate state, every sensor accepts at most n-1 redundant messages (before the protocol converges to a legitimate state).

The linear sequencing protocol requires sensors to use unbounded sequence numbers. Thus, this protocol is very expensive to implement for sensor networks that have limited resources. However, once the protocol starts its execution from any legitimate state, every sensor accepts every fresh message and discards every redundant message under any degree of message loss.

C. Third Protocol: Circular Sequencing

In this protocol, each flood message carries a sequence number that is circularly increased within a limited range, and so a sensor accepts a flood message that has a sequence number "logically" larger than the last sequence number accepted by the sensor. This protocol is called the circular sequencing protocol.

Each flood message is augmented with a sequence number that has a value in the range 0...smax, where smax > 1. We assume that smax is an even number (to keep our presentation simple).

Whenever sensor 0 broadcasts a new message, sensor 0 increases the sequence number of the last message by one circularly within the range 0...smax and attaches the increased sequence number to the message. The time-out action of sensor 0 is modified as follows:

Fig. 7. A specification of sensor 0 in a circular sequencing protocol

When a sensor u receives a data(h, s) message, sensor u checks whether s is logically larger than slast. Sensor u calls the function Larger(s, slast), which returns true if s is logically larger than slast, and otherwise returns false. Sensor u accepts the message if Larger(s, slast) returns true, and forwards it if h > 1. The receiving action of sensor u is modified as follows:

Fig. 8. A specification of sensor u in a circular sequencing protocol

To prove the stabilization property of the circular sequencing protocol, we make an assumption of bounded message loss as follows:

Bounded message loss: Starting from any state, if sensor 0 broadcasts smax/2 consecutive flood messages, then every sensor in the network receives at least one of those flood messages.

Theorem 3A. In the circular sequencing protocol, starting from any illegitimate state, the protocol reaches a legitimate state within (smax+2)*f time units, and continues to execute within legitimate states.

Theorem 3B. In the circular sequencing protocol, starting from any illegitimate state, every sensor discards at most (smax+2)*f fresh messages (before the protocol converges to a legitimate state).

Theorem 3C. In the circular sequencing protocol, starting from any illegitimate state, every sensor accepts at most f + 1 redundant messages.
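The paper's figures define Larger(s, slast) formally; since they are not reproduced here, the sketch below shows one common realisation of a "logically larger" test on a circular range, under the assumption (ours, not the paper's) that s counts as logically larger when it lies between 1 and smax/2 steps ahead of slast modulo smax + 1.

SMAX = 8  # assumed even example value for smax

def larger(s, slast, smax=SMAX):
    """Return True iff s is 'logically larger' than slast on the circle 0..smax.

    Assumed convention: s is logically larger when it is between 1 and
    smax/2 steps ahead of slast, counted circularly modulo smax + 1.
    """
    ahead = (s - slast) % (smax + 1)  # circular distance from slast to s
    return 1 <= ahead <= smax // 2

# e.g. with smax = 8: larger(1, 7) is True (1 is 3 steps ahead of 7),
# while larger(7, 1) is False (7 is 6 steps ahead of 1).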
D. Fourth Protocol: Differentiated Sequencing

In this protocol, the sequence numbers of flood messages are in a limited range, similar to the circular sequencing protocol. However, in this protocol, a sensor accepts a flood message if the sequence number of the message is different from the last sequence number accepted by the sensor. This protocol is called the differentiated sequencing protocol.

Each flood message is augmented with a sequence number that has a value in the range 0...smax, where smax > 0. We assume that smax is an even number. Similar to the circular sequencing protocol, if a sensor does not receive a large number of consecutive flood messages, the differentiated sequencing protocol cannot be self-stabilizing.

Sensor 0 in this protocol is identical to the one in the circular sequencing protocol. However, when a sensor u receives a data(h, s) message, sensor u accepts the message if s is different from slast, and forwards the message if h > 1. The receiving action of sensor u is modified as follows:

Fig. 9. A specification of sensor u in a differentiated sequencing protocol

Theorem 4A. In the differentiated sequencing protocol, starting from any illegitimate state, the protocol reaches a legitimate state within (smax/2 + 2)*f time units, and continues to execute within legitimate states.

Theorem 4B. In the differentiated sequencing protocol, starting from any illegitimate state, every sensor discards at most (smax/2 + 2)*f fresh messages (before the protocol converges to a legitimate state).

Theorem 4C. In the differentiated sequencing protocol, starting from any illegitimate state, every sensor accepts at most f + 1 redundant messages (before the protocol converges to a legitimate state).

Note that starting from any legitimate state, every sensor accepts every fresh message and discards every redundant message under the assumption of bounded message loss. We compare the stabilization properties of the four flood sequencing protocols in Table 1, and the properties of the protocols after convergence (or starting from a legitimate state) in Table 2; we call the latter the stable properties of the protocols. In Tables 1 and 2, "free," "lin," "cir," and "dif" represent the sequencing free, linear sequencing, circular sequencing, and differentiated sequencing protocols, respectively. We conclude that the differentiated sequencing protocol has better stabilization and stable properties than those of the other three protocols.

TABLE 1. Stabilization Properties of the Flood Sequencing Protocols

TABLE 2. Stable Properties of the Flood Sequencing Protocols
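The essential difference among the four protocols is only the freshness predicate each one plugs into the receiving action; the side-by-side sketch below (illustrative, reusing the circular convention assumed earlier) summarises what Tables 1 and 2 compare.

SMAX = 8  # assumed even example value for smax

def fresh_free(s, slast):            # sequencing free: everything is fresh
    return True

def fresh_linear(s, slast):          # linear: unbounded, strictly increasing
    return s > slast

def fresh_circular(s, slast):        # circular: 'logically larger' on 0..smax
    return 1 <= (s - slast) % (SMAX + 1) <= SMAX // 2

def fresh_differentiated(s, slast):  # differentiated: merely different
    return s != slast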
XV. PERFORMANCE EVALUATION

A. Methodology

In the model, a message can be lost due to probabilistic message transmission and message collision. If a sensor u sends a message at an instant t, then an out-neighbor v of u receives a copy of the message at the same time instant t, provided that the following three conditions hold:

1) A random integer number is uniformly selected in the range 0...99, and this selected number is less than 100*p, where p is the probability label of edge (u, v) in the topology.
2) Sensor v does not send any message at instant t. If v sends a message at t, this message collides with the message sent by u (with the net result that v receives no message at t).
3) For each in-neighbor w of v, other than u, if w sends a message at t, then a random integer number is uniformly selected in the range 0...99, and the selected number is at least 100*p', where p' is the probability label of edge (w, v). If the selected number is less than 100*p', then the message sent by w collides with the message sent by u.

The following two types of topologies, which have different network densities, are used:

1) A topology for a sparse network: The edge between two sensors is labeled with probability 0.95 if their distance is at most 1, and with probability 0.5 if their distance is larger than 1 and less than 2. Otherwise, there is no edge between the two sensors. In this topology, each sensor (i, j) that is not on or near the boundary of the grid generally has eight neighbors.
2) A topology for a dense network: The edge between two sensors is labeled with probability 0.95 if their distance is at most 1.5, and with probability 0.5 if their distance is larger than 1.5 and less than 3. Otherwise, there is no edge between the two sensors. In this topology, each sensor (i, j) that is not on or near the boundary of the grid generally has 24 neighbors.

B. Metric

The performance of a flood sequencing protocol can be measured by the following two metrics:

1) Reach: The percentage of sensors that receive a message sent by sensor 0.
2) Communication: The total number of messages forwarded by all sensors in the network.
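The three reception conditions of the methodology above translate directly into one simulation step; the sketch below is an illustrative rendering (the data structures are hypothetical), with probability labels expressed as percentages (95 or 50).

import random

def receives(u, v, senders, prob):
    """Does out-neighbor v receive the message sent by u at instant t?

    senders -- set of all sensors transmitting at instant t
    prob    -- prob[(w, v)] is the probability label of edge (w, v), in percent
    """
    # Condition 1: the message survives the lossy edge (u, v).
    if random.randrange(100) >= prob[(u, v)]:
        return False
    # Condition 2: v itself must not be transmitting at instant t.
    if v in senders:
        return False
    # Condition 3: no other transmitting in-neighbor w of v may collide with u.
    for w in senders - {u, v}:
        if (w, v) in prob and random.randrange(100) < prob[(w, v)]:
            return False  # w's message also arrives at v and collides
    return True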
Fig. 3. Reach of the four flood sequencing protocols starting from an illegitimate state in sparse networks. (a) A 10x10 network. (b) A 20x20 network.

Fig. 4. Reach of the four flood sequencing protocols starting from an illegitimate state in dense networks. (a) A 10x10 network. (b) A 20x20 network.

XVI. DISCUSSION

Frequent floods. The flood period, hmax*tmax+1, used in the previous sections is based on the lower bound on the flood period. In a typical execution of the protocol, each sensor chooses its forwarding period at random in the range 1...tmax, and so most sensors likely receive the flood messages within (hmax-1)*tmax/2 time units, instead of (hmax-1)*tmax time units. Thus, in a practical setting, sensor 0 may not need to wait the full period that guarantees no collision between two consecutive floods before initiating the next flood. When a shorter flood period is used, sensor 0 can flood messages frequently. Even less than half the full flood period can be used, depending on the network topology type and size.

Multiple sources. In the previous sections, only sensor 0 is a source that can initiate message floods. The flood sequencing protocols presented in this paper can support multiple sources. Each source needs to make sure that it waits f time units after initiating one flood and before initiating another flood, but after f time units, it can choose not to initiate a new flood. Each flood message is augmented with a source ID as well as a sequence number.
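As a small worked example of the flood-period bounds above (hmax = 10 and tmax = 4 are assumed illustration values, not taken from the paper):

hmax, tmax = 10, 4                 # assumed example values
full_period = hmax * tmax + 1      # lower bound guaranteeing no collision: 41
typical = (hmax - 1) * tmax / 2    # most sensors hear the flood by ~18 units
print(full_period, typical)        # -> 41 18.0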
XVII. CONCLUSIONS

In this paper, we discussed a family of four flood sequencing protocols that use sequence numbers to distinguish between fresh and redundant flood messages. The members of our family are the sequencing free protocol, the linear sequencing protocol, the circular sequencing protocol, and the differentiated sequencing protocol. We concluded that the differentiated sequencing protocol has better
overall performance in terms of communication, stabilization, and stable properties, compared to those of the other three protocols. Note that our analysis is useful for sensor network designers or developers to select a proper flood sequencing protocol that satisfies the needs of a target sensor network.

XVIII. ACKNOWLEDGMENT

The authors are grateful to the anonymous referees for their helpful comments. A preliminary version of this paper appeared in IEEE Transactions on Parallel and Distributed Systems (July 2010) [1].

XIX. REFERENCES

[1] Young-ri Choi and Chin-Tser Huang, "Stabilization of Flood Sequencing Protocol in Sensor Network," IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 7, July 2010.
[2] S. Ni, Y. Tseng, Y. Chen, and J. Sheu, "The Broadcast Storm Problem in a Mobile Ad Hoc Network," Proc. ACM MobiCom, pp. 151-162, 1999.
[3] B. Williams and T. Camp, "Comparison of Broadcasting Techniques for Mobile Ad Hoc Networks," Proc. ACM Int'l Symp. Mobile Ad Hoc Networking and Computing, 2002.
[4] D.B. Johnson and D.A. Maltz, "Dynamic Source Routing in Ad Hoc Wireless Networks," Mobile Computing, Chapter 5, vol. 353, pp. 153-181, Kluwer Academic Publishers, 1996.
[5] W. Peng and X. Lu, "AHBP: An Efficient Broadcast Protocol for Mobile Ad Hoc Network," J. Science and Technology, 2001.
[6] A. Durresi, V. Paruchuri, S.S. Iyengar, and R. Kannan, "Optimized Broadcast Protocol for Sensor Networks," IEEE Trans. Computers, vol. 54, no. 8, pp. 1013-1024, Aug. 2005.
[7] H. Sabbineni and K. Chakrabarty, "Location-Aided Flooding: An Energy-Efficient Data Dissemination Protocol for Wireless Sensor Networks," IEEE Trans. Computers, vol. 54, no. 1, pp. 36-46, Jan. 2005.
[8] D. Ganesan, B. Krishnamurthy, A. Woo, D. Culler, D. Estrin, and S. Wicker, "An Empirical Study of Epidemic Algorithms in Large Scale Multihop Wireless Networks," IRP-TR-02-003, 2002.
[9] M. Heissenbüttel, T. Braun, M. Waelchli, and T. Bernoulli, "Optimized Stateless Broadcasting in Wireless Multi-Hop Networks," Proc. IEEE INFOCOM, 2006.
[10] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang, "A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing," IEEE/ACM Trans. Networking, vol. 5, no. 6, pp. 784-803, Dec. 1997.
[11] J. Kulik, W. Heinzelman, and H. Balakrishnan, "Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks," Wireless Networks, vol. 8, nos. 2/3, pp. 169-185, 2002.
[12] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks," Proc. ACM MobiCom, 2000.
KNOWLEDGE DISCOVERY PROCESS THROUGH TEXT MINING

*Dr. Karthikeyani V., **Parvin Begum I., ***Tajudin K., ****Shahina Begam I.
*Assistant Professor, Department of Computer Science, Govt. Arts College for Women, Salem-08, drvkarthikeyani@gmail.com
**Lecturer, Department of Computer Application, Soka Ikeda College of Arts and Science, Chennai-99, parvinnadiya@gmail.com
***Lecturer, Department of Computer Science, New College, Royapettah, Chennai-14, tajudinap@gmail.com
****Asst. Professor, Department of MCA, VelTech Dr.RR & Dr.SR Engg College, Chennai, sbshahintaj@gmail.com

ABSTRACT

This paper describes knowledge discovery through text mining for extracting association rules from a collection of databases; the main contribution is combining the technique with Information Retrieval (TF-IDF). It consists of three phases: (i) collecting a database; (ii) an association rule mining phase, in which the EART system treats only texts (not images or figures) and discovers association rules amongst keywords labeling the collection of text databases; and (iii) a visualization phase. The experiments are applied to diseases. The term mining refers, loosely, to finding relevant information or discovering knowledge from a large volume of data. Knowledge discovered from a database can be represented by a set of rules. Experiments were applied on a collection of database records selected from MEDLINE that are related to the outbreaks of chikungunya and TB in Tamil Nadu.

Key words: Text Mining, Data Mining, association rule.

I. INTRODUCTION

The information age is characterized by a rapid growth of information available in electronic media such as databases, data warehouses, intranet documents, business emails and the WWW. This growth has created a demanding task called Knowledge Discovery in Databases (KDD) and in Texts (KDT). Therefore, researchers and companies in recent years [1] have focused on this task and significant progress has been made. Text Mining (TM) and Knowledge Discovery in Text (KDT) are new research areas that try to solve the problem of information overload by using techniques from Data Mining, Machine Learning and Natural Language Processing (NLP).
NLP components are improving over time. A manual search on a corpus is not sufficient for giving answers to more complex research questions [8]. The NLP field can be used to support scientific work: Information Retrieval (IR), Information Extraction (IE) and Knowledge Management (KM) [1]. The main goal of text mining is to enable users to extract information from large textual resources. The final output of the mining process varies and can only be defined with respect to a specific application. Most text mining objectives fall under the following categories of operations: Feature Extraction, Text-based Navigation, Search and Retrieval, Categorization (supervised classification), Clustering (unsupervised classification), Summarization, Trend Analysis, Association Rules and Visualization [2]. EART depends on word features to extract association rules.

Association rule mining is one of the important techniques of data mining. Association rules highlight correlations between keywords in the texts. Moreover, association rules are easy to understand and to interpret for an analyst. In this paper, we focus on the extraction of association rules amongst keywords labeling the database. Since collections of databases are a valuable source of knowledge and considered as assets, it is worthwhile to invest in efforts to get access to these sources.

The outline of the paper is as follows: in Section II we present the EART system; experimental results are presented in Section III; Section IV presents the related work; Section V provides the conclusion and future work.

II. EART SYSTEM

Figure 1. Extracting an Association Rule (flow: Collection of Data -> Association Rule Applied Based on Keyword -> Visualization Phase)

A). Data Collection

In the text mining analysis, data on diseases are collected from a public health organization. Based on these data, a database is created containing the district names and the disease names.

B). Association Rule
The goal of association rule mining is to generate all possible rules that exceed some minimum user-specified support and confidence thresholds [7].

Various proposals for mining association rules from transaction data have been presented in different contexts. Some of these proposals are constraint-based in the sense that all rules must fulfill a predefined set of conditions, such as support and confidence [4],[6].

An Enhanced Scaling Apriori is used for association rule mining efficiency. Rules have the form X => Y, where X and Y are sets of items. The problem is to find all association rules that satisfy user-specified minimum support and minimum confidence constraints. Conceptually, this problem can be viewed as finding associations between the "1" values in a relational table where all the attributes are Boolean. The table has an attribute corresponding to each item and a record corresponding to each disease. The value of an attribute for a given record is "1" if the disease corresponding to the attribute is present in the transaction corresponding to the record, and "0" otherwise. Relational tables in most disease and scientific domains have richer attribute types: attributes can be quantitative (e.g. affected, status), and Boolean attributes can be considered a special case of categorical attributes. This research work defines the problem of mining association rules over quantitative attributes in large relational tables, together with techniques for discovering such rules; this is referred to as the Quantitative Association Rules problem. The problem of mining association rules in categorical data is presented for diseases.

The original problem of mining association rules was formulated as how to find rules of the form set1 => set2. Such a rule is supposed to denote affinity or correlation among the two sets containing nominal or ordinal data items. More specifically, the statistical basis of such an association rule is represented in the form of minimum support and confidence measures of the rule.

C). Visualization Phase

Information visualization for text mining typically involves producing a 2D or 3D representation that exposes selected types of semantic patterns in a database collection, with visualization characteristics chosen so that the rendering is meaningful. Visualizations can be generated based on attributes such as keyword, district id, district name, support and confidence. Principal components visualizations represent relationships among association rules. Text data needs separate visual tools that combine numeric and textual information; the higher dimensionality of text makes it harder to view than numeric data with its fewer dimensions. This approach supports the user in quickly identifying the main topics or concepts by their importance in the representation. The ability to visualize large data sets lets users quickly explore the semantic relationships that exist in large collections of data [4].
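Since support and confidence thresholds drive the whole mining phase, here is a minimal self-contained sketch (illustrative names, not the EART implementation) of how the support and confidence of a keyword rule X => Y are computed over a collection of transactions:

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Confidence of the rule X => Y: support(X union Y) / support(X)."""
    return support(transactions, x | y) / support(transactions, x)

# Toy usage with keyword transactions:
docs = [{"TB", "Chik"}, {"TB"}, {"TB", "Chik"}, {"Chik"}]
print(support(docs, {"TB", "Chik"}))       # 0.5
print(confidence(docs, {"TB"}, {"Chik"}))  # 0.666...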
III. EXPERIMENTAL RESULT

Figure 2. Disease-occurred districts in Tamilnadu
DATA REPORT

SQL> create table disease(did number(10), dname varchar2(20), chickunguniya number(20), tb number(10), support number(10), confidence number(10));

Table created.

Table 1. There are a total of 32 districts in Tamilnadu [3].

Did   Dname             TB   Chik
101   Chennai           1    1
102   Coimbatore        1    0
103   Cudalore          1    1
104   Dharmapuri        1    1
105   Dindigual         1    0
106   Erode             1    0
107   Kanchipuram       1    0
108   Kanyakumari       1    1
109   Karur             1    0
110   Krishnagiri       1    1
111   Madurai           1    1
112   Nagapattinam      1    1
113   Namakkal          1    0
114   Permabular        1    1
115   Pudukottai        1    0
116   Ramanathapuram    1    1
117   Salem             1    0
118   Sivagangi         1    0
119   Thanjavur         1    0
120   Theni             1    0
121   Nilgris           1    0
122   Thoothukudi       1    0
123   Trichy            1    1
124   Thiruneveli       1    1
125   Thiruvalur        1    1
126   Thiruvannamalai   1    1
127   Vellore           1    1
128   Villupuram        1    1
129   Virudhunagar      1    1
130   Ariyalur          1    1
131   Thirupur          1    1
132   Thiruvarur        1    1
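Reading the rule TB => Chik off Table 1 gives a concrete instance of these measures: TB = 1 in all 32 districts and Chik = 1 in 19 of them, so support(TB => Chik) = 19/32 ≈ 0.59, and since every district has TB = 1, the confidence is also 19/32 ≈ 0.59. The same computation in a short illustrative sketch:

# Table 1 reduced to (TB, Chik) flags: TB is 1 in all 32 rows,
# Chik is 1 in 19 of them.
rows = [(1, 1)] * 19 + [(1, 0)] * 13

both = sum(1 for tb, chik in rows if tb and chik)  # 19
tb_count = sum(1 for tb, _ in rows if tb)          # 32
print(both / len(rows))  # support    = 19/32 ~ 0.594
print(both / tb_count)   # confidence = 19/32 ~ 0.594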
IV. RELATED WORK

Various proposals for mining association rules from transaction data have been presented in different contexts. Some of these proposals are constraint-based in the sense that all rules must fulfill a predefined set of conditions, such as support and confidence [5],[6]. The main goal common to all of these algorithms is to reduce the number of generated rules. The research work extends the earlier techniques since it does not relax any set of conditions nor employ an interestingness criterion to sort the generated rules. Although simple, the rules generated by this approach may not be intuitive, mainly when there are semantic intervals that do not match the partition employed.

V. CONCLUSION AND FUTURE WORK

The research work has defined association rules using the confidence priority; association rules were applied on only two diseases. The work considers the upward closure properties of the association rule set for the omission of uninformative association rules, and presents a direct algorithm to efficiently generate the informative rule set without generating all frequent itemsets.

The approach in our algorithm is to explore multidimensional properties of the data (provided such properties are present), allowing this additional information to be combined in a very efficient pruning phase. This results in a very flexible and efficient algorithm that was used on quantitative databases, with performance measured on the memory utilization during processing of the disease record sets.

In future, an advanced approach will be used to apply association rules to more diseases to find out the status of the states.

REFERENCES

[1] R. Feldman and I. Dagan, "Knowledge Discovery in Textual Databases (KDT)," Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, 1995.
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo, "Fast Discovery of Association Rules," Advances in Knowledge Discovery and Data Mining, San Jose, CA, pp. 307-328, 1996.
DATA LEAKAGE DETECTION USING ROBUST AUDIO HIDING TECHNIQUES

*P. SUNIL
E-mail Id: psunilcad@gmail.com
Ph no: 9790168660
Prathyusha Institute of Technology and Management, Aranvayalkuppam, Tiruvallur Dt-602 025.

Abstract—We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.

1 INTRODUCTION

In the course of doing business, sometimes sensitive data must be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor's sensitive data has been leaked by agents, and if possible to identify the agent that leaked the data. We consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique where the data is modified and made "less sensitive" before being handed to agents. For example, one can add random noise to certain attributes, or one can replace exact values by ranges [18]. However, in some cases it is important not to alter the original distributor's data. For example, if an outsourcer is doing our payroll, he must have the exact salary and customer bank account numbers. If medical researchers will be treating patients (as opposed to simply computing statistics), they may need accurate data for the patients. Traditionally, leakage detection is handled by watermarking, e.g., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases, but again, involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious. In this paper we study unobtrusive techniques for detecting leakage of a set of objects or records. Specifically we study the following scenario: After giving a set of objects to agents, the distributor discovers some of those same objects in an unauthorized place. (For example, the data may be found on a web site, or may be obtained through a legal discovery process.) At this point the distributor can assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. Using an analogy with cookies stolen from a cookie jar, if we catch Freddie with a single cookie, he can argue that a friend gave him the cookie. But if we catch Freddie with 5 cookies, it will be much harder for him to argue that his hands were not in the cookie jar. If the distributor sees "enough evidence" that an agent leaked data, he may stop doing business with him, or may initiate legal proceedings. In this paper we develop a model for assessing the "guilt" of
agents. We also present algorithms for distributing objects to agents, in a way that improves our chances of identifying a leaker. Finally, we also consider the option of adding "fake" objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty. We start in Section 2 by introducing our problem setup and the notation we use. In the first part of the paper, Sections 4 and 5, we present a model for calculating "guilt" probabilities in cases of data leakage. Then, in the second part, Sections 6 and 7, we present strategies for data allocation to agents. Finally, in Section 8, we evaluate the strategies in different data leakage scenarios, and check whether they indeed help us to identify a leaker.

2. PROBLEM SETUP AND NOTATION

2.1 Entities and Agents

A distributor owns a set T = {t1, . . . , tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to other third parties. The objects in T could be of any type and size, e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
• Sample request Ri = SAMPLE(T, mi): Any subset of mi records from T can be given to Ui.
• Explicit request Ri = EXPLICIT(T, condi): Agent Ui receives all the T objects that satisfy condi.

Example. Say T contains customer records for a given company A. Company A hires a marketing agency U1 to do an on-line survey of customers. Since any customers will do for the survey, U1 requests a sample of 1000 customer records. At the same time, company A subcontracts with agent U2 to handle billing for all California customers. Thus, U2 receives all T records that satisfy the condition "state is California."

Although we do not discuss it here, our model can be easily extended to requests for a sample of objects that satisfy a condition (e.g., an agent wants any 100 California customer records). Also note that we do not concern ourselves with the randomness of a sample. (We assume that if a random sample is required, there are enough T records so that the to-be-presented object selection schemes can pick random records from T.)
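The two request types lend themselves to a direct sketch; the following is an illustrative rendering (not the authors' code) of SAMPLE and EXPLICIT over a distributor's set T:

import random

def sample_request(T, m):
    """SAMPLE(T, m): any subset of m records from T."""
    return set(random.sample(sorted(T), m))

def explicit_request(T, cond):
    """EXPLICIT(T, cond): all objects of T that satisfy the condition."""
    return {t for t in T if cond(t)}

# Toy usage mirroring the example in the text:
T = {("Alice", "CA"), ("Bob", "NY"), ("Carol", "CA")}
R1 = sample_request(T, 2)                         # marketing survey sample
R2 = explicit_request(T, lambda t: t[1] == "CA")  # "state is California"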
2.2 Guilty Agents

Suppose that after giving objects to agents, the distributor discovers that a set S ⊆ T has leaked. This means that some third party, called the target, has been caught in possession of S. For example, this target may be displaying S on its web site, or perhaps as part of a legal discovery process, the target turned over S to the distributor. Since the agents U1, . . . , Un have some of the data, it is reasonable to suspect them of leaking the data. However, the agents can argue that they are innocent, and that the S data was obtained by the target through other means. For example, say one of the objects in S represents a customer X. Perhaps X is also a customer of some other company, and that company provided the data to the target. Or perhaps X can be reconstructed from various publicly available sources on the web. Our goal is to estimate the likelihood that the leaked data came from the agents as opposed to other sources. Intuitively, the more data in S, the harder it is for the agents to argue they did not leak anything. Similarly, the "rarer" the objects, the harder it is to argue that the target obtained them through other means. Not only do we want to estimate the likelihood that the agents leaked data, but we would also like to find out if one of them in particular was more likely to be the leaker. For instance, if one of the S objects was only given to agent U1, while the other objects were given to all agents, we may suspect U1 more. The model we present next captures this intuition.

We say an agent Ui is guilty if it contributes one or more objects to the target. We denote the event that agent Ui is guilty as Gi, and the event that agent Ui is guilty for a given leaked set S as Gi|S. Our next step is to estimate Pr{Gi|S}, i.e., the probability that agent Ui is guilty given evidence S.

3 RELATED WORK

The guilt detection approach we present is related to the data provenance problem [3]: tracing the lineage of S objects implies essentially the detection of the guilty agents. Tutorial [4] provides a good overview on the research conducted in this field. Suggested solutions are domain specific, such as lineage tracing for data warehouses [5], and assume some prior knowledge on the way a data view is created out of data sources. Our problem formulation with objects and sets is more general and simplifies lineage tracing, since we do not consider any data transformation from Ri sets to S. As far as the data allocation strategies are concerned, our work is mostly relevant to watermarking, which is used as a means of establishing original ownership of distributed objects. Watermarks were initially used in images [16], video [8] and audio data [6], whose digital representation includes considerable redundancy. Recently, [1], [17], [10], [7] and other works have also studied mark insertion into relational data. Our approach and watermarking are similar in the sense of providing agents with some kind of receiver-identifying information. However, by its very nature, a watermark modifies the item being watermarked. If the object to be watermarked cannot be modified, then a watermark cannot be inserted. In such cases, methods that attach watermarks to the distributed data are not applicable. Finally, there are also many other works on mechanisms that allow only authorized users to access sensitive data through access control policies [9], [2]. Such approaches prevent in some sense data leakage by sharing information only with trusted parties. However, these policies are restrictive and may make it impossible to satisfy agents' requests.

4 AGENT GUILT MODEL

To compute Pr{Gi|S}, we need an estimate for the probability that values in S can be "guessed" by the target. For instance, say some of the objects in S are emails of individuals. We can conduct an experiment and ask a person with approximately the expertise and resources of the target to find the email of say 100 individuals. If this person can find say 90 emails, then we can reasonably guess that the probability of finding one email is 0.9. On the other hand, if the objects in question are bank account numbers, the person may only discover say 20, leading to an estimate of 0.2. We call this estimate pt, the probability that object t can be guessed by the target.

5 GUILT MODEL ANALYSIS

In order to see how our model parameters interact and to check if the interactions match our intuition, in this section we study two simple scenarios. In each scenario we have a target that has obtained all the distributor's objects, i.e., T = S.

5.1 Impact of Probability p

In our first scenario, T contains 16 objects: all of them are given to agent U1 and only 8 are given to a second agent U2. We calculate the probabilities Pr{G1|S} and Pr{G2|S} for p in the range [0,1], and we present the results in Figure 1(a). The dashed line shows Pr{G1|S} and the solid line shows Pr{G2|S}. As p approaches 0, it becomes more and more unlikely that the target guessed all 16 values; each agent has enough of the leaked data that its individual guilt approaches 1. However, as p increases in value, the probability that U2 is guilty decreases significantly: all of U2's 8 objects were also given to U1, so it gets harder to blame U2 for the leaks. On the other hand, U1's probability of guilt remains close to 1 as p increases, since U1 has 8 objects not seen by the other agent. At the extreme, as p approaches 1, it is very possible that the target guessed all 16 values, so the agents' probability of guilt goes to 0.

5.2 Impact of Overlap between Ri and S

In this subsection we again study two agents, one receiving all the T = S data, and the second one receiving a varying fraction of the data. Figure 1(b) shows the probability of guilt for both agents, as a function of the fraction of the objects owned by U2, i.e., as a function of |R2 ∩ S|/|S|. In this case, p has a low value of 0.2, and U1 continues to have all 16 S objects. Note that in our previous scenario, U2 had 50% of the S objects. We see that when objects are rare (p = 0.2), it does not take many leaked objects before we can say U2 is guilty with high confidence. This result matches our intuition: an agent that owns even a small number of incriminating objects is clearly suspicious.
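The estimation formula itself is not reproduced in this excerpt; one natural instantiation consistent with the scenarios of Sections 5.1 and 5.2, assumed here rather than quoted from the paper, is that each leaked object was guessed by the target with probability p and was otherwise leaked by one of the agents holding it, each equally likely:

def guilt_probability(agent_sets, i, S, p):
    """Estimate Pr{Gi|S} under the assumed independence model above
    (an assumption of this sketch, not the paper's stated formula)."""
    pr_innocent = 1.0
    for t in S:
        if t in agent_sets[i]:
            holders = [a for a, R in agent_sets.items() if t in R]
            # probability that agent i did NOT leak this particular object
            pr_innocent *= 1.0 - (1.0 - p) / len(holders)
    return 1.0 - pr_innocent

# The scenario of Section 5.1: U1 holds all 16 objects, U2 holds 8 of them.
S = set(range(16))
agents = {"U1": set(range(16)), "U2": set(range(8))}
for p in (0.0, 0.5, 0.9):
    print(p, guilt_probability(agents, "U1", S, p),
             guilt_probability(agents, "U2", S, p))

Under this model, both guilt estimates approach 1 as p approaches 0, and U2's estimate falls much faster as p grows, matching the qualitative behaviour described for Figure 1(a).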
6 DATA ALLOCATION PROBLEM

The main focus of our paper is the data allocation problem: how can the distributor "intelligently" give data to agents in order to improve the chances of detecting a guilty agent? As illustrated in Figure 2, there are four instances of this problem we address, depending on the type of data requests made by agents and whether "fake objects" are allowed. The two types of requests we handle were defined in Section 2: sample and explicit. Fake objects are objects generated by the distributor that are not in set T. The objects are designed to look like real objects, and are distributed to agents together with the T objects, in order to increase the chances of detecting agents that leak data. We discuss fake objects in more detail in Section 6.1 below. As shown in Figure 2, we represent our four problem instances with the names EF, EF̄, SF and SF̄, where E stands for explicit requests, S for sample requests, F for the use of fake objects, and F̄ for the case where fake objects are not allowed. Note that for simplicity we are assuming that in the E problem instances all agents make explicit requests, while in the S instances all agents make sample requests. Our results can be extended to handle mixed cases, with some explicit and some sample requests. We provide here a small example to illustrate how mixed requests can be handled, but then do not elaborate further. Assume that we have two agents with requests R1 = EXPLICIT(T, cond1) and R2 = SAMPLE(T′, 1), where T′ = EXPLICIT(T, cond2). Further, say cond1 is "state=CA" (objects have a state field). If agent U2 has the same condition cond2 = cond1, we can create an equivalent problem with sample data requests on set T′. That is, our problem will be how to distribute the CA objects to two agents, with R1 = SAMPLE(T′, |T′|) and R2 = SAMPLE(T′, 1). If instead U2 uses condition "state=NY," we can solve two different problems for sets T′ and T − T′. In each problem we will have only one agent. Finally, if the conditions partially overlap, R1 ∩ T′ ≠ ∅ but R1 ≠ T′, we can solve three different problems for sets R1 − T′, R1 ∩ T′ and T′ − R1.

7 ALLOCATION STRATEGIES

In this section we describe allocation strategies that solve exactly or approximately the scalar versions of Equation 8 for the different instances presented in Figure 2. We resort to approximate solutions in cases where it is inefficient to solve the optimization problem accurately. In Section 7.1 we deal with problems with explicit data requests and in Section 7.2 with problems with sample data requests.

8 EXPERIMENTAL RESULTS

We implemented the presented allocation algorithms in Python and we conducted experiments with simulated data leakage problems to evaluate their performance. In Section 8.1 we present the metrics we use for the algorithm evaluation, and in Sections 8.2 and 8.3 we present the evaluation for sample requests and explicit data requests respectively.

8.1 Metrics

In Section 7 we presented algorithms to optimize the problem of Equation 8, which is an approximation to the original optimization problem of Equation 7. In this section we evaluate the presented algorithms with respect to the original problem. In this way we measure not only the algorithm performance, but also implicitly how effective the approximation is.

8.2 Explicit Requests

In the first place, the goal of these experiments was to see whether fake objects in the distributed data sets yield significant improvement in our chances of detecting a guilty agent. In the second place, we wanted to evaluate our e-optimal algorithm relative to a random allocation. We focus on scenarios with a few objects that are shared among multiple agents. These are the most interesting scenarios, since object sharing makes it difficult to distinguish a guilty from non-guilty agents. Scenarios with more objects to distribute or scenarios with objects shared among fewer agents are obviously easier to handle. As far as scenarios with many objects to distribute and many overlapping agent requests are concerned, they are similar to the scenarios we study, since we can map them to the distribution of many small subsets. In our scenarios we have a set of |T| = 10 objects for which there are requests by n = 10 different agents. Each experiment is run 10 times for each value of B, and the results we present are the average over the 10 runs.

9 CONCLUSIONS

In a perfect world there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may not be 100% trusted, and we may not be certain if a leaked object came from an agent or from some other source, since certain data cannot admit watermarks.

In spite of these difficulties, we have shown it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be "guessed" by other means. Our model is relatively simple, but we believe it captures the essential trade-offs. The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor's chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is large overlap in the data that agents must receive. Our future work includes the investigation of agent guilt models that capture leakage scenarios that are not studied in this paper. For example, what is the appropriate model for cases where agents can collude and identify fake tuples? A preliminary discussion of such a model is available in [14]. Another open problem is the extension of our allocation strategies so that they can handle agent requests in an online fashion (the presented strategies assume that there is a fixed set of agents with requests known in advance).
A SECURED CLOUD COMPUTING FOR LIFE CARE


INTEGRATED WITH WSN
S.kiruthika devi # 1, A.Bhagyalakshmi *2
#
M.E., Computer Science and Engineering
S.A. Engineering College, Chennai.
1
kiruthisuju29@gmail.com

*Assistant Professor,
Department of Computer science and Engineering
S.A. Engineering College, Chennai.

Abstract — A secured cloud computing for Life Care integrated with Wireless Sensor Network (WSN) monitors human health and activities, and shares information among doctors, care-givers, clinics, and pharmacies in the Cloud; it incorporates various technologies with novel ideas including sensor networks, Cloud computing security, and activity recognition. In addition, in an emergency condition the person may communicate with the care givers through voice communication, and the emergency alert messages are automatically sent to the care giver's mobile phone and the main server. Thus the patients can be provided with enhanced care and services. Also, cloud computing offers the facility of sharing resources at lower cost and provides anywhere-access to the patient's details, which are maintained in a secured way so that nobody other than an authenticated and authorized person can access the details of the patients.

Keywords — Wireless sensor network (WSN), Voice over Internet Protocol (VoIP), Activity recognition, Accelerometer, Hidden Markov Model (HMM), Conditional Random Fields (CRF).

1 Introduction
In recent years, Wireless Sensor Networks (WSNs) have been employed to monitor human health and provide life care services. Existing life care systems simply monitor human health and rely on a centralized server to store and process sensed data, leading to a high cost of system maintenance, yet with limited services and low performance. For instance, the Korea u-Care System for a Solitary Senior Citizen monitors human health at home and provides limited services such as 24 hours × 365 days safety monitoring for a solitary senior citizen (SSC), emergency-connection services, and information sharing services. In this paper, I have proposed a secured Cloud computing for life care integrated with WSN, which monitors not only human health but also human activities [1], including emergency notification to the care givers, to provide low-cost, high-quality care service to users.

WSNs are deployed in home environments for monitoring and collecting raw data. The software architecture is built to gather data efficiently and precisely. Sensed data is uploaded to Clouds using a fast and scalable Sensor Data Dissemination mechanism [2][3]. In the Cloud, this sensed data is either health data or can be used to detect
human activities. For human activity recognition, there are two novel approaches: embodied sensor-based and video-based activity recognition [2]. In the former approach, a gyroscope- and accelerometer-supported sensor is attached to the human body (e.g., on his/her wrist). By using gyroscope and accelerometer data, an activity is predicted or inferred based on Semi-Markov Conditional Random Fields. Detected activities could be simple (e.g., sitting, standing, and falling down) or more complicated (e.g., eating, reading, teeth brushing, and exercising). In the latter approach, activities are detected by collecting images from cameras, extracting the background to get body shapes, and comparing them to predefined patterns. It can detect basic activities like walking, sitting, and falling down. An Ontology engine is designed to deduce high-level activities and make decisions according to the user profile and performed activities. To access data on the Cloud, the user must authenticate and be granted access permissions. An image-based authentication and activity-based access control are proposed to enhance the security and flexibility of user access [4][5]. Independent Clouds can collaborate with each other by using the Cloud Dynamic Collaboration method [3]. Using these data on the Cloud, many low-cost and high-quality life care services can be provided to users.

2 Paper contribution and Outline
The paper contribution is two-fold. First, I propose a novel implementation of VoIP (Voice over Internet Protocol): voice communication between patients, or others in the patient's environment, and the care givers through the server administrators in the hospital environment, with human activity recognition by cameras [1]. Second, the emergency alert messages are passed to the doctors and care givers through their mobile phones via Short Message Service.

3 Related Works
For human activity recognition in the patient environment a semi-CRF is used [2].

So far, many algorithms have been proposed for accelerometer-based activity recognition. Decision tree, support vector machine and some other kinds of classification methods were evaluated in [9] and [6]. To make use of the sequential structure of activities, the Hidden Markov Model (HMM) was used in [10]. Recently, the Conditional Random Fields model (CRF) was introduced as a much better approach compared to HMM in sequential modelling. Thus, some researchers have successfully applied CRF to activity recognition [7], [11]. A limitation of both conventional HMM and first-order CRF is the Markovian property, which assumes that the current state depends only on the previous state. Because of this assumption, the labels of two adjacent states must be supposed to occur successively in the observation sequence. Unfortunately, the presumption is not always satisfied in reality. For example, in the activity recognition problem, two expected activities (activities that we want to recognize) are often separated by irrelevant activities (activities that we do not intend to detect). Furthermore, constant self-transition probabilities cause the distribution of a state's duration to be geometric [8], which is inappropriate for the real activity duration model. For accessing the data on the Cloud, the user must authenticate and be granted access permissions [4]: a Human Identification Protocol based on the ability of humans to efficiently process an image given a secret predicate. It is a challenge-response protocol in which a subset of the images presented satisfies a secret predicate shared by the challenger and the user. We conjecture that it is hard to guess this secret predicate for adversaries, both humans and programs. It can be efficiently executed by humans with the knowledge of the secret, which in turn is easily memorable and replaceable.

4 Implementation
The functional architecture is shown in Figure 1 below. First of all, human activity data is captured from sensors and cameras, then transmitted to the Cloud Gateway. The gateway classifies data into health data, gyroscope and accelerometer data, and imaging data, and stores them in a local database. The Filtering Module filters redundant and noisy data to reduce communication overhead before sending to the Cloud. The filtered data is also
updated to the local database. If there is a query requested from a service/application, the Query/Response Manager fetches data from the local database and sends it to the requester. Data is transmitted to the Cloud so that the Activity Recognition engine in the Cloud can infer user activities. Activity and context are forwarded to the Ontology engine for representation and for inferencing higher-level activities and context. It also makes decisions to respond to different situations.

[Figure 1 – System Architecture (bio/environment sensors, camera and microphone feed sensory data, images and voice to the Cloud Gateway with classification, filtering, a query/response manager and a local DB; in the Cloud, the Activity Recognition engine, Context-Aware engine and Ontology engine derive high-level activity, data and context; a Service Manager guarded by a Security Manager with Service Authentication (SA) and Service Access Control (SAC) delivers services to doctors, care givers and nurses)]
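As a rough illustration of the gateway pipeline just described (classify, filter, store locally, forward), here is a minimal Python sketch; the record format and the redundancy test are assumptions of this sketch, not interfaces defined by the paper.

# Sketch of the Cloud Gateway path: classify incoming records
# (health / motion / imaging), drop redundant readings, keep a local
# copy, and forward the rest to the Cloud. All names are illustrative.

LOCAL_DB, CLOUD = [], []

def classify(record):
    # The gateway separates health data, gyroscope/accelerometer data
    # and imaging data; we key on a 'kind' field for illustration.
    kinds = {"bio": "health", "gyro": "motion", "accel": "motion", "camera": "imaging"}
    return kinds.get(record["kind"], "unknown")

def is_redundant(record, last_seen):
    # Toy filter: suppress a reading identical to the previous one from
    # the same sensor (stands in for the paper's Filtering Module).
    return last_seen.get(record["sensor"]) == record["value"]

def gateway(stream):
    last_seen = {}
    for rec in stream:
        rec["class"] = classify(rec)
        LOCAL_DB.append(rec)                  # store in the local database
        if not is_redundant(rec, last_seen):
            CLOUD.append(rec)                 # forward only filtered data
        last_seen[rec["sensor"]] = rec["value"]

gateway([{"kind": "bio", "sensor": "hr1", "value": 72},
         {"kind": "bio", "sensor": "hr1", "value": 72}])
print(len(LOCAL_DB), len(CLOUD))  # 2 stored locally, 1 forwarded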
For example, if the patient is reading a book, then the TV should be turned off. When doctors or nurses want to access data, they must authenticate themselves first. After successful authentication, the Access Control module decides whether his/her access permission is allowed or not. If yes, it allows him/her to access the Cloud data. Data is forwarded to the authenticated nurses and doctors.
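The image-based authentication of [4], described in Section 3, can be sketched as a challenge-response loop. In this minimal Python illustration the secret predicate is a simple tag test, which merely stands in for the hard-to-guess predicates the protocol assumes; all identifiers are hypothetical.

# Sketch of the challenge-response idea: the server shows a set of images
# and the user must select exactly those satisfying the shared secret
# predicate. Here the predicate is an illustrative "outdoor" tag test.
import random

SECRET = lambda image: "outdoor" in image["tags"]   # shared secret predicate

def make_challenge(image_pool, k=6):
    return random.sample(image_pool, k)

def verify(challenge, selected_ids):
    expected = {img["id"] for img in challenge if SECRET(img)}
    return selected_ids == expected

pool = [{"id": i, "tags": ["outdoor"] if i % 2 else ["indoor"]} for i in range(20)]
ch = make_challenge(pool)
print(verify(ch, {img["id"] for img in ch if "outdoor" in img["tags"]}))  # True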
5 Conclusion and future work
This paper presented a secured Cloud computing for life care integrated with WSN. It monitors human health as well as activities and shares this information among doctors, care-givers, clinics, and pharmacies from the Cloud to provide low-cost and high-quality care to users. Thus the proposed system is a combination of various technologies with novel proposed ideas.

Future planning is to provide more services to patients with different kinds of diseases, such as nervous disorders and cardiac problems. Improving the security and privacy of data available on the Cloud is also in the pipeline. Another extension is to extend its services to military services.

6 References
[1] Xuan Hung Le, Sungyoung Lee, Phan Tran Ho Truc, La The Vinh, Asad Masood Khattak, Manhyung Han, Dang Viet Hung, Mohammad M. Hassan, Miso (Hyung-Il) Kim, Kyo-Ho Koo, Young-Koo Lee, Eui-Nam Huh.
[2] L. Vinh, X.H. Le, S. Lee. Semi Markov Conditional Random Fields for Accelerometer Based Activity Recognition (submitted).
[3] M. Hassan, E. Huh. A Framework of Sensor-Cloud Integration: Opportunities and Challenges. International Conference on Ubiquitous Information Management and Communication.
[4] H. Jameel, R.A. Shaikh, H. Lee and S. Lee. Human Identification through Image Evaluation Using Secret Predicates. Topics in Cryptology - CT-RSA 07, LNCS 4377 (2007), pp. 67-84.
[5] X.H. Le, S. Lee, Y. Lee, H. Lee. Activity-based Access Control Model to Hospital Information. Proceedings of the 13th IEEE Int. Conf. on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), Seoul, Korea, 2007, pp. 488-496.
[6] Ling Bao and Stephen S. Intille. Activity recognition from user-annotated acceleration data. In Proceedings of the 2nd International Conference on Pervasive Computing, volume 3001, pages 1-17, 2004.
[7] Lin Liao, Dieter Fox, and Henry A. Kautz. Extracting places and activities from GPS traces using hierarchical conditional random fields. International Journal of Robotics Research, 26:119-134, 2007.
[8] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 257-286, 1989.
[9] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity recognition from accelerometer data. In Proceedings of the 20th National Conference on Artificial Intelligence, volume 20, pages 1541-1546, 2005.
[10] Jaakko Suutala, Susanna Pirttikangas, and Juha Röning. Discriminative temporal smoothing for activity recognition from wearable sensors. In Proceedings of the 4th International Symposium on Ubiquitous Computing Systems, volume 4836, pages 182-195, 2007.
[11] Douglas L. Vail, Manuela M. Veloso, and John D. Lafferty. Conditional random fields for activity recognition. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multi-agent Systems, page 235, 2007.
TRAFFIC ANALYSIS AGAINST FLOW CORRELATION ATTACKS

Renukadevi. M #1, Mrs. G. Umarani Srikanth *2
Computer Science Department, S.A. Engineering College, Chennai-77
1 cserenukadevi13@gmail.com
* gmurani@yahoo.com

Abstract: Traffic analysis is typically countered by the use of intermediary nodes, whose role is to perturb the traffic flow and thus confuse an external observer. Such intermediaries are called mixes. We address attacks that exploit the timing behavior of TCP and other protocols and applications in low-latency anonymity networks. Intermediaries delay and reroute exchanged messages, reorder them, pad their size, or perform other operations. Mixes have been used in many anonymous communication systems and are supposed to provide countermeasures to defeat traffic analysis attacks. We focus on a particular class of traffic analysis attacks, flow-correlation attacks, by which an adversary attempts to analyze the network traffic and correlate the traffic of a flow over an input link with that over an output link. Flow-correlation attacks attempt to reduce the anonymity degree by estimating the path of flows through the mix network. Two classes of correlation methods are considered, namely time-domain methods and frequency-domain methods. In the time domain, statistical information about rate distributions is collected and used to identify the traffic dependency. In the frequency domain, traffic similarities are identified by comparing the Fourier spectra of timing data. The empirical results provided in this paper give an indication to designers of mix networks about appropriate configurations and mechanisms to be used to counter flow-correlation attacks.

Keywords — mix, anonymity, flow-correlation attack, intermediary node, security

I. INTRODUCTION
As the Internet is increasingly used in all aspects of daily life, the realization has emerged that privacy and confidentiality are important requirements for the success of many applications. It has been shown that, in many situations, encryption alone cannot provide the level of confidentiality required by users, since traffic analysis can easily uncover information about the participants in a distributed application. User anonymity is one important confidentiality criterion for many applications, ranging from peer-to-peer file sharing and electronic commerce to electronic voting.
The anonymity of a system can be passively attacked by an observer in two ways: either through inspection of the payload or headers of the exchanged data packets, or, when encryption is used, through traffic analysis. Sufficiently effective encryption can be used to prevent packet content inspection, giving prevalence to the second form of attack. Traffic analysis is typically countered by the use of intermediary nodes, whose role is to perturb the traffic flow and thus confuse an external observer. Such intermediaries (often called mixes) delay and reroute exchanged messages, reorder them, pad their size, or perform other operations. Chaum [1] proposed such a mix network to handle mail traffic.
The original Chaum mix network operates on entire mail messages at a time and therefore does not need to pay particular attention to latency added by the mixes. Increasingly, the data exchanged exceed by far the capacity of mixes, for example, in file-sharing applications. As a result, current mixes operate on individual packets of a flow rather than on entire messages. In conjunction with source routing at the sender, this allows for very efficient network-level implementations of mix networks.
Mixes are also being used in applications where low latency is relevant, for example, voice-over-IP or video streaming. Many other applications, such as traditional FTP or file-sharing applications, rely on delay-sensitive protocols, such as TCP, and are therefore in turn delay-sensitive as well. For such applications, it is well known that the level of traffic perturbation caused by the mix network must be carefully chosen in order to not unduly affect the delay and throughput requirements of the applications.
This paper focuses on the quantitative evaluation of mix performance. We focus our analysis on a particular type of attack, which we call the flow-correlation attack. In general, flow-correlation attacks attempt to reduce the anonymity degree by estimating the path of flows through the mix network. Flow correlation analyzes the traffic on a set of links (observation points) inside the network and estimates the likelihood for each link to be on the path of the flow under consideration. An adversary analyzes the network traffic with the intention of identifying which of several output ports a flow at an input port of a mix is taking. Obviously, flow correlation helps the adversary identify the path of a flow and consequently reveal other critical information related to the flow (e.g., sender and receiver).

A. Goals
The major contributions are summarized as follows:
1) Two Classes of Correlation Methods: We formally model the behavior of an adversary who launches flow-correlation attacks. In order to successfully identify the path taken by a particular flow, the attacker measures the dependency of traffic flows. Two classes of correlation methods are considered, namely time-domain methods and frequency-domain methods.
2) Detection Rate: We measure the effectiveness of a number of popular mix strategies in countering flow-correlation attacks. Mixes with any tested batching strategy may fail under flow-correlation attacks in the sense that, for a given flow over an input link, the adversary can effectively detect which output link is used by the same flow. The detection rate, defined as the probability that the adversary correctly correlates flows into and out of a mix, is used as the measure of success for the attack.

II. RELATED WORKS
Chaum [1] pioneered the idea of anonymous communication. Since then, researchers have applied the idea to different applications such as message-based e-mail and flow-based low-latency communications, and they have developed new defense techniques as more attacks have been proposed.
For anonymous e-mail applications, Chaum [1] proposed using relay servers, called mixes, which encrypt and reroute messages. An encrypted message is analogous to an onion constructed by a sender, who sends the onion to the first mix:
a) Using its private key, the first mix peels off the first layer, which is encrypted using the public key of the first mix.
b) Inside the first layer is the second mix's address and the rest of the onion, which is encrypted with the second mix's public key.
c) After getting the second mix's address, the first mix forwards the peeled onion to the second mix. This process repeats all the way to the receiver.
d) The core part of the onion is the receiver's address and the real message to be sent to the receiver by the last mix.
Chaum proposed return addresses and digital pseudonyms for users to communicate with each other anonymously.
Zhenghao Zhang [2] proposes simultaneous Multiple Packet Transmission (MPT) to improve the downlink performance of networks. Using multiple packet transmissions, two compatible packets can be sent simultaneously by the sender to two different systems. This will increase the performance of a network. The paper gives a fast approximation algorithm that is capable of finding a matching at least 75% of the size of a maximum matching in a calculated time. There are some limits on the arrival rate that can be allowed in a network. The project can enhance MPT, and the results show that the maximum arrival rate increases
appreciably even with a very small compatibility probability.
Suh [3] proposes an interesting problem: to characterize the nature of relayed traffic and to detect its presence in the network. That paper focuses on characterizing and detecting relayed traffic generated by Skype, a popular voice-over-IP application that uses relays. The paper [4] focuses on traffic analysis of encrypted Voice over IP (VoIP) calls at the network level and the application level. The network-level traffic analysis aims to correlate VoIP traffic flows with features in the time domain and the frequency domain. For the designer of the anonymity system, there is a tradeoff between the anonymity degree [5] and quality of service (QoS). Onion routing [6], Freedom [7], and—most prominently—TOR [8] belong to this category.

Fig 1 – Architecture of a Single Mix

The adversary node's objective is to correlate an incoming flow to an output link at a mix. This is flow correlation. This flow-correlation attack is harmful in a variety of situations. For example, in the single-mix scenario depicted in Fig. 1, the adversary can discover whom a sender (say, S1) is talking to (R1 or R2 in this case) by correlating the output traffic at the mix to S1's traffic despite cross traffic from S2 or other senders. In a mix network, the adversary can easily reconstruct the path of the connection by combining measurements and results of flow correlation either at the network boundaries or within the network.

A. Traffic Flow Correlation Techniques
This section discusses the traffic flow-correlation techniques that may be used by the adversary either to correlate senders and receivers directly or to greatly reduce the searching time for such a correlation in a mix network.

Fig 2 – Typical Flowchart for Flow Correlation

The flow correlation attack can be carried out with four techniques.
1) Data Collection
• The adversary is able to collect information about all the packets on both input and output links.
• For each collected packet, the arrival time is recorded; all the packets are encrypted and padded to the same size.
• The arrival times of packets at input link i form a time series Ai = {ai,1, ..., ai,r}, where ai,k is the kth packet's arrival time at input link i and r is the size of the sample collected during a given sampling interval. Similarly, the arrival times of packets at output link j form a time series Bj = {bj,1, ..., bj,s}, where bj,k is the kth packet's arrival time at output link j, and s is the size of the sample collected during a given sampling interval. The packets come out from mixes in batches.
2) Flow Pattern Vector Extraction
• The aim of the adversary is to analyze the time series Ai and Bj in order to determine the "similarity" between an input flow and an output flow of the mix.
• A direct analysis over these time series will not be effective.
• They need to be transformed into so-called pattern vectors.
• The time series Ai is transformed into the pattern vector Xi = {xi,1, ..., xi,q}, and the time series Bj is transformed into the pattern vector Yj = {yj,1, ..., yj,q}.
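A minimal Python sketch of this transformation, assuming equal-width subintervals purely for illustration (the actual partitioning depends on the batching strategy, as the Flow Pattern Vector Extraction discussion below explains):

# Sketch: turn a series of packet arrival times on a link into a pattern
# vector of per-subinterval packet rates. Equal-width subintervals are an
# assumption of this sketch; the paper picks the partition per strategy.

def pattern_vector(arrivals, t_start, t_end, q):
    """arrivals: sorted packet arrival times within [t_start, t_end).
    q: number of subintervals; returns q average rates."""
    width = (t_end - t_start) / q
    counts = [0] * q
    for a in arrivals:
        idx = min(int((a - t_start) / width), q - 1)
        counts[idx] += 1
    return [c / width for c in counts]  # packets per second per subinterval

A_i = [0.1, 0.15, 0.9, 1.2, 1.25, 1.9]   # arrival times on input link i
print(pattern_vector(A_i, t_start=0.0, t_end=2.0, q=4))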
3) Distance Function Selection
We define the distance function d(Xi, Yj), which measures the "distance" between an input flow at input link i and the traffic at output link j. The smaller the distance, the more likely the flow on an input link is correlated to the corresponding flow on the output link. Once the distance function d(Xi, Yj) has been defined between an input flow and an output link, the correlation analysis can easily be carried out by selecting the output link whose traffic has the minimum distance to the input flow pattern vector Xi.
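A minimal Python sketch of this minimum-distance selection follows; Euclidean distance is used purely for illustration, standing in for the time-domain (mutual information) and frequency-domain measures the paper considers.

# Sketch of the correlation step: given an input flow's pattern vector X_i
# and the pattern vectors of all output links, pick the output link whose
# traffic is at minimum distance from the input flow.
import math

def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlate(x_input, output_vectors):
    """output_vectors: dict link_id -> pattern vector Y_j."""
    return min(output_vectors, key=lambda j: distance(x_input, output_vectors[j]))

X1 = [5.0, 1.0, 4.5, 0.5]
outputs = {"out_a": [4.8, 1.2, 4.4, 0.6], "out_b": [1.0, 5.0, 0.9, 5.1]}
print(correlate(X1, outputs))  # "out_a": the closest rate pattern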
This paper focuses on preventing the flow-correlation attack by the adversary node. The flow-correlation attack can be overcome by using an intermediate node. This intermediate node performs the batching and reordering techniques for providing security to the data sent via a congested network. In this paper, a single mix can achieve a certain level of communication anonymity. The sender of a message attaches the receiver address to a packet and encrypts it using the mix's public key. Upon receiving a packet, a mix decodes the packet. Different from an ordinary router, a mix usually will not relay the received packet immediately. Rather, it collects several packets and then sends them out in a batch. The order of packets may be altered as well. Techniques such as batching and reordering are simple means to perturb the timing behavior of packets across a mix, which in turn is considered necessary for mixes to prevent timing-based attacks. Due to this batching and reordering, intruders cannot hack any data or files that are batched.
B. Flow Pattern Vector Extraction
Once the data are collected, the relevant pattern vectors must be extracted. Recall that the batching strategies in Table 1 can be classified into two classes: threshold-triggered batching (s1, s3, and s5) and timer-triggered batching (s2, s4, s6, and s7). The packet timing characteristics at the output link allow for targeted feature extraction for these different classes of batching. For threshold-triggered batching strategies, packets leave the mix in batches. Hence, the interarrival time of packets in a batch is determined by the link bandwidth, which is independent of the input flow. Thus, the useful information to the adversary is the number of packets in a batch and the time that elapses between two batches. Normalizing this relationship, we define the elements in pattern vector Yj as

Yj,k = (number of packets in batch k in the sampling interval) / ((ending time of batch k) − (ending time of batch k−1))

For timer-triggered batching strategies, a batch of packets is sent whenever a timer fires. The length of the time interval between two consecutive timer events is a predefined constant. Thus, following a similar argument made for the threshold-triggered batching strategies, we define the elements in pattern vector Yj as follows:

Yj,k = (number of packets in the kth time-out interval) / ((time of kth time-out) − (time of (k−1)th time-out))
     = (number of packets in the kth time-out interval) / (predefined inter-time-out length)

For the traffic without batching (i.e., the baseline strategy s0 defined in Table 1), we use methods similar to those defined for timer-triggered batching strategies, as shown above. The basic idea in the methods for extraction of pattern vectors is to partition a sampling interval into multiple subintervals and to calculate the average traffic rate in each subinterval as the values of the elements of the traffic pattern vectors. The above two methods differ in how to partition the interval, depending on which batching strategy is used by the mix. We take a similar approach to extract the pattern vectors Xi corresponding to the Yj. Again, the specific method of subinterval partition depends on how the mix is batching the packets.

III. PROCESS DESCRIPTION
A. Shortest Path Algorithm
The shortest path problem is the problem of finding a path between two vertices (or nodes) such that the sum of the weights of its constituent edges is minimized. Shortest path algorithms are applied to automatically find directions between physical
locations, such as driving directions on web mapping websites like MapQuest or Google Maps. The shortest path algorithm is used in the proposed scheme to find the intermediate system from the end-systems. Then the batching algorithm is used for sending the data in a secure manner.
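The standard way to realize this shortest path selection is Dijkstra's algorithm; the sketch below is a generic Python implementation with an illustrative graph, not the project's actual code, and in the proposed scheme the edge weights would reflect conditions in the congested network.

# Sketch: Dijkstra's shortest path over a weighted graph.
import heapq

def dijkstra(graph, source):
    """graph: dict node -> list of (neighbor, weight). Returns distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

net = {"S": [("A", 2), ("B", 5)], "A": [("B", 1), ("D", 4)], "B": [("D", 1)]}
print(dijkstra(net, "S"))  # D is reached via S->A->B->D with cost 4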

B. Batching Strategies
Batching strategies are designed to prevent not only simple timing analysis attacks, but also powerful trickle attacks, flood attacks, and many other forms of attacks. The seven batching strategies are listed in Table 1, in which batching strategies from s1 to s4 are denoted as simple mixes, while batching strategies from s5 to s7 are denoted as pool mixes.

Table 1 – Batching Strategies

Strategy Index | Name | Adjustable Parameters | Algorithm
s0 | Simple Proxy | none | no batching or reordering
s1 | Threshold Mix | <m> | if n = m, send n packets
s2 | Timed Mix | <t> | if timer times out, send n packets
s3 | Threshold or Timed Mix | <m, t> | if timer times out, send n packets; else if n = m {send n packets; reset the timer}
s4 | Threshold and Timed Mix | <m, t> | if (timer times out) and (n >= m), send n packets
s5 | Threshold Pool Mix | <m, f> | if n = m + f, send m randomly chosen packets
s6 | Timed Pool Mix | <t, f> | if (timer times out) and (n > f), send n − f randomly chosen packets
s7 | Timed Dynamic-Pool Mix | <m, t, f, p> | if (timer times out) and (n >= m + f), send max(1, ceil(p(n − f))) randomly chosen packets

Glossary: n – queue size; m – threshold to control the packet sending; t – the timer's period, if a timer is used; f – the minimum number of packets left in the pool, for pool mixes; p – a fraction, only used in the Timed Dynamic-Pool Mix.

Batching is typically accompanied by reordering. In this proposed scheme, the attacks focus on the traffic characteristics. As reordering does not significantly change packet interarrival times for mixes that use batching, these attacks are unaffected by reordering. Thus, these results are applicable to systems that use any kind of reordering method. More precisely, reorderings are in all cases caused by packets being delayed by the batcher, and can therefore be handled by modifying the batching algorithm accordingly. Any of the batching strategies can be implemented in two ways:
• Link-Based Batching, in which each output link has a separate queue.
• Mix-Based Batching, in which the entire mix has only one queue.
Each of these two methods has its own advantages and disadvantages. The control of link-based batching is distributed inside the mix and hence may have good efficiency.
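Two of the Table 1 strategies can be sketched directly from their algorithm column; the queue abstraction below is an assumption of this Python illustration, not code from the paper.

# Sketch of s1 (threshold mix) and s6 (timed pool mix) from Table 1.
# Parameter names follow the glossary: m = threshold, f = pool size.
import random

def threshold_mix_step(queue, m):
    """s1: if the queue size reaches the threshold m, send all n packets."""
    if len(queue) >= m:
        batch, queue[:] = list(queue), []
        return batch
    return []

def timed_pool_mix_fire(queue, f):
    """s6: on timer expiry, if n > f, send n - f randomly chosen packets."""
    if len(queue) > f:
        batch = random.sample(queue, len(queue) - f)
        for p in batch:
            queue.remove(p)
        return batch
    return []

q = ["p1", "p2", "p3", "p4"]
print(threshold_mix_step(q, m=3))   # threshold reached: flushes the queue
q = ["p1", "p2", "p3", "p4"]
print(timed_pool_mix_fire(q, f=1))  # sends three, keeps one in the pool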
IV. SYSTEM ANALYSIS
A. Architecture
The proposed system architecture consists of three main modules (objectives), namely, the client module, the server module and an intermediator module.
Fig 3 – Architecture

The proposed scheme is used to overcome the overhead that occurs in a congested network, through which a confidential file cannot safely be sent. To overcome this, an intermediate node is selected from the congested network to send the confidential files in a secure manner. The intermediate node is selected based on the shortest-path algorithm. This intermediate node performs the batching strategy and reordering techniques. The intermediate node batches the confidential files and again reorders those files (for security). By performing these techniques, even the server or client does not know in what way the files are batched, and hence it provides security. By using the batching and reordering techniques, even the adversary node cannot identify the data.

B. Modules
A modular design reduces complexity, facilitates change (a critical aspect of software maintainability) and results in easier implementation by encouraging parallel development of different parts of the system. Software with effective modularity is easier to develop because functions may be compartmentalized and interfaces are simplified. Software architecture embodies modularity; that is, software is divided into separately named and addressable components called modules that are integrated to satisfy the problem requirements.
The following are the modules of the project, planned to complete the project with respect to the proposed system, while overcoming the existing system and also providing support for future enhancement:
• Server Module
• Intermediate Module
• Client Module
The Server Module is used to select the best intermediate node in the congested network via the systems connected in the LAN (Local Area Network). The Server Module also generates a key for providing security. The Intermediate Module is used to calculate the speed of each system and store it in the database; the batching and reordering techniques are applied here. The Client Module is used to receive the files.

V. EMPIRICAL EVALUATION
A. Metrics
We use detection rate as a measure of the ability of the mix to protect anonymity. Detection rate here is defined as the ratio of the number of correct detections to the number of attempts. While the detection rate measures the effectiveness of the mix (the lower the detection rate, the more effective the mix), we measure its efficiency in terms of the QoS perceived by the applications. We use FTP goodput as an indication of FTP QoS.
B. Performance Evaluation
Effectiveness of Batching Strategies: Fig. 3 shows the detection rate for systems using a link-based batching strategy. Fig. 5 shows the detection rate for systems using a mix-based batching strategy as a function of the number of packets observed. A sample may include both FTP packets and cross traffic packets, while FTP packets account for less than 20 percent of the number (sample size) of packets. Parameters in the legends of these figures are listed in the same order as in Table 1.
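Both metrics reduce to simple ratios; the following Python sketch, with made-up trial data, shows the computation.

# Sketch of the two metrics above: detection rate as correct detections
# over attempts, and goodput as delivered bytes over elapsed time.

def detection_rate(trials):
    """trials: list of (detected_link, true_link) pairs, one per attempt."""
    correct = sum(1 for detected, true in trials if detected == true)
    return correct / len(trials)

def goodput(bytes_delivered, seconds):
    return bytes_delivered / seconds  # application-level bytes per second

runs = [("out_a", "out_a"), ("out_b", "out_a"), ("out_a", "out_a")]
print(detection_rate(runs))      # 2 of 3 attempts correct
print(goodput(1_500_000, 10.0))  # 150 kB/s FTP goodput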
Fig 3 – Detection rate for link-based batching. (a) Mutual information. (b) Matched filter.

Fig 4 – FTP goodput. (a) Link-based batching. (b) Mix-based batching.

Based on these results, we make the following observations:
• For all the strategies, the detection rate monotonically increases with an increasing amount of available data. The detection rate approaches 100 percent when the sample size is sufficiently large. This is consistent with intuition, as more data imply that there is more information about the input flow, which in turn improves the detection rate.
• Different strategies display different resistances to flow-correlation attacks.
• Frequency-analysis-based distance functions typically outperform mutual-information-based distance functions in terms of detection rate. For many batching strategies, the former performs significantly better. This is because the frequency-based analysis is resilient to phasing.
• We do not find a significant difference between link-based and mix-based batching.

VI. SUMMARY AND FUTURE WORK
We have analyzed mix networks in terms of their effectiveness in providing anonymity and quality of service. Various methods used in mix networks were considered: seven different packet batching strategies and two implementation schemes, namely the link-based batching scheme and the mix-based batching scheme. We found that mix networks that use traditional batching strategies, regardless of the implementation scheme, are vulnerable under flow-correlation attacks. By using statistical analysis, an adversary can accurately determine the output link used by traffic that comes into an input flow of a mix. The detection rate can be as high as 100 percent as long as enough data are available. This is true even if heavy cross traffic exists. The data collected in this paper should give designers guidelines for the development and operation of mix networks.

Fig 5 – Power spectrum of an FTP flow.

REFERENCES
[1] D. Chaum, "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms," Comm. ACM, vol. 24, no. 2, pp. 84-90, Feb. 1981.
[2] Zhenghao Zhang and Yuanyuan Yang, "Enhancing Downlink Performance in Wireless Networks by Simultaneous Multiple Packet Transmission," IEEE conference paper on parallel and distributed systems, 2006.
[3] K. Suh, D.R. Figueiredo, J. Kurose, and D. Towsley, "Characterizing and Detecting Skype-Relayed Traffic," IEEE conference paper on parallel and distributed systems, 2006.
[4] Yuanchao Lu and Ye Zhu, "Correlation-Based Traffic Analysis on Encrypted VoIP Traffic," IEEE journal on parallel and distributed systems, 2010.
[5] O.R.D. Archives, "Link Padding and the Intersection Attack," http://archives.seul.org/or/dev, 2002.
[6] P.F. Syverson, D.M. Goldschlag, and M.G. Reed, "Anonymous Connections and Onion Routing," Proc. IEEE Symp. Security and Privacy, pp. 44-54, 1997.
[7] P. Boucher, A. Shostack, and I. Goldberg, "Freedom Systems 2.0 Architecture," http://www.freedom.net/products/whitepapers/Freedom_System_2_Architecture.pdf, Dec. 2000.
[8] R. Dingledine, N. Mathewson, and P. Syverson, "Tor: The Second-Generation Onion Router," Proc. 13th USENIX Security Symp., pp. 303-320, Aug. 2004.
A FAULT TOLERANT BASED RESOURCE ALLOCATION FOR THE GRID ENVIRONMENT

M. Samsul Adam #1, U. Syed Abudhagir *2, M. Deivamani #3
#1,3 Department of Information Science and Technology, Anna University, Chennai, India.
1 adams146@gmail.com, 3 m.deivamani@gmail.com
*2 Department of Electronics & Communication Engineering, Anna University, Chennai, India.
2 abu.06.au@gmail.com
Abstract — Grid is an emerging field in computing technology, and the grid is widely used for solving large-scale resource sharing. Since it is a large-scale resource sharing environment, providing resource allocation and fault tolerance services are important issues. The availability of the selected resources for job selection is a primary factor that determines the computing performance. Typically, the probability of a failure is higher in grid computing than in traditional parallel computing, and the failures of resources affect job execution fatally. It affects the performance of the grid too. Fault tolerance is implemented, i.e., failure is overcome, by using the check-pointing (CP) method. This periodically saves the status of running jobs to stable storage so that, if a failure occurs, the job is allocated to another available resource and restarted from the last saved checkpoint instead of starting from the beginning. This reduces the execution time and cost, and increases computing performance.

I. INTRODUCTION
Compared to other distributed environments, such as clusters, the complexity of grids mainly originates from decentralized management and resource heterogeneity. The latter refers to hardware, as well as to foreseen utilization. These characteristics often lead to strong variations in grid availability, which in particular depends on resource and network failure rates, administrative policies, and fluctuations in system load. Apparently, runtime changes in system availability can significantly affect application (job) execution. Since for a large group of time-critical or time-consuming jobs delay and loss are not acceptable, fault tolerance should be taken into account. Providing fault tolerance in a distributed environment, while optimizing resource utilization and job execution times, is a challenging task. To accomplish it, two techniques are often applied: job check-pointing and job replication. In this paper, it is argued that both techniques in their pure static form are not able to cope with unexpected load and failure conditions within grids. Therefore, several solutions are proposed that dynamically adapt the check-pointing frequency and the number of replicas as a reaction to changing system properties (the number of active resources and the resource failure frequency). Furthermore, a novel hybrid scheduling approach is introduced that switches at runtime between check-pointing and replication depending on the system load. Decisions taken by the abovementioned algorithms are primarily based not only on the monitored grid state but also on job characteristics and on collected historical information. Currently, the proposed techniques are limited to addressing hardware failure in grids running applications composed of independent jobs. Simulation-based experiments use the GridSim simulator to form the grid environment. The paper is organized as follows: Section
2 discusses related work; Section 3 elaborates check-pointing and provides a simulation-based comparison between check-pointing and replication; Section 4, in turn, discusses simulation results, while Section 5 concludes the paper.

II. RELATED WORK
The fault tolerance scheme is responsible for detecting failure events and supporting schedulers to make appropriate decisions regarding the scheduling of failed jobs. Condor-G [6] detects resource failure using polling, while a heartbeat mechanism is used in NetSolve [7]. In [8], a fault tolerance scheme for a dynamic scheduler is presented with the aim of improving performance in the presence of multiple simultaneous failures. A fault-tolerant resource broker architecture for an economy-based computational grid, which combines check-pointing and transactions to provide fault-tolerant scheduling, is implemented in [9]. Failure models are created by means of probabilistic distributions with fully configurable parameters [10]. A large number of research efforts have already been devoted to fault tolerance in the scope of distributed environments. Aspects that have been explored include the design and implementation of fault detection services [4], [5], as well as the development of failure prediction [3], [6], [7], [8] and recovery strategies [9], [10]. The latter are often implemented through job check-pointing in combination with migration and job replication. Although both methods aim to improve system performance in the presence of failure, their effectiveness largely depends on tuning runtime parameters such as the check-pointing interval and the number of replicas. Determining optimal values for these parameters is far from trivial, for it requires good knowledge of the application and the distributed system at hand.

III. CHECKPOINTING
Check-pointing is the process of saving the current state to stable storage, so that whenever jobs fail on account of a resource failure, they can be allocated to another available resource instead of starting from the beginning. This reduces the execution time and cost, and increases the computing performance.
A. Job Completion Times With Check-pointing
When a job checkpoints, completed work is saved to stable storage at intervals as the job is executing. If the job fails, then it begins processing again from the last checkpoint. Let k represent the number of times a job will checkpoint during its execution, and consider the probability that the job finishes the first time it executes, where β is the failure rate of the system. Then

Pc(n | t/k) = ...   (1)

is the probability that there are n failures for a job with k checkpoints. The expected number of times the job must run before it successfully completes is given by

ic(n | t/k) = ...   (2)

The pdf of the job failing at time x with k checkpoints when failures occur, and hence the average time it takes the job to complete with k checkpoints, is given by

Tc(t/k) = ... dx = ...   (3)

When f(x) is exponentially distributed, it can be shown that Tc(t) is

Tc(t) = (1 − βr) ... (k − 1)c   (4)

Using equations (6)-(7), the optimal number of checkpoints for some job with time t, recovery cost r, failure rate β, checkpoint cost c, and job service time distribution f(x) can be determined by minimizing the function given in (4) above and solving for k. For exponential distributions (by using Equation (7)) this would be

kopt = γ + ...   (5)

Tc(t) gives the mean time to finish the task for some job time with fixed t and k. If job service times are modeled by some distribution f(x), then the mean time to finish the job, averaged over all possible job times, is

Tc(t) = ...

Again, observe that the integral for Tc(t) does not converge in general. This indicates that Tc(t) is infinite for all distributions that go to 0 more slowly than exp(−β·), which implies that Tc(t) falls into the class of PT distributions. To estimate Tc(t) for small β, the tail term can be replaced accordingly. Thus, it can be shown that
Tc(t) ≈ ...   (6)

The above implies that even small failure rates could be problematic, but also that checkpointing decreases C² by a factor of k², which improves performance. Suppose a job that checkpoints has associated recovery and checkpointing costs (i.e., r and c, respectively), which is a more realistic model than the one discussed previously. The time to finish the job can be computed in a similar way as earlier. The mean time to finish the job is the mean time it takes the job to complete given that it fails and then recovers, multiplied by the mean number of times the job executes before it successfully completes. This is added to the time to execute the job without any failures. Finally, the cost of the checkpoints c must be added k − 1 times, since the job does not checkpoint when it finishes. Thus it follows that

T'c(t) = ic(i/k) ...   (7)

Averaging over all possible job times, we get

T'c(t) = (1 − βr) ...   (8)

When f(x) is exponentially distributed, it can be seen that T'c(t) is

T'c(t) = (1 − βr) ...   (9)

Using equations (6)-(7), the optimal number of checkpoints for some job with time t, recovery cost r, failure rate β, checkpoint cost c, and job service time distribution f(x) can be determined by minimizing the function given in (6) above and solving for k. For exponential distributions (by using Equation (7)):

kopt = γ + ...   (10)

From the above equation, using the optimal number of checkpoints sometimes breaks up the highly variable behavior (i.e., infinite mean and variance); however, as the mean time of the job increases, using the optimal number of checkpoints does not necessarily break up this behavior. Truncation parameters are special cases of hyper-exponential distributions that asymptotically behave like PT distributions as the number of exponential phases approaches infinity. That is, if one were to measure job completion times, then periodically one would see a completion time that is much, much longer than those previously observed. This can be problematic for applications having service level objectives.

IV. SIMULATION RESULTS
We simulate the above experiment using the GridSim simulation toolkit and build a grid topology as shown in Fig. 1.

Fig. 1 – Simulated topology

Fig. 2 – Resources executing jobs with check-pointing
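GridSim itself is a Java toolkit; as a language-neutral illustration of the behavior being simulated, here is a toy Python sketch in which a job checkpoints at fixed intervals and, on failure, rolls back to the last checkpoint instead of restarting from scratch. All parameters and names are illustrative assumptions, not GridSim APIs.

# Toy sketch: a job of 'work' units checkpoints every 'interval' units;
# on a failure it resumes from the last checkpoint rather than from zero.
import random

def run_with_checkpoints(work, interval, fail_prob, checkpoint_cost=0.1):
    done, elapsed, last_cp = 0.0, 0.0, 0.0
    while done < work:
        step = min(interval, work - done)
        if random.random() < fail_prob:     # resource fails during this step
            done = last_cp                  # roll back to the last checkpoint
            elapsed += step                 # time lost to the failed attempt
            continue
        done += step
        elapsed += step + checkpoint_cost   # pay the checkpointing overhead
        last_cp = done
    return elapsed

random.seed(1)
print(run_with_checkpoints(work=100, interval=10, fail_prob=0.1))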
Fig 3 – Resources executing jobs without check-pointing

The above figures show the simulated results for the check-pointing mechanism. With check-pointing we obtain good performance, since a snapshot of the running jobs is taken periodically; if a failure occurs, it can be recovered from the last saved checkpoint, i.e., the failed job is rescheduled to another available resource from the last saved checkpoint. In contrast, resources executing jobs without check-pointing require more execution time, since the jobs must run again from the beginning. In Fig. 1, Cs represents the checkpoint server, GSched represents the grid scheduler, UI represents the user interface, and IS represents the information service. Jobs are submitted through the UI, the scheduler schedules the jobs, the checkpoint server saves the states of running jobs periodically, and the IS stores all the available resources.

V. CONCLUSION
In this work, we apply fault tolerant algorithms in a Grid system to improve its fault tolerance capacity. From the results, we find that the performance of the fault tolerant algorithms is better. The failure of computing resources and its impact on the scheduling of grid jobs, in terms of execution time under different resource allocation policies and failure patterns, is simulated and analyzed. Future work focuses on providing fault tolerance for the Cloud environment, as the cloud is emerging in the field of information infrastructure and much research is going on there in providing fault tolerance. Since it is a dynamic environment and its services are provided on demand, it is necessary to provide a failover mechanism for the cloud environment.

REFERENCES
[1] I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15(3), pages 200-222, 2001.
[2] M. Affan and M. A. Ansari. Distributed Fault Management for Computational Grids. In Proceedings of the Fifth International Conference on Grid and Cooperative Computing (GCC 2006), pages 363-368, Hunan, China, 2006.
[3] Wei Chen, Sam Toueg, and Marcos Kawazoe Aguilera. On the Quality of Service of Failure Detectors. IEEE Transactions on Computers, 51(5), pages 561-580, 2002.
[4] E. Huedo, R. S. Montero, and I. M. Llorente. A Framework for Adaptive Execution in Grids. Software Practice and Experience, 34(7), pages 631-651, 2004.
[5] Bianca Schroeder and Garth A. Gibson. A Large-Scale Study of Failures in High-Performance-Computing Systems. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2006), pages 249-258, Philadelphia, USA, 2006.
[6] Douglas Thain, Todd Tannenbaum, and Miron Livny. Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience, 17(2), pages 323-356, 2005.
[7] M. Brzezniak. Innovation of the NetSolve Grid Computing System. Concurrency: Practice and Experience, 14(13), pages 1457-1479, 2002.
[8] Nitin B. Gorde and Sanjeev Aggarwal. A Fault Tolerance Scheme for Hierarchical Dynamic Schedulers in Grids. In Proceedings of the International Conference on Parallel Processing - Workshops (ICPP-W'08), pages 53-58, Portland, USA, Sept. 2008.
[9] Waqas Jadoon and Le Ken Lee. Fault Tolerant Quality Aware Resource Scheduling Strategy in Computational Economy Based Grid Framework. In Proceedings of the International Seminar on Future Information Technology and Management Engineering (FITME'08), pages 233-237, Leicestershire, United Kingdom, Nov. 2008.
[10] A. Caminero, A. Sulistio, B. Caminero, C. Carrion and R. Buyya. Extending GridSim with an Architecture for Failure Detection. In Proceedings of the 13th International Conference on Parallel and Distributed Systems (ICPADS 2007), pages 1-8, Hsinchu, Taiwan, December 2007.
NETWORK UTILITY MAXIMIZATION USING DECOMPOSITION METHOD

V. Sivaperumal, Mrs. P. Mahalakshmi
Department of Computer Science Engineering, Jerusalem College of Engineering, Anna University Chennai, Chennai, India.
E-mail: shiva_sathya@yahoo.com, p_mlaxmi@yahoo.com

Abstract — This paper presents a load-aware routing scheme for wireless mesh networks (WMNs). In a WMN, the traffic load tends to be unevenly distributed over the network. In this situation, the load-aware routing scheme can balance the load, and consequently, enhance the overall network capacity. We design a routing scheme which maximizes the utility, i.e., the degree of user satisfaction, by using the dual decomposition method. The structure of this method makes it possible to implement the proposed routing scheme in a fully distributed way. With the proposed scheme, a WMN is divided into multiple clusters for load control. A cluster head estimates the traffic load in its cluster. As the estimated load gets higher, the cluster head increases the routing metrics of the routes passing through the cluster. Based on the routing metrics, user traffic takes a detour to avoid overloaded areas, and as a result, the WMN achieves global load balancing. We present numerical results showing that the proposed scheme effectively balances the traffic load and outperforms the routing algorithm using the expected transmission time (ETT) as a routing metric.

Keywords — Wireless mesh network, load-aware routing, utility, dual decomposition.

I. INTRODUCTION
A wireless mesh network (WMN) consists of a number of wireless routers, which not only operate as hosts but also forward packets on behalf of other routers. WMNs have many advantages over conventional wired networks, such as low installation cost, wide coverage, and robustness. Because of these advantages, WMNs have been rapidly penetrating into the market with various applications, for example, public Internet access, intelligent transportation systems (ITS), and public safety. One of the main research issues related to WMNs is to develop a routing algorithm optimized for the WMN.
In mobile ad-hoc networks, the primary concern of routing has been robustness to high mobility. However, nodes in the WMN are generally quasi-static in their location. Thus, the focus of routing studies in the WMN has moved to performance enhancement by using sophisticated routing metrics. For example, as routing metrics, researchers have proposed the expected transmission number (ETX), the expected transmission time (ETT) and weighted cumulative ETT (WCETT), the metric of interference and channel switching (MIC), and the modified expected number of transmissions (mETX) and effective number of transmissions (ENT). Although these metrics have shown significant performance improvement over the
traditional hop-count routing metric, they neglect the problem of traffic load imbalance in the WMN.
In the WMN, a great portion of users tends to communicate with outside networks via the wired gateways. In such an environment, the wireless links around the gateways are likely to be a bottleneck of the network. If the routing algorithm does not take account of the traffic load, some gateways may be overloaded while the others may not. This load imbalance can be resolved by introducing a load-aware routing scheme that adopts a routing metric with a load factor. When the load-aware routing algorithm is designed to maximize the system capacity, the major benefit of the load-aware routing is the enhancement of the overall system capacity due to the use of underutilized paths.

II. SYSTEM MODEL
Each wireless router in a WMN is fixed at a location. Thus, the WMN topology does not change frequently and the channel quality is quasi-static. In addition, each wireless router serves so many subscribers (i.e., users) in general that the characteristics of the aggregated traffic are stable over time. Therefore, we design the routing scheme under a system model in which the topology and user configuration are stable.

Fig 1 – Example mesh network.

In Fig. 1, a node stands for a wireless router, which not only delivers data for its own users, but also relays data traffic for other wireless routers. Among the nodes, there are some gateway nodes connected to the wired backhaul network. Each user is associated with its serving node. In this paper, we do not deal with the interface between a user and its serving node, in order to focus on the mesh network itself. Through the serving node, a user can send (receive) data traffic to (from) another user in the WMN or to (from) outside networks via the gateway nodes. If a node n can transmit data to a node m directly (i.e., without relaying), there exists a link from the node n into the node m. In this paper, we define a link as unidirectional. For the mathematical representation, we define N and L as the sets of the indices of all nodes and all links in the network, respectively. In Table 1, we summarize all mathematical notations introduced in this section.

TABLE 1 – Table of Symbols

Notation | Description
N | Set of indices of all nodes
L | Set of indices of all links
F | Set of indices of flows
C | Set of indices of clusters
Dr | Set of indices of all intermediate links on route r
Hl | Set of indices of all routes passing through link l
Gf | Set of indices of all possible routes for flow f
Qr | Set of indices of all flows using router r
Mc | Set of indices of all links in cluster c
Vl | Set of indices of all clusters including link l
pf,r | Flow data rate of flow f on route r
dl | Effective transmission rate of link l
al | Airtime ratio of link l (ratio of the time for data transmission to the whole time)
uf(x) | Utility of flow f when the data rate is x
α | System-wide fairness parameter
pf | Priority of flow f
ζ | Delay penalty parameter

The WMN under consideration provides a connection-oriented service, where connections are managed in the unit of a flow. A flow is also unidirectional. A user can communicate with another user or the gateway node after setting up a flow connecting them. Since a user is connected to a unique node, the flow between a pair of users can also be specified by the corresponding node pair. The node where a flow starts (ends) will be called the source (destination) node of the flow. Fig. 1 shows an example scenario where a user intends to send data to outside networks. As seen in this figure, if a flow conveys data to (from) outside networks, all gateway nodes can be the destination (source) node of the flow. We will identify a flow by an

379
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

Data traffic on a flow is conveyed to the destination node through a multihop route. We only consider acyclic routes. Thus, a route can be determined by the set of all intermediate links that the route takes. We will index a route by r and define Dr as the set of the indices of all intermediate links on the route r. For a flow, there can be a number of possible routes that connect the source and destination nodes. Let Gf denote the set of the indices of all possible routes for flow f.

III. DISTRIBUTED IMPLEMENTATION

The routing scheme can be implemented in a distributed way, which improves the scalability of the WMN. In this section, we discuss the distributed implementation of the proposed scheme. The flow data rate vectors and the Lagrange multipliers are distributively managed by the nodes in the WMN. The flow data rate vector of flow f is managed by the source node of the flow; it is the single-path flow data rate vector associated with the active route of the flow, and the flow data rate on the active route is equal to the corresponding rate Pf,r.

For implementation, one node within a cluster is designated as the head of the cluster. The head of a cluster is assumed to be able to communicate with the transmitter nodes of the links in its cluster. Let us call the head of cluster c the "cluster head" c. The cluster head c takes the role of maintaining and updating the Lagrange multiplier of its cluster.

When the dual decomposition method is used, different variables can be updated according to different time schedules. Therefore, in order to improve the convergence speed, in practice, different network entities carry out these operations asynchronously, by using currently available information. Though it is difficult to prove that the asynchronous operation leads to the exact solution, we have confirmed by simulation that the solutions produced by the asynchronous and the synchronous operations are the same in our routing problem. In the following, we describe the three operations in more detail when they are implemented asynchronously.

Link cost control: For link cost control, the cluster head c gathers the information on the total load in the cluster c and adjusts the Lagrange multiplier to control the load on the cluster c. This process is as follows: each node estimates the total loads of all its outgoing links and broadcasts the estimated load periodically. Receiving these broadcast messages, the cluster head can compute the total airtime ratio consumed by the links in its cluster. If the total airtime ratio exceeds the available airtime ratio, it means that the cluster is overloaded; therefore, the cluster head c increases the multiplier. If the cluster c is not overloaded, that is, if its total airtime ratio is smaller than the available airtime ratio, the cluster head c decreases the multiplier. The cluster head c periodically broadcasts the updated multiplier.

Routing: The link cost of link l is calculated from the effective transmission rate dl and the load control (Lagrange) multipliers. Since dl is the effective transmission rate reflecting the PHY transmission rate as well as the packet error probability, we can say that 1/dl is equivalent to the ETT. Therefore, the link cost in the proposed scheme can be viewed as the ETT augmented with the load control variable. To find the optimal route on which the sum of the proposed link costs is minimized, we can use either the existing proactive routing protocols or reactive routing protocols. The source node of flow f periodically finds the new optimal route by using these routing protocols. The source node is aware of the link costs on the current active route from the periodic report, and is also informed of the link costs on the new optimal route by the routing protocol. Based on these link costs, the source node decides whether to change the active route or not.

Flow/Congestion control: The source node periodically recalculates the flow data rate by using the link costs on the active route. Since the source node lowers the flow data rate of its traffic to the recalculated flow data rate, the source can be quenched when the active route passes through a congested area of the network.

By the above three operations, network-wide load balance can be achieved. If an area of the network is overloaded, the link costs around the area are increased by the link cost control operation. Then, the source node of a flow passing through the area reduces its flow data rate, or finds another route that allows a higher flow data rate.
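As an illustration of the link cost control and routing operations just described, the following minimal Python sketch shows how a cluster head might update its load-control Lagrange multiplier from the airtime ratios reported by the links in its cluster, and how a link cost of the form "ETT plus load control variable" can be computed. This is a hedged reading of the scheme, not the authors' exact update rule: the projected-subgradient step, the step size, the available airtime ratio a_max, and the function names are illustrative assumptions.

def update_cluster_multiplier(lam, reported_airtimes, a_max=1.0, step=0.01):
    """Cluster head update (assumed projected subgradient step): raise the
    multiplier when the cluster's total airtime exceeds the available
    airtime ratio, lower it otherwise, and keep it non-negative."""
    total_airtime = sum(reported_airtimes)      # loads broadcast by member links
    lam = lam + step * (total_airtime - a_max)  # increase if overloaded, decrease if not
    return max(lam, 0.0)                        # Lagrange multipliers stay non-negative

def link_cost(d_l, cluster_multipliers):
    """Proposed link cost: the ETT-like term 1/d_l augmented with the load
    control variables of the clusters (the set Vl) that contain the link."""
    return 1.0 / d_l + sum(cluster_multipliers)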

Fig. 2. Control information exchange for distributed implementation.

The figure illustrates an example of control information exchange for a flow and a cluster. On the active route, the flow data rate and the link costs are exchanged. This control information can be piggybacked on the data and acknowledgement (ACK) packets. In addition, the airtime ratio and the Lagrange multiplier are exchanged between the cluster head and nearby nodes. The proposed scheme can be applied to systems that broadcast a beacon message periodically, for example, the IEEE 802.11, 802.15.3, and 802.15.4 standards.

IV. DAMPENING ALGORITHM

The dampening algorithm should alleviate the route flapping problem while keeping the solution in a close range of the optimal one. Moreover, the dampening algorithm should be able to be implemented in a distributed way. To accomplish these goals, the dampening algorithm prevents the route flapping by changing the active route more conservatively than the original algorithm does. When the original algorithm is used, at the jth iteration it finds an optimal route and immediately changes the active route to the new optimal route. However, the dampening algorithm changes the active route only if the new route improves on the current one by a certain margin.

Let us explain the operation of the dampening algorithm. We define ξ as the dampening parameter which controls the conservativeness in changing the route. The value of ξ is between zero and one. If ξ is set to one, the dampening algorithm is the same as the original algorithm. The active route is changed more conservatively with a smaller value of ξ. At the jth iteration, the dampening algorithm first finds any optimal route. Consequently, the dampening algorithm can alleviate the route flapping problem if the dampening parameter ξ is set to a sufficiently small value. The stability comes at the cost of suboptimal route selection.
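To make the dampening rule concrete, the sketch below shows one plausible way a source node could apply it: the newly found optimal route replaces the active route only when its total cost undercuts the active route's cost by the margin controlled by ξ. The comparison form is an assumption made for illustration; the text states the rule only qualitatively.

def route_cost(route, cost):
    """Sum of the proposed link costs along a route (a list of link ids)."""
    return sum(cost[l] for l in route)

def damped_route_update(active_route, optimal_route, cost, xi):
    """Dampened route change: with xi = 1 a cheaper optimal route is always
    adopted (the original algorithm); a smaller xi (0 < xi < 1) demands a
    larger improvement before switching, which suppresses route flapping."""
    if route_cost(optimal_route, cost) < xi * route_cost(active_route, cost):
        return optimal_route
    return active_route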
V. RESULTS AND DISCUSSION

In the gateway scenario, the whole area is divided into four square areas. At the center of each divided area, a gateway node is placed. All users try to send data traffic to the outside network. Therefore, they should find a route to one of the four gateways. To model the load imbalance situation, we introduce a parameter named the load skewness; the load skewness represents the degree to which the source nodes of flows are concentrated in the shaded area. The proposed scheme increases the throughput by 18-64 percent over the ETT-based scheme when the load skewness is one. If the load skewness is zero, the throughput gain of the proposed scheme is 10-18 percent. In both cases, the proposed scheme outperforms the ETT-based scheme, and the performance gap is greater when the load skewness is high. Fig. 6 clearly demonstrates the effect of the load skewness on the throughput. This figure shows that as the load skewness increases, the throughput of the proposed scheme only slightly drops (i.e., from 246 to 239 Mb/s if the number of flows is 100), whereas the throughput of the ETT-based scheme sharply drops (i.e., from 213 to 144 Mb/s if the number of flows is 100). This means that the proposed scheme is robust to high load skewness owing to its load balancing capability.

To clearly show the convergence speed, we assume in this simulation that all operations are performed synchronously at each iteration, instead of employing the asynchronous distributed implementation of the previous simulations. Note that one iteration includes one round of routing, link cost control, and flow/congestion control. The routes are stabilized as the number of iterations increases. Every 100 iterations, we count the number of flows whose route is changed during the last 100 iterations. The simulation is performed in both the gateway and no-gateway scenarios. The load skewness is one for the gateway scenario, and the concentrated traffic model is selected for the no-gateway scenario. The total number of flows is 100.

We can see that almost all route changes occur within 200 iterations in both scenarios. It is noted that this number of iterations (i.e., 200) is required for convergence from the initial state, where all flows start simultaneously. Since there is only a small change in the network configuration (e.g., the addition or deletion of a flow) at a time in a usual situation, a much smaller number of iterations is needed to converge in practice. In the case of the gateway scenario, we observe that no route change takes place after 1,700 iterations. In the no-gateway scenario, a few routes change continuously due to the route flapping. However, this is acceptable since the number of route flappings is very small compared to the total number of flows. As mentioned before, if needed, the routing scheme can be further stabilized by decreasing the parameter ξ.

VI. CONCLUSION

A load-aware routing scheme is developed for the WMN. We have formulated the routing problem as an optimization problem, and have solved it by using the dual decomposition method. The dual decomposition method makes it possible to design a distributed routing scheme. However, there could be a route flapping problem in the distributed scheme. To tackle this problem, we have suggested a dampening algorithm and have analyzed the performance of the algorithm. The numerical results show that the proposed scheme with the dampening algorithm converges well to a stable state and achieves much higher throughput than the ETT-based scheme does, owing to its load-balancing capability.

The main advantage of the proposed routing scheme is that it is favorable to practical implementation although it is theoretically designed. The proposed scheme is a practical single-path routing scheme, unlike other multipath routing schemes which are designed by using the optimization theory. Also, the proposed scheme can easily be implemented in a distributed way by means of the existing routing algorithms. The proposed scheme can be applied to various single-band PHY/MAC layer protocols. In future work, we can extend the proposed scheme so that it can also be applied to multiband protocols, which can provide larger bandwidth to the WMN.

REFERENCES

[1] Bhupendra Kumar Gupta, B. M. Acharya, and Manoj Kumar Mishra, "Optimization of Routing Algorithm in Wireless Mesh Networks," IEEE 2009 World Congress on Nature & Biologically Inspired Computing.
[2] Chi Ma, Zhenghao Zhang, and Yuanyuan Yang, "Battery-Aware Router Scheduling in Wireless Mesh Networks," IEEE 2009 International Conference.
[3] Fatos Xhafa and Leonard Barolli, "Ad Hoc and Neighborhood Search Methods for Placement of Mesh Routers in Wireless Mesh Networks," 2009 29th IEEE International Conference on Distributed Computing Systems Workshops.
[4] Jonathan Guerin and Marius Portmann, "Routing Metrics for Multi-Radio Wireless Mesh Networks," IEEE Applications Conference, December 2nd-5th, 2007.
[5] Md. Arafatur Rahman, Md. Saiful Azad, and Farhat Anwar, "Integrating Multiple Metrics to Improve the Performance of a Routing Protocol over Wireless Mesh Networks," IEEE 2009 International Conference on Signal Processing Systems.
[6] Richard Draves, Jitendra Padhye, and Brian Zill, "Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks," MobiCom'04, Sept. 26-Oct. 1, 2004.
[7] Usman Ashraf, Slim Abdellatif, and Guy Juanole, "Route Stability in Wireless Mesh Access Networks," 2008 IEEE/IFIP International Conference.

OPTIMIZED ROUTING ALGORITHM FOR WIRELESS MESH NETWORKS

V. Sivaperumal
Department of Computer Science Engineering,
Jerusalem College of Engineering,
Anna University Chennai,
Chennai, India.
E-mail: shiva1533@gmail.com

Mrs. P. Mahalakshmi
Department of Computer Science Engineering,
Jerusalem College of Engineering,
Anna University Chennai,
Chennai, India.
E-mail: p_mlaxmi@yahoo.com

Abstract—This paper presents a load-aware routing scheme for wireless mesh networks (WMNs). In a WMN, the traffic load tends to be unevenly distributed over the network. In this situation, the load-aware routing scheme can balance the load, and consequently, enhance the overall network capacity. We design a routing scheme which maximizes the utility, i.e., the degree of user satisfaction, by using the dual decomposition method. The structure of this method makes it possible to implement the proposed routing scheme in a fully distributed way. With the proposed scheme, a WMN is divided into multiple clusters for load control. A cluster head estimates the traffic load in its cluster. As the estimated load gets higher, the cluster head increases the routing metrics of the routes passing through the cluster. Based on the routing metrics, user traffic takes a detour to avoid overloaded areas, and as a result, the WMN achieves global load balancing. We present numerical results showing that the proposed scheme effectively balances the traffic load and outperforms the routing algorithm using the expected transmission time (ETT) as a routing metric.

Keywords— Wireless mesh network, load-aware routing, utility, dual decomposition.

I. INTRODUCTION

A wireless mesh network (WMN) consists of a number of wireless routers, which not only operate as hosts but also forward packets on behalf of other routers. WMNs have many advantages over conventional wired networks, such as low installation cost, wide coverage, and robustness. Because of these advantages, WMNs have been rapidly penetrating into the market with various applications, for example, public Internet access, intelligent transportation systems (ITS), and public safety. One of the main research issues related to WMNs is to develop a routing algorithm optimized for the WMN.

In mobile ad-hoc networks, the primary concern of routing has been robustness to high mobility. However, nodes in the WMN are generally quasi-static in their location. Thus, the focus of the routing studies in the WMN has moved to performance enhancement by using sophisticated routing metrics. For example, as the routing metrics, researchers have proposed the expected transmission number (ETX), the expected transmission time (ETT) and the weighted cumulative ETT (WCETT), the metric of interference and channel switching (MIC), and the modified expected number of transmissions (mETX) and effective number of transmissions (ENT). Although these metrics have shown significant performance improvement over the traditional hop-count routing metric, they neglect the problem of traffic load imbalance in the WMN.

In the WMN, a great portion of users tends to communicate with outside networks via the wired gateways. In such an environment, the wireless links around the gateways are likely to become a bottleneck of the network. If the routing algorithm does not take account of the traffic load, some gateways may be overloaded while the others may not. This load imbalance can be resolved by introducing a load-aware routing scheme that adopts a routing metric with a load factor. When the load-aware routing algorithm is designed to maximize the system capacity, the major benefit of the load-aware routing is the enhancement of the overall system capacity due to the use of underutilized paths.

II. SYSTEM MODEL

Each wireless router in a WMN is fixed at a location. Thus, the WMN topology does not change frequently and the channel quality is quasi-static. In addition, each wireless router generally serves so many subscribers (i.e., users) that the characteristic of the aggregated traffic is stable over time. Therefore, we design the routing scheme under a system model in which the topology and the user configuration are stable.

Fig 1. Example mesh network.

In Fig. 1, a node stands for a wireless router, which not only delivers data for its own users but also relays data traffic for other wireless routers. Among the nodes, there are some gateway nodes connected to the wired backhaul network. Each user is associated with its serving node. In this paper, we do not deal with the interface between a user and its serving node, in order to focus on the mesh network itself. Through the serving node, a user can send (receive) data traffic to (from) the other user in the WMN or to (from) outside networks via the gateway nodes. If a node n can transmit data to a node m directly (i.e., without relaying), there exists a link from the node n into the node m. In this paper, we define a link as unidirectional. For the mathematical representation, we define N and L as the sets of the indices of all nodes and all links in the network, respectively. In Table 1, we summarize all mathematical notations introduced in this section.

TABLE 1
Table of Symbols

Notation   Description
N          Set of indices of all nodes
L          Set of indices of all links
F          Set of indices of all flows
C          Set of indices of all clusters
Dr         Set of indices of all intermediate links on route r
Hl         Set of indices of all routes passing through link l
Gf         Set of indices of all possible routes for flow f
Qr         Set of indices of all flows using route r
Mc         Set of indices of all links in cluster c
Vl         Set of indices of all clusters including link l
Pf,r       Flow data rate of flow f on route r
dl         Effective transmission rate of link l
al         Airtime ratio of link l, i.e., the ratio of the time used for data transmission to the whole time
uf(x)      Utility of flow f when the data rate is x
α          System-wide fairness parameter
pf         Priority of flow f
ζ          Delay penalty parameter
ξ          Dampening parameter

The WMN under consideration provides a connection-oriented service, where connections are managed in the unit of a flow. A flow is also unidirectional. A user can communicate with the other user or the gateway node after setting up a flow connecting them. Since a user is connected to a unique node, the flow between a pair of users can also be specified by the corresponding node pair. The node where a flow starts (ends) will be called the source (destination) node of the flow. Fig. 1 shows an example scenario where a user intends to send data to outside networks. As seen in this figure, if a flow conveys data to (from) outside networks, all gateway nodes can be the destination (source) node of the flow. We will identify a flow by an index, generally f, and define F as the set of the indices of all flows in the network.

Data traffic on a flow is conveyed to the destination node through a multihop route. We only consider acyclic routes. Thus, a route can be determined by the set of all intermediate links that the route takes. We will index a route by r and define Dr as the set of the indices of all intermediate links on the route r. For a flow, there can be a number of possible routes that connect the source and destination nodes. Let Gf denote the set of the indices of all possible routes for flow f.

III. DISTRIBUTED IMPLEMENTATION

The routing scheme can be implemented in a distributed way, which improves the scalability of the WMN. In this section, we discuss the distributed implementation of the proposed scheme. The flow data rate vectors and the Lagrange multipliers are distributively managed by the nodes in the WMN. The flow data rate vector of flow f is managed by the source node of the flow; it is the single-path flow data rate vector associated with the active route of the flow, and the flow data rate on the active route is equal to the corresponding rate Pf,r.

For implementation, one node within a cluster is designated as the head of the cluster. The head of a cluster is assumed to be able to communicate with the transmitter nodes of the links in its cluster. Let us call the head of cluster c the "cluster head" c. The cluster head c takes the role of maintaining and updating the Lagrange multiplier of its cluster.

When the dual decomposition method is used, different variables can be updated according to different time schedules. Therefore, in order to improve the convergence speed, in practice, different network entities carry out these operations asynchronously, by using currently available information. Though it is difficult to prove that the asynchronous operation leads to the exact solution, we have confirmed by simulation that the solutions produced by the asynchronous and the synchronous operations are the same in our routing problem. In the following, we describe the three operations in more detail when they are implemented asynchronously.

Link cost control: For link cost control, the cluster head c gathers the information on the total load in the cluster c and adjusts the Lagrange multiplier to control the load on the cluster c. This process is as follows: each node estimates the total loads of all its outgoing links and broadcasts the estimated load periodically. Receiving these broadcast messages, the cluster head can compute the total airtime ratio consumed by the links in its cluster. If the total airtime ratio exceeds the available airtime ratio, it means that the cluster is overloaded; therefore, the cluster head c increases the multiplier. If the cluster c is not overloaded, that is, if its total airtime ratio is smaller than the available airtime ratio, the cluster head c decreases the multiplier. The cluster head c periodically broadcasts the updated multiplier.

Routing: The link cost of link l is calculated from the effective transmission rate dl and the load control (Lagrange) multipliers. Since dl is the effective transmission rate reflecting the PHY transmission rate as well as the packet error probability, we can say that 1/dl is equivalent to the ETT. Therefore, the link cost in the proposed scheme can be viewed as the ETT augmented with the load control variable. To find the optimal route on which the sum of the proposed link costs is minimized, we can use either the existing proactive routing protocols or reactive routing protocols. The source node of flow f periodically finds the new optimal route by using these routing protocols. The source node is aware of the link costs on the current active route from the periodic report, and is also informed of the link costs on the new optimal route by the routing protocol. Based on these link costs, the source node decides whether to change the active route or not.

Flow/Congestion control: The source node periodically recalculates the flow data rate by using the link costs on the active route. Since the source node lowers the flow data rate of its traffic to the recalculated flow data rate, the source can be quenched when the active route passes through a congested area of the network.
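To illustrate the routing and flow/congestion control operations at a source node, the sketch below computes the minimum-cost route with Dijkstra's algorithm over the proposed link costs and then recalculates the flow data rate from the cost of the active route. The graph encoding and the utility-based rate rule are assumptions made for this example; as the text notes, any existing proactive or reactive routing protocol could play the role of the shortest-path computation.

import heapq

def shortest_route(links, cost, src, dst):
    """Dijkstra over a directed graph links = {node: [(neighbor, link_id), ...]},
    minimizing the sum of the proposed link costs; returns a list of link ids."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, l in links.get(u, ()):
            nd = d + cost[l]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, l)
                heapq.heappush(heap, (nd, v))
    route, node = [], dst
    while node != src:              # walk the predecessor links back to the source
        node, l = prev[node]
        route.append(l)
    return list(reversed(route))

def recalc_flow_rate(active_route, cost, inv_marginal_utility):
    """Flow/congestion control (assumed rule): choose the rate at which the
    flow's marginal utility equals the active route's total cost, so that a
    costly (congested) route quenches the source."""
    path_cost = sum(cost[l] for l in active_route)
    return inv_marginal_utility(path_cost)

For instance, with an alpha-fair utility u(x) = x**(1 - alpha) / (1 - alpha), the inverse marginal utility would be lambda c: c ** (-1.0 / alpha).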

By the above three operations, network-wide load balance can be achieved. If an area of the network is overloaded, the link costs around the area are increased by the link cost control operation. Then, the source node of a flow passing through the area reduces its flow data rate, or finds another route that allows a higher flow data rate.

Fig. 2. Control information exchange for distributed implementation.

The figure illustrates an example of control information exchange for a flow and a cluster. On the active route, the flow data rate and the link costs are exchanged. This control information can be piggybacked on the data and acknowledgement (ACK) packets. In addition, the airtime ratio and the Lagrange multiplier are exchanged between the cluster head and nearby nodes. The proposed scheme can be applied to systems that broadcast a beacon message periodically, for example, the IEEE 802.11, 802.15.3, and 802.15.4 standards.

IV. DAMPENING ALGORITHM

The dampening algorithm should alleviate the route flapping problem while keeping the solution in a close range of the optimal one. Moreover, the dampening algorithm should be able to be implemented in a distributed way. To accomplish these goals, the dampening algorithm prevents the route flapping by changing the active route more conservatively than the original algorithm does. When the original algorithm is used, at the jth iteration it finds an optimal route and immediately changes the active route to the new optimal route. However, the dampening algorithm changes the active route only if the new route improves on the current one by a certain margin.

Let us explain the operation of the dampening algorithm. We define ξ as the dampening parameter which controls the conservativeness in changing the route. The value of ξ is between zero and one. If ξ is set to one, the dampening algorithm is the same as the original algorithm. The active route is changed more conservatively with a smaller value of ξ. At the jth iteration, the dampening algorithm first finds any optimal route. Consequently, the dampening algorithm can alleviate the route flapping problem if the dampening parameter ξ is set to a sufficiently small value. The stability comes at the cost of suboptimal route selection.

V. RESULTS AND DISCUSSION

In the gateway scenario, the whole area is divided into four square areas. At the center of each divided area, a gateway node is placed. All users try to send data traffic to the outside network. Therefore, they should find a route to one of the four gateways. To model the load imbalance situation, we introduce a parameter named the load skewness; the load skewness represents the degree to which the source nodes of flows are concentrated in the shaded area. The proposed scheme increases the throughput by 18-64 percent over the ETT-based scheme when the load skewness is one. If the load skewness is zero, the throughput gain of the proposed scheme is 10-18 percent. In both cases, the proposed scheme outperforms the ETT-based scheme, and the performance gap is greater when the load skewness is high. Fig. 6 clearly demonstrates the effect of the load skewness on the throughput. This figure shows that as the load skewness increases, the throughput of the proposed scheme only slightly drops (i.e., from 246 to 239 Mb/s if the number of flows is 100), whereas the throughput of the ETT-based scheme sharply drops (i.e., from 213 to 144 Mb/s if the number of flows is 100).

This means that the proposed scheme is robust to high load skewness owing to its load balancing capability. To clearly show the convergence speed, we assume in this simulation that all operations are performed synchronously at each iteration, instead of employing the asynchronous distributed implementation of the previous simulations. Note that one iteration includes one round of routing, link cost control, and flow/congestion control. The routes are stabilized as the number of iterations increases. Every 100 iterations, we count the number of flows whose route is changed during the last 100 iterations. The simulation is performed in both the gateway and no-gateway scenarios. The load skewness is one for the gateway scenario, and the concentrated traffic model is selected for the no-gateway scenario. The total number of flows is 100.

We can see that almost all route changes occur within 200 iterations in both scenarios. It is noted that this number of iterations (i.e., 200) is required for convergence from the initial state, where all flows start simultaneously. Since there is only a small change in the network configuration (e.g., the addition or deletion of a flow) at a time in a usual situation, a much smaller number of iterations is needed to converge in practice. In the case of the gateway scenario, we observe that no route change takes place after 1,700 iterations. In the no-gateway scenario, a few routes change continuously due to the route flapping. However, this is acceptable since the number of route flappings is very small compared to the total number of flows. As mentioned before, if needed, the routing scheme can be further stabilized by decreasing the parameter ξ.

VI. CONCLUSION

A load-aware routing scheme is developed for the WMN. We have formulated the routing problem as an optimization problem, and have solved it by using the dual decomposition method. The dual decomposition method makes it possible to design a distributed routing scheme. However, there could be a route flapping problem in the distributed scheme. To tackle this problem, we have suggested a dampening algorithm and have analyzed the performance of the algorithm. The numerical results show that the proposed scheme with the dampening algorithm converges well to a stable state and achieves much higher throughput than the ETT-based scheme does, owing to its load-balancing capability.

The main advantage of the proposed routing scheme is that it is favorable to practical implementation although it is theoretically designed. The proposed scheme is a practical single-path routing scheme, unlike other multipath routing schemes which are designed by using the optimization theory. Also, the proposed scheme can easily be implemented in a distributed way by means of the existing routing algorithms. The proposed scheme can be applied to various single-band PHY/MAC layer protocols. In future work, we can extend the proposed scheme so that it can also be applied to multiband protocols, which can provide larger bandwidth to the WMN.

REFERENCES

[1] Bhupendra Kumar Gupta, B. M. Acharya, and Manoj Kumar Mishra, "Optimization of Routing Algorithm in Wireless Mesh Networks," IEEE 2009 World Congress on Nature & Biologically Inspired Computing.
[2] Chi Ma, Zhenghao Zhang, and Yuanyuan Yang, "Battery-Aware Router Scheduling in Wireless Mesh Networks," IEEE 2009 International Conference.
[3] Fatos Xhafa and Leonard Barolli, "Ad Hoc and Neighborhood Search Methods for Placement of Mesh Routers in Wireless Mesh Networks," 2009 29th IEEE International Conference on Distributed Computing Systems Workshops.
[4] Jonathan Guerin and Marius Portmann, "Routing Metrics for Multi-Radio Wireless Mesh Networks," IEEE Applications Conference, December 2nd-5th, 2007.
[5] Md. Arafatur Rahman, Md. Saiful Azad, and Farhat Anwar, "Integrating Multiple Metrics to Improve the Performance of a Routing Protocol over Wireless Mesh Networks," IEEE 2009 International Conference on Signal Processing Systems.
[6] Richard Draves, Jitendra Padhye, and Brian Zill, "Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks," MobiCom'04, Sept. 26-Oct. 1, 2004.
[7] Usman Ashraf, Slim Abdellatif, and Guy Juanole, "Route Stability in Wireless Mesh Access Networks," 2008 IEEE/IFIP International Conference.

AUTOMATIC MULTILEVEL THRESHOLDING OF DIGITAL IMAGES

*G. SIVARANJANI,
PG STUDENT,
DEPARTMENT OF ECE,
Adiparasakthi Engineering College,
Melmaruvathur.
Mobile #: 98404789999

**MRS. M. RAJALAKSHMI, M.E,
ASSISTANT PROFESSOR, ECE DEPT,
Adiparasakthi Engineering College,
Melmaruvathur.

Email Id: shivapss@gmail.com

ABSTRACT

Segmentation has been applied in several areas, especially where it is necessary to use tools for feature extraction and to separate the needed object from the rest of the image for analyzing a particular object. There are several segmentation methods for segmenting medical images, but it is difficult to find a method that is adaptable and performs well for different types of medical images. For better segmentation, multilevel thresholding is applied, and to adapt to different types of images, histogram-based segmentation is performed. Multilevel thresholding is a process that segments a gray-level image into several distinct regions. The technique was applied to segment the cell core and potential rejection of tissue in myocardial images of biopsies from cardiac transplants.

Keywords: Segmentation, Multilevel thresholding, Myocardial biopsy.

1.INTRODUCTION

The influence and impact of digital images on modern society is tremendous, and image processing is now a critical component in science and technology. The rapid progress in computerized medical image reconstruction, and the associated developments in analysis methods and computer-aided diagnosis, has propelled medical imaging into one of the most important sub-fields in scientific imaging.

In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture.

Adjacent regions are significantly different with respect to the same characteristic(s).

During segmentation, an image is preprocessed, which can involve restoration, enhancement, or simply representation of the data. Certain features are extracted to segment the image into its key components. The segmented image is routed to a classifier or an image-understanding system. The image classification process maps different regions or segments into one of several objects. Each object is identified by a label. The image understanding system then determines the relationships between different objects in a scene to provide a complete scene description. Powerful segmentation techniques are currently available; however, each technique is ad hoc. The creation of hybrid techniques seems to be a future research area that is promising with respect to current Navy digital mapping applications. Medical image segmentation refers to the segmentation of known anatomic structures from medical images. Structures of interest include organs or parts thereof, such as cardiac ventricles or kidneys, abnormalities such as tumors and cysts, as well as other structures such as bones, vessels, brain structures, etc. The overall objective of such methods, referred to as computer-aided diagnosis, is to assist doctors in evaluating medical imagery or in recognizing abnormal findings in a medical image.

2.METHODOLOGY:

Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, thresholding approaches, region growing approaches, and clustering techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain.

CLUSTERING METHODS

K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster [2]. These centroids should be placed in a cunning way, because different locations cause different results. So, the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to the given data set and associate it to the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid. A loop has been generated. As a result of this loop we may notice that the k centroids change their location step by step until no more changes are done; in other words, the centroids do not move any more. Finally, this algorithm aims at minimizing an objective function, in this case a squared-error function.

Using k-means clustering we can segment angiographic images. The goal is to propose an algorithm that performs well for large datasets and finds good initial centroids. K-means clustering is an iterative technique that is used to partition an image into K clusters.
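The iterative procedure described above maps directly onto a few lines of code. The following is a minimal gray-level k-means sketch, assuming numpy is available; the evenly spread initialization and the convergence test are illustrative choices, not the specific large-dataset initialization the text alludes to.

import numpy as np

def kmeans_gray(pixels, k=4, max_iter=100, tol=1e-4):
    """Cluster gray-level pixel values (a flattened image) into k groups."""
    # Spread the initial centroids far apart across the intensity range.
    centroids = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(max_iter):
        # Associate every pixel with its nearest centroid.
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        # Recompute each centroid as the barycenter of its cluster.
        new_centroids = np.array([
            pixels[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.max(np.abs(new_centroids - centroids)) < tol:  # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids

Reshaping labels back to the image dimensions gives the segmented image; with k = 4 this mirrors the experiment reported later in the paper.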
REGION GROWING METHODS

In the region growing technique, image pixels that belong to an object are segmented into regions. Segmentation is performed based on some predefined criteria. Two pixels can be grouped together if they have the same intensity characteristics or if they are close to each other. It is assumed that pixels that are close to each other and have similar intensity values are likely to belong to the same object. The simplest form of segmentation can be achieved through thresholding and component labeling. Another method is to find region boundaries using edge detection. Region growing is a procedure that groups pixels or subregions into larger regions.

The simplest of these approaches is pixel aggregation, which starts with a set of "seed" points and from these grows regions by appending to each seed point those neighboring pixels that have similar properties (such as gray level, texture, color, or shape). Region growing based techniques perform better than edge-based techniques in noisy images, where edges are difficult to detect.
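A minimal seeded region-growing sketch of the idea just described follows; the 4-connected neighborhood and the fixed intensity tolerance tol are illustrative assumptions.

from collections import deque
import numpy as np

def region_grow(image, seed, tol=10):
    """Grow a region from a seed pixel, absorbing neighbors whose intensity
    is within tol of the seed's intensity; returns a boolean mask."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    seed_val = int(image[seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(int(image[y, x]) - seed_val) <= tol:
            mask[y, x] = True
            # Enqueue the 4-connected neighbors for inspection.
            queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
    return mask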

OTSU'S METHOD

Otsu's thresholding chooses the threshold to minimize the intraclass variance of the thresholded black and white pixels. Otsu's thresholding method is based on a very simple idea: find the threshold that minimizes the weighted within-class variance. This turns out to be the same as maximizing the between-class variance [4]. It operates directly on the gray-level histogram, so it is fast (once the histogram is computed). The assumptions of Otsu's method are:
• The histogram (and the image) are bimodal.
• There is no use of spatial coherence, nor of any other notion of object structure.
• Stationary statistics are assumed, but the method can be modified to be locally adaptive.

Now, we could actually stop here. All we need to do is just run through the full range of t values [1,256] and pick the value that minimizes the within-class variance. But the relationship between the within-class and between-class variances can be exploited to generate a recursion relation that permits a much faster calculation. For any given threshold, the total variance is the sum of the within-class variances (weighted) and the between-class variance, which is the sum of weighted squared distances between the class means and the grand mean.
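The exhaustive search described above can be written directly from a 256-bin histogram: sweep all thresholds and keep the one maximizing the between-class variance. The sketch below is a straightforward rendering of that rule, with names chosen for illustration.

import numpy as np

def otsu_threshold(image):
    """Return the gray level that maximizes the between-class variance."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()                   # gray-level probabilities
    omega = np.cumsum(p)                    # weight of class 0 at each threshold
    mu = np.cumsum(p * np.arange(256))      # cumulative first moment
    mu_total = mu[-1]                       # grand mean
    # Between-class variance for every candidate threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b2)))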
Fig. 1 is taken as the input image, which is a myocardial image obtained with biopsies of a transplanted heart patient. On the studied images we used three methods to diagnose the matching of the cell core or tissue of a transplanted heart patient. For comparison of the segmented regions, beyond the strategy of maximum entropy, we compared the results with those provided by Otsu's method.

FIG: 1 Myocardial images obtained with biopsies of a transplanted heart patient.

Segmenting the input cardiac image for diagnosing the mismatch of tissue in a heart transplant patient using the clustering method: the cardiac image shown in Fig. 1 is segmented using the k-means clustering method, where the number of clusters is a user-defined parameter, selected here as k = 4 (Fig. 2). In the segmented image, the cell core of the cardiac tissue is not clear, because tissues and blood vessels are also present in the segmented image, so this method of segmentation is not suitable for the biopsy cardiac image. The output depends on the user-defined parameter k and will change with it, so the output is not stable.

FIG: 2 Segmentation using k-means clustering

Fig. 3 is a cardiac image segmented using the region growing method, from which we can visualize the cell core, though not clearly enough for diagnosing the mismatching of tissues. Compared with the other methods this is a stable method, which is one of the advantages of using it for segmentation. Here the cell core, tissue, and vessels are all segmented, so it is not clear for visualizing the mismatch of the cell core, because the entire image is not of the same intensity.
FIG: 3 Segmentation using Region growing method

Using Otsu's method, the threshold value of the cardiac image is calculated; setting that as the threshold value, the biopsy cardiac image is segmented, as shown in Fig. 4. From the segmented image we cannot clearly diagnose the mismatch of the tissue or cell core, because the threshold value is dependent upon the user. This method shows a different segmented image for each different value of the threshold.

FIG: 4 Segmentation using OTSU's method

3.PROPOSED SEGMENTATION ALGORITHM:

The proposed algorithm is based on a model of automatic multilevel thresholding and considers techniques of group histogram quantization, analysis of the histogram slope percentage, and calculation of maximum entropy to define the thresholds.

A. Histogram Quantification
To evaluate the histogram in specific groups, the user should set the size of the group. If the given value is 1, then the process analyzes pixel by pixel. For values greater than 1, the value used in the iterations of the algorithm is the value that was defined by the user.

B. Valleys Analysis
The identification of the histogram valleys is very important, because the thresholds, and therefore the division of classes, are concentrated in these valleys. The algorithm identifies these valleys automatically using the transitions of the signs of the histogram values, which is done in the following way:
• First, compare the first value of a group, which was determined in the histogram segmentation, with the group's last value. If the first is lower, it means that the histogram values are increasing and the sign is positive. Otherwise, the histogram values are decreasing and the sign is negative.
• The next step is to identify the sign of the next group; every time there is a transition from negative to positive, a valley is identified. Once the first valley is found, we pass to the next step, the analysis of the percentage of slope.

FIG: 5 Block diagram of multilevel thresholding (cardiac image -> histogram calculation -> quantization -> histogram slope percentage -> maximum entropy -> multilevel threshold -> segmentation)

C. Analysis of the slope percentage

The determination of thresholds based only on an analysis of the valleys may fall short in cases where the distribution of values in the histogram has no valleys, or only insignificant valleys with little variance. For the type of image analyzed, we identified that an effective threshold would be near the base of a group that has a considerable percentage of slope. This minimum percentage of slope is set by the user, who may adjust it according to the type of image. This approach involves calculating the slope as the difference between the average of the last three values of the group and the average of the first three. If this difference is greater than the parameter set, the scanning of the slope percentage is interrupted and we go to the step of identifying the threshold. If the difference is smaller, the histogram scan continues until the difference of the means is greater.

D. Threshold identification using maximum entropy
Having established a group with a valley, the threshold identification is calculated by determining the maximum entropy, which is obtained from probabilistic calculations. In this context, we consider an image as the result of a random process, where the probability pi corresponds to the probability of a pixel of the image taking the intensity value i (i = 1, ..., n). The gray level with the highest entropy is identified as a threshold. After identifying the threshold, we return to the stage of analysis of the valleys until the entire histogram has been processed.
FIG: 6 Segmented image using automatic multilevel thresholding

Applications of multilevel thresholding: it segments the cell core and the rejection of tissue in myocardial images of biopsies from cardiac transplants.

4.CONCLUSION:

With the method used, it was possible to find the image thresholds and therefore to segment the images, presenting satisfactory results. Through comparisons of the techniques, all techniques show good results, with irrelevant differences. Compared with the previously shown method, for the studied images, the proposed technique shows better results because it allows the adjustment of parameters such as the group size and the slope percentage of the histogram, factors that influence the threshold values. These characteristics are significant aspects of the developed technique, and allow its application to other image types, since the input parameters are adjustable to the studied case. This versatility and quality of results make the developed technique a considerable alternative to be applied during the stage of feature extraction in artificial vision systems.

5.REFERENCES:

 Abutaleb, "Automatic thresholding of gray-level pictures using two-dimensional entropy," Computer Vision, Graphics, and Image Processing, vol. 47, pp. 22-32, 1989.
 Jain, K. and Duin, R., "Statistical pattern recognition: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
 Kapur, J. N., Sahoo, P. K., and Wong, K. C., "A new method for gray-level picture thresholding using the entropy of the histogram," Computer Vision, Graphics, and Image Processing, vol. 29, pp. 273-285, 1985.
 Otsu, N., "A threshold selection method from gray-level histogram," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, 1979.
 Ping-Sung Liao, Tse-Sheng Chen, and Pau-Choo Chung (2001), "A fast algorithm for multilevel thresholding," Journal of Information Science and Engineering, vol. 17, pp. 713-727.
 www.imageprocessingplace.com
 www.search.medicinenet.com

EFFICIENT ROUTING BASED ON LOAD BALANCING IN WIRELESS MESH NETWORKS

*S. Sumathi, **S. Bharathiraja
Vel Tech Multi Tech Dr.Rangarajan & Dr.Sakunthala Engineering College, Anna University
Avadi, Chennai, India
*sumathikumarapandian@gmail.com
**bharathiraja.88s@gmail.com

Abstract—This paper proposes a clustered routing scheme for wireless mesh networks (WMNs). In a WMN, the traffic load tends to be unevenly distributed over the network. In this situation, the clustered routing scheme can balance the load, and consequently, enhance the overall network capacity. We design a routing scheme which maximizes the utility. In this system, the WMN is divided into multiple clusters for load control. A cluster head estimates the traffic load in its cluster. In this paper we propose an algorithm to organize these mesh nodes into well-defined clusters with less-energy-constrained gateway nodes acting as cluster heads, and to balance the load among these gateways. Simulation results show how our approach can balance the load and improve the lifetime of the system.

Keywords— Wireless mesh networks, cluster head

Introduction

Wireless mesh networks (WMNs) are dynamically self-organized and self-configured, with the nodes in the network automatically establishing an ad hoc network and maintaining the mesh connectivity. WMNs are comprised of two types of nodes: mesh routers and mesh clients. Other than the routing capability for gateway/bridge functions as in a conventional wireless router, a mesh router contains additional routing functions to support mesh networking. Through multi-hop communications, the same coverage can be achieved by a mesh router with much lower transmission power. To further improve the flexibility of mesh networking, a mesh router is usually equipped with multiple wireless interfaces built on either the same or different wireless access technologies.

In spite of all these differences, mesh and conventional wireless routers are usually built on a similar hardware platform. Mesh routers have minimal mobility and form the mesh backbone for mesh clients. Thus, although mesh clients can also work as a router for mesh networking, the hardware platform and software for them can be much simpler than those for mesh routers. For example, communication protocols for mesh clients can be light-weight; gateway or bridge functions do not exist in mesh clients; only a single wireless interface is needed in a mesh client; and so on.

In the WMN, a great portion of users intends to communicate with outside networks via the wired gateways. In such an environment, the wireless links around the gateways are likely to be a bottleneck of the network. If the routing algorithm does not take account of the traffic load, some gateways may be overloaded while the others may not. This load imbalance can be resolved by introducing a load-aware routing scheme that adopts a routing metric with a load factor. When the load-aware routing algorithm is designed to maximize the system capacity, the major benefit of the load-aware routing is the enhancement of the overall system capacity due to the use of underutilized paths. Although there have been some works on load-aware routing for mobile ad-hoc networks and WMNs, they simply include some load factors in the routing metric without consideration of the system-wide performance. In this paper, we propose a load-aware routing scheme which maximizes the total utility of the users in the WMN.

The utility is a value which quantifies how satisfied a user is with the network. Since the degree of user satisfaction depends on the network performance, the utility can be given as a function of the user throughput. Generally, the utility function is concave, to reflect the law of diminishing marginal utility.
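One standard concave utility family consistent with the symbols tabulated in the companion papers (utility uf(x), priority pf, and fairness parameter α) is the α-fair utility. This is an illustrative assumption, since the paper does not spell out its utility function:

u_f(x) = p_f \frac{x^{1-\alpha}}{1-\alpha} \quad (\alpha \neq 1), \qquad u_f(x) = p_f \ln x \quad (\alpha = 1).

Setting α = 1 yields proportional fairness, and a larger α trades total throughput for stronger fairness between users.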
To design the scheme, we use the dual decomposition method for utility maximization. Using this method, we can incorporate not only the load-aware routing scheme but also congestion control and fair rate allocation mechanisms into the WMN. Most notably, we can implement the load-aware routing scheme in a distributed way owing to the structure of the dual decomposition method.

In the proposed routing scheme, a WMN is divided into multiple overlapping clusters. A cluster head takes the role of controlling the traffic load on the wireless links in its cluster. The cluster head periodically estimates the total traffic load on the cluster and increases the "link costs" of the links in its cluster. In this paper we propose an algorithm to organize these mesh nodes into well-defined clusters with less-energy-constrained gateway nodes acting as cluster heads, and to balance the load among these gateways. Simulation results show how our approach can balance the load and improve the lifetime of the system.

2 RELATED WORKS

For the WMN, a number of routing metrics and algorithms have been proposed to take advantage of the stationary topology. The first routing metric is the ETX, which is the expected number of transmissions required to deliver a packet to the neighbor. The minimum loss (ML) metric is used to find the route with the lowest end-to-end loss probability. The medium time metric (MTM) is proposed for the multirate network. The MTM of a link is inversely proportional to the physical-layer transmission rate of the link. The ETT is a combination of the ETX and the MTM. The ETT is the time required to transmit a single packet over a link in the multirate network, calculated in consideration of both the number of transmissions and the physical-layer transmission rate.
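As a worked reading of these two ingredients, the ETT combines the retransmission count captured by the ETX with the MTM's dependence on the link rate. A common way to write it, following the usual definitions in the literature (the paper itself does not give the formula explicitly), is

\mathrm{ETX} = \frac{1}{d_f \cdot d_r}, \qquad \mathrm{ETT} = \mathrm{ETX} \times \frac{S}{B},

where d_f and d_r are the forward and reverse delivery ratios of the link, S is the packet size, and B is the link bandwidth.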
The routing metric and algorithm for the multiradio WMN are the WCETT and the multi-radio link quality source routing (MR-LQSR), respectively. The WCETT is a modification of the ETT to consider the intra-flow interference. While the WCETT only considers the intra-flow interference, the MIC and the interference-aware (iAWARE) metrics take account of the inter-flow interference as well as the intra-flow interference.

The mETX and the ENT are proposed to cope with fast link quality variation. These routing metrics contain the standard deviation of the link quality in addition to the average link quality. The blacklist-aided forwarding (BAF) algorithm tackles the problem of short-term link quality degradation by disseminating the blacklist, i.e., a set of currently degraded links. The ExOR algorithm determines the next hop after the transmission for that hop, without predetermined routes. The ExOR can choose the next hop that successfully received the packet, and therefore it is robust to packet error and link quality variation. The resilient opportunistic mesh routing (ROMER) algorithm uses opportunistic forwarding to deal with short-term link quality variation. The ROMER maintains the long-term routes and opportunistically expands or shrinks them at runtime.

The ad hoc on-demand distance vector spanning tree (AODV-ST) is an adaptation of the AODV protocol to the WMN with wired gateways. The AODV-ST constructs a spanning tree of which the root is the gateway. A routing and channel assignment algorithm has been proposed for the multichannel WMN; in this algorithm, a spanning tree is formed in such a way that a node attaches itself to the parent node.

The load-aware routing protocols incorporate the load factor into their routing metrics. The dynamic load-aware routing (DLAR) takes as the routing metric the number of packets queued in the node interface. The load-balanced ad hoc routing (LBAR) counts the number of active paths on a node and its neighbors, and uses it as a routing metric. Both the DLAR and LBAR are designed for the mobile ad hoc network, and aim to reduce the packet delay and the packet loss ratio. An admission control and load balancing algorithm is proposed for the 802.11 mesh networks. In this work, the available radio time (ART) is calculated for each node, and the route with the largest ART is selected when a new connection is requested. This algorithm tries to maximize the average number of connections. The WCETT load balancing (WCETT-LB) metric is the WCETT augmented by a load factor consisting of the average queue length and the degree of traffic concentration. The QoS-aware routing algorithm with congestion control and load balancing (QRCCLB) calculates the number of congested nodes on each route and chooses the route with the smallest number of congested nodes.

Compared to these load-aware routing protocols, the proposed routing scheme has three major advantages. First, the proposed scheme is designed to maximize the system capacity by considering all necessary elements for load balancing, e.g., the interference between flows, the link capacity, and the user demand. On the other hand, the existing protocols fail to reflect these elements since they use heuristically designed routing metrics. For example, the DLAR, the ART, and the WCETT-LB do not take account of the interference between flows. Also, the link capacity is not considered by the DLAR, the LBAR, the ART, and the QRCCLB. Second, the proposed scheme can guarantee fairness between users. When the network load is high, it is important for users to fairly share the scarce radio resources. However, the existing protocols cannot fairly allocate resources, since they are unable to distinguish which route is monopolized by a small number of users. Third, the proposed scheme can provide routes that are stable over time. Since most of the existing protocols adopt highly variable routing metrics such as the queue length or the collision probability, they are prone to suffer from the route flapping problem.

We design the proposed routing scheme by using the dual decomposition method for network utility maximization. To use this method, one should formulate the global optimization problem, which is to maximize the total system utility under the constraints on the traffic flows and the radio resources. After the constraints are relaxed by the Lagrange multipliers, the whole problem can be decomposed into subproblems which are solved by the different network layers in the different network nodes. In the decomposed problem, the Lagrange multipliers act as an interface between the layers and the nodes, enabling the distributed entities to find the global optimal solution only by solving their own subproblems. Therefore, the dual decomposition method provides a systematic way to design a distributed algorithm which finds the global optimal solution.
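To make the decomposition structure concrete, a generic form of the optimization problem described here can be written with the symbols tabulated in the companion papers (utility uf, rates Pf,r over routes Gf, link rates dl). This is a textbook network-utility-maximization formulation offered for illustration, not the authors' exact program:

\max_{P \geq 0} \; \sum_{f \in F} u_f\Big(\sum_{r \in G_f} P_{f,r}\Big)
\quad \text{subject to} \quad
\frac{1}{d_l} \sum_{r \in H_l} \sum_{f \in Q_r} P_{f,r} \;\leq\; a_l^{\max}, \qquad l \in L.

Relaxing each airtime constraint with a multiplier \lambda_l \geq 0 produces a Lagrangian that separates into per-flow rate and route subproblems plus per-link (or per-cluster) multiplier updates, which is exactly the distributed structure exploited above.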
(QRCCLB) calculates the number of congested aggregated data traffic of mesh clients to and
nodes on each route and chooses the route from the IP core network. Typically, a mesh
with the smallest number of congested router has multiple wireless Interfaces to
nodes.Compared to these load-aware routing communicate with other mesh routers, and
protocols, the each wireless interface corresponds to one
proposed routing scheme has three major wireless channel.
advantages. First,the proposed scheme is These wireless channels have
design to maximize the system capacity by different characteristics, because wireless
considering all necessary elements for load interfaces are running on different frequencies
balancing, e.g., the interference between and built on either the same or different
flows, the link capacity, and the user demand, wireless access technologies, e.g., IEEE
etc. On the other hand, the existing protocols 802.11a/b/g/n. It is also possible that
fail to reflect these elements since they use directional antennas are employed on some
heuristically designed routing metrics. For interfaces to establish wireless channels over
example, the DLAR, the ART, and the WCETT- long distances.
LB do not take account of the interference
between flows. Also, the link capacity is not 3 SYSTEM MODEL
considered by the DLAR, the LBAR, the ART, 3.1 Mesh Network Structure
and the QRCCLB. Second, the proposed Each wireless router in a WMN
scheme can guarantee fairness between users. is fixed at a location. Thus, the WMN topology
When the network load is high, it is of does not change frequently and the channel
importance for users to fairly share scarce quality is quasi-static. In addition, each
radio resources.
However, the existing protocols
cannot fairly allocate resources, since they are
unable to distinguish which route is
monopolized by a small number of users.
Third, the proposed scheme can provide
routes stable over time. Since most of the
existing protocols adopt highly variable routing
metrics such as the queue length or the
collision probability, they are prone to suffer
from the route flapping problem.
We design the proposed routing
scheme by using the dual decomposition
method for the network utility maximization.
To use this method, one should formulate the
global optimization problem that is to
maximize the total system utility under the
constraints on the traffic flows and the radio
resources. After the constraints are relaxed by
the Lagrange multipliers, the whole problem
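To make the dual decomposition idea concrete, here is a minimal sketch on a toy utility-maximization instance: two flows share one unit-capacity link, each flow has a log utility, and a link price (Lagrange multiplier) is adjusted by a subgradient step until supply meets demand. The topology, step size, and utilities are illustrative assumptions, not the formulation used in this paper.

# Toy network utility maximization solved by dual decomposition:
#   maximize  sum_f log(x_f)   subject to  sum_f x_f <= capacity.
# Relaxing the constraint with price lam decouples the flows: each flow
# independently maximizes log(x_f) - lam * x_f, giving x_f = 1 / lam.
capacity = 1.0
flows = ["f1", "f2"]
lam = 1.0          # initial link price (Lagrange multiplier)
step = 0.1         # subgradient step size

for _ in range(200):
    # Each flow solves its own local subproblem given the current price.
    x = {f: 1.0 / lam for f in flows}
    # The link updates its price from the constraint violation (subgradient).
    lam = max(1e-6, lam + step * (sum(x.values()) - capacity))

print({f: round(v, 3) for f, v in x.items()})  # ~0.5 each: the fair optimum

Only the price is exchanged between the "link" and the "flows," which is exactly why the method lends itself to a distributed implementation.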
3 SYSTEM MODEL
3.1 Mesh Network Structure
Each wireless router in a WMN is fixed at a location. Thus, the WMN topology does not change frequently and the channel quality is quasi-static. In addition, each wireless router in general serves so many subscribers (i.e., users) that the characteristic of the aggregated traffic is stable over time. Therefore, we design the routing scheme under a system model whose topology and user configuration are stable. In Fig. 1, we illustrate an example of the WMN. In this figure, a node stands for a wireless router, which not only delivers data for its own users, but also relays data traffic for other wireless routers. Among the nodes, there are some gateway nodes connected to the wired backhaul network. Each user is associated with its serving node. In this paper, we do not deal with the interface between a user and its serving node, in order to focus on the mesh network itself. Through the serving node, a user can send (receive) data traffic to (from) another user in the WMN or to (from) outside networks via the gateway nodes. If node n can transmit data to node m directly (i.e., without relaying), there exists a link from node n to node m. In this paper, we define a link as unidirectional.

3.2 Physical and Medium Access Control Layer Model
The proposed scheme can be implemented on top of various physical (PHY) and medium access control (MAC) layer protocols that utilize a limited bandwidth and divide the time for multiple access, for example, the carrier sense multiple access/collision avoidance (CSMA/CA), the time division multiple access (TDMA), and the reservation ALOHA (R-ALOHA).

The effective transmission rate of a link is defined as the number of actually transmitted bits divided by the time spent for data transmission, calculated in consideration of retransmissions due to errors. That is, the effective transmission rate can be calculated as the PHY-layer transmission rate times the probability of successful transmission. The PHY-layer transmission rate can be fixed, or can be adaptively adjusted according to the channel quality by means of rate control schemes such as the receiver-based autorate (RBAR). In the WMN under consideration, the effective transmission rate of a link is assumed to be static for a long time, due to the fixed locations of nodes. We define d_l as the effective transmission rate of the link l.

If all flows convey data traffic through each route at their flow data rates, the sum of the data rates of traffic passing through link l is calculated as \sum_{r \in H_l} \sum_{f \in Q_r} \lambda_{f,r}, where \lambda_{f,r} denotes the data rate of flow f on route r, H_l is defined as the set of the indices of all routes passing through the link l, and Q_r is the set of the indices of all flows that use the route r. We define the "airtime ratio" of the link l, denoted by a_l, as the ratio of the time taken up by the transmission to the total time of link l. The airtime ratio of the link l can be calculated as the sum of the data rates on the link l divided by the effective transmission rate of the link l. That is,

a_l = (1 / d_l) \sum_{r \in H_l} \sum_{f \in Q_r} \lambda_{f,r}.    (1)

Now, we discuss the restriction on the radio resource allocation. For the protocols under consideration, time is the only radio resource, which is shared by links for data transmission. If two links are adjacent enough to interfere with each other, packets cannot be conveyed through the two links at the same time. To incorporate this restriction into the proposed scheme, we divide the WMN into multiple overlapping clusters. A cluster includes the links adjacent enough to interfere with each other. Therefore, any pair of links in the same cluster cannot deliver packets simultaneously. A cluster is generally indexed by c, and we let C be the set of the indices of all clusters in the WMN. We also define M_c as the set of all links in the cluster c. The proposed scheme estimates the traffic load in each cluster. The traffic load in a cluster is the sum of the traffic load on the links in the cluster. If the traffic load in a cluster is estimated to be too high, the proposed scheme can redirect the routes passing through the overloaded cluster for load balancing. The airtime ratio of a link represents the traffic load on the link. If the sum of the airtime ratios of the links in a cluster exceeds a certain bound, the cluster can be regarded as overloaded. Roughly, we assume that a fixed portion of the time can be used for data transmission, while the remainder is used for the purpose of control, e.g., control message exchange and random back-off. Let \eta denote the ratio of the time for data transmission to the whole time. Since only one link can convey data traffic at a time within a cluster, the sum of the airtime ratios of the links in a cluster cannot exceed \eta. Therefore, we have the following constraint:

\sum_{l \in M_c} a_l \le \eta,  for all c \in C.    (2)
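A small sketch of how constraint (2) can be checked: given per-flow rates, the routes using each link, the effective link rates, and a cluster decomposition, it computes each airtime ratio a_l from (1) and flags clusters whose total airtime exceeds eta. All numbers and the tiny topology are invented for illustration.

# Check the per-cluster airtime constraint (2) on a made-up example.
eta = 0.8                                   # fraction of time usable for data
d = {"l1": 54e6, "l2": 11e6}                # effective link rates d_l (bit/s)
routes_on_link = {"l1": ["r1"], "l2": ["r1", "r2"]}   # H_l
flows_on_route = {"r1": ["f1"], "r2": ["f2"]}         # Q_r
rate = {("f1", "r1"): 6e6, ("f2", "r2"): 4e6}         # lambda_{f,r} (bit/s)
clusters = {"c1": ["l1", "l2"]}                       # M_c

def airtime(link):
    # Equation (1): total traffic on the link divided by its effective rate.
    total = sum(rate[(f, r)] for r in routes_on_link[link]
                             for f in flows_on_route[r])
    return total / d[link]

for c, links in clusters.items():
    load = sum(airtime(l) for l in links)
    print(c, round(load, 3), "overloaded" if load > eta else "ok")

With these numbers, a_l1 is about 0.11 and a_l2 about 0.91, so cluster c1 exceeds eta and a load-balancing scheme would redirect routes away from it.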
The notion of a cluster corresponds to a clique in the conflict graph. In the conflict graph, vertices correspond to the links in the WMN. An edge is drawn between two vertices if the corresponding links interfere with each other. Thus, an edge stands for a conflict between two vertices. A clique in the conflict graph is a set of vertices that mutually conflict with each other. Unless the conflict graph is a "perfect graph," the clique constraints in (2) are not tight in the strict sense, even when all cliques (clusters) are taken into account. A centralized algorithm has been proposed that transforms the conflict graph into a perfect graph by adding unnecessary edges to the conflict graph. This algorithm can also be applied to our routing scheme. However, from a practical point of view, this algorithm is inefficient, since it requires centralized control and can overly reduce spatial reuse. Therefore, in this paper, we recommend using the clique constraints in (2) as they are.

Actually, these clique constraints are enough to serve our purpose, i.e., identifying overloaded regions in the WMN in order to redirect the routes. Also, there can be too many cliques in the conflict graph, and therefore, considering all of them can render the proposed scheme highly complex. From a practical point of view, the clusters do not need to cover all possible cliques; it is enough for the clusters to be formed in such a way that the traffic load in each region of the WMN is separately evaluated. In Fig. 1, we give an example organization of clusters. Note that we do not draw all clusters, to avoid overcrowding. In this figure, four clusters are presented, each of which is indicated by a dashed circle. Suppose that a cluster includes all incoming and outgoing links of the nodes in the dashed circle. In this example, the clusters 1 and 2 cover the areas around the gateway nodes 1 and 2, respectively. When the estimated traffic load around the gateway node 1 is too high, a user taking the route to the gateway node 1 may not achieve a high data rate, due to the constraint (2) for cluster 1. In this case, if the gateway node 2 is lightly loaded, it is desirable for the user to choose the route to the gateway node 2 for a higher data rate. Thus, it can be said that the traffic load is estimated and controlled in the unit of the cluster for global load balancing.

4. Load-Balanced Clustering
The main objective of our approach is to cluster the sensor network efficiently around a few high-energy gateway nodes. Clustering enables network scalability to a large number of sensors and extends the life of the network by allowing the sensors to conserve energy through communication with closer nodes and by balancing the load among the gateway nodes. Gateways associate a cost to communicate with each sensor in the network. Clusters are formed based on the cost of communication and the load on the gateways. Network setup is performed in two stages: 'Bootstrapping' and 'Clustering'.

In the bootstrapping phase, gateways discover the nodes that are located within their communication range. Gateways broadcast a message indicating the start of clustering. We assume that the receivers of the sensors are open throughout the clustering process. Each gateway starts the clustering at a different instance of time, in order to avoid collisions. In reply, the sensors also broadcast a message with their maximum transmission power, indicating their location and energy reserve in this message. Each node discovered in this phase is included in a range set per gateway.

In the clustering phase, gateways calculate the cost of communication with each node in the range set. This information is then exchanged between all the gateways. After receiving the data from all the other gateways, each gateway starts clustering nodes based on the communication cost and the current load on its cluster. When the clustering is over, all the sensors are informed about the ID of the cluster they belong to. Since the gateways share the common information during clustering, each sensor belongs to only one cluster. For inter-cluster communication, all the traffic is routed through the gateways.
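The two-phase setup just described can be sketched as follows: a bootstrapping step builds each gateway's range set, and a clustering step greedily assigns every discovered sensor to the gateway that minimizes a combined communication-cost-plus-load score. The coordinates, the use of distance as a cost proxy, and the load weight are illustrative assumptions, not the exact objective function of this paper.

import math

# Hypothetical coordinates; in the real system these come from the
# location/energy messages the sensors broadcast during bootstrapping.
gateways = {"g1": (0.0, 0.0), "g2": (10.0, 0.0)}
sensors = {"s1": (1.0, 1.0), "s2": (9.0, 1.0), "s3": (5.0, 0.5), "s4": (4.0, 2.0)}
RANGE = 8.0          # communication range used in the bootstrapping phase
LOAD_WEIGHT = 2.0    # assumed trade-off between cost and gateway load

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Bootstrapping: each gateway's range set holds the sensors it can hear.
range_set = {g: {s for s, p in sensors.items() if dist(gp, p) <= RANGE}
             for g, gp in gateways.items()}

# Clustering: assign each sensor to the reachable gateway minimizing
# communication cost plus a penalty for the load already on that cluster.
cluster = {g: [] for g in gateways}
for s, p in sensors.items():
    candidates = [g for g in gateways if s in range_set[g]]
    best = min(candidates,
               key=lambda g: dist(gateways[g], p) + LOAD_WEIGHT * len(cluster[g]))
    cluster[best].append(s)

print(cluster)   # a load-aware assignment rather than pure nearest-gateway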
4.1 Optimization Heuristics
Before minimizing the objective function, we allocate the nodes in the ESet to their respective clusters and calculate the load. If we allocate the remaining nodes to the clusters only by minimizing the objective function, we experience large overlapping of clusters. Considering only the load on the gateways as a factor for clustering might do so at the expense of the sensors. Our experiments also show that some sensors then end up not being part of the gateways nearest to them, which increases the communication energy of the sensors. Exhaustive search methods like simulated annealing can be used to find the optimum results that balance the load as well as maintain the minimum distance to the gateway. But with these methods, the complexity of the algorithm increases with the increase in sensors and gateways. In order to balance the load of the gateways and preserve the precious energy of the sensors, we select a few nodes that are located radially near a gateway and include them in the ESet of that gateway. A node is included in the ESet of a gateway if its distance to the gateway is less than a critical distance. Initially, the critical distance is equal to the minimum distance in the ESet. Then the critical distance is gradually increased till the median of the distances in the ESet is reached. This procedure is repeated for all the gateways in increasing order of cardinality, which balances the load while performing the selection. Experimental results show that this method significantly reduces the number of nodes to be considered for exhaustive search and reduces the overlapping between the clusters.

Now, we start clustering the remaining sensors in the system. Since sensors cannot reach all the gateways, minimizing the objective function over those gateways would unnecessarily increase the complexity of the algorithm. In order to save computation during clustering, we sort the sensors in increasing order of their reach. Nodes with the same reach are grouped together, to avoid the extra computation of calculating the objective function for the gateways they cannot reach. Nodes with lower reach are considered first, because they have fewer clusters to join. The objective function is calculated by assigning these nodes to the gateways they can reach. The node becomes part of the cluster for which it minimizes the objective function. The process is repeated till all the sensor nodes are clustered.
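A minimal sketch of the ESet selection heuristic described above, under the stated rule: the critical distance starts at the minimum gateway-sensor distance and is grown toward the median, admitting nodes that fall below it. The distance values are made-up inputs, and the stepped loop simply mirrors the "gradually increased" wording.

import statistics

def build_eset(distances, steps=5):
    # distances: sensor -> distance from one gateway (assumed precomputed).
    lo = min(distances.values())                 # initial critical distance
    hi = statistics.median(distances.values())   # stop once the median is reached
    eset = set()
    for i in range(steps + 1):
        critical = lo + (hi - lo) * i / steps    # gradually increased threshold
        eset |= {s for s, d in distances.items() if d < critical}
    return eset

# Hypothetical per-gateway distances; gateways would be processed in
# increasing order of range-set cardinality to balance the selection.
print(build_eset({"s1": 1.2, "s2": 3.5, "s3": 0.9, "s4": 2.8, "s5": 6.0}))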
5 Performance Results
In this section, we present some results obtained by our simulation. To evaluate the performance of our algorithm, we compare the results with shortest-distance clustering, where a gateway includes a sensor in its cluster if the distance between them is minimum. We measure three different properties of the system based on different metrics.

Standard deviation of load per cluster: Experiments are performed to measure the load on each cluster after clustering. The standard deviation of the load of the system gives a good evaluation of the distribution of load per cluster. We measure the deviation in load by varying the number of gateways from 2 to 10 in a fixed 100-node network. In order to demonstrate that the load is balanced for any setup, we ran the same experiments for 10 different normal distributions. The same experiments are performed with shortest-distance clustering and the results are compared with our approach. Variance in load signifies that the load is not uniformly distributed among the clusters. The results demonstrate that for all distributions our approach outperforms shortest-distance clustering.

To test our system for different sensor densities, we measured the standard deviation of load by using 5 gateways and increasing the number of sensors in the system from 100 to 500 with uniform increments. The graph shown in Fig. 3 clearly indicates that our approach increases the scalability of the system. The performance of our approach remains constant with the increase in density. The rising curve of shortest-distance clustering indicates that its variance in load increases with the increase in density. The demonstrated results are based on the normal distribution of sensors.

Average communication energy per cluster: We measure the total energy required to communicate between a gateway and all the sensors in its cluster. Communication energy is directly proportional to the distance between two nodes. If clusters are formed based on shortest distance, the average energy consumed will be minimal, but the load will not be balanced. Sensors clustered by the shortest-distance method will consume less communication energy in the beginning, but will consume more energy later due to the overhead of re-clustering. We try to minimize the average communication energy so as to perform as well as the shortest-distance algorithm in terms of communication energy. The experimental results, in Fig. 4, show that the performance of shortest-distance clustering decreases with the increase in the number of clusters.

6. Conclusions and future work
In this paper, we have introduced an approach to cluster unattended wireless sensors around a few high-energy gateway nodes and to balance the load among these clusters. The gateway node acts as a centralized manager to handle the sensors and serves as a hop to relay data from the sensors to a distant command node. If nodes are not uniformly distributed around the gateways, the clusters formed will be of varied load, which will affect the lifetime and energy consumption of the system. Simulation results demonstrate that our algorithm consistently balances the load among different clusters and performs well for all distributions of nodes. Our future plan includes extending the clustering model to allow gateway mobility. Also, we plan to study different failure scenarios in sensor networks and introduce run-time fault tolerance in the system.

REFERENCES
[1] R. Bruno, M. Conti, and E. Gregori, "Mesh Networks: Commodity Multihop Ad Hoc Networks," IEEE Comm. Magazine, vol. 43, no. 3, pp. 123-131, Mar. 2005.
[2] D. De Couto, D. Aguayo, J. Bicket, and R. Morris, "A High-Throughput Path Metric for Multi-Hop Wireless Routing," Proc. ACM MobiCom, Sept. 2003.
[3] D. Passos, D.V. Teixeira, D.C. Muchaluat-Saade, L.C.S. Magalhaes, and C.V.N. Albuquerque, "Mesh Network Performance Measurements," Proc. Int'l Information and Telecomm. Technologies Symp., Dec. 2006.
[4] B. Awerbuch, D. Holmer, and H. Rubens, "The Medium Time Metric: High Throughput Route Selection in Multi-Rate Ad Hoc Wireless Networks," Mobile Networks and Applications, vol. 11, no. 2, pp. 253-266, Apr. 2006.
[5] R. Draves, J. Padhye, and B. Zill, "Routing in Multi-Radio, Multi-Hop Wireless Mesh Networks," Proc. ACM MobiCom, Sept. 2004.
[6] Y. Yang, J. Wang, and R. Kravets, "Designing Routing Metrics for Mesh Networks," Proc. IEEE Workshop Wireless Mesh Networks, Sept. 2005.
ANALYSIS ON THE PERFORMANCE OF VARIOUS DATA MINING ALGORITHMS FOR CARDIOVASCULAR RISK FACTORS

Binu John, II ME, Department of Computer Science, Rajalakshmi Engineering College, Rajalakshmi Nagar, Thandalam, Chennai-602105. Ph: 9043692663
binujohn86@gmail.com

Abstract- Data mining is the core step which results in the discovery of hidden but useful knowledge from massive databases. It is used in various fields of medicine like diabetics, cardiology, oncology, etc., for mining valuable information. In this paper, data mining in the cardiovascular disease field is considered. This work deals with mining the major risk factors of cardiovascular disease using various data mining algorithms. Twenty-nine predictor variables and LDL (low density lipoprotein) as the target class are considered. About 10000 records collected from various hospitals are used for mining. Various data mining classification algorithms like ID3, J48, Simple CART, and Naïve Bayes are explored. Analyses of the performance of the above algorithms are made to suggest the most suitable algorithm for the medical environment.
Keywords: Data Mining; Cardiovascular disease; Simple CART; ID3; J48; Naïve Bayes

1. Introduction
By the beginning of the 21st century, the death rate due to cardiovascular disease had hit 25% of the total mortality rate. This rate is increasing on a day-to-day basis. This significant growth in the death rate has shown the importance of the research that should be made in the field of cardiovascular disease. Research institutions have conducted various investigations to mine the various factors related to this disease. According to their research, various factors like lipids, blood factors, obesity factors, apolipoproteins, the inflammation factor, sugar factors and general factors result in the occurrence of cardiovascular disease. The data collected from various medical institutions is the source for this work.

In this work, various supervised learning methods were analyzed to mine the major risk factors for cardiovascular disease. Classification using ID3, J48, Simple CART, Random Tree and Naïve Bayes is used. Considering 29 predictor variables is one of the strong points of this work. In Section 2, we describe the dataset used. Section 3 is a brief introduction to the various classification methods used. Section 4 is about the accuracy measures used, and Section 5 gives the results of the data mining.

2. Dataset and variables used
Data collection, cleaning and coding
Data of 10000 CVD patients were collected from various medical institutions during the year 2010 under the supervision of medical practitioners. The data collected is as given in Table 2: 1) general factors like age, sex, job type, sleep time, smoking, physical exercise, using meat and salt, using milk and egg, alcohol consumption; 2) lipids like cholesterol in mg/dL, triglycerides in mg/dL, high density lipoproteins in mg/dL; 3) blood factors like WBC, RBC, HB, hematocrit, platelets, mean corpuscular volume, mean corpuscular hemoglobin in cells/cubic ml; 4) obesity factors like body mass index in kg/m2, waist circumference, waist-to-hip ratio; 5) apolipoproteins like APOA, APOB, APOB/APOA in mg/dL; 6) the inflammation factor CRP; 7) sugar factors like fasting blood sugar in mg/dL and homeostasis model assessment; and 8) resting blood pressure in mmHg.

To clean the data, fields were identified, duplications were extracted, records with missing values were removed, and the data were coded according to Table 2. After data cleaning, the number of records was reduced as given in Table 1, mainly due to the unavailability of certain attribute values.
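The cleaning and coding step can be sketched as below with pandas: drop duplicate records, drop rows with missing values, and map a numeric LDL reading onto the five coded classes used in Tables 1 and 2. The file name and column names are hypothetical placeholders, not the paper's actual data layout.

import pandas as pd

def code_ldl(value):
    # LDL coding from Table 2: Optimal <100, Acceptable 100-129,
    # Borderline 130-159, High 160-189, Very High above that (mg/dL).
    if value < 100:  return "Optimal"
    if value < 130:  return "Acceptable"
    if value < 160:  return "Borderline"
    if value < 190:  return "High"
    return "Very High"

# "cvd_records.csv" and the column name "LDL" are assumed for illustration.
df = pd.read_csv("cvd_records.csv")
df = df.drop_duplicates().dropna()           # remove duplicates and missing values
df["LDL_class"] = df["LDL"].apply(code_ldl)  # code the target variable
print(df["LDL_class"].value_counts())        # compare with the Table 1 counts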
Table 1: No. of instances before and after cleaning

Before cleaning
LDL          No. of instances
Optimal      2079
Acceptable   2840
Borderline   3240
High         1502
Very High    339
Total        10000

After cleaning
LDL          No. of instances
Optimal      2076
Acceptable   2838
Borderline   3236
High         1496
Very High    338
Total        9984

3. Classification Methods Evaluated
One of the most important and most applicable tools is the classification method, which is a method with two stages [1][4]. In the first stage, a classifier is built according to a group of data with specified labels. This stage is called learning. In the second stage, the built classifier is applied to a group of data, called test data, to test the accuracy of the classifier. If the accuracy is acceptable, the classifier can be applied to a new dataset to determine the classes of its records. The accuracy of a classifier is the percentage of data records that are correctly classified by it. There are different options for selecting a classifier. One of the most important and simplest of them is the decision tree [1][4].

A. Simple CART
CART is used to create a binary decision tree. Binary means that each node has two child nodes. CART can be used for both nominal and continuous values. There are three steps in CART:
1. Splitting each node in a tree.
2. Deciding when a tree is complete.
3. Assigning each node to a terminal outcome.
To create the maximum tree, we have to find the best splitting node. The splitting node is found using the Gini index method. The Gini index is an impurity-based criterion that measures the divergence between the probability distributions of the target attribute values [3]. Once a best split is found, CART repeats the search process for each child node, continuing recursively until further splitting is impossible or stopped. Splitting is impossible if only one case remains in a particular node or if all the cases in that node are exact copies of each other. CART employs the computer-intensive technique of cross-validation.

B. ID3
ID3 (Iterative Dichotomiser 3), a machine learning algorithm based on decision tree induction, is considered a well-known tool for learning from examples [5]. The main concepts of this algorithm are:
1. Each node belongs to a non-class attribute and each arc to a possible value of that attribute.
2. The amount of information in a node is given by the entropy in eq. 1:

H = -\sum_i p_i \log_2 p_i    (eq. 1)

where p_i is the proportion of instances in the node belonging to class i.
3. To each node must be associated the non-class attribute offering the most information gain among all the attributes not yet considered on the path from the root to the current node.
A drawback of ID3 is that it is applicable only to small sets of data [2]. Efficiency becomes an issue when it is applied to a large database.
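A small sketch of the entropy measure in eq. 1 and the information gain that ID3 maximizes when choosing a splitting attribute; the tiny dataset is invented for illustration.

import math
from collections import Counter

def entropy(labels):
    # Eq. 1: H = -sum_i p_i * log2(p_i) over the class proportions.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    # Gain = H(parent) - weighted sum of child entropies after splitting on attr.
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children

# Invented toy records: does smoking split the LDL class better than exercise?
data = [{"smoking": "yes", "exercise": "no",  "ldl": "High"},
        {"smoking": "yes", "exercise": "yes", "ldl": "High"},
        {"smoking": "no",  "exercise": "yes", "ldl": "Optimal"},
        {"smoking": "no",  "exercise": "no",  "ldl": "Optimal"}]
print(information_gain(data, "smoking", "ldl"))   # 1.0 (perfect split)
print(information_gain(data, "exercise", "ldl"))  # 0.0 (no information)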
C. J48
J48 is a WEKA implementation of the C4.5 algorithm [2]. It is an extension of the ID3 algorithm and is based on information gain theory. The J48 algorithm implements an automatic procedure capable of selecting relevant features from the data. It is an iterative algorithm which splits the instances at the point where the information gain is the greatest [6][7]. J48 produces an output in the form of a decision tree. It can cut the poor or non-meaningful branches in an efficient pruning process. It can also handle continuous attributes and incomplete values.

D. Bayesian Classification
These are statistical classifiers based on Bayes' theorem. According to this theorem, the predicted class c_i for a tested instance, defined by a set of attribute values V = v_1 ∧ v_2 ∧ ... ∧ v_n, is the one having the highest conditional probability, computed using eq. 2 [8][9]:

P(c_i | V) = P(V | c_i) P(c_i) / P(V)    (eq. 2)

More importance is given to the numerator, as the denominator does not depend on the class and is effectively constant. It is the normalizing factor, which is equal for all classes. The Bayesian classifier has high accuracy when applied to large databases. The simple Bayesian classifier is called Naïve Bayes; it assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This is called class independence.

Table 2: Coding of risk factors

Risk Factors                   Code1            Code2                Code3                Code4          Code5
General Factors
Age
Sex                            M: Male          F: Female
Job type
Sleep time
Smoking                        Yes              No
Physical Exercise              Yes              No
Using meat and salt            Yes              No
Using milk and egg             Yes              No
Alcohol consumption            Yes              No
Lipids (mg/dL)
CH                             1: <200          2: 200-239           3: >240
TG                             1: <150          2: 150-199           3: 200-499           4: >500
HDL                            1: <40           2: 40-59             3: >60
LDL                            Optimal: <100    Acceptable: 100-129  Borderline: 130-159  High: 160-189  Very High: >190
Blood Factors
WBC                            Yes: >11000      No: <11000
RBC                            Yes: >4-5        No: <4-5
HB                             Yes: >11-13      No: <11-13
HCT                            Yes: >30-35      No: <30-35
PLT                            Yes: >1-2        No: <1-2
MCV                            Yes: >30-35      No: <30-35
MCH                            Yes: >28-36      No: <28-36
Obesity factors
BMI                            Yes: >18.5-24.5  No: <18.5-24.5
WC                             Yes: >86         No: <86
WHR (Woman)                    Yes: >0.7        No: <0.7
WHR (Man)                      Yes: >0.9        No: <0.9
Apolipoprotein (mg/dL)
APOA                           Yes: >2-200      No: <2-200
APOB                           Yes: >40-125     No: <40-125
APOB/APOA                      Yes: >0.9        No: <0.9
Inflammation factor (mg/dL)
CRP                            Yes: >10         No: <10
Sugar factor (mg/dL)
FBS                            Yes: >110        No: <110
HOMA                           Yes: >2.5        No: <2.5
Resting blood pressure (mmHg)  Yes: >110        No: <110

Table 3: Classification errors
Simple CART
Class TP Rate FP Rate Precision Recall F-Measure Accuracy(%)
Optimal .289 0.05 0.6 0.289 0.39
Acceptable 0.835 0.201 0.622 0.835 0.713
Borderline 1 0.288 0.625 1 0.769 62.13
High 0 0 0 0 0
Very High 0 0 0 0 0
ID3
Class TP Rate FP Rate Precision Recall F-Measure Accuracy(%)
Optimal 0.96 0.033 0.883 0.96 0.92
Acceptable 0.908 0.012 0.967 0.908 0.937
Borderline 0.984 0.051 0.903 0.984 0.942 92.4
High 0.811 0.006 0.959 0.811 0.879
Very High 0.772 0.001 0.974 0.772 0.861
J48
Class TP Rate FP Rate Precision Recall F-Measure Accuracy(%)
Optimal 0.634 0.055 0.753 0.634 0.689
Acceptable 0.825 0.101 0.764 0.825 0.793
Borderline 0.945 0.185 0.71 0.945 0.811 73.4
High 0.353 0.022 0.738 0.353 0.478
Very High 0.275 0.006 0.624 0.275 0.382
Naïve Bayes
Class TP Rate FP Rate Precision Recall F-Measure Accuracy(%)
Optimal 0.301 0.071 0.526 0.301 0.383
Acceptable 0.777 0.197 0.61 0.777 0.683
Borderline 0.999 0.287 0.625 0.999 0.769 60.74
High 0.001 0.001 0.167 0.001 0.001
Very High 0 0 0 0 0
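Table 3's columns (TP rate, FP rate, precision, recall, F-measure) are all derived from a confusion matrix, as Section 4 below explains; here is a sketch of those formulas on an invented 3-class matrix.

# Per-class metrics from a confusion matrix (rows = true class,
# columns = predicted class). The 3x3 matrix below is invented.
M = [[50, 10, 0],
     [ 5, 80, 5],
     [ 0, 20, 30]]

def metrics(M, i):
    total = sum(sum(row) for row in M)
    tp = M[i][i]
    fn = sum(M[i]) - tp                      # row sum minus diagonal
    fp = sum(row[i] for row in M) - tp       # column sum minus diagonal
    tn = total - tp - fn - fp
    tp_rate = tp / (tp + fn)                 # recall: diagonal / row sum
    fp_rate = fp / (fp + tn)                 # others misclassified as class i
    precision = tp / (tp + fp)               # diagonal / column sum
    f1 = 2 * precision * tp_rate / (precision + tp_rate)
    return tp_rate, fp_rate, precision, f1

for i in range(3):
    print(["class %d" % i] + [round(x, 3) for x in metrics(M, i)])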
4. Accuracy Measures
The performance of the different data mining algorithms is analyzed with the help of accuracy measures, which include the true positive rate, false positive rate, precision, recall and F-measure. The confusion matrix is an efficient way to represent these values in a matrix format.
a) True Positive rate: The True Positive (TP) rate is the proportion of examples which were classified as class x among all examples which truly have class x, i.e., how much of the class was captured. It is equivalent to recall. In the confusion matrix, this is the diagonal element divided by the sum over the relevant row.
b) False Positive rate: The False Positive (FP) rate is the proportion of examples which were classified as class x but belong to a different class, among all examples which are not of class x. In the matrix, this is the column sum of class x minus the diagonal element, divided by the row sums of all the other classes.
c) Precision: The precision is the proportion of the examples which truly have class x among all those which were classified as class x. In the matrix, this is the diagonal element divided by the sum over the relevant column.
d) Recall: Recall is the same as the true positive rate.
e) F-Measure: The F-measure is simply 2*Precision*Recall/(Precision+Recall), a combined measure for precision and recall.

5. Experimental Results
The WEKA tool, which is open-source Java software, is used to provide the classification techniques. The default settings of WEKA are used for the other options. The 29 predictor variables and LDL as the target variable are used in the tool. The major factors as per Simple CART are CH, APOA, CRP, PLT, HCT and FBS. According to ID3, APOA, CRP, resting blood pressure, HDL and MCH are the major risk factors. J48 gives the major risk factors as APOA, CRP, HCT, HDL, PLT and HOMA. The classification errors are shown in Table 3. ID3, with 92.4% accuracy, performs better than Simple CART, J48 and Naïve Bayes, with 62%, 73% and 60%, respectively.

6. Conclusion and future work
In this paper, we compared the performance of different classification algorithms like Simple CART, ID3, J48 and Naïve Bayes for mining the major risk factors of cardiovascular disease. By choosing 29 predictor variables and LDL as the target class, it was shown that APOA, CRP, resting blood pressure, HDL and MCH are the major risk factors according to ID3, which has an accuracy rate of 92.4%. In future work, we can expand and enhance this work with clustering, association analysis and other classification algorithms.

7. References
[1] Alierza Kajabadi, Mohamad Hosein Sarace, Sedighe Asgari, "Data Mining Cardiovascular Risk Factors," IEEE, 2009.
[2] Dan-Anderi, Adela Viviana, "Overview on How Data Mining Tools May Support Cardiovascular Disease Prediction," Journal of Applied Computer Science & Mathematics, pp. 57-62, 2010.
[3] Minas A. Karaolis, Joseph A. Moutiris, Demetra Hadjipanayi, Constantinos S. Pattichis, "Assessment of the Risk Factors of Coronary Heart Events Based on Data Mining With Decision Trees," IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 3, May 2010.
[4] Jiawei Han, Micheline Kamber, "Data Mining: Concepts and Techniques," Second edition, Elsevier, 2006.
[5] L. Gaga, V. Moustakis, Y. Vlachakis, G. Charissis, "ID3: Enhancing Medical Knowledge Acquisition with Machine Learning," Applied Artificial Intelligence, vol. 10, pp. 79-94, Taylor & Francis, 1996.
[6] I.H. Witten and E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques," 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
[7] K. Mollazade, H. Ahmadi, M. Omid, R. Alimardani, "An Intelligent Combined Method Based on Power Spectral Density, Decision Trees and Fuzzy Logic for Hydraulic Pumps Fault Diagnosis," International Journal of Intelligent Technology, vol. 3, issue 4, pp. 251-263, 2008.
[8] D.W. Aha, D. Kibler and M.K. Albert, "Instance-Based Learning Algorithms," Machine Learning, 6(1):37-66, 1991.
[9] Z. Zheng, G.I. Webb, "Lazy Learning of Bayesian Rules," Machine Learning, 41, 53-87, Kluwer Academic Publishers, 2000.
QOS-AWARE CHECKPOINTING ARRANGEMENT IN MOBILE GRID ENVIRONMENT

*J. Sangeetha, **M. Nithya
*M.E., Computer Science and Engineering
S.A. Engineering College, Chennai-77
jsangeetha22@gmail.com
**Senior Lecturer
Department of Computer Science and Engineering
S.A. Engineering College, Chennai-77

Abstract— Mobile grids (MoGs) are receiving growing attention and are expected to become a critical part of a future computational Grid, involving mobile hosts (MHs) to facilitate user access to the grid and also to offer computing resources. Inconvenient problems occur due to mobility, less reliable wireless links, frequent disconnections and variations in mobile systems. Introducing checkpointing arrangements in mobile grid systems can avoid such problems. Checkpointing saves the intermediate data and machine states periodically to reliable storage during the course of job execution. It avoids having to start job execution all over again from the very beginning in the presence of every failure; execution resumes from the state the failed host had checkpointed. This paper deals with distributed, QoS-aware middleware for the checkpointing arrangement in MoG computing systems. The ReD middleware employs decentralized QoS heuristics to construct superior checkpointing arrangements efficiently. This method is proposed to improve the performance of MH services and to maximize the probability of recovering checkpointed data during job execution.

Keywords— Mobile Grid, Checkpointing, ReD Middleware, QoS Heuristics

I. INTRODUCTION
Most existing Grids refer to clusters of computing and storage resources which are wire-interconnected to offer utility services collaboratively. One of the most critical things for understanding and realizing mobile Grid computing is to have a consistent and accurate definition, or at least a determination, of what a mobile Grid is. However, the various approaches that have been made address the term Grid with a high degree of accuracy. The Grid can be viewed as a distributed, high-performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources (computing systems, storage systems, instruments and other real-time data sources, human collaborators, communication systems) and provides common interfaces for all these resources, using standard, open, general-purpose protocols and interfaces [7]. It is also the basis and the enabling technology for pervasive and utility computing, due to its being open, highly heterogeneous and scalable. Mobile computing is a generic term describing the application of small, portable computing and communication devices. The Grid is already being successfully used in many scientific applications where huge amounts of data have to be processed and/or stored. Such demanding applications have created, justified and diffused the concept of the Grid among the scientific community. As the number of potential Grid users is really enormous, the accumulated data processing and storage requirements are at least comparable. In particular, mobile users might be the future users of this new technology. Moreover, we have nomadic users who travel and work only seldom at their offices.

The mobile Grid enables both the mobility of the users requesting access to a fixed Grid and the mobility of the resources that are themselves part of the Grid. Both cases have their own limitations and constraints that should be handled.
In the first case, the devices of the mobile users act as interfaces to the Grid, enabling job submission, monitoring and management of the activities in an 'anytime, anywhere' mode, while the Grid provides them with high reliability, performance and cost-efficiency. In those cases, mobile Grid has the meaning of 'gridifying' the mobile resources. In the second case of having mobile Grid resources, we should underline that the performance of current mobile devices has significantly increased.

A. CHECKPOINTING
A checkpoint facility enables the intermediate state of a process to be saved to a file. Users can later resume execution of the process from the checkpoint file. This prevents the loss of data generated by long-running processes due to program or system failures, and it also facilitates debugging when a bug appears after the program has executed for a long time.

Fig 1: A typical Checkpoint/Restart on an application

II. RELATED WORKS
A lot of studies have been done on Mobile Grids (MoGs), which are receiving growing attention and are expected to become a critical part of the future computational Grid, involving mobile hosts to facilitate user access to the Grid and also to offer computing resources [4]. A MoG can involve a number of mobile hosts (MHs) having wireless interconnections among one another, or to access points [2][3]. Indeed, a recent push by HP to equip business notebooks with integrated global broadband wireless connectivity has made it possible to form a truly mobile Grid (MoG) that consists of MHs providing computing utility services collaboratively, with or without connections to a wired Grid [5][6][7]. Various checkpointing mechanisms have been pursued for distributed systems (whose computing hosts are wire-connected). However, they are not suitable for the MoG, because their checkpointing arrangements 1) are relatively immaterial, as checkpointed data from hosts can be stored at a designated server or servers, since connections to a server are deemed reliable, of high bandwidth, and of low latency, and 2) fail to deal with link disconnections and degrees of system topological dynamicity [6][9]. At any time during job execution, a host or link failure may lead to severe performance degradation or even total job abortion, unless execution checkpointing is incorporated [8].

The mobile Grid will introduce changes to the general Grid concept. New functionalities of the Grid will be needed, since the old ones will not make use of all the capabilities that will be available [8]. These functionalities will involve end-to-end solutions with emphasis on Quality of Service (QoS) and security, as well as interoperability issues between the diverse technologies involved. Enhanced security policies and approaches addressing large-scale and heterogeneous environments will be needed [1][2]. Additionally, the volatile, mobile and poorly networked environments have to be addressed with adaptable QoS aspects, which have to be contextualized with respect to users and their profiles. Mobile Grids will make use of the value that many mobile users perceive due to the advanced capabilities of their mobile devices [12]. These advanced capabilities refer to the comparison of today's mobile devices with the ones that existed in the past. Although mobile devices are subject to physical constraints due to their nature, they have the ability to provide computational and storage capabilities similar to PCs, high-quality displays, multiple interfaces (for instance Bluetooth, Ethernet adapters, USB, Infrared), etc. This value can be converted into revenue for service providers. This implies changes in various business models and policy issues. Complex workflows for businesses will be needed, and Virtual Organizations (VOs) will be enriched with the opportunity for automatic federations and resource sharing schemes [15]. Valuations of the diverse services, especially with respect to the service/sharing level agreement, have to be adapted in relation to QoS aspects. Enterprises have to issue policies that will handle the conflict between public rights for their users and viable models for operation. Moreover, the fair use of the Grid must be determined by reconciling rights of public access to resources and private ownership of infrastructure and resources [13]. In the sequel, we present a short discussion on some of the most important challenges of mobile Grids with respect to the resource management topic. Of course, many things will change and raise their own challenges to be addressed in this new context.
III. DISTRIBUTED CHECKPOINTING IN MOG
This work focuses on the MH checkpointing arrangement mechanism, seeking superior checkpointing arrangements to maximize the probability of distributed application completion without sustaining an unrecoverable failure. It deals with checkpointing among neighboring MHs without any access point or BS. Our main focus in the checkpointing arrangement lies in MH connectivity, with host failure being the failure of all of the host's connecting wireless links. Our study focuses on the resource management topic, which is a very critical subject for the efficient utilization of the Grid infrastructure's functional entities. The set of all such directed relationships, on a MoG instance, is called a checkpointing arrangement.

A. PRODUCER AND CONSUMER ALGORITHM
Let MHk → MHl define a checkpointing relationship between MHk and MHl, functioning as the consumer and provider of checkpointing services, respectively. We define stability, in the context of a checkpointing arrangement, as the situation where no consumer or provider prefers another provider or consumer, respectively, to its current partner in the relationship, and all consumers and providers have found providers and consumers, respectively. Such an arrangement is called stable. Distinct arrangements result in differing MoG system reliability values and potential instabilities due to positional and wireless signal strength variations among MHs. Less reliable wireless links are more prone to frequent and intermittent disconnections. With MHs, the relative locations, velocities, intervening obstacles, multipathing, interference, and other effects all partially determine link strength and reliability.

Fig 2: Protocol Function
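The stability notion above resembles stability in matching problems. Here is a brief sketch, under assumed pairing scores, of checking whether a consumer-provider arrangement is stable, i.e., no consumer and provider would both rather pair with each other than with their current partners.

# Arrangement stability check. score[(c, p)] is an assumed pairing
# reliability score; higher means the pairing of c with p is preferred.
score = {("c1", "p1"): 0.9, ("c1", "p2"): 0.6,
         ("c2", "p1"): 0.8, ("c2", "p2"): 0.7}
arrangement = {"c1": "p1", "c2": "p2"}       # consumer -> current provider

def is_stable(arrangement, score):
    provider_of = {p: c for c, p in arrangement.items()}
    for (c, p), s in score.items():
        if arrangement[c] == p:
            continue
        # A blocking pair exists if c prefers p to its current provider and
        # p prefers c to the consumer it currently serves.
        if s > score[(c, arrangement[c])] and s > score[(provider_of[p], p)]:
            return False
    return True

print(is_stable(arrangement, score))   # True for the values assumed above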
Our checkpoint arrangement protocol observes that, as long as all hosts running the distributed application have high connectivity (i.e., a low likelihood of separation from the MoG), robust safe storage or reliable transmission to safe storage is not really needed. Only poorly connected hosts (e.g., hosts on the fringes of the MoG) need robust safe storage and reliable transmission to that safe storage. Furthermore, since host failures happen most frequently to poorly connected hosts, it makes sense to reserve the most robust checkpoint storage (providers) for these hosts, while leaving the least robust storage for the best connected hosts. This puts valuable and scarce robust checkpointing resources to work where they are most needed, reducing unrecoverable failures effectively.

B. ReD METHODOLOGY
An executing host is considered to be in "failure" if the wireless connections to all of its neighbors are disrupted temporarily or permanently, resulting in its isolation and inability to achieve timely delivery of intermediate or final application results to other hosts. Executing MHs with poor connectivity have a greater likelihood of experiencing failure than do those with greater connectivity, and are thus in greater need of checkpointing to the best, most reliably connected provider.

ReD seeks to determine the best possible checkpointing arrangement to maximize the probability of application (job) recovery without experiencing an unrecoverable failure (maximizing Ri). ReD's algorithm takes the desired behavioral controlling heuristics into account in the following ways. First, we require the MoG to be capable of autonomous operation without an access point or BS, and further to reduce the use of relatively unreliable wireless links. ReD ensures this by storing checkpointed data only at neighboring MHs within the MoG, not requiring BS access or checkpoint transmission over multiple hops. Second, in a MoG, dynamicity demands that a checkpointing arrangement be converged upon rapidly and efficiently, even though it may only be close to optimal. While it is true that poor checkpointing arrangements play a role in reducing the Ri we seek to maximize, so too do unconverged arrangements (i.e., arrangements where a significant percentage of consumers are still seeking to establish checkpointing relationships with providers). To ensure convergence within a reasonable time, ReD employs four strategies:
1. ReD is supported by a clustering algorithm, which partitions the global MoG into clusters, allowing ReD to quickly and concurrently find superior arrangements within each cluster instead of having to labor toward a global MoG solution.
2. ReD makes decisions about whether to request, accept, or break checkpointing relationships locally (at the MH level) and in a fully distributed manner, instead of attempting a high-level centralized or global consensus.
3. ReD keeps checkpoint transmissions local, i.e., neighbor to neighbor, not requiring multiple hops and significant additional transmission overhead to achieve checkpointing relationships.
4. ReD allows a given consumer or provider to break its existing checkpointing relationship only when the arrangement reliability improvement is significant, thus promoting stability.

C. ReD METHODOLOGY DESCRIPTION
The central mechanism of our MoG middleware component is our Reliability Driven (ReD) protocol, which is aware of the reliabilities of links among MHs within the MoG, a significant indicator of the service quality (i.e., QoS) a distributed application will receive. Defining the term "connectivity" to mean the parallel reliability of all links from a given node to its neighbors, ReD makes use of these link reliability values to determine the best possible checkpointing arrangement dynamically. We seek to maximize the arrangement reliability RA, where Ci is the connectivity of consumer i, Pj is the connectivity of provider j, and Lij is the reliability of the wireless link from Ci to Pj. Because we determined the problem of finding the optimum checkpointing arrangement to be NP-Complete, ReD utilizes a heuristic algorithm. To ensure convergence within a reasonable time, the global MoG is partitioned into clusters. Because ReD operates within clusters and is localized, it often arrives at suboptimal checkpoint arrangements.

Upon initiation or refresh, if some consumer, MHk, does not have a designated provider, it begins to look for one. In doing so, it examines and compares the λka × ρa products of each of its n neighbors, MHa. Next, it transmits a checkpoint request, first to the host at the top of the list (having the greatest λka × ρa product), e.g., MHl. In essence, a checkpoint request asks the provider MHl's permission to send checkpointed data to it. If the prospective provider MHl has no consumer of record, it readily grants permission and sends a positive acknowledgment back to MHk, establishing a MHk → MHl relationship. On the other hand, if a relationship, say, MHj → MHl, already exists, MHl checks to see if the requesting consumer's pairing reliability gain is greater than that for its existing paired consumer. If so, it breaks its relationship with MHj by sending it a break message, and then grants permission to MHk by sending it an acknowledgment. If, on the other hand, the statement proves false, MHk is sent a negative acknowledgment, and MHl maintains the relationship MHj → MHl, unless it is otherwise severed due to mobility or a weak signal. ReD's protocol messages, as well as the pseudocode used, are listed in the Appendix. ReD has been enhanced since our initial preliminary work to include pairing gain considerations (the pairing gain of a consumer MHk on some provider MHa) when comparing prospective relationships, as opposed to just strictly comparing alternative relationship reliabilities directly. It attempts to globally maximize Ri through local, decentralized MHk → MHa pairing decisions. To allow for mobility, tables of connectivity and link reliabilities are updated and aged via the soft-state process. A consumer periodically refreshes and checks its sorted λka × ρa product list, and determines if it might do better to find another provider, whereupon it takes action. A provider, upon loss of a relationship with a consumer for any reason, deletes its consumer pointer and admits requests from other consumers. Finally, upon receiving a break message, a paired consumer initiates the process of finding a checkpoint provider all over again. Note that ReD is designed to be IID (meaning it runs on each host independently and identically). Because messages can be lost in transmission, especially over poor wireless links, the tables of host connectivity and link reliabilities, and the consumer and provider pointers, are maintained at hosts by the soft-state registration process; e.g., due to mobility or a weak signal, a host may declare that it has lost its provider (or consumer), attempting to find a new one.

IV. IMPLEMENTATION
To evaluate our approach, we design an optimal global arrangement to improve the performance of wired grid computing systems, which we develop in C# to implement our techniques with a network-based approach. As Fig. 3 shows, this yields superior-reliability checkpointing arrangements in the MoG.
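A condensed, single-process sketch of the request/acknowledge/break exchange described above: each unpaired consumer asks its neighbors in descending order of the λka × ρa product, and a provider switches consumers only when the new pairing score is higher. The scores are invented, and real ReD runs these decisions as message exchanges on every host rather than in one loop.

# score[(k, a)] stands in for the product lambda_ka * rho_a that consumer k
# computes for neighbor provider a (invented values for illustration).
score = {("k1", "a1"): 0.72, ("k1", "a2"): 0.55,
         ("k2", "a1"): 0.81, ("k2", "a2"): 0.40}
providers = ["a1", "a2"]
consumer_of = {}                       # provider -> consumer of record
provider_of = {}                       # consumer -> granted provider
unpaired = ["k1", "k2"]

while unpaired:
    k = unpaired.pop(0)
    for a in sorted(providers, key=lambda a: -score[(k, a)]):
        holder = consumer_of.get(a)
        if holder is None:                      # positive acknowledgment
            consumer_of[a], provider_of[k] = k, a
            break
        if score[(k, a)] > score[(holder, a)]:  # provider breaks old pairing
            provider_of.pop(holder, None)
            unpaired.append(holder)             # old consumer searches again
            consumer_of[a], provider_of[k] = k, a
            break
        # else: negative acknowledgment; try the next-best neighbor

print(provider_of)   # {'k2': 'a1', 'k1': 'a2'} with the scores above

The displaced consumer re-enters the search, mirroring how a broken ReD relationship triggers a fresh provider hunt; since a provider only ever trades up, the process terminates.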
Fig 3: System Architecture for Checkpointing Arrangement

V. CONCLUSION
Nodal mobility in a large MoG may occasionally render a MH participating in one job execution unreachable from the remaining MHs, calling for efficient checkpointing in support of long job executions. As earlier proposed checkpointing approaches cannot be applied directly to MoGs and are not QoS-aware, we have dealt with QoS-aware checkpointing and recovery specifically for MoGs, with this paper focusing solely on the checkpointing arrangement. ReD achieves significant reliability gains by quickly and efficiently determining checkpointing arrangements for most MHs in a MoG.

Future work will involve the analysis of wireless mobile networks using checkpointing arrangements for the mobile grid environment, which includes further development of the requirements of wireless mobile devices and wireless mobile networks for mobile grids.

REFERENCES
[1] P.J. Darby III and N.-F. Tzeng, "Decentralized QoS-Aware Checkpointing Arrangement in Mobile Grid Computing," IEEE Transactions on Mobile Computing, vol. 9, no. 8, Aug. 2010.
[2] "A Survey of Checkpointing Algorithms for Distributed Mobile Systems," International Journal of Research and Reviews in Computer Science (IJRRCS), vol. 1, no. 2, June 2010.
[3] Chen and Shangping Ren, "Architecture Support for Behavior-Based Adaptive Checkpointing," Journal of Software, vol. 3, no. 2, Feb. 2008.
[4] "Integration of Mobile Computing with Grid Computing: A Middleware Architecture," 2nd National Conference on Challenges & Opportunities in Information Technology (COIT-2008), RIMT-IET, Mandi Gobindgarh, Mar. 29, 2008.
[5] "Blocking and Non-Blocking Checkpointing and Rollback Recovery for Networks-on-Chip," IEEE/IFIP DSN-2008 2nd Workshop on Dependable and Secure Nanocomputing, June 27, 2008.
[6] "The Performance of Checkpointing and Replication Schemes for Fault Tolerant Mobile Agent Systems," IEEE, 2002.
[7] "Grid Checkpointing Architecture - Integration of Low-Level Checkpointing Capabilities with Grid," CoreGRID, May 22, 2007.
[8] A. Agbaria and W. Sanders, "Application-Driven Coordination-Free Distributed Checkpointing," Proc. 25th IEEE Conf. Distributed Computing Systems, June 2005.
[9] Taesoon Park and Ilsoo Byun, "The Performance of Checkpointing and Replication Schemes for Fault Tolerant Mobile Agent Systems," Proc. 21st IEEE Symp. Reliable Distributed Systems, Oct. 2002.
[10] Chia-Ho Ou and Kuo-Feng Ssu, "Connecting Network Partitions with Location-Assisted Forwarding Nodes in Mobile Ad Hoc Environments," Proc. 10th IEEE Pacific Rim Int'l Symp. Dependable Computing, Mar. 2004.
[11] Raphael Y. de Camargo, Fabio Kon, and Alfredo Goldman, "Portable Checkpointing and Communication for BSP Applications on Dynamic Heterogeneous Grid Environments," Proc. 17th Int'l Symp. Computer Architecture and High Performance Computing (SBAC-PAD'05), 2005.
[12] "Checkpoint-Recovery Protocol for Reliable Mobile Systems," Proc. 17th IEEE Symp. Reliable Distributed Systems, pp. 93-99, Oct. 1998.
[13] P. Darby and N. Tzeng, "Peer-to-Peer Checkpointing Arrangement for Mobile Grid Computing Systems," Proc. 16th IEEE Int'l Symp. High Performance Distributed Computing (HPDC-16), June 2007.
[14] X. Ren, R. Eigenmann, and S. Bagchi, "Failure-Aware Checkpointing in Fine-Grained Cycle Sharing Systems," Proc. 16th IEEE Int'l Symp. High Performance Distributed Computing (HPDC-16), pp. 33-42, June 2007.
[15] W. Gao, M. Chen, and T. Nanya, "A Faster Checkpointing and Recovery Algorithm with a Hierarchical Storage Approach," Proc. Eighth Int'l Conf. High-Performance Computing in Asia-Pacific Region, pp. 398-402, Nov. 2005.
EXTENDED QUERY ORIENTED, CONCEPT-BASED USER PROFILES FROM SEARCH ENGINE LOGS

*K. Siva Sakthi, **R. Geetha
*M.E., Computer Science and Engineering
S.A. Engineering College, Chennai-77
k.s.shakthi@gmail.com
**Assistant Professor
Department of Computer Science and Engineering
S.A. Engineering College, Chennai-77

Abstract— A good user profiling strategy is an essential and fundamental component in search engine personalization. Most existing user profiling strategies, such as Web server log files and metadata describing page contents, are based on objects that users are interested in (i.e., positive preferences), but not the objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user-profiling strategies that are based on both positive and negative preferences. We evaluate the proposed methods against our previously proposed personalized query clustering method. User profiling methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search. An accurate user profile can greatly improve a search engine's performance by identifying the information needs of individual users. We apply preference mining rules to infer not only users' positive preferences but also their negative preferences, and utilize both kinds of preferences in deriving user profiles.

Keywords— Negative preferences, personalization, personalized query clustering, search engine, user profiling.

1. INTRODUCTION
The constantly growing information supply in Internet-based information systems poses high demands on concepts and technologies to support users in filtering relevant information. Information retrieval (IR) and information filtering (IF) are two analytical information seeking strategies; in this paper, we focus on information filtering. Information filtering assumes a rather stable user interest (reflected through a user profile) but has to deal with highly dynamic information sources [20]. In IF systems, a user profile typically includes long-term user interests [3], and the acceptance of IF systems highly depends on the quality of the user profiles. In particular, a user profile describes a set of user interests, which can be modeled via categories like sports, technology, or nutrition, and can be used for the purpose of information filtering. The definition of user profiles can be either explicit, implicit, or a combination of both. In the explicit approach, the system interacts with the user and acquires feedback on the information that the user has retrieved or filtered. In turn, the user can, for example, indicate which filtering results are of most interest to him to improve future filtering results (so-called relevance feedback). A good user profiling strategy is an essential and fundamental component in search engine personalization.

2. RELATED WORKS
User profiling strategies can be broadly classified into two main approaches: document-based and concept-based approaches.
usually represented as a set of weighted features. On the other hand, concept-based user profiling methods aim at capturing users‘ conceptual needs. Users‘ browsed documents and search histories are automatically mapped into a set of topical categories. User profiles are created based on the users‘ preferences on the extracted topical categories.

2.1 Document-Based Methods
Most document-based methods focus on analyzing users‘ clicking and browsing behaviors recorded in the users‘ clickthrough data. On Web search engines, clickthrough data is a kind of implicit feedback from users. Table 1 is an example of clickthrough data for the query ―apple,‖ which shows the URLs returned from the search engine for the query and the URLs clicked on by the user.

TABLE 1
THE CLICKTHROUGH DATA FOR THE QUERY ―APPLE‖

2.2 Concept-Based Methods
Most concept-based methods automatically derive users‘ topical interests by exploring the contents of the users‘ browsed documents and search histories. Liu et al. [13] proposed a user profiling method based on users‘ search history and the Open Directory Project (ODP). The user profile is represented as a set of categories, and for each category, a set of keywords with weights. The categories stored in the user profiles serve as a context to disambiguate user queries. If a profile shows that a user is interested in certain categories, the search can be narrowed down by providing suggested results according to the user‘s preferred categories.

3. CONCEPT EXTRACTION
Our personalized concept-based clustering method consists of three steps. First, we employ a concept extraction algorithm, which will be described in Section 3.1, to extract concepts and their relations from the Web-snippets returned by the search engine. Second, seven different concept-based user profiling strategies, which will be introduced in Section 5, are employed to create concept-based user profiles. Finally, the concept-based user profiles are compared with each other and against, as a baseline, our previously proposed personalized concept-based clustering algorithm.

3.1 Concept Extraction Using Web-Snippets
Our concept extraction method is inspired by the well-known problem of finding frequent item sets in data mining [9]. When a user submits a query to the search engine, a set of web-snippets is returned to the user for identifying the relevant items. We assume that if a keyword or a phrase appears frequently in the web-snippets of a particular query, it represents an important concept related to the query, because it coexists in close proximity with the query in the top documents. We use the following support formula for measuring the interestingness of a particular keyword/phrase ti with respect to the returned web-snippets arising from a query q:

support(ti) = (sf(ti) / n) · |ti|

where n is the total number of web-snippets returned, sf(ti) is the snippet frequency of the keyword/phrase ti (i.e., the number of web-snippets containing ti), and |ti| is the number of terms in the keyword/phrase ti. For simplicity, we omit q in the above expression if no ambiguity arises. To extract concepts for a query q, we first extract all the keywords and phrases from the web-snippets returned by the query. After obtaining a set of keywords/phrases ti, we compute the support for each ti (support(ti)). If the support of a keyword/phrase ti is bigger than the threshold s (support(ti) > s), we treat ti as a concept for the query q. Table 3 illustrates the extracted concepts for the query q = ―apple‖.

TABLE 3
EXTRACTED CONCEPTS FOR THE QUERY ―APPLE‖
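The support computation above is straightforward to prototype. Below is a minimal sketch (not the authors' implementation): whitespace tokenization, phrases of up to three terms, and the threshold value are assumptions chosen only for illustration.

```python
def extract_concepts(snippets, s=0.03, max_len=3):
    """Extract concepts from the web-snippets of one query,
    using support(ti) = (sf(ti) / n) * |ti| from Section 3.1."""
    if not snippets:
        return {}
    n = len(snippets)
    tokenized = [snippet.lower().split() for snippet in snippets]
    # Candidate keywords/phrases: up to max_len consecutive terms.
    candidates = set()
    for terms in tokenized:
        for k in range(1, max_len + 1):
            for i in range(len(terms) - k + 1):
                candidates.add(tuple(terms[i:i + k]))
    concepts = {}
    for t in candidates:
        # sf(t): number of snippets containing t as a contiguous phrase.
        sf = sum(
            1 for terms in tokenized
            if any(tuple(terms[i:i + len(t)]) == t
                   for i in range(len(terms) - len(t) + 1))
        )
        support = (sf / n) * len(t)
        if support > s:
            concepts[" ".join(t)] = support
    return concepts
```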
3.2 Mining Concept Relations
To find relations between concepts, we apply a well-known signal-to-noise ratio formula from data mining to establish the similarity between terms t1 and t2. The similarity value of Church and Hanks‘ formula always lies within [0, 1] and thus can be used directly in step 3:

sim(t1, t2) = log( n · df(t1 ∧ t2) / (df(t1) · df(t2)) ) / log n

where n is the number of documents in the corpus, df(t1 ∧ t2) is the joint document frequency of t1 and t2, and df(t) is the document frequency of the term t.
In our context, two concepts ti and tj could coexist in a web-snippet in the following situations: 1) ti and tj coexist in the title, 2) ti and tj coexist in the summary, or 3) ti exists in the title while tj exists in the summary (or vice versa). Therefore, we modify Church and Hanks‘ formula for the three different cases in our context as follows:

sim(ti, tj) = sim_title(ti, tj) + sim_summary(ti, tj) + sim_other(ti, tj)

where sim(ti, tj) is the similarity between concepts ti and tj, which is composed of sim_title(ti, tj), sim_summary(ti, tj), and sim_other(ti, tj) as follows:

sim_title(ti, tj) = log( n · sf_title(ti ∧ tj) / (sf_title(ti) · sf_title(tj)) ) / log n
sim_summary(ti, tj) = log( n · sf_summary(ti ∧ tj) / (sf_summary(ti) · sf_summary(tj)) ) / log n
sim_other(ti, tj) = log( n · sf_other(ti ∧ tj) / (sf_other(ti) · sf_other(tj)) ) / log n

where n is the total number of web-snippets returned, sf_title(ti ∧ tj) is the joint snippet frequency of concepts ti and tj in document titles, sf_title(t) is the snippet frequency of concept t in document titles, sf_summary(ti ∧ tj) is the joint snippet frequency of ti and tj in document summaries, sf_summary(t) is the snippet frequency of concept t in document summaries, sf_other(ti ∧ tj) is the joint snippet frequency of concept ti in a document title and tj in the document‘s summary (or vice versa), and sf_other(t) is the snippet frequency of concept t in either document summaries or document titles.

Fig. 3. (a) A concept relationship graph for the query ―apple‖ derived without incorporating user clickthroughs. (b) A concept preference profile constructed using the user clickthroughs and the concept relationship graph in (a). wti is the interestingness of the concept ti to the user. More clicks on a concept gradually increase the interestingness wti of the concept.

3.3 Creating User Concept Preference Profile
The concept relationship graph is first derived without taking user clickthroughs into account. Intuitively, the graph shows the possible concept space arising from the user‘s queries. The concept space, in general, covers more than what the user actually wants. For example, when the user searches for the query ―apple,‖ the concept space derived from the web-snippets contains concepts such as ―ipod,‖ ―iphone,‖ and ―recipe.‖ Therefore, we propose the following formulas to capture the user‘s interestingness wti on an extracted concept ti when a clicked web-snippet sj, denoted by click(sj), is found:

click(sj) => wti = wti + 1
click(sj) => wtj = wtj + sim(ti, tj), if sim(ti, tj) > 0

where sj is a web-snippet, wti is the interestingness weight of the concept ti, and tj is a neighborhood concept of ti.
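To show how the modified similarity and the click-based updates above (the same updates reused by PClick in Section 5.1) fit together, here is a hedged sketch. The normalized log form of the similarity and the zero cutoff in the update reflect our reading of the garbled source, and the data structures are assumptions.

```python
import math

def sim_component(n, joint_sf, sf_i, sf_j):
    # One case (title, summary, or cross title/summary) of the modified
    # Church-and-Hanks formula: log(n * sf(ti^tj) / (sf(ti)*sf(tj))) / log n.
    if n < 2 or joint_sf == 0 or sf_i == 0 or sf_j == 0:
        return 0.0
    return math.log(n * joint_sf / (sf_i * sf_j)) / math.log(n)

def on_click(profile, clicked_concepts, neighbors, sim):
    """Update interestingness weights when a web-snippet is clicked:
    w_ti += 1 for each concept ti in the snippet, and w_tj += sim(ti, tj)
    for each positively related neighborhood concept tj."""
    for ti in clicked_concepts:
        profile[ti] = profile.get(ti, 0.0) + 1.0
        for tj in neighbors.get(ti, ()):
            s = sim(ti, tj)
            if s > 0:
                profile[tj] = profile.get(tj, 0.0) + s
```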
4. QUERY CLUSTERING ALGORITHM
We now review our personalized concept-based clustering algorithm [11], with which ambiguous queries can be classified into different query clusters. Concept-based user profiles are employed in the clustering process to achieve a personalization effect. First, a query-concept bipartite graph G is constructed by the clustering algorithm, in which one set of nodes corresponds to the set of users‘ queries and the other corresponds to the set of extracted concepts. The similarity between two nodes x and y in G is computed as

sim(x, y) = (Nx · Ny) / (‖Nx‖ · ‖Ny‖)     (7)

where Nx is a weight vector for the set of neighbor nodes of node x in the bipartite graph G, the weight of a neighbor node nx in the weight vector Nx is the weight of the link connecting x and nx in G, Ny is a weight vector for the set of neighbor nodes of node y in G, and the weight of a neighbor node ny in Ny is the weight of the link connecting y and ny in G. Details of the personalized clustering algorithm are shown in Algorithm 1.

Algorithm 1. Personalized Agglomerative Clustering
Input: A Query-Concept Bipartite Graph G
Output: A Personalized Clustered Query-Concept Bipartite Graph Gp
// Initial Clustering
1: Obtain the similarity scores in G for all possible pairs of query nodes using Equation (7).
2: Merge the pair of most similar query nodes (qi, qj) that does not contain the same query from different users. Assume that a concept node c is connected to both query nodes qi and qj with weights wi and wj; a new link is created between c and (qi, qj) with weight w = wi + wj.
3: Obtain the similarity scores in G for all possible pairs of concept nodes using Equation (7).
4: Merge the pair of concept nodes (ci, cj) having the highest similarity score. Assume that a query node q is connected to both concept nodes ci and cj with weights wi and wj; a new link is created between q and (ci, cj) with weight w = wi + wj.
5: Unless termination is reached, repeat Steps 1-4.
// Community Merging
6: Obtain the similarity scores in G for all possible pairs of query nodes using Equation (7).
7: Merge the pair of most similar query nodes (qi, qj) that contains the same query from different users. Assume that a concept node c is connected to both query nodes qi and qj with weights wi and wj; a new link is created between c and (qi, qj) with weight w = wi + wj.
8: Unless termination is reached, repeat Steps 6-7.
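A compact sketch of one merge step of Algorithm 1 follows. It assumes that Equation (7) is the cosine similarity between neighbor weight vectors (our reconstruction of the lost equation image), that the adjacency map stores nodes from both sides of the bipartite graph, and it ignores the same-query/different-user constraint for brevity.

```python
import math

def eq7_sim(nx, ny):
    # Equation (7) (assumed cosine form): similarity of two nodes via
    # the weight vectors of their neighbor sets.
    dot = sum(w * ny.get(k, 0.0) for k, w in nx.items())
    norm = (math.sqrt(sum(w * w for w in nx.values()))
            * math.sqrt(sum(w * w for w in ny.values())))
    return dot / norm if norm else 0.0

def merge_step(side, graph):
    """One agglomerative step over one side of the bipartite graph G.
    graph[x] maps node x to {neighbor: link weight}, for nodes on both
    sides; merging two nodes adds their link weights (w = wi + wj)."""
    nodes, best, pair = list(side), -1.0, None
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            s = eq7_sim(graph[x], graph[y])
            if s > best:
                best, pair = s, (x, y)
    if pair is None:
        return None
    x, y = pair
    merged = dict(graph[x])
    for nbr, w in graph[y].items():
        merged[nbr] = merged.get(nbr, 0.0) + w
    for nbr, w in merged.items():          # rewire links on the other side
        graph[nbr].pop(x, None)
        graph[nbr].pop(y, None)
        graph[nbr][(x, y)] = w
    del graph[x], graph[y]
    graph[(x, y)] = merged
    side.discard(x); side.discard(y); side.add((x, y))
    return (x, y)
```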
5. USER PROFILING STRATEGIES
In this section, we propose several user profiling strategies which are both concept-based and utilize users‘ positive and negative preferences. They are PClick, PSpyNB_C, and PClick+SpyNB_C.

5.1 Click-Based Method (PClick)
The concepts extracted for a query q using the concept extraction method discussed in Section 3.1 describe the possible concept space arising from the query q. The concept space may cover more than what the user actually wants. If the user is indeed interested in ―apple‖ as a fruit and clicks on pages containing the concept ―fruit,‖ the user profile, represented as a weighted concept vector, should record the user‘s interest in the concept ―apple‖ and its neighborhood (i.e., concepts having a similar meaning to ―fruit‖), while downgrading unrelated concepts such as ―macintosh,‖ ―ipod,‖ and their neighborhood. Therefore, we propose the following formulas to capture a user‘s degree of interest wci in an extracted concept ci when a Web-snippet sj is clicked by the user (denoted by click(sj)):

click(sj) => wci = wci + 1
click(sj) => wcj = wcj + sim(ci, cj), if sim(ci, cj) > 0

where sj is a Web-snippet, wci represents the user‘s degree of interest in the concept ci, and cj is a neighborhood concept of ci.

5.2 SpyNB-C Method (PSpyNB_C)
Both Joachims and mJoachims are based on a rather strong assumption that pages scanned but not clicked by the user are uninteresting to the user, and hence, irrelevant to the user‘s query. SpyNB does not make this assumption [15], but instead assumes that unclicked pages could be either relevant or irrelevant to the user. Therefore, SpyNB treats clicked pages as positive samples and unclicked pages as unlabeled samples in the training process. The problem of finding user preferences then becomes one of identifying, from the unlabeled set, reliable negative documents that are considered irrelevant to the user.
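The spy step that SpyNB relies on can be sketched as follows; train_nb and prob_positive are assumed interfaces standing in for a Naive Bayes learner, and the spy ratio is illustrative rather than taken from the paper.

```python
import random

def reliable_negatives(P, U, train_nb, spy_ratio=0.15):
    """Sketch of the spy technique behind SpyNB: plant some positives
    ('spies') into the unlabeled set, train Naive Bayes on the rest of P
    versus U plus the spies, then keep as reliable negatives the
    unlabeled documents scored below every spy."""
    spies = random.sample(P, max(1, int(len(P) * spy_ratio)))
    p_rest = [d for d in P if d not in spies]
    model = train_nb(positive=p_rest, negative=U + spies)
    threshold = min(model.prob_positive(s) for s in spies)
    return [d for d in U if model.prob_positive(d) < threshold]
```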
6. EXPERIMENTAL RESULTS
The collected clickthrough data are used by the proposed user profiling strategies to create user profiles. We study the performance of a heuristic for determining the termination points of initial clustering and community merging based on the change of intracluster similarity. We show that user profiling
methods that incorporate negative concept weights return termination points that are very close to the optimal points obtained by exhaustive search.

6.1 Experimental Setup
The query and clickthrough data for evaluation are adopted from our previous work [11]. To evaluate the performance of our user profiling strategies, we developed a middleware for Google to collect clickthrough data. We used 500 test queries, which are intentionally designed to have ambiguous meanings (e.g., the query ―kodak‖ can refer to a digital camera or a camera film). We ask human judges to determine a standard cluster for each query. To avoid any bias, the test queries are randomly selected from 10 different categories. Table 8 shows the topical categories from which the test queries are chosen. When a query is submitted to the middleware, a list containing the top 100 search results together with the extracted concepts is returned to the users, and the users are required to click on the results they find relevant to their queries.

7. CONCLUSIONS
An accurate user profile can greatly improve a search engine‘s performance by identifying the information needs of individual users. In this paper, we proposed and evaluated several user profiling strategies. The techniques make use of clickthrough data and concepts extracted from Web-snippets to build concept-based user profiles automatically. We applied preference mining rules to infer not only users‘ positive preferences but also their negative preferences, and utilized both kinds of preferences in deriving user profiles.

REFERENCES
[1] E. Agichtein, E. Brill, and S. Dumais, ―Improving Web Search Ranking by Incorporating User Behavior Information,‖ Proc. ACM SIGIR, 2006.
[2] E. Agichtein, E. Brill, S. Dumais, and R. Ragno, ―Learning User Interaction Models for Predicting Web Search Result Preferences,‖ Proc. ACM SIGIR, 2006.
[3] Appendix: 500 Test Queries, http://www.cse.ust.hk/~dlee/tkde09/Appendix.pdf, 2009.
[4] R. Baeza-Yates, C. Hurtado, and M. Mendoza, ―Query Recommendation Using Query Logs in Search Engines,‖ Proc. Int‘l Workshop Current Trends in Database Technology, pp. 588-596, 2004.
[5] D. Beeferman and A. Berger, ―Agglomerative Clustering of a Search Engine Query Log,‖ Proc. ACM SIGKDD, 2000.
[6] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, ―Learning to Rank Using Gradient Descent,‖ Proc. Int‘l Conf. Machine Learning (ICML), 2005.
[7] K.W. Church, W. Gale, P. Hanks, and D. Hindle, ―Using Statistics in Lexical Analysis,‖ Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Lawrence Erlbaum, 1991.
[8] Z. Dou, R. Song, and J.-R. Wen, ―A Large-Scale Evaluation and Analysis of Personalized Search Strategies,‖ Proc. World Wide Web (WWW) Conf., 2007.
[9] S. Gauch, J. Chaffee, and A. Pretschner, ―Ontology-Based Personalized Search and Browsing,‖ ACM Web Intelligence and Agent System, vol. 1, nos. 3/4, pp. 219-234, 2003.
[10] T. Joachims, ―Optimizing Search Engines Using Clickthrough Data,‖ Proc. ACM SIGKDD, 2002.
[11] K.W.-T. Leung, W. Ng, and D.L. Lee, ―Personalized Concept-Based Clustering of Search Engine Queries,‖ IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1505-1518, Nov. 2008.
[12] B. Liu, W.S. Lee, P.S. Yu, and X. Li, ―Partially Supervised Classification of Text Documents,‖ Proc. Int‘l Conf. Machine Learning (ICML), 2002.
[13] F. Liu, C. Yu, and W. Meng, ―Personalized Web Search by Mapping User Queries to Categories,‖ Proc. Int‘l Conf. Information and Knowledge Management (CIKM), 2002.
[14] Magellan, http://magellan.mckinley.com/, 2008.
[15] W. Ng, L. Deng, and D.L. Lee, ―Mining User Preference Using Spy Voting for Search Engine Personalization,‖ ACM Trans. Internet Technology, vol. 7, no. 4, article 19, 2007.
EFFICIENTLY IDENTIFYING DDOS ATTACKS BY GROUP BASED THEORY

*S.MADHAN KUMAR, **R.PANNEER SELVI

*Department of Computer Science and Engineering, VelTech MultiTech Dr RR Dr SR Engineering College, Chennai, Tamil Nadu.
madhan866@gmail.com
**M.E., Computer Science and Engineering, VelTech MultiTech Dr RR Dr SR Engineering College, Chennai, Tamil Nadu.
rpanneer_selvi@yahoo.co.in
ABSTRACT
Distributed Denial of Service (DDoS) attacks continue to plague the Internet. The application DoS attack, which aims at disrupting application service rather than depleting network resources, has emerged as a larger threat to network services compared to the classic DoS attack. When a DDoS attack is detected by an IDS, the firewall simply discards all over-bounded traffic for a victim or drastically decreases the threshold of the router. Attackers also use spoofed IP addresses. Defense against these attacks is complicated by spoofed source IP addresses, which make it difficult to determine a packet‘s true origin. To identify application DoS attacks, we propose a novel group testing (GT)-based approach deployed on back-end servers, which not only offers a theoretical method to obtain short detection delay and a low false positive/negative rate, but also provides an underlying framework against general network attacks. More specifically, we first extend the classic GT model with size constraints for practical purposes, then redistribute the client service requests to multiple virtual servers embedded within each back-end server machine according to specific testing matrices. Since this method only counts the number of incoming requests rather than monitoring the server status, it is restricted to defending against high-rate DoS attacks. Based on this framework, we propose a two-mode detection mechanism using some dynamic thresholds to efficiently identify the attackers.

Index Terms—IDS, Application DoS, group testing, network security.

1. INTRODUCTION
Denial of Service (DoS) attacks aimed at disrupting network services range from simple bandwidth exhaustion attacks and those targeted at flaws in commercial software to complex distributed attacks exploiting specific commercial off-the-shelf (COTS) software flaws. A DoS attack, which aims to make a service unavailable to legitimate clients, has become a severe threat to Internet security [2]. Traditional DoS attacks mainly abuse the network bandwidth around the Internet subsystems and degrade the quality of service by generating congestion at the network [2], [3]. Consequently, several network-based defense methods have tried to detect these attacks by controlling traffic volume or differentiating traffic patterns at the intermediate routers [9], [10]. However, with the boost in network bandwidth and application service types, the target of DoS attacks has recently shifted from the network to server resources and application procedures themselves, forming the new application DoS attack [1], [2]. Application DoS attacks exhibit three advantages over traditional DoS attacks which help them evade normal detection: malicious traffic is always indistinguishable from normal traffic; automated scripts are adopted to avoid the need for a large number of ―zombie‖ machines or much bandwidth to launch the attack; and the attack is much harder to trace due to multiple
redirections at proxies. According to these characteristics, the malicious traffic can be classified into legitimate-like requests of two cases: 1) arriving at a high inter-arrival rate, and 2) consuming more service resources. We call these two cases ―high-rate‖ and ―high-workload‖ attacks, respectively. Since these attacks usually do not cause congestion at the network level, they bypass network-based monitoring systems [21]; detection and mitigation at the end system of the victim servers have therefore been proposed via two techniques, DDoS Shield and CAPTCHA-based defense. In DDoS Shield, session validation is based on a legitimate behavior profile, and in the CAPTCHA-based defense, authentication uses human-solvable puzzles. The overhead for per-session validation is not negligible, especially for services with dense traffic. CAPTCHA-based defenses introduce additional service delays for legitimate clients and are also restricted to human interaction services.

2.1 Classic Group Testing Model
The identification of attackers can be much faster if we can find them out by testing the clients in groups instead of one by one. Thus, the key problem is how to group clients and assign them to different server machines in a sophisticated way, so that if any server is found under attack, we can immediately identify and filter the attackers out of its client set. Apparently, this problem resembles group testing (GT) theory [14], which aims to discover defective items in a large population with the minimum number of tests, where each test is applied to a subset of items, called a pool, instead of testing them one by one. Therefore, we apply GT theory to this network security issue and propose specific algorithms and protocols to achieve high detection performance in terms of short detection latency and a low false positive/negative rate. Since the detections are merely based on the status of service resource usage of the victim servers, no individual signature-based authentications or data classifications are required; thus, it may overcome the limitations of the current solutions.

2.1.1 Basic Idea
The classic GT model consists of t pools and n items (including at most d positive ones). As shown in Fig. 1, this model can be represented by a t × n binary matrix M, where rows represent the pools and columns represent the items. An entry M[i, j] = 1 if and only if the ith pool contains the jth item; otherwise, M[i, j] = 0. The t-dimensional binary column vector V denotes the test outcomes of these t pools, where a 1-entry represents a positive outcome and a 0-entry represents a negative one. Note that a positive outcome indicates that at least one positive item exists within the pool, whereas a negative one means that all the items in the current pool are negative.

Fig. 1. Binary testing matrix M and testing outcome vector V.

2.1.2 Classic Methods
Two traditional GT methods are adaptive and nonadaptive [14]. Adaptive methods use the results of previous tests to determine the pool for the next test and complete the test within several rounds, while nonadaptive GT methods employ a d-disjunct matrix [14], run multiple tests in parallel, and finish the test within only one round.

2.1.3 Decoding Algorithms
For sequential GT, at the end of each round, items in negative pools are identified as negative, while the ones in positive pools require further testing. Notice that one item is identified as positive only if it is the only item in a positive pool. Nonadaptive GT takes d-disjunct matrices as the testing matrix M, where no column is contained in the Boolean summation of any other d columns. Taking Fig. 1 as an example, outcomes V[3] and V[4] are 0, so items in pool 3 and pool 4 are negative, i.e., items 3, 4, and 5 are negative. If the matrix M is a d-disjunct matrix, items other than those appearing in the negative pools are positive; therefore, items 1 and 2 are the positive ones.
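The decoding just described reduces to a few lines of code. Below is a hedged sketch; the matrix is an illustrative one chosen to be consistent with the Fig. 1 example, since the figure's actual matrix is not recoverable from the source.

```python
def decode_nonadaptive(M, V):
    """Decode one nonadaptive group-testing round.
    M: t x n binary matrix (M[i][j] == 1 iff pool i contains item j).
    V: t outcomes (V[i] == 1 iff pool i tested positive).
    With a d-disjunct M and at most d positives, every item appearing
    in some negative pool is negative; all remaining items are positive."""
    t, n = len(M), len(M[0])
    negative = set()
    for i in range(t):
        if V[i] == 0:  # a negative pool clears all of its items
            negative.update(j for j in range(n) if M[i][j] == 1)
    return [j for j in range(n) if j not in negative]

# Illustrative matrix consistent with the Fig. 1 example: pools 3 and 4
# test negative, clearing items 3, 4, and 5; items 1 and 2 are positive.
M = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1]]
V = [1, 1, 0, 0]
print(decode_nonadaptive(M, V))  # -> [0, 1], i.e., items 1 and 2
```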
2.1.4 Apply to Attack Detection
A detection model based on GT can be built as follows: assume that there are t virtual servers and n clients, among which d clients are attackers. Considering the matrix M[t × n] in Fig. 1, the clients can be mapped into the columns and the virtual servers into the rows of M, where M[i, j] = 1 if and only if the requests from client j are distributed to virtual server i. With regard to the test outcome column V, we have V[i] = 1 if and only if virtual server i has received malicious requests from at least one attacker, but we cannot identify the attackers at once unless this virtual server is handling only one client. Otherwise, if V[i] = 0, all the clients assigned to server i are legitimate. The d attackers can then be captured by decoding the test outcome vector V and the matrix M.

Fig. 2. Victim/detection model.

3. Detection System
The implementation difficulties of our detection scheme are threefold:
• how to construct a proper testing matrix M,
• how to distribute client requests based on M with low overhead, and
• how to generate test outcomes with high accuracy.

3.1 System Overview
The detection consists of multiple testing rounds, and each round can be sketched in four stages:
First, generate and update the matrix M for testing.
Second, ―assign‖ clients to virtual servers based on M. The back-end server maps each client into one distinct column in M and distributes an encrypted token queue to it.
Third, all the servers are monitored for their service resource usage periodically; specifically, the arriving request aggregate (the total number of incoming requests) and the average response time of each virtual server are recorded and compared with some dynamic thresholds.
Fourth, decode these outcomes and identify legitimate or malicious IDs.

Fig. 4. One testing round in DANGER mode.

To lower the overhead and delay introduced by the mapping and piggybacking for each request, the system is exempted from this procedure in the normal service state. The back-end server cycles between two states, which we refer to as NORMAL mode and DANGER mode. Once the estimated response time (ERT) of any virtual server exceeds some profile-based threshold, the whole back-end server will transfer to the DANGER mode and execute the detection scheme. Whenever the average response time (ART) of each virtual server falls below the threshold, the physical server returns to NORMAL mode.

Fig. 3. Two-state diagram of the system.
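A hedged sketch of the NORMAL/DANGER cycling described above follows; the virtual-server monitoring interface (estimated_response_time, average_response_time) and the single shared threshold are assumptions made for illustration.

```python
class BackendServer:
    """Two-state detection loop over a set of virtual servers."""

    def __init__(self, virtual_servers, threshold):
        self.virtual_servers = virtual_servers
        self.threshold = threshold  # profile-based response-time threshold
        self.mode = "NORMAL"

    def monitor_tick(self):
        if self.mode == "NORMAL":
            # Enter DANGER mode when any virtual server's estimated
            # response time (ERT) exceeds the threshold.
            if any(vs.estimated_response_time() > self.threshold
                   for vs in self.virtual_servers):
                self.mode = "DANGER"
        if self.mode == "DANGER":
            self.run_detection_round()
            # Return to NORMAL once every virtual server's average
            # response time (ART) falls back below the threshold.
            if all(vs.average_response_time() <= self.threshold
                   for vs in self.virtual_servers):
                self.mode = "NORMAL"

    def run_detection_round(self):
        pass  # generate M, assign clients, record outcomes, decode
```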
3.2 Configuration Details
Several critical issues regarding the implementation are as follows:
Session state transfer. By deploying the detection service on the back-end server tier, our scheme is orthogonal to the session state transfer problem caused by load-balancing at the reverse proxies (front-end tier). We assume that the front-end proxies distribute
client requests strictly evenly to the back-end servers, i.e., without considering session stickiness. The way of distributing token queues, to be mentioned later, is tightly related to this assumption. However, even if the proxies conduct more sophisticated forwarding, the token queue distribution can be readily adapted by manipulating the token piggybacking mechanism at the client side accordingly.
Since the testing procedure requires distributing intrasession requests to different virtual servers, the overhead of maintaining consistent session state is incurred. Our motivation for utilizing virtual servers is that they can retrieve the latest client state through the shared memory, which resembles the principle of the Network File System (NFS). An alternative way out is to forward intrasession requests to the same virtual server, which calls for a longer testing period for each round.

5. SIMULATION CONFIGURATIONS AND RESULTS

5.1 Configurations
To demonstrate the theoretical complexity results shown in the previous section, we conduct a simulation study on the proposed system in terms of the following metrics:
• the average testing latency T, which refers to the length of the time interval from when the attackers start sending requests until all of them are identified;
• the average false positive rate fp and false negative rate fn;
• the average number of testing rounds R, which stands for the number of testing rounds needed for identifying all the clients by each algorithm.

5.2 Results
Overall, the simulation results can be summarized as follows:
In general, the system can efficiently detect the attacks, filter out the malicious clients, and recover to NORMAL mode within a short period of time in real-time network scenarios.
All three detection algorithms can complete the detection with short latency (less than 30 s) and a low false negative/positive rate (both less than 5 percent) for up to 2,000 clients. Thus, they are applicable to large-scale time/error-sensitive services.
The PND and SDP algorithms achieve slightly better performance than the SDoP algorithm. Furthermore, the efficiency of the PND algorithm can be further enhanced by optimizing the d-disjunct matrix employed and the state maintenance efficiency.

6. RELATED WORK IN DoS DETECTION
Numerous defense schemes against DoS have been proposed and developed [7], which can be categorized into network-based mechanisms and system-based ones. Existing network-based mechanisms aim to identify the malicious packets at the intermediate routers or hosts [10], [12], by either checking the traffic volumes or the traffic distributions. However, application DoS attacks have no necessary deviations in terms of these metrics from the legitimate traffic statistics; therefore, network-based mechanisms cannot efficiently handle these attack types.

7. CONCLUSIONS AND DISCUSSIONS
We proposed a novel technique for detecting application DoS attacks by means of a new constraint-based group testing model. Motivated by classic GT methods, three detection algorithms were proposed, and a system based on these algorithms was introduced. Theoretical analysis and preliminary simulation results demonstrated the outstanding performance of this system in terms of low detection latency and false positive/negative rate. Some possible directions for future work are:
1. the sequential algorithm can be adjusted to avoid the requirement of isolating attackers;
2. a more efficient d-disjunct matrix could dramatically decrease the detection latency, as we showed in the theoretical analysis;
3. the overhead of maintaining the state transfer among virtual servers can be further decreased by more sophisticated techniques.

REFERENCES
[1] S. Ranjan, R. Swaminathan, M. Uysal, and E. Knightly, ―DDoS-Resilient Scheduling to Counter Application Layer Attacks under Imperfect Detection,‖ Proc. IEEE INFOCOM, Apr. 2006.
[2] S. Vries, ―A Corsaire White Paper: Application Denial of Service (DoS) Attacks,‖ http://research.corsaire.com/whitepapers/040405-application-level-dos-attacks.pdf, 2010.
[3] S. Kandula, D. Katabi, M. Jacob, and A.W. Berger, ―Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds,‖ Proc. Second Symp. Networked Systems Design and Implementation (NSDI), May 2005.
[4] S. Khattab, S. Gobriel, R. Melhem, and D. Mosse, ―Live Baiting for Service-Level DoS Attackers,‖ Proc. IEEE INFOCOM, 2008.
[5] M.T. Thai, Y. Xuan, I. Shin, and T. Znati, ―On Detection of Malicious Users Using Group Testing Techniques,‖ Proc. Int‘l Conf. Distributed Computing Systems (ICDCS), 2008.
[6] M.T. Thai, P. Deng, W. Wu, and T. Znati, ―Approximation Algorithms of Nonunique Probes Selection for Biological Target Identification,‖ Proc. Conf. Data Mining, Systems Analysis and Optimization in Biomedicine, 2007.
[7] J. Mirkovic, J. Martin, and P. Reiher, ―A Taxonomy of DDoS Attacks and DDoS Defense Mechanisms,‖ Technical Report 020018, Computer Science Dept., UCLA, 2002.
[8] M.J. Atallah, M.T. Goodrich, and R. Tamassia, ―Indexing Information for Data Forensics,‖ Proc. Int‘l Conf. Applied Cryptography and Network Security (ACNS), pp. 206-221, 2005.
[9] J. Lemon, ―Resisting SYN Flood DoS Attacks with a SYN Cache,‖ Proc. BSDCON, 2002.
[10] Service Provider Infrastructure Security, ―Detecting, Tracing, and Mitigating Network-Wide Anomalies,‖ http://www.arbornetworks.com, 2005.
[11] Y. Kim, W.C. Lau, M.C. Chuah, and H.J. Chao, ―Packetscore: Statistics-Based Overload Control against Distributed Denial-of-Service Attacks,‖ Proc. IEEE INFOCOM, 2004.
[12] F. Kargl, J. Maier, and M. Weber, ―Protecting Web Servers from Distributed Denial of Service Attacks,‖ Proc. 10th Int‘l Conf. World Wide Web (WWW ‘01), pp. 514-524, 2001.
[13] L. Ricciulli, P. Lincoln, and P. Kakkar, ―TCP SYN Flooding Defense,‖ Proc. Comm. Networks and Distributed Systems Modeling and Simulation Conf. (CNDS), 1999.
FAST MUTUAL AUTHENTICATION AND KEY EXCHANGE BY NESTED ONE-TIME SECRET MECHANISMS IN MOBILE NETWORK

*Shanmukh Chejarla, **Bagya Lakshmi. A
*M.Tech Student, **Assistant Professor
Computer Science & Engineering,
S.A. Engineering College, Chennai, Tamil Nadu, India
E-Mail Id: *shan.mailcity@gmail.com
Abstract:
Authentication plays a quite important role in the entire mobile network system and acts as the first protector against attackers, since it ensures the correctness of the identities of distributed communication entities before they engage in any other communication activity. To guarantee the quality of this advanced service, an efficient (especially user-efficient) and secure authentication scheme is urgently desired. This paper presents a novel protocol, called the nested one-time secret mechanism, for fast mutual authentication and key exchange in mobile communication environments. Through maintaining inner and outer synchronously changeable common secrets, every mobile user can be rapidly authenticated by the visited location register (VLR) and the home location register (HLR), respectively. The proposed solution achieves mutual authentication and also reduces the computation and communication cost of the mobile user. Finally, the security of the proposed scheme will be demonstrated by formal proofs.

Index Terms— Information security, mutual authentication, one-time secrets, secure mobile communication

I. Introduction

Due to the fast progress of communication technologies, many popular services have been developed to take advantage of the advanced technologies. One of these popular services is wireless communication. Ubiquitous wireless networks make it possible for distributed entities to remotely and efficiently communicate with each other anytime and anywhere, even in mobile status. Furthermore, tiny and exquisite handsets greatly raise the portability of mobile devices. Wireless communication has played an extremely important role in personal communication activities. Most of the current mobile communication services are based on the Global System for Mobile Communications (GSM) architecture, and some novel applications based on the third generation (3G) of mobile communication systems have also been deployed. However, the messages transmitted in wireless communication networks are exposed in the air, so malicious parties in wireless environments have more opportunities than those in wire-line environments to intercept these transmitted messages. It will seriously threaten the security of wireless communication systems if no protection mechanism is considered. Although some security aspects of current mobile communication systems have been considered, there still exist security problems in some GSM-based systems.

For example, the impersonating attack works because of the lack of mutual authentication in the GSM system. This paper considers the performance of secure mutual authentication schemes and presents an efficient solution to further simplify and speed up the authentication processes through synchronously changeable secrets, which form a nested structure (containing an outer one-time secret and an inner one) shared by each mobile user and the system. The outer one-time secret is a temporal common key of the user and the HLR for initial authentication, or for authentication when the user roams into the service area of a new VLR.
The proposed scheme greatly reduces the computation cost required for each mobile user. Furthermore, the proposed scheme is formally demonstrated to be immune to both the replay attack and the impersonating attack.

The rest of this paper is organized as follows. The objective and Hwang and Chang‘s scheme [1] are briefly described in Sections II and III. The basic idea is illustrated in Section IV. In Section V, I present an efficient mutual authentication scheme for mobile communications. The conclusion, future enhancement, and references are discussed in Sections VI, VII, and VIII.

II. Objective

The main objective is to propose a secure mutual authentication and key exchange scheme for mobile communications using the nested one-time secret mechanisms, and also to accomplish fast mutual authentication for mobile environments.

III. Review Of Hwang And Chang’s Scheme

Hwang and Chang‘s scheme is quite efficient for mobile users without impractical assumptions. I will present a novel practical mobile authentication scheme that is much more efficient than Hwang and Chang‘s scheme [1] in both computation and communication under the same assumption of [1].

IV. Basic Idea

In this section, I will introduce the basic idea that is the underlying foundation for the construction of the proposed authentication scheme in mobile environments.

An Efficient Hybrid Mechanism For Mutual Authentication

There are two basic approaches to achieve mutual authentication between two entities (Alice and Bob). One is the timestamp-based approach, and the other is the nonce-based approach.

The Assumptions Of A Timestamp-Based Authentication Scheme:
1) The clocks of Alice and Bob must be synchronous.
2) The transmission time for the authentication message transmitted from Alice to Bob (or from Bob to Alice) must be stable.

The Advantages Of A Timestamp-Based Authentication Scheme:
1) The protocol only requires two rounds of transmission to reach the goal of mutual authentication.
2) It is efficient in computation and communication.

Although timestamp-based authentication schemes are simple and efficient, the above two constraints make them impractical in the Internet and mobile environments, since most users‘ clocks are not synchronous with the server‘s or system‘s clocks and the transmission time is usually not stable.

The Advantages Of A Nonce-Based Authentication Scheme:
1) It is not necessary to synchronize the clocks of Alice and Bob.
2) The transmission time for the authentication message transmitted from Alice to Bob (or from Bob to Alice) can be unstable.

The Drawbacks Of A Nonce-Based Authentication Scheme:
1) The protocol requires three rounds of transmission to reach the goal of mutual authentication.
2) The scheme is less efficient than a timestamp-based authentication scheme in computation and communication.

A nonce-based authentication scheme is free from the two constraints required in a timestamp-based authentication scheme, but the performance may be a problem in the nonce-based scheme as compared to the timestamp-based one.

In addition to these schemes, another authentication scheme is also introduced, which is based on one-time secrets.

The Advantages Of An Authentication Scheme Based On One-time Secrets:
1) The protocol only requires two rounds of transmission to reach the goal of mutual authentication.
2) It is more efficient than a nonce-based authentication scheme in computation and communication. (However, it is less efficient than a timestamp-based scheme since an additional string
must be computed in the scheme based on a one-time secret.)

The Drawback Of An Authentication Scheme Based On One-time Secrets:
Alice and Bob must store an extra string, i.e., the one-time secret, in their devices or computers.

The comparisons of the three authentication mechanisms (i.e., timestamps, one-time secrets, and nonces) are summarized in Table I.

Table I. Comparison of the three authentication mechanisms.

In the GSM system, two authentication actions must be performed, i.e., the mutual authentication between a VLR and the HLR, and the mutual authentication between the system (VLR and HLR) and each user. In order to guarantee the quality of mobile communication, the authentication mechanisms I adopt should be as efficient as possible. Each VLR and the HLR are both located in the interior wired network of the GSM system, so they can authenticate each other through the timestamp-based authentication mechanism without suffering from the problem of clock synchronization. Since the clocks of each VLR and the HLR can be easily synchronized and the time consumed by transmitting a message between them is stable, I can make use of the timestamp-based solution to build up the mutual authentication protocol between each VLR and the HLR.

On the other hand, it is difficult to synchronize the clocks of the system (VLR and the HLR) and all mobile users. Hence, I cannot utilize the timestamp-based solution to construct the authentication protocol between the system and every mobile user, even though that solution is the most efficient one among the three authentication mechanisms. Owing to the assumption of the mechanism based on one-time secrets, it cannot form the authentication protocol for the initial authentication between the system and each mobile user. Thus, I adopt the nonce-based mechanism to establish the authentication protocol for the initial authentication between the system and every user.

V. The Proposed Scheme

Nested One-Time Secret Mechanism

A sequence of mutual authentication processes based on the proposed hybrid mechanism is performed between the mobile user and the system (a VLR and the HLR). In the initial authentication, the user and the system authenticate each other by performing a nonce-based authentication protocol, and then they negotiate an initial value of a one-time secret. They then make use of this one-time secret, called the outer one-time secret, to complete the following authentication processes.

In fact, the cost of the authentication can be further reduced if the user does not leave the service area of the current VLR. In this case, the user performs an initial mutual authentication protocol with the VLR only, and they set an initial value of another one-time secret, called the inner one-time secret, shared by them. They can perform the following authentication actions via the inner one-time secret until the user leaves the service area of the VLR. Once the user enters the service area of another VLR, the outer one-time secret will be resumed to serve as the key parameter for the next round of authentication between the user and the system.

In the proposed idea, the mobile user shares the outer one-time secret with the HLR and the inner one-time secret with the current VLR. This is referred to as the nested one-time secret mechanism, which is illustrated in the figure below.

Figure: The nested one-time secret mechanism.
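To make the nesting concrete, here is a small sketch of which secret drives the next authentication round; the attribute names are assumptions introduced only for this illustration.

```python
def pick_secret(user, current_vlr_id):
    """Nested one-time secrets: use the inner secret while the user stays
    in the same VLR's service area; fall back to the outer secret (shared
    with the HLR) after roaming to a new VLR."""
    if user.inner_secret is not None and user.vlr_id == current_vlr_id:
        return ("inner", user.inner_secret)  # fast VLR-local authentication
    return ("outer", user.outer_secret)      # roamed: authenticate via HLR
```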
Based on the above ideas, I propose a fast mutual authentication and key exchange scheme for mobile communications in this section. The proposed scheme consists of three protocols.

1) Initial Authentication For Mobile User And The System (VLR And HLR)
In the Initial Authentication Protocol for the Mobile User and the System, the user generates A and sends A to the VLR. The VLR verifies A and generates B, then sends B to the HLR. The HLR verifies B and generates C, which contains D, then sends C to the VLR. The VLR verifies C and sends D to the user. Finally, the user verifies D and sends x to the VLR. If the user visits a new VLR, the Authentication Protocol for the Mobile User and the System is performed again; otherwise, the scheme moves to the next process.
Where
A = EKvh(r+1)
B = EKvh(A, U, tv)
D = EKvh(r, x, y, w)
C = EKvh(x, y, w, th, D)
EKvh = common secret key
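As a rough illustration of how such a message could be formed and checked with AES-256: the paper's E is a generic symmetric encryption, so the AES-GCM mode, the third-party cryptography package, and the 8-byte counter encoding below are this sketch's assumptions, not the paper's specification.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt(key, plaintext):
    nonce = os.urandom(12)  # fresh nonce per message
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt(key, blob):
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)     # 256-bit key, as in the paper
r = 41                                        # current one-time secret value
A = encrypt(key, (r + 1).to_bytes(8, "big"))  # user -> VLR: A = E(r + 1)
# Verifier side: decrypt A and check that it carries r + 1.
assert int.from_bytes(decrypt(key, A), "big") == r + 1
```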
2) Initial Authentication For User And The Current VLR
The next process is the Initial Authentication Protocol for the User and the Current VLR: the user generates A and sends A to the VLR; the VLR verifies A, then generates D and sends D to the user; the user verifies D and sends x to the VLR.
Where
A = EW(s+1)
D = EW(x, y, s)

3) Final Authentication For User And The Current VLR
After the Initial Authentication Protocol for the User and the Current VLR, if the user visits a new VLR, the Authentication Protocol for the Mobile User and the System is performed again; otherwise, the Final Authentication Protocol for the User and the Current VLR can be performed. After the Final Authentication, if the user visits a new VLR, the Authentication Protocol for the Mobile User and the System is performed again; otherwise, the authentication process ends.

According to the specification of the Advanced Encryption Standard (AES) [2], the current standard for symmetric cryptosystems, the key length of every encryption/decryption key in the proposed scheme is 256 bits, which generates a large enough space of possible key values.

VI. Conclusion

I have proposed a secure mutual authentication and key exchange scheme for mobile communications based on a novel mechanism, i.e., nested one-time secrets. The proposed scheme can withstand the replay attack and the impersonating attack on mobile communications and speed up authentication. Not only does the proposed scheme reduce the communication and computation cost, but the security of the scheme has also been formally proved.

VII. Future Enhancement

This project uses symmetric-key algorithms for encryption. The enhancement of this project is to use an asymmetric-key algorithm for encryption.
Such an algorithm can provide more security than symmetric-key algorithms. Finally, I can compare the results of both techniques.
VIII. References

[1] K.F. Hwang and C.C. Chang, ―A self-encryption mechanism for authentication of roaming and teleconference services,‖ IEEE Trans. Wireless Commun., vol. 2, no. 2, pp. 400-407, Mar. 2003.
[2] U.S. Department of Commerce/National Institute of Standards and Technology, ―Specification for the Advanced Encryption Standard (AES),‖ FIPS PUB 197, Nov. 2001 [Online]. Available: http://csrc.nist.gov/encryption/aes
EFFECTIVE AND EFFICIENT QUERY PROCESSING FOR IDENTIFYING VIDEO SUBSEQUENCE

*Eswari. R, **Prof. Mariappan. A. K
Easwari Engineering College, Anna University
*eswariram_88@yahoo.co.in
**maris2612@yahoo.com
Abstract— Many investigations have been made into content-based video retrieval. However, despite its importance, video subsequence identification, which is to find content similar to a short query clip from a long video sequence, has not been well addressed. This paper presents a graph transformation and matching approach to this problem, with an extension to identify occurrences of potentially different ordering or length due to content editing. With a novel batch query algorithm to retrieve similar frames, the mapping relationship between the query and the database video is first represented by a bipartite graph. The densely matched parts along the long sequence are then extracted, followed by a filter-and-refine search strategy to prune some irrelevant subsequences. During the filtering stage, Maximum Size Matching is deployed for each subgraph constructed from the query and a candidate subsequence to obtain a smaller set of candidates. During the refinement stage, Sub-Maximum Similarity Matching is devised to identify the subsequence with the highest aggregate score from all candidates, according to a robust video similarity model that incorporates visual content, temporal order, and frame alignment information.

Keywords— Multimedia databases, subsequence identification, query processing, similarity measures

I. INTRODUCTION
ANSWERING queries based on ―alike‖ but maybe not exactly ―same‖ is known as similarity search. It has been widely used to simulate the process of object proximity ranking performed by human specialists, such as image retrieval [1] and time series matching [2]. Nowadays, the rapid advances in multimedia and network technologies popularize many applications of video databases, and sophisticated techniques for representing, matching, and indexing videos are in high demand. A video sequence is an ordered set of a large number of frames, and from the database research perspective, each frame is usually represented by a high-dimensional vector, which has been extracted from some low-level content features, such as color distribution, texture pattern, or shape structure within the original media domain [3]. Matching of videos is often translated into searches among these feature vectors [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. In practice, it is often undesirable to manually check whether a video is part of a long stream by browsing its entire length; thus, a reliable solution for automatically finding similar content is imperative. Video subsequence identification involves locating the position of the most similar part with respect to a user-specified query clip Q in a long prestored video sequence S. Ideally, it can identify relevant video even if there exists some transformation distortion, partial content reordering, insertion, deletion, or replacement. Its typical applications include the following:

• Recognition for copyright enforcement. Video content owners would like to be aware of any use of their material, in any media or representation. For example, the producers of certain movie scenes may want to identify whether or where their original films have been reused by others, even with some kind of remixing for multimedia authoring.
• TV commercial detection. Some companies would like to track their TV commercials when they are aired on different channels during a certain time period for statistical purposes. They can verify whether their commercials have actually been broadcast as contracted, and it is also valuable to monitor how their competitors conduct advertisements to apprehend their marketing strategies.

The primary difference between video retrieval and video subsequence identification is that, while the retrieval
task conventionally returns similar clips from a large collection of videos which have been either chopped up into similar lengths or cut at content boundaries, the subsequence identification task aims at finding whether there exists any subsequence of a long database video that shares similar content with a query clip. In other words, while for the former the clips for search have already been segmented and are always ready for similarity ranking [11], [12], [13], [14], [15], the latter is a typical subsequence matching problem. Because the boundary and even the length of the target subsequence are not available initially, it is not known in advance which fragments to evaluate similarities on. Therefore, most existing methods for the retrieval task on video clip collections [11], [12], [13], [14], [15] are not applicable to this more complicated problem.

This paper addresses a different and considerably harder problem of searching visually similar videos. Different from copy detection, which normally considers transformation distortions only, a visually similar video can be further relaxed to be changed with content editing at the frame or shot level (swap, insertion, deletion, or substitution), and thus could lead to a different ordering or length from the original source. Fig. 1 shows two groups of similar TV commercials for Tourism New South Wales and a company of Australia, respectively. Each of them is displayed with five sampled frames extracted at the same time stamps. The corresponding videos in both groups are highly relevant, but not copies. Another example is the extended cinema version of a Toyota commercial (60 seconds) and its shorter TV version (30 seconds), which obviously are not copies of each other by definition. On the other hand, a video copy may no longer be regarded as visually similar if transformed substantially.

Fig. 1. Visually similar videos, but not copies. (a) Inserting different sales and local contact information. (b) Modifying some content, and rearranging partial order.

Video subsequence matching techniques using a fixed-length sliding window at every possible offset of the database sequence for exhaustive comparison [4], [5], [8] are not efficient, especially in the case of seeking over a long-running video. Although a temporal skip scheme using a similarity upper bound [10], [15] can accelerate the search process by reducing the number of candidate subsequences, these methods may not be effective under the scenario that a target subsequence could actually have a different ordering or length from a query.

Compared with existing methods, our approach has the following distinct features:
• In contrast to the fast sequential search scheme applying temporal pruning to accelerate the search process [10], [15], which assumes the query and target subsequence are strictly of the same ordering and length, our approach adopts spatial pruning to avoid seeking over the entire database sequence of feature vectors for exhaustive comparison.
• Our approach does not involve the presegmentation of video required by the proposals based on shot boundary detection [9], [19], [21]. Shot resolution, which could be a few seconds in duration, is usually too coarse to accurately locate a subsequence boundary. Meanwhile, our approach based on frame subsampling is capable of identifying video content containing ambiguous shot boundaries (such as dynamic commercials and TV program lead-in and lead-out subsequences).

II. RELATED WORK

A. Video Copy Detection
Extensive research efforts have been made on extracting and matching content-based signatures to detect copies of videos. Mohan [4] introduced the use of ordinal measures for video sequence matching. Naphade et al. [8] developed an efficient scheme to match video clips using color histogram intersection. Pua et al. [9] proposed a method based on color moment features to search for video copies in a long segmented sequence. In their work, the query sequence
slides frame by frame over the database video with a fixed-length window. In addition to distortions introduced by different encoding parameters, Kim and Vasudev [5] proposed using spatiotemporal ordinal signatures of frames to further address display format conversions, such as different aspect ratios (letter-box, pillar-box, or other styles). Since the process of video transformation can give rise to several distortions, techniques circumventing these variations by global signatures have been considered. They tend to depict a video globally rather than focusing on its sequential details. This approach is efficient, but has limitations with blurry shot boundaries or a very limited number of shots. Moreover, in reality, a query video clip can be just a shot or even a subshot, whereas such methods are only applicable to queries consisting of multiple shots.

B. Video Similarity Search

The methods mentioned above have only been designed to detect videos of the same temporal order and length. To further search videos with changes from the query due to content editing, a number of algorithms have been proposed to evaluate video similarity. To deal with inserting or cutting out partial content, Hua et al. [6] used dynamic programming based on the ordinal measure of frames resampled at a uniform sampling rate to find the best match between video sequences of different lengths. This method has only been tested on a small video database. Through time warping distance computation, they achieved higher search accuracy than the methods proposed in [5] and [6]. However, with the growing popularity of video editing tools, videos can be temporally manipulated with ease. This work will extend the investigations of copy detection not only in the aspect of potentially different length but also in allowing flexible temporal order (tolerance to content reordering). Cheung and Zakhor [11], [12] developed the Video Signature to summarize each video with a small set of sampled frames selected by a randomized algorithm. Shen et al. [13] proposed the Video Triplet to represent each clip with a number of frame clusters and estimate the cluster similarity by the volume of intersection between two hyperspheres multiplied by the smaller density; the overall video similarity is then derived from the total number of similar frames shared by two videos. For compactness, these summarizations inevitably lose temporal information. Videos are treated as a "bag" of frames, so they lack the ability to differentiate two sequences with temporal reordering, such as "ABCD" and "ACBD." Various time series similarity measures can be considered, such as Mean distance, DTW, and LCSS, all of which can be extended to measure the similarity of multidimensional trajectories and applied to video matching. However, Mean distance adheres to temporal order in a rigid manner, does not allow frame alignment or gaps, and is very sensitive to noise. DTW can be utilized to address frame alignment by repeating some frames as many times as needed without extra cost [7], but no frame can be skipped even if it is just noise; in addition, its capacity is limited in the context of partial content reordering. LCSS is proposed to address temporal order and handle possible noise by allowing some elements to be skipped without rearranging the sequence order [21], but it ignores the effect of potentially different numbers of gaps. As known from research in psychology, the visual judgment of human perception involves a number of factors. The proposed model incorporating different factors for measuring video similarity is inspired by the weighted schemes [19] originally introduced at the shot level.

Definition 1. Video subsequence identification. Let Q = {q1, q2, . . . , q|Q|} be a short query clip and S = {s1, s2, . . . , s|S|} be the long database video sequence, where qi = (qi1, . . . , qid) and sj = (sj1, . . . , sjd) are d-dimensional feature vectors representing video frames, and |Q| and |S| denote the total frame numbers of Q and S, respectively (normally |Q| ≪ |S|). Video subsequence identification is to find Ŝ = {sm, sm+1, . . . , sn} in S, where 1 ≤ m ≤ n ≤ |S|, which is the most similar part to Q under a defined score function.

For easy reference, a list of notations used in this paper is shown in Table 1.

TABLE 1. A List of Notations.
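To make the contrast drawn above concrete, here is a minimal illustrative sketch (ours, not the paper's implementation) of an LCSS-style measure over frame-feature sequences; unlike a rigid order-preserving distance, it can skip noisy or inserted frames without rearranging the sequence order. The epsilon-matching rule and the toy features are our own assumptions.

# Illustrative LCSS over frame-feature sequences: two frames "match" if all
# feature coordinates differ by less than eps (the threshold is an assumption).
def lcss(q, s, eps):
    m, n = len(q), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if all(abs(a - b) < eps for a, b in zip(q[i - 1], s[j - 1])):
                dp[i][j] = dp[i - 1][j - 1] + 1   # matched pair extends the LCSS
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip an element
    return dp[m][n]

# "ABCD" vs. "ACBD" as 1D features: LCSS tolerates the swap (score 3),
# whereas a strictly order-rigid frame-by-frame comparison would not.
A, B, C, D = [0.0], [1.0], [2.0], [3.0]
print(lcss([A, B, C, D], [A, C, B, D], eps=0.5))   # -> 3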
VIII. PROPOSED WORK

Motivated by the efficient query capability brought by fruitful research in high-dimensional indexing, we propose a graph transformation and matching approach to process variable-length comparison of the database video against the query. It facilitates safely pruning a large portion of irrelevant parts and rapidly locating some promising candidates for further similarity evaluations. By constructing a bipartite graph representing the similar-frame mapping relationship between Q and S with an efficient batch kNN search algorithm [18], all the possibly similar video subsequences along the 1D temporal line can be extracted. Then, to effectively but still efficiently identify the most similar subsequence, the proposed query processing is conducted in a coarse-to-fine style. Imposing a one-to-one mapping constraint similar in spirit to that of [19], Maximum Size Matching (MSM) [20] is employed to rapidly filter some actually nonsimilar subsequences at lower computational cost. The smaller number of candidates which contain eligible numbers of similar frames are then further evaluated, at relatively higher computational cost, for accurate identification. Since measuring the video similarities for all the possible 1:1 mappings in a subgraph is computationally intractable, a heuristic method, Sub-Maximum Similarity Matching (SMSM), is devised to quickly identify the subsequence corresponding to the most suitable 1:1 mapping.

A. Retrieving Similar Frames

Similar frame retrieval in S for each element qi ∈ Q is processed as a range or kNN search. Given qi and S, Algorithm 1 gives the framework for retrieving similar frames. The output set F(qi) consists of frames of S. However, as explained later, we are more inclined to have each qi retrieve the same number of similar frames, since the differences between the maximum distances dmax(qi, sj), where sj ∈ F(qi), and dmax(qi', sj'), where sj' ∈ F(qi'), can vary substantially. Therefore, kNN search is preferred.

Algorithm 1. Retrieve similar frames
Input: qi, S
Output: F(qi) - similar frame set of qi
Description:
1: if kNN search is defined then
2:   F(qi) ← {sj | sj ∈ kNN(qi)};
3:   return F(qi);
4: else
5:   F(qi) ← {sj | sj ∈ range(qi)};
6:   return F(qi);
7: end if

B. Bipartite Graph Transformation

Each frame can be placed as a node along the temporal line of a video. Given a query clip Q and database video S, a short line and a long line can be abstracted, respectively. Hereafter, each frame is no longer modeled as a high-dimensional point as in the preliminary step, but simply as a node. Q and S, which are two finite sets of nodes ordered along the temporal lines, are treated as the two sides of a bipartite graph. Formally, let G = {V, E} be a bipartite graph representing the similar-frame mappings between Q and S.

Fig. 2. Construction of bipartite graph.

Observing the similar-frame mappings along the 1D temporal line on the S side, only a small portion is densely matched, while most parts are not matched at all or are merely sparsely matched. Intuitively, the unmatched and sparsely matched parts can be directly discarded, as they clearly suggest there are no possible subsequences similar to Q, because a necessary condition for a subsequence to be similar to Q is that they share a sufficient number of similar frames [11].
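The retrieval and graph-construction steps above can be sketched as follows; this is an illustrative reconstruction rather than the paper's implementation: a brute-force scan stands in for the high-dimensional index behind the batch kNN algorithm of [18], and all function names are ours.

import numpy as np

def knn(qi, S, k):
    # F(qi): indices of the k frames of S nearest to query frame qi
    d = np.linalg.norm(S - qi, axis=1)       # Euclidean distance to every sj
    return np.argsort(d)[:k]

def bipartite_edges(Q, S, k):
    # Edge set E of G: (i, j) whenever sj is among the k nearest frames of qi
    return {(i, j) for i in range(len(Q)) for j in knn(Q[i], S, k)}

def match_counts(E, n_s):
    # Counts along the S side; stretches that stay at or near zero are the
    # unmatched / sparsely matched parts that can be discarded immediately.
    counts = np.zeros(n_s, dtype=int)
    for _, j in E:
        counts[j] += 1
    return counts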
In view of this, we avoid comparing all the possible subsequences in S, which is infeasible, and instead safely and rapidly filter out a large portion of irrelevant parts prior to similarity evaluations. To do so, the densely matched segments of S containing all the possibly similar video subsequences have to be identified. Note that it is unnecessary to maintain the entire graph. Instead, we just process the small-sized E and the subsequences in the following steps.

C. Dense Segment Extraction

Along the S side of G, with integer counts {0, 1, . . . , |Q|}, we consider where the chances of nonzero counts are relatively large. Considering potential frame alignment and gaps, segments without strictly consecutive nonzero counts, e.g., the segment {s1, . . . , s6} with counts "241101," should also be accepted. To depict the frequency of similar-frame mappings, we introduce the density of a segment.

D. Filtering by Maximum Size Matching

After locating the dense segments, we have k separate subgraphs of the form Gk = {Vk, Ek}, where Vk is the vertex set and Ek is the edge set representing similar-frame mappings. However, the high density of a segment cannot sufficiently indicate high similarity to the query, since it neglects the actual number of similar frames, temporal order, and frame alignment.

Definition 2. MSM. A matching M in G = {V, E} is a subset of E with pairwise nonadjacent edges. The size of matching M is the number of edges in M, written |M|. The MSM of G is a matching MMSM with the largest size |MMSM|.

Relative to a matching Mk in Gk = {Vk, Ek}, we say the vertices belonging to the edges of Mk are saturated by the matching, and the others are unsaturated. MSM is characterized by the absence of augmenting paths [20]: a matching Mk in Gk is its MSM if and only if Gk has no Mk-augmenting path. Starting with a matching of size 0 in each subgraph, the Augmenting Path Algorithm progressively selects an augmenting path to enlarge the current matching size by 1 at a time. We can search for an Mk-augmenting path from each Mk-unsaturated vertex. The detailed MSM algorithm can be found in [20].

E. Refinement by Sub-Maximum Similarity Matching

The above filtering step can be viewed as a rough similarity evaluation disregarding temporal information. Observing that a segment may have multiple 1:1 mappings, and that the most similar subsequence in S may only be a portion of the segment, we next further refine it to find the most suitable 1:1 mapping for accurate identification (or ranking), by considering visual content, temporal order, and frame alignment simultaneously.
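The filtering step can be sketched as below: a maximum size matching is computed by repeatedly searching for augmenting paths from unsaturated query-side vertices, in the spirit of the Augmenting Path Algorithm referenced from [20]. The adjacency-list representation and the filtering threshold are our illustrative choices, not the paper's code.

# adj[i] lists the S-side nodes similar to query frame i (edges of Gk).
def maximum_size_matching(adj, n_s):
    match_s = [-1] * n_s                 # S-side node -> matched Q-side node

    def augment(i, seen):
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                # j is free, or its current partner can be re-matched elsewhere
                if match_s[j] == -1 or augment(match_s[j], seen):
                    match_s[j] = i
                    return True
        return False

    size = 0
    for i in range(len(adj)):            # try each unsaturated qi in turn
        if augment(i, [False] * n_s):
            size += 1                    # each augmenting path grows |M| by 1
    return size

def keep_segment(adj, n_s, min_matched):
    # A dense segment survives the filter only if it shares enough
    # one-to-one similar frames with Q (the threshold is a tunable choice).
    return maximum_size_matching(adj, n_s) >= min_matched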
IX. EXPERIMENTS

A. Effectiveness

To measure the effectiveness of our approach, we use hit ratio, defined as the number of queries for which our method correctly identifies the position of the most similar subsequence (the ground truth), divided by the total number of queries. Note that since for each query there is only one target subsequence (where the original fragment was extracted) in the database, hit ratio corresponds to P(1), i.e., the precision value at the first rank. The original video has also been manually inspected so that the ground truth of each query clip can be validated.

B. Efficiency

To show the efficiency of our approach, we use response time, which indicates the average running time of a query. Without SMSM, all the possible 1:1 mappings would be evaluated. Since it is computationally intractable to enumerate all 1:1 mappings to find the most suitable one, and there is no prior practical method dealing with this problem for performance comparison, we mainly study the efficiency of our approach by investigating the effect of MSM filtering. Without MSM, all the segments extracted in dense segment extraction will be processed, while with MSM, only a small number of segments are expected. Note that the performance comparison is not affected by the underlying high-dimensional indexing method.

X. CONCLUSIONS

This paper has presented an effective and efficient query processing strategy for the temporal localization of similar content in a long unsegmented video stream, considering that the target subsequence may be an approximate occurrence with potentially different ordering or length from the query clip. In the preliminary phase, the similar frames of the query clip are retrieved by a batch query algorithm. Then, a bipartite graph is constructed to exploit the opportunity of spatial pruning; thus, the high-dimensional query and database video sequence can be transformed into the two sides of a bipartite graph. Only the dense segments are roughly obtained as possibly similar subsequences. In the filter-and-refine phase, some nonsimilar segments are first filtered; several relevant segments are then processed to quickly identify the most suitable 1:1 mapping by optimizing the factors of visual content, temporal order, and frame alignment together. In practice, visually similar videos may exhibit different orderings due to content editing, which yields some intrinsic cross mappings. Our video similarity model, which elegantly achieves a balance between the approaches of neglecting temporal order and strictly adhering to temporal order, is particularly suitable for dealing with this case, and thus can support accurate identification. Although only the color feature is used in our experiments, the proposed approach inherently supports other features. For future work, we plan to further investigate the effect of representing videos by other features, such as the ordinal signature. Moreover, the weight of each factor for measuring video similarity might be adjusted by user feedback to embody the degree of similarity more completely and systematically.

ACKNOWLEDGMENTS

Sound and Vision video is copyrighted. The Sound and Vision video used in this work is provided solely for research purposes through the TREC Video Information Retrieval Evaluation Project Collection. The authors would like to thank the anonymous reviewers for their comments, which led to improvements of this paper. This work is supported in part by the Australian Research Council under Grant DP0663272.

REFERENCES

[22] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[23] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast Subsequence Matching in Time-Series Databases," Proc. ACM SIGMOD '94, pp. 419-429, 1994.
[24] H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, "Survey of Compressed-Domain Features Used in Audio-Visual Indexing and Analysis," J. Visual Comm. and Image Representation, vol. 14, no. 2, pp. 150-183, 2003.
[25] R. Mohan, "Video Sequence Matching," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '98), pp. 3697-3700, 1998.
[26] C. Kim and B. Vasudev, "Spatiotemporal Sequence Matching for Efficient Video Copy Detection," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 127-132, 2005.
[27] X.-S. Hua, X. Chen, and H. Zhang, "Robust Video Signature Based on Ordinal Measure," Proc. IEEE Int'l Conf. Image Processing (ICIP '04), pp. 685-688, 2004.
[28] C.-Y. Chiu, C.-H. Li, H.-A. Wang, C.-S. Chen, and L.-F. Chien, "A Time Warping Based Approach for Video Copy Detection," Proc. 18th Int'l Conf. Pattern Recognition (ICPR '06), vol. 3, pp. 228-231, 2006.
[29] M.R. Naphade, M.M. Yeung, and B.-L. Yeo, "A Novel Scheme for Fast and Efficient Video Sequence Matching Using Compact Signatures," Proc. Storage and Retrieval for Image and Video Databases (SPIE '00), pp. 564-572, 2000.
[30] K.M. Pua, J.M. Gauch, S. Gauch, and J.Z. Miadowicz, "Real Time Repeated Video Sequence Identification," Computer Vision and Image Understanding, vol. 93, no. 3, pp. 310-327, 2004.
[31] K. Kashino, T. Kurozumi, and H. Murase, "A Quick Search Method for Audio and Video Signals Based on Histogram Pruning," IEEE Trans. Multimedia, vol. 5, no. 3, pp. 348-357, 2003.
[32] S.-C.S. Cheung and A. Zakhor, "Efficient Video Similarity Measurement with Video Signature," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 59-74, 2003.
[33] S.-C.S. Cheung and A. Zakhor, "Fast Similarity Search and Clustering of Video Sequences on the World-Wide-Web," IEEE Trans. Multimedia, vol. 7, no. 3, pp. 524-537, 2005.
[34] H.T. Shen, B.C. Ooi, X. Zhou, and Z. Huang, "Towards Effective Indexing for Very Large Video Sequence Database," Proc. ACM SIGMOD '05, pp. 730-741, 2005.
[35] H.T. Shen, X. Zhou, Z. Huang, J. Shao, and X. Zhou, "Uqlips: A Real-Time Near-Duplicate Video Clip Detection System," Proc. 33rd Int'l Conf. Very Large Databases (VLDB '07), pp. 1374-1377, 2007.
[36] J. Yuan, L.-Y. Duan, Q. Tian, S. Ranganath, and C. Xu, "Fast and Robust Short Video Clip Search for Copy Detection," Proc. Fifth IEEE Pacific-Rim Conf. Multimedia (PCM '04), vol. 2, pp. 479-488, 2004.
[37] J. Shao, Z. Huang, H.T. Shen, X. Zhou, E.-P. Lim, and Y. Li, "Batch Nearest Neighbor Search for Video Retrieval," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 409-420, 2008.
[38] Y. Peng and C.-W. Ngo, "Clip-Based Similarity Measure for Query-Dependent Clip Retrieval and Video Summarization," IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 612-627, 2006.
[39] D.R. Shier, "Matchings and Assignments," Handbook of Graph Theory, J.L. Gross and J. Yellen, eds., pp. 1103-1116, CRC Press, 2004.
[40] L. Chen and T.-S. Chua, "A Match and Tiling Approach to Content-Based Video Retrieval," Proc. IEEE Int'l Conf. Multimedia and Expo (ICME '01), pp. 417-420, 2001.
[41] X. Liu, Y. Zhuang, and Y. Pan, "A New Approach to Retrieve Video by Example Video Clip," Proc. Seventh ACM Int'l Conf. Multimedia (MULTIMEDIA '99), vol. 2, pp. 41-44, 1999.
[42] Y. Wu, Y. Zhuang, and Y. Pan, "Content-Based Video Similarity Model," Proc. Eighth ACM Int'l Conf. Multimedia (MULTIMEDIA '00), pp. 465-467, 2000.
AN EFFICIENT CROSS LAYER INTRUSION DETECTION TECHNIQUE FOR MANET

*V. Balaji  **N. Partheeban

*2nd yr M.E., CSE, S.A. Engineering College, Chennai-77
vbalajichinna@gmail.com  Mobile: 915083327
**Asst. Professor, Dept. of CSE, S.A. Engineering College, Chennai-77
knparthi78@gmail.com  Mobile: 9841966120
Abstract— A mobile ad hoc network (MANET) is a self-configuring network of mobile devices connected by wireless links. Security has become important to Mobile Ad Hoc Networks (MANETs) due to their nature and their use in many mission- and life-critical applications. So, there is a critical need to replace single layer intrusion detection technology with multi layer detection. An efficient cross layer intrusion detection system is proposed to discover malicious nodes and different types of DoS attacks by exploiting information available across different layers of the protocol stack, in order to improve the accuracy of detection. The fixed width clustering algorithm is used for efficient detection of anomalies in the MANET traffic and of different types of attacks in the network.

Keywords— MANET, Fixed Width Clustering Algorithm, Cross Layer

I. INTRODUCTION

Ad hoc networks are a new paradigm of wireless communication for mobile hosts. A wireless ad-hoc network consists of a collection of "peer" mobile nodes that are capable of communicating with each other without help from a fixed infrastructure. A mobile ad hoc network (MANET) does not rely on a preexisting infrastructure, such as routers in wired networks or access points in managed (infrastructure) wireless networks. Instead, each node participates in routing by forwarding data for other nodes, and so the determination of which nodes forward data is made dynamically based on the network connectivity.

Nodes in the MANET usually share the same physical media; they transmit and acquire signals at the same frequency band, and follow the same hopping sequence or spreading code. The data-link layer manages the wireless link resources and coordinates medium access among neighboring nodes. The medium access control (MAC) protocol allows mobile nodes to share a common broadcast channel. The network layer holds the multi-hop communication paths across the network. All nodes in the mobile ad hoc network function as routers that discover and maintain routes to other nodes in the network.

The nature of mobility creates new vulnerabilities due to the open medium, dynamically changing network topology, cooperative algorithms, and lack of centralized monitoring and management points. A malicious node may take advantage of the MANET to launch routing attacks, as each node acts as a router to communicate with the others. The wireless links between the nodes, along with the mobility, raise the challenge for an IDS to detect attacks. It is very difficult and challenging for an Intrusion Detection System (IDS) to fully detect routing attacks due to MANET's characteristics. So, the IDS needs a scalable architecture to collect sufficient evidence to detect routing attacks effectively.

We have proposed a new intrusion detection architecture which incorporates a cross layer that interacts between the layers. In addition to this, we have used an association module to link the OSI protocol stack and the IDS module, which results in low overhead during the data collection. We have
implemented the fixed width clustering algorithm in the anomaly detection engine for efficient detection of intrusions in ad hoc networks.

The rest of the paper is organized as follows. Section II presents the related work. A brief description of cross layer techniques in IDS, followed by the association module, is given in Section III. A detailed description of the intrusion detection module and its underlying architecture is given in Section IV. The anomaly detection mechanism used in MANET is discussed in Section V. Finally, Section VI concludes the carried out research and outlines future work.

II. RELATED WORKS

A lot of studies have been done on security prevention measures for infrastructure-based wireless networks, but little work has been done on the prospect of intrusion detection [1]. Some general approaches have been used in a distributed manner to ensure the authenticity and integrity of routing information, such as key generation and management, on the prevention side. Authentication-based approaches are used to secure the integrity and the authenticity of routing messages, such as [2], [3]. There are some difficulties that have to be faced in realizing some of these schemes, like cryptography, as they are relatively expensive on MANET because of limited computational capacity. A number of intrusion detection schemes have been presented for ad-hoc networks. In [4], Zhang proposes an architecture for a distributed and cooperative intrusion detection system for ad-hoc networks based on statistical anomaly detection techniques, but does not properly describe the simulation scenario, and the type of mobility used is not mentioned.

In [5], Huang details an anomaly detection technique that explores the correlations among the features of nodes and discusses routing anomalies. In [6], A. Mishra emphasizes the challenge of intrusion detection in ad-hoc networks and proposes the use of anomaly detection, but does not provide a detailed solution or implementation for the problem. In [7], Kisung Kim discusses the sinkhole attack, one of the representative attacks in MANET, caused by attempts to draw all network traffic to a sinkhole node. He focuses on the sinkhole problem in the Dynamic Source Routing (DSR) protocol in MANET and detects the sinkhole node using several useful sinkhole indicators derived through analyzing the sinkhole problem.

In [8], Loo presents an intrusion detection method using a clustering algorithm for routing attacks in sensor networks. It is able to detect three important types of routing attacks, and detects sinkhole attacks, an intense form of attack, effectively. There are some flaws, such as the absence of a simulation platform that can support a wider variety of attacks on larger scale networks. The fixed width clustering algorithm has been shown to be highly effective for anomaly detection in network intrusion [9]; that work presents a geometric framework for unsupervised anomaly detection, but needs more feature maps over different kinds of data and more extensive experiments evaluating the methods presented.

III. CROSS LAYER TECHNIQUES IN IDS

The very advantage of mobility in MANET leads to its vulnerabilities. For efficient intrusion detection, we have used cross layer techniques in IDS. The traditional way of layering the network, with the purpose of separating routing, scheduling, rate and power control, is not efficient for ad-hoc wireless networks. A. Goldsmith discussed that rate control, power control, medium access and routing are the building blocks of wireless network design [10].

Generally, routing is considered in a routing layer and medium access in the MAC layer, whereas power control and rate control are sometimes considered in the PHY and sometimes in the MAC layer. With the help of cross layer interaction, the routing forwards possible route choices to the MAC, and the MAC decides among the possible routes using congestion and IDS information, returning the result to the routing.

The selection of the correct combination of layers in the design of a cross layer IDS is very critical for rapidly detecting attacks targeted at, or sourced from, any layer. It is optimal to incorporate the MAC layer in the cross layer design for IDS, as DoS attacks are better detected at this layer. The routing protocol layer and MAC layer are chosen for detecting routing attacks in an efficient way. Data with behavioural information consisting of layer-specific information is collected from multiple layers and forwarded to the data analysis module, which is located in an optimal location [11]. Figure 1 illustrates the cross layer design.

This cross layer technique incorporating IDS leads to an escalating detection rate of the malicious behaviour of nodes, increasing true positives and reducing false positives in the MANET. It also alleviates congestion and can adapt to changing network and traffic characteristics. In order to evade congestion and reroute traffic, the MAC and routing layers have to cooperate with each other and with the IDS in order to avoid the insertion of malicious nodes in the new routes.
Fig. 1: Cross Layer Design

The physical layer collects various types of communication activities, including remote access and logons, user activities, data traffic and attack traces. The MAC layer contains information regarding congestion and interference. The detection mechanism for misbehaving nodes interacts with the routing layer for the detection process, as the MAC layer also helps in the detection of certain routing attacks. The MAC layer also interacts with the physical layer to determine the quality of a suggested path [12].

By combining cross layer features, attacks that manifest as inconsistencies between the layers can be detected. Furthermore, these schemes provide a comprehensive detection mechanism for all the layers, i.e., attacks originating from any layer can be detected with better detection accuracy.

Once association rules are extracted from multiple segments of a training data set, they are aggregated into a rule set. The feature sets consist of control and data frames from MAC frames, and control packets like RREQ, RREP and RERR as well as IP data packets from the network layer. All the control packets are combined into one category as routing control packets, and IP data packets as routing data packets. So, the payloads in MAC data frames contain either a routing CtrlPkt or a routing DataPkt [13]. The feature set is foreshortened by associating one or more features from different layers to a specific MAC layer feature, so that the overhead of learning is minimized. The characteristics are assorted based on dependency on time, traffic and other features [14].

Our association rule is of the form X→Y [c, s], where X and Y are itemsets with X ∩ Y = Ø, s = sup(X∪Y) is the support of the rule, and c = sup(X∪Y)/sup(X) is its confidence. Let D be a database of traffic; the association rules must have support and confidence greater than a minimum support (minsup) and a minimum confidence (minconf), respectively [15].

Support and confidence are generally used to measure the relevance of the association rules. The association rule mining task is decomposed into finding the itemsets and then the rules. The itemsets with minimum support are called frequent itemsets. The Apriori property is based on the following observation. If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is, P(I) < min_sup. If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either; that is, P(I ∪ A) < min_sup. This property belongs to a special category of properties called antimonotone, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is called antimonotone because the property is monotonic in the context of failing a test. To understand this, we look at how Lk−1 is used to find Lk for k ≥ 2. A two-step process is followed, consisting of join and prune actions.

Algorithm: Apriori. Find frequent itemsets using an iterative level-wise approach based on candidate generation.

Input: D, a database of transactions;
min_sup, the minimum support count threshold.
Output: L, frequent itemsets in D.

Method:
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk−1 ≠ ∅; k++) {
  Ck = apriori_gen(Lk−1);
  for each transaction t ∈ D {
    Ct = subset(Ck, t);
    for each candidate c ∈ Ct
      c.count++; }
  Lk = {c ∈ Ck | c.count ≥ min_sup} }
return L = ∪k Lk;

procedure apriori_gen(Lk−1: frequent (k−1)-itemsets)
for each itemset l1 ∈ Lk−1
  for each itemset l2 ∈ Lk−1
    if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k−2] = l2[k−2]) ∧ (l1[k−1] < l2[k−1]) then {
      c = l1 ⋈ l2; // join step: generate candidates
      if has_infrequent_subset(c, Lk−1) then
        delete c; // prune step: remove unfruitful candidate
      else add c to Ck; }
return Ck;

procedure has_infrequent_subset(c: candidate k-itemset; Lk−1: frequent (k−1)-itemsets); // use prior knowledge
for each (k−1)-subset s of c
  if s ∉ Lk−1 then
    return TRUE;
return FALSE;
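For reference, a condensed runnable sketch of the join-and-prune procedure above follows; it mirrors the textbook Apriori structure, with itemsets represented as frozensets (the representation and the toy transactions are our own).

from itertools import combinations

def apriori(D, min_sup):
    # L1: frequent 1-itemsets with their support counts
    counts = {}
    for t in D:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {c for c, n in counts.items() if n >= min_sup}
    L = {c: counts[c] for c in Lk}
    k = 2
    while Lk:                                 # for (k = 2; Lk-1 != empty; k++)
        # join step: merge (k-1)-itemsets that differ in a single item
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # prune step: drop candidates having an infrequent (k-1)-subset
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in D if c <= t) for c in Ck}
        Lk = {c for c, n in counts.items() if n >= min_sup}
        L.update({c: counts[c] for c in Lk})
        k += 1
    return L

# Toy transactions built from routing control/data packet features:
D = [{"RREQ", "DataPkt"}, {"RREQ", "RERR"}, {"RREQ", "DataPkt", "RERR"}]
print(apriori(D, min_sup=2))   # frequent itemsets such as {RREQ, DataPkt}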
IV. INTRUSION DETECTION SYSTEM

We have used data mining techniques in the intrusion detection module in order to improve the efficiency and effectiveness of the MANET nodes. With our studies, we have found that among all the data mining intrusion detection techniques, clustering-based intrusion detection is the most promising one because of its ability to detect new attacks. Many traditional intrusion detection techniques are limited to collecting training data from real networks, manually labeled as normal or abnormal. It is very time consuming and expensive to manually collect pure normal data and classify data in wireless networks [16].

We have used an association algorithm, Apriori, to obtain traffic features; this is then followed by a clustering algorithm. In [16], it is stated that good efficiency and performance are obtained by combining an association algorithm with a clustering algorithm. The association rules and clustering are used as the basis for the accompanying anomaly detection of routing and other attacks in MANET. Our proposed IDS architecture is shown in Fig. 2, and the IDS module is described below [17].

Fig 2: Intrusion Detection System

A. Local Data Collection
The local data collection module collects data streams of various information, traffic patterns and attack traces from the physical, MAC and network layers via the association module. The data streams can include system, user and mobile nodes' communication activities within the radio range.

B. Local Detection
The local detection module consists of an anomaly detection engine. It analyzes the local data traces gathered by the local data collection module for evidence of anomalies. A normal profile is an aggregated rule set of multiple training data segments. New and updated detection rules across ad-hoc networks are obtained from the normal profile. The normal profile consists of normal behavior patterns that are computed using trace data from a training process where all activities are normal. During the testing process, normal and abnormal activities are processed, and any deviations from the normal profiles are recorded. The anomaly detection distinguishes normalcy from anomalies by comparing the test data profiles with the expected normal profiles. If any detection rule deviates beyond a threshold interval, and if it has a very high accuracy rate, the node can determine independently that the network is under attack and initiates the alert management.

C. Cooperative Detection
When the support and confidence level is low, or the intrusion evidence at the detecting node is weak and inconclusive, the node can make a collaborative decision by gathering intelligence from its surrounding nodes via a protected communication channel. The decision of cooperative detection is based on the majority vote of the received reports indicating an intrusion or anomaly.

D. Alert Management
The alert management receives the alert from the local detection or co-operative detection depending on the
strength of the intrusion evidence. It collects the alerts in the alert cache for t seconds. If there are more abnormal predictions than normal predictions, then the state is regarded as "abnormal," and with adequate information an alarm is generated to inform that an intrusive activity is in the system.

V. ANOMALY DETECTION MECHANISM

The anomaly detection system creates a normal baseline profile of the normal activities of the network traffic. The main objective is to collect a set of useful features from the traffic to decide whether the sampled traffic is normal or abnormal. Some of the advantages of an anomaly detection system are that it can detect new and unknown attacks, it can detect insider attacks, and it is very difficult for an attacker to carry out attacks without setting off an alarm [18]. The process of anomaly detection comprises two phases: training and testing.

A. Construction of Normal Dataset
The data obtained from the audit data sources mostly contains local routing information, data and control information from the MAC and routing layers, along with other traffic statistics. The training of data may entail modeling the distribution of a given set of training points or characteristic network traffic samples.

We have to make a few assumptions so that the traffic traced from the network contains no attack traffic [19]:
• The normal traffic occurs more frequently than the attack traffic.
• The attack traffic samples are statistically different from the normal connections.

Given these two assumptions, the attacks will appear as outliers in the feature space, so the attacks can be detected by analyzing and identifying anomalies in the data set.

B. Feature Construction
For feature construction, an unsupervised method is used to construct the feature set. A clustering algorithm is used to construct features from the audit data. The feature set is created using the audit data, and the most common features, those with weight not smaller than the minimum threshold, are selected as the essential feature set. A set of significant features that differentiates the normal data from the intrusive data should be obtained from the incoming traffic. Only a small amount of semantic information is captured, which results in better detection performance and saves computation time. In feature construction, we collect traffic-related features as well as non-traffic-related features which represent routing conditions. We use some of these features for detecting DoS attacks and attacks that manipulate the routing protocol. The number of data packets received is used to detect an unusual level of data traffic, which may indicate a DoS attack based on a data traffic flood.

C. Training Normal Data Using the Cluster Mechanism
We have implemented the fixed-width clustering algorithm as an approach to anomaly detection. It calculates the number of points near each point in the feature space. In the fixed-width clustering technique, a set of clusters is formed in which each cluster has a fixed radius w, also known as the cluster width, in the feature space [20]. The cluster width w is chosen as the maximum threshold radius of a cluster.

Algorithm: Fixed Width Clustering
Input: ST, a data set containing n traffic samples; w, the cluster width.
Output: A set of clusters with their point counts.
Method:
(1) Take the first sample as the centroid of a new cluster;
(2) Repeat for each remaining sample:
(3) find the nearest cluster centroid; if the distance to it is within w, assign the sample to that cluster and update its centroid;
(4) otherwise, form a new cluster centered at the sample;
(5) Until all samples have been processed in a single pass.

Explanation of the fixed width algorithm:
A set of network traffic samples ST is obtained from the audit data for training purposes. Each sample si in the training set is represented by a d-dimensional vector of attributes. In the beginning, the set of clusters, as well as the number of clusters, is null. Since there is significant variation in each attribute, normalization is done before mapping points into the feature space, so that all features have the same influence when distances are calculated; this is obtained by normalizing each continuous attribute in terms of the number of standard deviations from the mean. The first point of the data forms the centre of a new cluster: a new cluster ψ1 is formed having centroid ψ1* from sample si. For every succeeding point, we measure the distance of each traffic sample si to the centroid of
each cluster ψ1* that has been generated in the cluster set Ψ. If the distance to the nearest cluster ψn is within w of the cluster center, then the point is assigned to that cluster, and the centroid of the closest cluster is updated; the total number of points in the cluster is incremented. Otherwise, the new point forms the centroid of a new cluster. Euclidean distance together with argmin is used because it is convenient to select the item which minimizes the function; as a result, the computational load is decreased. Moreover, the traffic samples are not stored, and only one pass is required through the traffic samples. In the final stage of training, labeling of clusters is done based on the initial assumptions: the ratio of attack traffic is very small compared with normal traffic, and the anomalous data points are statistically different from normal data points. If a cluster contains less than a threshold η% of the total set of points, then it is considered anomalous; otherwise the cluster is labeled as normal. Besides, since the point counts in the dense regions will be higher than the threshold, we only consider the points that are outliers.
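The single-pass procedure explained above can be condensed into the following illustrative sketch (ours, not the authors' implementation); it assumes features were already normalized to zero mean and unit standard deviation, and labels clusters by the η% rule.

import numpy as np

def fixed_width_clusters(ST, w):
    # One pass over the traffic samples; each cluster keeps (centroid, count).
    centroids, counts = [], []
    for s in ST:
        if centroids:
            d = [np.linalg.norm(s - c) for c in centroids]
            n = int(np.argmin(d))                  # nearest cluster
            if d[n] <= w:                          # within the fixed width w
                counts[n] += 1
                centroids[n] += (s - centroids[n]) / counts[n]  # update centroid
                continue
        centroids.append(np.asarray(s, dtype=float).copy())    # new cluster
        counts.append(1)
    return centroids, counts

def label_clusters(counts, eta):
    # A cluster is anomalous if it holds fewer than eta % of all points.
    total = sum(counts)
    return ["anomalous" if 100.0 * c / total < eta else "normal" for c in counts]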
D. Testing Phase
The testing phase takes place by comparing each new traffic sample with the cluster set Ψ to determine its anomality. The distance between a new traffic sample point si and each cluster centroid ψ1* is calculated. If the distance from the test point s to the centroid of its nearest cluster is less than the cluster width parameter w, then the traffic sample shares the label, either normal or anomalous, of its nearest cluster. If the distance from s to the nearest cluster is greater than w, then s lies in a less dense region of the feature space and is labeled as anomalous. Comparing our IDS module with [14], that system has higher complexity due to nonlinear pattern recognition, whereas the proposed IDS is simple, using association rules for anomaly profiling. Similarly, [12] has message overhead and as a result consumes more power, straining battery resources, while the proposed IDS consumes low energy by adopting association rules.

VI. CONCLUSIONS AND FUTURE WORK

An efficient intrusion detection mechanism based on anomaly detection has been presented in this paper, utilizing a clustering data mining technique. We expect that our proposed cross-layer based intrusion detection architecture will detect DoS attacks, sinkhole attacks at different layers of the protocol stack, and various types of UDP flooding attacks in an efficient way.

Future work will involve implementing the proposed architecture with the fixed width algorithm, performing simulations using the NS-2 simulator, and analyzing the results.

REFERENCES

[1] S. Jacobs, S. Glass, T. Hiller, and C. Perkins, "Mobile IP authentication, authorization, and accounting requirements," Request for Comments 2977, Internet Engineering Task Force, October 2000.
[2] K. Sanzgiri, B. Dahill, B.N. Levine, E.B. Royer, and C. Shields, "A Secure Routing Protocol for Ad-hoc Networks," in Proceedings of the International Conference on Network Protocols (ICNP), 2002.
[3] Yih-Chun Hu, Adrian Perrig, and David Johnson, "Ariadne: A Secure On-Demand Routing Protocol for Ad Hoc Networks," in Proceedings of MobiCom, 2002.
[4] Y. Zhang, W. Lee, and Y.-A. Huang, "Intrusion Detection Techniques for Mobile Wireless Networks," ACM J. Wireless Networks, pp. 545-556, 2003.
[5] Y. Huang, W. Fan, W. Lee, and P.S. Yu, "Cross-feature analysis for detecting ad-hoc routing anomalies," in Proceedings of the 23rd International Conference on Distributed Computing Systems (ICDCS), Providence, pp. 478-487, 2003.
[6] A. Mishra, K. Nadkarni, and A. Patcha, "Intrusion Detection in Wireless Ad-Hoc Networks," IEEE Wireless Communications, pp. 48-60, February 2004.
[7] Kisung Kim and Sehun Kim, "A Sinkhole Detection Method based on Incremental Learning in Wireless Ad Hoc Networks," 2008.
[8] C. Loo, M. Ng, C. Leckie, and M. Palaniswami, "Intrusion Detection for Routing Attacks in Sensor Networks," International Journal of Distributed Sensor Networks, pp. 313-332, October-December 2006.
[9] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, "A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data," in Applications of Data Mining in Computer Security, Kluwer, 2002.
[10] A. Goldsmith and S.B. Wicker, "Design challenges for energy-constrained ad hoc wireless networks," IEEE Wireless Communications, 9(4):8-27, August 2002.
[11] C.J. John Felix, A. Das, B.C. Seet, and B.S. Lee, "Cross Layer versus Single Layer Approaches for Intrusion Detection in MANET," in IEEE International Conference on Networks, Adelaide, pp. 194-199, November 2007.
[12] J.S. Baras and S. Radosavac, "Attacks and Defenses Utilizing Cross-Layer Interactions in MANET," in Workshop on Cross-Layer Issues in the Design of Tactical Mobile Ad Hoc Wireless Networks: Integration of Communication and Networking Functions to Support Optimal Information Management, Washington, DC, June 2004.
[13] L. Yu, L. Yang, and M. Hong, "Short Paper: A Distributed Cross-Layer Intrusion Detection System for Ad Hoc Networks," in Proceedings of the 1st International Conference on Security and Privacy for Emerging Areas in Communication Networks, Athens, Greece, pp. 418-420, September 2005.
[14] C.J. John Felix, A. Das, B.C. Seet, and B.-S. Lee, "CRADS: Integrated Cross Layer Approach for Detecting Routing Attacks in MANETs," in IEEE Wireless Communications and Networking Conference (WCNC), Las Vegas, CA, USA, pp. 1525-1530, March 2008.
[15] R. Shrikant, "Fast algorithms for mining association rules and sequential patterns," PhD Thesis, University of Wisconsin, Madison, 1996.
[16] S.J. Hua and M.C. Xiang, "Anomaly Detection Based on Data-Mining for Routing Attacks in Wireless Sensor Networks," in Second International Conference on Communications and Networking in China (CHINACOM '07), pp. 296-300, August 2007.
[17] R. Shrestha, K.H. Han, J.Y. Sung, K.J. Park, D.Y. Choi, and S.J. Han, "An Intrusion Detection System in Mobile Ad-Hoc Networks with Enhanced Cross Layer Features," KICS Conference, Suncheon University, pp. 264-268, May 2009.
[18] A. Patcha and J.M. Park, "An overview of anomaly detection techniques: existing solutions and latest technological trends," Elsevier Computer Networks, Vol. 51, Issue 12, pp. 3448-3470, 2007.
[19] L. Portnoy, E. Eskin, and S. Stolfo, "Intrusion detection with unlabeled data using clustering," in Proceedings of the Workshop on Data Mining for Security Applications, November 2001.
[20] C. Loo, M. Ng, C. Leckie, and M. Palaniswami, "Intrusion Detection for Routing Attacks in Sensor Networks," International Journal of Distributed Sensor Networks, pp. 313-332, October-December 2006.
WIRELESS SENSOR NETWORK SECURITY USING VIRTUAL ENERGY BASED ENCRYPTION

*S. Hanson Thaya  **Mr. C. BalaKrishnan

*Master of Engineering, Department of CSE, S.A. Engg College, Chennai
hason2001@gmail.com
**Assistant Professor, Department of CSE, S.A. Engg College, Chennai
Abstract— Designing cost-efficient, secure network protocols for Wireless Sensor Networks (WSNs) is a challenging problem because sensors are resource-limited wireless devices. Since the communication cost is the most dominant factor in a sensor's energy consumption, we introduce a wireless sensor network security scheme using virtual energy based encryption for WSNs that significantly reduces the number of transmissions needed for rekeying to avoid stale keys. The key to the hashing function dynamically changes as a function of the transient energy of the sensor, thus requiring no rekeying. Multiple sensing nodes use their own authentication keys. Beyond the goal of saving energy, minimal transmission is imperative for some military applications of WSNs where an adversary could be monitoring the wireless spectrum. Energy Based Encryption and Keying is a secure communication framework where sensed data is encoded using a scheme based on a permutation code generated via the RC4 encryption mechanism. The key to the RC4 encryption mechanism dynamically changes as a function of the residual virtual energy of the sensor. Thus, a one-time dynamic key is employed for one packet only, and different keys are used for the successive packets of the stream. The intermediate nodes along the path to the sink are able to verify the authenticity and integrity of the incoming packets using a predicted value of the key generated by the sender's virtual energy, thus requiring no specific rekeying messages. Energy Based Encryption and Keying is able to efficiently detect and filter false data injected into the network by malicious outsiders.

Keywords— Wireless Sensor Networks, Security in Wireless Sensor Networks, energy-based keying, resource-constrained devices.

1 INTRODUCTION

Today, WSNs are no longer a nascent technology, and future advances will bring more sensor applications into our daily lives as well as into many diverse and challenging application scenarios. For example, in a battlefield scenario, sensors may be used to detect the location of enemy sniper fire or to detect harmful chemical agents before they reach troops. In another potential scenario, sensor nodes forming a network under water could be used for oceanographic data collection, pollution monitoring, assisted navigation, military surveillance, and mine reconnaissance operations. The use of sensors will also evolve from merely capturing data to systems that can be used for real-time compound event alerting, from the surveillance of territorial waters to biological and chemical attack detection.

In this regard, designing secure protocols for wireless sensor networks is vital. However, designing secure protocols for WSNs requires first a detailed understanding of the WSN technology and its relevant security aspects. Compared to other wireless networking technologies, WSNs have unique characteristics that need to be taken into account when building protocols. Among many factors, the available resources (i.e., power, computational capacities, and memory) onboard the sensor nodes are severely limited.
In this paper, we focus on keying mechanisms for WSNs. There are two fundamental key management schemes for WSNs: static and dynamic. In static key management schemes, key management functions (i.e., key generation and distribution) are handled statically. That is, the sensors have a fixed number of keys loaded either prior to or shortly after network deployment. On the other hand, dynamic key management schemes perform keying functions (rekeying) either periodically or on demand as needed by the network; the sensors dynamically exchange keys to communicate. Although dynamic schemes are more attack-resilient than static ones, one significant disadvantage is that they increase the communication overhead due to keys being refreshed or redistributed from time to time in the network. There are many reasons for key refreshment, including: updating keys after a key revocation has occurred, refreshing a key so that it does not become stale, or changing keys due to dynamic changes in the topology. In this paper, we seek to minimize the overhead associated with refreshing keys to avoid them becoming stale, because the communication cost is the most dominant factor in a sensor's energy consumption.

The purpose of this paper is to develop an efficient and secure communication framework for WSN applications. The secure communication framework provides a technique to verify data in line and drop false packets from malicious nodes, thus maintaining the health of the sensor network. Wireless sensor network security using virtual energy based encryption dynamically updates keys without exchanging messages for key renewals and embeds integrity into packets, as opposed to enlarging each packet by appending message authentication codes (MACs). Specifically, each sensed datum is protected using a simple encoding scheme based on a permutation code generated with the RC4 encryption scheme and sent toward the sink. The key to the encryption scheme dynamically changes as a function of the residual virtual energy of the sensor, thus requiring no rekeying. Therefore, a one-time dynamic key is used for one message generated by the source sensor, and different keys are used for the successive packets of the stream. The nodes forwarding the data along the path to the sink are able to verify the authenticity and integrity of the data and to provide nonrepudiation. The protocol is able to continue its operations under dire communication conditions, as it may be operating in a high-error-prone deployment area such as under water.

The contributions of this paper are as follows:
1. a dynamic en route filtering mechanism that does not exchange explicit control messages for rekeying;
2. provision of one-time keys for each packet transmitted to avoid stale keys;
3. a modular and flexible security architecture with a simple technique for ensuring authenticity, integrity, and nonrepudiation of data without enlarging packets with MACs; and
4. a robust secure communication framework that is operational in dire communication situations and over unreliable medium access control layers.

This paper is structured as follows: Section 2 presents the background and motivation for the concepts discussed through the rest of the paper, followed by Sections 3, 4 and 5, and finally Section 6.

2 BACKGROUND AND MOTIVATION

One significant aspect of confidentiality research in WSNs entails designing efficient key management schemes. This is because, regardless of the encryption mechanism chosen for WSNs, the keys must be made available to the communicating nodes (e.g., sources and sink(s)). The keys could be distributed to the sensors before the network deployment, or they could be redistributed (rekeying) to nodes on demand as triggered by keying events. The former is static key management and the latter is dynamic key management. There are myriad variations of these basic schemes in the literature. The main motivation behind wireless sensor network security using virtual energy based encryption is that the communication cost is the most dominant factor in a sensor's energy consumption.

Rekeying with control messages is the approach of existing dynamic keying schemes, whereas rekeying without extra control messages is the primary feature of this framework. Dynamic keying schemes go through the phase of rekeying either periodically or on demand as needed by the network to refresh the security of the system. With rekeying, the sensors dynamically exchange keys that are used for securing the communication. Hence, the energy cost function for the keying process from a source sensor to the sink, while sending a message on a particular path with dynamic key-based schemes, can be written as follows (assuming the computation cost, Ecomp, would be approximately fixed):

EDyn = (EKdisc + Ecomp) * E[ŋh] * X / T,   (1)
where X is the number of packets in a message, T is the key refresh rate in packets per key, EKdisc is the cost of shared key discovery with the next-hop sensor after initial deployment, and E[ŋh] is the expected number of hops. In dynamic key-based schemes, T may change periodically, on demand, or after a node compromise.

E[ŋh] = D / E[dh] + 1,   (2)

where D is the end-to-end distance (m) between the sink and the source sensor node, tr is the approximated transmission range (m), and E[dh] is the expected hop distance (m). An accurate estimation of E[dh] can be found in the literature. Finally, EKdisc can be written as follows:

EKdisc = (E[Ne] * Enode) * M − 2 * Enode,   (3)

Enode = Etx + Erx + Ecomp,   (4)

where Enode is the approximate cost per node for key generation and transmission, E[Ne] is the expected number of neighbors for a given sensor, M is the number of key establishment messages between two nodes, and Etx and Erx are the energy costs of transmission and reception, respectively. Given the transmission range of the sensors (assuming bidirectional communication links for simplicity), tr, the total deployment area, A, and the total number of sensors deployed, N, E[Ne] can be computed as

E[Ne] = N * π * tr² / A.   (5)

Fig. 1. Keying cost of dynamic key-based schemes based on E[ŋh] versus EBEK.

On the other hand, EBEK does rekeying without messages. There are two operational modes of EBEK (EBEK-I and EBEK-II). The details of these modes are given in Section 4. However, for now it suffices to know that EBEK-I is representative of a dynamic system without rekeying messages but with some initial neighborhood info exchange, whereas EBEK-II is a dynamic system without rekeying messages and without any initial neighborhood info exchange. Using the energy values given, Fig. 1 shows the analytical results for the above expressions. For both EBEK modes, we assume there would be a fixed cost of Ecomp, because EBEK does not exchange messages to refresh keys, but for EBEK-I we also included the cost of EKdisc. With this initial analysis, we see that dynamic key-based schemes, in this scenario, spend a large amount of their energy transmitting rekeying messages. With this observation, EBEK is designed to provide the same benefits as dynamic key-based schemes, but with low energy consumption. It does not exchange extra control messages for key renewal. Hence, energy is only consumed for generating the keys necessary for protecting the communication. The keys are dynamic; thus, one key per packet is employed. This makes EBEK more resilient to certain attacks (e.g., replay attacks, brute-force attacks, and masquerade attacks).
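To see how expressions (1)-(5) fit together, here is a small illustrative calculation; every numeric value below is a placeholder of our choosing, not a figure from the paper, and Eq. (2) follows the reconstruction given above.

import math

def expected_neighbors(N, tr, A):
    return N * math.pi * tr ** 2 / A                  # Eq. (5)

def e_node(Etx, Erx, Ecomp):
    return Etx + Erx + Ecomp                          # Eq. (4)

def e_kdisc(E_Ne, Enode, M):
    return E_Ne * Enode * M - 2 * Enode               # Eq. (3)

def expected_hops(D, E_dh):
    return D / E_dh + 1                               # Eq. (2)

def e_dyn(EKdisc, Ecomp, E_nh, X, T):
    # Eq. (1): keying energy of a dynamic scheme that rekeys every T packets
    return (EKdisc + Ecomp) * E_nh * X / T

E_Ne = expected_neighbors(N=500, tr=40.0, A=1000.0 ** 2)
Enode = e_node(Etx=0.3, Erx=0.2, Ecomp=0.1)           # mJ, placeholder values
print(e_dyn(e_kdisc(E_Ne, Enode, M=4), 0.1,
            expected_hops(D=600.0, E_dh=35.0), X=100, T=10))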
[Fig. 2. Modular structure of wireless sensor network security using virtual energy-based encryption: a source node (keying, crypto, and forwarding modules), forwarding nodes that check and authenticate traffic and forward or drop data, neighbor nodes, and the sink]

3 SEMANTICS OF EBEK

The EBEK framework is comprised of three modules: Energy-Based Keying, Crypto, and Forwarding. The energy-based keying process involves the creation of dynamic keys. Contrary to other dynamic keying schemes, it does not exchange extra messages to establish keys. A sensor node computes keys based on the residual virtual energy of the sensor. The key is then fed into the crypto module. The crypto module in EBEK employs a simple encoding process, which is essentially a permutation of the bits in the packet according to the dynamically created permutation code generated via RC4. The encoding is a simple encryption mechanism adopted for EBEK. However, EBEK's flexible architecture allows for the adoption of stronger encryption mechanisms in lieu of encoding. Last, the forwarding module handles the process of sending or receiving encoded packets along the path to the sink. A high-level view of the EBEK framework and its underlying modules is shown in Fig. 2. These modules are explained in further detail below. Important notations used are given in Table 1.
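A minimal sketch of this keying-and-encoding idea follows; the SHA-256 key derivation and the byte-level (rather than bit-level) permutation are simplifications of ours, not EBEK's exact construction:

import hashlib

def derive_key(virtual_energy, node_id):
    # Key derived locally from the node's residual virtual energy;
    # no control messages are exchanged, since watching forwarders
    # track the same value. (Illustrative derivation only.)
    material = f"{node_id}:{virtual_energy:.6f}".encode()
    return hashlib.sha256(material).digest()[:16]

def rc4_permutation(key):
    # RC4 key-scheduling: yields a key-dependent permutation state S
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def encode_packet(payload, key):
    # Permute payload bytes in the RC4-derived order; the receiver
    # recomputes the same order from its own copy of the virtual
    # energy and inverts the permutation.
    S = rc4_permutation(key)
    order = sorted(range(len(payload)), key=lambda i: (S[i % 256], i))
    return bytes(payload[i] for i in order)

key = derive_key(virtual_energy=97.25, node_id=42)
stego = encode_packet(b"sensor reading", key)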
4 RELATED WORK

En route dynamic filtering of malicious packets has been the focus of several studies, including DEF by Yu and Guan. As the details are given in the performance evaluation section, where they were compared with the EBEK framework, the reader is referred to that section so as not to replicate the same information here. Moreover, Ma's work applies the same filtering concept at the sink and utilizes packets with multiple MACs appended. A work proposed by Hyun and Kim uses relative location information to make compromised data meaningless and to protect the data without cryptographic methods. Using static pairwise keys and two MACs appended to the sensor reports, "an interleaved hop-by-hop authentication scheme for filtering of injected false data" was proposed by Zhu et al. to address both the insider and outsider threats. However, the common downside of all these schemes is that they are complicated for resource-constrained sensors, and they either utilize many keys or they transmit many messages in the network, which increases the energy consumption of WSNs. Also, these studies have not been designed to handle dire communication scenarios, unlike EBEK. Another significant observation about all of these works is that a realistic energy analysis of the protocols was not presented.

Last, the concept of dynamic energy-based encoding and filtering was originally introduced by the DEEF framework. Essentially, EBEK has been largely inspired by DEEF. However, EBEK improves on DEEF in several ways. First, EBEK utilizes virtual energy in place of actual battery levels to create dynamic keys. EBEK's approach is more reasonable because, in real life, battery levels may fluctuate and the differences in battery levels across nodes may spur synchronization problems, which can cause packet drops. Second, EBEK integrates handling of communication errors into its logic, which is missing in DEEF. Last, EBEK is implemented based on a realistic WSN routing protocol, i.e., Directed Diffusion, while DEEF articulates the topic only theoretically. Another crucial idea of this paper is the notion of sharing a dynamic cryptic credential (i.e., virtual energy) among the sensors. A similar approach was suggested in the SPINS study via the SNEP protocol. In particular, nodes share a secret counter when generating keys, and it is updated for every new key. However, the SNEP protocol does not consider packets dropped in the network due to communication errors. Although another study, MiniSec, recognizes this issue, the solution it suggests still increases the packet size by including some parts of a counter value in the packet structure. Finally, one useful pertinent work surveys cryptographic primitives and implementations for sensor nodes.

5 CONCLUSION AND FUTURE WORK

Communication is very costly for wireless sensor networks (WSNs) and for certain WSN applications. Independent of the goal of saving energy, it may be very important to minimize the exchange of messages (e.g., in military scenarios). To address these concerns, we presented a secure communication framework for WSNs called Energy-Based Encryption and Keying. In comparison with other key management schemes, EBEK has the following benefits: 1) it does not exchange control messages for key renewals and is therefore able to save more energy and is less chatty,

2) it uses one key per message, so successive packets of the stream use different keys, making EBEK more resilient to certain attacks (e.g., replay attacks, brute-force attacks, and masquerade attacks), and 3) it unbundles key generation from security services, providing a flexible modular architecture that allows for an easy adoption of different key-based encryption or hashing schemes. We have evaluated EBEK's feasibility and performance through both theoretical analysis and simulations. Our results show that the different operational modes of EBEK (I and II) can be configured to provide optimal performance in a variety of network configurations, depending largely on the application of the sensor network. We also compared the energy performance of our framework with other en route malicious data filtering schemes. Our results show that EBEK performs better (in the worst case, a 60-100 percent improvement in energy savings) than the others while providing support for communication error handling, which was not the focus of earlier studies. Our future work will address insider threats and dynamic paths.

6 REFERENCES

[1] S. Uluagac, C. Lee, R. Beyah, and J. Copeland, "Designing Secure Protocols for Wireless Sensor Networks," Wireless Algorithms, Systems, and Applications, vol. 5258, pp. 503-514, Springer, 2008.
[2] H. Hou, C. Corbett, Y. Li, and R. Beyah, "Dynamic Energy-Based Encoding and Filtering in Sensor Networks," Proc. IEEE Military Comm. Conf. (MILCOM '07), Oct. 2007.
[3] F. Ye, H. Luo, S. Lu, and L. Zhang, "Statistical En-Route Filtering of Injected False Data in Sensor Networks," IEEE J. Selected Areas in Comm., vol. 23, no. 4, pp. 839-850, Apr. 2005.
[4] Z. Yu and Y. Guan, "A Dynamic En-Route Scheme for Filtering False Data Injection in Wireless Sensor Networks," Proc. IEEE INFOCOM, pp. 1-12, Apr. 2006.
[5] C. Kraub, M. Schneider, K. Bayarou, and C. Eckert, "STEF: A Secure Ticket-Based En-Route Filtering Scheme for Wireless Sensor Networks," Proc. Second Int'l Conf. Availability, Reliability and Security (ARES '07), pp. 310-317, Apr. 2007.
[6] S. Zhu, S. Setia, S. Jajodia, and P. Ning, "An Interleaved Hop-by-Hop Authentication Scheme for Filtering of Injected False Data in Sensor Networks," Proc. IEEE Symp. Security and Privacy, 2004.
[7] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, "SPINS: Security Protocols for Sensor Networks," Proc. ACM MobiCom, 2001.
[8] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless Sensor Networks: A Survey," Computer Networks, vol. 38, no. 4, pp. 393-422, Mar. 2002.
[9] C. Vu, R. Beyah, and Y. Li, "A Composite Event Detection in Wireless Sensor Networks," Proc. IEEE Int'l Performance, Computing, and Comm. Conf. (IPCCC '07), Apr. 2007.
[10] L. Eschenauer and V.D. Gligor, "A Key-Management Scheme for Distributed Sensor Networks," Proc. Ninth ACM Conf. Computer and Comm. Security, pp. 41-47, 2002.
[11] M. Eltoweissy, M. Moharrum, and R. Mukkamala, "Dynamic Key Management in Sensor Networks," IEEE Comm. Magazine, vol. 44, no. 4, pp. 122-130, Apr. 2006.
[12] K. Akkaya and M. Younis, "A Survey on Routing Protocols for Wireless Sensor Networks," Ad Hoc Networks, vol. 3, pp. 325-349, May 2005.
[13] S. Uluagac, R. Beyah, and J. Copeland, "Secure Source-Based Time Synchronization (SOBAS) for Wireless Sensor Networks," technical report, Comm. Systems Center, School of Electrical and Computer Eng., Georgia Inst. of Technology, http://users.ece.gatech.edu/selcuk/sobas-csc-techreport.pdf, 2009.
[14] R. Venugopalan et al., "Encryption Overhead in Embedded Systems and Sensor Network Nodes: Modeling and Analysis," Proc. ACM Int'l Conf. Compilers, Architecture, and Synthesis for Embedded Systems (CASES '03), pp. 188-197, 2003.
[15] M. Passing and F. Dressler, "Experimental Performance Evaluation of Cryptographic Algorithms on Sensor Nodes," Proc. IEEE Int'l Conf. Mobile Adhoc and Sensor Systems, pp. 882-887, Oct. 2006.
[16] J. Hyun and S. Kim, "Low Energy Consumption Security Method for Protecting Information of Wireless Sensor Networks," Advanced Web and Network Technologies, and Applications, vol. 3842, pp. 397-404, Springer, 2006.
[17] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, "SPINS: Security Protocols for Sensor Networks," Proc. ACM MobiCom, 2001.


ENSEMBLE REGISTRATION OF MULTI-SENSOR IMAGES
*S.Alagumani, **Mrs.S.Vanitha
*Computer Science Department, S.A. Engineering College, Chennai-77
r.alagumani@gmail.com
**Computer Science Department, S.A. Engineering College, Chennai-77
Vanith_81@yahoo.co.in

Abstract: Image registration is a fundamental operation in image analysis. Conventionally, the images of a set are registered by choosing one image as a template, and every other image is registered to it. The problem with this pairwise approach is that the results depend on which image is chosen as the template. Since different sensors create images with different features, the issue is particularly acute for multi-sensor ensembles. The problem of registration becomes more difficult when the images come from different sources. This paper addresses the question of how to register more than two images. We present a method that employs clustering to simultaneously register an entire ensemble of images. The method clusters in the JISP, jointly modeling the distribution of points in the JISP as it estimates the motion parameters. The method computes the registration solution and, at the same time, generates a model of the transfer functions among the images of the ensemble.

Index Terms—registration, multi-sensor, multi-image, mutual information, Gaussian mixture models.

I. INTRODUCTION

Image registration is the process of transforming different sets of data into one coordinate system. The data may be multiple photographs, data from different sensors, from different times, or from different viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements.

This paper addresses the question of how to register more than two images. Suppose you have several images—all of the same content—and you want to register them all together. We call this collection of images an ensemble. The vast majority of registration methods are designed to register only two images at a time. It is not clear how to use these pairwise methods for ensemble registration.

The problem of registration becomes more difficult when the images come from different sources. For example, a body part could be imaged with different modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET); a region of the earth could be captured by satellite imagery using a variety of different sensors; or several images of a face could be acquired under different illumination conditions. In these cases, the image intensities cannot be compared directly because, although the images depict the same content, they do so with different transfer functions. We refer to such registration problems as multisensor registration.

Pairwise ensemble registration has the undesirable property that the solution depends on which pairs of images are chosen and registered. We will refer to this issue as selection dependency. In addition, most pairwise registration methods do not offer a way to guarantee that redundancy in the solution is consistent.

[Fig. 1. Success rate of the existing pairwise registration method]

Consider a pairwise method that registers phantom image A to B, and B to C. By composing those two transformations, one can derive a transformation from A to C. However, it is extremely unlikely that registering A to C directly will yield exactly the same transformation. We refer to this phenomenon as internal inconsistency. We hypothesize that a registration strategy that registers all the images simultaneously can avoid both selection dependency and internal inconsistency. That is, including all the images in a single, global registration problem precludes the need to choose which pairs to register, while generating a solution that is not redundant (and, thus, is internally consistent). Moreover, we hypothesize that the statistical power of using all the images at the same time, rather than just two at a time, will yield more accurate registration solutions.

In this paper, we present a method that employs clustering to simultaneously register an entire ensemble of images. The method computes the registration solution and, at the same time, generates a model of the transfer functions among the images of the ensemble.

A. Goals

Our major contributions are summarized as follows:

1) Image transformation: Joint alignment of an image ensemble can rectify images in the spatial domain such that the aligned images are as similar to each other as possible. This important technology has been applied to various object classes and medical applications. However, previous approaches to joint alignment work on an ensemble of a single object class. Given an ensemble with multiple object classes, we propose an approach to automatically and simultaneously solve two problems, image alignment and clustering. Both the alignment parameters and the clustering parameters are formulated into a unified objective function, whose optimization leads to an unsupervised joint estimation approach. It is further extended to semi-supervised simultaneous estimation where a few labeled images are provided. Extensive experiments on diverse real-world databases demonstrate the capabilities of our work on this challenging problem.

2) Image recognition: The recognition of members of certain object classes, such as faces or cars, can be substantially improved by first transforming a detected object into a canonical pose. Such alignment reduces the variability that a classifier/recognizer must cope with in the modeling stage. Given a large set of training images, one popular alignment approach is called congealing, which jointly estimates the alignment/warping parameters in an unsupervised manner for each image in an ensemble. It has been shown that congealing can be reliably performed for faces and that it improves appearance-based face recognition performance. The conventional congealing approach works on an image ensemble of a single object class. However, in practice we often encounter situations where multiple object classes, or object modes, are exhibited in an ensemble.

II. BACKGROUND

Consider two images, one overlaid on the other. Each pixel corresponds to two intensity values, one from each of the two images. This 2-tuple can be plotted in the joint intensity space, where each axis corresponds to intensity from one of the images. Plotting the points for all the pixels creates a scatter plot in this joint intensity space, and we refer to this scatter plot as the joint intensity scatter plot, or JISP.

The implicit assumption linking different images of the same object is that they are recognizable as the same object because of some consistency by which intensities are assigned to components in the image. The pixels with intensities near x in one image often correspond to pixels with intensities near y in the other image. We call this correspondence an intensity mapping. An intensity mapping need not be one-to-one.

Each object in an image corresponds to a coherent collection of points in the JISP. As two images are moved out of register, the spatial correspondence of objects in the images gets disturbed, causing the coherence of the JISP to be disrupted. The clusters and swaths of scatter points spread out and move around because some bone pixels are now paired with muscle pixels, others with fat pixels, etc.
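A small numpy sketch makes the JISP concrete (the images and bin count below are arbitrary choices of ours; the entropy computed here is the dispersion measure discussed next):

import numpy as np

def jisp_points(img_a, img_b):
    # Each pixel contributes one 2-tuple of co-located intensities;
    # the scatter of these tuples is the JISP.
    return np.stack([img_a.ravel(), img_b.ravel()], axis=1)

def jisp_entropy(img_a, img_b, bins=64):
    # Bin the JISP into a joint histogram; lower entropy means a more
    # compact scatter plot, i.e., better-registered images.
    h, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = h / h.sum()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

a = np.random.rand(128, 128)
b = 1.0 - a                     # a perfect (inverted) intensity mapping
print(jisp_entropy(a, b))       # low entropy: the JISP is a thin line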

Intensity-based multi-sensor image registration is based on this observation. The objective is to move the images until the JISP is optimally coherent, or minimally dispersed. One of the most successful applications of this idea uses the entropy of the joint histogram to quantify dispersion. Given the JISP between two images, one forms a joint histogram to reflect the density of points in the scatter plot. One can compute the entropy of this histogram. The lower the entropy, the more compact and tightly clustered the scatter plot, and hence the more closely registered the two images.

The same idea can be applied to ensemble registration. The problem with the entropy-based methods is that they do not scale well for registration with more than two images. The joint histogram is an intermediary to those cost functions, and as you add more images to the problem, the number of histogram bins increases exponentially. For example, the joint histogram among five images, with each axis partitioned into 64 bins, has 2^30 bins (over 1 billion). With 256 intensity bins per image, it gives us 2^40 bins (over 1 trillion). Hence, these histogram-based methods are infeasible for ensemble registration.

Some registration methods measure the dispersion in the JISP without the need to form the joint histogram. The dispersion is quantified as the length of a minimum-length spanning tree on the joint intensity scatter plot. Roche et al. model the clusters in the JISP as a polynomial, thus assuming a functional relation between the intensities in the two images. This is often not the case. Some other ensemble registration methods have recently emerged in the literature. However, these methods have not been demonstrated on multi-sensor image ensembles, but rather focus on the problem of registering a set of images from the same modality to form a template (or so-called atlas). A different domain-specific method was designed to simultaneously register sets of brain MR images, but relies on the use of a human brain atlas to perform tissue classification, and then aligns the tissue-classification images. Finally, one method jointly registers and clusters a set of motion-corrupted images, automatically grouping images by similarity. However, that method assumes that the set of images is composed of moved and noisy versions of a set of prototype images, so the registration of the images to their class archetype amounts to mono-modal registration. Hence, these methods are not suitable for general-purpose multi-sensor ensemble registration.

III. ENSEMBLE REGISTRATION METHOD

Our approach to minimizing the dispersion in the JISP involves two steps:
(i) Density Estimation
(ii) Motion Adjustment

Suppose we are registering an ensemble of D images. Then, each pixel in our image domain has D values associated with it. We will refer to the vector of intensities for a single pixel as an "intensity vector", and denote the intensity vector for pixel x by Ix. Let us represent our density estimate by the parameters Φ. If we model the pixels as spatially independent variables, the likelihood of observing the images can be written

L(Φ) = Π_x p(Ix | Φ),

where p is a probability function (defined later) and x denotes a pixel in our image domain (usually a subset of R^2 or R^3). Thus, L(Φ) is the probability of observing the set of intensity vectors, given the distribution specified by Φ.

Because of the form of L, it is easier to optimize its logarithm, log L, because the product over x turns into a sum:

log L(Φ) = Σ_x log p(Ix | Φ).

Our aim is to maximize log L(Φ, Θ), where Θ holds the motion parameters. There are three parts to our algorithm: the Gaussian mixture model, density estimation, and motion adjustment.

F. Gaussian Mixture Model

• The density of points in the JISP is modelled using a Gaussian mixture model (GMM).
• The mixture consists of K Gaussian components, each specified by a mean μk and a covariance matrix Σk. Then, for a single pixel location x, the likelihood of observing the intensity vector Ix is

p(Ix | Φ) = Σ_{k=1..K} wk N(Ix; μk, Σk),

where the wk are the mixture weights.

G. Density Estimation

• Taking Θ to be the correct motion, we can improve our density estimate by optimizing log L(Φ, Θ) with respect to the probability density function parameters Φ.
• For the GMM, we can find the optimal value iteratively using the expectation-maximization (EM) algorithm.
• The algorithm has an expectation step that maps scatter points to clusters, followed by a maximization step that re-estimates the optimal clusters. The advantage of using this algorithm with a GMM is that each iteration has a closed-form, least-squares solution.
• In the context of ensemble registration, the expectation step divvies up the membership of each intensity vector among the K clusters. The membership of pixel Ix in cluster k is

p(k | Ix) = wk N(Ix; μk, Σk) / Σ_j wj N(Ix; μj, Σj).

H. Motion Adjustment

• The other half of the method involves holding Φ fixed and using it to find a motion increment that moves all the scatter points so that the overall log-likelihood, log L(Φ, Θ), is increased.
• We describe here a Newton-type step. To optimize log L(Φ, Θ) with respect to the motion parameters Θ, we set its gradient vector to zero.

IV. PROCESS DESCRIPTION

A. Clustering Algorithm

We use a multiresolution framework in which images are registered first at a low resolution, and then at successively higher resolutions. The solution at each scale is used as an initial guess for the next scale. In general, images were registered at scales of 10%, 20%, 50%, and then finally 100%. Though not required by our model, each image in the ensemble was subject to the same type of spatial transformation (either rigid-body or affine). Each image then had the same number of motion parameters, M, associated with its own transformation. Thus, the total number of motion parameters stored in the vector Θ is MD.

The method computes the registration solution and, at the same time, generates a model of the transfer functions among the images of the ensemble. Our method requires an initial density estimate. The initialization has three phases, with increasing degrees of freedom in the maximization step. In the first phase, only the means are adjusted. In the second phase, the weights are also adjusted. Finally, in the last phase, the covariances are also adjusted.
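For concreteness, here is a compact numpy sketch of one such EM iteration over the intensity vectors; the array shapes and in-place updates are assumptions of this sketch, not prescriptions from the paper:

import numpy as np

def em_step(vectors, weights, means, covs):
    # One EM iteration of the density estimate, with motion held fixed.
    N, d = vectors.shape
    K = len(weights)
    resp = np.zeros((N, K))
    # Expectation: divide each vector's membership among the K clusters
    for k in range(K):
        diff = vectors - means[k]
        inv = np.linalg.inv(covs[k])
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(covs[k]))
        resp[:, k] = weights[k] * np.exp(-0.5 * np.sum((diff @ inv) * diff, axis=1)) / norm
    resp /= resp.sum(axis=1, keepdims=True)
    # Maximization: closed-form re-estimate of each cluster
    for k in range(K):
        r = resp[:, k]
        weights[k] = r.mean()
        means[k] = (r[:, None] * vectors).sum(axis=0) / r.sum()
        diff = vectors - means[k]
        covs[k] = (r[:, None] * diff).T @ diff / r.sum()
    return weights, means, covs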

Ten EM iterations are executed for each of these three phases. The algorithm has an expectation step that maps scatter points to clusters, followed by a maximization step that re-estimates the optimal clusters. The advantage of using this algorithm with a GMM is that each iteration has a closed-form, least-squares solution. Density estimation of the clusters is modeled with a Gaussian mixture model (GMM) and is established iteratively using the expectation-maximization (EM) method. The motion parameters are also solved using an iterative Newton-type method. The iterates of these two methods are interleaved, thereby solving the two problems (density estimation and registration) in synchrony.

V. SYSTEM ANALYSIS

A. Architecture

[Fig 2. Architecture: input image → image conversion (IMG1-IMG4) → image template → image comparison → output image]

The proposed system architecture consists of four main modules (objectives), namely, the user interface module, the image conversion module, the template module and the recognition module.

B. Modules

A modular design reduces complexity, facilitates change (a critical aspect of software maintainability) and results in easier implementation by encouraging parallel development of different parts of the system. Software with effective modularity is easier to develop because functions may be compartmentalized and interfaces are simplified. Software architecture embodies modularity; that is, software is divided into separately named and addressable components called modules that are integrated to satisfy problem requirements. The following are the modules of the project, planned to complete the project with respect to the proposed system, while overcoming the limitations of the existing system and providing support for future enhancement:

(i) user interface design
(ii) image conversion
(iii) recognition
(iv) output

We design the windows for the project in user interface design. These windows are used to give input to the process and to display the output. Recognition is performed at the level of informative features extracted from images of objects. The features are combined in vectors called templates. Templates of different objects are stored in a library.

VI. EMPIRICAL EVALUATION

A. Satellite Images

Out of the 300 pairwise registrations (10 trials, each with 30 registration pairs), the initial average error for the unregistered images was 15.9 pixels. The ensemble clustering registration method failed on 20 of them. The pairwise clustering method failed on 150 pairs (50%), and FLIRT's correlation ratio method failed on 37 of the pairs (12%). Both clustering methods used a multiresolution framework with scales of 20%, 50% and 100%.

It is worth noting that the 20 misregistration cases for the ensemble method were the result of only two registration failures. In each of two trials, one of the six images failed to converge to the other five (and vice versa), and was thus recorded as 10 misregistered image pairs. Of the successful cases, the ensemble clustering method had a mean error of 0.31, while the pairwise clustering method and FLIRT's CR method reported 0.65 and 0.41, respectively. The method presented here can thus be used, in general, to register images from different sensors.

In this paper, we have not considered strategies for increasing or decreasing the number of Gaussian components during registration. However, we conjecture that the number of Gaussian components can be used in a manner similar to a multi-resolution strategy: starting with fewer components will help guide the registration process toward the global optimum, while increasing the number of components as convergence progresses can improve accuracy.

B. Number of Gaussian Components

Table II shows the results of running the ensemble clustering method using different numbers of Gaussian components. The mean error (over 10 trials) suggests that K, the number of Gaussian components, plays a significant role in the success or failure of the ensemble clustering registration method. With too few components (3 instead of 5), the registration success rate declined from 100% to 63.3%. Having too many components also reduced the success rate, though only slightly (93.3%). However, the number of components did not seem to have a significant impact on the accuracy of the trials that did converge.

It should be noted that cluster-modeling hiccups can be observed with any number of Gaussian components. For example, some K = 5 cases showed stretching of a Gaussian component to model two clusters, thereby freeing up one component to model a partial-volume branch (similar to that shown in (d)). However, these variations do not necessarily devastate the registration.

VII. DISCUSSION

Our method can be viewed as a parametric regression method, with the number of parameters dictated by the number of Gaussian components. The clustering registration method scales linearly with the number of Gaussian components (k) and the number of pixels (N). However, the computation time is proportional to the square of the number of motion parameters (M) and the cube of the number of images (D) because of the matrix products in (17). More precisely, the method has computational complexity O(kNM^2 D^3).

VIII. SUMMARY AND FUTURE WORK

Ensemble registration is the process of registering multiple images together simultaneously within a single optimization problem. This approach was not previously feasible for multisensor registration because the high-dimensional joint histogram was too large to store in memory. Instead, we use a Gaussian mixture model to perform density estimation of the content in the joint intensity space. This GMM naturally leads to a cost function based on likelihood. We formulate an optimization problem that has two aspects, developing solutions for the density estimation and motion parameters in synchrony. Within each iteration, we hold the motion parameters fixed and update the density estimation parameters, and then hold the density estimation parameters fixed and update the motion parameters.

Our experiments show that ensemble registration is more robust than pairwise registration. The content shared by one pair of images might be quite different from the content shared by another pair of images. The key is to leverage all these correspondences simultaneously. Ensemble registration does exactly that, implicitly coupling the content of all the images into one optimization problem.

REFERENCES

[1] M. Jenkinson and S. Smith, "A global optimisation method for robust affine registration of brain images," Med Image Anal, vol. 5, no. 2, pp. 143-156, 2001.
[2] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens, and G. Marchal, "Automated multi-modality image registration based on information theory," in Proceedings of Info Proc Med Imaging, Y. Bizais, C. Barillot, and R. Di Paola, Eds., 1995, pp. 263-274.

[3] W. M. Wells III, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis, "Multi-modal volume registration by maximization of mutual information," Med Image Anal, vol. 1, no. 1, pp. 35-51, 1996.
[4] C. Studholme, D. L. G. Hill, and D. L. Hawkes, "An overlap invariant entropy measure of 3D medical image alignment," Pattern Recognition, vol. 32, pp. 71-86, 1999.
[5] H. Neemuchwala, A. Hero, and P. Carson, "Image matching using alpha-entropy measures and entropic graphs," Signal Processing, vol. 85, no. 2, pp. 277-296, 2005.
[6] A. Roche, G. Malandain, X. Pennec, and N. Ayache, "The correlation ratio as a new similarity measure for multimodal image registration," in MICCAI'98, Lecture Notes in Computer Science, W. Wells, A. Colchester, and S. Delp, Eds., 1998, pp. 1115-1124.
[7] A. Guimond, A. Roche, N. Ayache, and J. Meunier, "Three-dimensional multimodal brain warping using the demons algorithm and adaptive intensity corrections," IEEE Trans Med Imaging, vol. 20, no. 1, pp. 58-69, January 2001.
[8] M. E. Leventon and W. E. L. Grimson, "Multi-modal volume registration using joint intensity distributions," in MICCAI'98, Lecture Notes in Computer Science, W. Wells, Ed., Springer-Verlag, 1998, pp. 1057-1066.
[9] R. P. Woods, S. T. Grafton, C. J. Holmes, S. R. Cherry, and J. C. Mazziotta, "Automated image registration: I. General methods and intrasubject, intramodality validation," J Comput Assist Tomogr, vol. 22, pp. 139-152, 1998.
[10] K. K. Bhatia, J. V. Hajnal, B. K. Puri, A. D. Edwards, and D. Rueckert, "Consistent groupwise non-rigid registration for atlas construction," in Proc. of the IEEE International Symposium on Biomedical Imaging (ISBI'04), vol. 1, April 2004, pp. 908-911.

EMBEDDING CRYPTOGRAPHY IN VIDEO STEGANOGRAPHY

*G.SHOBA, B.E., M.E.    **S.UMA, B.E., M.E.
*Dr.Paul's Engineering College, mailtoshoba@gmail.com, 09443033607
**Dr.Paul's Engineering College, dewuma@gmail.com, 08825226654
Abstract: Steganography is the art of hiding information in ways that avert the revealing of hidden messages, whereas cryptographic techniques try to conceal the contents of a message. We present a video steganographic scheme that can provide provable security with high computing speed and that embeds secret messages into images without producing noticeable changes. Here we embed data in video frames. In this work, we show a model for a case where extreme security is needed. In such a case, steganocryptography (steganography and cryptography) is used. In this model we use Secure Hash Algorithm-2.

1. Introduction

The security and privacy of digital videos have become increasingly important in today's highly computerized and interconnected world. Digital media content must be protected in applications such as pay-per-view TV or confidential video conferencing, as well as in medical, industrial or military multimedia systems. With the rise of wireless portable devices, many users seek to protect the private multimedia messages that are exchanged over wireless or wired networks. In general, applying a well-established, general-purpose hash function encryption algorithm to ensure confidentiality during video transmission is a good idea from a security point of view.

Historically, cryptography referred almost exclusively to encryption, which is the process of converting ordinary information (plaintext) into an unintelligible form, i.e., ciphertext. Decryption is the reverse; in other words, moving from the unintelligible ciphertext back to plaintext. A cipher is a pair of algorithms that create the encryption and the reversing decryption. The detailed operation of a cipher is controlled both by the algorithm and, in each instance, by a key. This is a secret parameter (ideally known only to the communicants) for a specific message exchange context. A "cryptosystem" is the ordered list of elements of finite possible plaintexts, finite possible ciphertexts, finite possible keys, and the encryption and decryption algorithms which correspond to each key. Keys are important, as ciphers without variable keys can be trivially broken with only the knowledge of the cipher used, and are therefore useless (or even counter-productive) for most purposes.

Text, image, audio, and video can all be represented as digital data. The explosion of Internet applications leads people into the digital world, and communication via digital data has become recurrent. However, new issues also arise and have been explored, such as data security in digital communications, copyright protection of digitized properties, invisible communication via digital media, etc.

Steganography is the art of hiding information in ways that prevent the detection of the hidden message. In steganography, the object of communication is the hidden message and the cover data are only the means of sending it. The secret information as well as the cover data can be any multimedia data, like text, image, audio, or video.

2. Cryptography

Data that can be read and understood without any special measures is called plaintext or cleartext. The method of disguising plaintext in such a way as to hide its substance is called encryption. Encrypting plaintext results in unreadable gibberish called ciphertext. You use encryption to ensure that information is hidden from anyone for whom it is not intended, even those who can see the encrypted data. The process of reverting ciphertext to its original plaintext is called decryption. A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and decryption process. A cryptographic algorithm works in combination with a key—a word, number, or phrase—to encrypt the plaintext. The same plaintext encrypts to different ciphertext with different keys. The security of

encrypted data is entirely dependent on two things: the strength of the cryptographic algorithm and the secrecy of the key.

[Fig 1. Process of Encryption and Decryption: Plain Text → Encryption → Cipher Text → Decryption → Plain Text]

2.1 TYPES OF CRYPTOGRAPHIC ALGORITHMS

There are several ways of classifying cryptographic algorithms. For the purposes of this report, they will be categorized based on the number of keys that are employed for encryption and decryption, and further defined by their application and use. The following three types of algorithms are discussed.

Symmetric Key Cryptography

The most widely used symmetric key cryptographic method is the Data Encryption Standard (DES). It is still the most widely used symmetric-key approach. It uses a fixed-length, 56-bit key and an efficient algorithm to quickly encrypt and decrypt messages. It can be easily implemented in hardware, making the encryption and decryption process even faster. In general, increasing the key size makes the system more secure. A variation of DES, called Triple-DES or DES-EDE (encrypt-decrypt-encrypt), uses three applications of DES and two independent DES keys to produce an effective key length of 168 bits.

IDEA uses a fixed-length, 128-bit key (larger than DES but smaller than Triple-DES). It is also faster than Triple-DES. Other symmetric ciphers use variable-length keys and are claimed to be even faster than IDEA. Despite the efficiency of symmetric key cryptography, it has a fundamental weak spot: key management. Since the same key is used for encryption and decryption, it must be kept secure. If an adversary knows the key, then the message can be decrypted. At the same time, the key must be available to both the sender and the receiver, and these two parties may be physically separated. Symmetric key cryptography transforms the problem of transmitting messages securely into that of transmitting keys securely. This is an improvement, because keys are much smaller than messages, and the keys can be generated beforehand. Nevertheless, ensuring that the sender and receiver are using the same key and that potential adversaries do not know this key remains a major stumbling block. This is referred to as the key management problem.

Public/Private Key Cryptography

Asymmetric key cryptography overcomes the key management problem by using different encryption and decryption key pairs. Having knowledge of one key, say the encryption key, is not sufficient to determine the other key, the decryption key. Therefore, the encryption key can be made public, provided the decryption key is held only by the party wishing to receive encrypted messages (hence the name public/private key cryptography). Anyone can use the public key to encrypt a message, but only the recipient can decrypt it.

RSA is a widely used public/private key algorithm, named after the initials of its inventors, Ronald L. Rivest, Adi Shamir, and Leonard M. Adleman. It depends on the difficulty of factoring the product of two very large prime numbers. Although used for encrypting whole messages, RSA is much less efficient than symmetric key algorithms such as DES. ElGamal is another public/private key algorithm; it relies on a different arithmetic problem than RSA, called the discrete logarithm problem.

The mathematical relationship between the public/private key pair permits a general rule: any message encrypted with one key of the pair can be successfully decrypted only with that key's counterpart. To encrypt with the public key means you can decrypt only with the private key. The converse is also true: to encrypt with the private key means you can decrypt only with the public key.
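The general rule above can be checked end to end with a toy RSA instance; the primes here are deliberately tiny for illustration, while real keys use primes hundreds of digits long:

# Key generation from two small primes
p, q = 61, 53
n = p * q                    # public modulus (3233)
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (2753), Python 3.8+

m = 65                       # a message, encoded as an integer < n
c = pow(m, e, n)             # encrypt with the public key (e, n)
assert pow(c, d, n) == m     # only the private key (d, n) recovers m

# The converse also holds: "encrypting" with the private key is
# undone only by the public key, which is the basis of signatures.
s = pow(m, d, n)
assert pow(s, e, n) == m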

Hash Functions

A hash function is a type of one-way function; such functions are fundamental for much of cryptography. A one-way function is a function that is easy to calculate but hard to invert: it is difficult to calculate the input to the function given its output. The precise meanings of "easy" and "hard" can be specified mathematically. With rare exceptions, almost the entire field of public key cryptography rests on the existence of one-way functions.

In this application, functions are characterized and evaluated in terms of their ability to withstand attack by an adversary. More specifically, given a message x, if it is computationally infeasible to find a message y not equal to x such that H(x) = H(y), then H is said to be a weakly collision-free hash function. A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).

The requirements for a good cryptographic hash function are stronger than those in many other applications (error correction and audio identification not included). For this reason, cryptographic hash functions make good stock hash functions, even functions whose cryptographic security is compromised, such as MD5 and SHA-1. The SHA-2 algorithm, however, has no known compromises. A hash function can also be described as a function with certain additional security properties that make it suitable for use as a primitive in various information security applications, such as authentication and message integrity. It takes a long string (or message) of any length as input and produces a fixed-length string as output, sometimes termed a message digest or a digital fingerprint.

2.2 SHA-2

The SHA-2 functions implement the NIST Secure Hash Standard. The SHA-2 functions are used to generate a condensed representation of a message called a message digest, suitable for use in a digital signature. There are three families of functions, with names corresponding to the number of bits in the resulting message digest. The SHA-256 functions are limited to processing a message of less than 2^64 bits as input. The SHA-384 and SHA-512 functions can process a message of at most 2^128 - 1 bits as input. The SHA-2 functions are considered to be more secure than the SHA-1 functions, with which they share a similar interface.

The 256-, 384-, and 512-bit versions of SHA-2 share the same interface. SHA-256 and SHA-512 are novel hash functions computed with 32- and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds. SHA-224 and SHA-384 are simply truncated versions of the first two, computed with different initial values. The SHA-2 functions are not yet as widely used as SHA-1, despite their better security.
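The SHA-2 family is exposed directly by Python's standard hashlib module, which is enough to exercise the shared interface just described (the message string is, of course, arbitrary):

import hashlib

message = b"confidential video payload"

# SHA-256, SHA-384, and SHA-512 share the same call pattern and
# differ in word size, constants, and digest length.
print(hashlib.sha256(message).hexdigest())   # 64 hex chars = 256 bits
print(hashlib.sha384(message).hexdigest())   # 96 hex chars = 384 bits
print(hashlib.sha512(message).hexdigest())   # 128 hex chars = 512 bits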
3. Steganography

The objective of this work is to develop a compressed-video steganographic scheme that can provide provable security with high computing speed and that embeds secret messages into images without producing noticeable changes. Here we embed data in video frames. A video can be viewed as a sequence of still images, so data embedding in videos seems very similar to embedding in images. However, there are many differences between data hiding in images and in videos; the first important difference is the size of the host media. Since videos contain more samples (a larger number of pixels, or of transform-domain coefficients), a video has a higher capacity than a still image, and more data can be embedded in it. Also, there are some characteristics of videos which cannot be found in images; for instance, perceptual redundancy in videos is due to their temporal features.

[Fig 2. Steganography Process: original image/frame + cipher text → embed → stego image]

Here, the data hiding operations are executed entirely in the compressed domain. On the other hand, when a really high amount of data must be embedded

in the case of video sequences, there is a more demanding constraint on the real-time effectiveness of the system. The method utilizes the characteristic of human vision's sensitivity to color value variations. The aim is to offer safe exchange of color stego video across the internet that is resistant to all the steganalysis methods, like statistical and visual analysis.

Image-based and video-based steganographic techniques are mainly classified into spatial-domain and frequency-domain based methods. The former embedding techniques include LSB, matrix embedding, etc. Two important parameters for evaluating the performance of a steganographic system are capacity and imperceptibility. Capacity refers to the amount of data that can be hidden in the cover medium so that no perceptible distortion is introduced. Imperceptibility, or transparency, represents the invisibility of the hidden data in the cover media without degrading the perceptual quality by data embedding.

3.1 Compressed Video Steganographic Algorithm

Here a novel steganographic approach called tri-way pixel-value differencing with pseudorandom dithering (TPVDD) is used for embedding. TPVDD enlarges the capacity of the hidden secret information and provides an imperceptible stego-image for human vision with enhanced security. A small difference value between consecutive pixels indicates a smooth area, and a large one indicates an edged area. According to the properties of human vision, eyes can tolerate more changes in sharp-edge blocks than in smooth blocks. That is, more data can be embedded into the edge areas than into the smooth areas. This capability is exploited in this approach, which leads to good imperceptibility with a high embedding rate. The tri-way differencing scheme is explained as follows. In general, the edges in an image are roughly classified into vertical, horizontal, and two kinds of diagonal directions. Motivated by the PVD method, using two-pixel pairs on one directional edge can work efficiently for information hiding. This should accomplish more efficiency when considering four directions from four two-pixel pairs.

[Fig 3. Process of Steganography and Steganalysis]

A steganographic algorithm for compressed video is introduced here, operating directly on the compressed bit stream. The secret data are embedded in I frames, P frames and B frames. This secure compressed video steganographic architecture takes account of video statistical invisibility. The framework is shown in Fig. 3. The architecture consists of four functions: I, P and B frame extraction, the scene change detector, motion vector calculation, and the data embedder and steganalysis. The details of data embedding in P and B frames are as follows:

1. For each P and B frame, motion vectors are extracted from the bitstream.
2. The magnitude of each motion vector is calculated as

|MVj| = sqrt(Hj^2 + Vj^2),

where MVj is the motion vector of the jth macroblock, and Hj and Vj are the horizontal and vertical components of the motion vector, respectively.
3. This magnitude is compared with a threshold.
4. The block with the maximum magnitude is selected, and the data are embedded using the PVD method.

To increase the capacity of the hidden secret information and to provide an imperceptible stego-image for human vision, pixel-value differencing (PVD) is used for embedding.

Cipher Text Embedding in Video

Figure 4 shows frames before and after embedding. Here, text data are the secret information.
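A small sketch of steps 1-4 follows; the range table in pvd_capacity is an illustrative choice of ours, since the paper's TPVDD uses its own quantization ranges and dithering:

import math

def mv_magnitude(h, v):
    # Step 2: |MV_j| = sqrt(H_j^2 + V_j^2)
    return math.sqrt(h * h + v * v)

def select_block(motion_vectors, threshold):
    # Steps 3-4: keep macroblocks whose motion exceeds the threshold
    # and pick the one with maximum magnitude as the embedding target.
    best = None
    for j, (h, v) in enumerate(motion_vectors):
        m = mv_magnitude(h, v)
        if m > threshold and (best is None or m > best[0]):
            best = (m, j)
    return None if best is None else best[1]

def pvd_capacity(p1, p2):
    # PVD: larger pixel differences (edges) can hide more bits.
    d = abs(p1 - p2)
    for width, bits in ((8, 3), (16, 4), (32, 5), (64, 6), (256, 7)):
        if d < width:
            return bits

print(select_block([(1, 2), (5, 3), (0, 1)], threshold=2.0))  # -> 1
print(pvd_capacity(120, 95))                                  # edge: 5 bits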

[Fig 4. Frame before and after embedding]

Conclusion

A video steganographic scheme along with SHA-2 was proposed in this paper, operating directly in the compressed domain. This technique provides high capacity and an imperceptible stego-image of the hidden secret information for human vision. Here, the frame with the maximum number of scene change blocks was used for embedding. The performance of the steganographic algorithm was studied, and experimental results show that this scheme can be applied to compressed videos with no noticeable degradation in visual quality.

References

[1] F. Hartung and B. Girod, "Watermarking of uncompressed and compressed video," Signal Processing, Special Issue on Copyright Protection and Access Control for Multimedia Services, 1998, 66(3): 283-301.
[2] Bin Liu, Fenlin Liu, Chunfang Yang and Yifeng Sun, "Secure Steganography in Compressed Video Bitstreams," The Third International Conference on Availability, Reliability and Security, 2008.
[3] Ko-Chin Chang, Chien-Ping Chang, Ping S. Huang, and Te-Ming Tu, "A Novel Image Steganographic Method Using Tri-way Pixel-Value Differencing," Journal of Multimedia, vol. 3, no. 2, June 2008.
[4] Y. K. Lee and L. H. Chen, "High capacity image steganographic model," IEE Proceedings on Vision, Image and Signal Processing, vol. 147, no. 3, pp. 288-294, 2000.
[5] D.-C. Wu and W.-H. Tsai, "A steganographic method for images by pixel-value differencing," Pattern Recognition Letters, vol. 24, pp. 1613-1626, 2003.
[6] Y. J. Dai, L. H. Zhang and Y. X. Yang, "A New Method of MPEG Video Watermarking Technology," International Conference on Communication Technology Proceedings (ICCT), 2003.
[7] G. C. Langelaar and R. L. Lagendijk, "Optimal Differential Energy Watermarking of DCT Encoded Images and Video," IEEE Trans. on Image Processing, 2001, 10(1): 148-158.
[8] Bin Liu, Fenlin Liu, Chunfang Yang and Yifeng Sun, "Secure Steganography in Compressed Video Bitstreams," Proc. of the Int. Conf. IEEE ARES, pp. 520-525, 2008.
[9] A. Hanafy, Gouda I. Salama and Yahya Z. Mohasseb, "A Secure Covert Communication Model Based On Video Steganography," Proc. of the Int. Conf. IEEE Military Communications, 2008.
[10] M. Abadi and B. Blanchet, "Secrecy types for asymmetric communication," in Foundations of Software Science and Computation Structures, vol. 2030 of Lecture Notes in Computer Science, pp. 25-41, Springer, 2001.
[11] M. Abadi and B. Blanchet, "Analyzing security protocols with secrecy types and logic programs," in 29th ACM Symposium on Principles of Programming Languages, pp. 33-44, 2002.
[12] M. Abadi, "Secrecy by typing in security protocols," Journal of the ACM, 46(5): 749-786, September 1999.
[13] X. Y. Wang, X. J. Lai, D. G. Feng, H. Chen, and X. Y. Yu, "Cryptanalysis for Hash Functions MD4 and RIPEMD," Advances in Cryptology - Eurocrypt '05, pp. 1-18, Springer-Verlag, May 2005.
[14] X. Wang, Y. Yin, and H. Yu, "Finding Collisions in the Full SHA-1," in Advances in Cryptology - CRYPTO '05, 2005.
[15] A. Lenstra, X. Wang and B. de Weger, "Colliding X.509 Certificates," Cryptology ePrint Archive, Report 2005/067, 2005. Available at: http://eprint.iacr.org/

ENHANCED VEHICLE DETECTION BY EARLY OBJECT IDENTIFICATION
N.NIROSHA
M.E. – COMPUTER SCIENCE AND ENGINEERING
PRATHYUSHA INSTITUTE OF TECHNOLOGY AND MANAGEMENT
TIRUVALLUR- 602 025
niroshait@gmail.com

Abstract— Object detection and classification are necessary components in an artificially intelligent autonomous system. It is expected that these artificially intelligent autonomous systems will venture onto the streets of the world, thus requiring the detection and classification of car objects commonly found on the street. The identification and classification of objects in an image should be fast and accurate. The aim of our project is to detect the object as soon as possible, with better accuracy and improved performance, even if the object varies in appearance. Object identification and classification is a challenging process when objects of the same category appear with large variations. Though a number of papers deal with appearance variation, the object detection process is considered to be slow. In our proposed work, we tend to detect the object as quickly as possible, and we improve the detection speed by using optimized detectors, i.e., a small subset of detectors for the given input. Also, we detect multi-posed vehicles for small variations of the rotation angle. Moreover, we avoid false detections when objects are close to one another.

Keywords— Detection, Classification, Multi-posed vehicle

I. INTRODUCTION

Given an input image, object detection is to determine whether or not the specified object is present. Object detection is a very complex problem that includes some real hardcore math and long tuning of the parameters of the computation methods. In our project, "object" means vehicle, i.e., cars. Object detection and classification are necessary components in an artificially intelligent autonomous system. Especially, object classification plays a major role in applications such as security systems, traffic surveillance systems, target identification, etc. It is expected that these artificially intelligent autonomous systems will venture onto the streets of the world, thus requiring the detection and classification of car objects commonly found on the street. In reality, these classification systems face two types of problems: i) objects of the same category with large variations in appearance; and ii) objects under different viewing conditions, like occlusion and complex backgrounds containing buildings, trees, people, road views, etc. This paper tries to bring out the importance of eliminating the background as early as possible. Thus, the background is removed and the image is fed to a small subset of detectors to improve the speed of object detection.

The existing system deals with the whole bank of detectors for the given input image. Then, during object detection, we tend to avoid false detections. There are three main contributions of our object detection framework. The first contribution is a new image representation called an integral image that allows for very fast feature evaluation. The integral image can be computed from an image using a few operations per pixel (a brief sketch of this computation follows below). The second contribution is a method for constructing a classifier by selecting a small number of important features using AdaBoost. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance. The third major contribution is a method for combining successively more complex classifiers in a cascade structure, which dramatically increases the speed of the detector by focussing attention on promising regions of the image.
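As a concrete illustration of the integral image idea (the array layout and function names are our own, not from the paper), two cumulative sums suffice:

import numpy as np

def integral_image(img):
    # ii(x, y) holds the sum of all pixels above and to the left of
    # (x, y), inclusive; computed in a few operations per pixel.
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    # The sum over any rectangle follows from four array references,
    # which is what makes Haar-like feature evaluation so fast.
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 9 + 10 = 30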

Our object detection procedure classifies images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn from a finite quantity of training data. For this system there is also a second critical motivation for features: a feature-based system operates much faster than a pixel-based system.

The research on object detection and recognition focuses on:
1) Representation: how to represent an object.
2) Learning: machine learning algorithms to learn the common properties of a class of objects.
3) Recognition: identifying the object in an image using the learned models.

In our proposed work, we tend to detect the object as quickly as possible, and we improve the detection speed by using optimized detectors, i.e., a small subset of detectors for the given input. Also, we detect multi-posed vehicles for small variations of the rotation angle. Moreover, we avoid false detections when objects are close to one another.

Phase I is divided into four modules:
1. Background Subtraction Module
2. Feature Extraction and Feature Selection Module
3. Training Module
4. Object Classification Module

In the Background Subtraction Module, we first convert the original image to a gray-scale image. Then the region filling technique is performed, and the gray-scale image is subtracted from the region-filled image. Finally, the subtracted image is mapped to obtain the object of interest. In the Feature Extraction and Feature Selection Module, the features are extracted and the relevant features are selected from the object of interest. For feature extraction, we used two methods to compare their efficiency, i.e., Principal Component Analysis (PCA) and Histogram of Oriented Gradients (HOG). After extracting the features using these two methods, we perform feature selection using the Adaptive Boosting technique. We get the relevant features after performing feature selection. In the Training Module, the relevant features are used to train the classifier. We perform training with 100 car images and 100 non-car images. The trained features are then classified. In the Classification Module, the Support Vector Machine (SVM) classifier is used to classify the objects. The trained features are divided into two classes to classify car images and non-car images. After classification, the query image, i.e., the image to be tested, is given as input. Then its features are extracted and used as the test features. After the features are extracted, the classification is done likewise, and the object is identified by performing the above process (a sketch of this pipeline is given below). The system architecture diagram of the Object Identification and Classification system is shown in Figure 1.1.
features are selected from the object of interest. For


[Figure 1.1 Architecture of the Object Identification and Classification System: query and repository images pass through Feature Extraction and Feature Selection, Training, Background Subtraction, Object Identification, and Object Classification.]

XXXVII. RELATED WORKS

In Vehicle Objects Detection of Video Images Based on Gray-Scale Characteristics [1], first the color images are converted to gray-scale images. Then the methods of frame differencing and selective background updating are utilized to generate the initial background and update the current background. Furthermore, every processed image is filtered by a fast median filter to remove noise. When the current background is obtained, moving objects in the video can be detected effectively by background frame differencing. Finally, morphological filtering is used for decreasing accumulative errors. Whenever the detection system is started, a robust background image can be obtained quickly and moving targets can be detected effectively. However, false detection also happens when vehicles adhere to each other.

In Cluster Boosted Tree Classifier for Multi-View, Multi-Pose Object Detection [2], a Cluster Boosted Tree (CBT) learning algorithm was introduced to automatically construct tree-structured object detectors. Instead of using predefined intra-class sub-categorization based on domain knowledge, they divide the sample space by unsupervised clustering based on discriminative image features selected by a boosting algorithm. The sub-categorization information of the leaf nodes is sent back to refine their ancestors' classification functions. Their learning algorithm does not limit the type of features used; new features can be integrated into the framework easily.

In Rapid Object Detection using a Boosted Cascade of Simple Features [3], they presented an approach for object detection which minimizes computation time while achieving high detection accuracy. This approach is 15 times faster than any previous approach. They worked on three key contributions: 1) a new image representation called the "Integral Image" which allows the features used by the detector to be computed very quickly; 2) a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers; 3) a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded. Experiments on such a large and complex dataset are difficult and time consuming.

In Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection [4], they


have introduced a joint boosting algorithm for jointly training multiple classifiers so that they share as many features as possible. The result is a classifier that runs faster and requires less data to train. They applied the joint boosting algorithm to the problem of multi-class, multi-view object detection in clutter. An important consequence of joint training is that the amount of training data required is reduced. When the amount of training is reduced, some of the detectors trained in isolation perform worse than chance level.

In Fast Pose Estimation with Parameter Sensitive Hashing [5], they presented new hash-based searching techniques that rapidly find relevant examples in a large database of image data and estimate the parameters for the input using a local model learned from those examples. However, the learning algorithm implicitly assumes independence between the features; they are exploring more sophisticated feature selection methods that would account for possible dependencies.

In A Trainable Object Detection System for Static Images [6], results are shown for car detection. The system uses a representation based on Haar wavelets that captures the significant information about elements of the object class. When combined with a powerful classification engine, i.e. the support vector machine, they obtain a detection system that achieves accuracy with low rates of false positives. Due to the significant change in the image information of cars under varying viewpoints, developing a pose-invariant car detection system is likely to be more difficult than pose-invariant people detection.

XXXVIII. SYSTEM MODEL

QUICKER OBJECT DETECTION

In our proposed work, we tend to detect the object as quickly as possible and we improve the detection speed by using optimized detectors, i.e. a small subset of detectors for the given input. Also, we detect the multi-posed vehicle for small variations of the rotation angle. Moreover, we avoid false detection when objects are close to one another. This can be shown by the following modules.

The image repository can be represented as follows. [Figure: sample images from the image repository.]

For testing, we use the query image as follows. [Figure: a sample query image.]

I. Background Subtraction
In the Background Subtraction Module, we first convert the original image to a gray scale image. Then the region filling technique is performed and the gray scale image is subtracted from the region-filled image. Finally, the subtracted image is mapped to obtain the object of interest. Background subtraction can be shown as follows. [Figure: the stages of background subtraction.]
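A minimal sketch of one plausible reading of this module is given below, using NumPy and SciPy; the fixed threshold and the function name are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.ndimage import binary_fill_holes

def subtract_background(rgb_image, threshold=0.5):
    # 1. Convert the original image to gray scale.
    gray = rgb_image.astype(float).mean(axis=2) / 255.0
    # 2. Region filling: fill the holes of the thresholded image.
    binary = gray > threshold
    filled = binary_fill_holes(binary)
    # 3. Subtract the gray scale (binary) image from the region-filled
    #    image, leaving only the filled-in interior regions.
    subtracted = filled & ~binary
    # 4. Map the subtracted mask back onto the gray image to obtain the
    #    object of interest.
    return np.where(subtracted, gray, 0.0)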
J. Feature Extraction and Feature Selection
In the Feature Extraction and Feature Selection Module, the features are extracted and the relevant features are selected from the object of interest. For feature extraction, we used two methods to compare their efficiency, i.e. Principal Component Analysis (PCA) and Histogram of Oriented Gradients (HOG). After extracting the features using these two methods, we perform feature selection using the Adaptive Boosting


technique. We get the relevant features after performing feature selection.

The resulting HOG feature vectors can be shown as follows. [Figure: HOG feature vectors.]

The resulting PCA feature vectors can be shown as follows. [Figure: PCA feature vectors.]

K. SVM Training
In the Training Module, the relevant features are used to train the classifier. We perform training with 100 car images and 100 non-car images. The trained features are then classified.

L. Object Classification
In the Classification Module, the Support Vector Machine (SVM) classifier is used to classify the objects. The trained features are divided into two classes to classify the car image and the non-car image. After classification, the query image, i.e. the image to be tested, is given as input. Then the features are extracted and used as the test feature. After the features are extracted, the classification is done likewise. Then the object is identified by performing the above process.
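As a concrete illustration of the two modules above, here is a minimal sketch assuming scikit-image for HOG features and scikit-learn for the SVM; the variables car_images, noncar_images and query_image are placeholders for grayscale arrays of one common size, not names from the paper.

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(images):
    # One HOG feature vector per image; equal-sized images give
    # equal-length feature vectors.
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# Training Module: 100 car and 100 non-car images form the two classes.
X = np.vstack([extract_hog(car_images), extract_hog(noncar_images)])
y = np.array([1] * len(car_images) + [0] * len(noncar_images))
classifier = LinearSVC().fit(X, y)

# Classification Module: the query image goes through the same feature
# extraction, and the SVM reports which class it belongs to.
test_feature = extract_hog([query_image])
label = "car" if classifier.predict(test_feature)[0] == 1 else "non-car"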

The detailed design of the paper is given as follows.

[Figure 3.1 Detailed design of the Object Identification and Classification System.]

By performing background subtraction, we eliminate the occluded background from the original image and send the object of interest to the next module. Then we perform feature extraction to extract the features of the object based on their shape and appearance. In our model, this feature extraction is done in two ways, as given below:
• Histogram of Oriented Gradients (HOG) and
• Principal Component Analysis (PCA)
The extracted features are then optimized, i.e. the relevant features are identified to increase the speed of object detection. Then the optimized features are sent to the next module for training. The relevant features are trained, and the trained features are stored in the database for future comparison. Then the objects are identified and classified according to the defined classes. Now, if the test image is given, the result specifies which class it belongs to.

XXXIX. MULTI-VIEW OBJECT DETECTION

In the next experiment we look at a multi-view vehicle detection problem. We evaluate the performance

464
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

of the proposed method. In training, a bootstrap method is employed to collect non-trivial background examples for all methods. First a linear SVM classifier is trained with an initial set of 100 training background patches. Then we exhaustively search all the background training images with this linear SVM classifier to collect false positive image patches ("hard examples").

Then the trained system is tested with 70 images and the results are shown in the following section.

XL. RESULTS AND PERFORMANCE ANALYSIS

For the object identification and classification, we tested the query image and we got two classes of results: Car image and Non-car image.

The accuracy of the results depends upon the amount of training and testing items. The training set consists of about 100 images and the testing set of about 70 images. Performance is directly proportional to the availability of target images in the collection for a particular query image. The classification performance of the system using both HOG and PCA feature vectors is tabulated in Tables 5.1 and 5.2.

Table 5.1 Predicted results based on HOG features

Type of image     No. of images tested   True positive   False negative
Car images        50                     50              0
Non-car images    20                     16              4
Total             70                     66              4
Accuracy                                 94%             6%

Table 5.2 Predicted results based on PCA features

Type of image     No. of images tested   True positive   False negative
Car images        50                     49              1
Non-car images    20                     13              7
Total             70                     62              8
Accuracy                                 89%             11%
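As a check on the tabulated figures, the accuracy is simply the fraction of correctly classified test images: 66/70 ≈ 94.3% for HOG and 62/70 ≈ 88.6% for PCA, which round to the reported 94% and 89%; the error rates are the complements, 4/70 ≈ 6% and 8/70 ≈ 11%.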

After performing the test with 70 images using


the HOG and PCA techniques, we found that HOG performs better than the PCA technique. So, we used the HOG technique for further testing of the accuracy of object detection and classification.

XLI. CONCLUSION AND FUTURE WORK

Thus, we conclude that our object detection method is more efficient than the previous method. In our work, we reject the background patches quickly by using background subtraction. Then we used the SVM classifier for training on the images, and we used a small subset of detectors for efficient detection. Thus we improved the speed of detection using the HOG technique for single-orientation images.

Future work can address multiple orientations of an image, for better accuracy and for increasing the speed of detection. To enhance our model, we plan to extend this work to video images, where the object is segmented from the background and then found by viewing it at different angles.

REFERENCES
[1]. Jie Cao and Li Li. "Vehicle Objects Detection of Video Images Based on Gray-Scale Characteristics". First International Workshop on Education Technology and Computer Science, pp. 937–940, 2009.
[2]. B. Wu and R. Nevatia. "Cluster boosted tree classifier for multi-view multi-pose object detection". In Proc. IEEE International Conf. on Computer Vision, pp. 1-8, 2007.
[3]. P. Viola and M. Jones. "Robust real time object detection". International Journal of Computer Vision, 57(2), pp. 137–154, 2004.
[4]. A. Torralba, K. Murphy, and W. Freeman. "Sharing features: Efficient boosting procedures for multiclass object detection". In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, 2004.
[5]. G. Shakhnarovich, P. Viola, and T. Darrell. "Fast pose estimation with parameter-sensitive hashing". In Proc. IEEE International Conf. on Computer Vision, pp. 5-8, 2003.
[6]. C. Papageorgiou and T. Poggio. "A trainable system for object detection". International Journal of Computer Vision, 38(1), pp. 15–33, 2000.

ROUTING BASED ON LOAD BALANCING IN


MULTI-HOP WIRELESS MESH NETWORKS
*M.Usha  **C M Nalayini  ***Mr.T.K.S.Rathish Babu
*Asst Professor-I, CSE Dept, VEC, Chennai, umahalingam@gmail.com, PH No: 9962704940
**Asst Professor-II, IT Dept, VEC, Chennai, nalacm_13@yahool.com, PH No: 044-26591113
***Asst Professor-II, CSE Dept, VEC, Chennai, tksbabu80@gmail.com, PH No: 9894567685

ABSTRACT

This paper proposes a load-aware routing scheme for wireless mesh networks (WMNs). The WMN is divided into multiple clusters for load control. A cluster head estimates the traffic load in its cluster. As the estimated load gets higher, the cluster head increases the routing metrics of the routes passing through the cluster. Based on the routing metrics, user traffic takes a detour to avoid overloaded areas, and as a result, the WMN achieves global load balancing. We present numerical results showing that the proposed scheme effectively balances the traffic load and outperforms the routing algorithm that uses the expected transmission time (ETT) as a routing metric.

1. INTRODUCTION

A wireless mesh network (WMN) is a communications network made up of radio nodes organized in a mesh topology. Wireless mesh networks often consist of mesh clients, mesh routers and gateways.

A mesh network is reliable and offers redundancy. When one node can no longer operate, the rest of the nodes can still communicate with each other, directly or through one or more intermediate nodes. In networking, load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The load balancing service is usually provided by a dedicated program or hardware device (such as a multilayer switch or a DNS server).

1.1 Routing

Routing algorithms have used many different metrics to determine the best route. Sophisticated routing algorithms can base route selection on multiple metrics, combining them in a single (hybrid) metric. All the following metrics have been used:

• Path Length
• Reliability
• Delay
• Bandwidth
• Load

1.2 Routing Metrics for WMNs

• Hop Count
• Expected Transmission Count (ETX)
• Expected Transmission Time (ETT)
• Weighted Cumulative ETT (WCETT)
• Metric of Interference and Channel Switching (MIC)
• Communication Cost

The metrics evolved over time, each incorporating features of the previous ones.

2. Overall Design

A database is created to store the configuration of the mesh network, with the number of nodes and the cost of each link. All available paths from the source node to the destination node are calculated; from these, the best path is found using an optimization algorithm and the data is transmitted using the dual decomposition method. If there is no overload, the message is transmitted successfully; otherwise, proper load estimation is done by the respective cluster head, which then selects the best path and transmits the message.

[Figure: overall design flowchart. User input feeds topology construction; the available paths are calculated from the database and the best path is found; the cluster head performs load estimation and increases the link cost if the network is overloaded; otherwise the message is transmitted successfully using dual decomposition.]

3. Topology Construction

Here we use a mesh topology because of its unstructured nature. In a mesh network all the nodes are connected to each other, forming a complete network. The topology is constructed by

getting the names of the nodes, and the connections among the nodes are taken as input. While getting each of the nodes, their associated port and IP address is also obtained. For successive nodes, the node to which it should be connected is accepted from the user. While adding nodes, a comparison is done so that there is no node duplication. Then we identify the source and the destinations.

[Figure: topology construction flowchart. The user enters the number of nodes; each node's name and IP address are obtained and stored in the database until all nodes have been entered.]

Path Construction

Here we get the total number of available paths for the particular topology. The steps involved in this process are: calculating the number of nodes, calculating the number of paths for a particular set of nodes, and processing those paths when the particular set of nodes is chosen. This process also calculates the aggregate cost and delay for concurrent paths.

4. Dual Decomposition Method

The wireless mesh network is divided into multiple overlapping clusters. A cluster head takes the role of controlling the traffic load on the wireless links in its cluster. The cluster head periodically estimates the total traffic load on the cluster and increases the "link costs" of the links in the cluster if the estimated load is too high. In this scheme, each user chooses the route that has the minimum sum of the link costs on it. Users in overloaded areas can share information between themselves by efficiently selecting the best path, based on link capacity, through the dual decomposition method.
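The following is a minimal sketch of this idea over a simple adjacency-list graph; the load threshold and cost penalty are illustrative parameters, not values from the paper.

import heapq

def best_path(links, src, dst):
    # links: {node: [(neighbor, link_cost), ...]}. Plain Dijkstra: the
    # chosen route has the minimum sum of link costs.
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, c in links.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def cluster_head_update(links, cluster_links, estimated_load,
                        threshold=0.8, penalty=2.0):
    # If the estimated cluster load is too high, raise the "link costs"
    # of the links in the cluster; later best_path calls then detour
    # around the overloaded area.
    if estimated_load > threshold:
        for u, v in cluster_links:
            links[u] = [(n, c * penalty if n == v else c)
                        for n, c in links[u]]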
Path Determination

A metric is a standard of measurement, such as path length, that is used by routing algorithms to determine the optimal path to a destination. To aid in this process of path determination, routing algorithms initialize and maintain routing tables, which contain route information. This information can vary widely depending on which routing algorithm generated the routes. Routing algorithms fill routing tables with a list of networks and the corresponding "next hop" on the way to each destination. When a router receives an incoming packet, it checks the destination address


and attempts to associate this address with a next hop.

Route Flap Damping Algorithm

The dual decomposition method makes it possible to design a distributed routing scheme. However, there could be a route flapping problem in the distributed scheme. To tackle this problem, we have suggested a dampening algorithm and have analyzed the performance of the algorithms. Route flap damping (RFD) plays an important role in maintaining the stability of Internet routing. When receiving a route r with prefix d from peer j:

if (W(r) and !W(p))
    // W(x) returns true only if x is a withdrawn route
    // p is the previous route with prefix d from peer j
    a flap is identified: route withdrawal
else if (!W(r) and !W(p) and r ≠ p)
    a flap is identified: route attribute change
p = r

Algorithm 1. Pseudo code of the original RFD algorithm.

A common approach in route flap damping is to assign a penalty to a route and increment the penalty value when the route flaps. When the penalty of a route exceeds the threshold suppress limit, the route is suppressed and not advertised further. The penalty of a route decays exponentially according to the parameter half life, which specifies the time for the penalty to be reduced by half. If the penalty decreases below the threshold reuse limit, the route is reused and may be advertised again.
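A minimal sketch of this penalty mechanism is shown below; the suppress limit, reuse limit, half life and per-flap penalty are illustrative values, not figures from the paper.

import time

class DampedRoute:
    SUPPRESS_LIMIT = 2000.0   # suppress the route above this penalty
    REUSE_LIMIT = 750.0       # re-advertise below this penalty
    HALF_LIFE = 15 * 60.0     # seconds for the penalty to halve
    FLAP_PENALTY = 1000.0     # increment on each identified flap

    def __init__(self):
        self.penalty, self.updated, self.suppressed = 0.0, time.time(), False

    def decay(self):
        # Exponential decay: the penalty halves every HALF_LIFE seconds.
        now = time.time()
        self.penalty *= 0.5 ** ((now - self.updated) / self.HALF_LIFE)
        self.updated = now
        if self.suppressed and self.penalty < self.REUSE_LIMIT:
            self.suppressed = False   # route is reused

    def on_flap(self):
        self.decay()
        self.penalty += self.FLAP_PENALTY
        if self.penalty > self.SUPPRESS_LIMIT:
            self.suppressed = True    # route no longer advertised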

5. Message Transmission

In this stage the user transmits the message to the destination. During message transmission, the system checks for overload. If it overloads, the cluster head starts load estimation: it takes the required information from the database and starts calculating the load, using Dijkstra's algorithm for distance calculation, and it increases the link cost in order to overcome the overloaded situation. The route flap damping algorithm is used to do proper load sharing. After the load estimation is done, it selects the best path and sends the message to the destination.

6. Conclusion

The proposed scheme is a practical single-path routing scheme, unlike other multipath routing schemes which are designed by using optimization theory. Also, the proposed scheme can easily be implemented in a distributed way by means of existing routing algorithms. The proposed scheme can be applied to various single-band PHY/MAC layer protocols. In future work, we can extend the proposed scheme so that it can also be applied to multiband protocols, which can provide larger bandwidth to the WMN.

7. References

[1] http://www.inetdaemon.com/tutorials/internet/ip/routing/bgp/operation/bgp_route_flap_dampening.shtml
[2] http://www.nanog.org/mtg-0210/ppt/flap.pdf


[3] http://www.springerlink.com/content/h7183162401gk741/
[4] http://www.linuxjournal.com/article/3345
[5] http://portal.acm.org/citation.cfm?id=1618241
[6] http://airccse.org/journal/jwmn/0210s5.pdf
[7] http://www.computer.org/portal/web/csdl/doi/10.1109/AINAW.2007.50



VIRTUAL MOUSE USING HCI


*1. M.Saranya, M.E  **2. M.Surendran (B.E)  ***3. S.Subramanian (B.E)
*saranya_munirathinam@yahoo.com, 9965107139
**suren.csc@gmail.com, 9043409185
***ssmani184@gmail.com, 9788736081
Sri Ram Engineering College

ABSTRACT

This project, "Virtual Mouse using HCI", aims to present an application that is able to replace the traditional mouse with the human face as a new way to interact with the computer. Facial features (nose tip and eyes) are detected and tracked in real-time to use their actions as mouse events. The coordinates and movement of the nose tip in the live video feed are translated to become the coordinates and movement of the mouse pointer on the user's screen. The left/right eye blinks fire left/right mouse click events.

One way to achieve a human computer interface is to capture the desired feature with a webcam and monitor its action in order to translate it to some events that communicate with the computer. In this project we are trying to compensate people who have hand disabilities that prevent them from using the mouse, by designing an application that uses facial features (nose tip and eyes) to interact with the computer.

The nose tip is selected as the pointing device; the reason behind that decision is the location and shape of the nose: as it is located in the middle of the face, it is more comfortable to use it as the feature that moves the mouse pointer and defines its coordinates, not to mention that it is located on the axis that the face rotates about, so it basically does not change its distinctive convex shape, which makes it easier to track as the face moves.

The eyes are used to simulate mouse clicks, so the user can fire click events as he blinks. We use an off-the-shelf webcam that affords a moderate resolution and frame rate as the capturing device, in order to make the program affordable for all individuals.

INTRODUCTION

With the growth of attention on computer vision, the interest in Human Computer Interface (HCI) has increased proportionally. Different human features and monitoring devices have been used to achieve HCI, but this paper is interested only in works that involved the use of facial features and webcams. There is a large diversity in the facial features that were selected, in the way they were detected and tracked, and in the functionality that they presented for the HCI.

With the availability of high speed processors and inexpensive web cams, more and more people have become interested in real-time applications that involve image processing and use human features (e.g. face, hands) to interact with the computer. One way to achieve that is to capture the desired feature with a web cam and monitor its action in order to translate

it to some events that communicate with the computer.

In our work we are trying to compensate people who have hand disabilities that prevent them from using the mouse, by designing an application that uses facial features (nose tip and eyes) to interact with the computer. The nose tip was selected as the pointing device; because it is located in the middle of the face, it is more comfortable to track it as the face moves. The eyes are used to simulate mouse clicks, so the user can fire click events as he blinks.

While different devices have been used in HCI (e.g. infrared cameras, sensors, microphones), we used an off-the-shelf web cam that affords a moderate resolution and frame rate as the capturing device, in order to make the program affordable for all individuals. We will try to present an algorithm that distinguishes true eye blinks from involuntary ones, and that detects and tracks the desired facial features precisely.

OBJECTIVES

The paper enables the user to select the items present on the computer screen with the mouse, not directly with the hand but through the movement of his/her head. The main objectives of this paper are listed as follows.
1. The mouse pointer is driven by the nose point of the human.
2. Selecting an item on the screen is done by blinking the left eye.
3. The right click event of the mouse button is done by blinking the right eye.

PAPER DESCRIPTION

PROBLEM DEFINITION
This paper aims to present an application that is capable of replacing the traditional mouse with the human face. Facial features (nose tip and eyes) are detected and tracked in real-time to use their actions as mouse events. The coordinates and movement of the nose tip in the live video feed are translated to become the coordinates and movement of the mouse pointer on the user's screen. The left/right eye blinks fire left/right mouse click events.

MODULES DESCRIPTION

The paper concerns each and every aspect of detecting and tracking the face, organized in the following hierarchy.

Video Frame Module

The first module captures the live image using the web cam and detects a face in the captured image segment.

Face Detection Module

Face detection has always been a vast research field in the computer vision world, considering that it is the backbone of any application that deals with the human face (e.g. surveillance systems, access control). Researchers did not spare any effort or imagination in inventing and evolving methods to localize, extract, and verify faces in images.

Early on, simple heuristics were applied to images taken with certain restrictions (e.g. a plain background). These methods however have improved over time and become more robust to lighting conditions, face orientation, and scale. Despite the large number of face detection methods, they can be organized into two main categories: feature-based methods and image-based methods.


• The first involves finding facial features (e.g. nostrils, eyebrows, lips, eye pupils) and, in order to verify their authenticity, performing geometrical analysis of their locations, areas, and distances from each other. This feature-based analysis will eventually lead to the localization of the face and the features that it contains.

• Some of the most famous methods applied in this category are skin models and motion cues, which are effective in image segmentation and face extraction. On one hand, feature-based analysis is known for its pixel-accurate feature localization and speed; on the other hand, its lack of robustness against head rotation and scale has been a drawback of its application in computer vision.

• The second is based on scanning the image of interest with a window that looks for faces at all scales and locations. This category of face detection implies pattern recognition, and achieves it with simple methods such as template matching or with more advanced techniques such as neural networks and support vector machines.

• Image-based detection methods are popular because of their robustness against head rotation and scale, despite the fact that the exhaustive window scanning requires heavy computations.

[Figure: the general steps of face detection.]

SSR Filter stands for Six Segmented Rectangular filter (see figure). [Figure: the SSR filter.] The sum of pixels in each sector is denoted as S along with the sector number. The use of this filter will be explained in detail in the face detection algorithm.

Integral Image

In order to facilitate the use of SSR filters, an intermediate image representation called the integral image will be used. In this representation, the integral image at location x, y contains the sum of the pixels which are above and to the left of the pixel x, y [3].

[Figure: the integral image. ii: integral image, i: pixel value.]

With this representation, calculating the sectors of the SSR filter becomes fast and easy. No matter how big a sector is, we need only 3 arithmetic operations to calculate the sum of the pixels which belong to it, so each SSR filter requires 6*3 = 18 operations to calculate.

Find Face Candidates Module

The human face is governed by proportions that define the different sizes of, and distances between, facial features. We will be using these proportions in our heuristics to improve facial feature detection and tracking.

We will be using feature-based face detection methods to reduce the area in which we are looking for the face, so we can decrease the execution time. To find face candidates, the SSR filter will be used in the following way.


At first we calculate the integral image by making one pass over the video frame using these equations:

s(x, y) = s(x, y-1) + i(x, y)
ii(x, y) = ii(x-1, y) + s(x, y)

where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0.
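A minimal sketch of this computation, and of the face-candidate test that follows, is given below; the 3x2 sector grid is one natural reading of the six-segmented filter, and the exact sector proportions in the paper's figure may differ.

import numpy as np

def integral_image(i):
    # ii(x, y) = sum of all pixels above and to the left of (x, y);
    # two cumulative sums realise the one-pass recurrence above.
    return i.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    # Sum of any rectangle from at most 3 additions/subtractions on the
    # integral image, independent of the rectangle size.
    s = ii[top + h - 1, left + w - 1]
    if top > 0:
        s -= ii[top - 1, left + w - 1]
    if left > 0:
        s -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        s += ii[top - 1, left - 1]
    return s

def is_face_candidate(ii, top, left, h, w):
    # Sectors S1..S6 laid out as two rows of three.
    sh, sw = h // 2, w // 3
    S1, S2, S3, S4, S5, S6 = [rect_sum(ii, top + r * sh, left + c * sw, sh, sw)
                              for r in (0, 1) for c in (0, 1, 2)]
    # Eyes (S1, S3) are darker than the between-the-eyes sector S2 and
    # darker than the cheek-bone sectors S4, S6 below them.
    return S1 < S2 and S2 > S3 and S1 < S4 and S3 < S6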
The figure below shows an ideal location of the SSR filter, where its center is considered as a face candidate.

[Figure: ideal SSR filter location for a face candidate. (x, y) is the location of the filter (upper left corner); the plus sign is the center of the filter, which is the face candidate.]

We can notice that in this ideal position the eyes fall in sectors S1 and S3, while the nose falls in sector S5. Since the eyes and eyebrows are darker than the BTE (between-the-eyes) region and the cheek bones, we deduce that

S1 < S2 && S2 > S3
S1 < S4 && S3 < S6

So in order to find face candidates we place the upper left corner (see fig. 10) of the SSR filter on each pixel of the image (only on pixels where the filter falls entirely inside the bounds of the image). If the conditions are fulfilled, then the center of the filter is considered as a face candidate.

Extract BTE Templates

Now that we have found pupil candidates for each of the clusters (face candidates), we can extract BTE templates in order to pass them to the support vector machine. As mentioned earlier, we trained our support vector machine on templates of size 35 * 21 pixels, so no matter how big the template we are going to extract is, we need to scale it down to that size.

In order to find the scale rate SR, we divide the distance between the left and right pupil candidates by 23, where 23 is the distance between the left and right pupils in our training templates.

We will extract a template of size 35*SR * 21*SR where the left and right pupil candidates are aligned on the 8*SR row and the distance between them is 23*SR pixels. After extracting the template we scale it down by SR, and we get a template that has the size and alignment of the training templates.

It is very important to notice that all our training templates had the pupils aligned horizontally, so if the template that we extracted was rotated (the face was rotated by a certain angle), we need to rotate the template back to a horizontal alignment of the pupils. Since this is a real-time application and our goal is less processing time, we thought of skipping the downscale operation by doing the following.

We know that the final result should be a 35 * 21 pixel template; in other words we need 21 lines with 35 pixels extracted from each one of them, and that is what we are going to do, since the scale rate between the training templates and the one we are about to extract is SR.

That means that if we have a line that is 35*SR pixels long and we extract every SR-th pixel of that


line, we will get 35 pixels that cover that line; the same goes for the height of the template (21*SR): if we select every SR-th line we will get 21 lines that cover the 21*SR lines. The final result will be 35*21 pixels that represent the 35*SR * 21*SR pixels; in other words we get a 35*21 template that represents the SR downscale of the original template, so it is now ready to be passed to the support vector machine.
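As a small illustration, this skip-the-downscale trick amounts to strided sampling; the sketch below assumes SR is an integer, whereas a real implementation would round or interpolate.

import numpy as np

def extract_template(image, top, left, sr):
    # Take every sr-th pixel of every sr-th row of the 35*sr x 21*sr
    # region: the result is a 21 x 35 template, i.e. the SR downscale
    # of the original region, with no separate resize step.
    region = image[top : top + 21 * sr, left : left + 35 * sr]
    return region[::sr, ::sr]   # shape (21, 35)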
Find Nose Tip Module

Now that we have located the eyes, the final step is to find the nose tip. The first step is to extract the Region of Interest (ROI); in case the face was rotated, we need to rotate the ROI back to a horizontal alignment of the eyes.

From the following figure we can see that the blue line defines a perfect square through the pupils and the outside corners of the mouth; the nose tip should fall inside this square, so this square becomes our region of interest for finding the nose tip.

[Figure: finding the nose tip.]

The nose tip has a convex shape, so it collects more light than other features in the ROI because it is closer to the light source. Using this idea we tried to locate the nose tip with intensity profiles. In horizontal intensity profiles we add, vertically, to each line the values of the lines that precede it in the ROI. Since the nose bridge is brighter than the surrounding features, the values should accumulate faster at the bridge location. In vertical intensity profiles we add, horizontally, to each column the values of the columns that precede it in the ROI; as in the horizontal profile, the values accumulate faster at the nose tip position, so the maximum value gives us the y coordinate of the nose tip. From both the horizontal and vertical profiles we were able to locate the nose tip position, but unfortunately this method did not give accurate results, because there might be several maximum values in a profile that are close to each other, and choosing the correct maximum value that really points out the coordinates of the nose tip is a difficult task.

So instead of using the intensity profiles alone, we will be applying the following method. At first we need to locate the nose bridge, and then we will find the nose tip on that bridge. As mentioned earlier, the nose bridge is brighter than the surrounding features, so we will use this criterion to locate the nose-bridge-point (NBP) on each line of the ROI.

We will be using an SSR filter to locate the NBP candidates in each ROI line. The width of the filter is set to half of the distance between the eyes, because from figure 19 we can notice that the yellow line (the nose width) is equal to half of the blue line (the distance between the eyes).

[Figure: the SSR filter used to locate nose bridge candidates.]

After calculating the integral image of the ROI, each line of it is scanned with this filter; we remember that the nose bridge is brighter than the regions to the left and right of it; in other words, the center of the SSR filter is considered as an NBP candidate if the center sector is brighter than the side sectors:

S2 > S1 (13)
S2 > S3 (14)

In each line we might get several NBP candidates, so the final NBP will be the candidate


that has the brightest S2 sector. In order to avoid picking some bright video noise as the NBP, we will be using the horizontal intensity profile.

So instead of applying the SSR filter to a line of the ROI, we apply it to the horizontal profile calculated from the first line down to the line that we are dealing with, because, as already mentioned, the values accumulate faster at the nose bridge location. By using the horizontal profile we are sure that we are picking the right NBP candidate and not some bright point caused by noise; of course, the results get more accurate as we reach the last line of the ROI, because the accumulation at the nose bridge location gets more obvious.

[Figure: nose bridge detection with the SSR filter and the horizontal profile. The ROI is enlarged only for clearer viewing.]

Now that we have located the nose bridge, we need to find the nose tip on that bridge. Since each NBP represents the brightest S2 sector on the line it belongs to, and that S2 sector contains the accumulated vertical sum of the intensities in that sector from the first line to the line it belongs to, we will be using this information to locate the nose tip.
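A minimal sketch of this nose-bridge search is given below; the ROI is assumed to be a grayscale array and w the SSR filter width (half the inter-pupil distance).

import numpy as np

def nose_bridge_points(roi, w):
    # Horizontal profile: row y accumulates all rows 0..y of the ROI,
    # so the bright nose bridge stands out against per-frame noise.
    profile = roi.cumsum(axis=0)
    third = w // 3
    points = []
    for y in range(roi.shape[0]):
        row, best_x, best_s2 = profile[y], None, -1.0
        for x in range(roi.shape[1] - w + 1):
            s1 = row[x : x + third].sum()
            s2 = row[x + third : x + 2 * third].sum()
            s3 = row[x + 2 * third : x + w].sum()
            # NBP candidate: the center sector is brighter than both
            # sides; keep the brightest S2 on this line.
            if s2 > s1 and s2 > s3 and s2 > best_s2:
                best_x, best_s2 = x + w // 2, s2
        points.append((best_x, y))
    return points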

Face Tracking Module

Now that we have found the facial features that we need, we will track them in the video stream. The nose tip is tracked to use its movement and coordinates as the movement and coordinates of the mouse pointer. The eyes are tracked to detect their blinks, where the blink becomes the mouse click. The tracking process is based on predicting the place of the feature in the current frame from its location in previous ones; template matching and some heuristics are applied to locate the feature's new coordinates.

CONCLUSION

In detection mode, the eyes and nose tip were located accurately when the following conditions were fulfilled:
• The face is not rotated more than 5° around the axis that passes through the nose tip (as long as the eyes fall in sectors S1 and S3 of the SSR filter).
• The face is not rotated more than 30° around the axis that passes through the neck (profile view).
• Wearing glasses does not affect our detection process.

As for different scales, it is best to be about 35 cm from the webcam, because when the face is a bit far from the screen the program may detect a false positive, especially when the background is jammed (crowded).



DESIGN OF DETERMINISTIC KEY


DISTRIBUTION FOR WSN

*S. Jenifa Subha Priya, Student **T.Kavitha, Assistant Professor


Dept of Computer Science and Engg
Jerusalem College of Engineering
Anna University Chennai
Chennai, India.
*jenefa16@gmail.com
**haikavi18@yahoo.co.in
Abstract—Wireless sensor network (WSN) is composed of a large number of sensor nodes with limited computation power, storage and communication capabilities. The wireless communication employed by sensor networks facilitates eavesdropping and packet injection by an adversary. The security of wireless sensor networks depends on the existence of strong and efficient key distribution mechanisms. Since the network topology is unknown prior to deployment, a key pre-distribution scheme is required, where keys are stored into the ROMs of sensors prior to deployment. The keys stored must be carefully selected so as to increase the probability that two neighboring sensor nodes, which are within each other's wireless communication range, have at least one key in common. Nodes which do not share a key may communicate through a path on which each pair of neighboring nodes shares a key.

The main task is to safely distribute the shared keys to the sensor nodes with high connectivity and good resilience at a minimum resource requirement. The solution to key distribution is such that a pool of symmetric keys is chosen and a subset of the pool (a key chain) is distributed to each sensor node. Two nodes that want to communicate search their key chains to determine whether they share a common key; if they don't share a key in common, then there may be a path, called a key path, between these two nodes where each pair of neighboring nodes on this path has a key in common. In this paper we present a novel key pre-distribution algorithm based on number theory which uses the Chinese Remainder Theorem.

Keywords— Security in Wireless Sensor Networks; key pre-distribution; key management; number theory; deterministic approach.

Introduction

Wireless sensor network (WSN) is composed of a large number of sensor nodes with limited power, computation, storage and communication capabilities. Sensor networks are being deployed for a wide variety of applications, including military sensing, environment monitoring, patient monitoring and tracking, smart environments, etc. When sensor networks are deployed in a hostile environment, security becomes extremely important, as they are prone to different types of malicious attacks. Therefore security must be provided for a sensor network to ensure the secrecy of sensitive data. However, security in WSNs is more difficult than in conventional wired networks due to the inherent resource and computing constraints of sensors. Therefore, the design of security schemes for WSNs should consider factors that


are related to energy consumption, computation, and memory resources.

Key management lays the foundation for ensuring the security of network services and applications in WSNs. The goal of key management is to establish the required keys between sensor nodes that exchange data. Due to the constraints of sensor nodes, symmetric key management systems should be the only option for WSNs. Key distribution is divided into two major types: key pre-distribution and dynamic key generation.

If the environment is uncontrolled or the WSN is very large, deployment has to be performed by randomly scattering the sensor nodes over the target area. It may be possible to provide denser sensor deployment at certain spots, but the exact positions of the sensor nodes cannot be controlled. Thus, the network topology cannot be known precisely prior to deployment. Since the network topology is unknown prior to deployment, key pre-distribution can overcome this problem.

Key pre-distribution is a scheme where keys are stored into the ROMs of sensors prior to deployment. The keys stored must be carefully selected so as to increase the probability that two neighboring sensor nodes, which are within each other's wireless communication range, have at least one key in common. Nodes which do not share a key may communicate through a path on which each pair of neighboring nodes shares a key. The length of this path is called the key-path length. Average key-path length is an important performance metric and design consideration.

The evaluation parameters involved in key distribution are key connectivity, resilience, scalability, and resource requirement. Key connectivity is evaluated to know the probability of key-share; enough key connectivity must be provided for a WSN to perform its intended functionality. The resilience factor is the resistance against node capture: compromise of security credentials, which are stored on a sensor node or exchanged over radio links, should not reveal information. Scalability is the ability to support larger networks; the key distribution mechanism must support large networks, and must be flexible against a substantial increase in the size of the network even after deployment. The storage, processing and communication range required must also be considered. A survey of key management techniques for WSNs is presented in [16-20].

This article is structured as follows. In the next section some selected related works are given. Section III introduces some basics of number theory. Section IV emphasizes the design of the proposed key pre-distribution algorithm. Lastly, Section V covers the conclusion.

RELATED WORKS

Basic probabilistic key pre-distribution scheme

This basic scheme [1] relies on probabilistic key sharing among the nodes of a random graph. In the key setup phase, a large key-pool of KP keys and their identities are generated. For each sensor, k keys are randomly drawn from the key-pool KP without replacement. These k keys and their identities form the key-chain for a sensor node. In the shared-key discovery phase, two neighbor nodes exchange and compare the lists of identities of the keys in their key-chains.

Cluster key grouping scheme

This cluster scheme [8] divides key-chains into C clusters where each cluster has a start key ID. The remaining key IDs within the cluster are implicitly known from the start key ID. Thus, only the start key IDs of the clusters are broadcast during the shared-key discovery phase.

Pair-wise key establishment protocol

Here [5] every sensor node has a unique ID which is used as a seed to a PRF. Key IDs for the keys in the key-chain of node SA are generated by PRF(IDA). Thus, broadcast messages carry only one key ID. Also, the storage which is required to buffer a received broadcast message before processing decreases substantially. But a sensor node has to execute PRF(ID) for each broadcast message received from a neighbor.


Key pre-distribution using deployment knowledge scheme

This builds a key pre-distribution scheme based on a deployment knowledge model [21]. Sensor nodes are divided into groups and deployed at resident points, where the points are arranged in two dimensions. In the key setup phase, the key-pool KP is divided into sub key-pools which are used to deploy the appropriate key-chains to the proper nodes based on their predicted location in the WSN.

Combinatorial design

The combinatorial design based pair-wise key pre-distribution scheme [2,10] is based on block design techniques. It employs symmetric and generalized quadrangle design techniques. The scheme uses a finite projective plane of order n (for prime power n) to generate a symmetric design with parameters (n2+n+1, n+1, 1). The design supports n2+n+1 nodes, and uses a key-pool of size n2+n+1. It generates n2+n+1 key-chains of size n+1, where every pair of key-chains has exactly one key in common and every key appears in exactly n+1 key-chains. More scalable solutions are provided and key sharing is increased, but the drawback is that not all network sizes are supported for the fixed key chain size.

F. Signal range based probabilistic method

In the basic probabilistic scheme, any two nodes in the sensor network have the same probability of sharing keys. But in this method [15] the probability is different, depending on the signal range of the node. The result is that the key-sharing probability of two nodes within the signal range is higher than that of two nodes outside the signal range.

NUMBER THEORY

A key pre-distribution algorithm using number theory, with high connectivity, high resilience and low memory requirements, is being designed by implementing a deterministic approach. During this phase the key chain is generated and the number of nodes supported by the network is calculated using the Chinese remainder theorem as its basis.

Number theory is the branch of pure mathematics concerned with the properties of numbers in general, and integers in particular, as well as the wider classes of problems that arise from their study. In elementary number theory, integers are studied without use of techniques from other mathematical fields. Questions of divisibility, use of the Euclidean algorithm to compute greatest common divisors, integer factorizations into prime numbers, investigation of perfect numbers and congruences belong here. Several important discoveries of this field are Fermat's little theorem, Euler's theorem, the Chinese remainder theorem and the law of quadratic reciprocity.

Elementary Number Theory

The properties of multiplicative functions such as the Möbius function and Euler's φ function, integer sequences, factorials, and Fibonacci numbers also fall into this area. One of the important discoveries of elementary number theory is the Chinese remainder theorem, described in detail in the following subsection.

Modular arithmetic is the arithmetic of congruences. In modular arithmetic, numbers "wrap around" upon reaching a given fixed quantity, which is known as the modulus. For example, in arithmetic modulo 12 (for which the associated ring is C12), the allowable numbers are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11. Let m ≠ 0 be an integer. We say that two integers a and b are congruent modulo m if there is an integer k such that a – b = km, and in this case we write a ≡ b mod m. The condition "a – b = km for some integer k" is equivalent to the condition "m divides a – b".

Chinese Remainder Theorem

The CRT appears in the mathematical classic of Sun Zi, Sun's Arithmetical Manual, written in ancient China around the first century A.D. [15]. The CRT is a result about congruences in number theory. To date, its usefulness has been evident within the extent of the "three C's": its original field of application was computing, and the theory of codes and cryptography are two more recent fields of application. The Chinese Remainder Theorem says that certain systems of simultaneous congruences with different moduli have solutions.

Given a moduli set mi such that all are pairwise relatively prime, i.e. gcd(mi, mj) = 1 for i ≠ j, there exists a unique integer u in the range [0, M-1], where M = m1 × m2 × … × mk. Let a1, a2, …, ak be integers. Then the congruence equations u ≡ a1 mod m1, u ≡ a2 mod m2, …, u ≡ ak mod mk have the unique solution:
480
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

u = (a1·M1·y1 + a2·M2·y2 + … + ak·Mk·yk) mod M, where Mi = M/mi and yi is the inverse of Mi modulo mi.

Any object A in Zm can be represented by a unique k-tuple whose elements are in Zmi, using the correspondence A ↔ (a1, a2, …, ak), where ai = A mod mi for 1 ≤ i ≤ k.

Example 1: Let mi = {2, 3} where m0 = 2 and m1 = 3, so M = 2×3 = 6 and A ∈ {0, 1, 2, 3, 4, 5}. Using the CRT correspondence (A mod m0, A mod m1), the tuples generated are {(0,0), (1,1), (0,2), (1,0), (0,1), (1,2)}.
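The reconstruction in the other direction can be sketched in a few lines (Python 3.8+ for the modular inverse via pow); this is an illustration of the theorem, not code from the paper.

def crt(residues, moduli):
    # Recovers the unique u in [0, M-1] with u = ai (mod mi), using
    # u = sum(ai * Mi * yi) mod M, Mi = M/mi, yi = inverse of Mi mod mi.
    M = 1
    for m in moduli:
        M *= m
    u = 0
    for a, m in zip(residues, moduli):
        Mi = M // m
        yi = pow(Mi, -1, m)   # modular inverse, Python 3.8+
        u += a * Mi * yi
    return u % M

# Example 1: the tuple (1, 2) under moduli (2, 3) maps back to A = 5,
# since 5 mod 2 = 1 and 5 mod 3 = 2.
assert crt([1, 2], [2, 3]) == 5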
Applications of CRT

CRTBA: Chinese Remainder Theorem-Based Broadcast Authentication in Wireless Sensor Networks [6]. Broadcast authentication is a critical security service in wireless sensor networks (WSNs). However, due to the resource constraints of sensor nodes, providing an authentication mechanism for broadcast messages is difficult. μTESLA is a lightweight broadcast authentication protocol, which uses a one-way hash chain and the delayed disclosure of keys to provide the authentication service.

The Chinese Remainder Theorem is also used for energy saving in wireless sensor networks. This approach is characterized by a computationally simple packet splitting procedure able to reduce the energy needed for transmission and increase the network lifetime accordingly.

Decomposable Forward Error Correction Codes Based on the Chinese Remainder Theorem [4]. Forward Error Correction (FEC) codes are proposed to facilitate reliable multicast data distribution and are applied to several applications. FEC provides a more efficient way of reliable multicast data distribution because no retransmission is needed. The CRT also plays a vital role in many applications of digital signal processing.

DESIGN

The proposed system involves the design of a pre-distribution algorithm using a deterministic approach. The deterministic approach is the process of determining the keys before placing them within the sensor nodes. The Chinese remainder theorem of number theory is used in the generation of the key chain.

By the CRT, any object A in Zm can be represented by a unique k-tuple whose elements are in Zmi, using the correspondence A ↔ (a1, a2, …, ak), where A ∈ Zm, ai ∈ Zmi, and ai = A mod mi for 1 ≤ i ≤ k. Using this property, the key chains are formed. The following are the steps involved in deriving a key chain from the key pool using the CRT. Based on the requirement of the network size, the parameter selection phase is performed using different combinations of sets of pairwise relatively prime numbers of the CRT; then the key chains are generated. After validation of the chains, the list is shortlisted. Then an analysis of the various factors, such as connectivity, resilience and resource requirements, is performed with the shortlisted key chain list.

Parameter Selection

The network size is taken as input from the user, and the relatively prime numbers are determined in this phase. Relatively prime numbers are numbers whose greatest common factor is one. These relatively prime numbers are denoted by mi. The relatively prime numbers are selected in such a way that the number of nodes supported at the end is equal to the network size specified by the user. The product of the relatively prime numbers is also found, to determine the key pool, and it is denoted by M. The key-pool is the list of all keys or keying materials which are used in the wireless sensor network.

Let mi be the randomly generated relatively prime numbers and M be the product of the relatively prime numbers mi, M = m1 × m2 × … × mn.

Algorithm:
Input: n = number of relatively prime numbers, mi = relatively prime numbers.
Output: M = product of the relatively prime numbers.
for (i = 0; i ≤ n-1; i++)
{
    for (j = 0; j ≤ n-1; j++)
    {
        if gcd(mi, mj) = 1 for i ≠ j then (mi, mj) is a relatively prime pair
        else it is not a relatively prime pair
    }
}

From Example 1, (2, 3) is a relatively prime pair, since gcd(2, 3) = 1; mi = {2, 3} where m0 = 2 and m1 = 3, and M = 2×3 = 6. The value of M determines the key pool size.

Key Chain Generation

The key-chain is the list of keys or keying materials which are stored on a sensor
481
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

node. The key chain is generated by finding the key combinations using the Chinese remainder theorem formulation, which involves a mod function applied between the values of the key pool and the relatively prime numbers.

Let A denote the key pool, which ranges from 0 to M-1, and let m0 and m1 be the relatively prime numbers selected during the parameter selection phase. The key chains are generated using the CRT formula (A mod m0, A mod m1).

Algorithm:
Input: mi = relatively prime numbers, M = product of the mi, n = number of relatively prime numbers.
Output: key ring set.
A = (0 to M-1)
for (j = 0; j ≤ M-1; j++)
{
    for (i = 0; i ≤ n-1; i++)
    {
        KR[i][j] = {j mod mi}
    }
}

The key pool is A = {0, 1, 2, 3, 4, 5} where M = 6. The various key combinations for mi = (2, 3) are found using the Chinese remainder theorem. The key chain generated for the above example using the mod functions (A mod m0, A mod m1) is {(0,0), (1,1), (0,2), (1,0), (0,1), (1,2)}.

Key Chain Selection

Key chain selection is performed by considering the constraint that there should be no repeated keys within a key chain. The chains having repeated keys are rejected. A key chain without any repeated key is termed a valid key chain. The count of such valid key chains determines the network size that can be supported. The combinations which have repeated keys are considered invalid key chains.

Eliminate key rings which satisfy Condition 1:
Condition 1: check whether any keys are repeated within the key ring.
Condition 2: check whether two key rings have the same keys but in a different order; if so, keep any one key ring and eliminate the rest.

From Example 1, considering the conditions specified above, the valid key chains are (0,1), (0,2), (1,2), and the number of nodes supported by mi = (2, 3) is 3.
The key chain having without any repeated
key is termed as the valid key chain. The
count of such valid key chain determines the
network size it can support. The combinations REFERENCES
which are same or with repeated keys are
considered as in valid key chain. [1] L. Eschenauer and V. D. Gligor, ―A key-
management scheme for distributed sensor
Eliminate key ring which satisfy Condition 1 networks.‖ in ACM CCS, 2002, pp. 41–47.
Condition1:checking whether any keys are [2]Camtepe, S. and Yener, B.. ―Combinatorial
repeated within the key ring design of key istribution mechanisms for
Condition2:Check whether the key rings wireless sensor networks‖. In 9th European
has same keys but in different order if so keep Symposium on Research Computer Security
anyone key ring and eliminate the rest ,2004.
From example1considering the condition [3]Elaine shi and Adrian perrig, Carnegie
specified above the valid key chains Mellon University. ―Designing Secure Sensor
are(0,1),(0,2),(1,2) and the number of nodes Networks‖,IEEE wireless communications
supported are 3 by mi =(2,3). December 2004.

482
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

[4]Hung-Min Sun∗, Shih-Ying Chang∗, Yu- networks‖, IEEE communications surveys and
Hsiang Hung∗, Yu-Kai Tseng∗ and Hsin-Ta tutorials 2nd Quarter 2006,Volume8,No2.
Chiao, ―Decomposable Forward Error [15] Nguyen, H.T.T.; Guizani, M.; Minho Jo;
Correction Codes Based on Chinese Remainder Eui-Nam Huh ―An Efficient Signal-Range-Based
Theorem‖, 10th International Symposium on Probabilistic Key Predistribution Scheme in a
Pervasive Systems, Algorithms, and Wireless Sensor Network‖, Vehicular
Networks,2009. Technology, IEEE
[5] Zhu, S., Setia, S., and Jajodia, S. ―Leap: Transactions on, Volume 58, Issue 5, 2009,
Efficient security mechanisms for large-scale Page(s):2482 – 2497.
distributed sensor networks‖, In 10th ACM [16] S. A. Camtepe and B. Yener, ―Key
Conference on Computer and Communications Distribution Mechanisms for Wireless Sensor
Security2003. Networks: A Survey,‖ Computer Science
[6]Jianmin Zhang , Wenqi Yu,Xiande Department at RPI, Tech. Rep. TR-05-07,
Liu,‖CRTBA: Chinese Remainder Theorem- 2005.
Based Broadcast Authentication in Wireless [17] Yang Xiao, Venkata Krishna Rayi, Bo
Sensor Networks‖, 2009 IEEE Sun, Xiaojiang Du, Fei Hu and Michael
[7]Kaoru Kurosawa, Wataru Kishimoto, and Galloway, ―A survey of key management
Takeshi Koshiba,‖A Combinatorial Approach to schemes in wireless sensor networks‖,
Deriving lower transactions on information Computer Communications Elsevier, Science
theory,‖ vol. 54, no. 6, june 2008 Direct, pp. 2007, 2314–2341.
[8] Hwang, D., Lai, B., and Verbauwhede, I. [18] Yang Xiao, Venkata Krishna Rayi, Bo
―Energy-memory-security tradeoffs in Sun, Xiaojiang Du, Fei Hu, and Michael
distributed sensor networks‖, In 3rd Galloway, ―A Survey of Key Management
International Conference on Ad-Hoc Networks Schemes in Wireless Sensor Networks‖,
and Wireless Network. In 1st ACM Workshop Computer Communications, Special Issue On
on Security of Ad Hoc and Sensor Networks, Security On Wireless Ad Hoc And Sensor
2004. Networks, April 24, 2007.
[9] Seyit a. Camtepe and Bulent yener [19] SUN Dong-Mei1 HE Bing ―Review of Key
Rensselaer Polytechnic Institute. ―Key Management Mechanisms in Wireless Sensor
Distribution Mechanisms for Wireless Sensor Networks‖, Acta Automatica Sinica, Vol. 32,
Networks‖: A Survey TechnicalReport TR-05- No. 6, November, 2006.
07, March 23,2005.
[10]T. Kavitha, Dr.D.Sridharan, ―Hybrid Design [20] T. kavitha, Dr.D.sridharan ―Security
of Scalable Key Distribution for Wireless vulnerabilities in wireless sensor networks: a
Sensor Networks‖. IACSIT International survey‖ is published in an international Journal
Journal of Engineering and Technology, Vol.2, on Information Assurance and Security (JIAS)
No.2, April 2010 in issue 5, 2010, pp 031-044.
[11]Taekyoung Kwon, JongHyup Lee,JooSeok [21] Du, W., Deng, J., Han, Y., Chen, S., and
Song, Kwon, ―Location-Based Pairwise Key Varshney, P. ―A key management scheme for
Predistribution for Wireless Sensor wireless sensor networks using deployment
Networks‖IEEE transactions on wireless knowledge‖, In IEEE Infocom‘04, 2004.
communications, vol. 8, no. 11, november
2009.
[12]xiangqian chen,kia makki,kang yen and
nikki pissinou―sensor network security: a
survey‖ IEEE communications surveys and
tutorials vol 11,2009
[13] Yi Cheng and Dharma P. Agrawal,‖
Efficient Pairwise Key Establishment and
Management in Static Wireless Sensor
Networks‖, 2005 IEEE.
[14]Yong Wang,Garchan Attebury,and Byrav
ramamurthy .University of Nebraska-Lincoln.
―A survey of security issues in wireless sensor

483
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

DEDUCING THE SCHEMA FOR WEBSITES


USING
PAGE-LEVEL WEB DATA EXTRACTION
*TAMILARASI.P **S.MADHAN KUMAR M.E

*M.E CSE II YEAR


tamuluparaman@gmail.com
Tel No. : +919003886188

**Head Of the Department


Assistant professor
madhan866@gmail.com

Computer Science and Engineering Department


Vel tech Multi tech Dr.RangarajanDr.SakunthalaEngineeringCollege
Avadi, Chennai-600 062

1 INTRODUCTION
Abstract— Many web sites contain large sets DEEP Web, as is known to everyone,
of pages generated using a common template contains magnitudes more and valuable
or layout. So web data extraction has been an information than the surface Web. However,
important part for many web data analysis making use of such consolidated information
applications. In this paper, we study the requires substantial efforts since the pages are
problem of automatically extracting the generated for visualization not for data
database values from template generated web exchange. Thus, extracting information from
pages without any learning examples. And also Webpages for searchable Websites has been a
we study an unsupervised, page level data key step for Web information integration. An
extraction approach to deduce the schema and important characteristic of pages belonging to
templates for each individual web site, which the same Website is that such pages share the
contains either singleton or multiple data same template since they are encoded in a
records in one webpage.FiVaTech applies tree consistent manner across all the pages. In
matching, tree alignment, and mining other words, these pages are generated with a
techniques to achieve the challenging task. In predefined template by plugging data values.
experiments, FiVaTech has much higher In practice, template pages can also occur in
precision than EXALG.The experiments show surface Web (with static hyperlinks).In
an encouraging result for the test pages used addition, templates can also be used to render
in many state-of-the-art Web data extraction a list of records to show objects of the same
works. kind. Thus, information extraction from
template pages can be applied in many
Index Terms-Semi structured data, Web data situations. What‘s so special with template
extraction, multiple trees merging, wrapper pages is that the extraction targets for
induction. template Webpages are almost equal to the
data values embedded during page

484
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

generation. Thus, there is no need to annotate automatically. Generally speaking, templates,


the Webpages for extraction targets as in as a common model for all pages, occur quite
nontemplate page information extraction (e.g., fixed as opposed to data values which vary
Softmealy [5], Stalker [9], WIEN [6], etc.) and across pages. Finding such a common
the key to automatic extraction depends on template requires multiple pages or a single
whether we can deduce the template page containing multiple records as input.

Fig. 1. (a) A Webpage and its two different schemas (b) S and (c) S‘.
When multiple pages are given,the 2 PROBLEM FORMULATION
extraction target aims at page-wide In this section, we formulate the model
information (e.g., RoadRunner [4] and EXALG for page creation, which describes how data
[1]).When single pages are given, the are embedded using a template. As we know,
extraction target is usually constrained to a Webpage is created by embedding a data
recordwide information (e.g., IEPAD [2], DeLa instance encoding function that combines a
[11], and DEPTA [14]), which involves the data instance with the template to form the
addition issue of record-boundary detection. Webpage, where all data instances of the
Page-level extraction tasks, although do not database conform to a common schema,
involve the addition problem of boundary which can be defined as follows (a similar
detection, are much more complicated than definition can also be found at EXALG [1]):
record-level extraction tasks since more data Definition 2.1 (Structured data). A data
are concerned. In this paper, we focus on schema can be of the following types:
page-level extraction tasks and propose a new 1. A basic type β represents a string of
approach, called FiVaTech, to automatically tokens, where a token is some basic
detect the schema of a Website. The proposed units of text.
technique presents a new structure, called
fixed/variant pattern tree, a tree that carries 2. If η1,η2, . . . ,ηk are types, then their
all of the required information needed to ordered list <η1,η2, . . . ηk> also
identify the template and detect the data forms a type η. We say that the type
schema. We combine several techniques: η is constructed from the types
alignment, pattern mining, as well as the idea <η1,η2, . . . ,ηk> using a type
of tree templates to solve the much difficult constructor of order k. An instance of
problem of page-level template construction. the korder _ is of the form <x1; x2; .
In experiments, FiVaTech has much higher . . ; xk>, where x1, x2, . . . .,xk are
precision than EXALG, one of the few page- instances of types η1,η2 . . . ηk,
level extraction system. respectively. The type η is called

485
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

a. A tuple, denoted by <k >η, if introduce how input DOM trees can be
the cardinality (the number recognized and merged into the pattern tree
of instances) is 1 for every for schema detection. According to our page
instantiation. generation model, data instances of the same
type have the same path from the root in the
b. An option, denoted by DOM trees of the input pages.
<k>?η, if the cardinality is Thus, our algorithm does not need to
either 0 or 1 for every merge similar subtrees from different levels
instantiation. and the task to merge multiple trees can be
broken down from a tree level to a string level.
c. A set, denoted by {k}η, if the Starting from root nodes <html> of all input
cardinality is greater than 1 DOM trees, which belong to some type
for some instantiation. constructor we want to discover, our algorithm
applies a new multiple string alignment
d. A disjunction, denoted by
algorithm to their first-level child nodes. There
{η1|η2|....|ηk)η are options
are at least two advantages in this design.
and the cardinality sum of
First, as the number of child nodes under a
the k options (η1-ηk)equals 1
parent node is much smaller than the number
for every instantiation of η.
of nodes in the whole DOM tree or the number
As mentioned before, template pages of HTML tags in a Webpage, thus, the effort
are generated by embedding a data instance for multiple string alignment here is less than
in a predefined template via a CGI program. that of two complete page alignments in
Thus, the reverse engineering of finding the RoadRunner [4].
template and the data schema given input Second, nodes with the same tag
Webpages should be established on some name (but with different functions) can be
page generation model, which we describe better differentiated by the subtrees they
next.The advantage of tree-based page represent, which is an important feature not
generation model is that it will not involve used in EXALG [1]. Instead, our algorithm will
ending tags (e.g., </html>, </body>, etc.) recognize such nodes as peer nodes and
into their templates as in string-based page denote the same symbol for those child nodes
generation model applied in EXALG. to facilitate the following string alignment.
Concatenation is a required operation After the string alignment step, we conduct
in page generation model since subitems of pattern mining on the aligned string S to
data must be encoded with templates to form discover all possible repeats (set type data)
the result. For example, encoding of a k-order from length 1 to length |S|/2. After removing
type constructor η with instance x should extra occurrences of the discovered pattern
involve the concatenation of template trees T, (as that in DeLa [11]), we can then decide
with all the encoded trees of its subitems for whether data are an option or not based on
x. However, tree concatenation is more their occurrence vector, an idea similar to that
complicate since there is more than one point in EXALG [1].
to append a subtree to the rightmost path of
an existing tree. Thus, we need to consider
the insertion position for tree concatenation.

3 FIVATECH TREE MERGING


The proposed approach FiVaTech
contains two modules: tree merging and
schema detection (see Fig. 2). The first
module merges all input DOM trees at the
same time into a structure called fixed/variant
pattern tree, which can then be used to detect
the template and the schema of the Website in
the second module. In this section, we will

486
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

P. For nonleaf child node c, if c is not a fixed


template tree (as defined in the next section),
the algorithm recursively calls the tree
merging algorithm with the peer subtrees of c
(by calling procedure peerNode(c,M), which
returns nodes in M having the same symbol of
c) to build the pattern tree.The next four
sections will discuss in details recognition of
peer subtrees, multiple string alignment,
frequent pattern mining, and merging of
optional nodes, which are applied for each
node in constructing the fixed/variant pattern
tree.
4 SCHEMA DETECTION
In this section, we describe the
procedure for detecting schema and template
based on the page generation model and
Fig. 2. The FivaTech approach for wrapper problem definition. Detecting the structure of
induction a Website includes two tasks:
Given a set of DOM trees T with the Identifying the schema and defining
same function and its root node P, the system the template for each type constructor of this
collects all (first-level) child nodes of P from T schema. Since we already labelled basic type,
in a matrix M, where each column keeps the set type, and optional type, the remaining task
child nodes for every peer subtree of P. Every for schema detection is to recognize tuple type
node in the matrix actually denotes a subtree, as well as the order of the set type and the
which carries structure information for us to optional data. the system traverses the
differentiate its role. Then, we conduct the fixedvariant pattern tree P from the root
four steps: peer node recognition, matrix downward and marks nodes as k-order (if the
alignment, pattern mining, and optional node node is already marked as some data type) or
detection in turn. k-tuple. For nodes with only one child and not
In the peer node recognition step, two marked as set or optional type, there is no
nodes with the same tag name are compared need to mark it as 1-tuple (otherwise, there
to check if they are peer subtrees. All peer will be too many 1-tuples in the schema);
subtrees will be denoted by the same symbol. thus, we simply traverse down the path to
In the matrix alignment step ,the discover other type nodes. For nodes with
system tries to align nodes (symbols) in the more than one branch (child), we will mark
peer matrix to get a list of aligned nodes them as k-order if k children have the function
childList. In addition to alignment,the other MarkTheOrder(C) return true. The identified
important task is to recognize variant leaf tuple nodes of the running example are
nodes that correspond to basic-typed data. marked by angle brackets <>.
In the pattern mining step ,the system Then, the schema tree S can be
takes the aligned childList as input to detect obtained by excluding all of the tag nodes that
every repetitive pattern in this list starting with have no types. Once the schema is identified,
length 1. For each detected repetitive pattern, the template of each type can be discovered
all occurrences of this pattern except for the by concatenating nodes without types. The
first one are deleted for further mining of insertion positions can also be calculated with
longer repeats. The result of this mining step reference to the leaf node of the rightmost
is a modified list of nodes without any path of the template subtree.
repetitive patterns. 5 EXPERIMENTS
In the last step, the system
recognizes optional nodes if a node disappears We conducted experiment to evaluate
in some columns of the matrix and group the schema resulted by our system and
nodes according to their occurrence vector. compare FiVaTech with other recent approach.
After the above four steps, the system inserts The experiment is conducted to evaluate the
nodes in the modified childList as children of

487
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

schema resulted by our system, and at the challenge and causes many problems. Also,
same time, to compare FiVaTech with EXALG EXALG assumes that a pair of two valid
[1]; the pagelevel data extraction approach equivalence classes is nested, although this is
that also detects the schema of a Website. not necessarily true. Two data records may be
Given a set of Webpages of a Website intertwining in terms of their HTML codes.
as input, FiVaTech outputs three types of files Finally, a more compact schema can
for the Website. The first type (a text file) be conducted by compressing continuous
presents the schema (data values) of the tuples, removing continuous sets and any 1-
Website in an XML-like structure. We use tuples. A list of types η1,η2, . . . ,ηn is
these XML files in the experiment to compare continuous if ηi is a child of ηi-1 (for n > i > 1).
FiVaTech with EXALG. If η1,η2, . . . ,ηn are of tuples of order k1,k2, . . .
,kn respectively, then the new compressed
6 COMPARISONS WITH RELATED tuple is of order k1+k2+ . . ....kn-n+1. For the
WORK above example, we can compress η3,η4,η5,η7 to
Web data extraction has been a hot get a 7-tuple(=2+2+4+2+2-4+1) and the
topic for recent 10 years. A lot of approaches new schema
have been developed with different task S={{β1,β2,(β3),β4,(β5)?η8,
domain, automation degree, and techniques (<{β6}η11>η10)?η9,(<β7>η13)?η12}ω}η1,
[3], [7].Regardless of the granularity of the where ω is a 7-set.
used tokens, both FiVaTech and EXALG 7 CONCLUSIONS
recognize template and data tokens in the In this paper, we proposed a new Web
input Webpages. data extraction approach, called FiVaTech to
EXALG assumes that HTML tags as the problem of page-level data extraction. We
part of the data and proposes a general formulate the page generation model using an
technique to identify tokens that are part of encoding scheme based on tree templates and
the data and tokens that are part of the schema, which organize data by their parent
template by using the occurrence vector for node in the DOM trees. FiVaTech contains two
each token and by differentiating the role of phases: phase I is merging input DOM trees to
the tokens according to its DOM tree path. construct the fixed/variant pattern tree and
Although this assumption is true, phase II is
differentiating HTML tag tokens is a big
schema and template detection based generation model with tree-based template
on the pattern tree. matches the nature of the Webpages.
According to our page generation Meanwhile, the merged pattern tree gives very
model, data instances of the same type have good result for schema and template
the same path in the DOM trees of the input deduction.
pages. Thus, the alignment of input DOM trees REFERENCES
can be implemented by string alignment at [1] A. Arasu and H. Garcia-Molina, ―Extracting
each internal node. We design a new Structured Data from Web Pages,‖ Proc. ACM
algorithm for multiple string alignment, which SIGMOD, pp. 337-348, 2003.
takes optional- and set-type data into [2] C.-H. Chang and S.-C. Lui, ―IEPAD:
consideration. The advantage is that nodes Information Extraction Based on Pattern
with the same tag name can be better Discovery,‖ Proc. Int‘l Conf. World Wide Web
differentiated by the subtree they contain. (WWW-10), pp. 223-231, 2001.
Meanwhile, the result of alignment makes [3] C.-H. Chang, M. Kayed, M.R. Girgis, and
pattern mining more accurate. With the K.A. Shaalan, ―Survey of Web Information
constructed fixed/variant pattern tree,wecan Extraction Systems,‖ IEEE Trans. Knowledge
easily deduce the schema and template for the and Data Eng., vol. 18, no. 10, pp. 1411-1428,
input Webpages. Oct. 2006.
Although many unsupervised [4] V. Crescenzi, G. Mecca, and P. Merialdo,
approaches have been proposed for Web data ―Knowledge and Data Engineerings,‖ Proc.
extraction (see [3], [7] for a survey), very few Int‘l Conf. Very Large Databases (VLDB), pp.
works (RoadRunner and EXALG) solve this 109-118, 2001.
problem at a page level. The proposed page

488
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

[5] C.-N. Hsu and M. Dung, ―Generating Finite-


State Transducers for Semi-Structured Data
Extraction from the Web,‖ J. Information
Systems, vol. 23, no. 8, pp. 521-538, 1998.
[6] N. Kushmerick, D. Weld, and R.
Doorenbos, ―Wrapper Induction for
Information Extraction,‖ Proc. 15th Int‘l Joint
Conf. Artificial Intelligence (IJCAI), pp. 729-
735, 1997.
[7] A.H.F. Laender, B.A. Ribeiro-Neto, A.S.
Silva, and J.S. Teixeira, ―A Brief Survey of Web
Data Extraction Tools,‖ SIGMOD Record, vol.
31, no. 2, pp. 84-93, 2002.
[8] B. Lib, R. Grossman, and Y. Zhai, ―Mining
Data Records in Web pages,‖ Proc. Int‘l Conf.
Knowledge Discovery and Data Mining (KDD),
pp. 601-606, 2003.
[9] I. Muslea, S. Minton, and C. Knoblock, ―A
Hierarchical Approach to Wrapper Induction,‖
Proc. Third Int‘l Conf. Autonomous Agents (AA
‘99), 1999.
[10] K. Simon and G. Lausen, ―ViPER:
Augmenting Automatic Information Extraction
with Visual Perceptions,‖ Proc. Int‘l Conf.
Information and Knowledge Management
(CIKM), 2005.
[11] J. Wang and F.H. Lochovsky, ―Data
Extraction and Label Assignment for Web
Databases,‖ Proc. Int‘l Conf. World Wide Web
(WWW-12), pp. 187-196, 2003.
[12] Y. Yamada, N. Craswell, T. Nakatoh, and
S. Hirokawa, ―Testbed for Information
Extraction from Deep Web,‖ Proc. Int‘l Conf.
World Wide Web (WWW-13), pp. 346-347,
2004.

489
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

A TOOL FOR FINDING BUGS IN WEB


APPLICATIONS
D.Ramalingam.,B.E., (M.E.),
e-mail:ramscse_2006@yahoo.co.in
Studying Second year M.E.(cse)
At
Adhiparasakthi Engineering College,Melmaruvathur

ABSTRACT web applications such as ASP,JSP


Web script crashes and malformed
and PHP with extended concept of
dynamically generated WebPages are
dynamic test generation technique.
common errors, and they seriously impact
In addition, we can find bugs for
the usability of Web applications. Current
different web applications at
tools for webpage validation cannot
different computers at a time by this
handle the dynamically generated pages
tool with the help of proxy server.
that are ubiquitous on today‘s Internet.
This paper presents Apollo‘s algorithms
We present a dynamic test generation
and implementation, and an experimental
technique for the domain of dynamic Web
evaluation that revealed 673 faults in six
applications. The technique utilizes both
PHP Web applications
combined concrete and symbolic execution
and explicit-state model checking. The
1 INTRODUCTION
technique generates tests automatically,
DYNAMIC test generation tools,
runs the tests capturing logical constraints
such as DART, Cute, and EXE, generate
on inputs, and minimizes the conditions on tests by executing an application on
concrete input values, and then creating
the inputs to failing tests so that the
additional input values by solving symbolic
resulting bug reports are small and useful constraints derived from exercised control-
flow paths. To date, such approaches
in finding and fixing the underlying faults.
have not been practical in the domain of
The earlier version of Apollo implements Web applications, which pose special
challenges due to the dynamism of the
the technique for the PHP programming
programming languages, the use of
language. Apollo generates test inputs for implicit input parameters, their use of
persistent state, and their complex
a Web application, monitors the
patterns of user interaction.
application for crashes, and validates that
This paper extends dynamic test
the output conforms to the HTML
generation to the domain of web
specification. In this paper, we propose applications that dynamically create web
(HTML) pages during execution, which are
that we are finding bugs in all kind of

490
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

typically presented to the user in a Fourth, a browser might succeed in


browser. Apollo applies these techniques displaying only part of a malformed
in the context of the scripting language webpage, while silently discarding
PHP, one of the most popular languages important information. Fifth, search
for server-side Web programming. engines may have trouble indexing
malformed pages.
According to the Internet research
service, Netcraft,1 PHP powered 21 million 1.1 EXISTING SYSTEM:
domains as of April 2007, including large,
well-known webs sites such as Wikipedia The existing systems that it
and WordPress. In addition to dynamic generates test to the domain of web
content, modern Web applications may application for finding bugs by a single
also generate significant application logic, computer at a time. The system is also
typically in the form of JavaScript code finding bugs only for PHP web
that is executed on the client side. Our applications. This system‘s goal is to find
techniques are primarily focused on two kinds of failures in web applications:
server-side PHP code, although we do execution failures that are manifested as
some minimal analysis of client-side code crashes or warnings during program
to determine how it invokes additional execution, and HTML failures that occur
server code through user-interface when the application generates malformed
mechanisms such as forms. HTML. Execution failures may occur, for
example, when a web application calls an
Our goal is to find two kinds of undefined function or reads a nonexistent
failures in web applications: execution file. In such cases, the HTML output
failures that are manifested as crashes or contains an error message and execution
warnings during program execution, and of the application may be halted,
HTML failures that occur when the depending on the severity of the failure.
application generates malformed HTML.
Execution failures may occur, for example, 1.2 PROPOSED SYSTEM
when a web application calls an undefined The proposed system is mainly
function or reads a nonexistent file. In designed to extend the dynamic test
such cases, the HTML output contains an generation technique for all domains of
error message and execution of the web applications (such as JSP, ASP, and
application may be halted, depending on PHP). This technique implemented
the severity of the failure. through Apollo algorithm. To implement
Apollo algorithm, we are going to develop
HTML failures occur when output a tool called Apollo. The tool for which
is generated that is not syntactically well- can test multiple web applications at
formed HTML (e.g., when an opening tag multiple computers at a time by sharing
is not accompanied by a matching closing the proxy through server.
tag). HTML failures are generally not as
important as execution failures because The failure detection algorithm
Web browsers are designed to tolerate returns bug reports. Each bug report
some degree of malformedness in HTML, contains a set of path constraints, and a
but they are undesirable for several set of inputs exposing the failure. Previous
reasons. First and most serious is that dynamic test generation tools presented
browsers‘ attempts to compensate for the whole input (i.e., many input
malformed web pages may lead to crashes Parameter; value pairs) to the user
and security vulnerabilities. Second, without an indication of the subset of the
standard HTML renders faster. Third, input responsible for the failure. As a
malformed HTML is less portable across postmortem phase, our minimization
browsers and is vulnerable to breaking or algorithm attempts to find a shorter path
looking strange when displayed by constraint for a given bug report. This
browser versions on which it is not tested. eliminates irrelevant constraints, and a

491
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

solution for a shorter path constraint is


often a smaller input. The above problem
can completely be eliminated by this
exposed system.

2. IMPLEMENTATION

2.1 SYSTEM ARCHITECTURE

We created a tool called Apollo that


implements our technique for PHP. Apollo Clearly, this potentially leaves
consists of three major components, many uses of input unaccounted for.
Executor, Bug Finder, and Input Generator However, our results suggest that this is
illustrated in the below Fig.. This section sufficient to capture the bulk of how PHP
first provides a high-level overview of the code uses inputs in practice. Values
components and then discusses the derived directly from input are those
pragmatics of the implementation. The read from one of the special arrays POST,
inputs to Apollo are the program under GET, and REQUEST, which store
test and an initial value for the parameters supplied to the PHP program.
environment. The environment of a PHP For example, executing the statement $x
program consists of the database, cookies, ¼ $ GET½‗‗param1}_ results in
and stored session information. The initial associating the value read from the global
environment usually consists of a parameter param1 and bound to
database populated with some values, and parameter x with the symbolic variable
usersupplied information about param Values maintain their associations
username/password pairs to be used for through the operations mentioned above;
database authentication. that is, the symbolic variables for the new
values receive the same value as the
THE EXECUTOR source value had. Importantly, during
program execution, the concrete values
It is responsible for executing a PHP remain, and the shadow interpreter does
script with a given input in a given state. not influence execution.
The executor contains two
subcomponents: . The Shadow Interpreter
is a PHP interpreter that we have modified BUG FINDER
to propagate and record path constraints The bug finder is in charge of
and positional information associated with transforming the results of the executed
output. This positional information is used inputs into bug reports. Below is a detailed
to determine which failures are likely to be description of the components of the bug
symptoms of the same fault. The State finder. Bug report repository. This
Manager restores the given state of the repository stores the bug reports found in
environment (database, session, and all executions. Each time a failure is
cookies) before the execution and stores detected, the corresponding bug report (if
the new environment after the execution. the same failure was discovered before) is
The Bug Finder uses an oracle to find updated with the path constraint and the
HTML failures, stores all bug reports, and configuration inducing the failure.
finds the minimal conditions on the input
parameters for each bug report. The Bug INPUT GENERATOR
Finder has the following subcomponents: .
The Oracle finds HTML failures in the UI option analyzer: Many PHP
output of the program. Web applications create interactive HTML
pages that contain user-interface elements
such as buttons and menus that allow the

492
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

user interaction needed to execute further tracks the state of the environment, and
parts of the application. In such cases, automatically discovers additional
pressing the button may result in the configurations based on an analysis of the
execution of additional PHP source files. output for available user options. In
There are two challenges involved in particular, the algorithm 1) tracks changes
dealing with such interactive applications. to the state of the environment (i.e.,
session state, cookies, and the database)
and 2) performs an ―on-the-fly‖ analysis of
the output produced by the program to
determine what user options it contains,
with their associated PHP scripts.

simplified algorithm that was


previously shown in Fig. 2.

1 A configuration contains an explicit


state of the environment (before the
only state that was used was the
initial state S0) in addition to the path
constraint and the input (line 3).

Constraint solver: The 2 Before the program is executed, the


algorithm (method executeConcrete)
interpreter implements a lightweight
will restore the environment to the
symbolic execution, in which the only
state given in the configuration (line
constraints are equality and inequality with
constants. Apollo transforms path 7) and will return the new
3 When the getConfigs subroutine is
constraints into integer constraints in a
executed to find new configurations, it
straight forward way, and uses choco15 to
solve them. This approach still allows us analyzes the output to find possible
transitions from the new environment
to handle values of the standard types
state (lines 24-27). The analyzeOutput
(integer, string), and is straight forward
function extracts parameter names
because the only constraints are equality
and possible values for each
and inequality. In cases where parameters
are unconstrained, Apollo randomly chose parameter, and represents the
values from a predefined list of constants. extracted information as a path
While limiting to the basic types number constraint. For simplicity, the
algorithm uses only one entry point
and string and only comparisons may
seem very restrictive, note that all input into the program. However, in
comes to PHP as strings; furthermore, in practice, there maybe several entry
our experience, the bulk of use of input points into the program (e.g., it is
values consists of the kinds of simple possible to call different PHP scripts).
operations that are captured by our The analyzeOutput function discovers
tracing and the kinds of simple these entry points in addition to the
path constraints. In practice, each
comparisons captured here. Our coverage
results suggest this is valid for a significant transition is expressed as a pair of a
range of PHP applications. path constraint and an entry point.

3. ALGORITHM
4 The algorithm uses a set of
Fig. 1 shows pseudocode configurations that are already in the
queue (line 14) and it performs state
for the algorithm, which extends the
matching in order to only explore new
algorithm in Fig. 2 with explicit-state
model checking to handle the complexity configurations (line 11).
of simulating user inputs. The algorithm

493
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

FIG 1

4. CONCLUSION

We have presented a
technique for finding faults in PHP
Web applications that is based on
combined concrete and symbolic
execution. The work is novel in several
respects. First, the technique not only
detects runtime errors but also uses
an HTML validator as an oracle to
FIG 2 determine situations where malformed
HTML is created. Second, we address
a number of PHP-specific issues, such
as the simulation of interactive user
input that occurs when user-interface
elements on generated HTML pages
are activated, resulting in the
execution of additional PHP scripts.
Third, we perform an automated
analysis to minimize the size of failure-
inducing inputs.

We created a tool, Apollo, that


implements the analysis. We
evaluated Apollo on six open-source
PHP web applications. Apollo‘s test
generation strategy achieves over 50
percent line coverage. Apollo found a
total of 673 faults in these

494
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

applications: 72 execution problems Tainting,‖ Proc. Int‘l Symp. Software


and 601 cases of malformed HTML. Testing and Analysis, 2009.
Finally, Apollo also minimizes the size [9] H. Cleve and A. Zeller, ―Locating
of failure-inducing inputs: The Causes of Program Failures,‖ Proc. Int‘l
minimized inputs are up to 5:3_ Conf. Software Eng., pp. 342-351, 2005.
smaller than the unminimized ones.

In this first phase, first modules of


this project have been developed and
remaining modules are to be done as
future work in the second phase.

REFERENCES
[1] S. Anand, P. Godefroid, and N.
Tillmann, ―Demand-Driven Compositional
Symbolic Execution,‖ Proc. Int‘l Conf.
Tools and Algorithms for the Construction
and Analysis of Systems, pp. 367-381,
2008.
[2] S. Artzi, A. Kiezun, J. Dolby, F.
Tip, D. Dig, A. Paradkar, and M.D. Ernst,
―Finding Bugs in Dynamic Web
Applications,‖ Proc. Int‘l Symp. Software
Testing and Analysis, pp. 261-272, 2008.
[3] M. Benedikt, J. Freire, and P.
Godefroid, ―VeriWeb: Automatically
Testing Dynamic Web Sites,‖ Proc. Int‘l
Conf. World Wide Web, 2002.
[4] D. Brumley, J. Caballero, Z.
Liang, J. Newsome, and D. Song,
―Towards Automatic Discovery of
Deviations in Binary Implementations with
Applications to Error Detection and
Fingerprint Generation,‖ Proc. 16th
USENIX Security Symp., 2007.
[5] C. Cadar, D. Dunbar, and D.R.
Engler, ―Klee: Unassisted and Automatic
Generation of High-Coverage Tests for
Complex Systems Programs,‖ Proc.
USENIX Symp. Operating Systems Design
and Implementation, pp. 209-224, 2008.
[6] C. Cadar and D.R. Engler,
―Execution Generated Test Cases: How to
Make Systems Code Crash Itself,‖ Proc.
Int‘l SPIN Workshop Model Checking of
Software, pp. 2-23, 2005.
[7] C. Cadar, V. Ganesh, P.M.
Pawlowski, D.L. Dill, and D.R. Engler,
―EXE: Automatically Generating Inputs of
Death,‖ Proc. Conf. Computer and Comm.
Security, pp. 322-335, 2006.
[8] J. Clause and A. Orso,
―Penumbra: Automatically Identifying
Failure-Relevant Inputs Using Dynamic

495
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

QOS METRICS IN PARTICLE SWARM


TECHNIQUE FOR SELECTION, RANKING AND
UPDATION OF WEB SERVICE

*N.Suganya, (P.G.Student)/CSE,
**K.Jayashree (Senior Lecturer)/CSE
*suganya2011@gmail.com, Ph: +91 9944573696
**K_jayashree106@yahoo.com
Rajalakshmi Engineering College, Chennai, India
flow management, application integration, etc. It
Abstract—Nowadays the number of services presents a promising solution for solving platform
published over the internet is growing at an interoperability problems encountered by the
explosive speed. So it is difficult for service application system integrators.
requesters to select satisfactory web services,
which provide similar functionality. The Quality of With the rapid development of web service
service is considered the most important criterion technology in these years, traditional XML based
standards (i.e., UDDI) have been mature during
for service filtering. In this paper, the web service
description models consider the service Qos service registry and discovery process. They can be
information and present an overall web service dynamically discovered and integrated at runtime in
selection and ranking for fulfilling service order to develop and deploy business applications.
requester‘s functional and non functional However, the standard WS techniques (such as
requirements. The service selection method is WSDL and UDDI) fail to realize dynamic WSDi, as
based on particle swarm optimization technique. By they rely on syntactic and static descriptions of
using this multi objective Particle swarm service interfaces and other nonfunctional service
optimization technique, a number of Qos values attributes for publishing and finding WSs. As a
can be optimized at the same time and it ultimatelyresult, the corresponding WSDi mechanisms return
improve the service performance. This method can results with low precision and recall. In addition, no
significantly improve the problem solving speed andmeans are provided in order to select among
reduce the complexity in selection, ranking and multiple functionally equivalent WSs. The solution
updation of Qos Web service. to the last problem is QoS-based WS Description
and Discovery (WSDD). QoS for WSs is a set of
Keywords-Web Services, Service selection, nonfunctional properties that encompass
Particle Swarm Optimization, Service Oriented performance characteristics among others. As users
Architecture, Quality of Service are very concerned about the performance of WSs
they use, QoS can be used for discriminating
between functionally equivalent WSs.
Introduction
Web Services (WSs) are modular, self-describing, There is a large body of related work in service
and loosely coupled software applications that can selection that attempt to solve the service selection
be advertised, located, and used across the problem using various techniques. The work was
Internet using a set of standards such as SOAP, proposed in the reference [1], which presented a
WSDL, and UDDI. Web service technology is model of reputation-enhanced QoS-based web
becoming more and more popular in many practical service discovery that combines an augmented
application domains, such as electronic commerce, UDDI registry to publish the QoS information and a
reputation manager to assign reputation scores to

496
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

services. However, it only described an abstract A. Web Services Overview


service matchmaking, ranking and selection
algorithm. Moreover, they failed to give an efficient 1. Definition
metrics method for QoS computation, which was
only evaluated by the dominant QoS attribute. In Web service is defined as: "A collection of
order to enable quality-driven web service functions that are packaged as a single entity and
selection, the authors in [2] proposed a QoS published to the network for use by other
computation model by implementation and programs. Web services are building blocks for
experimentation with a QoS registry in a creating open distributed systems, and allowing
hypothetical phone service provisioning. companies and individuals to quickly and cheaply
Unfortunately, as a result of their measurement make their digital assets available worldwide". Web
way of QoS values normalization, it is very difficult services use XML-based messaging as a
to make a uniform evaluation for all quality criteria fundamental means of data representation.
because their QoS metrics values are not limited in
a definite range. Therefore, it will bring about a 2. Web Services Architecture
problem that a quality attribute even has a higher
weight, while its internal impact is decreased by its Web service architecture defines interactions
smaller QoS value. In [3], the authors presented a between three roles: service provider, service
QoS-based service selection model. They specified registry and service requester.
QoS ontology and its vocabulary by Web Service
Modeling Ontology (WSMO)[4]. Especially, they
gave a selection mechanism based on an optimum
normalization algorithm, which integrates service
selection and ranking. Although this can simplify
computational complexity, it will also cause a
problem that some returned web services with high
synthetic QoS score cannot fulfill some single QoS
criteria condition. In [5], the authors proposed a
web service discovery model where functional and
non-functional requirements are taken into
account. However, not any feedback can be
collected from service requesters as reference to
updating QoS value. In [6], the author described a
agent based web service selection and ranking
framework for fulfilling requesters functional and
non functional requirements.
In this paper, an efficient approach to Fig.1 Web Service Architecture
implement optimal web service selection and
ranking for fulfilling service requesters‘ functional A "Service Provider" is a network node that
and non-functional requirements is developed. Web provides an interface to the web services it is
service description was given a description of QoS, providing; it also responds to requests for using its
the description of multi-objective optimization and services. A "Service Requester" is a network node
provided a service selection method based on that discovers and invokes web services to realize a
particle swarm optimization. This method can business solution.
significantly improve the problem solving speed and A "Service Registry" or service broker is a
accuracy, reducing the complexity of service network node that acts as a repository, for the
composition and improve service performance. description of services interfaces that are published
by service providers. The interactions between
II. BACKGROUND these roles involve three operations: publish, find
and bind. The service providers "publish" services

497
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

to a service broker. Service requester "find" With semantic annotations being added to WSDL
required services by using a service broker, and and UDDI, basic definitions of Web service,
then "binds" to these services. operation of service and user query are given as
follows:
B. QOS Attributes
Definition 1:Web Service
QoS for web services is defined as the A Web service ws is the 4-tuple: ws = {n, d, q, P},
nonfunctional properties of the service being where:
provided to its users. These properties are called 1. n is the name of the Web service,
also metrics; common quality attributes for web 2. d is the functional description of the Web
services are Response time, Availability, Latency, service,
Cost, and Throughput. In Service Oriented 3. q is a set of quality items of the Web service,
Architectures (SOA), both service providers and 4. P = {p1, p2…pm } is a set of operations of Web
service clients should be able to define QoS related service.
statements to enable QoS-aware service publication
and service discovery. Definition 2:Operation
The first step toward supporting QoS in WS is a An operation p is the 4-tuple: p = {n, d, I, O},
precise definition of quality metrics related to the where:
services. The quality attributes are classified and 1. n is the name of the operation,
defined according to users‘ QoS requirements 2. d is the functional description of the operation,
(different requirements for different user profiles). 3. I = {i1, i2…in} is a set of input parameters of
These attributes and their related internal the operation,
properties are identified and described by accurate 4. O = {o1, o2...om} is a set of output parameters
and valid metrics. Classes of QoS attributes are of the operation.
defined; each one has different QoS attributes with
different values. Definition 3:User Query:
A user query r is the 4-tuple: r = {n, I, O, λ},
where:
1. n is the functional name of user query, it is the
desired functional operation,
2. I is a set of input parameters of user query,
3. O is a set of output parameters of user query
4. λ(0<λ≤1) is a value set by user, when the
similarity of a Web service and user query is not
less than λ, the Web service is the suitable service.
λ could be changed as needed.

TABLE1: CLASS OF WS WITH QOS WSDL document describes a web service as a


collection of abstract items called "ports" or
Table 1 describes examples of QoS classes for WS "endpoints." A WSDL document also defines the
based on a set of QoS attributes. actions performed by a web service and the data
transmitted to these actions in an abstract way.
III. WEB SERVICE DISCRIPTION Actions are represented by "operations," and data
is represented by "messages."
Discovery of Web service is actually a matching
between user query and published Web service
descriptions.

498
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

The service selection and ranking module


performs initial service selection from web
service repository.

Fig.3 Service Selection and Ranking Module

Based on the functional requirements, the


Fig. 2 WSDL file for travel agent initial set of services is discovered. Then the service
selection is based on the customer Quality of
service requirements. The service requesters
A collection of related operations is known as a submit their Qos requirement. The Request Agent
"port type." which provides interface and communicates with
service requester for acquiring functional
requirements and QoS constraints. The Discovery
A port type constitutes the collection of actions Agent which is in charge of finding initial web
offered by a web service. service set satisfying service requester‘s functional
What turns a WSDL description from abstract to requirements.
concrete is a "binding." A binding specifies the The Selection Agent which collects QoS
network protocol and message format information from QoS database in terms of initial
specifications for a particular port type. discovered web service set and then selects web
A port is defined by associating a network address service set fulfilling service requester‘s QoS
with a binding. If a client locates a WSDL document constraints. The Rank Agent which is utilized to
and finds the binding and network address for each calculate synthetic QoS score of each selected web
port, it can call the service's operations according services, and then ranks them in a descending
to the specified protocol and message format. The sequence according to their QoS marks. Finally,
fig.1 is an example of WSDL document for an ranked service set is returned back to service
travelling agent. In the example that the WSDL requester. The Update Agent which refreshes
document specifies one operation for the service: quality criteria value in the QoS database according
getCar to accumulated feedback information in quality
rating database.
IV. SERVICE SELECTION AND RANKING

V. PARTICLE SWARM OPTIMIZATION IN


SERVICE SELECTION

499
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

PSO is a robust stochastic optimization


technique which is used to solve multi objective
optimization problem.PSO is initialized with a group
of random particles (solutions) and then searches
for optima by updating generations. In every
iteration, each particle is updated by following two
"best" values.
 The first one is the best solution (fitness) it
has achieved so far. (The fitness value is
also stored.) This value is called pbest.
 Another "best" value that is tracked by the
particle swarm optimizer is the best value,
obtained so far by any particle in the
population. This best value is a global best
and called gbest. When a particle takes
part of the population as its topological
Fig.4 Flowchart of PSO process
neighbors, the best value is a local best
and is called lbest.
The service selection algorithm implementation
steps based on particle swarm is as follows.
After finding the two best values, the particle
 Initialize set of service is selected from the
updates its velocity and positions with following
service selection and ranking framework.
equation (a) and (b).
 Evaluate the initial values of the individual
history of the optimal solution Pid and for
v[] = v[] + c1 * rand() * (pbest[] - present[]) + c2
the groups within the optimal solution Pg.
* rand() * (gbest[]-present[]) (a)
Repeat Step3-Step4 until termination
present[]=persent[]+v[] (b)
condition met. Then the output solution set
is obtained.
v[] is the particle velocity, persent[] is the current
 Update particle values for each services
particle (solution). pbest[] and gbest[] are defined
using formula (a) and (b).
as stated before. rand () is a random number
 Re-evaluation each services fitness value If
between (0,1). c1, c2 are learning
the service fitness value is better than Pid
factors.usuallyc1=c2=2.
fitness, then in Pid is set to a new location.
If the particle fitness is better than Pgd
The flowchart of the procedure is given in the fig
fitness, then in Pgd is set to a new location.

VI. MODEL AND EXPERIMENTAL RESULTS

The model particle swarm technique in selection


and ranking framework is implemented as in the fig

500
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

Web Service User


Repository Interfac
e

Searching
Process

Selection and Particle


Ranking Spam
Framework Technique

Show the Qos Based


priority based Service
web services Fig,6 SOAP request and SOAP response

The protocol contains SOAP request and SOAP


Display the response.The request contains the requirements
Qos Data given by the cusstomer.The response will return
output
Base the service which match the requesters
requirements.

Fig.5 Design for PSO in Service selection module

The set of services for travel reservation is


registered in a Web service repository using WSDL
file. Then based on the Qos information, the set of
services is selected which meet requesters
functional and non functional criteria. The SOAP
(Simple Object access Protocol) is used for
exchanging information.

Fig.7 Collecting Qos values


The set of Qos values can be collected from the
customer and the efficient service for the

501
PROCEEDINGS OF ― 4th NATIONAL CONFERENCE on ADVANCED COMPUTING
TECHNOLOGIES(NCACT‘11) ― on FEBRUARY 2,2011 @ S.A.ENGINEERING COLLEGE

requirement is provided by using multi-objective optimization.

VII. CONCLUSION

Due to the increasing popularity of Web service technology and the potential of dynamic service discovery and integration, multiple service providers now provide similar services. Consumers are, therefore, concerned about service quality in addition to the required functional properties. This paper proposed an approach for efficiently selecting Web services with similar functionalities, based on the particle swarm optimization technique using Quality of Service requirements. This optimization technique solves the multi-objective optimization problem and provides efficiency in the selection, ranking and updating of QoS-aware Web services.

REFERENCES

[1] Z. Xu, P. Martin, W. Powley and F. Zulkernine, "Reputation-enhanced QoS-based web services discovery," Proc. IEEE Intl. Conf. on Web Services, 2007, pp. 249-256.
[2] Y. Liu, A. Ngu and L. Zeng, "QoS computation and policing in dynamic web service selection," Proc. 13th Intl. Conf. on World Wide Web, New York: ACM Press, 2004, pp. 66-73.
[3] X. Wang, T. Vitvar, M. Kerrigan and I. Toma, "A QoS-aware selection model for semantic web services," Proc. 4th Intl. Conf. on Service-Oriented Computing, 2006, pp. 390-401.
[4] D. Roman, U. Keller, H. Lausen, et al., "Web service modeling ontology," Applied Ontology, vol. 1, no. 1, 2005, pp. 77-106.
[5] V. Diamadopoulou, C. Makris, Y. Panagis and E. Sakkopoulos, "Techniques to support web service selection and consumption with QoS characteristics," Journal of Network and Computer Applications, vol. 31, 2008, pp. 108-130.
[6] Guobing Zou, Yang Xiang, Yanglan Gan, Dong Wang and Zengbao Liu, "An agent-based web service selection and ranking framework with QoS," IEEE International Conference, 8-11 August 2009, pp. 37-42.
[7] S. Deng, J. Yin, Y. Li, J. Wu and Z. Wu, "A method of semantic web service discovery based on bipartite graph matching," Chinese Journal of Computers, vol. 31, no. 8, 2008, pp. 1364-1375.
[8] Mou Yu-jie, Cao Jian, Zhang Shen-sheng and Zhang Jian-hong, "Interactive Web Service Choice-Making Based on Extended QoS Model," Proc. Fifth International Conference on Computer and Information Technology, IEEE, 2005.
[9] M. A. Serhani, R. Dssouli, H. Sahraoui, A. Benharref and M. E. Badidi, "QoS integration in value added web services," Second International Conference on Innovations in Information Technology (IIT'05).
[10] Demian Antony D'Mello, Ananthanarayana V.S. and V. Lakshmi Narasimha, "Challenges (Research Issues) in Web Services," International Conference on Advances in Computer Engineering, 2010.

DISTRIBUTED DATA BACKUP AND RELIABLE RECOVERY USING MOBILE GRID ENVIRONMENT

B.Bhuvaneswari1 V.Usha Rani2

1 PG Student, Department of Computer Science, Adhiparasakthi Engineering College, Melmaruvathur, E-Mail: bhuvanabharathy@gmail.com
2 Assistant Professor, Department of Computer Science, Adhiparasakthi Engineering College, Melmaruvathur.

Abstract -- In a distributed computing system, middleware can be defined as the software layer that lies between the operating system and the applications. Today, middleware is successfully used in database management systems, web servers, application servers, content management systems, and similar tools that support the application development and delivery process. The reason for using middleware is to provide high-level abstractions and services to applications, to ease application programming, application integration, and system management tasks. Technologies like CORBA, J2EE, .NET and the enterprise service bus are the base of many frameworks and applications; all of them work on fixed network (communication) topologies. A peer-to-peer (or P2P) network is a model where each host acts as client as well as server. Such hosts are typically connected via largely ad-hoc connections, and such networks are useful for many purposes like file sharing and data sharing. This paper presents a decentralized QoS-aware middleware for checkpointing arrangement in Mobile Grid (MoG) computing systems. Checkpointing is more crucial in MoG systems than in their conventional wired counterparts due to host mobility, dynamicity, less reliable wireless links, frequent disconnections and variations in mobile systems. Simulations and an actual test bed implementation show ReD's favorable recovery probabilities with respect to Random Checkpointing Arrangement (RCA) middleware, a QoS-blind comparison protocol producing random, arbitrary checkpointing arrangements.

Index Terms – Random checkpointing, mobile grid, reliability-driven middleware, decentralized checkpointing.

1 INTRODUCTION

Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. What distinguishes grid computing from conventional high performance computing systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Although a grid can be dedicated to a specialized application, it is more common that a single grid is used for a variety of different purposes. Grids are often constructed with the aid of general-purpose grid software libraries known as middleware. Grids are interconnected computer systems where the machines utilize the same resources collectively. Grid computing usually consists of one main computer that distributes information and tasks to a group of networked computers to accomplish a common goal.

Fig 1. Structure of grid computing.

Mobile computing is one of the fastest-growing fields of the current computer era. In this paper we present a middleware appropriate to the development of applications for mobile ad-hoc peer-to-peer networks. The proposed middleware implements a communication service and a host and service discovery in fully decentralized networks. The design and the function of the middleware will be discussed. To argue the quality of the middleware, we sketch the development of two services: a distributed card game and a chat application. A MoG can involve a number of mobile hosts (MHs), i.e., laptop computers, cell phones, PDAs, or wearable computing gear, having wireless interconnections among one another or to access points. Grid computing is often used to complete complicated or tedious
mathematical or scientific calculations. Grid size can vary by a considerable amount. Grids are a form of distributed computing whereby a "super virtual computer" is composed of many networked, loosely coupled computers acting together to perform very large tasks. Furthermore, "distributed" or "grid" computing in general is a special type of parallel computing that relies on complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.

2 RELATED WORK

In order to carry out this project, several references and white papers were consulted, from which much valuable information was inferred. These are:

2.1 Adaptive Task Checkpointing and Replication Toward Efficient Fault-Tolerant Grid

A grid is a distributed computational and storage environment often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in loss and delay of executing jobs. To ensure good grid performance, fault tolerance should be taken into account. Commonly utilized techniques for providing fault tolerance in distributed systems are periodic job checkpointing and replication. While very robust, both techniques can delay job execution if inappropriate checkpointing intervals and replica numbers are chosen. This paper introduces several heuristics that dynamically adapt the abovementioned parameters based on information on grid status, to provide high job throughput in the presence of failure while reducing the system overhead. Furthermore, a novel fault-tolerant algorithm combining checkpointing and replication is presented. The proposed methods are evaluated in a newly developed grid simulation environment, Dynamic Scheduling in Distributed Environments (DSiDE), which allows for easy modeling of dynamic system and job behavior.

Fig 2. Grid Architecture
2.2 Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

Large applications executing on grid or cluster architectures consisting of hundreds or thousands of computational nodes create problems with respect to reliability. The source of the problems is node failures and the need for dynamic configuration over extensive runtime. This paper presents two fault tolerance mechanisms called Theft-Induced Checkpointing and Systematic Event Logging.

THEFT-INDUCED CHECKPOINTING

The dataflow graph constitutes a global state of the system. In order to use this abstraction for recovery, it is necessary that this global state also represents a consistent global state. We can capture the abstraction of the execution state at two extremes: at Level 0, one assumes the representation derived from the construction of the dataflow graph, whereas at Level 1, the interpretation is derived as the result of its evaluation, which occurs at the time of scheduling.

SYSTEMATIC EVENT LOGGING

C1: Once a task starts executing, it will continue without being affected by external events, until its execution ends.

C2: The execution of a task is deterministic with respect to the tasks and shared data objects that are created. Note that this implies that the execution will always create the same (isomorphic) dataflow graph.

2.3 On the Design of Fault-Tolerant Scheduling Strategies Using Primary-Backup Approach for Computational Grids with Low Replication Costs

Fault-tolerant scheduling is an imperative step for large-scale computational grid systems, as often geographically distributed nodes cooperate to execute a task. By and large, the primary-backup approach is a common methodology used for fault tolerance, wherein each task has a primary copy and a backup copy on two different processors. For independent tasks, the backup copy can overload with other backup copies on the same processor, as long as their corresponding primary copies are scheduled on different processors. However, for dependent tasks, the precedence constraint among tasks must be considered when scheduling backup copies and overloading backups. In this paper, we first identify two cases that may happen when scheduling dependent tasks with the primary-backup approach.

The following are the conditions under which backups can be overloaded on a processor:

1. Backups scheduled on a processor can overload only if their primaries are scheduled on different processors. Therefore, although several backups may be overlapped on a processor, at most one of them needs to
be executed under the single processor failure model. The case that several overlapped backups execute concurrently will not happen.

2. At most one of these primaries is expected to encounter a fault. This is to ensure that, at most, one backup is required to be executed among the overloaded backups.

3. At most one version of a task is expected to encounter a fault. In other words, if the primary of a task fails, its backup always succeeds. This condition is guaranteed by the assumption that the minimum required value of the mean time to failure (MTTF) is always greater than or equal to the maximum task execution time in a primary-backup approach.

3. DESIGN AND IMPLEMENTATION

The mobile moving objects are monitored and backed up by the nearest neighborhood node. A mobile host failure or data loss may lead to severe performance degradation or even total job abortion, unless execution checkpointing is incorporated. Certain hosts may suspend execution while waiting for intermediate results (as input to their processes) that may never arrive due to host or link failure; if the link fails, we cannot recover our data. Checkpointing forces hosts involved in job execution to periodically save intermediate states, registers, process control blocks, messages, logs, etc., to stable storage. This stored checkpoint information can then be used to resume job execution at a substitute host chosen to run the recovered application in place of the failed host. If data is deleted, it should be recovered from the nearest neighbour node to avoid delay and improve QoS. Upon host failure or inadvertent link disconnection, job execution at a substitute host can then be resumed from the last good checkpoint. This crucial function avoids having to start job execution all over again from the very beginning in the presence of every failure, thus substantially enhancing the performance realized by grid applications.

3.1 Server Module

In this module the server monitors the client objects and stores all the details of the clients. If a client misses any details, the server can send the missed data to the corresponding client on request. The server can choose the path used to send the data; while choosing the path, a hierarchical clustering algorithm is used. The admin can view all the details of a user through their respective user-id and password. In mobile registration, the user must register a new user-id and password with their respective mobile number for any further verification by the admin. In the user login, the user can view their updated details by signing in with their respective user-id and password. In view mobile transaction, the admin has rights to view all the details of the user, and can also view all the transactions made by the user.
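The periodic checkpointing and recovery behaviour described in the design section can be sketched as follows. This is a minimal illustration under assumed names (CheckpointStore, run_job) with an in-memory stand-in for the neighbor; the paper's actual middleware stores checkpoints at neighboring mobile hosts.

import pickle

class CheckpointStore:
    """Stands in for stable storage at a neighboring node."""
    def __init__(self):
        self.last_good = None

    def save(self, state):
        # Serialize the intermediate state (registers, logs, etc. in a real MoG).
        self.last_good = pickle.dumps(state)

    def restore(self):
        # Return the last good checkpoint, or None if none was taken.
        return pickle.loads(self.last_good) if self.last_good else None

def run_job(steps, checkpoint_every, store, state=None):
    """Run a job, checkpointing periodically; resume from `state` if given."""
    state = state or {"step": 0, "result": 0}
    while state["step"] < steps:
        state["result"] += state["step"]        # stand-in for real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            store.save(state)                   # periodic checkpoint
    return state["result"]

# After a host failure, a substitute host resumes from the last good
# checkpoint instead of restarting from the very beginning:
store = CheckpointStore()
run_job(steps=40, checkpoint_every=10, store=store)
resumed = store.restore()
run_job(steps=100, checkpoint_every=10, store=store, state=resumed)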
3.2 Mobile Host

The mobile host always monitors the user and sends the client data to the server. Set checkpoint is used to save the up-to-date client information into this mobile host. Updating the information to the server is used to update the respective requested information to the admin.

3.3 Moving Object

The client is nothing but the moving object, which sends requests to the server and gets the details from the server. In information recovery, the user can send a request to the admin regarding their loss of information or data, and can log in with their respective user-id and password to view their data.

4 METHODOLOGIES

1. A random checkpoint (also known as a "flying checkpoint", "mobile checkpoint" or "hasty checkpoint") is a military and police tactic involving the setting up of a hasty roadblock, primarily by mobile truck-mounted infantry or police units, in order to disrupt unauthorized or unwanted movement and/or military activity.

2. Random checkpoints are set up to achieve surprise, as opposed to known, permanently located, manned checkpoints. They might be established in locations where they cannot be observed by approaching traffic until it is too late to withdraw and escape without being observed.

3. The middleware is built from layered interacting packages and may be tailored using different managers called by a common API, so that users are not concerned with the different syntax and access methods of specific packages. The most common example is given by the job scheduler, which can be any of a more or less complex set of products.

4.1 EVALUATION METRICS

1. In conventional wired grids, link issues are relatively immaterial, as checkpointed data from hosts can be stored at a designated server or servers, since connections to a server are deemed reliable, of high bandwidth, and of low latency.

2. Such arrangements fail, however, to deal with link disconnections and degrees of system topological dynamicity. In contrast, a MoG highly desires its checkpointed data to be kept at neighboring MHs rather than remote ones, which require multiple, relatively unreliable hops to
transmit checkpoints and to reach the checkpointed data when it is needed.

5 CONCLUSION AND FUTURE WORK

In the proposed system we have used random checkpointing arrangements to minimize latency and produce 100% accuracy using fewer wireless connections. Nodal mobility in a large MoG may occasionally render an MH participating in one job execution unreachable from the remaining MHs, calling for efficient checkpointing in support of long job execution. As earlier proposed checkpointing approaches cannot be applied directly to MoGs and are not QoS-aware, we have dealt with QoS-aware checkpointing and recovery specifically for MoGs, with this paper focusing solely on checkpointing arrangement. It has been demonstrated via simulation and actual test bed studies that ReD achieves significant reliability gains by quickly and efficiently determining checkpointing arrangements for most MHs in a MoG. ReD is shown to outperform its RCA counterpart in terms of the average reliability metric, and does so with fewer required messages and superior stability (which is crucial to the checkpoint arrangement, minimization of latency, and wireless bandwidth utilization).

High-mobility environments necessitate some moderation, so that the system can flexibly adjust arrangements sufficiently to track reliability more optimally as host positions and network conditions change. This is borne out by the flattening of the curves, indicating a point of diminishing returns upon further increases. Therefore, both average host mobility and average provider density are inputs to the process of modulation, and thus to stability versus reliability control. Using the stability control mechanism, hosts could theoretically snoop broadcasts of general wireless traffic to monitor the level of break message activity, gather data about neighborhood density and mobility through exchange with neighbors about connectivity, and thus modulate in order to effectively and responsively control stability versus arrangement reliability.

6 REFERENCES
[1] Computerworld, "HP Promises Global Wireless for Notebook PCs," http://www.computerworld.com/mobiletopics/mobile/story/0,10801,110218,00.html?source=NLT_AM&nid=110218, Apr. 2006.
[2] H. Higaki and M. Takizawa, "Checkpoint-Recovery Protocol for Reliable Mobile Systems," Proc. 17th IEEE Symp. Reliable Distributed Systems, pp. 93-99, Oct. 1998.
[3] "HP and Cingular Wireless Introduce First Global Broadband Notebook PC in U.S.," HP Press Release, http://www.hp.com/hpinfo/newsroom/press/2006/061211a.html?jumpid=reg_R1002_USEN, Dec. 2006.
[4] Hewlett-Packard Development Company, L.P., "Grid Computing - Extending the Boundaries of Distributed IT," http://h71028.www7.hp.com/ERC/downloads/4AA03675ENW.pdf?jumpid=reg_R1002_USEN, Jan. 2007.
[5] "IBM Grid Computing," http://www-1.ibm.com/grid/about_grid/what_is.shtml, Jan. 2007.
[6] M. Chtepen, F. H. A. Claeys, B. Dhoedt, F. De Turck, P. Demeester and P. A. Vanrolleghem, "Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids," 2009.
[7] Q. Zheng, B. Veeravalli and C.-K. Tham, "On the Design of Fault-Tolerant Scheduling Strategies Using Primary-Backup Approach for Computational Grids with Low Replication Costs," 2008.
[8] Sun Microsystems, "Sun Grid Compute Utility," http://www.sun.com/service/sungrid, 2006.
[9] S. Jafar, A. Krings and T. Gautier, "Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing," 2009.
A CROSS LAYER PLATFORM FOR SEAMLESS VERTICAL HANDOVER IN HETEROGENEOUS NETWORKS

*G.SHANMUGA PRIYA **MR.P.VETRIVELAN

*P.G. Scholar, M.E. CSE, Rajalaksmi Engineering College, Chennai, India. priyagsp1987@gmail.com
**Senior Lecturer, Department of CSE, Rajalaksmi Engineering College, Chennai, India. vetrivelansir@gmail.com
Abstract— In this paper we propose the simulation of vertical handover, which improves the monetary cost and the application QoS. Vertical handoff is an important mechanism for seamless handover in heterogeneous networks. There is a trade-off between the monetary cost and the application QoS, and in existing systems it becomes complicated to illustrate the gains in terms of both metrics. The distribution of multimedia content over heterogeneous wireless networks to mobile devices involves significant technical challenges related to mobility management and quality-of-service provisioning. In situations where many mobile devices try to hand off to the same AP where the RSS is maximum, there will be congestion and delay at that AP. In this paper, to improve the monetary cost we maximize the battery lifetime of each MN. However, in a heterogeneous network, the amount of traffic that each MN relays has a great impact on the MN's battery lifetime. Hence a VHD algorithm and a route selection algorithm are used to improve the monetary cost. Here we take into account QoS parameters such as throughput, jitter, packet error rate and end-to-end delay to improve the QoS of the application services. Results are simulated using the NS2 simulator.

Index Terms:- Vertical handoff, quality of service (QoS), received signal strength (RSS), access point (AP), mobile node (MN), mobility management, monetary cost.

1. INTRODUCTION

Rapid progress in wireless networking technologies has created different types of wireless systems such as Bluetooth, Wi-Fi, WiMax, 3G cellular networks, and so on. These wireless networks are heterogeneous in the sense of the different radio access technologies and communication protocols they use and the different administrative domains they belong to. From this fact, it follows that no single access technology or service provider can offer the ubiquitous coverage expected by users requiring connectivity anytime and anywhere. The actual trend is to integrate complementary wireless technologies with overlapping coverage, to provide the expected ubiquitous coverage and to achieve the Always Best Connected (ABC) concept. The ABC concept allows the user to use the best available access network and device at any point in time.

4G networks are all-IP-based heterogeneous networks that allow users to use any system at anytime and anywhere. Users
carrying any integrated terminal can use a wide range of applications provided by multiple wireless networks. 4G systems provide not only telecommunications services, but also a data-rate service when good system reliability is provided. At the same time, a low per-bit transmission cost is maintained. Users can use multiple services from any provider at the same time, and the mobile may simultaneously connect to different wireless systems. Under these circumstances, critical to the satisfaction of service users is being able to select and utilize the best access technologies among the available ones. In this paper, we envision a comprehensive architectural platform for mobility management that allows end-users to dynamically and fully take advantage of different access networks. All of the existing solutions cannot satisfy all the different QoS constraints of various applications as well as the monetary cost.

In this paper, we also deploy an end-to-end mobility management technique that is similar to the ones proposed in [2]. However, our main objective in adopting end-to-end mobility management is not just to achieve the objectives of [2], but to consider the monetary cost. Specifically, the objectives of the proposed architecture can be summarized as follows:

1. Enabling the triggering of handover decisions when a new user enters or moves away from the current network.
2. Enabling handover decisions that optimize the handover performance in terms of each application's requirements and also the monetary cost.
3. Implementing a route selection algorithm that maximizes the overall battery lifetime as well as providing quality of service.

These objectives require the gathering of dynamic status information across the entire protocol stack, as well as the dynamic adjustment of protocol parameters and controls accordingly. For instance, not only the information on the available wireless network interfaces and their characteristics but also the application-level performance measures and requirements should be collected and considered together in order to make the optimal handover decision for each user flow.

The rest of this paper is organized in the following way. Section 3 gives a detailed explanation of the architecture of the proposed platform. Simulations and numerical results illustrating the potential merits of the proposed platform and the algorithms are presented in Section 5. The final section concludes the paper.

Handovers refer to the automatic failover from one technology to another in order to maintain communication. This handoff technology is needed for seamless mobility and to keep the connection without any interruption. The types of handovers are:
1) Horizontal Handover
2) Vertical Handover
3) Downward and Upward Handover

In vertical handover the user can move between different network access technologies, and the mobility is performed between different layers. In vertical handover the mobile node moves across different heterogeneous networks and not only changes its IP address but also changes its network interface, QoS characteristics, etc.

2. MOBILITY MANAGEMENT

Mobility management contains two components: location management and handover management.

Location management enables the network to find the current attachment point of a mobile user. The first step is location registration (or location update). In this step, the mobile terminal periodically informs the network of its up-to-date location information, allowing the network to authenticate the user and update the user's location profile. The second step is call delivery: the network determines the current location in which a mobile terminal is located. There are some challenges for the design of location management, especially for inter-domain roaming, in terms of the signaling overhead, call
delivery latency, and QoS guarantees in different systems.

Handover management enables the network to maintain the on-going connection when a mobile terminal switches its access point. Handover is a three-stage process. First, the initiation of handover is triggered by the user, a network agent, or changing network conditions. The second stage is new connection generation, where the network must find new resources for the handover connection and perform any additional routing operations. Finally, dataflow control needs to maintain the delivery of the data from the old connection path to the new connection path according to agreed-upon service guarantees. Mobility management in heterogeneous networks can happen in different layers of the OSI protocol stack reference model, such as the network layer, transport layer, and application layer.

3. A PROTOCOL FOR MAINTAINING MOBILITY MANAGEMENT FOR CROSS LAYER

Fig:1 The modules

The modules of the platform are:
Monitoring agent (MA)
Profile database (PDB)
Decision engine (DE)
IP Agent

A monitoring agent (MA) for each protocol layer and the profile database (PDB) are the functional modules that facilitate efficient inter-layer communications. The MAs are the interfaces to each legacy protocol layer, used to monitor and collect protocol-specific dynamic status information as well as to adjust the protocol controls without requiring direct modification to the existing protocols.

The PDB maintains both the static and dynamic information necessary for handover-related decisions and processes, and the dynamic part of the information in the PDB is updated by the MAs. The information covers the available network interfaces, available bandwidth, and protocols for increasing the QoS.

The decision engine (DE) maintains the per-application handover processing policies to enable seamless handover of each user session. A set of rules to determine when to trigger the handover decision procedure for a certain service flow is also maintained, to avoid unnecessary handover decision processing caused by redundant status reports from multiple MAs. In making the handover decisions, the DE utilizes the information on the predefined key parameters across the protocol layers by obtaining the necessary static/dynamic data from the PDB. The DE consists of one or more application-specific DEs (ADEs). Each ADE contains the application-specific mechanisms for handover decisions (HDM).

The DE maintains a session profile table (SPT), which contains information on each on-going user session. A Vertical Handover Decision Controller (VHDC) is used for maximizing battery power and load balancing. The VHDC obtains triggers in two conditions:
1) While in service at an AP, the RSS for the MN has dropped below a specified threshold.
2) While in service at a BS, the RSS from one or more APs has just exceeded a specified threshold.

If the triggers indicate possibility 1, then the VHDC tries to search for other networks for connection handoff. In the case that there exist multiple choices of APs for
handoff, the VHDC evaluates the APs and then directs a handoff operation to the network with the optimal performance/cost. On the other hand, if no other APs are found for a possible handoff, then the cellular network would be considered the best available wireless network.

The IP Agent is responsible for mapping the end-point addresses of ongoing sessions to the addresses corresponding to the current location. It enables the discovery of a peer's current location as well as the continuity of data delivery, transparent to mobility, by tracking the IP address changes of end points. The functionality of the IP Agent consists of two core modules: the address management module (AMM) and the location management module (LMM).

The proposed inter-layer communication architecture is realized through the MAs and the PDB. The MAs provide per-protocol interfaces to interact with the rest of the proposed platform. Upon the start of each user session, the DE informs the MAs of not only the parameters to be monitored but also the criteria for updating the PDB and/or the conditions for triggering a notification to the DE for the new user session.

Following these directions, the MAs access the PDB to update the protocol-specific dynamic status information and notify the DE about certain protocol-specific events that require the handover decision procedure to be triggered for the related user sessions. For certain protocol parameters monitored by the MAs, especially the parameters of the network layer or the layers below, more than one user session could be related. The PDB maintains the protocol-specific dynamic status information collected by the MAs as well as the static information. The static information may include the user preferences, device capability, QoS requirements of different application classes, and static characteristics or capabilities of various wireless access networks.

The DE maintains a session profile table (SPT), which contains information on each on-going user session. An SPT entry is created for each user session, and it contains the 5-tuple representing the user session, i.e., (source perceived address, destination perceived address, transport protocol, source port number, destination port number), the transport IP and MAC addresses, the ADE class of the user session, and the unique session identification (SID). The SID is used by all functional modules of the proposed platform in order to identify the user session uniquely. The DE also updates its SPT entry and/or executes the ADE in the following three cases: (1) a new user session initiation is notified by the application layer MA,

Fig 2: The proposed platform

(2) an event requiring the handover decision procedure to be triggered is notified by one or more MA(s), or (3) handover of a peer for a certain user session is notified by the IP Agent. The main functions of the IP Agent are the discovery of a peer's current location as well as maintaining the continuity of data delivery during handover by tracking IP address changes.

As shown in Fig 2, the functionality of the IP Agent consists of two core modules: the address management module (AMM) and the location management module (LMM). For end-to-end
mobility management, we define two different kinds of IP addresses recognized by the proposed platform: (1) perceived address – an IP address originally perceived by the transport layer session and its socket, and (2) transport address – an IP address used for actual data transportation, which changes per-handover and represents the current location of an MN.

When a new user session is initiated, the DE is informed of the new user session initiation by the application MA. The DE first determines the SID and the ADE class for the new session. The DE then consults the PDB to find out all of the available network interfaces. If more than one interface is active, the DE executes the ADE in order to determine the initial transport address for the user session. Determining the above three things completes the generation of the SPT entry for the new session. The DE then informs the IP Agent of the new user session initiation, with the source/destination perceived addresses and the source transport address of the session, and the ADE generates the directions to the MAs regarding the parameters and criteria for the new user session. The AMM of the IP Agent then generates a new MIT entry for the new user session, and queries the LMM to find out the destination transport address. By obtaining the peer's current-location IP address from the LMM, the AMM completes the new MIT entry. The AMM then starts establishing an end-to-end mobility management session for the new user session with the peer AMM by transmitting an end-to-end mobility management (E2E-MM) message.

4.1 HANDOVER DECISION TRIGGERING MECHANISM

The MN generates handover decision triggering events (HDTEs) in the following cases:

1. When a new access network is detected.
2. When the current access network is unreachable.
3. When the QoS requirements of applications in the current access network cannot be satisfied.
4. When the RSS for the MN has dropped below a specified threshold.

When a new access network is detected or the current access network is unreachable, the MAs in the PHY/MAC layer can generate HDTEs through the detection of link status changes. Further, the platform allows the network layer MA to generate HDTEs by detecting a new IP address acquisition of a reachable neighbor network from the current access router, or by receiving ICMP error messages due to a link failure over the data path.

When the QoS requirements of applications in the current access network cannot be satisfied, any MA of each protocol layer can generate HDTEs, as different QoS parameters in each protocol layer are monitored. For example, QoS parameters for a video service such as the peak signal-to-noise ratio (PSNR), delay-jitter, or packet loss rate can be monitored by MAs in the application, transport and network layers.

When the RSS for the MN has dropped below a specified threshold, the MA can generate an HDTE; the HDTE is also sent to the VHDC to optimize the system performance [1].

For each AP, the load on the AP can be given as

Load(AP) = Σ e

where e is the effective bandwidth of an MN when attached to the AP. The battery lifetime of an MN, denoted by L, is

L = Σ l·x

where l is the ratio of the available battery power to the power consumption rate per unit time, and x = 0 if RSS < RSSthreshold for connecting to the AP.
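A small sketch of these two quantities in Python follows; the record fields and the RSS threshold value are illustrative assumptions, since the paper gives only the formulas.

RSS_THRESHOLD = -85.0  # dBm; assumed value, not specified in the paper

def ap_load(effective_bandwidths):
    # Load(AP) = sum of the effective bandwidths e of the MNs at the AP.
    return sum(effective_bandwidths)

def battery_lifetime(mns, rss):
    # L = sum of l * x over MNs, where l is the available battery power
    # divided by the power consumption rate per unit time, and x = 0 when
    # the MN's RSS is below the threshold needed to connect to the AP.
    total = 0.0
    for mn in mns:
        l = mn["battery_power"] / mn["consumption_rate"]
        x = 1 if rss[mn["id"]] >= RSS_THRESHOLD else 0
        total += l * x
    return total

# Example: only the MN that can actually reach the AP contributes to L.
mns = [{"id": "a", "battery_power": 10.0, "consumption_rate": 0.5},
       {"id": "b", "battery_power": 6.0, "consumption_rate": 0.3}]
print(ap_load([2.0, 5.5]))                              # 7.5
print(battery_lifetime(mns, {"a": -70.0, "b": -90.0}))  # 20.0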


Referring to the SPT, the DE determines the sessions to enter the handover decision procedure for each HDTE generated. The DE can receive multiple HDTEs in a row from various layers; preference must be given to causes 1 and 2 rather than causes 3 and 4.

4.2 THE HANDOVER PROCEDURE

As illustrated in Fig. 2, when an MA notifies the DE of HDTEs to trigger the handover decision procedure, the DE determines the related session by referring to the SPT and determines whether it is necessary to execute the corresponding ADE or not. If the ADE is invoked, the HDM of the selected ADE is executed first. The HDM consults the PDB to obtain the necessary status information to make the application handover decision, and the network selection and handover take place. Then the VHDC is invoked; it checks for the triggers and execution takes place. The SPT entry is also updated with the new transport IP and MAC addresses. In order to inform its peer IP Agent about the change of its transport address, the DE informs the AMM of the IP Agent of the new transport address with the Handover Indication signal. The AMM then modifies the source transport address of the corresponding session in the MIT. Further, the AMM sends out an E2E-MM message informing the peer AMM of its new transport address. In this case, the E2E-MM message may also include additional information necessary for the transport/application layer control adjustment for seamless handover.

A route selection algorithm is used to forward the packets. We use Dynamic Source Routing (DSR) for forwarding.

5. PERFORMANCE EVALUATION

We have constructed a simulation model with NS-2 [3] to evaluate the performance of the proposed platform and the application-specific handover decision approach. In this simulation, we considered the throughput, packet loss, delay, load balancing and battery-power maximization affected only by handovers. For the handover decision policies, both the monetary cost and the application QoS are the metrics of greatest interest to the users as well as to the service providers.

In our simulation, though, we consider both the application QoS and the monetary cost as the handover decision policy, in order to illustrate the gains of the proposed platform in a straightforward way. Since there is a trade-off between the monetary cost and the application QoS, it becomes complicated to illustrate the gains of the proposed approach in terms of both metrics, but we evaluate the performance of the proposed platform in terms of the application QoS and the monetary cost. Whenever a trigger occurs, the VHDC checks for a handover decision and maintains load balancing, thereby optimizing the battery lifetime of the MN.

(1) For file transfer, the policy was set to maximize the throughput. Therefore, whenever a new network is discovered or the current access network is not reachable, the HDM is called to select the network with the highest available bandwidth among the candidate networks. After the handover completion, when the peer IP Agent receives E2E-MM control messages notifying it of the transmission path update, the peer's TAC can direct the MA to resume the data transmission using the adaptive transmission rate on the new data path.

(2) For audio and video data, the HDM policy was set to minimize the handover latency, packet delay-jitter variations, and packet loss. Therefore, only when the current service network cannot meet the QoS requirement, or the MN moves out of the coverage of the current service network, does the MN attempt a handover. A network with the largest coverage is selected, as long as it provides the required QoS, in order to minimize the number of handovers.
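The two per-application policies above can be summarized in a small selector function. The candidate-network fields and the QoS flag are illustrative assumptions; this is a sketch of the decision logic, not the platform's code.

def choose_network(candidates, app_type, current=None):
    """Pick a target network per the application-specific handover policy.

    candidates: list of dicts with assumed fields
      {"name", "bandwidth", "coverage", "meets_qos"}.
    """
    if app_type == "file_transfer":
        # Policy (1): maximize throughput - highest available bandwidth.
        return max(candidates, key=lambda n: n["bandwidth"])
    if app_type in ("audio", "video"):
        # Policy (2): hand over only if the current network fails its QoS;
        # then prefer the largest coverage among QoS-satisfying networks,
        # to minimize the number of handovers.
        if current is not None and current["meets_qos"]:
            return current
        ok = [n for n in candidates if n["meets_qos"]]
        return max(ok, key=lambda n: n["coverage"]) if ok else current
    return current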

CONCLUSION

When connections need to migrate between heterogeneous networks, seamless vertical handoff is necessary. We have proposed a platform for mobility management, for which seamless vertical handoff is the essential first step. There is a trade-off between the monetary cost and the application QoS, and in existing systems it becomes complicated to illustrate the gains in terms of both metrics. In this paper we have improved the monetary cost as well as the QoS for applications, which is not available in the existing methods. To improve the monetary cost we maximize the battery lifetime of each MN. However, in a heterogeneous network, the amount of traffic that each MN relays has a great impact on the MN's battery lifetime. Hence a VHD algorithm and a route selection algorithm are used to improve the monetary cost. Here we take into account QoS parameters such as throughput, jitter, packet error rate and end-to-end delay to improve the QoS of the application services. Results are simulated using the NS2 simulator.

REFERENCES:
[1] SuKyoung Lee, Kotikalapudi Sriram, Kyungsoo Kim, Yoon Hyuk Kim, and Nada Golmie, "Vertical Handoff Decision Algorithms for Providing Optimized Performance in Heterogeneous Wireless Networks," IEEE Transactions on Vehicular Technology, vol. 58, no. 2, February 2009.
[2] Moonjeong Chang, Hyunjeong Lee and Meejeong Lee (Ewha Womans University, Seoul, Republic of Korea), "A per-application mobility management platform for application-specific handover decision in overlay networks," www.elsevier.com/locate/comnet.
[3] http://www.isi.edu/nsnam/ns.

DETOURING AVOIDANCE IN DISTRIBUTED CLUSTERS WITH SELF-ORGANIZING MODE USING KADEMLIA

*P.SIREESHA **E. SUJATHA, M.TECH, M.B.A, (PH.D).,

*PG Scholar, M.E CSE, S.A Engineering College, Chennai, India, siri.it.05@gmail.com
**Assistant Professor, Dept. of Computer Science & Engg., S.A Engineering College, Chennai, India

Abstract

This paper presents a Kademlia proposal with topology-awareness, focusing on the detouring problem caused by the difference between the logical topology and the physical topology of structured peer-to-peer networks. A distributed algorithm classifies nodes into self-organized clusters, and a clustering-based mechanism is designed to rationalize the routing procedure node by node. By theoretical analysis, this advanced Kademlia still runs in O(logN) but is more capable than the original in the physical topology. Simulation results show its marked effect in decreasing latency, improving stretch by nearly 15% by removing the mismatch.

I. INTRODUCTION

Structured P2P has recently emerged as a candidate infrastructure for building large-scale and robust network applications. The core component of structured P2P is the distributed hash table (DHT). Many classical DHT networks have been proposed [1-5], among which Kademlia is widely utilized in practical systems for its simple mechanism and low maintenance overhead. Since DHT networks create a virtual topology over the physical topology, the only relation between the two layers exists in the hash algorithm, which makes a node's logical ID independent of its physical location. Consequently, a routing algorithm that decreases hops in the logical topology cannot shrink physical latency. This mismatch between the two topologies results in a serious problem known as detouring.

To solve this problem, a topology-aware Kademlia based on distributed clustering in self-organizing mode is proposed in this paper. First, a distributed clustering algorithm is presented to classify all the nodes into self-organized clusters according to their physical proximity. Then, a NodeID (NodeID is the identifier that marks a node) assignment mechanism based on the constructed clusters is designed to correlate the two topologies. Both theoretical analysis and simulation results prove that the improved structure can remarkably boost the efficiency of the system by rationalizing the routing procedure. The remaining sections of this paper are organized as follows. After a survey of related studies in Section 2, a clustering algorithm in self-organizing mode is proposed in Section 3. Section 4 details the topology-aware Kademlia. Simulation results and performance analysis are presented in Section 5. Finally, Section 6 concisely summarizes this paper.

II. RELATED WORK


Topology-aware techniques for P2P have been extensively studied in recent years. Three main approaches are widely used to construct topology-aware structured P2P overlays: geographic layout, proximity routing and proximity neighbor selection [6]. Geographic layout reflects the physical location of a node in the value of its NodeID. As the NodeID assignment mechanism is always determined by the architecture itself, the geographic layout method is always conceived for a certain P2P protocol. [7] presents an effective implementation in CAN [1] achieving a low delay stretch. In addition, [8] proposes a method of getting an appropriate NodeID by considering the nodes' physical position in Chord [2]. The proximity routing approach attempts to select a relatively near node from a set of candidates as the next hop of routing.

Following this approach, clustering is usually considered to produce the set of candidates. In [9-11], overlay networks are constructed on centralized clusters, in which super nodes incline to be the bottleneck of the system. Although some distributed clustering algorithms are proposed in [12-14], there is still much room to improve the efficiency and reduce the expense. Proximity neighbor selection is implemented by means of establishing and maintaining a routing table comprising proximity neighbors. The classical example is Pastry [5], and researchers try to apply this approach to other DHTs in [15-16].

As far as special protocols are concerned, some new methods are proposed for IPv6 or ad-hoc networks in [17-22]. However, they are not suited to the current Internet circumstance.

Both advantages and drawbacks exist in the three approaches mentioned above. This paper concentrates on combining and highlighting their advantages to investigate a methodology for Kademlia.

III. DISTRIBUTED CLUSTERING ALGORITHM

Traditional clustering relies on entire knowledge of the topology, which is impossible to acquire for a P2P system due to its dynamics. Therefore, a distributed algorithm is proposed to classify nodes, according to their physical locations, into self-organized clusters with the properties of certainty and symmetry.

A. Basic definitions

Before illustrating the clustering algorithm, four basic definitions are introduced to show the method of physical topology-aware overlay construction and cluster identification.

Definition 1) The reference frame of physical topology is defined as n different landmarks [10] in the Internet; each landmark stands for one dimension of the physical topology. In such a reference frame, a node locates itself according to the Round Trip Time (RTT) measured by sending an ICMP echo message (Ping) to each landmark. Out of consideration for accuracy, the landmarks are supposed to be distributed uniformly in the Internet.

Definition 2) The landmark permutation indicates a node's location in the physical topology. Before joining the system, a new node Ni measures the RTT to all landmarks and permutes them following the Shortest Latency First principle to construct Permutationi.

Definition 3) The cluster identifier ClusterID labels a cluster uniquely. In a reference frame with n landmarks, the length of ClusterID is n⌈log2 n⌉ bits. For a certain node Ni ∈ Clusteri, ClusterIDi equals Permutationi with the landmarks interpreted in binary. The relation <ClusterIDi, NodeIDi> is published in the network as a resource.

Definition 4) The resolution of the reference frame indicates how many different clusters can be discriminated in the reference frame. Since clusters and landmark permutations are in one-to-one correspondence, n! is defined as the resolution of a reference frame with n landmarks.

B. Clustering Algorithm

Supposing Ni ∈ Clusteri, whose cluster identifier is ClusterIDi, Algorithm I details the clustering algorithm. According to Algorithm I, the reference frame divides the physical topology into n! independent regions, and nodes falling into the same region form a cluster. In addition, these clusters have two favorable properties, certainty and symmetry, which are useful for designing the routing algorithm.
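As a concrete reading of Definitions 1-4, the sketch below builds a landmark permutation from measured RTTs and encodes it as a ClusterID; the RTT values are made up, and a real node would obtain them by pinging each landmark.

import math

def landmark_permutation(rtts):
    """Order landmark indices by Shortest Latency First (Definition 2).
    rtts: list where rtts[j] is the measured RTT to landmark j."""
    return sorted(range(len(rtts)), key=lambda j: rtts[j])

def cluster_id(permutation):
    """Encode the permutation as a ClusterID of n*ceil(log2 n) bits
    (Definition 3), landmarks interpreted in binary."""
    n = len(permutation)
    bits = math.ceil(math.log2(n))
    return "".join(format(j, f"0{bits}b") for j in permutation)

# Example with n = 6 landmarks (RTTs in milliseconds, made up):
perm = landmark_permutation([41.2, 10.3, 97.5, 22.8, 63.1, 55.0])
print(perm)              # [1, 3, 0, 5, 4, 2]
print(cluster_id(perm))  # an 18-bit ClusterID: 6 groups of 3 bits

With n = 6 landmarks this yields an 18-bit ClusterID of six 3-bit groups, matching the worked example used in Section IV.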

Property 1) Certainty: for a node Ni, there must be a unique cluster to which Ni belongs.
Proof sketch: According to steps 6-7 in Algorithm I, ClusterIDi equals Permutationi denoted in binary code; given the existence and uniqueness of Permutationi, Property 1 is proved.

Property 2) Symmetry: if Nj ∈ Clusteri is true, then Ni ∈ Clusterj holds.
Proof sketch: According to steps 13-15 in Algorithm I, Ni exchanges contact information with any Nj ∈ Clusteri, and Nj combines Ni into Clusterj afterwards. When the network is steady, if Nj ∈ Clusteri, then Ni ∈ Clusterj holds.

Algorithm I: procedure Clustering()
Require: Ni has a contact to an already participating node Nk (k ≠ i). L = {landmarkj | 1 ≤ j ≤ n} is the set of landmarks interpreted in binary.
1: Clusteri = {Ni};
2: for every landmarkk ∈ L
3:   Test the RTT to landmarkk by Ping;
4: endfor
5: Construct Permutationi;
6: ClusterIDi = Permutationi;
7: J = FindResource(Nk, ClusterIDi);
8: for every Nj ∈ J
9:   if Nj is reachable
10:    Clusteri = Clusteri ∪ {Nj};
11:  endif
12: endfor
13: for every Nj ∈ Clusteri
14:   Exchange contact information with Nj;
15: endfor
16: Publish <ClusterIDi, NodeIDi> as a resource;
17: endprocedure

IV. TOPOLOGY-AWARE KADEMLIA

In previous studies [12-17], nodes in the same cluster are selected to be the next hop to improve routing efficiency. However, they omit how to choose a node among different clusters. Hence, a novel definition of distance between clusters is introduced in this section, along with a NodeID assignment mechanism correlating physical topology with logical topology in Kademlia.

A. Basic Scheme

Unconventionally, each landmark has a different capability of locating a certain node in the reference frame defined: the closer landmarks are more capable than the further ones. Let wa weigh such capability of the a-th nearest landmark in Permutationi; wa is in inverse ratio to a. Thus, for Ni and Nj, the ordinals of the inconsistent landmarks between Permutationi and Permutationj can be used to measure their distance in the physical topology.

Definition 5) The mapping distance Dis(ClusterIDi, ClusterIDj) is defined as the group-wise exclusive-or (XOR) result between ClusterIDi and ClusterIDj. It is calculated from equation (1), in which the group() operation means dividing the cluster identifier into segments of ⌈log2 n⌉ bits in a reference frame with n landmarks:

Dis(ClusterIDi, ClusterIDj) = group(ClusterIDi) ⊕ group(ClusterIDj)   (1)

As revealed by step 7 in Algorithm I, all nodes in the same cluster have an identical landmark permutation, so they are close enough to ignore their tiny distance. Therefore, the definition above depicts the distance between clusters rather than between nodes. For instance, if ClusterIDi = 100101011000001010 and ClusterIDj = 100101011010001000, then group(ClusterIDi) = 100|101|011|000|001|010 and group(ClusterIDj) = 100|101|011|010|001|000, so the mapping distance between Clusteri and Clusterj is 000101 in binary, or 5 as an integer (each group contributes a 1 exactly when the corresponding segments differ).

Original Kademlia implements proximity neighbor selection in the logical topology; nevertheless, the detouring problem still remains as a result of the mismatch of the two topologies. In this paper, the cluster identifier is integrated into the NodeID to handle it.
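Equation (1) and the worked example translate directly into a few lines of Python (a sketch; the string encoding of ClusterIDs is an assumption):

import math

def mapping_distance(cid_a, cid_b, n):
    """Group-wise XOR distance between two ClusterIDs (equation (1)):
    each ceil(log2 n)-bit segment contributes one bit, set when the
    corresponding segments differ."""
    w = math.ceil(math.log2(n))
    assert len(cid_a) == len(cid_b) == n * w
    bits = ""
    for g in range(n):
        seg_a, seg_b = cid_a[g*w:(g+1)*w], cid_b[g*w:(g+1)*w]
        bits += "0" if seg_a == seg_b else "1"
    return int(bits, 2)

# The example from the text: distance 000101 in binary, i.e. 5.
print(mapping_distance("100101011000001010", "100101011010001000", n=6))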


Algorithm II: function Routing()
Require: Ni is looking up the target Nj. i is initialized to 0.
1: distance = NodeIDi ⊕ NodeIDj;
2: k' = ⌊log2 distance⌋;
3: while i < K
4:   if the contact information of Nj is in the k'-th bucket
5:     break;
6:   else
7:     NodeSet = the α closest nodes not yet requested;
8:     for every Nm ∈ NodeSet
9:       FIND_NODE(Nm, NodeIDj);
10:      if Receive(Nm) = true
11:        i++;
12:        Update the routing table;
13:      endif
14:    endfor
15:  endif
16: endwhile
17: if i ≤ K
18:   return the contact information of Nj;
19: else
20:   return 0;
21: endif
22: end function

B. Logical Topology

The logical topology of original Kademlia is an incomplete binary tree, where nodes are determined as leaves by unique prefixes of 160-bit hash quantities. The notion of distance is defined to be the bit-wise XOR between NodeIDs.

Figure 1. Logical topology of the proposed Kademlia

Fig. 1 illustrates the logical topology of the proposed Kademlia, which is still an incomplete binary tree but comprises two layers.

 Layer A constructs the n⌈log2 n⌉-bit cluster space, in which the proximity relation of subtrees reflects the mapping distance between the corresponding clusters.
 Layer B categorizes nodes belonging to the same cluster into an identical subtree, where the representing leaves are determined by unique prefixes of 128-bit hash quantities.

The NodeID is constituted of two parts in the proposed structure: the n⌈log2 n⌉-bit cluster identifier and the 128-bit hash quantities.

C. Routing Algorithm

The routing table is an improvement on the traditional k-buckets [4] in the proposed Kademlia. Every node stores a list of <NodeID, IP address, Port, Dis, Time> records for neighbors whose XOR distance falls into the range between 2^i and 2^(i+1). If Ni ∈ Clusteri and Nj ∈ Clusterj, Dis stands for the mapping distance between Clusteri and Clusterj. In each k-bucket, records are sorted by Dis, the nearest node at the head and the farthest node at the end; thus the nearer neighbors have priority to be requested. Time registers the latest contact time of the neighbor, which is the basis for updating the routing table. The farthest XOR distance between nodes is 2^(n⌈log2 n⌉+128), and there are n⌈log2 n⌉+128 k-buckets in total, since NodeIDs are (n⌈log2 n⌉+128)-bit quantities. The original Kademlia defines four RPCs: PING, STORE, FIND_NODE and FIND_VALUE. In the proposed structure, they are also adopted in the routing algorithm, which is detailed in Algorithm II.

D. Theoretical Analysis

The comparisons of the two structures are listed in Tab. 1 to demonstrate the effectiveness of the proposed Kademlia. N is the total number of nodes in the P2P system. T represents the average latency between nodes in the original structure, whereas T1 and T2 represent respectively the average latency between nodes within the same cluster and among different clusters in the proposed structure. Evidently, T1 is less than T2. The statistics in the table are proved by three assertions.

TABLE I. Comparisons of Two Structures

Structure  | Capacity  | Exp. of Hops | Exp. of Latency
Original   | 2^160     | O(logN)      | T·O(logN)
Proposed   | n!·2^128  | O(logN)      | (p·T1 + (1-p)·T2)·O(logN)

Assertion 1) The capacity of the proposed Kademlia built on n landmarks is n!·2^128.
Proof sketch: n! is the resolution of the reference frame with n landmarks. Moreover, there can be at most 2^128 nodes in a single cluster. Thus the capacity of a network built on n landmarks is n!·2^128.
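A minimal sketch of the Dis-sorted k-bucket update described in Section C; the record shape, the bucket capacity K, and the simplified eviction rule are assumptions (real Kademlia pings the least-recently seen entry instead of evicting outright):

K = 20  # assumed bucket capacity, as in standard Kademlia

def bucket_index(node_id_a, node_id_b):
    # Bucket number = floor(log2 of the XOR distance), as in Algorithm II.
    # Assumes the two ids differ (XOR distance > 0).
    return (node_id_a ^ node_id_b).bit_length() - 1

def insert_neighbor(buckets, my_id, record):
    """Keep each k-bucket sorted by mapping distance Dis (nearest first),
    so nearer-cluster neighbors are requested first during lookups."""
    idx = bucket_index(my_id, record["node_id"])
    bucket = buckets.setdefault(idx, [])
    bucket.append(record)              # record: node_id, address, Dis, time
    bucket.sort(key=lambda r: r["dis"])
    del bucket[K:]                     # simplified eviction of the farthest

buckets = {}
insert_neighbor(buckets, my_id=0b1011, record={"node_id": 0b0011, "dis": 2})
insert_neighbor(buckets, my_id=0b1011, record={"node_id": 0b0111, "dis": 1})
print(buckets)  # both land in bucket 3, sorted with dis=1 first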


Proof sketch: The difference between Algorithm


II and lookup [4] procedure in original Kademlia is
in the way how to select the next hop. However,
they share the common routing mechanism.
Therefore, they are equally efficient in logical
topology as O(logN) running system proved in [4].
Assertion 3) The proposed Kademlia is more efficient than the original in the physical topology.
Proof sketch: Define p to be the probability that a FIND_NODE request arises within the same cluster. Then the expected latency in the proposed structure is (p·T1 + (1 − p)·T2)·O(log N), whereas the latency in the original structure is T·O(log N). Because T1 ≤ T2 ≤ T, (p·T1 + (1 − p)·T2)·O(log N) ≤ T·O(log N) always holds. In conclusion, the proposed Kademlia has better physical efficiency than the original.
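A small worked example (with made-up numbers, ours) makes the gap concrete: if intra-cluster hops cost T1 = 20 ms, inter-cluster hops T2 = 100 ms, T = 100 ms, and p = 0.8, the proposed structure spends 36 ms per hop versus 100 ms:

T1, T2, T = 20.0, 100.0, 100.0    # ms of latency per hop (illustrative)
p = 0.8                           # share of FIND_NODEs resolved in-cluster

per_hop_proposed = p * T1 + (1 - p) * T2     # 36 ms
per_hop_original = T                         # 100 ms
print(per_hop_proposed <= per_hop_original)  # True, matching Assertion 3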
V. SIMULATION RESULTS AND PERFORMANCE ANALYSIS
The simulation is carried out on the P2PSim [23] platform, in which a new protocol named A-Kademlia realizes the proposed structure, derived from the existing Kademlia protocol. The King data set [24] and the E2E graph are used as the topology data and the topology model, respectively. The operating system is RedHat 9.0. All of the data are collected from the output files by programs.
Figure 2. Percentage contrast in logical hops

Fig. 2 shows the percentage contrast in logical hops among the four protocols. Notice that a request in Tapestry undergoes the longest path in the logical topology. Furthermore, the other three protocols are almost equal in this aspect of performance, which is consistent with Assertion 2).

Figure 3. Percentage contrast in physical latency

Fig. 3 shows the percentage contrast in physical latency among the four protocols. It is observed that A-Kademlia performs best. Compared with Kademlia, A-Kademlia shortens the latency by nearly 15%. This result accords with what is proved in Assertion 3).

Figure 4. Percentage contrast in latency per hop
Figure 5. Percentage contrast in success rate

Fig. 5 shows the percentage contrast in success rate among the four protocols. Evidently, Tapestry has the lowest rate, while the other three are comparable with one another. The simulation results show that Chord has the most serious detouring problem, owing to its unidirectional clockwise routing along the ring. Moreover, Tapestry has the poorest scalability: its performance decays fastest as the node number increases. On the other hand, A-Kademlia performs similarly to Kademlia in the logical topology but shortens the physical latency remarkably by rationalizing the routing procedure.

VI. SUMMARY

In DHT networks, the detouring problem caused by topology mismatch degrades system efficiency severely. In this paper, a topology-aware Kademlia is studied to solve it. Based on the presented distributed clustering algorithm, a novel NodeID assignment mechanism is designed to correlate the physical and logical topologies, so that the routing procedure is rationalized. Theoretical analysis proves that the proposed Kademlia is still an O(log N) system, yet more efficient than the original in the physical topology. Simulation results show that the proposed structure can reduce latency by nearly 15% by avoiding the mismatch between the two topologies.

REFERENCES

[1] S. Ratnasamy, P. Francis and M. Handley, "A scalable content-addressable network," Proc. ACM SIGCOMM, ACM Press, 2001, pp. 161-172, doi: 10.1145/383059.383072.

[2] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan, "Chord: a scalable peer-to-peer lookup service for internet applications," Proc. ACM SIGCOMM, ACM Press, 2001, pp. 149-160, doi: 10.1145/383059.383071.

[3] B. Y. Zhao, J. Kubiatowicz and A. D. Joseph, "Tapestry: an infrastructure for fault-tolerant wide-area location and routing," Technical Report CSD-01-1141, University of California, Berkeley, 2001.

[4] P. Maymounkov and D. Mazieres, "Kademlia: a peer-to-peer information system based on the XOR metric," Proc. International Workshop on Peer-to-Peer Systems, Springer-Verlag, 2002, pp. 53-65.

[5] A. Rowstron and P. Druschel, "Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems," Proc. 18th IFIP/ACM International Conference on Distributed Systems Platforms, Springer-Verlag, 2001, pp. 329-350.

[6] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker and I. Stoica, "The impact of DHT routing geometry on resilience and proximity," Proc. ACM SIGCOMM, ACM Press, 2003, pp. 381-394, doi: 10.1145/863955.863998.

[7] S. Ratnasamy, M. Handley, R. Karp and S. Shenker, "Topologically-aware overlay construction and server selection," Proc. IEEE INFOCOM, IEEE Press, 2002, pp. 1190-1199.

[8] Y. S. Yu, Y. B. Miao and C. K. Shieh, "Improving the lookup performance of Chord network by hashing landmark clusters," Proc. IEEE International Conference on Networks, IEEE Press, 2006, pp. 1-4, doi: 10.1109/ICON.2006.302674.

[9] B. Y. Zhao, Y. Duan, L. Huang, A. D. Joseph and J. D. Kubiatowicz, "Brocade: landmark routing on overlay networks," Proc. International Workshop on Peer-to-Peer Systems, Springer-Verlag, 2002, pp. 34-44.

[10] B. Krishnamurthy, J. Wang and Y. Xie, "Early measurements of a cluster-based architecture for P2P systems," Proc. ACM SIGCOMM Internet Measurement Workshop, ACM Press, 2001, pp. 105-109, doi: 10.1145/505202.505216.
[11] G. Kwon and K. D. Ryu, "BYPASS: topology-aware lookup overlay for DHT-based P2P file locating services," Proc. International Conference on Parallel and Distributed Systems, IEEE Computer Society, 2004, pp. 297-304, doi: 10.1109/ICPADS.2004.24.

[12] F. Hong, M. Li and J. D. Yu, "PChord: improvement on Chord to achieve better routing efficiency by exploiting proximity," Proc. IEEE International Conference on Distributed Computing Systems Workshops, IEEE Computer Society, 2005, pp. 806-811, doi: 10.1109/ICDCSW.2005.108.

[13] Y. Liu, P. Yang, Z. Chu and J. G. Wu, "TCS-Chord: an improved routing algorithm to Chord based on the topology-aware clustering in self-organizing mode," Proc. International Conference on Semantics, Knowledge, and Grid, IEEE Computer Society, 2005, pp. 25-25, doi: 10.1109/SKG.2005.121.

[14] Y. Liu and P. Yang, "An advanced algorithm to P2P semantic routing based on the topologically-aware clustering in self-organizing mode," Journal of Software, vol. 17, part 2, 2006, pp. 339-348.

[15] Z. C. Xu, C. Tang and Z. Zhang, "Building topology-aware overlays using global soft-state," Proc. International Conference on Distributed Computing Systems, IEEE Computer Society, 2006, p. 500.

[16] H. J. Wang and Y. T. Lin, "Cone: a topology-aware structured P2P system with proximity neighbor selection," Proc. Future Generation Communication and Networking, IEEE Computer Society, 2007, pp. 43-49, doi: 10.1109/FGCN.2007.91.

[17] J. Q. Cui, Y. X. He and L. B. Wu, "More efficient mechanism of topology-aware overlay construction in application-layer multicast," Proc. International Conference on Networking, Architecture, and Storage, IEEE Computer Society, 2007, pp. 31-36.

[18] L. H. Dao and J. W. Kim, "AChord: topology-aware Chord in anycast-enabled networks," Proc. International Conference on Hybrid Information Technology, IEEE Computer Society, 2006, pp. 334-341, doi: 10.1109/ICHIT.2006.47.

[19] J. P. Xiong, Y. W. Zhang, P. L. Hong and J. S. Li, "Reduce Chord routing latency issue in the context of IPv6," IEEE Communications Letters, vol. 10, Jan. 2006, pp. 62-64, doi: 10.1109/LCOMM.2006.1576571.

[20] J. P. Xiong, Y. W. Zhang, P. L. Hong and J. S. Li, "Chord6: IPv6 based topology-aware Chord," Proc. International Conference on Networking and Services, IEEE Computer Society, 2005, p. 4.

[21] S. G. Wang, H. Ji, T. Li and J. Q. Mei, "Topology-aware peer-to-peer overlay network for Ad-hoc," The Journal of China Universities of Posts and Telecommunications, vol. 16, Feb. 2009, pp. 111-115.

[22] R. Winter, T. Zahn and J. Schiller, "Random landmarking in mobile, topology-aware peer-to-peer networks," Proc. IEEE International Workshop on Future Trends of Distributed Computing Systems, IEEE Press, 2004, pp. 319-324.

[23] The P2PSim Project, http://pdos.csail.mit.edu/p2psim/, July 2008.